1996.11.12 14:21 "Basic TIFF parsing question", by Ben Ko

1996.11.12 11:00 "Re: Basic TIFF parsing question", by Niles Ritter

This question is WAY basic but I'm not sure where else to ask.

I would like to parse a TIFF file for the following info:

I'm not going to modify the image data in any way. I just want to find out what's in there.

Are you sure you want to do this by (Mac Perl) hand?

The alternative is to use libtiff, available from

ftp://ftp.sgi.com/graphics/tiff

which can be built as a CodeWarrior, Symantec C++, or MPW project (since you are using a Macintosh). It can parse the dickens out of a TIFF file, so you don't have to. It is written in C, not Perl.

If you really are a glutton for punishment, here's how to do it by hand (hold your breath):

1) Read in the first two bytes of the file. If they are "MM", then all of the multi-byte binary data (integers, floats, etc.) are in Motorola (Macintosh, big-endian) byte order; if they are "II", the data are in Intel (PC, little-endian) byte order. If your hardware uses the opposite byte order, you will need to swap the binary data. The information you seek is not in ASCII, so you will indeed need to check this.

2) Read in the next two bytes as a two-byte integer, and swap if in the opposite byte order of your machine. If the answer is not 42, this is not a valid TIFF file (I kid you not). Otherwise, forge on...

3) Read in the next four bytes as an unsigned 4-byte integer (and swap if needed; I won't say this again). This gives the offset from the beginning of the file to the information about the *first* image in the file (there can be more than one). That's what the spec calls an "image file directory" (IFD).

4) Skip down the file to the offset location (offset 0 = first byte of file, etc.), and read in the next two bytes as an unsigned integer N. The value N is the number of tags written out for this image. Among these N tags are the image height, width, color mode, and resolution. Each tag is described by a 12-byte entry, and the N entries are listed immediately following the value N. Every tag has a unique ID number, used to identify the kind of information contained in it. The tag IDs you seek in the list are:

         ImageWidth                  ID = 256
         ImageLength                 ID = 257
         PhotometricInterpretation   ID = 262
         XResolution                 ID = 282
         YResolution                 ID = 283
         ResolutionUnit              ID = 296

5) For each 12-byte tag entry in the list,

  read first two bytes as an integer tag ID
  if the ID is not in the above list
        skip to next 12-byte tag entry
        (the tag ID's should occur in sorted-order, but
         several bogus PC TIFF writing programs violated this).
  otherwise
        read in next two bytes of entry as an integer field format type:
           if type=3, the tag Value is a two-byte SHORT integer,
           if type=4, the tag Value is a 4-byte LONG integer
           if type=5, the tag Value is a RATIONAL number.
           (There are other format codes, but you won't need them here.)
        Skip the next four bytes, which are the Value-count (which
           will be 1 in your cases)
        If type=3 read in the next two bytes as the integer tag Value
        if type=4 read in the next four bytes as the integer tag Value
        if type=5 read in the next four bytes as a file offset to
           a pair of 4-byte unsigned integers P and Q. The Value of the
           tag is the quotient P/Q. Note: the location of P,Q may be
           anywhere in the file, even prior to the tag-entry list, so
           you had better have saved all the stream data that came in
           thus far (TIFF is not easily stream-able).

6) The width and length tag Values are typically either SHORT or LONG; the X and Y resolution Values are RATIONAL; and the Resolution unit and Photometric interpretation Values are SHORT integer codes.
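
For anyone who would rather see steps 1 through 5 as code, here is a minimal sketch in Python (not Perl, and not libtiff). It assumes the whole file has already been read into memory as a bytes object, which sidesteps the streaming problem. The names read_first_ifd and WANTED_TAGS are made up for this illustration, and only the first IFD and single-valued SHORT, LONG, and RATIONAL tags are handled.

```python
import struct
from fractions import Fraction

# Tag IDs from the list in step 4 (names are illustrative labels).
WANTED_TAGS = {
    256: "ImageWidth",
    257: "ImageLength",
    262: "PhotometricInterpretation",
    282: "XResolution",
    283: "YResolution",
    296: "ResolutionUnit",
}
SHORT, LONG, RATIONAL = 3, 4, 5


def read_first_ifd(data):
    """Return {tag name: value} for the wanted tags in the first IFD."""
    # Steps 1-3: byte order, the magic number 42, offset of the first IFD.
    if data[:2] == b"MM":
        endian = ">"  # Motorola, big-endian
    elif data[:2] == b"II":
        endian = "<"  # Intel, little-endian
    else:
        raise ValueError("no MM/II byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("magic number is not 42")

    # Step 4: the entry count N, followed by N 12-byte entries.
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    found = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        # Step 5: tag ID, field type, and value count open each entry.
        tag_id, field_type, n_values = struct.unpack_from(
            endian + "HHI", data, entry)
        # Test membership instead of trusting the entries to be sorted.
        if tag_id not in WANTED_TAGS or n_values != 1:
            continue
        value_pos = entry + 8
        if field_type == SHORT:
            (value,) = struct.unpack_from(endian + "H", data, value_pos)
        elif field_type == LONG:
            (value,) = struct.unpack_from(endian + "I", data, value_pos)
        elif field_type == RATIONAL:
            # The value field holds a file offset to the pair P, Q.
            (pq_offset,) = struct.unpack_from(endian + "I", data, value_pos)
            p, q = struct.unpack_from(endian + "II", data, pq_offset)
            value = Fraction(p, q) if q else None  # Q = 0 has no quotient
        else:
            continue  # other field types are outside this sketch
        found[WANTED_TAGS[tag_id]] = value
    return found
```

Calling read_first_ifd on the bytes of a file opened in binary mode should give back a small dictionary such as {"ImageWidth": 640, ...}. Letting struct do the byte swapping via the ">" or "<" prefix means the code never needs to know the byte order of the machine it runs on.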

For Photometric interpretation, the tag value indicates:

      code=0 -> MIN is White
      code=1 -> MIN is Black
      code=2 -> RGB Color
      code=3 -> indexed color map (palette)
      code=4 -> image (holdout) mask
      code=5 -> color separation (often, CMYK; need to check InkSet tag)
      code=6 -> Y,Cb,Cr color
      code=8 -> CIE L*a*b* color
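
If you want to print these codes as text, a lookup table is probably all you need. The following is only a sketch in Python; PHOTOMETRIC_NAMES and photometric_name are names invented here, and any code that is not in the list above is reported as unknown rather than guessed at.

```python
# Photometric interpretation codes, copied from the list above.
PHOTOMETRIC_NAMES = {
    0: "MIN is White",
    1: "MIN is Black",
    2: "RGB Color",
    3: "indexed color map (palette)",
    4: "image (holdout) mask",
    5: "color separation",
    6: "Y,Cb,Cr color",
    8: "CIE L*a*b* color",
}


def photometric_name(code):
    """Return a readable name for a photometric interpretation code."""
    return PHOTOMETRIC_NAMES.get(code, f"unknown photometric code {code}")
```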

For resolution unit, the tag value indicates:

      code=1 no unit
      code=2 inches
      code=3 centimeters

The default value is 2 (inches), which applies when the tag is absent.
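
As an illustration of how the unit code combines with the X or Y resolution Value, here is a small Python sketch that normalizes a resolution to pixels per inch. The function name pixels_per_inch is hypothetical, and returning None for "no unit" is a choice made for this example, not something the format requires.

```python
from fractions import Fraction


def pixels_per_inch(resolution, unit_code=2):
    """Convert a resolution Value to pixels per inch.

    unit_code defaults to 2 (inches), matching the tag's default.
    Returns None when the unit code is 1 (no unit).
    """
    if unit_code == 2:
        return Fraction(resolution)
    if unit_code == 3:
        # 1 inch = 2.54 cm, so pixels/cm times 2.54 gives pixels/inch.
        return Fraction(resolution) * Fraction(254, 100)
    if unit_code == 1:
        return None
    raise ValueError(f"unknown resolution unit code {unit_code}")
```

For example, a resolution of 100 with unit code 3 (centimeters) comes out as 254 pixels per inch.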

That wasn't so hard, was it?

--Niles.