1996.11.12 11:00 "Re: Basic TIFF parsing question", by Niles Ritter
This question is WAY basic but I'm not sure where else to ask.
I would like to parse a TIFF file for the following info:
- image dimensions (w x h)
- color mode (e.g., grayscale, bilevel)
I'm not going to modify the image data in any way. I just want to find out what's in there.
Are you sure you want to do this by (Mac Perl) hand?
The alternative is to use libtiff, available from
which can be built as a CodeWarrior, Symantec C++, or MPW project (since you are using a Macintosh). It can parse the dickens out of a TIFF file, so you don't have to. It is written in C, not Perl.
If you really are a glutton for punishment, here's how to do it by hand (hold your breath):
1) Read in the first two bytes of the file. If "MM", then all of the multi-byte binary data (integers, floats, etc.) is in Motorola (Macintosh) byte order; otherwise, if "II", it is in Intel (PC) byte order. If your hardware uses the opposite byte order, you will need to swap the binary data. The information you seek is not in ASCII, so you will indeed need to check this.
2) Read in the next two bytes as a two-byte integer, and swap if in the opposite byte order of your machine. If the answer is not 42, this is not a valid TIFF file (I kid you not). Otherwise, forge on...
3) Read in the next four bytes as an unsigned 4-byte integer (and swap if needed; I won't say this again). This gives the offset from the beginning of the file to the information about the *first* image in the file (there can be more than one). That's what the spec calls an "image file directory" (IFD).
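Steps 1 through 3 can be sketched in a few lines. This is a rough illustration in Python rather than Mac Perl, assuming the whole file has already been read into memory as bytes; the function name read_tiff_header is made up for this example. Python's struct module takes care of the byte swapping once it is told which order the file uses.

```python
import struct

def read_tiff_header(data: bytes):
    """Return (struct byte-order prefix, offset of the first IFD).

    Hypothetical helper: `data` is assumed to hold the entire file.
    """
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    mark = data[:2]
    if mark == b"MM":
        endian = ">"   # Motorola order (big-endian)
    elif mark == b"II":
        endian = "<"   # Intel order (little-endian)
    else:
        raise ValueError("missing MM/II byte-order mark")
    # Two-byte magic number, then the four-byte offset of the first IFD.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42")
    return endian, first_ifd
```

The returned prefix can be passed to every later struct call, so nothing else needs to care about the byte order of the machine it runs on.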
4) Skip down the file to the offset location (offset 0 = first byte of file, etc.), and read in the next two bytes as an unsigned integer N. The value N is the number of tags written out for this image. Among these N tags are the image height, width, color mode, and resolution. Each tag is described by a 12-byte entry, and the entries are listed immediately following the value N. Every tag has a unique ID number, used to identify the kind of information contained in it. The tag IDs you seek in the list are:
ImageWidth ID = 256
ImageLength ID = 257
Photometric Interpretation ID = 262
X Resolution ID = 282
Y Resolution ID = 283
Resolution Unit ID = 296
5) For each 12-byte tag entry in the list:
     read the first two bytes as an integer tag ID
     if the ID is not in the above list
        skip to the next 12-byte tag entry
        (the tag IDs should occur in sorted order, but
        several bogus PC TIFF-writing programs violated this).
     read in the next two bytes of the entry as an integer field format type:
        if type=3, the tag Value is a two-byte SHORT integer
        if type=4, the tag Value is a 4-byte LONG integer
        if type=5, the tag Value is a RATIONAL number.
        (There are other format codes, but you won't need them here.)
     skip the next four bytes, which are the Value-count (which
        will be 1 in your cases)
     if type=3, read in the next two bytes as the integer tag Value
     if type=4, read in the next four bytes as the integer tag Value
     if type=5, read in the next four bytes as a file offset to
        a pair of 4-byte unsigned integers P and Q. The Value of the
        tag is the quotient P/Q. Note: the location of P,Q may be anywhere
        in the file, even prior to the tag-entry list, so you
        had better have saved all the stream data that came in thus
        far (TIFF is not easily stream-able).
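Here is a sketch of the entry walk from steps 4 and 5, again in Python as an illustration only. The function name read_ifd, the WANTED table, and the key names in the returned dictionary are all invented for this example. It assumes the file is in memory as bytes, that `endian` is ">" for "MM" files or "<" for "II" files, and that `ifd_offset` is the offset found in step 3. Entries with a Value-count other than 1 are simply skipped.

```python
import struct

# Tag IDs of interest, with hypothetical names for the result dictionary.
WANTED = {256: "ImageWidth", 257: "ImageLength",
          262: "PhotometricInterpretation",
          282: "XResolution", 283: "YResolution", 296: "ResolutionUnit"}

def read_ifd(data: bytes, endian: str, ifd_offset: int) -> dict:
    """Collect the wanted tag Values from one image file directory."""
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    values = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i          # start of this 12-byte entry
        tag, ftype, n = struct.unpack_from(endian + "HHI", data, entry)
        if tag not in WANTED or n != 1:
            continue                             # not one of ours; next entry
        value_pos = entry + 8                    # last four bytes of the entry
        if ftype == 3:                           # SHORT: first two bytes
            (value,) = struct.unpack_from(endian + "H", data, value_pos)
        elif ftype == 4:                         # LONG: all four bytes
            (value,) = struct.unpack_from(endian + "I", data, value_pos)
        elif ftype == 5:                         # RATIONAL: offset to P and Q
            (where,) = struct.unpack_from(endian + "I", data, value_pos)
            p, q = struct.unpack_from(endian + "II", data, where)
            value = p / q if q else None         # guard against Q = 0
        else:
            continue                             # other formats not handled
        values[WANTED[tag]] = value
    return values
```

Because the whole file is held in memory, following a RATIONAL offset backwards or forwards costs nothing, which sidesteps the streaming problem mentioned above. The loop does not rely on the entries being sorted.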
6) The width and length tag Values are typically either SHORT or LONG, the X and Y resolution are RATIONAL, while the Resolution unit and Photometric interpretation Values are SHORT integer codes.
For Photometric interpretation, the tag value indicates:
code=0 -> MIN is White
code=1 -> MIN is Black
code=2 -> RGB Color
code=3 -> indexed color map (palette)
code=4 -> image (holdout) mask
code=5 -> color separation (often, CMYK; need to check InkSet tag)
code=6 -> Y,Cb,Cr color
code=8 -> CIE L*a*b* color
For resolution unit, the tag value indicates:
code=1 -> no unit
code=2 -> inch
code=3 -> centimeter
The default value is 2.
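To turn the numeric codes into something readable, a pair of lookup tables is enough. The sketch below is illustrative only: the function name describe and the dictionary keys are hypothetical, and the input is assumed to be a dictionary of tag Values already pulled out of the file. The photometric names follow the list above; the inch and centimeter codes (2 and 3) come from the TIFF 6.0 specification.

```python
PHOTOMETRIC = {
    0: "MIN is White",
    1: "MIN is Black",
    2: "RGB Color",
    3: "indexed color map (palette)",
    4: "image (holdout) mask",
    5: "color separation",
    6: "Y,Cb,Cr color",
    8: "CIE L*a*b* color",
}

RESOLUTION_UNIT = {1: "no unit", 2: "inch", 3: "centimeter"}

def describe(values: dict) -> str:
    """Summarize a dictionary of tag Values (hypothetical key names)."""
    width = values.get("ImageWidth")
    height = values.get("ImageLength")
    mode = PHOTOMETRIC.get(values.get("PhotometricInterpretation"), "unknown")
    # The resolution unit defaults to 2 when the tag is absent.
    unit = RESOLUTION_UNIT.get(values.get("ResolutionUnit", 2), "unknown")
    parts = [f"{width} x {height}", mode]
    xres = values.get("XResolution")
    yres = values.get("YResolution")
    if xres is not None and yres is not None:
        parts.append(f"resolution {xres:g} x {yres:g} ({unit})")
    return ", ".join(parts)
```

For a color separation image (code 5) you would still need to look at the InkSet tag to know whether it is really CMYK; this sketch does not attempt that.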
That wasn't so hard, was it?