1999.10.14 03:31 "Re: Bad G3 TIFF file or LIBTIFF bug?", by Niles Ritter

Our customer has brought in a G3 TIFF file that libtiff has trouble decoding. And I am stuck debugging through libtiff.

Could you take a look at the file?:

http://www.fastio.com/bad.tif

(66kb)

and determine what is at fault?

Yes.

Actually, this is a 2-page TIFF file, and libtiff succeeds in decoding page 1, but fails on page 2.

For example, "tiffsplit bad.tif" shows the error:

TIFFReadRawStrip: bad.tif: Read error at scanline 4294967295; got 17843 bytes, expected 48213.

Your error message is exactly right. The size of the file is 48461 bytes, and a tiffdump reveals for the second image directory:

StripOffsets (273) LONG (4) 1<30618>
...
StripByteCounts (279) LONG (4) 1<48213>

so, if you go to offset 30618 in the file, you only have 17843 bytes left in the file, contradicting the next line that the data for the (only) strip in the image has 48213 bytes, just like the error message said.
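The arithmetic above can be checked mechanically. What follows is only a minimal sketch, not anything libtiff or tiffdump does internally; the function names are made up for illustration, and the numbers are the ones quoted from the tiffdump output and the file size.

```python
def bytes_available(file_size, strip_offset):
    """Bytes remaining in the file from the start of the strip to EOF."""
    return max(file_size - strip_offset, 0)


def strip_is_truncated(file_size, strip_offset, strip_byte_count):
    """True if the strip claims more data than the file actually holds."""
    return strip_offset + strip_byte_count > file_size


# Values for the second image directory of bad.tif, as reported above.
file_size = 48461      # size of the file in bytes
strip_offset = 30618   # StripOffsets (273)
strip_count = 48213    # StripByteCounts (279)

print(bytes_available(file_size, strip_offset))                  # 17843
print(strip_is_truncated(file_size, strip_offset, strip_count))  # True
```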

The Imaging for Windows software has not had the best track record as a TIFF implementation; in fact it sucks. It has been known to write bad, noncompliant TIFF (it does not use libtiff), and it wrote a bad file in this case. I think this was the one that used to write bogus JPEG-TIFF files, but I could be confusing it with the Wang Imaging product (unless they merged evil forces).

Tiffdump does not show anything unusual about the tags.

This file is viewable correctly by "Imaging for Windows Preview" made by Kodak for Microsoft on Win98.

I am willing to bet that the data is okay, but that the bogus software wrote the wrong byte size (or didn't update the value after doing the compression). In fact, I'll bet the code has a bogus hack that someone put into IWP when they realized that their byte counts are wrong, which simply ignores the byte count if it can get a full image out of the data they do have.

The proof of this will be to hack the byte count (StripByteCounts) to 17843 and see if libtiff can unpack it....

[hack hack...]

Okay, I just hacked the bytes and both images come through tiffsplit just fine. The first image says "this is a first page", and the second says "I received your fax!" and a bunch of other stuff. You may find a copy of the hacked file at

http://home.earthlink.net/~ritter/tiff/good.tif

Run tiffdump and tiffsplit and see for yourself.
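For anyone who wants to repeat the hack without a hex editor, here is a rough sketch of one way to do it. It is not the procedure actually used on good.tif, and the function name is hypothetical. It assumes a classic (non-BigTIFF) file in which each affected directory has a single strip whose StripOffsets and StripByteCounts are stored inline as LONG values, as in the tiffdump output above; anything else is left untouched.

```python
import struct

STRIP_OFFSETS = 273
STRIP_BYTE_COUNTS = 279
TYPE_LONG = 4


def clamp_strip_byte_counts(data):
    """Clamp single-strip StripByteCounts values that run past end of file.

    `data` is a bytearray holding the whole TIFF; it is patched in place.
    Returns the number of directories that were changed.
    """
    order = {b"II": "<", b"MM": ">"}[bytes(data[:2])]
    magic, ifd = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    patched = 0
    seen = set()  # guards against a looping directory chain
    while ifd and ifd not in seen:
        seen.add(ifd)
        (n_entries,) = struct.unpack_from(order + "H", data, ifd)
        offset = count = count_pos = None
        for i in range(n_entries):
            pos = ifd + 2 + 12 * i
            tag, typ, cnt = struct.unpack_from(order + "HHI", data, pos)
            if typ != TYPE_LONG or cnt != 1:
                continue  # only the simple inline single-strip case
            (value,) = struct.unpack_from(order + "I", data, pos + 8)
            if tag == STRIP_OFFSETS:
                offset = value
            elif tag == STRIP_BYTE_COUNTS:
                count, count_pos = value, pos + 8
        if offset is not None and count_pos is not None:
            available = len(data) - offset
            if 0 < available < count:
                struct.pack_into(order + "I", data, count_pos, available)
                patched += 1
        (ifd,) = struct.unpack_from(order + "I", data, ifd + 2 + 12 * n_entries)
    return patched
```

To try it, read the file into a bytearray, call the function, and write the result to a new file rather than overwriting the original; then run tiffdump on the copy to confirm that the byte count now matches the data actually present.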

Like I said; the MS imaging package sucks. I vaguely recall being on the phone with them to try to convince their programmers to do a better job, but they didn't seem to care (this was several years ago).

I don't know if it's worth it to put yet another workaround in libtiff to try to unpack less data than it is supposed to have or not. My preference would be for a lot of folks to harangue whoever owns the blasted imaging thing now and tell them I said it *still* sucks.

--Niles.