1997.05.02 07:04 "BIG TIFF file problems", by Hans Christiansen

1997.05.02 14:32 "RE: BIG TIFF file problems", by Ed Grissom

I am having problems with very large tiff files created by our RIP software.

The TIFF file created is a bilevel TIFF consisting of uncompressed data in a single strip!! The size of the file is 707,953,939 bytes.

I noticed that the width (85640) multiplied by the height (66133) (=5,663,630,120) exceeds the maximum possible value of the type uint32 which is 4,294,967,295. It is this 'overflow' that causes tiffcp to fail!

Although it appears that Hans' theory on why his image fails is in error for this particular case, we are beginning to run into this problem for some files we are creating.
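
The numbers in the quoted report can be checked directly. The sketch below assumes one bit per pixel with each row padded to a whole byte (the usual layout for uncompressed bilevel data); the variable names are illustrative only. It compares both the pixel count and the strip's byte count against the uint32 limit.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295

width, height = 85640, 66133   # dimensions from the quoted report
pixels = width * height        # total pixel count

# Assumption: 1 bit per pixel, each row padded to a byte boundary.
bytes_per_row = (width + 7) // 8
strip_bytes = bytes_per_row * height

print(f"pixels      = {pixels:,}  (exceeds uint32: {pixels > UINT32_MAX})")
print(f"strip bytes = {strip_bytes:,}  (exceeds uint32: {strip_bytes > UINT32_MAX})")
```

Under these assumptions the pixel count overflows a uint32, but the strip itself is 707,953,765 bytes, 174 bytes less than the reported file size and well inside the 32-bit range.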

Even though this is more of a TIFF problem than a libtiff problem, I would like to make others aware of the limitations of TIFF, and of stdio, that we are running into. We typically create tiled data, but the problems we are seeing apply to data organized in strips as well.

We have a scanner that is capable of scanning 10"x10" transparencies at 7 microns (3628.5 DPI) in either greyscale or color. This leads to image sizes of 36Kx36K pixels, or 1,316,658,328 pixels of uncompressed data (more than 1 gigapixel). For RGB data, at 3 bytes per pixel, this is 3,949,974,984 bytes (nearly 4GB).
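
As a rough check of where these sizes fall relative to the 2GB and 4GB boundaries discussed in this message, here is a small sketch. It assumes 8 bits per sample (one sample for greyscale, three for RGB), which matches the byte count quoted above; the names are illustrative.

```python
GIB = 2**30

pixels = 1_316_658_328  # pixel count quoted above

# Assumption: 8 bits per sample; 1 sample for greyscale, 3 for RGB.
grey_bytes = pixels * 1
rgb_bytes = pixels * 3

for name, size in (("greyscale", grey_bytes), ("RGB", rgb_bytes)):
    print(f"{name:9s} {size:>13,} bytes  "
          f"over 2 GiB: {size > 2 * GIB}  over 4 GiB: {size > 4 * GIB}")
```

The greyscale scan stays under 2 GiB, while the RGB scan lands between 2 GiB and 4 GiB.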

At this point, we run into a problem with fseek, which (on 32-bit architectures) takes a SIGNED long for the offset. Tiles or strips whose starting address lies above 2GB are difficult to seek to, since offsets above 2GB are interpreted as negative. This can be partially compensated for with some arithmetic, seeking from the start or the end of the file as appropriate. However, it is not straightforward, since negative values are valid inputs to fseek. Luckily, we can use NT's file I/O APIs, which take a 64-bit offset in their analog of fseek. This makes the images fairly useless for interchange with other applications, however, since fseek is widely used.
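
To illustrate the sign problem, the sketch below reinterprets a file offset as a signed 32-bit integer, which is what happens when it is passed through a 32-bit long. The helper name as_signed_32 is hypothetical, and the assumption is that long is 32 bits wide, as on the platforms described here.

```python
import ctypes

def as_signed_32(offset: int) -> int:
    """Reinterpret the low 32 bits of offset as a signed 32-bit integer."""
    # ctypes integer types truncate silently instead of raising on overflow.
    return ctypes.c_int32(offset).value

for offset in (2**31 - 1, 2**31, 3_000_000_000):
    print(f"{offset:>13,} -> {as_signed_32(offset):>14,}")
```

The last offset that survives intact is 2,147,483,647; an offset of 3,000,000,000 comes out as -1,294,967,296, which fseek would treat as a legitimate negative offset rather than an error.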

Compression of the raster data could alleviate some of the problem here, but no compression method that is well suited to photographic images is fast enough to keep up with the scanner. Thus we are currently forced to generate uncompressed data from the scanner. Images that need to be interchanged with others can often be converted to compressed data so that the files are less than the 2GB magic number for fseek.

The next problem comes from the fact that we also typically create a pyramid of overviews to aid in fast display of these large images. When a full set of overviews (2x, 4x, 8x, ...) is added to one of the RGB images above, the total size in bytes approaches 5GB. The offset of any tile or IFD that lies above the 4GB address cannot be stored in the current TIFF format, which allows only 4 bytes for offsets.
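
The pyramid arithmetic can be sketched as follows. The assumptions are that each overview halves both dimensions (so it holds about a quarter of the previous level's bytes), that tile padding and directory overhead are ignored, and that eight levels are enough to show the trend. The helper fits_in_tiff_offset is hypothetical; it simply tests whether a value can be packed into an unsigned 4-byte field.

```python
import struct

def fits_in_tiff_offset(offset: int) -> bool:
    """True if offset can be packed into an unsigned 4-byte field."""
    try:
        struct.pack("<I", offset)
        return True
    except struct.error:
        return False

base_bytes = 3_949_974_984  # full-resolution RGB image from above

# Assumption: each level is a quarter the size of the one before it;
# the choice of 8 levels is arbitrary.
levels = [base_bytes // 4**k for k in range(8)]
total = sum(levels)

print(f"total with overviews = {total:,} bytes")
print("end of file addressable:", fits_in_tiff_offset(total))
```

The series converges toward 4/3 of the base image, about 5.27 billion bytes, so under these assumptions even the first (2x) overview already pushes the file past the 4GB limit of a 4-byte offset.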

Our current solution is to restrict the size of the scan when writing to the TIFF format.

We have plans for a scanner with even higher resolution, where these problems will be even more troublesome because they will be encountered more often.

If anyone knows about plans for getting around either of these problems in TIFF 7.0, I'd sure like to hear about them.

Thanks for listening.

--
ed grissom
egrissom@ingr.com