2005.11.23 02:32 "[Tiff] libtiff and over 4GB files", by Frank Warmerdam

2005.11.23 03:24 "Re: [Tiff] libtiff and over 4GB files", by Joris Van Damme

This post is *not* about BigTIFF. Instead it is about how libtiff reports errors if the application attempts to write a file larger than 4GB. The short answer is that libtiff doesn't even seem to notice; it just writes out a corrupt file (with tile/strip offsets wrapping back to 0 at the 4GB mark).
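
To illustrate what that looks like in practice, here is a minimal standalone illustration (not libtiff code; the names and values are just for the example) of a 64-bit file position being squeezed into a classic TIFF 32-bit offset field:

  #include <stdint.h>
  #include <stdio.h>
  #include <inttypes.h>

  int main(void)
  {
      /* a data block that ends up just past the 4GB mark */
      uint64_t file_pos = (uint64_t)4 * 1024 * 1024 * 1024 + 100; /* 4GiB + 100 */

      /* classic TIFF only has room for a 32-bit offset, so the value
       * silently wraps modulo 2^32 -- hence offsets "wrapping back to 0" */
      uint32_t tiff_offset = (uint32_t)file_pos;

      printf("real position: %" PRIu64 "\n", file_pos);
      printf("stored offset: %" PRIu32 "\n", tiff_offset); /* prints 100 */
      return 0;
  }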

I have created a bug on this issue at:

  http://bugzilla.remotesensing.org/show_bug.cgi?id=1004

I am interested in feedback from folks on appropriate or inappropriate approaches to dealing with this problem.

There's writing beyond 4 gig, and there's writing beyond 4 gig. Let's ignore the current wrapping-back behaviour, and discuss what is actually the desired behaviour. Is a classic TIFF truly limited to 4 gig to begin with?

In theory, what creates the 4 gig boundary is the fact that offsets are limited to 32 bits. But, from a file format point of view, one could argue that not every byte needs an offset pointing to it. For example, if a TIFF has a tile of 1 gig compressed space, and this tile is written as the very last datablock, then the highest offset used inside the TIFF is 1 gig less than the size of the TIFF. Thus, theoretically, such a TIFF could grow to 5 gig in size.
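
Spelling out the arithmetic of that example:

  highest offset actually stored  = file size - size of last datablock
  32-bit offset limit             = 2^32 - 1 bytes (just under 4 gig)
  so, with a 1 gig tile written last:
  file size                      <= (2^32 - 1) + 1 gig, roughly 5 gig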

But I believe this is the wrong point of view. We all agree that file format specifications should not yield to implementation whim. But this is not just a whim of one particular implementation design: it amounts to saying that offsets, as used in any implementation at all, need to be able to grow beyond 4 gig, and that a 'huge offset' variant of the I/O API is needed. The only way a reader implementation could avoid the need for offsets >32 bits, in the above case, is by reading the excess 1 gig tile as a single chunk into memory... OK, LibTiff would actually try to do that anyhow, but we cannot seriously take that behaviour as a premise, and then dismiss the more responsible choice of streaming smaller blocks through the graphics pipeline as a particular 'implementation whim' that should not influence the specification. In fact, LibTiff's current read-whole-tiles-only behaviour seems more like the whim, and to me it feels that if we were to regard such a 5 gig TIFF as legitimate, we would imply the need for offsets >4 gig in any generic implementation.

Thus, I feel that LibTiff should consistently check the write offset. That would solve the whole wrapping-back problem as a side effect, so it seems valid to have set that behaviour aside at the start of my argument.
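
To be concrete about the kind of check I mean, here is a minimal sketch; the helper name and the 5 gig example value are mine, and this is not how the actual libtiff write path is structured:

  #include <stdint.h>
  #include <stdio.h>

  /* Hypothetical helper, not part of libtiff: true if a file position
   * can still be stored in a classic-TIFF 32-bit offset field. */
  static int offset_fits_classic_tiff(uint64_t file_pos)
  {
      return file_pos <= UINT32_MAX;
  }

  int main(void)
  {
      uint64_t next_data_offset = (uint64_t)5 * 1024 * 1024 * 1024; /* 5 GiB */

      if (!offset_fits_classic_tiff(next_data_offset)) {
          /* in the real write path this would be reported through the
           * library's error handler and the write would fail cleanly,
           * instead of letting the offset wrap back to 0 */
          fprintf(stderr, "maximum classic TIFF file size exceeded\n");
          return 1;
      }
      return 0;
  }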

Does anyone depend on libtiff being able to write a bit past the 4GB mark?

If I remember correctly, not too long ago a user reported on the list writing files > 4 gig and not being able to read them back. I also seem to remember this was caused by the current wrapping-back behaviour, which made LibTiff overwrite the previously written file header and all...

Are there situations where you think you can safely recommend that users depend on libtiff currently being able to write 'a bit' past the 4 gig mark? Can you define 'a bit' such that it makes operational sense to the LibTiff user, and not just from a file format point of view? (These questions are not rhetorical; I do not know the answer.)

Joris Van Damme
info@awaresystems.be
http://www.awaresystems.be/
Download your free TIFF tag viewer for Windows here:
http://www.awaresystems.be/imaging/tiff/astifftagviewer.html