2005.11.23 02:32 "[Tiff] libtiff and over 4GB files", by Frank Warmerdam

2005.11.23 04:06 "Re: [Tiff] libtiff and over 4GB files", by Frank Warmerdam

On 11/22/05, Joris <joris.at.lebbeke@skynet.be> wrote:

There's writing beyond 4 gig, and there's writing beyond 4 gig. Let's ignore the current wrapping back behaviour, and discuss what the actual desired behaviour is. Is a classic TIFF truly limited to 4 gig to begin with?

In theory, what creates the 4 gig boundary is the fact that offsets are limited to 32 bits. But, from a file format point of view, one could argue that not every byte needs an offset pointing to it. For example, if a TIFF has a tile of 1 gig compressed space, and this tile is written as the very last datablock, then the highest offset used inside the TIFF is 1 gig less than the size of the TIFF. Thus, theoretically, such a TIFF could grow to 5 gig in size.

Joris,

As you note, there are various potentially legal TIFF files somewhat larger than 4GB. I personally would be in favor of preventing libtiff from writing such files, though I wouldn't necessarily want to preclude it from reading them. I could also live with us just ensuring that we don't write files with any invalid offsets.
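
To make the arithmetic behind the 5 gig example concrete, here is a minimal sketch in C. The helper names are hypothetical and are not part of libtiff; the only facts assumed are that classic TIFF stores offsets and byte counts as 32-bit unsigned values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a block can be referenced from a classic TIFF
   directory only if its starting offset fits in 32 bits. */
static int offset_is_addressable(uint64_t start)
{
    return start <= UINT32_MAX;
}

/* Hypothetical helper: size of the file if a block of `nbytes` bytes
   starting at `start` is the very last thing written. Byte counts are
   32-bit in classic TIFF as well, hence the precondition. */
static uint64_t file_size_if_last_block(uint64_t start, uint64_t nbytes)
{
    assert(nbytes <= UINT32_MAX);
    return start + nbytes;
}
```

With a 1 GiB tile (1073741824 bytes) starting at offset 4294967294, the offset is still addressable, yet the file ends at byte 5368709118, just short of 5 GiB. An offset of 4294967296 would already be out of range.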

I would add that since libtiff normally writes the directory *after* writing the imagery, it is somewhat hard to construct a TIFF file with libtiff that has imagery going past the 4GB boundary: the directory would then be written past the end of that imagery, at an offset that no longer fits in 32 bits, and it will be all screwed up. It is, in theory, possible, if you flush the directory to disk before writing imagery (and then refresh the offsets).
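
The difference between the two write orders can be sketched with a toy layout model in C. This is only an illustration under simplifying assumptions: the imagery is modelled as equally sized blocks, alignment is ignored, and every name here is hypothetical rather than a libtiff API.

```c
#include <assert.h>
#include <stdint.h>

#define TIFF_HEADER_BYTES 8u   /* classic TIFF header is 8 bytes */

/* Hypothetical description of where things land in the file. */
typedef struct {
    uint64_t dir_offset;        /* where the directory starts */
    uint64_t last_block_offset; /* where the last image block starts */
    uint64_t file_size;         /* total bytes in the file */
} layout;

/* Lay out one directory and `nblocks` image blocks of `block_bytes` each,
   with the directory either before (dir_first != 0) or after the imagery. */
static layout plan_layout(int dir_first, uint64_t dir_bytes,
                          uint64_t block_bytes, uint64_t nblocks)
{
    layout l;
    uint64_t data_offset;

    assert(nblocks > 0);
    if (dir_first) {
        l.dir_offset = TIFF_HEADER_BYTES;
        data_offset = l.dir_offset + dir_bytes;
    } else {
        data_offset = TIFF_HEADER_BYTES;
        l.dir_offset = data_offset + block_bytes * nblocks;
    }
    l.last_block_offset = data_offset + block_bytes * (nblocks - 1);
    l.file_size = TIFF_HEADER_BYTES + dir_bytes + block_bytes * nblocks;
    return l;
}

/* Every offset that must be stored has to fit in 32 bits. */
static int offsets_fit(const layout *l)
{
    return l->dir_offset <= UINT32_MAX && l->last_block_offset <= UINT32_MAX;
}
```

For four 1 GiB blocks and an assumed 1024-byte directory, the directory-last layout needs a directory offset beyond 32 bits and fails the check, while the directory-first layout passes even though the file is slightly larger than 4 GiB.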

Does anyone depend on libtiff being able to write a bit past the 4GB mark?

If I remember correctly, not too long ago a user on the list reported writing files > 4 gig and not being able to read them back. I also seem to remember this was caused by the current wrapping back behaviour, which made LibTiff overwrite the previously written file header and all...

Right, and this is what I see happening frequently in the GDAL user community. The current lack of checking is resulting in the generation of lots of corrupt files, without users realizing why.

Several years ago when I was at PCI I tried to make the application level code calling libtiff recognise in advance if the file was going to be too large, but it is very hard to do this reliably, especially when compression is being used. That is why I am now promoting the idea of fixing this right in libtiff.

Are there situations where you think you can safely recommend users to depend on libtiff currently being able to write 'a bit' past the 4 gig mark? Can you define 'a bit' such that it makes operational sense to the LibTiff user, and not just from a file format point of view? (These questions are not rhetorical, I do not know the answer.)

Well, we could allow writing a bit past the 4GB mark by just checking that offsets are valid, but I don't see this as a particularly valuable behavior. I was hoping that no one would speak up in favor of this need so I could just strictly limit libtiff to 4GB files.
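
The two options, and the wrap-around that happens with no check at all, could look roughly like the following C sketch. None of these names exist in libtiff; they are hypothetical and only show where the checks would differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical policies discussed in this thread. */
typedef enum {
    STRICT_4GB,          /* the whole file must stay within 4GB */
    ALLOW_VALID_OFFSETS  /* only the stored offsets must fit in 32 bits */
} write_policy;

/* Return 1 if a block of `nbytes` may be written at 64-bit position `pos`. */
static int write_allowed(uint64_t pos, uint64_t nbytes, write_policy policy)
{
    const uint64_t four_gb = (uint64_t)UINT32_MAX + 1;  /* 2^32 */

    if (pos > UINT32_MAX || nbytes > UINT32_MAX)
        return 0;  /* offset or byte count cannot be stored in 32 bits */
    if (policy == STRICT_4GB)
        return pos + nbytes <= four_gb;
    return 1;      /* offset is valid; data may run past 4GB */
}

/* What an unchecked writer effectively stores: the position truncated to
   32 bits, which "wraps back" towards the start of the file. */
static uint32_t wrapped_offset(uint64_t pos)
{
    return (uint32_t)pos;
}

/* Checked conversion: callers are expected to test write_allowed() first. */
static uint32_t to_classic_offset(uint64_t pos)
{
    assert(pos <= UINT32_MAX);
    return (uint32_t)pos;
}
```

Under the lenient policy a 1 GiB block at offset 4294967294 is accepted, while the strict policy rejects it. A position of 4294967304 is rejected by both, and without a check it would wrap to offset 8, right after the file header.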

Best regards,
--

---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent