2004.04.17 02:50 "Re: [Tiff] Large TIFF files", by Joris Van Damme

Whichever approach is taken, I would intend to produce a libtiff that supported both the new "big" format and the existing 32bit format. Ideally, libtiff would produce old style files normally, and switch over to producing the big format for large images.

How about this, for the benefit of the 'co-existence' of 32bit and 64bit TIFF:

For the header, let the first 8 bytes be the same old unchanged 32bit TIFF header, like this:

Bytes 0-1: endianness
Bytes 2-3: 42
Bytes 4-7: usual first IFD pointer.

Next, as an integral part of the 64bit TIFF header, I propose:

Bytes 8-15: 64bit TIFF signature value
Bytes 16-23: first 64bit IFD pointer.

This would mean that any 64bit TIFF is still a plain old 32bit TIFF, as far as any existing 32bit reader is concerned.
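As a rough sketch of the 24-byte header layout described above: the proposal does not fix an actual signature value, so `SIG64` below is a made-up placeholder, and `pack_header` is a hypothetical helper name, not anything from libtiff.

```python
import struct

# Hypothetical placeholder: the proposal does not define the signature value.
SIG64 = 0x3634544946463634

def pack_header(first_ifd32, first_ifd64, little_endian=True):
    """Build the proposed header: classic 8 bytes, then signature and 64bit IFD pointer."""
    order = b"II" if little_endian else b"MM"
    fmt = "<" if little_endian else ">"
    # Bytes 0-7: byte-order mark, 42, usual first (32bit) IFD pointer.
    classic = order + struct.pack(fmt + "HI", 42, first_ifd32)
    # Bytes 8-15: 64bit TIFF signature; bytes 16-23: first 64bit IFD pointer.
    extended = struct.pack(fmt + "QQ", SIG64, first_ifd64)
    return classic + extended
```

Passing 0 for `first_ifd32` would correspond to a file with no 32bit IFD at all; a non-zero value gives old readers something to follow.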

Applications/libraries writing TIFFs can use this header in one of three ways:

* Scheme 1: 32bit TIFF

Encode a plain old 32bit-only TIFF. The chance of bytes 8-15 matching the 64bit TIFF signature 'by accident' in a TIFF that is really 32bit only is on the order of 1/(2^64), since the signature is 8 bytes wide, and is thus astronomically low.

* Scheme 2: pure 64bit TIFF

Encode a completely new 64bit TIFF, without real regard for 32bit-TIFF-only readers. This would imply writing zero for the first 32bit IFD pointer value. I suspect a robust old pre-64bit TIFF reader would not crash on seeing that value, though it might report an error like 'no image in file' or something. This means that the standard '.tif' and '.tiff' file extensions and such can still be used, or, in other words, that 64bit TIFF is sort of backwards compatible to some degree.

* Scheme 3: 64bit TIFF with added usefulness in 32bit TIFF readers

More fancy and possibly more recommended, a new TIFF writer could encode a true 64bit TIFF with downsampled images being accessible in the old 32bit TIFF way. The downsample data could probably even be shared between the two structures. Like this:

header
   - points to first 32bit TIFF IFD
        - points to downsample image data (a)
   - points to first 64bit TIFF IFD
        - points to real large image data (A)
        - points to SubIFD that is downsample
                - points to same downsample image data (a)

The major advantage of this last use of such a both-ways header is that old readers will not only not break, but will even read meaningful, useful data. (The downsamples are bound to be useful anyway, since you are talking about data >4 gig and thus most probably about either huge images or a huge number of images.)

Old readers would just go with the old 32bit TIFF header only; that's the sort of backwards compatibility that would possibly still yield useful stuff, even from a 64bit TIFF. They will not break, whichever of the writing schemes is used. Thus, this new format can be argued to still be TIFF, for what that's worth. A new reader that supports 64bit TIFF could also check the next 8 bytes following the old-style header, and if it finds they are set to the 64bit TIFF signature value, it can simply follow only the 64bit TIFF IFD structure, ignoring the 32bit IFD structure. That would still give it access to the 32bit TIFF downsamples, if they are present, through the SubIFDs. Thus, scheme 3 may mean some trouble for writers, but readers don't need to follow and match both IFD trees; they can find and interpret all data from the true 64bit IFD tree only.
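The check a new reader would do could look roughly like the sketch below. Again, `SIG64` is a made-up placeholder since no signature value has been proposed, and `first_ifd` is just an illustrative name; it only inspects the first 24 bytes of the file.

```python
import struct

# Hypothetical placeholder: the proposal does not define the signature value.
SIG64 = 0x3634544946463634

def first_ifd(data):
    """Return ('64', offset) if the 64bit signature is present, else ('32', offset)."""
    fmt = {b"II": "<", b"MM": ">"}.get(data[:2])
    if fmt is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd32 = struct.unpack(fmt + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF: bad magic number")
    if len(data) >= 24:
        sig, ifd64 = struct.unpack(fmt + "QQ", data[8:24])
        if sig == SIG64:
            return ("64", ifd64)  # follow only the 64bit IFD tree
    return ("32", ifd32)          # plain old 32bit TIFF
```

An old reader effectively stops after the magic number check and always takes the 32bit branch, which is why it never notices the extra bytes.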

The disadvantage is of course that this scheme is not as plain and trouble-free as a single IFD tree. Care has to be taken that all downsample data and 32bit IFDs are written in the lower 4 gig. This means the writer loses the 'freedom of streaming in and writing the data in the order it gets calculated', and with such a huge amount of data, this freedom may be important. This scheme seems almost 'sinful' when viewed in the true spirit of TIFF, and I'm not even sure I myself am in favor of this idea. But then again, encoders for whom this freedom is really important can always opt out of 32bit TIFF usefulness and just write 0 for the first 32bit IFD pointer. Effectively, the 32bit structures could be viewed as an added bonus, adding 32bit-reader-usefulness to a 64bit TIFF file, and not an obligation.
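To make the 'lower 4 gig' constraint concrete, a scheme 3 writer could run a check along these lines before committing to a layout. This is only a sketch: `fits_32bit_view` is a hypothetical helper, and it takes the conservative reading that each 32bit IFD and each downsample strip/tile must lie entirely below the 4 gig boundary, not just start below it.

```python
LIMIT_32 = 2**32  # first offset that no longer fits in a 32bit pointer

def fits_32bit_view(regions):
    """regions: iterable of (offset, length) pairs, one per 32bit IFD and
    per downsample strip/tile that a 32bit reader must be able to reach."""
    return all(
        offset >= 0 and offset + length <= LIMIT_32
        for offset, length in regions
    )
```

If the check fails, the writer can either reorder its output or fall back to scheme 2 and write 0 for the first 32bit IFD pointer.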

Just my two cents, going from the assumption that a single tile/strip will never break the 4 gig size boundary all by itself and that actual low-level compression/decompression schemes can remain unchanged; otherwise it doesn't seem to make much sense at all, of course.

Joris Van Damme
info@awaresystems.be
http://www.awaresystems.be
Download your free TIFF tag viewer for windows here:
http://www.awaresystems.be/imaging/tiff/astifftagviewer.html