2004.04.20 14:20 "Re: [Tiff] Large TIFF files", by Joris Van Damme
There are in essence only 2 routes for 64 bit tiff
- compatibility with existing
- breaking with existing (but keeping as many good things as possible)
The first leads to all kinds of trouble as mentioned in earlier mails. Frank proposed a new 64 bit format BTIFF (or BIF: Big Image File or BIg File) and for now this seems to me the most promising way.
Does make sense. I did make a proposal in the other direction (sort of), but the incompatible 64bit BTIFF option that is a logical extension of 32bit TIFF is certainly cleaner and less troublesome.
It seems to me that many applications will not need 64 bit.
Don't be too sure. Future has a way of catching up with all expectations very quickly.
PRELIM. PROPOSAL BIF
All types will be mapped upon signed 64 bit, alignment to 64 bit or 8 bytes
1 = ASCII string 0 terminated
2 = BLOB bytearray
3 = DOUBLE
4 = LONG64
5 = RATIONAL64 LONG64/LONG64 (16 bytes)
6 = COMPLEX LONG64-LONG64 (16 bytes)
All numeric types are signed, this would make files max. 2^63 bytes instead of 2^64
seems large enough for now. 2^63 = 9.223.372.036.854.775.808
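To put the limit in numbers, here is a minimal sketch in Python. The names MAX_SIGNED_64, MAX_CLASSIC_TIFF and fits_signed64 are made up for illustration and are not part of the proposal.

```python
# Largest byte offset a signed 64-bit field can hold (2^63 - 1).
MAX_SIGNED_64 = 2**63 - 1
# Classic TIFF uses unsigned 32-bit offsets, hence the ~4 GB ceiling.
MAX_CLASSIC_TIFF = 2**32 - 1


def fits_signed64(offset: int) -> bool:
    """Return True if a byte offset is representable as a signed 64-bit value."""
    return 0 <= offset <= MAX_SIGNED_64
```

An offset just past the 4 GB mark fits easily, while 2^63 itself is the first value that does not.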
All (tiff32) types are mapped upon long64 (even boolean). This seems a waste of bytes but if we talk about files >> 4GB these few bytes won't matter too much I assume.
As to the 'all 64bit values are signed' part, I can live with that. But I think it is preferable to build on from the existing datatypes of 32bit TIFF. This has the advantage of
- being validated by age and past experience
- being documented
- being coded up already
Also, I don't agree with the 'doesn't matter much, so let's go ahead and waste' part. If there's no true benefit to wasting, then there's no true benefit to wasting. Tag data that is an array of words should not take up 8 bytes per word. (If that is indeed what you intended to say here.)
Bytes 0-1: endianness // must we keep this ?
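To make the cost concrete, a rough sketch of the same array of word values stored once as 16-bit words and once widened to 64 bit. The 768-entry array is only an illustrative size, and the helper names are assumptions.

```python
import struct


def pack_as_words(values):
    """Pack values as little-endian unsigned 16-bit words (2 bytes each)."""
    return struct.pack(f"<{len(values)}H", *values)


def pack_as_long64(values):
    """Pack the same values as little-endian signed 64-bit integers (8 bytes each)."""
    return struct.pack(f"<{len(values)}q", *values)


values = list(range(768))          # illustrative array of word-sized values
small = pack_as_words(values)      # 1536 bytes
large = pack_as_long64(values)     # 6144 bytes
```

The widened form is four times larger and carries no extra information.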
Bytes 2-7: SAM 64 // in ASCII
Bytes 8-15: 64 bit IFD offset // 0000000 for last IFD
The number of directory entries is also a LONG64;
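As a sketch only, the 16-byte header above could be written like this with Python's struct module. The "II"/"MM" byte-order marks are carried over from 32bit TIFF as an assumption (the proposal does not say which values to use), and pack_bif_header / unpack_bif_header are hypothetical names.

```python
import struct

MAGIC = b"SAM 64"  # bytes 2-7 of the proposed header, exactly 6 ASCII bytes


def pack_bif_header(first_ifd_offset: int, little_endian: bool = True) -> bytes:
    """Build the 16-byte header: byte order, magic, signed 64-bit IFD offset."""
    mark = b"II" if little_endian else b"MM"   # assumption: same marks as TIFF
    prefix = "<" if little_endian else ">"
    return struct.pack(prefix + "2s6sq", mark, MAGIC, first_ifd_offset)


def unpack_bif_header(header: bytes) -> int:
    """Validate the header and return the offset of the first IFD."""
    mark = header[:2]
    if mark not in (b"II", b"MM"):
        raise ValueError("unknown byte-order mark")
    prefix = "<" if mark == b"II" else ">"
    _, magic, offset = struct.unpack(prefix + "2s6sq", header[:16])
    if magic != MAGIC:
        raise ValueError("not a BIF header")
    return offset
```

Reading the byte-order mark first is what lets a reader on either kind of machine interpret the remaining fields.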
Endianness: Yes I think we must keep this, for these reasons
- There are two worlds in this world.
- Been validated by age and past experience
- Been documented
- Been coded up already
I'm repeating myself, but that's just my general opinion: I don't think we should re-invent TIFF, but build on from it instead, merely logically extending it with the new 64bit stuff. This implies not breaking the existing datatype scheme, but merely adding a few entries to it. And it implies not building a new library from scratch, but merely extending existing LibTiff, which is important too. I don't see any downside to that, we needn't compromise because of the desire to stick with what has proven to work.
Magical signature value: Hey, someone remembers Sam?! But what has he got to do with 64bit TIFF?
Byte 0-7: tag
Byte 8-15: type
Byte 16-23: value (type=3-6) length (type=1,2)
Byte 24-31: opt. value (type=5,6) 64bit offset (type=1,2)
Tags are the same as in the TIFF 6.0 spec, preceded with 0's. The range with the first 32 bit a 0 is reserved for this 'compatibility'
The range with the first 32 bit a 1 is reserved for 'the committee' (whoever ...)
Tag code: Again, I'd prefer sticking with the existing tag codes, which implies that this remains a word and does not suddenly and pointlessly grow to a 64bit value, which would lead to duplicating history in the end.
Type code: Why 64bits? The existing scheme allows for 65536 datatypes, which is more than sufficient.
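A minimal sketch of the 32-byte directory entry as described above, using the type codes from the preliminary proposal. The function name pack_entry and the little-endian default are assumptions. Tag 256 is ImageWidth in TIFF 6.0, here simply widened with leading zeros.

```python
import struct

# Type codes from the preliminary proposal.
ASCII, BLOB, DOUBLE, LONG64, RATIONAL64, COMPLEX = range(1, 7)


def pack_entry(tag, type_code, first, second=0, byte_order="<"):
    """Pack one 32-byte entry: tag, type, value-or-length, second value or offset."""
    if type_code == DOUBLE:
        # Bytes 16-23 hold an IEEE double instead of an integer.
        return struct.pack(byte_order + "qqdq", tag, type_code, float(first), second)
    return struct.pack(byte_order + "qqqq", tag, type_code, first, second)


# ImageWidth (TIFF 6.0 tag 256) stored inline as a LONG64 value.
width_entry = pack_entry(256, LONG64, 100000)
# An ASCII tag: a length of 12 bytes and a 64-bit offset to the string data.
ascii_entry = pack_entry(270, ASCII, 12, 4096)
```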
Count and offset: I'm not quite sure I understand your description of the count field, but if you mean the logical extension, having them both grow to 64bit, I do agree.
In general: the desire to extend TIFF's size limits does not mean we have to go overboard so that every single bit suddenly needs to take up 64 of them. What is logically a Word value, has always been a Word value, and hasn't needed more than a few dozen values at most so far, shouldn't suddenly change; there's no single reason for that.
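For comparison, a sketch of what the 'logical extension' might look like next to the all-64bit entry. The exact layout is my reading of the argument, not an agreed format: tag and type stay words, count and offset grow to 64 bit.

```python
import struct

# All-64bit entry from the preliminary proposal: four signed 64-bit fields.
WIDE_ENTRY = struct.Struct("<qqqq")
# Assumed logical extension: word tag, word type, 64-bit count, 64-bit offset.
COMPACT_ENTRY = struct.Struct("<HHqq")


def directory_bytes(entry: struct.Struct, n_entries: int) -> int:
    """Bytes taken by the entries of one directory (entry count field excluded)."""
    return entry.size * n_entries
```

With these layouts an entry shrinks from 32 to 20 bytes. Note that a 20-byte entry no longer keeps the count and offset fields 8-byte aligned, so this trades alignment for size.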
There is one new TAG the metaTAG
FFFFFFFF-00000001 // Metatag
00000000-00000001 // ASCII
offset --> data
data = "XXXXXXXX-XXXXXXXX:description"
X = value of a new tag used in this file
example data: "11111111-00000001: A tag for internal use of application X only."
this way applications can extend the metadata in a documented way. Note that the tag is not registered in any way and only valid for this file.
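A reader could split such a metatag string along these lines. This is a sketch assuming the X's are hexadecimal digits (as the FFFFFFFF-00000001 notation suggests); parse_metatag is a hypothetical helper.

```python
def parse_metatag(data: str):
    """Split "XXXXXXXX-XXXXXXXX:description" into (64-bit tag value, description)."""
    tag_text, colon, description = data.partition(":")
    if not colon:
        raise ValueError("missing ':' between tag and description")
    high, dash, low = tag_text.strip().partition("-")
    if not dash or len(high) != 8 or len(low) != 8:
        raise ValueError("tag must look like XXXXXXXX-XXXXXXXX")
    # Assumption: both halves are hexadecimal, high 32 bits first.
    tag = (int(high, 16) << 32) | int(low, 16)
    return tag, description.strip()


tag, text = parse_metatag(
    "11111111-00000001: A tag for internal use of application X only."
)
```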
Seems like a nice idea. On the other hand, seems like duplicate trouble. I personally prefer the JPEG tradition when it comes to 'identifying proprietary data'. I'd prefer recommending starting proprietary data with a short nul-terminated string that identifies the writer. The downside of this is of course that proprietary data cannot be said to be of type 64bit. So maybe your idea is best.
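The JPEG-style alternative can be sketched as follows: the proprietary data starts with a short nul-terminated identifier, much like the "JFIF\0" prefix in a JPEG APP0 segment. The helper names and the "AcmeViewer" identifier are invented for illustration.

```python
def wrap_proprietary(writer_id: str, payload: bytes) -> bytes:
    """Prefix proprietary data with a nul-terminated ASCII writer identifier."""
    ident = writer_id.encode("ascii")
    if b"\x00" in ident:
        raise ValueError("identifier must not contain a nul byte")
    return ident + b"\x00" + payload


def unwrap_proprietary(blob: bytes):
    """Split a blob into (writer identifier, payload) at the first nul byte."""
    ident, nul, payload = blob.partition(b"\x00")
    if not nul:
        raise ValueError("no nul-terminated identifier found")
    return ident.decode("ascii"), payload


blob = wrap_proprietary("AcmeViewer", b"\x01\x00\x02")   # hypothetical writer
```

A reader that does not recognise the identifier can simply skip the data, which is the point of the convention.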
END PRELIM PROPOSAL
I didn't think about the image data yet but for now compressions are just the same, tiles and strips are just as usual only they have a 64 bit offset/count. I think it is good practice to keep tiles small.
Yes, and I'd be in favor of recommending a particular max size. Like the old spec recommended 8K, I'd recommend 2 meg these days. Not obligatory.
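A writer could check a tile layout against such a recommendation roughly like this. The sketch reads '2 meg' as 2 MiB of uncompressed data, which is an assumption, and the function names are hypothetical.

```python
RECOMMENDED_MAX_TILE_BYTES = 2 * 1024 * 1024  # assumption: "2 meg" read as 2 MiB


def tile_bytes(width, length, samples_per_pixel, bits_per_sample):
    """Uncompressed size of one tile, with each row padded to a whole byte."""
    row_bytes = (width * samples_per_pixel * bits_per_sample + 7) // 8
    return row_bytes * length


def within_recommendation(width, length, samples_per_pixel, bits_per_sample):
    """True if the uncompressed tile stays within the recommended maximum."""
    size = tile_bytes(width, length, samples_per_pixel, bits_per_sample)
    return size <= RECOMMENDED_MAX_TILE_BYTES
```

For example, a 512x512 RGB tile at 8 bits per sample is 786432 bytes and passes, while a 1024x1024 tile of the same kind is 3 MiB and does not. Since the recommendation is not obligatory, a check like this would only warn.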
[and the discussion continues ...]
Yes it does! We may be witnessing a historical moment. ;-)
Joris Van Damme