2004.04.21 07:06 "Re: [Tiff] Large TIFF files", by Rob van den Tillaart

There are in essence only 2 routes for 64 bit tiff

compatibility with existing
breaking with existing (but keeping as many good things as possible)

The first leads to all kinds of trouble as mentioned in earlier mails. Frank proposed a new 64 bit format BTIFF (or BIF: Big Image File or BIg File) and for now this seems to me the most promising way.

Does make sense. I did make a proposal in the other direction (sort of), but the incompatible 64bit BTIFF option that is a logical extension of 32bit TIFF is certainly cleaner and less troublesome.

I think that both directions should be worked out and discussed, influencing each other.

It seems to me that many applications will not need 64 bit.

Don't be too sure. The future has a way of catching up with all expectations very quickly.

True, I know of the famous statements like that the US would only need 4 computers in the long term and that no application will need more than 640KB. So the statement becomes: all applications will use 64 bit in the future. This is an argument to do an evolution in file format instead of a revolution, to ease the transition. (guess Joris is more an evolutionist, and I am more a revolutionist :)

PRELIM. PROPOSAL BIF

All types will be mapped upon signed 64 bit, alignment to 64 bit or 8 bytes

TYPES

1       = ASCII         string 0 terminated
2       = BLOB          bytearray
3       = DOUBLE
4       = LONG64
5       = RATIONAL64    LONG64/LONG64   (16 bytes)
6       = COMPLEX       LONG64-LONG64   (16 bytes)

All numeric types are signed, this would make files max. 2^63 bytes instead of 2^64

seems large enough for now. 2^63 = 9.223.372.036.854.775.808

All (tiff32) types are mapped upon long64 (even boolean). This seems a waste of bytes, but if we talk about files >> 4GB these few bytes won't matter too much, I assume.

As to the 'all 64bit values are signed' part, I can live with that. But I think it is preferable to build on from the existing datatypes of 32bit TIFF. This has the advantage of

being validated by age and past experience
being documented
being coded up already

All existing tiff32 datatypes map upon these 6. The documentation can be reused for a large part. By minimizing the number of datatypes the maintenance of the codebase will be easier.
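To make the mapping concrete, here is a rough sketch of how the twelve TIFF 6.0 field types might land on the six BIF types. The BIF codes come from the list above; which tiff32 type goes where is only my reading of the proposal (an assumption), and the names `TIFF32_TO_BIF` and `bif_type_for` are made up for the illustration.

```python
# BIF type codes as listed in the preliminary proposal.
BIF_TYPES = {
    1: "ASCII",
    2: "BLOB",
    3: "DOUBLE",
    4: "LONG64",
    5: "RATIONAL64",
    6: "COMPLEX",
}

# TIFF 6.0 field type code -> (TIFF 6.0 name, assumed BIF type code).
# The right-hand side is an assumption, not part of the proposal text.
TIFF32_TO_BIF = {
    1: ("BYTE", 4),
    2: ("ASCII", 1),
    3: ("SHORT", 4),
    4: ("LONG", 4),
    5: ("RATIONAL", 5),
    6: ("SBYTE", 4),
    7: ("UNDEFINED", 2),
    8: ("SSHORT", 4),
    9: ("SLONG", 4),
    10: ("SRATIONAL", 5),
    11: ("FLOAT", 3),
    12: ("DOUBLE", 3),
}


def bif_type_for(tiff32_type):
    """Return the name of the BIF type a tiff32 type code would map to."""
    _name, bif_code = TIFF32_TO_BIF[tiff32_type]
    return BIF_TYPES[bif_code]


if __name__ == "__main__":
    for code, (name, bif_code) in sorted(TIFF32_TO_BIF.items()):
        print(f"{code:2d} {name:10s} -> {BIF_TYPES[bif_code]}")
```

Note that COMPLEX has no tiff32 counterpart in this table; it would only be used by new tags.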

Question: should BTIFF have UNICODE (2 byte char) instead of ASCII (1 byte char)?

Also, I don't agree with the 'doesn't matter much, so let's go ahead and waste' part. If there's no true benefit to wasting, then there's no true benefit to wasting. Tag data that is an array of words should not take up 8 bytes per word. (If that is indeed what you intended to say here.)

Disk space and memory have become a commodity compared to the time the original tiff spec was made. I am aware that tags would take up far more space than necessary, but as a % of the total file size (we are trying to break the 4 GB barrier) I expect that it will be max 5%. OK, for multipage tiff with millions of relatively small images this percentage can be higher.

simple calculation:

A 4GB image broken up in 2MB tiles would need 2K tiles, each with an offset pointer + length = 2K * 16 bytes = 32KB. Compared to the 4GB this is less than 0.001%.
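The calculation can be checked with a few lines. This is only a sketch: it assumes one 64-bit offset plus one 64-bit length per tile (16 bytes together) and ignores every other tag; `tile_index_overhead` is a made-up helper name.

```python
GIB = 1 << 30
MIB = 1 << 20


def tile_index_overhead(image_bytes, tile_bytes, entry_bytes=16):
    """Return (tile count, index size in bytes, index size / image size).

    entry_bytes is assumed to be 8 bytes of offset plus 8 bytes of length.
    """
    tiles = -(-image_bytes // tile_bytes)  # ceiling division
    index_bytes = tiles * entry_bytes
    return tiles, index_bytes, index_bytes / image_bytes


if __name__ == "__main__":
    tiles, index_bytes, fraction = tile_index_overhead(4 * GIB, 2 * MIB)
    print(f"{tiles} tiles, {index_bytes} index bytes, {fraction:.6%} of the image")
```

For 4GB in 2MB tiles this gives 2048 tiles and 32768 bytes of offsets and lengths, a tiny fraction of the image data.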

HEADER

Bytes 0-1: endianness           // must we keep this ?
Bytes 2-7: SAM 64               // in ASCII
Bytes 8-15: 64 bit IFD offset   // 0000000 for last IFD

The number of directory entries is also a LONG64;
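As an illustration, the 16-byte header above could be written and read like this. It is a sketch under assumptions: the proposal does not say how bytes 0-1 are encoded, so I reuse the "II"/"MM" marks from TIFF 6.0, and `pack_header` / `unpack_header` are hypothetical names, not LibTiff functions.

```python
import struct

MAGIC = b"SAM 64"  # bytes 2-7, ASCII, as in the proposed header


def pack_header(first_ifd_offset, little_endian=True):
    """Build the 16-byte header: byte order, magic, signed 64-bit IFD offset."""
    order = b"II" if little_endian else b"MM"  # assumption: TIFF 6.0 marks
    prefix = "<" if little_endian else ">"
    return order + MAGIC + struct.pack(prefix + "q", first_ifd_offset)


def unpack_header(data):
    """Return (struct byte-order prefix, IFD offset) or raise ValueError."""
    if len(data) < 16:
        raise ValueError("header is shorter than 16 bytes")
    order, magic = data[0:2], data[2:8]
    if order == b"II":
        prefix = "<"
    elif order == b"MM":
        prefix = ">"
    else:
        raise ValueError("unknown byte-order mark")
    if magic != MAGIC:
        raise ValueError("magic signature mismatch")
    (offset,) = struct.unpack(prefix + "q", data[8:16])
    return prefix, offset


if __name__ == "__main__":
    header = pack_header(16)
    print(header, unpack_header(header))
```

Keeping the byte-order mark in the first two bytes means a reader can decide how to interpret everything after it before touching any 64-bit field.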

Endianness: Yes I think we must keep this, for these reasons

There are two worlds in this world.
Been validated by age and past experience
Been documented
Been coded up already

I'm repeating myself, but that's just my general opinion: I don't think we should re-invent TIFF, but build on from it instead, merely logically extending it with the new 64bit stuff. This implies not breaking the existing datatype scheme, but merely adding a few entries to it. And it implies not building a new library from scratch, but merely extending existing LibTiff, which is important too. I don't see any downside to that, we needn't compromise because of the desire to stick with what has proven to work.

Magical signature value: Hey, someone remembers Sam?! But what has he got to do with 64bit TIFF?

If Sam hadn't made tiff32 we wouldn't be having this discussion. It seems to me a small tribute to someone who had such a great impact.

TAGS

Byte 0-7:       tag
Byte 8-15:      type
Byte 16-23:     value           (type=3-6)      length          (type=1,2)
Byte 24-31:     opt. value      (type=5,6)      64bit offset    (type=1,2)
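Read literally, the layout above gives a fixed 32-byte entry. The sketch below packs and unpacks one, assuming little-endian order and signed 64-bit fields throughout (DOUBLE values are stored as an 8-byte IEEE double in bytes 16-23). The function names are invented for the example, and the layout itself is still preliminary.

```python
import struct

ASCII, BLOB, DOUBLE, LONG64, RATIONAL64, COMPLEX = 1, 2, 3, 4, 5, 6


def pack_entry(tag, type_code, first, second=0, prefix="<"):
    """Pack one 32-byte entry.

    type 1,2 : first = length in bytes, second = 64-bit offset of the data
    type 3,4 : first = the value, second unused
    type 5,6 : first and second = the two LONG64 halves of the value
    """
    if type_code == DOUBLE:
        return struct.pack(prefix + "qqdq", tag, type_code, float(first), 0)
    return struct.pack(prefix + "qqqq", tag, type_code, first, second)


def unpack_entry(data, prefix="<"):
    """Inverse of pack_entry: return (tag, type_code, first, second)."""
    tag, type_code = struct.unpack(prefix + "qq", data[:16])
    if type_code == DOUBLE:
        (value,) = struct.unpack(prefix + "d", data[16:24])
        return tag, type_code, value, 0
    first, second = struct.unpack(prefix + "qq", data[16:32])
    return tag, type_code, first, second


if __name__ == "__main__":
    entry = pack_entry(256, LONG64, 50000)  # 256 = ImageWidth in TIFF 6.0
    print(len(entry), unpack_entry(entry))
```

Small values live inside the entry itself; only ASCII and BLOB data need a second read at the stored offset.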

Tags are the same as in the TIFF 6.0 spec, preceded with 0's. The range with the first 32 bit a 0 is reserved for this 'compatibility'.

The range with the first 32 bit a 1 is reserved for 'the committee' (whoever ...)

Tag code: Again, I'd prefer sticking with the existing tag codes, which implies that this remains a word, not suddenly and pointlessly grow to a 64bit value, which would lead to duplicating history in the end.

Type code: Why 64bits? The existing scheme allows for 65536 datatypes, which is more than sufficient.

Count and offset: I'm not quite sure I understand your description of the count field, but if you mean the logical extension, having them both grow to 64bit, I do agree.

In general: the desire to extend TIFF's size limits does not mean we have to go overboard so that every single bit suddenly needs to take up 64 of them. What is logically a Word value, has always been a Word value, and hasn't needed more than a few dozen values at most so far, shouldn't suddenly change; there's no single reason for that.

I must apologize that I didn't reread this section before mailing it, there are errors in it (mea culpa). The scheme meant is logically identical to the tiff32 scheme but with all fields upgraded to 64bit. [except byte 0-7 of the header] I'll have to redo some homework :)

METATAG

There is one new TAG, the metaTAG:

        FFFFFFFF-00000001       // Metatag
        00000000-00000001       // ASCII
        some length
        offset --> data

        data = "XXXXXXXX-XXXXXXXX:description"
        X = value of a new tag used in this file

        example data: "11111111-00000001: A tag for internal use of application X only."

This way applications can extend the metadata in a documented way. Note that the tag is not registered in any way and is only valid for this file.
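A reader could pick the metatag data apart roughly as follows. This is a sketch that assumes the X's are hexadecimal digits (the proposal does not say so explicitly) and that the first ':' separates the tag from the description; `parse_metatag` is a made-up name.

```python
def parse_metatag(data):
    """Split 'XXXXXXXX-XXXXXXXX:description' into (64-bit tag, description).

    Assumes the two 8-character groups are hexadecimal, high word first.
    """
    tag_text, colon, description = data.partition(":")
    if not colon:
        raise ValueError("missing ':' between tag and description")
    high_text, dash, low_text = tag_text.partition("-")
    if not dash or len(high_text) != 8 or len(low_text) != 8:
        raise ValueError("tag must look like XXXXXXXX-XXXXXXXX")
    tag = (int(high_text, 16) << 32) | int(low_text, 16)
    return tag, description.strip()


if __name__ == "__main__":
    example = "11111111-00000001: A tag for internal use of application X only."
    tag, description = parse_metatag(example)
    print(f"{tag:016X} {description}")
```

An application that does not know the tag can at least show the description to the user instead of silently dropping the data.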

Seems like a nice idea. On the other hand, seems like duplicate trouble. I personally prefer the JPEG tradition when it comes to 'identifying proprietary data'. I'd prefer recommending starting proprietary data with a short nul-terminated string that identifies the writer. The downside of this is of course that proprietary data cannot be said to be of type 64bit. So maybe your idea is best.

Good idea to add the owner in this description. Additional info could be a URL to a detailed description of the tags etc. In good TIFF tradition the use of the Metatag should be recommended when you use proprietary tags, but not mandatory.

Example

A new TAG can be an audio file (e.g. digital cameras, voice annotations). The type will be BLOB and the data can be stored at some 64bit offset. Additional tags to describe format, compression, etc. of the audio file can be added.

The 64bit TAG space gives lots of room for registering company specific ranges

48 bit = company ID => 16 bit tag range.
48 bit => 2^48, approx. 2.8 * 10^14 companies. 16 bit private tags = 65536 tags
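The split can be sketched with plain bit operations. This assumes the company ID sits in the upper 48 bits and the private tag in the lower 16, and it ignores both the sign bit and the reserved ranges mentioned under TAGS; `make_tag` and `split_tag` are illustrative names only.

```python
TAG_BITS = 16
COMPANY_BITS = 48
TAG_MASK = (1 << TAG_BITS) - 1
COMPANY_MASK = (1 << COMPANY_BITS) - 1


def make_tag(company_id, private_tag):
    """Combine a 48-bit company ID and a 16-bit private tag into one value."""
    if not 0 <= company_id <= COMPANY_MASK:
        raise ValueError("company ID does not fit in 48 bits")
    if not 0 <= private_tag <= TAG_MASK:
        raise ValueError("private tag does not fit in 16 bits")
    return (company_id << TAG_BITS) | private_tag


def split_tag(tag):
    """Return (company ID, private tag) for a 64-bit tag value."""
    return (tag >> TAG_BITS) & COMPANY_MASK, tag & TAG_MASK


if __name__ == "__main__":
    tag = make_tag(0x123456789ABC, 0x00FF)
    print(f"{tag:016X}", split_tag(tag))
    print(1 << COMPANY_BITS, "possible company IDs")
```

With all types signed, a company ID with its top bit set yields a negative tag when read as LONG64, so a real registry would probably hand out only 47 bits.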

Problem where to register this?
IANA? (could)
reuse the SNMP enterprise ID's? (no, bad idea )

END PRELIM PROPOSAL

I didn't think about the image data yet, but for now compressions are just the same; tiles and strips are just as usual, only they have a 64 bit offset/count. I think it is good practice to keep tiles small.

Yes, and I'd be in favor of recommending a particular max size. Like the old spec recommended 8K, I'd recommend 2 meg these days. Not obligatory.

OK,

[and the discussion continues ...]

Yes it does! We may be witnessing a historical moment. ;-)

(wasn't it the Muppet lab where history was made? :)

Thanks for your good remarks, you do a great job 'defending' the evolutionary approach. For discussion's sake I will stay on the opposite side to keep us sharp.

regards
rob tillaart
" Yesterday is history, tomorrow is a mystery, today is a gift, that's
why we call it the present."
- origin unknown -