2004.05.27 01:15 "[Tiff] large TIFF - two alternatives", by Steve Carlsen

2004.06.04 13:31 "RE: [Tiff] large TIFF - two alternatives", by Ed Grissom

I seem to have gotten distracted down a side path of this discussion. Going back to the original proposal, I have a few comments.

Alternative 1: Minimal changes for TIFF to support 8-byte addresses

  1. ID = 43 (or maybe 0x4242?)
  2. 8-byte offset to 0th IFD
  3. Value/Offset fields are 8 bytes

Note: some tags may now fit into a "value" that formerly had to use offsets. For instance, any RATIONAL will now be in the directory entry, and will not be at some offset. This may lead to more code changes than were anticipated.
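The RATIONAL case can be made concrete with a small sketch. It assumes the classic TIFF rule that a value is stored inside the directory entry whenever its total size (type size times count) fits in the Value/Offset field. The function names and the field_width parameter are hypothetical and are not part of the proposal.

```c
#include <stddef.h>

/* Byte sizes of a few classic TIFF field types (TIFF 6.0 type codes). */
static size_t tiff_type_size(int type)
{
    switch (type) {
    case 1: return 1;   /* BYTE */
    case 2: return 1;   /* ASCII */
    case 3: return 2;   /* SHORT */
    case 4: return 4;   /* LONG */
    case 5: return 8;   /* RATIONAL: two LONGs */
    default: return 0;  /* not covered by this sketch */
    }
}

/* Returns 1 if the value lives inside the directory entry itself,
   0 if the entry holds an offset to it (or the type is unknown).
   field_width is 4 for classic TIFF and would be 8 under Alternative 1. */
static int value_is_inline(int type, size_t count, size_t field_width)
{
    size_t size = tiff_type_size(type);
    if (size == 0 || count == 0)
        return 0;
    /* Compare by division so that count * size cannot overflow. */
    return count <= field_width / size;
}
```

With a 4-byte field a single RATIONAL is stored at an offset; with an 8-byte field it moves into the entry, and so does a pair of LONGs. Any reader code that assumes "RATIONAL means follow the offset" would need to change.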

  4. 8-byte offset to the next IFD (does anyone use this?)

Yes. We do not yet use the "TIFF Trees" approach to overviews, so ours are stored as separate IFDs. So, yes, we need an 8-byte offset to the next IFD.

  5. add TIFFType of LONG8, an 8 byte (unsigned) int

Although I am not sure that any use would be made of it, why not have a signed version as well? Is the omission just complexity reduction?

  6. StripOffsets and TileOffsets and ByteCounts can be LONG8

Can be? How do you know which they are? I think I am misunderstanding this. Either say that they _must_be_ LONG8, or define new tags StripOffsets8, TileOffsets8, and ByteCounts8 that are always LONG8.
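One possible reading of "can be" is that the reader dispatches on the type field of the directory entry, the way classic TIFF readers already accept StripOffsets as either SHORT or LONG. A minimal sketch of that reading follows. The codes 3 and 4 are the classic TIFF values; the code used for LONG8 and the function name read_offset are assumptions made up for illustration.

```c
#include <stdint.h>
#include <string.h>

enum {
    TYPE_SHORT = 3,   /* classic TIFF */
    TYPE_LONG  = 4,   /* classic TIFF */
    TYPE_LONG8 = 16   /* hypothetical code for the proposed 8-byte type */
};

/* Reads element i of an offset array whose bytes are already in host
   byte order, widening it to 64 bits. Returns 0 on success and -1 if
   the entry's type is not one this sketch understands. */
static int read_offset(const unsigned char *raw, int type, size_t i,
                       uint64_t *out)
{
    uint16_t v16;
    uint32_t v32;
    uint64_t v64;

    switch (type) {
    case TYPE_SHORT:
        memcpy(&v16, raw + i * sizeof v16, sizeof v16);
        *out = v16;
        return 0;
    case TYPE_LONG:
        memcpy(&v32, raw + i * sizeof v32, sizeof v32);
        *out = v32;
        return 0;
    case TYPE_LONG8:
        memcpy(&v64, raw + i * sizeof v64, sizeof v64);
        *out = v64;
        return 0;
    default:
        return -1;
    }
}
```

The cost of this reading is one more case in every place that touches the offset arrays, which is the kind of code change that a "must be LONG8" rule or separate tags would avoid.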

Alternative 2: A more modern and general approach

While I would prefer an approach like the one above that requires only minimal changes to the code to support a new format definition, I am not averse to an entirely new format as long as it remains as flexible as the original TIFF is. The only issue is implementation time, of which we all have less than we need.

(in the following, any power of two can be used instead of 8)

  1. ID = the string "TIFF2"
  2. 8-byte pointer to 0th IFD/Dictionary
  3. Value/pointer fields are 8 (2**x) bytes
  4. 8-byte child pointers
  5. add an 8-byte TIFF integer type, but it's rarely used explicitly
  6. StripOffsets and TileOffsets and ByteCounts are 8 bytes
  7. support for 4 or 8 or 16 or ... 2**X byte addresses (header tells which one you're using)

This (and item 3) means that the size of each directory entry changes based on the header as well, so there may be some complications there.
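To illustrate the complication, here is a rough sketch of how the entry size would follow from the address width in the header. It assumes the entry keeps the classic layout of a 2-byte tag and a 2-byte type, and that both the count and the value/pointer field widen to the address size. Neither assumption is stated in the proposal, and the function name is hypothetical.

```c
#include <stddef.h>

/* Size in bytes of one directory entry for a given address width.
   Assumed layout: 2-byte tag, 2-byte type, then a count and a
   value/pointer field that are each addr_bytes wide.
   Returns 0 if addr_bytes is zero or not a power of two. */
static size_t dir_entry_size(size_t addr_bytes)
{
    if (addr_bytes == 0 || (addr_bytes & (addr_bytes - 1)) != 0)
        return 0;
    return 2 + 2 + addr_bytes + addr_bytes;
}
```

Under these assumptions a 4-byte address gives the familiar 12-byte entry, 8 bytes gives 20, and 16 bytes gives 36, so a reader can no longer step through a directory with a compile-time constant.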

  8. use of ASCII tags/keys instead of binary ones

I can see good and bad here. The PNG approach is reasonable, and it does allow you to get in with a hex editor and see more of what is going on. However, it is my understanding that when using the Microsoft compilers, if you are building a "UNICODE" app, then it is a little tricky to deal with straight ASCII. It is not hard, but you can't just fscanf into a variable.

  9. most values are ASCII strings, too: integers, floating point numbers, enums, yielding fewer data types

As someone else brought up, there are localization issues with this. Even if the user never sees these values, developers need to watch out. For instance,

   sprintf(buff,"%f", dMyDouble);

produces different text depending on the localization setting. In particular, in some (most?) European settings, commas are used where Americans use a period (decimal point). If the file is created in America and transferred to Europe, it is possible that

   sscanf(buff, "%lf", &dMyNEWDouble);

will either fail, give 0, or give only the integer portion. Likewise when the transfer goes the other way. Also, are "thousands separators" banned, allowed, encouraged, or required? These differ by locale as well. This can be alleviated by judicious use of setlocale(LC_ALL, "C"), but why bother? The IEEE floating-point standard is a worldwide standard.
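For anyone who does go the ASCII route, a minimal sketch of the locale workaround might look like the following. The helper names format_double_c and parse_double_c are made up for illustration. The sketch pins only LC_NUMERIC to "C" around the conversion and then restores whatever the application had set.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Switches LC_NUMERIC to "C", remembering the previous setting in saved.
   Returns 0 on success, -1 on failure. */
static int enter_c_numeric(char *saved, size_t saved_size)
{
    const char *cur = setlocale(LC_NUMERIC, NULL);
    if (cur == NULL || strlen(cur) >= saved_size)
        return -1;
    strcpy(saved, cur);  /* copy: a later setlocale call may overwrite it */
    return setlocale(LC_NUMERIC, "C") == NULL ? -1 : 0;
}

/* Writes value with a '.' decimal point whatever the caller's locale is.
   Returns 0 on success, -1 on failure or truncation. */
static int format_double_c(char *buf, size_t size, double value)
{
    char saved[128];
    int n;
    if (enter_c_numeric(saved, sizeof saved) != 0)
        return -1;
    n = snprintf(buf, size, "%f", value);
    setlocale(LC_NUMERIC, saved);  /* restore the caller's setting */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Parses text that uses a '.' decimal point whatever the caller's locale is.
   Returns 0 on success, -1 if no number could be read. */
static int parse_double_c(const char *text, double *out)
{
    char saved[128];
    char *end;
    if (enter_c_numeric(saved, sizeof saved) != 0)
        return -1;
    *out = strtod(text, &end);
    setlocale(LC_NUMERIC, saved);
    return end == text ? -1 : 0;
}
```

Note that setlocale changes process-wide state, so this is not safe if other threads are formatting numbers at the same time. Every reader and writer of the format would have to remember to do this, which is part of the "why bother" argument.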

For integers, the issues become

The NITF format uses ASCII for the entire header, and has run into several issues with variables needing more space than what was planned for. So much so that there is now an optional binary section where apps can write their own info to give more precise metadata.

For enums, I don't see much of a problem, but I am not sure how to handle new enums and how to tell the difference between them and misspelled ones.

  10. Dictionaries (IFDs) can be really large.
  11. Cleaner hierarchical structure, with explicit inheritance from parents. top-level dictionary contains metadata only; it does not point to image data.
  12. Explicit support for layers

Please excuse my ignorance, but what are layers? (*) If these are essentially additional bands of data that need to travel along with the image, I'd be enthusiastic if this could be general enough to support additional data bands as well. There are several satellites and new mapping cameras that produce Near Infrared bands in addition to the familiar R, G, and B bands. Also there are satellites that produce data that is not visible-light R, G, or B, and they produce up to 7 bands of this stuff. It would be nice to be able to name the bands with some human-readable text that the user can select to form an RGB trio for display.

(*) OK, I am an occasional Photoshop user, so I have heard of the concept. I would like it spelled out for me in case I have some misunderstandings, though.

  13. All data is planar (though it can be interleaved by row or tile)

YES!!! (well, my API is band separate, so this makes it easier for me)

  14. So there are 4 levels to the hierarchy: Root, Image, Layer, and Channel. Only Channel dictionaries point to image data.

Where do overviews and transparency masks fit in this hierarchy?

  15. Multi-byte numeric values are always big-endian, not that it matters very often in this more string-oriented approach.

I think this is where I got distracted. As discussed in another branch, with this choice multi-byte raster data values present a performance problem on little-endian platforms, especially when the entire workflow from capture to rectify to mosaic to cut-out to print is all on the same LE platform. The alternative of having a BE header and LE data could work, but as another poster said, this will be confusing at best. I would prefer the current TIFF approach of reading either but writing native.
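The "read either, write native" approach can be sketched as follows for 16-bit samples. This is only an illustration with hypothetical function names. The point is that the swap loop is skipped entirely when the file's byte order matches the host's, which is the common case in a single-platform workflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* look at the lowest-addressed byte */
    return first == 1;
}

/* Swaps the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Converts n 16-bit samples in place from the file's byte order to the
   host's. Does nothing when the two orders already match. */
static void samples16_to_host(uint16_t *samples, size_t n,
                              int file_is_little_endian)
{
    size_t i;
    if (!!file_is_little_endian == host_is_little_endian())
        return;  /* native data: no per-sample work */
    for (i = 0; i < n; i++)
        samples[i] = swap16(samples[i]);
}
```

With a mandatory big-endian rule, an LE platform would take the swap path on every read and every write of multi-byte raster data.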

  16. All other rules are normal TIFF rules.

OK. Although, while we are on the subject, there are several other things I'd like to see in a new format... (I can see Steve throwing up his hands and wishing he had not opened this can of worms, so I will leave those to another day.)

ed grissom
egrissom@ziimaging.com