2000.03.28 19:54 "Extremely large images in TIFF", by Chris Hanson

2000.03.28 20:29 "RE: Extremely large images in TIFF", by Ed Grissom

Here's an idea that I had for supporting large raster images in a TIFF-like file format.

Chris -

This came up about a year ago on this same list. Here is the summary of what we came up with at that time. Your suggestion would change every "offset" in the file to be of a type specified at the beginning of the file. The change we eventually arrived at below simply makes all offsets 64 bits wide and does not bother trying to save a few bytes when ALL of the offsets are less than 64K or less than 4GB.

First, I suggested....

###########################################################################

- Large images (>2gb & >4gb)

...It is apparent that larger files are in the works: the recently released NITF format allows images up to 17 Gigabytes in size.

Niles Ritter suggested that one solution to this dilemma could be to have the first IFD in the file be a reduced resolution version of the image that adheres to the current TIFF spec, and always contains offsets less than 2Gig (to avoid either problem). Another poster suggested that perhaps the first image would have text in it that explained that the file was too large for normal TIFF viewers and pointed to a document for the new spec.

A "private" tag in the first IFD could point to other IFDs much the same way the TIFF TREES proposal in TTN1 does. These IFDs would contain advanced features that a naive TIFF reader would not understand, but only apps that understand the private tag would be accessing these IFDs.

To implement this, we need some new tags, and some new tag types.

Obviously a tag type of "LONG64 - A 64 bit unsigned integer" is needed, and while we are at it we might as well include the signed version: "SLONG64".

New definitions of the offset tags for both strips and tiles are needed. I would assume that for single strip and single tile compatibility, new definitions of the bytecounts tags will also be needed, as well as a new RowsPerStrip definition.

The question now is: What numbers do we assign these new tags and types? I realize that with a private tag and essentially private IFDs, we are free to use any numbers we see fit. However, if any of these changes are planned for TIFF 7.0, I would like to be compatible with them.

Conversely, what should I do to get this into TIFF 7.0?

Write up TTN3?

Here is a stab at the basics....

========================================================================

TAG TYPES:

YY = "LONG64 - A 64 bit unsigned integer"
YY+1 = "SLONG64 - A 64 bit signed integer"
!!! DONT USE THESE TYPES - THIS IS NOT YET APPROVED!!!

TAGS:

IFDOffset64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX
  Type = LONG
  Count = 1

Points to an IFD that contains LONG64 data types in the tag list.

RowsPerStrip64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX+1
  Type = LONG64

The number of rows in each strip (except possibly the last strip.) For example, if ImageLength is 24, and RowsPerStrip is 10, then there are 3 strips, with 10 rows in the first strip, 10 rows in the second strip, and 4 rows in the third strip. (The data in the last strip is not padded with 6 extra rows of dummy data.)
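
The strip arithmetic in that example can be checked with a few lines of Python. This is only a sketch; the function name `strip_row_counts` is made up for illustration and does not come from any TIFF library.

```python
def strip_row_counts(image_length, rows_per_strip):
    """Return the number of rows stored in each strip, top to bottom."""
    if image_length <= 0 or rows_per_strip <= 0:
        raise ValueError("image_length and rows_per_strip must be positive")
    # Ceiling division: a partial final strip still counts as a strip.
    strips_per_image = (image_length + rows_per_strip - 1) // rows_per_strip
    counts = [rows_per_strip] * (strips_per_image - 1)
    # The last strip holds whatever rows remain; it is not padded.
    counts.append(image_length - rows_per_strip * (strips_per_image - 1))
    return counts


print(strip_row_counts(24, 10))  # [10, 10, 4]
```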

StripOffsets64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX+2
  Type = LONG64

For each strip, the byte offset of that strip.

StripByteCounts64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX+3
  Type = LONG64

For each strip, the number of bytes in that strip after any compression.

TileOffsets64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX+4
  Type = LONG64
  N = TilesPerImage for PlanarConfiguration = 1
    = SamplesPerPixel * TilesPerImage for PlanarConfiguration = 2

For each tile, the byte offset of that tile, as compressed and stored on disk. The offset is specified with respect to the beginning of the TIFF file. Note that this implies that each tile has a location independent of the locations of other tiles. Offsets are ordered left-to-right and top-to-bottom. For PlanarConfiguration = 2, the offsets for the first component plane are stored first, followed by all the offsets for the second component plane, and so on.

No default. See also TileWidth, TileLength, TileByteCounts64.

TileByteCounts64 !!! DONT USE THIS TAG - THIS IS NOT YET APPROVED!!!
  Tag = XXX+5
  Type = LONG64
  N = TilesPerImage for PlanarConfiguration = 1
    = SamplesPerPixel * TilesPerImage for PlanarConfiguration = 2

For each tile, the number of (compressed) bytes in that tile. See TileOffsets64 for a description of how the byte counts are ordered.

No default. See also TileWidth, TileLength, TileOffsets64.
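
As a rough illustration of the count N used by the two tile tags, the sketch below computes how many entries TileOffsets64 and TileByteCounts64 would hold. It assumes TilesPerImage is TilesAcross * TilesDown with both rounded up, as in the TIFF 6.0 tile definitions; the function name `tile_entry_count` is hypothetical.

```python
def tile_entry_count(image_width, image_length, tile_width, tile_length,
                     samples_per_pixel, planar_configuration):
    """Number of values N in a tile offsets or tile byte counts array."""
    # Partial tiles at the right and bottom edges still count as tiles.
    tiles_across = (image_width + tile_width - 1) // tile_width
    tiles_down = (image_length + tile_length - 1) // tile_length
    tiles_per_image = tiles_across * tiles_down
    if planar_configuration == 1:
        return tiles_per_image
    if planar_configuration == 2:
        # One full set of tiles per component plane.
        return samples_per_pixel * tiles_per_image
    raise ValueError("PlanarConfiguration must be 1 or 2")


# A 1000 x 700 RGB image cut into 256 x 256 tiles: 4 across, 3 down.
print(tile_entry_count(1000, 700, 256, 256, 3, 1))  # 12
print(tile_entry_count(1000, 700, 256, 256, 3, 2))  # 36
```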

========================================================================

I can see at least one problem with this approach. Programs that add to or modify header values "in-place" (i.e. without re-writing the entire file) may try to re-write the IFD or some data pointed to by the TAGs in the IFD at the end of the image. Such programs will either fail or write a corrupted image if they try to modify one of these files.

Even programs that understand the LONG64 construct will be hard pressed to do this correctly since there is no free space in the <2gig area to write additional data for the first image.

Perhaps an additional tag that points to some free space that was purposefully left in the <2gig area could help here. (Bring back the deprecated "FreeOffsets" and "FreeByteCounts" tags?)

The only other valid solution I can see is so drastic that I hate to even bring it up. We would need to modify the TIFF Version number to be something other than "42" and use an 8-byte initial IFD offset along with the definitions for the new tags above (or re-cast the current implementation of the necessary TAGs to accept LONG64 values).
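
To make the header change concrete, here is a rough Python sketch of a 12-byte header with an 8-byte initial IFD offset. It assumes the byte-order mark and the 2-byte version field keep their classic positions, which the text above does not spell out. The names `pack_header64` and `unpack_header64` and the version value used in the example are placeholders, not approved numbers.

```python
import struct


def pack_header64(version, first_ifd_offset, little_endian=True):
    """Build a 12-byte header: byte order, version, 8-byte IFD offset."""
    if version == 42:
        raise ValueError("42 identifies a classic TIFF file")
    order = b"II" if little_endian else b"MM"
    prefix = "<" if little_endian else ">"
    # 2s = byte-order mark, H = 16-bit version, Q = 64-bit unsigned offset.
    return struct.pack(prefix + "2sHQ", order, version, first_ifd_offset)


def unpack_header64(data):
    """Return (version, first_ifd_offset) from the first 12 bytes."""
    order = data[:2]
    if order == b"II":
        prefix = "<"
    elif order == b"MM":
        prefix = ">"
    else:
        raise ValueError("unknown byte-order mark")
    _, version, offset = struct.unpack(prefix + "2sHQ", data[:12])
    return version, offset


# 9999 is an arbitrary placeholder version; the offset lies beyond 4GB.
header = pack_header64(9999, 5 * 2**32)
print(len(header))              # 12
print(unpack_header64(header))  # (9999, 21474836480)
```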

############################################################################

Then there was a lot of discussion which eventually decided that the right way to do this is to change the TIFF magic number and go whole hog with a new definition, summarized as:

############################################################################

Initial file bytes changes:

New Data Type definitions:

LONG64 - A 64 bit unsigned integer
SLONG64 - A 64 bit signed integer

Tag Definition:

A tag is now made up of 16 bytes. The last field is 8 bytes long and contains either the data itself (if it fits in 8 bytes) or an 8-byte pointer to the location of the data.
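
A minimal sketch of such a 16-byte entry in Python follows. It assumes the tag, type, and count fields keep their classic widths of 2, 2, and 4 bytes (which is what makes the total 16 with an 8-byte last field) and a little-endian file. No type number has been assigned to LONG64, so `HYPOTHETICAL_LONG64` is a placeholder, and the helper names are invented for illustration.

```python
import struct

# tag (2 bytes), type (2), count (4), value-or-offset (8); little-endian assumed.
ENTRY_FORMAT = "<HHIQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16

# Byte sizes of some TIFF 6.0 types: BYTE, ASCII, SHORT, LONG, RATIONAL.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}
HYPOTHETICAL_LONG64 = 1000  # placeholder type code, not an approved number
TYPE_SIZES[HYPOTHETICAL_LONG64] = 8


def fits_inline(type_code, count):
    """True if the data fits in the 8-byte last field instead of a pointer."""
    return TYPE_SIZES[type_code] * count <= 8


def pack_entry(tag, type_code, count, value_or_offset):
    """Pack one 16-byte entry; the last field is raw data or a file offset."""
    return struct.pack(ENTRY_FORMAT, tag, type_code, count, value_or_offset)


# A single-strip StripOffsets (tag 273) pointing past the 4GB mark.
entry = pack_entry(273, HYPOTHETICAL_LONG64, 1, 6 * 2**32)
print(len(entry))                            # 16
print(fits_inline(HYPOTHETICAL_LONG64, 1))   # True
print(fits_inline(HYPOTHETICAL_LONG64, 2))   # False
```

One side effect of the wider field under these assumptions: a single RATIONAL (8 bytes) would fit inline, where classic TIFF needs a pointer for it.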

Redefinition of Existing Tags:

All tags that currently require a 4 byte pointer to data will use an 8 byte pointer in the new format.

The data offset tags (stripoffsets, tileoffsets) will _have_ to contain 8-byte pointers. Other TAGs that use offsets (resolution, bits/sample/band) will almost certainly have to use 8-byte pointers also, since we cannot guarantee the location that this data will be stored at.

I don't see how we can put an 8-byte pointer in _some_ TAGs without changing the size of all TAGs. Since we are redefining the size of a TAG structure, I would not say that the changes are modest.

############################################################################

To which Rainer Wiesenfarth added:

############################################################################

I'd prefer 66 (= 0x42) or 258 ("42" read as a base-64 number) to keep the meaning behind it.
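
Both candidate values come from reading the digits 4 and 2 in a different base, which a short Python check confirms. The helper `from_digits` is an illustrative name only.

```python
def from_digits(digits, base):
    """Interpret a list of digit values, most significant first, in a base."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


print(from_digits([4, 2], 16))  # 66
print(from_digits([4, 2], 64))  # 258
```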

> [...]
> A tag is now made up of 16 bytes, the last field is 8 bytes long, and
> either contains the data (if it will fit in 8 bytes), or an 8 byte pointer
> to the location of the data.

If we make changes, our goal should be not to have to change the format again two years later. So maybe the following changes should also be made:

############################################################################

ed grissom
egrissom@ziimaging.com