2007.01.15 01:09 "[Tiff] bigtiff", by Albert Cahalan

2007.01.15 16:12 "Re: [Tiff] bigtiff", by Albert Cahalan

On 1/15/07, Joris <joris.at.lebbeke@skynet.be> wrote:

> Albert Cahalan wrote:

More importantly, it was intended that software modifications be easy. Keeping the data structures the same size (particularly the IFD entry structs and the main header) lets TIFF-handling software be upgraded with very little change. The same data structures may be used.

But the price we have to pay is considerable. If I understand you correctly, we'd have holes in the files of up to 4096 bytes. Now, remember that it is typical for an IFD to have lots of tags whose data is only a few bytes in size but still does not fit in the IFD. That means the average overhead of an IFD would be on the order of, say, five to ten times 4 kilobytes. That's just wasteful.

The waste is less than a millionth of the file size, or 0.0001 %. At "ten times", it's still only 1/100000 of the file size, or 0.001 %.
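As a rough check of those figures, here is a minimal sketch in C. It assumes a 4096-byte hole (the worst case mentioned above) in a 4 GiB file; the 4 GiB size is an assumption, chosen as roughly the smallest file that would need bigtiff. `waste_fraction` is a hypothetical helper, not part of libtiff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fraction of a file lost to alignment padding. */
static double waste_fraction(uint64_t padding_bytes, uint64_t file_bytes)
{
    assert(file_bytes > 0);
    return (double)padding_bytes / (double)file_bytes;
}
```

Under those assumptions, `waste_fraction(4096, 1ULL << 32)` comes out just under one millionth, and ten such holes stay under 1/100000. Larger files only make the fraction smaller.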

For the latest "cruft removal", you proposed relaxing the alignment requirement, or possibly enforcing it by actually shifting offsets. I would agree that the alignment requirements are currently somewhat pointless, since I don't believe they are actually adhered to (even by libtiff?).

Enforce it or kill it.

That's not how it works. "Be correct in what you write, and liberal in what you read" is our motto. "Enforce it or kill it" is not an option, as we've seen many times before. Some vendors still write OJPEG; we've been trying to kill that for two decades or so.

With bit shifting, there is no need to worry. The writer shifts values to the right. The reader shifts values to the left. There is no way to create a reader or writer that is semi-compatible. Either it works or it does not.
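A minimal sketch of that round trip in C, assuming the 4-bit shift proposed later in this message and a 32-bit stored field. The names `store_offset`, `load_offset`, and `OFFSET_SHIFT` are hypothetical and do not come from libtiff or the bigtiff proposal.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_SHIFT 4u  /* assumed shift: byte offsets are multiples of 16 */

/* Writer side: shift the byte offset right before storing it. */
static uint32_t store_offset(uint64_t byte_offset)
{
    /* The offset must be 16-byte aligned and fit in 32 bits once shifted. */
    assert((byte_offset & ((1u << OFFSET_SHIFT) - 1u)) == 0);
    assert((byte_offset >> OFFSET_SHIFT) <= UINT32_MAX);
    return (uint32_t)(byte_offset >> OFFSET_SHIFT);
}

/* Reader side: shift the stored value left to recover the byte offset. */
static uint64_t load_offset(uint32_t stored)
{
    return (uint64_t)stored << OFFSET_SHIFT;
}
```

With these assumptions a 32-bit field reaches byte offsets just under 64 GiB, and a writer cannot produce a misaligned offset because the low four bits are never stored.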

Both SSE and AltiVec have instructions that require 16-byte alignment. Handling unaligned data is slow. If the programmer doesn't handle it, SSE will fault and AltiVec will round off the addresses.
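For illustration, here is a small C sketch of what checking and obtaining 16-byte alignment looks like. It assumes a C11 library that provides `aligned_alloc`; the helper names `is_aligned` and `alloc_vector_buffer` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns nonzero if p is aligned to `alignment` (a power of two). */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Allocates a buffer suitable for 16-byte-aligned vector loads.
 * The size is rounded up to a multiple of 16, as C11 aligned_alloc expects. */
static void *alloc_vector_buffer(size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;
    return aligned_alloc(16, rounded);
}
```

Only the start of the buffer is guaranteed aligned. Data read into it at an arbitrary file offset ends up wherever the file layout puts it.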

Which is why I see little use in a memory-mapping-based codec. You need to copy anyway, and once you do, alignment is restored and this is not a problem.

This isn't about memory mapping. Your bigtiff data structures are not aligned. When you read, you can only force the alignment of one thing. Everything else may remain unaligned. Operating systems tend to read more efficiently when you keep things aligned, so you're hurting there too. Some OSes do a page-flipping trick if the read is nicely aligned. Some OSes have a memcpy that is sensitive to the alignment, being only able to operate quickly if there is a way (via a few byte-by-byte operations) to mutually align source and destination.
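The mutual-alignment condition can be sketched in C as follows. This is a simplified model of the check such a memcpy might make, not the implementation of any particular libc, and `can_mutually_align` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Source and destination can be brought to the same alignment by copying
 * a few leading bytes only if their addresses agree in the low bits,
 * i.e. they are misaligned by the same amount. */
static int can_mutually_align(const void *src, const void *dst, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (((uintptr_t)src ^ (uintptr_t)dst) & (alignment - 1)) == 0;
}
```

If file structures sit at arbitrary offsets, a buffer read from the file and the destination it is copied to will often fail this test, which forces the slow path.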

No, bigtiff already broke things severely by changing struct layouts. If you're going to break things to that extent, you may as well take the opportunity to clean up some of the ugly things.

I disagree. Breaking things severely would be, for example, getting a new tag labelling scheme in there. All we did was change the offset bit depth, in the most logical, consistent, and minimal fashion.

Fine, forget that. Just pad things.

Pad 2-byte items to 2-byte boundaries.

Pad 4-byte items to 4-byte boundaries.

Pad 8-byte items to 8-byte boundaries.

Pad bulk data to 16-byte boundaries.

For structures, you do that by adding explicit padding to the definition of the struct.

For anything described by a file offset, you do that by requiring that the offsets be stored with a 4-bit shift.
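A minimal C sketch of both ideas, under stated assumptions: the entry layout below is hypothetical, using field widths of 2, 2, 8, and 8 bytes with one explicit 4-byte pad, and `align_up` is a hypothetical helper a writer could use to place bulk data on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory entry with explicit padding in the definition. */
struct padded_entry {
    uint16_t tag;
    uint16_t type;
    uint32_t pad;    /* explicit padding so count starts at offset 8 */
    uint64_t count;
    uint64_t value;  /* inline value or (shifted) file offset */
};

/* The layout is fixed by the definition, not left to the compiler. */
_Static_assert(offsetof(struct padded_entry, count) == 8,
               "count must be 8-byte aligned");
_Static_assert(sizeof(struct padded_entry) == 24,
               "no hidden compiler padding");

/* Round a file offset up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t offset, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

A writer would call something like `align_up(pos, 16)` before emitting bulk data, so that the resulting offset has its low four bits clear and can be stored shifted.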

Like I said, there is one point where I don't agree with Frank. I think it *is* too late to change BigTIFF design. The discussions ended over two years ago, and that's how long the 'proposal' has been up there for

I think this proves my point. People wanted this over two years ago. Clearly the spec has been overly difficult to implement and/or just not all that desirable.

You could make it easy to implement, by keeping the same data structures and merely adding a bit-shift flag to the tag type codes. You could make it a desirable format by aligning things nicely. Bigtiff does neither. It leaves tiff implementers with significant work for little gain.
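To make the idea concrete, here is a sketch in C of how a reader might handle such a flag. The 12-byte entry layout and the LONG type code (4) are from classic TIFF; the flag value `TYPE_SHIFTED_FLAG`, the 4-bit shift constant, and the function names are hypothetical illustrations, not anything defined by TIFF or bigtiff.

```c
#include <assert.h>
#include <stdint.h>

#define TIFF_TYPE_LONG    4u       /* standard TIFF LONG type code */
#define TYPE_SHIFTED_FLAG 0x8000u  /* hypothetical: offset is stored >> 4 */
#define OFFSET_SHIFT      4u       /* assumed shift amount */

/* Classic TIFF directory entry; the layout is unchanged. */
struct ifd_entry {
    uint16_t tag;
    uint16_t type;
    uint32_t count;
    uint32_t value_or_offset;
};

/* Recover the byte offset, undoing the shift only when the flag is set. */
static uint64_t entry_byte_offset(const struct ifd_entry *e)
{
    if (e->type & TYPE_SHIFTED_FLAG)
        return (uint64_t)e->value_or_offset << OFFSET_SHIFT;
    return e->value_or_offset;
}

/* Strip the flag to get the ordinary TIFF type code. */
static uint16_t entry_base_type(const struct ifd_entry *e)
{
    return (uint16_t)(e->type & ~TYPE_SHIFTED_FLAG);
}
```

In this sketch existing code keeps the same struct and only the offset lookup changes; entries without the flag are read exactly as before.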