2005.09.23 21:11 "[Tiff] Additional Lossless Compression Schemes", by Frank Warmerdam

2005.09.25 01:28 "Re: [Tiff] Additional Lossless Compression Schemes", by Joris Van Damme

Frank Warmerdam wrote:

> My understanding is that Deflate is... not as widely supported.

It's hard to define 'wide support' in an objective manner. From where I'm standing, it sure does *seem* (=subjective) to be widely supported, though. I know of a few attempts at TIFF codecs that only support very limited subsets (only uncompressed, or only G3 or G4 compressed, for example); there's also Photoshop using Adobe's code base, and there's a vast amount of apps using LibTiff. The first group cannot be expected to support flate compression, but the second and third do.

> Since then I have seen "gzip" on Unix, but I am not clear if that is just deflate or not.

It is.

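For the record, the relationship is easy to demonstrate: a gzip stream is just a raw deflate bitstream wrapped in a gzip header and a CRC32/size trailer. A small Python illustration (Python and the sample data are my choices, not anything from LibTiff; `zlib` and `gzip` are standard library modules):

```python
import gzip
import zlib

data = b"libtiff " * 1000

# gzip.compress produces: gzip header + deflate bitstream + CRC32/size trailer.
gz = gzip.compress(data)

# zlib can unwrap it directly: wbits = 16 + MAX_WBITS tells the
# decompressor to expect a gzip wrapper around the same deflate bitstream.
assert zlib.decompress(gz, 16 + zlib.MAX_WBITS) == data
```

So any codec that can produce or consume deflate is one thin wrapper away from gzip, and vice versa.
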
> Are there any opinions on possibly incorporating LZMA as an additional compression type?

What is the yield? It'll take some time before the new compression mode 'penetrates' the software pool 'out there', so is it worth it? If the yield is just 5% or even up to 15% better compression, I have doubts about this.

Also, note Chris' comments on prediction. Chances are you're using flate compression without prediction, and that small adjustment alone could already yield considerable gains.

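To illustrate the point, here is a rough Python sketch of TIFF-style horizontal differencing (Predictor=2) on a single 8-bit scanline. The function names and the toy gradient data are mine; this is just the idea, not LibTiff's implementation:

```python
import zlib

def predict(row):
    # TIFF horizontal differencing (Predictor=2): replace each 8-bit
    # sample by its difference from the sample to its left, modulo 256.
    out = bytearray(row)
    for i in range(len(out) - 1, 0, -1):
        out[i] = (out[i] - out[i - 1]) & 0xFF
    return bytes(out)

def unpredict(row):
    # Inverse transform: a running sum restores the samples exactly,
    # so the predictor is lossless.
    out = bytearray(row)
    for i in range(1, len(out)):
        out[i] = (out[i] + out[i - 1]) & 0xFF
    return bytes(out)

# A smooth synthetic scanline, vaguely photographic in character.
row = bytes(min(255, x // 4) for x in range(4096))
assert unpredict(predict(row)) == row  # lossless round trip

print("flate, no predictor:  ", len(zlib.compress(row)))
print("flate, with predictor:", len(zlib.compress(predict(row))))
```

On smooth data the differenced row is mostly tiny values, which deflate's Huffman stage handles far better than the raw gradient.
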
Whilst I doubt that any new lossless compression type could be worth the effort (except maybe JPEG2000, I don't know), I think it might be worth looking into a PCD-type scheme. Support for large images is getting more important as the years go by. We've currently got a real problem when the bulk of mainstream readers try to access a large image in a TIFF. They depend on single-chunk allocations, and sometimes even on decoding LibTiff-RGBA style. In practice, this means many readers hang for half an hour of disc-swapping activity, rendering the complete system useless, before the annoyed user decides to kill the process.

The SubIFD scheme, the tiled pyramid stuff, was an attempt at curing this. It was a good attempt, and a logical one, but experience has proven there are two great disadvantages to the scheme:

  1. The 'main image', the 'real image', the biggest one, is still the one encoded in the primary 'top level' IFD. That is good, and logical, but it proves not to cure the problem with mainstream readers that don't even attempt to look beyond that top-level IFD.
  2. The main image takes up space, and so do all of its downsamples. Lots of redundancy there.

Perhaps we ought to look into a PCD-style encoding, where a 'reasonably sized' image is encoded as the primary IFD, and a new SuperIFD-like tag points to progressively larger versions. Each larger image should be encoded not as is, but as the difference between the image and the upsampled version of the image that is one level smaller. A sort of upside-down tile pyramid, except that the lower levels are deltas.

I can't be sure, but I think this is actually a good prediction scheme for large images. It could turn out, I think, that such an upside-down pyramid compresses to a smaller file than the single largest image alone, with flate compression. Plus, it cures the mainstream readers' problems with big images. That seems important, especially with BigTIFF applications in mind.

Whether or not this applies to your quest... What is your quest? How did you come to seek better lossless compression, and for what application?

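For what it's worth, the upside-down-pyramid idea sketches easily. Below is a toy one-dimensional Python version; all names are mine, and the box-filter downsample and nearest-neighbour upsample are arbitrary choices (PCD itself differs in detail). The point is that storing a small base image plus per-level residuals is exactly invertible, and on smooth data the residuals are small numbers, which is what makes them a good input for flate:

```python
def downsample(samples):
    # Halve resolution by averaging neighbouring pairs (a simple box filter).
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]

def upsample(samples):
    # Double resolution by nearest-neighbour replication.
    out = []
    for s in samples:
        out.extend((s, s))
    return out

def encode_pyramid(samples, levels):
    # Build successively smaller versions, then keep the smallest one
    # plus, per level, the delta against the upsampled smaller level.
    stack = [samples]
    for _ in range(levels):
        stack.append(downsample(stack[-1]))
    deltas = [
        [a - b for a, b in zip(stack[level], upsample(stack[level + 1]))]
        for level in range(levels - 1, -1, -1)
    ]
    return stack[-1], deltas

def decode_pyramid(base, deltas):
    # Replay each level: upsampled prediction plus stored delta.
    current = base
    for delta in deltas:
        current = [p + d for p, d in zip(upsample(current), delta)]
    return current

# Smooth toy data, length a power of two so the levels divide evenly.
samples = [min(255, x // 3) for x in range(512)]
base, deltas = encode_pyramid(samples, 3)
assert decode_pyramid(base, deltas) == samples  # exactly lossless
```

A reader that only wants a preview decodes just `base`; one that wants the full image replays the deltas level by level, never needing the whole thing as a single allocation per level beyond the one it is building.
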
Joris Van Damme
Download your free TIFF tag viewer for windows here: