2007.08.22 22:03 "[Tiff] Re: How to handle big single-strip images?", by Leonardo Serni

I wonder what will happen if one opens a TIFF image where all the data is in a SINGLE strip (some writers offer this option), compressed with G4, and someone tries to use TIFFReadScanline on it. I would expect that libtiff needs to read the whole strip into memory, because it's compressed. Or is it possible to read compressed data in chunks?

It is possible, but it looks like libtiff doesn't do this. If I read this correctly (I am now browsing 3.8.0 code, tif_read.c):

TIFFReadScanline invokes TIFFSeek

TIFFSeek calculates the containing strip (in this case the first)

TIFFSeek sees that the strip data isn't loaded yet and calls TIFFFillStrip to get it

TIFFFillStrip has an ominous comment:

/*
 * Read the specified strip and setup for decoding.
 * The data buffer is expanded, as necessary, to
 * hold the strip's data.
 */

and, later on, in the case of memory mapped files (but the same applies for unmapped ones):

     /*
      * Expand raw data buffer, if needed, to
      * hold data strip coming from file
      * (perhaps should set upper bound on
      *  the size of a buffer we'll use?).
      */

Anyway, what happens is that the whole byte count is read, or so it seems to me:

         bytecount = td->td_stripbytecount[strip];
         ...
         if (TIFFReadRawStrip1(tif, strip, (unsigned char *)tif->tif_rawdata,
                     bytecount, module) != bytecount)
               return (0);

So, yes, if you have a 700 meg compressed file, libtiff apparently reads the whole 700 meg, then decompresses it one row at a time.

This means that a single-strip image of 700 MB compressed, 4 GB uncompressed, would not require 4 GB or 4.7 GB of memory, no... but yes, it *would* require 700 MB.
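
To put a number on it, here is a minimal sketch in plain C (not libtiff code; peak_raw_buffer_bytes is a name I made up for illustration). It assumes a reader that, like the TIFFFillStrip path quoted above, pulls one whole compressed strip into the raw buffer at a time, so the buffer ends up as large as the biggest strip byte count in the file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libtiff.
 * Assumption: the reader loads one complete compressed strip at a time,
 * so the raw buffer grows to the largest strip byte count it ever sees.
 */
uint64_t peak_raw_buffer_bytes(const uint64_t *strip_bytecounts, size_t nstrips)
{
    uint64_t peak = 0;

    assert(strip_bytecounts != NULL || nstrips == 0);
    for (size_t i = 0; i < nstrips; i++) {
        if (strip_bytecounts[i] > peak)
            peak = strip_bytecounts[i];
    }
    return peak;
}
```

Fed a single 700 MB entry it returns 700 MB; fed the same data split into strips of roughly 64K each it returns roughly 64K, which is the whole point of not writing single-strip files.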

Also, whenever you seek from row N to row M with M < N, libtiff appears to "rewind" the strip and decompress-seek on it again:

         } else if (row < tif->tif_row) {
                 /*
                  * Moving backwards within the same strip: backup
                  * to the start and then decode forward (below).
                  *
                  * NB: If you're planning on lots of random access within a
                  * strip, it's better to just read and decode the entire
                  * strip, and then access the decoded data in a random
                  * fashion.
                  */
                 if (!TIFFStartStrip(tif, strip))
                         return (0);
         }

You can get good performance (memory-wise and compression-wise) by choosing the RowsPerStrip parameter wisely. It requires application-specific tuning. For example, if you care little about squeezing out the last bytes of disk space but do care about performance, you'll choose RowsPerStrip = 1. I believe that 8 and 16 tend to give the best performance with JPEG. Otherwise the manual hints at keeping the (encoded) strip size around 64K, and I concur, but it depends on your I/O subsystem's buffers: you might find that 1.4K, 128K or 96K give better disk throughput.
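
As a starting point for that tuning, here is a rough sketch in plain C (rows_per_strip_for is a hypothetical helper, not a libtiff function). The encoded size isn't known before compressing, so it sizes the strip by uncompressed bytes per row, which is my assumption and only a stand-in for the encoded size, and clamps the result between 1 and the image height:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libtiff.
 * Picks the number of rows so that one UNCOMPRESSED strip is at most
 * about target_bytes; always at least 1 row, never more than the image.
 */
uint32_t rows_per_strip_for(uint64_t bytes_per_row, uint64_t target_bytes,
                            uint32_t image_height)
{
    assert(bytes_per_row > 0 && image_height > 0);

    uint64_t rows = target_bytes / bytes_per_row;
    if (rows < 1)
        rows = 1;               /* a single row is already over the target */
    if (rows > image_height)
        rows = image_height;    /* the whole image fits in one strip */
    return (uint32_t)rows;
}
```

The result would then go into TIFFSetField(tif, TIFFTAG_ROWSPERSTRIP, rows) when writing. Treat the target as a knob to measure against your own disks, not as a constant.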

But the best choice is by far the one suggested by jcupitt: tiled TIFFs... He who despises them will end up reinventing them (BTDTGTTS).

Best regards,

Leonardo