2010.04.02 07:29 "Re: [Tiff] TIFFReadScanline and large compressed one-strip files", by
I have an LZW compressed TIFF file of 280MB or so with the whole image in one strip. Reading the file with the TIFFReadScanline() function causes the entire 280MB of compressed data to be loaded into memory at the point the first scanline is read.
Such lack of scalability is the reason that my proprietary TIFF codec streams the data, rather than working with complete buffers. That is, on the right hand side the scanline consumer pulls scanlines from the decoding step, as needed. As a consequence, on the left hand side, the decoder pulls raw data from the raw data input step, as needed. The scheme works perfectly fine with multiple steps (like a de-predicting step, and/or a byteswapping step, color conversion step, etc), and works perfectly fine with very small buffers (not too small, as that starts to cause some overhead and bad caching, but still no bigger than a raw data buffer of a few dozen kilobytes, or one single scanline buffer, per step).
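A minimal sketch of such a pull-driven chain, in C. It is an illustration only, not code from any actual codec or from LibTiff. All names here (`Step`, `source_pull`, `swap16_pull`, `read_scanline`) are hypothetical, the raw input step reads from memory where a real one would read from the file, and the byteswapping step assumes 16-bit samples and an even number of bytes per scanline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One pipeline step. pull() writes up to `want` bytes into `dst`, pulling
   from its upstream step as needed, and returns the count (0 at end). */
typedef struct Step Step;
struct Step {
    size_t (*pull)(Step *self, uint8_t *dst, size_t want);
    Step *upstream;       /* NULL for the raw data input step */
    const uint8_t *data;  /* input step only: memory standing in for a file */
    size_t size, pos;     /* input step only: total size and read position */
};

/* Raw data input step (left hand side): hands out only what is asked for. */
static size_t source_pull(Step *s, uint8_t *dst, size_t want)
{
    size_t left = s->size - s->pos;
    size_t n = want < left ? want : left;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Byteswapping step: pulls whole 16-bit samples from upstream and swaps
   the two bytes of each sample in place. */
static size_t swap16_pull(Step *s, uint8_t *dst, size_t want)
{
    assert(s->upstream != NULL);
    size_t n = s->upstream->pull(s->upstream, dst, want & ~(size_t)1);
    n &= ~(size_t)1;
    for (size_t i = 0; i < n; i += 2) {
        uint8_t t = dst[i];
        dst[i] = dst[i + 1];
        dst[i + 1] = t;
    }
    return n;
}

/* Scanline consumer (right hand side): pulls exactly one row from the last
   step. Returns 1 on a full row, 0 if the data ran out first. */
static int read_scanline(Step *last, uint8_t *row, size_t rowbytes)
{
    size_t got = 0;
    while (got < rowbytes) {
        size_t n = last->pull(last, row + got, rowbytes - got);
        if (n == 0)
            return 0;
        got += n;
    }
    return 1;
}
```

Chaining is a matter of pointing each step's `upstream` at the previous one and calling `read_scanline()` on the last step. Memory use is then bounded by the row buffer, whatever the strip size.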
It would be a widespread change to implement this kind of behaviour inside LibTiff. But if you're satisfied with solving only this local and well-defined situation, you could copy existing stuff like the LZW decoder, and modify it outside LibTiff to interface with LibTiff IO functions and progress through small buffers at a time. Should not be hard, especially since the LZW algorithm is rather simple. If you want, or if LibTiff's implementation of LZW decoding is too obfuscated, I can lend a hand by showing you the appropriate snippet of my own codec.
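For orientation, a minimal sketch in C of an LZW decoder that progresses through small buffers. It is neither LibTiff's decoder nor the codec snippet offered above; the names (`Lzw`, `lzw_init`, `lzw_decode`) are hypothetical. It assumes the common TIFF LZW variant (codes packed most significant bit first, code width growing one entry early) and leaves out the file I/O: the caller is assumed to read each chunk of the strip itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LZW_CLEAR = 256, LZW_EOI = 257, LZW_FIRST = 258, LZW_MAX = 4096 };

typedef struct {
    uint16_t prefix[LZW_MAX];  /* string table: code of the prefix string */
    uint8_t  suffix[LZW_MAX];  /* string table: last byte of the string */
    uint8_t  stack[LZW_MAX];   /* decoded bytes not yet delivered, reversed */
    int      sp;               /* number of bytes on the stack */
    uint32_t bits;             /* bit accumulator, most significant bit first */
    int      nbits;            /* number of valid bits in the accumulator */
    int      width, next, prev; /* code width, next free code, previous code */
    int      done;             /* set at EOI or on corrupt data */
} Lzw;

static void lzw_init(Lzw *z)
{
    z->sp = 0; z->bits = 0; z->nbits = 0;
    z->width = 9; z->next = LZW_FIRST; z->prev = -1; z->done = 0;
}

/* Decodes until `out` is full, the input chunk is used up, or the stream
   ends. Advances *in and *in_len; returns the number of bytes written.
   All state is kept in `z`, so chunks can be as small as one byte. */
static size_t lzw_decode(Lzw *z, const uint8_t **in, size_t *in_len,
                         uint8_t *out, size_t cap)
{
    size_t n = 0;
    assert(z && in && in_len && out);
    for (;;) {
        while (z->sp > 0 && n < cap)          /* deliver pending bytes */
            out[n++] = z->stack[--z->sp];
        if (n == cap || z->done)
            break;
        while (z->nbits < z->width && *in_len > 0) {  /* refill bits */
            z->bits = (z->bits << 8) | *(*in)++;
            z->nbits += 8;
            (*in_len)--;
        }
        if (z->nbits < z->width)
            break;                            /* need the next input chunk */
        int code = (int)((z->bits >> (z->nbits - z->width))
                         & ((1u << z->width) - 1));
        z->nbits -= z->width;
        if (code == LZW_EOI) { z->done = 1; break; }
        if (code == LZW_CLEAR) {
            z->width = 9; z->next = LZW_FIRST; z->prev = -1;
            continue;
        }
        if (code > (z->prev < 0 ? LZW_FIRST - 1 : z->next)) {
            z->done = 1;                      /* corrupt stream */
            break;
        }
        /* First byte of this code's string; when the code is not in the
           table yet, it equals the first byte of the previous string. */
        int t = (code == z->next) ? z->prev : code;
        while (t >= LZW_FIRST)
            t = z->prefix[t];
        if (z->prev >= 0 && z->next < LZW_MAX) {   /* add table entry */
            z->prefix[z->next] = (uint16_t)z->prev;
            z->suffix[z->next] = (uint8_t)t;
            z->next++;
            if (z->next >= (1 << z->width) - 1 && z->width < 12)
                z->width++;
        }
        for (t = code; t >= LZW_FIRST; t = z->prefix[t])  /* expand string */
            z->stack[z->sp++] = z->suffix[t];
        z->stack[z->sp++] = (uint8_t)t;
        z->prev = code;
    }
    return n;
}
```

A caller would read a few kilobytes of the strip at a time and keep calling `lzw_decode()` until the chunk is consumed, handing finished scanlines on as they fill. The decoder state is a few tens of kilobytes, regardless of strip size.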
I don't agree with the 'Whack the file's producer upside the head' comments. That simply doesn't solve the problem. Illegit files are out there, and it's in a decoder's job description to handle them as forgivingly, robustly and broadly as possible. This type of file is not even illegit, it's a perfectly fine TIFF. The single concern here is that a decoder should be scalable, and that's something we should always aim for in any code anyway. Whenever a size does not have a theoretical and practical limit that is predictably small enough, the design should not let that size determine the size of any buffer. That's just common sense. Whack original LibTiff's designers upside the head, and let's whack ourselves while we're at it for not upgrading this design a decade ago.