2012.01.26 19:02 "Re: [Tiff] TIFFClientOpen() output stream/encoding behaviour", by Joris Van Damme

Chris,

The image library provides access to an output stream which can only be written to forward. Once data is written, it's written and the stream pointer can only go forward.

That's a problem.

Depending on encoder design, the grand scheme probably comes down to this:

write tiff header
for each page in the multi-page tiff
   write ifd
   write tag data
   write compressed image data

The problem is that the layout above contains a lot of forward pointers. For example, the IFD contains pointers to tag data and compressed image data. That means you'll have to go back and update these pointers once you're ready to start writing those blocks and thus know the offset at which they start.
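To make the forward pointers concrete, here is a minimal Python sketch of the two structures involved, using only the standard `struct` module. The helper names `tiff_header` and `ifd_entry` are made up for illustration and are not LibTiff functions; the byte layout itself (8-byte header, 12-byte IFD entries) follows the TIFF 6.0 specification.

```python
import struct

STRIP_OFFSETS = 273  # tag whose value is the file offset of the image data
LONG = 4             # TIFF field type: 32-bit unsigned integer

def tiff_header(first_ifd_offset):
    # Byte order mark "II" (little-endian), magic number 42,
    # then a forward pointer to the first IFD.
    return struct.pack("<2sHI", b"II", 42, first_ifd_offset)

def ifd_entry(tag, field_type, count, value_or_offset):
    # One 12-byte IFD entry: tag, field type, value count,
    # and either the value itself or a pointer to it.
    return struct.pack("<HHII", tag, field_type, count, value_or_offset)

# The header can only be completed once the position of the first IFD is known.
header = tiff_header(8)

# The strip offset is not known until the image data has a position,
# so a forward-only writer would have to emit a placeholder (0) here.
entry = ifd_entry(STRIP_OFFSETS, LONG, 1, 0)
```

Every such placeholder is a spot the encoder would normally seek back to, which is exactly what a forward-only stream forbids.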

You can try to avoid this problem by writing tag data and compressed image data first, carefully remembering where you put them, so that you can next write the IFD with correct pointers that do not need to be updated. However, the first IFD is pointed to by a forward pointer in the TIFF header, and each subsequent IFD is pointed to by a pointer in the previous IFD, so that does not completely solve your problem.

I think there are two possible complete solutions, but they both have a considerable price tag.

The first option is to encode to a memory buffer. Do the writing only when you're confident all forward pointers have been updated to their final values. The price tag here is that you'll need a memory buffer that is at least as large as a complete TIFF page. Since you cannot know the exact size beforehand, you'll have to resort to a dynamic memory growth scheme of constant realloc or such, or use a fragmented buffer that is hard to index.
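A rough Python sketch of this first option, under heavy simplifying assumptions: one page, one IFD with a single StripOffsets entry, and image data that is already compressed. The function name `encode_page_in_memory` is hypothetical, and the result lacks the other required tags (ImageWidth and so on), so it illustrates the pointer patching only and is not a readable TIFF.

```python
import io
import struct

def encode_page_in_memory(image_data):
    """Build header + IFD + data in a growable buffer, then patch pointers."""
    buf = io.BytesIO()

    # Header with a placeholder for the first-IFD offset.
    buf.write(struct.pack("<2sHI", b"II", 42, 0))

    ifd_offset = buf.tell()
    buf.write(struct.pack("<H", 1))               # number of IFD entries
    entry_value_pos = buf.tell() + 8              # value field: last 4 bytes of the entry
    buf.write(struct.pack("<HHII", 273, 4, 1, 0)) # StripOffsets, placeholder value
    buf.write(struct.pack("<I", 0))               # next-IFD offset: 0 means no more pages

    data_offset = buf.tell()
    buf.write(image_data)

    # Seeking back is fine here because the buffer lives in memory.
    buf.seek(4)
    buf.write(struct.pack("<I", ifd_offset))
    buf.seek(entry_value_pos)
    buf.write(struct.pack("<I", data_offset))

    return buf.getvalue()

# Only the finished bytes go to the forward-only stream, in a single write.
page_bytes = encode_page_in_memory(b"abc")
```

`io.BytesIO` hides the reallocation, but the cost described above is still there: the whole page sits in memory until the last pointer is final.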

The second option is to refactor encoding as a two-pass process. In the first pass, every item encoding routine calculates its required length and remembers the offset at which it will be written, without writing anything. Only in the second pass is an item encoding routine allowed to do the actual writing. In that second pass, it can use the exact offset values remembered with other items, so nothing needs a subsequent update. This is a scalable solution: you will not need a memory buffer of unknown and possibly vast size. But the price you pay is a serious performance penalty.
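The two-pass idea might look roughly like this in Python. Everything here is an assumption made for illustration: `ForwardOnlyStream` is a stand-in for the restricted output stream, `write_pages_two_pass` is a made-up name, each page has a single StripOffsets entry, and the image data is passed in already compressed. In a real encoder the first pass would have to run the compressor just to learn each length, which is where the performance penalty comes from.

```python
import struct

class ForwardOnlyStream:
    """Hypothetical stand-in for an output stream that cannot seek."""
    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))

IFD_SIZE = 2 + 12 + 4  # entry count + one 12-byte entry + next-IFD pointer

def write_pages_two_pass(out, pages):
    # Pass 1: compute lengths and assign offsets; nothing is written yet.
    layout = []
    offset = 8  # the first IFD follows the 8-byte header
    for data in pages:
        ifd_offset = offset
        data_offset = ifd_offset + IFD_SIZE
        offset = data_offset + len(data)
        offset += offset % 2  # keep the next IFD on a word boundary
        layout.append((ifd_offset, data_offset))

    # Pass 2: write strictly forward; every pointer is already final.
    first_ifd = layout[0][0] if layout else 0
    out.write(struct.pack("<2sHI", b"II", 42, first_ifd))
    for i, data in enumerate(pages):
        _, data_offset = layout[i]
        next_ifd = layout[i + 1][0] if i + 1 < len(layout) else 0
        out.write(struct.pack("<H", 1))
        out.write(struct.pack("<HHII", 273, 4, 1, data_offset))  # StripOffsets
        out.write(struct.pack("<I", next_ifd))
        out.write(data)
        if len(data) % 2:
            out.write(b"\x00")  # pad byte matching the alignment in pass 1

out = ForwardOnlyStream()
write_pages_two_pass(out, [b"abc", b"de"])
blob = b"".join(out.chunks)
```

Note that the stream only ever sees `write` calls, never a seek, and no buffer larger than one item is held at a time.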

Adapting LibTiff to use either of these schemes will be hard.

In essence, TIFF is a random-access file format, not a sequential file format. Try to rethink the output stream limitations. If possible, that'll be the best solution by far.

Best regards,

Joris Van Damme
AWare Systems