2005.03.30 00:18 "[Tiff] RFC: fast 'copy free' tiff decoding", by Ron

2005.03.30 05:55 "Re: [Tiff] RFC: fast 'copy free' tiff decoding", by Ron

The basic idea is very simple. First map a tiff into memory, then cast carefully constructed C structures over the right offsets in that memory space, so the data can be accessed directly.
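A minimal sketch of that overlay in C, assuming classic TIFF (32-bit offsets, not BigTIFF). The struct and function names are hypothetical and are not libtiff's API. The static_asserts check that the compiler adds no padding, which the overlay depends on. The header is copied with memcpy rather than reached through a cast pointer, because a raw cast is only well-defined C when the address is suitably aligned. Either way, the fields still need byte-swapping whenever the file's byte order differs from the host's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* On-disk layout of a classic TIFF header (8 bytes). */
struct tiff_header {
    uint16_t byte_order; /* 0x4949 ("II", little-endian) or 0x4D4D ("MM", big-endian) */
    uint16_t magic;      /* always 42 */
    uint32_t first_ifd;  /* file offset of the first IFD */
};

/* One IFD entry exactly as stored in the file (12 bytes). */
struct tiff_dir_entry {
    uint16_t tag;
    uint16_t type;
    uint32_t count;
    uint32_t value;      /* the value itself if it fits in 4 bytes, else an offset */
};

/* The overlay only works if the compiler inserts no padding. */
static_assert(sizeof(struct tiff_header) == 8, "tiff_header is padded");
static_assert(sizeof(struct tiff_dir_entry) == 12, "tiff_dir_entry is padded");

static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

static uint16_t swap16(uint16_t v) { return (uint16_t)((v >> 8) | (v << 8)); }

static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00u) | ((v << 8) & 0xFF0000u) | (v << 24);
}

/* Overlay the header on the first 8 bytes of buf, fixing byte order if the
   file's differs from the host's. Returns 0 for a classic TIFF, else -1. */
static int tiff_read_header(const unsigned char *buf, size_t len,
                            struct tiff_header *out, int *needs_swap) {
    if (len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);  /* 8-byte copy sidesteps alignment rules */
    /* Both marks read the same in either byte order, so test them unswapped. */
    if (out->byte_order == 0x4949)
        *needs_swap = !host_is_little_endian();
    else if (out->byte_order == 0x4D4D)
        *needs_swap = host_is_little_endian();
    else
        return -1;
    if (*needs_swap) {
        out->magic = swap16(out->magic);
        out->first_ifd = swap32(out->first_ifd);
    }
    return out->magic == 42 ? 0 : -1;
}
```

Given the first eight bytes of the sample file dumped later in this message (49 49 2a 0 8 0 0 0), it reports magic 42 and the first IFD at offset 8.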

> What problem are you trying to solve?

Handling subsets of the available data is probably the most pressing one: doing so without touching them more than necessary, and without losing data that I have only incomplete knowledge about.

I want to be able to read and write baseline tiffs, to examine and manipulate the exif data in them, and to do the same with the exif data found in some jpeg images. (Other private IFDs should be a cinch once all that is easy.)

What I don't want to do is exhaustively hand-code operations to extract and verify every possible tag and IFD, copying them and their values into (distorted) mirror structures, before I can access any part of that data or pass it to other parts of my code. At least not before having a good general solution that, by default, catches all the ones we have not yet specialised. Either way, I don't want to be forced to decide whether to keep the 'original' tiff in memory, just the processed copy, or, worse, both, when, unless the image data in it is compressed etc., they are essentially the same data slightly reshuffled.

> Fixing the private IFD access should not require re-implementing the whole library! :-)

Most certainly, and I don't think this would either. It would break the current ABI, because some common parts of the TIFF structure would be removed from it and reordered, but the essential functions of the library would remain almost completely unchanged. Or at least they could, where retaining source compatibility with old code was paramount.

The advantage of doing that is that we then have some very nice public structures which users can pass around in their code, and which don't need to be put through a translation function to turn them back into the 'stuff of tiffs', since the two are one and the same.

I'm raising this here precisely because I don't want to reimplement the whole lib. Most of the functionality is fine, but we need a nice (set of) data structure(s) to pass this information around in, and that's the angle I've been looking at this from.

I was able to decompose a tiff like this (including the non-baseline hunk out of a jpeg APP1 section) with just a few lines of code that did not use libtiff at all. Getting the existing libtiff to read from those structures does not seem terribly daunting, but whether it is in fact a good idea is, of course, still open. :-)
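For the jpeg case, exif data is carried in an APP1 segment (marker 0xFFE1) whose payload begins with the six bytes "Exif\0\0", followed by an ordinary TIFF header and IFDs; offsets in those IFDs count from the start of that embedded header. Below is a sketch of finding it without copying anything. It assumes there are no 0xFF fill bytes between segments and stops at start-of-scan. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the TIFF header embedded in a JPEG's Exif APP1
   segment, or NULL if there is none. Nothing is copied: the result points
   into jpg, and *tiff_len is the number of bytes of TIFF data available.
   Offsets inside that TIFF data are relative to the returned pointer. */
static const unsigned char *jpeg_find_exif_tiff(const unsigned char *jpg,
                                                size_t len, size_t *tiff_len) {
    assert(jpg != NULL && tiff_len != NULL);
    if (len < 4 || jpg[0] != 0xFF || jpg[1] != 0xD8)   /* must start with SOI */
        return NULL;
    size_t pos = 2;
    while (pos + 4 <= len) {
        if (jpg[pos] != 0xFF)                 /* not at a marker: give up */
            return NULL;
        unsigned marker = jpg[pos + 1];
        if (marker == 0xDA || marker == 0xD9) /* SOS or EOI: no more metadata */
            return NULL;
        /* Segment length is big-endian and counts its own two bytes. */
        size_t seglen = (size_t)jpg[pos + 2] << 8 | jpg[pos + 3];
        if (seglen < 2 || seglen > len - pos - 2)
            return NULL;
        const unsigned char *body = jpg + pos + 4;
        size_t body_len = seglen - 2;
        if (marker == 0xE1 && body_len >= 6 && memcmp(body, "Exif\0\0", 6) == 0) {
            *tiff_len = body_len - 6;
            return body + 6;
        }
        pos += 2 + seglen;
    }
    return NULL;
}
```

The returned pointer can be handed straight to the same header and IFD code used for a plain tiff file.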

> I agree that mmap() is cool, but libtiff already uses it to read files, just not quite as efficiently as it could. Requiring mmap() would make libtiff less portable.

Yes, but this does not require mmap to be an advantage. Even without it, you can still read a block into memory more efficiently, by whatever method you like, and cast the correct data structure over it. Then seek, read the next block, and so on.
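A sketch of that non-mmap path using plain stdio: seek to an IFD, read its 2-byte entry count, then read the rest of the IFD (count * 12 bytes of entries plus the 4-byte next-IFD offset) in a single fread, ready for structures to be overlaid on it. The function name is hypothetical, and for brevity the count is decoded as little-endian ("II") only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Seek to the IFD at `offset` and read it whole: the 2-byte entry count,
   count * 12 bytes of entries, and the 4-byte next-IFD offset. Entry
   structs can then be overlaid on the returned block just as on a mapped
   file. Assumes a little-endian ("II") file. Caller frees the result. */
static unsigned char *read_ifd_block(FILE *f, long offset, size_t *out_len) {
    assert(f != NULL && out_len != NULL);
    unsigned char cnt[2];
    if (fseek(f, offset, SEEK_SET) != 0 || fread(cnt, 1, 2, f) != 2)
        return NULL;
    size_t n = (size_t)cnt[0] | (size_t)cnt[1] << 8;
    size_t len = 2 + n * 12 + 4;
    unsigned char *block = malloc(len);
    if (block == NULL)
        return NULL;
    block[0] = cnt[0];
    block[1] = cnt[1];
    if (fread(block + 2, 1, len - 2, f) != len - 2) {  /* truncated file */
        free(block);
        return NULL;
    }
    *out_len = len;
    return block;
}
```

Each further IFD costs one more seek and read, and the file itself never has to be mapped.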

> Based on my own testing, libtiff's performance is quite good. The library is very robust and well tested.

From what I have seen of the code, that is not surprising. I'm certainly not suggesting this because of any performance bottleneck I've measured; it's the trouble I'm having seeing how to get generic tiff data in and out, and how to pass it around within my code efficiently.

Having tiff data in memory resemble a tiff file would seem to be an advantage for a lot of reasons, though, so I would not be surprised if some people found performance gains from it too.

Perhaps other people have a very simple strategy to the same end that is better than this, but it seems we would need to change at least some things in a way that may not be backward compatible, so I'd welcome any thoughts on how others see this unfolding.

Right now I can trivially extract:

Tiff: using memory mapped source
Tiff: byte sex = 4949
Tiff: version = 2a
Tiff: first IFD at 8
Tiff: 49 49 2a 0 8 0 0 0 b 0 e 1 2 0 20 0 0 0 92 0
Tiff: IFD 1 has 11 tags
Tiff: next IFD offset = 792
Tiff: IFD 1 at 0xb7b76008, tags at 0xb7b7600a
Tiff: tag: 270, type 2, count 32, value: 146
Tiff: tag: 271, type 2, count 24, value: 178
Tiff: tag: 272, type 2, count 7, value: 202
Tiff: tag: 274, type 3, count 1, value: 1
Tiff: tag: 282, type 5, count 1, value: 216
Tiff: tag: 283, type 5, count 1, value: 224
Tiff: tag: 296, type 3, count 1, value: 2
Tiff: tag: 305, type 2, count 8, value: 232
Tiff: tag: 306, type 2, count 20, value: 264
Tiff: tag: 531, type 3, count 1, value: 2
Tiff: tag: 34665, type 4, count 1, value: 284
Tiff: has exif IFD
Tiff: exif IFD has 24 tags
Tiff: next IFD offset = 0
Tiff: exif tag: 33434, type 5, count 1, value: 578
Tiff: exif tag: 33437, type 5, count 1, value: 586
Tiff: exif tag: 34850, type 3, count 1, value: 3
Tiff: exif tag: 34855, type 3, count 1, value: 100
Tiff: exif tag: 36864, type 7, count 4, value: 808530480
Tiff: exif tag: 36867, type 2, count 20, value: 594
Tiff: exif tag: 36868, type 2, count 20, value: 614
Tiff: exif tag: 37121, type 7, count 4, value: 197121
Tiff: exif tag: 37122, type 5, count 1, value: 634
Tiff: exif tag: 37380, type 10, count 1, value: 642
Tiff: exif tag: 37381, type 5, count 1, value: 650
Tiff: exif tag: 37383, type 3, count 1, value: 5
Tiff: exif tag: 37384, type 3, count 1, value: 3
Tiff: exif tag: 37385, type 3, count 1, value: 0
Tiff: exif tag: 37386, type 5, count 1, value: 658
Tiff: exif tag: 37500, type 7, count 520, value: 916
Tiff: exif tag: 37510, type 7, count 125, value: 666
Tiff: exif tag: 40960, type 7, count 4, value: 808464688
Tiff: exif tag: 40961, type 3, count 1, value: 1
Tiff: exif tag: 40962, type 4, count 1, value: 1600
Tiff: exif tag: 40963, type 4, count 1, value: 1200
Tiff: exif tag: 40965, type 4, count 1, value: 886
Tiff: exif tag: 41728, type 7, count 1, value: 3
Tiff: exif tag: 41729, type 7, count 1, value: 1
Tiff: IFD 2 has 6 tags
Tiff: next IFD offset = 0
Tiff: IFD 2 at 0xb7b76318, tags at 0xb7b7631a
Tiff: tag: 259, type 3, count 1, value: 6
Tiff: tag: 282, type 5, count 1, value: 870
Tiff: tag: 283, type 5, count 1, value: 878
Tiff: tag: 296, type 3, count 1, value: 2
Tiff: tag: 513, type 4, count 1, value: 4084
Tiff: tag: 514, type 4, count 1, value: 4601

This needs nothing more than knowledge of the basic tiff IFD structure (and, in this case, the exif IFD pointer), and I can pass this information to other parts of my code without copying (much) more than a pointer, and without fear of losing or misinterpreting any of it along the way.
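Here is a sketch of a walk like the one that produced the dump above. It runs over a non-owning view (pointer, length, byte order) of a classic TIFF held in memory, and the names are hypothetical. Entries begin two bytes into each IFD (the dump shows IFD 1 at 0xb7b76008 with tags at 0xb7b7600a), so they are only guaranteed 2-byte alignment. For that reason the sketch reads fields through small byte-order-aware helpers rather than casting a struct pointer, which also makes "MM" files work. It follows tag 34665 (the exif IFD pointer) and uses the TIFF 6.0 type sizes to decide whether an entry's last four bytes hold the value itself or an offset to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* A non-owning view of a TIFF already in memory: pointer, length, byte order. */
struct tiff_view {
    const unsigned char *base;
    size_t len;
    int le;  /* 1 for "II" files, 0 for "MM" files */
};

static uint16_t rd16(const struct tiff_view *t, size_t off) {
    const unsigned char *p = t->base + off;
    return t->le ? (uint16_t)(p[0] | p[1] << 8) : (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t rd32(const struct tiff_view *t, size_t off) {
    const unsigned char *p = t->base + off;
    uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return t->le ? (b0 | b1 << 8 | b2 << 16 | b3 << 24)
                 : (b0 << 24 | b1 << 16 | b2 << 8 | b3);
}

/* Bytes per element for TIFF 6.0 field types 1..12; 0 means unknown. */
static uint64_t type_size(uint16_t type) {
    static const uint8_t sizes[13] = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};
    return type < 13 ? sizes[type] : 0;
}

/* Print every entry of the IFD at `off`. Records the exif IFD pointer
   (tag 34665) in *exif_off if present and the next-IFD offset in *next_off.
   Returns the entry count, or -1 if the IFD runs past the buffer. */
static int walk_ifd(const struct tiff_view *t, uint32_t off, const char *label,
                    uint32_t *exif_off, uint32_t *next_off) {
    if ((size_t)off + 2 > t->len)
        return -1;
    size_t n = rd16(t, off);
    size_t end = (size_t)off + 2 + n * 12;
    if (end + 4 > t->len)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t e = (size_t)off + 2 + i * 12;  /* only 2-byte aligned */
        uint16_t tag = rd16(t, e), type = rd16(t, e + 2);
        uint32_t count = rd32(t, e + 4);
        /* Values are left-justified, so a lone SHORT is in the first 2 bytes. */
        uint32_t v = (type == 3 && count == 1) ? rd16(t, e + 8) : rd32(t, e + 8);
        uint64_t bytes = type_size(type) * count;
        printf("%s tag: %u, type %u, count %lu, %s: %lu\n", label,
               (unsigned)tag, (unsigned)type, (unsigned long)count,
               bytes <= 4 ? "value" : "offset", (unsigned long)v);
        if (tag == 34665)
            *exif_off = v;
    }
    *next_off = rd32(t, end);
    return (int)n;
}

/* Walk the IFD chain, descending into the exif IFD when one is found.
   Returns the total number of entries seen, or -1 if the data is malformed. */
static int tiff_walk(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    if (len < 8 || buf[0] != buf[1] || (buf[0] != 'I' && buf[0] != 'M'))
        return -1;
    struct tiff_view t = {buf, len, buf[0] == 'I'};
    if (rd16(&t, 2) != 42)
        return -1;
    int total = 0;
    uint32_t off = rd32(&t, 4);
    for (int ifd = 0; off != 0 && ifd < 64; ifd++) {  /* cap guards against loops */
        uint32_t exif = 0, next = 0, unused = 0;
        int n = walk_ifd(&t, off, "Tiff:", &exif, &next);
        if (n < 0)
            return -1;
        total += n;
        if (exif != 0) {
            int m = walk_ifd(&t, exif, "Tiff: exif", &unused, &unused);
            if (m < 0)
                return -1;
            total += m;
        }
        off = next;
    }
    return total;
}
```

In the dump above, for example, tag 270 is ASCII (type 2) with count 32, so its 146 is an offset to the string, while tag 274 is a single SHORT (type 3), so its 1 is the value itself. Because the view only borrows the buffer, passing it around costs no more than copying the small struct.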

If there is an even easier way, then I'm all ears...

best,
Ron