TIFF and LibTiff Mailing List Archive
January 2008

This list is run by Frank Warmerdam
Archive maintained by AWare Systems






2008.01.08 15:00 "Tag handling in LibTiff - past, present and future", by Joris Van Damme

Folks,


The main thing missing from the LibTiff 4.0 work is documentation. In the
interest of easing that pain, here's a small draft write-up about my
personal thoughts on tag handling and what it should evolve to. Though I
think that largely there is consensus amongst us on these issues, please
note that no other LibTiff maintainer has had a chance to comment on this
exact text yet, so please regard it as no more than my personal opinion.


In order to explain the what and how of my thinking on tag handling and
related areas in LibTiff 4.0, we first need to agree on the why, so allow
me to take a step back and start with exactly which details of the current
implementation we don't like.

1) The TIFFSetField and TIFFGetField variable arguments scheme is a major
pain, on all fronts of a) usage, b) implementation and maintenance, and c)
documentation. We need to find an alternative.

2) We badly need generic tag handling. Many users rightly expect that the
basic currency in TIFF being a tag, calling code should be able to do such
straightforward tasks as finding out how many and what tags exactly are in
an IFD, what datatypes and values exactly they have, removing, adding,
changing some, etc.

3) Related to 2), but not entirely the same, is the notion that there are
two different 'kinds' of value as far as image tags are concerned. When we
see a subsampling value of [1,1] and auto-correct it from the JPEG
compressed data to [2,1], that's an instance when these two 'kinds' are
actually different values. In other words, there's the literal tag value,
and the sensible interpreted-image values. We often get into trouble if we
don't have a separate notion of these two kinds, and, more importantly, we
cannot hope to provide generic and exact tag handling support as detailed in
2) if for instance we keep mistaking a tag that has a default value for a
tag that is always present even when it's not.

4) The tag registry and auto-registration is simply wrong. In this scheme,
we assume that once we see a new tag type with datatype short, it's all
right to register it as short and expect short from that point on. This is
not correct. People are perfectly free to use a single tag type with
different datatypes. In fact, some of the standard tags do. Hacking our way
through to allow different datatypes in auto-registered tags is wrong, too.
In fact, the datatype as we've seen it is about all that is registered, it's
all that is known. We need to drop auto-registration completely, and start
supporting get and set operations on unknown tags without registration.

5) The fact that all tag values are read, at IFD reading time, regardless of
whether or not we're going to need them, is another major pain. We're
already seeing some tags with large amounts of value data, and we should
expect this unneeded overhead at IFD reading time to grow further,
especially in a format that is specifically designed to break the 4 gigabyte
boundary. So we need to evolve toward lazy reading, i.e. read the value of a
tag the first time we actually need it, on the one hand, and start
supporting streaming tag value data in and out without full buffers as a
second and less urgent feature.

6) Too much is tied onto the compression scheme. LibTiff supports prediction
in flate compression, or lzw compression, so why should it fail to correctly
decode an image that combines prediction with packbits compression or no
compression? As another example, the subsampling tags really have nothing to
do with the JPEG compression scheme, they're totally independent of one
another. And, most importantly, real TIFF has no concept of
compression-dependent tags. In other words, the G3Options tag still is the
G3Options tag even if JPEG compression is applied. It doesn't suddenly
change to an undefined tag.
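
As an illustration of the lazy reading asked for in 5), here is a minimal
sketch in C. It is not LibTiff code: LazyTag, LoadFn, MemSource and all
function names are hypothetical, and the "file" is just a memory image so
that the example stays self-contained. The directory fields are kept at IFD
reading time, and the value data is fetched only on the first request.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback that fetches `size` bytes of value data at `offset`.
   Returns 1 on success, 0 on failure. */
typedef int (*LoadFn)(void *ctx, uint64_t offset, void *dst, size_t size);

/* Hypothetical in-memory tag: directory fields plus an optional buffer. */
typedef struct {
    uint16_t tag;     /* tag id as found in the IFD */
    uint16_t type;    /* datatype code as found in the IFD */
    uint64_t count;   /* number of values */
    uint64_t offset;  /* where the value data lives in the file */
    size_t   size;    /* byte size of the value data */
    void    *value;   /* NULL until the value is first requested */
} LazyTag;

/* Return the value data, loading it on first use only. NULL on failure. */
static const void *lazy_tag_value(LazyTag *t, LoadFn load, void *ctx)
{
    if (t->value == NULL) {
        void *buf = malloc(t->size ? t->size : 1);
        if (buf == NULL)
            return NULL;
        if (!load(ctx, t->offset, buf, t->size)) {
            free(buf);
            return NULL;
        }
        t->value = buf;
    }
    return t->value;
}

/* Drop the cached value; the directory fields stay valid. */
static void lazy_tag_release(LazyTag *t)
{
    free(t->value);
    t->value = NULL;
}

/* Example loader (an assumption for this sketch): reads from a memory
   image and counts how often it is called. */
typedef struct {
    const uint8_t *data;
    size_t len;
    int calls;
} MemSource;

static int mem_load(void *ctx, uint64_t offset, void *dst, size_t size)
{
    MemSource *src = ctx;
    if (offset > src->len || size > src->len - offset)
        return 0;
    memcpy(dst, src->data + (size_t)offset, size);
    src->calls++;
    return 1;
}
```

Calling lazy_tag_value twice on the same tag invokes the loader once; tags
that are never queried cost nothing beyond their directory entry.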


Thinking we agree on all the above, I took the opportunity to add some of
the first steps in order to evolve in the right direction and remain
backwards compatible all the way. To clarify what and how, let's again take
a step back and consider what is a proper implementation and interface
for tag handling.

1) At IFD reading time, a suitable structure for each tag should be built.
This structure should contain the basic information that is present inside
the TIFF file, but it should also leave room for hooking on such stuff as a
value buffer memory block at any later time.

2) Tag counts and tag ids, datatypes, and values, should next be exposed in
the reading interface. Note that this is the literal kind of information I'm
referring to here, not the sensible kind. In fact, the IFD does not need to
make sense and any actual image interpretation code is totally distinct from
this. As tags can often occur with different datatypes, and callers often
require unification into the single datatype they support (see our own
strip/tile bytecount/offset handling as a perfect example), this value
reading interface should support auto-conversion, allowing a caller to query
LONG values when the actual tag has SHORT values, for example.

3) At the time of image IFD interpretation, the previous interface should be
used to query literal tag values, and sensible image property values should
be derived and stored independently in a sensible-image-information block
rather than being mixed with tag values.

4) At the time of image writing, the sensible-image-information block data
should be used to determine what tags with what values need to be written,
and the interface next detailed in 5) should be used to do this.

5) A tag writing interface should expose access to the tag count (removing
and adding tags), and for individual tags it should expose methods to choose
datatype and value. Note that often here too some flavour of
auto-conversion would be a nice thing, in that we have many tags where the
standard and good policy is to select the shortest possible datatype from a
set of allowed datatypes, given the actual values. For example, our own
strip/tile bytecount/offset values are handled as 64bit values on the
'sensible image' level, but at writing time we need to select the shortest
possible datatype from the set [SHORT,LONG], or [SHORT,LONG,LONG8] in
BigTIFF, depending on the actual values.

6) From AsTiff experience, I know there needs to be an opportunity for the
application code to use the generic tag handling interface 5) after step 4)
and before actual IFD writing. This is required because some people at some
times need to assemble very specific TIFFs beyond what the standard image
writing interface provides. One example is test image building code that
could be writing small images, but could want to write any and every allowed
datatype for strip/tile bytecount/offset tags. This kind of code can come in
between the image-to-ifd and ifd-to-file stages, use the auto-converting
generic tag reading to read the values as any type, and next use the generic
tag writing to write any datatype it desires. Another example of code that
needs this is code that has to write stuff compatible with specific
brain-dead readers and such.

7) Apart from this correct interface, we also need to provide backwards
compatible TIFFGetField and TIFFSetField implementations.
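
The split between the literal reading interface of 2) and the image
interpretation of 3) can be sketched as follows. This is only an
illustration under assumptions: RawTag, RawIfd and both functions are
hypothetical names, and values are stored widened to 64 bit to keep the
example short. The only TIFF facts used are that SamplesPerPixel is tag 277
and defaults to 1 when absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical literal view of one IFD entry: exactly what the file says. */
typedef struct {
    uint16_t tag;
    uint16_t type;           /* TIFF datatype code, e.g. 3 = SHORT, 4 = LONG */
    uint64_t count;
    const uint64_t *values;  /* widened to 64 bit for this sketch only */
} RawTag;

/* Hypothetical literal view of an IFD: a plain list of entries. */
typedef struct {
    size_t ntags;
    const RawTag *tags;
} RawIfd;

/* Literal lookup: NULL means the tag is not present in the IFD. */
static const RawTag *raw_ifd_find(const RawIfd *ifd, uint16_t tag)
{
    for (size_t i = 0; i < ifd->ntags; i++)
        if (ifd->tags[i].tag == tag)
            return &ifd->tags[i];
    return NULL;
}

/* Interpreted lookup: applies the SamplesPerPixel default of 1 when the
   tag is absent, without pretending the tag exists in the file. */
static uint16_t image_samples_per_pixel(const RawIfd *ifd)
{
    const RawTag *t = raw_ifd_find(ifd, 277);
    if (t == NULL || t->count < 1)
        return 1;
    return (uint16_t)t->values[0];
}
```

With an IFD that holds only ImageWidth and ImageLength, raw_ifd_find reports
that tag 277 is absent while image_samples_per_pixel still answers 1, so a
generic tag lister and the image decoder each get the answer they need.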


At both the reading and the writing end, I did some of the primary work in
LibTiff 4.0 that should get us one step along the right road.

At the IFD reading stage, things are assembled in the TIFFDirEntry
structure. This is a first embryonic version of the in-memory tag structure
referred to in 1) above. At this time, it's immediately interpreted and
converted to TIFFSetField instructions and such, and disposed of, but the
intention is to evolve towards holding on to it throughout the life of the
TIFF (or future TIFFIFD) structure and reuse this same bugger at writing
time. I've also provided a large series of embryonic auto-converting access
functions, in the style of TIFFReadDirEntryShort and such, and tried to be
more or less complete even if that got us some compiler warnings as to some
of these not being used in current code.
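
What such an auto-converting access function does can be sketched like
this. It is a simplified illustration, not the actual TIFFReadDirEntry*
code: the function name, the status enum and the assumption that the raw
value buffer is already in native byte order are all made up for the
example. Only the datatype codes (SHORT = 3, LONG = 4, LONG8 = 16) come
from TIFF and BigTIFF.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { T_SHORT = 3, T_LONG = 4, T_LONG8 = 16 };  /* TIFF / BigTIFF codes */

/* Hypothetical result codes for a converting read. */
typedef enum { CONV_OK, CONV_RANGE, CONV_TYPE } ConvStatus;

/* Read element `index` of a raw value buffer as a 32-bit unsigned value,
   whatever integer datatype the tag was actually stored with.
   Assumes the buffer has already been byte-swapped to native order. */
static ConvStatus read_entry_as_long(uint16_t type, const void *raw,
                                     size_t index, uint32_t *out)
{
    const uint8_t *p = raw;
    switch (type) {
    case T_SHORT: {
        uint16_t v;
        memcpy(&v, p + index * 2, 2);
        *out = v;                      /* widening always fits */
        return CONV_OK;
    }
    case T_LONG: {
        uint32_t v;
        memcpy(&v, p + index * 4, 4);
        *out = v;
        return CONV_OK;
    }
    case T_LONG8: {
        uint64_t v;
        memcpy(&v, p + index * 8, 8);
        if (v > UINT32_MAX)
            return CONV_RANGE;         /* narrowing must be checked */
        *out = (uint32_t)v;
        return CONV_OK;
    }
    default:
        return CONV_TYPE;              /* not an integer type we convert */
    }
}
```

The caller states the type it wants and gets either a value or a reason for
failure, so code such as strip/tile offset handling does not have to care
which datatype the writer chose.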

At the IFD writing stage, I have a not yet completely separated embryonic
image-to-IFD stage as detailed in 4) above, in TIFFWriteDirectorySec, that
converts sensible image data to tag writing commands, where these tag
writing commands work on the same TIFFDirEntry structure. Again, there's
also a series of embryonic writer functions that should evolve towards above
specified.
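
The "shortest possible datatype" policy from 5) above could look roughly
like the following on the writing side. This is a sketch under assumptions,
not the TIFFWriteDirectorySec implementation: shortest_type is a
hypothetical helper, and a return value of 0 is an invented convention for
"does not fit in classic TIFF".

```c
#include <stddef.h>
#include <stdint.h>

enum { T_SHORT = 3, T_LONG = 4, T_LONG8 = 16 };  /* TIFF / BigTIFF codes */

/* Pick the narrowest datatype able to hold every value.
   The allowed set is [SHORT, LONG] for classic TIFF and
   [SHORT, LONG, LONG8] for BigTIFF. Returns 0 when a value needs
   LONG8 but the file is classic TIFF. */
static uint16_t shortest_type(const uint64_t *values, size_t count,
                              int bigtiff)
{
    uint64_t max = 0;
    for (size_t i = 0; i < count; i++)
        if (values[i] > max)
            max = values[i];
    if (max <= UINT16_MAX)
        return T_SHORT;
    if (max <= UINT32_MAX)
        return T_LONG;
    return bigtiff ? T_LONG8 : 0;
}
```

Test image building code, as described in 6), would simply bypass this
helper and request whichever allowed datatype it wants to exercise.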

For part 7), I think we'll need to work out something like a per
parameter-type switch statement, rather than a huge per-tag switch
statement. For this to be possible, we need per-tag knowledge of the exact
getter and setter parameter types. These are the new fields in the tag
definition structure. I've tried to use existing documentation and LibTiff
code to derive as much of the correct data as I could. However, for many
tags I simply did not find the proper information in either documentation
or code. The TIFFTAG_PIXAR_TEXTUREFORMAT tag Frank mentioned is amongst
those. I've simply not found any code that deals with that tag, nor any
documentation of the handling of this tag in the man pages. So it was
logical to assume there's simply no support for this tag. This is reflected
in the TIFF_SETGET_UNDEFINED values.
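
A per parameter-type switch of the kind meant here might be sketched as
below. Everything in it is hypothetical and much reduced: SetGetKind,
TagDef, TagValue, the small table and set_field are illustration names, not
the LibTiff definitions, and tag 65000 merely stands in for a tag whose
parameter types are unknown. The point is that the variable arguments are
decoded by one case per parameter kind, driven by a per-tag table.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter kinds a setter can accept. */
typedef enum { SG_UNDEFINED, SG_UINT16, SG_UINT32, SG_ASCII } SetGetKind;

/* Hypothetical per-tag definition: which kind the setter expects. */
typedef struct {
    uint32_t tag;
    SetGetKind set_kind;
} TagDef;

/* Hypothetical container for a decoded value. */
typedef struct {
    SetGetKind kind;
    union { uint16_t u16; uint32_t u32; const char *str; } v;
} TagValue;

static const TagDef tag_defs[] = {
    { 256,   SG_UINT32 },     /* ImageWidth */
    { 259,   SG_UINT16 },     /* Compression */
    { 270,   SG_ASCII },      /* ImageDescription */
    { 65000, SG_UNDEFINED }   /* stand-in: parameter types not known */
};

static const TagDef *find_def(uint32_t tag)
{
    for (size_t i = 0; i < sizeof tag_defs / sizeof tag_defs[0]; i++)
        if (tag_defs[i].tag == tag)
            return &tag_defs[i];
    return NULL;
}

/* Variable-argument setter: one case per parameter kind, not per tag.
   Returns 1 on success, 0 for unknown or unsupported tags. */
static int set_field(TagValue *out, uint32_t tag, ...)
{
    const TagDef *def = find_def(tag);
    va_list ap;
    int ok = 1;

    if (def == NULL)
        return 0;
    va_start(ap, tag);
    switch (def->set_kind) {
    case SG_UINT16:
        /* 16-bit arguments arrive promoted to int in a varargs call */
        out->v.u16 = (uint16_t)va_arg(ap, int);
        break;
    case SG_UINT32:
        out->v.u32 = va_arg(ap, uint32_t);
        break;
    case SG_ASCII:
        out->v.str = va_arg(ap, const char *);
        break;
    default:
        ok = 0;   /* SG_UNDEFINED: no known parameter type */
        break;
    }
    va_end(ap);
    if (ok)
        out->kind = def->set_kind;
    return ok;
}
```

Adding support for another tag then means adding one table row rather than
another case in the switch, as long as its parameter kind already exists.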



Best regards,


Joris Van Damme
info@awaresystems.be
http://www.awaresystems.be/
Download your free TIFF tag viewer for windows here:
http://www.awaresystems.be/imaging/tiff/astifftagviewer.html