2009.02.07 10:53 "Re: [Tiff] assertions, and building with DEBUG/NDEBUG", by

Frank,

I would appreciate it if you could get a bugzilla account, and be prepared to interact with bugs there. In the meantime I've appended your email to the ticket.

OK.

In TIFFFetchNormalTag() we have lots of logic like:

        case TIFF_SETGET_UINT16:
                {
                        uint16 data;
                        assert(fip->field_readcount==1);
                        assert(fip->field_passcount==0);
                        err=TIFFReadDirEntryShort(tif,dp,&data);
                        if (err==TIFFReadDirEntryErrOk)
                        {
                                if (!TIFFSetField(tif,dp->tdir_tag,data))
                                        return(0);
                        }
                }
                break;

Would it be better to change these assertions to a runtime error indicating that this tag is not suitable for processing in TIFFFetchNormalTag(), possibly indicating a corrupt file or a logic error in libtiff?

Personaly, I don't think that's a good idea. If you have encapsulating logic working one way (the different stages in TIFFReadDirectory), and you have called logic underneath, then the latter shouldn't normally be build to withstand the encapsulating logic failing. Rather, the called logic should be build to raise our LibTiff developers interest if/when the calling logic fails, so that we know exactly what is going on and why, and can assess the actual problem rather then obscure its symptoms.

This is quite a dirty file so I'm pleased to settle with reasonable error reports and recovery.

I presume the "fip" description of the tag cannot be made compatible with TIFFFetchNormalTag for some reason? It does not *seem* like a particularly complicated tag to handle generically. It is currently:

{ TIFFTAG_COMPRESSION, -1, 1, TIFF_SHORT, 0, TIFF_SETGET_UINT16,

TIFF_SETGET_UNDEFINED, FIELD_COMPRESSION, 0, 0, "Compression", NULL },

I a bit lost in aspects of this - particularly the stuff about set_field_type and get_field_type in the TIFFField structure.

I think we're all convinced of the trouble with the varargs calling protocol in TIFFGetField and TIFFSetField. The set_field_type and get_field_type is best regarded a first attempt at making the library 'aware' on a meta-level of what types are expected in the TIFFGetField and TIFFSetField interface for the various tags. This is required for a) moving away from specific handling, towards more generic handling, in some of the lowest level under the hood code that is the deepest implementation of TIFFGetField and TIFFSetField, and b) it is also required to one day build a backwards compatible TIFFGetField and TIFFSetField that calls a newer more robust and generic tag interface. Both a) and b) are mostly long-term goals at best, and it's anybody's guess whether they'll ever be a reality, but jumping to the major update 4.0 seemed like a good time to get at least some primary requirements started.

That's not to say it's the only use for these, that's not to say it's complete or perfect in reaching its goals, and I'm not saying it's all purely long-term anticipating either - in fact some of a) is already in place if I remember correctly. What I am saying is that it might be most productive if we think of these this way and try to take it from there.

Actually, can't we just set the field_readcount to 1 and then this definition would work?

I understand, and it seems like an option. Furthermore, I don't wish to defend the crappy logic in TIFFReadDirectory, as I think it's simply not good.

But, short of rewriting LibTiff, building on a proper generic tag and IFD handling layer, subsequently adding a full lazy as-needed image-interpretation layer, TIFFReadDirectory logic stands, and we best ensure it works.

So, I still feel you're obscuring one symptom with the cure you propose here, where you might easilly choose to deal with the problem instead. If you do as you propose here, you'll get the exact same problem again if a file with multiple TIFFTAG_SAMPLESPERPIXEL tags in an IFD surfaces sooner or later, and there is certainly also potential for other, more obscure and much harder to diagnose trouble because of the fix-up hacks in TIFFReadDirectory that are bound to particular stages and get into trouble if some of their prerequisits gets overwritten in a subsequent stage. What happens if a second TIFFTAG_COMPRESSION tag has a legit datatype and value, different from the first, and overwrites the value in the third TIFFReadDirectory stage? Does it depend on the exact values? Might it depend on other specific compression-type dependent hacks scattered all over the library? Would you like to guess what happens to multiple TIFFTAG_YCBCRSUBSAMPLING tags combined with a subsampling-autocorrecting compression mode like JPEG? What if they've different values? Even investigating these situations and estimating the problems seems like a lot of unpleasant work.

In conclusion, we've a crappy TIFFReadDirectory. Due to its multi-stage design handling a lot of things in a very specific manner, it's fragile and complicated. It much depends on the IGNORE mechanism to avoid reading any given data twice. It gets into trouble if the IGNORE mechanism fails due to a given tag being present twice. To avoid even more complicated situations, I propose we make sure the IGNORE mechanism works even if there are multiple instances of a tag, but first ensuring there aren't, sorting the tags making all unique at the start of TIFFReadDirectory, rather then just dig a hole to put this particular symptom with this particular file in and having this opportunity for a clear and easy but profound fix go to waste whilst adding to the complexity of the situation.

Best regards,

Joris