2004.07.10 17:56 "[Tiff] unintentional ABI change between 3.5 and 3.6?", by Jay Berkenbilt

2004.07.11 16:47 "Re: [Tiff] unintentional ABI change between 3.5 and 3.6?", by Jay Berkenbilt

Executive summary: after additional experimentation, I no longer believe that there was an ABI change between 3.5.7 and 3.6.1. Read on for details....

It seems to me that the TIFFDirectory, _TIFFRGBAImage and TIFF structure contents are not intended to be used publicly. I am not familiar with what led to the TIFFYCbCrToRGB structure change or what that structure is used for, so I am not clear on whether this is really a change to the public ABI or not.

I've done a bunch of experimentation this morning and agree with your conclusion. I have compiled various programs that follow the TIFF interface correctly and have not been able to create a situation where a program compiled with libtiff 3.5.7 and run with 3.6.1, or vice versa, behaves incorrectly. It is also clear that libtiff uses the well-established technique of hiding the structures from the public interface through an opaque typedef. This even isolates the client applications from changes in the sizes of the data structures.
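To illustrate the opaque typedef technique mentioned above, here is a minimal single-file sketch. The names `Image`, `image_open`, `image_width`, `image_height` and `image_close` are hypothetical and are not libtiff's API; the sketch only shows the general pattern of exposing an incomplete type plus accessor functions, so that clients never depend on the structure's size or layout.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What a public header would expose (hypothetical names) ---- */
typedef struct image Image;   /* incomplete type: layout hidden from clients */

Image   *image_open(unsigned width, unsigned height);
unsigned image_width(const Image *img);
unsigned image_height(const Image *img);
void     image_close(Image *img);

/* ---- What only the library's own source file would see ---- */
struct image {
    unsigned width;
    unsigned height;
    /* A later release could add or reorder fields here. Clients only
     * hold an Image * and call the accessors, so they never embed
     * sizeof(struct image) or any field offset in their own code. */
};

Image *image_open(unsigned width, unsigned height)
{
    Image *img = malloc(sizeof *img);   /* allocation happens inside the library */
    if (img != NULL) {
        img->width = width;
        img->height = height;
    }
    return img;
}

unsigned image_width(const Image *img)
{
    assert(img != NULL);
    return img->width;
}

unsigned image_height(const Image *img)
{
    assert(img != NULL);
    return img->height;
}

void image_close(Image *img)
{
    free(img);   /* free(NULL) is a no-op, so a NULL handle is harmless */
}
```

A client would call `image_open(640, 480)`, query the handle with `image_width` and `image_height`, and release it with `image_close`. Because allocation and field access both happen inside the library, a client built against one version should keep working when the structure grows in another, as long as the function signatures stay the same.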

Upon further investigation of the Debian bugs, it appears that all the programs that were crashing were loading TIFFs through gdk-pixbuf, either via libpixbufloader-tiff.so (which attaches libtiff at runtime) or by linking with gdk_pixbuf. Between the original bug reports and now, Debian has switched to GNOME 2.6, and a lot of the client code has changed. Programs like gthumb that use libtiff directly work equally well now with libtiff 3.5.7 and 3.6.1. Some programs that use gdk_pixbuf, like eog, also seem to work equally well in both cases. Programs like gqview and nautilus that use libpixbufloader-tiff fail to display some images when running with 3.5.7 (having been built with 3.6.1), but they don't crash anymore; they just don't load any image data. It's not obvious without further study why this is.

As far as I can tell, none of the current code is using non-public interfaces, but I wouldn't swear to it.

In any case, I no longer believe that all the Debian bugs are really related to a bug in libtiff. If anything, they are probably related to a problem with gdk-pixbuf which may have since been fixed (though this is a guess). I think someone jumped to an incorrect conclusion and all the bugs got assigned to libtiff as a result. Although I had forgotten this, my original bug reports as well as some of the others did make the observation that gdk-pixbuf seemed to be a common thread and that some applications that use libtiff directly never suffered from this problem.

My other reaction to this is that I didn't realize the soname was actually tied to the library version. My understanding was the libtool/shared library versions are generally now not tied directly to the public versions of libraries but instead are otherwise meaningless numbers updated whenever needed. This is the whole -version-info stuff for libtool, right? Perhaps we haven't been doing it that way for libtiff and should. That is, I think the sonames should be decoupled from the published release numbers.
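As a rough sketch of what libtool's `-version-info current:revision:age` implies, the helpers below compute the soname and the installed file name using the mapping libtool applies on Linux/ELF systems (other platforms map the triple differently). The function names and the library name `example` are hypothetical, and the numbers are illustrative only; they are not libtiff's actual version-info values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the soname implied by -version-info current:revision:age,
 * assuming the Linux/ELF mapping: the soname major is current - age.
 * Returns the value returned by snprintf. */
int linux_soname(char *buf, size_t size, const char *name,
                 unsigned current, unsigned revision, unsigned age)
{
    (void)revision;                       /* revision does not affect the soname */
    assert(name != NULL && strlen(name) > 0);
    assert(age <= current);               /* libtool requires age <= current */
    return snprintf(buf, size, "lib%s.so.%u", name, current - age);
}

/* Write the full installed file name under the same Linux/ELF assumption:
 * lib<name>.so.<current - age>.<age>.<revision>. */
int linux_filename(char *buf, size_t size, const char *name,
                   unsigned current, unsigned revision, unsigned age)
{
    assert(name != NULL && strlen(name) > 0);
    assert(age <= current);
    return snprintf(buf, size, "lib%s.so.%u.%u.%u",
                    name, current - age, age, revision);
}
```

With `5:2:2` this yields `libexample.so.3` and `libexample.so.3.2.2`. Adding interfaces without breaking old ones would move the triple to `6:0:3`, which leaves the soname at `libexample.so.3`. The soname therefore tracks binary compatibility only and need not resemble the published release number.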

All that said, I am not averse to having the next release of libtiff (based on autoconf/libtool/etc) be called 4.0.0, though there isn't honestly any significant ABI change from 3.6.x or 3.5.x as far as I know. Normally my intention would be to upgrade the primary version number only when a pretty dramatic change is made in the source level interfaces, with the minor release number changing for ABI changes that don't generally require much if any change in application code.

I almost suggested this in my original message but backed away at the last minute. My rule is the same as what you said -- change the major version when there are source interface changes. Some applications tie the soname to the version and some don't. I prefer keeping them separate. That said, I'm not sure that it's really necessary to upgrade the soname for libtiff after all. The build changes combined with the (hopeful) reactivation of the LZW code could be a good excuse to go to 4.0.0, but if there really aren't source level interface changes....

I'm going to discuss the current Debian situation with the handful of people who decided to assign all the bugs to libtiff and see whether there's any consensus.

In Debian, the current situation is that none of the reported bugs are reproducible using the current versions of libraries. In most cases, there are no problems downgrading to 3.5.7, but in some cases, the situation is worse there. The segmentation faults that were present when 3.5.7 was first replaced with 3.6.1 in Debian are no longer reproducible with any of the test images attached to the bug reports.

I'm going to try a little harder to figure out why gqview and nautilus behave differently with 3.5.7 and 3.6.1. The gqview in Debian can't display any TIFF images, including uncompressed ones, with 3.5.7 but works just fine with 3.6.1.

If I can demonstrate conclusively that this is not the result of an ABI change, then I'll retract my request to call the next release 4.0.0 and change its soname on that basis. Of course, it may still be a good idea to decouple the soname and library version as discussed in your message and other responses, but it could be done just because it's a good idea to do in conjunction with moving to libtool rather than because of this specific problem.

I'll try to report back with more conclusive information.

--Jay