2006.08.24 07:29 "[Tiff] IPTC tag", by Joris Van Damme

2006.08.24 21:34 "Re: [Tiff] IPTC tag", by Chris Cox

You need to special case the IPTC tag to not do byte order swapping, and always read as bytes (the Photoshop parser does this).

Yes, Photoshop writes the IPTC tag in the funny way to make it work with the wire service agency's software.

if (tag == IPTC_tag)
    // calculate the length in bytes from the data type and the count

    // read that number of bytes, ignoring the data type (no byte swapping)
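
A minimal Python sketch of that special case, for anyone who wants something more concrete. The type sizes and the tag number 33723 are the standard TIFF values; the function name `read_iptc_bytes` and the idea that the whole file sits in a buffer `buf` are assumptions made just for this illustration.

```python
# Size in bytes of each TIFF field type, keyed by the TIFF 6.0 type code.
TIFF_TYPE_SIZES = {
    1: 1,   # BYTE
    2: 1,   # ASCII
    3: 2,   # SHORT
    4: 4,   # LONG
    5: 8,   # RATIONAL
    6: 1,   # SBYTE
    7: 1,   # UNDEFINED
    8: 2,   # SSHORT
    9: 4,   # SLONG
    10: 8,  # SRATIONAL
    11: 4,  # FLOAT
    12: 8,  # DOUBLE
}

IPTC_TAG = 33723  # TIFF tag number of the IPTC block


def read_iptc_bytes(buf, tag, field_type, count, offset):
    """Return the raw IPTC bytes, untouched by byte-order swapping.

    Hypothetical helper: buf holds the file contents, and the other
    arguments come from the IFD entry of the tag.
    """
    if tag != IPTC_TAG:
        raise ValueError("not the IPTC tag")
    # The declared type is only used to work out how many bytes to read.
    length = TIFF_TYPE_SIZES[field_type] * count
    return bytes(buf[offset:offset + length])
```

Whether the tag is declared as 2 longs or as 8 undefined bytes, the same 8 bytes come back in file order.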

On 8/24/06 12:29 AM, "Joris" <joris.at.lebbeke@skynet.be> wrote:

I see I've some catching up to do, reading a thread about IPTC (and other) tags. I thus thought I'd throw a document in the list that I have lying about, on the subject of that nasty IPTC tag. It's only a draft, but if the list agrees I think it may make a nice addition to the info that is in http://www.awaresystems.be/imaging/tiff/tifftags.html. Any comments are highly appreciated.


The situation with the IPTC tag is particularly nasty. For all those concerned, I'll try and make a complete summary of the state it's in, and a humble recommendation on how to read and write it.

For starters, the IPTC specification clearly shows how the data is organised in meaningful units of variable bitlength. The *single* correct way to encode that is to use a tag of datatype undefined, and pass the bytecount as tag count (i.e. the length of datatype undefined is actually defined, it's the same length as byte).

However, some bozo somewhere in the early days of IPTC in TIFF decided to write the stuff as long datatype, with a count such that the tag data length is the length of the IPTC block padded with up to 3 zero bytes. As far as I know, the bozo that decided this first was likely using LibTiff. So let's look at what happens:

Essentially, that means there is no way to predict whether or not swapping the data as longs (again, and thus unswapping actually) is needed, before they start making sense again the IPTC way... One could speculate that bozo was using early versions of LibTiff, and likely wrote files in the machine's native byte order, which at one time was even the only supported mode as far as I know. So the following cases are likely the most common:

On the other hand, other vendors, including Adobe, sought to cure wrong by doing the same wrong... But who knows what exact wrong they did as to the above detailed byteorder issue? So, likely, the only good conclusion on the current state of affairs, is that just about anything is out there, all combinations go.
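
To make the byte order problem concrete, here is a small Python sketch of the padding to longs and of what a swap as longs does to the bytes. The sample IPTC block is made up for the illustration, and the helper names `pad_to_longs` and `swap_longs` are hypothetical.

```python
def pad_to_longs(iptc):
    """Pad an IPTC block with zero bytes to a multiple of 4.

    Returns the padded data and the count to store with datatype long.
    """
    padded = iptc + b"\x00" * (-len(iptc) % 4)
    return padded, len(padded) // 4


def swap_longs(data):
    """Reverse the bytes within each 4-byte group, as a reader that
    byte-swaps an array of longs would do."""
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


# Made-up sample: one IPTC dataset (marker 0x1C, record 2, dataset 5,
# two-byte length 5, then the 5 data bytes).
iptc = b"\x1c\x02\x05\x00\x05title"

padded, count = pad_to_longs(iptc)   # 10 bytes become 12, count is 3
swapped = swap_longs(padded)         # no longer starts with 0x1C
restored = swap_longs(swapped)       # swapping again undoes the damage
```

After one swap the leading 0x1C marker ends up in the fourth byte, so the block no longer parses as IPTC. A second swap restores it, which is why a reader has to find out which state the data is in.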

Put all together, likely the best strategy in reading the totally screwed up IPTC data is the following:

if datatype is undefined, or byte
    OK, this is good, read as any tag, interpret according to IPTC spec
else if datatype is different from long
    this is real bad, unexpected and unable to read
else
    read with LibTiff or whatever library you are using as if the data
        is indeed an array of long, even knowing that it isn't
    actually check the data reinterpreted as undefined array (IPTC
        defined byte array) of appropriate length (*4), see from
        the first actual bytes if the data makes any sense
    if so
        OK, you've got your data
    else
        swap the byteorder of the data interpreted as longs
        reinterpret the results as undefined array, they should be valid
            IPTC now
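
The strategy above could be sketched in Python roughly as follows. The plausibility check is an assumption on my part: it only tests that the block starts with the 0x1C marker that begins every IPTC dataset, which is far from a full validation. The names `looks_like_iptc` and `recover_iptc` are hypothetical, and unlike the outline above, the sketch also rechecks the data after swapping rather than trusting it blindly.

```python
TYPE_BYTE, TYPE_LONG, TYPE_UNDEFINED = 1, 4, 7
IPTC_MARKER = 0x1C  # every IPTC dataset starts with this byte


def looks_like_iptc(data):
    """Cheap plausibility check (an assumption, not a full IPTC parse).

    A dataset header is 5 bytes: marker, record, dataset, 2-byte length.
    """
    return len(data) >= 5 and data[0] == IPTC_MARKER


def swap_longs(data):
    """Reverse the bytes within each 4-byte group."""
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def recover_iptc(field_type, raw):
    """Return usable IPTC bytes, or None if the tag can't be read.

    raw is the tag data as bytes in memory, exactly as your TIFF
    library handed it over after reading it with the declared type.
    """
    if field_type in (TYPE_UNDEFINED, TYPE_BYTE):
        return raw                    # the good case
    if field_type != TYPE_LONG:
        return None                   # unexpected datatype, give up
    if looks_like_iptc(raw):
        return raw                    # already in IPTC byte order
    swapped = swap_longs(raw)         # undo (or redo) the long swap
    return swapped if looks_like_iptc(swapped) else None
```

Note that data recovered from longs may still carry up to 3 trailing zero bytes of padding, which the IPTC parser has to tolerate.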

At the cost of one added difficulty (using your existing solution to interpret IPTC in a new way, i.e. to check whether some block of data might be valid IPTC instead of going through a full decoding session (startup)), this gets you full reading support for anything that is out there that is IPTC at all, including what any bozos write. If that cost is too great, I recommend as a second best solution to accept IPTC data only when passed as datatype *undefined* or *byte*, in which case you have no difficulty at all and can defend your case as at least supportive of the valid.

Put all together, likely you don't want to write IPTC data yourself, at all if you can avoid it, but especially you *don't* want to write it with datatype long. Use the only correct datatype undefined. As a second best, use the datatype byte if for some strange reason you can't manage to do undefined.
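
If you do have to write the tag yourself without a library, the recommended encoding comes down to a 12-byte IFD entry like the one in this Python sketch. The tag number 33723 and type code 7 (undefined) are the standard TIFF values; the function name `iptc_ifd_entry` and its parameters are hypothetical, and the sketch assumes the IPTC block is longer than 4 bytes, so that it is stored at an offset rather than inline in the entry.

```python
import struct

IPTC_TAG = 33723      # TIFF tag number of the IPTC block
TYPE_UNDEFINED = 7    # the only correct datatype for IPTC data


def iptc_ifd_entry(iptc, data_offset, little_endian=True):
    """Build the 12-byte IFD entry for an IPTC block stored at data_offset.

    The count is simply the byte count of the block: no padding to a
    multiple of 4, and nothing for a reader to byte-swap.
    """
    prefix = "<" if little_endian else ">"
    # tag (2 bytes), type (2 bytes), count (4 bytes), offset (4 bytes)
    return struct.pack(prefix + "HHII", IPTC_TAG, TYPE_UNDEFINED,
                       len(iptc), data_offset)
```

The IPTC bytes themselves are then written unchanged at `data_offset`, whatever the byte order of the file.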