2003.11.20 05:16 "[Tiff] tiffcp analysis", by Ross Finlayson

I figure I should analyze tiffcp. After all, if I am complaining about it crashing, I should fix it. Recently I noticed that it crashes sometimes when compressing JPEG, or reading JPEG images, for example tiffcp -w 128 -l 128 -c jpeg quad-lzw.tif quad-tile-jpeg.tif.

Also, I figure I can add the OJPEG to JPEG conversion into it, so that if it gets an OJPEG file as input it could write a new JPEG file as output, in some far-fetched ideal scenario.

Also, we could see how to copy all the tags without having to type them in a list and manually update it as tag definitions change, as the tag definitions are stored in the TIFF with a corresponding TIFFFieldInfo struct.

Also, it could use raw conversion, for example for merely appending TIFFs instead of changing the structural details.

The entry point to tiffcp functionality is the main function. A variety of its variables are stored in static variables. I think it would be better to have a context or state struct with the variables, and pass it to all the functions, because it would make it easier to use tiffcp functions as library functions.
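
A minimal sketch of the context struct idea follows. The names TiffcpContext, tiffcp_context_init, and tiffcp_set_tiles, and the fields themselves, are hypothetical and chosen for illustration; the actual static variables in tiffcp.c may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context replacing file-scope statics. The field names are
 * illustrative and are not taken from tiffcp.c. */
typedef struct {
    int           out_tiled;       /* -1 means "inherit from input" */
    unsigned long tile_width;
    unsigned long tile_length;
    unsigned long rows_per_strip;
    int           compression;     /* -1 means "inherit from input" */
    int           ignore_errors;
} TiffcpContext;

/* Give every field a defined starting value so that callers never rely
 * on zero-initialized statics. */
static void tiffcp_context_init(TiffcpContext *ctx)
{
    assert(ctx != NULL);
    ctx->out_tiled = -1;
    ctx->tile_width = 0;
    ctx->tile_length = 0;
    ctx->rows_per_strip = 0;
    ctx->compression = -1;
    ctx->ignore_errors = 0;
}

/* Option handling writes to the caller's context, not to globals. */
static void tiffcp_set_tiles(TiffcpContext *ctx,
                             unsigned long width, unsigned long length)
{
    assert(ctx != NULL);
    ctx->out_tiled = 1;
    ctx->tile_width = width;
    ctx->tile_length = length;
}
```

Two callers, or two threads, can then each hold their own context without seeing each other's settings.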

The main function processes its arguments.

Here is the list of the functions besides the macros, usage, and main:

tiffcp()
processG3Options()
processCompressOptions()

pickCopyFunc()

openSrcImage()
nextSrcImage()

cpTag()
cpStripToTile()
cpSeparateBufToContigBuf()
cpImage()
cpContigBufToSeparateBuf()

The tiffcp(TIFF*, TIFF*) function accepts two open TIFF files, in and out, for reading and writing, and copies the tags and data from in to out.

Back to main, main opens with variables for in and out, getopt variables, tile and strip rows parameters, a directory offset, mode array and pointer alias, and it thus progresses.

The b option specifies a bias image to be masked over the output. That is the image processing option; the other options have to do with the output file configuration: tile and strip configuration, planar configuration, and compression.

The last argument of the argv argument to main is opened as out, the output TIFF for writing. This is somewhat dangerous as something accidental like tiffcp *.tif would copy all the files matching specification over the last file matched by the specification, as I know from overwriting a file. UNIX commands are explicit in their usage and not necessarily designed to protect the users from their actions, and in this case the file specification is done by the command shell, so glob couldn't be monitored to warn or ask for confirmation of an errant wildcard match.
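
The program cannot see the wildcard, but it could still check whether the output name already exists before opening it for writing. The following is only a sketch of that check, assuming a hypothetical helper named output_already_exists; tiffcp has no such function as far as this analysis shows.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a file with this name can already be opened for reading,
 * 0 otherwise. A caller could refuse to write, or ask for confirmation,
 * when this returns 1. */
static int output_already_exists(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```

A check like this is not atomic, since the file could appear between the test and the later open, so it is a convenience against typos and not a safety guarantee.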

For each of the file name arguments prior to the last one, openSrcImage is called with that char*, and the TIFF it returns is assigned to in, the input TIFF file. openSrcImage takes a char** argument and may call nextSrcImage to set a directory, using the syntax for selecting a directory. If the specification cannot be parsed, nextSrcImage calls exit(-4), which it notes as a syntax error, instead of the more generic exit(EXIT_FAILURE).

So in has been opened for reading and set to a directory. Then there is an endless for loop where functions for config, compression, fillorder, rowsperstrip, tilewidth, tilelength, and g3opts are called, then the tiffcp function is called, then the next directory is set, or if there are no more, the loop breaks or returns, the input file is closed, and the next file is opened through openSrcImage.

I don't see where it calls TIFFClose(out), it just exit(0)'s. It calls TIFFClose(out) if tiffcp(in, out) fails, and exit(1).

Then after the main function are defined a couple of functions to process the input arguments. Then a usage function is implemented. I don't know if usage is always supposed to be called usage and be static and void so that an external program can load the object image and call the usage function without calling main. I haven't seen any usage of that type of usage. The usage function calls TIFFGetVersion; invoking it would probably cause the libtiff shared or static object to be loaded, or loaded and linked, or whatever it is that a.out, ELF, PE/COFF, Mach-O, or what have you, do.

Then, there are the CopyField macros, with variously one, two, three, or four arguments; they resolve to "if (TIFFGetField(...)) TIFFSetField(...)". When the tag and the types of its arguments are known, they are handy.
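
The shape of such a macro can be sketched without libtiff. In the following, toy_get and toy_set are stand-ins written for this example and are not TIFFGetField and TIFFSetField; the do/while wrapper is an addition here so that the macro behaves as a single statement, and the macros in tiffcp.c may not use it.

```c
#include <assert.h>

/* Toy stand-ins for TIFFGetField/TIFFSetField over a tiny fixed table.
 * They are NOT the libtiff functions. */
#define TOY_NFIELDS 8

typedef struct {
    int  present[TOY_NFIELDS];
    long value[TOY_NFIELDS];
} ToyDir;

/* Returns 1 and stores the value if the field is present, else 0. */
static int toy_get(const ToyDir *d, int tag, long *v)
{
    if (tag < 0 || tag >= TOY_NFIELDS || !d->present[tag])
        return 0;
    *v = d->value[tag];
    return 1;
}

/* Returns 1 if the field was stored, 0 if the tag is out of range. */
static int toy_set(ToyDir *d, int tag, long v)
{
    if (tag < 0 || tag >= TOY_NFIELDS)
        return 0;
    d->present[tag] = 1;
    d->value[tag] = v;
    return 1;
}

/* Set the field on the output only if the input has it. */
#define COPY_FIELD(in, out, tag, v) \
    do { if (toy_get((in), (tag), &(v))) toy_set((out), (tag), (v)); } while (0)

/* Copies fields 1 and 2 when present, using one working variable. */
static void copy_two_fields(const ToyDir *in, ToyDir *out)
{
    long v;

    COPY_FIELD(in, out, 1, v);
    COPY_FIELD(in, out, 2, v);
}
```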

The next function is cpTag, presumably for "copy tag". Based on a data type argument, there is a switch statement on the data type that calls the CopyField macros on temporary or working variables named after the variable type, with v for value, e.g. floatv, or av for array value, e.g. floatav. I think the way it uses the variables is about as standard as libtiff field definitions get, considering we are using C: a typed language where everything is an int.

After the cpTag function is a static array of cpTag structs called tags. Its elements are a tag, like TIFFTAG_IMAGEWIDTH, the number of values, and the type of values for the tag, where the number, or rather, count, is an integer and the type is one of the libtiff definitions of the enumeration of the TIFF field types as specified in TIFF 6.0, extended from unsigned ints and ASCII strings of TIFF 5.0, ahhermmm. The static array does not end with a NULL entry; instead a macro NTAGS is defined that divides the size of the array by the size of one struct, so that entries are readily added to the list without specifying the count of the specified tags.
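
The counting idiom can be sketched as follows. The struct tag_entry, its type codes, and find_tag are placeholders for this example and are not libtiff's definitions; only the tag numbers 256, 257, and 269 are the real TIFF values for ImageWidth, ImageLength, and DocumentName.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the tag table: tag number, value count, and a type code.
 * The type codes are placeholders, not libtiff's TIFFDataType. */
struct tag_entry {
    unsigned short tag;
    unsigned short count;
    int            type;
};

static const struct tag_entry tags[] = {
    { 256, 1, 0 },   /* ImageWidth */
    { 257, 1, 0 },   /* ImageLength */
    { 269, 1, 1 },   /* DocumentName */
};

/* Element count derived from the array itself, so adding a row needs no
 * other change. */
#define NTAGS (sizeof(tags) / sizeof(tags[0]))

/* Linear search over the table; returns NULL when the tag is not listed. */
static const struct tag_entry *find_tag(unsigned short tag)
{
    size_t i;

    for (i = 0; i < NTAGS; i++)
        if (tags[i].tag == tag)
            return &tags[i];
    return NULL;
}
```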

This leads me into a brief aside about a preprocessor extension I would like to see, given a const char* "s" it expands to the "s", strlen(s), except it runs strlen itself: "s", 1, for something like write ( str("s"), out) expanding to write("s", 1, 1, out). Anyways.
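
For string literals, something close to this is already possible with sizeof, which is evaluated at compile time. The sketch below assumes the intended call is fwrite; STR_ARGS and write_marker are names made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Expands a string literal into three arguments: the pointer, an element
 * size of 1, and the length without the terminating NUL. No strlen call
 * is made at run time. Valid only for literals and arrays, not for
 * char* variables, where sizeof would give the pointer size. */
#define STR_ARGS(s) (s), 1, (sizeof(s) - 1)

/* Writes a fixed three-byte marker; returns the number of bytes written. */
static size_t write_marker(FILE *out)
{
    assert(out != NULL);
    return fwrite(STR_ARGS("II*"), out);
}
```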

The CopyField macros defined above work within the cpTag function; next the CopyTag macro is defined, which calls cpTag with in and out.

Then the tiffcp function is implemented. It copies image width, length, bits per sample, and samples per pixel. Then it copies compression and, depending on the compression, the compression-scheme-dependent tags. I'm not quite sure yet, but I think there might be better orders in which to copy some of the tags, given how the internal logic of TIFFGetField and TIFFSetField must handle a variety of tag specifications and their implications.

The tiffcp function then copies over to the output photometric interpretation, fill order as specified, orientation, rows per strip, planar configuration as specified, transfer function, colormap, compression scheme related tags, ICC profile, and a page number.

Then, a copy function is returned from the pickCopyFunc function, chosen according to the input, output, length, width, and samples per pixel.

There are then macros defined to construct the cpFunc, readFunc, and writeFunc.

/*
  * Contig -> contig by scanline for rows/strip change.
  */
/*
  * Contig -> contig by scanline while subtracting a bias image.
  */
/*
  * Strip -> strip for change in encoding.
  */
/*
  * Separate -> separate by row for rows/strip change.
  */
/*
  * Contig -> separate by row.
  */
/*
  * Separate -> contig by row.
  */

static void
cpStripToTile(uint8* out, uint8* in,
static void
cpContigBufToSeparateBuf(uint8* out, uint8* in,
static void
cpSeparateBufToContigBuf(uint8* out, uint8* in,
static int
cpImage(TIFF* in, TIFF* out, readFunc fin, writeFunc fout,

Those are the copy functions; then the read functions are defined. The cpImage function takes a read function and a write function as arguments: it allocates a buffer to hold the complete image, fills it with the read function, and writes it to the output with the write function.
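
The read-then-write pattern can be sketched with function pointers over a toy image type. ToyImage, read_all, write_all, and copy_image are hypothetical names for this example; the real cpImage works on TIFF handles and has a different signature.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy image standing in for an open TIFF; not a libtiff type. */
typedef struct {
    unsigned char *pixels;
    size_t         nbytes;
} ToyImage;

typedef int (*read_func)(const ToyImage *in, unsigned char *buf, size_t n);
typedef int (*write_func)(ToyImage *out, const unsigned char *buf, size_t n);

/* Reads n bytes of the image into buf; fails if the image is smaller. */
static int read_all(const ToyImage *in, unsigned char *buf, size_t n)
{
    if (n > in->nbytes)
        return 0;
    memcpy(buf, in->pixels, n);
    return 1;
}

/* Writes n bytes from buf into the image; fails if it does not fit. */
static int write_all(ToyImage *out, const unsigned char *buf, size_t n)
{
    if (n > out->nbytes)
        return 0;
    memcpy(out->pixels, buf, n);
    return 1;
}

/* Allocates one buffer for the whole image, fills it with fin, and
 * drains it with fout. Returns 1 on success, 0 on failure. */
static int copy_image(const ToyImage *in, ToyImage *out,
                      read_func fin, write_func fout, size_t n)
{
    unsigned char *buf;
    int ok;

    assert(in != NULL && out != NULL);
    if (n == 0)
        return 0;
    buf = malloc(n);
    if (buf == NULL)
        return 0;
    ok = fin(in, buf, n) && fout(out, buf, n);
    free(buf);
    return ok;
}
```

Holding the whole image in one buffer keeps the read and write sides independent, at the cost of memory proportional to the image size.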

DECLAREreadFunc(readContigStripsIntoBuffer)
DECLAREreadFunc(readSeparateStripsIntoBuffer)
DECLAREreadFunc(readContigTilesIntoBuffer)
DECLAREreadFunc(readSeparateTilesIntoBuffer)

DECLAREwriteFunc(writeBufferToContigStrips)
DECLAREwriteFunc(writeBufferToSeparateStrips)
DECLAREwriteFunc(writeBufferToContigTiles)
DECLAREwriteFunc(writeBufferToSeparateTiles)

Then there are some more copy functions.

/*
  * Contig strips -> contig tiles.
  */
DECLAREcpFunc(cpContigStrips2ContigTiles)
DECLAREcpFunc(cpContigStrips2SeparateTiles)
DECLAREcpFunc(cpSeparateStrips2ContigTiles)
DECLAREcpFunc(cpSeparateStrips2SeparateTiles)
DECLAREcpFunc(cpContigTiles2ContigTiles)
DECLAREcpFunc(cpContigTiles2SeparateTiles)
DECLAREcpFunc(cpSeparateTiles2ContigTiles)
DECLAREcpFunc(cpSeparateTiles2SeparateTiles)
DECLAREcpFunc(cpContigTiles2ContigStrips)
DECLAREcpFunc(cpContigTiles2SeparateStrips)
DECLAREcpFunc(cpSeparateTiles2ContigStrips)
DECLAREcpFunc(cpSeparateTiles2SeparateStrips)

Largely the copy functions implement the combinations of read and write functions to assimilate and express the configuration of strips and tiles, and separate planes of data and interleaved contiguous planes of data.
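
The conversion between interleaved and separate planes comes down to a strided copy. The sketch below assumes 8-bit samples; contig_row_to_plane and plane_to_contig_row are illustrative names and do not have the signatures of cpContigBufToSeparateBuf or cpSeparateBufToContigBuf.

```c
#include <assert.h>
#include <stddef.h>

/* Pulls sample number s of every pixel out of an interleaved row.
 * in:  npixels * spp bytes, laid out RGBRGB... when spp is 3
 * out: npixels bytes for the one requested plane
 * Assumes 8-bit samples. */
static void contig_row_to_plane(unsigned char *out, const unsigned char *in,
                                size_t npixels, size_t spp, size_t s)
{
    size_t i;

    assert(s < spp);
    for (i = 0; i < npixels; i++)
        out[i] = in[i * spp + s];
}

/* The inverse: scatters one plane back into an interleaved row, leaving
 * the other samples of each pixel untouched. */
static void plane_to_contig_row(unsigned char *out, const unsigned char *in,
                                size_t npixels, size_t spp, size_t s)
{
    size_t i;

    assert(s < spp);
    for (i = 0; i < npixels; i++)
        out[i * spp + s] = in[i];
}
```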

Then, there is implemented the pickCopyFunc function which returns a function pointer of one of those cp functions prototyped as a copyFunc.

The pickCopyFunc gets fields from the input and output TIFFs and rejects unsupported cases. Then, there is some kind of truth table implementation, some kind of poor man's finite state automaton, that evaluates to a big switch statement returning the appropriate copyFunc for the inputs and specifications.
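
The truth table idea can be sketched by packing the yes/no properties into the bits of an integer and switching on the result. This example is reduced to two properties, input tiled and output tiled; PACK, pick_copy_func, and the four toy copy functions are made up for illustration, and the real function considers more inputs.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*copy_func)(void);   /* toy signature for this sketch */

/* Toy copy functions; each returns a distinct code for identification. */
static int cp_strips_to_strips(void) { return 0; }
static int cp_strips_to_tiles(void)  { return 1; }
static int cp_tiles_to_strips(void)  { return 2; }
static int cp_tiles_to_tiles(void)   { return 3; }

/* Pack two boolean properties into one small integer. */
#define PACK(in_tiled, out_tiled) (((in_tiled) << 1) | (out_tiled))

/* Every row of the truth table becomes one case label. */
static copy_func pick_copy_func(int in_tiled, int out_tiled)
{
    /* Normalize any nonzero value to 1 before packing. */
    switch (PACK(in_tiled != 0, out_tiled != 0)) {
    case PACK(0, 0): return cp_strips_to_strips;
    case PACK(0, 1): return cp_strips_to_tiles;
    case PACK(1, 0): return cp_tiles_to_strips;
    case PACK(1, 1): return cp_tiles_to_tiles;
    }
    return NULL;   /* not reached with normalized inputs */
}
```

Adding a third property, such as planar configuration, would double the number of case labels, which is why the real switch is large.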

I see a few trivial things to modify that don't affect the structure at all; not all the functions are prototyped using the macros, for example.

Another issue is the fields that are copied: we want to extend libtiff so that tiffcp handles all the fields without modification as the library is extended.

Another issue is structural reorganization of the functions and some of their arguments to reduce or completely remove static variables and enable the tiffcp function to be more readily called from a stable system program.

Another point would be to increase diagnostics, error and warning reporting, or even variable reporting around a static variable, or variable of the state struct. By this point, I'm past having any motivation to do it myself but I'm quite happy to offer direction. That's not entirely true, I'd be happy to do it but just don't feel like it right now.

Another thing is I had this concept about copying OJPEG into new JPEG files, and also about copying JPEG data without decompressing it. JPEG data is mostly lossy data, and decompression and recompression almost always introduce (more) visual artifacts into the image.

There could be matched read and write functions which accomplish this, I think, I'm not quite sure yet, something to consider.

I don't see the function handling subdirectories generally, per TIFF Tech Note 1. Subdirectories are often used to store reduced image and image mask information, and are used in various high profiles as well.

Overall, I think it's sharp, but I want to put all the static variables in a struct passed to each function so it could be compiled into a library and used in a safe multi-threaded fashion. Then, I guess I will start trying to neaten it and make its organization consistent, so that more general ideas such as basic image processing could be readily implemented. Someone might want to deskew their images, or otherwise filter the data between read and write, as the bias mask example does, almost tortuously.

The point is not to make life difficult, it's to make it easy.

The basic function that could be exposed would be the copy directory. It would copy a directory and its subdirectories, where libtiff has functional support, so I hear, for one level of subdirectories. It would copy all the configured and extension tags via their TIFFFieldInfo or special case handling for legacy tags.

Ross F.