AWARE [SYSTEMS] Imaging expertise for the Delphi developer

LibTiff Mailing List

TIFF and LibTiff Mailing List Archive
November 2003

The TIFF Mailing List Homepage
This list is run by Frank Warmerdam
Archive maintained by AWare Systems






2003.11.20 05:16 "tiffcp analysis", by Ross Finlayson

tiffcp analysis

I figure I should analyze tiffcp.  After all, if I am complaining about 
it crashing, I should fix it.  Recently I noticed that it crashes 
sometimes when compressing JPEG, or reading JPEG images, for example 
tiffcp -w 128 -l 128 -c jpeg quad-lzw.tif quad-tile-jpeg.tif.

Also, I figure I can add the OJPEG to JPEG conversion into it so that if 
it gets an OJPEG file as input, it could write a new JPEG file as 
output, in some far-fetched ideal scenario.

Also, we could see how to copy all the tags without having to type them 
in a list and manually update it as tag definitions change, since the 
tag definitions are stored in the TIFF with a corresponding 
TIFFFieldInfo struct.

Also, it could use raw conversion, for example for merely appending 
TIFFs instead of changing the structural details.

The entry point to tiffcp functionality is the main function.  A variety 
of its variables are stored in static variables.  I think it would be 
better to have a context or state struct with the variables, and pass it 
to all the functions, because it would make it easier to use tiffcp 
functions as library functions.
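
As a rough sketch of the context struct idea: the struct name 
CpContext, its fields, and the two helper functions below are all 
hypothetical.  None of them exist in tiffcp, and the field list is only 
a guess at the kind of settings main collects from its options.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for tiffcp's file-scope static variables.
 * Field names are illustrative; they are not taken from tiffcp.c. */
typedef struct {
    uint16_t compression;    /* requested output compression scheme */
    uint16_t config;         /* requested planar configuration */
    uint16_t fillorder;      /* requested fill order */
    uint32_t rowsperstrip;   /* 0 means "not specified" */
    uint32_t tilewidth;      /* 0 means "not tiled" */
    uint32_t tilelength;     /* 0 means "not tiled" */
    int      ignore_errors;  /* nonzero: keep going after read errors */
} CpContext;

/* Put the context into a known state before option parsing. */
static void cp_context_init(CpContext *ctx)
{
    memset(ctx, 0, sizeof *ctx);
}

/* Example of a function that takes the context explicitly instead of
 * reading globals: report whether tiled output was requested. */
static int cp_wants_tiles(const CpContext *ctx)
{
    return ctx->tilewidth != 0 && ctx->tilelength != 0;
}
```

With something like this, each caller owns its own CpContext, so two 
copies running in different threads would not share option state.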

The main function processes its arguments.

Here is the list of the functions besides the macros, usage, and main:

tiffcp()
processG3Options()
processCompressOptions()

pickCopyFunc()

openSrcImage()
nextSrcImage()

cpTag()
cpStriptoTile()
cpSeparateBufToContigBuf()
cpImage()
cpContigBufToSeparateBuf()

The tiffcp(TIFF*, TIFF*) function accepts two open TIFF files, in and 
out, for reading and writing; it copies the tags and data from in to out.

Back to main, main opens with variables for in and out, getopt 
variables, tile and strip rows parameters, a directory offset, mode 
array and pointer alias, and it thus progresses.

The b option specifies a bias image to be masked over the output.  That 
is the image processing option; the other options have to do with the 
output file configuration:  tile and strip configuration, planar 
configuration, and compression.

The last element of the argv argument to main is opened as out, the 
output TIFF for writing.  This is somewhat dangerous, as something 
accidental like tiffcp *.tif would copy all the files matching the 
specification over the last file matched, as I know from overwriting a 
file.  UNIX commands are explicit in their usage and not necessarily 
designed to protect users from their actions, and in this case the file 
specification is expanded by the command shell, so tiffcp never sees 
the glob and could not warn or ask for confirmation of an errant 
wildcard match.

For each of the file name arguments prior to the last one, openSrcImage 
is called with that char*, and the TIFF it returns is assigned to in, 
the input TIFF file.  openSrcImage takes a char** argument and may call 
nextSrcImage to set a directory, using the syntax for selecting a 
directory.  If the specification cannot be parsed, nextSrcImage calls 
exit(-4), which it notes as a syntax error, instead of the more generic 
exit(EXIT_FAILURE).
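
A library-friendly alternative to calling exit from the parser would be 
to return an error code.  Here is a minimal sketch, assuming the 
directory selection is a comma-separated list of numbers following the 
file name (as in name,1,3).  The function next_dir_number is 
hypothetical and is not the code in nextSrcImage.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the next directory number from a
 * specification such as "1,3" (the part after "name,").
 * On success, stores the number in *dirnum, advances *spec past it
 * (and past one following comma), and returns 0.
 * On a syntax error, returns -1 and leaves *spec and *dirnum
 * unchanged, so the caller decides how to fail. */
static int next_dir_number(const char **spec, unsigned long *dirnum)
{
    const char *s = *spec;
    char *end;
    unsigned long n;

    if (*s < '0' || *s > '9')
        return -1;                  /* must start with a digit */
    errno = 0;
    n = strtoul(s, &end, 10);
    if (errno == ERANGE)
        return -1;                  /* number too large */
    if (*end != '\0' && *end != ',')
        return -1;                  /* trailing garbage */
    if (*end == ',')
        end++;                      /* skip the separator */
    *dirnum = n;
    *spec = end;
    return 0;
}
```

The caller, not the parser, would then choose between printing a 
message, skipping the file, or exiting.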

So in has been opened for reading and set to a directory.  Then there is 
an endless for loop where functions for config, compression, fillorder, 
rowsperstrip, tilewidth, tilelength, and g3opts are called, then the 
tiffcp function is called, then the next directory is set, or if there 
are no more, the loop breaks or returns, the input file is closed, and 
the next file is opened through openSrcImage.

I don't see where it calls TIFFClose(out) on success; it just 
exit(0)'s.  It does call TIFFClose(out) if tiffcp(in, out) fails, and 
then exit(1).

After the main function, a couple of functions are defined to process 
the input arguments.  Then a usage function is implemented.  I don't 
know if usage is always supposed to be called usage and be static and 
void so that an external program could load the object image and call 
the usage function without calling main.  I haven't seen any usage of 
that type of usage.  The usage function calls TIFFGetVersion; invoking 
it would probably cause the libtiff shared or static object to be 
loaded, or loaded and linked, or whatever it is that a.out, ELF, 
PE/COFF, Mach-O, or what have you do.

Then, there are the CopyField macros, with variously one, two, three, or 
four arguments; they resolve to "if (TIFFGetField(...)) 
TIFFSetField(...)".  When the tag and types of arguments are known, they 
are handy.

The next function is cpTag, presumably for "copy tag".  It takes a data 
type argument and switches on it to call the CopyField macros on 
temporary working variables named after the value type, with v for 
value, e.g. floatv, or av for array value, e.g. floatav.  I think the 
way it uses the variables is about as standard as libtiff field 
definitions get, considering we are using C: a typed language where 
everything is an int.

After the cpTag function is a static array of cpTag structs called 
tags.  Its elements are a tag, like TIFFTAG_IMAGEWIDTH, the number of 
values and the type of values for the tag, where the number, or rather, 
count, is an integer and the type is one of the libtiff definitions of 
the enumeration of the TIFF field types as specified in TIFF 6.0, 
extended from unsigned ints and ASCII strings of TIFF 5.0, ahhermmm.  
The static array does not end with a NULL entry; instead a macro NTAGS 
is defined that divides the size of the array by the size of one struct, 
so that entries are readily added to the list without specifying the 
count of the specified tags.
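
The NTAGS idiom can be sketched as follows.  The struct and table here 
are simplified stand-ins with numeric tag and type codes written out by 
hand, not the actual definitions in tiffcp.

```c
#include <stddef.h>

/* Simplified stand-in for the tag table; the real table uses
 * libtiff's tag constants and TIFFDataType. */
struct tag_entry {
    unsigned short tag;    /* tag number */
    unsigned short count;  /* number of values */
    int type;              /* field type code */
};

static const struct tag_entry tag_table[] = {
    { 254, 1, 4 },  /* NewSubfileType, one LONG */
    { 269, 1, 2 },  /* DocumentName, ASCII */
    { 270, 1, 2 },  /* ImageDescription, ASCII */
};

/* Element count computed at compile time: adding a row to the
 * table updates NTAGS automatically. */
#define NTAGS (sizeof(tag_table) / sizeof(tag_table[0]))
```

A loop over the table then runs from 0 to NTAGS - 1 with no sentinel 
entry to maintain.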

This leads me into a brief aside about a preprocessor extension I would 
like to see: given a const char* "s", it expands to "s", strlen(s), 
except that the preprocessor runs strlen itself, giving "s", 1.  So 
something like write(str("s"), out) would expand to 
write("s", 1, 1, out).  Anyways.
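
For string literals, standard C gets close to this with sizeof, no 
extension needed.  A minimal sketch, assuming the write in the example 
behaves like fwrite; the macro name STR_ARGS and the function 
write_greeting are made up for illustration.

```c
#include <stdio.h>

/* Expands a string literal into the "ptr, size, count" arguments of
 * fwrite. sizeof on a literal counts the terminating NUL, hence the
 * -1. Works only for literals and char arrays, not for pointers. */
#define STR_ARGS(s) (s), 1, (sizeof(s) - 1)

/* Hypothetical usage: write a literal without calling strlen.
 * Returns the number of bytes written. */
static size_t write_greeting(FILE *out)
{
    return fwrite(STR_ARGS("hello\n"), out);
}
```

The length is a compile-time constant, so nothing is computed at run 
time.  Passing a char* instead of a literal would silently yield the 
pointer size minus one, which is the main hazard of the trick.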

The CopyField macros defined above work within the cpTag function; 
next, the CopyTag macro is defined, which calls cpTag with in and out.

Then the tiffcp function is implemented.  It copies image width, 
length, bits per sample, and samples per pixel.  Then it copies 
compression and, depending on the compression, copies the tags specific 
to that compression scheme.  I'm not quite sure yet, but I think there 
might be better orders in which to copy some of the tags, given how the 
internal logic of TIFFGetField and TIFFSetField must handle a variety 
of tag specifications and their implications.

The tiffcp function then copies over to the output photometric 
interpretation, fill order as specified, orientation, rows per strip, 
planar configuration as specified, transfer function, colormap, 
compression scheme related tags, ICC profile, and a page number.

Then, a copy function is obtained from the pickCopyFunc function, 
chosen variously from the input, output, length, width, and samples per 
pixel.

There are then macros defined to construct the cpFunc, readFunc, and 
writeFunc.

/*
  * Contig -> contig by scanline for rows/strip change.
  */
/*
  * Contig -> contig by scanline while subtracting a bias image.
  */
/*
  * Strip -> strip for change in encoding.
  */
/*
  * Separate -> separate by row for rows/strip change.
  */
/*
  * Contig -> separate by row.
  */
/*
  * Separate -> contig by row.
  */

static void
cpStripToTile(uint8* out, uint8* in,
static void
cpContigBufToSeparateBuf(uint8* out, uint8* in,
static void
cpSeparateBufToContigBuf(uint8* out, uint8* in,
static int
cpImage(TIFF* in, TIFF* out, readFunc fin, writeFunc fout,


Those are the copy functions; next come the read functions.  The 
cpImage function takes a read function and a write function as 
arguments: it allocates a buffer to hold the complete image, fills it 
with the read function, and writes it to the output with the write 
function.
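
The shape of that pattern, reduced to plain memory buffers, might look 
like the sketch below.  The typedefs read_fn and write_fn and all three 
functions are hypothetical simplifications; the real readFunc and 
writeFunc operate on TIFF handles and take the image geometry.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified signatures for the reader and writer. */
typedef int (*read_fn)(const void *src, uint8_t *buf, size_t size);
typedef int (*write_fn)(void *dst, const uint8_t *buf, size_t size);

/* Allocate one buffer for the whole image, fill it with the reader,
 * then hand it to the writer. Returns 1 on success, 0 on failure. */
static int copy_image(const void *src, void *dst, size_t size,
                      read_fn fin, write_fn fout)
{
    uint8_t *buf = malloc(size);
    int ok = 0;

    if (buf == NULL)
        return 0;
    if (fin(src, buf, size))
        ok = fout(dst, buf, size);
    free(buf);
    return ok;
}

/* Trivial reader and writer over plain memory, for illustration. */
static int read_mem(const void *src, uint8_t *buf, size_t size)
{
    memcpy(buf, src, size);
    return 1;
}

static int write_mem(void *dst, const uint8_t *buf, size_t size)
{
    memcpy(dst, buf, size);
    return 1;
}
```

Because the whole image passes through one buffer, any filter between 
read and write has a single place to hook in, at the cost of holding 
the complete image in memory.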

DECLAREreadFunc(readContigStripsIntoBuffer)
DECLAREreadFunc(readSeparateStripsIntoBuffer)
DECLAREreadFunc(readContigTilesIntoBuffer)
DECLAREreadFunc(readSeparateTilesIntoBuffer)

DECLAREwriteFunc(writeBufferToContigStrips)
DECLAREwriteFunc(writeBufferToSeparateStrips)
DECLAREwriteFunc(writeBufferToContigTiles)
DECLAREwriteFunc(writeBufferToSeparateTiles)

Then there are some more copy functions.

/*
  * Contig strips -> contig tiles.
  */
DECLAREcpFunc(cpContigStrips2ContigTiles)
DECLAREcpFunc(cpContigStrips2SeparateTiles)
DECLAREcpFunc(cpSeparateStrips2ContigTiles)
DECLAREcpFunc(cpSeparateStrips2SeparateTiles)
DECLAREcpFunc(cpContigTiles2ContigTiles)
DECLAREcpFunc(cpContigTiles2SeparateTiles)
DECLAREcpFunc(cpSeparateTiles2ContigTiles)
DECLAREcpFunc(cpSeparateTiles2SeparateTiles)
DECLAREcpFunc(cpContigTiles2ContigStrips)
DECLAREcpFunc(cpContigTiles2SeparateStrips)
DECLAREcpFunc(cpSeparateTiles2ContigStrips)
DECLAREcpFunc(cpSeparateTiles2SeparateStrips)

Largely the copy functions implement the combinations of read and write 
functions to assimilate and express the configuration of strips and 
tiles, and separate planes of data and interleaved contiguous planes of 
data.
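
The contiguous versus separate distinction comes down to interleaving.  
A minimal sketch, assuming 8-bit samples and no row padding; the two 
functions are hypothetical and much simpler than 
cpContigBufToSeparateBuf and cpSeparateBufToContigBuf.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplification: 8-bit samples, no row padding.
 * Gathers sample number s of every pixel from an interleaved
 * buffer (spp samples per pixel) into a single-plane buffer. */
static void contig_to_separate(uint8_t *out, const uint8_t *in,
                               size_t npixels, unsigned spp, unsigned s)
{
    size_t i;

    for (i = 0; i < npixels; i++)
        out[i] = in[i * spp + s];
}

/* The inverse: scatters one plane back into an interleaved buffer,
 * leaving the other samples of each pixel untouched. */
static void separate_to_contig(uint8_t *out, const uint8_t *in,
                               size_t npixels, unsigned spp, unsigned s)
{
    size_t i;

    for (i = 0; i < npixels; i++)
        out[i * spp + s] = in[i];
}
```

For an RGB image, calling contig_to_separate three times with s set to 
0, 1, and 2 yields the three planes.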

Then, there is implemented the pickCopyFunc function which returns a 
function pointer of one of those cp functions prototyped as a copyFunc.

The pickCopyFunc gets fields from the input and output TIFFs and rejects 
unsupported cases.  Then, there is some kind of truth table 
implementation, some kind of poor man's finite state automaton, that 
evaluates to a big switch statement returning the appropriate copyFunc 
for the inputs and specifications.
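
The truth table as a switch can be sketched by packing the boolean 
inputs into one integer.  This is an illustration with only two inputs 
and placeholder copy routines; the macro PACK and every function name 
here are hypothetical, and the real pickCopyFunc considers more 
properties than these.

```c
#include <stddef.h>

typedef int (*copy_fn)(void);

/* Placeholder copy routines; each returns a distinct id. */
static int strips_to_strips(void) { return 1; }
static int strips_to_tiles(void)  { return 2; }
static int tiles_to_strips(void)  { return 3; }
static int tiles_to_tiles(void)   { return 4; }

/* Pack the boolean inputs into one integer so that every
 * combination becomes a distinct case label. */
#define PACK(in_tiled, out_tiled) (((in_tiled) << 1) | (out_tiled))

/* Select a copy routine from the input and output layouts. */
static copy_fn pick_copy(int in_tiled, int out_tiled)
{
    /* Normalize to 0 or 1 so that any nonzero value counts as true. */
    switch (PACK(in_tiled ? 1 : 0, out_tiled ? 1 : 0)) {
    case PACK(0, 0): return strips_to_strips;
    case PACK(0, 1): return strips_to_tiles;
    case PACK(1, 0): return tiles_to_strips;
    case PACK(1, 1): return tiles_to_tiles;
    }
    return NULL;    /* not reached with normalized inputs */
}
```

Each additional boolean input doubles the number of case labels, which 
is why the real switch statement is big.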


I see a few trivial things to modify that don't affect the structure at 
all; not all the functions are prototyped using the macros, for example.

Another issue is the fields that are copied, we want to extend libtiff 
so that tiffcp handles all the fields without modification as the 
library is extended.

Another issue is structural reorganization of the functions and some of 
their arguments to reduce or completely remove static variables and 
enable the tiffcp function to be more readily called from a stable 
system program.

Another point would be to increase diagnostics, error and warning 
reporting, or even variable reporting around a static variable, or 
variable of the state struct.  By this point, I'm past having any 
motivation to do it myself but I'm quite happy to offer direction.  
That's not entirely true, I'd be happy to do it but just don't feel like 
it right now.

Another thing is I had this concept about copying OJPEG into new JPEG 
files, and also about copying JPEG data without decompressing it.  JPEG 
data is mostly lossy data, and decompression and recompression almost 
always introduce (more) visual artifacts into the image.

There could be matched read and write functions which accomplish this, I 
think, I'm not quite sure yet, something to consider.

I don't see the function handling subdirectories generally, per TIFF 
Tech Note 1.  Subdirectories are often used to store reduced image and 
image mask information, and are used in various high profiles as well.

Overall, I think it's sharp, but I want to put all the static variables 
in a struct passed to each function so it could be compiled into a 
library and used in a safe multi-threaded fashion.  Then, I guess I will 
start trying to neaten it and make its organization consistent, so that 
there could be ready implementation of more general ideas such as basic 
image processing.  Someone might want to deskew their images, or 
otherwise filter the data between read and write, as the bias mask 
example does, almost tortuously.

The point is not to make life difficult, it's to make it easy.

The basic function that could be exposed would be the copy directory.  
It would copy a directory and its subdirectories, where libtiff has 
functional support, so I hear, for one level of subdirectories.  It 
would copy all the configured and extension tags via their TIFFFieldInfo 
or special case handling for legacy tags.

Ross F.