AWARE [SYSTEMS] Imaging expertise for the Delphi developer
AWare Systems, Imaging expertise for the Delphi developer, Home TIFF and LibTiff Mailing List Archive

LibTiff Mailing List

TIFF and LibTiff Mailing List Archive
June 2008

Previous Thread
Next Thread

Previous by Thread
Next by Thread

Previous by Date
Next by Date

Contact

The TIFF Mailing List Homepage
This list is run by Frank Warmerdam
Archive maintained by AWare Systems



Valid HTML 4.01!



Thread

2008.06.19 21:34 "Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Steve Eddins
2008.06.19 22:04 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Frank Warmerdam
2008.06.20 12:02 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Steve Eddins
2008.06.20 13:38 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Bob Friesenhahn
2008.06.20 17:19 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Andrey Kiselev
2008.06.20 18:19 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Edward Lam
2008.06.22 17:15 "Re: Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Steve Eddins

2008.06.19 21:34 "Scalability problem in tif_dirread.c - detecting loops in IFD offsets", by Steve Eddins

Hi,

The file tif_dirread.c in 3.8.2 (and back at least to 3.7.0) contains code
meant to detect a loop in the list of IFD offsets.  The detection code
maintains an array of IFD offsets already seen.  When reading each new
directory, it scans the array to see if the next offset has been seen
before.  Then it realloc's the array to make room for processing the next
directory.

I think this particular code is related to a performance problem reported by
some of our users.  Believe it or not, some of our users are working with
TIFF files containing tens of thousands of IFDs.  They are complaining that
our software takes longer and longer to read each image in the file.  For
example, a Sandia National Laboratory user sent us a 1.1 GB file containing
around 129,000 IFDs.  It takes about 200 time longer to read images at the
end of the IDF chain, compared to the time it takes to read images at the
beginning.  I suspect that the loop-detection code is causing the time to
read the N-th image to be N^2.  (If I understand the code correctly, there
are 129,000 calls to realloc required just to arrive at the last IFD, not to
mention all those linear scans.)  The constant of proportionality is very
low, so you don't notice the N^2 behavior unless you are processing files
with a very large number of IFDs.

I'm curious about this code and why it was put in.  Was it inserted in
response to actual perverted TIFF files detected in the wild?  Or just
plain-ol' cautious programming?

Either way, the current implementation scales badly with the number of IFDs.

I'm considering writing my own routine to quickly scan a TIFF file to
determine the list of IFD offsets, and then patching tif_dirread.c to add a
new method that can read an IFD at a specified offset.  This would allow us
to provide our users with much faster "random IFD" access for their TIFF
files.  I'm wondering if I need to implement loop detection in my IFD offset
list routine.  If so, I'd be tempted to use Floyd's cycle detection
algorithm, which is linear time and requires no extra space.

Does anyone have any thoughts about the cycle detection code in
tif_dirread.c, or about my ideas for implementing a work-around?

Thanks,

Steve Eddins
The MathWorks, Inc.