2017.06.27 20:51 "[Tiff] Excessive memory allocation while chopping strips", by Nicolas RUFF

2017.06.27 22:08 "Re: [Tiff] Excessive memory allocation while chopping strips", by Even Rouault

On Tuesday, June 27, 2017 at 22:51:43 CEST, Nicolas RUFF wrote:

I saw in https://trac.osgeo.org/gdal/changeset/39082 that you are trying to defend against overly large memory allocations.

I have a pathological test case from fuzzing where header values are:

ImageWidth = 1060
ImageLength = 536871550

It will hit the following piece of code in tif_dirread.c:

...
/* ... */
if ((tif->tif_dir.td_planarconfig==PLANARCONFIG_CONTIG)&&
   (tif->tif_dir.td_nstrips==1)&&
   (tif->tif_dir.td_compression==COMPRESSION_NONE)&&
   ((tif->tif_flags&(TIFF_STRIPCHOP|TIFF_ISTILED))==TIFF_STRIPCHOP))
    {
        if ( !_TIFFFillStriles(tif) || !tif->tif_dir.td_stripbytecount )
            return 0;
        ChopUpSingleUncompressedStrip(tif);
    }
...

ChopUpSingleUncompressedStrip() will compute the required number of strips and reach this point:

...
newcounts = (uint64*) _TIFFCheckMalloc(tif, nstrips, sizeof (uint64),
"for chopped \"StripByteCounts\" array");
...

In my case, nstrips = 268435775 (and rowsperstrip = 2), so it will try to allocate 268435775 * 8 bytes. This is already beyond 2**31, but we can push it even further, beyond 2**32, by playing with ImageLength.
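For reference, here is the arithmetic spelled out as a standalone snippet (not libtiff code; the ceiling division stands in for how the strip count is derived from ImageLength and rowsperstrip):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t imagelength = 536871550;  /* from the fuzzed header */
    uint64_t rowsperstrip = 2;         /* value computed for this file */

    /* number of strips needed to cover the image (ceiling division) */
    uint64_t nstrips = (imagelength + rowsperstrip - 1) / rowsperstrip;

    /* size requested from _TIFFCheckMalloc() for one of the synthetic arrays */
    uint64_t bytes = nstrips * sizeof(uint64_t);

    printf("nstrips = %llu\n", (unsigned long long)nstrips); /* 268435775 */
    printf("bytes   = %llu\n", (unsigned long long)bytes);   /* 2147486200, > 2**31 */
    return 0;
}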

Rather than playing this cat-and-mouse game of assigning arbitrary limits to every possible tag, I would suggest the following patch in tif_aux.c:

...
#include <math.h>
+#include <limits.h>
...
void*
_TIFFCheckRealloc(TIFF* tif, void* buffer,
                  tmsize_t nmemb, tmsize_t elem_size, const char* what)
{
        void* cp = NULL;
        tmsize_t bytes = nmemb * elem_size;

+       /* TIFF size is limited to 32-bit by design and 31-bit by implementation,
+        * so there should be no reason to ever allocate more than INT_MAX bytes.
+        */
+       if ((nmemb <= INT_MAX) && (elem_size <= INT_MAX) && (bytes <= INT_MAX))
                cp = _TIFFrealloc(buffer, bytes);

Note: this patch assumes that tmsize_t is 64 bits; otherwise the check would have to be done differently.
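For what it's worth, a sketch of how the check could look without assuming a 64-bit tmsize_t, reusing the division-style overflow test that _TIFFCheckRealloc() already performs (illustrative only):

tmsize_t bytes = nmemb * elem_size;
/* reject multiplication overflow first (the existing style of check in this
 * function), then apply the same INT_MAX cap as above */
if (nmemb && elem_size && bytes / elem_size == nmemb && bytes <= INT_MAX)
        cp = _TIFFrealloc(buffer, bytes);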

What do you think?

Glad you raised this point. I had wanted to bring up this topic (and go a bit beyond the particular case you mention), but haven't yet found the time.

BigTIFF may theoretically have strips larger than 4 GB (although it would be insane to craft such a file), so it might be legitimate to allocate that amount of memory.

And in your case it could even be a standard TIFF file with width=1 and height=2 billion that would fit in 2 GB. So either it has a single strip declared, and ChopUpSingleUncompressedStrip() is right to try to create a large number of strips that will require more than 2 GB of RAM, or it has 2 billion strips declared in the StripOffsets/StripByteCounts tags and you might have to try allocating and reading that from disk (in which case BigTIFF would be needed, since the size of those tags would be enormous)...
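(Rough numbers, assuming 8-bit single-band pixels and one row per strip: 1 x 2,000,000,000 pixels is about 2 GB of image data; the in-memory uint64 StripOffsets and StripByteCounts arrays would each take 2,000,000,000 * 8 bytes = ~16 GB; and even written as 4-byte LONG values, the two tags together would occupy ~16 GB on disk, far beyond the 4 GB that classic TIFF offsets can address, hence BigTIFF.)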

But indeed you raise a more general problem: libtiff may, in various places, allocate enormous amounts of virtual memory for very short files. I've tried working around a few situations at the application level in GDAL, but failed in a few occurrences, for example on an OJPEG file where TIFFReadDirectory() will force a call to _TIFFFillStriles() due to how the OJPEG codec works.

A potential solution would be to check the allocation request against the file size. One downside is that, in one of my use cases, I want to support reading TIFF (some particular formulations where the IFD is at the beginning) in a purely streaming way (from a pipe), so I have no idea of the file size. In other use cases, I read a gzipped TIFF in a streaming way from disk, where knowing the uncompressed size is very costly (it requires uncompressing the whole file). But I admit those are marginal cases.
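For illustration, a minimal sketch of what such a file-size sanity check could look like (the helper name and the 10x factor are made up for the example, not an actual or proposed libtiff API; it degrades to "no check" when the size callback cannot report anything meaningful, as in the streaming cases above):

/* Sketch: refuse allocation requests wildly larger than the file itself. */
static int
AllocationLooksReasonable(TIFF* tif, tmsize_t bytes)
{
    uint64 filesize = TIFFGetSizeProc(tif)(TIFFClientdata(tif));
    if (filesize == 0)
        return 1;   /* size unknown (pipe, streaming): cannot conclude */
    return (uint64)bytes <= 10 * filesize;   /* arbitrary sanity factor */
}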

I can see several ways of addressing the issue, which might be complementary:

--> moderately involved to implement

* as ChopUpSingleUncompressedStrip() may allocate memory unrelated to the file size, limit it to a maximum number of computed strips, say 1 million. This strip chopping mechanism is a band-aid for malformed files; we don't have to support it for files of insane dimensions, IMHO. Besides allocating a lot of memory, this can also take a few hundred milliseconds to compute the synthetic tag values (a sketch of such a cap follows after this list).

--> easy to implement

* replicate in libtiff the lazy fetching of the strip offsets and counts (read only the values for the current strip, actually a bit before/after, and cache that) that I've implemented in GDAL, so you don't need to fetch the whole tags (that would probably solve my OJPEG case that I can't work around from GDAL). This would also help a lot in situations where you switch back and forth among the IFDs of the same file (for example, if you display a big image with overviews and zoom in/out often, you end up re-reading them all the time if you only have a single TIFF handle).

--> involved to implement. Requires auditing the codebase to no longer use td_stripbytecount / td_stripoffset directly. Retrieval of the whole content of the corresponding tags through TIFFGetField() would be kept only for compatibility with external code.
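To make the first option concrete, a minimal sketch of what the cap in ChopUpSingleUncompressedStrip() could look like (the 1,000,000 constant is an arbitrary value for the example, and the surrounding nstrips computation is paraphrased from the existing function; bailing out simply keeps the file's single declared strip):

/* Sketch: give up on chopping rather than synthesizing an insane number of
 * strips; the lone declared strip is then used as-is. */
#define STRIP_CHOP_MAX_STRIPS 1000000

nstrips = TIFFhowmany_64(bytecount, stripbytes);
if (nstrips == 0 || nstrips > STRIP_CHOP_MAX_STRIPS)
        return;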

Even

--
Spatialys - Geospatial professional services
http://www.spatialys.com