2008.11.17 21:44 "Re: [Tiff] Memory leak (TIFFOpen, TIFFReadTile)?", by John

2008/11/17 Frank Warmerdam <warmerdam@pobox.com>:

I have found that memory-mapped I/O is completely FUBARed, even on Linux. Even if the kernel is ultimately going to release the memory, the system still thrashes like crazy. It really shouldn't be the default mode for TIFFOpen().

I am interested in advice on whether to change TIFFOpen() to *not* use memory-mapped I/O by default.

I tried a couple of benchmarks:

1) Sequential read of a large file

I made a 1.8 GB, 8-bit RGB TIFF organised in strips and read it sequentially. Both read() and mmap() took about 60 s real, 12 s user and 5 s sys. There was quite a lot of variability between runs, but no clear winner.

Watching progress in "top", you could see that the read() version had a steady, low RSS, whereas the mmap() one slowly crept up to 1.8 GB as the operation ran. However, resident shared pages are counted in RSS, so this wasn't a "real" 1.8 GB process; it was just showing (in effect) the system disc cache as part of the process size. The read() version will have just as dramatic an effect on the state of memory, since the kernel caches every page it hands back through read(). I think the two versions are equivalent in terms of memory use.

2) Random read of a large file

I made a 250 MB tiled, 8-bit RGB TIFF and rotated it 90 degrees. I had thought the heavy seeking would favour mmap(), but again the two performed almost identically: both ran in about 10 s real, 3 s user and 1 s sys.

Again, although they look very different in "top", I think they are equivalent in terms of memory use as well.

To summarise, I'm a bit surprised, as I thought mmap() had some performance advantages. It certainly did 15 years ago, or whenever it was that we were all so excited about it; I guess read() has become a lot better. But I'm not sure mmap() has any real disadvantages either, at least in this case (local disc). I didn't try a networked device.

John