2021.12.16 09:01 "[Tiff] Ensuring buffer alignment when creating tiff files", by Milian Wolff

2021.12.16 14:27 "Re: [Tiff] Ensuring buffer alignment when creating tiff files", by Bob Friesenhahn

As you can see, these offsets are misaligned for a 2-byte/16-bit greyscale image.

Looking at the libtiff API, we cannot find anything that would allow us to ensure that the SubIFDs are aligned correctly. Are we missing something, or is this simply not possible currently?

We think that it would only require a small change in the code base, namely ensuring that the seek at [1] ends at an aligned address based on the BitsPerSample for the current IFD.

The notion of writing 'aligned' data (which requires inserting some dead space to assure alignment) is interesting and seems useful. This is mostly useful when the data is not compressed. I have not heard of this before in the context of the TIFF format, but some other formats take care to assure it. Obviously, alignment could only be assured if the file is written by a TIFF writer which assures it.

You seem to be talking about aligning the TIFF data samples (a good thing), but there may be other beneficial alignment factors such as alignment to mmap memory page boundaries, or filesystem block-size boundaries.

Regardless, take care not to be sucked into the vortex of using memory mapping to read files. When using memory mapping to read data, your program loses control, and the thread which is doing the reading is put to sleep while the I/O is taking place. The I/Os are usually the size of a memory page, which is often just 4k. This means that your program gets put to sleep more often than desired, with many more context switches than if a larger copy-based I/O was used. If the data has recently been read, then memory mapping seems great, since the data is likely to already be cached in memory.

When reading from a file, it is common for the operating system to try to deduce if the reading is sequential or random. If it is able to deduce that the reading is sequential, then it may pre-read data in order to lessen the hit (time spent sleeping) when the data is read in order. Operating systems may not have useful support for detecting sequential reads when using mmap to do the reading. TIFF requires random access and so the operating system might be slow to detect and optimize for a sequential read.

If the operating system does not provide a "unified page cache" with the filesystem, then there may be a filesystem data cache, and another copy of the data for use with mmap. This increases memory usage and does not avoid a data copy. It seems like the "unified page cache" approach has fallen by the wayside since it is difficult to implement with the many filesystems available. Instead, operating systems have moved toward offering "direct I/O" to lessen caching and data copies.

In summary, the use of mmap and carefully aligned input data might not provide actual benefit over larger programmed (or scheduled via async-I/O) reads into an aligned buffer, even though it clearly requires an extra memory copy.

Bob

Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt