2021.03.01 16:29 "[Tiff] TIFF and multispectral images of historical documents?", by Andreas Romeyke

2021.03.02 21:37 "Re: [Tiff] Tiff Digest, Vol 28, Issue 1", by Richard Nolde


Unless I'm missing something, no one seems to be addressing the question that might be considered the elephant in the room. Multi-spectral data, whether LIDAR or medical imaging data, has both point data and metadata. Saving the point data to one or more images, or to planes of a single image with PLANARCONFIG_SEPARATE for everything but the RGB portion of the data, is simply a matter of knowing the range of the data values and the data type needed to store that range so that the original data can be retrieved. This has nothing to do with compression. In this sense, the TIFF file data is just a collection of values at known offsets.

I became involved with LibTIFF 30+ years ago when I wrote a program to store Magnetic Resonance Imaging data as raw data files and to export them to TIFF files so that researchers could embed them in grant applications and papers. The original data came from German Bruker spectrometers that used 24-bit words with a floating point format in which the mantissa and the exponent occupied non-contiguous bit groups. The software that I wrote ran on a VAXstation that did not use IEEE floating point formats, while the researchers used PCs and Macs to write their grants. After writing the code to convert between floating point formats of any type and size, it was just a matter of choosing which export formats to support, e.g. big-endian, little-endian, 8-bit int, 16-bit int, 32-bit IEEE float, 64-bit IEEE double, etc. TIFF was the ideal format for the graphics export option since, in that ancient time, there were few options for displaying images in word processor documents and TIFF was the most widely supported.

Step one is to determine the range of values in the input data for each band used by the multi-spectral scanner. Step two is to convert that into a data type supported by LibTIFF while retaining the full range of the input data and preserving the greatest degree of precision available with that data type. Step three is to offer multiple output options, including, but not limited to, TIFF, that store enough information about the data in each layer to allow the original information to be reconstructed faithfully. If you write a binary data dump file, you need some kind of header that explains what is in the data portion of the file and where it is located; this sounds a lot like what TIFF does. For metadata like BITSPERSAMPLE, SAMPLESPERPIXEL, etc. this is trivial. For scanner-specific parameters that don't have an obvious logical tag choice, you may have to create a custom tag or embed a proprietary blob of information in one. DSLR raw files are a perfect example of this. Note that the width and height tags are obviously incorrect as reported here by file, but not by tiffinfo or tiffdump.

file _DSC9849.NEF
_DSC9849.NEF: TIFF image data, big-endian, direntries=28, height=0,
bps=0, compression=none, PhotometricIntepretation=RGB,
manufacturer=NIKON CORPORATION, model=NIKON D800,
orientation=upper-left, width=0

If you run tiffinfo on one of these files, you will see much more information, including references to unknown tags.

If you dump the EXIF data from one of these raw files, you will see hundreds of lines of additional information.
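The first two steps above, finding each band's range and mapping it into a type LibTIFF supports, can be sketched in plain C. This is a hypothetical helper, not a libtiff call, and the linear mapping to uint16_t is only one reasonable choice:

```c
/* Sketch of steps one and two: scan a band for its range, then map it
 * linearly onto the full uint16_t range so no precision is thrown away.
 * scale_band_u16 is a made-up name for illustration. */
#include <stdint.h>
#include <stddef.h>
#include <float.h>

void scale_band_u16(const double *in, uint16_t *out, size_t n,
                    double *min_out, double *max_out)
{
    double lo = DBL_MAX, hi = -DBL_MAX;
    for (size_t i = 0; i < n; i++) {       /* step one: find the range */
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    double span = (hi > lo) ? (hi - lo) : 1.0;
    for (size_t i = 0; i < n; i++)         /* step two: map to 0..65535 */
        out[i] = (uint16_t)((in[i] - lo) / span * 65535.0 + 0.5);

    /* Record lo/hi as per-band metadata (e.g. in a custom tag) so the
     * original values can be recovered:
     *   value = lo + sample / 65535.0 * (hi - lo) */
    *min_out = lo;
    *max_out = hi;
}
```

The essential point is the last comment: the scaled samples are useless for reconstruction unless the per-band minimum and maximum travel with them as metadata.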

TIFF will preserve that data if you encode it correctly. That doesn't mean that any ordinary TIFF viewer is going to interpret it correctly or even show the layers of data represented by the non-RGB image. Whether you encode all the data in one TIFF image, in a series of subfiles within that image, or in a series of images will affect what can be viewed by less capable viewers. We have all seen proprietary software that uses TIFF files to store data in a way that only that software can use, early versions of Photoshop being one package that comes to mind.

Before you define your storage format, you need to decide what your requirements will be to view or recover the data. A binary dump file with an ASCII header will preserve your data, but only you will be able to make any sense of it. If you start the header with a fixed length field to identify the program and version followed by a field that specifies the length of the header, you can expand the header in the future if need be. Once again, TIFF does all of this for you...
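The header scheme described above, a fixed-length identification field followed by a header-length field, might look like this in C. The 16-byte ID field and little-endian length are my assumptions for the sketch:

```c
/* Sketch of a self-describing dump header: a fixed-length program/version
 * field, then a header-length field, so future versions can grow the
 * header without breaking old readers. Layout is hypothetical. */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define ID_LEN 16   /* fixed-length program + version identifier */

/* Write the header prefix into buf; returns bytes written. */
size_t write_header(unsigned char *buf, uint32_t header_len,
                    const char *prog_id)
{
    memset(buf, 0, ID_LEN);
    strncpy((char *)buf, prog_id, ID_LEN - 1);
    /* Store the length little-endian so byte order is unambiguous. */
    buf[ID_LEN + 0] = (unsigned char)(header_len & 0xff);
    buf[ID_LEN + 1] = (unsigned char)((header_len >> 8) & 0xff);
    buf[ID_LEN + 2] = (unsigned char)((header_len >> 16) & 0xff);
    buf[ID_LEN + 3] = (unsigned char)((header_len >> 24) & 0xff);
    return ID_LEN + 4;
}

/* A reader only needs the length field to skip to the data,
 * even if the header later gains fields it does not understand. */
uint32_t read_header_len(const unsigned char *buf)
{
    return (uint32_t)buf[ID_LEN] |
           ((uint32_t)buf[ID_LEN + 1] << 8) |
           ((uint32_t)buf[ID_LEN + 2] << 16) |
           ((uint32_t)buf[ID_LEN + 3] << 24);
}
```

This is, of course, exactly the shape of a TIFF file header plus IFD offset, which is the point being made.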

TIFF data formats are not limited to 1, 4, 8, or 16 bits per channel plus floating point formats. Tiffcrop and programs like GraphicsMagick support arbitrary bit depths from 1 to 32 bits for most operations. Modern DSLRs commonly have 14-bit ADCs but store their data as 16 bits per channel per pixel. TIFF gives you the flexibility to store the data any way you see fit, but it does not ensure that everyone will be able to make full use of it.
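For the curious, TIFF packs samples of arbitrary bit depth contiguously, most significant bit first (FillOrder = 1). This hypothetical packer shows how, say, 14-bit ADC values would be laid out if stored at their native depth instead of being padded to 16 bits:

```c
/* Sketch: pack n samples of `bits` bits each (1..32) into a byte
 * buffer MSB-first, as TIFF's default FillOrder lays them out.
 * `out` must be zeroed by the caller. pack_bits is a made-up helper. */
#include <stdint.h>
#include <stddef.h>

void pack_bits(const uint32_t *samples, size_t n, unsigned bits,
               unsigned char *out)
{
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        for (unsigned b = bits; b-- > 0; ) {  /* sample's MSB first */
            if ((samples[i] >> b) & 1u)
                out[bitpos >> 3] |=
                    (unsigned char)(0x80u >> (bitpos & 7));
            bitpos++;
        }
    }
}
```

Padding to 16 bits wastes two bits per sample but keeps every sample byte-aligned, which is why cameras (and most writers) choose it anyway.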

You might want to investigate programs such as DICOM readers/writers and programs that store satellite and LIDAR data to see how they deal with these issues.

Richard Nolde