2007.01.18 05:40 "[Tiff] Scientific Data in Tiff images", by Richard Nolde

Tiff users,

In regard to storing elevation data in TIFF images: at the most abstract level there are two distinct approaches, depending on your intended audience. One is to pre-process the data to fit in a "normal" TIFF image that any reasonably sophisticated graphics application can read; the other is to encode the raw data in a format that retains as much of the original information as possible, at the expense of portability and usefulness in a general graphics context.

I first became involved with LibTIFF when I wrote imaging software for a magnetic resonance imaging center. The data was collected on a German-made spectrometer using 24-bit words in a non-IEEE 48-bit floating point format with a signed mantissa and signed exponent, the mantissa and exponent interleaved so that part of each word was assigned to both. Only Bruker Instruments could dream up such a format. We ported the program to a DEC VAX running VMS, which had four different floating point formats, none of them IEEE compliant, and we did Fourier transforms on the data with a Skywarrior vector processor at first and then in C code when SKY quit supporting the vector processor. Reading the raw data from the spectrometer and converting it to every known machine format in the world was not an option. I chose to support big endian and little endian variants of 8, 16, and 32 bit integer, IEEE float and double floating point formats, and TIFF, and I contributed some patches to the bit-bashing macros for the VAX C floating point to IEEE format conversions.

The data represented signal strength values collected over time in a three dimensional space, slices if you will, through the brain. The intensity varied logarithmically rather than linearly. To put this raw data in a TIFF file would have been meaningless even if it could have been done. Preprocessing was everything: finding the range of values that defined the region of interest and an acceptable range to present in a graphic format. The histogram analysis, clipping, thresholding, etc. could only be performed by someone who was an expert in the field and knew what they were looking for. While a field like MRI might be an extreme case, I doubt that it is the only case where assuming the data can be reconstructed meaningfully in any automated fashion would be a false assumption.

To people in the field who wanted to write their own software, we offered the raw data in any of the exported formats with an ASCII header that described all the acquisition parameters used in the study. There was no way to put all of this in a header and add it to the TIFF file. For grant writers we offered TIFFs as grayscale or palette color images. AFTER clipping, min/max filtering, etc. by the scientist, we scaled the image to fit in the range of the TIFF image type, i.e. 0 - 255 for grayscale images, MIN_IS_BLACK.

We also offered an option to choose among a series of custom palettes to provide various visual interpretations of the data. For example, we created an RGB palette that mapped the top 16 values to 16 distinct colors and then mapped the remaining palette entries linearly to shades of gray, a sort of pseudo-color over grayscale approach. Visually, this conveyed the information in the data very nicely: the "hot" signal strength spots were highlighted against a background that appeared to be grayscale, even though there was never any red, green, or blue component in the data. I also wrote an algorithm that created isolines mapped to the color palette entries at the points where the data values crossed between discrete ranges, so that the regions would be outlined rather than obscured, a contour map of the intensity levels as it were. These were artificial as well, but it was common practice in the field to map intensity to color, just as false color images of space objects are frequently produced by NASA.
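For the viewing-oriented route, here is a rough sketch, assuming libtiff, of writing already-clipped, already-scaled 8-bit data as a palette-color TIFF in the spirit of that pseudo-color-over-grayscale palette. The make_palette() helper and its color formula are my own illustration, not the original code; only the libtiff calls themselves are real API.

    /*
     * Sketch: write an 8-bit palette-color TIFF where the top 16 index
     * values get distinct "hot spot" colors and the rest form a linear
     * gray ramp.  Assumes the data has already been clipped and scaled
     * to 0..255 by the scientist.  make_palette() is illustrative only.
     */
    #include <stdint.h>
    #include <tiffio.h>

    static void make_palette(uint16_t r[256], uint16_t g[256], uint16_t b[256])
    {
        int i;
        for (i = 0; i < 240; i++)            /* gray ramp; colormap entries are 16-bit */
            r[i] = g[i] = b[i] = (uint16_t)(i * 257);
        for (i = 240; i < 256; i++) {        /* 16 distinct colors for the hottest values */
            int k = i - 240;                 /* arbitrary, merely distinct, colors */
            r[i] = (uint16_t)(65535 - k * 4096);
            g[i] = (uint16_t)(k * 4096);
            b[i] = (uint16_t)((k % 4) * 16383);
        }
    }

    int write_palette_tiff(const char *path, uint8_t *pixels,
                           uint32_t width, uint32_t height)
    {
        uint16_t r[256], g[256], b[256];
        uint32_t row;
        TIFF *tif = TIFFOpen(path, "w");

        if (tif == NULL)
            return -1;
        make_palette(r, g, b);
        TIFFSetField(tif, TIFFTAG_IMAGEWIDTH, width);
        TIFFSetField(tif, TIFFTAG_IMAGELENGTH, height);
        TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 8);
        TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 1);
        TIFFSetField(tif, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_PALETTE);
        TIFFSetField(tif, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
        TIFFSetField(tif, TIFFTAG_ROWSPERSTRIP, 1);
        TIFFSetField(tif, TIFFTAG_COLORMAP, r, g, b);
        for (row = 0; row < height; row++) {
            if (TIFFWriteScanline(tif, pixels + (size_t)row * width, row, 0) < 0) {
                TIFFClose(tif);
                return -1;
            }
        }
        TIFFClose(tif);
        return 0;
    }

Any palette-aware viewer can open the result; the scientific meaning of the 256 index values is exactly what the preprocessing made it, nothing more.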

In that application, the original data in raw form would have had no meaning to anyone but a spectrometer. Digital Elevation Model data can be negative, but at least it is linear. It is my opinion that it is wrong to assume that you can auto-scale any scientific data and not do an injustice to the result in at least some special area of scientific application. Set SMinSampleValue and SMaxSampleValue to the agreed-upon logical min/max in that scientific field, not to the extrema of the specific image in question. If you are using TIFF as a scientific data interchange format instead of a picture viewer format, then the people in each field need to get together and decide what the logical range of values in the data set is going to be, not the TIFF community, who can't possibly know these things. If you just want TIFF to show a representation of the data for general viewing and passing among desktop applications, then pick a set of minima/maxima that lend themselves to viewing and printing, scale your data to those values, and be done with it. Having long double floating point numbers in a TIFF file won't do you any good if all the values are zero except for one pixel with a value of 2**38 - 1, at least not if your intention is to view the image on screen without your own custom software.
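For the interchange-oriented route, a minimal sketch of what that could look like with libtiff follows. The LOGICAL_MIN/LOGICAL_MAX constants are placeholders for whatever range a given discipline agrees on, not values from anything above.

    /*
     * Sketch: store 32-bit IEEE float samples directly and record the
     * field-wide logical range in SMinSampleValue/SMaxSampleValue rather
     * than the extrema of this particular image.  Assumes libtiff; the
     * range constants are hypothetical.
     */
    #include <stdint.h>
    #include <tiffio.h>

    #define LOGICAL_MIN  -500.0   /* hypothetical agreed-upon field minimum */
    #define LOGICAL_MAX  9000.0   /* hypothetical agreed-upon field maximum */

    int write_float_tiff(const char *path, float *samples,
                         uint32_t width, uint32_t height)
    {
        uint32_t row;
        TIFF *tif = TIFFOpen(path, "w");

        if (tif == NULL)
            return -1;
        TIFFSetField(tif, TIFFTAG_IMAGEWIDTH, width);
        TIFFSetField(tif, TIFFTAG_IMAGELENGTH, height);
        TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 32);
        TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 1);
        TIFFSetField(tif, TIFFTAG_SAMPLEFORMAT, SAMPLEFORMAT_IEEEFP);
        TIFFSetField(tif, TIFFTAG_PHOTOMETRIC, PHOTOMETRIC_MINISBLACK);
        TIFFSetField(tif, TIFFTAG_PLANARCONFIG, PLANARCONFIG_CONTIG);
        TIFFSetField(tif, TIFFTAG_ROWSPERSTRIP, 1);
        TIFFSetField(tif, TIFFTAG_SMINSAMPLEVALUE, LOGICAL_MIN);
        TIFFSetField(tif, TIFFTAG_SMAXSAMPLEVALUE, LOGICAL_MAX);
        for (row = 0; row < height; row++) {
            if (TIFFWriteScanline(tif, samples + (size_t)row * width, row, 0) < 0) {
                TIFFClose(tif);
                return -1;
            }
        }
        TIFFClose(tif);
        return 0;
    }

Such a file preserves the measured values exactly, but only software that understands floating point samples, and the field's conventions, will display it sensibly.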

I would vote for adding new TAGs to provide the additional information that explains how the stored image data relates to the original collected range, i.e. a scale factor, and the range of possible values for that data. You really need both for scientific data. If there are standard algorithms that are applied to the data that can be identified in a TAG, add them too. How does GeoTIFF store the information about the geodetic reference model and projection used for the image? Aren't these comparable "scientific" issues that a non-GeoTIFF-enhanced reader ignores? That will limit the ability to get this information to those applications that support those tags, of course, but anybody could read the scaled data into a graphics package and manipulate the image to view it in some fashion, provided the data is pre-scaled to the ranges normally supported in integer formats, say 0 - 255; only the specialists in each field will know what the original range of values was and how the data was preprocessed to get to those values. Anybody can write a proprietary binary file format that no one else can read, but TIFF is still my best vote for platform independence, speed, flexibility, and scalability. There are also other public formats for cross-platform data interchange if viewing is not your real concern; look at netCDF among others if all you want is portable data interchange.
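For what such tags might look like in practice, below is a sketch of libtiff's documented tag-extender mechanism used to register two hypothetical private tags, a scale factor and a logical data range. The tag numbers (65000/65001) and names are invented for illustration; FIELD_CUSTOM lives in libtiff's private tif_dir.h, so the sketch defines it defensively the way GeoTIFF's xtiffio does.

    /*
     * Sketch: register two hypothetical private tags with libtiff's
     * tag-extender mechanism so they can be written and read back.
     * Tag numbers and names are placeholders; real private tags should
     * be agreed upon by the community that will read them.
     */
    #include <tiffio.h>

    #ifndef FIELD_CUSTOM
    #define FIELD_CUSTOM 65   /* conventional value from libtiff's tif_dir.h */
    #endif

    #define TIFFTAG_DATASCALEFACTOR   65000   /* hypothetical: factor applied before scaling */
    #define TIFFTAG_DATALOGICALRANGE  65001   /* hypothetical: logical min,max pair */

    static const TIFFFieldInfo customFieldInfo[] = {
        { TIFFTAG_DATASCALEFACTOR,  1, 1, TIFF_DOUBLE, FIELD_CUSTOM, 1, 0, "DataScaleFactor" },
        { TIFFTAG_DATALOGICALRANGE, 2, 2, TIFF_DOUBLE, FIELD_CUSTOM, 1, 0, "DataLogicalRange" },
    };

    static TIFFExtendProc parent_extender = NULL;

    /* libtiff calls this for every directory so the custom tags are known. */
    static void register_custom_tags(TIFF *tif)
    {
        TIFFMergeFieldInfo(tif, customFieldInfo,
                           sizeof(customFieldInfo) / sizeof(customFieldInfo[0]));
        if (parent_extender != NULL)
            (*parent_extender)(tif);
    }

    /* Install the extender once, before any TIFFOpen() call. */
    void install_custom_tags(void)
    {
        static int installed = 0;
        if (!installed) {
            parent_extender = TIFFSetTagExtender(register_custom_tags);
            installed = 1;
        }
    }

With the extender installed, a writer could record the factor with TIFFSetField(tif, TIFFTAG_DATASCALEFACTOR, 0.01), and readers that have never heard of the tag will simply ignore it, which is exactly the GeoTIFF situation described above.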

My two cents worth and nothing more. Descendit soapbox.

Richard Nolde