2009.12.22 06:41 "[Tiff] PATCH: proposed changes to some JPEG routines", by Lee Howard

Hello everyone,

As you may recall from a month ago or so I have been doing some work with an application that will be using libtiff's JPEG-in-TIFF support. The application is HylaFAX. Color fax image data is compressed as JPEG. In order to put a multi-page fax into one file (as well as to make use of valuable data-space in TIFF headers), HylaFAX is wrapping the JPEG images with TIFF.

Unfortunately, there are some obstacles (libtiff bugs) in this purpose which has required me to seek out and fix the problems - at least for my purpose. I would like this message to start a conversation about getting these bugs fixed (and ultimately concluding with that) because I do not want to continually distribute a patched libtiff in order my users to be able to use these new features.

The primary bugs are these:

When generating a JPEG-compressed TIFF file and writing the JPEG data into the TIFF with TIFFWriteRawStrip libtiff automatically generates a JPEGTables TIFF tag without being requested by the application and said JPEGTables TIFF tag contains no valid data, just 2000 null bytes.
It does not seem possible for the application to produce the JPEGTables tag independent from libtiff. In other words, libtiff must use libjpeg to compress the bitmap data into JPEG in order for a valid JPEGTables tag to be produced. The application cannot suggest a JPEGTables tag to libtiff.
Consequently the invalid JPEGTables tag makes it impossible for the viewer applications to render the image properly... and there does not seem to be a workaround other than to decompress the JPEG first and then to recompress it into the TIFF via libtiff (which would introduce some unnecessary loss).
When running a JPEG-compressed TIFF file through the tiff2pdf tool (which basically just changes the TIFF wrapper for a PDF wrapper since PDF can have JPEG-compressed image data in it), the resulting PDF file is not viewable in Acrobat Reader, evince, or Ghostscript, although xpdf does handle it fine.
I've attached a patch file against libtiff 4.0.0beta5. By no means is this patch file a suggestion of what should be committed to the code repository. Instead, I attach the patch for your reference. It shows the changes that I made in order to accomplish what I needed. I added comments as I felt appropriate for understanding. I would like to discuss my changes more verbosely here, now. I would appreciate any comments that you may have. I hope that the end-result will be sufficient changes to fix the three bugs I've listed above.
In tif_jpeg.c you'll notice two places where I've suggested that some sections of code should not apply to RGB photometric data. As I ended up using YCbCr photometric data these points are not as critical to me, but I'm pointing them out to you now in order to simply raise my concern that something looks incorrect there.
At the end of tif_jpeg.c is the most-critical change that I made in order to resolve my point #1 above (which made my point #2 unimportant). I'm not sure if TIFFSetFieldBit(tif, FIELD_JPEGTABLES) actually needs to be called at this point. I suspect that this field bit will get set later if it is used by the application or by the library, and so I think that during this memory-allocation process at the outset of the TIFF creation it is not important to set the field bit. In any case, removing this one line solves the majority of my grief.
I found tiff2pdf.c to be extremely well-documented in parts, and wholly lacking of documentation in other parts. In particular, the code doesn't provide human-readable information that details what it's doing as it does it. So in tiff2pdf.c you'll notice that I've added some comments to help readers know what JPEG headers are being parsed by the code at the indicated points. I think this kind of documentation is good.
The solution to my point #3 above is to write the JPEG SOI headers into the TIFF strip data rather than to skip them. Those two lines of code solve problem #3. Then the PDF can be viewed in all PDF readers I have tested.
My last change in tiff2pdf.c is to question the correctness of a change that was made to the code in libtiff 3.8.1 by Dan Cobra. However, since I've switched from RGB photometric to YCbCr this code is no longer a problem for me, and, indeed, it's possible that before I settled on YCbCr data that my RGB data with which I was testing was incorrect. However, it may be a good idea for someone to double-check the sanity of that change made in 3.8.1.
I look forward to your feedback.

Thanks,

Lee.

diff -Nru tiff-4.0.0beta5/libtiff/tif_jpeg.c tiff-4.0.0beta5.new/libtiff/tif_jpeg.c --- tiff-4.0.0beta5/libtiff/tif_jpeg.c 2009-07-07 19:46:29.000000000 -0700 +++ tiff-4.0.0beta5.new/libtiff/tif_jpeg.c 2009-12-19 16:06:06.000000000 -0800 @@ -1082,6 +1082,11 @@

return (0);
}
#endif
+/*

+Sampling factors are not relevant for PHOTOMETRIC_RGB. So if the +RGB JPEG image has confusing sampling factors we can ignore it.

+*/
+if (sp->photometric != PHOTOMETRIC_RGB) {
        if (td->td_planarconfig == PLANARCONFIG_CONTIG) {
                /* Component 0 should have expected sampling factors */
                if (sp->cinfo.d.comp_info[0].h_samp_factor != sp->h_sampling ||
@@ -1147,6 +1152,7 @@
                        return (0);
                }
        }
+}
        downsampled_output = FALSE;
        if (td->td_planarconfig == PLANARCONFIG_CONTIG &&
            sp->photometric == PHOTOMETRIC_YCBCR &&
@@ -1158,7 +1164,12 @@
                        /* Suppress colorspace handling */
                sp->cinfo.d.jpeg_color_space = JCS_UNKNOWN;
                sp->cinfo.d.out_color_space = JCS_UNKNOWN;
+/*

+Sampling factors are not relevant for PHOTOMETRIC_RGB. So if the +RGB JPEG image has confusing sampling factors we can ignore it.

+*/
                if (td->td_planarconfig == PLANARCONFIG_CONTIG &&
+(sp->photometric != PHOTOMETRIC_RGB) &&
                    (sp->h_sampling != 1 || sp->v_sampling != 1))
                        downsampled_output = TRUE;
                /* XXX what about up-sampling? */
@@ -2259,7 +2270,15 @@

if( tif->tif_diroff == 0 )

{
#define SIZE_OF_JPEGTABLES 2000

+/*

+The following line assumes incorrectly that all JPEG-in-TIFF files will have +a JPEGTABLES tag generated and causes null-filled JPEGTABLES tags to be written +when the JPEG data is placed with TIFFWriteRawStrip. The field bit should be +set, anyway, later when actual JPEGTABLES header is generated, so removing it +here hopefully is harmless.

TIFFSetFieldBit(tif, FIELD_JPEGTABLES);

+*/

             sp->jpegtables_length = SIZE_OF_JPEGTABLES;
            sp->jpegtables = (void *) _TIFFmalloc(sp->jpegtables_length);
            _TIFFmemset(sp->jpegtables, 0, SIZE_OF_JPEGTABLES);

diff -Nru tiff-4.0.0beta5/tools/tiff2pdf.c tiff-4.0.0beta5.new/tools/tiff2pdf.c --- tiff-4.0.0beta5/tools/tiff2pdf.c 2009-01-29 13:58:26.000000000 -0800 +++ tiff-4.0.0beta5.new/tools/tiff2pdf.c 2009-12-19 22:31:37.000000000 -0800 @@ -3320,13 +3320,16 @@

int j=0;

        i++;
-
        while(i<(*striplength)){
                switch( strip[i] ){

                        case 0xd8:
+                               /* SOI - start of image */
+                               _TIFFmemcpy(&(buffer[*bufferoffset]), &(strip[i-1]), 2);
+                               *bufferoffset+=2;
                                i+=2;
                                break;
                        case 0xc0:
+                               /* SOF - start of frame */
                        case 0xc1:
                        case 0xc3:
                        case 0xc9:
@@ -3365,12 +3368,15 @@
                                }
                                break;
                        case 0xc4:
+                               /* DHT - define huffman tables */
                        case 0xdb:
+                               /* DQT - define quantization tables */
                                _TIFFmemcpy(&(buffer[*bufferoffset]), &(strip[i-1]), strip[i+2]+2);
                                *bufferoffset+=strip[i+2]+2;
                                i+=strip[i+2]+2;
                                break;
                        case 0xda:
+                               /* SOS - start of scan */

                                if(no==0){
                                        _TIFFmemcpy(&(buffer[*bufferoffset]), &(strip[i-1]), strip[i+2]+2);
                                        *bufferoffset+=strip[i+2]+2;
@@ -5074,10 +5080,16 @@
                case T2P_COMPRESS_JPEG:
                        written += t2pWriteFile(output, (tdata_t) "/DCTDecode ", 11);

+/*

+This code which was added in 3.8.1 breaks at least some PHOTOMETRIC_RGB JPEG-in-TIFF images... +My guess is that the need condition isn't based purely on the photometric values. +Dan Cobra - from whom this originated - should probably be consulted.

+
if(t2p->tiff_photometric != PHOTOMETRIC_YCBCR) {

written += t2pWriteFile(output, (tdata_t) "/DecodeParms ", 13);
written += t2pWriteFile(output, (tdata_t) "<< /ColorTransform 0 >>\n", 24);

}
+*/
break;
#endif
#ifdef ZIP_SUPPORT