TIFF and LibTiff Mailing List Archive
January 2019



2019.01.15 20:08 "Re: tiffcp altering image contents - reworded, simplified and with example", by Richard Nolde

On 1/15/19 1:44 AM, Binarus wrote:
> Dear Bob, Richard and Dan,
> thank you very much for your help so far.
> I think I can drastically reword and simplify my question, and I have
> attached two example images so that you can reproduce the problem.
> Let's ignore the issues which arise from source image types being
> different, and focus on exactly one type of image: 24 BPP,
> JPEG-compressed.
> It seems that tiffcp alters the image data when merging or copying such
> images, in contrast to what the manual promises, and adding further
> degradation with each action (i.e. in each processing step).
> As an example, please consider the attached files "in.tif" and "out.tif".
> "in.tif" is an image which we might get from elsewhere or which we might
> produce using our own workflow, i.e. it is our "source file". It is a
> TIFF image, 24 BPP, JPEG-compressed.
> "out.tif" has been created by tiffcp using the following command line:
> tiffcp in.tif out.tif
> As you can see, "out.tif" is about half the size of "in.tif", meaning
> that a lot of information has been lost even though this is a simple
> copy with no merging or other processing!
> Following Dan's advice, I did
> tiffinfo -s in.tif
> and
> tiffinfo -s out.tif
> The results (please consider attachments in.txt and out.txt) clearly
> show that the number of strips has been kept, but the size of each
> strip has been decreased by about 50%, meaning that the image size
> reduction is not just due to garbage collection, but that real data has
> been lost.
> In turn, this means that there will be new degradation caused by tiffcp
> in every processing (merging) step with such images.
> I suspect that tiffcp reads "in.tif", decompresses it to memory, and
> re-compresses it when writing "out.tif". While this would be OK when
> processing images which are compressed using lossless methods (ZIP, LZW,
> and so on), it is condemned to go horribly wrong when processing
> JPEG-compressed images.
> Is there a way to solve that problem?
> Thank you very much again,
> Binarus

I've done a little poking at the files to see exactly what changes
between in.tif and out.tif. The simplest way to get this information is
with the tiffdump command which gives the strip counts, offsets and JPEG
tables along with the usual tags.

tiffdump -m 512 in.tif out.tif | sort -b -n | less

The value 512 is just an arbitrarily large number, big enough to print
all the strip offsets. If you run the same command but pipe the output to
grep JPEG instead of less, it is clear that the JPEG tables are different
in the output image, so the YCbCr data is in fact being recoded. This is
also indicated by the smaller strip sizes and different strip offsets.

Next, I tried using tiffcp with the option to produce RGB encoded data
rather than YCbCr encoded data, and it fails with an error from the JPEG
library.

tiffcp -c jpeg:r:100 in.tif out-RGB-100.tif
JPEGLib: Warning, Application transferred too many scanlines.
out-RGB-100.tif: Error, can't write strip 0.

Next, I tried using tiffcrop, my enhanced version of tiffcp which has
slightly different parsing of the jpeg options:

tiffcrop -c jpeg:rgb:100 in.tif out-RGB-100.tif

No errors reported and the output file looks fine.

 tiffinfo out-RGB-100.tif
TIFF Directory at offset 0x30c146 (3195206)
  Image Width: 3504 Image Length: 2336
  Resolution: 72, 72 pixels/inch
  Bits/Sample: 8
  Sample Format: unsigned integer
  Compression Scheme: JPEG
  Photometric Interpretation: YCbCr
  Orientation: row 0 top, col 0 lhs
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  Page Number: 0-1
  Reference Black/White:
     0:     0   255
     1:   128   255
     2:   128   255
  DocumentName: in_out.tif
  Software: GraphicsMagick 1.3.25 2016-09-05 Q16
  JPEG Tables: (574 bytes)

So even if you specify RGB, you are going to get YCbCr encoded data.
This is confirmed by running tiffcrop and explicitly asking for raw
YCbCr data:

tiffcrop -c jpeg:raw:100 in.tif out-YCbCr-100.tif

followed by

cmp -b -l out-RGB-100.tif out-YCbCr-100.tif
echo $?

I've tried every possible combination of copying a JPEG compressed input
file to another JPEG compressed output file and you will always get
YCbCr encoded output.  Tiffcrop has extensive debugging output if it is
compiled with the DEBUG2 preprocessor macro defined:

#ifdef DEBUG2
  char compressionid[16];

  switch (input_compression)
    {
    case COMPRESSION_NONE:        /* 1  dump mode */
         strcpy (compressionid, "None/dump");
         break;
    case COMPRESSION_CCITTRLE:    /* 2 CCITT modified Huffman RLE */
         strcpy (compressionid, "Huffman RLE");
         break;
    case COMPRESSION_CCITTFAX3:   /* 3 CCITT Group 3 fax encoding */
         strcpy (compressionid, "Group3 Fax");
         break;
    case COMPRESSION_CCITTFAX4:   /* 4 CCITT Group 4 fax encoding */
         strcpy (compressionid, "Group4 Fax");
         break;
    case COMPRESSION_LZW:         /* 5 Lempel-Ziv  & Welch */
         strcpy (compressionid, "LZW");
         break;
    case COMPRESSION_OJPEG:       /* 6 !6.0 JPEG */
         strcpy (compressionid, "Old Jpeg");
         break;
    case COMPRESSION_JPEG:        /* 7 %JPEG DCT compression */
         strcpy (compressionid, "New Jpeg");
         break;
    /* ... remaining cases omitted from this excerpt ... */
    }
#endif

I found the following note while searching for a way to copy JPEG in
TIFF data without recompressing:

Old style TIFF-JPEG (compression type 6) basically stuffed a normal JFIF
file inside of a TIFF wrapper. The newer style TIFF-JPEG (compression
type 7) allows the JPEG table data (Huffman, quantization), to be stored
in a separate tag (0x015B JPEGTables). This allows you to put strips of
JPEG data with SOI/EOI markers in the file without having to repeat the
Huffman and Quantization tables. This is probably what you're seeing
with your file. The individual strips begin with the sequence FFD8, but
are missing the Huffman and quantization tables. This is the way that
Photoshop products usually write the files.
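
One way to check whether a given strip follows this abbreviated layout is to walk its marker segments and see whether any quantization (FFDB) or Huffman (FFC4) tables appear before the scan data starts. The following is only a minimal sketch in C, not code from libtiff or tiffcrop; the names jpeg_markers and scan_jpeg_markers are hypothetical, and it assumes the raw strip bytes have already been read into memory by some other means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which marker segments were seen before the entropy-coded data. */
typedef struct {
    int has_soi;   /* FFD8: start of image */
    int has_dqt;   /* FFDB: quantization table(s) */
    int has_dht;   /* FFC4: Huffman table(s) */
    int has_sos;   /* FFDA: start of scan */
} jpeg_markers;

/* Walk the marker segments of one JPEG stream held in buf.
 * Stops at SOS (compressed data follows), at EOI, or on malformed input. */
jpeg_markers scan_jpeg_markers(const uint8_t *buf, size_t len)
{
    jpeg_markers m = {0, 0, 0, 0};
    size_t i;

    assert(buf != NULL || len == 0);
    if (len < 2 || buf[0] != 0xFF || buf[1] != 0xD8)
        return m;                       /* does not start with SOI */
    m.has_soi = 1;
    i = 2;

    while (i + 1 < len) {
        uint8_t code;
        size_t seglen;

        if (buf[i] != 0xFF)
            break;                      /* not positioned on a marker */
        code = buf[i + 1];
        if (code == 0xFF) {             /* fill byte before a marker */
            i++;
            continue;
        }
        if (code == 0xD9)               /* EOI: end of stream */
            break;
        if (code == 0xDB) m.has_dqt = 1;
        if (code == 0xC4) m.has_dht = 1;
        if (code == 0xDA) {             /* SOS: scan data follows */
            m.has_sos = 1;
            break;
        }
        if ((code >= 0xD0 && code <= 0xD7) || code == 0x01) {
            i += 2;                     /* RSTn and TEM carry no length */
            continue;
        }
        if (i + 3 >= len)
            break;                      /* truncated length field */
        seglen = ((size_t)buf[i + 2] << 8) | buf[i + 3];
        if (seglen < 2)
            break;                      /* length includes its own 2 bytes */
        i += 2 + seglen;
    }
    return m;
}
```

For a strip written in the style described above, one would expect has_soi and has_sos to be set while has_dqt and has_dht stay clear; the contents of the JPEGTables tag should show the opposite pattern.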

Another portion of the tiffcrop source code:


  if (input_compression == COMPRESSION_JPEG)
    {  /* Force conversion to RGB */
    jpegcolormode = JPEGCOLORMODE_RGB;
    }
  /* The clause up to the read statement is taken from Tom Lane's tiffcp patch */
  else
    {   /* Otherwise, can't handle subsampled input */
    if (input_photometric == PHOTOMETRIC_YCBCR)
      {
      if (subsampling_horiz != 1 || subsampling_vert != 1)
        {
        TIFFError ("loadImage",
                "Can't copy/convert subsampled image with subsampling %d horiz %d vert",
                subsampling_horiz, subsampling_vert);
        return (-1);
        }
      }
    }


  if (compression != (uint16)-1)
    TIFFSetField(out, TIFFTAG_COMPRESSION, compression);
  else
    { /* OJPEG is no longer supported for writing so upgrade to JPEG */
    if (input_compression == COMPRESSION_OJPEG)
      {
      compression = COMPRESSION_JPEG;
      jpegcolormode = JPEGCOLORMODE_RAW;
      }
    else /* Use the compression from the input file */
      CopyField(TIFFTAG_COMPRESSION, compression);
    }

  if (compression == COMPRESSION_JPEG)
    {
    if ((input_photometric == PHOTOMETRIC_PALETTE) ||  /* color map indexed */
        (input_photometric == PHOTOMETRIC_MASK))       /* holdout mask */
      {
      TIFFError ("writeSingleSection",
                 "JPEG compression cannot be used with %s image data",
                 (input_photometric == PHOTOMETRIC_PALETTE) ?
                 "palette" : "mask");
      return (-1);
      }
    if ((input_photometric == PHOTOMETRIC_RGB) &&
        (jpegcolormode == JPEGCOLORMODE_RGB))
        TIFFSetField(out, TIFFTAG_PHOTOMETRIC, input_photometric);
    }

The point of this is that there doesn't seem to be any way to force
libtiff not to re-encode the data, but this is due to a change in the
way LibTIFF interacts with the JPEG subsystem. I don't know why this is
so and I could be wrong, but neither tiffcp nor tiffcrop seems to be
willing to copy input to output without recoding. I've studied the
tiffcp and tiffcrop source code again and I cannot see any way to change
that behavior unless you could add an output option that used
TIFFWriteRawStrip instead of TIFFWriteEncodedStrip or TIFFWriteScanline,
and I'm not sure that would work with JPEG data since the raw data
has to be interpreted with the JPEG table data.

It looks to me that you are going to have to process your data in a
single pass if you use JPEG encoding. The next best option would be Zip
(Deflate) encoding with horizontal predictor set to 2. I used a sample
file here that was produced by RawTherapee as an 8 bit RGB Tiff file
with Deflate compression. Based on the sizes below, it looks like tiffcp
uses level p9 (slowest but best compression) as the default but without
the horizontal differencing value of 2 which reduces the file size
slightly, at least for my test file.
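
For anyone unfamiliar with what predictor value 2 does: before compression, each sample is replaced by its difference from the same channel of the pixel to its left, which tends to turn smooth image regions into runs of small numbers that Deflate or LZW can pack more tightly. The sketch below illustrates the idea for 8 bit data in C. It is not libtiff's implementation, and the function names hdiff_encode_row8 and hdiff_decode_row8 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal differencing for one row of 8 bit samples, in place.
 * width is the number of pixels, spp the samples per pixel (3 for RGB).
 * Works right to left so each subtraction still sees the original
 * value of the pixel to its left. Arithmetic wraps modulo 256. */
void hdiff_encode_row8(uint8_t *row, size_t width, size_t spp)
{
    size_t i;

    assert(spp > 0);
    if (width < 2)
        return;                         /* nothing to difference */
    for (i = width * spp - 1; i >= spp; i--)
        row[i] = (uint8_t)(row[i] - row[i - spp]);
}

/* Inverse operation: running sum from left to right restores the row. */
void hdiff_decode_row8(uint8_t *row, size_t width, size_t spp)
{
    size_t i;

    assert(spp > 0);
    for (i = spp; i < width * spp; i++)
        row[i] = (uint8_t)(row[i] + row[i - spp]);
}
```

Because the differences wrap modulo 256, decoding restores the original samples exactly, so the predictor step itself is lossless; only the compressibility of the data changes.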

ls -lS D202000-8bit*tif | awk ' { printf ("Size: %8d Ratio: %5.2f  %s\n", $5, $5/1920342, $9)}'

Size:  1920342 Ratio:  1.00  D202000-8bit-none-compress.tif
Size:  1624820 Ratio:  0.85  D202000-8bit-lzw-compress.tif
Size:  1349538 Ratio:  0.70  D202000-8bit-zip-compress.tif
Size:  1349538 Ratio:  0.70  D202000-8bit-zip-p9-compress.tif
Size:  1342698 Ratio:  0.70  D202000-8bit.tif
Size:  1035504 Ratio:  0.54  D202000-8bit-zip-2-p9-compress.tif
Size:   422040 Ratio:  0.22  D202000-8bit-jpeg-100-compress.tif

I'm investigating an old utility called tifftool to see how it would
deal with your data, but I'm having a bit of difficulty building it since
it is some 14 years old and the Makefile is going to take some hacking
to get configured for a modern Linux system.

