2019.01.15 20:08 "Re: [Tiff] tiffcp altering image contents - reworded, simplified and with example", by Richard Nolde

On 1/15/19 1:44 AM, Binarus wrote:
> Dear Bob, Richard and Dan,
>
> thank you very much for your help so far.
>

> I think I can drastically reword and simplify my question, and I have
> attached two example images so that you can reproduce the problem.
>
> Let's ignore the issues which arise from source image types being
> different, and focus on exactly one type of image: 24 BPP, JPEG-compressed.
>
> It seems that tiffcp alters the image data when merging or copying such
> images, in contrast to what the manual promises, and adding further
> degradation with each action (i.e. in each processing step).

>
> As an example, please consider the attached files "in.tif" and "out.tif".
>

> "in.tif" is an image which we might get from elsewhere or which we might
> produce using our own workflow, i.e. it is our "source file". It is a
> TIFF image, 24 BPP, JPEG-compressed.
>
> "out.tif" has been created by tiffcp using the following command line:
>
> tiffcp in.tif out.tif
>

> As you can see, "out.tif" is about half the size of "in.tif", meaning
> that a lot of information has been lost even though this is a simple
> copy with no merging or other processing!
>
> Following Dan's advice, I did
>
> tiffinfo -s in.tif
>
> and
>
> tiffinfo -s out.tif
>

> The results (please consider attachments in.txt and out.txt) clearly
> show that the number of strips has been kept, but the size of each
> strip has been decreased by about 50%, meaning that the image size
> reduction is not just due to garbage collection, but that real data has
> been lost.
>

> In turn, this means that there will be new degradation caused by tiffcp
> in every processing (merging) step with such images.
>
> I suspect that tiffcp reads "in.tif", decompresses it to memory, and
> re-compresses it when writing "out.tif". While this would be OK when
> processing images which are compressed using lossless methods (ZIP, LZW,
> and so on), it is condemned to go horribly wrong when processing
> JPEG-compressed images.
>
> Is there a way to solve that problem?

> Thank you very much again,
> Binarus

I've done a little poking at the files to see exactly what changes between in.tif and out.tif. The simplest way to get this information is with the tiffdump command, which prints the strip counts, offsets, and JPEG tables along with the usual tags.

tiffdump -m 512 in.tif out.tif | sort -b -n | less

The value 512 is just an arbitrarily large number, chosen to be big enough that all the strip offsets are printed. If you run the same command but pipe the output to grep JPEG instead of less, it is clear that the JPEG tables are different in the output image, so the YCbCr data is in fact being re-encoded. This is also indicated by the smaller strip sizes and different strip offsets.
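
For anyone without tiffdump at hand, the same comparison can be roughly sketched in Python using only the standard library. This is a minimal sketch, not what tiffdump does internally: it assumes a classic (non-BigTIFF) file, looks only at the first IFD, and reads just the StripByteCounts (279) and JPEGTables (347) tags. The function names read_first_ifd, summarize and compare_tiffs are made up for this illustration.

```python
import struct

# Tag numbers from the TIFF 6.0 specification and TIFF Technical Note 2.
STRIP_BYTE_COUNTS = 279
JPEG_TABLES = 347

# TIFF field type -> (struct code, size in bytes); only the types needed here.
FIELD_TYPES = {1: ("B", 1), 2: ("B", 1), 3: ("H", 2), 4: ("I", 4), 7: ("B", 1)}


def read_first_ifd(data):
    """Return {tag: tuple of values} for the first IFD of a classic TIFF."""
    if data[:2] not in (b"II", b"MM"):
        raise ValueError("not a TIFF file")
    order = "<" if data[:2] == b"II" else ">"
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = data[start:start + 12]
        tag, ftype, n = struct.unpack(order + "HHI", entry[:8])
        if ftype not in FIELD_TYPES:
            continue  # rationals, floats etc. are not needed for this check
        code, size = FIELD_TYPES[ftype]
        if n * size <= 4:
            raw = entry[8:8 + n * size]  # value is stored inline in the entry
        else:
            (offset,) = struct.unpack(order + "I", entry[8:12])
            raw = data[offset:offset + n * size]
        tags[tag] = struct.unpack(order + str(n) + code, raw)
    return tags


def summarize(data):
    """Return (JPEGTables bytes, total of StripByteCounts) for one file."""
    tags = read_first_ifd(data)
    return bytes(tags.get(JPEG_TABLES, ())), sum(tags.get(STRIP_BYTE_COUNTS, ()))


def compare_tiffs(path_a, path_b):
    """Report whether the JPEG tables match and the total strip bytes of each."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        tables_a, total_a = summarize(fa.read())
        tables_b, total_b = summarize(fb.read())
    return {"same_jpeg_tables": tables_a == tables_b,
            "strip_bytes": (total_a, total_b)}
```

Calling compare_tiffs("in.tif", "out.tif") returns a small dictionary. If "same_jpeg_tables" is False and the second strip total is much smaller than the first, that points to the data having been re-encoded rather than copied.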

Next, I tried using tiffcp with the option to produce RGB-encoded data rather than YCbCr-encoded data, and it fails with an error from the JPEG library.

tiffcp -c jpeg:r:100 in.tif out-RGB-100.tif
JPEGLib: Warning, Application transferred too many scanlines.
out-RGB-3G