2003.11.17 09:28 "[Tiff] tiff2pdf contribution", by Ross Finlayson


I've written a tiff2pdf program, tiff2pdf.c. It is similar in scope to other programs in the tools directory of the libtiff distribution. It uses only standard C library and libtiff functions, and consists of one file, tiff2pdf.c, with no conditional compilation except for the compression support macros in libtiff. I wrote it this past week specifically as a libtiff-based TIFF to PDF converter. I've written other software that converts TIFF and other image file formats into PDF, so I was able to port some of that.

It converts a given TIFF file into a PDF file, and handles a variety of TIFF profiles. The generated PDF can contain compressed CCITT G4 Fax, JPEG, or Zip/Deflate data, using the libtiff encoders. Input TIFF files that already use those types of compression can often be converted without uncompressing or transcoding the data in the TIFF. Tiled TIFFs and multipage TIFFs are supported. The software does not generate PDF thumbnails, annotations, bookmarks, or encryption, and it does not perform PDF linearization or OCR. It only writes the raster data and some document information of a TIFF into a PDF.
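To illustrate the case where compressed data can be carried over without transcoding, here is a minimal sketch that maps a TIFF Compression tag value to the PDF stream filter able to decode the same data. The numeric values are the standard TIFF Compression tag values (libtiff names them COMPRESSION_CCITTFAX4, COMPRESSION_JPEG, COMPRESSION_ADOBE_DEFLATE, and COMPRESSION_DEFLATE). The enum and function names are made up for this example and are not taken from tiff2pdf.c.

```c
#include <stddef.h>

/* TIFF Compression tag values; the enum names are hypothetical. */
enum {
    SKETCH_COMP_CCITT_G4      = 4,
    SKETCH_COMP_JPEG          = 7,
    SKETCH_COMP_ADOBE_DEFLATE = 8,
    SKETCH_COMP_DEFLATE       = 32946
};

/*
 * Hypothetical helper, not from tiff2pdf.c.
 * Returns the name of the PDF filter that decodes data stored with the
 * given TIFF compression, or NULL when the data would have to be decoded
 * and re-encoded before it can go into a PDF stream.
 */
const char *sketch_pdf_filter_for_tiff(int compression)
{
    switch (compression) {
    case SKETCH_COMP_CCITT_G4:
        return "CCITTFaxDecode";
    case SKETCH_COMP_JPEG:
        return "DCTDecode";
    case SKETCH_COMP_ADOBE_DEFLATE:
    case SKETCH_COMP_DEFLATE:
        return "FlateDecode";
    default:
        return NULL;
    }
}
```

Whether the bytes can really be copied unchanged also depends on other tags, such as the fill order or a predictor, which this sketch ignores.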

I've communicated with Andrey Kiselev about donating this software to libtiff and asked his advice on some uses of libtiff; his suggestions are appreciated. Before submitting it to the official libtiff distribution, I'm hoping that some users of the libtiff list, and particularly developers who can review the source code, will try the software on a variety of TIFF images as a measure of testing before it is called halfway good and committed to CVS. If you find an image on which it fails, please send me the output of the tiffinfo program on that image.

I will try it on a variety of the TIFF samples that I have collected, and endeavor to ensure that it handles standard files rationally.

I'm trying to figure out what to do about the TIFF orientation tag. Another point of interest, or confusion, is the page numbering. I've implemented getting the pages out of the TIFF per the specification, and then some. If you come across other varieties of pagination, I'm hoping that you can provide sample TIFF files of those.
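As one possible reading of the pagination question, assuming it refers to the TIFF PageNumber tag (which holds the page number and the total number of pages), the sketch below orders directories by that tag. The struct and function names are hypothetical and not from tiff2pdf.c, and the policy of leaving the order alone when any directory lacks the tag is an assumption, not what the program does.

```c
#include <stdlib.h>

/* Hypothetical record for one TIFF directory (page). */
struct sketch_page {
    unsigned dir_index;  /* position of the directory in the file */
    int has_number;      /* nonzero if a PageNumber tag was present */
    unsigned number;     /* first value of the PageNumber tag */
};

/* Orders by page number, then by position in the file. */
static int sketch_cmp_pages(const void *a, const void *b)
{
    const struct sketch_page *p = (const struct sketch_page *)a;
    const struct sketch_page *q = (const struct sketch_page *)b;

    if (p->number != q->number)
        return p->number < q->number ? -1 : 1;
    return (p->dir_index > q->dir_index) - (p->dir_index < q->dir_index);
}

/*
 * Sorts the array by PageNumber only when every directory carries the
 * tag; otherwise the array is left untouched. Returns 1 if it sorted.
 */
int sketch_order_pages(struct sketch_page *pages, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (!pages[i].has_number)
            return 0;
    qsort(pages, count, sizeof pages[0], sketch_cmp_pages);
    return 1;
}
```

A caller would fill the array while walking the directories of the file, then write the PDF pages in the resulting order.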

It doesn't have a function to compose an image mask onto the output where PDF supports that. I hope someone can give an idea of how many of those types of images are out there. Other things to consider with regard to multiple layer composition are TIFF IT and TIFF FX MRC, although both of those use compression methods not supported by libtiff.

It currently doesn't handle images with a separated planar configuration.

Here is a link to the file tiff2pdf.c.gz, the compressed tiff2pdf.c.


Please forward comments, complaints, and suggestions to libtiff@remotesensing.org, or to me personally. I hope that we can discover and amend any obvious flaws.

It's not done yet. Tiles aren't fully implemented, for example: only the determination of where the tiles go on the output is done. I've been trying it on the libtiff sample pics, and on some it does not work. I have not tried any images that use the OJPEG raw/no transcode method. It won't generate readable PDF files for some combinations of input that it handles separately. Suffice it to say it's not done yet. It minimally functions on fax2d.tif and g3test.tif, and on most of the images in the TIFF 3.4 sample pics. I test it on forty or fifty types of images.

Well, I'm still working on this. I should get the tile output put together shortly. I already see places where it is getting large and unruly, so I will refactor some of its functions and hopefully make it somewhat more streamlined.

Please excuse that it still has a bunch of C++ "//" style comments in it; they are to be removed. If you find a bug around one, it might help explain what I was thinking there. If you find some other element that inhibits C portability, please let me know.

It is pretty simply broken into tiff2pdf.c, t2p.c, and t2p.h, with tiff2pdf.c containing only the function main and the functions called from main.

The basic idea of the program is to generate a PDF representation of the input image(s). PDF supports 1, 2, 4, and 8 bits per component/sample, and in some recent version 16, but that is not handled here. The compression formats output into the PDF include Deflate, CCITT Fax, and JPEG; for fax data, the program outputs only Group IV. I haven't much considered ICC profiles, transfer functions, etc. The program is intended for use with client browsers that handle PDF but not TIFF; it's recommended not to store the images as PDF.
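To make the PDF side of this concrete, here is a minimal sketch of formatting the dictionary of a PDF image XObject for each of the three output filters, rejecting bit depths other than 1, 2, 4, and 8. The PDF key and filter names (/BitsPerComponent, /FlateDecode, /CCITTFaxDecode with /K -1 for Group IV, /DCTDecode) are standard PDF. The function, enum, and parameter names are invented for this example and are not from tiff2pdf.c, and the sketch leaves out details such as /BlackIs1 and indexed color spaces.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical selector for the output filter. */
enum sketch_filter { SKETCH_FLATE, SKETCH_CCITT_G4, SKETCH_DCT };

/*
 * Hypothetical helper, not from tiff2pdf.c.
 * Writes an image XObject dictionary into buf. Returns the number of
 * characters written, or -1 on an unsupported bit depth, an unknown
 * filter, or a buffer that is too small.
 */
int sketch_image_dict(char *buf, size_t size, enum sketch_filter filter,
                      unsigned width, unsigned height, unsigned bits,
                      const char *colorspace, unsigned long length)
{
    char filt[128];
    int n;

    /* Only the bit depths discussed above are accepted. */
    if (bits != 1 && bits != 2 && bits != 4 && bits != 8)
        return -1;

    switch (filter) {
    case SKETCH_FLATE:
        n = snprintf(filt, sizeof filt, "/Filter /FlateDecode");
        break;
    case SKETCH_CCITT_G4:
        /* A negative K selects pure two-dimensional (Group IV) coding. */
        n = snprintf(filt, sizeof filt,
                     "/Filter /CCITTFaxDecode "
                     "/DecodeParms << /K -1 /Columns %u /Rows %u >>",
                     width, height);
        break;
    case SKETCH_DCT:
        n = snprintf(filt, sizeof filt, "/Filter /DCTDecode");
        break;
    default:
        return -1;
    }
    if (n < 0 || (size_t)n >= sizeof filt)
        return -1;

    n = snprintf(buf, size,
                 "<< /Type /XObject /Subtype /Image "
                 "/Width %u /Height %u /BitsPerComponent %u "
                 "/ColorSpace /%s /Length %lu %s >>",
                 width, height, bits, colorspace, length, filt);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For a Group IV fax page one might call it with a width of 1728, 1 bit per component, and "DeviceGray" as the color space; the compressed strip data then follows as the stream body.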

The program has a problem stitching JPEG tables and strips together. I was hoping that the tables and strips just started with SOI and ended with EOI, so I could just put the tables in a buffer, save the last two bytes before EOI, then write a strip over them, putting the two bytes back in over the SOI of the strip, etcetera for succeeding strips. Yet then I get into issues like Minimum Coded Units, Restart Markers, and other JPEG internals. The program compresses non-JPEG input data OK; I just want it to process already-compressed JPEG data correctly. I've figured it out before and it's a hassle, so I hope someone can concisely explain what to do. I have written a tiffoj2j program to convert OJPEG to new JPEG TIFFs. The file smallliz.tif has incorrect strip byte counts and offsets.
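For the simplest case only, a single strip plus a tables buffer, the splicing idea can be sketched as below: drop the trailing EOI from the tables and the leading SOI from the strip, then concatenate. This is a variation on the approach described above rather than the code in tiff2pdf.c. The function name is hypothetical, and it assumes that the tables buffer starts with SOI (FF D8) and ends with EOI (FF D9) and that the strip starts with SOI. It does nothing about the multi-strip problems mentioned above (MCU alignment, restart markers, frame height).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from tiff2pdf.c.
 * Joins a JPEG tables buffer (SOI ... EOI) and one strip (SOI ... EOI)
 * into a single stream: tables without their EOI, followed by the strip
 * without its SOI. Returns the number of bytes written to out, or 0 if
 * the inputs do not have the assumed shape or out is too small.
 */
size_t sketch_splice_jpeg(const unsigned char *tables, size_t tables_len,
                          const unsigned char *strip, size_t strip_len,
                          unsigned char *out, size_t out_size)
{
    size_t need;

    if (tables_len < 4 || strip_len < 4)
        return 0;
    /* Tables must start with SOI and end with EOI. */
    if (tables[0] != 0xFF || tables[1] != 0xD8)
        return 0;
    if (tables[tables_len - 2] != 0xFF || tables[tables_len - 1] != 0xD9)
        return 0;
    /* The strip must start with SOI. */
    if (strip[0] != 0xFF || strip[1] != 0xD8)
        return 0;

    need = (tables_len - 2) + (strip_len - 2);
    if (need > out_size)
        return 0;

    memcpy(out, tables, tables_len - 2);                     /* drop EOI */
    memcpy(out + (tables_len - 2), strip + 2, strip_len - 2); /* drop SOI */
    return need;
}
```

The result keeps the SOI of the tables and the EOI of the strip, so it begins and ends like a complete JPEG stream.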

Anyways, please try this tiff2pdf program and bash it around, or tcsh or whatever. It's not done yet, so input is appreciated.

Thanks go to Andrey for his advice and commentary.

Have a nice day, I hope this finds you in good spirits and health.

Ross Finlayson