1994.02.09 21:19 "Draft TIFF/JPEG spec", by Tom Lane

This is a first try at a detailed definition of a revised TIFF JPEG compression specification. I'm sending it out to the entire TAC mail list in hopes of getting useful comments from people who didn't participate in the small discussion group.

THIS IS ONLY A DRAFT, it's not set in stone. However, there are good reasons for most of the decisions we've made. Please contact me if you have any questions/suggestions.

                        regards, tom lane

----------------------------------

OVERVIEW:

This document supersedes Section 22 (JPEG) of the TIFF 6.0 spec. It defines a new JPEG-based TIFF compression scheme which is not compatible with 6.0's Section 22. We anticipate that the 6.0 scheme will be withdrawn, or at least labeled obsolete. The design presented here uses a new Compression tag value and new auxiliary field tags, so it will not create any confusion for readers that need to decode both old and new JPEG schemes. (Since the old scheme has seen only very limited use, most applications will not need to handle it.)

Throughout this document, the term "image segment" or "segment" means either a strip or tile, whichever storage format is used by the TIFF file. This compression scheme fits within the standard TIFF 6.0 strip and tile storage layouts; there is no "JPEGInterchangeFormat" hack. It is possible to store an image as a single JPEG datastream contained in a single strip (but designers are reminded that the TIFF specification discourages huge strips).

The scheme presented here is signaled by Compression tag value (TBD) [probably 7?]. When Compression has this value, each image segment contains a complete JPEG datastream which is valid according to the ISO JPEG standard (ISO/IEC 10918-1). Any sequential JPEG process can be used, including lossless JPEG, but progressive and hierarchical processes are not supported. To avoid patent problems, use of arithmetic coding processes in TIFF files intended for inter-application interchange is discouraged. [The TAC will no doubt have a good fight over the exact wording of that sentence :-). There is no technical impediment to using arithmetic coding, but I know that the TAC wants to avoid including any patented schemes in the standard.]

No additional TIFF fields are required to support this compression scheme, but two optional fields are provided to save space and improve compatibility. These fields are:

JPEGClass:
        Tag = (TBD)
        Type = SHORT
        N = 1

This optional field specifies the subset of JPEG which is used in the TIFF file. Currently defined values are:

        0: Strict JPEG Baseline (8 bits, Huffman coded, max 2 DC/2 AC tables)
        1: Extended DCT JPEG (8 bits, Huffman, up to 4 DC/4 AC Huffman tables)
        2: 12-bit DCT JPEG (12-bit precision, otherwise same as class 1)
        3: Lossless JPEG (2-16 bits precision, Huffman coded)

Additional classes may be defined in future. If this field is missing or has an unrecognized value, it is recommended that readers attempt to decode the file anyway, but apply careful error checking to the JPEG markers to ensure that the file is within their capabilities. In particular, to avoid being confused by future extensions to the JPEG standard, it is important to abort if unknown marker codes are seen.

[Do we really need this field? Its only justification is to allow decoders to skimp on error checking. I'd rather remove it and tell people to decode the JPEG markers properly. Comments?]

JPEGTables:
        Tag = (TBD)
        Type = BYTE [or perhaps UNDEFINED would be better?]
        N = length of tables, typically a few hundred bytes

When this field is present, it contains a JPEG "abbreviated table specification" datastream. Use of this field allows a multi-segment image to avoid repeating JPEG table definitions in each segment. JPEGTables provides default JPEG quantization and/or Huffman tables which are used whenever a segment datastream does not contain its own tables, as specified below.

INTERACTION WITH OTHER TIFF FIELDS:

The ISO JPEG compression scheme is applied to the same image data that would be stored in the file if a different Compression tag were used. ISO JPEG does not incorporate any color conversion or downsampling steps; therefore, if color conversion or downsampling is applied, the regular TIFF fields must reflect this fact. PhotometricInterpretation and related fields shall describe the color space actually stored in the file. With the TIFF 6.0 field set, downsampling is permissible only for YCbCr data, and it must correspond to the YCbCrSubSampling field. (Note that the default value of this field is 2,2, not 1,1; so the default for YCbCr is to apply downsampling!) We anticipate that the TIFF field set will be extended to allow downsampling of other color spaces, but that is not strictly a part of this JPEG proposal.

[In some applications, the original data may have been converted to a different colorspace solely to improve JPEG compression. There is some feeling in the discussion group that it would be useful to provide additional, optional TIFF fields to record the original colorspace or the conversion mapping. This is still under debate.]

When DCT-based JPEG is used in a strip TIFF file, RowsPerStrip is required to be a multiple of 8 times the largest vertical sampling factor, i.e., a multiple of the height of an interleaved MCU. (For simplicity of specification, we require this even if the data is not actually interleaved.) Any padding required at the right edge of the image, or at the bottom of the last strip, is assumed to occur internally to the JPEG codec.

When DCT-based JPEG is used in a tiled TIFF file, TileLength is required to be a multiple of 8 times the largest vertical sampling factor, i.e., a multiple of the height of an interleaved MCU; and TileWidth is required to be a multiple of 8 times the largest horizontal sampling factor, i.e., a multiple of the width of an interleaved MCU. (For simplicity of specification, we require this even if the data is not actually interleaved.) All edge padding required will therefore occur in the context of normal TIFF tile padding; it is not special to JPEG.
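The segment-size rules above reduce to simple arithmetic on the sampling factors. What follows is a minimal Python sketch, not part of the proposal; the function names and the representation of sampling factors as a list of (h, v) pairs, one per component, are illustrative assumptions.

```python
def mcu_dimensions(sampling_factors):
    """Return (width, height) in pixels of an interleaved DCT MCU.

    sampling_factors: list of (h, v) pairs, one per component.
    """
    max_h = max(h for h, _ in sampling_factors)
    max_v = max(v for _, v in sampling_factors)
    return 8 * max_h, 8 * max_v


def strip_size_ok(rows_per_strip, sampling_factors):
    """RowsPerStrip must be a multiple of the MCU height."""
    _, mcu_h = mcu_dimensions(sampling_factors)
    return rows_per_strip % mcu_h == 0


def tile_size_ok(tile_width, tile_length, sampling_factors):
    """TileWidth and TileLength must be multiples of the MCU width and height."""
    mcu_w, mcu_h = mcu_dimensions(sampling_factors)
    return tile_width % mcu_w == 0 and tile_length % mcu_h == 0
```

For example, YCbCr with 2h2v luminance gives a 16x16 MCU, so RowsPerStrip=16 is acceptable but RowsPerStrip=8 is not, even though 8 would suffice for unsubsampled data.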

Note that lossless JPEG does not impose any segment size constraints beyond the subsampling-related constraints (see below).

CLARIFICATION OF TIFF 6.0 SUBSAMPLING DISCUSSION:

TIFF 6.0's Section 21 is undesirably vague about subsampling, and it has at least one bogus requirement. We recommend it be revised/clarified as follows:

When subsampling is applied, all image dimensions found in TIFF fields are given in terms of the highest-precision component (luminance). [I believe this is the intention of TIFF 6.0, but it is not spelled out anywhere.]

Segment dimensions (RowsPerStrip/TileWidth/TileLength) are constrained to be multiples of the corresponding (vertical or horizontal) sampling factor; this ensures that there is an integer number of samples of each component per segment. (Note that use of lossy JPEG compression imposes a more severe constraint on segment size, as mentioned above.)

ImageWidth and ImageLength are *not* constrained. If the image size is not a multiple of the sampling factor, pad the data as necessary before downsampling, or discard excess data after upsampling. Padding shall be done by replicating the last sample row or column.
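The padding rule can be illustrated with a short Python sketch, assuming samples are held as a list of rows; the function names are hypothetical, and the downsampling filter itself is deliberately left out since this draft does not specify one.

```python
def pad_by_replication(rows, h_factor, v_factor):
    """Pad a 2-D sample array (list of rows) so its width is a multiple of
    h_factor and its height a multiple of v_factor, by replicating the
    last column and then the last row."""
    pad_cols = -len(rows[0]) % h_factor          # 0 if already a multiple
    padded = [list(r) + [r[-1]] * pad_cols for r in rows]
    pad_rows = -len(padded) % v_factor
    last = padded[-1]
    padded += [list(last) for _ in range(pad_rows)]
    return padded


def crop(rows, width, height):
    """Discard excess samples (e.g. after upsampling) beyond width x height."""
    return [list(r[:width]) for r in rows[:height]]
```

A 3x3 image subsampled 2,2 is padded to 4x4 before downsampling; after decoding and upsampling, cropping back to 3x3 recovers the nominal ImageWidth and ImageLength.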

[I have a separate document giving proposed wording changes for section 21.]

CONTENTS OF JPEG-COMPRESSED IMAGE SEGMENTS:

Each image segment in a JPEG-compressed TIFF file shall contain a valid JPEG datastream according to the ISO JPEG spec's rules for interchange-format or abbreviated-image-format data. The datastream shall contain a single JPEG frame storing that segment of the image. The required JPEG markers within a segment are:

        SOI     (must appear at very beginning of segment)
        SOFn
        SOS     (one for each scan, if there is more than one scan)
        EOI     (must appear at very end of segment)

The actual compressed data follows SOS; it may contain RSTn markers if DRI is used.

Additional JPEG "tables and miscellaneous" markers may appear between SOI and SOFn, between SOFn and SOS, and before each subsequent SOS if there is more than one scan. These markers include:

        DQT
        DHT
        DAC     (not to appear unless arithmetic coding is used)
        DRI
        APPn    (shall be ignored by TIFF readers)
        COM     (shall be ignored by TIFF readers)

(DNL markers shall not be used in TIFF files.) Readers should abort if any other marker type is found, especially the JPEG reserved markers; occurrence of such a marker is likely to indicate a JPEG extension.
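A reader's marker check might look like the Python sketch below. It is an illustration under stated assumptions, not a reference decoder: the names are hypothetical, SOF9/SOF11 are accepted because arithmetic coding is discouraged rather than forbidden, and it validates only marker types and framing, not table contents or marker order beyond "one SOFn before any SOS". Marker lengths are read MSB-first, as the JPEG standard requires.

```python
import struct

# Markers permitted in a TIFF/JPEG image segment under this draft.
TABLES_MISC = {0xDB: "DQT", 0xC4: "DHT", 0xCC: "DAC", 0xDD: "DRI", 0xFE: "COM"}
TABLES_MISC.update({0xE0 + n: "APP%d" % n for n in range(16)})
SOF_TYPES = {0xC0: "SOF0", 0xC1: "SOF1", 0xC3: "SOF3",
             0xC9: "SOF9", 0xCB: "SOF11"}  # SOF9/SOF11: arithmetic coding
SOI, EOI, SOS = 0xD8, 0xD9, 0xDA


def scan_segment_markers(data):
    """Return the marker names found in one image segment, in order.

    Raises ValueError on any marker not allowed here (DNL, progressive or
    hierarchical SOFn, reserved codes, a second SOI, ...). RSTn markers
    inside entropy-coded data are skipped rather than reported.
    """
    try:
        if data[0] != 0xFF or data[1] != SOI:
            raise ValueError("segment must begin with SOI")
        names, pos, seen_sof = ["SOI"], 2, False
        while True:
            if data[pos] != 0xFF:
                raise ValueError("expected marker at offset %d" % pos)
            while data[pos] == 0xFF:          # skip optional fill bytes
                pos += 1
            code = data[pos]
            pos += 1
            if code == EOI:
                if pos != len(data):
                    raise ValueError("EOI must be at the very end")
                names.append("EOI")
                return names
            if code in SOF_TYPES:
                if seen_sof:
                    raise ValueError("only one frame allowed per segment")
                seen_sof = True
                names.append(SOF_TYPES[code])
            elif code == SOS:
                if not seen_sof:
                    raise ValueError("SOS before SOFn")
                names.append("SOS")
            elif code in TABLES_MISC:
                names.append(TABLES_MISC[code])
            else:
                raise ValueError("marker 0xFF%02X not allowed" % code)
            # Marker segment length is MSB-first and counts its own 2 bytes.
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            if length < 2:
                raise ValueError("bad marker length")
            pos += length
            if code == SOS:
                # Skip entropy-coded data: FF00 is a stuffed byte and
                # FFD0-FFD7 are RSTn; stop at any other FFxx.
                while (data[pos] != 0xFF or data[pos + 1] == 0x00
                       or 0xD0 <= data[pos + 1] <= 0xD7):
                    pos += 2 if data[pos] == 0xFF else 1
    except (IndexError, struct.error):
        raise ValueError("truncated datastream") from None
```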

The tables/miscellaneous markers may appear in any order. Readers are cautioned that although the SOFn marker refers to DQT tables, JPEG does not require those tables to precede SOFn, only the SOS. Missing-table checks should be made at SOS time.
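The point about deferring the missing-table check can be shown with a small Python sketch. It works on a hypothetical pre-parsed list of events rather than raw bytes, and for simplicity it checks every component's quantization table at each SOS, which is stricter than JPEG requires for multi-scan segments (but matches the single-interleaved-scan case recommended later in this document).

```python
def check_quant_tables(events, preloaded=()):
    """Check DQT availability at SOS time rather than at SOFn time.

    events: hypothetical (marker, value) pairs in datastream order:
        ("DQT", slot)          a quantization table is defined in slot 0-3
        ("SOF", [slot, ...])   quantization slot chosen by each component
        ("SOS", None)          start of a scan
    preloaded: slots already loaded from JPEGTables, if any.
    """
    defined = set(preloaded)
    needed = []
    for marker, value in events:
        if marker == "DQT":
            defined.add(value)
        elif marker == "SOF":
            needed = list(value)       # remember the references; don't check yet
        elif marker == "SOS":
            missing = sorted(set(needed) - defined)
            if missing:
                raise ValueError("scan needs undefined DQT slot(s) %s" % missing)
    return True
```

A datastream that places its DQT markers after SOFn passes this check, which is exactly the case an over-eager check at SOFn time would wrongly reject.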

If no JPEGTables field is used, then each image segment shall be a complete JPEG interchange datastream. Each segment must define all the tables it references. To allow readers to decode segments in any order, no segment may rely on tables being carried over from a previous segment.

When a JPEGTables field is used, image segments may omit tables that have been specified in the JPEGTables field. Further details appear below.

The SOFn marker shall be of type SOF0 for strict JPEG baseline files, of type SOF1 for non-baseline DCT files, or of type SOF3 for lossless JPEG files. [SOF9 or SOF11 would be used for arithmetic coding.] All segments of a JPEG-compressed TIFF image shall use the same SOFn type.

The data precision field of the SOFn marker shall agree with the TIFF BitsPerSample field. (Note that when PlanarConfiguration=1, this implies that all components must have the same BitsPerSample value; when PlanarConfiguration=2, different components could have different depths.) For SOF0 only precision 8 is permitted; for SOF1, precision 8 or 12 is permitted; for SOF3, precisions 2 to 16 are permitted.
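The SOFn type and precision rules can be expressed as a table lookup. This Python sketch is illustrative only; the names are hypothetical, and it covers just the three SOFn types named above (the arithmetic-coding types are left out).

```python
# Sample precisions permitted for each SOFn type in this draft.
PERMITTED_PRECISION = {
    0xC0: {8},                 # SOF0: strict baseline
    0xC1: {8, 12},             # SOF1: extended DCT
    0xC3: set(range(2, 17)),   # SOF3: lossless, 2..16 bits
}


def check_sof_precision(sof_code, precision, segment_bits):
    """Validate an SOFn against the TIFF fields for one segment.

    segment_bits: BitsPerSample values of the components stored in this
    segment (all of them for PlanarConfiguration=1, one value for PC=2).
    """
    if sof_code not in PERMITTED_PRECISION:
        raise ValueError("SOF type 0x%02X not covered by this check" % sof_code)
    if precision not in PERMITTED_PRECISION[sof_code]:
        raise ValueError("precision %d not allowed for this SOF type" % precision)
    if any(bits != precision for bits in segment_bits):
        raise ValueError("SOFn precision disagrees with BitsPerSample")
    return True
```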

The image dimensions given in the SOFn marker shall agree with the logical dimensions of that particular strip or tile. For strip images, the SOFn image width shall equal ImageWidth and the height shall equal RowsPerStrip, except in the last strip which may have a smaller height (padding rows shall not be counted). For tile images, each SOFn shall have width TileWidth and height TileLength; padding in the edge tiles is the concern of some higher level of the TIFF software. It is worth noting that the overall size of a tiled image can exceed the 64K*64K limit of JPEG itself; strip JPEG images are limited to 64K width, but not limited in height.
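For strip images, the SOFn dimensions follow from ImageWidth, ImageLength and RowsPerStrip. A minimal Python sketch, with a hypothetical function name and 0-based strip numbering assumed:

```python
JPEG_MAX_DIM = 65535  # SOFn stores width and height as 16-bit values


def strip_sof_dimensions(image_width, image_length, rows_per_strip, strip):
    """(width, height) that the SOFn of strip number `strip` (0-based)
    must declare; the last strip counts only real rows, not padding."""
    strips_per_image = -(-image_length // rows_per_strip)  # ceiling division
    if not 0 <= strip < strips_per_image:
        raise IndexError("no such strip")
    height = min(rows_per_strip, image_length - strip * rows_per_strip)
    if image_width > JPEG_MAX_DIM or height > JPEG_MAX_DIM:
        raise ValueError("strip exceeds JPEG's 16-bit dimension limit")
    return image_width, height
```

For a 640x480 image with RowsPerStrip=64 there are 8 strips; strips 0 through 6 declare 640x64 and strip 7 declares 640x32. Because only each strip's height is limited, the image as a whole can be taller than 65535 rows.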

In PlanarConfiguration 2, the dimensional rules are different since each JPEG datastream must be a valid image standing on its own. In PC 2 the dimensions given in the SOFn of a subsampled component shall match the actual number of samples stored in that segment; thus, they are scaled down by the sampling factors compared to the "nominal" dimensions that would be used in PC 1.
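In PC 2 the SOFn dimensions of a subsampled component are the nominal segment dimensions divided by the sampling factors. The Python sketch below is a hypothetical helper; it assumes ceiling division so that a partial last strip (whose data was padded by replication before downsampling) still yields a whole number of samples.

```python
def pc2_sof_dimensions(seg_width, seg_height, h_factor, v_factor):
    """SOFn (width, height) for a PlanarConfiguration=2 segment holding a
    component subsampled by h_factor x v_factor relative to luminance."""
    # Ceiling division: a partial last strip still covers a padded sample row.
    return -(-seg_width // h_factor), -(-seg_height // v_factor)
```

With 2,2 subsampling, a 640x64 luminance strip corresponds to 320x32 Cb and Cr segments; a 33-row final strip would give 17 chroma rows under this assumption.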

The number of components in the JPEG datastream shall equal SamplesPerPixel for PlanarConfiguration=1, and shall be 1 for PlanarConfiguration=2. The components shall be stored in the same order as they are described at the TIFF field level. (This applies both to their order in the SOFn marker, and to the order in which they are scanned if multiple JPEG scans are used.) The component ID bytes are arbitrary so long as each component within an image segment is given a distinct ID. For consistency, we require that all segments of a TIFF image use the same ID code for a given component.

In PlanarConfiguration 1, the sampling factors given in the SOFn shall agree with the sampling factors defined by the related TIFF fields. In PlanarConfiguration 2, all SOFn sampling factors shall be 1, since each image segment looks like a simple grayscale image at the JPEG level. (Any downsampling in PC 2 will need to happen externally to the JPEG codec.)

Within both JPEG image segments and JPEGTables fields, multibyte values appear in the MSB-first order specified by the JPEG standard, regardless of the byte ordering of the surrounding TIFF file.

CONTENTS OF JPEGTABLES FIELD:

The purpose of JPEGTables is to predefine DQT and/or DHT tables for subsequent use by JPEG image segments. When this is done, these rather bulky tables need not be duplicated in each segment, thus saving space and processing time. JPEGTables may be used even in a single-segment file, although there is no space savings in that case.

When the optional JPEGTables field is present, it shall contain a valid JPEG "abbreviated table specification" datastream. This datastream shall begin with SOI and end with EOI. It may contain zero or more JPEG "tables and miscellaneous" markers, namely:

        DQT
        DHT
        DAC     (not to appear unless arithmetic coding is used)
        DRI
        APPn    (shall be ignored by TIFF readers)
        COM     (shall be ignored by TIFF readers)

Since JPEG defines the SOI marker to reset the DAC and DRI status, these marker values cannot be carried over into any image datastream, and thus they are effectively no-ops in the JPEGTables field. To avoid confusion, it is recommended that writers not place these marker types in JPEGTables. However, readers should properly skip over them if they appear.
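Checking a JPEGTables value is simpler than checking an image segment, because there is no frame or entropy-coded data. A minimal Python sketch with a hypothetical function name; it returns the markers it finds, and leaves it to the caller to treat DAC, DRI, APPn and COM as no-ops.

```python
import struct

# "Tables and miscellaneous" markers allowed in JPEGTables.
TABLE_MARKERS = {0xDB: "DQT", 0xC4: "DHT", 0xCC: "DAC", 0xDD: "DRI", 0xFE: "COM"}
TABLE_MARKERS.update({0xE0 + n: "APP%d" % n for n in range(16)})


def list_jpegtables_markers(data):
    """Return the marker names between SOI and EOI in a JPEGTables value.

    DAC and DRI are listed like any other marker, but a reader should treat
    them as no-ops: each image segment's own SOI resets them anyway.
    """
    try:
        if data[0] != 0xFF or data[1] != 0xD8:
            raise ValueError("JPEGTables must begin with SOI")
        names, pos = [], 2
        while True:
            if data[pos] != 0xFF:
                raise ValueError("expected marker at offset %d" % pos)
            while data[pos] == 0xFF:      # skip optional fill bytes
                pos += 1
            code = data[pos]
            pos += 1
            if code == 0xD9:              # EOI
                if pos != len(data):
                    raise ValueError("EOI must be at the very end")
                return names
            if code not in TABLE_MARKERS:
                raise ValueError("marker 0xFF%02X not allowed in JPEGTables" % code)
            # Length is MSB-first and counts its own two bytes.
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            if length < 2:
                raise ValueError("bad marker length")
            names.append(TABLE_MARKERS[code])
            pos += length
    except (IndexError, struct.error):
        raise ValueError("truncated JPEGTables") from None
```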

When JPEGTables is present, readers shall load the table specifications contained in JPEGTables before processing image segment datastreams. Image segments may simply refer to the preloaded tables without defining them. An image segment can still define and use its own tables, subject to the restrictions below.

An image segment may not redefine any table defined in JPEGTables. (This restriction is imposed to allow readers to process image segments in random order without having to rescan JPEGTables between segments.) Therefore, use of JPEGTables divides the available table slots into two groups: "global" slots are defined in JPEGTables and may be used but not redefined by segments; "local" slots are available for local definition and use in each segment. To support random access, a segment may not reference any local tables that it does not itself define.

[We could interpret "may not redefine" in at least three different ways:

  1. Strict: no DQT/DHT for that slot number may appear in image segments.
  2. Sloppy: DQT/DHT may appear, but only if it loads the table with the
     identical values loaded by JPEGTables.
  3. Pulling a fast one: segment may change a global table, but only if it
     changes it back before EOI.

I like #1. #2 is of dubious value, and #3 looks downright dangerous --- what if a reader doesn't bother to scan to EOI, but stops after it's got the data?]
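Under the strict interpretation (#1), the global/local rule for one segment is a pair of set operations. This Python sketch is hypothetical: it represents table slots as (kind, id) pairs such as ("DQT", 0), ("DC", 1) or ("AC", 1), and assumes the caller has already collected which slots the segment defines and references.

```python
def check_segment_tables(global_slots, defined, referenced):
    """Apply the global/local table-slot rule to one image segment.

    global_slots: slots defined in JPEGTables
    defined:      slots the segment defines itself
    referenced:   slots the segment uses
    """
    global_slots, defined, referenced = set(global_slots), set(defined), set(referenced)
    redefined = defined & global_slots          # strict: no DQT/DHT for a global slot
    if redefined:
        raise ValueError("segment redefines global table(s): %s" % sorted(redefined))
    missing = referenced - global_slots - defined   # local tables must be defined locally
    if missing:
        raise ValueError("segment uses undefined local table(s): %s" % sorted(missing))
    return True
```

Because the check needs nothing from any other segment, a reader can apply it to segments in any order, which is the point of the restriction.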

There is no default value for JPEGTables; standard TIFF files must define all tables that they reference. For some closed systems in which many files will have identical tables, it might make sense to invent a default JPEGTables value to avoid actually storing the tables. Or even better, invent a private field selecting one of N default JPEGTables settings, so as to allow for future expansion. This should be regarded as a private extension.

RECOMMENDATIONS FOR MAXIMUM INTERCHANGEABILITY:

ISO JPEG is an extremely general standard; few existing implementations support the entire standard. For maximum cross-application compatibility, we recommend that writers confine themselves to the following JPEG subset unless there is very good reason to do otherwise. Readers shall support *at least* all of the following subset of JPEG in order to claim TIFF/JPEG compatibility.

[Some of these suggestions should perhaps be turned into requirements of the TIFF standard. Any comments here?]

Use the ISO "baseline" JPEG subset: 8-bit data precision, Huffman coding, no more than 2 DC and 2 AC Huffman tables. (We recommend deviating from baseline JPEG only if 12-bit data precision or lossless coding is required.)

Use no subsampling (all JPEG sampling factors = 1) for color spaces other than YCbCr. For YCbCr, use one of the following choices:

        YCbCrSubSampling field          JPEG sampling factors
        1,1                             1h1v, 1h1v, 1h1v
        2,1                             2h1v, 1h1v, 1h1v
        2,2  (default value)            2h2v, 1h1v, 1h1v
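The table above amounts to a fixed mapping. A Python sketch with hypothetical names, where a missing YCbCrSubSampling field is modeled by the default argument (2,2), matching the TIFF default:

```python
# Recommended YCbCrSubSampling values and the matching JPEG (h, v)
# sampling factors for the Y, Cb and Cr components, in that order.
RECOMMENDED_FACTORS = {
    (1, 1): [(1, 1), (1, 1), (1, 1)],
    (2, 1): [(2, 1), (1, 1), (1, 1)],
    (2, 2): [(2, 2), (1, 1), (1, 1)],
}


def ycbcr_sampling_factors(ycbcr_subsampling=(2, 2)):
    """Return JPEG sampling factors for a recommended YCbCrSubSampling value."""
    try:
        return list(RECOMMENDED_FACTORS[tuple(ycbcr_subsampling)])
    except KeyError:
        raise ValueError("not one of the recommended choices") from None
```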

We recommend that RGB source data be converted to YCbCr for best compression results. Other source data colorspaces should probably be left alone.

Use PlanarConfiguration=1 to avoid the nonintuitive requirements of PC=2, especially if any subsampling is going on.

Use a single interleaved scan in each image segment. (This is not legal JPEG if the sampling factors are such that more than 10 blocks would be needed per MCU; in that case, use a separate scan for each component. The recommended color spaces and sampling factors will not run into that restriction.)
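Whether a single interleaved scan is legal can be checked from the sampling factors alone. In this Python sketch (hypothetical name), the limits used are JPEG's own: at most 4 components in one scan, and at most 10 blocks (the sum of h*v) per MCU when the scan is interleaved.

```python
def can_use_single_interleaved_scan(sampling_factors):
    """True if all components fit in one JPEG scan.

    sampling_factors: list of (h, v) pairs, one per component.
    """
    if len(sampling_factors) == 1:
        return True  # non-interleaved scan: every MCU is a single block
    blocks_per_mcu = sum(h * v for h, v in sampling_factors)
    return len(sampling_factors) <= 4 and blocks_per_mcu <= 10
```

The recommended YCbCr 2,2 case needs 4 + 1 + 1 = 6 blocks per MCU, well within the limit; unsubsampled CMYK needs 4.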

Avoid "noise" JPEG markers (COM and APPn markers). Standard TIFF fields provide a better way to transport any non-image data. Some JPEG decoders may change behavior if they see an APPn marker they think they understand; since the TIFF spec requires these markers to be ignored, this behavior is undesirable.

To claim TIFF/JPEG compatibility, readers shall support multiple-strip TIFF files and the optional JPEGTables field; it is not acceptable to read only single-datastream files. Support for tiled TIFF files is strongly recommended but not required. [I know this paragraph will draw complaints. The way I see it, a single-strip-only subset doesn't deserve to be called TIFF at all.]