2006.04.22 03:37 "[Tiff] Microsoft Document Imaging status / snapshot", by Brad Hards

2006.04.25 14:49 "[Tiff] Re: Tiff Digest, Vol 23, Issue 23", by Glenn Widener

>
> Message: 1
> Date: Mon, 24 Apr 2006 13:39:41 +0200
> From: "Gerben Vos" <Gerben@ZyLAB.COM>
> Subject: RE: [SPAM HEADER] - [Tiff] Microsoft Document Imaging status
> / snapshot - Email found in subject
> To: <tiff@lists.maptools.org>
> Message-ID: <FC840EC0BF7BFA45A0F94548AF36920DABF224@zynlms01.ZyLAB.WAN>
> Content-Type: text/plain;     charset="iso-8859-1"

0xef  0x82  0xa7  = some kind of bullet point symbol
0xef  0x82  0xb7  = some kind of bullet point symbol (different to a7)
0xe2  0x80  0x93  = em-dash
0xe2  0x80  0x9c  = `` (smart doublequotes, left side of  quoted material)
0xe2  0x80  0x9d  = '' (smart doublequotes, right side of  quoted material)
0xe2  0x80  0x99  = ' (apostrophe of some kind)
0xe2  0x80  0xa6
0xe2  0x80  0x94 = short dash?
0xc3  0xa9 = e with grave. (00a9 is the unicode equivalent, perhaps this will form some pattern)

These are clearly UTF-8 encoded Unicode characters:

U+F0A7 = (user-defined)
U+F0B7 = (user-defined)
U+2013 = en-dash (shorter than em-dash!)
U+201C = left double quote
U+201D = right double quote
U+2019 = right single quote
U+2026 = ellipsis (three dots)
U+2014 = em-dash (longer than en-dash!)
U+00E9 = e-acute (not U+00A9, which is the copyright sign)

Some of the ones you list (e.g., the first two bullets) are in the Unicode Private Use Area (the "implementation defined" range), but lists of the Microsoft assignments in that range are easy to find on the Internet.
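For anyone who wants to check the table above by machine, here is a minimal Python sketch that decodes each listed byte sequence as UTF-8 and looks up the code point. It assumes the bytes really are plain UTF-8, as suggested above; the names `SEQUENCES` and `describe` are made up for this illustration and are not part of any TIFF or MDI tool.

```python
import unicodedata

# Byte sequences exactly as listed in the message, in the same order.
SEQUENCES = [
    bytes([0xEF, 0x82, 0xA7]),
    bytes([0xEF, 0x82, 0xB7]),
    bytes([0xE2, 0x80, 0x93]),
    bytes([0xE2, 0x80, 0x9C]),
    bytes([0xE2, 0x80, 0x9D]),
    bytes([0xE2, 0x80, 0x99]),
    bytes([0xE2, 0x80, 0xA6]),
    bytes([0xE2, 0x80, 0x94]),
    bytes([0xC3, 0xA9]),
]


def describe(seq: bytes) -> str:
    """Decode one UTF-8 sequence and return 'U+XXXX NAME'."""
    ch = seq.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    if len(ch) != 1:
        raise ValueError("expected exactly one character per sequence")
    if unicodedata.category(ch) == "Co":
        # Private Use Area: Unicode assigns no name, the meaning is vendor-defined.
        name = "PRIVATE USE"
    else:
        name = unicodedata.name(ch, "UNNAMED")
    return f"U+{ord(ch):04X} {name}"


if __name__ == "__main__":
    for seq in SEQUENCES:
        print(seq.hex(" "), "=", describe(seq))
```

The two bullet sequences come out as private-use code points, so their glyphs depend on whatever font or vendor table the producer had in mind. The last sequence, c3 a9, decodes to U+00E9 (e with acute accent) under standard UTF-8.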

From our experience deciphering "Word Smart Quotes" in Windows print driver output, it also produces:

U+2018 = left single quote

By the way, thanks for posting this; I was intending to try to figure this out, but had to postpone it.

Likewise. I'm finishing up a release and will be diving into this later this week. Beyond wanting to read/write MS's TIFF text info, I will be pondering whether their format or a variant might be the basis for a "standard" TIFF selectable text extension. Note that I say "selectable" - text bounding boxes are an essential requirement for us.

--
Glenn Widener
SwiftView Tools Product Manager

SwiftView Inc. - quality PCL portable document tools and services

www.swiftview.com
Work: (971)223-2621
Cell: (503)351-1178