AWARE [SYSTEMS] Imaging expertise for the Delphi developer

LibTiff Mailing List

TIFF and LibTiff Mailing List Archive
December 2003

The TIFF Mailing List Homepage
This list is run by Frank Warmerdam
Archive maintained by AWare Systems



Thread

2003.12.13 15:00 "Stupid question", by Joris Van Damme
2003.12.13 16:26 "Re: Stupid question", by Frank Warmerdam
2003.12.13 16:54 "Re: Stupid question", by Joris Van Damme
2003.12.13 21:30 "Re: Stupid question", by Phillip Crews
2003.12.13 22:14 "Re: Stupid question", by Joris Van Damme
2003.12.16 08:07 "Re: Stupid question", by Joris Van Damme
2003.12.16 08:53 "Re: Stupid question", by Andrey Kiselev
2003.12.16 09:33 "Re: Stupid question", by Joris Van Damme
2003.12.16 13:04 "Re: Stupid question", by Phillip Crews
2003.12.16 13:47 "Re: Stupid question", by Joris Van Damme
2003.12.16 14:27 "Re: Stupid question", by Frank Warmerdam
2003.12.16 15:34 "Re: Stupid question", by Joris Van Damme
2003.12.16 16:02 "Re: Stupid question", by Andrey Kiselev
2003.12.16 17:02 "Re: Stupid question", by Joris Van Damme
2003.12.16 19:14 "Re: Stupid question", by Frank Warmerdam
2003.12.16 23:46 "Re: Stupid question", by Joris Van Damme

2003.12.16 23:46 "Re: Stupid question", by Joris Van Damme

> No pressure.  Frankly if you could assemble the email, clean out the spam
> and make it available for a normal threaded archive I think that would be
> great.

Where text is concerned, I strongly believe in 'single-sourcing'. You know,
grabbing the data, building the code to convert it to a proprietary format
that is really cut out for that particular data, and then being able to build
the code to convert it into anything else. The main point of this
'single-sourcing' is that the actual content should not be tied to any output
format. Since we live in the real world, that means it should be stored in a
format that is exactly fitted to the data and can be transformed into any
other.

Having progressed a little more, I can see now that I had best work in two
stages. Indeed, there should be no regrouping or categorizing in the first
stage. Therefore, the result of the first stage is a normal threaded
archive. I'll be able to deliver mbox, I'm sure, as well as HTML pages, a
Word doc, PDF, or just about anything. I have already built, and am able to
reuse, a limited HTML and PDF codec, and I have experience in Word OLE, so
only mbox is new to me, but that will be no problem at all, I guess. I
realize this talk about a 'proprietary format' in between may sound unusual,
but, trust me, I have done similar stuff before. The only real reason for
concern is the magnitude of the project.
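
To make the single-sourcing idea concrete, here is a minimal Python sketch,
not the actual tool (that one is a Windows program with its own format). It
assumes a neutral record with only four fields and renders it to two targets,
mbox and HTML. The names ArchivedMessage, to_email, to_html and write_mbox are
made up for illustration.

```python
import html
import mailbox
from dataclasses import dataclass
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


@dataclass
class ArchivedMessage:
    """Hypothetical neutral record; the real intermediate format is not shown."""
    sender: str
    sent_utc: datetime  # timezone-aware, already converted to GMT/UTC
    subject: str
    body: str


def to_email(msg: ArchivedMessage) -> EmailMessage:
    """Render the neutral record as an RFC 5322 style message."""
    out = EmailMessage()
    out["From"] = msg.sender
    out["Date"] = format_datetime(msg.sent_utc)
    out["Subject"] = msg.subject
    out.set_content(msg.body)
    return out


def to_html(msg: ArchivedMessage) -> str:
    """Render the same record as a small HTML fragment."""
    return (
        f"<h2>{html.escape(msg.subject)}</h2>\n"
        f"<p>{html.escape(msg.sender)}, {msg.sent_utc:%Y.%m.%d %H:%M}</p>\n"
        f"<pre>{html.escape(msg.body)}</pre>\n"
    )


def write_mbox(path: str, messages: list) -> None:
    """Append every record to an mbox file (created if missing).

    A real tool should also lock the mailbox while writing; that is
    left out here to keep the sketch short.
    """
    box = mailbox.mbox(path)
    try:
        for msg in messages:
            box.add(to_email(msg))
    finally:
        box.close()  # close() flushes pending messages to disk
```

The point of the design is that to_email and to_html never look at each
other's output; both read only the neutral record, so another target (plain
text, PDF, Word) is one more function and nothing else changes.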

If you can run (and trust) a Windows executable, I'll send you, in a few
days, a viewer for the proprietary format along with the data I have
processed up to that point. That executable will already be able to build
plain text output, HTML pages and a Word doc. If not, I'll send you the HTML
pages. That is, if you would like to receive such a preview, of course. The
current processing involves:
- extracting the messages from the HTML pages (the Aug 99 archive)
- processing the headers so they no longer include 'Company', 'CC',
'Reply-To' and such, if originally present
- processing the 'From' field in the header to uniquely identify a
sender, even if that sender changed his/her e-mail address during these
many years
- converting the date indications to GMT
- filtering out test messages, mailing list software generated messages, and
spam
- converting the occasional HTML message to plain text
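
Three of the steps above (dropping headers, identifying the sender, converting
dates to GMT) can be sketched in Python as follows. This is only an
illustration under assumptions, not the code actually used: the alias table,
its addresses, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parseaddr, parsedate_to_datetime

# Headers to drop, compared case-insensitively.
DROPPED_HEADERS = {"company", "cc", "reply-to"}

# Hypothetical alias table: every known address of a person maps to one
# canonical name. In practice this would be maintained by hand.
SENDER_ALIASES = {
    "old.address@example.org": "Example Sender",
    "new.address@example.net": "Example Sender",
}


def strip_headers(headers: dict) -> dict:
    """Return a copy of the headers without the unwanted fields."""
    return {k: v for k, v in headers.items() if k.lower() not in DROPPED_HEADERS}


def canonical_sender(from_field: str) -> str:
    """Map a 'From' value to one identity, whatever address was used."""
    name, address = parseaddr(from_field)
    # Unknown addresses fall back to the display name, then the address.
    return SENDER_ALIASES.get(address.lower(), name or address)


def to_gmt(date_field: str) -> datetime:
    """Parse an RFC 2822 style date and express it in GMT (UTC)."""
    parsed = parsedate_to_datetime(date_field)
    if parsed.tzinfo is None:
        # A '-0000' offset yields a naive datetime; treat it as UTC already.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

For example, to_gmt("Tue, 16 Dec 2003 18:46:00 -0500") gives 23:46 on the
same day in GMT. Spam filtering and HTML-to-text conversion are left out
because they depend much more on the actual data.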

Perhaps I'll add some de-word-wrapping code, so as to end up with formattable
text, as opposed to preformatted text, but I haven't given that a lot of
thought yet.
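
One naive way to do such de-word-wrapping, as a Python sketch only: join the
lines of each paragraph, treat blank lines as paragraph breaks, and leave
quoted lines (those starting with '>') alone. The function name unwrap is made
up, and the sketch deliberately ignores lists, signatures and code, which is
where the real difficulty lies.

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph."""
    output = []   # finished output lines
    current = []  # lines of the paragraph being collected

    def flush() -> None:
        # Emit the collected paragraph as a single line, if any.
        if current:
            output.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()
            output.append("")        # keep the paragraph break
        elif stripped.startswith(">"):
            flush()
            output.append(stripped)  # quoted text stays as it was wrapped
        else:
            current.append(stripped)
    flush()
    return "\n".join(output)
```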

All of these actions are done automatically, for the most part, by my
temporary dirty code. But I still review each processed message by hand, to
either give my consent or improve on the code, and I plan to continue doing
that... Which means that this will take some time...

> Even better if you can extract a few FAQs with answers.

In a second stage, I would still like to build a 'TIFF-sensitive knowledge
base' kind of thing that can list relevant pointers into the spec and
relevant mailing list messages from any TIFF page, and I like the idea of a
FAQ too. Categorizing the messages in the complete archive built in the
first stage will be the main issue here, I guess.

> I don't see any problem with your hosting a copy, or any reasonable use
> you put the email archive to.  You have my complete approval.

Grand!



Joris