AWARE [SYSTEMS] Imaging expertise for the Delphi developer

TIFF and LibTiff Mailing List Archive
December 2003

The TIFF Mailing List Homepage
This list is run by Frank Warmerdam
Archive maintained by AWare Systems

Thread

2003.12.13 15:00 "Stupid question", by Joris Van Damme
2003.12.13 16:26 "Re: Stupid question", by Frank Warmerdam
2003.12.13 16:54 "Re: Stupid question", by Joris Van Damme
2003.12.13 21:30 "Re: Stupid question", by Phillip Crews
2003.12.13 22:14 "Re: Stupid question", by Joris Van Damme
2003.12.16 08:07 "Re: Stupid question", by Joris Van Damme
2003.12.16 08:53 "Re: Stupid question", by Andrey Kiselev
2003.12.16 09:33 "Re: Stupid question", by Joris Van Damme
2003.12.16 13:04 "Re: Stupid question", by Phillip Crews
2003.12.16 13:47 "Re: Stupid question", by Joris Van Damme
2003.12.16 14:27 "Re: Stupid question", by Frank Warmerdam
2003.12.16 15:34 "Re: Stupid question", by Joris Van Damme
2003.12.16 16:02 "Re: Stupid question", by Andrey Kiselev
2003.12.16 17:02 "Re: Stupid question", by Joris Van Damme
2003.12.16 19:14 "Re: Stupid question", by Frank Warmerdam
2003.12.16 23:46 "Re: Stupid question", by Joris Van Damme

2003.12.16 15:34 "Re: Stupid question", by Joris Van Damme

> > I think my only hope of completing this project is to avoid doing
> > things manually anyway. For example, the 'old' on-line archive that
> > goes back to Aug 99 is 6217 messages long, and accessible only
> > through 6217 separate HTML pages. I didn't quite feel up to hitting
> > 'Save as' 6217 times, so I brewed a few lines of code that downloaded
> > the 6217 messages to my hard disk with a single click. I figure
> > that's the only feasible strategy for all post-processing and most of
> > the grouping and indexing too. So I don't mind CR/LF; in fact, that's
> > going to be quite a relief after brewing code to extract the true
> > messages from the HTML pages and filtering out all Viagra-related
> > stuff.
>
> Joris,
>
> Yikes.

You can say that again. ;-)

No, really, it's not that bad. The HTML is very predictable and uniform
across pages, as is typical for generated HTML, of course, so I coded up a
little thing to extract the first message (your 'test' message, quite a
symbol), and that is already enough to handle most pages.
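
To give an idea of what such an extractor can look like, here is a minimal
Python sketch. It assumes, purely for illustration, that each generated page
keeps the message body in a single <pre> element; the real archive pages may
well be laid out differently. The names FirstPreExtractor and
extract_first_message are made up for this example.

```python
from html.parser import HTMLParser


class FirstPreExtractor(HTMLParser):
    """Collects the text of the first <pre> element on a page.

    Assumption: the generated archive pages keep the message body in a
    <pre> block. That layout is hypothetical, not taken from the archive.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_pre = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self.done:
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre" and self.in_pre:
            self.in_pre = False
            self.done = True  # ignore any later <pre> blocks

    def handle_data(self, data):
        # Text inside nested tags (links, bold, ...) is collected as well.
        if self.in_pre:
            self.chunks.append(data)


def extract_first_message(html_text):
    """Return the text of the first <pre> block, or None if there is none."""
    parser = FirstPreExtractor()
    parser.feed(html_text)
    parser.close()
    if not parser.done:
        return None
    return "".join(parser.chunks).strip()
```

Because generated pages are so uniform, a small state machine like this tends
to be enough; pages that come back as None can be set aside and handled by
hand.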

> Note, Andrey and I are happy to provide more direct access to the
> archives ... such as the mbox format recent archive if you need it.

The extraction from those HTML pages is nearly finished. But I figure I need
to handle this mbox format anyhow, if I understand you correctly that this is
the format of the 2003 archive. And I'll most definitely end up with code
for eliminating duplicates, so anything is helpful.
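
As a rough sketch of how mbox reading and duplicate elimination could fit
together, the following Python uses the standard mailbox module. It assumes
that the Message-ID header is a good enough identity for a message, and falls
back to sender, date and subject when that header is missing; both choices
are assumptions for this example, and messages recovered from HTML pages may
need a looser comparison. The names dedup_key and unique_messages are made up
here.

```python
import mailbox


def dedup_key(msg):
    """Key used to decide whether two messages are the same.

    Assumption: Message-ID is present and reliable for most messages; when
    it is missing, fall back to sender, date and subject.
    """
    msg_id = msg.get("Message-ID")
    if msg_id:
        return ("id", str(msg_id).strip())
    return (
        "hdr",
        str(msg.get("From", "")),
        str(msg.get("Date", "")),
        str(msg.get("Subject", "")),
    )


def unique_messages(mbox_paths):
    """Yield each distinct message once, in the order first seen."""
    seen = set()
    for path in mbox_paths:
        # create=False: raise an error instead of creating a missing file.
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                key = dedup_key(msg)
                if key not in seen:
                    seen.add(key)
                    yield msg
        finally:
            box.close()
```

Feeding several overlapping mbox files through unique_messages keeps the
first copy of each message and drops the later ones.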

Nevertheless, the folder with the auto-downloaded HTML pages, each in a
separate text file, is about 30 meg. The mbox format is a lot more
efficient, but still, I think it'll be 10 to 15 meg, right? It's not going
to be easy to mail that to me, especially seeing my provider's mail servers
are having a bad week again (or is it a bad year?). So, bottom line: yes,
I'd appreciate receiving them, unless sending them is too much trouble.

> Also,
> we are happy to host what you come up with as a cleaned up archive.

Let's first wait and see if I get this job done. I'm not yet quite 100% sure
that it's going to be feasible, even with maximum non-manual handling. I've
got a few ideas for useful regrouping as a kind of context-sensitive help.
Except that it'll be more like 'TIFF-sensitive' instead of
context-sensitive, and more like 'knowledge base' instead of help. That is
not quite the complete archive, but it is almost guaranteed to be feasible, I
think... So I guess something useful is going to come out of this anyhow,
even though it may perhaps not be a completely restored archive for the last
15 years or so. Anyway, let's just wait and see first.

As to the hosting... That may be the only part of the job that might be
somewhat beneficial to me. I may want to take you up on the hosting offer,
but I'm also wondering if you would mind me hosting the results myself. Even
if I'm allowed to host it myself, it will - of course - be a completely free
download, no strings or even advertising attached, except for the hosting
domain name and maybe a single pointer to my site or something. If this is
not acceptable, just say so, I'm not going to do anything on this project
that either you or Andrey object to. Your approval is important to me.



Joris