2018.01.16 18:36 "Re: [Tiff] Strategies for multi-core speedups", by
I am a complete newbie to digital imagery and TIFFs in general, so please
forgive any ignorance.
I'm an automated test tool developer, and I'm about to start working on a
viewer for the TIFFs that my company generates, potentially relying on the
BigTIFF format; this is the thing I've spent most of my time thinking
about. I've come to the conclusion that for the images we generate, up to
10 GB uncompressed (soon to be 40 GB), I will need to preprocess a single
TIFF into multiple TIFFs containing intelligently placed tiles just to get
around the I/O issue. By intelligently placed, I mean that if a user were
to scroll their view window to the right and five vertically aligned tiles
had to be loaded into memory, those tiles could be located in different
TIFF files (subTIFFs?) so that multiple threads could operate on them,
each thread only having to find one tile. So for my tool it would be a
staggering amount of preprocessing, but real-time manipulation would be
extremely fluid for the user, which is basically my number one priority.
So for a case like mine, thread overhead is nothing to worry about. That,
and the more I read about the C++11 thread library, the less I worry: I
can just leave it to the library to determine how many threads should
operate in my 4-core environments, or my 16-core environments. At least
that's the impression I've garnered.
On 01/15/2018 08:15 PM, Bob Friesenhahn wrote:
On Mon, 15 Jan 2018, Mike Stanton wrote:
To begin with, I am an Operating Systems guy. When you are looking at
going to multi-threading, what have you determined as your setup/teardown
cost for the individual threads? How large a chunk will each thread be
processing? Will the individual scanlines and/or tiles be sufficiently
large to absorb the cost of that setup and teardown, given the performance
gain you are projecting?
It is normal to use a pool of threads, where threads are created in
advance (or added to the pool on demand) and obtained from, and returned
to, the pool. This significantly reduces the overhead caused by thread
creation and destruction.
Creating and destroying a thread ought to take on the order of a few microseconds (on Linux, anyway). Larry Gritz was saying it takes 16.2 seconds to read what could be 1024 tiles of size 256×256, which is 15.8 ms per tile: nearly four orders of magnitude longer than thread creation. Writing with Deflate compression is about an order of magnitude slower than reading.
However, you don't need to launch a new thread for every tile. If you break the job into N = number_of_cpus pieces and launch N threads, each pre-assigned a particular group of tiles of the overall image to read or write, the thread-creation overhead is around six or seven orders of magnitude smaller than the reading or writing itself.
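That static partitioning can be sketched like so. process_tile() is a hypothetical stand-in for the real per-tile work (e.g. a TIFFReadTile call); here it just counts tiles so the partitioning is easy to verify:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-tile work (e.g. TIFFReadTile).
static std::atomic<int> tiles_done{0};
static void process_tile(int /*tile*/) { tiles_done.fetch_add(1); }

// Split total_tiles into one contiguous range per thread, and launch the
// threads once, rather than launching one thread per tile.
void process_all_tiles(int total_tiles) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 4;  // the count may be unreported; fall back

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        // Thread t handles tiles [begin, end); the rounding spreads any
        // remainder evenly across the threads.
        int begin = static_cast<int>(static_cast<long long>(total_tiles) * t / n);
        int end   = static_cast<int>(static_cast<long long>(total_tiles) * (t + 1) / n);
        workers.emplace_back([begin, end] {
            for (int i = begin; i < end; ++i) process_tile(i);
        });
    }
    for (auto& w : workers) w.join();
}
```

One caveat when mapping this onto libtiff: a single TIFF* handle is not safe to share across threads, so each worker would need to open its own handle on the file (or, in the preprocessed layout described earlier, its own subTIFF).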