2000.03.19 02:42 "faxq maxing out CPU - let's crack this!!!!!", by Darren Nickerson

2000.03.19 10:09 "Re: faxq maxing out CPU - let's crack this!!!!!", by Robert Colquhoun

As may of you know, the latest release of Sam Leffler's libtiff (now maintained by the great people at http://www.libtiff.org) broke HylaFAX(tm). Not too much of a problem for us until RedHat decided to adopt the new libtiff and release update RPMs for RedHat-6.1 (Cartman). Now we're in deep deep trouble, and it's essential that we resolve this issue - as far as I know it's the only MAJOR problem holding back a release of hylafax-4.1-beta3.

The true 'deep deep trouble' is that the C++ code hylafax consists of is of such average quality that only a handful of programmers are able to understand it....this is just a symptom.

There are a couple of other patches needed before 4.1beta3 is done - a suse patch to discard ip-source routed packets and a fix to the manpage script to prevent a gaping security hole, and probably others i have forgotten about.

I sat down today and recreated the bug. It's easy to do... just install a build of the current HylaFAX CVS onto vanilla RedHat, then upgrade libtiff to:

[darren@ducent darren]$ rpm -qa | egrep tiff
libtiff-3.5.4-1
libtiff-devel-3.5.4-1

I set strace up on each of the relevant processes, and then submitted a fax, and watched. Sure enough, the fax was submitted and processing began (sendfax exited cleanly) and faxq ended up maxing out CPU, leaving hylafax completely hung and unresponsive, yet not making any syscalls. In fact, once the hang was initiated, the strace output did not grow at all.

Using a debugger the problem is in the EXPAND1D macro in faxd/tif_fax3.h include:

It is looping continuously in the for loop starting at line 341 to 359

Somehow: TabEnt->State = S_MakeUpB, a0 = 0, RunLength = 0 and TabEnt->Param = 0

(section of code starting at line 341)

         for (;;) {                                                      \
             LOOKUP16(13, TIFFFaxBlackTable, eof1d);                     \
             switch (TabEnt->State) {                                    \
             case S_EOL:                                                 \
                 EOLcnt = 1;                                             \
                 goto done1d;                                            \
             case S_TermB:                                               \
                 SETVAL(TabEnt->Param);                                  \
                 goto doneBlack1d;                                       \
             case S_MakeUpB:                                             \
             case S_MakeUp:                                              \
                 a0 += TabEnt->Param;                                    \
                 RunLength += TabEnt->Param;                             \
                 break;                                                  \
             default:                                                    \
                 unexpected("BlackTable", a0);                           \
                 goto done1d;                                            \
             }                                                           \
         }

Possibly something funny is happening in the LOOKUP16 function, or perhaps earlier there is an error that leads to this situation.

PS I am having trouble debugging this - nested defines which expand to hundreds of lines of code, many variables, goto's all over the place...its as if the author saw hell in a dream, then woke up and had to write about it and the result is this code!

- Robert