PR# 17930 Test runtime008 fails about 0.5-1.0% of the time on Solaris x86

Problem Report Summary
Submitter: prestoat2000
Category: Runtime
Priority: Medium
Date: 2011/10/31
Class: Bug
Severity: Serious
Number: 17930
Release: 7.0.87451
Confidential: No
Status: Open
Environment: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv: Gecko/2009042715 Firefox/3.0.10 Solaris 10 on x86
Synopsis: Test runtime008 fails about 0.5-1.0% of the time on Solaris x86

Test runtime008 was supposedly fixed in rev 74088.  However, I have found
that on Solaris 10 on x86, a leak occurs between 6 and 12 times per
1000 runs (roughly 0.6% to 1.2% of the time).  The failures occur with
the 6.8 release and also with the 7.0.87451 intermediate release.  These failures
can be reproduced outside of eweasel on a lightly loaded system.

So far, I have not been able to get it to fail on Solaris SPARC.

So it seems to me that either the test is written incorrectly or there is
still a bug.

To Reproduce
Run eweasel test runtime008 with the -keep option.  The test passes.
Now run the resulting frozen executable 1000 times.  I used:

   repeat 1000 /tmp/eweasel/runtime008/EIFGENs/test/W_code/test 100

(Argument 100 is the number of threads to create.)  Usually between 6
and 12 leaks are reported.
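The repeated run can also be scripted without the csh `repeat` built-in. The following POSIX sh function is only a sketch under assumptions, not part of eweasel or of the attached script: the name `count_leaky_runs` is hypothetical, and it assumes every leak report contains the text `Found leak`, as in the output quoted later in this report. It counts failing runs rather than leak lines, so a run that prints two leak lines counts once.

```sh
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs
# reported at least one leak. Assumes leak reports contain "Found leak".
# Usage: count_leaky_runs N command [args...]
count_leaky_runs() {
    n=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Read all of the output (no grep -q) so the program is not cut
        # off early by a closed pipe; stderr is checked as well.
        if "$@" 2>&1 | grep 'Found leak' > /dev/null; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, the equivalent of the run above would be:

   count_leaky_runs 1000 /tmp/eweasel/runtime008/EIFGENs/test/W_code/test 100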

Problem Report Interactions
From: prestoat2000    Date: 2011/11/02
I have also reproduced the leak on Solaris SPARC 64-bit.  It doesn't happen
nearly as often, but that may be because the SPARC machine is a lot slower.

From: prestoat2000    Date: 2011/11/01
I wonder whether the intermittent failures I have seen on tests runtime008,
thread015 and thread020 (all of which are supposedly fixed) might be due to
one of the thread-related bugs I reported some time ago.  In particular,
these three, which have not been closed yet, may be relevant:

14519    Blocking C externals that can raise exception not thread-safe
14518    Routine {THREAD_CONTROL}.join not thread-safe
14517    Possible execution of `eif_thr_exit' and call to eif_access while GC in progress

Just a thought.

From: prestoat2000    Date: 2011/11/01
After a bit more experimentation, I discovered that the frozen executable
from test runtime008 can fail with an argument as small as "2"
(2 iterations of the loop).  Leaks with such a small number of iterations
are much more likely if the system is loaded.  I artificially loaded it
by running a number of executions in parallel.
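Loading the system this way might be sketched in POSIX sh as follows. This is an illustration under assumptions only: the function name `run_parallel_copies` and the log file names `leak_log.N` are hypothetical, and the attached script may load the system differently. The function starts P background copies, each running the given command R times in a row, writes each copy's output to its own log file in the current directory, and waits for all copies to finish.

```sh
#!/bin/sh
# Hypothetical helper: start P background copies, each running the given
# command R times in a row, with copy c's output in ./leak_log.c.
# Usage: run_parallel_copies P R command [args...]
run_parallel_copies() {
    p=$1      # number of concurrent copies
    r=$2      # runs per copy
    shift 2   # the remaining arguments are the command to run
    c=0
    while [ "$c" -lt "$p" ]; do
        (
            i=0
            while [ "$i" -lt "$r" ]; do
                "$@"
                i=$((i + 1))
            done
        ) > "leak_log.$c" 2>&1 &
        c=$((c + 1))
    done
    wait      # block until every background copy has finished
}
```

For example, with an illustrative choice of eight copies and the small iteration count of 2, the leak reports can afterwards be collected with grep:

   run_parallel_copies 8 125 /tmp/eweasel/runtime008/EIFGENs/test/W_code/test 2
   grep 'Found leak' leak_log.*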

I also found that, most of the time when a test fails, two leaks are
displayed.  A typical example is:

Iteration 57   Found leak: (curr - prev=922104) (curr - first=922104)
Iteration 58   Found leak: (curr - prev=-922104) (curr - first=0)

(Note that I modified the original test to display the current iteration number.)
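The two numbers in such a pair can be reconciled with a little arithmetic. Write m_k for the value the test measures at iteration k and m_first for its first measurement, and assume (this is not confirmed by the test source) that "prev" is the value from the immediately preceding iteration:

```latex
\begin{aligned}
m_{57} - m_{56} &= 922104, \qquad m_{57} - m_{\mathrm{first}} = 922104 \;\Rightarrow\; m_{56} = m_{\mathrm{first}},\\
m_{58} - m_{\mathrm{first}} &= (m_{58} - m_{57}) + (m_{57} - m_{\mathrm{first}}) = -922104 + 922104 = 0.
\end{aligned}
```

Under that assumption, the pair of reports describes a single temporary increase of 922104 (in whatever unit the test measures) that is gone again at the next iteration, rather than two independent leaks.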

To reproduce, you can use the attached "runtime008" script.
First, run test runtime008 with the -keep option.
Then modify the line at the beginning of the attached script to set the correct
path to the executable and the desired number of iterations (first argument
to the executable).

Then execute the script from csh:

   source runtime008 > my_

Attachment: runtime008     Size:148961