PR# 17930 Test runtime008 fails about 0.5-1.0% of the time on Solaris x86
Problem Report Summary
Submitter: prestoat2000
Category: Runtime
Priority: Medium
Date: 2011/10/31
Class: Bug
Severity: Serious
Number: 17930
Release: 7.0.87451
Confidential: No
Status: Open
Responsible:
Environment: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.9.0.10) Gecko/2009042715 Firefox/3.0.10
Solaris 10 on x86
Synopsis: Test runtime008 fails about 0.5-1.0% of the time on Solaris x86
Description
Test runtime008 was supposedly fixed in rev 74088. However, I have found that on Solaris 10 on x86 a leak still occurs between 6 and 12 times per 1000 runs (roughly 0.6% to 1.2% of the time). The failures occur with the 6.8 release and also with the 7.0.87451 intermediate release. They can be reproduced outside of eweasel on a lightly loaded system. So far, I have not been able to get the test to fail on Solaris SPARC. It seems to me that either the test is written incorrectly or there is still a bug.
To Reproduce
Run eweasel test runtime008 with the -keep option. The test passes. Now run the resulting frozen executable 1000 times. I used:

    repeat 1000 /tmp/eweasel/runtime008/EIFGENs/test/W_code/test 100

(Argument 100 is the number of threads to create.) Usually between 6 and 12 leaks are reported.
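For anyone without csh at hand, the repeat-and-count step can be sketched in Python. This is only an illustration, not part of the test suite: the function name `count_leaky_runs` is hypothetical, and the sketch assumes that a failing run prints a line containing "Found leak", as in the leak messages quoted in this report.

```python
import subprocess


def count_leaky_runs(cmd, runs, marker="Found leak"):
    """Run `cmd` `runs` times and count the runs whose output contains `marker`.

    `cmd` is a list such as [path_to_executable, "100"]. The default marker
    is an assumption based on the leak messages quoted in this report.
    """
    leaky = 0
    for _ in range(runs):
        # Capture both streams, since the report does not say which one
        # the test writes its leak message to.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if marker in result.stdout or marker in result.stderr:
            leaky += 1
    return leaky


def leak_rate_percent(leaky, runs):
    """Express the number of leaky runs as a percentage of all runs."""
    return 100.0 * leaky / runs
```

With the path used above, `count_leaky_runs(["/tmp/eweasel/runtime008/EIFGENs/test/W_code/test", "100"], 1000)` would be expected to return a value in the range of 6 to 12 on the affected machine, which `leak_rate_percent` turns into 0.6 to 1.2.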
Problem Report Interactions
I have also reproduced the leak on Solaris SPARC 64-bit. It doesn't happen nearly as often, but that may be because the SPARC machine is a lot slower.
I wonder whether the intermittent failures I have seen on tests runtime008, thread015 and thread020 (all of which are supposedly fixed) might be due to one of the thread-related bugs I reported some time ago. In particular, these three, which have not been closed yet, may be relevant:
14519 Blocking C externals that can raise exception not thread-safe
14518 Routine {THREAD_CONTROL}.join not thread-safe
14517 Possible execution of `eif_thr_exit' and call to eif_access while GC in progress
Just a thought.
After a bit more experimentation, I discovered that the frozen executable from test runtime008 can fail with an argument as small as "2" (2 iterations of the loop). Leaks with such a small number of iterations are much more likely if the system is loaded. I artificially loaded it by running a number of executions in parallel.
I also found that most of the time when a test fails, there are two leaks displayed. A typical one is:
Iteration 57 Found leak: (curr - prev=922104) (curr - first=922104)
Iteration 58 Found leak: (curr - prev=-922104) (curr - first=0)
(Note that I modified the original test to display which iteration it is on.)
To reproduce, you can use the attached "runtime008" script. First run test runtime008 with the -keep option. Then modify the line at the beginning of the attached script to set the correct path to the executable and the desired number of iterations (first argument to the executable). Then execute the script via (from csh): source runtime008 > my_ .... [output truncated]
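The arithmetic in the two leak lines quoted above can be checked with a small Python sketch. The helper names are hypothetical, and the reading of the numbers is an assumption: if "curr - prev" and "curr - first" are differences in some measured memory figure, then the second report undoes the first and the figure is back at its starting value.

```python
import re

# Matches the two deltas in a line such as:
#   Found leak: (curr - prev=922104) (curr - first=922104)
LEAK_RE = re.compile(r"\(curr - prev=(-?\d+)\)\s*\(curr - first=(-?\d+)\)")


def parse_leaks(text):
    """Return a list of (curr_minus_prev, curr_minus_first) integer pairs."""
    return [(int(prev), int(first)) for prev, first in LEAK_RE.findall(text)]


def back_at_baseline(leaks):
    """True if the last reported leak has curr - first == 0."""
    return bool(leaks) and leaks[-1][1] == 0


# The typical failure quoted in this report.
sample = (
    "Iteration 57 Found leak: (curr - prev=922104) (curr - first=922104)\n"
    "Iteration 58 Found leak: (curr - prev=-922104) (curr - first=0)\n"
)

leaks = parse_leaks(sample)
print(leaks)                          # [(922104, 922104), (-922104, 0)]
print(sum(prev for prev, _ in leaks)) # 0
print(back_at_baseline(leaks))        # True
```

Under that assumption, the pair of reports looks like a temporary rise of 922104 followed by a return to the first measurement, rather than steady growth.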