PR# 17745 Second example of seg faults in frozen or finalized SCOOP system
Problem Report Summary
Submitter: prestoat2000
Category: Runtime
Priority: Medium
Date: 2011/07/15
Class: Bug
Severity: Serious
Number: 17745
Release: 6.8.86627
Confidential: No
Status: Analyzed
Responsible: alexk_es
Environment: Mozilla/5.0 (X11; SunOS sun4u; rv:5.0) Gecko/20100101 Firefox/5.
Solaris 10 on SPARC
Synopsis: Second example of seg faults in frozen or finalized SCOOP system
Description
Here is a second example of seg faults in a frozen or finalized SCOOP system. These do not occur every time - maybe once in 10 or 20 executions.

One stack trace (from dbx, with translations):

(dbx) bt
current thread: t@1
=>[1] F30_632(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 2333 in "is27.c" (processor_yield)
  [2] F30_541(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 1773 in "is27.c" (processor_is_idle)
  [3] F30_540(Current = 0x256a70 "", arg1 = 0), line 1741 in "is27.c" (scoop_processor_loop)
  [4] F30_522(Current = 0x256a70 ""), line 405 in "is27.c" (root_processor_creation_routine_exited)
  [5] F30_499(Current = 0x256a70 "", arg1 = '\n', arg2 = 0, arg3 = 0, arg4 = 0, arg5 = (nil), arg6 = (nil)), line 129 in "is27.c" (scoop_manager_task_callback)
  [6] emain(argc = 2, argv = 0xffbfe894), line 20 in "einit.c"
  [7] main(argc = 2, argv = 0xffbfe894, envp = 0xffbfe8a0), line 46 in "emain.c"

A second (similar) stack trace:

t@1 (l@1) signal SEGV (no mapping at the fault address) in eif_synchronize_for_gc at 0x1db89d4
0x01db89d4: eif_synchronize_for_gc+0x0030: ld [%i5 + 156], %i3
(dbx) where
current thread: t@1
=>[1] eif_synchronize_for_gc(0xbe800, 0x0, 0x1ea6a90, 0x0, 0xfffffff8, 0x0), at 0x1db89d4
  [2] F27_630(0x209afb0, 0xffbfe078, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2ac5c (processor_sleep)
  [3] F27_632(0x209afb0, 0xffbfe198, 0xffbfe188, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2b7bc (processor_yield)
  [4] F27_541(0x209afb0, 0xffbfe298, 0xffbfe288, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1f948 (processor_is_idle)
  [5] F27_540(0xffbfe2b0, 0x1cb24a0, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1eda0 (scoop_processor_loop)
  [6] F27_522(0xffbfe470, 0x17, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1cfbc28 (root_processor_creation_routine_exited)
  [7] F27_499(0x209afb0, 0xffbfe710, 0xffbfe700, 0xffbfe6f0, 0xffbfe6e0, 0xffbfe6d0), at 0x1cf4b7c (scoop_manager_task_callback)
  [8] emain(0xffbfe748, 0xffbfe738, 0xffbfe728, 0x2b14f4, 0x1de8058, 0x2099330), at 0xedd94
  [9] main(0x1, 0xffbfe89c, 0xffbfe8a4, 0x1e39c00, 0x7fb50100, 0x0), at 0xf13a4
To Reproduce
Freeze or finalize with attached classes and config file. Run EIFGENs/test/W_code/test or EIFGENs/test/F_code/test repeatedly (e.g., using "repeat 100 SOME_COMMAND"). You will notice some panics due to seg faults.

Now run under dbx:

dbx EIFGENs/test/F_code/test
(dbx) run

Do this repeatedly until execution stops with a seg fault (may take 10-20 tries).
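Because the failure is intermittent, it can help to count outcomes over many runs instead of watching the output by hand. The following is a minimal sketch in Python, not part of the attached test case: the function name `count_outcomes` and the 60-second default timeout are assumptions made for illustration. It treats any non-zero exit status (including death by signal) as a failure and any run that exceeds the timeout as hung.

```python
import subprocess


def count_outcomes(command, runs, timeout=60):
    """Run `command` `runs` times and tally how each run ended.

    command: argument list, e.g. ["EIFGENs/test/F_code/test"].
    timeout: seconds to wait before counting a run as hung (assumed value).
    """
    ok = failed = hung = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run before this is raised.
            hung += 1
            continue
        if result.returncode == 0:
            ok += 1
        else:
            # Non-zero covers both a runtime panic and termination by a signal.
            failed += 1
    return {"ok": ok, "failed": failed, "hung": hung}


# Example call (path taken from the report):
# print(count_outcomes(["EIFGENs/test/F_code/test"], 100))
```

A failure rate of roughly one in 10 to 20 runs would match the frequency described in this report.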
Problem Report Interactions
A SCOOP system terminates when the GC has flagged every processor as redundant. A processor is marked as redundant when it can no longer be reached by any other processor, or when the only processors that can access it are also redundant. The root processor waits for all launched threads to exit before exiting itself. When all processors are marked as idle, a full collection takes place to determine which processors can be flagged as redundant. Any calls logged before the collection cancel the redundancy.
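The redundancy rule described above can be read as a reachability pass over the processors. The sketch below is only an illustration in Python, not the runtime's implementation: the function name, the `references` mapping, and the decision to treat processors with logged calls as the roots of the traversal are all assumptions made for the example.

```python
from collections import deque


def redundant_processors(references, pending_calls):
    """Return the processors that would be flagged redundant.

    references: dict mapping a processor id to the set of processor ids
        it can reach directly (hypothetical representation).
    pending_calls: set of processor ids that still have logged calls;
        assumed here to be the only roots that keep processors alive.
    """
    all_ids = set(references)
    for targets in references.values():
        all_ids.update(targets)

    # Breadth-first traversal from every processor that has logged calls.
    live = set(pending_calls)
    queue = deque(live)
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in live:
                live.add(target)
                queue.append(target)

    # Anything not reached from a live processor is redundant.
    return all_ids - live
```

Under this model, processors that only reference each other in a cycle are still redundant, and the system would be free to terminate once the returned set contains every processor.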
In 7.3, system execution does not seg fault. Instead, it never terminates, which Manu says is expected behavior since it is acquiring multiple locks one by one (and in different orders) instead of all locks at once. I think this example used to report a deadlock at runtime, but I'm not sure of that. There still seems to be some deadlock detection in ISE_SCOOP_MANAGER, so I don't see why a deadlock is not detected at runtime.
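The difference between taking locks one by one in different orders and taking all locks at once can be shown with a small deterministic simulation. This is a sketch in Python under stated assumptions, not a model of ISE_SCOOP_MANAGER: the function name `simulate`, the round-robin scheduling, and the client and lock names are hypothetical, and each client is assumed to request at least one lock.

```python
def simulate(requests, atomic):
    """Step clients round-robin and report "completed" or "deadlock".

    requests: dict mapping a client to the ordered list of locks it needs.
    atomic: if True, a client takes all of its locks in one step or none.
    """
    owner = {}                               # lock name -> client holding it
    held = {client: [] for client in requests}
    done = set()
    while len(done) < len(requests):
        progressed = False
        for client, wanted in requests.items():
            if client in done:
                continue
            if atomic:
                # All-or-nothing: the client never waits while holding a lock.
                if all(lock not in owner for lock in wanted):
                    done.add(client)         # work done, locks released at once
                    progressed = True
            else:
                next_lock = wanted[len(held[client])]
                if next_lock not in owner:
                    owner[next_lock] = client
                    held[client].append(next_lock)
                    progressed = True
                    if len(held[client]) == len(wanted):
                        for lock in held[client]:
                            del owner[lock]  # finished: release everything
                        done.add(client)
        if not progressed:
            # Every unfinished client is waiting on a lock held by another.
            return "deadlock"
    return "completed"
```

With `{"p1": ["x", "y"], "p2": ["y", "x"]}` the one-by-one mode ends in a deadlock after each client takes its first lock, while the same requests complete in atomic mode or when both clients use the same order.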