PR# 17745 Second example of seg faults in frozen or finalized SCOOP system

Problem Report Summary
Submitter: prestoat2000
Category: Runtime
Priority: Medium
Date: 2011/07/15
Class: Bug
Severity: Serious
Number: 17745
Release: 6.8.86627
Confidential: No
Status: Analyzed
Responsible: alexk_es
Environment: Mozilla/5.0 (X11; SunOS sun4u; rv:5.0) Gecko/20100101 Firefox/5. Solaris 10 on SPARC
Synopsis: Second example of seg faults in frozen or finalized SCOOP system

Description
Here is a second example of seg faults in a frozen or finalized SCOOP system.
They do not occur on every run: roughly once in 10 to 20 executions.

One stack trace (from dbx; the Eiffel feature name is given in parentheses below each generated C function):

(dbx) bt
current thread: t@1
=>[1] F30_632(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 2333 in "is27.c"
      (processor_yield)
  [2] F30_541(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 1773 in "is27.c"
      (processor_is_idle)
  [3] F30_540(Current = 0x256a70 "", arg1 = 0), line 1741 in "is27.c"
      (scoop_processor_loop)
  [4] F30_522(Current = 0x256a70 ""), line 405 in "is27.c"
      (root_processor_creation_routine_exited)
  [5] F30_499(Current = 0x256a70 "", arg1 = '\n', arg2 = 0, arg3 = 0, arg4 = 0, arg5 = (nil), arg6 = (nil)), line 129 in "is27.c"
      (scoop_manager_task_callback)
  [6] emain(argc = 2, argv = 0xffbfe894), line 20 in "einit.c"
  [7] main(argc = 2, argv = 0xffbfe894, envp = 0xffbfe8a0), line 46 in "emain.c"

A second (similar) stack trace:
t@1 (l@1) signal SEGV (no mapping at the fault address) in eif_synchronize_for_gc at 0x1db89d4
0x01db89d4: eif_synchronize_for_gc+0x0030:      ld       [%i5 + 156], %i3
(dbx) where
current thread: t@1
=>[1] eif_synchronize_for_gc(0xbe800, 0x0, 0x1ea6a90, 0x0, 0xfffffff8, 0x0), at 0x1db89d4
  [2] F27_630(0x209afb0, 0xffbfe078, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2ac5c
      (processor_sleep)
  [3] F27_632(0x209afb0, 0xffbfe198, 0xffbfe188, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2b7bc
      (processor_yield)
  [4] F27_541(0x209afb0, 0xffbfe298, 0xffbfe288, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1f948
      (processor_is_idle)
  [5] F27_540(0xffbfe2b0, 0x1cb24a0, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1eda0
      (scoop_processor_loop)
  [6] F27_522(0xffbfe470, 0x17, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1cfbc28
      (root_processor_creation_routine_exited)
  [7] F27_499(0x209afb0, 0xffbfe710, 0xffbfe700, 0xffbfe6f0, 0xffbfe6e0, 0xffbfe6d0), at 0x1cf4b7c
      (scoop_manager_task_callback)
  [8] emain(0xffbfe748, 0xffbfe738, 0xffbfe728, 0x2b14f4, 0x1de8058, 0x2099330), at 0xedd94
  [9] main(0x1, 0xffbfe89c, 0xffbfe8a4, 0x1e39c00, 0x7fb50100, 0x0), at 0xf13a4


To Reproduce
Freeze or finalize the system with the attached classes and configuration file.
Run EIFGENs/test/W_code/test or EIFGENs/test/F_code/test repeatedly
(e.g., with "repeat 100 SOME_COMMAND" in csh or tcsh). Some runs will
panic because of seg faults.

Now run under dbx:

   dbx EIFGENs/test/F_code/test
   (dbx) run

Do this repeatedly until execution stops with a seg fault (may take 10-20
tries).
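
Because the crash shows up only once in 10 to 20 runs, a small driver that repeats the run and tallies the outcomes can stand in for the csh "repeat" loop. This is a minimal sketch, not one of the report's attachments: the script name run_many.py, the run count and the timeout are arbitrary choices. It assumes a POSIX system, where a child killed by a signal gets a negative return code; if the Eiffel runtime traps SIGSEGV and panics instead, the crash shows up as a non-zero exit status. The timeout also catches runs that hang rather than crash.

```python
"""run_many.py: hypothetical helper (not part of the original report) that
runs a command repeatedly and tallies how each run ended."""
import signal
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs, timeout=None):
    """Run `cmd` `runs` times and count the outcomes.

    Keys are "ok", "timeout", a signal name such as "SIGSEGV" (POSIX: the
    child was killed by that signal), or "exit N" for any other non-zero
    exit status (e.g. a runtime panic that exits normally).
    """
    outcomes = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising.
            outcomes["timeout"] += 1
            continue
        code = result.returncode
        if code == 0:
            outcomes["ok"] += 1
        elif code < 0:
            # A negative return code means "terminated by signal -code".
            try:
                outcomes[signal.Signals(-code).name] += 1
            except ValueError:
                outcomes[f"signal {-code}"] += 1
        else:
            outcomes[f"exit {code}"] += 1
    return outcomes


if __name__ == "__main__":
    # Usage: python3 run_many.py EIFGENs/test/F_code/test
    tally = run_repeatedly(sys.argv[1:], runs=100, timeout=60)
    for outcome, count in sorted(tally.items()):
        print(f"{outcome}: {count}")
```

Separating signal deaths from ordinary non-zero exits makes it easy to tell a raw seg fault from a runtime panic, and the "timeout" bucket distinguishes a hang from a crash.
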
Problem Report Interactions
From:misterieking    Date:2013/10/04
A SCOOP system terminates when the GC has flagged every processor as redundant. A processor is marked redundant when no other processor can reach it any longer, or when the only processors that can reach it are themselves redundant. The root processor waits for all launched threads to exit before exiting itself. When all processors are idle, a full collection takes place to determine which processors can be flagged as redundant; naturally, any previously logged calls cancel a processor's redundancy.
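
The termination rule above amounts to a reachability computation. The sketch below is a simplified model, not the ISE runtime's code. It assumes that a processor with pending (logged) calls is never redundant, that any processor it can reach through references may still receive calls, and that every other processor is redundant. The dictionary representation and the names redundant_processors and can_terminate are hypothetical.

```python
from collections import deque


def redundant_processors(references, pending):
    """Return the set of processors that can be flagged redundant.

    references: dict mapping a processor to the processors it holds
                references to (hypothetical representation).
    pending:    processors that still have logged calls to execute.
    """
    processors = set(references).union(pending, *references.values())

    # Processors with logged calls are live, and so is everything they can
    # reach, since they may still log calls on it.
    live = set(pending)
    queue = deque(pending)
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in live:
                live.add(target)
                queue.append(target)
    return processors - live


def can_terminate(references, pending):
    """In this model the system may stop once every processor is redundant."""
    processors = set(references).union(pending, *references.values())
    return redundant_processors(references, pending) == processors
```

With no pending calls every processor comes out redundant, even processors that still reference each other or themselves, which matches the rule that being reachable only from redundant processors does not keep a processor alive.
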

From:prestoat2000    Date:2013/10/04
In 7.3, execution no longer seg faults. Instead, it never terminates,
which Manu says is expected behavior: the system acquires multiple locks
one by one (and in different orders) instead of all at once.
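
The difference between taking locks one at a time in varying orders and reserving them all at once can be illustrated with ordinary threads. In this sketch (plain Python threading.Lock objects, not the SCOOP runtime's reservation mechanism), every multi-lock request is satisfied by acquiring the locks in one fixed global order, so two clients asking for the same locks in opposite orders cannot each hold one lock while waiting for the other. Sorting by id() is an arbitrary choice of global order, and acquire_all, release_all and run_demo are hypothetical names.

```python
import threading


def acquire_all(locks):
    """Acquire every lock in a single global order (here: by id()).

    Because all callers use the same order, no cycle of waiting threads can
    form, even if callers list the locks in different orders. Duplicates
    are removed so a non-reentrant lock is never taken twice.
    """
    ordered = sorted(set(locks), key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    """Release locks in the reverse of acquisition order."""
    for lock in reversed(ordered):
        lock.release()


def run_demo(iterations=10_000):
    """Two clients repeatedly take locks a and b, named in opposite orders."""
    a, b = threading.Lock(), threading.Lock()
    counter = [0]

    def client(first, second):
        for _ in range(iterations):
            held = acquire_all([first, second])
            try:
                counter[0] += 1
            finally:
                release_all(held)

    # Acquiring one by one in the order given here (a then b, b then a)
    # could deadlock; acquire_all imposes a common order instead.
    threads = [
        threading.Thread(target=client, args=(a, b), daemon=True),
        threading.Thread(target=client, args=(b, a), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    finished = not any(t.is_alive() for t in threads)
    return counter[0], finished


if __name__ == "__main__":
    print(run_demo())
```

A consistent global order is only one standard way to make multi-lock acquisition deadlock-free; an atomic "reserve all or wait" primitive achieves the same effect.
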

I think this example used to report a deadlock at runtime, but I'm not sure
of that. ISE_SCOOP_MANAGER still seems to contain some deadlock detection,
so I don't see why the deadlock is not detected at runtime.
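
Runtime deadlock detection is commonly done by looking for a cycle in a wait-for graph, where an edge from X to Y means processor X is blocked waiting for something held by Y. Whether ISE_SCOOP_MANAGER uses this approach is not established here; the sketch below only shows the general technique, and find_deadlock and its dictionary representation are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of processors forming a wait-for cycle, or None.

    waits_for maps a processor id to the set of processor ids it is
    currently waiting on. Uses a depth-first search with three colors:
    reaching a node that is still on the DFS stack (GREY) means a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for succ in waits_for.get(node, ()):
            state = color.get(succ, WHITE)
            if state == GREY:
                # succ is on the current path: the cycle runs from succ to node.
                return stack[stack.index(succ):]
            if state == WHITE:
                cycle = visit(succ)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

A detector like this only sees a deadlock if every blocked wait is recorded as an edge; waits that are not tracked leave the cycle invisible, and the system simply never terminates.
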


From:prestoat2000    Date:2011/07/15
Attachments for problem report #17745

Attachment: test.e     Size:528
Attachment: test1.e     Size:105
Attachment: test2.e     Size:199
Attachment: test.ecf     Size:899