PR# 17745 Second example of seg faults in frozen or finalized SCOOP system

Problem Report Summary

Submitter: prestoat2000

Category: Runtime

Priority: Medium

Date: 2011/07/15

Class: Bug

Severity: Serious

Number: 17745

Release: 6.8.86627

Confidential: No

Status: Analyzed

Responsible: alexk_es

Environment: Mozilla/5.0 (X11; SunOS sun4u; rv:5.0) Gecko/20100101 Firefox/5. Solaris 10 on SPARC

Synopsis: Second example of seg faults in frozen or finalized SCOOP system

Description

Here is a second example of seg faults in a frozen or finalized SCOOP system.
These do not occur every time - maybe once in 10 or 20 executions.

One stack trace (from dbx, with translations):

(dbx) bt
current thread: t@1
=>[1] F30_632(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 2333 in "is27.c"
      (processor_yield)
  [2] F30_541(Current = 0x256a70 "", arg1 = 0, arg2 = 4046U), line 1773 in "is27.c"
      (processor_is_idle)
  [3] F30_540(Current = 0x256a70 "", arg1 = 0), line 1741 in "is27.c"
      (scoop_processor_loop)
  [4] F30_522(Current = 0x256a70 ""), line 405 in "is27.c"
      (root_processor_creation_routine_exited)
  [5] F30_499(Current = 0x256a70 "", arg1 = '\n', arg2 = 0, arg3 = 0, arg4 = 0, arg5 = (nil), arg6 = (nil)), line 129 in "is27.c"
      (scoop_manager_task_callback)
  [6] emain(argc = 2, argv = 0xffbfe894), line 20 in "einit.c"
  [7] main(argc = 2, argv = 0xffbfe894, envp = 0xffbfe8a0), line 46 in "emain.c"

A second (similar) stack trace:
t@1 (l@1) signal SEGV (no mapping at the fault address) in eif_synchronize_for_gc at 0x1db89d4
0x01db89d4: eif_synchronize_for_gc+0x0030:      ld       [%i5 + 156], %i3
(dbx) where
current thread: t@1
=>[1] eif_synchronize_for_gc(0xbe800, 0x0, 0x1ea6a90, 0x0, 0xfffffff8, 0x0), at 0x1db89d4
  [2] F27_630(0x209afb0, 0xffbfe078, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2ac5c
      (processor_sleep)
  [3] F27_632(0x209afb0, 0xffbfe198, 0xffbfe188, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d2b7bc
      (processor_yield)
  [4] F27_541(0x209afb0, 0xffbfe298, 0xffbfe288, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1f948
      (processor_is_idle)
  [5] F27_540(0xffbfe2b0, 0x1cb24a0, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1d1eda0
      (scoop_processor_loop)
  [6] F27_522(0xffbfe470, 0x17, 0x1a, 0xfffffffc, 0x1de8058, 0x209a560), at 0x1cfbc28
      (root_processor_creation_routine_exited)
  [7] F27_499(0x209afb0, 0xffbfe710, 0xffbfe700, 0xffbfe6f0, 0xffbfe6e0, 0xffbfe6d0), at 0x1cf4b7c
      (scoop_manager_task_callback)
  [8] emain(0xffbfe748, 0xffbfe738, 0xffbfe728, 0x2b14f4, 0x1de8058, 0x2099330), at 0xedd94
  [9] main(0x1, 0xffbfe89c, 0xffbfe8a4, 0x1e39c00, 0x7fb50100, 0x0), at 0xf13a4

To Reproduce

Freeze or finalize with the attached classes and config file.
Run EIFGENs/test/W_code/test or EIFGENs/test/F_code/test repeatedly
(e.g., using "repeat 100 SOME_COMMAND").  You will notice some panics
due to seg faults.
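
Because the failure is intermittent, it can help to count how often it occurs rather than watch the output by eye. The following is a minimal sketch, not part of the original report: the helper name count_failures is hypothetical, and it assumes a POSIX system where a negative return code means the process was killed by a signal. Since the Eiffel runtime may catch the fault and panic instead, nonzero exit codes are counted separately.

```python
import subprocess

def count_failures(command, runs):
    """Run `command` `runs` times and return (killed_by_signal, nonzero_exit).

    On POSIX, subprocess reports a negative return code when the child
    was terminated by a signal (for example SIGSEGV).
    """
    killed_by_signal = 0
    nonzero_exit = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode < 0:
            killed_by_signal += 1
        elif result.returncode > 0:
            nonzero_exit += 1
    return killed_by_signal, nonzero_exit

# Example call (path taken from the report, run count from "repeat 100"):
# killed, failed = count_failures(["EIFGENs/test/F_code/test"], 100)
```

With a failure rate of roughly one in 10 to 20 executions, 100 runs should normally show several failures.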

Now run under dbx:

   dbx EIFGENs/test/F_code/test
   (dbx) run

Do this repeatedly until execution stops with a seg fault (may take 10-20
tries).

Problem Report Interactions

From:misterieking Date:2013/10/04

A SCOOP system terminates when the GC has flagged every processor as redundant.  A processor is marked as redundant when it can no longer be reached by any other processor, or when the only processors that can access it are themselves redundant.  The root processor waits for all launched threads to exit before exiting itself.  When all processors are marked as idle, a full collection takes place to determine which processors can be flagged as redundant; any previously logged calls cancel the redundancy.
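
The redundancy rule described in this comment can be read as a reachability computation. The sketch below is an illustration only, not the runtime's actual implementation: the function name, the refs mapping (which processors each processor holds references to), and the has_logged_calls set are all assumptions made for the example. Processors with logged calls are treated as roots, and everything not reachable from a root is considered redundant.

```python
def redundant_processors(refs, has_logged_calls):
    """Return the set of processors that would be flagged as redundant.

    refs: dict mapping each processor to the processors it references
          (hypothetical representation of the reachability graph).
    has_logged_calls: processors that still have logged calls pending;
          these are never redundant and act as roots.
    """
    live = set()
    stack = [p for p in refs if p in has_logged_calls]
    while stack:
        p = stack.pop()
        if p in live:
            continue
        live.add(p)
        # Anything referenced by a live processor is reachable, hence live.
        stack.extend(refs.get(p, ()))
    return set(refs) - live

def system_can_terminate(refs, has_logged_calls):
    """Under this model, the system terminates once every processor is redundant."""
    return redundant_processors(refs, has_logged_calls) == set(refs)
```

Note that two idle processors referencing only each other are both redundant under this reading, since the only processors that can access them are themselves redundant.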

From:prestoat2000 Date:2013/10/04

In 7.3, system execution does not seg fault.  Instead, it never terminates,
which Manu says is expected behavior since it is acquiring multiple locks
one by one (and in different orders) instead of all locks at once.

I think this example used to report a deadlock at runtime, but I'm not sure
of that.  There still seems to be some deadlock detection in ISE_SCOOP_MANAGER
so I don't see why a deadlock is not detected at runtime.
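
To illustrate why acquiring locks one by one in different orders can hang, and what a runtime check would have to look for, here is a minimal sketch of cycle detection in a wait-for graph. It is not how ISE_SCOOP_MANAGER detects deadlocks; the function name and the waits_for mapping (each blocked processor mapped to the processor holding the lock it wants) are assumptions made for the example.

```python
def find_wait_cycle(waits_for):
    """Return a list of processors forming a wait cycle, or None.

    waits_for: dict mapping a blocked processor to the processor that
    currently holds the lock it is waiting for (hypothetical model).
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waiters until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None

# p1 holds lock A and waits for B; p2 holds lock B and waits for A.
opposite_order = {"p1": "p2", "p2": "p1"}
# p2 simply waits until p1 releases everything it acquired at once.
all_at_once = {"p2": "p1"}
```

Here find_wait_cycle(opposite_order) reports a cycle containing p1 and p2, while find_wait_cycle(all_at_once) returns None because the waiting chain ends at a processor that is not itself blocked.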

From:prestoat2000 Date:2011/07/15

Attachments for problem report #17745

Attachment: test.e Size:528

Attachment: test1.e Size:105

Attachment: test2.e Size:199

Attachment: test.ecf Size:899