PR# 14518 Routine {THREAD_CONTROL}.join not thread-safe

Problem Report Summary
Submitter: prestoat2000
Category: Runtime
Priority: Medium
Date: 2008/06/24
Class: Bug
Severity: Serious
Number: 14518
Release: 6.2.73753
Confidential: No
Status: Analyzed
Responsible:
Environment: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.8.1.13) Gecko/20080328 Firefox/2.0.0.13 Solaris 10 on SPARC
Synopsis: Routine {THREAD_CONTROL}.join not thread-safe

Description
Based on code inspection only, it looks like routine
{THREAD_CONTROL}.join is not thread-safe.  There seem to be two
problems.  Below is my reasoning (apologies in advance if I'm wrong).

1. First problem:

{THREAD_CONTROL}.join calls `thread_wait', which is a blocking C external that
   is run-time routine `eif_thr_wait.
At entry to `eif_thr_wait', status of GC status of calling thread is
   EIF_THREAD_BLOCKED.
Routine `eif_thr_wait' calls RT_GC_PROTECT(thread_object), which translates
   to a call to `epush'.
Epush may have to call `st_extend', which calls `eif_rt_xmalloc' with GC_OFF.
Eif_rt_xmalloc calls `allocate_free_list', which locks `eif_free_list_mutex'
   and then calls `allocate_free_list_helper'.  This latter routine looks at
   the free list and may modify it as well.
Since the thread status is still EIF_THREAD_BLOCKED, a call to
   `eif_synchronize_gc' by another thread could complete and that thread
   could start GC.
GC can call `rel_core' which can call `free_chunk' which can call
   `chunk_coalesc' which can access and modify the free list.  So we could
   have two threads modifying the free list at the same time, one holding
   `eif_free_list_mutex' and the other under GC synchronization.

The problem seems to be that `eif_thr_wait' can do things with
Eiffel-controlled memory while its status is EIF_THREAD_BLOCKED, which
is a no-no.

2. Second problem:

Even if the call to RT_GC_PROTECT(thread_object) does not have to call
`eif_rt_xmalloc', there is a problem.  Since the thread status is
EIF_THREAD_BLOCKED, a GC may be in progress while `eif_thr_wait'
executes.  So `eif_access(Current)' may compute a value, then GC may
move the object, then the old location could be stored int
`thread_object'.  There are similar race conditions farther down in
the routine.
To Reproduce

										
Problem Report Interactions
From:prestoat2000    Date:2008/07/04    Download   
I don't have any other ideas.  If the thread object cannot move (if it is
frozen early on), then there is no problem.  If it can move, then it seems
like the solution is to have the run-time think the thread is in Eiffel
code (via EIF_EXIT_C) whenever the thread can try to access `thread_object'.
However, it seems to me that after each EIF_EXIT_C call, you must call
RTGC or equivalent.  Otherwise, a GC could be in progress in another thread
and then there is a race condition where `thread_object' is evaluated, 
then the thread object is moved by the thread doing the GC cycle and
`thread_object' is updated, then the thread executing `eif_thr_wait'
evaluates the rest of the expression

   *(EIF_BOOLEAN *) (thread_object + offset) == EIF_FALSE

and incorrectly uses the old location.

From:manus_eiffel    Date:2008/06/30    Status: Analyzed    Download   
The safe solution ignoring the RT_GC_PROTECT call which we could do otherwise to avoid the first issue you mentioned (i.e. not marking the external blocking but inserting ourself the calls to EIF_ENTER_C and EIF_EXIT_C in the `eif_thr_wait' routine directly) is to ensure that a thread object cannot move.

Do you have any other idea?