PR# 16859 (Void Safe) Problem with blocking sockets and threads
Problem Report Summary
Submitter: jsostroff
Category: EiffelNet
Priority: High
Date: 2010/06/17
Class: Bug
Severity: Critical
Number: 16859
Release: 6.6.8.3355 GPL Edition - win64
Confidential: No
Status: Closed
Responsible:
Environment: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; GTB0.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2)
Synopsis: (Void Safe) Problem with blocking sockets and threads
Description
Version = EiffelStudio 6 (6.6.8.3355 GPL Edition - win64)

See the attached file, which contains a Plant app and a Controller app that exchange messages with each other over a socket. Finalizing the Plant application generates a segmentation error (see prior bug report).

On the Plant side there is (a) a thread managing the communication over the socket, and (b) other threads that access a global state (also accessed by (a)).

The server blocks in two places. The first is at socket.accept (the point where the client-server connection is established). The second is where the server retrieves a message (via STORABLE) from the client socket:

    check -- might block here
        attached {MESSAGE} retrieved (client_socket) as cs
    then
        ...
    end

At either blocking point, if the client is not yet ready to connect or to send a message, the server blocks (i.e. thread (a) waits for client communication), which is fine. The problem is that the threads in (b) also block until execution in (a) continues. This is not the behaviour we expected: we want the (b) threads to keep working irrespective of what is happening on the communication thread (a).
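The stall described above can be modelled with a small sketch. It is written in Python purely to illustrate the threading behaviour and is not EiffelNet code. Assumptions: the hypothetical RUNTIME_LOCK stands in for the runtime blocking all threads during retrieval (as the maintainers explain later in this report), sock.recv stands in for retrieved (client_socket), and plant_event stands in for a thread in group (b). Because thread (a) enters the locked region before any data has arrived, thread (b) makes no progress until the Controller sends something.

```python
import socket
import threading
import time

# Hypothetical model (not EiffelNet): while one thread holds RUNTIME_LOCK,
# the worker threads (which take it for every step) cannot make progress,
# mimicking a runtime that blocks all threads during retrieval.
RUNTIME_LOCK = threading.Lock()


def blocking_retrieve(sock):
    """Naive pattern: take the lock first, then wait for the peer's data."""
    with RUNTIME_LOCK:
        return sock.recv(1024)  # may wait indefinitely while holding the lock


def plant_event(ticks, stop):
    """Stand-in for a group (b) thread: works only while the lock is free."""
    while not stop.is_set():
        with RUNTIME_LOCK:
            ticks[0] += 1
        time.sleep(0.01)


def demo_naive():
    server, client = socket.socketpair()
    comm = threading.Thread(target=blocking_retrieve, args=(server,))
    comm.start()
    while not RUNTIME_LOCK.locked():  # wait until thread (a) holds the lock
        time.sleep(0.001)
    ticks, stop = [0], threading.Event()
    plant = threading.Thread(target=plant_event, args=(ticks, stop))
    plant.start()
    time.sleep(0.2)  # the Controller has not sent anything yet
    stalled = ticks[0]  # progress made by thread (b) during the wait
    client.sendall(b"message")  # the Controller finally sends
    comm.join()  # thread (a) returns and releases the lock
    time.sleep(0.1)
    resumed = ticks[0]  # thread (b) runs again once the lock is free
    stop.set()
    plant.join()
    server.close()
    client.close()
    return stalled, resumed


if __name__ == "__main__":
    print(demo_naive())
```

In this model the (b) thread's tick count stays at zero for as long as the receiver waits inside the locked region, and only starts increasing after the message arrives.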
To Reproduce
(1) Finalizing the Plant application generates a segmentation error (see prior bug report).
(2) Blocking problem: Compile Plant and Controller separately. In the communication manager, set the simulation boolean variable to false. Execute Plant, then execute Controller. The two establish communication. At the Controller, the user is prompted to enter 0, 1, 2 or 3. This is where the Plant blocks, i.e. all threads in category (b) stop running.
Problem Report Interactions
This is fixed for the 6.7 release in rev#83784. The fix will not be applied to 6.6; in the meantime, use the suggested workaround.
We are trying to put a similar fix into our runtime so that you will not need this workaround in 6.7. In the meantime, the workaround is the best option.
I see, and I understand the issue. Because the retrieval code is not thread safe, we have to block all running threads to ensure that nothing bad happens. We faced the same issue with AutoTest, which also uses EiffelNet serialization to communicate with the program under test. The solution is simply to send a first byte of data and, once that byte has been received, to call `eif_net_retrieved'.
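A minimal sketch of this workaround, modelled in Python rather than Eiffel and under stated assumptions: RUNTIME_LOCK is a hypothetical stand-in for the runtime blocking all threads during retrieval, pickle stands in for STORABLE, the marker value b"\x01" and the 4-byte length prefix are arbitrary illustrative choices, and receive_message plays the role of the reader that calls `eif_net_retrieved' only after the first byte has arrived. The receiver waits for the marker with a plain read that does not take the lock, so the worker threads keep running. It enters the locked retrieval only once the sender has started transmitting.

```python
import pickle
import socket
import struct
import threading
import time

# Hypothetical model (not EiffelNet): while one thread holds RUNTIME_LOCK,
# the worker threads (which take it for every step) cannot make progress.
RUNTIME_LOCK = threading.Lock()
MARKER = b"\x01"  # arbitrary one-byte "object follows" signal (assumption)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def send_message(sock, obj):
    """Sender: marker byte first, then a length-prefixed pickled object."""
    payload = pickle.dumps(obj)  # pickle stands in for STORABLE
    sock.sendall(MARKER + struct.pack("!I", len(payload)) + payload)


def receive_message(sock):
    """Receiver: wait for the marker without the lock, then 'retrieve'."""
    # Plain one-byte read: may wait indefinitely, but other threads keep running.
    if recv_exact(sock, 1) != MARKER:
        raise ValueError("unexpected marker byte")
    # Only now enter the blocking retrieval; the object is already being sent.
    with RUNTIME_LOCK:
        (length,) = struct.unpack("!I", recv_exact(sock, 4))
        return pickle.loads(recv_exact(sock, length))


def plant_event(ticks, stop):
    """Stand-in for a group (b) thread: works only while the lock is free."""
    while not stop.is_set():
        with RUNTIME_LOCK:
            ticks[0] += 1
        time.sleep(0.01)


def demo_workaround():
    server, client = socket.socketpair()
    ticks, stop = [0], threading.Event()
    plant = threading.Thread(target=plant_event, args=(ticks, stop))
    plant.start()
    received = []
    comm = threading.Thread(target=lambda: received.append(receive_message(server)))
    comm.start()
    time.sleep(0.2)  # the Controller is idle; the receiver waits for the marker
    ticks_while_waiting = ticks[0]  # progress made by thread (b) meanwhile
    send_message(client, {"command": 2})
    comm.join()
    stop.set()
    plant.join()
    server.close()
    client.close()
    return ticks_while_waiting, received[0]


if __name__ == "__main__":
    print(demo_workaround())
```

The design choice is to keep the potentially unbounded wait outside the region that blocks every thread. The locked part then only covers reading data that the peer is already transmitting.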
<<I've run the system and I'm not sure what is blocking. Nothing is blocking for me regardless of whether I choose 0, 1, 2 or 3. Can you provide more specifics?>>

I am assuming that you have compiled and are running both the Controller app and the Plant app (they communicate message objects over a blocking socket), and that communication has been established between the two apps.

(*) From then on, the Plant should constantly print event actions to the Plant console, irrespective of what actions or messages you send from the Controller app. This is because each Plant event runs in its own thread, and communication is set up in such a way that communication actions cannot block Plant events forever. The problem is that when you enter a 0, 1, 2 or 3 message on the Controller side, the Plant events eventually stop firing.

(**) I have checked the same design sending strings instead of objects across the socket, and there the desired ongoing Plant event firing is observed.

My conclusion ....
I've run the system and I'm not sure what is blocking. Nothing is blocking for me regardless of whether I choose 0, 1, 2 or 3. Can you provide more specifics? I can also reproduce the failure of finalization.
I now believe that the problem is due to the interaction between SOCKET and STORABLE:
(1) When sending strings over the socket, the system does not deadlock.
(2) When sending objects (e.g. of type MESSAGE) across the socket via STORABLE, we get the blocking deadlock.