PR# 2907 Cannot resolve ambiguous expressions

Problem Report Summary
Submitter:
Category: EiffelLex
Priority: Medium
Date: 2001/08/29
Class: Bug
Severity: Serious
Number: 2907
Release: 5.0
Confidential: No
Status: Open
Responsible:
Environment: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)
Synopsis: Cannot resolve ambiguous expressions

Description
If I take a freshly created heir of METALEX and use put_nameless_expression to add the following two expressions (and no others):

"hello"
*($.)

I never receive the token for "hello" from EiffelLex. When the second expression is added, EiffelLex writes a warning message to io. In essence, it states:

"Some tokens can be recognized by both "hello" and *($.).  The second one will have priority."

Since that described exactly what I saw happening, I changed my code to add the two expressions in the opposite order. The behavior was exactly the same (only the wildcard token was recognized, never "hello"), except that this time the warning message said:

"Some tokens can be recognized by both *($.) and "hello".  The second one will have priority."

In this case, the warning message is incorrect.  "hello" did not get priority.

This case is common and is handled by lex, flex, and any other lex-based engine, which disambiguate by using the most precise match. Any time you need to absorb some unknown amount of junk and extract known tokens from it, you will use a pair of expressions like this one. For my current use, I would be satisfied if EiffelLex simply did what its warning message claims it is going to do.
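For comparison, the rule documented for lex and flex is: prefer the longest match, and on a tie prefer the rule listed first. The following is a minimal Python sketch of that rule only. It uses Python's `re` module as a stand-in; the function name `longest_match_token` and the rule table are hypothetical, and none of this is EiffelLex code.

```python
import re

def longest_match_token(rules, text, pos=0):
    """Return (token_value, lexeme) for the winning rule at `pos`, or None.

    `rules` is a list of (token_value, regex) pairs in priority order.
    The longest match wins; on equal length, the earlier rule wins.
    """
    best = None
    for value, pattern in rules:
        # DOTALL lets '.' match newlines too (an assumption about the wildcard).
        m = re.compile(pattern, re.DOTALL).match(text, pos)
        if m is None:
            continue
        # Only a strictly longer match replaces the current best,
        # so ties are kept by the rule that was listed first.
        if best is None or len(m.group()) > len(best[1]):
            best = (value, m.group())
    return best

# Hypothetical token values: 13 for the literal, 42 for the wildcard.
RULES = [(13, r"hello"), (42, r".*")]

print(longest_match_token(RULES, "hello"))        # equal length: literal wins
print(longest_match_token(RULES, "hello world"))  # wildcard matches more text
```

With this rule the literal wins only when both expressions match the same amount of text. A wildcard that can consume more input still wins over a shorter literal, so rule order alone decides the outcome only in the tie case.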

The exact use I am trying to make is a URL retrieval verification, where the retrieved file is tested against a user-specified pattern to determine whether it matches. The user can specify a pattern like:

<html>
...
<body>
...
</body>
...
</html>

After I take in that input, I use *($.) as the pattern wherever the user specified "...". This particular pattern is only useful for checking whether a well-formed HTML page was present, but more complicated patterns could be specified.
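The translation step described above can be sketched as follows. This is an illustration in Python, not the Eiffel code from my project; the function name `pattern_to_expressions` is hypothetical, and it assumes each literal line is emitted as a double-quoted string with no handling of double quotes embedded in the line.

```python
WILDCARD = "*($.)"  # the wildcard expression substituted for "..."

def pattern_to_expressions(pattern):
    """Turn a user pattern (one item per line) into a list of expressions.

    A line consisting of "..." becomes the wildcard; any other non-empty
    line becomes a double-quoted literal.
    """
    expressions = []
    for line in pattern.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line == "...":
            expressions.append(WILDCARD)
        else:
            # Assumption: the line contains no double quotes to escape.
            expressions.append('"' + line + '"')
    return expressions

user_pattern = "<html>\n...\n<body>\n...\n</body>\n...\n</html>"
for expression in pattern_to_expressions(user_pattern):
    print(expression)
```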

Even more exactly, the two precise expressions I am attempting to match are:

"GIF"
*($.)

and the string being matched against is:

"GIF87a<various characters>"

The various characters have all been filtered so that they are plain ASCII (0-127) and therefore all in the text region. The string is always matched as one giant token of type *($.), which is accurate but not useful. It should instead return a token of type "GIF" followed by a token of type *($.).
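To make the expected result concrete, here is a minimal Python sketch of a scanner that gives priority strictly by rule order: at each position it takes the first rule that matches, regardless of match length. It models the behavior I am asking for, not how EiffelLex is implemented. The function name `scan_by_priority` is hypothetical, and the token values 13 and 42 are the ones used in the reproduction steps.

```python
import re

def scan_by_priority(rules, text):
    """Split `text` into (token_value, lexeme) pairs.

    `rules` is a list of (token_value, regex) pairs in priority order.
    At each position the first rule with a non-empty match is taken.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        for value, pattern in rules:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            if m and m.end() > pos:  # ignore empty matches to guarantee progress
                tokens.append((value, m.group()))
                pos = m.end()
                break
        else:
            # No rule produced a non-empty match at this position.
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

RULES = [(13, r"GIF"), (42, r".*")]
print(scan_by_priority(RULES, "GIF87a"))  # [(13, 'GIF'), (42, '87a')]
```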

I have tried specifying the "GIF" expression as 'G' 'I' 'F', but it does not change anything.

I cannot complete this project without a workaround.

To Reproduce
1. Make console system.
2. Add EiffelLex cluster.
3. Derive class BUG from METALEX.
4. Create instance a_bug of BUG.
5. Use a_bug.put_nameless_expression or a_bug.put_expression to add the expression "GIF" or 'G' 'I' 'F' with a value of 13.
6. Use a_bug.put_nameless_expression or a_bug.put_expression to add the expression *($.) with a value of 42.
7. Use a_bug.make_analyzer.
8. Use a_bug.analyzer.set_string("GIF87a").
9. Call a_bug.analyzer.get_token.
10. Look at a_bug.analyzer.last_token.  It *should* have a value of 13, but will have a value of 42.

Steps 5 and 6 can be flipped with no change in the outcome.

Problem Report Interactions
From:    Date: 2001/08/29