Lexical analysis
Dear Peter,
We are currently implementing a lexical analyser for
our Isabelle CASL parser (up to now we have used
Isabelle's standard lexical analysis, but this
is not exactly what CASL needs).
Now there are the following questions and problems:
- Are the following recognized as complete TOKENs or not?
-> ->? =e= {} ×
If they are, they should be listed together with
< * ? ! and / on p. C-10; if not, they should be listed
together with : :? ::= etc. on p. C-10.
- A NUMBER can simultaneously be a WORDS.
We currently resolve this by scanning it as a NUMBER,
and adding
TOKEN ::= WORDS | SIGNS | DOT-WORDS | NUMBER
and perhaps also
SIMPLE-ID ::= WORDS | NUMBER
but the latter seems not to be very useful.
By the way, "5a5" and "'5a" are recognized as WORDS as well.
Was this intended, or should a WORDS start with a letter?
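
To make the NUMBER/WORDS overlap concrete, here is a minimal Python sketch of the resolution described above: try NUMBER first, then fall back to WORDS. The character classes are simplified assumptions of ours, not the definitions from the CASL summary, and the names `classify`, `WORDS_ANY_START` and `WORDS_LETTER_START` are hypothetical.

```python
import re

# Simplified, assumed character classes; the real CASL definitions are richer.
NUMBER = re.compile(r"[0-9]+")
WORDS_ANY_START = re.compile(r"[A-Za-z0-9']+")             # permissive reading
WORDS_LETTER_START = re.compile(r"[A-Za-z][A-Za-z0-9']*")  # letter-first variant


def classify(lexeme, words=WORDS_ANY_START):
    """Return 'NUMBER', 'WORDS', or None, always trying NUMBER first."""
    if NUMBER.fullmatch(lexeme):
        return "NUMBER"
    if words.fullmatch(lexeme):
        return "WORDS"
    return None
```

Under the permissive reading, `classify("5a5")` and `classify("'5a")` both give `"WORDS"`, while passing `WORDS_LETTER_START` as the second argument rejects both and still accepts `"abc"`.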
- There is no syntax for PATH and URL. Should we follow some
international standard here? If so, which one, and where can we obtain
a precise description of the syntax?
- A WORDS can (probably - see above) simultaneously be a PATH.
We currently resolve this by scanning it as a WORDS,
and adding
LIB-ID ::= URL | PATH | WORDS
- More seriously, x/zero is currently recognized as a PATH,
but within a TERM, it should be recognized as three lexical
tokens, namely WORDS SIGNS WORDS. There is no way to distinguish
these cases at the lexical level, and by the longest match rule,
we always get a PATH.
Moreover, probably other SIGNS (like ".") will be allowed in a PATH,
leading to similar problems.
One way out would be to disallow SIGNS in a PATH, and reintroduce
the necessary SIGNS, such as "/" and ".", via the grammar. But this would
allow PATHs to be written interspersed with spaces, such as
CASLdir / examples / file1 . casl
while we would probably like to force the user to write
CASLdir/examples/file1.casl
The other possibility would be to require a PATH to be quoted, e.g.
"CASLdir/examples/file1.casl"
The same problem also occurs for URLs.
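
The longest-match problem and the quoting remedy can be reproduced with a small Python sketch. All token patterns below are hypothetical simplifications (they are not the CASL definitions), and `scan`, `UNQUOTED` and `QUOTED` are names introduced only for this illustration.

```python
import re

# Hypothetical, simplified token classes (not the CASL definitions).
WORDS = r"[A-Za-z][A-Za-z0-9']*"
SIGNS = r"[+\-*/<>=?!.]+"
UNQUOTED_PATH = r"[A-Za-z0-9'_]+(?:[/.][A-Za-z0-9'_]+)+"
QUOTED_PATH = r'"[^"\s]+"'

UNQUOTED = [("WORDS", WORDS), ("SIGNS", SIGNS), ("PATH", UNQUOTED_PATH)]
QUOTED = [("WORDS", WORDS), ("SIGNS", SIGNS), ("PATH", QUOTED_PATH)]


def scan(text, rules):
    """Longest-match scanner; on a tie the earlier rule wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # white space only separates tokens
            continue
        best = None
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"cannot scan at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens
```

With the `UNQUOTED` rules, `scan("x/zero", UNQUOTED)` yields the single token `("PATH", "x/zero")`. With the `QUOTED` rules the same input yields WORDS SIGNS WORDS, and a PATH is only recognized when written as `"CASLdir/examples/file1.casl"`.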
We also have two problems with the grammar:
- There is no syntax for TOKEN-PLACES on page C-5 bottom.
We assume that the syntax is
TOKEN-PLACES ::= PLACE ... PLACE TOKEN PLACE ... PLACE
| TOKEN PLACE ... PLACE
| PLACE ... PLACE TOKEN
| TOKEN
(if there is not exactly one TOKEN in a TOKEN-PLACES,
it becomes unclear where to attach the components of
the compound id).
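
The assumed shape of TOKEN-PLACES amounts to a simple check, sketched here in Python: any number of PLACEs and exactly one TOKEN. We assume a PLACE is written `__`; the function name `is_token_places` and the list-of-strings representation are hypothetical.

```python
PLACE = "__"  # assumed concrete spelling of PLACE


def is_token_places(items):
    """True if items holds exactly one TOKEN among any number of PLACEs."""
    return sum(1 for item in items if item != PLACE) == 1
```

For example, `["__", "+", "__"]` and `["f"]` are accepted, while `["__", "__"]` (no TOKEN) and `["if", "__", "then"]` (two TOKENs) are rejected.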
- The production
SIMPLE-TERM ::= ID | ....
has to be replaced by
SIMPLE-TERM ::= TOKEN-ID | ....
because a MIXFIX-ID should not be a legal SIMPLE-TERM
(and we would run into ambiguity problems).
Greetings and happy new year,
Till and Kolyang