Re: concrete syntax problems in arch spec and views
I have heavily edited Peter's answer to my original mail to keep this message
short, hopefully without altering the meaning.
> From mosses@csl.sri.com Sun Sep 13 00:24:06 1998
> > From Frederic.Voisin@lri.fr
> Subject: Re: concrete syntax problems in arch spec and views
> > Here is a short list of the current problems with the concrete syntax
> > for Casl.
For architectural spec. and views, here is the current state of work from my
point of view:
1. I have adopted the suggestions proposed by Christophe, Michel and Peter in
July for some of the problems:
- in architectural spec. one could not embed arbitrary BASIC-SPEC without
bracketing, i.e. the original productions
UNIT-SPEC ::= UNIT-SPEC-NAME
| SPEC
| SPEC * ... * SPEC -> SPEC
and VIEW-TYPE ::= SPEC -> SPEC
would be replaced by
UNIT-SPEC ::= GROUPED-SPEC
| GROUPED-SPEC * ... * GROUPED-SPEC -> GROUPED-SPEC
and VIEW-TYPE ::= GROUPED-SPEC -> GROUPED-SPEC
Production UNIT-SPEC ::= UNIT-SPEC-NAME has been removed since it is
a special case of GROUPED-SPEC (one cannot make the distinction between
a SPEC-NAME and a UNIT-SPEC-NAME at the syntactic level).
- As proposed by Michel, the UNIT-TERM-list after the "given" keyword
in UNIT-DECL has been restricted to a single UNIT-TERM. Maybe using
"then" instead of "," plus fixing a precedence level might solve
the problem if really needed (or we rely on the "and" ??)
- Once the above is adopted, most of the problems are related to the
precedence level of UNIT-TERM (RESTRICTION, AND, ..). By adopting the same
precedence levels as in structured specs. most of them disappear in the LALR
version.
- Still there is a problem with optional semi-colons. Consider:
arch spec Name =
unit UnitName = lambda
UName : arch spec unit <UnitDeclDefnList>
result <UnitExpression>
; <--- Problem here !!!
....
When about to parse the semi-colon, one cannot know if it is the optional
semi-colon that ends the BASIC-ARCH-SPEC declared by "units.... result..."
or if that construction ends without the semi-colon, in which case
the semi-colon is the one between multiple UNIT-BINDING in the lambda
expression... This may not be the only case where the problem occurs.
I tried using an optional "end", or using "," instead of ";", at various
locations, but the only solution I found is by DISALLOWING the optional
";" at the end of the production for BASIC-ARCH-SPEC...
Any clever idea is welcome.
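
The conflict can be reproduced on a toy grammar. The sketch below, in Python, is
not the CASL grammar: the productions LAMBDA, BINDINGS, BINDING and ARCH, the "."
closing the lambda, and the tokens "u", "units" and "result" are all simplifying
assumptions made for illustration. It enumerates every complete parse and records
the role given to each ";".

```python
# Toy grammar (hypothetical, for illustration only):
#   LAMBDA   ::= 'lambda' BINDINGS '.'
#   BINDINGS ::= BINDING | BINDING ';' BINDINGS
#   BINDING  ::= 'u' ':' ARCH
#   ARCH     ::= 'units' 'result' [';']

def parse_arch(toks, i):
    # Yield (next_index, roles) for each way ARCH can match at position i.
    if toks[i:i + 2] == ["units", "result"]:
        yield i + 2, []                      # optional ';' not taken
        if toks[i + 2:i + 3] == [";"]:
            yield i + 3, ["terminator"]      # optional ';' taken

def parse_binding(toks, i):
    # BINDING is 'u' ':' followed by ARCH.
    if toks[i:i + 2] == ["u", ":"]:
        yield from parse_arch(toks, i + 2)

def parse_bindings(toks, i):
    # One BINDING, optionally followed by ';' and more BINDINGS.
    for j, roles in parse_binding(toks, i):
        yield j, roles
        if toks[j:j + 1] == [";"]:
            for k, more in parse_bindings(toks, j + 1):
                yield k, roles + ["separator"] + more

def semicolon_roles(text):
    # Return the list of ';' role assignments, one entry per complete parse.
    toks = text.split()
    if toks[:1] != ["lambda"]:
        return []
    return [roles for j, roles in parse_bindings(toks, 1) if toks[j:] == ["."]]

print(semicolon_roles("lambda u : units result ; u : units result ."))
print(semicolon_roles("lambda u : units result ; ."))
```

The two inputs are identical up to and including the ";", yet the first needs it
as a separator and the second as a terminator. Only the token after the ";"
tells them apart, which is why a parser with a single token of lookahead reports
a conflict at that point in this toy setting.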
With the above propositions, I have no more conflicts for
arch. spec. than with structured spec. alone (that does not imply that the
work is done :-( )
Do you think that the above restrictions are reasonable or not ? Let me know
before I go further...
> > There are also problems with the syntax of PATHs when naming specifications.
> I don't see anything called PATH in the grammars in the Summary v0.99;
> I guess you mean the use of "/" in LIB-ID? Notice that this is not
> allowed when referring to a spec, only to a library.
>
Well, in fact it is Christophe who was having the problem in his LL(2)
parser... For me it seems to work.
Note however that by restricting the various parts in a LIB-ID to
SIMPLE-ID, we cannot deal with an arbitrary UNIX path :-(.
Consider for instance any file named .../ASF+SDF/... , because of the '+'.
This is why we have suggested using a special lexical form to denote such
a path, like "arbitrary_unix_path_enclosed_in_double_quotes" or
whatever you like, as long as it is still scannable as a single token without
having to check much of its content.
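
A minimal sketch of that suggestion, in Python. The regular expressions for
SIMPLE_ID and QUOTED_PATH and the function scan_lib_id are hypothetical
simplifications, not the lexical syntax of the Summary; the sample paths are
made up.

```python
import re

# Hypothetical lexical forms, for illustration only.
SIMPLE_ID = re.compile(r"[A-Za-z][A-Za-z0-9_']*")
QUOTED_PATH = re.compile(r'"[^"\n]*"')   # anything between double quotes

def scan_lib_id(text):
    # Split a LIB-ID into tokens; raise ValueError on an unscannable character.
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos] == "/":
            tokens.append("/")
            pos += 1
            continue
        m = QUOTED_PATH.match(text, pos) or SIMPLE_ID.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

print(scan_lib_id("Basic/Numbers"))      # parts are plain identifiers
print(scan_lib_id('"x/ASF+SDF/y"'))      # whole path is one quoted token
try:
    scan_lib_id("x/ASF+SDF/y")           # unquoted: the scanner stops at '+'
except ValueError as err:
    print("rejected:", err)
```

With the quoted form the scanner only has to find the closing quote, so the
content of the path does not interact with the rest of the lexical syntax.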
> > Micro-problems/ambiguities in the current document
> > - In a definition by extension, spaces are needed around the '.'
> > NO: Odd = {n:Nat.odd n} %% Note that ".odd" is a valid token
> > Yes: Odd = {n:Nat . odd n}
>
> I see the need for the second space, but not for the first one
> Moreover, isn't it just the same situation as with QUANTIFICATION:
True for both points. The space before the "." in the above example is
not technically needed, only the one after it ! I put a space both before and
after the "." because it seems more readable.
It is clear that the same problem occurs also for QUANTIFICATION
and "bulletized formulae". It is a purely lexical problem of knowing whether the
"." is separate from the adjacent identifier, or not.
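
The lexical effect can be shown with a small scanner, sketched here in Python.
The token classes and their regular expressions are simplified assumptions for
illustration (only letters, digits, "_" and "'" in words, and a handful of
symbols), not the CASL lexical syntax.

```python
import re

# Simplified, hypothetical token classes. DOT_WORDS is tried before a bare ".".
TOKEN_RE = re.compile(r"""
    (?P<DOT_WORDS>\.[A-Za-z][A-Za-z0-9_']*)   # a dot glued to a word
  | (?P<WORDS>[A-Za-z][A-Za-z0-9_']*)         # an ordinary word
  | (?P<DOT>\.)                               # a dot standing alone
  | (?P<PUNCT>[{}:=])                         # a few single-character symbols
  | (?P<SPACE>\s+)                            # whitespace, skipped below
""", re.VERBOSE)

def scan(text):
    # Return the list of (kind, lexeme) pairs, skipping whitespace.
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot scan at position {pos}: {text[pos:]!r}")
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("{n:Nat.odd n}"))    # the dot is absorbed into a DOT_WORDS token
print(scan("{n:Nat. odd n}"))   # the space after the dot keeps it separate
```

In this sketch the space before the "." makes no difference, since a word
cannot contain a dot; only the space after it prevents the DOT_WORDS reading.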
> Of course, some might prefer to remove the rather odd-looking
> DOT-WORDS from the language altogether, since it also prevents using
> "." as an infix operation, and might hinder the introduction of the
> notorious "dot-notation" in extensions of CASL. But let's not waste
> our precious time on this relatively minor issue, which has already
> been debated at considerable length - extensions will in any case have
> to restrict the CASL lexical syntax for ID so as to reserve new
> keywords, and they might just as well be allowed to remove DOT-WORDS
> at the same time.
Agreed, of course !!
> > - <Sort> is defined as <TokenId>. Probably it should be a <SimpleId>
>
> NO! In structured specs, TOKEN-ID allows for compound ids, such as
> "List[Elem]", whereas SIMPLE-ID is still merely WORDS.
OK, sorry I have been too restrictive and missed my point: the current
SORT ::= TOKEN-ID derives into TOKEN that itself derives into SIGNS or
DOT-WORDS, besides WORDS. I'm not sure it was intended...
> > - Are both ->? and -> ? needed ? Or which form is the right one ?
> For the ASF+SDF CASL v0.99 parser (see ftp://ftp.brics.dk/Projects/CoFI/
> Documents/CASL/SyntaxExamples/ASF+SDF/ZCasl-BasicItems.syn) Bjarke
> only allowed "->?".
Probably I can live with any solution for that problem, as long as everyone
agrees on it. Let us say that only "->?" is correct.
> > - The "display annotations" must probably be allowed also for symbols
> > declared in renamings, not only in "declarations"
>
> Right - because new symbols can be introduced there too. Thanks,
> I hadn't noticed that. The same goes for fittings in instantiations;
> so I suggest to provide these annotations uniformly for SYMB-MAP-ITEMS.
or only for the targets in SYMB-MAP (not the source).
> > - the tokens -> and ->? are not described as being complete reserved tokens
>
> Neither is "*"; presumably the treatment of these symbols should be
> analogous? Perhaps the explanation in App. C.4:
> `(since they all have to be recognized as terminal symbols)'
> is misleading. As far as I can see, one needs to make sure that a
> terminal symbol which can terminate an ID, TERM, or FORMULA cannot
> also be a valid complete TOKEN. That motivates reserving:
> : :? = => <=> . | |-> \/ /\ { } [ ]
> as well as most of the keywords. (Maybe "::=" should be left
> unreserved, although I can't imagine anyone wanting to use it.) I
> don't see any *technical* reason for reserving tokens such as "*" and
> "->", since when they are used in FUN-TYPE in TERM, they always follow
> a SORT, which is a single TOKEN-ID.
Let me answer with my current focus: "Technical" = parsing technology using
standard tools !
With parsers using lookahead techniques, one has to tell the scanner
"well in advance" (1 or 2 tokens before the next move) if a token like -> or
* is to be scanned as a SIGN or as a special token, depending on the
syntactical context... and this is a source of potential bugs and increased
complexity for the grammar. The fewer such symbols there are, the fewer bugs we
will probably have !
It is clear to me that we need special cases like "*". If people really feel
strongly about having -> and ->? (and similar symbols), ok, but otherwise let us
try to keep the tools as simple as possible.
> However, perhaps "->" and "->?" should be reserved to avoid problems
> with parsing arising in a higher-order extension of CASL (where the
> distinction between SORT and TERM may disappear). The question is
> then whether "*" should be reserved too - I hope not! In fact I'd
> prefer to regard all of "->", "->?" and "*" as ordinary tokens usable
> as infix operators (but getting a predefined interpretation when
> applied to types in HO-CASL).
-------------------------------
> > - Is the operator {{ __ }} allowed, since in an application like
> > {{ ( a ) }}, from a grammatical point of view we have different IDs
> > "{{" and "}}" and the reference manual asks each ID to have balanced
> > occurrences of {.. } (or [... ]).
>
> I don't understand your "grammatical" point of view: "{{" and "}}" are
> simply separate TOKENs, and *not* valid as complete IDs. The ID
> "{{ __ }}" is allowed, and since it is balanced wrt "{" and "}", so is
> any TERM (or FORMULA) built from it.
>
"complete" ID ???
Probably only a question of wording (and apologies to all of you who will find
that part quite esoteric and boring). What do we call an ID ? A sequence
of TOKEN-OR-PLACE (let us consider only mixfix) as in the grammar ? In that
case for an application of a mixfix operator like {{ ( a ) }}, and because
of the parentheses, I see three chunks ("ID" in the grammar... sorry) namely
"{{", "a", "}}", separated by "(" and ")". With that (narrow) view, neither of
the IDs "{{" and "}}" is balanced by itself, as required by the paragraph in C.4.
If what we call an ID in C.4 is indeed an operator/predicate name, like
{{ __ }}, of course it is balanced.
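
The difference between the two readings can be made concrete with a small
check, sketched in Python. The function is_balanced is hypothetical and encodes
one possible reading of the C.4 requirement (proper nesting of "{"/"}" and
"["/"]" across the tokens of an ID); the exact wording in C.4 may differ.

```python
def is_balanced(tokens):
    # Check that "{"/"}" and "["/"]" nest properly across a token sequence.
    closing = {"}": "{", "]": "["}
    stack = []
    for tok in tokens:
        for ch in tok:
            if ch in "{[":
                stack.append(ch)
            elif ch in closing:
                if not stack or stack.pop() != closing[ch]:
                    return False
    return not stack

# Wide view: the ID is the whole operator name {{ __ }}.
print(is_balanced(["{{", "__", "}}"]))

# Narrow view: each chunk between the parentheses is an ID of its own.
print(is_balanced(["{{"]), is_balanced(["a"]), is_balanced(["}}"]))
```

Under the wide view the check succeeds; under the narrow view the chunks "{{"
and "}}" each fail it, which is the wording problem raised above.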
> > - I have doubts about the possibility of allowing [ and ] in IDs since
> > the symbols are also used for marking compound ids.
>
> I had hoped that e.g. "[ __ ]" and "__ [ __ ]" would be valid IDs,
> but that may well lead to problems when trying to group a list of IDs
> into a TERM. E.g., whether "f[elem]" is a constant compound ID or
> mixfix notation for "__[__](f,elem)" cannot be determined
> context-freely.
Exactly.
> However, a solution there may be for context-free parsers to leave the
> recognition of compound ids to the static analysis. After all,
> "f[elem]" is only legal as a compound id when it has been declared as
> such (possibly by the renaming that is implicit in the instantiation
> of a generic spec); in specs not using compound ids at all, e.g.,
> those arising from translation of OBJ3 specs, it would be annoying to
> have unnecessary restrictions on mixfix symbols.
>
> In a CASL sublanguage not allowing mixfix notation, however, terms
> should be fully parsed, context-freely, including compound ids. Maybe
> a parser annotation could be provided, to declare whether mixfix
> notation is to be used in a spec or not.
A parser annotation will not change what has been fixed in the context-free
grammar.
Frederic