Syntax of URL and PATH
Till asked:
> But we do not have a syntax for URL and PATH - which is indeed a
> problem for us in Bremen, since we need one for our parser. Perhaps
> there is some international standard which we can refer to?
A cut-down version of (what appears to be) the current official
grammar for URLs is appended. It has *not* yet been tested; if I've
accidentally removed something that is needed, please refer to the web
page given below for the original grammar.
Note also that a grammar for URIs (including URLs) is available at
http://www.w3.org/Addressing/URL/5_URI_BNF.html (see also
http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt, but check the
copyright restrictions at the end of that document!).
CASL might start from a more severely cut-down version (e.g., removing
user, password, hostnumber, port) but as far as I can see there is no
real need to prohibit any of the bits below.
A lexical analyser should recognize a URL or PATH only when expecting
a LIB-ID. If this is problematic for particular parsing technologies,
my suggestion would be to let such parsers deal only with single
LIB-ITEMs (other than DOWNLOAD-ITEMs), regarding LIB-DEFNs as a
special command language to some other tool that may call the parser
when needed.
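
As a rough illustration of the context-dependent lexing suggested above, here is a minimal Python sketch. It is not part of any CASL tool. The assumption that a LIB-ID is expected directly after the keywords "library" and "from", the token names, and the function tokenize are all hypothetical and chosen only for this example. The LIB-ID pattern is deliberately crude (any run of non-blank characters); a real lexer would use the URL and PATH grammar instead.

```python
import re

# Assumption for this sketch: a LIB-ID is expected right after these keywords.
LIB_ID_KEYWORDS = {"library", "from"}
KEYWORDS = LIB_ID_KEYWORDS | {"get"}

# Normal mode: identifiers, or any single non-blank character as a symbol.
NORMAL = re.compile(r"\s*(?:(?P<WORD>[A-Za-z][A-Za-z0-9_]*)|(?P<SYMBOL>\S))")
# LIB-ID mode: take the whole run of non-blank characters as one token.
LIB_ID = re.compile(r"\s*(?P<LIB_ID>\S+)")


def tokenize(text):
    """Return (kind, value) pairs, switching mode only when a LIB-ID is expected."""
    text = text.rstrip()
    tokens = []
    pos = 0
    expect_lib_id = False
    while pos < len(text):
        pattern = LIB_ID if expect_lib_id else NORMAL
        m = pattern.match(text, pos)
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "LIB_ID":
            # Crude classification by scheme prefix; anything else is a PATH.
            kind = "URL" if value.startswith(("http://", "ftp://")) else "PATH"
        elif kind == "WORD" and value in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, value))
        # Only the token immediately after a LIB-ID keyword is lexed as a LIB-ID.
        expect_lib_id = kind == "KEYWORD" and value in LIB_ID_KEYWORDS
        pos = m.end()
    return tokens
```

With this sketch, "Basic/Numbers" is a single PATH token after "from", whereas "x/y" elsewhere is split into three tokens, which is the behaviour the paragraph above asks for.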
Happy New Year to all,
-- Peter
_________________________________________________________
Dr. Peter D. Mosses International Fellow (*)
Computer Science Laboratory mailto:mosses@csl.sri.com
SRI International phone: +1 (650) 859-2200
333 Ravenswood Avenue fax: +1 (650) 859-2844
Menlo Park, CA 94025, USA http://www.brics.dk/~pdm/
(*) on leave from DAIMI & BRICS, University of Aarhus, DK
also affiliated to CS Department, Stanford University
_________________________________________________________
BNF for specific URL schemes [http://www.w3.org/Addressing/URL/5_BNF.html]
-- cut down to URL and PATH in CASL v1.0:
29 Dec 1998, Peter D. Mosses, mosses@csl.sri.com
This is a BNF-like description of the Uniform Resource Locator syntax. A
vertical line "|" indicates alternatives, and [brackets] indicate optional
parts. Spaces are represented by the word "space", and the vertical line
character by "vline". Single letters stand for single letters. All words of
more than one letter below are entities described somewhere in this
description.
The "national" and "punctuation" characters do not appear in any productions
and therefore may not appear in URLs.
URL
httpaddress | ftpaddress
PATH
segment [ / path ]
httpaddress
h t t p : / / hostport [ / path ]
ftpaddress
f t p : / / login / path
login
[ user [ : password ] @ ] hostport
hostport
host [ : port ]
host
hostname | hostnumber
hostname
ialpha [ . hostname ]
hostnumber
digits . digits . digits . digits
port
digits
path
void | segment [ / path ]
segment
xpalphas
user
alphanum2 [ user ]
password
alphanum2 [ password ]
alphanum2
alpha | digit | - | _ | . | +
xalpha
alpha | digit | safe | extra | escape
xalphas
xalpha [ xalphas ]
xpalpha
xalpha | +
xpalphas
xpalpha [ xpalphas ]
ialpha
alpha [ xalphas ]
alpha
a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r |
s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J |
K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
digit
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
safe
$ | - | _ | @ | . | & | +
extra
! | * | " | ' | ( | ) | ,
reserved
= | ; | / | # | ? | : | space
escape
% hex hex
hex
digit | a | b | c | d | e | f | A | B | C | D | E | F
national
{ | } | vline | [ | ] | \ | ^ | ~
punctuation
< | >
digits
digit [ digits ]
alphanum
alpha | digit
alphanums
alphanum [ alphanums ]
void
(empty production)
(end of URL BNF)
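
For anyone who wants to experiment before writing a real parser, the grammar above can be transcribed into regular expressions. The following Python sketch is untested against any CASL tool; the names is_url and is_path and all of the constant names are hypothetical. It follows the productions literally, with two simplifications noted in the comments: since "+" and "." are already in "safe", xpalpha adds nothing to xalpha, and the recursion in hostname adds nothing to ialpha.

```python
import re

# xalpha: alpha | digit | safe | extra | escape  (\x22 is ", \x27 is ')
XALPHA = r"(?:[A-Za-z0-9$\-_@.&+!*\x22\x27(),]|%[0-9A-Fa-f]{2})"
# segment: xpalphas; "+" is already in safe, so xpalpha equals xalpha.
SEGMENT = XALPHA + "+"
# path: void | segment [ / path ]  (a trailing "/" is allowed, "//" is not)
PATH = rf"(?:{SEGMENT}(?:/{SEGMENT})*/?)?"
# hostname: ialpha [ . hostname ]; "." is in safe, so this equals ialpha.
HOSTNAME = rf"[A-Za-z]{XALPHA}*"
HOSTNUMBER = r"[0-9]+(?:\.[0-9]+){3}"
HOSTPORT = rf"(?:{HOSTNAME}|{HOSTNUMBER})(?::[0-9]+)?"
# user and password are both non-empty runs of alphanum2.
ALPHANUM2 = r"[A-Za-z0-9\-_.+]"
LOGIN = rf"(?:{ALPHANUM2}+(?::{ALPHANUM2}+)?@)?{HOSTPORT}"

HTTP_ADDRESS = rf"http://{HOSTPORT}(?:/{PATH})?"
FTP_ADDRESS = rf"ftp://{LOGIN}/{PATH}"

URL_RE = re.compile(rf"(?:{HTTP_ADDRESS}|{FTP_ADDRESS})")
# PATH (the CASL entity): segment [ / path ], so at least one segment.
CASL_PATH_RE = re.compile(rf"{SEGMENT}(?:/{SEGMENT})*/?")


def is_url(text):
    """True if the whole string matches the URL production above."""
    return URL_RE.fullmatch(text) is not None


def is_path(text):
    """True if the whole string matches the PATH production above."""
    return CASL_PATH_RE.fullmatch(text) is not None
```

Note that under this grammar "~" is a "national" character, so a URL such as http://www.brics.dk/~pdm/ is rejected unless the tilde is written as the escape %7E. Likewise an ftpaddress needs the "/" after the host, even when the path is void.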