
Syntax of URL and PATH




Till asked:

> But we do not have a syntax for URL and PATH - which is indeed a
> problem for us in Bremen, since we need one for our parser.  Perhaps
> there is some international standard which we can refer to?

A cut-down version of (what appears to be) the current official
grammar for URLs is appended.  It has *not* yet been tested; if I've
accidentally removed something that is needed, please refer to the web
page given below for the original grammar.

Note also that a grammar for URIs (including URLs) is available at
http://www.w3.org/Addressing/URL/5_URI_BNF.html (see also
http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt, but check the
copyright restrictions at the end of that document!).

CASL might start from a more severely cut-down version (e.g., removing
user, password, hostnumber, port) but as far as I can see there is no
real need to prohibit any of the bits below.

A lexical analyser should recognize a URL or PATH only when expecting
a LIB-ID.  If this is problematic for particular parsing technologies,
my suggestion would be to let such parsers deal only with single
LIB-ITEMs (other than DOWNLOAD-ITEMS), regarding LIB-DEFNs as a
special command language to some other tool that may call the parser
when needed.
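
One way to picture the "only when expecting a LIB-ID" rule is a lexer
with a one-shot mode flag. The sketch below is illustrative only: the
keyword set (library, from), the token classes, and the loose
"any run of non-blank characters" stand-in for URL | PATH are all
assumptions made for the example, not part of the CASL definition.

```python
import re

# Assumption: these keywords are the ones followed by a LIB-ID.
# Adjust the set to the actual concrete syntax.
LIB_ID_KEYWORDS = {"library", "from"}

WORD = re.compile(r"[A-Za-z][A-Za-z0-9_']*")
# Loose stand-in for URL | PATH: any run of non-blank characters.
LIB_ID = re.compile(r"\S+")


def tokenize(text):
    """Split text into (kind, lexeme) pairs, switching to LIB-ID mode
    for exactly one token after a keyword in LIB_ID_KEYWORDS."""
    tokens = []
    pos = 0
    expecting_lib_id = False
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        if expecting_lib_id:
            # In this mode '/', ':' and '.' are part of the token.
            m = LIB_ID.match(text, pos)
            tokens.append(("LIB-ID", m.group()))
            expecting_lib_id = False
            pos = m.end()
            continue
        m = WORD.match(text, pos)
        if m:
            tokens.append(("WORD", m.group()))
            expecting_lib_id = m.group() in LIB_ID_KEYWORDS
            pos = m.end()
        else:
            # Outside LIB-ID mode every other character is its own symbol.
            tokens.append(("SYM", text[pos]))
            pos += 1
    return tokens
```

With this sketch, tokenize("from Basic/Numbers get Nat") yields a single
LIB-ID token for Basic/Numbers, whereas tokenize("a / b") treats the
slash as an ordinary symbol.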

Happy New Year to all,

-- Peter
_________________________________________________________
Dr. Peter D. Mosses             International Fellow  (*)

Computer Science Laboratory     mailto:mosses@csl.sri.com
SRI International               phone: +1 (650)  859-2200
333 Ravenswood Avenue           fax:   +1 (650)  859-2844
Menlo Park, CA 94025, USA       http://www.brics.dk/~pdm/

(*) on leave from DAIMI & BRICS, University of Aarhus, DK
    also affiliated to CS Department, Stanford University
_________________________________________________________


BNF for specific URL schemes [http://www.w3.org/Addressing/URL/5_BNF.html]

-- cut down to URL and PATH in CASL v1.0:

29 Dec 1998, Peter D. Mosses, mosses@csl.sri.com

This is a BNF-like description of the Uniform Resource Locator syntax. A
vertical line "|" indicates alternatives, and [brackets] indicate optional
parts. Spaces are represented by the word "space", and the vertical line
character by "vline". Single letters stand for single letters. All words of
more than one letter below are entities described somewhere in this
description.

The "national" and "punctuation" characters do not appear in any productions
and therefore may not appear in URLs.

URL
     httpaddress | ftpaddress
PATH
     segment [ / path ]

httpaddress
     h t t p : / / hostport [ / path ]
ftpaddress
     f t p : / / login / path
login
     [ user [ : password ] @ ] hostport
hostport
     host [ : port ]
host
     hostname | hostnumber
hostname
     ialpha [ . hostname ]
hostnumber
     digits . digits . digits . digits
port
     digits
path
     void | segment [ / path ]
segment
     xpalphas
user
     alphanum2 [ user ]
password
     alphanum2 [ password ]
alphanum2
     alpha | digit | - | _ | . | +
xalpha
     alpha | digit | safe | extra | escape
xalphas
     xalpha [ xalphas ]
xpalpha
     xalpha | +
xpalphas
     xpalpha [ xpalphas ]
ialpha
     alpha [ xalphas ]
alpha
     a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r |
     s | t | u | v | w | x | y | z | A | B | C | D | E | F | G | H | I | J |
     K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
digit
     0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
safe
     $ | - | _ | @ | . | & | +
extra
     ! | * | " | ' | ( | ) | ,
reserved
     = | ; | / | # | ? | : | space
escape
     % hex hex
hex
     digit | a | b | c | d | e | f | A | B | C | D | E | F
national
     { | } | vline | [ | ] | \ | ^ | ~
punctuation
     < | >
digits
     digit [ digits ]
alphanum
     alpha | digit
alphanums
     alphanum [ alphanums ]
void
     (the empty string)

(end of URL BNF)
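
For parsers that prefer regular expressions, the grammar above can be
transcribed fairly directly. The following Python sketch is untested
against any CASL tool and makes two simplifications that follow from
the productions themselves: xpalpha adds nothing to xalpha because "+"
is already in "safe", and hostname accepts exactly the same strings as
ialpha because "." is also in "safe". The names URL_RE and PATH_RE are
chosen for this example only.

```python
import re

ESCAPE = r"%[0-9A-Fa-f]{2}"            # escape: % hex hex
SAFE = "$-_@.&+"
EXTRA = "!*\"'(),"
# xalpha: alpha | digit | safe | extra | escape
XALPHA = "(?:[A-Za-z0-9" + re.escape(SAFE + EXTRA) + "]|" + ESCAPE + ")"

SEGMENT = XALPHA + "+"                 # segment: xpalphas (same set as xalphas)
# path: void | segment [ / path ]   (may be empty, may end in "/")
PATH_OPT = f"(?:{SEGMENT}(?:/{SEGMENT})*/?)?"
# PATH: segment [ / path ]          (never empty)
PATH_NONEMPTY = f"{SEGMENT}(?:/{SEGMENT})*/?"

IALPHA = f"[A-Za-z]{XALPHA}*"          # ialpha: alpha [ xalphas ]
HOSTNAME = IALPHA                      # "." is an xalpha, so this is equivalent
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOSTPORT = f"(?:{HOSTNAME}|{HOSTNUMBER})(?::[0-9]+)?"

ALPHANUM2 = r"[A-Za-z0-9\-_.+]"
# login: [ user [ : password ] @ ] hostport
LOGIN = f"(?:{ALPHANUM2}+(?::{ALPHANUM2}+)?@)?{HOSTPORT}"

HTTPADDRESS = f"http://{HOSTPORT}(?:/{PATH_OPT})?"
FTPADDRESS = f"ftp://{LOGIN}/{PATH_OPT}"

URL_RE = re.compile(f"(?:{HTTPADDRESS}|{FTPADDRESS})")
PATH_RE = re.compile(PATH_NONEMPTY)


def is_url(s):
    """True if s as a whole matches the URL production."""
    return URL_RE.fullmatch(s) is not None


def is_path(s):
    """True if s as a whole matches the PATH production."""
    return PATH_RE.fullmatch(s) is not None
```

Note that a URL containing "~", such as http://www.brics.dk/~pdm/, is
rejected by this transcription, since "~" is a "national" character
and would have to be written as an escape (%7E).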