The Program fxp
fxp is a validating XML parser. It reads an XML document
and reports all well-formedness errors, validity errors and other
errors in that document. It can also warn about interoperability
features and other issues mentioned in the XML recommendation.
The typical invocation of fxp is
fxp [option ...] [infile]
If infile is given, fxp reads its input document
from that file, otherwise from standard input.
By default fxp reports all errors and warnings to standard error.
This behavior can be controlled with the following options:
- All messages can be redirected to a file named errfile via the
option --error-output=errfile or, for short, -e errfile.
- All messages can be suppressed by supplying the --silent
option, or -s for short.
- By default, the parser tries to avoid printing an error that has already been
printed earlier. E.g., if an attribute is misspelled in the attribute list
declaration, there will be an undeclared-attribute error each time this
attribute is actually specified for an element. Printing of all but the
first of these errors is suppressed. In order to make fxp print this
kind of duplicate error message, use the --few-errors=no option.
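The duplicate suppression controlled by --few-errors can be pictured as remembering which (error kind, subject) pairs have already been reported. The Python sketch below illustrates that idea only; it is not fxp's implementation, and the class ErrorReporter, its method error and the key strings are hypothetical.

```python
import sys
from typing import List, Set, TextIO, Tuple


class ErrorReporter:
    """Hypothetical reporter mimicking --few-errors=yes/no (sketch only)."""

    def __init__(self, few_errors: bool = True, out: TextIO = sys.stderr) -> None:
        self.few_errors = few_errors
        self.out = out
        self.seen: Set[Tuple[str, str]] = set()  # (kind, subject) pairs already reported
        self.printed: List[str] = []             # messages actually written

    def error(self, position: str, kind: str, subject: str, message: str) -> None:
        key = (kind, subject)
        # With few_errors, a second error of the same kind about the same
        # subject (e.g. the same undeclared attribute) is suppressed.
        if self.few_errors and key in self.seen:
            return
        self.seen.add(key)
        line = f"[{position}] Error: {message}"
        self.printed.append(line)
        print(line, file=self.out)
```

With few_errors=False every occurrence is printed, which corresponds to --few-errors=no.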
By default fxp is a validating parser, but it can be run in
non-validating mode with the --validate=no or, for short,
-nv option.
This has the following effects:
- only the internal subset of the DTD is parsed and checked for
well-formedness;
- the external subset and all references to external parameter entities
are ignored;
- declarations in the internal subset are processed only up to the first
reference to an external parameter entity;
- validity constraints are not verified;
- no referenced parameter entities are included;
- by default, no external parsed general entities are included;
this can be changed with the --include-external option;
- all attributes for which no declaration has been processed are assumed
to be declared CDATA with default value #IMPLIED.
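Two of these rules, the inclusion of external parsed entities and the CDATA/#IMPLIED fallback for attributes without a processed declaration, can be expressed as small decision functions. The following Python sketch is hypothetical (AttDecl, include_external_entity and lookup_attribute are invented names) and merely restates the rules listed above together with the description of --include-external.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AttDecl:
    att_type: str  # e.g. "CDATA", "ID" or an enumeration such as "(yes|no)"
    default: str   # "#IMPLIED", "#REQUIRED", "#FIXED ..." or a default value


# Hypothetical fallback assumed for attributes whose declaration was not processed.
IMPLIED_CDATA = AttDecl("CDATA", "#IMPLIED")


def include_external_entity(validate: bool, include_external: bool) -> bool:
    """Is a reference to an external parsed general entity expanded?

    Validating mode always includes it; non-validating mode only with
    --include-external.
    """
    return validate or include_external


def lookup_attribute(decls: Dict[Tuple[str, str], AttDecl], element: str,
                     attribute: str, validate: bool) -> Optional[AttDecl]:
    """Find the declaration for element/attribute, applying the non-validating fallback."""
    decl = decls.get((element, attribute))
    if decl is None and not validate:
        return IMPLIED_CDATA
    # In validating mode None signals an undeclared attribute (a validity error).
    return decl
```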
For instance, consider the example document exa-1.xml, which references
the files exa-1.ext, ext.elem and ext.decl.
Running
fxp exa-1.xml
reports the following errors:
[exa-1.xml:17.11] Error: Attribute 'num' has the value 'a' but was declared with
a fixed default value of '0'.
[exa-1.xml:18.12] Error: ID name 'id1' already occurred as an attribute value.
[exa-1.xml:19.0] Error: Element type 'a' not allowed at this point in the
content of element 'a'.
[ext.elem:1.11] Error: Attribute 'num' has the value '1' but was declared with a
fixed default value of '0'.
[ext.elem:1.15] Error: Element 'a' was ended by an end-tag for 'b'.
[exa-1.xml:20.7] Error: Attribute 'nmu' was not declared for element type 'b'.
[exa-1.xml:20.12] Error: No value was specified for required attribute 'num'.
[exa-1.xml:20.12] Error: The end-tag for element 'b' with declared EMPTY content
must follow immediately after its start-tag.
whereas the non-validating mode
fxp -nv exa-1.xml
does not find any errors.
Note that the error at [ext.elem:1.15] is a well-formedness
error but is not reported, since the external entity reference &ext;
is not included. But if we make the parser include external parsed entities:
fxp -nv --include-external exa-1.xml
then the error is reported:
[ext.elem:1.15] Error: Element 'a' was ended by an end-tag for 'b'.
Some features in XML have only been included for compatibility with SGML.
These include:
- the string (]]>) may not appear literally in content;
- a comment may not contain a double-hyphen (--);
- content models must be unambiguous.
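The first two compatibility rules are plain substring checks. A minimal Python sketch, assuming the text of each comment (between <!-- and -->) and each run of character data are already available as strings; the function names are hypothetical, and the message strings only mirror those in the exa-2.xml output shown below. Checking content models for ambiguity needs an automaton and is not covered here.

```python
from typing import List

DOUBLE_HYPHEN = "'--' is not allowed in a comment."
ESCAPE_GT = "Character '>' must be escaped for compatibility."


def check_comment(text: str) -> List[str]:
    """Check the text between '<!--' and '-->' of one comment."""
    # A trailing '-' would form '--->' with the closing delimiter, so it
    # counts as a double hyphen as well.
    if "--" in text or text.endswith("-"):
        return [DOUBLE_HYPHEN]
    return []


def check_char_data(text: str) -> List[str]:
    """Check one run of character data in element content."""
    if "]]>" in text:
        return [ESCAPE_GT]
    return []
```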
By default fxp checks these compatibility rules and reports an error
whenever one of them is violated. This can be changed with the --compatibility=no
or, for short, -nc option.
In non-compatibility mode, however, the parser must handle ambiguous
content models. This implies generation of a deterministic finite state
machine (DFA), which may in the worst case have size exponential in the
size of the content model. In order to avoid too high space usage,
fxp imposes a limit on the size of the generated DFA. If this limit
is exceeded, a warning is printed and the content model is approximated by
(e1|...|en)*,
where e1, ..., en are
all element types occurring in the original model. The new content model
is less restrictive but allows for a small DFA. The limit defaults to 256
and can be set by the --dfa-max-size option, and the warning
can be suppressed with the --dfa-warn-size=no option.
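Why the DFA can blow up, and how a size limit cuts the construction short, can be seen with the textbook subset construction. The Python sketch below uses a hypothetical content model ((b|c)*, b, (b|c), ..., (b|c)) with k trailing (b|c) groups; it is ambiguous in the sense above, and its DFA needs 2^(k+1) states. This is an illustration, not fxp's algorithm; NFA, ambiguous_model_nfa and build_dfa are invented names.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]  # (state, element type) -> successor states


def ambiguous_model_nfa(k: int) -> Tuple[NFA, int]:
    """NFA for ((b|c)*, b, (b|c), ..., (b|c)) with k trailing (b|c) groups.

    State 0 loops on b and c; on b it may also guess that this is the
    distinguished b and move to state 1. States 1..k each consume one more
    b or c; state k + 1 is accepting (returned as the second component).
    """
    nfa: NFA = {(0, "b"): {0, 1}, (0, "c"): {0}}
    for i in range(1, k + 1):
        for sym in ("b", "c"):
            nfa[(i, sym)] = {i + 1}
    return nfa, k + 1


def build_dfa(nfa: NFA, start: int, alphabet: Iterable[str],
              max_size: int) -> Optional[Dict[FrozenSet[int], int]]:
    """Subset construction that gives up once more than max_size states are needed.

    Returns a mapping from NFA-state sets to DFA state numbers, or None if
    the limit was exceeded.
    """
    symbols = list(alphabet)
    first = frozenset([start])
    states = {first: 0}
    worklist = [first]
    while worklist:
        current = worklist.pop()
        for sym in symbols:
            target = frozenset(q for s in current for q in nfa.get((s, sym), ()))
            if not target or target in states:
                continue
            if len(states) >= max_size:
                return None  # limit exceeded: caller falls back to an approximation
            states[target] = len(states)
            worklist.append(target)
    return states
```

With k = 7 the construction needs exactly 256 states and stays within the default limit; with k = 8 it needs 512, so build_dfa(..., 256) returns None, which is the situation in which, per the text above, fxp falls back to (e1|...|en)*.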
For instance, consider the document
exa-2.xml. Note that the
content model for element a is ambiguous, and its DFA needs
at least 257 states. Running fxp in compatibility mode produces
the following errors:
[exa-2.xml:4.65] Error: Content model is ambiguous: conflict between the 1st and
the 2nd occurrence of element 'b'. Using an approximation instead.
[exa-2.xml:10.26] Error: '--' is not allowed in a comment.
[exa-2.xml:13.26] Error: Character '>' must be escaped for compatibility.
Note that the empty-element tag for a is not an error, since
a's content model was approximated. Running in non-compatibility
mode:
fxp -nc exa-2.xml
suppresses these errors but reports the following instead:
[exa-2.xml:4.65] Warning: The finite state machine for the content model of
element type 'a' would have more than the maximal allowed number of 256
states. Using an approximation instead.
This warning can be suppressed by invoking fxp like this:
fxp -nc --dfa-warn-size=no exa-2.xml
But still the invalidity of the empty-element tag for a is
not detected. In order to achieve this, we can raise the limit for the
DFA's size:
fxp -nc --dfa-max-size=257 exa-2.xml
Now element a's content can be validated and the error is
reported:
[exa-2.xml:12.0] Error: Empty-element tag for element type 'a' whose content
model requires non-empty content.
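The approximation itself only needs the set of element types named in the model. A rough Python sketch, assuming an element-only content model (no #PCDATA) and simplifying XML names to ASCII; approximate and approximation_accepts are hypothetical names. It also shows why the empty-element tag for a passed unnoticed in compatibility mode: (e1|...|en)* accepts the empty sequence.

```python
import re
from typing import List

# Simplified, ASCII-only approximation of the XML Name production.
NAME = re.compile(r"[A-Za-z_:][-A-Za-z0-9._:]*")


def approximate(content_model: str) -> str:
    """Return (e1|...|en)* for the distinct element types e1..en in the model."""
    names: List[str] = []
    for name in NAME.findall(content_model):
        if name not in names:  # keep first-occurrence order, drop repeats
            names.append(name)
    return "(" + "|".join(names) + ")*"


def approximation_accepts(content_model: str, children: List[str]) -> bool:
    """Does (e1|...|en)* accept this sequence of child element types?

    Any sequence over e1..en is accepted, including the empty one.
    """
    allowed = set(NAME.findall(content_model))
    return all(child in allowed for child in children)
```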
XML also includes some interoperability recommendations in order to allow existing
SGML software to process XML documents. These recommendations are
non-binding and therefore not checked for by default. The
--interoperability or, for short, -i option makes
fxp run in interoperability-mode, which enables checking for these
features. Some of these features can additionally be controlled by individual
options.
The following table lists the features supported by fxp, together
with the option (if any) that enables or disables them, and whether they
are enabled by default if --interoperability is supplied:
Controlling option       | Default | Interoperability feature
------------------------ | ------- | ------------------------
(none)                   | yes     | The empty-element tag must be used, and may only be used, for elements declared EMPTY.
--warn-mult-decl=attlist | no      | There should be at most one attribute list declaration for each element type.
--warn-mult-decl=att     | no      | No attribute should be declared twice for the same element type.
(none)                   | yes     | The same name token should not occur more than once in the enumerated attribute types of a single element type.
--warn-predefined=no     | yes     | Valid documents should declare the entities amp, lt, gt, apos and quot.
Note that all arguments to the --warn-mult-decl option must be
specified in one comma-separated list (e.g., --warn-mult-decl=att,attlist);
see the description of --warn-mult-decl in the option list below.
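The argument of --warn-mult-decl can be modeled as a set of enabled keys. The Python sketch below is an assumption-laden illustration, not fxp's option parser; KEYS and parse_warn_mult_decl are hypothetical names, and the key spellings are taken from the option description.

```python
from typing import FrozenSet, Optional

KEYS: FrozenSet[str] = frozenset({"att", "attlist", "ent", "not"})


def parse_warn_mult_decl(arg: Optional[str]) -> FrozenSet[str]:
    """Map the argument of --warn-mult-decl to the set of enabled keys.

    None stands for the option given without '=arg', which means 'all'.
    """
    if arg is None or arg == "all":
        return KEYS
    if arg == "none":
        return frozenset()
    keys = frozenset(arg.split(","))
    unknown = keys - KEYS
    if unknown:
        raise ValueError("unknown --warn-mult-decl key(s): " + ", ".join(sorted(unknown)))
    return keys
```

In this sketch each occurrence of the option yields a complete set, so a later --warn-mult-decl=ent would replace an earlier --warn-mult-decl=att instead of adding to it. That is one plausible reading of the requirement to give all keys in a single list, but it is an assumption about fxp's behavior.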
As an example, consider the document
exa-3.xml.
Running fxp -i exa-3.xml reports the following:
[exa-3.xml:10.2] Warning: The following name tokens occur more than once in the
enumerated attribute types of element 'a': 'yes', 'no'.
[exa-3.xml:10.2] Warning: The predefined entities 'lt', 'gt', 'apos', 'quot' and
'amp' should have been declared.
[exa-3.xml:13.4] Error: An empty-element tag must be used for element type 'a'
with EMPTY declared content.
[exa-3.xml:15.0] Error: Empty-element tag for element 'b' with non-EMPTY
declared content.
Now we add some options:
fxp -i --warn-mult-decl=att,attlist --warn-predefined=no exa-3.xml
The result is that the predefined entities are not checked, but multiple
declarations are detected now:
[exa-3.xml:9.12] Warning: Repeated attribute-list declaration for element type
'a'.
[exa-3.xml:9.28] Warning: Repeated definition of attribute 'x' for element type
'a'.
[exa-3.xml:10.2] Warning: The following name tokens occur more than once in the
enumerated attribute types of element 'a': 'yes', 'no'.
[exa-3.xml:13.4] Error: An empty-element tag must be used for element type 'a'
with EMPTY declared content.
[exa-3.xml:15.0] Error: Empty-element tag for element 'b' with non-EMPTY
declared content.
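The three kinds of warnings in this output can be derived from the attribute-list declarations alone. A hypothetical Python sketch (interop_findings and the finding tuples are invented for illustration); it assumes, as in XML, that only the first definition of an attribute is binding, so repeated definitions contribute no name tokens, and it ignores which checks are actually switched on by options.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

# An attribute definition: (name, enumerated tokens or None if not enumerated).
AttDef = Tuple[str, Optional[Sequence[str]]]
# One attribute-list declaration: (element type, attribute definitions).
AttList = Tuple[str, List[AttDef]]


def interop_findings(attlists: List[AttList]) -> List[tuple]:
    findings: List[tuple] = []
    seen_lists: Set[str] = set()
    defined: Dict[str, Set[str]] = {}
    token_counts: Dict[str, Counter] = {}
    for element, defs in attlists:
        if element in seen_lists:
            findings.append(("repeated-attlist", element))
        seen_lists.add(element)
        names = defined.setdefault(element, set())
        counts = token_counts.setdefault(element, Counter())
        for name, tokens in defs:
            if name in names:
                # Only the first definition is binding; later ones are reported and skipped.
                findings.append(("repeated-att", element, name))
                continue
            names.add(name)
            if tokens is not None:
                counts.update(tokens)
    # Name tokens used more than once across the enumerated types of one element type.
    for element, counts in token_counts.items():
        repeated = tuple(t for t, n in counts.items() if n > 1)
        if repeated:
            findings.append(("duplicate-tokens", element, repeated))
    return findings
```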
The following table lists some features from the XML recommendation which can
be enabled or disabled by command line options:
Controlling option     | Default | Feature
---------------------- | ------- | -------
--warn-att-elem        | no      | There should be attribute list declarations for declared element types only.
--check-predefined=no  | yes     | If the predefined entities are declared, this must be done according to section "4.6 Predefined Entities".
--check-lang-id        | no      | The values of the attribute xml:lang must be language identifiers as defined by IETF RFC 1766, "Tags for the Identification of Languages".
--check-iso639         | no      | An ISO 639 code in a value of the attribute xml:lang must be a two-letter language code as defined by ISO 639, "Codes for the representation of names of languages".
--warn-uri=no          | yes     | System identifiers are URIs and may only contain ASCII characters, according to IETF RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax".
--check-xml-version=no | yes     | Processors may signal an error if they receive documents labeled with versions they do not support.
--warn-xml-decl        | no      | XML documents should begin with an XML declaration which specifies the version of XML being used.
--warn-mult-decl=ent   | no      | An XML processor may issue a warning if entities are declared multiple times.
--warn-mult-decl=not   | no      | Ditto for notations. This is not mentioned in the XML recommendation but is sensible.
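The language-identifier check can be sketched with the LanguageID grammar of the first edition of XML 1.0, which follows RFC 1766: a two-letter ISO 639 code, an IANA code i-..., or a user code x-..., followed by any number of -subcodes. The Python sketch below is hypothetical (LANGUAGE_ID, SAMPLE_ISO639 and check_lang are invented names), and its ISO 639 list is a tiny illustrative subset, not the real code table. It rejects 'i-', and, with the ISO 639 check enabled, 'yy', the two values flagged in the exa-4.xml example below.

```python
import re
from typing import AbstractSet

# LanguageID grammar of XML 1.0 (first edition): Langcode ('-' Subcode)*,
# where Langcode is a two-letter ISO 639 code, 'i-...' or 'x-...'.
LANGUAGE_ID = re.compile(
    r"(?:(?P<iso>[a-zA-Z]{2})|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]+)(?:-[a-zA-Z]+)*"
)

# Illustrative subset only; a real check needs the full ISO 639 code list.
SAMPLE_ISO639: AbstractSet[str] = frozenset({"de", "en", "fr"})


def check_lang(value: str, check_iso639: bool = False,
               iso639: AbstractSet[str] = SAMPLE_ISO639) -> bool:
    """Is value a language identifier (optionally with a known ISO 639 code)?"""
    m = LANGUAGE_ID.fullmatch(value)
    if m is None:
        return False
    if check_iso639 and m.group("iso") is not None:
        return m.group("iso").lower() in iso639
    return True
```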
Note that all arguments to the --warn-mult-decl option must be
specified in one comma-separated list (e.g., --warn-mult-decl=ent,not);
see the description of --warn-mult-decl in the option list below.
For instance, consider the example document
exa-4.xml. Running fxp
without options produces the following:
[exa-4.xml:1.20] Error: XML version '1.1' is not supported.
[exa-4.xml:12.21] Error: General entity 'amp' must be declared as internal
entity with replacement text '&'.
We can suppress these messages while making the parser check for the other
features listed above by typing:
fxp --warn-att-elem --check-predefined=no --check-lang-id --check-iso639
--check-xml-version=no --warn-mult-decl=ent,not exa-4.xml
The result is:
[exa-4.xml:9.32] Error: 'i-' is not a language identifier.
[exa-4.xml:10.12] Warning: Attribute-list declaration for undeclared element
type 'c'.
[exa-4.xml:13.25] Warning: Repeated declaration for general entity 'amp'.
[exa-4.xml:16.45] Warning: Repeated declaration for notation 'text'.
[exa-4.xml:20.17] Error: 'yy' is not a language identifier.
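The rule behind the 'amp' error above is stated in section 4.6 of the XML recommendation: lt and amp must be declared as internal entities whose replacement text is a character reference to the escaped character (double escaping), while gt, apos and quot may use either the character itself or a character reference to it. A hypothetical Python sketch of that check (PREDEFINED, char_ref_value and predefined_ok are invented names), taking the replacement text after character references in the entity literal have been expanded:

```python
import re
from typing import Optional

PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9a-fA-F]+));")


def char_ref_value(text: str) -> Optional[str]:
    """Return the character if text is exactly one character reference, else None."""
    m = CHAR_REF.fullmatch(text)
    if m is None:
        return None
    code = int(m.group(1)) if m.group(1) is not None else int(m.group(2), 16)
    return chr(code) if code <= 0x10FFFF else None


def predefined_ok(name: str, replacement_text: str, internal: bool = True) -> bool:
    """Check one entity declaration against XML 1.0, section 4.6.

    replacement_text is taken after character references in the literal
    were expanded, e.g. '&#60;' for <!ENTITY lt "&#38;#60;">.
    """
    char = PREDEFINED.get(name)
    if char is None:
        return True   # not one of the five predefined entities
    if not internal:
        return False  # predefined entities must be internal entities
    if char_ref_value(replacement_text) == char:
        return True   # a character reference is fine for all five
    # Only gt, apos and quot may use the bare character itself.
    return name in ("gt", "apos", "quot") and replacement_text == char
```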
Each command-line argument can be one of the following:
- A file name specifying the input document.
Only one input document may be specified.
- A long option of the form --key[=arg]
- A short option of the form -k, where k consists
of a single character. If k consists of more than one character,
each character is assumed to be a short option itself (e.g., -vic
equals -v -i -c).
- A short option with argument of the form -k arg, where
k consists of a single character.
- A negative short option of the form -nk, where k consists
of a single character. If k consists of more than one character,
each character is assumed to be a negative short option itself (e.g., -nvic
equals -nv -ni -nc). If k is empty, then we have the
(non-negative) short option -n.
- The string --. This argument is ignored, except that all remaining
arguments are interpreted as file names, whether they start with -
or not.
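The argument rules above can be made concrete with a small classifier. This Python sketch is hypothetical (classify_args and the tuple kinds are invented names) and only splits arguments; it does not know which option names fxp actually accepts. It assumes that -e is the only short option taking an argument, as suggested by -e fname, and that a lone - counts as a file name, which the rules above leave open.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

Arg = Tuple[str, str, Optional[str]]  # (kind, name, value)


def classify_args(argv: Sequence[str],
                  takes_arg: FrozenSet[str] = frozenset({"e"})) -> List[Arg]:
    """Split fxp-style command-line arguments according to the rules above."""
    result: List[Arg] = []
    files = 0
    only_files = False
    i = 0
    while i < len(argv):
        arg = argv[i]
        i += 1
        if only_files or arg == "-" or not arg.startswith("-"):
            files += 1  # assumption: a lone '-' is treated as a file name
            if files > 1:
                raise ValueError("only one input document may be specified")
            result.append(("file", arg, None))
        elif arg == "--":
            only_files = True  # everything after '--' is a file name
        elif arg.startswith("--"):
            key, has_value, value = arg[2:].partition("=")
            result.append(("long", key, value if has_value else None))
        else:
            k = arg[1:]
            if k == "n":
                result.append(("short", "n", None))  # '-n' with empty k
            elif k.startswith("n"):
                result.extend(("negative", c, None) for c in k[1:])
            elif len(k) == 1 and k in takes_arg:
                if i >= len(argv):
                    raise ValueError(f"option -{k} requires an argument")
                result.append(("short", k, argv[i]))
                i += 1
            else:
                result.extend(("short", c, None) for c in k)
    return result
```

For example, classify_args(["-nvic", "doc.xml"]) yields three negative options v, i and c followed by the file doc.xml.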
The following options are available (see also the
catalog options):
-s
--silent
    Do not print any errors or warnings.
--few-errors[=(yes|no)]
    If yes, the parser tries to avoid printing errors caused by
    something that already caused an error earlier. E.g., an attribute specification
    for an attribute not declared for some element will cause an error only at the
    first instance of that element with the attribute.
    If no argument is given, yes is assumed.
    Default is yes.
-e fname
--error-output=fname
    Write all errors and warnings to the file named fname. If
    fname is -, standard error is used.
    Default is -.
--validate[=(yes|no)]
    Turns validation on or off. If no argument is given, yes is assumed.
    Default is yes.
-v
    Same as --validate=yes.
-nv
    Same as --validate=no.
--compatibility[=(yes|no)]
    If yes, the parser checks for features that were included
    in XML solely for compatibility with SGML.
    If no argument is given, yes is assumed.
    Default is yes.
--compat[=(yes|no)]
    Same as --compatibility.
-c
    Same as --compatibility=yes.
-nc
    Same as --compatibility=no.
--interoperability[=(yes|no)]
    If yes, the parser checks whether the (non-binding)
    recommendations XML makes for enhancing interoperability with existing
    SGML software are followed.
    If no argument is given, yes is assumed.
    Default is no.
--interop[=(yes|no)]
    Same as --interoperability.
-i
    Same as --interoperability=yes.
-ni
    Same as --interoperability=no.
--check-reserved[=(yes|no)]
    If yes, the parser checks whether element names,
    attribute names and PI targets are reserved for standardization
    and thus invalid.
    If no argument is given, yes is assumed.
    Default is no.
--check-predefined[=(yes|no)]
    If yes, the parser checks whether declarations for the
    predefined entities (amp, lt, gt,
    apos and quot) are in accordance with section 4.6
    of the XML recommendation.
    If no argument is given, yes is assumed.
    Default is yes.
--check-predef[=(yes|no)]
    Same as --check-predefined.
--check-lang-id[=(yes|no)]
    If yes, the parser checks whether values of the 'xml:lang'
    attribute are language identifiers as defined in RFC 1766.
    If no argument is given, yes is assumed.
    Default is no.
--check-iso639[=(yes|no)]
    If yes, the parser checks whether an ISO language code in a
    language identifier is in accordance with ISO 639. Has no effect unless
    --check-lang-id=yes was specified.
    If no argument is given, yes is assumed.
    Default is no.
--check-xml-version[=(yes|no)]
    If yes, the parser checks whether the version number in an XML or
    text declaration is supported.
    If no argument is given, yes is assumed.
    Default is yes.
--warn-uri[=(yes|no)]
    If yes, the parser prints a warning for each non-ASCII
    character occurring in a system literal (URI). If no argument is given,
    yes is assumed.
    Default is yes.
--warn-xml-decl[=(yes|no)]
    Turns on or off a warning if there is no XML declaration.
    If no argument is given, yes is assumed.
    Default is no.
--warn-att-elem[=(yes|no)]
    Turns on or off warnings about attribute list declarations for undeclared elements.
    If no argument is given, yes is assumed.
    Default is no.
--warn-predefined[=(yes|no)]
    Turns on or off a warning if at least one of the predefined entities
    (amp, lt, gt, apos
    and quot) is not declared.
    Has no effect in non-validating mode or if --interoperability=yes
    was not specified.
    If no argument is given, yes is assumed.
    Default is no.
--warn-mult-decl[=arg]
    Turns on or off a warning if something is declared multiple times.
    arg specifies which declarations this applies to, and must be one of
    the following:
    - A comma-separated list key1[,key2 ...],
      where each key is one of:
      - att for multiple definitions of an attribute for the same element;
      - attlist for multiple attribute list declarations for an element;
      - ent for multiple declarations of an entity;
      - not for multiple declarations of a notation.
    - all for all of the keys above;
    - none for none of the keys above.
    att and attlist have no effect unless
    --interoperability=yes was specified.
    If no argument is given, all is assumed.
    Default is none.
--warn[=(yes|no)]
    If yes or without argument, equivalent to
    --warn-xml-decl --warn-att-elem --warn-predefined --warn-mult-decl=all.
    If no, equivalent to
    --warn-xml-decl=no --warn-att-elem=no --warn-predefined=no
    --warn-mult-decl=none.
--include-external[=(yes|no)]
    Specifies whether external parsed entity references are included in content or not.
    Has no effect in validating mode (then all references are included).
    If no argument is given, yes is assumed.
    Default is no.
--include-ext[=(yes|no)]
    Same as --include-external.
--dfa-initial-size=n
    The transition table of a finite state machine grows dynamically during its
    creation, i.e., if the table's size is exceeded, it is recreated with double
    size. This option sets the initial size of the transition table to the
    smallest power of 2 greater than or equal to n.
    Default is 16.
--dfa-initial-width=n
    Same as --dfa-initial-size=2^n.
--dfa-max-size=n
    For ambiguous content models the parser generates a deterministic finite
    state machine (DFA), which may in the worst case have size exponential in the
    size of the content model. This option specifies a threshold for the
    number of admissible states of the DFA. If it is exceeded, the content
    model is approximated by the content model
    (e1|...|en)*,
    where e1, ..., en are
    all element types occurring in the original model.
    Default is 256.
--dfa-warn-size[=(yes|no)]
    Turns on or off a warning if the maximal number of states specified by
    --dfa-max-size is exceeded by the DFA construction for a
    content model.
    If no argument is given, yes is assumed.
    Default is yes.
-?
--help
    Print a summary of the command line options and exit.
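The sizing rules for the transition table (round the initial size up to a power of 2, double when it overflows) can be sketched as follows. This Python code is hypothetical (initial_table_size, initial_size_from_width and TransitionTable are invented names), and reading --dfa-initial-width=n as the size 2^n is an assumption based on the option's name and the power-of-2 rounding.

```python
def initial_table_size(n: int) -> int:
    """Smallest power of 2 that is >= n (the rounding --dfa-initial-size describes)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def initial_size_from_width(width: int) -> int:
    """Assumed meaning of --dfa-initial-width=w: an initial size of 2^w."""
    return 1 << width


class TransitionTable:
    """Hypothetical growable table that doubles whenever a state index does not fit."""

    def __init__(self, initial_size: int = 16) -> None:
        self.capacity = initial_table_size(initial_size)
        self.rows = [None] * self.capacity
        self.resizes = 0

    def set_row(self, state: int, row: dict) -> None:
        while state >= self.capacity:
            # Recreate the table with double size, keeping the existing rows.
            self.capacity *= 2
            self.rows = self.rows + [None] * (self.capacity - len(self.rows))
            self.resizes += 1
        self.rows[state] = row
```

Doubling keeps the total copying cost linear in the final table size, which is why a modest default initial size is usually sufficient.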
A. Neumann
(neumann@PSI.Uni-Trier.DE)