The Program fxp

----------------
o Description
o Options by Example
o Summary of Options

----------------

Description

fxp is a validating XML parser. It reads an XML document and reports all well-formedness errors, validity errors and other errors in that document. It can also warn about interoperability features and other issues mentioned in the XML recommendation.

The typical invocation of fxp is

fxp [option ...] [infile]
If infile is given, fxp reads its input document from that file, otherwise from standard input.
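
When fxp is driven from a script, the invocation above maps directly onto an argument vector. The following is a minimal Python sketch, not part of fxp itself: the helper names build_fxp_command and run_fxp are hypothetical, and it assumes an fxp executable is on the PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_fxp_command(options: Sequence[str] = (),
                      infile: Optional[str] = None) -> List[str]:
    """Return the argument vector for `fxp [option ...] [infile]`."""
    command = ["fxp", *options]
    if infile is not None:
        command.append(infile)
    return command


def run_fxp(options: Sequence[str] = (),
            infile: Optional[str] = None,
            document: Optional[str] = None) -> str:
    """Run fxp and return whatever it wrote to standard error.

    If infile is omitted, the document text is passed on standard input.
    Assumes an `fxp` executable is on the PATH.
    """
    result = subprocess.run(
        build_fxp_command(options, infile),
        input=document if infile is None else None,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.stderr
```

For example, build_fxp_command(["-nv"], "exa-1.xml") yields the list ["fxp", "-nv", "exa-1.xml"], and calling run_fxp with only a document string feeds that text to fxp on standard input.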

----------------

Options by Example

o Controlling Error Printing
o Validating and Non-Validating Mode
o Compatibility Modes
o Interoperability Modes
o Other Errors and Warnings
o Catalog Support

Controlling Error Printing

By default fxp reports all errors and warnings on standard error. This can be controlled with the --silent, --few-errors and --error-output options, which are described in the summary of options.

Validating and Non-Validating Mode

By default fxp is a validating parser, but it can be run in non-validating mode with the --validate=no or, for short, -nv option. As the example below shows, validity errors are then no longer reported, and references to external parsed entities are not included unless --include-external is given.

For instance, consider an example document exa-1.xml, referencing files exa-1.ext, ext.elem and ext.decl. Running
fxp exa-1.xml
reports the following errors:
[exa-1.xml:17.11] Error: Attribute 'num' has the value 'a' but was declared with
    a fixed default value of '0'.
[exa-1.xml:18.12] Error: ID name 'id1' already occurred as an attribute value.
[exa-1.xml:19.0] Error: Element type 'a' not allowed at this point in the 
    content of element 'a'.
[ext.elem:1.11] Error: Attribute 'num' has the value '1' but was declared with a
    fixed default value of '0'.
[ext.elem:1.15] Error: Element 'a' was ended by an end-tag for 'b'.
[exa-1.xml:20.7] Error: Attribute 'nmu' was not declared for element type 'b'.
[exa-1.xml:20.12] Error: No value was specified for required attribute 'num'.
[exa-1.xml:20.12] Error: The end-tag for element 'b' with declared EMPTY content
    must follow immediately after its start-tag.
whereas the non-validating mode
fxp -nv exa-1.xml
does not find any errors. Note that the error at [ext.elem:1.15] is a well-formedness error, but it is not reported because the external entity reference &ext; is not included. But if we make the parser include external parsed entities:
fxp -nv --include-external exa-1.xml
then the error is reported:
[ext.elem:1.15] Error: Element 'a' was ended by an end-tag for 'b'.
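
Scripts that post-process this output need to split it into individual messages. The sketch below is one way to do that in Python. It assumes, based only on the examples in this document, that each message starts with [file:line.column] followed by Error: or Warning:, and that indented lines continue the previous message. The names MESSAGE_HEADER and parse_messages are hypothetical.

```python
import re
from typing import Any, Dict, List

# Header of a message as it appears in the examples:
#   [file:line.column] Error: text
MESSAGE_HEADER = re.compile(
    r"^\[(?P<file>[^\]]+):(?P<line>\d+)\.(?P<column>\d+)\] "
    r"(?P<kind>Error|Warning): (?P<text>.*)$"
)


def parse_messages(output: str) -> List[Dict[str, Any]]:
    """Split fxp-style output into one record per message."""
    messages: List[Dict[str, Any]] = []
    for raw in output.splitlines():
        match = MESSAGE_HEADER.match(raw)
        if match:
            messages.append({
                "file": match["file"],
                "line": int(match["line"]),
                "column": int(match["column"]),
                "kind": match["kind"],
                "text": match["text"].strip(),
            })
        elif raw.strip() and messages:
            # A non-header line continues the text of the previous message.
            messages[-1]["text"] += " " + raw.strip()
    return messages
```

Grouping the records by their "file" key makes it easy to see which messages stem from an external entity such as ext.elem rather than from the document itself.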

Compatibility Modes

Some features in XML have only been included for compatibility with SGML. The checks shown in the example below cover ambiguous content models, '--' inside comments and unescaped '>' characters.

By default fxp checks for compatibility and reports an error wherever it is violated. This can be changed with the --compatibility=no or, for short, -nc option.

In non-compatibility mode, however, the parser must handle ambiguous content models. This requires generating a deterministic finite state machine (DFA), whose size may in the worst case be exponential in the size of the content model. To avoid excessive space usage, fxp imposes a limit on the size of the generated DFA. If this limit is exceeded, a warning is printed and the content model is approximated by (e1|...|en)*, where e1, ..., en are all element types occurring in the original model. The new content model is less restrictive but allows for a small DFA. The limit defaults to 256 and can be set with the --dfa-max-size option, and the warning can be suppressed with the --dfa-warn-size=no option.
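
To make the approximation concrete, the following Python sketch derives (e1|...|en)* from the text of a content model. It illustrates the idea only and is not fxp's implementation: the function names are hypothetical, the name pattern is a simplified ASCII-only assumption, and the input is assumed to be a parenthesized content model rather than EMPTY or ANY.

```python
import re
from typing import List

# Simplified XML name pattern (ASCII only); an assumption for this sketch.
# The optional leading '#' lets #PCDATA be recognized and skipped.
NAME = re.compile(r"#?[A-Za-z_:][A-Za-z0-9_.:\-]*")


def element_types(content_model: str) -> List[str]:
    """Element type names in order of first occurrence, skipping #PCDATA."""
    seen: List[str] = []
    for name in NAME.findall(content_model):
        if not name.startswith("#") and name not in seen:
            seen.append(name)
    return seen


def approximate(content_model: str) -> str:
    """Return the less restrictive model (e1|...|en)* for the given model."""
    return "(" + "|".join(element_types(content_model)) + ")*"
```

For the ambiguous model ((b,c)|(b,d)) this yields (b|c|d)*, which accepts every sequence the original accepts, plus many it does not.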

For instance, consider the document exa-2.xml. Note that the content model for element a is ambiguous, and its DFA needs at least 257 states. Running fxp in compatibility mode produces the following errors:

[exa-2.xml:4.65] Error: Content model is ambiguous: conflict between the 1st and
    the 2nd occurrence of element 'b'. Using an approximation instead.
[exa-2.xml:10.26] Error: '--' is not allowed in a comment.
[exa-2.xml:13.26] Error: Character '>' must be escaped for compatibility.
Note that the empty-element tag for a is not reported as an error, since a's content model was approximated. Running in non-compatibility mode:
fxp -nc exa-2.xml
suppresses these errors, but reports the following instead:
[exa-2.xml:4.65] Warning: The finite state machine for the content model of
    element type 'a' would have more than the maximal allowed number of 256
    states. Using an approximation instead.
This warning can be suppressed by invoking fxp like this:
fxp -nc --dfa-warn-size=no exa-2.xml
But the invalidity of the empty-element tag for a is still not detected. To detect it, we can raise the limit for the DFA's size:
fxp -nc --dfa-max-size=257 exa-2.xml
Now element a's content can be validated and the error is reported:
[exa-2.xml:12.0] Error: Empty-element tag for element type 'a' whose content 
    model requires non-empty content.

Interoperability Modes

XML also includes some interoperability recommendations to allow existing SGML software to process XML documents. These recommendations are non-binding and therefore not checked by default. The --interoperability or, for short, -i option makes fxp run in interoperability mode, which enables checking for these features. Some of these features can additionally be controlled by individual options. The following table lists the features supported by fxp, together with the option (if any) that enables or disables them, and whether they are enabled by default if --interoperability is supplied:

Controlling option         Default   Interoperability feature
(none)                     yes       The empty-element tag must be used, and may only be used,
                                     for elements declared EMPTY.
--warn-mult-decl=attlist   no        There should be at most one attribute-list declaration for
                                     each element type.
--warn-mult-decl=att       no        No attribute should be declared twice for the same element
                                     type.
(none)                     yes       The same name token should not occur more than once in the
                                     enumerated attribute types of a single element type.
--warn-predefined=no       yes       Valid documents should declare the entities amp, lt, gt,
                                     apos and quot.

Note that all arguments to the --warn-mult-decl option must be specified in a list; see the description of --warn-mult-decl in the summary of options.

As an example, consider the document exa-3.xml. Running fxp -i exa-3.xml reports the following:

[exa-3.xml:10.2] Warning: The following name tokens occur more than once in the 
    enumerated attribute types of element 'a': 'yes', 'no'.
[exa-3.xml:10.2] Warning: The predefined entities 'lt', 'gt', 'apos', 'quot' and
    'amp' should have been declared.
[exa-3.xml:13.4] Error: An empty-element tag must be used for element type 'a' 
    with EMPTY declared content.
[exa-3.xml:15.0] Error: Empty-element tag for element 'b' with non-EMPTY 
    declared content.
Now we add some options:
fxp -i --warn-mult-decl=att,attlist --warn-predefined=no exa-3.xml
The result is that the predefined entities are not checked, but multiple declarations are detected now:
[exa-3.xml:9.12] Warning: Repeated attribute-list declaration for element type 
    'a'.
[exa-3.xml:9.28] Warning: Repeated definition of attribute 'x' for element type 
    'a'.
[exa-3.xml:10.2] Warning: The following name tokens occur more than once in the 
    enumerated attribute types of element 'a': 'yes', 'no'.
[exa-3.xml:13.4] Error: An empty-element tag must be used for element type 'a' 
    with EMPTY declared content.
[exa-3.xml:15.0] Error: Empty-element tag for element 'b' with non-EMPTY 
    declared content.

Other Errors and Warnings

The following table lists some features from the XML recommendation which can be enabled or disabled by command line options:

Controlling option       Default   Feature
--warn-att-elem          no        There should be attribute-list declarations for declared
                                   element types only.
--check-predefined=no    yes       If the predefined entities are declared, this must be
                                   according to section "4.6 Predefined Entities".
--check-lang-id          no        The values of the attribute xml:lang must be language
                                   identifiers as defined by IETF RFC 1766, "Tags for the
                                   Identification of Languages".
--check-iso639           no        An ISO 639 code in a value of the attribute xml:lang must be
                                   a two-letter language code as defined by ISO 639, "Codes for
                                   the representation of names of languages".
--warn-uri=no            yes       System identifiers are URIs and may only contain ASCII
                                   characters, according to IETF RFC 2396, "Uniform Resource
                                   Identifiers (URI): Generic Syntax".
--check-xml-version=no   yes       Processors may signal an error if they receive documents
                                   labeled with versions they do not support.
--warn-xml-decl          no        XML documents should begin with an XML declaration which
                                   specifies the version of XML being used.
--warn-mult-decl=ent     no        An XML processor may issue a warning if entities are
                                   declared multiple times.
--warn-mult-decl=not     no        Ditto for notations. This is not mentioned in the XML
                                   recommendation but is sensible.

Note that all arguments to the --warn-mult-decl option must be specified in a list; see the description of --warn-mult-decl in the summary of options.

For instance, consider the example document exa-4.xml. Running fxp without options produces the following:

[exa-4.xml:1.20] Error: XML version '1.1' is not supported.
[exa-4.xml:12.21] Error: General entity 'amp' must be declared as internal 
    entity with replacement text '&'.
We can suppress these messages while making the parser check for the other features listed above by typing:
fxp --warn-att-elem --check-predefined=no --check-lang-id --check-iso639 \
    --check-xml-version=no --warn-mult-decl=ent,not exa-4.xml
The result is:
[exa-4.xml:9.32] Error: 'i-' is not a language identifier.
[exa-4.xml:10.12] Warning: Attribute-list declaration for undeclared element 
    type 'c'.
[exa-4.xml:13.25] Warning: Repeated declaration for general entity 'amp'.
[exa-4.xml:16.45] Warning: Repeated declaration for notation 'text'.
[exa-4.xml:20.17] Error: 'yy' is not a language identifier.
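
The two language-identifier messages above can be reproduced with a small syntax check. The Python sketch below assumes the LanguageID grammar from the first edition of XML 1.0 (a two-letter code, or i- or x- followed by letters, then optional subcodes); whether fxp applies exactly this grammar is an assumption. The set of ISO 639 codes is a deliberately tiny sample for illustration, and all names are hypothetical.

```python
import re

# LanguageID syntax as given in the first edition of XML 1.0 (an assumption
# about which grammar applies): a two-letter code, or "i-"/"x-" followed by
# letters, then any number of "-subcode" parts.
LANGUAGE_ID = re.compile(
    r"^(?P<langcode>[A-Za-z]{2}|[iIxX]-[A-Za-z]+)(?:-[A-Za-z]+)*$"
)

# Deliberately tiny sample of ISO 639 two-letter codes, for illustration only.
SAMPLE_ISO639_CODES = {"de", "en", "fr"}


def is_language_id(value: str) -> bool:
    """Syntax check only, in the spirit of --check-lang-id."""
    return LANGUAGE_ID.match(value) is not None


def has_known_iso639_code(value: str) -> bool:
    """Additionally look up a two-letter code, in the spirit of --check-iso639."""
    match = LANGUAGE_ID.match(value)
    if match is None:
        return False
    langcode = match["langcode"].lower()
    if langcode.startswith(("i-", "x-")):
        return True  # IANA and user codes are not ISO 639 codes
    return langcode in SAMPLE_ISO639_CODES
```

Under these assumptions 'i-' fails the syntax check because no letters follow the hyphen, while 'yy' is syntactically fine and is rejected only once the two-letter code is looked up.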

----------------

Summary of Command Line Options

Each option can be one of:

The following options are available (see also the catalog options):
-s
--silent
Do not print any errors or warnings.
--few-errors[=(yes|no)]
If yes, the parser tries to avoid printing errors caused by something that already caused an error earlier. E.g., an attribute specification for an attribute not declared for some element will cause an error only at the first instance of that element with the attribute. If no argument is given, yes is assumed. Default is yes.
-e fname
--error-output=fname
Write all errors and warnings to the file named fname. If fname is -, standard error is used. Default is -.

--validate[=(yes|no)]
Turns on or off validation. If no argument is given, yes is assumed. Default is yes.
-v
Same as --validate=yes.
-nv
Same as --validate=no.

--compatibility[=(yes|no)]
If yes, the parser checks for features that were included in XML solely for compatibility with SGML. If no argument is given, yes is assumed. Default is yes.
--compat[=(yes|no)]
Same as --compatibility.
-c
Same as --compatibility=yes.
-nc
Same as --compatibility=no.

--interoperability[=(yes|no)]
If yes, the parser checks whether the (non-binding) recommendations XML makes for enhancing interoperability with existing SGML software are followed. If no argument is given, yes is assumed. Default is no.
--interop[=(yes|no)]
Same as --interoperability.
-i
Same as --interoperability=yes.
-ni
Same as --interoperability=no.

--check-reserved[=(yes|no)]
If yes, the parser checks whether element names, attribute names and PI targets are reserved for standardization and thus invalid. If no argument is given, yes is assumed. Default is no.

--check-predefined[=(yes|no)]
If yes, the parser checks whether declarations for the predefined entities (amp, lt, gt, apos and quot) are in accordance with section 4.6 of the XML recommendation. If no argument is given, yes is assumed. Default is yes.
--check-predef[=(yes|no)]
Same as --check-predefined.

--check-lang-id[=(yes|no)]
If yes, the parser checks whether values of the 'xml:lang' attribute are language identifiers as defined in RFC 1766. If no argument is given, yes is assumed. Default is no.
--check-iso639[=(yes|no)]
If yes, the parser checks whether an ISO language code in a language identifier is in accordance with ISO 639. Has no effect unless --check-lang-id=yes was specified. If no argument is given, yes is assumed. Default is no.

--check-xml-version[=(yes|no)]
If yes, the parser checks whether the version number in an XML or text declaration is supported. If no argument is given, yes is assumed. Default is yes.

--warn-uri[=(yes|no)]
If yes, the parser prints a warning for each non-ASCII character occurring in a system literal (URI). If no argument is given, yes is assumed. Default is yes.

--warn-xml-decl[=(yes|no)]
Turns on or off a warning if there is no XML declaration. If no argument is given, yes is assumed. Default is no.
--warn-att-elem[=(yes|no)]
Turns on or off warnings about attribute list declarations for undeclared elements. If no argument is given, yes is assumed. Default is no.
--warn-predefined[=(yes|no)]
Turns on or off a warning if at least one of the predefined entities (amp, lt, gt, apos and quot) is not declared. Has no effect in non-validating mode or if --interoperability=yes was not specified. If no argument is given, yes is assumed. Default is no.
--warn-mult-decl[=arg]
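Turns on or off a warning if something is declared multiple times. arg specifies which declarations this applies to and is given as a comma-separated list, as in --warn-mult-decl=att,attlist. The values used in this document are att (attributes), attlist (attribute-list declarations), ent (entities), not (notations), all and none. att and attlist have no effect unless --interoperability=yes was specified. If no argument is given, all is assumed. Default is none.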
--warn[=(yes|no)]
If yes or without argument, equivalent to --warn-xml-decl --warn-att-elem --warn-predefined --warn-mult-decl=all.
If no, equivalent to --warn-xml-decl=no --warn-att-elem=no --warn-predefined=no --warn-mult-decl=none.

--include-external[=(yes|no)]
Specifies whether external parsed entity references are included in content or not. Has no effect in validating mode (then all references are included). If no argument is given, yes is assumed. Default is no.
--include-ext[=(yes|no)]
Same as --include-external.

--dfa-initial-size=n
The transition table of a finite state machine grows dynamically during its creation: whenever the table's size is exceeded, it is recreated with double the size. This option sets the initial size of the transition table to the smallest power of 2 greater than or equal to n. Default is 16.
--dfa-initial-width=n
Same as --dfa-initial-size=2^n.
--dfa-max-size=n
For ambiguous content models the parser generates a deterministic finite state machine (DFA), which may in the worst case have size exponential in the size of the content model. This option specifies a threshold for the number of admissible states of the DFA. If it is exceeded, the content model is approximated by the content model (e1|...|en)*, where e1, ..., en are all element types occurring in the original model. Default is 256.
--dfa-warn-size[=(yes|no)]
Turns on or off a warning if the maximal number of states specified by --dfa-max-size is exceeded by the DFA construction for a content model. If no argument is given, yes is assumed. Default is yes.

-?
--help
Print a summary of the command line options and exit.
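
The sizing rule described for --dfa-initial-size and --dfa-initial-width above can be written out in a few lines. This Python sketch only illustrates the arithmetic (rounding up to a power of 2, then doubling on overflow); it is not taken from fxp's source, and the function names are hypothetical.

```python
def initial_table_size(n: int) -> int:
    """Smallest power of 2 that is greater than or equal to n (for n >= 1)."""
    size = 1
    while size < n:
        size *= 2
    return size


def doublings_needed(initial: int, required: int) -> int:
    """How often a table starting at initial_table_size(initial) must be
    doubled before it can hold `required` entries."""
    size = initial_table_size(initial)
    count = 0
    while size < required:
        size *= 2
        count += 1
    return count
```

With the default of 16, a table that must eventually hold 257 entries is doubled five times (32, 64, 128, 256, 512). Passing a larger initial size avoids those re-creations at the cost of more memory up front.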

----------------

A. Neumann (neumann@PSI.Uni-Trier.DE)