fxp: The Program fxesis

----------------
o Description
o Output Format
o Output Example
o Options by Example
o Summary of Options

----------------

Description

fxesis is a validating XML processor. It reads an XML document and produces a textual description of its Element Structure Information Set (ESIS). The ESIS contains little information about the DTD and none about the document's entity structure, but it describes the document's logical (element) structure completely.

The typical invocation of fxesis is

fxesis [option ...] [infile]
If infile is given, fxesis reads its input document from that file, otherwise from standard input. By default, it prints its output to the standard output.

----------------

The Output Format

The fxesis output is a series of plain-text lines. The meaning of each line is determined by its first character. Some lines, e.g. attribute specifications, only supply arguments for a following line. All lines contain only LATIN1 characters or, if the --ascii option was given, only ASCII characters. To print other characters, fxesis uses the following escape sequences:
\\ the character '\';
\n a newline character;
\t a tab character;
\U+hex;   the Unicode character whose hexadecimal code is hex.
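A program that reads fxesis output has to undo these escapes. The Python function below is a minimal sketch, assuming that hex is an unpadded hexadecimal number in either case and that a backslash only ever starts one of the four sequences above; the name unescape_esis is hypothetical and not part of fxp.

```python
import re

# Matches the four escape forms: \\  \n  \t  and \U+hex;
_ESCAPE = re.compile(r"\\(\\|n|t|U\+([0-9A-Fa-f]+);)")


def unescape_esis(text: str) -> str:
    """Decode fxesis escape sequences in one output line (hypothetical sketch)."""

    def replace(m: re.Match) -> str:
        token = m.group(1)
        if token == "\\":
            return "\\"          # \\ stands for a single backslash
        if token == "n":
            return "\n"          # \n stands for a newline
        if token == "t":
            return "\t"          # \t stands for a tab
        # \U+hex; stands for the Unicode character with code hex
        return chr(int(m.group(2), 16))

    # Scanning left to right ensures that "\\n" decodes to a backslash
    # followed by "n", not to a backslash followed by a newline.
    return _ESCAPE.sub(replace, text)
```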
The following output lines can appear:
-data A sequence of data characters, including newlines (printed as \n). The data need not have been contiguous in the input document; it may have been a series of data characters, CDATA sections and character references, interspersed with comments.
(elem The start of an element of type elem. Preceded by an A line for each of its attributes.
)elem The end of an element of type elem.
Aatt value A specification of attribute att for a following ( (element-start) line. value is one of:
IMPLIED The attribute value was implied. This occurs only in validating mode.
CDATA data The attribute was declared CDATA; its value is data.
NOTATION name A notation attribute with value name; that notation was defined in a previous N (notation definition) line.
ENTITY name ... An attribute with declared type ENTITY or ENTITIES. Each name is the name of an unparsed general entity that was defined in a preceding E (entity definition) line.
TOKEN token ... An attribute with declared type NMTOKEN, NMTOKENS, ID, IDREF, IDREFS, or enumeration. Each token is a name token complying with the attribute type.
?target text A processing instruction with target target and text text.
Eent NDATA nt Defines an unparsed external entity named ent whose notation nt has been defined by a preceding N (notation definition) line. This line is immediately preceded by the lines for the external identifier declared for ent: an optional p (public identifier) line, an s (system identifier) line and, if a filename could be generated, an f (filename) line. An entity is defined by an E line only once per document.
Nnt Defines the notation named nt. This line is immediately preceded by an optional p (public identifier) line and an optional s (system identifier) line for the external identifier declared for nt. A notation is defined by an N line only once per document.
ppubid pubid is the public identifier belonging to the external identifier of a following N (notation definition) or E (entity definition) line.
ssysid sysid is the system identifier belonging to the external identifier of a following N (notation definition) or E (entity definition) line.
f<OSFILE>filename filename is the system file name generated for the external identifier of a following E (entity definition) line.
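Because every line is classified by its first character, and A, p, s and f lines only supply arguments to a later line, the output can be read in a single pass that buffers the pending arguments. The Python generator below is a hypothetical sketch (parse_esis and its event tuples are illustrative names, not part of fxp). It assumes that the attribute name, the declared type and the value in an A line are separated by single spaces, and it leaves escape sequences undecoded.

```python
from typing import Iterable, Iterator


def parse_esis(lines: Iterable[str]) -> Iterator[tuple]:
    """Turn fxesis output lines into events (hypothetical sketch).

    Events: ("data", text), ("start", name, attrs), ("end", name),
    ("pi", target, text), ("notation", name, extid),
    ("entity", name, notation, extid).
    """
    attrs = {}   # A lines waiting for the next ( line
    extid = {}   # p, s and f lines waiting for the next N or E line
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "-":
            yield ("data", rest)
        elif kind == "A":
            # "att TYPE value": e.g. "city CDATA Köln" or "kind IMPLIED"
            name, _, value = rest.partition(" ")
            atype, _, aval = value.partition(" ")
            attrs[name] = (atype, aval)
        elif kind == "(":
            yield ("start", rest, attrs)
            attrs = {}
        elif kind == ")":
            yield ("end", rest)
        elif kind == "?":
            target, _, text = rest.partition(" ")
            yield ("pi", target, text)
        elif kind in "psf":
            extid[kind] = rest
        elif kind == "N":
            yield ("notation", rest, extid)
            extid = {}
        elif kind == "E":
            # "ent NDATA nt"
            name, _, notation = rest.partition(" NDATA ")
            yield ("entity", name, notation, extid)
            extid = {}
        else:
            raise ValueError(f"unknown ESIS line: {line!r}")
```

Since the default output is Latin1, a file written with -o would be opened in Python with encoding="latin-1" before being passed to such a reader.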

----------------

An Output Example

Consider the example document exa-5.xml. Called without options, fxesis produces the output in exa-5.esis-8 for this document. Note that all adjacent data segments of the first a element are merged into one, and that there is an A line for each implied attribute. Furthermore, notation man is not redefined at its second occurrence.

In contrast, fxesis -7 -nv exa-5.xml produces the output in exa-5.esis-7. Note the differences: on the one hand, no A lines are printed for implied attributes, because validation was turned off. On the other hand, the characters ö, ü and ß are represented by escape sequences, because they are not ASCII characters.

----------------

Options by Example

fxesis understands all options documented for fxp; the additional options control how output is generated.

By default, fxesis writes its output to the standard output. It can be redirected to a file named outfile via the option --output=outfile or, for short, -o outfile.

Output Encoding

By default, fxesis produces its output in the LATIN1 character set, i.e., using 8-bit characters. It can be restricted to using only 7-bit characters with the --ascii or, for short, -7 option. For instance, consider the element
<addr city="Köln">Müllerstraße 13</addr>
Called with fxesis -8 ..., the output for this element is
Acity CDATA Köln
(addr
-Müllerstraße 13
)addr
whereas fxesis -7 ... outputs the following:
Acity CDATA K\U+f6;ln
(addr
-M\U+fc;llerstra\U+df;e 13
)addr
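The -7 and -8 outputs differ only in which characters are escaped. The Python sketch below reproduces both outputs above for an element with CDATA attributes and a single data segment. escape_esis and element_lines are hypothetical names, and the sketch assumes that, apart from backslash, newline and tab, fxesis escapes exactly the characters outside the selected range (above 0x7F for -7, above 0xFF for -8), writing the code in lowercase hex without leading zeros as in the example.

```python
def escape_esis(text: str, ascii_only: bool = False) -> str:
    """Apply fxesis-style escapes to text (hypothetical sketch)."""
    limit = 0x7F if ascii_only else 0xFF   # -7 versus -8
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif ord(ch) > limit:
            out.append(f"\\U+{ord(ch):x};")   # e.g. ö -> \U+f6;
        else:
            out.append(ch)
    return "".join(out)


def element_lines(name: str, cdata_attrs: dict, data: str, ascii_only: bool = False) -> list:
    """ESIS lines for one element with CDATA attributes and one data segment."""
    lines = [f"A{att} CDATA {escape_esis(val, ascii_only)}" for att, val in cdata_attrs.items()]
    lines.append(f"({name}")
    lines.append("-" + escape_esis(data, ascii_only))
    lines.append(f"){name}")
    return lines


for line in element_lines("addr", {"city": "Köln"}, "Müllerstraße 13", ascii_only=True):
    print(line)
```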

----------------

Summary of Command Line Options

fxesis understands all options documented for fxp; additionally, the following options are available:
-o fname
--output=fname
Write all output, except for errors and warnings, to the file named fname. If fname is -, the standard output is used. Defaults to -.

-7
--ascii
Produce the output in ASCII encoding, i.e., using 7-bit characters only.
-8
--latin1
Produce the output in Latin1 encoding, i.e., also using 8-bit characters. This is the default.

----------------

A. Neumann (neumann@PSI.Uni-Trier.DE)