The Program fxcopy

----------------
o Description
o Options by Example
o Summary of Command Line Options

----------------

Description

fxcopy is a validating XML processor. It reads an XML document and produces a copy of it. This copy can be in a different encoding, and can be normalized in several ways by, e.g., expanding entity references.

The typical invocation of fxcopy is

fxcopy [option ...] [infile]
If infile is given, fxcopy reads its input document from that file, otherwise fxcopy reads from standard input.

----------------

Options by Example

In addition to the options of fxp, fxcopy accepts arguments in the following two areas:

o Controlling Output
o Expansion of References in the Document Instance and in the Declaration Subset

Controlling Output

By default, fxcopy writes to standard output, in the same encoding as the input document. This can be changed with the -o (or --output) and --output-encoding options. For instance,
fxcopy -o output.utf8 --output-encoding=UTF-8 input.ascii
recodes the file input.ascii to UTF-8 and writes it to the file output.utf8.
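
When fxcopy is called from a script, the output options described here can be assembled programmatically. The following is a minimal sketch in Python; the helper name build_fxcopy_argv is hypothetical and not part of fxp. It uses only the -o and --output-encoding options, and it only builds the argument list without running fxcopy.

```python
def build_fxcopy_argv(infile=None, output=None, output_encoding=None):
    """Build an fxcopy argument list (hypothetical helper, not part of fxp)."""
    argv = ["fxcopy"]
    if output is not None:
        # -o fname: write the copy to fname instead of standard output
        argv += ["-o", output]
    if output_encoding is not None:
        # --output-encoding=enc: recode the copy to the encoding enc
        argv.append("--output-encoding=" + output_encoding)
    if infile is not None:
        # without infile, fxcopy reads from standard input
        argv.append(infile)
    return argv


# Prints: fxcopy -o output.utf8 --output-encoding=UTF-8 input.ascii
print(" ".join(build_fxcopy_argv("input.ascii", "output.utf8", "UTF-8")))
```

The resulting list can be handed to subprocess.run on a system where fxcopy is installed.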

Expansion of References

By default, fxcopy produces a document that is for the most part identical to the input, i.e., references are reproduced as they are instead of being expanded. This behavior can be changed by options.

Reference Expansion in the Document Instance

Expansion of references in content is controlled by the --expand-refs-content option. It takes as argument a comma-separated list of keywords, each specifying a class of references to expand, where
char means that a character reference shall be replaced by the described character unless that character cannot be represented directly in the output encoding.
int means that references to internal general entities shall be substituted with their replacement text, unless the entity is undeclared (which may only happen in non-validating mode).
ext means that references to external parsed entities shall be substituted by the content of the file they point to, unless the entity is undeclared (which may only happen in non-validating mode).
Alternatively, giving --expand-refs-content without an argument, or with the argument yes, selects all of the above.
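
The rule for the char keyword can be illustrated with a short Python sketch. It is not fxcopy's implementation: the function name expand_char_refs is hypothetical, and leaving references to < and & untouched is an assumption made here so that the result stays well-formed.

```python
import re

# Matches decimal (&#64;) and hexadecimal (&#x40;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(text, output_encoding):
    """Replace character references whose character fits the output encoding."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code)
            # Raises UnicodeEncodeError if the encoding cannot represent char.
            char.encode(output_encoding)
        except (ValueError, OverflowError):
            # Not representable (or not a valid code point): keep the reference.
            return match.group(0)
        if char in "<&":
            # Assumption: markup characters stay escaped in content.
            return match.group(0)
        return char
    return CHAR_REF.sub(replace, text)


print(expand_char_refs("here is a character reference: &#64;", "ascii"))
print(expand_char_refs("caf&#xE9;", "ascii"))
```

With an ASCII output encoding, the first call replaces &#64; by @, while the second leaves &#xE9; in place because the character cannot be written directly in ASCII.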

The second place within the document instance where references can occur is attribute values. Furthermore, attribute values are normalized according to their attribute type after replacement of references. By default, fxcopy reproduces attribute values literally. Given the --expand-att-vals option, it outputs the normalized value instead.
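
The type-dependent part of this normalization follows the attribute-value normalization rule of XML 1.0: white space characters become spaces, and for every type other than CDATA, leading and trailing spaces are dropped and runs of spaces collapse into one. A minimal Python sketch, assuming that references have already been replaced and treating all white space in the value as literal (a simplification); the function name normalize_att_value is hypothetical:

```python
import re


def normalize_att_value(value, att_type):
    """Normalize white space in an attribute value according to its type."""
    # Tabs and line breaks in the value count as single spaces.
    value = re.sub(r"[\t\n\r]", " ", value)
    if att_type != "CDATA":
        # Tokenized and enumerated types: collapse runs of spaces,
        # then drop leading and trailing spaces.
        value = re.sub(r" +", " ", value).strip(" ")
    return value


print(repr(normalize_att_value(" a   b ", "NMTOKENS")))  # 'a b'
print(repr(normalize_att_value(" a   b ", "CDATA")))     # ' a   b '
```

A CDATA value keeps its spacing, whereas an NMTOKENS value is reduced to its tokens separated by single spaces.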

As an example of expansion in the document instance, assume the following declarations in the DTD:

<!ENTITY q "quote sign">
<!ENTITY int "internal entity">
<!ENTITY ext SYSTEM "ext.ent">
<!ATTLIST a x NMTOKENS #IMPLIED
            y CDATA    #IMPLIED>
and the content of the file ext.ent is the string "external entity". Let us consider the following document fragment:
<a x=" a   b " y="two &q;s: &#x27; and &#x22;">    
here is a character reference: &#64;
here is an &int;           
here is an &ext;
</a>
Running fxcopy --expand-refs-content=char,int produces this:
<a x=" a   b " y="two &q;s: &#x27; and &#x22;">    
here is a character reference: @
here is an internal entity           
here is an &ext;
</a>
whereas fxcopy --expand-refs-content=ext --expand-att-vals yields
<a x="a b" y="two quote signs: ' and &#x22;">
here is a character reference: &#64;
here is an &int;           
here is an external entity
</a>
Note that the &#x22; in the attribute value is not replaced by the " sign because then it would be recognized as the end of the attribute value literal.

Reference Expansion in the Declaration Subset

Normally fxcopy reproduces only the internal subset of the document type, preserving all references to parameter entities. This behavior can be changed with the --expand-refs-subset option. Its argument indicates which references shall be substituted by their replacement text:
int Expand all references to internal parameter entities.
ext Replace all references to external parameter entities with the content of the file they point to. Note that this option implies --expand-ent-vals in order to ensure well-formedness.
yes Expand references to both internal and external parameter entities. --expand-refs-subset without an argument is equivalent to --expand-refs-subset=yes.
no Expand no parameter entity references at all.
This applies to references occurring where a declaration could occur. It does not affect references within declarations, which are expanded regardless of options.

The external subset can be viewed as a special reference. The --expand-ext-subset option makes fxcopy drop the external identifier from the document type declaration, and copy the content of the file it denotes to the end of the internal subset. Like --expand-refs-subset=ext, this option implies --expand-ent-vals.

Usually, entity values in entity declarations are reproduced literally, i.e., without replacement of references. However, if a declaration is copied from an external entity to the internal subset, parameter entity references become invalid in the entity value. Therefore, given the --expand-ent-vals option, fxcopy substitutes the derived entity replacement text for the entity value. This replacement text does not contain parameter entity references (a %-sign can appear in it only if it was escaped with a character reference, in which case the parser did not recognize a reference in the first place); it uses character references only for characters that cannot be represented directly.
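
How a replacement text is derived from a literal entity value can be sketched in Python, following the XML 1.0 rule that parameter entity references and character references are included while general entity references are left alone. This is an illustration, not fxcopy's code: the function name replacement_text is hypothetical, entity names are restricted to a simplified ASCII pattern, and param_entities is assumed to map each declared parameter entity to its replacement text. Writing the result back as an entity value, including any escaping this requires, is not covered.

```python
import re

# A parameter entity reference (%name;) or a character reference.
# The name pattern is a simplification (ASCII only).
REF = re.compile(
    r"%([A-Za-z_:][A-Za-z0-9._:-]*);|&#(?:x([0-9A-Fa-f]+)|([0-9]+));"
)


def replacement_text(entity_value, param_entities):
    """Derive the replacement text of a literal entity value."""
    def replace(match):
        name, hex_digits, dec_digits = match.groups()
        if name is not None:
            # Parameter entity reference: include its replacement text.
            return param_entities[name]
        # Character reference: include the character itself.
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    # Included text is not scanned again, so an escaped %-sign
    # never turns into a parameter entity reference.
    return REF.sub(replace, entity_value)


print(replacement_text("xml version %vnum;", {"vnum": "1.0"}))
print(replacement_text("&#37;vnum;", {"vnum": "1.0"}))
```

The first call prints "xml version 1.0". The second prints "%vnum;", since the %-sign came from a character reference and was therefore never recognized as the start of a reference.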

For instance, consider the document exa-6.xml:

<?xml version="1.0"?>
<!DOCTYPE exa SYSTEM "exa-6.ext" [
<!ENTITY % int "<!ELEMENT exa ANY>">
<!ENTITY % ext SYSTEM "ext-6.decl">
%int;
%ext;
]>
<exa/>
where the content of the file exa-6.ext is
<!ENTITY % vnum "1.0">
<!ENTITY % version "xml version %vnum;">
and ext-6.decl contains
<!NOTATION text SYSTEM "/bin/cat">
Running fxcopy --expand-refs-subset=int exa-6.xml produces:
<?xml version="1.0"?>
<!DOCTYPE exa SYSTEM "exa-6.ext" [
<!ENTITY % int "<!ELEMENT exa ANY>">
<!ENTITY % ext SYSTEM "ext-6.decl">
<!ELEMENT exa ANY>
%ext;
]>
<exa/>
Note that only the internal reference %int; was expanded. On the other hand, if we run fxcopy --expand-refs-subset=ext exa-6.xml we get:
<?xml version="1.0"?>
<!DOCTYPE exa SYSTEM "exa-6.ext" [
<!ENTITY % int "<!ELEMENT exa ANY>">
<!ENTITY % ext SYSTEM "ext-6.decl">
%int;
<!NOTATION text SYSTEM "/bin/cat">
]>
<exa/>
Finally, using fxcopy --expand-ext-subset exa-6.xml yields
<?xml version="1.0"?>
<!DOCTYPE exa [
<!ENTITY % int "<!ELEMENT exa ANY>">
<!ENTITY % ext SYSTEM "ext-6.decl">
%int;
%ext;
<!ENTITY % vnum "1.0">
<!ENTITY % version "xml version 1.0">
]>
<exa/>
Note that the entity value in the last entity declaration has been expanded, because the --expand-ent-vals option was implied by --expand-ext-subset. If we override this with --expand-ent-vals=no, we get
<!ENTITY % version "xml version %vnum;">
but this is not well-formed:
> fxcopy --expand-ext-subset --expand-ent-vals=no exa-6.xml | fxp
[<stdin>:8.33] Error: a parameter entity reference is not allowed in a 
    declaration in the internal subset.

----------------

Summary of Command Line Options

fxcopy understands all options documented for fxp; additionally, the following options are available:
-o fname
--output=fname
Write all output, except for errors and warnings, to the file named fname. If fname is -, the standard output is used. Default is -.
--output-encoding=enc
Use encoding enc for generating the output. enc must be a supported encoding. Default is the encoding of the input document.

--expand-refs-content[=key]
Controls whether entity references in content are expanded, i.e., included or preserved as references in the output. key is either a comma-separated list of char, int and ext, or it is yes for all or no for none of these. If no key is given, yes is assumed. Default is no.
--expand-refs-subset[=key]
Controls whether parameter entity references in the internal or external subset are expanded, i.e., included or preserved as references in the output. key is one of int, ext, yes or no and specifies which class of references is to be expanded. If key is omitted, yes is assumed. Default is no.
--expand-ext-subset[=(yes|no)]
Controls whether the external subset shall be expanded, i.e., appended to the internal subset of the output while dropping its external identifier from the document type declaration. yes implies --expand-ent-vals. If no argument is given, yes is assumed. Default is no.
--expand-att-vals[=(yes|no)]
Controls whether attribute values are reproduced literally or in expanded form, i.e., with all references expanded and white space normalized according to the attribute type. If no argument is given, yes is assumed. Default is no.
--expand-ent-vals[=(yes|no)]
Controls whether entity values are reproduced literally or in expanded form, i.e., with all references expanded. If no argument is given, yes is assumed. Default is no.

--expand=key
Depending on key, equivalent to:
yes: --expand-refs-content --expand-refs-subset --expand-ext-subset --expand-att-vals --expand-ent-vals
no: --expand-refs-content=no --expand-refs-subset=no --expand-ext-subset=no --expand-att-vals=no --expand-ent-vals=no
int: --expand-refs-content=char,int --expand-refs-subset=int --expand-ext-subset=no --expand-att-vals --expand-ent-vals=no
ext: --expand-refs-content=ext --expand-refs-subset=yes --expand-ext-subset --expand-att-vals=no --expand-ent-vals
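
The --expand shorthand listed above can be written down as a lookup table, which is convenient when command lines are generated or checked by a script. A minimal Python sketch; the names EXPAND_SHORTHAND and expand_shorthand are hypothetical, and the table merely restates the four equivalences given here:

```python
# The individual options that each --expand=key stands for.
EXPAND_SHORTHAND = {
    "yes": ["--expand-refs-content", "--expand-refs-subset",
            "--expand-ext-subset", "--expand-att-vals", "--expand-ent-vals"],
    "no": ["--expand-refs-content=no", "--expand-refs-subset=no",
           "--expand-ext-subset=no", "--expand-att-vals=no",
           "--expand-ent-vals=no"],
    "int": ["--expand-refs-content=char,int", "--expand-refs-subset=int",
            "--expand-ext-subset=no", "--expand-att-vals",
            "--expand-ent-vals=no"],
    "ext": ["--expand-refs-content=ext", "--expand-refs-subset=yes",
            "--expand-ext-subset", "--expand-att-vals=no",
            "--expand-ent-vals"],
}


def expand_shorthand(args):
    """Replace each --expand=key argument by the options it stands for."""
    result = []
    for arg in args:
        if arg.startswith("--expand="):
            # Raises KeyError for a key other than yes, no, int or ext.
            result.extend(EXPAND_SHORTHAND[arg[len("--expand="):]])
        else:
            result.append(arg)
    return result


print(" ".join(expand_shorthand(["--expand=int", "exa-6.xml"])))
```

Options such as --expand-att-vals pass through unchanged, because only arguments beginning with "--expand=" are rewritten.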

----------------

A. Neumann (neumann@PSI.Uni-Trier.DE)