Description | Options by Example | Summary of Options
The typical invocation of fxcopy is

    fxcopy [option ...] [infile]

If infile is given, fxcopy reads its input document from that file; otherwise it reads from standard input.
Controlling Output | Expansion of References in the Document Instance and in the Declaration Subset
The command

    fxcopy -o output.utf8 --output-encoding=UTF-8 input.ascii

recodes the file input.ascii to UTF-8 and writes it to the file output.utf8.
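The invocation pattern can also be driven from a script. The following is only a minimal sketch: the helper names fxcopy_argv and run_fxcopy are hypothetical, and the sketch assumes nothing about fxcopy beyond the synopsis and the two options shown above.

```python
import subprocess


def fxcopy_argv(options=(), infile=None):
    """Build the argument vector for `fxcopy [option ...] [infile]`."""
    argv = ["fxcopy", *options]
    if infile is not None:
        argv.append(infile)
    return argv


def run_fxcopy(options=(), infile=None, stdin_data=None):
    """Run fxcopy (assumed to be on PATH).

    Without `infile`, the document is passed on standard input as bytes.
    """
    return subprocess.run(
        fxcopy_argv(options, infile),
        input=stdin_data,
        capture_output=True,
        check=True,
    )
```

For the recoding example, the argument vector would be fxcopy_argv(["-o", "output.utf8", "--output-encoding=UTF-8"], "input.ascii").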
Alternatively, we can use --expand-refs-content to specify all of the above.
char means that a character reference shall be replaced by the described character unless that character cannot be represented directly in the output encoding.

int means that references to internal general entities shall be substituted with their replacement text, unless the entity is undeclared (which may only happen in non-validating mode).

ext means that references to external parsed entities shall be substituted by the content of the file they point to, unless the entity is undeclared (which may only happen in non-validating mode).
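The rule for char can be sketched in a few lines of Python. This is an illustration of the rule as stated, not fxcopy's implementation: the function name expand_char_refs is hypothetical, and the sketch ignores markup-significant characters such as < and &, which a real tool would presumably have to keep escaped.

```python
import re

# Matches decimal (&#64;) and hexadecimal (&#x40;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(text, output_encoding):
    """Replace each character reference by its character when the output
    encoding can represent it; otherwise keep the reference unchanged."""

    def substitute(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        try:
            char = chr(code_point)
            char.encode(output_encoding)
        except (ValueError, OverflowError):
            # Invalid code point or not representable: keep the reference.
            return match.group(0)
        return char

    return CHAR_REF.sub(substitute, text)
```

With an ASCII output encoding, a reference to "@" is expanded while a reference to "é" is left in place; with UTF-8 both are expanded.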
The second place within the document instance where references can occur is attribute values. Furthermore, attribute values are normalized according to their attribute type after replacement of references. By default, fxcopy reproduces attribute values literally. Given the --expand-att-vals option, it outputs the normalized value instead.
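The whitespace part of attribute-value normalization, as defined by the XML 1.0 specification, can be sketched as follows. The function name is hypothetical, and the sketch assumes that all references in the value have already been replaced; it therefore cannot distinguish whitespace that came from a character reference, which the specification treats differently.

```python
import re


def normalize_attribute_value(value, att_type="CDATA"):
    """Apply the whitespace rules of XML attribute-value normalization.

    `value` is assumed to have its references already replaced.
    """
    # Line ends are normalized first, so CR LF counts as one line feed.
    value = value.replace("\r\n", "\n")
    # Every tab, line feed and carriage return becomes a single space.
    value = re.sub(r"[\t\n\r]", " ", value)
    if att_type != "CDATA":
        # Non-CDATA types: drop leading and trailing spaces and
        # collapse runs of spaces into one.
        value = re.sub(r" +", " ", value.strip(" "))
    return value
```

For an NMTOKENS attribute, the value " a  b " becomes "a b"; for a CDATA attribute it is left unchanged.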
As an example of expansion in the document instance, assume the following declarations in the DTD:

    <!ENTITY q "quote sign">
    <!ENTITY int "internal entity">
    <!ENTITY ext SYSTEM "ext.ent">
    <!ATTLIST a x NMTOKENS #IMPLIED
                y CDATA    #IMPLIED>

and that the content of the file ext.ent is the string "external entity". Let us consider the following document fragment:

    <a x=" a b " y="two &q;s: &#39; and &#34;">
    here is a character reference: &#64;
    here is an &int;
    here is an &ext;
    </a>

Running fxcopy --expand-refs-content=char,int produces this:

    <a x=" a b " y="two &q;s: &#39; and &#34;">
    here is a character reference: @
    here is an internal entity
    here is an &ext;
    </a>

whereas fxcopy --expand-refs-content=ext --expand-att-vals yields

    <a x="a b" y="two quote signs: ' and &#34;">
    here is a character reference: &#64;
    here is an &int;
    here is an external entity
    </a>

Note that the &#34; in the attribute value is not replaced by the " sign, because then it would be recognized as the end of the attribute value literal.
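The remark about the quote sign can be illustrated by a small routine that writes a normalized value back as an attribute value literal. It is a sketch only: the function name is hypothetical, and the choice of double quotes and decimal character references is an assumption, not something the text states about fxcopy.

```python
def attribute_literal(value):
    """Return `value` as a double-quoted attribute value literal."""
    escaped = (
        value.replace("&", "&#38;")   # must come first
             .replace("<", "&#60;")   # not allowed in attribute values
             .replace('"', "&#34;")   # would otherwise end the literal
    )
    return '"' + escaped + '"'
```

The apostrophe needs no escaping inside a double-quoted literal, which is why only the double quote remains a reference.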
This applies to references occurring where a declaration could occur. It does not affect references within declarations, which are expanded regardless of options.
int  Expand all references to internal parameter entities.

ext  Replace all references to external parameter entities with the content of the file they point to. Note that this option implies --expand-ent-vals in order to ensure well-formedness.

yes  Expand references to internal and external parameter entities. --expand-refs-subset is equivalent to --expand-refs-subset=yes.

no   Expand no parameter entity references at all.
The external subset can be viewed as a special reference. The --expand-ext-subset option makes fxcopy drop the external identifier from the document type declaration and copy the content of the file it denotes to the end of the internal subset. Like --expand-refs-subset=ext, this option implies --expand-ent-vals.
Usually, entity values in entity declarations are reproduced literally, i.e., without replacement of references. However, if a declaration is copied from an external entity to the internal subset, parameter entity references in the entity value become invalid. Therefore, given the --expand-ent-vals option, fxcopy substitutes the derived entity replacement text for the entity value. The replacement text contains no parameter entity references (except where the % sign was escaped with a character reference, in which case the parser never recognized it as a reference), and it uses character references only for characters that cannot be represented directly.
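How a replacement text is derived from an entity value can be sketched following the XML 1.0 rules: parameter entity references and character references are replaced, while general entity references are left as they are. The function name and the dictionary of parameter entities are hypothetical, and the name pattern is only a rough approximation of an XML Name.

```python
import re

# A parameter entity reference (%name;) or a character reference.
REFERENCE = re.compile(
    r"%([A-Za-z_:][\w.:-]*);|&#(?:x([0-9A-Fa-f]+)|([0-9]+));"
)


def replacement_text(literal, parameter_entities):
    """Derive the replacement text of an entity value literal.

    `parameter_entities` maps names to their replacement texts; an
    undeclared name raises KeyError in this sketch.
    """

    def substitute(match):
        name, hex_digits, dec_digits = match.groups()
        if name is not None:
            # Parameter entity reference: include its replacement text.
            return parameter_entities[name]
        # Character reference: include the character itself.
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))

    # Substituted text is not rescanned, so a % that was written as a
    # character reference never starts a parameter entity reference.
    return REFERENCE.sub(substitute, literal)
```

For example, with vnum declared as "1.0", the entity value "xml version %vnum;" yields the replacement text "xml version 1.0".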
For instance, consider the document exa-6.xml:

    <?xml version="1.0"?>
    <!DOCTYPE exa SYSTEM "exa-6.ext" [
    <!ENTITY % int "<!ELEMENT exa ANY>">
    <!ENTITY % ext SYSTEM "ext-6.decl">
    %int;
    %ext;
    ]>
    <exa/>

where the content of the file exa-6.ext is

    <!ENTITY % vnum "1.0">
    <!ENTITY % version "xml version %vnum;">

and ext-6.decl contains

    <!NOTATION text SYSTEM "/bin/cat">

Running fxcopy --expand-refs-subset=int exa-6.xml produces:

    <?xml version="1.0"?>
    <!DOCTYPE exa SYSTEM "exa-6.ext" [
    <!ENTITY % int "<!ELEMENT exa ANY>">
    <!ENTITY % ext SYSTEM "ext-6.decl">
    <!ELEMENT exa ANY>
    %ext;
    ]>
    <exa/>

Note that only the internal reference %int; was expanded. On the other hand, if we run fxcopy --expand-refs-subset=ext exa-6.xml, we get:

    <?xml version="1.0"?>
    <!DOCTYPE exa SYSTEM "exa-6.ext" [
    <!ENTITY % int "<!ELEMENT exa ANY>">
    <!ENTITY % ext SYSTEM "ext-6.decl">
    %int;
    <!NOTATION text SYSTEM "/bin/cat">
    ]>
    <exa/>

Finally, using fxcopy --expand-ext-subset exa-6.xml yields

    <?xml version="1.0"?>
    <!DOCTYPE exa [
    <!ENTITY % int "<!ELEMENT exa ANY>">
    <!ENTITY % ext SYSTEM "ext-6.decl">
    %int;
    %ext;
    <!ENTITY % vnum "1.0">
    <!ENTITY % version "xml version 1.0">
    ]>
    <exa/>

Note that the entity value in the last entity declaration has been expanded, because the --expand-ent-vals option was implied by --expand-ext-subset. If we override this with --expand-ent-vals=no, we get

    <!ENTITY % version "xml version %vnum;">

but this is not well-formed:

    > fxcopy --expand-ext-subset --expand-ent-vals=no exa-6.xml | fxp
    [<stdin>:8.33] Error: a parameter entity reference is not allowed in a declaration in the internal subset.
yes:  --expand-refs-content --expand-refs-subset --expand-ext-subset --expand-att-vals --expand-ent-vals
no:   --expand-refs-content=no --expand-refs-subset=no --expand-ext-subset=no --expand-att-vals=no --expand-ent-vals=no
int:  --expand-refs-content=char,int --expand-refs-subset=int --expand-ext-subset=no --expand-att-vals --expand-ent-vals=no
ext:  --expand-refs-content=ext --expand-refs-subset=yes --expand-ext-subset --expand-att-vals=no --expand-ent-vals
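Read as a mapping from a shorthand value to the individual options it stands for, the table can be transcribed into a lookup. This reading is an assumption, since the heading of the table is not part of this text; the names EXPANSION_SHORTHANDS and expand_shorthand are hypothetical.

```python
# Transcription of the table: shorthand value -> individual options.
EXPANSION_SHORTHANDS = {
    "yes": [
        "--expand-refs-content", "--expand-refs-subset",
        "--expand-ext-subset", "--expand-att-vals", "--expand-ent-vals",
    ],
    "no": [
        "--expand-refs-content=no", "--expand-refs-subset=no",
        "--expand-ext-subset=no", "--expand-att-vals=no",
        "--expand-ent-vals=no",
    ],
    "int": [
        "--expand-refs-content=char,int", "--expand-refs-subset=int",
        "--expand-ext-subset=no", "--expand-att-vals",
        "--expand-ent-vals=no",
    ],
    "ext": [
        "--expand-refs-content=ext", "--expand-refs-subset=yes",
        "--expand-ext-subset", "--expand-att-vals=no",
        "--expand-ent-vals",
    ],
}


def expand_shorthand(value):
    """Return a copy of the option list for one shorthand value."""
    return list(EXPANSION_SHORTHANDS[value])
```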
A. Neumann (neumann@PSI.Uni-Trier.DE)