<copyDocument
  to = Path
  selection = boolean : false
  preserveInclusions = boolean : false
  filterDuplicateIDs = boolean : true
  saveCharsAsEntityRefs = boolean : false
  indent = boolean : false
  encoding = (ISO-8859-1|ISO-8859-13|ISO-8859-15|ISO-8859-2|
              ISO-8859-3|ISO-8859-4|ISO-8859-5|ISO-8859-7|
              ISO-8859-9|KOI8-R|MacRoman|US-ASCII|UTF-16|UTF-8|
              Windows-1250|Windows-1251|Windows-1252|Windows-1253|
              Windows-1257) : UTF-8
>
  Content: [ extract ]* [ resources ]*
</copyDocument>

<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = File name extension
>
  <processingInstruction target = Name data = string /> |
  <attribute name = QName value = string /> |
  any element
</extract>

<resources
  include = NMTOKENS
  exclude = NMTOKENS
  match = Regexp pattern
  resolve = boolean : false
  copyTo = Path
  referenceAs = anyURI
/>
Copies the document being edited to the location specified by the required attribute to.
Attribute | Description |
---|---|
to | Specifies the file where the document (or the node selection) is to be copied. |
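selection | If this attribute is specified with value true, the node selection (rather than the whole document) is saved to the specified location. If multiple nodes are explicitly selected, their parent element is saved and a special processing-instruction, select-child-nodes, is added to it to identify the selected child nodes. Example, the user has selected paragraphs with content 2, 3 and 4: <div> <?select-child-nodes 3-5?> <p>1</p> <p>2</p> <p>3</p> <p>4</p> </div> In the above example, the range 3-5 designates the three selected paragraphs, which is consistent with the processing-instruction itself being counted as the first child of div. Otherwise, it is the whole document which is saved to the specified location. |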
preserveInclusions | If this attribute is specified with value true, inclusions found in the document are preserved in the copy. Otherwise (default value), inclusions are not preserved. |
filterDuplicateIDs | Ignored unless If this attribute is specified with value |
saveCharsAsEntityRefs | If this attribute is specified with value true, the generated XML file contains entity references. Otherwise, the generated XML file contains character references. |
indent | If this attribute is specified with value true, the generated XML file is indented. Otherwise, the generated XML file is not indented. |
encoding | Specifies the encoding of the generated XML file. |
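As a minimal sketch, the following element uses only attributes listed in the table above. The file name __doc.xml is borrowed from the DocBook example at the end of this section; the other values are arbitrary choices made for illustration. Because no extract or resources child is specified, the default resources child elements described later in this section presumably apply.

```xml
<!-- Save the node selection, if any, as an indented ISO-8859-1 file. -->
<copyDocument to="__doc.xml"
              selection="true"
              indent="true"
              encoding="ISO-8859-1"/>
```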
<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = File name extension
>
  <processingInstruction target = Name data = string /> |
  <attribute name = QName value = string /> |
  any element
</extract>
The extract element is designed to ease the writing of XSLT style sheets that need to transform XML documents where binary images (TIFF, PNG, etc.) or XML images (typically SVG) are embedded.
In order to do this, the extract element copies the image data found in the element or the attribute specified by attribute xpath to a file created in the directory specified by attribute toDir.
The name of the image file is automatically generated by extract. However, attributes baseName and extension may be used to parametrize, to a certain extent, the generation of the image file name.
Now the question is: how does the XSLT style sheet know about the "extracted" image files? The extract element offers three options:
Replace the element containing image data by the one specified as a child element of extract.
If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.
DocBook example: replace embedded svg:svg (allowed in "-//OASIS//DTD DocBook SVG Module V1.0//EN") by much simpler imagedata:
<cfg:extract xmlns="" xpath="//imageobject/svg:svg" toDir="raw">
  <imagedata fileref="resources/{$url.rootName}.png" />
</cfg:extract>
OR, replace the element containing image data by the attribute which is specified using the attribute child element of extract. This attribute is added to the parent element of the element containing image data.
If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.
DocBook 5 example: replace embedded db5:imagedata/svg:svg by db5:imagedata/@fileref:
<cfg:extract xmlns=""
             xmlns:db5="http://docbook.org/ns/docbook"
             xmlns:svg="http://www.w3.org/2000/svg"
             xpath="//db5:imagedata/svg:svg" toDir="raw">
  <cfg:attribute name="fileref" value="resources/{$url.rootName}.png" />
</cfg:extract>
OR, more general approach, insert a processing instruction (which is specified using the processingInstruction child element of extract) at the beginning of the element from which data has been extracted.
If xpath selects an attribute instead of an element, the processing instruction is inserted in the element containing the selected attribute.
Example: insert <?extracted extracted_file_name?> in imgd:image_ab and imgd:image_eb:
<extract xpath="//imgd:image_ab/@data | //imgd:image_eb" toDir="raw">
  <processingInstruction target="extracted" data="resources/{$url.rootName}.png" />
</extract>
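On the XSLT side, a style sheet could pick up the inserted processing instruction along the following lines. This is only a sketch: the namespace URI bound to prefix imgd and the generated img element are hypothetical and chosen for illustration. It relies on standard XSLT 1.0 behavior, where the string value of a processing instruction is its data.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:imgd="http://www.example.com/ns/imgd"
                exclude-result-prefixes="imgd">
  <!-- Hypothetical namespace URI and output markup, for illustration only. -->
  <xsl:template match="imgd:image_ab | imgd:image_eb">
    <!-- The data of the first <?extracted ...?> child is the file reference. -->
    <img src="{processing-instruction('extracted')}"/>
  </xsl:template>
</xsl:stylesheet>
```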
The replacement element (attribute values or text nodes in the element or in any of its descendants) and the inserted processing instruction (target and data) can reference the following variables, which are substituted by their values during the extraction step:
Variable | Value |
---|---|
{$file.path} | Pathname of the extracted image file. Example: "/tmp/xxe1234/book_image_3.svg". |
{$file.parent} | Pathname of the directory containing the extracted image file. Example: "/tmp/xxe1234/". |
{$file.name} | Name of the extracted image file. Example: "book_image_3.svg". |
{$file.rootName} | Name of the extracted image file, but without an extension. Example: "book_image_3". |
{$file.extension} | Extension of the extracted image file name. Example: "svg". |
{$file.separator} | Native path component separator of the platform. |
{$url} | URL of the extracted image file. |
{$url.parent} | URL of the directory containing the extracted image file. Example: "file:///tmp/xxe1234". Note that this URL does not end with a '/'. |
{$url.name} | Name of the extracted image file. Example: "book_image_3.svg". |
{$url.rootName} | Name of the extracted image file, but without an extension. Example: "book_image_3". |
{$url.extension} | Extension of the extracted image file name. Example: "svg". |
In fact, any XPath expression (full XPath 1.0, not just the subset used in attribute xpath), not only variable references, can be put between curly braces (example: {./@id}). Such XPath expressions are evaluated as strings in the context of the element selected by attribute xpath. If attribute xpath selects an attribute, its parent element is used as an evaluation context for the XPath expression.
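The following sketch mixes a variable reference and an XPath expression in the data of the inserted processing instruction. It assumes that the selected imgd:image_eb elements carry an id attribute, and that several brace expressions may be combined with literal text in a single value, in the same way the examples above combine {$url.rootName} with literal text. Both points are assumptions, not something stated in this section.

```xml
<extract xpath="//imgd:image_eb" toDir="raw">
  <!-- {./@id} is evaluated as a string in the context of imgd:image_eb; -->
  <!-- {$file.name} is replaced by the name of the extracted image file. -->
  <processingInstruction target="extracted"
                         data="{./@id} {$file.name}" />
</extract>
```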
Attributes:
xpath
Selects elements and attributes containing the image data to be extracted.
This XPath expression must conform to the XPath subset needed to implement W3C XML Schemas, except that absolute paths are allowed in addition to relative paths.
dataType
Specifies how the image data is "stored" in the elements or the attributes selected by the above XPath expression: anyURI, hexBinary, base64Binary or XML. This cannot be guessed for documents conforming to a DTD and for documents not constrained by a grammar.
Default: find the data type using the grammar of the document being processed.
toDir
Specifies the directory where extracted image files are to be created. Relative directories are relative to the temporary directory created during the execution of the process (that is, %W).
Default: use the temporary directory created during the execution of the process (that is, %W).
baseName
Specifies the start of the extracted image file names. An automatically generated part is always added after this user prefix.
Default: the base name of an extracted image file is automatically generated in its entirety.
extension
Specifies which extension to use for extracted image file names. Specifying "svgz" for extracted SVG images makes it possible to create compressed SVG files.
Default: the extension is guessed by XXE for a number of common image formats.
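As an illustration of these attributes used together, here is a sketch for a document constrained by a DTD, where the data type cannot be guessed and is therefore specified explicitly. The picture element and its data attribute are hypothetical, and directory raw is assumed to have been created beforehand.

```xml
<!-- Hypothetical vocabulary: base64-encoded PNG data in picture/@data. -->
<extract xpath="//picture/@data" dataType="base64Binary"
         toDir="raw" baseName="picture_" extension="png">
  <!-- File names start with "picture_", followed by a generated part. -->
  <processingInstruction target="extracted" data="{$file.path}" />
</extract>
```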
<resources
  include = NMTOKENS
  exclude = NMTOKENS
  match = Regexp pattern
  resolve = boolean : false
  copyTo = Path
  referenceAs = anyURI
/>
The resources child element specifies what to do with the resources which are logically part of the document.
The resources which are logically part of the document are specified using another configuration element: documentResources (see Section 10, “documentResources” in XMLmind XML Editor - Configuration and Deployment). DocBook example:
<cfg:documentResources xmlns="">
  <cfg:resource kind="image" path="//imagedata/@fileref"/>
  <cfg:resource kind="image" path="//graphic/@fileref"/>
  <cfg:resource kind="image" path="//inlinegraphic/@fileref"/>
  <cfg:resource kind="text" path="//textdata/@fileref"/>
  <cfg:resource kind="audio" path="//audiodata/@fileref"/>
  <cfg:resource kind="video" path="//videodata/@fileref"/>
</cfg:documentResources>
Note that elements replaced during an extraction step specified by the extract element are never scanned for resources.
The default resources child elements are:
<resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+" />
<resources match=".+" copyTo="." />
Attributes of the resources child element specifying how to match a resource:
match
For each resource of the document specified by the documentResources element, its URI is tested to see if it matches the first resources child element. If it does not match the first resources child element, the second resources child element is tried and so on until a matching resources child element is found.
If the matching resources element has no resolve, copyTo or referenceAs attribute, the matched resource is ignored. For example, rule <resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+"/> is designed to ignore resources of any kind having an absolute URL.
include
This attribute contains one or more kinds of resources separated by whitespace. Example related to the above DocBook example: include="image".
The action corresponding to element resources is skipped unless the resource being processed has been given a kind and this kind is referenced in attribute include of element resources.
exclude
This attribute contains one or more kinds of resources separated by whitespace. Example related to the above DocBook example: exclude="text image".
If the resource being processed has been given a kind and if this kind is referenced in attribute exclude of element resources, the action corresponding to element resources is skipped.
Attribute exclude has priority over attribute include.
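The matching attributes can be combined as in the following sketch, which is based on the DocBook documentResources example above. Directory names images and misc are hypothetical and assumed to have been created beforehand. The sketch also assumes that a rule whose action is skipped lets the next rule be tried, as the DocBook example at the end of this section suggests.

```xml
<copyDocument to="__doc.xml">
  <!-- No action attribute: resources having an absolute URL are ignored. -->
  <resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+"/>
  <!-- Applies only to resources which have been given kind "image". -->
  <resources include="image" match=".+" copyTo="images"/>
  <!-- Applies to all other resources, including those without a kind. -->
  <resources exclude="image" match=".+" copyTo="misc"/>
</copyDocument>
```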
Attributes of the resources child element specifying an action on the matched resource:
resolve
If resolve="true", attributes copyTo and referenceAs are ignored. Instead, in the copy of the document, the relative URI of the matched resource is replaced by its equivalent absolute URI.
Example:
<resources include="text" match=".+" resolve="true"/>
Let's suppose document file:///docs/doc.xml references text resource examples/sample1.txt. The copy of the document will reference absolute URI file:///docs/examples/sample1.txt.
copyTo
Specifies where to copy the matched resource. This can be a file name or a directory name.
The value of this attribute can contain $1, $2, ..., $9 variables, which are substituted with the substrings matching the parenthesized groups of the match regular expression.
Example:
<resources match="(?:.+/)?(.+)\.jpg" copyTo="resources/$1.jpeg"/>
Let's suppose the document references resource images/logo.jpg. File logo.jpg will be copied to resources/logo.jpeg and the copy of the document will reference resources/logo.jpeg.
referenceAs
Specifies the reference to the resource in the document created by the copyDocument configuration element.
As with copyTo, the value of this attribute may contain $1, $2, ..., $9 variables.
Generally, this attribute is not needed because the reference implied by the value of the copyTo attribute is sufficient. However, this attribute is useful when images are to be converted from their original format to a format supported by the target XSL-FO processor.
DocBook example:
<process>
  <mkdir dir="resources"/>
  <mkdir dir="raw"/>
  <copyDocument to="__doc.xml">
    <resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+"/>
    <resources include="text" match=".+" resolve="true"/>
    <resources include="image" match=".+\.(png|jpg|jpeg|gif)" copyTo="resources"/>
    <resources include="image" match="(?:.+/)?(.+)\.(\w+)" copyTo="raw" referenceAs="resources/$1.png"/>
    <resources exclude="text image" match=".+" copyTo="resources"/>
  </copyDocument>
  <convertImage from="raw" to="resources" format="png"/>
  ...
</process>