Edit step

Edits the input XHTML document in place using a XED script. The result of this step is the same XHTML document, as modified by the script.

For clarity, the “edit.” parameter name prefix is omitted here.

However, when you pass any of the following parameters to w2x, do not forget this prefix. Example: -p edit.ids.generate-section-ids yes.
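When w2x is driven from a script, this prefix is easy to forget. The hypothetical Python helper below is not part of w2x. It takes Edit step parameter names as they are written in this section and produces the corresponding "-p edit.NAME VALUE" arguments. It builds only these arguments; the rest of the w2x command line is left to the caller.

```python
def edit_params_to_args(params: dict[str, str]) -> list[str]:
    """Hypothetical helper: turn {"ids.generate-section-ids": "yes"} into w2x -p arguments."""
    args: list[str] = []
    for name, value in params.items():
        # This section omits the "edit." prefix, but w2x needs it on the command line.
        full_name = name if name.startswith("edit.") else "edit." + name
        args.extend(["-p", full_name, value])
    return args


# Usage with the example given above.
print(edit_params_to_args({"ids.generate-section-ids": "yes"}))
# prints ['-p', 'edit.ids.generate-section-ids', 'yes']
```

The resulting list can be spliced into a subprocess argument list, next to whatever other options and file arguments your w2x invocation uses.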

Parameters:

Name

Value

Description

xed-url-or-file

An absolute URL or the path of an existing file.

No default (required).

Specifies which XED script should be used to edit the input XHTML document. A relative file path is relative to the current working directory.

Any other parameter is passed to the XED script as a XED global variable.

XMLmind Word to XML (w2x for short) comes with two stock “main” XED scripts:

w2x:xed/main-styled.xed

Invokes XED scripts used to “polish up” the styled XHTML 1.0 Transitional document created by the Convert step (e.g. process consecutive paragraphs having identical borders).

w2x:xed/main.xed

Invokes XED scripts used to prepare the generation of semantic XML of all kinds: XHTML, DocBook, DITA. These scripts leverage the CSS styles and classes found in the styled XHTML 1.0 Transitional document created by the Convert step. They translate these CSS styles and classes (e.g. numbered paragraph) into semantic tags (e.g. ol/li).

Note: Something like “w2x:xed/main.xed” is an absolute URL supported by w2x. “w2x:” is a URL prefix (defined in the automatic XML catalog used by w2x) which specifies the location of the parent directory of both the xed/ and xslt/ subdirectories.

Table 1 Parameters common to w2x:xed/main-styled.xed and w2x:xed/main.xed

Name

Value

Description

finish-styles.css-uri

An absolute or relative “file:” URI.

Default: “”. “Interned” CSS styles, if any, are stored in a head/style element.

Global variable defined in w2x:xed/finish-styles.xed.

Store “interned” CSS styles, if any, in the CSS (UTF-8 encoded) file having this URI. A relative URI is relative to the URI specified by parameter xhtml-file.

For more information about “interned” CSS styles, see command parse-styles (invoked by w2x:xed/init-styles.xed) and its inverse, command unparsed-styles (invoked by w2x:xed/finish-styles.xed).

finish-styles.custom-styles-url-or-file

An absolute URL or a filename. A relative filename is relative to the current working directory.

Default: “” (no custom styles).

Global variable defined in w2x:xed/finish-styles.xed.

Specifies the location of a CSS file. The custom CSS styles found in the specified file are simply appended to the automatically generated CSS styles.

Using this variable is the easiest way to customize the automatically generated CSS styles.

When generating multi-page styled or semantic XHTML of any kind (frameset, Web Help, EPUB):

Use finish-styles.custom-styles-url-or-file to specify custom CSS styles.

There is no need to specify finish-styles.css-uri, as all the CSS styles are in any case stored in an external “.css” file having the same basename as the main output file.

finish-styles.mathjax

“yes” | “no” | “auto”

Default: “no”.

Global variable defined in w2x:xed/finish-styles.xed.

Very few web browsers (e.g. Firefox) can render MathML natively. Fortunately, there is MathJax.

MathJax is a JavaScript display engine for mathematics that works in all browsers.

yes

Add a <script> element loading MathJax to the <html>/<head> element of the generated XHTML file.

auto

Same as “yes”, but add <script> only when the generated XHTML file contains MathML.

finish-styles.mathjax-url

String.

Default value: the URL pointing to the MathJax CDN, as recommended in the MathJax documentation.

Global variable defined in w2x:xed/finish-styles.xed.

The URL used to load the MathJax engine, configured for rendering MathML.

Ignored unless parameter finish-styles.mathjax is set to “yes” or “auto”.

title.keep-title

“yes” | “no”

Default: “yes” when generating styled or semantic XHTML of all kinds (single-page, EPUB, etc.), “no” when generating any other format.

Global variable defined in w2x:xed/title.xed.

Value “no” specifies that paragraphs having “p-Title” and “p-Subtitle” styles (to keep it simple; see also parameters title.title-style-names and title.subtitle-style-names) are converted only to head/title and to head/meta name="description".

This makes these titles invisible to the user, though still usable by programs such as the XSLT stylesheets generating DITA or DocBook.

Value “yes” may be used to specify that paragraphs having “p-Title” and “p-Subtitle” styles are additionally converted to equivalent, visible, XHTML elements.

These equivalent, visible, XHTML elements are specified by parameters title.title-container and title.subtitle-container.

title.title-container

An XHTML element name possibly followed by one or more attributes.

Default: “” when generating styled XHTML; otherwise “h1 class='role-document-title'”.

Global variable defined in w2x:xed/title.xed.

Specifies the XHTML element to which a paragraph having a “p-Title” style is to be converted. An empty string value is equivalent to “p”.

Ignored when parameter title.keep-title is “no”.

title.title-style-names

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/title.xed.

Specifies which user-defined paragraph styles should be considered to be equivalent to standard style “p-Title”.

(Paragraph styles, whether user-defined or standard, are given a “p-” prefix by the Convert step.)

title.subtitle-container

An XHTML element name possibly followed by one or more attributes.

Default: “” when generating styled XHTML; otherwise “p class='role-document-subtitle'”.

Global variable defined in w2x:xed/title.xed.

Specifies the XHTML element to which a paragraph having a “p-Subtitle” style is to be converted. An empty string value is equivalent to “p”.

Ignored when parameter title.keep-title is “no”.

title.subtitle-style-names

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/title.xed.

Specifies which user-defined paragraph styles should be considered to be equivalent to standard style “p-Subtitle”.

(Paragraph styles, whether user-defined or standard, are given a “p-” prefix by the Convert step.)
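The decision made by parameter finish-styles.mathjax can be sketched with Python's xml.etree.ElementTree. This is an illustration under assumptions, not the code of w2x:xed/finish-styles.xed: the function name is hypothetical, the document is assumed to be namespace-well-formed XHTML with MathML in the MathML namespace, and the script element actually added by w2x may carry other attributes or configuration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def add_mathjax_script(root: ET.Element, mode: str, mathjax_url: str) -> bool:
    """Sketch of the finish-styles.mathjax decision. Returns True if a script was added."""
    if mode not in ("yes", "no", "auto"):
        raise ValueError(f"unexpected mode {mode!r}")
    if mode == "no":
        return False
    # "auto": only add the script when the document actually contains MathML.
    has_mathml = any(
        isinstance(el.tag, str) and el.tag.startswith("{" + MATHML_NS + "}")
        for el in root.iter()
    )
    if mode == "auto" and not has_mathml:
        return False
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return False
    # Append <script src="..."> to html/head.
    script = ET.SubElement(head, f"{{{XHTML_NS}}}script")
    script.set("src", mathjax_url)
    return True
```

With “auto”, a page without any MathML element is left untouched, which avoids loading MathJax for nothing.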

Table 2 Parameters which are specific to w2x:xed/main-styled.xed

Name

Value

Description

remove-pis.except

One or more processing-instruction targets separated by space characters.

Default: “” (remove all processing-instructions)

Global variable defined in w2x:xed/remove-pis.xed.

Specifies which processing-instructions should be kept in the styled HTML document.

By default, all processing-instructions are removed from the styled HTML document. Such processing-instructions are useful only when the styled HTML document created by the Convert step is used as an intermediate format in order to generate semantic XML.
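The effect of remove-pis.except can be pictured with Python's xml.dom.minidom, which, unlike the default ElementTree parser, keeps processing-instructions. This is a hypothetical sketch, not the implementation of w2x:xed/remove-pis.xed; the function name and the sample targets keep-me and drop-me are made up for the example.

```python
from xml.dom import Node, minidom


def remove_pis(doc: minidom.Document, except_targets: str = "") -> int:
    """Remove processing-instructions whose target is not listed in except_targets.

    except_targets mimics remove-pis.except: targets separated by whitespace;
    the empty default removes every processing-instruction. Returns the count removed.
    """
    keep = set(except_targets.split())
    removed = 0
    stack = [doc]
    while stack:
        node = stack.pop()
        # Iterate over a copy: childNodes is live and shrinks as nodes are removed.
        for child in list(node.childNodes):
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                if child.target not in keep:
                    node.removeChild(child)
                    removed += 1
            else:
                stack.append(child)
    return removed


doc = minidom.parseString('<html><?keep-me a?><p>text<?drop-me b?></p></html>')
print(remove_pis(doc, "keep-me"), doc.documentElement.toxml())
```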

Table 3 Parameters which are specific to w2x:xed/main.xed

Name

Value

Description

before-save.allow-flow

“yes” | “no”.

Default: “no”.

Global variable defined in w2x:xed/before-save.xed.

If “yes”, allow flow elements (e.g. li) to directly contain text and inline elements.

If “no”, do not allow flow elements (e.g. li) to directly contain text and inline elements. Instead, “wrap” this text and these inline elements in <p class="role-inline-wrapper"> elements.

The “no” option greatly eases the generation of certain types of semantic XML (e.g. DocBook) during the Transform step.

biblio.style-names

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/biblio.xed.

Specifies which user-defined paragraph styles should be considered to be equivalent to standard style “p-Bibliography”.

(Paragraph styles, whether user-defined or standard, are given a “p-” prefix by the Convert step.)

blocks.convert

A conversion specification.

Default: “”. No conversions other than those performed by w2x:xed/blocks.xed.

Global variable defined in w2x:xed/blocks.xed.

Specified paragraph styles are converted to specified XHTML elements. See below.

blocks.convert-to-pre

A conversion specification.

Default: “”.

Global variable defined in w2x:xed/blocks.xed.

Specified paragraph styles are converted to specified XHTML elements. See below.

When using MS-Word, there are two ways to represent code samples:

Use a sequence of paragraphs having the same style. Each paragraph contains one line of the code sample. Let’s call the style of these paragraphs Code1.

Use a single paragraph containing the whole code sample, which means that this single paragraph contains significant whitespace and line breaks. Let’s call the style of this paragraph Code2.

A sequence of Code1 paragraphs may be converted to an XHTML pre using:

-p edit.blocks.convert "p-Code1 span g:id='pre' g:container='pre'"

A Code2 paragraph may be converted to an XHTML pre using:

-p edit.blocks.convert-to-pre "p-Code2 pre"

captions.style-names

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/captions.xed.

Specifies which user-defined paragraph styles should be considered to be equivalent to standard style “p-Caption”.

(Paragraph styles, whether user-defined or standard, are given a “p-” prefix by the Convert step.)

convert-tabs.to-table

“yes” | “no”.

Default: “no”.

Global variable defined in w2x:xed/convert-tabs.xed.

If set to “yes”, convert consecutive paragraphs containing text runs aligned on tab stops to a borderless table.

This option is turned off by default because, in the general case, it's not possible to emulate tab stops using tables.

convert-tabs.unwrap-paragraphs

“yes” | “no”.

Default: “yes”.

Global variable defined in w2x:xed/convert-tabs.xed.

If set to “yes”, the cells contained in the borderless table used to emulate tab stops directly contain text runs rather than paragraphs.

headings.convert

A conversion specification.

Default: “”. No conversions other than those performed by w2x:xed/headings.xed.

Global variable defined in w2x:xed/headings.xed.

Specified paragraph styles are converted to specified XHTML heading elements (h1, h2, …, h6). See below.

Note that by default, script headings.xed automatically converts paragraphs having an outline level to h1, h2, …, h6 headings.

ids.generate-section-ids

“yes” | “no”.

Default: “no”.

Global variable defined in w2x:xed/ids.xed.

Ensure that all the sections found in the semantic XHTML resulting from the conversion of a DOCX file have a unique ID.

When this ID is missing, it is computed using the content of the h1, h2, ..., h6 heading which is the first child of the section. Example:

<div class="role-section2" id="Title_of_this_section">
  <h2>Title of this section</h2>
  ...

Setting ids.generate-section-ids to yes is especially useful when converting a DOCX file to a DITA map or bookmap. With this parameter, the filenames of the topics referenced by the generated map are guaranteed to have meaningful values (e.g. "Introduction.dita" rather than "d0e35.dita").

ids.section-id-max-length

An integer greater than or equal to 1.

Default: 32.

Global variable defined in w2x:xed/ids.xed.

Specifies the maximum length of the automatically computed ID when parameter ids.generate-section-ids is set to yes.

index.index-term-separator

A string.

Default: "".

Global variable defined in w2x:xed/index.xed.

Specifies the string used to join index terms when a redirection to another index entry is to be generated (example: “See Cat, Siamese, Seal point”).

inlines.b-element, inlines.big-element, inlines.i-element, inlines.s-element, inlines.small-element, inlines.sub-element, inlines.sup-element, inlines.tt-element, inlines.u-element

An element name optionally followed by attributes.

Defaults: "b", "big", "i", "s", "small", "sub", "sup", "tt", "u".

Global variables defined in w2x:xed/inlines.xed.

By default, the Edit step converts a text span having style="font-weight:bold" (as generated by the Convert step) to XHTML element b. Specifying parameter -p edit.inlines.b-element "strong" replaces the default b element with a strong element.

Similarly, alternate element names may be specified using the following parameters: inlines.sub-element, inlines.sup-element, inlines.small-element, inlines.big-element, inlines.s-element, inlines.u-element, inlines.tt-element, inlines.i-element.

Example 1: generate code rather than tt elements: -p edit.inlines.tt-element "code".

Example 2: do not generate small elements: -p edit.inlines.small-element "span style='font-size:x-small'" (notice how one or more attributes may be specified too).

This facility is useful only when generating semantic XHTML and all formats based on semantic XHTML. Using it when generating DITA or DocBook may give poor results.

inlines.convert

A conversion specification.

Default: “”. No conversions other than those performed by w2x:xed/inlines.xed.

Global variable defined in w2x:xed/inlines.xed.

Specified character styles are converted to specified XHTML elements. See below.

inlines.generate-big-small

“yes” | “no”.

Default: “yes”.

Global variable defined in w2x:xed/inlines.xed.

Specifies whether spans having a bigger (respectively smaller) font size than their parent elements should be converted to big (respectively small) elements.

metas.keep

Regular expression matching part or all of the name of the XHTML meta.

Global variable defined in w2x:xed/metas.xed.

When generating semantic XML of any kind, all XHTML meta elements except author, description and dcterms.* are automatically removed from the semantic XHTML 1.0 Transitional document generated by the Edit step and used as input by the Transform step.

If you want to keep some or all of the meta elements in this intermediate semantic XHTML 1.0 Transitional document, specify -p edit.metas.keep regexp.

Examples: -p edit.metas.keep ".*" keeps all metas; -p edit.metas.keep "^dc\." keeps all metas having a name starting with "dc." (e.g. <meta name="dc.subject" content="..."/>).

prune.preserve

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/prune.xed.

Empty paragraphs having a user-defined style found in this list will not be deleted by w2x:xed/prune.xed.

remove-styles.preserved-classes

List of user-defined style names separated by space characters.

Default: “” (empty list).

Global variable defined in w2x:xed/remove-styles.xed.

The CSS classes used to apply the user-defined styles specified in this list will not be removed by w2x:xed/remove-styles.xed.

Note that specifying both parameters prune.preserve and remove-styles.preserved-classes is currently the only way to keep, in the generated semantic XHTML, empty paragraphs having a given MS-Word style. For example, specifying -p edit.prune.preserve p-PlaceHolder and -p edit.remove-styles.preserved-classes p-PlaceHolder keeps all empty paragraphs having the p-PlaceHolder style in the semantic XHTML output.

sections.max-level

An integer greater than or equal to 1.

Default: -1. No maximum level.

Global variable defined in w2x:xed/sections.xed.

Wrap sequences of elements starting with an hN element (that is, h1, h2, h3, h4, h5, h6) into <div class="role-sectionN"> elements.

This parameter specifies the maximum level of nesting for such sections.
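The description of ids.generate-section-ids does not spell out how an ID is derived from a heading, beyond the example Title_of_this_section. The following Python function is a hypothetical approximation, not the algorithm of w2x:xed/ids.xed. It assumes that each run of characters other than ASCII letters, digits, ".", "-" and "_" becomes a single "_", that the result is truncated to ids.section-id-max-length characters, and that duplicates receive a numeric suffix. The actual script may behave differently, for example with non-ASCII headings.

```python
import re


def section_id(heading_text: str, used: set[str], max_length: int = 32) -> str:
    """Hypothetical ID computation for ids.generate-section-ids (not the w2x algorithm)."""
    # Replace each run of characters that are unsafe in an ID with a single "_".
    base = re.sub(r"[^A-Za-z0-9._-]+", "_", heading_text.strip()).strip("_")
    # An XML ID cannot start with a digit, "-" or ".": prefix such IDs with "_".
    if not base or not (base[0].isalpha() or base[0] == "_"):
        base = "_" + base
    base = base[:max_length]  # ids.section-id-max-length, 32 by default
    candidate, n = base, 2
    # Keep IDs unique within the document by appending _2, _3, ...
    while candidate in used:
        suffix = f"_{n}"
        candidate = base[: max_length - len(suffix)] + suffix
        n += 1
    used.add(candidate)
    return candidate


used_ids: set[str] = set()
print(section_id("Title of this section", used_ids))  # prints Title_of_this_section
print(section_id("Title of this section", used_ids))  # prints Title_of_this_section_2
```

Such IDs also make good topic filenames when generating a DITA map, which is the use case mentioned for ids.generate-section-ids.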

Simple conversion specifications

Parameter blocks.convert (respectively inlines.convert), described above, gives w2x users a simple means to convert p (respectively span) elements having certain paragraph (respectively character) styles to XHTML elements, possibly having attributes.

The syntax of a simple conversion specification is:

spec → simple_spec [ S '!' S simple_spec ]*
simple_spec → style_spec S XHTML_element_qname [ S attribute_spec ]*
style_spec → style_name | style_pattern
style_pattern → '/' pattern '/' | '^' pattern '$'
attribute_spec → attribute_qname '=' quoted_attribute_value
quoted_attribute_value → "'" value "'" | '"' value '"'

where S stands for whitespace.

Note that when specifying an XHTML_element_qname, you must restrict yourself to XHTML 1.0 Transitional elements. Specifying, for example, XHTML 5.0 elements such as mark, aside or section will not give the expected results.
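To check a specification before passing it to w2x, the grammar above can be turned into a small parser. The Python sketch below is hypothetical and is not how w2x itself parses specifications. It assumes that S is whitespace, that style names, style patterns and element names contain neither whitespace nor "!", and that a /pattern/ may match anywhere in the style name (Python re.search), whereas a plain style name must match exactly.

```python
import re
from dataclasses import dataclass, field

# A token is a lone "!", an attribute such as g:id='pre', or any other word.
_TOKEN = re.compile(
    r"""\s*(?:(?P<bang>!)|(?P<attr>[\w:.-]+=(?:'[^']*'|"[^"]*"))|(?P<word>[^\s!]+))"""
)


@dataclass
class SimpleSpec:
    style: str        # literal style name, or regex source of a style pattern
    is_pattern: bool
    element: str      # XHTML element qname
    attributes: dict[str, str] = field(default_factory=dict)

    def matches(self, style_name: str) -> bool:
        if self.is_pattern:
            return re.search(self.style, style_name) is not None
        return self.style == style_name


def parse_spec(spec: str) -> list[SimpleSpec]:
    """Split a conversion specification on "!" and parse each simple_spec."""
    groups: list[list] = [[]]
    spec = spec.strip()
    pos = 0
    while pos < len(spec):
        m = _TOKEN.match(spec, pos)
        if m is None:
            raise ValueError(f"cannot parse near {spec[pos:]!r}")
        pos = m.end()
        if m.group("bang"):
            groups.append([])  # "!" starts the next simple_spec
        else:
            groups[-1].append(m)

    specs = []
    for tokens in groups:
        if len(tokens) < 2 or tokens[0].group("word") is None or tokens[1].group("word") is None:
            raise ValueError("each simple_spec needs a style and an element name")
        style, element = tokens[0].group("word"), tokens[1].group("word")
        if len(style) > 2 and style.startswith("/") and style.endswith("/"):
            style, is_pattern = style[1:-1], True   # '/' pattern '/'
        elif style.startswith("^") and style.endswith("$"):
            is_pattern = True                       # '^' pattern '$': anchors kept
        else:
            is_pattern = False
        attributes = {}
        for tok in tokens[2:]:
            if tok.group("attr") is None:
                raise ValueError(f"expected an attribute, got {tok.group(0).strip()!r}")
            name, _, quoted = tok.group("attr").partition("=")
            attributes[name] = quoted[1:-1]         # strip the quotes
        specs.append(SimpleSpec(style, is_pattern, element, attributes))
    return specs
```

For example, parse_spec("p-Code1 span g:id='pre' g:container='pre'") yields a single SimpleSpec whose attributes are the two grouping attributes.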

Examples: stock styled span conversions used by w2x:xed/inlines.xed:

/Emphasis$/ em !
c-Strong strong !
c-BookTitle cite !
/((IntenseReference)|(SubtleReference)|(QuoteChar))$/ em !
/((itleChar)|(Heading\d+Char))$/ strong

Custom styled span conversions used to process this manual:

c-Code code

Stock styled paragraph conversions used by w2x:xed/blocks.xed:

/Quote$/ p g:id='blockquote' g:container='blockquote'

Custom styled paragraph conversions used to process this manual:

p-Term dt g:id="dl" g:container="dl" !
p-Definition dd g:id="dl" g:container="dl" !
p-ProgramListing span g:id="pre" g:container="pre"

Automatic grouping of the XHTML elements which are the results of the styled paragraph conversions

In the above examples, attributes having names prefixed with “g:” are in the “urn:x-mlmind:namespace:group” namespace. These attributes are called grouping attributes. Examples: g:id, g:container.

When parameter blocks.convert is used to create XHTML elements having grouping attributes, command group() is automatically invoked at the end of all the styled paragraph conversions. Simply put, this command groups consecutive XHTML elements having the same g:id attribute into a common parent element. The parent element is specified by attribute g:container.

In the above examples,

Consecutive p elements having grouping attributes g:id='blockquote' and g:container='blockquote' are grouped into a common blockquote parent element.

Consecutive dt and dd elements having grouping attributes g:id="dl" and g:container="dl" are grouped into a common dl parent element.

Consecutive span elements having grouping attributes g:id="pre" and g:container="pre" are grouped into a common pre parent element.
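The grouping performed by command group() can be approximated in a few lines of Python. This sketch is not the XED command itself: the function name is hypothetical, it only groups the direct children of one element, it ignores any text found between grouped elements, and it uses unqualified XHTML element names for brevity.

```python
import xml.etree.ElementTree as ET

G = "{urn:x-mlmind:namespace:group}"  # grouping namespace, in ElementTree notation


def group_children(parent: ET.Element) -> None:
    """Group consecutive children of parent sharing a g:id into a g:container element."""
    children = list(parent)
    for child in children:
        parent.remove(child)
    current_id = None
    container = None
    for child in children:
        group_id = child.get(G + "id")
        if group_id is None:
            # An element without grouping attributes ends the current group.
            current_id, container = None, None
            parent.append(child)
            continue
        if group_id != current_id:
            tag = child.get(G + "container")
            if tag is None:
                raise ValueError(f"g:id={group_id!r} without g:container")
            container = ET.SubElement(parent, tag)
            current_id = group_id
        # The grouping attributes are no longer needed once the element is grouped.
        child.attrib.pop(G + "id", None)
        child.attrib.pop(G + "container", None)
        container.append(child)
```

A p element without grouping attributes placed between two runs ends the first group, which is what "consecutive" implies: two dt/dd runs separated by such a p produce two distinct dl elements.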