Extension points

Custom conversion step

The stock conversion steps are: com.xmlmind.w2x.processor.ConvertStep, DeleteFilesStep, EditStep, LoadStep, SaveStep, TransformStep.

A custom conversion step may be implemented by extending abstract class com.xmlmind.w2x.processor.ProcessStep. This is straightforward: it suffices to implement a single method, ProcessStep.process.

See reference of class com.xmlmind.w2x.processor.Processor.

Custom image converters

Image converters convert images whose format is not supported by Web browsers (TIFF, WMF, EMF, etc.) to a format that Web browsers do support (SVG, PNG, JPEG).

Image converters are specified by interface com.xmlmind.w2x.docx.image.ImageConverterFactory. XMLmind Word To XML ships with four classes implementing this interface:

com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl

Image converter factory used to convert TIFF images to PNG or JPEG.

com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory

Image converter factory used to convert WMF graphics to SVG.

com.xmlmind.w2x_ext.emf2png.EMF2PNG

This image converter factory is available only on Windows. It leverages Windows' own GDI+ to convert EMF (in fact, Windows metafiles of any kind, including WMF) to PNG.

This is not ideal: unlike WMFConverterFactory above, which converts WMF (a Windows vector graphics format) to SVG (a standard vector graphics format), EMF2PNG converts a vector graphics format to a raster image format. However, having EMF2PNG is better than nothing at all.

EMF2PNG has one parameter called resolution. Its value is a real number expressed in dots per inch (DPI). The default value of parameter resolution is -300 (see below).

The resolution parameter specifies the resolution of the output PNG file. 0 means: use the same resolution as the one found in the input EMF/WMF file; a positive number means: use this value to override the resolution found in the input EMF/WMF file; a negative number means: use the specified absolute value, but only if this absolute value is greater than the resolution found in the input EMF/WMF file.

com.xmlmind.w2x.docx.image.ExternalImageConverter

This image converter factory executes an external program to perform the conversion. See ‎9.1.2.1 below.
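The three cases of the EMF2PNG resolution parameter described above can be sketched in Java as follows. This is only an illustration: the class and method names (Emf2PngResolution, effectiveDpi) are hypothetical and not part of w2x. The sketch also assumes that, for a negative value, the resolution of the input file is kept when it is already at least as high as the specified absolute value.

```java
// Hypothetical helper, not part of the w2x API: it only restates the
// documented rules for the "resolution" parameter of EMF2PNG.
final class Emf2PngResolution {
    private Emf2PngResolution() {}

    /**
     * @param param    value of the "resolution" parameter, in DPI
     * @param inputDpi resolution found in the input EMF/WMF file, in DPI
     * @return resolution to use for the output PNG, in DPI
     */
    static double effectiveDpi(double param, double inputDpi) {
        if (param == 0) {
            // 0: same resolution as the input file.
            return inputDpi;
        } else if (param > 0) {
            // Positive: always override the input resolution.
            return param;
        } else {
            // Negative: use the absolute value only if it is greater than
            // the input resolution (assumption: otherwise keep the input one).
            double minimum = -param;
            return (minimum > inputDpi) ? minimum : inputDpi;
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveDpi(-300, 96));   // default parameter, low-resolution input
        System.out.println(effectiveDpi(-300, 600));  // default parameter, high-resolution input
    }
}
```

Under these assumptions, the default value -300 acts as a minimum: a 96 DPI metafile is rendered at 300 DPI, while a 600 DPI metafile keeps its own resolution.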

If you want w2x to support more image formats, you’ll have to create your own ImageConverterFactory and register it with w2x using method ImageConverterFactories.register.

About thread-safety

A single instance of a class implementing ImageConverterFactory is used by all instances of com.xmlmind.w2x.processor.Processor. This implies that an implementation of ImageConverterFactory must be thread-safe.

See reference of class com.xmlmind.w2x.docx.image.ImageConverterFactories.
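The thread-safety requirement can be illustrated with a small, self-contained Java sketch. The interface ConverterFactorySketch below is a hypothetical stand-in and does not reproduce the real methods of ImageConverterFactory (see the class reference for those). What the sketch shows is the general approach: keep configuration in final fields and keep any shared mutable state in a thread-safe holder such as AtomicLong.

```java
import java.util.concurrent.atomic.AtomicLong;

final class ThreadSafeFactorySketch {
    private ThreadSafeFactorySketch() {}

    /** Hypothetical stand-in, NOT the real ImageConverterFactory interface. */
    interface ConverterFactorySketch {
        String convert(String inputName);
    }

    /** A single instance of this class may be shared by several threads. */
    static final class CountingFactory implements ConverterFactorySketch {
        // Immutable configuration: safe to read from any thread.
        private final String outputExtension;
        // Shared mutable state: updated atomically, no lost increments.
        private final AtomicLong conversions = new AtomicLong();

        CountingFactory(String outputExtension) {
            this.outputExtension = outputExtension;
        }

        @Override
        public String convert(String inputName) {
            conversions.incrementAndGet();
            // Everything else lives in local variables, private to each call.
            int dot = inputName.lastIndexOf('.');
            String root = (dot > 0) ? inputName.substring(0, dot) : inputName;
            return root + "." + outputExtension;
        }

        long conversionCount() {
            return conversions.get();
        }
    }

    /** Calls the same factory instance from several threads at once. */
    static long runConcurrently(CountingFactory factory, int threads, int callsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < callsPerThread; k++) {
                    factory.convert("logo.wmf");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return factory.conversionCount();
    }
}
```

With a plain long field instead of AtomicLong, concurrent increments could be lost; that is the kind of defect a shared factory instance must avoid.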

Specifying an external image converter

Examples of W2X_IMAGE_CONVERSIONS specifications (see ‎9.1.2.2 below):

Convert EMF/WMF to SVG using OpenOffice/LibreOffice:

.emf.svg.wmf.svg soffice --headless --convert-to svg --outdir %~po %i

Or equivalently using unoconv:

.emf.svg.wmf.svg unoconv -f svg -o %o %i

Convert EMF to SVG using Inkscape:

.emf.svg inkscape -l -o %o %i

The command executed by an external image converter may contain the following variables:

Variable    Definition
%I          Absolute path of the input image file.
%O          Absolute path of the output image file.
%i          Same as %I but quoted, that is, equivalent to “%I”.
%o          Same as %O but quoted, that is, equivalent to “%O”.
%S          File separator: “\” on Windows, “/” on Mac/Linux.

The following modifiers may be applied to the %I, %O, %i, %o variables:

Modifier    Definition
~p          Absolute path of the parent directory of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~pI is “C:\temp\doc_files”.
~n          Basename of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~nI is “logo.wmf”.
~r          Basename of the file without any extension. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~rI is “logo”.
~e          Extension of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~eI is “wmf”.

Also note that “%%” may be used to escape character “%”. More generally, just like in a URL, a %HH UTF-8 sequence may be used to escape any character. Example: “%3B” is “;” (semicolon), “%C3%A9” is “é” (“e” with acute accent).
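To make the variables, modifiers and escapes described above concrete, here is a minimal Java sketch of how such a command template could be expanded. It is not the w2x implementation: the class CommandTemplate and its method expand are hypothetical names. The sketch also makes two assumptions: ~r strips only the last extension, and the file separator is passed explicitly so that the Windows examples can be reproduced on any platform.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of the w2x API.
final class CommandTemplate {
    private CommandTemplate() {}

    static String expand(String template, String in, String out, char sep) {
        StringBuilder sb = new StringBuilder();
        final int n = template.length();
        int i = 0;
        while (i < n) {
            char c = template.charAt(i);
            if (c != '%') { sb.append(c); i++; continue; }
            if (i + 1 >= n) throw new IllegalArgumentException("dangling % at end of template");
            char next = template.charAt(i + 1);
            if (next == '%') { sb.append('%'); i += 2; continue; }   // "%%" is a literal "%"
            if (isHexPair(template, i)) {
                // Collect a run of %HH escapes and decode it as UTF-8.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                while (isHexPair(template, i)) {
                    bytes.write(Integer.parseInt(template.substring(i + 1, i + 3), 16));
                    i += 3;
                }
                sb.append(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
                continue;
            }
            int j = i + 1;            // index of the variable letter
            char modifier = 0;        // 0 means: no modifier
            if (next == '~') {
                if (j + 2 >= n) throw new IllegalArgumentException("incomplete modifier at " + i);
                modifier = template.charAt(j + 1);
                j += 2;
            }
            char var = template.charAt(j);
            if (var == 'S' && modifier == 0) {
                sb.append(sep);
            } else if ("IOio".indexOf(var) >= 0) {
                String path = (var == 'I' || var == 'i') ? in : out;
                String value = applyModifier(path, modifier, sep);
                // Lowercase variables are the quoted forms.
                sb.append(Character.isLowerCase(var) ? "\"" + value + "\"" : value);
            } else {
                throw new IllegalArgumentException("unknown variable at " + i);
            }
            i = j + 1;
        }
        return sb.toString();
    }

    private static boolean isHexPair(String s, int i) {
        return i + 2 < s.length() && s.charAt(i) == '%'
            && Character.digit(s.charAt(i + 1), 16) >= 0
            && Character.digit(s.charAt(i + 2), 16) >= 0;
    }

    private static String applyModifier(String path, char modifier, char sep) {
        if (modifier == 0) return path;
        int slash = path.lastIndexOf(sep);
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        switch (modifier) {
            case 'p': return (slash < 0) ? "" : path.substring(0, slash);
            case 'n': return name;
            case 'r': return (dot > 0) ? name.substring(0, dot) : name;
            case 'e': return (dot > 0) ? name.substring(dot + 1) : "";
            default: throw new IllegalArgumentException("unknown modifier ~" + modifier);
        }
    }
}
```

For example, with the input file “C:\temp\doc_files\logo.wmf”, calling expand on “--outdir %~po %i” yields the quoted parent directory of the output file followed by the quoted input path. None of the variable letters (I, O, i, o, S) is a hexadecimal digit, which is why %HH escapes and variables cannot be confused in this sketch.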

Controlling how image files found in the input DOCX file are converted to standard formats

Conversion of images found in the DOCX file (TIFF, WMF, EMF, etc.) to standard formats (SVG, PNG, JPEG) may be controlled using environment variable (or Java™ system property) W2X_IMAGE_CONVERSIONS.

The default value of this variable is (all specifications on a single line):

.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;

.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl

On Windows, the default value of W2X_IMAGE_CONVERSIONS is (all specifications on a single line):

.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;

.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;

.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl

The syntax of W2X_IMAGE_CONVERSIONS is:

specifications -> “-” | specification_list

specification_list -> specification [ “;” specification ]*

specification -> “+” | image_conversion

image_conversion -> extensions S ( java_image_conversion | external_image_conversion )

extensions -> [ “.” input_file_extension “.” output_file_extension ]+

java_image_conversion -> “java:” fully_qualified_java_class_name parameters

parameters -> [ S parameter_name S possibly_quoted_parameter_value ]*

external_image_conversion -> command_line

About this syntax:

“-” means: no specifications, hence no image conversions at all.

“+” means: insert the default value of W2X_IMAGE_CONVERSIONS at this point. Example:

set W2X_IMAGE_CONVERSIONS=.emf.svg inkscape -l -o %o %i;+

where default value of W2X_IMAGE_CONVERSIONS is (on Windows):

.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;

.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;

.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl

Note that image conversion specifications are considered in the order in which they are declared in variable W2X_IMAGE_CONVERSIONS. In the above example, it is the custom “inkscape -l -o %o %i” specification that is used to convert EMF files (to SVG), not the stock “java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300” specification (which would convert them to PNG).
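The syntax and the "first declared wins" rule can be sketched in Java as follows. This is a simplified illustration, not the parser used by w2x: the class ImageConversions and its members are hypothetical names. The sketch assumes that specifications can be split naively on “;” (a literal semicolon inside a command would have to be written “%3B”), leaves parameters unparsed as part of the action string, and matches file extensions case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of the w2x API.
final class ImageConversions {
    private ImageConversions() {}

    /** Default value on Windows, copied from the text above. */
    static final String WINDOWS_DEFAULT =
        ".wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;"
        + ".emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;"
        + ".tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl";

    /** One input/output extension pair and the action that handles it. */
    static final class Conversion {
        final String inExt;
        final String outExt;
        final String action;   // "java:..." with its parameters, or a command line

        Conversion(String inExt, String outExt, String action) {
            this.inExt = inExt;
            this.outExt = outExt;
            this.action = action;
        }
    }

    /** Parses a W2X_IMAGE_CONVERSIONS value; "+" is replaced by the given defaults. */
    static List<Conversion> parse(String value, String defaults) {
        List<Conversion> result = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.equals("-")) {
            return result;                       // no image conversions at all
        }
        for (String rawItem : trimmed.split(";")) {
            String item = rawItem.trim();
            if (item.isEmpty()) continue;
            if (item.equals("+")) {              // insert the default value here
                if (defaults != null) result.addAll(parse(defaults, null));
                continue;
            }
            // extensions, then whitespace, then the action.
            String[] parts = item.split("\\s+", 2);
            if (parts.length < 2 || !parts[0].startsWith(".")) {
                throw new IllegalArgumentException("bad specification: " + item);
            }
            String[] exts = parts[0].substring(1).split("\\.");
            if (exts.length % 2 != 0) {
                throw new IllegalArgumentException("bad extensions: " + parts[0]);
            }
            for (int k = 0; k < exts.length; k += 2) {
                result.add(new Conversion(exts[k], exts[k + 1], parts[1]));
            }
        }
        return result;
    }

    /** Returns the first declared conversion for this input extension, or null. */
    static Conversion find(List<Conversion> conversions, String inExt) {
        for (Conversion c : conversions) {
            if (c.inExt.equalsIgnoreCase(inExt)) return c;
        }
        return null;
    }
}
```

Parsing “.emf.svg inkscape -l -o %o %i;+” with WINDOWS_DEFAULT as the defaults and then looking up “emf” returns the Inkscape entry, because it is declared before the stock EMF2PNG entry.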