Extension points
Custom conversion step
The stock conversion steps are: com.xmlmind.w2x.processor.ConvertStep, DeleteFilesStep, EditStep, LoadStep, SaveStep, TransformStep.
A custom conversion step may be implemented by deriving from abstract class com.xmlmind.w2x.processor.ProcessStep. This is straightforward: it suffices to implement a single method, ProcessStep.process.
See reference of class com.xmlmind.w2x.processor.Processor.
Custom image converters
Image converters are used to convert images having a format not supported by Web browsers (TIFF, WMF, EMF, etc.) to a format supported by Web browsers (SVG, PNG, JPEG).
Image converters are specified by interface com.xmlmind.w2x.docx.image.ImageConverterFactory. XMLmind Word To XML ships with four classes implementing this interface:
com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
Image converter factory used to convert TIFF images to PNG or JPEG.
com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory
Image converter factory used to convert WMF graphics to SVG.
com.xmlmind.w2x_ext.emf2png.EMF2PNG
This image converter factory is available only on Windows. It leverages Windows' own GDI+ to convert EMF (in fact, Windows metafiles of any kind, including WMF) to PNG.
This is not ideal because, unlike WMFConverterFactory above, which converts WMF (a Windows vector graphics format) to SVG (a standard vector graphics format), EMF2PNG converts a vector graphics format to a raster image format. However, having EMF2PNG is better than nothing at all.
EMF2PNG has one parameter called resolution. Its value is a real number expressed in dots per inch (DPI). The default value of parameter resolution is -300 (see below).
The resolution parameter specifies the resolution of the output PNG file. 0 means: same resolution as the one found in the input EMF/WMF file; a positive number means: use this value to override the resolution found in the input EMF/WMF file; a negative number means: use the specified absolute value, but only if this absolute value is greater than the resolution found in the input EMF/WMF file.
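The three cases of the resolution parameter can be summarized in a few lines of Java. This is only an illustrative sketch, not the code used by EMF2PNG: the class and method names are hypothetical, and it assumes that a negative value whose absolute value does not exceed the input resolution leaves the input resolution unchanged.

```java
/** Illustrative sketch (hypothetical names) of the "resolution" parameter rules. */
final class ResolutionRule {
    private ResolutionRule() {}

    /**
     * @param param    value of the resolution parameter, in DPI
     * @param inputDpi resolution found in the input EMF/WMF file, in DPI
     * @return resolution to use for the output PNG file, in DPI
     */
    static double effectiveResolution(double param, double inputDpi) {
        if (param == 0) {
            // 0: keep the resolution found in the input file.
            return inputDpi;
        }
        if (param > 0) {
            // Positive: always override the input resolution.
            return param;
        }
        // Negative: the absolute value acts as a minimum resolution.
        // Assumption: otherwise the input resolution is kept as is.
        double minimum = -param;
        return (minimum > inputDpi) ? minimum : inputDpi;
    }

    public static void main(String[] args) {
        // Default value -300 applied to a 96 DPI and to a 600 DPI metafile.
        System.out.println(effectiveResolution(-300, 96));
        System.out.println(effectiveResolution(-300, 600));
    }
}
```

With the default value of -300, a low-resolution metafile is rendered at 300 DPI while a metafile that already declares a higher resolution is not downgraded.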
com.xmlmind.w2x.docx.image.ExternalImageConverter
This image converter factory executes an external program to perform the conversion. See 9.1.2.1 below.
If you want w2x to support more image formats, you’ll have to create your own ImageConverterFactory and register it with w2x using method ImageConverterFactories.register.
About thread-safety
A single instance of a class implementing ImageConverterFactory is used by all instances of com.xmlmind.w2x.processor.Processor. This implies that an implementation of ImageConverterFactory must be thread-safe.
See reference of class com.xmlmind.w2x.docx.image.ImageConverterFactories.
Specifying an external image converter
Examples of W2X_IMAGE_CONVERSIONS specifications (see 9.1.2.2 below):
Convert EMF/WMF to SVG using OpenOffice/LibreOffice:
.emf.svg.wmf.svg soffice --headless --convert-to svg --outdir %~po %i
Or equivalently using unoconv:
.emf.svg.wmf.svg unoconv -f svg -o %o %i
Convert EMF to SVG using Inkscape:
.emf.svg inkscape -l -o %o %i
The command executed by an external image converter may contain the following variables:
Variable | Definition |
%I | Absolute path of the input image file. |
%O | Absolute path of the output image file. |
%i | Same as %I but quoted, that is, equivalent to “%I”. |
%o | Same as %O but quoted, that is, equivalent to “%O”. |
%S | File separator: “\” on Windows, “/” on Mac/Linux. |
The following modifiers may be applied to the %I, %O, %i, %o variables:
Modifier | Definition |
~p | Absolute path of the parent directory of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~pI is “C:\temp\doc_files”. |
~n | Basename of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~nI is “logo.wmf”. |
~r | Basename of the file without any extension. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~rI is “logo”. |
~e | Extension of the file. For example, if %I is “C:\temp\doc_files\logo.wmf”, then %~eI is “wmf”. |
Also note that “%%” may be used to escape character “%”. More generally, just like in a URL, a %HH UTF-8 sequence may be used to escape any character. Example: “%3B” is “;” (semicolon), “%C3%A9” is “é” (“e” with acute accent).
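To make the variables, modifiers and escapes described above more concrete, here is a minimal sketch of how such a command template could be expanded. It is not the w2x implementation: class CommandTemplate and its methods are hypothetical, the file separator is passed explicitly so that the example behaves the same on all platforms, and an unrecognized “%” sequence is assumed to be copied as is.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch (hypothetical names) of command template expansion. */
final class CommandTemplate {
    private CommandTemplate() {}

    static String expand(String template, String in, String out, char sep) {
        StringBuilder result = new StringBuilder();
        ByteArrayOutputStream escaped = new ByteArrayOutputStream();
        int n = template.length();
        int i = 0;
        while (i < n) {
            char c = template.charAt(i);
            if (c == '%' && i + 2 < n
                && isHex(template.charAt(i + 1)) && isHex(template.charAt(i + 2))) {
                // %HH: collect the byte; consecutive bytes form one UTF-8 sequence.
                escaped.write(Integer.parseInt(template.substring(i + 1, i + 3), 16));
                i += 3;
                continue;
            }
            flush(escaped, result);
            if (c != '%' || i + 1 >= n) {
                result.append(c);
                i++;
                continue;
            }
            char next = template.charAt(i + 1);
            if (next == '%') { result.append('%'); i += 2; continue; }
            if (next == 'S') { result.append(sep); i += 2; continue; }
            char modifier = 0;
            int varPos = i + 1;
            if (next == '~' && i + 3 < n) {
                modifier = template.charAt(i + 2);
                varPos = i + 3;
            }
            char var = template.charAt(varPos);
            if ("IOio".indexOf(var) < 0
                || (modifier != 0 && "pnre".indexOf(modifier) < 0)) {
                // Assumption: an unknown sequence is copied literally.
                result.append(c);
                i++;
                continue;
            }
            String path = (var == 'I' || var == 'i') ? in : out;
            String value = applyModifier(path, modifier, sep);
            if (Character.isLowerCase(var)) {
                // %i and %o are the quoted forms of %I and %O.
                result.append('"').append(value).append('"');
            } else {
                result.append(value);
            }
            i = varPos + 1;
        }
        flush(escaped, result);
        return result.toString();
    }

    private static String applyModifier(String path, char modifier, char sep) {
        int slash = path.lastIndexOf(sep);
        String parent = (slash < 0) ? "" : path.substring(0, slash);
        String name = path.substring(slash + 1);
        int dot = name.lastIndexOf('.');
        switch (modifier) {
            case 'p': return parent;                                    // parent directory
            case 'n': return name;                                      // basename
            case 'r': return (dot <= 0) ? name : name.substring(0, dot); // basename, no extension
            case 'e': return (dot <= 0) ? "" : name.substring(dot + 1);  // extension
            default:  return path;                                      // no modifier
        }
    }

    private static boolean isHex(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
    }

    private static void flush(ByteArrayOutputStream escaped, StringBuilder result) {
        if (escaped.size() > 0) {
            result.append(new String(escaped.toByteArray(), StandardCharsets.UTF_8));
            escaped.reset();
        }
    }

    public static void main(String[] args) {
        String in = "C:\\temp\\doc_files\\logo.wmf";
        String out = "C:\\temp\\doc_files\\logo.svg";
        System.out.println(expand("inkscape -l -o %o %i", in, out, '\\'));
    }
}
```

Because none of the variable letters (I, O, i, o, S) nor “~” is a hexadecimal digit, a %HH escape can be told apart from a variable reference by looking at the two characters following “%”.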
Controlling how image files found in the input DOCX file are converted to standard formats
Conversion of images found in the DOCX file (TIFF, WMF, EMF, etc.) to standard formats (SVG, PNG, JPEG) may be controlled using environment variable (or Java™ property) W2X_IMAGE_CONVERSIONS.
The default value of this variable is (all specifications on a single line):
.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;
.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
On Windows, the default value of W2X_IMAGE_CONVERSIONS is (all specifications on a single line):
.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;
.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;
.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
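Since the specifications are shown here on several lines only for readability, the following sketch shows one way to assemble the actual single-line value and set it as a Java property. Class ImageConversionsProperty and its join method are hypothetical helpers. When exactly w2x reads this property is not stated here, so the sketch assumes it should be set before any w2x class is used.

```java
/** Illustrative sketch (hypothetical names): building the single-line value. */
final class ImageConversionsProperty {
    private ImageConversionsProperty() {}

    /** Joins specifications with ";" as required by the syntax. */
    static String join(String... specifications) {
        return String.join(";", specifications);
    }

    public static void main(String[] args) {
        String value = join(
            ".wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory",
            ".tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl");
        // Assumption: set the property before using any w2x class.
        System.setProperty("W2X_IMAGE_CONVERSIONS", value);
        System.out.println(System.getProperty("W2X_IMAGE_CONVERSIONS"));
    }
}
```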
The syntax of W2X_IMAGE_CONVERSIONS is:
specifications -> “-” | specification_list
specification_list -> specification [ “;” specification ]+
specification -> “+” | image_conversion
image_conversion -> extensions S ( java_image_conversion | external_image_conversion )
extensions -> [ “.” input_file_extension “.” output_file_extension ]+
java_image_conversion -> “java:” fully_qualified_java_class_name parameters
parameters -> [ S parameter_name S possibly_quoted_parameter_value ]*
external_image_conversion -> command_line
About this syntax:
“-” means: no specifications; hence no image conversions at all.
“+” means: insert default value of W2X_IMAGE_CONVERSIONS at this point. Example:
set W2X_IMAGE_CONVERSIONS=.emf.svg inkscape -l -o %o %i;+
where default value of W2X_IMAGE_CONVERSIONS is (on Windows):
.wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;
.emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;
.tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl
Note that the image conversion specifications are considered in the order of their declarations in variable W2X_IMAGE_CONVERSIONS. In the case of the above example, it’s custom “inkscape -l -o %o %i” which is used to convert EMF (to SVG) and not stock “java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300” (which would convert EMF to PNG).
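The handling of “-”, “+” and declaration order can be sketched as follows. This is an illustration under stated assumptions, not the parser used by w2x: class ConversionSpecs is hypothetical, specifications are split naively on “;” (a literal semicolon is assumed to be written “%3B”), and extensions are assumed to be compared exactly.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (hypothetical names) of W2X_IMAGE_CONVERSIONS resolution. */
final class ConversionSpecs {
    private ConversionSpecs() {}

    /** Returns the ordered specifications, with "+" replaced by the defaults. */
    static List<String> resolve(String value, String defaults) {
        List<String> specs = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.equals("-")) {
            return specs; // "-": no image conversions at all
        }
        for (String part : trimmed.split(";")) {
            String spec = part.trim();
            if (spec.isEmpty()) {
                continue;
            }
            if (spec.equals("+")) {
                // "+": insert the default specifications at this point.
                specs.addAll(resolve(defaults, ""));
            } else {
                specs.add(spec);
            }
        }
        return specs;
    }

    /** Returns the first specification accepting the given input extension, or null. */
    static String firstMatch(List<String> specs, String inputExtension) {
        for (String spec : specs) {
            String extensions = spec.split("\\s+", 2)[0]; // e.g. ".emf.png.wmf.png"
            if (!extensions.startsWith(".")) {
                continue;
            }
            String[] tokens = extensions.substring(1).split("\\.");
            // Tokens come in (input extension, output extension) pairs.
            for (int i = 0; i + 1 < tokens.length; i += 2) {
                if (tokens[i].equals(inputExtension)) {
                    return spec;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String defaults =
            ".wmf.svg java:com.xmlmind.w2x_ext.wmf_converter.WMFConverterFactory;"
            + ".emf.png.wmf.png java:com.xmlmind.w2x_ext.emf2png.EMF2PNG resolution -300;"
            + ".tiff.png java:com.xmlmind.w2x.docx.image.ImageConverterFactoryImpl";
        List<String> specs = resolve(".emf.svg inkscape -l -o %o %i;+", defaults);
        System.out.println(firstMatch(specs, "emf"));
    }
}
```

Because the custom Inkscape specification is declared before “+”, it is the first one found for extension “emf”, which mirrors the precedence rule described above.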