In the realm of freely available spell-checking, ispell and its “affix files” are a popular way of defining word lists. See http://fmg-www.cs.ucla.edu/geoff/ispell.html for an introduction to ispell.
If you want to build a dictionary for a language that is not in the distribution, a word list for that language, if one exists, is most likely available in ispell format. Check the following URL: http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html
Note that most of these dictionaries are covered by the GNU General Public License. You therefore have to check whether the license suits your needs: it is always suitable for personal use, but redistributing compiled dictionaries could raise a legal issue with some authors.
You of course need a working ispell installed on your machine (most probably a Unix box). An 8-bit clean version is strongly recommended, built with the compile-time option MASKBITS set to 64. This sounds esoteric, but it should be the case with most recent Linux distributions.
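To check how a given ispell was built, it can be asked for its compile-time options: doubling the -v switch (ispell -vv) prints them along with the version. The following is a minimal sketch rather than an official procedure. The helper name maskbits_of is hypothetical, and it assumes the option listing contains a line that names MASKBITS followed by its numeric value.

```shell
#!/bin/sh
# Sketch only: maskbits_of is a hypothetical helper, not part of ispell.
# It reads an option listing on standard input and prints the number
# that follows the first occurrence of MASKBITS (assumed line format:
# the name MASKBITS, some separator, then the value).
maskbits_of() {
    sed -n 's/.*MASKBITS[^0-9]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (requires ispell to be installed):
#   ispell -vv | maskbits_of
```

If the command prints nothing, the listing did not mention MASKBITS in the assumed form, and the output of ispell -vv should be read by hand.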
The basic job is to expand the affixed word list into a plain word list. To do this, first compile the word list into an ispell "hashfile":
buildhash mydict.txt mydict.aff mydict.hash
Here mydict.txt is the affixed word list and mydict.aff is the affix file.
Then, the word list is expanded this way:
ispell -e -d ./mydict.hash < mydict.txt > mydict.wl
The expanded word list (here mydict.wl) can then be used with the builder.
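The two steps above can be collected in a small shell script. The following is a minimal sketch, not something shipped with ispell: the function names expand_wordlist and one_word_per_line and the output name mydict.words are hypothetical, and buildhash and ispell are assumed to be on the PATH. Because ispell -e writes each root together with its expansions on a single line, the second helper splits that output into one word per line and removes duplicates. Whether the builder needs this form is an assumption, so treat that step as optional.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers, not part of ispell.

# expand_wordlist WORDS AFF OUT
#   WORDS  affixed word list (e.g. mydict.txt)
#   AFF    affix file        (e.g. mydict.aff)
#   OUT    expanded word list to write (e.g. mydict.wl)
expand_wordlist() {
    words=$1
    aff=$2
    out=$3
    # Derive the hash file name from the word list name.
    hash="${words%.*}.hash"
    # ispell looks for names without a slash in its library directory,
    # so make the name explicit, as "./mydict.hash" does above.
    case "$hash" in
        */*) ;;
        *) hash="./$hash" ;;
    esac
    # Step 1: compile word list and affix file into a hash file.
    buildhash "$words" "$aff" "$hash" || return 1
    # Step 2: expand every affixed entry using that hash file.
    ispell -e -d "$hash" < "$words" > "$out"
}

# one_word_per_line: read expanded lines on standard input, write each
# distinct word on its own line, sorted in byte order.
one_word_per_line() {
    tr -s ' ' '\n' | sed '/^$/d' | LC_ALL=C sort -u
}

# Usage (requires ispell and buildhash):
#   expand_wordlist mydict.txt mydict.aff mydict.wl
#   one_word_per_line < mydict.wl > mydict.words
```

Sorting in the C locale keeps the result stable for 8-bit word lists regardless of the locale of the machine running the script.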