How to Use Unicode in Aspell Dictionary

Aspell decodes the dictionary file as latin1 even if both the environment and the Aspell config specify the encoding as UTF-8.

When creating a dictionary with --lang=en, Aspell looks for the en language data file. On my Ubuntu system it looks like this:

name en
charset iso8859-1
special ' -*-
soundslike en
affix en

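So Aspell uses the charset named in that file. To override it when creating the dictionary, pass the --encoding=utf-8 option.

Then, when checking text, set the encoding option as well, so that the input (and the suggested words) are also handled as UTF-8.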
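As a rough sketch of those two steps, the commands below compile a small UTF-8 word list and then check a line of text against it. The file names wordlist.txt and custom.rws and the sample words are made up for illustration, and the script skips the Aspell calls if aspell is not installed.

```shell
#!/bin/sh
# Hypothetical file names; adjust to your own word list and output path.
workdir=$(mktemp -d)

# A small UTF-8 word list, one word per line.
printf '%s\n' 'café' 'naïve' 'résumé' > "$workdir/wordlist.txt"

if command -v aspell >/dev/null 2>&1; then
    # --encoding=utf-8 overrides the charset named in the en language file
    # while the word list is read.
    aspell --lang=en --encoding=utf-8 create master "$workdir/custom.rws" \
        < "$workdir/wordlist.txt"

    # Pass the same encoding when checking, so input is decoded as UTF-8.
    # "list" prints the words Aspell considers misspelled.
    printf '%s\n' 'café cafe' |
        aspell --lang=en --encoding=utf-8 -d "$workdir/custom.rws" list
else
    echo "aspell not found; skipping dictionary build" >&2
fi
```

The dictionary is given as a path containing a slash so that Aspell treats it as a file name rather than looking it up in its dictionary directory.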

Adding many dictionaries to aspell

According to the texinfo manual (info aspell), aspell uses a list option format that is different from other GNU programs, in which the base option name is prefixed with add- or rem- to respectively add or remove items from a list:

4.1.1.3 List options

To add a value to the list, prefix the option name with an 'add-' and
then specify the value to add. For example, to add the URL filter use
'--add-filter url'. To remove a value from a list option, prefix the
option name with a 'rem-' and then specify the value to remove. For
example, to remove the URL filter use '--rem-filter url'. To remove
all items from a list prefix the option name with a 'clear-' without
specify any value. For example, to remove all filters use
'--clear-filter'.

Following this pattern for the --extra-dicts option, you would add multiple extra dictionaries like this:

--add-extra-dicts dict1 --add-extra-dicts dict2

The documentation for Aspell 0.60.7-20110707 also mentions a (possibly newer) more direct delimited list format, using a third prefix lset:

A list option can also be set directly, in which case it will be
set to a single value. To directly set a list option to multiple
values prefix the option name with a 'lset-' and separate each value
with a ':'. For example, to use the URL and TeX filter use
'--lset-filter url:tex'.

Following this format, your option would become

--lset-extra-dicts dict1:dict2
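To make the two list formats concrete, here is a small sketch that builds both argument forms from the same list of dictionary names. The helper names add_dict_args and join_dicts are hypothetical, dict1 and dict2 are the placeholder names used above, and the helpers only print the arguments rather than running aspell.

```shell
#!/bin/sh
# Build the repeated add- form: one --add-extra-dicts per dictionary.
add_dict_args() {
    args=""
    for d in "$@"; do
        # ${args:+ } inserts a separating space only after the first item.
        args="$args${args:+ }--add-extra-dicts $d"
    done
    printf '%s\n' "$args"
}

# Build the value for the lset- form: names joined with ':'.
# The function body runs in a subshell so the IFS change does not leak.
join_dicts() (
    IFS=:
    printf '%s\n' "$*"
)

add_dict_args dict1 dict2
printf -- '--lset-extra-dicts %s\n' "$(join_dicts dict1 dict2)"
```

The lset- form replaces the whole list in one go, while the add- form appends to whatever the list already contains, so the two are only equivalent when the list starts out empty.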

Where to find dictionaries for other languages for IntelliJ?

Current IDEA versions load dictionaries in UTF-8, so you don't need to convert them to the platform encoding; the commands below do not include an iconv conversion step.

The dictionary can be produced with aspell on Unix/Mac OS X or under Cygwin. You need to have aspell and the appropriate dictionary installed.

Here is the command I used for the Russian dictionary:

aspell --lang ru-yeyo dump master | aspell --lang ru expand | tr ' ' '\n' > russian.dic

For German it would be:

aspell --lang de_DE dump master | aspell --lang de_DE expand | tr ' ' '\n' > de.dic
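The same pipeline can be wrapped in a small helper so that only the language code and the output file change. This is a sketch under assumptions: the names split_words and make_dic are hypothetical, and it presumes the Aspell dictionary for the given language is installed.

```shell
#!/bin/sh
# "aspell expand" prints each base word and its expansions on one line,
# separated by spaces; this turns that into one word per line.
split_words() {
    tr ' ' '\n'
}

# make_dic LANG OUTFILE (hypothetical helper):
# dump the master word list, expand affix flags, write one word per line.
make_dic() {
    lang=$1
    out=$2
    aspell --lang "$lang" dump master |
        aspell --lang "$lang" expand |
        split_words > "$out"
}

# The splitting step on a sample line shaped like expand output:
printf 'walk walked walking\n' | split_words

# Example call (requires aspell and the German dictionary):
# make_dic de_DE de.dic
```

The Russian example above uses different codes for the dump and expand steps (ru-yeyo and ru), so it would need a variant of the helper that takes two language arguments.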

