How do I sort unicode strings alphabetically in Python?
IBM's ICU library does that (and a lot more). It has Python bindings: PyICU.
Update: The core difference in sorting between ICU and locale.strcoll is that ICU uses the full Unicode Collation Algorithm while strcoll uses ISO 14651.
The differences between those two algorithms are briefly summarized here: http://unicode.org/faq/collation.html#13. These are rather exotic special cases, which should rarely matter in practice.
>>> import icu # pip install PyICU
>>> sorted(['a','b','c','ä'])
['a', 'b', 'c', 'ä']
>>> collator = icu.Collator.createInstance(icu.Locale('de_DE.UTF-8'))
>>> sorted(['a','b','c','ä'], key=collator.getSortKey)
['a', 'ä', 'b', 'c']
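For comparison, the locale.strcoll route mentioned above needs only the standard library. The following is a minimal sketch, assuming the requested locale (de_DE.UTF-8 by default here) is installed on the system. sort_with_strcoll is a hypothetical helper name, and the resulting order depends on the platform's C library rather than on ICU.

```python
import functools
import locale


def sort_with_strcoll(words, locale_name="de_DE.UTF-8"):
    """Sort words with the C library's collation rules for locale_name.

    Hypothetical helper: raises locale.Error if the locale is not installed.
    """
    # Remember the current collation setting so it can be restored afterwards.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll is a comparison function, so wrap it for use as a sort key.
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

Note that setlocale changes process-wide state, which is one reason a collator object such as the ICU one above can be easier to work with in larger programs.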
How to sort unicode strings alphabetically in Common Lisp?
If you use SBCL, Unicode support is built in; see "String operations" in the SBCL manual.
Try sorting with sb-unicode:unicode< instead of string-lessp.
Sorting strings with accented characters in python
I finally chose to strip diacritics and compare the stripped version of the strings so that I don't have to add the PyICU dependency.
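A minimal sketch of that approach, using only the standard library. strip_diacritics is a hypothetical helper name, and the word list is made up for illustration. It decomposes each string (NFD) and drops the combining marks, so "ä" compares as "a". This is only an approximation of real collation: letters that are not base letter plus accent (for example "ø" or "ß") are left unchanged.

```python
import unicodedata


def strip_diacritics(text):
    """Return text with combining marks (accents, umlauts, ...) removed."""
    # NFD splits "ä" into "a" followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = ["zebra", "élan", "eagle", "ähnlich", "apple"]

# Sort on the stripped, case-folded form; the original word breaks ties
# so that the order is deterministic.
sorted_words = sorted(words, key=lambda w: (strip_diacritics(w).casefold(), w))
print(sorted_words)
# ['ähnlich', 'apple', 'eagle', 'élan', 'zebra']
```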
How to sort a list with an exception in Python
Here is a simple solution based on the Turkish alphabet:
alphabet = "abcçdefgğhıijklmnoöprsştuüvyz"
words = ["merhaba", "aşk", "köpek", "Teşekkürle"]
sorted_words = sorted(words, key=lambda word: tuple(alphabet.index(c) for c in word.lower()))
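This code sorts the words lexicographically according to the order of the Turkish alphabet. Capital letters are handled by lowercasing each word first, with two caveats: str.lower() is not Turkish-aware (it maps "I" to "i" rather than "ı"), and any character that is not in alphabet (a digit, a space, a combining mark) makes alphabet.index raise ValueError.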
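A slightly more defensive variant of the same idea is sketched below. The names TURKISH_ALPHABET, RANK, turkish_lower and turkish_key are hypothetical, and the extra words are made up for illustration. It assumes that characters outside the alphabet should sort after all Turkish letters, and it handles the dotted and dotless I explicitly before lowercasing.

```python
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
# Precompute each letter's position so lookups do not rescan the string.
RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}


def turkish_lower(word):
    """Lowercase with the Turkish I rules: "I" -> "ı" and "İ" -> "i"."""
    # str.lower() alone would turn "I" into "i" and "İ" into "i" + U+0307.
    return word.replace("I", "ı").replace("İ", "i").lower()


def turkish_key(word):
    """Sort key: one (rank, character) pair per character of the word."""
    # Assumption: unknown characters rank after every alphabet letter and
    # are ordered among themselves by code point.
    return [(RANK.get(ch, len(RANK)), ch) for ch in turkish_lower(word)]


words = ["merhaba", "aşk", "köpek", "Teşekkürle", "Işık", "İzmir", "zeytin", "çay"]
print(sorted(words, key=turkish_key))
# ['aşk', 'çay', 'Işık', 'İzmir', 'köpek', 'merhaba', 'Teşekkürle', 'zeytin']
```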