Difference between InvariantCulture and Ordinal string comparison
InvariantCulture
Uses a "standard" set of character orderings (a,b,c, ... etc.). This is in contrast to some specific locales, which may sort characters in different orders ('a-with-acute' may be before or after 'a', depending on the locale, and so on).
Ordinal
On the other hand, looks purely at the numeric values of the UTF-16 code units that represent each character.
There's a great sample at http://msdn.microsoft.com/en-us/library/e6883c06.aspx that shows the results of the various StringComparison values. All the way at the end, it shows (excerpted):
StringComparison.InvariantCulture:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is less than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)
StringComparison.Ordinal:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is greater than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)
You can see that where InvariantCulture yields the order (U+0069, U+0049, U+0131), Ordinal yields (U+0049, U+0069, U+0131).
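A minimal sketch of the same comparison, assuming a console program; the class and method names here are illustrative and not taken from the MSDN sample. It sorts the three characters with each comparer and prints the resulting order as code points. Only the Ordinal order is fixed by the code unit values; the InvariantCulture order comes from the runtime's collation data, so no output is claimed for it here.

```csharp
using System;
using System.Linq;

public static class SortOrderDemo
{
    // Sorts a copy of the input with the given comparer.
    public static string[] SortWith(string[] items, StringComparer comparer) =>
        items.OrderBy(s => s, comparer).ToArray();

    // Formats each one-character string as its code point, e.g. "U+0069".
    public static string Describe(string[] items) =>
        string.Join(", ", items.Select(s => $"U+{(int)s[0]:X4}"));

    public static void Main()
    {
        string[] letters = { "\u0069", "\u0131", "\u0049" }; // i, dotless i, I

        // Ordinal sorts by numeric value: U+0049, U+0069, U+0131
        Console.WriteLine("Ordinal:          " + Describe(SortWith(letters, StringComparer.Ordinal)));

        // The linguistic order depends on the collation data of the runtime.
        Console.WriteLine("InvariantCulture: " + Describe(SortWith(letters, StringComparer.InvariantCulture)));
    }
}
```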
String comparison: InvariantCultureIgnoreCase vs OrdinalIgnoreCase?
If you really want to match only the dot, then StringComparison.Ordinal would be fastest, as there is no case difference.
"Ordinal" doesn't use culture or casing rules, which are not applicable anyway to a symbol like "." (a dot).
Is StringComparison.Ordinal the same as InvariantCulture for testing equality?
It does matter. For example, there is a thing called character expansion:
var s1 = "Strasse";
var s2 = "Straße";
s1.Equals(s2, StringComparison.Ordinal); // false
s1.Equals(s2, StringComparison.InvariantCulture); // true
With InvariantCulture, the ß character gets expanded to ss.
When should I use StringComparison.InvariantCulture instead of StringComparison.CurrentCulture to test string equality?
Combining diacritics / non-normalised strings is one example. See this answer for a decent treatment with code: https://stackoverflow.com/a/31361980/2701753
In summary, for (many) 'alphabets' there are several potential Unicode (and UCS-2) representations for the same glyph (letter).
For example:
Unicode Character “á” (U+00E1) [one unicode codepoint]
Unicode Character “a” (U+0061) [followed by] Unicode Character “◌́” (U+0301) [two unicode codepoints]
so:
á
á
Same linguistic string (for all cultures, they are supposed to represent the same character) but different ordinal string (different bytes).
So Invariant equality comparison is [in this case] like normalising the strings before comparing them.
Look up Unicode normalisation / decomposition for more info.
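The two representations of "á" can be compared in a short sketch. This assumes a runtime with normalization support available (string.Normalize is a standard .NET method); the class and helper names are made up for illustration. It shows that the ordinal comparison treats the two strings as different, while comparing their NFC-normalised forms ordinally treats them as equal. The InvariantCulture result is printed without a claimed value, since it depends on the collation data of the runtime.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Ordinal equality after bringing both strings to the same normalization form (NFC).
    public static bool EqualsNormalized(string a, string b) =>
        string.Equals(a.Normalize(NormalizationForm.FormC),
                      b.Normalize(NormalizationForm.FormC),
                      StringComparison.Ordinal);

    public static void Main()
    {
        string precomposed = "\u00E1";  // á as a single code point
        string decomposed = "a\u0301";  // a followed by COMBINING ACUTE ACCENT

        // Different code units, so the ordinal comparison prints False.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal));

        // After normalisation both are U+00E1, so this prints True.
        Console.WriteLine(EqualsNormalized(precomposed, decomposed));

        // Linguistic comparison; the result comes from the runtime's collation data.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.InvariantCulture));
    }
}
```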
There are other interesting cases, ligatures for example. And left to right and right to left marks and ....
So, in summary, once you have 'interesting' alphabets in play (pretty much anything outside pure ascii), once you are interested in any sort of comparison of the strings as linguistic items / streams of glyphs, you probably do want to go beyond ordinal comparison.
To directly answer the question: if you have a multicultural user base, but still need the above linguistic sensitivity, what culture would you choose for StringComparison.CurrentCulture (for some manually set thread culture, in order to not depend on the machine OS configuration) other than InvariantCulture?
C# String comparisons: Difference between CurrentCultureIgnoreCase and InvariantCultureIgnoreCase
Microsoft gives some decent guidance for when to use the InvariantCulture property:
MSDN: CultureInfo.InvariantCulture Property
... an application should use the invariant culture only for processes that require culture-independent results, such as formatting and parsing data that is persisted to a file. In other cases, it produces results that might be linguistically incorrect or culturally inappropriate.
Security Considerations
If a security decision will be made based on the result of a string comparison or case change, your application should use an ordinal comparison that ignores case instead of using InvariantCulture. [...]
String Operations
If your application needs to perform a culture-sensitive string operation that is not affected by the value of CurrentCulture, it should use a method that accepts a CultureInfo parameter. [...]
Persisting Data
The InvariantCulture property is useful for storing data that will not be displayed directly to users. Storing data in a culture-independent format guarantees a known format that does not change. When users from different cultures access the data, it can be formatted appropriately based on specific user. [...]
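As a rough illustration of the "Persisting Data" guidance, the sketch below writes and reads a number with CultureInfo.InvariantCulture, so the stored text does not depend on the machine's regional settings. The PersistDemo class and its Serialize/Deserialize helpers are hypothetical names for this example, not part of any library.

```csharp
using System;
using System.Globalization;

public static class PersistDemo
{
    // Format for storage: always uses '.' as the decimal separator.
    public static string Serialize(double value) =>
        value.ToString("R", CultureInfo.InvariantCulture);

    // Parse what Serialize wrote, independent of the current culture.
    public static double Deserialize(string text) =>
        double.Parse(text, CultureInfo.InvariantCulture);

    public static void Main()
    {
        string stored = Serialize(1234.5);
        Console.WriteLine(stored);                        // 1234.5
        Console.WriteLine(Deserialize(stored) == 1234.5); // True

        // For display, format with the user's culture instead.
        Console.WriteLine(Deserialize(stored).ToString("N1", CultureInfo.CurrentCulture));
    }
}
```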
Which is generally best to use — StringComparison.OrdinalIgnoreCase or StringComparison.InvariantCultureIgnoreCase?
The newer .NET docs now have a table to help you decide which is best to use in your situation.
From MSDN's "New Recommendations for Using Strings in Microsoft .NET 2.0"
Summary: Code owners previously using the InvariantCulture for string comparison, casing, and sorting should strongly consider using a new set of String overloads in Microsoft .NET 2.0. Specifically, data that is designed to be culture-agnostic and linguistically irrelevant should begin specifying overloads using either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase members of the new StringComparison enumeration. These enforce a byte-by-byte comparison similar to strcmp that not only avoids bugs from linguistic interpretation of essentially symbolic strings, but provides better performance.
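One way this recommendation might look in practice is a lookup table keyed by symbolic strings. The sketch below assumes HTTP-style header names as the "culture-agnostic" data; the HeaderLookupDemo class and its contents are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderLookupDemo
{
    // Keys are symbolic identifiers, so they are compared ordinally, ignoring case.
    public static Dictionary<string, string> CreateHeaders() =>
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["Content-Type"] = "text/plain",
        };

    public static void Main()
    {
        var headers = CreateHeaders();

        Console.WriteLine(headers.ContainsKey("content-type"));                               // True
        Console.WriteLine(string.Equals("FILE", "file", StringComparison.OrdinalIgnoreCase)); // True
        Console.WriteLine(string.Equals("FILE", "file", StringComparison.Ordinal));           // False
    }
}
```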
What is the right string comparison value to be used in a machine to machine communication scenario?
It would seem from the details provided that you are correct in your assumption - you want to find a specific user where the name is "Bob". "Bób" is a different user and should not match, i.e. you are actually trying to match two symbols, and not how the username would be read.
If, however, you were looking up street names, you may want the code to treat "strasse" and "Straße" as the same, as you are doing a linguistic match, i.e. would the client read these two things in the same way.
Why does OrdinalIgnoreCase and InvariantCultureIgnoreCase return different results?
"877495169fa05b9d8639a0ebc42022338f7d2324"
Sounds like a trick question. There's an extra character at the start of this string, before the first digit 8. It isn't visible in the browser. It is U+200E, "Left to Right Mark". The ordinal comparison sees that character; the invariant comparison ignores it. You can see it for yourself by using ToCharArray() on the string.
Delete that string and paste this one instead, I removed U+200E from it:
"877495169fa05b9d8639a0ebc42022338f7d2324"
And the Compare() method now returns 0 like it should. Do watch out for that text editor or IME you are using right now. Isn't Unicode fun?
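A small sketch of how such an invisible character could be spotted and removed. The helper names are hypothetical, and the short strings stand in for the full hash above. Stripping every character in the Unicode "Format" category is an assumption that suits pasted identifiers like this one; it is not something to apply blindly to natural-language text.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class HiddenCharDemo
{
    // Lists every UTF-16 code unit, which makes invisible characters visible.
    public static string DescribeCodeUnits(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    // Drops characters of category Cf (Format), which includes U+200E.
    public static string StripFormatChars(string s) =>
        new string(s.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format).ToArray());

    public static void Main()
    {
        string pasted = "\u200E8774"; // LEFT-TO-RIGHT MARK followed by digits
        string clean = "8774";

        Console.WriteLine(DescribeCodeUnits(pasted)); // U+200E U+0038 U+0037 U+0037 U+0034
        Console.WriteLine(string.Equals(pasted, clean, StringComparison.Ordinal));                   // False
        Console.WriteLine(string.Equals(StripFormatChars(pasted), clean, StringComparison.Ordinal)); // True
    }
}
```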
Why is using StringComparison.Ordinal considered preferable in this situation?
The one-parameter version of IndexOf() that takes a string uses culture-specific comparison. It will behave differently depending on where your code runs. In many cases Ordinal or InvariantCulture comparison is more appropriate. When you look for '/', Ordinal is sufficient, and it also happens to be the simplest and fastest comparison type, so that's what ReSharper recommends.
For cases where you do want to use culture-sensitive comparisons, just specify StringComparison.CurrentCulture, and ReSharper will assume that you know what you are doing. I know you did not ask about this last part. Just adding it for completeness.
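For illustration, a minimal sketch of both choices, using a made-up path string and a hypothetical FindFirstSlash helper. The ordinal search has a fixed result; the culture-sensitive call is included only to show the explicit form, without a claimed result.

```csharp
using System;

public static class SlashSearchDemo
{
    // Searching for a symbol: Ordinal states the intent and skips linguistic rules.
    public static int FindFirstSlash(string path) =>
        path.IndexOf("/", StringComparison.Ordinal);

    public static void Main()
    {
        Console.WriteLine(FindFirstSlash("docs/api/index.html")); // 4
        Console.WriteLine(FindFirstSlash("readme.txt"));          // -1

        // Explicitly culture-sensitive, for when that really is what you want.
        Console.WriteLine("docs/api".IndexOf("/", StringComparison.CurrentCulture));
    }
}
```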
C# anyString .Contains('\0', StringComparison.InvariantCulture) returns true in .NET5 but false in older versions
not a bug, a feature
The issue that I opened has been closed, but they gave a very good explanation. Now... In .NET 5.0 they began using on Windows (on Linux it was already present) a new library for comparing strings, the ICU library. It is the official library of the Unicode Consortium, so it is "the verb". That library is used for CurrentCulture, InvariantCulture (plus the respective IgnoreCase variants) and any other culture. The only exceptions are Ordinal and OrdinalIgnoreCase. The library is targeted at text, and it has some "particular" ideas about non-text. In this particular case, there are some characters that are simply ignored. In the block 0000-00FF I would say the ignored characters are all control codes (please ignore the fact that they are shown as €‚ƒ„†‡ˆ‰Š‹ŒŽ‘’“”•–—™š›œžŸ; at a certain point these characters were remapped somewhere else in Unicode, but the glyphs shown don't reflect it; if you look at their code, for example with char ch = '€'; int val = (int)ch;, you'll see it), and '\0' is a control code.
Now... My personal thinking is that, from today, to compare strings you'll need a master's degree in Unicode Technologies, and I do hope that they'll do some shenanigans in .NET 6.0 to make the default comparison Ordinal (it is one of the proposals for .NET 6.0, the Option B). Note that if you want to make programs that can run in Turkey you already needed a master's degree in Unicode Technologies (see the Turkish i problem).
In general I would say that to look for words that aren't keywords/fixed words, you should use culture-aware comparisons, while to look for keywords/fixed words (for example column names) and symbols/control codes you should use Ordinal comparisons. The problem is when you want to look for both at the same time. Normally in this case you are looking for exact words, so you can use Ordinal. Otherwise it becomes hellish. And I don't even want to think about how Regex works internally in a culture-aware environment, because in that direction there can only be folly and nightmares.
As a side note, even before this change the "default" culture-aware comparisons had some secret shenanigans, for example:
int ix = "ʹ$ʹ".IndexOf("$"); // -1 on .NET Framework or .NET Core <= 3.1
what I had written before
I'll say that it is a bug. There is a similar bug with IndexOf. I've opened an issue on GitHub to track it.
As you have written, Ordinal and OrdinalIgnoreCase work as expected (probably because they don't need to use the new ICU library for handling Unicode).
Some sample code:
Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0", StringComparison.OrdinalIgnoreCase)}");
Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.CurrentCultureIgnoreCase)}");
Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.InvariantCultureIgnoreCase)}");
Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.OrdinalIgnoreCase)}");
Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCultureIgnoreCase)}");
Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCultureIgnoreCase)}");
and
Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0test", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.OrdinalIgnoreCase)}");
Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.CurrentCultureIgnoreCase)}");
Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.InvariantCultureIgnoreCase)}");
Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0test", StringComparison.Ordinal)}");
Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.OrdinalIgnoreCase)}");
Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCulture)}");
Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCultureIgnoreCase)}");
Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCulture)}");
Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCultureIgnoreCase)}");
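The same probe can be written more compactly by looping over the StringComparison values. This is only a sketch: the NullCharDemo class and its Probe helper are illustrative, and it assumes a runtime where string.Contains(string, StringComparison) exists (.NET Core 2.1 or later). Only the Ordinal results are fixed; the culture-sensitive ones depend on the runtime and its globalization library, as described above.

```csharp
using System;

public static class NullCharDemo
{
    // Runs both searches for the given needle and comparison mode.
    public static (bool Contains, int Index) Probe(string needle, StringComparison mode) =>
        ("test".Contains(needle, mode), "test".IndexOf(needle, mode));

    public static void Main()
    {
        foreach (StringComparison mode in Enum.GetValues(typeof(StringComparison)))
        {
            var (contains, index) = Probe("\0", mode);
            // For Ordinal this prints Contains=False IndexOf=-1.
            Console.WriteLine($"{mode,-28} Contains={contains} IndexOf={index}");
        }
    }
}
```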