Difference Between InvariantCulture and Ordinal String Comparison


InvariantCulture

Uses a "standard" set of character orderings (a, b, c, ... etc.). This is in contrast to specific locales, which may sort characters in different orders ('a-with-acute' may come before or after 'a', depending on the locale, and so on).

Ordinal

On the other hand, it looks purely at the numeric values of the raw UTF-16 code units (the char values) that represent the characters.
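To make the contrast concrete, here is a minimal sketch (a C# 9+ top-level program, assuming .NET 5 or later) comparing "a" with "B". Ordinal only looks at the code units: 'a' is 0x61 and 'B' is 0x42, so "a" sorts after "B". A linguistic comparison normally puts "a" first, but that result depends on the globalization library in use (in globalization-invariant mode culture comparisons fall back to ordinal), so treat the InvariantCulture line as illustrative.

```csharp
using System;

// Ordinal: compares the numeric UTF-16 code unit values ('a' = 0x61, 'B' = 0x42).
int ordinal = string.Compare("a", "B", StringComparison.Ordinal);

// InvariantCulture: linguistic comparison with culture-independent rules.
int invariant = string.Compare("a", "B", StringComparison.InvariantCulture);

Console.WriteLine($"'a' = 0x{(int)'a':X4}, 'B' = 0x{(int)'B':X4}");
Console.WriteLine($"Ordinal:          {Math.Sign(ordinal)}");   // 1: "a" sorts after "B"
Console.WriteLine($"InvariantCulture: {Math.Sign(invariant)}"); // typically -1: "a" before "B"
```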


There's a great sample at http://msdn.microsoft.com/en-us/library/e6883c06.aspx that shows the results of the various StringComparison values. All the way at the end, it shows (excerpted):

StringComparison.InvariantCulture:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is less than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)

StringComparison.Ordinal:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is greater than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)

You can see that, when sorting these three characters, InvariantCulture yields (U+0069, U+0049, U+0131), while Ordinal yields (U+0049, U+0069, U+0131).
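The two orderings can be reproduced by sorting the same three one-character strings with StringComparer.Ordinal and StringComparer.InvariantCulture. This is a sketch assuming .NET 5+; the ordinal order is fixed by the code point values, while the invariant order shown in the comment is the one the MSDN sample reports and could differ under another globalization library or in invariant-globalization mode.

```csharp
using System;
using System.Linq;

// The three characters from the MSDN sample, as one-character strings.
string[] letters = { "\u0069", "\u0131", "\u0049" }; // i, dotless ı, I

// Sort copies with each comparer.
string[] ordinalOrder = letters.OrderBy(s => s, StringComparer.Ordinal).ToArray();
string[] invariantOrder = letters.OrderBy(s => s, StringComparer.InvariantCulture).ToArray();

// Render each string as its code point, e.g. "U+0049".
static string Describe(string[] items) =>
    string.Join(", ", items.Select(s => $"U+{(int)s[0]:X4}"));

Console.WriteLine($"Ordinal:          {Describe(ordinalOrder)}");   // U+0049, U+0069, U+0131
Console.WriteLine($"InvariantCulture: {Describe(invariantOrder)}"); // MSDN sample: U+0069, U+0049, U+0131
```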

String comparison: InvariantCultureIgnoreCase vs OrdinalIgnoreCase?

If you really only want to match the dot, then StringComparison.Ordinal would be fastest, as a dot has no case variants.

"Ordinal" doesn't apply culture or casing rules, which are not applicable anyway to a symbol like '.'.

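As a small illustration, here is a hypothetical GetExtension helper (not a library API) that looks for the last dot with an explicit ordinal search. It assumes .NET 5+ top-level statements. In real code System.IO.Path.GetExtension already does this; the helper is only there to show the comparison choice. Note that the char overload LastIndexOf('.') is always ordinal too.

```csharp
using System;

// Hypothetical helper: returns the extension (including the dot), or "" if none.
static string GetExtension(string fileName)
{
    // A dot has no case variants and no linguistic meaning, so Ordinal is enough.
    int dot = fileName.LastIndexOf(".", StringComparison.Ordinal);
    return dot < 0 ? "" : fileName.Substring(dot);
}

Console.WriteLine(GetExtension("report.final.txt")); // .txt
Console.WriteLine(GetExtension("README"));           // (empty line)
```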
Is StringComparison.Ordinal the same as InvariantCulture for testing equality?

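It does matter. For example, there is a thing called character expansion:

    using System;

    var s1 = "Strasse";
    var s2 = "Straße";

    Console.WriteLine(s1.Equals(s2, StringComparison.Ordinal));          // False
    Console.WriteLine(s1.Equals(s2, StringComparison.InvariantCulture)); // True with NLS (.NET Framework); may be False with ICU (.NET 5+)

With InvariantCulture, the ß character gets expanded to ss (this is the behaviour of the Windows NLS library; results can differ under the ICU library that .NET 5+ uses, discussed further below).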

When should I use StringComparison.InvariantCulture instead of StringComparison.CurrentCulture to test string equality?

Combining diacritics / non-normalised strings is one example. See this answer for a decent treatment with code: https://stackoverflow.com/a/31361980/2701753

In summary, for (many) 'alphabets' there are several potential Unicode (and UCS-2) representations of the same glyph (letter).

For example:

Unicode Character “á” (U+00E1) [one unicode codepoint]
Unicode Character “a” (U+0061) [followed by] Unicode Character “◌́” (U+0301) [two unicode codepoints]

So both sequences render as the same glyph: á

They are the same linguistic string (in every culture they are supposed to represent the same character) but different ordinal strings (different code units).

So an invariant equality comparison is (in this case) like normalising the strings before comparing them.
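A minimal sketch of the á example, assuming .NET 5+ running with ICU or NLS available (not globalization-invariant mode, where culture comparisons become ordinal). It shows the ordinal mismatch, that normalising both strings to Form C first makes the ordinal comparison agree, and the linguistic comparison for reference.

```csharp
using System;
using System.Text;

string precomposed = "\u00E1";  // á as a single code point
string decomposed  = "a\u0301"; // a + COMBINING ACUTE ACCENT

Console.WriteLine(precomposed.Length); // 1
Console.WriteLine(decomposed.Length);  // 2

// Ordinal sees different code units, so the strings are not equal.
bool ordinalEqual = string.Equals(precomposed, decomposed, StringComparison.Ordinal);

// Normalising both to Form C first makes the ordinal comparison agree.
bool normalizedEqual = string.Equals(
    precomposed.Normalize(NormalizationForm.FormC),
    decomposed.Normalize(NormalizationForm.FormC),
    StringComparison.Ordinal);

// A linguistic comparison is expected to treat them as the same string.
bool invariantEqual = string.Equals(precomposed, decomposed, StringComparison.InvariantCulture);

Console.WriteLine($"Ordinal: {ordinalEqual}, normalised Ordinal: {normalizedEqual}, InvariantCulture: {invariantEqual}");
```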

Look up Unicode normalisation / decomposition for more information.

There are other interesting cases too: ligatures, for example, and left-to-right and right-to-left marks, and so on.

So, in summary: once you have 'interesting' alphabets in play (pretty much anything outside pure ASCII) and you are interested in any sort of comparison of the strings as linguistic items / streams of glyphs, you probably do want to go beyond ordinal comparison.

To answer the question directly: if you have a multicultural user base but still need the linguistic sensitivity described above, what culture would you choose for

StringComparison.CurrentCulture (with some manually set thread culture, so as not to depend on the machine's OS configuration)

other than InvariantCulture?

C# String comparisons: Difference between CurrentCultureIgnoreCase and InvariantCultureIgnoreCase

Microsoft gives some decent guidance for when to use the InvariantCulture property:

MSDN: CultureInfo.InvariantCulture Property

... an application should use the invariant culture only for processes that require culture-independent results, such as formatting and parsing data that is persisted to a file. In other cases, it produces results that might be linguistically incorrect or culturally inappropriate.

Security Considerations

If a security decision will be made based on the result of a string comparison or case change, your application should use an ordinal comparison that ignores case instead of using InvariantCulture. [...]

String Operations

If your application needs to perform a culture-sensitive string operation that is not affected by the value of CurrentCulture, it should use a method that accepts a CultureInfo parameter. [...]

Persisting Data

The InvariantCulture property is useful for storing data that will not be displayed directly to users. Storing data in a culture-independent format guarantees a known format that does not change. When users from different cultures access the data, it can be formatted appropriately based on specific user. [...]
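Two of these recommendations can be sketched in a few lines, assuming .NET 5+. Persisted values are formatted and parsed with CultureInfo.InvariantCulture so the stored text does not depend on regional settings, and a security-relevant check on a symbolic identifier uses OrdinalIgnoreCase. IsBlockedScheme and the "file" scheme are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Globalization;

// Persisting data: format and parse with the invariant culture so the stored
// text is the same regardless of the user's regional settings.
double price = 1234.5;
string stored = price.ToString(CultureInfo.InvariantCulture);
double restored = double.Parse(stored, CultureInfo.InvariantCulture);

// Security decision on a symbolic identifier: ordinal comparison that ignores case.
// Hypothetical rule: the "file" scheme is blocked.
static bool IsBlockedScheme(string scheme) =>
    string.Equals(scheme, "file", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(stored);                  // 1234.5
Console.WriteLine(restored == price);       // True
Console.WriteLine(IsBlockedScheme("FILE")); // True
```

Displaying the restored value to a user would then use that user's culture rather than the invariant one.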

Which is generally best to use — StringComparison.OrdinalIgnoreCase or StringComparison.InvariantCultureIgnoreCase?

Newer .NET docs now have a table to help you decide which is best for your situation.

From MSDN's "New Recommendations for Using Strings in Microsoft .NET 2.0"

Summary: Code owners previously using the InvariantCulture for string comparison, casing, and sorting should strongly consider using a new set of String overloads in Microsoft .NET 2.0. Specifically, data that is designed to be culture-agnostic and linguistically irrelevant should begin specifying overloads using either the StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase members of the new StringComparison enumeration. These enforce a byte-by-byte comparison similar to strcmp that not only avoids bugs from linguistic interpretation of essentially symbolic strings, but provides better performance.
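A common place to apply this advice is a lookup table keyed by symbolic, culture-agnostic strings. The sketch below (assuming .NET 5+) uses HTTP-style header names as an example of such keys and passes StringComparer.OrdinalIgnoreCase to the dictionary, so lookups ignore ASCII casing without any linguistic interpretation.

```csharp
using System;
using System.Collections.Generic;

// Symbolic keys (here: HTTP-style header names) compared with OrdinalIgnoreCase.
var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Content-Type"] = "text/plain",
    ["Content-Length"] = "42",
};

Console.WriteLine(headers["content-type"]);               // text/plain
Console.WriteLine(headers.ContainsKey("CONTENT-LENGTH")); // True
Console.WriteLine(headers.ContainsKey("Content_Type"));   // False: different symbols
```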

What is the right string comparison value to be used in a machine to machine communication scenario?

It would seem from the details provided that your assumption is correct: you want to find a specific user whose name is "Bob". "Bób" is a different user and should not match; you are actually trying to match two sequences of symbols, not how the user name would be read.

If, however, you were looking up street names, you might want the code to treat "strasse" and "Straße" as the same, because you are doing a linguistic match, i.e. would the client read these two things in the same way?
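For the machine-to-machine case, a minimal sketch: a hypothetical in-memory user list and a hypothetical FindUsers helper (in a real system this would be a database query) that matches names with StringComparison.Ordinal, so "Bob" and "Bób" stay distinct users. Assumes .NET 5+.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical user store.
var userNames = new List<string> { "Bob", "Bób", "Alice" };

// Exact symbol match: Ordinal treats "Bob" and "Bób" as different users.
static IEnumerable<string> FindUsers(IEnumerable<string> names, string wanted) =>
    names.Where(n => string.Equals(n, wanted, StringComparison.Ordinal));

Console.WriteLine(string.Join(", ", FindUsers(userNames, "Bob"))); // Bob
```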

Why does OrdinalIgnoreCase and InvariantCultureIgnoreCase return different results?

"‎877495169fa05b9d8639a0ebc42022338f7d2324"

Sounds like a trick question. There's an extra character at the start of this string, before the first digit 8. It isn't visible in the browser. It is U+200E, "Left to Right Mark". The ordinal comparison sees that character; the invariant comparison ignores it. You can see it for yourself by using ToCharArray() on the string.

Delete that string and paste this one instead; I removed the U+200E from it:

"877495169fa05b9d8639a0ebc42022338f7d2324"

And the Compare() method now returns 0 like it should. Do watch out for that text editor or IME you are using right now. Isn't Unicode fun?
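A short sketch (assuming .NET 5+) for spotting this kind of invisible character: dump the first few code points, and, as one option, strip the mark before comparing. The Replace(string, string) overload used here performs an ordinal search.

```csharp
using System;
using System.Linq;

string pasted = "\u200E877495169fa05b9d8639a0ebc42022338f7d2324"; // starts with U+200E
string clean  = "877495169fa05b9d8639a0ebc42022338f7d2324";

// Ordinal comparison sees the invisible LEFT-TO-RIGHT MARK.
Console.WriteLine(string.Compare(pasted, clean, StringComparison.Ordinal) != 0); // True

// Dump the first code points to reveal characters the editor doesn't show.
Console.WriteLine(string.Join(" ", pasted.Take(3).Select(c => $"U+{(int)c:X4}"))); // U+200E U+0038 U+0037

// One option: strip the mark, then compare ordinally.
string stripped = pasted.Replace("\u200E", "");
Console.WriteLine(string.Equals(stripped, clean, StringComparison.Ordinal)); // True
```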

Why is using StringComparison.Ordinal considered preferable in this situation?

The one-parameter IndexOf(string) overload uses culture-specific comparison, so it can behave differently depending on where your code runs. In many cases an Ordinal or InvariantCulture comparison is more appropriate. When you look for '/', Ordinal is sufficient, and it also happens to be the simplest and fastest comparison type, so that's what ReSharper recommends.

For cases where you do want a culture-sensitive comparison, just specify StringComparison.CurrentCulture explicitly, and ReSharper will assume that you know what you are doing. I know you did not ask about this last part; I'm just adding it for completeness.
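A sketch of both cases, assuming .NET 5+ and a made-up route string: an explicit ordinal search for a fixed symbol, the char overload (which is always ordinal), and an explicit CurrentCulture search for when linguistic matching really is intended.

```csharp
using System;

string route = "api/v1/users";

// Searching for a fixed symbol: Ordinal is sufficient and culture-independent.
int firstSlash = route.IndexOf("/", StringComparison.Ordinal); // 3

// The char overload IndexOf('/') is always ordinal as well.
int sameSlash = route.IndexOf('/');                            // 3

// When culture-sensitive matching is really wanted, say so explicitly.
int cultural = route.IndexOf("users", StringComparison.CurrentCulture); // 7

Console.WriteLine($"{firstSlash} {sameSlash} {cultural}");
```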

C# anyString.Contains('\0', StringComparison.InvariantCulture) returns true in .NET 5 but false in older versions

Not a bug, a feature

The issue that I opened has been closed, but they gave a very good explanation. In .NET 5.0 they began using on Windows (on Linux it was already in use) a new library for comparing strings: the ICU library. It is the official library of the Unicode Consortium, so it is "the gospel". That library is used for CurrentCulture, InvariantCulture (plus the respective IgnoreCase variants) and any other culture; the only exceptions are Ordinal/OrdinalIgnoreCase. The library is targeted at text, and it has some "particular" ideas about non-text. In this particular case, some characters are simply ignored. In the block 0000-00FF, I would say the ignored characters are all the control codes, and '\0' is a control code. (Please ignore the fact that some of them may be shown as €‚ƒ„†‡ˆ‰Š‹ŒŽ‘’“”•–—™š›œžŸ: at a certain point these glyphs were remapped elsewhere in Unicode, so the glyph shown doesn't reflect the actual character. If you check the code, for example with char ch = '€'; int val = (int)ch;, you'll see it.)

My personal view is that to compare strings today you'll need a master's degree in Unicode Technologies, and I do hope that they'll change things in .NET 6.0 to make the default comparison Ordinal (it is one of the proposals for .NET 6.0, Option B). Note that if you wanted to write programs that can run in Turkey, you already needed a master's degree in Unicode Technologies (see the Turkish i problem).

In general, I would say that to look for words that aren't keywords/fixed words, you should use culture-aware comparisons, while to look for keywords/fixed words (for example column names) and symbols/control codes, you should use Ordinal comparisons. The problem is when you want to look for both at the same time. Normally in that case you are looking for exact words, so you can use Ordinal. Otherwise it becomes hellish. And I don't even want to think about how Regex works internally in a culture-aware environment, because in that direction there can only be folly and nightmares.
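Applied to fixed keywords and control codes, that advice looks roughly like this sketch (assuming .NET 5+; the semicolon-separated header line is made up for illustration). Ordinal searches neither ignore nor reinterpret anything, and checking char.IsControl sidesteps string comparison rules entirely when you are specifically hunting for control codes.

```csharp
using System;
using System.Linq;

// A made-up header line containing a column name and a trailing '\0'.
string header = "Id;CustomerName;Created\0";

// Fixed keyword and control code: Ordinal, so nothing is ignored.
bool hasColumn = header.Contains("CustomerName", StringComparison.Ordinal);
bool hasNull   = header.Contains("\0", StringComparison.Ordinal);

// For control codes specifically, char.IsControl avoids comparison rules altogether.
bool hasControl = header.Any(char.IsControl);

Console.WriteLine($"{hasColumn} {hasNull} {hasControl}"); // True True True
```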

As a side note, even before .NET 5 the "default" culture-aware comparisons had some secret shenanigans. For example:

    int ix = "ʹ$ʹ".IndexOf("$"); // -1 on .NET Framework or .NET Core <= 3.1
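The usual way around this is to ask for an ordinal search explicitly. A minimal sketch, assuming .NET 5+ and assuming the character around the '$' is U+02B9 MODIFIER LETTER PRIME (it could also be a visually similar character such as U+0374). Only the ordinal result is fixed; the culture-sensitive one depends on the runtime and globalization library.

```csharp
using System;

string s = "\u02B9$\u02B9"; // MODIFIER LETTER PRIME, '$', MODIFIER LETTER PRIME

// Ordinal search looks only at code units, so '$' is found at index 1.
int ordinalIx = s.IndexOf("$", StringComparison.Ordinal);

// The culture-sensitive one-parameter overload may give a different answer.
int cultureIx = s.IndexOf("$");

Console.WriteLine($"Ordinal: {ordinalIx}, culture-sensitive: {cultureIx}");
```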

What I had written before:

I'd say that it is a bug. There is a similar bug with IndexOf. I've opened an issue on GitHub to track it.

As you have written, the Ordinal and OrdinalIgnoreCase work as expected (probably because they don't need to use the new ICU library for handling Unicode).

Some sample code:

    using System;

    // Search for "\0" with every StringComparison value.
    Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0", StringComparison.Ordinal)}");
    Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0", StringComparison.OrdinalIgnoreCase)}");

    Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0", StringComparison.CurrentCulture)}");
    Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.CurrentCultureIgnoreCase)}");

    Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0", StringComparison.InvariantCulture)}");
    Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0", StringComparison.InvariantCultureIgnoreCase)}");

    Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0", StringComparison.Ordinal)}");
    Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.OrdinalIgnoreCase)}");

    Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCulture)}");
    Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.CurrentCultureIgnoreCase)}");

    Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCulture)}");
    Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0", StringComparison.InvariantCultureIgnoreCase)}");

and

    // Same checks, searching for "\0test" instead of "\0".
    Console.WriteLine($"Ordinal Contains null char {"test".Contains("\0test", StringComparison.Ordinal)}");
    Console.WriteLine($"OrdinalIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.OrdinalIgnoreCase)}");

    Console.WriteLine($"CurrentCulture Contains null char {"test".Contains("\0test", StringComparison.CurrentCulture)}");
    Console.WriteLine($"CurrentCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.CurrentCultureIgnoreCase)}");

    Console.WriteLine($"InvariantCulture Contains null char {"test".Contains("\0test", StringComparison.InvariantCulture)}");
    Console.WriteLine($"InvariantCultureIgnoreCase Contains null char {"test".Contains("\0test", StringComparison.InvariantCultureIgnoreCase)}");

    Console.WriteLine($"Ordinal IndexOf null char {"test".IndexOf("\0test", StringComparison.Ordinal)}");
    Console.WriteLine($"OrdinalIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.OrdinalIgnoreCase)}");

    Console.WriteLine($"CurrentCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCulture)}");
    Console.WriteLine($"CurrentCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.CurrentCultureIgnoreCase)}");

    Console.WriteLine($"InvariantCulture IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCulture)}");
    Console.WriteLine($"InvariantCultureIgnoreCase IndexOf null char {"test".IndexOf("\0test", StringComparison.InvariantCultureIgnoreCase)}");

