Linq Where Ignore Accentuation and Case

To ignore case and accents (diacritics) you can first define an extension method like this:

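    using System.Globalization;
    using System.Text;

    public static class StringExtensions
    {
        // Decomposes the string (FormD) so each accented letter becomes a base
        // letter followed by combining marks, then drops those marks.
        public static string RemoveDiacritics(this string s)
        {
            string normalizedString = s.Normalize(NormalizationForm.FormD);
            StringBuilder stringBuilder = new StringBuilder();

            for (int i = 0; i < normalizedString.Length; i++)
            {
                char c = normalizedString[i];
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    stringBuilder.Append(c);
            }

            return stringBuilder.ToString();
        }
    }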

(Modified from Ignoring accented letters in string comparison)

Now you can run your query:

    string queryText = filter.ToUpper().RemoveDiacritics();

    var result = from p in People
                 where p.Name.ToUpper().RemoveDiacritics() == queryText
                 select p;
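A self-contained sketch of the in-memory version follows. The `Person` record, the `AccentInsensitiveSearch` class and the `FindByName` method are hypothetical stand-ins for the question's `People` collection and `filter`. It uses `ToUpperInvariant()` instead of `ToUpper()` so the result does not depend on the current thread culture; that is an adjustment of mine, not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical entity standing in for the question's People rows.
public record Person(string Name);

public static class AccentInsensitiveSearch
{
    // Same idea as RemoveDiacritics above: decompose to FormD,
    // then drop the combining (non-spacing) marks.
    public static string RemoveDiacritics(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper: the query runs in memory (LINQ to Objects),
    // so calling a custom extension method inside "where" is fine here.
    public static List<Person> FindByName(IEnumerable<Person> people, string filter)
    {
        string queryText = filter.ToUpperInvariant().RemoveDiacritics();

        return (from p in people
                where p.Name.ToUpperInvariant().RemoveDiacritics() == queryText
                select p).ToList();
    }
}
```

With people named "José", "JOSE" and "Joseph", `FindByName(people, "jose")` should return the first two.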

This is fine if you are just iterating over an in-memory collection in C#, but if you are using LINQ to SQL it is preferable to avoid non-standard methods (including extension methods) in your LINQ query. LINQ to SQL has to translate the query into SQL, and it has no translation for a custom method such as RemoveDiacritics(), so the query cannot run on SQL Server and benefit from its performance optimizations.

Since there doesn't seem to be a standard way of ignoring accents within LINQ to SQL, in this case I would suggest changing the collation of the column you want to search to a case- and accent-insensitive (CI_AI) one.

With your example:

    ALTER TABLE People ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI

Your query should now ignore accents and case.

Note that you will need to temporarily drop any unique constraint (or index) on the column before running the statement above, and re-create it afterwards, e.g.

    ALTER TABLE People DROP CONSTRAINT UQ_People_Name

Now your LINQ query would simply be:

    var result = from p in People
                 where p.Name == filter
                 select p;
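With the collation in place, SQL Server does the case- and accent-insensitive comparison. If you also need that kind of comparison in C# (for example, to check results in memory), a rough analogue is a culture-aware compare with `CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace`. This is a sketch, not an exact emulation of `SQL_Latin1_General_CP1_CI_AI`: SQL Server's collation rules and .NET's comparison rules are not guaranteed to agree on every character. `CiAiComparison.EqualsCiAi` is a hypothetical helper name.

```csharp
using System.Globalization;

public static class CiAiComparison
{
    private static readonly CompareInfo Invariant = CultureInfo.InvariantCulture.CompareInfo;

    // Hypothetical helper: true when the strings differ only in case and/or
    // diacritics (a rough in-memory analogue of a CI_AI collation).
    public static bool EqualsCiAi(string a, string b) =>
        Invariant.Compare(a, b, CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace) == 0;
}
```

For example, `CiAiComparison.EqualsCiAi("José", "JOSE")` should be true, while `EqualsCiAi("José", "Joseph")` is false.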

See related question here.

Linq Contains without considering accents

Borrowing a similar solution from here:

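    using System.Globalization;
    using System.Linq;

    string[] result = { "hello there", "héllo there", "goodbye" };
    string word = "héllo";

    var compareInfo = CultureInfo.InvariantCulture.CompareInfo;

    // IgnoreNonSpace ignores combining marks such as the acute accent,
    // so both "hello there" and "héllo there" are kept.
    var filtered = result.Where(
        p => compareInfo.IndexOf(p, word, CompareOptions.IgnoreNonSpace) > -1);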
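The same `CompareInfo.IndexOf` call can be wrapped in a reusable extension method. This sketch also adds `CompareOptions.IgnoreCase`, which the answer above does not use, so "HÉLLO there" would match as well; `ContainsIgnoringAccents` is a hypothetical name.

```csharp
using System.Globalization;

public static class AccentInsensitiveContains
{
    private static readonly CompareInfo Invariant = CultureInfo.InvariantCulture.CompareInfo;

    // Hypothetical helper: substring search that ignores diacritics
    // (IgnoreNonSpace) and case (IgnoreCase). Returns false for a null source.
    public static bool ContainsIgnoringAccents(this string source, string value) =>
        source != null &&
        Invariant.IndexOf(source, value, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) >= 0;
}
```

The filter then reads `result.Where(p => p.ContainsIgnoringAccents(word))`. Like the original, this runs in memory; a LINQ provider will not translate it to SQL.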

Ignore acutes in LINQ when using Contains

The answer has been in the comments for a while. Here it is as an answer too:

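Alter the database collation to one that ends with "_AI" to make it accent-insensitive (or "_CI_AI" to make it case-insensitive too). Note that changing the database's default collation does not change the collation of existing columns; those have to be altered individually.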

Is there an easy way to handle accent marks in linq-to-entities queries

You may need to change the collation of the field in the database table to be case- and accent-insensitive for LINQ to work with standard methods like .Contains():

    ALTER TABLE Khans ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI

The "CI" at the end of the collation name means "case-insensitive" and the "AI" means "accent-insensitive."

Ignoring accents in SQL Server using LINQ to SQL

In SQL queries (SQL Server 2000+, as I recall), you do this with something like:

    select MyString, MyId from MyTable where MyString collate Latin1_General_CI_AI = 'aaaa'

I'm not sure if this is possible in Linq, but someone more cozy with Linq can probably translate.

If you are ok with sorting and select/where queries ALWAYS ignoring accents, you can alter the table to specify the same collation on the field(s) with which you are concerned.

Ignoring accents while searching the database using Entity Framework

If you set an accent-insensitive collation order on the Name column then the queries should work as required.

Remove accents from string in LINQ loop - Ucommerce products

It's not clear from your question, but the most probable cause is this line:

    || p.Name.RemoveDiacritics().Contains(whatToSearch)

I think this code queries the database to fetch products, and the LINQ provider cannot find an SQL equivalent for the RemoveDiacritics() function.

EDIT: This will certainly be a lot slower, if it works at all, but let's try anyway:

    if (!string.IsNullOrWhiteSpace(whatToSearch))
    {
        //I've added call to ALL to fetch all products and then filter on
        //code side.
        products = UCommerce.EntitiesV2.Product.All().Where(p =>
            p.VariantSku == null && p.DisplayOnSite &&
            (
                p.Sku.Contains(whatToSearch)
                || p.Name.RemoveDiacritics().Contains(whatToSearch)
                || p.ProductDescriptions.Any(
                    d => d.DisplayName.Contains(whatToSearch)
                      || d.ShortDescription.Contains(whatToSearch)
                      || d.LongDescription.Contains(whatToSearch)
                )
            )
        );
    }
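Whether Ucommerce's `Product.All()` already loads everything into memory is not something the answer confirms. In standard LINQ, the usual way to split the work is to keep translatable filters in the database query and call `AsEnumerable()` before the part that uses the custom method, so the remaining operators run in memory. The sketch below shows that pattern on an in-memory `IQueryable`; the `Product` record, its properties and `ProductSearch.Search` are hypothetical simplifications, not the Ucommerce API. It also strips diacritics from the search term itself, since otherwise a term such as "Café" could never match a name whose accents have been removed.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical, simplified product entity (not the Ucommerce type).
public record Product(string Sku, string Name, bool DisplayOnSite);

public static class ProductSearch
{
    public static string RemoveDiacritics(this string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                sb.Append(ch);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static List<Product> Search(IQueryable<Product> products, string whatToSearch)
    {
        // Strip accents from the search term too, so both sides are compared unaccented.
        string term = whatToSearch.RemoveDiacritics();

        return products
            // Predicates a LINQ provider can usually translate to SQL.
            .Where(p => p.DisplayOnSite)
            // Everything after AsEnumerable() runs in memory (LINQ to Objects),
            // so the custom RemoveDiacritics() call is allowed here.
            .AsEnumerable()
            .Where(p => p.Sku.Contains(whatToSearch)
                     || p.Name.RemoveDiacritics().Contains(term))
            .ToList();
    }
}
```

As in the original, `Contains` here is case-sensitive, so searching "Café" matches "Café noir" but searching "café" does not.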

Ignoring accented letters in string comparison

FWIW, knightfor's answer below (as of this writing) should be the accepted answer.

Here's a function that strips diacritics from a string:

    using System.Globalization;
    using System.Text;

    static string RemoveDiacritics(string text)
    {
        string formD = text.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();

        foreach (char ch in formD)
        {
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }

        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

More details on MichKap's blog (RIP...).

The principle is that it turns 'é' into two successive chars: 'e' followed by a combining acute accent.
It then iterates through the chars and skips the diacritics (the non-spacing marks).

"héllo" becomes "he<acute>llo", which in turn becomes "hello".

    Debug.Assert("hello" == RemoveDiacritics("héllo"));
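To see the decomposition step itself, the sketch below lists the UTF-16 code units of a string before and after `Normalize(NormalizationForm.FormD)`. `NormalizationDemo`, `DescribeCodeUnits` and `DescribeDecomposed` are hypothetical helper names used only for illustration.

```csharp
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: formats each UTF-16 code unit as U+XXXX.
    public static string DescribeCodeUnits(string s) =>
        string.Join(" ", s.Select(ch => $"U+{(int)ch:X4}"));

    // Hypothetical helper: the code units after canonical decomposition (FormD).
    public static string DescribeDecomposed(string s) =>
        DescribeCodeUnits(s.Normalize(NormalizationForm.FormD));
}
```

`DescribeCodeUnits("é")` returns "U+00E9" (one precomposed character), while `DescribeDecomposed("é")` returns "U+0065 U+0301": a plain 'e' followed by COMBINING ACUTE ACCENT, which is the non-spacing mark that RemoveDiacritics skips.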

Note: Here's a more compact .NET4+ friendly version of the same function:

    using System.Globalization;
    using System.Linq;
    using System.Text;

    static string RemoveDiacritics(string text)
    {
        return string.Concat(
            text.Normalize(NormalizationForm.FormD)
                .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
                             UnicodeCategory.NonSpacingMark)
        ).Normalize(NormalizationForm.FormC);
    }
