Are There Any Fuzzy Search or String Similarity Functions Libraries Written for C#

Levenshtein distance implementation:

  • Using LINQ (not really, see comments)
  • Not using LINQ

I have a .NET 1.1 project in which I use the latter. It's simplistic, but works perfectly for what I need. From what I remember it needed a bit of tweaking, but nothing that wasn't obvious.

Fuzzy match in C#

Current versions of .NET don't have fuzzy matching built in.

I have seen and used Soundex (a phonetic algorithm that matches words by how they sound) for this in the past. Here's an article on how to implement Soundex in .NET.
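Since the linked article isn't reproduced here, this is a minimal sketch of classic American Soundex, not the article's code. The class name `Soundex` and method name `Encode` are my own choices, and the sketch assumes plain ASCII letters (anything else is skipped).

```csharp
using System;
using System.Text;

static class Soundex
{
    // Digit for each letter A..Z; '0' marks letters that produce no digit
    // (the vowels plus H, W and Y).
    private const string Codes = "01230120022455012623010202";

    // Returns a four-character code: the first letter followed by three digits.
    public static string Encode(string word)
    {
        if (string.IsNullOrEmpty(word))
            return string.Empty;

        StringBuilder result = new StringBuilder(4);
        char lastCode = '0';

        foreach (char raw in word)
        {
            char c = char.ToUpperInvariant(raw);
            if (c < 'A' || c > 'Z')
                continue; // assumption: ignore anything that is not A-Z

            char code = Codes[c - 'A'];

            if (result.Length == 0)
            {
                // The first letter is kept as-is, but its digit still
                // suppresses an identical digit that follows immediately.
                result.Append(c);
                lastCode = code;
                continue;
            }

            if (code != '0' && code != lastCode)
                result.Append(code);

            // Vowels break a run of identical digits; H and W do not.
            if (c != 'H' && c != 'W')
                lastCode = code;

            if (result.Length == 4)
                break;
        }

        // Pad short codes with zeros, e.g. "Lee" becomes "L000".
        return result.Length == 0 ? string.Empty : result.ToString().PadRight(4, '0');
    }
}
```

Two names are treated as a fuzzy match when their codes are equal, so `Soundex.Encode("Robert")` and `Soundex.Encode("Rupert")` both give `R163`.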

Compare string similarity

using System;

static class LevenshteinDistance
{
    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to turn s into t.
    public static int Compute(string s, string t)
    {
        if (string.IsNullOrEmpty(s))
        {
            if (string.IsNullOrEmpty(t))
                return 0;
            return t.Length;
        }

        if (string.IsNullOrEmpty(t))
            return s.Length;

        int n = s.Length;
        int m = t.Length;
        int[,] d = new int[n + 1, m + 1];

        // initialize the first column and first row of the table to 0, 1, 2, ...
        for (int i = 0; i <= n; i++)
            d[i, 0] = i;
        for (int j = 1; j <= m; j++)
            d[0, j] = j;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
                int min1 = d[i - 1, j] + 1;        // deletion
                int min2 = d[i, j - 1] + 1;        // insertion
                int min3 = d[i - 1, j - 1] + cost; // substitution (or match)
                d[i, j] = Math.Min(Math.Min(min1, min2), min3);
            }
        }

        return d[n, m];
    }
}

Fuzzy Text Matching C#

Let me introduce you to the Levenshtein distance formula. It is awesome:

In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences. The term edit distance is often used to refer specifically to Levenshtein distance.

Personally I used this in a healthcare setting, where Provider names were checked for duplicates. Using the Levenshtein process, we gave them a confidence rating and allowed them to determine if it was a true duplicate or something unique.
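I can't say exactly how that confidence rating was calculated, but one common way to get one is to divide the edit distance by the length of the longer string and subtract from 1. A rough sketch, where the class name `NameSimilarity`, the method names, and the trimming and case folding are all my own assumptions:

```csharp
using System;

static class NameSimilarity
{
    // Levenshtein distance using two rows instead of a full table.
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.Length];
    }

    // Maps the distance to a score from 0.0 (nothing in common) to 1.0 (identical).
    // Assumption: names are compared case-insensitively after trimming.
    public static double Confidence(string a, string b)
    {
        a = (a ?? string.Empty).Trim().ToUpperInvariant();
        b = (b ?? string.Empty).Trim().ToUpperInvariant();

        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0)
            return 1.0; // two empty names are trivially identical

        return 1.0 - (double)Distance(a, b) / longest;
    }
}
```

You would then pick a cutoff (say, flag pairs scoring above 0.85 for a human to review). The right threshold depends on your data, so treat that number as a placeholder.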

How can I check the input if it's nearly same or not?

Google shows me this

Approximate string matching

There are various string distance metrics you could use.

I would recommend Jaro-Winkler. Unlike edit distance, where the result of a comparison is a whole number of edits, JW gives you a score between 0 and 1, and it weights a shared prefix more heavily. It is especially suited for proper names. Also look at this nice tutorial and this SO question.

I haven't worked with C# but here are some implementations of JW I found online:

Impl 1 (They have a DOT NET version too if you look at the file list)

Impl 2
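In case those links go stale, here is a rough C# sketch of Jaro-Winkler based on the textbook definition. It is not taken from either implementation above. The class and method names are mine, and it assumes the usual prefix scale of 0.1 with at most 4 prefix characters.

```csharp
using System;

static class JaroWinkler
{
    // Plain Jaro similarity in [0, 1].
    public static double Jaro(string s1, string s2)
    {
        s1 = s1 ?? string.Empty;
        s2 = s2 ?? string.Empty;
        if (s1.Length == 0 && s2.Length == 0) return 1.0;
        if (s1.Length == 0 || s2.Length == 0) return 0.0;

        // Characters count as matching only if they are at most this far apart.
        int window = Math.Max(0, Math.Max(s1.Length, s2.Length) / 2 - 1);
        bool[] matched1 = new bool[s1.Length];
        bool[] matched2 = new bool[s2.Length];
        int matches = 0;

        for (int i = 0; i < s1.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(s2.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (matched2[j] || s1[i] != s2[j]) continue;
                matched1[i] = matched2[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s1.Length; i++)
        {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1[i] != s2[k]) outOfOrder++;
            k++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / s1.Length + m / s2.Length + (m - transpositions) / m) / 3.0;
    }

    // Jaro score boosted by the length of the common prefix (up to 4 characters).
    public static double Similarity(string s1, string s2)
    {
        s1 = s1 ?? string.Empty;
        s2 = s2 ?? string.Empty;
        double jaro = Jaro(s1, s2);

        int prefix = 0;
        int maxPrefix = Math.Min(4, Math.Min(s1.Length, s2.Length));
        while (prefix < maxPrefix && s1[prefix] == s2[prefix]) prefix++;

        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

For the usual textbook pair, `JaroWinkler.Similarity("MARTHA", "MARHTA")` comes out at about 0.961. The comparison is case-sensitive, so fold case first if that matters for your names.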

If you want to do a bit more sophisticated matching, you can add some custom normalization: lowercase everything, and expand abbreviations that commonly occur in company names, such as ltd/limited, inc/incorporated, corp/corporation. This way if you compute

distance (normalize("foo corp."),
normalize("FOO CORPORATION") )

you should get 0 rather than 14 (which is what you would get if you computed the Levenshtein edit distance on the raw strings).
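As I said, I haven't worked with C#, so take this as a sketch of what such a `normalize` could look like rather than tested production code. The class name, the abbreviation table, and the choice to strip trailing dots and commas are all illustrative assumptions; extend the table for your own data.

```csharp
using System;
using System.Collections.Generic;

static class CompanyNameNormalizer
{
    // Hypothetical abbreviation table; keys are lowercase without punctuation.
    private static readonly Dictionary<string, string> Expansions =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "ltd", "limited" },
            { "inc", "incorporated" },
            { "corp", "corporation" },
        };

    // Lowercases the name, strips dots and commas around each word,
    // expands known abbreviations, and collapses runs of whitespace.
    public static string Normalize(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            return string.Empty;

        string[] words = name.ToLowerInvariant()
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        List<string> parts = new List<string>();
        foreach (string word in words)
        {
            string token = word.Trim('.', ',');
            if (token.Length == 0)
                continue;

            string expanded;
            parts.Add(Expansions.TryGetValue(token, out expanded) ? expanded : token);
        }

        return string.Join(" ", parts);
    }
}
```

With this, `Normalize("foo corp.")` and `Normalize("FOO CORPORATION")` both return `foo corporation`, so any distance metric applied afterwards sees identical strings.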

