Comparing two strings in SQL Server
There is no direct string compare function in SQL Server, but a CASE expression can return -1, 0 or 1 like one:
CASE
WHEN str1 = str2 THEN 0
WHEN str1 < str2 THEN -1
WHEN str1 > str2 THEN 1
ELSE NULL -- one of the strings is NULL, so they cannot be compared
END
Notes
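- you can wrap this in a UDF using CREATE FUNCTION
- NULL handling: if either string is NULL, every WHEN test evaluates to UNKNOWN, so the expression returns NULL through the ELSE branch
- the ordering follows the collation of the strings, so under a case-insensitive collation 'abc' and 'ABC' compare as 0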
- str1 and str2 will be column names or @variables
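The three-way logic of the CASE expression can be modeled in a few lines of Python, which makes the NULL behavior easy to test. This is a sketch under assumptions: None stands in for SQL NULL, the name sql_compare is hypothetical, and Python compares strings by code point (like a binary, case-sensitive collation), whereas SQL Server follows the collation of the columns or variables.

```python
def sql_compare(str1, str2):
    """Model of the CASE expression: -1, 0 or 1, or None when either side is NULL."""
    # None plays the role of SQL NULL: every WHEN test is UNKNOWN, so ELSE NULL applies
    if str1 is None or str2 is None:
        return None
    if str1 == str2:
        return 0
    if str1 < str2:
        return -1
    return 1
```

For example, sql_compare("apple", "banana") gives -1 and sql_compare(None, "x") gives None.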
SQL Compare Characters in two strings count total identical
You could write a script along the lines of this PHP script:
$words = array();      // id => distinct lower-case words of each business name read so far
$duplicates = array(); // id => array(earlier id => words the two names share)
$mysqli = new mysqli('localhost', 'username', 'password', 'database');
$query = "SELECT id, business_name FROM `table`"; // replace `table` with your table name
if ($result = $mysqli->query($query)) {
    while ($row = $result->fetch_object()) {
        // Strip punctuation, then split the name into lower-case words
        $pattern = '#[^\w\s]+#';
        $name = preg_replace($pattern, '', $row->business_name);
        $_words = array_unique(preg_split('#\s+#', strtolower($name), -1, PREG_SPLIT_NO_EMPTY));
        // Compare this name with every earlier one; array_intersect keeps the shared words
        foreach ($words as $id => $other) {
            $common = array_intersect($_words, $other);
            if (!empty($common)) {
                $duplicates[$row->id][$id] = array_values($common);
            }
        }
        $words[$row->id] = $_words;
    }
    $result->close(); // close the result set once, after every row has been read
}
$mysqli->close();
This is untested, but you need something like this, because I don't think this is possible with SQL alone.
---------- EDIT ----------
Alternatively, look into what the commenters recommended: Levenshtein distance in T-SQL.
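For reference, here is a minimal Python sketch of the Levenshtein distance itself (the classic dynamic-programming formulation), to show what a T-SQL port would have to compute. It is not T-SQL and not taken from any particular published implementation; the function name levenshtein is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)  # free if the characters match
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept, so memory grows with the shorter string; the running time is still proportional to the product of both lengths, which is why the approach gets slow on large tables.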
Hope it helps, good luck!
TSQL function to compare two strings
Not really sure what you are looking for. From your question, I understand that you need to check 2 email addresses for similarity / dissimilarity.
Why can you not use this?
declare @email1 varchar(100) set @email1 = 'billg@microsoft.com'
declare @email2 varchar(100) set @email2 = 'melinda@microsoft.com'
IF
@email1=@email2
BEGIN
PRINT 'Same Email'
END
ELSE
BEGIN
PRINT 'Not Same Email'
END
Raj
Compare Two Strings For Common Value
If your database is SQL Server 2016 or later (compatibility level 130 or higher), you can apply the STRING_SPLIT() function with CROSS APPLY to each of your tables and then keep only the common values with the INTERSECT operator:
SELECT value
FROM tab1
CROSS APPLY STRING_SPLIT(str, ' ')
INTERSECT
SELECT value
FROM tab2
CROSS APPLY STRING_SPLIT(str, ' ')
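This yields case-insensitive matching among the split words when the columns use a case-insensitive collation; INTERSECT also returns each common word only once.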
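The set logic of that query can be sketched in Python to see what it returns. Assumptions: each table is a list of strings (one per row), casefold() stands in for a case-insensitive collation, and the name common_values is hypothetical. Like STRING_SPLIT with a single-space separator, split(" ") yields empty strings for consecutive spaces, so both sides may contain '' as a value.

```python
def common_values(tab1, tab2):
    def split_all(rows):
        # CROSS APPLY STRING_SPLIT(str, ' '): one value per space-separated piece of every row
        return {piece.casefold() for row in rows for piece in row.split(" ")}

    # INTERSECT keeps the distinct values present on both sides
    return split_all(tab1) & split_all(tab2)
```

For example, common_values(["Red Apple", "green pear"], ["apple pie", "PEAR tart"]) returns {"apple", "pear"}. The SQL version returns the original spelling from one side rather than the folded form.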
Find sql records containing similar strings
If you really want to define similarity exactly as formulated in your question, then, as you say, you would have to implement the Levenshtein distance calculation, either in client code, computed for each row retrieved by a DataReader, or as a SQL Server function.
The problem is actually trickier than it may appear at first sight, because you cannot know in advance which elements the two strings share.
So in addition to the Levenshtein distance, you probably also want to specify a minimum number of consecutive characters that must match for the strings to count as sufficiently similar.
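One way to measure that consecutive-character requirement is the length of the longest common substring, computed with the standard dynamic-programming table. This Python sketch only illustrates the logic; the name longest_common_substring and whatever minimum you compare its result against are assumptions, not part of the answer above.

```python
def longest_common_substring(a, b):
    """Length of the longest run of consecutive characters shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)  # run lengths ending at the previous character of a
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the run that ended one character earlier in both strings
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best
```

A pair could then be treated as similar only when the Levenshtein distance is small enough and this run is at least your chosen minimum.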
In sum, it sounds like an overly complicated and slow approach.
Interestingly, in SQL Server 2008 you have the DIFFERENCE function which may be used for something like this.
It compares the SOUNDEX (phonetic) codes of two strings and returns a value from 0 to 4, where 4 indicates the strongest similarity. I'm unsure whether you will get it to work properly for multi-word expressions such as movie titles, since it doesn't deal well with spaces or numbers and puts too much emphasis on the beginning of the string, but it is still an interesting function to be aware of.
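To illustrate what "phonetic value" means, below is a Python sketch of standard American Soundex plus a simple DIFFERENCE-like score that counts the agreeing positions of the two 4-character codes. This is an approximation under assumptions: SQL Server's own SOUNDEX and DIFFERENCE implementations may differ in edge cases, and the names soundex and difference here are hypothetical Python functions, not the T-SQL ones.

```python
# Standard American Soundex digit groups; vowels, H, W and Y get no digit.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]  # keep the first letter as-is
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal digits; vowels do
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


def difference(a, b):
    # Approximate score 0..4: positions where the two codes agree
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))
```

Because only the first letter and the next three consonant groups survive, two long titles that start alike score high even if they diverge later, which matches the weakness described above.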
If what you are actually trying to build is some sort of search feature, then you should look into the Full-Text Search capabilities of SQL Server 2008. It provides built-in thesaurus support, dedicated predicates such as CONTAINS and FREETEXT, and a ranking mechanism for "best matches".
EDIT: If you are looking to eliminate duplicates, maybe you could look into the SSIS Fuzzy Lookup and Fuzzy Grouping transformations. I have not tried them myself, but they look like a promising lead.
EDIT2: If you don't want to dig into SSIS and still struggle with the performance of the Levenshtein distance algorithm, you could perhaps try this algorithm, which appears to be less complex.
I need to identify strings, in sql server, that contain the same keywords as a given string in no particular order
Your approach is very much row-based. Here is a set-based approach: less code, easier maintenance, and faster.
DECLARE @forbiddenWords TABLE(item VARCHAR(100));
INSERT INTO @forbiddenWords VALUES ('&'),( 'a'),( 'and'),( 'at'),( 'by'),( 'can'),( 'for'),( 'if'),( 'in'),( 'is'),( 'it'),( 'of'),( 'on'),( 'or'),( 'the'),( 'this'),( 'to'),( 'too'),( 'verizon'),( 'with'),( 'your')
DECLARE @breakingCharacters TABLE(item VARCHAR(100));
INSERT INTO @breakingCharacters VALUES(':'),(';'),(','),('!'),('-'),('?'),('.'),('%'),('$'),('&'),('£'),('"');
DECLARE @Phrase1 VARCHAR(MAX)='This is a text where I try to find similar words. Let''s see if it works!';
DECLARE @Phrase2 VARCHAR(MAX)='This is another text where I use some words of Phrase1 to check their similarity!';
--Replace all breaking Characters
SELECT @Phrase1=REPLACE(@Phrase1,item,' ')
FROM @breakingCharacters;
SELECT @Phrase2=REPLACE(@Phrase2,item,' ')
FROM @breakingCharacters;
WITH Splitted AS
(
SELECT CAST('<x>' + REPLACE(LOWER(@Phrase1),' ','</x><x>') + '</x>' AS xml) AS Phrase1AsXml
,CAST('<x>' + REPLACE(LOWER(@Phrase2),' ','</x><x>') + '</x>' AS xml) AS Phrase2AsXml
)
,Phrase1AsFilteredWords AS
(
SELECT DISTINCT The.word.value('.','varchar(max)') AS OneWord
FROM Splitted
CROSS APPLY Phrase1AsXml.nodes('/x') AS The(word)
WHERE LEN(The.word.value('.','varchar(max)'))>0
AND NOT EXISTS(SELECT * FROM @forbiddenWords AS fw WHERE fw.item = The.word.value('.','varchar(max)') )
)
,Phrase2AsFilteredWords AS
(
SELECT DISTINCT The.word.value('.','varchar(max)') AS OneWord
FROM Splitted
CROSS APPLY Phrase2AsXml.nodes('/x') AS The(word)
WHERE LEN(The.word.value('.','varchar(max)'))>0
AND NOT EXISTS(SELECT * FROM @forbiddenWords AS fw WHERE fw.item = The.word.value('.','varchar(max)') )
)
,CommonWords AS
(
SELECT p1.OneWord
FROM Phrase1AsFilteredWords AS p1
INNER JOIN Phrase2AsFilteredWords AS p2 ON p1.OneWord=p2.OneWord
)
,WordCounter AS
(
SELECT
(SELECT COUNT(*) FROM Phrase1AsFilteredWords) AS CountPhrase1
,(SELECT COUNT(*) FROM Phrase2AsFilteredWords) AS CountPhrase2
,(SELECT COUNT(*) FROM CommonWords) AS CountCommon
)
SELECT WordCounter.*
,(CountCommon*100) / CountPhrase1 AS Phrase1PC
,(CountCommon*100) / CountPhrase2 AS Phrase2PC
,STUFF((
SELECT ', ' + OneWord
FROM CommonWords
FOR XML PATH('')
),1,2,'') AS CommonWords
FROM WordCounter
The result:
CountPhrase1  CountPhrase2  CountCommon  Phrase1PC  Phrase2PC  CommonWords
10            11            4            40         36         i, text, where, words
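To sanity-check these numbers outside the database, here is a Python sketch of the same pipeline: replace the breaking characters, lower-case, split on spaces, drop empty pieces and forbidden words, then intersect the distinct word sets. Assumptions: this is Python, not T-SQL; the names filtered_words and compare_phrases are hypothetical; the common words are sorted for a stable result, whereas the SQL output order is not guaranteed; and it returns 0 for an empty word list where the T-SQL version would raise a divide-by-zero error.

```python
FORBIDDEN = {"&", "a", "and", "at", "by", "can", "for", "if", "in", "is", "it",
             "of", "on", "or", "the", "this", "to", "too", "verizon", "with", "your"}
BREAKING = [":", ";", ",", "!", "-", "?", ".", "%", "$", "&", "£", '"']


def filtered_words(phrase):
    # Replace every breaking character with a space, like the REPLACE over @breakingCharacters
    for ch in BREAKING:
        phrase = phrase.replace(ch, " ")
    # Lower-case, split on single spaces, drop empty pieces and forbidden words;
    # building a set has the same effect as SELECT DISTINCT
    return {w for w in phrase.lower().split(" ") if w and w not in FORBIDDEN}


def compare_phrases(phrase1, phrase2):
    words1, words2 = filtered_words(phrase1), filtered_words(phrase2)
    common = words1 & words2
    # Integer percentages, like (CountCommon*100) / CountPhrase1 on int values
    pc1 = len(common) * 100 // len(words1) if words1 else 0
    pc2 = len(common) * 100 // len(words2) if words2 else 0
    return len(words1), len(words2), len(common), pc1, pc2, ", ".join(sorted(common))
```

Called with the two example phrases, compare_phrases returns (10, 11, 4, 40, 36, "i, text, where, words"), matching the result row.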
One hint: if you compare many phrases with many others, repeating this calculation for every pair is expensive. I'd advise you to prepare (split and filter) all phrases in one go and compare these prepared results.
One more hint: if you do this often and your phrases don't change, it could be clever to store the prepared word lists permanently.
Happy coding!
T-SQL - compare strings char by char
For columns in a table you don't want a row-by-row approach; try this one instead:
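-- Numbers 1..9, one per character position; raise the limit to your longest string length
-- (recursive CTEs stop at 100 levels unless you add OPTION (MAXRECURSION ...))
with cte(n) as (
select 1
union all
select n + 1 from cte where n < 9
)
select
t.s1, t.s2,
-- count the positions whose characters differ; SUBSTRING past the end returns ''
sum(
case
when substring(t.s1, c.n, 1) <> substring(t.s2, c.n, 1) then 1
else 0
end
) as diff
from test as t
cross join cte as c
group by t.s1, t.s2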
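A Python sketch of the same position-by-position count can help check the expected diff values. Assumptions: the comparison is binary and case-sensitive (SQL Server would follow the column collation and treat trailing spaces as insignificant), the name char_diff is hypothetical, and max_len plays the role of the hard-coded 9 in the CTE.

```python
def char_diff(s1, s2, max_len=9):
    """Count positions 1..max_len where the characters of s1 and s2 differ."""
    diff = 0
    for n in range(max_len):
        # Slicing returns '' past the end of the string, like SUBSTRING(s, n + 1, 1)
        if s1[n:n + 1] != s2[n:n + 1]:
            diff += 1
    return diff
```

A length mismatch counts as one difference per extra character, and anything beyond max_len is ignored, exactly as with the fixed-size number CTE.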