Algorithms for string similarities (better than Levenshtein and similar_text)? PHP, JS

Here's a solution that I've come up with. It's based on Tim's suggestion of comparing the order of subsequent characters. Some results:

  • jonas / jonax : 0.8
  • jonas / sjona : 0.68
  • jonas / sjonas : 0.66
  • jonas / asjon : 0.52
  • jonas / xxjon : 0.36

I'm sure it isn't perfect, and it could be optimized, but it seems to produce the results I'm after.
One weak spot: when the strings have different lengths, the score changes if the two arguments are swapped.

// Meant to live inside a class (hence "static public").
static public function string_compare($str_a, $str_b)
{
    $length = strlen($str_a);
    $length_b = strlen($str_b);

    $i = 0;
    $segmentcount = 0;
    $segmentsinfo = array();
    $segment = '';
    while ($i < $length)
    {
        $char = substr($str_a, $i, 1);
        if (strpos($str_b, $char) !== FALSE)
        {
            // Grow the current segment for as long as it still occurs in $str_b.
            $segment = $segment.$char;
            if (strpos($str_b, $segment) !== FALSE)
            {
                // Score the segment by how close its positions are in both strings
                // and by how much of $str_a it covers.
                $segmentpos_a = $i - strlen($segment) + 1;
                $segmentpos_b = strpos($str_b, $segment);
                $positiondiff = abs($segmentpos_a - $segmentpos_b);
                $posfactor = ($length - $positiondiff) / $length_b; // <-- ?
                $lengthfactor = strlen($segment)/$length;
                $segmentsinfo[$segmentcount] = array( 'segment' => $segment, 'score' => ($posfactor * $lengthfactor));
            }
            else
            {
                // The longer segment is not in $str_b: start a new segment
                // and re-examine the current character.
                $segment = '';
                $i--;
                $segmentcount++;
            }
        }
        else
        {
            // Character does not occur in $str_b at all: close the segment.
            $segment = '';
            $segmentcount++;
        }
        $i++;
    }

    // PHP 5.3 lambda in array_map: sum the best score of every segment
    $totalscore = array_sum(array_map(function($v) { return $v['score']; }, $segmentsinfo));
    return $totalscore;
}
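
The asymmetry mentioned above (swapping the arguments changes the score) could be smoothed out by scoring both orders and averaging them. The following is only a sketch of that idea, not part of the original answer: symmetric_score() and the stand-in scorer char_overlap() are hypothetical names, and any two-argument scoring function could be passed in instead.

```php
<?php
// Hypothetical helper: average the score of both argument orders so that
// swapping the inputs cannot change the result.
function symmetric_score(callable $score, string $a, string $b): float
{
    return ($score($a, $b) + $score($b, $a)) / 2;
}

// Stand-in asymmetric scorer, used only for demonstration: the fraction of
// characters of $a that also occur somewhere in $b.
function char_overlap(string $a, string $b): float
{
    if ($a === '') {
        return 0.0;
    }
    $hits = 0;
    foreach (str_split($a) as $char) {
        if (strpos($b, $char) !== false) {
            $hits++;
        }
    }
    return $hits / strlen($a);
}

$forward  = symmetric_score('char_overlap', 'jon', 'jonas');
$backward = symmetric_score('char_overlap', 'jonas', 'jon');
var_dump($forward === $backward); // both orders give the same averaged score
```

Averaging is just one choice; taking the maximum or minimum of the two directions would also make the result independent of argument order.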

How to improve PHP string match with similar_text()?

Levenshtein distance: http://php.net/manual/en/function.levenshtein.php

It works the opposite way to similar_text(): it returns a distance (the number of edits), so 0 means there is no difference.

// "Overcast, Rain or Showers" compared with "Overcast, Rain or Showers" is 0
// "Overcast, Risk of Rain or Showers" compared with "Overcast, Rain or Showers" is 11
// "Overcast, Chance of Rain or Showers" compared with "Overcast, Rain or Showers" is 13
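
Because levenshtein() returns a raw edit count, it can help to scale it into a 0..1 similarity when comparing it with similar_text() percentages. A minimal sketch, assuming division by the longer string's length as the normalisation (one common choice among several); levenshtein_similarity() is a hypothetical helper name. Note that levenshtein() works on bytes, so multibyte characters count as more than one position.

```php
<?php
// Hypothetical helper: turn a Levenshtein distance into a 0..1 similarity
// by dividing by the length of the longer string.
function levenshtein_similarity(string $a, string $b): float
{
    $maxLength = max(strlen($a), strlen($b));
    if ($maxLength === 0) {
        return 1.0; // two empty strings are treated as identical
    }
    return 1 - levenshtein($a, $b) / $maxLength;
}

echo levenshtein('kitten', 'sitting'), "\n";            // 3
echo levenshtein_similarity('kitten', 'kitten'), "\n";  // 1
```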

PHP: check similarity of multiple strings

This may not actually be a problem, because different users can have email addresses that differ only slightly.

How can you tell that the users with the email addresses nike1@gmail.com and nike2@gmail.com are the same person as nike@gmail.com?

However, if you still want to check:

1) Remove the trailing digits with a regex or something similar.
2) Then check whether the resulting email address exists in your database.
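
The two steps above can be sketched roughly as follows. The function name strip_trailing_digits() is hypothetical, and a plain array stands in for the database lookup, which in a real application would be a query against the user table.

```php
<?php
// Hypothetical helper: drop the digits that sit directly before the "@".
function strip_trailing_digits(string $email): string
{
    return preg_replace('/\d+(?=@)/', '', $email);
}

// Stand-in for a database lookup (assumption for this sketch).
$existingEmails = ['nike@gmail.com'];

foreach (['nike1@gmail.com', 'nike2@gmail.com'] as $candidate) {
    $base = strip_trailing_digits($candidate);
    if (in_array($base, $existingEmails, true)) {
        echo "$candidate may be a variant of $base\n";
    }
}
```

A match only suggests that the accounts might be related; as noted above, the addresses can still belong to different people.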

PHP String Comparison and similarity index

See similar_text(). If you want to exclude spaces, simply call str_replace(' ', '', $string) on each string first.

echo similar_text ( 'LEGENDARY' , 'BARNEYSTINSON', $percent); // outputs 3
echo $percent; // outputs 27.272727272727
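
Stripping spaces first, as suggested above, could be wrapped up like this. It is only a sketch; similarity_without_spaces() is a hypothetical name.

```php
<?php
// Hypothetical wrapper: remove spaces from both inputs, then compare.
function similarity_without_spaces(string $a, string $b): float
{
    $a = str_replace(' ', '', $a);
    $b = str_replace(' ', '', $b);
    similar_text($a, $b, $percent); // $percent is filled in by reference
    return $percent;
}

echo similarity_without_spaces('BARNEY STINSON', 'BARNEYSTINSON'), "\n"; // 100
```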

Here's another way using only unique characters

<?php
function unique_chars($string) {
    // count_chars() in mode 3 returns a string of the distinct characters used
    return count_chars(strtolower(str_replace(' ', '', $string)), 3);
}
function compare_strings($a, $b) {
    $index = similar_text(unique_chars($a), unique_chars($b), $percent);
    return array('index' => $index, 'percent' => $percent);
}
print_r( compare_strings('LEGENDARY', 'BARNEY STINSON') );
?>

Outputs:

Array
(
    [index] => 5
    [percent] => 55.555555555556
)

