How can I find the Largest Common Substring between two strings in PHP?
I have since found a relevant wikipedia article. It is not a NP complete problem, it can be done in O(mn) time using a dynamic programming algorithm.
In PHP I found the similar_text function very useful. Here's a code sample to retrieve a series of text emails and loop through them and find ones that are 90% similar to each other. Note: Something like this is NOT scalable:
<?php
// Gather all messages by a user into two identical associative arrays
$getMsgsRes = mysql_query(SELECT * FROM email_messages WHERE from = '$someUserID');
while($msgInfo = mysql_fetch_assoc($getMsgsRes))
{
$msgsInfo1[] = $msgInfo;
$msgsInfo2[] = $msgInfo;
}
// Loop over msgs and compare each one to every other
foreach ($msgsInfo1 as $msg1)
foreach ($msgsInfo2 as $msg2)
similar_text($msg1['msgTxt'],$msg2['msgTxt'],$similarity_pst);
if ($similarity_pst > 90)
echo "{$msg1['msgID']} is ${similarity_pst}% to {$msg2['msgID']}\n";
?>
How to get a substring between two strings in PHP?
If the strings are different (ie: [foo] & [/foo]), take a look at this post from Justin Cook.
I copy his code below:
function get_string_between($string, $start, $end){
$string = ' ' . $string;
$ini = strpos($string, $start);
if ($ini == 0) return '';
$ini += strlen($start);
$len = strpos($string, $end, $ini) - $ini;
return substr($string, $ini, $len);
}
$fullstring = 'this is my [tag]dog[/tag]';
$parsed = get_string_between($fullstring, '[tag]', '[/tag]');
echo $parsed; // (result = dog)
Print ALL common parts between 2 strings
I modified your function to get longest matches in array
<?php
function getLongestMatchingSubstring($str1, $str2)
{
$len_1 = strlen($str1);
$longest = array();
for($i = 0; $i < $len_1; $i++){
for($j = $len_1 - $i; $j > 0; $j--){
$sub = substr($str1, $i, $j);
if (strpos($str2, $sub) !== false && strlen($sub) > 1){
$longest[] = $sub;
$i = strpos($str1, $sub) + (strlen($sub)-1);
break;
}
}
}
return $longest;
}
$string1 = 'asbaMbaeMATCH1asjdndreyMATCH22r5g7jdg3MATCH33';
$string2 = '1241M342gtMAThgMATCH1m1g5mMATCH2cghdiMATCH33';
print_r(getLongestMatchingSubstring($string1, $string2));
the output would be
Array([0] => M [1] => MATCH1 [2] => MATCH2 [3] => MATCH33 )
Find common substring between two strings
Its called Longest Common Substring problem. Here I present a simple, easy to understand but inefficient solution. It will take a long time to produce correct output for large strings, as the complexity of this algorithm is O(N^2).
def longestSubstringFinder(string1, string2):
answer = ""
len1, len2 = len(string1), len(string2)
for i in range(len1):
match = ""
for j in range(len2):
if (i + j < len1 and string1[i + j] == string2[j]):
match += string2[j]
else:
if (len(match) > len(answer)): answer = match
match = ""
return answer
print(longestSubstringFinder("apple pie available", "apple pies"))
print(longestSubstringFinder("apples", "appleses"))
print(longestSubstringFinder("bapples", "cappleses"))
Output
apple pie
apples
apples
Find common substrings from 2 separate sets
As Voitcus said before, you have to clean your data first before you start comparing or looking for duplicates. A phone number should follow a strict pattern. For the numbers which do not match the pattern try to adjust them to it. Then you have the ability to look for duplicates.
Morevover you should do data-cleaning before persisting it, maybe in a seperate column. You then dont have to care for that when looking for duplicates ... just to avoid performance peaks.
Algorithms like levenshtein or similar_text() in php, doesnt fit to that use-case quite well.
PHP: find biggest overlap between multiple strings
Update
I've completely misunderstood the question; the aim was to find the biggest overlap between an array of strings:
$array = array('abc123', 'ac123', 'tbc123', '1ac123');
function overlap($a, $b)
{
if (!strlen($b)) {
return '';
}
if (strpos($a, $b) !== false) {
return $b;
}
$left = overlap($a, substr($b, 1));
$right = overlap($a, substr($b, 0, -1));
return strlen($left) > strlen($right) ? $left : $right;
}
$biggest = null;
foreach ($array as $item) {
if ($biggest === null) {
$biggest = $item;
}
if (($biggest = overlap($biggest, $item)) === '') {
break;
}
}
echo "Biggest match = $biggest\n";
I'm not great at recursion, but I believe this should work ;-)
Old answer
I would probably use preg_grep()
for that; it returns an array with the matches it found based on your search string:
$matches = preg_grep('/' . preg_quote($find, '/') . '/', $array);
Alternatively, you could use array_filter()
:
$matches = array_filter($array, function($item) use ($find) {
return strpos($item, $find) !== false;
});
I need to extract the value "c123" like it is the biggest match for all strings in array
I think what you would want to do here is then sort the above output based on string length (i.e. smallest string length first) and then take the first item:
if ($matches) {
usort($matches, function($a, $b) {
return strlen($a) - strlen($b);
});
echo current($matches); // take first one: ac123
}
Let me know if I'm wrong about that.
If you're just after knowing whether $find
matches an element exactly:
$matching_keys = array_keys($array, $find, true); // could be empty array
Or:
$matching_key = array_search($find, $array, true); // could be false
Or event:
$have_value = in_array($find, $array, true);
Related Topics
Php: How to Add Leading Zeros/Zero Padding to Float via Sprintf()
Linkify Regex Function PHP Daring Fireball Method
Can No Longer Add Registration Fields in Magento 1.4.2.0
Duplicate Data Insert in Codeigniter
How to Format a PHP Include() Absolute (Rather Than Relative) Path
Insert a Persian Text in MySQL Table
How to Use Multiple PHP Header Content Types on the Same Page? Is This Possible
Get a Single Row from the Database Using Ajax in Codeigniter
PHP Remove/Fix Module Not Found or Already Loaded Warnings
PHP - Make Multi-Dimensional Associative Array from a Delimited String
How to Get Order Id in Woocommerce_Email_Headers Hook
How to Display Seo Friendly Urls Using Mod_Rewrite
In PHP, Which Is Faster: Preg_Split or Explode