What Is a Shell Command to Find The Longest Common Substring of Two Strings in Unix

Longest common prefix of two strings in bash

In sed, assuming the strings don't contain any newline characters:

string1="test toast"
string2="test test"
printf "%s\n%s\n" "$string1" "$string2" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/'
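
For comparison, the same result can be had without sed. The following is only a sketch, assuming Bash (it relies on the ${var:offset:length} substring expansion); common_prefix is a hypothetical helper name, not something from the answer above.

```shell
# Hypothetical helper: print the longest common prefix of two strings.
# Assumes Bash, because of the ${var:offset:length} substring expansion.
common_prefix() {
    local a=$1 b=$2 i=0
    # Advance while both strings still have a character at index i
    # and those characters are equal.
    while [ "$i" -lt "${#a}" ] && [ "$i" -lt "${#b}" ] &&
          [ "${a:i:1}" = "${b:i:1}" ]; do
        i=$((i + 1))
    done
    printf '%s\n' "${a:0:i}"
}

common_prefix "test toast" "test test"   # prints: test t
```

Unlike the sed version, this one does not care whether the strings contain newline characters.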

Extract common part text from number of lines in shell script

awk to the rescue!

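awk -F/ 'NR==1 {w=split($0,base,FS); next}   # first line sets the baseline
{
  for(i=1;i<=w;i++)
    if(base[i]!=$i) {w=i; next}               # first mismatch shrinks w
}
END {
  for(i=1;i<w;i++)
    printf "%s%s", base[i], FS                # never use data as the format string
  print ""
}' file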

Description: Construct a base array from the first line, split on FS, and keep its size in w (for width). The match can be at most w fields long. For each following line, compare up to w fields until a mismatch occurs, then shrink w to the index of the mismatch. When all lines are done, print the fields before w.
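
As a worked example, here is the same program run on three sample paths. The wrapper name common_path_prefix and the sample paths are made up for illustration; the function reads stdin instead of a file.

```shell
# Hypothetical wrapper: reads lines on stdin and prints the leading
# "/"-separated fields that all of them share.
common_path_prefix() {
    awk -F/ '
        NR == 1 { w = split($0, base, FS); next }   # baseline from line 1
        {
            for (i = 1; i <= w; i++)
                if (base[i] != $i) { w = i; next }    # mismatch at field i
        }
        END {
            for (i = 1; i < w; i++) printf "%s%s", base[i], FS
            print ""
        }'
}

printf '%s\n' /usr/local/bin/foo /usr/local/lib/bar /usr/local/share |
    common_path_prefix    # prints: /usr/local/
```

Field 1 is the empty string before the leading slash, which is why the output starts with "/". The second line mismatches at field 4 (bin vs. lib), so w becomes 4 and fields 1 to 3 are printed.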

How do I find common characters between two strings in bash?

My solution below uses fold to break each string into one character per line, sort -u to sort and de-duplicate the lists, comm to keep only the characters common to both, and finally tr to delete the newline characters. Because of the sort, the original character order is lost.

comm -12 <(fold -w1 <<< "$s1" | sort -u) <(fold -w1 <<< "$s2" | sort -u) | tr -d '\n'

Alternatively, here is a pure Bash solution (which also maintains the order of the characters). It iterates over the first string and checks if each character is present in the second string.

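s="temp_foo_bar"
t="temp_bar"
result=""
i=0
while [ "$i" -ne "${#s}" ]
do
  c=${s:$i:1}
  # keep c if it occurs in t and has not been collected yet;
  # quoting "$c" stops characters such as * or ? from acting as globs
  if [[ $result != *"$c"* && $t == *"$c"* ]]
  then
    result=$result$c
  fi
  i=$((i + 1))
done
echo "$result"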

prints: temp_bar

Shell command to find the longest consecutive simple repeat of a pattern/word occurring on a single line of the text file

If you only want the lines that contain the longest match:

$ cat file.txt 
ABC
OTHER ABCABCABC OTHER
ABCABC
$ grep -f <(grep -oE "(ABC)+" file.txt | sort | tail -1) file.txt
OTHER ABCABCABC OTHER
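
The sort | tail -1 step above works because every match is a repetition of the same word, so the longest match also sorts last. A sketch that compares lengths explicitly, and prints the longest run itself rather than the line, might look like this. The function name longest_repeat and the temporary sample file are assumptions for illustration.

```shell
# Hypothetical helper: print the longest run of a repeated word.
#   $1 = word to look for (used as an extended regular expression)
#   $2 = file to search
longest_repeat() {
    grep -oE "($1)+" "$2" |
        awk 'length($0) > length(max) { max = $0 } END { print max }'
}

sample=$(mktemp)
printf '%s\n' 'ABC' 'OTHER ABCABCABC OTHER' 'ABCABC' > "$sample"
longest_repeat ABC "$sample"    # prints: ABCABCABC
rm -f "$sample"
```

Like the original command, this relies on grep -o, which GNU, BSD and busybox grep provide but POSIX does not specify.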

Portable sed way to find longest common prefix of strings

The following solutions are tested with GNU sed, macOS (10.15) sed and busybox (v1.29) sed.

$ printf '%s\n' a ab abc | sed -e '$q;N;s/^\(.*\).*\n\1.*$/\1/;h;G;D'
a
$ printf '%s\n' a b c | sed -e '$q;N;s/^\(.*\).*\n\1.*$/\1/;h;G;D'

$

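To be more efficient when there are many strings, especially when there is no common prefix at all (note the ..* part, which differs from the previous solution: it requires at least one common character, so sed quits as soon as a substitution fails instead of reading the remaining lines):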

$ printf '%s\n' a ab abc | sed -ne :L -e '$p;N;s/^\(..*\).*\n\1.*/\1/;tL' -e q
a
$ printf '%s\n' a b c | sed -ne :L -e '$p;N;s/^\(..*\).*\n\1.*/\1/;tL' -e q
$


Regarding $q in the first solution

According to GNU sed manual (info sed):

  • N command on the last line

    Most versions of sed exit without printing anything when the N command is issued on the last line of a file. GNU sed prints pattern space before exiting unless of course the -n command switch has been specified.


Note that I did not use sed -E because macOS sed's -E does not support \N back-reference in s/pattern/replace/ command's pattern string.

$ # with GNU sed:
$ echo foofoo | gsed -E 's/(foo)\1/bar/'
bar
$
$ # with macOS's own sed:
$ echo foofoo | sed -E 's/(foo)\1/bar/'
foofoo
$


UPDATE (2021-04-26):

Found this in another answer:

sed -e '1{h;d;}' -e 'G;s/\(.*\).*\n\1.*/\1/;h;$!d'

Note that it does not work when there is only one line, because the first line is always deleted. This is easily fixed by dropping the d from the 1{h;d;} block:

sed -e '1h;G;s/^\(.*\).*\n\1.*/\1/;h;$!d'
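
To cross-check the sed one-liners, a plain shell loop can compute the same thing. This is a rough sketch rather than a replacement: lcp_lines is a hypothetical name, it assumes Bash (for local), and it assumes the input ends with a newline.

```shell
# Hypothetical helper: longest common prefix of all lines on stdin.
lcp_lines() {
    local prefix line
    IFS= read -r prefix            # the first line is the initial candidate
    while IFS= read -r line; do
        # Drop the last character of the candidate until the current
        # line starts with it (or the candidate is empty).
        while [ -n "$prefix" ] && [ "${line#"$prefix"}" = "$line" ]; do
            prefix=${prefix%?}
        done
    done
    printf '%s\n' "$prefix"
}

printf '%s\n' a ab abc | lcp_lines                       # prints: a
printf '%s\n' interview internet interval | lcp_lines    # prints: inter
```

With no common prefix it prints an empty line, matching the first sed solution.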

shell - Characters contained in both strings - edited

Use Character Classes with GNU Grep

This isn't a widely applicable solution, but it fits your particular use case quite well. The idea is to use the first variable as a character class to match against the second string. For example:

a='abghrsy'
b='cgmnorstuvz'
echo "$b" | grep --only-matching "[$a]" | xargs | tr --delete ' '

This produces grs as you expect. Note that the use of xargs and tr is simply to remove the newlines and spaces from the output; you can certainly handle this some other way if you prefer.

Set Intersection

What you're really looking for is a set intersection, though. While you can "wing it" in the shell, you'd be better off using a language like Ruby, Python, or Perl to do this.

A Ruby One-Liner

If you need to integrate with an existing shell script, a simple Ruby one-liner that uses Bash variables could be called like this inside your current script:

a='abghrsy'
b='cgmnorstuvz'
ruby -e "puts ('$a'.split(//) & '$b'.split(//)).join"

A Ruby Script

You could certainly make things more elegant by doing the whole thing in Ruby instead.

string1_chars = 'abghrsy'.split //
string2_chars = 'cgmnorstuvz'.split //
intersection = string1_chars & string2_chars
puts intersection.join

This certainly seems more readable and robust to me, but your mileage may vary. At least now you have some options to choose from.

Replace one substring for another string in shell script

To replace the first occurrence of a pattern with a given string, use ${parameter/pattern/string}:

#!/bin/bash
firstString="I love Suzi and Marry"
secondString="Sara"
echo "${firstString/Suzi/"$secondString"}"
# prints 'I love Sara and Marry'

To replace all occurrences, use ${parameter//pattern/string}:

message='The secret code is 12345'
echo "${message//[0-9]/X}"
# prints 'The secret code is XXXXX'

(This is documented in the Bash Reference Manual, §3.5.3 "Shell Parameter Expansion".)

Note that this feature is not specified by POSIX — it's a Bash extension — so not all Unix shells implement it. For the relevant POSIX documentation, see The Open Group Technical Standard Base Specifications, Issue 7, the Shell & Utilities volume, §2.6.2 "Parameter Expansion".
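
For shells that lack this expansion, one possible fallback is sed. The sketch below uses two hypothetical helper names, replace_first and replace_all. It assumes the pattern and replacement contain no "/", "&", backslash or newline, since sed treats those specially, and note that sed interprets the pattern as a basic regular expression rather than a shell glob.

```shell
# Hypothetical helpers: $1 = input string, $2 = pattern (BRE), $3 = replacement.
# Assumption: $2 and $3 contain no "/", "&", backslash or newline.
replace_first() { printf '%s\n' "$1" | sed "s/$2/$3/"; }
replace_all()   { printf '%s\n' "$1" | sed "s/$2/$3/g"; }

replace_first "I love Suzi and Marry" Suzi Sara
# prints: I love Sara and Marry
replace_all "The secret code is 12345" '[0-9]' X
# prints: The secret code is XXXXX
```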


