Find Difference Between Two Strings

Find difference between two strings in JavaScript

Another option, for more sophisticated difference checking, is the PatienceDiff algorithm. I ported this algorithm to JavaScript at...

https://github.com/jonTrent/PatienceDiff

...and although the algorithm is typically used for line-by-line comparison of text (such as computer programs), it can also compare strings character by character. E.g., to compare two strings, you can do the following...

let a = "thelebronnjamist";
let b = "the lebron james";

let difference = patienceDiff( a.split(""), b.split("") );

...with difference.lines being set to an array with the results of the comparison...

difference.lines: Array(19)

0: {line: "t", aIndex: 0, bIndex: 0}
1: {line: "h", aIndex: 1, bIndex: 1}
2: {line: "e", aIndex: 2, bIndex: 2}
3: {line: " ", aIndex: -1, bIndex: 3}
4: {line: "l", aIndex: 3, bIndex: 4}
5: {line: "e", aIndex: 4, bIndex: 5}
6: {line: "b", aIndex: 5, bIndex: 6}
7: {line: "r", aIndex: 6, bIndex: 7}
8: {line: "o", aIndex: 7, bIndex: 8}
9: {line: "n", aIndex: 8, bIndex: 9}
10: {line: "n", aIndex: 9, bIndex: -1}
11: {line: " ", aIndex: -1, bIndex: 10}
12: {line: "j", aIndex: 10, bIndex: 11}
13: {line: "a", aIndex: 11, bIndex: 12}
14: {line: "m", aIndex: 12, bIndex: 13}
15: {line: "i", aIndex: 13, bIndex: -1}
16: {line: "e", aIndex: -1, bIndex: 14}
17: {line: "s", aIndex: 14, bIndex: 15}
18: {line: "t", aIndex: 15, bIndex: -1}

Any entry where aIndex === -1 or bIndex === -1 marks a difference between the two strings: aIndex === -1 means the character appears only in b, and bIndex === -1 means it appears only in a. Specifically...

  • Element 3 indicates that character " " was found only in b, at position 3.
  • Element 10 indicates that character "n" was found only in a, at position 9.
  • Element 11 indicates that character " " was found only in b, at position 10.
  • Element 15 indicates that character "i" was found only in a, at position 13.
  • Element 16 indicates that character "e" was found only in b, at position 14.
  • Element 18 indicates that character "t" was found only in a, at position 15.
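The same reading can be automated by filtering on the two index fields. This is a minimal sketch, assuming difference.lines has the shape shown above; the array is hard-coded (and abbreviated) from that output so the snippet runs without the library, and onlyInA and onlyInB are illustrative names, not part of PatienceDiff.

```javascript
// Abbreviated copy of the difference.lines output shown above:
// every differing entry plus a few matching ones.
const lines = [
  { line: "t", aIndex: 0, bIndex: 0 },
  { line: " ", aIndex: -1, bIndex: 3 },
  { line: "n", aIndex: 9, bIndex: -1 },
  { line: " ", aIndex: -1, bIndex: 10 },
  { line: "i", aIndex: 13, bIndex: -1 },
  { line: "e", aIndex: -1, bIndex: 14 },
  { line: "s", aIndex: 14, bIndex: 15 },
  { line: "t", aIndex: 15, bIndex: -1 },
];

// bIndex === -1: the character exists only in a (removed going from a to b).
const onlyInA = lines.filter((entry) => entry.bIndex === -1);
// aIndex === -1: the character exists only in b (added going from a to b).
const onlyInB = lines.filter((entry) => entry.aIndex === -1);

console.log(onlyInA.map((e) => `"${e.line}" at a[${e.aIndex}]`));
// "n" at a[9], "i" at a[13], "t" at a[15]
console.log(onlyInB.map((e) => `"${e.line}" at b[${e.bIndex}]`));
// " " at b[3], " " at b[10], "e" at b[14]
```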

Note that the PatienceDiff algorithm is useful for comparing two similar blocks of text or strings. It will not, however, tell you whether text has been moved or swapped. E.g., the following...

let a = "james lebron";
let b = "lebron james";

let difference = patienceDiff( a.split(""), b.split("") );

...returns difference.lines containing...

difference.lines: Array(18)

0: {line: "j", aIndex: 0, bIndex: -1}
1: {line: "a", aIndex: 1, bIndex: -1}
2: {line: "m", aIndex: 2, bIndex: -1}
3: {line: "e", aIndex: 3, bIndex: -1}
4: {line: "s", aIndex: 4, bIndex: -1}
5: {line: " ", aIndex: 5, bIndex: -1}
6: {line: "l", aIndex: 6, bIndex: 0}
7: {line: "e", aIndex: 7, bIndex: 1}
8: {line: "b", aIndex: 8, bIndex: 2}
9: {line: "r", aIndex: 9, bIndex: 3}
10: {line: "o", aIndex: 10, bIndex: 4}
11: {line: "n", aIndex: 11, bIndex: 5}
12: {line: " ", aIndex: -1, bIndex: 6}
13: {line: "j", aIndex: -1, bIndex: 7}
14: {line: "a", aIndex: -1, bIndex: 8}
15: {line: "m", aIndex: -1, bIndex: 9}
16: {line: "e", aIndex: -1, bIndex: 10}
17: {line: "s", aIndex: -1, bIndex: 11}

Notice that PatienceDiff does not report the swap of the first and last names. Instead, it shows which characters were removed from a and which characters were added to arrive at b.

EDIT: Added new algorithm dubbed patienceDiffPlus.

After mulling over the last example provided above that showed a limitation of the PatienceDiff in identifying lines that likely moved, it dawned on me that there was an elegant way of using the PatienceDiff algorithm to determine if any lines had indeed likely moved rather than just showing deletions and additions.

In short, I added the patienceDiffPlus algorithm (to the GitHub repo identified above) at the bottom of the PatienceDiff.js file. The patienceDiffPlus algorithm takes the deleted aLines[] and added bLines[] from the initial patienceDiff run, and runs them through the patienceDiff algorithm again. I.e., patienceDiffPlus seeks the Longest Common Subsequence among the lines that likely moved, and records any matches in the original patienceDiff results. The patienceDiffPlus algorithm repeats this until no more moved lines are found.
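The "re-diff the leftovers" idea can be illustrated independently of the repo. The sketch below is not the code from PatienceDiff.js: it swaps in a plain dynamic-programming longest-common-subsequence diff for patienceDiff, and the names lcsDiff and diffWithMoves are hypothetical. It only assumes the entry shape ({line, aIndex, bIndex, moved}) shown in this answer.

```javascript
// Plain LCS diff (a stand-in, NOT the Patience algorithm). Returns entries
// shaped like difference.lines: -1 marks "absent from that side".
function lcsDiff(a, b) {
  const m = a.length;
  const n = b.length;
  // dp[i][j] = length of the LCS of a.slice(i) and b.slice(j)
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const lines = [];
  let i = 0;
  let j = 0;
  while (i < m && j < n) {
    if (a[i] === b[j]) {
      lines.push({ line: a[i], aIndex: i, bIndex: j });
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      lines.push({ line: a[i], aIndex: i, bIndex: -1 }); // only in a
      i++;
    } else {
      lines.push({ line: b[j], aIndex: -1, bIndex: j }); // only in b
      j++;
    }
  }
  for (; i < m; i++) lines.push({ line: a[i], aIndex: i, bIndex: -1 });
  for (; j < n; j++) lines.push({ line: b[j], aIndex: -1, bIndex: j });
  return lines;
}

// Repeatedly diff the not-yet-explained deletions against the
// not-yet-explained additions; every match is flagged as a likely move.
function diffWithMoves(a, b) {
  const lines = lcsDiff(a, b);
  for (;;) {
    const deleted = lines.filter((e) => e.bIndex === -1 && !e.moved);
    const added = lines.filter((e) => e.aIndex === -1 && !e.moved);
    const matches = lcsDiff(deleted.map((e) => e.line), added.map((e) => e.line))
      .filter((e) => e.aIndex !== -1 && e.bIndex !== -1);
    if (matches.length === 0) break; // nothing left that could have moved
    for (const match of matches) {
      const from = deleted[match.aIndex];
      const to = added[match.bIndex];
      from.moved = true;
      to.moved = true;
      to.aIndex = from.aIndex; // remember where the element came from
    }
  }
  return lines;
}

const result = diffWithMoves("james lebron".split(""), "lebron james".split(""));
console.log(result.filter((e) => e.moved).length); // 12
```

For this particular input the sketch flags the same twelve entries as moved that the patienceDiffPlus output in this answer does; on other inputs a plain LCS and a Patience diff may align differently.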

Now, using patienceDiffPlus, the following comparison...

let a = "james lebron";
let b = "lebron james";

let difference = patienceDiffPlus( a.split(""), b.split("") );

...returns difference.lines containing...

difference.lines: Array(18)

0: {line: "j", aIndex: 0, bIndex: -1, moved: true}
1: {line: "a", aIndex: 1, bIndex: -1, moved: true}
2: {line: "m", aIndex: 2, bIndex: -1, moved: true}
3: {line: "e", aIndex: 3, bIndex: -1, moved: true}
4: {line: "s", aIndex: 4, bIndex: -1, moved: true}
5: {line: " ", aIndex: 5, bIndex: -1, moved: true}
6: {line: "l", aIndex: 6, bIndex: 0}
7: {line: "e", aIndex: 7, bIndex: 1}
8: {line: "b", aIndex: 8, bIndex: 2}
9: {line: "r", aIndex: 9, bIndex: 3}
10: {line: "o", aIndex: 10, bIndex: 4}
11: {line: "n", aIndex: 11, bIndex: 5}
12: {line: " ", aIndex: 5, bIndex: 6, moved: true}
13: {line: "j", aIndex: 0, bIndex: 7, moved: true}
14: {line: "a", aIndex: 1, bIndex: 8, moved: true}
15: {line: "m", aIndex: 2, bIndex: 9, moved: true}
16: {line: "e", aIndex: 3, bIndex: 10, moved: true}
17: {line: "s", aIndex: 4, bIndex: 11, moved: true}

Notice the addition of the moved attribute, which identifies whether a line (or character in this case) was likely moved. Again, patienceDiffPlus simply matches the deleted aLines[] and added bLines[], so there is no guarantee that the lines were actually moved, but there is a strong likelihood that they were indeed moved.
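To act on the moved attribute, the entries can be split into likely moves and genuine additions or deletions. A small sketch, assuming the entry shape shown above; the array is an abbreviated, hard-coded copy of that output, and moves and pureEdits are illustrative names.

```javascript
// Abbreviated copy of the patienceDiffPlus output shown above.
const lines = [
  { line: "s", aIndex: 4, bIndex: -1, moved: true },
  { line: " ", aIndex: 5, bIndex: -1, moved: true },
  { line: "l", aIndex: 6, bIndex: 0 },
  { line: "n", aIndex: 11, bIndex: 5 },
  { line: " ", aIndex: 5, bIndex: 6, moved: true },
  { line: "j", aIndex: 0, bIndex: 7, moved: true },
];

// A moved entry carrying both indices records where the character
// came from in a and where it ended up in b.
const moves = lines
  .filter((e) => e.moved && e.aIndex !== -1 && e.bIndex !== -1)
  .map((e) => ({ char: e.line, from: e.aIndex, to: e.bIndex }));

// Entries with a -1 index and no moved flag are genuine additions or deletions.
const pureEdits = lines.filter(
  (e) => !e.moved && (e.aIndex === -1 || e.bIndex === -1)
);

console.log(moves);     // " " from 5 to 6, and "j" from 0 to 7
console.log(pureEdits); // empty for this sample
```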

Find the difference between two strings based on individual words

To compare two sentences word by word and return the words in sentence B that differ from sentence A at the same position, split both strings into arrays of words and compare those:

const speechA = `you are and you could`,
      speechB = `you are and you couldn't`

// Collect every word of s2 that differs from the word at the same position in s1
const getStrDifference = (s1, s2) => {
  const a1 = s1.split(' '), a2 = s2.split(' ')
  return a2.reduce((diff, word, pos) => (word !== a1[pos] && diff.push(word), diff), [])
}

console.log(getStrDifference(speechA, speechB)) // ["couldn't"]

Calculating the difference between two strings

What about doing it recursively? If two elements are the same, the first element of the resulting tuple is incremented; otherwise, the mismatched element from the second list is prepended to the second element of the resulting tuple:

calcP :: [String] -> [String] -> (Int,[String])
calcP (x:xs) (y:ys)
  | x == y = increment (calcP xs ys)
  | otherwise = append y (calcP xs ys)
  where
    increment (count, results) = (count + 1, results)
    append y (count, results) = (count, y:results)

calcP [] x = (0, x)
calcP x [] = (0, [])

a = ["A1","A2","B3","C3"]
b = ["A1","B2","B3","D5"]

main = print $ calcP a b

The printed result is (2,["B2","D5"])

Note that

calcP [] x = (0, x)
calcP x [] = (0, [])

are needed to make the pattern matching exhaustive. In other words, you need to handle the case where one of the passed lists is empty. These two clauses also give the following behavior:

If the first list is longer than the second by n elements, those last n elements are ignored.

If the second list is longer than the first by n elements, those last n elements are appended to the second element of the resulting tuple.
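For readers following along in JavaScript, both length rules can be checked with a rough equivalent. This is an iterative sketch rather than a translation of the recursion; the name calcP is reused only for illustration.

```javascript
// Returns [number of positions that match, elements of ys that do not match].
function calcP(xs, ys) {
  let count = 0;
  const mismatches = [];
  ys.forEach((y, i) => {
    if (i < xs.length && xs[i] === y) {
      count += 1;
    } else {
      // Either a mismatch at this position, or ys is longer than xs.
      mismatches.push(y);
    }
  });
  return [count, mismatches];
}

console.log(calcP(["A1", "A2", "B3", "C3"], ["A1", "B2", "B3", "D5"])); // 2 and ["B2", "D5"]
console.log(calcP(["A1", "A2", "B3"], ["A1"]));       // 1 and []: extra elements of the first list are ignored
console.log(calcP(["A1"], ["A1", "B2", "C3"]));       // 1 and ["B2", "C3"]: extra elements of the second list are kept
```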

Returning difference between two strings (irrespective of their type)

You can simply loop over the characters of one string, check whether each one is present in the other string, and collect those that are not in a variable.

Here's a way to do it in Python:

x = 'abcd'
y = 'cdefg'

s = ''
t = ''

for i in x: # checking x with y
    if i not in y:
        s += i

for i in y: # checking y with x
    if i not in x:
        t += i

print(s) # ab
print(t) # efg

Edit:

I guess you are working with pandas columns, so here's code that should help:

# importing pandas as pd
import pandas as pd

# Creating the DataFrame
df = pd.DataFrame({'PN':[555, 444, 333, 222, 111],
'whatever':['555A', 444, '333B', 222, '111C'],})

A=list(df['PN']) # Converting column to a list
B=list(df['whatever']) # Converting column to a list

def convert_str(a): # Function to convert an element of a list to a string
    return str(a)

C=[convert_str(i) for i in A] # Converting elements of list A to strings
D=[convert_str(i) for i in B] # Converting elements of list B to strings
E="".join(C) # Joining the list C
F="".join(D) # Joining the list D

difference=[i for i in F if i not in E] # Characters of F that do not occur in E
print(difference)

# Output ['A', 'B', 'C']

