Python - Difference Between Two Strings

Returning difference between two strings (irrespective of their type)

You can loop over one string, check whether each character is present in the other string, and collect the characters that are not.

Here's a way to do it in Python:

x = 'abcd'
y = 'cdefg'

s = ''
t = ''

for i in x:  # checking x against y
    if i not in y:
        s += i

for i in y:  # checking y against x
    if i not in x:
        t += i

print(s) # ab
print(t) # efg
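
The two loops above can be folded into one small helper. This is only a sketch: the name chars_missing_from is made up for illustration and is not part of the original answer.

```python
def chars_missing_from(source, other):
    """Return the characters of `source` that do not occur in `other`, in order."""
    return "".join(ch for ch in source if ch not in other)


x = "abcd"
y = "cdefg"

only_in_x = chars_missing_from(x, y)  # same result as `s` above
only_in_y = chars_missing_from(y, x)  # same result as `t` above

print(only_in_x)
print(only_in_y)
```

Note that this is a per-character membership test, so repeated characters are kept or dropped together; it does not account for position.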

Edit:

I guess you are working with a pandas column, so here's some code that should help:

# importing pandas as pd
import pandas as pd

# Creating the DataFrame
df = pd.DataFrame({'PN': [555, 444, 333, 222, 111],
                   'whatever': ['555A', 444, '333B', 222, '111C']})

A = list(df['PN'])        # Converting the column to a list
B = list(df['whatever'])  # Converting the column to a list

def convert_str(a):  # Function to convert a list element to a string
    return str(a)

C = [convert_str(i) for i in A]  # Converting the elements of list A to strings
D = [convert_str(i) for i in B]  # Converting the elements of list B to strings
E = "".join(C)  # Joining list C
F = "".join(D)  # Joining list D

difference = [i for i in F if i not in E]  # Characters of F that are not in E
print(difference)

# Output ['A', 'B', 'C']
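
Joining the whole column compares every character against the entire PN column at once. If what you actually want is the difference row by row (that is an assumption about the question), a variation along these lines might fit. It uses plain lists with the same values so it runs without pandas; row_differences is a hypothetical helper name.

```python
def row_differences(left, right):
    """For each pair, keep the characters of str(r) that are absent from str(l)."""
    return [
        "".join(ch for ch in str(r) if ch not in str(l))
        for l, r in zip(left, right)
    ]


# Same values as the 'PN' and 'whatever' columns above, as plain lists.
pn = [555, 444, 333, 222, 111]
whatever = ['555A', 444, '333B', 222, '111C']

per_row = row_differences(pn, whatever)
print(per_row)
```

Since zip() iterates over any sequence, passing df['PN'] and df['whatever'] instead of the lists should behave the same way.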

Python - getting just the difference between strings

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

splitA = set(a.split("\n"))
splitB = set(b.split("\n"))

diff = splitB.difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, more things if there were...'

Essentially, this turns each string into a set of lines and takes the set difference, i.e. all lines in B that are not in A. The result is then joined into one string.

Edit: This is a convoluted way of saying what @ShreyasG said: [x for x in b if x not in a]...
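
One thing to keep in mind is that a set has no order, so with several differing lines the joined result may come out in any order. A minimal sketch of the list-comprehension form that keeps the original line order (lines_only_in_second is a name invented here):

```python
def lines_only_in_second(first, second):
    """Return the lines of `second` that never appear in `first`, in order."""
    seen = set(first.split("\n"))  # set lookup keeps the membership test fast
    return [line for line in second.split("\n") if line not in seen]


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

new_lines = lines_only_in_second(a, b)
print(new_lines)
```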

python 3, differences between two strings

Using difflib is probably your best bet, as you are unlikely to come up with a more efficient solution than the algorithms it provides. What you want is SequenceMatcher.get_matching_blocks. Here is what it returns, according to the documentation:

Return list of triples describing matching subsequences. Each triple
is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The
triples are monotonically increasing in i and j.

Here is a way you could use this to reconstruct a string from which you removed the delta.

from difflib import SequenceMatcher

x = "abc_def"
y = "abc--ef"

matcher = SequenceMatcher(None, x, y)
blocks = matcher.get_matching_blocks()

# blocks: [Match(a=0, b=0, size=3), Match(a=5, b=5, size=2), Match(a=7, b=7, size=0)]

string = ''.join([x[a:a+n] for a, _, n in blocks])

# string: "abcef"
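
If you are after the delta itself rather than the common part, SequenceMatcher.get_opcodes() describes the non-matching regions as well. The following is a sketch of that complementary approach, not part of the original answer; changed_segments is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def changed_segments(a, b):
    """Return (text_from_a, text_from_b) pairs for every non-equal opcode."""
    matcher = SequenceMatcher(None, a, b)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # remaining tags: 'replace', 'delete', 'insert'
    ]


segments = changed_segments("abc_def", "abc--ef")
print(segments)
```

Each pair holds the slice of the first string and the slice of the second string that replaced it; one side is empty for a pure insertion or deletion.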

Edit: It was also pointed out that there is an edge case with two strings like these:

t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

Then the above code would return 'WordWordyWordWord'. This is because get_matching_blocks will catch the 'y' that is present in both strings between the expected blocks. One way around this is to filter the returned blocks by length.

string = ''.join([x[a:a+n] for a, _, n in blocks if n > 1])
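
Putting the pieces together for the t1/t2 case, here is a small sketch with the length filter exposed as a parameter. The function name common_part and its min_size parameter are assumptions made for this example.

```python
from difflib import SequenceMatcher


def common_part(a, b, min_size=1):
    """Join the blocks of `a` that match `b` and are at least `min_size` long."""
    matcher = SequenceMatcher(None, a, b)
    return "".join(
        a[i:i + n]
        for i, _, n in matcher.get_matching_blocks()
        if n >= min_size  # also drops the zero-length sentinel block
    )


t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

unfiltered = common_part(t1, t2)            # keeps the stray single 'y'
filtered = common_part(t1, t2, min_size=2)  # equivalent to `if n > 1`

print(unfiltered)
print(filtered)
```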

If you want more complex analysis of the returned blocks you could also do the following.

def block_filter(substring):
    """Outputs True if the substring is to be merged, False otherwise"""
    ...

string = ''.join([x[a:a+n] for a, _, n in blocks if block_filter(x[a:a+n])])

Python - compare two string by words using difflib and print only difference

If you don't have to use difflib, you could use a set and string splitting!

>>> original = "Apple Microsoft Google Oracle"
>>> edited = "Apple Nvdia IBM"
>>> set(original.split()).symmetric_difference(set(edited.split()))
{'IBM', 'Google', 'Oracle', 'Microsoft', 'Nvdia'}

You can also get the shared members with .intersection()

>>> set(original.split()).intersection(set(edited.split()))
{'Apple'}
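
symmetric_difference() mixes both directions together. If you need to know which words were dropped and which were introduced, plain set subtraction gives you each side separately. A short sketch using the same two strings:

```python
original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

original_words = set(original.split())
edited_words = set(edited.split())

removed = original_words - edited_words  # only in the original
added = edited_words - original_words    # only in the edited version

# Sets are unordered, so sort before printing for a stable display.
print(sorted(removed))
print(sorted(added))
```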

Wikipedia has a good section on basic set operations, with accompanying Venn diagrams:

https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations


However, if you have to use difflib (some strange environment or assignment), you can also find every member with a + or - prefix and slice off the prefixes

>>> import difflib
>>> d = difflib.Differ()
>>> diff = d.compare(original.split(), edited.split())
>>> list(a[2:] for a in diff if a.startswith(("+", "-")))
['Nvdia', 'IBM', 'Microsoft', 'Google', 'Oracle']
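
The same Differ output can be split by prefix if you want removals and additions kept apart. A minimal sketch, where word_changes is a name invented for this example:

```python
import difflib


def word_changes(original, edited):
    """Split Differ output into (removed_words, added_words), keeping order."""
    removed, added = [], []
    for entry in difflib.Differ().compare(original.split(), edited.split()):
        if entry.startswith("- "):
            removed.append(entry[2:])
        elif entry.startswith("+ "):
            added.append(entry[2:])
        # Entries starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added


removed, added = word_changes("Apple Microsoft Google Oracle", "Apple Nvdia IBM")
print(removed)
print(added)
```

Unlike the set-based version, this keeps each list in the order the words appear in its source string.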

All of these operations result in an iterable of strings, so you can .join() them together (or similar) to get a single result, as you do in your question. For example, with result holding the set from the symmetric_difference() call above:

>>> print("\n".join(result))
IBM
Google
Oracle
Microsoft
Nvdia

Difference between two strings (sentences)

Use a simple list comprehension:

diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']

It will show you the deletions and additions

Output:

['-  ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']

(everything prefixed with a minus was deleted)

Conversely, switching text1_lines and text2_lines would produce this result:

['+  ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']

To remove the signs, you can convert the above list:

diff_nl = [x[2] for x in diff]

To fully convert to a string, just use .join():

diff_nl = ''.join([x[2] for x in diff])
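
For reference, here is the whole thing as one runnable snippet. The two input strings are hypothetical, chosen only so that the trailing " Difference" is what gets deleted, as in the output above. They are plain strings, so ndiff compares them character by character.

```python
import difflib

# Hypothetical inputs: the second string lacks the trailing " Difference".
text1_lines = "Spot the Difference"
text2_lines = "Spot the"

# Keep only the entries that are not unchanged (unchanged ones start with a space).
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']

# Each entry looks like '- D': a two-character prefix, then the character itself.
removed_text = ''.join(x[2] for x in diff)

print(diff)
print(repr(removed_text))
```

If you pass lists of lines instead of strings, each entry carries a whole line after the prefix, so x[2:] would be the right slice rather than x[2].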

