Returning difference between two strings (irrespective of their type)
You can loop over one string, check whether each character is missing from the other, and collect the missing characters in a variable.
Here's a way to do it in Python:
x = 'abcd'
y = 'cdefg'
s = ''
t = ''
for i in x:  # checking x against y
    if i not in y:
        s += i
for i in y:  # checking y against x
    if i not in x:
        t += i
print(s) # ab
print(t) # efg
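The same idea can be wrapped in a small helper. This is only a sketch: the name char_difference is made up for illustration, and it assumes "difference" means the characters of one string that never occur in the other, as in the loops above.

```python
def char_difference(first, second):
    """Return the characters of `first` that never occur in `second`, in order."""
    second_chars = set(second)  # set lookup avoids rescanning `second` each time
    return ''.join(ch for ch in first if ch not in second_chars)

x = 'abcd'
y = 'cdefg'
print(char_difference(x, y))  # ab
print(char_difference(y, x))  # efg
```

Note that this compares by membership only, so positions and repeated characters are ignored.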
Edit:
I guess you are working with pandas columns, so here's code that should help:
# importing pandas as pd
import pandas as pd
# Creating the DataFrame
df = pd.DataFrame({'PN': [555, 444, 333, 222, 111],
                   'whatever': ['555A', 444, '333B', 222, '111C']})
A = list(df['PN'])  # Converting the column to a list
B = list(df['whatever'])  # Converting the column to a list
def convert_str(a):  # Function to convert a list element to a string
    return str(a)
C = [convert_str(i) for i in A]  # Converting each element of A to a string
D = [convert_str(i) for i in B]  # Converting each element of B to a string
E = "".join(C)  # Joining the list C
F = "".join(D)  # Joining the list D
difference = [i for i in F if i not in E]  # Characters in F that are not in E
print(difference)
# Output: ['A', 'B', 'C']
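The code above joins every value into one long string, so the result no longer says which row a character came from. If a per-row difference is what you need (an assumption; the question does not say), a sketch along these lines might help. Plain lists stand in for the two columns so that it runs without pandas, and extra_chars is a hypothetical helper name.

```python
# Plain lists stand in for the two DataFrame columns (an assumption made so the
# sketch runs without pandas installed).
pn = [555, 444, 333, 222, 111]
whatever = ['555A', 444, '333B', 222, '111C']

def extra_chars(base, candidate):
    """Return the characters of str(candidate) that do not occur in str(base)."""
    base = str(base)
    return ''.join(ch for ch in str(candidate) if ch not in base)

# Compare the columns row by row instead of as one joined string.
row_diffs = [extra_chars(p, w) for p, w in zip(pn, whatever)]
print(row_diffs)  # ['A', '', 'B', '', 'C']
```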
Python - getting just the difference between strings
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'
splitA = set(a.split("\n"))
splitB = set(b.split("\n"))
diff = splitB.difference(splitA)
diff = ", ".join(diff) # ' testing this is working 2, more things if there were...'
Essentially, this turns each string into a set of lines and takes the set difference, i.e. all the lines in B that are not in A. The result is then joined into one string.
Edit: This is a convoluted way of saying what @ShreyasG said: [x for x in b if x not in a]...
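One caveat with sets is that they do not keep the original line order. If order matters, a list comprehension over the lines of b might be a better fit. The following is a minimal sketch using the same two strings; the names lines_a and new_lines are made up for illustration.

```python
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

lines_a = set(a.split("\n"))  # a set gives fast membership tests
# Walk b in its original order and keep the lines that a does not contain.
new_lines = [line for line in b.split("\n") if line not in lines_a]
diff = ", ".join(new_lines)
print(repr(diff))  # ' testing this is working 2'
```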
python 3, differences between two strings
Using difflib is probably your best bet, as you are unlikely to come up with a more efficient solution than the algorithms it provides. What you want is SequenceMatcher.get_matching_blocks. Here is what it returns, according to the docs:
Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.
Here is a way you could use this to reconstruct a string from which you removed the delta.
from difflib import SequenceMatcher
x = "abc_def"
y = "abc--ef"
matcher = SequenceMatcher(None, x, y)
blocks = matcher.get_matching_blocks()
# blocks: [Match(a=0, b=0, size=3), Match(a=5, b=5, size=2), Match(a=7, b=7, size=0)]
string = ''.join([x[a:a+n] for a, _, n in blocks])
# string: "abcef"
Edit: It was also pointed out that a problem arises with two strings such as these.
t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'
Then the above code would return 'WordWordyWordWord'. This is because get_matching_blocks will catch the 'y' that is present in both strings between the expected blocks. One way around this is to filter the returned blocks by length.
string = ''.join([x[a:a+n] for a, _, n in blocks if n > 1])
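To see the effect of the length filter on the two strings above, here is a small self-contained run. The names unfiltered and filtered are just for illustration.

```python
from difflib import SequenceMatcher

t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

blocks = SequenceMatcher(None, t1, t2).get_matching_blocks()

# Without a filter, the stray one-character match is kept.
unfiltered = ''.join(t1[a:a+n] for a, _, n in blocks)
# Keeping only blocks longer than one character drops it.
filtered = ''.join(t1[a:a+n] for a, _, n in blocks if n > 1)

print(unfiltered)  # WordWordyWordWord
print(filtered)    # WordWordWordWord
```

The threshold of 1 is a judgment call; a genuine single-character match would be dropped as well.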
If you want more complex analysis of the returned blocks you could also do the following.
def block_filter(substring):
    """Outputs True if the substring is to be merged, False otherwise"""
    ...
string = ''.join([x[a:a+n] for a, _, n in blocks if block_filter(x[a:a+n])])
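If you are after the parts that differ rather than the parts that match, SequenceMatcher.get_opcodes might be closer to what you need. This is a sketch rather than part of the approach above; it reuses the x and y from the first example, and the name changes is made up.

```python
from difflib import SequenceMatcher

x = "abc_def"
y = "abc--ef"

matcher = SequenceMatcher(None, x, y)
# Each opcode is (tag, i1, i2, j1, j2); tag is 'equal', 'replace', 'delete' or 'insert'.
changes = [(tag, x[i1:i2], y[j1:j2])
           for tag, i1, i2, j1, j2 in matcher.get_opcodes()
           if tag != 'equal']
print(changes)  # [('replace', '_d', '--')]
```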
Python - compare two string by words using difflib and print only difference
If you don't have to use difflib, you could use a set and string splitting!
>>> original = "Apple Microsoft Google Oracle"
>>> edited = "Apple Nvdia IBM"
>>> set(original.split()).symmetric_difference(set(edited.split()))
{'IBM', 'Google', 'Oracle', 'Microsoft', 'Nvdia'}
You can also get the shared members with .intersection():
>>> set(original.split()).intersection(set(edited.split()))
{'Apple'}
Wikipedia has a good section on basic set operations, with accompanying Venn diagrams:
https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations
However, if you have to use difflib (some strange environment or assignment), you can also just find every member with a + or - prefix and slice off the prefixes:
>>> import difflib
>>> d = difflib.Differ()
>>> diff = d.compare(original.split(), edited.split())
>>> list(a[2:] for a in diff if a.startswith(("+", "-")))
['Nvdia', 'IBM', 'Microsoft', 'Google', 'Oracle']
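If you also want to know which words were removed and which were added, you could split the Differ output by prefix. This is a sketch as a standalone script; the names removed and added are made up for illustration.

```python
import difflib

original = "Apple Microsoft Google Oracle"
edited = "Apple Nvdia IBM"

# Materialise the generator so it can be scanned twice.
diff = list(difflib.Differ().compare(original.split(), edited.split()))

# Lines starting with "- " exist only in the original, "+ " only in the edit.
removed = [line[2:] for line in diff if line.startswith("- ")]
added = [line[2:] for line in diff if line.startswith("+ ")]

print(removed)  # ['Microsoft', 'Google', 'Oracle']
print(added)    # ['Nvdia', 'IBM']
```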
All of these operations result in an iterable of strings, so you can .join() 'em together (or similar) to get a single result, as you do in your question. Here result is assumed to hold the symmetric difference from above:
>>> print("\n".join(result))
IBM
Google
Oracle
Microsoft
Nvdia
Difference between two strings (sentences)
Use a simple list comprehension:
diff = [x for x in difflib.ndiff(text1_lines, text2_lines) if x[0] != ' ']
It will show you the deletions and additions.
Output:
['- ', '- D', '- i', '- f', '- f', '- e', '- r', '- e', '- n', '- c', '- e']
(everything with a minus in front of it was deleted)
Conversely, switching text1_lines and text2_lines would produce this result:
['+ ', '+ D', '+ i', '+ f', '+ f', '+ e', '+ r', '+ e', '+ n', '+ c', '+ e']
To remove signs, you can convert the above list:
diff_nl = [x[2] for x in diff]
To fully convert to a string, just use .join():
diff_nl = ''.join([x[2] for x in diff])
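For reference, here is a self-contained run of the same steps. The two input strings are hypothetical stand-ins for text1_lines and text2_lines, chosen only so that the output resembles the one shown above; passing plain strings makes ndiff compare character by character.

```python
import difflib

# Hypothetical inputs; plain strings are compared character by character.
text1 = "Python Difference"
text2 = "Python"

# Keep only entries whose first character is not a space (i.e. changed items).
diff = [x for x in difflib.ndiff(text1, text2) if x[0] != ' ']
# Strip the two-character prefix from the deleted items and join them.
removed = ''.join(x[2:] for x in diff if x.startswith('- '))
print(repr(removed))  # ' Difference'
```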