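I am trying to split a full name into first, middle, and last name in pandas, but I am stuck at the replace step.
I think you need mask, which replaces the middle name with an empty string when it holds the same value as the last name (this happens for two-word names, where splitted.str[1] and splitted.str[-1] are the same token):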
import pandas as pd
df = pd.DataFrame({'owner1_name':['THOMAS MARY D', 'JOE Long', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
df['owner1_middle_name'] = splitted.str[1]
df['owner1_middle_name'] = (df['owner1_middle_name']
                            .mask(df['owner1_middle_name'] == df['owner1_last_name'], ''))
print (df)
     owner1_name  owner1_first_name  owner1_last_name  owner1_middle_name
0  THOMAS MARY D             THOMAS                 D                MARY
1       JOE Long                JOE              Long
2     MARY Small               MARY             Small
Which is the same as:
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
middle = splitted.str[1]
df['owner1_middle_name'] = middle.mask(middle == df['owner1_last_name'], '')
print (df)
     owner1_name  owner1_first_name  owner1_last_name  owner1_middle_name
0  THOMAS MARY D             THOMAS                 D                MARY
1       JOE Long                JOE              Long
2     MARY Small               MARY             Small
EDIT:
To replace row by row, you can use apply with axis=1:
df = pd.DataFrame({'owner1_name':['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['a'] = splitted.str[0]
df['b'] = splitted.str[-1]
df['c'] = df.apply(lambda x: x['b'].replace(x['a'], ''), axis=1)
print (df)
          owner1_name       a            b      c
0  THOMAS MARY-THOMAS  THOMAS  MARY-THOMAS  MARY-
1         JOE LongJOE     JOE      LongJOE   Long
2          MARY Small    MARY        Small  Small
The exact code, in three statements, to achieve what I wanted in my question is:
df['owner1_first_name'] = df['owner1_name'].str.split().str[0]
df['owner1_last_name'] = df.apply(
    lambda x: x['owner1_name'].split()[-1].replace(x['owner1_first_name'], ''),
    axis=1)
df['owner1_middle_name'] = df.apply(
    lambda x: x['owner1_name'].replace(x['owner1_first_name'], '')
                              .replace(x['owner1_last_name'], ''),
    axis=1)
Pandas Full Name Split into First, Middle and Last Names
You can use negative indexing to get the last item in the list for the last name and also use a slice to get all but the first and last for the middle name:
fullnames = "Walter John Ross Schmidt"
first = fullnames.split()[0]
last = fullnames.split()[-1]
middle = " ".join(fullnames.split()[1:-1])
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=last))
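The snippet above assumes the name has at least two words. A minimal sketch of a helper that also copes with one-word names and empty strings; the name split_full_name and the choice to return empty strings for missing parts are assumptions made for illustration, not part of the original answer:

```python
def split_full_name(fullname):
    """Return (first, middle, last); parts that are missing become ''."""
    parts = fullname.split()
    if not parts:
        # Empty or whitespace-only input: nothing to split.
        return "", "", ""
    if len(parts) == 1:
        # A single word is treated as a first name only (an assumption).
        return parts[0], "", ""
    # Everything between the first and last word is the middle name.
    return parts[0], " ".join(parts[1:-1]), parts[-1]


print(split_full_name("Walter John Ross Schmidt"))
print(split_full_name("JOE Long"))
```

Treating a single word as a first name is only one possible convention; adjust it if your data stores surnames alone.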
PS: If you are working with a DataFrame, you can use:
df = pd.DataFrame({'fullnames':['Walter John Ross Schmidt']})
df = df.assign(**{
'first': df['fullnames'].str.split().str[0],
'middle': df['fullnames'].str.split().str[1:-1].str.join(' '),
'last': df['fullnames'].str.split().str[-1]
})
Output:
                  fullnames   first     middle     last
0  Walter John Ross Schmidt  Walter  John Ross  Schmidt
Python CSV Splitting Full Name String into First and Last Name
Here is some code that will help. As others have suggested in the comments, you can use the csv module:
import csv

# Read the full names from the first column of old.csv (Python 3 text mode)
with open('old.csv', newline='') as f:
    reader = csv.reader(f)
    rows = []
    for row in reader:
        parts = row[0].split()
        # First word is the first name, last word is the last name
        rows.append({"first name": parts[0], "last name": parts[-1]})

# DictWriter.writerows expects one dict per output row
with open('new.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=["first name", "last name"])
    w.writeheader()
    w.writerows(rows)
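The same split can be tried without touching the disk. A minimal sketch, assuming in-memory io.StringIO buffers stand in for old.csv and new.csv; the function name split_names_csv and the sample names are illustrative only:

```python
import csv
import io


def split_names_csv(src, dst):
    """Read full names from column 0 of src and write first/last names to dst."""
    writer = csv.DictWriter(dst, fieldnames=["first name", "last name"])
    writer.writeheader()
    for row in csv.reader(src):
        parts = row[0].split() if row else []
        if not parts:
            continue  # skip blank lines instead of raising IndexError
        writer.writerow({"first name": parts[0], "last name": parts[-1]})


src = io.StringIO("David Andrew Joyner\n\nKevin Pietersen\n")
dst = io.StringIO()
split_names_csv(src, dst)
print(dst.getvalue())
```

Writing each row as it is read avoids holding the whole file in memory, which may matter for large inputs.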
Reversing names in file input and output
input_file.txt
David Andrew Joyner
Hart, Melissa Joan
Cyrus, Billy Ray
Kevin Pietersen
Rowling, Johanne Kimberley
CODE
def name_fixer(input_file, output_file):
    with open(input_file, "r") as f, open(output_file, "w+") as g:
        for line in f:
            line = line.strip()
            if ',' in line:
                line = ' '.join(line.split(',')[::-1]).strip()
            g.write(f"{line}\n")

myinputfile = "input_file.txt"
myoutputfile = "output_file.txt"
name_fixer(myinputfile, myoutputfile)
output_file.txt
David Andrew Joyner
Melissa Joan Hart
Billy Ray Cyrus
Kevin Pietersen
Johanne Kimberley Rowling
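The per-line logic can also be pulled out into a small function so it can be checked without any files. This is only a sketch; the name fix_name is hypothetical, and it assumes at most one comma per line, as in the sample input:

```python
def fix_name(line):
    """Turn 'Last, First Middle' into 'First Middle Last'; leave other lines alone."""
    line = line.strip()
    if "," not in line:
        return line
    # Split at the first comma: surname on the left, given names on the right.
    last, _, rest = line.partition(",")
    return f"{rest.strip()} {last.strip()}".strip()


for sample in ["David Andrew Joyner", "Hart, Melissa Joan", "Cyrus, Billy Ray"]:
    print(fix_name(sample))
```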
Prevent string from splitting on comma when writing a CSV file
The CSV format treats every comma as a field separator, so an unquoted string that contains a comma is split into multiple fields. To avoid this, enclose each string that may contain a comma in double quotes:
Output_File.write('"{}","{}","{}"'.format(restaurant, names[i], review[i]))
This way you can still use a comma as the delimiter.
Alternatively, you can use a different character as the delimiter.
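As an alternative to building the quoted line by hand, Python's csv.writer adds the quotes itself whenever a field contains the delimiter or a quote character. A minimal sketch, writing to an in-memory buffer; the restaurant, name and review values are made up for illustration:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default quoting is csv.QUOTE_MINIMAL

# Hypothetical sample row: two of the three fields contain commas.
writer.writerow(["Joe's Diner", "Hart, Melissa", 'Great food, "friendly" staff'])

line = buf.getvalue()
print(line)

# Reading the line back yields the original three fields, commas intact.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

A different separator can be selected with the delimiter argument, for example csv.writer(buf, delimiter=';').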
Alternative way to split a list into groups of n
A Python recipe (in Python 2.6, use itertools.izip_longest):
import itertools

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    # The same iterator repeated n times, so zip_longest pulls n items per group
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)
Example usage:
>>> list(grouper(3, range(9)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
If you want the last group to be shorter than the others instead of padded with fillvalue, you could change the code like this (note that this also drops any None values that are present in the input itself):
>>> def mygrouper(n, iterable):
...     args = [iter(iterable)] * n
...     return ([e for e in t if e is not None] for t in itertools.zip_longest(*args))
...
>>> list(mygrouper(3, range(9)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
>>> list(mygrouper(3, range(10)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
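If the input may legitimately contain None, filtering on None loses data. One possible alternative, which is not the zip_longest recipe above but a variant based on itertools.islice, is sketched here; the name chunks is hypothetical:

```python
import itertools


def chunks(n, iterable):
    """Yield lists of up to n items; only the last list may be shorter."""
    it = iter(iterable)
    while True:
        # islice consumes at most n items from the shared iterator.
        chunk = list(itertools.islice(it, n))
        if not chunk:
            return  # iterator exhausted
        yield chunk


print(list(chunks(3, range(10))))
print(list(chunks(2, [None, 1, None])))
```

Because no fill value is ever introduced, nothing has to be filtered out afterwards, and None items in the input are preserved.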