I Am Trying to Split a Full Name to First Middle and Last Name in Pandas But I Am Stuck At Replace

I am trying to split a full name into first, middle, and last name in pandas, but I am stuck at replace

I think you need mask, which replaces the middle name with an empty string wherever it holds the same value as the last name:

import pandas as pd

df = pd.DataFrame({'owner1_name':['THOMAS MARY D', 'JOE Long', 'MARY Small']})

splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
df['owner1_middle_name'] = splitted.str[1]
df['owner1_middle_name'] = df['owner1_middle_name'].mask(
    df['owner1_middle_name'] == df['owner1_last_name'], '')
print(df)
     owner1_name owner1_first_name owner1_last_name owner1_middle_name
0  THOMAS MARY D            THOMAS                D               MARY
1       JOE Long               JOE             Long
2     MARY Small              MARY            Small

This is the same as:

splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
middle = splitted.str[1]
df['owner1_middle_name'] = middle.mask(middle == df['owner1_last_name'], '')
print (df)
     owner1_name owner1_first_name owner1_last_name owner1_middle_name
0  THOMAS MARY D            THOMAS                D               MARY
1       JOE Long               JOE             Long
2     MARY Small              MARY            Small
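
Both versions above assume every name has at least two words. As a minimal sketch, the word count can decide which parts to blank instead of an equality test, which also covers one-word names, where splitted.str[1] is missing. The column names first, middle and last and the one-word sample 'CHER' are assumptions made for illustration, not part of the question:

```python
import pandas as pd

# Sample data; 'CHER' is a hypothetical one-word name added for illustration.
df = pd.DataFrame({'owner1_name': ['THOMAS MARY D', 'JOE Long', 'CHER']})

tokens = df['owner1_name'].str.split()
n_tokens = tokens.str.len()  # number of words in each name

df['first'] = tokens.str[0]
# Keep the last word only when the name has at least two words.
df['last'] = tokens.str[-1].where(n_tokens > 1, '')
# Keep the second word only when the name has at least three words.
df['middle'] = tokens.str[1].where(n_tokens > 2, '')
print(df)
```

Series.where keeps a value where the condition is True and substitutes the second argument elsewhere, so it is the mirror image of the mask call used above.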

EDIT:

To replace row by row, you can use apply with axis=1:

df = pd.DataFrame({'owner1_name':['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})

splitted = df['owner1_name'].str.split()
df['a'] = splitted.str[0]
df['b'] = splitted.str[-1]

df['c'] = df.apply(lambda x: x['b'].replace(x['a'], ''), axis=1)
print (df)
          owner1_name       a            b      c
0  THOMAS MARY-THOMAS  THOMAS  MARY-THOMAS  MARY-
1         JOE LongJOE     JOE      LongJOE   Long
2          MARY Small    MARY        Small  Small

The exact code, in three statements, that achieves what I wanted in my question is:

df['owner1_first_name'] = df['owner1_name'].str.split().str[0]
df['owner1_last_name'] = df.apply(
    lambda x: x['owner1_name'].split()[-1].replace(x['owner1_first_name'], ''),
    axis=1)
df['owner1_middle_name'] = df.apply(
    lambda x: x['owner1_name'].replace(x['owner1_first_name'], '')
                              .replace(x['owner1_last_name'], ''),
    axis=1)

Pandas Full Name Split into First, Middle and Last Names

You can use negative indexing to get the last item in the list for the last name and also use a slice to get all but the first and last for the middle name:

fullnames = "Walter John  Ross Schmidt"
first = fullnames.split()[0]
last = fullnames.split()[-1]
middle = " ".join(fullnames.split()[1:-1])
print("First = {first}".format(first=first))
print("Middle = {middle}".format(middle=middle))
print("Last = {last}".format(last=last))
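
The indexing above raises an IndexError on an empty string and reports the same word as both first and last name when only one word is present. As a hedged sketch, the same slicing can be wrapped in a small helper that returns empty strings for missing parts. The function name split_full_name is an assumption made for illustration:

```python
def split_full_name(fullname):
    """Return (first, middle, last); missing parts are empty strings."""
    parts = fullname.split()  # split() with no argument collapses repeated spaces
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        # A single word is treated as a first name only.
        return parts[0], "", ""
    # Everything between the first and last word is the middle name.
    return parts[0], " ".join(parts[1:-1]), parts[-1]


print(split_full_name("Walter John  Ross Schmidt"))
print(split_full_name("Kevin Pietersen"))
print(split_full_name("Cher"))
```

Treating a single word as a first name is a design choice; depending on the data it may make more sense to treat it as a last name.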

PS: if you are working with a DataFrame, you can use:

df = pd.DataFrame({'fullnames': ['Walter John  Ross Schmidt']})
df = df.assign(**{
    'first': df['fullnames'].str.split().str[0],
    'middle': df['fullnames'].str.split().str[1:-1].str.join(' '),
    'last': df['fullnames'].str.split().str[-1]
})

Output:

                   fullnames   first     middle     last
0  Walter John  Ross Schmidt  Walter  John Ross  Schmidt

Python CSV Splitting Full Name String into First and Last Name

Here is some code that should help. You can use the csv module, as others have suggested in the comments:

import csv

# Read the full names from the first column of old.csv.
rows = []
with open('old.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        if not row:
            continue  # skip blank lines
        parts = row[0].split()
        # The first word is the first name, the last word is the last name.
        rows.append({"first name": parts[0], "last name": parts[-1]})

# DictWriter.writerows expects one dict per output row.
with open('new.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=["first name", "last name"])
    w.writeheader()
    w.writerows(rows)

Reversing names in file input and output

input_file.txt

David Andrew Joyner 
Hart, Melissa Joan
Cyrus, Billy Ray
Kevin Pietersen
Rowling, Johanne Kimberley

CODE

def name_fixer(input_file, output_file):
    with open(input_file, "r") as f, open(output_file, "w+") as g:
        for line in f:
            line = line.strip()
            if ',' in line:
                line = ' '.join(line.split(',')[::-1]).strip()
            g.write(f"{line}\n")

myinputfile = "input_file.txt"
myoutputfile = "output_file.txt"
name_fixer(myinputfile, myoutputfile)

output_file.txt

David Andrew Joyner
Melissa Joan Hart
Billy Ray Cyrus
Kevin Pietersen
Johanne Kimberley Rowling

Prevent string from splitting on comma when writing a CSV file

A CSV reader splits every line into fields at each comma. To keep a comma inside a field, enclose the string that contains it in double quotes:

Output_File.write('"{}","{}","{}"'.format(restaurant, names[i], review[i]))

This way the file can still be comma-delimited.

You can also use a different character as the delimiter instead of a comma.
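
Manual quoting breaks if a field itself contains a double quote. As a minimal sketch, the standard csv.writer adds the quotes only where they are needed and doubles any embedded double quote. The sample restaurant, name and review values are made up for illustration, and io.StringIO stands in for the real output file:

```python
import csv
import io

# Hypothetical sample values; the second and third contain commas.
restaurant = "Joe's Diner"
name = "Hart, Melissa"
review = 'Great "pie", slow service'

buffer = io.StringIO()  # stands in for a file opened with newline=''
writer = csv.writer(buffer)
writer.writerow([restaurant, name, review])

line = buffer.getvalue()
print(line)

# Reading the line back yields the original three fields, commas intact.
row = next(csv.reader(io.StringIO(line)))
print(row)
```

With the default quoting mode, only the fields that contain a comma or a quote character are quoted, so plain fields stay unchanged.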

Alternative way to split a list into groups of n

A Python recipe (In Python 2.6, use itertools.izip_longest):

import itertools

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.zip_longest(*args, fillvalue=fillvalue)

Example usage:

>>> list(grouper(3, range(9)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(grouper(3, range(10)))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

If you want the last group to be shorter than the others instead of padded with fillvalue, then you could e.g. change the code like this:

>>> def mygrouper(n, iterable):
...     args = [iter(iterable)] * n
...     return ([e for e in t if e is not None] for t in itertools.zip_longest(*args))
...
>>> list(mygrouper(3, range(9)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
>>> list(mygrouper(3, range(10)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
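
Filtering out None also drops any None values that are genuinely part of the input. A possible alternative, sketched here with itertools.islice rather than zip_longest, takes n items at a time and stops when nothing is left. The name chunked is an assumption made for illustration:

```python
from itertools import islice


def chunked(iterable, n):
    """Yield lists of up to n items; the last list may be shorter."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take the next n items, or fewer at the end
        if not chunk:
            return
        yield chunk


print(list(chunked(range(10), 3)))
print(list(chunked([None, 1, None], 2)))
```

Because no fill value is involved, None items in the input are kept as they are.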

