How to Split a Dataframe Column by The First Instance of a Character in Its Values

Another option might be to use tidyr::separate:

separate(x,a,into = c("b","c"),sep = "_",remove = FALSE,extra = "merge")

Split column on first occurrence of '-'

Setup

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

          house_nr
0     123-Rd-thing
1        456-House
2  567-House-thing

Using a list comprehension with str.split, which is often faster than the pandas string methods (worth timing on your own data):

pd.DataFrame([i.split('-', 1) for i in df.house_nr], columns=['num', 'suffix'])

   num       suffix
0  123     Rd-thing
1  456        House
2  567  House-thing
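One caveat with the list comprehension: a value with no '-' yields a one-element list, so the rows no longer all have two fields. A minimal sketch of a variant using str.partition, which always returns three parts; the third sample value ('789') is a made-up row added only to show the missing-separator case:

```python
import pandas as pd

# '789' is a hypothetical value with no separator, added for illustration.
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '789']})

# str.partition splits at the first '-' and always returns
# (head, separator, tail); tail is '' when no '-' is present.
parts = [s.partition('-') for s in df['house_nr']]

out = pd.DataFrame(
    [(head, tail) for head, _, tail in parts],
    columns=['num', 'suffix'],
    index=df.index,  # keep the original row labels
)
print(out)
```

Whether an empty string or a missing value is the better placeholder depends on what happens to the column afterwards.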

Is there a better way to split a pandas dataframe column based on some characters?

You can split the column data directly with a \D+ pattern that matches one or more chars other than digits (since space, X or Y are non-digits):

import pandas as pd
xx = {'Code': ["001", "002","003"], 'Date': ["202103151716Y202103151716","202103151716X202103151716","202103151716 202103151716"]}
df = pd.DataFrame(data=xx)
df[['Date1', 'Date2']] = df['Date'].str.split(r'\D+', n=1, expand=True)  # pass n by keyword; pandas 2.0+ rejects it as a positional argument
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
df
# => Code Date Date1 Date2
# 0 001 202103151716Y202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
# 1 002 202103151716X202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
# 2 003 202103151716 202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00

So there is no need to replace anything in the first place.

After splitting on the non-digit characters, you can use pd.to_datetime() to convert the resulting digit strings to a datetime type.

Note that [ XY] is a character class that matches only a space, X or Y, but it seems the \D+ non-digit pattern should be safe with the data you showed.
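If the looser \D+ pattern feels too permissive, the stricter character class can be combined with str.extract. This is only a sketch under the assumption that every value is exactly twelve digits, one separator, then twelve digits; the explicit format string is also an assumption based on the sample values:

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['001', '002', '003'],
    'Date': ['202103151716Y202103151716',
             '202103151716X202103151716',
             '202103151716 202103151716'],
})

# Two capture groups around the [ XY] separator; rows that do not match
# the whole pattern come back as NaN instead of being split silently.
parts = df['Date'].str.extract(r'^(\d{12})[ XY](\d{12})$')

# Assumed layout: YYYYmmddHHMM.
df['Date1'] = pd.to_datetime(parts[0], format='%Y%m%d%H%M')
df['Date2'] = pd.to_datetime(parts[1], format='%Y%m%d%H%M')
print(df[['Code', 'Date1', 'Date2']])
```

Giving to_datetime an explicit format avoids relying on format inference for a string that has no delimiters.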

Split dataframe text column by first occurrence and last occurrence of dash '-'

You can use split on the first occurrence of the separator, then rsplit on the last occurrence:

df[['location','position']] = df.pop('row').str.split('-', n=1, expand=True)
df[['position','company']] = df['position'].str.rsplit('-', n=1, expand=True)
print (df)
   location            position    company
0     india             manager      intel
1     india       sales-manager     amazon
2  banglore  ccm- head - county  jp morgan
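The input frame is not shown above, so here is a self-contained sketch of the same two-step split. The 'row' values are assumed, reconstructed from the printed output, and the intermediate column name 'rest' is made up for readability:

```python
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({'row': ['india-manager-intel',
                           'india-sales-manager-amazon']})

# Step 1: split once from the left, so everything after the first '-'
# stays together in 'rest'.
df[['location', 'rest']] = df['row'].str.split('-', n=1, expand=True)

# Step 2: split once from the right, so only the last '-' separates
# the company from the position.
df[['position', 'company']] = df['rest'].str.rsplit('-', n=1, expand=True)

df = df.drop(columns=['row', 'rest'])
print(df)
```

Hyphens inside the middle field (as in 'sales-manager') survive because neither split is allowed to cut more than once.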

Pandas df.str.split() on first element only

Try adding n=1, which limits the split to the first occurrence:

filtered_transcript_text['msgText'].str.split(':', n=1, expand=True)
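To see what n=1 changes, here is a small comparison on made-up messages (the sample strings are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical messages; the first one contains a second ':' in the time.
s = pd.Series(['alice: meeting at 10:30', 'bob: ok'])

# Without n, every ':' is a split point, so the time gets cut in two
# and shorter rows are padded with missing values.
all_splits = s.str.split(':', expand=True)

# With n=1, only the first ':' is used and the rest stays intact.
first_only = s.str.split(':', n=1, expand=True)

print(all_splits.shape)   # three columns
print(first_only.shape)   # two columns
print(first_only)
```

The remainder keeps its leading space; chain .str.strip() on that column if that matters.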

Split dataframe column into two based on first occurrence of an item in column value

Add the parameter expand=True to get a DataFrame back, then assign it to a list of new column names:

df[['date','time']] = df.Time.str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302

date time
0 [29/Nov/2017 06:58:55
1 [29/Nov/2017 06:59:02

Or add Series.str.strip first to remove the square brackets:

df[['date','time']] = df.Time.str.strip('[]').str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302

date time
0 29/Nov/2017 06:58:55
1 29/Nov/2017 06:59:02

Splitting pandas dataframe column (into two) after the first letter in the cell

Split on the first whitespace:

df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)
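A runnable sketch of that one-liner, assuming each cell holds a one-letter code, some whitespace, and a number. The sample values are invented for illustration:

```python
import pandas as pd

# Hypothetical cells: letter, whitespace (any amount), then a number.
df = pd.DataFrame({'Percentage': ['A 8.25', 'R   5.53', 'N 4.06']})

# With no separator given, str.split cuts on runs of whitespace;
# n=1 stops after the first run.
df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)

# The numeric part is still text after the split, so convert it.
df['Percentage'] = df['Percentage'].astype(float)
print(df)
```

If some cells have no whitespace at all, the second column is missing for those rows and the float conversion yields NaN.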

Get last column after .str.split() operation on column in pandas DataFrame

Do this:

In [43]: temp2.str[-1]
Out[43]:
0    p500
1    p600
2    p700
Name: ticker

So all together it would be:

>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
>>> temp['ticker'].str.split(' ').str[-1]
0    p500
1    p600
2    p700
Name: ticker, dtype: object
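When only the last field is needed, a possible alternative is to split once from the right instead of splitting on every space. A sketch using the same data:

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 p600',
                                'spx 5/25/2001 p700']})

# rsplit with n=1 cuts only at the last space, giving two-element lists;
# .str[-1] then picks the final element of each list.
last = temp['ticker'].str.rsplit(' ', n=1).str[-1]
print(last)
```

Both forms return the same values here; rsplit simply avoids building the intermediate pieces that are thrown away.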

Splitting on first occurrence

From the docs:

str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).

s.split('mango', 1)[1]
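Note that indexing with [1] raises IndexError when the delimiter is absent. A short sketch contrasting it with str.partition, which never fails; the sample strings are made up:

```python
# Hypothetical sample string containing the delimiter twice.
s = '123mango abcd mango kiwi peach'

# maxsplit=1: cut at the first 'mango' only, then take what follows.
after = s.split('mango', 1)[1]

# partition returns (before, separator, after) in one call.
head, sep, tail = s.partition('mango')

# When the delimiter is missing, partition returns the whole string
# plus two empty strings, whereas split(...)[1] would raise IndexError.
missing = 'no fruit here'.partition('mango')

print(repr(after))
print(missing)
```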

