How to Split a Dataframe Column by The First Instance of a Character in Its Values

How to split a dataframe column by the first instance of a character in its values

In R, another option is tidyr::separate. With extra = "merge", everything after the first separator is kept together in the last column:

separate(x, a, into = c("b", "c"), sep = "_", remove = FALSE, extra = "merge")

Split column on first occurrence of '-'

Setup

import pandas as pd

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

          house_nr
0     123-Rd-thing
1        456-House
2  567-House-thing

Using a list comprehension with str.split and maxsplit=1, which is typically faster than the pandas string methods on a column of plain strings:

pd.DataFrame([i.split('-', 1) for i in df.house_nr], columns=['num', 'suffix'])

   num       suffix
0  123     Rd-thing
1  456        House
2  567  House-thing
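It can help to check what happens when a value contains no separator at all. The sketch below is only an illustration: the third value ('789') is a hypothetical addition to the sample data, and it uses the pandas string method rather than the list comprehension, since str.split with expand=True fills the missing piece with a missing value.

```python
import pandas as pd

# '789' is a hypothetical value with no dash, added for illustration
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '789']})

# n=1 limits the split to the first dash; expand=True returns a DataFrame
parts = df['house_nr'].str.split('-', n=1, expand=True)
parts.columns = ['num', 'suffix']

print(parts)
```

The row without a dash keeps its whole value in num and gets a missing value in suffix, which can then be handled with fillna or dropna as needed.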

Is there a better way to split a pandas dataframe column based on some characters?

You can split the column data directly with a \D+ pattern that matches one or more chars other than digits (since space, X or Y are non-digits):

import pandas as pd
xx = {'Code': ["001", "002","003"], 'Date': ["202103151716Y202103151716","202103151716X202103151716","202103151716 202103151716"]}
df = pd.DataFrame(data=xx)
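# n must be passed by keyword in recent pandas versions
df[['Date1', 'Date2']] = df['Date'].str.split(r'\D+', n=1, expand=True)
# An explicit format avoids relying on format inference for the digit strings
df['Date1'] = pd.to_datetime(df['Date1'], format='%Y%m%d%H%M')
df['Date2'] = pd.to_datetime(df['Date2'], format='%Y%m%d%H%M')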
df
# =>  Code                       Date               Date1               Date2
#   0  001  202103151716Y202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
#   1  002  202103151716X202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
#   2  003  202103151716 202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00

So there is no need to replace anything first.

After splitting on the non-digit characters, use pd.to_datetime() to convert the resulting digit strings to a datetime type.

Note that [ XY] is a character class that matches only a space, X or Y, but it seems the \D+ non-digit pattern should be safe with the data you showed.
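The difference between the two patterns can be checked with plain re.split before applying either to a column. This is a small sketch; the value with a 'T' separator is a hypothetical case that does not appear in the data shown.

```python
import re

samples = [
    "202103151716Y202103151716",
    "202103151716X202103151716",
    "202103151716 202103151716",
]
odd = "202103151716T202103151716"  # hypothetical separator not in the original data

# \D+ splits on any run of non-digits; [ XY] only on a space, X or Y
broad = [re.split(r'\D+', s, maxsplit=1) for s in samples + [odd]]
narrow = [re.split(r'[ XY]', s, maxsplit=1) for s in samples + [odd]]

for b, n in zip(broad, narrow):
    print(b, n)
```

For the three sample values both patterns give the same two pieces. For the hypothetical 'T' value only \D+ splits, while [ XY] returns the string unchanged as a single element.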

Split dataframe text column by first occurrence and last occurrence of dash '-'

You can combine split and rsplit, each with n=1, to split on the first and then on the last occurrence of the separator:

df[['location','position']] = df.pop('row').str.split('-', n=1, expand=True)
df[['position','company']] = df['position'].str.rsplit('-', n=1, expand=True)
print (df)
    location            position     company
0     india              manager       intel
1     india        sales-manager      amazon
2  banglore   ccm- head - county   jp morgan
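For a version that runs on its own, the sketch below rebuilds an input column. The 'row' values are an assumption, reconstructed so that they produce the output shown above; the original input is not given.

```python
import pandas as pd

# Hypothetical input, reconstructed to match the printed result
df = pd.DataFrame({'row': ['india-manager-intel',
                           'india-sales-manager-amazon',
                           'banglore-ccm- head - county-jp morgan']})

# The first dash separates the location from the rest
df[['location', 'position']] = df.pop('row').str.split('-', n=1, expand=True)
# The last dash separates the company from the position
df[['position', 'company']] = df['position'].str.rsplit('-', n=1, expand=True)

print(df)
```

Any dashes between the first and the last one stay inside the position column.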

Pandas df.str.split() on first element only

Try adding n=1 so that only the first ':' is used as a split point:

filtered_transcript_text['msgText'].str.split(':', n=1, expand=True)

Split dataframe column into two based on first occurrence of an item in column value

Pass expand=True so that the split returns a DataFrame, then assign it to a list of new column names:

df[['date','time']] = df.Time.str.split(":", n=1, expand=True)
print (df)
           IP                   Time                        URL  Staus  \
0  10.128.2.1  [29/Nov/2017:06:58:55     GET/login.php HTTP/1.1    200   
1  10.128.2.1  [29/Nov/2017:06:59:02  POST/process.php HTTP/1.1    302   

           date      time  
0  [29/Nov/2017  06:58:55  
1  [29/Nov/2017  06:59:02

Or chain Series.str.strip first to remove any leading or trailing square brackets:

df[['date','time']] = df.Time.str.strip('[]').str.split(":", n=1, expand=True)
print (df)
           IP                   Time                        URL  Staus  \
0  10.128.2.1  [29/Nov/2017:06:58:55     GET/login.php HTTP/1.1    200   
1  10.128.2.1  [29/Nov/2017:06:59:02  POST/process.php HTTP/1.1    302   

          date      time  
0  29/Nov/2017  06:58:55  
1  29/Nov/2017  06:59:02

Splitting pandas dataframe column (into two) after the first letter in the cell

Split on the first whitespace:

df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)
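A runnable sketch of the same call follows. The sample values are hypothetical, since the original data is not shown; the assumption is that each cell holds a one-letter code, whitespace, and then the percentage text.

```python
import pandas as pd

# Hypothetical data: a one-letter code, whitespace, then the percentage text
df = pd.DataFrame({'Percentage': ['A 12.5', 'G  7.1', 'L 9.9 %']})

# With no pattern given, str.split splits on runs of whitespace;
# n=1 stops after the first run
df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)

print(df)
```

Because no separator is passed, two consecutive spaces count as one split point, and any later whitespace stays in the second column.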

Get last column after .str.split() operation on column in pandas DataFrame

Index the last element of each split list with .str[-1] (here temp2 is the result of the split):

In [43]: temp2.str[-1]
Out[43]: 
0    p500
1    p600
2    p700
Name: ticker

So all together it would be:

>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
>>> temp['ticker'].str.split(' ').str[-1]
0    p500
1    p600
2    p700
Name: ticker, dtype: object
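If only the last piece is needed, a possible variant is to split once from the right instead of splitting the whole string. This is a sketch of that alternative, not part of the original answer:

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                                'spx 5/25/2001 p600',
                                'spx 5/25/2001 p700']})

# rsplit with n=1 performs a single split at the last space
last = temp['ticker'].str.rsplit(' ', n=1).str[-1]

print(last)
```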

Splitting on first occurrence

From the docs:

str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).

s.split('mango', 1)[1]
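Indexing with [1] assumes the separator is present. The sketch below, with hypothetical sample strings, shows that case alongside str.partition, which always returns three parts and so does not raise when the separator is missing.

```python
s = "123mango abcd mango kiwi peach"  # hypothetical sample string

# Everything after the first 'mango'
after_first = s.split('mango', 1)[1]

# partition returns (head, separator, tail) and never raises;
# separator and tail are empty strings when there is no match
head, sep, tail = s.partition('mango')
missing = "no separator here".partition('mango')

print(repr(after_first))
print(head, sep, repr(tail))
print(missing)
```

With split, a string that lacks the separator yields a one-element list, so [1] raises IndexError; with partition the tail is simply an empty string.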
