How to split a dataframe column by the first instance of a character in its values
Another option might be to use tidyr::separate:
separate(x,a,into = c("b","c"),sep = "_",remove = FALSE,extra = "merge")
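On the pandas side, a rough analogue of separate(..., extra = "merge") is Series.str.split with n=1, which splits only at the first separator and keeps the rest of the value together. This is a minimal sketch; the column "a" and its values are hypothetical, chosen to include a value with no separator at all.

```python
import pandas as pd

# Hypothetical data: values with one, two, or no underscores.
x = pd.DataFrame({"a": ["x_1", "y_2_3", "z"]})

# n=1 splits only at the first "_", so "2_3" stays together (like extra = "merge").
# Rows with no "_" get a missing value in the second column.
parts = x["a"].str.split("_", n=1, expand=True)

# Keep the original column, similar to remove = FALSE.
x[["b", "c"]] = parts
print(x)
```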
Split column on first occurrence of '-'
Setup
import pandas as pd
df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})
house_nr
0 123-Rd-thing
1 456-House
2 567-House-thing
Using a list comprehension and str.split, which is typically faster than the pandas string methods:
pd.DataFrame([i.split('-', 1) for i in df.house_nr], columns=['num', 'suffix'])
num suffix
0 123 Rd-thing
1 456 House
2 567 House-thing
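For comparison, here is a self-contained sketch that builds the same two columns both ways: with the plain Python list comprehension and with the pandas accessor Series.str.split(..., n=1, expand=True). It only checks that the values agree; no timing claim is made here.

```python
import pandas as pd

df = pd.DataFrame({'house_nr': ['123-Rd-thing', '456-House', '567-House-thing']})

# Plain Python: split each string at most once on '-'.
fast = pd.DataFrame([s.split('-', 1) for s in df.house_nr], columns=['num', 'suffix'])

# pandas accessor: n=1 limits the split to the first '-' (pass n by keyword).
accessor = df['house_nr'].str.split('-', n=1, expand=True)
accessor.columns = ['num', 'suffix']

print(fast)
print(accessor)
```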
Is there a better way to split a pandas dataframe column based on some characters?
You can split the column data directly with a \D+ pattern, which matches one or more characters other than digits (space, X and Y are all non-digits):
import pandas as pd
xx = {'Code': ["001", "002","003"], 'Date': ["202103151716Y202103151716","202103151716X202103151716","202103151716 202103151716"]}
df = pd.DataFrame(data=xx)
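# n must be passed by keyword in recent pandas versions
df[['Date1', 'Date2']] = df['Date'].str.split(r'\D+', n=1, expand=True)
# an explicit format avoids relying on pandas' format inference
df['Date1'] = pd.to_datetime(df['Date1'], format='%Y%m%d%H%M')
df['Date2'] = pd.to_datetime(df['Date2'], format='%Y%m%d%H%M')
print(df)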
# => Code Date Date1 Date2
# 0 001 202103151716Y202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
# 1 002 202103151716X202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
# 2 003 202103151716 202103151716 2021-03-15 17:16:00 2021-03-15 17:16:00
So there is no need to replace anything first.
After splitting on the non-digit characters, you can use pd.to_datetime() to convert the numeric date strings to a datetime type.
Note that [ XY] is a character class that matches only a space, X or Y, but the \D+ non-digit pattern seems safe with the data you showed.
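To check that claim on the sample data, the sketch below splits once with both patterns and confirms they produce the same pieces, then converts both columns in one step. The regex=True argument (pandas 1.4+) and the explicit '%Y%m%d%H%M' format are choices made for this sketch, not part of the original answer.

```python
import pandas as pd

xx = {'Code': ["001", "002", "003"],
      'Date': ["202103151716Y202103151716", "202103151716X202103151716", "202103151716 202103151716"]}
df = pd.DataFrame(data=xx)

# Split at most once, using either the broad non-digit pattern or the narrow class.
by_non_digit = df['Date'].str.split(r'\D+', n=1, expand=True, regex=True)
by_class = df['Date'].str.split(r'[ XY]', n=1, expand=True, regex=True)

# Convert each resulting column; extra keyword arguments go to pd.to_datetime.
dates = by_non_digit.apply(pd.to_datetime, format='%Y%m%d%H%M')
print(dates)
```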
Split dataframe text column by first occurrence and last occurrence of dash '-'
You can use split with n=1 to split on the first occurrence of the separator, then rsplit with n=1 to split on the last occurrence:
df[['location','position']] = df.pop('row').str.split('-', n=1, expand=True)
df[['position','company']] = df['position'].str.rsplit('-', n=1, expand=True)
print (df)
location position company
0 india manager intel
1 india sales-manager amazon
2 banglore ccm- head - county jp morgan
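The original 'row' values are not shown, so the sketch below assumes input strings that would produce the printed output. It runs the same two steps: split at the first '-' for the location, then rsplit at the last '-' for the company.

```python
import pandas as pd

# Hypothetical input reconstructed from the output above (assumption).
df = pd.DataFrame({'row': ['india-manager-intel',
                           'india-sales-manager-amazon',
                           'banglore-ccm- head - county-jp morgan']})

# First '-' separates the location from everything else.
df[['location', 'position']] = df.pop('row').str.split('-', n=1, expand=True)

# Last '-' separates the company; inner dashes stay in the position.
df[['position', 'company']] = df['position'].str.rsplit('-', n=1, expand=True)
print(df)
```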
Pandas df.str.split() on first element only
Try adding n=1:
filtered_transcript_text['msgText'].str.split(':', expand=True, n=1)
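A small sketch of the effect, using hypothetical transcript messages that contain more than one ':'. With n=1 only the first colon splits, so the result always has two columns and later colons stay in the message text (note the leading space is kept).

```python
import pandas as pd

# Hypothetical messages: speaker, then text that may itself contain ':'.
filtered_transcript_text = pd.DataFrame(
    {'msgText': ['Alice: meet at 10:30', 'Bob: ok: see you']})

# Split only at the first ':'.
parts = filtered_transcript_text['msgText'].str.split(':', expand=True, n=1)
print(parts)
```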
Split dataframe column into two based on first occurrence of an item in column value
Add the parameter expand=True to get a DataFrame back, then assign it to a list of new column names with []:
df[['date','time']] = df.Time.str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302
date time
0 [29/Nov/2017 06:58:55
1 [29/Nov/2017 06:59:02
Or also add Series.str.strip to remove the leading and trailing brackets:
df[['date','time']] = df.Time.str.strip('[]').str.split(":", n=1, expand=True)
print (df)
IP Time URL Staus \
0 10.128.2.1 [29/Nov/2017:06:58:55 GET/login.php HTTP/1.1 200
1 10.128.2.1 [29/Nov/2017:06:59:02 POST/process.php HTTP/1.1 302
date time
0 29/Nov/2017 06:58:55
1 29/Nov/2017 06:59:02
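A self-contained version of the strip-then-split step, using only the Time values from the printed output above (the other columns are omitted). strip('[]') removes any '[' or ']' characters at either end, and n=1 splits only at the first ':' so the time keeps its own colons.

```python
import pandas as pd

# Time values copied from the output above; IP/URL/Staus omitted.
df = pd.DataFrame({'Time': ['[29/Nov/2017:06:58:55', '[29/Nov/2017:06:59:02']})

df[['date', 'time']] = df['Time'].str.strip('[]').str.split(':', n=1, expand=True)
print(df)
```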
Splitting pandas dataframe column (into two) after the first letter in the cell
Use split with n=1 to split on the first whitespace:
df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)
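The question's data is not shown, so this sketch assumes a Percentage column holding an amino-acid code and a number separated by a space. With no pattern, str.split breaks on whitespace, and n=1 stops after the first break; the original Percentage column is then overwritten with the numeric part.

```python
import pandas as pd

# Hypothetical data: code and percentage packed into one column.
df = pd.DataFrame({'Percentage': ['A 12.5', 'Cys 3.1', 'G 7.0']})

# Split on the first whitespace only.
df[['Amino Acid', 'Percentage']] = df['Percentage'].str.split(n=1, expand=True)
print(df)
```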
Get last column after .str.split() operation on column in pandas DataFrame
Do this:
In [43]: temp2.str[-1]
Out[43]:
0 p500
1 p600
2 p700
Name: ticker
So all together it would be:
>>> temp = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})
>>> temp['ticker'].str.split(' ').str[-1]
0 p500
1 p600
2 p700
Name: ticker, dtype: object
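A variation, offered as an alternative rather than the answer's method: rsplit with n=1 splits only at the last space, so each intermediate list has at most two items, and .str[-1] still selects the last token.

```python
import pandas as pd

temp = pd.DataFrame({'ticker': ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

# Full split, then take the last element of each list.
last_a = temp['ticker'].str.split(' ').str[-1]

# Split only at the last space, then take the last element.
last_b = temp['ticker'].str.rsplit(' ', n=1).str[-1]
print(last_b)
```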
Splitting on first occurrence
From the docs:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).
s.split('mango', 1)[1]
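A sketch with a hypothetical string s (the original is not shown). split with maxsplit 1 returns everything after the first 'mango', but indexing [1] raises IndexError when 'mango' is absent. str.partition, a different built-in, also splits at the first occurrence and always returns a 3-tuple, which avoids that error.

```python
# Hypothetical input; the original s is not shown.
s = "apple mango banana mango cherry"

# Split only at the first 'mango'; [1] is the remainder.
after_first = s.split('mango', 1)[1]
print(repr(after_first))  # ' banana mango cherry'

# partition returns (before, separator, after) for the first match.
head, sep, tail = s.partition('mango')

# When the separator is missing, sep and tail are empty strings.
missing = "apple banana".partition('mango')
print(missing)  # ('apple banana', '', '')
```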