How to split a dataframe string column into two columns?
There might be a better way, but here's one approach:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
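For what it's worth, the same split can probably be done without going through a list, by passing expand=True. A minimal sketch, assuming a sample frame built from the rows shown above (the variable name parts is just for illustration):

```python
import pandas as pd

# Sample data mirroring a few of the rows shown above
df = pd.DataFrame({"row": [
    "00000 UNITED STATES",
    "01000 ALABAMA",
    "01001 Autauga County, AL",
]})

# n=1 splits only at the first space, so spaces in the name are preserved;
# expand=True returns a DataFrame with one column per piece
parts = df["row"].str.split(" ", n=1, expand=True)
parts.columns = ["fips", "row"]
print(parts)
```

Limiting the split to n=1 matters here because names such as "Autauga County, AL" contain spaces themselves.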
Split data frame string column into multiple columns
Use stringr::str_split_fixed
library(stringr)
str_split_fixed(before$type, "_and_", 2)
How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?
You can try using str.rsplit:
Splits string around given separator/delimiter, starting from the right.
df['Col_1'].str.rsplit(' ', n=2, expand=True)
Output:
0 1 2
0 Hello X Y
1 Hello world Q R
2 Hi S T
As a full dataframe:
df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)
Output:
nCol_0 nCol_1 nCol_2 Col_1 Col_2
0 Hello X Y Hello X Y A
1 Hello world Q R Hello world Q R B
2 Hi S T Hi S T C
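To see why splitting from the right helps with strings of varying length, here is a small sketch comparing rsplit with split on the same limit. The sample frame is reconstructed from the output above, and the names from_right and from_left are only illustrative:

```python
import pandas as pd

# Sample frame reconstructed from the output shown above
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Split at most twice, counting separators from the right
from_right = df["Col_1"].str.rsplit(" ", n=2, expand=True)

# Same limit, but counting separators from the left
from_left = df["Col_1"].str.split(" ", n=2, expand=True)

print(from_right)
print(from_left)
```

With rsplit, the last two tokens always land in their own columns and whatever is left over ("Hello world") stays together in the first one; with split, the leftover ends up in the last column instead.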
Split data frame string column into multiple columns (comma separated characters)
One option using str_split, unnest_longer and table:
subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")
df1 <- data.frame(subject, letters)
library(tidyverse)
df1 %>%
mutate(letters = str_split(letters, ', ')) %>%
unnest_longer(letters) %>%
table
#> letters
#> subject a b f g l m z
#> 1 1 1 1 1 0 0 0
#> 2 0 1 0 1 1 1 0
#> 3 0 0 0 1 0 1 1
Created on 2022-02-10 by the reprex package (v2.0.0)
Seeing some of the other answers, separate_rows is a better solution here:
df1 %>%
separate_rows(letters) %>%
table
Python split one column into multiple columns and reattach the split columns into original dataframe
The original data has a unique index, and the code below does not change it in either DataFrame, so you can use concat to stack the two pieces together and then attach them to the original with DataFrame.join or with concat and axis=1:
address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']
nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# select the columns first, then rename, to avoid an error
nonUSA.columns = ['Country', 'State']
df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)
Or:
df = df.join(pd.concat([USA, nonUSA]))
print (df)
ID Residence Name Gender Country State \
0 1 USA;CA;Los Angeles;Los Angeles Ann F USA CA
1 2 USA;MA;Suffolk;Boston Betty F USA MA
2 3 Canada;ON Carl M Canada ON
3 4 USA;FL;Charlotte David M USA FL
4 5 NA Emily F NaN NaN
5 6 Canada;QC Frank M Canada QC
6 7 USA;AZ George M USA AZ
County City
0 Los Angeles Los Angeles
1 Suffolk Boston
2 NaN NaN
3 Charlotte None
4 NaN NaN
5 NaN NaN
6 None None
But it seems this can be simplified:
c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
ID Residence Name Gender Country State \
0 1 USA;CA;Los Angeles;Los Angeles Ann F USA CA
1 2 USA;MA;Suffolk;Boston Betty F USA MA
2 3 Canada;ON Carl M Canada ON
3 4 USA;FL;Charlotte David M USA FL
4 5 NA Emily F NA None
5 6 Canada;QC Frank M Canada QC
6 7 USA;AZ George M USA AZ
County City
0 Los Angeles Los Angeles
1 Suffolk Boston
2 None None
3 Charlotte None
4 None None
5 None None
6 None None
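One caveat with the direct assignment: it relies on the split producing exactly as many columns as there are names. A hedged sketch of one way to guard against that is to reindex the split result to a fixed width before naming the columns. The sample data below is hypothetical (no row has all four parts), and parts and out are illustrative names:

```python
import pandas as pd

# Hypothetical data in which no row has all four parts
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Residence": ["USA;MA;Suffolk", "Canada;ON", "USA;AZ"],
})

cols = ["Country", "State", "County", "City"]

# The split yields only three columns here (0, 1, 2)
parts = df["Residence"].str.split(";", expand=True)

# Force exactly len(cols) columns; missing ones are filled with NaN
parts = parts.reindex(columns=range(len(cols)))
parts.columns = cols

# Both frames share the same index, so join lines the rows up
out = df.join(parts)
print(out)
```

Note that reindex would also drop any extra columns if a row had more than four parts, so this only makes sense when the maximum number of parts is known.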
Splitting a dataframe string column into multiple different columns
A very direct way is to just use read.table on your character vector:
> read.table(text = text, sep = ".", colClasses = "character")
V1 V2 V3 V4
1 F US CLE V13
2 F US CA6 U13
3 F US CA6 U13
4 F US CA6 U13
5 F US CA6 U13
6 F US CA6 U13
7 F US CA6 U13
8 F US CA6 U13
9 F US DL U13
10 F US DL U13
11 F US DL U13
12 F US DL Z13
13 F US DL Z13
colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )
Update (> a year later)...
Alternatively, you can use my cSplit function, like this:
cSplit(as.data.table(text), "text", ".")
# text_1 text_2 text_3 text_4
# 1: F US CLE V13
# 2: F US CA6 U13
# 3: F US CA6 U13
# 4: F US CA6 U13
# 5: F US CA6 U13
# 6: F US CA6 U13
# 7: F US CA6 U13
# 8: F US CA6 U13
# 9: F US DL U13
# 10: F US DL U13
# 11: F US DL U13
# 12: F US DL Z13
# 13: F US DL Z13
Or, separate from "tidyr", like this:
library(dplyr)
library(tidyr)
as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_"))
# V_1 V_2 V_3 V_4
# 1 F US CLE V13
# 2 F US CA6 U13
# 3 F US CA6 U13
# 4 F US CA6 U13
# 5 F US CA6 U13
# 6 F US CA6 U13
# 7 F US CA6 U13
# 8 F US CA6 U13
# 9 F US DL U13
# 10 F US DL U13
# 11 F US DL U13
# 12 F US DL Z13
# 13 F US DL Z13
How to split a dataframe column into 2 new columns, by slicing the all strings before the last item and last item
There are certainly a lot of ways of doing this :) I would go for using str and rpartition. rpartition splits your string into 3 components: the part before the last occurrence of the separator, the separator itself, and the part after it. If you just take the first and last components you should be done.
df[["begining", "ending"]]=df.street.str.rpartition(" ")[[0,2]]
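A small runnable sketch of the same idea, assuming a made-up street column (the sample values and the new column names are hypothetical):

```python
import pandas as pd

# Hypothetical street names; the last row contains no space at all
df = pd.DataFrame({"street": ["Main Street 12", "Baker Street 221B", "Broadway"]})

# rpartition returns three columns: 0 = before the last separator,
# 1 = the separator itself, 2 = after the last separator
parts = df["street"].str.rpartition(" ")

df["beginning"] = parts[0]
df["ending"] = parts[2]
print(df)
```

When the separator is not found, the whole string ends up in the last component and the first one is empty, so "Broadway" gives an empty beginning.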
Split one column into multiple columns using a delimiter
Use str.split to split the column:
df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split('+', expand=True)
and count to count the non-null values in each row:
df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)
print(df)
ID date date2 date3 number of dates
0 3009 2016 2017 None 2
1 129 2015 None None 1
2 119 2014 2019 2020 3
3 120 2020 None None 1
4 121 NaN NaN NaN 0
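For anyone who wants to run this end to end, here is a self-contained sketch. The input frame is an assumption, reconstructed from the printed output above, and parts is an illustrative name:

```python
import numpy as np
import pandas as pd

# Input reconstructed from the printed output (an assumption)
df = pd.DataFrame({
    "ID": [3009, 129, 119, 120, 121],
    "date": ["2016+2017", "2015", "2014+2019+2020", "2020", "NULL"],
})

# Treat the literal string 'NULL' as missing, then split on '+'
parts = df["date"].replace("NULL", np.nan).str.split("+", expand=True)
parts.columns = ["date", "date2", "date3"]

# count(axis=1) counts the non-missing cells in each row
df[["date", "date2", "date3"]] = parts
df["number of dates"] = parts.count(axis=1)
print(df)
```

Replacing 'NULL' before splitting is what makes the last row count as 0 rather than 1.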
How to split a column value at comma into multiple columns and rename them as its number of column as suffix
You can use str.split to split the strings in the column and then attach the resulting DataFrame to the original DataFrame, assigning column names using its width.
temp = df['List_of_Order_Id'].str.split(',', expand=True).applymap(lambda x: np.nan if x is None else x)
df[['Order_Id_'+str(i) for i in range(1,temp.shape[1] + 1)]] = temp
Mobile ... List_of_Order_Id Order_Id_1 Order_Id_2 \
0 9.163820e+08 ... 21810 21810 NaN
1 9.179049e+08 ... 23387 23387 NaN
2 9.183748e+08 ... 21767 21767 NaN
3 9.186110e+08 ... 23457 23457 NaN
4 9.187790e+08 ... 23117,23163 23117 23163
.. ... ... ... ... NaN
353 9.970647e+09 ... 21549 21549 NaN
354 9.971940e+09 ... 22753 22753 NaN
355 9.994742e+09 ... 21505,21836,22291,22539,22734 21505 21836
356 9.994964e+09 ... 22348 22348 NaN
357 9.994997e+09 ... 21100,21550 21100 21550
Order_Id_3 Order_Id_4 Order_Id_5
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
.. NaN NaN NaN
353 NaN NaN NaN
354 NaN NaN NaN
355 22291 22539 22734
356 NaN NaN NaN
357 NaN NaN NaN
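The naming-by-width step can be sketched on a tiny frame. The three sample rows below are made up in the style of the data above, and out is an illustrative name:

```python
import pandas as pd

# Hypothetical rows with one, two and three order ids
df = pd.DataFrame({"List_of_Order_Id": ["21810", "23117,23163", "21505,21836,22291"]})

# One column per id; shorter rows are padded with missing values
temp = df["List_of_Order_Id"].str.split(",", expand=True)

# Name the columns from the width of the split result, starting at 1
temp.columns = [f"Order_Id_{i}" for i in range(1, temp.shape[1] + 1)]

# Attach the new columns to the original frame (same index)
out = df.join(temp)
print(out)
```

Because the names are derived from temp.shape[1], the code keeps working if a later batch of data contains a row with more ids.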
How do I split a string into several columns in a dataframe with pandas Python?
The str.split method has an expand argument:
>>> df['string'].str.split(',', expand=True)
0 1 2
0 astring isa string
1 another string la
2 123 232 another
>>>
With column names:
>>> df['string'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
string1 string2 string3
0 astring isa string
1 another string la
2 123 232 another
Much neater with Python >= 3.6 f-strings:
>>> (df['string'].str.split(',', expand=True)
... .rename(columns=lambda x: f"string_{x+1}"))
string_1 string_2 string_3
0 astring isa string
1 another string la
2 123 232 another