Split Data Frame String Column into Multiple Columns

How to split a dataframe string column into two columns?

There might be a better way, but this here's one approach:

                            row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ',1).tolist(),
columns = ['fips','row'])
   fips                 row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL

Split data frame string column into multiple columns

Use stringr::str_split_fixed

library(stringr)
str_split_fixed(before$type, "_and_", 2)

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the
right.

df['Col_1'].str.rsplit(' ', 2, expand=True)

Output:

             0  1  2
0 Hello X Y
1 Hello world Q R
2 Hi S T

As a full dataframe:

df['Col_1'].str.rsplit(' ', 2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0 Hello X Y Hello X Y A
1 Hello world Q R Hello world Q R B
2 Hi S T Hi S T C

Split data frame string column into multiple columns (comma separated characters)

One option using str_split, unnest_longer and table

subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)


library(tidyverse)

df1 %>%
mutate(letters = str_split(letters, ', ')) %>%
unnest_longer(letters) %>%
table
#> letters
#> subject a b f g l m z
#> 1 1 1 1 1 0 0 0
#> 2 0 1 0 1 1 1 0
#> 3 0 0 0 1 0 1 1

Created on 2022-02-10 by the reprex package (v2.0.0)


Seeing some of the other answers, separate_rows is a better solution here

df1 %>%
separate_rows(letters) %>%
table

Python split one column into multiple columns and reattach the split columns into original dataframe

There is unique index in original data and is not changed in next code for both DataFrames, so you can use concat for join together and then add to original by DataFrame.join or concat with axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
#changed order for avoid error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print (df)
ID Residence Name Gender Country State \
0 1 USA;CA;Los Angeles;Los Angeles Ann F USA CA
1 2 USA;MA;Suffolk;Boston Betty F USA MA
2 3 Canada;ON Carl M Canada ON
3 4 USA;FL;Charlotte David M USA FL
4 5 NA Emily F NaN NaN
5 6 Canada;QC Frank M Canada QC
6 7 USA;AZ George M USA AZ

County City
0 Los Angeles Los Angeles
1 Suffolk Boston
2 NaN NaN
3 Charlotte None
4 NaN NaN
5 NaN NaN
6 None None

But it seems it is possible simplify:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
ID Residence Name Gender Country State \
0 1 USA;CA;Los Angeles;Los Angeles Ann F USA CA
1 2 USA;MA;Suffolk;Boston Betty F USA MA
2 3 Canada;ON Carl M Canada ON
3 4 USA;FL;Charlotte David M USA FL
4 5 NA Emily F NA None
5 6 Canada;QC Frank M Canada QC
6 7 USA;AZ George M USA AZ

County City
0 Los Angeles Los Angeles
1 Suffolk Boston
2 None None
3 Charlotte None
4 None None
5 None None
6 None None

Splitting a dataframe string column into multiple different columns

A very direct way is to just use read.table on your character vector:

> read.table(text = text, sep = ".", colClasses = "character")
V1 V2 V3 V4
1 F US CLE V13
2 F US CA6 U13
3 F US CA6 U13
4 F US CA6 U13
5 F US CA6 U13
6 F US CA6 U13
7 F US CA6 U13
8 F US CA6 U13
9 F US DL U13
10 F US DL U13
11 F US DL U13
12 F US DL Z13
13 F US DL Z13

colClasses needs to be specified, otherwise F gets converted to FALSE (which is something I need to fix in "splitstackshape", otherwise I would have recommended that :) )


Update (> a year later)...

Alternatively, you can use my cSplit function, like this:

cSplit(as.data.table(text), "text", ".")
# text_1 text_2 text_3 text_4
# 1: F US CLE V13
# 2: F US CA6 U13
# 3: F US CA6 U13
# 4: F US CA6 U13
# 5: F US CA6 U13
# 6: F US CA6 U13
# 7: F US CA6 U13
# 8: F US CA6 U13
# 9: F US DL U13
# 10: F US DL U13
# 11: F US DL U13
# 12: F US DL Z13
# 13: F US DL Z13

Or, separate from "tidyr", like this:

library(dplyr)
library(tidyr)

as.data.frame(text) %>% separate(text, into = paste("V", 1:4, sep = "_"))
# V_1 V_2 V_3 V_4
# 1 F US CLE V13
# 2 F US CA6 U13
# 3 F US CA6 U13
# 4 F US CA6 U13
# 5 F US CA6 U13
# 6 F US CA6 U13
# 7 F US CA6 U13
# 8 F US CA6 U13
# 9 F US DL U13
# 10 F US DL U13
# 11 F US DL U13
# 12 F US DL Z13
# 13 F US DL Z13

How to split a dataframe column into 2 new columns, by slicing the all strings before the last item and last item

There are certainly alot of ways of doing this :) I would go for using str and rpartition. rpartition splits your string in 3 components, the remaining part, the partition string, and the part after remaining and the partition string. If you just take the first and remaining part you should be done.

df[["begining", "ending"]]=df.street.str.rpartition(" ")[[0,2]]

split one column into multiple columns usining delimiter

Use str.split to split

df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split('+', expand=True)

and count to count

df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)

print(df)

ID date date2 date3 number of dates
0 3009 2016 2017 None 2
1 129 2015 None None 1
2 119 2014 2019 2020 3
3 120 2020 None None 1
4 121 NaN NaN NaN 0

How to split a column value at comma into multiple columns and rename them as its number of column as suffix

You can use str.split to split the strings in the column and then attach the resulting DataFrame to the original DataFrame, assigning column names using its width.

temp = df['List_of_Order_Id'].str.split(',', expand=True).applymap(lambda x: np.nan if x is None else x)
df[['Order_Id_'+str(i) for i in range(1,temp.shape[1] + 1)]] = temp

Mobile ... List_of_Order_Id Order_Id_1 Order_Id_2 \
0 9.163820e+08 ... 21810 21810 NaN
1 9.179049e+08 ... 23387 23387 NaN
2 9.183748e+08 ... 21767 21767 NaN
3 9.186110e+08 ... 23457 23457 NaN
4 9.187790e+08 ... 23117,23163 23117 23163
.. ... ... ... ... NaN
353 9.970647e+09 ... 21549 21549 NaN
354 9.971940e+09 ... 22753 22753 NaN
355 9.994742e+09 ... 21505,21836,22291,22539,22734 21505 21836
356 9.994964e+09 ... 22348 22348 NaN
357 9.994997e+09 ... 21100,21550 21100 21550

Order_Id_3 Order_Id_4 Order_Id_5
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
.. NaN NaN NaN
353 NaN NaN NaN
354 NaN NaN NaN
355 22291 22539 22734
356 NaN NaN NaN
357 NaN NaN NaN

How do I split a string into several columns in a dataframe with pandas Python?

The str.split method has an expand argument:

>>> df['string'].str.split(',', expand=True)
0 1 2
0 astring isa string
1 another string la
2 123 232 another
>>>

With column names:

>>> df['string'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
string1 string2 string3
0 astring isa string
1 another string la
2 123 232 another

Much neater with Python >= 3.6 f-strings:

>>> (df['string'].str.split(',', expand=True)
... .rename(columns=lambda x: f"string_{x+1}"))
string_1 string_2 string_3
0 astring isa string
1 another string la
2 123 232 another


Related Topics



Leave a reply



Submit