Splitting a Dataframe String Column into Multiple Different Columns

How to split a dataframe string column into two columns?

There might be a better way, but here's one approach:

                        row
0       00000 UNITED STATES
1             01000 ALABAMA
2  01001 Autauga County, AL
3  01003 Baldwin County, AL
4  01005 Barbour County, AL
# n=1 splits on the first space only; n must be passed by keyword in pandas 2.0+
df = pd.DataFrame(df['row'].str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
    fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
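
As a minimal sketch of the same idea, the split can also be done with expand=True, which returns a DataFrame directly instead of going through tolist(). The sample rows are copied from the output above; the variable name split is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the rows shown above.
df = pd.DataFrame({"row": [
    "00000 UNITED STATES",
    "01000 ALABAMA",
    "01001 Autauga County, AL",
    "01003 Baldwin County, AL",
    "01005 Barbour County, AL",
]})

# Split on the first space only (n=1) and expand the pieces into columns.
split = df["row"].str.split(" ", n=1, expand=True)
split.columns = ["fips", "row"]
print(split)
```

Because n=1 stops after the first space, county names that contain spaces stay in one piece.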

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the
right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
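
For reference, here is a self-contained sketch of the rsplit approach. The input frame is reconstructed from the output above, and the names parts and result are illustrative rather than from the original answer.

```python
import pandas as pd

# Input reconstructed from the output shown above.
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# n=2 limits the split to the two rightmost spaces, so any extra
# words on the left stay together in the first new column.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True).add_prefix("nCol_")

# join aligns on the index, so each row gets its own pieces back.
result = parts.join(df)
print(result)
```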

Python split one column into multiple columns and reattach the split columns into original dataframe

The index in the original data is unique and is not changed by the code below for either DataFrame, so you can combine the two pieces with concat and then attach the result to the original with DataFrame.join or with concat and axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# order changed to avoid an error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print (df)
   ID                       Residence    Name Gender Country State  \
0   1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA
1   2           USA;MA;Suffolk;Boston   Betty      F     USA    MA
2   3                       Canada;ON    Carl      M  Canada    ON
3   4                USA;FL;Charlotte   David      M     USA    FL
4   5                              NA   Emily      F     NaN   NaN
5   6                       Canada;QC   Frank      M  Canada    QC
6   7                          USA;AZ  George      M     USA    AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2          NaN          NaN
3    Charlotte         None
4          NaN          NaN
5          NaN          NaN
6         None         None

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
   ID                       Residence    Name Gender Country State  \
0   1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA
1   2           USA;MA;Suffolk;Boston   Betty      F     USA    MA
2   3                       Canada;ON    Carl      M  Canada    ON
3   4                USA;FL;Charlotte   David      M     USA    FL
4   5                              NA   Emily      F      NA  None
5   6                       Canada;QC   Frank      M  Canada    QC
6   7                          USA;AZ  George      M     USA    AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2         None         None
3    Charlotte         None
4         None         None
5         None         None
6         None         None
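
One caveat with the simplified version: the direct assignment only works when the split yields exactly as many columns as there are names. The following is a minimal sketch, not part of the original answer, that pads the result with reindex first. The three-row sample and the names names and parts are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample in which no value has all four parts.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Residence": ["USA;CA;Los Angeles", "Canada;ON", "USA;AZ"],
})

names = ["Country", "State", "County", "City"]

# The longest value here has three parts, so the split yields three
# columns; reindex pads the result to four so the names line up.
parts = df["Residence"].str.split(";", expand=True)
parts = parts.reindex(columns=range(len(names)))
parts.columns = names

df = df.join(parts)
print(df)
```

The padded City column is simply all missing values, which keeps the column layout stable regardless of the input.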

How to split a dataframe column into 2 new columns, by slicing all strings before the last item and the last item

There are certainly a lot of ways of doing this :) I would use the str accessor with rpartition. rpartition splits each string at the last occurrence of the separator into three components: the part before the separator, the separator itself, and the part after it. If you take just the first and last components, you should be done.

df[["beginning", "ending"]] = df.street.str.rpartition(" ")[[0, 2]]
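
To make the three components concrete, here is a small sketch with made-up street values (the data and the name parts are assumptions, not from the original question):

```python
import pandas as pd

# Hypothetical street values; the last one contains no space at all.
df = pd.DataFrame({"street": ["Main Street 12", "Elm Road 7b", "Broadway"]})

# rpartition returns three columns: the text before the last separator (0),
# the separator itself (1), and the text after it (2).
parts = df["street"].str.rpartition(" ")

# Keep only the first and last pieces.
df[["beginning", "ending"]] = parts[[0, 2]]
print(df)
```

When a value has no separator, rpartition puts the whole string in the last piece and leaves the first piece empty, which may or may not be what you want for single-word values.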

How to split a dataframe string column into multiple columns?

One way is to split each string on "&" and then take the value after "=":

import pandas as pd

tags = [
"letter1=A&letter2=B&letter3=C",
"letter1=D&letter2=E&letter3=F",
"letter1=G&letter2=H&letter3=I",
"letter1=J&letter2=K&letter3=L",
"letter1=M&letter2=N&letter3=O",
"letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df
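
If the keys are not always in the same order, a possible alternative is to parse each string into a dict and let pandas build the columns from the keys. This is only a sketch under the assumption that every pair has the form key=value; the names records and letters are illustrative.

```python
import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
]
df = pd.DataFrame({"tags": tags})

# Turn each "key=value&key=value" string into a dict, then let pandas
# create one column per key.
records = [dict(pair.split("=", 1) for pair in s.split("&")) for s in df["tags"]]
letters = pd.DataFrame(records, index=df.index)
print(letters)
```

This avoids hard-coding the position of each key, at the cost of a Python-level loop over the rows.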


How do I split a string into several columns in a dataframe with pandas Python?

The str.split method has an expand argument:

>>> df['string'].str.split(',', expand=True)
         0       1        2
0  astring     isa   string
1  another  string       la
2      123     232  another

With column names:

>>> df['string'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
   string1 string2  string3
0  astring     isa   string
1  another  string       la
2      123     232  another

Much neater with Python >= 3.6 f-strings:

>>> (df['string'].str.split(',', expand=True)
... .rename(columns=lambda x: f"string_{x+1}"))
  string_1 string_2 string_3
0  astring      isa   string
1  another   string       la
2      123      232  another
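
Put together as a runnable script, with the input reconstructed from the output above (the variable name out is illustrative):

```python
import pandas as pd

# Input reconstructed from the output shown above.
df = pd.DataFrame({"string": [
    "astring,isa,string",
    "another,string,la",
    "123,232,another",
]})

# expand=True yields integer column labels 0, 1, 2; the rename callable
# turns them into 1-based names.
out = (df["string"].str.split(",", expand=True)
       .rename(columns=lambda x: f"string_{x + 1}"))
print(out)
```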

Split column into multiple columns when a row starts with a string

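Try this: flag each row that starts a new group, number the groups with a cumulative sum, and place the groups side by side: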

pd.concat([sub.reset_index(drop=True) for _, sub in df.groupby(
    df.Group.str.contains(r'^Group\s+123').cumsum())], axis=1)

            Group            Group            Group
0  Group 123 nv-1  Group 123 mt-d2  Group 123 id-01
1            a, v             b, v              n,m
2             s,b              NaN             x, y
3            y, i              NaN             z, m
4             NaN              NaN              l,b
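
The one-liner is easier to follow when broken into steps. In this sketch the input column is reconstructed from the output above, and the names block_id, blocks and wide are illustrative.

```python
import pandas as pd

# Single-column input reconstructed from the output shown above.
df = pd.DataFrame({"Group": [
    "Group 123 nv-1", "a, v", "s,b", "y, i",
    "Group 123 mt-d2", "b, v",
    "Group 123 id-01", "n,m", "x, y", "z, m", "l,b",
]})

# True on each header row; the running sum gives every row the number
# of the block it belongs to (1, 1, 1, 1, 2, 2, 3, ...).
block_id = df["Group"].str.contains(r"^Group\s+123").cumsum()

# Restart each block's index at 0, then place the blocks side by side.
blocks = [sub.reset_index(drop=True) for _, sub in df.groupby(block_id)]
wide = pd.concat(blocks, axis=1)
print(wide)
```

Shorter blocks are padded with NaN because concat aligns the blocks on their reset indexes.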

Split data frame string column into multiple columns (comma separated characters)

One option in R (tidyverse), using str_split, unnest_longer and table:

subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)


library(tidyverse)

df1 %>%
  mutate(letters = str_split(letters, ', ')) %>%
  unnest_longer(letters) %>%
  table
#>        letters
#> subject a b f g l m z
#>       1 1 1 1 1 0 0 0
#>       2 0 1 0 1 1 1 0
#>       3 0 0 0 1 0 1 1

Created on 2022-02-10 by the reprex package (v2.0.0)


Having seen some of the other answers, separate_rows is a better solution here:

df1 %>%
separate_rows(letters) %>%
table

splitting a column into multiple columns with specific name in pandas dataframe

Use join + split + add_prefix:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print (df)
     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR

If you need NaN instead of None, add fillna (this assumes numpy is imported as np):

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print (df)
     pri             sec sec0 sec1 sec2 sec3 sec4
0    TOM        AB,CD,EF   AB   CD   EF  NaN  NaN
1   JACK           XY,YZ   XY   YZ  NaN  NaN  NaN
2  HARRY              FG   FG  NaN  NaN  NaN  NaN
3   NICK  KY,NY,SD,EF,FR   KY   NY   SD   EF   FR
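
A self-contained version of the join + split + add_prefix pattern, with the input reconstructed from the output above (parts and out are illustrative names):

```python
import pandas as pd

# Input reconstructed from the output shown above.
df = pd.DataFrame({
    "pri": ["TOM", "JACK", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# The number of new columns follows the longest row; shorter rows are
# padded with missing values.
parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# join aligns on the index, so the original columns are kept as they are.
out = df.join(parts)
print(out)
```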

How to split two strings into different columns in Python with Pandas?

The key here is to include the parameter expand=True in your str.split() to expand the split strings into separate columns.

Type it like this:

df[['First String','Second String']] = df['Full String'].str.split(expand=True)

Output:

    Full String First String Second String
0  Orange Juice       Orange         Juice
1     Pink Bird         Pink          Bird
2     Blue Ball         Blue          Ball
3     Green Tea        Green           Tea
4    Yellow Sun       Yellow           Sun
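
The assignment above needs exactly two pieces per row. As a hedged sketch, passing n=1 keeps the result at two columns even when a value has more than two words; the three-word value below is an assumption added for illustration.

```python
import pandas as pd

# The last value is a hypothetical three-word string.
df = pd.DataFrame({"Full String": ["Orange Juice", "Pink Bird", "Iced Green Tea"]})

# With no separator given, str.split splits on runs of whitespace.
# n=1 stops after the first split, so there are never more than two pieces.
df[["First String", "Second String"]] = df["Full String"].str.split(n=1, expand=True)
print(df)
```

Without n=1, the three-word row would produce a third column and the two-name assignment would fail with a length mismatch.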

