Splitting a Dataframe String Column into Multiple Different Columns

How to split a dataframe string column into two columns?

There might be a better way, but here's one approach:

                            row
    0       00000 UNITED STATES
    1             01000 ALABAMA
    2  01001 Autauga County, AL
    3  01003 Baldwin County, AL
    4  01005 Barbour County, AL

# n=1 splits on the first space only; recent pandas versions require n as a keyword
df = pd.DataFrame(df['row'].str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])

   fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
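
The same result can probably be reached without going through tolist(), by letting str.split build the columns itself with expand=True. This is only a sketch: the three sample rows are copied from the input above, and the column name 'name' is my own choice rather than anything from the question.

```python
import pandas as pd

# Sample rows taken from the input shown above
df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# n=1 splits on the first space only; expand=True returns a DataFrame
parts = df["row"].str.split(" ", n=1, expand=True)
parts.columns = ["fips", "name"]  # 'name' is a hypothetical column label
print(parts)
```

Because the split stops after the first space, county names that contain spaces or commas stay in one piece.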

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the
right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
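
For anyone who wants to run this end to end, here is a minimal reproduction. The input frame is reconstructed from the output above, so treat it as an assumption about what the original data looked like.

```python
import pandas as pd

# Input reconstructed from the output table above (an assumption)
df = pd.DataFrame({"Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
                   "Col_2": ["A", "B", "C"]})

# Split from the right at most twice, so the leading text stays together
split = df["Col_1"].str.rsplit(" ", n=2, expand=True).add_prefix("nCol_")

# Attach the original columns on the shared index
result = split.join(df)
print(result)
```

Splitting from the right is what keeps "Hello world" together in the first column, even though it contains a space.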

Python split one column into multiple columns and reattach the split columns into original dataframe

The original data has a unique index, and the code below does not change it in either DataFrame. So you can use concat to stack the two pieces back together, then attach the result to the original with DataFrame.join or with concat and axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# select the first two columns before renaming, to avoid a length-mismatch error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F     NaN   NaN   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2          NaN          NaN  
3    Charlotte         None  
4          NaN          NaN  
5          NaN          NaN  
6         None         None

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F      NA  None   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2         None         None  
3    Charlotte         None  
4         None         None  
5         None         None  
6         None         None
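
One caveat with the short version: the assignment only works when the split produces exactly as many columns as there are names. Below is a sketch of one way to guard against that by padding with reindex. The three-row sample frame is made up for illustration, and the padding step is my own addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample where no row has all four parts
df = pd.DataFrame({"ID": [1, 2, 3],
                   "Residence": ["USA;CA;Los Angeles", "Canada;ON", "USA;AZ"]})
names = ["Country", "State", "County", "City"]

parts = df["Residence"].str.split(";", expand=True)

# Pad (or trim) to exactly len(names) columns so the renaming cannot fail
parts = parts.reindex(columns=range(len(names)))
parts.columns = names

df = df.join(parts)
print(df)
```

Here the split only yields three columns, so the fourth ('City') is added by reindex and holds missing values throughout.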

How to split a dataframe column into 2 new columns, by slicing the all strings before the last item and last item

There are certainly a lot of ways of doing this :) I would go for str and rpartition. rpartition splits each string into three components: the part before the last occurrence of the separator, the separator itself, and the part after it. If you take just the first and last components, you should be done.

df[["beginning", "ending"]] = df["street"].str.rpartition(" ")[[0, 2]]
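
A small runnable sketch of the same idea, in case it helps to see the three components. The street values are invented for illustration; only the 'street' column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the last row has no space at all
df = pd.DataFrame({"street": ["Main Street 12", "Long Winding Road 7b", "Broadway"]})

# Columns 0, 1, 2 hold: text before the last space, the space, text after it
parts = df["street"].str.rpartition(" ")

df["beginning"] = parts[0]
df["ending"] = parts[2]
print(df)
```

Note what happens when the separator is missing: like Python's own str.rpartition, the whole string ends up in the last component and the first one is empty.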

How to split a dataframe string column into multiple columns?

One way is to use apply with plain Python string splitting:

import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
    "letter1=G&letter2=H&letter3=I",
    "letter1=J&letter2=K&letter3=L",
    "letter1=M&letter2=N&letter3=O",
    "letter1=P&letter2=R&letter3=S"
]
df = pd.DataFrame({"tags": tags})

df["letter1"] = df["tags"].apply(lambda x: x.split("&")[0].split("=")[-1])
df["letter2"] = df["tags"].apply(lambda x: x.split("&")[1].split("=")[-1])
df["letter3"] = df["tags"].apply(lambda x: x.split("&")[2].split("=")[-1])
df = df[["letter1", "letter2", "letter3"]]
df
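
If the number of key=value pairs is not fixed at three, a more general variant might be to turn each string into a dict and let pandas build the columns from the keys. This is a sketch of that alternative, not the approach above; the names records and letters are mine.

```python
import pandas as pd

tags = [
    "letter1=A&letter2=B&letter3=C",
    "letter1=D&letter2=E&letter3=F",
]
df = pd.DataFrame({"tags": tags})

# One dict per row, e.g. {"letter1": "A", "letter2": "B", "letter3": "C"}
records = [dict(pair.split("=", 1) for pair in tag.split("&"))
           for tag in df["tags"]]

# The dict keys become the column names
letters = pd.DataFrame(records, index=df.index)
print(letters)
```

This avoids hard-coding the positions 0, 1 and 2, at the cost of assuming every pair contains an '=' sign.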


How do I split a string into several columns in a dataframe with pandas Python?

The str.split method has an expand argument:

>>> df['string'].str.split(',', expand=True)
         0        1         2
0  astring      isa    string
1  another   string        la
2      123      232   another
>>>

With column names:

>>> df['string'].str.split(',', expand=True).rename(columns = lambda x: "string"+str(x+1))
   string1  string2   string3
0  astring      isa    string
1  another   string        la
2      123      232   another

Much neater with Python >= 3.6 f-strings:

>>> (df['string'].str.split(',', expand=True)
...              .rename(columns=lambda x: f"string_{x+1}"))
  string_1 string_2  string_3
0  astring      isa    string
1  another   string        la
2      123      232   another
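
The session above does not show how df was built, so here is a self-contained version as a script. The input strings are inferred from the printed output and are an assumption.

```python
import pandas as pd

# Input inferred from the output shown above (an assumption)
df = pd.DataFrame({"string": ["astring,isa,string",
                              "another,string,la",
                              "123,232,another"]})

# expand=True gives integer column labels 0, 1, 2; rename maps them to names
out = (df["string"].str.split(",", expand=True)
                   .rename(columns=lambda x: f"string_{x + 1}"))
print(out)
```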

Split column into multiple columns when a row starts with a string

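Try this. It starts a new group at every row that matches the header pattern, then places the groups side by side: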

pd.concat([sub.reset_index(drop=True) for _, sub in df.groupby(
    df.Group.str.contains(r'^Group\s+123').cumsum())], axis=1)

Output:

    Group           Group           Group
0   Group 123 nv-1  Group 123 mt-d2 Group 123 id-01
1   a, v            b, v            n,m
2   s,b             NaN             x, y
3   y, i            NaN             z, m
4   NaN             NaN             l,b
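
To make the trick easier to follow, here it is unrolled into steps on a small frame. The sample rows are a shortened, assumed version of the data in the output above, and the names block_id, blocks and wide are mine.

```python
import pandas as pd

# Shortened sample, assumed from the output above
df = pd.DataFrame({"Group": ["Group 123 nv-1", "a, v", "s,b",
                             "Group 123 mt-d2", "b, v",
                             "Group 123 id-01", "n,m", "x, y"]})

# True on each header row; cumsum turns that into a block id: 1,1,1,2,2,3,3,3
block_id = df["Group"].str.contains(r"^Group\s+123").cumsum()

# One sub-frame per block, each re-indexed from 0 so the rows line up
blocks = [sub.reset_index(drop=True) for _, sub in df.groupby(block_id)]

# Side by side; shorter blocks are padded with NaN
wide = pd.concat(blocks, axis=1)
print(wide)
```

The reset_index(drop=True) is what makes concat align the blocks row by row instead of by their original positions.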

Split data frame string column into multiple columns (comma separated characters)

One option in R (tidyverse), using str_split, unnest_longer and table:

subject <- c(1,2,3)
letters <- c("a, b, f, g", "b, g, m, l", "g, m, z")

df1 <- data.frame(subject, letters)


library(tidyverse)

df1 %>%
  mutate(letters = str_split(letters, ', ')) %>%
  unnest_longer(letters) %>%
  table
#>        letters
#> subject a b f g l m z
#>       1 1 1 1 1 0 0 0
#>       2 0 1 0 1 1 1 0
#>       3 0 0 0 1 0 1 1

Created on 2022-02-10 by the reprex package (v2.0.0)

Seeing some of the other answers, separate_rows is a better solution here:

df1 %>%
  separate_rows(letters) %>%
  table
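
Since most of this page is about pandas, here is a rough pandas counterpart of the R answer above, using Series.str.get_dummies. It is a sketch of an equivalent, not a translation of the tidyverse code, and the name table is mine.

```python
import pandas as pd

df1 = pd.DataFrame({"subject": [1, 2, 3],
                    "letters": ["a, b, f, g", "b, g, m, l", "g, m, z"]})

# One indicator column per distinct letter, with one row per subject
table = df1.set_index("subject")["letters"].str.get_dummies(sep=", ")
print(table)
```

The separator has to include the space after the comma, otherwise " b" and "b" would count as different values.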

splitting a column into multiple columns with specific name in pandas dataframe

Use join + split + add_prefix:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print (df)
     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1   JACK           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR

And if you need NaN instead of None, add fillna (this assumes numpy is imported as np):

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print (df)
     pri             sec sec0 sec1 sec2 sec3 sec4
0    TOM        AB,CD,EF   AB   CD   EF  NaN  NaN
1   JACK           XY,YZ   XY   YZ  NaN  NaN  NaN
2  HARRY              FG   FG  NaN  NaN  NaN  NaN
3   NICK  KY,NY,SD,EF,FR   KY   NY   SD   EF   FR
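
A self-contained sketch of the join + split + add_prefix pattern, on a two-row sample taken from the output above. It also shows that isna() can be used to test for the padded cells.

```python
import pandas as pd

# Two rows taken from the output above
df = pd.DataFrame({"pri": ["TOM", "JACK"],
                   "sec": ["AB,CD,EF", "XY,YZ"]})

# Integer labels 0, 1, 2 become 'sec0', 'sec1', 'sec2'
parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

out = df.join(parts)
print(out)

# Rows with fewer parts are padded; isna() flags those cells
print(out["sec2"].isna())
```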

How to split two strings into different columns in Python with Pandas?

The key here is to include the parameter expand=True in your str.split() to expand the split strings into separate columns.

Type it like this:

df[['First String','Second String']] = df['Full String'].str.split(expand=True)

Output:

    Full String First String Second String
0  Orange Juice       Orange         Juice
1     Pink Bird         Pink          Bird
2     Blue Ball         Blue          Ball
3     Green Tea        Green           Tea
4    Yellow Sun       Yellow           Sun
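
One thing to watch for: if any value has more than two words, the split returns more than two columns and the assignment fails. A possible guard is to cap the split with n=1, sketched below. The three-word row is a made-up example to show the effect.

```python
import pandas as pd

# The last row is a hypothetical three-word value
df = pd.DataFrame({"Full String": ["Orange Juice", "Pink Bird", "Iced Green Tea"]})

# With no separator given, split() breaks on any run of whitespace.
# n=1 caps the result at two pieces, so every row fits two columns.
df[["First String", "Second String"]] = df["Full String"].str.split(n=1, expand=True)
print(df)
```

With n=1 the remainder after the first word stays together in the second column.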
