Split a Column to Multiple Columns

How to split a dataframe string column into two columns?

There might be a better way, but here's one approach:

                            row
    0       00000 UNITED STATES
    1             01000 ALABAMA
    2  01001 Autauga County, AL
    3  01003 Baldwin County, AL
    4  01005 Barbour County, AL

# n=1 splits only at the first space (n must be passed by keyword in pandas 2.0+)
df = pd.DataFrame(df['row'].str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])

   fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
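
As a variation on the approach above, str.split with n=1 and expand=True should return the two pieces as DataFrame columns directly, which avoids the round trip through tolist(). This is a minimal sketch that assumes the sample rows shown above; the column name 'name' is chosen here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'row': ['00000 UNITED STATES',
                           '01000 ALABAMA',
                           '01001 Autauga County, AL']})

# n=1 splits only at the first space; expand=True returns a DataFrame
parts = df['row'].str.split(' ', n=1, expand=True)
parts.columns = ['fips', 'name']  # 'name' is an illustrative column name

print(parts)
```

Keeping the codes as strings preserves the leading zeros, which a conversion to integers would drop.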

Split one column into multiple columns using a delimiter

Use str.split with expand=True to split the column on the delimiter:

df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split('+', expand=True)

and DataFrame.count with axis=1 to count the non-missing values in each row:

df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)

print(df)

     ID  date date2 date3  number of dates
0  3009  2016  2017  None                2
1   129  2015  None  None                1
2   119  2014  2019  2020                3
3   120  2020  None  None                1
4   121   NaN   NaN   NaN                0
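
The direct assignment above only works when the split produces exactly three columns. The sketch below is one way to handle a number of pieces that is not known in advance. The input frame is reconstructed from the printed output, so treat it as an assumption, and the names `parts` and `out` are illustrative:

```python
import numpy as np
import pandas as pd

# Input reconstructed from the printed output above (an assumption)
df = pd.DataFrame({'ID': [3009, 129, 119, 120, 121],
                   'date': ['2016+2017', '2015', '2014+2019+2020', '2020', 'NULL']})

# Turn the 'NULL' placeholder into a real missing value, then split on '+'
parts = df['date'].replace('NULL', np.nan).str.split('+', expand=True)

# Name the columns from however many pieces the split actually produced
parts.columns = ['date'] + [f'date{i}' for i in range(2, parts.shape[1] + 1)]

out = df[['ID']].join(parts)
# Count the non-missing pieces in each row
out['number of dates'] = parts.notna().sum(axis=1)
print(out)
```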

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape

library(splitstackshape)
cSplit(DF, "Col1",",")

Output:

   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the
right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
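
For reference, here is a self-contained version of the same idea. The input frame is rebuilt from the output shown above, and n is passed by keyword, which recent pandas versions require:

```python
import pandas as pd

df = pd.DataFrame({'Col_1': ['Hello X Y', 'Hello world Q R', 'Hi S T'],
                   'Col_2': ['A', 'B', 'C']})

# Split at most twice, counting from the right, so any extra words
# stay together in the first piece
parts = df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_')

out = parts.join(df)
print(out)
```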

Python split one column into multiple columns and reattach the split columns into original dataframe

The original data has a unique index, and the code below leaves it unchanged in both DataFrames, so you can use concat to stack them back together and then attach the result to the original with DataFrame.join or concat with axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# keep only the first two columns before renaming, to avoid a length mismatch error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F     NaN   NaN   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2          NaN          NaN  
3    Charlotte         None  
4          NaN          NaN  
5          NaN          NaN  
6         None         None

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print (df)
  ID                       Residence    Name Gender Country State  \
0  1  USA;CA;Los Angeles;Los Angeles     Ann      F     USA    CA   
1  2           USA;MA;Suffolk;Boston   Betty      F     USA    MA   
2  3                       Canada;ON    Carl      M  Canada    ON   
3  4                USA;FL;Charlotte   David      M     USA    FL   
4  5                              NA   Emily      F      NA  None   
5  6                       Canada;QC   Frank      M  Canada    QC   
6  7                          USA;AZ  George      M     USA    AZ   

        County         City  
0  Los Angeles  Los Angeles  
1      Suffolk       Boston  
2         None         None  
3    Charlotte         None  
4         None         None  
5         None         None  
6         None         None
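
One caveat with the simplified version: assigning to a fixed list of column names fails if the split produces a different number of columns. A possible guard, sketched here on a shortened copy of the sample data, is to cap the number of splits and reindex to the expected width. The names `cols`, `parts` and `out` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3],
                   'Residence': ['USA;CA;Los Angeles;Los Angeles',
                                 'Canada;ON',
                                 'USA;FL;Charlotte']})

cols = ['Country', 'State', 'County', 'City']

# n=len(cols) - 1 caps the result at four pieces; reindex pads with
# empty columns if every row happens to have fewer pieces
parts = (df['Residence']
         .str.split(';', n=len(cols) - 1, expand=True)
         .reindex(columns=range(len(cols))))
parts.columns = cols

out = df.join(parts)
print(out)
```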

Split column to multiple columns by another column value (complicated separator)

option 1

Splitting on spaces is an option, if you have a single word for the last two columns. Use rsplit:

df['column1'].str.rsplit(n=2, expand=True)

output:

        0    1      2
0  abc 33  aaa  9g98f
1     cde  aaa  95fwf
2  12 faf  bbb  92gcs
3     faf  bbb  7t87f

NB. this doesn't work with the updated example

option 2

Alternatively, to split on the provided delimiter:

df[['new_column1', 'new_column2']] = [a.split(f' {b} ') for a,b in
                                      zip(df['column1'], df['column2'])]

output:

                column1 column2 new_column1 new_column2
0  abc 33 aaa 9g98f 333     aaa      abc 33   9g98f 333
1         cde aaa 95fwf     aaa         cde       95fwf
2      12 faf bbb 92gcs     bbb      12 faf       92gcs
3         faf bbb 7t87f     bbb         faf       7t87f

option 3

Finally, if the same delimiters repeat many times across many rows, it might be worth using vectorized splitting per group:

(df
 .groupby('column2', group_keys=False)
 .apply(lambda g: g['column1'].str.split(rf'\s*{g.name}\s*', expand=True))
)

output:

        0          1
0  abc 33  9g98f 333
1     cde      95fwf
2  12 faf      92gcs
3     faf      7t87f
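
As a self-contained sketch of option 2, the snippet below rebuilds the sample frame from the output shown earlier and limits each split to one occurrence, so a delimiter that appears twice in a row still yields two pieces. It assumes the delimiter occurs in every row, surrounded by single spaces:

```python
import pandas as pd

df = pd.DataFrame({'column1': ['abc 33 aaa 9g98f 333', 'cde aaa 95fwf',
                               '12 faf bbb 92gcs', 'faf bbb 7t87f'],
                   'column2': ['aaa', 'aaa', 'bbb', 'bbb']})

# Split each row on its own delimiter (padded with spaces), at most once
pieces = [text.split(f' {sep} ', 1)
          for text, sep in zip(df['column1'], df['column2'])]

# Reuse the original index so the new columns line up with their rows
df[['new_column1', 'new_column2']] = pd.DataFrame(pieces, index=df.index)
print(df)
```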

How to split a column in multiple columns using data.table

Use tstrsplit with keep = 1:3 to keep only the first three columns:

dt[, c("bin", "position", "ID") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]

                                name  bin  position  ID
1:                bin1_position1_ID1 bin1 position1 ID1
2:                bin2_position2_ID2 bin2 position2 ID2
3:                bin3_position3_ID3 bin3 position3 ID3
4:                bin4_position4_ID4 bin4 position4 ID4
5: bin5_position5_ID5_another5_more5 bin5 position5 ID5

Split list in a column to multiple columns

You could map ast.literal_eval over the items in df2["1"], build a DataFrame from the result, and join it to df1:

import ast
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
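
To make the idea easier to try out, here is a miniature version. The two frames are hypothetical stand-ins for df1 and df2, with made-up values, and assume that df2["1"] stores each list as a string:

```python
import ast
import pandas as pd

# Hypothetical miniature inputs: df2['1'] holds lists stored as strings
df1 = pd.DataFrame({'Text': ["Let's dance", 'Hello world'],
                    'Topic': ['Party', 'Other']})
df2 = pd.DataFrame({'1': ['[-0.08, -0.001, 0.02]', '[-0.06, -0.015, 0.02]']})

# literal_eval parses each string into a real Python list without
# executing arbitrary code
features = pd.DataFrame([ast.literal_eval(s) for s in df2['1']],
                        index=df2.index).add_prefix('feature_')

out = df1.join(features)
print(out)
```

The join aligns on the index, so this relies on df1 and df2 sharing the same row labels.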
