Split a Column to Multiple Columns

How to split a dataframe string column into two columns?

There might be a better way, but here's one approach:

                        row
0       00000 UNITED STATES
1             01000 ALABAMA
2  01001 Autauga County, AL
3  01003 Baldwin County, AL
4  01005 Barbour County, AL
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
    fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
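A shorter variant of the same idea, as a minimal sketch: passing expand=True makes str.split return a DataFrame directly, so the tolist() round trip is not needed. The sample rows are copied from the table above; the column name "name" is only an illustrative choice.

```python
import pandas as pd

# Sample rows taken from the example above.
df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# n=1 splits on the first space only; expand=True returns one column per piece.
parts = df["row"].str.split(" ", n=1, expand=True)
parts.columns = ["fips", "name"]  # "name" is a hypothetical column label
print(parts)
```

Passing n as a keyword matters on recent pandas versions, where the positional form is no longer accepted.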

Split one column into multiple columns using a delimiter

Use str.split with expand=True to split the column:

df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split('+', expand=True)

and count to count the non-missing values in each row:

df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)

print(df)

     ID  date date2 date3  number of dates
0  3009  2016  2017  None                2
1   129  2015  None  None                1
2   119  2014  2019  2020                3
3   120  2020  None  None                1
4   121   NaN   NaN   NaN                0
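For reference, here is a self-contained sketch of the two steps above. The input values are reconstructed from the printed output, so treat the exact shape of the original 'date' column (values joined by '+', with the string 'NULL' for missing entries) as an assumption.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the output above (an assumption about the original data).
df = pd.DataFrame({
    "ID": [3009, 129, 119, 120, 121],
    "date": ["2016+2017", "2015", "2014+2019+2020", "2020", "NULL"],
})

cols = ["date", "date2", "date3"]
# Turn the 'NULL' marker into a real missing value, then split on '+'.
df[cols] = df["date"].replace("NULL", np.nan).str.split("+", expand=True)
# count(axis=1) counts the non-missing cells in each row.
df["number of dates"] = df[cols].count(axis=1)
print(df)
```

Note that the number of names in cols has to match the maximum number of pieces produced by the split, otherwise the assignment fails.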

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape:

library(splitstackshape)
cSplit(DF, "Col1",",")

Output:

   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d

How to split a Pandas DataFrame column into multiple columns if the column is a string of varying length?

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the
right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
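To see why rsplit rather than split is the right tool here, the following sketch runs both on the sample data from the output above. With n=2, split keeps the leftover text in the last column, while rsplit keeps it in the first.

```python
import pandas as pd

# Sample data taken from the output above.
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Splitting from the left: the remainder ends up in the last column.
left = df["Col_1"].str.split(" ", n=2, expand=True)
# Splitting from the right: the remainder ends up in the first column.
right = df["Col_1"].str.rsplit(" ", n=2, expand=True)

print(left)
print(right)
```

Row 1 shows the difference: split gives "Hello", "world", "Q R", whereas rsplit gives "Hello world", "Q", "R".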

Python split one column into multiple columns and reattach the split columns into original dataframe

The original data has a unique index, and the code below leaves it unchanged in both DataFrames, so you can stack the two pieces with concat and then attach the result to the original with DataFrame.join, or with concat and axis=1:

address = df['Residence'].str.split(';',expand=True)
country = address[0] != 'USA'
USA, nonUSA = address[~country], address[country]
USA.columns = ['Country', 'State', 'County', 'City']

nonUSA = nonUSA.dropna(axis=0, subset=[1])
nonUSA = nonUSA[nonUSA.columns[0:2]]
# set the column names after selecting the two columns, to avoid an error
nonUSA.columns = ['Country', 'State']

df = pd.concat([df, pd.concat([USA, nonUSA])], axis=1)

Or:

df = df.join(pd.concat([USA, nonUSA]))
print(df)
   ID                       Residence    Name  Gender  Country  State \
0   1  USA;CA;Los Angeles;Los Angeles     Ann       F      USA     CA
1   2           USA;MA;Suffolk;Boston   Betty       F      USA     MA
2   3                       Canada;ON    Carl       M   Canada     ON
3   4                USA;FL;Charlotte   David       M      USA     FL
4   5                              NA   Emily       F      NaN    NaN
5   6                       Canada;QC   Frank       M   Canada     QC
6   7                          USA;AZ  George       M      USA     AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2          NaN          NaN
3    Charlotte         None
4          NaN          NaN
5          NaN          NaN
6         None         None

But it seems this can be simplified:

c = ['Country', 'State', 'County', 'City']
df[c] = df['Residence'].str.split(';',expand=True)
print(df)
   ID                       Residence    Name  Gender  Country  State \
0   1  USA;CA;Los Angeles;Los Angeles     Ann       F      USA     CA
1   2           USA;MA;Suffolk;Boston   Betty       F      USA     MA
2   3                       Canada;ON    Carl       M   Canada     ON
3   4                USA;FL;Charlotte   David       M      USA     FL
4   5                              NA   Emily       F       NA   None
5   6                       Canada;QC   Frank       M   Canada     QC
6   7                          USA;AZ  George       M      USA     AZ

        County         City
0  Los Angeles  Los Angeles
1      Suffolk       Boston
2         None         None
3    Charlotte         None
4         None         None
5         None         None
6         None         None
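One caveat with the simplified form: the assignment only works when the split produces exactly as many columns as there are names. A possible guard, sketched here under the assumption that no row has more pieces than names, is to reindex the split result to the expected number of columns first. The three sample rows are a shortened version of the data above.

```python
import pandas as pd

# Shortened sample: no row has all four pieces, so the split yields only 3 columns.
df = pd.DataFrame({
    "ID": [4, 3, 5],
    "Residence": ["USA;FL;Charlotte", "Canada;ON", "NA"],
})

cols = ["Country", "State", "County", "City"]
parts = df["Residence"].str.split(";", expand=True)
# Pad with all-missing columns so the width always matches len(cols).
parts = parts.reindex(columns=range(len(cols)))
df[cols] = parts
print(df)
```

Without the reindex step, the assignment in this shortened example would fail because three columns cannot be assigned to four names.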

Split column to multiple columns by another column value (complicated separator)

option 1

Splitting on whitespace is an option if each of the last two columns holds a single word. Use rsplit:

df['column1'].str.rsplit(n=2, expand=True)

output:

        0    1      2
0  abc 33  aaa  9g98f
1     cde  aaa  95fwf
2  12 faf  bbb  92gcs
3     faf  bbb  7t87f

NB: this doesn't work with the updated example, where the last part can itself contain a space (e.g. "9g98f 333").

option 2

Alternatively, to split on the provided delimiter:

df[['new_column1', 'new_column2']] = [a.split(f' {b} ') for a,b in
zip(df['column1'], df['column2'])]

output:

                column1 column2 new_column1 new_column2
0  abc 33 aaa 9g98f 333     aaa      abc 33   9g98f 333
1         cde aaa 95fwf     aaa         cde       95fwf
2      12 faf bbb 92gcs     bbb      12 faf       92gcs
3         faf bbb 7t87f     bbb         faf       7t87f
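A slightly more defensive version of option 2, as a sketch: str.partition always returns exactly three parts (before, separator, after), so a row whose delimiter is missing or repeated cannot break the two-column assignment. The data is copied from the output above; building a separate DataFrame and joining it is just one way to attach the result.

```python
import pandas as pd

# Sample data taken from the output above.
df = pd.DataFrame({
    "column1": ["abc 33 aaa 9g98f 333", "cde aaa 95fwf",
                "12 faf bbb 92gcs", "faf bbb 7t87f"],
    "column2": ["aaa", "aaa", "bbb", "bbb"],
})

# partition splits at the first occurrence only and always yields 3 items;
# if the delimiter is absent, the second piece is an empty string.
pieces = [a.partition(f" {b} ")[::2] for a, b in zip(df["column1"], df["column2"])]
new_cols = pd.DataFrame(pieces, columns=["new_column1", "new_column2"], index=df.index)
out = df.join(new_cols)
print(out)
```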

option 3

Finally, if the same delimiters recur across many rows, it might be worth using a vectorized split per group:

(df
 .groupby('column2')
 .apply(lambda g: g['column1'].str.split(rf'\s*{g.name}\s*', expand=True))
)

output:

        0          1
0  abc 33  9g98f 333
1     cde      95fwf
2  12 faf      92gcs
3     faf      7t87f
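The shape of the groupby.apply result has varied between pandas versions, so here is an alternative sketch of the same per-group idea that uses an explicit loop instead of apply. It also escapes the delimiter with re.escape, which is an addition on my part to handle delimiters containing regex metacharacters. The data is copied from the outputs above.

```python
import re
import pandas as pd

# Sample data taken from the outputs above.
df = pd.DataFrame({
    "column1": ["abc 33 aaa 9g98f 333", "cde aaa 95fwf",
                "12 faf bbb 92gcs", "faf bbb 7t87f"],
    "column2": ["aaa", "aaa", "bbb", "bbb"],
})

pieces = []
for delim, group in df.groupby("column2"):
    # Patterns longer than one character are treated as regular expressions,
    # so escape the delimiter and absorb the surrounding whitespace.
    pattern = rf"\s*{re.escape(delim)}\s*"
    pieces.append(group["column1"].str.split(pattern, n=1, expand=True))

# Put the per-group results back into the original row order.
out = pd.concat(pieces).sort_index()
print(out)
```

Each group is still split in one vectorized call, so the loop runs once per distinct delimiter rather than once per row.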

How to split a column in multiple columns using data.table

Use tstrsplit with keep = 1:3 to keep only the first three columns:

dt[, c("bin", "position", "ID") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]
                                name  bin  position  ID
1:                bin1_position1_ID1 bin1 position1 ID1
2:                bin2_position2_ID2 bin2 position2 ID2
3:                bin3_position3_ID3 bin3 position3 ID3
4:                bin4_position4_ID4 bin4 position4 ID4
5: bin5_position5_ID5_another5_more5 bin5 position5 ID5

Split list in a column to multiple columns

You could map ast.literal_eval over the items in df2["1"], build a DataFrame from the result, and join it to df1:

import ast
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
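A self-contained sketch of the same approach follows. The inputs are not shown in the answer, so the frames below are assumptions: df1 holds the text columns and df2["1"] holds each list as a string such as "[0.1, 0.2, 0.3]". The numbers are made-up placeholders, not the values from the output above.

```python
import ast
import pandas as pd

# Hypothetical inputs: lists stored as strings in df2["1"].
df1 = pd.DataFrame({"Text": ["Let's dance", "Hello world"],
                    "Topic": ["Party", "Other"]})
df2 = pd.DataFrame({"1": ["[0.5, -0.25, 0.125]", "[1.0, 2.0, 3.0]"]})

# literal_eval safely parses each string into a Python list.
parsed = [ast.literal_eval(s) for s in df2["1"]]
# One column per list element, aligned on df2's index before joining.
features = pd.DataFrame(parsed, index=df2.index).add_prefix("feature_")
out = df1.join(features)
print(out)
```

If the column already holds real lists rather than strings, the literal_eval step can be dropped and the lists passed to pd.DataFrame directly.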

