Pandas split column into multiple columns by comma
In case someone else wants to split a single column (delimited by a value) into multiple columns, try this:
series.str.split(',', expand=True)
This answered the question I came here looking for.
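As a minimal sketch of that call, assuming a small made-up Series named series (the values are illustrative and not from the original question):

```python
import pandas as pd

# Hypothetical sample data: one comma-delimited string per row
series = pd.Series(["a,b,c", "d,e", "f"])

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
parts = series.str.split(",", expand=True)
print(parts)
```

The number of columns in the result follows the row with the most pieces, so shorter rows end with missing values.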
Credit to EdChum's code that includes adding the split columns back to the dataframe.
pd.concat([df[[0]], df[1].str.split(', ', expand=True)], axis=1)
Note: The first argument, df[[0]], is a DataFrame. The second argument, df[1].str.split, is the Series that you want to split.
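A small sketch of the concat call, assuming a made-up frame whose columns are labelled with the integers 0 and 1 (the data is hypothetical; only the call pattern comes from the answer):

```python
import pandas as pd

# Hypothetical frame with integer column labels 0 and 1
df = pd.DataFrame({0: ["x", "y"], 1: ["a, b", "c, d"]})

# Keep column 0 as a DataFrame and place the split pieces of column 1 beside it
result = pd.concat([df[[0]], df[1].str.split(", ", expand=True)], axis=1)
print(result)
```

Because the split pieces are also labelled 0, 1, ..., the result here carries the column labels 0, 0, 1. Renaming the new columns before concatenating is probably a good idea in real code.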
split Documentation
concat Documentation
How to split a column value at comma into multiple columns and rename them as its number of column as suffix
You can use str.split to split the strings in the column, then attach the resulting DataFrame to the original DataFrame, naming the new columns after its width (the number of columns it has).
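import numpy as np
# Split on commas and turn the None padding into NaN (on pandas 2.1+, DataFrame.map replaces the deprecated applymap)
temp = df['List_of_Order_Id'].str.split(',', expand=True).applymap(lambda x: np.nan if x is None else x)
df[['Order_Id_'+str(i) for i in range(1,temp.shape[1] + 1)]] = temp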
Mobile ... List_of_Order_Id Order_Id_1 Order_Id_2 \
0 9.163820e+08 ... 21810 21810 NaN
1 9.179049e+08 ... 23387 23387 NaN
2 9.183748e+08 ... 21767 21767 NaN
3 9.186110e+08 ... 23457 23457 NaN
4 9.187790e+08 ... 23117,23163 23117 23163
.. ... ... ... ... NaN
353 9.970647e+09 ... 21549 21549 NaN
354 9.971940e+09 ... 22753 22753 NaN
355 9.994742e+09 ... 21505,21836,22291,22539,22734 21505 21836
356 9.994964e+09 ... 22348 22348 NaN
357 9.994997e+09 ... 21100,21550 21100 21550
Order_Id_3 Order_Id_4 Order_Id_5
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
.. NaN NaN NaN
353 NaN NaN NaN
354 NaN NaN NaN
355 22291 22539 22734
356 NaN NaN NaN
357 NaN NaN NaN
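The same idea can be sketched on a tiny frame. This variant renames the split columns and uses join instead of the multi-column assignment above; the sample values are hypothetical, and only the column name List_of_Order_Id comes from the answer:

```python
import pandas as pd

# Hypothetical stand-in for the frame shown above
df = pd.DataFrame({"List_of_Order_Id": ["21810", "23117,23163", "21505,21836,22291"]})

temp = df["List_of_Order_Id"].str.split(",", expand=True)
# Name the new columns Order_Id_1 .. Order_Id_n, where n is the width of temp
temp.columns = [f"Order_Id_{i}" for i in range(1, temp.shape[1] + 1)]
df = df.join(temp)
print(df)
```

Note that the split pieces stay strings; convert them afterwards if numeric IDs are needed.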
How to split a dataframe string column into two columns?
There might be a better way, but here's one approach:
row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
# n=1 splits only at the first space; n must be passed by keyword on pandas 2.0+
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns = ['fips','row'])
fips row
0 00000 UNITED STATES
1 01000 ALABAMA
2 01001 Autauga County, AL
3 01003 Baldwin County, AL
4 01005 Barbour County, AL
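A variant that may be simpler is to let str.split build the columns directly with expand=True. This is a sketch on the first three rows of the sample data, not the original answer's method:

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA", "01001 Autauga County, AL"]})

# n=1 splits only at the first space; expand=True returns the two pieces as columns
df[["fips", "row"]] = df["row"].str.split(" ", n=1, expand=True)
print(df)
```

Because the split is limited to one cut, the spaces inside names such as "Autauga County, AL" are left alone.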
How to split comma separated text into columns on pandas dataframe?
Maybe you can try this without pivot.
Create the dataframe.
import pandas as pd
import io
s = '''Data
a,b,c
a,c,d
d,e
a,e
a,b,c,d,e'''
df = pd.read_csv(io.StringIO(s), sep = r"\s+")
We can use pandas.Series.str.split with the expand argument set to True, then apply value_counts to each row with axis = 1. Finally, fillna with zero and convert the data to integers with astype(int).
df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
#
a b c d e
0 1 1 1 0 0
1 1 0 1 1 0
2 0 0 0 1 1
3 1 0 0 0 1
4 1 1 1 1 1
And then merge it with the original column.
new = df["Data"].str.split(pat = ",", expand=True).apply(lambda x : x.value_counts(), axis = 1).fillna(0).astype(int)
pd.concat([df, new], axis = 1)
#
Data a b c d e
0 a,b,c 1 1 1 0 0
1 a,c,d 1 0 1 1 0
2 d,e 0 0 0 1 1
3 a,e 1 0 0 0 1
4 a,b,c,d,e 1 1 1 1 1
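For this sample data, str.get_dummies appears to give the same indicator table in one call. This is a swapped-in alternative rather than the approach above, sketched on the same five rows:

```python
import pandas as pd

df = pd.DataFrame({"Data": ["a,b,c", "a,c,d", "d,e", "a,e", "a,b,c,d,e"]})

# str.get_dummies splits on sep and builds one 0/1 indicator column per distinct value
indicators = df["Data"].str.get_dummies(sep=",")
result = pd.concat([df, indicators], axis=1)
print(result)
```

The two approaches differ when a value repeats within a row: value_counts reports the count, while get_dummies only marks presence with 1.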
How to Split a column into two by comma delimiter, and put a value without comma in second column and not in first?
We can try using str.extract here:
df["Location"] = df["Origin"].str.extract(r'(.*),')
df["Country"] = df["Origin"].str.extract(r'(\w+(?: \w+)*)$')
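To see what the two patterns capture, here is a sketch on made-up values (only the column name Origin comes from the answer). It passes expand=False so that each extract call returns a Series:

```python
import pandas as pd

# Hypothetical data: two values with a comma, one without
df = pd.DataFrame({"Origin": ["Paris, France", "New York, United States", "Japan"]})

# Everything before the last comma; NaN when the value has no comma
df["Location"] = df["Origin"].str.extract(r"(.*),", expand=False)
# The trailing run of space-separated words, i.e. the part after the last comma
df["Country"] = df["Origin"].str.extract(r"(\w+(?: \w+)*)$", expand=False)
print(df)
```

A value without a comma, such as "Japan", ends up in Country only, and its Location is left missing.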
Python or pandas split columns by comma and append into rows
A pandas DataFrame has an explode method that does exactly what you want. See the explode() documentation. It works with list-like objects, so if the column you want to explode holds strings, you need to split them into lists first. See the str.split() documentation. Additionally, you can remove any whitespace with the pandas map function.
Full code example:
import pandas as pd
df = pd.DataFrame({
"x": [1,2,3,4],
"y": ["a, b, c, d", "e, f, g", "h, i", "j, k, l, m, n"]
})
# Convert string with commas into list of string and strip spaces
df['y'] = df['y'].str.split(',').map(lambda elements: [e.strip() for e in elements])
# Explode lists in the column 'y' into separate values
df.explode('y')
Output:
x y
0 1 a
0 1 b
0 1 c
0 1 d
1 2 e
1 2 f
1 2 g
2 3 h
2 3 i
3 4 j
3 4 k
3 4 l
3 4 m
3 4 n
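The output above repeats the original index for every exploded row. If a fresh index is preferred, explode accepts ignore_index=True (available since pandas 1.1); a short sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": ["a, b, c", "d, e"]})

# Split on the comma, strip stray spaces, then explode with a fresh 0..n-1 index
df["y"] = df["y"].str.split(",").map(lambda elements: [e.strip() for e in elements])
flat = df.explode("y", ignore_index=True)
print(flat)
```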
Pandas: pivot comma delimited column into multiple columns
You could use str.get_dummies to get the dummy variables, then join back to df:
out = df[['id']].join(df['type'].str.get_dummies(sep=',').add_prefix('type_').replace(0, float('nan')))
Output:
id type_a type_b type_c type_d type_e
0 1 1.0 1.0 1.0 1.0 NaN
1 2 NaN 1.0 NaN 1.0 NaN
2 3 NaN NaN 1.0 NaN 1.0
3 4 NaN NaN NaN NaN NaN
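The input frame is not shown in the answer, so the sketch below uses hypothetical data chosen to be consistent with the output above (a missing type for id 4 is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical input reconstructed to match the output shown above
df = pd.DataFrame({"id": [1, 2, 3, 4], "type": ["a,b,c,d", "b,d", "c,e", np.nan]})

dummies = df["type"].str.get_dummies(sep=",").add_prefix("type_")
# Replace 0 with NaN so that only the categories present in a row are marked
out = df[["id"]].join(dummies.replace(0, np.nan))
print(out)
```

Replacing 0 with NaN turns the indicator columns into floats, which is why the output shows 1.0 rather than 1.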
How to split comma separated strings in a column into different columns if they're not of same length using python or pandas in jupyter notebook
We can use a regular expression pattern to find all the matching key-value pairs in each row of column_A, then map the list of pairs from each row to a dictionary in order to create records, and then construct a DataFrame from these records:
pd.DataFrame(map(dict, df['column_A'].str.findall(r'\s*([^:,]+):\s*([^,]+)')))
See the online regex demo
Garbage Organics Recycle Junk
0 Tissues Milk Cardboards NaN
1 Paper Towels Eggs Glass Feces
2 cups NaN Plastic bottles NaN
Here is an alternate approach in case you don't want to use regular expression patterns:
df['column_A'].str.split(', ').explode()\
.str.split(': ', expand=True)\
.set_index(0, append=True)[1].unstack()
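Both approaches can be tried side by side. The input strings below are hypothetical, reconstructed to be consistent with the output shown above, and the regex version uses a list comprehension in place of map(dict, ...):

```python
import pandas as pd

# Hypothetical input: "key: value" pairs separated by ", "
df = pd.DataFrame({"column_A": [
    "Garbage: Tissues, Organics: Milk, Recycle: Cardboards",
    "Garbage: Paper Towels, Organics: Eggs, Recycle: Glass, Junk: Feces",
    "Garbage: cups, Recycle: Plastic bottles",
]})

# Regex approach: one dict of key/value pairs per row
pairs = df["column_A"].str.findall(r"\s*([^:,]+):\s*([^,]+)")
by_regex = pd.DataFrame([dict(p) for p in pairs])

# Split/explode approach: one row per pair, then pivot the keys into columns
by_split = (
    df["column_A"].str.split(", ").explode()
    .str.split(": ", expand=True)
    .set_index(0, append=True)[1]
    .unstack()
)
print(by_regex)
print(by_split)
```

The two results hold the same values, though the column order may differ between them. The split version assumes the separators are exactly ", " and ": ", while the regex tolerates varying whitespace.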