How can I split a column of tuples in a Pandas dataframe?
You can do this by calling pd.DataFrame(col.tolist()) on that column:
In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})
In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)
In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]
In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4
In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)
In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4
Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (it makes a Series of each tuple, which is then seen as a row of a dataframe), but it is slower and uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).
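As a quick check of the note above, here is a minimal sketch showing that both approaches yield the same values on the small frame from this answer. The variable names via_constructor and via_apply are illustrative only, and no timing claims are made here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [(1, 2), (3, 4)]})

# Constructor approach: build one DataFrame directly from the list of tuples.
via_constructor = pd.DataFrame(df["b"].tolist(), index=df.index)

# Older approach: one Series is created per row, then combined into a DataFrame.
via_apply = df["b"].apply(pd.Series)

# Compare plain Python values so index and dtype details do not matter.
print(via_constructor.values.tolist() == via_apply.values.tolist())
```

On larger frames the per-row Series creation is where the extra time and memory go, which is why the constructor form is preferred.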
Split tuples columns in pandas dataframe
You can use:
df = pd.DataFrame(data={0: ['Neck', 'RShoulder', 'LShoulder', 'RElbow', 'RWrist', 'LElbow'],
1: [None, None, (840, 183), None, None, (936,255)]})
df[['new_col_1', 'new_col_2']] = df[1].apply(pd.Series)
Output:
0 1 new_col_1 new_col_2
0 Neck None NaN NaN
1 RShoulder None NaN NaN
2 LShoulder (840, 183) 840.0 183.0
3 RElbow None NaN NaN
4 RWrist None NaN NaN
5 LElbow (936, 255) 936.0 255.0
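If the column is large, a possible alternative is to keep the DataFrame constructor and substitute a NaN-filled tuple for the missing entries. This is only a sketch: the shortened sample data, the filler tuple, and the assumption that every real tuple has exactly two elements are mine, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    0: ["Neck", "LShoulder", "LElbow"],
    1: [None, (840, 183), (936, 255)],
})

# Assumption: every real tuple has two elements, so missing rows get two NaNs.
filler = (np.nan, np.nan)
rows = [t if isinstance(t, tuple) else filler for t in df[1]]

parts = pd.DataFrame(rows, index=df.index, columns=["new_col_1", "new_col_2"])
df = df.join(parts)
print(df)
```

As with apply(pd.Series), the new columns come out as floats because NaN forces a floating-point dtype.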
Pandas splitting Columns and creating Columns of tuples
Code
def merge(row):
    return pd.Series({
        "colAA": (row.colB, row.colC),
        "colBB": (row.colC, row.colA),
    })

df['colB'] = df['colB'].str.split(';')
df = df.explode('colB')
newDf = df.apply(merge, axis=1).reset_index(drop=True)
Explanation
You can split colB to get a list of values, then apply the explode function to get multiple rows:
df['colB'] = df['colB'].str.split(';')
df = df.explode('colB')
# output
colA colB colC
0 rqp 129 a
1 pot 217 u
1 pot 345 u
2 ghay 716 b
3 rbba 217 d
Then use the merge function below to build the new dataframe. After explode, colB already holds a single value per row, so no further splitting is needed:
def merge(row):
    return pd.Series({
        "colAA": (row.colB, row.colC),
        "colBB": (row.colC, row.colA),
    })
Then apply this function to df row by row:
newDf = df.apply(merge, axis=1).reset_index(drop=True)
# output
colAA colBB
0 (129, a) (a, rqp)
1 (217, u) (u, pot)
2 (345, u) (u, pot)
3 (716, b) (b, ghay)
4 (217, d) (d, rbba)
5 (345, d) (d, rbba)
6 (612, a) (a, tary)
7 (811, a) (a, tary)
8 (760, a) (a, tary)
9 (716, t) (t, kals)
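The whole pipeline can be tried end to end with a small frame. The input below is an assumption reconstructed from the first rows of the output above (the original question's data is not shown here), so treat it as a sketch rather than the asker's exact dataframe.

```python
import pandas as pd

# Assumed input: colB holds ';'-separated values as strings.
df = pd.DataFrame({
    "colA": ["rqp", "pot", "ghay"],
    "colB": ["129", "217;345", "716"],
    "colC": ["a", "u", "b"],
})

def merge(row):
    # Each exploded row carries exactly one colB value.
    return pd.Series({
        "colAA": (row.colB, row.colC),
        "colBB": (row.colC, row.colA),
    })

df["colB"] = df["colB"].str.split(";")   # strings -> lists
df = df.explode("colB")                  # one row per list element
new_df = df.apply(merge, axis=1).reset_index(drop=True)
print(new_df)
```

reset_index(drop=True) is needed because explode repeats the original index label for every element of a list.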
Splitting strings of tuples of different lengths to columns in Pandas DF
You can do it this way. It will just put None in places where it couldn't find the values. You can then append the df1 to df.
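import ast
import pandas as pd

d = {'id': [1, 2, 3],
     'human_id': ["('apples', '2022-12-04', 'a5ted')",
                  "('bananas', '2012-2-14')",
                  "('2012-2-14', 'reda21', 'ss')"]}
df = pd.DataFrame(data=d)

# Parse each string into a real tuple. ast.literal_eval is used here in place
# of eval because it only accepts Python literals.
newList = []
for val in df['human_id']:
    newList.append(ast.literal_eval(val))

# Tuples shorter than three elements are padded with None
df1 = pd.DataFrame(newList, columns=['col1', 'col2', 'col3'])
print(df1)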
Output
col1 col2 col3
0 apples 2022-12-04 a5ted
1 bananas 2012-2-14 None
2 2012-2-14 reda21 ss
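The final step mentioned above, attaching df1 to df, could look like the following sketch. It uses pd.concat along the columns, which is one of several ways to do it; the name combined is illustrative.

```python
import ast
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        "('apples', '2022-12-04', 'a5ted')",
        "('bananas', '2012-2-14')",
        "('2012-2-14', 'reda21', 'ss')",
    ],
})

# Parse the strings into tuples, then spread them over three columns.
parsed = [ast.literal_eval(s) for s in df["human_id"]]
df1 = pd.DataFrame(parsed, columns=["col1", "col2", "col3"], index=df.index)

# Place the new columns next to the original ones, aligned on the index.
combined = pd.concat([df, df1], axis=1)
print(combined)
```

Passing index=df.index when building df1 keeps the rows aligned even if df does not use the default 0..n-1 index.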
Pandas: How to split a column of string of multiple tuples to multiple columns of individual string of tuple
You can use str.extract() with regex, as follows:
df['data'].str.extract(r'(\(\d+,\s*\d+\))\s*,\s*(\(\d+,\s*\d+\))')
or use str.split(), as follows:
df['data'].str.split(r'(?<=\))\s*,\s*', expand=True)
Here we use a regex positive lookbehind to require a closing parenthesis ) before the comma , for the comma to match. Hence, we only split on the comma between tuples and not on the commas within tuples.
Result:
0 1
0 (0,1) (1,2)
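Both patterns can be checked on a small frame. The sample strings below are assumptions chosen to match the shape of the result shown above. The column is created with object dtype so that Python's re engine, which supports lookbehind, handles the pattern.

```python
import pandas as pd

# Assumed sample data: each cell is a string containing two tuples.
df = pd.DataFrame({
    "data": pd.Series(["(0,1), (1,2)", "(3, 4),(5, 6)"], dtype=object)
})

# Capture each "(digits, digits)" group explicitly.
extracted = df["data"].str.extract(r"(\(\d+,\s*\d+\))\s*,\s*(\(\d+,\s*\d+\))")

# Split only on a comma that directly follows a closing parenthesis.
split = df["data"].str.split(r"(?<=\))\s*,\s*", expand=True)

print(extracted)
print(split)
```

str.extract fixes the number of output columns through the number of capture groups, while str.split adapts to however many tuples each string contains.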
Split a Pandas column with lists of tuples into separate columns
Try explode followed by apply(pd.Series), then merge back to the DataFrame:
import pandas as pd
df = pd.DataFrame({'ID': ['A', 'B', 'C'],
'col': [[('123', '456', '111', False),
('124', '456', '111', True),
('125', '456', '111', False)],
[],
[('123', '555', '333', True)]]
})
# Explode into Rows
new_df = df.explode('col').reset_index(drop=True)
# Merge Back Together
new_df = new_df.merge(
# Turn into Multiple Columns
new_df['col'].apply(pd.Series),
left_index=True,
right_index=True) \
.drop(columns=['col']) # Drop Old Col Column
# Rename Columns
new_df.columns = ['ID', 'col1', 'col2', 'col3', 'col4']
# For Display
print(new_df)
Output:
ID col1 col2 col3 col4
0 A 123 456 111 False
1 A 124 456 111 True
2 A 125 456 111 False
3 B NaN NaN NaN NaN
4 C 123 555 333 True
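A possible variant, sketched below, builds the new columns with the DataFrame constructor instead of apply(pd.Series). Note the difference in behavior: it drops IDs whose list is empty (row B above) rather than keeping a NaN row, so it only fits when those rows are not needed. The names exploded, parts, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "col": [
        [("123", "456", "111", False), ("124", "456", "111", True)],
        [],
        [("123", "555", "333", True)],
    ],
})

# One row per tuple; empty lists become NaN and are dropped here.
exploded = df.explode("col").dropna(subset=["col"]).reset_index(drop=True)

# Spread each tuple over four named columns.
parts = pd.DataFrame(
    exploded["col"].tolist(),
    columns=["col1", "col2", "col3", "col4"],
    index=exploded.index,
)

result = pd.concat([exploded[["ID"]], parts], axis=1)
print(result)
```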
How to split tuples in all columns of a dataframe
Considering the following toy dataframe:
import pandas as pd
df = pd.DataFrame(
{
0: {
0: None,
1: None,
2: None,
3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
4: (
"ait austrian institute of technology gmbh giefinggasse 4 wien",
70,
537,
),
},
1: {0: None, 1: None, 2: None, 3: None, 4: None},
2: {0: None, 1: None, 2: None, 3: None, 4: None},
}
)
print(df)
# Outputs
0 1 2
0 None None None
1 None None None
2 None None None
3 (bartenbach gmbh rinner strasse 14 aldrans, 96... None None
4 (ait austrian institute of technology gmbh gie... None None
You could iterate over each column, then over each value, unpack the tuple, and populate a new dataframe, like this:
new_df = pd.DataFrame()

# DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
for col_num, series in df.items():
    for i, value in enumerate(series.values):
        try:
            name, score, id_num = value
            new_df.loc[i, f"Name{col_num}"] = name
            new_df.loc[i, f"Score{col_num}"] = score
            new_df.loc[i, f"ID{col_num}"] = id_num
        except TypeError:
            # None cannot be unpacked, so skip empty cells
            continue

new_df = new_df.reset_index(drop=True)
print(new_df)
# Outputs
Name0 Score0 ID0
0 bartenbach gmbh rinner strasse 14 aldrans 96.0 1050.0
1 ait austrian institute of technology gmbh gief... 70.0 537.0
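The cell-by-cell loop can probably be shortened by handling one column at a time: drop the empty cells, then hand the remaining tuples to the DataFrame constructor. This is a sketch on a reduced version of the toy data, with shortened strings, and it assumes every non-empty cell is a 3-tuple.

```python
import pandas as pd

df = pd.DataFrame({
    0: [None, None, ("bartenbach gmbh", 96, 1050), ("ait austrian institute", 70, 537)],
    1: [None] * 4,
    2: [None] * 4,
})

frames = []
for col_num, series in df.items():
    valid = series.dropna()          # keep only the cells that hold a tuple
    if valid.empty:
        continue                     # nothing to split in this column
    frames.append(pd.DataFrame(
        valid.tolist(),
        columns=[f"Name{col_num}", f"Score{col_num}", f"ID{col_num}"],
        index=valid.index,
    ))

new_df = pd.concat(frames, axis=1).reset_index(drop=True)
print(new_df)
```

Unlike the loop above, this keeps the scores and IDs as integers, since no NaN placeholders are written along the way.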
How can I split pandas dataframe into column of tuple, quickly?
One idea is to use a list comprehension:
s = pd.Series('a_1, a_2, a_3, b_1'.split(', '))
#4k rows
s = pd.concat([s] * 1000, ignore_index=True)
In [195]: %timeit s.str.split("_").apply(tuple)
2.49 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [196]: %timeit [tuple(x.split('_')) for x in s]
1.46 ms ± 79.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [197]: %timeit pd.Index(s).str.split("_", expand=True).tolist()
4.31 ms ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
s = pd.Series('a_1, a_2, a_3, b_1'.split(', '))
#400k rows
s = pd.concat([s] * 100000, ignore_index=True)
In [199]: %timeit s.str.split("_").apply(tuple)
252 ms ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [200]: %timeit [tuple(x.split('_')) for x in s]
180 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [201]: %timeit pd.Index(s).str.split("_", expand=True).tolist()
379 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
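Before comparing timings, it is worth confirming that the three variants return the same tuples. A minimal sketch on the four-element series (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series("a_1, a_2, a_3, b_1".split(", "))

# Variant 1: vectorized split, then convert each list to a tuple.
via_apply = s.str.split("_").apply(tuple).tolist()

# Variant 2: plain Python list comprehension.
via_listcomp = [tuple(x.split("_")) for x in s]

# Variant 3: split into a MultiIndex, whose tolist() yields tuples.
via_index = pd.Index(s).str.split("_", expand=True).tolist()

print(via_apply == via_listcomp == via_index)
```

To get a column of tuples back, the list from the comprehension can be wrapped with pd.Series(via_listcomp, index=s.index).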
Split several columns with tuples into separate columns
Here's a solution:
new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1)
new_df.columns = pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), new_df.columns])
Output:
>>> new_df
a e
b0 b1 c0 c1 b0 b1
0 0 1 0 1 0 1
1 1 2 2 3 2 3
2 2 3 4 5 4 5
One-big-liner :)
new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1).pipe(lambda x: x.set_axis(pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), x.columns]), axis=1))
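The snippets above rely on an existing frame named spl. A self-contained sketch follows, where the contents of spl are an assumption reconstructed from the output shown (two-level column labels, each cell a 2-tuple):

```python
import numpy as np
import pandas as pd

# Assumed input: MultiIndex columns (a, b), (a, c), (e, b) holding 2-tuples.
spl = pd.DataFrame({
    "x": [(0, 1), (1, 2), (2, 3)],
    "y": [(0, 1), (2, 3), (4, 5)],
    "z": [(0, 1), (2, 3), (4, 5)],
})
spl.columns = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c"), ("e", "b")])

# Split every column; prefix the pieces with the second-level label (c[-1]).
new_df = pd.concat(
    [pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl],
    axis=1,
)

# Repeat each top-level label twice, once per tuple element.
new_df.columns = pd.MultiIndex.from_arrays(
    [np.repeat(spl.columns.get_level_values(0), 2), new_df.columns]
)
print(new_df)
```

The hard-coded 2 in np.repeat matches the tuple length, so it would need to change if the tuples had a different number of elements.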