How to Split a Column of Tuples in a Pandas Dataframe

How can I split a column of tuples in a Pandas dataframe?

You can do this by passing the column's tolist() output to pd.DataFrame:

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4

Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That also works (it turns each tuple into a Series, which then becomes a row of the resulting dataframe), but it is slower and uses more memory than the tolist version, as noted in the other answers here (thanks to denfromufa).
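
As a rough check of the note above, the following sketch builds the split columns both ways on the example frame and compares their values. The names via_tolist, via_apply, same_values and result are illustrative only, and dropping the original b column at the end is an optional extra step that is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [(1, 2), (3, 4)]})

# Recommended: build the new columns from a plain list of tuples.
via_tolist = pd.DataFrame(df['b'].tolist(), index=df.index, columns=['b1', 'b2'])

# Older alternative: one Series per tuple (slower on large frames).
via_apply = df['b'].apply(pd.Series)
via_apply.columns = ['b1', 'b2']

# Compare the values produced by the two approaches.
same_values = via_tolist.values.tolist() == via_apply.values.tolist()
print(same_values)

# Optional: replace the tuple column with the split columns.
result = df.drop(columns='b').join(via_tolist)
print(result)
```

Passing index=df.index matters when the frame does not have a default 0..n-1 index; without it the new columns would not line up with the original rows.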

Split tuples columns in pandas dataframe

You can use:

df = pd.DataFrame(data={0: ['Neck', 'RShoulder', 'LShoulder', 'RElbow', 'RWrist', 'LElbow'],
                        1: [None, None, (840, 183), None, None, (936,255)]})

df[['new_col_1', 'new_col_2']] = df[1].apply(pd.Series)

Output:

           0           1  new_col_1  new_col_2
0       Neck        None        NaN        NaN
1  RShoulder        None        NaN        NaN
2  LShoulder  (840, 183)      840.0      183.0
3     RElbow        None        NaN        NaN
4     RWrist        None        NaN        NaN
5     LElbow  (936, 255)      936.0      255.0
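
If most rows are None, one possible alternative is to build the new columns only from the rows that hold a tuple and let index alignment fill the rest with NaN. This is a sketch under that assumption, not the answer's own method; the names has_tuple, parts and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(data={0: ['Neck', 'RShoulder', 'LShoulder', 'RElbow', 'RWrist', 'LElbow'],
                        1: [None, None, (840, 183), None, None, (936, 255)]})

# Boolean mask of the rows whose value is an actual tuple (not None).
has_tuple = df[1].notna()

# Split only those rows, keeping their original index labels.
parts = pd.DataFrame(df.loc[has_tuple, 1].tolist(),
                     index=df.index[has_tuple],
                     columns=['new_col_1', 'new_col_2'])

# join aligns on the index, so rows without a tuple get NaN.
result = df.join(parts)
print(result)
```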

Pandas splitting Columns and creating Columns of tuples

Code

def merge(row):
    return pd.Series({
            "colAA": (row.colB, row.colC),
            "colBB": (row.colC, row.colA),
        })

df['colB'] = df['colB'].str.split(';')
df = df.explode('colB')
newDf = df.apply(merge, axis=1).reset_index(drop=True)

Explanation

First split colB on ';' to get a list of values per row, then call explode so that each value gets its own row:

df['colB'] = df['colB'].str.split(';')
df = df.explode('colB')

# output
   colA colB colC
0   rqp  129    a
1   pot  217    u
1   pot  345    u
2  ghay  716    b
3  rbba  217    d

Then define a merge function that builds the two tuples for each row. After explode, colB already holds a single value, so no further splitting is needed:

def merge(row):
    return pd.Series({
        "colAA": (row.colB, row.colC),
        "colBB": (row.colC, row.colA),
    })

Then apply this function to each row of the dataframe:

newDf = df.apply(merge, axis=1).reset_index(drop=True)

# output
    colAA        colBB
0   (129, a)    (a, rqp)
1   (217, u)    (u, pot)
2   (345, u)    (u, pot)
3   (716, b)    (b, ghay)
4   (217, d)    (d, rbba)
5   (345, d)    (d, rbba)
6   (612, a)    (a, tary)
7   (811, a)    (a, tary)
8   (760, a)    (a, tary)
9   (716, t)    (t, kals)
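
The steps above can also be tried end to end on a small frame. The input below is hypothetical (the question's original data is not shown here), and building the tuple columns with zip instead of a row-wise apply is a swapped-in technique, offered only as a sketch.

```python
import pandas as pd

# Hypothetical input: colB holds ';'-separated values.
df = pd.DataFrame({'colA': ['rqp', 'pot', 'ghay'],
                   'colB': ['129', '217;345', '716'],
                   'colC': ['a', 'u', 'b']})

# One list per row, then one row per list element.
df['colB'] = df['colB'].str.split(';')
df = df.explode('colB').reset_index(drop=True)

# Pair the columns element by element to form the tuples.
new_df = pd.DataFrame({'colAA': list(zip(df['colB'], df['colC'])),
                       'colBB': list(zip(df['colC'], df['colA']))})
print(new_df)
```

zip pairs whole columns at once, which avoids creating a Series for every row.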

Splitting strings of tuples of different lengths to columns in Pandas DF

You can do it as shown below. Rows whose tuple has fewer values get None in the remaining columns. You can then join df1 to df.

d = {'id': [1,2,3], 
     'human_id': ["('apples', '2022-12-04', 'a5ted')", 
                  "('bananas', '2012-2-14')",
                  "('2012-2-14', 'reda21', 'ss')"
                 ]}

df = pd.DataFrame(data=d)

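import ast

# Parse each string into a real tuple. ast.literal_eval only evaluates
# Python literals, so it is safer than eval on untrusted strings.
newList = []
for val in df['human_id']:
    newList.append(ast.literal_eval(val))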

df1 = pd.DataFrame(newList, columns=['col1', 'col2', 'col3'])

print(df1)

Output

        col1        col2   col3
0     apples  2022-12-04  a5ted
1    bananas   2012-2-14   None
2  2012-2-14      reda21     ss
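
The last step mentioned in the answer, attaching df1 to df, is not shown. A minimal sketch of one way to do it follows, assuming pd.concat along the columns is acceptable; the names parsed and combined are illustrative.

```python
import ast
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'human_id': ["('apples', '2022-12-04', 'a5ted')",
                                "('bananas', '2012-2-14')",
                                "('2012-2-14', 'reda21', 'ss')"]})

# literal_eval parses Python literals only, so it cannot run arbitrary code.
parsed = [ast.literal_eval(val) for val in df['human_id']]

# Shorter tuples leave their trailing columns empty.
df1 = pd.DataFrame(parsed, columns=['col1', 'col2', 'col3'], index=df.index)

# Attach the new columns side by side with the original ones.
combined = pd.concat([df, df1], axis=1)
print(combined)
```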

Pandas: How to split a column of string of multiple tuples to multiple columns of individual string of tuple

You can use str.extract() with regex, as follows:

df['data'].str.extract(r'(\(\d+,\s*\d+\))\s*,\s*(\(\d+,\s*\d+\))')

or use str.split(), as follows:

df['data'].str.split(r'(?<=\))\s*,\s*', expand=True)

Here we use a regex positive lookbehind so that a comma matches only when it is preceded by a closing parenthesis ). Hence we split only on the comma between tuples, not on the commas inside them.

Result:

       0      1
0  (0,1)  (1,2)
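
To try both calls, a small self-contained sketch follows. The input string is an assumption chosen to match the result shown above; the variable names extracted and split_cols are illustrative.

```python
import pandas as pd

# Hypothetical input, chosen to match the result shown above.
df = pd.DataFrame({'data': ['(0,1), (1,2)']})

# Capture each "(number, number)" group as its own column.
extracted = df['data'].str.extract(r'(\(\d+,\s*\d+\))\s*,\s*(\(\d+,\s*\d+\))')

# Split on a comma only when it directly follows a closing parenthesis.
split_cols = df['data'].str.split(r'(?<=\))\s*,\s*', expand=True)

print(extracted)
print(split_cols)
```

str.extract needs one capture group per expected tuple, while str.split adapts to however many tuples each string contains.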

Split a Pandas column with lists of tuples into separate columns

Try explode followed by apply(pd.Series), then merge back to the DataFrame:

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'B', 'C'],
                   'col': [[('123', '456', '111', False),
                            ('124', '456', '111', True),
                            ('125', '456', '111', False)],
                           [],
                           [('123', '555', '333', True)]]
                   })
# Explode into Rows
new_df = df.explode('col').reset_index(drop=True)  

# Merge Back Together
new_df = new_df.merge(
    # Turn into Multiple Columns
    new_df['col'].apply(pd.Series),
    left_index=True,
    right_index=True) \
    .drop(columns=['col'])  # Drop Old Col Column

# Rename Columns
new_df.columns = ['ID', 'col1', 'col2', 'col3', 'col4']

# For Display
print(new_df)

Output:

  ID col1 col2 col3   col4
0  A  123  456  111  False
1  A  124  456  111   True
2  A  125  456  111  False
3  B  NaN  NaN  NaN    NaN
4  C  123  555  333   True

How to split tuples in all columns of a dataframe

Considering the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        0: {
            0: None,
            1: None,
            2: None,
            3: ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
            4: (
                "ait austrian institute of technology gmbh giefinggasse 4 wien",
                70,
                537,
            ),
        },
        1: {0: None, 1: None, 2: None, 3: None, 4: None},
        2: {0: None, 1: None, 2: None, 3: None, 4: None},
    }
)

print(df)
# Outputs
                                                   0     1     2
0                                               None  None  None
1                                               None  None  None
2                                               None  None  None
3  (bartenbach gmbh rinner strasse 14 aldrans, 96...  None  None
4  (ait austrian institute of technology gmbh gie...  None  None

You could iterate over each column, then over each value, unpack the tuple and populate a new dataframe, like this:

new_df = pd.DataFrame()

for col_num, series in df.items():  # iteritems() was removed in pandas 2.0
    for i, value in enumerate(series.values):
        try:
            name, score, id_num = value
            new_df.loc[i, f"Name{col_num}"] = name
            new_df.loc[i, f"Score{col_num}"] = score
            new_df.loc[i, f"ID{col_num}"] = id_num
        except TypeError:
            continue
new_df = new_df.reset_index(drop=True)

print(new_df)
# Outputs
                                               Name0  Score0     ID0
0          bartenbach gmbh rinner strasse 14 aldrans    96.0  1050.0
1  ait austrian institute of technology gmbh gief...    70.0   537.0
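
A possible alternative to the cell-by-cell loop is to split each column in one go and concatenate the pieces. This is only a sketch, assuming every non-null cell is a 3-tuple as in the toy dataframe; the names frames and valid are illustrative, and the toy frame is rebuilt here in a shorter form.

```python
import pandas as pd

df = pd.DataFrame({
    0: [None, None, None,
        ("bartenbach gmbh rinner strasse 14 aldrans", 96, 1050),
        ("ait austrian institute of technology gmbh giefinggasse 4 wien", 70, 537)],
    1: [None] * 5,
    2: [None] * 5,
})

frames = []
for col_num, series in df.items():
    valid = series.dropna()  # keep only the rows that hold a tuple
    if valid.empty:
        continue  # nothing to split in this column
    frames.append(pd.DataFrame(
        valid.tolist(),
        index=valid.index,
        columns=[f"Name{col_num}", f"Score{col_num}", f"ID{col_num}"],
    ))

# Align the per-column pieces on the original row labels, then renumber.
new_df = pd.concat(frames, axis=1).reset_index(drop=True)
print(new_df)
```

Unlike the loop, this keeps the score and ID columns as integers when a column has no gaps, because no NaN placeholders are written along the way.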

How can I split pandas dataframe into column of tuple, quickly?

One idea is to use a list comprehension, which was the fastest of the three approaches in the timings below:

s = pd.Series('a_1, a_2, a_3, b_1'.split(', '))
#4k rows
s = pd.concat([s] * 1000, ignore_index=True)

In [195]: %timeit s.str.split("_").apply(tuple)
2.49 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [196]: %timeit [tuple(x.split('_')) for x in s]
1.46 ms ± 79.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [197]: %timeit pd.Index(s).str.split("_", expand=True).tolist()
4.31 ms ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

s = pd.Series('a_1, a_2, a_3, b_1'.split(', '))
#400k rows
s = pd.concat([s] * 100000, ignore_index=True)

In [199]: %timeit s.str.split("_").apply(tuple)
252 ms ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [200]: %timeit [tuple(x.split('_')) for x in s]
180 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [201]: %timeit pd.Index(s).str.split("_", expand=True).tolist()
379 ms ± 1.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
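
Outside IPython, the comparison can be reproduced roughly with the standard timeit module. The sketch below is an assumption about how one might set that up, not the answer's own benchmark: the candidates dict and the repetition count of 20 are arbitrary choices, and .tolist() is added to the first variant so that all three return plain lists that can be compared.

```python
import timeit
import pandas as pd

s = pd.Series('a_1, a_2, a_3, b_1'.split(', '))
s = pd.concat([s] * 1000, ignore_index=True)  # 4k rows

candidates = {
    'str.split + apply': lambda: s.str.split('_').apply(tuple).tolist(),
    'list comprehension': lambda: [tuple(x.split('_')) for x in s],
    'Index.str.split': lambda: pd.Index(s).str.split('_', expand=True).tolist(),
}

# Run each candidate once to check that they agree.
results = {name: func() for name, func in candidates.items()}

# Time each candidate; absolute numbers depend on the machine and versions.
for name, func in candidates.items():
    seconds = timeit.timeit(func, number=20)
    print(f'{name}: {seconds / 20 * 1e3:.2f} ms per call')
```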

split several columns with tuples into separate columns

Here's a solution (spl is the dataframe with MultiIndex columns that holds the tuples, and numpy is imported as np):

new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1)
new_df.columns = pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), new_df.columns])

Output:

>>> new_df
   a           e   
  b0 b1 c0 c1 b0 b1
0  0  1  0  1  0  1
1  1  2  2  3  2  3
2  2  3  4  5  4  5

One-big-liner :)

new_df = pd.concat([pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl], axis=1).pipe(lambda x: x.set_axis(pd.MultiIndex.from_arrays([np.repeat(spl.columns.get_level_values(0), 2), x.columns]), axis=1))
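
Since spl is not defined in the snippet, here is a self-contained sketch with a hypothetical spl reconstructed from the output shown above. It assumes every tuple has exactly two elements, which is what the repeat count of 2 relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical input reconstructed from the output shown above.
spl = pd.DataFrame({
    ('a', 'b'): [(0, 1), (1, 2), (2, 3)],
    ('a', 'c'): [(0, 1), (2, 3), (4, 5)],
    ('e', 'b'): [(0, 1), (2, 3), (4, 5)],
})

# Split each tuple column and prefix the pieces with the inner column name.
new_df = pd.concat(
    [pd.DataFrame(spl[c].tolist()).add_prefix(c[-1]) for c in spl],
    axis=1,
)

# Rebuild the two-level header: each outer label is repeated once per piece.
new_df.columns = pd.MultiIndex.from_arrays(
    [np.repeat(spl.columns.get_level_values(0), 2), new_df.columns]
)
print(new_df)
```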
