How to Form Tuple Column from Two Columns in Pandas

How to form tuple column from two columns in Pandas

Get comfortable with zip. It comes in handy when dealing with column data.

df['new_col'] = list(zip(df.lat, df.long))

It's simpler and faster than using apply or map. Something like np.dstack is twice as fast as zip, but it returns an array rather than tuples.
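A minimal, self-contained sketch of the zip approach. The frame and its values are made up for illustration; only the `lat` / `long` column names come from the snippet above.

```python
import pandas as pd

# Illustrative data (assumed, not from the original question).
df = pd.DataFrame({"lat": [40.7, 34.1, 51.5], "long": [-74.0, -118.2, -0.1]})

# zip pairs the two columns row by row; list() materialises the pairs
# so pandas receives exactly one tuple per row.
df["new_col"] = list(zip(df["lat"], df["long"]))

print(df["new_col"].tolist())
```

Bracket access (`df["long"]`) is used here instead of attribute access only as a matter of habit; both read the same column.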

Pandas output 2 column in data frame using apply function which returns a tuple / list of 2 items

Since you need to loop over several columns row by row, a more efficient approach is to zip the columns inside a list comprehension. This builds a list of tuples that you can assign directly to a list of new columns on the original data frame:

df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

df    
   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10        c1          c2
0  20  67  76  95  28  60  82  81  90   93  0.516667  288.300000
1  94  30  97  82  51  10  54  43  36   41  0.569444  140.083333
2  50  57  85  48  67  65  41  91  48   46  0.479167  132.250000
3  61  36  44  59  18  71  42  18  56   77  0.687500  317.625000
4  11  85  34  66  45  55  21  42  77   27  0.175325   28.402597
5  20  19  86  46  97  21  84  12  86   98  0.569767  335.023256
6  24  87  65  62  22  43  26  80  15   64  2.133333  819.200000
7  38  15  23  22  89  89  19  32  21   33  0.785714  155.571429
8  82  88  64  89  92  88  15  30  85   83  0.488235  243.141176
9  96  24  91  70  96  54  57  81  59   32  0.271186   52.067797
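The definition of some_func is not shown above, so the sketch below uses a hypothetical stand-in that simply returns a sum and a product. It is only meant to show the shape of the pattern (a function returning two values, spread across two new columns), not to reproduce the numbers in the table.

```python
import pandas as pd

def some_func(x, y):
    # Hypothetical stand-in for the real function: returns two values per row.
    return x + y, x * y

# Illustrative data; only the column names x9 / x10 follow the example above.
df = pd.DataFrame({"x9": [1, 2, 3], "x10": [10, 20, 30]})

# Each call returns a 2-tuple; the resulting list of tuples is
# unpacked column-wise into the two new columns c1 and c2.
df[["c1", "c2"]] = [some_func(x, y) for x, y in zip(df["x9"], df["x10"])]

print(df)
```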

Pandas list of tuples from two columns containing list

A list comprehension with zipping:

df["dev_mod"] = [list(zip(dev_name, dev_model))
                 for dev_name, dev_model in zip(df.device_names, df.device_models)]

to get

   user_id    device_names   device_models                           dev_mod
0        1  [dev_1, dev_2]  [mod_1, mod_2]  [(dev_1, mod_1), (dev_2, mod_2)]
1        2  [dev_1, dev_5]  [mod_1, mod_5]  [(dev_1, mod_1), (dev_5, mod_5)]

The second (outer) zip walks the two columns together row by row; the first (inner) zip pairs up the elements of the two lists within each row, which is effectively a transpose.
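For a runnable version, the sketch below rebuilds the two-row frame from the printed output (the data is reconstructed from that table) and applies the same nested zip.

```python
import pandas as pd

# Frame reconstructed from the output shown above.
df = pd.DataFrame({
    "user_id": [1, 2],
    "device_names": [["dev_1", "dev_2"], ["dev_1", "dev_5"]],
    "device_models": [["mod_1", "mod_2"], ["mod_1", "mod_5"]],
})

df["dev_mod"] = [
    list(zip(names, models))  # inner zip: pair elements within one row
    for names, models in zip(df["device_names"], df["device_models"])  # outer zip: walk rows
]

print(df["dev_mod"].tolist())
```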

Pandas: Create a tuple column from multiple columns

You can use:

my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)

Or:

my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))

Or:

my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]

All return:

print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
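As a quick sanity check, the three variants can be compared on a tiny frame. This is a sketch on a two-row subset of the data above; the variable names via_apply, via_zip and via_values are introduced here for illustration.

```python
import pandas as pd

my_df = pd.DataFrame({
    "Person": ["John", "John"],
    "event": ["A", "B"],
    "time": ["2017-10-11", "2017-10-12"],
})

# 1) Row-wise apply: simplest to read, usually the slowest.
via_apply = my_df[["event", "time"]].apply(tuple, axis=1).tolist()

# 2) zip over the two columns.
via_zip = list(zip(my_df["event"], my_df["time"]))

# 3) Convert the sub-frame to nested lists, then to tuples.
via_values = [tuple(x) for x in my_df[["event", "time"]].values.tolist()]

print(via_apply == via_zip == via_values)
```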

How to form tuple column from two columns in Pandas of only non empty values

Thanks for the helpful comments. Here's something that worked for me:

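import copy

def remove_empty(x):
    # Iterate over a copy so the set can be modified inside the loop.
    for c in copy.copy(x):
        if not c:  # drops any falsy value ('' / None / 0); NaN is truthy and is kept
            x.discard(c)
    return x

# In Python 3, zip returns an iterator, so wrap it in list() before assigning.
df['new_col'] = list(zip(df.lat, df.long))
df['new_col'] = df['new_col'].apply(set)  # note: each cell becomes a set, not a tuple
df['new_col'] = df['new_col'].apply(remove_empty)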
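Converting each pair to a set loses the column order and collapses equal values. If real tuples are wanted, one possible alternative is to filter each pair directly. The sketch below assumes that "empty" means None/NaN or an empty string; the helper is_present and the sample data are hypothetical.

```python
import pandas as pd

# Illustrative data with an empty string and a missing value.
df = pd.DataFrame({
    "lat": ["40.7", "", "51.5"],
    "long": ["-74.0", "-118.2", None],
})

def is_present(value):
    # Assumption: a value counts as empty if it is None/NaN or ''.
    return pd.notna(value) and value != ""

# Keep only the non-empty members of each (lat, long) pair, in column order.
df["new_col"] = [
    tuple(v for v in pair if is_present(v))
    for pair in zip(df["lat"], df["long"])
]

print(df["new_col"].tolist())
```

Rows with one missing value end up as 1-tuples, so the column holds tuples of varying length.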

How can I split a column of tuples in a Pandas dataframe?

You can do this by calling pd.DataFrame(col.tolist()) on that column:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4

Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (it turns each tuple into a Series, which then becomes one row of a dataframe), but it is slower and uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).

python pandas data frame: assign function return tuple to two columns of a data frame

`zip`/`map`

data['c'], data['d'] = zip(*map(givetup, data['b']))

data

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
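The definition of givetup is not shown here. Judging from the output, it seems to return the first three characters and their upper-case form, so the sketch below uses a hypothetical version along those lines to make the zip/map line runnable.

```python
import pandas as pd

def givetup(s):
    # Hypothetical reconstruction, inferred from the printed output:
    # first three characters, plus the same prefix in upper case.
    prefix = s[:3]
    return prefix, prefix.upper()

data = pd.DataFrame({"a": [1, 2, 3], "b": ["ssdfsdf", "bbbbbb", "ddd"]})

# map yields one (c, d) tuple per row; zip(*...) transposes them
# into one sequence of c values and one sequence of d values.
data["c"], data["d"] = zip(*map(givetup, data["b"]))

print(data)
```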

`Series.str` and `assign`

This is specific to what givetup does in this example. But if the function's logic can be disentangled into separate column operations, it is likely worth doing.

The assign method's arguments can be callables that reference columns created by an argument just prior (neat).

data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF

Timings

data = pd.concat([data] * 10_000, ignore_index=True)

%timeit data['c'], data['d'] = zip(*map(givetup, data['b']))
%timeit data[['c','d']] = [givetup(a) for a in data['b']]
%timeit data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

69.7 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
137 ms ± 937 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.6 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Split Pandas Series Tuple on the fly to Pandas Column

Index into the tuples with the .str accessor and assign the two results to 2 columns:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a':[1, 2, 3, 4, 5, 6]})

s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])

df['b'], df['c'] = s.str[0], s.str[1]

Or create 2 columns DataFrame:

s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])

df[['b', 'c']] = pd.DataFrame(s.tolist(), index=df.index)
print(df)
   a    b   c
0  1  NaN   1
1  2   AB  10
2  3   CD   1
3  4    3   1
4  5    4   1
5  6   NA   1

The same as one-liners:

df['b'], df['c'] = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[0], pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[1]
df[['b', 'c']] = pd.DataFrame(pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).tolist(), index=df.index)
