How to Form Tuple Column from Two Columns in Pandas

Get comfortable with zip. It comes in handy when dealing with column data.

df['new_col'] = list(zip(df.lat, df.long))

It's simpler and faster than using apply or map. Something like np.dstack is about twice as fast as zip, but it returns a NumPy array rather than tuples.
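
As a minimal, self-contained sketch of the zip approach (the coordinate values are made up for illustration; only the column names lat and long come from the answer):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"lat": [40.71, 34.05, 41.88],
                   "long": [-74.01, -118.24, -87.63]})

# zip pairs the two columns element-wise; list() materializes the tuples
# so pandas receives one tuple per row.
df["new_col"] = list(zip(df.lat, df.long))

print(df["new_col"].tolist())
```

Each element of new_col is a plain Python tuple, so the column has dtype object.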

Pandas output 2 column in data frame using apply function which returns a tuple / list of 2 items

Since you need to iterate over several columns row by row, a more efficient approach is to zip the columns and build a list of tuples in a list comprehension, which you can then assign directly to a list of new columns in the original data frame:

df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

df
   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10        c1          c2
0  20  67  76  95  28  60  82  81  90   93  0.516667  288.300000
1  94  30  97  82  51  10  54  43  36   41  0.569444  140.083333
2  50  57  85  48  67  65  41  91  48   46  0.479167  132.250000
3  61  36  44  59  18  71  42  18  56   77  0.687500  317.625000
4  11  85  34  66  45  55  21  42  77   27  0.175325   28.402597
5  20  19  86  46  97  21  84  12  86   98  0.569767  335.023256
6  24  87  65  62  22  43  26  80  15   64  2.133333  819.200000
7  38  15  23  22  89  89  19  32  21   33  0.785714  155.571429
8  82  88  64  89  92  88  15  30  85   83  0.488235  243.141176
9  96  24  91  70  96  54  57  81  59   32  0.271186   52.067797
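
The function some_func is not shown in the answer. The sketch below uses a stand-in whose formula happens to reproduce the c1 and c2 values in the output above for the rows used here; treat that formula as an assumption, not as the original function.

```python
import pandas as pd

def some_func(x, y):
    # Hypothetical stand-in: returns two values per row as a tuple.
    return y / (2 * x), 3 * y ** 2 / x

# First three rows of x9 and x10 from the output above.
df = pd.DataFrame({"x9": [90, 36, 48], "x10": [93, 41, 46]})

# The list of 2-tuples is assigned to two new columns in one step.
df[["c1", "c2"]] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

print(df.round(6))
```

Assigning a list of tuples to a list of new column names like this relies on a reasonably recent pandas version.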

Pandas list of tuples from two columns containing list

A list comprehension with zipping:

df["dev_mod"] = [list(zip(dev_name, dev_model))
                 for dev_name, dev_model in zip(df.device_names, df.device_models)]

to get

   user_id    device_names   device_models                           dev_mod
0        1  [dev_1, dev_2]  [mod_1, mod_2]  [(dev_1, mod_1), (dev_2, mod_2)]
1        2  [dev_1, dev_5]  [mod_1, mod_5]  [(dev_1, mod_1), (dev_5, mod_5)]

The second (outer) zip pairs the two columns row by row; the first (inner) zip then pairs each device name with the model at the same position within that row, which gives the desired result.
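
To make the two levels of zipping easier to follow, here is the same logic unrolled into an explicit loop. This is only a sketch; the frame is rebuilt from the values shown in the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2],
    "device_names": [["dev_1", "dev_2"], ["dev_1", "dev_5"]],
    "device_models": [["mod_1", "mod_2"], ["mod_1", "mod_5"]],
})

dev_mod = []
# Outer zip: walk the two columns row by row.
for names, models in zip(df.device_names, df.device_models):
    # Inner zip: pair each name with the model at the same position.
    dev_mod.append(list(zip(names, models)))

df["dev_mod"] = dev_mod
print(df["dev_mod"].tolist())
```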

Pandas: Create a tuple column from multiple columns

You can use:

my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)

Or:

my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))

Or:

my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]

All return:

print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
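
A quick way to check that the approaches agree is to build the tuples with each one and compare. The sketch below uses a three-row subset of the data and assumes the time column holds strings (the dtype is not stated in the answer); it wraps zip in list() rather than tuple().

```python
import pandas as pd

my_df = pd.DataFrame({
    "Person": ["John", "John", "Ann"],
    "event": ["A", "B", "X"],
    "time": ["2017-10-11", "2017-10-12", "2017-09-01"],
})

# Row-wise apply: tuple() is called on each row of the two-column frame.
via_apply = my_df[["event", "time"]].apply(tuple, axis=1).tolist()
# zip pairs the two columns element-wise.
via_zip = list(zip(my_df["event"], my_df["time"]))
# Convert the underlying array to nested lists, then to tuples.
via_values = [tuple(x) for x in my_df[["event", "time"]].values.tolist()]

print(via_apply == via_zip == via_values)
```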

How to form tuple column from two columns in Pandas of only non empty values

Thanks for the helpful comments. Here's something that worked for me:

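import copy

def remove_empty(x):
    # Iterate over a shallow copy so the set can be modified safely.
    for c in copy.copy(x):
        if not c:  # falsy values such as None, '' or 0 count as empty
            x.discard(c)
    return x

# In Python 3, zip returns an iterator, so wrap it in list() before assigning.
df['new_col'] = list(zip(df.lat, df.long))
df['new_col'] = df['new_col'].apply(set)
df['new_col'] = df['new_col'].apply(remove_empty)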
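
Note that converting each pair to a set discards the order of the values and collapses equal ones. If that matters, one possible variation (a sketch, not part of the original answer) filters the zipped pairs into tuples instead. The sample data and the helper name non_empty are hypothetical, and "empty" is assumed here to mean None, NaN, or an empty string.

```python
import pandas as pd

def non_empty(*values):
    # Keep values that are neither missing (None/NaN) nor an empty string.
    return tuple(v for v in values if not pd.isna(v) and v != "")

# Hypothetical sample data with one empty string and one missing value.
df = pd.DataFrame({"lat": ["40.7", "", "41.9"],
                   "long": ["-74.0", "-118.2", None]})

df["new_col"] = [non_empty(a, b) for a, b in zip(df.lat, df.long)]
print(df["new_col"].tolist())
```

The resulting tuples can have different lengths from row to row, which is fine for an object column.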

How can I split a column of tuples in a Pandas dataframe?

You can do this by calling pd.DataFrame(col.tolist()) on that column:

In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})

In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)

In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]

In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4

In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4

Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (it makes a Series of each tuple, which is then treated as a row of a dataframe), but it is slower and uses more memory than the tolist version, as noted in the other answers here (thanks to denfromufa).

python pandas data frame: assign function return tuple to two columns of a data frame

zip/map

data['c'], data['d'] = zip(*map(givetup, data['b']))

data

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
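
The function givetup is not defined in this answer. Judging from the output, it returns the first three characters of the string together with their uppercase form; the definition below is an assumption made so the snippet can run on its own, using a few of the rows shown.

```python
import pandas as pd

def givetup(s):
    # Assumed behavior, inferred from the output shown above.
    prefix = s[:3]
    return prefix, prefix.upper()

data = pd.DataFrame({"a": [1, 2, 3], "b": ["ssdfsdf", "bbbbbb", "ddd"]})

# map applies givetup to each value; zip(*...) regroups the resulting
# 2-tuples into one sequence per output column.
data["c"], data["d"] = zip(*map(givetup, data["b"]))
print(data)
```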


Series.str and assign

This approach is specific to what givetup does in this example. But if the function's logic can be broken down into vectorized column operations, it is likely worth doing so.

The assign method's arguments can take callables that reference columns created by an argument just prior (neat).

data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF


Timings

data = pd.concat([data] * 10_000, ignore_index=True)

%timeit data['c'], data['d'] = zip(*map(givetup, data['b']))
%timeit data[['c','d']] = [givetup(a) for a in data['b']]
%timeit data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

69.7 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
137 ms ± 937 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.6 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Split Pandas Series Tuple on the fly to Pandas Column

Use the .str accessor to assign each tuple element to its own column:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a':[1, 2, 3, 4, 5, 6]})

s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])

df['b'], df['c'] = s.str[0], s.str[1]

Or create a two-column DataFrame:

s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])

df[['b', 'c']] = pd.DataFrame(s.tolist(), index=df.index)
print(df)
   a    b   c
0  1  NaN   1
1  2   AB  10
2  3   CD   1
3  4    3   1
4  5    4   1
5  6   NA   1

The same, written as one-liners:

df['b'], df['c'] = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[0], pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[1]
df[['b', 'c']] = pd.DataFrame(pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).tolist(), index=df.index)

