How to form tuple column from two columns in Pandas
Get comfortable with zip. It comes in handy when dealing with column data.
df['new_col'] = list(zip(df.lat, df.long))
It's less complicated and faster than using apply or map. Something like np.dstack is twice as fast as zip, but wouldn't give you tuples.
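As a minimal sketch of the zip approach, assuming a small frame with made-up lat and long values (only the column names come from the answer above), the difference between the zip result and the np.dstack result looks roughly like this:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({"lat": [40.7, 34.1, 51.5], "long": [-74.0, -118.2, -0.1]})

# zip pairs the two columns row by row; list() materialises the pairs
# so pandas stores one tuple per row in an object column.
df["new_col"] = list(zip(df.lat, df.long))

# np.dstack also pairs the values, but it returns a 3-D array, not tuples.
stacked = np.dstack((df.lat.to_numpy(), df.long.to_numpy()))

print(df)
print(stacked.shape)
```

The stacked array would still have to be converted back into tuples before it could serve as a tuple column, which is why the answer treats it only as a speed comparison.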
Pandas output 2 column in data frame using apply function which returns a tuple / list of 2 items
Since you need to loop over multiple columns row by row, a more efficient approach is to use zip in a list comprehension to build a list of tuples, which you can then assign directly to a list of columns in the original data frame:
df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]
df
   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10        c1          c2
0  20  67  76  95  28  60  82  81  90   93  0.516667  288.300000
1  94  30  97  82  51  10  54  43  36   41  0.569444  140.083333
2  50  57  85  48  67  65  41  91  48   46  0.479167  132.250000
3  61  36  44  59  18  71  42  18  56   77  0.687500  317.625000
4  11  85  34  66  45  55  21  42  77   27  0.175325   28.402597
5  20  19  86  46  97  21  84  12  86   98  0.569767  335.023256
6  24  87  65  62  22  43  26  80  15   64  2.133333  819.200000
7  38  15  23  22  89  89  19  32  21   33  0.785714  155.571429
8  82  88  64  89  92  88  15  30  85   83  0.488235  243.141176
9  96  24  91  70  96  54  57  81  59   32  0.271186   52.067797
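The answer does not show some_func, so the following is only a sketch with a hypothetical stand-in that returns two values per row. It is not meant to reproduce the numbers in the table above, just the assignment pattern:

```python
import pandas as pd

def some_func(x, y):
    # Hypothetical stand-in for the asker's function: two results per row.
    return x + y, x * y

# Made-up values; only the column names x9 and x10 come from the answer.
df = pd.DataFrame({"x9": [90, 36, 48], "x10": [93, 41, 46]})

# Build one (c1, c2) tuple per row, then assign the whole list
# to two new columns in a single statement.
df[["c1", "c2"]] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

print(df)
```

The function is called once per row, and there is no second pass to split a tuple column afterwards.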
Pandas list of tuples from two columns containing list
A list comprehension with zipping:
df["dev_mod"] = [list(zip(dev_name, dev_model))
                 for dev_name, dev_model in zip(df.device_names, df.device_models)]
to get
   user_id    device_names   device_models                           dev_mod
0        1  [dev_1, dev_2]  [mod_1, mod_2]  [(dev_1, mod_1), (dev_2, mod_2)]
1        2  [dev_1, dev_5]  [mod_1, mod_5]  [(dev_1, mod_1), (dev_5, mod_5)]
The second zip pairs the two columns row by row; the first zip then pairs the elements of the two lists within each row (a kind of transpose), which gives the desired result.
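To make the two levels of zipping easier to follow, here is a self-contained sketch that rebuilds the frame shown in the output above and applies the same comprehension with the roles of each zip spelled out in comments:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2],
    "device_names": [["dev_1", "dev_2"], ["dev_1", "dev_5"]],
    "device_models": [["mod_1", "mod_2"], ["mod_1", "mod_5"]],
})

# Outer zip: walk the two columns row by row.
# Inner zip: pair the i-th name with the i-th model inside each row.
df["dev_mod"] = [
    list(zip(names, models))
    for names, models in zip(df.device_names, df.device_models)
]

print(df["dev_mod"].tolist())
```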
Pandas: Create a tuple column from multiple columns
You can use:
my_df['event_time'] = my_df[['event','time']].apply(tuple, axis=1)
Or:
my_df['event_time'] = tuple(zip(my_df['event'], my_df['time']))
Or:
my_df['event_time'] = [tuple(x) for x in my_df[['event','time']].values.tolist()]
All return:
print (my_df)
  Person event        time       event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
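A quick way to convince yourself that the three variants agree is to build each result separately and compare them. This sketch uses a shortened copy of the frame above and assumes the time column holds plain strings, which the original output does not confirm:

```python
import pandas as pd

# Shortened sample; storing time as strings is an assumption.
my_df = pd.DataFrame({
    "Person": ["John", "John", "Ann"],
    "event": ["A", "B", "X"],
    "time": ["2017-10-11", "2017-10-12", "2017-09-01"],
})

# 1. Row-wise apply: simple, but typically the slowest of the three.
via_apply = my_df[["event", "time"]].apply(tuple, axis=1)

# 2. zip over the two columns.
via_zip = pd.Series(list(zip(my_df["event"], my_df["time"])), index=my_df.index)

# 3. Convert the sub-frame to nested lists, then to tuples.
via_values = pd.Series(
    [tuple(x) for x in my_df[["event", "time"]].values.tolist()],
    index=my_df.index,
)

print(via_apply.tolist() == via_zip.tolist() == via_values.tolist())
```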
How to form tuple column from two columns in Pandas of only non empty values
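Thanks for the helpful comments. Here's something that worked for me:
import copy

def remove_empty(x):
    # Iterate over a copy so the set can be modified while looping.
    for c in copy.copy(x):
        # Note: this drops every falsy value (None, '', 0), but not NaN.
        if not c:
            x.discard(c)
    return x

# list() is required on Python 3, where zip returns a lazy iterator.
df['new_col'] = list(zip(df.lat, df.long))
df['new_col'] = df['new_col'].apply(set)
df['new_col'] = df['new_col'].apply(remove_empty)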
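Converting each pair to a set loses the order of the values and collapses duplicates. If that matters, one possible alternative is to filter while building the tuples. What counts as "empty" is an assumption here (None, NaN, or an empty string), and is_empty is a hypothetical helper, not part of the original answer:

```python
import pandas as pd

def is_empty(value):
    # Hypothetical definition of "empty": None, NaN, or an empty string.
    return pd.isna(value) or value == ""

# Made-up sample data with missing values in both columns.
df = pd.DataFrame({"lat": [40.7, None, 51.5], "long": [-74.0, -118.2, None]})

# Keep only the non-empty values of each (lat, long) pair, preserving order.
df["new_col"] = [
    tuple(v for v in pair if not is_empty(v))
    for pair in zip(df.lat, df.long)
]

print(df["new_col"].tolist())
```

Rows with a missing value end up with a one-element tuple, so the column no longer has a fixed tuple length.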
How can I split a column of tuples in a Pandas dataframe?
You can do this by calling pd.DataFrame(col.tolist()) on that column:
In [2]: df = pd.DataFrame({'a':[1,2], 'b':[(1,2), (3,4)]})
In [3]: df
Out[3]:
   a       b
0  1  (1, 2)
1  2  (3, 4)
In [4]: df['b'].tolist()
Out[4]: [(1, 2), (3, 4)]
In [5]: pd.DataFrame(df['b'].tolist(), index=df.index)
Out[5]:
   0  1
0  1  2
1  3  4
In [6]: df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)
In [7]: df
Out[7]:
   a       b  b1  b2
0  1  (1, 2)   1   2
1  2  (3, 4)   3   4
Note: an earlier version of this answer recommended df['b'].apply(pd.Series) instead of pd.DataFrame(df['b'].tolist(), index=df.index). That works as well (because it makes a Series of each tuple, which is then seen as a row of a dataframe), but it is slower and uses more memory than the tolist version, as noted by the other answers here (thanks to denfromufa).
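The index=df.index argument is easy to overlook because the example frame uses the default 0..n-1 index. The sketch below repeats the same split on a frame with a hypothetical string index, where the alignment actually matters:

```python
import pandas as pd

# Same data as above, but with a hypothetical non-default index.
df = pd.DataFrame({"a": [1, 2], "b": [(1, 2), (3, 4)]}, index=["x", "y"])

# Passing index=df.index keeps the new rows aligned with the original ones.
df[["b1", "b2"]] = pd.DataFrame(df["b"].tolist(), index=df.index)

print(df)
```

Without index=df.index, the intermediate frame would get a fresh 0..n-1 index, which would generally not line up with the labels of df.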
python pandas data frame: assign function return tuple to two columns of a data frame
zip/map
data['c'], data['d'] = zip(*map(givetup, data['b']))
data
   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
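The definition of givetup is not included above. Judging from the output, it seems to return the first three characters and their uppercase form, so the sketch below uses a hypothetical reconstruction along those lines to show how zip(*map(...)) splits the pairs into two columns:

```python
import pandas as pd

def givetup(s):
    # Hypothetical reconstruction inferred from the output above:
    # the first three characters and their uppercase form.
    head = s[:3]
    return head, head.upper()

data = pd.DataFrame({"a": [1, 2, 3], "b": ["ssdfsdf", "bbbbbb", "ddd"]})

# map applies givetup to each value of b; zip(*...) then regroups the
# (c, d) pairs into one tuple of all c values and one of all d values.
data["c"], data["d"] = zip(*map(givetup, data["b"]))

print(data)
```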
Series.str and assign
This is specific to the example given in givetup. But if it is possible to disentangle the function into vectorized string operations, then it is likely worth it.
The arguments of the assign method can be callables that reference columns created by an argument just prior (neat).
data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())
   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
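As a small self-contained sketch of the chained assign call (with a shortened, made-up copy of the data), note that assign returns a new frame and leaves the original untouched:

```python
import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3], "b": ["ssdfsdf", "bbbbbb", "ddd"]})

# assign evaluates its keyword arguments in order, so the callable for d
# can read the column c that was created just before it.
result = data.assign(
    c=lambda d: d.b.str[0:3],
    d=lambda d: d.c.str.upper(),
)

print(result)
print(list(data.columns))  # the original frame still has only a and b
```

This ordering relies on keyword arguments keeping their order, which holds on Python 3.6+ with reasonably recent pandas versions.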
Timings
data = pd.concat([data] * 10_000, ignore_index=True)
%timeit data['c'], data['d'] = zip(*map(givetup, data['b']))
%timeit data[['c','d']] = [givetup(a) for a in data['b']]
%timeit data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())
69.7 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
137 ms ± 937 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.6 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Split Pandas Series Tuple on the fly to Pandas Column
Index each tuple with the str accessor and assign the two results to 2 columns:
df = pd.DataFrame(data={'a':[1, 2, 3, 4, 5, 6]})
s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])
df['b'], df['c'] = s.str[0], s.str[1]
Or create a 2-column DataFrame:
s = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)])
df[['b', 'c']] = pd.DataFrame(s.tolist(), index=df.index)
print(df)
   a    b   c
0  1  NaN   1
1  2   AB  10
2  3   CD   1
3  4    3   1
4  5    4   1
5  6   NA   1
The same can be written as one-liners:
df['b'], df['c'] = pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[0], pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).str[1]
df[['b', 'c']] = pd.DataFrame(pd.Series([(np.nan, 1), ('AB', 10), ('CD', 1), (3, 1), (4, 1), ('NA', 1)]).tolist(), index=df.index)