Pandas convert dataframe to array of tuples
list(data_set.itertuples(index=False))
As of pandas 0.17.1, the above returns a list of namedtuples.
If you want a list of ordinary tuples, pass name=None as an argument:
list(data_set.itertuples(index=False, name=None))
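To make the difference between the two calls concrete, here is a minimal sketch. The sample frame and its column names (x, y) are made up for illustration; only the name data_set comes from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the columns "x" and "y" are assumptions.
data_set = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

# Default: each row is a namedtuple whose fields are the column names.
named = list(data_set.itertuples(index=False))

# name=None: each row is an ordinary tuple.
plain = list(data_set.itertuples(index=False, name=None))

print(named[0].x, named[0].y)  # fields are accessible by column name
print(plain)
```

The namedtuple form is handy when you want to access fields by name; the plain form is usually what you want when passing rows to code that expects ordinary tuples.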
How to convert dataframe into list of tuples with respect to category?
Use GroupBy.agg (or GroupBy.apply) with tuple, like:
print (df.groupby('id_customer', sort=False)['product_id'].agg(tuple).tolist())
print (df.groupby('id_customer', sort=False)['product_id'].apply(tuple).tolist())
print (list(df.groupby('id_customer', sort=False)['product_id'].agg(tuple)))
print (list(df.groupby('id_customer', sort=False)['product_id'].apply(tuple)))
All four variants print the same result:
[('5', '7', '8'), ('5', '30')]
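The answer does not show the input frame, so here is a self-contained sketch with a hypothetical df that is consistent with the output above. The customer ids 1 and 2 are assumptions; only the column names come from the original code.

```python
import pandas as pd

# Hypothetical input; the id_customer values are assumed for illustration.
df = pd.DataFrame({
    "id_customer": [1, 1, 1, 2, 2],
    "product_id": ["5", "7", "8", "5", "30"],
})

# sort=False keeps the groups in order of first appearance.
result = df.groupby("id_customer", sort=False)["product_id"].agg(tuple).tolist()
print(result)
```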
Pandas convert dataframe to array of tuples without None
Use a nested list comprehension with filtering:
data = [tuple([y for y in x if y is not None]) for x in data.values]
print (data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
A slower alternative for large data: reshape with stack() to remove the None values, then aggregate by the first level of the MultiIndex into tuples:
data = data.stack().groupby(level=0).apply(tuple).tolist()
print (data)
[('a', 'b', 'c', 'd'), ('a', 'b', 'c')]
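The input frame is not shown in the answer, so the following sketch uses a hypothetical two-row frame that matches the printed result. It is a variant of the list comprehension above, not the exact original: it filters with pd.notna instead of "is not None", on the assumption that you also want to skip NaN in case pandas has stored the missing value that way.

```python
import pandas as pd

# Hypothetical input with one missing value in the last column.
data = pd.DataFrame([["a", "b", "c", "d"],
                     ["a", "b", "c", None]])

# pd.notna is False for both None and NaN, so either representation is dropped.
rows = [tuple(y for y in x if pd.notna(y)) for x in data.to_numpy()]
print(rows)
```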
List of tuples for each pandas dataframe slice
I think this should work too:
import pandas as pd
import itertools
df = pd.DataFrame({"A": [1, 2, 3, 1], "B": [2, 2, 2, 2], "C": ["A", "B", "C", "B"]})
tuples_in_df = sorted(tuple(df.to_records(index=False)), key=lambda x: x[0])
output = [[tuple(x)[1:] for x in group] for _, group in itertools.groupby(tuples_in_df, lambda x: x[0])]
print(output)
Out:
[[(2, 'A'), (2, 'B')], [(2, 'B')], [(2, 'C')]]
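As a possible alternative to the itertools approach, the same grouping can be sketched with pandas alone. This is not the original answer's method; it assumes the same df as above and relies on groupby sorting the keys of column A by default.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 1], "B": [2, 2, 2, 2], "C": ["A", "B", "C", "B"]})

# One list per value of A, each holding the (B, C) rows of that slice as plain tuples.
output = [
    list(group[["B", "C"]].itertuples(index=False, name=None))
    for _, group in df.groupby("A")
]
print(output)
```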
Converting pandas dataframe into list of tuples with index
You can iterate over the result of to_records(index=True).
Say you start with this:
In [6]: df = pd.DataFrame({'a': range(3, 7), 'b': range(1, 5), 'c': range(2, 6)}).set_index('a')
In [7]: df
Out[7]:
   b  c
a
3  1  2
4  2  3
5  3  4
6  4  5
then this works, except that it does not include the index (a):
In [8]: [tuple(x) for x in df.to_records(index=False)]
Out[8]: [(1, 2), (2, 3), (3, 4), (4, 5)]
However, if you pass index=True, then it does what you want:
In [9]: [tuple(x) for x in df.to_records(index=True)]
Out[9]: [(3, 1, 2), (4, 2, 3), (5, 3, 4), (6, 4, 5)]
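If you would rather avoid the record array, itertuples can also include the index. This is a sketch of an alternative, not part of the original answer; it rebuilds the same frame as above.

```python
import pandas as pd

df = pd.DataFrame({"a": range(3, 7), "b": range(1, 5), "c": range(2, 6)}).set_index("a")

# index=True (the default) puts the index value first in each tuple;
# name=None yields plain tuples instead of namedtuples.
rows = list(df.itertuples(index=True, name=None))
print(rows)
```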
Converting dictionary with tuple as key and values as array to a pandas dataframe
Use:
import numpy as np
import pandas as pd
d = {(0, 6): np.array([[1, 2, 3, 0, 1, 1]]),
     (0, 9): np.array([[1, 2, 3, 0, 1, 1]])}
df = pd.DataFrame.from_dict({k: v[0] for k, v in d.items()}, orient='index')
df = df.rename_axis('Key').rename(columns=lambda x: f'v{x+1}').reset_index()
print (df)
      Key  v1  v2  v3  v4  v5  v6
0  (0, 6)   1   2   3   0   1   1
1  (0, 9)   1   2   3   0   1   1
Or:
df = pd.DataFrame(np.vstack(list(d.values()))).rename(columns=lambda x: f'v{x+1}')
df.insert(0,'Key',list(d.keys()))
print (df)
      Key  v1  v2  v3  v4  v5  v6
0  (0, 6)   1   2   3   0   1   1
1  (0, 9)   1   2   3   0   1   1