Get list from pandas dataframe column or row?
Pandas DataFrame columns are Pandas Series when you pull them out, which you can then call x.tolist()
on to turn them into a Python list. Alternatively you cast it with list(x)
.
import pandas as pd
data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)
print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")
col_one_list = df['one'].tolist()
col_one_arr = df['one'].to_numpy()
print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
Output:DataFrame:
one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
column types:
one float64
two int64
dtype: object
col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>
col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
Convert List to Pandas Dataframe Column
Use:
L = ['Thanks You', 'Its fine no problem', 'Are you sure']
#create new df
df = pd.DataFrame({'col':L})
print (df)
col
0 Thanks You
1 Its fine no problem
2 Are you sure
df = pd.DataFrame({'oldcol':[1,2,3]})
#add column to existing df
df['col'] = L
print (df)
oldcol col
0 1 Thanks You
1 2 Its fine no problem
2 3 Are you sure
Thank you DYZ:#default column name 0
df = pd.DataFrame(L)
print (df)
0
0 Thanks You
1 Its fine no problem
2 Are you sure
Read lists into columns of pandas DataFrame
Someone just recommended creating a dictionary from the data then loading that into the DataFrame like this:
In [8]: data = pd.DataFrame({'x': x, 'sin(x)': y})
In [9]: data
Out[9]:
x sin(x)
0 0.000000 0.000000e+00
1 0.349066 3.420201e-01
2 0.698132 6.427876e-01
3 1.047198 8.660254e-01
4 1.396263 9.848078e-01
5 1.745329 9.848078e-01
6 2.094395 8.660254e-01
7 2.443461 6.427876e-01
8 2.792527 3.420201e-01
9 3.141593 1.224647e-16
[10 rows x 2 columns]
Note than a dictionary is an unordered set of key-value pairs. If you care about the column orders, you should pass a list of the ordered key values to be used (you can also use this list to only include some of the dict entries):data = pd.DataFrame({'x': x, 'sin(x)': y}, columns=['x', 'sin(x)'])
Pandas column of lists, create a row for each list element
UPDATE: the solution below was helpful for older Pandas versions, because the DataFrame.explode() wasn’t available. Starting from Pandas 0.25.0 you can simply use DataFrame.explode()
.
lst_col = 'samples'
r = pd.DataFrame({
col:np.repeat(df[col].values, df[lst_col].str.len())
for col in df.columns.drop(lst_col)}
).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]
Result:In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3
PS here you may find a bit more generic solutionUPDATE: some explanations: IMO the easiest way to understand this code is to try to execute it step-by-step:
in the following line we are repeating values in one column N
times where N
- is the length of the corresponding list:
In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)
this can be generalized for all columns, containing scalar values:In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2
[18 rows x 2 columns]
using np.concatenate()
we can flatten all values in the list
column (samples
) and get a 1D vector:In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])
putting all this together:In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31
[18 rows x 3 columns]
using pd.DataFrame()[df.columns]
will guarantee that we are selecting columns in the original order... How to transform dataframe column containing list of values in to its own individual column with count of occurrence?
edited based on feedback from Henry Ecker in comments, might as well have the better answer here:
You can use pd.explode()
to get everything within the lists to become separate rows, and then use pd.crosstab()
to count the occurrences.
df = presence_data.explode('presence')
pd.crosstab(index=df['id'],columns=df['presence'])
This gave me the following:presence A B C G I
id
id1 2 1 1 0 0
id2 1 2 0 1 1
Get a list from Pandas DataFrame column headers
You can get the values as a list by doing:
list(my_dataframe.columns.values)
Also you can simply use (as shown in Ed Chum's answer):list(my_dataframe)
Pandas convert column where every cell is list of strings to list of integers
You can try
df['l'] = df['l'].apply(lambda lst: list(map(int, lst)))
print(df)
C1 C2 l
0 1 7 [5, 9, 1]
1 3 1 [7, 1, 6]
Pandas: Convert columns of lists into a single list
You can use a nested list comprehension:
dataSet['combined'] = [[e for l in x for e in l]
for _,x in dataSet.filter(like='value').iterrows()]
Output: key valueA valueB valueN combined
0 1_1 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
1 1_2 [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
2 1_3 [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18]
Timing comparison with repeated addition (100 rows, 100 columns, 1000 items per list):# repeated addition of the lists
8.66 s ± 309 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# nested list comprehension
729 ms ± 285 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Related Topics
Python Pip on Windows - Command 'Cl.Exe' Failed
Numpy Array Initialization (Fill with Identical Values)
Should I Be Adding the Django Migration Files in the .Gitignore File
Cannot Redirect Output When I Run Python Script on Windows Using Just Script's Name
Does Python Evaluate If's Conditions Lazily
Filename and Line Number of Python Script
Networkx - Change Color/Width According to Edge Attributes - Inconsistent Result
Complete Scan of Dynamodb with Boto3
Splitting a String by List of Indices
Python [Errno 98] Address Already in Use
Best Way to Make Django's Login_Required the Default
Runtimeerror: This Event Loop Is Already Running in Python
Difference Between Variable and Get_Variable in Tensorflow
Is There a Numpy Builtin to Reject Outliers from a List
How to Upsert Pandas Dataframe to Microsoft SQL Server Table