Get list from pandas dataframe column or row?
Pandas DataFrame columns are Pandas Series when you pull them out, so you can call x.tolist() on one to turn it into a Python list. Alternatively, you can cast it with list(x).
import pandas as pd
data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data_dict)
print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")
col_one_list = df['one'].tolist()
col_one_arr = df['one'].to_numpy()
print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")
Output:
DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

column types:
one    float64
two      int64
dtype: object

col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>

col_one_arr:
[ 1.  2.  3. nan]
type:<class 'numpy.ndarray'>
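The question also asks about rows, which the example above does not cover. As a minimal sketch, assuming the same df as above: selecting a single row with .loc or .iloc also returns a Series, so tolist() should work there as well. The names row_a_list, first_row_list and all_rows are only illustrative.

```python
import pandas as pd

# Same DataFrame as in the answer above
df = pd.DataFrame({
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
})

# A single row selected by label or by position is a Series
row_a_list = df.loc["a"].tolist()
first_row_list = df.iloc[0].tolist()

# Every row at once, as a list of lists
all_rows = df.to_numpy().tolist()

print(row_a_list)
print(all_rows)
```

Because a row spans columns of different dtypes, the values are upcast to a common type, so the integer in column 'two' comes back as a float here.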
Pandas column tolist() while each row data being list of strings?
Your problem lies with the saving method. CSVs are not natively able to store lists unless you specifically parse them after reading.
Would it be possible for you to save time and effort by saving in another format instead? JSON natively supports lists and is also a format that humans can read easily.
Here is an obligatory snippet for you:
import pandas as pd
df = pd.DataFrame([{"sentence":['aa', 'bb', 'cc']},{"sentence":['dd', 'ee', 'ff']}])
df.to_json("myfile.json")
df2 = pd.read_json("myfile.json")
Giving the following result:
>>> df2
sentence
0 [aa, bb, cc]
1 [dd, ee, ff]
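If CSV has to stay, the "parse them after reading" route mentioned above could look roughly like the sketch below. It assumes the lists were written with their default Python repr, so ast.literal_eval can turn each cell back into a list. The in-memory buffer stands in for a real file and is only there to keep the example self-contained.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({"sentence": [["aa", "bb", "cc"], ["dd", "ee", "ff"]]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# A plain read gives back strings such as "['aa', 'bb', 'cc']"
buffer.seek(0)
raw = pd.read_csv(buffer)

# A converter parses each cell back into a Python list
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"sentence": ast.literal_eval})

print(type(raw.loc[0, "sentence"]))
print(type(parsed.loc[0, "sentence"]))
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for this kind of parsing.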
Pandas column of lists, create a row for each list element
UPDATE: the solution below was helpful for older Pandas versions, because DataFrame.explode() wasn't available. Starting from Pandas 0.25.0 you can simply use DataFrame.explode().
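import numpy as np
import pandas as pd

# df is the question's DataFrame; lst_col names the column that holds the lists
lst_col = 'samples'
r = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]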
Result:
In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3
PS here you may find a bit more generic solution
UPDATE: some explanations: IMO the easiest way to understand this code is to execute it step by step:
In the following line we repeat the values in one column N times, where N is the length of the corresponding list:
In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)
This can be generalized to all columns that contain scalar values:
In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2
[18 rows x 2 columns]
Using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:
In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])
putting all this together:
In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31
[18 rows x 3 columns]
Using pd.DataFrame()[df.columns] guarantees that we select the columns in their original order.
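For Pandas 0.25.0 and later, the DataFrame.explode() route mentioned in the first update can be compared with the np.repeat approach as in the sketch below. The small df is made-up sample data with the same column names as the question, not the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's DataFrame
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.10, -0.20, 0.05], [0.25, 1.32, -0.17], [-0.03, -0.65, 0.76]],
})
lst_col = "samples"

# Older approach: repeat the scalar columns, flatten the list column
manual = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)
}).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

# explode() repeats the original index labels and leaves the exploded
# column with object dtype, so reset the index and cast back to float
exploded = df.explode(lst_col).reset_index(drop=True)
exploded[lst_col] = exploded[lst_col].astype(float)

print(exploded)
```

Both frames should hold the same rows. The reset_index and astype steps are only needed if you want a fresh 0..n-1 index and a numeric dtype.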
Python pandas dataframe to list by column instead of row
Just use the transpose attribute (T):
lst=df.T.values.tolist()
Or use the transpose() method:
lst=df.transpose().values.tolist()
If you print lst you will get:
[['Apple', 'Orange', 'Kiwi', 'Mango'], [220, 200, 1000, 800], ['a', 'o', 'k', 'm']]
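For a self-contained version, here is a sketch with a DataFrame rebuilt from the printed output. The column names fruit, price and code are assumptions, since the original column names are not shown. It also includes a per-column alternative that avoids the transpose.

```python
import pandas as pd

# Column names are hypothetical; the values come from the output above
df = pd.DataFrame({
    "fruit": ["Apple", "Orange", "Kiwi", "Mango"],
    "price": [220, 200, 1000, 800],
    "code": ["a", "o", "k", "m"],
})

# One inner list per column, via the transpose
lst = df.T.values.tolist()

# Alternative: convert each column on its own
lst_per_column = [df[col].tolist() for col in df.columns]

print(lst)
```

The per-column version converts each Series separately, so every column keeps its own dtype instead of passing through one mixed-type array.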
Get a list from Pandas DataFrame column headers
You can get the values as a list by doing:
list(my_dataframe.columns.values)
Also you can simply use (as shown in Ed Chum's answer):
list(my_dataframe)
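A quick sketch of both forms on a small made-up DataFrame (the column names here are placeholders), together with columns.tolist(), which is another commonly used spelling:

```python
import pandas as pd

# Hypothetical DataFrame, only used to show the header extraction
my_dataframe = pd.DataFrame({"Col1": [1], "Col2": [2], "Col3": [3]})

headers_from_values = list(my_dataframe.columns.values)
headers_from_frame = list(my_dataframe)  # iterating a DataFrame yields its column labels
headers_from_tolist = my_dataframe.columns.tolist()

print(headers_from_tolist)
```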
Lookup a Dataframe column with list and return list matching the row of another column?
You could make Col1 the index and use .loc:
q = df.set_index('Col1').loc[query]
Output:
>>> q
      Col2  Col3
Col1
A        5     9
D        8    12
D        8    12
B        6    10
B        6    10
>>> q['Col2'].tolist()
[5, 8, 8, 6, 6]
>>> q['Col3'].tolist()
[9, 12, 12, 10, 10]
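To try this end to end, here is a sketch with data reconstructed from the output above. The row for 'C' and the exact contents of query are assumptions, since the question's inputs are not shown.

```python
import pandas as pd

# Reconstructed input; the 'C' row is a guess, the other rows match the output above
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "D"],
    "Col2": [5, 6, 7, 8],
    "Col3": [9, 10, 11, 12],
})
query = ["A", "D", "D", "B", "B"]  # assumed lookup list, repeats allowed

# .loc with a list returns rows in the order of the list, duplicates included
q = df.set_index("Col1").loc[query]

col2_matches = q["Col2"].tolist()
col3_matches = q["Col3"].tolist()
print(col2_matches, col3_matches)
```

In recent pandas versions, .loc raises a KeyError if any label in query is missing from the index, so unknown labels may need filtering first.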
Dataframe with a column which every row is a list of values
Here's an alternative approach:
result = pd.concat(
    [df[["name", "favourite_fruits"]],
     pd.DataFrame(lst for lst in df["votes"]).rename(columns=lambda n: f"vote{n + 1}")],
    axis=1
)
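A self-contained sketch of the same idea, with made-up sample rows (the names and values are hypothetical). It builds the vote columns from df["votes"].tolist() and passes index=df.index so that the concat lines up even when df does not have a default 0..n-1 index.

```python
import pandas as pd

# Hypothetical sample data using the column names from the snippet above
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "favourite_fruits": ["apple", "kiwi"],
    "votes": [[1, 2, 3], [4, 5, 6]],
})

# One column per list position, renamed to vote1, vote2, ...
votes = pd.DataFrame(df["votes"].tolist(), index=df.index).rename(
    columns=lambda n: f"vote{n + 1}"
)

result = pd.concat([df[["name", "favourite_fruits"]], votes], axis=1)
print(result)
```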
Find rows in dataframe that must contain at least 2 elements from a list
You can use set operations:
S = set(list1)
out = df[[len(set(l.split())&S)>=2 for l in df['A']]]
# or
# out = df[[len(S.intersection(l.split()))>=2 for l in df['A']]]
Output:
A B
2 brit dave red terri
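The filter can be tried on a small example like the sketch below. Both list1 and the rows of df are hypothetical, since the question's data is not shown here.

```python
import pandas as pd

# Hypothetical inputs for illustration
list1 = ["dave", "red", "terri"]
df = pd.DataFrame({
    "A": ["anna blue", "dave green", "brit dave red"],
    "B": ["x", "y", "z"],
})

S = set(list1)

# Keep rows where at least two whitespace-separated words of A are in S
mask = [len(S.intersection(text.split())) >= 2 for text in df["A"]]
out = df[mask]
print(out)
```

Only the last row is kept here, because it contains both "dave" and "red", while "dave green" matches a single element.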