How do I select rows from a DataFrame based on column values?
To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
Combine multiple conditions with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses,
df['column_name'] >= A & df['column_name'] <= B
is parsed as
df['column_name'] >= (A & df['column_name']) <= B
which results in a "truth value of a Series is ambiguous" error.
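A self-contained check of this precedence behavior (the column name and the bounds A and B here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': [1, 5, 10]})
A, B = 2, 8

# Parenthesized: each comparison yields a boolean Series, then & combines them.
mask = (df['column_name'] >= A) & (df['column_name'] <= B)
print(df.loc[mask])  # keeps only the row with value 5

# Unparenthesized: A & df['column_name'] is evaluated first, and the chained
# comparison then calls bool() on a Series, which raises ValueError.
try:
    df.loc[df['column_name'] >= A & df['column_name'] <= B]
except ValueError as e:
    print(e)  # The truth value of a Series is ambiguous...
```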
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:
df.loc[~df['column_name'].isin(some_values)]
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
'B': 'one one two three two two one three'.split(),
'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
# A B C D
# 0 foo one 0 0
# 1 bar one 1 2
# 2 foo two 2 4
# 3 bar three 3 6
# 4 foo two 4 8
# 5 bar two 5 10
# 6 foo one 6 12
# 7 foo three 7 14
print(df.loc[df['A'] == 'foo'])
yields
A B C D
0 foo one 0 0
2 foo two 2 4
4 foo two 4 8
6 foo one 6 12
7 foo three 7 14
If you have multiple values you want to include, put them in a list (or, more generally, any iterable) and use isin:
print(df.loc[df['B'].isin(['one','three'])])
yields
A B C D
0 foo one 0 0
1 bar one 1 2
3 bar three 3 6
6 foo one 6 12
7 foo three 7 14
Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use df.loc:
df = df.set_index(['B'])
print(df.loc['one'])
yields
A C D
B
one foo 0 0
one bar 1 2
one foo 6 12
or, to include multiple values from the index, use df.index.isin:
df.loc[df.index.isin(['one','two'])]
yields
A C D
B
one foo 0 0
one bar 1 2
two foo 2 4
two foo 4 8
two bar 5 10
one foo 6 12
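The negated forms (!= and ~isin) can be checked against the same example data, rebuilt here so the snippet runs on its own (the set_index step above changed df in place):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Rows where A is not 'foo': the three 'bar' rows
print(df.loc[df['A'] != 'foo'])

# Rows where B is not in ['one', 'three']: the three 'two' rows
print(df.loc[~df['B'].isin(['one', 'three'])])
```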
Pandas select rows from a dataframe based on column values
You are using &, which is why you are getting an empty DataFrame: Frequency cannot be 0.8 and 0.6 at the same time. Use | instead.
Try this:
df = df[(df['Frequency'] == 0.8) | (df['Frequency'] == 0.6)]
OR
df = df[df["Frequency"].isin([0.6,0.8])]
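A minimal reproduction of the difference, assuming a small made-up Frequency column:

```python
import pandas as pd

df = pd.DataFrame({'Frequency': [0.6, 0.7, 0.8, 0.9]})

# & requires both conditions to hold on the same row, so this is always empty:
empty = df[(df['Frequency'] == 0.8) & (df['Frequency'] == 0.6)]
print(len(empty))  # 0

# | keeps rows matching either value:
either = df[(df['Frequency'] == 0.8) | (df['Frequency'] == 0.6)]
print(either['Frequency'].tolist())  # [0.6, 0.8]

# isin is equivalent and shorter:
via_isin = df[df['Frequency'].isin([0.6, 0.8])]
```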
Selecting rows from a Dataframe based on values from multiple columns in pandas
I think I understand your modified question. After sub-selecting on a condition of B, you can then select the columns you want, such as:
In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]:
A B
2 foo two
4 foo two
5 bar two
For example, if I wanted to concatenate all the strings of column A for which column B has the value 'two', I could do:
In [2]: df.loc[df.B =='two'].A.sum() # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'
You could also groupby the values of column B and get such a concatenation result for every different B-group in one expression:
In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]:
B
one foobarfoo
three barfoo
two foofoobar
dtype: object
To filter on A and B, use numpy.logical_and:
In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]:
A B C D
2 foo two 2 4
4 foo two 4 8
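np.logical_and on two boolean Series produces the same mask as the & operator; a quick check using the example DataFrame from the first answer:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

with_func = df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
with_op = df.loc[(df.A == 'foo') & (df.B == 'two')]
print(with_func.index.tolist())  # [2, 4]
```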
How to select rows in pandas dataframe that only have certain values in some columns?
Based on what you said, the following code solves the problem:
# import libraries
import pandas as pd
import numpy as np

# load dataset
df = pd.read_csv('/content/Data.csv')

# filter
df[df['Adj Close'] > 90]
Note: in Google Colab you have to add the dataset through the file panel (the folder icon) and then choose "upload to session storage".
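Since Data.csv itself is not available here, a sketch with made-up price data shows the same filter:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a few rows with an 'Adj Close' column.
df = pd.DataFrame({'Date': ['2023-01-02', '2023-01-03', '2023-01-04'],
                   'Adj Close': [88.5, 91.2, 95.0]})

above_90 = df[df['Adj Close'] > 90]
print(above_90)  # the last two rows
```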
how to select rows in pandas dataframe based on between from another column
You can merge your two lines of code by using a lambda function.
>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']
1 0.9
2 1.3
Name: y, dtype: float64
The output of your code:
indices=df.loc[("A"),"x"].between(p,q)
output=df.loc[("A"),"y"][indices]
print(output)
# Output
1 0.9
2 1.3
Name: y, dtype: float64
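df and the bounds p and q are not shown in the question; a sketch with an assumed two-level index ('A'/'B') and made-up values reproduces the pattern:

```python
import pandas as pd

# Hypothetical MultiIndex frame resembling the question's df
idx = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('A', 3), ('B', 1)])
df = pd.DataFrame({'x': [0.5, 1.0, 2.0, 0.7],
                   'y': [0.4, 0.9, 1.3, 2.0]}, index=idx)
p, q = 0.8, 2.5  # assumed bounds

# Select level 'A', keep rows whose x is within [p, q], return column y
result = df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']
print(result)  # values 0.9 and 1.3
```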
Pandas select rows from a DataFrame based on column values?
I think your issue is due to the JSON structure. You are actually loading into df a single row containing the whole list from the components field.
You should instead pass to the dataframe the list of records. Something like:
json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])
filtered_data = df[df["ossId"] == 2550]
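A runnable sketch of this, with a made-up payload (the field names components and ossId come from the question; the rest is invented):

```python
import json
import pandas as pd

# Hypothetical payload mimicking the question's structure
data = '''{"components": [
    {"ossId": 2550, "name": "libfoo"},
    {"ossId": 2551, "name": "libbar"}
]}'''

json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])  # one row per record
filtered_data = df[df["ossId"] == 2550]
print(filtered_data)  # the libfoo row only
```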
How to select rows based on dynamic column value?
Create N dataframes, one for each sector, then concatenate them into a single one:
out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)
# Output
NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS
0 AAPL IT xxxx yyyy zzz BP ENERGY xxxx yyyy zzz HSBC FINANCE xxxx yyyy zzz TGT CONSUMER xxxx yyyy zzz
1 MSFT IT xxxx yyyy zzz CVX ENERGY xxxx yyyy zzz JPM FINANCE xxxx yyyy zzz WMT CONSUMER xxxx yyyy zzz
2 GOOG IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN MCD CONSUMER xxxx yyyy zzz
3 META IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
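A self-contained sketch with tiny stand-ins for df_A and df_B (the sector names follow the output above; the other columns are placeholders):

```python
import pandas as pd

df_A = pd.DataFrame({'sector': ['IT', 'ENERGY']})
df_B = pd.DataFrame({'NAME': ['AAPL', 'MSFT', 'BP'],
                     'sector': ['IT', 'IT', 'ENERGY'],
                     'SALES': [1, 2, 3]})

# One DataFrame per sector, re-indexed from 0 via to_dict('records'),
# then concatenated side by side; shorter sectors are padded with NaN.
out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
                 for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)
```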