How do I select rows from a DataFrame based on column values?
To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
Combine multiple conditions with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses,
df['column_name'] >= A & df['column_name'] <= B
is parsed as
df['column_name'] >= (A & df['column_name']) <= B
which results in a "truth value of a Series is ambiguous" error.
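A self-contained check of this precedence behavior (the column name and the bounds A and B here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'column_name': [1, 5, 10]})
A, B = 2, 8

# Parenthesized: each comparison yields a boolean Series, then & combines them.
mask = (df['column_name'] >= A) & (df['column_name'] <= B)
print(df.loc[mask])  # keeps only the row with value 5

# Unparenthesized: A & df['column_name'] is evaluated first, and the chained
# comparison then calls bool() on a Series, which raises ValueError.
try:
    df.loc[df['column_name'] >= A & df['column_name'] <= B]
except ValueError as e:
    print(e)  # The truth value of a Series is ambiguous...
```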
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:
df.loc[~df['column_name'].isin(some_values)]
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
'B': 'one one two three two two one three'.split(),
'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
# A B C D
# 0 foo one 0 0
# 1 bar one 1 2
# 2 foo two 2 4
# 3 bar three 3 6
# 4 foo two 4 8
# 5 bar two 5 10
# 6 foo one 6 12
# 7 foo three 7 14
print(df.loc[df['A'] == 'foo'])
yields
A B C D
0 foo one 0 0
2 foo two 2 4
4 foo two 4 8
6 foo one 6 12
7 foo three 7 14
If you have multiple values you want to include, put them in a list (or, more generally, any iterable) and use isin:
print(df.loc[df['B'].isin(['one','three'])])
yields
A B C D
0 foo one 0 0
1 bar one 1 2
3 bar three 3 6
6 foo one 6 12
7 foo three 7 14
Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use df.loc:
df = df.set_index(['B'])
print(df.loc['one'])
yields
A C D
B
one foo 0 0
one bar 1 2
one foo 6 12
or, to include multiple values from the index, use df.index.isin:
df.loc[df.index.isin(['one','two'])]
yields
A C D
B
one foo 0 0
one bar 1 2
two foo 2 4
two foo 4 8
two bar 5 10
one foo 6 12
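The negated forms (!= and ~isin) can be checked against the same example data, rebuilt here so the snippet runs on its own (the set_index step above changed df in place):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Rows where A is not 'foo': the three 'bar' rows
print(df.loc[df['A'] != 'foo'])

# Rows where B is not in ['one', 'three']: the three 'two' rows
print(df.loc[~df['B'].isin(['one', 'three'])])
```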
Pandas select rows from a dataframe based on column values
You are using &, which is why you are getting an empty DataFrame: Frequency cannot be 0.8 and 0.6 at the same time. Use | instead.
Try this:
df = df[(df['Frequency'] == 0.8) | (df['Frequency'] == 0.6)]
OR
df = df[df["Frequency"].isin([0.6,0.8])]
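A minimal reproduction of the difference, assuming a small made-up Frequency column:

```python
import pandas as pd

df = pd.DataFrame({'Frequency': [0.6, 0.7, 0.8, 0.9]})

# & requires both conditions to hold on the same row, so this is always empty:
empty = df[(df['Frequency'] == 0.8) & (df['Frequency'] == 0.6)]
print(len(empty))  # 0

# | keeps rows matching either value:
either = df[(df['Frequency'] == 0.8) | (df['Frequency'] == 0.6)]
print(either['Frequency'].tolist())  # [0.6, 0.8]

# isin is equivalent and shorter:
via_isin = df[df['Frequency'].isin([0.6, 0.8])]
```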
Selecting rows from a Dataframe based on values from multiple columns in pandas
I think I understand your modified question. After sub-selecting on a condition of B, you can then select the columns you want, such as:
In [1]: df.loc[df.B =='two'][['A', 'B']]
Out[1]:
A B
2 foo two
4 foo two
5 bar two
For example, if I wanted to concatenate all the strings of column A for which column B has the value 'two', I could do:
In [2]: df.loc[df.B =='two'].A.sum() # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'
You could also groupby the values of column B and get such a concatenation result for every different B-group in one expression:
In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]:
B
one foobarfoo
three barfoo
two foofoobar
dtype: object
To filter on A and B, use numpy.logical_and:
In [1]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[1]:
A B C D
2 foo two 2 4
4 foo two 4 8
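np.logical_and on two boolean Series produces the same mask as the & operator; a quick check using the example DataFrame from the first answer:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

with_func = df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
with_op = df.loc[(df.A == 'foo') & (df.B == 'two')]
print(with_func.index.tolist())  # [2, 4]
```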
How to select rows in pandas dataframe that only have certain values in some columns?
Based on what you said, the following code solves the problem:
# import libraries
import pandas as pd
import numpy as np

# load dataset
df = pd.read_csv('/content/Data.csv')

# filter
df[df['Adj Close'] > 90]
Note: in Google Colab you have to add the dataset through the file panel (the folder icon) and then choose "upload to session storage".
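Since Data.csv itself is not available here, a sketch with made-up price data shows the same filter:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a few rows with an 'Adj Close' column.
df = pd.DataFrame({'Date': ['2023-01-02', '2023-01-03', '2023-01-04'],
                   'Adj Close': [88.5, 91.2, 95.0]})

above_90 = df[df['Adj Close'] > 90]
print(above_90)  # the last two rows
```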
how to select rows in pandas dataframe based on between from another column
You can merge your two lines of code by using a lambda function.
>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']
1 0.9
2 1.3
Name: y, dtype: float64
The output of your code:
indices=df.loc[("A"),"x"].between(p,q)
output=df.loc[("A"),"y"][indices]
print(output)
# Output
1 0.9
2 1.3
Name: y, dtype: float64
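df and the bounds p and q are not shown in the question; a sketch with an assumed two-level index ('A'/'B') and made-up values reproduces the pattern:

```python
import pandas as pd

# Hypothetical MultiIndex frame resembling the question's df
idx = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('A', 3), ('B', 1)])
df = pd.DataFrame({'x': [0.5, 1.0, 2.0, 0.7],
                   'y': [0.4, 0.9, 1.3, 2.0]}, index=idx)
p, q = 0.8, 2.5  # assumed bounds

# Select level 'A', keep rows whose x is within [p, q], return column y
result = df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']
print(result)  # values 0.9 and 1.3
```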
Pandas select rows from a DataFrame based on column values?
I think your issue is due to the JSON structure. You are actually loading into df a single row containing the whole list from the components field.
You should instead pass to the dataframe the list of records. Something like:
json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])
filtered_data = df[df["ossId"] == 2550]
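A runnable sketch of this, with a made-up payload (the field names components and ossId come from the question; the rest is invented):

```python
import json
import pandas as pd

# Hypothetical payload mimicking the question's structure
data = '''{"components": [
    {"ossId": 2550, "name": "libfoo"},
    {"ossId": 2551, "name": "libbar"}
]}'''

json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])  # one row per record
filtered_data = df[df["ossId"] == 2550]
print(filtered_data)  # the libfoo row only
```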
How to select rows based on dynamic column value?
Create N dataframes, one for each sector, then concatenate them into a single one:
out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)
# Output
NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS NAME sector SALES EBIT DPS
0 AAPL IT xxxx yyyy zzz BP ENERGY xxxx yyyy zzz HSBC FINANCE xxxx yyyy zzz TGT CONSUMER xxxx yyyy zzz
1 MSFT IT xxxx yyyy zzz CVX ENERGY xxxx yyyy zzz JPM FINANCE xxxx yyyy zzz WMT CONSUMER xxxx yyyy zzz
2 GOOG IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN MCD CONSUMER xxxx yyyy zzz
3 META IT xxxx yyyy zzz NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
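A self-contained sketch with tiny stand-ins for df_A and df_B (the sector names follow the output above; the other columns are placeholders):

```python
import pandas as pd

df_A = pd.DataFrame({'sector': ['IT', 'ENERGY']})
df_B = pd.DataFrame({'NAME': ['AAPL', 'MSFT', 'BP'],
                     'sector': ['IT', 'IT', 'ENERGY'],
                     'SALES': [1, 2, 3]})

# One DataFrame per sector, re-indexed from 0 via to_dict('records'),
# then concatenated side by side; shorter sectors are padded with NaN.
out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
                 for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)
```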