How to Select Rows from a DataFrame Based on Column Values

How do I select rows from a DataFrame based on column values?

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. Thus, the parentheses in the last example are necessary. Without the parentheses

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

which raises a ValueError: The truth value of a Series is ambiguous error.
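
Here is a minimal, self-contained sketch of the pitfall, with hypothetical bounds A and B; note that Series.between offers an equivalent, parenthesis-free spelling:

import pandas as pd

df = pd.DataFrame({'column_name': [1, 5, 10]})
A, B = 2, 8  # hypothetical bounds

print(df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)])  # keeps 5
# df.loc[df['column_name'] >= A & df['column_name'] <= B]  # raises ValueError

# Series.between(A, B) is inclusive on both ends, matching >= A and <= B:
print(df.loc[df['column_name'].between(A, B)])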


To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]
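
As a quick, self-contained sketch of both negated selections (the values here are made up):

import pandas as pd

df = pd.DataFrame({'column_name': ['foo', 'bar', 'baz']})

print(df.loc[df['column_name'] != 'foo'])               # keeps bar, baz
print(df.loc[~df['column_name'].isin(['foo', 'bar'])])  # keeps baz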

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8),
                   'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

If you have multiple values you want to include, put them in a
list (or more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14

Note, however, that if you wish to do this many times, it is more efficient to
make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12

or, to include multiple values from the index, use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
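
If the index is large, sorting it generally speeds up repeated lookups, and reset_index restores 'B' as an ordinary column when you are done. A short sketch, continuing the example above:

df = df.sort_index()   # a sorted (monotonic) index allows faster .loc lookups
df = df.reset_index()  # undo set_index, moving 'B' back to a regular column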

Pandas select rows from a DataFrame based on column values

You are using &, which is why you are getting an empty DataFrame: Frequency cannot be 0.8 and 0.6 at the same time. Use | instead.

Try this:

df = df[(df['Frequency'] == 0.8) | (df['Frequency'] == 0.6)]

OR

df = df[df["Frequency"].isin([0.6,0.8])]

Selecting rows from a Dataframe based on values from multiple columns in pandas

I think I understand your modified question. After sub-selecting on a condition of B, you can then select the columns you want:

In [1]: df.loc[df.B == 'two'][['A', 'B']]
Out[1]:
     A    B
2  foo  two
4  foo  two
5  bar  two

For example, to concatenate all the strings of column A for which column B has the value 'two':

In [2]: df.loc[df.B == 'two'].A.sum()  # <-- use .mean() for your quarterly data
Out[2]: 'foofoobar'

You could also group by the values of column B and get such a concatenation for every B-group in one expression:

In [3]: df.groupby('B').apply(lambda x: x.A.sum())
Out[3]:
B
one      foobarfoo
three       barfoo
two      foofoobar
dtype: object
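
For what it's worth, the same concatenation can be written without apply, since summing an object-dtype column concatenates its strings:

df.groupby('B')['A'].sum()  # equivalent to the apply version above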

To filter on A and B, use numpy.logical_and:

In [4]: df.loc[np.logical_and(df.A == 'foo', df.B == 'two')]
Out[4]:
     A    B  C  D
2  foo  two  2  4
4  foo  two  4  8
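
An equivalent, arguably more readable alternative (not part of the answer above) is DataFrame.query, which accepts the compound condition as a string:

df.query("A == 'foo' and B == 'two'")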

How to select rows in pandas dataframe that only have certain values in some columns?

Based on what you described, the following code solves the problem:

# import libraries
import pandas as pd
import numpy as np

# import dataset
df = pd.read_csv('/content/Data.csv')

# filter rows where 'Adj Close' exceeds 90
df[df['Adj Close'] > 90]

Note: in Google Colab you have to add the data file through the folder icon in the sidebar and then choose "upload to session storage".
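
If you just want something runnable without uploading a file, a minimal sketch with made-up prices works the same way (the 'Adj Close' column name comes from the question):

import pandas as pd

df = pd.DataFrame({'Adj Close': [85.0, 92.5, 88.1, 101.3]})  # hypothetical prices
print(df[df['Adj Close'] > 90])  # keeps 92.5 and 101.3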


How to select rows in a pandas DataFrame based on a between condition on another column

You can merge your 2 lines of code by using a lambda function.

>>> df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y']
1    0.9
2    1.3
Name: y, dtype: float64

The output of your code:

indices = df.loc[("A"), "x"].between(p, q)
output = df.loc[("A"), "y"][indices]
print(output)

# Output
1    0.9
2    1.3
Name: y, dtype: float64
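
For context, here is a minimal sketch of the setup this answer assumes: a two-level index whose outer level contains 'A', plus hypothetical x, y values and bounds p and q chosen so the output matches the above:

import pandas as pd

df = pd.DataFrame(
    {'x': [0.5, 1.5, 2.5, 9.0],
     'y': [0.1, 0.9, 1.3, 4.2]},
    index=pd.MultiIndex.from_tuples([('A', 0), ('A', 1), ('A', 2), ('B', 0)]),
)
p, q = 1.0, 3.0  # hypothetical bounds

print(df.loc['A'].loc[lambda A: A['x'].between(p, q), 'y'])
# 1    0.9
# 2    1.3
# Name: y, dtype: float64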

Pandas select rows from a DataFrame based on column values?

I think your issue is due to the JSON structure. You are actually loading into df a single row that contains the whole list from the components field.

You should instead pass the list of records to the DataFrame. Something like:

json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])

filtered_data = df[df["ossId"] == 2550]
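
To make that runnable end to end, here is a minimal sketch with made-up JSON matching the structure described (the components and ossId names come from the question; the rest is hypothetical):

import json
import pandas as pd

data = '{"components": [{"ossId": 2550, "name": "libfoo"}, {"ossId": 17, "name": "libbar"}]}'

json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])  # one row per record
filtered_data = df[df["ossId"] == 2550]
print(filtered_data)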

How to select rows based on dynamic column value?

Create N DataFrames, one for each sector, then concatenate them into a single one:

out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
                 for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)

# Output
   NAME sector SALES  EBIT  DPS  NAME  sector SALES  EBIT  DPS  NAME   sector SALES  EBIT  DPS  NAME    sector SALES  EBIT  DPS
0  AAPL     IT  xxxx  yyyy  zzz    BP  ENERGY  xxxx  yyyy  zzz  HSBC  FINANCE  xxxx  yyyy  zzz   TGT  CONSUMER  xxxx  yyyy  zzz
1  MSFT     IT  xxxx  yyyy  zzz   CVX  ENERGY  xxxx  yyyy  zzz   JPM  FINANCE  xxxx  yyyy  zzz   WMT  CONSUMER  xxxx  yyyy  zzz
2  GOOG     IT  xxxx  yyyy  zzz   NaN     NaN   NaN   NaN  NaN   NaN      NaN   NaN   NaN  NaN   MCD  CONSUMER  xxxx  yyyy  zzz
3  META     IT  xxxx  yyyy  zzz   NaN     NaN   NaN   NaN  NaN   NaN      NaN   NaN   NaN  NaN   NaN       NaN   NaN   NaN  NaN
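
To reproduce this locally, a minimal sketch with made-up df_A and df_B (the column names follow the output above; the values are placeholders):

import pandas as pd

df_A = pd.DataFrame({'sector': ['IT', 'ENERGY', 'FINANCE', 'CONSUMER']})
df_B = pd.DataFrame({
    'NAME':   ['AAPL', 'MSFT', 'GOOG', 'META', 'BP', 'CVX',
               'HSBC', 'JPM', 'TGT', 'WMT', 'MCD'],
    'sector': ['IT'] * 4 + ['ENERGY'] * 2 + ['FINANCE'] * 2 + ['CONSUMER'] * 3,
    'SALES':  ['xxxx'] * 11,
    'EBIT':   ['yyyy'] * 11,
    'DPS':    ['zzz'] * 11,
})

out = pd.concat([pd.DataFrame(df_B[df_B['sector'] == sector].to_dict('records'))
                 for sector in df_A['sector'].unique().tolist()], axis=1)
print(out)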

