Return DataFrame with Column Values That Occur More Than Once

Python pandas: if column A value appears more than once, assign first value of column B

Use groupby with transform('first'):

df['new_code'] = df.groupby('color').code.transform('first')

Out[21]:
   color code new_code
0  black  E45      E45
1  mauve  M46      M46
2   teal  Y76      Y76
3  green  G44      G44
4   teal  T76      Y76
5  black  B43      E45
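The answer above can be reproduced end to end; the sample values are taken from the output shown:

```python
import pandas as pd

# Reconstruct the sample frame from the output above
df = pd.DataFrame({
    'color': ['black', 'mauve', 'teal', 'green', 'teal', 'black'],
    'code':  ['E45', 'M46', 'Y76', 'G44', 'T76', 'B43'],
})

# For every row, take the first 'code' seen within its 'color' group;
# transform keeps the original row alignment, unlike agg
df['new_code'] = df.groupby('color').code.transform('first')
print(df)
```

Because `transform` returns a result aligned to the original index, repeated colors (teal, black) all receive the first code observed for that color.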

How to select rows in Pandas dataframe where value appears more than once

You can use value_counts + isin; here v.gt(5) keeps parameters that appear more than 5 times:

v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]

For example, with K = 2 (get all items that have more than 2 readings):

df

   ID Parameter  Value
0   0         A    4.3
1   1         B    3.1
2   2         C    8.9
3   3         A    2.1
4   4         A    3.9
5   5         B    4.5

v = df.Parameter.value_counts()
v

A    3
B    2
C    1
Name: Parameter, dtype: int64

df[df.Parameter.isin(v.index[v.gt(2)])]

   ID Parameter  Value
0   0         A    4.3
3   3         A    2.1
4   4         A    3.9
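The same filter can be written as a runnable sketch; the data matches the example above, and the second variant (groupby + transform('size')) is an equivalent one-liner that avoids the intermediate index lookup:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(6),
    'Parameter': ['A', 'B', 'C', 'A', 'A', 'B'],
    'Value': [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# value_counts + isin, as in the answer, with the threshold K = 2
v = df.Parameter.value_counts()
out = df[df.Parameter.isin(v.index[v.gt(2)])]

# Equivalent: transform('size') attaches each group's row count
# directly to every row, so the mask can be built in one step
out2 = df[df.groupby('Parameter').Parameter.transform('size') > 2]
```

Both return only the 'A' rows, since A is the only parameter with more than 2 readings.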

Display rows where any value in a particular column occurs more than once

Use duplicated with subset='website' and keep=False:

df[df.duplicated(subset='website', keep=False)]

Sample Input:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
3    D  abc.net
4    E  xyz.com
5    F  foo.bar
6    G  xyz.com
7    H  foo.baz

Sample Output:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
4    E  xyz.com
6    G  xyz.com
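The full example can be run as written; the key detail is keep=False, which marks every member of a duplicate group rather than only the second and later occurrences:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': list('ABCDEFGH'),
    'website': ['abc.com', 'abc.com', 'abc.com', 'abc.net',
                'xyz.com', 'foo.bar', 'xyz.com', 'foo.baz'],
})

# keep=False marks *all* rows whose 'website' appears more than once
dupes = df[df.duplicated(subset='website', keep=False)]
print(dupes)
```

With the default keep='first', rows 0 and 4 would be excluded from the mask, which is usually not what "appears more than once" means.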

How to select rows whose value appears more than x times in the table?

You can use a mask, such as:

good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.date.isin(good_dates)]
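A minimal sketch of the mask, assuming a hypothetical `filea` frame with a `date` column (the column name and the threshold of 5 mirror the snippet above):

```python
import pandas as pd

# Hypothetical data: 'd1' occurs 6 times, 'd2' 3 times, 'd3' once
filea = pd.DataFrame({'date': ['d1'] * 6 + ['d2'] * 3 + ['d3']})

# Keep only dates that occur more than 5 times
good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.date.isin(good_dates)]
```

Only 'd1' clears the threshold, so the filtered frame contains its 6 rows.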

Identify instances where a string exists more than once in a row (Python, Pandas, Dataframe)

Try this. (You can filter the columns first if you don't want to check the entire dataframe for 'Yes'.) Use eq (equals) to compare, sum with axis=1 to count the matches along each row, then gt (greater than) to check whether that count exceeds 1:

df['Result'] = df.eq('Yes').sum(axis=1).gt(1)

Output:

   ID Ans1 Ans2 Ans3  Result
0   1  Yes   No   No   False
1   2  Yes  Yes   No    True
2   3  Yes   No   No   False
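A runnable sketch of the answer, using the sample data shown above (the ID column never equals 'Yes', so comparing the whole frame is harmless here):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Ans1': ['Yes', 'Yes', 'Yes'],
    'Ans2': ['No', 'Yes', 'No'],
    'Ans3': ['No', 'No', 'No'],
})

# True where 'Yes' appears in more than one column of the row;
# restrict to df[['Ans1', 'Ans2', 'Ans3']] if other columns could contain 'Yes'
df['Result'] = df.eq('Yes').sum(axis=1).gt(1)
print(df)
```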

Return Pandas dataframe rows where more than N columns have the same value

We can use value_counts per row; ge means >=, and you can change the 3 to whatever count you need:

df[df.apply(pd.Series.value_counts, axis=1).ge(3).any(axis=1)]
Out[257]:
   A  B  C  D  E
0  1  1  1  3  5
2  3  4  3  2  3
3  5  5  5  4  5
4  1  2  1  2  1
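A runnable sketch; row 1 is assumed to hold five distinct values, since it is the only row missing from the output above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 5, 1],
    'B': [1, 9, 4, 5, 2],
    'C': [1, 8, 3, 5, 1],
    'D': [3, 7, 2, 4, 2],
    'E': [5, 6, 3, 5, 1],
})

# value_counts on each row yields a per-row frequency table;
# keep rows where any single value occurs at least 3 times
mask = df.apply(pd.Series.value_counts, axis=1).ge(3).any(axis=1)
out = df[mask]
```

Row 1 (all values distinct) is dropped; every other row has a value repeated at least 3 times.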

Pandas: Filter rows by multiple occurrences of specific substrings in column cells

This appears to work for my purposes (thanks to @Himanshuman for pointing in this direction):

import re

df = df[
    (df['column_1'].str.count(r'apple', re.I) > 1) |
    (df['column_1'].str.count(r'banana', re.I) > 1)
]

It uses the bitwise OR operator (|) between the two conditions (no \ line-continuation character is needed, since the expression is already wrapped in brackets) and gives the desired result:

                            column_1    column_2
1                        Apple Apple  Some value
2                        Apple Apple  Some value
5                      Banana Banana  Some value
6                     Apple is Apple  Some value
7  Banana is not Apple but is Banana  Some value
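A self-contained sketch of the same filter, with a small hypothetical frame (only a subset of the rows shown above, chosen to cover both the single- and multiple-occurrence cases):

```python
import re

import pandas as pd

df = pd.DataFrame({
    'column_1': ['Apple', 'Apple Apple', 'Banana', 'Apple is Apple'],
    'column_2': ['Some value'] * 4,
})

# str.count(pat, flags) counts regex matches per cell; re.I makes it
# case-insensitive. Keep rows where either fruit occurs more than once.
out = df[
    (df['column_1'].str.count(r'apple', re.I) > 1) |
    (df['column_1'].str.count(r'banana', re.I) > 1)
]
```

Only the rows with a repeated 'Apple' survive the filter ('Apple' and 'Banana' alone each match just once).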

Count of unique values that occur more than 100 times in a data frame

This would solve the problem.


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()

# keep only values that occur more than 2 times
df_filtered = df_counted[df_counted > 2]
print(df_filtered)

This is the sample dataframe:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values counted:

hello    4
bye      2

These are the unique values with a count greater than 2:

hello    4

