Return a DataFrame with a Column's Values That Occur More Than Once

Python pandas: if column A value appears more than once, assign first value of column B

Use groupby with transform('first'), which broadcasts the first code of each color group back onto every row of that group:

df['new_code'] = df.groupby('color').code.transform('first')

Out[21]:
   color code new_code
0  black  E45      E45
1  mauve  M46      M46
2   teal  Y76      Y76
3  green  G44      G44
4   teal  T76      Y76
5  black  B43      E45
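A self-contained sketch that reproduces the table above. The input frame is rebuilt here from the printed output, so its construction is an assumption rather than the asker's original data.

```python
import pandas as pd

# Sample data rebuilt from the printed output (an assumption).
df = pd.DataFrame({
    "color": ["black", "mauve", "teal", "green", "teal", "black"],
    "code": ["E45", "M46", "Y76", "G44", "T76", "B43"],
})

# For every row, take the first `code` seen within its `color` group.
# transform() returns a Series aligned with the original index.
df["new_code"] = df.groupby("color")["code"].transform("first")
print(df)
```

Colors that appear only once simply get their own code back, so no separate check for "appears more than once" is needed.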

How to select rows in Pandas dataframe where value appears more than once

You can use value_counts + isin:

v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]

For example, with a threshold of 2 instead of 5 (get all items which have more than 2 readings):

df

   ID Parameter  Value
0   0         A    4.3
1   1         B    3.1
2   2         C    8.9
3   3         A    2.1
4   4         A    3.9
5   5         B    4.5

v = df.Parameter.value_counts()
v

A    3
B    2
C    1
Name: Parameter, dtype: int64

df[df.Parameter.isin(v.index[v.gt(2)])]

   ID Parameter  Value
0   0         A    4.3
3   3         A    2.1
4   4         A    3.9
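The same steps can be wrapped in a small helper so the threshold is a parameter. This is only a sketch: the function name `rows_with_frequent_values` is hypothetical, and the frame is rebuilt from the table shown above.

```python
import pandas as pd

def rows_with_frequent_values(df, column, threshold):
    """Keep rows whose value in `column` occurs more than `threshold` times."""
    counts = df[column].value_counts()
    # Values whose frequency is strictly greater than the threshold.
    frequent = counts.index[counts.gt(threshold)]
    return df[df[column].isin(frequent)]

# Sample data rebuilt from the table above.
df = pd.DataFrame({
    "ID": range(6),
    "Parameter": list("ABCAAB"),
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

result = rows_with_frequent_values(df, "Parameter", 2)
print(result)
```

Note that gt is a strict comparison; swapping it for ge would keep values that occur at least `threshold` times.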

Display rows where any value in a particular column occurs more than once

Use duplicated with subset='website' and keep=False:

df[df.duplicated(subset='website', keep=False)]

Sample Input:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
3    D  abc.net
4    E  xyz.com
5    F  foo.bar
6    G  xyz.com
7    H  foo.baz

Sample Output:

  col1  website
0    A  abc.com
1    B  abc.com
2    C  abc.com
4    E  xyz.com
6    G  xyz.com
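A runnable sketch of the sample above, with a contrast against the default keep='first' to show why keep=False matters here. The variable names `repeated` and `later_only` are illustrative.

```python
import pandas as pd

# Sample input copied from the table above.
df = pd.DataFrame({
    "col1": list("ABCDEFGH"),
    "website": ["abc.com", "abc.com", "abc.com", "abc.net",
                "xyz.com", "foo.bar", "xyz.com", "foo.baz"],
})

# keep=False flags every member of a repeated group, including the first one.
repeated = df[df.duplicated(subset="website", keep=False)]

# The default keep='first' leaves the first occurrence unflagged,
# so only the later repeats are returned.
later_only = df[df.duplicated(subset="website")]

print(repeated)
print(later_only)
```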

How to select rows whose value appears more than x times in the table?

You can use a mask, such as:

good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.date.isin(good_dates)]
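A minimal sketch of this approach on made-up data, assuming the column is named `date`. The frame contents and the `min_count` variable are hypothetical. As an alternative, not part of the original answer, the same rows can be selected with groupby(...).transform('size'), which skips the intermediate list.

```python
import pandas as pd

# Hypothetical stand-in for `filea`; only the `date` column matters here.
filea = pd.DataFrame({
    "date": ["2020-01-01"] * 3 + ["2020-01-02"] + ["2020-01-03"] * 2,
    "value": range(6),
})
min_count = 2  # the answer uses 5; a smaller value suits this small sample

# Approach from the answer: collect the frequent dates, then filter with isin.
counts = filea["date"].value_counts()
good_dates = counts[counts > min_count].index.tolist()
filtered_filea = filea[filea["date"].isin(good_dates)]

# Alternative: attach each row's group size directly and compare.
group_size = filea.groupby("date")["date"].transform("size")
filtered_by_size = filea[group_size > min_count]

print(filtered_filea)
```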

Identify instances where string exists more than once in a row + Python, Pandas, Dataframe

Try the following. Use eq ("equals") to compare every cell with 'Yes', sum with axis=1 to count the matches along each row, then use gt ("greater than") to check whether that count exceeds 1. If you don't want to check the entire dataframe for 'Yes', filter the columns first.

df['Result'] = df.eq('Yes').sum(1).gt(1)

Output:

   ID Ans1 Ans2 Ans3  Result
0   1  Yes   No   No   False
1   2  Yes  Yes   No    True
2   3  Yes   No   No   False
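The column filtering mentioned in this answer could look like the sketch below. The frame is rebuilt from the output above, and the `answer_cols` list is an assumption about which columns should be checked.

```python
import pandas as pd

# Sample data rebuilt from the printed output.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Ans1": ["Yes", "Yes", "Yes"],
    "Ans2": ["No", "Yes", "No"],
    "Ans3": ["No", "No", "No"],
})

# Restrict the check to the answer columns (assumed list).
answer_cols = ["Ans1", "Ans2", "Ans3"]

# eq -> boolean frame, sum(axis=1) -> number of 'Yes' per row, gt(1) -> more than one.
df["Result"] = df[answer_cols].eq("Yes").sum(axis=1).gt(1)
print(df)
```

Restricting the columns also prevents a 'Yes' in an unrelated column from being counted.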

Return Pandas dataframe rows where more than N columns have the same value

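We can use value_counts on each row together with ge (which means >=); change the 3 to whatever threshold you need. On recent pandas versions, axis must be passed by keyword to any, and pd.Series.value_counts replaces the deprecated top-level pd.value_counts:

df[df.apply(pd.Series.value_counts, axis=1).ge(3).any(axis=1)]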
Out[257]: 
   'A'  'B'  'C'  'D'  'E'
0    1    1    1    3    5
2    3    4    3    2    3
3    5    5    5    4    5
4    1    2    1    2    1
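A self-contained sketch of the row-wise count. Rows 0, 2, 3 and 4 are taken from the output above; row 1 is not shown in the source, so the all-distinct row used here is invented for illustration, and the columns are named without the quote characters.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 1, 1, 3, 5],
        [1, 2, 3, 4, 5],  # hypothetical row: no value repeats
        [3, 4, 3, 2, 3],
        [5, 5, 5, 4, 5],
        [1, 2, 1, 2, 1],
    ],
    columns=list("ABCDE"),
)
n = 3  # minimum number of columns that must share a value

# One row of counts per original row; values absent from a row become NaN,
# and NaN >= n evaluates to False.
row_counts = df.apply(pd.Series.value_counts, axis=1)
mask = row_counts.ge(n).any(axis=1)

result = df[mask]
print(result)
```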

Pandas: Filter rows by multiple occurrences of specific substrings in column cells

This appears to work for my purposes (thanks to @Himanshuman for pointing in this direction):

import re

df = df[
        (df['column_1'].str.count(r'apple', re.I) > 1) | \
        (df['column_1'].str.count(r'banana', re.I) > 1)
        ]

It uses the bitwise OR operator (|) between the conditions (the trailing \ is a line-continuation character, which is optional inside the brackets)
and gives the desired result:

                             column_1    column_2
1                         Apple Apple  Some value
2                         Apple Apple  Some value
5                       Banana Banana  Some value
6                      Apple is Apple  Some value
7  Banana is not Apple but is Banana   Some value
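The filter can be tried on a small frame as sketched below. The rows are invented for illustration, since the original input is not shown, and the helper `mentions_more_than_once` is a hypothetical name.

```python
import re
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "column_1": [
        "Apple",
        "Apple Apple",
        "Banana",
        "Banana Banana",
        "Apple is Apple",
        "Apple and Banana",
    ],
})

def mentions_more_than_once(series, word):
    """True where `word` occurs more than once in the cell, ignoring case."""
    # re.escape guards against regex metacharacters in `word`.
    return series.str.count(re.escape(word), flags=re.I) > 1

mask = (
    mentions_more_than_once(df["column_1"], "apple")
    | mentions_more_than_once(df["column_1"], "banana")
)
filtered = df[mask]
print(filtered)
```

A cell that contains each word only once, such as "Apple and Banana", is excluded because neither individual count exceeds 1.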

Count of unique values that occur more than 100 in a data frame

Use value_counts to get the frequency of each value, then filter those counts:


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()

# keep only the values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)

This is the sample dataframe:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values counted:

hello    4
bye      2

These are the unique values with a count greater than n (here n = 2):

hello    4
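The question title asks for a count, so a final step is needed to turn the filtered result into a single number. A minimal sketch using the same sample data, with hypothetical variable names `n` and `num_frequent`:

```python
import pandas as pd

df = pd.DataFrame({"drug_name": ["hello"] * 4 + ["bye"] * 2})
n = 2  # threshold; the question uses 100

counts = df["drug_name"].value_counts()

# Summing the boolean mask counts how many distinct values exceed the threshold.
num_frequent = int((counts > n).sum())
print(num_frequent)
```

Taking len(counts[counts > n]) gives the same number.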