Python pandas: if column A value appears more than once, assign first value of column B
Use groupby with transform and 'first':
df['new_code'] = df.groupby('color').code.transform('first')
Out[21]:
color code new_code
0 black E45 E45
1 mauve M46 M46
2 teal Y76 Y76
3 green G44 G44
4 teal T76 Y76
5 black B43 E45
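As a self-contained sketch of the approach above, the snippet below rebuilds the sample frame from the printed output (the construction of df is an assumption, since the original input is not shown) and applies the same groupby/transform step.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (assumed input)
df = pd.DataFrame({
    "color": ["black", "mauve", "teal", "green", "teal", "black"],
    "code": ["E45", "M46", "Y76", "G44", "T76", "B43"],
})

# For every row, take the first `code` seen within its `color` group.
# transform returns a result aligned to the original index, so it can be
# assigned straight back as a new column.
df["new_code"] = df.groupby("color")["code"].transform("first")

print(df)
```

Because transform keeps the original row order, colors that appear only once simply keep their own code.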
How to select rows in Pandas dataframe where value appears more than once
You can use value_counts + isin:
v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]
For example, where K = 2 (get all items which have more than 2 readings):
df
ID Parameter Value
0 0 A 4.3
1 1 B 3.1
2 2 C 8.9
3 3 A 2.1
4 4 A 3.9
5 5 B 4.5
v = df.Parameter.value_counts()
v
A 3
B 2
C 1
Name: Parameter, dtype: int64
df[df.Parameter.isin(v.index[v.gt(2)])]
ID Parameter Value
0 0 A 4.3
3 3 A 2.1
4 4 A 3.9
Display rows where any value in a particular column occurs more than once
Use duplicated with subset='website' and keep=False:
df[df.duplicated(subset='website', keep=False)]
Sample Input:
col1 website
0 A abc.com
1 B abc.com
2 C abc.com
3 D abc.net
4 E xyz.com
5 F foo.bar
6 G xyz.com
7 H foo.baz
Sample Output:
col1 website
0 A abc.com
1 B abc.com
2 C abc.com
4 E xyz.com
6 G xyz.com
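A runnable sketch of the same idea, using the sample input above. The intermediate names mask and repeated are only illustrative.

```python
import pandas as pd

# Sample input from above
df = pd.DataFrame({
    "col1": list("ABCDEFGH"),
    "website": ["abc.com", "abc.com", "abc.com", "abc.net",
                "xyz.com", "foo.bar", "xyz.com", "foo.baz"],
})

# keep=False flags every member of a duplicate group, not just the
# second and later occurrences (which is what the default keep='first' does)
mask = df.duplicated(subset="website", keep=False)
repeated = df[mask]

print(repeated)
```

With the default keep='first', the first row of each repeated website would be left out of the result.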
How to select rows whose value appears more than x times in the table?
You can use a mask, such as:
good_dates = filea.date.value_counts().loc[lambda s: s > 5].index.tolist()
filtered_filea = filea[filea.date.isin(good_dates)]
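The original filea is not shown, so the sketch below uses made-up data (the column value and the dates are assumptions) to illustrate the mask with a threshold x.

```python
import pandas as pd

# Hypothetical stand-in for `filea`; the real table is not shown in the question
filea = pd.DataFrame({
    "date": ["2020-01-01"] * 6 + ["2020-01-02"] * 2 + ["2020-01-03"] * 7,
    "value": range(15),
})

x = 5  # keep rows whose date appears more than x times

# Count how often each date occurs, then keep only the frequent ones
counts = filea["date"].value_counts()
good_dates = counts[counts > x].index

# Boolean mask: True for rows whose date is among the frequent dates
filtered_filea = filea[filea["date"].isin(good_dates)]

print(filtered_filea)
```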
Identify instances where string exists more than once in a row + Python, Pandas, Dataframe
Try the following. If you don't want to check the entire dataframe for 'Yes', filter down to the relevant columns first. Then use eq (equals) and sum with axis=1 to sum the matches along each row, and check whether that sum is gt (greater than) 1:
df['Result'] = df.eq('Yes').sum(1).gt(1)
Output:
ID Ans1 Ans2 Ans3 Result
0 1 Yes No No False
1 2 Yes Yes No True
2 3 Yes No No False
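The column filtering mentioned above might look like the following sketch. The list answer_cols is an assumption about which columns should be checked.

```python
import pandas as pd

# Sample data reconstructed from the output shown above
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Ans1": ["Yes", "Yes", "Yes"],
    "Ans2": ["No", "Yes", "No"],
    "Ans3": ["No", "No", "No"],
})

# Assumed: only the answer columns should be checked, not ID
answer_cols = ["Ans1", "Ans2", "Ans3"]

# eq gives a boolean frame, sum(axis=1) counts the True values per row,
# and gt(1) is True where "Yes" occurs more than once in that row
df["Result"] = df[answer_cols].eq("Yes").sum(axis=1).gt(1)

print(df)
```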
Return Pandas dataframe rows where more than N columns have the same value
We can use value_counts on each row. ge means >=, and you can change the number 3 to whatever threshold you need:
df[df.apply(pd.Series.value_counts, axis=1).ge(3).any(axis=1)]
Out[257]:
'A' 'B' 'C' 'D' 'E'
0 1 1 1 3 5
2 3 4 3 2 3
3 5 5 5 4 5
4 1 2 1 2 1
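Since the input frame is not shown, here is a sketch with reconstructed data. Rows 0, 2, 3 and 4 are taken from the output above, row 1 is invented to show a row that gets dropped, and the column names are simplified to plain letters.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 1, 1, 3, 5],
        [1, 2, 3, 4, 5],  # hypothetical row: no value repeats, so it is dropped
        [3, 4, 3, 2, 3],
        [5, 5, 5, 4, 5],
        [1, 2, 1, 2, 1],
    ],
    columns=list("ABCDE"),
)

n = 3  # minimum number of columns that must share a value

# Count the values within each row; the result has one column per distinct
# value and NaN where a value does not occur in that row
counts = df.apply(pd.Series.value_counts, axis=1)

# Keep rows where any value occurs at least n times (NaN compares as False)
result = df[counts.ge(n).any(axis=1)]

print(result)
```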
Pandas: Filter rows by multiple occurrences of specific substrings in column cells
This appears to work for my purposes (thanks to @Himanshuman for pointing in this direction):
import re
df = df[
(df['column_1'].str.count(r'apple', re.I) > 1) | \
(df['column_1'].str.count(r'banana', re.I) > 1)
]
It uses the bitwise OR operator (|) between the conditions (with \ as the line-continuation character, which is optional inside the brackets) and gives the desired result:
column_1 column_2
1 Apple Apple Some value
2 Apple Apple Some value
5 Banana Banana Some value
6 Apple is Apple Some value
7 Banana is not Apple but is Banana Some value
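A self-contained sketch of the same filter follows. The full input frame is not shown, so the rows here are made up, and the helper dictionary hits is only illustrative.

```python
import re

import pandas as pd

# Hypothetical data; the original input is only partly visible in the output
df = pd.DataFrame({
    "column_1": [
        "Apple",
        "Apple Apple",
        "apple APPLE",
        "Banana",
        "Banana is not Apple but is Banana",
    ],
    "column_2": ["Some value"] * 5,
})

# Count case-insensitive occurrences of each substring per cell
hits = {
    fruit: df["column_1"].str.count(fruit, flags=re.IGNORECASE)
    for fruit in ("apple", "banana")
}

# Keep rows where either substring occurs more than once
mask = (hits["apple"] > 1) | (hits["banana"] > 1)
filtered = df[mask]

print(filtered)
```

Each comparison is wrapped in parentheses because | binds more tightly than > in Python.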
Count of unique values that occur more than 100 in a data frame
This would solve the problem.
import pandas as pd
# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()
# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()
# filter to values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)
This is the sample dataframe:
drug_name
0 hello
1 hello
2 hello
3 hello
4 bye
5 bye
These are the unique values counted:
hello 4
bye 2
These are the unique values > n:
hello 4
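If the goal is the number of such values rather than the values themselves, the length of the filtered result gives it. This is a sketch on the same sample data; the threshold n and the variable names are assumptions.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({"drug_name": ["hello"] * 4 + ["bye"] * 2})

n = 2  # threshold; the question title uses 100

# Frequency of each unique value
counts = df["drug_name"].value_counts()

# Values that occur more than n times, and how many of them there are
frequent = counts[counts > n]
num_frequent = len(frequent)

print(num_frequent)
```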