Nested if else statements over a number of columns
Edit: Updated solution using the fast melt/dcast methods implemented in data.table
versions >= 1.9.0.
require(data.table)
require(reshape2)
dt <- as.data.table(df)
# melt data.table
dt.m <- melt(dt, id=c("marker", "alleleA", "alleleB"),
             variable.name="id", value.name="val")
dt.m[, id := gsub("\\.[0-9]+$", "", id)] # replace `.[0-9]` with nothing
# aggregation
dt.m <- dt.m[, list(alleleA = alleleA[1],
                   alleleB = alleleB[1], val = max(val)),
             keyby=list(marker, id)][val <= 0.8, val := NA]
# casting back
dt.c <- dcast.data.table(dt.m, marker + alleleA + alleleB ~ id)
# marker alleleA alleleB X345 X346 X818
# 1: chr3_21902130_21902131_A_T A T NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C T C NA NA NA
# 3: kgp5209280_chr3_21902067 T A 1 1.0000 1.0000
Solution 1: Probably not the best way, but this is what I could think of at the moment:
mm <- t(apply(df[-(1:3)], 1, function(x) tapply(x, gl(3,3), max)))
mode(mm) <- "numeric"
mm[mm < 0.8] <- NA
# you can set the column names of mm here if necessary
out <- cbind(df[, 1:3], mm)
# marker alleleA alleleB 1 2 3
# 1 kgp5209280_chr3_21902067 T A 1.0000 1 1.0000
# 2 chr3_21902130_21902131_A_T A T 0.8626 NA 0.8626
# 3 chr3_21902134_21902135_T_C T C NA NA NA
gl(3,3) gives a factor with values 1,1,1,2,2,2,3,3,3 with levels 1,2,3. That is, tapply will take the values x 3 at a time and get their max (first 3, next 3 and the last 3). And apply sends each row one by one.
Solution 2: A data.table solution with melt and cast within data.table, without using reshape or reshape2:
require(data.table)
dt <- data.table(df)
# melt your data.table to long format
dt.melt <- dt[, list(id = names(.SD), val = unlist(.SD)),
              by=list(marker, alleleA, alleleB)]
# replace `.[0-9]` with nothing
dt.melt[, id := gsub("\\.[0-9]+$", "", id)]
# get max value grouping by marker and id
dt.melt <- dt.melt[, list(alleleA = alleleA[1],
                            alleleB = alleleB[1],
                            val = max(val)),
                   keyby=list(marker, id)][val <= 0.8, val := NA]
# edit mnel (use setattr(,'names') to avoid copy by `names<-` within `setNames`)
dt.cast <- dt.melt[, as.list(setattr(val,'names', id)),
                   by=list(marker, alleleA, alleleB)]
# marker alleleA alleleB X345 X346 X818
# 1: chr3_21902130_21902131_A_T A T NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C T C NA NA NA
# 3: kgp5209280_chr3_21902067 T A 1 1.0000 1.0000
Nested ifelse statement with multiple columns
Thank you, case_when indeed solved my problem:
c <- c %>% mutate(Country = case_when(CountryAP == 109 ~ 'Afghanistan',
                                      CountryAP == 124 ~ 'New Zealand',
                                      CountryEr == 313 ~ 'Sweden',
                                      CountryEr == 287 ~ 'Finland',
                                      CountryEr == 278 ~ 'Azerbaijan'))
nested if / else if conditional on multiple column values - R
If you use case_when() from the dplyr package, it becomes more readable. You can also lose the for loop.
library( dplyr )
df %>%
  mutate( final.cond = case_when(
    !is.na( recount ) ~ recount,
    item == "a" & raw.count > 10 & loc == "in" & side == "L" ~ 0.2 * raw.count,
    item == "a" & raw.count > 10 & loc == "in" & side == "R" ~ 0.6 * raw.count,
    raw.count <= 10 ~ raw.count,
    loc == "out" ~ raw.count,
    TRUE ~ as.numeric(NA)
  ))
Elegant way to do nested if else statements for multiple groups
Solution using data.table:
library(data.table)
setDT(dta)[, rank := sample(1:.N), stratum]
# uniqueID stratum rank
# 1: 952925 group1 4
# 2: 952926 group1 2
# 3: 952927 group1 1
# 4: 952928 group1 6
# 5: 952933 group1 7
# 6: 952934 group1 3
# 7: 952935 group1 5
# 8: 951641 group13 2
# 9: 952923 group13 1
# 10: 952924 group13 3
# ...
Explanation:
- Transform the object into a data.table (setDT())
- Sample rank per group (, stratum]) from 1 to .N (how many rows there are in each group)
multiple if else conditions in pandas dataframe and derive multiple columns
You need a chained comparison using the upper and lower bounds:
import numpy as np

def flag_df(df):
    # df is a single row here, because apply(..., axis=1) passes rows one at a time
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan

df2['Flag'] = df2.apply(flag_df, axis = 1)
student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange
Note: You can do this with a deeply nested np.where, but I prefer to apply a function for multiple if-else conditions.
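For larger frames, the same rules can also be written with boolean masks instead of a row-wise function. The sketch below is only an illustration: it rebuilds df2 from the output table above, and the names low and flag are made up for the example. Note that the chained form a <= x < b only works on scalars (one row at a time), so on whole columns each bound is compared separately and the results are combined with &.

```python
import numpy as np
import pandas as pd

# Sample data copied from the output table above
df2 = pd.DataFrame({
    "student": list("ABCDE"),
    "score": [100, 96, 80, 105, 156],
    "height": [7, 4, 9, 5, 3],
    "trigger1": [84, 95, 15, 78, 16],
    "trigger2": [99, 110, 30, 93, 31],
    "trigger3": [114, 125, 45, 108, 46],
})

low = df2["height"] < 8  # condition shared by the three colour rules

# Start with NaN everywhere, then fill in the rows matching each rule
flag = pd.Series(np.nan, index=df2.index, dtype=object)
flag[low & (df2["trigger1"] <= df2["score"]) & (df2["score"] < df2["trigger2"])] = "Red"
flag[low & (df2["trigger2"] <= df2["score"]) & (df2["score"] < df2["trigger3"])] = "Yellow"
flag[low & (df2["trigger3"] <= df2["score"])] = "Orange"

df2["Flag"] = flag
print(df2)
```

Rows that match none of the rules simply keep the initial NaN, so no separate branch is needed for them.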
Edit: answering @Cecilia's questions
- What if the returned object is not a string but a calculation? For example, for the first condition, we want to return df['height']*2.
Not sure what you tried, but you can return a derived value instead of a string using:
def flag_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
- What if there are 'NaN' values in some columns and I want to use df['xxx'] is None as a condition? The code does not seem to work.
Again, not sure what code you tried, but using pandas isnull would do the trick:
def flag_df(df):
    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
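The likely reason an is None test does not work is that a missing value in a numeric column is stored as a float NaN, which is a different object from None. A quick check, using a made-up value rather than your data:

```python
import numpy as np
import pandas as pd

value = np.nan  # how a missing number typically appears in a numeric column

is_none = value is None           # False: NaN is a float, not None
equals_itself = value == value    # False: NaN never compares equal, even to itself
is_missing = pd.isnull(value)     # True
none_is_missing = pd.isnull(None) # True: isnull also treats None as missing

print(is_none, equals_itself, is_missing, none_is_missing)
```

So pd.isnull covers both cases, whereas is None and == np.nan both miss a NaN.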
Nested ifelse with varying columns in data.table
Working on your 'for' loop and taking advantage of the list structure of a data.table:
ans_col = rep_len(NA_character_, nrow(dt))
ans_val = rep_len(NA_real_, nrow(dt))
for(col in selected_cols) {
i = is.na(ans_col) & (!is.na(dt[[col]]))
ans_col[i] = col
ans_val[i] = dt[[col]][i]
}
data.frame(ans_val, ans_col)
# ans_val ans_col
#1 NA <NA>
#2 84 V3
#3 61 V1
#4 63 V1
#5 82 V4
#6 53 V4
#7 92 V4
New column with nested if else based on two columns, ignoring NA if present in only one other column
The simplest way I know of to do this is with dplyr::coalesce:
dplyr::coalesce(c(1,0,0,NA), c(1, NA, 1, 1))
#> [1] 1 0 0 1
Why bother writing the expression to do it if someone has done it for you? ;)
Using if else on a dataframe across multiple columns
For your example dataset this will work:
Option 1, name the columns to change:
dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA
In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then the following may be simpler than naming each of the columns to change:
Option 2, select columns using a range:
dat[which(dat$desc == "blank"), 2:40] <- NA
Option 3, exclude the 1st column:
dat[which(dat$desc == "blank"), -1] <- NA
Option 4, exclude a named column:
dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA
As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.