Nested If Else Statements Over a Number of Columns

Nested if else statements over a number of columns

Edit: Updated solution using the fast melt/dcast methods implemented in data.table versions >= 1.9.0.

require(data.table)
require(reshape2)
dt <- as.data.table(df)

# melt data.table
dt.m <- melt(dt, id=c("marker", "alleleA", "alleleB"),
             variable.name="id", value.name="val")
dt.m[, id := gsub("\\.[0-9]+$", "", id)] # strip a trailing `.<digits>` suffix from id
# aggregation
dt.m <- dt.m[, list(alleleA = alleleA[1],
                    alleleB = alleleB[1], val = max(val)),
             keyby=list(marker, id)][val <= 0.8, val := NA]
# casting back
dt.c <- dcast.data.table(dt.m, marker + alleleA + alleleB ~ id)
#                        marker alleleA alleleB X345   X346   X818
# 1: chr3_21902130_21902131_A_T       A       T   NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C       T       C   NA     NA     NA
# 3:   kgp5209280_chr3_21902067       T       A    1 1.0000 1.0000

Solution 1: Probably not the best way, but this is what I could think of at the moment:

mm <- t(apply(df[-(1:3)], 1, function(x) tapply(x, gl(3,3), max)))
mode(mm) <- "numeric"
mm[mm < 0.8] <- NA
# you can set the column names of mm here if necessary
out <- cbind(df[, 1:3], mm)

#                       marker alleleA alleleB      1  2      3
# 1   kgp5209280_chr3_21902067       T       A 1.0000  1 1.0000
# 2 chr3_21902130_21902131_A_T       A       T 0.8626 NA 0.8626
# 3 chr3_21902134_21902135_T_C       T       C     NA NA     NA

gl(3,3) gives a factor with values 1,1,1,2,2,2,3,3,3 and levels 1,2,3. tapply therefore splits the values x of a row into three consecutive groups of three (the first 3, the next 3 and the last 3) and takes the max of each group, while apply passes the rows to that function one at a time.


Solution 2: A data.table solution with melt and cast within data.table without using reshape or reshape2:

require(data.table)
dt <- data.table(df)
# melt your data.table to long format
dt.melt <- dt[, list(id = names(.SD), val = unlist(.SD)),
              by=list(marker, alleleA, alleleB)]
# strip a trailing `.<digits>` suffix from id
dt.melt[, id := gsub("\\.[0-9]+$", "", id)]
# get max value grouping by marker and id
dt.melt <- dt.melt[, list(alleleA = alleleA[1],
                          alleleB = alleleB[1],
                          val = max(val)),
                   keyby=list(marker, id)][val <= 0.8, val := NA]
# edit mnel (use setattr(,'names') to avoid the copy made by `names<-` within `setNames`)
dt.cast <- dt.melt[, as.list(setattr(val,'names', id)),
                   by=list(marker, alleleA, alleleB)]

#                        marker alleleA alleleB X345   X346   X818
# 1: chr3_21902130_21902131_A_T       A       T   NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C       T       C   NA     NA     NA
# 3:   kgp5209280_chr3_21902067       T       A    1 1.0000 1.0000

Nested ifelse statement with multiple columns

Thank you, case_when indeed solved my problem:

library(dplyr)
c <- c %>%
  mutate(Country = case_when(CountryAP == 109 ~ 'Afghanistan',
                             CountryAP == 124 ~ 'New Zealand',
                             CountryEr == 313 ~ 'Sweden',
                             CountryEr == 287 ~ 'Finland',
                             CountryEr == 278 ~ 'Azerbaijan'))

nested if / else if conditional on multiple column values - R

If you use case_when() from the dplyr package, it becomes more readable. You can also lose the for loop.

library( dplyr )
df %>%
  mutate( final.cond = case_when(
    !is.na( recount ) ~ recount,
    item == "a" & raw.count > 10 & loc == "in" & side == "L" ~ 0.2 * raw.count,
    item == "a" & raw.count > 10 & loc == "in" & side == "R" ~ 0.6 * raw.count,
    raw.count <= 10 ~ raw.count,
    loc == "out" ~ raw.count,
    TRUE ~ NA_real_
  ))

Elegant way to do nested if else statements for multiple groups

Solution using data.table:

library(data.table)
setDT(dta)[, rank := sample(1:.N), stratum]

#     uniqueID stratum rank
#  1:   952925  group1    4
#  2:   952926  group1    2
#  3:   952927  group1    1
#  4:   952928  group1    6
#  5:   952933  group1    7
#  6:   952934  group1    3
#  7:   952935  group1    5
#  8:   951641 group13    2
#  9:   952923 group13    1
# 10:   952924 group13    3
# ...

Explanation:

  1. Convert the object to a data.table by reference (setDT())
  2. Within each group (the by argument, stratum), assign rank as a random permutation of 1 to .N (the number of rows in that group)

multiple if else conditions in pandas dataframe and derive multiple columns

You need a chained comparison with a lower and an upper bound:

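import numpy as np

def flag_df(df):
    # with axis=1, df is a single row (a Series), so each comparison is a plain boolean
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
    # any other row (e.g. height == 8 or score < trigger1) falls through and returns None

df2['Flag'] = df2.apply(flag_df, axis = 1)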

  student  score  height  trigger1  trigger2  trigger3    Flag
0       A    100       7        84        99       114  Yellow
1       B     96       4        95       110       125     Red
2       C     80       9        15        30        45     NaN
3       D    105       5        78        93       108  Yellow
4       E    156       3        16        31        46  Orange

Note: You can do this with deeply nested np.where calls, but I prefer to apply a function when there are multiple if-else branches.
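For comparison, here is a minimal sketch of a vectorized alternative that avoids both the row-wise apply and the nested np.where. It swaps in np.select, which is not what the answer above uses, and it rebuilds df2 from the output table shown above. The names short, conditions and choices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the output table above.
df2 = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "score": [100, 96, 80, 105, 156],
    "height": [7, 4, 9, 5, 3],
    "trigger1": [84, 95, 15, 78, 16],
    "trigger2": [99, 110, 30, 93, 31],
    "trigger3": [114, 125, 45, 108, 46],
})

# Hypothetical helper name: rows that pass the height check.
short = df2["height"] < 8

# One boolean Series per branch, in the same order as the if/elif chain.
# Chained comparisons (a <= x < b) do not work on whole columns,
# so each bound is written out and combined with &.
conditions = [
    short & (df2["trigger1"] <= df2["score"]) & (df2["score"] < df2["trigger2"]),
    short & (df2["trigger2"] <= df2["score"]) & (df2["score"] < df2["trigger3"]),
    short & (df2["trigger3"] <= df2["score"]),
]
choices = ["Red", "Yellow", "Orange"]

# The first matching condition wins; rows matching none get the default.
flags = np.select(conditions, choices, default=None)
df2["Flag"] = flags
```

Rows that match no condition get None here rather than NaN, so the missing marker may display differently from the apply version.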

Edit: answering @Cecilia's questions

  1. What if the returned object is not a string but a calculation? For example, for the first condition, we want to return df['height']*2.

I'm not sure what you tried, but you can return a derived value instead of a string:

def flag_df(df):
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan

  2. What if there are 'NaN' values in some columns and I want to use df['xxx'] is None as a condition? The code does not seem to work.

Again, I'm not sure what code you tried, but pandas' isnull does the trick:

import pandas as pd

def flag_df(df):
    # check for a missing height first, before any comparison uses it
    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
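A likely reason the `is None` test fails is that a missing numeric value in pandas is stored as NaN, which is a float and not None. The following small sketch illustrates this on a made-up single row. The Series row and the variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical row with a missing height.
row = pd.Series({"height": np.nan, "score": 96.0})

# NaN is a float value, so an identity check against None is False.
is_none = row["height"] is None

# pd.isnull recognises NaN (and None) as missing.
is_null = pd.isnull(row["height"])

# Ordering comparisons against NaN evaluate to False, so a missing
# height would silently skip every `height < 8` branch.
compares = row["height"] < 8
```

This is also why the isnull check sits at the top of the function: once a NaN reaches the comparison branches, none of them can match.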

Nested ifelse with varying columns in data.table

Working from your 'for' loop and taking advantage of the fact that a data.table is a list of columns:

ans_col = rep_len(NA_character_, nrow(dt))
ans_val = rep_len(NA_real_, nrow(dt))
for(col in selected_cols) {
    i = is.na(ans_col) & (!is.na(dt[[col]]))
    ans_col[i] = col
    ans_val[i] = dt[[col]][i]
}
data.frame(ans_val, ans_col)
#  ans_val ans_col
#1      NA    <NA>
#2      84      V3
#3      61      V1
#4      63      V1
#5      82      V4
#6      53      V4
#7      92      V4

New column with nested if else based on two columns, ignoring NA if present in only one other column

The simplest way I know of to do this is with dplyr::coalesce:


dplyr::coalesce(c(1,0,0,NA), c(1, NA, 1, 1))
#> [1] 1 0 0 1

Why bother writing the expression to do it if someone has done it for you? ;)

Using if else on a dataframe across multiple columns

For your example dataset this will work:

Option 1, name the columns to change:

dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA

In your actual data with 40 columns, if you just want to set the last 39 columns to NA, then the following may be simpler than naming each of the columns to change:

Option 2, select columns using a range:

dat[which(dat$desc == "blank"), 2:40] <- NA

Option 3, exclude the 1st column:

dat[which(dat$desc == "blank"), -1] <- NA

Option 4, exclude a named column:

dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA

As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.


