Using ifelse statement on the whole dataset instead of a single column
You don't need ifelse here: comparison operators work on the whole data frame at once, so you could do:
DF[] <- as.integer(DF > 0)
DF
# Var1 Var2 Var3
#1 1 1 1
#2 1 1 0
#3 0 1 1
#4 1 1 1
#5 1 0 1
If you would rather keep the original columns and append binary versions, start again from the original DF (data below) and try:
DF[paste0(names(DF), "_Binary")] <- as.integer(DF > 0)
DF
# Var1 Var2 Var3 Var1_Binary Var2_Binary Var3_Binary
#1 1 1 1 1 1 1
#2 3 2 0 1 1 0
#3 0 1 2 0 1 1
#4 3 3 1 1 1 1
#5 5 0 3 1 0 1
Data:
DF <- structure(list(Var1 = c(1L, 3L, 0L, 3L, 5L), Var2 = c(1L, 2L,
1L, 3L, 0L), Var3 = c(1L, 0L, 2L, 1L, 3L)), .Names = c("Var1",
"Var2", "Var3"), row.names = c(NA, -5L), class = "data.frame")
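To see the whole-data-frame idea end to end, here is a small sketch that rebuilds DF from the data above and compares the as.integer() trick with a column-by-column ifelse() applied through lapply(). The names bin_int and bin_ifelse are only illustrative.

```r
# Sample data from the answer above
DF <- data.frame(Var1 = c(1L, 3L, 0L, 3L, 5L),
                 Var2 = c(1L, 2L, 1L, 3L, 0L),
                 Var3 = c(1L, 0L, 2L, 1L, 3L))

# DF > 0 compares every cell at once and returns a logical matrix;
# as.integer() flattens it column by column and DF[] <- refills the existing columns
bin_int <- DF
bin_int[] <- as.integer(DF > 0)

# The same result with ifelse(), applied to one column at a time
bin_ifelse <- DF
bin_ifelse[] <- lapply(DF, function(col) ifelse(col > 0, 1L, 0L))

bin_int
identical(as.matrix(bin_int), as.matrix(bin_ifelse))
```

Both keep DF's column names because `DF[] <-` only replaces the contents, not the structure.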
How to shift value with inside nested ifelse statement?
Using dplyr, one way is to group_by group_UID and the cumulative occurrence of the "Non" value, then assign NA to the first row of each group and the group's first id otherwise.
library(dplyr)
df %>%
group_by(group_UID, group = cumsum(Amount_type == "Non")) %>%
mutate(p_RID = ifelse(row_number() == 1, NA, id[1L])) %>%
ungroup() %>%
select(-group)
# id user_id Amount_type group_UID p_RID
# <int> <int> <fct> <int> <int>
#1 30 11 Non 1 NA
#2 31 11 Draw 1 30
#3 54 5 Non 2 NA
#4 322 5 Draw 2 54
#5 21 5 Draw 2 54
#6 13 5 Non 2 NA
#7 2445 5 Draw 2 13
#8 111 44 Non 3 NA
#9 287 44 Draw 3 111
Another way would be
df %>%
group_by(group_UID, group = cumsum(Amount_type == "Non")) %>%
mutate(p_RID = ifelse(Amount_type == "Non", NA, first(id))) %>%
ungroup() %>%
select(-group)
We can also use base R's ave here:
with(df, ave(id, group_UID, cumsum(Amount_type == "Non"), FUN = function(x)
ifelse(seq_along(x) == 1, NA, x[1L])))
#[1] NA 30 NA 54 54 NA 13 NA 111
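The key step in all three versions is `cumsum(Amount_type == "Non")`, which starts a new run at every "Non" row. Below is a base R sketch that makes that run counter visible. The df here is reconstructed from the printed output above (the question's full data isn't shown), and `run` is an illustrative name.

```r
# Sample data reconstructed from the printed output above
df <- data.frame(
  id          = c(30L, 31L, 54L, 322L, 21L, 13L, 2445L, 111L, 287L),
  user_id     = c(11L, 11L, 5L, 5L, 5L, 5L, 5L, 44L, 44L),
  Amount_type = c("Non", "Draw", "Non", "Draw", "Draw", "Non", "Draw", "Non", "Draw"),
  group_UID   = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L)
)

# TRUE counts as 1, so the counter increases at every "Non" row
run <- cumsum(df$Amount_type == "Non")
run
# [1] 1 1 2 2 2 3 3 4 4

# Within each (group_UID, run) pair: NA for the first row, the run's first id otherwise
p_RID <- with(df, ave(id, group_UID, run, FUN = function(x)
  ifelse(seq_along(x) == 1, NA, x[1L])))
p_RID
```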
Using the ifelse statement in R
You can do it in a two-liner instead:
z <- data$x
# drop = FALSE keeps a matrix even when only one column has ind == 0
z[data$ind == 0] <- colSums(m[, data$ind == 0, drop = FALSE])
z
[1] -1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029 0.7383247 3.4748021
[10] -0.3053884
More generally, you could use an apply function, although this will usually be slower than a straight vectorised solution like the one above. Here's sapply:
sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))})
[1] -1.3367324 0.1836433 1.3413668 1.5952808 4.5120996 -0.8204684 1.2736029 0.7383247 3.4748021
[10] -0.3053884
A benchmark:
microbenchmark::microbenchmark(
sapply = sapply(1:nrow(data), function(x){ifelse(data$ind[x] == 1, data$x[x], sum(m[, x]))}),
vectorised = {z <- data$x;
z[data$ind == 0] <- colSums(m[,data$ind == 0])})
Unit: microseconds
expr min lq mean median uq max neval cld
sapply 391.297 408.193 423.6525 412.4170 423.7450 853.249 100 b
vectorised 197.377 199.873 208.7701 202.5605 214.4645 284.545 100 a
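Because the question's data and m aren't shown here, this sketch uses small hypothetical stand-ins (integer values so the sums are easy to check by hand; one column of m per row of data) to confirm that the vectorised and sapply versions agree.

```r
# Hypothetical stand-ins for the question's objects
data <- data.frame(x = seq(10, 100, by = 10),
                   ind = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0))
m <- matrix(1:20, nrow = 2)   # column j sums to 4 * j - 1

# Vectorised: start from x, overwrite the ind == 0 positions with column sums
z <- data$x
z[data$ind == 0] <- colSums(m[, data$ind == 0, drop = FALSE])

# Row-by-row equivalent with sapply
s <- sapply(seq_len(nrow(data)), function(i)
  ifelse(data$ind[i] == 1, data$x[i], sum(m[, i])))

z   # expected values: 10 7 30 40 19 60 27 80 90 39
all.equal(z, s)
```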
R ifelse statement
Add another variable B to the dataset and use the ifelse function, so you get 0 for "N" and 1 for "Y" values (any value other than "N" becomes 1):
Dataset$B <- ifelse(Dataset$A == "N", 0, 1)
Or you can apply ifelse to the same variable, overwriting it:
Dataset$A <- ifelse(Dataset$A == "N", 0, 1)
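Because the test only checks for "N", anything else, including typos, is coded as 1. This sketch uses a hypothetical Dataset to show that, alongside a stricter alternative that codes only an exact "Y" as 1 (column C is illustrative).

```r
# Hypothetical data; the lowercase "y" is deliberate
Dataset <- data.frame(A = c("N", "Y", "Y", "N", "y"), stringsAsFactors = FALSE)

# ifelse(): "N" -> 0, everything else -> 1
Dataset$B <- ifelse(Dataset$A == "N", 0, 1)

# Stricter: only an exact "Y" -> 1, everything else -> 0
Dataset$C <- as.integer(Dataset$A == "Y")

Dataset
```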
Using If/Else on a data frame
Use ifelse:
frame$twohouses <- ifelse(frame$data>=2, 2, 1)
frame
data twohouses
1 0 1
2 1 1
3 2 2
4 3 2
5 4 2
...
16 0 1
17 2 2
18 1 1
19 2 2
20 0 1
21 4 2
The difference between if and ifelse:
if is a control flow statement, taking a single logical value as an argument.
ifelse is a vectorised function, taking vectors as all its arguments.
The help page for if, accessible via ?"if", will also point you to ?ifelse.
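A short sketch of the difference, using a hypothetical vector x. The last part relies on a change in R 4.2.0, where giving if() a condition of length greater than one became an error (earlier versions only warned), so the call is wrapped in tryCatch() to show either message.

```r
x <- c(0, 1, 2, 3, 4)

# ifelse() is vectorised: one result per element of x
twohouses <- ifelse(x >= 2, 2, 1)
twohouses

# if() expects a single TRUE/FALSE, so it works on one value at a time
label <- if (x[3] >= 2) "two or more" else "fewer than two"
label

# Passing the whole vector to if(): error in R >= 4.2.0, warning before that
msg <- tryCatch(if (x >= 2) 2 else 1,
                error   = function(e) conditionMessage(e),
                warning = function(w) conditionMessage(w))
msg
```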
Using ifelse statement in R dataframe to generate additional variables
Up front: I think the use of ifelse statements in this problem is strongly ill-advised. It requires significant nesting, sacrificing performance and readability. Though these two solutions may be a little harder to follow if you aren't familiar with mapply or table-join-calculus, the payoff in stability and performance will far outweigh the time to learn these techniques.
Two methods:
Lookup matrix
One way is to define look-up matrices, where the row names reflect the possible V1 values and the column names reflect the possible V2 values. (Note that when indexing these lookup matrices you must use as.character if your values are numeric/integer; otherwise R treats them as row/column positions rather than matching them against the row/column names.)
Examples:
dat <- data.frame(
V1 = c(0,0,0,1,1,1,2,2,2),
V2 = c(0,1,2,0,1,2,0,1,2)
)
dmnms <- list(c(0,1,2), c(0,1,2))
m3 <- matrix(c(0, 1, 2,
0, NA, 1,
0, 0, 0),
nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
"AB", NA, "CD",
"BB", "BC", "CC"),
nrow = 3, byrow = TRUE, dimnames = dmnms)
m3
# 0 1 2
# 0 0 1 2
# 1 0 NA 1
# 2 0 0 0
m4
# 0 1 2
# 0 "AA" "AD" "DD"
# 1 "AB" NA "CD"
# 2 "BB" "BC" "CC"
In this case, notice the 0, 1, and 2 in the row/column margins. In a matrix without names these would typically be shown as [1,], [2,], etc., indicating that no names are available and reflecting just the row number. However, since these are character names (no brackets/commas), they can be referenced directly, like so:
m3["0","2"]
# [1] 2
m4["1","0"]
# [1] "AB"
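Here is a small sketch of the as.character pitfall mentioned earlier, rebuilding m3 from above. With numeric subscripts R indexes by position, so a value of 0 selects nothing, and an offset of +1 only happens to work because the names are 0, 1, 2 in order.

```r
dmnms <- list(c(0, 1, 2), c(0, 1, 2))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0),
             nrow = 3, byrow = TRUE, dimnames = dmnms)

v1 <- 0; v2 <- 2

# Character subscripts match the dimnames: row "0", column "2"
m3[as.character(v1), as.character(v2)]   # 2

# Numeric subscripts are positions: row 0 selects no rows
m3[v1, v2]                               # numeric(0)

# Position-based lookup only lines up here because the names are 0, 1, 2
m3[v1 + 1, v2 + 1]                       # 2
```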
From here, you just need to map these lookups into new columns, something like:
dat$V3 <- mapply(`[`, list(m3), as.character(dat$V1), as.character(dat$V2))
dat$V4 <- mapply(`[`, list(m4), as.character(dat$V1), as.character(dat$V2))
dat
# V1 V2 V3 V4
# 1 0 0 0 AA
# 2 0 1 1 AD
# 3 0 2 2 DD
# 4 1 0 0 AB
# 5 1 1 NA <NA>
# 6 1 2 1 CD
# 7 2 0 0 BB
# 8 2 1 0 BC
# 9 2 2 0 CC
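As an alternative to mapply (not part of the original answer), base R's `[` also accepts a two-column character matrix when the matrix has dimnames, with each row of the index giving one (row name, column name) pair. This sketch rebuilds dat, m3, and m4 from above; idx is an illustrative name.

```r
dat <- data.frame(V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
                  V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2))
dmnms <- list(c(0, 1, 2), c(0, 1, 2))
m3 <- matrix(c(0, 1, 2,
               0, NA, 1,
               0, 0, 0), nrow = 3, byrow = TRUE, dimnames = dmnms)
m4 <- matrix(c("AA", "AD", "DD",
               "AB", NA, "CD",
               "BB", "BC", "CC"), nrow = 3, byrow = TRUE, dimnames = dmnms)

# One row per lookup: column 1 = row name, column 2 = column name
idx <- cbind(as.character(dat$V1), as.character(dat$V2))

# A single vectorised subset per lookup matrix
dat$V3 <- m3[idx]
dat$V4 <- m4[idx]
dat
```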
Joining data.frame
Another method is to join a known data.frame onto your data. This has the added benefit of easily expanding to more than two criteria. (Technically, the matrix method can expand to more than two, in which case it would be an n-dim array, but that is often a little harder to edit, manage, and visualize.)
In your example, this doesn't initially gain you much, since you need to pre-define your data.frame, but I'm guessing that this is just representative data, and your conditional classification is on much more data.
I'll define the joiner data.frame that will be used against your actual data. This is the reference data, which maps every input combination of V1 and V2 to its respective V3 and V4 values.
joiner <- data.frame(
V1 = c(0,0,0,1,1,1,2,2,2),
V2 = c(0,1,2,0,1,2,0,1,2),
V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
stringsAsFactors = FALSE
)
I'll create a second sample data set to demonstrate the merge:
dat2 <- data.frame(
V1 = c(2, 0, 1, 0),
V2 = c(0, 1, 2, 2)
)
merge(dat2, joiner, by = c("V1", "V2"))
# V1 V2 V3 V4
# 1 0 1 1 AD
# 2 0 2 2 DD
# 3 1 2 1 CD
# 4 2 0 0 BB
Edit: if you are concerned about dropping rows, add all.x=TRUE to the merge command. If (as you saw based on your comment) you use all=TRUE, this is a full join in SQL parlance, meaning it will keep all rows from both tables, even where no match is made. This may be better explained by referencing this answer and noting that I'm suggesting a left join with all.x, which keeps all rows on the left (first argument) and only merges in rows on the right where a match is made.
(Note: this can also be done quite easily using the dplyr and data.table packages.)
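To see what all.x=TRUE changes, this sketch reuses the joiner above with a hypothetical dat3 whose last row (V1 = 3) has no match. The default merge drops that row; the left join keeps it with NA for V3 and V4.

```r
joiner <- data.frame(
  V1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
  V2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2),
  V3 = c(0, 1, 2, 0, NA, 1, 0, 0, 0),
  V4 = c("AA", "AD", "DD", "AB", NA, "CD", "BB", "BC", "CC"),
  stringsAsFactors = FALSE
)

# Hypothetical input; (3, 0) does not appear in joiner
dat3 <- data.frame(V1 = c(2, 0, 3), V2 = c(0, 1, 0))

inner <- merge(dat3, joiner, by = c("V1", "V2"))                # unmatched row dropped
left  <- merge(dat3, joiner, by = c("V1", "V2"), all.x = TRUE)  # unmatched row kept

nrow(inner)  # 2
nrow(left)   # 3
left
```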
ifelse statement in R to assign values to a new column
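The following should work:
Trial <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
ContourFix <- c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0)
# Last row of each trial: positions where Trial steps up by 1, plus the final row
trial.ends <- c(which(diff(Trial) == 1), length(Trial))
# Rows where ContourFix is 1
one.starts <- which(ContourFix == 1)
TrialFix <- rep(0, length(Trial))
# Set 1 from each ContourFix == 1 row through the end of its trial
for (i in seq_along(one.starts)) {
  TrialFix[one.starts[i]:trial.ends[i]] <- 1
}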
It's a bit hacky but should serve your purposes. It requires that every trial has exactly one ContourFix value of 1 (the loop pairs the i-th 1 with the i-th trial end) and that your data is grouped by Trial as in the example.
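A grouped alternative (a different technique from the loop above) is ave() with cummax(): within each Trial, the running maximum of ContourFix switches to 1 at the first 1 and stays there. This sketch assumes rows within each trial are already in time order. Unlike the loop, it does not need exactly one 1 per trial, and trials with no 1 simply stay 0. TrialFix2 is an illustrative name.

```r
Trial      <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)
ContourFix <- c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0)

# Running maximum of ContourFix, restarted for each Trial
TrialFix2 <- ave(ContourFix, Trial, FUN = cummax)
TrialFix2
# [1] 1 1 1 1 0 1 1 1 1 1
```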
Using if else on a dataframe across multiple columns
For your example dataset, this will work:
Option 1, name the columns to change:
dat[which(dat$desc == "blank"), c("x", "y", "z")] <- NA
In your actual data with 40 columns, if you just want to set the last 39 columns to NA, the following may be simpler than naming each of the columns to change:
Option 2, select columns using a range:
dat[which(dat$desc == "blank"), 2:40] <- NA
Option 3, exclude the 1st column:
dat[which(dat$desc == "blank"), -1] <- NA
Option 4, exclude a named column:
dat[which(dat$desc == "blank"), !names(dat) %in% "desc"] <- NA
As you can see, there are many ways to do this kind of operation (this is far from a complete list), and understanding how each of these options works will help you to get a better understanding of the language.
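Here is a runnable sketch of Option 4 on a small hypothetical dat. It includes an NA in desc to show one practical reason for which(): it returns only the positions that are TRUE, whereas a bare logical subscript containing NA is rejected in data frame assignment.

```r
# Hypothetical data: desc plus a few value columns; row 3 has a missing desc
dat <- data.frame(desc = c("a", "blank", NA, "blank"),
                  x = 1:4, y = 5:8, z = 9:12,
                  stringsAsFactors = FALSE)

# which() gives c(2, 4); the NA comparison in row 3 is simply skipped
rows <- which(dat$desc == "blank")

# Option 4: every column except desc, only on the "blank" rows
dat[rows, !names(dat) %in% "desc"] <- NA
dat
```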
R ifelse to replace values in a column
This should work, using the working example:
var <- c("Private", "Private", "?", "Private")
df <- data.frame(var)
df$var[which(df$var == "?")] = "Private"
This will replace the "?" values with "Private".
The reason your replacement isn't working (I think) is that, if the value in df$var isn't "?", it replaces that element of the vector with the whole df$var column, rather than just reinserting the element you want.
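which() isn't strictly required here: when the column has no NA, a plain logical subscript does the same job. This sketch sets stringsAsFactors = FALSE so var stays character (that only became the data.frame() default in R 4.0.0); df_which and df_lgl are illustrative names.

```r
var <- c("Private", "Private", "?", "Private")

# which() version from the answer above
df_which <- data.frame(var, stringsAsFactors = FALSE)
df_which$var[which(df_which$var == "?")] <- "Private"

# Plain logical subscript: only the elements equal to "?" are replaced
df_lgl <- data.frame(var, stringsAsFactors = FALSE)
df_lgl$var[df_lgl$var == "?"] <- "Private"

identical(df_which, df_lgl)
```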