R Gotcha: Logical-And Operator for Combining Conditions Is & Not &&

From the help page for Logical Operators, accessible by ?"&&":

& and && indicate logical AND and | and || indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in if clauses.

(R version 2.13-0)

In other words, when using subset, use the single &.

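Here is an illustration of the difference (output from an older R version; since R 4.3.0, && signals an error when an operand has length greater than one):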

c(1,1,0,0) & c(1,0,1,0)
[1]  TRUE FALSE FALSE FALSE

c(1,1,0,0) && c(1,0,1,0)
[1] TRUE

If this looks quirky compared to other programming paradigms, remember that R needs to provide a vectorised form of the operator.
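
To tie this back to the subset advice above, here is a minimal sketch with a made-up data frame (the name scores and its columns are invented for illustration and do not come from the help page). The single & builds one TRUE/FALSE per row, which is what subset needs.

```r
# Hypothetical data, invented for illustration only.
scores <- data.frame(id   = 1:4,
                     math = c(90, 55, 80, 40),
                     art  = c(70, 85, 45, 30))

# Element-wise AND: one logical value per row.
keep <- scores$math > 50 & scores$art > 50
print(keep)          # TRUE TRUE FALSE FALSE

# subset() expects exactly such a per-row logical vector.
passed <- subset(scores, math > 50 & art > 50)
print(passed$id)     # 1 2
```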

Is there a reason to prefer '&&' over '&' in 'if' statements, other than short-circuiting?

Short answer: Yes, the different symbol makes the meaning clearer to the reader.

Thanks for this interesting question! If I can summarize, it seems to be a follow-up specifically about this section of my answer to the question you linked,

... you want to use the long forms only when you are certain the
vectors are length one. You should be absolutely certain your vectors
are only length 1, such as in cases where they are functions that
return only length 1 booleans. You want to use the short forms if the
vectors are length possibly >1. So if you're not absolutely sure, you
should either check first, or use the short form and then use all and
any to reduce it to length one for use in control flow statements,
like if.

I hear your question (given comments) this way: But & and && will do the same thing if the inputs are length one, so other than short-circuiting, why prefer &&? Perhaps & should be preferred because if they're not length one, if will give me a warning, helping me be even more certain that the inputs are length one.

First, I agree with the comment by @James that you may be "overstating the value of getting a warning"; if it's not length one, the safer thing will be to handle this appropriately, not to just plow ahead. You could make a case that && should throw an error if they're not length one, and perhaps a good case; I don't know the reason why it does what it does. But without going back in time, the best we can do now is to check that the inputs are indeed appropriate for your use.
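
As a rough sketch of what "checking the inputs" could look like: the helper both_true below is hypothetical, written only to show a length check in front of &&, and the second half shows the alternative from the quoted passage (short form plus all or any).

```r
# Hypothetical helper: returns TRUE only when both inputs are single,
# non-missing logical values and both are TRUE.
both_true <- function(a, b) {
  stopifnot(is.logical(a), length(a) == 1L, !is.na(a),
            is.logical(b), length(b) == 1L, !is.na(b))
  a && b
}

both_true(TRUE, FALSE)   # FALSE

# When inputs may be longer than one, combine with & and reduce explicitly.
x <- c(3, 8, 5)
if (all(x > 0 & x < 10)) {
  message("every element is in range")
}
if (any(x > 7 & x < 10)) {
  message("at least one element is between 7 and 10")
}
```

Passing a longer vector, as in both_true(c(TRUE, TRUE), TRUE), stops with an error instead of quietly plowing ahead.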

Given, then, that you have checked to make sure your inputs are appropriate, I would still recommend && because it semantically reminds me as the reader that I should be making sure the inputs are scalars (length one). I'm so used to thinking vector-ally that this reminder is helpful to me. It follows the principle that different operations should have different symbols, and for me, an operation that is meant for use on scalars is different enough from a vectorized operation that it warrants a different symbol.

(Not to start a flame war (I hope), but this is also why I prefer <- to =; one for assigning variables, one for setting parameters to functions. Although deep down this is the same thing, it's different enough in practice to make the different symbols helpful to me as a reader.)

What is the difference between short (&,|) and long (&&, ||) forms of AND, OR logical operators in R?

& and | are element-wise and can be used in vector operations, whereas && and || always produce a single TRUE or FALSE.

Check the difference:

> x <- 1:5
> y <- 5:1
> (x > 2) & (y < 3) 
  [1] FALSE FALSE FALSE  TRUE  TRUE
> (x > 2) && (y < 3) # in older R versions, && takes only the first elements of the
                     # logical vectors (x > 2) and (y < 3); R >= 4.3.0 raises an error
[1] FALSE

So && and || are commonly used in if (condition) state_1 else state_2 statements, where the condition is a vector of length 1.
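
The same split applies to OR. A small sketch, reusing x and y from above, of the element-wise | and of reducing its result before it reaches an if:

```r
x <- 1:5
y <- 5:1

# Element-wise OR: one result per position.
either <- (x < 2) | (y < 2)
print(either)        # TRUE FALSE FALSE FALSE TRUE

# For an if statement, reduce the vector to a single value first.
if (any(either)) {
  message("at least one position satisfies a condition")
}

# || is meant for single values, for example a length check.
if (length(x) == 0 || length(y) == 0) {
  message("one of the vectors is empty")
}
```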

Correctly Specifying Logical Conditions (in R)

UPDATE:

I think I was able to resolve this problem - now the "logical conditions" are respected in the final output:

#load libraries
library(dplyr)
library(mco)

#define function

funct_set <- function (x) {
    x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]; x7 <- x[7]
    f <- numeric(4)
    
    
    #bin data according to random criteria
    train_data <- train_data %>%
        mutate(cat = ifelse(a1 <= x1 & b1 <= x3, "a",
                            ifelse(a1 <= x2 & b1 <= x4, "b", "c")))
    
    train_data$cat = as.factor(train_data$cat)
    
    #new splits
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1, b1, c1, cat)
    
    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1, b1, c1, cat)
    
    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1, b1, c1, cat)
    
    
    
    #calculate  quantile ("quant") for each bin
    
    table_a = data.frame(a_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[5],1,0 )))
    
    table_b = data.frame(b_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[6],1,0 )))
    
    table_c = data.frame(c_table%>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[7],1,0 )))
    
    f[1] = mean(table_a$quant)
    f[2] = mean(table_b$quant)
    f[3] = mean(table_c$quant)
    
    
    #group all tables
    
    final_table = rbind(table_a, table_b, table_c)
    # calculate the total mean : this is what needs to be optimized
    
    f[4] = mean(final_table$quant)
    
    
    return (f);
}


gn <- function(x) {
    g1 <- x[3] - x[1] 
    g2<- x[4] - x[2] 
    g3 <- x[7] - x[6]
    g4 <- x[6] - x[5] 
    return(c(g1,g2,g3,g4))
}

optimization <- nsga2(funct_set, idim = 7, odim = 4 , constraints = gn, cdim = 4,
                      
                      generations=150,
                      popsize=100,
                      cprob=0.7,
                      cdist=20,
                      mprob=0.2,
                      mdist=20,
                      lower.bounds=c(80,80,80,80,100,200,300),
                      upper.bounds=c(120,120,120,120,200,300,400)
)

Now, if we take a look at the output:

#view output
optimization

All the logical conditions (i.e. the "constraints") are now respected!
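
One way to double-check this is to evaluate the constraint function on each candidate solution. The sketch below assumes that a constraint counts as satisfied when its value is non-negative (my understanding of the mco convention, so treat it as an assumption). The matrix candidates is made up for illustration and only stands in for the parameter rows an optimizer would return.

```r
# Same constraint function as above: each entry should be >= 0
# (assumed convention) when the ordering conditions hold.
gn <- function(x) {
  c(x[3] - x[1], x[4] - x[2], x[7] - x[6], x[6] - x[5])
}

# Hypothetical candidates, one per row (not real optimizer output).
candidates <- rbind(
  c( 90,  95, 100, 110, 150, 250, 350),  # satisfies all four conditions
  c(110,  95, 100, 110, 150, 250, 350)   # violates x[3] >= x[1]
)

# all() collapses the element-wise comparison to one TRUE/FALSE per row.
feasible <- apply(candidates, 1, function(x) all(gn(x) >= 0))
print(feasible)   # TRUE FALSE
```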

Note: if possible, I would still be interested in seeing alternate ways to solve this problem

Thanks everyone!

Understanding when the && operator short circuits

This doesn't make sense to me, because && should evaluate left to
right, and stop as soon as one of its conditions is true.

This is wrong. You are mixing up && with ||:

TRUE && FALSE gives FALSE
- && requires both conditions to be TRUE
- && will short-circuit on FALSE
TRUE || FALSE gives TRUE
- || requires a single condition to be TRUE
- || will short-circuit on TRUE

Also,

TRUE || NA

gives

TRUE
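
A small sketch that makes the short-circuiting visible. The helper noisy is hypothetical; it only prints a message when R actually evaluates it.

```r
# Hypothetical helper that announces when it is evaluated.
noisy <- function(value) {
  message("evaluated: ", value)
  value
}

r1 <- FALSE && noisy(TRUE)   # right side skipped, result FALSE
r2 <- TRUE  || noisy(FALSE)  # right side skipped, result TRUE
r3 <- TRUE  && noisy(FALSE)  # right side evaluated, result FALSE

# NA only decides the result when the other operand cannot.
r4 <- TRUE  || NA   # TRUE
r5 <- FALSE && NA   # FALSE
r6 <- TRUE  && NA   # NA
```

Only the third line prints a message, because it is the only one where the left operand does not already settle the result.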

R - Unexpected output when running a function on a dataframe

You need vectorised ifelse with a single & (instead of &&) if you want to test a condition on every element of a vector.

From ?ifelse

‘ifelse’ returns a value with the same shape as ‘test’ which is
filled with elements selected from either ‘yes’ or ‘no’ depending
on whether the element of ‘test’ is ‘TRUE’ or ‘FALSE’.

From ?`&&`

‘&’ and ‘&&’ indicate logical AND and ‘|’ and ‘||’ indicate
logical OR. The shorter form performs elementwise comparisons in
much the same way as arithmetic operators. The longer form
evaluates left to right examining only the first element of each
vector. Evaluation proceeds only until the result is determined.
The longer form is appropriate for programming control-flow and
typically preferred in ‘if’ clauses.

The short form & performs an element-wise comparison, while && evaluates only the first element of the vector.

Here is an example based on your df

f1 <- function(x) if (x < 32 && x > 0) x + 100 else x - 100;
f2 <- function(x) ifelse(x < 32 & x > 0, x + 100, x - 100);

f1(df$A)
#[1] 110 120 130 140

f2(df$A)
#[1] 110 120 130 -60
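
Note that from R 4.3.0 onward, f1(df$A) stops with an error rather than silently using the first element, because && no longer accepts vectors longer than one. Below is a self-contained sketch; the column values are inferred from the printed results above (the original df is not shown), and it applies the scalar f1 one element at a time as an alternative to ifelse.

```r
# Column values inferred from the printed results above; the real df is not shown.
df <- data.frame(A = c(10, 20, 30, 40))

f1 <- function(x) if (x < 32 && x > 0) x + 100 else x - 100   # scalar input only
f2 <- function(x) ifelse(x < 32 & x > 0, x + 100, x - 100)    # vectorised

# The vectorised version handles the whole column at once.
f2(df$A)
# [1] 110 120 130 -60

# The scalar version gives the same answer when applied element by element.
vapply(df$A, f1, numeric(1))
# [1] 110 120 130 -60
```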

R multiple conditions in row selection of matrix

You need to use a single '&':

dataOnBoth = data[data$value_1 > 0 & data$value_2 > 0,]

See this question for more details.
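
The line above uses $, which suggests data is really a data frame. For an actual matrix, a sketch along the following lines should work; the matrix m and its contents are invented for illustration.

```r
# Hypothetical matrix with named columns, invented for illustration.
m <- cbind(value_1 = c(1, -2,  3, 0),
           value_2 = c(5,  4, -1, 2))

# $ does not work on a matrix, so index columns by name.
# drop = FALSE keeps the result a matrix even if only one row matches.
both_positive <- m[m[, "value_1"] > 0 & m[, "value_2"] > 0, , drop = FALSE]
print(both_positive)
```

Here only the first row has both values above zero, so the result is a one-row matrix.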

Two conditions in one if statement does the second matter if the first is false?

It is common for languages (Java and Python are among them) to evaluate the first argument of a logical AND and stop evaluating the expression if that argument is false.

From The Order of Evaluation of Logic Operators:

When Java evaluates the expression d = b && c;, it first checks whether b is true. Here b is false, so b && c must be false regardless of whether c is or is not true, so Java doesn't bother checking the value of c.

This is known as short-circuit evaluation, and is also referred to in the Java docs.

It is common to see list.count > 0 && list[0] == "Something" to check a list element, if it exists.

It is also worth mentioning that if (list.length > 3 && list[3] == 2) is not equivalent to the nested form

if (list.length > 3) {
    if (list[3] == 2) {
        ...
    }
}

if there is an else afterwards. The else applies only to the if statement to which it is attached.

To demonstrate this gotcha:

if (x.kind.equals("Human")) {
    if (x.name.equals("Jordan")) {
        System.out.println("Hello Jordan!");
    }
} else {
    System.out.println("You are not a human!");
}

will work as expected, but

if (x.kind.equals("Human") && x.name.equals("Jordan")) {
    System.out.println("Hello Jordan!");
} else {
    System.out.println("You are not a human!");
}

will also tell any Human who isn't Jordan they are not human.
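
Since the rest of this page is about R, here is the same gotcha sketched in R. The function names and the "(no message)" placeholder are hypothetical; the placeholder stands in for the Java case where nothing is printed.

```r
# Hypothetical functions mirroring the two Java snippets above.
greet_nested <- function(kind, name) {
  if (kind == "Human") {
    if (name == "Jordan") "Hello Jordan!" else "(no message)"
  } else {
    "You are not a human!"
  }
}

greet_combined <- function(kind, name) {
  if (kind == "Human" && name == "Jordan") {
    "Hello Jordan!"
  } else {
    "You are not a human!"
  }
}

greet_nested("Human", "Alex")     # "(no message)"
greet_combined("Human", "Alex")   # "You are not a human!"
```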
