Why Doesn't Comparison Between Numeric and Character Variables Give a Warning


In your example, 5 is converted to a character string, so the test is equivalent to 'two' < as.character(5).

From ?Comparison:

If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.

comparing unsigned vs signed does not make a warning (using const)

I'd say it's a compiler bug in the -Wsign-compare option.

Test by compiling your example with -Wall -Wextra -O3. With -O3 added, the warning suddenly goes away in the const case, even though the generated machine code is identical with or without const. This doesn't make any sense.

Naturally, neither const nor the generated machine code has any effect on the signedness of the C operands, so the warning should not appear or disappear depending on type qualifiers or optimizer settings.
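As a minimal sketch of the point about qualifiers (the function names are hypothetical and not taken from the original question), both functions below perform the same conversion: the int operand is converted to unsigned before the comparison, whether or not the unsigned operand is const-qualified.

```c
#include <stdbool.h>

/* Hypothetical helper: compares a signed int against a plain unsigned value.
   The usual arithmetic conversions turn `i` into unsigned before comparing. */
bool less_than_plain(int i, unsigned u)
{
    return i < u; /* the kind of comparison -Wsign-compare is meant to catch */
}

/* Same comparison, but the unsigned operand is const-qualified.
   The qualifier does not change how the operands are converted. */
bool less_than_const(int i, const unsigned u)
{
    return i < u;
}
```

With either function, passing -1 and 1u yields false, because -1 is converted to the largest unsigned value before the comparison takes place.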

A warning - comparison between signed and unsigned integer expressions

It is usually a good idea to declare variables as unsigned or size_t if they will be compared to sizes, to avoid this issue. Whenever possible, use the exact type you will be comparing against (for example, use std::string::size_type when comparing with a std::string's length).

Compilers give warnings about comparing signed and unsigned types because the ranges of signed and unsigned ints are different, and when they are compared to one another, the results can be surprising. If you have to make such a comparison, you should explicitly convert one of the values to a type compatible with the other, perhaps after checking to ensure that the conversion is valid. For example:

unsigned u = GetSomeUnsignedValue();
int i = GetSomeSignedValue();

if (i >= 0)
{
    // i is nonnegative, so it is safe to cast to unsigned value
    if ((unsigned)i >= u)
        iIsGreaterThanOrEqualToU();
    else
        iIsLessThanU();
}
else
{
    iIsNegative();
}
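The same check can be wrapped in a small reusable function. This is only a sketch: compare_int_unsigned is a hypothetical name, and the -1/0/1 return convention is an assumption made for illustration.

```c
/* Hypothetical helper: three-way comparison of a signed and an unsigned value.
   Returns -1 if i < u, 0 if they are equal, and 1 if i > u. */
int compare_int_unsigned(int i, unsigned u)
{
    if (i < 0)
        return -1;               /* a negative int is below every unsigned value */

    unsigned ui = (unsigned)i;   /* safe: i is nonnegative here */
    if (ui < u)
        return -1;
    if (ui > u)
        return 1;
    return 0;
}
```

Handling the negative case first means the cast never has to represent a negative value, so the remaining comparisons are between two unsigned operands and produce no sign-compare warning.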

Unexpected result of greater than or less than comparison on PHP 8

There is no obviously correct result for a comparison between a string and a number. In many languages, it would just give an error; in others, including PHP, the language tries to make sense of it by converting both operands to the same type, but this involves a judgement of which type to "prefer".


Historically, PHP has preferred comparing numbers to comparing strings: it treated "U0M262" > 100000 as (int)"U0M262" > 100000. Since (int)"U0M262" has no obvious value, it is evaluated as 0, and the expression becomes 0 > 100000, which is false.

As of PHP 8, this behaviour has changed and PHP now only uses a numeric comparison for "numeric strings", e.g. "42" clearly "looks like" 42.

Since "U0M262" doesn't fit the requirements for a numeric string, "U0M262" > 100000 is now treated as "U0M262" > (string)100000. This does a byte-wise comparison of the sort order for the two strings, and finds that since "U" comes after "1" in ASCII (and any ASCII-derived encoding, including UTF-8), the result is true.


Because of how ASCII (and compatible encodings such as UTF-8) is arranged:

  • A string starting with a control character or space will be "less than" any number
  • A string starting with a letter will be "more than" any number
  • A string starting with any of "! " # $ % & ' ( ) * + , - . /" will be "less than" any number
  • For a string starting with a digit, you need to look at the individual bytes
  • Any other string will be "more than" any number
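The byte ordering behind these rules can be checked in C, where strcmp also compares strings byte by byte. This sketch only illustrates the ASCII ordering; it is not PHP's comparison algorithm, and sorts_after is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `a` sorts after `b` byte by byte, else 0.
   strcmp compares the bytes as unsigned char, which matches ASCII order. */
int sorts_after(const char *a, const char *b)
{
    return strcmp(a, b) > 0;
}
```

For example, sorts_after("U0M262", "100000") is 1 because 'U' (0x55) comes after '1' (0x31), while sorts_after("#tag", "100000") and sorts_after(" abc", "100000") are both 0 because '#' (0x23) and the space (0x20) come before every digit.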

As ever, you can tell PHP which comparison you intended, and get the correct behaviour in all versions, using explicit casts:

var_dump((int)"U0M262" > (int)100000); // bool(false)
var_dump((string)"U0M262" > (string)100000); // bool(true)

(Obviously, this makes no sense if you're hard-coding both sides anyway, but assuming one or both is a variable, this is how you'd do it.)

Multi-character constant warnings

According to the C standard (§6.4.4.4/10):

The value of an integer character constant containing more than one
character (e.g., 'ab'), [...] is implementation-defined.

long x = '\xde\xad\xbe\xef'; // yes, single quotes

This is valid ISO 9899:2011 C. It compiles without warning under gcc with -Wall, and with a “multi-character character constant” warning under -pedantic.

From Wikipedia:

Multi-character constants (e.g. 'xy') are valid, although rarely
useful — they let one store several characters in an integer (e.g. 4
ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one).
Since the order in which the characters are packed into one int is not
specified, portable use of multi-character constants is difficult.

For portability's sake, don't use multi-character constants with integral types.
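One portable alternative, sketched here under the assumption that a fixed big-endian packing order is what you want, is to build the value from individual bytes with explicit shifts. The name pack_be32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: packs four bytes into a 32-bit value, most
   significant byte first. The order is spelled out explicitly, so the
   result does not depend on how a compiler packs multi-character constants. */
uint32_t pack_be32(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16)
         | ((uint32_t)b2 << 8)  |  (uint32_t)b3;
}
```

With this helper, pack_be32(0xde, 0xad, 0xbe, 0xef) yields 0xdeadbeef on any conforming implementation that provides uint32_t.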

Avoiding type conflicts with dplyr::case_when

As said in ?case_when:

All RHSs must evaluate to the same type of vector.

You actually have two possibilities:

1) Create new as a numeric vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))

Note that NA_real_ is the numeric version of NA, and that you must convert old to numeric because you created it as an integer in your original dataframe.

You get:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: num 5 NA 3

2) Create new as an integer vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))

Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.

So this time new is integer:

str(df)
# 'data.frame': 3 obs. of 2 variables:
# $ old: int 1 2 3
# $ new: int 5 NA 3

