Why Do Numeric String Comparisons Give Unexpected Results

Why do numeric string comparisons give unexpected results?

Strings are compared character by character.

When you compare 1: vs 2:, the comparison begins with 1 vs 2 and stops there, giving the expected result.

When you compare 1: vs 10:, the comparison begins with 1 vs 1. Since that is a tie, it moves on to the next pair of characters, : vs 0, and stops there, because : sorts after 0. That is the result you found surprising, given your expectation that the integers within the strings would be compared.

To do the comparison you expect, use to_i to convert both operands to integers.
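The same character-by-character behaviour can be reproduced in other languages. Here is a minimal Python sketch, used only for illustration. The leading_int helper is hypothetical and only loosely imitates what to_i does with trailing non-digit characters.

```python
import re

def leading_int(text):
    """Return the integer formed by the leading digits of text, or 0 if there are none.

    Hypothetical helper that loosely imitates Ruby's to_i for strings like "10:".
    """
    match = re.match(r"\s*[+-]?\d+", text)
    return int(match.group()) if match else 0

# String comparison works character by character.
print("1:" < "2:")    # True: "1" sorts before "2"
print("1:" < "10:")   # False: after the tie on "1", ":" sorts after "0"

# Integer comparison gives the result most people expect.
print(leading_int("1:") < leading_int("10:"))   # True: 1 < 10
```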

Comparing Element.value to number is giving unexpected results

It's probably because your fnumber, snumber and tnumber are read from the HTML inputs as strings. You just need to convert them to integers before comparing. Also add some logic to check that the conversion succeeded before using the variables further.

// Parse each field as a base-10 integer; parseInt returns NaN if no digits are found.
var fnumber = parseInt(document.getElementById("fnumber").value, 10);
var snumber = parseInt(document.getElementById("snumber").value, 10);
var tnumber = parseInt(document.getElementById("tnumber").value, 10);
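The "check that the conversion succeeded" step can be sketched as follows. This is a Python illustration rather than browser code; parse_field and the sample values are hypothetical stand-ins for the text read from the three inputs.

```python
def parse_field(raw):
    """Return raw as an int, or None if it is not a valid integer."""
    try:
        return int(raw.strip())
    except ValueError:
        return None

# Hypothetical values as they would arrive from text inputs: always strings.
raw_values = {"fnumber": "9", "snumber": "10", "tnumber": "abc"}
numbers = {name: parse_field(raw) for name, raw in raw_values.items()}

# Collect the fields that failed to convert before doing any comparison.
invalid = [name for name, value in numbers.items() if value is None]
print(invalid)                                   # ['tnumber']
print("9" < "10")                                # False: compared as strings
print(numbers["fnumber"] < numbers["snumber"])   # True: compared as integers
```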

Comparing string against numeric field returning unexpected results

This is explained in the MySQL documentation on type conversion:
http://dev.mysql.com/doc/refman/5.1/en/type-conversion.html

Unexpected result of greater than or less than comparison on PHP 8

There is no obviously correct result for a comparison between a string and a number. In many languages, it would just give an error; in others, including PHP, the language tries to make sense of it by converting both operands to the same type, but this involves a judgement of which type to "prefer".


Historically, PHP has preferred comparing numbers to comparing strings: it treated "U0M262" > 100000 as (int)"U0M262" > 100000. Since (int)"U0M262" has no obvious value, it is evaluated as 0, and the expression becomes 0 > 100000, which is false.

As of PHP 8, this behaviour has changed and PHP now only uses a numeric comparison for "numeric strings", e.g. "42" clearly "looks like" 42.

Since "U0M262" doesn't fit the requirements for a numeric string, "U0M262" > 100000 is now treated as "U0M262" > (string)100000. This does a byte-wise comparison of the sort order for the two strings, and finds that since "U" comes after "1" in ASCII (and any ASCII-derived encoding, including UTF-8), the result is true.


Because of how ASCII (and compatible encodings such as UTF-8) is arranged:

  • A string starting with a control character or space will be "less than" any number
  • A string starting with a letter will be "more than" any number
  • A string starting with any of "! " # $ % & ' ( ) * + , - . /" will be "less than" any number
  • For a string starting with a digit, you need to look at the individual bytes
  • Any other string will be "more than" any number
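These rules follow from the byte values of the first character. A short Python sketch, used purely for illustration (position_relative_to_digits is a hypothetical helper name), shows where a few characters fall relative to the digits:

```python
# Digits occupy the codes 48 to 57 in ASCII.
DIGIT_LOW, DIGIT_HIGH = ord("0"), ord("9")

def position_relative_to_digits(char):
    """Say whether a single ASCII character sorts before, among, or after the digits."""
    code = ord(char)
    if code < DIGIT_LOW:
        return "before digits"
    if code > DIGIT_HIGH:
        return "after digits"
    return "digit"

for char in [" ", "#", "-", "7", ":", "U", "u"]:
    print(repr(char), ord(char), position_relative_to_digits(char))
```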

As ever, you can tell PHP which comparison you intended, and get the correct behaviour in all versions, using explicit casts:

var_dump((int)"U0M262" > (int)100000); // bool(false)
var_dump((string)"U0M262" > (string)100000); // bool(true)

(Obviously, this makes no sense if you're hard-coding both sides anyway, but assuming one or both is a variable, this is how you'd do it.)
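To make the two rules concrete, here is a rough Python emulation of them. It is a sketch under assumptions, not PHP's actual algorithm: the NUMERIC_STRING pattern only approximates PHP's definition of a numeric string, and both function names are hypothetical.

```python
import re

# Rough approximation of PHP's "numeric string" rule; the real rule has more cases.
NUMERIC_STRING = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\s*", re.ASCII)

def php8_style_greater(text, number):
    """Compare a string with a number roughly as described above for PHP 8."""
    if NUMERIC_STRING.fullmatch(text):
        return float(text) > number                  # numeric string: numeric comparison
    return text.encode() > str(number).encode()      # otherwise: byte-wise string comparison

def legacy_style_greater(text, number):
    """Compare roughly as described above for older PHP: always numerically."""
    match = re.match(r"\s*[+-]?\d+", text, re.ASCII)
    return (int(match.group()) if match else 0) > number

print(legacy_style_greater("U0M262", 100000))   # False: 0 > 100000
print(php8_style_greater("U0M262", 100000))     # True: "U" sorts after "1"
print(php8_style_greater("42", 7))              # True: "42" is numeric, so 42 > 7
```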

Unexpected result comparing strings with `==`

The problem you've encountered here is due to recycling (not the eco-friendly kind). When applying an operation to two vectors that requires them to be the same length, R often automatically recycles, or repeats, the shorter one until it is long enough to match the longer one. Your unexpected results are due to the fact that R recycles the vector c("p", "o") to length 4 (the length of the longer vector) and essentially converts it to c("p", "o", "p", "o"). If we compare c("p", "o", "p", "o") and c("p", "o", "l", "o"), we get the same unexpected results:

c("p", "o", "p", "o") == c("p", "o", "l", "o")
#> [1] TRUE TRUE FALSE TRUE

It's not exactly clear to me why you would expect the result to be TRUE TRUE FALSE FALSE, as it's somewhat of an ambiguous comparison to compare a length-2 vector to a length-4 vector, and recycling the length-2 vector (which is what R is doing) seems to be the most reasonable default aside from throwing an error.
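For readers who don't use R, recycling can be imitated in a few lines of Python. This is only an illustrative sketch; recycled_equal is a hypothetical helper, not something R or Python provides.

```python
from itertools import cycle, islice

def recycled_equal(left, right):
    """Element-wise ==, repeating the shorter sequence to the length of the longer one."""
    length = max(len(left), len(right))
    left_cycled = islice(cycle(left), length)
    right_cycled = islice(cycle(right), length)
    return [a == b for a, b in zip(left_cycled, right_cycled)]

print(recycled_equal(["p", "o"], ["p", "o", "l", "o"]))
# [True, True, False, True]
```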

1-length string comparison gives different result than character comparison... why?

The default string comparison is doing a 'word sort'. From the documentation,

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.

The comparison you are expecting is the ordinal comparison, which you can get by using StringComparison.Ordinal in the String.Compare overload, like so:

bool resStrComp = String.Compare("9", "=", StringComparison.Ordinal) < 0;

This compares the strings by their Unicode values, in the same way that comparing one character to another does.
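The difference between the sort styles can be illustrated outside .NET. In the Python sketch below, the built-in string comparison plays the role of the ordinal comparison, and ignore_hyphens is a hypothetical, very crude stand-in for the special hyphen weight that a word sort applies; real culture-sensitive sorting is far more involved.

```python
# Python compares strings by code point, which corresponds to an ordinal comparison.
print(ord("9"), ord("="))   # 57 61
print("9" < "=")            # True

words = ["co-op", "coop", "cone"]

# Ordinal order: "-" (45) sorts before every letter, so "co-op" moves away from "coop".
print(sorted(words))        # ['co-op', 'cone', 'coop']

def ignore_hyphens(word):
    """Crude stand-in for a word sort: give the hyphen no weight at all."""
    return word.replace("-", "")

# With the hyphen ignored, "co-op" and "coop" end up next to each other.
print(sorted(words, key=ignore_hyphens))   # ['cone', 'co-op', 'coop']
```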

Unexpected result when comparing dates

MySQL's implicit type conversion can be very surprising. If you want to understand the behavior of your queries, you can try to apply the type conversion rules described in Type Conversion in Expression Evaluation. However, I failed to do that for your case. For example, for the two expressions date > '2019' and date > 2019 I would apply the following rule:

If one of the arguments is a TIMESTAMP or DATETIME column and the
other argument is a constant, the constant is converted to a timestamp
before the comparison is performed.

But that cannot be the case, because neither the number 2019 nor the string '2019' can be converted to a temporal type. Here is a query that demonstrates some implicit conversions:

select '2019' + interval 0 day -- implicit cast to date(time)
, 2019 + interval 0 day
, 20190101 + interval 0 day
, 190101 + interval 0 day
, '2019*01*01' + interval 0 day
, '2019-01-01' + interval 0 day
, '2019-01-01' + 0 -- implicit cast to numeric
, date('2019-01-01') + 0
, date('2018-01-01') > 2019
, date('2018-01-01') > '2019'
;

Result:

Expression                    | Result
------------------------------|-----------
'2019' + interval 0 day       | null
2019 + interval 0 day         | null
20190101 + interval 0 day     | 2019-01-01
190101 + interval 0 day       | 2019-01-01
'2019*01*01' + interval 0 day | 2019-01-01
'2019-01-01' + interval 0 day | 2019-01-01
'2019-01-01' + 0              | 2019
date('2019-01-01') + 0        | 20190101
date('2018-01-01') > 2019     | 1
date('2018-01-01') > '2019'   | 0

As you see, when we try to convert 2019 or '2019' to a date (or datetime), we get NULL. Thus the conditions should also be evaluated to NULL and the result set would be empty. But as we know, that is not the case. Maybe I'm just wrong, assuming that 2019 and '2019' are constants. But then I don't know what they could mean.

So I can only make assumptions. My assumption is: whenever one operand is numeric, the other value is also converted to a numeric value. This would be the case for date > 2019 as well as for date > year(@THIS_YEAR). In this case the date 2018-01-01 is converted to 20180101 (see the table above), which (in numeric context) is greater than 2019. So you get rows from the year 2018.

For date > '2019' I can only assume that the values are compared as strings, and '2018-01-01' as a string is considered "smaller" than '2019'.
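The two assumed conversions are easy to reproduce by hand. The Python sketch below only mirrors the assumptions made in this answer; it is not MySQL's documented algorithm.

```python
from datetime import date

d = date(2018, 1, 1)

# Assumed numeric context: the date becomes the number 20180101.
as_number = int(d.strftime("%Y%m%d"))
print(as_number, as_number > 2019)        # 20180101 True

# Assumed string context: the date becomes the text '2018-01-01'.
as_text = d.isoformat()
print(as_text, as_text > "2019")          # 2018-01-01 False
```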

But even if that behavior were properly documented, the rules are too difficult to remember, because one can hardly see any logic behind them. (I'm not saying there is no logic; I just don't see any.)

So I can give you one piece of advice: if you want to compare two incompatible types, always cast or convert them to be compatible.

WHERE year(date) >= year(@THIS_YEAR)

would be fine, since you compare two numeric values. But that is not necessary in your case, and you can just use

WHERE date >= @THIS_YEAR

because 2019-01-01 00:00:00 in

`SET @THIS_YEAR = "2019-01-01 00:00:00";`

is a perfectly formatted DATETIME string and can be considered compatible with the DATETIME type. '2019-01-01' would be just fine as well.

Note that if you wrap a column in a function call (like year(date)), you lose the ability to use an index on that column.

weird results with IF

The reason for this result can be found in the documentation of the function strtol, which is used first when evaluating the comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ, as explained in my answer on Symbol equivalent to NEQ, LSS, GTR, etc. in Windows batch files.

Return Value
On success, the function returns the converted integral number as a long int value.

If no valid conversion could be performed, a zero value is returned (0L).

If the value read is out of the range of representable values by a long int, the function returns LONG_MAX or LONG_MIN (defined in <climits>), and errno is set to ERANGE.

The last sentence is most important here.

It looks like IF in cmd.exe is coded similarly to this C code:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[])
{
    const char csNo[] = "no";
    const char csYes[] = "yes";

    char* pcEndValue1;
    char* pcEndValue2;
    int iExitCode = 2;
    int iErrorNumber1;
    int iErrorNumber2;
    int iStringResult1;
    int iStringResult2;
    long lIntegerValue1;
    long lIntegerValue2;

    if(argc > 2)
    {
        /* Convert the two arguments to long int (32-bit signed on Windows). */
        /* errno is cleared before each call because strtol sets it only on error. */
        errno = 0;
        lIntegerValue1 = strtol(argv[1],&pcEndValue1,0);
        iErrorNumber1 = errno;
        errno = 0;
        lIntegerValue2 = strtol(argv[2],&pcEndValue2,0);
        iErrorNumber2 = errno;

        /* Failed the conversion for any of the two arguments? */
        if(((lIntegerValue1 == 0) && (*pcEndValue1 != '\0')) ||
           ((lIntegerValue2 == 0) && (*pcEndValue2 != '\0')))
        {
            /* Compare case-sensitive the two arguments as strings. */
            iStringResult1 = strcmp(argv[1],argv[2]);
            iStringResult2 = strcmp(argv[2],argv[1]);

            printf("String comparing %s (a) with %s (b):\n\n",argv[1],argv[2]);
            printf("a GEQ b: %s\n",(iStringResult1 >= 0) ? csYes : csNo);
            printf("b GEQ a: %s\n",(iStringResult2 >= 0) ? csYes : csNo);
            printf("a LEQ b: %s\n",(iStringResult1 <= 0) ? csYes : csNo);
            printf("b LEQ a: %s\n",(iStringResult2 <= 0) ? csYes : csNo);
            printf("a EQU b: %s\n",(iStringResult2 == 0) ? csYes : csNo);
            iExitCode = 1;
        }
        else
        {
            /* Compare the values. */
            printf("Value comparing %s/%ld (a) with %s/%ld (b):\n\n",argv[1],lIntegerValue1,argv[2],lIntegerValue2);
            printf("a GEQ b: %s\n",(lIntegerValue1 >= lIntegerValue2) ? csYes : csNo);
            printf("b GEQ a: %s\n",(lIntegerValue2 >= lIntegerValue1) ? csYes : csNo);
            printf("a LEQ b: %s\n",(lIntegerValue1 <= lIntegerValue2) ? csYes : csNo);
            printf("b LEQ a: %s\n",(lIntegerValue2 <= lIntegerValue1) ? csYes : csNo);
            printf("a EQU b: %s\n",(lIntegerValue2 == lIntegerValue1) ? csYes : csNo);
            iExitCode = 0;
        }
        printf("\nError number a: %d ... %s\n",iErrorNumber1,strerror(iErrorNumber1));
        printf("Error number b: %d ... %s\n",iErrorNumber2,strerror(iErrorNumber2));
    }
    return iExitCode;
}

Compiling this C code as a console application and running the executable with the parameters 333333333333 444444444444 produces, for example, this output:

Value comparing 333333333333/2147483647 (a) with 444444444444/2147483647 (b):

a GEQ b: yes
b GEQ a: yes
a LEQ b: yes
b LEQ a: yes
a EQU b: yes

Error number a: 2 ... Output of function out of range (ERANGE)
Error number b: 2 ... Output of function out of range (ERANGE)

Running the executable with the parameters 333333333333 222222222222 produces, for example, this output:

Value comparing 333333333333/2147483647 (a) with 222222222222/2147483647 (b):

a GEQ b: yes
b GEQ a: yes
a LEQ b: yes
b LEQ a: yes
a EQU b: yes

Error number a: 2 ... Output of function out of range (ERANGE)
Error number b: 2 ... Output of function out of range (ERANGE)

Note: The error number and the corresponding error string can differ depending on the C compiler and standard library used.

In both test cases, both arguments caused a 32-bit signed integer overflow on conversion from string to long int. strtol therefore returned LONG_MAX for all four values and set errno to ERANGE. But the overflow condition is not evaluated by the code of IF in cmd.exe. It checks only the conversion result and the character the end pointer points to for both arguments, as the C code above does.

In other words, when the comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ are used, IF always performs an integer comparison as long as the conversion from string to integer does not fail for either argument because of an invalid character in the argument string. An out-of-range condition is no reason for IF not to do an integer comparison.

A string comparison is done only if one of the two argument strings contains an invalid character for an integer.
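The clamping effect can also be shown in a few lines of Python. This is a simplified sketch of the behaviour described in this answer, not the actual cmd.exe implementation: to_clamped_long and if_equ are hypothetical names, only plain decimal integers are handled, and the string fallback is a plain case-sensitive equality test.

```python
import re

LONG_MAX = 2**31 - 1   # long int is 32 bits wide on Windows
LONG_MIN = -2**31

def to_clamped_long(text):
    """Return text as an integer clamped to the 32-bit range, or None if it is not one.

    Simplification: only plain decimal integers are accepted.
    """
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return None
    return max(LONG_MIN, min(LONG_MAX, int(text)))

def if_equ(a, b):
    """Sketch of IF a EQU b as described above."""
    x, y = to_clamped_long(a), to_clamped_long(b)
    if x is None or y is None:
        return a == b      # an invalid character forces a string comparison
    return x == y          # otherwise compare the (possibly clamped) integers

print(if_equ("333333333333", "444444444444"))   # True: both clamp to 2147483647
print(if_equ("333", "444"))                     # False
print(if_equ("33x", "44x"))                     # False: compared as strings
```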


