Weird Java Behavior with Casts to Primitive Types

It's being parsed as this:

long i = (byte)( +(char)( -(int)( +(long)(-1) ) ) );

where all the + and - operators are unary + or -.

In which case, the 1 gets negated twice, so it prints out as a 1.
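A runnable version of the parenthesized statement, confirming the result (the class name is illustrative, not from the question):

```java
public class CastChain {
    public static void main(String[] args) {
        // Every + and - below is a unary operator, not a binary one
        long i = (byte)( +(char)( -(int)( +(long)(-1) ) ) );
        System.out.println(i);  // prints 1
    }
}
```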

Confusing type casting in Java

This code uses the unary + and - operators.

It's equivalent to -(-1) with a bunch of extra casts (the unary + operator doesn't change the value).

Weird Lazy Casting Behaviour for character in Java 11

The int value -1 is equivalent to 0xFFFF_FFFF in the Two’s Complement representation. When casting it to a char, you’re cutting off the upper bits, ending up at 0xFFFF or rather '\uFFFF'.

It’s important to keep in mind that when you do System.out.println(charo); you’re ending up at a different method than the other print statements, as a char does not only have a different value range than short or int, but also different semantics.

When you cast 0xFFFF to short, the value doesn’t change, but 0xFFFF is exactly -1 in the 16 bit Two’s Complement representation. On the other hand, when you cast it to int, the value gets zero extended to 0x0000_FFFF which equals to 65535.

That’s the way to explain it in terms of the short, char, and int datatypes, but since you also asked “What is happening under the hood?”, it’s worth pointing out that this is not how Java actually works.

In Java, all arithmetic involving byte, short, char, or int is done using int. Even local variables of any of these types are actually int variables on the bytecode level. In fact, the same applies to boolean variables but the Java language does not allow us to exploit this for arithmetic.

So the code

char charo = (char)-1;
System.out.println(charo);
System.out.println((short)charo);
System.out.println((int)charo);

actually compiles to the same as

int charo = (char)-1;
System.out.println(charo); // but invoking println(char)
System.out.println((short)charo);
System.out.println(charo);

or

int charo = 0x0000_FFFF;
System.out.println(charo); // but invoking println(char)
System.out.println((short)charo);
System.out.println(charo);

As said at the beginning, the first println ends up at a different method, which is responsible for the different semantics. The compile-time type of the variable only matters insofar as it makes the compiler select a different method.

When always maintaining all 32 bits of a value, a cast to char has the effect of setting the upper 16 bits to zero. So the result of (char)-1 is 0x0000_FFFF and this operation is even done at compile-time already. So the first statement assigns the constant 0xFFFF to a variable.

The next statement invokes the println(char) method. No conversion is involved at the caller’s side.

The other two invocations end up at println(int) and here, the cast to short is actually modifying the value. It has the effect of sign-extending a short value to an int value, which means, the 15th bit is copied over to the upper 16 bits. So for 0x...._FFFF, the 15th bit is a one, so all upper bits are set to one, ending up at 0xFFFF_FFFF, which is the int value -1 when using Two’s Complement.
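The sign extension described above can be observed directly with a few statements (variable names are illustrative):

```java
int x = 0x0000_FFFF;             // the char value '\uFFFF' held in an int
short s = (short) x;             // keeps only the low 16 bits: 0xFFFF
int back = s;                    // sign extension copies bit 15: 0xFFFF_FFFF
System.out.println(back);        // -1
System.out.println(s & 0xFFFF);  // 65535: masking undoes the sign extension
```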

The final result is in line with the first explanation given above, reasoning about the value ranges of char, short, and int. For a lot of scenarios, explanations on that level are sufficient. But you might notice that there is no println(short) method, so to understand why println(int) is sufficient for printing short (or byte) values, it’s necessary to know what’s really going on.

Inconsistent behaviour of primitive integer types in Java

Section 5.1.3 of the JLS describes the behavior of the narrowing primitive conversion used by the cast:

Otherwise, one of the following two cases must be true:

The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.

The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.

(emphasis mine)

That is why (int) Math.pow(2, 32) becomes Integer.MAX_VALUE and (long) Math.pow(2, 64) becomes Long.MAX_VALUE.
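A quick demonstration of this clamping behavior, including the negative case from the same rule:

```java
System.out.println((int) Math.pow(2, 32));    // 2147483647, Integer.MAX_VALUE
System.out.println((long) Math.pow(2, 64));   // 9223372036854775807, Long.MAX_VALUE
System.out.println((long) -Math.pow(2, 65));  // -9223372036854775808, Long.MIN_VALUE
```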

short to int cast, weird behaviour

short is signed, and the sign is preserved when casting to int. 0x90AF has its high bit set, so it is a negative short, and the result is a negative int. Your solution of masking it is correct.
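A minimal sketch of the masking fix (variable names are illustrative, not from the question):

```java
short s = (short) 0x90AF;          // bit 15 is set, so as a short this is negative
int signExtended = s;              // 0xFFFF_90AF == -28497
int masked = s & 0xFFFF;           // 0x0000_90AF == 37039
System.out.println(signExtended);  // -28497
System.out.println(masked);        // 37039
```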

Weird behavior when overloading using variable arguments with primitive types in java

A method call will not compile if the compiler cannot determine which of the overloaded method variants it should use.

Let's use z4 as an example:

  1. The method call z4() fits the signature of both variants.
  2. The method call z4(4) also fits the signature of both variants, because the argument can be auto-boxed.
  3. The method call z4("asdas") is not ambiguous, as String cannot be cast to int.

Update: the rules for resolving overloaded method calls are as follows:

The first phase (§15.12.2.2) performs overload resolution without permitting boxing or unboxing conversion, or the use of variable arity method invocation. If no applicable method is found during this phase then processing continues to the second phase.

...

The second phase (§15.12.2.3) performs overload resolution while allowing boxing and unboxing, but still precludes the use of variable arity method invocation. If no applicable method is found during this phase then processing continues to the third phase.

...

The third phase (§15.12.2.4) allows overloading to be combined with variable arity methods, boxing, and unboxing.

If more than one variant is applicable in the same phase, the most specific one is chosen.* In short, z3(String...) is more specific than z3(Object...), while z4(int...) and z4(Object...) are equally specific, so a call matching both is ambiguous.

*The rules for determining the most specific variant are somewhat complicated (see JLS §15.12.2.5)
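The z4 overloads from the question aren't shown above; assuming signatures like the following, the ambiguity and its resolution can be reproduced:

```java
class VarargsAmbiguity {
    static String z4(int... xs)    { return "int..."; }
    static String z4(Object... xs) { return "Object..."; }

    public static void main(String[] args) {
        // z4(4);  // does not compile: int... and Object... are equally specific
        System.out.println(z4((Object) 4));  // the explicit cast removes the ambiguity
        System.out.println(z4("asdas"));     // unambiguous: String never matches int...
    }
}
```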

Confusing code compiles fine: how does this code work?

This is just a sequence of unary + and - operations mixed with type casts.

You start with -1, cast it to a long, the unary plus does nothing, cast it to an int, unary minus (value is now +1), cast to char, unary +, cast to byte.

Weird compile-time behavior when trying to use primitive type in generics

The type of int.class is Class<Integer>, so genericArrayNewInstance() would be inferred to return an Integer[]. But the function actually creates an int[], so a ClassCastException occurs when the result is returned. Basically, the cast to T[] inside the function is not legitimate in this case, because int[] is not a T[] (primitives cannot be used as type arguments). You cannot handle primitive array types generically; you either have to have your method return Object, or make separate methods for reference types and for primitive types.
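The method from the question isn't shown; a minimal sketch of such a method, assuming it is reflection-based, illustrates the failure:

```java
import java.lang.reflect.Array;

class GenericArrays {
    @SuppressWarnings("unchecked")
    static <T> T[] genericArrayNewInstance(Class<T> type, int length) {
        // For a primitive class such as int.class this creates an int[],
        // which is not an Object[] subtype, so the cast to T[] is bogus.
        return (T[]) Array.newInstance(type, length);
    }

    public static void main(String[] args) {
        Integer[] ok = genericArrayNewInstance(Integer.class, 3);  // fine
        System.out.println(ok.length);  // 3
        // int.class has static type Class<Integer>, so this compiles,
        // but it throws a ClassCastException at runtime:
        Integer[] bad = genericArrayNewInstance(int.class, 3);
    }
}
```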

Java Casting method without knowing what to cast to

Your cast method does an unchecked conversion, which is handled specially in the JVM to maintain backward compatibility with non-generic code.

Such calls cannot be shown to be statically safe under the type system using generics. Rejecting such calls would invalidate large bodies of existing code, and prevent them from using newer versions of the libraries. JLS 5.1.9

Calling the method without explicit type parameters will cause the compiler to infer the type parameter of the invocations, in this case based on their expected return type (Type Inference, JLS 15.12.2.7). This means the code is equivalent to this:

String foo = Caster.<String>cast("hi"); // no exception
int bar = Caster.<Integer>cast("1"); // runtime ClassCastException

Primitive types will be inferred as their boxed versions:

If A is a primitive type, then A is converted to a reference type U via boxing conversion and this algorithm is applied recursively to the constraint U << F. JLS 15.12.2.7.

The JVM ensures type safety by doing runtime type checks on return values of methods containing unchecked casts, at the first point where the type information is not erased. (I didn't find this explicitly stated in the specification, but things appear to work this way, and it is mentioned in The Java Tutorials.) In this case, where you try to assign the value to a typed local variable, the type of the return value is checked, resulting in a ClassCastException.
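The Caster class itself is not shown in the question; a minimal version consistent with the behavior described would be:

```java
class Caster {
    @SuppressWarnings("unchecked")
    static <T> T cast(Object o) {
        // After erasure this is a cast to Object, i.e. a no-op:
        // no type check happens inside this method.
        return (T) o;
    }
}
```

The checkcast instruction is instead emitted at each call site that uses the result as a concrete type, which is why the exceptions appear there.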

To give some more idea when that enforced runtime type checking cast happens, here are a few more examples:

Object a3 = Caster.<String>cast(3); // no exception, a3 is now Integer
Object a4 = (String)Caster.<String>cast(3); // an explicit cast causes runtime ClassCastException

EDIT:

Here is a StackOverflow question about when are runtime type checks enforced: When is generic return value of function casted after type erasure?

Are Java integer-type primitive casts capped at the MAX_INT of the casting type?

This is expected behavior. Remember that there are no unsigned long or int primitive types in Java, and the Java Language Specification (Java 7) for narrowing primitive conversion (§5.1.3) states that casting a "too small or too large" floating-point value (be it double or float) to an integral type of int or long will use the minimum or maximum value of the signed integral type (emphasis mine):

A narrowing conversion of a floating-point number to an integral type
T takes two steps:

  1. In the first step, the floating-point number is converted either to a long, if T is long, or to an int, if T is byte, short, char, or int, as follows:

    • If the floating-point number is NaN (§4.2.3), the result of the first step of the conversion is an int or long 0.
    • Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). Then there are two cases:
      • If T is long, and this integer value can be represented as a long, then the result of the first step is the long value V.
      • Otherwise, if this integer value can be represented as an int, then the result of the first step is the int value V.
    • Otherwise, one of the following two cases must be true:
      • The value must be too small (a negative value of large magnitude or negative infinity), and the result of the first step is the smallest representable value of type int or long.
      • The value must be too large (a positive value of large magnitude or positive infinity), and the result of the first step is the largest representable value of type int or long.
  2. In the second step:

    • If T is int or long, the result of the conversion is the result of the first step.
    • If T is byte, char, or short, the result of the conversion is the result of a narrowing conversion to type T (§5.1.3) of the result of the first step.


Example 5.1.3-1. Narrowing Primitive Conversion

class Test {
    public static void main(String[] args) {
        float fmin = Float.NEGATIVE_INFINITY;
        float fmax = Float.POSITIVE_INFINITY;
        System.out.println("long: " + (long)fmin + ".." + (long)fmax);
        System.out.println("int: " + (int)fmin + ".." + (int)fmax);
        System.out.println("short: " + (short)fmin + ".." + (short)fmax);
        System.out.println("char: " + (int)(char)fmin + ".." + (int)(char)fmax);
        System.out.println("byte: " + (byte)fmin + ".." + (byte)fmax);
    }
}

This program produces the output:

long: -9223372036854775808..9223372036854775807
int: -2147483648..2147483647
short: 0..-1
char: 0..65535
byte: 0..-1

The results for char, int, and long are unsurprising, producing the
minimum and maximum representable values of the type.

The results for byte and short lose information about the sign and
magnitude of the numeric values and also lose precision. The results
can be understood by examining the low order bits of the minimum and
maximum int. The minimum int is, in hexadecimal, 0x80000000, and the
maximum int is 0x7fffffff. This explains the short results, which are
the low 16 bits of these values, namely, 0x0000 and 0xffff; it
explains the char results, which also are the low 16 bits of these
values, namely, '\u0000' and '\uffff'; and it explains the byte
results, which are the low 8 bits of these values, namely, 0x00 and
0xff.

The first case int formulaTest = (int) (maxUintFromDoubleAsLong * 1.0); thus promotes maxUintFromDoubleAsLong to a double via multiplication and then casts it to an int. Since the value is too large to represent as a signed integer, the value becomes 2147483647 (Integer.MAX_VALUE) or 0x7FFFFFFF.

As for the latter case:

A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number
of bits used to represent type T. In addition to a possible loss of
information about the magnitude of the numeric value, this may cause
the sign of the resulting value to differ from the sign of the input
value.

So int testFormulaeWithDoubleCast = (int)((long) (maxUintFromDoubleAsLong * 1.0)); first promotes maxUintFromDoubleAsLong to double, back to long (still fitting) and then to an int. In the last cast, the excess bits are simply dropped, leaving you with 0xFFFFFFFF, which is -1 when interpreted as a signed integer.
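Both paths side by side (the value of maxUintFromDoubleAsLong is assumed, per the question, to be 2^32 - 1):

```java
long maxUintFromDoubleAsLong = 4294967295L;                     // 2^32 - 1, assumed
int clamped   = (int) (maxUintFromDoubleAsLong * 1.0);          // double -> int
int truncated = (int) ((long) (maxUintFromDoubleAsLong * 1.0)); // double -> long -> int
System.out.println(clamped);    // 2147483647 (Integer.MAX_VALUE, clamped)
System.out.println(truncated);  // -1 (low 32 bits of 0xFFFFFFFF)
```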


