In Java Lambdas, Why Is getClass() Called on a Captured Variable

In Java lambdas, why is getClass() called on a captured variable?

Yes, calling getClass() has become a canonical “test for null” idiom, as getClass() is expected to be a cheap intrinsic operation. I suppose HotSpot might be capable of detecting this pattern and reducing it to an intrinsic null check when the result of getClass() is not used.

Another example is creating an inner class instance with an outer instance that is not this:

import java.util.function.Supplier;

public class ImplicitNullChecks {
    class Inner {}
    void createInner(ImplicitNullChecks obj) {
        obj.new Inner();
    }

    void lambda(Object o) {
        Supplier<String> s=o::toString;
    }
}

compiles to

Compiled from "ImplicitNullChecks.java"
public class bytecodetests.ImplicitNullChecks {
  public bytecodetests.ImplicitNullChecks();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  void createInner(bytecodetests.ImplicitNullChecks);
    Code:
       0: new           #23                 // class bytecodetests/ImplicitNullChecks$Inner
       3: dup
       4: aload_1
       5: dup
       6: invokevirtual #24                 // Method java/lang/Object.getClass:()Ljava/lang/Class;
       9: pop
      10: invokespecial #25                 // Method bytecodetests/ImplicitNullChecks$Inner."<init>":(Lbytecodetests/ImplicitNullChecks;)V
      13: pop
      14: return

  void lambda(java.lang.Object);
    Code:
       0: aload_1
       1: dup
       2: invokevirtual #24                 // Method java/lang/Object.getClass:()Ljava/lang/Class;
       5: pop
       6: invokedynamic #26,  0             // InvokeDynamic #0:get:(Ljava/lang/Object;)Ljava/util/function/Supplier;
      11: astore_2
      12: return
}

What does just calling .getClass(); do?

This looks like decompiled code, and my guess is that the decompiler hasn't generated Java code that is equivalent to the original source code.

The literal meaning of

var10001.getClass();

is to return the Class object for the type of the object that var10001 refers to. But the value that is returned appears to be discarded, so the call (apparently) doesn't achieve anything. Hence, my tentative conclusion that the decompiler has stuffed up.

You may need to read the (disassembled) bytecodes directly to discern what they are actually doing. (Or you could try a different decompiler.)

UPDATE

It is plausible that getClass() is called solely for the side-effect of checking for null. (I've never seen that idiom ... but it would work.) I wouldn't expect it to make the code faster, but it would make it more compact.

However, if this is being done in the (original) source code, it would appear to be unnecessary. A couple of lines later, the code takes var10001::get and passes it as an argument in a Stream.map call. I'm pretty sure that evaluating var10001::get will entail checking that var10001 is not null.
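The claim about var10001::get can be checked with a small experiment. The following is a minimal sketch, not the code from the question: the class and method names are made up for illustration. It contrasts a bound method reference, whose receiver is checked when the reference is evaluated, with a lambda, whose body only touches the receiver when it is eventually invoked.

```java
import java.util.function.Supplier;

public class MethodRefNullCheck {

    // Evaluating a bound method reference checks the receiver right away.
    static boolean methodRefThrowsEagerly(Object receiver) {
        try {
            Supplier<String> s = receiver::toString;
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // A lambda merely captures the variable; nothing is dereferenced
    // until the body runs.
    static boolean lambdaThrowsEagerly(Object receiver) {
        try {
            Supplier<String> s = () -> receiver.toString();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(methodRefThrowsEagerly(null)); // true
        System.out.println(lambdaThrowsEagerly(null));    // false
    }
}
```

So an explicit null check in front of a bound method reference would indeed be redundant at source level, which fits the guess that the getClass() call was inserted by the compiler rather than written by hand.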

Why is getClass() called when we create an object for Inner class?

As far as I understand, @Eugene's answer is absolutely correct. I decided to add an explanation in simple words. Hopefully, it will help someone.

Answer: Calls to Object.getClass were used by the compiler in JDK8 to generate NullPointerExceptions where necessary. In your example, this check is unnecessary, as new Outer() can't be null, but the compiler wasn't smart enough to determine it.

In later versions of the JDK, the null checks were changed to use the more readable Objects.requireNonNull. The compiler was also improved to optimize away redundant null checks.

Explanation:

Consider code like this:

class Outer{
      class Inner{
      }
      public static void main(String args[]){
            Outer.Inner obj = ((Outer) null).new Inner();
      } 
}

This code throws a NullPointerException, as it should.

The problem is, NPE is only logical from the Java point of view. Constructors do not exist on the byte code level. The compiler generates a byte code more or less equivalent to the following pseudocode:

class Outer {
    public static void main(String[] args) {
         Outer tmp = (Outer) null;
         Outer$Inner obj = new; //object created
         obj."<init>"(tmp);
    }
}
class Outer$Inner {
    //generated field
    private final Outer outer;
    //generated initializer
    void "<init>"(Outer outer) {
         this.outer = outer;
    }    
}

As you can see, the constructor was replaced with a method. And the method, by itself, will not check its argument for null and, thus, will not throw an exception.

For that reason, the compiler has to add an additional null check to generate a NullPointerException. Up to and including Java 8, the quick and dirty way to achieve this was to emit a call to getClass:

Outer tmp = (Outer) null;
tmp.getClass(); //generates an NPE

How can you check that this is, indeed, the reason:

Compile the Outer class above using JDK 8.
Run it, it should throw an NPE.
Remove the call to Object.getClass from Outer.class using any bytecode editor (e.g. JBE).
Run the program again, it should complete successfully.
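For readers who do not want to edit bytecode, the visible behavior can also be observed from plain Java. This is only a sketch under assumptions: OuterNullCheck and its helper methods are hypothetical names, and the two check methods are source-level stand-ins for what the compiler emits, not the emitted code itself.

```java
import java.util.Objects;

public class OuterNullCheck {
    class Inner {}

    // Creating an inner instance requires a non-null outer instance.
    static Inner createInner(OuterNullCheck outer) {
        return outer.new Inner();
    }

    // Stand-in for the older idiom: the result is discarded,
    // the call exists only to fail on null.
    static void checkWithGetClass(Object o) {
        o.getClass();
    }

    // Stand-in for the more readable idiom used by later compilers.
    static void checkWithRequireNonNull(Object o) {
        Objects.requireNonNull(o);
    }

    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(() -> createInner(null)));                  // true
        System.out.println(throwsNpe(() -> checkWithGetClass(null)));            // true
        System.out.println(throwsNpe(() -> checkWithRequireNonNull(null)));      // true
        System.out.println(throwsNpe(() -> createInner(new OuterNullCheck()))); // false
    }
}
```

All three null cases fail in the same way; the difference between the idioms is only in how the generated bytecode reads.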

Why do lambda and anonymous class have different implicitly added fields?

Anonymous class and Lambda do have some differences.

An anonymous class may extend another class and can have more than one method. Inside it, the keywords this and super therefore refer to the anonymous class's own members and to its superclass's members, respectively. To access members of the enclosing class or of the enclosing class's parent, it has to use the qualified forms of this and super. Example: EnclosingClass.this.toString()
A lambda, on the other hand, can only implement an interface with a single abstract method. It has no other members of its own, so this and super can instead refer to the enclosing class and its parent's members.

Quoting from the Java Language Specification:

Unlike code appearing in anonymous class declarations, the meaning of
names and the this and super keywords appearing in a lambda body,
along with the accessibility of referenced declarations, are the same
as in the surrounding context

Example Code:

public class EnclosingClass {

    public static void main(String[] args) {
        new EnclosingClass().start();
    }

    private void run(Runnable runnable) {
        runnable.run();
    }

    private void start() {
        System.out.println("--- Lambda ---");
        run(() -> {
            System.out.println(this.toString());
            System.out.println(super.toString()); // Use Object.toString
        });

        System.out.println("---Anonymous class---");
        run(new AnonymousSuperClass() {
            @Override
            public void run() {
                super.commonMethod();
                System.out.println(this.toString());
                System.out.println(super.toString());
                System.out.println(EnclosingClass.this.toString());
                System.out.println(EnclosingClass.super.toString()); // Use Object.toString
            }

            @Override
            public String toString() {
                return "Anonymous Class";
            }
        });
    }

    @Override
    public String toString() {
        return "Enclosing Class";
    }

    private static abstract class AnonymousSuperClass implements Runnable {
        protected void commonMethod() {
            System.out.println("Some common processing");
        }

        @Override
        public String toString() {
            return "Anonymous Super Class";
        }
    }
}

Sample Output:

--- Lambda ---
Enclosing Class
EnclosingClass@eed1f14
---Anonymous class---
Some common processing
Anonymous Class
Anonymous Super Class
Enclosing Class
EnclosingClass@eed1f14

Why local capturing lambda needs implicitly added field when anonymous class don't?
This is not true. In your example, the anonymous class was not using any local variable from the enclosing method.

For the following snippet,

private static void start() throws IllegalAccessException {
        int x = 5;
        System.out.println("---- Anonymous class ----");
        print(new Runnable() {

            @Override
            public void run() {
                System.out.println(x);
            }
        }.getClass());

        System.out.println("---- Lambda class ----");
        Runnable r = () -> System.out.println(x);
        print(r.getClass());
    }

This is the output:

---- Anonymous class ----
- constructors -
EnclosingClassV2$1(int)
- fields -
 field name: val$x,
 field type: int,
 ---- Lambda class ----
- constructors -
private EnclosingClassV2$$Lambda$1/455659002(int)
- fields -
 field name: arg$1,
 field type: int,

Note:

Since the anonymous instance is created inside a static method, no reference to an enclosing instance is available.
The constructor of the anonymous class has default (package-private) access, while the constructor of the lambda class is private. I could not find anything in the spec related to this, and I don't see any reason why the anonymous class's constructor could not be private. Maybe, since an anonymous class is a kind of inner class, the same constructor generation strategy has been reused.

To conclude, a different strategy is followed for generating lambda classes, so that the additional class generation complexity involved in anonymous class can be avoided.
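The print(...) helper called in the snippet above is not shown in the answer. A minimal sketch of what it might look like follows; CaptureInspector, anonymousCapturing, lambdaCapturing and hasIntField are hypothetical names, and the exact names printed for the lambda class depend on the JDK in use.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;

public class CaptureInspector {

    // Hypothetical stand-in for the print(...) helper used above:
    // lists declared constructors and fields of the given class.
    static void print(Class<?> c) {
        System.out.println("- constructors -");
        for (Constructor<?> ctor : c.getDeclaredConstructors()) {
            System.out.println(ctor);
        }
        System.out.println("- fields -");
        for (Field f : c.getDeclaredFields()) {
            System.out.println(" field name: " + f.getName() + ",");
            System.out.println(" field type: " + f.getType() + ",");
        }
    }

    // Anonymous class capturing a local (here: a parameter).
    static Class<?> anonymousCapturing(int x) {
        return new Runnable() {
            @Override
            public void run() {
                System.out.println(x);
            }
        }.getClass();
    }

    // Lambda capturing the same kind of local.
    static Class<?> lambdaCapturing(int x) {
        Runnable r = () -> System.out.println(x);
        return r.getClass();
    }

    // True if the class declares at least one field of type int.
    static boolean hasIntField(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (f.getType() == int.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("---- Anonymous class ----");
        print(anonymousCapturing(5));
        System.out.println("---- Lambda class ----");
        print(lambdaCapturing(5));
    }
}
```

With javac, the anonymous class is expected to show a single int field for the captured value, as in the output above. What the lambda class shows is up to the runtime's lambda implementation.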

Lambda field capturing local variable .isSynthetic() returns false

In general, your understanding about the synthetic nature of fields generated for captured variables is right.

When we use the following program

import java.util.Arrays;
import java.util.stream.Stream;

public class CheckSynthetic {
    public static void main(String[] args) {
        new CheckSynthetic().check(true);
    }
    private void check(boolean b) {
        print(getClass());
        print(new Runnable() { public void run() { check(!b); } }.getClass());
        print(((Runnable)() -> check(!b)).getClass());
    }
    private void print(Class<?> c) {
        System.out.println(c.getName()+", synthetic: "+c.isSynthetic());
        Stream.of(c.getDeclaredFields(),c.getDeclaredConstructors(),c.getDeclaredMethods())
            .flatMap(Arrays::stream)
            .forEach(m->System.out.println("\t"+m.getClass().getSimpleName()+' '+m.getName()
                                           +", synthetic: "+m.isSynthetic()));
    }
}

we get something like

CheckSynthetic, synthetic: false
    Constructor CheckSynthetic, synthetic: false
    Method main, synthetic: false
    Method check, synthetic: false
    Method print, synthetic: false
    Method lambda$print$1, synthetic: true
    Method lambda$check$0, synthetic: true
CheckSynthetic$1, synthetic: false
    Field val$b, synthetic: true
    Field this$0, synthetic: true
    Constructor CheckSynthetic$1, synthetic: false
    Method run, synthetic: false
CheckSynthetic$$Lambda$21/0x0000000840074440, synthetic: true
    Field arg$1, synthetic: false
    Field arg$2, synthetic: false
    Constructor CheckSynthetic$$Lambda$21/0x0000000840074440, synthetic: false
    Method run, synthetic: false
    Method get$Lambda, synthetic: false

Prior to JDK 11, you'll also find an entry like

    Method access$000, synthetic: true

in the outer class CheckSynthetic.

So for the anonymous inner class, the fields this$0 and val$b are marked as synthetic, as expected.

For the lambda expression, the entire class has been marked as synthetic, but none of its members.

One interpretation could be that marking a class as synthetic is already sufficient here. Considering JVMS §4.7.8:

A class member that does not appear in the source code must be marked using a Synthetic attribute, or else it must have its ACC_SYNTHETIC flag set.

we could say that when the class does not appear in source code, there is no source code that could be checked for the presence of member declarations.

But more important is that this specification applies to class files and while those of us interested in more details know that under the hood, the reference implementation of LambdaMetafactory will generate byte code in the class file format to create an anonymous class, this is an unspecified implementation detail.

As John Rose puts it:

VM anonymous classes are an implementation detail that is opaque to system components except for the lowest layers of the JDK runtime and the JVM itself. […] Ideally we should not make them visible at all, but sometimes it helps (e.g., with single stepping through BCs).

…

You can't rely on any of this meaning what you think it means,
even if it appears to have a classfile structure.

So we shouldn’t reason about this class file structure and only focus on the visible behavior, which is the return value of Field.isSynthetic(). While it’s reasonable to assume that under the hood, this implementation will just report whether the bytecode had the flag or attribute, we have to focus on the bytecode independent contract of isSynthetic:

Returns:

true if and only if this field is a synthetic field as defined by the Java Language Specification.

Which brings us to JLS §13.1:

A construct emitted by a Java compiler must be marked as synthetic if it does not correspond to a construct declared explicitly or implicitly in source code, unless the emitted construct is a class initialization method (JVMS §2.9).

Not only is the possibility of a construct to be “declared … implicitly in source code” quite fuzzy, the requirement to be marked as synthetic is limited to “a construct emitted by a Java compiler”. But the classes generated at runtime for lambda expressions are not emitted by a Java compiler, they are generated automatically by a bytecode factory. This is more than just quibbling, as the entire §13 is about Binary Compatibility, but the ephemeral classes generated within a single runtime are not subject to Binary Compatibility at all, as the current runtime is the only software which has to deal with them.

The requirements on the runtime class are specified in JLS §15.27.4:

The value of a lambda expression is a reference to an instance of a class with the following properties:

The class implements the targeted functional interface type and, if the target type is an intersection type, every other interface type mentioned in the intersection.

Where the lambda expression has type U, for each non-static member method m of U:

If the function type of U has a subsignature of the signature of m, then the class declares a method that overrides m. The method's body has the effect of evaluating the lambda body, if it is an expression, or of executing the lambda body, if it is a block; if a result is expected, it is returned from the method.

If the erasure of the type of a method being overridden differs in its signature from the erasure of the function type of U, then before evaluating or executing the lambda body, the method's body checks that each argument value is an instance of a subclass or subinterface of the erasure of the corresponding parameter type in the function type of U; if not, a ClassCastException is thrown.

The class overrides no other methods of the targeted functional interface type or other interface types mentioned above, although it may override methods of the Object class.

So the specification does not cover many properties of the actual class and that’s intentional.

So when the result of Field.isSynthetic() is only determined by the Java Language Specification, but the class of the inspected field is off specification, the result is unspecified.

There’s room for interpretation whether, now that we can observe certain artifacts of a generated class, those artifacts should follow certain expectations regarding a similarity to ordinary classes, but there’s not enough information to discuss that. Most notably, there is not a single word in any of the cited specifications about why we have to mark constructs as synthetic and which consequences the presence or absence of the marker has.

Practical tests revealed that Java compilers, i.e. javac, treat synthetic members as nonexistent when trying to access them on source level, but that has not been specified anywhere. Further, this behavior is not relevant for a runtime generated class which is never seen by a Java compiler. In contrast, for an access via Reflection, the synthetic flag seems to have no effect at all.

Why does a lambda need to capture the enclosing instance when referencing a final String field?

If the lambda was capturing foo instead of this, you could in some cases get a different result. Consider the following example:

import java.util.function.Consumer;

public class TestClass {
    public static void main(String[] args) {
        MyClass m = new MyClass();
        m.consumer.accept("bar2");
    }
}

class MyClass {
    final String foo;
    final Consumer<String> consumer;

    public MyClass() {
        consumer = getConsumer();
        // first call to illustrate the value that would have been captured
        consumer.accept("bar1");
        foo = "foo";
    }

    public Consumer<String> getConsumer() {
        return bar -> System.out.println(bar + foo);
    }
}

Output:

bar1null
bar2foo

If foo was captured by the lambda, it would be captured as null and the second call would print bar2null. However since the MyClass instance is captured, it prints the correct value.
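The two capturing strategies can be put side by side. This is a sketch under assumptions: CaptureTiming is a hypothetical class, and its field is deliberately non-final so that it can be read before it is assigned, which the final field in the example above does not allow.

```java
import java.util.function.Supplier;

public class CaptureTiming {
    String foo; // deliberately non-final so it can be read before assignment
    final Supplier<String> viaThis;
    final Supplier<String> viaCopy;

    CaptureTiming() {
        String copy = foo;            // foo is still null at this point
        viaCopy = () -> "bar" + copy; // captures the value read just now
        viaThis = () -> "bar" + foo;  // captures this, reads foo on each call
        foo = "foo";
    }

    public static void main(String[] args) {
        CaptureTiming c = new CaptureTiming();
        System.out.println(c.viaCopy.get()); // barnull
        System.out.println(c.viaThis.get()); // barfoo
    }
}
```

The supplier that captured a copy of the value is stuck with null, while the one that captured this sees the later assignment.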

Of course this is ugly code and a bit contrived, but in more complex, real-life code, such an issue could somewhat easily occur.

Note that the only truly ugly thing is that we are forcing a read of the to-be-assigned foo in the constructor, through the consumer. Building the consumer itself is not expected to read foo at that time, so it is still legitimate to build it before assigning foo, as long as you don't use it immediately.

However the compiler will not let you initialize the same consumer in the constructor before assigning foo – probably for the best :-)

What is the difference between a lambda and a method reference at a runtime level?

Getting Started

To investigate this we start with the following class:

import java.io.Serializable;
import java.util.Comparator;

public final class Generic {

    // Bad implementation, only used as an example.
    public static final Comparator<Integer> COMPARATOR = (a, b) -> (a > b) ? 1 : -1;

    public static Comparator<Integer> reference() {
        return (Comparator<Integer> & Serializable) COMPARATOR::compare;
    }

    public static Comparator<Integer> explicit() {
        return (Comparator<Integer> & Serializable) (a, b) -> COMPARATOR.compare(a, b);
    }

}

After compilation, we can disassemble it using:

javap -c -p -s -v Generic.class

Removing the irrelevant parts (and some other clutter, such as fully-qualified types and the initialisation of COMPARATOR) we are left with

  public static final Comparator<Integer> COMPARATOR;    

  public static Comparator<Integer> reference();
      0: getstatic     #2  // Field COMPARATOR:LComparator;    
      3: dup    
      4: invokevirtual #3   // Method Object.getClass:()LClass;    
      7: pop    
      8: invokedynamic #4,  0  // InvokeDynamic #0:compare:(LComparator;)LComparator;    
      13: checkcast     #5  // class Serializable    
      16: checkcast     #6  // class Comparator    
      19: areturn

  public static Comparator<Integer> explicit();
      0: invokedynamic #7,  0  // InvokeDynamic #1:compare:()LComparator;    
      5: checkcast     #5  // class Serializable    
      8: checkcast     #6  // class Comparator    
      11: areturn

  private static int lambda$explicit$d34e1a25$1(Integer, Integer);
     0: getstatic     #2  // Field COMPARATOR:LComparator;
     3: aload_0
     4: aload_1
     5: invokeinterface #44,  3  // InterfaceMethod Comparator.compare:(LObject;LObject;)I
    10: ireturn

BootstrapMethods:    
  0: #61 invokestatic invoke/LambdaMetafactory.altMetafactory:(Linvoke/MethodHandles$Lookup;LString;Linvoke/MethodType;[LObject;)Linvoke/CallSite;    
    Method arguments:    
      #62 (LObject;LObject;)I    
      #63 invokeinterface Comparator.compare:(LObject;LObject;)I    
      #64 (LInteger;LInteger;)I    
      #65 5    
      #66 0    

  1: #61 invokestatic invoke/LambdaMetafactory.altMetafactory:(Linvoke/MethodHandles$Lookup;LString;Linvoke/MethodType;[LObject;)Linvoke/CallSite;    
    Method arguments:    
      #62 (LObject;LObject;)I    
      #70 invokestatic Generic.lambda$explicit$df5d232f$1:(LInteger;LInteger;)I    
      #64 (LInteger;LInteger;)I    
      #65 5    
      #66 0

Immediately we see that the bytecode for the reference() method is different from the bytecode for explicit(). However, that obvious difference isn't what we look at first; the bootstrap methods are more interesting.

An invokedynamic call site is linked to a method by means of a bootstrap method, which is a method specified by the compiler for the dynamically-typed language that is called once by the JVM to link the site.

(Java Virtual Machine Support for Non-Java Languages, emphasis theirs)

This is the code responsible for creating the CallSite used by the lambda. The Method arguments listed below each bootstrap method are the values passed as the variadic parameter (i.e. args) of LambdaMetafactory#altMetafactory.

Format of the Method arguments

samMethodType - Signature and return type of method to be implemented by the function object.
implMethod - A direct method handle describing the implementation method which should be called (with suitable adaptation of argument types, return types, and with captured arguments prepended to the invocation arguments) at invocation time.
instantiatedMethodType - The signature and return type that should be enforced dynamically at invocation time. This may be the same as samMethodType, or may be a specialization of it.
flags indicates additional options; this is a bitwise OR of desired flags. Defined flags are FLAG_BRIDGES, FLAG_MARKERS, and FLAG_SERIALIZABLE.
bridgeCount is the number of additional method signatures the function object should implement, and is present if and only if the FLAG_BRIDGES flag is set.

In both cases here bridgeCount is 0, so there is no sixth argument, which would otherwise be bridges: a variable-length list of additional method signatures to implement (given that bridgeCount is 0, I'm not entirely sure why FLAG_BRIDGES is set).

Matching the above up with our arguments, we get:

The function signature and return type (Ljava/lang/Object;Ljava/lang/Object;)I, which is the signature of Comparator#compare after generic type erasure.
The method being called when this lambda is invoked (which is different).
The signature and return type of the lambda, which will be checked when the lambda is invoked: (LInteger;LInteger;)I (note that these aren't erased, because this is part of the lambda specification).
The flags, which in both cases is the composition of FLAG_BRIDGES and FLAG_SERIALIZABLE (i.e. 5).
The number of bridge method signatures, 0.
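The flags value can be decoded against the public constants on LambdaMetafactory. A minimal sketch follows; FlagDecoder and describe are hypothetical names, only the three FLAG_ constants come from the JDK.

```java
import java.lang.invoke.LambdaMetafactory;

public class FlagDecoder {

    // Lists the names of the altMetafactory flags set in the given value.
    static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        if ((flags & LambdaMetafactory.FLAG_SERIALIZABLE) != 0) {
            sb.append("FLAG_SERIALIZABLE ");
        }
        if ((flags & LambdaMetafactory.FLAG_MARKERS) != 0) {
            sb.append("FLAG_MARKERS ");
        }
        if ((flags & LambdaMetafactory.FLAG_BRIDGES) != 0) {
            sb.append("FLAG_BRIDGES ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe(5)); // FLAG_SERIALIZABLE FLAG_BRIDGES
    }
}
```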

We can see that FLAG_SERIALIZABLE is set for both lambdas, so it's not that.

Implementation methods

The implementation method for the method reference lambda is Comparator.compare:(LObject;LObject;)I, but for the explicit lambda it's Generic.lambda$explicit$df5d232f$1:(LInteger;LInteger;)I. Looking at the disassembly, we can see that the former is essentially an inlined version of the latter. The only other notable difference is the method parameter types (which, as mentioned earlier, is because of generic type erasure).

When is a lambda actually serializable?

You can serialize a lambda expression if its target type and its captured arguments are serializable.

Lambda Expressions (The Java™ Tutorials)

The important part of that is "captured arguments". Looking back at the disassembled bytecode, the invokedynamic instruction for the method reference certainly looks like it's capturing a Comparator (#0:compare:(LComparator;)LComparator;, in contrast to the explicit lambda, #1:compare:()LComparator;).

Confirming capturing is the issue

ObjectOutputStream contains an extendedDebugInfo field, which we can set using the -Dsun.io.serialization.extendedDebugInfo=true VM argument:

$ java -Dsun.io.serialization.extendedDebugInfo=true Generic

When we try to serialize the lambdas again, this gives a very satisfactory

Exception in thread "main" java.io.NotSerializableException: Generic$$Lambda$1/321001045
        - element of array (index: 0)
        - array (class "[LObject;", size: 1)
/* ! */ - field (class "invoke.SerializedLambda", name: "capturedArgs", type: "class [LObject;") // <--- !!
        - root object (class "invoke.SerializedLambda", SerializedLambda[capturingClass=class Generic, functionalInterfaceMethod=Comparator.compare:(LObject;LObject;)I, implementation=invokeInterface Comparator.compare:(LObject;LObject;)I, instantiatedMethodType=(LInteger;LInteger;)I, numCaptured=1])
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1182)
    /* removed */
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at Generic.main(Generic.java:27)
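The serialization attempt itself is not shown above. A minimal sketch that reproduces the difference might look like this; SerializationSketch and canSerialize are hypothetical names, the comparator body is a simplified stand-in, and the two factory methods mirror reference() and explicit() from the Generic class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Comparator;

public class SerializationSketch {

    // A plain lambda: its class does not implement Serializable.
    static final Comparator<Integer> COMPARATOR = (a, b) -> Integer.compare(a, b);

    // Bound method reference: captures the COMPARATOR instance.
    static Comparator<Integer> reference() {
        return (Comparator<Integer> & Serializable) COMPARATOR::compare;
    }

    // Explicit lambda: reads the static field inside its body, captures nothing.
    static Comparator<Integer> explicit() {
        return (Comparator<Integer> & Serializable) (a, b) -> COMPARATOR.compare(a, b);
    }

    // Tries to write the object to an in-memory stream.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(reference())); // false
        System.out.println(canSerialize(explicit()));  // true
    }
}
```

Both results are Serializable by type, so the failure of the first one comes from its captured argument, as the capturedArgs line in the trace indicates.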

What's actually going on

From the above, we can see that the explicit lambda is not capturing anything, whereas the method reference lambda is. Looking over the bytecode again makes this clear:

  public static Comparator<Integer> explicit();
      0: invokedynamic #7,  0  // InvokeDynamic #1:compare:()LComparator;    
      5: checkcast     #5  // class java/io/Serializable    
      8: checkcast     #6  // class Comparator    
      11: areturn

Which, as seen above, has an implementation method of:

  private static int lambda$explicit$d34e1a25$1(java.lang.Integer, java.lang.Integer);
     0: getstatic     #2  // Field COMPARATOR:Ljava/util/Comparator;
     3: aload_0
     4: aload_1
     5: invokeinterface #44,  3  // InterfaceMethod java/util/Comparator.compare:(Ljava/lang/Object;Ljava/lang/Object;)I
    10: ireturn

The explicit lambda is actually calling lambda$explicit$d34e1a25$1, which in turn calls the COMPARATOR#compare. This layer of indirection means it's not capturing anything that isn't Serializable (or anything at all, to be precise), and so is safe to serialize. The method reference expression directly uses COMPARATOR (the value of which is then passed to the bootstrap method):
