Is Method Reference Caching a Good Idea in Java 8

Is method reference caching a good idea in Java 8?

You have to make a distinction between frequent executions of the same call-site, for stateless lambda or stateful lambdas, and frequent uses of a method-reference to the same method (by different call-sites).

Look at the following examples:

    Runnable r1=null;
    for(int i=0; i<2; i++) {
        Runnable r2=System::gc;
        if(r1==null) r1=r2;
        else System.out.println(r1==r2? "shared": "unshared");
    }

Here, the same call-site is executed two times, producing a stateless lambda and the current implementation will print "shared".

    Runnable r1=null;
    for(int i=0; i<2; i++) {
        Runnable r2=Runtime.getRuntime()::gc;
        if(r1==null) r1=r2;
        else {
            System.out.println(r1==r2? "shared": "unshared");
            System.out.println(
                r1.getClass()==r2.getClass()? "shared class": "unshared class");
        }
    }

In this second example, the same call-site is executed two times, producing a lambda containing a reference to a Runtime instance and the current implementation will print "unshared" but "shared class".

    Runnable r1=System::gc, r2=System::gc;
    System.out.println(r1==r2? "shared": "unshared");
    System.out.println(
        r1.getClass()==r2.getClass()? "shared class": "unshared class");

In contrast, the last example contains two different call-sites producing equivalent method references, but as of 1.8.0_05 it will print "unshared" and "unshared class".


For each lambda expression or method reference, the compiler will emit an invokedynamic instruction that refers to a JRE-provided bootstrap method in the class LambdaMetafactory, along with the static arguments necessary to produce the desired lambda implementation class. What the metafactory produces is left to the actual JRE, but the invokedynamic instruction is specified to remember and re-use the CallSite instance created on the first invocation.

The current JRE produces a ConstantCallSite containing a MethodHandle to a constant object for stateless lambdas (and there’s no imaginable reason to do it differently). And method references to static methods are always stateless. So for stateless lambdas and single call-sites the answer must be: don’t cache; the JVM will, and if it doesn’t, it must have strong reasons that you shouldn’t counteract.

For lambdas that capture values (and this::func is a lambda capturing a reference to the this instance), things are a bit different. The JRE is allowed to cache them, but that would imply maintaining some sort of Map between the actual captured values and the resulting lambda, which could be more costly than simply creating that simply-structured lambda instance again. The current JRE does not cache lambda instances that have state.

But this does not mean that the lambda class is created every time. It just means that the resolved call-site will behave like an ordinary object construction instantiating the lambda class that has been generated on the first invocation.

Similar things apply to method references to the same target method created by different call-sites. The JRE is allowed to share a single lambda instance between them but in the current version it doesn’t, most probably because it is not clear whether the cache maintenance will pay off. Here, even the generated classes might differ.


So caching, as in your example, might make your program behave differently than it would without, but not necessarily more efficiently: a cached object is not always cheaper than a temporary object. Unless you actually measure a performance impact caused by lambda creation, you should not add any caching.

I think there are only a few special cases where caching might be useful:

  • we are talking about lots of different call-sites referring to the same method
  • the lambda is created in the constructor/class initializer because later on the use-site will

    • be called by multiple threads concurrently
    • suffer from the lower performance of the first invocation
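For the class-initializer case above, here is a minimal sketch of caching a method reference in a constant (class and method names are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LogSink {
    static final AtomicInteger FLUSHES = new AtomicInteger();

    // Created once at class initialization; every caller shares this
    // instance, and the first-invocation linkage cost is paid up front.
    private static final Runnable FLUSH = LogSink::flushAll;

    static void flushAll() {
        FLUSHES.incrementAndGet();
    }

    static Runnable flushAction() {
        return FLUSH; // always the same object
    }
}
```

Since the field is final and the referenced method is static (so the lambda is stateless), handing the same instance to multiple concurrent threads is safe.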

Why aren't method references singleton?

For instance methods, I don't think caching would make sense. You'd have to cache one method reference per instance, which would mean either an extra field within the class declaring the method (one per public method, presumably, because the methods could be referenced from outside the class) or some sort of per-instance cache within the user of the method reference.

I think it makes more sense for method references to static methods to be cached, because they'll be the same forever. However, to cache the actual Runnable, you'd need a cache per type that it was targeting. For example:

    public interface NotRunnable {
        void foo();
    }

    Runnable thing1 = Main::doStuff; // After making doStuff static
    NotRunnable thing2 = Main::doStuff;

Should thing1 and thing2 be equal here? How would the compiler know what type to create, if so? It could create two instances here and cache them separately - and always cache at the point of use rather than the point of method declaration. (Your example has the same class declaring the method and referencing it, which is a very special case. You should consider the more general case where they're different.)

The JLS allows for method references to be cached. From section 15.13.3:

Next, either a new instance of a class with the properties below is allocated and initialized, or an existing instance of a class with the properties below is referenced.

... but even for static methods, it seems javac doesn't do any caching at the moment.

Differences between using a method reference and function object in stream operations?

Unless there's some additional magic I'm unaware of, the current lambda implementation will desugar your non-capturing lambda into a static method and will cache the lambda instance. By doing the same thing explicitly (a static final reference to a lambda), you're basically duplicating that implicit work, so you're ending up with two cached references to the same thing. You are also defeating the lazy initialization of the lambda instance, which you'd otherwise be getting for free.

This is why I would prefer just the method reference: it's simpler to write, more idiomatic, and also seems to be more lightweight in terms of implementation.
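To illustrate the duplication, a hedged sketch contrasting the two styles (class, field, and method names are made up):

```java
import java.util.List;
import java.util.function.Predicate;

public class Names {
    // Explicit caching: this duplicates what the runtime already does for a
    // non-capturing lambda, and forces eager initialization at class-load time.
    static final Predicate<String> NON_EMPTY = s -> !s.isEmpty();

    static long countNonEmptyCached(List<String> names) {
        return names.stream().filter(NON_EMPTY).count();
    }

    // Preferred: the inline lambda is non-capturing, so the invokedynamic
    // call-site caches the single instance lazily on first use.
    static long countNonEmpty(List<String> names) {
        return names.stream().filter(s -> !s.isEmpty()).count();
    }
}
```

Both behave identically; the second simply leaves the caching to the JVM.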

Same Java Method Referenced But Different Address Returned

Are method references nothing more than syntactic sugar for anonymous classes?

Correct. They aren't necessarily always implemented as heavyweight as that, but conceptually it's all they are.

If not, what do I have to do to always get the same method reference? (Aside from storing a reference once in a field to work with.)

Store the reference in a field. That's the answer. (Sorry.)
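A minimal sketch of that field-based approach (names are illustrative):

```java
import java.util.function.IntUnaryOperator;

public class Holder {
    // Capturing the method reference once pins down a single instance,
    // so identity comparisons against it are stable across uses.
    private final IntUnaryOperator abs = Math::abs;

    IntUnaryOperator abs() {
        return abs;
    }
}
```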

Caching the result of all methods

You could use the Reflectivity framework to add pre and post meta links to your methods. A link could check a cache before execution transparently.

    link := MetaLink new
        metaObject: self;
        selector: #cachedExecute:;
        arguments: #(selector);
        control: #before.
    (MyClass>>#myMethodSelector) ast link: link.

This code will install a meta link that sends #cachedExecute: to a MyClass object with the argument #myMethodSelector. The link is installed on the first AST node of the compiled method (of that same method selector, but could be on another method). The #control: message ensures that the link will be executed before the AST node is executed.

You can of course install multiple meta links that influence each other.

Note that in the above example you must not send the same message (#myMethodSelector) again inside of the #cachedExecute: method since you'd end up in a loop.

Update
There's actually an error in the code above (now fixed). The #arguments: message takes a list of symbols that define the parameters of the method specified via #selector:. Those arguments will be reified from the context. To pass the method selector you'd use the #selector reification, for the method context the #context reification, and for method arguments #arguments. To see which reifications are available, look at the #key method on the class side of the subclasses of RFReification.

Lazy, but persisted, evaluation of java 8 lambda

Disclaimer: this answer doesn't respond to the question directly, as it uses neither Supplier nor Optional in the Node class. Instead, it presents a generic functional programming technique that might help solve the problem.


If the problem is all about evaluating the function only once for each input value, then you shouldn't change your tree/array/nodes. Instead, memoize the function, which is a pure functional approach:

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

Here's a way to do it, inspired by this excellent article written by Pierre-Yves Saumont (please check it for an in-depth introduction to memoization):

    public static <T, U> Function<T, U> memoize(Function<T, U> function) {
        Map<T, U> cache = new ConcurrentHashMap<>();
        return input -> cache.computeIfAbsent(input, function::apply);
    }

Suppose you have a method that takes quite long to execute. Then, you could use the memoize method this way:

    // This method takes quite long to execute
    Integer longCalculation(Integer x) {
        try {
            Thread.sleep(1_000);
        } catch (InterruptedException ignored) {
        }
        return x * 2;
    }

    // Our function is a method reference to the method above
    Function<Integer, Integer> function = this::longCalculation;

    // Now we memoize the function declared above
    Function<Integer, Integer> memoized = memoize(function);

Now, if you call:

    int result1 = function.apply(1);
    int result2 = function.apply(2);
    int result3 = function.apply(3);
    int result4 = function.apply(2);
    int result5 = function.apply(1);

You'll notice that the five calls take ~5 seconds altogether (1 second for each call).

However, if you use the memoized function with the same input values 1 2 3 2 1:

    int memoizedResult1 = memoized.apply(1);
    int memoizedResult2 = memoized.apply(2);
    int memoizedResult3 = memoized.apply(3);
    int memoizedResult4 = memoized.apply(2); // <-- returned from cache
    int memoizedResult5 = memoized.apply(1); // <-- returned from cache

You'll notice that now the five calls take ~3 seconds altogether. This is because the last two results are immediately returned from the cache.


So, back to your structure... Inside your map method, you could just memoize the given function and use the returned memoized function instead. Internally, this will cache the function's return values in the ConcurrentHashMap.

As the memoize method uses a ConcurrentHashMap internally, you don't need to worry about concurrency.

Note: This is just the beginning... I'm thinking about two possible improvements here. One would be to limit the size of the cache, so that it doesn't take the whole memory if the domain of the given function is too big. The other improvement would be to memoize the given function only if it hasn't been memoized previously. But these details are left as an exercise for the reader... ;)
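For the first improvement, here is a hedged sketch of a size-limited variant based on LinkedHashMap's access-order eviction (the class and method names are made up; note that, unlike the ConcurrentHashMap version, this one is not thread-safe without external synchronization):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class BoundedMemoizer {
    public static <T, U> Function<T, U> memoizeBounded(Function<T, U> function, int maxSize) {
        // Access-order LinkedHashMap: evicts the least recently used entry
        // once the cache grows beyond maxSize.
        Map<T, U> cache = new LinkedHashMap<T, U>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<T, U> eldest) {
                return size() > maxSize;
            }
        };
        return input -> cache.computeIfAbsent(input, function);
    }
}
```

This caps memory use at roughly maxSize entries at the cost of recomputing evicted results.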

In Java why is Function.identity() a static method instead of something else?

TL;DR Using Function.identity() creates only one object, so it's very memory efficient.


The third implementation doesn't compile, because T is undefined, so that's not an option.

In the second implementation, every time you write Function::identity, a new object instance is created.

In the first implementation, whenever you call Function.identity(), a reference to the same lambda object is returned.

It is simple to see for yourself. Start by creating the two identity methods in the same class, so rename them to identity1 and identity2 to keep them separately identifiable.

    static <T> Function<T, T> identity1() {
        return t -> t;
    }

    static <T> T identity2(T in) {
        return in;
    }

Write a test method that accepts a Function and prints the object, so we can see its unique identity, as reflected by the hash code.

    static <A, B> void test(Function<A, B> func) {
        System.out.println(func);
    }

Call the test method repeatedly to see if each one gets a new object instance or not (my code is in a class named Test).

    test(Test.identity1());
    test(Test.identity1());
    test(Test.identity1());
    test(Test::identity2);
    test(Test::identity2);
    for (int i = 0; i < 3; i++)
        test(Test::identity2);

Output

    Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
    Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
    Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
    Test$$Lambda$2/0x0000000800ba1040@5674cd4d
    Test$$Lambda$3/0x0000000800ba1440@65b54208
    Test$$Lambda$4/0x0000000800ba1840@6b884d57
    Test$$Lambda$4/0x0000000800ba1840@6b884d57
    Test$$Lambda$4/0x0000000800ba1840@6b884d57

As you can see, multiple statements calling Test.identity1() all get the same object, but multiple statements using Test::identity2 all get different objects.

It is true that repeated executions of the same statement get the same object (as seen in the results from the loop), but that's different from the results obtained from different statements.

Conclusion: Using Test.identity1() creates only one object, so it's more memory efficient than using Test::identity2.

:: (double colon) operator in Java 8

Usually, one would call the reduce method using Math.max(int, int) as follows:

    reduce(new IntBinaryOperator() {
        @Override
        public int applyAsInt(int left, int right) {
            return Math.max(left, right);
        }
    });

That requires a lot of syntax for just calling Math.max. That's where lambda expressions come into play. Since Java 8 it is allowed to do the same thing in a much shorter way:

    reduce((int left, int right) -> Math.max(left, right));

How does this work? The Java compiler detects that you want to implement a method that accepts two ints and returns one int. This matches the signature of the one and only abstract method of the interface IntBinaryOperator (the parameter type of the reduce method you want to call). So the compiler does the rest for you: it simply assumes you want to implement IntBinaryOperator.

But as Math.max(int, int) itself fulfills the formal requirements of IntBinaryOperator, it can be used directly. Because Java 7 does not have any syntax that allows a method itself to be passed as an argument (you can only pass method results, but never method references), the :: syntax was introduced in Java 8 to reference methods:

    reduce(Math::max);

Note that this will be interpreted by the compiler, not by the JVM at runtime! Although it produces different bytecode for the three snippets, they are semantically equivalent, so the last two can be considered short (and probably more efficient) versions of the IntBinaryOperator implementation above.

(See also Translation of Lambda Expressions)
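The three interchangeable forms can be seen side by side in one runnable sketch (assuming the reduce calls above refer to IntStream.reduce over a stream of ints; the class and helper names here are made up):

```java
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

public class MaxDemo {
    // Reduces the values with whichever IntBinaryOperator is supplied.
    static int maxOf(int[] values, IntBinaryOperator op) {
        return IntStream.of(values).reduce(Integer.MIN_VALUE, op);
    }

    public static void main(String[] args) {
        int[] data = {3, 7, 5};
        // Anonymous class, lambda, and method reference are semantically equal:
        int a = maxOf(data, new IntBinaryOperator() {
            @Override
            public int applyAsInt(int left, int right) {
                return Math.max(left, right);
            }
        });
        int b = maxOf(data, (left, right) -> Math.max(left, right));
        int c = maxOf(data, Math::max);
        System.out.println(a + " " + b + " " + c); // prints: 7 7 7
    }
}
```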


