Is method reference caching a good idea in Java 8?
You have to make a distinction between frequent executions of the same call-site, for stateless or stateful lambdas, and frequent uses of a method reference to the same method (by different call-sites).
Look at the following examples:
Runnable r1=null;
for(int i=0; i<2; i++) {
Runnable r2=System::gc;
if(r1==null) r1=r2;
else System.out.println(r1==r2? "shared": "unshared");
}
Here, the same call-site is executed two times, producing a stateless lambda, and the current implementation will print "shared".
Runnable r1=null;
for(int i=0; i<2; i++) {
Runnable r2=Runtime.getRuntime()::gc;
if(r1==null) r1=r2;
else {
System.out.println(r1==r2? "shared": "unshared");
System.out.println(
r1.getClass()==r2.getClass()? "shared class": "unshared class");
}
}
In this second example, the same call-site is executed two times, producing a lambda containing a reference to a Runtime instance, and the current implementation will print "unshared" but "shared class".
Runnable r1=System::gc, r2=System::gc;
System.out.println(r1==r2? "shared": "unshared");
System.out.println(
r1.getClass()==r2.getClass()? "shared class": "unshared class");
In contrast, the last example has two different call-sites producing an equivalent method reference, but as of 1.8.0_05 it will print "unshared" and "unshared class".
For each lambda expression or method reference the compiler will emit an invokedynamic
instruction that refers to a JRE provided bootstrap method in the class LambdaMetafactory
and the static arguments necessary to produce the desired lambda implementation class. What the metafactory produces is left to the actual JRE, but it is specified behavior of the invokedynamic instruction to remember and re-use the CallSite instance created on the first invocation.
The current JRE produces a ConstantCallSite
containing a MethodHandle
to a constant object for stateless lambdas (and there’s no imaginable reason to do it differently). And method references to static methods are always stateless. So for stateless lambdas and single call-sites the answer must be: don’t cache; the JVM will, and if it doesn’t, it must have strong reasons that you shouldn’t counteract.
For lambdas having parameters (and this::func is a lambda that captures a reference to the this instance), things are a bit different. The JRE is allowed to cache them, but this would imply maintaining some sort of Map between actual parameter values and the resulting lambda, which could be more costly than just creating that simple structured lambda instance again. The current JRE does not cache lambda instances having a state.
But this does not mean that the lambda class is created every time. It just means that the resolved call-site will behave like an ordinary object construction instantiating the lambda class that has been generated on the first invocation.
Similar things apply to method references to the same target method created by different call-sites. The JRE is allowed to share a single lambda instance between them but in the current version it doesn’t, most probably because it is not clear whether the cache maintenance will pay off. Here, even the generated classes might differ.
So caching like in your example might make your program do different things than it would without, but not necessarily more efficiently. A cached object is not always more efficient than a temporary object. Unless you really measure a performance impact caused by a lambda creation, you should not add any caching.
I think there are only some special cases where caching might be useful:
- we are talking about lots of different call-sites referring to the same method
- the lambda is created in the constructor/class initializer because later on the use-site will
  - be called by multiple threads concurrently, or
  - suffer from the lower performance of the first invocation
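In code, that second case might look like the following sketch (the class, field, and method names here are made up for illustration):

```java
import java.util.function.IntUnaryOperator;

public class Worker {
    // Cache the method reference once, in the constructor, so the hot
    // path below never pays the first-invocation linkage cost and all
    // threads share one instance.
    private final IntUnaryOperator op;

    public Worker() {
        this.op = Math::abs;
    }

    public int process(int value) {
        return op.applyAsInt(value); // no per-call lambda instantiation
    }

    public static void main(String[] args) {
        System.out.println(new Worker().process(-5)); // prints 5
    }
}
```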
Why aren't method references singleton?
For instance methods, I don't think it would make sense for them to be cached. You'd have to cache one method per instance... which would either mean an extra field within the class associated with the method - one per public method, presumably, because the methods could be referenced from outside the class - or cached within the user of the method reference, in which case you'd need some sort of per-instance cache.
I think it makes more sense for method references to static methods to be cached, because they'll be the same forever. However, to cache the actual Runnable, you'd need a cache per type that it was targeting. For example:
public interface NotRunnable {
void foo();
}
Runnable thing1 = Main::doStuff; // After making doStuff static
NotRunnable thing2 = Main::doStuff;
Should thing1 and thing2 be equal here? How would the compiler know what type to create, if so? It could create two instances here and cache them separately - and always cache at the point of use rather than the point of method declaration. (Your example has the same class declaring the method and referencing it, which is a very special case. You should consider the more general case where they're different.)
The JLS allows for method references to be cached. From section 15.13.3:
Next, either a new instance of a class with the properties below is allocated and initialized, or an existing instance of a class with the properties below is referenced.
... but even for static methods, it seems javac
doesn't do any caching at the moment.
Differences between using a method reference and function object in stream operations?
Unless there's some additional magic I'm unaware of, the current lambda implementation will desugar your non-capturing lambda into a static method and will cache the lambda instance. By doing the same thing explicitly (a static final
reference to a lambda), you're basically duplicating that implicit work, so you're ending up with two cached references to the same thing. You are also defeating the lazy initialization of the lambda instance, which you'd otherwise be getting for free.
This is why I would prefer just the method reference: it's simpler to write, more idiomatic, and also seems to be more lightweight in terms of implementation.
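As a hedged illustration of the two styles (the field name is made up), both behave identically; the cached field merely duplicates the caching that the invokedynamic call-site already performs for a non-capturing method reference:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class Styles {
    // Explicit caching: duplicates the JVM's own call-site caching and
    // forces eager initialization of the lambda instance.
    static final Predicate<String> IS_EMPTY = String::isEmpty;

    public static void main(String[] args) {
        List<String> words = Arrays.asList("", "a", "", "b");
        // Idiomatic alternative: pass the method reference directly.
        long viaRef   = words.stream().filter(String::isEmpty).count();
        long viaField = words.stream().filter(IS_EMPTY).count();
        System.out.println(viaRef + " " + viaField); // prints "2 2"
    }
}
```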
Same Java Method Referenced But Different Address Returned
Are method references nothing more than syntactic sugar for anonymous classes?
Correct. They aren't necessarily always implemented in as heavyweight a manner as that, but conceptually that's all they are.
If not, what do I have to do to always get the same method reference? (Aside from storing a reference once in a field to work with.)
Store the reference in a field. That's the answer. (Sorry.)
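A minimal sketch of that field-based approach (class and member names are hypothetical):

```java
import java.util.function.Supplier;

public class StableRef {
    static String hello() { return "hello"; }

    // Stored once: every read of this field yields the identical object.
    static final Supplier<String> CACHED = StableRef::hello;

    public static void main(String[] args) {
        Supplier<String> a = CACHED;
        Supplier<String> b = CACHED;
        System.out.println(a == b); // true: same field, same object

        Supplier<String> c = StableRef::hello;
        Supplier<String> d = StableRef::hello;
        // Two distinct call-sites: identity is unspecified and may be false.
        System.out.println(c == d);
    }
}
```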
Caching the result of all methods
You could use the Reflectivity framework to add pre and post meta links to your methods. A link could check a cache before execution transparently.
link := MetaLink new
metaObject: self;
selector: #cachedExecute:;
arguments: #(selector);
control: #before.
(MyClass>>#myMethodSelector) ast link: link.
This code will install a meta link that sends #cachedExecute:
to a MyClass
object with the argument #myMethodSelector
. The link is installed on the first AST node of the compiled method (of that same method selector, but could be on another method). The #control:
message ensures that the link will be executed before the AST node is executed.
You can of course install multiple meta links that influence each other.
Note that in the above example you must not send the same message (#myMethodSelector) again inside of the #cachedExecute: method, since you'd end up in a loop.
Update
There's actually an error in the code above (now fixed). The #arguments: message takes a list of symbols that define the parameters of the method specified via #selector:. Those arguments will be reified from the context. To pass the method selector you'd use the #selector reification, for the method context the #context reification, and for method arguments #arguments. To see which reifications are available, look at #key on the class side of the subclasses of RFReification.
Lazy, but persisted, evaluation of java 8 lambda
Disclaimer: this answer doesn't respond to the question directly, as it uses neither Supplier nor Optional directly in the Node class. Instead, a generic functional programming technique is presented that might help solve the problem.
If the problem is all about evaluating the function only once for each input value, then you shouldn't change your tree/array/nodes. Instead, memoize the function, which is a pure functional approach:
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.
Here's a way to do it, inspired by this excellent article written by Pierre-Yves Saumont (please check it for an in-depth introduction to memoization):
public static <T, U> Function<T, U> memoize(Function<T, U> function) {
Map<T, U> cache = new ConcurrentHashMap<>();
return input -> cache.computeIfAbsent(input, function::apply);
}
Suppose you have a method that takes quite long to execute. Then, you could use the memoize
method this way:
// This method takes quite long to execute
Integer longCalculation(Integer x) {
try {
Thread.sleep(1_000);
} catch (InterruptedException ignored) {
}
return x * 2;
}
// Our function is a method reference to the method above
Function<Integer, Integer> function = this::longCalculation;
// Now we memoize the function declared above
Function<Integer, Integer> memoized = memoize(function);
Now, if you call:
int result1 = function.apply(1);
int result2 = function.apply(2);
int result3 = function.apply(3);
int result4 = function.apply(2);
int result5 = function.apply(1);
You'll notice that the five calls take ~5 seconds altogether (1 second for each call).
However, if you use the memoized
function with the same input values 1 2 3 2 1
:
int memoizedResult1 = memoized.apply(1);
int memoizedResult2 = memoized.apply(2);
int memoizedResult3 = memoized.apply(3);
int memoizedResult4 = memoized.apply(2); // <-- returned from cache
int memoizedResult5 = memoized.apply(1); // <-- returned from cache
You'll notice that now the five calls take ~3 seconds altogether. This is because the last two results are immediately returned from the cache.
So, back to your structure... Inside your map
method, you could just memoize the given function and use the returned memoized function instead. Internally, this will cache the function's return values in the ConcurrentHashMap.
As the memoize
method uses a ConcurrentHashMap
internally, you don't need to worry about concurrency.
Note: This is just the beginning... I'm thinking about two possible improvements here. One would be to limit the size of the cache, so that it doesn't take the whole memory if the domain of the given function is too big. The other improvement would be to memoize the given function only if it hasn't been memoized previously. But these details are left as an exercise for the reader... ;)
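For the first of those improvements, one possible approach (purely illustrative, not from the original answer) is an LRU-bounded cache built on LinkedHashMap in access order; note that it trades the lock-free ConcurrentHashMap for a synchronized wrapper:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class BoundedMemo {
    // Sketch of a size-bounded memoizer (hypothetical helper). The
    // LinkedHashMap runs in access order, so removeEldestEntry evicts
    // the least recently used entry once maxSize is exceeded; the map
    // is wrapped with synchronizedMap because LinkedHashMap itself is
    // not thread-safe.
    public static <T, U> Function<T, U> memoizeBounded(Function<T, U> f, int maxSize) {
        Map<T, U> cache = Collections.synchronizedMap(
            new LinkedHashMap<T, U>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<T, U> eldest) {
                    return size() > maxSize;
                }
            });
        return input -> cache.computeIfAbsent(input, f);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> doubled = memoizeBounded(x -> x * 2, 2);
        System.out.println(doubled.apply(1)); // 2
        System.out.println(doubled.apply(2)); // 4
        System.out.println(doubled.apply(3)); // 6, evicts the entry for 1
    }
}
```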
In Java why is Function.identity() a static method instead of something else?
TL;DR Using Function.identity()
creates only one object, so it's very memory efficient.
The third implementation doesn't compile, because T is undefined, so that's not an option. In the second implementation, every time you write Function::identity a new object instance is created. In the first implementation, whenever you call Function.identity(), the same lambda object instance is returned.
It is simple to see for yourself. Start by creating the two identity
methods in the same class, so rename them to identity1
and identity2
to keep them separately identifiable.
static <T> Function<T, T> identity1() {
return t -> t;
}
static <T> T identity2(T in) {
return in;
}
Write a test
method that accepts a Function
and prints the object, so we can see its unique identity, as reflected by the hash code.
static <A, B> void test(Function<A, B> func) {
System.out.println(func);
}
Call the test
method repeatedly to see if each one gets a new object instance or not (my code is in a class named Test
).
test(Test.identity1());
test(Test.identity1());
test(Test.identity1());
test(Test::identity2);
test(Test::identity2);
for (int i = 0; i < 3; i++)
test(Test::identity2);
Output
Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
Test$$Lambda$1/0x0000000800ba0840@7adf9f5f
Test$$Lambda$2/0x0000000800ba1040@5674cd4d
Test$$Lambda$3/0x0000000800ba1440@65b54208
Test$$Lambda$4/0x0000000800ba1840@6b884d57
Test$$Lambda$4/0x0000000800ba1840@6b884d57
Test$$Lambda$4/0x0000000800ba1840@6b884d57
As you can see, multiple statements calling Test.identity1()
all get the same object, but multiple statements using Test::identity2
all get different objects.
It is true that repeated executions of the same statement get the same object (as seen in the results from the loop), but that's different from the results obtained from different statements.
Conclusion: Using Test.identity1() creates only one object, so it's more memory efficient than using Test::identity2.
:: (double colon) operator in Java 8
Usually, one would call the reduce
method using Math.max(int, int)
as follows:
reduce(new IntBinaryOperator() {
    @Override
    public int applyAsInt(int left, int right) {
        return Math.max(left, right);
    }
});
That requires a lot of syntax for just calling Math.max
. That's where lambda expressions come into play. Since Java 8 it is allowed to do the same thing in a much shorter way:
reduce((int left, int right) -> Math.max(left, right));
How does this work? The Java compiler "detects" that you want to implement a method that accepts two ints and returns one int. This is equivalent to the formal parameters of the one and only method of the interface IntBinaryOperator (the parameter of the method reduce you want to call). So the compiler does the rest for you - it just assumes you want to implement IntBinaryOperator.
But as Math.max(int, int)
itself fulfills the formal requirements of IntBinaryOperator
, it can be used directly. Because Java 7 does not have any syntax that allows a method itself to be passed as an argument (you can only pass method results, but never method references), the ::
syntax was introduced in Java 8 to reference methods:
reduce(Math::max);
Note that this will be interpreted by the compiler, not by the JVM at runtime! Although the compiler produces different bytecode for the three code snippets, they are semantically equal, so the last two can be considered short (and probably more efficient) versions of the IntBinaryOperator implementation above!
(See also Translation of Lambda Expressions)
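Put together as a runnable sketch (the stream values are made up for illustration):

```java
import java.util.stream.IntStream;

public class MaxDemo {
    // Reduce with a method reference; equivalent to the anonymous class
    // and lambda forms shown above.
    static int maxOf(int... values) {
        return IntStream.of(values).reduce(Integer.MIN_VALUE, Math::max);
    }

    public static void main(String[] args) {
        System.out.println(maxOf(3, 1, 4, 1, 5)); // prints 5
    }
}
```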