Optimizing Lazy Collections

I completely agree with Alexander here. If you're storing lazy collections, you're generally doing something wrong, and the cost of repeated accesses is going to constantly surprise you.

These collections already blow up their complexity requirements, it's true:

Note: The performance of accessing startIndex, first, or any methods that depend on startIndex depends on how many elements satisfy the predicate at the start of the collection, and may not offer the usual performance given by the Collection protocol. Be aware, therefore, that general operations on LazyDropWhileCollection instances may not have the documented complexity.

But caching won't fix that. They'll still be O(n) on the first access, so a loop like

for i in 0..<xs.count { print(xs[i]) }

is still O(n^2). Also remember that O(1) and "fast" are not the same thing. It feels like you're trying to get to "fast" but that doesn't fix the complexity promise (that said, lazy structures are already breaking their complexity promises in Swift).
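To make the cost of repeated access concrete, here is a rough sketch in Java (not Swift, and not how the standard library is implemented). `LazyFilteredView`, `LazyIndexingDemo`, and the `predicateCalls` counter are hypothetical names made up for illustration. The view stores nothing, so every indexed lookup rescans the source from the start.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical lazy view: it stores no results, so each lookup rescans the source. */
final class LazyFilteredView {
    private final List<Integer> source;
    private final IntPredicate keep;
    long predicateCalls = 0; // instrumentation for this demo only

    LazyFilteredView(List<Integer> source, IntPredicate keep) {
        this.source = source;
        this.keep = keep;
    }

    /** Counts the matching elements; this is a full scan. */
    int count() {
        int matches = 0;
        for (int value : source) {
            predicateCalls++;
            if (keep.test(value)) matches++;
        }
        return matches;
    }

    /** Returns the index-th matching element, scanning from the start every time. */
    int get(int index) {
        int seen = 0;
        for (int value : source) {
            predicateCalls++;
            if (keep.test(value)) {
                if (seen == index) return value;
                seen++;
            }
        }
        throw new IndexOutOfBoundsException("index " + index);
    }
}

final class LazyIndexingDemo {
    /** Total predicate calls made by a `for i in 0..<count` style loop over n elements. */
    static long callsForIndexedLoop(int n) {
        List<Integer> source = IntStream.range(0, n).boxed().collect(Collectors.toList());
        LazyFilteredView view = new LazyFilteredView(source, value -> true);
        int count = view.count();
        for (int i = 0; i < count; i++) {
            view.get(i);
        }
        return view.predicateCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsForIndexedLoop(100));
        System.out.println(callsForIndexedLoop(200));
    }
}
```

In this sketch, doubling the input size roughly quadruples the number of predicate calls, which is the quadratic behaviour described above.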

Caching is a net-negative because it makes the normal (and expected) use of lazy data structures slower. The normal way to use lazy data structures is to consume them either zero or one times. If you were going to consume them more than one time, you should use a strict data structure. Caching something that you never use is a waste of time and space.

There are certainly conceivable use cases where you have a large data structure that will be sparsely accessed multiple times, and so caching would be useful, but this isn't the use case lazy was built to handle.

Attempting to optimize lazy collections on structs by using caching is impossible since subscript(_ position:) and all other methods that you'd need to implement to conform to LazyCollectionProtocol are non-mutating and structs are immutable by default. This means that we have to recompute all operations for every call to a property or method.

This isn't true. A struct can internally store a reference type to hold its cache and this is common. Strings do exactly this. They include a StringBuffer which is a reference type (for reasons related to a Swift compiler bug, StringBuffer is actually implemented as a struct that wraps a class, but conceptually it is a reference type). Lots of value types in Swift store internal buffer classes this way, which allows them to be internally mutable while presenting an immutable interface. (It's also important for CoW and lots of other performance and memory related reasons.)

Note that adding caching today would also break existing use cases of lazy:

struct Massive {
    let id: Int
    // Lots of data, but rarely needed.
}

// We have lots of items that we look at occasionally
let ids = 0..<10_000_000

// `massives` is lazy. When we ask for something it creates it, but when we're
// done with it, it's thrown away. If `lazy` forced caching, then everything
// we accessed would be kept forever. Also, if the values in `Massive` change over
// time, I certainly may want it to be rebuilt at this point and not cached.
let massives = ids.lazy.map(Massive.init)
let aMassive = massives[10]

This isn't to say a caching data structure wouldn't be useful in some cases, but it certainly isn't always a win. It imposes a lot of costs and breaks some uses while helping others. So if you want those other use cases, you should build a data structure that provides them. But it's reasonable that lazy is not that tool.
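As a rough idea of what such a separate caching structure could look like, here is a minimal sketch in Java rather than Swift. `MemoizedView` and its members are hypothetical names, not part of any library. It presents a read-only `get`, but keeps an internal mutable cache, so only the elements that were actually accessed are computed and retained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sparse cache: computes an element on first access and keeps it afterwards. */
final class MemoizedView<T> {
    private final IntFunction<T> compute;
    private final Map<Integer, T> cache = new HashMap<>();
    int computeCalls = 0; // instrumentation for this demo only

    MemoizedView(IntFunction<T> compute) {
        this.compute = compute;
    }

    /** Returns the element at index, computing it only if it is not cached yet. */
    T get(int index) {
        T value = cache.get(index);
        if (value == null) { // note: a null result would be recomputed on every call
            computeCalls++;
            value = compute.apply(index);
            cache.put(index, value);
        }
        return value;
    }

    /** Number of elements currently held in memory. */
    int cachedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        MemoizedView<String> view = new MemoizedView<>(i -> "item-" + i);
        view.get(10);
        view.get(10); // served from the cache
        System.out.println(view.computeCalls + " computation(s), " + view.cachedCount() + " cached");
    }
}
```

The trade-off is the one described above: everything that was ever accessed stays in memory, and a value that changes over time is never rebuilt unless you add your own invalidation.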

How to speed up lazy-Loading of very large collections

Based on the question/answer "What is the difference between hibernate.jdbc.fetch_size and hibernate.jdbc.batch_size?", try setting the properties hibernate.jdbc.fetch_size and hibernate.jdbc.batch_size. At least the property 'hibernate.jdbc.fetch_size' sets the fetch size directly on the JDBC statement, as you do in the JDBC test itself. See 4.5.4. Batching Database Operations.
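A minimal sketch of collecting those two settings in Java follows. The property keys are the ones named above; the helper class `HibernateJdbcTuning` and the values 500 and 50 are assumptions for illustration only, so measure with your own data before picking numbers. How the resulting properties reach Hibernate depends on your setup (configuration file, session factory builder, and so on).

```java
import java.util.Properties;

/** Hypothetical helper that only assembles the two JDBC tuning properties. */
final class HibernateJdbcTuning {
    static Properties jdbcTuning(int fetchSize, int batchSize) {
        Properties props = new Properties();
        // Rows fetched per round trip when reading a result set.
        props.setProperty("hibernate.jdbc.fetch_size", Integer.toString(fetchSize));
        // Statements grouped into one JDBC batch when writing.
        props.setProperty("hibernate.jdbc.batch_size", Integer.toString(batchSize));
        return props;
    }

    public static void main(String[] args) {
        // 500 and 50 are placeholder values, not recommendations.
        System.out.println(jdbcTuning(500, 50));
    }
}
```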

Hibernate: best practice to pull all lazy collections

Use Hibernate.initialize() within @Transactional to initialize lazy objects.

 // start transaction
      Hibernate.initialize(entity.getAddresses());
      Hibernate.initialize(entity.getPersons());
 // end transaction

Now, outside of the transaction, you are able to access the lazy objects.

entity.getAddresses().size();
entity.getPersons().size();

Understanding lazy loading optimization in C#

Lazy loading is not specific to C# or to Entity Framework. It's a common pattern that lets you defer some data loading, that is, not load it immediately. Some examples of when you need that:

Loading images in a (Word) document. The document may be big and can contain thousands of images. If you load them all when the document is opened, it might take a long time. Nobody wants to sit and watch a document load for 30 seconds. The same approach is used in web browsers: resources are not sent with the body of the page, and the browser defers loading them.
Loading graphs of objects. These may be objects from a database, file system objects, etc. Loading the full graph might amount to loading the entire database into memory. How long will that take? Is it efficient? No. If you are building a file system explorer, will you load info about every file in the system before you start using it? It's much faster to load info about the current directory only (and probably its direct children).

Lazy loading does not always mean deferring loading until you really need the data. Loading might occur on a background thread before you really need that data, and you may end up never needing it: e.g. you might never scroll to the bottom of a web page to see the footer image. Lazy loading means only deferring. C# enumerators can help you with that. Consider getting the list of files in a directory:

string[] files = Directory.GetFiles("D:");
IEnumerable<string> filesEnumerator = Directory.EnumerateFiles("D:");

The first approach returns an array of file names. The directory has to collect all its files and save their names to the array before you can get even the first file name. It's like loading all images before you see the document.

The second approach uses an enumerator: it returns files one by one as you ask for the next file name. The enumerator is returned immediately, without getting all the files and saving them to some collection, and you can process files one by one as you need them. Here, getting the file list is deferred.

But you should be careful. If the underlying operation is not deferred, then returning an enumerator gives you no benefit. E.g.

public IEnumerable<string> EnumerateFiles(string path)
{
    foreach(string file in Directory.GetFiles(path))
        yield return file;
}

Here you use the GetFiles method, which fills an array of file names before returning them. So yielding files one by one gives you no speed benefit.

Btw, in your case you have exactly the same problem: the GetCustomAttributes extension internally uses the Attribute.GetCustomAttributes method, which returns an array of attributes. So you will not reduce the time to get the first result.
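The same trap can be shown with a rough in-memory analogue in Java (not C#, and with no real file system involved). `DeferredVsEager`, `expensiveItem`, and the `produced` counter are hypothetical names for illustration. The counter records how many items the source had to produce before the caller received the first one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class DeferredVsEager {
    static int produced = 0; // how many items the "expensive" source has produced so far

    static String expensiveItem(int i) {
        produced++;
        return "file-" + i;
    }

    /** Eager: builds the whole list before the caller sees anything. */
    static List<String> loadAll(int n) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            all.add(expensiveItem(i));
        }
        return all;
    }

    /** Deferred: each item is produced only when next() is called. */
    static Iterator<String> enumerate(int n) {
        return new Iterator<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < n;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return expensiveItem(index++);
            }
        };
    }

    /** Looks lazy but is not: the full load finishes before the first element is available. */
    static Iterator<String> fakeLazy(int n) {
        return loadAll(n).iterator();
    }

    public static void main(String[] args) {
        produced = 0;
        enumerate(1000).next();
        System.out.println("deferred, items produced for first result: " + produced);

        produced = 0;
        fakeLazy(1000).next();
        System.out.println("fake lazy, items produced for first result: " + produced);
    }
}
```

Wrapping an eager call in an iterator changes the return type, not the cost of getting the first element.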

How to optimize fetching multiple collections in Hibernate?

You could also use the annotation @Fetch(FetchMode.SUBSELECT) on your collection.
If you "touch" one collection of that type, all collections will be fetched in ONE single SQL request.

@Entity
public class Country implements java.io.Serializable {

    private long id;
    private int version;
    private String country;
    private Set<City> cities = new HashSet<City>(0);

    @Fetch(FetchMode.SUBSELECT)
    @OneToMany(mappedBy = "country", cascade = CascadeType.ALL)
    public Set<City> getCities() {
        return cities;
    }

    ...
}

Here is an example of how to use it:

    public List<Country> selectCountrySubSelect() {
        List<Country> list = getSession().createQuery("select c from Country c").list();
        // You don't have to initialize every collection
        // for (Country country : list) {
        // Hibernate.initialize(country.getCities());
        // }
        // but just "touch" one, and all will be initialized
        Hibernate.initialize(((Country) list.get(0)).getCities());
        return list;
    }

The logs:

DEBUG org.hibernate.engine.loading.internal.CollectionLoadContext.endLoadingCollections():  - 2 collections were found in result set for role: business.hb.Country.cities
DEBUG org.hibernate.engine.loading.internal.CollectionLoadContext.endLoadingCollection():  - Collection fully initialized: [business.hb.Country.cities#1]
DEBUG org.hibernate.engine.loading.internal.CollectionLoadContext.endLoadingCollection():  - Collection fully initialized: [business.hb.Country.cities#2]
DEBUG org.hibernate.engine.loading.internal.CollectionLoadContext.endLoadingCollections():  - 2 collections initialized for role: business.hb.Country.cities
DEBUG org.hibernate.engine.internal.StatefulPersistenceContext.initializeNonLazyCollections():  - Initializing non-lazy collections

Doctrine: Prevent lazy loading whole collection when need a single element

Have you looked at Extra Lazy Associations?

New in version 2.1.
In many cases associations between entities can get pretty large. Even in a simple scenario like a blog, where posts can be commented on, you always have to assume that a post draws hundreds of comments. In Doctrine 2.0, if you accessed an association it would always get loaded completely into memory. This can lead to pretty serious performance problems if your associations contain several hundreds or thousands of entities.
With Doctrine 2.1 a feature called Extra Lazy is introduced for associations. Associations are marked as Lazy by default, which means the whole collection object for an association is populated the first time it's accessed. If you mark an association as extra lazy, the following methods on collections can be called without triggering a full load of the collection:
Collection#contains($entity)
Collection#containsKey($key) (available with Doctrine 2.5)
Collection#count()
Collection#get($key) (available with Doctrine 2.4)
Collection#slice($offset, $length = null)

Initialize lazy collections

I had the same problem a few months ago.
I solved it by setting a parameter in my session factory builder.

Try setting the parameter "hibernate.enable_lazy_load_no_trans" in your Hibernate config.

sfBuilder.getProperties().put("hibernate.enable_lazy_load_no_trans",
            "true");

This parameter solved my problem. Hope it helps.

Edited:
Use with caution. Check out this post.

How to optimize LINQ-to-NHibernate Lazy Loading

Recommended practice in NHibernate is to keep a session short-lived and thus prevent lazy loading. You can improve the efficiency of your query by applying "join fetch" (refer to the NHibernate documentation), which, by the way, will also read all child objects, but in one shot and not in the infamous 1 + N anti-pattern.

Children is not an IQueryable, so you can't use an expression. LINQ to NHibernate would allow you to query a Session with LINQ statements that get translated into SQL. Then you could query the Children collection with expressions as predicates.
