Ef 6 - How to Correctly Perform Parallel Queries

EF 6 - How to correctly perform parallel queries

The problem is this:

EF doesn't support processing multiple requests through the same DbContext object. If your second asynchronous request on the same DbContext instance starts before the first request finishes (and that's the whole point), you'll get an error message that your request is processing against an open DataReader.

Source: https://visualstudiomagazine.com/articles/2014/04/01/async-processing.aspx

You will need to modify your code to something like this:

async Task<List<E1Entity>> GetE1Data()
{
    using(var MyCtx = new MyCtx())
    {
         return await MyCtx.E1.Where(bla bla bla).ToListAsync();
    }
}

async Task<List<E2Entity>> GetE2Data()
{
    using(var MyCtx = new MyCtx())
    {
         return await MyCtx.E2.Where(bla bla bla).ToListAsync();
    }
}

async Task DoSomething()
{
    var t1 = GetE1Data();
    var t2 = GetE2Data();
    await Task.WhenAll(t1,t2);
    DoSomething(t1.Result, t2.Result);
}

Does Entity Framework support parallel async queries?

This is not supported as per the specifications of version 6.

This should throw a DbConcurrencyException exception saying

A second operation started on this context before a previous
asynchronous operation completed. Use 'await' to ensure that any
asynchronous operations have completed before calling another method
on this context. Any instance members are not guaranteed to be thread
safe.
EF will detect if the developer attempts to execute two async operations at one time and throw.

From a codeplex page of the project:

Enabling asynchronous execution of database operations is actually
orthogonal to enabling concurrent execution on the same context. In
the particular case of server scenarios, using concurrent access could
affect scalability negatively as it would mean that in order to
process a single request you would be spinning of an arbitrary number
of different threads. All the threads would compete for resources such
as memory with other threads necessary to server other concurrent
requests.

Entity Framework Core does not support this scenario either.

EF Core doesn't support multiple parallel operations being run on the same context instance. You should always wait for an operation to complete before beginning the next operation. This is typically done by using the await keyword on each async operation.

How to make parallel async queries in EF Core 3.0 with repository pattern?

How to use it with repositories in async?

You can only have one simultaneous asynchronous request per repository. If you need to have more than one at a time, then you need more than one repository. This may require you to inject a repository factory into your types.

What is the best practice in EF Core for using parallel async calls with an Injected DbContext?

It came to the point where really the only way to answer the debate was to do a performance/load test to get comparable, empirical, statistical evidence so I could settle this once and for all.

Here is what I tested:

Cloud Load test with VSTS @ 200 users max for 4 minutes on a Standard Azure webapp.

Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.

Results for Test #1: Sample Image

Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.

Results for Test #2: Sample Image

Conclusion:

For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.

The performance gains with parallel processing in my opinion is insignificant, and this does not justify the need for abandoning Dependency Injection which would create development overhead/maintenance debt, potential for bugs if handled wrong, and a departure from Microsoft's official recommendations.

One more thing to note: as you can see there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure the reason for this, but I would much prefer no 500 errors over a 10ms performance gain.

EF Core: Correct way to query data multiple levels deep in related one-to-many entities

Database management systems are extremely optimized in combining tables and selecting columns from the result. The transport of the selected data is the slower part.

Hence it is usually better to limit the data that needs to be transported: let the DBMS do all the joining and selecting.

For this, you don't need to put everything in one big LINQ statement that is hard to understand (and thus hard to test, reuse, maintain). As long as your LINQ statements remain IQuerayble<...>, the query is not executed. Concatenating several of these LINQ statements is not costly.

Back to your question

If you followed the entity framework conventions, your one-to-many relations and your many-to-many will have resulted in classes similar to the following:

class State
{
    public int Id {get; set;}
    public string Name {get; set;}
    ...

    // every State has zero or more Cities (one-to-many)
    public virtual ICollection<City> Cities {get; set;}
}

class City
{
    public int Id {get; set;}
    public string Name {get; set;}
    ...

    // Every City is a City in exactly one State, using foreign key:
    public int StateId {get; set;}
    public virtual State State {get; set;}

    // every City has zero or more Libraries (one-to-many)
    public virtual ICollection<Library> Libraries {get; set;}
}

Library and Books: many-to-many:

class Library
{
    public int Id {get; set;}
    public string Name {get; set;}
    ...

    // Every Library is a Library in exactly one City, using foreign key:
    public int CityId {get; set;}
    public virtual City City {get; set;}

    // every Library has zero or more Books (many-to-many)
    public virtual ICollection<Book> Books {get; set;}
}

class Book
{
    public int Id {get; set;}
    public string Title {get; set;}
    ...

    // Every Book is a Book in zero or more Libraries (many-to-many)
    public virtual ICollection<Book> Books {get; set;}
}

This is all that entity framework needs to know to recognize your tables, the columns in the tables and the relations between the tables.

You will only need attributes or fluent API if you want to deviate from the conventions: different identifiers for columns or tables, non-default types for decimals, non default behaviour for cascade on delete, etc.

In entity framework, the columns in the tables are represented by the non-virtual properties; the virtual properties represent the relations between the tables.

The foreign key is an actual column in the table, hence it is non-virtual. The one-to-many has virtual ICollection<Type> on the "one" side and virtual Type on the "many" side. The many-to-many has virtual ICollection<...> on both sides.

There is no need to specify the junction table. Entity framework recognizes the many-to-many and creates the junction table for you. If you use database first, you might need to use fluent API to specify the junction table.

But how am I supposed to do the joins without a junction table?

Answer: don't do the (group-)joins yourself, use the virtual ICollections!

How can I best return all of the States that contain a particular Book?

int bookId = ...
var statesWithThis = dbContext.States
    .Where(state => state.Cities.SelectMany(city => city.Libraries)
                                .SelectMany(library => library.Books)
                                .Select(book => book.Id)
                                .Contains(bookId);

In words: you have a lot of States. From every State, get all Books that are in all Libraries that are in all Cities in this State. Use SelectMany to make this one big sequence of Books. From every Book Select the Id. The result is one big sequence of Ids (of Books that are in Libraries that are in Cities that are in the State). Keep only those States that have at least one Book.

Room for Optimization

If you regularly need to do similar questions, like: "Give me all States that have a Book from a certain Author", or "Give me all Libraries that have a Book with a certain title", consider to create extension methods for this. This way you can concatenate them as any LINQ method. The extension method creates the query, it will not execute them, so this won't be a performance penalty.

Advantages of the extension method: simpler to understand, reusable, easier to test and easier to change.

If you are not familiar with extension methods, read Extension Methods Demystified

// you need to convert them to IQueryable with the AsQueryable() method, if not
// you get an error since the receiver asks for an IQueryable
// and a ICollection was given
public static IQueryable<Book> GetBooks(this IQueryable<Library> libraries)
{
    return libraries.SelectMany(library => library.AsQueryable().Books);
}

public static IQueryable<Book> GetBooks(this IQueryable<City> cities)
{
    return cities.SelectMany(city => city.Libraries.AsQueryable().GetBooks());
}

Usage:

Get all states that have a book by Karl Marx:

string author = "Karl Marx";
var statesWithCommunistBooks = dbContext.States.
    .Where(state => state.GetBooks()
                         .Select(book => book.Author)
                         .Contains(author));

Get all Cities without a bible:

string title = "Bible";
var citiesWithoutBibles = dbContext.Cities
    .Where(city => !city.GetBooks()
                       .Select(book => book.Title)
                       .Contains(title));

Because you extended your classes with method GetBooks(), it is as if States and Cities have Books. You've seen the reusability above. Changes can be easy, if for instance you extend your database such, that Cities have BookStores. GetBooks can check the libraries and the BookStores. Your change will be in one place. Users of GetBooks(), won't have to change.

Ef 6 - How to Correctly Perform Parallel Queries