Entity Framework Skip/Take Is Very Slow When Number to Skip Is Big

Entity Framework Skip/Take is very slow when number to skip is big

I think OFFSET .. FETCH is very useful when browsing the first pages of a large data set (which is what happens most often in most applications), but it has a performance drawback when querying high-numbered pages, because the database still has to read and discard every skipped row.

Check this article for more details regarding performance and alternatives to OFFSET .. FETCH.
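
One widely used alternative to OFFSET .. FETCH is keyset (seek) paging: instead of skipping N rows, remember the last key of the previous page and filter past it. The sketch below is only an illustration under stated assumptions: `Item`, `KeysetPaging` and `NextPage` are hypothetical names, the key is assumed unique and ordered, and the demo runs against an in-memory list rather than a real DbSet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; only the unique, ordered key matters here.
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class KeysetPaging
{
    // Returns the page that follows lastSeenId (pass null for the first page).
    // With an EF IQueryable the Where/OrderBy/Take chain is sent to the
    // database, so the cost does not grow with the page number.
    public static List<Item> NextPage(IQueryable<Item> source, int? lastSeenId, int pageSize)
    {
        IQueryable<Item> query = source;
        if (lastSeenId.HasValue)
        {
            int last = lastSeenId.Value;
            query = query.Where(x => x.Id > last); // seek past the previous page
        }
        return query.OrderBy(x => x.Id).Take(pageSize).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for a table (assumption: in real code this is a DbSet).
        var rows = Enumerable.Range(1, 1000)
            .Select(i => new Item { Id = i, Name = "row " + i })
            .AsQueryable();

        var first = KeysetPaging.NextPage(rows, null, 100);
        var second = KeysetPaging.NextPage(rows, first.Last().Id, 100);
        Console.WriteLine(second.First().Id + ".." + second.Last().Id); // 101..200
    }
}
```

The trade-off is that keyset paging only supports "next page" style navigation from a known key; jumping straight to an arbitrary page number still needs an offset.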

Try to apply as many filters as possible to your data before applying paging, so that paging runs against a smaller data volume. It is hard to imagine that the user wants to navigate through 1M rows.

LINQ to Entities - Skip and Take very slow

I got to the end of writing this question, and then struck upon a slightly different approach. I thought I'd post anyway because it may be useful.

I've changed my code structure. I now do an extended .Take() first. This gets me all pages up to, and including, the page I want to return. Then I do the order by and skip to get only the page I want.

// Take every row up to, and including, the requested page.
query = query
    .Take((page.GetValueOrDefault(0) + 1) * recordCount.GetValueOrDefault(100));

// Now skip to the required page.
daResults = query
    .OrderBy(x => x.Id)
    .Skip(page.GetValueOrDefault(0) * recordCount.GetValueOrDefault(100))
    .ToList();

The original Skip/Take results in the following SQL, which is slow because the internal query has to be fully evaluated before the offset is applied:

ORDER BY row_number() OVER (ORDER BY [Project1].[Id] ASC)
OFFSET 100 ROWS FETCH NEXT 100 ROWS ONLY

With the change, the internal query is much smaller: the sub-query uses SELECT TOP(200), which is lightning fast, and the OFFSET etc. is then applied to the reduced results.

I'm still only enumerating results (.ToList()) after all this has happened, so it all stays in the database and the results are now pretty much instant again.
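
The page arithmetic can be checked in isolation. The following is a minimal sketch using LINQ to Objects rather than EF; `PagingArithmetic` and `TakeThenSkip` are hypothetical names. It assumes a zero-based page index and that the rows reaching Take are already in Id order. Without that, a TOP with no defined order is free to return any rows, and the page would not match a plain Skip/Take.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingArithmetic
{
    // page is zero-based; the source must already be in the final sort order.
    public static List<int> TakeThenSkip(IEnumerable<int> orderedIds, int page, int pageSize)
    {
        return orderedIds
            .Take((page + 1) * pageSize) // pages 0..page, e.g. 200 rows for page 1, size 100
            .Skip(page * pageSize)       // discard pages 0..page-1, e.g. 100 rows
            .ToList();
    }

    public static void Main()
    {
        var ids = Enumerable.Range(1, 1000);

        var viaTake = TakeThenSkip(ids, page: 1, pageSize: 100);
        var viaSkip = ids.Skip(100).Take(100).ToList();

        Console.WriteLine(viaTake.SequenceEqual(viaSkip));          // True
        Console.WriteLine(viaTake.First() + ".." + viaTake.Last()); // 101..200
    }
}
```

For page 1 with 100 records per page this gives Take(200) followed by Skip(100), which matches the TOP(200) and OFFSET 100 seen in the generated SQL.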

Query using Entity Framework is Very SLOW

Have the entities been set up with navigation properties? For instance, does the Diagnostico entity have something like the below declared?

public virtual Parceiro ParceiroPf { get; set; }
public virtual Parceiro ParceiroPj { get; set; }

From reading the second part of the query it does look like there are related entities mapped.

If you have navigation properties at your disposal then you can structure those queries to use the navigation properties rather than the embedded SQL. As mentioned, that way of querying is vulnerable to SQL injection and it should be a priority to eliminate it.

The performance cost you are likely seeing is due to the manual lazy-load that is being done to populate the various related details for the query results.

At a minimum you can speed up the loading of these related details by first extracting your "idDiagnostico" values from the query results and using those to load all of the related child records in one hit, then associate them to their respective Diagnostico entities:

So assuming you need to keep the SQL query at least to begin with:

// ... load SQL based initial data ...

List<RetornaRelatorioEnvioSmsModel> models = ctx.Database.SqlQuery<RetornaRelatorioEnvioSmsModel>(query).ToList();

if (models.Count == 0)
    return models;

// Fetch all applicable IDs.
var diagnosticoIds = models.Select(x => x.idDiagnostico).ToList();

// Load the related data for *all* applicable diagnostico IDs above. Load into their view models, then at the end, split them among the related diagnostico.

var formasContatos = ctx.tb_diagnostico
    .Where(x => diagnosticoIds.Contains(x.id_diagnostico))
    .Select(x => new RetornaRelatorioEnvioSmsFormaContatoModel
    {
        id_diagnostico = x.id_diagnostico, // needed to match results back to their diagnostico
        formaContato = x.tb_parceiro.tb_tipo_forma_contato.nm_tipo_forma_contato
    }).ToList();

var temas = ctx.tb_diagnostico
    .Where(x => diagnosticoIds.Contains(x.id_diagnostico))
    .Select(x => new RetornaRelatorioEnvioSmsTemaModel
    {
        id_diagnostico = x.id_diagnostico, // needed to match results back to their diagnostico
        tema = x.tb_diagnosticoperfiltema.tb_perfiltema.tb_tema.nm_tema,
        nivel = x.tb_diagnosticoperfiltema.tb_nivelmaturidade.nm_nivel
    }).ToList();

// This part is a bit tricky. It looks like you want the lowest nu_prioridade of the highest nu_pontuacao.
var temaPrioritario = ctx.tb_diagnostico
    .SelectMany(x => x.tb_diagnosticoperfiltema) // from diagnostico
    .SelectMany(x => x.tb_perfiltema) // from diagnostico.diagnosticoperfiltema
    .GroupBy(x => x.tb_diagnosticoperfiltema.tb_diagnostico.id_diagnostico) // group by diagnostico ID. Requires bi-directional references...
    .Select(x => new
    {
        x.Key, // id_diagnostico
        Tema = x.OrderByDescending(y => y.tb_diagnosticoperfiltema.nu_pontuacao)
            .ThenBy(y => y.nu_prioridade)
            .Select(y => new RetornaRelatorioEnvioSmsTemaModel
            {
                id_diagnostico = x.Key, // needed to match results back to their diagnostico
                tema = y.tb_tema.nm_tema,
                nivel = y.tb_diagnosticoperfiltema.tb_nivelmaturidade.nm_nivel
            }).FirstOrDefault()
    })
    .Where(x => diagnosticoIds.Contains(x.Key))
    .Select(x => x.Tema)
    .ToList();

// Caveat, the above needs to be tested but should give you an idea on how to select the desired data.

foreach (var model in models)
{
    model.formasContato = formasContatos.Where(x => x.id_diagnostico == model.idDiagnostico).ToList();
    model.temas = temas.Where(x => x.id_diagnostico == model.idDiagnostico).ToList();
    model.temaPrioritario = temaPrioritario.Where(x => x.id_diagnostico == model.idDiagnostico).ToList();
}
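
The association loop rescans each child list once per model. If the lists get large, a lookup keyed by the parent ID is one way to avoid that. This is a minimal sketch with simplified, hypothetical types (`TemaRow`, `ReportModel`, `ChildAssignment`), not the actual view models from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified shapes: each child row carries its parent ID.
public class TemaRow
{
    public int IdDiagnostico { get; set; }
    public string Tema { get; set; }
}

public class ReportModel
{
    public int IdDiagnostico { get; set; }
    public List<TemaRow> Temas { get; set; } = new List<TemaRow>();
}

public static class ChildAssignment
{
    public static void Attach(IEnumerable<ReportModel> models, IEnumerable<TemaRow> temas)
    {
        // One pass builds the index; indexing a lookup with a key that is
        // not present returns an empty sequence rather than throwing.
        ILookup<int, TemaRow> byParent = temas.ToLookup(t => t.IdDiagnostico);

        foreach (var model in models)
            model.Temas = byParent[model.IdDiagnostico].ToList();
    }

    public static void Main()
    {
        var models = new List<ReportModel>
        {
            new ReportModel { IdDiagnostico = 1 },
            new ReportModel { IdDiagnostico = 2 },
            new ReportModel { IdDiagnostico = 3 }
        };
        var temas = new List<TemaRow>
        {
            new TemaRow { IdDiagnostico = 1, Tema = "A" },
            new TemaRow { IdDiagnostico = 1, Tema = "B" },
            new TemaRow { IdDiagnostico = 2, Tema = "C" }
        };

        Attach(models, temas);
        Console.WriteLine(string.Join(" ", models.Select(m => m.Temas.Count))); // 2 1 0
    }
}
```

This only changes the in-memory matching step; the number of database round trips stays the same as in the code above.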

With the navigation properties, though, this can all be done away with and loaded from the initial data model retrieved. It's a pretty complex model, and the (Portuguese?) naming convention makes it a bit hard to follow, but hopefully that gives you some ideas on how to tackle the performance issues.
