What's the Difference Between "Includes" and "Preload" in an Activerecord Query

What's the difference between “includes” and “preload” in an ActiveRecord query?

Rails has 2 ways of avoiding the n+1 problem. One involves creating a big join based query to pull in your associations, the other involves making a separate query per association.

When you do includes rails decides which strategy to use for you. It defaults to the separate query approach (preloading) unless it thinks you are using the columns from the associations in you conditions or order. Since that only works with the joins approach it uses that instead.

Rails' heuristics sometimes get it wrong or you may have a specific reason for preferring one approach over the other. preload ( and its companion method eager_load) allow you to specify which strategy you want rails to use.

What's the difference between includes and joins in ActiveRecord query?

:joins joins tables together in sql, :includes eager loads associations to avoid the n+1 problem (where one query is executed to retrieve the record and then one per association which is loaded).

I suggest you read their sections in Rails Guides to get more info.

What's the difference between left_joins and includes in Rails 5?

includes by default loads the association data in 2 queries just like preload.
But with additional references call it switches from using two separate queries to creating a single LEFT OUTER JOIN like left_joins.

Refs:

Preload, Eagerload, Includes and Joins

Making sense of ActiveRecord joins, includes, preload, and eager_load

For the sake of efficiency and optimization, should I use eager_load or includes?

As I can't infer the query from your description (a: [:b, {c: :d}, {e: :f}]), I need to talk about includes for a little bit.

includes is a query method which accommodates in different situations.

Here are some example code:

# model and reference
class Blog < ActiveRecord::Base
has_many :posts

# t.string "name"
# t.string "author"
end

class Post < ActiveRecord::Base
belongs_to :blog

# t.string "title"
end

# seed
(1..3).each do |b_id|
blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
(1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end

In one case, it fires two separate queries, just like preload.

> Blog.includes(:posts)
Blog Load (2.8ms) SELECT "blogs".* FROM "blogs"
Post Load (0.7ms) SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)

In another case, when querying on the referenced table, it fires only one LEFT OUTER JOIN query, just like eager_load.

> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
SQL (0.3ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ? [["title", "Post 1-1"]]

So, I think you may asking for the different part of includes and eager_load, which is

Should we use two separate queries or one LEFT OUTER JOIN query for the sake of efficiency and optimisation?

This also confuses me. After some digging, I've found this article by Fabio Akita convinced me. Here are some references and example:

For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.

The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.

Example for query data from Rails

> Blog.eager_load(:posts).map(&:name).count
SQL (0.9ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
=> 3

Example for SQL data returned from LEFT OUTER JOIN query

sqlite>  SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3

We got the expected result from Rails, but bigger result from SQL. And that's the efficiency lose for the LEFT OUTER JOIN.

So my conclusion is, prefer includes over eager_load.


I've concluded a blog post about Preload, Eager_load, Includes, References, and Joins in Rails while researching. Hope this can help.

Reference

  • Remove N+1 queries in your Ruby on Rails app
  • Rails :include vs. :joins
  • Preload, Eagerload, Includes and Joins
  • Rolling with Rails 2.1 - The First Full Tutorial - Part 2

Preloading using includes on unrelated models

A join across three tables is an inefficient way to do this. Rails might even be smart enough to realise this and split it into seperate db queries.

I would do it like this, which has two simple queries instead

student_ids_with_car = Car.select("student_id").distinct
@tests = Test.where("student_id not in (?)", student_ids_with_car)

Comparing .references requirement on includes vs. eager_load

It comes down to the problem they mention in the deprecation warning:

Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality.

In older versions, Rails tried to be helpful about selecting the query pattern to use, and includes would use the preload strategy if it could, but switch to the eager_load strategy when it looks like you're referencing something in a joined table. But without a full SQL parser figuring out what tables are actually referenced, it's like parsing XHTML with a Regex - you can get some things done, but Rails can't decide correctly in every case. Consider:

User.includes(:orders).where("Orders.cost < 20")

This is a nice, simple example, and Rails could tell that you need Orders joined. Now try this one:

User.includes(:orders).where("id IN (select user_id from Orders where Orders.cost < 20)")

This gives the same result, but the subquery rendered joining Orders unnecessary. It's a contrived example, and I don't know whether Rails would decide the second query needed to join or not, but the point is there are cases when the heuristic could make the wrong decision. In those cases, either Rails would perform an unnecessary join, burning memory and slowing the query down, or not perform a necessary join, causing an error.

Rather than maintain a heuristic with a pretty bad failure case, the developers decided to just ask the programmer whether the join is needed. You're able to get it right more often than Rails can (hopefully), and when you get it wrong, it's clear what to change.

Instead of adding references you could switch to eager_load, but keeping includes and references separate allows the implementation flexibility in its query pattern. You could conceivably .includes(:orders, :addresses).references(:orders) and have addresses loaded in a second preload-style query because it's not needed during the join (though Rails actually just includes addresses in the join anyway). You don't need to specify references when you're using eager_load because eager_load always joins, where preload always does multiple queries. All references does is instruct includes to use the necessary eager_load strategy and specify which tables are needed.

Rails 4: Should I preload relations in the controller

When you iterate through a collection and access a child assocations this creates what is called a N+1 query issue where each record will create a separate database query.

If a single request ends up using 100 database queries your server will grind to a halt in no time.

<% @users do |user| %>
<div><%= user.name %> : </div>
<!-- This line will create a N+1 query -->
<% user.posts.each do |post| %>
<div><%= post.title %></div>
<% end %>
<% end %>

So yes - preloading associations is necessary to build applications that scale (or even work). ActiveRecord provides several methods such as .includes, .preload, .eager_load, .join and .left_outer_joins (Rails 5) that each give different results and are good for different use cases.



Related Topics



Leave a reply



Submit