What's the difference between “includes” and “preload” in an ActiveRecord query?
Rails has 2 ways of avoiding the n+1 problem. One involves creating a big join based query to pull in your associations, the other involves making a separate query per association.
When you do includes
rails decides which strategy to use for you. It defaults to the separate query approach (preloading) unless it thinks you are using the columns from the associations in you conditions or order. Since that only works with the joins approach it uses that instead.
Rails' heuristics sometimes get it wrong or you may have a specific reason for preferring one approach over the other. preload
( and its companion method eager_load
) allow you to specify which strategy you want rails to use.
What's the difference between includes and joins in ActiveRecord query?
:joins
joins tables together in sql, :includes
eager loads associations to avoid the n+1 problem (where one query is executed to retrieve the record and then one per association which is loaded).
I suggest you read their sections in Rails Guides to get more info.
What's the difference between left_joins and includes in Rails 5?
includes
by default loads the association data in 2 queries just like preload
.
But with additional references
call it switches from using two separate queries to creating a single LEFT OUTER JOIN
like left_joins
.
Refs:
Preload, Eagerload, Includes and Joins
Making sense of ActiveRecord joins, includes, preload, and eager_load
For the sake of efficiency and optimization, should I use eager_load or includes?
As I can't infer the query from your description (a: [:b, {c: :d}, {e: :f}])
, I need to talk about includes
for a little bit.
includes
is a query method which accommodates in different situations.
Here are some example code:
# model and reference
class Blog < ActiveRecord::Base
has_many :posts
# t.string "name"
# t.string "author"
end
class Post < ActiveRecord::Base
belongs_to :blog
# t.string "title"
end
# seed
(1..3).each do |b_id|
blog = Blog.create(name: "Blog #{b_id}", author: 'someone')
(1..5).each { |p_id| blog.posts.create(title: "Post #{b_id}-#{p_id}") }
end
In one case, it fires two separate queries, just like preload
.
> Blog.includes(:posts)
Blog Load (2.8ms) SELECT "blogs".* FROM "blogs"
Post Load (0.7ms) SELECT "posts".* FROM "posts" WHERE "posts"."blog_id" IN (1, 2, 3)
In another case, when querying on the referenced table, it fires only one LEFT OUTER JOIN
query, just like eager_load
.
> Blog.includes(:posts).where(posts: {title: 'Post 1-1'})
SQL (0.3ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id" WHERE "posts"."title" = ? [["title", "Post 1-1"]]
So, I think you may asking for the different part of includes
and eager_load
, which is
Should we use two separate queries or one LEFT OUTER JOIN
query for the sake of efficiency and optimisation?
This also confuses me. After some digging, I've found this article by Fabio Akita convinced me. Here are some references and example:
For some situations, the monster outer join becomes slower than many smaller queries. The bottom line is: generally it seems better to split a monster join into smaller ones. This avoid the cartesian product overload problem.
The longer and more complex the result set, the more this matters because the more objects Rails would have to deal with. Allocating and deallocating several hundreds or thousands of small duplicated objects is never a good deal.
Example for query data from Rails
> Blog.eager_load(:posts).map(&:name).count
SQL (0.9ms) SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id"
=> 3
Example for SQL data returned from LEFT OUTER JOIN
query
sqlite> SELECT "blogs"."id" AS t0_r0, "blogs"."name" AS t0_r1, "blogs"."author" AS t0_r2, "blogs"."created_at" AS t0_r3, "blogs"."updated_at" AS t0_r4, "posts"."id" AS t1_r0, "posts"."title" AS t1_r1, "posts"."created_at" AS t1_r2, "posts"."updated_at" AS t1_r3, "posts"."blog_id" AS t1_r4 FROM "blogs" LEFT OUTER JOIN "posts" ON "posts"."blog_id" = "blogs"."id";
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|1|Post 1-1|2015-11-11 15:22:35.053689|2015-11-11 15:22:35.053689|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|2|Post 1-2|2015-11-11 15:22:35.058113|2015-11-11 15:22:35.058113|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|3|Post 1-3|2015-11-11 15:22:35.062776|2015-11-11 15:22:35.062776|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|4|Post 1-4|2015-11-11 15:22:35.065994|2015-11-11 15:22:35.065994|1
1|Blog 1|someone|2015-11-11 15:22:35.015095|2015-11-11 15:22:35.015095|5|Post 1-5|2015-11-11 15:22:35.069632|2015-11-11 15:22:35.069632|1
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|6|Post 2-1|2015-11-11 15:22:35.078644|2015-11-11 15:22:35.078644|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|7|Post 2-2|2015-11-11 15:22:35.081845|2015-11-11 15:22:35.081845|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|8|Post 2-3|2015-11-11 15:22:35.084888|2015-11-11 15:22:35.084888|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|9|Post 2-4|2015-11-11 15:22:35.087778|2015-11-11 15:22:35.087778|2
2|Blog 2|someone|2015-11-11 15:22:35.072871|2015-11-11 15:22:35.072871|10|Post 2-5|2015-11-11 15:22:35.090781|2015-11-11 15:22:35.090781|2
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|11|Post 3-1|2015-11-11 15:22:35.097479|2015-11-11 15:22:35.097479|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|12|Post 3-2|2015-11-11 15:22:35.103512|2015-11-11 15:22:35.103512|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|13|Post 3-3|2015-11-11 15:22:35.108775|2015-11-11 15:22:35.108775|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|14|Post 3-4|2015-11-11 15:22:35.112654|2015-11-11 15:22:35.112654|3
3|Blog 3|someone|2015-11-11 15:22:35.093902|2015-11-11 15:22:35.093902|15|Post 3-5|2015-11-11 15:22:35.117601|2015-11-11 15:22:35.117601|3
We got the expected result from Rails, but bigger result from SQL. And that's the efficiency lose for the LEFT OUTER JOIN
.
So my conclusion is, prefer includes
over eager_load
.
I've concluded a blog post about Preload, Eager_load, Includes, References, and Joins in Rails while researching. Hope this can help.
Reference
- Remove N+1 queries in your Ruby on Rails app
- Rails :include vs. :joins
- Preload, Eagerload, Includes and Joins
- Rolling with Rails 2.1 - The First Full Tutorial - Part 2
Preloading using includes on unrelated models
A join across three tables is an inefficient way to do this. Rails might even be smart enough to realise this and split it into seperate db queries.
I would do it like this, which has two simple queries instead
student_ids_with_car = Car.select("student_id").distinct
@tests = Test.where("student_id not in (?)", student_ids_with_car)
Comparing .references requirement on includes vs. eager_load
It comes down to the problem they mention in the deprecation warning:
Currently, Active Record recognizes the table in the string, and knows to JOIN the comments table to the query, rather than loading comments in a separate query. However, doing this without writing a full-blown SQL parser is inherently flawed. Since we don't want to write an SQL parser, we are removing this functionality.
In older versions, Rails tried to be helpful about selecting the query pattern to use, and includes
would use the preload
strategy if it could, but switch to the eager_load
strategy when it looks like you're referencing something in a joined table. But without a full SQL parser figuring out what tables are actually referenced, it's like parsing XHTML with a Regex - you can get some things done, but Rails can't decide correctly in every case. Consider:
User.includes(:orders).where("Orders.cost < 20")
This is a nice, simple example, and Rails could tell that you need Orders
joined. Now try this one:
User.includes(:orders).where("id IN (select user_id from Orders where Orders.cost < 20)")
This gives the same result, but the subquery rendered joining Orders
unnecessary. It's a contrived example, and I don't know whether Rails would decide the second query needed to join or not, but the point is there are cases when the heuristic could make the wrong decision. In those cases, either Rails would perform an unnecessary join, burning memory and slowing the query down, or not perform a necessary join, causing an error.
Rather than maintain a heuristic with a pretty bad failure case, the developers decided to just ask the programmer whether the join is needed. You're able to get it right more often than Rails can (hopefully), and when you get it wrong, it's clear what to change.
Instead of adding references
you could switch to eager_load
, but keeping includes
and references
separate allows the implementation flexibility in its query pattern. You could conceivably .includes(:orders, :addresses).references(:orders)
and have addresses
loaded in a second preload
-style query because it's not needed during the join (though Rails actually just includes addresses
in the join anyway). You don't need to specify references
when you're using eager_load
because eager_load
always joins, where preload
always does multiple queries. All references
does is instruct includes
to use the necessary eager_load
strategy and specify which tables are needed.
Rails 4: Should I preload relations in the controller
When you iterate through a collection and access a child assocations this creates what is called a N+1 query issue where each record will create a separate database query.
If a single request ends up using 100 database queries your server will grind to a halt in no time.
<% @users do |user| %>
<div><%= user.name %> : </div>
<!-- This line will create a N+1 query -->
<% user.posts.each do |post| %>
<div><%= post.title %></div>
<% end %>
<% end %>
So yes - preloading associations is necessary to build applications that scale (or even work). ActiveRecord provides several methods such as .includes
, .preload
, .eager_load
, .join
and .left_outer_joins
(Rails 5) that each give different results and are good for different use cases.
Related Topics
How to Add a New Action to the Existing Controller
Ruby: Sum Corresponding Members of Two or More Arrays
What Does the Fail Keyword Do in Ruby
Ruby - Determining Method Origins
How to Backreference in Ruby Regular Expression (Regex) with Gsub When I Use Grouping
Getting Typed Results from Activerecord Raw SQL
How to Resolve Rails Model Namespace Collision
Rails How to Update a Column After Saving
Rails: Plus Sign in Get-Request Replaced by Space
How to Create a Full Audit Log in Rails for Every Table
Why Does Ruby on Rails Use Http://0.0.0.0:3000 Instead of Http://Localhost:3000
Ruby on Rails Error "Cannot Load Such File -- Less"
How to Check If a Variable Is a Number or a String
Bundler Could Not Find Compatible Versions for Gem "Bundler":