take vs first performance in Ruby on Rails
In general "take" will be faster, because the database does not have to identify all of the rows that meet the criteria and then sort them and find the lowest-sorting row. "take" allows the database to stop as soon as it has found a single row.
The degree to which it is faster is going to vary according to:
- How much time is saved in not having to look for more than one row. The most extreme case here is where a full scan of a large table is required, but one matching row is found very early in the scan. "take" would allow the scan to be stopped at that point.
- How many rows would need to be sorted to find the one with the lowest id. The worst case here is where every row in the table matches the criteria and needs to be included in the sort.
There are some other factors to consider. For example, for a "first" query the optimiser might be able to access the table via a scan of the primary key index and check each row to see if it matches the condition. If there is a very high likelihood that an early row matches, then both a complete scan of the data and a sort can be avoided, provided the query optimiser is sophisticated enough.
In many cases, where there are very few matching records and index-based access to find them (for example, where there is a unique index on "email" in your example), you'll find that the difference is trivial. However, I would still use "take" in preference to "first" even then.
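As a rough illustration of the two kinds of work described above, here is a plain-Ruby analogy. It is not what the database literally does and no Rails is involved; the Row struct, the ROWS array and the two helper methods are made up for the example. The take-like helper stops at the first match, while the first-like helper has to look at every row and then pick the match with the lowest id.

```ruby
# Hypothetical in-memory "table"; ids are deliberately not in storage order.
Row = Struct.new(:id, :email)
ROWS = [5, 3, 9, 1, 7].map { |id| Row.new(id, "f@example.com") }

# Analogy for `take`: return the first matching row encountered, then stop.
def take_like(rows, email)
  examined = 0
  row = rows.find { |r| examined += 1; r.email == email }
  [row, examined]
end

# Analogy for `first`: collect every match, then pick the lowest id.
def first_like(rows, email)
  examined = 0
  matches = rows.select { |r| examined += 1; r.email == email }
  [matches.min_by(&:id), examined]
end

taken, taken_examined = take_like(ROWS, "f@example.com")
firsted, first_examined = first_like(ROWS, "f@example.com")

puts "take-like:  id=#{taken.id}, rows examined=#{taken_examined}"
puts "first-like: id=#{firsted.id}, rows examined=#{first_examined}"
```

Here every row matches, which is the worst case for the sort. With a unique index on email there is at most one match either way, which is why the difference becomes trivial in that situation.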
Edit: I'll just add, though it's a little off-topic, that in your example you might as well use:
User.find_by(email: 'f@example.com')
The generated query should be exactly the same as for take, but the semantics are a bit more clear I think.
Arrays in Ruby: Take vs Limit vs First
- limit is not an array method
- take requires an argument; it returns an empty array if the array is empty.
- first can be called without an argument; it returns nil if the array is empty and the argument is absent.
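A minimal sketch of the three points above in plain Ruby (no Rails loaded); the variable names are only for illustration:

```ruby
empty = []
nums  = [10, 20, 30]

p empty.first      # nil: no argument, empty array
p empty.first(2)   # []
p empty.take(2)    # []
p nums.first       # 10
p nums.first(2)    # [10, 20]
p nums.take(2)     # [10, 20]

# take requires an argument; calling it without one raises ArgumentError.
take_error = begin
  nums.take
  nil
rescue ArgumentError => e
  e
end
puts "take without an argument: #{take_error.class}"

# limit is not defined on Array.
puts "Array responds to limit? #{nums.respond_to?(:limit)}"
```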
Source for 2.0 take:
static VALUE
rb_ary_take(VALUE obj, VALUE n)
{
    long len = NUM2LONG(n);
    if (len < 0) {
        rb_raise(rb_eArgError, "attempt to take negative size");
    }
    return rb_ary_subseq(obj, 0, len);
}
Source for 2.0 first:
static VALUE
rb_ary_first(int argc, VALUE *argv, VALUE ary)
{
    if (argc == 0) {
        if (RARRAY_LEN(ary) == 0) return Qnil;
        return RARRAY_PTR(ary)[0];
    }
    else {
        return ary_take_first_or_last(argc, argv, ary, ARY_TAKE_FIRST);
    }
}
In terms of Rails:
- limit(5) will add the scope of limit(5) to an ActiveRecord::Relation and return a relation, so it can be chained further. It cannot be called on an array, so first(5).limit(4) will fail.
- first(5) will add the scope of limit(5) to an ActiveRecord::Relation, run the query and return an array. It can also be called on an array, so .first(4).first(3) will be the same as .limit(4).first(3).
- take(5), in Rails versions before 4.0, is only the array method: it will run the query in the current scope, build all the objects and return the first 5. There, Model.take(5) will not work, though the other two will. From Rails 4.0 on, take is also an ActiveRecord finder method, and Model.take(5) runs a query with a limit of 5.
Ruby Enumerable#first vs #take
Well, I've looked at the source (Ruby 2.1.5). Under the hood, if first is provided an argument, it forwards it to take. Otherwise, it returns a single value:
static VALUE
enum_first(int argc, VALUE *argv, VALUE obj)
{
    NODE *memo;
    rb_check_arity(argc, 0, 1);
    if (argc > 0) {
        return enum_take(obj, argv[0]);
    }
    else {
        memo = NEW_MEMO(Qnil, 0, 0);
        rb_block_call(obj, id_each, 0, 0, first_i, (VALUE)memo);
        return memo->u1.value;
    }
}
take, on the other hand, requires an argument and always returns an array of the given size or smaller, with the elements taken from the beginning.
static VALUE
enum_take(VALUE obj, VALUE n)
{
    NODE *memo;
    VALUE result;
    long len = NUM2LONG(n);
    if (len < 0) {
        rb_raise(rb_eArgError, "attempt to take negative size");
    }
    if (len == 0) return rb_ary_new2(0);
    result = rb_ary_new2(len);
    memo = NEW_MEMO(result, 0, len);
    rb_block_call(obj, id_each, 0, 0, take_i, (VALUE)memo);
    return result;
}
So yes, that's the reason why these two are so similar. The only difference seems to be that first can be called without arguments, in which case it returns a single value rather than an array. <...>.first(1), on the other hand, is equivalent to <...>.take(1). As simple as that.
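A small sketch of that equivalence on a non-Array Enumerable. CountingSource is a hypothetical class written only for this example; it counts how many elements its each method has yielded, which shows that both first and take stop iterating as soon as they have enough elements.

```ruby
# Hypothetical Enumerable that counts how many elements it has yielded.
class CountingSource
  include Enumerable
  attr_reader :yielded

  def initialize
    @yielded = 0
  end

  def each
    (1..100).each do |i|
      @yielded += 1
      yield i
    end
  end
end

single = CountingSource.new
single_value = single.first
p single_value        # 1 (a single value, not an array)
p single.yielded      # 1 (iteration stopped after the first element)

via_first = CountingSource.new
via_take  = CountingSource.new
p via_first.first(3)  # [1, 2, 3]
p via_take.take(3)    # [1, 2, 3]
p via_first.yielded   # 3
p via_take.yielded    # 3
```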
With lazy collections, however, things are different. first in lazy collections is still enum_first which, as seen above, is a shortcut to enum_take. take, however, is the C-coded lazy_take:
static VALUE
lazy_take(VALUE obj, VALUE n)
{
    long len = NUM2LONG(n);
    VALUE lazy;
    if (len < 0) {
        rb_raise(rb_eArgError, "attempt to take negative size");
    }
    if (len == 0) {
        VALUE len = INT2FIX(0);
        lazy = lazy_to_enum_i(obj, sym_cycle, 1, &len, 0);
    }
    else {
        lazy = rb_block_call(rb_cLazy, id_new, 1, &obj,
                             lazy_take_func, n);
    }
    return lazy_set_method(lazy, rb_ary_new3(1, n), lazy_take_size);
}
...which doesn't evaluate immediately; it requires a .force call for that.
And in fact, this is hinted at in the docs under lazy, which list all the lazily implemented methods. The list does contain take, but doesn't contain first. In other words, on lazy sequences take stays lazy and first doesn't.
Here's an example of how these work differently:
lz = (1..Float::INFINITY).lazy.map{|i| i }
# An infinite sequence, evaluating it head-on won't do
# Ruby 2.2 also offers `.map(&:itself)`
lz.take(5)
#=> #<Enumerator::Lazy: ...>
# Well, `take` is lazy then
# Still, we need values
lz.take(5).force
#=> [1, 2, 3, 4, 5]
# Why yes, values, finally
lz.first(5)
#=> [1, 2, 3, 4, 5]
# So `first` is not lazy, it evaluates values immediately
Some extra fun can be gained by running in versions prior to 2.2 and using code for 2.2 (<...>.lazy.map(&:itself)), because that way the moment you lose laziness will immediately raise a NoMethodError.
Which one is best for performance from 'order' and 'sort_by'?
Database processing is a few orders of magnitude faster than Ruby. Database processing also scales exceptionally well, whereas Ruby's processing slows down in proportion to the size of the data you are processing.
Processing with Ruby drastically increases both time and (especially) memory consumption, and it can easily exhaust the memory and never actually finish if the dataset is "big".
Some calculations over 1_000_000 rows would take a few tens of seconds in Ruby, whereas PostgreSQL would finish them within a few seconds.
Difference between first! and first method in Rails
I didn't know there was a first! finder method in ActiveRecord. Thanks to your question, now I know :-)
first! is the same as first except that it raises ActiveRecord::RecordNotFound if no record is found.
More details here: http://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-i-first-21
Difference between ActiveRecord's finder methods: take vs limit(1)
From the docs:
# File activerecord/lib/active_record/relation/finder_methods.rb, line 64
def take(limit = nil)
  limit ? limit(limit).to_a : find_take
end
take with an argument returns an Array of records (and, as the source above shows, take without an argument goes through find_take and returns a single record), while limit returns an ActiveRecord Relation that can be chained with other relations.
Better to use each and check for type vs where to filter for the type in ActiveRecord Relation
Using each and checking the type in Ruby is much slower. You should prefer ActiveRecord methods/queries over filtering in vanilla Ruby.
es = events.where(event_type: "game")
es.find_each do |e|
  # ...
end
events.where(...) will return an ActiveRecord relation. Because of that you can use find_each. The docs suggest not using find_each on small collections, but I like to plan for scalability.
EDIT: If the relation has a limit or offset, be careful: find_each is built on find_in_batches, which (at least in older Rails versions) ignores the limit and overwrites it with its default batch size of 1000.