Take VS First Performance in Ruby on Rails

In general "take" will be faster, because the database does not have to identify all of the rows that meet the criteria and then sort them and find the lowest-sorting row. "take" allows the database to stop as soon as it has found a single row.

The degree to which it is faster is going to vary according to:

  1. How much time is saved in not having to look for more than one row. The saving is greatest where a full scan of a large table would otherwise be required, but one matching row is found very early in the scan. "take" allows the scan to stop at that point.

  2. How many rows would need to be sorted to find the one with the lowest id. The worst case here is where every row in the table matches the criteria and needs to be included in the sort.
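The two factors above can be illustrated with a toy in-memory model. This is only a sketch of the reasoning, not how a real database executes a query; the Row struct, the sample data and the helper names take_like and first_like are made up for illustration.

```ruby
# Toy in-memory model of the two strategies; NOT a real database.
Row = Struct.new(:id, :email)

# Rows in arbitrary physical order (hypothetical sample data).
ROWS = [
  Row.new(7, "f@example.com"),
  Row.new(3, "a@example.com"),
  Row.new(5, "f@example.com"),
  Row.new(1, "f@example.com")
].freeze

# Like "take": stop scanning at the first matching row.
def take_like(rows, email)
  examined = 0
  row = rows.find { |r| examined += 1; r.email == email }
  [row, examined]
end

# Like "first": collect every match, then pick the lowest id.
def first_like(rows, email)
  examined = 0
  matches = rows.select { |r| examined += 1; r.email == email }
  [matches.min_by(&:id), examined]
end

take_row, take_examined   = take_like(ROWS, "f@example.com")
first_row, first_examined = first_like(ROWS, "f@example.com")

puts "take:  id=#{take_row.id}, rows examined=#{take_examined}"
puts "first: id=#{first_row.id}, rows examined=#{first_examined}"
```

In this model the "take" variant returns whichever matching row it meets first, while the "first" variant has to look at every row before it can return the one with the lowest id.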

There are some other factors to consider. For example, for a "first" query the optimiser might be able to access the table via a scan of the primary key index, in id order, and check each row to see if it matches the condition. If a matching row is very likely to turn up early in that scan, then both a complete scan of the data and a sort can be avoided, provided the query optimiser is sophisticated enough.

In many cases, where there are very few matching records and index-based access to find them (for instance, a unique index on "email" in your example), you'll find that the difference is trivial. However, I would still use "take" in preference to "first" even then.

Edit: I'll just add, though it's a little off-topic, that in your example you might as well use:

User.find_by(email: 'f@example.com')

The generated query should be exactly the same as for take, but the semantics are a bit more clear I think.

Arrays in Ruby: Take vs Limit vs First

  1. limit is not an array method
  2. take requires an argument; it returns an empty array if the array is empty.
  3. first can be called without an argument; it returns nil if the array is empty and the argument is absent.
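The three points above can be checked in plain Ruby (no Rails loaded). This is a small sketch; the variable names are arbitrary.

```ruby
numbers = [1, 2, 3]
empty   = []

p numbers.first      # a single element, not an array
p numbers.first(2)   # an array of up to 2 elements
p numbers.take(2)    # same result as first(2)

p empty.first        # nil: empty array, no argument
p empty.first(2)     # an empty array
p empty.take(2)      # an empty array

# take has no zero-argument form, so this raises ArgumentError.
take_error =
  begin
    numbers.take
    nil
  rescue ArgumentError => e
    e
  end
p take_error.class

# limit is not defined on plain Ruby arrays.
p numbers.respond_to?(:limit)
```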

Source for 2.0 take

static VALUE
rb_ary_take(VALUE obj, VALUE n)
{
long len = NUM2LONG(n);
if (len < 0) {
rb_raise(rb_eArgError, "attempt to take negative size");
}
return rb_ary_subseq(obj, 0, len);
}

Source for 2.0 first:

static VALUE
rb_ary_first(int argc, VALUE *argv, VALUE ary)
{
if (argc == 0) {
if (RARRAY_LEN(ary) == 0) return Qnil;
return RARRAY_PTR(ary)[0];
}
else {
return ary_take_first_or_last(argc, argv, ary, ARY_TAKE_FIRST);
}
}

In terms of Rails:

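  1. limit(5) will add the scope of limit(5) to an ActiveRecord::Relation and return a relation, so it can be chained: limit(5).limit(4) is valid and the last limit wins. It can not be called on an array, so calling limit on records that have already been loaded into an array will fail.

  2. first(5) will run the query with LIMIT 5, ordered by primary key when no other order is given, and return an array. It can also be called on an array, so .first(4).first(3) will be the same as .limit(4).first(3).

  3. take(5) on an array returns the first 5 elements of records that have already been built. Before Rails 4 that was the only form, so Model.take(5) did not work. Since Rails 4, Model.take(5) is a finder method: it runs the query with LIMIT 5, without adding an order, and returns an array.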

ruby Enumerable#first vs #take

Well, I've looked at the source (Ruby 2.1.5). Under the hood, if first is provided an argument, it forwards it to take. Otherwise, it returns a single value:

static VALUE
enum_first(int argc, VALUE *argv, VALUE obj)
{
NODE *memo;
rb_check_arity(argc, 0, 1);
if (argc > 0) {
return enum_take(obj, argv[0]);
}
else {
memo = NEW_MEMO(Qnil, 0, 0);
rb_block_call(obj, id_each, 0, 0, first_i, (VALUE)memo);
return memo->u1.value;
}
}

take, on the other hand, requires an argument and always returns an array of given size or smaller with the elements taken from the beginning.

static VALUE
enum_take(VALUE obj, VALUE n)
{
NODE *memo;
VALUE result;
long len = NUM2LONG(n);

if (len < 0) {
rb_raise(rb_eArgError, "attempt to take negative size");
}

if (len == 0) return rb_ary_new2(0);
result = rb_ary_new2(len);
memo = NEW_MEMO(result, 0, len);
rb_block_call(obj, id_each, 0, 0, take_i, (VALUE)memo);
return result;
}

So yes, that's why these two are so similar. The only difference seems to be that first can be called without arguments, in which case it returns a single value rather than an array. <...>.first(1), on the other hand, is equivalent to <...>.take(1). As simple as that.
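A quick way to see this is with a custom Enumerable. The Countdown class below is a made-up example, used here only because it gets first and take from Enumerable rather than from Array.

```ruby
# Hypothetical Enumerable used only for illustration.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Yields from, from - 1, ..., 1.
  def each
    return to_enum(:each) unless block_given?
    @from.downto(1) { |i| yield i }
  end
end

countdown = Countdown.new(3)

p countdown.first       # a single value
p countdown.first(1)    # a one-element array
p countdown.take(1)     # the same one-element array
p countdown.first(10)   # at most 10 elements, here all three
p countdown.take(10)    # same as first(10)
p Countdown.new(0).first # nil, because nothing is yielded
```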

With lazy collections, however, things are different. first on a lazy collection is still enum_first, which, as seen above, is a shortcut to enum_take. take, however, is the C-coded lazy_take:

static VALUE
lazy_take(VALUE obj, VALUE n)
{
long len = NUM2LONG(n);
VALUE lazy;

if (len < 0) {
rb_raise(rb_eArgError, "attempt to take negative size");
}
if (len == 0) {
VALUE len = INT2FIX(0);
lazy = lazy_to_enum_i(obj, sym_cycle, 1, &len, 0);
}
else {
lazy = rb_block_call(rb_cLazy, id_new, 1, &obj,
lazy_take_func, n);
}
return lazy_set_method(lazy, rb_ary_new3(1, n), lazy_take_size);
}

...which doesn't evaluate immediately; it requires a .force call for that.

In fact, this is hinted at in the docs for lazy, which list all the lazily implemented methods. The list contains take but not first. In other words, on lazy sequences take stays lazy and first doesn't.

Here's an example how these work differently:

lz = (1..Float::INFINITY).lazy.map{|i| i }
# An infinite sequence, evaluating it head-on won't do
# Ruby 2.2 also offers `.map(&:itself)`

lz.take(5)
#=> #<Enumerator::Lazy: ...>
# Well, `take` is lazy then
# Still, we need values

lz.take(5).force
#=> [1, 2, 3, 4, 5]
# Why yes, values, finally

lz.first(5)
#=> [1, 2, 3, 4, 5]
# So `first` is not lazy, it evaluates values immediately

Some extra fun can be gained by running in versions prior to 2.2 and using code for 2.2 (<...>.lazy.map(&:itself)), because that way the moment you lose laziness will immediately raise a NoMethodError.

Which one is best for performance from 'order' and 'sort_by'?

Database processing can be a few orders of magnitude faster than Ruby. It also scales exceptionally well, whereas Ruby's processing time grows in proportion to the size of the data you are processing.

Processing in Ruby drastically increases both time and (especially) memory consumption, and if the dataset is big it can easily exhaust memory and never actually finish.

Some calculations over 1_000_000 rows can take a few tens of seconds in Ruby, whereas PostgreSQL would finish them within a few seconds.

Difference between first! and first method in Rails

I didn't know there was a first! finder method in ActiveRecord. Thanks to your question, now I know :-)

first! is the same as first except that it raises ActiveRecord::RecordNotFound if no record is found.
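The relationship between the two can be sketched in plain Ruby. ToyFinder and ToyRecordNotFound below are hypothetical stand-ins for a relation and for ActiveRecord::RecordNotFound; this is not the Rails implementation, only the shape of the behavior described above.

```ruby
# Hypothetical stand-in for ActiveRecord::RecordNotFound.
class ToyRecordNotFound < StandardError; end

# Hypothetical stand-in for a relation; NOT ActiveRecord.
class ToyFinder
  def initialize(records)
    @records = records
  end

  # Returns nil when there is nothing to return.
  def first
    @records.first
  end

  # Same lookup, but an empty result becomes an exception.
  def first!
    first || raise(ToyRecordNotFound, "no record found")
  end
end

empty  = ToyFinder.new([])
filled = ToyFinder.new(["alice"])

p filled.first   # the record
p filled.first!  # the same record
p empty.first    # nil

bang_error =
  begin
    empty.first!
    nil
  rescue ToyRecordNotFound => e
    e
  end
puts "first! raised: #{bang_error.class}"
```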

More details here: http://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-i-first-21

Difference between ActiveRecord's finder methods: take vs limit(1)

From the docs

# File activerecord/lib/active_record/relation/finder_methods.rb, line 64
def take(limit = nil)
limit ? limit(limit).to_a : find_take
end

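Called without an argument, take returns a single record (or nil if there is none); called with an argument, it returns an Array of records. limit, in contrast, returns an ActiveRecord::Relation that can be chained with other relation methods.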
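As a rough illustration of the dispatch in the quoted source, here is a plain-Ruby sketch. ToyRelation is a hypothetical class that wraps an array; it is not ActiveRecord and runs no SQL, it only mimics the return types.

```ruby
# Hypothetical stand-in for a relation; NOT ActiveRecord.
class ToyRelation
  def initialize(records, limit_value = nil)
    @records = records
    @limit_value = limit_value
  end

  # Returns a new relation, so calls can be chained; the last limit wins.
  def limit(n)
    ToyRelation.new(@records, n)
  end

  # "Runs the query": applies the limit and returns an array.
  def to_a
    @limit_value ? @records.first(@limit_value) : @records.dup
  end

  # Same shape as the quoted source: an array with an argument,
  # a single record (or nil) without one.
  def take(n = nil)
    n ? limit(n).to_a : limit(1).to_a.first
  end
end

users = ToyRelation.new(%w[alice bob carol])

p users.take                   # a single record
p users.take(2)                # an array of records
p users.limit(1).class         # still a relation, nothing loaded yet
p users.limit(2).limit(1).to_a # chained limits, last one wins
```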

Better to use each and check for type vs where to filter for the type in ActiveRecord Relation

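Filtering with each in Ruby is much slower, because every event has to be loaded and instantiated before its type can be checked. Prefer ActiveRecord queries over vanilla Ruby wherever the database can do the filtering.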

es = events.where(event_type: "game")
es.find_each do |e|
  # ....
end

events.where(...) will return an ActiveRecord relation, which is why you can use find_each on it. The docs suggest not using find_each on small collections, but I like to plan for scalability.

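EDIT: If the relation has a limit or offset, be careful: in older Rails versions find_each ignores the limit and falls back to the default find_in_batches behavior, which overwrites the limit with the batch size (1000 by default). Support for limits in batch processing was added in Rails 5.1.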


