Does the Rails ORM Limit the Ability to Perform Aggregations?

Limitations of ActiveRecord are one of the reasons I had trouble using Rails in a scientific environment. You might want to check out alternative Ruby ORMs that make it a bit easier to work with a legacy database:

  • Sequel
  • DataMapper

Ultimately, though, ORMs by design take you away from SQL, so it's possible that none of them are a good fit.

What can you NOT do in Rails that you can do in another framework?

Two things. First, Ruby is a relatively young language, and you may run into brick walls when trying to do slightly more esoteric things (like connecting to non-mainstream or older types of data sources). It also has poor GC and no kernel threads, both of which are very important for a high-performance platform. The main codebase (MRI) is quite hacky (lots of clever obfuscating programmer tricks like macros), and there are parts that are poorly written (GC and thread scheduling leap to mind). Again, it is a very young platform that got very popular very fast.

Secondly, while Ruby the language and Rails the ideas/paradigm are both phenomenal, Ruby and Rails the platforms are not. There is a hell of a lot in both Ruby and Rails that is downright ugly, and deployment solutions are in the dark ages compared to what is considered normal on other platforms (PHP/ASP/JSP).

Since I'm being accused of trolling here, I will expound a bit. Due to the threading model, Rails cannot process requests concurrently unless you launch multiple full instances of your Rails app. To do that you have two options: the relatively young and still-under-development Passenger (mod_rails), or the tried-and-tested Apache load balancer with multiple Mongrel instances behind it.

Either way, the inability to simply spawn workers means you will want 5-10 full instances of your application running, which incurs a very large overhead (easily 300-500 MB per app, depending on your gems and how big your app is). Because of that, the infrastructure needed to serve Rails is a hell of a lot more complicated than for most other stacks.

Now, that being said, the situation has been continuously getting better (I mean, Passenger is usable now; it wasn't the last time I had to deal with deploying a Rails app). I would be very surprised if Rails doesn't catch up in the next few years.

Also, Rubinius and JRuby are doing things the right way and are moving along at a great pace. I wouldn't be surprised if MRI gets dropped in the next few years in favor of one of those implementations for mainstream Rails work.

Rails joins with limit on association

I believe in Rails 4 you can apply a scope on the association:

class Station < ActiveRecord::Base
  has_many :measures, -> { order('created_at DESC').limit(1) }
end

Then:

2.0.0-p353 :008 > Station.first.measures
Station Load (0.1ms) SELECT "stations".* FROM "stations" ORDER BY "stations"."id" ASC LIMIT 1
Measure Load (0.1ms) SELECT "measures".* FROM "measures" WHERE "measures"."station_id" = ? ORDER BY created_at DESC LIMIT 1 [["station_id", 1]]

Edit: actually, if you need only the most recent one, you can use has_one. It will work in both Rails 4 and Rails 3, with slightly modified syntax:

class Station < ActiveRecord::Base
  has_one :recent_measure, -> { order('created_at DESC') }, class_name: 'Measure' # Rails 4
  # has_one :recent_measure, order: 'created_at DESC', class_name: 'Measure'      # Rails 3
end

Rails: Optimize querying maximum values from associated table

You can use two forms of SQL to efficiently retrieve this information. I'm assuming here that you want a result for a partner even where there is no klass record for it.

The first is:

select partners.*,
       max(klasses.limit) as max_klasses_limit
from partners
left join klasses on klasses.partner_id = partners.id
group by partners.id

Some RDBMSs require that you use "group by partners.*", though, which is potentially expensive in terms of the required sort and the possibility of it spilling to disk.

On the other hand you can add a clause such as:

having("max(klasses.limit) > ?", 3)

... to efficiently filter the partners by the maximum value of klasses.limit.

The other is:

select partners.*,
       (select max(klasses.limit)
        from klasses
        where klasses.partner_id = partners.id) as max_klasses_limit
from partners

The second one does not rely on a group by, and some RDBMSs may effectively transform it internally to the first form, but it may execute less efficiently if the subquery runs once per row of the partners table (which would still be much faster than the naive Rails way of actually submitting a query per row).

The Rails ActiveRecord forms of these would be:

Partner.joins("left join klasses on klasses.partner_id = partners.id").
        select("partners.*, max(klasses.limit) as max_klasses_limit").
        group(:id)

... and ...

Partner.select("partners.*, (select max(klasses.limit)
                             from klasses
                             where klasses.partner_id = partners.id) as max_klasses_limit")

Which of these is actually the most efficient is probably going to depend on the RDBMS and even the RDBMS version.

If you don't need a result when there is no klass for the partner, or there is always guaranteed to be one, then:

Partner.joins(:klasses).
        select("partners.*, max(klasses.limit) as max_klasses_limit").
        group(:id)

Either way, you can then reference:

partner.max_klasses_limit

Object.limit(x).sum(:value) ignores the limit(x) -- why?

It is scoped; it's just that the scope is applied to the result of the aggregate, and since the aggregate is computed across all rows in the table, it returns only a single row.

Your second statement is roughly analogous to:

SELECT * FROM (SELECT SUM(value) FROM hourly_metrics) LIMIT 24;

and this is why the result is confusing you.

To reiterate: the aggregate function SUM returns one row, which is then scoped with LIMIT 24.

Rails relation ordering?

While ActiveRecord does not expose CTEs in its high-level API, Arel will allow you to build this exact query.

Since you did not provide models and obfuscated the table names, I will build this entirely in Arel for the time being.

sub_table   = Arel::Table.new('sub_table')
main_table  = Arel::Table.new('main_table')
other_table = Arel::Table.new('other_table')

sub_table_query = main_table.project(Arel.star)
                            .take(10)
                            .skip(100)
                            .order(main_table[:id])

sub_table_alias = Arel::Nodes::As.new(Arel.sql(sub_table.name), sub_table_query)

query = sub_table.project(Arel.star)
                 .join(other_table).on(sub_table[:id].eq(other_table[:other_id]))
                 .with(sub_table_alias)

query.to_sql

Output:

WITH sub_table AS (
  SELECT *
  FROM main_table
  ORDER BY main_table.id
  -- Output here will differ by database
  LIMIT 10 OFFSET 100
)
SELECT *
FROM sub_table
INNER JOIN other_table ON sub_table.id = other_table.other_id

If you are able to provide better context I can provide a better solution, most likely resulting in an ActiveRecord::Relation object, which is likely preferable for chaining and model access purposes.

Multiple limit condition in mongodb

Generally, what you are describing is a relatively common question in the MongoDB community, which we could call the "top-N results" problem: given input that is likely sorted in some way, how do you get the top N results per grouping without relying on arbitrary index values in the data?

MongoDB has the $first operator, available in the aggregation framework, which deals with the "top 1" part of the problem, as it takes the "first" item found on a grouping boundary, such as your "type". But getting more than one result of course gets a little more involved. There are some JIRA issues about modifying other operators to deal with n results, or to "restrict" or "slice"; notably SERVER-6074. But the problem can be handled in a few ways.

Popular implementations of the Rails Active Record pattern for MongoDB storage are Mongoid and MongoMapper; both allow access to the "native" MongoDB collection functions via a .collection accessor. This is what you need in order to use native methods such as .aggregate(), which supports more functionality than the general Active Record aggregation.

Here is an aggregation approach with Mongoid, though the general code does not change once you have access to the native collection object:

require "mongoid"
require "pp"

Mongoid.configure.connect_to("test")

class Item
  include Mongoid::Document
  store_in collection: "item"

  field :type, type: String
  field :pos,  type: String
end

Item.collection.drop

Item.collection.insert( :type => "A", :pos => "First"  )
Item.collection.insert( :type => "A", :pos => "Second" )
Item.collection.insert( :type => "A", :pos => "Third"  )
Item.collection.insert( :type => "A", :pos => "Fourth" )
Item.collection.insert( :type => "B", :pos => "First"  )
Item.collection.insert( :type => "B", :pos => "Second" )
Item.collection.insert( :type => "B", :pos => "Third"  )
Item.collection.insert( :type => "B", :pos => "Fourth" )

res = Item.collection.aggregate([
  { "$group" => {
    "_id"  => "$type",
    "docs" => {
      "$push" => { "pos" => "$pos", "type" => "$type" }
    },
    "one" => {
      "$first" => { "pos" => "$pos", "type" => "$type" }
    }
  }},
  { "$unwind" => "$docs" },
  { "$project" => {
    "docs" => {
      "pos"  => "$docs.pos",
      "type" => "$docs.type",
      "seen" => { "$eq" => [ "$one", "$docs" ] }
    },
    "one" => 1
  }},
  { "$match" => { "docs.seen" => false } },
  { "$group" => {
    "_id" => "$_id",
    "one" => { "$first" => "$one" },
    "two" => {
      "$first" => { "pos" => "$docs.pos", "type" => "$docs.type" }
    },
    "splitter" => {
      "$first" => { "$literal" => ["one","two"] }
    }
  }},
  { "$unwind" => "$splitter" },
  { "$project" => {
    "_id" => 0,
    "type" => {
      "$cond" => [
        { "$eq" => [ "$splitter", "one" ] },
        "$one.type",
        "$two.type"
      ]
    },
    "pos" => {
      "$cond" => [
        { "$eq" => [ "$splitter", "one" ] },
        "$one.pos",
        "$two.pos"
      ]
    }
  }}
])

pp res

The naming in the documents is actually not used by the code; the titles in the data ("First", "Second", etc.) are just there to illustrate that you are indeed getting the "top 2" documents from the listing as a result.

So the approach here is essentially to create a "stack" of the documents "grouped" by your key, such as "type". The very first thing here is to take the "first" document from that stack using the $first operator.

The subsequent steps match the "seen" elements from the stack and filter them out, then take the "next" document off the stack, again using the $first operator. The final steps are really just there to return the documents to the original form as found in the input, which is generally what is expected from such a query.

So the result is of course, just the top 2 documents for each type:

{ "type"=>"A", "pos"=>"First" }
{ "type"=>"A", "pos"=>"Second" }
{ "type"=>"B", "pos"=>"First" }
{ "type"=>"B", "pos"=>"Second" }

There was a longer discussion and version of this as well as other solutions in this recent answer:

Mongodb aggregation $group, restrict length of array

Essentially the same thing, despite the title; that case was looking to match up to 10 top entries or more. There is also some pipeline generation code there for dealing with larger matches, as well as some alternate approaches that may be considered depending on your data.


