Are there any Ruby ORMs which use cursors or smart fetch?

Sequel's Dataset#each does yield rows one at a time, but most database drivers will load the entire result into memory first.

If you are using Sequel's Postgres adapter, you can choose to use real cursors:

posts.use_cursor.each{|p| puts p}

This fetches 1000 rows at a time by default, but the :rows_per_fetch option lets you choose how many rows each cursor fetch grabs:

posts.use_cursor(:rows_per_fetch=>100).each{|p| puts p}
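The rows_per_fetch trade-off can be sketched in plain Ruby, with no database involved. `fetch_in_batches` and its `source` array are hypothetical stand-ins, not Sequel API: each slice plays the role of one cursor FETCH, rows are yielded one at a time, and the method returns how many fetches it needed.

```ruby
# Hypothetical simulation of cursor-style iteration: at most `rows_per_fetch`
# rows are held at once, and each slice counts as one FETCH round trip.
# `source` stands in for a server-side result set.
def fetch_in_batches(source, rows_per_fetch)
  round_trips = 0
  offset = 0
  loop do
    batch = source[offset, rows_per_fetch] # one simulated FETCH
    round_trips += 1
    break if batch.nil? || batch.empty?    # nothing left on the "server"
    batch.each { |row| yield row }
    offset += batch.size
    break if batch.size < rows_per_fetch   # a short batch means we are done
  end
  round_trips
end
```

With 250 rows, a batch size of 100 needs three fetches while a batch size of 1000 needs one; smaller batches keep less data in memory at once at the cost of more round trips.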

If you aren't using Sequel's Postgres adapter, you can use Sequel's pagination extension:

Sequel.extension :pagination
posts.order(:id).each_page(1000){|ds| ds.each{|p| puts p}}

However, like ActiveRecord's find_in_batches/find_each, this issues a separate query for each page, so you need to be careful if the dataset you are retrieving is modified concurrently.
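The concurrency caveat can be reproduced without a database. This plain-Ruby sketch assumes LIMIT/OFFSET-style pages over a table ordered by id (the `page` helper is hypothetical) and deletes an already-processed row between two page queries; every later row shifts back by one position, so one row is never yielded.

```ruby
# Hypothetical stand-in for "ORDER BY id LIMIT page_size OFFSET ..." on an
# in-memory table of ids.
def page(table, page_no, page_size)
  table.sort.drop((page_no - 1) * page_size).first(page_size)
end

table = (1..10).to_a
seen = []
page_no = 1
loop do
  rows = page(table, page_no, 3)
  break if rows.empty?
  seen.concat(rows)
  # A concurrent writer deletes an already-processed row after page 1.
  table.delete(1) if page_no == 1
  page_no += 1
end

missing = (1..10).to_a - seen
p missing # prints [4]
```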

The reason this isn't the default in Sequel is probably the same reason it isn't the default in ActiveRecord, which is that it isn't a good default in the general case. Only queries with large result sets really need to worry about it, and most queries don't return large result sets.

At least with the Postgres adapter cursor support, it's fairly easy to make it the default for your model:

Post.dataset = Post.dataset.use_cursor

For the pagination extension, you can't really do that, but you can wrap it in a method that makes it mostly transparent.
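One way to make pagination mostly transparent is a helper that hides the pages behind an ordinary row iterator. `PagedIteration.each_row` is a hypothetical name; it assumes only that the dataset responds to `each_page(page_size)` (as an ordered Sequel dataset does once the pagination extension is loaded) and that each page responds to `each`. The `DemoDataset` struct is a stand-in so the sketch runs without a database.

```ruby
module PagedIteration
  module_function

  # Yields every row of `dataset`, fetching it page by page behind the scenes.
  # Returns an Enumerator when called without a block.
  def each_row(dataset, page_size = 1000, &block)
    return enum_for(:each_row, dataset, page_size) unless block
    dataset.each_page(page_size) { |page| page.each(&block) }
    nil
  end
end

# Demo stand-in for an ordered, paginated dataset (hypothetical).
DemoDataset = Struct.new(:rows) do
  def each_page(page_size)
    rows.each_slice(page_size) { |slice| yield slice }
  end
end

PagedIteration.each_row(DemoDataset.new((1..7).to_a), 3).to_a # => [1, 2, 3, 4, 5, 6, 7]
```

Because it returns an Enumerator when no block is given, callers can chain `first`, `map`, or `each_slice` onto it as if it were a single query. It still issues one query per page, so the concurrency caveat above applies unchanged.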

ORM that accepts SQL and simply maps the objects and relations?

I find that genuinely complex queries make up about 20% of a project, so using an ORM still helps with the rest.

When that 20% arises, I find myself doing something like what you ask, working exclusively with SQL. ActiveRecord and DataMapper have a find_by_sql method that helps here, but doesn't instantiate all models (at least on ActiveRecord, if I'm not mistaken, that is).

Have you tried using Sequel? It's also an ORM, but it makes a SQL-centric approach easier and gives you more flexibility.

Besides that, I can't think of a more focused solution in the ORM realm. Keep in mind that an ORM tries to abstract the querying interface to simplify it. If you feel comfortable using raw SQL, you might be more productive with just a thin SQL facade.

When using Sequel ORM, when to use Core or Model?

Sequel core is more or less a version of SQL in Ruby. It's good for reporting, data processing, or when you want to manipulate sets of objects at once.

Sequel::Model is an object-relational mapper, allowing you to assign behavior to specific types of rows. If most of your work is dealing with individual rows instead of groups of rows, you will probably want to use models.

If you are unsure, start with Sequel Model. Sequel Model is built on top of Sequel core, so you have all of the power of core datasets when using models.
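The core/model split can be illustrated with a plain-Ruby analogy (no Sequel involved). The rows are hashes with symbol keys, the shape Sequel core datasets yield; the set-level report works on all rows at once, while the hypothetical `Post` class attaches behavior to individual rows, which is the role Sequel::Model plays.

```ruby
# Rows as plain hashes with symbol keys (illustrative data).
ROWS = [
  { id: 1, author: "ann", words: 120 },
  { id: 2, author: "bob", words: 80 },
  { id: 3, author: "ann", words: 300 }
].freeze

# "Core" style: one operation over the whole set, like a GROUP BY report.
words_by_author = ROWS.group_by { |r| r[:author] }
                      .transform_values { |rs| rs.sum { |r| r[:words] } }

# "Model" style: behavior attached to a single row.
# Post is a hypothetical class, not a Sequel::Model subclass.
class Post
  def initialize(values)
    @values = values
  end

  def id
    @values[:id]
  end

  def long?
    @values[:words] > 200
  end
end

long_ids = ROWS.map { |r| Post.new(r) }.select(&:long?).map(&:id)

p words_by_author # prints {"ann"=>420, "bob"=>80} (exact format varies by Ruby version)
p long_ids        # prints [3]
```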

When to use an ORM (Sequel, Datamapper, AR, etc.) vs. pure SQL for querying

I'm the DataMapper maintainer, and I think for complex reporting you should use SQL.

While I do think someday we'll have a DSL that provides the power and conciseness of SQL, everything I've seen so far requires you to write more Ruby code than SQL for complex queries. I would much rather maintain a 5 line SQL query than 10-15 lines of Ruby code to describe the same complex operation.

Please note I said complex. If you have something simple, use the ORM's built-in finders. However, I do believe there is a line you can cross where SQL becomes simpler. Now, most apps aren't just reporting. You may have a lot of CRUD-type operations, for which an ORM is perfectly suited and far better than doing those things by hand.

One thing that an ORM will usually provide is some sort of organization for your application logic. You can group code around each model in the same file. That is usually where I'll put the complex SQL query, rather than embedding it in the controller, e.g.:

require 'data_mapper'

class User
  include DataMapper::Resource

  property :id,   Serial
  property :name, String,  :length => 1..100, :required => true
  property :age,  Integer, :min => 1, :max => 130

  # Runs hand-written SQL through the model's repository adapter.
  def self.some_complex_query
    repository.adapter.select <<-SQL
      SELECT ...
      FROM ...
      WHERE ...
      ... more complex stuff here ...
    SQL
  end
end

Then I can just generate the report using User.some_complex_query. You could also push the SQL query into a view if you wanted to clean up this code further.

EDIT: By "view" in the above sentence I meant RDBMS view, rather than view in the MVC context. Just wanted to clear up any potential confusion.

What ORM to use in a single-process Sinatra application with multiple DB connections?

DataMapper is designed for multi-database use.

You can set up multiple repositories just by saying something like DataMapper.setup(:repository_one, "mysql://localhost/my_db_name").

DataMapper then tracks all the repositories that have been setup in a hash that you can reference and use for scoping:

DataMapper.repository(:repository_one){ MyModel.all }

(The default scope is just DataMapper.repository, which you can set up with DataMapper.setup(:default, "postgres://localhost/my_primary_db") or similar.)
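To make the idea of a hash of repositories plus block scoping concrete, here is a hypothetical registry in plain Ruby. It is not DataMapper's implementation: `Repos` only stores URI strings, but it shows how a named setup call, a default entry, and a block-scoped override can fit together, including nested scopes.

```ruby
# Hypothetical registry; not DataMapper's code. Stores URIs by name and
# tracks which one is "current" via a stack of block scopes.
module Repos
  @registry = {}
  @stack = []

  def self.setup(name, uri)
    @registry[name] = uri
  end

  # Innermost scoped repository, or :default when no block scope is active.
  def self.current
    @registry.fetch(@stack.last || :default)
  end

  # Without a block, returns the current repository; with a block, makes
  # `name` current for the duration of the block and returns its value.
  def self.repository(name = nil)
    return current unless block_given?
    @registry.fetch(name) # raise KeyError early for unknown names
    @stack.push(name)
    begin
      yield
    ensure
      @stack.pop
    end
  end
end

Repos.setup(:default, "postgres://localhost/my_primary_db")
Repos.setup(:repository_one, "mysql://localhost/my_db_name")
Repos.repository(:repository_one) { Repos.current } # => "mysql://localhost/my_db_name"
Repos.current                                       # => "postgres://localhost/my_primary_db"
```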

Iterate over large external postgres db, manipulate rows, write output to rails postgres db

You will want to use a cursor, either a protocol-level one or an SQL-level cursor with DECLARE and FETCH.
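An SQL-level cursor boils down to a short sequence of statements. This sketch assumes a hypothetical `exec` callable that runs one SQL string and returns an array of rows; the method wraps DECLARE, repeated FETCH FORWARD, and CLOSE in a transaction, because in PostgreSQL a cursor declared without WITH HOLD only exists inside a transaction block. The query string is interpolated as-is, so it must be trusted SQL.

```ruby
# Hypothetical helper: iterate a large query through an SQL-level cursor.
# `exec` must accept one SQL string and return an array of rows.
def each_row_via_cursor(exec, query, batch_size: 1000, cursor: "big_cursor")
  exec.call("BEGIN")
  committed = false
  begin
    exec.call("DECLARE #{cursor} CURSOR FOR #{query}")
    loop do
      rows = exec.call("FETCH FORWARD #{batch_size} FROM #{cursor}")
      break if rows.empty? # cursor exhausted
      rows.each { |row| yield row }
    end
    exec.call("CLOSE #{cursor}")
    exec.call("COMMIT")
    committed = true
  ensure
    # Covers exceptions and early exits (break/return) from the caller's block;
    # rolling back also discards the cursor.
    exec.call("ROLLBACK") unless committed
  end
end
```

With the pg gem, one possible wiring, assuming `conn` is a PG::Connection and that `PG::Result#to_a` returns row hashes, is `each_row_via_cursor(->(sql) { conn.exec(sql).to_a }, "SELECT * FROM big_table", batch_size: 500) { |row| ... }`.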

Handily, someone already wrote an ActiveRecord adapter for PostgreSQL cursors; see rubygems.

You might also find this question informative: Are there any Ruby ORMs which use cursors or smart fetch?

I haven't checked the source code / docs to see if the Pg gem supports PostgreSQL's protocol-level cursors for batched reads, but if there's already a tool to do it (as linked above) it's probably not worth exploring.

What are your best Sequel tips?

If you're coming from Rails, note that connection option keys and values used in Sequel are spelled differently than those in database.yml:

require 'sequel'

db_config = {
  :adapter => 'postgres',       # NOT 'postgresql'
  :default_schema => 'public',  # NOT :schema_search_path
  :user => 'myusername',        # NOT :username
  :password => 'mypassword',
  :host => 'myhost',
  :database => 'mydb',
  :max_connections => 5         # NOT :pool
}
DB = Sequel.connect(db_config)
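If your settings already live in a Rails-style hash, the renamings listed above can be applied mechanically. `sequel_options_from_rails` is a hypothetical helper that handles only those renamings and passes every other key through as a symbol; note that schema_search_path may list several schemas, so a plain rename only fits a single-schema value.

```ruby
# Hypothetical helper: rename Rails database.yml-style keys/values to the
# Sequel spellings from the answer above. Other keys pass through as symbols.
KEY_RENAMES = {
  "username" => :user,
  "pool" => :max_connections,
  "schema_search_path" => :default_schema
}.freeze

ADAPTER_RENAMES = { "postgresql" => "postgres" }.freeze

def sequel_options_from_rails(rails_config)
  rails_config.each_with_object({}) do |(key, value), opts|
    key = key.to_s # accept string (YAML) or symbol keys
    target = KEY_RENAMES.fetch(key, key.to_sym)
    value = ADAPTER_RENAMES.fetch(value, value) if key == "adapter"
    opts[target] = value
  end
end
```

The resulting hash could then be passed to Sequel.connect.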

