ORM Performance Cost

ORM performance cost

My advice is not to worry about this until you need to: don't optimise prematurely. An ORM can speed up development, improve code readability and remove a lot of code repetition. I would recommend using one if it will make your application easier to develop.

As development progresses, use benchmarks and profiling to find the bottlenecks in the code; where needed, you can bypass the ORM and use manual queries. Normally you will be able to improve the speed of the ORM using caching and database indexes (amongst other things), and then you can decide where manual queries are required. For the most part, the ORM performance will probably be acceptable and the benefits of using it will far outweigh the performance cost.

Does the exists function have a performance cost in Django's ORM?

Yes, it will query the database, but with the minimal possible query.

As mentioned in the docs:

Returns True if the QuerySet contains any results, and False if not. This tries to perform the query in the simplest and fastest way possible, but it does execute nearly the same query as a normal QuerySet query.

and

Additionally, if a some_queryset has not yet been evaluated, but you know that it will be at some point, then using some_queryset.exists() will do more overall work (one query for the existence check plus an extra one to later retrieve the results) than simply using bool(some_queryset), which retrieves the results and then checks if any were returned.
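As a rough illustration of the trade-off the docs describe, the sketch below uses Python's built-in sqlite3 module rather than Django itself. The `entry` table and both helper functions are made up for the example, and the SQL only approximates what an ORM might send for an existence check versus a full fetch; the exact queries Django generates depend on its version and the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [(f"headline {i}",) for i in range(1000)],
)


def exists_style(conn):
    """Existence check: ask for at most one row and no real columns."""
    row = conn.execute("SELECT 1 FROM entry LIMIT 1").fetchone()
    return row is not None


def bool_style(conn):
    """Full evaluation: fetch every row, then test whether any came back."""
    rows = conn.execute("SELECT id, headline FROM entry").fetchall()
    return bool(rows), rows


has_any = exists_style(conn)
has_any_full, rows = bool_style(conn)
print(has_any, has_any_full, len(rows))
```

If the rows are needed afterwards anyway, the second form does the work once, whereas the first form would be followed by a second query to retrieve them.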

What is the performance cost of Django's unique_together feature?

unique_together is implemented as a UNIQUE constraint at the database level. Read up on how SQL indexes work, on the read/write ratio (indexes are updated whenever data is written, so they slow down write operations, but when there are many more reads than writes that is not a problem), and on index selectivity/cardinality. Use EXPLAIN on your queries to check whether the indexes are actually used (not ignored) when retrieving data.
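A minimal sketch of that check, assuming SQLite (via Python's sqlite3 module) as a stand-in database: the `membership` table and the `query_plan` helper are hypothetical, and `EXPLAIN QUERY PLAN` is SQLite's variant of EXPLAIN. Other databases report plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE membership (
        id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL,
        group_id INTEGER NOT NULL,
        role TEXT,
        UNIQUE (person_id, group_id)
    )
    """
)
conn.execute("INSERT INTO membership (person_id, group_id, role) VALUES (1, 2, 'admin')")

# The constraint is enforced on write: a duplicate pair is rejected.
try:
    conn.execute("INSERT INTO membership (person_id, group_id, role) VALUES (1, 2, 'member')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Filtering on both constrained columns can use the unique index.
plan_both = query_plan(
    conn, "SELECT role FROM membership WHERE person_id = 1 AND group_id = 2"
)
# Filtering only on the second column of the index typically cannot.
plan_second_only = query_plan(conn, "SELECT role FROM membership WHERE group_id = 2")

print(duplicate_rejected)
print(plan_both)
print(plan_second_only)
```

The second plan is the "ignored index" case: a composite index mainly helps queries that constrain its leading column, so the order of the fields in the constraint matters for reads.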

Having multiple large unique indexes is one of the signs that the table should be split into two or more tables with fewer columns, related through one-to-many relations.

ORM query performance vs RDBMS performance

You're comparing apples to oranges here.

By definition, an RDBMS is always going to be faster, because the RDBMS is your database (RDBMS = Relational Database Management System), e.g. MySQL, SQL Server, PostgreSQL, etc. And the database does one thing really, really well -- handle data (okay, two things, depending on how you look at it -- storing and retrieving data).

Since every way of accessing the database from any other language is at least one step removed from the database itself, everything is slower than the RDBMS itself, if for no other reason than the language's interpreter has to first connect to the database at least once, before it can do anything.

That said, there are a few different layers available when dealing with databases in PHP:

  • Raw queries using PHP's built-in mysql_* functions
  • Basic database abstraction layers (e.g. PDO)
  • Basic query builders (e.g. Laravel's Query Builder)
  • Active Record pattern ORMs (e.g. Eloquent)
  • Stateless/transactional ORMs (e.g. Doctrine)

Assuming perfectly optimized queries fed into the given method by the developer, raw queries will be fastest, followed by the basic DBAL, followed by anything built on top of the basic DBAL. Where query builders and the ORMs built on them fall will depend on whether the query builder is, itself, built on top of another DBAL (in this case, I think it puts Eloquent one more layer removed than Doctrine, because Eloquent is built on Query Builder, which is built on PDO). This is because each one is an abstraction layer over the previous, so the path of the code, when executed, has to run through the stack.

The question then becomes how much of a difference are we talking? That depends entirely on the queries you're feeding into the system, as well as the quality of the system, itself. What are you looking for to show differences? How fast it can do a basic SELECT? Or how well it can do some crazy multi-JOIN query? What determines "speed" for the purpose of your thesis? Only you can really decide that, because you have more information than anyone here. For the sake of thoroughness, you're probably looking at basic SELECTs, complex queries that include things like JOINs, ORDER BYs, and GROUP BYs, and INSERT and UPDATE commands.

I will tell you this, though -- any test to show speed differences will likely have to be run on thousands or tens of thousands of transactions, at least, in order to show any significant differences, because on an individual transaction level, we're talking about differences of microseconds and possibly even nanoseconds.
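To make that concrete, here is a small benchmark sketch in Python (not PHP), using the standard sqlite3 and timeit modules. The `users` table, the `User` class and both fetch functions are invented for the illustration; `mapped_fetch` only imitates the extra layer an ORM adds and is far thinner than a real one. Absolute numbers will vary by machine, which is why the per-call cost is averaged over many runs.

```python
import sqlite3
import timeit
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(100)]
)


@dataclass
class User:
    id: int
    name: str


def raw_fetch(user_id):
    """Raw query: return the tuple exactly as the driver gives it."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()


def mapped_fetch(user_id):
    """Same query plus one extra layer that maps the row to an object."""
    row = raw_fetch(user_id)
    return User(*row)


RUNS = 20_000  # a single call is too fast to measure reliably
raw_seconds = timeit.timeit(lambda: raw_fetch(42), number=RUNS)
mapped_seconds = timeit.timeit(lambda: mapped_fetch(42), number=RUNS)

print(f"raw:    {raw_seconds / RUNS * 1e6:.2f} microseconds per call")
print(f"mapped: {mapped_seconds / RUNS * 1e6:.2f} microseconds per call")
```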

In actual industry use, then, how do we decide what route to go? Speed and ease of writing and maintaining the code. In that respect, ORMs or DBALs will very often beat out raw queries. The fractions of a second per script run lost to the abstraction overhead are recouped thousands upon thousands of times over in developer costs for time spent writing and maintaining the code in question.

In fact, by the time you get to the point where ORM vs DBAL vs raw queries actually matters, odds are good that you're starting to question whether your original database, language interpreter, or server is up to par with your software's demands. This is actually the issue that Facebook started facing a couple of years ago, at which point they started offloading some of their PHP to C, because C is faster in certain cases. It's also why they've created a completely new interpreter for PHP code (HipHop Virtual Machine, or HHVM), which is quite a bit faster than the original PHP engine.

Do Large High-Traffic Websites use ORMs?

Currently, the released version of EF, v1.0 in .NET 3.5, has terrible performance. I did extensive testing and had several long email discussions with Microsoft on the subject over a year ago when it was first released. EF's current efficiency leaves a LOT to be desired, and in many cases it can generate absolutely atrocious SQL queries that decimate your performance.

Entity Framework v4.0 in .NET 4.0 is a LOT better. They have fixed most, if not all, of the poor SQL generation issues that plague EF v1.0 (including the issues I presented to them a year ago). Whether EF v4.0 has the best performance is really yet to be seen. It is more complex than LINQ to SQL, as it provides much greater flexibility. As a release version is not yet available, it's impossible to say whether EF v4.0 will be the fastest or not.

An objective answer to this would require an objective, unbiased comparison between the major ORM contenders, such as EF, LINQ to SQL, NHibernate (preferably with a LINQ provider), LLBLGen, and even some of the newcomers, such as Telerik's ORM, SubSonic and the like.

As for large-scale, high-volume production systems that use ORMs, I would suggest looking at StackOverflow.com itself, which uses LINQ to SQL. SO has become one of, if not the, top programmer communities on the Internet. Definitely high volume here, and this site performs wonderfully. As for other sites, I couldn't really say. The internal implementation details of most major web applications are generally a mystery. Most uses of ORMs that I know of are also for internal, enterprise systems: financial systems, health care, etc. Object databases are also used in the same kinds of systems, although much less frequently. I would do some searches for ORM use and high-volume web sites.

One thing to note in your search: make sure the reviews you find are current. The ORM scene has changed a LOT in the last two years. Performance, efficiency, capabilities, RDBMS tuning capability of dynamic SQL, etc. have all improved significantly since ORMs were first created around a decade ago.

RedBean ORM performance

@tereško if it's possible, can you give the pros and cons of ORMs with respect to pure SQL, according to your experience? I will also google the topic in the meantime. – Jaison Justus

Well .. explaining this in 600 characters would be hard.

One thing I must clarify: this is about ORMs in PHP, though I am pretty sure it applies to some Ruby ORMs too, and maybe others.

In brief, you should avoid them, but if you have to use an ORM, then you will be better off with Doctrine 2.x; it's the lesser evil. (It implements something similar to DataMapper instead of ActiveRecord.)

Case against ORMs

The main reason why some developers like to use ORMs is also the worst thing about them: it is easy to do simple things in an ORM, with very minor performance costs. This is perfectly fine.

1. Exponential complexity

The problem originates in people using the same tool for everything. It is an "if all you have is a hammer (..)" type of issue. This results in technical debt.

At first it is easy to write new DB-related code. And maybe, because you have a large project, management decides to hire more people in the first weeks (in the first weeks, because later it would cause additional issues; read The Mythical Man-Month if you are interested in the details). And you end up preferring people with ORM skills over general SQL skills.

But, as the project progresses, you will begin to use the ORM for solving increasingly complex problems. You will start to hack around some limitations and eventually you may end up with problems which just cannot be solved even with all the ORM hacks you know ... and now you do not have the SQL experts, because you did not hire them.

Additionally, most popular ORMs implement ActiveRecord, which means that your application's business logic is directly coupled to the ORM. Adding new features will take more and more time because of that coupling. And for the same reason, it is extremely hard to write good unit tests for them.

2. Performance

I already mentioned that even simple uses of an ORM (working with a single table, no JOIN) have some performance costs. This is due to the fact that they use the wildcard * for selecting data. When you need just the list of article IDs and titles, there is no point in fetching the content.
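The effect of the wildcard can be sketched as follows, using Python's sqlite3 module instead of a PHP ORM. The `articles` table, its contents and the `payload_chars` helper are assumptions made up for the illustration; the helper is only a crude proxy for how much data travels from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO articles (title, content) VALUES (?, ?)",
    [(f"Article {i}", "lorem ipsum " * 500) for i in range(50)],
)


def payload_chars(rows):
    """Rough size of a result set: total characters across all text columns."""
    return sum(len(value) for row in rows for value in row if isinstance(value, str))


# What a wildcard select brings back: every column, including the large one.
everything = conn.execute("SELECT * FROM articles").fetchall()
# What a listing page actually needs.
listing = conn.execute("SELECT id, title FROM articles").fetchall()

print(payload_chars(everything), "characters fetched with SELECT *")
print(payload_chars(listing), "characters fetched with SELECT id, title")
```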

ORMs are really bad at working with multiple tables, when you need data based on multiple conditions. Consider the problem:

Database contains 4 tables: Projects, Presentations, Slides and Bulletpoints.

  • Projects have many Presentations
  • Presentations have many Slides
  • Slides have many Bulletpoints

And you need to find the content of all the Bulletpoints in the Slides tagged as "important" from the 4 latest Presentations related to the Projects with ids 2, 4 and 8.

This is a simple JOIN to write in pure SQL, but in any ORM implementation that I have seen, this will result in a 3-level nested loop, with queries at every level.
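A sketch of the two approaches, written in Python with sqlite3 rather than with any particular ORM. The schema, the toy data, the `CountingDb` wrapper and both functions are hypothetical, and "4 latest Presentations" is read here as the 4 most recent across the three projects combined; `nested_loop` imitates the access pattern described above, it is not what any specific ORM is guaranteed to do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY);
    CREATE TABLE presentations (id INTEGER PRIMARY KEY, project_id INTEGER, created_at INTEGER);
    CREATE TABLE slides (id INTEGER PRIMARY KEY, presentation_id INTEGER, tag TEXT);
    CREATE TABLE bulletpoints (id INTEGER PRIMARY KEY, slide_id INTEGER, content TEXT);
""")

# Toy data: 8 projects, 2 presentations each, 2 slides each, 2 bulletpoints each.
timestamp = 0
for project_id in range(1, 9):
    conn.execute("INSERT INTO projects (id) VALUES (?)", (project_id,))
    for _ in range(2):
        timestamp += 1
        pres_id = conn.execute(
            "INSERT INTO presentations (project_id, created_at) VALUES (?, ?)",
            (project_id, timestamp),
        ).lastrowid
        for tag in ("important", "draft"):
            slide_id = conn.execute(
                "INSERT INTO slides (presentation_id, tag) VALUES (?, ?)", (pres_id, tag)
            ).lastrowid
            for n in range(2):
                conn.execute(
                    "INSERT INTO bulletpoints (slide_id, content) VALUES (?, ?)",
                    (slide_id, f"slide {slide_id} point {n}"),
                )


class CountingDb:
    """Thin wrapper that counts how many queries are sent."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def run(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


LATEST = """
    SELECT id FROM presentations
    WHERE project_id IN (2, 4, 8)
    ORDER BY created_at DESC LIMIT 4
"""


def nested_loop(db):
    """One query per level, repeated for every parent row."""
    contents = []
    for (pres_id,) in db.run(LATEST):
        slides = db.run(
            "SELECT id FROM slides WHERE presentation_id = ? AND tag = 'important'",
            (pres_id,),
        )
        for (slide_id,) in slides:
            rows = db.run(
                "SELECT content FROM bulletpoints WHERE slide_id = ?", (slide_id,)
            )
            contents.extend(content for (content,) in rows)
    return contents


def single_join(db):
    """The same result from one statement."""
    rows = db.run(f"""
        SELECT b.content
        FROM bulletpoints AS b
        JOIN slides AS s ON s.id = b.slide_id
        JOIN ({LATEST}) AS p ON p.id = s.presentation_id
        WHERE s.tag = 'important'
    """)
    return [content for (content,) in rows]


loop_db, join_db = CountingDb(conn), CountingDb(conn)
loop_result = nested_loop(loop_db)
join_result = single_join(join_db)
print(loop_db.queries, "queries in the nested loop,", join_db.queries, "with the JOIN")
```

Both functions return the same bulletpoints; the difference is only in how many statements reach the database, and that number grows with the data in the nested version.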


P.S. there are other reasons and side-effects, but they are relatively minor .. cannot remember any other important issues right now.

The advantages and disadvantages of using ORM

"ORM fail to compete against SQL
queries for complex queries."

  • Well, both LINQ to SQL and Entity Framework allow complex queries and even translation of SQL query results into objects.

"Developers lose understanding of
what the code is actually doing - the
developer is more in control using
SQL."

  • Not really, if you know what you are doing. SQL Server Profiler is enough to see what the translated SQL queries are.

"ORM has a tendency to be slow."

  • Yes, but deferred (lazy) loading and some smart options can make it almost as fast.

"Loss in developer productivity whilst
they learn to program with ORM."

  • Hibernate and the Entity Framework might take time to learn, but in the long run they will save time in development. LINQ to SQL, on the other hand, has little to no learning curve involved.

I say, use ORM but keep this in mind.

  1. Design your queries and write code
    that will result in the least number
    of roundtrips with the server. It's
    the overhead taken for the roundtrip
    that takes up time.

  2. Read about the experiences other
    people have had with the selected
    ORM before you dig in too deep.

  3. Always compare your queries with the
    actual ones being executed in SQL
    Server Profiler.
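The third point can be sketched without SQL Server as well. The example below assumes Python's sqlite3 module, whose `set_trace_callback` hook reports each statement the connection executes; the `orders` table and the `customer_totals` function are made up and merely stand in for ORM-driven code. Real ORMs usually have their own logging switch for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.5), ("alice", 7.25)],
)

executed = []
# From here on, every statement run on this connection is recorded.
conn.set_trace_callback(executed.append)


def customer_totals(conn):
    """Stand-in for ORM-driven code whose generated SQL we want to inspect."""
    return conn.execute(
        "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


totals = customer_totals(conn)
conn.set_trace_callback(None)  # stop recording

for statement in executed:
    print(statement)
```

Counting the recorded statements is also a quick way to check the first point: fewer statements generally means fewer roundtrips.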

Edit:
You wouldn't use an ORM for a performance-critical situation, the same way you wouldn't use .NET or Java to write an operating system. Consider your requirements before choosing. Even if you don't use an ORM, you will end up doing some mapping yourself, either by repeating a lot of code or by using a data dictionary. Why not use an ORM and know how to use its options to make it ALMOST as fast? Weigh up the advantages and disadvantages and make your choice.

http://mikehadlow.blogspot.ca/2012/06/when-should-i-use-orm.html

ORM or SQL in large, scalable and MAINTAINABLE web application?

Ajsie,

My vote is for an ORM. I use NHibernate. It's not perfect and there is a sizable learning curve. But the code is much more maintainable, much more OOP. It's almost impossible to create an application using OOP without an ORM unless you like a lot of duplicate code. It will eliminate the vast majority of your SQL code.

And here's the other thing. If you're going to build an OOP system, you'll end up writing your own O/R mapper anyway. You'll need to call dynamic SQL or stored procs, get the data as a reader or dataset, convert that to an object, wire up relationships to other objects, turn object modifications into SQL inserts/updates, etc. What you write will be slower and buggier than NHibernate or something that's been on the market for a long while.

Your only other choice really is to build a very data-centric, procedural application. Yes, it may perform faster in some areas. I agree that performance IS important. But what matters is that it's FAST ENOUGH. If you save a few milliseconds here and there doing procedural code, your users will not notice the performance increase. But you'll notice the crappy code.

The biggest performance bottlenecks in an ORM come from how objects are pre-fetched and lazy-loaded. This gets into the n-query problems with ORMs (often called the N+1 query problem). However, these are easily solved. You just have to performance-tune your object queries and limit the number of calls to the database, tell it when to use joins, etc. NHibernate also supports a rich caching mechanism, so at times you don't hit the database at all.
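The difference between lazy loading and pre-fetching can be sketched like this, in Python with sqlite3 rather than NHibernate. The `authors` and `books` tables and both functions are hypothetical; `prefetch` uses a single `IN` query and groups the children in application code, which is one common eager-loading strategy (a JOIN is another).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
for author_id in range(1, 11):
    conn.execute(
        "INSERT INTO authors (id, name) VALUES (?, ?)", (author_id, f"author {author_id}")
    )
    conn.executemany(
        "INSERT INTO books (author_id, title) VALUES (?, ?)",
        [(author_id, f"book {author_id}-{n}") for n in range(3)],
    )


def lazy_load(conn):
    """Fetch the parents, then one extra query per parent (1 + N queries)."""
    queries = 1
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def prefetch(conn):
    """Fetch the parents, then all children in one IN query (2 queries)."""
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    placeholders = ", ".join("?" for _ in authors)
    rows = conn.execute(
        f"SELECT author_id, title FROM books WHERE author_id IN ({placeholders}) ORDER BY id",
        [author_id for author_id, _ in authors],
    ).fetchall()
    titles_by_author = defaultdict(list)
    for author_id, title in rows:
        titles_by_author[author_id].append(title)
    return {name: titles_by_author[author_id] for author_id, name in authors}, 2


lazy_result, lazy_queries = lazy_load(conn)
eager_result, eager_queries = prefetch(conn)
print(lazy_queries, "queries when lazy-loading,", eager_queries, "when prefetching")
```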

I also disagree with those that say performance is about users and maintenance is about coders. If your code is not easily maintained, it will be buggy and slow to add features. Your users will care about that.

I won't say every application should have an ORM, but I think most will benefit. Also, don't be afraid to use native SQL or stored procedures with an ORM every now and then where necessary. If you have to do batch updates to millions of records or write a very complex report (hopefully against a separate, denormalized reporting database), then straight SQL is the way to go. Use ORMs for the OOP, transactional, business logic and C.R.U.D. stuff, and use SQL for the exceptions and edge cases.
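For the batch-update case, a rough comparison might look like the sketch below, again in Python with sqlite3 instead of NHibernate. The `accounts` table and both functions are invented; `update_row_by_row` only approximates what a naive load-modify-save loop over mapped objects turns into, and real ORMs may batch such writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, active INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts (balance, active) VALUES (?, ?)",
    [(100, i % 2) for i in range(10_000)],
)


def update_row_by_row(conn):
    """What a naive load-modify-save loop over mapped objects boils down to."""
    statements = 0
    rows = conn.execute("SELECT id, balance FROM accounts WHERE active = 1").fetchall()
    statements += 1
    for account_id, balance in rows:
        conn.execute(
            "UPDATE accounts SET balance = ? WHERE id = ?", (balance + 5, account_id)
        )
        statements += 1
    return statements


def update_in_bulk(conn):
    """The same change expressed as one set-based statement."""
    conn.execute("UPDATE accounts SET balance = balance + 5 WHERE active = 1")
    return 1


loop_statements = update_row_by_row(conn)
bulk_statements = update_in_bulk(conn)
balances = conn.execute(
    "SELECT active, MIN(balance), MAX(balance) FROM accounts GROUP BY active ORDER BY active"
).fetchall()
print(loop_statements, "statements row by row,", bulk_statements, "in bulk")
```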

I'd recommend reading Jeffrey Palermo's stuff on NHibernate and Onion Architecture. Also, take his agile boot camp or other classes to learn O/R mapping, NHibernate and OOP. That's what we use: NHibernate, MVC, TDD, Dependency Injection.


