Performance - Single Join Select VS. Multiple Simple Selects

JOIN queries vs multiple queries

This is way too vague to give you an answer relevant to your specific case. It depends on a lot of things. Jeff Atwood (founder of this site) actually wrote about this. For the most part, though, if you have the right indexes and you properly do your JOINs it is usually going to be faster to do 1 trip than several.

Multiple Table Select vs. JOIN (performance)

They are the same, but with a different syntax. So you shouldn't expect any performance difference between the two syntaxes. However the the last syntax(ANS SQL-92 syntax) is the recommended, see these for more details:

  • Bad habits to kick : using old-style JOINs.
  • SQL JOIN: is there a difference between USING, ON or WHERE?

Which provides better performance one big join or multiple queries?

I agree with everyone who's said a single join will probably be more efficient, even with a lot of tables. It's also less development effort than doing the work in your application code. This assumes the tables are appropriately indexed, with an index on each foreign key column, and (of course) an index on each primary key column.

Your best bet is to try the easiest approach (the big join) first, and see how well it performs. If it performs well, then great - you're done. If it performs poorly, profile the query and look for missing indexes on your tables.

Your option #1 is not likely to perform well, due to the number of network round-trips (as anijhaw mentioned). This is sometimes called the "select N+1" problem - you do one SELECT to get the list of N applications, and then do N SELECTs in a loop to get the customers. This record-at-a-time looping is natural to application programmers; but SQL works much better when you operate on whole sets of data at once.

If option #2 is slow even with good indexing, you may want to look into caching. You can cache in the database (using a summary table or materialized/indexed view), in the application (if there is enough RAM), or in a dedicated caching server such as memcached. Of course, this depends on how up-to-date your query results need to be. If everything has to be fully up-to-date, then any cache would have to be updated whenever the underlying tables are updated - it gets complicated and becomes less useful.

This sounds like a reporting query though, and reporting often doesn't need to be real-time. So caching might be able to help you.

Depending on your DBMS, another thing to think about is the impact of this query on other queries hitting the same database. If your DBMS allows readers to block writers, then this query could prevent updates to the tables if it takes a long time to run. That would be bad. Oracle doesn't have this problem, and neither does SQL Server when run in "read committed snapshot" mode. I don't know about MySQL though.

NotORM's join vs multiple selects

I would expect the single query to be quite a bit quicker. However will depend on the communication between php and MySQL. When I have tried to benchmark it there has been a noticeable overhead just for doing a query (however simple).

However if the query gets too complicated you land up with something that is hard to maintain.

Another point comes if you are trying to use any custom procedures. For example I needed to do some processing based on Levenshtein distances (ie, how similar words are). The MySQL function I found for this was FAR slower than retrieving the data and processing with PHPs Levenshtein function.

Is it better to do multiple selects from multiple tables or 1 select of all your data from all the tables?

Just write one query with joins. If you are concerned about performance there are a number of options including:

  • Creating indexes that will help the performance of your selects
  • Create a persisted denormalized form of the data you want so you can query one table. This would most likely be an indexed view or another table.

LEFT JOIN vs. multiple SELECT statements

There is not enough information to really answer the question. I've worked on applications where decreasing the query count for one reason and increasing the query count for another reason both gave performance improvements. In the same application!

For certain combinations of table size, database configuration and how often the foreign table would be queried, doing the two queries can be much faster than a LEFT JOIN. But experience and testing is the only thing that will tell you that. MySQL with moderately large tables seems to be susceptable to this, IME. Performing three queries on one table can often be much faster than one query JOINing the three. I've seen speedups of an order of magnitude.



Related Topics



Leave a reply



Submit