Ad Hoc Queries VS Stored Procedures VS Dynamic SQL

Which is better: Ad hoc queries or stored procedures?

In my experience writing mostly WinForms Client/Server apps these are the simple conclusions I've come to:

Use Stored Procedures:

  1. For any complex data work. If you're going to be doing something truly requiring a cursor or temp tables it's usually fastest to do it within SQL Server.
  2. When you need to lock down access to the data. If you don't give table access to users (or role or whatever) you can be sure that the only way to interact with the data is through the SP's you create.

Use ad-hoc queries:

  1. For CRUD when you don't need to restrict data access (or are doing so in another manner).
  2. For simple searches. Creating SP's for a bunch of search criteria is a pain and difficult to maintain. If you can generate a reasonably fast search query use that.

In most of my applications I've used both SP's and ad-hoc sql, though I find I'm using SP's less and less as they end up being code just like C#, only harder to version control, test, and maintain. I would recommend using ad-hoc sql unless you can find a specific reason not to.

When is it better to write ad hoc sql vs stored procedures

SQL Server caches the execution plans for ad-hoc queries, so (discounting the time taken by the first call) the two approaches will be identical in terms of speed.

In general, the use of stored procedures means taking a portion of the code needed by your application (the T-SQL queries) and putting it in a place that is not under source control (it can be, but usually isn't) and where it can be altered by others without your knowledge.

Having the queries in a central place like this may be a good thing, depending upon how many different applications need access to the data they represent. I generally find it much easier to keep the queries used by an application resident in the application code itself.

In the mid-1990's, the conventional wisdom said that stored procedures in SQL Server were the way to go in performance-critical situations, and at the time they definitely were. The reasons behind this CW have not been valid for a long time, however.

Update: Also, frequently in debates over the viability of stored procedures, the need to prevent SQL injection is invoked in defense of procs. Surely, no one in their right mind thinks that assembling ad hoc queries through string concatenation is the correct thing to do (although this will only expose you to a SQL injection attack if you're concatenating user input). Obviously ad hoc queries should be parameterized, not only to prevent the monster-under-the-bed of a sql injection attack, but also just to make your life as a programmer generally easier (unless you enjoy having to figure out when to use single quotes around your values).

Update 2: I have done more research. Based on this MSDN white paper, it appears that the answer depends on what you mean by "ad-hoc" with your queries, exactly. For example, a simple query like this:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5

... will have its execution plan cached. Moreover, because the query does not contain certain disqualifying elements (like nearly anything other than a simple SELECT from one table), SQL Server will actually "auto-parameterize" the query and replace the literal constant "5" with a parameter, and cache the execution plan for the parameterized version. This means that if you then execute this ad-hoc query:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 23

... it will be able to use the cached execution plan.

Unfortunately, the list of disqualifying query elements for auto-parameterization is long (for example, forget about using DISTINCT, TOP, UNION, GROUP BY, OR etc.), so you really cannot count on this for performance.

If you do have a "super complex" query that won't be auto-parameterized, like:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5 OR ITEM_COUNT < 23

... it will still be cached by the exact text of the query, so if your application calls this query with the same literal "hard-coded" values repeatedly, each query after the first will re-use the cached execution plan (and thus be as fast as a stored proc).

If the literal values change (based on user actions, for example, like filtering or sorting viewed data), then the queries will not benefit from caching (except occasionally when they accidentally match a recent query exactly).

The way to benefit from caching with "ad-hoc" queries is to parameterize them. Creating a query on the fly in C# like this:

int itemCount = 5;
string query = "DELETE FROM tblSTUFF WHERE ITEM_COUNT > " +
itemCount.ToString();

is incorrect. The correct way (using ADO.Net) would be something like this:

using (SqlConnection conn = new SqlConnection(connStr))
{
SqlCommand com = new SqlCommand(conn);
com.CommandType = CommandType.Text;
com.CommandText =
"DELETE FROM tblSTUFF WHERE ITEM_COUNT > @ITEM_COUNT";
int itemCount = 5;
com.Parameters.AddWithValue("@ITEM_COUNT", itemCount);
com.Prepare();
com.ExecuteNonQuery();
}

The query contains no literals and is already fully parameterized, so subsequent queries using the identical parameterized statement would use the cached plan (even if called with different parameter values). Note that the code here is virtually the same as the code you would use for calling a stored procedure anyway (the only difference being the CommandType and the CommandText), so it somewhat comes down to where you want the text of that query to "live" (in your application code or in a stored procedure).

Finally, if by "ad-hoc" queries you mean you're dynamically constructing queries with different columns, tables, filtering parameters and whatnot, like maybe these:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS
WHERE AGE >= 18 AND LASTNAME LIKE '%What the`

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS
WHERE AGE >= 18 AND LASTNAME LIKE '%What the`
ORDER BY LASTNAME DESC

... then you pretty much can't do this with stored procedures (without the EXEC hack which is not to be spoken of in polite society), so the point is moot.

Update 3: Here is the only really good performance-related reason (that I can think of, anyway) for using a stored procedure. If your query is a long-running one where the process of compiling the execution plan takes significantly longer than the actual execution, and the query is only called infrequently (like a monthly report, for example), then putting it in a stored procedure might make SQL Server keep the compiled plan in the cache long enough for it to still be around next month. Beats me if that's true or not, though.

Inline SQL versus stored procedure

A well formed "inline" or "ad-hoc" SQL query - if properly used with parameters - is just as good as a stored procedure.

But this is absolutely crucial: you must use properly parametrized queries! If you don't - if you concatenate together your SQL for each request - then you don't benefit from these points...

Just like with a stored procedure, upon first executing, a query execution plan must be found - and then that execution plan is cached in the plan cache - just like with a stored procedure.

That query plan is reused over and over again, if you call your inline parametrized SQL statement multiple times - and the "inline" SQL query plan is subject to the same cache eviction policies as the execution plan of a stored procedure.

Just from that point of view - if you really use properly parametrized queries - there's no performance benefit for a stored procedure.

Stored procedures have other benefits (like being a "security boundary" etc.), but just raw performance isn't one of their major plus points.

What's the point of writing a dynamic query in a stored procedure

There are many other facets of stored procedures to consider besides their execution plan caching features, so I don't think its fair to dismiss their use simply because they are going to contain an ad-hoc query.

(Also worth noting that a properly formed bit of dynamic SQL is no barrier to execution plan reuse)

Are Stored Procedures more efficient, in general, than inline statements on modern RDBMS's?


NOTE that this is a general look at stored procedures not regulated to a specific
DBMS. Some DBMS (and even, different
versions of the same DBMS!) may operate
contrary to this, so you'll want to
double-check with your target DBMS
before assuming all of this still holds.

I've been a Sybase ASE, MySQL, and SQL Server DBA on-and off since for almost a decade (along with application development in C, PHP, PL/SQL, C#.NET, and Ruby). So, I have no particular axe to grind in this (sometimes) holy war.

The historical performance benefit of stored procs have generally been from the following (in no particular order):

  • Pre-parsed SQL
  • Pre-generated query execution plan
  • Reduced network latency
  • Potential cache benefits

Pre-parsed SQL -- similar benefits to compiled vs. interpreted code, except on a very micro level.

Still an advantage?
Not very noticeable at all on the modern CPU, but if you are sending a single SQL statement that is VERY large eleventy-billion times a second, the parsing overhead can add up.

Pre-generated query execution plan.
If you have many JOINs the permutations can grow quite unmanageable (modern optimizers have limits and cut-offs for performance reasons). It is not unknown for very complicated SQL to have distinct, measurable (I've seen a complicated query take 10+ seconds just to generate a plan, before we tweaked the DBMS) latencies due to the optimizer trying to figure out the "near best" execution plan. Stored procedures will, generally, store this in memory so you can avoid this overhead.

Still an advantage?
Most DBMS' (the latest editions) will cache the query plans for INDIVIDUAL SQL statements, greatly reducing the performance differential between stored procs and ad hoc SQL. There are some caveats and cases in which this isn't the case, so you'll need to test on your target DBMS.

Also, more and more DBMS allow you to provide optimizer path plans (abstract query plans) to significantly reduce optimization time (for both ad hoc and stored procedure SQL!!).

WARNING Cached query plans are not a performance panacea. Occasionally the query plan that is generated is sub-optimal.
For example, if you send SELECT *
FROM table WHERE id BETWEEN 1 AND
99999999
, the DBMS may select a
full-table scan instead of an index
scan because you're grabbing every row
in the table (so sayeth the
statistics). If this is the cached
version, then you can get poor
performance when you later send
SELECT * FROM table WHERE id BETWEEN
1 AND 2
. The reasoning behind this is
outside the scope of this posting, but
for further reading see:
http://www.microsoft.com/technet/prodtechnol/sql/2005/frcqupln.mspx
and
http://msdn.microsoft.com/en-us/library/ms181055.aspx
and http://www.simple-talk.com/sql/performance/execution-plan-basics/

"In summary, they determined that
supplying anything other than the
common values when a compile or
recompile was performed resulted in
the optimizer compiling and caching
the query plan for that particular
value. Yet, when that query plan was
reused for subsequent executions of
the same query for the common values
(‘M’, ‘R’, or ‘T’), it resulted in
sub-optimal performance. This
sub-optimal performance problem
existed until the query was
recompiled. At that point, based on
the @P1 parameter value supplied, the
query might or might not have a
performance problem."

Reduced network latency
A) If you are running the same SQL over and over -- and the SQL adds up to many KB of code -- replacing that with a simple "exec foobar" can really add up.
B) Stored procs can be used to move procedural code into the DBMS. This saves shuffling large amounts of data off to the client only to have it send a trickle of info back (or none at all!). Analogous to doing a JOIN in the DBMS vs. in your code (everyone's favorite WTF!)

Still an advantage?
A) Modern 1Gb (and 10Gb and up!) Ethernet really make this negligible.
B) Depends on how saturated your network is -- why shove several megabytes of data back and forth for no good reason?

Potential cache benefits
Performing server-side transforms of data can potentially be faster if you have sufficient memory on the DBMS and the data you need is in memory of the server.

Still an advantage?
Unless your app has shared memory access to DBMS data, the edge will always be to stored procs.

Of course, no discussion of Stored Procedure optimization would be complete without a discussion of parameterized and ad hoc SQL.

Parameterized / Prepared SQL

Kind of a cross between stored procedures and ad hoc SQL, they are embedded SQL statements in a host language that uses "parameters" for query values, e.g.:

SELECT .. FROM yourtable WHERE foo = ? AND bar = ?

These provide a more generalized version of a query that modern-day optimizers can use to cache (and re-use) the query execution plan, resulting in much of the performance benefit of stored procedures.

Ad Hoc SQL
Just open a console window to your DBMS and type in a SQL statement. In the past, these were the "worst" performers (on average) since the DBMS had no way of pre-optimizing the queries as in the parameterized/stored proc method.

Still a disadvantage?
Not necessarily. Most DBMS have the ability to "abstract" ad hoc SQL into parameterized versions -- thus more or less negating the difference between the two. Some do this implicitly or must be enabled with a command setting (SQL server: http://msdn.microsoft.com/en-us/library/ms175037.aspx , Oracle: http://www.praetoriate.com/oracle_tips_cursor_sharing.htm).

Lessons learned?
Moore's law continues to march on and DBMS optimizers, with every release, get more sophisticated. Sure, you can place every single silly teeny SQL statement inside a stored proc, but just know that the programmers working on optimizers are very smart and are continually looking for ways to improve performance. Eventually (if it's not here already) ad hoc SQL performance will become indistinguishable (on average!) from stored procedure performance, so any sort of massive stored procedure use ** solely for "performance reasons"** sure sounds like premature optimization to me.

Anyway, I think if you avoid the edge cases and have fairly vanilla SQL, you won't notice a difference between ad hoc and stored procedures.

What's the point of writing a dynamic query in a stored procedure

There are many other facets of stored procedures to consider besides their execution plan caching features, so I don't think its fair to dismiss their use simply because they are going to contain an ad-hoc query.

(Also worth noting that a properly formed bit of dynamic SQL is no barrier to execution plan reuse)



Related Topics



Leave a reply



Submit