Why Do People Hate SQL Cursors So Much

Why do people hate SQL cursors so much?

The "overhead" of cursors is merely part of the API. Cursors are how parts of the RDBMS work under the hood. Often CREATE TABLE and INSERT statements embed SELECT queries (CREATE TABLE ... AS SELECT, INSERT ... SELECT), and the obvious internal implementation is a cursor.

Using higher-level "set-based operators" bundles the cursor results into a single result set, meaning less API back-and-forth.
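To make the round-trip point concrete, here is a minimal sketch using Python's sqlite3 module (my own illustration; the table and values are made up). The set-based query does the aggregation inside the engine in one round trip, while the cursor-style loop drags every row across the API boundary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])

# Cursor style: every row crosses the API boundary.
total_rowwise = 0.0
for (amount,) in conn.execute("SELECT amount FROM orders"):
    total_rowwise += amount

# Set-based style: one round trip; the engine does the summing.
(total_setwise,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()

assert abs(total_rowwise - total_setwise) < 1e-6
```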

Cursors predate modern languages that provide first-class collections. Old C, COBOL, Fortran, etc., had to process rows one at a time because there was no notion of "collection" that could be used widely. Java, C#, Python, etc., have first-class list structures to contain result sets.

The Slow Issue

In some circles, relational joins are a mystery, and folks will write nested cursors rather than a simple join. I've seen truly epic nested-loop operations written out as lots and lots of cursors, defeating the RDBMS optimizer and running really slowly.

Simple SQL rewrites that replace nested cursor loops with joins and a single, flat cursor loop can make programs run in 1/100th the time. [They thought I was the god of optimization. All I did was replace nested loops with joins. I still used cursors.]

This confusion often leads to an indictment of cursors. However, it isn't the cursor that's the problem; it's the misuse of the cursor.
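The nested-cursor anti-pattern and its join-based fix can be sketched as follows (a hypothetical sqlite3 example; the customers/orders tables are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 20.0), (3, 2, 5.0);
""")

# Anti-pattern: a cursor per customer -- one extra query per outer row,
# which is a hand-rolled (and slow) nested-loop join.
slow = []
for cid, name in conn.execute("SELECT id, name FROM customers"):
    for (amount,) in conn.execute(
            "SELECT amount FROM orders WHERE customer_id = ?", (cid,)):
        slow.append((name, amount))

# One join, one flat cursor: the optimizer picks the access path.
fast = list(conn.execute("""
    SELECT c.name, o.amount
    FROM customers c JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
"""))

assert sorted(slow) == sorted(fast)
```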

The Size Issue

For really epic result sets (e.g., dumping a table to a file), cursors are essential. Set-based operations can't materialize really large result sets as a single collection in memory.
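The streaming idea looks like this in a small sqlite3 sketch (my own example; an in-memory buffer stands in for the destination file). Iterating the cursor processes one row at a time instead of materializing everything with fetchall():

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big (payload) VALUES (?)",
                 (("row-%d" % i,) for i in range(10_000)))

out = io.StringIO()  # stands in for the destination file
writer = csv.writer(out)
rows_written = 0
# fetchall() would materialize all 10,000 rows at once;
# iterating the cursor streams them one by one.
for row in conn.execute("SELECT id, payload FROM big"):
    writer.writerow(row)
    rows_written += 1

print(rows_written)  # 10000
```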

Alternatives

I try to use an ORM layer as much as possible. That has two purposes. First, the cursors are managed by the ORM component. Second, the SQL is separated from the application into a configuration file. It's not that cursors are bad; it's that coding all those opens, closes, and fetches is not value-added programming.
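What the ORM buys you can be sketched without one: a hypothetical helper (of my own invention, not any particular library's API) that owns the open/fetch/close ceremony so calling code never repeats it:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def query(conn, sql, params=()):
    """Hypothetical helper: owns the cursor's open/fetch/close ceremony."""
    cur = conn.execute(sql, params)
    try:
        yield cur      # the caller just iterates
    finally:
        cur.close()    # always released, even on error

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);"
    "INSERT INTO t (v) VALUES ('a'), ('b');")

with query(conn, "SELECT v FROM t ORDER BY id") as cur:
    values = [v for (v,) in cur]

print(values)  # ['a', 'b']
```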

What is wrong with Cursors?

What's wrong with cursors is that they are often abused, both in Oracle and in MS SQL.

Cursors are for keeping a stable result set which you can retrieve row by row. They are implicitly created when your query is run and closed when it's finished.

Of course, keeping such a result set requires resources: locks, latches, memory, even disk space.

The faster these resources are freed, the better.

Keeping a cursor open is like keeping a fridge door open

You don't do it for hours without necessity, but that does not mean you should never open your fridge.

That means that:

  • You don't fetch your results row by row and sum them: you call SQL's SUM instead.
  • You don't execute the whole query and take the first results from the cursor: you append a rownum <= 10 condition to your query.

And so on.
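The second point (push the limit into the query instead of fetching everything) looks like this in a sqlite3 sketch of my own; SQLite spells the condition LIMIT where Oracle uses rownum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO events (note) VALUES (?)",
                 [("note-%d" % i,) for i in range(1000)])

# Anti-pattern: run the whole query, then keep only the first ten rows.
first_ten_bad = conn.execute(
    "SELECT id FROM events ORDER BY id").fetchall()[:10]

# Better: tell the engine you only want ten (LIMIT here; rownum <= 10 in Oracle).
first_ten_good = conn.execute(
    "SELECT id FROM events ORDER BY id LIMIT 10").fetchall()

assert first_ten_bad == first_ten_good
```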

As for Oracle, processing your cursors inside a procedure requires the infamous SQL/PLSQL context switch, which happens every time you fetch a result of an SQL query out of the cursor.

It involves passing large amounts of data between threads and synchronizing the threads.

This is one of the most irritating things in Oracle.

One of the less evident consequences of that behaviour is that triggers in Oracle should be avoided if possible.

Performing a DML operation on a table with a row-level trigger is equivalent to opening a cursor over the affected rows and calling the trigger code for each row of that cursor.

The mere existence of a trigger (even an empty trigger) may slow down a DML operation 10 times or more.

A test script on 10g:

SQL> CREATE TABLE trigger_test (id INT NOT NULL)
2 /

Table created

Executed in 0.031 seconds
SQL> INSERT
2 INTO trigger_test
3 SELECT level
4 FROM dual
5 CONNECT BY
6 level <= 1000000
7 /

1000000 rows inserted

Executed in 1.469 seconds
SQL> COMMIT
2 /

Commit complete

Executed in 0 seconds
SQL> TRUNCATE TABLE trigger_test
2 /

Table truncated

Executed in 3 seconds
SQL> CREATE TRIGGER trg_test_ai
2 AFTER INSERT
3 ON trigger_test
4 FOR EACH ROW
5 BEGIN
6 NULL;
7 END;
8 /

Trigger created

Executed in 0.094 seconds
SQL> INSERT
2 INTO trigger_test
3 SELECT level
4 FROM dual
5 CONNECT BY
6 level <= 1000000
7 /

1000000 rows inserted

Executed in 17.578 seconds

1.47 seconds without the trigger, 17.58 seconds with an empty trigger doing nothing.
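The same effect can be reproduced outside Oracle. Here is a rough sketch of my own using sqlite3 (SQLite triggers are also row-level; the absolute timings are illustrative only, and SQLite requires a non-empty trigger body, so SELECT 1 stands in for NULL):

```python
import sqlite3
import time

def timed_insert(conn, n=50_000):
    rows = [(i,) for i in range(n)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO trigger_test (id) VALUES (?)", rows)
    return time.perf_counter() - start

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trigger_test (id INTEGER NOT NULL)")

without_trigger = timed_insert(conn)

conn.execute("DELETE FROM trigger_test")
# A do-nothing row-level trigger, fired once per inserted row.
conn.execute("""
    CREATE TRIGGER trg_test_ai AFTER INSERT ON trigger_test
    BEGIN SELECT 1; END""")

with_trigger = timed_insert(conn)
print(without_trigger, with_trigger)  # the empty trigger still costs real time
```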

Is there any performance impact using cursors in stored procedure?

Cursors are one of the worst things you can do for performance. They run row by row instead of operating on the whole set of data. No one except an experienced DBA should ever consider writing one.

http://wiki.lessthandot.com/index.php/Cursors_and_How_to_Avoid_Them

When are TSQL Cursors the best or only option?

To answer your question directly:

I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too variant.

That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.
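The "manageable chunks" case can be sketched like this (my own sqlite3 illustration): a large delete done in batches, which keeps each transaction small, keeps locks short, and gives you a natural place to log progress:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, done INTEGER)")
conn.executemany("INSERT INTO staging (done) VALUES (?)", [(1,)] * 10_000)
conn.commit()

BATCH = 1_000
deleted_total = 0
while True:
    # One chunk per transaction: short locks, easy progress logging,
    # and only a small batch to roll back if something fails.
    cur = conn.execute(
        "DELETE FROM staging WHERE id IN "
        "(SELECT id FROM staging WHERE done = 1 LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break
    deleted_total += cur.rowcount

print(deleted_total)  # 10000
```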

And for what I really want to say:

When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).

This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.

Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.

SQL cursor memory overhead

It is going to depend on the type of cursor you are using. A keyset cursor will store only the PK in tempdb and then do lookups on the other columns as needed. Other cursor types may retrieve more information at one time: more rows, single rows, etc.

Here's a good description of each type: http://searchsqlserver.techtarget.com/feature/Part-1-How-cursors-work
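The keyset idea can be sketched in a few lines (a made-up sqlite3 example of my own, not how SQL Server implements it internally): snapshot only the keys up front, then look up the remaining columns per row as the cursor advances:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, 'pen', 2.0), (2, 'ink', 5.0), (3, 'pad', 3.5)])

# Keyset-style: the "cursor" holds only the primary keys...
keyset = [pk for (pk,) in conn.execute("SELECT id FROM products ORDER BY id")]

# ...and the remaining columns are fetched per row on demand.
rows = []
for pk in keyset:
    rows.append(conn.execute(
        "SELECT name, price FROM products WHERE id = ?", (pk,)).fetchone())

print(rows)  # [('pen', 2.0), ('ink', 5.0), ('pad', 3.5)]
```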

SQL Server Fast Forward Cursors

The 'Best Practice' of avoiding cursors in SQL Server dates back to SQL Server 2000 and earlier versions. The rewrite of the engine in SQL Server 2005 addressed most of the issues related to cursors, particularly with the introduction of the fast-forward option. Cursors are not necessarily worse than set-based operations; they are used extensively and successfully in Oracle PL/SQL (cursor FOR loops).

The 'generally accepted' wisdom you refer to was valid, but it is now outdated and incorrect. Go on the assumption that fast-forward cursors behave as advertised and perform well, but do some tests and research of your own, basing your findings on SQL Server 2005 and later.

Performance hit of having many Result Set of a single row


  1. You would do better to insert all the single output rows into a temporary table before selecting the final recordset. There is no reason for your current code to return a separate recordset for each row produced while iterating over the cursor with the stored procedure.
  2. Your MtdMdl_Item_ItemBase_Read is relevant here: by turning it into a function, you can avoid the stored procedure + cursor and complete the task with one single query using an inline function.

Update:

Judging by your data structure, I understand that your [Id] is not unique, which is the source of the confusion.

There are many ways to do what you need, but here is an example of a single query that even avoids a CTE for the temporary result:

DECLARE @relatedid int = 2

SELECT TOP (1) WITH TIES
       [Id], [TimeStamp], [Name], [Owner], [Value]
FROM MtdMdl_Item
WHERE [Owner] = @relatedid
ORDER BY ROW_NUMBER() OVER (PARTITION BY [Id] ORDER BY [TimeStamp] DESC)

Consider this SQL Fiddle as demo.
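The same latest-row-per-[Id] technique can be checked in miniature with Python's sqlite3 (my own sketch; window functions need SQLite 3.25+, which modern Python bundles). ROW_NUMBER partitioned by Id and ordered by TimeStamp descending picks each Id's newest row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MtdMdl_Item "
             "(Id INTEGER, TimeStamp TEXT, Name TEXT, Owner INTEGER, Value TEXT)")
conn.executemany("INSERT INTO MtdMdl_Item VALUES (?, ?, ?, ?, ?)", [
    (1, '2015-01-01', 'a', 2, 'old'),
    (1, '2015-02-01', 'a', 2, 'new'),
    (2, '2015-01-15', 'b', 2, 'only'),
])

# ROW_NUMBER() numbers each Id's rows newest-first; rn = 1 keeps the latest.
latest = conn.execute("""
    SELECT Id, TimeStamp, Name, Owner, Value FROM (
        SELECT *, ROW_NUMBER() OVER
            (PARTITION BY Id ORDER BY TimeStamp DESC) AS rn
        FROM MtdMdl_Item WHERE Owner = ?)
    WHERE rn = 1 ORDER BY Id
""", (2,)).fetchall()

print(latest)
```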

Update 2:

Example with inline table function:

CREATE FUNCTION MtdMdl_Item_ItemBase_Read (@keyid int)
RETURNS TABLE
AS
RETURN
(
    SELECT TOP (1) [Id], [TimeStamp], [Name], [Owner], [Value]
    FROM [MtdMdl_Item]
    WHERE [Id] = @keyid
    ORDER BY [TimeStamp] DESC
)
GO

DECLARE @relatedid int = 2

SELECT DISTINCT A.[Id], B.*
FROM MtdMdl_Item A
OUTER APPLY (SELECT * FROM MtdMdl_Item_ItemBase_Read(A.[Id])) B
WHERE A.[Owner] = @relatedid

SQL Fiddle 2


