12 June 2014

My first cursor - faster than SET based operation

Of course, you should NEVER write cursor. Especially not for SQL server. Or so I was told.

The rational behind the code


I believe people hate them so much because usually you might find them in some legacy code, where there could be some other proper SET based operations, or at least they might exist nowadays. Nobody likes legacy code. Nothing too delightful about jumping into code that you don't know and you didn't write. Another reason is of course, for the usual T-SQL code that you write some smart people wrote optimizer for. If you run a Cursor the optimizer cannot kick in, so it's likely that the code would run slower.

But sometimes you do need to write cursors. The main use is when you need to go row by row. Then why not use client side programs, like C# or VB? Well, maybe it's because the developer feels more competent using SQL than by using C#? That's been my excuse for writing the code below.
So I had to go row by row for a one off. My SQL skills exceed my C# skills. I needed to run this code once and that's all. So I choose to write a cursor.

What does it do?

There had been some duplicated rows of data in the "attachment" table. The data is a file, saved in the "content" column as varbinary(VARMAX), and I just wanted to make sure that the duplicated file name didn't stem from a different files, but rather, represented the same file on multiple instances. I've made sure that all attachments were saved to a new table called "multiple_attachment" when "min_attachid" is the ref (there's obviously another field called "file name" which I didn't bother with this time).
Since it's hard to trace errors in Cursor I've added the "INSERT" statement.
Using SET based operation would not be quicker (tested!) due to the INNER JOIN of a table on itself, on top of the expensive operation of comparing varbinary(max).

The code


DECLARE first_cursor CURSOR
FOR
SELECT content, attachid, min_attachmentid
FROM dbo.multiple_attachment
WHERE attachid <> min_attachmentid

DECLARE @content VARBINARY (max)
DECLARE @attachid BIGINT
DECLARE @min_attachid BIGINT

OPEN first_cursor

FETCH NEXT FROM first_cursor
INTO @content, @attachid, @min_attachid

WHILE @@FETCH_STATUS = 0
BEGIN
IF @content <> (SELECT content
FROM dbo.attachment
WHERE attachid = @min_attachid)
INSERT  [dbo].[trace_errors]
           ([attachid])
     SELECT (@attachid)

FETCH NEXT FROM first_cursor
END

CLOSE first_cursor
DEALLOCATE first_cursor


A problem:

The code above got stuck with the error "An error occurred while executing batch. Error message is: Error creating window handle."
because there were over 2000 lines in the table and there was a limit to how much "Display Result" window panes you can create. For every line it produced a new message! the solution was to disable the results pane in the Query Option (Query -> Query Options -> Results
choose "Discard results after execution" in SQL Server 2012). The above proc run much quicker and I would recommend setting this option each time you need to write a Cursor.

No comments:

Post a Comment