The Side Effect of NOLOCK


Have you ever been told that you should use NOLOCK?  I have, and it’s usually been with OLTP systems where the person making the recommendation wants to “improve” the performance of the data platform by reducing, or removing, the locks that are held for SELECT queries.

Those that know better will often point to the fact that NOLOCK allows dirty reads.  Data that has not been committed can, and will, be returned.  In plain terms, an insert, update, or delete that is in process will be considered for the result set, regardless of the state of the transaction.  This can mean returning rows from an insert that may later be rolled back, or returning rows that are in the middle of being deleted.
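
To see the classic dirty read on its own, consider a quick sketch along these lines (the table and values here are made up purely for illustration): one session inserts a row inside an open transaction, and a second session reading with NOLOCK picks up the row even though the transaction may still roll back.

-- Session 1: insert a row but leave the transaction open
BEGIN TRANSACTION;

INSERT INTO dbo.Orders ( orderid, amount )  -- hypothetical table, for illustration only
VALUES ( 42, 99.00 );

-- ...the transaction is still open; a later ROLLBACK would make this row vanish

-- Session 2: run while session 1 is still open; the uncommitted row is returned
SELECT orderid, amount
FROM dbo.Orders WITH (NOLOCK);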

What about the times a query returns rows that you know for certain are not being modified?  Suppose that you are updating rows for one client and need to return rows for a second client.  In this case, will the use of NOLOCK be safe?  The data isn’t being modified for the second client, so you might assume that returning that data won’t expose you to dirty reads.  Unfortunately, this assumption is incorrect.

NOLOCK Side Effect Demo

Before explaining how the assumption is incorrect, let’s first demonstrate the problem.  We’ll start with a table that stores the points a client has in the system.  In the code below, we’ll add 10 rows for the client with personid equal to –1.  Then the script will loop, inserting rows for random clients and adding more points.  Since the table uses a uniqueidentifier for the clustering key, the inserts will cause frequent page splits.


-- Recreate the demo table; NEWID() for the clustered primary key scatters inserts across the index
IF OBJECT_ID('dbo.Points') IS NOT NULL
    DROP TABLE dbo.Points;

CREATE TABLE dbo.Points
    (
    id UNIQUEIDENTIFIER PRIMARY KEY DEFAULT ( NEWID() )
    ,personid INT
    ,dollars MONEY
    ,filler_data CHAR(1000) DEFAULT ( 'x' )
    );

-- Seed ten $10 rows for the client with personid = -1 (a $100 total)
INSERT INTO dbo.Points ( personid, dollars )
VALUES (-1,10),(-1,10),(-1,10),(-1,10),(-1,10),(-1,10),(-1,10),(-1,10),(-1,10),(-1,10);

-- Continuously insert rows for random clients to force ongoing page splits
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Points ( personid, dollars )
    VALUES ( CAST(RAND() * 10000 AS INT), 10 );
END

In a second session, execute the query below, which returns a summary of the data for the client where personid equals –1.  In this script, the query uses NOLOCK, since we are working under the premise that it is safe to use the hint on rows that we know are not affected by the ongoing inserts.


-- Repeatedly total the points for personid = -1; the correct answer is always $100
DECLARE @points MONEY;

WHILE 1 = 1
BEGIN
    SET @points = ( SELECT SUM(dollars)
                    FROM dbo.Points (NOLOCK)
                    WHERE personid = -1 );

    -- Exit and report as soon as an incorrect total is returned
    IF @points <> 100
    BEGIN
        SELECT @points AS current_total;
        BREAK;
    END
END

On the first execution of the second script, the loop exited almost immediately.  As the image below shows, the first run completed in less than a second.  Instead of returning a total of $100 in points, the result was $90.

[Image: query results showing a total of $90, below the expected $100]

Running the script a second time returns different results.  Instead of the $90 returned on exit last time, the total is now $110 in points.  Looking at the results, this time it took 7 seconds to reach an incorrect total.

[Image: query results showing a total of $110, above the expected $100]

While these two executions returned rather quickly, there were a few subsequent executions that ran for 7, 17, and 30 minutes before incorrect results were encountered.  Even though the length of time increased, the risk of incorrect results was the same, and there is no guarantee of when correct or incorrect results will be returned.

NOLOCK Problem

The cause of the incorrect results in the example is the page splits that occur from the data modifications.  While the second client’s data wasn’t being changed, the pages that contained this data were being changed.  When a page is split, all of the data on the page is affected, even the rows that are unrelated to the rows actually being changed.  And depending on the state of the page split and where the data is currently stored, this may result in either more or fewer rows being returned.
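
If you want to watch this happen while the demo scripts are running, one rough way to do it (not part of the demo itself) is to check the table’s physical stats and the instance-level page split counter between executions of the second query:

-- Page count and fragmentation for dbo.Points climb as the inserts split pages
SELECT index_level, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Points'), NULL, NULL, 'DETAILED');

-- Cumulative page splits for the instance; sample it twice and compare the values
SELECT cntr_value AS page_splits
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
AND object_name LIKE '%Access Methods%';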

The typical protection for this type of operation is the default READ COMMITTED isolation level, which prevents dirty reads from occurring.  Of course, this uses locking, and it should, to guarantee that dirty reads do not occur.  But it provides the guarantee that the data returned is the data that is committed, regardless of whether a page split is moving your data around.

One of the main concerns that people have with locking is the blocking that occurs when two users try to access the same locked resource.  As an alternative to NOLOCK, try using the SNAPSHOT isolation level instead.  With it, readers won’t block writers, which will reduce the amount of lock blocking on your data platform.
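
As a rough sketch of that alternative (replace the database name with your own), snapshot isolation is enabled once at the database level and then requested by the reading session in place of the NOLOCK hint:

-- One-time setup: allow snapshot isolation (row versions are kept in tempdb)
ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optionally, READ_COMMITTED_SNAPSHOT makes the default isolation level use versioning as well
-- ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- In the reading session, request SNAPSHOT and drop the NOLOCK hint
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

SELECT SUM(dollars) AS current_total
FROM dbo.Points
WHERE personid = -1;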

A Little Homework

Now it’s time to put on your thinking cap.  Consider the calls that you’ve received from time to time where users complain about incorrect results on a report, which then runs fine the next time.  Or consider the order that is placed that exceeds the available cap for the user.  Or all of the other day-to-day issues where there seems to be an unreproducible bug in your application related to incorrect data being returned.

Are you using NOLOCK?  Did a page split change the location of the data being queried, changing the results in-flight?  Given the issues, which I am sure you all have, is it really ok to use NOLOCK?

Feel free to comment below with your thoughts.

  • mbourgon

    Be VERY careful when using snapshot isolation. You can wind up with file growth you’re not expecting. Yes, read committed is very nice. Yes, there are issues with dirty data. But put me firmly on the side of NOLOCK in a high-OLTP environment.

    • StrateSQL

      Care needs to be taken in deploying the snapshot isolation level, but it is better imo than jumping in with NOLOCK. When it comes to high-end OLTP, I don’t know anyone that recommends NOLOCK over re-architecting the data for better access. Unfortunately, I have had clients who preferred bad architecture to accurate data, which is an issue of its own entirely.

  • http://twitter.com/Svelmoe Allan Svelmøe Hansen

    Running more and more into the ‘we should use NOLOCK’ mindset; it’s nice to have some additional ammunition other than just ‘dirty data’. I hadn’t thought of page splits for this before.

    • StrateSQL

      Yeah, this is one that usually gets people to perk up a little more and realize the scope of their risk.

  • Barry

    Like anything, you need to be aware of the pros and cons. So long as you know the ‘issues’ with the NOLOCK option and your systems can deal with those ‘issues’ then it can make a very positive contribution to a system’s performance.

    • StrateSQL

      I get what you are trying to say, but how would you build an application that could accommodate results that may or may not be accurate?

      The only methods I can think of to guarantee correct results to people would be to execute the query twice and if the results matched then present them to the end user. Or, put the database or tables being used in read only mode until after the query completed.
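
      Something along these lines, as a sketch only (it doubles the reads and still isn’t a true guarantee under constant modification):

      DECLARE @first MONEY, @second MONEY

      SELECT @first = SUM(dollars) FROM dbo.Points WITH (NOLOCK) WHERE personid = -1
      SELECT @second = SUM(dollars) FROM dbo.Points WITH (NOLOCK) WHERE personid = -1

      -- Only present the value if the two dirty reads happen to agree
      IF @first = @second
          SELECT @first AS points_total
      ELSE
          RAISERROR('Totals did not match; try again.', 16, 1)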

      • Barry

        It is not always necessary to guarantee correct results, and sometimes the ‘dirty’ state doesn’t matter. If your application inserts into a given table, that data is mostly correct, and any updates to that data do not significantly change the meaning, then there is probably no risk in showing it in its ‘dirty’ state.

        I’d consider that if you are talking, for example, about financial transactions then the use of NOLOCK would probably be wholly inappropriate, but for a table which contains comments like the ones we enter here then reading it with NOLOCK is probably not going to be any cause for concern; the data returned is good enough to service the display requirement.

  • Adykes

    This is a good article, however I use NOLOCK from time to time. In our particular case though, we are doing quality business rule testing on a static set of data. Once it is loaded it doesn’t change, so for my case, it is helpful.

  • Pingback: Seven Deadly Sins of the T-SQL Programmer (translation) | sql.dp.ua – about SQL, databases, programming

  • http://www.facebook.com/rajib.bahar Rajib Bahar

    Thanks for the POC. I have been lucky so far because our target tables were either updated less frequently or were part of a data warehouse.

  • Guru

    It is interesting to know that page splits will actually move data stored in the table from a previous insert or update. Is this limited to the same session, or would inserts or updates from other sessions cause this same behavior?

    While I agree that (nolock) certainly isn’t correct in all situations, I would expect (nolock) to be correct while generating reports. This assumes 2 things:

    1. Production data used for generating the reports is not changing while the reports are running or report data is being processed.
    2. Production data used for generating reports is stored in reporting tables, meaning that the data has already been processed and is stored in a way that the GUI will simply pick up and generate a report with. (This step isn’t necessary, but will greatly enhance report performance and will further decrease the chance of generating a report with ‘dirty data’.)

    Think of it this way: with (nolock), the number of instructions that the processor(s) have to process is decreased. At a minimum, all of the extra overhead associated with locking does not have to be processed. Running a select with (nolock) will be faster than a select without it, unless you think there will ever be a time when the same processor will process more instructions faster than it would process fewer instructions… For all those programmers out there, it’s similar to running the Release version of an executable (without all of the debug instructions) instead of running the Debug version of an executable. You just have to make sure that your data isn’t changing whenever you generate a report.

  • Eric Stimpson

    I wasn’t able to replicate your results on SQL 2012 in my environment. Query 2 ran indefinitely…

    As for the general argument, I get the point, but in practical terms I suspect that the problem you’ve demonstrated here would be exceedingly rare in most circumstances. Dirty reads aside, I doubt that the problem of incorrect results would occur in most environments. I’d be interested in someone who understood how this problem could be further isolated weighing in, but I suspect that if you added a little wait time to query 1’s loop, more closely replicating a typical environment, query 2 would never return.

    For certain environments (high volume oltp, client-server with 100+ clients) I know that not allowing dirty reads results in a plethora of deadlock exceptions, making the system completely unusable. We have chosen the strategy of only preventing dirty reads during data modification transactions and allowing dirty reads for reporting. This is particularly important because the OLTP data is extensively reported against in real-time, and dealing with this added load is a critical requirement.

    In many real-world environments, I suspect dirty read reporting is a necessary evil.

    • StrateSQL

      This is a real world issue, though the structure of the scenario that I use to demonstrate it is not. No one reads only a single user while continually adding new data. To make the scenario more real world, more than one user would be read at a time along with the data being added.

      Is this a rare issue? Most definitely. In order to hit it, you need to read the data for a row that is on a page being split. The script I use generally creates this scenario in a short amount of time, but sometimes it will miss replicating the issue. Not seeing the issue occur does not mean it isn’t an issue, any more than seeing different run times changes the possible outcomes.

      Adding a wait between query executions won’t change anything in a real world scenario. In the case of this scenario, if the data is read for another row that is co-located on a page that is being split because of changes to a different row, this issue will likely occur. It’s all about timing.

      Of course, there is still the general issue with NOLOCK that data read that is mid-transaction may not return correct results.

      In regards to deadlocks, they should never be accepted as a foregone conclusion in high volume OLTP environments. In my experience, shortening transactions, re-ordering operations, appropriate use of isolation levels (i.e. avoiding SERIALIZABLE) and properly indexing tables generally resolves most deadlock issues.

      When reporting needs to occur on production OLTP platforms, there are many options that can be considered before NOLOCK. I’d first look at database snapshots, replication, availability groups, and the SNAPSHOT isolation level before going to NOLOCK.
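
      As a rough example of the first option (the database name, logical file name, and path here are placeholders), a database snapshot gives reporting a transactionally consistent, read-only copy to query:

      -- Create a read-only, point-in-time snapshot of the OLTP database
      CREATE DATABASE SalesDB_Reporting
      ON ( NAME = SalesDB_Data, FILENAME = 'D:\Snapshots\SalesDB_Reporting.ss' )
      AS SNAPSHOT OF SalesDB;

      -- Point the reports at the snapshot instead of the live database
      SELECT SUM(dollars) AS current_total
      FROM SalesDB_Reporting.dbo.Points
      WHERE personid = -1;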

    • Grant Fritchey

      I’m not so sure about your conclusion on this.

      I’ve seen missing and extra rows occur in real-world, small scale, systems. I’ve also consulted with people that have hit the issue on systems that were not really large or heavily loaded.

      I think what happens most of the time is that people run the report and then note the discrepancies and then rerun the report. Everything is fine the second time. I’ve noticed that many, maybe even most, database problems are not reported.

      In my own, admittedly non-empirical, experience, this is more common than most people realize. I don’t think that it’s actually common, but I also don’t think it’s rare. It’s just too easy to replicate.

    • Allen White

      Hi Eric. Unfortunately the problem that Jason’s demo highlights is one that is real, and is inherent in the functional design of the physical implementation of SQL Server. By using the NOLOCK hint you’re accepting that transactional consistency is not important. I’ve used Jason’s demos to show multiple clients how their NOLOCK usage is creating inconsistent results, thus potentially causing bad decisions to be made.

      With respect to your deadlock issue, properly written applications should NEVER deadlock. If all processes update tables in the same order then a deadlock simply won’t occur. (We know that’s not real world, but if you have a serious deadlock problem, I’d be callin’ on some dev types.)

      Now, I’ve run Jason’s demos on my laptop and my Surface Pro, which both have SSD drives, and the failure doesn’t occur. Does that mean that the newer technology makes this problem meaningless? No, it doesn’t, because I/O requirements frequently exceed the capacity for the IO channel, and you’ll run into this problem, even on solid-state technology, eventually.

      The proper way to fix the problem is to make the queries against your OLTP database highly selective. SQL Server does very well in high-volume transactional environments with its pessimistic concurrency models when the application is built correctly. When the queries get overly broad, or you’ve got monthly reports going against your OLTP database while updates are occurring, you’ve got trouble, and NOLOCK in that case is a bandaid that won’t solve your problem, it just hides it.

    • Eric Stimpson

      Ok, let me start with the following acknowledgement: I’m not dismissing this as a potentially significant issue. I’m also not recommending NOLOCK usage as a best practice. And finally, I’m not dismissing any of your suggestions for dealing with deadlocks or better reporting solutions. All of these are suggestions which I, too, employ personally and preach to others. That said, I’m sure many of us have had the experience of dealing with legacy systems/environments that violate a multitude of desirable practices and where practicality limits the available courses of corrective action.

      My argument boils down to this: NOLOCK “ought not” create the issue demonstrated here. I understand that it can, but I see this as a bug in SQL Server. Let’s imagine that the issue demonstrated here didn’t exist. Then we could say the following about using NOLOCK for reporting purposes -> you may get inconsistent results for 2 distinct reasons:
      1) You might read data from a table that is modified at the beginning of a transaction and data from a table that is modified at the end of a transaction, and there may be a transaction that is partially completed, thereby giving you inconsistent results.
      2) You might read data that will be rolled back.

      These may be generally acceptable risks for reporting, particularly if you’re reporting on data that was created in past periods (i.e. yesterday, or last month) instead of up to the very last transaction. However, what is demonstrated here is that reporting on data that hasn’t been modified can be made inconsistent if the underlying table is being modified and you read with NOLOCK (provided the clustered index is such that new data might need to be placed anywhere other than the end of a table).

      I get it. I just don’t think SQL Server ought to behave this way.

      Given that I have several suitable environments for testing this problem (they have high transaction loads, lots of poorly designed clustered indexes, and large-scale NOLOCK reporting), I’m tempted to see if I can determine quantitatively how frequently this issue causes inaccurate reporting in these environments, although I agree with what would seem to be the consensus: the effort would be better spent eliminating the need for NOLOCK reporting.

  • psteffek

    Great stuff Jason. Just started a new gig where a report using NOLOCK was getting extremely inconsistent results. In one of the tables involved, four out of five columns are uniqueidentifiers, and the table gets all rows deleted and reinserted every 30 seconds. Your example provided an easy demonstration of what the issue could be. Thanks!