STATS_DATE() Doesn’t Indicate The Validity of Statistics

Posted by & filed under , , , .

NumberDialFrom time to time, you might have caught yourself in a conversation similar to the following:

Fred: Hey folks, we have a query that isn’t working right anymore?

Wilma: I wonder if there’s a problem with the statistics.  Maybe they are out of date.

Barney: I checked the statistics with STATS_DATE() and they are really old.  They were created over a year ago.

Phil: Last year?! Those things are out of date.  We should update statistics!

Wilma: Are you sure about that? Hasn’t this been a read-only database for a few years?

As a DBA, and a consultant, I have run into conversations like this many times over the years.  There is a myth out there that when statistics are old, they are automatically out of date.  I’ve seen scripts that treat old statistics as out-of-date, indexing presentations that mention the same, and even health check tools that do this as well.  In many cases, the person bringing it up touts it as a trick they learned that will fix the problem.

But that is not the case, not in the slightest.  When people refer to out of date statistics, they are referring to the state of the statistics, not the age.  In other words, when statistics are out of date it only means that the distribution of data right now, does not match the distribution of data when the statistics were last created.  The STATS_DATE() function ONLY indicates the date in which the statistics were created; which is not a measure of their quality.

Using STATS_DATE()

To demonstrate, let’s first look at the statistics for a couple indexes on the table Sales.SalesOrderHeader in AdventureWorks2012.  Using the query in Listing 1, we can determine the date in which all of the statistics for the indexes on Sales.SalesOrderHeader were created.  All of the indexes have a create date for the statistics of March 14, 2013, shown in Figure 1.


--Listing 1. Query using STATS_DATE()

SELECT name, STATS_DATE(object_id, index_id) AS STATS_DATE
FROM sys.indexes
WHERE object_id = OBJECT_ID('Sales.SalesOrderHeader');

image
Figure 1. Results for STATS_DATE()

Looking At Clustered Index Statistics

Now that we’ve seen the date for the statistics creation, let’s next look at the statistics for the clustered index.  Using the DBCC SHOW_STATISTICS statement in Listing 2, we can see a number of attributes about the index, shown in Figure 2.  For instance, there are 31,465 rows in the table.  The density of the key value is 3.178134E-05.  Also, the histogram has two steps with the second step being comprised of 31,463 rows and 31,463 distinct values.  Long story short, we know that for any value in the the range the likelihood is that there is one value that will be returned.


--Listing 2. DBCC SHOW_STATISTICS on the Clustered Index

DBCC SHOW_STATISTICS('Sales.SalesOrderHeader','PK_SalesOrderHeader_SalesOrderID')

image
Figure 2. Results for DBCC SHOW_STATISTICS on Clustered Index

Of course, these statistics are really old.  At this point they are almost a year old.  Add to that the fact that the histogram only has two steps.  Obviously, these must not be correct.

To test this theory, we’ll run UPDATE STATISTICS on the clustered index with the query provided in Listing 3.  Followed by the DBCC SHOW_STATISTICS command.  The results from updating the statistics doesn’t change too much, shown in Figure 3.   By comparing it to the previous results, we can see the row numbers are the same, density is unchanged, the ration of range rows to distinct rows in the histogram is the same.  The only difference is the addition on a third step in the histogram.  The only substantive difference between the new and the old statistics is the date they were created.


--Listing 3. UPDATE STATISTICS on the Clustered Index

UPDATE STATISTICS Sales.SalesOrderHeader PK_SalesOrderHeader_SalesOrderID
DBCC SHOW_STATISTICS('Sales.SalesOrderHeader','PK_SalesOrderHeader_SalesOrderID'

image
Figure 3. Updated Results for DBCC SHOW_STATISTICS on Clustered Index

Looking At Non-Clustered Index Statistics

But what about non-clustered indexes.  Surely, since they aren’t clustered their statistics must be more susceptible to become bad over time.  Of course, saying (or reading) this does sound a little ridiculous, but let’s continue on anyways.

First, we’ll take a look the statistics using DBCC SHOW_STATISTICS, as provided in Listing 4.  As with the clustered index, figure 4 shows that there are 31,465 rows in the index.  Along with that, there are 18 steps in the historgram detailing the distribution of data in the index.  As with the statistics for the clustered index, the statistics for the index were created March 14, 2012.


--Listing 4. DBCC SHOW_STATISTICS on a non-clustered index

DBCC SHOW_STATISTICS('Sales.SalesOrderHeader','IX_SalesOrderHeader_SalesPersonID'

image
Figure 4. Results for DBCC SHOW_STATISTICS on Non-Clustered Index

Now let’s update the statistics for the non-clustered index and find out if running them about a year later makes the statistics even better.  We’ll again use UPDATE STATISTICS, as provided in Listing 5, to refresh the statistics and then return the results with the DBCC SHOW_STATISTICS command.  Similar to the previous example, reviewing the results in Figure 5, the only difference between old and the new statistics is the create date for the statistics.  Updating old statistics, again, provided no value or difference to the statistics.


--Listing 5. UPDATE STATISTICS on the Non-Clustered Index

UPDATE STATISTICS Sales.SalesOrderHeader IX_SalesOrderHeader_SalesPersonID
DBCC SHOW_STATISTICS('Sales.SalesOrderHeader','IX_SalesOrderHeader_SalesPersonID')

image

Figure 5. Updated Results for DBCC SHOW_STATISTICS on Non-Clustered Index

Summary

If you are in the habit of updating your statistics because they are old, take some time to reconsider how you are identifying out-of-date statistics.  As the demonstrations in this post showed, just because statistics are old doesn’t mean they are bad, or out-of-date.  It just means they are old.  And being old, isn’t always all that bad.

  • Mark Freeman

    Is there a good way to tell if the statistics are no longer valid without running an UPDATE STATISTICS on them and noting that nothing significant changed? It would be nice to avoid the UPDATE of a statistics object with an old STATS_DATE if we can tell in advance that it won’t help.