Can You Dig It? – Find Estimated RowCounts

Posted by & filed under , , , , , .

Follow the rest of this series at the Can You Dig It? – Plan Cache series introduction post.

3295969599_eb16a58118_m1
She can dig it!
D Sharon Pruitt

It’s the eighth post in the plan cache series.  I took a few days off to start up a new blog series, I had considered holding that series until March but a couple upcoming presentations made it seem worthwhile.  But I digress…

Today we’re going to do a little more digging into the StmtSimple element.  As I mentioned in the that post, we could probably find some interesting things in the plan cache if we dig around with some of the attributes of StmtSimple.

StatementEstRows Attribute

Our attribute of the day from StmtSimple will be StatementEstRows.  This element returns the estimated number of rows that a compiled plan is expecting to return.

The estimation of rows return by a query is important.  A plan than is expecting many more or much fewer rows than it actual returns can lead to plans that probably won’t perform optimally.  The plan is compiled with certain expectations and not meeting these has about the same effect as not meeting the expectation of buying your significant other a birthday present.

StatementEstRows Query

Below is a query that I’ve used a few times for investigating StatementEstRows information from plan caches.


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
GO

WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
,cEstimatedRows
AS (
SELECT TOP 25
c.value('@StatementEstRows', 'float') AS StatementEstRows
,cp.usecounts
,c.value('@StatementSubTreeCost', 'float') AS StatementSubTreeCost
,c.value('@StatementType', 'varchar(255)') AS StatementType
,CAST('<!--?query  ' + CHAR(13) + c.value('@StatementText', 'varchar(max)') + CHAR(13) + ' ?-->' AS xml) AS StatementText
,cp.plan_handle
,qp.query_plan
,c.value('xs:hexBinary(substring(@QueryHash,3))','binary(8)') AS query_hash
,c.query('.') query
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') t(c)
WHERE qp.query_plan.exist('//StmtSimple') = 1
ORDER BY c.value('@StatementEstRows', 'float') DESC
),cQueryStats
AS (
SELECT query_hash
,SUM(total_worker_time / NULLIF(qs.execution_count,0)) AS avg_worker_time
,SUM(total_logical_reads / NULLIF(qs.execution_count,0)) AS avg_logical_reads
,SUM(total_elapsed_time / NULLIF(qs.execution_count,0)) AS avg_elapsed_time
FROM sys.dm_exec_query_stats qs
GROUP BY query_hash
)
SELECT er.StatementEstRows
, er.usecounts
, er.StatementSubTreeCost
, qs.avg_worker_time
, qs.avg_logical_reads
, qs.avg_elapsed_time
, er.StatementType
, er.StatementText
, er.query_plan
, er.plan_handle
FROM cEstimatedRows er
LEFT OUTER JOIN cQueryStats qs ON er.query_hash = qs.query_hash
GO

The query returns the following columns:

  • StatementEstRows: The number of rows that have been estimated to be returned by the plan.
  • Usecounts: Number of times that the compiled plan has been used.
  • StatementSubTreeCost: The total calculated cost for all of the operations in the statement.
  • Avg_worker_time: Average time spent by the CPU executing the query.
  • Avg_logical_reads: Average number of reads while executing the query.
  • Avg_elapsed_time: Average duration of each query execution.
  • StatementType: Type of T-SQL statement executed.
  • StatementText: T-SQL statement that was executed.  The results are in XML format to allow easy viewing.
  • Query_plan: SHOWPLAN XML for the compiled plan from the plan cache.
  • Plan_handle: Value the refers to the compiled plan in the plan cache.

The Query Results

You may be wondering, “Now that I’ve run the query on the plan cache what am I going to do with the results?”  That’s a great question.  Rather than talk in platitudes, I’ll go through a couple sets of results.  I won’t get specific on their environments but will give my general line of thinking with each.

Result Set from Server A

image

  • Orange: The item was a bit confusing.  There are an estimated 24M rows but the average elapsed time was 30 ms.  Not a performance killer since it doesn’t appear that it ran further than compiling a plan.  But if it ran there could be Big Trouble in Little China.
  • Yellow: Returning 14M rows and eating a lot of other resources.  This turned out to be a not so awesome report.  This will be in the performance tuning queue.
  • Green: This item appeared similar to the Orange item above, but in actuality was caused by bad statistics.  Some additional maintenance is probably required.
  • Blue: These are all ad-hoc statements related to an import process.  Some tuning of that process is likely in order as these are potential resource bottlenecks.

Result Set from Server B

image

  • Orange: Insert statement that SQL Server believes will insert 1.3M rows and has been used 9,485 times.  This to me is troubling since the databases on the server don’t have that many rows.
  • Yellow: Similar the to the Orange item, this is a delete statement, though.  And it’s deleting more than 750K rows almost as often as the insert above.

There are other inserts, updates, and deletes in this result set.  But overall these two items indicate that there are large batches of data modifications happening in a system that doesn’t have a lot of activity.  Turns out some of the maintenance processes that update a few rows at a time have been updating everything – because… what could it hurt?!?

StatementEstRows Wrap-up

The StatementEstRows attribute is not a smoking gun.  You can’t run these queries against your servers and immediately find things that you can implement.  You will though possibly uncover anomalies that are worth investigating.  These investigations can lead to changes in queries, indexing or how statistics are managed in your databases.

Also, the information included above from sys.dm_exec_query_stats should be considered to be generalized performance information.  This is because the DMV returns information based on individual query statements which may or may not be part of the average execution of a plan or statement.

Lastly, you can get some of this information from other areas when it comes to resource utilization.  On the other hand, this is one of the few ways that I know of to get some insight into what the statistics of the compiled plan.

Downloads