Can You Dig It? – Find Estimated Cost

Follow the rest of this series at the Can You Dig It? – Plan Cache series introduction post.

It’s the eighth post in the plan cache series.  I hadn’t expected to take as many days off from it as I did, but the 31 Days of SSIS series had a stricter timetable that I was following.  Now that I’ve wrapped up that series, I want to get back to talking about things you can pull out of, and discover in, the plan cache.

Today we’re going to do a little more digging into the StmtSimple element.  As I mentioned in a previous post, there are some interesting attributes in this element that could provide some insight.  This time, let’s look at StatementSubTreeCost, which is similar in many respects to StatementEstRows.

StatementSubTreeCost Attribute

As I mentioned, we will be looking at the attribute StatementSubTreeCost.  This attribute returns the estimated cost that was calculated when the plan was compiled.  The cost is based on the statistics and distribution of the data that the query will encounter when executing.  The higher the cost, the more expensive the query will be against the resources of your environment.

There aren’t any hard guidelines that I know of that indicate specifically when the cost of a query is too high or what the average cost of your queries should be.  Costs are based on the statistics for your data and derived from the manner in which an execution plan is put together.

With the consulting work that I do, I often don’t blink when the cost estimate is below one for queries that are executed often.  If the query runs often, though, and the cost is in the thousands or millions, I’ll wonder whether the query is really doing what it should or whether it’s making the request properly.
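
If you want to see the attribute for a single statement before digging through the whole cache, you can request the estimated plan directly.  Below is a minimal sketch; the AdventureWorks objects are there purely for illustration, so substitute a query of your own:

-- Request the estimated plan without executing the query; the returned
-- showplan XML contains a StmtSimple element whose StatementSubTreeCost
-- attribute is the same value the plan cache queries below pull out.
SET SHOWPLAN_XML ON ;
GO

SELECT p.Name
    ,SUM(sod.LineTotal) AS TotalSales
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p ON p.ProductID = sod.ProductID
GROUP BY p.Name ;
GO

SET SHOWPLAN_XML OFF ;
GO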

StatementSubTreeCost Query

Below is a query that I’ve used a few times for investigating StatementSubTreeCost information from plan caches.  You’ll notice that its structure is a bit different from the one provided for querying for StatementEstRows.  The reason for this difference is that I found the earlier version ran long on one SQL Server.  The version below pulls by StatementSubTreeCost and does so more reasonably:

IF OBJECT_ID('tempdb..#StatementSubTreeCost') IS NOT NULL
    DROP TABLE #StatementSubTreeCost ;

CREATE TABLE #StatementSubTreeCost
    (
    StatementSubTreeCost FLOAT
    ,StatementId INT
    ,UseCounts BIGINT
    ,plan_handle VARBINARY(64)
    ) ;

-- Stage the 25 most expensive statements in a temp table first so the
-- XQuery work against the plan cache is kept as short as possible.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
INSERT INTO #StatementSubTreeCost (StatementSubTreeCost, StatementId, UseCounts, plan_handle)
SELECT TOP 25
    c.value('@StatementSubTreeCost', 'float') AS StatementSubTreeCost
    ,c.value('@StatementId', 'int') AS StatementId
    ,cp.UseCounts
    ,cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') t(c)
WHERE cp.cacheobjtype = 'Compiled Plan'
AND qp.query_plan.exist('//StmtSimple') = 1
AND c.value('@StatementSubTreeCost', 'float') IS NOT NULL
ORDER BY c.value('@StatementSubTreeCost', 'float') DESC ;

-- Join the staged statements back to their plans for additional detail,
-- along with averaged performance numbers from sys.dm_exec_query_stats.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
,cQueryStats
AS (
    SELECT query_hash
        ,SUM(total_worker_time / NULLIF(qs.execution_count, 0)) AS avg_worker_time
        ,SUM(total_logical_reads / NULLIF(qs.execution_count, 0)) AS avg_logical_reads
        ,SUM(total_elapsed_time / NULLIF(qs.execution_count, 0)) AS avg_elapsed_time
    FROM sys.dm_exec_query_stats qs
    GROUP BY query_hash
    )
SELECT s.StatementSubTreeCost
    ,s.UseCounts
    ,CAST(c.value('@StatementEstRows', 'float') AS bigint) AS StatementEstRows
    ,qs.avg_worker_time
    ,qs.avg_logical_reads
    ,qs.avg_elapsed_time
    ,c.value('@StatementType', 'varchar(255)') AS StatementType
    ,c.value('@StatementText', 'varchar(max)') AS StatementText
    ,s.plan_handle
    ,qp.query_plan
FROM #StatementSubTreeCost s
CROSS APPLY sys.dm_exec_query_plan(s.plan_handle) AS qp
CROSS APPLY qp.query_plan.nodes('//StmtSimple') t(c)
LEFT OUTER JOIN cQueryStats qs ON c.value('xs:hexBinary(substring(@QueryHash,3))', 'binary(8)') = qs.query_hash
WHERE c.value('@StatementId', 'int') = s.StatementId
ORDER BY c.value('@StatementSubTreeCost', 'float') DESC, s.StatementSubTreeCost DESC ;

The query returns the following columns:

  • StatementSubTreeCost: The total calculated cost for all of the operations in the statement.
  • Usecounts: Number of times that the compiled plan has been used.
  • StatementEstRows: The number of rows that have been estimated to be returned by the plan.
  • Avg_worker_time: Average CPU time, in microseconds, spent executing the query.
  • Avg_logical_reads: Average number of logical reads performed while executing the query.
  • Avg_elapsed_time: Average duration, in microseconds, of each execution of the query.
  • StatementType: Type of T-SQL statement executed.
  • StatementText: T-SQL statement that was executed.
  • Plan_handle: Value that refers to the compiled plan in the plan cache.
  • Query_plan: SHOWPLAN XML for the compiled plan from the plan cache.
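
One variation worth mentioning, and it isn’t part of the query above, is to weight the estimated cost by the number of executions so that frequently run, moderately costly statements surface alongside the one-off monsters.  A quick sketch against the temp table the script already populates:

-- Hypothetical variation: weight the estimated cost by the use counts
-- captured in the staging table so frequency factors into the ranking.
SELECT s.StatementSubTreeCost
    ,s.UseCounts
    ,s.StatementSubTreeCost * s.UseCounts AS WeightedCost
    ,s.plan_handle
FROM #StatementSubTreeCost s
ORDER BY WeightedCost DESC ;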

The Query Results

Similar to the previous post on this element, let’s take a look at the results from a couple of SQL Servers.  There are a few things that pop out to me on these servers and lead me to want to investigate further.  It’ll be obvious below, but I’m going to talk about these stats and the related systems without providing information on those investigations.  I left that out for brevity, but what I do provide should give you direction on what to look for in your own results.

Result Set from Server A

[Screenshot: top results from Server A, with rows highlighted in blue, green, pink, and yellow]

  • Blue: This is a new report that was pushed out recently.  Obviously, it should have gotten a bit more review.  With a cost of over 312 million, there is probably something wrong with the underlying statistics or the selectivity of the query.
  • Green: These two queries popped up on my radar and are staying there until they can be tuned a bit.  Both use XQuery, and not very efficiently.  They haven’t been called too often, but are planned for more frequent executions.
  • Pink: These queries are used to provide the feedback counts for a search screen.  They’re expensive, and each is sent through as an ad-hoc query without much reuse.  That’s a high cost for only returning a single row.
  • Yellow: Another set of queries that haven’t been executed often, but lo and behold, I overheard a developer talking about writing a support application that would run against these tables.  These are large logging tables, and when you couple the cost with the row counts, they may be candidates for future issues.  Honestly, who wants support applications to be causing support issues?

There isn’t anything excessively critical in this list, though I wouldn’t be surprised to see a noticeable performance hit on the data returned by the blue item if it ran during peak hours.  But with today being Friday, digging into a couple of these is a good way to wrap up the week.

Result Set from Server B

Now let’s take a look at another SQL Server to see how the results can differ and possibly provide us with a different path to follow.

[Screenshot: top results from Server B, with rows highlighted in blue, yellow, and red]

  • Blue: The blue queries are all expensive update statements.  For a system that is primarily a read-only platform, these items stick out even more.  Investigating them showed that they are part of a maintenance process that validates the state of some data.  Based on the cost, they may be a little too broad.  Side effects from these updates could be related to the deadlocks seen from time to time; actually, those side effects do occur.
  • Yellow: A couple of ad-hoc queries.  I note them because they are SELECT * queries without WHERE clauses that are part of an Access database.  I did some digging around and found out about that little crime.  Someone is definitely going to be having a discussion about better ways to access data with Access.
  • Red: Over and over again there are update and insert statements that are executed about the same number of times.  As I mentioned in the section above, there is a validation and maintenance process on the data.  Based on frequency and cost, these appear to be a little heavy-handed.  If they were being executed less often the cost wouldn’t be an issue, but at this rate I’d like to see them tuned a bit.

Based on the items I pointed out, the maintenance processes need a little review.  They are heavy-handed and likely (in fact, actually) causing side effects that aren’t needed.  The upside to fixing them is that the results above will likely shift dramatically with a little bit of work.

StatementSubTreeCost Wrap-up

Looking into the plan cache for StatementSubTreeCost is a little better than looking for StatementEstRows.  Where the latter could be a justifiable row count, a cost that reaches into the millions is often one you can simply call bad.  When you look at lower costs, you will need to weigh the cost against the frequency of execution to determine whether you should investigate.  The key is to identify worthwhile places to look so that you can effect improvements in your environments, and looking at the plan cache in this manner does just that.

Also, the information included above from sys.dm_exec_query_stats should be considered to be generalized performance information.  This is because the DMV returns information based on individual query statements which may or may not be part of the average execution of a plan or statement.
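
If one of the plans stands out and you want more than the hash-level averages, you can pull the statement-level numbers for just that plan.  A minimal sketch, assuming you’ve copied a plan_handle from the results above (the value below is only a placeholder):

DECLARE @plan_handle VARBINARY(64) ;
SET @plan_handle = 0x06000600 ; -- placeholder; paste a plan_handle from your results

SELECT qs.statement_start_offset
    ,qs.statement_end_offset
    ,qs.execution_count
    ,qs.total_worker_time
    ,qs.total_logical_reads
    ,qs.total_elapsed_time
FROM sys.dm_exec_query_stats qs
WHERE qs.plan_handle = @plan_handle
ORDER BY qs.total_worker_time DESC ;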

One final bit of caution: when querying the plan cache, be careful not to start your queries and walk off.  The plan cache is pretty active in most environments, and writing XQuery against it can sometimes lead to lengthy execution times.  Don’t make the mistake of firing off your query and causing your own side-effect issues.  Not that I’m saying I did that once, or that doing so caused me to do some rewriting of the query above.

