Analyze This – Your Indexes Analysis – Part 1

Posted by & filed under , , , , , , .

Shrinks have no idea what’s wrong with you when you walk in the door.  In you come you try to describe what appears to be the problem and then they start asking about your mother.  But you don’t want to talk about your mother, most likely you’d like some relief to whatever the concern is that brought you in the door.

In much the same way, I often will go out to a client where there is some sort of performance “concern”.  They will describe the problem, but since I don’t have the history with the “concern” I need to ask about the database’s mother to get to the root cause.

But when performance is a problem, the last thing that anyone really wants to do is wait and talk about the how’s and the why’s of the problem.  Like the average person they want some level of a solution as soon as possible.

If I haven’t worked with the client’s database before, one of the items on my initial checklist of when start looking into the problem is determining if the database has indexes.  There are other things on the checklist; such as wait stats and performance counters.  But in many cases, resolving poor or the lack of indexing resolves a lot of performance issues with the feel of the mythical “Silver Bullet”.

Now that I’ve told my story, I’ll start with the interesting part.  This is the first in a series of posts in which I plan to assemble the query that I use to analyze indexes by going through each piece that I have added to the query and to explain why I’ve added it.  Feel free to ask questions and feedback is always appreciated.  I’m definitely open to the idea that some of my ideas are crazier than

In the first iteration of the query I wanted to see how the current indexes were being used along with what indexes could potentially be built.  To do this, I used the following DMVs:

There have been plenty of posts on using these DMVs throughout the all of recorded history.  I’m not going to rehash what’s already been done.  Typically, when these DMVs are used there are two queries created one for index usage and another for missing indexes.

When I started looking at missing indexes and index usage, my next question was always so what?  Is index A better than potential index B?  Sure, I can calculate the impact of Index C and it’s a really big number but how does it’s potential usage rank against indexes A and D.

I got to thinking that I wish I could sleep on this flight a combination of these results in a single output was what I needed.  I put it together and here you go, version 1 of my index analysis DMV query (I’ve attached the script in a SQL file at the end as well).


DECLARE @ObjectID int
SELECT @ObjectID = OBJECT_ID('Person.Address')

;WITH preIndexAnalysis
AS (
SELECT
OBJECT_SCHEMA_NAME(t.object_id) as schema_name
,t.name as table_name
,COALESCE(i.name, 'N/A') as index_name
,CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END + i.type_desc as type_desc
,NULL as impact
,ROW_NUMBER()
OVER (PARTITION BY i.object_id ORDER BY i.is_primary_key desc, ius.user_seeks + ius.user_scans + ius.user_lookups desc) as ranking
,ius.user_seeks + ius.user_scans + ius.user_lookups as user_total
,COALESCE(CAST(100 * (ius.user_seeks + ius.user_scans + ius.user_lookups)
/(NULLIF(SUM(ius.user_seeks + ius.user_scans + ius.user_lookups)
OVER(PARTITION BY i.object_id), 0) * 1.) as decimal(6,2)),0) as user_total_pct
,ius.user_seeks
,ius.user_scans
,ius.user_lookups
,STUFF((SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 0
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') as indexed_columns
,STUFF((SELECT ', ' + c.name
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 1
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') as included_columns
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius ON i.object_id = ius.object_id AND i.index_id = ius.index_id AND ius.database_id = db_id()
WHERE t.object_id = @ObjectID OR @ObjectID IS NULL
UNION ALL
SELECT
OBJECT_SCHEMA_NAME(mid.object_id) as schema_name
,OBJECT_NAME(mid.object_id) as table_name
,'--MISSING--'
,'--NONCLUSTERED--'
,(migs.user_seeks + migs.user_scans) * migs.avg_user_impact as impact
,0 as ranking
,migs.user_seeks + migs.user_scans as user_total
,NULL as user_total_pct
,migs.user_seeks
,migs.user_scans
,0 as user_lookups
,COALESCE(equality_columns + ', ', SPACE(0)) + COALESCE(inequality_columns, SPACE(0)) as indexed_columns
,included_columns
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE database_id = db_id()
AND mid.object_id = @ObjectID OR @ObjectID IS NULL
)
SELECT schema_name
,table_name
,index_name
,type_desc
,impact
,user_total
,user_total_pct
,CAST(100 * (user_seeks + user_scans + user_lookups)
/(NULLIF(SUM(user_seeks + user_scans + user_lookups)
OVER(PARTITION BY schema_name, table_name), 0) * 1.) as decimal(6,2)) as estimated_percent
,user_seeks
,user_scans
,user_lookups
,indexed_columns
,included_columns
FROM preIndexAnalysis
ORDER BY schema_name, table_name, ROW_NUMBER() OVER (PARTITION BY schema_name, table_name ORDER BY user_total desc, ranking)

NOTE: Populating the table name in the function for the @ObjectID variable is optional.  If the value of @ObjectID is NULL then information on all tables is returned.

This version of the DMV query returns the following columns:

  • schema_name: Schema name
  • table_name: Table name
  • index_name: Index Name
  • type_desc: Type of index; either clustered or non-clustered.
  • impact: Calculation of impact of a potential index.  This is based on the seeks and scans that the index could have utilized multiplied by average improvement the index would have provided.  This is included only for missing indexes.
  • user_total: Total number of seek, scan, and lookup operations for the index
  • user_total_pct: Percentage of total number of seek, scan, and lookup operations for this index compared to all seek, scan, and lookup operations for existing indexes for the table.
  • estimated_percent: Percentage of total number of seek, scan, and lookup operations for this index compared to all seek, scan, and lookup operations for existing and potential indexes for the table.  This number is naturally skewed because a seek for potential Index A resulted in another operation on an existing index and both of these operations would be counted.
  • user_seeks: Number of seek operations on the index.
  • user_scans: Number of scan operations on the index.
  • user_lookups: Number of lookup operations on the index.
  • indexed_columns: Columns that are part of the index.
  • included_columns: Columns that are included in the index.

To get some results for this query I ran the following SQL statements against the AdventureWorks database.


USE AdventureWorks
GO
SELECT * FROM Person.Address
GO

SELECT * FROM Person.Address
WHERE AddressLine1 = '1'
GO

SELECT * FROM Person.Address
WHERE AddressLine2 = '1'
GO

SELECT StateProvinceID
FROM Person.Address
WHERE ModifiedDate > GETDATE()

After running the queries above a number of times, I get the output shown below from the index analysis query.  The column width have been shrunk to fit the image, use the column definitions above for the column titles.

If the activity was representative of typical table index activity there are some recommendations that could be made.  Some of them could be:

  • A new index could be made on the AddressLine2 column with a number of included columns such as AddressID and AddressLine1.
  • A new index could be made on ModifiedDate that included StateProvinceID.
  • The index IX_Address_StateProvinceID could be dropped.
  • The index AK_Address_rowguid while not used is a unique index and probably should not be dropped.

There are a couple important things to keep in mind when using this type of index analysis.

First off, missing index and index statistics are calculated from the point when the SQL Server service was last started or when the last index was added to the table.  Make certain that you know the time frame of your sample before implementing any index changes.

Secondly, the statistics for missing indexes overlaps with index usage stats.  A query that could have used a missing index still needs to use an index when it executes and that usage will have been included in the index usage stats.

Hopefully, I’ve covered all of the bases here and I’ll be adding in more of the pieces I have in my full index analysis query in future posts.

EDIT: Fixed some display issues for SQLServerPedia