Analyze This – Analyze Your Indexes – Part 2

Posted by & filed under , , , , , .

This is the second post in which I am walking through a query that I use to analyze indexes.  For some reason the grammar on the title of the first post was a bit jacked so I’ve corrected it for this post and the all following ones.

Here’s a summary of what I’ve added to this query thus far:

  1. Combine existing index statistics and missing indexes into a single output

The next thing the index analysis needed was a method to determine whether or not an existing index was either a duplicate of or overlapped another index on the table.  For an explanation of the logic to find duplicate and overlapping index please read Paul Nielsen’s blog, which happens to be the source of the logic that the query uses.

The query I added to the index analysis is basically the same except the SQL Server 2005/2008 catalog view (sys.indexed_columns) is are being used instead of the SQL Server 2000 and prior system view (sys.sysindexkeys).

Version 2 of the query is as follows:


DECLARE @ObjectID int
SELECT @ObjectID = OBJECT_ID('Production.BillOfMaterials')

;WITH preIndexAnalysis
AS (
SELECT
OBJECT_SCHEMA_NAME(t.object_id) as schema_name
,t.name as table_name
,COALESCE(i.name, 'N/A') as index_name
,CASE WHEN i.is_unique = 1 THEN 'UNIQUE ' ELSE '' END + i.type_desc as type_desc
,NULL as impact
,ROW_NUMBER()
OVER (PARTITION BY i.object_id ORDER BY i.is_primary_key desc, ius.user_seeks + ius.user_scans + ius.user_lookups desc) as ranking
,ius.user_seeks + ius.user_scans + ius.user_lookups as user_total
,COALESCE(CAST(100 * (ius.user_seeks + ius.user_scans + ius.user_lookups)
/(NULLIF(SUM(ius.user_seeks + ius.user_scans + ius.user_lookups)
OVER(PARTITION BY i.object_id), 0) * 1.) as decimal(6,2)),0) as user_total_pct
,ius.user_seeks
,ius.user_scans
,ius.user_lookups
,STUFF((SELECT ', ' + QUOTENAME(c.name)
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 0
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS indexed_columns
,STUFF((SELECT ', ' + QUOTENAME(c.name)
FROM sys.index_columns ic
INNER JOIN sys.columns c ON ic.object_id = c.object_id AND ic.column_id = c.column_id
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 1
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS included_columns
,i.object_id
,i.index_id
,STUFF((SELECT ', ' + QUOTENAME(ic.column_id,'(')
FROM sys.index_columns ic
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 0
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, '') AS indexed_columns_compare
,COALESCE(STUFF((SELECT ', ' + QUOTENAME(ic.column_id, '(')
FROM sys.index_columns ic
WHERE i.object_id = ic.object_id
AND i.index_id = ic.index_id
AND is_included_column = 1
ORDER BY index_column_id ASC
FOR XML PATH('')), 1, 2, ''), SPACE(0)) AS included_columns_compare
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius ON i.object_id = ius.object_id AND i.index_id = ius.index_id AND ius.database_id = db_id()
WHERE t.object_id = @ObjectID OR @ObjectID IS NULL
UNION ALL
SELECT
OBJECT_SCHEMA_NAME(mid.object_id) as schema_name
,OBJECT_NAME(mid.object_id) as table_name
,'--MISSING--'
,'--NONCLUSTERED--'
,(migs.user_seeks + migs.user_scans) * migs.avg_user_impact as impact
,0 as ranking
,migs.user_seeks + migs.user_scans as user_total
,NULL as user_total_pct
,migs.user_seeks
,migs.user_scans
,0 as user_lookups
,COALESCE(equality_columns + ', ', SPACE(0)) + COALESCE(inequality_columns, SPACE(0)) as indexed_columns
,included_columns
,mid.object_id
,NULL
,NULL
,NULL
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
WHERE database_id = db_id()
AND mid.object_id = @ObjectID OR @ObjectID IS NULL
)
SELECT schema_name
,table_name
,index_name
,type_desc
,impact
,user_total
,user_total_pct
,CAST(100 * (user_seeks + user_scans + user_lookups)
/(NULLIF(SUM(user_seeks + user_scans + user_lookups)
OVER(PARTITION BY schema_name, table_name), 0) * 1.) as decimal(6,2)) as estimated_percent
,user_seeks
,user_scans
,user_lookups
,indexed_columns
,included_columns
,STUFF((SELECT ', ' + index_name AS [data()]
FROM preIndexAnalysis iia
WHERE ia.object_id = iia.object_id
AND ia.index_id <> iia.index_id
AND ia.indexed_columns_compare = iia.indexed_columns_compare
AND ia.included_columns_compare = iia.included_columns_compare
FOR XML PATH('')), 1, 2, '') AS duplicate_indexes
,STUFF((SELECT ', ' + index_name AS [data()]
FROM preIndexAnalysis iia
WHERE ia.object_id = iia.object_id
AND ia.index_id <> iia.index_id
AND (ia.indexed_columns_compare LIKE iia.indexed_columns_compare + '%'
OR iia.indexed_columns_compare LIKE ia.indexed_columns_compare + '%')
AND ia.indexed_columns_compare <> iia.indexed_columns_compare
FOR XML PATH('')), 1, 2, '') AS overlapping_indexes
FROM preIndexAnalysis ia
ORDER BY schema_name, table_name, ROW_NUMBER() OVER (PARTITION BY schema_name, table_name ORDER BY user_total desc, ranking), index_name

NOTE: Populating the table name in the function for the @ObjectID variable is optional.  If the value of @ObjectID is NULL then information on all tables is returned.

This version of the DMV query returns the following columns:

  • schema_name: Schema name
  • table_name: Table name
  • index_name: Index Name
  • type_desc: Type of index; either clustered or non-clustered.
  • impact: Calculation of impact of a potential index.  This is based on the seeks and scans that the index could have utilized multiplied by average improvement the index would have provided.  This is included only for missing indexes.
  • user_total: Total number of seek, scan, and lookup operations for the index
  • user_total_pct: Percentage of total number of seek, scan, and lookup operations for this index compared to all seek, scan, and lookup operations for existing indexes for the table.
  • estimated_percent: Percentage of total number of seek, scan, and lookup operations for this index compared to all seek, scan, and lookup operations for existing and potential indexes for the table.  This number is naturally skewed because a seek for potential Index A resulted in another operation on an existing index and both of these operations would be counted.
  • user_seeks: Number of seek operations on the index.
  • user_scans: Number of scan operations on the index.
  • user_lookups: Number of lookup operations on the index.
  • indexed_columns: Columns that are part of the index.
  • included_columns: Columns that are included in the index.
  • duplicate_indexes: Indexes that exist on the table that are identical to the index on this row.
  • overlapping_indexes: Indexes that exist on the table that overlap the index on this row.

To review the results for this new version of the query I ran the following SQL statements against the AdventureWorks database:


CREATE INDEX IX_BillOfMaterials_UnitMeasureCode2 ON Production.BillOfMaterials(UnitMeasureCode)
GO
CREATE INDEX AK_BillOfMaterials_ProductAssemblyID ON Production.BillOfMaterials(ProductAssemblyID)
GO

This code creates a couple indexes that duplicate and overlap existing indexes.  The image below shows the first three and last two columns from the table.  For the sake of space, I removed the other columns.

There are two things that the query output shows:

  1. The index IX_BillOfMaterials_UnitMeasureCode is an exact duplicate of the index IX_BillOfMaterials_UnitMeasureCode2 and vice versa.  When at a client, I’d recommend dropping one of these two indexes.
  2. The index AK_BillOfMaterials_ProductAssemblyID overlaps with the index AK_BillOfMaterials_ProductAssemblyID_ComponentID_StartDate.  As is mention in Paul Nielsen’s article, overlapping indexes shouldn’t necessarily be removed.  A narrow index that overlaps a wide index may provide a performance gain.  Reviewing the usage stats, which are available in the other columns of this query, will help make that determination.

Feel free to leave any questions, comments, or requests for this query.  I’ve got about a half dozen other things I’ll be adding to it in the next few posts, but I am always open to new directions.

EDIT: Fixed some formatting issues.