The OVER() Clause

Thought I’d share a cool trick I picked up a while back with the OVER() clause. Typically you’d use this clause with the PARTITION BY or ORDER BY arguments with the ranking functions released with SQL Server 2005.

The OVER() clause though can also be used to do aggregations with a SELECT statement without a ranking function providing functionality that usually requires variables or a sub-query to accomplish.

For instance if I wanted determine the percentage that each wait stat had of the total wait time I would previously have written a SQL statement similar to the following:

DECLARE @LineTotal money

SELECT @LineTotal = SUM(LineTotal)
FROM Sales.SalesOrderDetail

SELECT sod.ProductID
,sod.LineTotal
,CAST((sod.LineTotal * 100.)/@LineTotal AS decimal(8,2)) as PctLineTotal
FROM Sales.SalesOrderDetail sod
ORDER BY LineTotal DESC

Of course, some serious waits could occur in the fraction of a millisecond between the two SELECT statements so that wasn’t always the best way to write these kinds of calculations. So sometimes, I’d also write it using a sub-query or a Common Table Expression (CTE).

;WITH SumSalesOrderDetail
AS (
SELECT SUM(LineTotal) LineTotal FROM Sales.SalesOrderDetail
)
SELECT sod.ProductID
,sod.LineTotal
,CAST((sod.LineTotal * 100.)/x.LineTotal AS decimal(8,2)) as PctLineTotal
FROM Sales.SalesOrderDetail sod
CROSS JOIN SumSalesOrderDetail x
ORDER BY LineTotal DESC

As I’ve mentioned the OVER() clause can be used to do aggregations with a SELECT statement without providing a GROUP BY or a sub-query. Leaving the arguments blank causes the OVER() to apply the aggregation across all rows of the query.

SELECT sod.ProductID
,sod.LineTotal
,CAST((LineTotal * 100.)/SUM(sod.LineTotal) OVER () AS decimal(8,2)) as PctLineTotal
FROM Sales.SalesOrderDetail sod
ORDER BY LineTotal DESC

Sweet! Cool! Awesome! This is great! But, hold on, there’s a rub, this technique does not perform well at all. And by not well, I really mean that the reads go through the roof. I ran the above three statements and graphed the reads, CPU, and duration below for comparison.

It should be pretty obvious that there are a few more reads with this usage of OVER(), so is something that needs to be used with caution. Make certain that the ease of the coding outweighs it’s impact on performance.

So is this trick even worth mentioning?

First off, people are out there using this so other people need to be aware of the implications of using it. I wouldn’t run out and start putting this into all of my code without having a good idea of it’s impact over time.

On the other hand, though, I like to use this in management scripts and quick queries to get statistics from the server. It is a bit prettier than some of the alternatives and is a little quicker to write.

Share this:

Related