The OVER() Clause

Posted by & filed under , , , , .

Thought I’d share a cool trick I picked up a while back with the OVER() clause.  Typically you’d use this clause with the PARTITION BY or ORDER BY arguments with the ranking functions released with SQL Server 2005.

The OVER() clause though can also be used to do aggregations with a SELECT statement without a ranking function providing functionality that usually requires variables or a sub-query to accomplish.

For instance if I wanted determine the percentage that each wait stat had of the total wait time I would previously have written a SQL statement similar to the following:

DECLARE @LineTotal money

 

SELECT @LineTotal = SUM(LineTotal)

FROM Sales.SalesOrderDetail

 

SELECT sod.ProductID

    ,sod.LineTotal 

    ,CAST((sod.LineTotal * 100.)/@LineTotal AS decimal(8,2)) as PctLineTotal

FROM Sales.SalesOrderDetail sod

ORDER BY LineTotal DESC

Of course, some serious waits could occur in the fraction of a millisecond between the two SELECT statements so that wasn’t always the best way to write these kinds of calculations.  So sometimes, I’d also write it using a sub-query or a Common Table Expression (CTE).

;WITH SumSalesOrderDetail

AS (

    SELECT SUM(LineTotal) LineTotal FROM Sales.SalesOrderDetail

)

SELECT sod.ProductID

    ,sod.LineTotal 

    ,CAST((sod.LineTotal * 100.)/x.LineTotal AS decimal(8,2)) as PctLineTotal

FROM Sales.SalesOrderDetail sod

    CROSS JOIN SumSalesOrderDetail x

ORDER BY LineTotal DESC

As I’ve mentioned the OVER() clause can be used to do aggregations with a SELECT statement without providing a GROUP BY or a sub-query.  Leaving the arguments blank causes the OVER() to apply the aggregation across all rows of the query.

SELECT sod.ProductID

    ,sod.LineTotal 

    ,CAST((LineTotal * 100.)/SUM(sod.LineTotal) OVER () AS decimal(8,2)) as PctLineTotal

FROM Sales.SalesOrderDetail sod

ORDER BY LineTotal DESC

Cheering-business-people

Sweet!  Cool!  Awesome!  Look at the four suits cheer this mighty knowledge.  Well hold on, here’s the rub, this technique does not perform well at all.  And by not well, I really mean that the reads go through the roof.  I ran the above three statements and graphed the reads, CPU, and duration below for comparison.

It should be pretty obvious that there are a few more reads with this usage of OVER(), so it isn’t something to go running out and putting into production.

image

So why is this trick even worth mentioning?

First off, people are out there using this so other people need to be aware of the implications of using it.  I wouldn’t run out and use this in my production application code and if I see it there I’m not going to let it stay there.

On the other hand, though, I like to use this in management scripts and quick queries to get statistics from the server.  It is a bit prettier than some of the alternatives and is a little quicker to write.