Back to articles list Articles Cookbook
11 minutes read

The SQL OVER() Clause - When and Why to Use It

The OVER() clause, or how to reach a whole new level of data analysis.

Why all the commotion around the SQL OVER() clause? I’ll tell you why: the OVER() clause introduces window functions.

And what are window functions in SQL, you might ask? They are functions that operate on a window, i.e. a set of rows related to the current row. In that way, they are similar to the GROUP BY clause and to aggregate functions.

So why does knowing OVER() unlock a new level of data analysis? If you think about aggregate functions and GROUP BY, you’ll realize that they do aggregate data. But what you get is only the aggregated values. In other words, you lose all the individual rows.

With window functions, you can show individual row values and aggregate values at the same time. If you’re not familiar with window functions in SQL, this is a perfect moment to let you know about our Window Functions course with 218 interactive exercises. You have an internet browser, so all you need to enjoy the course is some basic SQL knowledge, like how to use JOINs, WHERE, GROUP BY, and HAVING.

For those of you who already use window functions, our Window Functions Practice Set will help maintain your coding form.  Get your free Window Functions Cheat Sheet and you’re ready to go!

Let’s answer the question of when and why you should use the OVER() clause in four practical examples. But first, let’s examine the data.

Data for Showcasing the OVER() Clause

There’s a chain of bookstores selling, quite predictably, books. They record their sales daily using two rather simple tables. The first one is author, which has three columns.

  • id – The author’s ID and the table’s primary key (PK).
  • first_name – The author’s first name.
  • last_name – The author’s last name.

The other table is book_sales:

  • id - The sales ID and the table’s primary key (PK).
  • date – The sales date.
  • book_title – The book’s title.
  • sales – The number of book copies sold.
  • author_id – The author’s ID and the table’s foreign key (FK), referencing the table author.

To help you visualize what this data looks like, here’s what you get when joining these two tables:

idfirst_namelast_nameid-2datebook_titlesalesauthor_id
1HanyaYanagihara12022-02-01A Little Life141
2MayaAngelou22022-02-01I Know Why the Caged Bird Sings92
3MilanKundera32022-02-01The Unbearable Lightness of Being13

The above data tells you that:

  • 14 copies of Hanya Yanagihara’s “A Little Life” were sold on 1 February 2022.
  • On the same date, 9 copies of “I Know Why the Caged Bird Sings” by Maya Angelou were sold.
  • Milan Kundera’s “The Unbearable Lightness of Being” was sold only once on that day.

Data is presented the same way for all other titles.

Example 1: OVER() Without Additional Clauses

The easiest way of working with SQL window functions is using OVER() without its optional clauses. I’ll demonstrate how this works by showing every book’s sales in relation to the total sales on that date.

Here’s how the code should be written:

SELECT date,
       first_name,
       last_name,
       book_title,
       sales,
       SUM(sales) OVER() AS total_daily_sales,
       sales::DECIMAL/SUM(sales) OVER()*100 AS percent_of_daily_sales
FROM author
JOIN book_sales ON author.id =book_sales.author_id
WHERE date = '2022-02-01'
GROUP BY date,
   first_name,
         last_name,
         book_title,
         sales;

After listing the date, author’s name, and book title, I selected the column sales. It represents the daily sales of a book title.

The next step is to show the total sales on the desired date. To do this, an ordinary aggregate function has to become a window aggregate function. This transformation is achieved  by writing the OVER() clause and leaving the parentheses empty.

To get the sales for 1 February 2022, I need to filter data using the WHERE clause. If I didn’t apply this filter, using the OVER() clause would get me the sum of overall sales.

Now I need to divide the sales by the book with the total daily sales to get the sales percentage for each book. Since SQL doesn’t allow referencing the alias column of the calculation, I needed to write the above calculation again (or copy it, to be more honest). I also converted the data type into DECIMAL and multiplied the result by 100 to get a percentage. Data is fetched from both tables using the JOIN clause.

In the end, data is grouped by the selected columns. Running the query will give you this result:

datefirst_namelast_namebook_titlesalestotal_daily_salespercent_of_daily_sales
2022-02-01HanyaYanagiharaA Little Life148017.5
2022-02-01HanyaYanagiharaTo Paradise428052.5
2022-02-01MayaAngelouGather Together in My Name7808.75
2022-02-01MayaAngelouI Know Why the Caged Bird Sings98011.25
2022-02-01MilanKunderaThe Book of Laughter and Forgetting7808.75
2022-02-01MilanKunderaThe Unbearable Lightness of Being1801.25

You can interpret the result in the following way. The total daily sales for 1 February 2022 is 80. Of those 80, 14 copies sold refer to “A Little Life”, which is 17.5% of the total daily sales. You can go through the result and see the percentage for other titles.  

Example 2: OVER(ORDER BY)

The second example will show you how to rank sales using the DENSE_RANK() function (which does  not omit ranking numbers) with  ORDER BY in the OVER() clause.

The ORDER BY clause tells the window function in which order to perform its calculations. It includes both the column name and the order of the calculation (descending (10-1) or ascending (1-10)).

SELECT date,
	 first_name,
	 last_name,
	 book_title,
	 sales,
	 DENSE_RANK() OVER(ORDER BY sales DESC)
FROM author 
JOIN book_sales 
	ON author.id = book_sales.author_id;

The info I want to see is the sales date, author’s name, book title, and the number of copies sold on that day.

Remember, I want to rank these sales using the DENSE_RANK() function. As in the previous example, I introduced the window function using the OVER() clause. The parentheses aren’t empty this time: there’s ORDER BY sales DESC. What does that mean? That I want to rank data according to sales, from the highest to the lowest.

Have a look at the partial output:

datefirst_namelast_namebook_titlesalesdense_rank
2022-02-02HanyaYanagiharaTo Paradise671
2022-02-01HanyaYanagiharaTo Paradise422
2022-02-03HanyaYanagiharaTo Paradise343
2022-02-03MayaAngelouI Know Why the Caged Bird Sings214
2022-02-02HanyaYanagiharaA Little Life214
2022-02-03HanyaYanagiharaA Little Life195

The highest daily sales for any particular book was on 2 February 2022. It was Hanya Yanagihara’s latest title, “To Paradise”, which sold 67 copies. The same title holds the second and the third place. Two other titles sold 21 copies per day; these are “I Know Why the Caged Bird Sings” and “A Little Life”. They share fourth place, since the number of copies sold is the same. For information on why tied ranks are important and how different ranking window functions treat them, see this overview of SQL ranking functions.

Example 3: OVER(PARTITION BY)

The second important clause in OVER() is PARTITION BY. This specifies the window of data over which the calculation will be executed. What that means will be more apparent after I show you how to find the daily sales and the maximum daily sales.

Here’s the code:

SELECT date,
       book_title,
	 sales,
	 MAX(sales) OVER(PARTITION BY date) AS highest_daily_sales
FROM book_sales;

I want to output the date, book title, and the number of copies sold. There also needs to be info about the highest number of copies sold that day.

To achieve that, I use the MAX() function. Again, the OVER() clause is here to make it a window function. Inside the OVER(),  there’s PARTITION BY date. This instructs the window function not to show the highest overall daily sales, but only the highest sales for that particular date.

The results might help you understand what that means:

datebook_titlesaleshighest_daily_sales
2022-02-01A Little Life1442
2022-02-01I Know Why the Caged Bird Sings942
2022-02-01The Unbearable Lightness of Being142
2022-02-01To Paradise4242
2022-02-01Gather Together in My Name742
2022-02-01The Book of Laughter and Forgetting742
2022-02-02A Little Life2167
2022-02-02I Know Why the Caged Bird Sings1267
2022-02-02The Unbearable Lightness of Being1867
2022-02-02To Paradise6767
2022-02-02Gather Together in My Name867
2022-02-02The Book of Laughter and Forgetting1267

This is, again, only part of the output. You see that the output lists all sales on 1 February 2022. It also shows the highest sales for that day was 42 copies. If you go a few rows lower, you’ll discover 42 copies is how many “To Paradise” sold, which makes it the best-selling book of that day.

On 2 February 2022, the highest sales were 67, which is again “To Paradise”.

Now it’s obvious what the PARTITION BY clause does. It divides data into windows (in this case, according to date), performs the calculation for the first window, restarts the calculation at the second window (another date), and so on.

Example 4: OVER(PARTITION BY ORDER BY)

Now, I’m going all in and showing you how the window functions work when you use both the PARTITION BY and the ORDER BY clauses in OVER().

The example asks you to calculate the cumulative sum of each title’s sales. As you’d imagine, I’ll use the SUM() window function:

SELECT date,
       book_title,
	 SUM(sales) OVER (PARTITION BY book_title ORDER BY date) AS cum_sum
FROM book_sales;

I need only three rows: date, book title, and the cumulative sum of sales. The first two are easy, but how do I get this cumulative amount?

I use the SUM() function on the sales column. Then I introduce the window function via the OVER() clause. I want to calculate the cumulative sum for every book separately, so I partition data by book title.

The cumulative sum with the data I have would mean going through sales chronologically and adding the current day’s sales to the previous sales. The ORDER BY instructs the window function to sum data exactly that way: from the earliest date to the latest. When not specified, the order is ascending by default. Again, remember that the ORDER BY in the OVER() clause does not order the output. It orders data within the window frame, i.e., influences the order of the calculation.

And the full-scale output is:

datebook_titlecumulative_sum
2022-02-01A Little Life14
2022-02-02A Little Life35
2022-02-03A Little Life54
2022-02-01Gather Together in My Name7
2022-02-02Gather Together in My Name15
2022-02-03Gather Together in My Name17
2022-02-01I Know Why the Caged Bird Sings9
2022-02-02I Know Why the Caged Bird Sings21
2022-02-03I Know Why the Caged Bird Sings42
2022-02-01The Book of Laughter and Forgetting7
2022-02-02The Book of Laughter and Forgetting19
2022-02-03The Book of Laughter and Forgetting23
2022-02-01The Unbearable Lightness of Being1
2022-02-02The Unbearable Lightness of Being19
2022-02-03The Unbearable Lightness of Being23
2022-02-01To Paradise42
2022-02-02To Paradise109
2022-02-03To Paradise143

The output tells you that “A Little Life” sold 14 copies on 1 February 2022. The cumulative sum on 2 February is 35, while the cumulative sum on 3 February is 54, which is also the total number of copies sold. You can interpret the rest of the data in the same way.

I believe it’s clear how the cumulative sum (or running total) works. If it’s not, read this article on  computing running totals in SQL.

When Is OVER() Useful in Business?

The practical examples I showed you explain the uses of the OVER() clause from a technical standpoint. Let’s see how this translates to practical business use!

Creating Rankings

SQL window functions such as ROW_NUMBER(), RANK(), and DENSE_RANK() are helpful when creating various rankings or in numbering rows. Some business examples include creating Top 5, Top 10, or Top n reports of the highest or the lowest selling products, highest or lowest-paid employees, highest or lowest streamed songs or movies, or the website/app users with the highest and lowest activity.

Calculating Cumulative Sums

From the last example, you already learned how to calculate running totals (cumulative sums) using the SUM() window function. Some examples of business use are calculating cumulative sales, costs, streams, website/app activities, etc.

Calculating Moving Sums and Moving Averages

By applying the ROWS BETWEEN clause with the SUM() and AVG() window functions, you can calculate the moving sum and the moving average. They are cumulative sums and cumulative averages with set starting and ending time periods. This means you could, for instance, calculate the sum of the last three days or the average of the last seven days. In practice, this is used again on summing the revenue over time (day, week, month, etc.) or FX rates.

Time Series Analysis

To perform a time series analysis in SQL, you should use the LEAD() and LAG() window functions. They make it possible to go a certain number of rows back or forward compared to the current row. This is helpful when you prepare day-to-day, month-to-month, year-to-year and similar reports. It’s used to show the difference (delta) between the previous and current period or the percentage of increase or decrease. You could compare budgeted and/or actual sales, salaries, costs, or any data showing time series, such as COVID-19 data.

Finding the Nth Value

If you want to find the first or last values in the dataset, the MIN(), MAX(), FIRST_VALUE(), and LAST_VALUE() functions are the way to go. If you don’t want to limit yourself only to the first or last values, you should use the NTH_VALUE() window function. It will get you the nth value (6th, 1,000th, etc.) from a dataset.

One popular business example of this is finding the highest/lowest paid employee in a department or company or finding the third, the fifth, or the nth highest/lowest paid employee. You could do the same with the product sales, product costs, loan applications, borrowed books, streams, etc. 

See for Yourself How the OVER() Clause Works

You’ve learned how to use OVER() alone and with any combination of its additional ORDER BY and PARTITION BY clauses. I think you realize how useful it is in elevating your data analysis skills and helping you in endless practical business situations.

Don’t let what you’ve learned here evaporate! Have a look at our Window Functions course to practice the OVER() clause and discover even more situations for its practical use.