8th Apr 2020 6 minutes read

SQL Window Functions vs. GROUP BY: What’s the Difference?

A very common misconception among SQL users is that there is not much difference between SQL window functions and aggregate functions or the GROUP BY clause. However, the differences are very significant.

Perhaps the only similar point between GROUP BY and window functions is that both allow you to execute a function (such as AVG, MAX, MIN, or COUNT) on a group of records. I’d say that window functions’ special power is that they allow us to obtain results that otherwise would be almost impossible to achieve.

In this article, we’ll go over the use of window functions vs GROUP BY and window functions vs aggregate functions.

Quick Review of GROUP BY

The GROUP BY clause allows us to group a set of records based on some criteria and apply a function (e.g. AVG or MAX) to each group, obtaining one result for each group of records. Let’s see an example.

We have a table called employee with a total of five employees and three departments:

Employee_Name	Department	Salary
John Roberts	Finance	2300
Peter Hudson	Marketing	1800
Sue Gibson	Finance	2000
Melinda Bishop	Marketing	1500
Nancy Hudson	IT	1950

fig1: the employee table

Suppose we want to obtain the average salary per department and the top salary for every department. We should use the following query:

SELECT Department, 
       avg(salary) as average,
       max(salary) as top_salary
FROM employee
GROUP BY department

The image below shows the result:

Department	average	top_salary
Marketing	1650	1800
Finance	2150	2300
IT	1950	1950

GROUP BY vs Window Functions

When comparing window functions and GROUP BY, it’s essential to remember that GROUP BY collapses the individual records into groups; after using GROUP BY, you cannot refer to any individual field because it is collapsed. Later, we will talk in depth about this topic. For now, we’ll just mention that window functions do not collapse individual records.

So, if you want to create a report with an employee name, salary, and the top salary of the employee’s department, you can’t do it with GROUP BY. The individual records of each employee are collapsed by the GROUP BY department clause. For this kind of report, you need to use window functions, which is the topic of the next section.

If you want to go deep into the nuances of SQL GROUP BY and reports, we recommend our interactive course Creating Basic SQL Reports.

The Power of Window Functions

Window functions are a powerful feature of SQL. They allow us to apply functions like AVG, COUNT, MAX, and MIN on a group of records while still leaving the individual records accessible. Since the individual records are not collapsed, we can create queries showing data from the individual record together with the result of the window function. This is what makes window functions so powerful.

Suppose we want to obtain a list of employee names, salaries, and the top salary in their departments.

SELECT 	employee_name,
department,
salary,
max(salary) OVER (PARTITION BY department) as top_salary
FROM		employee

The next image shows the result:

Employee_Name	Department	salary	top_salary
John Roberts	Finance	2300	2300
Peter Hudson	Marketing	1800	1800
Sue Gibson	Finance	2000	2300
Melinda Bishop	Marketing	1500	1800
Nancy Hudson	IT	1950	1950

In the previous query, we used a window function:

max(salary) OVER (PARTITION BY department) as top_salary

The window function is MAX() and we applied it to the set of records defined by the clause OVER (PARTITION BY department), which are the records with the same value in the department field. Lastly, we renamed the column top_salary. In the query result, we have rows for individual employees. If we used GROUP BY instead of window functions, we would have rows for each department.

Window functions have a rather verbose syntax; if you want to go into details, I suggest the “Window Functions” course, which is a step-by-step tutorial that takes you through SQL window functions using examples and exercises.

Window Functions vs Aggregate Functions

When we compare window functions and aggregate functions, we note a super powerful feature on the window functions side: positional functions. They allow us to obtain a column value from other records in the same window. This is a really amazing ability, allowing SQL users to create complex reports in just a few lines. Let’s briefly discuss two of these functions: LEAD() and LAG().

The LAG() function returns the column value of the previous record in the window, while LEAD() returns the column value from the next record in the window. It is very important to have the window ordered by the right column if you want to use these functions.

Let’s see an example of how we can use these functions. Suppose we have a table that stores company shares with their market values at a point in time. The table might look like this:

share_symbol	timestamp	value
OILBEST	2020-03-05 10:00	120
OILBEST	2020-03-05 12:00	123
OILBEST	2020-03-05 15:00	122
BANKWEB	2020-03-05 10:00	91
BANKWEB	2020-03-05 12:00	87
BANKWEB	2020-03-05 15:00	99

fig2: the share table

Suppose we want a report showing every share’s value with its previous value and the variation percentage related to the previous value. We can do it by using the LEAD() function to obtain the share’s previous value. Note that we use ORDER BY timestamp when defining the partition (i.e. the window of records). We’ll go back to this point later.

SELECT 	
  share_symbol,
  timestamp,
  value,
  LAG(value) OVER (PARTITION BY share_symbol ORDER BY timestamp ) AS previous_value, 
  TRUNC(((value - (LAG(value) OVER (PARTITION BY share_symbol ORDER BY timestamp )))*100)/value,2) AS percentage_variation
FROM share

Notice that the columns previous_value and percentage_variation are calculated columns. They use values from different records in the same table.

share_symbol	timestamp	value	previous_value	percentage_variation
OILBEST	2020-03-05 10:00	120
OILBEST	2020-03-05 12:00	123	120	2.43
OILBEST	2020-03-05 15:00	122	123	-0.81
BANKWEB	2020-03-05 10:00	91
BANKWEB	2020-03-05 12:00	87	91	-4.59
BANKWEB	2020-03-05 15:00	99	87	12.12

When we use positional functions, it is very important to put an ORDER BY clause along with the PARTITION clause (as we did in the previous query). If we don’t use the right ORDER BY, the results can be wrong. Why? Because positional functions work based on the order of the records in the window.

Let’s examine this a little more. The FIRST_VALUE() function returns a column value from the first record in the window. LAG(), as we know, returns the column value of the previous record in the window. Having the right window order is crucial; imagine what you’d get from these functions otherwise! In our example, we want the previous chronological market value for a specific share. Thus, we used ORDER BY timestamp. If we omit the ORDER BY or we order by another column, the result would be wrong.

In some specific cases, positional functions can return wrong values because of a partially populated window. And there are more window functions, like RANK(), NTH_VALUE() and LAST_VALUE(). We don’t have space to cover all this here, but I suggest checking out this article explaining window functions and these window function examples to learn more.

Closing the Window

In this article, we explored the differences between window functions and GROUP BY. We looked at examples with several aggregate and window functions. We also talked about an important limitation of the GROUP BY clause, namely the “collapse of records”. This limitation is not present on window functions, allowing SQL developers to combine record-level data with window function results in the same query.

Another advantage of window functions is their capability to combine query values from different records (from the same window) in the same row of the result set.

If you are interested in learning more about window functions, I suggest the Window Functions course, where you can learn SQL window functions using interactive exercises and detailed explanations.

Tags:

Quick Review of GROUP BY

GROUP BY vs Window Functions

The Power of Window Functions

Window Functions vs Aggregate Functions

Closing the Window

You may also like