# How to Select the First Row of Each Group in SQL

*Grouping data in SQL is not that hard. Finding the first row of each group, maybe, but not after you read this article!*

You use `GROUP BY`

when you need to group data in SQL. The `GROUP BY`

statement groups rows with the same value into a single row. Due to its logic, it’s often used with aggregate functions. You know, functions like `MIN()`

, `MAX()`

, `SUM()`

, `COUNT()`

, `AVG()`

, etc.

A problem may surface when you want to select several columns but only one row from each group. Specifically, you may want the first row of each group, that is, the row with a minimum value. I’ll show you two main approaches:

- Correlated subquery.
- Window functions + common table expressions.

If you’re not that familiar with window functions or common table expressions, I recommend our hands-on Window Functions and Common Table Expressions courses. Together, they contain over 300 interactive exercises teach you window functions and common table expressions starting with the basics and progressing to advanced topics like recursive queries.

When is it useful to look for the first row in each group? For example, imagine you have sales data by month over five years, and you’d like to see the month with the lowest sales for every year. Or imagine you want to see the employee with the lowest salary in every department. In the example I’ll show below, you may want to see the least streamed song for every artist.

You’ll understand the problem better when I show you the data.

## Data for Finding the First Row in Each Group

id | artist | song_name | number_of_streams |
---|---|---|---|

1 | Prince | Uptown | 514,744 |

2 | Paul Simon | Graceland | 2,205,477 |

3 | Donny Hathaway | You've Got a Friend | 498,444 |

4 | Prince | I Wanna Be Your Lover | 1,547,956 |

5 | Donny Hathaway | To Be Young, Gifted And Black | 274,894 |

6 | Paul Simon | Stranger to Stranger | 498,715 |

7 | Prince | Little Red Corvette | 2,147,988 |

8 | Paul Simon | Darling Lorraine | 348,748 |

9 | Donny Hathaway | A Song For You | 687,415 |

10 | Donny Hathaway | Someday We'll All Be Free | 274,894 |

There are three different artists: Prince, Paul Simon, and Donny Hathaway. Prince and Paul Simon have three songs each. Donny Hathaway has four songs. There’s streaming data for each of these songs.

Grouping by artist gets you every artist only once, for sure. But what if you want to show song names and the number of streams, too? With a different name and a different number of streams for each song, it isn’t much of a grouping.

You can go even further and ask: how can you group by artist and also show the song with the smallest number of streams for each artist?

## Using Correlated Subqueries

The first way to find the first row of each group is by using a correlated subquery. In short, a correlated subquery is a type of subquery that is executed row by row. It uses the values from the outer query, that is, the values from the query it’s nested into.

You can use a correlated subquery to find the minimum row of each group in SQL:

SELECT artist, song_name, number_of_streams FROM streaming_data st_outer WHERE number_of_streams = (SELECT MIN(number_of_streams) FROM streaming_data WHERE artist = st_outer.artist) GROUP BY artist, song_name, number_of_streams;

In this query, I select three columns: `artist`

, `song_name`

, and `number_of_streams`

from the table ** streaming_data**. I give the alias

**to the table since this is my outer query and I reference this table again in the correlated subquery. This is a way to know which table from which**

`st_outer`

`SELECT`

statement you’re referencing.I use a `WHERE`

clause to get only the results where the `number_of_streams`

is equal to the minimum number of streams for this artist.

The subquery gets me these minimum values with the `MIN()`

aggregate function. But I filter the data using a `WHERE`

clause. That’s because I want the minimum number of streams where the artist from the correlated subquery is the same artist from the outer query.

In other words, these two `SELECT`

statements return the data for rows where the number of streams equals the minimum number of streams. If you group the result by artist, song name, and the number of streams, here’s what you get:

artist | song_name | number_of_streams |
---|---|---|

Donny Hathaway | Someday We'll All Be Free | 274,894 |

Donny Hathaway | To Be Young, Gifted And Black | 274,894 |

Paul Simon | Darling Lorraine | 348,748 |

Prince | Uptown | 514,744 |

These are indeed the least streamed song for each artist. You may be asking why there are two songs for Donny Hathaway. Nothing wrong with the output. Both songs have the same smallest number of streams, so the output shows them both.

Even though I get the correct result, I generally don’t recommend using correlated subqueries for this task. The reason? It’s not very efficient. A correlated subquery checks against the outer query once for every row. If you have a lot of data, this may be very slow. In addition, the code gets messy and difficult to read if you write a little more complex query and subquery.

The second approach for selecting the first row of each group is to combine window functions and common table expressions.

## Using Window Functions and Common Table Expressions

Window functions are similar to aggregate functions in that both perform calculations across rows. However, the window functions do not group the result into a single row as aggregate functions do. With window functions, all the rows remain intact, with the result shown in an additional column for every row.

You may want to have the Window Functions Cheat Sheet open for following the next code.

A common table expression (CTE) is a temporary result of a query. You can name, reference, and use it like any other table. The only difference is a CTE has to be executed every time you want to use its result in another query.

To get the same result as you do with the correlated subquery, combine window functions and a CTE this way:

WITH first_row AS ( SELECT RANK() OVER (PARTITION BY artist ORDER BY number_of_streams ASC) AS streaming_rank, artist, song_name, number_of_streams FROM streaming_data ) SELECT streaming_rank, artist, song_name, number_of_streams FROM first_row WHERE streaming_rank = 1 GROUP BY streaming_rank, artist, song_name, number_of_streams;

It works like this. As with any CTE, first I write the `WITH`

clause. The name of my CTE is ** first_row**. Why? Because I use it to get the first row for every artist.

In the first `SELECT`

statement, the window function appears immediately. It’s the RANK() function, whose purpose is to rank the rows in the table ** streaming_data**.

This function is followed by an `OVER()`

clause with two keywords in the parentheses. The first is `PARTITION BY`

, which I use to rank rows only within each artist. The second keyword is the `ORDER BY`

, which tells it to rank by the column `number_of_streams`

in ascending order. Using these keywords this way, the window functions rank the rows for one artist from the smallest to the largest number of streams, then with the second artist the ranking restarts at 1, and so on.

After the CTE comes the second `SELECT`

statement that references the CTE. Here, I’m simply selecting all the columns from the CTE. I’m only interested in rows ranked first, hence the `WHERE`

clause.
Finally, I group the result. Here’s what I get:

streaming_rank | artist | song_name | number_of_streams |
---|---|---|---|

1 | Donny Hathaway | Someday We'll All Be Free | 274,894 |

1 | Donny Hathaway | To Be Young, Gifted and Black | 274,894 |

1 | Paul Simon | Darling Lorraine | 348,748 |

1 | Prince | Uptown | 514,744 |

Other than the `streaming_rank`

column, the result is identical to the previous one. The difference between these two approaches is not in the output itself. Using the window functions and CTE to get the first row of each group is recommended for a few reasons.

First, it’s much more efficient. Unlike the correlated subquery, this approach does not check the same condition for every row of data. This makes it much quicker.

Second, the query is also more readable because it’s divided into two obvious steps:

- Ranking the rows.
- Selecting only the rank 1.

The code is also easier to play with and to change according to your needs. For example, you may want to use the `DENSE_RANK()`

functions instead of `RANK()`

to get the same result as above:

WITH first_row AS ( SELECT DENSE_RANK() OVER (PARTITION BY artist ORDER BY number_of_streams ASC) AS streaming_rank, artist, song_name, number_of_streams FROM streaming_data ) SELECT streaming_rank, artist, song_name, number_of_streams FROM first_row WHERE streaming_rank = 1 GROUP BY streaming_rank, artist, song_name, number_of_streams;

Both functions returns all the rows that are tied (i.e., have the same rank), so they are ideal if you want to show all the rows with the same minimum value.

Be careful, though. `RANK()`

and `DENSE_RANK()`

**are not **the same. The differences between them, however, are not the topic of this article. For that, read the ranking functions overview and the complete guide to ranking rows in SQL. All you have to remember for now is they return the same result if used for the purpose of finding the minimum row within a group.

In case you want only one row with the minimum value, and you don’t care which one, use the `ROW_NUMBER()`

function. The main difference from the previous two window functions is that it does not return tied values. If there are several rows with the same (minimum) value, it chooses only one randomly.

Here is the same code as before, only the window function changes to `ROW_NUMBER()`

:

WITH first_row AS ( SELECT ROW_NUMBER() OVER (PARTITION BY artist ORDER BY number_of_streams ASC) AS row_number, artist, song_name, number_of_streams FROM streaming_data ) SELECT streaming_rank, artist, song_name, number_of_streams FROM first_row WHERE streaming_rank = 1 GROUP BY streaming_rank, artist, song_name, number_of_streams;

As I said, the result is a bit different:

streaming_rank | artist | song_name | number_of_streams |
---|---|---|---|

1 | Donny Hathaway | To Be Young, Gifted and Black | 274,894 |

1 | Paul Simon | Darling Lorraine | 348,748 |

1 | Prince | Uptown | 514,744 |

Instead of two Donny Hathaway songs with minimum streams, I just get “To Be Young, Gifted And Black.”

One of the benefits of using window functions and CTEs is that you can easily change the ranking order. For example, if you want to see the maximum number of streams, the row with the maximum value now gets to be the first one in each group. I’ll show this only with `ROW_NUMBER()`

, but you can do the same with `RANK()`

and `DENSE_RANK()`

.

WITH first_row AS ( SELECT ROW_NUMBER() OVER (PARTITION BY artist ORDER BY number_of_streams DESC) AS streaming_rank, artist, song_name, number_of_streams FROM streaming_data ) SELECT streaming_rank, artist, song_name, number_of_streams FROM first_row WHERE streaming_rank = 1 GROUP BY streaming_rank, artist, song_name, number_of_streams;

You see the only difference, right? It’s in the `OVER()`

clause. This time, I ordered it in descending rather than ascending order. This returns the most streamed songs for every artist:

streaming_rank | artist | song_name | number_of_streams |
---|---|---|---|

1 | Donny Hathaway | A Song For You | 687,415 |

1 | Paul Simon | Graceland | 2,205,477 |

1 | Prince | Little Red Corvette | 2,147,988 |

## Find Out What Else You Can Do With Window Functions!

Finding the first row of each group is only one of the examples where you can use SQL window functions. You get a thorough guide to what they can do via the hands-on window functions course. This course is great for learning and practicing examples. And what do you get? Take a look at the article explaining the benefits of the course and of window functions themselves!

If that doesn’t quench your thirst, you can always use the Window Functions Practice Set to get your fingers more agile in writing code!