# What Is Advanced SQL?

*Are you confused about advanced SQL skills? What are they? This article will explain what advanced SQL can mean, especially as we use it on LearnSQL.com. *

I’m sure you find the phrases ‘advanced SQL skills’ or ‘advanced SQL topics’ very often. You read one article about advanced SQL and you’re happy with how easy these advanced topics seem to be. Then you talk to someone and you see they consider everything you know as basic SQL knowledge. How do you define yourself? Do you consider yourself a basic, intermediate, or advanced SQL user?

## Advanced SQL Is Everywhere

Advanced SQL is everywhere. Well, the word ‘advanced’ is, at least. It’s used very commonly by SQL learners and SQL users. You can find it in SQL course descriptions, in job ads, and in the job interview questions. It’s in the SQL literature. You hear it when colleagues are talking at work. It’s in numerous articles trying to define what advanced SQL is.

Actually, I’m not trying to give you a definition of advanced SQL. I’m trying to tell you something else: **There’s no single definition of advanced SQL!** And you should stop looking for it. What should you do instead?

## Embrace the Inconsistency

That’s right! You should accept that the term ‘advanced SQL’ is used inconsistently. It means different things depending on the context and who’s using the term.

It’s only logical that advanced SQL would include one thing for someone who writes SQL reports and something entirely different for someone hiring a data analyst. A software developer will surely have yet another definition of what advanced SQL includes.

You get the picture. Advanced SQL can’t have just one definition. When you’re reading about advanced SQL skills, you should always consider the context, who’s doing the talking, and their audience.

## What Might Advanced SQL Include?

For instance, there’s a really interesting discussion about advanced SQL on Stack Overflow.

The discussion was started by someone looking for an SQL job who noted that there are plenty of jobs requiring “advanced SQL knowledge”. The user is asking what is to be expected from this kind of job. What knowledge is considered advanced?

The first answer gives a rather long code snippet as a measure of advanced knowledge. Even though it’s pretty long, it’s not that complicated. According to this reply, advanced SQL covers selecting columns, aggregate functions like `MIN()`

and `MAX()`

, the CASE WHEN statement, `JOINs`

, the `WHERE`

clause, `GROUP BY`

, declaring variables, and subqueries.

On the other hand, the following reply considers most of these topics to be basic or intermediate at best. This user believes advanced SQL topics include functions, stored procedures, hierarchical queries, triggers, indices, data modeling (normal forms, primary and foreign keys, table constraints), transactions, and much more. This is much closer to my definition of advanced SQL and what I was taught in SQL lectures. However, this was a program for database administrators; understandably, this knowledge is considered advanced. Some reporting specialists and data analysts may never need to use such things.

It’s interesting to note that sometimes `JOINs`

are considered advanced while writing stored procedures is still regarded as basic knowledge. I can understand why because one user hints at the problem with `JOINs`

. Even though they are generally considered basic knowledge, many SQL users learn much more advanced topics before really understanding `JOINs`

. This is how the basics easily become advanced knowledge. It’s not unusual to find someone using flashy functions, triggers, and whatnot – without knowing how to write a simple `JOIN`

.

## What Is Advanced SQL at LearnSQL.com?

Before explaining what advanced SQL is, it’s essential to know what it isn't. When you look at our courses and articles, basic/intermediate SQL is anything in SQL-92. (Here’s the history and details of SQL standards if you’re interested in finding out more.) This includes:

- All types of
`JOINs`

- Aggregate functions
`GROUP BY`

`HAVING`

- Subqueries
- Set operations (
`UNION`

,`UNION ALL`

,`INTERSECT`

,`MINUS`

)

You must be familiar with these topics if you claim to know SQL. These are things you should understand before moving to more advanced topics.

Generally, we consider three topics as ‘advanced SQL’:

- Window functions
- Common Table Expressions (CTEs)
`GROUP BY`

extensions (`ROLLUP`

,`CUBE`

, and`GROUPING SETS`

)

Anybody wanting to learn (or practice) all three topics should check out our Advanced SQL track. Of course, this is not the only advanced SQL course out there; we’ve already reviewed some excellent advanced SQL courses from other platforms. For now, let’s look at an example of each of these topics.

### Window Functions

SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc. Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole. You can learn much more about this in our Window Functions course.

Let’s see an example. This code will show the difference in yearly numbers of cars sold, according to make (i.e. car brand):

SELECT car_make, cars_sold, year, cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff FROM cars_sale;

To get this information, you first have to select the columns you want in the result: `car_make`

, `cars_sold`

, `year`

. To get the yearly difference, subtract the previous year’s sale from the current year’s sale: `cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff`

. Here, `cars_sold`

means the current year’s sales. The `LAG()`

function allows you to fetch data from the previous row. The `OVER`

clause signifies this is a window function. Then follows the `PARTITION BY`

clause, which is used to define the window (data subset) we want to use. In this case, it’s the `car_make`

; this means the function will calculate the sale difference only within a specific car make. When it runs into another car make, the function will reset and start calculating the sales difference all over again.

Finally, the operation is ordered by year ascending. Why is that? The `LAG()`

function will get us the data from the previous row. So, if this operation is performed for every year in ascending order, the “previous year” will mean the previous row. That’s exactly what we need.

Take a look at the result:

car_make | cars_sold | year | sales_diff |
---|---|---|---|

Nissan | 459,663 | 2015 | NULL |

Nissan | 312,453 | 2016 | -147,210 |

Nissan | 541,223 | 2017 | 228,770 |

Nissan | 452,844 | 2018 | -88,379 |

Nissan | 584,256 | 2019 | 131,412 |

Renault | 1,342,558 | 2015 | NULL |

Renault | 17,251,456 | 2016 | 15,908,898 |

Renault | 16,842,552 | 2017 | -408,904 |

Renault | 1,425,895 | 2018 | -15,416,657 |

Renault | 1,548,698 | 2019 | 122,803 |

Did you see the `NULL`

value in the first row? That’s because 2015 is the first year; there’s no previous data that can be deducted from it. If you follow the results, you’ll see every row is the difference between the current row’s yearly sales and the previous row’s yearly sales. When you reach the row where Renault starts, there’s a `NULL`

again. This is what window functions do; they work on data within a given window., I’ve defined the window according to the `car_make`

, so the window function resets when we get a new value in this column. It’s only logical. Why would I deduct Renault sales from Nissan sales? I want to do that for every car make separately.

### Common Table Expressions (CTEs)

CTEs will allow you to write complex queries without using subqueries keeping your code simple and straightforward. They give you the possibility to produce complex reports quickly and efficiently. They also enable you to make some calculations you wouldn’t be able to do otherwise.

What is a common table expression, you might ask? It’s a temporary result you can use in the SELECT statement. It works like a temporary table – you can join it with other tables, other CTEs, or with itself.

They can be helpful if you, for instance, have to report on time spent on a particular project. On one side, there’s a table containing data about the date when each employee worked on this project. There’s also the start time and end time. On the other side, there’s a table containing employee names. You have to produce a table showing every employee’s name and his or her average time spent on this project.

Here’s how the CTE can help you:

WITH time_worked AS ( SELECT employee_id, end_time - start_time AS time FROM project_timesheet ) SELECT e.first_name, e.last_name, AVG (tw.time) AS avg_time_worked FROM employee e LEFT JOIN time_worked tw ON e.id = tw.employee_id GROUP BY e.first_name, e.last_name;

How does this CTE work? Every CTE opens with the `WITH`

clause. Then you must name your CTE; in this case, it’s `time_worked`

. Then you write a `SELECT`

statement. Here, I’ll use the CTE to calculate how much time each employee worked every time they worked on the project. I need the CTE because I don’t have this information stated explicitly in the table; I only have the `start_time`

and `end_time`

. To calculate the average time worked, the first step is to get the time worked. That’s why this CTE deducts the `start_time`

from the `end_time`

and shows the result in the column `time`

. The data is taken from the table

.**project_timesheet**

Now that I’ve written the CTE, I can use it in the next `SELECT`

statement. First, I’ll get the first name and the last name from the table

. Then I’ll use the **employee**`AVG()`

function on the column `time`

from the CTE `time_worked`

. To do that, I’ve used the `LEFT JOIN`

– and I’ve used it exactly like I would with any other table. Finally, the data is grouped by the employees’ first and last names.

The result is a little table like this:

first_name | last_name | avg_time_worked |
---|---|---|

Janine | Rooney | 4:58:39 |

Mike | Watson | 5:52:24 |

Peter | Marcotti | 4:09:33 |

Inge | Ongeborg | 8:56:05 |

If CTEs have you interested, imagine what you’ll be able to do after finishing our Recursive Queries course. Oh, yeah – I didn’t mention that a CTE can be recursive, which means it references itself. By doing so, it returns the sub-result and repeats the process until it returns the final result. While CTEs can be non-recursive, there are no recursive queries that are non-CTE. If you want to learn recursive queries, knowing CTEs is a must.

### GROUP BY Extensions

SQL’s `GROUP BY`

extensions provide you with additional possibilities for grouping data. This, in return, can increase the complexity of your data analysis and the reports you create.

There are three `GROUP BY`

extensions:

`ROLLUP`

`CUBE`

`GROUPING SETS`

Unlike regular `GROUP BY`

, `ROLLUP`

lets you group the data into multiple data sets and aggregate results on different levels. Fancy talk, but simply put: you can use `ROLLUP`

to calculate totals and subtotals, just like in Excel pivot tables.

The `CUBE`

extension is similar, but there’s one crucial difference. `CUBE`

will generate subtotals for every combination of the columns specified.

Finally, there are `GROUPING SETs`

. A grouping set is a set of columns you use in the `GROUP BY`

clause. You can connect different queries containing `GROUP BY`

if you use `UNION ALL`

. However, the more queries you have, the messier it gets. You can achieve the same result but with much neater queries by using `GROUPING SETS`

.

Let me show you how `ROLLUP`

works. Suppose you’re working for a guitar store that has several locations. You’ll sometimes need to create a report showing the total number of guitars you have in stock. Here’s a query that will do that on a manufacturer, model, and store level:

SELECT manufacturer, model, store, SUM(quantity) AS quantity_sum FROM guitars GROUP BY ROLLUP (manufacturer, model, store) ORDER BY manufacturer;

This doesn’t look complicated. It’s a simple `SELECT`

statement that will give you the columns `manufacturer`

, `model`

, and store from the table

. I’ve used the aggregate function **guitars**`SUM()`

to get the quantity. Then I wrote `GROUP BY`

followed immediately by `ROLLUP`

. The data will be grouped according to the columns in the parentheses. Finally, the result is ordered by the manufacturer.

What will this query return? Have a look:

manufacturer | model | store | quantity_sum |
---|---|---|---|

Fender | Jazzmaster | Amsterdam | 9 |

Fender | Jazzmaster | New York | 32 |

Fender | Jazzmaster | NULL | 41 |

Fender | Stratocaster | Amsterdam | 102 |

Fender | Stratocaster | New York | 157 |

Fender | Stratocaster | NULL | 259 |

Fender | Telecaster | Amsterdam | 80 |

Fender | Telecaster | New York | 212 |

Fender | Telecaster | NULL | 292 |

Fender | NULL | NULL | 592 |

Gibson | ES-335 | Amsterdam | 4 |

Gibson | ES-335 | New York | 26 |

Gibson | ES-335 | NULL | 30 |

Gibson | Les Paul | Amsterdam | 21 |

Gibson | Les Paul | New York | 42 |

Gibson | Les Paul | NULL | 63 |

Gibson | SG | Amsterdam | 32 |

Gibson | SG | New York | 61 |

Gibson | SG | NULL | 93 |

Gibson | NULL | NULL | 186 |

NULL | NULL | NULL | 778 |

It should be easier to understand what I mean by different grouping levels. A little tip before I continue: Wherever you see a `NULL`

value, this is a subtotal. Let’s have a look at the table. First, there are 9 Fender Jazzmasters in Amsterdam. Then there are 32 Fender Jazzmasters in New York. The total quantity is 41, which is what is shown in the row:

manufacturer | model | store | quantity_sum |
---|---|---|---|

Fender | Jazzmaster | NULL | 41 |

The `NULL`

value means the data is grouped on a store level. This result reads “there are 41 Fender Jazzmasters in total, in both New York and Amsterdam”. The same calculation is done for every other Fender model, i.e. Stratocaster and Telecaster. Then there’s this row:

manufacturer | model | store | quantity_sum |
---|---|---|---|

Fender | NULL | NULL | 592 |

What does it mean? It means there are in total 592 Fenders of all three models in both stores.

The same principle is applied to Gibson. The quantity of guitars in Amsterdam and New York is first shown for the model. After this is done, there is a subtotal summing of the quantities from both stores. This is done for all three Gibson models: ES-335, Les Paul, and SG. Then there is a line showing the total number of all three Gibson guitar models in both stores (the same as with Fenders):

manufacturer | model | store | quantity_sum |
---|---|---|---|

Gibson | NULL | NULL | 186 |

Finally, there’s a row showing the total number of guitars, no matter the store, guitar manufacturer, or model:

manufacturer | model | store | quantity_sum |
---|---|---|---|

NULL | NULL | NULL | 778 |

I’m sure you now want to find out how CUBE and GROUPING SETS work. For that, I’d recommend having a look at the GROUP BY extensions course.

These advanced topics are something data analysts will use very often. I’ve therefore prepared some SQL constructions for my fellow data analysts. If you’re into finance, here are some advanced SQL queries for financial analysis.

## Do You Consider Yourself an Advanced SQL User?

How do you feel now? Did I raise your confidence? If you already know SQL window functions, CTEs, and the GROUP BY extensions, you can brag about your advanced SQL skills.

Or maybe I did just the opposite? Perhaps I’ve shaken your confidence when you realized you don’t know anything about the advanced topics I’ve talked about in this article.

Don’t worry! Whatever group you belong to, there are LearnSQL.com courses that will help you build your knowledge and your skills. Want to learn window functions? No problem – see our Window Functions course. Interested in CTEs? You can learn and practice them in our Recursive Queries course. Need to get more out of GROUP BY? Our GROUP BY Extensions in SQL course has you covered.