19th May 2021

What Is Advanced SQL?

Are you confused about advanced SQL skills? What are they? This article will explain what advanced SQL can mean, especially as we use it on LearnSQL.com.

I’m sure you come across the phrases ‘advanced SQL skills’ and ‘advanced SQL topics’ very often. You read one article about advanced SQL and you’re pleased with how easy these advanced topics seem. Then you talk to someone who considers everything you know to be basic SQL. So how do you define yourself? Are you a basic, intermediate, or advanced SQL user?

Advanced SQL Is Everywhere

Advanced SQL is everywhere. Well, the word ‘advanced’ is, at least. SQL learners and SQL users use it all the time. You can find it in SQL course descriptions, in job ads, and in job interview questions. It’s in the SQL literature. You hear it when colleagues are talking at work. It’s in numerous articles trying to define what advanced SQL is.

Actually, I’m not trying to give you a definition of advanced SQL. I’m trying to tell you something else: There’s no single definition of advanced SQL! And you should stop looking for it. What should you do instead?

Embrace the Inconsistency

That’s right! You should accept that the term ‘advanced SQL’ is used inconsistently. It means different things depending on the context and who’s using the term.

It’s only logical that advanced SQL would include one thing for someone who writes SQL reports and something entirely different for someone hiring a data analyst. A software developer will surely have yet another definition of what advanced SQL includes.

You get the picture. Advanced SQL can’t have just one definition. When you’re reading about advanced SQL skills, you should always consider the context, who’s doing the talking, and their audience.

What Might Advanced SQL Include?

For instance, there’s a really interesting discussion about advanced SQL on Stack Overflow.

The discussion was started by a job seeker who noticed that many SQL job ads require “advanced SQL knowledge”. The user asks what such a job would involve: what knowledge counts as advanced?

The first answer offers a rather long code snippet as a measure of advanced knowledge. Even though it’s long, it’s not that complicated. According to this reply, advanced SQL covers selecting columns, aggregate functions like MIN() and MAX(), the CASE WHEN expression, JOINs, the WHERE clause, GROUP BY, declaring variables, and subqueries.

On the other hand, the next reply considers most of these topics to be basic or, at best, intermediate. This user believes advanced SQL topics include functions, stored procedures, hierarchical queries, triggers, indices, data modeling (normal forms, primary and foreign keys, table constraints), transactions, and much more. This is much closer to my definition of advanced SQL and to what I was taught in SQL lectures. However, those lectures were part of a program for database administrators, so it’s understandable that this knowledge was considered advanced there. Some reporting specialists and data analysts may never need to use such things.

It’s interesting to note that JOINs are sometimes considered advanced while writing stored procedures is regarded as basic knowledge. I can see why, and one user hints at the reason: even though JOINs are generally considered basic knowledge, many SQL users learn much more advanced topics before they really understand JOINs. This is how the basics easily become advanced knowledge. It’s not unusual to find someone using flashy functions, triggers, and whatnot, without knowing how to write a simple JOIN.

What Is Advanced SQL at LearnSQL.com?

Before explaining what advanced SQL is, it’s essential to know what it isn't. When you look at our courses and articles, basic/intermediate SQL is anything in SQL-92. (Here’s the history and details of SQL standards if you’re interested in finding out more.) This includes:

All types of JOINs
Aggregate functions
GROUP BY
HAVING
Subqueries
Set operations (UNION, UNION ALL, INTERSECT, EXCEPT, which Oracle calls MINUS)

You must be familiar with these topics if you claim to know SQL. These are things you should understand before moving to more advanced topics.

Generally, we consider three topics as ‘advanced SQL’:

Window functions
Common Table Expressions (CTEs)
GROUP BY extensions (ROLLUP, CUBE, and GROUPING SETS)

Anybody wanting to learn (or practice) all three topics should check out our Advanced SQL track. Of course, this is not the only advanced SQL course out there; we’ve already reviewed some excellent advanced SQL courses from other platforms. For now, let’s look at an example of each of these topics.

Window Functions

SQL window functions allow you to perform operations that are often required for creating reports, e.g. ranking data, calculating running totals and moving averages, finding the difference between rows, etc. Not only that, but you can also divide data into windows, which enables you to perform operations on data subsets rather than the data as a whole. You can learn much more about this in our Window Functions course.

Let’s see an example. This code will show the difference in yearly numbers of cars sold, according to make (i.e. car brand):

SELECT	car_make,
		cars_sold,
		year,
		cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff
FROM cars_sale
ORDER BY car_make, year;

To get this information, you first select the columns you want in the result: car_make, cars_sold, and year. To get the yearly difference, subtract the previous year’s sales from the current year’s sales: cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff. Here, cars_sold is the current year’s sales. The LAG() function fetches a value from the previous row. The OVER clause marks this as a window function. Next comes the PARTITION BY clause, which defines the window (data subset) we want to use. In this case, it’s car_make, so the function calculates the sales difference only within a single car make. When it reaches another car make, the calculation resets and starts over.

Finally, the rows within each window are ordered by year, ascending. Why? LAG() returns the value from the previous row, so when the rows are sorted by year in ascending order, the previous row holds the previous year’s sales. That’s exactly what we need, as long as there is one row per make and year with no missing years; if a year were missing, LAG() would simply return the last year that is present.
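If you want to try the query yourself, here is a minimal sketch that loads the rows from the result table below into an in-memory SQLite database with Python’s built-in sqlite3 module and runs the same LAG() query, plus an outer ORDER BY so the rows come back in a predictable order. It assumes a SQLite version with window function support (3.25 or later), and the cars_sale table layout is inferred from the query rather than taken from a real schema.

```python
import sqlite3

# Rows copied from the result table in this article: (car_make, cars_sold, year).
ROWS = [
    ("Nissan", 459663, 2015), ("Nissan", 312453, 2016), ("Nissan", 541223, 2017),
    ("Nissan", 452844, 2018), ("Nissan", 584256, 2019),
    ("Renault", 1342558, 2015), ("Renault", 17251456, 2016), ("Renault", 16842552, 2017),
    ("Renault", 1425895, 2018), ("Renault", 1548698, 2019),
]

def sales_diff(rows):
    """Run the article's LAG() query against an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cars_sale (car_make TEXT, cars_sold INTEGER, year INTEGER)")
    con.executemany("INSERT INTO cars_sale VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT car_make,
               cars_sold,
               year,
               cars_sold - LAG(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS sales_diff
        FROM cars_sale
        ORDER BY car_make, year
    """).fetchall()
    con.close()
    return result

for row in sales_diff(ROWS):
    print(row)  # SQL NULL comes back as Python None
```

The first row of each make has None in the last column, because LAG() finds no earlier row inside that make’s window.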

Take a look at the result:

car_make	cars_sold	year	sales_diff
Nissan	459,663	2015	NULL
Nissan	312,453	2016	-147,210
Nissan	541,223	2017	228,770
Nissan	452,844	2018	-88,379
Nissan	584,256	2019	131,412
Renault	1,342,558	2015	NULL
Renault	17,251,456	2016	15,908,898
Renault	16,842,552	2017	-408,904
Renault	1,425,895	2018	-15,416,657
Renault	1,548,698	2019	122,803

Did you see the NULL value in the first row? That’s because 2015 is the first year; there’s no previous row, so LAG() returns NULL, and subtracting NULL from a number also gives NULL. If you follow the results, you’ll see each row shows the difference between the current row’s yearly sales and the previous row’s yearly sales. When you reach the first Renault row, there’s a NULL again. This is what window functions do: they work on data within a given window. I’ve defined the window by car_make, so the window function resets when a new value appears in this column. It’s only logical. Why would I subtract Renault sales from Nissan sales? I want to do that for every car make separately.
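The same per-make windows also give you the running totals mentioned at the start of this section. As a sketch, again assuming Python’s sqlite3 module with SQLite 3.25 or later and using a subset of the rows from the table above, SUM() OVER (PARTITION BY car_make ORDER BY year) adds up sales year by year and restarts the total for each make:

```python
import sqlite3

# A subset of the rows from the result table: (car_make, cars_sold, year).
ROWS = [
    ("Nissan", 459663, 2015), ("Nissan", 312453, 2016), ("Nissan", 541223, 2017),
    ("Renault", 1342558, 2015), ("Renault", 17251456, 2016),
]

def running_totals(rows):
    """Cumulative sales per car make, ordered by year."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cars_sale (car_make TEXT, cars_sold INTEGER, year INTEGER)")
    con.executemany("INSERT INTO cars_sale VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT car_make,
               year,
               SUM(cars_sold) OVER (PARTITION BY car_make ORDER BY year) AS running_total
        FROM cars_sale
        ORDER BY car_make, year
    """).fetchall()
    con.close()
    return result

for row in running_totals(ROWS):
    print(row)
```

Notice that the Renault total starts again from its own 2015 figure instead of continuing from Nissan’s, for the same reason the sales difference resets.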

Common Table Expressions (CTEs)

CTEs allow you to write complex queries without nesting subqueries, which keeps your code simple and straightforward. They let you produce complex reports quickly and efficiently. In their recursive form, they also enable calculations you couldn’t do with a plain SELECT.

What is a common table expression, you might ask? It’s a named, temporary result set that exists only for the duration of the query that defines it. It works like a temporary table: you can join it with other tables, other CTEs, or with itself.
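Here is a small sketch of that idea, run through Python’s sqlite3 module: one CTE (make_totals) totals sales per make, a second CTE (overall) is built on top of the first, and the main query joins the two. The table follows the cars_sale example from the window functions section, but the figures and the Toyota row are made up for illustration.

```python
import sqlite3

# Made-up sample data in the shape of the cars_sale table: (car_make, cars_sold, year).
ROWS = [
    ("Nissan", 100, 2018), ("Nissan", 150, 2019),
    ("Renault", 80, 2018), ("Renault", 70, 2019),
    ("Toyota", 300, 2019),
]

QUERY = """
    WITH make_totals AS (            -- first CTE: total sales per make
        SELECT car_make, SUM(cars_sold) AS total_sold
        FROM cars_sale
        GROUP BY car_make
    ),
    overall AS (                     -- second CTE, built from the first one
        SELECT AVG(total_sold) AS avg_total
        FROM make_totals
    )
    SELECT m.car_make, m.total_sold  -- main query joins the two CTEs
    FROM make_totals m
    JOIN overall o ON m.total_sold > o.avg_total
    ORDER BY m.car_make
"""

def above_average_makes(rows):
    """Return the makes whose total sales exceed the average total per make."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cars_sale (car_make TEXT, cars_sold INTEGER, year INTEGER)")
    con.executemany("INSERT INTO cars_sale VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

print(above_average_makes(ROWS))
```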

They can be helpful if, for instance, you have to report on time spent on a particular project. One table records each occasion an employee worked on the project: the date, the start time, and the end time. Another table contains employee names. You have to produce a table showing every employee’s name and his or her average time spent on the project.

Here’s how the CTE can help you:

WITH time_worked AS (
	SELECT	employee_id,
			end_time - start_time AS time
	FROM project_timesheet
)
SELECT	e.first_name,
		e.last_name,
		AVG(tw.time) AS avg_time_worked
FROM employee e
LEFT JOIN time_worked tw
ON e.id = tw.employee_id
GROUP BY e.id, e.first_name, e.last_name;

How does this CTE work? Every CTE opens with the WITH clause. Then you name your CTE; in this case, it’s time_worked. Then you write a SELECT statement in parentheses. Here, I use the CTE to calculate how long each of an employee’s work sessions on the project lasted. I need the CTE because the table doesn’t contain this information explicitly; I only have the start_time and end_time. To calculate the average time worked, the first step is to get the time worked. That’s why this CTE subtracts start_time from end_time and returns the result in the column time. The data is taken from the table project_timesheet.

Now that I’ve written the CTE, I can use it in the main SELECT statement. First, I get the first name and the last name from the table employee. Then I apply the AVG() function to the column time from the CTE time_worked. To reach that column, I join the CTE with a LEFT JOIN, exactly as I would join any other table. The LEFT JOIN also keeps employees who have no time entries at all; their average is simply NULL. Finally, the data is grouped per employee.

The result is a little table like this:

first_name	last_name	avg_time_worked
Janine	Rooney	4:58:39
Mike	Watson	5:52:24
Peter	Marcotti	4:09:33
Inge	Ongeborg	8:56:05
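The query above subtracts time values directly, which assumes a database with native date/time arithmetic. SQLite has no such operator, so this sketch (Python’s sqlite3 module again) computes each session’s length in seconds with strftime('%s', ...) and formats the average as H:MM:SS. The employees and timesheet rows are hypothetical, not the data behind the result above; Cleo has no sessions, to show what the LEFT JOIN does in that case.

```python
import sqlite3

def fmt_hms(seconds):
    """Format a number of seconds as H:MM:SS, like the avg_time_worked column."""
    if seconds is None:
        return None
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"

QUERY = """
    WITH time_worked AS (
        SELECT employee_id,
               -- strftime('%s', ...) returns seconds since 1970-01-01 as text
               CAST(strftime('%s', end_time) AS INTEGER)
                 - CAST(strftime('%s', start_time) AS INTEGER) AS seconds_worked
        FROM project_timesheet
    )
    SELECT e.first_name,
           e.last_name,
           AVG(tw.seconds_worked) AS avg_seconds_worked
    FROM employee e
    LEFT JOIN time_worked tw
      ON e.id = tw.employee_id
    GROUP BY e.id, e.first_name, e.last_name
    ORDER BY e.id
"""

def average_time_worked():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT)")
    con.execute("CREATE TABLE project_timesheet (employee_id INTEGER, start_time TEXT, end_time TEXT)")
    # Hypothetical employees; Cleo has no timesheet rows.
    con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                    [(1, "Ana", "Diaz"), (2, "Ben", "Okafor"), (3, "Cleo", "Park")])
    # Hypothetical work sessions, with date and time stored together as text.
    con.executemany("INSERT INTO project_timesheet VALUES (?, ?, ?)", [
        (1, "2021-05-03 09:00", "2021-05-03 13:00"),  # 4 hours
        (1, "2021-05-04 09:00", "2021-05-04 15:00"),  # 6 hours
        (2, "2021-05-03 10:00", "2021-05-03 18:30"),  # 8.5 hours
    ])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return [(first, last, fmt_hms(avg)) for first, last, avg in rows]

for row in average_time_worked():
    print(row)
```

Cleo’s average comes back as None (SQL NULL): the LEFT JOIN keeps her row, but there is nothing to average.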

If CTEs have you interested, imagine what you’ll be able to do after finishing our Recursive Queries course. Oh, yeah, I didn’t mention that a CTE can be recursive, which means it references itself. A recursive CTE starts from an initial (anchor) result, then repeatedly runs its recursive part on the rows produced by the previous step until no new rows come back; the final result combines all those steps. While CTEs can be non-recursive, recursive queries in standard SQL are always written as recursive CTEs. If you want to learn recursive queries, knowing CTEs is a must.
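As a taste of what a recursive CTE looks like, here is a sketch that walks a small, made-up org chart with Python’s sqlite3 module. The staff table and its manager_id column are hypothetical; the anchor member picks the top of the hierarchy, and the recursive member keeps adding employees whose manager is already in the result, one level at a time.

```python
import sqlite3

QUERY = """
    WITH RECURSIVE chain(id, title, level) AS (
        -- anchor member: the top of the hierarchy
        SELECT id, title, 0
        FROM staff
        WHERE manager_id IS NULL
        UNION ALL
        -- recursive member: employees whose manager is already in chain
        SELECT s.id, s.title, c.level + 1
        FROM staff s
        JOIN chain c ON s.manager_id = c.id
    )
    SELECT title, level
    FROM chain
    ORDER BY level, id
"""

def org_levels():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE staff (id INTEGER, title TEXT, manager_id INTEGER)")
    # Hypothetical org chart: manager_id points at each person's boss.
    con.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
        (1, "CEO", None), (2, "CTO", 1), (3, "Dev lead", 2),
        (4, "Developer", 3), (5, "CFO", 1),
    ])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows

for title, level in org_levels():
    print("  " * level + title)  # indent each title by its depth
```

The recursion stops on its own once a step finds no employees reporting to anyone added in the previous step.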

GROUP BY Extensions

SQL’s GROUP BY extensions give you additional ways of grouping data. This, in turn, lets you run more complex analyses and build richer reports.

There are three GROUP BY extensions:

ROLLUP
CUBE
GROUPING SETS

Unlike a regular GROUP BY, which aggregates at a single level, ROLLUP groups the data at several levels in one query. Fancy talk, but simply put: you can use ROLLUP to calculate totals and subtotals, just like in Excel pivot tables.

The CUBE extension is similar, but there’s one crucial difference: CUBE generates subtotals for every combination of the specified columns, not just for the hierarchy of levels that ROLLUP follows.

Finally, there are GROUPING SETS. A grouping set is a set of columns you use in the GROUP BY clause. Without this extension, you could get several groupings in one result only by combining separate GROUP BY queries with UNION ALL, and the more queries you combine, the messier it gets. GROUPING SETS let you achieve the same result with a much neater query by listing the grouping sets you want in a single GROUP BY clause.
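To make the difference between the three extensions concrete, this small Python sketch lists the grouping sets that ROLLUP and CUBE expand to, following the standard SQL definitions: ROLLUP (a, b, c) is shorthand for GROUPING SETS ((a, b, c), (a, b), (a), ()), while CUBE (a, b, c) covers all 2^3 = 8 subsets of the columns. The function names are illustrative only; no database is involved.

```python
from itertools import combinations

def rollup_sets(columns):
    """Grouping sets of ROLLUP(columns): each leading prefix, down to the empty set."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

def cube_sets(columns):
    """Grouping sets of CUBE(columns): every subset, largest first."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]

cols = ("manufacturer", "model", "store")
print(rollup_sets(cols))  # 4 grouping sets
print(cube_sets(cols))    # 8 grouping sets
```

The empty tuple stands for the grand total. ROLLUP only ever drops columns from the right, which is why it never produces, say, a per-store total across all manufacturers; CUBE does.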

Let me show you how ROLLUP works. Suppose you’re working for a guitar store that has several locations. You’ll sometimes need to create a report showing the total number of guitars you have in stock. Here’s a query that does that at the manufacturer, model, and store levels:

SELECT	manufacturer,
		model,
		store,
		SUM(quantity) AS quantity_sum
FROM guitars
GROUP BY ROLLUP (manufacturer, model, store)
ORDER BY manufacturer;

This doesn’t look complicated. It’s a simple SELECT statement that returns the columns manufacturer, model, and store from the table guitars. I’ve used the aggregate function SUM() to get the total quantity. Then I wrote GROUP BY followed immediately by ROLLUP and the columns in parentheses. ROLLUP groups the data by all three columns, then by manufacturer and model, then by manufacturer alone, and finally adds a grand total; in other words, it drops one column from the right at each level. Finally, the result is ordered by manufacturer.

What will this query return? Have a look:

manufacturer	model	store	quantity_sum
Fender	Jazzmaster	Amsterdam	9
Fender	Jazzmaster	New York	32
Fender	Jazzmaster	NULL	41
Fender	Stratocaster	Amsterdam	102
Fender	Stratocaster	New York	157
Fender	Stratocaster	NULL	259
Fender	Telecaster	Amsterdam	80
Fender	Telecaster	New York	212
Fender	Telecaster	NULL	292
Fender	NULL	NULL	592
Gibson	ES-335	Amsterdam	4
Gibson	ES-335	New York	26
Gibson	ES-335	NULL	30
Gibson	Les Paul	Amsterdam	21
Gibson	Les Paul	New York	42
Gibson	Les Paul	NULL	63
Gibson	SG	Amsterdam	32
Gibson	SG	New York	61
Gibson	SG	NULL	93
Gibson	NULL	NULL	186
NULL	NULL	NULL	778

The result should make it easier to see what I mean by different grouping levels. A little tip before I continue: wherever you see a NULL value in this result, the row is a subtotal (and the row with NULL in every grouping column is the grand total). Let’s have a look at the table. First, there are 9 Fender Jazzmasters in Amsterdam. Then there are 32 Fender Jazzmasters in New York. The total quantity is 41, which is what the following row shows:

manufacturer	model	store	quantity_sum
Fender	Jazzmaster	NULL	41

The NULL value in the store column means the quantities have been added up across all stores. This result reads “there are 41 Fender Jazzmasters in total, in New York and Amsterdam combined”. The same calculation is done for every other Fender model, i.e. Stratocaster and Telecaster. Then there’s this row:

manufacturer	model	store	quantity_sum
Fender	NULL	NULL	592

What does it mean? It means there are in total 592 Fenders of all three models in both stores.

The same principle applies to Gibson. For each model, the quantities in Amsterdam and New York are shown first, followed by a subtotal that sums the quantities from both stores. This is done for all three Gibson models: ES-335, Les Paul, and SG. Then there is a row showing the total number of all three Gibson models in both stores (the same as with Fenders):

manufacturer	model	store	quantity_sum
Gibson	NULL	NULL	186

Finally, there’s a row showing the total number of guitars, no matter the store, guitar manufacturer, or model:

manufacturer	model	store	quantity_sum
NULL	NULL	NULL	778
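You can check these subtotals yourself even in a database without ROLLUP. SQLite, for example, supports neither ROLLUP nor GROUPING SETS, so this sketch (Python’s sqlite3 module) uses the UNION ALL approach described earlier: one GROUP BY query per grouping level, with NULL filling the rolled-up columns. The guitars table is assumed to hold one row per manufacturer, model, and store, rebuilt from the detail rows of the result above.

```python
import sqlite3

# Stock per manufacturer, model, and store, taken from the detail rows of the result.
STOCK = [
    ("Fender", "Jazzmaster", "Amsterdam", 9), ("Fender", "Jazzmaster", "New York", 32),
    ("Fender", "Stratocaster", "Amsterdam", 102), ("Fender", "Stratocaster", "New York", 157),
    ("Fender", "Telecaster", "Amsterdam", 80), ("Fender", "Telecaster", "New York", 212),
    ("Gibson", "ES-335", "Amsterdam", 4), ("Gibson", "ES-335", "New York", 26),
    ("Gibson", "Les Paul", "Amsterdam", 21), ("Gibson", "Les Paul", "New York", 42),
    ("Gibson", "SG", "Amsterdam", 32), ("Gibson", "SG", "New York", 61),
]

# One GROUP BY per grouping set of ROLLUP (manufacturer, model, store),
# combined with UNION ALL; NULL stands in for the rolled-up columns.
ROLLUP_EMULATION = """
    SELECT manufacturer, model, store, SUM(quantity) AS quantity_sum
    FROM guitars GROUP BY manufacturer, model, store
    UNION ALL
    SELECT manufacturer, model, NULL, SUM(quantity)
    FROM guitars GROUP BY manufacturer, model
    UNION ALL
    SELECT manufacturer, NULL, NULL, SUM(quantity)
    FROM guitars GROUP BY manufacturer
    UNION ALL
    SELECT NULL, NULL, NULL, SUM(quantity)
    FROM guitars
"""

def rollup_totals(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE guitars (manufacturer TEXT, model TEXT, store TEXT, quantity INTEGER)")
    con.executemany("INSERT INTO guitars VALUES (?, ?, ?, ?)", rows)
    result = con.execute(ROLLUP_EMULATION).fetchall()
    con.close()
    return result

for row in rollup_totals(STOCK):
    print(row)
```

This returns the same 21 rows as the ROLLUP query, although without an ORDER BY their order isn’t guaranteed. Four SELECTs for three columns is already verbose, which is exactly the clutter that ROLLUP and GROUPING SETS remove.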

I’m sure you now want to find out how CUBE and GROUPING SETS work. For that, I’d recommend having a look at our GROUP BY Extensions in SQL course.

These advanced topics are something data analysts will use very often. I’ve therefore prepared some SQL constructions for my fellow data analysts. If you’re into finance, here are some advanced SQL queries for financial analysis.

Do You Consider Yourself an Advanced SQL User?

How do you feel now? Did I raise your confidence? If you already know SQL window functions, CTEs, and the GROUP BY extensions, you can brag about your advanced SQL skills.

Or maybe I did just the opposite? Perhaps I’ve shaken your confidence, and you’ve realized you don’t know anything about the advanced topics I’ve talked about in this article.

Don’t worry! Whatever group you belong to, there are LearnSQL.com courses that will help you build your knowledge and your skills. Want to learn window functions? No problem – see our Window Functions course. Interested in CTEs? You can learn and practice them in our Recursive Queries course. Need to get more out of GROUP BY? Our GROUP BY Extensions in SQL course has you covered.
