25th Aug 2023 13 minutes read SQL for Data Analysis Cheat Sheet LearnSQL.com Team cheat sheet sql data analysis Welcome to our SQL for Data Analysis Cheat Sheet — a must-have resource for anyone looking to harness the power of SQL in the realm of data analysis. Whether you're a data enthusiast, a business analyst, or a seasoned data professional, this cheat sheet is designed to empower you to extract valuable insights from your datasets. Inside, you'll find a carefully curated collection of SQL commands and tips that will elevate your data analysis game. We've distilled the most frequently employed SQL patterns in data analysis into concise references enabling you to quickly find the essential hints and syntax reminders you need. Whether you're modifying data, performing calculations, or crafting insightful reports, our cheat sheet has got you covered. Download the cheat sheet in your preferred format: A4 Format - Best for printing on A4-sized paper. Letter Format - Ideal for printing on standard Letter-sized paper. Mobile Friendly Format - Optimized for viewing on mobile devices. For visual enthusiasts: If you prefer a more visual approach or need a digital-friendly version for various applications, we also offer the cheat sheet as a high-resolution PNG image. To download, right-click (for desktop users) or long tap (for mobile users) on the image. Table of Contents SQL Basics SELECT GROUP BY ORDER BY Computations Rounding numbers Troubleshooting (integer division, division by zero) JOIN INSERT UPDATE DELETE Working with date and time Current date and time Creating date and time values Sorting chronologically Comparing date and time values Intervals Adding intervals to date and time values Filtering events to those in the last 7 days Extracting parts of dates Grouping by year and month Common SQL patterns for data analysis CASE WHEN CASE WHEN and GROUP BY GROUP BY extensions: GROUPING SETS, CUBE, ROLLUP COALESCE Common table expressions Window functions Computing the percent of total within a group Ranking Running total Moving average Difference between two rows (delta) SQL SQL, or Structured Query Language, is a language for talking to databases. It lets you select specific data and build complex reports. Today, SQL is a universal language of data, used in practically all technologies that process data. Learn the basics of SQL in our interactive SQL Basics course. SELECT Fetch the id and name columns from the product table: SELECT id, name FROM product; Concatenate the name and the description to fetch the full description of the products: SELECT name || ' - ' || description FROM product; Fetch names of products with prices above 15: SELECT name FROM product WHERE price > 15; Fetch names of products with prices between 50 and 150: SELECT name FROM product WHERE price BETWEEN 50 AND 150; Fetch names of products that are not watches: SELECT name FROM product WHERE name != 'watch'; Fetch names of products that start with a 'P' or end with an 's': SELECT name FROM product WHERE name LIKE 'P%' OR name LIKE '%s'; Fetch names of products that start with any letter followed by 'rain' (like 'train' or 'grain'): SELECT name FROM product WHERE name LIKE '_rain'; Fetch names of products with non-null prices: SELECT name FROM product WHERE price IS NOT NULL; GROUP BY AGGREGATE FUNCTIONS Count the number of products: SELECT COUNT(*) FROM product; Count the number of products with non-null prices: SELECT COUNT(price) FROM product; Count the number of unique category values: SELECT COUNT(DISTINCT category) FROM product; Get the lowest and the highest product price: SELECT MIN(price), MAX(price) FROM product; Find the total price of products for each category: SELECT category, SUM(price) FROM product GROUP BY category; Find the average price of products for each category whose average is above 3.0: SELECT category, AVG(price) FROM product GROUP BY category HAVING AVG(price) > 3.0; ORDER BY Fetch product names sorted by the price column in the default ASCending order: SELECT name FROM product ORDER BY price [ASC]; Fetch product names sorted by the price column in DESCending order: SELECT name FROM product ORDER BY price DESC; COMPUTATIONS Use +, -, *, / to do basic math. To get the number of seconds in a week: SELECT 60 * 60 * 24 * 7; -- result: 604800 ROUNDING NUMBERS Round a number to its nearest integer: SELECT ROUND(1234.56789); -- result: 1235 Round a number to two decimal places: SELECT ROUND(AVG(price), 2) FROM product WHERE category_id = 21; -- result: 124.56 TROUBLESHOOTING INTEGER DIVISION In PostgreSQL and SQL Server, the / operator performs integer division for integer arguments. If you do not see the number of decimal places you expect, it is because you are dividing between two integers. Cast one to decimal: 123 / 2 -- result: 61 CAST(123 AS decimal) / 2 -- result: 61.5 DIVISION BY 0 To avoid this error, make sure the denominator is not 0. You may use the NULLIF() function to replace 0 with a NULL, which results in a NULL for the entire expression: count / NULLIF(count_all, 0) JOIN JOIN is used to fetch data from multiple tables. To get the names of products purchased in each order, use: SELECT orders.order_date, product.name AS product, amount FROM orders JOIN product ON product.id = orders.product_id; Learn more about JOINs in our interactive SQL JOINs course. INSERT To insert data into a table, use the INSERT command: INSERT INTO category VALUES (1, 'Home and Kitchen'), (2, 'Clothing and Apparel'); You may specify the columns to which the data is added. The remaining columns are filled with predefined default values or NULLs. INSERT INTO category (name) VALUES ('Electronics'); UPDATE To update the data in a table, use the UPDATE command: UPDATE category SET is_active = true, name = 'Office' WHERE name = 'Ofice'; DELETE To delete data from a table, use the DELETE command: DELETE FROM category WHERE name IS NULL; Check out our interactive course How to INSERT, UPDATE, and DELETE Data in SQL. DATE AND TIME There are 3 main time-related types: date, time, and timestamp. Time is expressed using a 24-hour clock, and it can be as vague as just hour and minutes (e.g., 15:30 – 3:30 p.m.) or as precise as microseconds and time zone (as shown below): 14:39:53.662522-05 is almost 2:40 p.m. CDT (e.g., in Chicago; in UTC it'd be 7:40 p.m.). The letters in the above example represent: In the date part: YYYY – the 4-digit year. mm – the zero-padded month (01—January through 12—December). dd – the zero-padded day. In the time part: HH – the zero-padded hour in a 24-hour clock. MM – the minutes. SS – the seconds. Omissible. ssssss – the smaller parts of a second – they can be expressed using 1 to 6 digits. Omissible. ±TZ – the timezone. It must start with either + or -, and use two digits relative to UTC. Omissible. CURRENT DATE AND TIME Find out what time it is: SELECT CURRENT_TIME; Get today's date: SELECT CURRENT_DATE; In SQL Server: SELECT GETDATE(); Get the timestamp with the current date and time: SELECT CURRENT_TIMESTAMP; CREATING DATE AND TIME VALUES To create a date, time, or timestamp, write the value as a string and cast it to the proper type. SELECT CAST('2021-12-31' AS date); SELECT CAST('15:31' AS time); SELECT CAST('2021-12-31 23:59:29+02' AS timestamp); SELECT CAST('15:31.124769' AS time); Be careful with the last example – it is interpreted as 15 minutes 31 seconds and 124769 microseconds! It is always a good idea to write 00 for hours explicitly: '00:15:31.124769'. SORTING CHRONOLOGICALLY Using ORDER BY on date and time columns sorts rows chronologically from the oldest to the most recent: SELECT order_date, product, quantity FROM sales ORDER BY order_date; order_dateproduct quantity 2023-07-22Laptop 2 2023-07-23Mouse 3 2023-07-24Sneakers10 2023-07-24Jeans 3 2023-07-25Mixer 2 Use the DESCending order to sort from the most recent to the oldest: SELECT order_date, product, quantity FROM sales ORDER BY order_date DESC; COMPARING DATE AND TIME VALUES You may use the comparison operators <, <=, >, >=, and = to compare date and time values. Earlier dates are less than later ones. For example, 2023-07-05 is "less" than 2023-08-05. Find sales made in July 2023: SELECT order_date, product_name, quantity FROM sales WHERE order_date >= '2023-07-01' AND order_date < '2023-08-01'; Find customers who registered in July 2023: SELECT registration_timestamp, email FROM customer WHERE registration_timestamp >= '2023-07-01' AND registration_timestamp < '2023-08-01'; Note: Pay attention to the end date in the query. The upper bound '2023-08-01' is not included in the range. The timestamp '2023-08-01' is actually the timestamp '2023-08-01 00:00:00.0'. The comparison operator < is used to ensure the selection is made for all timestamps less than '2023-08-01 00:00:00.0', that is, all timestamps in July 2023, even those close to the midnight of August 1, 2023. INTERVALS An interval measures the difference between two points in time. For example, the interval between 2023-07-04 and 2023-07-06 is 2 days. To define an interval in SQL, use this syntax: INTERVAL '1' DAY The syntax consists of three elements: the INTERVAL keyword, a quoted value, and a time part keyword. You may use the following time parts: YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. Adding intervals to date and time values You may use + or - to add or subtract an interval to date or timestamp values. Subtract one year from 2023-07-05: SELECT CAST('2023-07-05' AS TIMESTAMP) - INTERVAL '1' year; -- result: 2022-07-05 00:00:00 Find customers who placed the first order within a month from the registration date: SELECT id FROM customers WHERE first_order_date > registration_date + INTERVAL '1' month; Filtering events to those in the last 7 days To find the deliveries scheduled for the last 7 days, use: SELECT delivery_date, address FROM sales WHERE delivery_date <= CURRENT_DATE AND delivery_date >= CURRENT_DATE - INTERVAL '7' DAY; Note: In SQL Server, intervals are not implemented – use the DATEADD() and DATEDIFF() functions. Filtering events to those in the last 7 days in SQL Server To find the sales made within the last 7 days, use: SELECT delivery_date, address FROM sales WHERE delivery_date <= GETDATE() AND delivery_date >= DATEADD(DAY, -7, GETDATE()); EXTRACTING PARTS OF DATES The standard SQL syntax to get a part of a date is SELECT EXTRACT(YEAR FROM order_date) FROM sales; You may extract the following fields:YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. The standard syntax does not work In SQL Server. Use the DATEPART(part, date) function instead. SELECT DATEPART(YEAR, order_date) FROM sales; GROUPING BY YEAR AND MONTH Find the count of sales by month: SELECT EXTRACT(YEAR FROM order_date) AS year, EXTRACT(MONTH FROM order_date) AS month, COUNT(*) AS count FROM sales GROUP BY year, month ORDER BY year month; yearmonthcount 20228 51 20229 58 202210 62 202211 76 202212 85 20231 71 20232 69 Note that you must group by both the year and the month. EXTRACT(MONTH FROM order_date) only extracts the month number (1, 2, ..., 12). To distinguish between months from different years, you must also group by year. More about working with date and time values in our interactive Standard SQL Functions course. CASE WHEN CASE WHEN lets you pass conditions (as in the WHERE clause), evaluates them in order, then returns the value for the first condition met. SELECT name, CASE WHEN price > 150 THEN 'Premium' WHEN price > 100 THEN 'Mid-range' ELSE 'Standard' END AS price_category FROM product; Here, all products with prices above 150 get the Premium label, those with prices above 100 (and below 150) get the Mid-range label, and the rest receives the Standard label. CASE WHEN and GROUP BY You may combine CASE WHEN and GROUP BY to compute object statistics in the categories you define. SELECT CASE WHEN price > 150 THEN 'Premium' WHEN price > 100 THEN 'Mid-range' ELSE 'Standard' END AS price_category, COUNT(*) AS products FROM product GROUP BY price_category; Count the number of large orders for each customer using CASE WHEN and SUM(): SELECT customer_id, SUM( CASE WHEN quantity > 10 THEN 1 ELSE 0 END ) AS large_orders FROM sales GROUP BY customer_id; ... or using CASE WHEN and COUNT(): SELECT customer_id, COUNT( CASE WHEN quantity > 10 THEN order_id END ) AS large_orders FROM sales GROUP BY customer_id; Learn more in our interactive Creating Basic SQL Reports course. GROUP BY EXTENSIONS GROUPING SETS GROUPING SETS lets you specify multiple sets of columns to group by in one query. SELECT region, product, COUNT(order_id) FROM sales GROUP BY GROUPING SETS ((region, product), ()); CUBE CUBE generates groupings for all possible subsets of the GROUP BY columns. SELECT region, product, COUNT(order_id) FROM sales GROUP BY CUBE (region, product); ROLLUP ROLLUP adds new levels of grouping for subtotals and grand totals. SELECT region, product, COUNT(order_id) FROM sales GROUP BY ROLLUP (region, product); COALESCE COALESCE replaces the first NULL argument with a given value. It is often used to display labels with GROUP BY extensions. SELECT region, COALESCE(product, 'All'), COUNT(order_id) FROM sales GROUP BY ROLLUP (region, product); regionproductcount USALaptop10 USAMouse 5 USAAll 15 UK Laptop6 UK All 6 AllAll 21 Check out our hands-on course on GROUP BY Extensions. COMMON TABLE EXPRESSIONS A common table expression (CTE) is a named temporary result set that can be referenced within a larger query. They are especially useful for complex aggregations and for breaking down large queries into more manageable parts. WITH total_product_sales AS ( SELECT product, SUM(profit) AS total_profit FROM sales GROUP BY product ) SELECT AVG(total_profit) FROM total_product_sales; Check out our hands-on course on Common Table Expressions. WINDOW FUNCTIONS Window functions compute their results based on a sliding window frame, a set of rows related to the current row. Unlike aggregate functions, window functions do not collapse rows. COMPUTING THE PERCENT OF TOTAL WITHIN A GROUP SELECT product, brand, profit, (100.0 * profit / SUM(profit) OVER(PARTITION BY brand)) AS perc FROM sales; productbrandprofitperc KnifeCulina100025 Pot Culina300075 Doll Toyze 200040 Car Toyze 300060 RANKING Rank products by price: SELECT RANK() OVER(ORDER BY price), name FROM product; RANKING FUNCTIONS RANK – gives the same rank for tied values, leaves gaps. DENSE_RANK – gives the same rank for tied values without gaps. ROW_NUMBER – gives consecutive numbers without gaps. namerankdense_rankrow_number Jeans 111 Leggings222 Leggings223 Sneakers434 Sneakers435 Sneakers436 T-Shirt 747 RUNNING TOTAL A running total is the cumulative sum of a given value and all preceding values in a column. SELECT date, amount, SUM(amount) OVER(ORDER BY date) AS running_total FROM sales; MOVING AVERAGE A moving average (a.k.a. rolling average, running average) is a technique for analyzing trends in time series data. It is the average of the current value and a specified number of preceding values. SELECT date, price, AVG(price) OVER( ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_averge FROM stock_prices; DIFFERENCE BETWEEN TWO ROWS (DELTA) SELECT year, revenue, LAG(revenue) OVER(ORDER BY year) AS revenue_prev_year, revenue - LAG(revenue) OVER(ORDER BY year) AS yoy_difference FROM yearly_metrics; Learn about SQL window functions in our interactive Window Functions course. Tags: cheat sheet sql data analysis You may also like SQL for Data Analysis: What Should I Learn? SQL is an essential skill for data analysis. Explore SQL’s features and get course recommendations to help you learn how to become a data analyst! Read more SQL Window Functions Cheat Sheet This 2-page SQL Window Functions Cheat Sheet covers the syntax of window functions and a list of window functions. Download it in PDF or PNG format. Read more Standard SQL Functions Cheat Sheet Your quick guide to SQL functions! Download the cheat sheet in PDF or PNG for syntax reference, examples, and tips. Read more SQL JOIN Cheat Sheet Master SQL JOINs with Our Comprehensive Cheat Sheet. Download it in PDF or PNG format. Read more Top 7 Advanced SQL Queries for Data Analysis The most essential advanced SQL queries for data analysis. Read more Best SQL Books for Data Analysis Dive into data analysis with SQL. Here are our recommendations for the best SQL books for data analysis to enhance your skills. Read more Learn SQL for Data Analysis With LearnSQL.com An in-depth look at the courses for those looking to learn SQL for data analysis. Read more Read These 5 Data Analysis Books to Jump-Start Your Career Don't know what data analysis books to read? Here are 5 great books to take your reporting to the next level! Read more These SQL Queries Will Help You With Data Analysis This article demonstrates simple SQL queries that can be used for basic data analysis. Read more Roadmap to Becoming a Data Analyst Would you like to become a data analyst? Read this article to learn what data analysts do and what steps you should take to become one. Read more Subscribe to our newsletter Join our monthly newsletter to be notified about the latest posts. Email address How Do You Write a SELECT Statement in SQL? What Is a Foreign Key in SQL? Enumerate and Explain All the Basic Elements of an SQL Query