SQLĀ forĀ DataĀ Engineering
MyĀ full course to help you
build production data pipelines withĀ SQL
All video lessons are free on YouTube
Supporter Access unlocks structure, guided practice, and community
Ā
Unlock Full Access
Pricing
FREE OPTION
Free
Features:
š„ Full YouTube Course (14+ hours)
š§āš» Two Complete PortfolioĀ Projects
š Links to Required Materials & Resources
š Real-World Dataset (2023 to mid-2025)
Ā
SUPPORTERS
$49
One-time payment ā Supporter Access adds:
š§Ŗ 170+ Interview-Level SQL Problems
šŗ Playlist-Style Lesson Videos
ā³ Progress Tracking
š¬Ā Community AccessĀ
š Course Notes
šĀ Cheat Sheets
šĀ Certificate of Completion
š Full Real-World Dataset (2023āPresent)
Course Outline
0ļøā£ Course Intro
ā¾
Course Intro
ā¾
What is SQL
ā¾
Data & Pipeline Intro
ā¾
1ļøā£ SQL Foundations
ā¾
SQL & Dataset Setup
ā¾
data_jobs database containing job posting facts and dimensions.data_jobs database with job postings, company, and skills tables for hands-on SQL practice.
Basic Keywords
ā¾
SELECT, FROM, WHERE, LIMIT, DISTINCT, and ORDER BY for querying data.SELECT * / FROMSELECT * to retrieve all columns and FROM to specify the table for querying job postings data.LIMITLIMIT to restrict the number of rows returned in query results for faster testing and data exploration.DISTINCTDISTINCT to remove duplicate values and return only unique rows from query results.WHEREWHERE clause to retrieve rows that meet specific conditions from the job postings table.IS NULL / IS NOT NULLIS NULL and IS NOT NULL operators to filter rows based on whether column values are missing or present.--) and multi-line (/* */) comments to document SQL queries for better code readability.ORDER BYORDER BY using ASC (ascending) or DESC (descending) to organize data by one or more columns.FROM, WHERE, SELECT, to ORDER BY and LIMIT for proper query construction.GROUP BY ALL and simplified column references for cleaner queries.
Comparison & Logical Operators
ā¾
=, !=, <, >, BETWEEN, IN) and logical operators (AND, OR, NOT) for filtering data.=, !=, <, >)=, !=, <, and > to filter job postings by salary, location, and other numeric or text values.AND, OR, NOT)AND, OR, and NOT logical operators to create precise data filters.BETWEEN, IN)BETWEEN for range filtering and IN to match values against a list of options in WHERE clauses.
Wildcards & Aliases
ā¾
%, _) with LIKE operator and AS keyword for creating column aliases in SQL queries.LIKELIKE operator using % (multiple characters) and _ (single character) wildcards to filter text data.ASAS keyword to alias columns and tables for clearer query results.LIKE, wildcards, and AS aliases to filter and display job postings with custom column names.
Arithmetic Operators
ā¾
* and / operators to multiply and divide numeric values for salary conversions and rate calculations.% to find remainders and identify even/odd numbers or distribute data into buckets.
Aggregate Functions
ā¾
COUNT, SUM, AVG, MIN, MAX, MEDIAN with GROUP BY and HAVING for data analysis.COUNT()COUNT() to count total rows or non-null values in a column for data analysis.COUNT(DISTINCT)COUNT(DISTINCT) to count unique values in a column, removing duplicates from the count.SUM()SUM() to add up numeric values like total salaries across job postings.AVG()AVG() to find mean values such as average salary for job roles.GROUP BYGROUP BY to organize rows into categories for aggregate calculations by job title or location.MIN() / MAX()MIN() and MAX() to identify lowest and highest salaries in datasets.MEDIAN()MEDIAN() to find the median salary, which is less affected by outliers than averages.HAVINGHAVING clause to apply conditions on aggregate functions like COUNT() or AVG().
Terminal Intro
ā¾
pwd, ls, cd)pwd (print working directory), ls (list files), and cd (change directory) to navigate file systems.mkdir, touch, rm)mkdir (make directory), touch (create file), and rm (remove files/folders) commands.
Local DuckDB Intro
ā¾
winget package manager for quick and easy setup via command line.brew install duckdb for streamlined setup.duckdb command to run SQL queries in an interactive local environment.motherduck_token environment variable and md: connection string prefix.
VS Code Intro
ā¾
Shift+Enter) to execute queries from editor to integrated terminal seamlessly.duckdb md: commands.
Data Modeling Pt.1
ā¾
ERD diagrams.ERD)information_schema)information_schema.schemata, information_schema.tables, and information_schema.columns to inspect database metadata and structure.
JOINs
ā¾
LEFT JOIN, RIGHT JOIN, INNER JOIN, and FULL OUTER JOIN for combining data from multiple related tables.foreign key relationships.LEFT JOINLEFT JOIN to return all rows from the left table and matching rows from the right table, with nulls for non-matches.RIGHT JOINRIGHT JOIN to return all rows from the right table and matching rows from the left table, with nulls for non-matches.INNER JOININNER JOIN to return only rows where matching values exist in both tables, excluding non-matches from results.FULL OUTER JOINFULL OUTER JOIN to return all rows from both tables, with nulls where matches don't exist on either side.
Order of Execution
ā¾
EXPLAIN and EXPLAIN ANALYZE for performance analysis.SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.FROM, then WHERE, GROUP BY, HAVING, SELECT, ORDER BY, and finally LIMIT.EXPLAINEXPLAIN and EXPLAIN ANALYZE commands to visualize query execution plans and identify performance bottlenecks.
š SQL Exploratory Data Analysis ā Project 1
ā¾
Project #1 Intro
ā¾
EDA #1 - In-Demand Skills
ā¾
COUNT, GROUP BY, JOIN, and ORDER BY on job postings data.
EDA #2 - Highest Paying Skills
ā¾
AVG, MEDIAN, GROUP BY, and ORDER BY with salary data.
EDA #3 - Most Optimal Skills
ā¾
LN, COUNT, AVG, and ranking calculations.
README.md Build
ā¾
README.md BuildREADME.md IntroREADME.md BuildREADME.md file with project description, setup instructions, and analysis results.
Git & GitHub Pt.1
ā¾
git config)git config command.git init, add, commit)git init, staging files with git add, and committing with git commit.git push, git pull)git push and git pull commands.
Share Project #1
ā¾
README.mdREADME.md and pushing project files with git push.
2ļøā£ Production SQL
ā¾
Data Types
ā¾
CAST.INTEGER, VARCHAR, BOOLEAN, DATE, TIMESTAMP, and DOUBLE.DESCRIBE and querying information_schema.columns to check column data types.CAST OperatorCAST operator to change column types for calculations and comparisons.
DDL & DML Pt.1
ā¾
DDL & DML Pt.1DDL (Data Definition Language) and DML (Data Manipulation Language) commands.DDL vs. DML IntroDDL commands that modify structure and DML commands that modify data.CREATE / DROP DATABASECREATE DATABASE and DROP DATABASE commands.CREATE / DROP SCHEMACREATE SCHEMA and DROP SCHEMA commands.CREATE / DROP TABLECREATE TABLE and removing tables with DROP TABLE.INSERT INTOINSERT INTO with explicit values or from query results.ALTER TABLE - ADD / DROP COLUMNALTER TABLE ADD COLUMN and ALTER TABLE DROP COLUMN commands.UPDATEUPDATE with SET and WHERE clauses.ALTER TABLE - RENAME TABLE & RENAME/ALTER COLUMNALTER TABLE RENAME TO and ALTER TABLE RENAME COLUMN commands.
DDL & DML Pt.2
ā¾
DDL & DML Pt.2DDL and DML patterns including CTAS, views, and temp tables.DDL & DML - RefresherDDL and DML fundamentals before exploring advanced patterns.CTAS - CREATE TABLE AS SELECTCREATE TABLE AS SELECT (CTAS) pattern.CREATE VIEWCREATE VIEW to simplify complex queries.CREATE TEMP TABLECREATE TEMP TABLE for session-specific data processing.DELETEDELETE with WHERE clause filtering.TRUNCATETRUNCATE.
Subqueries and CTEs
ā¾
CTEsCTEs) for modular SQL.CTEs?CTEs for nested query logic.SELECT, FROM, and WHERE clauses to filter and transform data.CTEs - Common Table ExpressionsWITH (Common Table Expressions) for readable queries.EXISTSEXISTS and NOT EXISTS operators to filter based on subquery result existence.
DDL & DML Pt.3
ā¾
DDL & DML Pt.3CREATE TABLE and INSERT INTO.INSERT INTO with JOIN logic.UPDATE / INSERT / DELETE (Refresher)UPDATE, INSERT INTO, and DELETE commands before learning MERGE.MERGE INTOMERGE INTO using WHEN MATCHED, WHEN NOT MATCHED, and WHEN NOT MATCHED BY SOURCE clauses.CTAS vs. MERGECREATE TABLE AS SELECT for full rebuilds versus MERGE for incremental updates.
Data Modeling Pt.2
ā¾
OLTP vs OLAPOLTP databases (transactional) with OLAP databases (analytical) for different workloads.
CASE Expressions
ā¾
CASE ExpressionsCASE expressions.CASE ExpressionsCASE WHEN, THEN, ELSE, and END for data categorization.CASE: Engineering Use CasesCASE expressions to data engineering scenarios like salary standardization and bucketing.CASE expressions to standardize and categorize job salary data.
Date Functions
ā¾
EXTRACT, DATE_TRUNC, and time zones.EXTRACT()EXTRACT to get year, month, day, quarter from timestamps.DATE_TRUNC()DATE_TRUNC for month, quarter, year aggregations.AT TIME ZONEAT TIME ZONE for global data analysis.
SET Operators
ā¾
SET OperatorsSET operators for combining and comparing query results.UNION / INTERSECT / EXCEPTUNION, finding common rows with INTERSECT, and differences with EXCEPT.INTERSECT and EXCEPT to identify stable job markets and detect data quality issues.
Text & NULL Functions
ā¾
REPLACE / CONCATLOWER, UPPER, TRIM, REPLACE, and CONCAT for data cleaning.LOWER, TRIM, and CASE expressions together.NULLIF / COALESCECOALESCE to provide defaults and NULLIF to convert values to NULL.COALESCE instead of nested CASE expressions.
Window Functions
ā¾
GROUP BY for adding calculations without collapsing rows.OVER, PARTITION BY, and ORDER BY clauses.PARTITION BYPARTITION BY clause in window functions to divide result sets into groups for separate calculationsORDER BYORDER BY within window functions to control row ordering for calculations like running totalsPARTITION & ORDER BYPARTITION BY and ORDER BY in window functions for grouped, ordered calculationsSUM(), AVG(), COUNT(), MIN(), and MAX() with OVER clauseROW_NUMBER(), RANK(), and DENSE_RANK() window functionsLAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE() navigation functions
Nested Functions
ā¾
ARRAY_AGG(), ARRAY_LENGTH(), and UNNEST() to collect and expand multiple valuesSTRUCT_PACK() to group named fields of different typesARRAY_AGG() and STRUCT_PACK() to create arrays of structured records for one-to-many relationshipsMAP() function and accessing values using bracket notationJSON_EXTRACT() and converting to structured types for analysisARRAY_AGG() and flattening them with UNNEST() for analysisSTRUCT_PACK() and UNNEST() to model complex job-skill relationships
Git & GitHub Pt.2
ā¾
git branch, git switch)git branch, git switch, and git checkout commandsgit add and git commit workflowgit merge when both branches have new commits
šļø End-to-End Data Pipeline ā Project 2
ā¾
Project #2 Intro
ā¾
Build Data Warehouse
ā¾
git switch -c for isolated warehouse development workCREATE TABLE, primary keys, and FOREIGN KEY constraintsINSERT INTO with read_csv() functiongit merge and cleaning up branches with git branch -d
Build Flat Table Mart
ā¾
CREATE TABLE AS SELECT using LEFT JOIN and ARRAY_AGG(STRUCT_PACK()) for skill aggregationgit commit and merging to development branch
Build Skills Mart
ā¾
CREATE TABLE, dimension tables, and fact table using DATE_TRUNC() and EXTRACT() for time aggregationCOUNT() queries and joining fact tables with dimensions for enriched sample datagit add and git commit
Build Priority Mart
ā¾
CREATE TABLE and INSERT INTOMERGE INTO using WHEN MATCHED, WHEN NOT MATCHED, and WHEN NOT MATCHED BY SOURCE clausesgit merge
README.md Build
ā¾
README.md BuildREADME.md documentation for Project #2 repository and main course repoREADME.mdREADME.md with markdown formatting to document warehouse architecture, marts, and usage instructionsREADME.mdREADME.md with project links and course overview documentationgit commit and git merge
Share Project #2
ā¾
CourseĀ Resources
š½ Course Dataset ā SQL Environment
This is the primary dataset used throughout the entire course. It contains real-world data engineering & analytics job postings (2023 to mid-2025) and is hosted in MotherDuck for instant querying.
š Step 1 ā Sign in to MotherDuck
Create your free account š https://lukeb.co/motherduck
š»Ā Step 2 ā Attach Database
Run this SQL inside the MotherDuck editor:
š Project 1 ā SQL Exploratory Data Analysis
Explore real-world job data using SQL to uncover in-demand skills, salary trends, and hiring patterns. Youāll practice EDA techniques and build your first portfolio-ready project.
šĀ Project #1Ā Repo
š https://lukeb.co/sql-de-project1
Ā
Ā šļø Project 2 ā Data Pipeline: Warehouse + Mart
Build a production-style SQL pipeline ā modeling a data warehouse and creating analytical marts. Youāll apply data modeling, transformations, and best practices to deliver a second portfolio project.
šĀ Project #2 Repo
SupporterĀ Resources
šĀ Practice Problems
Ā
š§© 170+Ā Interview-LevelĀ Problems: LearnĀ SQL fasterĀ with meticulously designed exercises spanning a range from easy to challenging
Ā
š Detailed Solutions and Results: Every problem is accompanied by a comprehensive solution andĀ your expected query results
Ā
šŗĀ Structured Video Lessons
Ā
š¢ Navigate with Ease: Jump instantly to any lesson or specific topic within the course ā no more wasting time scrubbing through hours of video to find what you need
Ā
š§ Ā Focused Learning: Master concepts more effectively with dedicated, bite-sized videos for each distinct lesson, allowing for better concentration and easier review
Ā
šļø Lesson Notes &Ā Cheat Sheets
Ā
š Structured Lesson Notes:Ā Step-by-step walkthroughs for every topic, helping you follow along with each lesson and understand why queries and pipelines are built the way they are
Ā
š Practical Cheat Sheets: Quick-reference guides for core SQL syntax, transformations, and data engineering concepts youāll reuse across projects
āØĀ Certificate of Completion
šļøĀ Certificate of Completion:Ā Receive a certificate to validate your new skills and enhance your LinkedIn profile
š§āš»Ā Showcase Experience:Ā Share how you usedĀ real-world data to help solve a problem for data professionals
Ā
About theĀ Instructors
Luke BarousseĀ - Course Instructor
Ā š Real-worldĀ Experience withĀ SQL
Spearheaded innovative projects in collaboration with MrBeast's team, integrating popular tools like SQL & Python.
Ā
š”š¤ Sharing Knowledge about Data & AIĀ
Guides a community of +600,000 data nerdsĀ in harnessing analytical tools to revolutionize their professional workflows.
š Trusted Course DeveloperĀ
Imparted wisdomĀ to +30,000 learners on DataCamp in leveraging analytical tools to elevate their career efficiency.
Ā
Ā
Kelly AdamsĀ - CourseĀ Producer
Ā š¹ļø Hands on Experience with SQL
Driving strategic decisionsĀ within the social gaming industry at Golden Hearts Games,Ā using popularĀ tools like Google BigQuery (SQL), Dataform (dbt) and Looker.
Ā
šĀ Analytics Engineering &Ā Data Pipelines
Building scalable data pipelines that support product, finance, and growth teams with reliable, decision-ready metrics.
Ā
š¹ Course Producer for Data Analytics Content
Educating an audience of +600,000 analysts about the latest data analytical tools to improve their professional skill sets.
Ā
Rikki SinghĀ - Content Developer
Ā š§āš»Ā Hands-onĀ SQL & Analytics
Works across gaming, entertainment, and marketingāusing Redshift and BigQuery toĀ query and model data,Ā andĀ builds decision-ready dashboards in Looker and Tableau.
Ā
š¼Ā Director-LevelĀ OperatorĀ
Leads analytics initiativesābringing a āwhat matters to the businessā lens toĀ every lesson and project.
š¬Ā Course Producer forĀ Data Analytics Content
Builds high-signal practice problems by benchmarking a wide range ofĀ learning platformsĀ and question styles, then translating the bestĀ patternsĀ into realistic,Ā interview-ready exercises.
100% Satisfaction Guarantee or Your Money Back
Ā
ā±ļø If you donāt feel the course problems and notes help you learnĀ this tool as it has for countless others, Iāll refund your money!
š« Email me within 30 days of purchasing the course on why you are unsatisfied, and Iāll return the full purchase price to you ASAP.