Introduction
In the realm of relational databases, understanding how to combine data from multiple tables is a fundamental skill. SQL Joins provide a powerful mechanism for fetching and consolidating information spread across different tables. This comprehensive guide aims to demystify SQL Joins, offering a step-by-step exploration from the basics to advanced techniques.JOIN, INNER JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN operations:
- A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
- Different types of Joins:
- INNER JOIN: Returns records that have matching values in both tables. In a CROSS JOIN, there is no specific condition for joining; it simply forms a combination of all possible pairs of rows from the involved tables.
- LEFT JOIN: Returns all records from the left table, and the matched records from the right table
- RIGHT JOIN: Returns all records from the right table, and the matched records from the left table
- FULL JOIN: Returns all records when there is a match in either left or right table
- CROSS JOIN: The CROSS JOIN keyword matches ALL records from the "left" table with EACH record from the "right" table.
- FULL OUTER JOIN EXCLUDING INNER JOIN: A FULL OUTER JOIN returns all rows from both tables and includes matching rows. The "EXCLUDING INNER JOIN" part suggests that you want to exclude rows that have a match (i.e., rows that would be part of an INNER JOIN) from the result set.
- UNION: The UNION operator is used to combine the result-set of two or more queries. The queries in the union must follow these rules:
- They must have the same number of columns
- The columns must have the same data types
- The columns must be in the same order
- UNION ALL: With the UNION operator, if some rows in the two queries returns the exact same result, only one row will be listed, because UNION selects only distinct values. Use UNION ALL to return duplicate values.
- FULL JOIN:
SELECT product_id, product_name, category_name
FROM products
INNER JOIN categories ON products.category_id = categories.category_id;
-- Join products to categories using the category_id column --
SELECT testproduct_id, product_name, category_name
FROM testproducts
INNER JOIN categories ON testproducts.category_id = categories.category_id;
-- Join testproducts to categories using the category_id column --
SELECT testproduct_id, product_name, category_name
FROM testproducts
FULL JOIN categories ON testproducts.category_id = categories.category_id;
SELECT testproduct_id, product_name, category_name
FROM testproducts
CROSS JOIN categories;
-- Join testproducts to categories using the CROSS JOIN keyword --
SELECT product_id, product_name
FROM products
UNION
SELECT testproduct_id, product_name
FROM testproducts
ORDER BY product_id;
-- Combine products and testproducts using the UNION operator --
SELECT product_id
FROM products
UNION
SELECT testproduct_id
FROM testproducts
ORDER BY product_id;
--
GROUP BY: The GROUP BY clause groups rows that have the same values into summary rows.
SELECT COUNT(customer_id), country
FROM customers
GROUP BY country;
-- Lists the number of customers in each country --
SELECT customers.customer_name, COUNT(orders.order_id)
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id
GROUP BY customer_name;
-- The following SQL statement lists the number of orders made by each customer --
HAVING: The HAVING clause was added to SQL because the WHERE clause cannot be used with aggregate functions
SELECT COUNT(customer_id), country
FROM customers
GROUP BY country
HAVING COUNT(customer_id) > 5;
-- List only countries that are represented more than 5 times --
SELECT order_details.order_id, SUM(products.price)
FROM order_details
LEFT JOIN products ON order_details.product_id = products.product_id
GROUP BY order_id
HAVING SUM(products.price) > 400.00;
-- lists only orders with a total price of 400$ or more --
EXISTS: The EXISTS operator is used to test for the existence of any record in a sub query.
SELECT customers.customer_name
FROM customers
WHERE EXISTS (
SELECT order_id
FROM orders
WHERE customer_id = customers.customer_id
);
-- Return all customers that is represented in the orders table --
NOT EXISTS: To check which customers that do not have any orders, we can use the NOT operator together with the EXISTS operator
SELECT customers.customer_name
FROM customers
WHERE NOT EXISTS (
SELECT order_id
FROM orders
WHERE customer_id = customers.customer_id
);
-- Return all customers that is NOT represented in the orders table --
ANY: The ANY operator allows you to perform a comparison between a single column value and a range of other values.
SELECT product_name
FROM products
WHERE product_id = ANY (
SELECT product_id
FROM order_details
WHERE quantity > 120
);
-- List products that have ANY records in the order_details table with a quantity larger than 120 --
ALL: The ALL operator:
- returns a Boolean value as a result
- returns TRUE if ALL of the sub query values meet the condition
- is used with SELECT, WHERE and HAVING statements
SELECT product_name
FROM products
WHERE product_id = ALL (
SELECT product_id
FROM order_details
WHERE quantity > 10
);
-- List the products if ALL the records in the order_details with quantity larger than 10. --
CASE: The CASE expression goes through conditions and returns a value when the first condition is met (like an if-then-else statement). Once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause. If there is no ELSE part and no conditions are true, it returns NULL.
SELECT product_name,
CASE
WHEN price < 10 THEN 'Low price product'
WHEN price > 50 THEN 'High price product'
ELSE
'Normal product'
END
FROM products;
-- Return specific values if the price meets a specific condition --
SELECT product_name,
CASE
WHEN price < 10 THEN 'Low price product'
WHEN price > 50 THEN 'High price product'
ELSE
'Normal product'
END AS "price category"
FROM products;
-- Same example, but with an alias for the case column --
References
- My github Repositories on Remote sensing Machine learning
- A Visual Introduction To Linear regression (Best reference for theory and visualization).
- Book on Regression model: Regression and Other Stories
- Book on Statistics: The Elements of Statistical Learning
- A nice mathematical description on Simple & Multiple Linear Regression Implementation From Basic To Advanced is available at youtube video.
- A nice mathematical description on Linear Regression,Cost Function,Convergence Algorithm is available at youtube video.
- A nice mathematical description on Regression Performance Metrics,MSE,MAE,RMSE, R squared is available at youtube video.
Some other interesting things to know:
- Visit my website on For Data, Big Data, Data-modeling, Datawarehouse, SQL, cloud-compute.
- Visit my website on Data engineering