Structured Query Language (SQL) is one of the most common programming languages that enables one to store, retrieve, manipulate and organize data in relational databases such as MySQL, Oracle Database, and Microsoft SQL Server.
This documentation contains a short SQL query example that combines two tables using two unique keys present in both tables.
On a high level, the example SQL code above "selects all columns from tables a and b, and joins both tables anywhere there's a match on the sk_applicant and hellofresh_week field. A NULL value is returned for non-match keys." Let's take a moment to understand what the query is doing.
In SQL, "SELECT" is a reserved keyword used to find and extract data from one or more database tables.
Whenever you use a SELECT keyword in a query, what typically comes after is the column you want to select (In the example SQL code, "SELECT a.*, b.*" retrieves all columns in tables a and b).
A database table stores data in a logically organized, row-and-column format and uses identifiers (i.e., primary and foreign keys) to uniquely identify table records and enhance referencing with other related tables.
In SQL, you can't select data in a vacuum. You always need to specify the table you want to retrieve data from. In the example query, we're fetching data from tables a and b. Another thing to note is that you can select columns from one table and JOIN the data retrieved with another table using some keywords like INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN e.t.c.
The example query uses a "FULL OUTER JOIN." A FULL OUTER JOIN returns both matching and unmatched rows and columns from "table a and b." If a record in "table a" doesn't have a match in "table b" on the join clauses, SQL fills the corresponding cell with "NULL." The same thing happens when a record in "table b" doesn't have a match in "table a."
Conventionally, programmers prefer to join tables using a single unique identifier or key. But, the example query uses two different identifiers for merging (i.e., sk_applicant and hellofresh_week).
The second index further narrows down the data and ensures there's a unique match between sk_applicant and hellofresh_week in the "table a" with the sk_applicant and hellofresh_week in "table b." It's super important as a single applicant can have multiple hellofresh_week values.
The diagram below presents a high-level overview of how the code works:
But, there are several concerns with the SQL query:
The following can be done to improve the query:
First seen here.