paint-brush
Data Mastery: SQL — Case Functionby@laurenjglass9
301 reads
301 reads

Data Mastery: SQL — Case Function

by Lauren GlassNovember 12th, 2018
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, <a href="https://hackernoon.com/fundamental-python-data-science-libraries-a-cheatsheet-part-2-4-fcf5fab9cdf1">see my article on Python’s Pandas library</a>).

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - Data Mastery: SQL — Case Function
Lauren Glass HackerNoon profile picture

Data Mastery: SQL — Case Function

There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).

This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:

  • Software Engineers who want to analyze their creation’s data
  • Product Managers, Marketers, and others who want to be data-driven
  • Beginning Data Scientists, Data Engineers, Data Analysts, or Business Intelligence Developers preparing for interviews

See my resource list of the books I used to prepare for my big interview

Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.

This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.

Case Function

A CASE function checks a condition and changes a column’s value accordingly. The CASE function is used commonly in the SELECT statement. It follows this formula:

CASE WHEN condition THEN value
     WHEN condition THEN value
     ELSE value END

At least one WHEN/THEN condition is required. You can have as many WHEN/THEN conditions as you like. CASE and END are also required.

Here is our daily_user_score table from earlier in the series:

date       | userid    | sessionid | score
------------------------------------------
2018–09–01 | 983489272 | 125       | 112
2018–09–01 | 234342423 | 34        | 112
2018–09–01 | 567584329 | 207       | 618
2018–09–02 | 983489272 | 126       | 410
2018–09–02 | 983489272 | 127       | 339

Say we know userid 983489272 is a bot and we want to add a user type to the table’s columns when we view the data. The query will look like this:

SELECT date,
       userid,
       CASE WHEN userid = 983489272 THEN ‘bot’
            ELSE ‘normal’ END AS user_type,
       sessionid,
       score
FROM daily_user_score;

These are the results of the above query:

date       | userid    | user_type | sessionid | score
------------------------------------------------------
2018–09–01 | 983489272 | bot       | 125       | 112
2018–09–01 | 234342423 | normal    | 34        | 112
2018–09–01 | 567584329 | normal    | 207       | 618
2018–09–02 | 983489272 | bot       | 126       | 410
2018–09–02 | 983489272 | bot       | 127       | 339

Try it yourself

Say the users 234342423 and 567584329 are fake accounts. Write a query that returns all the table’s columns and a user type column indicating if the corresponding userid is fake or real.

Answer

There are two correct possibilities, the first is a better answer than the second.

SELECT date,
       userid,
       CASE WHEN userid = 234342423 OR userid = 567584329 
              THEN ‘fake’
            ELSE ‘real’ END AS user_type,
       sessionid,
       score
FROM daily_user_score;
SELECT date,
       userid,
       CASE WHEN userid = 234342423 THEN ‘fake’
            WHEN userid = 567584329 THEN ‘fake’
            ELSE ‘real’ END AS user_type,
       sessionid,
       score
FROM daily_user_score;

This query returns:

date       | userid    | user_type | sessionid | score
------------------------------------------------------
2018–09–01 | 983489272 | real      | 125       | 112
2018–09–01 | 234342423 | fake      | 34        | 112
2018–09–01 | 567584329 | fake      | 207       | 618
2018–09–02 | 983489272 | real      | 126       | 410
2018–09–02 | 983489272 | real      | 127       | 339

Thanks for reading! If you have questions feel free to comment & I will try to get back to you.

Connect with me on Instagram @lauren__glass

Connect with me on LinkedIn

Check out my essentials list on Amazon

Search for me using my nametag on Instagram!