Learn SQL by Calculating Customer Lifetime Value Part 1: Setup, Counting and Filtering
Last updated December 5, 2014Motivation
As far as technical skills go, SQL is a really nice skill to have for product managers and product marketers. Instead of constantly running into performance issues in Excel or begging “technical” people to look stuff up for you, you can get answers to your questions directly from data.
Unfortunately, good, non-encyclopedic resources for learning SQL are few and far between on the Internet. Many of them just tell you the syntax without context. Many assume you are already a programmer and/or omit the technical details of setting up the environment.
The approach here is different. In a series of blog posts, I will walk through a simple but common data analysis scenario. Each step introduces a new concept in SQL that’s tied to a natural business question. Also, each query comes with a screenshot or animation of comparable operations in Excel to aid your understanding. By the end of this series, you should feel comfortable getting read-only access to your company’s databases.
Helpful Mnemonic for Excel Veterans
It is helpful to keep the following mental model throughout this tutorial:
Excel | SQL |
---|---|
Table | Table |
Sort | ORDER BY |
Filter | WHERE |
Pivot Table | GROUP BY |
VLOOKUP | JOIN |
Array Functions | Aggregation Functions |
Get the FREE e-book based on this blog series!
Scenario
Imagine that you are a product manager at an e-commerce or SaaS startup. For such a business, the critical metric is CLV: Customer Lifetime Value. If you know how much revenue each customer brings to your business over their lifetime, you have a much better idea of how to meet revenue goals and evolve your product.
Also, if you know the CLV for each customer, then you can make more efficient business decisions. Which customer acquisition channel has high CLVs? Which one is losing money? Are there regional differences?
The goal of this tutorial is to compute the average customer lifetime value from two data sources: users data and payments data.
Setup
For this series, I will be using SQLite. SQLite is simple, free, and shipped with recent versions of Mac OSX, which I assume many of you work on. Windows is also supported (See here, for example, on how to download SQLite on Windows).
Also, this might be the first time you are working with the command line interface. I promise that you won’t have to do much besides entering SQL commands.
First, let’s open up Terminal. If you are on a Mac, it’s as simple as searching for it in the search bar.
The first step is to start sqlite3 with pre-populated tables. Download bootstrap.sql and place it in the directory of your choice. Then, in your terminal, run the following command in the same directory:
sqlite3 -init bootstrap.sql
This should start up a SQLite interactive shell like this:
sqlite>
Let’s see what tables (again, they are like the tables in Excel) are there. For sqlite3, typing in “.tables” does the job:
sqlite>.tables payments users
Great. There are two tables here, “payments” and “users”. Unlike Excel, you need to actually write a query to see what the data looks like.
To do so, we run the simplest SQL query, which just grabs all the columns and rows in a table. DO NOT DO THIS for large tables. But here, the table is small (with just 10 rows), so go ahead and type in “SELECT * FROM users;” and hit enter. Do not forget the trailing semicolon.
sqlite> SELECT * FROM users; id campaign signed_up_on ---------- ---------- ------------ 1 facebook 2014-10-01 2 twitter 2014-10-02 3 direct 2014-10-02 4 facebook 2014-10-03 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 8 direct 2014-10-05 9 twitter 2014-10-05 10 organic 2014-10-05
Here is an Excel equivalent:
As you can see, this table has three fields/columns:
- “id” is the user’s ID. This will be referenced in the “payments” table
- “campaign” is the campaign used to acquire that user.
- “signed_up_on” is the date when the user signed up for the website.
Sorting with ORDER BY
Just like Excel’s tables, SQL lets you sort data by one or more columns. To do so, we add ORDER BY <column_names>
to your SQL statement.
sqlite> SELECT * FROM users ORDER BY campaign; id campaign signed_up_on ---------- ---------- ------------ 3 direct 2014-10-02 8 direct 2014-10-05 1 facebook 2014-10-01 4 facebook 2014-10-03 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 10 organic 2014-10-05 2 twitter 2014-10-02 9 twitter 2014-10-05
An Excel equivalent is as follows:
Here is how you can sort by campaign AND signed_up_on.
sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on; id campaign signed_up_on ---------- ---------- ------------ 3 direct 2014-10-02 8 direct 2014-10-05 1 facebook 2014-10-01 4 facebook 2014-10-03 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 10 organic 2014-10-05 2 twitter 2014-10-02 9 twitter 2014-10-05
An Excel equivalent is as follows:
You can also sort in reverse order by adding “DESC”.
sqlite> SELECT * FROM users ORDER BY campaign DESC; id campaign signed_up_on ---------- ---------- ------------ 2 twitter 2014-10-02 9 twitter 2014-10-05 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 10 organic 2014-10-05 1 facebook 2014-10-01 4 facebook 2014-10-03 3 direct 2014-10-02 8 direct 2014-10-05
An Excel equivalent is as follows:
And of course, “DESC” can be used when sorted by multiple columns.
sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on DESC; id campaign signed_up_on ---------- ---------- ------------ 8 direct 2014-10-05 3 direct 2014-10-02 4 facebook 2014-10-03 1 facebook 2014-10-01 10 organic 2014-10-05 7 organic 2014-10-04 5 organic 2014-10-03 6 organic 2014-10-03 9 twitter 2014-10-05 2 twitter 2014-10-02
An Excel equivalent is as follows:
Filtering with WHERE
One of Excel’s most used features is filters. Of course, SQL, also has filters. They are called WHERE clauses and can express an even broader range of filter conditions than Excel’s filters.
Here is the query that fetches all the users that signed up organically:
sqlite> SELECT * FROM users WHERE campaign = 'organic'; id campaign signed_up_on ---------- ---------- ------------ 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 10 organic 2014-10-05
An Excel equivalent is as follows:
As you can see, we have “WHERE campaign = ‘organic’ in the above query. An analog in Excel is going to the “campaign” column and selecting just ‘organic’.
What if you want to select multiple values? No problem, SQL can handle that with “IN” like this.
sqlite> SELECT * FROM users WHERE campaign IN ('facebook', 'twitter'); id campaign signed_up_on ---------- ---------- ------------ 1 facebook 2014-10-01 2 twitter 2014-10-02 4 facebook 2014-10-03 9 twitter 2014-10-05
An Excel equivalent is as follows:
The above query fetches all the users that signed up through Facebook or Twitter. As you can see, you can list up multiple fields separated by commas inside the parenthesis.
What if you wanted to fetch all the users EXCEPT the ones that came Facebook or Twitter? This is supported with “NOT IN” like this:
sqlite> SELECT * FROM users WHERE campaign NOT IN ('facebook', 'twitter'); id campaign signed_up_on ---------- ---------- ------------ 3 direct 2014-10-02 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 8 direct 2014-10-05 10 organic 2014-10-05
(Note that it’s not easy to “exclude” particular values for a given column in Excel).
Okay, but all the filtering thus far involved a single column. Can SQL filter by multiple columns? The answer is yes, and it uses AND to concatenate filter conditions. The following query fetches all the users that came from Facebook or Twitter campaigns that signed up on Oct. 1, 2014.
sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') AND signed_up_on = '2014-10-01'; id campaign signed_up_on ---------- ---------- ------------ 1 facebook 2014-10-01
An Excel equivalent is as follows:
Now it’s time to show that SQL’s WHERE is more powerful than Excel’s filters. In addition to AND, You can actually use OR to say something like “Get me all the users that signed up before 2014-10-04 OR came in organically”. Here is the query:
sqlite> SELECT * FROM users WHERE campaign = 'organic' OR signed_up_on < '2014-10-04'; id campaign signed_up_on ---------- ---------- ------------ 1 facebook 2014-10-01 2 twitter 2014-10-02 3 direct 2014-10-02 4 facebook 2014-10-03 5 organic 2014-10-03 6 organic 2014-10-03 7 organic 2014-10-04 10 organic 2014-10-05
(As explained above, this is not easy to do in Excel, either. Most likely, you would need to create an additional column.)
Filtering AND Sorting
As you might have guessed by now, SQL allows you to filter and sort in one pass. The syntax is simple: Have both WHERE and ORDER BY in your query, but make sure WHERE comes before ORDER BY. Here is a query that fetches all the Facebook-/Twitter-sourced users, sorted by campaign.
sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') ORDER BY campaign; id campaign signed_up_on ---------- ---------- ------------ 1 facebook 2014-10-01 4 facebook 2014-10-03 2 twitter 2014-10-02 9 twitter 2014-10-05
What’s Next?
Congratulations! You now know how to do the SQL equivalent of Excel’s sorting and filtering. In the next post, I will show how to do “pivot tables” in SQL. In the meantime, if you are interested in use-case specific SQL query templates, check out our library!
Contact me on twitter @kiyototamura or leave a comment if you have any questions about this.
Continue reading Part 2!