In a previous article, we covered how to export data from BigQuery to Google Sheets or Data Studio for further analysis.
But what if we want to run a quick analysis directly in BigQuery?
With just a little bit of knowledge of SQL, you can write your own queries and quickly analyze your data, directly in BigQuery.
Let's go over the main concepts you need to know before running your first queries in BigQuery.
NOTE:
This article covers just the basics of SQL queries in BigQuery. For a more in-depth look, check out our free BigQuery course here.
QUERY EDITOR
The Query Editor is the principal tool you'll use to write and modify queries in BigQuery, which you can access at https://console.cloud.google.com/bigquery.
You can also access the Query Editor from the Google Cloud Platform Dashboard, via the hamburger menu at the top left.
KEY TIP
The BigQuery option may appear at the bottom the first time you look for it, but you can pin it to the top using the pin logo when you hover over it.
You can directly start writing a new query from within the editor. Or, if you have a table open in "preview mode", you can select "QUERY TABLE" to populate a basic query for that table:
CORE ELEMENTS OF A SQL QUERY
Below you can see an example of a simple SQL query. The query takes a table (GA) from a specific project and database and gives us the fields month and sessions from it.
SELECT
month,
sessions,
FROM `webris-recipes.wqa.ga`
The core elements of the query are in all caps: the SELECT and FROM parameters. ALL SQL QUERIES MUST INCLUDE THESE.
SELECT
This defines the columns to retrieve from the table, and any processing you'd like to perform on those columns. In the query above, we are selecting two fields to be retrieved: month and sessions.
FROM
If the SELECT statement is the what of the query, the FROM statement is the where.
FROM defines the table that we'd like to pull data from, and in BigQuery it's structured as:
'project.dataset.table'
Think of it as the address to find the data, navigating down through the levels of granularity (project, dataset, table). So, from our example above:
`webris-recipes.wqa.ga`
webris-recipes is the project.
wqa is the dataset.
ga is the table.
OTHER CORE STATEMENTS
Although the SELECT and FROM statements are the only required parameters to run a query, you'll usually want to add some type of conditional logic or grouping to your query.
WHERE
This statement creates a logical condition within the query. The resulting query will only include data for which the result of the logical condition is true.
In the example below, building up from our first query, we added a condition where only sessions belonging to the month of may ('2020-05-01') appear.
SELECT
month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'
This statement can be combined with all sort of logical clauses such as
- Greater than (>)
- Lower than (<)
- Equal or greater than (>=)
- Lower or equal than (=<)
- Not equal to (!=)
- Not in (value1, value2)
- In (value1, value2)
Your first conditional logic will always be noted by a WHERE parameter.
After that, you can add on 2 or more bits of conditional logic with AND / OR statements.
For the example above, we added the condition where sessions are greater or equal than 100.
SELECT
month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'
AND sessions >= 100
It is worth pointing out that the logical comparison can be done on any fields of the table, not just fields we are returning in the query SELECT.
In the example below, we bring only results that have the hostname 'condingisforlosers.com'.
SELECT
month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE hostname = 'codingisforlosers.com'
GROUP BY
This statement allows you to roll up query results by a dimension or set of dimensions.
Let's say that you want to get the total value of sessions by month and hostname. Using GROUP BY:
SELECT
month,
hostname,
sum(sessions)
FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'
GROUP BY month, hostname
Every column within the query that you're not grouping by, must be aggregated in some way.
In the case above, given we want the total sessions, we aggregate them by summing, and the two other fields have to used as dimensions and included in the GROUP BY statement.
Aggregation can be done by a large array of arithmetic and logic operations, including but not limited to:
- sum
- avg
- median
- standard deviation
- count
- count(distinct( ))
- max
- min
ORDER BY
This statement allows you to order the results of your query by one or more fields.
The query below sorts results by month in ascending order and then by the sum of sessions in descending order.
So the results will appear in chronological order, and for the same month, the results with the highest sum of sessions will appear first.
SELECT
month,
hostname,
sum(sessions) as sessions
FROM `webris-recipes.wqa.ga`
GROUP BY month, hostname
ORDER BY month asc, sessions desc
The ORDER BY statement can include any number of fields as long as they are separated by a comma (','), as well as specifying either ascending order (asc) or descending order (desc).