In a previous article, we covered how to export data from BigQuery to Google Sheets or Data Studio for further analysis.

But what if we want to run a quick analysis directly in BigQuery?

With just a little bit of knowledge of SQL, you can write your own queries and quickly analyze your data, directly in BigQuery.

Let's go over the main concepts you need to know before running your first queries in BigQuery.

NOTE:

This article covers just the basics of SQL queries in BigQuery. For a more in-depth look, check out our free BigQuery course here.

QUERY EDITOR

The Query Editor is the principal tool you'll use to write and modify queries in BigQuery, which you can access at https://console.cloud.google.com/bigquery.

You can also access the Query Editor from the Google Cloud Platform Dashboard, via the hamburger menu at the top left.

KEY TIP

The BigQuery option may appear at the bottom the first time you look for it, but you can pin it to the top using the pin logo when you hover over it.

You can directly start writing a new query from within the editor. Or, if you have a table open in "preview mode", you can select "QUERY TABLE" to populate a basic query for that table:

CORE ELEMENTS OF A SQL QUERY

Below you can see an example of a simple SQL query. The query takes a table (GA) from a specific project and database and gives us the fields month and sessions from it.

SELECT 

month,
sessions,
FROM `webris-recipes.wqa.ga`

The core elements of the query are in all caps: the SELECT and FROM parameters. ALL SQL QUERIES MUST INCLUDE THESE.

SELECT

This defines the columns to retrieve from the table, and any processing you'd like to perform on those columns. In the query above, we are selecting two fields to be retrieved: month and sessions.

FROM

If the SELECT statement is the what of the query, the FROM statement is the where.

FROM defines the table that we'd like to pull data from, and in BigQuery it's structured as:

'project.dataset.table'

Think of it as the address to find the data, navigating down through the levels of granularity (project, dataset, table). So, from our example above:

`webris-recipes.wqa.ga`

webris-recipes is the project.

wqa is the dataset.

ga is the table.

OTHER CORE STATEMENTS

Although the SELECT and FROM statements are the only required parameters to run a query, you'll usually want to add some type of conditional logic or grouping to your query.

WHERE

This statement creates a logical condition within the query. The resulting query will only include data for which the result of the logical condition is true.

In the example below, building up from our first query, we added a condition where only sessions belonging to the month of may ('2020-05-01') appear.

SELECT 

month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'

This statement can be combined with all sort of logical clauses such as

  • Greater than (>)

  • Lower than (<)

  • Equal or greater than (>=)

  • Lower or equal than (=<)

  • Not equal to (!=)

  • Not in (value1, value2)

  • In (value1, value2)

Your first conditional logic will always be noted by a WHERE parameter.

After that, you can add on 2 or more bits of conditional logic with AND / OR statements.

For the example above, we added the condition where sessions are greater or equal than 100.

SELECT 

month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'

AND sessions >= 100

It is worth pointing out that the logical comparison can be done on any fields of the table, not just fields we are returning in the query SELECT.

In the example below, we bring only results that have the hostname 'condingisforlosers.com'.

SELECT 

month,
sessions,
FROM `webris-recipes.wqa.ga`
WHERE hostname = 'codingisforlosers.com'

GROUP BY

This statement allows you to roll up query results by a dimension or set of dimensions.

Let's say that you want to get the total value of sessions by month and hostname. Using GROUP BY:

SELECT 
month,
hostname,
sum(sessions)

FROM `webris-recipes.wqa.ga`
WHERE month = '2020-05-01'
GROUP BY month, hostname

Every column within the query that you're not grouping by, must be aggregated in some way.

In the case above, given we want the total sessions, we aggregate them by summing, and the two other fields have to used as dimensions and included in the GROUP BY statement.

Aggregation can be done by a large array of arithmetic and logic operations, including but not limited to:

  • sum

  • avg

  • median

  • standard deviation

  • count

  • count(distinct( ))

  • max

  • min

ORDER BY

This statement allows you to order the results of your query by one or more fields.

The query below sorts results by month in ascending order and then by the sum of sessions in descending order.

So the results will appear in chronological order, and for the same month, the results with the highest sum of sessions will appear first.

SELECT 
month,
hostname,
sum(sessions) as sessions

FROM `webris-recipes.wqa.ga`
GROUP BY month, hostname
ORDER BY month asc, sessions desc

The ORDER BY statement can include any number of fields as long as they are separated by a comma (','), as well as specifying either ascending order (asc) or descending order (desc).

OTHER RESOURCES TO CHECK OUT

Did this answer your question?