SQL insights enable you to directly access data in PostHog using SQL queries. They're powered by HogQL.
SQL commands
You create SQL queries out of a list of commands that shape what data, filters, aggregations, and ordering we want to see.
SQL queries in PostHog don’t require the trailing semi-colon (
;
) of traditional SQL queries.
SELECT
Use SELECT
to select data (usually columns, transformations, or aggregations) from one or more tables in the database. You can use arithmetic or functions to modify the selected data.
SELECTevent,timestamp,properties.$current_url,concat(properties.$lib, ' - ', properties.$lib_version)FROM events
Common values to select are *
(representing all), event
, timestamp
, properties
, and functions like count()
. You can access properties using dot notation like person.properties.$initial_browser
. These values can be found in the data management properties tab or inside tables in the database warehouse tab.
Add the DISTINCT
clause to SELECT
commands to keep only unique rows in query results.
SELECT DISTINCT person_idFROM events
FROM
Use FROM
to select the database table to run the queries on. In PostHog, examples include events
, groups
, raw_session_replay_events
, and more listed in the data management database tab.
SELECT session_id, min_first_timestamp, click_countFROM raw_session_replay_events
FROM
also enables you to break down your query into subqueries. This is useful for analyzing multiple groups of data. For example, to get the difference in event count between the last 7 days and the 7 days before that, you can use two subqueries like this:
SELECTlast_7_days.last_7_days_count - previous_7_days.previous_7_days_countFROM(SELECT COUNT(*) AS last_7_days_countFROM eventsWHERE timestamp > now() - INTERVAL 7 DAYAND timestamp <= now()) AS last_7_days,(SELECT COUNT(*) AS previous_7_days_countFROM eventsWHERE timestamp > now() - INTERVAL 14 DAYAND timestamp <= now() - INTERVAL 7 DAY) AS previous_7_days
JOIN
You can query over multiple tables together using the JOIN
command. This combines the tables and returns different records depending on which of the four conditions you use:
INNER JOIN
: Records with matching values in both tables.LEFT JOIN
: All records from the left table and the matched records from the right table.RIGHT JOIN
: All records from the right table and the matched records from the left table.FULL JOIN
: All records matching either the left or right tables.
The command then takes one table before it and another after it and combines them based on the join condition using the ON
keyword. For example, below, the events table is the left table and the persons table is the right table:
SELECT events.event, persons.is_identifiedFROM eventsLEFT JOIN persons ON events.person_id = persons.id
This is especially useful when querying using the data warehouse and querying external sources. For example, once you set up the Stripe connector, you can query for a count of events from your customers like this:
SELECT events.distinct_id, COUNT() AS event_countFROM eventsINNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.emailGROUP BY events.distinct_idORDER BY event_count DESC
WHERE
Use WHERE
to filter rows based on specified conditions. These conditions can be:
- Comparison operators like
=
,!=
,<
, or>=
- Logical operators like
AND
,OR
, orNOT
. These are often used to combine multiple conditions. - Functions like
toDate
,today()
- Clauses like
LIKE
,IN
,IS NULL
,BETWEEN
SELECT *FROM eventsWHERE event = '$pageview'AND toDate(timestamp) = today()AND properties.$current_url LIKE '%/blog%'
WHERE
is also useful for querying across multiple tables. For example, if you have the Hubspot connector set up, you can get a count of events for contacts with a query like this:
SELECT COUNT() AS event_count, distinct_idFROM eventsWHERE distinct_id IN (SELECT email FROM hubspot_contacts)GROUP BY distinct_idORDER BY event_count DESC
GROUP BY
Use GROUP BY
to group rows that have the same values in specified columns into summary rows. It is often used in combination with aggregate functions.
selectproperties.$os,count()from eventsgroup byproperties.$os
ORDER BY
Use ORDER BY
to sort the query results by one or more columns. You can specify order by ascending with ASC
or descending with DESC
.
SELECTproperties.$browser,count()FROM eventsGROUP BY properties.$browserORDER BY count() DESC
LIKE
Use LIKE
to search for a specified pattern in a column. This is often done in the WHERE
command. You can also use ILIKE
to make the search case insensitive.
Use the %
sign to represent any set of characters (wildcard). Use the _
sign to define a single character wildcard.
For example, to get all the current URLs that contain the string "docs" you can use:
SELECT properties.$current_urlFROM eventsWHERE properties.$current_url LIKE '%docs%'
AS
Use AS
to alias columns or tables with different names. This makes the query and results more readable.
SELECTproperties.$current_url as current_url,count() as current_url_countFROM eventsGROUP BY current_urlORDER BY current_url_count DESC
LIMIT
Use LIMIT
to restrict the number of rows returned by the query. It specifies the maximum number of rows the query should retrieve. By default, PostHog sets it at 100
.
SELECTproperties.$lib as library,count() as library_countFROM eventsWHERE properties.$lib != ''GROUP BY libraryORDER BY library_count DESCLIMIT 1
To paginate results, you can use the OFFSET
command. For example, to get the 101-150th rows, you can use it like this:
SELECT event, count(*)FROM eventsGROUP BY eventORDER BY count(*) DESCLIMIT 50 OFFSET 100
HAVING
Use HAVING
with the GROUP BY
command to filter the results based on aggregate function values. While WHERE
filters rows before grouping, HAVING
filters grouped results after aggregation.
SELECTproperties.$os,count()FROM eventsGROUP BYproperties.$osHAVING count() > 100
WITH
Use WITH
to define a temporary result set that you can reference within a larger query. It helps break down complex queries into smaller parts. You can think of it as a function that returns a temporary table similar to using a subquery in a FROM
command. The difference is that we query WITH
subqueries each time they are used, potentially leading to slower queries.
with first_query as (selectcount() as first_countfrom events)selectfirst_countfrom first_query
WINDOW
Use WINDOW
to query data across a set of rows related to the current row without grouping the rows in the output like aggregates do. This is useful for complex queries that need row-level detail while also aggregating data from a set of rows.
A window function contains multiple parts:
PARTITION BY
: Divides the rows into partitions which the function then applies to. Each partition is processed separately.ORDER BY
: Sorts rows within each partition.Frame Specification: Defines the subset of rows to include in the window such as
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
.
It is useful for analysis like running totals, moving averages, ranking, and percentiles. For example, to get a running total of pageviews, you can:
SELECTevent,timestamp,COUNT(*) OVER w AS running_total_pageviewsFROM eventsWHERE event = '$pageview'WINDOW w AS (ORDER BY timestamp)ORDER BY timestamp DESC
CASE
Use CASE
within the SELECT
command to implement conditional logic within your SQL queries. It allows you to execute different SQL expressions based on conditions. It is similar to if/else statements in programming.
For example, to group properties.$os
values in mobile
, desktop
, and other
, you can use:
SELECTCASEWHEN properties.$os IN ('iOS', 'Android') THEN 'mobile'WHEN properties.$os IN ('Windows', 'Mac OS X', 'Linux', 'Chrome OS') THEN 'desktop'ELSE 'other'END AS os_category,COUNT(*) AS os_countFROM eventsWHERE properties.$os IS NOT NULLGROUP BY os_categoryORDER BY os_count DESC
Comments
Use two dashes (--
) to write comments.
SELECT *FROM events-- WHERE event = '$pageview'
Useful functions
HogQL includes many functions to aggregate and manipulate queried data. Below are some examples of some of the most popular SQL functions you can use in your insights.
Aggregate functions
These aggregate results for columns across all rows. They include:
avg()
: Calculates the average numeric value of a column.sum()
: Calculates the total (sum) numeric value of a column.max()
,min()
: Finds the maximum or minimum value of a column.
SELECTavg(properties.$screen_height),sum(properties.$screen_height),max(properties.$screen_height),min(properties.$screen_height)FROM eventsWHERE event = '$pageview' AND properties.$screen_height IS NOT NULL
You can find a full list of these in supported aggregations.
Count
Use count()
to count the number of rows in a particular column. count(*)
counts all rows, while count(column_name)
counts the number of non-null values in a column.
Regular functions
HogQL provides many functions for accessing, modifying, and calculating data from queries. Along with the ones listed below, many basics include calculation operators (+
, -
, /
, *
), type conversions (toInt
, toString
), conditional statements (if
, multiIf
), and rounding (floor
, round
).
You can find a full list of these in supported ClickHouse functions.
Date and time
now()
,today()
,yesterday()
: Returns the current time, date, or yesterday’s date respectively.interval
: A length of time for use in arithmetic operations with other dates and times.toDayOfWeek
,toHour
,toMinute
: Converts date number of day of week (1-7), hour in 24-hour time (0-23), and minute in hour (0-59).toStartOfYear
,toStartOfMonth
,toMonday
,toStartOfDay
,toStartOfMinute
: rounds date down to the nearest year, month, Monday, day, hour, or minute respectivelydateDiff('unit', startdate, enddate)
: Returns the count inunit
betweenstartdate
andenddate
.formatDateTime
: Formats a time according to a MySQL datetime format string.
SELECTformatDateTime(now(), '%a %b %T') AS current_time,toDayOfWeek(now()) AS current_day_of_week,dateDiff('day', timestamp, now()) AS days_since_eventFROM eventsWHERE timestamp > now() - interval 1 day
Read more examples in How to do time-based breakdowns (hour, minute, real time) and Using HogQL for advanced time and date filters.
String
extract
: Extracts a fragment of a string using a regular expression.concat
: Concatenates strings listed without separator.splitByChar
,splitByString
,splitByRegexp
,splitByWhitespace
: splits a string into substring separated by a specified character, string, regular expression, or whitespace character respectively.match
: Return whether the string matches a regular expression pattern.replaceOne
,replaceRegexpOne
: Replace the first occurrence of matching a substring or regular expression pattern respectively with a replacement string.trim
: Remove specified characters (or whitespace) from the start or end of a string.upper
,lower
: Converts a string to uppercase or lowercase.
selectextract(elements_chain, '[:|"]attr__class="(.*?)"') as class_name,concat(properties.$os, ' version: ', properties.$os_version),replaceRegexpOne(properties.$current_url, '^/', 'site/') AS modified_current_urlfrom eventswhere event = '$autocapture'
Read more in How to analyze autocapture events with HogQL.
JSON
You can access nested data in JSON and objects directly.
select properties.$set.$geoip_country_namefrom events
You can parse JSON with JSONExtractRaw()
to return a value.
SELECTJSONExtractRaw(properties.$set) as set_propertiesFROM eventsWHERE properties.$set IS NOT NULL
Specialized JSONExtract
functions exist for different data types including:
JSONExtractFloat
JSONExtractArrayRaw
JSONExtractString
JSONExtractBool
Array
arrayElement(arr, n)
: Retrieves the element with the index of n from the arrayarr
.arrayJoin(arr)
: Takes a row and generates multiple rows for the number of elements in the array. It copies all the column values, except the column where this function is applied. It replaces the applied column with the corresponding array value.
SELECT flag, count()FROM (SELECT arrayJoin(JSONExtractArrayRaw(assumeNotNull(properties.$active_feature_flags))) as flagFROM eventsWHERE event = '$pageview' and timestamp > '2023-08-01')GROUP BY flagORDER BY count() desc
Read more in How to filter and breakdown arrays with HogQL.
Sparkline
A sparkline is a tiny graph contained in one cell of your query result. As an argument, it takes an array of integers.
SELECT sparkline(range(1, 10)) FROM (SELECT 1)
You can use it to visualize queries, such as a 24-hour $pageview
count for different $current_url
values.
SELECTpageview,sparkline(arrayMap(h -> countEqual(groupArray(hour), h), range(0,23))),count() as pageview_countFROM(SELECTproperties.$current_url as pageview,toHour(timestamp) AS hourFROMeventsWHEREtimestamp > now () - interval 1 dayand event = '$pageview') subqueryGROUP BYpageviewORDER BYpageview_count desc
You can also use it for art.
selectsparkline(arrayMap(a -> cos(toSecond(timestamp) + a/4), range(100 + 5 * toSecond(timestamp))))from events
Accessing data
Strings and quotes
Quotation symbols work the same way they would work with ClickHouse, which inherits from ANSI SQL:
- Single quotes (
'
) for Strings literals. - Double quotes (
"
) and Backticks (`) for DataBase identifiers.
For example:
SELECT * FROM events WHERE properties.`$browser` = 'Chrome'
Types
Types (and names) for the accessible data can be found in the database, properties tabs in data management as well as in the data warehouse tab for external sources. They include:
STRING
(default)JSON
(accessible with dot or bracket notation)DATETIME
(inISO-8601
, read more in our data docs)INTEGER
NUMERIC
(AKA float)BOOLEAN
For example:
SELECT round(properties.$screen_width * properties.$screen_height / 1000000, 2) as `Screen MegaPixels` FROM events LIMIT 100
This works because $screen_width
and $screen_height
are both defined as numeric properties. Thus you can multiply them.
To cast a string property into a different type, use type conversion functions, such astoString
, toDate
, toFloat
, JSONExtractString
, JSONExtractInt
, and more.
Property access
To access a property stored on an event or person, just use dot notation. For example properties.$browser
or person.properties.$initial_browser
. You can also use bracket notation like properties['$feature/cool-flag']
.
Nested property or JSON access, such as properties.$some.nested.property
, works as well.
PostHog's properties include always include
$
as a prefix, while custom properties do not (unless you add it).
Property identifiers must be known at query time. For dynamic access, use the JSON manipulation functions from below on the properties
field directly.
Some queries can error when accessing null values. To avoid this, use the COALESCE
function to replace null values with a default value or filter NULL
values with IS NOT NULL
and use assumeNotNull
to cast a column to a non-null type.