前言
在擬定商業策略的時候 Cohorts Retention 一直是很常被用來參考的指標, 透過這個指標可以了解客戶對於產品的黏著度倒底有多高, 進而去調整產品規劃或是公司的營運方向Cohorts Retention Rate
又稱回客率, 也就是當過了一段時間之後仍還繼續使用產品的客戶比率, 若以七天為一個週期來看的話, 就是每週的回客率每週回客率 = 當週的舊客戶數/第0週的所有客戶數
舉例來說, 以下是一個週回客率的表格, 總共分析六週(Week 0 ~ Week 5), 而在Week 0 出現的客戶群即在左方日期區間內出現的客戶群, 而紅色箭頭的 25.7% 指的是出現在Week 0(Feb 3 - Feb 9) 的客戶仍然有出現在Week 1的(即當週的舊客戶)占所有出現在 Week 0 的客戶的比例
SQL 計算 Retention
如下圖所示, Retention 計算出來之後會呈現出一個左上三角形的圖表
而這整張表的計算其實並不複雜, 基本上只要知道第一列怎麼算, 剩下的也都不是問題, 只要如法炮製就好
Step 1. 首先算出第0週的所有使用者
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
Step 2. 計算第一週(Week 1)還留存多少客戶
這個部分就是將第0週的客戶資料跟第1週的直接做JOIN, 就可得知在第一週時有哪些第0週的客戶回流了, 如下
WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
)
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '201808015'
依此類推可以陸續算出第二第三週的回客數
若想要更近一步算出比率(Retention Rate), 可以調整查詢語法如下所示WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
),
week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '20180815'
),
week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180814' AND '20180822'
),
week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180821' AND '20180829'
)
SELECT
(SELECT COUNT(*) FROM week_0) AS week_0_cohort,
(SELECT COUNT(*) FROM week_1) AS week_1_cohort,
(SELECT COUNT(*) FROM week_2) AS week_2_cohort,
(SELECT COUNT(*) FROM week_3) AS week_3_cohort
SELECT
( week_0_cohort/week_0_cohort ) AS week_0,
( week_1_cohort/week_0_cohort ) AS week_1,
( week_2_cohort/week_0_cohort ) AS week_2,
( week_3_cohort/week_0_cohort ) AS week_3
FROM (
WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
),
week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '20180815'
),
week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180814' AND '20180822'
),
week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180821' AND '20180829'
)
SELECT
(SELECT COUNT(*) FROM week_0) AS week_0_cohort,
(SELECT COUNT(*) FROM week_1) AS week_1_cohort,
(SELECT COUNT(*) FROM week_2) AS week_2_cohort,
(SELECT COUNT(*) FROM week_3) AS week_3_cohort
)
如何計算任意週期的Retention
剛剛的示範是計算週回客數, 若計算的週期不是七天而是3天時, 可以修改查詢語法如下DECLARE start_day, period INT64;
SET start_day = UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00"));
SET period = 3600*1000*1000*24*3;
WITH base AS (
SELECT user_pseudo_id,
event_timestamp,
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _table_suffix BETWEEN '20180731' AND '20180819'
)
宣告兩個變數, 開始日期start_day和週期period, 待會兒在計算每個週期的回客數時, 可以透過這兩個變數以start_day+(N* period) AND start_day+ ( (N+1) * period)作為過濾條件, 來撈出特定日期區間內的使用者WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM base
WHERE event_timestamp BETWEEN start_day+(1* period) AND start_day+(2* period)
),
完整語法如下
DECLARE start_day, period INT64;
SET start_day = UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00"));
SET period = 3600*1000*1000*24*3;
WITH base AS (
SELECT user_pseudo_id,
event_timestamp,
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _table_suffix BETWEEN '20180731' AND '20180819'
)
SELECT
( week_0_cohort/week_0_cohort ) AS week_0,
( week_1_cohort/week_0_cohort ) AS week_1,
( week_2_cohort/week_0_cohort ) AS week_2,
( week_3_cohort/week_0_cohort ) AS week_3
FROM (
WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM base
WHERE event_timestamp BETWEEN start_day+(0* period) AND start_day+(1* period)
),
week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE event_timestamp BETWEEN start_day+(1* period) AND start_day+(2* period)
),
week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE event_timestamp BETWEEN start_day+(2* period) AND start_day+(3* period)
),
week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE event_timestamp BETWEEN start_day+(3* period) AND start_day+(4* period)
)
SELECT
(SELECT COUNT(*) FROM week_0) AS week_0_cohort,
(SELECT COUNT(*) FROM week_1) AS week_1_cohort,
(SELECT COUNT(*) FROM week_2) AS week_2_cohort,
(SELECT COUNT(*) FROM week_3) AS week_3_cohort
)
以上的語法只有算出第一列
如果要算出剩下的部分可以作部分修改再將結果UNION起來
REF:
https://appworks.tw/startup-retention/
留言
張貼留言