跳到主要內容

Retention Cohorts Using BigQuery



前言

在擬定商業策略的時候 Cohorts Retention 一直是很常被用來參考的指標, 透過這個指標可以了解客戶對於產品的黏著度倒底有多高, 進而去調整產品規劃或是公司的營運方向

Cohorts Retention Rate

又稱回客率,  也就是當過了一段時間之後仍還繼續使用產品的客戶比率, 若以七天為一個週期來看的話, 就是每週的回客率

每週回客率 = 當週的舊客戶數/第0週的所有客戶數

舉例來說, 以下是一個週回客率的表格, 總共分析六週(Week 0 ~ Week 5), 而在Week 0 出現的客戶群即在左方日期區間內出現的客戶群, 而紅色箭頭的 25.7% 指的是出現在Week 0(Feb 3 - Feb 9) 的客戶仍然有出現在Week 1的(即當週的舊客戶)占所有出現在 Week 0 的客戶的比例






SQL 計算 Retention


如下圖所示, Retention 計算出來之後會呈現出一個左上三角形的圖表

而這整張表的計算其實並不複雜, 基本上只要知道第一列怎麼算, 剩下的也都不是問題, 只要如法炮製就好

接下來的演練將使用Open Data的資料


Step 1. 首先算出第0週的所有使用者

SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE  _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'

Step 2. 計算第一週(Week 1)還留存多少客戶


這個部分就是將第0週的客戶資料跟第1週的直接做JOIN, 就可得知在第一週時有哪些第0週的客戶回流了, 如下

WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE  _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
)

SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '201808015'


依此類推可以陸續算出第二第三週的回客數

WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE  _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
),

week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '20180815'
),

week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180814' AND '20180822'
),

week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180821' AND '20180829'
)

SELECT
 (SELECT COUNT(*) FROM week_0) AS week_0_cohort,
 (SELECT COUNT(*) FROM week_1) AS week_1_cohort,
 (SELECT COUNT(*) FROM week_2) AS week_2_cohort,
 (SELECT COUNT(*) FROM week_3) AS week_3_cohort
若想要更近一步算出比率(Retention Rate), 可以調整查詢語法如下所示

SELECT
 ( week_0_cohort/week_0_cohort ) AS week_0,
 ( week_1_cohort/week_0_cohort ) AS week_1,
 ( week_2_cohort/week_0_cohort ) AS week_2,
 ( week_3_cohort/week_0_cohort ) AS week_3
FROM (

WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE  _TABLE_SUFFIX BETWEEN '20180731' AND '20180807'
),

week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180807' AND '20180815'
),

week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180814' AND '20180822'
),

week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM `firebase-public-project.analytics_153293282.events_*`
JOIN week_0 using(user_pseudo_id)
WHERE _TABLE_SUFFIX BETWEEN '20180821' AND '20180829'
)

SELECT
 (SELECT COUNT(*) FROM week_0) AS week_0_cohort,
 (SELECT COUNT(*) FROM week_1) AS week_1_cohort,
 (SELECT COUNT(*) FROM week_2) AS week_2_cohort,
 (SELECT COUNT(*) FROM week_3) AS week_3_cohort
)

如何計算任意週期的Retention

剛剛的示範是計算週回客數, 若計算的週期不是七天而是3天時, 可以修改查詢語法如下

DECLARE start_day, period INT64;
SET start_day = UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00"));
SET period = 3600*1000*1000*24*3;

WITH base AS (
   SELECT user_pseudo_id,
          event_timestamp,                   
   FROM `firebase-public-project.analytics_153293282.events_*`
   WHERE _table_suffix BETWEEN '20180731' AND '20180819'
)
宣告兩個變數, 開始日期start_day和週期period, 待會兒在計算每個週期的回客數時,  可以透過這兩個變數以start_day+(N* period) AND start_day+ ( (N+1) * period)作為過濾條件, 來撈出特定日期區間內的使用者

WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM base
WHERE   event_timestamp BETWEEN start_day+(1* period) AND start_day+(2* period)
),


完整語法如下

DECLARE start_day, period INT64;
SET start_day = UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00"));
SET period = 3600*1000*1000*24*3;

WITH base AS (
   SELECT user_pseudo_id,
          event_timestamp,                   
   FROM `firebase-public-project.analytics_153293282.events_*`
   WHERE _table_suffix BETWEEN '20180731' AND '20180819'
)

 SELECT
 ( week_0_cohort/week_0_cohort ) AS week_0,
 ( week_1_cohort/week_0_cohort ) AS week_1,
 ( week_2_cohort/week_0_cohort ) AS week_2,
 ( week_3_cohort/week_0_cohort ) AS week_3
FROM (

WITH week_0 AS(
SELECT DISTINCT user_pseudo_id
FROM base
WHERE   event_timestamp BETWEEN start_day+(0* period) AND start_day+(1* period)
),

week_1 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE  event_timestamp BETWEEN start_day+(1* period) AND start_day+(2* period)
),

week_2 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE  event_timestamp BETWEEN start_day+(2* period) AND start_day+(3* period)
),

week_3 AS (
SELECT DISTINCT user_pseudo_id
FROM base
JOIN week_0 USING(user_pseudo_id)
WHERE  event_timestamp BETWEEN start_day+(3* period) AND start_day+(4* period)
)

SELECT
 (SELECT COUNT(*) FROM week_0) AS week_0_cohort,
 (SELECT COUNT(*) FROM week_1) AS week_1_cohort,
 (SELECT COUNT(*) FROM week_2) AS week_2_cohort,
 (SELECT COUNT(*) FROM week_3) AS week_3_cohort
)

以上的語法只有算出第一列
如果要算出剩下的部分可以作部分修改再將結果UNION起來




REF: 
https://appworks.tw/startup-retention/

留言

這個網誌中的熱門文章

[解決方法] docker: permission denied

前言 當我們執行docker 指令時若出現以下錯誤訊息 docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.26/containers/create: dial unix /var/run/docker.sock: connect: permission denied. See 'docker run --help'. 表示目前的使用者身分沒有權限去存取docker engine, 因為docker的服務基本上都是以root的身分在執行的, 所以在指令前加sudo就能成功執行指令 但每次實行docker指令(就連docker ps)都還要加sudo實在有點麻煩, 正確的解法是 我們可以把目前使用者加到docker群組裡面, 當docker service 起來時, 會以這個群組的成員來初始化相關服務 sudo groupadd docker sudo usermod -aG docker $USER 需要退出重新登錄後才會生效 Workaround 因為問題是出在權限不足, 如果以上方法都不管用的話, 可以手動修改權限來解決這個問題 sudo chmod 777 /var/run/docker.sock https://docs.docker.com/install/linux/linux-postinstall/

[C#] Visual Studio, 如何在10分鐘內快速更改命名專案名稱

前言: 由於工作需要, 而且懶得再重寫類似的專案, 所以常常將之前寫的專案複製一份加料後, 再重新命名編譯 假設今天我有一個專案HolyUWP, 我想把它重新命名成 BestUWP 時該怎麼做? 以下是幾個簡單的的步驟 使用Visual Studio 2017 備份原來專案 更改Solution名稱 更改Assembly name, Default namespce 更改每支程式碼的Namespace 更改專案資料夾名稱 備份原來專案 由於怕改壞掉, 所以在改之前先備份 更改Solution名稱 更改sln的名稱, 這邊我改成BestUWP.sln 使用Visual Studio打開你的.sln, 右鍵點擊Solution後選擇Rename, 這邊我把它重新命名成BestUWP(跟檔案名稱一致) 必要的話可以順便修改Porject名稱 更改Assembly name, Default namespce 進入 Project > OOXX Properties    修改Assembly Name, Default namesapce 更改每支程式碼的Namespace 基本上隨便挑一支有用到預設Namesapce(HolyUWP)的程式碼來改就好了 重新命名後點擊Apply,  這個動作做完後所有用到舊Namespace的程式碼都會被改成新的 更改專案資料夾名稱 以上動作做完後, 基本上就可以把專案編譯出來測看看了~

[解決方法] mac 作業系統上無法使用 docker

  錯誤訊息 Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? 原因 因為 docker 的設計是走 client-server 的架構,  如果少裝了 server 的部分就會出現以上的錯誤訊息 解決方法 因為 docker daemon 需要使用 linux kernel 上的某些功能, 所以若想要在 mac 的 OS X 上使用 docker 必須額外起一台 linux VM 給 docker daemon 用  Step 1. 安裝 virtual box $ brew install virtualbox --cask   Step 2. 安裝 docker machine $ brew install docker-machine --cask   Step 3. 設定 使用 docker-machine 建立 VM 跑容器 $docker-machine create --driver virtualbox default $docker-machine restart   輸出環境變數 $docker-machine env default 如果執行以上的指令出現錯誤訊息 Error checking TLS connection: ...  可以執行以下指令重新產生憑證 $docker-machine regenerate-certs 最後套用環境變數, 讓 docker 知道要怎麼去跟這台 VM 溝通  $eval $(docker-machine env default)   測試 若做完以上的步驟沒噴錯誤訊息的話, 可以跑個 hello-world 看看 docker daemon 有沒有起來 $docker run hello-world Unable to find image 'hello-world:latest' locally latest: Pulling from library/hello-world 0e03bdcc26d7: Pull complete Digest: sha256:95ddb6c31407e84e91a986b004aee40975cb0