The simplest way to achieve this goal is to use the KubernetesPodOperator to run the papermill command.
For example, we overwrite the entrypoint with /bin/bash and pass a script that runs papermill to execute the specified notebook:

from airflow.contrib.operators import kubernetes_pod_operator  # Airflow 1.10 / Cloud Composer import path

run_notebook = kubernetes_pod_operator.KubernetesPodOperator(
    task_id="run-notebook",
    name="run-notebook",
    namespace='default',
    is_delete_operator_pod=True,
    image_pull_policy="IfNotPresent",
    startup_timeout_seconds=3600,
    cmds=['/bin/bash'],
    arguments=["-c", """
        # Create an isolated conda env, register it as a Jupyter kernel,
        # then run the notebook with papermill.
        echo y | conda create --name=runenv python=3.8
        source /opt/conda/etc/profile.d/conda.sh
        conda activate runenv
        conda install -y ipykernel
        python -m ipykernel install --user --name=runenv
        papermill \
            "{{dag_run.conf['nbIn']}}" \
            "{{dag_run.conf['nbOut']}}" \
            -k runenv \
            {{dag_run.conf['nbParams']}}
    """],
    image='gcr.io/deeplearning-platform-release/base-cu100')
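For this snippet to run, the operator must belong to a DAG. A minimal wrapper sketch, assuming Airflow 1.10 on Cloud Composer; the dag_id and start_date are placeholders:

from datetime import datetime
from airflow import models

# Hypothetical DAG definition: create the operator above inside this block
# (or pass dag=dag) so Airflow registers the task.
with models.DAG(
        dag_id="run_notebook",        # placeholder name
        schedule_interval=None,       # run only when triggered externally
        start_date=datetime(2021, 1, 1),
) as dag:
    ...  # run_notebook = kubernetes_pod_operator.KubernetesPodOperator(...)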
As you can see, arguments is a templated field, so papermill receives the parameters passed to the DAG run:
nbIn
The path of the notebook we want to execute.
nbOut
The path where the execution result (the output notebook) is written.
nbParams
Parameters that will be passed to the notebook, in papermill's -p name value format (see the sketch after this list).
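For reference, papermill applies nbParams by overriding variables defined in the notebook cell tagged "parameters". A minimal sketch of such a cell, matching the dataset parameter used in the trigger example below; the default value is a placeholder:

# Notebook cell tagged "parameters": papermill injects a new cell after it
# that overrides these defaults, e.g. `-p dataset gs://andy-data/`.
dataset = "gs://default-bucket/"  # placeholder default, replaced at run time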
With these parameters, we can trigger the DAG as shown below:
import requests

resp = requests.request(
    "POST",
    url,
    headers={
        "Authorization": "Bearer {}".format(google_open_id_connect_token)
    },
    json={
        "conf": {
            "nbIn": "gs://andy-notebook/in/template.ipynb",
            "nbOut": "gs://andy-notebook/out/template.ipynb",
            "nbParams": "-p dataset gs://andy-data/",
        },
        "replace_microseconds": 'false',
    })
That's it!
Useful commands
List kernels
$ jupyter kernelspec list
Available kernels:
global-tf-python-3 /home/felipe/.local/share/jupyter/kernels/global-tf-python-3
local_venv2 /home/felipe/.local/share/jupyter/kernels/local_venv2
python2 /home/felipe/.local/share/jupyter/kernels/python2
python36 /home/felipe/.local/share/jupyter/kernels/python36
scala /home/felipe/.local/share/jupyter/kernels/scala
Remove kernel
$ jupyter kernelspec remove old_kernel
Install kernel
$ python -m ipykernel install --user --name=My_Project_Name
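The same information is available programmatically. A minimal sketch, assuming the jupyter_client package is installed:

from jupyter_client.kernelspec import KernelSpecManager

# Maps kernel names (e.g. "runenv") to their kernelspec directories.
for name, path in KernelSpecManager().find_kernel_specs().items():
    print(name, path)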