The simplest way to achieve this goal is to use the KubernetesPodOperator to run the papermill command.
For example, we overwrite the entrypoint with /bin/bash and pass a script that runs papermill to execute the specified notebook:

from airflow.contrib.operators import kubernetes_pod_operator  # Airflow 1.10 / Cloud Composer import path

run_notebook = kubernetes_pod_operator.KubernetesPodOperator(
    task_id="run-notebook",
    name="run-notebook",
    namespace='default',
    is_delete_operator_pod=True,
    image_pull_policy="IfNotPresent",
    startup_timeout_seconds=3600,
    cmds=['/bin/bash'],
    arguments=["-c", """
        # Create an isolated conda env, register it as a Jupyter kernel,
        # then run the notebook with papermill.
        echo y | conda create --name=runenv python=3.8
        source /opt/conda/etc/profile.d/conda.sh
        conda activate runenv
        conda install -y ipykernel
        python -m ipykernel install --user --name=runenv
        papermill \
            "{{dag_run.conf['nbIn']}}" \
            "{{dag_run.conf['nbOut']}}" \
            -k runenv \
            {{dag_run.conf['nbParams']}}
    """],
    image='gcr.io/deeplearning-platform-release/base-cu100')
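For this snippet to run, the operator must belong to a DAG. A minimal wrapper sketch, assuming Airflow 1.10 on Cloud Composer; the dag_id and start_date are placeholders:

from datetime import datetime
from airflow import models

# Hypothetical DAG definition: create the operator above inside this block
# (or pass dag=dag) so Airflow registers the task.
with models.DAG(
        dag_id="run_notebook",        # placeholder name
        schedule_interval=None,       # run only when triggered externally
        start_date=datetime(2021, 1, 1),
) as dag:
    ...  # run_notebook = kubernetes_pod_operator.KubernetesPodOperator(...)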
As you can see, arguments is a templated field, so papermill receives the parameters passed to the DAG run:
nbIn
The path of the notebook we want to execute.
nbOut
The path where the execution result (the output notebook) is written.
nbParams
Parameters that will be passed to the notebook, in papermill's -p name value format (see the sketch after this list).
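For reference, papermill applies nbParams by overriding variables defined in the notebook cell tagged "parameters". A minimal sketch of such a cell, matching the dataset parameter used in the trigger example below; the default value is a placeholder:

# Notebook cell tagged "parameters": papermill injects a new cell after it
# that overrides these defaults, e.g. `-p dataset gs://andy-data/`.
dataset = "gs://default-bucket/"  # placeholder default, replaced at run time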
With these parameters, we can trigger the DAG as shown below:
import requests

resp = requests.request(
    "POST",
    url,
    headers={
        "Authorization": "Bearer {}".format(google_open_id_connect_token)
    },
    json={
        "conf": {
            "nbIn": "gs://andy-notebook/in/template.ipynb",
            "nbOut": "gs://andy-notebook/out/template.ipynb",
            "nbParams": "-p dataset gs://andy-data/",
        },
        "replace_microseconds": 'false',
    })
That's it!
Useful commands
List kernels
$ jupyter kernelspec list
Available kernels:
global-tf-python-3 /home/felipe/.local/share/jupyter/kernels/global-tf-python-3
local_venv2 /home/felipe/.local/share/jupyter/kernels/local_venv2
python2 /home/felipe/.local/share/jupyter/kernels/python2
python36 /home/felipe/.local/share/jupyter/kernels/python36
scala /home/felipe/.local/share/jupyter/kernels/scala
Remove kernel
$ jupyter kernelspec remove old_kernel
Install kernel
$ python -m ipykernel install --user --name=My_Project_Name
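The same information is available programmatically. A minimal sketch, assuming the jupyter_client package is installed:

from jupyter_client.kernelspec import KernelSpecManager

# Maps kernel names (e.g. "runenv") to their kernelspec directories.
for name, path in KernelSpecManager().find_kernel_specs().items():
    print(name, path)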