Commit

Copy editing Beam ML notebooks (apache#26226)
* Copy editing the Beam ML notebooks

* typo fixes

* updated note capitalization

* Update examples/notebooks/beam-ml/automatic_model_refresh.ipynb

Co-authored-by: Danny McCormick <dannymccormick@google.com>

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
rszper and damccorm authored Apr 12, 2023
1 parent 0e8c3c2 commit 78db671
Showing 9 changed files with 171 additions and 151 deletions.
69 changes: 37 additions & 32 deletions examples/notebooks/beam-ml/automatic_model_refresh.ipynb
@@ -15,16 +15,6 @@
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/AnandInguva/beam/blob/notebook/beam/examples/notebooks/beam-ml/side_Input_model_updates.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"source": [
@@ -57,7 +47,16 @@
{
"cell_type": "markdown",
"source": [
"# Update ML models in running pipelines"
"# Update ML models in running pipelines\n",
"\n",
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/automatic_model_refresh.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/automatic_model_refresh.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>\n"
],
"metadata": {
"id": "ZUSiAR62SgO8"
@@ -66,13 +65,13 @@
{
"cell_type": "markdown",
"source": [
"The pipeline in this notebook uses a RunInference `PTransform` to run inference on images using TensorFlow models. To update the model, it uses a side input `PCollection` that emits `ModelMetadata`.\n",
"\n",
"You can use side inputs to update your model in real-time, even while the Apache Beam pipeline is running. The side input is passed in a `ModelHandler` configuration object. You can update the model either by leveraging one of Apache Beam's provided patterns, such as the `WatchFilePattern`, or by configuring a custom side input `PCollection` that defines the logic for the model update.\n",
"This notebook demonstrates how to perform automatic model updates without stopping your Apache Beam pipeline.\n",
"You can use side inputs to update your model in real time, even while the Apache Beam pipeline is running. The side input is passed in a `ModelHandler` configuration object. You can update the model either by leveraging one of Apache Beam's provided patterns, such as the `WatchFilePattern`, or by configuring a custom side input `PCollection` that defines the logic for the model update.\n",
"\n",
"The pipeline in this notebook uses a RunInference `PTransform` with TensorFlow machine learning (ML) models to run inference on images. To update the model, it uses a side input `PCollection` that emits `ModelMetadata`.\n",
"For more information about side inputs, see the [Side inputs](https://beam.apache.org/documentation/programming-guide/#side-inputs) section in the Apache Beam Programming Guide.\n",
"\n",
"This example uses `WatchFilePattern` as a side input. `WatchFilePattern` is used to watch for the file updates matching the `file_pattern` based on timestamps. It emits the latest `ModelMetadata`, which is used in the RunInference `PTransform` to automatically update the ML model without stopping the Apache Beam pipeline.\n"
"This example uses `WatchFilePattern` as a side input. `WatchFilePattern` is used to watch for file updates that match the `file_pattern` based on timestamps. It emits the latest `ModelMetadata`, which is used in the RunInference `PTransform` to automatically update the ML model without stopping the Apache Beam pipeline.\n"
],
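The cell text above says `WatchFilePattern` watches for files that match `file_pattern` and emits the latest one based on timestamps. The selection step can be illustrated in plain Python. This is a conceptual sketch only, not Beam's implementation: `FileInfo`, `latest_matching_file`, and the sample listing are hypothetical names and data introduced here for illustration.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional, Sequence


class FileInfo(NamedTuple):
    """Hypothetical stand-in for one file listing entry."""
    path: str
    updated_at: float  # last-updated time, in seconds since the epoch


def latest_matching_file(
    files: Sequence[FileInfo], file_pattern: str
) -> Optional[FileInfo]:
    """Return the most recently updated file matching file_pattern, or None."""
    matches = [f for f in files if fnmatchcase(f.path, file_pattern)]
    if not matches:
        return None
    # The newest timestamp wins, mirroring "latest model" selection.
    return max(matches, key=lambda f: f.updated_at)


# Assumed sample listing; the paths and timestamps are made up.
listing = [
    FileInfo("gs://BUCKET_NAME/resnet101.h5", updated_at=100.0),
    FileInfo("gs://BUCKET_NAME/notes.txt", updated_at=300.0),
    FileInfo("gs://BUCKET_NAME/resnet152.h5", updated_at=200.0),
]
newest = latest_matching_file(listing, "gs://BUCKET_NAME/*.h5")
print(newest.path if newest else "no match")
```

In the notebook itself, the emitted value is a `ModelMetadata` element consumed by RunInference as a side input; the sketch covers only the "match, then pick the newest" idea.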
"metadata": {
"id": "tBtqF5UpKJNZ"
@@ -84,7 +83,7 @@
"## Before you begin\n",
"Install the dependencies required to run this notebook.\n",
"\n",
"To use RunInference with side inputs for automatic model updates, install `Apache Beam` version `2.46.0` or later."
"To use RunInference with side inputs for automatic model updates, use Apache Beam version 2.46.0 or later."
],
"metadata": {
"id": "SPuXFowiTpWx"
@@ -147,11 +146,14 @@
{
"cell_type": "markdown",
"source": [
"## Runner\n",
"## Configure the runner\n",
"\n",
"This pipeline runs on the Dataflow Runner. Ensure that you have all the required permissions to run the pipeline on Dataflow.\n",
"This pipeline uses the Dataflow Runner. To run the pipeline, you need to complete the following tasks:\n",
"\n",
"Configure the pipeline options for the pipeline to run on Dataflow. Make sure the pipeline is using streaming mode."
"* Ensure that you have all the required permissions to run the pipeline on Dataflow.\n",
"* Configure the pipeline options for the pipeline to run on Dataflow. Make sure the pipeline is using streaming mode.\n",
"\n",
"In the following code, replace `BUCKET_NAME` with the name of your Cloud Storage bucket."
],
"metadata": {
"id": "ORYNKhH3WQyP"
@@ -172,7 +174,7 @@
"# Set the Google Cloud region that you want to run Dataflow in.\n",
"options.view_as(GoogleCloudOptions).region = 'us-central1'\n",
"\n",
"# IMPORTANT: Update the following line to choose a Cloud Storage location.\n",
"# IMPORTANT: Replace BUCKET_NAME with the name of your Cloud Storage bucket.\n",
"dataflow_gcs_location = \"gs://BUCKET_NAME/tmp/\"\n",
"\n",
"# The Dataflow staging location. This location is used to stage the Dataflow pipeline and the SDK binary.\n",
@@ -220,10 +222,12 @@
{
"cell_type": "markdown",
"source": [
"## TensorFlow ModelHandler\n",
" This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on imagenet as our initial model used for inference.\n",
"## Use the TensorFlow model handler\n",
" This example uses `TFModelHandlerTensor` as the model handler and the `resnet_101` model trained on [ImageNet](https://www.image-net.org/).\n",
"\n",
" Download the model from [Google Cloud Storage](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels.h5) (link downloads the model), and place it in the directory that you want to use to update your model.\n",
"\n",
" Download the model from [Google Cloud Storage](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet101_weights_tf_dim_ordering_tf_kernels.h5) (link downloads the model), and place it in the directory that you want to use to update your model."
"In the following code, replace `BUCKET_NAME` with the name of your Cloud Storage bucket."
],
"metadata": {
"id": "_AUNH_GJk_NE"
@@ -319,8 +323,7 @@
"source": [
"1. Create a `PeriodicImpulse` transform, which emits output every `n` seconds. The `PeriodicImpulse` transform generates an infinite sequence of elements with a given runtime interval.\n",
"\n",
" In this example, `PeriodicImpulse` mimics the Pub/Sub source. Because the inputs in a streaming pipeline arrive in intervals, use `PeriodicImpulse` to output elements at `m` intervals.\n",
"\n",
" In this example, `PeriodicImpulse` mimics the Pub/Sub source. Because the inputs in a streaming pipeline arrive in intervals, use `PeriodicImpulse` to output elements at `m` intervals.\n",
"To learn more about `PeriodicImpulse`, see the [`PeriodicImpulse` code](https://github.com/apache/beam/blob/9c52e0594d6f0e59cd17ee005acfb41da508e0d5/sdks/python/apache_beam/transforms/periodicsequence.py#L150)."
],
"metadata": {
@@ -353,7 +356,7 @@
"source": [
"2. To read and pre-process the images, use the `read_image` function. This example uses `Cat-with-beanie.jpg` for all inferences.\n",
"\n",
" **Note**: Image used for prediction is licensed in CC-BY, creator in listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
" **Note**: Image used for prediction is licensed in CC-BY. The creator is listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
],
"metadata": {
"id": "8-sal2rFAxP2"
@@ -385,7 +388,8 @@
"cell_type": "markdown",
"source": [
"3. Pass the images to the RunInference `PTransform`. RunInference takes `model_handler` and `model_metadata_pcoll` as input parameters.\n",
" * `model_metadata_pcoll` is a [side input](https://beam.apache.org/documentation/programming-guide/#side-inputs) `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline. Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.h5` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/*.h5'`.\n",
" * `model_metadata_pcoll` is a side input `PCollection` to the RunInference `PTransform`. This side input is used to update the `model_uri` in the `model_handler` without needing to stop the Apache Beam pipeline.\n",
" * Use `WatchFilePattern` as side input to watch a `file_pattern` matching `.h5` files. In this case, the `file_pattern` is `'gs://BUCKET_NAME/*.h5'`.\n",
"\n"
],
"metadata": {
@@ -418,8 +422,7 @@
"cell_type": "markdown",
"source": [
"4. Post-process the `PredictionResult` object.\n",
"\n",
" When the inference is complete, RunInference outputs a `PredictionResult` object that contains the fields `example`, `inference`, and `model_id`. The `model_id` field identifies the model used to run the inference. The `PostProcessor` returns the predicted label and the model ID used to run the inference on the predicted label."
"When the inference is complete, RunInference outputs a `PredictionResult` object that contains the fields `example`, `inference`, and `model_id`. The `model_id` field identifies the model used to run the inference. The `PostProcessor` returns the predicted label and the model ID used to run the inference on the predicted label."
],
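The post-processing step described above takes a `PredictionResult` with the fields `example`, `inference`, and `model_id` and returns the predicted label together with the model ID. A minimal sketch of that shape follows, under stated assumptions: the `PredictionResult` class here is a local stand-in that only mirrors the three fields named in the text, `post_process` and the label list are hypothetical, and `inference` is assumed to be a flat sequence of per-class scores. How the notebook's own `PostProcessor` decodes labels is not shown in this diff.

```python
from typing import Any, NamedTuple, Optional, Sequence, Tuple


class PredictionResult(NamedTuple):
    """Local stand-in with the three fields named in the notebook text."""
    example: Any
    inference: Any
    model_id: Optional[str] = None


def post_process(
    result: PredictionResult, labels: Sequence[str]
) -> Tuple[str, Optional[str]]:
    """Return (predicted label, model ID) for a single prediction result."""
    scores = list(result.inference)
    # Index of the highest score selects the predicted label.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index], result.model_id


# Assumed sample data; the scores and labels are made up.
result = PredictionResult(
    example="Cat-with-beanie.jpg",
    inference=[0.1, 0.7, 0.2],
    model_id="gs://BUCKET_NAME/resnet152.h5",
)
print(post_process(result, labels=["dog", "cat", "bird"]))
```

Carrying `model_id` through to the output is what makes a model swap visible downstream: once the side input delivers a new model, the second element of the tuple changes.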
"metadata": {
"id": "lTA4wRWNDVis"
@@ -442,9 +445,9 @@
{
"cell_type": "markdown",
"source": [
"**How to watch for the automatic model update**\n",
"### Watch for the model update\n",
"\n",
" After the pipeline starts processing data and when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in `.h5` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier. You can download a copy of the model by clicking [this link](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet152_weights_tf_dim_ordering_tf_kernels.h5). RunInference uses `WatchFilePattern` as a side input to update the `model_uri` of `TFModelHandlerTensor`."
"After the pipeline starts processing data and when you see output emitted from the RunInference `PTransform`, upload a `resnet152` model saved in `.h5` format to a Google Cloud Storage bucket location that matches the `file_pattern` you defined earlier. You can [download a copy of the model](https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet152_weights_tf_dim_ordering_tf_kernels.h5) (link downloads the model). RunInference uses `WatchFilePattern` as a side input to update the `model_uri` of `TFModelHandlerTensor`."
],
"metadata": {
"id": "wYp-mBHHjOjA"
@@ -453,7 +456,9 @@
{
"cell_type": "markdown",
"source": [
"## Run the pipeline"
"## Run the pipeline\n",
"\n",
"Use the following code to run the pipeline."
],
"metadata": {
"id": "_ty03jDnKdKR"
2 changes: 1 addition & 1 deletion examples/notebooks/beam-ml/run_inference_multi_model.ipynb
@@ -72,7 +72,7 @@
"\n",
"For more information about the RunInference API, review the [RunInference notebook](https://colab.research.google.com/drive/111USL4VhUa0xt_mKJxl5nC1YLOC8_yF4?usp=sharing#scrollTo=746b67a7-3562-467f-bea3-d8cd18c14927).\n",
"\n",
"**Note:** all images are licensed CC-BY, creators are listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
"**Note:** All images are licensed CC-BY, and creators are listed in the [LICENSE.txt](https://storage.googleapis.com/apache-beam-samples/image_captioning/LICENSE.txt) file."
],
"metadata": {
"id": "6vZWSLyuM_P4"
2 changes: 1 addition & 1 deletion examples/notebooks/beam-ml/run_inference_pytorch.ipynb
@@ -68,7 +68,7 @@
"id": "A8xNRyZMW1yK"
},
"source": [
"This notebook demonstrates the use of the RunInference transform for PyTorch. Apache Beam includes implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class for [users of PyTorch](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.pytorch_inference.html). For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",
"This notebook demonstrates the use of the RunInference transform for PyTorch. Apache Beam includes implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class for [users of PyTorch](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.pytorch_inference.html). For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation.\n",
"\n",
"\n",
"This notebook illustrates common RunInference patterns, such as:\n",
2 changes: 1 addition & 1 deletion examples/notebooks/beam-ml/run_inference_sklearn.ipynb
@@ -69,7 +69,7 @@
},
"source": [
"This notebook demonstrates the use of the RunInference transform for [scikit-learn](https://scikit-learn.org/), also called sklearn.\n",
"Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class prebuilt for scikit-learn. For more information about the RunInference API, see [Machine Learning](https://beam.apache.org/documentation/sdks/python-machine-learning) in the Apache Beam documentation.\n",
"Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) has implementations of the [ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class prebuilt for scikit-learn. For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation.\n",
"\n",
"You can choose the appropriate model handler based on your input data type:\n",
"* [NumPy model handler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.sklearn_inference.html#apache_beam.ml.inference.sklearn_inference.SklearnModelHandlerNumpy)\n",