== Summary ==
This page is a guide for ML Team members to start experimenting with helm and ml-serve. The guide is meant to outline the procedure to deploy a KServe InferenceService.


== Helm at the WMF ==
The starting point is the [[gerrit:admin/repos/operations/deployment-charts|deployment-charts]] repository, which is split into two macro parts: charts and helmfile configs. [[Deployment pipeline]] is the official guide for the more common k8s services, and a strongly suggested read.


=== Charts ===
The charts are [https://helm.sh/ Helm] charts, which we can think of (at a high level) as Debian packages for service definitions. Each chart has a version (in <code>Chart.yaml</code>), and every time we update, add or delete anything in a chart we need to bump that version. Charts are then deployed according to their version (more details later on). A Helm chart is a collection of yaml files, and templates can be used to control their content. A special file called <code>values.yaml</code> contains the default values for all the placeholders/variables used in the templates.
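As a hedged illustration (paths and version numbers are made up), this is roughly how you could check and bump a chart version locally before sending a change to review:<syntaxhighlight lang="bash">
# Illustrative only: the exact location of the chart inside deployment-charts may differ.
cd deployment-charts/charts/kserve-inference

# Check the current chart version declared in Chart.yaml.
grep '^version' Chart.yaml        # e.g. "version: 0.1.7"

# After editing anything under templates/ or values.yaml, bump that version in the
# same commit (e.g. 0.1.7 -> 0.1.8), otherwise the change will not be released.
helm lint .                       # basic sanity check of the chart before code review
</syntaxhighlight>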


In our case we have created multiple charts, but the ones that matter for this tutorial are:


- <code>kserve</code>


- <code>kserve-inference</code>


The <code>kserve</code> chart contains the KServe upstream yaml file with all the Kubernetes Resource definitions needed to deploy the service. For example, it takes care of creating the <code>kserve</code> namespace config and the <code>kserve-controller-manager</code> pod, which periodically checks for <code>InferenceService</code> resources and creates the related pods when needed. This chart changes only when KServe needs to be upgraded (a new upstream version is out) or when we want to tune some of its configs.


The <code>kserve-inference</code> chart is where we define <code>InferenceService</code> resources, which correspond to the pods that implement our ML services. The idea is to hide from the user all the complexity of an <code>InferenceService</code> config, reducing the boilerplate copy/paste. The template used in the chart allows the definition of a list of <code>InferenceService</code> resources, but its <code>values.yaml</code> file doesn't contain any values for that list. This is because, as mentioned previously, the chart should only contain default settings (something that can be applied to any cluster, for example). We deploy Helm charts via Helmfile, see the next section for more info!


=== Helmfile ===
Helm is a nice deployment tool for Kubernetes, one of the de-facto standards. As more complex infrastructures were built on Kubernetes, the need quickly emerged to configure Helm deployments with hierarchical settings and groups of charts bundled together, introducing the concept of cluster/environment. This is why helmfile was created! It is basically a very nice wrapper around Helm, implementing features that Helm does not include. In the WMF use case, it allows the definition of multiple Kubernetes clusters (main-staging, main-eqiad, main-codfw, ml-serve-eqiad, ml-serve-codfw) and the management of various helm charts with a hierarchical config.

The major difference with Helm charts is that Helmfile configs don't have a version, and they have a totally different syntax from regular charts (but they allow the use of templating in yaml files).
== Real life example ==
Let's start with a real life example, namely deploying a new revscoring-based model. In a world without Helm/Helmfile, we would craft something like the following:<syntaxhighlight lang="yaml">
apiVersion: v1
kind: Secret
metadata:
  name: test-secret
  annotations:
    serving.kserve.io/s3-endpoint: thanos-swift.discovery.wmnet
    serving.kserve.io/s3-usehttps: "1"
    serving.kserve.io/s3-region: "us-east-1"
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: someaccount
  AWS_SECRET_ACCESS_KEY: somepassword
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: testelukey
secrets:
- name: test-secret
---
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: enwiki-goodfaith
  annotations:
    sidecar.istio.io/inject: "false"
spec:
  predictor:
    serviceAccountName: testelukey
    containers:
      - name: kfserving-container
        image: docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-editquality:2021-07-28-204847-production
        env:
          # TODO: https://phabricator.wikimedia.org/T284091
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
          - name: INFERENCE_NAME
            value: "enwiki-goodfaith"
          - name: WIKI_URL
            value: "https://en.wikipedia.org"
</syntaxhighlight>And then we would just <code>kubectl apply -f</code> it to the cluster. This is doable for small tests, but it is clearly not scalable for multiple users. There is also a lot of boilerplate config, like secrets and service accounts, that should be hidden from users, together with every bit that is common between different <code>InferenceService</code> resources. This is where our <code>kserve-inference</code> chart comes in :)
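For completeness, the manual workflow would look roughly like this (file name and namespace are hypothetical):<syntaxhighlight lang="bash">
# Apply the hand-written manifest and check the resulting InferenceService and pods.
kubectl apply -f enwiki-goodfaith.yaml -n some-test-namespace
kubectl get inferenceservices -n some-test-namespace
kubectl get pods -n some-test-namespace
</syntaxhighlight>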
As explained above, the chart should only be changed when we have base/common/default configs that we want to apply to all the model services; otherwise Helmfile is the right place to start.

The ML services config is stored in the [[gerrit:admin/repos/operations/deployment-charts|deployment-charts]] repository, under <code>helmfile.d/ml-services</code>. There may be one or more sub-directories to explore; generally we create one for each group of models that we want to deploy. For example, the <code>revscoring-editquality</code> directory contains the definition of all the InferenceService resources for the edit quality models.

Once you have identified the correct ml-services directory, there are two files to consider:

- <code>helmfile.yaml</code>

- <code>values.yaml</code>

The former is the helmfile-specific config of the service, which includes what helm charts are used, their dependencies, how to release to specific clusters, etc. 99% of the time this file should be left untouched, unless you specifically need to modify something in it. The vast majority of the time <code>values.yaml</code> is the file we want to modify, since it contains the configuration bits for the <code>kserve-inference</code> chart. Do you recall that the chart's values.yaml file contained only default configs? The helmfile's values.yaml file contains the service-specific bits, most notably the list of InferenceService resources to deploy.
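If you are unsure how a values.yaml change will be rendered, helmfile can expand the chart templates without deploying anything. A hedged sketch, run from the directory containing the helmfile.yaml of the service you care about:<syntaxhighlight lang="bash">
# Render the kserve-inference chart with this service's values and inspect the output.
# No deploy action is taken; it only prints the generated Kubernetes manifests.
helmfile -e ml-serve-eqiad template | less
</syntaxhighlight>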
For the moment we only have the revscoring-editquality helmfile config; let's look at its values.yaml content:<syntaxhighlight lang="yaml">
docker:
  registry: docker-registry.discovery.wmnet/wikimedia
  imagePullPolicy: IfNotPresent
inference:
  image: "machinelearning-liftwing-inference-services-editquality"
  version: "2021-09-01-140944-production"
  annotations:
    sidecar.istio.io/inject: "false"
  base_env:
    - name: WIKI_URL
      value: "https://api-ro.discovery.wmnet"
    - name: REQUESTS_CA_BUNDLE
      value: "/usr/share/ca-certificates/wikimedia/Puppet_Internal_CA.crt"
inference_services:
  - name: "enwiki-goodfaith"
    custom_env:
      - name: INFERENCE_NAME
        value: "enwiki-goodfaith"
      - name: WIKI_HOST
        value: "en.wikipedia.org"
      - name: STORAGE_URI
        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
</syntaxhighlight>As we can see, all the bits related to secrets and service accounts are gone: they are configured behind the scenes and hidden from the user. We have two relevant sections to consider:
* <code>inference</code> - this is the common config for all the InferenceService resources that we'll define, and it is meant to avoid copy/pasting the same bits over and over in <code>inference_services</code>.
* <code>inference_services</code> - every entry in this list corresponds to a separate InferenceService resource, composed of the default config (outlined above) plus the more specific bits. To see all the configuration options allowed, please check the kserve-inference templates (see the sketch below)!
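A quick way to find those templates, assuming a local checkout of deployment-charts (the path is indicative):<syntaxhighlight lang="bash">
# The kserve-inference templates define which keys of values.yaml are honoured.
ls deployment-charts/charts/kserve-inference/templates/

# Grep for the values used by the InferenceService template, e.g.:
grep -R "inference_services" deployment-charts/charts/kserve-inference/
</syntaxhighlight>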
Let's imagine that we want to change the enwiki-goodfaith docker image, leaving the rest of the inference_services entries using the default one (in this case we have only one service definition, but pretend that there are way more :). We can try something like this:<syntaxhighlight lang="diff">
diff --git a/helmfile.d/ml-services/revscoring-editquality/values.yaml b/helmfile.d/ml-services/revscoring-editquality/values.yaml
index 30c59a38..ec8ad0e6 100644
--- a/helmfile.d/ml-services/revscoring-editquality/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality/values.yaml
@@ -15,6 +15,8 @@ inference:
inference_services:
  - name: "enwiki-goodfaith"
+    image: "machine-learning-liftwing-new-docker-image"
+    image_version: "some-new-version"
    custom_env:
      - name: INFERENCE_NAME
        value: "enwiki-goodfaith"
</syntaxhighlight>This change, once code reviewed and deployed, will translate into a new Knative revision of the InferenceService, which will receive all new incoming traffic.

TODO: Add best practices and tricks to use with Knative to split traffic between revisions etc..

If we want to add another model to the list, for example enwiki-damaging, this change should be enough:<syntaxhighlight lang="diff">
diff --git a/helmfile.d/ml-services/revscoring-editquality/values.yaml b/helmfile.d/ml-services/revscoring-editquality/values.yaml
index 30c59a38..26aaedd8 100644
--- a/helmfile.d/ml-services/revscoring-editquality/values.yaml
+++ b/helmfile.d/ml-services/revscoring-editquality/values.yaml
@@ -21,4 +21,12 @@ inference_services:
      - name: WIKI_HOST
        value: "en.wikipedia.org"
      - name: STORAGE_URI
-        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
\ No newline at end of file
+        value: "s3://wmf-ml-models/goodfaith/enwiki/202105140814/"
+  - name: "enwiki-damaging"
+    custom_env:
+      - name: INFERENCE_NAME
+        value: "enwiki-damaging"
+      - name: WIKI_HOST
+        value: "en.wikipedia.org"
+      - name: STORAGE_URI
+        value: "s3://wmf-ml-models/goodfaith/damaging/202105140915/"
</syntaxhighlight>The idea is that each helmfile subdirectory of ml-services acts as a collection of models, all belonging to the same group/category.

== How to deploy ==
Once you have code reviewed and merged a change for deployment-charts, you'll need to jump to the current deployment node (like deploy1002.eqiad.wmnet). If you don't have access to the host, it may be because you are not in the <code>deploy-ml-services</code> POSIX group. In that case, file an Access Request to the SRE team (TODO: add links, all the ML team members are already in it).

Once on the deployment node, cd to <code>/srv/deployment-charts/helmfile.d/ml-services</code> and choose the directory corresponding to the model that you want to deploy. The repository gets updated to its latest version by puppet, so after merging your change it may take a few minutes before it appears in the repo on the deployment node. Use <code>git log</code> to confirm that your change is available.
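Putting it together, a typical session on the deployment node looks roughly like this (directory and commit hash are just examples):<syntaxhighlight lang="bash">
ssh deploy1002.eqiad.wmnet
cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality

# Wait for puppet to pull the latest deployment-charts, then confirm your change is there.
git log --oneline -3
# a1b2c3d Add enwiki-damaging InferenceService   <- your merged change (fake hash)
</syntaxhighlight>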
It is always good to double check that the model.bin file is on S3/Swift before proceeding:<syntaxhighlight lang="bash">
elukey@ml-serve1001:~$ s3cmd ls s3://wmf-ml-models/goodfaith/enwiki/202105260914/
2021-10-19 14:43    10351347  s3://wmf-ml-models/goodfaith/enwiki/202105260914/model.bin
</syntaxhighlight>If there is no model.bin file, please do not proceed further!

At this point, you can use the helmfile command in the following ways (an example run is sketched after this list):
* <code>helmfile -e ml-serve-eqiad diff</code> to see what is going to be changed if you deploy (it only displays a diff, no deploy action is taken, so it is a safe command to run)
* <code>helmfile -e ml-serve-eqiad sync</code> to deploy your new config/code/etc.. via helm
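A hedged example run (all commands are executed from the service's helmfile directory):<syntaxhighlight lang="bash">
cd /srv/deployment-charts/helmfile.d/ml-services/revscoring-editquality

# 1) Review what would change; this is read-only and safe to run at any time.
helmfile -e ml-serve-eqiad diff

# 2) If the diff matches your expectations, apply it.
helmfile -e ml-serve-eqiad sync

# 3) Repeat for ml-serve-codfw if the service is deployed there as well.
helmfile -e ml-serve-codfw diff
helmfile -e ml-serve-codfw sync
</syntaxhighlight>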
== Test your model after deployment ==
Once an InferenceService is deployed/changed, it should become available with the HTTP Host header <code>MODELNAME.KUBERNETES_NAMESPACE.wikimedia.org</code>. For example, the aforementioned <code>enwiki-goodfaith</code> model should get the Host header <code>enwiki-goodfaith.revscoring-editquality.wikimedia.org</code> (note: the kubernetes namespace is equal to the name of the ml-services subdirectory). If you want to query it via curl:<syntaxhighlight lang="bash">
elukey@ml-serve-ctrl1001:~$ cat input.json
{ "rev_id": 132421 }
elukey@ml-serve-ctrl1001:~$ time curl "https://inference.svc.eqiad.wmnet:30443/v1/models/enwiki-goodfaith:predict" -X POST -d @input.json -i -H "Host: enwiki-goodfaith.revscoring-editquality.wikimedia.org" --http1.1
HTTP/1.1 200 OK
content-length: 112
content-type: application/json; charset=UTF-8
date: Tue, 19 Oct 2021 12:17:28 GMT
server: istio-envoy
x-envoy-upstream-service-time: 349
{"predictions": {"prediction": true, "probability": {"false": 0.06715093098078351, "true": 0.9328490690192165}}}
real 0m0.381s
user 0m0.015s
sys 0m0.011s
</syntaxhighlight>If you want to inspect some kubernetes-specific details, for example the Knative revisions and their settings, you can connect to deploy1002 and do something like:<syntaxhighlight lang="bash">
elukey@deploy1002:~$ kube_env revscoring-editquality ml-serve-eqiad
elukey@deploy1002:~$ kubectl get pods
NAME                                                              READY  STATUS    RESTARTS  AGE
enwiki-goodfaith-predictor-default-84n6c-deployment-656584fbrx4  2/2    Running  0          4d21h
elukey@deploy1002:~$ kubectl get revisions
NAME                                      CONFIG NAME                          K8S SERVICE NAME                          GENERATION  READY  REASON
enwiki-goodfaith-predictor-default-7sbq5  enwiki-goodfaith-predictor-default  enwiki-goodfaith-predictor-default-7sbq5  1            True   
enwiki-goodfaith-predictor-default-84n6c  enwiki-goodfaith-predictor-default  enwiki-goodfaith-predictor-default-84n6c  5            True   
enwiki-goodfaith-predictor-default-g2ffj  enwiki-goodfaith-predictor-default  enwiki-goodfaith-predictor-default-g2ffj  3            True   
enwiki-goodfaith-predictor-default-jnb8s  enwiki-goodfaith-predictor-default  enwiki-goodfaith-predictor-default-jnb8s  2            True   
enwiki-goodfaith-predictor-default-t8dkx  enwiki-goodfaith-predictor-default  enwiki-goodfaith-predictor-default-t8dkx  4            True   
</syntaxhighlight>In case of trouble, you can always check the logs of the pods. For example, let's assume you see the following after deploying:<syntaxhighlight lang="bash">
elukey@deploy1002:~$ kube_env revscoring-editquality ml-serve-eqiad
elukey@deploy1002:~$ kubectl get pods
NAME 
revscoring-editquality  enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx  0/2    CrashLoopBackOff  6          10m
revscoring-editquality  enwiki-goodfaith-predictor-default-84n6c-deployment-656584fbrx4  2/2    Running            0          4d23h
</syntaxhighlight>If you just deployed enwiki-damaging, then something is not right. A quick sanity check could be to inspect the pod's container logs to see if anything looks weird. In this case:<syntaxhighlight lang="bash">
elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality storage-initializer
/usr/local/lib/python3.7/dist-packages/ray/autoscaler/_private/cli_logger.py:61: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
  "update your install command.", FutureWarning)
[I 211019 14:21:23 storage-initializer-entrypoint:13] Initializing, args: src_uri [s3://wmf-ml-models/damaging/enwiki/202105260914/] dest_path[ [/mnt/models]
[I 211019 14:21:23 storage:52] Copying contents of s3://wmf-ml-models/damaging/enwiki/202105260914/ to local
[I 211019 14:21:23 credentials:1102] Found credentials in environment variables.
[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models
elukey@ml-serve-ctrl1001:~$ kubectl describe pod enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality
</syntaxhighlight>We start from the storage-initializer since it is the first container that runs, and in this case it seems to be doing the right thing (namely pulling the model from s3 to local storage). So let's check the logs of the kserve-container:<syntaxhighlight lang="bash">
elukey@ml-serve-ctrl1001:~$ kubectl logs enwiki-damaging-predictor-default-ggjx2-deployment-b6977b6298cx -n revscoring-editquality kserve-container
Traceback (most recent call last):
  File "model-server/model.py", line 41, in <module>
    model.load()
  File "model-server/model.py", line 17, in load
    with open("/mnt/models/model.bin") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/models/model.bin'
</syntaxhighlight>The logs indicate that the model.bin file was not found, but the storage-initializer states that it did download it correctly. In this case the issue is sneaky; compare the following:<syntaxhighlight lang="bash">
[I 211019 14:21:23 storage:85] Successfully copied s3://wmf-ml-models/damaging/enwiki/202105260914/ to /mnt/models
</syntaxhighlight>And the relevant code review change:<syntaxhighlight lang="diff">
+      - name: STORAGE_URI
+        value: "s3://wmf-ml-models/damaging/enwiki/202105260914/"
</syntaxhighlight>And the s3 bucket list:<syntaxhighlight lang="bash">
elukey@ml-serve1001:~$ s3cmd ls s3://wmf-ml-models/damaging/enwiki/202105260914/
[..nothing..]
</syntaxhighlight>The model.bin file was not uploaded to the correct S3 path, and the storage-initializer probably failed gracefully. In this case a good follow-up is to upload the model via s3cmd, or to create another S3 path and redo the change.
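A hedged sketch of that follow-up (the local file name is an assumption, adapt the S3 path to your model; whether the failing pod recovers on its own or needs to be recreated is also an assumption):<syntaxhighlight lang="bash">
# Upload the missing model to the exact path referenced by STORAGE_URI
# (run from a host that has the s3cmd credentials, e.g. an ml-serve node).
s3cmd put model.bin s3://wmf-ml-models/damaging/enwiki/202105260914/model.bin

# Verify the upload before retrying.
s3cmd ls s3://wmf-ml-models/damaging/enwiki/202105260914/

# If the pod stays in CrashLoopBackOff, delete it so that Kubernetes recreates it
# and the storage-initializer runs again (pod name below is a placeholder).
kubectl delete pod <failing-pod-name> -n revscoring-editquality
</syntaxhighlight>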
