User:AikoChou/LiftWing


Summary

This is a guide on deploying an ML model as an Inference Service (isvc) on Lift Wing. As an example, we will be creating an NSFW model inference service.

Prerequisites

A prerequisite for this guide is that we have already loaded the NSFW model as a KServe custom inference service in a local Docker container (done in T313526). Therefore, we already have basic model-serving code (model.py) and a dependencies file (requirements.txt).
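To sanity-check that local setup again, here is a minimal sketch of running the image and querying it (the local image name, the registered model name, and the payload file are assumptions; KServe model servers listen on port 8080 by default):

# Run the locally built custom inference image (image name is an assumption)
docker run --rm -p 8080:8080 nsfw-model:local

# In another terminal, query the KServe v1 predict endpoint
curl localhost:8080/v1/models/nsfw-model:predict -d @./input_nsfw.json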

Make sure you have access to the following hosts:

  • ml-sandbox - ml-sandbox.machine-learning.eqiad1.wikimedia.cloud
  • deployment server - like deploy1002.eqiad.wmnet
  • stat100x machine - like stat1007.eqiad.wmnet

Repositories

We will submit code changes to the following repositories:

  • machinelearning/liftwing/inference-services
  • integration/config
  • operations/deployment-charts

Clone the repositories with the commit-msg hook from Gerrit.
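For example, for the inference-services repo (a minimal sketch; the same pattern applies to the other repositories, and an SSH remote works just as well):

git clone "https://gerrit.wikimedia.org/r/machinelearning/liftwing/inference-services"
cd inference-services
# Install the Gerrit commit-msg hook so every commit gets a Change-Id
curl -Lo .git/hooks/commit-msg https://gerrit.wikimedia.org/r/tools/hooks/commit-msg
chmod +x .git/hooks/commit-msg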

Production Image Development

Blubberfile

Blubber is an abstraction for container build configurations, used by Wikimedia CI to publish production-ready Docker images. We need to develop a Blubberfile that generates a Dockerfile to build an image that can be run in production.

Here is a Blubberfile for serving the NSFW model.

version: v4
base: docker-registry.wikimedia.org/buster:20220807
runs:
  insecurely: true

lives:
  in: /srv/nsfw-model

variants:
  build:
    python:
      version: python3
      requirements: [nsfw-model/model-server/requirements.txt]
    apt:
      packages:
        - python3-pip
    builder:
      command: ["rm -rf /var/cache/apk/*"]
  production:
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
      - from: build
        source: /opt/lib/python/site-packages
        destination: /opt/lib/python/site-packages
    apt:
      packages:
        - python3
        - python3-distutils
    python:
      version: python3
      use-system-flag: false
    entrypoint: ["python3",  "model-server/model.py"]

  test:
    apt:
      packages:
        - python3-pip
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
    entrypoint: ["tox", "-c", "model-server/tox.ini"]
    python:
      version: python3
      use-system-flag: false
      requirements: [nsfw-model/model-server/requirements-test.txt]

Check out the Blubber tutorials (tutorial 1, tutorial 2) to learn how to create your own Blubberfile!

Build an Image

To build the Docker image, use the following command:

blubber .pipeline/nsfw/blubber.yaml production | docker build -t aiko/nsfw-model:1 --file - .

We push the image to Docker Hub, so it can be pulled in the ML-Sandbox later.

docker push aiko/nsfw-model:1
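The test variant defined in the Blubberfile can also be exercised locally before sending the change to CI, which is what the run-test pipeline stage does later; a minimal sketch (the image tag is an assumption):

# Build the test variant and run tox inside it
blubber .pipeline/nsfw/blubber.yaml test | docker build -t aiko/nsfw-model-test:1 --file - .
docker run --rm aiko/nsfw-model-test:1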

Testing your Image in ML-Sandbox

Upload a model to MinIO

MinIO is the model storage we use in the ML-Sandbox. Before uploading a model, open a separate terminal and expose MinIO outside of minikube:

aikochou@ml-sandbox:~$ kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test
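If the MinIO client has not been configured yet, register the myminio alias used below first; a minimal sketch with the port-forwarded endpoint and placeholder credentials (older mc releases use mc config host add instead):

aikochou@ml-sandbox:~$ mc alias set myminio http://localhost:9000 <ACCESS_KEY> <SECRET_KEY>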

To upload the model, use the following command:

aikochou@ml-sandbox:~$ mc cp model.h5 myminio/wmf-ml-models/nsfw-model/
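You can verify that the model landed in the bucket:

aikochou@ml-sandbox:~$ mc ls myminio/wmf-ml-models/nsfw-model/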

Create an Inference Service

We need an nsfw-service.yaml to create an Inference Service:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nsfw-model
  annotations:
    sidecar.istio.io/inject: "true"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kserve-container
        image: aiko/nsfw-model:1
        env:
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/nsfw-model/"

It sets the container image to "aiko/nsfw-model:1", the image we generated from the Blubberfile, and points the storage URI to the location where the model is stored. Apply the CRD:

aikochou@ml-sandbox:~$ kubectl apply -f nsfw-service.yaml

Check if the inference service is up and running:

aikochou@ml-sandbox:~$ kubectl get pod -n kserve-test
NAME                                                            READY   STATUS    RESTARTS   AGE
minio-fbbf6dfb8-p65fr                                           1/1     Running   0          16d
nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x   2/2     Running   0          7d8h
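If the predictor pod does not become ready, look at the InferenceService status and the container logs first; for example (using the pod name from the output above):

aikochou@ml-sandbox:~$ kubectl describe isvc nsfw-model -n kserve-test
aikochou@ml-sandbox:~$ kubectl logs nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x -n kserve-test -c kserve-container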

Run a prediction

We use a test.sh script that sets the model name, ingress host and port, and service hostname, and then uses curl to query the inference service. A test sample input_nsfw.json needs to be in the same directory as well.

MODEL_NAME="nsfw-model"
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SERVICE_HOSTNAME=$(kubectl get isvc ${MODEL_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./input_nsfw.json --http1.1

Run the test script:

aikochou@ml-sandbox:~$ sh test.sh
...
{"prob_nsfw": 0.9999992847442627, "prob_sfw": 7.475603638340544e-07}

In the process of development, we may modify model.py or the Blubberfile for various reasons (e.g. adding a missing package). As a result, we will repeat the above steps: rebuild the image, re-apply the CRD to recreate the inference service, and run a prediction. Building many Docker images may leave the ML-Sandbox short on disk space. When that happens, you can use the following commands to clean up images:

aikochou@ml-sandbox:~$ minikube ssh
Last login: Tue Aug  9 14:38:49 2022 from 192.168.49.1
docker@minikube:~$ docker image ls
docker@minikube:~$ docker image rm <image you want to delete>
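Alternatively, dangling images left over from rebuilds can be removed in one go:

docker@minikube:~$ docker image prune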

Delete the Inference Service after testing:

aikochou@ml-sandbox:~$ kubectl delete -f nsfw-service.yaml

Pipelines

Once you are happy with the image generated from the Blubberfile, it is time to configure the pipeline to build the image, run the tests, and publish the production-ready image.

In our inference-services repo, we need to add two pipelines in .pipeline/config.yaml:

  nsfw:
    stages:
      - name: run-test
        build: test
        run: true
      - name: production
        build: production

  nsfw-publish:
    blubberfile: nsfw/blubber.yaml
    stages:
      - name: publish
        build: production
        publish:
          image:
            name: '${setup.project}-nsfw'
            tags: [stable]

Switch to the integration/config repo, where we need to define the jobs and set triggers in the Jenkins Job Builder spec for the new service. Search for "machinelearning/liftwing/inference-services" in the files and follow the pattern to add new entries. It is basically a copy/paste of the existing inference-services configs, adapted for the new Inference Service image.

  • jjb/project-pipelines.yaml
- project:
    # machinelearning/liftwing/inference-services
    name: inference-services
    pipeline:
        ...
        - nsfw
        - nsfw-publish
    jobs:
        ...
        # trigger-inference-services-pipeline-nsfw
        # trigger-inference-services-pipeline-nsfw-publish
        ...
        # inference-services-pipeline-nsfw
        # inference-services-pipeline-nsfw-publish
  • zuul/layout.yaml
  # machinelearning/liftwing/inference-services holds several projects each
  # having at least two pipelines. We thus need files based filtering and a
  # meta job to cover all the pipelines variants.
  ...
  - name: ^trigger-inference-services-pipeline-nsfw
    files:
        - '.pipeline/nsfw/blubber.yaml'
        - '.pipeline/config.yaml'
        - 'nsfw-model/model-server/.*'
  ...
  # When adding a new sub project, make sure to add a job filter above in the
  # job section to have the job only trigger for the directory holding the
  # project in the repository.
  - name: machinelearning/liftwing/inference-services
    test:
      ...
      - trigger-inference-services-pipeline-nsfw
    gate-and-submit:
      ...
      - trigger-inference-services-pipeline-nsfw
    postmerge:
      ...
      - trigger-inference-services-pipeline-nsfw-publish

When you are done editing, commit your code and create a patchset for the repo (see the sketch after the links below). Here are the changes we have made so far:

https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/822046

https://gerrit.wikimedia.org/r/c/integration/config/+/822052
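A minimal sketch of how such a patchset is created and pushed for review, assuming the commit-msg hook is installed and the git-review tool is available (a plain git push to refs/for/<branch> works too):

git add -A
git commit        # the commit-msg hook adds a Change-Id automatically
git review        # or: git push origin HEAD:refs/for/master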

Once your code gets reviewed and merged, you will see PipelineBot comment on the patch with a pointer to the new image and the tags it made, like:

Wikimedia Pipeline

Image Build: SUCCESS

IMAGE:

docker-registry.wikimedia.org/wikimedia/machinelearning-liftwing-inference-services-topic:2022-08-11-085125-publish

TAGS:

2022-08-11-085125-publish, stable

Deployment

Upload a model to Swift

We store the model files that are used in production in Swift, an open-source S3-compatible object store that is widely used across the WMF.

To upload the model, jump to a stat100x host and use a tool called model_upload:

aikochou@stat1007:~$ model_upload model.h5 experimental nsfw wmf-ml-models

Check if the upload is successful:

aikochou@stat1007:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls -r s3://wmf-ml-models/experimental/nsfw/
2022-08-11 08:28     70393536  s3://wmf-ml-models/experimental/nsfw/20220811082819/model.h5

Helmfile

https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/822326

Since NSFW is a new model, the ML SREs will set up a new helmfile/namespace config in the deployment-charts repo. Most of the time, values.yaml is the file you want to modify.

  • values.yaml
...
inference:
  annotations:
    sidecar.istio.io/inject: "true"
  predictor:
    image: "machinelearning-liftwing-inference-services-nsfw"
    version: "2022-08-11-085124-publish"
    base_env:
      - name: STORAGE_URI
        value: "s3://wmf-ml-models/experimental/nsfw/20220811082819/"
inference_services:
  - name: "nsfw-model"
  • values-ml-staging-codfw.yaml

elukey: The helmfile config picks up the values.yaml file first, then the staging one, so unless you specifically override things in the staging yaml, nothing extra will be picked up from it.

(If you check helmfile.yaml in the experimental dir of deployment-charts, the "values" section at line 22 shows this: values are picked up from top to bottom.)

Deploy

See Machine Learning/LiftWing/Deploy#How to deploy

Test the model after deployment

See Machine Learning/LiftWing/Deploy#Test your model after deployment