Revision as of 20:37, 18 August 2022 by AikoChou (Part II)


This is a guide on deploying an ML model as an Inference Service (isvc) on Lift Wing. As an example, we will be creating an NSFW model inference service.


A prerequisite for this guide is that we have already loaded the NSFW model as a KServe custom inference service in a local Docker container (done in T313526). Therefore, we have basic model-serving code and its dependencies file requirements.txt.
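As a reminder of what that serving code looks like: a KServe custom predictor is a class exposing load() and predict() methods. The stub below illustrates only the contract; the class and field names are illustrative and the kserve dependency is replaced by a dummy scorer, so this is not the actual code from T313526.

```python
# Minimal sketch of the predictor contract a KServe custom model
# server follows (the real code subclasses kserve.Model; a plain
# class and a dummy scorer stand in here, so names are illustrative).

class NsfwModelStub:
    def __init__(self, name: str):
        self.name = name
        self.ready = False

    def load(self) -> None:
        # The real load() would deserialize model.h5 from disk.
        self._score = lambda instances: (0.99, 0.01)  # dummy scores
        self.ready = True

    def predict(self, payload: dict) -> dict:
        # KServe v1 protocol: the request body is {"instances": [...]}.
        prob_nsfw, prob_sfw = self._score(payload["instances"])
        return {"prob_nsfw": prob_nsfw, "prob_sfw": prob_sfw}
```

The real server additionally registers the model with kserve.ModelServer().start([...]), which serves HTTP on port 8080 by default.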

Make sure you have access to the following hosts:

  • ml-sandbox -
  • deployment server - like deploy1002.eqiad.wmnet
  • stat100x machine - like stat1007.eqiad.wmnet


We will submit code changes to the following repositories:

Clone the repositories with the commit-msg hook from Gerrit.

Production Image Development


Blubber is an abstraction for container build configurations, used by Wikimedia CI to publish production-ready Docker images. We need to develop a Blubberfile that generates a Dockerfile to build an image that can be run in production.

Here is a Blubberfile for serving the NSFW model:

version: v4
# (section and variant names below follow standard Blubber v4 syntax)
runs:
  insecurely: true
lives:
  in: /srv/nsfw-model

variants:
  build:
    python:
      version: python3
      requirements: [nsfw-model/model-server/requirements.txt]
    apt:
      packages:
        - python3-pip
    builder:
      command: ["rm -rf /var/cache/apk/*"]

  production:
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
      - from: build
        source: /opt/lib/python/site-packages
        destination: /opt/lib/python/site-packages
    apt:
      packages:
        - python3
        - python3-distutils
    python:
      version: python3
      use-system-flag: false
    entrypoint: ["python3", "model-server/"]

  test:
    apt:
      packages:
        - python3-pip
    copies:
      - from: local
        source: nsfw-model/model-server
        destination: model-server
    entrypoint: ["tox", "-c", "model-server/tox.ini"]
    python:
      version: python3
      use-system-flag: false
      requirements: [nsfw-model/model-server/requirements-test.txt]

To learn how to create your own Blubberfile, check out this tutorial; for more examples, see tutorial 1 and tutorial 2.

Build an Image

To build the Docker image, use the following command:

blubber .pipeline/nsfw/blubber.yaml production | docker build -t aiko/nsfw-model:1 --file - .

Push the image to Docker Hub so that it can be pulled from the ML-Sandbox later:

docker push aiko/nsfw-model:1

Testing your Image in ML-Sandbox

Upload a model to Minio

MinIO is the model storage we use in the ML-Sandbox. Before uploading a model, open a separate terminal and expose MinIO outside of minikube:

aikochou@ml-sandbox:~$ kubectl port-forward $(kubectl get pod -n kserve-test --selector="app=minio" --output jsonpath='{.items[0].metadata.name}') 9000:9000 -n kserve-test

To upload the model, use the following command:

aikochou@ml-sandbox:~$ mc cp model.h5 myminio/wmf-ml-models/nsfw-model/

Create an Inference Service

We need a nsfw-service.yaml to create an Inference Service:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nsfw-model
  annotations:
    sidecar.istio.io/inject: "true"  # annotation key assumed; the original snippet only showed the value "true"
spec:
  predictor:
    serviceAccountName: sa
    containers:
      - name: kserve-container
        image: aiko/nsfw-model:1
        env:
          - name: STORAGE_URI
            value: "s3://wmf-ml-models/nsfw-model/"

It sets the container image to the "aiko/nsfw-model:1" image we generated from the Blubberfile and points the storage URI at the location where the model is stored. Apply the resource:
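For context on how STORAGE_URI is consumed: KServe's storage initializer downloads the contents of the s3:// URI into the container (by default under /mnt/models) before the predictor starts, so the serving code only ever reads a local path. A sketch of that lookup; the helper and environment-variable names here are illustrative, not part of the KServe API:

```python
import os

# KServe's storage initializer places the downloaded model under
# /mnt/models by default; the serving code then reads it locally.
# MODEL_PATH is an illustrative override, not a KServe-defined variable.
def local_model_path(filename: str = "model.h5") -> str:
    base = os.environ.get("MODEL_PATH", "/mnt/models")
    return os.path.join(base, filename)
```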

aikochou@ml-sandbox:~$ kubectl apply -f nsfw-service.yaml

Check if the inference service is up and running:

aikochou@ml-sandbox:~$ kubectl get pod -n kserve-test
NAME                                                            READY   STATUS    RESTARTS   AGE
minio-fbbf6dfb8-p65fr                                           1/1     Running   0          16d
nsfw-model-predictor-default-cl72b-deployment-9585657df-kk65x   2/2     Running   0          7d8h

Run a prediction

We use a script that sets the model name, ingress host and port, and service hostname, then uses curl to query the inference service. A test sample input_nsfw.json needs to be in the same directory.

MODEL_NAME="nsfw-model"
INGRESS_HOST=$(minikube ip)
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
SERVICE_HOSTNAME=$(kubectl get isvc ${MODEL_NAME} -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d @./input_nsfw.json --http1.1

Run the test script:

aikochou@ml-sandbox:~$ sh
{"prob_nsfw": 0.9999992847442627, "prob_sfw": 7.475603638340544e-07}
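The response is plain JSON, so downstream code can threshold it directly; a small sketch (the 0.5 cutoff is an arbitrary choice, not something Lift Wing prescribes):

```python
import json

def is_nsfw(response_body: str, threshold: float = 0.5) -> bool:
    # Parse the isvc response and compare the NSFW score to a cutoff.
    scores = json.loads(response_body)
    return scores["prob_nsfw"] >= threshold

body = '{"prob_nsfw": 0.9999992847442627, "prob_sfw": 7.475603638340544e-07}'
print(is_nsfw(body))  # → True
```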

In the process of development, we may modify our Blubberfile for various reasons (e.g. adding a missing package). As a result, we repeat the above steps: rebuild the image, apply the resource, create an inference service, run a prediction. Building too many Docker images may exhaust disk space in the ML-Sandbox. When that happens, you can use the following commands to clean up images:

aikochou@ml-sandbox:~$ minikube ssh
Last login: Tue Aug  9 14:38:49 2022 from
docker@minikube:~$ docker image ls
docker@minikube:~$ docker image rm <image you want to delete>

Delete the Inference Service after testing:

aikochou@ml-sandbox:~$ kubectl delete -f nsfw-service.yaml


Once you are happy with the image generated from the Blubberfile, it is time to configure the pipeline to build the image, run the tests, and publish the production-ready image.

In our inference-services repo, we need to add two pipelines (nsfw and nsfw-publish) in .pipeline/config.yaml:

pipelines:
  nsfw:
    blubberfile: nsfw/blubber.yaml
    stages:
      - name: run-test
        build: test
        run: true
      - name: production
        build: production
  nsfw-publish:
    blubberfile: nsfw/blubber.yaml
    stages:
      - name: publish
        build: production
        publish:
          image:
            name: '${setup.project}-nsfw'
            tags: [stable]

Switch to the integration/config repo, where we define the jobs and set triggers in the Jenkins Job Builder spec for the new service. Search for "machinelearning/liftwing/inference-services" in the files and follow the existing pattern to add new entries. It is basically a copy/paste of the existing inference-services configs, adapted for the new Inference Service image.

  • jjb/project-pipelines.yaml
- project:
    # machinelearning/liftwing/inference-services
    name: inference-services
    pipeline:
        - nsfw
        - nsfw-publish
        # This generates the jobs:
        # trigger-inference-services-pipeline-nsfw
        # trigger-inference-services-pipeline-nsfw-publish
        # inference-services-pipeline-nsfw
        # inference-services-pipeline-nsfw-publish
  • zuul/layout.yaml
  # machinelearning/liftwing/inference-services holds several projects each
  # having at least two pipelines. We thus need files based filtering and a
  # meta job to cover all the pipelines variants.
  - name: ^trigger-inference-services-pipeline-nsfw
    files:
        - '.pipeline/nsfw/blubber.yaml'
        - '.pipeline/config.yaml'
        - 'nsfw-model/model-server/.*'
  # When adding a new sub project, make sure to add a job filter above in the
  # job section to have the job only trigger for the directory holding the
  # project in the repository.
  - name: machinelearning/liftwing/inference-services
    test:
      - trigger-inference-services-pipeline-nsfw
    gate-and-submit:
      - trigger-inference-services-pipeline-nsfw
    postmerge:
      - trigger-inference-services-pipeline-nsfw-publish

When you are done editing, you can commit your code and create a patchset for the repo. Here are the changes we have made so far:

Once your code gets reviewed and merged, you will see the PipelineBot comment on the patch with a pointer to the new image and the tags it made, like:

Wikimedia Pipeline
Image Build: SUCCESS
Tags: 2022-08-11-085125-publish, stable


Upload a model to Swift

We store model files used in production in Swift, an open-source S3-compatible object store that is widely used across the WMF.

To upload the model, jump to a stat100x host and use a tool called model_upload:

aikochou@stat1007:~$ model_upload model.h5 experimental nsfw wmf-ml-models

Check if the upload is successful:

aikochou@stat1007:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls -r s3://wmf-ml-models/experimental/nsfw/
2022-08-11 08:28     70393536  s3://wmf-ml-models/experimental/nsfw/20220811082819/model.h5
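Note the timestamped prefix (20220811082819) in the listing: model_upload versions each upload by its creation time. A sketch of how such a path is formed (the helper is hypothetical, not the actual tool):

```python
from datetime import datetime

def versioned_s3_path(bucket: str, prefix: str, model: str,
                      filename: str, now: datetime) -> str:
    # Uploads are namespaced by a second-resolution timestamp,
    # so re-uploading the same file never overwrites an old version.
    ts = now.strftime("%Y%m%d%H%M%S")
    return f"s3://{bucket}/{prefix}/{model}/{ts}/{filename}"

path = versioned_s3_path("wmf-ml-models", "experimental", "nsfw",
                         "model.h5", datetime(2022, 8, 11, 8, 28, 19))
# → s3://wmf-ml-models/experimental/nsfw/20220811082819/model.h5
```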


Since the NSFW model is new, an ML SRE will set up a new helmfile/namespace config in the deployment-charts repo. Most of the time, values.yaml is the file you want to modify.

  • values.yaml
  - name: "nsfw-model"
    annotations:
      sidecar.istio.io/inject: "true"  # annotation key assumed; the original snippet only showed the value "true"
    image: "machinelearning-liftwing-inference-services-nsfw"
    version: "2022-08-11-085124-publish"
    env:
      - name: STORAGE_URI
        value: "s3://wmf-ml-models/experimental/nsfw/20220811082819/"
  • values-ml-staging-codfw.yaml

elukey: The helmfile config picks up the values.yaml file first, then the staging one, so unless you specifically override things in the staging yaml, nothing extra will be picked up from it. (If you check helmfile.yaml in the experimental dir of deployment-charts, the "values" section at line 22 shows this; values files are picked up from top to bottom.)
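The precedence elukey describes behaves like a key-by-key merge of the values files, with later files winning; a sketch with illustrative keys, not the actual chart schema:

```python
def merge(base: dict, override: dict) -> dict:
    # Recursive merge: keys in `override` replace keys in `base`,
    # descending into nested dicts (later values files win).
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

values = {"image": "inference-services-nsfw",
          "version": "2022-08-11-085124-publish"}
staging = {"version": "staging-override"}
merged = merge(values, staging)  # staging wins on "version" only
```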


Deploy the model

Machine Learning/LiftWing/Deploy#How to deploy

Test the model after deployment

Machine Learning/LiftWing/Deploy#Test your model after deployment