Machine Learning/LiftWing/KServe

KServe is a Python framework and K8s infrastructure aimed at standardizing the way people run and deploy HTTP servers that wrap ML models. The Machine Learning team uses it in the LiftWing K8s cluster to implement the new model-serving infrastructure that should replace ORES.

How does KServe fit into the Kubernetes picture?

As described above, KServe represents two things:

  • A Python framework to load model binaries and wrap them in a consistent, standard HTTP interface/server.
  • A set of Kubernetes resources and controllers able to deploy the aforementioned HTTP servers.

Before concentrating on Kubernetes, it is wise to learn a bit about how the Python framework works and how to write custom code to serve your model.
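
To get a feeling for the framework, here is a minimal sketch of a custom model server. Class and method names follow recent KServe releases (older releases exposed the same concepts under kfserving.KFModel), and the scoring logic is just a placeholder, not what our real model servers do:

import kserve


class DummyModel(kserve.Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.ready = False

    def load(self):
        # In a real model server this is where the binary under
        # /mnt/models/model.bin would be loaded into memory.
        self.ready = True

    def predict(self, request: dict) -> dict:
        # Placeholder logic: echo the rev_id and return a fake score.
        rev_id = request.get("rev_id")
        return {"predictions": {"rev_id": rev_id, "score": 0.5}}


if __name__ == "__main__":
    model = DummyModel("dummy-model")
    model.load()
    kserve.ModelServer().start([model])

The same pattern (a class exposing load() and predict(), started via a model server) is what the code in inference-services follows for the real models.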

Repositories

The starting point is the inference-services repository, where we keep all the configuration and Python code needed to generate the Docker images that run on Kubernetes.

Run KServe locally via Docker

Testing locally is possible with Docker, but it requires a little knowledge of how KServe works.

Example 1

Let's imagine that we want to run the enwiki revscoring editquality goodfaith model locally, to test how it works:

  • First of all, we need to check out the inference-services repository locally (see the related section for more info).
  • We need to have Blubber available locally.
  • We need to get the model binary that we want to serve (in our case, the binaries are available at https://github.com/wikimedia/editquality/tree/master/models).
  • In the inference-services repo, change dir to revscoring/editquality
  • Run the following command to build the Docker image: blubber ../../.pipeline/editquality/blubber.yaml production | docker build --tag SOME-DOCKER-TAG-THAT-YOU-LIKE --file - .
    • If you are curious about the Dockerfile that gets generated, drop the docker build part and inspect the output of Blubber.
  • At this point, you should see a Docker image in your local environment named after the tag passed to the docker build command (use docker image ls to check).
  • Check the model.py file related to editquality (contained in the model-server directory) and familiarize yourself with the __init__() function. All the environment variables read there are usually passed to the container via Kubernetes settings, so with Docker we'll have to set them explicitly.
  • Now you can create a playground directory under /tmp or somewhere else; the important bit is that you place the model binary inside it. In this example, let's suppose that we are under /tmp/test-kserve and that the model binary is stored in a subdirectory called models (so the binary's path is /tmp/test-kserve/models/model.bin). The name of the model file is important: the standard is model.bin, so please rename your binary if it doesn't match.
  • Run something like the following: docker run -p 8080:8080 -e INFERENCE_NAME=enwiki-goodfaith -e WIKI_URL=https://en.wikipedia.org --rm -v `pwd`/models:/mnt/models SOME-DOCKER-TAG-THAT-YOU-LIKE
  • Now we are ready to test the model server!
    • Create a file called input.json with the following content: { "rev_id": 1097728152 }
    • Execute: curl localhost:8080/v1/models/enwiki-goodfaith:predict -i -X POST -d@input.json --header "Content-type: application/json" --header "Accept: application/json" (a Python equivalent is sketched right after this list)
    • If everything goes fine, you should see some scores in the HTTP response.
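
For scripted testing, the same request can be issued from Python. The sketch below simply mirrors the curl call above, assuming the container from Example 1 is still running on localhost:8080 and the model was registered as enwiki-goodfaith:

import requests

# POST the same payload used with curl to the local KServe model server.
response = requests.post(
    "http://localhost:8080/v1/models/enwiki-goodfaith:predict",
    json={"rev_id": 1097728152},
)

print(response.status_code)
print(response.json())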

Example 2

A more complicated case is testing code that needs to call other services (besides the MW API). One example is the testing of https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/808247

In the above code change, we are trying to add support for EventGate. The new code would allow us to create and send specific JSON events via HTTP POSTs to EventGate, but in our case we don't need to re-create the whole infrastructure locally; a simple HTTP server to echo the POST content is enough to verify the functionality.

The Docker daemon creates containers in a default network called bridge, which we can use to connect the two containers. The idea is to:

  • Create a KServe container as explained in Example 1.
  • Create an HTTP server in another container using Python.

The latter is simple. Let's create a directory with two files, a Dockerfile and a server.py (a sketch of the latter follows the Dockerfile):

FROM python:3-alpine

EXPOSE 6666

RUN mkdir /ws
COPY server.py /ws/server.py

WORKDIR /ws

CMD ["python", "server.py"]

We can then build and execute the container:

  • docker build . -t simple-http-server
  • docker run --rm -it -p 6666 simple-http-server

Before creating the KServe container, let's check the running container's IP:

  • docker ps (to get the container id)
  • docker inspect <container-id> | grep IPAddress (let's assume it is 172.19.0.3)

As you can see in https://gerrit.wikimedia.org/r/c/machinelearning/liftwing/inference-services/+/808247, two new variables have been added to __init__: EVENTGATE_URL and EVENTGATE_STREAM. So let's add them to the run command:

docker run -p 8080:8080 -e EVENTGATE_STREAM=test -e EVENTGATE_URL="http://172.19.0.3:6666" -e INFERENCE_NAME=enwiki-goodfaith -e WIKI_URL=https://en.wikipedia.org --rm -v `pwd`/models:/mnt/models SOME-DOCKER-TAG-THAT-YOU-LIKE

Now you can test the new code via curl, and you should see the HTTP POST sent by the KServe container to the "fake" EventGate HTTP server!