Lift Wing
A scalable machine learning model serving infrastructure on Kubernetes using KServe.
- Phabricator MVP Task: https://phabricator.wikimedia.org/T272917
Stack
| Software | Version |
|----------|---------|
| k8s      | v1.16.5 |
| Istio    | v1.9.5  |
| Knative  | v0.18.1 |
| KServe   | v0.7.0  |
Istio
Istio is a service mesh in which we run our ML services. It is installed using the istioctl package, which has been added to the WMF APT repository (https://wikitech.wikimedia.org/wiki/APT_repository, Debian Buster); see https://apt-browser.toolforge.org/buster-wikimedia/main/. We are currently running Istio 1.9.5 (istioctl: 1.9.5-1).
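For context, istioctl drives installation from an IstioOperator manifest (istioctl install -f <file>). The sketch below is a minimal, hypothetical example; the profile and gateway settings are illustrative assumptions, not the configuration actually running on the ML serving clusters.

```yaml
# Hypothetical IstioOperator manifest, as consumed by `istioctl install -f`.
# Profile and component settings are illustrative assumptions only.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: default            # base set of control-plane components
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true         # entry point for traffic into the mesh
```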
Knative
We use Knative Serving to run serverless containers on k8s, with Istio as the networking layer. It also enables deployment strategies such as canary, blue-green, and A/B testing.
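The canary pattern, for example, works by splitting traffic between revisions of a Knative Service. The manifest below is a minimal sketch; the service name, image, and revision names are hypothetical.

```yaml
# Hypothetical Knative Service performing a 90/10 canary rollout.
# All names and the image reference are illustrative assumptions.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-model
spec:
  template:
    metadata:
      name: example-model-v2                 # the new (canary) revision
    spec:
      containers:
        - image: docker-registry.wikimedia.org/example-model:v2  # hypothetical image
  traffic:
    - revisionName: example-model-v1
      percent: 90                            # stable revision keeps most traffic
    - revisionName: example-model-v2
      percent: 10                            # canary revision gets a small share
```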
Charts
- Knative Serving CRDs: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/knative-serving-crds/
- Knative Serving: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/knative-serving/
Images
- Webhook: https://docker-registry.wikimedia.org/knative-serving-webhook/tags/
- Queue: https://docker-registry.wikimedia.org/knative-serving-queue/tags/
- Controller: https://docker-registry.wikimedia.org/knative-serving-controller/tags/
- Autoscaler: https://docker-registry.wikimedia.org/knative-serving-autoscaler/tags/
- Activator: https://docker-registry.wikimedia.org/knative-serving-activator/tags/
- Net-istio webhook: https://docker-registry.wikimedia.org/knative-net-istio-webhook/tags/
- Net-istio controller: https://docker-registry.wikimedia.org/knative-net-istio-controller/tags/
KServe
We use KServe for its custom InferenceService resource. It enables us to expose our ML models as asynchronous microservices.
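A minimal sketch of an InferenceService, assuming a hypothetical name and predictor image (real manifests are produced by the kserve-inference chart linked below):

```yaml
# Hypothetical InferenceService exposing a model behind an HTTP endpoint.
# The name and container image are illustrative assumptions.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    containers:
      - name: kserve-container
        image: docker-registry.wikimedia.org/example-model:stable  # hypothetical
```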
Charts
- KServe: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/kserve/
- InferenceService: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/kserve-inference/
Images
- KServe agent: https://docker-registry.wikimedia.org/kserve-agent/tags/
- KServe controller: https://docker-registry.wikimedia.org/kserve-controller/tags/
- KServe storage-initializer: https://docker-registry.wikimedia.org/kserve-storage-initializer/tags/
Hosts
eqiad
- ml-serve1001-1004
codfw
- ml-serve2001-2004
Components
Monitoring
- KServe: https://grafana.wikimedia.org/d/Rvs1p4K7k/kserve
- Knative Serving: https://grafana.wikimedia.org/d/c6GYmqdnz/knative-serving
Serving
We host our Machine Learning models as Inference Services (isvcs), which are asynchronous microservices that can transform raw feature data and make predictions. Each inference service has production images that are published to the WMF Docker Registry via the Deployment Pipeline. These images are then referenced by an isvc configuration in our ml-services helmfile in the operations/deployment-charts repo.
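As a purely illustrative sketch, an isvc entry in the ml-services values might look roughly like this; every key below is a hypothetical assumption, and the real schema is defined by the kserve-inference chart in operations/deployment-charts.

```yaml
# Hypothetical ml-services values entry; all keys are illustrative
# assumptions, not the actual kserve-inference chart schema.
inference:
  services:
    - name: example-model                  # hypothetical isvc name
      image: example-model                 # image built by the Deployment Pipeline
      tag: "2022-03-01-000000-production"  # hypothetical image tag
```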
- Model Deployment Guide: Machine Learning/LiftWing/Deploy
- Inference Service Docs: Machine_Learning/LiftWing/Inference Services
Storage
We store model binary files in Swift, an open-source, S3-compatible object store that is widely used across the WMF. The model files are downloaded by the storage-initializer (an init container) when an Inference Service pod is created. The storage-initializer then mounts the model binary in the pod at /mnt/models/, where it can be loaded by the predictor container.
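In KServe, the storage-initializer is triggered by a storageUri on the predictor spec. The sketch below is hypothetical; the isvc name, predictor type, and Swift/S3 path are assumptions.

```yaml
# Hypothetical InferenceService whose predictor declares a storageUri.
# KServe injects the storage-initializer init container, which downloads
# the model from the S3-compatible path and mounts it at /mnt/models/.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-model
spec:
  predictor:
    sklearn:                                            # illustrative predictor type
      storageUri: "s3://example-bucket/example-model/"  # hypothetical Swift path
```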
- Model Upload info: Machine_Learning/LiftWing/Deploy#How_to_upload_a_model_to_Swift
Development
We are developing inference services on the ML Sandbox using our own WMF KServe images and charts.
We previously used multiple sandbox clusters running MiniKF.
Services
We serve ML models as Inference Services, which are containerized applications. The code is currently hosted on Gerrit: https://gerrit.wikimedia.org/g/machinelearning/liftwing/inference-services