
Wikimedia Cloud Services team/EnhancementProposals/Toolforge Buildpack Implementation


Overview

Building on Wikimedia Cloud Services team/EnhancementProposals/Toolforge push to deploy and the initial PoC at Portal:Toolforge/Admin/Buildpacks, this is an effort to deliver a design document and plan for the introduction of a buildpack-based workflow.

Toolforge is a Platform-as-a-Service concept inside Cloud VPS, but it is heavily burdened by technical debt from outdated designs and assumptions. The most effectively curated Toolforge services are now all cloud native structures that depend on containerization and Kubernetes, moving away from shell logins and NFS-bound batch computing (like Grid Engine). The clear way forward is therefore a flexible, easy-to-use container system that launches code with minimal effort on the part of the user while providing a standardized way to contribute to the system itself.

Today, the community-adopted and widely-used solution is Cloud Native Buildpacks, a CNCF project originally started by Heroku and Cloud Foundry based on their own work in order to move toward industry standardization. Since it is a standard and a specification, there are many implementations of it. The one used for local development is the pack command line tool, which is also orchestrated by many other integrations such as GitLab, Waypoint and CircleCI. If those tools are not used, separate implementations are more readily consumed, such as kpack (maintained mostly by VMware) and a couple of tasks built into Tekton CI/CD that are maintained by the Cloud Native Buildpacks group directly (as pack is). Because Tekton is designed to be integrated into other CI/CD solutions and is fairly transparent about its backing organizations, it is a natural preference for Toolforge, as it should work with Jenkins, GitLab, GitHub, etc.

To get an idea of what it "looks like" to use a buildpack, it can be thought of simply as an alternative to a Dockerfile. The difference is that a buildpack cares about the specific OCI image (aka Docker image) layers that are created and exported, both to prevent an explosion of storage used by them and to keep the lower layers in a predictable state that can be rebased on top of when they are updated. The example used is almost always running the pack command to build and launch an app locally (see https://buildpacks.io/docs/app-journey/). This looks either like magic (if you are used to writing Dockerfiles) or like just another way to dockerize things, and it really doesn't explain what buildpacks are or why we'd use them. A better example would be to create a Node.js app on Gitlab.com, enable "Auto DevOps" (aka buildpacks), add a Kubernetes cluster to your GitLab settings and watch it magically produce a CI/CD pipeline all the way to production without any real configuration. That is roughly what it looks like when you get it right. The trickier part is all on our end.
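For concreteness, a minimal sketch of that local workflow with the pack CLI is below. The image name, builder name and port are illustrative only, not a decision about what Toolforge would actually offer:

 # Build an OCI image straight from application source; the builder supplies the
 # stack and the buildpacks, so the user writes no Dockerfile.
 pack build my-tool --builder paketobuildpacks/builder-jammy-base --path .

 # Run the resulting image locally to check it before pushing anywhere.
 docker run --rm -p 8080:8080 my-tool

The interesting part for Toolforge is that the same build can be run by a platform (Tekton, kpack, GitLab) instead of a developer laptop, producing the same layers.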

Concepts

Buildpacks are a specification as well as a piece of code you can put in a repository, and it is necessary to separate those two ideas in order to understand the ecosystem. Part of the reason for this is that buildpacks don't do anything without an implementation of the lifecycle in a platform. A "buildpack" applies user code to a "stack" via a "builder" during the "lifecycle" to get code onto a "platform", which is also the term used to describe the full system. So when we talk about "buildpacks" we could be talking about the specification itself, the code of an individual buildpack, the builder that bundles buildpacks with a stack, or the whole platform that runs the lifecycle.

This is not made easier to understand by the ongoing development of the standard, which now includes stackpacks: packs that can take root actions on layers (such as running package installs) before handing off to buildpacks.
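To make the terminology concrete, the sketch below shows roughly the smallest possible shell-based buildpack, following the detect/build contract from the specification; a buildpack.toml with an id, version and supported stacks sits alongside the two scripts. The Python/pip detail is purely illustrative, and the positional-argument handling matches the 2021-era Buildpack API (newer API versions pass these paths via CNB_* environment variables):

 # bin/detect -- decides whether this buildpack applies to the app source.
 #!/usr/bin/env bash
 set -euo pipefail
 if [[ -f requirements.txt ]]; then
     exit 0     # detection passes; this buildpack participates in the build
 fi
 exit 100       # detection fails; the lifecycle skips this buildpack

 # bin/build -- contributes one or more layers to the final image.
 #!/usr/bin/env bash
 set -euo pipefail
 layers_dir="$1"    # directory this buildpack writes its layers into
 mkdir -p "${layers_dir}/deps"
 # Mark the layer for inclusion in the runnable image (the metadata keys vary
 # slightly between Buildpack API versions).
 printf '[types]\nlaunch = true\n' > "${layers_dir}/deps.toml"
 pip install --target "${layers_dir}/deps" --requirement requirements.txt

A builder then packages a set of buildpacks like this together with a stack (the base build and run images), and the lifecycle runs each participating buildpack's detect and build phases in order on a platform.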

TODO: add diagrams and such that I'm working on

Components

Implementation in CI

The specification and lifecycle must be implemented in a way that works with our tooling and enables our contributors. Effectively, that means it has to be integrated into CI and deployed with CD into Toolforge Kubernetes (as it is today or in a slightly different form). Buildpacks are not natively supported in Jenkins without help, and GitLab may be on its way, but in order to build this out with appropriate access restrictions, a sound security model and full integration into WMCS operations, we likely need to operate our own small CI system to process buildpacks. Tekton CI/CD should provide a good mechanism for this since it is built entirely for Kubernetes and is actively maintained with buildpacks in mind. It also has a dashboard available that may be useful to our users, but all of this needs evaluation and work.
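As a rough illustration of what that could look like, the commands below use the tkn CLI to install the buildpacks task that the Cloud Native Buildpacks group maintains in the Tekton catalog and start a one-off build. The registry hostname, image names, workspace claim and exact task parameters are placeholders and should be verified against whichever task version we pin; this is a sketch, not a finished design:

 # Install the catalog task maintained by the Cloud Native Buildpacks group.
 tkn hub install task buildpacks

 # Start a one-off build of a tool's source into an image in our registry.
 # Everything below the task name is a placeholder for illustration.
 tkn task start buildpacks \
   --param APP_IMAGE=registry.example.toolforge.org/tool-foo/app:latest \
   --param BUILDER_IMAGE=paketobuildpacks/builder-jammy-base \
   --workspace name=source,claimName=tool-foo-source \
   --showlog

In practice this would be wrapped in a Pipeline triggered from a push rather than run by hand, which is exactly the part that needs the access-restriction and security design mentioned above.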

Because we will likely not be able to simply consume Auto DevOps from Gitlab without at least as much work, we will probably pursue this more flexible route unless we end up with a good reason not to.

OCI Registry

Since Docker software is being dropped from many non-desktop implementations of cloud native containers, it is more accurate to say we are working with OCI images and registries; Docker is a specific implementation and extension of that standard with its own orchestration layers that are separate from the OCI spec. All that said, we need a registry to work with. Right now, Toolforge uses the same basic Docker registry that is used by production, but with local disk instead of Swift for storage, which is highly limiting.

Since WMCS systems are not limited exclusively to Debian packaging, we could deploy the widely popular and much more business-ready Harbor (maintained mostly by VMware), use Red Hat's cloud-based Quay.io service (free for public images and already used by PAWS), or something else. Harbor is deployed via docker-compose or, in a better and more fault-tolerant form, Helm, since it is distributed entirely as OCI images built on VMware's peculiar Photon OS. That is why it was rejected by the main production SRE teams: repackaging it would be a huge piece of work. Despite this odd deployment style, it now enjoys wide adoption and contributions and is at 2.0 maturity. It can also cache and proxy to services like Quay.io. The downside of Harbor is that it is backed by PostgreSQL, and it doesn't inherently solve the storage problem. While it would be ideal to deploy it with Trove PostgreSQL and Swift storage, we could deploy it initially with a simple Postgres install and Cinder storage volumes.
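For a sense of what that initial deployment might look like, here is a Helm sketch against the upstream Harbor chart, pointing at an external Postgres and a Cinder-backed volume for registry storage. The hostnames, credentials, storage class and sizes are placeholders, and the chart value names should be re-checked against the chart version we actually deploy:

 # Add the upstream Harbor chart repository and install with placeholder values.
 helm repo add harbor https://helm.goharbor.io
 helm install harbor harbor/harbor \
   --set expose.type=ingress \
   --set externalURL=https://registry.example.toolforge.org \
   --set database.type=external \
   --set database.external.host=postgres.example.toolforge.org \
   --set database.external.username=harbor \
   --set database.external.password=CHANGE_ME \
   --set persistence.persistentVolumeClaim.registry.storageClass=standard \
   --set persistence.persistentVolumeClaim.registry.size=200Gi

The chart appears to expose object-storage backends for registry storage as well, so a later move to Swift should mostly be a values change plus data migration rather than a redeployment from scratch.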

Dashboards and UX

Repository of buildpacks (and likely builders and stacks)

We could consume the well-maintained Paketo buildpacks (backed by Cloud Foundry), which give you most things for free via community-maintained and professionally managed sets of buildpacks, builders and stacks. However, to maintain interoperability with familiar interfaces, to respond internally to security issues in our stacks, and to allow the customizations we will likely want, we will probably want to manage our own. Most buildpack authors seem to use Go binaries for the stages of the buildpack specification, which gets the advantages of "real" software development without pulling lots of runtime cruft into the layers just to complete builds (which could also pollute the artifacts). The examples on the buildpacks website all use shell scripts to make them easier to understand, but that is not very extensible, maintainable or performant in the long run. It may be reasonable to begin with shell-based buildpacks (like the sketch in the Concepts section above) for initial demonstrations and workloads and replace them with Go or Rust (which has more enthusiasm in our contributor communities) over time.