Dragonfly is a P2P-based file distribution system that we use to distribute Docker image layers to Kubernetes worker nodes. It was added to our infrastructure to keep the Docker registry nodes from being overloaded when large deployments (in number of replicas) that also use large Docker images (in layer size) are rolled out (read: MediaWiki).
Dragonfly consists of multiple components:
- supernode: The supernode is a service running on dedicated hosts (Ganeti VMs) that acts as a tracker and scheduler for the P2P network.
- dfget: The download client (wget-like), which at the same time acts as a peer in the P2P network.
- dfdaemon: A local HTTP(S) proxy between the Docker container engine and the Docker registry. It intercepts requests for (specific) layers and uses dfget to download them via the P2P network instead.
For more complete documentation of the design and implementation of Dragonfly, please refer to: https://github.com/dragonflyoss/Dragonfly/blob/master/docs/design/design.md
You may also want to watch the introduction to Dragonfly from KubeCon 2019: https://www.youtube.com/watch?v=LcxBgmmeA80
We currently run one supernode in each datacenter (listening on tcp/8002). All Kubernetes nodes (P2P peers) in that datacenter use that supernode to form the P2P network. This means that Dragonfly P2P networks do not (and should not) span datacenters.
On each Kubernetes node we run dfdaemon (listening on tcp/65001) as an HTTPS proxy between dockerd and the Docker registries. dfdaemon is configured with a TLS certificate that contains the alt name docker-registry.discovery.wmnet, so that connections from dockerd can be transparently intercepted and potentially re-routed through the P2P network. dfdaemon does this by spawning multiple instances of dfget to download from the P2P network and one instance to serve parts (4 MB chunks of Docker image layers) to it. The latter listens on tcp/15001 for connections from other peers (for around 5 minutes; after that period of inactivity the peer unregisters itself and removes the cached chunks from disk).
If a supernode fails, dfdaemon on each P2P peer directs traffic straight to the "source" of the requested data (e.g. the docker-registry) instead of failing. This means that if there is an issue with the P2P network, all Docker daemons pull (more or less) directly from the docker-registry again, potentially exhausting its network links.
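When debugging, it can help to confirm that the ports named above actually have listeners. The following is a minimal sketch, not an established tool: check_ports is a hypothetical helper name, and it assumes ss (from iproute2) is available on the host. It reads ss -tln output on stdin and reports, for each port given as an argument, whether a TCP listener exists.

```shell
# Hypothetical helper: report whether each given TCP port has a listener.
# stdin: output of `ss -tln`; arguments: port numbers to look for.
check_ports() {
    listing=$(cat)
    for port in "$@"; do
        # Column 4 of `ss -tln` is "Local Address:Port"; the port is the
        # last colon-separated field (this also covers IPv6 such as [::]:15001).
        if printf '%s\n' "$listing" | awk -v p="$port" '
            $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
            END { exit !found }'; then
            echo "$port listening"
        else
            echo "$port not listening"
        fi
    done
}

# On a Kubernetes node (dfdaemon and the dfget peer server):
#   ss -tln | check_ports 65001 15001
# On a supernode:
#   ss -tln | check_ports 8002
```

Note that tcp/15001 is only expected to be open while a dfget instance is serving chunks, so "not listening" on an idle node is not necessarily an error.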
Monitoring / Logging
Monitoring currently relies on Icinga to watch the state of the systemd services on the supernode as well as on the P2P peers (dfdaemon).
There is a Grafana dashboard with some metrics at https://grafana-rw.wikimedia.org/d/CmbiPADWz/dragonfly
Where to look for logs
- dfget's downloading chunks:
- dfget's serving chunks:
Disable the use of Dragonfly on a Kubernetes node
sudo disable-puppet 'disable dragonfly'
sudo systemctl revert docker.service
sudo systemctl restart docker.service
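To double-check that the revert took effect, one option is to confirm that docker.service no longer has any drop-in files. This is a sketch under an assumption: that the Dragonfly proxy settings reach dockerd through a systemd drop-in (which is what systemctl revert removes). no_dropins_left is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the given `systemctl show` line
# lists no drop-in files for the unit.
# $1: output of `systemctl show docker.service --property=DropInPaths`
no_dropins_left() {
    paths=${1#DropInPaths=}   # strip the property name, keep the value
    [ -z "$paths" ]
}

# On the node:
#   no_dropins_left "$(systemctl show docker.service --property=DropInPaths)" \
#       && echo "docker.service has no drop-ins left"
```

Puppet stays disabled in the meantime, so the drop-in is not put back until Puppet is enabled and run again.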
Importing a new version
The imported upstream tarballs should include the complete vendor directory.
- Check out the version (git tag) to import
$ ./debian/repack vX.Y.Z
- This drops you into a shell with the git tag checked out. Do necessary changes here and commit
$ go mod vendor
$ git add -f vendor
# Run git diff --name-status --cached | grep -v 'vendor/' to make sure you only changed vendor
$ git commit -m "added vendor"
- Exiting the shell will build a tarball to import
$ gbp import-orig /path/to/tarball.tar.xz
- Push changes (including the tag created by gbp) to gerrit
$ git push gerrit --all
$ git push gerrit --tags
- Add a debian/changelog entry (as CR)
$ gbp dch
# Edit debian/changelog
$ git commit
$ git review
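The "only vendor changed" check from the vendoring step can be made a little stricter than eyeballing grep output. A minimal sketch, with staged_outside_vendor as a hypothetical helper name: it reads a list of staged paths on stdin and prints those that are not under vendor/, so empty output means the commit touches only the vendor directory.

```shell
# Hypothetical helper: print staged paths that are NOT under vendor/.
# stdin: output of `git diff --name-only --cached`
staged_outside_vendor() {
    # grep exits non-zero when nothing matches; that is the good case here,
    # so the exit status is deliberately ignored.
    grep -v '^vendor/' || true
}

# Inside the repack shell, after `git add -f vendor`:
#   git diff --name-only --cached | staged_outside_vendor
# No output means only vendor/ is staged.
```

Anchoring the pattern with ^ avoids false matches on paths that merely contain "vendor/" somewhere in the middle.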
Building a new version
- Check out the git repo on the build host
$ git clone "https://gerrit.wikimedia.org/r/operations/debs/dragonfly" && cd dragonfly
- Build the package
$ BACKPORTS=yes gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=buster -sa -uc -us
Publish a new version
# on apt1001
rsync -vaz deneb.codfw.wmnet::pbuilder-result/buster-amd64/dragonfly* .
sudo -i reprepro -C main include buster-wikimedia /path/to/<PACKAGE>.changes
# Copy the package over to other distros (this is possible because they only contain static binaries)
sudo -i reprepro copysrc stretch-wikimedia buster-wikimedia dragonfly
If you need to add/update patches, please see: https://honk.sigxcpu.org/projects/git-buildpackage/manual-html/gbp.patches.html