Calico
{{Kubernetes nav}}


== Introduction ==
[http://docs.projectcalico.org Calico] is a virtual network infrastructure that we use to manage Kubernetes networking.


It provides IPAM for Kubernetes workloads/Pods as well as management of iptables rules, routing tables and BGP peering for the Kubernetes nodes.


== IPAM ==
We configure IP pools per cluster (via the calico {{Gitweb|project=operations/deployment-charts|file=charts/calico|text=helm-chart}}) that Calico splits up into blocks (CRD resource: <code>ipamblocks.crd.projectcalico.org</code>) of size <code>/26</code> for IPv4 and <code>/122</code> for IPv6, so each block provides 64 addresses. Blocks are assigned to nodes on demand: one node can have zero or more IPAM blocks assigned (the first one is assigned as soon as the first Pod is scheduled on the node). As of 2022-01-22 our running Calico version (3.17) does not free unused blocks, so they stay assigned to a node forever.<ref>Calico >= v3.20 will release unused blocks: https://github.com/projectcalico/kube-controllers/pull/799</ref>
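To see what a cluster actually has configured, the pools can be queried directly. A minimal sketch (output format varies by Calico version):
<syntaxhighlight lang="bash">
# List the configured IP pools and their CIDRs
calicoctl get ippool -o wide

# Full spec as YAML, including the blockSize per pool
calicoctl get ippool -o yaml
</syntaxhighlight>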


On the nodes, the networks of assigned IPAM blocks are blackholed and specific (<code>/32</code>) routes are added for every Pod running on the node:<syntaxhighlight lang="bash">
kubestage1003:~# ip route
default via 10.64.16.1 dev eno1 onlink
10.64.16.0/22 dev eno1 proto kernel scope link src 10.64.16.55
10.64.75.64 dev caliabad5f15937 scope link
blackhole 10.64.75.64/26 proto bird
10.64.75.65 dev cali13b43f910f6 scope link
10.64.75.66 dev cali8bc45095644 scope link
...
</syntaxhighlight>This way, the nodes are authoritative for the assigned IPAM block networks and announce them to their BGP peers.
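To verify a node is actually peering, <code>calicoctl</code> can report the state of its BGP sessions. A minimal sketch (run as root on the node itself):
<syntaxhighlight lang="bash">
# On the Kubernetes node: show BIRD status and the state of all BGP sessions
sudo calicoctl node status
</syntaxhighlight>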
 
The IPAM blocks and their affinities are stored in Kubernetes CRD objects and can be viewed and modified using the Kubernetes API, <code>kubectl</code> or <code>calicoctl</code>:<syntaxhighlight lang="bash">
calicoctl ipam show --show-blocks
kubectl get ipamblocks.crd.projectcalico.org,blockaffinities.crd.projectcalico.org
</syntaxhighlight>
 
 
Calico IPAM also supports a concept of borrowing IPs from IP blocks of foreign nodes in case a node has used up all its attached IP blocks and can't get another one from the IP pool. We disable this feature by configuring Calico IPAM with StrictAffinity (see {{Phabricator/en|T296303}}), as it only works in a node-to-node mesh configuration.
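A sketch of how to inspect this setting with <code>calicoctl</code>, and how it can be toggled (in practice it should stay as set up in the linked task):
<syntaxhighlight lang="bash">
# Show the cluster-wide IPAM configuration, including StrictAffinity
calicoctl ipam show --show-configuration

# Set strict affinity explicitly (normally not needed by hand)
calicoctl ipam configure --strictaffinity=true
</syntaxhighlight>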


== Operations ==
Calico should be running via a DaemonSet on every node of a Kubernetes cluster, establishing a BGP peering with the core routers (see [[IP and AS allocations#Private AS]]).
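A quick health check with plain <code>kubectl</code>; note that the namespace and label below follow upstream defaults and are an assumption here, adjust them to wherever our chart deploys:
<syntaxhighlight lang="bash">
# DESIRED/READY of the DaemonSet should match the number of schedulable nodes
kubectl -n kube-system get daemonset calico-node

# One calico-node Pod per node, all Running
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
</syntaxhighlight>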


Unfortunately, Calico [https://github.com/projectcalico/node/issues/519 currently] does not set the <code>NetworkUnavailable</code> condition to true on nodes where it is not running or is failing, even though that ultimately renders the node unusable. Therefore a Prometheus alert will fire if it fails to scrape Calico metrics from a node.


If you are reading this page because you've seen such an alert:
* Check the node's state with <code>kubectl describe node <node fqdn></code> (see the sketch after this list for a cluster-wide view)
* Take a look at the latest events in the cluster: https://logstash.wikimedia.org/app/dashboards#/view/d43f9bf0-17b5-11eb-b848-090a7444f26c
* Check the logs of calico-node Pods: https://logstash.wikimedia.org/app/dashboards#/view/f6a5b090-0020-11ec-81e9-e1226573bad4
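Since the condition is missing, a quick cluster-wide first pass is to list every node's <code>Ready</code> condition with plain <code>kubectl</code> (a sketch):
<syntaxhighlight lang="bash">
# One line per node: node name and the status of its Ready condition
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
</syntaxhighlight>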
== Packaging ==
{{note|<dist> below stands for one of Debian's distribution codenames, e.g. jessie, stretch, buster, bullseye. Make sure you use the one you target}}We don't actually build Calico but package its components from upstream binary releases.
 
Because of that, you will need to set [[HTTP proxy]] variables for internet access on the build host.
 
The general process to follow is:
*Check out {{Gitweb|project=operations/debs/calico}} on your workstation
*Decide if you want to package a new master (production) or future (potential next production) version
*Create a patch to bump the debian changelog
<syntaxhighlight lang="bash">
export NEW_VERSION=3.16.5 # Calico version you want to package
dch -v ${NEW_VERSION}-1 -D unstable "Update to v${NEW_VERSION}"
git commit debian/changelog
 
# If you're packaging a new future version, make sure to submit the patch to the correct branch
git review future
</syntaxhighlight>
 
* Merge
* Check out {{Gitweb|project=operations/debs/calico}} on the build host
* Build the packages:
<syntaxhighlight lang="bash">
git checkout future # If you want to build a new version not directly to be released to production
 
# Ensure you allow networking in pbuilder
# This option needs to be in the file, an environment variable will *not* work!
echo "USENETWORKING=yes" >> ~/.pbuilderrc
 
# Build the package
https_proxy=http://webproxy.$(hostname -d):8080 DIST=<dist> pdebuild
</syntaxhighlight>
 
== Updating helm charts ==
There are two helm charts that might need updating, depending on the changes in a newly packaged Calico version:
 
* {{Gitweb|project=operations/deployment-charts|file=charts/calico}}
* {{Gitweb|project=operations/deployment-charts|file=charts/calico-crds}}
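A hypothetical way to find the places that need a bump, assuming a local checkout of operations/deployment-charts with the standard Helm chart layout:
<syntaxhighlight lang="bash">
# Hypothetical: grep both charts for version and image tag references
grep -rn -i -e version -e tag \
    charts/calico/Chart.yaml charts/calico/values.yaml \
    charts/calico-crds/Chart.yaml charts/calico-crds/values.yaml
</syntaxhighlight>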
 
== Publishing ==
 
=== The Debian Packages ===
<syntaxhighlight lang="bash">
# On apt1001, copy the packages from the build host
rsync -vaz deneb.codfw.wmnet::pbuilder-result/<dist>-amd64/calico*<PACKAGE VERSION>* .
 
# If you want to import a new production version, import to component main
sudo -i reprepro -C main --ignore=wrongdistribution include <dist>-wikimedia /path/to/<PACKAGE>.changes
 
# If you want to import a test/pre-production version, import to component calico-future
sudo -i reprepro -C component/calico-future --ignore=wrongdistribution include <dist>-wikimedia /path/to/<PACKAGE>.changes
</syntaxhighlight>
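Afterwards the import can be verified; <code>reprepro ls</code> prints the versions known per distribution and component:
<syntaxhighlight lang="bash">
# Still on apt1001: confirm which distributions/components carry the packages
sudo -i reprepro ls calicoctl
sudo -i reprepro ls calico-cni
</syntaxhighlight>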


=== The Docker Images ===
Calico also ships a number of Docker images which need to be published to our Docker registry. To simplify the process, the packaging generates a Debian package named "calico-images" that includes the images as well as a script to publish them:<syntaxhighlight lang="bash">
# On the build host, extract the calico-images debian package
tmpd=$(mktemp -d)
dpkg -x /var/cache/pbuilder/result/<dist>-amd64/calico-images_<PACKAGE_VERSION>_amd64.deb $tmpd
# Load and push the images
sudo -i CALICO_IMAGE_DIR=${tmpd}/usr/share/calico ${tmpd}/usr/share/calico/push-calico-images.sh
rm -rf $tmpd
</syntaxhighlight>
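To double-check the push, the registry's v2 API can list the published tags. A sketch; the image path below is an assumption, consult <code>push-calico-images.sh</code> for the actual names:
<syntaxhighlight lang="bash">
# Hypothetical image path; adjust to what the push script actually publishes
curl -s https://docker-registry.wikimedia.org/v2/calico/node/tags/list
</syntaxhighlight>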
== Updating ==
*Update the Debian packages calicoctl and calico-cni on the Kubernetes nodes using [[Debdeploy]]
*Update <code>image.tag</code> version in <code>helmfile.d/admin_ng/values/<Cluster>/calico-values.yaml</code>
**Deploy to the cluster(s) that you want updated (a sketch follows this list)
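A hypothetical sequence for the helmfile part, assuming the usual admin_ng layout on the deployment server (the exact invocation may differ):
<syntaxhighlight lang="bash">
# Hypothetical: bump the tag, review the diff, then apply per cluster
$EDITOR helmfile.d/admin_ng/values/<Cluster>/calico-values.yaml
helmfile -e <Cluster> -f helmfile.d/admin_ng/helmfile.yaml diff
helmfile -e <Cluster> -f helmfile.d/admin_ng/helmfile.yaml apply
</syntaxhighlight>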

== References ==
<references />