Calico

From Wikitech-static

Revisions by Giuseppe Lavagetto and JMeybohm.
{{Kubernetes nav}}

== Introduction ==
[http://docs.projectcalico.org Calico] is a virtual network infrastructure that we use to manage Kubernetes networking.


It provides IPAM for Kubernetes workloads/Pods as well as management of iptables rules, routing tables, and BGP peering of the Kubernetes nodes.


== IPAM ==
We configure IP pools per cluster (via the calico {{Gitweb|project=operations/deployment-charts|file=charts/calico|text=helm-chart}}) that Calico splits into blocks (CRD resource: <code>ipamblocks.crd.projectcalico.org</code>) of prefix length <code>26</code> for IPv4 and <code>122</code> for IPv6, so each block provides 64 addresses; blocks are assigned to nodes on demand. One node can have zero or more IPAM blocks assigned (the first one is assigned as soon as the first Pod is scheduled on the node). As of 2022-01-22 our running Calico version (3.17) does not free unused blocks, so they stay assigned to a node forever.<ref>Calico >= v3.20 will release unused blocks: https://github.com/projectcalico/kube-controllers/pull/799</ref>
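As a quick sanity check on the block sizes above, the address count per block follows directly from the prefix length (a sketch; the <code>26</code> and <code>122</code> values are the pool defaults described above):<syntaxhighlight lang="bash">
# Addresses per IPAM block = 2^(address bits - prefix length)
v4_prefix=26
v6_prefix=122
echo "IPv4 /$v4_prefix: $(( 2 ** (32 - v4_prefix) )) addresses"    # 64
echo "IPv6 /$v6_prefix: $(( 2 ** (128 - v6_prefix) )) addresses"   # 64
</syntaxhighlight>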


On the nodes, the networks of assigned IPAM blocks are blackholed and specific (<code>/32</code>) routes are added for every Pod running on the node:<syntaxhighlight lang="bash">
kubestage1003:~# ip route
default via 10.64.16.1 dev eno1 onlink
10.64.16.0/22 dev eno1 proto kernel scope link src 10.64.16.55
10.64.75.64 dev caliabad5f15937 scope link
blackhole 10.64.75.64/26 proto bird
10.64.75.65 dev cali13b43f910f6 scope link
10.64.75.66 dev cali8bc45095644 scope link
...
</syntaxhighlight>This way, the nodes are authoritative for the assigned IPAM block networks and announce them to their BGP peers.


The IPAM blocks and their affinities are stored in Kubernetes CRD objects and can be viewed and modified using the Kubernetes API, <code>kubectl</code>, or <code>calicoctl</code>:<syntaxhighlight lang="bash">
calicoctl ipam show --show-blocks
kubectl get ipamblocks.crd.projectcalico.org,blockaffinities.crd.projectcalico.org
</syntaxhighlight>


 
Calico IPAM also supports borrowing IPs from the IP blocks of foreign nodes in case a node has used up all its attached IP blocks and cannot get another one from the IP pool. We disable this feature by configuring Calico IPAM with StrictAffinity (see {{Phabricator/en|T296303}}), as IP borrowing only works in a node-to-node mesh configuration.
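Strict affinity is controlled through Calico's cluster-wide IPAM configuration (CRD resource: <code>ipamconfigs.crd.projectcalico.org</code>). A hedged sketch of what that object looks like — the values shown are illustrative, not a dump of our production configuration:<syntaxhighlight lang="yaml">
# Illustrative only -- not a dump of our production object.
apiVersion: crd.projectcalico.org/v1
kind: IPAMConfig
metadata:
  name: default
spec:
  strictAffinity: true      # never borrow IPs from foreign nodes' blocks
  autoAllocateBlocks: true  # still assign new blocks on demand
</syntaxhighlight>The same setting can also be toggled with <code>calicoctl ipam configure --strictaffinity=true</code>.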
 
== Operations ==
Calico should be running via a DaemonSet on every node of a Kubernetes cluster, establishing a BGP peering with the core routers (see [[IP and AS allocations#Private AS]]).
 
Unfortunately, Calico [https://github.com/projectcalico/node/issues/519 currently] does not set the <code>NetworkUnavailable</code> condition to true on nodes where it is not running or is failing, although that will ultimately render the node unusable. Therefore, a Prometheus alert will fire if it fails to scrape Calico metrics from a node.
 
If you are reading this page because you've seen such an alert:
 
* Check the node's state with: <code>kubectl describe node <node fqdn></code>
* Take a look at the latest events in the cluster: https://logstash.wikimedia.org/app/dashboards#/view/d43f9bf0-17b5-11eb-b848-090a7444f26c
* Check the logs of calico-node Pods: https://logstash.wikimedia.org/app/dashboards#/view/f6a5b090-0020-11ec-81e9-e1226573bad4
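Beyond the dashboards, the calico-node Pods can also be inspected directly from a kubectl-enabled host. A sketch — the namespace and label selector shown here are the upstream Calico defaults and are an assumption, not confirmed for our clusters:<syntaxhighlight lang="bash">
# Hypothetical triage commands; namespace/label are upstream Calico defaults.
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
kubectl -n kube-system logs -l k8s-app=calico-node --tail=50
</syntaxhighlight>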
 
== Packaging ==
{{note|<dist> below stands for one of Debian's distribution codenames, e.g. jessie, stretch, buster, bullseye. Make sure you use the one you target.}}We don't actually build Calico; we package its components from upstream binary releases.
 
Because of that, you will need to set [[HTTP proxy]] variables for internet access on the build host.
 
The general process to follow is:
*Check out {{Gitweb|project=operations/debs/calico}} on your workstation
*Decide if you want to package a new master (production) or future (potential next production) version
*Create a patch to bump the debian changelog
<syntaxhighlight lang="bash">
export NEW_VERSION=3.16.5 # Calico version you want to package
dch -v ${NEW_VERSION}-1 -D unstable "Update to v${NEW_VERSION}"
git commit debian/changelog
 
# If you're packaging a new future version, make sure to submit the patch to the correct branch
git review future
</syntaxhighlight>
 
* Merge
* Check out {{Gitweb|project=operations/debs/calico}} on the build host
* Build the packages:
<syntaxhighlight lang="bash">
git checkout future # If you want to build a new version not directly to be released to production
 
# Ensure you allow networking in pbuilder
# This option needs to be in the file, an environment variable will *not* work!
echo "USENETWORKING=yes" >> ~/.pbuilderrc
 
# Build the package
https_proxy=http://webproxy.$(hostname -d):8080 DIST=<dist> pdebuild
</syntaxhighlight>
 
== Updating helm charts ==
There are two helm charts that might need updating, depending on the changes in a newly packaged Calico version:
 
* {{Gitweb|project=operations/deployment-charts|file=charts/calico}}
* {{Gitweb|project=operations/deployment-charts|file=charts/calico-crds}}
 
==Publishing==
 
=== The Debian Packages ===
<syntaxhighlight lang="bash">
# On apt1001, copy the packages from the build host
rsync -vaz deneb.codfw.wmnet::pbuilder-result/<dist>-amd64/calico*<PACKAGE VERSION>* .
 
# If you want to import a new production version, import to component main
sudo -i reprepro -C main --ignore=wrongdistribution include <dist>-wikimedia /path/to/<PACKAGE>.changes
 
# If you want to import a test/pre-production version, import to component calico-future
sudo -i reprepro -C component/calico-future --ignore=wrongdistribution include <dist>-wikimedia /path/to/<PACKAGE>.changes
</syntaxhighlight>
 
=== The Docker Images ===
Calico also includes several Docker images that need to be published to our Docker registry. To simplify the process, the packaging generates a Debian package named "calico-images" that includes the images as well as a script to publish them:<syntaxhighlight lang="bash">
# On the build host, extract the calico-images debian package
tmpd=$(mktemp -d)
dpkg -x /var/cache/pbuilder/result/<dist>-amd64/calico-images_<PACKAGE_VERSION>_amd64.deb $tmpd
 
# Load and push the images
sudo -i CALICO_IMAGE_DIR=${tmpd}/usr/share/calico ${tmpd}/usr/share/calico/push-calico-images.sh
rm -rf $tmpd
</syntaxhighlight>
 
==Updating==
*Update the Debian packages calicoctl and calico-cni on the Kubernetes nodes using [[Debdeploy]]
*Update the <code>image.tag</code> version in <code>helmfile.d/admin_ng/values/<Cluster>/calico-values.yaml</code>
**Deploy to the cluster(s) that you want updated
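The deploy step above can be sketched roughly as follows; the working directory and environment name are assumptions about the deployment-server layout, not verified paths:<syntaxhighlight lang="bash">
# Hedged sketch, run on the deployment server; path and environment
# name are assumptions for illustration.
cd /srv/deployment-charts/helmfile.d/admin_ng
helmfile -e <cluster> -i apply
</syntaxhighlight>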

Latest revision as of 11:00, 6 January 2022