== Clusters ==

We maintain Kubernetes clusters in both the [[SRE/Production access|production]] and the [[Help:Cloud Services introduction|cloud services]] realms.

Most of the information on this page and its subpages applies to the clusters in the production realm, although some techniques and tools are broadly applicable to other WMF clusters and Kubernetes in general.

The '''[[Kubernetes/Clusters]]''' page contains the definitive list of currently maintained clusters in the production realm, along with information about who manages them and each cluster's specific purpose.

For information relating to the Kubernetes clusters in the cloud services realm, please see [[Kubernetes#Toolforge info|Toolforge info]].
== Packages ==
* Learn more about [[Deployment pipeline/Migration/Tutorial|migrating a service to Kubernetes]] and the [[Deployment pipeline]] in general.
== Deployment Charts ==

We use a git repository called [[gerrit:plugins/gitiles/operations/deployment-charts/+/refs/heads/master/|operations/deployment-charts]] to manage all of the applications and deployments to Kubernetes clusters in the production realm. It primarily contains [[Helm]] charts and [[Helmfile]] deployments.

See [[Kubernetes/Deployment Charts]] for more detailed information about the repository structure and its various functions.

The services and deployments that are defined within the repository are a combination of:
* WMF software, running on [[Kubernetes/Images#Services images|service images]] managed by the [[deployment pipeline]]
* WMF forks of third-party software, also running on [[Kubernetes/Images#Services images|service images]] managed by the [[deployment pipeline]]
* WMF builds of third-party software, running on [[Kubernetes/Images#Production images|production images]] and built with [https://doc.wikimedia.org/docker-pkg/ docker-pkg]

See [[Kubernetes/Deployments]] for instructions regarding day-to-day deployment of Kubernetes [[Kubernetes#services|services]].
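As a quick orientation (a sketch based only on the paths used elsewhere on this page; see [[Kubernetes/Deployment Charts]] for the authoritative layout), the repository is checked out on the deployment servers under <code>/srv/deployment-charts</code>:

<syntaxhighlight lang="bash">
# Per-service Helmfile deployments, one directory per service (scaffolded from _example_):
ls /srv/deployment-charts/helmfile.d/services/
# Cluster-scoped admin state (namespaces, quotas, ...), applied with "helmfile -e <cluster> -i apply":
ls /srv/deployment-charts/helmfile.d/admin_ng/
</syntaxhighlight>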
== Debugging ==

For a quick intro to the debugging actions one can take during a problem in production, look at [[Kubernetes/Helm]]. There will also be a guide posted under [[Kubernetes/Kubectl]].
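As a minimal first-look sketch (assuming you have access to the deployment server and the relevant namespace; <service>, <namespace> and <pod-name> are placeholders):

<syntaxhighlight lang="bash">
kube_env <service> eqiad                          # select the service's kubeconfig/context on the deployment server
kubectl get pods -n <namespace>                   # anything crashlooping, pending or recently restarted?
kubectl describe pod <pod-name> -n <namespace>    # recent events, restart counts, resource limits
kubectl logs <pod-name> -n <namespace>            # application logs of the container
</syntaxhighlight>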
== Administration ==

Collected instructions and runbooks for many of the tasks in this section are also maintained at [[Kubernetes/Administration]].

=== Create a new cluster ===

Documentation for creating a new cluster can be found at [[Kubernetes/Clusters/New]].

=== Add a new service ===
To add a new service named '''service-foo''' to the clusters of the '''main''' group:

#Ensure the service has its ports registered at [[Service ports]].
#Create deployment user/tokens in the puppet private and public repos. You can use a randomly generated 22-character [A-Za-z0-9] password for each of the two required tokens. You need to edit the <code>hieradata/common/profile/kubernetes.yaml</code> file in the private repository, specifically the <code>profile::kubernetes::infrastructure_users</code> key, as in the example below:<syntaxhighlight lang="yaml">
profile::kubernetes::infrastructure_users:
  main:
    client-infrastructure:
      token: <REDACTED>
      groups: [system:masters]
    ...
+   service-foo:
+     token: <YOUR_TOKEN>
+     groups:
+       - deploy
+   service-foo-deploy:
+     token: <ANOTHER_TOKEN>
</syntaxhighlight>The additional user with the -deploy suffix is required due to the access control policies configured. Please see [[phab:T251305#7314778|this comment]] for a more detailed explanation of how this pattern arose.
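#:For example, each of the two random tokens can be generated like this (an illustrative sketch; any generator producing 22 alphanumeric characters works):<syntaxhighlight lang="bash">
# Print one 22-character [A-Za-z0-9] token; run once per required token.
tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 22; echo
</syntaxhighlight>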
#Tell the deployment server how to set up the kubeconfig files. This is done by modifying the <code>profile::kubernetes::deployment_server::services</code> hiera key (<code>hieradata/common/profile/kubernetes/deployment_server.yaml</code>) as in the example below:<syntaxhighlight lang="yaml">
profile::kubernetes::deployment_server::services:
  main:
    mathoid:
      usernames:
        - name: mathoid
    ...
+   service-foo:
+     usernames:
+       - name: service-foo
+         owner: mwdeploy
+         group: wikidev
+         mode: "0640"
</syntaxhighlight> Please note that the owner/group/mode here refer to the file permissions of your kubeconfig file ("/etc/kubernetes/service-foo-<cluster_name>.config"), determining which users/groups will be able to use this kubeconfig. Typically, for normal service users, you don't need to define them, as the defaults are correct.
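#:Once the change is merged and puppet has run on the deployment server (see the last step below), you can sanity-check the generated kubeconfig files and their permissions (path taken from the note above):<syntaxhighlight lang="bash">
ls -l /etc/kubernetes/service-foo-*.config
</syntaxhighlight>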
#Ask Service Ops to add the private data (secrets) for your service. This is done by adding an entry for service-foo under <code>profile::kubernetes::deployment_server_secrets::services</code> in the private repository (<code>role/common/deployment_server.yaml</code>). Secrets will most likely be needed for all clusters, including staging.
#Add a Kubernetes namespace. Example commit:
#* '''kubernetes namespace:''' deployment-charts https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/693124
#At this point, you can safely merge the changes (after '''somebody from Service Ops validates'''). After merging, it is important to run the commands in the next step, to avoid impacting other people rolling out changes later on.
#Set up the service in the staging-codfw cluster (and then in the other clusters), as follows.
'''On a cumin server''' ({{CuminHosts}}):
 sudo cumin -b 4 -s 2 kubemaster* 'run-puppet-agent'

'''On deploy1002:'''
 sudo run-puppet-agent
 sudo -i
 cd /srv/deployment-charts/helmfile.d/admin_ng/
 helmfile -e staging-codfw -i apply

The command above should show you a diff in namespaces, quotas, etc. related to your new service. If you don't see a diff, ping somebody from the Service Ops team! Check that everything is OK:
 kube_env $YOUR-SERVICE-NAME staging-codfw
 kubectl get ns
 kubectl get pods

You should be able to see info about your namespace, and <code>kubectl get pods</code> should show a tiller pod.

'''Repeat for the staging-eqiad, eqiad and codfw clusters, even if you aren't ready to fully deploy your service. Leaving things undeployed will impede further operations by other people.'''
==== Deploy a service to staging ====

At this point you should have a chart for your service (TODO: link to docs?), and you will need to set up a directory under <code>helmfile.d/services</code> in the {{Gitweb|project=operations/deployment-charts}} repository for the deployment. You can copy the structure (helmfile.yaml, values.yaml, values-staging.yaml, etc.) from {{Gitweb|project=operations/deployment-charts|file=helmfile.d/services/_example_}} and customize it as needed, as sketched below.
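For example, the scaffolding step might look like this in a local checkout (an illustrative sketch, not official procedure; the resulting change is submitted through Gerrit for review as usual):

<syntaxhighlight lang="bash">
git clone "https://gerrit.wikimedia.org/r/operations/deployment-charts"
cd deployment-charts/helmfile.d/services/
cp -r _example_ service-foo     # "service-foo" is the placeholder service name used on this page
$EDITOR service-foo/helmfile.yaml service-foo/values.yaml service-foo/values-staging.yaml
</syntaxhighlight>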
You can proceed to deploy the new service to staging for real. Don't worry about TLS (if needed): in staging, a default TLS configuration is added for your service automagically. Things are slightly different for production.

'''On deploy1002:'''
 cd /srv/deployment-charts/helmfile.d/services/service-foo
 helmfile -e staging -i apply

The command above will show a diff related to the new service. Make sure that everything looks fine, then answer Yes to proceed.
==== Testing a service ====
Now we can test the service in staging. Use the very handy endpoint <code>http(s)://staging.svc.eqiad.wmnet:$YOUR-SERVICE-PORT</code> to quickly check that everything works as expected, for example as shown below.
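A quick manual smoke test from the deployment server could look like this (a sketch; the port is hypothetical, so substitute the one registered in [[Service ports]], and use plain http:// if the service has no TLS endpoint in staging):

<syntaxhighlight lang="bash">
YOUR_SERVICE_PORT=4242    # hypothetical port; use the one registered for your service
curl -vk "https://staging.svc.eqiad.wmnet:${YOUR_SERVICE_PORT}/"
</syntaxhighlight>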
==== Deploy a service to production ====

#Create certificates for the new service, if it has an HTTPS endpoint (remember that this step is handled automatically for staging, but not for production).
#[[Kubernetes/Enabling TLS|Enable TLS for Kubernetes deployments]].
#At this point, you need to update the admin config for eqiad and codfw (if you have configs for both, of course):
#*On deploy1002: <code>sudo -i; cd /srv/deployment-charts/helmfile.d/admin/codfw/; kube_env admin codfw; ./cluster-helmfile.sh -i apply</code>
#*On deploy1002: <code>sudo -i; cd /srv/deployment-charts/helmfile.d/admin/eqiad/; kube_env admin eqiad; ./cluster-helmfile.sh -i apply</code>
#Then the final step, namely deploying the new service:
#*On deploy1002: <code>cd /srv/deployment-charts/helmfile.d/services/service-foo; helmfile -e codfw -i apply</code>
#*On deploy1002: <code>cd /srv/deployment-charts/helmfile.d/services/service-foo; helmfile -e eqiad -i apply</code>

The service can now be accessed via the registered port on any of the Kubernetes nodes (for manual testing).

If you need the service to be easily accessible from outside of the cluster, you might want to [[LVS#Add%20a%20new%20load%20balanced%20service|add a new load balanced service]].
=== Rebooting a worker node ===

==== The unpolite way ====

To reboot a worker node, you can just reboot it in our environment. The platform will notice the event and respawn the pods on other nodes. However, the system does not automatically rebalance itself afterwards (pods are not rescheduled on the node after it has been rebooted).
==== The polite way (recommended) ====

If you feel like being more polite, use <code>kubectl drain</code>: it will configure the worker node to no longer accept new pods and will move the existing pods to other workers. Draining the node will take some time; rough numbers from 2019-12-11 are around 60 seconds.

<syntaxhighlight lang="shell-session">
# kubectl drain --ignore-daemonsets kubernetes1001.eqiad.wmnet
# kubectl describe pods --all-namespaces | awk '$1=="Node:" {print $NF}' | sort -u
kubernetes1002.eqiad.wmnet/10.64.16.75
kubernetes1003.eqiad.wmnet/10.64.32.23
kubernetes1004.eqiad.wmnet/10.64.48.52
kubernetes1005.eqiad.wmnet/10.64.0.145
kubernetes1006.eqiad.wmnet/10.64.32.18
# kubectl get nodes
NAME                         STATUS                     ROLES    AGE      VERSION
kubernetes1001.eqiad.wmnet   Ready,SchedulingDisabled   <none>   2y352d   v1.12.9
kubernetes1002.eqiad.wmnet   Ready                      <none>   2y352d   v1.12.9
kubernetes1003.eqiad.wmnet   Ready                      <none>   2y352d   v1.12.9
kubernetes1004.eqiad.wmnet   Ready                      <none>   559d     v1.12.9
kubernetes1005.eqiad.wmnet   Ready                      <none>   231d     v1.12.9
kubernetes1006.eqiad.wmnet   Ready                      <none>   231d     v1.12.9
</syntaxhighlight>
When the node has been rebooted, it can be configured to accept pods again using '''kubectl uncordon''', e.g.

<syntaxhighlight lang="shell-session">
# kubectl uncordon kubernetes1001.eqiad.wmnet
# kubectl get nodes
NAME                         STATUS   ROLES    AGE      VERSION
kubernetes1001.eqiad.wmnet   Ready    <none>   2y352d   v1.12.9
kubernetes1002.eqiad.wmnet   Ready    <none>   2y352d   v1.12.9
kubernetes1003.eqiad.wmnet   Ready    <none>   2y352d   v1.12.9
kubernetes1004.eqiad.wmnet   Ready    <none>   559d     v1.12.9
kubernetes1005.eqiad.wmnet   Ready    <none>   231d     v1.12.9
kubernetes1006.eqiad.wmnet   Ready    <none>   231d     v1.12.9
</syntaxhighlight>

The pods are not rebalanced automatically, i.e. the rebooted node is initially free of pods; see the note below for one way to spread load back onto it.
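If you want the rebooted node to pick up work again sooner, one option (a suggestion rather than an established runbook) is a rolling restart of selected workloads, as described in the "Recreate pods" section below; the scheduler may then place some of the recreated pods on the freshly uncordoned node:

<syntaxhighlight lang="bash">
kubectl -n NAMESPACE rollout restart deployment NAME
</syntaxhighlight>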
=== Restarting specific components ===

kube-controller-manager and kube-scheduler are control plane components. In production, multiple instances run and perform a leader election via the API to determine which one is active. Restarting both is without grave consequences, so it's safe to do. However, both are critical components in that they are required for the overall cluster to function smoothly: kube-scheduler is crucial for node failovers, pod evictions, etc., while kube-controller-manager bundles multiple controller components and is critical for responding to pod failures, depools, etc.

The commands are:<syntaxhighlight lang="bash">
sudo systemctl restart kube-controller-manager
sudo systemctl restart kube-scheduler
</syntaxhighlight>
=== Restarting the API server ===

It's behind LVS in production, so it's fine to restart it as long as enough time is given between the restarts across the cluster.
<syntaxhighlight lang="bash">
sudo systemctl restart kube-apiserver
</syntaxhighlight>

If you need to restart all API servers, it might be wise to start with the ones that are not currently leading the cluster (to avoid multiple leader elections). The current leader is stored in the <code>control-plane.alpha.kubernetes.io/leader</code> annotation of the kube-scheduler endpoint:<syntaxhighlight lang="bash">
kubectl -n kube-system describe ep kube-scheduler
</syntaxhighlight>
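To see only the relevant annotation (a convenience one-liner, not an established runbook), you can use something like:

<syntaxhighlight lang="bash">
# The annotation's JSON value has a "holderIdentity" field naming the current leader.
kubectl -n kube-system get ep kube-scheduler -o yaml | grep -F 'control-plane.alpha.kubernetes.io/leader'
</syntaxhighlight>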
=== Switch the active staging cluster (eqiad<->codfw) ===

We have one staging cluster per DC, mostly to separate staging of Kubernetes itself and its components from staging of the services running on top of it. To keep providing staging services while one of the clusters is being worked on, we can (manually) switch between the DCs:

* Switch staging.svc.eqiad.wmnet to point to the new active k8s cluster (we should have a better solution/DNS name for this at some point):
** https://gerrit.wikimedia.org/r/c/operations/dns/+/667982
* Switch the definition of "staging" on the deployment servers:
** https://gerrit.wikimedia.org/r/c/operations/puppet/+/667996
* Switch CI and releases to the other kubestagemaster:
** https://gerrit.wikimedia.org/r/c/operations/puppet/+/668114
** <syntaxhighlight lang="bash">
sudo cumin -b 3 'O:ci::master or O:releases or O:deployment_server' 'run-puppet-agent -q'
</syntaxhighlight>
* Make sure all service deployments are up to date after the switch (e.g. deploy them all).
=== Managing pods, jobs and cronjobs ===

Commands should be run from the [[Deployment_server|deployment servers]] (at the time of this writing, [[deploy1002]]).

You need to set the correct context first, for example:
 kube_env admin eqiad
Other choices are codfw, staging-eqiad and staging-codfw.

The management command is called [[kubectl]]. You may find some more inspiration on kubectl commands at [[Kubernetes/kubectl_Cheat_Sheet]].
==== Listing cronjobs, jobs and pods ====
 kubectl get cronjobs -n <namespace>
 kubectl get jobs -n <namespace>
 kubectl get pods -n <namespace>

==== Deleting a job ====
 kubectl delete job <job id> -n <namespace>
==== Updating the docker image run by a CronJob ====

The relationship between the resources is the following:

 CronJob --spawns--> Job(s) --spawns--> Pod(s)

Note: technically speaking, it's a tight control loop that lives in kube-controller-manager that does the spawning part, but adding that to the diagram above would only make it more confusing.

Under normal conditions, a docker image version will be updated when a new deploy happens, and the CronJob will then reference the new version. However, jobs already created by the CronJob will not be stopped until they have run to completion.

When such a job finishes, the CronJob will create new job(s), which in turn will create new pod(s).

Depending on how the CronJob's schedule correlates with the job's run time, there may be a window of time where, despite the new deployment, the old job is still running.

Deleting the Kubernetes pod created by the job itself will NOT work: the job will still exist and will create a new pod (which will still use the old image).

So, if you are dealing with a long-running Kubernetes job, you can get the desired effect by deleting the job created by the CronJob instead, as sketched below.

[[phab:T280076]] is an example where this was needed.
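A sketch of that procedure (names are placeholders; run it from the deployment server with the appropriate context):

<syntaxhighlight lang="bash">
kube_env <service> eqiad
kubectl get jobs -n <namespace>                  # jobs created by a CronJob are named <cronjob-name>-<timestamp>
kubectl delete job <job-name> -n <namespace>     # its pods go away; the CronJob spawns a fresh job (with the new image) on the next schedule
</syntaxhighlight>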
==== Recreate pods (of deployments, daemonsets, statefulsets, ...) ====
Pods which are backed by workload controllers (such as Deployments or DaemonSets) can easily be recreated, without the need to delete them manually, using <code>kubectl rollout</code>. This makes sure that the update strategy specified for the set of pods, as well as disruption budgets etc., are properly honored.

To restart all pods of a specific Deployment/DaemonSet:<syntaxhighlight lang="bash">
kubectl -n NAMESPACE rollout restart [deployment|daemonset|statefulset|...] NAME
</syntaxhighlight>You may also restart all pods of all Deployments/DaemonSets in a specific namespace just by omitting the name. The command returns immediately (i.e. it does not wait for the process to complete) and the actual rolling restart happens in the background.

In order to restart workloads across multiple namespaces, one can use something like:
<syntaxhighlight lang="bash">
kubectl get ns -l app.kubernetes.io/managed-by=Helm -o jsonpath='{.items[*].metadata.name}' | xargs -L1 -d ' ' kubectl rollout restart deployment -n
</syntaxhighlight>
This works with or without label filters; the filter above ensures that, for example, workloads in pre-defined namespaces (like kube-system) do not get restarted.
== See also ==

* [[Kubernetes/Clusters/New|Adding a new Kubernetes cluster]]
* [[Kubernetes/Administration|Kubernetes administration runbooks]]:
** [[Kubernetes/Administration#Rebooting worker nodes|Rebooting worker nodes]]
** [[Kubernetes/Administration#Restarting specific components|Restarting specific components]]
** [[Kubernetes/Administration#Managing pods, jobs and cronjobs|Managing pods, jobs and cronjobs]]