Difference between revisions of "Enable TLS for Kubernetes deployments"

(Legoktm moved page Enable TLS for Kubernetes deployments to Kubernetes/Enabling TLS: Move under Kubernetes/)
We use [[envoy]] to provide TLS termination for services. It is installed as a sidecar in each pod and functions as a reverse proxy to the app. We intend to use it at some point to initiate TLS connections as well, but that's down the road.
#REDIRECT [[Kubernetes/Enabling TLS]]
== Add support to the chart ==
TLS support already exists; it has been split off from the individual charts and lives in [ common_templates/]. Feel free to look at other charts and copy their approach. The basics are:
* Symlink from the chart to those (helm package will resolve them correctly)
* Amend the chart to use them
* Add a values file to the .fixtures/ directory, so CI can test the chart with TLS enabled
* Define a proper "upstream_timeout" for envoy to use. Current default is 60s
* Use the most recent image version (
* Choose a new TCP port.
** Update [[Service ports]] to point that out.
** Make it configurable so we can change it without messing with the chart
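A values file for the .fixtures/ directory might look roughly like the sketch below. Only <code>tls.enabled</code> and <code>upstream_timeout</code> are names taken from this page; the <code>public_port</code> key, its value, and the nesting are assumptions for illustration, so check common_templates/ and an existing chart for the actual keys.
<syntaxhighlight lang="yaml">
# Hypothetical .fixtures/ values file that exercises the TLS code path in CI.
tls:
  enabled: true            # key name as used on this page
  public_port: 4443        # hypothetical key and port; choose a free port and record it in Service ports
  upstream_timeout: "60s"  # current default per this page; nesting under tls is an assumption
</syntaxhighlight>
Keeping the port in the values file rather than in the templates is what makes it changeable without touching the chart.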
== Create and place certificates ==
* Patch the helm chart to add the relevant stanzas. Remember to package the chart and reindex before merging your patch
* Assuming you've guarded the TLS addition, do a noop deployment to verify you didn't change something fundamental
* For staging deployments, certificates for staging.svc.eqiad.wmnet and staging.svc.codfw.wmnet are provided by default. You may of course override them if you need to
* Add the relevant production certificate to puppet's private repo:
** edit <code>/srv/private/modules/secret/secrets/certificates/certificate.manifests.d/kube_services.certs.yaml</code> and add a stanza for your service. It should closely mimic the existing ones.  '''DO NOT SET A PASSWORD'''.  Using a password results in an encrypted key file, which envoyproxy can't use.
** run cergen <code>cergen -c '$SERVICE_NAME.*'  --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d</code> to see if the right certificates would be generated; then run again adding <code>--generate</code> to create the certificate
** '''ONLY IF YOU SET A KEY PASSWORD''' do the following: We need the unencrypted key, so create it with <code>openssl ec -in modules/secret/secrets/certificates/$CERT_NAME/$CERT_NAME.key.private.pem -out modules/secret/secrets/certificates/$CERT_NAME/$CERT_NAME.key.private.unencrypted.pem</code>. You will be asked for the password you set in cergen
** Commit all the generated files to git
** edit <code>/srv/private/hieradata/role/common/deployment_server.yaml</code> to add it to the appropriate place there, for all production environments:
**: <syntaxhighlight lang="yaml">
      tls: &blubberoid_certs
          # NOTE: If you set a password, use the $CERT_NAME.key.private.unencrypted.pem file you created instead.
          key: "secret(certificates/$CERT_NAME/$CERT_NAME.key.private.pem)"
          cert: "secret(certificates/$CERT_NAME/$CERT_NAME.crt.pem)"
      tls: *blubberoid_certs
</syntaxhighlight>
** commit all your changes
* Run puppet on the deployment hosts, verify the data that gets written to the <code>/etc/helmfile-defaults/private/$SERVICE_NAME/{staging,eqiad,codfw}.yaml</code> is correct
* Add the rest of the configuration for tls enablement in deployment-charts under <code>helmfile.d/services/$SERVICE_NAME/values*.yaml</code>
* Happy helming!
== Deploy the new chart version that has TLS support ==
Running helmfile sync/apply in all of the clusters (staging, codfw, eqiad) should cover this. The documentation could use some love, but we already have [[Deployments on kubernetes]].
== Enable the TLS support ==
Create a Gerrit change that switches <code>tls.enabled</code> to true, perhaps one cluster at a time, and deploy it.
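As a minimal sketch, a per-cluster change could touch a single values file, for example the staging one first. The file name <code>values-staging.yaml</code> is an assumption that fits the <code>values*.yaml</code> pattern mentioned earlier; use whatever files your service actually has.
<syntaxhighlight lang="yaml">
# helmfile.d/services/$SERVICE_NAME/values-staging.yaml (hypothetical file name)
tls:
  enabled: true
</syntaxhighlight>
Once staging looks healthy, the same override can be repeated for the remaining clusters or moved into the shared values file.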
== Create a new LVS service for TLS enabled service ==
Follow [[LVS#Add_a_new_load_balanced_service]] to create a new LVS service on your newly chosen port, but on the same LVS IP as the previous one.
== Switch traffic, aka switch configuration of dependent services to use the new LVS service ==
Things that might need to be changed:
* mediawiki-config
* caching proxies configuration
Things to be mindful of:
* CPU and memory limits of the envoy sidecar container when more traffic starts hitting the new LVS service.
== Remove the old LVS service ==
To do this, reverse the process used to create the new LVS service. There is already a runbook at [[LVS#Remove_a_load_balanced_service]].
Things to be mindful of:
* Make sure that no traffic goes to the old service
* Make sure the alerts have scheduled downtime in Icinga
== Decommission the non-TLS service from helm chart ==
The non-TLS service (template) may now be removed from the helm chart as well (freeing a nodePort).

Latest revision as of 23:40, 2 August 2021