Miscweb: Difference between revisions
imported>Dzahn |
imported>Dzahn m (TLS, not LVS) |
Revision as of 14:45, 25 January 2022
miscweb is a new service on kubernetes.
Since 2022-01-20 it serves production traffic for static-bugzilla.
It was requested in task T281538 to replace the legacy service "miscweb" running on Ganeti VMs in production.
Also see: miscweb1002, miscweb2002 for the legacy machines still serving other microsites.
Sites running on miscweb k8s
The first of the sites hosted on miscweb-k8s is static-bugzilla.
Since 2022-01-20 static-bugzilla.wikimedia.org is served from k8s.
The actual switch to the new backend was made here.
Other micro-sites are going to follow this quarter.
Where does the code live?
The docker image is built by the Deployment Pipeline/CI from the repo operations/container/miscweb. This is also where the actual content and webserver config can be found.
The helm charts for kubernetes live alongside those of the other services in operations/deployment-charts.
Note that all the HTML content files are gzipped to reduce image size. To edit an HTML file, gunzip it, make your change, gzip it again, then upload the result to Gerrit.
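The gunzip, edit, gzip cycle can also be scripted. The following is a minimal sketch, not part of the repo: the helper name edit_gzipped_html, the example file name and the UTF-8 encoding are all assumptions made for illustration.

```python
import gzip
from pathlib import Path


def edit_gzipped_html(path, transform, encoding="utf-8"):
    """Decompress a .gz file, apply transform() to its text, recompress in place.

    Returns True if the content changed, False otherwise.
    The UTF-8 default is an assumption; adjust it to the real content.
    """
    path = Path(path)
    # newline="" keeps the original line endings untouched.
    with gzip.open(path, "rt", encoding=encoding, newline="") as handle:
        original = handle.read()

    updated = transform(original)
    if updated == original:
        return False

    with gzip.open(path, "wt", encoding=encoding, newline="") as handle:
        handle.write(updated)
    return True
```

For example, edit_gzipped_html("example.html.gz", lambda text: text.replace("Old", "New")) rewrites a (hypothetical) file in place; the result still has to be committed and uploaded to Gerrit as usual.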
How to deploy changes
Staging
- ssh deploy1002.eqiad.wmnet
- [deploy1002:~] $ kube_env miscweb staging
- [deploy1002:~] $ helmfile -e staging diff
- [deploy1002:~] $ helmfile -e staging -i apply
Then wait: either the deployment succeeds after a short while, or it is automatically reverted after 5 minutes.
Production
- ssh deploy1002.eqiad.wmnet
- [deploy1002:~] $ kube_env miscweb codfw
- [deploy1002:~] $ helmfile -e codfw diff
- [deploy1002:~] $ helmfile -e codfw -i apply
Then wait: either the deployment succeeds after a short while, or it is automatically reverted after 5 minutes.
- [deploy1002:~] $ kube_env miscweb eqiad
- [deploy1002:~] $ helmfile -e eqiad diff
- [deploy1002:~] $ helmfile -e eqiad -i apply
Then wait: either the deployment succeeds after a short while, or it is automatically reverted after 5 minutes.
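The production steps repeat the same three commands per data center, codfw first and then eqiad. As a rough sketch, the sequence can be generated as a checklist. The helper name deploy_commands is hypothetical, and the commands are only built as strings, not executed, on the assumption (not stated on this page) that kube_env has to run in the same interactive shell as helmfile.

```python
# Order taken from the production steps above: codfw first, then eqiad.
PRODUCTION_ORDER = ["codfw", "eqiad"]


def deploy_commands(service, environments):
    """Return the commands to type on the deployment server, in order.

    Nothing is executed here; this only builds a checklist of strings.
    """
    commands = []
    for env in environments:
        commands.extend(
            [
                f"kube_env {service} {env}",      # select service and cluster
                f"helmfile -e {env} diff",        # review what would change
                f"helmfile -e {env} -i apply",    # apply interactively
            ]
        )
    return commands
```

Calling deploy_commands("miscweb", PRODUCTION_ORDER) yields six commands; after each apply, wait for the deployment to settle before moving on to the next data center.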
Service names
miscweb.svc.eqiad.wmnet has address 10.2.2.58 (eqiad)
miscweb.svc.codfw.wmnet has address 10.2.1.58 (codfw)
miscweb.discovery.wmnet has address 10.2.2.58 (DNS/Discovery)
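To confirm that the names still resolve to the addresses listed above, a small check along these lines may help. It is a sketch only: the function name check_service_names is made up, the names only resolve from inside the production network, and since the discovery record was set up as active-active, its answer may not always be the eqiad address shown here.

```python
import socket

# Name to address mapping as listed above.
EXPECTED = {
    "miscweb.svc.eqiad.wmnet": "10.2.2.58",
    "miscweb.svc.codfw.wmnet": "10.2.1.58",
    # Assumption: the discovery name may answer with either data center.
    "miscweb.discovery.wmnet": "10.2.2.58",
}


def check_service_names(expected=EXPECTED, resolve=socket.gethostbyname):
    """Return {name: actual} for every name that does not match its expected IP.

    resolve is injectable so the check can be exercised without real DNS.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            got = f"unresolved ({exc})"
        if got != want:
            mismatches[name] = got
    return mismatches
```

An empty result means every name matched; anything else lists the names to look at, together with what they actually resolved to.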
LVS / discovery
https://config-master.wikimedia.org/pybal/eqiad/miscweb
https://config-master.wikimedia.org/pybal/codfw/miscweb
https://config-master.wikimedia.org/discovery/
How this service was made
This table lists all the changes made to take this service from scratch into WMF production, in the chronological order in which they were merged.
# | action | link | |
---|---|---|---|
1 | created a new service request ticket | https://phabricator.wikimedia.org/project/profile/1305/ | |
2 | read docs | https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service | |
3 | reserved a service port | https://wikitech.wikimedia.org/wiki/Kubernetes/Service_ports | |
4 | added tokens for CI::master and deployment_server in the private repo | cd /srv/private/.. on the puppetmaster (ask an SRE with root access if needed) | |
5 | added dummy tokens in the labs/private repo | https://gerrit.wikimedia.org/r/684000 | |
6 | created a new namespace in kubernetes, use helmfile apply on deployment servers | https://gerrit.wikimedia.org/r/683743 | |
7 | added new namespace to CI and deployment_server | https://gerrit.wikimedia.org/r/681500/ , https://gerrit.wikimedia.org/r/685116 | |
8 | requested a new Gerrit repo to host your (Blubber) code | https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests | |
9 | read about deployment pipeline | https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial | |
10 | added initial config stub for pipeline lib | https://gerrit.wikimedia.org/r/690678 | |
11 | read about Blubber | https://wikitech.wikimedia.org/wiki/Blubber , https://wikitech.wikimedia.org/wiki/Blubber/Pipeline | |
12 | added initial Blubber file | https://gerrit.wikimedia.org/r/690768 | |
13 | added pipelines and config in integration/config | https://gerrit.wikimedia.org/r/690788 (asked releng) | |
14 | added bespoke pipeline in integration/config if needed | https://gerrit.wikimedia.org/r/690794 (asked releng) | |
15 | added LVS service IPs | https://gerrit.wikimedia.org/r/693966 | |
16 | added entrypoint.sh in Blubber | https://gerrit.wikimedia.org/r/697140 | |
17 | tried staging/test variants | https://gerrit.wikimedia.org/r/697142 | |
18 | simplified apache config | https://gerrit.wikimedia.org/r/697654/ , https://gerrit.wikimedia.org/r/697663 , https://gerrit.wikimedia.org/r/697691 | |
19 | installed vim, curl in container for testing | https://gerrit.wikimedia.org/r/697655/ , https://gerrit.wikimedia.org/r/697666 | |
20 | dropped/merged unused pipeline | https://gerrit.wikimedia.org/r/697657 | |
21 | switched service to not run 'insecurely' (as a separate user) | https://gerrit.wikimedia.org/r/697662/ | |
22 | added virtual site inside webserver | https://gerrit.wikimedia.org/r/697695 | |
23 | tested cloning from repo, letting Blubber generate a Dockerfile and got shell inside container | https://phabricator.wikimedia.org/T281538#7128132 | |
24 | stopped loading modules not used | https://gerrit.wikimedia.org/r/698079 | |
25 | reserved a public port for LVS | https://wikitech.wikimedia.org/w/index.php?title=Service_ports&type=revision&diff=1914806&oldid=1913236 | |
26 | opened firewall on deployment server to dump data from pre-k8s service | https://gerrit.wikimedia.org/r/699064 | |
27 | rsynced data over to deployment server | https://phabricator.wikimedia.org/T281538#7147262 | |
28 | added config to serve data gzipped to reduce image size, installed browser in container to test | https://gerrit.wikimedia.org/r/698079 , https://gerrit.wikimedia.org/r/699320 | |
29 | loaded mod_rewrite and mod_headers, added headers/encoding settings for gzipped content | https://gerrit.wikimedia.org/r/699319 | |
30 | read about helm and deployments on kubernetes | https://wikitech.wikimedia.org/wiki/Helm , https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments | |
30 | cloned the repo 'operations/deployment-charts' where the helm files live | https://gerrit.wikimedia.org/r/admin/repos/operations/deployment-charts | |
31 | read README in the repo about how to create charts | https://gerrit.wikimedia.org/g/operations/deployment-charts | |
32 | read and ran 'create_new_service.sh' | https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/create_new_service.sh https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/Rakefile | |
33 | adjusted values in new files generated by script and uploaded to the repo | https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/698895/ | |
34 | created a new app type for a httpd without php-fpm, added a prometheus (metrics) exporter | https://gerrit.wikimedia.org/r/700522 | |
35 | added helmfile.yaml and values under services.d, copying from another service | https://gerrit.wikimedia.org/r/713441 | |
36 | set the docker registry name specifically to use the discovery name | https://gerrit.wikimedia.org/r/714014/ | |
37 | added uncompressed content of the first 1000 Bugzilla bugs | https://gerrit.wikimedia.org/r/714460 | |
38 | cleaned up and added comments for others to delete files they don't use | https://gerrit.wikimedia.org/r/713639 | |
39 | set a main_app version and added some CPU/RAM limits | https://gerrit.wikimedia.org/r/714022 | |
40 | added reserved port as nodePort | https://gerrit.wikimedia.org/r/714053 | |
41 | added version tags for staging and production | https://gerrit.wikimedia.org/r/714368 | |
42 | linked staging httpd config to prod httpd config | https://gerrit.wikimedia.org/r/714458 | |
43 | added httpd rewrite rules from pre-k8s config | https://gerrit.wikimedia.org/r/714459 | |
44 | set service deployment to production, not minikube | https://gerrit.wikimedia.org/r/714034 | |
45 | bumped staging version to latest build created by CI | https://gerrit.wikimedia.org/r/714755 , https://gerrit.wikimedia.org/r/715236 etc .. (skipping these in the future, needed after every change) | |
46 | loaded missing mod_alias for Redirect directive | https://gerrit.wikimedia.org/r/715727 | |
47 | added HTML content for the first 10000 bugs, checked image size | https://gerrit.wikimedia.org/r/717347 | |
48 | compressed content with gzip and added more bug HTML | https://gerrit.wikimedia.org/r/728668 | |
49 | various changes to add all the content in batches of 10k bugs, then the same for activities HTML files | https://gerrit.wikimedia.org/r/730275 , https://gerrit.wikimedia.org/r/730281 and various others up to https://gerrit.wikimedia.org/r/730334 | |
50 | added and gzipped index and "all" pages | https://gerrit.wikimedia.org/r/730336 | |
51 | added old Bugzilla Wikimedia skin directory | https://gerrit.wikimedia.org/r/730339 | |
52 | read about adding a new service to LVS | https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service | |
53 | added service IPs in DNS | https://netbox.wikimedia.org/ (ask infra foundations) | |
53 | added LVS config and had it merged | https://gerrit.wikimedia.org/r/694625 (coordinate with serviceops/traffic for this step) | |
54 | switched service_state from service_setup to lvs_setup | https://gerrit.wikimedia.org/r/694628 | |
55 | enabled TLS in helm chart | https://gerrit.wikimedia.org/r/739675 | |
56 | removed nodePort, added public_port, enabled TLS, multiple attempts to get the order right, then TLS worked | https://gerrit.wikimedia.org/r/739810 , https://gerrit.wikimedia.org/r/739848 , https://gerrit.wikimedia.org/r/739945 , https://gerrit.wikimedia.org/r/742819 | |
57 | switched service_state from lvs_setup to monitoring_setup, checked new Icinga monitoring being added, further testing to confirm it works at all | https://gerrit.wikimedia.org/r/694629 , https://phabricator.wikimedia.org/T281538#7578691 | |
58 | debugged gzip encoding issue in cloud VPS, confirmed can pull and run directly from prod docker registry | https://phabricator.wikimedia.org/T281538#7606684 | |
59 | fixed content type for HTML, which was set to CSS, service now working in cloud | https://gerrit.wikimedia.org/r/752235 , https://staticbz.wmcloud.org/bug10001.html | |
60 | further version bumping / deploying / testing | https://gerrit.wikimedia.org/r/752750 | |
61 | confirmed working with curl directly from production service names with right content-type and content-encoding | https://phabricator.wikimedia.org/T281538#7620703 | |
62 | switched service_state from monitoring_setup to production (so it pages), but only after carefully checking confd templates on DNS servers and downtiming services in Icinga | https://gerrit.wikimedia.org/r/694630 , https://phabricator.wikimedia.org/T281538#7620961 | |
63 | read about discovery DNS | https://wikitech.wikimedia.org/wiki/DNS/Discovery | |
64 | added discovery DNS as an active-active service, confirmed could now curl from discovery name | https://gerrit.wikimedia.org/r/693968 , https://phabricator.wikimedia.org/T281538#7620995 | |
65 | switched ATS (traffic servers/caching layer) from old backend to new backend, the discovery name on our reserved service port | https://gerrit.wikimedia.org/r/753813 | |
66 | added service to disc_desired_state.py | https://gerrit.wikimedia.org/r/753846 | |
67 | ATS servers got 502 and the switch did not work, so it was reverted; the reason turned out to be a missing SAN on the TLS cert | ||
68 | added SAN to cert, created new cert, checked it | https://phabricator.wikimedia.org/T281538#7635115
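The last two rows suggest checking the certificate's SANs before pointing ATS at a new backend name. Below is a minimal sketch of such a check. It works on a certificate dict shaped like the output of Python's ssl.SSLSocket.getpeercert(); the function names are hypothetical, and which name was actually missing is not stated on this page, so the discovery name in the usage note is only an illustration.

```python
def dns_sans(cert):
    """Return the DNS subjectAltName entries of a getpeercert()-style dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def san_covers(cert, hostname):
    """Return True if hostname is covered by one of the cert's DNS SANs.

    Simplified matching: exact match, or a leading "*." wildcard that
    stands in for exactly one label. Not a full RFC 6125 implementation.
    """
    host = hostname.lower().rstrip(".")
    for san in dns_sans(cert):
        san = san.lower()
        if san == host:
            return True
        if san.startswith("*.") and "." in host:
            # Compare everything after the first label with the wildcard base.
            if host.split(".", 1)[1] == san[2:]:
                return True
    return False
```

With a cert that only lists the two svc names, san_covers(cert, "miscweb.discovery.wmnet") returns False, which is the kind of gap that would have been worth catching before the ATS switch.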