You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Miscweb

From Wikitech-static
Revision as of 00:27, 25 January 2022 by Dzahn (→ How this service was made)

miscweb is a new service on Kubernetes.

Since 2022-01-20, it has served production traffic for static-bugzilla.

It was requested in task T281538 as a replacement for the legacy "miscweb" service running on Ganeti VMs in production.

See also miscweb1002 and miscweb2002, the legacy machines that still serve other microsites.


Sites running on miscweb k8s

The first of the sites hosted on miscweb-k8s is static-bugzilla.

Since 2022-01-20, static-bugzilla.wikimedia.org has been served from k8s.

The actual switch to the new backend was made here.

Other microsites will follow this quarter.

Where does the code live?

The Docker image is built by the Deployment Pipeline/CI from the repo operations/container/miscweb. This repo also holds the actual content and the webserver config.

The Helm charts for Kubernetes live alongside those of the other services in operations/deployment-charts.

Note that all the HTML content files are gzipped to reduce image size. To edit one of them, gunzip it, edit it, gzip it again, and then upload the change to Gerrit.
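You can wrap the gunzip/edit/gzip cycle in a small shell function. This is a minimal sketch and is not a tool shipped in the miscweb repo. The name `edit_gz` and the example file name are hypothetical. The sketch assumes GNU gzip and an `$EDITOR` that blocks until you close the file. Using `gzip -n` (which leaves out the original file name and timestamp) is our own choice to keep re-gzipped files reproducible; the repo does not document it as a requirement.

```shell
# Hypothetical helper, not part of operations/container/miscweb.
# Usage: edit_gz path/to/bug10001.html.gz   (example path is hypothetical)
edit_gz() {
    gz=$1
    case $gz in
        *.gz) ;;
        *) echo "usage: edit_gz FILE.gz" >&2; return 2 ;;
    esac
    plain=${gz%.gz}

    # 1. Decompress: FILE.gz is replaced by FILE.
    gunzip "$gz" || return 1

    # 2. Edit the plain HTML. $EDITOR is left unquoted on purpose so that
    #    values with arguments (e.g. "code -w") are split into words.
    ${EDITOR:-vi} "$plain"
    status=$?

    # 3. Recompress even if the editor failed, so no stray uncompressed
    #    file is left behind. -n omits name/timestamp (our assumption).
    gzip -n "$plain" || return 1
    return $status
}
```

After it returns, commit the updated `.gz` file and upload the change to Gerrit as usual.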

How to deploy changes

Staging

  • ssh deploy1002.eqiad.wmnet
  • [deploy1002:~] $ kube_env miscweb staging
  • [deploy1002:~] $ helmfile -e staging diff
  • [deploy1002:~] $ helmfile -e staging -i apply

Then wait: either the deployment succeeds after a little while, or it is automatically reverted after 5 minutes.

Production

  • ssh deploy1002.eqiad.wmnet
  • [deploy1002:~] $ kube_env miscweb codfw
  • [deploy1002:~] $ helmfile -e codfw diff
  • [deploy1002:~] $ helmfile -e codfw -i apply

Then wait: either the deployment succeeds after a little while, or it is automatically reverted after 5 minutes.

  • [deploy1002:~] $ kube_env miscweb eqiad
  • [deploy1002:~] $ helmfile -e eqiad diff
  • [deploy1002:~] $ helmfile -e eqiad -i apply

Then wait: either the deployment succeeds after a little while, or it is automatically reverted after 5 minutes.
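Each environment uses the same three commands, so you can wrap them in a function. This is a minimal sketch under several assumptions:

- You source it into your interactive shell on deploy1002, so that `kube_env` (which the commands above already use) is available as a shell function.
- You run it from wherever the commands above are normally run.
- `helmfile ... apply` exits non-zero when the release is rolled back, so an `&&` chain stops before touching the next environment.

`run`, `deploy_env` and `DRY_RUN` are hypothetical names.

```shell
# Hypothetical wrapper around the manual steps above; source it on deploy1002.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_env() {
    target=$1
    case $target in
        staging|codfw|eqiad) ;;
        *) echo "usage: deploy_env staging|codfw|eqiad" >&2; return 2 ;;
    esac
    # kube_env runs in the current shell, so the environment it sets up
    # is still in place for the helmfile calls that follow.
    run kube_env miscweb "$target" || return 1
    run helmfile -e "$target" diff || return 1
    # -i asks for confirmation before applying.
    run helmfile -e "$target" -i apply
}

# Usage, in the order above, checking the result after each step:
#   deploy_env staging
#   deploy_env codfw && deploy_env eqiad
```

Keeping one call per environment, rather than one loop over all three, preserves the manual pause for checking staging before production.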

Service names

miscweb.svc.eqiad.wmnet has address 10.2.2.58  (eqiad)
miscweb.svc.codfw.wmnet has address 10.2.1.58  (codfw)
miscweb.discovery.wmnet has address 10.2.2.58  (DNS/Discovery)
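The build log further down records two problems that a response-header check against these names would catch: HTML served with a CSS content type, and gzip encoding issues. Below is a minimal sketch of such a check, which reads HTTP response headers on stdin. `check_headers` is a hypothetical name. The port in the usage comment is a placeholder, because this page does not list the reserved service port.

```shell
# Hypothetical check: reads HTTP response headers on stdin and fails unless
# the page is served as HTML with gzip content encoding.
check_headers() {
    # Header lines end in CRLF; drop the CRs so the patterns below match.
    headers=$(tr -d '\r')
    if ! printf '%s\n' "$headers" | grep -qi '^content-type: *text/html'; then
        echo "unexpected or missing Content-Type" >&2
        return 1
    fi
    if ! printf '%s\n' "$headers" | grep -qi '^content-encoding: *gzip'; then
        echo "missing Content-Encoding: gzip" >&2
        return 1
    fi
}

# Usage (PORT is a placeholder; add a Host header if the virtual site needs one):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
#       "https://miscweb.svc.eqiad.wmnet:${PORT}/bug10001.html" | check_headers
```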

LVS / discovery

https://config-master.wikimedia.org/pybal/eqiad/miscweb

https://config-master.wikimedia.org/pybal/codfw/miscweb

https://config-master.wikimedia.org/discovery/

How this service was made

Here I am trying to compile a table of all the changes made to bring this service from scratch into WMF production, in the chronological order in which they were merged.

steps for miscweb
# action link
1 created a new service request ticket https://phabricator.wikimedia.org/project/profile/1305/
2 read docs https://wikitech.wikimedia.org/wiki/Kubernetes#Add_a_new_service
3 reserved a service port https://wikitech.wikimedia.org/wiki/Kubernetes/Service_ports
4 added tokens for CI::master and deployment_server in the private repo cd /srv/private/.. on the puppetmaster (ask an SRE with root access if needed)
5 added dummy tokens in the labs/private repo https://gerrit.wikimedia.org/r/684000
6 created a new namespace in Kubernetes, using helmfile apply on the deployment servers https://gerrit.wikimedia.org/r/683743
7 added new namespace to CI and deployment_server https://gerrit.wikimedia.org/r/681500/ , https://gerrit.wikimedia.org/r/685116
8 requested a new Gerrit repo to host the (Blubber) code https://www.mediawiki.org/wiki/Gerrit/New_repositories/Requests
9 read about deployment pipeline https://wikitech.wikimedia.org/wiki/Deployment_pipeline/Migration/Tutorial
10 added initial config stub for pipeline lib https://gerrit.wikimedia.org/r/690678
11 read about Blubber https://wikitech.wikimedia.org/wiki/Blubber , https://wikitech.wikimedia.org/wiki/Blubber/Pipeline
12 added initial Blubber file https://gerrit.wikimedia.org/r/690768
13 added pipelines and config in integration/config https://gerrit.wikimedia.org/r/690788 (asked releng)
14 added bespoke pipeline in integration/config if needed https://gerrit.wikimedia.org/r/690794 (asked releng)
15 added LVS service IPs https://gerrit.wikimedia.org/r/693966
16 added entrypoint.sh in Blubber https://gerrit.wikimedia.org/r/697140
17 tried staging/test variants https://gerrit.wikimedia.org/r/697142
18 simplified apache config https://gerrit.wikimedia.org/r/697654/ , https://gerrit.wikimedia.org/r/697663 , https://gerrit.wikimedia.org/r/697691
19 installed vim, curl in container for testing https://gerrit.wikimedia.org/r/697655/ , https://gerrit.wikimedia.org/r/697666
20 dropped/merged unused pipeline https://gerrit.wikimedia.org/r/697657
21 switched service to not run 'insecurely' (as a separate user) https://gerrit.wikimedia.org/r/697662/
22 added virtual site inside webserver https://gerrit.wikimedia.org/r/697695
23 tested cloning from repo, letting Blubber generate a Dockerfile and got shell inside container https://phabricator.wikimedia.org/T281538#7128132
24 stopped loading modules not used https://gerrit.wikimedia.org/r/698079
25 reserved a public port for LVS https://wikitech.wikimedia.org/w/index.php?title=Service_ports&type=revision&diff=1914806&oldid=1913236
26 opened firewall on deployment server to dump data from pre-k8s service https://gerrit.wikimedia.org/r/699064
27 rsynced data over to deployment server https://phabricator.wikimedia.org/T281538#7147262
28 added config to serve data gzipped to reduce image size, installed browser in container to test https://gerrit.wikimedia.org/r/698079 , https://gerrit.wikimedia.org/r/699320
29 loaded mod_rewrite and mod_headers, added headers/encoding settings for gzipped content https://gerrit.wikimedia.org/r/699319
30 read about helm and deployments on kubernetes https://wikitech.wikimedia.org/wiki/Helm , https://wikitech.wikimedia.org/wiki/Kubernetes/Deployments
31 cloned the repo 'operations/deployment-charts' where the helm files live https://gerrit.wikimedia.org/r/admin/repos/operations/deployment-charts
32 read README in the repo about how to create charts https://gerrit.wikimedia.org/g/operations/deployment-charts
33 read and ran 'create_new_service.sh' https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/create_new_service.sh , https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/Rakefile
34 adjusted values in new files generated by script and uploaded to the repo https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/698895/
35 created a new app type for an httpd without php-fpm, added a prometheus (metrics) exporter https://gerrit.wikimedia.org/r/700522
36 added helmfile.yaml and values under services.d, copying from another service https://gerrit.wikimedia.org/r/713441
37 set the docker registry name specifically to use the discovery name https://gerrit.wikimedia.org/r/714014/
38 added uncompressed content of the first 1000 Bugzilla bugs https://gerrit.wikimedia.org/r/714460
39 cleaned up and added comments for others to delete files they don't use https://gerrit.wikimedia.org/r/713639
40 set a main_app version and added some CPU/RAM limits https://gerrit.wikimedia.org/r/714022
41 added reserved port as nodePort https://gerrit.wikimedia.org/r/714053
42 added version tags for staging and production https://gerrit.wikimedia.org/r/714368
43 linked staging httpd config to prod httpd config https://gerrit.wikimedia.org/r/714458
44 added httpd rewrite rules from pre-k8s config https://gerrit.wikimedia.org/r/714459
45 set service deployment to production, not minikube https://gerrit.wikimedia.org/r/714034
46 bumped staging version to latest build created by CI https://gerrit.wikimedia.org/r/714755 , https://gerrit.wikimedia.org/r/715236 etc. (needed after every change, so not listed again below)
47 loaded missing mod_alias for Redirect directive https://gerrit.wikimedia.org/r/715727
48 added HTML content for the first 10000 bugs, checked image size https://gerrit.wikimedia.org/r/717347
49 compressed content with gzip and added more bug HTML https://gerrit.wikimedia.org/r/728668
50 various changes to add all the content in batches of 10k bugs, then the same for activities HTML files https://gerrit.wikimedia.org/r/730275 , https://gerrit.wikimedia.org/r/730281 and various others up to https://gerrit.wikimedia.org/r/730334
51 added and gzipped index and "all" pages https://gerrit.wikimedia.org/r/730336
52 added old Bugzilla Wikimedia skin directory https://gerrit.wikimedia.org/r/730339
53 read about adding a new service to LVS https://wikitech.wikimedia.org/wiki/LVS#Add_a_new_load_balanced_service
54 added service IPs in DNS https://netbox.wikimedia.org/ (ask infra foundations)
55 added LVS config and had it merged https://gerrit.wikimedia.org/r/694625 (coordinate with serviceops/traffic for this step)
56 switched service_state from service_setup to lvs_setup https://gerrit.wikimedia.org/r/694628
57 enabled LVS in helm chart https://gerrit.wikimedia.org/r/739675
58 removed nodePort, added public_port, enabled TLS; it took multiple attempts to get the order right before TLS worked https://gerrit.wikimedia.org/r/739810 , https://gerrit.wikimedia.org/r/739848 , https://gerrit.wikimedia.org/r/739945 , https://gerrit.wikimedia.org/r/742819
59 switched service_state from lvs_setup to monitoring_setup, checked that the new Icinga monitoring was added, tested further to confirm it works at all https://gerrit.wikimedia.org/r/694629 , https://phabricator.wikimedia.org/T281538#7578691
60 debugged gzip encoding issue in Cloud VPS, confirmed the image can be pulled and run directly from the prod docker registry https://phabricator.wikimedia.org/T281538#7606684
61 fixed content type for HTML, which was set to CSS; service now working in cloud https://gerrit.wikimedia.org/r/752235 , https://staticbz.wmcloud.org/bug10001.html
62 further version bumping / deploying / testing https://gerrit.wikimedia.org/r/752750
63 confirmed with curl directly against the production service names that it works, with the right content-type and content-encoding https://phabricator.wikimedia.org/T281538#7620703
64 switched service_state from monitoring_setup to production (makes it page), but only very carefully, after checking confd templates on DNS servers and downtiming services in Icinga https://gerrit.wikimedia.org/r/694630 , https://phabricator.wikimedia.org/T281538#7620961
65 read about discovery DNS https://wikitech.wikimedia.org/wiki/DNS/Discovery
66 added discovery DNS as an active-active service, confirmed curl now works against the discovery name https://gerrit.wikimedia.org/r/693968 , https://phabricator.wikimedia.org/T281538#7620995
67 switched ATS (traffic servers/caching layer) from old backend to new backend, the discovery name on our reserved service port https://gerrit.wikimedia.org/r/753813
68 added service to disc_desired_state.py https://gerrit.wikimedia.org/r/753846
69 ATS servers got 502 errors, so the switch was reverted; the cause turned out to be a missing SAN on the TLS cert
70 added SAN to cert, created new cert, checked it https://phabricator.wikimedia.org/T281538#7635115
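The last two steps above (ATS getting 502s because the TLS cert lacked a SAN) suggest checking the certificate's Subject Alternative Names before switching traffic. Below is a minimal sketch using the openssl CLI. `cert_sans` and `cert_has_san` are hypothetical names. They parse the human-readable `openssl x509 -text` output, which is stable in practice but is not a formal interface. Wildcard entries are not expanded, and the port in the usage comment is a placeholder.

```shell
# Hypothetical helpers: both read a PEM certificate on stdin.
# cert_sans prints one SAN entry per line, e.g. "DNS:miscweb.discovery.wmnet".
cert_sans() {
    openssl x509 -noout -text \
        | grep -A1 'X509v3 Subject Alternative Name' \
        | tail -n 1 \
        | tr ',' '\n' \
        | sed 's/^ *//'
}

# cert_has_san NAME succeeds only if NAME is listed as an exact DNS SAN.
cert_has_san() {
    cert_sans | grep -Fxq "DNS:$1"
}

# Usage against the live service (PORT is a placeholder):
#   openssl s_client -connect miscweb.discovery.wmnet:"$PORT" \
#       -servername miscweb.discovery.wmnet </dev/null 2>/dev/null \
#       | cert_has_san miscweb.discovery.wmnet && echo "SAN present"
```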