Performance/Runbook/Puppet patches: Difference between revisions
imported>Krinkle |
imported>Alex Monk m (→Beta Cluster testing: puppetmaster++) |
Revision as of 22:51, 22 February 2020
This is the runbook for testing and staging Puppet patches that affect Puppet roles applied to servers maintained by the Performance Team.
Meta
- Source code (roles): Gerrit
- Source code (webperf classes): Gerrit
- Source code (arclamp classes): Gerrit
For changes to our services that run on these hosts, see instead the runbooks for Webperf-processor services and Webperf-tools services.
Writing a patch
See Puppet coding.
Testing a patch
When submitting a patch for operations/puppet.git, Jenkins typically reports within a minute or two with the results of syntax, coding convention, and unit tests.
Staging a patch
Before we deploy a patch to production, there are two kinds of tests we apply:
- Puppet compiler tests. This asks Puppet to simulate what would happen given all the production realm variables. The result is identical to what would happen in actual production if the patch were applied to a fresh server with a clean install of the HEAD-1 state and no private overrides.
- Beta Cluster testing. This actually applies the patch to a real server in the Beta Cluster, so it catches everything that would happen on a real server. However, it runs with the betacluster realm variables instead of the production ones, so there may be intentional differences.
Puppet compiler
Prerequisites:
- Wikimedia Developer account (same as Gerrit account), with "wmf" or "nda" user group.
Steps:
- Use the build form for the puppet-compiler job on Jenkins.
- Enter the Gerrit change number.
- Enter the list of nodes to simulate before/after. For our patches this is usually:
webperf1001.eqiad.wmnet,webperf1002.eqiad.wmnet,webperf2001.codfw.wmnet,webperf2002.codfw.wmnet
- Start the build and view its console output. Once done, review its result. (example)
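The comma-separated node list can be assembled from individual host names, which makes it easier to add or drop a host before pasting the result into the build form. This is a minimal sketch; the helper name join_nodes is hypothetical and not part of any Wikimedia tooling.

```shell
# Join host names into a single comma-separated string.
# join_nodes is a hypothetical helper name used only for illustration.
join_nodes() {
    # Inside the subshell, "$*" joins the arguments with the first
    # character of IFS, which is set to a comma here.
    ( IFS=,; printf '%s\n' "$*" )
}

join_nodes webperf1001.eqiad.wmnet webperf1002.eqiad.wmnet \
           webperf2001.codfw.wmnet webperf2002.codfw.wmnet
```

The subshell keeps the IFS change from leaking into the rest of the session.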
Beta Cluster testing
Once the patch passes Puppet compiler without errors, and the effective changes are what you want them to be, it's time to cherry-pick the puppet patch to the Beta Cluster.
Prerequisites:
- Wikimedia Developer account (same as wikitech.wikimedia.org account).
- Shell access to Wikimedia Cloud VPS (see Help:Access).
- In user group "Administrators" for the "Beta Cluster" VPS project in Wikimedia Cloud (existing admins can add you in Horizon).
Steps:
- Connect with SSH to the current puppetmaster in Beta Cluster (deployment-puppetmaster04.deployment-prep.eqiad.wmflabs).
- Enter sudo mode (sudo -i).
- Navigate to /var/lib/git/operations/puppet.
- Ensure git status is clean.
- From the change page on Gerrit, under "Download", copy the "Cherry Pick" command (using anonymous http).
- Run the command on the puppetmaster in the operations/puppet directory.
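The steps above can be sketched in shell. This is an illustration under assumptions, not the exact command Gerrit gives you: the helper names gerrit_change_ref and worktree_is_clean are hypothetical, the change number 123456 and patchset 3 are placeholders, and the fetch URL is assumed to be the anonymous HTTP address of operations/puppet. Always prefer the command copied from the change page.

```shell
# Build the Gerrit ref for a change number and patchset number.
# Gerrit publishes patchsets as refs/changes/<last two digits>/<change>/<patchset>.
gerrit_change_ref() {
    change=$1
    patchset=$2
    # Last two digits of the change number, zero-padded to width 2.
    suffix=$(printf '%02d' "$((change % 100))")
    printf 'refs/changes/%s/%s/%s\n' "$suffix" "$change" "$patchset"
}

# Succeed only if the repository at $1 has no modified, staged,
# or untracked files. Fails if git itself reports an error.
worktree_is_clean() {
    status=$(git -C "$1" status --porcelain) || return 1
    [ -z "$status" ]
}

# Example, commented out because it modifies the repository.
# 123456 and 3 are placeholder values.
# cd /var/lib/git/operations/puppet
# worktree_is_clean . || { echo "working tree is not clean" >&2; exit 1; }
# git fetch https://gerrit.wikimedia.org/r/operations/puppet "$(gerrit_change_ref 123456 3)"
# git cherry-pick FETCH_HEAD
```

Checking the output of git status --porcelain is a scriptable equivalent of eyeballing git status: empty output means nothing is modified or untracked.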
Now, in a separate terminal (so that you can easily undo or fixup if something goes wrong):
- Connect with SSH to the Beta Cluster server you want to apply the change to. For example, if the change affects webperf1001 in production, you'd connect to deployment-webperf11.deployment-prep.eqiad.wmflabs. If the change affects multiple hosts, carefully consider whether it should really be a single commit. If the in-between state is harmless, go ahead and apply it to the other hosts mostly concurrently, in a third terminal.
- Trigger a Puppet agent run on this host: sudo puppet agent -tv.
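When reading the result of the agent run, note that the -t (--test) option turns on detailed exit codes, so a non-zero exit status does not always mean failure. The sketch below maps the documented exit codes to a short description; the helper name describe_puppet_exit and the wording of the messages are hypothetical.

```shell
# Translate the exit status of `puppet agent --test` into a short description.
# With detailed exit codes, 2 means changes were applied successfully.
describe_puppet_exit() {
    case "$1" in
        0) echo "ok: no changes" ;;
        2) echo "ok: changes applied" ;;
        4) echo "failed: some resources failed" ;;
        6) echo "failed: changes applied, some resources failed" ;;
        # Anything else, typically 1: the run failed or was not attempted.
        *) echo "failed: run did not complete" ;;
    esac
}

# Example, commented out because it needs a Puppet-managed host:
# sudo puppet agent -tv; describe_puppet_exit "$?"
```

An exit status of 2 is the expected outcome the first time a cherry-picked patch is applied, since the patch should change something on the host.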
If Puppet fails with an error about compilation of the Puppet catalog, the puppetmaster is now unable to serve any host in the Beta Cluster, not just the one you are testing. Undo your change on the puppetmaster right away by running git rebase -i and removing your cherry-pick from the list.
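The list that git rebase -i opens has one "pick" line per local commit, and undoing the cherry-pick means deleting the line for your commit while leaving the others alone. The sketch below shows that edit as a filter over an example list; the helper name drop_pick is hypothetical, and the hashes and subjects are made up.

```shell
# Read a rebase todo list on stdin and print it without the "pick" line
# for the given commit. drop_pick is a hypothetical helper name.
drop_pick() {
    awk -v sha="$1" '
        # Match whether the todo list or the argument holds the shorter hash.
        $1 == "pick" && (index($2, sha) == 1 || index(sha, $2) == 1) { next }
        { print }
    '
}

# Example todo list with made-up hashes and subjects.
printf '%s\n' \
    'pick 1a2b3c4 Existing Beta Cluster cherry-pick' \
    'pick 5d6e7f8 webperf: patch under test' |
    drop_pick 5d6e7f8
```

Other people's cherry-picks may be present on the puppetmaster, so only the line for your own commit should be removed.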
Once any Puppet compilation error or other error has been addressed with an amended version of the patch, confirm that the host now has the new behaviour your patch intends to create.
Report back to Gerrit and ask SRE to merge it:
- Leave a link to the clean Puppet compiler result in a Gerrit comment.
- Mention in a comment that it's live on Beta Cluster and working as intended.
If the patch is needed for Beta Cluster to work properly, leave it and add hashtag "beta-cherry-picked" to the Gerrit change. Otherwise, remove the cherry-pick from the puppetmaster after you are done testing the patch.