Difference between revisions of "Orchestrator"

imported>Kormat
(→‎Operations: split out packaging from operations)
imported>Kormat
(→‎Upgrading orchestrator: trimm trailing garbage)
Revision as of 14:28, 14 October 2021

You may also be looking for the WikiFunctions function-orchestrator.

Orchestrator is a service for managing MySQL cluster replication. The data-persistence SRE team is currently doing a proof-of-concept deployment of it within WMF, with the aim of replacing Tendril/Dbtree.

Operations

Current deployment

Orchestrator runs on dborch1001.eqiad.wmnet, a Ganeti VM. It's publicly accessible at https://orchestrator.wikimedia.org/. Its backend database is named orchestrator and lives on db2093.codfw.wmnet (the db_inventory node in codfw; see T266003 for the background).

Adding a section to orchestrator

  1. Deploy the orchestrator grants to the section (modules/role/templates/mariadb/grants/orchestrator.sql.erb in the puppet repo). This should be done on the active DC's primary instance, and also on the sanitarium hosts in both DCs.
  2. Clean up the heartbeat table so that there are no stale entries.
    1. E.g. run this against all instances individually: set session sql_log_bin=0; delete from heartbeat where server_id=171974662 limit 1
  3. Add the primary instance to orchestrator.
    1. SSH to the dborch node, and run sudo orchestrator -c discover -i FQDN
    2. N.B. it needs to be the FQDN of the instance.
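Since the discover step only works with a fully qualified name, a small guard can catch a bare hostname before it reaches orchestrator. This is a minimal sketch: is_fqdn and discover_cmd are hypothetical helper names, the check is a deliberately simple pattern match (it also rejects a trailing dot), and the function only prints the command rather than running it.

```shell
# Hypothetical helper: succeed only if the argument looks fully qualified
# (contains a dot, with no leading/trailing dot and no empty label).
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;   # empty, leading/trailing dot, empty label
        *.*) return 0 ;;             # has a dot: treat as qualified
        *) return 1 ;;               # bare hostname such as "db1103"
    esac
}

# Print the discover command from the steps above, or refuse a bare hostname.
discover_cmd() {
    if is_fqdn "$1"; then
        echo "sudo orchestrator -c discover -i $1"
    else
        echo "refusing: '$1' is not an FQDN" >&2
        return 1
    fi
}
```

For example, discover_cmd db1103.eqiad.wmnet prints the command to run, while discover_cmd db1103 fails with an error on stderr.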

Upgrading orchestrator

Orchestrator automatically deploys schema changes when it gets upgraded. It tracks these in the orchestrator_db_deployments table. On startup it will check to see if the current version number is in that table, and if not it will perform all schema changes. It will not detect if a later version has been deployed. This means that we need a full backup of the orchestrator database before doing an upgrade, as otherwise we do not have a way to roll back.

On dborch1001:

  1. Update apt, so the new package is available: sudo apt update
  2. Stop orchestrator: sudo systemctl stop orchestrator
  3. Take a backup of the orchestrator backend database: sudo mysqldump --defaults-file=/etc/mysql/orchestrator_srv.cnf --ssl -h db2093.codfw.wmnet orchestrator > orchestrator.sql.$(date +"%Y-%m-%d")
  4. Upgrade the orchestrator packages: sudo apt install orchestrator orchestrator-client
  5. Test that the orchestrator binary works from the cmdline:
    $ sudo orchestrator -c clusters-alias
    2021-10-14 14:03:12 DEBUG Connected to orchestrator backend: orchestrator_srv:?@tcp(db2093.codfw.wmnet:3306)/orchestrator?timeout=1s
    2021-10-14 14:03:12 DEBUG Orchestrator pool SetMaxOpenConns: 128
    2021-10-14 14:03:12 DEBUG Initializing orchestrator
    2021-10-14 14:03:12 INFO Connecting to backend db2093.codfw.wmnet:3306: maxConnections: 128, maxIdleConns: 32
    db1103.eqiad.wmnet:3306	x1
    db1104.eqiad.wmnet:3306	s8
    db1107.eqiad.wmnet:3306	m3
    ...
    
  6. Start orchestrator: sudo systemctl start orchestrator
  7. Test that orchestrator-client works:
    $ orchestrator-client -c clusters-alias
    db1103.eqiad.wmnet:3306,x1
    db1104.eqiad.wmnet:3306,s8
    db1107.eqiad.wmnet:3306,m3
    ...
    
  8. Test that the web UI works.
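The two clusters-alias checks above print the same data with different separators (whitespace from the binary, commas from the client), so one way to confirm they agree is to normalise both and diff them. This is a sketch only: normalize_aliases is a hypothetical helper, and it assumes every alias line starts with host:port as in the sample output.

```shell
# Keep only "host:port<sep>alias" lines (dropping log lines), turn the first
# run of whitespace or commas into a single space, and sort the result.
normalize_aliases() {
    grep -E '^[^[:space:],]+:[0-9]+[[:space:],]' \
        | sed -E 's/[[:space:],]+/ /' \
        | sort
}

# Possible usage on dborch1001 (not executed here):
#   sudo orchestrator -c clusters-alias | normalize_aliases > cli.txt
#   orchestrator-client -c clusters-alias | normalize_aliases > client.txt
#   diff cli.txt client.txt && echo "outputs agree"
```

An empty diff means the binary and the client report the same set of cluster aliases.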


If a rollback is needed, unfortunately there's no good story. You need to have (or rebuild) the previous version of the orchestrator packages, upload them to apt.wm.o, and go from there.
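As a rough illustration of how the pre-upgrade backup could be used in a rollback, the sketch below finds the most recent dump and prints a restore command. It is an assumption, not a tested procedure: latest_dump and restore_cmd are hypothetical helpers, and the restore is assumed to reuse the connection options from the mysqldump step.

```shell
# Print the newest orchestrator.sql.YYYY-MM-DD file in a directory.
# ISO dates sort lexically, so the last name in sorted order is the latest.
latest_dump() {
    ls "$1"/orchestrator.sql.* 2>/dev/null | sort | tail -n 1
}

# Assumption: restoring mirrors the mysqldump invocation used for the backup.
# The command is only printed so it can be reviewed before anyone runs it.
restore_cmd() {
    echo "sudo mysql --defaults-file=/etc/mysql/orchestrator_srv.cnf --ssl -h db2093.codfw.wmnet orchestrator < $1"
}
```

For example, restore_cmd "$(latest_dump .)" prints the restore command for the newest dump in the current directory. Orchestrator would need to be stopped, and the previous package version installed, before restoring.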

Packaging

Updating orchestrator packaging to a new upstream version

  1. Check out the orchestrator package repo: https://gerrit.wikimedia.org/r/admin/repos/operations/debs/orchestrator
  2. On the master branch, run ./debian/repack v$VER. Note the leading v in the upstream version number. This will create a tarball in the current directory.
  3. Move the tarball out of the git working dir: mv orchestrator_$VER.orig.tar.xz ..
  4. Import it: gbp import-orig ../orchestrator_$VER.orig.tar.xz. This will add a commit to the upstream branch and a new upstream/$VER tag referencing it. It will then merge the new upstream branch into master.
  5. Push these new branches directly to gerrit, as they are not reviewable:
    1. git checkout upstream; git push; git push origin upstream/$VER
    2. git checkout master; git push
  6. Create a debian changelog entry for the new version: dch -D buster-wikimedia --force-distribution -v $VER-1. If you forget to do this, trying to build a package will fail horribly with dpkg-source: error: unrepresentable changes to source
  7. Test building the package to make sure that still works, and then send a CR for review with your changes.

Creating a new orchestrator release

You will need a gpg key to sign the new release. git tag will prompt you for your gpg password when creating the new tag.

  1. For simplicity, set two environment variables in your shell: $DEBVER for the new release you're creating, and $OLDDEBVER for the previous release. E.g.: DEBVER=3.2.6-1; OLDDEBVER=3.2.3-3
  2. Add/update a debian changelog entry for $DEBVER. Send a CR for review for any changes.
    1. If it doesn't already exist, create it with dch -D buster-wikimedia --force-distribution -v ${DEBVER:?}
  3. Create a git tag for the release, and populate it with changes made since the last release: git tag -s -a -F <(echo orchestrator ${DEBVER:?}; echo; git log --no-decorate --oneline debian/${OLDDEBVER:?}..) -e debian/${DEBVER:?}. This will prompt you for a gpg password to sign the tag with.
  4. Check that the new tag looks good: git show debian/${DEBVER:?}
  5. Push the tag to the upstream repo: git push origin debian/${DEBVER:?}
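The release commands above use the ${VAR:?} form, which makes the shell abort with an error if the variable is unset or empty, rather than silently producing a tag such as debian/. The sketch below shows that behaviour in isolation. release_tag and changelog_range are hypothetical helpers that only print the strings the git commands would receive.

```shell
# Print the release tag name. The subshell confines the abort that
# ${DEBVER:?} triggers when DEBVER is unset or empty.
release_tag() {
    ( echo "debian/${DEBVER:?}" ) 2>/dev/null
}

# Print the revision range given to git log when building the tag message.
changelog_range() {
    ( echo "debian/${OLDDEBVER:?}.." ) 2>/dev/null
}
```

With DEBVER=3.2.6-1 and OLDDEBVER=3.2.3-3 set, the helpers print debian/3.2.6-1 and debian/3.2.3-3.. respectively. With either variable missing, the corresponding helper fails instead of printing a malformed name.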

Building orchestrator packages

It's not currently possible to build orchestrator on deneb due to its golang version requirements. (Until the issue "Puppet host certs do not contain Subject Alt Name entries" is fixed, or a workaround is implemented, we're limited to golang 1.14.)

  1. Check out the orchestrator package repo: https://gerrit.wikimedia.org/r/admin/repos/operations/debs/orchestrator
  2. Install the following prerequisites:
    1. sudo apt install devscripts debhelper dh-golang
    2. golang 1.14
  3. Build with debclean -d && debuild -d -us -uc (-d is needed to work around the fact that the build requirement on golang 1.14 isn't being satisfied by a debian package).

Uploading new orchestrator packages

This is a simplified version of Debian Packaging#Upload to Wikimedia Repo.

  1. On apt1001: mkdir -p ~/orchestrator && rm -f ~/orchestrator/*.changes
  2. From your build dir on your local machine: scp ../*changes ../*deb ../*dsc apt1001.eqiad.wmnet:orchestrator/
  3. Back on apt1001: cd ~/orchestrator && sudo -i reprepro -C main include buster-wikimedia $PWD/*.changes
  4. In #wikimedia-operations on irc: !log uploaded orchestrator $VERSION packages to apt.wm.o (buster) TXXXXXX
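Because the reprepro step expands $PWD/*.changes, any leftover .changes file from an earlier upload would be included too, which is why the first step clears them out. A small pre-check, sketched below, makes that explicit. single_changes_file is a hypothetical helper, and the file names in the usage note are made up for illustration.

```shell
# Print the path of the .changes file in a directory, but only if there is
# exactly one; fail if there are none or several.
single_changes_file() {
    set -- "$1"/*.changes          # positional parameters are local to the function
    [ "$#" -eq 1 ] && [ -e "$1" ] && echo "$1"
}
```

For instance, single_changes_file ~/orchestrator prints the path when the directory holds a single file such as orchestrator_3.2.6-1_amd64.changes, and returns a non-zero status otherwise.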

Troubleshooting

Entry in hostname_resolve that maps to a bare hostname

+--------------------+--------------------+---------------------+
| hostname           | resolved_hostname  | resolved_timestamp  |
+--------------------+--------------------+---------------------+
| pc1008.eqiad.wmnet | pc1008             | 2020-11-18 10:11:58 |
+--------------------+--------------------+---------------------+

This can cause a 'ghost' cluster to appear, containing the bare-hostname version of the host. To fix this:

systemctl stop orchestrator
orchestrator -c forget -i <instance>   # repeat for every instance in the ghost cluster
orchestrator -c reset-hostname-resolve-cache
systemctl start orchestrator

Stopping orchestrator is required to stop it from reinserting the bad entry into hostname_resolve.

The entries can be queried via orchestrator -c show-resolve-hosts
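To spot problem entries like the pc1008 row above, the resolved name can be checked for a missing dot. The sketch below assumes input of whitespace-separated hostname / resolved_hostname pairs, for example exported from the hostname_resolve table. The actual output format of show-resolve-hosts is not shown on this page, so it may need adapting. find_bare_resolves is a hypothetical helper.

```shell
# Read "hostname resolved_hostname" pairs on stdin and print the hostnames
# whose resolved name contains no dot, i.e. resolves to a bare hostname.
find_bare_resolves() {
    awk 'NF >= 2 && $2 !~ /\./ { print $1 }'
}
```

Fed the pair pc1008.eqiad.wmnet pc1008, it prints pc1008.eqiad.wmnet. Pairs where both names are fully qualified produce no output.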