Difference between revisions of "Orchestrator"

imported>Kormat
(Adding a section to orch.)
Revision as of 13:23, 10 September 2021

Orchestrator is a service for managing MySQL cluster replication. The data-persistence SRE team is currently running a proof-of-concept deployment of it within WMF, with the aim of replacing Tendril/Dbtree.

Operations

Adding a section to orchestrator

  1. Deploy the orchestrator grants to the section (modules/role/templates/mariadb/grants/orchestrator.sql.erb in the puppet repo). This should be done on the active DC's primary instance, and also on the sanitarium hosts in both DCs.
  2. Clean up the heartbeat table so that there are no stale entries.
    1. E.g. run this against all instances individually: set session sql_log_bin=0; delete from heartbeat where server_id=171974662 limit 1
  3. Add the primary instance to orchestrator.
    1. SSH to the dborch node, and run sudo orchestrator -c discover -i FQDN
    2. N.B. it needs to be the FQDN of the instance.
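
The FQDN requirement in step 3 can be guarded with a small shell wrapper. This is a minimal sketch, not existing tooling: the function names is_fqdn and discover_primary are hypothetical, and the check is only a rough heuristic (at least one dot, no empty labels) meant to catch bare hostnames before they reach orchestrator.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like an FQDN,
# i.e. it contains at least one dot and has no empty labels.
is_fqdn() {
    case "$1" in
        .*|*.|*..*) return 1 ;;   # leading dot, trailing dot, or empty label
        *.*)        return 0 ;;   # at least one dot
        *)          return 1 ;;   # bare hostname or empty string
    esac
}

# Hypothetical wrapper around the discover command from step 3.
# Intended to be run on the dborch node.
discover_primary() {
    if ! is_fqdn "$1"; then
        echo "refusing to discover '$1': not an FQDN" >&2
        return 1
    fi
    sudo orchestrator -c discover -i "$1"
}
```

For example, discover_primary pc1008 is rejected before orchestrator is called, while discover_primary pc1008.eqiad.wmnet is passed through to the discover command.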

Updating orchestrator packages to a new upstream version

  1. Check out the orchestrator package repo: https://gerrit.wikimedia.org/r/admin/repos/operations/debs/orchestrator
  2. On the master branch, run ./debian/repack v$VER. Note the leading v in the upstream version number. This will create a tarball in the current directory.
  3. Move the tarball out of the git working dir: mv orchestrator_$VER.orig.tar.xz ..
  4. Import it: gbp import-orig ../orchestrator_$VER.orig.tar.xz. This will add a commit to the upstream branch and a new upstream/$VER tag referencing it. It will then merge the new upstream branch into master.
  5. Push these new branches directly to Gerrit, as they are not reviewable:
    1. git checkout upstream; git push; git push origin upstream/$VER (the tag has to be pushed to a named remote; origin is assumed here)
    2. git checkout master; git push
  6. Create a Debian changelog entry for the new version: dch -v $VER-1. If you forget to do this, building the package will fail with dpkg-source: error: unrepresentable changes to source
  7. <WIP>
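
The file and tag names in the steps above all derive from $VER, so the local commands (steps 2 to 4 and 6) can be printed for review before running anything. This is a minimal sketch: the function names are hypothetical, 3.2.6 is only an example version, and the push in step 5 is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helpers deriving names from an upstream version (without the leading v).
orig_tarball()   { printf 'orchestrator_%s.orig.tar.xz\n' "$1"; }
upstream_tag()   { printf 'upstream/%s\n' "$1"; }
debian_version() { printf '%s-1\n' "$1"; }

# Print the local commands from steps 2-4 and 6 for a given version.
# Nothing is executed; the output is meant to be read, then run by hand.
print_update_plan() {
    ver=$1
    cat <<EOF
./debian/repack v$ver
mv $(orig_tarball "$ver") ..
gbp import-orig ../$(orig_tarball "$ver")
dch -v $(debian_version "$ver")
EOF
}
```

Running print_update_plan 3.2.6 from the master branch of the package repo lists the four commands with the version filled in, including the leading v that only the repack step takes.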

Troubleshooting

Entry in hostname_resolve that maps to a bare hostname

+--------------------+--------------------+---------------------+
| hostname           | resolved_hostname  | resolved_timestamp  |
+--------------------+--------------------+---------------------+
| pc1008.eqiad.wmnet | pc1008             | 2020-11-18 10:11:58 |
+--------------------+--------------------+---------------------+

This can cause a 'ghost' cluster to appear, containing the bare-hostname version of the host. To fix this:

  1. systemctl stop orchestrator
  2. orchestrator -c forget -i <instance> (repeat for every instance in the ghost cluster)
  3. orchestrator -c reset-hostname-resolve-cache
  4. systemctl start orchestrator

Stopping orchestrator is required to stop it from reinserting the bad entry into hostname_resolve.
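
Because the forget command has to be repeated for every instance in the ghost cluster, it can help to generate the full sequence first. This is a minimal sketch: ghost_cluster_fix_plan is a hypothetical name, the instance list is supplied by the operator on stdin (one per line), and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print the fix sequence for a ghost cluster.
# Instances are read from stdin, one per line; blank lines are skipped.
# Nothing is executed.
ghost_cluster_fix_plan() {
    echo "systemctl stop orchestrator"
    while IFS= read -r instance; do
        [ -n "$instance" ] || continue
        echo "orchestrator -c forget -i $instance"
    done
    echo "orchestrator -c reset-hostname-resolve-cache"
    echo "systemctl start orchestrator"
}
```

For example, printf 'pc1008\n' | ghost_cluster_fix_plan prints the stop, a single forget, the cache reset, and the start, in that order.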

The entries can be queried via orchestrator -c show-resolve-hosts
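
Bad entries can also be spotted mechanically. The sketch below assumes the entries are available as a table in the format shown at the top of this section (for instance, pasted from a SQL client); the output format of show-resolve-hosts itself is not assumed. The function name find_bare_resolves is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a table in the format shown above on stdin and
# print every row whose resolved_hostname contains no dot (a bare hostname).
find_bare_resolves() {
    awk -F'|' '
        NF >= 4 {                                   # border lines have no "|" and are skipped
            host = $2; resolved = $3
            gsub(/^[ \t]+|[ \t]+$/, "", host)       # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", resolved)
            if (host == "hostname") next            # header row
            if (resolved != "" && index(resolved, ".") == 0)
                print host " -> " resolved
        }
    '
}
```

Fed the example table from this section, it reports the pc1008.eqiad.wmnet row, since pc1008 contains no dot.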