Dumps/Dumpsdata hosts

Revision as of 09:12, 29 July 2021 by imported>ArielGlenn (→‎XML Dumpsdata hosts: dumpsdata hosts switched back after network maintenance. 1003 has the 10g nic so we prefer it as primary)

XML Dumpsdata hosts

Hardware

We have three hosts:

  • Dumpsdata1001 in eqiad, production xml/sql dumps nfs fallback:
    Hardware/OS: PowerEdge R730xd, Debian 10 (buster), 32GB RAM, 1 quad-core Xeon E5-2623 cpu, HT enabled
    Disks: 12 4TB disks in 1 12-disk raid10 volume; two 1T disks in raid 1 for the OS
  • Dumpsdata1002 in eqiad, production misc dumps nfs:
    Hardware/OS: PowerEdge R730xd, Debian 10 (buster), 32GB RAM, 1 quad-core Xeon E5-2623 cpu, HT enabled
    Disks: 12 4TB disks in 1 12-disk raid10 volume; two 1T disks in raid 1 for the OS
  • Dumpsdata1003 in eqiad, production xml/sql dumps nfs primary:
    Hardware/OS: PowerEdge R730xd, Debian 10 (buster), 64GB RAM, 1 quad-core Xeon Silver 4112 cpu, HT enabled
    Disks: 12 4TB disks in 1 12-disk raid10 volume; two 1T disks in raid 1 for the OS
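
As a rough check on what the data volume on each host can hold, the usable size of the 12-disk raid10 array can be worked out as below. This is a minimal sketch that assumes two-way mirroring (the usual raid10 layout) and ignores filesystem and LVM overhead; the function name is made up for illustration.

```python
def raid10_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of a RAID 10 array, assuming two-way mirrors."""
    if disk_count < 4 or disk_count % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    # Every block is stored on two disks, so half of the raw capacity is usable.
    return disk_count * disk_size_tb / 2


# 12 disks of 4TB each, as listed for every dumpsdata host above.
data_volume_tb = raid10_usable_tb(12, 4)
print(f"usable raid10 capacity: {data_volume_tb} TB")
```

The figure is in vendor terabytes; the size reported by the OS for /data will be somewhat lower.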

Services

The production host is nfs-mounted on the snapshot hosts; generated dumps are written there and rsynced from there to the web and rsync servers.

Deploying a new host

You'll need to set up the raid arrays by hand. The single LVM volume mounted on /data is ext4.

You'll want to make sure your host has the right partman recipe in netboot.cfg (dumpsdata100X.cfg at the time of writing). This will set up sda1 as /boot, one lvm as / and one as /data.

Install in the usual way (add to puppet, copying a pre-existing production dumpsdata host stanza, set up everything for PXE boot and go). You will need to add the new host to the dumps_web_rsync_server_clients[ipv4][internal] list in common.yaml. You will also need to add an entry for it in hieradata/hosts; copy either the stanza for the primary generator host, if the new host is to replace it, or the stanza for a secondary (fallback) host, if the new host is to become a fallback.

Additionally you must add your host to $peer_hosts in profile::dumps::rsyncer and profile::dumps::rsyncer_peer so that rsyncs can be done from this host to the rest of the dumpsdata and dumps web server hosts. This must be done for both primary and secondary hosts.

If it is a fallback host, data must be rsynced to it from the primary on a regular basis. To make this happen, add the hostname to the xmlremotedirs and miscremotedirs arguments as passed from profile::dumps::generation::server::primary; you'll need to know the rsync path to the public directory for the dumps on the new host, in case it is set up differently than the rest, and you'll likewise need to know the rsync path to the directory for misc ('other') dumps. Because the initial rsync can be quite time-consuming, it's best to do a manual rsync first from one of the web servers, and then enable the rolling rsync in puppet.
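
The manual first rsync could be prepared along the following lines. This is only a sketch: the source host name, the rsync module path and the destination directory are placeholders rather than the real production values, and the helper only builds the command so it can be reviewed before anyone runs it. The rsync options used (-a, --partial, --dry-run) are standard ones.

```python
import shlex


def build_initial_rsync(source_host, source_module, dest_dir, dry_run=True):
    """Build an rsync command that pulls a dumps tree from an rsync daemon.

    All three arguments are supplied by the caller; nothing here knows the
    real host names or paths.
    """
    cmd = ["rsync", "-a", "--partial"]
    if dry_run:
        # Report what would be transferred without copying anything.
        cmd.append("--dry-run")
    # host::module/ is rsync's daemon syntax; the trailing slash copies the
    # contents of the directory rather than the directory itself.
    cmd.append(f"{source_host}::{source_module.strip('/')}/")
    cmd.append(dest_dir.rstrip("/") + "/")
    return cmd


# Placeholder values, for illustration only.
cmd = build_initial_rsync("webserver.example", "data/xmldatadumps/public",
                          "/data/xmldatadumps/public")
print(shlex.join(cmd))
```

Starting with a dry run makes it easy to confirm that the source and destination paths line up before the long copy begins; pass dry_run=False to build the real command.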

If it is to be a primary host and the old primary is to go away, when you are ready to make the switch you will need to change profile::dumps::generation::worker::common so that the dumpsdatamount resource mounts the new server's filesystem on the snapshot hosts instead of mounting the old primary server.

Reimaging an old host

This assumes you are using the wmf-auto-reimage script.

You likely want to preserve all the data on the /data filesystem. To make this happen, in netboot.cfg you'll want to set your host to use dumpsdata100X-no-data-format.cfg. This requires manual intervention during partitioning. Specifically, you'll need to select:

  • /dev/sda1 to use as ext4, mount at /boot, format
  • data vg to use as ext4, mount at /data, preserve all contents
  • root vg to use as ext4, mount at /, format

Write the changes to disk and the install will carry on as desired.

Space issues

These hosts should have three sets of xml/sql dumps at all times, thus ensuring that there is always one set that contains full revision history, usable for prefetch for the next such run. Running low on space is a serious problem, so we need to head it off well before it happens.
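
A simple way to catch this early is to compare the size of the volume against the space three dump sets would need. The sketch below is illustrative only: the 10% headroom and the per-set size passed in are assumptions chosen by the caller, not measured values, and the function names are hypothetical.

```python
import shutil


def has_room_for_sets(total_bytes, set_bytes, required_sets=3, headroom=0.10):
    """True if `required_sets` dump sets fit while keeping `headroom` of the volume free."""
    usable = total_bytes * (1 - headroom)
    return required_sets * set_bytes <= usable


def check_volume(path, set_bytes, required_sets=3):
    """Apply the same check to a mounted filesystem, e.g. /data on a dumpsdata host."""
    total = shutil.disk_usage(path).total
    return has_room_for_sets(total, set_bytes, required_sets)


# Hypothetical numbers: a 24TB volume and dump sets of 7TB or 8TB each.
print(has_room_for_sets(24 * 10**12, 7 * 10**12))  # three 7TB sets fit with headroom
print(has_room_for_sets(24 * 10**12, 8 * 10**12))  # three 8TB sets do not
```

On a real host, check_volume("/data", set_bytes) would be called with the size of the largest recent full run as set_bytes.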