You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Data Engineering/Systems/DB Replica
The Analytics DB replica is the host running with the role::mariadb::misc::analytics::backup, currently db1108. The host was previously the replica of the Eventlogging database, but after decommissioning it we "promoted" db1108 to be a generic replica for Analytics databases. The host runs a mariadb instance for every database host to replicate; to see the config, check
profile::mariadb::misc::analytics::multiinstance in puppet.
Start/Stop replicas for maintenance
If you have to do maintenance (like rebooting, changing ports to the mysql instances, etc..) you'll need to stop replication first. You can do easily with:
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock # or any other socket/instance combination of course set session sql_log_bin=0; STOP SLAVE; # do maintenance; START SLAVE;
Check status of replication
sudo mysql -S /run/mysqld/mysqld.analytics_meta.sock # or any other socket/instance combination of course set session sql_log_bin=0; show slave status \G;
Pay attention to:
- Slave_IO_Running: if not "Yes" something is not working.
- Slave_SQL_Running_State: if the above is not "Yes", it should contain the reason of the problem.
There are two instances on db1108 currently running, that replicate:
piwikdatabase from matomo1002.
druid druid_public_eqiad hue oozie search_airflow superset_production hive_metastoredatabases from an-coord1001 (Analytics meta).
Add a new database to replicate
In this case, there are two options:
- There is a new db host to replicate, if so follow "Set up replication" later on.
- A new db is added to a db host that already holds a replica instance on db1108 (for example - Analytics Meta). In this case the new DB should be replicated correctly without any problem. Please remember that things like grants etc.. are as well replicated, so you need to be sure that things like the user are created/replicated as well beforehand.
Set up replication
This procedure assumes that:
- A new mariadb instance is running on db1108.
- The mariadb instance is configured correctly (see User:Elukey/Analytics/DBs#First_run_of_MariaDB for example).
The procedure to follow for each new mysql instance running on the replica is the following:
- On the master, execute
mysqldump --single-transaction --master-data=2 -u $USER -p $PASSWORD -h $host --databases $DATABASE1 $DATABASE2 etc.. >> file.sql. Note that usually we connect via UNIX socket on localhost as root to have full admin privileges, so the user/password parameters can be removed when connecting as root@localhost (typically via sudo mysql etc..).
file.sqldump should contain in its header something like:
CHANGE MASTER TO MASTER_LOG_FILE='matomo1002-bin.000023', MASTER_LOG_POS=29517516;. Keep a note about it since it will be needed later on.
- Ship the file from the master to db1108. This can be done in several ways, the quickest one is with transfer.py on cumin1001.
- On the replica, execute
cat file.sql | sudo mysql -S /run/mysqld/mysqld.$INSTANCE.sock($INSTANCE is the name that you set up in puppet).
- On the replica, run
sudo mysql_upgrade -S /run/mysqld/mysqld.$INSTANCE.sock
- On the replica, log in as root to the instance that you are configuring (
sudo mysql -S /run/mysqld/mysqld.$INSTANCE.sock) and then run:
change master to master_host='an-coord1001.eqiad.wmnet', master_user='$REPLICATION_USER', master_password='$PASSWORD', master_port=3306, MASTER_SSL=1, master_log_pos=SEE-PREVIOUS-STEP, master_log_file='SEE-PREVIOUS-STEP';(master_user and master password can be found in the puppet private repo looking for class passwords::analytics)
show slave status \G;(make sure that you see
STOP SLAVE; CHANGE MASTER TO MASTER_USE_GTID=Slave_pos; START SLAVE;
show slave status \G;(make sure that you see
There is a weekly bacula backup job that pulls the databases list above to Bacula. You can check what gets saved on dbprov1002:
root@dbprov1002:/srv/backups/dumps/latest/dump.matomo.2020-07-31--07-54-57# ls metadata piwik.piwik_archive_blob_2019_01.sql.gz piwik.piwik_archive_numeric_2017_07-schema.sql.gz piwik.piwik_access-schema.sql.gz piwik.piwik_archive_blob_2019_02-schema.sql.gz piwik.piwik_archive_numeric_2017_07.sql.gz [..]
Why weekly? If we need to recover anything that happened a day before, we'll have 14 days of binlog on the master and replica database hosts. See https://dev.mysql.com/doc/refman/8.0/en/point-in-time-recovery-binlog.html