You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log

From Wikitech-static
Revision as of 00:54, 10 October 2020 by imported>Stashbot (tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on all but enwiki (T264799) (duration: 01m 01s))

2020-10-10

2020-10-09

  • 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Wikidata (T264799) (duration: 00m 59s)
  • 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on Commons (T264799) (duration: 00m 59s)
  • 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert, but nothing in SAL and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role (T260271)
  • 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert, but nothing in SAL or tickets
  • 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group1, except Commons/Wikidata (T264799) (duration: 00m 57s)
  • 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 04s)
  • 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable session-ip log channel on group0 (T264799) (duration: 00m 59s)
  • 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: Log IP/device changes within the same session (T264799) & SessionManager: Always log IP/UA in session-ip (duration: 01m 06s)
  • 22:01 tgr_: rolling out T264799#6533622
  • 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # T263935
  • 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
  • 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
  • 20:04 dwisehaupt: upgrading payments1001 to buster
  • 19:14 dwisehaupt: upgrading payments1002 to buster
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:30 dwisehaupt: upgrading payments1003 to buster
  • 17:53 dwisehaupt: upgrading payments1004 to buster
  • 17:52 cstone: civicrm revision changed from b86a15a430 to 585eb835d8, config revision is 57843925bb
  • 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:45 jayme: helm rollback push-notification in eqiad to revision 8
  • 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:07 XioNoX: remove user from all network devices
  • 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
  • 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:36 moritzm: installing xen security updates for buster (libs only)
  • 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission

2020-10-08

  • 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
  • 23:37 ryankemper: `cloudelastic1005` done
  • 23:31 ryankemper: `cloudelastic1004` done
  • 23:27 ryankemper: `cloudelastic1003` done
  • 23:23 ryankemper: `cloudelastic1002` done
  • 23:16 tgr_: Evening deploys done
  • 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
  • 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes everywhere (T264793) (duration: 01m 01s)
  • 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
  • 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
  • 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
  • 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
  • 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
  • 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
  • 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
  • 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - T264793 (duration: 01m 01s)
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 volans: deploying Netbox DNS zone consolidation - T264273
  • 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
  • 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
  • 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide: (duration: 00m 06s)
  • 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
  • 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate by default on production (T264357) (duration: 01m 06s)
  • 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
  • 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
  • 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - T210137
  • 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
  • 16:19 hashar: Restarting CI Jenkins
  • 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 T263443
  • 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 T264991
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:29 kart_: Updated cxserver to 2020-10-08-053343-production (T264407, T264859)
  • 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
  • 10:37 moritzm: installing Postgres security updates on netboxdb1001
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
  • 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
  • 10:32 moritzm: installing Postgres security updates on netboxdb2001
  • 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
  • 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
  • 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 godog: roll-restart swift-object-replicator on ms-be2* - T261633
  • 08:19 kormat: running schema change against s8 in eqiad T259831
  • 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 gehel: repooling wdqs2002
  • 07:55 marostegui: Rebuild db2125 from snapshots - T260670
  • 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
  • 07:40 gehel: depooled wdqs2002 to catch up on lag
  • 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
  • 07:23 moritzm: installing pyzmq updates from Buster point release
  • 07:00 dcausse: depooling wdqs2002 (catching-up lag)
  • 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) T242453
  • 06:51 _joe_: enable notifications for wdqs-ssl-codfw
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:05 ejegg: updated fundraising python tools from 5515923ef7 to d4e08c52de
  • 00:31 tgr_: evening deploys done
  • 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (again, forgot to rebase the previous time) (duration: 00m 59s)
  • 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group1 (T264793) (duration: 00m 57s)
  • 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable logging of session cookie changes in group0 (T264793) (duration: 00m 58s)
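The `cloudelastic` restart procedure logged at 23:04 on 2020-10-08 has five steps per server: depool, restart the Elasticsearch services, wait for them to come back, pool, and wait for the cluster to return to green before moving to the next server. The following is a minimal Python sketch of that loop. It is not the tooling that was actually run. The `depool`, `restart`, `pool` and `cluster_is_green` callables are hypothetical hooks that stand in for the real commands, and `restart` is assumed to block until the services are back up.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    depool: Callable[[str], None],      # hypothetical hook: take host out of rotation
    restart: Callable[[str], None],     # hypothetical hook: restart services, blocking until up
    pool: Callable[[str], None],        # hypothetical hook: put host back into rotation
    cluster_is_green: Callable[[], bool],  # hypothetical hook: cluster health check
    max_checks: int = 60,
    poll_interval: float = 0.0,
) -> List[str]:
    """Restart hosts strictly one at a time and return the hosts completed."""
    done: List[str] = []
    for host in hosts:
        depool(host)
        restart(host)
        pool(host)
        # Do not touch the next host until the cluster reports green again.
        for _ in range(max_checks):
            if cluster_is_green():
                break
            time.sleep(poll_interval)
        else:
            raise RuntimeError(f"cluster not green after restarting {host}")
        done.append(host)
    return done
```

The health check is bounded by `max_checks`, so a cluster that never recovers stops the rollout with an error instead of hanging. This keeps the remaining servers untouched.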

2020-10-07

  • 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: Log when SessionManager is emitting cookies (T264793) (duration: 01m 00s)
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
  • 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
  • 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
  • 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
  • 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
  • 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
  • 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
  • 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in T264859. https://en.wikipedia.org/wiki/Inari_Sami | https://iso639-3.sil.org/code/smn
  • 18:30 ryankemper: search team's backport deploy is complete
  • 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: cloudelastic: envoy sits in front now (T263073) (duration: 00m 58s)
  • 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'`
  • 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
  • 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
  • 18:20 ryankemper: (backport) HEAD set to 834b457 as expected
  • 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - T261260 (duration: 01m 01s)
  • 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
  • 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:39 jgleeson: updated civicrm from 39b4f954ed to b86a15a430
  • 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo T242602
  • 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - T259780
  • 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 (T263986)
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
  • 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
  • 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
  • 11:22 Urbanecm: EU B&C window done
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: f85bc30: Enable bot passwords at all fishbowl and private wikis (T258356) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
  • 11:14 urbanecm@deploy1001: sync-file aborted: 5729736: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 6cdeea2: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia (T264161) (duration: 00m 59s)
  • 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 T263443
  • 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups T259831', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
  • 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
  • 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 T264755 ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
  • 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - T264588
  • 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
  • 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl T264700', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
  • 07:14 marostegui: Stop MySQL es2015 for decommissioning T264700
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 02:37 eileen: civicrm revision changed from a30da7f92a to 39b4f954ed, config revision is 0ca9a3a055
  • 01:00 cdanis: repool esams; cr2-esams router upgrade complete
  • 00:43 cdanis: T259621 cdanis@re1.cr2-esams> request chassis routing-engine master switch
  • 00:40 cdanis: T259621 cdanis@re1.cr2-esams> request system reboot other-routing-engine
  • 00:36 cdanis: T259621 cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
  • 00:26 cdanis: T259621 cdanis@re0.cr2-esams> request chassis routing-engine master switch
  • 00:22 cdanis: T259621 cdanis@re0.cr2-esams> request system reboot other-routing-engine
  • 00:15 cdanis: T259621 cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs (T252526)

2020-10-06

  • 23:55 mutante: 🖧 switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* (T252526) 🖧
  • 23:53 mutante: 🖧 switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* (T252526) 🖧
  • 23:52 mutante: 🖧 switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* (T252526) 🖧
  • 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
  • 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
  • 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
  • 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
  • 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
  • 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:40 Urbanecm: Morning B&C done
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: 2118d26: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 00s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: d428ccb: Hot fix: Use display for hiding/showing sidebar on OS 14_0 (T264376) (duration: 01m 03s)
  • 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 T263493 T259622 (duration: 00m 58s)
  • 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 T263493 T259622 (duration: 00m 59s)
  • 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 T264043 (duration: 00m 59s)
  • 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 T264637 (duration: 00m 58s)
  • 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 T264637 (duration: 00m 58s)
  • 15:41 godog: centrallog* delete archived logs from old, single file, organization
  • 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
  • 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - T263789
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - T262946
  • 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
  • 14:36 hnowlan: repooling restbase2009
  • 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
  • 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
  • 14:08 marostegui: Reboot db1076 for kernel upgrade T264755
  • 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 14:03 marostegui: Power cycle db1076 T264755
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
  • 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
  • 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
  • 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - T264157
  • 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443
  • 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
  • 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
  • 12:08 jbond42: deploy puppetlabs-stdlib 5.2
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # T264430 # P12930
  • 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 07c19f9: arbcom_ruwiki: Set AK as alias for NS_PROJECT (T264430) (duration: 00m 58s)
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: 7e4e811: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons (T264430) (duration: 00m 58s)
  • 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b1a4fa: ruewiki: Add rollbacker, grantable and revokable by sysops (T264147) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5cc7027: Allow bureaucrats to remove sysop permissions on Commons (T261481) (duration: 00m 58s)
  • 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5f9721b: GrowthExperiments: Change Help Page URL for kowiki (T254364) (duration: 01m 00s)
  • 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
  • 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
  • 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:48 effie: set mw2279.codfw.wmnet as inactive T264698
  • 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
  • 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
  • 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
  • 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
  • 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
  • 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
  • 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
  • 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
  • 09:59 effie: enable puppet on mc20*
  • 09:41 effie: enable puppet on mc10*
  • 09:38 effie: disable puppet on mc*
  • 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
  • 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443
  • 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo T264700 T264386
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 T264700 ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
  • 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl T264386', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json

2020-10-05

  • 23:11 ejegg: updated payments staging from 52704ffe24 to db03677b2d
  • 22:27 mutante: removing shinken puppet module and role
  • 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster T264053
  • 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings T264053
  • 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings T264053
  • 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings T264053
  • 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
  • 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
  • 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
  • 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
  • 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
  • 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: T263133 T264035 (duration: 22m 23s)
  • 14:25 elukey: shutdown an-master1001 for ram expansion
  • 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: T263133 T264035
  • 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:54 elukey: shutdown stat1005 for ram upgrade
  • 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
  • 12:39 moritzm: installing curl security updates on remaining hosts
  • 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" (T264295) (duration: 00m 59s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: be73f15: Move changetags right from users to sysop [trwiki] (T264508) (duration: 00m 59s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cd30b62: wgSkipSkins: Exclude contenttranslation skin from skin options for users (T263093) (duration: 00m 59s)
  • 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 T264398
  • 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 T264398
  • 10:08 moritzm: installing ldap-replica1002 T264390
  • 09:52 moritzm: installing ldap-replica1001 T264390
  • 09:22 moritzm: installing ldap-replica2003 T264390
  • 09:02 hnowlan: bootstrapping restbase1030-b
  • 08:57 moritzm: installing ldap-replica2004 T264390
  • 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
  • 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily T259831', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
  • 08:23 godog: prometheus codfw/ops, add 100G to the LV
  • 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 07:46 marostegui: Stop mysql on es2017 T264386
  • 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 06:52 XioNoX: add static NAT to pfw3-eqiad - T264356
  • 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 T264386 ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json

2020-10-03

  • 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: 840545f: Restrict flow-hide right to autoconfirmed users on zhwiki (T264489) (duration: 01m 17s)
  • 00:08 ejegg: updated fundraising CiviCRM from 256adda03c to a30da7f92a

2020-10-02

  • 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
  • 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:27 effie: enable puppet on mw2271
  • 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
  • 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
  • 17:15 mutante: submitted puppet refactoring change on maps servers
  • 16:49 effie: disable puppet on mw2271 and briefly depool it
  • 15:39 _joe_: restarting redis on rdb2003, instance 6380
  • 15:28 hnowlan: bootstrapping restbase1030-a
  • 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server T261531 4573776bd 2fb4c20ae (duration: 01m 01s)
  • 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
  • 14:08 effie: enable puppet on mwdebug1001
  • 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
  • 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis (T258356)
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
  • 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
  • 12:26 effie: disable puppet on mwdebug1001
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
  • 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
  • 12:05 hnowlan: bootstrapping restbase1029-c
  • 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
  • 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
  • 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
  • 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
  • 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
  • 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
  • 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255875 T255869
  • 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875 T255869
  • 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875 T255869
  • 09:12 jayme: running puppet on lvs servers - T255875 T255869
  • 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T264221)
  • 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
  • 09:07 hnowlan: bootstrapping restbase1029-b cassandra
  • 09:05 hashar: gerrit: running garbage collector
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
  • 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
  • 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
  • 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
  • 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 08:29 moritzm: installing pyzmq bugfix update from buster point release
  • 08:24 moritzm: installing nginx security updates on puppetdb*
  • 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
  • 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
  • 07:42 moritzm: installing libcommons-compress-java security updates
  • 07:35 godog: swift codfw-prod bump weight for ms-be2057 - T261633
  • 07:29 godog: prometheus codfw/k8s, add 50G to the LV
  • 07:23 moritzm: installing libx11 security updates on buster
  • 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at T264362
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl T264261', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json

2020-10-01

  • 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
  • 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
  • 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
  • 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at 7ab9a74 for T264365
  • 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well T264363
  • 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at 796693c for T264365
  • 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
  • 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs T263177 (duration: 01m 06s)
  • 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs T263177
  • 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train T264257 T263177 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days T264053 (duration: 00m 59s)
  • 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 13m 42s)
  • 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339 (duration: 01m 34s)
  • 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to newer version but wasn't working, reverted to previous one
  • 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train 530b339
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - T258729
  • 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - T264227
  • 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - T264227
  • 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - T244843 T255878
  • 14:55 moritzm: installing npm security updates on buster
  • 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T244843 T255878
  • 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - T244843 T255878
  • 14:42 jayme: running puppet on lvs servers - T244843 T255878
  • 14:35 Urbanecm: Create bot_passwords table at all private wikis (T258356)
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
  • 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
  • 14:12 effie: enable puppet on mw2271
  • 14:08 moritzm: installing pillow security updates
  • 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
  • 13:59 moritzm: installing nginx security updates on schema*
  • 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
  • 13:50 klausman: rebooting an-worker1096 for cluster maintenance
  • 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - T258405
  • 13:29 moritzm: restarting mw canaries to pick up curl update
  • 13:22 moritzm: installing curl security updates on stretch
  • 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
  • 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
  • 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 00m 59s)
  • 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: 500d0c7: Prevent returning the full templatelinks table in TemplateFilter (T264029) (duration: 01m 00s)
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
  • 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 (T263284)
  • 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # T262046
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 58a8c82: kuwiktionary: Create Jinûvesazî namespace (T262046) (duration: 01m 01s)
  • 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
  • 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:55 hnowlan: adding buster host restbase1028-b to cassandra
  • 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
  • 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
  • 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
  • 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 T261717', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
  • 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - T264157
  • 05:45 marostegui: Stop MySQL on es2011 T264261
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 T264261', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) T264109
  • 05:19 marostegui: Repool labsdb1011
  • 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: Ia3357b2f593c (duration: 00m 58s)
  • 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: 1721d2aa0 - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)

2020-09-30

  • 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:46 cdanis: depool mw2356 and mw2319
  • 21:45 eileen: civicrm revision changed from 5a53bfe6ed to 256adda03c, config revision is 646817a2c0
  • 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
  • 21:19 ejegg: updated fundraising CiviCRM from 6e843649ac to 5a53bfe6ed
  • 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
  • 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
  • 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
  • 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
  • 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
  • 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
  • 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
  • 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
  • 19:01 effie: disable puppet on mw2271 and use onhost memcached - T263958
  • 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" (T264066) (duration: 00m 58s)
  • 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki (T257220) (duration: 00m 58s)
  • 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki (T225027) (duration: 00m 58s)
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis (T263032) (duration: 00m 59s)
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki (T255585) (duration: 00m 58s)
  • 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons (T263032) (duration: 00m 59s)
  • 17:52 bblack: lvs1016: restart pybal
  • 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
  • 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
  • 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:21 mutante: re-enabled puppet on install2003
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
  • 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each T259909
  • 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
  • 14:03 cmjohnson1: powering down ores1002 to upgrade memory T259909
  • 13:55 cmjohnson1: powering down ores1001 to upgrade memory T259909
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
  • 12:39 marostegui: Deploy schema change on db2080, db2081 T264109
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:33 effie: enable puppet P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: Enable Special:TranslationStats (T263004) (duration: 00m 59s)
  • 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - T244340
  • 10:57 moritzm: installing librsvg security updates
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:07 kormat: deploying schema change to s4/eqiad T259831
  • 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - T264157
  • 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:45 kormat: deploying schema change to s7/eqiad T259831
  • 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl T264156', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
  • 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 07:56 akosiaris: upgrade termbox to latest chart, fixing various prometheus-statsd-export configuration minor issues.
  • 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master T263227, also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
  • 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - T263227
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T263227', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
  • 07:05 marostegui: Stop mysql on es2016 before decommissioning T264156
  • 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
  • 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 T264156', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
  • 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
  • 05:43 marostegui: Remove es2019 from tendril and zarcillo T264063
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
  • 02:30 eileen: process-control config revision is 646817a2c0
  • 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: Ensure variant A homepage sidebar is always at least 300px (T263905) (duration: 01m 01s)

2020-09-29

  • 23:35 mutante: created testvm3001.esams.wmnet to test install3001
  • 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias (T262936) (duration: 00m 59s)
  • 23:20 Urbanecm: Evening B&C window completed
  • 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 68d7af9: Enable watchlist expiry feature (wikisource; T260461) (duration: 00m 58s)
  • 23:18 eileen: process-control config revision is 8b39770e93
  • 23:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bc6dda2: Enable watchlist expiry feature (T260461) (duration: 00m 58s)
  • 23:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:52 eileen: process-control config revision is 16a6dcafd6
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 eileen: civicrm revision changed from 035ad1c351 to 06a5289d1a, config revision is 2622fd2c09
  • 22:45 eileen: process-control config revision is 2622fd2c09 jobs disabled
  • 22:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:26 mutante: phab1001 - re-enabled puppet and running it
  • 22:24 ejegg: CiviCRM rolled back from 4aa0aeccd1 to 035ad1c351
  • 22:16 eileen: civicrm revision changed from 035ad1c351 to 4aa0aeccd1, config revision is b9120969bf
  • 21:59 mutante: temp. disabled puppet on phab1001
  • 21:49 mutante: restarted aphlict service on aphlict1001
  • 21:47 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 13m 45s)
  • 21:34 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 21:30 mutante: started DHCP service on install2003 again
  • 21:22 mutante: temp stopping DHCP service on install2003 for a test
  • 21:09 mutante: rebooting testvm5001 for install test after switching DHCP/TFTP in eqsin to new dedicated VM
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 cdanis@cumin1001: dbctl commit (dc=all): 'depool db2125', diff saved to https://phabricator.wikimedia.org/P12843 and previous config saved to /var/cache/conftool/dbconfig/20200929-205453-cdanis.json
  • 20:51 mutante: DHCP server for EQSIN switched from bast5001 to install5001 (T252526)
  • 20:45 twentyafterfour@deploy1001: Finished scap: testwikis to 1.36.0-wmf.11 refs T263177 (duration: 69m 57s)
  • 19:44 andrewbogott: apt-get update && apt-get upgrade on wikitech-static
  • 19:40 mutante: temp. disabling puppet on ms-fe (swift-proxy) hosts, applying puppet refactoring change carefully
  • 19:35 twentyafterfour@deploy1001: Started scap: testwikis to 1.36.0-wmf.11 refs T263177
  • 19:29 twentyafterfour: Checked out mediawiki 1.36.0-wmf.11 on deploy1001 see T263177
  • 17:30 hnowlan: ported cassandra-tools-wmf to wikimedia-buster
  • 17:12 jbond42: update libdbi-perl on dbmonitor1001 and helium
  • 17:02 jbond42: re-enable puppet to post deploy puppetdb change
  • 16:57 jbond42: disable puppet to deploy puppetdb change
  • 16:34 chaomodus: deploying eqsin automated DNS
  • 15:51 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:23 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:00 vgutierrez: restarting acme-chief on acmechief1001
  • 14:48 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:41 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 14:30 bblack: switching eqsin and esams public-facing unified certs to letsencrypt - https://gerrit.wikimedia.org/r/c/operations/puppet/+/630847
  • 14:06 moritzm: installing facter updates from Buster 10.6 point release
  • 13:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:49 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2126 from dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12841 and previous config saved to /var/cache/conftool/dbconfig/20200929-134926-kormat.json
  • 13:47 ema: text@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:40 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12840 and previous config saved to /var/cache/conftool/dbconfig/20200929-134018-kormat.json
  • 13:36 ema: upload@esams: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:28 moritzm: installing lua5.3 security updates
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12839 and previous config saved to /var/cache/conftool/dbconfig/20200929-132515-kormat.json
  • 13:10 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12838 and previous config saved to /var/cache/conftool/dbconfig/20200929-131011-kormat.json
  • 12:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 12:55 kormat@cumin1001: dbctl commit (dc=all): 'db2108 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12837 and previous config saved to /var/cache/conftool/dbconfig/20200929-125508-kormat.json
  • 12:53 moritzm: installing QT security updates
  • 12:29 kormat@cumin1001: dbctl commit (dc=all): 'db2108 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12836 and previous config saved to /var/cache/conftool/dbconfig/20200929-122914-kormat.json
  • 12:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:28 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db2126 to dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12835 and previous config saved to /var/cache/conftool/dbconfig/20200929-122811-kormat.json
  • 12:05 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:54 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:28 vgutierrez: disabling DHE-RSA-AES128-SHA support - T258405
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12834 and previous config saved to /var/cache/conftool/dbconfig/20200929-111804-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12833 and previous config saved to /var/cache/conftool/dbconfig/20200929-110300-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12832 and previous config saved to /var/cache/conftool/dbconfig/20200929-104757-root.json
  • 10:42 XioNoX: re-enable TFTP ALGs on all mr
  • 10:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:39 moritzm: installing libdbi-perl security updates for stretch/buster
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After reboot to troubleshoot a degraded RAID', diff saved to https://phabricator.wikimedia.org/P12831 and previous config saved to /var/cache/conftool/dbconfig/20200929-103253-root.json
  • 10:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1104 on s8 eqiad master T239238', diff saved to https://phabricator.wikimedia.org/P12830 and previous config saved to /var/cache/conftool/dbconfig/20200929-100723-kormat.json
  • 10:05 kormat: Starting s8 eqiad failover from db1109 to db1104 - T239238
  • 10:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:51 kormat@cumin1001: dbctl commit (dc=all): 'Set db1104 with weight 0 T239238', diff saved to https://phabricator.wikimedia.org/P12829 and previous config saved to /var/cache/conftool/dbconfig/20200929-095135-kormat.json
  • 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 09:17 marostegui: Depool labsdb1010 from web role
  • 09:08 jbond42: update rails on puppetmasters
  • 08:21 jayme: switching esams pybal back to conf1006 - T196487
  • 08:01 ema: cp3050: varnish upgrade to 6.0.6-1wm1 T263557
  • 07:55 gehel: badblocks check on wdqs1009 - T263125
  • 07:46 marostegui: Stop MySQL on es2019 before decommissioning T264063
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2019 from dbctl T264063', diff saved to https://phabricator.wikimedia.org/P12825 and previous config saved to /var/cache/conftool/dbconfig/20200929-074602-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2019 T264063', diff saved to https://phabricator.wikimedia.org/P12824 and previous config saved to /var/cache/conftool/dbconfig/20200929-060538-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2034 as es3 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12823 and previous config saved to /var/cache/conftool/dbconfig/20200929-060253-marostegui.json
  • 05:13 marostegui: Stop mysql and reboot es2026 - T263837
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 T263837', diff saved to https://phabricator.wikimedia.org/P12822 and previous config saved to /var/cache/conftool/dbconfig/20200929-051236-marostegui.json
  • 05:10 marostegui: Remove es2013 from tendril and zarcillo T263740
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 04:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 03:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 03:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:32 tgr_: B&C done
  • 00:31 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TaskSuggester/CacheDecorator.php: Backport: Add (and increment) CacheDecorator cache version ([PHABRICATOR-TASK]) (duration: 00m 58s)
  • 00:09 mutante: TFTP/install server for eqsin switched from bast5001 to install5001 - T252526

2020-09-28

  • 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T264053: Remove commonswiki from sidebar search (duration: 01m 09s)
  • 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: Properly handle namespaces in tasktype template configuration (T264029) (duration: 01m 03s)
  • 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:12 ejegg: updated staging payments-wiki from 43470629cc to 885d87a905
  • 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:15 Urbanecm: Morning B&C done
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c7e08bc: Enable search in header A/B test for logged in users (T263032) (duration: 00m 58s)
  • 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:58 ejegg: updated payments-wiki from b2eb456ed1 to 2083498811
  • 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
  • 16:08 XioNoX: push pfw policies - T264013
  • 15:51 papaul: poweroff elastic2037 for DIMM replacing
  • 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 T196487', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
  • 15:25 hashar: Restarting CI Jenkins for plugins uninstallation T260565
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:49 moritzm: installing glib-networking security updates
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
  • 14:33 XioNoX: repool eqiad
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia T259102
  • 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
  • 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
  • 13:45 XioNoX: downtiming all eqiad row D hosts - T196487
  • 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - T261633
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # T262812
  • 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # T262812
  • 12:26 Urbanecm: arbcom_ruwiki is created (T262812)
  • 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
  • 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki (T262812)
  • 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki (T262812) (duration: 00m 57s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki (T262812) (duration: 00m 56s)
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade T196487', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
  • 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 56s)
  • 11:34 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 61eac95: Creation of patroller group on arz.wikipedia (T262218) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 483beb2: ContentTranslation: Do not use wikishared DB for testwiki (T263417; follow-up af09303 also included in this sync) (duration: 00m 57s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
  • 09:17 XioNoX: restart bird on dns2001 - T262372
  • 09:15 jynus: restart db1077 for upgrade and cleanup T187984
  • 09:06 XioNoX: restart bird on centrallog2001 - T262372
  • 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:56 dcausse: T263970: recovering lost apifeature indices (copying eqiad indices -> codfw)
  • 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - T261633
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
  • 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
  • 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist T263842', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
  • 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
  • 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - T196487
  • 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - T196487
  • 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - T196487
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
  • 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced T260670', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
  • 07:29 _joe_: restarting pybal on the LVS primaries
  • 07:24 dcausse: T263970: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
  • 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
  • 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
  • 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw T261717', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
  • 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443
  • 05:55 marostegui: Stop MySQL on es2013 before decommissioning it T263740
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl T263740', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 T263740', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
  • 05:22 marostegui: Decrease labsdb1011 weight

2020-09-27

  • 06:36 elukey: powercycle analytics1048

2020-09-26

  • 19:20 chrisalbon: sudo service uwsgi-ores restart
  • 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw' 'systemctl restart celery-ores-worker.service uwsgi-ores.service '
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
  • 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
  • 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2020-09-25

  • 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables (duration: 26m 57s)
  • 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable refernce in airflow_variables
  • 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
  • food: updated fundraising CiviCRM from eb90dbcfd3 to 035ad1c351
  • 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
  • 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
  • 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
  • 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
  • 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
  • 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
  • 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: Make all section `collapsible` during server side rendering (T263832) (duration: 00m 59s)
  • 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
  • 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
  • 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
  • 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
  • 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
  • 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide T263842
  • 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups T263842', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation T158562
  • 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation T158562
  • 09:58 moritzm: restarting archiva to pick up Java security update
  • 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing T261130)
  • 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
  • 06:34 elukey: reboot stat1004 to pick up kernel settings
  • 03:10 ejegg: updated payments-wiki from f89c594e12 to b2eb456ed1
  • 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, T263798 (duration: 09m 05s)
  • 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
  • 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, T263798
  • 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798 (duration: 06m 09s)
  • 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, T263798

2020-09-24

  • 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
  • 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
  • 21:40 mutante: mw1349 - systemctl reset-failed
  • 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
  • 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
  • 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
  • 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
  • 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
  • 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
  • 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
  • 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
  • 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
  • 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
  • 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
  • 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
  • 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
  • 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
  • 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bcf9fcb: Enable mobile block notice tracking in MobileFrontend (T260218) (duration: 01m 04s)
  • 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Special:Investigate on itwiki and svwiki (T262436) (duration: 01m 05s)
  • 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
  • 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres logging
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 17:04 mutante: syncing facts to puppet compiler hosts
  • 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 16:26 robh: properly pooled mw1360 this time T262151
  • 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
  • 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: 5e88c36: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars (T263750) (duration: 01m 06s)
  • 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` (T263760)
  • 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` (T263760)
  • 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list (T263760)
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
  • 15:15 robh: mw1360 scap and repooled post work via T262151
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
  • 15:10 jayme: switched zotero service-proxy listener to use TLS - T255869
  • 15:00 XioNoX: repool eqiad - T256112
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
  • 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - T256112
  • 14:19 XioNoX: remove damping on anycast group for cr2-codfw
  • 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255869
  • 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255869
  • 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - T256112
  • 14:09 jayme: running puppet on lvs servers - T255869
  • 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
  • 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
  • 13:52 XioNoX: depool eqiad for row D recabling - T256112
  • 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
  • 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - T253957
  • 13:17 XioNoX: add damping to anycast BGP - T262372
  • 12:58 jayme: switched mathoid service-proxy listener to use TLS - T255875
  • 12:50 moritzm: upgrading bird on centrallog1001
  • 12:43 gehel: restarting wdqs-categories on wdqs1009
  • 12:43 moritzm: installing netty-3.9 security updates
  • 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:29 godog: swift codfw-prod: rebalance only, no weight change
  • 12:27 kormat: powering off db2125 for maintenance T260670
  • 12:25 moritzm: installing xorg-server security updates
  • 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 T263557
  • 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 T263557
  • 11:40 Urbanecm: EU B&C window done
  • 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: fa4900e: Fix validation of translation unit section names (T263546) (duration: 01m 07s)
  • 11:25 jbond42: re-enable puppet fleet wide
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fdab74c: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool (T258504; T260022; T260024) (duration: 01m 05s)
  • 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
  • 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 90c7291: Move DiscussionTools out of beta on arwiki, cswiki, huwiki (T249394); d8553f3: Simplify DiscussionTools config (duration: 01m 11s)
  • 11:06 moritzm: installing imagemagick security updates on stretch
  • 11:02 jbond42: re-enable puppet fleet wide
  • 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
  • 10:49 moritzm: installing libproxy security updates
  • 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
  • 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
  • 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
  • 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255875
  • 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255875
  • 09:43 jayme: running puppet on lvs servers - T255875
  • 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 T263557
  • 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 T263557
  • 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
  • 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
  • 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:15 XioNoX: configure vrrp_master_pinning in codfw - T263212
  • 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
  • 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:57 marostegui: Remove es2018 from tendril and zarcillo T263613
  • 07:57 XioNoX: configure vrrp_master_pinning in eqiad - T263212
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
  • 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:49 godog: roll restart logstash codfw, gc death
  • 07:25 XioNoX: push pfw policies - T263674
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
  • 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
  • 05:57 marostegui: Remove es2012 from tendril and zarcillo T263613
  • 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - T263615 T263613', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
  • 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
  • 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
  • 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
  • 01:16 ryankemper: (after) `{"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
  • 01:15 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
  • 01:14 ryankemper: (before) `{"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`

2020-09-23

  • 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
  • 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cbd77e3: Add new Racine namespace to frwiktionary (T263525) (duration: 01m 05s)
  • 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 22382a9: remove wtp2005 from wgLinterSubmitterWhitelist (T257903) (duration: 01m 04s)
  • 23:14 eileen: civicrm revision changed from 32a82aa1b7 to eb90dbcfd3, config revision is 2a55766237
  • 23:13 eileen: civicrm revision is 32a82aa1b7, config revision is 2a55766237
  • 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001 T263684
  • 23:04 mutante: ganeti4003 - rebooting install4001
  • 22:51 mutante: ganeti5003 - rebooting install5001
  • 22:27 mutante: ganeti5003 - gnt-instance start install5001
  • 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
  • 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
  • 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
  • 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 20:57 mepps: updated payments-wiki from 7bb99ce03a to f89c594e12
  • 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for T263601 and T263675 to 1.36.0-wmf.10
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:41 dancy@deploy1001: Started scap: (no justification provided)
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:36 eileen: civicrm revision changed from a789afd79b to 32a82aa1b7, config revision is 2a55766237
  • 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:42 robh: ganeti5002 firmware update before hw testing via T261130
  • 18:57 ryankemper: (Above deploy complete)
  • 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: cloudelastic: envoy sits in front now (T263073)'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
  • 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
  • 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at 7a96d63 as expected
  • 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # T263628
  • 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote (T263628)'
  • 18:11 Urbanecm: Logmsgbot seems to be down
  • 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per T261130
  • 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - T252132
  • 16:00 herron: switching icinga over from icinga1001 to alert1001 T247966
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done T259831', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
  • 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
  • 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue T262151
  • 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 07s)
  • 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: 3234fad: SpecialUnblock: Allow getTargetAndType to accept null $par (T263642) (duration: 01m 08s)
  • 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
  • 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
  • 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
  • 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
  • 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
  • 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
  • 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
  • 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
  • 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable repo config propagateChangeVisibility everywhere, 2/2 (duration: 01m 06s)
  • 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable repo config propagateChangeVisibility everywhere, 1/2 (duration: 01m 06s)
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
  • 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
  • 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
  • 13:20 moritzm: installing ruby-json security updates
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
  • 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
  • 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
  • 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
  • 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled T259831', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
  • 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 T263557
  • 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 T263557
  • 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
  • 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: 73b5ce8: Fix GrowthTasksApi lazy-loading flags for pages with no views (T263611) (duration: 01m 05s)
  • 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: 1ab31a9: Mark pageviews as not used in the mobile postedit notice (T263611) (duration: 01m 06s)
  • 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment (T237467; cc twentyafterfour)
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
  • 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` (T263417)
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
  • 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
  • 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
  • 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Configure entityDataCachePaths for Wikibase (duration: 01m 05s)
  • 09:59 elukey: update puppet compiler's facts
  • 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 2/2 (production no-op) (duration: 01m 04s)
  • 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wgExtraLanguageNames from Wikidata and Commons (T260118), part 1/2 (duration: 01m 16s)
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
  • 08:29 moritzm: installing dbus security updates on buster
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes T262856', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index T262856', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
  • 06:34 marostegui: Stop MySQL on es2012 and es2018 T263613 T263615
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 T263615', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal T262856', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 T262856', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
  • 05:37 marostegui: Purge global_status_log table on tendril - T252331
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change T238966
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
  • 04:25 eileen: civicrm revision changed from 8f32b6301f to a789afd79b, config revision is 9933605187

2020-09-22

  • 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias (T255585) (duration: 01m 04s)
  • 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature (T261249) (duration: 01m 06s)
  • 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:46 ebernhardson: T259539 enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
  • 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
  • 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
  • 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
  • 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
  • 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
  • 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:00 robh: running dell epsa test on down host mw1360 per T262151
  • 14:34 moritzm: installing nginx security updates on buster
  • 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
  • 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 T261632
  • 14:24 papaul: rebooting ms-be2019
  • 14:15 XioNoX: upgrade FNM on netflow2001 - T257035
  • 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255868 T255877
  • 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255868 T255877
  • 14:10 XioNoX: upgrade FNM on netflow5001 - T257035
  • 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255868 T255877
  • 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
  • 14:09 XioNoX: upgrade FNM on netflow1001 - T257035
  • 14:06 XioNoX: upgrade FNM on netflow3001 - T257035
  • 14:05 jayme: running puppet on lvs servers - T255868 T255877
  • 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 14:02 hnowlan: roll-restarting restbase codfw for java updates
  • 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - T257035
  • 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 T261632
  • 13:49 marostegui: Deploy MCR change on db2098:3313 - T238966
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 T261632
  • 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 T261632
  • 12:11 moritzm: installing python3.7 security updates on Buster
  • 12:09 moritzm: installing bundler updates on buster
  • 11:59 Urbanecm: Reset password for SUL User:Freibo
  • 11:58 Lucas_WMDE: EU backport&config window done
  • 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix | tee T263358.fix # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource | tee T263358.dryrun # 1350 to fix, 1350 resolvable, 0 deleted
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358) (duration: 00m 57s)
  • 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Removing Wikipedia store link from enwiki (T262329) (duration: 00m 57s)
  • 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set timezone for wikis of the CWIRP to Europe/Rome (T263123) (duration: 00m 59s)
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:35 hnowlan: roll-restarting restbase eqiad for java updates
  • 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 T261632
  • 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to current installed packages this reverts microcode changes for some Skylake CPUs we don't use)
  • 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to current installed packages this reverts microcode changes for some Skylake CPUs we don't use)
  • 10:51 XioNoX: Add policy-options for primary IXPs to all routers - T262517
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 10:48 hnowlan: roll-restarting sessionstore for java security updates
  • 10:44 moritzm: installing bacula security updates on stretch
  • 10:33 moritzm: installing remaining libx11 security updates
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
  • 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. T259831
  • 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255873 T255870
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
  • 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255873 T255870
  • 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255873 T255870
  • 09:51 jayme: running puppet on lvs servers - T255873 T255870
  • 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
  • 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
  • 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - T258729
  • 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
  • 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
  • 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - T262517
  • 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - T262517
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 T261717 ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
  • 09:11 jbond42: update snmp string on ps1-a8-codfw
  • 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
  • 08:58 _joe_: restart pybal on lvs2009
  • 08:56 _joe_: restarting pybal on lvs2010
  • 08:54 _joe_: restarted pybal on lvs1015
  • 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
  • 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
  • 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
  • 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - T258729
  • 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
  • 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
  • 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - T258729
  • 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
  • 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - T255879 T254581
  • 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - T255879 T254581
  • 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - T255879 T254581
  • 07:39 jayme: running puppet on lvs servers - T255879 T254581
  • 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - T258729
  • 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned T262889
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl T262889', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
  • 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning T262889', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
  • 05:41 marostegui: Log remove triggers on revision table on db1124:3313 T238966
  • 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb T238966
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
  • 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo T261717
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 T261717 ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 T261717 ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
  • 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
  • 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
  • 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
  • 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610{5,6,7}`
  • 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
  • 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243

2020-09-21

  • 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 23:36 mutante: debmonitor2002 - systemctl reset-failed
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/backends in both DCs
  • 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
  • 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
  • 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 T243057
  • 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 T243057
  • 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
  • 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
  • 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
  • 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
  • 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:18 mepps: updated crm to 8f32b6301f
  • 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 19:14 ejegg: updated fundraising CiviCRM from e5ebf9d18a to 8f32b6301f
  • 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 T249745 (duration: 00m 56s)
  • 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365 (duration: 06m 54s)
  • 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki (T254239) and ptwiki (T255027) (duration: 00m 56s)
  • 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update I336365
  • 18:33 mepps: updated crm from cc1f7e6d13 to e5ebf9d18a
  • 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) (T261153) (duration: 00m 56s)
  • 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) (T261153) (duration: 00m 57s)
  • 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector (T261153) (duration: 00m 56s)
  • 18:08 XioNoX: add NAT rule to pfw3-codfw - T263488
  • 17:42 papaul: rebooting ps1-a8-codfw firmware upgrade
  • 16:46 papaul: shutting down ms-be2019 for BBU replacing
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
  • 16:17 papaul: replacing msw-c8-codfw
  • 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
  • 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
  • 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 56s)
  • 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: Introduce and use StatsdMonitoring trait in term store (T262923), Part I (duration: 00m 59s)
  • 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance T262247 ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
  • 15:24 hnowlan: roll-restarting restbase-dev for java security updates
  • 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow T259831', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
  • 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
  • 15:07 moritzm: installing libx11 security updates on stretch
  • 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
  • 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
  • 14:37 papaul: firmware upgrade on db2127
  • 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
  • 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
  • 14:30 herron: moving prometheus from bast5001 to prometheus5001 T243057
  • 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end T263138
  • 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
  • 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end T263138
  • 14:00 moritzm: installing Java security updates on restbase/sessionstore*
  • 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim T259831', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
  • 13:21 moritzm: installing glib-networking security updates for Stretch
  • 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443
  • 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
  • 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - T263230
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing T263443
  • 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing T263443
  • 11:38 effie: restart pybal on lvs2009 and lvs1015 - T256973
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
  • 11:35 Urbanecm: EU B&C done
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: 3fab588: Simplify lead paragraph check (duration: 00m 59s)
  • 11:22 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a62212a: Allow local steward group members to bigdelete (duration: 00m 57s)
  • 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # T256348 # P12683
  • 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1cf4664: Set WT namespace alias to NS_PROJECT in shn.wiktionary (T256348) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 01ba828: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains (T261037) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bd51f47: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki (T262238) (duration: 00m 57s)
  • 11:02 effie: restart pybal on lvs2010 and lvs1016 - T256973
  • 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 12s)
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
  • 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
  • 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - T262247
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 T262247', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
  • 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
  • 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - T261633
  • 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done T263244', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
  • 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
  • 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
  • 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - T257035
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
  • 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
  • 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - T261717
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
  • 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts T238966
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts T261717', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
  • 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json

2020-09-20

  • 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # T263317
  • 07:42 gehel: depooling wdqs2002 to catch up on lag
  • 07:36 gehel: restarting blazegraph + updater on wdqs2002

2020-09-19

  • 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
  • 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
  • 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
  • 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only

2020-09-18

  • 21:48 tzatziki: changed password for Millennium bug@ptwiki
  • 19:28 eileen: process-control config revision is 739ea754ca
  • 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
  • 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
  • 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
  • 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
  • 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory (T263008)
  • 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 00m 56s)
  • 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync (T261488) (duration: 01m 00s)
  • 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
  • 12:41 kormat: reimaging db2125 T263244
  • 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
  • 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
  • 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
  • 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
  • 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
  • 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
  • 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
  • 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
  • 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:47 twentyafterfour: deployed hotfix for T263063 to phab1001
  • 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - T262527
  • 09:46 jayme: uncordoned kubestage1001 - T262527
  • 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
  • 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
  • 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
  • 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:56 jayme: reboot kubestage1001 for clean state - T262527
  • 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
  • 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme: reboot kubestage1001 for kernel upgrade - T262527
  • 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1001 for clean state testing - T262527
  • 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change T259831', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
  • 08:16 klausman: reinstalling stat1004 with Buster
  • 07:17 moritzm: installing xdg-utils security updates
  • 07:14 XioNoX: push pfw policies - T263168
  • 07:12 jayme: draining kubestage1001 for kernel upgrade - T262527
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - T261717', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
  • 05:15 marostegui: Restart wikibugs

2020-09-17

  • 23:41 ejegg: updated payments-wiki from 86c997fdb2 to 7bb99ce03a
  • 23:01 ejegg: updated payments-wiki from 1e5a52ed26 to 86c997fdb2
  • 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 19b9b98: Fix APCOND_FR_NEVERBLOCKED handling (part 3; T262970) (duration: 00m 57s)
  • 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #T262657
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:11 Urbanecm: Morning B&C done
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 40591d3: Enable DiscussionTools beta on jawiki & viwiki (T261654; T262109) (duration: 00m 56s)
  • 18:06 Urbanecm: Move /srv/mediawiki-stagging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
  • 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 17:20 rzl: repooled eqiad at 17:11
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:03 papaul: restarting ps1-d8-codfw
  • 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
  • 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
  • 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
  • 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
  • 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:21 marostegui: Restart wikibugs
  • 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:15 papaul: replacing msw-d8-codfw
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack T262901', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
  • 16:03 marostegui: Recreate db1131 on tendril T262901
  • 15:59 marostegui: Update rack location on zarcillo for db1131 T262901
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% T259831', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
  • 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% T259831', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
  • 15:43 mepps: updated payments-wiki from 3c073a6a56 to 1e5a52ed26
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% T259831', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
  • 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% T259831', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
  • 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
  • 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
  • 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw T195578
  • 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change T259831', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
  • 14:49 cmjohnson1: ending pdu maintenance in eqiad
  • 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
  • 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
  • 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
  • 14:02 marostegui: Start mysql on db1125 after PDU maintenance T261459
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 T260670', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
  • 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
  • 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 T259831', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
  • 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance T260670
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
  • 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
  • 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
  • 11:24 matthiasmullie: End Euro B&C
  • 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
  • 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
  • 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - T263006
  • 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:58 marostegui: Stop mysql on db1125 for PDU maintenance, lag will appear on s2, s4, s6 and s7 on labsdb hosts T261459
  • 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
  • 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
  • 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
  • 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
  • 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - T262527
  • 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - T262527
  • 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:37 godog: graphite compress /var/log/carbon logs older than 2d
  • 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: reboot kubestage1002 for kernel upgrade - T262527
  • 08:24 godog: graphite add 300G to /srv
  • 07:55 jayme: draining kubestage1002 for kernel upgrade - T262527
  • 07:55 jayme: cordoning kubestage1002 for kernel upgrade - T262527
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
  • 06:55 hashar: Taking a heap dump of Gerrit JVM
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenace', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
  • 05:46 marostegui: Stop mysql on db1131 - T262901
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight T261717', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 T261717', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
  • 04:53 marostegui: Deploy schema change on s1 eqiad primary master - T238966
  • 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
  • 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138

2020-09-16

  • 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: T262970 (duration: 01m 06s)
  • 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module (T258008); Revert wider task card on desktop (T263042, T258704); Fix width of sidebar modules in narrow mode in variant A (T263068) (duration: 01m 09s)
  • 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
  • 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki (T262207) (duration: 01m 04s)
  • 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: Check $coords matched some nodes before comparing contents (T263034) (duration: 01m 06s)
  • 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
  • 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
  • 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
  • 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
  • 16:12 ryankemper: `wdqs` deploy complete, service is healthy
  • 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (ganeti VM, so it is a manual work)
  • 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
  • 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
  • 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060) (production no-op) (duration: 01m 04s)
  • 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
  • 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
  • 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060) (duration: 01m 05s)
  • 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060) (duration: 01m 02s)
  • 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060) (duration: 01m 06s)
  • 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
  • 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
  • 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up (T262733)
  • 13:48 marostegui: Start mysql on db1121 after PDU work
  • 13:46 James_F: Restarting CI Jenkins for T262827
  • 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
  • 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
  • 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
  • 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 12:49 cmjohnson1: start pdu swap in racks c6, c7 and d8
  • 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
  • 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
  • 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
  • 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
  • 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: T263014 Revert "Remove support for (Archived|OldLocal)File::userCan without a user" (duration: 01m 04s)
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 T261717', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
  • 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
  • 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
  • 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia T263006
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
  • 10:01 akosiaris: T187984 Shutdown mendelevium.
  • 09:43 jynus: deploying max_allowed_packet change to m3 instances, too
  • 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
  • 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
  • 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
  • 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - T262290
  • 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - T262290
  • 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457
  • 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - T262290
  • 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
  • 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
  • 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for T262628 (duration: 00m 59s)
  • 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - T262290
  • 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - T262290
  • 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
  • 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - T262290
  • 08:04 akosiaris: T187984 Validated that ticket.wikimedia.org works, proceeding with a wider announcement
  • 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:49 akosiaris: T187984 Switch over ticket.discovery.wmnet to otrs1001
  • 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - T262290
  • 07:37 akosiaris: T187984 Tested inbound email successfully
  • 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:26 akosiaris: T187984 Tested outbound email, switching inbound email configuration and performing tests
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
  • 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:12 akosiaris: T187984 Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
  • 07:03 akosiaris: T187984 validated that the OTRS installation is functional over SSH
  • 07:02 akosiaris: T187984 migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
  • 06:28 godog: codfw-prod: bump weight for ms-be2057 - T261633
  • 06:20 kart_: Updated cxserver to 2020-08-30-011854-production (T253439, T260557)
  • 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - T262290
  • 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
  • 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 T261717', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
  • 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - T262290
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
  • 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - T262290
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl T261717', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
  • 05:07 marostegui: Repool labsdb1010
  • 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts

2020-09-15

  • 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 1c0b0d1: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 56s)
  • 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: 5beace3: Fix APCOND_FR_NEVERBLOCKED handling (T262970) (duration: 00m 58s)
  • 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: ac8bd38: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
  • 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 62b21d5: Revert "Remove abusefilter-view right grant from wmf-config" (T255506) (duration: 00m 59s)
  • 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
  • 18:32 Urbanecm: Morning B&C done
  • 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 084729b: Remove abusefilter-view right grant from wmf-config (T255506) (duration: 00m 56s)
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1d34565: Enable MediaWiki client errors on frwiki (T255585) (duration: 00m 57s)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 79004b7: Enable the reverted tag on all wikis (T164307) (duration: 00m 56s)
  • 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: If727ae4335 (duration: 00m 56s)
  • 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
  • 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
  • 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
  • 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
  • 14:53 godog: switch grafana to eqiad - T259143
  • 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:38 XioNoX: remove old SNMP community from all network devices
  • 14:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 56s)
  • 14:21 otto@deploy1001: sync-file aborted: wgEventStreams: Set canary_events_enabled: true for eventlogging_TemplateWizard - T251609 (duration: 00m 06s)
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:01 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:51 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:50 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:18 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:14 cmjohnson1: beginning work inside racks c2, c3, c4 and c5 eqiad
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 from vslow, s8, add db1092 temporarily', diff saved to https://phabricator.wikimedia.org/P12589 and previous config saved to /var/cache/conftool/dbconfig/20200915-121849-marostegui.json
  • 12:18 jbond42: update libxml2 on stretch and jessie
  • 12:08 jbond42: rolling restart of php7.2-fpm
  • 12:05 elukey: roll restart cassandra on aqs* to pick up openjdk upgrades
  • 12:05 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart
  • 11:44 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 294931f: Revert "Disable DynamicPageList on ruwikinews" (T262240; T262391) (duration: 00m 58s)
  • 11:17 effie: roll out scap 3.15.0-1 to all - T261234
  • 11:12 XioNoX: mass update SCS SNMP community in LibreNMS - T246890
  • 10:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:56 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:54 XioNoX: mass update PDU SNMP community in LibreNMS - T246890
  • 10:48 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 10:36 moritzm: uploaded libxml2 2.9.1+dfsg1-5+deb8u8+wmf1 for jessie-wikimedia
  • 10:33 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 10:22 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "testwikiswikis to 1.36.0-wmf.9"
  • 10:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 09:22 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:08 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 09:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 09:04 gehel: restart elasticsearch on elastic2029 (high GC)
  • 09:01 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 08:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:53 elukey: roll restart druid zookeeper clusters for openjdk upgrades
  • 08:53 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:13 marostegui: Stop MySQL on labsdb1010 for PDU maintenance T261456
  • 08:05 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_498180604" --store-class=LCStoreCDB --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 11m 10s)
  • 08:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
  • 08:01 akosiaris: T187984 migration script on otrs1001 proceeding as expected. Still in step 31/44, but that's what we saw in the test migration
  • 07:54 liw@deploy1001: Started scap: testwikis to 1.36.0-wmf.9
  • 07:24 godog: swift codfw add ms-be2057 at object weight 100 - T261633
  • 07:19 elukey: roll restart druid cluster to pick up openjdk updates
  • 07:19 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
  • 07:16 XioNoX: pre-configure SGIX port on cr2-eqsin
  • 06:57 liw: 1.36.0-wmf.9 was branched at 7269b6b for T257977
  • 06:08 marostegui: Stop mysql on es2011 to clone es2028
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 to clone es2028', diff saved to https://phabricator.wikimedia.org/P12585 and previous config saved to /var/cache/conftool/dbconfig/20200915-060623-marostegui.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2012 as es1 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12584 and previous config saved to /var/cache/conftool/dbconfig/20200915-060508-marostegui.json
  • 05:33 marostegui: Depool labsdb1010 for PDU maintenance
  • 05:10 marostegui: Restart sanitarium hosts on eqiad and codfw T262832

2020-09-14

  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 22:49 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 22:45 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 21:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:32 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:30 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'enable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:24 cdanis: T257527 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕠🍺 sudo cumin 'R:Class ~ "(?i)profile::logstash::collector7"' 'disable-puppet "cdanis rolling out Ifa3c68e4"'
  • 21:05 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:03 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:26 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a588eb0c6 T262087 modify wgEventStreams to reference NEL schema (duration: 00m 56s)
  • 19:00 Urbanecm: Morning B&C done
  • 18:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a5d56ed: e2f4798: Enable Special:Investigate on eswiki (T262436) (duration: 00m 56s)
  • 18:49 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:47 volans@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:38 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 7d19393: Remove investigate from $wgAvailableRights (T260175) (duration: 00m 56s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d2fa653: Remove the investigate right from testwiki and frwiki (T260175) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/EventStreamConfig/includes/: a4c8608: Default to using API json formatversion=2 (T251609) (duration: 00m 57s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 27ba5a1: add new parse* servers to $wgLinterSubmitterWhitelist (T247441) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: 720e6cb: flaggedrevs: Move setting of wgFlaggedRevsAutopromote and wgFlaggedRevsAutoconfirm out of wgExtensionFunctions (T237191) (duration: 00m 56s)
  • 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 55s)
  • 18:08 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 699f5e8: Add logo Wordmark and Tagline for hywiki (T259985) (duration: 00m 56s)
  • 17:51 mutante: all new parse* parsoid hardware pooled now and set to active in netbox; the deploy in 10 min will add them to $wgLinterSubmitterWhitelist (T247441)
  • 17:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 17:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 17:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2002.codfw.wmnet
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:36 mutante: pooled the first of the new parsoid servers - parse2001 (T247441)
  • 16:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 16:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 16:04 elukey: completed the rollout of restrictive kafka ferm rules on the Kafka jumbo cluster
  • 16:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse200[0-9].codfw.wmnet
  • 16:01 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[0-2][0-9].codfw.wmnet
  • 15:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid,name=parse2001.codfw.wmnet
  • 15:58 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,name=parse20[1-2][0-9].codfw.wmnet
  • 15:54 moritzm: restarting apache on webperf* to pick up GNU TLS security update
  • 15:45 moritzm: restarting apache/FPM on mw2271/mw2272 (codfw canaries) to pick up GNU TLS update
  • 15:35 moritzm: installing gnutls28 security updates on stretch
  • 15:23 elukey: enable stricter ferm rules on kafka-jumbo1007 and kafka-jumbo1005
  • 15:17 cicalese@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Allow public access to API Portal main page for private launch (duration: 00m 57s)
  • 15:17 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 cmjohnson1: completed pdu swap in eqiad racks d5/d6
  • 14:55 elukey: ferm rules added to kafka-jumbo1009, 1006 and 1008 so far
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:24 milimetric@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:16 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 14:14 milimetric@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:11 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 14:09 milimetric@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 moritzm: installing dbus security updates on stretch
  • 13:42 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:32 moritzm: installing websockify stretch updates
  • 13:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 12:51 cmjohnson1: correction: it's replacing the pdus in racks d5 and d6
  • 12:50 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1438 --new-data-type external-id (T262198)
  • 12:49 cmjohnson1: replacing pdus in racks d4 and d5 eqiad
  • 12:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:32 ayounsi@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-snmp (exit_code=1)
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.pdus.rotate-snmp
  • 12:30 XioNoX: rotate SNMP community on all the PDUs - T246890
  • 12:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 12:24 moritzm: rebooting sodium for kernel update
  • 12:09 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:08 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 12:06 akosiaris: T187984 migration script on otrs1001 now in step 31/44
  • 12:03 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fea8861: Follow-up 0ee0d8f: [frwiktionary] Create `conj` alias (T262298) (duration: 00m 56s)
  • 11:50 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:48 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:48 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:46 volans@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:45 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:41 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:41 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:40 volans@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:39 volans@cumin1001: START - Cookbook sre.ganeti.makevm
  • 11:36 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:35 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:27 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for MCR', diff saved to https://phabricator.wikimedia.org/P12578 and previous config saved to /var/cache/conftool/dbconfig/20200914-112648-marostegui.json
  • 11:24 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:20 marostegui: Remove triggers from db1124:3311 - T238966
  • 11:19 marostegui: Deploy MCR schema change on s1, this will generate lag on s1 labsdb - T238966
  • 11:13 Urbanecm: EU B&C window done
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 47fe87c: [itwiki] Increase $wgAutoConfirmAge and $wgAutoConfirmCount (T262738) (duration: 00m 56s)
  • 11:09 marostegui: Stop MySQL on s5 and s8 eqiad primary master - lag will show up on labsdb hosts T261455
  • 11:05 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262298 # P12576
  • 11:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0ee0d8f: [frwiktionary] Create new namespace "Conjugaison" & associated talk (T262298) (duration: 00m 56s)
  • 11:00 volans: Mass importing IPs from PuppetDB into Netbox T244153
  • 10:59 XioNoX: create LACP bundle to labtestvirt2003
  • 10:50 jbond42: enable git protocol version 2 fleet-wide
  • 10:43 effie: deploy scap 3.15.0-1 to canaries - T261234
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 09:27 akosiaris: T187984 migration script on otrs1001 now in step 8/44 (correction)
  • 09:26 akosiaris: T187984 migration script on otrs1001 now in step 8/41
  • 09:09 akosiaris: db1077. stop slave ; show slave status > /home/akosiaris/show_slave_status; reset slave all T187984
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2026 on es2 T261717', diff saved to https://phabricator.wikimedia.org/P12575 and previous config saved to /var/cache/conftool/dbconfig/20200914-085842-marostegui.json
  • 08:49 akosiaris: start the OTRS upgrade to 6.0.29 T187984
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12574 and previous config saved to /var/cache/conftool/dbconfig/20200914-084509-marostegui.json
  • 08:42 moritzm: upgrading remaining stretch systems to git 2.20 T262244
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12573 and previous config saved to /var/cache/conftool/dbconfig/20200914-083525-marostegui.json
  • 08:17 _joe_: restarting pybal on lvs2009
  • 08:16 _joe_: repooling mw2297
  • 08:14 _joe_: restarting php on mw2297, php-fpm stuck in SIGILL
  • 08:14 marostegui: Stop MySQL on db2125 for on-site maintenance - T260670
  • 08:12 _joe_: restarting pybal on lvs2010
  • 08:09 _joe_: restarting pybal on lvs1015
  • 08:05 godog: prometheus codfw ops, extend the lv by 100G
  • 08:04 marostegui: Stop MySQL on es2017 to clone es2027
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 to clone es2027 - T261717', diff saved to https://phabricator.wikimedia.org/P12572 and previous config saved to /var/cache/conftool/dbconfig/20200914-080344-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2018 as es3 codfw master T261717', diff saved to https://phabricator.wikimedia.org/P12571 and previous config saved to /var/cache/conftool/dbconfig/20200914-080239-marostegui.json
  • 07:58 _joe_: restarting pybal on lvs1015
  • 07:52 _joe_: restarting pybal on lvs1016
  • 07:40 jayme: shutting down etcd100[1-3] (scheduled for decommission, replaced by kubetcd100[4-6])
  • 07:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 with more weight T261717', diff saved to https://phabricator.wikimedia.org/P12570 and previous config saved to /var/cache/conftool/dbconfig/20200914-073919-marostegui.json
  • 06:56 elukey: slowly rollout ferm rules on Kafka-Jumbo hosts (see https://gerrit.wikimedia.org/r/611168)
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 05:54 elukey: execute "gnt-instance modify -B vcpus=4 an-tool1009.eqiad.wmnet" on ganeti1011 - T258768
  • 05:54 marostegui: Truncate tendril.general_log_sampled on db1115 - T262782
  • 05:47 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2026 on es2 for the first time with minimum weight T261717', diff saved to https://phabricator.wikimedia.org/P12569 and previous config saved to /var/cache/conftool/dbconfig/20200914-053844-marostegui.json

2020-09-13

  • 23:47 Urbanecm: Change email address of User:Oversight@enwiki to oversight-l@lists.wikimedia.org as part of OTRS downtime preparation (T262733)
  • 05:51 effie: sudo -i depool mw2297

2020-09-12

  • 01:07 mutante: people2001 - rsyncing user home dirs from people1002
  • 00:38 mutante: all issues with hosts doing stuff "on every run" have been fixed except one is left: analytics1034

2020-09-11

  • 22:54 mutante: starting people2001 VM
  • 17:30 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:22 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:12 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 12:48 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:47 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:55 jynus: starting snapshot of m2 from db1117
  • 08:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 07:59 XioNoX: remove BGP to AS64271 in AMS-IX (see peering@ email)
  • 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 07:17 moritzm: rebooting ldap-corp server for kernel update
  • 07:02 moritzm: remove git-core from stretch systems, it's a transition package no longer provided by the 2.20 backport from Buster
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:54 mutante: downtimed parse* hosts for 48h; they are not in production yet but are getting Icinga checks from the applied role
  • 01:53 mutante: ACKed alerts for eqiad power switches after making T262629
  • 01:53 mutante: initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:45 mutante: mw2296 - restarted php7.2-fpm
  • 01:42 mutante: mw2296 - systemctl restart apache2 - rescheduled icinga alerts for apache and php-fpm
  • 01:33 mutante: initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP (T247441)
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix (duration: 00m 07s)
  • 01:32 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20] (thin): Simple hql syntax fix
  • 01:32 milimetric@deploy1001: Finished deploy [analytics/refinery@6057f20]: Simple hql syntax fix (duration: 08m 09s)
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:24 milimetric@deploy1001: Started deploy [analytics/refinery@6057f20]: Simple hql syntax fix
  • 00:41 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca] (duration: 00m 08s)
  • 00:41 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca] (thin): Regular analytics weekly train THIN [analytics/refinery@7f5a6ca]
  • 00:40 milimetric@deploy1001: Finished deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca] (duration: 08m 25s)
  • 00:38 mutante: generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 (T247441)
  • 00:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 milimetric@deploy1001: Started deploy [analytics/refinery@7f5a6ca]: Regular analytics weekly train [analytics/refinery@7f5a6ca]
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:01 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-09-10

  • 23:44 ejegg: updated payments-wiki from e41ab173e0 to 3c073a6a56
  • 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:43 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 22:33 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 22:31 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 22:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:11 ejegg: updated payments-wiki from be81063168 to e41ab173e0
  • 22:06 mutante: added mcrouter cert for parse2020, ran mcrouter_generate_certs
  • 21:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 21:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.8
  • 20:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:20 longma: correction: T257976 - 1.36.0-wmf.8 to all wikis
  • 20:20 longma: deploying 1.36.0-wmf.8 to all wikis
  • 20:02 krinkle@deploy1001: Synchronized php-1.36.0-wmf.8/includes/resourceloader/ResourceLoaderSkinModule.php: Ibe2c9f8d024f6 (duration: 01m 05s)
  • 19:44 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:12 mholloway-shell@deploy1001: Started restart [recommendation-api/deploy@db7fd80]: (no justification provided)
  • 19:07 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwiktionary --previous-collation=uppercase # T262163
  • 19:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d2b57: Set $wgCategoryCollation = uca-tr on trwiktionary (T262163) (duration: 01m 05s)
  • 18:58 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=frwiktionary --fix # T262398
  • 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 09e487e: Add a new namespace to frwiktionary (T262398) (duration: 01m 04s)
  • 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 03s)
  • 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/includes/EditPage.php: 8240944: EditPage: Fix member call on boolean when undo is impossible (T262463) (duration: 01m 07s)
  • 18:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 0cde0b1: Add throttle rule for Czech senior citizens course (T262415) (duration: 01m 05s)
  • 18:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:00 mutante: helium (former backup host) is being removed from ferm rules on all hosts, it was replaced by backup1001 (T260717)
  • 17:33 bblack: dns servers: upgrading remainder of fleet to gdnsd-3.3.0-1~wmf1
  • 16:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:25 bblack: authdns1001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:06 bblack: dns4001 - upgrade gdnsd to 3.3.0-1~wmf1
  • 16:04 bblack: reprepro: uploaded gdnsd-3.3.0-1~wmf1 - T261340
  • 15:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 14:04 volans: uploaded cumin_4.0.0 to apt.wikimedia.org buster-wikimedia (no code changes)
  • 13:58 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:52 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 13:42 moritzm: rebooting etherpad1002 (etherpad.wikimedia.org) for kernel update
  • 13:24 moritzm: installing rake security updates on stretch
  • 13:10 ebernhardson: delete lldwiki_{content|general} indices from search.svc.{eqiad|codfw}.wmnet:9643 (psi), they should be on 9443 (omega)
  • 12:57 klausman: Ran puppet-merge to get my dotfiles from https://gerrit.wikimedia.org/r/c/operations/puppet/+/626367 out
  • 12:34 moritzm: installing firejail updates on maps/thumbor/restbase
  • 12:01 moritzm: upgrading deployment servers to git 2.20 T262244
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P12557 and previous config saved to /var/cache/conftool/dbconfig/20200910-113758-marostegui.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317, db1101:3318', diff saved to https://phabricator.wikimedia.org/P12556 and previous config saved to /var/cache/conftool/dbconfig/20200910-113426-marostegui.json
  • 11:13 matthiasmullie: Euro B&C done
  • 11:13 moritzm: uploaded git 2.20.1-2+deb10u3~wmf1 to stretch-wikimedia/main T262244
  • 11:11 mlitn@deploy1001: Synchronized php-1.36.0-wmf.8//extensions/WikimediaEvents/: WikimediaEvents: Enable MediaSearch A/B test (duration: 01m 06s)
  • 10:42 duesen_: daniel@mwmaint2001:~$ mwscript maintenance/findBadBlobs.php jvwiki --revisions 214173 --mark T262457
  • 10:34 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:32 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 XioNoX: move VRRP master to cr2-esams
  • 10:21 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:45 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:43 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12555 and previous config saved to /var/cache/conftool/dbconfig/20200910-093106-marostegui.json
  • 09:26 dcausse: creating missing cirrus indices for jawikivoyage T262518
  • 09:24 dcausse: creating missing cirrus indices for jawikivoyage T260228
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12554 and previous config saved to /var/cache/conftool/dbconfig/20200910-091335-marostegui.json
  • 08:49 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 08:47 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12551 and previous config saved to /var/cache/conftool/dbconfig/20200910-082304-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2014 after cloning es2026', diff saved to https://phabricator.wikimedia.org/P12550 and previous config saved to /var/cache/conftool/dbconfig/20200910-073107-marostegui.json
  • 07:03 elukey: resize search-loader vms (+4 vcores +4GB of ram) on Ganeti - T262385
  • 05:29 marostegui: Deploy schema change on s3 master - T260476
  • 00:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master (duration: 06m 42s)
  • 00:24 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@00b0e20]: Update to current master
  • 00:23 twentyafterfour: done. Phabricator update complete
  • 00:23 twentyafterfour: applying database migrations to phabricator db
  • 00:09 twentyafterfour: deploying phabricator update 2020-09-10 https://phabricator.wikimedia.org/project/view/4755/

2020-09-09

  • 23:51 dpifke@deploy1001: Finished deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915 (duration: 00m 05s)
  • 23:51 dpifke@deploy1001: Started deploy [performance/arc-lamp@55fccc6]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/622915
  • 23:37 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/CirrusSearch/includes/Search/InterleavedResultSet.php: Repair passing interleaved search metrics from backend to frontend (duration: 01m 04s)
  • 20:13 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:625914 (duration: 01m 03s)
  • 20:03 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:626190 T261425 (duration: 01m 03s)
  • 20:01 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.8/skins/WikimediaApiPortal: Backport gerrit:626044, T261425 (duration: 01m 12s)
  • 19:11 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.8 (duration: 01m 03s)
  • 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.8
  • 18:19 _joe_: banning urls ^/api/rest_v1/page/mobile-html-offline-resources/ from varnish caches
  • 18:19 Urbanecm: Morning B&C window done
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:17 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b226330: Enable $wgAllowCrossOrigin on all wikis (T262425) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 18:13 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 85e36ae: Enable MediaWiki client errors on commonswiki and metawiki (T255585) (duration: 01m 06s)
  • 18:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 18:02 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 02m 55s)
  • 17:59 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout (duration: 06m 47s)
  • 17:52 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, feed timeout
  • 17:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2 (duration: 09m 38s)
  • 17:42 ppchelko@deploy1001: Started deploy [restbase/deploy@b90472d]: Require mobile-html 1.2.2 T262437, take 2
  • 17:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437 (duration: 06m 00s)
  • 17:35 ppchelko@deploy1001: Started deploy [restbase/deploy@dc3b955]: Require mobile-html 1.2.2 T262437
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:24 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:22 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:18 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:15 marostegui: Stop mysql on db2125 for on-site maintenance T260670
  • 16:10 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3] (duration: 00m 11s)
  • 16:10 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 3]
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:10 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 16:06 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 16:06 bd808: scap3 of Striker to labweb1001 failing. Will investigate.
  • 16:05 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2] (duration: 00m 11s)
  • 16:05 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) [take 2]
  • 16:04 bd808@deploy1001: Finished deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111) (duration: 01m 21s)
  • 16:03 bd808@deploy1001: Started deploy [striker/deploy@e120c6c]: Deploying r20200909 tag (T262323, T144111)
  • 15:54 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:26 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:11 herron: prometheus1003: systemctl restart thanos-sidecar@ops.service
  • 14:29 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:22 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:02 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:00 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:57 marostegui: Restart mysql on db1115 T231769
  • 13:54 bblack: deployed https://gerrit.wikimedia.org/r/626153
  • 12:47 _joe_: restarting php-fpm on wtp2003
  • 12:46 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:37 cmjohnson1: beginning scheduled PDU maintenance racks D5 and D6 in eqiad
  • 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12545 and previous config saved to /var/cache/conftool/dbconfig/20200909-123634-kormat.json
  • 12:31 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12544 and previous config saved to /var/cache/conftool/dbconfig/20200909-123109-kormat.json
  • 12:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 moritzm: installing zeromq security updates on Buster
  • 12:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:37 awight: EU Bacon complete
  • 11:34 awight@deploy1001: Synchronized wmf-config: Config: api-portal: required extended configuration (T261425) (duration: 01m 08s)
  • 11:15 moritzm: added Tobias Klausmann to pwstore
  • 11:14 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:03 marostegui: Stop MySQL on s2 eqiad master to prepare for the PDU maintenance (this will generate lag on s2 on labsdb) T261453
  • 10:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:28 volans: restarting ferm on failed hosts: an-test-master1001.eqiad.wmnet,an-worker1116.eqiad.wmnet,db[1075,1101,1116].eqiad.wmnet,labstore1007.wikimedia.org,logstash[1025,1030].eqiad.wmnet, left over from yesterday's network issue
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:11 klausman: Rebooting stat1005 to clear GPU status and test the new DKMS driver (T260442)
  • 10:09 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12542 and previous config saved to /var/cache/conftool/dbconfig/20200909-100157-kormat.json
  • 09:52 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12541 and previous config saved to /var/cache/conftool/dbconfig/20200909-095219-kormat.json
  • 09:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12540 and previous config saved to /var/cache/conftool/dbconfig/20200909-093353-kormat.json
  • 09:26 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12539 and previous config saved to /var/cache/conftool/dbconfig/20200909-092621-kormat.json
  • 09:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:11 moritzm: installing qemu security updates on Buster
  • 09:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 08:53 _joe_: restarting restbase on rb2009 (depooled)
  • 08:53 godog: upgrade kibana to 7.9.1 on the logstash7 cluster
  • 08:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12538 and previous config saved to /var/cache/conftool/dbconfig/20200909-085147-kormat.json
  • 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12537 and previous config saved to /var/cache/conftool/dbconfig/20200909-084433-kormat.json
  • 08:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:40 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12536 and previous config saved to /var/cache/conftool/dbconfig/20200909-083616-kormat.json
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 08:34 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:30 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12535 and previous config saved to /var/cache/conftool/dbconfig/20200909-083038-kormat.json
  • 08:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable DynamicPageList on ruwikinews (T262240) (duration: 01m 22s)
  • 07:25 elukey: restart varnishkafka-webrequest on cp5010 and cp5012, delivery reports errors happening since yesterday's network outage
  • 06:21 XioNoX: push new pfw policies - T262297
  • 01:58 eileen: civicrm revision changed from 4e40a59d42 to cc1f7e6d13, config revision is 4845a229dc

2020-09-08

  • 23:47 eileen: civicrm revision is 4e40a59d42, config revision is d26334fa36
  • 23:25 eileen: civicrm revision changed from 5e7352e2c3 to 4e40a59d42, config revision is 3cf0913789
  • 22:14 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:12 andrew@deploy1001: Finished deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update (duration: 03m 35s)
  • 22:08 andrew@deploy1001: Started deploy [horizon/deploy@7d727eb]: very minor wmf-puppet-dashboard update
  • 22:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 21:57 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks (duration: 00m 13s)
  • 21:57 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: refreshing to clobber local hacks
  • 19:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.8
  • 19:12 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.8 (duration: 71m 45s)
  • 18:22 elukey: rm /srv/prometheus/ops/targets/mjolnir_msearch_eqiad.yaml on prometheus100[3,4] as cleanup after https://gerrit.wikimedia.org/r/621988 - T260305
  • 18:00 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.8
  • 17:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 17:54 Amir1: Deployed patch for T262240
  • 17:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:23 andrewbogott: rebooting cloudvirt1033
  • 17:03 klausman: attempted to add rock-dkms_3.3-19_all.deb to thirdparty/amd-rocm33 for use on analytics servers with GPUs
  • 16:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Set canary_events_enabled: true for eventgate test streams and eventlogging_Test - T251609 (duration: 00m 58s)
  • 16:34 herron: increased elk5 logstash JVM heaps to 2g (to help decrease kafka-logging consumer lag)
  • 16:12 longma: 1.36.0-wmf.8 was branched at e81e81e for T257976
  • 16:03 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:03 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:02 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:34 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1004.*
  • 15:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes1013.*
  • 15:30 elukey: roll restart of hadoop master daemons on an-master100[1,2] after the cookbook failed
  • 15:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 15:20 _joe_: restarted celery-ores-worker.service on ores1007
  • 15:19 _joe_: restarted ferm on wdqs1011
  • 15:18 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:16 _joe_: starting wdqs-updater on wdqs1005
  • 15:15 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
  • 15:14 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp108[789].eqiad.wmnet
  • 15:14 bblack: repool cp1087-90 (eqiad row D)
  • 15:13 herron: rolling restart of elk5 logstashes
  • 15:10 marostegui: Start mysql on db1106 after PDU maintenance is done
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: service=kubesvc,name=kubernetes1013.*
  • 15:03 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes1004.*
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 4 port 0
  • 15:03 XioNoX: request virtual-chassis vc-port set pic-slot 0 member 2 port 50
  • 15:02 XioNoX: request virtual-chassis vc-port set pic-slot 1 member 1 port 1
  • 14:53 marostegui: Reload dbproxy1016 to recover the alert
  • 14:45 jynus: restarting bacula-dir @ backup1001
  • 14:44 XioNoX: reboot asw2-d3-eqiad
  • 14:33 moritzm: bouncing ferm on hosts where ferm.service failed due to DNS resolution issues for prometheus hosts
  • 14:31 volans: restarted ssh on mc1033 from console
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 1 member 4 port 0
  • 14:16 XioNoX: request virtual-chassis vc-port delete pic-slot 0 member 2 port 50
  • 14:13 akosiaris: drain kubernetes1013, kubernetes1004. They are on row D
  • 14:13 bblack: dns1002 - disable puppet + bird service (stop advertising recdns from row D)
  • 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1090.eqiad.wmnet
  • 13:59 bblack: depooling cp1087-1090
  • 13:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp108[789].eqiad.wmnet
  • 13:57 XioNoX: asw2-d-eqiad> request system reboot member 3
  • 13:35 cmjohnson1: the power cable was not properly seated and lost power to asw2-d3-eqiad
  • 13:34 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 13:30 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:26 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:25 mateusbs17: Restarted puppetdb on deployment-puppetdb03 (T248041)
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:24 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 13:21 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 13:20 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:20 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 13:18 cmjohnson1: swapping PDUs in eqiad; mgmt for racks d3 and d4 will go down
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 13:18 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 13:17 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
  • 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'staging' .
  • 13:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 13:14 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 13:13 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:12 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 13:09 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 13:08 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:04 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 12:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12523 and previous config saved to /var/cache/conftool/dbconfig/20200908-123546-kormat.json
  • 12:34 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12522 and previous config saved to /var/cache/conftool/dbconfig/20200908-122702-kormat.json
  • 12:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12521 and previous config saved to /var/cache/conftool/dbconfig/20200908-121139-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12520 and previous config saved to /var/cache/conftool/dbconfig/20200908-120419-kormat.json
  • 12:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:34 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 11:33 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:18 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:15 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:53 marostegui: Deploy schema change on s3 eqiad master - T253276
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:53 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:20 marostegui: Deploy schema change on s4 eqiad master - T253276
  • 10:14 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:11 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 10:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12519 and previous config saved to /var/cache/conftool/dbconfig/20200908-100852-kormat.json
  • 09:52 akosiaris: enable puppet, run it on all k8s eqiad nodes and double check that calico-node is fine T239835
  • 09:43 akosiaris: stopped calico-node and kube-apiserver on k8s nodes/masters T239835
  • 09:43 marostegui: Stop mysql on es2014 to clone es2026 T261717
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 - T261717', diff saved to https://phabricator.wikimedia.org/P12517 and previous config saved to /var/cache/conftool/dbconfig/20200908-093957-marostegui.json
  • 09:37 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs (#2), T261489"
  • 09:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:28 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12515 and previous config saved to /var/cache/conftool/dbconfig/20200908-092755-kormat.json
  • 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:20 jayme: disabling puppet on argon.eqiad.wmnet,chlorine.eqiad.wmnet,kubernetes[1001-1016].eqiad.wmnet - Reinitialize eqiad k8s cluster with new etcd - T239835
  • 08:55 marostegui: Deploy schema change on s7 eqiad master - T253276
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2127's weight', diff saved to https://phabricator.wikimedia.org/P12514 and previous config saved to /var/cache/conftool/dbconfig/20200908-084834-marostegui.json
  • 08:45 volans: running homer 'cr*eqiad*' commit "Update debmonitor IPs, T261489"
  • 08:23 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 08:22 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 08:21 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=codfw
  • 08:20 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 08:16 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:44 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (T250887; T262242) (duration: 00m 59s)
  • 07:44 elukey: roll restart kafka daemons on kafka-jumbo100[7-9] to pick up openjdk upgrades
  • 07:40 XioNoX: move HE from ix to transit BGP group on cr3-eqsin
  • 07:00 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:58 marostegui: Deploy schema change on s2 eqiad master - T253276
  • 06:58 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P12513 and previous config saved to /var/cache/conftool/dbconfig/20200908-065022-marostegui.json
  • 06:47 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:31 marostegui: Deploy schema change on s5 eqiad master - T253276
  • 06:23 elukey: roll restart of Hadoop master daemons on an-master100[1,2] to pick up new openjdk settings
  • 06:14 marostegui: Stop MySQL on db1106 for PDU maintenance T261452
  • 05:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime

2020-09-07

  • 23:35 Reedy: Deployed patch for T262213
  • 21:19 reedy@deploy1001: Synchronized private/PrivateSettings.php: Remove old mitigation (duration: 00m 55s)
  • 18:04 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12511 and previous config saved to /var/cache/conftool/dbconfig/20200907-153857-kormat.json
  • 15:32 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12510 and previous config saved to /var/cache/conftool/dbconfig/20200907-153206-kormat.json
  • 15:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12509 and previous config saved to /var/cache/conftool/dbconfig/20200907-152117-kormat.json
  • 15:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12508 and previous config saved to /var/cache/conftool/dbconfig/20200907-151718-kormat.json
  • 15:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. T261389', diff saved to https://phabricator.wikimedia.org/P12507 and previous config saved to /var/cache/conftool/dbconfig/20200907-150901-kormat.json
  • 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:03 moritzm: rebooting poolcounter1004/1005
  • 15:03 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for T261389', diff saved to https://phabricator.wikimedia.org/P12506 and previous config saved to /var/cache/conftool/dbconfig/20200907-150310-kormat.json
  • 15:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1133 from dbctl T253217', diff saved to https://phabricator.wikimedia.org/P12504 and previous config saved to /var/cache/conftool/dbconfig/20200907-143507-marostegui.json
  • 14:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:48 _joe_: restarting pybal in codfw to pick up the new mobileapps TLS endpoint
  • 13:44 _joe_: restarting pybal in eqiad to pick up the new mobileapps TLS endpoint
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:28 hashar@deploy1001: Finished deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924 (duration: 00m 05s)
  • 13:27 hashar@deploy1001: Started deploy [integration/docroot@e4e3af9]: Support published documents outside of the git checkout # T149924
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:22 hashar@deploy1001: Finished deploy [integration/docroot@11ab4a0]: (no justification provided) (duration: 00m 10s)
  • 13:22 hashar@deploy1001: Started deploy [integration/docroot@11ab4a0]: (no justification provided)
  • 13:14 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 13:04 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 12:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 12:43 kormat@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 12:42 kormat@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:29 marostegui: Upgrade and reboot db2094 and db2095 (sanitarium hosts in codfw)
  • 12:18 gehel: restarting elasticsearch on elastic2029 (high GC)
  • 12:01 volans: restart uwsgi on debmonitor1002 to test db reconnection
  • 11:58 marostegui: Reboot pc1008 for upgrade
  • 11:36 Urbanecm: EU B&C done
  • 11:30 urbanecm@deploy1001: Synchronized docroot/noc/index.html: bbfe2ce: noc: Remove link to outdated blog (T259978) (duration: 00m 57s)
  • 11:27 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: ff9f104: Update help URL (T256623) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b512d3: [hewiktionary] Enable wikilove (T262181) (duration: 00m 57s)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 2/2) (duration: 00m 57s)
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 35224f4: [eswiki] Create an `abusefilter` user group (T262174; 1/2) (duration: 01m 20s)
  • 11:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewiktionary wikilove # T262181
  • 11:01 marostegui: Reboot pc1007 for upgrade
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:36 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 09:12 dcausse@deploy1001: Finished deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server) (duration: 00m 33s)
  • 09:11 dcausse@deploy1001: Started deploy [wdqs/wdqs@c96b49e]: deploy wdqs-0.3.47 to wdqs1009 (test server)
  • 09:10 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:02 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 08:49 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:29 jayme@deploy2001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:19 marostegui: Upgrade and restart pc1010
  • 08:18 jayme@deploy2001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:10 jayme@deploy2001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 08:03 marostegui: Compress InnoDB on s8 eqiad master (db1109) - T232446
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after MCR schema change', diff saved to https://phabricator.wikimedia.org/P12501 and previous config saved to /var/cache/conftool/dbconfig/20200907-051157-marostegui.json
  • 04:56 marostegui: Compress InnoDB on s1 eqiad master - this will generate a few days of lag on s1 and labsdb for enwiki T254462
  • 04:53 marostegui: Deploy schema change on db1109 (eqiad wikidata master) - T256685

2020-09-06

  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2127's weight a bit', diff saved to https://phabricator.wikimedia.org/P12496 and previous config saved to /var/cache/conftool/dbconfig/20200906-194512-marostegui.json
  • 08:20 elukey: powercycle mw1360 (mgmt console available, network errors while running anything)
  • 08:04 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1360.eqiad.wmnet
  • 08:01 elukey: executed "sudo ipmitool -I lanplus -H mw1360.mgmt.eqiad.wmnet -U root mc reset cold" from cumin (mgmt not available for mw1360)

2020-09-05

  • 00:23 foks: removing 2 files for legal compliance

2020-09-04

  • 22:15 ryankemper: wdqs deploy complete, service is healthy
  • 21:54 ryankemper: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
  • 21:52 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 21:49 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@c7e6b35]: 0.3.47 (duration: 12m 55s)
  • 21:37 ryankemper: Tests on canary `wdqs1003` passing, beginning full wdqs deploy
  • 21:36 ryankemper@deploy1001: Started deploy [wdqs/wdqs@c7e6b35]: 0.3.47
  • 21:31 ryankemper: `ryankemper@wdqs2002:~$ sudo systemctl restart wdqs-blazegraph`
  • 21:06 mutante: apt1001 - removed all libnginx-mod* packages except libnginx-mod-http-echo ; sudo apt-get autoremove ; run puppet ; restarted nginx - apt.wikimedia.org switched to nginx-light (T261962)
  • 21:02 mutante: apt1001 - remove all libnginx-mod* packages except libnginx-mod-http-echo
  • 20:59 mutante: apt2001 - sudo apt-get autoremove
  • 20:51 mutante: apt2001 - apt-get remove --purge libnginx* and run puppet to replace nginx-full with nginx-light (T261962)
  • 20:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:36 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:04 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:22 mutante: Icinga - ACKing with sticky - alerts on test and dev hosts
  • 18:10 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing (duration: 07m 35s)
  • 18:02 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: new editors by country endpoint, low risk so trying on a Friday with SRE blessing
  • 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12492 and previous config saved to /var/cache/conftool/dbconfig/20200904-102955-marostegui.json
  • 10:28 marostegui: Deploy MCR schema change on db1087 (sanitarium master), this will generate lag (probably a few days) on s8 labsdb hosts T238966
  • 09:48 marostegui: Restart prometheus-mysqld-exporter on db2125
  • 09:11 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 08:31 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 08:29 elukey: roll restart of the hadoop workers (test and analytics cluster) for openjdk upgrades
  • 08:08 moritzm: installing 4.19.132 kernel on buster systems (only installing the deb, reboots separately)
  • 07:30 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 05:13 marostegui: Deploy MCR schema change on s4 eqiad master T238966
  • 01:51 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints (duration: 63m 18s)
  • 01:35 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:30 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 01:23 ryankemper: (Following the restart of blazegraph, service has been restored to `wdqs2003`. See https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599182219699&to=1599182547699)
  • 01:16 ryankemper: Glancing at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1599170628749&to=1599182011243, looks like `wdqs2003`'s blazegraph isn't happy based on the null data entries. Restarting blazegraph: `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph`
  • 00:48 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints

2020-09-03

  • 23:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9394739: Start logging log-ins on select wikis (T253802) (duration: 00m 56s)
  • 21:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:55 milimetric@deploy1001: deploy aborted: AQS: Deploying new geoeditors endpoints (duration: 00m 13s)
  • 19:54 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@95d6432]: AQS: Deploying new geoeditors endpoints
  • 19:07 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149] (duration: 00m 08s)
  • 19:07 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d5149]
  • 19:06 milimetric@deploy1001: Finished deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149] (duration: 09m 06s)
  • 18:57 milimetric@deploy1001: Started deploy [analytics/refinery@e4d5149]: Regular analytics weekly train [analytics/refinery@e4d5149]
  • 17:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:46 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 17:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:36 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:28 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:02 papaul: power down ores2009 for DIMM upgrade
  • 16:45 papaul: power down ores2008 for DIMM upgrade
  • 16:33 papaul: power down ores2007 for DIMM upgrade
  • 16:24 elukey: roll restart aqs on aqs1* to pick up new druid settings
  • 16:05 papaul: power down ores2006 for DIMM upgrade
  • 15:51 papaul: power down ores2005 for DIMM upgrade
  • 15:33 papaul: power down ores2004 for DIMM upgrade
  • 15:30 moritzm: installing nginx updates on apt* and htmldumper1001
  • 15:25 moritzm: installing firejail update (along with restarts) on thumbor1001, maps1001, restbase1016 (and -dev)
  • 15:22 papaul: power down ores2003 for DIMM upgrade
  • 15:17 moritzm: installing firejail security updates on parsoid servers
  • 15:08 papaul: power down ores2002 for DIMM upgrade
  • 14:53 papaul: power down ores2001 for DIMM upgrade
  • 14:36 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:30 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:29 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 06s)
  • 14:29 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 14:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:00 marostegui: Failover m5 (wikitech) master - T260324
  • 13:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:53 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 jmm@deploy1001: Finished deploy [debmonitor/deploy@fb64c52]: deploy to new buster host (duration: 00m 18s)
  • 13:43 jmm@deploy1001: Started deploy [debmonitor/deploy@fb64c52]: deploy to new buster host
  • 13:40 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me (duration: 01m 29s)
  • 13:39 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host, now the --force is with me
  • 13:32 jmm@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: deploy to new buster host (duration: 00m 05s)
  • 13:32 jmm@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: deploy to new buster host
  • 13:08 marostegui: Start pre m5 failover steps T260324
  • 12:46 marostegui: Deploy MCR schema change on s7 eqiad master (lag might show up) - T238966
  • 12:30 hnowlan: enabling puppet on appservers, finished rollout of api.wikimedia.org https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'Shift weights in s2 codfw to account for db2125 being down T260670', diff saved to https://phabricator.wikimedia.org/P12485 and previous config saved to /var/cache/conftool/dbconfig/20200903-121916-kormat.json
  • 12:17 moritzm: installing openexr security updates for stretch
  • 12:03 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2125 after hw issue', diff saved to https://phabricator.wikimedia.org/P12483 and previous config saved to /var/cache/conftool/dbconfig/20200903-120304-kormat.json
  • 11:45 moritzm: installing net-snmp security updates on Stretch
  • 11:45 moritzm: installing net-snmp security updates on Buster
  • 11:33 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix | phaste # T260320 # P12481
  • 11:28 moritzm: installing PHP 7.0 security updates
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 04281a0: Add extra namespaces for jawikivoyage (T260320) (duration: 01m 01s)
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: 976d735: Lift IP cap on 2020-09-08 for Senior Citizen Write Wikipedia course - cs.wikipedia (T261882) (duration: 01m 01s)
  • 11:21 gilles@deploy1001: Synchronized static/images/project-logos: T252108 Deploying lossily optimised Wikipedia logos (duration: 01m 20s)
  • 10:50 hnowlan: disabling apache on appservers for rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/623833
  • 10:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:07 XioNoX: re-apply vlan 1118 firewall filter and update OSPF/bootp on cr1/2-eqiad - T261866
  • 09:57 XioNoX: rectification: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 on cr1-eqiad - T261866
  • 09:56 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12480 and previous config saved to /var/cache/conftool/dbconfig/20200903-095510-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12479 and previous config saved to /var/cache/conftool/dbconfig/20200903-095015-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12478 and previous config saved to /var/cache/conftool/dbconfig/20200903-094857-marostegui.json
  • 09:48 XioNoX: move VRRP master from cr1-eqiad:ae2.1118 to cr2-eqiad:xe-3/0/4.1118 - T261866
  • 09:46 XioNoX: move vlan 1118 IPv4 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12477 and previous config saved to /var/cache/conftool/dbconfig/20200903-094435-marostegui.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12476 and previous config saved to /var/cache/conftool/dbconfig/20200903-094043-marostegui.json
  • 09:38 XioNoX: move vlan 1118 IPv6 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3317', diff saved to https://phabricator.wikimedia.org/P12475 and previous config saved to /var/cache/conftool/dbconfig/20200903-093629-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12474 and previous config saved to /var/cache/conftool/dbconfig/20200903-093454-marostegui.json
  • 09:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2087:3316', diff saved to https://phabricator.wikimedia.org/P12473 and previous config saved to /var/cache/conftool/dbconfig/20200903-092549-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 db2087:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12472 and previous config saved to /var/cache/conftool/dbconfig/20200903-092028-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12471 and previous config saved to /var/cache/conftool/dbconfig/20200903-091834-marostegui.json
  • 09:13 XioNoX: rolled back: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2122', diff saved to https://phabricator.wikimedia.org/P12470 and previous config saved to /var/cache/conftool/dbconfig/20200903-090901-marostegui.json
  • 09:06 XioNoX: move vlan 1118 from ae2.1118 to xe-3/0/4.1118 cr2-eqiad - T261866
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P12469 and previous config saved to /var/cache/conftool/dbconfig/20200903-090419-marostegui.json
  • 09:01 XioNoX: force ae2.1118 VRRP master on cr1-eqiad - T261866
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317, db1098:3316', diff saved to https://phabricator.wikimedia.org/P12468 and previous config saved to /var/cache/conftool/dbconfig/20200903-090007-marostegui.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3317', diff saved to https://phabricator.wikimedia.org/P12467 and previous config saved to /var/cache/conftool/dbconfig/20200903-085838-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12466 and previous config saved to /var/cache/conftool/dbconfig/20200903-085708-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12465 and previous config saved to /var/cache/conftool/dbconfig/20200903-084910-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P12464 and previous config saved to /var/cache/conftool/dbconfig/20200903-084836-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3317, db1090:3312', diff saved to https://phabricator.wikimedia.org/P12463 and previous config saved to /var/cache/conftool/dbconfig/20200903-084358-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2122', diff saved to https://phabricator.wikimedia.org/P12462 and previous config saved to /var/cache/conftool/dbconfig/20200903-084147-marostegui.json
  • 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122 T261917', diff saved to https://phabricator.wikimedia.org/P12461 and previous config saved to /var/cache/conftool/dbconfig/20200903-082956-marostegui.json
  • 08:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
  • 08:28 moritzm: rebooting mwmaint1002 for kernel update
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12460 and previous config saved to /var/cache/conftool/dbconfig/20200903-082655-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12459 and previous config saved to /var/cache/conftool/dbconfig/20200903-082034-marostegui.json
  • 08:16 marostegui: Upgrade db1101 (s7 and s8)
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12458 and previous config saved to /var/cache/conftool/dbconfig/20200903-081543-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1101:3317', diff saved to https://phabricator.wikimedia.org/P12457 and previous config saved to /var/cache/conftool/dbconfig/20200903-081503-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12456 and previous config saved to /var/cache/conftool/dbconfig/20200903-081337-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12455 and previous config saved to /var/cache/conftool/dbconfig/20200903-080714-marostegui.json
  • 08:06 marostegui: Upgrade and reboot db1127
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3317', diff saved to https://phabricator.wikimedia.org/P12454 and previous config saved to /var/cache/conftool/dbconfig/20200903-080634-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12453 and previous config saved to /var/cache/conftool/dbconfig/20200903-080024-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12452 and previous config saved to /var/cache/conftool/dbconfig/20200903-075443-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12451 and previous config saved to /var/cache/conftool/dbconfig/20200903-074922-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3317 T261917', diff saved to https://phabricator.wikimedia.org/P12450 and previous config saved to /var/cache/conftool/dbconfig/20200903-074827-marostegui.json
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:45 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:45 marostegui: Upgrade and reboot db1094
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12449 and previous config saved to /var/cache/conftool/dbconfig/20200903-074426-marostegui.json
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 07:38 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12448 and previous config saved to /var/cache/conftool/dbconfig/20200903-073718-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12447 and previous config saved to /var/cache/conftool/dbconfig/20200903-073116-marostegui.json
  • 07:29 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12446 and previous config saved to /var/cache/conftool/dbconfig/20200903-072716-marostegui.json
  • 07:24 hashar: contint2001: restarting CI Jenkins for plugins upgrade
  • 07:19 marostegui: Deploy schema change on s8 eqiad master T237120
  • 07:18 marostegui: Stop slave on s8 eqiad master (lag will appear on s8 eqiad) - T237120
  • 07:02 marostegui: Stop db2100:3317 and db2121 in sync to reload metawiki.content T261869
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T261869', diff saved to https://phabricator.wikimedia.org/P12445 and previous config saved to /var/cache/conftool/dbconfig/20200903-070104-marostegui.json
  • 06:56 hashar: contint2001: restarting CI Jenkins
  • 06:56 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 06:56 _joe_: deployment of mobileapps to pick up changes to envoy config, new helmfile layout
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12444 and previous config saved to /var/cache/conftool/dbconfig/20200903-065105-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12443 and previous config saved to /var/cache/conftool/dbconfig/20200903-064804-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12442 and previous config saved to /var/cache/conftool/dbconfig/20200903-064623-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12441 and previous config saved to /var/cache/conftool/dbconfig/20200903-064334-marostegui.json
  • 06:24 marostegui: Disconnect eqiad -> codfw replication

2020-09-02

  • 22:55 shdubsh: restart rsyslog on centrallog[12]001
  • 22:27 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 22:26 ryankemper: Puppet finished on all external wdqs codfw nodes, nginx automatically reloaded as intended
  • 22:24 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo run-puppet-agent"`
  • 21:48 bd808@deploy1001: Finished deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810) (duration: 01m 34s)
  • 21:46 bd808@deploy1001: Started deploy [striker/deploy@3c2090a]: Deploying r20200902 tag (T198114, T223610, T245804, T144111, T261810)
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart wdqs-blazegraph.service"`
  • 21:10 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal' "sudo systemctl restart nginx.service"`
  • 21:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 ryankemper: Restarted nginx on `wdqs2007`
  • 21:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 ryankemper: restarted blazegraph on `wdqs2001` as well
  • 20:46 ryankemper: `sudo cumin -b10 'P{wdqs2*} and not A:wdqs-test and not A:wdqs-internal and not P{wdqs2001.codfw.wmnet}' "sudo systemctl restart wdqs-blazegraph.service"` (restarted everything but 2001, will restart 2001 next)
  • 20:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 robh: scs-c1-eqiad firmware update complete and back online T238036
  • 19:14 robh: updating firmware on scs-c1-eqiad via T238036
  • 19:14 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert "Update T250887 mitigations" (duration: 00m 32s)
  • 18:58 herron: freeing some disk space on centrallog1001 with 'tune2fs -m 0 /dev/centrallog1001-vg/data'
  • 18:43 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled, ouch, forgot to rebase (duration: 00m 55s)
  • 18:40 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622898 Install OAuthRateLimiter III: Install where enabled (duration: 00m 55s)
  • 18:38 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:37 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-codfw - T261865
  • 18:36 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:622897 Install OAuthRateLimiter extension II: Add flag to IS (duration: 00m 56s)
  • 18:34 ottomata: execute kafka topics --alter --topic codfw.resource_change --partitions 3 and kafka topics --alter --topic eqiad.resource_change --partitions 3 on kafka main-eqiad - T261865
  • 18:33 ppchelko@deploy1001: Synchronized wmf-config/extension-list: (no justification provided) (duration: 00m 54s)
  • 18:32 ottomata: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka jumbo-eqiad (for consistency with main) - T261865
  • 18:28 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Fix parsing localised digits in PHP discussion parser (duration: 00m 56s)
  • 18:19 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/DiscussionTools/: Backport Re-apply new reply API patches (again) (duration: 00m 58s)
  • 17:34 bstorm: re-enabled puppet on labsdb10[09-12]
  • 17:28 bstorm: disabled puppet on labsdb10[09-12]
  • 17:18 herron: restarted elasticsearch on logstash1012
  • 16:39 Pchelolo: creating oauth_ratelimit_client_tier table T258711
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:55 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:32 hnowlan: Temporarily disabling apache for configuration change T246945
  • 15:24 godog: prometheus codfw lvextend --resizefs --size +50G /dev/mapper/vg--ssd-prometheus--k8s
  • 15:19 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 15:18 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:16 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 15:15 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main
  • 15:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:31 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main eqiad - T261865
  • 14:29 elukey: execute kafka topics --alter --topic codfw.resource-purge --partitions 3 and kafka topics --alter --topic eqiad.resource-purge --partitions 3 on kafka-main codfw - T261865
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2120 T261869', diff saved to https://phabricator.wikimedia.org/P12434 and previous config saved to /var/cache/conftool/dbconfig/20200902-141854-marostegui.json
  • 13:05 elukey: run kafka preferred-replica-election on kafka-main codfw
  • 12:07 XioNoX: move vrrp master from cr2-codfw to cr1-codfw
  • 11:52 duesen__: daniel@mwmaint2001:/srv/mediawiki/php-1.36.0-wmf.6$ mwscript findBadBlobs.php testwiki --mark T251778
  • 11:36 Urbanecm: EU B&C done
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 796b4fa: Add title for apiportalwiki (T246945) (duration: 00m 56s)
  • 11:34 Urbanecm: Fetched extra commits to deploy1001's staging dir; the commit messages explain it was an accident, continuing; cc Krinkle
  • 11:31 duesen__: Deployed second security fix for T260485
  • 11:07 XioNoX: repool cr1-eqiad
  • 10:58 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:49 XioNoX: reboot cr1-eqiad:re0 (backup)
  • 10:45 jbond42: install apache updates on buster
  • 10:36 XioNoX: cr1-eqiad:request chassis routing-engine master switch
  • 10:35 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=codfw
  • 10:34 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async
  • 10:32 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main
  • 10:31 jbond42: install apache updates on jessie
  • 10:27 XioNoX: reboot cr1-eqiad:re1 (backup)
  • 10:18 XioNoX: move VRRP master from cr1 to cr2
  • 10:16 XioNoX: drain cr1-eqiad transit/transport/IX
  • 10:13 XioNoX: drain cr1-eqiad-pfw3-eqiad link
  • 10:04 XioNoX: repool cr2-eqiad
  • 09:55 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:46 XioNoX: reboot cr2-eqiad:re0 (backup) - T259621
  • 09:28 XioNoX: cr2-eqiad:request chassis routing-engine master switch - T259621
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:18 XioNoX: reboot cr2-eqiad:re1 (backup) - T259621
  • 09:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:13 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:13 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:12 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:11 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 09:07 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 09:01 elukey: reimage kafka-jumbo1004 to Buster
  • 08:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1128 from s10 - T260324', diff saved to https://phabricator.wikimedia.org/P12432 and previous config saved to /var/cache/conftool/dbconfig/20200902-085705-marostegui.json
  • 08:52 XioNoX: deactivate cr2-eqiad transit/IX - T259621
  • 08:50 XioNoX: drain cr2-eqiad transport links - T259621
  • 08:20 XioNoX: activate Telia BGP in eqiad
  • 07:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 elukey: reimage kafka-jumbo1003 to Buster
  • 07:28 marostegui: Reboot dbstore1003 for kernel upgrade - T261389
  • 07:12 XioNoX: configure cr2-eqiad:ae5 as single LACP link to Telia
  • 07:05 marostegui: Drop unused grants on m5 T261152
  • 07:02 elukey: reboot kafka-jumbo1002 to pick up new kernel settings
  • 07:00 XioNoX: deactivate Telia BGP in eqiad
  • 06:38 elukey: powercycle analytics1059 - CPU soft lockups on multiple CPUs
  • 06:30 elukey: reboot kafka-jumbo1001 to pick up new kernel settings
  • 06:30 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:29 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:21 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'production' .

2020-09-01

  • 22:39 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=sysop_itwiki Pierpao (T261722)
  • 17:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:36 ryankemper: wdqs [canary] rollback complete, tests passing now. Will need to dig into source of failure
  • 17:35 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@7920fbe]: 0.3.46 (duration: 03m 43s)
  • 17:35 ryankemper: `wdqs1003` (the canary instance) is failing tests now, going to roll back
  • 17:32 ryankemper@deploy1001: Started deploy [wdqs/wdqs@7920fbe]: 0.3.46
  • 17:30 ryankemper: Starting wdqs deploy
  • 15:56 chasemp: labsdb* puppet agent --test; sudo /usr/local/sbin/maintain-views --all-databases --table user --replace-all; sudo /usr/local/sbin/maintain-views --all-databases --table user_old --replace-all
  • 15:25 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:28 _joe_: restarting envoy on all eqiad jobrunners
  • 14:22 _joe_: restarted confd on mwmaint1002
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:18 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2083 weight', diff saved to https://phabricator.wikimedia.org/P12429 and previous config saved to /var/cache/conftool/dbconfig/20200901-141521-marostegui.json
  • 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:07 rzl@cumin1001: MediaWiki read-only period ends at: 2020-09-01 14:07:36.305500
  • 14:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:04 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:02 rzl@cumin1001: MediaWiki read-only period starts at: 2020-09-01 14:02:04.851006
  • 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:58 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:58 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:51 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:45 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:44 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 13:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 13:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 10:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:48 XioNoX: reserve cr2-eqiad:xe-3/3/7 for new Telia port
  • 09:38 jayme: systemctl restart docker-reporter-releng-images.service on deneb to clear out alert because of temporary HTTP 504 from debmonitor
  • 09:01 moritzm: installing Java 8 sec updates on contint*
  • 08:51 moritzm: uploaded apache 2.4.10-10+deb8u16+wmf1 for jessie-wikimedia
  • 07:11 moritzm: installing 4.9.228 kernel on stretch systems (only installing the deb, reboots separately)
  • 07:05 moritzm: restarting jenkins on releases1002 to pick up Java security updates
  • 06:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:44 elukey: reimage kafka-jumbo1002 to Buster
  • 06:20 marostegui: Install query killers on db2137:3314 T243373
  • 01:17 chaomodus: updated the pynetbox package to 5.0.7 and uploaded to buster
  • 00:02 mutante: wb2-grrrri was not running and wikibugs had not posted any Gerrit updates for a while
  • 00:01 mutante: restarting wikibugs

2020-08-31

  • 23:38 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final) (duration: 00m 17s)
  • 23:38 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 (final)
  • 23:37 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001 (duration: 01m 12s)
  • 23:36 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox2001
  • 23:36 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001 (duration: 00m 58s)
  • 23:35 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Deploy of 2.8.9 to netbox1001
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2 (duration: 00m 05s)
  • 23:31 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next pt2
  • 23:31 crusnov@deploy1001: Finished deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next (duration: 00m 57s)
  • 23:30 crusnov@deploy1001: Started deploy [netbox/deploy@2fc439e]: Test deploy of 2.8.9 to netbox-next
  • 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable (future) mw-reverted tag for all wikis except testwiki (T254074) (duration: 00m 57s)
  • 21:06 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:00 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:20 ryankemper: `sudo systemctl restart elasticsearch_6@production-search-psi-eqiad.service` on `elastic1052.eqiad.wmnet`
  • 18:38 Urbanecm: Morning B&C done
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 16197aa: Add two domains to wgCopyUploadsDomains for commonswiki (T261562; T261575) (duration: 00m 54s)
  • 18:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb28e9d: itwiki: Assign patrol right to autopatrolled instead of autoconfirmed (T261587) (duration: 00m 53s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: a1b0d6e: b609cd5: CommonSettings.php: limit new Echos `push-subscription-manager` group to Meta-Wiki (T261625) (duration: 00m 54s)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 846c544: wgEventStreams: Stream for MEP-iOS pilot (T260382) (duration: 00m 55s)
  • 17:21 volans: uploaded spicerack_0.0.42 to apt.wikimedia.org buster-wikimedia
  • 15:50 rzl@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
  • 15:49 ejegg: updated payments-wiki from ef7ebd08cb to be81063168
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=99)
  • 15:32 rzl@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 14:58 ema: Traffic: depool eqiad from user traffic T243316
  • 14:38 moritzm: installing rake security updates on stretch
  • 14:33 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 14:21 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:20 rzl@cumin1001: Switching services apertium, termbox, search, api-gateway, ores, sessionstore, eventgate-main, graphoid, eventstreams, wikifeeds, wdqs, parsoid, eventgate-logging-external, wdqs-internal, echostore, mathoid, mobileapps, proton, restbase, kartotherian, recommendation-api, eventgate-analytics-external, restbase-async, citoid, schema, cxserver, eventgate-analytics, zotero: eqiad => codfw
  • 14:20 rzl@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:18 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:13 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:12 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=99)
  • 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 13:41 andrewbogott: dropping many databases from m5, as per T261152
  • 13:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:07 marostegui: Failover m3 (phabricator) proxy from dbproxy1016 to dbproxy1020 - T261459
  • 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:54 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:54 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:53 oblivian@cumin2001: Switching services parsoid: eqiad => codfw
  • 12:53 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:53 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:48 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:45 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:45 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:44 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:44 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:44 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:43 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:37 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 12:14 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 12:14 oblivian@cumin2001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 12:13 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 12:13 oblivian@cumin2001: Switching services restbase-async: eqiad => codfw
  • 12:13 oblivian@cumin2001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 12:10 oblivian@cumin2001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 12:05 oblivian@cumin2001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 11:58 elukey: reimage kafka-jumbo1001 to Buster
  • 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 5d583d9: Disable MediaSearch A/B test (duration: 00m 55s)
  • 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 81f88fd: Enable Signature button on Wikiproject for hywiki (T261550) (duration: 00m 54s)
  • 11:22 jbond42: removing old hiera version 1 and 3 backends
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b74893f: Enable sitenotice on mobile for closed wikis (T261357) (duration: 00m 56s)
  • 11:02 volans: upgraded spicerack to 0.0.41 on cumin hosts
  • 10:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 09:51 elukey: executed /srv/phab/phabricator/bin/remove destroy @klausman on phab1001 (following https://wikitech.wikimedia.org/wiki/Phabricator#Delete_a_user) to clear the inconsistent state of a new account (wrong email address)
  • 08:43 moritzm: installing bind9 security updates on stretch/buster (client-side tools/libs only)
  • 07:53 volans: uploaded spicerack_0.0.41 to apt.wikimedia.org buster-wikimedia
  • 07:30 moritzm: installing squid security updates
  • 07:24 moritzm: installing openexr security updates on buster
  • 07:12 marostegui: Sanitize jawikivoyage on db2094:3325 and db1124:3325 T260482
  • 06:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 elukey: reimage kafka-jumbo1005 to Debian Buster
  • 05:21 marostegui: Reload haproxy on dbproxy1017 and dbproxy1021 to test db1128

2020-08-30

  • 16:13 herron: restarted eqiad v5 logstashes

2020-08-29

  • 18:05 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)
  • 17:45 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T261451)

2020-08-28

  • 21:53 ryankemper: `sudo systemctl reload nginx.service` on `cloudelastic100[5,6].wikimedia.org` to try to resolve certificate warning issues
  • 19:11 andrewbogott: rebooting cloudvirt1006. It's a spare, unused system but showing a bus error and icinga alerts; not worth saving if it needs saving
  • 17:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:39 mutante: shutting down mw2196
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:40 rzl: switchdc live test complete
  • 16:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 16:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 16:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 16:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 16:33 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 16:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-28 16:29:24.432463
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 16:29 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 16:29 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 16:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 16:28 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-28 16:28:07.882663
  • 16:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 16:19 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 16:19 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 16:13 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 16:12 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 16:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 16:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 16:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 16:07 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 16:07 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 16:06 rzl: starting one more live test of the data center switchover automation, no production impact is expected but there will be some SAL noise
  • 14:22 moritzm: installing Java security updates on kafka/main and Logstash(5) clusters
  • 13:35 hashar@deploy1001: Finished deploy [integration/docroot@65ec92c]: noop, sync up for README.md (duration: 00m 07s)
  • 13:35 hashar@deploy1001: Started deploy [integration/docroot@65ec92c]: noop, sync up for README.md
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:07 elukey: stop kafka on kafka-jumbo1006 and reimage to buster
  • 12:56 moritzm: installing debmonitor1002 T261492
  • 12:46 moritzm: installing debmonitor2002 T261492
  • 11:50 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 11:27 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 11:27 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 09:48 jayme: updated helm to 2.16.9-3 on chartmuseum*, contint*, deploy*
  • 09:19 jayme: imported helm_2.16.9-3 to buster-wikimedia, stretch-wikimedia, jessie-wikimedia
  • 08:22 kormat: enabling replication from db2112 to db1083 (s1) T243373
  • 07:41 jynus: restart backup2001,backup1002
  • 07:10 jynus: restart db2139
  • 07:07 marostegui: Warm up parsercache in codfw - T260042
  • 06:47 jynus: restart db2102
  • 06:28 jynus: restart db2100
  • 06:07 jynus: restart db2099
  • 05:50 jynus: restart db2098
  • 00:06 eileen: process-control config revision is dd541a25dc

2020-08-27

  • 23:53 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:51 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:48 eileen: civicrm revision changed from a942537984 to 3d501e71d9, config revision is dd541a25dc
  • 22:54 eileen: civicrm revision changed from 481ab742db to a942537984, config revision is e2ab4d7c1f
  • 22:28 tzatziki: removing one file for legal compliance
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:18 volans: uploaded spicerack_0.0.40-1_amd64.deb to apt.wikimedia.org buster-wikimedia
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:22 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 21:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:10 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw221[0-4].codfw.wmnet
  • 20:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:49 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw220[0-9].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw214[0-7].codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=api_appserver,name=mw213[0-9].codfw.wmnet
  • 20:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Streams for testing MEP-based analytics instruments - T259714 (duration: 00m 55s)
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:57 marxarelli: 1.36.0-wmf.6 promoted to all wikis (T257974). new errors appear to be related to T261345 but are known since 1.36.0-wmf.5
  • 19:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=appserver,name=mw21[8-9][0-9]*.codfw.wmnet
  • 19:41 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.6
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 19:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:16 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating apiportalwiki (T246945)
  • 19:15 urbanecm@deploy1001: Synchronized dblists: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:14 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 19:11 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating apiportalwiki (T246945) (duration: 01m 03s)
  • 18:54 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 00m 08s)
  • 18:54 mforns@deploy1001: Started deploy [analytics/refinery@e85191b] (thin): Regular analytics weekly train THIN [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:53 mforns@deploy1001: Finished deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9] (duration: 10m 01s)
  • 18:43 mforns@deploy1001: Started deploy [analytics/refinery@e85191b]: Regular analytics weekly train [analytics/refinery@e85191bb80c13781b41031340ea318b5f854b6a9]
  • 18:43 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Assign all homepage users to variant A (duration: 01m 03s)
  • 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on ruwiki (T257490) (duration: 01m 03s)
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2250.codfw.wmnet,service=canary
  • 18:17 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=jobrunner,name=mw2249.codfw.wmnet,service=canary
  • 18:16 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:16 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:14 dzahn@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=jobrunner,name=mw1318.eqiad.wmnet
  • 18:07 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw229[1-9].codfw.wmnet,cluster=api_appserver
  • 18:06 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2290.codfw.wmnet,cluster=api_appserver
  • 18:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw22[6-8][0-9].codfw.wmnet,cluster=api_appserver
  • 18:03 Urbanecm: Creating jawikivoyage is done (T260320)
  • 18:02 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
  • 18:02 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[0-9].codfw.wmnet,cluster=api_appserver
  • 18:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating jawikivoyage (T260320) (duration: 01m 02s)
  • 17:59 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw224[4-5].codfw.wmnet,service=canary
  • 17:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[4-5].codfw.wmnet
  • 17:59 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:58 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating jawikivoyage (T260320)
  • 17:57 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[0-3].codfw.wmnet
  • 17:56 urbanecm@deploy1001: Synchronized dblists: Creating jawikivoyage (T260320) (duration: 00m 58s)
  • 17:56 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw221[5-9].codfw.wmnet,service=canary
  • 17:55 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw221[5-9].codfw.wmnet
  • 17:55 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating jawikivoyage (T260320) (duration: 01m 03s)
  • 17:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw221[0-4].codfw.wmnet
  • 17:54 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating jawikivoyage (T260320) (duration: 01m 07s)
  • 17:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw220[1-9].codfw.wmnet
  • 17:52 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw220[1-9].codfw.wmnet
  • 17:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2200.codfw.wmnet
  • 17:50 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2200.codfw.wmnet
  • 17:48 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw214[0-7].codfw.wmnet
  • 17:47 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw213[5-9].codfw.wmnet
  • 17:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw214[0-7].codfw.wmnet
  • 17:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw213[5-9].codfw.wmnet
  • 17:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw23[0-7][0-9].codfw.wmnet
  • 17:31 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[0-7].codfw.wmnet,service=canary
  • 17:30 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw227[0-7].codfw.wmnet
  • 17:29 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:29 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 17:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:17 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw226[8-9].codfw.wmnet
  • 17:13 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw225[4-8].codfw.wmnet
  • 17:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 17:11 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw224[0-2].codfw.wmnet
  • 17:04 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw223[2-9].codfw.wmnet
  • 17:01 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2231.codfw.wmnet
  • 16:59 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2230.codfw.wmnet
  • 16:54 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw222[4-9].codfw.wmnet
  • 16:49 mutante: re-weighted appservers and api appservers in eqiad - hardware type G = weight 25, all other types = weight 30 (T261159)
  • 16:48 mutante: depooling mw2187 - mw2199 - old codfw appservers of type A to be decom'ed, previously weight 10 (T260654)
  • 16:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw219[0-9].codfw.wmnet
  • 16:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw218[7-9].codfw.wmnet
  • 16:35 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1297.eqiad.wmnet
  • 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:21 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[0-5].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw126[1-5].eqiad.wmnet,service=canary
  • 16:14 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw126[1-9].eqiad.wmnet
  • 16:12 elukey: remove some old/stale terms from analytics-in4 on cr1/cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622746, https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622744)
  • 16:09 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[6-9].eqiad.wmnet,service=canary
  • 16:08 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw127[6-9].eqiad.wmnet
  • 16:06 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1290.eqiad.wmnet
  • 16:05 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw128[0-9].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1290.eqiad.wmnet
  • 15:51 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw128[0-9].eqiad.wmnet
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw127[7-9].eqiad.wmnet,service=canary
  • 15:43 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1276.eqiad.wmnet,service=canary
  • 15:41 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw127[6-9].eqiad.wmnet
  • 15:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1297.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1269.eqiad.wmnet
  • 15:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1267.eqiad.wmnet
  • 14:48 moritzm: installing Java security updates on aqs, hadoop and kafka-jumbo
  • 14:44 moritzm: restarting tomcat on idp-test* hosts to pick up Java update
  • 14:42 elukey: add eventgate-related terms to analytics-in4 filter on cr1/cr2-eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/622705)
  • 14:37 moritzm: imported openjdk 8u265-b01-1~deb10u1 to buster-wikimedia (forward port of latest Java 8 security update)
  • 14:31 papaul: replacing msw-c5,c6,c7 and fmsw-c8
  • 13:58 kormat: disabling GTID on pc2007 (pc1), pc2008 (pc2), pc2009 (pc3) T243373
  • 13:56 kormat: disabling GTID on db2096 (x1), es2021 (es4), es2023 (es5) T243373
  • 13:54 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:53 kormat: disabling GTID on db2129 (s6), db2118 (s7), db2079 (s8) T243373
  • 13:52 kormat: disabling GTID on db2123 (s5) T243373
  • 13:52 kormat: disabling GTID on db2090 (s4) T243373
  • 13:51 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:51 kormat: disabling GTID on db2105 (s3) T243373
  • 13:50 kormat: disabling GTID on db2107 (s2) T243373
  • 13:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:29 elukey: restart jvm daemons on analytics1042, aqs1004, kafka-jumbo1001 to pick up new openjdk upgrades (canaries)
  • 13:18 kormat: enabling replication from db2107 to db1122 (s2) T243373
  • 13:14 kormat: enabling replication from db2096 to db1103 (x1) T243373
  • 13:10 jynus: restart db2097
  • 13:07 jbond42: deploy python3.4 security update to kraz
  • 13:03 jbond42: deploy python3.4 security update to canaries on jessie
  • 13:01 kormat: enabling replication from db2118 to db1086 (s7) T243373
  • 12:52 jynus: restart db1140
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T243373', diff saved to https://phabricator.wikimedia.org/P12402 and previous config saved to /var/cache/conftool/dbconfig/20200827-124338-marostegui.json
  • 12:35 jynus: restart db1139
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12401 and previous config saved to /var/cache/conftool/dbconfig/20200827-123028-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T243373', diff saved to https://phabricator.wikimedia.org/P12400 and previous config saved to /var/cache/conftool/dbconfig/20200827-123003-marostegui.json
  • 12:24 marostegui: Fix password format in db2129 (s6 codfw master) T243373
  • 12:14 kormat: enabling replication from db2129 to db1093 (s6) T243373
  • 12:13 jynus: restart db1095
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T243373', diff saved to https://phabricator.wikimedia.org/P12399 and previous config saved to /var/cache/conftool/dbconfig/20200827-120816-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12398 and previous config saved to /var/cache/conftool/dbconfig/20200827-120211-marostegui.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 eqiad weights T243373', diff saved to https://phabricator.wikimedia.org/P12397 and previous config saved to /var/cache/conftool/dbconfig/20200827-115934-marostegui.json
  • 11:56 Urbanecm: Lift range blocks exceeding wgBlockCIDRLimit via custom script from F32197596 (ruwiki, ruwikiquote; T243980)
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s4 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12396 and previous config saved to /var/cache/conftool/dbconfig/20200827-115110-marostegui.json
  • 11:49 moritzm: uploaded python3.4 3.4.2-1+deb8u7+wmf1 for jessie-wikimedia T259102
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 codfw weights T243373', diff saved to https://phabricator.wikimedia.org/P12395 and previous config saved to /var/cache/conftool/dbconfig/20200827-114509-marostegui.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust db2126 weight T243373', diff saved to https://phabricator.wikimedia.org/P12394 and previous config saved to /var/cache/conftool/dbconfig/20200827-112213-marostegui.json
  • 11:12 Urbanecm: EU B&C done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 34994d3: Add $wgTranslateMessageNamespaces[] = NS_MEDIAWIKI; for commonswiki (T131300) (duration: 01m 03s)
  • 10:57 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:56 godog: bounce grafana to apply new settings
  • 10:51 kormat: enabling replication from db2123 to db1100 (s5) T243373
  • 10:48 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:30 kormat: enabling replication from es2023 to es1024 (es5) T243373
  • 10:28 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 kormat: enabling replication from es2021 to es1021 (es4) T243373
  • 10:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 10:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 10:03 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 09:54 moritzm: installing Java security updates on IDP* hosts
  • 09:51 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 09:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:43 elukey: decommissioning vms schema[12]00[12] (replaced previously by schema[12]00[34] buster vms)
  • 09:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:41 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:20 kormat: enabling replication from db2105 to db1123 (s3) T243373
  • 09:15 kormat: enabling replication from db2079 to db1109 (s8) T243373
  • 09:07 kormat: enabling replication from db2090 to db1081 (s4) T243373
  • 08:53 kormat: enabling replication from pc2009 to pc1009 (pc3) T243373
  • 08:44 kormat: enabling replication from pc2008 to pc1008 (pc2) T243373
  • 08:13 marostegui: Enable replication codfw -> eqiad on pc1 T243373
  • 08:01 gehel: manual cleanup of stale wdqs deploy crontab on wdqs1009
  • 07:35 marostegui: Move pc2010 under pc2007 T243373
  • 07:16 moritzm: installing ghostscript security updates on stretch
  • 06:50 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 06:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12392 and previous config saved to /var/cache/conftool/dbconfig/20200827-060652-marostegui.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12391 and previous config saved to /var/cache/conftool/dbconfig/20200827-055815-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079,db1082', diff saved to https://phabricator.wikimedia.org/P12390 and previous config saved to /var/cache/conftool/dbconfig/20200827-055522-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12389 and previous config saved to /var/cache/conftool/dbconfig/20200827-055126-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12388 and previous config saved to /var/cache/conftool/dbconfig/20200827-055104-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12387 and previous config saved to /var/cache/conftool/dbconfig/20200827-054259-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 db1085 db1078', diff saved to https://phabricator.wikimedia.org/P12386 and previous config saved to /var/cache/conftool/dbconfig/20200827-054114-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12385 and previous config saved to /var/cache/conftool/dbconfig/20200827-053814-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12384 and previous config saved to /var/cache/conftool/dbconfig/20200827-053558-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12383 and previous config saved to /var/cache/conftool/dbconfig/20200827-053509-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074, db1085, db1079', diff saved to https://phabricator.wikimedia.org/P12382 and previous config saved to /var/cache/conftool/dbconfig/20200827-053100-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134', diff saved to https://phabricator.wikimedia.org/P12381 and previous config saved to /var/cache/conftool/dbconfig/20200827-052925-marostegui.json
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P12380 and previous config saved to /var/cache/conftool/dbconfig/20200827-052818-marostegui.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074', diff saved to https://phabricator.wikimedia.org/P12379 and previous config saved to /var/cache/conftool/dbconfig/20200827-052413-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P12378 and previous config saved to /var/cache/conftool/dbconfig/20200827-051609-marostegui.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12377 and previous config saved to /var/cache/conftool/dbconfig/20200827-051546-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1134 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12376 and previous config saved to /var/cache/conftool/dbconfig/20200827-050754-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P12375 and previous config saved to /var/cache/conftool/dbconfig/20200827-050727-marostegui.json
  • 04:53 marostegui: Stop db1074 and db2107 in sync to fix drifts on s2 change_tag - T260042
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074', diff saved to https://phabricator.wikimedia.org/P12374 and previous config saved to /var/cache/conftool/dbconfig/20200827-045329-marostegui.json
  • 04:04 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1006.wikimedia.org
  • 04:03 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=cloudelastic1005.wikimedia.org
  • 04:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cloudelastic1005.wikimedia.org
  • 02:03 mutante: shutting down install3001,install4001,install5001 VMs (no OS yet, but please also don't delete, debugging in progress, shutting them down until I continue on T254157)

2020-08-26

  • 23:35 eileen: civicrm revision changed from d2e80f7522 to 481ab742db, config revision is e2ab4d7c1f
  • 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 22:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 22:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 19:51 XioNoX: standardize pfw3-eqiad
  • 19:33 marxarelli: 1.36.0-wmf.6 promoted to group1 (T257974). logs show no new errors
  • 19:24 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.6 (duration: 01m 03s)
  • 19:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.6
  • 18:21 Urbanecm: Morning B&C done
  • 18:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 945b97c: Added import sources for mlwiktionary (T260716) (duration: 01m 05s)
  • 18:12 Urbanecm: Purge Thai and Greek taglines, URLs are at P12372 (T258552)
  • 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 03s)
  • 18:09 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 4009289: Update Thai and Greek taglines (T258552) (duration: 01m 05s)
  • 18:08 herron: upgraded eqiad elk v7 cluster from 7.8.0 to 7.9.0 T234854
  • 18:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:02 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:41 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client side error logging on hewiki (T255585) (duration: 01m 04s)
  • 17:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Documentation-only change; sync for line sanity (duration: 01m 04s)
  • 17:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T254349 Set wgVisualEditorEnableBetaFeature true on wikis that need it (duration: 01m 03s)
  • 15:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:53 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:41 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 15:11 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for MCR change', diff saved to https://phabricator.wikimedia.org/P12371 and previous config saved to /var/cache/conftool/dbconfig/20200826-145612-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12370 and previous config saved to /var/cache/conftool/dbconfig/20200826-145531-marostegui.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12369 and previous config saved to /var/cache/conftool/dbconfig/20200826-144750-marostegui.json
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1002.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema1001.eqiad.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2001.codfw.wmnet
  • 14:45 elukey@puppetmaster1001: conftool action : set/pooled=inactive:weight=0; selector: name=schema2002.codfw.wmnet
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12368 and previous config saved to /var/cache/conftool/dbconfig/20200826-143623-marostegui.json
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2003.codfw.wmnet
  • 14:34 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema2004.codfw.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1004.eqiad.wmnet
  • 14:33 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=schema1003.eqiad.wmnet
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12367 and previous config saved to /var/cache/conftool/dbconfig/20200826-142746-marostegui.json
  • 14:25 jgleeson: updated civicrm from 0f195c6cca to d2e80f7522
  • 14:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 marostegui: Upgrade mysql on db1091 after MCR changes
  • 14:13 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:37 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 100% T261276', diff saved to https://phabricator.wikimedia.org/P12366 and previous config saved to /var/cache/conftool/dbconfig/20200826-133753-kormat.json
  • 13:18 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php dewiki --mark T205936 --revisions - < ~/T205936-dewiki-20050512070000.ids # marking known bad revisions for T205936
  • 13:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 75% T261276', diff saved to https://phabricator.wikimedia.org/P12365 and previous config saved to /var/cache/conftool/dbconfig/20200826-131732-kormat.json
  • 13:16 duesen: daniel@mwmaint1002:/srv/mediawiki/php-1.36.0-wmf.5$ mwscript maintenance/findBadBlobs.php oswiki --mark T205936 --revisions - < ~/T205936-oswiki-20090309200000.ids # marking known bad revisions for T205936
  • 13:07 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 50% T261276', diff saved to https://phabricator.wikimedia.org/P12364 and previous config saved to /var/cache/conftool/dbconfig/20200826-130735-kormat.json
  • 13:06 vgutierrez: serve a synthetic warn page to DHE-RSA-AES128-SHA users - T258405
  • 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 30% T261276', diff saved to https://phabricator.wikimedia.org/P12363 and previous config saved to /var/cache/conftool/dbconfig/20200826-124700-kormat.json
  • 12:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling db1110 @ 20% T261276', diff saved to https://phabricator.wikimedia.org/P12362 and previous config saved to /var/cache/conftool/dbconfig/20200826-122059-kormat.json
  • 12:12 godog: upgrade nagios-nrpe-server to 2.15-2 on jessie hosts - T261198
  • 11:58 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1110 T261276', diff saved to https://phabricator.wikimedia.org/P12361 and previous config saved to /var/cache/conftool/dbconfig/20200826-115850-kormat.json
  • 11:56 mlitn@deploy1001: Synchronized php-1.36.0-wmf.6/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 00s)
  • 11:55 mlitn@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikibaseMediaInfo: MediaSearchQueryBuilder should support keyword only queries (duration: 01m 08s)
  • 11:53 kart_: Finished manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:39 kart_: Started manual run of ContentTranslation/scripts/purge-unpublished-drafts.php script on mwmaint1002 (T261189)
  • 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Enable propagateChangeVisibility for testwikidata, part 2 (duration: 01m 03s)
  • 11:26 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable propagateChangeVisibility for testwikidata, part 1 (duration: 01m 19s)
  • 10:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:00 XioNoX: re-enable IPv6 BGP to Init7 in knams
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 replication broken', diff saved to https://phabricator.wikimedia.org/P12360 and previous config saved to /var/cache/conftool/dbconfig/20200826-084044-marostegui.json
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for MCR change', diff saved to https://phabricator.wikimedia.org/P12358 and previous config saved to /var/cache/conftool/dbconfig/20200826-054557-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12357 and previous config saved to /var/cache/conftool/dbconfig/20200826-054409-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12356 and previous config saved to /var/cache/conftool/dbconfig/20200826-053345-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12355 and previous config saved to /var/cache/conftool/dbconfig/20200826-052355-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114, db1135 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12354 and previous config saved to /var/cache/conftool/dbconfig/20200826-050849-marostegui.json
  • 05:03 marostegui: Update db1135 and db1114 after MCR changes

2020-08-25

  • 21:51 mutante: xhgui1001/xhgui2001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) (T260397)
  • 21:50 mutante: xhgui1001 - Unpacking xhgui (0.12.0-2-wmf1) over (0.9.0-1-wmf1) ...
  • 21:46 mutante: importing xhgui 0.12.0-2-wmf1 to buster-wikimedia APT repo (T260397)
  • 19:40 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 19:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@125cb6d]: test: Add wikidata ttl import
  • 19:15 marxarelli: 1.36.0-wmf.6 promoted to group0 (T257974). no new errors
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.6
  • 19:05 moritzm: installing Java security updates on cloudelastic* hosts
  • 19:02 moritzm: installing Java security updates on elastic* hosts
  • 18:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 17:58 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.6 (duration: 41m 58s)
  • 17:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import (duration: 01m 52s)
  • 17:28 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@bc2f7f1]: test: Add wikidata ttl import
  • 17:17 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.6
  • 17:08 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.4 (duration: 01m 40s)
  • 17:01 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.3 (duration: 19m 12s)
  • 17:01 herron: imported logstash, elasticsearch, and kibana 7.9.0 -oss packages into buster-wikimedia thirdparty/elastic79
  • 16:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import (duration: 00m 49s)
  • 16:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@89b4f74]: test: Add wikidata ttl import
  • 16:21 shdubsh: restart logstash on logstash1007 -- gc duration outlier
  • 16:08 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import (duration: 00m 54s)
  • 16:07 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@ae6dd8d]: test: Add wikidata ttl import
  • 16:00 gehel: repool wdqs1005 - caught up on lag
  • 15:47 elukey: restart mariadb@analytics_meta on db1108 to apply a replication filter (exclude superset_staging database from replication)
  • 15:44 jgleeson: fundraising-tools updated from dcad0bfe75 to 3fe3a23114
  • 15:41 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import (duration: 01m 38s)
  • 15:39 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@cbf2f9d]: Add wikidata ttl import
  • 15:22 liw: testing upcoming Scap release on beta
  • 14:56 moritzm: installing rake security updates on stretch
  • 14:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 14:32 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
  • 14:32 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
  • 14:26 XioNoX: disable IPv6 BGP to Init7 in knams
  • 14:10 andrew@deploy1001: Finished deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538 (duration: 03m 50s)
  • 14:06 andrew@deploy1001: Started deploy [horizon/deploy@7a3221d]: add hostname checking --bug T207538
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for MCR change', diff saved to https://phabricator.wikimedia.org/P12347 and previous config saved to /var/cache/conftool/dbconfig/20200825-135248-marostegui.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'fully repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12346 and previous config saved to /var/cache/conftool/dbconfig/20200825-134736-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12345 and previous config saved to /var/cache/conftool/dbconfig/20200825-133734-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12344 and previous config saved to /var/cache/conftool/dbconfig/20200825-132027-marostegui.json
  • 13:17 moritzm: installing firejail security updates on remaining mw* servers in eqiad
  • 12:56 godog: upgrade nagios-nrpe-server on scb2* and mwlog* - T261198
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 MCR changes', diff saved to https://phabricator.wikimedia.org/P12343 and previous config saved to /var/cache/conftool/dbconfig/20200825-125108-marostegui.json
  • 12:45 marostegui: Update MySQL on db1111 after MCR change
  • 12:39 marostegui: alter table sites on s6, directly on the primary master T260476
  • 12:39 godog: test nagios-nrpe-server with dh 2048 on scb2001 - T261198
  • 12:35 moritzm: imported ceph packages from stretch-backports to component/ceph T256877
  • 12:10 moritzm: installing ruby-json security updates
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 MCR change', diff saved to https://phabricator.wikimedia.org/P12341 and previous config saved to /var/cache/conftool/dbconfig/20200825-120708-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12340 and previous config saved to /var/cache/conftool/dbconfig/20200825-120211-marostegui.json
  • 11:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12339 and previous config saved to /var/cache/conftool/dbconfig/20200825-114938-marostegui.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12338 and previous config saved to /var/cache/conftool/dbconfig/20200825-113758-marostegui.json
  • 11:36 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118 MCR changes', diff saved to https://phabricator.wikimedia.org/P12337 and previous config saved to /var/cache/conftool/dbconfig/20200825-112859-marostegui.json
  • 11:25 marostegui: Upgrade mysql on db1118 after MCR change
  • 11:16 Urbanecm: EU B&C done
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: d869e30: Enable ContentTranslation as a default tool in Assamese and Burmese WPs (T258503; T258505) (duration: 01m 00s)
  • 10:59 moritzm: installing remaining libx11 security updates
  • 10:37 arturo: import all binary packages from tesseract-ocr-lang into stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 10:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:28 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 moritzm: removed fermium.wikimedia.org from debmonitor
  • 09:45 marostegui: Create missing table cx_notification_log on x1 wikishared T261190
  • 08:50 XioNoX: re-activate eqord peering/transit - T259593
  • 08:19 XioNoX: reconfigure eqord to be AS65020 - T259593
  • 08:18 XioNoX: deactivate eqord peering/transit - T259593
  • 07:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=1)
  • 07:13 marostegui: Upgrade MySQL on dbstore1004
  • 07:09 dcausse: depooling wdqs1005 (high lag)
  • 07:04 dcausse: restarting blazegraph on wdqs1005 (T242453)
  • 06:20 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111, db1118 for MCR change', diff saved to https://phabricator.wikimedia.org/P12336 and previous config saved to /var/cache/conftool/dbconfig/20200825-053856-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12335 and previous config saved to /var/cache/conftool/dbconfig/20200825-053801-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12334 and previous config saved to /var/cache/conftool/dbconfig/20200825-052602-marostegui.json
  • 05:21 moritzm: installing Java security updates on relforge*
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12333 and previous config saved to /var/cache/conftool/dbconfig/20200825-051327-marostegui.json
  • 05:11 marostegui: Remove revisions triggers from db2094:3311 T238966
  • 05:10 marostegui: Deploy MCR schema change on s1 codfw, this will create lag on s1 codfw - T238966
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084,db1092 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12332 and previous config saved to /var/cache/conftool/dbconfig/20200825-050451-marostegui.json
  • 04:02 ejegg: updated fundraising python tools from 305f2a4438 to dcad0bfe75
  • 01:49 eileen: civicrm revision changed from ce28723709 to 0f195c6cca, config revision is 96839009f1
  • 01:39 eileen: civicrm revision is ce28723709, config revision is 96839009f1
  • 01:30 eileen: civicrm revision is ce28723709, config revision is 54c8c7abf2
  • 01:17 cdanis: repool esams
  • 01:11 cdanis: T259621 wrong junos version was staged on cr2-esams, abandoning this attempt and putting back in service
  • 01:07 cdanis: cdanis@re0.cr2-esams> request system software add validate re1 /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz
  • 00:56 cdanis: T259621 ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
  • 00:36 cdanis: T259621 cdanis@re1.cr3-esams> request chassis routing-engine master switch
  • 00:30 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost reboot re0
  • 00:24 cdanis: T259621 cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0
  • 00:18 cdanis: T259621 cdanis@re0.cr3-esams> request chassis routing-engine master switch
  • 00:14 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost reboot re1
  • 00:08 cdanis: T259621 cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1

2020-08-24

  • 23:46 cdanis: depool esams T259621
  • 23:16 Urbanecm: Evening B&C window done
  • 23:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 778f710: Alternate configuration mechanism for Parsoid (T241961) (duration: 00m 58s)
  • 22:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:29 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deployed additional mitigations for T257687 (duration: 00m 58s)
  • 20:29 rzl: re-enabled puppet on 'R:File = /etc/nutcracker/nutcracker.yml' T261154
  • 19:25 rzl: disabling puppet on 'R:File = /etc/nutcracker/nutcracker.yml' to swap mc2028 out for mc2037 T261154
  • 18:10 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Increase weight of grants and research namespaces in metawiki search (duration: 00m 58s)
  • 15:20 jynus: shutdown backup2001 T260764
  • 15:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:08 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:04 vgutierrez: rolling restart of ats-tls to disable ECDHE-RSA-AES128-SHA - T258405
  • 14:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 rzl: switchover test complete, puppet re-enabled on cumin1001
  • 14:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:53 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:52 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:48 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:47 godog: powercycle ganeti5002 -- host down and nothing in console
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-24 14:43:35.570234
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:42 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 14:42 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:42 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:41 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-24 14:41:55.754938
  • 14:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:41 dcausse: creating cirrus indices for lldwiki
  • 14:39 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:24 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:22 moritzm: installing libexif security updates on stretch
  • 14:18 rzl: disabling puppet on cumin1001 and starting a test of the DC switchover automation, expect some SAL noise but no production impact
  • 14:08 duesen: Deployed patch for T260485
  • 13:59 marostegui: Stop mysql on db1117:3325 to clone db1128 - T260324
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for MCR change', diff saved to https://phabricator.wikimedia.org/P12327 and previous config saved to /var/cache/conftool/dbconfig/20200824-135538-marostegui.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after MCR change', diff saved to https://phabricator.wikimedia.org/P12326 and previous config saved to /var/cache/conftool/dbconfig/20200824-133032-marostegui.json
  • 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12325 and previous config saved to /var/cache/conftool/dbconfig/20200824-131305-marostegui.json
  • 13:05 moritzm: installing imagemagick security updates on stretch
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12323 and previous config saved to /var/cache/conftool/dbconfig/20200824-130024-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12322 and previous config saved to /var/cache/conftool/dbconfig/20200824-125131-marostegui.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for MCR change', diff saved to https://phabricator.wikimedia.org/P12321 and previous config saved to /var/cache/conftool/dbconfig/20200824-122848-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 after MCR change', diff saved to https://phabricator.wikimedia.org/P12320 and previous config saved to /var/cache/conftool/dbconfig/20200824-122752-marostegui.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12319 and previous config saved to /var/cache/conftool/dbconfig/20200824-122050-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12318 and previous config saved to /var/cache/conftool/dbconfig/20200824-121200-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12317 and previous config saved to /var/cache/conftool/dbconfig/20200824-120310-marostegui.json
  • 12:01 Urbanecm: EU B&C window completed
  • 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c380d6: Enable tewiki as import source for tewikibooks (T260107) (duration: 00m 57s)
  • 11:58 XioNoX: test advertise CF tunnel endpoint on cr1-eqiad - T259036
  • 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5a6d025: Add retrobibliothek.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T261012) (duration: 00m 56s)
  • 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e1ae39a: Enable mapframe at trwiki (T260594) (duration: 00m 58s)
  • 11:43 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: 1066ecb: Enable MediaSearch A/B test (T254388) (duration: 00m 56s)
  • 11:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/ContentTranslation/modules/publish/ext.cx.wikibase.link.js: 74a8718: Publish: Fix broken wikidata linking (T249458) (duration: 00m 58s)
  • 11:39 Urbanecm: Purge 13 URLs with purgeList.php, see P12316 for list of them (T260908; T258552; T261076; T261110)
  • 11:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:32 arturo: add liblept5 1.76.0-1~bpo9+1 (and leptonica-progs) to stretch-wikimedia/component/tesseract-410-bpo (T247422)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 59s)
  • 11:29 urbanecm@deploy1001: Synchronized static/images/: fe0449d: 74220d0: 7db8a19: Update Chinese wordmarks and taglines, update zhwikisource project logo (T260908; T258552; T261076; T261110) (duration: 00m 58s)
  • 11:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 10:43 moritzm: installing ruby2.3 security updates
  • 10:12 moritzm: installing firejail security updates on mw canaries
  • 09:58 oblivian@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=appserver,service=canary
  • 09:46 XioNoX: add PNI to CF on cr1-eqiad with import/export NONE - T259036
  • 09:18 moritzm: restarting mw canaries to pick up libx11 update
  • 09:13 moritzm: installing libx11 security updates on stretch
  • 09:10 vgutierrez: repool cp5002
  • 09:08 _joe_: restarting php-fpm on mw1344 (stuck in SIGILL for new children)
  • 09:00 vgutierrez: restart ats-tls on cp5002
  • 08:54 moritzm: installing net-snmp security updates on buster
  • 08:52 ema: depool cp5002 due to icinga errors
  • 08:24 moritzm: installing json-c security updates on buster
  • 07:36 XioNoX: push new pfw policies - T261007
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1105:3311 for MCR change', diff saved to https://phabricator.wikimedia.org/P12315 and previous config saved to /var/cache/conftool/dbconfig/20200824-052916-marostegui.json

2020-08-23

  • 20:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:23 gehel: repool wdqs1006 - caught up on lag

2020-08-22

  • 19:33 ryankemper: depooled wdqs1006 (still has 2.5 hours to catch up on)
  • 19:31 ryankemper: pooled wdqs1006 now that lag has dissipated
  • 07:36 gehel: restart blazegraph on wdqs1006 + depool to catchup on lag
  • 05:24 legoktm: legoktm@mwmaint1002:~$ echo "https://releases.wikimedia.org/mediawiki/1.35/" | mwscript purgeList.php --wiki=aawiki

2020-08-21

  • 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 16:17 zpapierski@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s)
  • 16:16 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification
  • 16:15 zpapierski@deploy1001: deploy aborted: .. (duration: 00m 01s)
  • 16:15 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: ..
  • 13:25 jayme@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw
  • 13:25 jayme@cumin1001: conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 09:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 01:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-08-20

  • 22:31 eileen: civicrm revision changed from 27d5900f7d to ce28723709, config revision is 706cf3c898
  • 22:20 eileen: civicrm revision is 27d5900f7d, config revision is 706cf3c898
  • 22:20 mutante: permanently shut down tungsten.eqiad.wmnet T260395 T158837 T180761 T224549
  • 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:35 ejegg: updated fundraising CiviCRM from 958a79f660 to 27d5900f7d
  • 20:53 cdanis: repool eqsin
  • 20:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:25 cdanis: cdanis@cr2-eqsin> request vmhost reboot
  • 20:17 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:13 cdanis: cdanis@cr2-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-18.2R3-S5.3.tgz
  • 20:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:02 cdanis: depool eqsin for router upgrade
  • 19:57 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 19:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:24 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:17 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 19:17 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.5 refs T257973
  • 19:08 mutante: restarted apache on cont2001 for integration.wikimedia.org docroot change
  • 19:07 mutante: switching document root of integration.wikimedia.org to scap (T149924)
  • 19:02 twentyafterfour: 1.36.0-wmf.5 has no known blockers and logspam is cleaned up, time to roll group2 wikis to wmf.5
  • 18:42 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 18:42 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:19 mutante: ores1004 - starting failed celery-ores-worker
  • 18:18 mutante: testreduce1001 - rt_client and vd_client now properly stopped by puppet T257906
  • 17:29 shdubsh: restart elasticsearch on logstash1012 (not 1020) -- high gc runtimes
  • 17:28 shdubsh: restart elasticsearch on logstash1020 -- high gc runtimes
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:23 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
  • 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 17:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 17:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 16:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:48 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:43 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 _joe_: restarted apache2 on icinga1001
  • 16:13 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:11 shdubsh: restart elasticsearch on logstash1011 -- long gc runs
  • 16:10 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:08 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:06 oblivian@deploy1001: Finished deploy [ores/deploy@8540eec]: various configuration fixes (duration: 09m 03s)
  • 13:57 oblivian@deploy1001: Started deploy [ores/deploy@8540eec]: various configuration fixes
  • 13:53 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:53 oblivian@deploy1001: Finished deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843 (duration: 14m 00s)
  • 13:39 oblivian@deploy1001: Started deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy T244843
  • 13:26 oblivian@deploy1001: Finished deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2) (duration: 11m 37s)
  • 13:14 oblivian@deploy1001: Started deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy T244843 (take 2)
  • 13:11 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 11m 19s)
  • 13:09 gehel: repool wdqs1007 - caught up on lag
  • 13:00 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 12:51 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843 (duration: 07m 03s)
  • 12:44 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy T244843
  • 11:49 Lucas_WMDE: EU backport window done
  • 11:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/AbuseFilter/includes/AbuseFilterHooks.php: d762e7b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:41 eileen: civicrm revision changed from 6c9441a18e to 958a79f660, config revision is 706cf3c898
  • 11:38 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/AbuseFilter/includes/AbuseFilterHooks.php: 00da39b: Use $user param when filtering edits (T258717) (duration: 01m 05s)
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/Wikibase/client/data-bridge/dist/: Backport: Don't try to load source maps in production (T260852) (duration: 01m 07s)
  • 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix testwikidata depicts id & CirrusSearchUserTesting config (duration: 01m 06s)
  • 11:07 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=trwiki editor # T260899
  • 10:58 XioNoX: re-pool codfw - T259621
  • 10:53 XioNoX: un-drain cr1-codfw - T259621
  • 10:45 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:26 hashar: Restarted zuul-merger instances on contint1001 and contint2001
  • 10:24 hashar@deploy1001: Finished deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events (duration: 00m 24s)
  • 10:24 hashar@deploy1001: Started deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events
  • 10:21 XioNoX: cr1-codfw> request chassis routing-engine master switch - T259621
  • 10:12 XioNoX: reboot cr1-codfw:re1 (backup) for upgrade - T259621
  • 09:57 XioNoX: bump cr1-codfw OSPF metrics - T259621
  • 09:51 XioNoX: enable transit/peering and re-set normal OSPF values on cr2-codfw - T259621
  • 09:41 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:36 eileen: civicrm revision changed from cf9fadbeed to 6c9441a18e, config revision is 706cf3c898
  • 09:33 XioNoX: reboot cr2-codfw:re0 (backup) for upgrade - T259621
  • 09:18 XioNoX: cr2-codfw> request chassis routing-engine master switch - T259621
  • 09:18 kormat: stress-testing db2125 T260670
  • 09:08 XioNoX: reboot cr2-codfw:re1 (backup) for upgrade - T259621
  • 09:03 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2125 after host failure T260670', diff saved to https://phabricator.wikimedia.org/P12303 and previous config saved to /var/cache/conftool/dbconfig/20200820-090313-kormat.json
  • 08:52 kormat: removing /usr/bin/check_mariadb.py from all db hosts T259516
  • 08:52 XioNoX: disable transit/peering on cr2-codfw - T259621
  • 08:48 XioNoX: bump cr2-codfw OSPF metrics - T259621
  • 08:44 jynus: running analyze table on db1115's tendril.global_status_log, may cause some stalls on tendril/dbtree T260876
  • 08:41 XioNoX: depool codfw for routers upgrade - T259621
  • 08:31 XioNoX: enable transit/peering on cr3-knams - T259621
  • 08:21 XioNoX: reboot cr3-knams for upgrade - T259621
  • 08:07 XioNoX: disable transit/peering on cr3-knams - T259621
  • 07:39 hashar: contint2001: restarted zuul
  • 07:29 hashar: contint1001: restarted zuul-merger
  • 07:29 hashar@deploy1001: Finished deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630 (duration: 00m 13s)
  • 07:28 hashar@deploy1001: Started deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - T258630
  • 01:54 ejegg: re-enabled fundraising scheduled jobs
  • 00:51 mutante: ms-be1039 - started failed ferm service
  • 00:35 ejegg: stopped fundraising scheduled jobs
  • 00:27 eileen: civicrm revision changed from c442a09153 to cf9fadbeed, config revision is 3cdffd4fc2

2020-08-19

  • 23:20 Urbanecm: Evening B&C window closed
  • 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a808999: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki (T260825) (duration: 01m 05s)
  • 22:41 eileen: civicrm revision changed from 34f95a3311 to c442a09153, config revision is 3cdffd4fc2
  • 21:27 eileen: civicrm revision changed from 154519cc1f to 34f95a3311, config revision is 3cdffd4fc2
  • 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167) (duration: 00m 06s)
  • 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167)
  • 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
  • 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. (T257906)
  • 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs T257973 (duration: 01m 04s)
  • 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs T257973
  • 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 60af096: Add autopatrolled group at arzwiki (T260761) (duration: 01m 04s)
  • 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
  • 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 924a03b: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T259927) (duration: 01m 04s)
  • 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 83b34e1: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication (T258695) (duration: 01m 04s)
  • 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d45f6: Dont index Draft (118) and Draft talk (119) on hywiki (T260804) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 04s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 06s)
  • 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb4aa44: Configure namespaces on commons to include categories (T198716) (duration: 01m 04s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b904333: Update project wordmarks (T254788; sync 2/2) (duration: 01m 04s)
  • 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: b904333: Update project wordmarks (T254788; sync 1/2) (duration: 01m 06s)
  • 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a6f8354: Enable $wgMFNoindexPages for all wikis (T255458) (duration: 01m 07s)
  • 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:38 mutante: decom'ing releases2001.codfw.wmnet (
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
  • 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
  • 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
  • 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
  • 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
  • 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:41 rzl: disable puppet on cumin1001 for switchdc testing
  • 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:34 gehel: depooling wdqs1007 and restarting blazegraph
  • 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
  • 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 _joe_: building and uploading fluent-bit, ratelimit images
  • 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 12:57 _joe_: building a new version of the base docker images
  • 11:29 awight: EU bacon finished
  • 11:28 effie: restart mwdebug* servers
  • 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix typos in flaggedrevs comments () (duration: 01m 19s)
  • 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:36 XioNoX: update firewall policies on pfw - T260585
  • 08:35 jayme: running puppet on A:all-mw-eqiad
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:20 godog: switch grafana.w.o to grafana 7 in codfw - T259143
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:06 jayme: running puppet on A:all-mw-eqiad
  • 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - T259143
  • 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:13 eileen: tools revision changed from b4ebd1e564 to 0b9d971bc4
  • 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:53 cstone: civicrm revision changed from f5469d0a4c to 154519cc1f
  • 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
  • 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend (T180761) (duration: 05m 13s)
  • 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster

2020-08-18

  • 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 05s)
  • 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 06s)
  • 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
  • 23:34 Urbanecm: Run scap pull at mw1301
  • 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
  • 23:32 mutante: rebooting mw1301 via mgmt
  • 23:22 mutante: killed reboot-cluster on cumin1001
  • 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ac34f72: Enable subpages in NS:0 in techconductwiki (T260350) (duration: 05m 14s)
  • 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
  • 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs T257973
  • 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
  • 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs T257973 (duration: 53m 12s)
  • 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
  • 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete T256164
  • 19:47 ejegg: updated payments-wiki from a7ee1790e0 to ef7ebd08cb
  • 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # T256164
  • 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/{REL1_27,REL1_29,REL1_30} # T256164
  • 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs T257973
  • 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs T257973
  • 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09|11|13).*
  • 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 15:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:02 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:56 papaul: replacing msw-c1,c2 and c4
  • 14:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P12293 and previous config saved to /var/cache/conftool/dbconfig/20200818-145337-marostegui.json
  • 14:48 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(55|64|65).*
  • 14:46 XioNoX: move v4 HE on cr3-ulsfo from peering to transit bgp group
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12292 and previous config saved to /var/cache/conftool/dbconfig/20200818-144415-marostegui.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12291 and previous config saved to /var/cache/conftool/dbconfig/20200818-143758-marostegui.json
  • 14:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12290 and previous config saved to /var/cache/conftool/dbconfig/20200818-142937-marostegui.json
  • 14:28 marostegui: Stop MySQL on db2125 for on-site maintenance - T260670
  • 13:54 marostegui: Revoke DELETE and CREATE from xhgui user on m2 T260640
  • 13:53 XioNoX: bump Zayo v4 BGP session in eqiad
  • 13:49 XioNoX: move v4 HE on cr2-eqord from peering to transit bgp group
  • 13:37 XioNoX: move v4 cr1-eqiad from peering to transit bgp group
  • 13:04 kormat: disabling puppet on all db machines T259516
  • 12:57 _joe_: rebooting appservers in eqiad, 3 at a time
  • 12:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:34 kormat: deploying wmfmariadbpy 0.4
  • 12:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:53 XioNoX: add new icinga hosts to mr policies - T260533
  • 11:40 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:36 Lucas_WMDE: EU backport&config done
  • 11:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wikisource wordmark for trwikisource (T260658), part 2 (duration: 00m 55s)
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/mobile/copyright/wikisource-wordmark-tr.svg' | mwscript purgeList.php # T260658
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wikisource-wordmark-tr.svg: Config: Add Wikisource wordmark for trwikisource (T260658), part 1 (duration: 00m 55s)
  • 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Catalan Wikipedia (T232584) (duration: 01m 01s)
  • 11:06 jbond42: deploy net-snmp update to buster
  • 10:56 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw229.*
  • 10:55 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 10:54 marostegui: Reboot db2125 after running a full upgrade - T260670
  • 10:46 marostegui: Powercycle db2125 from the idrac T260670
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - host down T260670', diff saved to https://phabricator.wikimedia.org/P12288 and previous config saved to /var/cache/conftool/dbconfig/20200818-100718-marostegui.json
  • 09:45 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:43 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 09:40 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[234].*
  • 09:40 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 09:35 kart_: Update cxserver to 2020-08-17-090424-production (T259980)
  • 09:32 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:29 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:28 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:28 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[02].*
  • 09:26 volans: upgraded spicerack to v0.0.39 on cumin hosts
  • 09:25 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:21 volans: uploaded spicerack_0.0.39-1+deb10u1 to apt.wikimedia.org buster-wikimedia
  • 09:05 hashar: Restarting CI Jenkins
  • 08:44 vgutierrez: restart ats-tls on cp5006
  • 08:24 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 08:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:16 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:10 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P12284 and previous config saved to /var/cache/conftool/dbconfig/20200818-080256-marostegui.json
  • 07:58 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:45 godog: VictorOps ack'd incidents will re-trigger after 24h if not resolved - T259465
  • 07:44 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12283 and previous config saved to /var/cache/conftool/dbconfig/20200818-074325-marostegui.json
  • 07:42 _joe_: performing rolling reboot of all codfw api servers
  • 07:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12282 and previous config saved to /var/cache/conftool/dbconfig/20200818-072349-marostegui.json
  • 07:19 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw213[5-9].codfw.wmnet
  • 07:16 jynus: update rest of phabricator passwords T250361
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12281 and previous config saved to /var/cache/conftool/dbconfig/20200818-071121-marostegui.json
  • 07:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:07 godog: prometheus eqiad: add 100G to prometheus/global
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:01 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:53 twentyafterfour: phabricator maintenance successful
  • 06:48 jynus: deploy another password change to phabricator service (potentially disruptive) T250361
  • 06:41 XioNoX: add cloudflare PNI IPs in eqiad - T259036
  • 06:21 jynus: deploy password change to phabricator service T146055
  • 06:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:52 _joe_: running puppet on mc1020 T260622
  • 05:02 twentyafterfour: phabricator appears to be fully functional
  • 05:01 twentyafterfour: phabricator read-only ended
  • 05:00 twentyafterfour: phabricator is now read-only
  • 05:00 marostegui: Failover m3 (phabricator) database master from db1128 to db1132 - T259589
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P12279 and previous config saved to /var/cache/conftool/dbconfig/20200818-043241-marostegui.json
  • 01:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
  • 01:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
  • 01:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
  • 01:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:48 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
  • 00:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)

2020-08-17

  • 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:41 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
  • 23:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:30 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
  • 23:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:25 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
  • 23:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
  • 22:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
  • 22:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:09 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
  • 22:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:57 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T259360)
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:53 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add api-gateway.request stream config T259736, one host timed out (duration: 00m 55s)
  • 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 ppchelko@deploy1001: sync-file aborted: Add api-gateway.request stream config T259736 (duration: 05m 01s)
  • 21:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
  • 21:46 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
  • 21:42 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 21:38 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T257687 (duration: 00m 57s)
  • 21:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:34 effie: blocking temporarily traffic to mc1020
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
  • 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
  • 21:08 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:30 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:28 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:22 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3 (duration: 02m 57s)
  • 18:58 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3
  • 18:58 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2 (duration: 11m 19s)
  • 18:46 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2
  • 18:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002 (duration: 131m 17s)
  • 18:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:43 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 808c17d: Change logo for lldwiki to match the requested one (T259432) (duration: 00m 56s)
  • 18:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: 67e8f88: Add logo files for lldwiki (T259432) (duration: 00m 56s)
  • 17:17 cdanis@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.*
  • 17:06 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 17:04 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw,name=mw2246.codfw.wmnet
  • 17:01 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 16:36 jynus: restart backup2001, backup1001 one after the other
  • 16:35 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002
  • 16:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 16:27 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - remove unneeded override for SearchSatisfaction - T259163 (duration: 00m 56s)
  • 16:22 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: cluster=jobrunner,dc=codfw,name=mw2250.codfw.wmnet
  • 16:20 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw
  • 16:20 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1359.*
  • 16:12 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out (duration: 01m 31s)
  • 15:43 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out
  • 15:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out (duration: 20m 40s)
  • 15:36 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:30 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*-codfw*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out
  • 15:22 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054 (duration: 02m 30s)
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054
  • 15:08 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:06 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:04 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis (take 2) - T254606 (duration: 00m 53s)
  • 14:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis - T254606 (duration: 00m 55s)
  • 14:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - group0 - T254606 (duration: 00m 56s)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12277 and previous config saved to /var/cache/conftool/dbconfig/20200817-141449-marostegui.json
  • 14:09 marostegui: Sanitize thankyouwiki on db1124:3315, db2094:3315 - T260551
  • 14:03 marostegui: Sanitize lldwiki on db1124:3315 and db2094:3315 T259436
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12276 and previous config saved to /var/cache/conftool/dbconfig/20200817-140229-marostegui.json
  • 13:58 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259432)
  • 13:54 Urbanecm: Creating thankyouwiki and lldwiki is done
  • 13:54 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 52s)
  • 13:54 Urbanecm: Create account Pcoombe (WMF) at thankyouwiki, email set to pcoombe@wikimedia.org (T259002)
  • 13:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:49 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating thankyouwiki (T259002)
  • 13:48 urbanecm@deploy1001: Synchronized dblists: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:47 marostegui: Deploy MCR change on db1104
  • 13:47 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating thankyouwiki (T259002) (duration: 00m 56s)
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for MCR change', diff saved to https://phabricator.wikimedia.org/P12275 and previous config saved to /var/cache/conftool/dbconfig/20200817-134701-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12274 and previous config saved to /var/cache/conftool/dbconfig/20200817-134619-marostegui.json
  • 13:46 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12273 and previous config saved to /var/cache/conftool/dbconfig/20200817-134604-marostegui.json
  • 13:41 jayme: imported td-agent-bit_1.5.3-0 to buster-wikimedia - T260536
  • 13:40 jayme: imported !log imported to buster-wikimedia
  • 13:39 marostegui: Upgrade db1088 (s6) to a newer mysql version (10.4.14)
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for mysql upgrade', diff saved to https://phabricator.wikimedia.org/P12272 and previous config saved to /var/cache/conftool/dbconfig/20200817-133905-marostegui.json
  • 13:34 jbond42: deploy json-c security update to buster
  • 13:33 marostegui: Restart mysql on db2102 (testing new package)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12271 and previous config saved to /var/cache/conftool/dbconfig/20200817-133043-marostegui.json
  • 13:29 urbanecm@deploy1001: Synchronized langlist: Creating lldwiki (T259432) (duration: 00m 54s)
  • 13:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:27 urbanecm@deploy1001: sync-file aborted: Creating lldwiki (T259432) (duration: 00m 00s)
  • 13:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating lldwiki (T259432) (duration: 00m 53s)
  • 13:25 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lldwiki (T259432)
  • 13:23 urbanecm@deploy1001: Synchronized dblists: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:22 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:20 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12270 and previous config saved to /var/cache/conftool/dbconfig/20200817-131307-marostegui.json
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:09 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12269 and previous config saved to /var/cache/conftool/dbconfig/20200817-130127-marostegui.json
  • 12:58 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for MCR change', diff saved to https://phabricator.wikimedia.org/P12268 and previous config saved to /var/cache/conftool/dbconfig/20200817-124458-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12267 and previous config saved to /var/cache/conftool/dbconfig/20200817-124409-marostegui.json
  • 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:27 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12266 and previous config saved to /var/cache/conftool/dbconfig/20200817-122234-marostegui.json
  • 12:21 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12265 and previous config saved to /var/cache/conftool/dbconfig/20200817-121600-marostegui.json
  • 12:05 Lucas_WMDE: EU backport window done
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki --fix | tee T259429-fix
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki | tee T259429-dryrun
  • 12:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Portal and Portal_talk namespaces in bjnwiki as an extra namespace. (T259429) (duration: 00m 55s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12264 and previous config saved to /var/cache/conftool/dbconfig/20200817-115741-marostegui.json
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 2 (duration: 00m 57s)
  • 11:53 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wiktionary-wordmark-es.svg\n' | mwscript purgeList.php # T254059
  • 11:53 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wiktionary-wordmark-es.svg: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 1 (duration: 00m 56s)
  • 11:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki%s.png\n' '-1.5x' '-2x' | mwscript purgeList.php # T259006
  • 11:45 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: Change the logo of lzh Wikipedia (T259006) (duration: 00m 55s)
  • 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks (T260493) (duration: 00m 55s)
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons (T260492) (duration: 00m 57s)
  • 11:25 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:14 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] configure mediasearch A/B test (duration: 01m 08s)
  • 11:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:54 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:51 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:49 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:36 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:30 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:14 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:45 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:42 jynus: updating compiler facts for cloud puppet compiler project to include new host dbprov2003
  • 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:22 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:18 _joe_: running a full apt-get upgrade on mw1379-1380
  • 09:18 _joe_: re-upgrading imagemagick on mw1378
  • 09:16 _joe_: upgrading packages on mw1377
  • 09:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:05 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: forcing a puppet run on all mw-api servers in eqiad - T260329
  • 07:52 _joe_: repooling mw1382
  • 07:37 _joe_: running the same test on mw1382 T260329
  • 07:34 _joe_: repooling mw1381
  • 07:15 _joe_: running the same test on mw1381 T260329
  • 07:15 _joe_: repooled mw1281
  • 06:26 _joe_: stop testing on mw1281, T260329
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:28 marostegui: Stop mysql on db1099:3311, db1099:3318 for reimage
  • 05:28 _joe_: depooling mw1281 for testing for T260329
  • 05:25 marostegui: Deploy schema change on db1139:3311
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for reimage and MCR change', diff saved to https://phabricator.wikimedia.org/P12263 and previous config saved to /var/cache/conftool/dbconfig/20200817-052147-marostegui.json

2020-08-16

  • 11:12 gehel: repooling wdqs1004 - caught up on lag

2020-08-15

  • 21:18 gehel: depooling wdqs1004 and restarting services, will wait to catch up on lag before repooling

2020-08-14

  • 19:41 effie: restart mwdebug1002
  • 16:58 cdanis: done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers T260449
  • 16:44 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:36 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 02:41 eileen: tools revision changed from 9a89f45974 to b4ebd1e564

2020-08-13

  • 23:39 tzatziki: removing 3 files for legal compliance
  • 22:03 mutante: switching xhgui from tungsten to xhgui1001 - ran puppet on webperf*001 - T180761 T158837
  • 21:54 andrew@deploy1001: Finished deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388 (duration: 03m 53s)
  • 21:50 andrew@deploy1001: Started deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388
  • 21:11 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002 and then all other releases* servers. 57GB, overwriting existing data from manual config (T247652)
  • 20:53 kormat: dropping xhgui.xhgui on m2
  • 19:35 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/DiscussionTools: Revert new reply API (again) T259855 (duration: 00m 57s)
  • 18:06 herron: restarted ES on logstash1010
  • 18:05 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Enabling new XHGui backend (T180761) (duration: 00m 56s)
  • 17:16 hnowlan: deployed ATS and varnish rules to route api.wikimedia.org
  • 16:26 hnowlan: created api.wikimedia.org
  • 15:49 hnowlan: moving api-gateway service to state production. critical set to false
  • 15:41 herron: restart ES on logstash1012
  • 14:56 fdans@deploy1001: Finished deploy [analytics/refinery@ba1a439]: Regular analytics weekly train (duration: 11m 34s)
  • 14:45 ema: repool mw1382 with kernel memory accounting disabled T260281
  • 14:45 fdans@deploy1001: Started deploy [analytics/refinery@ba1a439]: Regular analytics weekly train
  • 14:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:40 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:38 ema: reboot mw1382 with kernel memory accounting disabled T260281
  • 14:34 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:34 _joe_: rebooting mw1381 with a newer kernel, mw1383 as control with the old kernel T260329
  • 14:33 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:31 _joe_: installing kernel 4.19.0-0.bpo.9 on mw1381 T260329
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 elukey: create schema[12]00[34] in ganeti - T260347
  • 13:59 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:58 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:45 hnowlan: moving api-gateway service to monitoring_setup
  • 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:44 hashar: Gracefully restarting Zuul
  • 13:39 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:10 _joe_: forcing a puppet run on the api appservers in eqiad T260329
  • 13:07 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: revert enabling of lilypond (again) T257091 T260329 (duration: 00m 59s)
  • 11:24 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 hnowlan: restarting pybal on lvs2010 T254908
  • 11:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:06 hnowlan: restarting pybal on lvs2009 T254908
  • 11:05 hnowlan: restarting pybal on lvs1016 T254908
  • 11:05 jayme: depool mw1380 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 11:05 hnowlan: restarting pybal on lvs1015 T254908
  • 11:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 hnowlan: Moving api-gateway service from service_setup to lvs_setup and running puppet on LVS servers
  • 10:17 jayme: depool mw1379 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 10:04 XioNoX: re-order OSPF interfaces on all routers (now partially netbox driven)
  • 09:37 ayounsi@deploy1001: Finished deploy [homer/deploy@89636df]: Homer release v0.2.5 (duration: 03m 03s)
  • 09:34 ayounsi@deploy1001: Started deploy [homer/deploy@89636df]: Homer release v0.2.5
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to /var/cache/conftool/dbconfig/20200813-085547-marostegui.json
  • 08:45 _joe_: downgrading imagemagick on mw1378 T260329
  • 08:43 _joe_: downgrading imagemagick on mw1378 T260281
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:55 _joe_: downgrading curl/libcurl3/libcurl3-gnutls on mw1377 T260329
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12246 and previous config saved to /var/cache/conftool/dbconfig/20200813-074528-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12244 and previous config saved to /var/cache/conftool/dbconfig/20200813-071943-marostegui.json
  • 07:16 marostegui: Stop replication on db1082 to remove triggers on sanitarium for MCR changes
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12243 and previous config saved to /var/cache/conftool/dbconfig/20200813-071545-marostegui.json
  • 06:48 marostegui: Deploy MCR change on dbstore1003:3311
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12242 and previous config saved to /var/cache/conftool/dbconfig/20200813-060135-marostegui.json
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:43 marostegui: Stop MySQL on db2135 (codfw master), haproxy irc alert will fire T260324
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12241 and previous config saved to /var/cache/conftool/dbconfig/20200813-052859-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12240 and previous config saved to /var/cache/conftool/dbconfig/20200813-051222-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12239 and previous config saved to /var/cache/conftool/dbconfig/20200813-050107-marostegui.json
  • 02:56 mutante: testreduce1001 - systemctl reset-failed ; fix parsoid-vd systemd state and icinga alert
  • 00:37 mutante: removing jenkins_service_running checks from secondary servers where it's stopped, manually from icinga config, running puppet on icinga
  • 00:14 mutante: re-enabling puppet on releases* servers

2020-08-12

  • 23:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:37 wkandek: reboot mw1372
  • 23:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:32 wkandek: reboot mw1373
  • 23:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:31 wkandek: reboot mw1371
  • 23:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:28 wkandek: reboot mw1384
  • 23:27 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:27 wkandek: reboot mw1385
  • 23:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:24 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:22 wkandek: reboot mw1370
  • 23:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:19 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:18 wkandek: reboot mw1369
  • 23:18 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:17 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:17 wkandek: reboot mw1387
  • 23:16 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:16 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:16 wkandek: reboot mw1389
  • 23:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:14 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:09 wkandek: reboot mw1368
  • 23:09 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:08 wkandek: reboot mw1367
  • 23:08 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:07 wkandek: reboot mw1391
  • 23:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:05 ejegg: updated Fundraising CiviCRM from 72452e28a9 to f5469d0a4c
  • 23:05 wkandek: reboot mw1393
  • 23:04 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:04 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:01 wkandek: reboot mw1395
  • 23:01 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:53 wkandek: reboot mw1397
  • 22:53 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek: reboot mw1366
  • 22:52 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:52 wkandek: reboot mw1365
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:51 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 wkandek: reboot mw1399
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:46 wkandek: reboot mw1364
  • 22:46 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:42 wkandek: reboot mw1401
  • 22:42 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:41 wkandek: reboot mw1355
  • 22:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:38 wkandek: reboot mw1354
  • 22:38 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:36 wkandek: reboot mw1396
  • 22:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:35 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:32 wkandek: reboot mw1353
  • 22:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:31 wkandek: reboot mw1352
  • 22:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:29 wkandek: reboot mw1348
  • 22:29 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:28 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:26 wkandek: reboot mw1347
  • 22:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:23 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:22 wkandek: reboot mw1350
  • 22:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:21 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:20 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:19 wkandek: reboot mw1346
  • 22:19 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:18 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:14 wkandek: reboot mw1345
  • 22:13 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:12 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:12 wkandek: reboot mw1349
  • 22:12 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:11 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:08 wkandek: reboot mw1333
  • 22:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:07 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
  • 22:03 wkandek: reboot mw1344
  • 22:03 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek: reboot mw1343
  • 22:02 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:00 wkandek: reboot mw1332
  • 22:00 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:56 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:55 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:53 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:50 wkandek: reboot mw1331
  • 21:50 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 wkandek: reboot mw1342
  • 21:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:46 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:46 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 21:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:39 wkandek: reboot mw1341
  • 21:39 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:37 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 21:37 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:36 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:33 wkandek: reboot mw1329
  • 21:33 wkandek: reboot mw1328
  • 21:32 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:29 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 ejegg: updated payments-wiki from 77ff5d70fc to a7ee1790e0
  • 21:25 wkandek: reboot mw1340
  • 21:25 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:21 wkandek: reboot mw1339
  • 21:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:20 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:15 wkandek: reboot mw1327
  • 21:15 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:13 wkandek: reboot mw1326
  • 21:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:11 wkandek: reboot mw1317
  • 21:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:10 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:05 wkandek: reboot mw1316
  • 21:04 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:03 wkandek: reboot mw1325
  • 21:03 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:02 wkandek: reboot mw1324
  • 21:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:02 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 wkandek: reboot mw1315
  • 21:01 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:57 wkandek: reboot mw1323
  • 20:57 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:52 wkandek: reboot mw1322
  • 20:52 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:51 wkandek: reboot mw1314
  • 20:51 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:50 wkandek: reboot mw1313
  • 20:50 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:44 wkandek: reboot mw1312
  • 20:44 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:43 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:43 wkandek: reboot mw1321
  • 20:42 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:41 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:40 wkandek: reboot mw1297
  • 20:40 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:39 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:39 wkandek: reboot mw1320
  • 20:39 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:34 wkandek: reboot mw1290
  • 20:34 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 wkandek: reboot mw1319
  • 20:33 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:32 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:29 wkandek: reboot mw1275
  • 20:29 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:26 wkandek: reboot mw1289
  • 20:25 wkandek: reboot mw1288
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:23 wkandek: reboot mw1274
  • 20:23 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:20 wkandek: reboot mw1273
  • 20:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:13 wkandek: reboot mw1287
  • 20:13 wkandek: reboot mw1286
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 wkandek: reboot mw1272
  • 20:11 wkandek: reboot mw1271
  • 19:41 hashar: Upgrading Jenkins on contint2001 (primary)
  • 19:25 hashar: contint1001: sudo systemctl mask jenkins # spare server
  • 19:25 mutante: all releases* servers except 1001 - disable puppet; stop jenkins, mask jenkins
  • 19:22 mutante: releases1002 - stopped and masked jenkins service
  • 19:22 mutante: releases2001 - stopped and masked jenkins service
  • 19:20 mutante: upgrading jenkins on releases*001
  • 19:19 hashar: Upgrading Jenkins on contint1001 (spare)
  • 19:16 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.4
  • 19:13 mutante: uploaded new jenkins version to APT repo; upgrading jenkins on releases1002/2002
  • 19:08 effie: pool mw1396
  • 19:06 effie: repool mw1395 mw1397 mw1399
  • 18:56 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in client (duration: 02m 13s)
  • 18:47 wkandek: reboot mw1270
  • 18:47 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:45 wkandek: reboot mw1269
  • 18:41 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:38 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:25 wkandek: reboot mw1268
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on hewiki (T255020) (duration: 01m 03s)
  • 18:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:04 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo (duration: 01m 06s)
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:56 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:51 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:49 effie: reboot mw1265 mw1282 mw1283
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:45 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:37 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:19 effie: reboot mw1263 mw1264 mw1279 and mw1281
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:16 cdanis: for posterity: mw1359 has a bunch of special packages installed (previously recorded in SAL) and also has `sudo memleak-bpfcc -o 60000 -z 31 -Z 33 30` running in a tmux in an attempt to understand what's causing the page fragmentation in the appserver fleet
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:00 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:57 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Additional mitigations for T257687 (duration: 01m 03s)
  • 16:53 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:48 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:35 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:31 effie: reboot mw1277 mw1278 && mw1261 mw1262
  • 16:29 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 16:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:04 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I3726a6364d, T257079 (duration: 01m 02s)
  • 15:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:52 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:48 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:48 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:37 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:32 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:26 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:12 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install linux-headers-4.9.0-12-amd64
  • 15:10 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install python3-netaddr ieee-data
  • 15:09 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo dpkg -i bpfcc-tools_0.12.0-2_all.deb libbpfcc_0.12.0-2_amd64.deb python3-bpfcc_0.12.0-2_all.deb
  • 15:08 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:54 cdanis: again un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:44 cdanis: temporarily re-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:35 cdanis: un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:32 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:31 cdanis: temporarily kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 kormat: uploaded wmfmariadbpy 0.3 to apt
  • 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:42 effie: restart mw1383 & mw1386
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.4 (duration: 01m 16s)
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.4
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:19 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:15 cdanis: ✔️ cdanis@mw1357.eqiad.wmnet ~ 🕘☕ sudo sysctl -w vm/compact_memory=1
  • 13:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:59 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:50 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:33 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:27 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:51 ema: pool mw1363 after reboot
  • 11:49 jynus: creating artificial low replication lag on db2130 to test icinga alerts T253120
  • 11:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:37 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:28 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:21 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:13 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:08 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:07 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:00 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:00 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:55 _joe_: rebooting mw1361
  • 10:51 jayme: rebooting mw1356
  • 10:49 _joe_: rebooting mw1378
  • 09:45 _joe_: repooling mw1377
  • 09:40 _joe_: rebooting mw1377
  • 09:22 _joe_: depool mw1357 tool
  • 09:14 _joe_: depooling mw1377 for inspection
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1110', diff saved to https://phabricator.wikimedia.org/P12220 and previous config saved to /var/cache/conftool/dbconfig/20200812-091211-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12219 and previous config saved to /var/cache/conftool/dbconfig/20200812-090831-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12218 and previous config saved to /var/cache/conftool/dbconfig/20200812-085021-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12217 and previous config saved to /var/cache/conftool/dbconfig/20200812-083548-marostegui.json
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage', diff saved to https://phabricator.wikimedia.org/P12215 and previous config saved to /var/cache/conftool/dbconfig/20200812-073130-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for MCR change', diff saved to https://phabricator.wikimedia.org/P12214 and previous config saved to /var/cache/conftool/dbconfig/20200812-045157-marostegui.json

2020-08-11

  • 23:41 Urbanecm: Evening B&C window completed
  • 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f238f7: Update wgMFRemovableClasses (T231160) (duration: 01m 03s)
  • 23:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/MobileFrontend/extension.json: c22d65f: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 03s)
  • 23:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/MobileFrontend/extension.json: 81d54b0: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 05s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 28faa27: Switching to updated license definition (duration: 01m 04s)
  • 21:52 krinkle@deploy1001: Synchronized php-1.36.0-wmf.3/includes/skins/SkinMustache.php: Ibe1f07346, T259872, T259858 (duration: 01m 04s)
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add streams for eventgate-main - T251935 (duration: 01m 04s)
  • 19:21 ejegg: updated payments-wiki from f199c071c3 to 77ff5d70fc
  • 18:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:48 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant investigate right to checkuser group on frwiki (T260171) (duration: 01m 04s)
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Beta-only: Configured additional settings for API Portal beta wiki gerrit:619339 (duration: 01m 03s)
  • 18:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Direct GrowthExperiments help panel questions to mentors on cswiki (T250235) (duration: 01m 03s)
  • 17:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Remove extraneous mediawiki.api-request stream - T251935 (duration: 01m 01s)
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:38 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:53 hashar@deploy1001: Synchronized php-1.36.0-wmf.4/skins/MinervaNeue/: Revert "ServiceWiring: Avoid usage of deprecated Title::getSubjectPage()" - T260155 (duration: 01m 06s)
  • 16:12 herron: migrating lists.wikimedia.org services from fermium to lists1001 T224586
  • 15:36 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.4
  • 15:27 hashar@deploy1001: Finished scap: (no justification provided) (duration: 30m 51s)
  • 14:59 marostegui: Deploy MCR change on db1116:3318
  • 14:56 hashar@deploy1001: Started scap: (no justification provided)
  • 14:56 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.2 (duration: 04m 15s)
  • 14:55 jayme: updated helmfile to 0.125.2-1 on contint* and deploy*
  • 14:52 otto@deploy1001: Finished deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935 (duration: 01m 14s)
  • 14:51 otto@deploy1001: Started deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935
  • 14:50 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.1 (duration: 02m 07s)
  • 14:48 jayme: imported helmfile_0.125.2-1 to buster-wikimedia, jessie-wikimedia, stretch-wikimedia
  • 14:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.41 (duration: 04m 20s)
  • 14:40 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.40 (duration: 10m 24s)
  • 14:37 papaul: replacing msw-b5,b6,b7 and b8
  • 14:30 hashar: Cleaning old MediaWiki versions that were never removed
  • 14:27 hashar@deploy1001: sync aborted: testwikis wikis to 1.36.0-wmf.4 (duration: 72m 36s)
  • 14:10 hashar: mw1319: scap pull
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:14 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.4
  • 13:12 hashar: Applied 1.36.0-wmf.4 security patches # T257972
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:52 kormat: uploaded wmfmariadbpy 0.2 packages to apt1001
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:54 marostegui: Install new MariaDB 10.4.14 on db2102
  • 11:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:18 Urbanecm: EU B&C window done
  • 11:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 619255|Enable ContentTranslation in Sundanese WP as a default tool (T258502) (duration: 00m 59s)
  • 10:39 volans: migrating *all* eqiad mgmt DNS records to the autogenerated ones via Netbox - T233183
  • 10:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0)
  • 10:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh
  • 10:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 09:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:25 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 marostegui: Rename tables on muswiki and mhwiktionary on s3 master (db1123) without replication T260112
  • 09:01 volans: renewed puppet certificate on scb1001.eqiad.wmnet
  • 08:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e6ec237: Revert "Turn muswiki and mhwiktionary to read-only" (T259004) (duration: 00m 58s)
  • 08:45 urbanecm@deploy1001: Synchronized dblists/: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 3/3) (duration: 00m 58s)
  • 08:44 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 2/3) (duration: 00m 58s)
  • 08:43 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 1/3) (duration: 01m 02s)
  • 08:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a04bc1f: Turn muswiki and mhwiktionary to read-only (T259004) (duration: 01m 01s)
  • 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:45 XioNoX: Re-prioritize peering over transit eqiad/esams - T259614
  • 01:59 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: enabling fast stale mode T250248 (duration: 00m 58s)
  • 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167 (duration: 01m 03s)
  • 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167
  • 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins (T247652)
  • 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance (T247652)

2020-08-10

  • 23:56 eileen: tools revision changed from 22550f38c5 to 9a89f45974
  • 23:53 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced. httpbb tests have been created and pass. (T247652)
  • 23:52 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced of course.
  • 20:13 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/c/integration/config/+/619359/
  • 20:10 ejegg: updated payments-wiki from 932aacde54 to f199c071c3
  • 18:32 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@3e12dbb]: 0.3.44 (duration: 15m 18s)
  • 18:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 ryankemper@deploy1001: Started deploy [wdqs/wdqs@3e12dbb]: 0.3.44
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on frwiki (T257891) (duration: 00m 58s)
  • 18:07 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Explicitly disable nativeGallery in Parsoid settings (no-op) (duration: 00m 58s)
  • 18:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump the weight of near match for search (T257922) (duration: 00m 59s)
  • 17:56 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-analytics streams - T251935 (duration: 01m 02s)
  • 17:46 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:01 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:55 XioNoX: Re-prioritize peering over transit - codfw - T259614
  • 12:34 XioNoX: Re-prioritize peering over transit - eqsin - T259614
  • 12:07 XioNoX: standardize cr1-eqiad interfaces
  • 11:56 Urbanecm: EU B&C window done
  • 11:55 Urbanecm: Run `mwscript namespaceDupes.php --wiki=tiwiki --fix` at mwmaint1002 (T259295)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 14b2897: Define Portal namespace for tiwiki (T259295) (duration: 00m 59s)
  • 11:49 urbanecm@deploy1001: Synchronized static/images/project-logos/: bbbf701: Regenerate Bengali Wikipedia logo from source SVG (T259292) (duration: 00m 59s)
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0d8366f: Search Work NS by default at bnwikisource (T258982) (duration: 00m 59s)
  • 11:37 Urbanecm: Run `mwscript namespaceDupes.php --wiki=hywiki --fix` at mwmaint1002 (T259987)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1771487: add two extra namespaces for hywiki (T259987) (duration: 00m 59s)
  • 11:28 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/shnwiktionary*.png with purgeList.php (T260010)
  • 11:27 XioNoX: standardize cr2-eqiad interfaces
  • 11:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: c5c96ca: Regenerate shnwiktionary logo from source svg (T260010) (duration: 00m 58s)
  • 11:21 XioNoX: repool ulsfo
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a15e3a2: Increase autoconfirmed threshold for Chinese Wikinews to 7 days and 20 edits at least (T259869) (duration: 00m 58s)
  • 11:13 XioNoX: Re-prioritize peering over transit - ulsfo - T259614
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ba0b2ab: Create TemplateEditor group on zhwiki (T260012) (duration: 00m 58s)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix --add-prefix=T259959 (T259959)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix (T259959)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 58s)
  • 11:06 urbanecm@deploy1001: sync-file aborted: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 00s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 01s)
  • 10:42 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:37 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:36 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.pool (exit_code=99)
  • 10:36 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:29 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:18 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:14 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 hashar: Updated container for Jenkins job operations-dns-lint-docker https://gerrit.wikimedia.org/r/619267
  • 09:55 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/619266
  • 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 09:49 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 09:21 marostegui: Promote dbproxy1019 back T255408
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:43 marostegui: Remove revision triggers from db2094:3318 T238966
  • 06:42 marostegui: Stop replication on s8 codfw master to deploy MCR change, this will generate lag on s8 codfw T238966
  • 04:46 marostegui: Depool dbproxy1019 for reimage T255408

2020-08-09

  • 21:58 ejegg: updated payments-wiki from cd012f37f1 to 932aacde54
  • 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2020-08-08

  • 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload

2020-08-07

  • 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: T259855 Revert new reply API (duration: 01m 06s)
  • 15:01 volans: import DNS names for network devices in Netbox - T258729
  • 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
  • 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
  • 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
  • 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json

2020-08-06

  • 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question (T232410) (duration: 00m 59s)
  • 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki (T253291) (duration: 00m 59s)
  • 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: Update git submodules (vendor) (T259832) (duration: 01m 08s)
  • 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:47 shdubsh: restart logstash -- pipeline appears stuck
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 brennen: manually updating the vendor submodule on 1.36.0 for T259832
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - T251935 (duration: 00m 58s)
  • 19:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - T251935 (duration: 00m 59s)
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
  • 18:58 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:57 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:21 Urbanecm: Morning B&C window was completed
  • 18:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: fb4a808: Fix "Ask mentor" help panel button styling (T250235) (duration: 01m 07s)
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9db9659: Remove temporary logging for mediamoderation (T259742) (duration: 01m 07s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9695811: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") (T259574) (duration: 01m 06s)
  • 17:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
  • 17:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 17:37 brennen: train 1.36.0-wmf.3: proceeding to group1
  • 17:36 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: Fix array unpacking as argument list (T259745) (duration: 01m 07s)
  • 16:32 chrisalbon@deploy1001: Finished deploy [ores/deploy@f3c44be]: T258435 (duration: 14m 12s)
  • 16:18 dpifke@deploy1001: Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167 (duration: 00m 05s)
  • 16:18 dpifke@deploy1001: Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167
  • 16:18 chrisalbon@deploy1001: Started deploy [ores/deploy@f3c44be]: T258435
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:10 fdans@deploy1001: Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3 (duration: 20m 01s)
  • 14:50 fdans@deploy1001: Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - T251935 (duration: 01m 08s)
  • 13:32 jayme: updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
  • 13:24 jayme: imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:06 kart_: Updated cxserver to 2020-08-05-070016-production (T258919, T199523, T257943, T256194)
  • 12:03 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:59 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:57 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:54 Lucas_WMDE: EU backport window done
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 09s)
  • 11:53 XioNoX: reboot cr2-eqord - T259621
  • 11:37 XioNoX: drain traffic away cr2-eqord - T259621
  • 11:27 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744) (duration: 01m 10s)
  • 11:22 XioNoX: reboot cr2-eqdfw - T259621
  • 11:13 XioNoX: drain traffic away cr2-eqdfw - T259621
  • 10:52 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:48 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:45 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:16 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:12 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:11 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
  • 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:57 marostegui: Truncate tables on zerowiki T227717
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:47 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:43 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:37 elukey: roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
  • 06:36 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
  • 03:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
  • 02:24 eileen: process-control config revision is 525eb71235 turn off delete deleted contacts
  • 01:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:35 mutante: wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
  • 00:00 mutante: LDAP - removed demon from nda group

2020-08-05

  • 23:57 eileen: civicrm revision changed from 150c3476c4 to 72452e28a9, config revision is b6ece03513
  • 23:02 shdubsh: logstash in codfw looks stuck -- restarting
  • 19:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
  • 19:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:13 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
  • 19:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 18:26 Lucas_WMDE: Morning backport window done
  • 18:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 11s)
  • 18:14 mutante: test !log
  • 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable growth study quick survey (T257015) (duration: 01m 12s)
  • 17:30 shdubsh: test prometheus-icinga-exporter upgrade on icinga2001
  • 16:50 elukey: powercycle stat1005 after GPU issue
  • 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - T251935 (duration: 01m 05s)
  • 15:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 godog: bounce logstash on logstash100[789] - udp loss reported
  • 15:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:48 elukey: reboot stat1008 for unexpected maintenance (GPU stuck)
  • 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:32 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:25 moritzm: installing nmap bugfix updates from buster point release
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 moritzm: installing pillow security updates
  • 14:03 moritzm: installing node-minimist security updates
  • 13:51 moritzm: installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
  • 13:32 jayme: updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
  • 13:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 elukey: restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
  • 13:00 moritzm: installing libjpeg-turbo security updates on stretch
  • 12:52 XioNoX: netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
  • 12:49 jayme: imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:46 moritzm: installing imagemagick security updates on buster
  • 12:33 moritzm: installing net-snmp security updates on icinga hosts
  • 11:36 awight: EU Bacon reclosed
  • 11:36 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Switch test wikis to new version of vector by default (3/3) (T254227) (duration: 01m 07s)
  • 11:29 awight: EU Bacon reopened
  • 11:28 awight: EU Bacon complete
  • 11:26 awight@deploy1001: Synchronized wmf-config: Config: FileImporter: full default deployment (T232542) (duration: 01m 04s)
  • 11:23 jayme: imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
  • 11:22 jayme: imported helm-diff_3.1.2-0 to buster-wikimedia
  • 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add import sources for lijwikisource (T259633) (duration: 01m 07s)
  • 11:13 awight@deploy1001: sync-file aborted: Config: Add import sources for lijwikisource (T259633) (duration: 00m 13s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Test Wikidata clients (T232584) (duration: 01m 20s)
  • 10:39 XioNoX: reboot cr3-ulsfo - T259621
  • 10:28 XioNoX: drain traffic away cr3-ulsfo - T259621
  • 10:21 moritzm: installing libssh security updates
  • 10:18 XioNoX: reboot cr4-ulsfo - T259621
  • 09:58 XioNoX: drain traffic away cr4-ulsfo
  • 09:53 XioNoX: depool ulsfo - T259621
  • 09:32 elukey: set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
  • 09:07 jayme: imported helmfile_0.125.2-0 to jessie-wikimedia
  • 09:07 jayme: imported helmfile_0.125.2-0 to stretch-wikimedia
  • 09:05 jayme: imported helmfile_0.125.2-0 to buster-wikimedia
  • 08:39 marostegui: Remove revision triggers on db1125:3317
  • 08:39 marostegui: Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
  • 07:49 marostegui: Stop mysql on db1117:3323 (this will generate haproxy irc alerts) T259589
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:26 moritzm: installing perl security updates on buster
  • 07:20 moritzm: installing libexif security updates on buster
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:13 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 05:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json

2020-08-04

  • 22:41 brennen: restarting php7.2-fpm on mw1404 for opcache issues
  • 21:45 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:52 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:27 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch (duration: 02m 22s)
  • 20:25 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch
  • 20:15 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances (duration: 02m 07s)
  • 20:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.3
  • 19:11 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.3 (duration: 91m 03s)
  • 19:03 brennen: current 1.36.0-wmf.3 train status (T257971): mid scap-cdb-rebuild for testwiki sync; will proceed with group0 when finished.
  • 18:55 sukhe: upload pdns-recursor_4.3.3-1~deb10u1 to apt.wm.o (buster) - T252132
  • 18:49 mutante: letting puppet install envoy on all ores1* hosts
  • 18:46 mutante: letting puppet install envoy on all ores2* hosts
  • 18:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:19 mutante: temp disabling puppet on all ores hosts to add envoy
  • 17:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.3
  • 17:36 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:05 brennen: 1.36.0-wmf.3 was branched at 2d0cf09cdf for T257971
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:24 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Set default topic_prefixes - T255888 (duration: 00m 58s)
  • 15:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:18 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove now unused wgEventServiceStreamConfig - T229863 (duration: 00m 58s)
  • 15:18 moritzm: installing jackson-databind security updates
  • 15:08 moritzm: installing qemu security updates on cloudvirt* Stretch hosts
  • 14:54 cmjohnson1: swapping kubernetes1010 network cable T257542
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:41 cmjohnson1: powercycling analytics1050 T258370
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for MCR', diff saved to https://phabricator.wikimedia.org/P12161 and previous config saved to /var/cache/conftool/dbconfig/20200804-143524-marostegui.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12160 and previous config saved to /var/cache/conftool/dbconfig/20200804-142710-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12159 and previous config saved to /var/cache/conftool/dbconfig/20200804-142220-marostegui.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12158 and previous config saved to /var/cache/conftool/dbconfig/20200804-141556-marostegui.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12157 and previous config saved to /var/cache/conftool/dbconfig/20200804-141004-marostegui.json
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:51 hashar: Install newer openjdk on contint2001 and restarting CI Jenkins
  • 12:00 jayme: helm was updated: 2.16.7-2 -> 2.16.9-1 on chartmuseum*, contint*, deploy*
  • 11:43 Lucas_WMDE: EU backport window done
  • 11:41 marostegui: Deploy schema change on s3 codfw master, lag might show up on codfw s3 T259238
  • 11:37 moritzm: installing openjdk-11 security updates
  • 11:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Load WikibaseRepo using extension registration in production (T257433) (duration: 00m 58s)
  • 11:12 Lucas_WMDE: Deployed patch for T86738 / T259565
  • 11:03 moritzm: installing e2fsprogs security updates for stretch
  • 10:47 moritzm: installing tomcat8 security updates
  • 10:47 vgutierrez: upgrade acme-chief to version 0.28
  • 10:33 vgutierrez: upload acme-chief 0.28 to apt.wm.o (buster) - T259338
  • 10:18 moritzm: installing imagemagick security updates on stretch
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for MCR and PK change T259524', diff saved to https://phabricator.wikimedia.org/P12156 and previous config saved to /var/cache/conftool/dbconfig/20200804-100035-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12155 and previous config saved to /var/cache/conftool/dbconfig/20200804-095608-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12154 and previous config saved to /var/cache/conftool/dbconfig/20200804-094909-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 moritzm: installing python3.5 security updates
  • 08:15 moritzm: installing remaining cups security updates
  • 08:13 XioNoX: cleaning up a bunch of prefix limit reached issues
  • 08:00 marostegui: Failover m2 from db1132 to db1107 - T257540
  • 07:54 moritzm: installing poppler security updates on stretch
  • 07:43 jayme: imported helm_2.16.9-1 to jessie-wikimedia
  • 07:43 jayme: imported helm_2.16.9-1 to stretch-wikimedia
  • 07:38 jayme: imported helm_2.16.9-1 to buster-wikimedia
  • 07:34 elukey: upgrade druid analytics (backend for Turnilo/Superset/etc.) to 0.19
  • 07:32 XioNoX: remove nonstop-bridging from fasw-c-eqiad switches - T191667
  • 07:29 XioNoX: remove nonstop-bridging from eqiad asw2 switches - T191667
  • 07:28 XioNoX: remove nonstop-bridging from asw2-esams - T191667
  • 07:27 marostegui: Start topology changes on m2 - T257540
  • 07:25 moritzm: installing rails security updates
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P12153 and previous config saved to /var/cache/conftool/dbconfig/20200804-064223-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12152 and previous config saved to /var/cache/conftool/dbconfig/20200804-063026-marostegui.json
  • 06:27 _joe_: restarting docker daemon on kubestage1002, seems like a case of https://github.com/moby/moby/issues/29635
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore original weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12151 and previous config saved to /var/cache/conftool/dbconfig/20200804-062358-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12150 and previous config saved to /var/cache/conftool/dbconfig/20200804-062256-marostegui.json
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 06:13 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling lilypond execution in safe mode 3rd attempt (duration: 00m 58s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12149 and previous config saved to /var/cache/conftool/dbconfig/20200804-061255-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12148 and previous config saved to /var/cache/conftool/dbconfig/20200804-061209-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for MCR', diff saved to https://phabricator.wikimedia.org/P12147 and previous config saved to /var/cache/conftool/dbconfig/20200804-061003-marostegui.json
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for reimage', diff saved to https://phabricator.wikimedia.org/P12146 and previous config saved to /var/cache/conftool/dbconfig/20200804-051843-marostegui.json
  • 05:04 marostegui: Reboot db1107 to pick up the latest kernel
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12145 and previous config saved to /var/cache/conftool/dbconfig/20200804-050150-marostegui.json
  • 03:56 legoktm: added Arlo to wmf-deployment Gerrit group
  • 03:53 legoktm: added subbu to wmf-deployment Gerrit group

2020-08-03

  • 23:43 mutante: mwdebug1001 - temp installing apt-file for debugging an issue on mwmaint
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on fawiki (T253291) (duration: 00m 59s)
  • 21:35 sbassett: Deployed mitigations for T115888
  • 21:14 sbassett@deploy1001: Synchronized php-1.36.0-wmf.2/resources/src/mediawiki.jqueryMsg/mediawiki.jqueryMsg.js: (no justification provided) (duration: 01m 00s)
  • 18:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:09 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update (duration: 15m 53s)
  • 17:53 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update
  • 17:33 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 17:28 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: (no justification provided) (duration: 00m 35s)
  • 17:28 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: (no justification provided)
  • 16:58 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.36.0-wmf.1"
  • 16:21 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 15:55 _joe_: regenerating the TLS certs for blubberoid
  • 15:33 XioNoX: standardize all routers routing-options config
  • 15:27 marostegui: Change PK on frwiktionary.revision on db2087:3317, db2129, db2121, db2086:3317 T259524
  • 15:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P12143 and previous config saved to /var/cache/conftool/dbconfig/20200803-145111-marostegui.json
  • 14:40 moritzm: update Buster netboot images to Buster 10.5 T259519
  • 14:33 XioNoX: disable all ALGs from pfw3-codfw
  • 14:28 XioNoX: remove IGMP and PIM from pfw3-codfw security zones
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into dump and depool db1106', diff saved to https://phabricator.wikimedia.org/P12142 and previous config saved to /var/cache/conftool/dbconfig/20200803-142749-marostegui.json
  • 14:27 XioNoX: remove nonstop-bridging from fasw-c-codfw - T191667
  • 14:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 filippo@deploy1001: Finished deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017 (duration: 00m 23s)
  • 14:03 filippo@deploy1001: Started deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017
  • 14:00 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'enable-puppet "cdanis deploying I92e9a05"'
  • 13:56 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'disable-puppet "cdanis deploying I92e9a05"'
  • 13:27 moritzm: installing libopenmpt security updates
  • 13:15 XioNoX: remove nonstop-bridging from asw-d-codfw - T191667
  • 13:14 XioNoX: remove nonstop-bridging from asw-c-codfw - T191667
  • 13:12 XioNoX: remove nonstop-bridging from asw-b-codfw - T191667
  • 13:11 XioNoX: remove nonstop-bridging from asw-a-codfw - T191667
  • 13:05 moritzm: installing json-c security updates
  • 12:53 XioNoX: move VRRP master to cr3-eqsin
  • 12:32 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 12:26 moritzm: installing apache-log4j1.2 security updates
  • 12:20 moritzm: restarting nginx on francium to pick up luajit update
  • 12:13 kormat: disabling puppet on cumin hosts T259021
  • 11:55 moritzm: installing luajit security updates
  • 11:20 moritzm: installing ruby-rack security updates
  • 11:19 Urbanecm: EU B&C done
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 346138d: Add extra namespaces for yuewiktionary (T258913) (duration: 01m 06s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c2a2b2: Add gpophotoeng.gov.il to the wgCopyUploadsDomains allowlist for commonswiki (T258857) (duration: 01m 07s)
  • 11:03 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: ead6b9e: New throttle rule for Czech editathon (T259352) (duration: 01m 06s)
  • 11:03 moritzm: installing ruby2.5 security updates
  • 11:01 moritzm: removing cloudcephmon100[1-3].wikimedia.org from debmonitor (these eventually got re-installed as cloudcephmon100[1-3].eqiad.wmnet)
  • 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 06s)
  • 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:29 moritzm: installing NSS security updates on buster
  • 10:26 moritzm: restarting Apache on puppetboard to pick up curl security updates
  • 10:19 moritzm: restarting wtp1025 (parsoid canary) to pick up curl security updates
  • 09:46 moritzm: restarting mw1261-mw1265 to pick up curl security updates
  • 09:42 moritzm: installing curl security updates on stretch
  • 08:59 moritzm: installing ffmpeg security updates on jobrunners/video scalers (3.2.15 rebuilt with VP9/row-mt patches)
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12141 and previous config saved to /var/cache/conftool/dbconfig/20200803-082641-marostegui.json
  • 08:25 moritzm: installing qemu security updates on stretch
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12140 and previous config saved to /var/cache/conftool/dbconfig/20200803-082533-marostegui.json
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify s5 wikis T259437 (duration: 01m 05s)
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify s5 wikis T259437 (duration: 01m 40s)
  • 08:07 elukey: roll restart aqs on aqs* to pick up new druid settings
  • 07:10 marostegui: Remove revision triggers from db2095:3317 for MCR changes T238966
  • 07:09 marostegui: Deploy MCR change on s7 codfw, lag will appear on codfw T238966
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12139 and previous config saved to /var/cache/conftool/dbconfig/20200803-070702-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12138 and previous config saved to /var/cache/conftool/dbconfig/20200803-052715-marostegui.json
  • 05:04 marostegui: Remove db1108:3321 and db1108:3322 from tendril and add db1108:3351 and db1108:3352 T254462
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12137 and previous config saved to /var/cache/conftool/dbconfig/20200803-050148-marostegui.json

2020-08-01

  • 16:30 Amir1: wikiadmin@10.64.32.197(avkwiki)> delete from site_identifiers; (T259122)
  • 16:27 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259122)

Archives

See Server Admin Log/Archives.