Difference between revisions of "Server Admin Log"

From Wikitech

Revision as of 15:39, 11 October 2019

2019-10-11

  • 15:39 AndyRussG: updated fruec from 18d89675d0 to 1e6a6ee2de
  • 13:57 moritzm: rebooting cloudbackup2001
  • 13:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:01 moritzm: installing 4.9.189 Linux update from last stretch point releases (no reboots, deploying the package only at this point)
  • 12:48 XioNoX: disable SIP ALG on pfw3-eqiad - T235150
  • 12:47 XioNoX: disable SIP ALG on pfw3-codfw - T235150
  • 12:45 moritzm: installing libxslt security updates
  • 12:35 moritzm: installing zsh updates from stretch point release
  • 12:33 moritzm: installing gsoap security updates on stretch
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9314 and previous config saved to /var/cache/conftool/dbconfig/20191011-123159-marostegui.json
  • 12:31 moritzm: installing libcaca security updates on stretch
  • 12:25 XioNoX: push firewall policies to pfw3-eqiad - T235074
  • 12:24 XioNoX: push firewall policies to pfw3-codfw - T235074
  • 11:51 moritzm: installing unzip security updates on stretch
  • 11:08 moritzm: upgrading debdeploy to 0.0.99.11
  • 10:18 moritzm: imported debdeploy 0.0.99.11 for jessie/stretch/buster-wikimedia
  • 10:11 hashar: Restarting Gerrit # T224448
  • 10:02 hashar: gerrit: killed a stalled SendEmail thread that was holding a lock
  • 08:34 moritzm: remove kafka2001-2003 from debmonitor DB (T235125)
  • 08:32 moritzm: remove kafka1001-1003 from debmonitor DB (T235125)
  • 08:30 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:28 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 moritzm: reimaging labpuppetmaster1002 (spare) for some tests related to microcode loading
  • 07:32 XioNoX: roll back the two previous HE peering deactivations
  • 07:30 XioNoX: deactivate HE peering on cr2-eqord for packet loss
  • 07:28 XioNoX: deactivate HE peering on cr1-eqiad for packet loss
  • 06:13 marostegui: Compress tables on db2085:3318 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json
  • 05:27 papaul: rebooting an-conf1001 for serial troubleshooting
  • 05:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9310 and previous config saved to /var/cache/conftool/dbconfig/20191011-045409-marostegui.json
  • 02:14 mutante: gerrit - "manually" starting replication via ssh command
  • 02:13 mutante: gerrit - restart service to ensure last config change is picked up
  • 02:10 mutante: gerrit1001 - attempt to manually start replication to github

2019-10-10

  • 22:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMFMobileFormatterHeadings, unread T232690 (duration: 00m 51s)
  • 22:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T78711 Update cron-updated miser pages to say they are run periodically, not never (duration: 00m 51s)
  • 22:10 jforrester@deploy1001: Synchronized wmf-config/wikitech.php: Remove debug line dating from 2015-12-08! (duration: 00m 51s)
  • 22:04 jforrester@deploy1001: Synchronized wmf-config/mc.php: Drop nutcracker indirection for HHVM servers, just point to localhost (duration: 00m 51s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/PhpAutoPrepend.php: Drop special-case for PHP7, now always used (duration: 00m 51s)
  • 21:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop HHVM special-case for SVG converter, no longer used (duration: 00m 51s)
  • 21:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't check to shard static config cache for HHVM any more (duration: 00m 50s)
  • 21:48 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Don't check to shard wmgWBSharedCacheKey for HHVM any more (duration: 00m 51s)
  • 21:39 jforrester@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/VisualEditor/lib/ve/src/dm/ve.dm.TreeCursor.js: T234881 TreeCursor: cross ignored nodes properly from the end of a text node (duration: 00m 54s)
  • 20:36 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004 (duration: 00m 06s)
  • 20:36 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: attempting to fix missing git fat jar on stat1004
  • 20:13 hoo: Updated the Wikidata property suggester with data from the 2019-09-30 JSON dump and applied the T132839 workarounds
  • 19:33 godog: swift eqiad-prod: add weight to ms-be105[1-6] - T232367
  • 19:29 marxarelli: promoted 1.35.0-wmf.1 to all wikis. no rise in errors rates. no new relevant errors cc: T233849
  • 19:25 godog: swift codfw-prod: more weight to ms-be205[1-6] - T233638
  • 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.1
  • 19:11 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki to 1.35.0-wmf.1
  • 19:09 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/extensions/OpenStackManager: labswiki to 1.35.0-wmf.1 (duration: 01m 00s)
  • 19:04 marxarelli: promoting labswiki to 1.35.0-wmf.1 cc: T233849
  • 17:07 jbond42: puppetmaster1001 has been upgraded and is back serving requests
  • 16:21 urandom: Upgrading sessionstore200[1-3].codfw.wmnet to Cassandra 3.11.4 -- T200803
  • 16:18 urandom: Upgrading sessionstore1003.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:16 urandom: Upgrading sessionstore1002.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 16:11 @: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:07 @: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:04 thcipriani: restarting gerrit due to T224448
  • 16:04 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:01 urandom: Upgrading sessionstore1001.eqiad.wmnet to Cassandra 3.11.4 -- T200803
  • 15:42 @: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55 (duration: 05m 39s)
  • 15:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1adf74e]: Update mobileapps to c89aa55
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1074 after getting its BBU replaced T231638', diff saved to https://phabricator.wikimedia.org/P9306 and previous config saved to /var/cache/conftool/dbconfig/20191010-145737-marostegui.json
  • 14:54 moritzm: ran systemctl reset-failed on puppetmaster1001 (puppet-master.service after reimage)
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1074 after BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9305 and previous config saved to /var/cache/conftool/dbconfig/20191010-144201-marostegui.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112 into recentchanges and remove db1078 from it after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9304 and previous config saved to /var/cache/conftool/dbconfig/20191010-143924-marostegui.json
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9303 and previous config saved to /var/cache/conftool/dbconfig/20191010-143633-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9302 and previous config saved to /var/cache/conftool/dbconfig/20191010-142323-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1112 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9301 and previous config saved to /var/cache/conftool/dbconfig/20191010-141303-marostegui.json
  • 14:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 14:03 jbond42: re-enable puppet now ca has been correctly moved
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9300 and previous config saved to /var/cache/conftool/dbconfig/20191010-135806-marostegui.json
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 db1083 db1076 db1118 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9299 and previous config saved to /var/cache/conftool/dbconfig/20191010-135659-marostegui.json
  • 13:50 jbond42: disable puppet fleet-wide as puppetmaster2002 is struggling
  • 13:32 jbond42: reimage puppetmaster1001
  • 13:27 marostegui: Repool labsdb1011 after reclone - T235016
  • 13:16 arturo: added flannel 0.5.5-4 to buster-wikimedia (T235059)
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1013, es1014 after PDU maintenance (duration: 00m 58s)
  • 13:00 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 12:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1013, es1014 after PDU maintenance (duration: 00m 59s)
  • 11:57 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:57 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:48 jbond@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:46 jbond@cumin2001: Updating IPMI password on 35 hosts - jbond@cumin2001
  • 11:46 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:41 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Fix typo in beta repo data bridge config (T235033) (duration: 00m 59s)
  • 11:40 marostegui: Deploy schema change on s7 codfw master (db2118), this will generate lag on s7 codfw - T234066 T233135
  • 11:38 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:38 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:38 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:37 arturo: icinga downtime cloudvirt1023 for 2h (T227536)
  • 11:36 arturo: icinga downtime cloudvirt1025 for 2h (T227536)
  • 11:36 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:36 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:36 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:35 arturo: icinga downtime cloudvirt1026 for 2h (T227536)
  • 11:35 marostegui: Stop replication on db2077 to change triggers on db2095:3317 - T234704
  • 11:23 moritzm: installing reportbug updates from stretch point release
  • 11:22 Lucas_WMDE: EU SWAT done
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: Updating IPMI password on 1253 hosts - jbond@cumin2001
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Set dataBridgeEnabled repo setting on beta (T235033) (affects InitialiseSettings-labs.php and Wikibase.php, but Wikibase.php part is guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 11:21 jbond@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:21 jbond@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:14 Lucas_WMDE: ^ (and by CS, I actually mean Wikibase.php, not CommonSettings.php, sorry)
  • 11:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: Rename data bridge config variable names (T235033) (affects IS-labs and CS, but the CS part is all guarded by isset(), so should be safe to sync both at once, I think) (duration: 01m 00s)
  • 10:38 moritzm: rebalancing Ganeti eqiad/row C after rolling reboots of Ganeti nodes
  • 10:34 volans: uploaded spicerack_0.0.28-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 08:23 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 08:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 08:12 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Add wtp1025/wtp2001 to the list of servers using Parsoid/PHP - T233654 (duration: 01m 01s)
  • 07:55 marostegui: Stop MySQL on es1014 es1013 db1084 db1083 db1077 db1076 db1112 db1124 db1118 for on-site PDU maintenance (this will generate lag on labsdb hosts) - T227536
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:56 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:45 marostegui: Drop designate_pool_manager database from m5 - T233978
  • 06:33 marostegui: Revoke privileges from designate user on the designate_pool_manager database - T233978
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for PDU maintenance T227536', diff saved to https://phabricator.wikimedia.org/P9294 and previous config saved to /var/cache/conftool/dbconfig/20191010-055153-marostegui.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1078 into rc service for s3 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9293 and previous config saved to /var/cache/conftool/dbconfig/20191010-055102-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 db1083 db1076 db1118 for PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9292 and previous config saved to /var/cache/conftool/dbconfig/20191010-054853-marostegui.json
  • 05:47 marostegui: Depool db1084 db1083 db1076 db1118 for PDU maintenance - T227536
  • 05:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:04 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:53 marostegui: Deploy schema change on db1061 (s6 eqiad master) - T233135 T234066
  • 04:43 marostegui: Depool labsdb1011 for recloning - T235016
  • 00:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 00:39 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 00:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 00:38 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset

2019-10-09

  • 23:55 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 03m 57s)
  • 23:51 twentyafterfour@deploy1001: Started deploy [phabricator/deployment@e4e2b22]: (no justification provided)
  • 23:24 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable AMC on all wikis (T233612) (duration: 00m 58s)
  • 23:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Turn on AMC outreach modal (T234026) (duration: 00m 59s)
  • 22:01 mutante: restarting gerrit to revert replication config change (T235135)
  • 21:27 godog: swift eqiad-prod: add ms-be105[1-6] - T232367
  • 21:02 otto@deploy1001: Finished deploy [analytics/refinery@9b322e4]: (no justification provided) (duration: 00m 02s)
  • 21:02 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 21:02 otto@deploy1001: deploy aborted: (no justification provided) (duration: 38m 29s)
  • 20:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006 (duration: 01m 44s)
  • 20:53 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds, rb-dev1006
  • 20:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds (duration: 02m 42s)
  • 20:41 ppchelko@deploy1001: Started deploy [restbase/deploy@aaadd73] (dev-cluster): Switch to wikifeeds
  • 20:31 papaul: rebooting ms-be1051 to access BIOS
  • 20:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e (duration: 06m 22s)
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:28 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:28 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:23 otto@deploy1001: Started deploy [analytics/refinery@9b322e4]: (no justification provided)
  • 20:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@469ed65]: Update mobileapps to b9a225e
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 00m 10s)
  • 20:16 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:16 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 05m 34s)
  • 20:10 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:09 milimetric@deploy1001: Finished deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix (duration: 02m 23s)
  • 20:06 milimetric@deploy1001: Started deploy [analytics/refinery@46501d1]: new geoeditors column and wikipedia portal EL fix
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:56 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 19:54 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 00m 12s)
  • 19:54 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 19:54 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:52 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 08m 00s)
  • 19:44 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:44 milimetric@deploy1001: Finished deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix (duration: 09m 33s)
  • 19:34 milimetric@deploy1001: Started deploy [analytics/refinery@0a914bf]: new geoeditors column and wikipedia portal EL fix
  • 19:25 marxarelli: 1.35.0-wmf.1 promoted to group1, labswiki rolled back to 1.34.0-wmf.25 and to be kept back, cc: T233849
  • 19:09 dduvall@deploy1001: rebuilt and synchronized wikiversions files: labswiki rollback to 1.34.0-wmf.25 due to hhvm
  • 19:09 urandom: Upgrade restbase-dev1006-{a,b} to Cassandra 3.11.4 -- T200803
  • 19:09 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.1 (duration: 00m 58s)
  • 19:06 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.1
  • 18:51 urandom: Upgrade restbase-dev1005-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:45 urandom: Upgrade restbase-dev1004-{a,b} to Cassandra 3.11.4 -- T200803
  • 18:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:44 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:43 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:22 elukey: roll restart aqs on aqs100[4-9] to pick up new Druid config changes
  • 17:19 eileen: civicrm revision changed from 2ba100486e to 5a2f8048c4, config revision is 5560cc0878
  • 16:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:48 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9289 and previous config saved to /var/cache/conftool/dbconfig/20191009-160506-marostegui.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P9288 and previous config saved to /var/cache/conftool/dbconfig/20191009-153705-marostegui.json
  • 15:04 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 15:02 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1085 vslow and dump group', diff saved to https://phabricator.wikimedia.org/P9287 and previous config saved to /var/cache/conftool/dbconfig/20191009-145102-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9286 and previous config saved to /var/cache/conftool/dbconfig/20191009-144928-marostegui.json
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9285 and previous config saved to /var/cache/conftool/dbconfig/20191009-144607-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'More trafic to db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9284 and previous config saved to /var/cache/conftool/dbconfig/20191009-144400-marostegui.json
  • 14:38 elukey: cr1-eqsin: change IPv6 address for BGP peer AS4761
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9283 and previous config saved to /var/cache/conftool/dbconfig/20191009-141137-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after unexpected reboot', diff saved to https://phabricator.wikimedia.org/P9282 and previous config saved to /var/cache/conftool/dbconfig/20191009-140749-marostegui.json
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:02 moritzm: rebalancing Ganeti eqiad/row A after rolling reboots of Ganeti nodes
  • 13:48 jbond42: reimage puppetmaster2001
  • 13:37 vgutierrez: repooling cp1085 - T231525
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'depool db1075', diff saved to https://phabricator.wikimedia.org/P9280 and previous config saved to /var/cache/conftool/dbconfig/20191009-133709-marostegui.json
  • 13:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928 (duration: 14m 26s)
  • 12:59 mobrovac@deploy1001: Started deploy [restbase/deploy@aaadd73]: Parsoid: Retry fetching stashes with undefined as the revid - T234928
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9279 and previous config saved to /var/cache/conftool/dbconfig/20191009-125641-marostegui.json
  • 12:42 marostegui: Stop MySQL and power off db1074 for BBU replacement T231638
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for BBU replacement T231638', diff saved to https://phabricator.wikimedia.org/P9278 and previous config saved to /var/cache/conftool/dbconfig/20191009-124218-marostegui.json
  • 12:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2 (duration: 08m 18s)
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P9277 and previous config saved to /var/cache/conftool/dbconfig/20191009-124035-marostegui.json
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:36 moritzm: disabled puppet on DNS recursors for staged rollout of ferm NTP change
  • 12:35 jbond42: reimage puppetmaster2002
  • 12:32 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response, take #2
  • 12:30 mobrovac@deploy1001: Finished deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928 (duration: 09m 40s)
  • 12:28 vgutierrez: depooling cp1085 for a power drain - T231525
  • 12:20 mobrovac@deploy1001: Started deploy [restbase/deploy@068d2ed]: Feed: Use Wikifeeds; Parsoid: Use the ETag revid for stashing and use the same ETag for stashing and response - T170455 T234928
  • 12:13 moritzm: draining ganeti1001 for upcoming reboot (combined kernel/qemu security updates)
  • 12:10 moritzm: failover Ganeti master in eqiad to ganeti1003
  • 12:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:32 moritzm: draining ganeti1008 for upcoming reboot (combined kernel/qemu security updates)
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 Amir1: EU SWAT is done
  • 11:04 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put write both limit down to Q70m for item terms (T234948) (duration: 01m 10s)
  • 11:04 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:58 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:18 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 10:16 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:44 moritzm: draining ganeti1007 for upcoming reboot (combined kernel/qemu security updates)
  • 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:00 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 08:59 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change, temporarily pool db1085 as vslow,dump', diff saved to https://phabricator.wikimedia.org/P9276 and previous config saved to /var/cache/conftool/dbconfig/20191009-085016-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P9275 and previous config saved to /var/cache/conftool/dbconfig/20191009-084732-marostegui.json
  • 08:39 vgutierrez: Switch cp1082 from nginx to ats-tls - T231433
  • 08:24 moritzm: draining ganeti1006 for upcoming reboot (combined kernel/qemu security updates)
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:01 vgutierrez: Switch cp2011 from nginx to ats-tls - T231433
  • 07:48 moritzm: reduced RAM assignment for boron to 8G
  • 07:38 vgutierrez: Switch cp3038 from nginx to ats-tls - T231433
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:34 vgutierrez: switching from nginx to ats-tls on cp4024 - T231433
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1013, es1014 T227536 (duration: 01m 00s)
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change - lag will be generated on s6 labs', diff saved to https://phabricator.wikimedia.org/P9274 and previous config saved to /var/cache/conftool/dbconfig/20191009-051911-marostegui.json
  • 05:11 marostegui: Restart gerrit as it is down
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P9273 and previous config saved to /var/cache/conftool/dbconfig/20191009-045941-marostegui.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312', diff saved to https://phabricator.wikimedia.org/P9272 and previous config saved to /var/cache/conftool/dbconfig/20191009-044752-marostegui.json
  • 04:40 vgutierrez: switching cp5004 from nginx to ats-tls - T231433

2019-10-08

  • 23:28 mutante: phab1001 - replacing tin.eqiad.wmnet with deploy1001.eqiad.wmnet in phabricator/deployment-cache/.config:git_server - wondering if we can ever get rid of tin (T190568)
  • 23:05 ebernhardson@deploy1001: Synchronized wmf-config/: [cirrus] drop support for HHVM connection pooling (duration: 00m 59s)
  • 21:58 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Split out the CSP configuration so it can be more easily over-ridden (duration: 00m 59s)
  • 21:28 XenoRyet: updated payments-wiki from d2e2637275 to 8a65f57874
  • 21:09 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 20:38 mutante: labweb1001 - disabled 2fa for myself on Wikitech using disableOATHAuthForUser.php --wiki=labswiki to debug T234996
  • 20:24 mutante: labweb1001 - edit /srv/mediawiki/wmf-config/wikitech.php to change "false" to "true" on line 52 to enable LDAP debug logging for T234996
  • 19:51 marxarelli: 1.35.0-wmf.1 promoted to group0, cc: T233849. no rise in error rates. no new relevant errors
  • 19:43 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.1
  • 19:38 dduvall@deploy1001: Synchronized php-1.35.0-wmf.1/skins/MinervaNeue/: sync T233521 backport prior to group0 (duration: 00m 59s)
  • 19:29 shdubsh: adding swagger exporter to apt repo
  • 19:13 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache (duration: 19m 21s)
  • 18:54 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.1 and rebuild l10n cache
  • 18:53 godog: codfw-prod: more weight to ms-be205[1-6] - T233638
  • 18:45 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.24 (duration: 08m 24s)
  • 17:32 marxarelli: cutting wmf/1.35.0-wmf.1
  • 16:17 cstone: civicrm revision changed from db7ef10bfa to 2ba100486e
  • 16:00 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:58 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 15:57 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 15:30 XioNoX: remove 2 more sessions to AS12871 on cr2-esams - T232617
  • 15:20 XioNoX: add BGP sessions to AS199524 on cr2-eqdfw
  • 15:18 XioNoX: add BGP sessions to AS2635 on cr2-eqiad
  • 15:13 XioNoX: renumber BGP session to AS4761 on cr1-eqsin
  • 13:53 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:51 marostegui@cumin2001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9266 and previous config saved to /var/cache/conftool/dbconfig/20191008-135058-marostegui.json
  • 13:50 marostegui@cumin2001: dbctl commit (dc=all): 'Fully repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9265 and previous config saved to /var/cache/conftool/dbconfig/20191008-135033-marostegui.json
  • 13:49 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 marostegui@cumin2001: dbctl commit (dc=all): 'More traffic for db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9264 and previous config saved to /var/cache/conftool/dbconfig/20191008-134152-marostegui.json
  • 13:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543 (duration: 06m 04s)
  • 13:32 marostegui@cumin2001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9263 and previous config saved to /var/cache/conftool/dbconfig/20191008-133208-marostegui.json
  • 13:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8490964]: Update mobileapps to abd3543
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 db1081 db1080 db1079 db1075 db1074 after PDU maintenance', diff saved to https://phabricator.wikimedia.org/P9262 and previous config saved to /var/cache/conftool/dbconfig/20191008-131752-marostegui.json
  • 13:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1011 (duration: 00m 51s)
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P9261 and previous config saved to /var/cache/conftool/dbconfig/20191008-124417-marostegui.json
  • 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1011 (duration: 00m 51s)
  • 12:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es1012 T227138 (duration: 00m 51s)
  • 12:27 marostegui: Stop MySQL on es1012 for onsite maintenance
  • 12:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1012 T227138 (duration: 00m 51s)
  • 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:10 Urbanecm: EU SWAT done
  • 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: fb49404: Enable more transwiki import sources for hiwikisource (T234892) (duration: 00m 55s)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:58 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 10:58 jbond@cumin1001: Updating IPMI password on 1253 hosts - jbond@cumin1001
  • 10:57 jbond42: testing ipmi reset cookbook, using the current pass for both old and new so no reset actually occurs
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:57 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:57 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:22 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:21 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:19 moritzm: draining ganeti1005 for upcoming reboot (combined kernel/qemu security updates)
  • 10:16 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:15 mobrovac@deploy1001: Finished deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ (duration: 06m 32s)
  • 10:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:09 mobrovac@deploy1001: Started deploy [restbase/deploy@00eda0b]: Parsoid VE logging: log if the etags differ
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P9259 and previous config saved to /var/cache/conftool/dbconfig/20191008-093309-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P9258 and previous config saved to /var/cache/conftool/dbconfig/20191008-092627-marostegui.json
  • 09:20 marostegui: Compress logging table on db2088:3312 for idwiki,plwiki,ptwiki,zhwiki
  • 09:09 moritzm: draining ganeti1004 for upcoming reboot (combined kernel/qemu security updates)
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9257 and previous config saved to /var/cache/conftool/dbconfig/20191008-090616-marostegui.json
  • 08:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 mobrovac@deploy1001: Finished deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging (duration: 08m 05s)
  • 08:38 mobrovac@deploy1001: Started deploy [restbase/deploy@83fcc0c]: Minor updates to VE logging
  • 08:33 elukey: roll restart druid historicals and brokers on druid100[1-3] to pick up new settings - T234684
  • 08:10 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:10 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:09 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 08:05 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 07:51 moritzm: draining ganeti1003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:49 akosiaris: update OTRS to 5.0.38
  • 07:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P9256 and previous config saved to /var/cache/conftool/dbconfig/20191008-071859-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P9255 and previous config saved to /var/cache/conftool/dbconfig/20191008-071551-marostegui.json
  • 07:10 moritzm: draining ganeti1002 for upcoming reboot (combined kernel/qemu security updates)
  • 06:48 marostegui: Stop MySQL on es1011 db1082 db1081 db1080 db1079 db1075 db1074 (replication lag will appear on labs for s5) for on-site maintenance T227138
  • 06:09 marostegui: Repool labsdb1011 after mysql upgrade
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:44 elukey: drop PageCreation_7481635 table from the log db on db1107/db1108 - T233892
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 db1081 db1080 db1079 db1075 db1074 for PDU maintenance T227138', diff saved to https://phabricator.wikimedia.org/P9254 and previous config saved to /var/cache/conftool/dbconfig/20191008-054127-marostegui.json
  • 05:35 elukey: drop CitationUsage tables from the log database on db1107/db1108 (the ones listed in the task) - T233893
  • 05:25 marostegui: Depool labsdb1011 for mysql upgrade
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P9253 and previous config saved to /var/cache/conftool/dbconfig/20191008-051435-marostegui.json
  • 05:10 marostegui: Reload query killer on labsdb1011
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 T233625', diff saved to https://phabricator.wikimedia.org/P9252 and previous config saved to /var/cache/conftool/dbconfig/20191008-050833-marostegui.json
  • 05:07 marostegui: Deploy schema change on db1097:3315 - T233625
  • 03:04 andrewbogott: restarted nova-conductor on cloudcontrol1003 and cloudcontrol1004 — experimental band-aid for T234876
  • 00:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)

2019-10-07

  • 23:52 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:26 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 00m 49s)
  • 23:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:21 dzahn@cumin1001: Updating IPMI password on 1254 hosts - dzahn@cumin1001
  • 23:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 7b9e6829821, T156095 (duration: 00m 51s)
  • 22:29 chaomodus: restart nagios-nrpe-server on stat1007
  • 21:56 mutante: gerrit2001 - sudo rm /etc/apache2/sites-available/50-gerrit-slave-wikimedia-org.conf
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Run Labs config after CSP config so it can change it (duration: 00m 51s)
  • 21:20 godog: swift codfw-prod: add ms-be205[3456] - T233638
  • 20:56 XenoRyet: updated payments-wiki from b94da68f7e to d2e2637275
  • 20:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:33 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:31 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 20:31 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:30 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:29 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add the beta REL1_34 to ExtensionDistributor (duration: 00m 50s)
  • 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:18 herron@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:10 Lucas_WMDE: Morning SWAT done
  • 19:09 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/Wikibase: SWAT: Revert "Format coordinates with limited precision" (T174504) (duration: 00m 57s)
  • 18:33 Lucas_WMDE: reopen Morning SWAT for another backport (sorry)
  • 18:26 Urbanecm: Morning SWAT done
  • 18:25 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: 011b6eb: 11033b7: Update VE core submodule to 2ffb699eb (TreeModifier fixes), T234489, T234742 + ve.ui.MWDefinedTransclusionContextItem: Fix handling of template names (T234817) (duration: 00m 53s)
  • 18:16 godog: roll-restart logstash to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/539978
  • 18:12 andrewbogott: apt dist-upgrade on all cloudvirts (for nova upgrades)
  • 18:12 godog: start swiftrepl eqiad -> codfw (no deletes)
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: f434ae3: Enable NewUserMessage on sq.wikipedia and sq.wikiquote (T234499) (duration: 00m 52s)
  • 18:07 jgleeson: Updating civicrm from c12f7bb51f to db7ef10bfa
  • 17:46 ottomata: stat1007 is unresponsive, can't login via mgmt either. powercycling.
  • 17:29 XioNoX: add BGP route damping on IX sessions - eqiad - T222424
  • 17:27 XioNoX: add BGP route damping on IX sessions - esams - T222424
  • 17:22 XioNoX: add BGP route damping on IX sessions - eqsin - T222424
  • 15:34 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae (duration: 06m 28s)
  • 15:30 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 15:27 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@334e809]: Update mobileapps to 16cb9ae
  • 15:27 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop writing wmgVisualEditorEnableNewMobileContext (duration: 00m 51s)
  • 15:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgVisualEditorEnableNewMobileContext (duration: 00m 52s)
  • 14:25 arturo: upgrading openstack in CloudVPS. Some IRC bots and related stuff may be unavailable (T212302)
  • 14:17 marostegui: Deploy schema change on db1139:3316 - T233135 T234066
  • 13:27 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set all of wikidata to write both for item term store (T225055) (duration: 00m 54s)
  • 13:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2 (duration: 06m 38s)
  • 13:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1337290]: Minor tweaks to VE logging, v2
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9248 and previous config saved to /var/cache/conftool/dbconfig/20191007-131720-marostegui.json
  • 13:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging (duration: 07m 01s)
  • 13:13 elukey: upload python-kafka and python3-kafka 1.4.7-1 to buster-wikimedia - T222941
  • 13:09 mobrovac@deploy1001: Started deploy [restbase/deploy@bf72f5c]: Minor tweaks to VE logging
  • 13:05 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: (no justification provided) (duration: 00m 29s)
  • 13:04 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: (no justification provided)
  • 13:04 mobrovac@deploy1001: deploy aborted: Minor tweaks to VE logging (duration: 01m 07s)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change T233625', diff saved to https://phabricator.wikimedia.org/P9247 and previous config saved to /var/cache/conftool/dbconfig/20191007-130317-marostegui.json
  • 13:03 mobrovac@deploy1001: Started deploy [restbase/deploy@fe39197]: Minor tweaks to VE logging
  • 12:54 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=restrouter
  • 12:54 elukey: upload python-kafka and python3-kafka 1.4.7-1 to stretch-wikimedia - T222941
  • 11:44 Lucas_WMDE: EU SWAT done
  • 11:44 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Get rid of main page hack for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:42 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085) (duration: 00m 52s)
  • 11:41 Amir1: another hack bites the dust
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/GrowthExperiments/: SWAT: Homepage: Don't use flexbox for vertical layouts in mobile start module (T234380) (duration: 00m 53s)
  • 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable partial blocks on nlwiki (T234685) (duration: 00m 52s)
  • 11:16 arturo: added bdsync 0.11.1-1~wmf1 to buster-wikimedia (T234683)
  • 10:59 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5 (duration: 04m 17s)
  • 10:55 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #5
  • 10:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4 (duration: 04m 27s)
  • 10:50 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #4
  • 10:48 mobrovac@deploy1001: Finished deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3 (duration: 03m 53s)
  • 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@5321aac]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #3
  • 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 51s)
  • 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:31 _joe_: uploading confd 0.16.0 to stretch
  • 10:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2 (duration: 01m 56s)
  • 10:19 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests, take #2
  • 10:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772 (duration: 05m 58s)
  • 10:10 mobrovac@deploy1001: Started deploy [restbase/deploy@1798e39]: Skip checking resources on start-up, add banwiki, add metrics/mediarequests end points and log all VE requests - T233127 T234772
  • 09:55 marostegui: Deploy schema change on db2129 (s6 codfw master), this will generate lag on s6 codfw - T233135 T234066
  • 08:34 hashar: gerrit: force reindexing all changes ( gerrit index start changes --force )
  • 07:09 marostegui: Remove grants for dbproxy1006 on m1 databases - T231280
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change T233625', diff saved to https://phabricator.wikimedia.org/P9246 and previous config saved to /var/cache/conftool/dbconfig/20191007-065645-marostegui.json
  • 06:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1011 T227138 (duration: 01m 10s)
  • 06:08 elukey: upgrade python-kafka on eventlog1002 to 1.4.7-1 (manually via dpkg -i) - T222941
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:25 marostegui: Deploy schema change on db2124 T233135 T234066
  • 05:10 marostegui: The above was for db2095:3316 T234704
  • 05:08 marostegui: Stop replication on db2076 to modify triggers on db2096:3316 T234704
  • 05:02 marostegui: Fix replication on labsdb1011:s8
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9245 and previous config saved to /var/cache/conftool/dbconfig/20191007-045411-marostegui.json

2019-10-06

  • 20:11 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Racconish /home/urbanecm/T234741 (T234741)
  • 19:15 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy1018, dbproxy1019
  • 06:47 elukey: delete old cron entry 'xenon_generate_svgs' (user xenon) on webperf[12]002 to reduce cronspam

2019-10-05

  • 06:48 elukey: force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory

2019-10-04

  • 22:06 mutante: ms-be1020 - power cycle via mgmt - host down
  • 20:43 krinkle@deploy1001: Synchronized w/static.php: 9648e03, 97d9384 (duration: 00m 53s)
  • 20:41 mutante: deploy1001 / deploy2001 - remove python-pygerrit2 (version for python3 is needed instead)
  • 20:32 mutante: gerrit1001 - scp /usr/share/java/mysql-connector-java.jar from cobalt into /usr/share/java/ on gerrit1001 and then symlink into /var/lib/gerrit2/review_site/lib/ (T222391)
  • 19:27 mutante: wtp1025 - mediawiki appserver classes are being applied, install in progress will trigger some new icinga alerts
  • 14:03 marostegui: Deploy schema change on db2117 T233135 T234066
  • 13:50 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:47 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 13:36 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:28 marostegui: Deploy schema change on db2097:3316 T233135 T234066
  • 12:23 elukey: cleaned up old files and apt-cache from an-coord1001
  • 08:41 marostegui: Deploy schema change on db2076 (sanitarium master) with replication T233135 T234066
  • 08:32 _joe_: reuploading the old confd package to stretch-wikimedia, some incompatibility detected
  • 07:26 elukey: execute gnt-instance remove kerberos1001 on ganeti1001 - T234600
  • 07:24 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Deploy schema change on db2114 T233135 T234066
  • 06:22 _joe_: downgrading confd back to 0.9.0 while some templates get fixed.
  • 06:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui: Deploy schema change on dbstore1005:3316 T233135 T234066
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:53 _joe_: upgrading confd on puppetmaster1001 T147204
  • 05:50 _joe_: uploading confd 0.16.0 on stretch T147204
  • 05:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to es1019 after on-site maintenance T233698 (duration: 00m 51s)
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9240 and previous config saved to /var/cache/conftool/dbconfig/20191004-051112-marostegui.json
  • 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool es1019 after on-site maintenance T233698 (duration: 00m 53s)

2019-10-03

  • 23:50 mutante: gerrit - restarting for replication config tweaks
  • 20:05 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:01 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 19:52 XenoRyet: updated payments-wiki from 80dead6444 to b94da68f7e
  • 19:40 mutante: mw1290 - depooled and scheduled downtime in Icinga for hardware maintenance T234153
  • 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 19:30 marxarelli: 1.34.0-wmf.25 promoted to all wikis, cc: T220750. no rise in relevant error rates. no new errors
  • 19:21 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.25
  • 19:19 mutante: puppetmaster1001 - revoke cert for parsoid.discovery.wmnet - creating new ones for each DC and a unified one with both (T233654)
  • 19:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 18:52 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cached? (duration: 00m 59s)
  • 18:43 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c2b3d7c (duration: 00m 59s)
  • 18:14 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: no-op / config cache issue? (duration: 01m 00s)
  • 18:03 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 5389d0243ee9c (duration: 01m 01s)
  • 17:13 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7 (duration: 06m 06s)
  • 17:07 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@31b2703]: Update mobileapps to 1db84a7
  • 13:49 elukey: roll restart hadoop yarn resource managers for openssl updates on Hadoop workers
  • 13:44 marostegui: Stop MySQL and shutdown es1019 for on-site maintenance - T233698
  • 13:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es1019 for on-site maintenance T233698 (duration: 01m 01s)
  • 13:29 hashar: Gerrit should be back
  • 13:26 hashar: restarting Gerrit due to a deadlock in SendEmail task and AccountCacheImpl
  • 13:22 hashar: Gerrit might be dead again; taking traces
  • 13:04 _joe_: restarting php7 on mw1275
  • 12:54 onimisionipe: force shard allocation on eqiad chi cluster
  • 10:27 elukey: killed rsync processes in "D" state on stat1007, force umount/mount of /mnt/hdfs
  • 10:25 jbond42: rolling upgrade of openssl packages
  • 10:21 Urbanecm: Manually cleared signup throttle for IP 80.188.128.54 at cswiki, issue with introduced throttle rule
  • 10:20 Urbanecm: Manually cleared signup throttle for IP 88.100.221.84 at cswiki, issue with introduced throttle rule
  • 10:18 Urbanecm: Manually cleared signup throttle for IP 90.176.155.12 at cswiki, issue with introduced throttle rule
  • 09:32 elukey: run apt-get autoremove incrementally on all the hadoop prod workers to remove python2 deps (and verify that they are not used anymore by Hadoop)
  • 08:33 marostegui: Deploy schema change on db2087:3316 T233135 T234066
  • 08:28 marostegui: Deploy schema change on db1096:3316 - T233625
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change T233135 T234066', diff saved to https://phabricator.wikimedia.org/P9236 and previous config saved to /var/cache/conftool/dbconfig/20191003-082651-marostegui.json
  • 08:15 akosiaris: slowly rolling restart all pods in eqiad, codfw, staging for log rollover before merging https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539912
  • 07:49 marostegui: Set notes on the sanitarium masters - T234039
  • 07:19 marostegui: Remove unused labspuppet database from m5 - T233281
  • 07:03 @: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 07:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 06:59 eileen: tools revision changed from e1b81688c6 to b3c7453be2
  • 06:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 06:48 marostegui: Drop database grants on m5 for labspuppet - T233281
  • 06:37 marostegui: Rename tables on m5 master on designate_pool_manager - T233978
  • 06:16 marostegui: Deploy schema change on db2089:3316 T233135 T234066
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:28 eileen: civicrm revision changed from 12c5727a23 to c12f7bb51f, config revision is 422a0f7d48
  • 02:07 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1c599baea51f9 (duration: 01m 03s)
  • 01:05 mutante: gerrit1001 - shutdown - scheduled downtime
  • 00:51 mutante: gerrit1001 - removing wrong IPv6 address from interface, running puppet

2019-10-02

  • 23:42 XioNoX: enable cr2-eqiad:xe-4/0/0 - T234416
  • 23:38 XioNoX: disable cr2-eqiad:xe-4/0/0 - T234416
  • 23:22 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 00s)
  • 23:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/CirrusSearch/: T234445: CirrusSearch: Fix Precondition failed: Must have a resultset set (duration: 01m 02s)
  • 22:29 godog: remove queued messages from mx1001 for fr-tech-ops@, triggering sender rate limit from gmail
  • 22:12 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:11 jforrester@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 00m 59s)
  • 22:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: VE unstructured logging, part II (duration: 00m 58s)
  • 22:06 jforrester@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: VE unstructured logging, part I (duration: 01m 00s)
  • 21:17 mutante: cobalt (gerrit) rsyncing /srv/gerrit/git and /srv/gerrit/plugins data to gerrit1001 again after reinstall and fixing gerrit2 UID/GID (T222391)
  • 21:13 mutante: gerrit1001 - rebooting
  • 21:08 mutante: gerrit1001 changing GID of gerrit2 user to 119 in /etc/group ; find / -uid 499 -exec chown gerrit2 {} \; find / -gid 1001 -exec chown gerrit2:gerrit2 {} \; (T222391)
  • 21:03 mutante: gerrit1001 changing UID of gerrit2 user to 114 and GID to 119 in /etc/passwd to match cobalt to avoid privilege issues after rsyncing data (T222391)
  • 19:58 mutante: puppetmaster1001 - sudo puppet cert clean parsoid.discovery.wmnet (only created yesterday but does not have all the SANs it needs, updating with more SANs) (T233654)
  • 19:47 Jeff_Green: deployed icinga fundraising-nsca collection configuration change
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:33 marxarelli: 1.34.0-wmf.25 promoted to group1, cc: T220750. no rise in relevant error rates
  • 19:23 dduvall@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.25 (duration: 00m 59s)
  • 19:22 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.25
  • 18:28 XioNoX: add BGP route damping on IX sessions - eqord - T222424
  • 18:25 XioNoX: add BGP route damping on IX sessions - eqdfw - T222424
  • 18:15 XioNoX: add BGP route damping on IX sessions - ulsfo - T222424
  • 17:08 Lucas_WMDE: Morning SWAT done
  • 17:03 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: SWAT: vector.js: Remove eager calculation of p-cactions width on page load (duration: 01m 00s)
  • 16:53 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: Enabling revision-score stream in eventstreams
  • 16:50 otto@deploy1001: Started restart [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:50 otto@deploy1001: Finished deploy [eventstreams/deploy@dbc9bbb]: (no justification provided) (duration: 00m 01s)
  • 16:50 otto@deploy1001: Started deploy [eventstreams/deploy@dbc9bbb]: (no justification provided)
  • 16:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/: SWAT: ApiVisualEditor: Add logging for RESTBase HTTP errors (T233127) + ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 04s)
  • 16:42 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/VisualEditor/: SWAT: ApiVisualEditorEdit: Add logging for funny etags (T233320) (duration: 01m 03s)
  • 15:31 godog: correction, add ms-be2052
  • 15:29 godog: swift codfw-prod: add ms-be2051 T233638
  • 15:13 godog: run swiftrepl eqiad -> codfw on ms-fe1005 (no deletes)
  • 14:31 moritzm: installing libxslt security updates on stretch
  • 14:16 moritzm: installing babeltrace bugfix update from buster point release
  • 13:18 moritzm: installing mariadb-10.3 update from buster point release (just client side libs, tools)
  • 13:16 moritzm: installing console-setup bugfix update from buster point release
  • 11:28 moritzm: installing cryptsetup bugfix from buster 10.1 point release
  • 11:26 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 01711d5: Enable partial blocks at ptwiki (T233754) (duration: 00m 55s)
  • 11:26 jbond42: update puppet.eqiad.wmnet to puppetmaster2001
  • 11:24 jbond42: update puppet.esams.wmnet to puppetmaster2001
  • 11:20 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set new MFMobileFormatterOptions config using old config (T232690) (duration: 01m 01s)
  • 11:15 _joe_: testing the package on restbase-dev1006
  • 11:14 _joe_: uploaded service-checker 0.2.0 to stretch-wikimedia
  • 11:12 pmiazga@deploy1001: Synchronized wmf-config/mobile.php: SWAT: Do not set wgMFNoindexPages config flag in mobile.php (T206497) (duration: 01m 14s)
  • 10:17 gehel@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:17 gehel@cumin1001: START - Cookbook sre.hosts.decommission
  • 09:41 moritzm: rebalancing Ganeti/codfw Row A after rolling reboot of Ganeti nodes
  • 07:46 moritzm: upgrading remaining stretch hosts to ferm 2.4.2pre
  • 06:23 marostegui: Fix replication on labsdb1011:s7 - T233986
  • 06:17 marostegui: Fix replication on labsdb1011:s1 - T233986
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 04:07 vgutierrez: restarting trafficserver-tls on cp5007
  • 00:54 ejegg: updated fundraising CiviCRM from 6d90d0cf06 to 12c5727a23
  • 00:34 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/resources/src: 5eb3ae1 (duration: 01m 00s)
  • 00:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: d30064229f9 (duration: 00m 59s)

2019-10-01

  • 23:46 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditor.php: T233127: ApiVisualEditor: Add logging for RESTBase HTTP errors (duration: 00m 58s)
  • 23:44 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.25/extensions/WikimediaEvents/modules/all/ext.wikimediaEvents.searchSatisfaction.js: T233211: Deploy cirrussearch glent m0 a/b test (duration: 00m 59s)
  • 23:28 mutante: cobalt (gerrit) rsyncing /srv/gerrit/plugins dir, push to new server gerrit1001 (T222391)
  • 23:21 mutante: gerrit1001 - chown -R gerrit2:gerrit2 /srv/gerrit/git/ (T222391)
  • 23:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233211: CirrusSearch: Configuration for glent m0 AB test (duration: 00m 58s)
  • 23:12 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T233127: Add VisualEditor logging channel to wmgMonologChannels (duration: 00m 59s)
  • 22:30 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 22:19 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 21:34 godog: swift codfw-prod: add ms-be2051 with minimal weight - T233638 T222366
  • 21:33 krinkle@deploy1001: Synchronized php-1.34.0-wmf.25/skins/Vector/: bb2fd9cf9c22cc (duration: 01m 00s)
  • 21:29 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 21:29 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 20:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 20:10 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:58 mutante: cobalt (gerrit) - rsyncing gerrit data to gerrit1001 in a screen session (T222391)
  • 19:47 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 19:47 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 19:42 marxarelli: 1.34.0-wmf.25 promoted to group0 cc: T220750. no rise in relevant error rates
  • 19:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.25
  • 19:30 marxarelli: promoting 1.34.0-wmf.25 to group0
  • 19:28 dduvall@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache (duration: 19m 31s)
  • 19:08 dduvall@deploy1001: Started scap: testwiki to php-1.34.0-wmf.25 and rebuild l10n cache
  • 19:07 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.23 (duration: 01m 32s)
  • 19:04 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.22 (duration: 01m 41s)
  • 19:02 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.21 (duration: 01m 57s)
  • 19:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.rotate-pdu-password (exit_code=1)
  • 19:00 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 18:59 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.20 (duration: 02m 11s)
  • 18:57 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.19 (duration: 02m 12s)
  • 18:54 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.17 (duration: 02m 48s)
  • 18:48 dduvall@deploy1001: Pruned MediaWiki: 1.34.0-wmf.16 (duration: 18m 45s)
  • 17:53 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:52 thcipriani: gerrit restart for new config changes incoming
  • 17:52 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.rotate-pdu-password (exit_code=97)
  • 17:48 ayounsi@cumin1001: START - Cookbook sre.hosts.rotate-pdu-password
  • 17:48 XioNoX: rotate PDUs passwords - T233053
  • 17:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 17:14 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T156095 - c28baa1862401 (duration: 00m 59s)
  • 17:07 mutante: Welcome new deployer Andrew Kostka (WMDE) (T233202)
  • 17:07 marxarelli: cutting wmf/1.34.0-wmf.25
  • 16:16 _joe_: manually downgrading php-geoip on deploy*, it was still at the 7.0-only version from the distro
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:10 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 16:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 15:36 _joe_: temporarily uninstalling the math rendering related packages from mwdebug2002 as a test for T195847
  • 15:36 elukey: powercycle an-conf1001 to test some bios settings
  • 15:12 jbond42: puppetmaster2001 is back online
  • 14:34 dcausse: created cirrussearch indices for nqowiki (T234326)
  • 14:18 moritzm: rebooting krb1001 for some tests
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:10 hashar: Restarting CI Jenkins
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/labs/private ; git rev-parse HEAD | sudo tee /srv/config-master/labsprivate-sha1.txt )
  • 14:08 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ (cd /var/lib/git/operations/puppet ; git rev-parse HEAD | sudo tee /srv/config-master/puppet-sha1.txt )
  • 14:08 herron: beginning rolling reboots of eqiad and codfw logstash collectors
  • 14:02 moritzm: rebooting mw1265 for some tests
  • 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:59 cdanis: ✔️ cdanis@puppetmaster2001.codfw.wmnet ~ 🕙☕ sudo touch /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt && sudo chown gitpuppet:gitpuppet /srv/config-master/puppet-sha1.txt /srv/config-master/labsprivate-sha1.txt
  • 13:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:24 jbond42: reimage puppetmaster2001
  • 12:37 hashar: Gerrit misbehaved temporarily due to human operator error (hashar ran jstack -l -m, which brings the JVM to a halt)
  • 11:16 jbond42: update puppet.ulsfo.wmnet to point to puppetmaster1001
  • 10:45 jbond42: update puppet.eqsin.wmnet to point to puppetmaster1001
  • 10:17 moritzm: upgrading ferm on remaining mw servers 2.4.2pre T153468
  • 09:35 moritzm: run systemctl reset-failed on puppetmaster2002 to clear failed puppet-master.service
  • 09:19 moritzm: upgrading ferm on a number of systems to 2.4.2pre T153468
  • 09:07 vgutierrez: restarting acme-chief on acmechief1001 to catch up with python3-cryptography upgrades - T234131
  • 09:04 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acme-chief hosts - T234131
  • 09:03 moritzm: rebalancing ganeti/row_B after rolling reboot
  • 08:57 vgutierrez: upgrading python3-cryptography to version 2.6.1-3+deb10u1~wmf1 on acmechief-test1001 - T234131
  • 08:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:00 moritzm: draining ganeti2003 for upcoming reboot (combined kernel/qemu security updates)
  • 07:00 hashar: gerrit: forcing reindex of changes # T233989
  • 06:29 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 06:29 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:28 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:28 elukey@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9223 and previous config saved to /var/cache/conftool/dbconfig/20191001-061956-marostegui.json
  • 05:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 05:12 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 00:12 mutante: phabricator - upgrading PHP version to 7.2.22 - T230024

2019-09-30

  • 23:28 niharika29@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/CentralNotice/resources/infrastructure/: CentralNotice: Replace deprecated editToken with csrfToken - T233538 (duration: 00m 57s)
  • 23:23 AndyRussG: updated fruec from c591bd653b to 18d89675d0
  • 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1290.eqiad.wmnet
  • 21:47 mutante: mw1290 - scap pull to get it in sync with latest deployment - it was down during scap run for T234153
  • 21:42 jforrester@deploy1001: Synchronized robots.txt: Remove old InternetArchive bot rule that's been disabled since 2008 T7582 (duration: 00m 57s)
  • 21:40 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T222539 Drop no-op hacky disablement of MessageBlobStore::clear() (duration: 05m 13s)
  • 21:38 James_F: sync failure on mw1290.eqiad.wmnet – Connection timed out
  • 21:26 mutante: mw1290 - downtimed for onsite work on mgmt, depooled earlier
  • 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 21:08 XioNoX: delete BGP to AS131285 on cr1-eqsin
  • 20:43 arlolra: Updated Parsoid to 1922eb6 (T233459, T230359, T208070)
  • 20:43 arlolra: T208070
  • 20:34 arlolra@deploy1001: Finished deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6 (duration: 08m 39s)
  • 20:25 arlolra@deploy1001: Started deploy [parsoid/deploy@a6da34c]: Updating Parsoid to 1922eb6
  • 20:06 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f (duration: 05m 55s)
  • 20:00 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@1f9fedd]: Update mobileapps to 131b83f
  • 19:15 XenoRyet: Updated payments-wiki from 5193dcdfa9 to 80dead6444
  • 17:37 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 03m 03s)
  • 17:33 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:24 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:18 twentyafterfour@deploy1001: Finished deploy [releng/phatality@62e2870]: fix T234223 (duration: 00m 05s)
  • 17:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:15 twentyafterfour@deploy1001: deploy aborted: fix T234223 (duration: 06m 24s)
  • 17:10 twentyafterfour: deploy failed
  • 17:09 twentyafterfour@deploy1001: Started deploy [releng/phatality@62e2870]: fix T234223
  • 17:08 twentyafterfour: deploying minor update to phatality to fix T234223
  • 16:35 cdanis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:34 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0aa4b4b (duration: 00m 57s)
  • 16:34 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226 (duration: 01m 17s)
  • 16:32 krinkle@deploy1001: Synchronized wmf-config/abusefilter.php: 0aa4b4b (duration: 00m 57s)
  • 16:32 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@79db711]: Take job domain into account for deduplication T234226
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:25 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 16:25 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:49 moritzm: installing console-setup bugfixes from Buster 10.1 point release
  • 15:46 cdanis@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 15:46 cdanis@cumin1001: START - Cookbook sre.ganeti.makevm
  • 15:42 moritzm: failover Ganeti master in codfw to ganeti2001
  • 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:29 moritzm: draining ganeti2007 for upcoming reboot (combined kernel/qemu security updates)
  • 14:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:08 moritzm: draining ganeti2006 for upcoming reboot (combined kernel/qemu security updates)
  • 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:54 moritzm: draining ganeti2005 for upcoming reboot (combined kernel/qemu security updates)
  • 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:49 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 kart_: Update cxserver to 2019-09-26-034732-production (T233834, T232674, T233085)
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:29 jbond42: offline puppetmaster2002 to reimage https://gerrit.wikimedia.org/r/c/operations/puppet/+/539322
  • 12:27 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 12:00 Urbanecm: EU SWAT done #2
  • 12:00 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 3f4f242: New throttle rule for Czech wiki course (T234113) (duration: 00m 56s)
  • 11:57 Urbanecm: Reopen EU SWAT to deploy throttle rule for October 02 (T234113)
  • 11:54 raynor: EU SWAT finished
  • 11:54 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable alternate mobile link for it, nl, ko wikis. (T206497) (duration: 00m 57s)
  • 11:27 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 539517|Enable CX out of beta in Tagalog and Central Bikol WPs (T233006, T233007) (duration: 00m 59s)
  • 11:20 hashar: Restarting Docker on integration-agent-puppet-docker-1001 # T234197
  • 11:08 hashar: Restarting Docker on CI agents to clear out some docker/iptables oddity # T234197
  • 10:48 hashar: CI outage is tracked in https://phabricator.wikimedia.org/T234197
  • 10:42 moritzm: draining ganeti2004 for upcoming reboot (combined kernel/qemu security updates)
  • 10:40 hashar: CI down due to some DNS related failure on the hosts :-\
  • 10:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:30 moritzm: uploading ferm 2.4.1+wmf2+deb9u1 for stretch-wikimedia, fixes AAAA lookups (T153468)
  • 09:11 moritzm: draining ganeti2002 for upcoming reboot (combined kernel/qemu security updates)
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 for a schema change - T233625', diff saved to https://phabricator.wikimedia.org/P9217 and previous config saved to /var/cache/conftool/dbconfig/20190930-091043-marostegui.json
  • 08:01 moritzm: installing e2fsprogs security updates on Stretch/Buster
  • 07:56 marostegui: Stop dbstore1003:3311 for troubleshooting
  • 06:47 moritzm: installing exim security updates on buster

2019-09-28

  • 16:28 vgutierrez: restarting acme-chief on acmechief1001

2019-09-27

  • 22:44 mutante: phab2001 - apt-get autoremove - remove unused python and ruby packages
  • 22:36 mutante: phab2001 - upgrade php7.2 packages to 7.2.22 (T230024)
  • 22:03 mutante: webperf1001, webperf2001: restart envoyproxy to pick up new cert with the right subject alt. names
  • 18:22 mutante: mwdebug1001, mwdebug1002 - deleted from /srv/mediawiki/: php-1.34.0-wmf.16, .17, .18, .19 and .20 (current is .24) - usage back to about 57% (T234063)
  • 18:17 mutante: mwdebug1001, mwdebug1002 - apt-get clean saves about 3GB and gets usage down from 94% to 87% on / (T234063)
  • 16:01 XioNoX: delete BGP to AS34305 on cr2-esams
  • 15:34 elukey: update pcc facts to add new hosts
  • 15:02 moritzm: installing usb.ids update from Buster 10.1 point release
  • 14:45 moritzm: installing ncurses bugfix update from Buster 10.1 point release
  • 14:39 moritzm: installing postgresql-common bugfix update from Buster 10.1 point release
  • 14:32 effie: Disable puppet and reload apache on mw* for 539465 and 539488 - T229792
  • 13:33 marostegui: Set candidate masters in dbctl T234039
  • 13:31 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:29 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:16 moritzm: reimaging auth1002 to buster
  • 13:09 akosiaris: reboot ganeti2001 T233906
  • 13:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 13:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:03 effie: Disable puppet on mwmaint1002 to test noc.wikimedia.org with PHP7
  • 12:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:48 moritzm: installing openldap security updates on Buster
  • 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:37 moritzm: killing stray processes from old openjdk-8 build on boron (probably test suite not properly terminated)
  • 12:30 moritzm: installing glib2.0 security updates on Buster
  • 12:14 moritzm: reimaging auth2001 to buster
  • 12:06 moritzm: install gnupg2 security update from Buster 10.1 point release
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9213 and previous config saved to /var/cache/conftool/dbconfig/20190927-104914-marostegui.json
  • 10:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 10:02 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: New throttle rule for Czech course (T234024) (duration: 00m 59s)
  • 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:06 moritzm: running a few ferm tests on cp1008, puppet disabled
  • 07:36 godog: swift eqiad-prod: remove ms-be1027 - T233289
  • 05:42 XioNoX: remove tcp-mss clamping from cr2-eqiad - T232602
  • 05:30 XioNoX: remove tcp-mss clamping from cr2-eqord - T232602
  • 05:23 XioNoX: remove tcp-mss clamping from cr1-eqiad - T232602
  • 00:53 twentyafterfour: hotfixing phabricator fatal exception refs T233998

2019-09-26

  • 22:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211620 Enable emails for certain notification types by default on officewiki (duration: 00m 56s)
  • 22:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgPageTriageNoIndexTemplates, never read (duration: 00m 57s)
  • 22:02 jforrester@deploy1001: Synchronized wmf-config/filebackend.php: T228547 Stop sharding wgFileBackends shardViaHashLevels for math-render (duration: 00m 56s)
  • 21:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T228547 Stop setting wgMathFileBackend, wgMathPath, wgMathDirectory (unused) (duration: 00m 56s)
  • 21:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T228547 Stop setting wgTexvc, wgMathTexvcCheckExecutable, wgMathCheckFiles (unused) (duration: 01m 00s)
  • 20:53 ejegg: updated fundraising CiviCRM from 52d2a24404 to 6d90d0cf06
  • 19:58 phedenskog@deploy1001: Finished deploy [performance/navtiming@1880a79]: Test deploy (duration: 00m 05s)
  • 19:58 phedenskog@deploy1001: Started deploy [performance/navtiming@1880a79]: Test deploy
  • 19:52 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:52 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 19:46 phedenskog@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 19:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.24 refs T220749
  • 19:17 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (test) (duration: 00m 16s)
  • 19:17 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release (test)
  • 19:13 twentyafterfour: preparing to deploy the mediawiki train for 1.34.0-wmf.24. refs T220749
  • 18:45 ayounsi@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 22s)
  • 18:44 ayounsi@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:35 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: Stop setting various static settings, now set in IS (duration: 01m 04s)
  • 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101) (duration: 06m 04s)
  • 18:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set last static Cirrus settings directly in IS (duration: 01m 07s)
  • 18:29 mforns@deploy1001: Started deploy [analytics/refinery@cd2f43b]: deploy refinery using scap (together with refinery-source v0.0.101)
  • 18:25 volans@deploy1001: Finished deploy [homer/deploy@715d842]: Initial Homer release (duration: 00m 23s)
  • 18:25 volans@deploy1001: Started deploy [homer/deploy@715d842]: Initial Homer release
  • 18:17 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop indirectly setting wgWMESearchRelevancePages (duration: 01m 04s)
  • 18:15 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 31s)
  • 18:15 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:11 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgWMESearchRelevancePages directly in InitialiseSettings (duration: 01m 04s)
  • 18:07 ayounsi@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 55s)
  • 18:06 ayounsi@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:04 mutante: running mcrouter_generate_certs to add a cert for wtp2001.codfw.wmnet for T233654
  • 18:04 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 03s)
  • 18:04 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 18:03 volans@deploy1001: Finished deploy [homer/deploy@68ac5cc]: Initial Homer release (duration: 00m 42s)
  • 18:02 volans@deploy1001: Started deploy [homer/deploy@68ac5cc]: Initial Homer release
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting bits of the CirrusSearch timeouts arrays, already set in IS (duration: 01m 04s)
  • 17:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set the whole of the CirrusSearch timeouts arrays directly (duration: 01m 00s)
  • 17:49 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Stop setting static values now set in InitialiseSettings (duration: 01m 04s)
  • 17:49 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move static settings from CirrusSettings-common (duration: 01m 05s)
  • 17:43 ppchelko@deploy1001: Finished deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211 (duration: 02m 04s)
  • 17:41 ppchelko@deploy1001: Started deploy [changeprop/deploy@2db4bff]: Modify ORES processor for new-style events T225211
  • 17:35 elukey: run apt-get autoremove on stat* and notebook* to clean up old python2 deps
  • 17:31 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T233835, T233246)
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:13 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqiad
  • 17:11 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 17:08 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:40 papaul: upgrading firmware on scs-c1-codfw
  • 16:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕛☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s codfw
  • 15:56 cdanis: sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s esams
  • 15:35 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s ulsfo
  • 15:15 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕚☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s eqsin
  • 15:06 mforns@deploy1001: Finished deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap (duration: 02m 44s)
  • 15:03 mforns@deploy1001: Started deploy [analytics/aqs/deploy@1a1c08c]: Deploying analytics-aqs using scap
  • 15:00 cdanis: dbctl schema migration done T229677
  • 14:47 cdanis: dbctl schema migration on instances to add note field https://wikitech.wikimedia.org/wiki/Dbctl#Schema_upgrades T229677
  • 14:43 cdanis@cumin1001: dbctl commit (dc=all): 'dbctl 1.2.0 adds hostByName to the output, but it is not used by Mediawiki; this commit is the first made with the new release; no-op change', diff saved to https://phabricator.wikimedia.org/P9208 and previous config saved to /var/cache/conftool/dbconfig/20190926-144328-cdanis.json
  • 14:41 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s cumin
  • 14:37 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo debdeploy deploy -u 2019-09-26-conftool.yaml -s puppetmaster
  • 14:36 cdanis: ✔️ cdanis@puppetmaster1001.eqiad.wmnet ~ 🕥☕ sudo apt install python3-conftool
  • 14:19 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕥☕ sudo -E reprepro -C main include jessie-wikimedia conftool_1.2.0-1+deb8u1_amd64.changes
  • 14:16 cdanis: ✔️ cdanis@install1002.wikimedia.org ~ 🕙☕ sudo -E reprepro -C main include buster-wikimedia conftool_1.2.0-1+deb10u1_amd64.changes ; sudo -E reprepro -C main include stretch-wikimedia conftool_1.2.0-1_amd64.changes
  • 11:31 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='Nederlandse Leeuw' /home/urbanecm/T233922 (T233922)
  • 11:23 Urbanecm: EU SWAT done
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 3/3) (duration: 01m 05s)
  • 11:14 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104)
  • 11:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 2/3) (duration: 01m 05s)
  • 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7645e55: Enable reader demographic surveys in English, Polish, and Russian (T232525) (duration: 01m 06s)
  • 11:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:07 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.png: SWAT: 96bba4c: Add wgMinervaCustomLogos for szlwiki (T233104; 1/3) (duration: 01m 08s)
  • 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:53 jbond42: reimaging puppetmaster1002 to buster
  • 10:48 vgutierrez: switching from nginx to ats-tls on cp5007 - T231627
  • 09:55 moritzm: bouncing postgres on puppetdb1002/2002
  • 09:18 vgutierrez: switching from nginx to ats-tls on cp1080 - T231433
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P9203 and previous config saved to /var/cache/conftool/dbconfig/20190926-091348-marostegui.json
  • 09:04 mobrovac@deploy1001: Finished deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833 (duration: 21m 32s)
  • 09:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:47 vgutierrez: switching from nginx to ats-tls on cp2008 - T231433
  • 08:43 mobrovac@deploy1001: Started deploy [restbase/deploy@c419651]: Add nqo.wp.org - T233833
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9202 and previous config saved to /var/cache/conftool/dbconfig/20190926-084159-marostegui.json
  • 08:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Change special weights from 1 to 100 - T231018', diff saved to https://phabricator.wikimedia.org/P9201 and previous config saved to /var/cache/conftool/dbconfig/20190926-082233-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1078', diff saved to https://phabricator.wikimedia.org/P9200 and previous config saved to /var/cache/conftool/dbconfig/20190926-081759-marostegui.json
  • 08:13 vgutierrez: switching from nginx to ats-tls on cp3036 - T231433
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9199 and previous config saved to /var/cache/conftool/dbconfig/20190926-081144-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P9198 and previous config saved to /var/cache/conftool/dbconfig/20190926-080949-marostegui.json
  • 08:07 elukey: executed 'rmr /yarn-rmstore/analytics-test-hadoop/ZKRMStateRoot' on conf1004's zkCli.sh to clean up znodes - T217057
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to change binlog format', diff saved to https://phabricator.wikimedia.org/P9197 and previous config saved to /var/cache/conftool/dbconfig/20190926-080442-marostegui.json
  • 08:02 marostegui: Depool db1078 to restart mysql to change its binlog format to ROW
  • 07:57 vgutierrez: switching from nginx to ats-tls on cp4023 - T231433
  • 07:49 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:42 moritzm: draining ganeti2001 for upcoming reboot (combined kernel/qemu security updates)
  • 07:41 vgutierrez: switching from nginx to ats-tls on cp5003 - T231433
  • 07:10 marostegui: Power off db1114 for mainboard replacement T229452
  • 07:09 marostegui: Stop mysql on db1114 for mainboard replacement - T229452
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:55 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:41 marostegui: Sanitize nqowiki on db1124:3313 and db2094:3313 - T230543
  • 06:39 marostegui: Deploy schema change on db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3311 db2091:3312 db2084:3314 db2089:3315 db2089:3316 db2087:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9196 and previous config saved to /var/cache/conftool/dbconfig/20190926-063555-marostegui.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): ' Repool db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9195 and previous config saved to /var/cache/conftool/dbconfig/20190926-062922-marostegui.json
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9194 and previous config saved to /var/cache/conftool/dbconfig/20190926-053029-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9193 and previous config saved to /var/cache/conftool/dbconfig/20190926-051916-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Give some API weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9192 and previous config saved to /var/cache/conftool/dbconfig/20190926-050937-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to db1081 - T230784', diff saved to https://phabricator.wikimedia.org/P9191 and previous config saved to /var/cache/conftool/dbconfig/20190926-050722-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 T230784', diff saved to https://phabricator.wikimedia.org/P9190 and previous config saved to /var/cache/conftool/dbconfig/20190926-050140-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance T230784', diff saved to https://phabricator.wikimedia.org/P9189 and previous config saved to /var/cache/conftool/dbconfig/20190926-050050-marostegui.json
  • 05:00 marostegui: Starting s4 failover from db1081 to db1138 - T230784
  • 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T230784', diff saved to https://phabricator.wikimedia.org/P9188 and previous config saved to /var/cache/conftool/dbconfig/20190926-041508-marostegui.json
  • 04:10 marostegui: Start pre-switchover s4 steps T230784

2019-09-25

  • 21:59 bblack: remove GRE MTU hacks on archiva1001 gerrit2001 cobalt install1002 - T232602
  • 21:58 bblack: remove GRE MTU hacks on eqiad caches (cp1xxx) - T232602
  • 21:57 bblack: remove GRE MTU hacks on esams caches (cp3xxx) - T232602
  • 21:56 bblack: remove GRE MTU hacks on eqsin caches (cp5xxx) - T232602
  • 21:10 AndyRussG: update fruec from 97128874bf to c591bd653b
  • 21:00 ejegg: updated fundraising internal dashboard from 4473c65af0 to 69fdbec60d
  • 20:23 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286) (duration: 05m 32s)
  • 20:20 hashar: Upgrading CI Jenkins
  • 20:17 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@dbf4e7e]: Speed up querySelectors in domUtil (T229286)
  • 19:28 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.24 refs T220749 (duration: 01m 03s)
  • 19:27 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.24 refs T220749
  • 18:24 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: trying again (duration: 03m 31s)
  • 18:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: trying again
  • 18:19 twentyafterfour@deploy1001: Finished deploy [releng/phatality@42ba003]: deploy for version 5.6.15 (duration: 00m 50s)
  • 18:19 twentyafterfour@deploy1001: Started deploy [releng/phatality@42ba003]: deploy for version 5.6.15
  • 18:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: Deploy phatality (duration: 00m 24s)
  • 18:13 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: Deploy phatality
  • 18:11 Amir1: creating nqowiki is finished now
  • 18:10 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 39s)
  • 18:07 ladsgroup@deploy1001: Synchronized dblists/rtl.dblist: Create nqowiki T230359 (duration: 01m 05s)
  • 18:01 Amir1: creating nqowiki is going to take five more minutes
  • 17:57 ladsgroup@deploy1001: Synchronized langlist: Create nqowiki T230359 (duration: 01m 02s)
  • 17:56 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Create nqowiki T230359 (duration: 01m 05s)
  • 17:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create nqowiki T230359 (duration: 01m 04s)
  • 17:51 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:47 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 04s)
  • 17:29 mutante: DNS - adding nqo (N'Ko) to langlist for new nqo.wikipedia, approved by langcom https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_N'Ko (T230359)
  • 17:11 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 05s)
  • 17:08 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/WikimediaMaintenance/addWiki.php: Redefine RevisionStore service for the wiki being created (T212881) (duration: 01m 04s)
  • 16:19 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: Enable WelcomeSurvey for euwiki (T233063) (duration: 01m 04s)
  • 16:06 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 537628|Fix incorrect channel name for TranslationNotifications extension (T144780) (duration: 01m 06s)
  • 15:38 moritzm: installing php5 security updates
  • 15:07 moritzm: imported jenkins 2.176.4 for jessie/stretch T233214
  • 14:57 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:57 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:55 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.24/extensions/Wikibase/view/lib/resources.php: Revert "Merge valueview modules": T233800 (duration: 01m 04s)
  • 14:53 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix Draft namespace aliases (T233770) (duration: 01m 04s)
  • 14:52 onimisionipe: pool wdqs1005 - lag issues have minimized.
  • 14:38 moritzm: restarting apache on analytics-tool/an-tool to pick up Expat security update
  • 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 14:34 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:29 moritzm: restarting apache on grafana1001 to pick up Expat security update
  • 14:14 moritzm: restarting apache on various services to pick up Expat security update (releases, netmon, miscweb, graphite, planet, puppetboard)
  • 14:02 marostegui: Deploy schema change on db2086:3318
  • 14:00 effie: Rolling restart of thumbor for Expat update
  • 13:55 moritzm: rolling restart of apache on webperf* to pick up Expat security update
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9183 and previous config saved to /var/cache/conftool/dbconfig/20190925-135317-marostegui.json
  • 13:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:51 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 13:51 filippo@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:45 _joe_: restarting trafficserver on cp1075 to pick up the change
  • 13:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T230817 Remove origin trials config (duration: 01m 05s)
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9182 and previous config saved to /var/cache/conftool/dbconfig/20190925-133146-marostegui.json
  • 13:31 moritzm: installing remaining expat security updates
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9181 and previous config saved to /var/cache/conftool/dbconfig/20190925-132147-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1075 after BBU replacement', diff saved to https://phabricator.wikimedia.org/P9180 and previous config saved to /var/cache/conftool/dbconfig/20190925-131149-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1075 after replacing its BBU', diff saved to https://phabricator.wikimedia.org/P9179 and previous config saved to /var/cache/conftool/dbconfig/20190925-130613-marostegui.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9178 and previous config saved to /var/cache/conftool/dbconfig/20190925-125601-marostegui.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): ' Depool for schema change on the logging table: db2088:3312 db2084:3315 db2087:3316 db2086:3317 T233625', diff saved to https://phabricator.wikimedia.org/P9177 and previous config saved to /var/cache/conftool/dbconfig/20190925-125140-marostegui.json
  • 12:47 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:47 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:46 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [EQIAD] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 12:44 marostegui: Repool labsdb1011 T233766
  • 12:41 marostegui: Shutdown db1075 for onsite maintenance T233534
  • 12:37 marostegui: Stop MySQL on db1075 for BBU replacement T233534
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 for BBU replacement T233534', diff saved to https://phabricator.wikimedia.org/P9176 and previous config saved to /var/cache/conftool/dbconfig/20190925-123736-marostegui.json
  • 12:34 onimisionipe: depool wdqs1005 to allow it to catch up on lag
  • 12:32 @: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
  • 12:29 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 12:28 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 12:18 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286) (duration: 05m 17s)
  • 12:13 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@241b284]: Performance tweaks: domUtil + addSectionEditButtons (T229286)
  • 12:05 akosiaris: depool kubernetes1001 and disable puppet on it for rsyslog mmkubernetes testing
  • 12:05 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=kubernetes1001.*
  • 11:57 vgutierrez: switch cp1078 from nginx to ats-tls - T231433
  • 11:37 vgutierrez: switch cp2005 from nginx to ats-tls - T231433
  • 11:29 onimisionipe: restarted wdqs-blazegraph on wdqs1005
  • 11:15 onimisionipe: repooled wdqs1004 to reduce load on the wdqs public cluster
  • 11:15 Urbanecm: EU SWAT done
  • 11:13 vgutierrez: switch cp3035 from nginx to ats-tls - T231433
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 127485c: Fully close bgwikinews (T233322) (duration: 01m 06s)
  • 10:48 vgutierrez: Switch from nginx to ats-tls on cp4022 - T231433
  • 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:46 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:27 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 16s)
  • 10:26 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:26 vgutierrez: switch cp5002 from nginx to ats-tls - T231433
  • 10:25 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 12s)
  • 10:25 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:22 twentyafterfour@deploy1001: deploy aborted: (no justification provided) (duration: 00m 42s)
  • 10:21 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 10:13 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 45m 54s)
  • 09:51 @: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
  • 09:50 @: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'codfw' .
  • 09:27 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:20 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 02m 24s)
  • 09:18 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:16 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 54s)
  • 09:15 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 09:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 09:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:02 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 09:01 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 08:52 godog: roll-restart kibana
  • 08:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 00m 05s)
  • 08:48 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 08:48 twentyafterfour@deploy1001: Finished deploy [releng/phatality@8f05ba9]: (no justification provided) (duration: 09m 26s)
  • 08:44 vgutierrez: repooling cp4027 - T233667
  • 08:39 twentyafterfour@deploy1001: Started deploy [releng/phatality@8f05ba9]: (no justification provided)
  • 07:51 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 revert: [cirrus] temp disable sanity check (duration: 01m 05s)
  • 07:38 moritzm: installing emacs updates for buster (from SUA update, extended ELPA repository key)
  • 07:28 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 04s)
  • 07:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: c761ec1: Revert "Add localized Wikipedia wordmark for szlwiki" (T233104) (duration: 01m 16s)
  • 07:17 onimisionipe: pool wdqs1005 to allow depooling wdqs1004 to handle lag issues
  • 07:17 elukey: allow analytics users to log in to stat1005
  • 06:33 _joe_: restarting pybal on all low-traffic lbs
  • 06:29 @: helmfile [CODFW] Ran 'sync' command on namespace 'restrouter' for release 'codfw' .
  • 06:29 @: helmfile [EQIAD] Ran 'sync' command on namespace 'restrouter' for release 'production' .
  • 06:21 marostegui: Deploy schema change on db2085:3311 T233625
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 T233625', diff saved to https://phabricator.wikimedia.org/P9171 and previous config saved to /var/cache/conftool/dbconfig/20190925-062036-marostegui.json
  • 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 05:11 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 05:06 marostegui: Run a data check on labsdb1011 - T233766
  • 04:43 marostegui: Deploy schema change on s3 with replication - T231172
  • 03:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.24 refs T220749
  • 03:03 krinkle@deploy1001: Synchronized docroot/noc/: c7c6c0ee0, 8405bf1c2 (duration: 01m 05s)
  • 03:01 krinkle@deploy1001: Synchronized src/: c7c6c0ee0, 8405bf1c2 (for noc.wm.o) (duration: 01m 09s)
  • 02:58 twentyafterfour: belatedly promoting wmf.24 to group0 refs T220749
  • 02:32 onimisionipe: depool wdqs1005 to let it catch up with lag
  • 02:30 onimisionipe: pool wdqs1006 - it has caught up with lag
  • 01:16 mutante: stat1007 - restart nagios-nrpe-server, echo "please don't use all of the RAM on this server" | wall
  • 01:14 krinkle@deploy1001: Synchronized wmf-config/: 3373247e12 (duration: 01m 04s)
  • 01:12 krinkle@deploy1001: Synchronized src/WmfClusters.php: 3373247e123b (duration: 01m 04s)
  • 01:08 krinkle@deploy1001: Synchronized tests: 3373247e123b5 (duration: 01m 04s)
  • 01:07 krinkle@deploy1001: Synchronized docroot/noc: 3373247e123b53 and 1efc8bd (duration: 01m 05s)
  • 01:03 krinkle@deploy1001: Synchronized README: 3373247e123b53 (duration: 01m 04s)
  • 01:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 3373247e123b53 - create new file (duration: 01m 05s)
  • 00:47 krinkle@deploy1001: Synchronized wmf-config/: 6dca83a9f6c2c (duration: 01m 04s)
  • 00:44 krinkle@deploy1001: Synchronized docroot/noc/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:43 krinkle@deploy1001: Synchronized tests/: 6dca83a9f6c2c (duration: 01m 05s)
  • 00:02 mutante: cp1075 - systemctl restart vhtcpd
  • 00:02 mutante: cp1075 - systemctl status vhtcpd

2019-09-24

  • 23:38 mutante: gerrit service restart to switch LDAP backend
  • 23:35 bstorm_: wiki-replicas depooled labsdb1011
  • 23:33 mutante: gerrit2001 - restarting gerrit service
  • 23:30 mutante: switching LDAP servers used by Gerrit to read-only replicas; stop using the so-called "labs" config for the LDAP backend.
  • 22:26 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.24 refs T220749 (duration: 40m 38s)
  • 21:53 mutante: restbase1024 - enable IPMI over LAN which wasn't working before
  • 21:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.24 refs T220749
  • 21:19 mutante: ganeti4001 - racadm racreset - attempt to fix IPMI
  • 20:19 twentyafterfour: restarting gerrit due to unreasonably high garbage collection times and sluggish performance in general.
  • 19:39 XioNoX: disable asw2-d-eqiad:ge-5/0/41 due to excessive flapping
  • 19:28 ejegg: updated payments-wiki from 939b771800 to 5193dcdfa9
  • 19:20 twentyafterfour: branching 1.34.0-wmf.24 refs T220749
  • 18:45 AndyRussG: updated fruec from fb29cb74 to 97128874bf
  • 18:08 ejegg: updated Fundraising CiviCRM from feca96a2e3 to 52d2a24404
  • 17:13 cstone: civicrm revision changed from 5def62ab05 to feca96a2e3
  • 14:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:28 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:24 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:17 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 moritzm: rebooting cloudvirt1021 for kernel update
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:09 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 13:50 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 13:50 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:49 jbond42__: promote puppetmaster1003 to a real puppetmaster backend https://gerrit.wikimedia.org/r/c/operations/puppet/+/538686
  • 13:45 _joe_: installing the new conftool version on the cumin hosts
  • 13:40 _joe_: uploaded conftool 1.1.4-3 to stretch-wikimedia, T233679
  • 13:19 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=True)
  • 13:18 volans@cumin1001: START - Cookbook sre.hosts.decommission
  • 13:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 12:22 arturo: remove systemd-sysv from jessie-wikimedia/openstack-mitaka-jessie in install1002 (T231793)
  • 12:20 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T233584 [cirrus] temp disable sanity check (duration: 00m 55s)
  • 12:18 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 12:16 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: REVERT (due to wikifeeds problems): Start using the wikifeeds service for v1/feed - T170455
  • 11:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455 (duration: 02m 35s)
  • 11:45 mobrovac@deploy1001: Started deploy [restbase/deploy@87eea26]: Start using the wikifeeds service for v1/feed - T170455
  • 11:43 Urbanecm: EU SWAT done
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 11a48f8: Add support for some languages on Commons and stop support for nys on Wikidata (T230480) (duration: 00m 56s)
  • 11:39 Urbanecm: Run mwscript initSiteStats.php --wiki=napwikisource --update (T233673)
  • 11:37 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 9eaa4f8: Set wgArticleCountMethod to any for napwikisource (T233673) (duration: 00m 56s)
  • 11:30 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/mxwikimedia.png (T233670)
  • 11:30 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: b6947c5: Follow-up 8f3f0705baed: add missing namespace for eswiki (T233562) (duration: 00m 56s)
  • 11:27 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MassMessage/: SWAT: ba9b209: Provide deduplication info to MassMessageJob (T232379) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1001: Synchronized static/images/project-logos/mxwikimedia.png: SWAT: 246b352: Update logo for mx.wikimedia (T233670) (duration: 00m 54s)
  • 11:24 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.less: SWAT: d4c64a7: Fix broken display of mobile overlay headings (T233163) (duration: 00m 57s)
  • 11:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8bf6aae: Enable alternate mobile link for ar,zh,hi wikis (T206497) (duration: 00m 54s)
  • 11:10 _joe_: all wikis (including API) are now served by PHP7 T219150
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: a14b772: FileImporter: limited default deployment (2/2; T232539) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8a89652: FileImporter: limited default deployment (1/2; T232539) (duration: 01m 03s)
  • 10:56 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584 (duration: 01m 00s)
  • 10:55 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@7857639]: Bump CirrusSearchLinksUpdate concurrency to clear the queue - T233584
  • 10:54 _joe_: converting all appservers to php7, T219150
  • 10:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953 (duration: 22m 20s)
  • 10:50 _joe_: converting mw1261 to full-php7
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@19d0f44]: Expose the key_value buckets to production IPs - T223953
  • 10:12 marostegui: Deploy schema change on s7 (centralauth and wikis) master with replication - T231172
  • 10:03 marostegui: Deploy schema change on s1 master with replication - T231172
  • 09:58 marostegui: Deploy schema change on labswiki (wikitech) and labtestwiki T231172
  • 09:51 effie: Upgrade to php 7.2.22 on mwmaint* - T230024
  • 09:30 marostegui: Deploy schema change on s2 master with replication - T231172
  • 09:26 effie: Upgrade to php 7.2.22 on deploy* - T230024
  • 09:14 marostegui: Drop table archive_save on frwiki T233187
  • 08:43 marostegui: Deploy schema change on s8 master with replication - T231172
  • 08:37 mvolz@deploy1001: scap-helm zotero finished
  • 08:37 mvolz@deploy1001: scap-helm zotero cluster codfw completed
  • 08:37 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
  • 08:36 jynus: stop db1114 mariadb process for some time
  • 08:33 moritzm: installed expat security updates on remaining mw* servers
  • 08:33 mvolz@deploy1001: scap-helm zotero finished
  • 08:32 mvolz@deploy1001: scap-helm zotero cluster eqiad completed
  • 08:32 mvolz@deploy1001: scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
  • 08:30 marostegui: Deploy schema change on s4 master with replication - T231172
  • 08:29 effie: Disable puppet on api cluster and restart php-fpm to finish php7 migration - T219150
  • 08:19 mvolz@deploy1001: scap-helm zotero finished
  • 08:19 mvolz@deploy1001: scap-helm zotero cluster staging completed
  • 08:19 mvolz@deploy1001: scap-helm zotero upgrade staging -f zotero-values-staging.yaml stable/zotero [namespace: zotero, clusters: staging]
  • 08:18 marostegui: Deploy schema change on s5 master with replication - T231172
  • 07:51 onimisionipe: depool wdqs1006 to clear HTTP "too many requests" errors
  • 07:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 moritzm: uploaded openjdk-8 8u222-b10-1~deb10u2 to buster-wikimedia component/jdk8 T233604
  • 07:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:18 godog: swift eqiad-prod: continue ms-be1027 decom T233289
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:37 marostegui: Stop MySQL on db1066 - T233071
  • 06:36 marostegui: Remove db1066 from tendril and zarcillo T233071
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1075', diff saved to https://phabricator.wikimedia.org/P9163 and previous config saved to /var/cache/conftool/dbconfig/20190924-063002-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9162 and previous config saved to /var/cache/conftool/dbconfig/20190924-061943-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1075', diff saved to https://phabricator.wikimedia.org/P9161 and previous config saved to /var/cache/conftool/dbconfig/20190924-053919-marostegui.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Give weight 100 to db1075', diff saved to https://phabricator.wikimedia.org/P9160 and previous config saved to /var/cache/conftool/dbconfig/20190924-052545-marostegui.json
  • 05:13 cdanis@cumin1001: dbctl commit (dc=all): 're-do T230783 master promotion and set read-write', diff saved to https://phabricator.wikimedia.org/P9159 and previous config saved to /var/cache/conftool/dbconfig/20190924-051307-cdanis.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 master and remove read-only from s3 T230783', diff saved to https://phabricator.wikimedia.org/P9158 and previous config saved to /var/cache/conftool/dbconfig/20190924-051147-marostegui.json
  • 05:10 cdanis: T230783 mark DEFAULT not s3 as readonly in etcd dbconfig data
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 as read-only for maintenance T230783', diff saved to https://phabricator.wikimedia.org/P9157 and previous config saved to /var/cache/conftool/dbconfig/20190924-050034-marostegui.json
  • 05:00 marostegui: Starting s3 failover from db1075 to db1123 - T230783
  • 04:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set weight 0 to db1123 T230783', diff saved to https://phabricator.wikimedia.org/P9156 and previous config saved to /var/cache/conftool/dbconfig/20190924-042121-marostegui.json
  • 04:13 marostegui: Start pre switchover steps - T230783
  • 03:52 chaomodus: rebooted netboxdb[12]001 for kernel upgrade
  • 03:46 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:45 crusnov@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:43 mutante: db2060 - remove PXE flag boot override - set Boot Device to none

2019-09-23

  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:50 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:50 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 23:43 dzahn@cumin1001: Updating IPMI password on 92 hosts - dzahn@cumin1001
  • 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:32 catrope@deploy1001: Synchronized wmf-config/VariantSettings.php: Syncing no-op change for T232419 (duration: 00m 57s)
  • 19:57 cdanis: T233657 ✔️ cdanis@cp4027.ulsfo.wmnet ~ 🕓🍵 sudo -i depool
  • 19:16 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 2a7a125: Redefine hiwikisource extra namespaces (T233365) (duration: 00m 57s)
  • 19:09 Urbanecm: Going to deploy one more last-time patch
  • 18:51 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config, take 2 (T233610) (duration: 00m 56s)
  • 18:48 Urbanecm: Morning SWAT done
  • 18:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 37fcbdf: Fix: Move hiwikisource extra namespace to extra namespace section (duration: 00m 56s)
  • 18:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: be2f9d4: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 55s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-szl.svg: SWAT: d397f5f: Add localized Wikipedia wordmark for szlwiki (T233104) (duration: 00m 56s)
  • 18:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8f3f070: Disallow indexing discussion and user pages on eswiki (T233562) (duration: 00m 56s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 6cb2042: New throttle rule for Wikimedia Chile editathon (T233378) (duration: 00m 56s)
  • 18:13 Urbanecm: Security deploy for T207094
  • 18:03 gilles: T233095 Purge articles for all wikis: foreachwiki maintenance/purgeList.php --all --verbose
  • 17:59 gilles@deploy1001: Synchronized php-1.34.0-wmf.23/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 00m 56s)
  • 17:54 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: MachineVision: Update active handler config (T233610) (duration: 00m 58s)
  • 16:53 elukey@deploy1001: Finished deploy [analytics/refinery@b99647e]: (no justification provided) (duration: 07m 24s)
  • 16:46 elukey@deploy1001: Started deploy [analytics/refinery@b99647e]: (no justification provided)
  • 16:33 Urbanecm: Remove my temporary adminship on bgwikinews (T233322)
  • 16:29 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 2/2) (duration: 00m 56s)
  • 16:27 urbanecm@deploy1001: Synchronized dblists/closed.dblist: 84afa44: Close bgwikinews, but allow sysops to edit (T233322; 1/2) (duration: 00m 58s)
  • 16:26 Urbanecm: mwscript createAndPromote.php --wiki=bgwikinews --sysop --force 'Martin Urbanec' - temporary (T233322)
  • 13:21 moritzm: installing qemu security updates on remaining cloudvirt hosts
  • 12:40 moritzm: rolling restart of graphoid on scb to pick up expat security update
  • 12:05 moritzm: restarting apache on bast5001 to pick up expat security update
  • 11:50 moritzm: restarting Apache/HHVM/PHP on mw1261-mw1265 after Expat security update
  • 11:42 vgutierrez: switching cp4027 from nginx to ats-tls - T231627
  • 11:35 moritzm: installing expat security updates
  • 11:33 awight: EU SWAT finished
  • 11:31 awight@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/FileImporter: SWAT: Add change tags to all FileImport text revisions (T227849) (duration: 00m 57s)
  • 11:23 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Set item terms on write both up to Q40Mio (T225055) (duration: 00m 55s)
  • 11:12 effie: Disable puppet and rolling restart of php7.2-fpm on mw[1321-1333] - T219150
  • 11:11 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 56s)
  • 11:06 awight@deploy1001: Synchronized static/images/project-logos: SWAT: Add localized logos for the Zulu Wikipedia (T233424) (duration: 00m 57s)
  • 11:05 moritzm: uploaded openjdk 8u222-b10-1~deb10u1 to buster-wikimedia/component/jdk8 (bootstrap build, second boron build following) T233604
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:51 jynus: stopping db2102 mariadb to recover db
  • 09:45 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'نعنوعه' 'مريانا_علي' (T233585)
  • 09:44 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwiki --logwiki=metawiki 'Huangzonghao' 'HUANGZONGHAO' (T233585)
  • 09:38 akosiaris: T218184 upload to apt.wikimedia.org/jessie-wikimedia apertium-dan-nor_1.4.0-1+wmf1, apertium-nno-nob_1.2.0-1+wmf1, apertium-swe-dan_0.8.0-2+wmf1, apertium-swe-nor_0.3.0-2+wmf1
  • 09:02 effie: Disable puppet and rolling restart php-fpm on mw[1312-1317,1339-1347]* - T219150
  • 08:31 elukey@deploy1001: Finished deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes (duration: 07m 26s)
  • 08:24 elukey@deploy1001: Started deploy [analytics/refinery@a20a647]: Deploy python2 -> python3 fixes
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9148 and previous config saved to /var/cache/conftool/dbconfig/20190923-082119-marostegui.json
  • 07:41 godog: swift run swiftrepl without deletes eqiad -> codfw
  • 07:40 godog: swift eqiad-prod: continue ms-be1027 decom - T233289
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9147 and previous config saved to /var/cache/conftool/dbconfig/20190923-073044-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1123 after kernel and binlog format change', diff saved to https://phabricator.wikimedia.org/P9146 and previous config saved to /var/cache/conftool/dbconfig/20190923-071537-marostegui.json
  • 07:08 marostegui: Stop MySQL on db1123 to reboot to change binlog format and kernel - T230783
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to change binlog format T230783', diff saved to https://phabricator.wikimedia.org/P9145 and previous config saved to /var/cache/conftool/dbconfig/20190923-070628-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1123 and db1078 roles, db1078 will serve logpager and recentchanges, db1123 will just serve general traffic', diff saved to https://phabricator.wikimedia.org/P9144 and previous config saved to /var/cache/conftool/dbconfig/20190923-065056-marostegui.json
  • 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1066 from config T233071 (duration: 00m 56s)
  • 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1066 from config T233071 (duration: 01m 15s)

2019-09-22

  • off: marostegui set s3 master RW

2019-09-21

  • 05:42 shdubsh: re-enable input-kafka-rsyslog-shipper in codfw
  • 05:33 shdubsh: drop input-kafka-rsyslog-shipper in codfw
  • 02:15 bblack: dbproxy1017: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 02:14 bblack: dbproxy1016: executing "systemctl reload haproxy" to recover from false healthcheck failure (network issues) on master
  • 01:52 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash2004-5-6
  • 01:34 mutante: restarting mobileapps service on scb*
  • 01:34 mutante: restarted mobileapps service on scb1001
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
  • 01:21 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
  • 01:21 bblack: re-pooling cp108[78] in D2 via confctl
  • 01:14 shdubsh: temporarily removing input-kafka-rsyslog-shipper-eqiad/codfw from logstash1007
  • 01:08 shdubsh: removed input-kafka-rsyslog-shipper-eqiad/codfw from logstash inputs logstash1008 and logstash1009
  • 00:54 mutante: aqs1009 - systemctl restart aqs
  • 00:54 mutante: aqs1006 - systemctl restart aqs
  • 00:48 mutante: aqs1005 - systemctl restart aqs
  • 00:46 shdubsh: restarting logstash on logstash1008 without udp-localhost-eqiad/codfw configs
  • 00:39 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1088.eqiad.wmnet
  • 00:38 bblack: depooling confctl things in rack D2
  • 00:38 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet

2019-09-20

  • 21:30 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: fix T233453 (duration: 00m 56s)
  • 21:29 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: fix T233453 (duration: 00m 58s)
  • 19:26 XioNoX: update eqsin firewall filters - T233268
  • 16:35 krinkle@deploy1001: Synchronized vendor/: ead70240892e9 (duration: 00m 59s)
  • 16:14 XioNoX: update eqiad firewall filters - T233268
  • 16:11 XioNoX: update esams firewall filters - T233268
  • 15:17 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bgwiki --logwiki=metawiki 'Newrdkter' 'NRdk' (T233313)
  • 15:03 XioNoX: remove AS-PATH prepending in ams
  • 11:29 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:22 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:16 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:15 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 10:17 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 10:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 10:17 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 09:31 jbond@cumin1001: Updating IPMI password on 1 hosts - jbond@cumin1001
  • 09:31 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 09:30 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 09:30 jbond@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:52 jynus: creating new database on m1 "bacula9" T229209
  • 08:28 hashar: Killed zuul-server process on contint2001 which was establishing connections to Gerrit and filling the pool of allowed ssh connections # T233390
  • 08:23 hashar: CI in default since it is somehow no longer able to fetch from Gerrit T233390
  • 08:20 hashar: contint1001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 08:12 hashar: contint2001: upgrade zuul to 2.5.1-wmf10 # T203846
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:46 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:46 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:45 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 07:14 godog: eqiad-prod: start ms-be1027 decom - T233289
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from logpager and contributions after testing, repool back with normal weight on main traffic T223151', diff saved to https://phabricator.wikimedia.org/P9136 and previous config saved to /var/cache/conftool/dbconfig/20190920-052902-marostegui.json
  • 05:27 marostegui: Analyze table enwiki.logging on db2102 - T223151
  • 05:07 marostegui: Remove temporary index on hiwikisource views T219374
  • 01:06 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC (duration: 02m 51s)
  • 01:05 jforrester@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/TimedMediaHandler/: T233360 Fix Safari 13.0 regression in video playback with audio (duration: 00m 58s)
  • 01:03 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@a29da76]: Rolling back deployment due to alerts beginning after 0:00 UTC

2019-09-19

  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 22:51 ejegg: updated payments-wiki from adef0e858f to 939b771800
  • 22:34 mutante: gerrit1001 - stopping puppet, removing gerrit IP from interface, rebooting
  • 21:37 niharika29@deploy1001: Synchronized wmf-config/VariantSettings.php: Enable special:mute on testwiki; T231577 (duration: 00m 56s)
  • 20:15 XioNoX: push firewall policies to pfw3-eqiad - T233325
  • 20:07 XioNoX: push firewall policies to pfw3-codfw - T233325
  • 19:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.23 refs T220748
  • 19:02 twentyafterfour: There are currently no blockers for T220748 so I am preparing to deploy 1.34.0-wmf.23 to all wikis.
  • 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
  • 18:14 XioNoX: add TCP-MSS 1436 to cr2-eqiad external interfaces - T232602
  • 18:12 XioNoX: add TCP-MSS 1436 to cr1-eqiad external interfaces - T232602
  • 18:01 bblack: lvs2004 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:55 mutante: puppetmaster1001 - add mcrouter cert for mw1298.eqiad.wmnet (T192457)
  • 17:52 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 17:48 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, 32cf50453cd (duration: 01m 04s)
  • 17:47 arlolra@deploy1001: Finished deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2 (duration: 08m 52s)
  • 17:43 Krinkle: Move whisper/MediaWiki/wanobjectcache/revision_row_1/29 to whisper/MediaWiki/wanobjectcache/revision_row_1_29 on graphite1004 and graphite2003 (T232907)
  • 17:38 arlolra@deploy1001: Started deploy [parsoid/deploy@77630c5]: Updating Parsoid to 6bf23c2
  • 17:27 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:27 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/includes/libs/objectcache/wancache: 2e910c9, T232907 (duration: 01m 03s)
  • 17:23 bblack: lvs2005 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:19 bblack: lvs2006 - restart pybal for deploy/test of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536324/
  • 17:16 bblack: lvs200[456] - puppet disabled for https://gerrit.wikimedia.org/r/536324 deploy/test
  • 17:14 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062 (duration: 05m 42s)
  • 17:08 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@69b3737]: Update mobileapps to cfc3062
  • 16:31 _joe_: manually removed the purge_checkuser cron from mwmaint1002, to have puppet recreate it
  • 16:20 ejegg: updated fundraising CiviCRM from 90db6cb5a1 to 5def62ab05
  • 16:15 papaul: shutting down scs-a1-codfw for replacement
  • 15:26 moritzm: repooling restbase2012 after completed Cassandra bootstrap T224553
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=restbase,service=cassandra,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-backend,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase-ssl,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:25 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=restbase,service=restbase,dc=codfw,name=restbase2012.codfw.wmnet
  • 15:06 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 15:05 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:56 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286) (duration: 05m 39s)
  • 14:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@16a6af1]: Increase num_workers to (ncpu * 1.5) (T229286)
  • 14:47 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3 (duration: 10m 42s)
  • 14:37 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #3
  • 14:36 mobrovac@deploy1001: Finished deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2 (duration: 08m 24s)
  • 14:31 mobrovac: bootstrap restbase2012-c -- T224553
  • 14:28 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present, take #2
  • 14:28 mobrovac@deploy1001: deploy aborted: Remove the TID suffix in the ETag, if present - T230272 (duration: 11m 20s)
  • 14:28 sbassett: Deployed security patch for T224203 (php-1.34.0-wmf.23)
  • 14:19 sbassett: Deployed security patch for T224203
  • 14:19 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=False)
  • 14:18 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 14:17 mobrovac@deploy1001: Started deploy [restbase/deploy@44f4c79]: Remove the TID suffix in the ETag, if present - T230272
  • 13:54 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750) (duration: 03m 06s)
  • 13:51 mholloway-shell@deploy1001: Started deploy [recommendation-api/deploy@c8abb0f]: Article recommendation API: replace WDQS with MW API (T216750)
  • 13:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/Translate: T233308 (duration: 01m 07s)
  • 13:14 moritzm: powercycling mw1300
  • 13:12 mobrovac: bootstrap restbase2012-b -- T224553
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1089 into contributions service T223151', diff saved to https://phabricator.wikimedia.org/P9133 and previous config saved to /var/cache/conftool/dbconfig/20190919-130848-marostegui.json
  • 13:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553 (duration: 21m 38s)
  • 12:39 mobrovac@deploy1001: Started deploy [restbase/deploy@7f4b7f7]: Start using RESTBase built on Stretch - T224553
  • 12:36 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 11:48 mobrovac: bootstrap restbase2012-a -- T224553
  • 11:32 Urbanecm: EU SWAT done
  • 11:26 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 199a05c: Add new throttle rule for Czech wiki course (T233199) (duration: 01m 01s)
  • 11:23 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: eab7c6a: c80f026: GrowthExperiments: Enable Special:Homepage for euwiki, GrowthExperiments: Enable help panel for euwiki (T233066, T233065) (duration: 01m 05s)
  • 09:54 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/CheckUser: security T207094 (duration: 01m 02s)
  • 09:53 urbanecm@deploy1001: sync-file aborted: security T207094 (duration: 00m 28s)
  • 09:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser: security T207094 (duration: 01m 05s)
  • 09:22 godog: power back on ms-be1027, found with power off
  • 08:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: 393441b: Change configuration of AbuseFilter extension for enwikisource (T231750) (duration: 01m 04s)
  • 08:22 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: revert T207094 (duration: 01m 04s)
  • 08:20 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:14 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/CheckUser/: security T207094 (duration: 01m 06s)
  • 08:11 marostegui: Rename tables on db1133:labspuppet T233281
  • 07:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:40 moritzm: rebooting failoid1001 for kernel update
  • 07:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:39 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give more logpager weight to db1089 T223151', diff saved to https://phabricator.wikimedia.org/P9131 and previous config saved to /var/cache/conftool/dbconfig/20190919-072234-marostegui.json
  • 07:01 moritzm: reimaging restbase2012 to stretch T224553
  • 06:18 marostegui: Sanitize hiwikisource on db1124:3313 and db2094:3313 T219374
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Temporarily pool db1089 into enwiki logpager T223151', diff saved to https://phabricator.wikimedia.org/P9130 and previous config saved to /var/cache/conftool/dbconfig/20190919-060440-marostegui.json
  • 05:11 marostegui: Stop MySQL on db2055 for decommission T233186
  • 05:11 marostegui: Remove db2055 from tendril and zarcillo T233186

2019-09-18

  • 23:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/MobileFrontend/resources/dist/: T233260, 1667ed9 (duration: 01m 04s)
  • 22:58 cmjohnson1: enabled asw2-c-eqiad interface xe-2/0/45
  • 22:40 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/resources/Resources.php: d6dadfd (duration: 01m 03s)
  • 22:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.23/extensions/AbuseFilter/includes/: T156095, ff44043efa59e9 (duration: 01m 05s)
  • 22:13 cmjohnson1: disabling asw2-c-eqiad xe-2/0/45 - cr1-eqiad to replace optic T233265
  • 21:54 gilles: T233095 Purging all eswiki articles (both desktop and mobile this time)
  • 21:53 gilles@deploy1001: Synchronized php-1.34.0-wmf.22/maintenance/purgeList.php: T233095 Make purgeList.php use getCdnUrls() (duration: 01m 04s)
  • 21:13 XioNoX: enable damping on primary codfw-eqiad link - T196432
  • 21:09 XioNoX: enable damping on codfw-ulsfo link - T196432
  • 20:50 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No longer load InitialiseSettings at all in CommonSettings (duration: 01m 03s)
  • 20:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Quick fix for wmfLoadInitialiseSettings() (duration: 01m 03s)
  • 20:40 jforrester@deploy1001: scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 20:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out call to InitialiseSettings.php (duration: 01m 04s)
  • 20:18 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Drop support for serialised PHP (duration: 01m 04s)
  • 20:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP T223602 (duration: 01m 04s)
  • 20:15 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 20:11 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 20:07 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208246 Enforce a 10-byte password for privileged users (duration: 01m 04s)
  • 19:57 urandom: decommissioning Cassandra, restbase2012-c -- T224553
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:42 gilles: T233095 Purging all pages on eswiki
  • 19:27 joal@deploy1001: Finished deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix (duration: 03m 40s)
  • 19:24 mutante: ganeti1001 - deleting krypton.eqiad.wmnet - decom T231546
  • 19:23 joal@deploy1001: Started deploy [analytics/aqs/deploy@bc9dde1]: Regular deploy - analytics weekly train - Second retry after fix
  • 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.23 refs T220748 (duration: 01m 04s)
  • 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.23 refs T220748
  • 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 19:07 twentyafterfour: There appear to be no blockers on T220748 so I'll proceed with deploying 1.34.0-wmf.23 to group 1.
  • 19:01 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix (duration: 02m 12s)
  • 18:59 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train - Retry after fix
  • 18:55 joal@deploy1001: Finished deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train (duration: 01m 05s)
  • 18:54 joal@deploy1001: Started deploy [analytics/aqs/deploy@5b011d1]: Regular deploy - analytics weekly train
  • 18:46 XioNoX: remove `border-in4 term ddos-0906` from all routers
  • 17:53 Amir1: Creating hiwikisource is done
  • 17:50 urandom: decommissioning Cassandra, restbase2012-b -- T224553
  • 17:48 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 32s)
  • 17:45 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Add hiwikisource logos (T218155) (duration: 01m 04s)
  • 17:43 ladsgroup@deploy1001: Synchronized wmf-config/VariantSettings.php: Add hiwikisource (T218155) (duration: 01m 05s)
  • 17:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add hiwikisource (T218155) (duration: 01m 04s)
  • 17:38 Amir1: manual write on hiwikisource "wikiadmin@10.64.0.205(hiwikisource)> update text set old_text = 'DB://cluster25/1';" (T218155)
  • 17:33 Amir1: mwscript maintenance/createAndPromote.php --wiki=hiwikisource --force --sysop Ladsgroup (T218155)
  • 17:28 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 17:22 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 06s)
  • 17:22 Jeff_Green: authdns-update to deploy DNS for new fundraising host
  • 17:03 mutante: ganeti2004 - resetting DRAC in an attempt to make IPMI work again
  • 17:00 Urbanecm: Morning SWAT done
  • 16:48 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable DNS blacklist on testwiki temporarily (T230822) (duration: 01m 03s)
  • 16:43 Urbanecm: 8340be9 sync is for T230822, mistakenly inserted `test` instead of the task number
  • 16:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 8340be9: Enable logging for BlockManager channel at info level (test) (duration: 01m 04s)
  • 16:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: dc1298d: Add Draft and Draft_talk aliases for wikis that define draft namespace (T223472) (duration: 01m 02s)
  • 16:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 6e59651: Disable FundraiserLandingPage extension on test.wikipedia.org (T203020) (duration: 01m 04s)
  • 16:26 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/tewikisource.png (T232065)
  • 16:25 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 2/2) (duration: 01m 06s)
  • 16:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: 7c987fc: Change Telugu Wikisource Logo (T232065; 1/2) (duration: 01m 05s)
  • 16:18 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 817d679: Turn on EventLogging at 100% for DonateWiki (T233145) (duration: 01m 04s)
  • 16:05 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: ba30276: Add suppressredirect right to filemovers on bnwiki (T233137) (duration: 01m 05s)
  • 15:55 moritzm: repooling restbase2011 after reimage/bootstrap
  • 15:53 urandom: decommissioning Cassandra, restbase2012-a -- T224553
  • 15:06 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 14:59 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-backend
  • 14:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:52 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 13:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:41 joal@deploy1001: Finished deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train (duration: 05m 28s)
  • 13:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:36 joal@deploy1001: Started deploy [analytics/refinery@ca30c4e]: Regular analytics weekly train
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:11 hashar: Restarting Jenkins, starting Zuul
  • 12:56 marostegui: Deploy schema change on the following s6 hosts: db1088, db1093, db1096, db1098, db1139, dbstore1005 - T231172
  • 12:52 hashar: gracefully stopping Zuul (kill SIGUSR1) to prepare for Jenkins restart
  • 12:40 marostegui: Deploy schema change on s6 codfw master with replication T231172
  • 12:18 vgutierrez: restarting ats-tls to avoid spreading Proxy-Connection header - T233205
  • 12:03 marostegui: Stop haproxy on dbproxy1006 - T233207
  • 11:29 mobrovac: bootstrap restbase2011-c -- T224553
  • 11:27 awight: EU SWAT complete
  • 11:27 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 00m 59s)
  • 11:25 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable FileImport source wiki editing (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: NowCommons test & test2wiki configuration (T228851) (duration: 01m 15s)
  • 10:17 onimisionipe: force relocation of shards for eqiad search(chi) cluster
  • 10:16 moritzm: restarting postgres on puppetdb1002/2002 after updating permissions for replication user
  • 10:00 mobrovac: bootstrap restbase2011-b -- T224553
  • 09:37 godog: run swiftrepl eqiad -> codfw on all containers, no deletes
  • 09:37 effie: upgrading netmon* to PHP 7.2.22 T230024
  • 09:35 godog: run swiftrepl eqiad -> codfw for transcoded containers
  • 08:59 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:57 jynus@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P9125 and previous config saved to /var/cache/conftool/dbconfig/20190918-085721-marostegui.json
  • 08:22 mobrovac: bootstrap restbase2011-a -- T224553
  • 07:43 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 07:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 06:43 moritzm: reimaging restbase2011 to stretch T224553
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P9124 and previous config saved to /var/cache/conftool/dbconfig/20190918-060401-marostegui.json
  • 05:58 marostegui: Deploy schema change on db2097:3316 - T233135
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool host after onsite checks T233184', diff saved to https://phabricator.wikimedia.org/P9123 and previous config saved to /var/cache/conftool/dbconfig/20190918-054755-marostegui.json
  • 05:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2055 from config T233186 (duration: 01m 04s)
  • 05:31 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2055 from config T233186 (duration: 01m 06s)
  • 05:03 marostegui: Start MySQL on db2127 T233184
  • 03:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.util/: 0333729e, ccfe88241 (duration: 01m 07s)

2019-09-17

  • 23:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.23 refs T220748
  • 23:20 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/VisualEditor/extension.json: aae62a8 (duration: 01m 05s)
  • 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 22:43 dzahn@cumin1001: Updating IPMI password on 6 hosts - dzahn@cumin1001
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add comment about MinimumPasswordLengthToLogin (duration: 01m 03s)
  • 21:45 cstone: civicrm revision changed from 45dbfdb96f to 90db6cb5a1
  • 21:45 tzatziki: removed one file for legal compliance
  • 21:12 XioNoX: delete AS13335 91.198.174.0/24 RPKI/ROA
  • 21:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 21:10 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 21:10 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 21:08 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:07 twentyafterfour@deploy1001: Finished scap: testwikis to 1.34.0-wmf.23 refs T220748 (duration: 24m 55s)
  • 21:01 XioNoX: enable interface damping on primary eqiad-esams link (eqiad side) - T196432
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:47 dzahn@cumin1001: Updating IPMI password on 660 hosts - dzahn@cumin1001
  • 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:42 twentyafterfour@deploy1001: Started scap: testwikis to 1.34.0-wmf.23 refs T220748
  • 20:39 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:31 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/phpCharToUpper.json: 8372dcd (duration: 00m 56s)
  • 20:30 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/resources/src/mediawiki.Title/Title.js: 8372dcd (duration: 02m 08s)
  • 20:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 21 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 tzatziki: changing email for User:Olag
  • 20:12 dzahn@cumin1001: Updating IPMI password on 18 hosts - dzahn@cumin1001
  • 20:11 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:04 dzahn@cumin1001: Updating IPMI password on 29 hosts - dzahn@cumin1001
  • 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:32 ejegg: updated payments-wiki from fc82318180 to adef0e858f
  • 19:26 dzahn@cumin1001: Updating IPMI password on 543 hosts - dzahn@cumin1001
  • 19:25 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:22 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:20 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:14 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:08 twentyafterfour: Branch cut is in progress for 1.34.0-wmf.23
  • 19:05 urandom: decommissioning Cassandra, restbase2011-c -- T224553
  • 18:06 papaul: upgrading firmware on scs1-a1-codfw
  • 17:18 ejegg: updated SmashPig payments listener from a0151434f4 to dc0c6b208b
  • 17:09 urandom: decommissioning Cassandra, restbase2011-b -- T224553
  • 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 17:00 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:59 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 16:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
  • 16:04 jbond42: run octocatalog-diff from elnath with current facts
  • 15:55 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 55s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 15:53 reedy@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
  • 15:39 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 15:38 urandom: decommissioning Cassandra, restbase2011-a -- T224553
  • 15:17 marostegui@cumin1001: dbctl commit (dc=all): 'Host down for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P9120 and previous config saved to /var/cache/conftool/dbconfig/20190917-151714-marostegui.json
  • 15:16 marostegui: Stop MySQL on db2127 and shut the host down for onsite maintenance
  • 14:52 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 14:52 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on wikitech for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 8 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 7 wikis for T232464
  • 14:51 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 6 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 5 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 4 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on remaining section 3 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 2 wikis for T232464
  • 14:50 anomie@mwmaint1002: Running cleanupRevActorPage.php on section 1 wikis for T232464
  • 14:48 anomie@mwmaint1002: Running cleanupRevActorPage.php on test wikis and mediawikiwiki for T232464
  • 14:39 anomie@deploy1001: Synchronized php-1.34.0-wmf.22/includes/MergeHistory.php: Backport MergeHistory fix for T232464 gerrit:537436 (duration: 00m 54s)
  • 14:35 ottomata: bouncing eventstreams service on scb hosts
  • 14:15 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 14:13 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 14:03 herron: migrating kafka1003 to kafka-main1003 T225005
  • 14:00 jbond42: forcing puppet run
  • 14:00 bblack: lvs1015 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:59 bblack: lvs2003 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:57 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 13:52 bblack: lvs1016 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:52 bblack: lvs2006 - restart pybal to remove runcommands - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536581/
  • 13:46 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 13:45 moritzm: repooling restbase2010 after reimage/completed bootstrap
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 db1104 db1085 db1086 after PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9117 and previous config saved to /var/cache/conftool/dbconfig/20190917-132102-marostegui.json
  • 13:17 godog: force-run puppet in eqiad to update exported resources
  • 13:14 jbond42: currently running octocatalog-diff for all hosts from elnath
  • 13:02 marostegui: Start replication on db1130 db1104 db1085 db1086 after PDU maintenance is completed - T227539
  • 13:01 cmjohnson1: The PDU swap in rack B3 eqiad is finished.
  • 12:30 mobrovac: bootstrap restbase2010-c - T224553
  • 11:32 Urbanecm: EU SWAT is done
  • 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 11:31 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 11:31 urbanecm@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: 290e207: Add channels for the Translate and TranslationsNotification extension (T221119, T144780, T143073) (duration: 00m 56s)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 11:30 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:29 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.ipmi-password-reset (exit_code=97)
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 11:27 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Use https rather than protocol-relative remote API URLs (T228851) (duration: 00m 58s)
  • 11:24 cmjohnson1: commencing pdu swap rack b3 eqiad T227539
  • 11:22 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Update ORES filter threshold configuration for new huwiki model (T230031) (duration: 00m 55s)
  • 11:17 awight@deploy1001: Synchronized wmf-config/VariantSettings.php: SWAT: Enable EditorJourney for euwiki (T232061) (duration: 00m 56s)
  • 11:13 Urbanecm: Run mwscript emptyUserGroup.php --wiki=aawiki 'inactive' (T150538)
  • 10:58 mobrovac: bootstrap restbase2010-b - T224553
  • 10:44 vgutierrez: replacing nginx with ATS in cp1076 (upload cluster) - T231433
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539', diff saved to https://phabricator.wikimedia.org/P9116 and previous config saved to /var/cache/conftool/dbconfig/20190917-094827-marostegui.json
  • 09:46 marostegui: Depool and stop replication on db1130 db1104 db1085 db1086 (lag will appear on s6 on labsdb) for PDU maintenance - T227539
  • 09:30 hashar: Restarting CI jenkins
  • 09:29 marostegui: Downtime db1073 db1130 db1104 db1085 db1086 for the PDU maintenance T227539
  • 09:18 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 mobrovac: bootstrap restbase2010-a - T224553
  • 09:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 100% of users who accept cookies - T219150 (duration: 00m 57s)
  • 08:37 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp3034 - T231849 T232724
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1074 with just 50 to keep its warmness level just in case T231638', diff saved to https://phabricator.wikimedia.org/P9115 and previous config saved to /var/cache/conftool/dbconfig/20190917-075807-marostegui.json
  • 07:48 effie: Enable puppet on mw*
  • 07:42 elukey: reboot analytics-tool1004 (host running superset) for kernel updates
  • 07:41 marostegui: Stop mysql on db1063 for decommissioning T232564
  • 07:40 marostegui: Remove db1063 from puppet and zarcillo T232564
  • 07:29 vgutierrez: repooling cp5007 without wikibase configuration - T99531
  • 07:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:19 vgutierrez: depooling cp5007 to ensure that wikibase removal goes as expected - T99531
  • 07:10 vgutierrez: getting rid of wikibase TLS certificate & nginx configuration on the text cache cluster - T99531
  • 06:56 vgutierrez: upgrading ATS to 8.0.5-1wm8 on cp2002, cp4021 and cp5001 - T231849
  • 06:55 vgutierrez: uploaded trafficserver 8.0.5-1wm8 to apt.wikimedia.org (stretch) - T231849
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1066 T233071', diff saved to https://phabricator.wikimedia.org/P9114 and previous config saved to /var/cache/conftool/dbconfig/20190917-065342-marostegui.json
  • 06:49 moritzm: reimage restbase2010 to Stretch T224553
  • 05:57 vgutierrez: upgrading ATS to 8.0.5-1wm7 on cp2002 and cp4021 - T232724
  • 05:56 vgutierrez: uploaded trafficserver 8.0.5-1wm7 to apt.wikimedia.org (stretch) - T232298 T232724
  • 05:23 effie: disable puppet on mw* servers for 536979
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 master and remove read-only from s2 T230785', diff saved to https://phabricator.wikimedia.org/P9113 and previous config saved to /var/cache/conftool/dbconfig/20190917-050133-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-only for maintenance T230785', diff saved to https://phabricator.wikimedia.org/P9112 and previous config saved to /var/cache/conftool/dbconfig/20190917-050043-marostegui.json
  • 05:00 marostegui: Starting s2 failover from db1066 to db1122 - T230785
  • 04:57 effie: Downtiming HTTPS-blog on icinga - T232412
  • 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 and depool it from API T230785', diff saved to https://phabricator.wikimedia.org/P9111 and previous config saved to /var/cache/conftool/dbconfig/20190917-041441-marostegui.json
  • 04:11 marostegui: Start s2 pre-switchover steps T230785
  • 00:34 AndyRussG: updated fruec from fb29cb7407 to 97128874bf

2019-09-16

  • 23:53 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgDebugLogFile in VS (duration: 00m 55s)
  • 23:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgDebugLogFile in CS (duration: 00m 55s)
  • 23:42 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgUploadThumbnailRenderHttpCustom* in VS (duration: 00m 54s)
  • 23:41 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgUploadThumbnailRenderHttpCustom* in CS (duration: 00m 55s)
  • 23:30 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wmgRC2UDPAddress in VS (duration: 00m 55s)
  • 23:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgRC2UDPAddress in CS (duration: 00m 56s)
  • 23:24 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Stop setting wgCopyUploadProxy in VS (duration: 00m 56s)
  • 23:21 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wgCopyUploadProxy in CS (duration: 00m 55s)
  • 23:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T225261 T194019 Adjust CentralNotice CSP for banner previews for FR-tech (duration: 00m 55s)
  • 22:59 chaomodus: restarted nagios-nrpe-server on notebook1003
  • 22:46 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use __DIR__ rather than global wmfConfgDir (duration: 00m 55s)
  • 21:48 ebernhardson: unban elastic1027 from production-search-eqiad
  • 20:55 XioNoX: remove 2 sessions to AS12871 on cr2-esams - T232617
  • 20:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:20 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:19 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 20:18 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:15 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 20:14 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:10 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 20:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 20:08 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 20:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:55 XioNoX: reboot scs-a8-eqiad (at 100% CPU)
  • 19:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:55 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:53 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:52 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:51 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:35 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:34 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:28 dzahn@cumin1001: Updating IPMI password on 12 hosts - dzahn@cumin1001
  • 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:27 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:19 dzahn@cumin1001: Updating IPMI password on 2 hosts - dzahn@cumin1001
  • 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:13 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:13 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:09 dzahn@cumin1001: Updating IPMI password on 8 hosts - dzahn@cumin1001
  • 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 19:03 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgCookieSetOnAutoBlock and wgCookieSetOnIpBlock to the default; never varied (duration: 00m 56s)
  • 19:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up globals in InitialiseSettings.php (duration: 00m 56s)
  • 19:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 19:01 dzahn@cumin1001: Updating IPMI password on 0 hosts - dzahn@cumin1001
  • 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 18:54 dzahn@cumin1001: Updating IPMI password on 1 hosts - dzahn@cumin1001
  • 18:54 dzahn@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
  • 18:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Variant configuration: Read JSON config for all wikis (duration: 00m 56s)
  • 18:48 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set MinimumPasswordLengthToLogin to 10 for all prived groups, not just +staff (duration: 00m 56s)
  • 18:40 jforrester@deploy1001: Synchronized src/WmfClusters.php: Use static VariantSettings instead of InitialiseSettings (noc-only change) (duration: 00m 55s)
  • 18:40 mutante: phab1001 - racadm racreset
  • 18:21 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Remove globals declaration and use via GLOBALS for testability (duration: 00m 56s)
  • 18:15 Lucas_WMDE: Morning SWAT done
  • 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: SWAT: bridge: enable EditTags for beta (T232582) (duration: 00m 58s)
  • 18:12 herron: migrating kafka1002 to kafka-main1002 T225005
  • 18:09 mutante: registry2001 - restarting nginx
  • 17:55 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 55s)
  • 17:49 ejegg: updated SmashPig standalone from 5d187092a7 to a0151434f4
  • 17:42 urandom: decommissioning Cassandra, restbase2010-c -- T224553
  • 17:42 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027 due to >1k orphan tasks
  • 17:09 jforrester@deploy1001: Synchronized docroot/noc/conf/VariantSettings.php.txt: New file for NOC (duration: 00m 54s)
  • 16:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make CommonSettings use mtime from VariantSettings (duration: 00m 55s)
  • 16:58 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Make InitialiseSettings use values from VariantSettings (duration: 00m 54s)
  • 16:55 jforrester@deploy1001: Synchronized wmf-config/VariantSettings.php: Establish VariantSettings.php everywhere (duration: 00m 56s)
  • 16:51 ebernhardson: ban elastic1027 from production-search-eqiad-chi
  • 16:12 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T223602 Inject config object into InitialiseSettings-labs rather than use wgConf global (duration: 00m 55s)
  • 15:42 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 56s)
  • 15:41 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602 (duration: 00m 08s)
  • 15:41 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php Variant configuration: Write JSON config for all wikis T223602
  • 15:10 @: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 15:07 @: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 15:06 @: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 14:54 urandom: decommissioning Cassandra, restbase2010-b -- T224553
  • 14:37 @: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:25 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:09 @: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:05 @: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 13:48 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 13:28 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FlaggedRevs/frontend/specialpages/reports/ValidationStatistics.php: Add missing "use" to getTopReviewers() - T232618 (duration: 00m 55s)
  • 13:10 moritzm: rebooting failoid2001 for kernel update/pick up new qemu
  • 13:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.22
  • 12:59 moritzm: installing qemu security updates on stretch
  • 12:58 urandom: decommissioning Cassandra, restbase2010-a -- T224553
  • 12:44 godog: stop thumbor traffic to statsd/graphite, use Prometheus only and replace Thumbor dashboard - T205870
  • 12:40 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99)
  • 12:17 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
  • 12:07 _joe_: rolling restart ended on eqiad T232613
  • 11:56 _joe_: rolling restart of php-fpm in eqiad to pick up the new memcached extension T232613
  • 11:50 _joe_: rolling restart of php-fpm in codfw to pick up the new memcached extension T232613
  • 11:43 Urbanecm: EU SWAT is done
  • 11:38 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: e37aed2: Remove expired throttle rules (duration: 01m 03s)
  • 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 313e3d9: Increase move rate-limit on Commons for all autopatrolled users (T232657) (duration: 01m 05s)
  • 11:33 jbond42: update peer address of AS28598
  • 11:30 effie: Upgrading php-memcached to 3.0.1+2.2.0-1~wmf3
  • 11:30 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Send a User-Agent with remote API requests (T232840) (duration: 01m 02s)
  • 11:28 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 869b56f: Lift IP cap on 2019-10-02 for Senior Citizen Write Wikipedia course - cs.wikipedia (T232831) (duration: 01m 02s)
  • 11:21 awight@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: SWAT: Enable File Importer source wiki edits on beta cluster (T228851) (duration: 01m 03s)
  • 11:14 awight@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Enable source wiki editing for testwiki (T228851) (duration: 01m 02s)
  • 11:10 awight@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/FileImporter: SWAT: Add debug logging for remote API failures (T228851) (duration: 01m 05s)
  • 11:06 _joe_: uploaded php-memcached_3.0.1+2.2.0-1~wmf3 to component/php72 for stretch T232613
  • 10:52 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 10:51 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 10:50 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 03s)
  • 10:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 04s)
  • 10:45 vgutierrez: Enabling OCSP prefetched responses for the non-canonical redirect service - T232988
  • 10:29 _joe_: installing a patched php-memcached on mw1347 T232613
  • 10:16 vgutierrez: upgrade acme-chief production servers to acme-chief 0.21 - T219765
  • 10:16 moritzm: upload libtrapperkeeper-webserver-jetty9-clojure 1.7.0-2+wmf1 to buster-wikimedia
  • 10:05 vgutierrez: restarting acmechief servers to get latest kernel upgrades
  • 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:57 vgutierrez: replacing nginx with ATS in cp3034 (upload cluster) - T231433
  • 08:56 mobrovac@deploy1001: Synchronized wmf-config/CommonSettings.php: Beta: enable the Parsoid extension - T231569 (duration: 01m 01s)
  • 08:50 marostegui: Apply grants for dbproxy1021 on db1133 (m5 master) with replication - T202367
  • 08:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:38 moritzm: installing faad2 security updates
  • 07:15 moritzm: repooling restbase2009
  • 06:48 marostegui: Stop MySQL on db1114 to upgrade it to 10.3
  • 06:04 marostegui: Stop MySQL on db2054 for decommissioning T232969
  • 06:01 marostegui: Remove db2054 from tendril and zarcillo T232969
  • 05:59 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2054 from config T232969 (duration: 01m 03s)
  • 05:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2054 from config T232969 (duration: 01m 05s)

2019-09-15

  • 16:51 Krinkle: Fixed a dozen abuse filters, listed at https://phabricator.wikimedia.org/T156096#5494060. The trailing pipe character, which will no longer be supported in a future version of AbuseFilter, was removed from the filters that had it.
  • 14:35 _joe_: test: setting opcache.interned_strings_buffer to 0 on mw1348 for T232613

2019-09-14

  • 23:42 onimisionipe: force shard allocation (dewiki_content_1566659363[4]) on eqiad cluster
  • 04:39 effie: Depool and reload mw1286
  • 01:14 ejegg: updated fundraising python tools from 1e405864d7 to e1b81688c6
  • 00:29 ejegg: updated payments-wiki from 1f556670cf to fc82318180

2019-09-13

  • 23:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:06 gehel: re-enable puppet on maps - T232817
  • 20:23 chaomodus: restarting netbox1001.wikimedia.org
  • 20:00 twentyafterfour: hotfixing T232600 due to severity of the bug and relative safety of the fix (if this breaks, yell at James_F who twisted my arm and made me do it)
  • 19:54 urandom: bootstrapping Cassandra, restbase2009-c -- T224553
  • 17:24 urandom: bootstrapping Cassandra, restbase2009-b -- T224553
  • 16:10 XioNoX: fix bgp group netflow on cr2-codfw
  • 15:47 urandom: bootstrapping Cassandra, restbase2009-a -- T224553
  • 15:43 effie: reverting live hacks on mw1348
  • 15:34 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable adhoc core dump logging - T232613 (duration: 01m 04s)
  • 15:14 akosiaris: upload apertium-dan_0.6.0-1+wmf3 apertium-nno_1.0.0-1+wmf1 apertium-nob_1.0.0-2+wmf1 apertium-swe_0.8.0-1+wmf1 to apt.wikimedia.org/jessie-wikimedia T218184
  • 15:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:02 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Add more log and context for T232613 logging - T232613 (duration: 01m 04s)
  • 15:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:30 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:30 moritzm: installing cups security update on buster (only client-side libs installed)
  • 14:22 moritzm: installing bzip2 update from Buster 10.1 point release
  • 14:18 moritzm: installing reportbug update from Buster 10.1 point release
  • 14:14 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 14:05 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 13:57 oblivian@deploy1001: Synchronized wmf-config/logging.php: unbreak mediawiki logging on scandium (duration: 01m 04s)
  • 13:28 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:27 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:21 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:20 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:19 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 12:56 _joe_: banning more urls on maps1003
  • 12:37 _joe_: temp ban of class of urls on maps1003 nginx
  • 12:14 jbond42: add timing information to maps1003 access logs
  • 11:39 jbond42: enable access logs on maps1003
  • 11:38 _joe_: manually raising the worker heap limit to 600 MB on kartotherian on maps1003
  • 11:11 elukey: reboot an-conf100* (Analytics Zookeeper nodes - not yet in production) for kernel upgrades
  • 11:10 elukey: reboot an-tool1007 (runs turnilo) for kernel upgrades
  • 11:08 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 11:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:05 godog: silence kartotherian pages for 2h, known issue
  • 10:47 vgutierrez: rebooting acmechief-test servers to catch up latest kernel upgrades
  • 10:42 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:41 moritzm: reimage restbase2009 to stretch T224553
  • 10:38 moritzm: repool restbase1018 after reimage to stretch and completed Cassandra bootstrap
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:36 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:13 vgutierrez: disable ATS-TLS debug options on cp5001 - T232298
  • 10:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 09:46 gehel: re-enabling /geoline on maps1004 - T232817
  • 09:45 @: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
  • 09:44 @: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:42 @: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
  • 09:40 godog: install linux-perf-4.9 on maps1002 and attempt to capture a stack sample
  • 09:38 gehel: drop /geoshape and restart kartotherian on maps1004 - T232817
  • 09:27 gehel: restart kartotherian on maps1004 - T232817
  • 09:24 gehel: deny access to /geoline on maps1004 - T232817
  • 09:11 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 09:08 godog: downtime kartotherian pages for 1h in codfw
  • 09:01 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1046.eqiad.wmnet
  • 09:00 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1017.eqiad.wmnet
  • 08:57 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 08:52 godog: downtime kartotherian pages for 1h
  • 08:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
  • 08:48 jmm@cumin2001: Updating IPMI password on 1 hosts - jmm@cumin2001
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:47 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
  • 08:47 jmm@cumin2001: START - Cookbook sre.hosts.ipmi-password-reset
  • 08:45 gehel: stop tilerator on maps to help reduce load
  • 08:37 _joe_: rolling restart of kartotherian
  • 08:33 _joe_: restarting kartotherian on maps1003, all workers seem stuck
  • 05:58 oblivian@deploy1001: Synchronized w/fatal-error.php: Adding core dump function to fatal-error (duration: 01m 04s)
  • 05:40 _joe_: live-hacking mw1348, setting rlimit_core = unlimited to allow core dumps to be taken
  • 05:17 effie: Rolling restart php-fpm across the fleet for 536400
  • 04:53 vgutierrez: restarting ats-tls on cp4021 and cp2002 to pick up the new SSL session cache timeout - T231849
  • 04:50 eileen: process-control config revision is 43a2677bcf - turned off gender import
  • 02:23 eileen: civicrm revision changed from c5ab5aea9e to 45dbfdb96f, config revision is 1da8391a9a
  • 01:09 XioNoX: add IPv6 sampling to cr1-eqiad
  • 01:07 XioNoX: enable netflow sampling on cr2-codfw

2019-09-12

  • 23:35 XioNoX: enable netflow sampling on cr1-codfw
  • 23:21 urandom: decommissioning Cassandra, restbase2009-b -- T224553
  • 23:19 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T223602 Read config from JSON, not serialised PHP on testwiki (duration: 01m 03s)
  • 23:18 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: T223602 Add ability to read config from JSON, not serialised PHP (duration: 01m 04s)
  • 23:10 eileen: process-control config revision is 1da8391a9a
  • 22:53 ayounsi@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 22:48 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:43 ayounsi@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
  • 22:43 ayounsi@cumin2001: START - Cookbook sre.ganeti.makevm
  • 22:20 XenoRyet: payments-wiki updated from 4ebbdb247d to 1f556670cf
  • 22:14 XioNoX: remove extra prepend in AMS-IX
  • 21:18 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: Hardcode posix signal and log coredump - T232613 (duration: 01m 04s)
  • 21:17 mbsantos@deploy1001: Finished deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0 (duration: 03m 18s)
  • 21:14 mbsantos@deploy1001: Started deploy [tilerator/deploy@5996843]: Deploy tilerator 1.1.4-wmf.0
  • 21:13 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0 (duration: 03m 52s)
  • 21:09 mbsantos@deploy1001: Started deploy [kartotherian/deploy@c4c9e8b]: Deploy kartotherian 1.1.4-wmf.0
  • 21:00 urandom: decommissioning Cassandra, restbase2009 -- T224553
  • 20:33 krinkle@deploy1001: Synchronized wmf-config/: d495d5e24949 (duration: 01m 03s)
  • 20:28 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: d495d5e24949 (duration: 01m 04s)
  • 20:27 eileen: civicrm revision changed from 4075e396d5 to f00c6482bf, config revision is 635f198b92
  • 20:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: beta-only (duration: 01m 02s)
  • 20:03 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 04s)
  • 20:02 moritzm: installing firmware-nonfree update from Buster 10.1 point release
  • 19:51 moritzm: installing systemd bugfix update from Buster 10.1 point release
  • 19:44 moritzm: installing 4.19.67 kernel from 10.1 point release on Buster systems
  • 19:34 urandom: bootstrapping Cassandra, restbase1018-c -- T224553
  • 18:59 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable coredump on some mysterious php7.2 failure - T232613 (duration: 01m 04s)
  • 18:32 moritzm: installing gdb updates from buster 10.1 point release
  • 18:28 bblack: lvs1016: restart pybal to revert test
  • 18:21 bblack: lvs1016: restart pybal to test dual bgp peering
  • 18:04 bblack: lvs1015: restart pybal to return BGP session to cr2 - T226424
  • 18:03 bblack: lvs1014: restart pybal to return BGP session to cr2 - T226424
  • 17:58 XioNoX: revert VRRP priority change cr2-eqiad - T226424
  • 17:54 XioNoX: revert OSPF priority change on cr2-eqiad - T226424
  • 17:53 XioNoX: re-enabled external BGP on cr2-eqiad - T226424
  • 17:46 urandom: bootstrapping Cassandra, restbase1018-b -- T224553
  • 17:43 XioNoX: reboot cr2-eqiad - T226424
  • 17:40 XioNoX: failover cr2-eqiad master RE from RE1 to RE0 - T226424
  • 17:31 jforrester@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/rdbms/lbfactory/LBFactoryMulti.php: T232613 Add ability to core dump on empty string array key that should exist (wmf.22 only, flagged off) (duration: 01m 03s)
  • 17:31 XioNoX: power off re0.cr2-eqiad - T226424
  • 17:25 XioNoX: failover cr2-eqiad master RE from RE0 to RE1 - T226424
  • 17:19 halfak@deploy1001: Finished deploy [ores/deploy@7d45b80]: T232660 (duration: 13m 41s)
  • 17:05 halfak@deploy1001: Started deploy [ores/deploy@7d45b80]: T232660
  • 17:04 XioNoX: power off re1.cr2-eqiad - T226424
  • 17:02 moritzm: installing unzip security updates on buster
  • 17:00 XioNoX: +1000 metric to all transport to/from cr2-eqiad - T226424
  • 16:57 moritzm: installing libxslt security updates on buster
  • 16:49 XioNoX: Deactivate IX/transit/private-peer v4/v6 BGP on cr2-eqiad - T226424
  • 16:47 moritzm: installing NSS security updates on buster
  • 16:42 XioNoX: er, switch VRRP master to cr1-eqiad - T226424
  • 16:42 XioNoX: switch VRRP master to cr2-eqiad - T226424
  • 16:36 bblack: lvs1013: restart pybal to move bgp session to cr1 - T226424
  • 16:36 bblack: lvs1014: restart pybal to move bgp session to cr1 - T226424
  • 16:35 bblack: lvs1015: restart pybal to move bgp session to cr1 - T226424
  • 16:34 bblack: lvs1016: restart pybal to move bgp session to cr1 - T226424
  • 16:19 XioNoX: rollback force VRRP backup on cr1-eqiad - T226424
  • 16:16 XioNoX: activate CF tunnel on cr1-eqiad - T226424
  • 16:16 XioNoX: activate transit4/6 on cr1-eqiad - T226424
  • 16:09 urandom: bootstrapping Cassandra, restbase1018-a -- T224553
  • 16:04 XioNoX: reboot cr1-eqiad - T226424
  • 16:01 XioNoX: force offline/online of FPC3 on cr1-eqiad
  • 15:45 XioNoX: failover master RE from RE1 to RE0 on cr1-eqiad - T226424
  • 15:39 XioNoX: deactivate transit4/6 on cr1-eqiad - T226424
  • 15:31 XioNoX: shutdown re0.cr1-eqiad - T226424
  • 15:23 XioNoX: failover master RE from RE0 to RE1 on cr1-eqiad - T226424
  • 15:13 XioNoX: shutdown re1.cr1-eqiad - T226424
  • 15:05 XioNoX: disable primary tunnel to CF in eqiad (for real this time, I did see an uptake of traffic on backup link before the rollback)
  • 15:03 XioNoX: rolled back disable primary tunnel to CF in eqiad
  • 15:02 XioNoX: disable primary tunnel to CF in eqiad
  • 14:53 bblack: restart pybal on lvs1013 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:50 bblack: restart pybal on lvs1016 to move BGP conn to cr2-eqiad - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/536209 - T226424
  • 14:45 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:41 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 14:39 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:37 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:29 XioNoX: ensure cr1-eqiad is vrrp backup for all groups - T226424
  • 13:22 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:03 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:01 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:57 effie: restarting hhvm on mw1233 and repooling
  • 12:56 effie: depool mw1233
  • 12:38 moritzm: reimaging restbase1018 to stretch
  • 12:03 Amir1: EU SWAT is done
  • 12:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q20mio (T225055) (duration: 01m 31s)
  • 11:11 akosiaris@: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:11 akosiaris@: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:09 akosiaris@: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 11:00 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 09:42 jynus: compressing tables on labsdb1012 T232446
  • 08:22 vgutierrez: upgrading to acme-chief 0.21 on acmechief-test instances - T219765
  • 08:17 vgutierrez: restarting pybal on lvs1015 and lvs2003 - T176875
  • 08:13 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wdqs,service=wdqs-heavy-queries
  • 08:11 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=puppetmaster1001.eqiad.wmnet,service=wdqs-heavy-queries
  • 08:07 vgutierrez: restarting pybal on lvs2006 - T176875
  • 08:02 vgutierrez: restarting pybal on lvs1016 - T176875
  • 07:45 vgutierrez: uploaded acme-chief 0.21 to apt.wikimedia.org (buster) - T219765
  • 06:51 vgutierrez: restarting ATS-TLS on cp4021 and cp2002 to get the new SSL session cache size - T232298
  • 06:00 marostegui: Stop MySQL on db1073 for decommission T231892
  • 05:59 marostegui: Remove db1073 from tendril and zarcillo T231892
  • 05:26 _joe_: restarting strongswan on all eqiad caches that need it
  • 05:23 _joe_: restarting strongswan on cp1077
  • 03:37 eileen: civicrm revision changed from 32cd5e4953 to 4075e396d5, config revision is 3e22a80bc8
  • 02:13 eileen: civicrm revision changed from 53aeba6318 to 32cd5e4953, config revision is 3e22a80bc8
  • 02:03 XioNoX: repooling ulsfo

2019-09-11

  • 23:50 ejegg: updated payments-wiki from 5432f9c3a4 to 4ebbdb247d
  • 23:20 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.197` on cr2-eqiad
  • 22:43 XioNoX: `set protocols bgp group Netflow cluster 208.80.154.196` on cr1-eqiad
  • 22:36 XioNoX: add BGP session between cr2-eqord and netflow1001
  • 22:30 urandom: decommissioning Cassandra, restbase1018-c -- T224553
  • 20:57 urandom: bootstrapping Cassandra, restbase-dev1005-b -- T224554
  • 20:21 ottomata: stopped and removed eventlogging-service-eventbus - T232122
  • 20:12 ppchelko@deploy1001: Finished deploy [changeprop/deploy@522177f]: Clean up old event style support (duration: 01m 39s)
  • 20:11 ppchelko@deploy1001: Started deploy [changeprop/deploy@522177f]: Clean up old event style support
  • 20:07 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049 (duration: 00m 53s)
  • 20:06 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@2c9e409]: Clean up old event style support T230049
  • 18:43 urandom: decommissioning Cassandra, restbase1018-b -- T224553
  • 18:42 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T211124 ed8dd7aad9e5 (duration: 01m 04s)
  • 18:42 nuria@deploy1001: Finished deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect (duration: 08m 39s)
  • 18:40 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: no-op ed8dd7aad9e5 (duration: 01m 06s)
  • 18:37 krinkle@deploy1001: Synchronized tests/: no-op ed8dd7aad9e5 (duration: 01m 05s)
  • 18:33 nuria@deploy1001: Started deploy [analytics/refinery@fa994c7]: v0.0.99 of refinery, again, try II. last time shas committed by jenkins were incorrect
  • 18:16 krinkle@deploy1001: Synchronized wmf-config/logging.php: d6865e3365e8 - T211124 (duration: 01m 04s)
  • 18:16 nuria@deploy1001: Finished deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery (duration: 01m 21s)
  • 18:15 nuria@deploy1001: Started deploy [analytics/refinery@f4c60a4]: v0.0.99 of refinery
  • 18:02 krinkle@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/WikimediaMaintenance/blameStartupRegistry.php: (no justification provided) (duration: 01m 05s)
  • 17:57 XioNoX: upgrade librenms to 1.55
  • 17:43 ayounsi@deploy1001: Finished deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599 (duration: 00m 09s)
  • 17:42 ayounsi@deploy1001: Started deploy [librenms/librenms@2a06e98]: Upgrade LibreNMS to 1.55 - T232599
  • 17:32 bblack: enable GRE MTU mitigation on eqsin caches (cp5xxx) - T232602
  • 17:27 bblack: restbase2009 - re-pool - T227408
  • 17:07 bblack: restbase2009 - shutdown for hardware work - T227408
  • 17:05 bblack: restbase2009 - depool for hardware work - T227408
  • 16:57 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c0fd061: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 02s)
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka100[23]
  • 16:54 bblack: manually removed decommed eventbus LVS IP on kafka-main1001
  • 16:50 bblack: manually removed decommed eventbus LVS IP on kafka-main200[23]
  • 16:49 bblack: manually removed decommed eventbus LVS IP on kafka-main2001
  • 16:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6007fbc: [rowiki] Allow sysops to remove patrollers (T231099) (duration: 01m 03s)
  • 16:39 urandom: decommissioning Cassandra, restbase1018-a -- T224553
  • 16:38 Urbanecm: Run mwscript emptyUserGroup.php --wiki=fawiki OTRS-member (T232554)
  • 16:36 bblack: ran conftool-merge on puppetmaster1001 (manually from sudo -i, to fixup missing updates)
  • 16:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 76991f2: Remove OTRS-member usergroup from fawiki (T232554) (duration: 01m 05s)
  • 16:32 Urbanecm: mwscript importImages.php --wiki=commonswiki --user=Abbe98 --comment-ext=txt /home/urbanecm/T232346
  • 16:31 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.22/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.StartModule.less: SWAT: c45d6d0: Homepage: Fix start module layout bugs (T230629, T232549, T225668) (duration: 01m 03s)
  • 16:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 565fafa: Set noindex for user and user_talk on zhwiki (T231982) (duration: 01m 05s)
  • 16:24 urandom: bootstrapping Cassandra, restbase-dev1005-a -- T224554
  • 16:16 bblack@cumin1001: conftool action : set/pooled=no; selector: cluster=eventbus
  • 16:10 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 510aa6b: Add new whitelist rule for Université de Lorraine course (T232596) (duration: 01m 04s)
  • 16:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: eceaccf: Add autopatrolled user group to az.wikibooks (T231493) (duration: 01m 06s)
  • 15:52 bblack: lvs1015 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:51 bblack: lvs2003 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:49 bblack: lvs1016 - remove eventbus.svc.eqiad.wmnet service, restart pybal, etc
  • 15:48 bblack: lvs2006 - remove eventbus.svc.codfw.wmnet service, restart pybal, etc
  • 15:03 bblack: downtimed dns-discovery confd health checks for eventbus - T232122
  • 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.22 (duration: 01m 02s)
  • 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.22
  • 12:48 moritzm: upgrade labpuppetmaster* to use facter 3 / puppet 5
  • 12:40 moritzm: removing now obsolete puppet/puppetdb packages from labpuppetmaster* T171188
  • 12:40 moritzm: removing now puppet/puppetdb packages from labpuppetmaster* T171188
  • 11:59 hashar: Restarting Gerrit due to deadlock in the account cache # T224448
  • 11:57 bblack: applying GRE MTU -> MSS fixup to cobalt and gerrit2001 - T218184
  • 11:41 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized php-1.34.0-wmf.21/maintenance/getReplicaServer.php: SWAT: maintenance/getReplicaServer.php: Remove reference to long-deleted config var (T232268) (duration: 01m 04s)
  • 11:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AMC Outreach modal (T231436) (duration: 01m 04s)
  • 11:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms on write both up to Q10mio (T225055) (duration: 01m 03s)
  • 11:10 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: TR: set WikibaseTaintedReferencesEnabled true on labs wikidatawiki (T232191) (duration: 01m 03s)
  • 10:57 mobrovac: drop the wiktionary definition keyspace - T231361
  • 10:23 moritzm: removed roentgenium/tureis in Ganeti T224559
  • 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 10:17 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:01 jynus: stopping and upgrading db1074
  • 09:56 jynus: upgrading mariadb client library on mariadb root clients
  • 09:46 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Push PHP7 traffic to 50% - T219150 (duration: 01m 03s)
  • 09:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a (duration: 12m 15s)
  • 09:32 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3a
  • 09:32 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3 (duration: 13m 18s)
  • 09:19 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #3
  • 09:16 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2 (duration: 03m 59s)
  • 09:13 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints, take #2
  • 09:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449 (duration: 03m 24s)
  • 09:08 mobrovac@deploy1001: Started deploy [restbase/deploy@cf2ca76]: Stop using storage for enwiktionary definition and expose new PCS javascript endpoints - T231361 T232449
  • 08:36 mobrovac@deploy1001: Finished deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361 (duration: 02m 13s)
  • 08:34 mobrovac@deploy1001: Started deploy [changeprop/deploy@7a8ab89]: Stop pregenerating enwiktionary page/definition, take #2 - T231361
  • 08:24 mobrovac@deploy1001: Finished deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition (duration: 00m 34s)
  • 08:24 mobrovac@deploy1001: Started deploy [changeprop/deploy@069d297]: Revert Stop pregenerating enwiktionary page/definition
  • 08:22 mobrovac@deploy1001: Finished deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361 (duration: 02m 45s)
  • 08:19 mobrovac@deploy1001: Started deploy [changeprop/deploy@56a8342]: Stop pregenerating enwiktionary page/definition - T231361
  • 08:13 elukey: add thirdparty/amd-rocm271 to buster-wikimedia and update it with ROCm 2.7.1 packages
  • 08:09 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:07 elukey: execute reprepro clearvanished on install1002 to clear buster-wikimedia|thirdparty/amd-rocm27 (not used anymore)
  • 08:07 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1122', diff saved to https://phabricator.wikimedia.org/P9088 and previous config saved to /var/cache/conftool/dbconfig/20190911-080450-marostegui.json
  • 07:52 moritzm: reimaging restbase-dev1005 to Stretch T224554
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9087 and previous config saved to /var/cache/conftool/dbconfig/20190911-075139-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1122', diff saved to https://phabricator.wikimedia.org/P9086 and previous config saved to /var/cache/conftool/dbconfig/20190911-073335-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9085 and previous config saved to /var/cache/conftool/dbconfig/20190911-072344-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1122', diff saved to https://phabricator.wikimedia.org/P9084 and previous config saved to /var/cache/conftool/dbconfig/20190911-071450-marostegui.json
  • 07:07 marostegui: Stop MySQL on db1122 to reboot for a kernel upgrade T230785
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to reboot for kernel upgrade T230785', diff saved to https://phabricator.wikimedia.org/P9083 and previous config saved to /var/cache/conftool/dbconfig/20190911-070635-marostegui.json
  • 07:00 hashar: Restarting Gerrit - T224448
  • 06:58 hashar: Restarting Gerrit
  • 06:45 marostegui: Drop unused database puppet on m1 - T231539
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9082 and previous config saved to /var/cache/conftool/dbconfig/20190911-061924-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Re-organize s1 codfw weights and roles - T230106', diff saved to https://phabricator.wikimedia.org/P9081 and previous config saved to /var/cache/conftool/dbconfig/20190911-061659-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2048, will be decommissioned T230106', diff saved to https://phabricator.wikimedia.org/P9080 and previous config saved to /var/cache/conftool/dbconfig/20190911-054855-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2112 to s1 codfw master T230106', diff saved to https://phabricator.wikimedia.org/P9079 and previous config saved to /var/cache/conftool/dbconfig/20190911-054753-marostegui.json
  • 05:29 marostegui: Switchover s1 codfw master db2048 -> db2112 T230106
  • 03:31 eileen: civicrm revision changed from b343642c76 to 53aeba6318, config revision is 3e22a80bc8

2019-09-10

  • 20:46 ejegg: updated payments-wiki from 15baf7f58b to 5432f9c3a4
  • 20:24 XioNoX: add MSS clamp on install1002 - T232456
  • 20:20 XioNoX: add MSS clamp on archiva1001 - T232456
  • 18:42 herron: rolling out "Aggregate IPsec Tunnel Status" icinga check, please disregard for the time being if it alerts
  • 18:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T229863 Remove EventBusRCFeedEngine eventServiceName (duration: 01m 05s)
  • 18:15 XioNoX: rollback test add static route on bast3002 to force advmss
  • 18:10 XioNoX: test add static route on bast3002 to force advmss
  • 17:58 jforrester@deploy1001: Synchronized wmf-config/logging.php: T232042 Direct Parsoid/PHP rt-testing log events to a different target (duration: 01m 02s)
  • 17:56 jforrester@deploy1001: Synchronized wmf-config/ProductionServices.php: T232122 Stop setting production value for eventlogging-service (duration: 01m 00s)
  • 17:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232122 Remove use of eventlogging-service (duration: 01m 03s)
  • 17:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-sync for safety after scap errored with a broken pipe (duration: 01m 03s)
  • 17:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Variant configuration: Write to static (JSON) as well as serialised cache for testwiki T223602 (duration: 01m 02s)
  • 17:29 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Variant configuration: Be able to write to static (JSON) as well as serialised cache (duration: 01m 03s)
  • 16:35 elukey: reboot analytics-tool1001 via ganeti gnt - not reachable via ssh
  • 16:24 urandom: disabling reserved space on restbase-dev1005:/dev/mapper/restbase--dev1005--vg-srv -- T224554
  • 16:10 marostegui: Failover m1 from db1063 to db1135 - T231403
  • 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Set items term store on write both for all of Wikidata" (duration: 01m 02s)
  • 15:58 thcipriani: restarting gerrit (again) https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&from=1568109359163&to=1568130959163&var-Application=&var-Window=30m due to T224448
  • 15:39 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.22
  • 15:37 marostegui: Start pre-switchover for m1 steps T231403
  • 15:35 hashar@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: Revert "Improve MultiHttpClient connection concurrency and reuse" - T232487 (duration: 00m 55s)
  • 15:33 reedy@deploy1001: Synchronized php-1.34.0-wmf.22/includes/libs/http/MultiHttpClient.php: T232487 (duration: 00m 55s)
  • 15:13 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 to 1.34.0-wmf.22 # T220747
  • 14:48 hashar@deploy1001: scap failed: average error rate on 3/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:45 akosiaris: repool cp1075 ats-be, releases cert updated
  • 14:44 akosiaris@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 14:44 XioNoX: depool ulsfo for DC UPS power maintenance (see maint-announce)
  • 14:36 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:32 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747 (duration: 34m 03s)
  • 14:31 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:29 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 14:26 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:20 @: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 14:18 ottomata: increasing max_body_size to 10mb for all eventgate services - T232362
  • 14:14 akosiaris: depool cp1075 ats-be to test helmfile sync
  • 14:14 akosiaris@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,dc=eqiad,cluster=cache_text,service=ats-be
  • 13:58 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.22 and rebuild l10n cache # T220747
  • 13:56 hashar: Applied security patches to 1.34.0-wmf.22 # T220747
  • 13:53 hashar: scap prep 1.34.0-wmf.22 # T220747
  • 13:34 elukey: reboot stat1005 to clear inconsistent process state after tensorflow tests
  • 13:23 hashar: ./make-wmf-branch -n 1.34.0-wmf.22 -o master -c extensions/CharInsert # T220747
  • 13:12 thcipriani: restarting gerrit
  • 13:11 hashar: Gerrit experiencing difficulty due to ongoing wmf branch cut - T231872
  • 13:01 moritzm: copied prometheus-jmx-exporter to buster-wikimedia (from stretch-wikimedia, just a package with some jars)
  • 12:40 cmjohnson1: the new pdus are racked in b6
  • 12:14 cmjohnson1: removing power from ps1-b6 side B...mgmt should not be affected
  • 11:20 cmjohnson1: swapping the PDU in rack B6 eqiad T227541
  • 11:09 Urbanecm: EU SWAT done
  • 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)
  • 11:07 ema@puppetmaster1001: conftool action : set/weight=100; selector: service=ats-be,dc=eqiad,name=cp1075.eqiad.wmnet
  • 11:06 ema: cp1075: set weight in etcd back to 100
  • 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6afe963: Set items term store on write both for all of Wikidata (T225055) (duration: 00m 55s)
  • 10:51 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:45 akosiaris@: helmfile [CODFW] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 akosiaris@: helmfile [STAGING] Ran 'apply' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 10:34 @: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 10:32 vgutierrez: repool cp5001 with ats-tls collecting memory usage details every hour - T232298
  • 09:56 elukey: restart archiva on archiva1001 - UI not working (probably due to connections to maven central being stuck)
  • 09:50 moritzm: installing ghostscript security updates on jessie
  • 09:37 moritzm: added jbond as chanserv ops for #wikimedia-operations
  • 08:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:42 moritzm: reimaging mw2231 after hardware maintenance T231192
  • 07:21 moritzm: iron.wikimedia.org is no longer a bastion host
  • 06:57 moritzm: upgrading snapshot* to PHP 7.2.22 T230024
  • 05:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1073 from config T231892 (duration: 00m 54s)
  • 05:45 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1073 from config T231892 (duration: 00m 55s)
  • 05:35 marostegui: Stop MySQL on db2047 T231852
  • 05:35 marostegui: Remove db2047 from tendril and zarcillo - T231852
  • 05:33 urandom: decommissioning Cassandra, restbase-dev1005-b -- T224554
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1104 into API T230762', diff saved to https://phabricator.wikimedia.org/P9071 and previous config saved to /var/cache/conftool/dbconfig/20190910-051529-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 master and remove read-only from s8 T227062', diff saved to https://phabricator.wikimedia.org/P9070 and previous config saved to /var/cache/conftool/dbconfig/20190910-050213-marostegui.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 as read-only for maintenance T230762', diff saved to https://phabricator.wikimedia.org/P9069 and previous config saved to /var/cache/conftool/dbconfig/20190910-050046-marostegui.json
  • 05:00 marostegui: Starting s8 failover from db1104 to db1109 - T227062
  • 04:46 vgutierrez: depool cp5001 for memory leak debugging on ATS - T232298
  • 04:23 marostegui: Start topology changes on s8, connect everything under db1109 - T230762
  • 04:22 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 and depool it from API T230762', diff saved to https://phabricator.wikimedia.org/P9068 and previous config saved to /var/cache/conftool/dbconfig/20190910-042243-marostegui.json
  • 04:18 marostegui: Start s8 (wikidata) pre switchover steps T230762
  • 00:59 krinkle@deploy1001: Finished deploy [performance/navtiming@f2a0863]: (no justification provided) (duration: 00m 05s)
  • 00:59 krinkle@deploy1001: Started deploy [performance/navtiming@f2a0863]: (no justification provided)
  • 00:57 Krinkle: krinkle@deploy1001: Deploy performance/navtiming f2a0863 - T226539
  • 00:41 urandom: decommissioning Cassandra, restbase-dev1005-a -- T224554

2019-09-09

  • 23:44 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/skins/MinervaNeue/: T232260 (duration: 00m 57s)
  • 22:28 ejegg: updated payments-wiki from 51d9ed79b6 to 15baf7f58b
  • 20:50 urandom: bootstrapping Cassandra, restbase-dev1004-b -- T224554
  • 19:48 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 05m 45s)
  • 19:42 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 19:41 mdholloway: mobileapps deployment failed repooling canary (scb2001); retrying
  • 19:40 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9 (duration: 02m 59s)
  • 19:37 XioNoX: fix eqsin CF tunnel misconfig
  • 19:37 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@533d541]: Update mobileapps to 01971d9
  • 17:56 andrewbogott: disabling puppet on labpuppetmaster1001 as part of T171188
  • 17:55 XioNoX: push cloudflare tunnel config to cr1-eqsin
  • 16:50 papaul: replacing Fan kit and power supplies on cr1-codfw
  • 14:22 urandom: bootstrapping Cassandra, restbase-dev1004-a -- T224554
  • 13:51 vgutierrez: upgrading ats to 8.0.5-1wm6 on cp5001 - T232298
  • 13:39 vgutierrez: uploaded trafficserver 8.0.5-1wm6 to apt.wikimedia.org (stretch) - T232298
  • 13:31 moritzm: installing facter update from buster 10.1 point release (T222356)
  • 13:15 moritzm: upgrading labweb/wikitech to PHP 7.2.22 T230024
  • 13:02 Urbanecm: Patch is deployed, deploy1001 should be clear
  • 13:01 moritzm: upgrading remaining mediawiki app servers (mw1266-mw1275) to PHP 7.2.22 T230024
  • 12:55 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/WikibaseMediaInfo/: ubn patch T231276 (duration: 00m 58s)
  • 12:51 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/Wikibase: ubn patch T231276 (duration: 01m 03s)
  • 12:48 moritzm: upgrading remaining job runners to PHP 7.2.22 T230024
  • 12:44 Urbanecm: EU SWAT wmf patch ongoing, testing with mwdebug1002
  • 12:41 ema: lvs1015 (primary): restart pybal to add service restbase-ssl T210411
  • 12:36 ema: lvs2003 (primary): restart pybal to add service restbase-ssl T210411
  • 12:32 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=eqiad
  • 12:30 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,dc=codfw
  • 12:29 elukey: restart archiva again to debug download artifact issue
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase2009.codfw.wmnet
  • 12:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: service=restbase-ssl,name=restbase1022.eqiad.wmnet
  • 12:11 Urbanecm: Undeployed patch in wmf branch, will resolve soon
  • 12:01 moritzm: installing ldap-corp1001 T231015
  • 11:32 Urbanecm: Dry run for all wikis (T231137)
  • 11:26 moritzm: installing ldap-corp2001 T231015
  • 10:39 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 10:38 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 10:22 effie: jiji@deploy1001:~$ scap sync-file wmf-config/CommonSettings.php "Push PHP7 traffic to 33.3% - T219150"
  • 09:48 moritzm: updated stretch netinst image to 9.11 T232308
  • 09:42 eileen: civicrm revision changed from d1d65f37ea to 516eeb54b5, config revision is 5a6a9c6c03
  • 09:40 moritzm: updated buster netinst image to 10.1 T232310
  • 09:28 ema: lvs1016, lvs2006 (secondaries): restart pybal to add service restbase-ssl T210411
  • 09:02 elukey: restart archiva on archiva1001 - stuck and not serving requests (no trace about why in the logs)
  • 08:55 eileen: civicrm revision is d1d65f37ea, config revision is 5a6a9c6c03
  • 08:38 vgutierrez: disabling systemd hardening for ats-tls on cp5001 - T232298
  • 07:33 moritzm: installing ghostscript security updates
  • 03:53 vgutierrez: reboot analytics-tool1001
  • 02:59 bd808: Testing twitter integration after software update for Stashbot. In theory messages up to 280 characters in length will now be passed through to the @wikimediatech Twitter feed without being truncated. This message should end with a unicorn face if that is correct. 🦄

2019-09-08

2019-09-06

  • 21:33 cdanis: cdanis@mw1317.eqiad.wmnet ~ 🕠🍺 sudo -i depool
  • 21:27 James_F: mw1317 seems corrupted (Fatal error: Class undefined: stdClass in /srv/mediawiki/php-1.34.0-wmf.21/includes/libs/rdbms/database/DatabaseMysqli.php); running scap pull
  • 18:01 godog: silence esams pages for 30m
  • 17:43 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux (duration: 02m 55s)
  • 17:40 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux
  • 17:39 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3 (duration: 00m 21s)
  • 17:38 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 3
  • 17:26 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 2 (duration: 00m 37s)
  • 17:25 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux 2
  • 17:25 crusnov@deploy1001: Finished deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux (duration: 01m 29s)
  • 17:24 crusnov@deploy1001: Started deploy [netbox/deploy@dea254a]: deploy for netbox split T223291 - buster redux
  • 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:51 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:43 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
  • 12:38 ema: cp5001: restart trafficserver-tls.service to clear icinga alert after segfault
  • 12:36 moritzm: fix permissions on /var/spool/exim on krypton (hosts used to run the exim heavy role which uses different permissions than the light role)
  • 10:59 onimisionipe: force shard allocation - chi eqiad
  • 10:59 Amir1: ladsgroup@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=testwikidatawiki (T225056)
  • 10:17 moritzm: installing exim4 security updates
  • 08:43 mutante: webperf* - /usr/local/sbin/build-envoy-config -c /etc/envoy | rm /etc/envoy/listeners.d/00-tls_terminator_443.yaml | run puppet - envoy now listening on 443 (T210411)
  • 07:48 mutante: running puppet on cp-text_eqiad / cp1075 - switching releases.wikimedia.org to TLS to backend
  • 06:29 oblivian@deploy1001: Synchronized README: testing php conditional restarts (duration: 00m 55s)
  • 06:09 mutante: puppetmaster1001 - same for restbase-dev1005 and restbase-dev1006 (T224554)
  • 06:03 mutante: puppetmaster1001 - copying cassandra-ca-manager to /usr/local/bin - deleting expired restbase-dev1004 certs - running cassandra-ca-manager services-dev.yaml T224554
  • 05:31 marostegui: Stop MySQL on db2046 - T231767
  • 05:11 marostegui: Remove db2046 from tendril and zarcillo - T231767
  • 04:54 _joe_: run systemctl reset-failed on kafka1001 to clear a 13 hours icinga alert
  • 03:21 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (duration: 00m 14s)
  • 03:21 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291
  • 03:16 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (testing) (duration: 00m 20s)
  • 03:16 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (testing)
  • 03:07 chaomodus: restarting keyholder on deploy1001
  • 02:34 ejegg: rolled back payments-wiki to 51d9ed79b6
  • 02:25 ejegg: updated payments-wiki (again) from 51d9ed79b6 to 04120169b0... false alarm
  • 02:15 ejegg: payments-wiki rolled back to 51d9ed79b6
  • 02:11 ejegg: updated payments-wiki from 51d9ed79b6 to 04120169b0
  • 01:44 eileen: tools revision changed from 643c48b26a to 1e405864d7
  • 01:18 ayounsi@deploy1001: Finished deploy [netbox/deploy@367ca84]: test (duration: 00m 02s)
  • 01:18 ayounsi@deploy1001: Started deploy [netbox/deploy@367ca84]: test

2019-09-05

  • 23:13 ayounsi@deploy1001: Finished deploy [netbox/deploy@367ca84]: test (duration: 00m 42s)
  • 23:12 ayounsi@deploy1001: Started deploy [netbox/deploy@367ca84]: test
  • 23:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T151425 Require that passwords are not in the most common 100k list for all users (duration: 00m 48s)
  • 22:12 eileen: tools revision changed from b42bda6bf3 to 643c48b26a
  • 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: deploy for netbox split T223291 (duration: 00m 03s)
  • 21:42 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: deploy for netbox split T223291
  • 21:35 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: test deploy for netbox split - again (duration: 00m 12s)
  • 21:34 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: test deploy for netbox split - again
  • 19:28 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: c7678f0e3d638 (duration: 00m 47s)
  • 19:21 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: 7adf466614d (duration: 00m 48s)
  • 18:10 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: test deploy for netbox split (duration: 38m 39s)
  • 17:31 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: test deploy for netbox split
  • 16:22 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all events to eventgate - T228705 - take 2 (duration: 00m 49s)
  • 16:06 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all events to eventgate - T228705 (duration: 00m 48s)
  • 16:04 ottomata: switching remaining job queue events (and all remaining events) to eventgate - T228705
  • 15:59 jynus: restarting batch processes on mwmaint1002 T232106
  • 15:54 jynus@deploy1001: Synchronized private/PrivateSettings.php: updating cli password (duration: 00m 47s)
  • 15:23 herron: beginning replacement of kafka1001 with kafka-main1001 T225005
  • 14:54 ema: restbase2009: repool after successful envoy deployment T210411
  • 14:50 ema: restbase2009: depool and add TLS termination w/ envoy -- https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/533028/ T210411
  • 14:42 XioNoX: remove iron from mr* routers - T231811
  • 14:30 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1003.eqiad.wmnet
  • 14:15 @: helmfile [EQIAD] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 14:14 @: helmfile [CODFW] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
  • 14:11 cdanis: restarted swiftrepl on ms-fe1005 T231110
  • 13:54 @: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 13:39 moritzm: upgrading remaining API servers to PHP 7.2.22
  • 13:37 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 13:21 filippo@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=prometheus1003.eqiad.wmnet
  • 13:17 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
  • 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.21
  • 12:47 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 12:13 moritzm: upgrading mw1284-mw1290 to PHP 7.2.22
  • 12:02 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 11:57 moritzm: upgrading remaining job runners to PHP 7.2.22
  • 11:50 dcausse: EU swat done
  • 11:48 dcausse@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/CirrusSearch/: T159321: Add morelikethis a non-greedy version of the morelike keyword (duration: 00m 59s)
  • 10:53 godog: temporarily enable prometheus admin web api in prometheus@ops in eqiad to delete spammy metrics - T228395
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=prometheus1004.eqiad.wmnet
  • 10:46 moritzm: upgrading mw1221-mw1335 to PHP 7.2.22
  • 10:31 moritzm: upgrading mw1319-mw1333 to PHP 7.2.22
  • 10:28 _joe_: upgrading scap across the fleet T224857
  • 10:25 moritzm: upgrading mw1238-mw1258 to PHP 7.2.22
  • 09:39 mutante: ganeti1001 - creating VM moscovium (T232077)
  • 09:26 vgutierrez: rolling back from ats-tls to nginx on cp1076 - T231433
  • 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: Promote wikidatawiki to 1.34.0-wmf.21 for T232035 - T220746
  • 09:04 vgutierrez: rolling back from ats-tls to nginx on cp3034 - T231433
  • 08:55 hashar@deploy1001: rebuilt and synchronized wikiversions files: Rollback wikidatawiki to 1.34.0-wmf.20 for T232035
  • 08:38 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=a.*-ro,name=codfw
  • 08:37 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:35 jmm@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:32 akosiaris: depool restbase1022 T232007
  • 08:30 vgutierrez: rebooting cp3034
  • 08:23 vgutierrez: repooling cp3034
  • 08:21 hashar@deploy1001: rebuilt and synchronized wikiversions files: Promote wikidatawiki to 1.34.0-wmf.21 for T232035 - T220746
  • 08:16 moritzm: reimage restbase-dev1004 to Stretch T224554
  • 08:13 _joe_: upgrading scap on deploy1001
  • 08:09 vgutierrez: depooling cp3034 due to intermittent network issues
  • 07:57 _joe_: upgrading scap on mwdebug1001
  • 07:56 _joe_: uploading scap 3.12.1 to reprepro on all distros T224857
  • 07:56 hashar: Switching "wikidatawiki" on mwdebug1001 to 1.34.0-wmf.21 by editing /srv/mediawiki/wikiversions.php # T232035
  • 07:53 marostegui: Remove old backups for db2037 and db2042 from dbprov2001
  • 07:45 marostegui: Remove puppet grants from m1 for the following IPs: 10.64.0.165 10.64.16.159 10.64.16.18 T231539
  • 07:32 moritzm: upgrading mw1293-mw1296, mw1299-mw1306 to PHP 7.2.22
  • 07:31 mutante: ununpentium - removed /etc/envoy/envoy.yaml; ran /usr/local/sbin/build-envoy-config -c /etc/envoy to regenerate config without 443 listener; ran puppet; envoy now running on jessie
  • 07:07 mutante: ununpentium - manually delete /etc/envoy/listeners.d/00-tls_terminator_443.yaml after changing port to 1443 - puppet does not remove it
  • 06:44 kart_: Updated cxserver to 2019-09-04-065911-production (T213255, T206310)
  • 06:41 @: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 06:39 @: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 06:38 @: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 05:42 marostegui: Remove grants for dbproxy1005 T231280 T231967
  • 05:31 marostegui: Restart MySQL on eqiad sanitariums (db1124 and db1125) to pick up new filters - T51195
  • 05:29 marostegui: Restart wikibugs
  • 05:21 mutante: ganeti2005 - DRAC reset fails - ipmi_cmd_cold_reset: bad completion code
  • 05:19 mutante: ganeti2005 - reset DRAC via local IPMI since mgmt stopped responding
  • 05:14 marostegui: Restart MySQL on codfw sanitariums (db2094 and db2095) to pick up new filters - T51195
  • 04:57 vgutierrez: rearming keyholder on cumin1001
  • 04:42 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp4021 - T231433
  • 04:37 vgutierrez: switching cp4021 from nginx to ats-tls - T231433
  • 04:31 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp3034 - T231433
  • 04:20 vgutierrez: switching cp3034 from nginx to ats-tls - T231433
  • 04:02 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp1076 - T231433
  • 03:57 vgutierrez: switching cp1076 from nginx to ats-tls - T231433
  • 00:55 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out write of variant config into MWConfigCacheGenerator, part 2 (duration: 00m 53s)
  • 00:54 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out write of variant config into MWConfigCacheGenerator, part 1 (duration: 00m 56s)
  • 00:04 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out load of variant config into MWConfigCacheGenerator, part 2 (duration: 00m 55s)
  • 00:02 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out load of variant config into MWConfigCacheGenerator, part 1 (duration: 00m 55s)

2019-09-04

  • 23:36 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings: Factor out variant config generation into MWConfigCacheGenerator, part 2 (duration: 00m 55s)
  • 23:33 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: CommonSettings: Factor out variant config generation into MWConfigCacheGenerator, part 1 (duration: 00m 54s)
  • 23:05 urandom: decommission restbase-dev1004-b (Cassandra) -- T224554
  • 21:58 andrewbogott: attached to console on cumin1001, found it in bios 'system settings', exited, allowed boot to continue. No idea how it got there — spontaneous reboot?
  • 21:12 crusnov@deploy1001: Finished deploy [netbox/deploy@367ca84]: (no justification provided) (duration: 08m 55s)
  • 21:03 crusnov@deploy1001: Started deploy [netbox/deploy@367ca84]: (no justification provided)
  • 20:14 urandom: decommission restbase-dev1004-a (Cassandra) -- T224554
  • 20:00 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:35 hashar@deploy1001: rebuilt and synchronized wikiversions files: rollback wikidatawiki to 1.34.0-wmf.20 for T232035 - T220746
  • 19:33 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:17 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 19:00 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.21 (duration: 00m 54s)
  • 18:59 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.21
  • 17:59 jforrester@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/modules/homepage/: T229271 Homepage: Unbreak question dialogs on mobile (duration: 00m 56s)
  • 17:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 57s)
  • 17:45 jforrester@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: T150418 Fix HTML blacklist inheritance to avoid copy-pasted read <ref>s again (duration: 00m 56s)
  • 17:43 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all non-low-traffic jobs to eventgate - T228705 - take 2 (duration: 00m 55s)
  • 17:34 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch all non-low-traffic jobs to eventgate - T228705 (duration: 00m 56s)
  • 17:32 ottomata: Switch all non-low-traffic jobs to eventgate - T228705
  • 17:14 @: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:50 @: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 16:48 joal@deploy1001: Finished deploy [analytics/refinery@2322f10]: Fix for yesterday regular analytics deploy (duration: 53m 16s)
  • 16:40 Lucas_WMDE: Morning SWAT done
  • 16:38 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/AbuseFilter: SWAT: Fix filter validation in ViewEdit (T231985) (duration: 00m 58s)
  • 16:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 533172|Move ContentTranslation out of Beta in jvwiki (T231207) (duration: 00m 56s)
  • 15:55 joal@deploy1001: Started deploy [analytics/refinery@2322f10]: Fix for yesterday regular analytics deploy
  • 15:36 godog: upgrade grafana to 5.4.5 on labmon
  • 14:51 andrewbogott: reimaging cloudvirt1015 for T220853
  • 14:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove obsoleted DB config from db-eqiad.php T231642 (duration: 00m 57s)
  • 14:08 cdanis: If0dd79604 actually live on canaries now
  • 14:04 cdanis: If0dd79604 deployed to eqiad MW canaries T231642
  • 13:59 moritzm: installing nghttp2 security updates
  • 13:59 cdanis: manually testing If0dd79604 on mwdebug1001
  • 13:47 _joe_: restarting php7.2-fpm across the fleet to pick up the apc.ttl removal
  • 13:20 cdanis@deploy1001: Synchronized wmf-config/db-codfw.php: a8dc4c4a0 db-codfw: remove obsoleted DB config T231642 (duration: 00m 55s)
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.mediawiki.restart-appservers (exit_code=0)
  • 13:17 oblivian@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 13:17 oblivian@cumin1001: END (FAIL) - Cookbook sre.mediawiki.restart-appservers (exit_code=99)
  • 13:17 oblivian@cumin1001: START - Cookbook sre.mediawiki.restart-appservers
  • 12:56 cdanis: manually testing I1bc6d1603 on mwdebug2002
  • 12:49 gehel: reset kartotherian password on maps slaves - T231964
  • 12:36 gehel: restart kartotherian on maps1001 - T231964
  • 11:52 dcausse: EU SWAT done
  • 11:49 dcausse@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: T231194: [cirrus] Reenable sanity checks (duration: 00m 56s)
  • 11:47 dcausse@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/CirrusSearch/: T159321: Add morelikethis a non-greedy version of the morelike keyword (duration: 00m 57s)
  • 11:47 Amir1: start of ladsgroup@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --to-id 2000000 --sleep 2 > ~/rebuildItemTerms.out 2> rebuildItemTerms.err (T225056). This is going to take a while. On screen
  • 11:38 moritzm: upgrading mw1339-mw1348 to PHP 7.2.22
  • 11:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set item terms migration stage for Wikidata on WRITE_BOTH up to Q2m (T225055) (duration: 00m 55s)
  • 11:32 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add high-density logos for the Incubator (T230122) (duration: 00m 56s)
  • 11:28 ladsgroup@deploy1001: Synchronized static/images/project-logos/incubatorwiki-2x.png: SWAT: Add high-density logos for the Incubator (T230122) Part II (duration: 00m 54s)
  • 11:27 ladsgroup@deploy1001: Synchronized static/images/project-logos/incubatorwiki-1.5x.png: SWAT: Add high-density logos for the Incubator (T230122) Part I (duration: 00m 52s)
  • 11:24 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/project-logos/wikidatawiki-1.5x.png' | mwscript purgeList.php wikidatawiki # T230120
  • 11:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add high-density logos for Wikidata (T230120) (duration: 00m 55s)
  • 11:14 ladsgroup@deploy1001: Synchronized static/images/project-logos/wikidatawiki-2x.png: SWAT: Add high-density logos for Wikidata (T230120) Part II (duration: 00m 56s)
  • 11:12 ladsgroup@deploy1001: Synchronized static/images/project-logos/wikidatawiki-1.5x.png: SWAT: Add high-density logos for Wikidata (T230120) Part I (duration: 00m 56s)
  • 10:42 marostegui: Start event scheduler on db1115 T231769
  • 10:23 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp2002 - T231859
  • 10:20 marostegui: Start MySQL on db1115 without the event scheduler - T231769
  • 10:12 marostegui: Stop MySQL on db1115 without the event scheduler - T231769
  • 10:12 vgutierrez: upgrading ATS to 8.0.5-1wm5 on cp5001 - T231859
  • 10:11 @: helmfile [STAGING] Ran 'sync' command on namespace 'restrouter' for release 'staging' .
  • 10:11 marostegui: Tendril/dbtree will be unavailable for a few minutes T231769
  • 10:11 marostegui: Stop MySQL on db1115 - T231769
  • 10:09 vgutierrez: uploaded trafficserver 8.0.5-1wm5 to apt.wikimedia.org (stretch) - T231533 T231859
  • 09:33 moritzm: upgrading mw servers in codfw to 7.2.22
  • 09:19 _joe_: uploaded envoyproxy to buster
  • 08:56 moritzm: upgrading mw1238-mw1258 to PHP 7.2.22
  • 08:42 marostegui: Stop HAproxy on dbproxy1005 - T231967
  • 08:37 moritzm: upgrading API canaries in eqiad to 7.2.22
  • 08:26 marostegui: Reboot db1135 to pick up new kernel - T231403
  • 07:50 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2047 from config T231852 (duration: 00m 54s)
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2047 from config T231852 (duration: 00m 57s)
  • 07:21 mutante: ununpentium - a2dismod ssl - systemctl restart apache2
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/resources/src/startup/mediawiki.js: 8a1b13026 (duration: 00m 55s)
  • 02:42 krinkle@deploy1001: Synchronized php-1.34.0-wmf.21/resources/src/mediawiki.base/mediawiki.base.js: 8a1b13026 (duration: 00m 56s)
  • 02:21 chaomodus: extending downtime on netmon1002 and netmon2001, netbox1001, netbox2001, netboxdb1001 and netbox2001 should be stable but are still being debugged
  • 01:02 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: ed5297c10 / T217830 (duration: 00m 59s)
  • 00:02 chaomodus: installing and setting up netbox instances T223291

2019-09-03

  • 23:57 niharika29@deploy1001: Synchronized wmf-config/CommonSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935 (duration: 00m 55s)
  • 23:56 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - [bugfix]Growth experiments not loading conf properly T231935 (duration: 00m 55s)
  • 23:54 niharika29@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 55s)
  • 23:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/extensions/GrowthExperiments/: Set correct merge strategy for help panel links T231935 (duration: 00m 56s)
  • 23:42 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/tests/phpunit/: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 57s)
  • 23:41 niharika29@deploy1001: Synchronized php-1.34.0-wmf.20/includes/block: Allow CompositeBlock::appliesToRight to return null when unsure T229417, T231145 (duration: 00m 55s)
  • 23:28 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure ORES damaging and goodfaith on zhwiki T225562 (duration: 00m 58s)
  • 23:10 ebernhardson: production-search-eqiad all indices index.merge.policy.deletes_pct_allowed=20
  • 22:54 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T208694 Set CentralNotice's wgNoticeProjects for wikimedia (duration: 00m 59s)
  • 22:45 eileen: process-control config revision is 100334de4a adjust silverpop schedule
  • 19:42 XioNoX: rollback OSPF metric change on eqiad-codfw Zayo link (1320->320)
  • 19:20 fdans@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 19:18 fdans@deploy1001: Started restart [analytics/aqs/deploy@fc1d232]: (no justification provided)
  • 19:14 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch high-traffic jobs to eventgate. Take 2 - T228705 (duration: 00m 56s)
  • 19:12 ottomata: switching jobqueue events to eventgate-main - T228705
  • 18:41 urbanecm@deploy1001: Synchronized wmf-config/: Emergency fix: GE not loading configuration properly: newbie facing feature (duration: 00m 57s)
  • 18:35 Urbanecm: Livetesting on mwdebug1002
  • 17:45 James_F: Pulled I9b64a2bb770 into wmf.21 production on the deploy server; no need to deploy to app-servers, CI-only fix.
  • 17:40 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21
  • 16:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.21/extensions/Graph/includes/ApiGraph.php: T231894 (duration: 00m 55s)
  • 16:01 joal@deploy1001: Finished deploy [analytics/refinery@8b17711]: Fixes for regular analytics deploy (duration: 136m 59s)
  • 15:55 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T227260 (duration: 00m 54s)
  • 15:32 ebernhardson: unban elastic1027 from production-search-eqiad
  • 15:07 hashar@deploy1001: rebuilt and synchronized wikiversions files: testwiki 1.34.0-wmf.21 for T231894 - T220746
  • 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: Rollback group0 to 1.34.0-wmf.21 - T220746
  • 14:45 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.21 - T220746
  • 14:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Promote db1133 as wikitech master T229657 (duration: 00m 54s)
  • 14:28 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746 (duration: 50m 09s)
  • 14:21 moritzm: upgrading app server canaries to PHP 7.2.22 T230024
  • 13:44 joal@deploy1001: Started deploy [analytics/refinery@8b17711]: Fixes for regular analytics deploy
  • 13:38 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.21 and rebuild l10n cache - T220746
  • 13:26 hashar: Gerrit should be fine again, apparently was due to the wmf branch cut taking too much resources (sic) - T231872 filed to investigate
  • 13:25 hashar: 1.34.0-wmf.21 cut
  • 13:16 hashar: Gerrit has random timeouts from time to time (no reason)
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1073 from wikitech T229657', diff saved to https://phabricator.wikimedia.org/P9038 and previous config saved to /var/cache/conftool/dbconfig/20190903-131456-marostegui.json
  • 13:13 marostegui: Re-enable puppet on db1073 and db1133 T229657
  • 13:11 marostegui: Reload haproxy on dbproxy1005 T229657
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech back to RW after maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9037 and previous config saved to /var/cache/conftool/dbconfig/20190903-131000-marostegui.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech as read-only for maintenance T229657', diff saved to https://phabricator.wikimedia.org/P9033 and previous config saved to /var/cache/conftool/dbconfig/20190903-130113-marostegui.json
  • 13:00 marostegui: Failover m5 from db1073 to db1133 - T229657
  • 12:52 moritzm: uploaded PHP 7.2.22 to component/php72 T230024
  • 12:39 moritzm: upgrading mwdebug2001 to PHP 7.2.22
  • 12:29 hashar: Cutting wmf/1.34.0-wmf.21 # T220746
  • 12:19 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.20
  • 12:02 marostegui: Disable puppet on db1073 and db1133 - T229657
  • 11:55 marostegui: Change topology on m5 and make everything replicate from db1133 - T229657
  • 11:48 marostegui: Downtime m5 hosts T229657
  • 11:35 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --to-id 1000 --sleep 2 (T225056)
  • 11:29 Amir1: EU SWAT is done
  • 11:29 Amir1: ladsgroup@mwmaint1002:~$ mwscript namespaceDupes.php bswiki --fix (T231654)
  • 11:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix wgMetaNamespaceTalk for bswiki (T231654) (duration: 00m 54s)
  • 11:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016) (duration: 00m 52s)
  • 11:11 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016) (duration: 00m 53s)
  • 11:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable WRITE_BOTH for items term store for wikidatawiki (T225055) (duration: 00m 55s)
  • 10:17 ema: cp1083: varnish-backend-restart -- mbox lag, fetch failures
  • 09:59 _joe_: removing old lvs-related scripts from ores*
  • 09:46 moritzm: moved uid=smalyshev from cn=wmf to cn=nda
  • 09:46 mutante: install1002 - import GPG key for getenvoy repo, importing envoy for jessie with reprepro update
  • 09:16 hashar: Deploy refactor of Zuul pipelines, which might mean that some repos/branches miss jobs or have extra unwanted jobs. In that case, please file a task against #continuous-integration-config
  • 09:04 ema: cp1085: varnish-backend-restart, mbox lag and fetch failures
  • 09:03 gehel: reset kartotherian password - T231842
  • 08:54 ema: cp1089: varnish-backend-restart due to mbox lag and fetch failures
  • 08:49 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 08:49 ema: cp1075: pool ats-be with caching enabled T228629
  • 08:26 marostegui: Add REPLICATION grant to wikiuser and wikiadmin on db1073 with replication enabled - T229657
  • 08:21 gehel: purging maps / info.json from cache - T231842
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9031 and previous config saved to /var/cache/conftool/dbconfig/20190903-080958-marostegui.json
  • 08:04 joal@deploy1001: Finished deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try (duration: 00m 27s)
  • 08:03 joal@deploy1001: Started deploy [analytics/refinery@4810dfa]: Regular weekly analytics deploy train - Second try
  • 08:02 joal@deploy1001: deploy aborted: Regular weekly analytics deploy train (duration: 27m 47s)
  • 07:16 marostegui: Change min_replicas to 6 on s1 for eqiad and codfw T231019
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1133 with weight 0 T229657', diff saved to https://phabricator.wikimedia.org/P9029 and previous config saved to /var/cache/conftool/dbconfig/20190903-063932-marostegui.json
  • 06:10 mutante: running puppet on cp-text_eqiad to switch people.wm.org to https backend
  • 06:04 marostegui: Change min_replicas to 4 on s7 for eqiad and codfw T231019
  • 05:53 mutante: people.wikimedia.org - switching to TLS termination with envoy
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Reorganize s7 codfw T230106', diff saved to https://phabricator.wikimedia.org/P9028 and previous config saved to /var/cache/conftool/dbconfig/20190903-055234-marostegui.json
  • 05:47 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s7 codfw T230106 (duration: 00m 54s)
  • 05:22 marostegui: Rename tables on the puppet database on m1 master - T231539
  • 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2118 to s7 codfw master (db2047 -> db2118) T230106 (duration: 00m 54s)
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2047 old master from s7 T230106', diff saved to https://phabricator.wikimedia.org/P9027 and previous config saved to /var/cache/conftool/dbconfig/20190903-051619-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2118 to s7 codfw master (db2047 -> db2118) T230106', diff saved to https://phabricator.wikimedia.org/P9026 and previous config saved to /var/cache/conftool/dbconfig/20190903-051450-marostegui.json
  • 05:02 marostegui: Promote db2118 to s7 codfw master (db2047 -> db2118) T230106
  • 04:50 marostegui: Drop filejournal table on s3 - T51195
  • 04:49 vgutierrez: repooling cp2002 - T231433
  • 04:36 vgutierrez: upgrading ATS to 8.0.5-1wm4 on cp2002 - T231433
  • 04:28 vgutierrez: Switching cp2002 from nginx to ats-tls - T231433

2019-09-02

  • 22:08 ebernhardson: ban elastic1027 from production-search-chi
  • 20:48 ebernhardson: restart production-search-eqiad on elastic1027 again
  • 20:33 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@453ee8a]: Make osm-pbf source private (T231842) (duration: 02m 09s)
  • 20:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@453ee8a]: Make osm-pbf source private (T231842)
  • 19:54 ebernhardson: restart elasticsearch_6@production-search-eqiad on elastic1027
  • 17:57 mateusbs17: regenerating tiles from z0 to z9 in eqiad and codfw - T231691, T230511
  • 15:08 moritzm: installing libssh2 security updates
  • 14:36 moritzm: installing ghostscript updates on thumbor1001
  • 14:24 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 14:21 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 14:10 @: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 13:44 akosiaris: resync the sessionstore staging release as there was wrong port mapping (port 8080 instead of 8081) for both netpol and service
  • 13:43 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:40 @: helmfile [STAGING] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
  • 13:09 vgutierrez: upgrading prometheus-trafficserver-exporter to version 0.3.2 on the cache cluster - T231533
  • 12:58 vgutierrez: upgrading prometheus-trafficserver-exporter to version 0.3.2 on cp5001 - T231533
  • 12:46 vgutierrez: uploaded prometheus-trafficserver-exporter 0.3.2 to apt.wikimedia.org (stretch) - T231533
  • 12:40 moritzm: installing freetype security updates on jessie (stretch/buster already fixed)
  • 11:23 moritzm: installing apache2 security updates on jessie
  • 11:18 moritzm: imported apache2 2.4.10-10+deb8u15+wmf1 to apt.wikimedia.org/jessie-wikimedia (rebuild of latest Jessie update against our patches)
  • 10:25 moritzm: installing libav security updates
  • 10:07 moritzm: installing subversion security updates on jessie
  • 09:21 marostegui: Drop filejournal table on s7 - T51195
  • 09:15 marostegui: Drop filejournal table on s1 - T51195
  • 08:45 marostegui: Drop filejournal table on s8 - T51195
  • 08:27 marostegui: Drop filejournal table on labtestwiki - T51195
  • 08:25 marostegui: Drop filejournal table on s2 - T51195
  • 08:15 godog: upgrade grafana to 5.4.5 on grafana1001
  • 08:12 godog: update amd-rocm debian repository gpg key (same id, new expiration)
  • 07:34 marostegui: Drop filejournal table on s4 - T51195
  • 07:26 marostegui: Drop filejournal table on s5 - T51195
  • 07:17 marostegui: Drop filejournal table on s6 - T51195
  • 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2046 from config T231767 (duration: 00m 53s)
  • 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2046 from config T231767 (duration: 00m 55s)

2019-09-01

  • 17:53 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=enwikiquote --verbose (T231137)
  • 17:45 Urbanecm: Run mwscript extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --wiki=metawiki --verbose (T231137)
  • 17:33 Urbanecm: Run foreachwikiindblist group1.dblist extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)
  • 17:29 Urbanecm: Previous should be *group0.dblist (T231137)
  • 17:29 Urbanecm: Run foreachwikiindblist group0 extensions/AbuseFilter/maintenance/fixFirstBlockautopromoteEntries.php --dry-run --verbose (T231137)


Archives

See Server admin log/Archives.