Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(ejegg: updated fundraising CiviCRM from 256adda03c to a30da7f92a)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(263 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-10-03 ==
== 2021-08-03 ==
* 00:08 ejegg: updated fundraising CiviCRM from {{Gerrit|256adda03c}} to {{Gerrit|a30da7f92a}}
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-10-02 ==
== 2021-08-02 ==
* 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 effie: enable puppet on mw2271
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 17:15 mutante: submitted puppet refactoring change on maps servers
* 21:31 tzatziki: removing 1 file for legal compliance
* 16:49 effie: disable puppet on mw2271 and briefly depool it
* 21:16 tzatziki: removing 7 files for legal compliance
* 15:39 _joe_: restarting redis on rdb2003, instance 6380
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:28 hnowlan: bootstrapping restbase1030-a
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server [[phab:T261531|T261531]] {{Gerrit|4573776bd}} {{Gerrit|2fb4c20ae}} (duration: 01m 01s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 14:08 effie: enable puppet on mwdebug1001
* 19:00 urbanecm: Morning B&C window completed
* 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis ([[phab:T258356|T258356]])
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 12:26 effie: disable puppet on mwdebug1001
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:05 hnowlan: bootstrapping restbase1029-c
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 12:20 mutante: gerrit servers: disabling puppet
* 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 09:12 jayme: running puppet on lvs servers - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T264221|T264221]])
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 09:07 hnowlan: bootstrapping restbase1029-b cassandra
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 09:05 hashar: gerrit: running garbage collector
* 11:27 hashar: restarting Jenkins on contint2001
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 hashar: restarting Jenkins on contint1001
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 11:13 urbanecm: EU B&C window completed
* 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 11:08 moritzm: installing openjdk-11 security updates
* 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 08:29 moritzm: installing pyzmq bugfix update from buster point release
* 07:24 moritzm: installing libsndfile security updates on buster
* 08:24 moritzm: installing nginx security updates on puppetdb*
* 07:12 moritzm: installing aspell security updates
* 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:42 moritzm: installing libcommons-compress-java security updates
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 07:35 godog: swift codfw-prod bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:29 godog: prometheus codfw/k8s, add 50G to the LV
* 07:23 moritzm: installing libx11 security updates on buster
* 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at [[phab:T264362|T264362]]
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json


== 2020-10-01 ==
== 2021-07-31 ==
* 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
* 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at {{Gerrit|7ab9a74c9ebbb22ad9fb9b7c95c91b7fad8bf8c6}} for [[phab:T264365|T264365]]
* 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well [[phab:T264363|T264363]]
* 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at {{Gerrit|796693cb7a2ee3191fcbe19769d341bd0530bd4a}} for [[phab:T264365|T264365]]
* 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
* 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11  refs [[phab:T263177|T263177]] (duration: 01m 06s)
* 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11  refs [[phab:T263177|T263177]]
* 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train [[phab:T264257|T264257]] [[phab:T263177|T263177]] (duration: 00m 59s)
* 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days [[phab:T264053|T264053]] (duration: 00m 59s)
* 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 13m 42s)
* 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 01m 34s)
* 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to newer version but wasn't working, reverted to previous one
* 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - [[phab:T258729|T258729]]
* 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - [[phab:T264227|T264227]]
* 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - [[phab:T264227|T264227]]
* 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously
* 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:55 moritzm: installing npm security updates on buster
* 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:42 jayme: running puppet on lvs servers - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 14:35 Urbanecm: Create bot_passwords table at all private wikis ([[phab:T258356|T258356]])
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
* 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
* 14:12 effie: enable puppet on mw2271
* 14:08 moritzm: installing pillow security updates
* 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
* 13:59 moritzm: installing nginx security updates on schema*
* 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
* 13:50 klausman: rebooting an-worker1096 for cluster maintenance
* 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - [[phab:T258405|T258405]]
* 13:29 moritzm: restarting mw canaries to pick up curl update
* 13:22 moritzm: installing curl security updates on stretch
* 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
* 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
* 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 00m 59s)
* 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 01m 00s)
* 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
* 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
* 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T263284|T263284]])
* 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # [[phab:T262046|T262046]]
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|58a8c8271d75ff477ce0507ac5021edcfc2f6453}}: kuwiktionary: Create Jinûvesazî namespace ([[phab:T262046|T262046]]) (duration: 01m 01s)
* 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
* 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:55 hnowlan: adding buster host restbase1028-b to cassandra
* 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
* 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
* 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
* 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
* 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - [[phab:T264157|T264157]]
* 05:45 marostegui: Stop MySQL on es2011 [[phab:T264261|T264261]]
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
* 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) [[phab:T264109|T264109]]
* 05:19 marostegui: Repool labsdb1011
* 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: {{Gerrit|Ia3357b2f593c}} (duration: 00m 58s)
* 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|1721d2aa0}} - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)


== 2020-09-30 ==
== 2021-07-30 ==
* 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:46 cdanis: depool mw2356 and mw2319
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:45 eileen: civicrm revision changed from {{Gerrit|5a53bfe6ed}} to {{Gerrit|256adda03c}}, config revision is {{Gerrit|646817a2c0}}
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:19 ejegg: updated fundraising CiviCRM from {{Gerrit|6e843649ac}} to {{Gerrit|5a53bfe6ed}}
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 effie: disable puppet on mw2271 and use onhost memcached - [[phab:T263958|T263958]]
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki ([[phab:T257220|T257220]]) (duration: 00m 58s)
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki ([[phab:T225027|T225027]]) (duration: 00m 58s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki ([[phab:T255585|T255585]]) (duration: 00m 58s)
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 17:52 bblack: lvs1016: restart pybal
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 16:21 mutante: re-enabled puppet on install2003
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each [[phab:T259909|T259909]]
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 14:03 cmjohnson1: powering down ores1002 to upgrade memory [[phab:T259909|T259909]]
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 13:55 cmjohnson1: powering down ores1001 to upgrade memory [[phab:T259909|T259909]]
* 11:23 moritzm: installing libsndfile security updates on stretch
* 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 12:39 marostegui: Deploy schema change on db2080, db2081 [[phab:T264109|T264109]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 11:33 effie: enable puppet on P:mediawiki::mcrouter_wancache for 630845 - [[phab:T244340|T244340]]
* 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:627744{{!}}Enable Special:TranslationStats (T263004)]] (duration: 00m 59s)
* 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - [[phab:T244340|T244340]]
* 10:57 moritzm: installing librsvg security updates
* 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:07 kormat: deploying schema change to s4/eqiad [[phab:T259831|T259831]]
* 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - [[phab:T264157|T264157]]
* 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 08:45 kormat: deploying schema change to s7/eqiad [[phab:T259831|T259831]]
* 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl [[phab:T264156|T264156]]', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
* 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 07:56 akosiaris: upgrade termbox to latest chart, fixing various minor prometheus-statsd-export configuration issues.
* 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master [[phab:T263227|T263227]], also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
* 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - [[phab:T263227|T263227]]
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T263227|T263227]]', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
* 07:05 marostegui: Stop mysql on es2016 before decommissioning [[phab:T264156|T264156]]
* 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
* 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 [[phab:T264156|T264156]]', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
* 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
* 05:43 marostegui: Remove es2019 from tendril and zarcillo [[phab:T264063|T264063]]
* 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
* 02:30 eileen: process-control config revision is {{Gerrit|646817a2c0}}
* 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:630801{{!}}Ensure variant A homepage sidebar is always at least 300px (T263905)]] (duration: 01m 01s)


== 2020-09-29 ==
== 2021-07-29 ==
* 23:35 mutante: created testvm3001.esams.wmnet to test install3001
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias ([[phab:T262936|T262936]]) (duration: 00m 59s)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:20 Urbanecm: Evening B&C window completed
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|68d7af9cb38de09b4cb8655f0b095b60d470fbbc}}
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set


== 2020-09-28 ==
== 2021-07-15 ==
* 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T264053|T264053]]: Remove commonswiki from sidebar search (duration: 01m 09s)
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: [[gerrit:630420{{!}}Properly handle namespaces in tasktype template configuration (T264029)]] (duration: 01m 03s)
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 19:12 ejegg: updated staging payments-wiki from {{Gerrit|43470629cc}} to {{Gerrit|885d87a905}}
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 18:15 Urbanecm: Morning B&C done
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c7e08bc2bbff6aead186350726d5c1c137cca052}}: Enable search in header A/B test for logged in users ([[phab:T263032|T263032]]) (duration: 00m 58s)
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:58 ejegg: updated payments-wiki from {{Gerrit|b2eb456ed1}} to {{Gerrit|2083498811}}
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:03 Amir1: temporarily becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 16:08 XioNoX: push pfw policies - [[phab:T264013|T264013]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 papaul: power off elastic2037 for DIMM replacement
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 [[phab:T196487|T196487]]', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:25 hashar: Restarting CI Jenkins for plugins uninstallation [[phab:T260565|T260565]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 14:49 moritzm: installing glib-networking security updates
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 XioNoX: repool eqiad
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia [[phab:T259102|T259102]]
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
* 13:05 mutante: mw1413 - pooling; was depooled for an unknown reason, don't see it in SAL, looks ok, scap pulled
* 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:45 XioNoX: downtiming all eqiad row D hosts - [[phab:T196487|T196487]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - [[phab:T261633|T261633]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # [[phab:T262812|T262812]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # [[phab:T262812|T262812]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:26 Urbanecm: arbcom_ruwiki is created ([[phab:T262812|T262812]])
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki ([[phab:T262812|T262812]])
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 57s)
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 57s)
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade [[phab:T196487|T196487]]', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|483beb2452caead8c44dfb8e608812778033fba0}}: ContentTranslation: Do not use wikishared DB for testwiki ([[phab:T263417|T263417]]; follow-up {{Gerrit|af09303a4a155681b198ac70468494c2155868df}} also included in this sync) (duration: 00m 56s)
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:34 Urbanecm: EU B&C window done
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|61eac95ef62aef682039761e0f02188437cb15fb}}: Creation of patroller group on arz.wikipedia ([[phab:T262218|T262218]]) (duration: 00m 57s)
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|483beb2452caead8c44dfb8e608812778033fba0}}: ContentTranslation: Do not use wikishared DB for testwiki ([[phab:T263417|T263417]]; follow-up {{Gerrit|af09303a4a155681b198ac70468494c2155868df}} also included in this sync) (duration: 00m 57s)
* 11:34 moritzm: installing libuv1 security updates
* 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:630561{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:630561{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 10:02 effie: disabling puppet on maps* for 704394
* 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:17 XioNoX: restart bird on dns2001 - [[phab:T262372|T262372]]
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:15 jynus: restart db1077 for upgrade and cleanup [[phab:T187984|T187984]]
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:06 XioNoX: restart bird on centrallog2001 - [[phab:T262372|T262372]]
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 08:56 dcausse: [[phab:T263970|T263970]]: recovering lost apifeature indices (copying eqiad indices -> codfw)
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist [[phab:T263842|T263842]]', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - [[phab:T196487|T196487]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - [[phab:T196487|T196487]]
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - [[phab:T196487|T196487]]
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:29 _joe_: restarting pybal on the LVS primaries
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:24 dcausse: [[phab:T263970|T263970]]: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
* 00:00 twentyafterfour: phabricator update deployed.
* 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
* 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) [[phab:T263443|T263443]]
* 05:55 marostegui: Stop MySQL on es2013 before decommissioning it [[phab:T263740|T263740]]
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl [[phab:T263740|T263740]]', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 [[phab:T263740|T263740]]', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
* 05:22 marostegui: Decrease labsdb1011 weight


== 2020-09-27 ==
== 2021-07-14 ==
* 06:36 elukey: powercycle analytics1048
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:37 moritzm: installing klibc security updates
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner Prometheus support
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 14:51 moritzm: installing apache security updates on puppet masters
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:44 moritzm: installing apache security updates on grafana*
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
* 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:13 elukey: restart php-fpm on mw2370
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 12:15 mutante: mw1422 - scap pull
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}
* 00:49 eileen: civicrm revision changed from {{Gerrit|bb62188ec6}} to {{Gerrit|b1c63470bb}}, config revision is {{Gerrit|c291b3c689}}
* 00:48 eileen: process-control config revision is {{Gerrit|c291b3c689}}
* 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)


== 2020-09-26 ==
== 2021-07-13 ==
* 19:20 chrisalbon: sudo service uwsgi-ores restart
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 08s)
* 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 07s)
* 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw' 'systemctl restart celery-ores-worker.service uwsgi-ores.service '
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
* 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
* 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: [[gerrit:704368{{!}}links is flat array (T286040)]] (duration: 02m 07s)
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
* 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
* 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
* 17:45 mutante: mw1283 - decom - powered off by cookbook
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
* 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - [[phab:T280203|T280203]]"
* 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:09 mutante: mw1282 - decom, powered off
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
* 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181{{!}}Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)
* 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:55 jbond: upload statograph to buster wikimedia
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
* 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
* 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483)
* 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]] (duration: 03m 28s)
* 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - [[phab:T271232|T271232]]
* 13:37 effie: rolling restart php-fpm across clusters - [[phab:T286260|T286260]]
* 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176{{!}}Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:14 kormat: restarted replication on db1117:3325 [[phab:T284622|T284622]]
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 12:53 kormat: stopping replication on db1117:3325 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - [[phab:T280203|T280203]]
* 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 12:20 mutante: mwmaint1002 - scap pull after reimaging
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:28 Lucas_WMDE: EU backport+config window done
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:704304{{!}}Remove obsolete $wgShowDBErrorBacktrace config]] (duration: 01m 25s)
* 11:13 mutante: mwmaint1002 - reimaging with buster ([[phab:T267607|T267607]])
* 10:54 mutante: switching https://noc.wikimedia.org backend from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed ([[phab:T267607|T267607]])
* 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan: running `nodetool decommission` on maps2008
* 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:18 moritzm: installing apache security updates on Logstash hosts
* 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
* 09:40 moritzm: installing apache security updates on thanos-fe hosts
* 09:38 moritzm: installing apache security updates on parsoid hosts
* 09:31 effie: depool mw2383 [[phab:T286463|T286463]]
* 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:45 effie: depool mw2383 - [[phab:T286463|T286463]]
* 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
* 07:06 moritzm: installing apache security updates on codfw mw* hosts
* 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - [[phab:T273026|T273026]]
* 06:06 effie: pool mw2383  - [[phab:T286463|T286463]]
* 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
* 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
* 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
* 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`


== 2020-09-25 ==
== 2021-07-12 ==
* 23:03 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables (duration: 26m 57s)
* 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1896efc27f3de39659673091bc4c43ad874da0c5}}: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T286163|T286163]]) (duration: 00m 56s)
* 22:36 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a135388]: correct scap variable reference in airflow_variables
* 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=[[phab:T286396|T286396]] # [[phab:T286396|T286396]]
* 22:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity (duration: 10m 42s)
* 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* food: updated fundraising CiviCRM from {{Gerrit|eb90dbcfd3}} to {{Gerrit|035ad1c351}}
* 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php ([[phab:T286396|T286396]])
* 22:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d1a619f]: increase airflow_variable debugging verbosity
* 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 21:23 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment (duration: 11m 33s)
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|284216a7d35c815ea203a9c0bd738a1e1bf31f7e}}: Add few namespace aliases for Serbian Wikipedia ([[phab:T286396|T286396]]) (duration: 00m 56s)
* 21:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d999f76]: adding debug info to deployment
* 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8a79bf752ff5eb15f3042fd94ba10c2c50607a85}}: enwiki: Delete Book namespace ([[phab:T285766|T285766]]) (duration: 00m 57s)
* 20:26 effie: installing memcached 1.4.33-1+deb9u1 on mwdebug1001
* 23:29 urbanecm@deploy1002: Synchronized static/images/: {{Gerrit|d007b9ccb77db9f3dc492df7a35477e5563a921a}}: Remove unused celebration logos and wordmark ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 19:34 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1 (duration: 53m 58s)
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c581493fbe5d9c372fd44635b704d04040d8b38}}: Add editautoreviewprotected to bot on hewikisource ([[phab:T275076|T275076]]) (duration: 00m 57s)
* 18:40 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@303eaf3]: Enable icutoknorm in glent m0 and m1
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40eade4131eac95ba3dc0d918ad540070d7bcb99}}: Enable RelatedArticles Extension in zhwikinews ([[phab:T266933|T266933]]) (duration: 00m 57s)
* 17:47 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/MobileFrontend/: Backport: [[gerrit:630065{{!}}Make all section `collapsible` during server side rendering (T263832)]] (duration: 00m 59s)
* 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # [[phab:T286101|T286101]], P16817
* 17:37 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3 (duration: 02m 01s)
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ab00d188bc4161e40455b842f613698548b3518}}: zhwiktionary: Add templateeditor right ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 17:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae3c936]: Deploy glent 0.2.3
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5822b2be129b934939af46bab5b8916039661e97}}: zhwiktionary: Add aliases for namespaces ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 16:35 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import (duration: 01m 10s)
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba0967f5c18652d02b7b476e9592b81dcb9b74fc}}: zhwiktionary: Add Reconstruction namespace ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 16:34 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@94c8e6a]: fixed start data for wikidata ttl import
* 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
* 16:33 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Promote 1.35.0 to stable in extensiondistributor (duration: 00m 57s)
* 21:26 urbanecm: Start server-side upload for 2 video files ([[phab:T286432|T286432]], [[phab:T286433|T286433]])
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]] (duration: 03m 39s)
* 16:23 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]]
* 15:23 jynus: fixing enwikivoyage ipblocks inconsistency cluster-wide [[phab:T263842|T263842]]
* 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki ([[phab:T257066|T257066]]) (duration: 00m 58s)
* 14:54 elukey: install linux-image-4.19-amd64 on an-worker1096 + reboot
* 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]] (duration: 21m 24s)
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 12:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]]
* 12:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Add db2113 to various groups [[phab:T263842|T263842]]', diff saved to https://phabricator.wikimedia.org/P12797 and previous config saved to /var/cache/conftool/dbconfig/20200925-121332-kormat.json
* 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 11:23 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 11:10 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
* 10:42 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 10:40 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 10:28 moritzm: reimaging sretest1002 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
* 09:58 moritzm: restarting archiva to pick up Java security update
* 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 09:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 09:22 ema: upload@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
* 09:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 09:02 ema: text@eqsin: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 06:50 elukey: shutdown ganeti5002 (mistakenly powercycled it without seeing [[phab:T261130|T261130]])
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
* 06:40 elukey: powercycle ganeti5002 (no instances running on it, mgmt console shows no tty usable)
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
* 06:34 elukey: reboot stat1004 to pick up kernel settings
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
* 03:10 ejegg: updated payments-wiki from {{Gerrit|f89c594e12}} to {{Gerrit|b2eb456ed1}}
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
* 02:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]] (duration: 09m 05s)
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
* 02:27 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 00m 07s)
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
* 02:27 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 02:20 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: new codfw, [[phab:T263798|T263798]]
* 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]] (duration: 03m 30s)
* 02:20 ppchelko@deploy1001: Finished deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]] (duration: 06m 09s)
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]]
* 02:14 ppchelko@deploy1001: Started deploy [restbase/deploy@4eaad8f]: eqiad-only, [[phab:T263798|T263798]]
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]] (duration: 03m 16s)
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]]
* 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]] (duration: 03m 37s)
* 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]]
* 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - [[phab:T282484|T282484]]
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
* 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:703567{{!}}Enable template search improvements on first wikis 2/2 (T284553)]] (duration: 00m 57s)
* 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703566{{!}}Enable template search improvements on first wikis 1/2 (T284553)]] (duration: 00m 56s)
* 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: [[gerrit:703649{{!}}Always add 1 prefixsearch match when searching for templates]] (duration: 00m 57s)
* 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
* 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
* 11:40 moritzm: installing apache updates on mw1/eqiad hosts
* 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|773c956811cba5c3a2cbba32bc1e1a536dbd9f0b}}: Revert "Use ptwiki 20th anniversary logos" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
* 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd5f5375b4f712c56e9396cc550078272ef668de}}: Revert "ptwiki: Use celebration logos in new vector" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702761{{!}}Add 'editautoreviewprotected' protection level to hewikisource (T275076)]] (duration: 00m 57s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
* 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703568{{!}}Enable transclusion back button on first wikis (T284553)]] (duration: 00m 58s)
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
* 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
* 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
* 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for [[phab:T285927|T285927]]
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - [[phab:T285251|T285251]]
* 10:03 effie: depool mw2383
* 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 10:01 godog: test thanos-compact upload with smaller part size - [[phab:T285835|T285835]]
* 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
* 09:07 godog: repool thanos-fe2002 - [[phab:T285835|T285835]]
* 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - [[phab:T285835|T285835]]
* 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: [[gerrit:703890{{!}}Remove subscribing to other aspect for entity usage (T286193)]] (duration: 00m 59s)
* 07:44 jynus: restart db1102:x1 mariadb instance
* 07:01 moritzm: installing apache2 security updates
* 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish ([[phab:T275268|T275268]])
* 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703951{{!}}Enable json image metadata everywhere (T275268)]] (duration: 01m 05s)
* 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: [[gerrit:703891{{!}}Add --sleep option to refreshImageMetadata.php]] (duration: 01m 04s)
* 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force ([[phab:T275268|T275268]])
* 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703950{{!}}Set testcommonswiki to use json image metadata (T275268)]] (duration: 01m 10s)


== 2020-09-24 ==
== 2021-07-09 ==
* 23:39 andrew@deploy1001: Finished deploy [horizon/deploy@7b61460]: (no justification provided) (duration: 01m 58s)
* 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:37 andrew@deploy1001: Started deploy [horizon/deploy@7b61460]: (no justification provided)
* 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:40 mutante: mw1349 - systemctl reset-failed
* 22:36 legoktm: running benchmarking scripts against shellbox
* 21:03 cdanis: reprepro: add backported ipvsadm 1:1.31-1+deb10u1 to buster-wikimedia
* 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 08s)
* 21:00 andrew@deploy1001: Finished deploy [horizon/deploy@404e205]: (no justification provided) (duration: 01m 05s)
* 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]]
* 20:59 andrew@deploy1001: Started deploy [horizon/deploy@404e205]: (no justification provided)
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
* 20:41 andrew@deploy1001: Finished deploy [horizon/deploy@24368a5]: (no justification provided) (duration: 02m 10s)
* 11:40 _joe_: deleting coredns pod in codfw, potentially causing [[phab:T286360|T286360]]
* 20:39 andrew@deploy1001: Started deploy [horizon/deploy@24368a5]: (no justification provided)
* 10:13 _joe_: recreated all pods for zotero in codfw
* 20:35 andrew@deploy1001: Finished deploy [horizon/deploy@85125d1]: (no justification provided) (duration: 00m 52s)
* 00:47 legoktm: zotero rolling restart didn't help, filed [[phab:T286360|T286360]] for DNS issues
* 20:34 andrew@deploy1001: Started deploy [horizon/deploy@85125d1]: (no justification provided)
* 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues
* 19:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:54 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 19:47 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: cloudelastic: envoy sits in front now (duration: 00m 59s)
* 19:41 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 00m 36s)
* 19:41 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
* 19:39 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: (no justification provided) (duration: 01m 08s)
* 19:38 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: (no justification provided)
* 19:30 andrew@deploy1001: Finished deploy [horizon/deploy@e5890b9]: dev (duration: 00m 44s)
* 19:29 andrew@deploy1001: Started deploy [horizon/deploy@e5890b9]: dev
* 19:08 dancy@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.36.0-wmf.10
* 19:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bcf9fcbe3b82ab85b8f97206ceca45b64619c362}}: Enable mobile block notice tracking in MobileFrontend ([[phab:T260218|T260218]]) (duration: 01m 04s)
* 18:58 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:627481{{!}}Enable Special:Investigate on itwiki and svwiki (T262436)]] (duration: 01m 05s)
* 18:01 mutante: temp. disabled puppet on install4001/install5001 - applying install_server role to new servers, starting with install3001
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:24 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:21 jbond42: enable puppet fleet wide post update puppetdb postgres logging
* 17:19 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:17 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:15 jbond42: disable puppet fleet wide to update puppetdb postgres logging
* 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:14 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:11 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:09 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:04 mutante: syncing facts to puppet compiler hosts
* 17:01 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:00 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:56 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:26 robh: properly pooled mw1360 this time [[phab:T262151|T262151]]
* 16:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:04 XioNoX: pfw3-eqiad> restart security-log gracefully
* 15:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/AbuseFilter/includes/Hooks/AbuseFilterHookRunner.php: {{Gerrit|5e88c36fa4111cde33dafb0d7ac31a854b95e5ea}}: HookRunner: onAbuseFilterGenerateUserVars should run generateUserVars ([[phab:T263750|T263750]]) (duration: 01m 06s)
* 15:46 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=simplewiki --username="Oversight~simplewiki"` ([[phab:T263760|T263760]])
* 15:44 Urbanecm: Run `mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=enwiki --username=Oversight` ([[phab:T263760|T263760]])
* 15:43 Urbanecm: Rename all local Oversight accounts but enwiki to Oversight~dbname, see task for full list ([[phab:T263760|T263760]])
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12794 and previous config saved to /var/cache/conftool/dbconfig/20200924-152626-root.json
* 15:15 robh: mw1360 scap and repooled post work via [[phab:T262151|T262151]]
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 66%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12793 and previous config saved to /var/cache/conftool/dbconfig/20200924-151120-root.json
* 15:10 jayme: switched zotero service-proxy listener to use TLS - [[phab:T255869|T255869]]
* 15:00 XioNoX: repool eqiad - [[phab:T256112|T256112]]
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 33%: Slowly repool db2127 ', diff saved to https://phabricator.wikimedia.org/P12792 and previous config saved to /var/cache/conftool/dbconfig/20200924-145617-root.json
* 14:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:52 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:28 XioNoX: [Netops] In window: turn VC-ports on/off for proper cabling: - [[phab:T256112|T256112]]
* 14:19 XioNoX: remove damping on anycast group for cr2-codfw
* 14:18 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255869|T255869]]
* 14:16 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255869|T255869]]
* 14:16 XioNoX: [Netops] Disable unused VC ports to not risk them going online at connect: - [[phab:T256112|T256112]]
* 14:09 jayme: running puppet on lvs servers - [[phab:T255869|T255869]]
* 14:09 cmjohnson1: removing the cable connected to FPC1:1/0 (DAC 3m) FPC8:1/0 (DAC 3m)
* 13:58 moritzm: upgrading mariadb on cloudcontrol-2001/2003/2004
* 13:52 XioNoX: depool eqiad for row D recabling - [[phab:T256112|T256112]]
* 13:32 ottomata: Increased retention time for *.mediawiki.job.processMediaModeration topics in kafka main-eqiad and main-codfw to 31 days (as per request from Pchelolo )
* 13:22 elukey: moved the hadoop cluster to puppet TLS certificates - [[phab:T253957|T253957]]
* 13:17 XioNoX: add damping to anycast BGP - [[phab:T262372|T262372]]
* 12:58 jayme: switched mathoid service-proxy listener to use TLS - [[phab:T255875|T255875]]
* 12:50 moritzm: upgrading bird on centrallog1001
* 12:43 gehel: restarting wdqs-categories on wdqs1009
* 12:43 moritzm: installing netty-3.9 security updates
* 12:42 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:30 ema: upload@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:29 godog: swift codfw-prod: rebalance only, no weight change
* 12:27 kormat: powering off db2125 for maintenance [[phab:T260670|T260670]]
* 12:25 moritzm: installing xorg-server security updates
* 12:09 ema: text@ulsfo: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:02 ema: cp4022: upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 11:40 Urbanecm: EU B&C window done
* 11:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/Translate/tag/TPSection.php: {{Gerrit|fa4900e1e6022e645be12505de30b696a9769e77}}: Fix validation of translation unit section names ([[phab:T263546|T263546]]) (duration: 01m 07s)
* 11:25 jbond42: re-enable puppet fleet wide
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fdab74c443bc3328856e8441f4d2df8bc57c6f54}}: Enable ContentTranslation in Bashkir, Urdu and Welsh WPs as a default tool ([[phab:T258504|T258504]]; [[phab:T260022|T260022]]; [[phab:T260024|T260024]]) (duration: 01m 05s)
* 11:21 jbond42: disable puppet fleet wide to reduce log level on puppetdb
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90c72912f26d91df6d28b1efd64e366aaabc5357}}: Move DiscussionTools out of beta on arwiki, cswiki, huwiki ([[phab:T249394|T249394]]); {{Gerrit|d8553f35b4dd581f67bd568d773ff65f316fbfd3}}: Simplify DiscussionTools config (duration: 01m 11s)
* 11:06 moritzm: installing imagemagick security updates on stretch
* 11:02 jbond42: re-enable puppet fleet wide
* 10:51 jbond42: disable puppet fleet wide to deploy a puppetmaster change
* 10:49 moritzm: installing libproxy security updates
* 10:23 volans: uploaded python3-wmflib_0.0.2 to apt.wikimedia.org buster-wikimedia
* 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12789 and previous config saved to /var/cache/conftool/dbconfig/20200924-102025-kormat.json
* 10:05 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12788 and previous config saved to /var/cache/conftool/dbconfig/20200924-100521-kormat.json
* 10:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12787 and previous config saved to /var/cache/conftool/dbconfig/20200924-095018-kormat.json
* 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:48 jayme: restart pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]]
* 09:46 jayme: restart pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]]
* 09:43 jayme: running puppet on lvs servers - [[phab:T255875|T255875]]
* 09:35 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12786 and previous config saved to /var/cache/conftool/dbconfig/20200924-093514-kormat.json
* 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 09:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:20 ema: cp4021: repool with varnish 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:19 ema: cp4021: redepool with varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:14 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3312 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12785 and previous config saved to /var/cache/conftool/dbconfig/20200924-091445-kormat.json
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:14 ema: cp4021: depool and upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:05 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 09:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 08:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12784 and previous config saved to /var/cache/conftool/dbconfig/20200924-082443-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 100%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12783 and previous config saved to /var/cache/conftool/dbconfig/20200924-082319-root.json
* 08:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:15 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:15 XioNoX: configure vrrp_master_pinning in codfw - [[phab:T263212|T263212]]
* 08:10 moritzm: installing mariadb-10.1/mariadb-10.3 updates (packaged version from Debian, not the wmf-mariadb variants we used for mysqld)
* 08:09 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 08:08 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 66%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12782 and previous config saved to /var/cache/conftool/dbconfig/20200924-080816-root.json
* 07:58 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 07:57 marostegui: Remove es2018 from tendril and zarcillo [[phab:T263613|T263613]]
* 07:57 XioNoX: configure vrrp_master_pinning in eqiad - [[phab:T263212|T263212]]
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2109 (re)pooling @ 33%: Slowly repool db2109 ', diff saved to https://phabricator.wikimedia.org/P12781 and previous config saved to /var/cache/conftool/dbconfig/20200924-075312-root.json
* 07:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 07:49 godog: roll restart logstash codfw, gc death
* 07:25 XioNoX: push pfw policies - [[phab:T263674|T263674]]
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Place db2073 into vslow, not api in s4', diff saved to https://phabricator.wikimedia.org/P12780 and previous config saved to /var/cache/conftool/dbconfig/20200924-064018-marostegui.json
* 06:22 elukey: powercycle elastic2037 (host stuck, no mgmt serial console working, DIMM errors in racadm getsel)
* 05:57 marostegui: Remove es2012 from tendril and zarcillo [[phab:T263613|T263613]]
* 05:41 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2012 and es2018 from dbctl - [[phab:T263615|T263615]] [[phab:T263613|T263613]]', diff saved to https://phabricator.wikimedia.org/P12778 and previous config saved to /var/cache/conftool/dbconfig/20200924-053001-marostegui.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12777 and previous config saved to /var/cache/conftool/dbconfig/20200924-052207-marostegui.json
* 01:25 ryankemper: Root cause of sigkill of `elasticsearch_5@production-logstash-eqiad.service` appears to be OOMKill of the java process: `Killed process 1775 (java) total-vm:8016136kB, anon-rss:4888232kB, file-rss:0kB, shmem-rss:0kB`. Service appears to have restarted itself and is healthy again
* 01:21 ryankemper: Observed that `elasticsearch_5@production-logstash-eqiad.service` is in a `failed` state since `Thu 2020-09-24 00:53:53 UTC`; appears the process received a SIGKILL - not sure why
* 01:19 ryankemper: Getting `connection refused` when trying to `curl -X GET 'http://localhost:9200/_cluster/health'` on `logstash1009`
* 01:16 ryankemper: (after) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"green","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":868,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
* 01:16 ryankemper: Ran `curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'`, cluster status is green again
* 01:15 ryankemper: (before) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
* 01:14 ryankemper: (before) `<nowiki>{</nowiki>"cluster_name":"production-elk7-codfw","status":"yellow","timed_out":false,"number_of_nodes":12,"number_of_data_nodes":7,"active_primary_shards":459,"active_shards":866,"relocating_shards":4,"initializing_shards":0,"unassigned_shards":2,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0`
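
The 01:14 to 01:16 entries above compare cluster health before and after a reroute retry. A minimal sketch of that comparison follows, assuming the health payload has already been fetched and parsed. The function name <code>needs_reroute_retry</code> is hypothetical, and the two sample payloads are trimmed to a few of the fields logged above, not the full responses.

```python
import json


def needs_reroute_retry(health: dict) -> bool:
    """Return True when the cluster is not green and still has unassigned shards.

    Hypothetical helper for illustration; it only inspects the parsed
    health payload and does not contact any cluster.
    """
    return health.get("status") != "green" and health.get("unassigned_shards", 0) > 0


# Sample payloads trimmed to a subset of the fields from the log entries above.
before = json.loads(
    '{"cluster_name":"production-elk7-codfw","status":"yellow",'
    '"active_shards":866,"unassigned_shards":2}'
)
after = json.loads(
    '{"cluster_name":"production-elk7-codfw","status":"green",'
    '"active_shards":868,"unassigned_shards":0}'
)

print(needs_reroute_retry(before))  # yellow with 2 unassigned shards
print(needs_reroute_retry(after))   # green with 0 unassigned shards
```

With the "before" values the check suggests a retry is worth attempting, and with the "after" values it does not, which matches the sequence recorded in the log.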


== 2020-09-23 ==
== 2021-07-08 ==
* 23:52 mutante: alert1001 - systemctl restart ircecho because icinga-wm left the chat
* 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - [[phab:T281423|T281423]] (duration: 00m 57s)
* 23:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cbd77e3dff0d56b851b3d15b4d267d1faacfae26}}: Add new Racine namespace to frwiktionary ([[phab:T263525|T263525]]) (duration: 01m 05s)
* 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - [[phab:T281423|T281423]] (duration: 00m 58s)
* 23:44 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
* 23:40 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|22382a97ec252488a346fbf0c3d40bc974d0cdbe}}: remove wtp2005 from wgLinterSubmitterWhitelist ([[phab:T257903|T257903]]) (duration: 01m 04s)
* 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:14 eileen: civicrm revision changed from {{Gerrit|32a82aa1b7}} to {{Gerrit|eb90dbcfd3}}, config revision is {{Gerrit|2a55766237}}
* 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
* 23:13 eileen: civicrm revision is {{Gerrit|32a82aa1b7}}, config revision is {{Gerrit|2a55766237}}
* 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
* 23:10 mutante: ganeti5003 - rebooting install5001 - OS install on 3001/4001/5001  [[phab:T263684|T263684]]
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
* 23:04 mutante: ganeti4003 - rebooting install4001
* 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
* 22:51 mutante: ganeti5003 - rebooting install5001
* 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 06s)
* 22:27 mutante: ganeti5003 - gnt-instance start install5001
* 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]]
* 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]] (duration: 05m 27s)
* 21:38 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]]
* 21:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]] (duration: 05m 42s)
* 21:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]]
* 21:30 dancy@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.10 (duration: 01m 04s)
* 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - [[phab:T271232|T271232]] [[phab:T273901|T273901]] (duration: 01m 09s)
* 21:29 dancy@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.10
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
* 21:24 dancy@deploy1001: Finished scap: (no justification provided) (duration: 42m 52s)
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
* 21:12 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
* 21:06 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]] (duration: 03m 22s)
* 20:57 mepps: updated payments-wiki from {{Gerrit|7bb99ce03a}} to {{Gerrit|f89c594e12}}
* 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]]
* 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
* 20:42 dancy: dancy@deploy1001 Started scap: Deploying fixes for [[phab:T263601|T263601]] and [[phab:T263675|T263675]] to 1.36.0-wmf.10
* 12:52 moritzm: installing klibc security updates on buster
* 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 12:38 moritzm: installing openexr security updates
* 20:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
* 20:41 dancy@deploy1001: Started scap: (no justification provided)
* 10:20 jbond: upgrade golang-cfssl
* 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
* 20:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
* 20:36 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
* 20:36 eileen: civicrm revision changed from {{Gerrit|a789afd79b}} to {{Gerrit|32a82aa1b7}}, config revision is {{Gerrit|2a55766237}}
* 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
* 20:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
* 20:30 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
* 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
* 20:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:22 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
* 20:18 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:15 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 [[phab:T284811|T284811]]
* 20:08 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
* 20:06 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
* 20:02 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
* 19:42 robh: ganeti5002 firmware update before hw testing via [[phab:T261130|T261130]]
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
* 18:57 ryankemper: (Above deploy complete)
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
* 18:54 ryankemper: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:628978{{!}}cloudelastic: envoy sits in front now (T263073)]]'` from `ryankemper@deploy1001:/srv/mediawiki-staging`
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
* 18:47 ryankemper: Above deploy appears successful, test requests seem to be taking 40ms instead of the previous 140ms
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
* 18:31 ryankemper: HEAD of `/srv/mediawiki-staging` is now at {{Gerrit|7a96d63d862eacf5244eec79b63d29d78fbaa6f7}} as expected
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json
* 18:13 Urbanecm: End of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # [[phab:T263628|T263628]]
* 18:13 Urbanecm: Start of [urbanecm@mwmaint2001 ~]$ mwscript updateCollation.php --wiki=trwikiquote --previous-collation=uppercase # [[phab:T263628|T263628]]
* 18:12 Urbanecm: urbanecm@deploy1001: scap sync-file wmf-config/InitialiseSettings.php 'b1554f36be68106c9364f4aa2fd70d759ad74356: Set $wgCategoryCollation = uca-tr on trwikiquote ([[phab:T263628|T263628]])'
* 18:11 Urbanecm: Logmsgbot seems to be down
* 17:29 robh: migrating ganeti instances off ganeti5002 for troubleshooting per [[phab:T261130|T261130]]
* 16:37 sukhe: upload dnsdist_1.4.0-1~deb10u2 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 16:00 herron: switching icinga over from icinga1001 to alert1001 [[phab:T247966|T247966]]
* 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2088:3312 from api now that db2104/db2126 are done [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12775 and previous config saved to /var/cache/conftool/dbconfig/20200923-160010-kormat.json
* 15:58 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12774 and previous config saved to /var/cache/conftool/dbconfig/20200923-155819-kormat.json
* 15:57 robh: updating firmware on mw1360, troubleshooting nic failure issue [[phab:T262151|T262151]]
* 15:57 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialBlock.php: {{Gerrit|3234fad0d9b370b1cf75093dd13c0e1639619f08}}: SpecialUnblock: Allow getTargetAndType to accept null $par ([[phab:T263642|T263642]]) (duration: 01m 07s)
* 15:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/includes/specials/SpecialUnblock.php: {{Gerrit|3234fad0d9b370b1cf75093dd13c0e1639619f08}}: SpecialUnblock: Allow getTargetAndType to accept null $par ([[phab:T263642|T263642]]) (duration: 01m 08s)
* 15:53 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:51 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:48 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:45 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:44 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:44 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:43 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:43 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12773 and previous config saved to /var/cache/conftool/dbconfig/20200923-154315-kormat.json
* 15:40 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:37 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:28 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12772 and previous config saved to /var/cache/conftool/dbconfig/20200923-152812-kormat.json
* 15:21 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 15:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:13 kormat@cumin1001: dbctl commit (dc=all): 'db2126 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12771 and previous config saved to /var/cache/conftool/dbconfig/20200923-151308-kormat.json
* 14:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 kormat@cumin1001: dbctl commit (dc=all): 'db2126 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12770 and previous config saved to /var/cache/conftool/dbconfig/20200923-144441-kormat.json
* 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 herron: grew prometheus1004 prometheus-ops filesystem to 1.6T
* 14:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:629383{{!}}Enable repo config propagateChangeVisibility everywhere]], 2/2 (duration: 01m 06s)
* 14:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629383{{!}}Enable repo config propagateChangeVisibility everywhere]], 1/2 (duration: 01m 06s)
* 13:50 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12769 and previous config saved to /var/cache/conftool/dbconfig/20200923-135028-kormat.json
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12768 and previous config saved to /var/cache/conftool/dbconfig/20200923-133525-kormat.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 100%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12766 and previous config saved to /var/cache/conftool/dbconfig/20200923-132918-root.json
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12765 and previous config saved to /var/cache/conftool/dbconfig/20200923-132022-kormat.json
* 13:20 moritzm: installing ruby-json security updates
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 75%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12764 and previous config saved to /var/cache/conftool/dbconfig/20200923-131414-root.json
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12763 and previous config saved to /var/cache/conftool/dbconfig/20200923-130518-kormat.json
* 13:04 moritzm: installing multipath-tools bugfix updates from buster 10.5 point release
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly repool db2074 ', diff saved to https://phabricator.wikimedia.org/P12762 and previous config saved to /var/cache/conftool/dbconfig/20200923-125911-root.json
* 12:49 moritzm: installing libunwind bugfix updates from buster 10.5 point release
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2104 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12761 and previous config saved to /var/cache/conftool/dbconfig/20200923-123922-kormat.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074', diff saved to https://phabricator.wikimedia.org/P12760 and previous config saved to /var/cache/conftool/dbconfig/20200923-123806-marostegui.json
* 12:37 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:37 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:36 kormat@cumin1001: dbctl commit (dc=all): 'Add db2088:3312 to api while db2104 gets depooled [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12759 and previous config saved to /var/cache/conftool/dbconfig/20200923-123649-kormat.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2074 (re)pooling @ 25%: Slowly db2074 ', diff saved to https://phabricator.wikimedia.org/P12758 and previous config saved to /var/cache/conftool/dbconfig/20200923-123528-root.json
* 12:22 ema: cp4027: repool with varnish 6.0.6-1wm1 [[phab:T263557|T263557]]
* 12:09 ema: cp4027: depool and upgrade varnish to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 11:52 moritzm: installing GNUTLS bugfix updates from buster 10.5 point release
* 11:51 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.GrowthTasksApi.js: {{Gerrit|73b5ce82b3913232b708405147f0bb6d27128974}}: Fix GrowthTasksApi lazy-loading flags for pages with no views ([[phab:T263611|T263611]]) (duration: 01m 05s)
* 11:49 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEdit.js: {{Gerrit|1ab31a966edc4748f82f75bb370371733c2ca090}}: Mark pageviews as not used in the mobile postedit notice ([[phab:T263611|T263611]]) (duration: 01m 06s)
* 11:38 Urbanecm: Revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/629188 and fetch to deploy1001 to unblock EU B&C deployment ([[phab:T237467|T237467]]; cc twentyafterfour)
* 11:27 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12756 and previous config saved to /var/cache/conftool/dbconfig/20200923-112712-kormat.json
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12755 and previous config saved to /var/cache/conftool/dbconfig/20200923-111209-kormat.json
* 11:08 Urbanecm: Create ContentTranslation tables at testwiki using SQL files from `/srv/mediawiki/php-1.36.0-wmf.10/extensions/ContentTranslation/sql` ([[phab:T263417|T263417]])
* 10:57 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12754 and previous config saved to /var/cache/conftool/dbconfig/20200923-105705-kormat.json
* 10:42 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12753 and previous config saved to /var/cache/conftool/dbconfig/20200923-104202-kormat.json
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db2088:3312 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12752 and previous config saved to /var/cache/conftool/dbconfig/20200923-102120-kormat.json
* 10:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:20 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12751 and previous config saved to /var/cache/conftool/dbconfig/20200923-100156-marostegui.json
* 10:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:629133{{!}}Configure entityDataCachePaths for Wikibase]] (duration: 01m 05s)
* 09:59 elukey: update puppet compiler's facts
* 09:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:620050{{!}}Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 2/2 (production no-op) (duration: 01m 04s)
* 09:55 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620050{{!}}Remove $wgExtraLanguageNames from Wikidata and Commons (T260118)]], part 1/2 (duration: 01m 16s)
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12750 and previous config saved to /var/cache/conftool/dbconfig/20200923-094511-marostegui.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12748 and previous config saved to /var/cache/conftool/dbconfig/20200923-083200-marostegui.json
* 08:29 moritzm: installing dbus security updates on buster
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12747 and previous config saved to /var/cache/conftool/dbconfig/20200923-080651-marostegui.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index changes [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12746 and previous config saved to /var/cache/conftool/dbconfig/20200923-071129-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 to re-add change_revision_id index [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12745 and previous config saved to /var/cache/conftool/dbconfig/20200923-070926-marostegui.json
* 06:34 marostegui: Stop MySQL on es2012 and es2018 [[phab:T263613|T263613]] [[phab:T263615|T263615]]
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2018 [[phab:T263615|T263615]]', diff saved to https://phabricator.wikimedia.org/P12744 and previous config saved to /var/cache/conftool/dbconfig/20200923-063140-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012 for decommissioning', diff saved to https://phabricator.wikimedia.org/P12743 and previous config saved to /var/cache/conftool/dbconfig/20200923-060812-marostegui.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2084 after index removal [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12742 and previous config saved to /var/cache/conftool/dbconfig/20200923-055850-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 [[phab:T262856|T262856]]', diff saved to https://phabricator.wikimedia.org/P12741 and previous config saved to /var/cache/conftool/dbconfig/20200923-055531-marostegui.json
* 05:37 marostegui: Purge global_status_log table on tendril - [[phab:T252331|T252331]]
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:03 marostegui: Remove triggers from db2094:3313 for MCR schema change [[phab:T238966|T238966]]
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2074 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12739 and previous config saved to /var/cache/conftool/dbconfig/20200923-050234-marostegui.json
* 04:25 eileen: civicrm revision changed from {{Gerrit|8f32b6301f}} to {{Gerrit|a789afd79b}}, config revision is {{Gerrit|9933605187}}


== 2020-09-22 ==
== 2021-07-07 ==
* 23:27 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: clientError: enable on ja,es,de,ru,it,zh,pt wikipedias ([[phab:T255585|T255585]]) (duration: 01m 04s)
* 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
* 23:24 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry feature ([[phab:T261249|T261249]]) (duration: 01m 06s)
* 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (2/2) (duration: 00m 59s)
* 21:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (1/2) (duration: 00m 59s)
* 21:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 05m 28s)
* 20:46 ebernhardson: [[phab:T259539|T259539]] enabled adaptive replica selection on elasticsearch at search.svc.eqiad.wmnet:9[246]43
* 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
* 20:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 20:43 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.10
* 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 17m 22s)
* 20:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 20:31 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.10 (duration: 42m 21s)
* 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
* 20:30 mutante: gerrit2001 (gerrit-replica) restarting gerrit service
* 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
* 19:49 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.10
* 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 dancy@deploy1001: Pruned MediaWiki: 1.36.0-wmf.5 (duration: 17m 59s)
* 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
* 19:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
* 19:29 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:49 moritzm: installing djvulibre security updates
* 16:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 14:05 _joe_: powercycling mw2267, stuck without network, blank console
* 16:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]] (duration: 05m 41s)
* 16:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]]
* 16:00 robh: running dell epsa test on down host mw1360 per [[phab:T262151|T262151]]
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:34 moritzm: installing nginx security updates on buster
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:33 shdubsh: restart apache on prometheus nodes to pick up new ext endpoint
* 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]] (duration: 03m 11s)
* 14:24 ema: upload libvmod-re2 1.5.3-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]]
* 14:24 papaul: rebooting ms-be2019
* 12:12 urbanecm: Start server-side upload for 3 video files ([[phab:T286173|T286173]], [[phab:T286175|T286175]], [[phab:T286174|T286174]])
* 14:15 XioNoX: upgrade FNM on netflow2001 - [[phab:T257035|T257035]]
* 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
* 14:12 jayme: running ipvsadm -D -t 10.2.1.19:1970; ipvsadm -D -t 10.2.1.21:24766 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
* 14:12 jayme: running ipvsadm -D -t 10.2.2.19:1970; ipvsadm -D -t 10.2.2.21:24766 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
* 14:11 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
* 14:10 XioNoX: upgrade FNM on netflow5001 - [[phab:T257035|T257035]]
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
* 14:09 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
* 14:09 shdubsh: restart statsv on webperf[1-2]001 to route metrics through statsd-exporter
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
* 14:09 XioNoX: upgrade FNM on netflow1001 - [[phab:T257035|T257035]]
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
* 14:06 XioNoX: upgrade FNM on netflow3001 - [[phab:T257035|T257035]]
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
* 14:05 jayme: running puppet on lvs servers - [[phab:T255868|T255868]] [[phab:T255877|T255877]]
* 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
* 14:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 hnowlan: roll-restarting restbase codfw for java updates
* 13:59 XioNoX: add fastnetmon_1.1.7 to buster-wikimedia repo - [[phab:T257035|T257035]]
* 13:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:55 ema: upload varnish-modules 0.15.0-1+wmf1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 13:49 marostegui: Deploy MCR change on db2098:3313 - [[phab:T238966|T238966]]
* 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:39 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:35 ema: upload libvmod-netmapper 1.8-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 12:54 ema: upload varnishkafka 1.1.0-1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 12:11 moritzm: installing python3.7 security updates on Buster
* 12:09 moritzm: installing bundler updates on buster
* 11:59 Urbanecm: Reset password for SUL User:Freibo
* 11:58 Lucas_WMDE: EU backport&config window done
* 11:56 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource --fix {{!}} tee [[phab:T263358|T263358]].fix # 1350 to fix, 1350 resolvable, 0 deleted
* 11:55 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2001:~$ mwscript namespaceDupes.php trwikisource {{!}} tee [[phab:T263358|T263358]].dryrun # 1350 to fix, 1350 resolvable, 0 deleted
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628598{{!}}Create Portal and Portal_talk namespaces on trwikisource, and fix an incorrect alias (T263358)]] (duration: 00m 57s)
* 11:47 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628521{{!}}Removing Wikipedia store link from enwiki (T262329)]] (duration: 00m 57s)
* 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:628515{{!}}Set timezone for wikis of the CWIRP to Europe/Rome (T263123)]] (duration: 00m 59s)
* 11:35 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 11:35 hnowlan: roll-restarting restbase eqiad for java updates
* 11:25 ema: upload varnish 6.0.6-1wm1 to buster-wikimedia component/varnish6 [[phab:T261632|T261632]]
* 11:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:13 moritzm: installing intel-microcode 3.20200616.1 on Buster baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
* 11:00 moritzm: installing intel-microcode 3.20200616.1 on Stretch baremetal servers (compared to the currently installed packages, this reverts microcode changes for some Skylake CPUs we don't use)
* 10:51 XioNoX: Add policy-options for primary IXPs to all routers - [[phab:T262517|T262517]]
* 10:48 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 10:48 hnowlan: roll-restarting sessionstore for java security updates
* 10:44 moritzm: installing bacula security updates on stretch
* 10:33 moritzm: installing remaining libx11 security updates
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12733 and previous config saved to /var/cache/conftool/dbconfig/20200922-101342-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12732 and previous config saved to /var/cache/conftool/dbconfig/20200922-101324-root.json
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12731 and previous config saved to /var/cache/conftool/dbconfig/20200922-101308-root.json
* 10:00 kormat: deploying schema change to s2 in eqiad. labsdb will have s2 lag until this finishes. [[phab:T259831|T259831]]
* 09:59 jayme: running ipvsadm -D -t 10.2.1.45:34192; ipvsadm -D -t 10.2.1.42:35192 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 09:59 jayme: running ipvsadm -D -t 10.2.2.45:34192; ipvsadm -D -t 10.2.2.42:35192 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12730 and previous config saved to /var/cache/conftool/dbconfig/20200922-095839-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12729 and previous config saved to /var/cache/conftool/dbconfig/20200922-095821-root.json
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12728 and previous config saved to /var/cache/conftool/dbconfig/20200922-095805-root.json
* 09:57 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 09:55 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 09:51 jayme: running puppet on lvs servers - [[phab:T255873|T255873]] [[phab:T255870|T255870]]
* 09:46 jbond@cumin1001: END (FAIL) - Cookbook sre.pdus.rotate-password (exit_code=99)
* 09:46 jbond@cumin1001: START - Cookbook sre.pdus.rotate-password
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12727 and previous config saved to /var/cache/conftool/dbconfig/20200922-094336-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12726 and previous config saved to /var/cache/conftool/dbconfig/20200922-094317-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12725 and previous config saved to /var/cache/conftool/dbconfig/20200922-094302-root.json
* 09:30 volans: repooling ulsfo after merging DNS migration to Netbox zonefiles - [[phab:T258729|T258729]]
* 09:30 jbond@cumin1001: END (PASS) - Cookbook sre.pdus.uptime (exit_code=0)
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12724 and previous config saved to /var/cache/conftool/dbconfig/20200922-092832-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12723 and previous config saved to /var/cache/conftool/dbconfig/20200922-092814-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12722 and previous config saved to /var/cache/conftool/dbconfig/20200922-092758-root.json
* 09:26 jbond@cumin1001: START - Cookbook sre.pdus.uptime
* 09:24 XioNoX: replace BGP_IXP_in with BGP_IXP_PRIMARY_in on cr3-ulsfo IX BGP group - [[phab:T262517|T262517]]
* 09:22 XioNoX: add BGP_IXP_PRIMARY_in to cr3-ulsfo - [[phab:T262517|T262517]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Slowly repool es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12721 and previous config saved to /var/cache/conftool/dbconfig/20200922-091329-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Slowly repool es2033 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12720 and previous config saved to /var/cache/conftool/dbconfig/20200922-091310-root.json
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Slowly es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12719 and previous config saved to /var/cache/conftool/dbconfig/20200922-091255-root.json
* 09:11 jbond42: update snmp string on ps1-a8-codfw
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12718 and previous config saved to /var/cache/conftool/dbconfig/20200922-090520-kormat.json
* 08:58 _joe_: restart pybal on lvs2009
* 08:56 _joe_: restarting pybal on lvs2010
* 08:54 _joe_: restarted pybal on lvs1015
* 08:50 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12717 and previous config saved to /var/cache/conftool/dbconfig/20200922-085017-kormat.json
* 08:36 _joe_: restarting pybal low-traffic in eqiad to pick up lvs changes
* 08:35 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12715 and previous config saved to /var/cache/conftool/dbconfig/20200922-083514-kormat.json
* 08:22 volans: migrating ulsfo public DNS records to the Netbox-generated ones - [[phab:T258729|T258729]]
* 08:20 kormat@cumin1001: dbctl commit (dc=all): 'db2076 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12714 and previous config saved to /var/cache/conftool/dbconfig/20200922-082010-kormat.json
* 08:13 kormat: uploaded wmfmariadbpy v0.5 to apt. deploying now to fleet
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2032, es2033 and es2034 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12713 and previous config saved to /var/cache/conftool/dbconfig/20200922-081154-marostegui.json
* 07:57 volans: migrating ulsfo private DNS records to the Netbox-generated ones - [[phab:T258729|T258729]]
* 07:54 kormat@cumin1001: dbctl commit (dc=all): 'db2076 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12712 and previous config saved to /var/cache/conftool/dbconfig/20200922-075429-kormat.json
* 07:51 jayme: running ipvsadm -D -t 10.2.1.18:8080; ipvsadm -D -t 10.2.1.46:3030 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:49 jayme: running ipvsadm -D -t 10.2.2.18:8080; ipvsadm -D -t 10.2.2.46:3030 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:46 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:42 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:39 jayme: running puppet on lvs servers - [[phab:T255879|T255879]] [[phab:T254581|T254581]]
* 07:34 volans: depooling ulsfo to merge DNS migration to Netbox zonefiles - [[phab:T258729|T258729]]
* 07:24 marostegui: Stop MySQL on es2014 - host will be decommissioned [[phab:T262889|T262889]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2014 from dbctl [[phab:T262889|T262889]]', diff saved to https://phabricator.wikimedia.org/P12711 and previous config saved to /var/cache/conftool/dbconfig/20200922-071435-marostegui.json
* 07:11 XioNoX: cr1-codfw# run clear bfd session address fe80::f27c:c7ff:fe11:2c1b
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2014 for decommissioning [[phab:T262889|T262889]]', diff saved to https://phabricator.wikimedia.org/P12710 and previous config saved to /var/cache/conftool/dbconfig/20200922-061815-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 100%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12709 and previous config saved to /var/cache/conftool/dbconfig/20200922-054455-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 100%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12708 and previous config saved to /var/cache/conftool/dbconfig/20200922-054438-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 100%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12707 and previous config saved to /var/cache/conftool/dbconfig/20200922-054430-root.json
* 05:41 marostegui: Log remove triggers on revision table on db1124:3313 [[phab:T238966|T238966]]
* 05:39 marostegui: Deploy MCR schema change on s3 eqiad, this will generate lag on s3 on labsdb [[phab:T238966|T238966]]
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2032, es2033 and es2034 into dbctl [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12706 and previous config saved to /var/cache/conftool/dbconfig/20200922-053346-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 75%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12705 and previous config saved to /var/cache/conftool/dbconfig/20200922-052951-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 75%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12704 and previous config saved to /var/cache/conftool/dbconfig/20200922-052935-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 75%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12703 and previous config saved to /var/cache/conftool/dbconfig/20200922-052926-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 50%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12702 and previous config saved to /var/cache/conftool/dbconfig/20200922-051448-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 50%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12701 and previous config saved to /var/cache/conftool/dbconfig/20200922-051431-root.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 50%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12700 and previous config saved to /var/cache/conftool/dbconfig/20200922-051423-root.json
* 05:00 marostegui: Add es2032 es2033 and es2034 to tendril and zarcillo [[phab:T261717|T261717]]
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2019 (re)pooling @ 25%: Slowly repool after recloning es2034 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12699 and previous config saved to /var/cache/conftool/dbconfig/20200922-045944-root.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2016 (re)pooling @ 25%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12698 and previous config saved to /var/cache/conftool/dbconfig/20200922-045928-root.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2013 (re)pooling @ 25%: Slowly repool after recloning es2032 [[phab:T261717|T261717]] ', diff saved to https://phabricator.wikimedia.org/P12697 and previous config saved to /var/cache/conftool/dbconfig/20200922-045919-root.json
* 01:35 ryankemper: `sudo cumin C:profile::services_proxy::envoy 'enable-puppet "adding cloudelastic to the service proxy --rkemper"'` done
* 01:35 ryankemper: woot! `curl -X GET -s 'http://localhost:6105/_cluster/health'` gives a response as expected. (As do 6106 and 6107). Re-enabling puppet across the fleet...
* 01:32 ryankemper: `sudo run-puppet-agent -e "adding cloudelastic to the service proxy --rkemper"` on `mwdebug1002.eqiad.wmnet`
* 01:28 ryankemper: `sudo puppet-merge` done, now will run puppet on a single eqiad appserver and verify we can curl `localhost:610<nowiki>{</nowiki>5,6,7<nowiki>}</nowiki>`
* 01:17 ryankemper: Disabling puppet on affected nodes via `sudo cumin C:profile::services_proxy::envoy 'disable-puppet "adding cloudelastic to the service proxy --rkemper"'`
* 01:17 ryankemper: Going to test patch to stick envoy in front of `cloudelastic`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/628243


== 2020-09-21 ==
== 2021-07-06 ==
* 23:42 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 23:39 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:37 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 23:36 mutante: debmonitor2002 - systemctl reset-failed
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:59 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
* 22:57 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
* 22:55 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
* 22:20 mutante: releases.wikimedia.org has been converted to an active-active service with geodns/ backends in both DCs
* 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
* 21:56 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
* 21:54 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
* 21:51 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
* 21:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
* 21:18 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 21:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 20:49 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: adjust enwiktionary completion search ranking (duration: 00m 57s)
* 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 20:47 ebernhardson@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/CirrusSearch/: Remove pages from completion search by page id (duration: 01m 00s)
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 20:04 herron: moving prometheus instance from bast3004 to prometheus3001 [[phab:T243057|T243057]]
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
* 19:46 herron: moving prometheus instance from bast4002 to prometheus4001 [[phab:T243057|T243057]]
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
* 19:38 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Push notifications deployment (4/5) (duration: 00m 57s)
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
* 19:34 mholloway-shell@deploy1001: Synchronized wmf-config/CommonSettings.php: Push notifications deployment (3/5) (duration: 00m 57s)
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
* 19:28 mholloway-shell@deploy1001: Synchronized wmf-config/ProductionServices.php: Push notifications deployment (2/5) (duration: 00m 57s)
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
* 19:26 mholloway-shell@deploy1001: Synchronized wmf-config/LabsServices.php: Push notifications deployment (1/5) (duration: 00m 57s)
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
* 19:19 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
* 19:18 mepps: updated crm to {{Gerrit|8f32b6301f}}
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
* 19:15 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
* 19:14 ejegg: updated fundraising CiviCRM from {{Gerrit|e5ebf9d18a}} to {{Gerrit|8f32b6301f}}
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
* 19:13 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:19 moritzm: installing jackson-databind security updates on buster
* 18:59 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:622863 [[phab:T249745|T249745]] (duration: 00m 56s)
* 09:01 _joe_: repooling wdqs1007 now that lag has caught up
* 18:57 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}} (duration: 06m 54s)
* 08:43 moritzm: installing libuv1 security updates on buster
* 18:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on plwiki ([[phab:T254239|T254239]]) and ptwiki ([[phab:T255027|T255027]]) (duration: 00m 56s)
* 07:06 marostegui: Upgrade db1104 kernel
* 18:50 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8afe8d2]: mjolnir daemons update {{Gerrit|I336365}}
* 06:54 moritzm: installing PHP 7.3 security updates on buster
* 18:33 mepps: updated crm from {{Gerrit|cc1f7e6d13}} to {{Gerrit|e5ebf9d18a}}
* 06:50 marostegui: Upgrade db1122 kernel
* 18:26 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Define Chinese logo variants for Modern Vector (no-op) (part 2) ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 06:35 marostegui: Upgrade db1138 kernel
* 18:25 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Define Chinese logo variants for Modern Vector (no-op) ([[phab:T261153|T261153]]) (duration: 00m 57s)
* 06:31 marostegui: Upgrade db1160 kernel
* 18:21 catrope@deploy1001: Synchronized static/images/mobile/copyright/: Update Chinese logo variants for Modern Vector ([[phab:T261153|T261153]]) (duration: 00m 56s)
* 00:56 eileen: process-control config revision is {{Gerrit|8d46b52ed4}}
* 18:08 XioNoX: add NAT rule to pfw3-codfw - [[phab:T263488|T263488]]
* 17:42 papaul: rebooting ps1-a8-codfw firmware upgrade
* 16:46 papaul: shutting down ms-be2019 for BBU replacing
* 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12696 and previous config saved to /var/cache/conftool/dbconfig/20200921-162433-root.json
* 16:17 papaul: replacing msw-c8-codfw
* 16:16 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12695 and previous config saved to /var/cache/conftool/dbconfig/20200921-160929-root.json
* 16:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12694 and previous config saved to /var/cache/conftool/dbconfig/20200921-155426-root.json
* 15:51 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 56s)
* 15:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/Wikibase/lib/includes/Store/Sql/Terms/Util/StatsdMonitoring.php: [[gerrit:628808{{!}}Introduce and use StatsdMonitoring trait in term store (T262923), Part I]] (duration: 00m 59s)
* 15:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Slowly repool after on-site maintenance [[phab:T262247|T262247]] ', diff saved to https://phabricator.wikimedia.org/P12693 and previous config saved to /var/cache/conftool/dbconfig/20200921-153923-root.json
* 15:24 hnowlan: roll-restarting restbase-dev for java security updates
* 15:24 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Take db2124 back out of dump/vslow [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12692 and previous config saved to /var/cache/conftool/dbconfig/20200921-151210-kormat.json
* 15:10 moritzm: rolling restart of mw canaries in codfw to pick up libx11 update
* 15:07 moritzm: installing libx11 security updates on stretch
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12691 and previous config saved to /var/cache/conftool/dbconfig/20200921-150233-kormat.json
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12690 and previous config saved to /var/cache/conftool/dbconfig/20200921-144729-kormat.json
* 14:40 moritzm: installing qemu security updates on ganeti* stretch nodes
* 14:37 papaul: firmware upgrade on db2127
* 14:36 moritzm: installing qemu security updates on ganeti2011 and gnt-instance reboot debmonitor2001
* 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12689 and previous config saved to /var/cache/conftool/dbconfig/20200921-143226-kormat.json
* 14:30 herron: moving prometheus from bast5001 to prometheus5001 [[phab:T243057|T243057]]
* 14:24 papaul: disconnecting mgmt on msw-c1-codfw to re-do cable end [[phab:T263138|T263138]]
* 14:21 marostegui: Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing [[phab:T263443|T263443]]
* 14:17 kormat@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12688 and previous config saved to /var/cache/conftool/dbconfig/20200921-141722-kormat.json
* 14:11 papaul: disconnecting mgmt on msw-d6-codfw to re-do cable end [[phab:T263138|T263138]]
* 14:00 moritzm: installing Java security updates on restbase/sessionstore*
* 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2117 for schema change, add db2124 to dump/vslow in the interim [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12687 and previous config saved to /var/cache/conftool/dbconfig/20200921-135821-kormat.json
* 13:21 moritzm: installing glib-networking security updates for Stretch
* 13:21 marostegui: Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing [[phab:T263443|T263443]]
* 12:59 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=codfw
* 12:38 XioNoX: set same OSPF metric on both eqiad/codfw links - [[phab:T263230|T263230]]
* 12:26 marostegui: Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing [[phab:T263443|T263443]]
* 12:26 marostegui: Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing [[phab:T263443|T263443]]
* 11:38 effie: restart pybal on lvs2009 and lvs1015 - [[phab:T256973|T256973]]
* 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - crashed', diff saved to https://phabricator.wikimedia.org/P12684 and previous config saved to /var/cache/conftool/dbconfig/20200921-113708-marostegui.json
* 11:35 Urbanecm: EU B&C done
* 11:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend/includes/Transforms/MoveLeadParagraphTransform.php: {{Gerrit|3fab5882505809b412cff641a17ae5d973db04a4}}: Simplify lead paragraph check (duration: 00m 59s)
* 11:22 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a62212a5a8f4692b860eb3d6c3322c82d88125a9}}: Allow local steward group members to bigdelete (duration: 00m 57s)
* 11:12 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=shnwiktionary --fix # [[phab:T256348|T256348]] # P12683
* 11:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1cf4664df87f10bf60b47345dfe3c52d7dc24f6c}}: Set WT namespace alias to NS_PROJECT in shn.wiktionary ([[phab:T256348|T256348]]) (duration: 00m 57s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|01ba82866f3e04c7c635e9089fed4269190b93f0}}: Add archive.wul.waseda.ac.jp to the wgCopyUploadDomains ([[phab:T261037|T261037]]) (duration: 00m 57s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd51f47b1f60fbfafdcc623ae22dcadf2c927876}}: Add *.70yearsindonesiaaustralia.com to the wgCopyUploadsDomains allowlist of commonswiki ([[phab:T262238|T262238]]) (duration: 00m 57s)
* 11:02 effie: restart pybal on lvs2010 and lvs1016 - [[phab:T256973|T256973]]
* 10:36 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:35 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:628766{{!}} Bumping portals to master (T128546)]] (duration: 01m 12s)
* 09:03 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12682 and previous config saved to /var/cache/conftool/dbconfig/20200921-090343-kormat.json
* 08:48 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12681 and previous config saved to /var/cache/conftool/dbconfig/20200921-084840-kormat.json
* 08:48 marostegui: Stop MySQL on db2127 for on-site maintenance - [[phab:T262247|T262247]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2127 [[phab:T262247|T262247]]', diff saved to https://phabricator.wikimedia.org/P12680 and previous config saved to /var/cache/conftool/dbconfig/20200921-084730-marostegui.json
* 08:33 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12679 and previous config saved to /var/cache/conftool/dbconfig/20200921-083337-kormat.json
* 08:21 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:18 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: reimage+reclone done [[phab:T263244|T263244]]', diff saved to https://phabricator.wikimedia.org/P12678 and previous config saved to /var/cache/conftool/dbconfig/20200921-081833-kormat.json
* 08:15 godog: roll-restart swift-object-replicator in codfw and eqiad for increased concurrency
* 07:53 hashar: Upgrading all CI Jenkins jobs to Quibble 0.0.45
* 07:05 XioNoX: upgrade FNM to 1.1.7 in ulsfo - [[phab:T257035|T257035]]
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12677 and previous config saved to /var/cache/conftool/dbconfig/20200921-060053-marostegui.json
* 05:48 marostegui: Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12676 and previous config saved to /var/cache/conftool/dbconfig/20200921-054730-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12675 and previous config saved to /var/cache/conftool/dbconfig/20200921-052704-marostegui.json
* 05:18 marostegui: Stop mysql on: es2013 es2016 es2019 to clone es2032 es2033 es2034 - [[phab:T261717|T261717]]
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12674 and previous config saved to /var/cache/conftool/dbconfig/20200921-050632-marostegui.json
* 05:06 marostegui: Deploy MCR schema change on s8 eqiad master, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T238966|T238966]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013,es2016 and es2019 to clone new hosts [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12673 and previous config saved to /var/cache/conftool/dbconfig/20200921-050305-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set es2015 as es2 codfw master [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12672 and previous config saved to /var/cache/conftool/dbconfig/20200921-050228-marostegui.json
* 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12671 and previous config saved to /var/cache/conftool/dbconfig/20200921-045919-marostegui.json
* 04:37 marostegui: Set innodb_change_buffering = inserts; on db2116 for performance testing
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2029 and es2030 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12670 and previous config saved to /var/cache/conftool/dbconfig/20200921-043154-marostegui.json


== 2020-09-20 ==
== 2021-07-05 ==
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Tepig10102020' 'Davidfromtheworld' # [[phab:T263317|T263317]]
* 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image ([[phab:T286212|T286212]])
* 07:42 gehel: depooling wdqs2002 to catch up on lag
* 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
* 07:36 gehel: restarting blazegraph + updater on wdqs2002
* 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedia
* 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 [[phab:T286042|T286042]]
* 11:29 moritzm: installing openexr security updates on stretch
* 11:07 moritzm: installing tiff security updates on stretch
* 10:48 moritzm: upgrading PHP on miscweb*
* 10:37 jbond: enable puppet fleet wide to post puppetdb change
* 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication [[phab:T286102|T286102]]
* 10:27 jbond: disable puppet fleet wide to perform puppetdb change
* 08:15 moritzm: rolling out debmonitor-client 0.3.0
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:04 _joe_: restarting blazegraph, then restarting the updater again
* 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
* 06:47 _joe_: restart wdqs-updater on wdqs1007
* 00:53 eileen: process-control config revision is {{Gerrit|a1717c7fde}}
* 00:47 eileen: process-control config revision is {{Gerrit|24565578f7}}


== 2020-09-19 ==
== 2021-07-04 ==
* 19:03 ariel@deploy1001: Finished deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed (duration: 00m 04s)
* 17:43 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957{{!}}Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
* 19:02 ariel@deploy1001: Started deploy [dumps/dumps@14ba6e9]: defer getting db creds until really needed
* 08:02 elukey: repool eqsin after equinix maintenance - [[phab:T286113|T286113]]
* 16:49 ejegg: reverted PayPal failmail diversion - IPN verification is working again
* 16:27 ejegg: Diverted SmashPig PayPal failmail to eeggleston only


== 2020-09-18 ==
== 2021-07-03 ==
* 21:48 tzatziki: changed password for Millennium bug@ptwiki
* 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - [[phab:T286113|T286113]]
* 19:28 eileen: process-control config revision is {{Gerrit|739ea754ca}}
* 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for [[phab:T283659|T283659]]
* 18:52 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:53 Amir1: patching postorius and mailmanclient on lists1001 for [[phab:T283659|T283659]]
* 18:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:44 ryankemper: `sudo kill 254017 254018 254028 254029` to kill some dangling serdi / gzip processes, all the wikidata cleanup should be complete
* 18:38 ryankemper: `sudo kill 126121 126122 126124 126128 249520 249521 254016 254027` on `snapshot1008` to terminate wikidata dump jobs that are in a bad state
* 18:10 ryankemper: Removed stale `wikidatardf-dumps` crontab entry from `dumpsgen@snapshot1008`, stored backup of previous state of crontab in the (admittedly verbose) `/tmp/dumpsgen_crontab_before_removing_stale_wikidata_dump_entry_see_gerrit_puppet_patch_622342`
* 17:15 mutante: lists1001 - apt-get install pwgen to generate passwords (this was installed on previous list server but apparently not puppetized, puppet patch coming up)
* 16:23 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:21 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:09 mutante: restarting gerrit service to apply gerrit::628338 to make it dump heap if out of memory ([[phab:T263008|T263008]])
* 14:15 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync ([[phab:T261488|T261488]]) (duration: 00m 56s)
* 14:13 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: labs: Turn on termbox v2 on desktop for wikidatawiki -- noop for production, sanity sync ([[phab:T261488|T261488]]) (duration: 01m 00s)
* 13:02 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 12:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 12:41 kormat: reimaging db2125 [[phab:T263244|T263244]]
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12665 and previous config saved to /var/cache/conftool/dbconfig/20200918-123947-kormat.json
* 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12664 and previous config saved to /var/cache/conftool/dbconfig/20200918-122444-kormat.json
* 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12663 and previous config saved to /var/cache/conftool/dbconfig/20200918-120940-kormat.json
* 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12662 and previous config saved to /var/cache/conftool/dbconfig/20200918-115437-kormat.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125', diff saved to https://phabricator.wikimedia.org/P12661 and previous config saved to /var/cache/conftool/dbconfig/20200918-113509-marostegui.json
* 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db2089:3316 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12660 and previous config saved to /var/cache/conftool/dbconfig/20200918-111529-kormat.json
* 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12659 and previous config saved to /var/cache/conftool/dbconfig/20200918-105645-kormat.json
* 10:45 jiji@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:41 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12658 and previous config saved to /var/cache/conftool/dbconfig/20200918-104141-kormat.json
* 10:35 jiji@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 10:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:28 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:26 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12657 and previous config saved to /var/cache/conftool/dbconfig/20200918-102638-kormat.json
* 10:11 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12656 and previous config saved to /var/cache/conftool/dbconfig/20200918-101135-kormat.json
* 09:55 kormat@cumin1001: dbctl commit (dc=all): 'db2087:3316 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12655 and previous config saved to /var/cache/conftool/dbconfig/20200918-095554-kormat.json
* 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:47 twentyafterfour: deployed hotfix for [[phab:T263063|T263063]] to phab1001
* 09:47 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1001 - [[phab:T262527|T262527]]
* 09:46 jayme: uncordoned kubestage1001 - [[phab:T262527|T262527]]
* 09:46 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12654 and previous config saved to /var/cache/conftool/dbconfig/20200918-094608-kormat.json
* 09:31 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 80%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12653 and previous config saved to /var/cache/conftool/dbconfig/20200918-093105-kormat.json
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:22 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 60%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12652 and previous config saved to /var/cache/conftool/dbconfig/20200918-091601-kormat.json
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 40%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12651 and previous config saved to /var/cache/conftool/dbconfig/20200918-090058-kormat.json
* 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:56 jayme: reboot kubestage1001 for clean state - [[phab:T262527|T262527]]
* 08:54 elukey: change analytics-in4/in6 filters on cr1/cr2 after https://gerrit.wikimedia.org/r/628300
* 08:47 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:45 kormat@cumin1001: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12650 and previous config saved to /var/cache/conftool/dbconfig/20200918-084554-kormat.json
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:43 jayme: reboot kubestage1001 for kernel upgrade - [[phab:T262527|T262527]]
* 08:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:25 jayme: reboot kubestage1001 for clean state testing - [[phab:T262527|T262527]]
* 08:22 kormat@cumin1001: dbctl commit (dc=all): 'db2124 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12648 and previous config saved to /var/cache/conftool/dbconfig/20200918-082223-kormat.json
* 08:16 klausman: reinstalling stat1004 with Buster
* 07:17 moritzm: installing xdg-utils security updates
* 07:14 XioNoX: push pfw policies - [[phab:T263168|T263168]]
* 07:12 jayme: draining kubestage1001 for kernel upgrade - [[phab:T262527|T262527]]
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12647 and previous config saved to /var/cache/conftool/dbconfig/20200918-062127-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 after MCR changes', diff saved to https://phabricator.wikimedia.org/P12646 and previous config saved to /var/cache/conftool/dbconfig/20200918-060815-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1131 after rack move', diff saved to https://phabricator.wikimedia.org/P12645 and previous config saved to /var/cache/conftool/dbconfig/20200918-060724-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12644 and previous config saved to /var/cache/conftool/dbconfig/20200918-060103-marostegui.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12643 and previous config saved to /var/cache/conftool/dbconfig/20200918-053758-marostegui.json
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2029 and es2030 to dbctl depooled - [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12642 and previous config saved to /var/cache/conftool/dbconfig/20200918-053604-marostegui.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2018, es2012 after cloning es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12641 and previous config saved to /var/cache/conftool/dbconfig/20200918-052608-marostegui.json
* 05:15 marostegui: Restart wikibugs


== 2020-09-17 ==
== 2021-07-02 ==
* 23:41 ejegg: updated payments-wiki from {{Gerrit|86c997fdb2}} to {{Gerrit|7bb99ce03a}}
* 22:06 foks: removing three files for legal compliance
* 23:01 ejegg: updated payments-wiki from {{Gerrit|1e5a52ed26}} to {{Gerrit|86c997fdb2}}
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:47 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|19b9b9877ea3f8ffa6626108941891c2454348de}}: Fix APCOND_FR_NEVERBLOCKED handling (part 3; [[phab:T262970|T262970]]) (duration: 00m 57s)
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:02 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Filomena ciavarella' 'Filomena Ciavarella' #[[phab:T262657|T262657]]
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:54 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:39 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:29 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:11 Urbanecm: Morning B&C done
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40591d3dfdc2fc360cb060770677a48e2a53362c}}: Enable DiscussionTools beta on jawiki & viwiki ([[phab:T261654|T261654]]; [[phab:T262109|T262109]]) (duration: 00m 56s)
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 Urbanecm: Move /srv/mediawiki-stagging/grep (owned by tstarling) to /home/urbanecm to make working directory clean (cc TimStarling)
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 17:20 rzl: repooled eqiad at 17:11
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 17:12 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 17:12 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 17:03 papaul: restarting ps1-d8-codfw
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 16:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 01m 12s)
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 16:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 02m 50s)
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 16:41 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 16:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout (duration: 07m 26s)
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 16:33 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema, feed timeout
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 16:33 ppchelko@deploy1001: Finished deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema (duration: 06m 14s)
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@6f507e0]: Fix up metrics editors-by-country schema
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:21 marostegui: Restart wikibugs
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:15 papaul: replacing msw-d8-codfw
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1131 IP after moving it to a different rack [[phab:T262901|T262901]]', diff saved to https://phabricator.wikimedia.org/P12639 and previous config saved to /var/cache/conftool/dbconfig/20200917-160540-marostegui.json
* 13:11 mutante: mw2380 - rebooting
* 16:03 marostegui: Recreate db1131 on tendril [[phab:T262901|T262901]]
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 15:59 marostegui: Update rack location on zarcillo for db1131 [[phab:T262901|T262901]]
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 15:57 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 100% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12638 and previous config saved to /var/cache/conftool/dbconfig/20200917-155708-kormat.json
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 15:44 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 75% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12637 and previous config saved to /var/cache/conftool/dbconfig/20200917-154431-kormat.json
* 12:24 moritzm: added btullis to pwstore
* 15:43 mepps: updated payments-wiki from {{Gerrit|3c073a6a56}} to {{Gerrit|1e5a52ed26}}
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 15:35 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 50% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12636 and previous config saved to /var/cache/conftool/dbconfig/20200917-153514-kormat.json
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 15:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 15:20 kormat@cumin1001: dbctl commit (dc=all): 'db2114: repool at 25% [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12635 and previous config saved to /var/cache/conftool/dbconfig/20200917-152019-kormat.json
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:17 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12634 and previous config saved to /var/cache/conftool/dbconfig/20200917-151347-marostegui.json
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12633 and previous config saved to /var/cache/conftool/dbconfig/20200917-150234-marostegui.json
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 15:02 jynus: deploying extended grants for admin account on sys/p_s at s8@codfw [[phab:T195578|T195578]]
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 15:00 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 15:00 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:19 dcausse: restart blazegraph on wdqs1013
* 14:54 kormat@cumin1001: dbctl commit (dc=all): 'db2114: depool for schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12632 and previous config saved to /var/cache/conftool/dbconfig/20200917-145451-kormat.json
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 14:49 cmjohnson1: ending pdu maintenance in eqiad
* 09:04 mutante: decom'ing mw1267
* 14:40 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:02 moritzm: installing node-hosted-git-info security updates
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12631 and previous config saved to /var/cache/conftool/dbconfig/20200917-143914-marostegui.json
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 14:32 papaul: replacing msw-d1,d2,d3,d4,d5 and d6
* 08:54 moritzm: installing golang-docker-credential-helpers security updates
* 14:31 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 14:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12630 and previous config saved to /var/cache/conftool/dbconfig/20200917-141825-marostegui.json
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:02 marostegui: Start mysql on db1125 after PDU maintenance [[phab:T261459|T261459]]
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2125 [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12629 and previous config saved to /var/cache/conftool/dbconfig/20200917-140014-marostegui.json
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:33 jayme: ran ipvsadm -D -t 10.2.2.14:8888 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 08:03 moritzm: installing ipmitool security updates
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 13:33 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 13:32 jayme: ran ipvsadm -D -t 10.2.2.31:8748 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.31:8748 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 13:32 jayme: ran ipvsadm -D -t 10.2.1.14:8888 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 13:25 kormat@cumin1001: dbctl commit (dc=all): 'Start depooling db2114 [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12628 and previous config saved to /var/cache/conftool/dbconfig/20200917-132513-kormat.json
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 13:20 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 13:19 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 13:17 marostegui: Stop MySQL on db2125 for on-site maintenance [[phab:T260670|T260670]]
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 13:14 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 13:13 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.9
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])
* 12:18 cmjohnson1: pdu swap maintenance beginning now for racks D1, D2 and C1 eqiad
* 11:24 matthiasmullie: End Euro B&C
* 11:24 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/NavigationTiming/: Account for empty layout shift sources array (duration: 01m 05s)
* 11:22 mlitn@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/WikimediaEvents/: Disable MediaSearch A/B test (duration: 01m 08s)
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12627 and previous config saved to /var/cache/conftool/dbconfig/20200917-111028-marostegui.json
* 11:06 vgutierrez: update to acme-chief 0.29 on acmechief[12]001 - [[phab:T263006|T263006]]
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:04 vgutierrez: upload acme-chief 0.29 to apt.wm.o (buster) - [[phab:T263006|T263006]]
* 11:04 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:03 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=eqiad
* 10:58 marostegui: Stop mysql on db1125 for PDU maintenance, lag will appear on s2, s4, s6 and s7 on labsdb hosts [[phab:T261459|T261459]]
* 10:58 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=codfw
* 10:51 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wikifeeds,name=codfw
* 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12626 and previous config saved to /var/cache/conftool/dbconfig/20200917-104816-marostegui.json
* 10:46 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 10:40 oblivian@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=wikifeeds
* 10:34 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:27 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:22 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:20 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:18 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:17 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:14 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:49 jayme: deleting some random pods in kubernetes staging to rebalance load back on kubestage1002 - [[phab:T262527|T262527]]
* 08:43 jayme: uncordoned kubestage1002 after kernel upgrade - [[phab:T262527|T262527]]
* 08:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:37 godog: graphite compress /var/log/carbon logs older than 2d
* 08:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:25 jayme: reboot kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 08:24 godog: graphite add 300G to /srv
* 07:55 jayme: draining kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 07:55 jayme: cordoning kubestage1002 for kernel upgrade - [[phab:T262527|T262527]]
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12624 and previous config saved to /var/cache/conftool/dbconfig/20200917-070145-marostegui.json
* 06:55 hashar: Taking a heap dump of Gerrit JVM
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12623 and previous config saved to /var/cache/conftool/dbconfig/20200917-061931-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12622 and previous config saved to /var/cache/conftool/dbconfig/20200917-060312-marostegui.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12621 and previous config saved to /var/cache/conftool/dbconfig/20200917-055219-marostegui.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for on-site maintenance', diff saved to https://phabricator.wikimedia.org/P12620 and previous config saved to /var/cache/conftool/dbconfig/20200917-055158-marostegui.json
* 05:46 marostegui: Stop mysql on db1131 - [[phab:T262901|T262901]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2031 on es2 for the first time with minimal weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12619 and previous config saved to /var/cache/conftool/dbconfig/20200917-054226-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12618 and previous config saved to /var/cache/conftool/dbconfig/20200917-053503-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12617 and previous config saved to /var/cache/conftool/dbconfig/20200917-052347-marostegui.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2011 as es1 master and es2017 as es3 master and then depool es2018 and es2012 to clone es2029 and es2030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12616 and previous config saved to /var/cache/conftool/dbconfig/20200917-051741-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2015 after cloning es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12615 and previous config saved to /var/cache/conftool/dbconfig/20200917-050739-marostegui.json
* 04:53 marostegui: Deploy schema change on s1 eqiad primary master - [[phab:T238966|T238966]]
* 01:22 Krinkle: krinkle@mwmaint1002 synced docroot/noc – https://gerrit.wikimedia.org/r/620138
* 01:22 Krinkle: krinkle@mwmaint2001 synced docroot/noc – https://gerrit.wikimedia.org/r/620138


== 2020-09-16 ==
== 2021-07-01 ==
* 23:41 catrope@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 23:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs: [[phab:T262970|T262970]] (duration: 01m 06s)
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 23:37 catrope@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/GrowthExperiments/: Fix styling for mobile start module ([[phab:T258008|T258008]]); Revert wider task card on desktop ([[phab:T263042|T263042]], [[phab:T258704|T258704]]); Fix width of sidebar modules in narrow mode in variant A ([[phab:T263068|T263068]]) (duration: 01m 09s)
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 22:24 shdubsh: install prometheus-icinga-exporter 0.11 on icinga2001
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 20:10 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 20:04 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Vector search in header on testwiki and officewiki ([[phab:T262207|T262207]]) (duration: 01m 04s)
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 18:00 brennen@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/MobileFrontend: Backport: [[gerrit:627793{{!}}Check $coords matched some nodes before comparing contents (T263034)]] (duration: 01m 06s)
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 17:58 joal@deploy1001: Finished deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0] (duration: 00m 08s)
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 17:58 joal@deploy1001: Started deploy [analytics/refinery@07056b0] (thin): Regular analytics weekly train THIN [analytics/refinery@07056b0]
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 17:51 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 17:50 joal@deploy1001: Started deploy [analytics/refinery@07056b0]: Regular analytics weekly train [analytics/refinery@07056b0]
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 17:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 17:11 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 17:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 16:45 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 16:40 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 16:13 marostegui: Start mysql on db1093, db1109 and db1123 after pdu work is done
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:12 ryankemper: `wdqs` deploy complete, service is healthy
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 16:09 elukey: reinstall buster on an-tool1009 after a lot of tests (ganeti VM, so it is manual work)
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 16:00 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:58 papaul: poweroff mw2380 for disk replacement
* 15:58 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 15:49 ryankemper: sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'; sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 15:49 ryankemper: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:48 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@b7e2d0b]: 0.3.48 (duration: 14m 40s)
* 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:627871{{!}}Rename wmgWikibaseClientLocalEntitySourceName to wmgWikibaseClientItemAndPropertySourceName on Beta (T258060)]] (production no-op) (duration: 01m 04s)
* 14:45 moritzm: installing glib2.0 security updates on buster
* 15:35 ryankemper: Canary `wdqs1003` query tests look good, proceeding to wdqs deploy for rest of fleet
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
* 15:33 ryankemper@deploy1001: Started deploy [wdqs/wdqs@b7e2d0b]: 0.3.48
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
* 15:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622994{{!}}Remove `wmgWikibaseClientLocalEntitySourceName` from InitialiseSettings.php (T258060)]] (duration: 01m 05s)
* 13:03 marostegui: Deploy schema change on s2 eqiad master [[phab:T276150|T276150]]
* 15:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:622993{{!}}Use `wmgWikibaseClientItemAndPropertySourceName` instead of `wmgWikibaseClientLocalEntitySourceName` in Wikibase.php (T258060)]] (duration: 01m 02s)
* 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
* 15:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:622612{{!}}Add `wmgWikibaseClientItemAndPropertySourceName` to InitialiseSettings.php (T258060)]] (duration: 01m 06s)
* 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
* 14:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:41 volans: uploaded spicerack_0.0.43 to apt.wikimedia.org buster-wikimedia
* 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
* 14:39 cmjohnson1: pdu swap rack d7-eqiad, missed this in earlier log entry
* 12:23 tgr: EU deploys done
* 14:34 jiji@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
* 14:02 Urbanecm: Change email address of User:Oversight@enwiki to oversight-en-wp@wikipedia.org as OTRS is back up ([[phab:T262733|T262733]])
* 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
* 13:48 marostegui: Start mysql on db1121 after PDU work
* 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
* 13:46 James_F: Restarting CI Jenkins for [[phab:T262827|T262827]]
* 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
* 13:08 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2256.codfw.wmnet
* 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
* 13:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.9
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400{{!}}Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
* 12:58 elukey: upload hue_4.7.1-1+deb10u1 to buster-wikimedia
* 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
* 12:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 12:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 11:35 marostegui: Deploy schema change on s8 eqiad master [[phab:T276150|T276150]]
* 12:49 cmjohnson1: start pdu swap in racks c6 and c7, d8
* 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 12:36 moritzm: powercycling mw2256 (went down with overheated CPU)
* 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
* 12:29 moritzm: restarting exim on MXes to pick up GNUTLS update
* 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:29 moritzm: restarting slapd on LDAP replicas to pick up GNUTLS update
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851{{!}}Avoid using MWNamespace]] (duration: 01m 06s)
* 11:18 moritzm: installing gnutls28 security updates on remaining stretch hosts
* 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:12 jforrester@deploy1001: Synchronized php-1.36.0-wmf.9/includes/filerepo/file: [[phab:T263014|T263014]] Revert "Remove support for (Archived{{!}}OldLocal)File::userCan without a user" (duration: 01m 04s)
* 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool es2027 and es2028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12606 and previous config saved to /var/cache/conftool/dbconfig/20200916-103324-marostegui.json
* 10:05 moritzm: installing remaining libgcrypt20 security updates
* 10:20 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.9
* 09:56 moritzm: installing remaining gnutls28 security updates
* 10:14 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.9 (duration: 46m 07s)
* 09:55 Amir1: start of clean up of autoreview logs in ruwiki ([[phab:T285608|T285608]])
* 10:10 ema: upload python-acme 0.31.0-2wm1 to buster-wikimedia [[phab:T263006|T263006]]
* 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12605 and previous config saved to /var/cache/conftool/dbconfig/20200916-100548-marostegui.json
* 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:01 akosiaris: [[phab:T187984|T187984]] Shutdown mendelevium.
* 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:43 jynus: deploying max_packet_size change to m3 instances, too
* 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:28 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.9
* 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:26 liw: moving train 1.36.0-wmf.9 to testwikis
* 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master [[phab:T277123|T277123]]
* 09:22 jynus: restarting gerrit service on gerrit1001, unresponsive
* 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master [[phab:T277123|T277123]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12603 and previous config saved to /var/cache/conftool/dbconfig/20200916-091535-marostegui.json
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
* 09:13 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 0 - [[phab:T262290|T262290]]
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
* 09:08 XioNoX: fasw-c-eqiad> request system snapshot slice alternate member 1 - [[phab:T262290|T262290]]
* 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
* 08:52 marostegui: Stop mysql on db1121, db1123, db1093 and db1109 for PDU work [[phab:T261454|T261454]] [[phab:T261457|T261457]]
* 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
* 08:52 XioNoX: asw-d-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
* 08:50 jynus: deploy new max_allowed_packet configuration to m1, m2 and m5 dbs
* 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12601 and previous config saved to /var/cache/conftool/dbconfig/20200916-084916-marostegui.json
* 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master [[phab:T277123|T277123]]
* 08:42 awight: finished security backport for https://phabricator.wikimedia.org/T262628
* 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) masters [[phab:T277123|T277123]]
* 08:41 awight@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FileImporter/src/Services/ImportPlanValidator.php: Security patch for [[phab:T262628|T262628]] (duration: 00m 59s)
* 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters [[phab:T277123|T277123]]
* 08:41 XioNoX: asw-c-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) [[phab:T277123|T277123]]
* 08:27 XioNoX: asw-b-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) [[phab:T277123|T277123]]
* 08:26 awight: beginning security backport for https://phabricator.wikimedia.org/T262628
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
* 08:17 XioNoX: asw-a-codfw> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
* 08:04 akosiaris: [[phab:T187984|T187984]] Validated that ticket.wikimedia.org works, proceeding with a wider announcement
* 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8
* 08:02 XioNoX: asw2-d-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 07:49 akosiaris: [[phab:T187984|T187984]] Switch over ticket.discovery.wmnet to otrs1001
* 07:48 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:44 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 07:40 XioNoX: asw2-c-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 07:37 akosiaris: [[phab:T187984|T187984]] Tested inbound email successfully
* 07:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:26 akosiaris: [[phab:T187984|T187984]] Tested outbound email, switching inbound email configuration and performing tests
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 with more weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12600 and previous config saved to /var/cache/conftool/dbconfig/20200916-072614-marostegui.json
* 07:22 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 07:22 jayme@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.decommission
* 07:12 akosiaris: [[phab:T187984|T187984]] Disable gravatar in system configuration to avoid leaking agent PII through a 3rd party service
* 07:03 akosiaris: [[phab:T187984|T187984]] validated that the OTRS installation is functional over SSH
* 07:02 akosiaris: [[phab:T187984|T187984]] migration script done. Config updates, rebuilds, package upgrades/reinstall and index rebuilds done
* 06:28 godog: codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 06:20 kart_: Updated cxserver to 2020-08-30-011854-production ([[phab:T253439|T253439]], [[phab:T260557|T260557]])
* 06:20 XioNoX: asw2-b-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 06:15 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:11 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es2027 and es2028 for the first time with minimum weight [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12599 and previous config saved to /var/cache/conftool/dbconfig/20200916-061013-marostegui.json
* 06:08 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12598 and previous config saved to /var/cache/conftool/dbconfig/20200916-060717-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 to clone es2031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12597 and previous config saved to /var/cache/conftool/dbconfig/20200916-055535-marostegui.json
* 05:53 XioNoX: asw2-a-eqiad> request system snapshot slice alternate all-members - [[phab:T262290|T262290]]
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12596 and previous config saved to /var/cache/conftool/dbconfig/20200916-055108-marostegui.json
* 05:50 XioNoX: msw1-codfw> request system snapshot slice alternate - [[phab:T262290|T262290]]
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Add es2027 and es2028 to dbctl [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12595 and previous config saved to /var/cache/conftool/dbconfig/20200916-053918-marostegui.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12594 and previous config saved to /var/cache/conftool/dbconfig/20200916-053507-marostegui.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 into vslow', diff saved to https://phabricator.wikimedia.org/P12593 and previous config saved to /var/cache/conftool/dbconfig/20200916-052343-marostegui.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es2011 and es2017 after cloning es2027 and es2028', diff saved to https://phabricator.wikimedia.org/P12592 and previous config saved to /var/cache/conftool/dbconfig/20200916-052241-marostegui.json
* 05:07 marostegui: Repool labsdb1010
* 02:22 mutante: deneb - sudo systemctl start package_builder_Clean_up_build_directory to fix icinga alert after failed build attempts


== 2020-09-15 ==
== 2021-06-30 ==
* 23:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.9/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|1c0b0d161fe1024d6d08a27bbacf5b62c56c9c01}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 56s)
* 23:28 urbanecm: Evening B&C window finished
* 23:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.8/extensions/FlaggedRevs/backend/FlaggedRevsHooks.php: {{Gerrit|5beace32a396adfcce46b04e7f969b2f9f9effda}}: Fix APCOND_FR_NEVERBLOCKED handling ([[phab:T262970|T262970]]) (duration: 00m 58s)
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|667d88054097b195208818aee15bb1eb58955437}}: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
* 23:14 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: {{Gerrit|ac8bd3894f2dc8f2735cc9fa7b860af1d91c6707}}: flaggedrevs: Remove non-existent config options (duration: 00m 58s)
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
* 23:07 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
* 23:00 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|62b21d55a8f0a94b8cd268d5024df0cf64013dd5}}: Revert "Remove abusefilter-view right grant from wmf-config" ([[phab:T255506|T255506]]) (duration: 00m 59s)
* 21:43 Amir1: deleting auto-review logs from test2wiki ([[phab:T285608|T285608]])
* 20:44 brennen: removing extraneous recursive symlink /srv/mediawiki-staging/php-1.36.0-wmf.9/php-1.36.0-wmf.8
* 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 18:32 Urbanecm: Morning B&C done
* 21:29 cstone: civicrm revision changed from {{Gerrit|789c92d13b}} to {{Gerrit|e07c2be1a7}}
* 18:28 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|084729b7fd0716f11265f1b37570afc120b27109}}: Remove abusefilter-view right grant from wmf-config ([[phab:T255506|T255506]]) (duration: 00m 56s)
* 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1d3456570b80b1d8af1d2b71975496e54f87b24e}}: Enable MediaWiki client errors on frwiki ([[phab:T255585|T255585]]) (duration: 00m 57s)
* 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|79004b7e503c7274fa56d2699b423b6919fbc869}}: Enable the reverted tag on all wikis ([[phab:T164307|T164307]]) (duration: 00m 56s)
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 17:59 krinkle@deploy1001: Synchronized src/ServiceConfig.php: {{Gerrit|If727ae4335}} (duration: 00m 56s)
* 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
* 17:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out (duration: 37m 42s)
* 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
* 17:05 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint, feeds time out
* 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
* 17:05 ppchelko@deploy1001: Finished deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint (duration: 86m 46s)
* 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
* 17:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:57 thcipriani: restart ci jenkins following upgrade
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 16:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci [[phab:T285532|T285532]]
* 15:38 ppchelko@deploy1001: Started deploy [restbase/deploy@f7cda70]: Fix analytics by-country endpoint
* 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # [[phab:T285866|T285866]]
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
* 15:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 20s)
* 15:30 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 16s)
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 15:28 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource ([[phab:T284389|T284389]])
* 15:26 shdubsh: manual install prometheus-icinga-exporter upgrade on icinga2001
* 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 14:53 godog: switch grafana to eqiad - [[phab:T259143|T259143]]
* 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 14s)
* 14:48 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)