You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(252 intermediate revisions by 3 users not shown)
== 2020-10-16 ==
== 2021-08-03 ==
* 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-10-15 ==
== 2021-08-02 ==
* 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: {{Gerrit|I245e84e0b8c}} (duration: 01m 10s)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* ([[phab:T265558|T265558]])
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 ([[phab:T263179|T263179]])
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 29s)
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 51s)
* 19:00 urbanecm: Morning B&C window completed
* 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector ([[phab:T264339|T264339]]) (duration: 01m 51s)
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools ([[phab:T264339|T264339]]) (duration: 01m 43s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" ([[phab:T256173|T256173]]) (duration: 01m 58s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong ([[phab:T265560|T265560]]) (duration: 02m 07s)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) [[phab:T265558|T265558]]
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: {{Gerrit|fd94002cf6070180a289296ec65ad224e5a0ae67}}: Revert "Validate username input before constructing subpage links" ([[phab:T265606|T265606]]) (duration: 02m 48s)
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 12:20 mutante: gerrit servers: disabling puppet
* 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 11:27 hashar: restarting Jenkins on contint2001
* 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
* 11:27 hashar: restarting Jenkins on contint1001
* 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 [[phab:T264074|T264074]]
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - [[phab:T265579|T265579]]
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 [[phab:T264074|T264074]]
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T264074|T264074]]
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T26407|T26407]]
* 11:13 urbanecm: EU B&C window completed
* 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
* 11:08 moritzm: installing openjdk-11 security updates
* 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 10:47 vgutierrez: restart ats-backend on cp3050
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 10:00 akosiaris: [[phab:T264209|T264209]]. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
* 07:24 moritzm: installing libsndfile security updates on buster
* 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:12 moritzm: installing aspell security updates
* 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
* 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
* 00:36 eileen: tools revision changed from {{Gerrit|d4e08c52de}} to {{Gerrit|a2a91d6c6a}}
* 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 00:24 twentyafterfour: phabricator update was uneventful
* 00:13 twentyafterfour: updating phabricator


== 2020-10-14 ==
== 2021-07-31 ==
* 23:35 foks: Removing one further file for legal compliance
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:28 foks: Removing nine files for legal compliance
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
* 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
* 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
* 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
* 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
* 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
* 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
* 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: [[gerrit:634002{{!}}Make attribution source logic more defensive]] [[phab:T263599|T263599]] (duration: 01m 05s)
* 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 ([[phab:T123582|T123582]]) (duration: 01m 03s)
* 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: [[gerrit:634086{{!}}Stylesheet needs to be compatible with cached HTML]] [[phab:T265543|T265543]] (duration: 01m 07s)
* 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: [[phab:T263179|T263179]])
* 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
* 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
* 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
* 19:14 mutante: depooling 5 of the older parsoid servers in codfw
* 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
* 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # [[phab:T265347|T265347]]
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6a56bb7fb762c53db5965f2698a93db2433d33d}}: Add rollbacker right on uzwiki ([[phab:T265509|T265509]]) (duration: 01m 04s)
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0da89998e4e380f3ebe527a42a47dc66c49ee4d2}}: Add spamblacklistlog as a default right for the CU log user ([[phab:T239288|T239288]]) (duration: 01m 05s)
* 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - [[phab:T255138|T255138]]
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - [[phab:T255138|T255138]]
* 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 15:24 jayme: enabled and ran puppet on deploy1001 - [[phab:T260917|T260917]]
* 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - [[phab:T255138|T255138]]
* 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - [[phab:T260917|T260917]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T264209|T264209]]
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T265183|T265183]]
* 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
* 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
* 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - [[phab:T258405|T258405]]
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:43 moritzm: imported php-memcached, php-redis to component/icu63 [[phab:T264991|T264991]]
* 11:25 Urbanecm: EU B&C window completed
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c63632de6a20b2f00da91187e5cf416fd39d8c5b}}: Enable DiscussionTools as a beta feature on 30 more wikis ([[phab:T264693|T264693]]) (duration: 01m 15s)
* 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 [[phab:T264991|T264991]]
* 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 [[phab:T264991|T264991]]
* 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
* 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks [[phab:T265445|T265445]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json


== 2020-10-13 ==
== 2021-07-30 ==
* 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A ([[phab:T265372|T265372]]) (duration: 01m 04s)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki ([[phab:T265214|T265214]]) (duration: 01m 04s)
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer ([[phab:T260582|T260582]]) (duration: 01m 04s)
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki ([[phab:T264780|T264780]]) (duration: 01m 04s)
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and it was gone at the next check
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors ([[phab:T263177|T263177]]). promoting to all wikis
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw [[phab:T238036|T238036]]
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 18:01 robh: scs-c1-codfw firmware update via [[phab:T238036|T238036]]
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 17:47 marxarelli: 1.36.0-wmf.13 branched at {{Gerrit|a6be801fc6331a6a6b96f02f368750200d50ab09}} for [[phab:T263179|T263179]]
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors ([[phab:T263177|T263177]]). preparing to promote to group1
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 15:56 papaul: power down ms-be2036 for maintenance
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 15:02 godog: bounce logstash on logstash1007, GC death
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b28fd685b9cb8d8e93650b5d02bc41b81d0883c}}: Add setmentor to wgAvailableRights (duration: 00m 59s)
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # [[phab:T265336|T265336]]
* 11:23 moritzm: installing libsndfile security updates on stretch
* 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 [[phab:T264991|T264991]]
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # [[phab:T265336|T265336]]
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # [[phab:T265336|T265336]]
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance [[phab:T263837|T263837]] ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 12:20 moritzm: imported dh-php, php-apcu, php-imagick to component/icu63 [[phab:T264991|T264991]]
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 [[phab:T264991|T264991]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90028b4c3c1cd4407e0834d603ccb8b256f2498e}}: Add suppressredirect right to reviewers on bnwiki ([[phab:T265169|T265169]]) (duration: 00m 58s)
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # [[phab:T265336|T265336]]`
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001; need to wait for a long-running cookbook to end before upgrading both hosts
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e61fcebe7315f73d1fb4d531da37d2c1253115ee}}: Add namespace aliases for Turkish Wikipedia ([[phab:T265336|T265336]]) (duration: 00m 59s)
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 10:47 jayme: no-change rolling restart of push-notifications in codfw - [[phab:T265258|T265258]]
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap [[phab:T264074|T264074]]
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 10:09 ema: cp3050: *reload* varnishkafka-webrequest [[phab:T264074|T264074]]
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
* 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 09:39 kormat: running schema change against s1 in eqiad [[phab:T259831|T259831]]
* 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest [[phab:T264074|T264074]]
* 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:43 kormat: running schema change against s3 in eqiad [[phab:T259831|T259831]]
* 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 07:37 moritzm: installing ruby security updates on stretch
* 07:02 moritzm: installing PHP 7.0 security updates
* 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
* 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 [[phab:T263443|T263443]]


== 2020-10-12 ==
== 2021-07-29 ==
* 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - [[phab:T265208|T265208]]
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 15:41 godog: roll-restart logstash5 in codfw
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 [[phab:T264991|T264991]]
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 12:39 moritzm: installing rails security updates on Stretch
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 12:26 moritzm: installing spice security updates on Buster
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 11:38 Urbanecm: EU B&C done
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fff2532424f84970962f7de1e35d4250b83cb3da}}: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4966e8a6b8ae4e6d5623dd35e65ed8fcf3338bc1}}: Enable wgCheckUserLogLogins at all wikis but few large wikis ([[phab:T253802|T253802]]) (duration: 00m 58s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631809{{!}}Require autoconfirmed status to edit Wikidata Properties (T254280)]] (duration: 01m 00s)
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 [[phab:T264991|T264991]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 07:54 godog: reboot ms-be2036 - [[phab:T265208|T265208]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing MariaDB 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== 2020-10-10 ==
== 2021-07-28 ==
* 01:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633281{{!}}Enable session-ip log channel everywhere (T264799)]] (duration: 00m 59s)
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 00:54 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633277{{!}}Enable session-ip log channel on all but enwiki (T264799)]] (duration: 01m 01s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 00:18 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633276{{!}}Enable session-ip log channel on eswiki (T264799)]] (duration: 00m 55s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 00:13 mutante: built prometheus-nutcracker-exporter for buster and imported on apt1001 (0.2+nmu1)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-10-09 ==
== 2021-07-27 ==
* 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633274{{!}}Enable session-ip log channel on Wikidata (T264799)]] (duration: 00m 59s)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633272{{!}}Enable session-ip log channel on Commons (T264799)]] (duration: 00m 59s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role ([[phab:T260271|T260271]])
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633271{{!}}Enable session-ip log channel on group1, except Commons/Wikidata (T264799)]] (duration: 00m 57s)
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 04s)
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633210{{!}}Enable session-ip log channel on group0 (T264799)]] (duration: 00m 59s)
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 06s)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 22:01 tgr_: rolling out [[phab:T264799|T264799]]#6533622
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # [[phab:T263935|T263935]]
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 20:04 dwisehaupt: upgrading payments1001 to buster
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 19:14 dwisehaupt: upgrading payments1002 to buster
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 18:30 dwisehaupt: upgrading payments1003 to buster
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 17:53 dwisehaupt: upgrading payments1004 to buster
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 17:52 cstone: civicrm revision changed from {{Gerrit|b86a15a430}} to {{Gerrit|585eb835d8}}, config revision is {{Gerrit|57843925bb}}
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 13:45 jayme: helm rollback push-notification in eqiad to revision 8
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 09:07 XioNoX: remove user from all network devices
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:36 moritzm: installing xen security updates for buster (libs only)
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 moritzm: installing aspell security updates
* 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json
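The db2147 entries above show a staged repool: seven dbctl commits at 5%, 10%, 15%, 25%, 50%, 75% and 100%, logged roughly 15 minutes apart between 08:40 and 10:10. The following is only a sketch of that schedule, reconstructed from the timestamps in this log. The function name <code>repool_schedule</code> and the fixed 15-minute interval are assumptions made for illustration; this is not how dbctl or any repooling tool is implemented.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Percentages taken from the db2147 log entries above.
STEPS = [5, 10, 15, 25, 50, 75, 100]


def repool_schedule(
    start: datetime, steps: List[int], interval: timedelta
) -> List[Tuple[datetime, int]]:
    """Pair each pooling percentage with the time it would be applied.

    Hypothetical helper: one step per interval, starting at `start`.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # 08:40 is the logged time of the first (5%) commit on 2021-07-27.
    first_step = datetime(2021, 7, 27, 8, 40)
    for when, pct in repool_schedule(first_step, STEPS, timedelta(minutes=15)):
        print(f"{when:%H:%M} db2147 (re)pooling @ {pct}%")
```

Under these assumptions the last step lands at 10:10, which matches the logged 100% commit.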


== 2020-10-08 ==
== 2021-07-26 ==
* 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:37 ryankemper: `cloudelastic1005` done
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:31 ryankemper: `cloudelastic1004` done
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:27 ryankemper: `cloudelastic1003` done
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:23 ryankemper: `cloudelastic1002` done
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:16 tgr_: Evening deploys done
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632797{{!}}Enable logging of session cookie changes everywhere (T264793)]] (duration: 01m 01s)
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - [[phab:T264793|T264793]] (duration: 01m 01s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 20:43 volans: deploying Netbox DNS zone consolidation - [[phab:T264273|T264273]]
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide:  (duration: 00m 06s)
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632908{{!}}Enable Special:Investigate by default on production (T264357)]] (duration: 01m 06s)
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
* 06:39 moritzm: installing krb5 security updates
* 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - [[phab:T210137|T210137]]
* 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
* 16:19 hashar: Restarting CI Jenkins
* 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 [[phab:T263443|T263443]]
* 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 [[phab:T264991|T264991]]
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:29 kart_: Updated cxserver to 2020-10-08-053343-production ([[phab:T264407|T264407]], [[phab:T264859|T264859]])
* 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
* 10:37 moritzm: installing Postgres security updates on netboxdb1001
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
* 10:32 moritzm: installing Postgres security updates on netboxdb2001
* 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
* 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 godog: roll-restart swift-object-replicator on ms-be2* - [[phab:T261633|T261633]]
* 08:19 kormat: running schema change against s8 in eqiad [[phab:T259831|T259831]]
* 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 gehel: repooling wdqs2002
* 07:55 marostegui: Rebuild db2125 from snapshots - [[phab:T260670|T260670]]
* 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
* 07:40 gehel: depooled wdqs2002 to catch up on lag
* 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
* 07:23 moritzm: installing pyzmq updates from Buster point release
* 07:00 dcausse: depooling wdqs2002 (catching-up lag)
* 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) [[phab:T242453|T242453]]
* 06:51 _joe_: enable notifications for wdqs-ssl-codfw
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 04:05 ejegg: updated fundraising python tools from {{Gerrit|5515923ef7}} to {{Gerrit|d4e08c52de}}
* 00:31 tgr_: evening deploys done
* 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (again, forgot to rebase the previous time) (duration: 00m 59s)
* 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (duration: 00m 57s)
* 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632795{{!}}Enable logging of session cookie changes in group0 (T264793)]] (duration: 00m 58s)
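The 23:04 entry above describes the per-server restart procedure for <code>cloudelastic</code>: depool, restart the elasticsearch services, wait for them to come back and pool, then wait for the cluster to return to green before moving on to the next server. A minimal sketch of that loop follows. The function name <code>rolling_restart</code> and all four callbacks are hypothetical placeholders, not the actual commands or cookbook that were run.

```python
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    depool: Callable[[str], None],
    restart: Callable[[str], None],
    pool: Callable[[str], None],
    wait_until_green: Callable[[], None],
) -> List[str]:
    """Restart hosts strictly one at a time; return the hosts completed, in order.

    All callbacks are hypothetical stand-ins supplied by the caller.
    """
    done: List[str] = []
    for host in hosts:
        depool(host)        # take the host out of rotation
        restart(host)       # restart services and wait for them to come back
        pool(host)          # put the host back in rotation
        wait_until_green()  # block until the cluster reports green again
        done.append(host)
    return done


if __name__ == "__main__":
    # Dry run with print-only stand-ins; no real host is touched.
    rolling_restart(
        [f"cloudelastic100{i}" for i in range(1, 7)],
        depool=lambda h: print(f"depool {h}"),
        restart=lambda h: print(f"restart elasticsearch on {h}"),
        pool=lambda h: print(f"pool {h}"),
        wait_until_green=lambda: print("cluster is green"),
    )
```

Passing the steps in as callbacks keeps the ordering logic separate from whatever tooling performs each step, so the sequence can be dry-run as shown.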


== 2020-10-07 ==
== 2021-07-24 ==
* 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: [[gerrit:632685{{!}}Log when SessionManager is emitting cookies (T264793)]] (duration: 01m 00s)
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
* 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
* 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in [[phab:T264859|T264859]]. https://en.wikipedia.org/wiki/Inari_Sami {{!}} https://iso639-3.sil.org/code/smn {{!}}
* 18:30 ryankemper: search team's backport deploy is complete
* 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]] (duration: 00m 58s)
* 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]]'`
* 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
* 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
* 18:20 ryankemper: (backport) HEAD set to {{Gerrit|834b4571f978674162fa805906e665e35ac68e27}} as expected
* 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - [[phab:T261260|T261260]] (duration: 01m 01s)
* 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680  on deployment staging area  and mw2001
* 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 jgleeson: updated civicrm from {{Gerrit|39b4f954ed}} to {{Gerrit|b86a15a430}}
* 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo [[phab:T242602|T242602]]
* 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - [[phab:T259780|T259780]]
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 ([[phab:T263986|T263986]])
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
* 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 04s)
* 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
* 11:22 Urbanecm: EU B&C window done
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f85bc3056f809910c0487fb0b0559b3de92b1992}}: Enable bot passwords at all fishbowl and private wikis ([[phab:T258356|T258356]]) (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
* 11:14 urbanecm@deploy1001: sync-file aborted: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6cdeea2c4c15780a641722157584f12febedab2a}}: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia ([[phab:T264161|T264161]]) (duration: 00m 59s)
* 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 [[phab:T263443|T263443]]
* 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
* 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 [[phab:T264755|T264755]] ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
* 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - [[phab:T264588|T264588]]
* 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
* 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl [[phab:T264700|T264700]]', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
* 07:14 marostegui: Stop MySQL es2015 for decommissioning [[phab:T264700|T264700]]
* 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 02:37 eileen: civicrm revision changed from {{Gerrit|a30da7f92a}} to {{Gerrit|39b4f954ed}}, config revision is {{Gerrit|0ca9a3a055}}
* 01:00 cdanis: repool esams; cr2-esams router upgrade complete
* 00:43 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request chassis routing-engine master switch
* 00:40 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system reboot other-routing-engine
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
* 00:26 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request chassis routing-engine master switch
* 00:22 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system reboot other-routing-engine
* 00:15 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
* 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs ([[phab:T252526|T252526]])


== 2020-10-06 ==
== 2021-07-23 ==
* 23:55 mutante: 🖧  switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* ([[phab:T252526|T252526]]) 🖧
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:53 mutante: 🖧  switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* ([[phab:T252526|T252526]]) 🖧
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:52 mutante: 🖧  switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* ([[phab:T252526|T252526]]) 🖧
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
* 16:15 effie: enable puppet on mc-gp* hosts
* 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:40 Urbanecm: Morning B&C done
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: {{Gerrit|2118d265c0f5b6c914efeba86ba7eacd30c5ee0f}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 00s)
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: {{Gerrit|d428ccbdf3be9a45139f8b8c0874c113f1732198}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 03s)
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 58s)
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 59s)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 [[phab:T264043|T264043]] (duration: 00m 59s)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 [[phab:T264637|T264637]] (duration: 00m 58s)
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 [[phab:T264637|T264637]] (duration: 00m 58s)
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 15:41 godog: centrallog* delete archived logs from old, single file, organization
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - [[phab:T263789|T263789]]
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - [[phab:T262946|T262946]]
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 14:36 hnowlan: repooling restbase2009
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 05s)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 14:08 marostegui: Reboot db1076 for kernel upgrade [[phab:T264755|T264755]]
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 14:03 marostegui: Power cycle db1076 [[phab:T264755|T264755]]
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
* 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
* 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
* 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - [[phab:T264157|T264157]]
* 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 [[phab:T263443|T263443]]
* 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
* 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
* 12:08 jbond42: deploy puppetlabs-stdlib 5.2
* 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 Urbanecm: EU B&C window done
* 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T264430|T264430]] # P12930
* 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|07c19f97c79ec20d6b1657e589acfc242dd53b09}}: arbcom_ruwiki: Set AK as alias for NS_PROJECT ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b1a4fad0f55c626e42961489062115d5f97ed6c}}: ruewiki: Add rollbacker, grantable and revokable by sysops ([[phab:T264147|T264147]]) (duration: 00m 58s)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5cc7027ba8d0ddee5c9898b80afe850603bf870e}}: Allow bureaucrats to remove sysop permissions on Commons ([[phab:T261481|T261481]]) (duration: 00m 58s)
* 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5f9721b3300c8e733d331bcbc754d31d9493f8ba}}: GrowthExperiments: Change Help Page URL for kowiki ([[phab:T254364|T254364]]) (duration: 01m 00s)
* 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
* 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:48 effie: set mw2279.codfw.wmnet as inactive [[phab:T264698|T264698]]
* 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
* 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
* 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
* 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
* 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
* 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
* 09:59 effie: enable puppet on mc20*
* 09:41 effie: enable puppet on mc10*
* 09:38 effie: disable puppet on mc*
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
* 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
* 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 [[phab:T263443|T263443]]
* 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo [[phab:T264700|T264700]] [[phab:T264386|T264386]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 [[phab:T264700|T264700]] ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl [[phab:T264386|T264386]]', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json


== 2020-10-05 ==
== 2021-07-22 ==
* 23:11 ejegg: updated payments staging from {{Gerrit|52704ffe24}} to {{Gerrit|db03677b2d}}
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 22:27 mutante: removing shinken puppet module and role
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster [[phab:T264053|T264053]]
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings [[phab:T264053|T264053]]
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings [[phab:T264053|T264053]]
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings [[phab:T264053|T264053]]
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]] (duration: 22m 23s)
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 14:25 elukey: shutdown an-master1001 for ram expansion
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]]
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 13:54 elukey: shutdown stat1005 for ram upgrade
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 12:39 moritzm: installing curl security updates on remaining hosts
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" ([[phab:T264295|T264295]]) (duration: 00m 59s)
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|be73f155001e9095697c3c21a208c63e7bf5d2d1}}: Move changetags right from users to sysop [trwiki] ([[phab:T264508|T264508]]) (duration: 00m 59s)
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd30b626e23b48146b970c72731f8f7bb1eee9e1}}: wgSkipSkins: Exclude contenttranslation skin from skin options for users ([[phab:T263093|T263093]]) (duration: 00m 59s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 [[phab:T264398|T264398]]
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 [[phab:T264398|T264398]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 10:08 moritzm: installing ldap-replica1002 [[phab:T264390|T264390]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 09:52 moritzm: installing ldap-replica1001 [[phab:T264390|T264390]]
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 09:22 moritzm: installing ldap-replica2003 [[phab:T264390|T264390]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 09:02 hnowlan: bootstrapping restbase1030-b
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 08:57 moritzm: installing ldap-replica2004 [[phab:T264390|T264390]]
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 08:23 godog: prometheus codfw/ops, add 100G to the LV
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:27 moritzm: installing libwebp security updates on stretch
* 07:46 marostegui: Stop mysql on es2017 [[phab:T264386|T264386]]
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:52 XioNoX: add static NAT to pfw3-eqiad - [[phab:T264356|T264356]]
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 [[phab:T264386|T264386]] ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-10-03 ==
== 2021-07-21 ==
* 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: {{Gerrit|840545f1d9115ea6b672cecce1762d850d8b1f54}}: Restrict flow-hide right to autoconfirmed users on zhwiki ([[phab:T264489|T264489]]) (duration: 01m 17s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 00:08 ejegg: updated fundraising CiviCRM from {{Gerrit|256adda03c}} to {{Gerrit|a30da7f92a}}
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-10-02 ==
== 2021-07-20 ==
* 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:06 rzl: enabled puppet on A:mw
* 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 18:27 effie: enable puppet on mw2271
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 17:15 mutante: submitted puppet refactoring change on maps servers
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:49 effie: disable puppet on mw2271 and briefly depool it
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:39 _joe_: restarting redis on rdb2003, instance 6380
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 15:28 hnowlan: bootstrapping restbase1030-a
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server [[phab:T261531|T261531]] {{Gerrit|4573776bd}} {{Gerrit|2fb4c20ae}} (duration: 01m 01s)
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 14:08 effie: enable puppet on mwdebug1001
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis ([[phab:T258356|T258356]])
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 12:26 effie: disable puppet on mwdebug1001
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 12:05 hnowlan: bootstrapping restbase1029-c
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 09:12 jayme: running puppet on lvs servers - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T264221|T264221]])
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 09:07 hnowlan: bootstrapping restbase1029-b cassandra
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 09:05 hashar: gerrit: running garbage collector
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 moritzm: installing systemd security updates on buster
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 Lucas_WMDE: EU config+backport window done
* 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 08:29 moritzm: installing pyzmq bugfix update from buster point release
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 08:24 moritzm: installing nginx security updates on puppetdb*
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 07:42 moritzm: installing libcommons-compress-java security updates
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 07:35 godog: swift codfw-prod bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 07:29 godog: prometheus codfw/k8s, add 50G to the LV
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:23 moritzm: installing libx11 security updates on buster
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at [[phab:T264362|T264362]]
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-10-01 ==
== 2021-07-19 ==
* 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at {{Gerrit|7ab9a74c9ebbb22ad9fb9b7c95c91b7fad8bf8c6}} for [[phab:T264365|T264365]]
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:46 brennen: gerrit1001: restarting gerrit
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 brennen: re-enabling puppet on gerrit1001
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well [[phab:T264363|T264363]]
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at {{Gerrit|796693cb7a2ee3191fcbe19769d341bd0530bd4a}} for [[phab:T264365|T264365]]
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11  refs [[phab:T263177|T263177]] (duration: 01m 06s)
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11  refs [[phab:T263177|T263177]]
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train [[phab:T264257|T264257]] [[phab:T263177|T263177]] (duration: 00m 59s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days [[phab:T264053|T264053]] (duration: 00m 59s)
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 13m 42s)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 01m 34s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to a newer version but it wasn't working; reverted to the previous one
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 16:46 volans: migrating esams DNS records to the autogenerated ones from Netbox - [[phab:T258729|T258729]]
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:19 bblack: rebooting lvs1016 to a fresh state for interface config and error counters, etc - [[phab:T264227|T264227]]
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 15:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 15:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 volans: running puppet on elastic2038 after network was restored
* 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously - [[phab:T264227|T264227]]
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 15:53 bblack: lvs1016: re-disabled puppet with ticket ref in comment, downed interface enp5s0f0 since it's flapping furiously
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:55 jayme: running ipvsadm -D -t 10.2.2.10:8081; ipvsadm -D -t 10.2.2.47:8889 on lvs1015.eqiad.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 14:55 moritzm: installing npm security updates on buster
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:53 jayme: running ipvsadm -D -t 10.2.1.10:8081; ipvsadm -D -t 10.2.1.47:8889 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:50 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:48 jayme: restarting pybal on lvs2010.codfw.wmnet - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:42 jayme: running puppet on lvs servers - [[phab:T244843|T244843]] [[phab:T255878|T255878]]
* 17:23 volans: running authdns-update to force-update authdns2001
* 14:35 Urbanecm: Create bot_passwords table at all private wikis ([[phab:T258356|T258356]])
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12886 and previous config saved to /var/cache/conftool/dbconfig/20201001-142156-kormat.json
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 14:14 andrewbogott: reimaging cloudvirt-wdqs1001 to buster
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 14:12 effie: enable puppet on mw2271
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 14:08 moritzm: installing pillow security updates
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 14:06 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12885 and previous config saved to /var/cache/conftool/dbconfig/20201001-140653-kormat.json
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 13:59 moritzm: installing nginx security updates on schema*
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 13:51 kormat@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12884 and previous config saved to /var/cache/conftool/dbconfig/20201001-135149-kormat.json
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 13:50 klausman: rebooting an-worker1096 for cluster maintenance
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 13:49 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 13:49 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 13:43 vgutierrez: use synthetic warning for 2% of ECDHE-ECDSA-AES128-SHA pageviews - [[phab:T258405|T258405]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:29 moritzm: restarting mw canaries to pick up curl update
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 13:22 moritzm: installing curl security updates on stretch
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:57 kormat@cumin1001: dbctl commit (dc=all): 'db2136 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12883 and previous config saved to /var/cache/conftool/dbconfig/20201001-125707-kormat.json
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:39 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12882 and previous config saved to /var/cache/conftool/dbconfig/20201001-123925-kormat.json
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 12:24 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12881 and previous config saved to /var/cache/conftool/dbconfig/20201001-122422-kormat.json
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 12:15 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 00m 59s)
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/TemplateFilter.php: {{Gerrit|500d0c70c84936bcdecdd0927bcbb9ff7265afa9}}: Prevent returning the full templatelinks table in TemplateFilter ([[phab:T264029|T264029]]) (duration: 01m 00s)
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 12:09 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12880 and previous config saved to /var/cache/conftool/dbconfig/20201001-120919-kormat.json
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 11:54 kormat@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12879 and previous config saved to /var/cache/conftool/dbconfig/20201001-115415-kormat.json
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 11:14 arturo: pulling packages into reprepro for buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T263284|T263284]])
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 11:09 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=kuwiktionary --fix # [[phab:T262046|T262046]]
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|58a8c8271d75ff477ce0507ac5021edcfc2f6453}}: kuwiktionary: Create Jinûvesazî namespace ([[phab:T262046|T262046]]) (duration: 01m 01s)
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 10:47 kormat@cumin1001: dbctl commit (dc=all): 'db2119 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12878 and previous config saved to /var/cache/conftool/dbconfig/20201001-104716-kormat.json
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 10:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 10:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 08:55 hnowlan: adding buster host restbase1028-b to cassandra
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 08:53 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 08:38 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 08:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2109', diff saved to https://phabricator.wikimedia.org/P12877 and previous config saved to /var/cache/conftool/dbconfig/20201001-083321-marostegui.json
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:28 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 08:27 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 15:10 godog: +100G to prometheus/ops in codfw
* 08:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 08:25 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 08:25 akosiaris@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 08:22 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 08:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 08:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2109 ', diff saved to https://phabricator.wikimedia.org/P12875 and previous config saved to /var/cache/conftool/dbconfig/20201001-081308-marostegui.json
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P12874 and previous config saved to /var/cache/conftool/dbconfig/20201001-071442-marostegui.json
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091 ', diff saved to https://phabricator.wikimedia.org/P12873 and previous config saved to /var/cache/conftool/dbconfig/20201001-071413-marostegui.json
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12872 and previous config saved to /var/cache/conftool/dbconfig/20201001-071347-marostegui.json
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P12871 and previous config saved to /var/cache/conftool/dbconfig/20201001-071321-marostegui.json
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2083', diff saved to https://phabricator.wikimedia.org/P12870 and previous config saved to /var/cache/conftool/dbconfig/20201001-071241-marostegui.json
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 07:12 elukey: restart hdfs namenodes on an-worker100[1,2] to pick up new hadoop workers settings
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2083', diff saved to https://phabricator.wikimedia.org/P12869 and previous config saved to /var/cache/conftool/dbconfig/20201001-071155-marostegui.json
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 06:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Make es2033 master of es2 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12867 and previous config saved to /var/cache/conftool/dbconfig/20201001-063104-marostegui.json
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 06:18 jayme: imported envoyproxy 1.15.1 to buster-wikimedia, stretch-wikimedia - [[phab:T264157|T264157]]
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 05:45 marostegui: Stop MySQL on es2011 [[phab:T264261|T264261]]
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2011 [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12866 and previous config saved to /var/cache/conftool/dbconfig/20201001-054335-marostegui.json
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 05:29 marostegui: Deploy schema change on s3 (testwikidatawiki) [[phab:T264109|T264109]]
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 05:19 marostegui: Repool labsdb1011
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 04:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 04:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 01:27 krinkle@deploy1001: Synchronized php-1.36.0-wmf.10/includes/parser/: {{Gerrit|Ia3357b2f593c}} (duration: 00m 58s)
* 11:40 moritzm: installing bluez security updates
* 01:12 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|1721d2aa0}} - Reject ParserCache entries from the last wmf.11 deployment (duration: 05m 13s)
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2020-09-30 ==
== 2021-07-16 ==
* 22:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:10 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:46 cdanis: depool mw2356 and mw2319
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:45 eileen: civicrm revision changed from {{Gerrit|5a53bfe6ed}} to {{Gerrit|256adda03c}}, config revision is {{Gerrit|646817a2c0}}
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 21:23 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 also
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 21:19 ejegg: updated fundraising CiviCRM from {{Gerrit|6e843649ac}} to {{Gerrit|5a53bfe6ed}}
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 21:04 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 21:00 twentyafterfour@deploy1001: scap failed: average error rate on 5/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 15:48 vgutierrez: restart pybal on lvs2010
* 20:58 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 20s)
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 20:47 mutante: temp disabling puppet on C:profile::swift::stats_reporter hosts, applying gerrit:631158 refactoring change
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 20:36 mutante: temp disabling puppet on swift::storage (swift-be) hosts, applying gerrit:631157 refactoring change
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 19:21 mutante: activating DHCP and squid on install[345]001.wikimedia.org
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 19:12 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 19:01 effie: disable puppet on mw2271 and use onhost memcached - [[phab:T263958|T263958]]
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 19:00 hoo@deploy1001: Synchronized wmf-config/: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 18:58 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "labs: Turn on termbox v2 on wikidatawiki" ([[phab:T264066|T264066]]) (duration: 00m 58s)
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:38 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on svwiki ([[phab:T257220|T257220]]) (duration: 00m 58s)
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 18:36 bblack: lvs1016 pybal diff alerts downtimed in icinga for ~48h to reduce annoying flappy alert spam, with reference to https://phabricator.wikimedia.org/T264227
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki ([[phab:T225027|T225027]]) (duration: 00m 58s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Put search in header for anons on all wikis, not just desktop-improvements wikis ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 18:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable clientError on Wikidata and all Wikipedias except enwiki ([[phab:T255585|T255585]]) (duration: 00m 58s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 18:08 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Move search in header for anons ([[phab:T263032|T263032]]) (duration: 00m 59s)
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 17:52 bblack: lvs1016: restart pybal
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 17:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 17:01 hnowlan: finished adding restbase2018-a to the cassandra cluster
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 16:37 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 16:33 cicalese@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Add beta config for API Portal/OAuth communications (duration: 00m 58s)
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 16:31 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 16:21 mutante: re-enabled puppet on install2003
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 16:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 15:28 moritzm: removed librsvg 2.40.20-3+wmf1+stretch1 from component/thumbor, superseded by 2.40.21-0+deb9u1 released via stretch-security
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 14:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 14:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 14:20 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 14:10 cmjohnson1: powering down ores100[3-9] to upgrade memory in each [[phab:T259909|T259909]]
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:05 elukey: create thirdparty/amd-rocm33 for stretch-wikimedia
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 14:03 cmjohnson1: powering down ores1002 to upgrade memory [[phab:T259909|T259909]]
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:55 cmjohnson1: powering down ores1001 to upgrade memory [[phab:T259909|T259909]]
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 13:27 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 13:12 hnowlan: started bootstrapping restbase1028-a, first buster restbase host
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 12:39 marostegui: Deploy schema change on db2080, db2081 [[phab:T264109|T264109]]
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P12858 and previous config saved to /var/cache/conftool/dbconfig/20200930-123851-marostegui.json
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P12857 and previous config saved to /var/cache/conftool/dbconfig/20200930-123824-marostegui.json
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P12856 and previous config saved to /var/cache/conftool/dbconfig/20200930-123753-marostegui.json
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080', diff saved to https://phabricator.wikimedia.org/P12855 and previous config saved to /var/cache/conftool/dbconfig/20200930-123659-marostegui.json
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 effie: enable puppet on P:mediawiki::mcrouter_wancache for 630845 - [[phab:T244340|T244340]]
* 11:21 nikerabbit@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:627744{{!}}Enable Special:TranslationStats (T263004)]] (duration: 00m 59s)
* 11:06 effie: disable puppet on P:mediawiki::mcrouter_wancache for 630845 - [[phab:T244340|T244340]]
* 10:57 moritzm: installing librsvg security updates
* 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:47 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:44 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:21 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:07 kormat: deploying schema change to s4/eqiad [[phab:T259831|T259831]]
* 10:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:59 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:50 jayme: imported envoyproxy 1.15.1 to buster-wikimedia component/envoy-future - [[phab:T264157|T264157]]
* 09:12 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 08:45 kormat: deploying schema change to s7/eqiad [[phab:T259831|T259831]]
* 08:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2016 from dbctl [[phab:T264156|T264156]]', diff saved to https://phabricator.wikimedia.org/P12853 and previous config saved to /var/cache/conftool/dbconfig/20200930-080817-marostegui.json
* 08:06 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:00 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 07:56 akosiaris: upgrade termbox to latest chart, fixing various minor prometheus-statsd-export configuration issues.
* 07:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 07:55 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1131 on s6 eqiad master [[phab:T263227|T263227]], also give weight to db1093 as new API host', diff saved to https://phabricator.wikimedia.org/P12852 and previous config saved to /var/cache/conftool/dbconfig/20200930-074417-marostegui.json
* 07:41 marostegui: Starting s6 eqiad failover from db1093 to db1131 - [[phab:T263227|T263227]]
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 [[phab:T263227|T263227]]', diff saved to https://phabricator.wikimedia.org/P12851 and previous config saved to /var/cache/conftool/dbconfig/20200930-071841-marostegui.json
* 07:05 marostegui: Stop mysql on es2016 before decommissioning [[phab:T264156|T264156]]
* 07:01 elukey@deploy1001: Finished deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2 (duration: 00m 49s)
* 07:00 elukey@deploy1001: Started deploy [analytics/superset/deploy@7bdc414]: Upgrade to 0.37.2
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2016 [[phab:T264156|T264156]]', diff saved to https://phabricator.wikimedia.org/P12850 and previous config saved to /var/cache/conftool/dbconfig/20200930-065838-marostegui.json
* 06:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 06:19 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2082', diff saved to https://phabricator.wikimedia.org/P12849 and previous config saved to /var/cache/conftool/dbconfig/20200930-061036-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P12848 and previous config saved to /var/cache/conftool/dbconfig/20200930-061005-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12847 and previous config saved to /var/cache/conftool/dbconfig/20200930-060754-marostegui.json
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318', diff saved to https://phabricator.wikimedia.org/P12846 and previous config saved to /var/cache/conftool/dbconfig/20200930-060705-marostegui.json
* 05:43 marostegui: Remove es2019 from tendril and zarcillo [[phab:T264063|T264063]]
* 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:29 marostegui: Reduce busy-time from 3600 to 1800 on labsdb1010
* 02:30 eileen: process-control config revision is {{Gerrit|646817a2c0}}
* 00:41 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:630801{{!}}Ensure variant A homepage sidebar is always at least 300px (T263905)]] (duration: 01m 01s)


== 2020-09-29 ==
== 2021-07-15 ==
* 23:35 mutante: created testvm3001.esams.wmnet to test install3001
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 23:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Echo app push on all Wikipedias ([[phab:T262936|T262936]]) (duration: 00m 59s)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 23:20 Urbanecm: Evening B&C window completed
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 23:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|68d7af9cb38de09b4cb8655f0b095b60d470fbbc}}
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
 


== 2020-09-28 ==
== 2021-07-02 ==
* 23:56 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T264053|T264053]]: Remove commonswiki from sidebar search (duration: 01m 09s)
* 22:06 foks: removing three files for legal compliance
* 23:42 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/extensions/GrowthExperiments/includes/NewcomerTasks/ConfigurationLoader/PageConfigurationLoader.php: Backport: [[gerrit:630420{{!}}Properly handle namespaces in tasktype template configuration (T264029)]] (duration: 01m 03s)
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:27 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:25 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:25 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:22 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:21 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 20:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 20:51 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 20:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 20:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 20:46 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 20:45 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 20:17 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 20:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 20:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 20:13 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 20:10 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 19:16 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 19:12 ejegg: updated staging payments-wiki from {{Gerrit|43470629cc}} to {{Gerrit|885d87a905}}
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 18:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 18:15 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 18:15 Urbanecm: Morning B&C done
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c7e08bc2bbff6aead186350726d5c1c137cca052}}: Enable search in header A/B test for logged in users ([[phab:T263032|T263032]]) (duration: 00m 58s)
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 17:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 17:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 mutante: mw2380 - rebooting
* 17:15 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 16:58 ejegg: updated payments-wiki from {{Gerrit|b2eb456ed1}} to {{Gerrit|2083498811}}
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 16:34 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:24 moritzm: added btullis to pwstore
* 16:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 16:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 16:24 cdanis@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:20 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 16:20 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 16:20 cdanis@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 16:08 hnowlan: reimaging new restbase hosts - restbase1028, restbase1029, restbase1030
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 16:08 XioNoX: push pfw policies - [[phab:T264013|T264013]]
* 09:19 dcausse: restart blazegraph on wdqs1013
* 15:51 papaul: poweroff elastic2037 for DIMM replacement
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1114 [[phab:T196487|T196487]]', diff saved to https://phabricator.wikimedia.org/P12818 and previous config saved to /var/cache/conftool/dbconfig/20200928-152635-kormat.json
* 09:04 mutante: decom'ing mw1267
* 15:25 hashar: Restarting CI Jenkins for plugins uninstallation [[phab:T260565|T260565]]
* 09:02 moritzm: installing node-hosted-git-info security updates
* 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 15:15 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:54 moritzm: installing golang-docker-credential-helpers security updates
* 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 15:13 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:12 cdanis@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:03 moritzm: installing ipmitool security updates
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 14:59 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 14:49 moritzm: installing glib-networking security updates
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 14:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 14:40 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1006.eqiad.wmnet
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 14:33 XioNoX: repool eqiad
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 14:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 14:05 moritzm: uploaded libdbi-perl 1.631-3+wmf1 for jessie-wikimedia [[phab:T259102|T259102]]
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 13:58 XioNoX: asw2-d-eqiad# run request system power-off member 4
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 13:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])
* 13:46 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1006.eqiad.wmnet
* 13:45 XioNoX: downtiming all eqiad row D hosts - [[phab:T196487|T196487]]
* 13:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:38 godog: roll restart object-replicator on ms-be2* for higher concurrency - [[phab:T261633|T261633]]
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:20 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:19 moritzm: reimaging sretest1001 to validate puppetised sources.list with a new installation [[phab:T158562|T158562]]
* 13:03 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:57 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
* 12:37 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:31 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:29 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript resetUserEmail.php --wiki=arbcom_ruwiki 'Adamant.pwn' 'adamant.pwn@hotmail.com' # [[phab:T262812|T262812]]
* 12:28 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript createAndPromote.php --wiki=arbcom_ruwiki --bureaucrat --sysop 'Adamant.pwn' <PASSWORD REDACTED> # [[phab:T262812|T262812]]
* 12:26 Urbanecm: arbcom_ruwiki is created ([[phab:T262812|T262812]])
* 12:26 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 48s)
* 12:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 12:23 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 12:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arbcom_ruwiki ([[phab:T262812|T262812]])
* 12:20 urbanecm@deploy1001: Synchronized dblists: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 57s)
* 12:19 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 57s)
* 12:17 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating arbcom_ruwiki ([[phab:T262812|T262812]]) (duration: 00m 56s)
* 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:59 kormat@cumin1001: dbctl commit (dc=all): 'db1114 depooling: prep for rack switch upgrade [[phab:T196487|T196487]]', diff saved to https://phabricator.wikimedia.org/P12815 and previous config saved to /var/cache/conftool/dbconfig/20200928-115904-kormat.json
* 11:43 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|483beb2452caead8c44dfb8e608812778033fba0}}: ContentTranslation: Do not use wikishared DB for testwiki ([[phab:T263417|T263417]]; follow-up {{Gerrit|af09303a4a155681b198ac70468494c2155868df}} also included in this sync) (duration: 00m 56s)
* 11:34 Urbanecm: EU B&C window done
* 11:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|61eac95ef62aef682039761e0f02188437cb15fb}}: Creation of patroller group on arz.wikipedia ([[phab:T262218|T262218]]) (duration: 00m 57s)
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|483beb2452caead8c44dfb8e608812778033fba0}}: ContentTranslation: Do not use wikishared DB for testwiki ([[phab:T263417|T263417]]; follow-up {{Gerrit|af09303a4a155681b198ac70468494c2155868df}} also included in this sync) (duration: 00m 57s)
* 10:45 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:630561{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:630561{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:37 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:35 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:33 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:32 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:29 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:25 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 09:48 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 09:48 ema: upload@codfw: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:29 ema: text@codfw: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 09:17 _joe_: changing the restbase public TLS certs to include restbase-async.discovery.wmnet
* 09:17 XioNoX: restart bird on dns2001 - [[phab:T262372|T262372]]
* 09:15 jynus: restart db1077 for upgrade and cleanup [[phab:T187984|T187984]]
* 09:06 XioNoX: restart bird on centrallog2001 - [[phab:T262372|T262372]]
* 09:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:00 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:56 dcausse: [[phab:T263970|T263970]]: recovering lost apifeature indices (copying eqiad indices -> codfw)
* 08:55 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:53 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:46 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:43 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:43 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:37 elukey: decommission the hadoop test cluster (analytics1028->41)
* 08:36 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:36 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 08:35 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:34 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:32 ema: text@eqiad: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12813 and previous config saved to /var/cache/conftool/dbconfig/20200928-082825-kormat.json
* 08:21 ema: upload@eqiad: rolling varnish upgrade to 6.0.6-1wm1 [[phab:T263557|T263557]]
* 08:21 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2113 from contributions/logpager/recentchanges*/watchlist [[phab:T263842|T263842]]', diff saved to https://phabricator.wikimedia.org/P12812 and previous config saved to /var/cache/conftool/dbconfig/20200928-082114-kormat.json
* 08:13 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12811 and previous config saved to /var/cache/conftool/dbconfig/20200928-081321-kormat.json
* 08:07 jayme: restarting pybal on lvs3005 for switching to conf1005 - [[phab:T196487|T196487]]
* 08:06 jayme: restarting pybal on lvs3006 for switching to conf1005 - [[phab:T196487|T196487]]
* 08:02 jayme: restarting pybal on lvs3007 for switching to conf1005 - [[phab:T196487|T196487]]
* 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:58 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12810 and previous config saved to /var/cache/conftool/dbconfig/20200928-075817-kormat.json
* 07:54 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:43 kormat@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: mobo replaced [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12809 and previous config saved to /var/cache/conftool/dbconfig/20200928-074313-kormat.json
* 07:29 _joe_: restarting pybal on the LVS primaries
* 07:24 dcausse: [[phab:T263970|T263970]]: forcing allocation of enwiki_general_1587198756 (chi@eqiad)
* 07:18 _joe_: restarting pybal on the backup LVS in eqiad, codfw to pick up the new wikifeeds endpoint
* 07:17 elukey@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0)
* 07:09 elukey@cumin1001: START - Cookbook sre.presto.roll-restart-workers
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2028 as es1 master in codfw [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P12806 and previous config saved to /var/cache/conftool/dbconfig/20200928-065938-marostegui.json
* 06:15 marostegui: Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) [[phab:T263443|T263443]]
* 05:55 marostegui: Stop MySQL on es2013 before decommissioning it [[phab:T263740|T263740]]
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2013 from dbctl [[phab:T263740|T263740]]', diff saved to https://phabricator.wikimedia.org/P12805 and previous config saved to /var/cache/conftool/dbconfig/20200928-055410-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2013 [[phab:T263740|T263740]]', diff saved to https://phabricator.wikimedia.org/P12804 and previous config saved to /var/cache/conftool/dbconfig/20200928-054846-marostegui.json
* 05:22 marostegui: Decrease labsdb1011 weight


== 2020-09-27 ==
== 2021-07-01 ==
* 06:36 elukey: powercycle analytics1048
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 14:58 papaul: poweroff mw2380 for disk replacement
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:45 moritzm: installing glib2.0 security updates on buster
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
* 13:03 marostegui: Deploy schema change on s2 eqiad master [[phab:T276150|T276150]]
* 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
* 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
* 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
* 12:23 tgr: EU deploys done
* 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
* 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
* 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
* 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
* 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400{{!}}Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
* 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
* 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 11:35 marostegui: Deploy schema change on s8 eqiad master [[phab:T276150|T276150]]
* 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
* 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851{{!}}Avoid using MWNamespace]] (duration: 01m 06s)
* 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:05 moritzm: installing remaining libgcrypt20 security updates
* 09:56 moritzm: installing remaining gnutls28 security updates
* 09:55 Amir1: start of clean up of autoreview logs in ruwiki ([[phab:T285608|T285608]])
* 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master [[phab:T277123|T277123]]
* 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master [[phab:T277123|T277123]]
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
* 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
* 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
* 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
* 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master [[phab:T277123|T277123]]
* 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) master [[phab:T277123|T277123]]
* 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters [[phab:T277123|T277123]]
* 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) [[phab:T277123|T277123]]
* 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) [[phab:T277123|T277123]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
* 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8


== 2020-09-26 ==
== 2021-06-30 ==
* 19:20 chrisalbon: sudo service uwsgi-ores restart
* 23:28 urbanecm: Evening B&C window finished
* 02:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|667d88054097b195208818aee15bb1eb58955437}}: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
* 02:04 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
* 02:04 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
* 01:56 cdanis: ❌cdanis@cumin2001.codfw.wmnet ~ 🕙🍺 sudo cumin 'A:ores and A:codfw'  'systemctl restart celery-ores-worker.service uwsgi-ores.service '
* 21:43 Amir1: deleting auto-review logs from test2wiki ([[phab:T285608|T285608]])
* 01:48 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
* 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 01:48 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 21:29 cstone: civicrm revision changed from {{Gerrit|789c92d13b}} to {{Gerrit|e07c2be1a7}}
* 01:17 cdanis: ❌cdanis@ores2001.codfw.wmnet ~ 🕤🍺 sudo systemctl restart uwsgi-ores.service
* 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 01:11 cdanis: ✔️ cdanis@ores2001.codfw.wmnet ~ 🕘🍺 sudo systemctl restart celery-ores-worker.service
* 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
* 00:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
* 00:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
* 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
* 00:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 17:57 thcipriani: restart ci jenkins following upgrade
* 00:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci [[phab:T285532|T285532]]
* 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # [[phab:T285866|T285866]]
* 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
* 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
* 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 20s)
* 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 16s)
* 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource ([[phab:T284389|T284389]])
* 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 14s)
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 13s)
* 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 13s)
* 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 15s)
* 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki ([[phab:T284885|T284885]])
* 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 12s)
* 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 14s)
* 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki ([[phab:T284450|T284450]])
* 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # [[phab:T284450|T284450]]
* 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 13s)
* 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
* 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 13:26 moritzm: installing fluidsynth security updates on stretch
* 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 13:04 mutante: switching docker-registry to nginx light variant [[phab:T164456|T164456]]
* 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 12:17 kart_: Updated cxserver to 2021-06-30-112813-production ([[phab:T284900|T284900]], [[phab:T284885|T284885]])
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:46 Lucas_WMDE: EU backport+config window done
* 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 01m 16s)
* 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
* 11:11 moritzm: installing libgcrypt security updates on buster
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701504{{!}}Stop setting Wikibase client repoConceptBaseUri (T257260)]] (duration: 01m 24s)
* 10:44 moritzm: installing gnutls security updates on buster
* 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
* 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - [[phab:T162123|T162123]]
* 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
* 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
* 08:31 godog: remove sdf1 from thanos-be1003 in swift - [[phab:T285835|T285835]]
* 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
* 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
* 07:37 filippo@cumin1001: START - Cookbook sre.