You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Labslogbot
(restbase1* upgrade to cassandra 2.1.7 completed (godog))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
Line 1: Line 1:
== June 30 ==
== 2021-08-03 ==
* 00:37 godog: restbase1* upgrade to cassandra 2.1.7 completed
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== June 29 ==
== 2021-08-02 ==
* 23:57 robh: mw2027 was offline (blank screen on serial console). mgmt powercycled
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:48 godog: start upgrading restbase1* to cassandra 2.1.7
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:41 gwicke: restarted cassandra instance on restbase1004.eqiad; log showed many small writes and clients saw timeouts
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:29 gwicke: deployed restbase 32db4ce1e1
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 logmsgbot: ori Synchronized php-1.26wmf11/includes/resourceloader: I0e5f2d3b2: resourceloader: Add timing metrics for key operations (duration: 01m 12s)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:15 logmsgbot: catrope Synchronized wmf-config/: wikitech cleanup (duration: 01m 08s)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:11 RoanKattouw: ssh: connect to host mw2027.codfw.wmnet port 22: Connection timed out
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:11 RoanKattouw: Synced wmf-config/CommonSettings.php: Remove survey access point in Popups
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:09 godog: stop ircecho on neon, icinga spam
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 22:53 gwicke: canary deploy of restbase 32db4ce1e1 on restbase1001.eqiad
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 21:30 urandom: restarting restbase1004 to apply new metrics reporting interval
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 20:19 subbu: deployed parsoid sha ea98be88
* 21:31 tzatziki: removing 1 file for legal compliance
* 18:18 logmsgbot: ori Synchronized php-1.26wmf11/includes/db/LoadBalancer.php: I0e5f2d3b2: Use APC for caching slave lag times (duration: 01m 09s)
* 21:16 tzatziki: removing 7 files for legal compliance
* 18:00 cmjohnson1: powering down ms-be1015
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:06 bblack: re-enabling puppet on caches
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:51 bblack: disabling puppet on caches temporarily ...
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/OpenStackManager: https://gerrit.wikimedia.org/r/#/c/221648/ (duration: 00m 13s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221405/ (duration: 00m 15s)
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 15:26 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221612/ (duration: 00m 12s)
* 19:00 urbanecm: Morning B&C window completed
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-2x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 14s)
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-1.5x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 15:23 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221009/ (duration: 00m 11s)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221047/ (duration: 00m 13s)
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.link.js: https://gerrit.wikimedia.org/r/#/c/221605 (duration: 00m 13s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.formatter.js: https://gerrit.wikimedia.org/r/#/c/221604/ (duration: 00m 14s)
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:34 jynus: rebooting and reinstalling db1022
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 12:06 YuviPanda: restarting rsync with new exclusions file on labstore1002 to codfw
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 12:06 YuviPanda: excluded maps, mwoffliner and video project from rsync of broken FS to speed it up
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:59 YuviPanda: interupt rsync on labstore1001 to prevent it from copying mwofflienr files
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 11:00 _joe_: shutting down etcd1003, cleaning exported resources
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 10:32 _joe_: effectively removing etcd1003 from the cluster
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 10:17 _joe_: starting removal of etcd1003 from the etcd cluster
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 08:49 _joe_: joined conf1003 to the etcd cluster
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:20 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool db1022 for reinstall (duration: 00m 12s)
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 08:12 _joe_: adding conf1002 to the etcd cluster as a member
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 akosiaris: disabling ntp everywhere expect selected hosts in anticipation for the leap second
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 04:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 29 04:51:48 UTC 2015 (duration 51m 47s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 03:08 jgage: jmxtrans filled disks on all kafka brokers, 21GB log files. removed logs and restarted services.
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-29 02:23:47+00:00
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 53s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 00:52 springle: restart eventlogging auto-purge on m4
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 00:51 springle: restart replication on dbstore2002
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 00:00 springle: pausing replication on dbstore2002
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== June 28 ==
== 2021-07-31 ==
* 23:51 logmsgbot: ori Synchronized php-1.26wmf11/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I6ffdc977e87: Parse older format of Geo cookies (duration: 00m 13s)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 04:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 28 04:30:54 UTC 2015 (duration 30m 53s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 02:20 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-28 02:20:52+00:00
* 02:17 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 56s)


== June 27 ==
== 2021-07-30 ==
* 23:30 bd808: Deleted corrupt shards on logstash1004 and logstash1005. Recovery in process
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 20:12 ori: Delegated full access to Google Webmaster Tools for myself (olivneh@).
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 04:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 27 04:58:46 UTC 2015 (duration 58m 45s)
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-27 02:23:40+00:00
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 46s)
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json


== June 26 ==
== 2021-07-29 ==
* 23:57 bd808: Logstash log ingestion working again after forcing recovery of replicas for logstash-2015.06.26; new logs were being rejected with only a primary shard available
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:54 bd808: re-enabled allocation on logstash elasticsearch cluster
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:05 bblack: restarted gitblit on antimony, AGAIN
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 22:57 mutante: restarted gitblit
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 22:43 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: Temporarily make subpages in Flow-occupied namespaces non-Flow again (duration: 00m 14s)
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 22:36 bd808: set indices.recovery.concurrent_streams to 4 on logstash ES cluster
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 22:36 godog: set indices.recovery.max_bytes_per_sec to 10mb on logstash ES cluster
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:25 godog: set indices.recovery.max_bytes_per_sec to 50mb on logstash ES cluster
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:25 jamesofur: Reset email address of User:Chwms identity verified in person at editathon
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 22:09 bd808: restarted logstash on logstash1001
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 21:10 urandom: taking xenon down to be rebootstrapped
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 20:10 bd808: Deleted 4 corrupt indices (logstash-2015.05.30 logstash-2015.05.31 logstash-2015.06.03 logstash-2015.06.06) on logstash1004
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:58 bd808: stopping elasticsearch on logstash1004 to cleanup corrupt shards
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 17:05 mutante: zirconium - manual cleanup, removing planet
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 17:04 godog: reverted cronolog puppetmaster patch, restarting apache
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:17 Krenair: Deployed patch for T103391
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 12:23 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221105/ (duration: 00m 12s)
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 12:18 _joe_: added conf1001 to the etcd cluster
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 07:57 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/Popups: T103610 (duration: 00m 11s)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 06:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 26 06:04:14 UTC 2015 (duration 4m 13s)
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 05:22 twentyafterfour: restarted apache on iridium to fix phabricator fatal
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 02:33 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-26 02:33:33+00:00
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 36s)
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 00:51 gwicke: reverted restbase1001 canary to 90817c2a
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 00:36 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/SyntaxHighlight_GeSHi (duration: 00m 11s)
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 00:16 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: T102852 (duration: 00m 12s)
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 00:15 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-2x.png: T102852 (duration: 00m 13s)
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 00:14 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-1.5x.png: T102852 (duration: 00m 12s)
* 14:11 vgutierrez: restart pybal on lvs2009
* 00:05 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/modules/pygments.wrapper.css: I5d1510dc80d6d4712ca8411 (duration: 00m 12s)
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== June 25 ==
== 2021-07-28 ==
* 23:53 mutante: planet1001 (ganeti) - signing puppet cert, initial run
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:31 mutante: apt-get upgrade on zirconium
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 23:28 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 12s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 23:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 11s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:24 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: https://gerrit.wikimedia.org/r/#/c/220997/ (duration: 00m 13s)
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 23:20 gwicke: canary update of restbase on restbase1001 to 4b961f166 (deploy d1c4d9961)
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218926/ (duration: 00m 12s)
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 23:11 logmsgbot: krenair Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/220784/ (duration: 00m 13s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 23:03 legoktm: fixed content models on lrcwiki for Module namespace
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 23:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220485/ (duration: 00m 16s)
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 22:02 logmsgbot: hoo Synchronized php-1.26wmf11/extensions/Wikidata/: Update Wikidata: Use SELECT FOR UPDATE in SqlIdGenerator (duration: 00m 20s)
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 21:29 godog: rm /var/lib/git/operations/puppet/modules/cassandra from labcontrol1001 labcontrol1002
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 21:10 godog: rm /var/lib/git/operations/puppet/modules/cassandra from rhodium
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:07 godog: rm /var/lib/git/operations/puppet/modules/cassandra from strontium and palladium
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 21:06 godog: push puppet.git after module/cassandra removal T92560
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 20:41 mutante: deleted SVN monitor from watchmouse
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 20:18 mutante: bye SVN - subversion URLs now redirect to phab or doc
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 20:08 logmsgbot: nikerabbit Finished scap: T103888 CX aliases (duration: 22m 37s)
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 19:46 logmsgbot: nikerabbit Started scap: T103888 CX aliases
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 18:09 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf11
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 17:46 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 31s)
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 17:43 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:43 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:18 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: Ieab6b1473e6ce: תיקון טעות (duration: 00m 12s)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/219599/ (duration: 00m 12s)
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 15:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/217539/ - noop for prod, labs only part (duration: 00m 12s)
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 15:56 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/217539/ (duration: 00m 13s)
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:51 logmsgbot: krenair Synchronized wmf-config/flaggedrevs.php: https://gerrit.wikimedia.org/r/#/c/203370/ (duration: 00m 12s)
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:49 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218539/ (duration: 00m 15s)
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:32 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/220068/ - noop for prod, just labs (duration: 00m 12s)
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:30 logmsgbot: krenair Synchronized commonsuploads.dblist: https://gerrit.wikimedia.org/r/#/c/220715/ (duration: 00m 12s)
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 15:24 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220747/ (duration: 00m 12s)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220408/ (duration: 00m 12s)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SemanticForms/includes/SF_AutoeditAPI.php: https://gerrit.wikimedia.org/r/#/c/220765/ (duration: 00m 12s)
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:04 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220706/ (duration: 00m 12s)
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 15:02 logmsgbot: krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/220653/ (duration: 00m 12s)
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 13:30 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2003 (but not es2004) after maintenance (duration: 00m 12s)
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 10:57 jynus: rebooting es2003 and es2004
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 10:40 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2003 and es2004 for maintenance (duration: 00m 13s)
* 13:29 moritzm: installing python2.7 security updates on stretch
* 10:09 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (duration: 00m 12s)
* 13:08 moritzm: installing python3.5 security updates on stretch
* 09:02 jynus: restarting mysqld on db1018
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:42 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool db1018 for maintenance (duration: 00m 13s)
* 11:27 moritzm: installing nginx security updates on thumbor*
* 08:33 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: I0e5f2d3b2: Wrap lines in <nowiki><pre></nowiki> and .mw-code by default (duration: 00m 12s)
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 06:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 25 06:59:13 UTC 2015 (duration 59m 12s)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 04:04 ori: restarted apache2 on palladium
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 03:11 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-25 03:11:01+00:00
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 03:04 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 19s)
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 02:40 bblack: puppet re-enabled on caches
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-25 02:37:44+00:00
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 44s)
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 02:04 bblack: disabling puppet on cp* caches for patch-testing
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 00:43 awight: update crm from bd8a00196071ddd04efbff7b30567dd9357c9000 to e923225e423948bd70440e2d1131460b10cefac1
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 00:38 godog: upgrade cassandra to 2.1.7 on restbase1008
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 00:30 twentyafterfour: phabricator upgrade completed
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 00:28 godog: upgrade cassandra to 2.1.7 on restbase1004
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 00:12 legoktm: <twentyafterfour> Phabricator upgrade happening now. Will be down for a few minutes.
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== June 24 ==
== 2021-07-27 ==
* 23:18 logmsgbot: rmoen Synchronized wmf-config/mobile.php: Enable browse experiment on test and enwiki (duration: 00m 14s)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:17 logmsgbot: rmoen Synchronized wmf-config/InitialiseSettings.php: Enable browse experiment on test and enwiki (duration: 00m 12s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:13 urandom: rolling restart of Cassandra staging cluster
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 23:04 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/CentralAuth: https://gerrit.wikimedia.org/r/#/c/220637/ (duration: 00m 13s)
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 23:03 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/UserMerge: https://gerrit.wikimedia.org/r/#/c/220638/ (duration: 00m 13s)
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 22:32 mutante: zirconium - stop using 443 at all, rm NameVirtualHost *:443
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 22:30 mutante: zirconium - deleting unused apache configs, bugzilla, etherpad, ...
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 21:09 godog: start cassandra on restbase1008
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 18:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf11
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:02 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/Flow/includes/Specials/SpecialEnableFlow.php: https://gerrit.wikimedia.org/r/#/c/220514/ (duration: 00m 15s)
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 17:24 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool es2001 and es2002 after maintenance (duration: 00m 13s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:05 thcipriani: scap completed with the exception of snapshot1001 that's disk is full
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:04 logmsgbot: thcipriani scap failed: OSError [Errno 2] No such file or directory: '/var/lock/scap' (duration: 41m 33s)
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:22 logmsgbot: thcipriani Started scap: SWAT: Automatically add to shell group when adding to a project [[gerrit:220468]]
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 16:10 logmsgbot: ori Synchronized php-1.26wmf11/includes/page/Article.php: I0e5f2d3b2: Revert r47388 / 8d9243cf3: Use Title::getLocalURL() for rel=canonical links (duration: 00m 13s)
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 15:57 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Revert Enable browse prototype on test- and enwiki (duration: 00m 15s)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 15:49 jynus: rebooting es2001 and es2002
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 15:44 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Enable browse prototype on test- and enwiki [[gerrit:219451]] (duration: 00m 12s)
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 15:24 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ContentTranslation in testwiki [[gerrit:220385]] (duration: 00m 12s)
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 15:17 logmsgbot: thcipriani Synchronized php-1.26wmf11/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 12s)
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 15:14 andrewbogott: disabled puppet on labcontrol1001 to hotfix https://gerrit.wikimedia.org/r/#/c/220476/
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 15:08 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 13s)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 14:53 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2001 and es 2002 for maintenance (duration: 00m 13s)
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 14:12 logmsgbot: krenair Synchronized php-1.26wmf10/extensions/SemanticForms/includes/SF_AutoeditAPI.php: T103653 live hack (duration: 00m 13s)
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 10:44 _joe_: restarting jmxtrans on analytics1021
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 10:31 jgage: restarting kafka on analytics1021
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 10:10 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Switchover master es1008 -> es1009 (duration: 00m 12s)
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 09:24 hashar: removing java 6 from gallium and lanthanum https://phabricator.wikimedia.org/T103491
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:17 hashar: apt-get upgrade on gallium and lanthanum
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 09:16 jynus: performing a master failover of es1008 into es1009
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 08:27 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1004 (duration: 00m 14s)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 05:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 24 05:46:32 UTC 2015 (duration 46m 31s)
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 05:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1045 (duration: 00m 13s)
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 05:03 jgage: removed old logs and did 'apt-get clean' on analytics1021 to make space
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 03:00 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-24 03:00:45+00:00
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 34s)
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-24 02:28:16+00:00
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 21s)
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 01:39 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2 (duration: 00m 13s)
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 01:01 gwicke: rolling restart of cassandra instances to rule out a single node in funky state causing elevated p99 latency
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 00:43 ori: experimenting with httpd on mw1041 again
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 00:19 gwicke: rolling restart of restbase instances to rule out backend connections as a source for high p99 latencies
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 00:14 ori: experimenting with HHVM shutdown via /stop on the admin server on mw1041
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== June 23 ==
== 2021-07-26 ==
* 23:38 logmsgbot: ori Finished scap: scapping to all apaches for --restart test (duration: 07m 03s)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:30 logmsgbot: ori Started scap: scapping to all apaches for --restart test
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:24 bblack: nginxes all updated for ssl stapling bugfix
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:24 logmsgbot: ori Finished scap: scapping to scap-test dsh group for --restart test (duration: 06m 02s)
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:18 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:16 logmsgbot: ori scap aborted: scapping to scap-test dsh group for --restart test (duration: 00m 06s)
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:16 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 22:14 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: RejectParserCacheValue may pass a WikiPage or Article (duration: 00m 13s)
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 22:07 mutante: tmp. disabling puppet on mw1033
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 21:53 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: (no message) (duration: 00m 15s)
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 21:50 logmsgbot: ori Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 12s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 21:40 mutante: starting instance planet1001 on ganeti1003 - cant get console
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 21:40 logmsgbot: legoktm Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 13s)
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:36 bd808: updated scap to 33f3002 (Ensure that the minimum batch size used by cluster_ssh is 1)
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:34 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: 3c8bb2c493: Update SyntaxHighlight_GeSHi for cherry-pick (duration: 00m 13s)
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 20:32 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf11
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 20:19 logmsgbot: mattflaschen Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change to add Flow_test to enwiki (duration: 00m 11s)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 19:59 logmsgbot: ori scap failed: OSError [Errno 10] No child processes (duration: 01m 46s)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 19:58 logmsgbot: ori Started scap: (no message)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 19:52 ori: updated scap to master
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 19:11 ori: running apache graceful-stop on mw1042 to test mod_status behavior during graceful stop
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 19:02 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed) (duration: 03m 50s)
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 18:58 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed)
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:53 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 (duration: 26m 37s)
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:31 godog: start rolling-downgrade of cassandra to 2.1.3 T102015
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 18:27 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 18:13 logmsgbot: ori Finished scap: (no message) (duration: 04m 34s)
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 18:11 paravoid: reloading nginx on all cp* for reuseport
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 18:08 logmsgbot: ori Started scap: (no message)
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 17:57 ori: repooled scap-test servers (mw1170-mw1175 and mw1270-mw1275)
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 17:16 logmsgbot: ori Finished scap: (no message) (duration: 01m 42s)
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 17:14 logmsgbot: ori Started scap: (no message)
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 17:10 logmsgbot: ori Finished scap: (no message) (duration: 01m 34s)
* 06:39 moritzm: installing krb5 security updates
* 17:09 logmsgbot: ori Started scap: (no message)
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 17:06 logmsgbot: ori scap aborted: (no message) (duration: 01m 23s)
* 17:04 logmsgbot: ori Started scap: (no message)
* 16:53 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4 (duration: 01m 30s)
* 16:52 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4
* 16:45 cscott: updated OCG to version db7a56965233a74c73917c78b5c8c84c867321d9
* 16:37 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3 (duration: 01m 12s)
* 16:35 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3
* 16:35 bd808: updated scap to da64a65 (Cast pid read from file to an int)
* 16:26 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2 (duration: 01m 26s)
* 16:25 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2
* 16:22 bd808: updated scap to 947b93f (Fix reference to _get_apache_list)
* 16:12 logmsgbot: bd808 scap failed: AttributeError 'Scap' object has no attribute '_get_apache_list' (duration: 02m 15s)
* 16:10 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart
* 16:01 paravoid: staggered upgrade of cp* fleet to nginx 1.9.2
* 15:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: Follow-up 94e5fd2: Default wmgUseContentTranslation true only on Wikipedias [[gerrit:220161]] (duration: 00m 16s)
* 15:49 jynus: rebooting es1004
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX as default except where it is not deployed [[gerrit:220078]] (duration: 00m 12s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable 'frwiki-recommender' campaign in frwiki [[gerrit:220071]] (duration: 00m 13s)
* 14:54 paravoid: reprepro: including nginx 1.9.2-1~bpo8+1 to jessie-wikimedia/backports
* 14:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1003, depool es1004 (duration: 00m 12s)
* 14:04 cscott: reverted OCG to version ca4f64852de5b1de782b292b50038fbd2dd84266 (bundler failing with exit code 8)
* 13:57 cscott: updated OCG to version d7c698d5bf730d34057945e912ac75dc542dd788
* 13:44 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 13s)
* 13:44 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 12s)
* 12:54 moritzm: ssh on precise hosts has been updated to a backport of 6.6p1-2ubuntu2 (the version from trusty). this allows us to use modern crypto (plus labs can simplify key handling)
* 12:45 jynus: rebooting es1003
* 12:18 moritzm: uploaded openssh_6.6p1-2ubuntu2~wmfprecise2 to precise-wikimedia on apt.wikimedia.org
* 12:10 logmsgbot: hoo Synchronized arbitraryaccess.dblist: Arbitrary access for ruwiki and cswiki. T102122 (duration: 00m 12s)
* 11:33 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (part 2/2) (duration: 00m 12s)
* 11:25 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (duration: 00m 12s)
* 09:41 moritzm: updated jsch on gallium and lanthanum to support modern SSH key exchange in Jenkins (actually that happened yesterday, but I forgot to log it back then)
* 09:41 moritzm: added jsch_0.1.50-1ubuntu1~wmfprecise1 to precise-wikimedia on carbon
* 09:09 akosiaris: failing over etherpad to db1016
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 23 04:53:17 UTC 2015 (duration 53m 16s)
* 03:33 springle: xtrabackup clone db2023 to db1045
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-23 02:26:44+00:00
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 47s)
* 01:17 logmsgbot: krinkle Synchronized docroot and w: (no message) (duration: 00m 12s)
* 01:00 bd808: Pruned virt1000 from trebuchet minions list: redis-cli srem "deploy:scap/scap:minions" virt1000.wikimedia.org


== June 22 ==
== 2021-07-24 ==
* 23:42 gwicke: restarted Cassandra on restbase1006
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend: For real this time (duration: 00m 14s)
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: For real this time (duration: 00m 13s)
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 12s)
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend/: SWAT (duration: 00m 15s)
* 23:12 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable TinyRGB ICC profile swapping on testwiki (duration: 00m 13s)
* 22:51 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki/mediawiki.Title.js: I0e5f2d3b2: Fix undeclared dependency on jquery.mwExtension (duration: 00m 12s)
* 22:45 gwicke: restarting Cassandra on restbase1005 to get the metrics back
* 22:37 gwicke: restarting Cassandra on restbase1004 to get the metrics back
* 22:33 gwicke: restarting Cassandra on restbase1003 to get the metrics back
* 22:24 gwicke: restarting Cassandra on restbase1002 to get the metrics back
* 22:19 bd808: scap error "@ERROR: access denied to common from localhost (127.0.0.1)" from mw2187 and mw2080 on sync-file test.
* 22:17 logmsgbot: bd808 Synchronized README: Testing sync-file after scap update (duration: 00m 12s)
* 22:08 RoanKattouw: Deployed patch for T103054
* 21:59 godog: reboot restbase1008
* 21:56 bd808: updated scap to 81b7c14 (Move dsh group file names to config)
* 21:55 bd808: trebuchet checkout for scap/scap failed on 23 hosts: mw1104, mw1222, mw2009, mw2011, mw2021, mw2028, mw2031, mw2034, mw2069, mw2076, mw2080, mw2086, mw2095, mw2099, mw2120, mw2127, mw2131, mw2136, mw2170, mw2187, mw2189, mw2197, virt1000
* 21:50 bd808: trebuchet fetch for scap/scap failed on mw2086.codfw.wmnet, mw1222.eqiad.wmnet and virt1000.wikimedia.org
* 21:41 gwicke: restarting Cassandra on restbase1001 to get the metrics back
* 21:20 ori: Depooled mw1170-mw1175 and mw1270-mw1275 for testing Idddcfe46
* 21:07 chasemp: rebooting mw1101 the hard way
* 20:28 cscott: updated Parsoid to version d488783e
* 19:34 akosiaris: delete pad:ips from etherpad
* 19:01 jynus: rebooting es1002
* 18:52 logmsgbot: ori Synchronized php-1.26wmf10/includes/OutputPage.php: I0e5f2d3b2: Construct clean canonical URLs for wiki pages, ignoring request URL (T67402) (duration: 00m 14s)
* 18:01 legoktm: live-hacking mw1017 to debug T103053
* 17:49 mutante: Bugzilla has left the building
* 16:31 jynus: reseting wikitech-static mysql contents to improve fragmentation
* 16:26 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1001, depool es1002 (duration: 00m 14s)
* 16:12 andrewbogott: shutting down virt1000
* 16:08 andrewbogott: disabling puppet on virt1000
* 16:07 ottomata: deploying eventlogging 0.9.  This includes changes for arbitrary eventlogging URIs in all eventlogging stages, as well as support for schema based kafka topic URIs. 
* 15:24 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/WikiEditor: SWAT: Reduce 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) [[gerrit:219837]] (duration: 00m 13s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Default wmgUseWikibaseQuality on beta to true. [[gerrit:219630]] (duration: 00m 14s)
* 14:32 hashar: restarting Jenkins
* 13:26 jynus: rebooting es1001 for regular maintenance
* 12:08 paravoid: powercycled ms-be1002, stuck at console
* 11:12 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1001 (duration: 00m 13s)
* 11:06 _joe_: restarting hhvm on the low-memory appservers (main and api)
* 09:23 hashar: upgrading Jenkins gearman plugin from 0.1.1 to latest master (f2024bd). Restarting Jenkins.
* 05:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 22 05:11:22 UTC 2015 (duration 11m 21s)
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-22 02:31:32+00:00
* 02:27 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 27s)
* 00:44 jgage: restarted gitblit on antimony again


== June 21 ==
== 2021-07-23 ==
* 11:28 jynus: restarting apache on mw1110
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 06:55 gwicke: restarted  bootstrap on restbase1009 earlier today; hardware hasn't died yet
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 21 05:01:07 UTC 2015 (duration 1m 6s)
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-21 02:27:13+00:00
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 10m 23s)
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 01:39 jgage: restarted gitblit on antimony at 00:43 UTC
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 01:37 Krenair: testing morebots
* 16:15 effie: enable puppet on mc-gp* hosts
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]


== June 20 ==
== 2021-07-22 ==
* 22:50 bblack: restarted gitblit java service on antimony
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 20 04:27:14 UTC 2015 (duration 27m 13s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-20 02:21:30+00:00
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 02s)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== June 19 ==
== 2021-07-21 ==
* 23:32 gwicke: upgraded restbase1006 to cassandra 2.1.7
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 23:30 gwicke: starting cassandra bootstrap on restbase1009
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:37 gwicke: upgraded cassandra on 1003 to 2.1.7 (pre-release, likely going out on Monday)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 18:32 godog: stop cassandra on restbase1008
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:45 logmsgbot: krenair Synchronized private/PrivateSettings.php: sync 4a30446e for wikitech cleanup - T102361 (duration: 00m 12s)
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:24 godog: install linux 3.19 on restbase100[789]
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 17:12 ori: salt -t30 -G 'php:hhvm' cmd.run 'rm -f /usr/local/bin/check_tc_space' (https://gerrit.wikimedia.org/r/#/c/219102/)
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:54 moritzm: updated/rebooted nescio/maerlant to 3.19
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:40 andrewbogott: test test test
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:19 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-19 02:19:33+00:00
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 08s)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 00:49 springle: killed storm of research queries on dbstore1002, load avg 90+, replag, likely explosion, etc. emailing analytics@
* 20:27 dancy: testing upcoming Scap release on beta
* 00:13 logmsgbot: ebernhardson Synchronized php-1.26wmf10/extensions/Flow/tests/: no-op sync of flow test cases in wmf10 (duration: 00m 17s)
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 00:11 logmsgbot: ebernhardson Synchronized php-1.26wmf10/skins/Vector/: Bump Vector submodule in 1.26wmf10 for swat (duration: 00m 12s)
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== June 18 ==
== 2021-07-20 ==
* 23:37 logmsgbot: ebernhardson Synchronized php-1.26wmf9/skins/Vector: Bump Vector in 1.26wmf9 for SWAT (duration: 00m 16s)
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 23:22 logmsgbot: ebernhardson Synchronized wmf-config/: Actually enable the feedback link on Special:Search (duration: 00m 17s)
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 23:08 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Enable wgCirrusSearchFeedbackLink on enwiki (duration: 00m 13s)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 21:07 godog: start (bootstrap) cassandra on restbase1008
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 20:43 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-urd-hin_0.1.0+svn~r60389-1
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:17 akosiaris: restarted salt on sca1001, truncate log files. keep a sample in /tmp/
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:03 chasemp: apache && hhvm restart for mw 1243 1250 1254 1256 1257
* 17:06 rzl: enabled puppet on A:mw
* 20:00 chasemp: apache && hhvm restart for mw...1256 1255 1254 1250 1243 1242 1071 1021
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:58 mutante: restarting hhvm on mw1021, mw1071
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:27 godog: bounce cassandra on restbase1003, new logging configuration
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 19:26 akosiaris: puppet-merged on strontium
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 19:15 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf10
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 19:06 godog: upgrade cassandra to 2.1.6 on restbase1003
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-urd_0.1.0~r57551-1
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hin_0.1.0~r57344-1
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-cy-en_0.1.1~r57554-1
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 18:43 legoktm: fixed content model of MediaWiki:Common.css@lrcwiki
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 18:18 YuviPanda: restarted nutcracker on wikitech
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 18:16 YuviPanda: restarted keystone on labcontrol1001
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 17:13 gwicke: bouncing cassandra on restbase1002
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 17:11 godog: restart cassandra on restbase1004
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:53 gwicke: updated restbase to 7ffaf94b
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:13 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Hovercards: Disable test release on Catalan and Greek Wikipedias [[gerrit:215932]] (duration: 00m 13s)
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:06 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150618 [[gerrit:218886]] (duration: 00m 14s)
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 11:14 akosiaris: powercycling labstore2001
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 09:08 moritzm: added firejail_0.9.26-1~wmfjessie1 and firejail_0.9.26-1~wmftrusty1 to apt.wikimedia.org
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 08:45 jynus: very brief replication stop for s7, already corrected
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 06:51 Coren: rebooting labstore2001
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 06:32 legoktm: live hacking mw1017 for T102915
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 05:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 18 05:26:01 UTC 2015 (duration 26m 0s)
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 02:48 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-18 02:48:44+00:00
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 02:46 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 03s)
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 02:32 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-18 02:32:45+00:00
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 02:28 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 56s)
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 02:04 springle: applied T99941 scema change to all remaining affected (ie, old) wikis
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 02:01 tgr: ran https://gerrit.wikimedia.org/r/#/c/159350/7/backend/schema/mysql/developer_agreement.sql on mediawikiwiki
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 01:32 ejegg: updated payments from f33d0a8687a120a2057a7e6acad67da63b17f97e to a17ee221db0dbde70c92e24fc188379b6dbad613
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 01:20 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: 0c21a14a6e: Revert StashEdit: Use postWithToken (duration: 00m 13s)
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 01:06 twentyafterfour: applied hotfix for T102276 and restarted apache on iridium
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 00:00 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf10
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== June 17 ==
== 2021-07-19 ==
* 23:35 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 14s)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 23:35 gwicke: rolled back restbase to 90817c2a
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 23:24 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/MobileFrontend: SWAT (duration: 00m 15s)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/Flow: SWAT (duration: 00m 15s)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 22:45 gwicke: rolling restart of cassandra nodes
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 22:09 gwicke: rolling restart of restbase instances to apply puppet change after puppet actually ran on all nodes
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 21:58 gwicke: rolling restart of restbase instances to apply config change
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 21:56 godog: restart nutcracker on mw1145
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 21:35 gwicke: restarting cassandra on restbase1005
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 20:47 mutante: temp. stopped icinga-wm
* 18:46 brennen: gerrit1001: restarting gerrit
* 20:37 gwicke: deployed RESTBase 7ffaf94bfc
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 20:24 cscott: updated Parsoid to version 402ddf66
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 20:01 ottomata: resized antimony's / LV from 30G to 100G.  looks like /var/lib/git was getting filled up
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 19:43 jynus: rolling schema changes on hewiki
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 19:29 godog: downgrade and restart cassandra to 2.1.3 on restbase1001, metrics not being pushed to graphite with 2.1.6
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 19:05 godog: bounce cassandra on xenon
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:46 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic03b152de: Make $wgUploadPath for commons https only for benefit instant commons (duration: 00m 14s)
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:11 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf10
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 17:45 godog: bounce cassandra on restbase1001
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 17:39 mutante: repooled mw1234
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 17:24 ottomata: starting reinstall of Zookeeper analytics nodes (analytics102[345]): https://phabricator.wikimedia.org/T101713
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 17:16 godog: bounce cassandra on restbase1001
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:14 jynus: rolling schema changes on ruwiki master
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:13 mutante: running puppet via salt on api appservers in batches, switch to ganglia_new and carbon
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:12 godog: cassandra stopped sending graphite metrics after restart, investigating (test cluster works fine tho)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:58 jynus: rolling schema changes on ruwiki slaves
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 16:28 godog: start upgrading restbase1001 to cassandra 2.1.6 T102015
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:02 logmsgbot: thcipriani Finished scap: Wikitech-Ldap host record roll-out (duration: 24m 35s)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 15:37 logmsgbot: thcipriani Started scap: Wikitech-Ldap host record roll-out
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 15:19 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Give patrolmarks right to "*" on dewiki [[gerrit:218901]] (duration: 00m 13s)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 15:17 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Add a throttle exception for United Islands of Prague [[gerrit:217413]] (duration: 00m 14s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 15:15 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable captcha on labswiki for now [[gerrit:218908]] (duration: 00m 13s)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 15:10 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add extra namespace aliases for Italian Wikipedia [[gerrit:215708]] (duration: 00m 13s)
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 15:08 anomie: SWAT: Enable anti-abuse features on labswiki [[gerrit:218903]]
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 15:08 jynus: testing some schema changes on testwiki
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 15:00 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on nowiki and plwiki (duration: 00m 13s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 13:56 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on fiwiki and idwiki (duration: 00m 13s)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 13:26 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on bgwiki and eowiki (duration: 00m 13s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 10:52 akosiaris: reload pybal on lvs1006
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 10:50 mobrovac: finished deploying mathoid I40ef68 on SCA
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 10:48 akosiaris: repooled mathoid.svc.eqiad.wmnet: sca1002 backend
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 10:44 akosiaris: enable puppet on sca1002
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 10:43 akosiaris: enable puppet
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 10:43 akosiaris: depool sca1002 for mathoid.svc.eqiad.wmnet
* 17:23 volans: running authdns-update to force-update authdns2001
* 10:43 akosiaris: reloaded pybal on lvs1003
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 10:28 akosiaris: repool sca1002, depool sca1001
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:18 mark: Halting pvmove of md124 on labstore1001
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 09:30 akosiaris: disable puppet on sca1001
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 09:09 akosiaris: depool sca1001, resource: mathoid
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:09 akosiaris: puppet disabled on sca1002
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 08:37 YuviPanda: run sudo salt -t 20 -b 100 '*' cmd.run 'sudo service salt-minion restart' on virt1000, attempt to get them to answer on labcontrol1001 instead
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 06:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 17 06:52:58 UTC 2015 (duration 52m 57s)
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 02:56 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-17 02:56:49+00:00
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 02:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1045 (duration: 00m 13s)
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 02:54 springle: found wikiversions.json modified on tin since 2015-06-16 23:27 (catrope?); stashed and reapplied the file in order to do a pull
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 04m 44s)
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-17 02:35:23+00:00
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 02:32 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 06m 12s)
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 02:21 logmsgbot: ori Synchronized php-1.26wmf10/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 00:10 paravoid: draining esams because of upcoming network maintenance window
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== June 16 ==
== 2021-07-16 ==
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable local upload on fawikivoyage; enable logging for T76305 (duration: 00m 13s)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:28 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: Set previous values for password length policies (duration: 00m 16s)
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:17 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf10 (duration: 43m 04s)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:02 godog: restore INFO cassandra logging level on restbase1003
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:44 godog: start cassandra on restbase1008
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 22:43 godog: enable back some cassandra debugging on restbase1003
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 22:33 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 22:26 urandom: restored default logging level on restbase1003
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 22:22 urandom: enabling even more debugging on restbase1003
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 22:14 urandom: enable (some) debug logging on restbase1003
* 15:48 vgutierrez: restart pybal on lvs2010
* 21:57 logmsgbot: twentyafterfour scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.SxGNHsmVYP" ' returned non-zero exit status 1 (duration: 01m 24s)
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:56 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 20:34 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents/modules/ext.wikimediaEvents.resourceloader.js: T101806 live hack (duration: 00m 12s)
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 19:24 Coren: labstore1001 pvmove of slice2 to slice 51 started; some bursts of iowait expected but should have minimal enduser impact)
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 18:36 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Fix usage tracking setting (duration: 00m 14s)
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 18:03 godog: bounce statsite on graphite1001, stuck while writing to graphite
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 17:30 ejegg: update SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 258f2c917b1ae50b01231927bcd6f58ecaa8940b
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 17:23 logmsgbot: krinkle Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoader.php: undo live hack (duration: 00m 13s)
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 17:09 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on gomwiki and lrcwiki (duration: 00m 13s)
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:09 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on second batch of s3 wikis (duration: 00m 13s)
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 17:03 logmsgbot: bblack Synchronized wmf-config/InitialiseSettings.php: wgCanonicalServer: HTTPS for all (duration: 00m 15s)
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 16:44 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 16:43 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 16:43 logmsgbot: krenair Synchronized w/static/images/project-logos/gomwiki.png: (no message) (duration: 00m 14s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 16:42 logmsgbot: krenair Synchronized langlist: gomwiki (duration: 00m 13s)
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 16:41 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 16:40 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 13s)
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:29 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 16:27 logmsgbot: krenair Synchronized langlist: (no message) (duration: 00m 14s)
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 16:25 logmsgbot: krenair Synchronized w/static/images/project-logos/lrcwiki.png: (no message) (duration: 00m 13s)
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 16:21 moritzm: updated copper, oxygen, labstore2001 and labnodepool1001 to the 3.19 kernel
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 16:11 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 16:10 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 14s)
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 16:06 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 16:05 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 15s)
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 15:43 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: templateeditor: add templateeditor right in hewiki [[gerrit:218426]] (duration: 00m 13s)
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn on wgGenerateThumbnailOnParse for wikitech. [[gerrit:218553]] (duration: 00m 12s)
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 15:03 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for CX deployment on 20150616 [[gerrit:218341]] (duration: 00m 12s)
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 14:18 cmjohnson: barium is going down for disk replacement
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 13:38 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on dewiki (duration: 00m 15s)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 13:18 akosiaris: rebooted etherpad1001 for kernel upgrades
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:51 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2005, es2006 and es2007 after maintenance (duration: 00m 13s)
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:44 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on cswiki (duration: 00m 14s)
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:20 logmsgbot: aude Synchronized usagetracking.dblist: Enable usage tracking on ruwiki (duration: 00m 15s)
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 11:21 paravoid: restarting the puppetmaster
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 11:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 13s)
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 10:36 akosiaris: rebooting ganeti200{1..6}.codfw.wmnet for kernel upgrades
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 09:33 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2005, es2006 and es2007 for maintenance (duration: 00m 14s)
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 09:10 YuviPanda: deleted huge puppet-master.log on labcontrol1001
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:05 jynus: added m5-slave to dns servers
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 07:52 paravoid: restarting hhvm on mw1121
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 07:52 moritzm: blacklisted the overlayfs kernel module (prevents a reliable local root exploit on all Ubuntu systems). no systems in the fleet had an overlaysfs mount present or the kernel module loaded, so there should be no impact on existing systems. Note: This is a bandaid, I'll create a Phab task to deploy this via puppet in the future (and to also blacklist additional desktopy kernel modules which increase our attack
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1005 (duration: 00m 14s)
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 06:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 16 06:24:04 UTC 2015 (duration 24m 3s)
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 06:18 godog: restore ES replication throttling to 20mb/s
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 06:13 godog: restore ES replication throttling to 40mb/s
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 06:08 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: unthrottle ES (duration: 00m 14s)
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 05:56 godog: bump ES replication throttling to 60mb/s
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:50 manybubbles: ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:49 manybubbles: reenabling puppet agent on elasticsearch machines
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:46 manybubbles: I expect them to be red for another few minutes during the initial master recovery
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:45 manybubbles: started all elasticsearch nodes and now they are recovering.
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 05:41 godog: restart gmond on elastic1007
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 05:39 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)
* 05:25 manybubbles: shutting down all the elasticsearch on the elasticsearch nodes against - another full cluster restart should fix it like it did last time...............
* 05:11 godog: restart elasticsearch on elastic1031
* 03:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-16 02:27:51+00:00
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)
* 00:55 tgr: running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460
* 00:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 00m 13s)
* 00:46 godog: killed bacula-fd on graphite1001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 00:27 godog: kill python stats on cp1052, filling /tmp


== June 15 ==
== 2021-07-15 ==
* 23:42 ori: Cleaning up renamed jobqueue metrics on graphite{1,2}001
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:01 godog: killed bacula-fd on graphite2001, shouldn't be running and consuming bandwidth (cc akosiaris)
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 22:54 logmsgbot: hoo Synchronized wmf-config/filebackend.php: Fix commons image inclusion after commons went https only (duration: 00m 14s)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 22:18 godog: run disk stress-test on restbase1007 / restbase1009
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 22:06 logmsgbot: twentyafterfour Synchronized hhvm-fatal-error.php: deploy: Guard header() call in error page (duration: 00m 15s)
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 22:05 logmsgbot: twentyafterfour Synchronized wmf-config/InitialiseSettings-labs.php: deploy: Never use wgServer/wgCanonicalServer values from production in labs (duration: 00m 12s)
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 20:37 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS: Bumping FirefoxOS app to latest (duration: 00m 14s)
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:30 godog: bounce cassandra on restbase1003
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 20:18 godog: start cassandra on restbase1008, bootstrapping
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 20:04 godog: sign restbase1008 key, run puppet
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 20:00 godog: powercycle restbase1007, investigate disk issue
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:07 logmsgbot: ori Synchronized php-1.26wmf9/includes/jobqueue: 0a32aa3be4: jobqueue: use more sensible metric key names (duration: 00m 13s)
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 16:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 14s)
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 16:48 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager/OpenStackManagerHooks.php: SWAT: refer to user the right way (duration: 00m 13s)
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 16:48 godog: powercycle graphite1002, no ssh, unresponsive console
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 16:19 jynus: upgrading es1005 mysql service while depooled
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 16:12 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Grant cloudadmins the 'editallhiera' right [[gerrit:218115]] (duration: 00m 12s)
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 16:10 bblack: pybal restarts complete, all ok
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 16:09 logmsgbot: thcipriani Finished scap: SWAT: Openstack manager and language updates (duration: 21m 27s)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:47 logmsgbot: thcipriani Started scap: SWAT: Openstack manager and language updates
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:46 bblack: starting pybal restart process for config changes ( https://gerrit.wikimedia.org/r/#/c/218285/ ), inactives first w/ manual verification of ok-ness
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:11 bblack: rebooting cp3041 (downtimed)
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:00 _joe_: ES is green
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 logmsgbot: aude Synchronized php-1.26wmf9/extensions/Wikidata: Fix property label constraints bug (duration: 00m 24s)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 14:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on s7 wikis (duration: 00m 13s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 13:47 jynus: enabling puppet on all elastic* nodes, should enable also ganglia
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 13:11 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: all the search (duration: 00m 12s)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 13:04 _joe_: re-scaling down the recovery index bandwidth in ES to 20 mb/s
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:52 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-common.php: partially turn search back on (duration: 00m 13s)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 11:54 _joe_: raised the ES index replica bandwidth limit to 60mb
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 11:31 akosiaris: migrating etherpad.wikimedia.org to etherpad1001.eqiad.wmnet
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 11:15 _joe_: raised the max bytes for ES recovery to 40mbps
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 10:49 manybubbles: and we're yellow right now.
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:49 manybubbles: the initial primaries stage - the red stage of the rolling restart - recovers quick-ish
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 10:48 manybubbles: soon we should see it go yellow and stay that way while the replicas recover
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 10:48 manybubbles: manybubbles is confident his mighty bitch slap of the elasticsearch cluster has set it further to the road to recovery
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:46 jynus: disabled puppet on all elasticsearch nodes to avoid restarting services and other magic
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 10:44 _joe_: disabled hot threads logging, ganglia on es nodes
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 10:44 manybubbles: started Elasticsearch on all elasticsearch nodes
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 10:38 manybubbles: stopping all elasticsearch servers - going for a full cluster resstart.
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 10:11 manybubbles: restarting elasticsearch on elasticsearch1021 - that one is in a gc death spiral
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:26 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:12 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-common.php: temporarily throttle down cirrussearch (duration: 00m 13s)
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:35 _joe_: attempting a fast restart of elastic1020
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CirrusSearch/includes/Util.php: I504dac0c3: Add missing 'use \Status;' to includes/Util.php (duration: 00m 13s)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 04:56 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 15 04:56:39 UTC 2015 (duration 56m 38s)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 00m 12s)
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-15 02:22:56+00:00
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 02:19 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 46s)
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, dont see it in SAL, looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disableing puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== June 14 ==
== 2021-07-14 ==
* 10:39 YuviPanda: running du -d 2 on /srv/project in a screen sesssion on labstore1001
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 04:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 14 04:33:20 UTC 2015 (duration 33m 19s)
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 02:42 logmsgbot: reedy Synchronized wmf-config/extension-list: noop (duration: 00m 13s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 02:40 logmsgbot: krenair Synchronized wmf-config/squid-labs.php: sync random labs-only file to test per irc (duration: 00m 13s)
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-14 02:21:28+00:00
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 47s)
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:37 moritzm: installing klibc security updates
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner promethues support
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 14:51 moritzm: installing apache security updates on puppet masters
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:44 moritzm: installing apache security updates on grafana*
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 14:33 dcausse: runnning elasticsearch-madvise-random ES_PID on elastic2045
* 14:31 dcausse: runnning elasticsearch-madvise-random 1022 on elastic2054
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:13 elukey: restart php-fpm on mw2370
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 12:15 mutante: mw1422 - scap pull
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}
* 00:49 eileen: civicrm revision changed from {{Gerrit|bb62188ec6}} to {{Gerrit|b1c63470bb}}, config revision is {{Gerrit|c291b3c689}}
* 00:48 eileen: process-control config revision is {{Gerrit|c291b3c689}}
* 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)


== June 13 ==
== 2021-07-13 ==
* 19:30 bblack: repooled cp1071, cp3040
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 08s)
* 18:53 bblack: rebooting cp1071, cp3040 to look at BIOS-level things (depooled, icinga-downed)
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 07s)
* 17:08 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 12s)
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 15:47 paravoid: labstore1001: stopping manage-nfs-volumes daemon
* 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 04:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 13 04:41:57 UTC 2015 (duration 41m 56s)
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 03:51 Krinkle: Running deleteEqualMessages.php for sawiki (T45917)
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 03:49 Krinkle: Running deleteEqualMessages.php for cewiki (T45917)
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-13 02:20:58+00:00
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 19s)
* 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 00:17 gwicke: restarted cassandra on restbase1001
* 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 00:13 gwicke: restarted cassandra on restbase1002
* 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
* 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: [[gerrit:704368{{!}}links is flat array (T286040)]] (duration: 02m 07s)
* 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
* 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
* 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
* 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
* 17:45 mutante: mw1283 - decom - powered off by cookbook
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
* 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - [[phab:T280203|T280203]]"
* 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:09 mutante: mw1282 - decom, powered off
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
* 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181{{!}}Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)
* 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:55 jbond: upload statograph to buster wikimedia
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
* 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
* 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483)
* 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]] (duration: 03m 28s)
* 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]]
* 13:37 effie: rolling restart php-fpm across clusters - [[phab:T286260|T286260]]
* 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176{{!}}Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:14 kormat: restarted replication on db1117:3325 [[phab:T284622|T284622]]
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 12:53 kormat: stopping replication on db1117:3325 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - [[phab:T280203|T280203]]
* 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 12:20 mutante: mwmaint1002 - scap pull after reimaging
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:28 Lucas_WMDE: EU backport+config window done
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:704304{{!}}Remove obsolete $wgShowDBErrorBacktrace config]] (duration: 01m 25s)
* 11:13 mutante: mwmaint1002 - reimaging with buster ([[phab:T267607|T267607]])
* 10:54 mutante: switching https://noc.wikimedia.org backened from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed ([[phab:T267607|T267607]])
* 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan: running `nodetool decommission` on maps2008
* 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:18 moritzm: installing apache security updates on Logstash hosts
* 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
* 09:40 moritzm: installing apache security updates on thanos-fe hosts
* 09:38 moritzm: installing apache security updates on parsoid hosts
* 09:31 effie: depool mw2383 [[phab:T286463|T286463]]
* 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:45 effie: depool mw2383 - [[phab:T286463|T286463]]
* 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
* 07:06 moritzm: installing apache security updates on codfw mw* hosts
* 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - [[phab:T273026|T273026]]
* 06:06 effie: pool mw2383  - [[phab:T286463|T286463]]
* 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
* 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
* 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
* 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`


== June 12 ==
== 2021-07-12 ==
* 22:57 ejegg: rolled back SmashPig on listener from 15acdafef9d9682c417632e5ac5a5f2e5380f92e to e1e925c9fc2a60c1e14ef01d8b653dc09512f51f
* 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1896efc27f3de39659673091bc4c43ad874da0c5}}: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T286163|T286163]]) (duration: 00m 56s)
* 22:40 ejegg: updated SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 15acdafef9d9682c417632e5ac5a5f2e5380f92e
* 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=[[phab:T286396|T286396]] # [[phab:T286396|T286396]]
* 22:24 godog: upgrade and bounce carbon daemons on graphite2001 to investigate T101572
* 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 21:16 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I3694489ba: wgCanonicalServer->https for new HTTPS domains (duration: 00m 14s)
* 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php ([[phab:T286396|T286396]])
* 20:33 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 13s)
* 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 20:32 logmsgbot: krenair Synchronized w/static/images/project-logos/dawiki-200k.png: https://gerrit.wikimedia.org/r/#/c/217878/1 (duration: 00m 16s)
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|284216a7d35c815ea203a9c0bd738a1e1bf31f7e}}: Add few namespace aliases for Serbian Wikipedia ([[phab:T286396|T286396]]) (duration: 00m 56s)
* 20:15 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217670/ (duration: 00m 12s)
* 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8a79bf752ff5eb15f3042fd94ba10c2c50607a85}}: enwiki: Delete Book namespace ([[phab:T285766|T285766]]) (duration: 00m 57s)
* 19:28 ejegg: updated SmashPig on payments-listener from f9c3eaa99fa0fe8ef098d0fc876091d3676aa039 to 5a463400bc74706ba7bf6256cd0101014e792acb
* 23:29 urbanecm@deploy1002: Synchronized static/images/: {{Gerrit|d007b9ccb77db9f3dc492df7a35477e5563a921a}}: Remove unused celebration logos and wordmark ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 19:28 ejegg: updated SmashPig on payments-listener ccepting New Patients:
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c581493fbe5d9c372fd44635b704d04040d8b38}}: Add editautoreviewprotected to bot on hewikisource ([[phab:T275076|T275076]]) (duration: 00m 57s)
* 18:47 ejegg: updated SmashPig on payments-listener from 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510 to f9c3eaa99fa0fe8ef098d0fc876091d3676aa039
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40eade4131eac95ba3dc0d918ad540070d7bcb99}}: Enable RelatedArticles Extension in zhwikinews ([[phab:T266933|T266933]]) (duration: 00m 57s)
* 18:45 logmsgbot: faidon Synchronized wmf-config/InitialiseSettings.php: remove wmgHTTPSBlacklistCountries (duration: 00m 12s)
* 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # [[phab:T286101|T286101]], P16817
* 18:45 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: remove CanIPUseHTTPS hook (duration: 00m 13s)
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ab00d188bc4161e40455b842f613698548b3518}}: zhwiktionary: Add templateeditor right ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 17:39 moritzm: updated cerium, xenon and praseodymium to 3.19 kernel
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5822b2be129b934939af46bab5b8916039661e97}}: zhwiktionary: Add aliases for namespaces ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 17:08 ejegg: enabled queue consumer
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba0967f5c18652d02b7b476e9592b81dcb9b74fc}}: zhwiktionary: Add Reconstruction namespace ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 17:08 ejegg: updated crm from d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f to bd8a00196071ddd04efbff7b30567dd9357c9000
* 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
* 16:53 ejegg: disabled donations queue consumer
* 21:26 urbanecm: Start server-side upload for 2 video files ([[phab:T286432|T286432]], [[phab:T286433|T286433]])
* 15:52 logmsgbot: faidon Synchronized wmf-config/CommonSettings.php: hide prefershttps user pref (duration: 00m 13s)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]] (duration: 03m 39s)
* 15:40 logmsgbot: faidon Synchronized docroot/search.wikimedia.org/index.php: unbreak search.wikimedia.org due to HTTPS (duration: 00m 12s)
* 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]]
* 15:27 jynus: mysql load issues on labsdb1003, investigating
* 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki ([[phab:T257066|T257066]]) (duration: 00m 58s)
* 13:39 moritzm: updated etcd* to 3.19 kernel
* 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]] (duration: 21m 24s)
* 12:11 jynus: restarting mariadb at labsdb1003
* 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 11:58 moritzm: updated rdb200* to 3.19 kernel
* 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 11:31 jynus: db2068 up but all services and console login unresponsive, powercycling
* 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]]
* 10:06 springle: killed a bunch of queries hammering labsdb1003 for days
* 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 09:58 moritzm: updated mc2004 to mc2016 to 3.19 kernel
* 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 06:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 12 06:06:55 UTC 2015 (duration 6m 54s)
* 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 04:37 logmsgbot: ori Synchronized php-1.26wmf9/extensions/FlaggedRevs: I4cfb47b41: Avoid post-redirect parse for certain edits (duration: 00m 14s)
* 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 02:40 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-12 02:40:36+00:00
* 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 10m 00s)
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 00:40 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217759 (duration: 00m 15s)
* 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 00:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 14s)
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
* 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
* 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
* 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]] (duration: 03m 30s)
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]]
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]] (duration: 03m 16s)
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]]
* 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]] (duration: 03m 37s)
* 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]]
* 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - [[phab:T282484|T282484]]
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
* 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:703567{{!}}Enable template search improvements on first wikis 2/2 (T284553)]] (duration: 00m 57s)
* 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703566{{!}}Enable template search improvements on first wikis 1/2 (T284553)]] (duration: 00m 56s)
* 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: [[gerrit:703649{{!}}Always add 1 prefixsearch match when searching for templates]] (duration: 00m 57s)
* 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
* 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
* 11:40 moritzm: installing apache updates on mw1/eqiad hosts
* 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|773c956811cba5c3a2cbba32bc1e1a536dbd9f0b}}: Revert "Use ptwiki 20th anniversary logos" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
* 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd5f5375b4f712c56e9396cc550078272ef668de}}: Revert "ptwiki: Use celebration logos in new vector" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702761{{!}}Add 'editautoreviewprotected' protection level to hewikisource (T275076)]] (duration: 00m 57s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
* 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703568{{!}}Enable transclusion back button on first wikis (T284553)]] (duration: 00m 58s)
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
* 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
* 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
* 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for [[phab:T285927|T285927]]
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - [[phab:T285251|T285251]]
* 10:03 effie: depool mw2383
* 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 10:01 godog: test thanos-compact upload with smaller part size - [[phab:T285835|T285835]]
* 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
* 09:07 godog: repool thanos-fe2002 - [[phab:T285835|T285835]]
* 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - [[phab:T285835|T285835]]
* 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: [[gerrit:703890{{!}}Remove subscribing to other aspect for entity usage (T286193)]] (duration: 00m 59s)
* 07:44 jynus: restart db1102:x1 mariadb instance
* 07:01 moritzm: installing apache2 security updates
* 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish ([[phab:T275268|T275268]])
* 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703951{{!}}Enable json image metadata everywhere (T275268)]] (duration: 01m 05s)
* 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: [[gerrit:703891{{!}}Add --sleep option to refreshImageMetadata.php]] (duration: 01m 04s)
* 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force ([[phab:T275268|T275268]])
* 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703950{{!}}Set testcommonswiki to use json image metadata (T275268)]] (duration: 01m 10s)


== June 11 ==
== 2021-07-09 ==
* 23:59 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217753 (duration: 00m 16s)
* 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:54 logmsgbot: ori Synchronized php-1.26wmf9/includes/EditPage.php: cf7df757f2: Instrument edit failures (duration: 00m 14s)
* 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:41 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/MobileFrontend: Bump MobileFrontend in 1.26wmf9 for SWAT (duration: 00m 14s)
* 22:36 legoktm: running benchmarking scripts again shellbox
* 23:40 ejegg: updated civicrm from 7ffe0cefb019828a09c9369187f14518847b5f41 to d13aaa4e9e937b0b1ae1f5de61ea7ff1f316d58f
* 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 08s)
* 23:24 logmsgbot: ebernhardson Synchronized php-1.26wmf9/extensions/CirrusSearch/: Fix prefer-recent queries in cirrussearch (duration: 00m 13s)
* 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]]
* 23:02 ejegg: updated SmashPig on the rest of the cluster from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
* 22:17 godog: temporary bump php memory_limit on magnesium to test T102092
* 11:40 _joe_: deleting coredns pod in codfw, potentially causing [[phab:T286360|T286360]]
* 22:11 ejegg: updated SmashPig on payments-listener from 477e8a8be5ea895262031c147330de5a651cc3ac to 7fed22ad933a6d3e371d60dfc6f8fdd0f9131510
* 10:13 _joe_: recreated all pods for zotero in codfw
* 21:54 ori: Widespread TC cache exhaustion again, doing rolling restart of HHVMs
* 00:47 legoktm: zotero rolling restart didn't help, filed [[phab:T286360|T286360]] for DNS issues
* 21:46 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I3d3ed7647: Test LCStoreStaticArray on test2wiki (duration: 00m 14s)
* 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues
* 21:01 godog: NPE while trying to make restbase1007 (cassandra 2.1.5) join the cluster, trying matching the same cassandra version (2.1.3)
* 20:57 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: fix last commit, did not have any affect (duration: 00m 16s)
* 20:55 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to f33d0a8687a120a2057a7e6acad67da63b17f97e
* 20:54 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/217688/1 (duration: 00m 13s)
* 20:10 godog: sign restbase1007 puppet key and first puppet run
* 19:10 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/217591 (duration: 00m 13s)
* 18:58 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings-labs.php: beta only change - https://gerrit.wikimedia.org/r/217560 (duration: 00m 12s)
* 18:55 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents: T101806 (duration: 00m 14s)
* 18:43 logmsgbot: twentyafterfour Synchronized php-1.26wmf9/includes/AjaxResponse.php: Hotfix Iafff9982bbbee893c13f891901dde88f998db7a6 (duration: 00m 14s)
* 18:16 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf9
* 17:44 ejegg: rolled back payments to 43c7952d2a31deaea97e8319f5612d644dce43c8
* 17:41 ejegg: updated payments from 43c7952d2a31deaea97e8319f5612d644dce43c8 to 15f24d24b150d5d774314b0c1b40ae26a73185f2
* 17:00 moritzm: updated mc200[1-3] to linux 3.19
* 16:28 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use arbitrary access tag (duration: 00m 12s)
* 16:27 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add arbitrary access group tag (duration: 00m 13s)
* 16:27 logmsgbot: aude Synchronized arbitraryaccess.dblist: Add dblist for arbitrary access wikis (duration: 00m 13s)
* 16:24 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Use usagetracking tag (duration: 00m 13s)
* 16:23 logmsgbot: aude Synchronized wmf-config/CommonSettings.php: Add usagetracking group tag (duration: 00m 16s)
* 16:23 ori: Scap + deployments exhausted TC cache on Apaches; performed a rolling restart of HHVM
* 16:21 logmsgbot: aude Synchronized usagetracking.dblist: Add dblist for usage tracking wikis (duration: 00m 25s)
* 16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Disable Parsoid update jobs (duration: 00m 14s)
* 16:18 logmsgbot: thcipriani Finished scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]] (duration: 32m 11s)
* 15:46 logmsgbot: thcipriani Started scap: SWAT: Update namespaces and special pages for Northern Luri (lrc) from translatewiki [[gerrit:216533]] [[gerrit:217327]]
* 15:27 logmsgbot: thcipriani Synchronized php-1.26wmf9/extensions/OpenStackManager: SWAT: update OpenStackManager to disable unused sudoer features [[gerrit:217407]] (duration: 00m 13s)
* 15:11 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Make VisualEditor access RESTbase directly on all public wikis [[gerrit:214833]] (duration: 00m 12s)
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150611 [[gerrit:217460 ]] (duration: 00m 12s)
* 14:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on jawiki (duration: 00m 12s)
* 13:40 _joe_: rolling restart of all the restbase instances
* 13:33 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable usage tracking on frwiki (duration: 00m 12s)
* 13:32 _joe_: running puppet on all restbase hosts
* 13:19 _joe_: running puppet on restbase1001
* 13:16 _joe_: disabling puppet on restbase hosts in anticipation for merging https://gerrit.wikimedia.org/r/217431
* 13:11 paravoid: removing gdnsd from apt: precise-wikimedia (1.9.0-1~precise1/2.1.0-1~precise1), trusty-wikimedia (2.1.0-1), jessie-wikimedia (2.1.2-1~deb8u1)
* 12:13 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable arbitrary access on Wikivoyage and Wikiquote (duration: 00m 13s)
* 11:48 YuviPanda: reboot labvirt1005 for kernel upgrade
* 11:46 YuviPanda: installing linux-image-generic-lts-vivid on labvirt1005 to get a 3.19 kernel
* 09:51 akosiaris: uploaded ruby-jsduck_5.3.4 and ruby-rkelly-remix_0.0.6 on apt.wikimedia.org/jessie-wikimedia/main
* 08:18 akosiaris: recreating jessie chroots on copper
* 06:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 11 06:21:53 UTC 2015 (duration 21m 52s)
* 04:44 twentyafterfour: upgraded phabricator at 1:50 UTC (belatedly logged...)
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-11 03:01:48+00:00
* 03:00 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1057, warm up (duration: 01m 16s)
* 02:59 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 59s)
* 02:43 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-11 02:43:34+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 09m 13s)


== June 10 ==
== 2021-07-08 ==
* 23:23 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Add www.limis.lt to $wgCopyUploadsDomains (duration: 00m 19s)
* 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - [[phab:T281423|T281423]] (duration: 00m 57s)
* 22:07 logmsgbot: twentyafterfour Synchronized php-1.26wmf9/extensions/MobileFrontend/includes/skins/banners.mustache: Deploying https://gerrit.wikimedia.org/r/#/c/217417/ (duration: 00m 16s)
* 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - [[phab:T281423|T281423]] (duration: 00m 58s)
* 20:38 logmsgbot: ori Synchronized php-1.26wmf8/includes/Hooks.php: d6802ad7d6: Avoid section profiling in Hooks::run due to high overhead (duration: 00m 14s)
* 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
* 20:37 logmsgbot: ori Synchronized php-1.26wmf9/includes/Hooks.php: e552f4942d: Avoid section profiling in Hooks::run due to high overhead (duration: 00m 17s)
* 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
* 20:36 logmsgbot: ori Synchronized php-1.26wmf9/includes/User.php: 2f4f1e279d: Fixed "wfTimestamp() fed bogus time value" errors (duration: 00m 12s)
* 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 20:36 logmsgbot: ori Synchronized php-1.26wmf8/includes/User.php: 55e18123ca: Fixed "wfTimestamp() fed bogus time value" errors (duration: 00m 15s)
* 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:07 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: Group1 wikis to 1.26wmf9
* 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:14 godog: reboot ms-be2008 to check disk swap config
* 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
* 15:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: retry (duration: 01m 08s)
* 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
* 15:34 Krenair: sync failed to something like 25 hosts, cannot directly log into any of them either
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
* 15:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/215030/ - no code change, just docs - should not have to wait 9 days for this (duration: 01m 08s)
* 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
* 13:16 moritzm: installed curl security updates on elastic*, wtp*, db*, virt*, labs*, labmon*, labstore*, es*
* 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 06s)
* 12:38 paravoid: zirconium: rm -rf /var/log2 (last log there from Mar 20th 2014)
* 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]]
* 10:55 jynus: disruption for maintenance starting on labsdb1002 https://lists.wikimedia.org/pipermail/labs-l/2015-June/003766.html
* 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]] (duration: 05m 27s)
* 03:02 logmsgbot: ori Synchronized php-1.26wmf8/includes/User.php: 55e18123ca: Fixed "wfTimestamp() fed bogus time value" (duration: 01m 07s)
* 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]]
* 03:01 logmsgbot: ori Synchronized php-1.26wmf9/includes/User.php: 2f4f1e279d: Fixed "wfTimestamp() fed bogus time value" (duration: 01m 08s)
* 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]] (duration: 05m 42s)
* 02:36 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-10 02:35:44+00:00
* 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]]
* 02:31 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 20s)
* 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - [[phab:T271232|T271232]] [[phab:T273901|T273901]] (duration: 01m 09s)
* 01:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1057 (duration: 01m 08s)
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
* 01:13 logmsgbot: ori Synchronized php-1.26wmf8/extensions/FlaggedRevs: 433fae7f23: Update FlaggedRevs for cherry-picks (duration: 01m 09s)
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
* 01:10 logmsgbot: ori Synchronized php-1.26wmf9/extensions/FlaggedRevs: 2cfc8c9f2b: Update FlaggedRevs for cherry-picks (duration: 01m 09s)
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
* 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]] (duration: 03m 22s)
* 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]]
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
* 12:52 moritzm: installing klibc security updates on buster
* 12:38 moritzm: installing openexr security updates
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
* 10:20 jbond: upgrade golang-cfssl
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
* 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
* 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
* 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 [[phab:T284811|T284811]]
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json


== June 9 ==
== 2021-07-07 ==
* 23:57 logmsgbot: catrope Synchronized php-1.26wmf8/includes/: Avoid parser cache miss that often occurs post-save (duration: 01m 14s)
* 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
* 23:29 logmsgbot: catrope Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: touch (duration: 01m 08s)
* 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (2/2) (duration: 00m 59s)
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoaderOOUIImageModule.php: Fix OOUI image variants (duration: 01m 08s)
* 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (1/2) (duration: 00m 59s)
* 23:22 ori: Deleting unused metrics on graphite2001 (sum_sq and stddev) as well
* 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 05m 28s)
* 23:21 logmsgbot: catrope Synchronized php-1.26wmf9/resources/src/mediawiki/mediawiki.js: Add logging for T101806 private modules (duration: 01m 08s)
* 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
* 23:20 ori: Deleting unused  metrics in graphite1001 (sum_sq and stddev)
* 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 23:19 logmsgbot: catrope Synchronized php-1.26wmf8/resources/src/mediawiki/mediawiki.js: Add logging for T101806 private modules (duration: 01m 08s)
* 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 17m 22s)
* 23:16 logmsgbot: catrope Synchronized wmf-config/CirrusSearch-common.php: fix total breakage of search in wmf9 (duration: 01m 08s)
* 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 22:44 andrewbogott: moving labs-ns0 from virt1000 to labcontrol1001
* 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
* 22:43 andrewbogott: stopping almost everything on virt1000
* 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
* 20:31 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf9
* 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf9 and rebuild l10n cache (duration: 29m 24s)
* 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
* 19:58 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf9 and rebuild l10n cache
* 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
* 19:42 mutante: einsteinium - no console output after reboot command, powercycled, booting again
* 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 mutante: rebooting einsteinium
* 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:28 mutante: restarted apache on mw1227
* 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:30 mutante: wikitech-static: installing bunch of package upgrades on the external wikitech-static VM
* 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:13 cmjohnson1: db1058 replacing failed disk 7
* 14:49 moritzm: installing djvulibre security updates
* 16:20 cmjohnson1: analytics1028 going down for troubleshooting
* 14:05 _joe_: powercycling mw2267, stuck witout network, blank console
* 16:17 kart_: updated cxserver to 4a71145
* 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]] (duration: 05m 41s)
* 15:37 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/Wikidata: SWAT: Update Wikidata - forward compat for usage tracking [[gerrit:216967]] (duration: 01m 17s)
* 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]]
* 15:20 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT take II: Enabled Guided Tour on th.wikipedia [[gerrit:216950]] (duration: 01m 08s)
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:19 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enabled Guided Tour on th.wikipedia [[gerrit:216950]] (duration: 01m 08s)
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:05 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150609 [[gerrit:216622]] (duration: 01m 09s)
* 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]] (duration: 03m 11s)
* 11:09 Krenair: Email set for User:GifTagger@commonswiki per [[phab:T100889]]
* 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]]
* 09:05 akosiaris: uploaded etherpad-lite_1.5.6-2 on apt.wikimedia.org/jessie-wikimedia/main component
* 12:12 urbanecm: Start server-side upload for 3 video files ([[phab:T286173|T286173]], [[phab:T286175|T286175]], [[phab:T286174|T286174]])
* 08:22 akosiaris: upload etherpad-lite_1.5.6-1 on apt.wikimedia.org, jessie-wikimedia dist, main component
* 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun  9 04:34:08 UTC 2015 (duration 34m 7s)
* 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-09 02:27:30+00:00
* 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 12s)
* 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
* 01:42 godog: stop icinga-wm on neon
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
* 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
* 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== June 8 ==
== 2021-07-06 ==
* 23:43 bblack: repooled cp3030/cp1065 in pybal
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 23:11 logmsgbot: ebernhardson Synchronized php-1.26wmf8/extensions/UploadWizard/: Bump UploadWizard in 1.26wmf8 for evening SWAT (duration: 01m 09s)
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 22:21 bblack: depooled cp3030, cp1065 in pybal for ipsec
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:17 subbu: deployed parsoid sha 131554ba
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:18 jynus: RAID degradation (disk failure) on s5 master (db1058), no production impact, replacement on the way
* 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
* 17:13 ottomata: restarted eventlogging services on eventlog1001 after disabling kafka pieces
* 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
* 16:13 _joe_: powercycling tmh1001, console blank, unresponsive to pings
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
* 16:00 logmsgbot: thcipriani Synchronized commonsuploads.dblist: SWAT: Revert Temporarily re-enable uploads on Marathi Wikipedia, for real [[gerrit:216719]] (duration: 01m 07s)
* 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
* 15:58 logmsgbot: thcipriani Synchronized commonsuploads.dblist: SWAT: Revert Temporarily re-enable uploads on Marathi Wikipedia [[gerrit:216719]] (duration: 01m 08s)
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
* 15:40 logmsgbot: thcipriani Synchronized php-1.26wmf8/extensions/Cite: SWAT: Revert Do all of Cite's real work during unstrip and followup [[gerrit:216715]] (duration: 01m 08s)
* 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
* 15:19 Coren: T96063: process halted for now as store/backup is unmovable and on slice5
* 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
* 15:17 logmsgbot: thcipriani Synchronized w/static/images/project-logos/pflwiki.png: SWAT: Fix transparency of pflwiki logo [[gerrit:216595]] (duration: 01m 08s)
* 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
* 15:15 akosiaris: disabled ircecho on neon for a while
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
* 14:53 Coren: T96063: starting pvmove from slice5 to slice2
* 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 14:48 Coren: T96063: dropped volume slice1 from vg store
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 14:46 Coren: T96063: dropped store/project
* 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 14:44 Coren: starting https://phabricator.wikimedia.org/T96063 on labstore1001
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 14:24 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool es1005 (duration: 01m 08s)
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
* 14:23 Coren: rsync in progress between labstore1001:store/backup and labstore1002:backup/backup (at ionice idle)
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
* 14:13 Coren: created store/backup snapshot on labstore1001 for backup copy
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
* 13:03 moritzm: added strongswan_5.3.0-1+wmf2 to jessie-wikimedia on carbon
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
* 11:42 _joe_: purging squid cache on carbon
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
* 11:26 moritzm: updated mc2* to 2:2.8.17-1+deb8u1
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
* 10:55 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool es1007 (duration: 01m 08s)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
* 10:27 akosiaris: disabled puppet on uranium, investigating ganglia problems
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
* 10:05 akosiaris: ganglia gmetad problems
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
* 05:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun  8 05:24:08 UTC 2015 (duration 24m 7s)
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-08 02:25:12+00:00
* 10:19 moritzm: installing jackson-databind security updates on buster
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 07s)
* 09:01 _joe_: repooling wdqs1007 now that lag has caught up
* 08:43 moritzm: installing libuv1 security updates on buster
* 07:06 marostegui: Upgrade db1104 kernel
* 06:54 moritzm: installing PHP 7.3 securiy updates on buster
* 06:50 marostegui: Upgrade db1122 kernel
* 06:35 marostegui: Upgrade db1138 kernel
* 06:31 marostegui: Upgrade db1160 kernel
* 00:56 eileen: process-control config revision is {{Gerrit|8d46b52ed4}}


== June 7 ==
== 2021-07-05 ==
* 23:27 godog: reboot ms-be2008 sdg failed, xfs unhappy
* 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image ([[phab:T286212|T286212]])
* 07:03 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 01m 09s)
* 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
* 05:16 andrewbogott: we did a whole lot of things to labstore1001 while morebots was away
* 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedoa
* 05:14 andrewbogott: service nfs-kernel-server restart on labstore1001
* 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 [[phab:T286042|T286042]]
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-07 02:25:13+00:00
* 11:29 moritzm: installing openexr security updates on stretch
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 09s)
* 11:07 moritzm: installing tiff security updates on stretch
* 10:48 moritzm: upgrading PHP on miscweb*
* 10:37 jbond: enable puppet  fleet wide to post puppetdb change
* 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication [[phab:T286102|T286102]]
* 10:27 jbond: disable puppet fleet wide to preforem puppetdb change
* 08:15 moritzm: rolling out debmonitor-client 0.3.0
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:04 _joe_: restarting blazegraph, then restarting the updater again
* 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
* 06:47 _joe_: restart wdqs-updater on wdqs1007
* 00:53 eileen: process-control config revision is {{Gerrit|a1717c7fde}}
* 00:47 eileen: process-control config revision is {{Gerrit|24565578f7}}


== June 6 ==
== 2021-07-04 ==
* 23:46 subbu: deployed parsoid 5172a446 (cherry-pick of 719c736f) -- hotfix for T101599
* 17:43 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957{{!}}Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
* 05:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun  6 05:47:40 UTC 2015 (duration 47m 39s)
* 08:02 elukey: repool eqsin after equinix maintenance - [[phab:T286113|T286113]]
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-06 02:30:24+00:00
* 02:26 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 10s)


== June 5 ==
== 2021-07-03 ==
* 22:42 godog: powercycle graphite2001, no console no ssh
* 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - [[phab:T286113|T286113]]
* 22:06 andrewbogott: restarted apache on virt1000
* 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for [[phab:T283659|T283659]]
* 20:49 ori: Upgrading hhvm-fss on application servers to 1.1.7; expect brief 5xx spike.
* 08:53 Amir1: patching postorius and mailmanclient on lists1001 for [[phab:T283659|T283659]]
* 20:14 logmsgbot: demon Synchronized php-1.26wmf8: live hack (duration: 02m 32s)
* 20:10 mutante: apt-get upgrade on terbium
* 19:52 godog: bounce redis on rdb1001/rdb1003 to pick up new slave limits
* 19:51 mutante: chown root:root / on terbium
* 19:50 godog: bounce redis on rdb1002/rdb1004 to pick up new slave limits
* 19:29 godog: bounce redis again on rdb1003 after increasing the slave limits more
* 19:17 godog: bounce redis on rdb1003 after bumping slave limits
* 19:07 godog: redis master logs shows periodic 'cmd=sync scheduled to be closed ASAP for overcoming of output buffer limits.' indicating the slave fails to sync
* 18:40 godog: spike in redis network starting at ~15.00 UTC, correlates with ocg failures
* 18:01 moritzm: restarted gerrit on ytterbium for java update
* 14:43 jynus: short lag period on db1049, traffic automatically redirected to other slave and back to normal
* 14:07 moritzm: added ubuntu-meta-1.325+wmf1 for trusty-wikimedia to apt.wikimedia.org (T100004)
* 14:07 moritzm: added ubuntu-meta-1.267.1+wmf1 for precise-wikimedia to apt.wikimedia.org (T100004)
* 12:44 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1007 (duration: 01m 08s)
* 12:08 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1009 (duration: 01m 08s)
* 11:30 _joe_: uploaded new HHVM package, installing on mw1025 for testing
* 09:17 moritzm: added redis_2.6.13-1+wmf1 to precise-wikimedia on apt.wikimedia.org
* 06:24 moritzm: added redis_2.8.4-2+wmf1 to trusty-wikimedia on apt.wikimedia.org
* 05:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun  5 05:22:50 UTC 2015 (duration 22m 49s)
* 04:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 01m 08s)
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-05 02:25:20+00:00
* 02:21 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 09s)
* 01:27 tgr: deploying schema changes for Gather on enwiki, enwikivoyage, hewiki (T98490, T101460)
* 00:08 logmsgbot: catrope Synchronized php-1.26wmf8/vendor/oojs/oojs-ui/php/Tag.php: Fix OOUI fatals (T99210) (duration: 00m 13s)


== June 4 ==
== 2021-07-02 ==
* 23:40 logmsgbot: catrope Synchronized php-1.26wmf8/extensions/MobileFrontend: SWAT (duration: 00m 13s)
* 22:06 foks: removing three files for legal compliance
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Disable VE A/B test for new accounts on enwiki (duration: 00m 13s)
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:39 ejegg: updated payments from d22e44e3fab2b937707c2776384cb93a49b4cfd3 to 43c7952d2a31deaea97e8319f5612d644dce43c8
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:21 ottomata: doing controlled restart of kafka brokers services to apply auto create topic config
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:48 jgage: analyics1013 crashed, rebooted
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:42 logmsgbot: ori Synchronized php-1.26wmf8/includes/libs/ReplacementArray.php: 1b20d62c26: Revert "awful hack: disable fss on zhwiki only, except on mw1017" (duration: 00m 13s)
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:34 ori: performing rolling restart of HHVMs for hhvm-fss upgrade
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:27 bd808: restarted logstash and elasticsearch on logstash100[1-3] to pick up latest jre updates
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:48 mutante: restarted apache on silver/wikitech
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:20 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1009 and master-slave switchover (duration: 00m 13s)
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:01 awight: Enabling PayPal audit parser job
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:57 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1008 (duration: 00m 15s)
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2008 and its slaves (duration: 00m 13s)
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 17:21 ori: Disabling Puppet and nutcracker on mw1017 to control for parser cache
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2008 and its slaves (duration: 00m 13s)
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 17:17 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1008 (duration: 00m 12s)
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 16:33 logmsgbot: kartik Finished scap: Update ContentTranslation (duration: 09m 17s)
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 16:23 logmsgbot: kartik Started scap: Update ContentTranslation
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 15:54 moritzm: added redis_2.8.4-2+wmf1 to trusty-wikimedia on apt.wikimedia.org
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 15:48 logmsgbot: anomie Synchronized php-1.26wmf8/includes/jobqueue/: SWAT: jobqueue: Record stats on how long it takes before a job is run [[gerrit:215748]] (duration: 00m 14s)
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 15:38 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ApiFeatureUsage everywhere [[gerrit:215901]] (duration: 00m 19s)
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 15:36 logmsgbot: anomie Synchronized wmf-config/CommonSettings.php: SWAT: Remove obsolete 'ValidateExtendedMetadataCache' hook [[gerrit:215900]] (duration: 00m 12s)
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 15:35 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Added staff-recommender campaign [[gerrit:215865]] (duration: 00m 12s)
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 15:30 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150406 [[gerrit:215281]] (duration: 00m 12s)
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 15:12 logmsgbot: ori Synchronized php-1.26wmf8/includes/libs/ReplacementArray.php: Ia5f3dc84605: awful hack: disable fss on zhwiki only, except on mw1017 (duration: 00m 17s)
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 15:09 _joe_: puppet disabled, fss disabled on mw1017
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 14:42 YuviPanda: running sudo sed -i 's/GlobalSign_CA.pem/ca-certificates.crt/' /etc/ldap/ldap.conf on all labs nodes
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 14:36 awight: Disable PayPal audit parsing job
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 12:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072, warm up (duration: 00m 13s)
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 05:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun  4 05:11:32 UTC 2015 (duration 11m 31s)
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 02:30 logmsgbot: LocalisationUpdate completed (1.26wmf8) at 2015-06-04 02:28:54+00:00
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf8/cache/l10n: (no message) (duration: 07m 22s)
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 13:11 mutante: mw2380 - rebooting
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 12:24 moritzm: added btullis to pwstore
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 09:19 dcausse: restart blazegraph on wdqs1013
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 09:04 mutante: decom'ing mw1267
* 09:02 moritzm: installing node-hosted-git-info security updates
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 08:54 moritzm: installing  golang-docker-credential-helpers security updates
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:03 moritzm: installing ipmitool security updates
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])


== June 3 ==
== 2021-07-01 ==
* 23:42 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing ImportSource change for meta (duration: 00m 13s)
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 23:34 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile, take 2 (duration: 00m 12s)
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 23:26 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: syncing config change for mediawiki logo on mobile (duration: 00m 12s)
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 23:25 logmsgbot: kaldari Synchronized images/mobile/mediawiki.png: syncing mediawiki logo for mobile (duration: 00m 12s)
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 22:02 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on ukwiki and viwiki (duration: 00m 15s)
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 21:58 mutante: restarted gitblit
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 21:53 logmsgbot: ori Synchronized php-1.26wmf8/includes/resourceloader/ResourceLoader.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (did not sync correct file previously) (duration: 00m 12s)
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 21:20 andrewbogott: restarting pdns on virt1000 and labcontrol1001
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 21:05 Jamesofur: decryption key for Board Election insert into voteWiki
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 20:58 bblack: repooling ns0 -> radon AuthDNS
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 20:55 bblack: depooling ns0 -> radon AuthDNS (rebooting for kernel update)
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 20:50 hashar: restarted zuul entirely to remove some stalled jobs
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 20:29 paravoid: kafka preferred-replica-election on an1021
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 20:28 hashar: Restarting Jenkins to release a deadlock
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 20:23 logmsgbot: ori Synchronized php-1.26wmf8/resources/Resources.php: 7f49853fc9: ResourceLoader::filter: use APC when running under HHVM (duration: 00m 13s)
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 20:19 subbu: deployed parsoid sha ab675400
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 19:08 bblack: changed ops/puppet repo to ff-only in gerrit config, feel free to scream/revert if necc!
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 18:46 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: All wikis to 1.26wmf8, no new branch until next Tuesday, June 9th
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 18:42 logmsgbot: twentyafterfour Finished scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2) (duration: 07m 14s)
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:35 logmsgbot: twentyafterfour Started scap: Delete stale branch symlinks (1.26wmf1,1.26wmf2)
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 15:16 logmsgbot: legoktm Synchronized wmf-config/InitialiseSettings.php: Remove references to $wgEchoCohortInterval (duration: 00m 12s)
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:16 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Change default extension distributor branch to REL1_25 (duration: 00m 15s)
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 15:15 bblack: repooling ns1->baham DNS traffic
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 15:07 bblack: depooling ns1->baham DNS traffic for kernel update
* 14:58 papaul: poweroff mw2380 for disk replacement
* 15:00 moritzm: added linux 3.19.3-5 for jessie-wikimedia on apt.wikimedia.org
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 14:46 bblack: restarted hhvm on mw1195, seems to be a case of https://phabricator.wikimedia.org/T89912
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 14:32 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase usage tracking on huwiki (duration: 00m 12s)
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .