You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(248 intermediate revisions by 3 users not shown)
== 2020-10-20 ==
== 2021-08-03 ==
* 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact is that new updates will be delayed, but queries will still keep serving as normal, so fixing this is a priority but note that there's no availability outage
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 17:48 effie: depooling mw2328 - [[phab:T266052|T266052]]
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 15:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args (duration: 01m 31s)
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 15:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@629e8bc]: search satisfaction: remove unused y/m/d cli args
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 15:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 15:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 14:58 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|fee2d3be13ae14d7ea51ff2db42090a1c27819bf}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 03s)
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 14:56 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/AbuseFilter/includes/Views/AbuseFilterViewList.php: {{Gerrit|00ef00f59fd2a7a1366161ccc66c260be20e3e50}}: Prevent uncaught warnings/exception on Special:AbuseFilter ([[phab:T265994|T265994]]) (duration: 01m 01s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:48 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/FileImporter/: {{Gerrit|5eee9b773338e5181867cabec9faefbdeacf67ca}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 06s)
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:16 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/FileImporter/: {{Gerrit|5f8d3de14c116b618f5226419082d5c9a07766fb}}: Set originalRequest (incl. X-Forwarded-For) for remote edits ([[phab:T265810|T265810]]) (duration: 01m 09s)
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 14:15 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13033 and previous config saved to /var/cache/conftool/dbconfig/20201020-135436-root.json
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 80%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13032 and previous config saved to /var/cache/conftool/dbconfig/20201020-133933-root.json
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 60%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13031 and previous config saved to /var/cache/conftool/dbconfig/20201020-132430-root.json
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 13:19 XioNoX: install routinator 3000 0.8.0 on rpki2001 - [[phab:T266001|T266001]]
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 13:16 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.14
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:11 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.14 (duration: 58m 03s)
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13030 and previous config saved to /var/cache/conftool/dbconfig/20201020-130926-root.json
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after checking tables ', diff saved to https://phabricator.wikimedia.org/P13029 and previous config saved to /var/cache/conftool/dbconfig/20201020-125423-root.json
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 12:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 12:24 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 12:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.14
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:37 liw: 1.36.0-wmf.14 was branched at {{Gerrit|1b7b5f716015f9303d37158820dadf759e8db707}} for [[phab:T263180|T263180]]
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 11:35 Lucas_WMDE: EU backport/config window done
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Backport: [[gerrit:635030{{!}}SearchSatisfaction: Set isAnon field (T259250)]] (duration: 00m 57s)
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634039{{!}}Set Wikidata MF to collapse sections by default (T239195)]] (duration: 00m 56s)
* 16:59 hashar: Gerrit has been upgraded
* 11:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634938{{!}}Remove noratelimit from Wikidata bot group (T258354)]] (duration: 00m 56s)
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 10:09 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 10:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 16:45 hashar: Stopping Gerrit for upgrade
* 09:59 dcausse: [[phab:T255399|T255399]]: resuming wdqs-data-reload manually from chunk no 776 on wdqs1009
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 09:51 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 09:50 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:25 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root, causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-10-19 ==
== 2021-08-02 ==
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:57 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:57 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:56 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation (duration: 04m 33s)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4bfd6c9]: spark: case insensitive schema validation
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:02 mutante: etherpad got restarted with new config options related to rate limiting - hopefully this fixed [[phab:T265490|T265490]]
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 21:19 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions (duration: 04m 48s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@94c23a1]: airflow: fix column mismatch writing page predictions
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:01 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 21:16 tzatziki: removing 7 files for legal compliance
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 eileen: drush vset match_on_import 1
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:00 urbanecm: Morning B&C window completed
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 20:21 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 20:18 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:17 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 20:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 20:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item (duration: 01m 03s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 20:16 mutante: decom'ing wtp201[0-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:15 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp201[0-9].codfw.wmnet
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 20:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@e66bec2]: Fix column mismatch when reading discovery.wikibase_item
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 dzahn@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=parsoid,service=canary
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 20:08 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 20:01 mutante: decom'ing wtp200[1-9].codfw.wmnet (pooled=inactive) [[phab:T265558|T265558]]
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 20:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: dc=codfw,name=wtp200[1-9].codfw.wmnet
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 19:57 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:57 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 19:52 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 19:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:45 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads (duration: 03m 35s)
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 19:41 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3c590e2]: Fix column mismatch for discovery.wikibase_item and multilist handler for esbulk uploads
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 19:35 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 19:34 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 19:33 mutante: wtp2001 - sudo confctl decommission
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 19:29 dzahn@cumin1001: conftool action : set/weight=0; selector: dc=codfw,cluster=parsoid,service=canary
* 12:20 mutante: gerrit servers: disabling puppet
* 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Set default variant to D on trwiki ([[phab:T243445|T243445]], [[phab:T265556|T265556]]) (duration: 00m 56s)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18902aa75efafb7d56ca347c12781dbe59f2f8ad}}: Change votewiki language temporarily to fa for fawiki elections ([[phab:T262689|T262689]]) (duration: 00m 56s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 18:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable and configure GrowthExperiments on trwiki ([[phab:T243445|T243445]]) (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 18:29 tzatziki: removing 10 files for legal compliance
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 18:24 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/MobileFrontend/: Fix mobile diff redirect when curid parameter is present ([[phab:T265654|T265654]]) (duration: 00m 58s)
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 18:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable variant C/D for new users ([[phab:T265556|T265556]]) (duration: 00m 56s)
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Drop wgHiddenPrefs hack for VE beta feature ([[phab:T254349|T254349]]) (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 17:53 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
* 11:27 hashar: restarting Jenkins on contint2001
* 16:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 hashar: restarting Jenkins on contint1001
* 16:16 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:59 Urbanecm: mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=smnwiki --cluster=all
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:31 elukey: update puppet compilers' facts
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 14:36 bpirkle@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:634841 Add api.wikimedia.org to the list of allowed CORS origins (duration: 00m 57s)
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:32 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 55s)
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 14:30 bpirkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:634356 Configuration for user menu and sidebar special pages (duration: 00m 56s)
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:15 moritzm: installing llvm-toolchain-7 bugfix updates from Buster point release
* 11:13 urbanecm: EU B&C window completed
* 13:34 Urbanecm: Start of `[urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > output/$wiki.log; done < wikis.dblist` ([[phab:T246539|T246539]]; wikis.dblist is medium wikis from group2.dblist)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 13:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:08 moritzm: installing openjdk-11 security updates
* 13:32 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 13:31 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 13:26 moritzm: import prometheus-openldap-exporter 0+git20171128-2+deb10u1 for buster-wikimedia [[phab:T264388|T264388]]
* 07:24 moritzm: installing libsndfile security updates on buster
* 12:48 moritzm: installing httpcomponents-client security updates on Buster
* 07:12 moritzm: installing aspell security updates
* 12:26 Urbanecm: Creation of smnwiki is done ([[phab:T264859|T264859]])
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 56s)
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 urbanecm@deploy1001: Synchronized langlist: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 12:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:16 marostegui: Sanitize smnwiki on db1124:3315 and db2094:3315 - [[phab:T264900|T264900]]
* 12:15 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 12:15 marostegui: Deploy schema change on smnwiki [[phab:T265321|T265321]] [[phab:T264900|T264900]]
* 12:14 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating smnwiki ([[phab:T264859|T264859]])
* 12:12 urbanecm@deploy1001: Synchronized dblists: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 55s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating smnwiki ([[phab:T264859|T264859]]) (duration: 00m 56s)
* 11:51 moritzm: updating idp-test1001 to CAS 6.2.4
* 11:46 moritzm: updating idp-test2001 to CAS 6.2.4
* 11:43 Urbanecm: End of `[urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist` # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` ([[phab:T246539|T246539]])
* 11:40 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$ while read wiki; do echo "Processing $wiki"; mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log; done < ../small-group2.dblist # [[phab:T246539|T246539]] # small-group2.dblist is wikis from small.dblist that are also in group2.dblist
* 11:31 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:24 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ce92c9814bf9c12cab1a9592dfb32f935d255d93}}: Restore bureaucrat abilities at uzwiki ([[phab:T265746|T265746]]) (duration: 00m 56s)
* 11:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26b97261f2b9d1991ea08fe32b6007ba6fe5088f}}: Disable EditorJourney (UnderstandingFirstDay) ([[phab:T252391|T252391]]) (duration: 01m 10s)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:13 Urbanecm: Manually run `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` for several small group2 wikis ([[phab:T246539|T246539]])
* 10:57 Urbanecm: Start `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=enwikisource --print-orphaned-records-to=/tmp/urbanecm/enwikisource-orphaned.log --progress-markers` in a tmux session named updateVarDumps at mwmaint2001 ([[phab:T246539|T246539]])
* 10:53 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/script]$  mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=jawikivoyage --print-orphaned-records-to=- --progress-markers # [[phab:T246539|T246539]]
* 09:09 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:40 jayme: updated helm to 2.16.12-1 on deploy*,chartmuseum*,contint*
* 08:37 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog2001 - [[phab:T259780|T259780]]
* 08:31 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:26 jayme: updated helm to 2.16.12-1 on deploy2001
* 08:24 jayme: imported helm 2.16.12-1 to buster-wikimedia stretch-wikimedia jessie-wikimedia - [[phab:T263616|T263616]]
* 08:01 godog: re-enable compaction for prometheus[12]003 - [[phab:T261281|T261281]]
* 07:53 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 07:36 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 07:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 ', diff saved to https://phabricator.wikimedia.org/P13022 and previous config saved to /var/cache/conftool/dbconfig/20201019-071614-marostegui.json
* 06:46 elukey@deploy1001: Finished deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27 (duration: 00m 10s)
* 06:45 elukey@deploy1001: Started deploy [analytics/turnilo/deploy@334627e]: Upgrade to 1.27


== 2020-10-17 ==
== 2021-07-31 ==
* 13:22 Urbanecm: [urbanecm@mwmaint2001 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Fæ . # [[phab:T264529|T264529]]
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}


== 2020-10-16 ==
== 2021-07-30 ==
* 21:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:43 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:25 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 19:39 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 19:37 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 17:43 thcipriani: restarting gerrit due to gc thrashing
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:25 andrew@deploy1001: Finished deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors (duration: 04m 08s)
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:21 andrew@deploy1001: Started deploy [horizon/deploy@89b308c]: prevent creation of VMs with non-ceph flavors
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:36 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:01 bblack@cumin1001: START - Cookbook sre.hosts.decommission
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 13:41 effie: pooling mw2279.codfw.wmnet [[phab:T264698|T264698]]
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 12:11 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 reedy@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/ProofreadPage/: Revert excessive escaping [[phab:T265571|T265571]] (duration: 01m 12s)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:23 ema: text@esams (except for cp3050/cp3052): upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 09:19 ema: upload@esams: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 ema: upload@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 09:03 XioNoX: eqsin, push CR 634473
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 09:01 ema: text@eqsin: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 08:53 ema: upload@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 08:52 XioNoX: add BGP_IXP_RS_in to eqsin RS BGP sessions
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:48 ema: text@codfw: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 08:29 ema: upload@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 08:24 ema: text@eqiad: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 08:09 elukey: reboot stat1005/stat1008 to pick up correct GPU settings
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 08:09 ema: upload@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka-webrequest [[phab:T264074|T264074]]
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 07:59 ema: text@ulsfo: upgrade varnish to 6.0.6-1wm2, restart varnishkafka instances [[phab:T264074|T264074]]
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 07:19 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table (duration: 04m 22s)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 07:15 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@27d0b01]: cirrus namespace map: Align output columns with table
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 06:57 XioNoX: enable cr2-eqdfw:xe-0/1/2
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 02:14 eileen: civicrm revision changed from {{Gerrit|585eb835d8}} to {{Gerrit|3c3dcf80ae}}, config revision is {{Gerrit|f76d7849bc}}
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 01:01 ryankemper: Cleaning up a dangling no-longer-puppet-managed udev elasticsearch-readahead rule across all cirrus instances: `sudo cumin -b 36 C:profile::elasticsearch::cirrus 'sudo rm -fv /etc/udev/rules.d/elasticsearch-readahead.rules && sudo /sbin/udevadm control --reload && sudo /sbin/udevadm trigger'`
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 00:56 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 00:56 cdanis@cumin1001: START - Cookbook sre.network.cf
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json


== 2020-10-15 ==
== 2021-07-29 ==
* 23:49 ryankemper: Began in-place reindex of `eqiad`, `codfw`, and `cloudelastic`. Running on `ryankemper@mwmaint2001` under tmux sessions `inplace_reindex_[eqiad, codfw, cloudelastic]`
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:00 krinkle@deploy1001: Synchronized wmf-config/env.php: {{Gerrit|I245e84e0b8c}} (duration: 01m 10s)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 22:09 cdanis: previous sre.network.cf invocation was a no-op; just checking status
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 22:08 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs [[phab:T281157|T281157]]
* 22:08 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 22:06 mutante: depooled remaining wtp* servers in codfw. old parsoid servers, new servers are parse2* ([[phab:T265558|T265558]])
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp2020.codfw.wmnet
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[6-9].codfw.wmnet
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 21:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp201[0-5].codfw.wmnet
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 20:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 20:27 cdanis@cumin1001: START - Cookbook sre.network.cf
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 19:46 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources (duration: 06m 22s)
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:43 marxarelli: all wikis promoted to 1.36.0-wmf.13 ([[phab:T263179|T263179]])
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 19:39 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@88e1283]: spark: fix handling of unpartitioned data sources
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 19:33 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.13
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 19:30 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 19:23 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 19:20 catrope@deploy1001: Synchronized php-1.36.0-wmf.11/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 29s)
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 19:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/DiscussionTools/: Correctly generate timezone abbreviations for parsing ([[phab:T265500|T265500]]) (duration: 01m 51s)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 19:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/Echo/: Drop text indent in modern Vector ([[phab:T264339|T264339]]) (duration: 01m 51s)
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 19:09 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/: Vertically align personal tools ([[phab:T264339|T264339]]) (duration: 01m 43s)
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 19:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/WikimediaEvents/: Revert "clientError: Adds is_logged_in tag to aid filtering" ([[phab:T256173|T256173]]) (duration: 01m 58s)
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 19:04 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/UploadWizard/: Work around LESS calculating calc() values wrong ([[phab:T265560|T265560]]) (duration: 02m 07s)
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 18:32 mutante: depooling wtp2005 through wtp2009 (parsoid, old server generation) [[phab:T265558|T265558]]
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[6-9].codfw.wmnet
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 18:07 mutante: mx1001/mx2001: made previous live hack official and added benefactors@wikipedia alias, re-enabling puppet
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 17:51 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:46 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:11 vgutierrez: restart pybal on lvs2009
* 17:19 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:09 vgutierrez: restart pybal on lvs2010
* 17:17 jbond42: deleting old pcc reports in compiler1002 to free disk space
* 14:07 vgutierrez: restart pybal on lvs2008
* 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
* 14:05 vgutierrez: restart pybal on lvs2007
* 17:06 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:59 vgutierrez: restart pybal on lvs1014
* 17:05 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:55 vgutierrez: restart pybal on lvs1015
* 17:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:52 _joe_: restarting pybal on lvs1016
* 16:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 16:57 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 16:56 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 16:54 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 16:51 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 16:50 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 16:48 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 16:46 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 16:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:47 moritzm: installing MariaDB 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 16:25 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 16:11 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 16:11 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 15:53 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/CheckUser/includes/specials/: {{Gerrit|fd94002cf6070180a289296ec65ad224e5a0ae67}}: Revert "Validate username input before constructing subpage links" ([[phab:T265606|T265606]]) (duration: 02m 48s)
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:50 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:52 moritzm: restarting Tomcat on idp-test
* 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 15:35 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 15:19 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
* 15:07 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs (duration: 00m 59s)
* 15:06 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@500bdad]: spark: correctly parse non-partitioned partition specs
* 14:51 elukey: roll restart druid-historical daemons on druid1004-1008 to pick up new conn pooling changes
* 14:51 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers
* 14:45 jbond42: enable puppet post deploy puppetdb change blacklisting dynamic facts
* 14:41 ema: varnish 6.0.6-1wm2 uploaded to apt.wikimedia.org component/varnish6 [[phab:T264074|T264074]]
* 14:38 jbond42: disable puppet to deploy puppetdb change blacklisting dynamic facts
* 14:21 ema: cp3050: systemctl reload varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 14:21 jayme: imported doxygen_1.8.19-1~deb10+wmf1 to component/ci buster-wikimedia - [[phab:T265579|T265579]]
* 14:12 ema: cp3050: restart varnishkafka-webrequest w/ libvarnishapi2 6.0.6-1wm2 [[phab:T264074|T264074]]
* 14:11 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T264074|T264074]]
* 14:10 ema: cp3050: upgrade varnish to 6.0.6-1wm2 [[phab:T26407|T26407]]
* 12:58 gilles@deploy1001: Finished deploy [performance/navtiming@dff55f8]: (no justification provided) (duration: 00m 05s)
* 12:58 gilles@deploy1001: Started deploy [performance/navtiming@dff55f8]: (no justification provided)
* 12:12 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:47 vgutierrez: restart ats-backend on cp3050
* 10:00 akosiaris: [[phab:T264209|T264209]]. Initiate a docker pull of docker-registry.discovery.wmnet/mwcachedir:0.0.1 from all kubernetes and kubernetes staging nodes.
* 08:17 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 04:27 ryankemper: Rolling upgrade for cirrus `codfw` complete
* 04:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 02:18 ryankemper: Rolling upgrade for cirrussearch `codfw` beginning
* 02:18 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 02:14 ryankemper: Rolling upgrade for cirrussearch `eqiad` is complete
* 02:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-upgrade (exit_code=0)
* 00:36 ryankemper: Beginning rolling upgrade for cirrussearch `eqiad`. Cookbook will restart elasticsearch on 36 nodes total, 3 nodes at a time
* 00:36 eileen: tools revision changed from {{Gerrit|d4e08c52de}} to {{Gerrit|a2a91d6c6a}}
* 00:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-upgrade
* 00:24 twentyafterfour: phabricator update was uneventful
* 00:13 twentyafterfour: updating phabricator


== 2020-10-14 ==
== 2021-07-28 ==
* 23:35 foks: Removing one further file for legal compliance
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:28 foks: Removing nine files for legal compliance
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 23:11 ebernhardson: Synchronized wmf-config/InitialiseSettings.php to sync reduction of cirrus morelike query cache from 3 back to 1 day
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 23:08 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no justification provided) (duration: 01m 04s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:00 dwisehaupt: all payments hosts in eqiad are now running the REL1_35 code.
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 22:41 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression (duration: 02m 25s)
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 22:38 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@9ce273f]: bulk_daemon: revert of streaming gzip decompression
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 22:13 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 22:12 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 22:08 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive (duration: 03m 44s)
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 22:04 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@04548dd]: spark: centralize reading/writing to hive
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 22:01 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/NavigationTiming: BACON: [[gerrit:634002{{!}}Make attribution source logic more defensive]] [[phab:T263599|T263599]] (duration: 01m 05s)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 21:51 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling image preconnect in group0 ([[phab:T123582|T123582]]) (duration: 01m 03s)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 21:33 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.13/skins/Vector/resources/skins.vector.styles/Menu.less: BACON: [[gerrit:634086{{!}}Stylesheet needs to be compatible with cached HTML]] [[phab:T265543|T265543]] (duration: 01m 07s)
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 20:39 marxarelli: group1 rolled back to 1.36.0-wmf.11 due to malformed html in nav. task incoming (cc: [[phab:T263179|T263179]])
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 20:37 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.11
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 20:32 marxarelli: rolling back group1 due to malformed html in nav menu
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 19:46 marxarelli: 1.36.0-wmf.13 promoted to group1. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 19:39 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.13 (duration: 01m 03s)
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 19:38 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.13
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 19:33 mutante: mx1001/mx2001 - temp. disabled puppet, live hacking urgent alias change since private repo needs to be fixed
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 19:14 mutante: depooling 5 of the older parsoid servers in codfw
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=wtp200[1-5].codfw.wmnet
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:28 Urbanecm: wikiadmin@10.192.0.6(wikidatawiki)> DELETE FROM watchlist WHERE wl_user=104889; # [[phab:T265347|T265347]]
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6a56bb7fb762c53db5965f2698a93db2433d33d}}: Add rollbacker right on uzwiki ([[phab:T265509|T265509]]) (duration: 01m 04s)
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0da89998e4e380f3ebe527a42a47dc66c49ee4d2}}: Add spamblacklistlog as a default right for the CU log user ([[phab:T239288|T239288]]) (duration: 01m 05s)
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 16:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:59 elukey: drain + reboot an-worker1100 to pick up GPU settings - [[phab:T255138|T255138]]
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:29 elukey: drain + reboot an-worker110[1,2] to pick up GPU settings - [[phab:T255138|T255138]]
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 15:28 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:24 jayme: enabled and ran puppet on deploy1001 - [[phab:T260917|T260917]]
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:56 elukey: drain + reboot an-worker109[8,9] to pick up GPU settings - [[phab:T255138|T255138]]
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:55 elukey@cumin1001: START - Cookbook sre.hadoop.reboot-workers
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:12 jayme: disable-puppet on deploy1001 to test a change in helmfile puppet on deploy2001 only - [[phab:T260917|T260917]]
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T264209|T264209]]
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 14:01 akosiaris: push a 6GB image, named docker-registry.discovery.wmnet/mwcachedir:0.0.1, containing the cache/ dir of a mediawiki installation to the registry. [[phab:T265183|T265183]]
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:53 jbond42: enable puppet fleet wide post - convert puppetdb stockpile queue to tmpfs
* 13:08 moritzm: installing python3.5 security updates on stretch
* 13:48 jbond42: disable puppet fleet wide to convert puppetdb stockpile queue to tmpfs
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:46 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 10% - [[phab:T258405|T258405]]
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:50 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:48 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 11:43 moritzm: imported php-memcached, php-redis to component/icu63 [[phab:T264991|T264991]]
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 11:25 Urbanecm: EU B&C window completed
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c63632de6a20b2f00da91187e5cf416fd39d8c5b}}: Enable DiscussionTools as a beta feature on 30 more wikis ([[phab:T264693|T264693]]) (duration: 01m 15s)
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 11:16 moritzm: imported php-igbinary, php-apcu-bc to component/icu63 [[phab:T264991|T264991]]
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:59 moritzm: imported php-wmerrors, tideways, tideways-xhprof, wikidiff2, xdebug to component/icu63 [[phab:T264991|T264991]]
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:28 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:09 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12988 and previous config saved to /var/cache/conftool/dbconfig/20201014-071440-root.json
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12987 and previous config saved to /var/cache/conftool/dbconfig/20201014-065936-root.json
* 08:27 Amir1: running several long-running queries against pc1007
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12986 and previous config saved to /var/cache/conftool/dbconfig/20201014-064433-root.json
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 40%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12985 and previous config saved to /var/cache/conftool/dbconfig/20201014-062930-root.json
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 20%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12984 and previous config saved to /var/cache/conftool/dbconfig/20201014-061426-root.json
* 07:53 moritzm: installing aspell security updates on stretch
* 06:12 marostegui: Change UNIQUE into KEY on enwikivoyage.imagelinks [[phab:T265445|T265445]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 30%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12983 and previous config saved to /var/cache/conftool/dbconfig/20201014-055923-root.json
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: Slowly repool db2125 after on-site maintenance [[phab:T260670|T260670]] ', diff saved to https://phabricator.wikimedia.org/P12982 and previous config saved to /var/cache/conftool/dbconfig/20201014-054420-root.json
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-10-13 ==
== 2021-07-27 ==
* 23:22 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: Revert removal of variant A ([[phab:T265372|T265372]]) (duration: 01m 04s)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Rename GrowthExperiments help desk on ptwiki ([[phab:T265214|T265214]]) (duration: 01m 04s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable event logging in MediaViewer ([[phab:T260582|T260582]]) (duration: 01m 04s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 23:07 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable watchlist expiry on frwiki, fawiki, dewiki, cswiki ([[phab:T264780|T264780]]) (duration: 01m 04s)
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 21:16 mutante: icinga had a gerrit health alert, but I did not notice an issue myself and it was gone at the next check
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 21:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 21:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 21:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 20:44 mutante: bast1002 - apt-get autoremove - cleans up golang and ruby packages
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 20:44 mutante: bast1002 - apt-get remove nmap (it can be used on netmon hosts and was not consistent with other bast hosts)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 20:15 ebernhardson: unban elastic2029 from production-search-psi-codfw
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 ebernhardson: restart production-search-psi-codfw on elastic2029 to reset any wonkiness from gc hell
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:06 marxarelli: 1.36.0-wmf.13 promoted to group0. no new or concerning errors or changes in error rates ([[phab:T263179|T263179]])
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 20:03 ebernhardson: add elastic2029-production-search-psi-codfw to cluster.routing.allocation.exclude._name to drain active shards, instance currently in gc hell
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 19:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.13
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 19:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 19:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 19:40 dduvall@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.13 (duration: 40m 51s)
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 19:00 dduvall@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.13
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 18:58 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.9 (duration: 01m 56s)
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 18:56 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.8 (duration: 02m 10s)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 18:53 dduvall@deploy1001: Pruned MediaWiki: 1.36.0-wmf.6 (duration: 13m 00s)
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 18:23 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.11
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 18:21 marxarelli: 1.36.0-wmf.11 promoted to group1. no new errors ([[phab:T263177|T263177]]). promoting to all wikis
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 18:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 18:09 robh: scs-c1-codfw mgmt firmware updated, updating scs-a1-codfw [[phab:T238036|T238036]]
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 18:08 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:01 robh: scs-c1-codfw firmware update via [[phab:T238036|T238036]]
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 17:47 marxarelli: 1.36.0-wmf.13 branched at {{Gerrit|a6be801fc6331a6a6b96f02f368750200d50ab09}} for [[phab:T263179|T263179]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 17:35 dduvall@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 (duration: 01m 07s)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 17:34 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 17:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 17:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 17:30 marxarelli: 1.36.0-wmf.11 promoted to group0. no new errors ([[phab:T263177|T263177]]). preparing to promote to group1
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 17:18 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 17:17 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 17:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 17:15 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 16:39 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 16:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc (duration: 05m 29s)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 16:26 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@77febb6]: airflow: parameterize active mediawiki dc
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 15:56 papaul: power down ms-be2036 for maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 15:02 godog: bounce logstash on logstash1007, GC death
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:18 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b28fd685b9cb8d8e93650b5d02bc41b81d0883c}}: Add setmentor to wgAvailableRights (duration: 00m 59s)
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 13:42 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 13:40 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 13:15 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=BROKEN --fix # [[phab:T265336|T265336]]
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 13:08 moritzm: imported php-mailparse, php-mongodb, php-msgpack to component/icu63 [[phab:T264991|T264991]]
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 12:50 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --add-prefix=FIXME --fix # [[phab:T265336|T265336]]
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 12:49 Urbanecm: End of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix` # [[phab:T265336|T265336]]
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 for on-site maintenance [[phab:T263837|T263837]] ', diff saved to https://phabricator.wikimedia.org/P12975 and previous config saved to /var/cache/conftool/dbconfig/20201013-124940-marostegui.json
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 12:20 moritzm: imported dh-php, php-acpu, php-imagick to component/icu63 [[phab:T264991|T264991]]
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 11:22 moritzm: imported php-defaults, php-excimer, php-luasandbox, php-geoip to component/icu63 [[phab:T264991|T264991]]
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 11:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|90028b4c3c1cd4407e0834d603ccb8b256f2498e}}: Add suppressredirect right to reviewers on bnwiki ([[phab:T265169|T265169]]) (duration: 00m 58s)
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 11:14 Urbanecm: Start of `urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=trwiki --fix # [[phab:T265336|T265336]]`
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 11:13 volans: installed spicerack_0.0.43-1+deb10u1_amd64.deb on cumin2001, need to wait for a long-running cookbook to end to upgrade both hosts
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e61fcebe7315f73d1fb4d531da37d2c1253115ee}}: Add namespace aliases for Turkish Wikipedia ([[phab:T265336|T265336]]) (duration: 00m 59s)
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 10:47 jayme: no-change rolling restart of push-notifications in codfw - [[phab:T265258|T265258]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 10:29 volans: upgrading spicerack on cumin2001 to 0.0.44
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 10:19 ema: cp3050: clear varnishkafka-webrequest's vut->sighup via stap [[phab:T264074|T264074]]
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 10:09 ema: cp3050: *reload* varnishkafka-webrequest [[phab:T264074|T264074]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 10:04 volans: uploaded spicerack_0.0.44 to apt.wikimedia.org buster-wikimedia
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 09:55 ema: cp3054: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:51 ema: cp3052: systemctl restart varnishkafka-webrequest.service [[phab:T264074|T264074]]
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 09:39 kormat: running schema change against s1 in eqiad [[phab:T259831|T259831]]
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:11 moritzm: installing aspell security updates
* 09:32 ema: cp3050: set grouping by request (vut->g_arg = 2) on varnishkafka-webrequest [[phab:T264074|T264074]]
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 08:13 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:11 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 07:55 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 07:43 kormat: running schema change against s3 in eqiad [[phab:T259831|T259831]]
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 07:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 07:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 07:37 moritzm: installing ruby security updates on stretch
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 07:02 moritzm: installing PHP 7.0 security updates
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 06:39 moritzm: Installing httpcomponents-client security updates for Stretch
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 05:35 marostegui: Set global innodb_change_buffering = inserts; on pc2009 [[phab:T263443|T263443]]
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-10-12 ==
== 2021-07-26 ==
* 17:03 jayme: fixed /var/lock/ permission (1777) on ms-be2036 - [[phab:T265208|T265208]]
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 15:41 godog: roll-restart logstash5 in codfw
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}}
* 14:44 _joe_: freed 1.5 GB of space on ms-be2036 by running "apt-get clean"
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 14:05 moritzm: uploaded php7.2 7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1+icu63 to component/icu63 [[phab:T264991|T264991]]
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 12:39 moritzm: installing rails security updates on Stretch
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 12:26 moritzm: installing spice security updates on Buster
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 11:38 Urbanecm: EU B&C done
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 11:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fff2532424f84970962f7de1e35d4250b83cb3da}}: [testwiki, test2wiki] Allow bureaucrats to grant import rights (duration: 00m 58s)
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4966e8a6b8ae4e6d5623dd35e65ed8fcf3338bc1}}: Enable wgCheckUserLogLogins at all wikis but few large wikis ([[phab:T253802|T253802]]) (duration: 00m 58s)
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 11:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:631809{{!}}Require autoconfirmed status to edit Wikidata Properties (T254280)]] (duration: 01m 00s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 10:26 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 10:26 hnowlan: roll-restarting restbase201[345678] for cert refresh
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:50 moritzm: uploaded libxml2 2.9.4+dfsg1-2.2+deb9u3+wmf1 to component/icu63 [[phab:T264991|T264991]]
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:54 godog: reboot ms-be2036 - [[phab:T265208|T265208]]
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 07:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 07:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 06:39 moritzm: installing krb5 security updates
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki


== 2020-10-10 ==
== 2021-07-24 ==
* 01:32 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633281{{!}}Enable session-ip log channel everywhere (T264799)]] (duration: 00m 59s)
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 00:54 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633277{{!}}Enable session-ip log channel on all but enwiki (T264799)]] (duration: 01m 01s)
* 00:18 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633276{{!}}Enable session-ip log channel on eswiki (T264799)]] (duration: 00m 55s)
* 00:13 mutante: built prometheus-nutcracker-exporter for buster and imported on apt1001 (0.2+nmu1)


== 2020-10-09 ==
== 2021-07-23 ==
* 23:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633274{{!}}Enable session-ip log channel on Wikidata (T264799)]] (duration: 00m 59s)
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633272{{!}}Enable session-ip log channel on Commons (T264799)]] (duration: 00m 59s)
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:13 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL and the only related ticket says resolved - powercycling it - boots normally but doesn't have a prod role ([[phab:T260271|T260271]])
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:07 mutante: maps2010 has been down for almost 3 days - unhandled crit alert but nothing in SAL or tickets
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:15 effie: enable puppet on mc-gp* hosts
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 23:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 23:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 22:52 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633271{{!}}Enable session-ip log channel on group1, except Commons/Wikidata (T264799)]] (duration: 00m 57s)
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 22:23 tgr@deploy1001: Synchronized php-1.36.0-wmf.11/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 04s)
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 22:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:633210{{!}}Enable session-ip log channel on group0 (T264799)]] (duration: 00m 59s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 22:09 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/: Backport: [[gerrit:633252{{!}}Log IP/device changes within the same session (T264799)]] & [[gerrit:633254{{!}}SessionManager: Always log IP/UA in session-ip]] (duration: 01m 06s)
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 22:01 tgr_: rolling out [[phab:T264799|T264799]]#6533622
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 21:53 Urbanecm: [urbanecm@mwmaint2001 ~]$ mwscript extensions/CentralAuth/maintenance/attachAccount.php --wiki=dewiki --userlist users.txt # users.txt contains Almeida # [[phab:T263935|T263935]]
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:41 dwisehaupt: upgrading pay-lvs1001 to buster
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:31 dwisehaupt: upgrading pay-lvs1002 to buster
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 20:04 dwisehaupt: upgrading payments1001 to buster
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 19:14 dwisehaupt: upgrading payments1002 to buster
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 18:44 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 18:30 dwisehaupt: upgrading payments1003 to buster
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 17:53 dwisehaupt: upgrading payments1004 to buster
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 17:52 cstone: civicrm revision changed from {{Gerrit|b86a15a430}} to {{Gerrit|585eb835d8}}, config revision is {{Gerrit|57843925bb}}
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 16:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 15:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 15:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 15:40 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:41 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 14:32 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 14:18 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 13:48 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 13:45 jayme: helm rollback push-notification in eqiad to revision 8
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 13:31 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 13:29 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 13:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 13:12 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 12:55 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 12:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 12:20 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 12:16 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:15 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:13 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:38 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:16 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:13 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:13 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:52 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:41 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:17 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:16 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:11 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:55 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:53 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:47 elukey: roll restart of hadoop-yarn-nodemanager on all hadoop workers to pick up new settings
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:38 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 09:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:07 XioNoX: remove user from all network devices
* 08:22 marostegui: Restart dbstore1005 mysql to pick up new buffer pool sizes
* 08:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:11 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:36 moritzm: installing xen security updates for buster (libs only)
* 07:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:34 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 00:00 dzahn@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-10-08 ==
== 2021-07-22 ==
* 23:42 ryankemper: `cloudelastic1006` done. Writes thawed, maintenance window lifted; restarts are done for `cloudelastic`
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 23:37 ryankemper: `cloudelastic1005` done
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 23:31 ryankemper: `cloudelastic1004` done
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 23:27 ryankemper: `cloudelastic1003` done
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 23:23 ryankemper: `cloudelastic1002` done
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 23:16 tgr_: Evening deploys done
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 23:16 ryankemper: `cloudelastic1001` is done restarting and cluster is green again. Proceeding to `cloudelastic1002`
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 23:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632797{{!}}Enable logging of session cookie changes everywhere (T264793)]] (duration: 01m 01s)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 23:04 ryankemper: Beginning cluster restarts one server at a time. For each server, the process is depool->restart elasticsearch services->wait for services to restart and then pool->wait for cluster to return to green status before starting next server
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 23:01 ryankemper: Writes are frozen for `cloudelastic`: `/usr/local/bin/mwscript extensions/CirrusSearch/maintenance/FreezeWritesToCluster.php --wiki=enwiki --cluster=cloudelastic` on `mwmaint2001` => `Applied cluster-wide freeze`
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 22:56 ryankemper: `sudo apt policy wmf-elasticsearch-search-plugins` shows correct state: `Installed: 6.5.4-4~stretch`
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 22:56 ryankemper: `sudo -E cumin -b 6 C:role::elasticsearch::cloudelastic 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install wmf-elasticsearch-search-plugins'`
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 22:54 ryankemper: About to start plugin upgrade followed by restarts of `cloudelastic`. Maintenance window set for the next 2 hours on `cloudelastic100[1-6]`
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 21:54 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data (duration: 01m 04s)
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 21:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a923949]: search_satisfaction: update druid datasource to match previous data
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 21:52 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session/SessionBackend.php: Deduplicate SessionBackend::logPersistenceChange calls - [[phab:T264793|T264793]] (duration: 01m 01s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 21:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:00 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 21:00 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 20:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 20:43 volans: deploying Netbox DNS zone consolidation - [[phab:T264273|T264273]]
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:09 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 19:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name (duration: 01m 09s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:23 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@3b11443]: search_satisfaction: Alias sample multiplier to expected name
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:57 volker-e@deploy1001: Finished deploy [design/style-guide@b1166af]: Deploy design/style-guide:  (duration: 00m 06s)
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:57 volker-e@deploy1001: Started deploy [design/style-guide@b1166af]: Deploy design/style-guide:
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 18:17 tchanders@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632908{{!}}Enable Special:Investigate by default on production (T264357)]] (duration: 01m 06s)
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:50 root@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data (duration: 11m 55s)
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:44 root@cumin1001: START - Cookbook sre.dns.netbox
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@945e5c1]: airflow: Set search satisfaction dag start date to oldest current available data
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:30 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:23 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:16 shdubsh: install prometheus-rsyslog-exporter_0.0.0+git20201008 on centrallog1001 - [[phab:T210137|T210137]]
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 16:25 mutante: rebooting cloudvirt1023 - trying PXE boot
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:19 hashar: Restarting CI Jenkins
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 16:15 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:09 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:08 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 14:21 marostegui: Set global innodb_change_buffering = all; on pc2009 [[phab:T263443|T263443]]
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 14:17 moritzm: importing icu 63.1-6+deb10u1~wmf5 to component/icu63 [[phab:T264991|T264991]]
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 13:37 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 12:29 kart_: Updated cxserver to 2020-10-08-053343-production ([[phab:T264407|T264407]], [[phab:T264859|T264859]])
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 12:26 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 12:24 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 12:21 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 12:10 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 12:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 12:07 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 12:07 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 12:05 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 10:54 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 10:52 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1030.eqiad.wmnet
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1030.eqiad.wmnet
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1030.eqiad.wmnet
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 10:37 moritzm: installing Postgres security updates on netboxdb1001
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1029.eqiad.wmnet
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1029.eqiad.wmnet
* 14:27 moritzm: installing libwebp security updates on stretch
* 10:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1029.eqiad.wmnet
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 10:32 moritzm: installing Postgres security updates on netboxdb2001
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:29 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 10:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 10:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-backend,name=restbase1028.eqiad.wmnet
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase-ssl,name=restbase1028.eqiad.wmnet
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=restbase,service=restbase,name=restbase1028.eqiad.wmnet
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 10:26 hnowlan: pooling restbase1028,restbase1029,restbase1030
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 10:22 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 10:14 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 09:40 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 11:36 Lucas_WMDE: EU backport+config window done
* 09:10 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 09:09 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 08:38 godog: roll-restart swift-object-replicator on ms-be2* - [[phab:T261633|T261633]]
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 08:19 kormat: running schema change against s8 in eqiad [[phab:T259831|T259831]]
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 08:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 08:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 08:06 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 08:04 gehel@cumin1001: START - Cookbook sre.hosts.downtime
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 08:02 gehel: repooling wdqs2002
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 07:55 marostegui: Rebuild db2125 from snapshots - [[phab:T260670|T260670]]
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 07:45 marostegui: Stop MySQL on db1077 to build it from s1 snapshot
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 07:40 gehel: depooled wdqs2002 to catch up on lag
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 07:29 jayme: updated envoyproxy to 1.15.1-2 on all codfw hosts
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 07:23 moritzm: installing pyzmq updates from Buster point release
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 07:00 dcausse: depooling wdqs2002 (catching-up lag)
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 06:57 dcausse: restart blazegraph on wdqs2002 (stuck) [[phab:T242453|T242453]]
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 06:51 _joe_: enable notifications for wdqs-ssl-codfw
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 05:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 04:05 ejegg: updated fundraising python tools from {{Gerrit|5515923ef7}} to {{Gerrit|d4e08c52de}}
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 00:31 tgr_: evening deploys done
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 00:20 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (again, forgot to rebase the previous time) (duration: 00m 59s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 00:15 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632796{{!}}Enable logging of session cookie changes in group1 (T264793)]] (duration: 00m 57s)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 00:03 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:632795{{!}}Enable logging of session cookie changes in group0 (T264793)]] (duration: 00m 58s)
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-10-07 ==
== 2021-07-21 ==
* 23:58 tgr@deploy1001: Synchronized php-1.36.0-wmf.10/includes/session: Backport: [[gerrit:632685{{!}}Log when SessionManager is emitting cookies (T264793)]] (duration: 01m 00s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 23:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=99)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 21:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:09 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset (duration: 03m 23s)
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:05 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@7fa787e]: airflow: update mjolnir configuration to reduce max training dataset
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:36 mutante: blog post: The latest addition to our family of Wikimedia languages is "Inari Sami" with language code "smn". It is a Sami language spoken by the Inari Sami of Finland and has about 400 native speakers. It's in the Uralic language family. Wikipedia will be created in [[phab:T264859|T264859]]. https://en.wikipedia.org/wiki/Inari_Sami {{!}} https://iso639-3.sil.org/code/smn {{!}}
* 20:27 dancy: testing upcoming Scap release on beta
* 18:30 ryankemper: search team's backport deploy is complete
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:30 ryankemper@deploy1001: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]] (duration: 00m 58s)
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:29 ryankemper: Above tests are as expected, syncing changes everywhere: `scap sync-file wmf-config/ProductionServices.php 'Config: [[gerrit:632683{{!}}cloudelastic: envoy sits in front now (T263073)]]'`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:27 ryankemper: `scap pull`ed onto `mwdebug2001`; talking to cloudelastic via mediawiki from codfw has the expected decrease in latency due to the tls connection pooling
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:24 ryankemper: `scap pull`ed onto `mwdebug1002`. Talking to cloudelastic on localhost (which routes thru envoy), 6105 is `cloudelastic-chi-eqiad`, 6106 is `cloudelastic-omega-eqiad`, and 6107 is `cloudelastic-psi-eqiad` as expected
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:20 ryankemper: (backport) HEAD set to {{Gerrit|834b4571f978674162fa805906e665e35ac68e27}} as expected
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:12 hashar@deploy1001: Synchronized php-1.36.0-wmf.10/includes/HeaderCallback.php: Preload class used in HeaderCallback - [[phab:T261260|T261260]] (duration: 01m 01s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 17:58 hashar: Pulled https://gerrit.wikimedia.org/r/c/mediawiki/core/+/632680 on deployment staging area and mw2001
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:39 jgleeson: updated civicrm from {{Gerrit|39b4f954ed}} to {{Gerrit|b86a15a430}}
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:35 mutante: switching webproxy service names to the new local install servers in esams/eqsin/ulsfo [[phab:T242602|T242602]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 15:12 godog: upgrade rsyslog to 8.2008.0-1~bpo10+1 on centrallog1001 - [[phab:T259780|T259780]]
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 14:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 14:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 14:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 14:33 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 14:22 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 14:04 hoo: Ran "mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php --wiki=wikidatawiki --property-id P1820 --new-data-type external-id" on mwmaint2001 ([[phab:T263986|T263986]])
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 14:03 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:00 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 13:42 jayme: updated envoyproxy to 1.15.1-2 on all eqiad hosts
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 13:39 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 13:37 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 13:18 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 04s)
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 13:18 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 12:33 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 12:24 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 12:22 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 11:55 _joe_: rolling restart of restbase due to running puppet with changed config-vars (a noop for the actual configuration)
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 11:22 Urbanecm: EU B&C window done
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f85bc3056f809910c0487fb0b0559b3de92b1992}}: Enable bot passwords at all fishbowl and private wikis ([[phab:T258356|T258356]]) (duration: 00m 58s)
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 59s)
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 11:14 urbanecm@deploy1001: sync-file aborted: {{Gerrit|57297362c0a22ecf16648b7be4a73c4cb80d53ef}}: Fix OAuthRateLimiter rate limit configuration (duration: 00m 02s)
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6cdeea2c4c15780a641722157584f12febedab2a}}: Set CXMTThresholdForPublish to 95% for Vietnamese Wikipedia ([[phab:T264161|T264161]]) (duration: 00m 59s)
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 10:58 marostegui: Set innodb_change_buffering = inserts on pc2009 [[phab:T263443|T263443]]
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 09:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from mw load groups [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12945 and previous config saved to /var/cache/conftool/dbconfig/20201007-095355-kormat.json
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: 75', diff saved to https://phabricator.wikimedia.org/P12944 and previous config saved to /var/cache/conftool/dbconfig/20201007-094412-kormat.json
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 09:21 moritzm: imported icu63 63.1-6+deb10u1~wmf1 to component/icu63 for stretch-wikimedia
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 [[phab:T264755|T264755]] ', diff saved to https://phabricator.wikimedia.org/P12943 and previous config saved to /var/cache/conftool/dbconfig/20201007-090943-marostegui.json
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 08:39 kormat@cumin1001: dbctl commit (dc=all): 'db2138:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12942 and previous config saved to /var/cache/conftool/dbconfig/20201007-083903-kormat.json
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 08:38 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 08:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 08:32 godog: roll-restart statsd-exporter across ms-be* after puppet run - [[phab:T264588|T264588]]
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 08:09 jayme: updated envoyproxy to 1.15.1-2 on all non mw and restbase hosts
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 08:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 07:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:50 moritzm: installing systemd security updates on bullseye
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2015 from dbctl [[phab:T264700|T264700]]', diff saved to https://phabricator.wikimedia.org/P12941 and previous config saved to /var/cache/conftool/dbconfig/20201007-074951-marostegui.json
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 07:14 marostegui: Stop MySQL es2015 for decommissioning [[phab:T264700|T264700]]
* 10:14 effie: enable puppet on mw* servers
* 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 05:46 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 02:37 eileen: civicrm revision changed from {{Gerrit|a30da7f92a}} to {{Gerrit|39b4f954ed}}, config revision is {{Gerrit|0ca9a3a055}}
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 01:00 cdanis: repool esams; cr2-esams router upgrade complete
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 00:43 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request chassis routing-engine master switch
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 00:40 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system reboot other-routing-engine
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr2-esams> request system software add /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz re0 no-validate
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 00:26 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request chassis routing-engine master switch
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 00:22 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system reboot other-routing-engine
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 00:15 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr2-esams> request system software add re1 no-validate /var/tmp/junos-install-mx-x86-64-17.3R3-S8.1.tgz
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 00:01 mutante: reinstalling testvm[345]001 to confirm OS installs work as normal after switching DHCP servers in POPs ([[phab:T252526|T252526]])
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-10-06 ==
== 2021-07-20 ==
* 23:55 mutante: 🖧  switched DHCP server for eqsin from install2003 to install5001 - homer deployed to cr*eqsin* ([[phab:T252526|T252526]]) 🖧
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 23:53 mutante: 🖧  switched DHCP server for ulsfo from install2003 to install4001 - homer deployed to cr*ulsfo* ([[phab:T252526|T252526]]) 🖧
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 23:52 mutante: 🖧  switched DHCP server for esams from install1003 to install3001 - homer deployed to cr*esams* ([[phab:T252526|T252526]]) 🖧
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 23:43 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 23:11 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 23:07 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 22:32 ryankemper: Restart of `wdqs-categories` done. WDQS deploy is complete
* 17:06 rzl: enabled puppet on A:mw
* 21:57 ryankemper: Restarting `wdqs-categories` across production instances one-at-a-time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 21:57 ryankemper: Restarting `wdqs-categories` across all test instances (not public facing): `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 21:56 ryankemper: Restarting `wdqs-updater` across the fleet: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 21:55 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e56a20e]: 0.3.51 (duration: 13m 09s)
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 21:43 ryankemper: All tests passing on canary `wdqs1003`, proceeding to rest of fleet
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 21:42 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e56a20e]: 0.3.51
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 21:14 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: gerrit:632535 (duration: 01m 00s)
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 20:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 20:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 18:40 Urbanecm: Morning B&C done
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 18:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.11/skins/MinervaNeue/: {{Gerrit|2118d265c0f5b6c914efeba86ba7eacd30c5ee0f}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 00s)
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 18:37 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.10/skins/MinervaNeue/: {{Gerrit|d428ccbdf3be9a45139f8b8c0874c113f1732198}}: Hot fix: Use display for hiding/showing sidebar on OS 14_0 ([[phab:T264376|T264376]]) (duration: 01m 03s)
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 18:25 ppchelko@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 58s)
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 18:23 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: IS.php gerrit:631775 [[phab:T263493|T263493]] [[phab:T259622|T259622]] (duration: 00m 59s)
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 18:19 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632516 [[phab:T264043|T264043]] (duration: 00m 59s)
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 18:15 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632323 [[phab:T264637|T264637]] (duration: 00m 58s)
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: gerrit:632484 [[phab:T264637|T264637]] (duration: 00m 58s)
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:41 godog: centrallog* delete archived logs from old, single file, organization
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:23 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:23 jayme: updated envoyproxy to 1.15.1-2 on mw-canary and restbase-canary
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 14:57 sukhe: upload dnsdist_1.5.0-1wm1 to apt.wm.o (buster) - [[phab:T263789|T263789]]
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 14:47 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12936 and previous config saved to /var/cache/conftool/dbconfig/20201006-144701-kormat.json
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:45 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 5% - [[phab:T262946|T262946]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:45 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:44 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 jayme: updated envoyproxy to 1.15.1-2 on mw2295.codfw.wmnet,restbase2017.codfw.wmnet
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase2009.codfw.wmnet
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase2009.codfw.wmnet
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:38 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2009.codfw.wmnet
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:36 hnowlan: repooling restbase2009
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:31 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12935 and previous config saved to /var/cache/conftool/dbconfig/20201006-143157-kormat.json
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:19 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide: (duration: 00m 05s)
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:19 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:15 jayme: installed envoyproxy 1.15.1-2 on mwdebug1001
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:08 marostegui: Reboot db1076 for kernel upgrade [[phab:T264755|T264755]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:04 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:03 marostegui: Power cycle db1076 [[phab:T264755|T264755]]
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 ', diff saved to https://phabricator.wikimedia.org/P12934 and previous config saved to /var/cache/conftool/dbconfig/20201006-135810-marostegui.json
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 13:41 kormat@cumin1001: dbctl commit (dc=all): 'db2137:3314 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12932 and previous config saved to /var/cache/conftool/dbconfig/20201006-134149-kormat.json
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 13:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 13:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:40 kormat@cumin1001: dbctl commit (dc=all): 'Remove db2119 from dump/vslow, add to all other contributions/logpager/recentchanges*/watchlist temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12931 and previous config saved to /var/cache/conftool/dbconfig/20201006-134020-kormat.json
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:40 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 13:14 jayme: pushed docker-registry.discovery.wmnet/envoy:1.15.1-2 - [[phab:T264157|T264157]]
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:04 marostegui: Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 [[phab:T263443|T263443]]
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 12:55 godog: swift codfw-prod: bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:20 elukey: update HDFS Namenode GC/Heap settings on an-master100[1,2]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:13 jayme: imported envoyproxy_1.15.1-2 to buster-wikimedia and stretch-wikimedia
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 12:08 jbond42: deploy puppetlabs-stdlib 5.2
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 11:42 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 11:35 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 11:34 Urbanecm: EU B&C window done
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 11:34 Urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T264430|T264430]] # P12930
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 11:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|07c19f97c79ec20d6b1657e589acfc242dd53b09}}: arbcom_ruwiki: Set AK as alias for NS_PROJECT ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 11:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 11:30 urbanecm@deploy1001: Synchronized static/favicon/arbcom_ruwiki.ico: {{Gerrit|7e4e81129b8697c394ec329dd2b3c784e607a4d1}}: arbcom_ruwiki: Change favicon to File:Arbcom-ru_favicon.svg from commons ([[phab:T264430|T264430]]) (duration: 00m 58s)
* 12:44 moritzm: installing systemd security updates on buster
* 11:20 XioNoX: push L3 prep work to cloudsw1-c8-eqiad
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b1a4fad0f55c626e42961489062115d5f97ed6c}}: ruewiki: Add rollbacker, grantable and revokable by sysops ([[phab:T264147|T264147]]) (duration: 00m 58s)
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5cc7027ba8d0ddee5c9898b80afe850603bf870e}}: Allow bureaucrats to remove sysop permissions on Commons ([[phab:T261481|T261481]]) (duration: 00m 58s)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:07 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 03m 14s)
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5f9721b3300c8e733d331bcbc754d31d9493f8ba}}: GrowthExperiments: Change Help Page URL for kowiki ([[phab:T254364|T254364]]) (duration: 01m 00s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:04 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:02 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009 (duration: 00m 12s)
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:02 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying restbase2009
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 10:48 effie: set mw2279.codfw.wmnet as inactive [[phab:T264698|T264698]]
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 10:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2279.codfw.wmnet
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 10:45 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 10:44 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 10:43 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts (duration: 01m 19s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 10:41 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Deploying restbase to new hosts
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:37 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009 (duration: 00m 15s)
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:37 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: Redeploying to depooled restbase2009
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:36 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@4ad65b0]: (no justification provided) (duration: 03m 01s)
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:30 hnowlan@deploy1001: Started deploy [restbase/deploy@4ad65b0]: (no justification provided)
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:01 marostegui: Restart mysql on dbstore1004 to pick up new buffer pool sizes
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:59 effie: enable puppet on mc20*
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:41 effie: enable puppet on mc10*
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:38 effie: disable puppet on mc*
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 09:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 09:26 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 08:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 08:33 jayme: imported envoyproxy_1.15.1-1+deb9u1 to stretch-wikimedia
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 volans: removing unused ms-fe and ms-fe-thumbs svc records from DNS (gerrit/628086)
* 07:53 marostegui: Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 [[phab:T263443|T263443]]
* 07:39 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:35 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 07:31 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:17 marostegui: Remove es2015 and es2017 from tendril and zarcillo [[phab:T264700|T264700]] [[phab:T264386|T264386]]
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2015 [[phab:T264700|T264700]] ', diff saved to https://phabricator.wikimedia.org/P12926 and previous config saved to /var/cache/conftool/dbconfig/20201006-071451-marostegui.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2017 from dbctl [[phab:T264386|T264386]]', diff saved to https://phabricator.wikimedia.org/P12925 and previous config saved to /var/cache/conftool/dbconfig/20201006-052849-marostegui.json


== 2020-10-05 ==
== 2021-07-19 ==
* 23:11 ejegg: updated payments staging from {{Gerrit|52704ffe24}} to {{Gerrit|db03677b2d}}
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 22:27 mutante: removing shinken puppet module and role
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 22:01 ebernhardson: restore wikidatawiki_content enwiki_content enwiki_general and commonswiki_file to default index.merge.policy.deletes_pct_allowed on eqiad cirrus cluster [[phab:T264053|T264053]]
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 20:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 20:26 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (32 sector, 16kB) readahead settings [[phab:T264053|T264053]]
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 20:13 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2051 to take reduced (64 sector, 32kB) readahead settings [[phab:T264053|T264053]]
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 19:56 ebernhardson: restart elasticsearch_6@production-search-codfw on elastic2050 to take reduced (128kB) readahead settings [[phab:T264053|T264053]]
* 18:46 brennen: gerrit1001: restarting gerrit
* 19:31 mutante: ran sre.dns.netbox to push addition of an-worker1113 which was committed in prod repo but not in netbox data
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 19:30 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:38 brennen: re-enabling puppet on gerrit1001
* 19:27 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:59 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 00m 08s)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:59 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335] (thin): [THIN] Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 18:58 mforns@deploy1001: Finished deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27] (duration: 12m 08s)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:46 mforns@deploy1001: Started deploy [analytics/refinery@2c6c335]: Special deployment to unblock deletion jobs [analytics/refinery@2c6c335e61cecd0321ec6f066a153feaf2dbbc27]
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:17 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 18:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:25 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:00 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 16:59 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:30 volans: running puppet on elastic2038 after network was restored
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 15:15 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:56 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:41 elukey: shutdown stat1005 and stat1008 for ram expansion (1005 again)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]] (duration: 22m 23s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:25 elukey: shutdown an-master1001 for ram expansion
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:13 ppchelko@deploy1001: Started deploy [restbase/deploy@366a543]: [[phab:T263133|T263133]] [[phab:T264035|T264035]]
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:01 filippo@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 13:58 filippo@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:23 volans: running authdns-update to force-update authdns2001
* 13:55 filippo@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 13:54 elukey: shutdown stat1005 for ram upgrade
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 elukey: shutdown an-master1002 for ram expansion (64 -> 128G)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 12:39 moritzm: installing curl security updates on remaining hosts
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 11:34 hoo@deploy1001: Synchronized wmf-config/: Revert "Remove $wgExtraLanguageNames from Wikidata and Commons" ([[phab:T264295|T264295]]) (duration: 00m 59s)
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|be73f155001e9095697c3c21a208c63e7bf5d2d1}}: Move changetags right from users to sysop [trwiki] ([[phab:T264508|T264508]]) (duration: 00m 59s)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd30b626e23b48146b970c72731f8f7bb1eee9e1}}: wgSkipSkins: Exclude contenttranslation skin from skin options for users ([[phab:T263093|T263093]]) (duration: 00m 59s)
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 11:05 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 11:04 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632212{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 10:37 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 10:36 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:632204{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 10:34 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 10:32 ema: cp3052: pool with varnish 5.1.3-1wm15 [[phab:T264398|T264398]]
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 10:28 ema: cp3052: depool and downgrade varnish to 5.1.3-1wm15 [[phab:T264398|T264398]]
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 10:08 moritzm: installing ldap-replica1002 [[phab:T264390|T264390]]
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 09:52 moritzm: installing ldap-replica1001 [[phab:T264390|T264390]]
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 09:22 moritzm: installing ldap-replica2003 [[phab:T264390|T264390]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:02 hnowlan: bootstrapping restbase1030-b
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 08:57 moritzm: installing ldap-replica2004 [[phab:T264390|T264390]]
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 08:40 kormat@cumin1001: dbctl commit (dc=all): 'db2073 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12918 and previous config saved to /var/cache/conftool/dbconfig/20201005-084022-kormat.json
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 08:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 08:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 08:38 kormat@cumin1001: dbctl commit (dc=all): 'Add db2119 to s4 dump/vslow temporarily [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12917 and previous config saved to /var/cache/conftool/dbconfig/20201005-083822-kormat.json
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 08:23 godog: prometheus codfw/ops, add 100G to the LV
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:46 marostegui: Stop mysql on es2017 [[phab:T264386|T264386]]
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 07:30 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 06:52 XioNoX: add static NAT to pfw3-eqiad - [[phab:T264356|T264356]]
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 06:33 elukey: reboot stat1005 to resolve weird GPU state (scheduled last week)
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2017 [[phab:T264386|T264386]] ', diff saved to https://phabricator.wikimedia.org/P12916 and previous config saved to /var/cache/conftool/dbconfig/20201005-050636-marostegui.json
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2020-10-03 ==
== 2021-07-16 ==
* 15:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: emergency: {{Gerrit|840545f1d9115ea6b672cecce1762d850d8b1f54}}: Restrict flow-hide right to autoconfirmed users on zhwiki ([[phab:T264489|T264489]]) (duration: 01m 17s)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 00:08 ejegg: updated fundraising CiviCRM from {{Gerrit|256adda03c}} to {{Gerrit|a30da7f92a}}
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 15:48 vgutierrez: restart pybal on lvs2010
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)


== 2020-10-02 ==
== 2021-07-15 ==
* 22:00 mutante: depooling mw2271 because Icinga alerts about memcached and SAL shows there were ongoing tests of some kind on it
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 21:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 21:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 21:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 21:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 19:14 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 18:27 effie: enable puppet on mw2271
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 18:16 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events (duration: 02m 01s)
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 18:14 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@da6a098]: oozie: query_clicks_hourly needs to wait on codfw events
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 17:15 mutante: submitted puppet refactoring change on maps servers
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 16:49 effie: disable puppet on mw2271 and briefly depool it
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 15:39 _joe_: restarting redis on rdb2003, instance 6380
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 15:28 hnowlan: bootstrapping restbase1030-a
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 15:25 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 14:45 cdanis@deploy1001: Synchronized docroot/wikimediafoundation.org: Separate foundation.wikimedia.org docroot & add .well-known/matrix/server [[phab:T261531|T261531]] {{Gerrit|4573776bd}} {{Gerrit|2fb4c20ae}} (duration: 01m 01s)
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 14:19 moritzm: installing LLVM 7 bugfix updates from Buster point release
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 14:08 effie: enable puppet on mwdebug1001
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:08 moritzm: purging some unused kernels on ping* (these only have 3GB "disks")
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:37 Urbanecm: Create bot_passwords table at fishbowl wikis ([[phab:T258356|T258356]])
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:35 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12905 and previous config saved to /var/cache/conftool/dbconfig/20201002-133545-kormat.json
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12904 and previous config saved to /var/cache/conftool/dbconfig/20201002-132042-kormat.json
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:00 moritzm: installing Linux 4.19.146 on Buster updates (from latest Buster point release, at this point only installing the updates, no reboots (yet))
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 12:26 effie: disable puppet on mwdebug1001
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db2140 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12903 and previous config saved to /var/cache/conftool/dbconfig/20201002-121830-kormat.json
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 12:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12902 and previous config saved to /var/cache/conftool/dbconfig/20201002-120825-kormat.json
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 12:05 hnowlan: bootstrapping restbase1029-c
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 11:53 kormat@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12901 and previous config saved to /var/cache/conftool/dbconfig/20201002-115322-kormat.json
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 11:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 10:59 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:57 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 10:47 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 10:47 jmm@cumin2001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:44 kormat@cumin1001: dbctl commit (dc=all): 'db2110 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12900 and previous config saved to /var/cache/conftool/dbconfig/20201002-104453-kormat.json
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 10:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 10:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 10:43 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12899 and previous config saved to /var/cache/conftool/dbconfig/20201002-104320-kormat.json
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 10:40 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:36 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:28 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 67%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12898 and previous config saved to /var/cache/conftool/dbconfig/20201002-102817-kormat.json
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:13 kormat@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 33%: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12897 and previous config saved to /var/cache/conftool/dbconfig/20201002-101313-kormat.json
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:06 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:56 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 09:48 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:03 Amir1: temporarily becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 09:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 09:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:27 kormat@cumin1001: dbctl commit (dc=all): 'db2106 depooling: schema change [[phab:T259831|T259831]]', diff saved to https://phabricator.wikimedia.org/P12896 and previous config saved to /var/cache/conftool/dbconfig/20201002-092715-kormat.json
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 09:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 09:19 jayme: running ipvsadm -D -t 10.2.1.20:10042; ipvsadm -D -t 10.2.1.16:1969 on lvs2010.codfw.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 09:18 jayme: running ipvsadm -D -t 10.2.2.20:10042; ipvsadm -D -t 10.2.2.16:1969 on lvs1016.eqiad.wmnet,lvs1015.eqiad.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 09:17 jayme: restarting pybal on lvs1015.eqiad.wmnet,lvs2009.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:14 jayme: restarting pybal on lvs1016.eqiad.wmnet,lvs2010.codfw.wmnet - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:12 jayme: running puppet on lvs servers - [[phab:T255875|T255875]] [[phab:T255869|T255869]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 09:11 arturo: added helm3 package to buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T264221|T264221]])
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 09:09 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 09:08 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 09:08 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 09:07 hnowlan: bootstrapping restbase1029-b cassandra
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 09:05 hashar: gerrit: running garbage collector
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 09:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 09:00 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 08:59 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 08:59 root@cumin1001: START - Cookbook sre.hosts.downtime
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 08:54 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 03s)
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 08:54 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 08:42 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy (duration: 00m 34s)
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 08:41 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Test stat1007 deploy
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 08:30 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 00m 33s)
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:30 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 08:29 moritzm: installing pyzmq bugfix update from buster point release
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:24 moritzm: installing nginx security updates on puppetdb*
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 08:17 dcausse@deploy1001: Finished deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date (duration: 01m 35s)
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 08:16 dcausse@deploy1001: Started deploy [wikimedia/discovery/analytics@5713fb0]: Fix lexeme dumps expected date
* 13:05 mutante: mw1413 - pooling; it was depooled for an unknown reason (don't see it in SAL), looks ok, scap pulled
* 07:42 moritzm: installing libcommons-compress-java security updates
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 07:35 godog: swift codfw-prod bump weight for ms-be2057 - [[phab:T261633|T261633]]
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:29 godog: prometheus codfw/k8s, add 50G to the LV
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:23 moritzm: installing libx11 security updates on buster
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 06:51 _joe_: restarting php-fpm on all appservers in eqiad, in batches of 10%, for testing the procedure suggested at [[phab:T264362|T264362]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es2011 from dbctl [[phab:T264261|T264261]]', diff saved to https://phabricator.wikimedia.org/P12893 and previous config saved to /var/cache/conftool/dbconfig/20201002-053020-marostegui.json
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disabling puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== 2020-10-01 ==
== 2021-07-14 ==
* 23:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 34s)
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 23:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 23:33 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 23:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10% (duration: 00m 24s)
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 23:15 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@6101b56]: mjolnir: increase training memory overhead by 10%
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 23:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 22:36 James_F: Manually created mediawiki/extensions.git REL1_35 at {{Gerrit|7ab9a74c9ebbb22ad9fb9b7c95c91b7fad8bf8c6}} for [[phab:T264365|T264365]]
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 22:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 22:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 22:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 22:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 21:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group0 as well [[phab:T264363|T264363]]
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 21:29 James_F: Manually created mediawiki/skins.git REL1_35 at {{Gerrit|796693cb7a2ee3191fcbe19769d341bd0530bd4a}} for [[phab:T264365|T264365]]
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 21:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: rollback group1
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:48 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.11 refs [[phab:T263177|T263177]] (duration: 01m 06s)
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 20:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.11 refs [[phab:T263177|T263177]]
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 20:19 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.11
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 20:08 twentyafterfour@deploy1001: Synchronized php-1.36.0-wmf.11/includes/parser/: sync ParserCache patches to unblock the train [[phab:T264257|T264257]] [[phab:T263177|T263177]] (duration: 00m 59s)
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 18:40 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: increase more_like recommendation cache from one to three days [[phab:T264053|T264053]] (duration: 00m 59s)
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:49 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 13m 42s)
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:35 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 17:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 17:24 fdans@deploy1001: Finished deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}} (duration: 01m 34s)
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 17:24 mutante: etherpad1002 - attempted to upgrade Etherpad to a newer version, but it wasn't working; reverted to the previous one
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:22 fdans@deploy1001: Started deploy [analytics/refinery@530b339]: Regular analytics weekly train {{Gerrit|530b339}}
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 17:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:07 otto@