You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Labslogbot
(reload haproxy dbproxy1004 (springle))
imported>Stashbot
(elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - T296563 (duration: 02m 11s))
== July 4 ==
== 2021-11-28 ==
* 01:00 springle: reload haproxy dbproxy1004
* 17:14 elukey@deploy1002: Finished deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]] (duration: 02m 11s)
* 17:12 elukey@deploy1002: Started deploy [ores/deploy@69ed061]: Canary upgrade of mwparserfromhell - [[phab:T296563|T296563]]


== July 3 ==
== 2021-11-27 ==
* 23:59 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/Translate/: Translate+UserMerge fixes (duration: 00m 17s)
* 19:55 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]] (duration: 04m 14s)
* 23:55 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/WikiLove/: WikiLove+UserMerge fixes (duration: 00m 18s)
* 19:51 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI updates for [[phab:T296548|T296548]]
* 23:24 logmsgbot: ori Synchronized w/404.php: Force 'Transfer-Encoding: Chunked' header on 404 responses (duration: 00m 31s)
* 19:47 andrew@deploy1002: Finished deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev (duration: 02m 01s)
* 22:36 Krenair: restarted apache on silver to see if it would make https://gerrit.wikimedia.org/r/#/c/221969/ take effect for T104360. It did not.
* 19:45 andrew@deploy1002: Started deploy [horizon/deploy@6115b3b]: network UI tests in codfw1dev
* 21:46 ori: depooled mw1152
* 12:22 elukey: drop /var/tmp/core files from ores100[2,4] root partition full
* 20:12 ori: restarted cassandra on restbase1001
* 12:10 elukey: drop /var/tmp/core files from ores1009, root partition full
* 17:28 ori: pooled mw1152 (HHVM image scaler) for debugging.
* 11:55 elukey: disable coredumps for ORES celery units (will cause a roll restart of all celeries) - [[phab:T296563|T296563]]
* 17:05 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/Collection/RenderingAPI.php: https://gerrit.wikimedia.org/r/#/c/222616/ - hoping this fixes T104708 (duration: 00m 44s)
* 11:46 elukey: drop ores coredumps from ores1008
* 15:35 YuviPanda: cd /mnt/backup/others-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -c chacha20-poly1305@openssh.com -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-others-20150703" on labstore1002
* 09:56 elukey: powercycle analytics1071, soft lockup stacktraces in the tty
* 15:35 YuviPanda: mount /dev/mapper/backup-others--20150703 /srv/backup-others-20150703/ on labstore2001
* 09:51 elukey: move ores coredump files from /var/cache/tmp to /srv/coredumps on ores100[6,7,8] and ores2003 to free space on the root partition
* 15:34 YuviPanda: mkdir /srv/backup-others-20150703 on labstore2001
* 15:33 YuviPanda: mkfs -t ext4 /dev/mapper/backup-others--20150703 on labstore2001 completed
* 15:33 YuviPanda: run mount -o ro /dev/mapper/labstore-others--20150703 /mnt/backup/others-20150703/ on labstore1002
* 15:32 YuviPanda: run mkdir /mnt/backup/others-20150703 on labstore1002
* 15:31 YuviPanda: run  lvcreate -L 640G -s -n others-20150703 labstore/others on labstore1002
* 15:29 YuviPanda: running mkfs -t ext4 /dev/mapper/backup-others--20150703 on labstore2001
* 15:28 YuviPanda: run lvcreate -L 3.5T -n others-20150703 backup on labstore2001
* 15:25 YuviPanda: begin process of backing up others (all labs projects except tools) on to labstore2001 from labstore1002
* 14:06 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1022 (low traffic) (duration: 00m 54s)
* 13:27 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2047 after maintenance (duration: 00m 22s)
* 13:27 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -c chacha20-poly1305@openssh.com -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 13:27 YuviPanda: interrupting tar |ssh | tar script and cleaning out destination again
* 13:17 YuviPanda: clean out tar | ssh | tar target on labstore2001
* 13:15 YuviPanda: /dev/null filled up on labstore1002, aborting pipe of valuable user data into it.
* 13:13 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T > /dev/null on labstore1002
* 13:02 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 13:02 YuviPanda: interrupt tar | ssh | tar on labstore1002 and killed dest on labstore2001
* 12:43 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -p -r -e -b -t -B 32M -T | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on screen on labstore1002
* 12:43 mobrovac: restbase deploying restbase/deploy @ 1a826a5
* 12:42 YuviPanda: interrupt tar | ssh | tar on labstore1002, clean out destination on labstore2001
* 12:36 YuviPanda: interrupted tar | ssh | tar on labstore1002 and cleaned out dest on labstore2001
* 12:35 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -p -r -e -b -t -B 16M | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -B 16M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" in screen on labstore1002
* 12:33 YuviPanda: rm -rf /srv/backup-tools-20150703/* on labstore2001
* 12:31 mark: labstore2001: mount /srv/backup -o remount,ro
* 12:31 YuviPanda: interrupt tar | ssh | tar on labstore1002
* 12:29 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -L 80M -p -r -e -b -t -B 16M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 12:28 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs cpf - . | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -L 80M -p -r -e -b -t -B 16M | tar --acls --xattrs xpf - -C /srv/backup-tools-20150703" on labstore1002
* 12:09 YuviPanda: running mount -o ro /dev/mapper/labstore-tools--20150703 /mnt/backup/tools-20150703/ now
* 11:57 YuviPanda: run  lvcreate -L 640G -s -n tools-20150703 labstore/tools on labstore1002
* 11:50 YuviPanda: running  lvcreate -L 640G -s tools -n tools-20150703 labstore on labstore1002
* 11:26 YuviPanda: umount /mnt/backup/project/tools/ on labstore1002
* 11:24 YuviPanda: ran mount /dev/mapper/backup-tools--20150703 /srv/backup-tools-20150703/ on labstore2001
* 11:22 YuviPanda: mkdir /srv/backup-tools-20150703 on labstore2001
* 11:13 YuviPanda: run mkfs -t ext4 /dev/mapper/backup-tools--20150703  on labstore2001
* 11:09 YuviPanda: lvcreate -L 6TB -n tools-20150703 backup on labstore2001
* 11:09 jynus: reimports finished on dbstore2* hosts and puppet reenabled after T104471 was fixed
* 10:56 mobrovac: restbase disabling puppet on restbase1005 to tweak JVM params for cassandra
* 10:50 YuviPanda: started du of maps project on labstore2001
* 09:36 mobrovac: restbase restarting cassandra on rb1002
* 06:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul  3 06:19:02 UTC 2015 (duration 19m 1s)
* 02:50 urandom: restbase rolling restart
* 02:49 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-03 02:49:31+00:00
* 02:42 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 11m 43s)
* 02:06 logmsgbot: ori Synchronized php-1.26wmf12/extensions/CentralAuth: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/CentralAuth  7f8da7139714dd5089dd03e8679aba25c2c89c4d (duration: 00m 15s)


== July 2 ==
== 2021-11-26 ==
* 22:34 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/CentralAuth/: Made use of new USE_MULTI_COMMIT flag in user merge jobs (duration: 00m 18s)
* 16:11 arnoldokoth: drain kubestage1002 node in prep for decommissioning
* 22:31 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/UserMerge/: Added USE_MULTI_COMMIT flag to enable query batching (duration: 00m 26s)
* 16:05 arnoldokoth: drain kubestage1001 node in prep for decommissioning
* 21:51 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/Interwiki/Interwiki_body.php: Add missing global $wgInterwikiViewOnly declaration (duration: 00m 15s)
* 15:46 elukey: move /var/tmp/core/* to /srv/coredumps on ores1008 to free root space
* 21:37 twentyafterfour: restarted apache2 on iridium after applying hotfix for phabricator css issue
* 14:30 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:22 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/222484 (duration: 00m 15s)
* 14:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:16 cwdent: updated civicrm from 4fe0648ea9f36282731bf651a59ca1a617db6c08 to 04efc7d5c7bbb068f907125f2184692aee676123
* 14:21 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:47 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Disable global merge (duration: 00m 14s)
* 13:48 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:13 andrewbogott: restarted keystone on labcontrol1001
* 13:46 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:54 bd808: Running sync-common on mw1111; fatal log showed it to be running 1.26wmf9
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:30 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf12
* 13:25 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:02 YuviPanda: running exportfs -ra on labstore1002
* 12:21 vgutierrez: restarting HAProxy on O:cache::upload_haproxy - [[phab:T290005|T290005]]
* 16:40 bd808: Restarted logstash on logstash1001 due to OOM
* 11:41 akosiaris: [[phab:T296303|T296303]] cleanup weird state of calico-codfw cluster
* 16:05 bblack: cp1065 undowntimed/repooled
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 16:04 YuviPanda: clean out exports.d in labstore1002, will get regenerated. backup in /root/exports.backup
* 11:41 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:18 logmsgbot: anomie Synchronized php-1.26wmf12/extensions/Wikidata/: SWAT: Update Wikibase: SearchEntities return 'aliases' when not same as label [[gerrit:222311]] (duration: 00m 20s)
* 11:39 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 15:18 YuviPanda: killed icinga-wm again
* 11:25 vgutierrez: restarting HAProxy on O:cache::(text{{!}}upload)_haproxy - [[phab:T290005|T290005]]
* 15:17 bblack: depooled cp1065 in pybal/puppet
* 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17880 and previous config saved to /var/cache/conftool/dbconfig/20211126-102340-ladsgroup.json
* 14:57 mutante: restarting gitblit on antimony for the 123443th time
* 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17879 and previous config saved to /var/cache/conftool/dbconfig/20211126-101714-ladsgroup.json
* 14:54 mutante: restarted apache on strontium
* 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 14:50 YuviPanda: killed icinga-wm for a bit
* 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1111.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 14:43 YuviPanda: kicked puppetmaster on palladium
* 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after fixing users [[phab:T296274|T296274]]', diff saved to https://phabricator.wikimedia.org/P17878 and previous config saved to /var/cache/conftool/dbconfig/20211126-101423-ladsgroup.json
* 14:28 YuviPanda: restarted apache on labcontrol1001
* 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 ([[phab:T296274|T296274]])', diff saved to https://phabricator.wikimedia.org/P17877 and previous config saved to /var/cache/conftool/dbconfig/20211126-100547-ladsgroup.json
* 14:14 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool db2029 again: T104573 (duration: 00m 12s)
* 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 13:58 urandom: restarted restbase1005.eqiad
* 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance [[phab:T296274|T296274]]
* 13:49 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029; depool db2047 for maintenance (duration: 00m 13s)
* 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:19 mobrovac: restbase restarting cassandra on rb1005
* 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:06 logmsgbot: krinkle Synchronized w/touch.php: T104538 (duration: 00m 11s)
* 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17876 and previous config saved to /var/cache/conftool/dbconfig/20211126-082834-ladsgroup.json
* 07:05 logmsgbot: krinkle Synchronized w/favicon.php: T104538 (duration: 00m 11s)
* 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17875 and previous config saved to /var/cache/conftool/dbconfig/20211126-081329-ladsgroup.json
* 06:34 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Emergency depool of db2029 (duration: 00m 12s)
* 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17874 and previous config saved to /var/cache/conftool/dbconfig/20211126-075824-ladsgroup.json
* 06:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  2 06:27:57 UTC 2015 (duration 27m 56s)
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17873 and previous config saved to /var/cache/conftool/dbconfig/20211126-074320-ladsgroup.json
* 04:18 ori: depooled mw1152.
* 06:28 Amir1: killing extensions/MachineVision/maintenance/fetchSuggestions.php in mwmaint
* 03:38 logmsgbot: krinkle Synchronized docroot/default/index.html: 6d49d229806 (duration: 00m 12s)
* 06:19 Amir1: killing lingering process from mwmaint to depooled db (db1160) that was depooled nine hours ago
* 03:37 logmsgbot: krinkle Synchronized 404.html: 6d49d229806 (duration: 00m 12s)
* 03:14 logmsgbot: legoktm Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 02:54 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-02 02:54:06+00:00
* 02:52 logmsgbot: krinkle Synchronized docroot and w: 245a1ff (duration: 00m 12s)
* 02:51 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 05m 19s)
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-07-02 02:37:03+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 23s)
* 00:44 ori: Repooling mw1152 (HHVM image scaler) for testing


== July 1 ==
== 2021-11-25 ==
* 23:30 springle: restart mysqld dbstore2002 T104471
* 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17872 and previous config saved to /var/cache/conftool/dbconfig/20211125-204357-ladsgroup.json
* 23:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222202/ (duration: 00m 11s)
* 20:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 21:39 godog: bounce gitblit
* 20:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 20:38 jgage: restarted gitblit on antimony
* 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 19:50 ori: restarted gitblit on antimony
* 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 19:49 ori: mw1152 not actually re-pooled because of ongoing work on palladium. I'm undoing the change and hanging back now.
* 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17871 and previous config saved to /var/cache/conftool/dbconfig/20211125-192850-ladsgroup.json
* 19:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf12
* 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17870 and previous config saved to /var/cache/conftool/dbconfig/20211125-191345-ladsgroup.json
* 19:36 logmsgbot: twentyafterfour Synchronized php-1.26wmf12: sync 1.26wmf12 branch revert of "Implement support for Google reCAPTCHA 2.0" 90665a737bc25ff3c859044755d662c6cd700573 (duration: 02m 04s)
* 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17869 and previous config saved to /var/cache/conftool/dbconfig/20211125-185841-ladsgroup.json
* 19:31 jynus: replication issues for shard s7 on dbstore2001 and dbstore2002, production applications *not* affected
* 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17868 and previous config saved to /var/cache/conftool/dbconfig/20211125-184336-ladsgroup.json
* 19:31 urandom: from restbase1002; node thin_out_key_rev_value_data.js `hostname -i` local_group_wikipedia_T_parsoid_html 2>&1 | pv --line-mode | gzip -c > wikipedia_T_parsoid_html.log.gz
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17867 and previous config saved to /var/cache/conftool/dbconfig/20211125-172714-ladsgroup.json
* 19:28 ori: Repooling mw1152 for further testing of HHVM scaler
* 17:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 19:03 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Update DataModel to fix SnakList (duration: 00m 20s)
* 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1149.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 18:42 logmsgbot: hoo Synchronized wmf-config/mobile-labs.php: consistency (duration: 00m 12s)
* 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17866 and previous config saved to /var/cache/conftool/dbconfig/20211125-172707-ladsgroup.json
* 18:41 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings-labs.php: consistency (duration: 00m 31s)
* 17:12 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=inference
* 18:02 andrewbogott: restarted keystone on labcontrol1001
* 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17864 and previous config saved to /var/cache/conftool/dbconfig/20211125-171202-ladsgroup.json
* 17:03 jgage: beginning puppet CA replacement procedure
* 16:57 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6 (duration: 06m 59s)
* 16:06 ejegg: enabled queue consumers
* 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17863 and previous config saved to /var/cache/conftool/dbconfig/20211125-165657-ladsgroup.json
* 16:05 akosiaris: re-enabling ntp everywhere
* 16:50 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Deploy v2.10.4-wmf6
* 15:59 ejegg: disabled queue consumers
* 16:49 jynus@cumin1001: dbctl commit (dc=all): 'Fully repool db1163', diff saved to https://phabricator.wikimedia.org/P17862 and previous config saved to /var/cache/conftool/dbconfig/20211125-164941-jynus.json
* 15:30 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Remove alias uniqueness constraints (duration: 00m 21s)
* 16:46 volans@deploy1002: Finished deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next (duration: 01m 04s)
* 15:06 urandom: restbase1002: PWD=/home/eevans/restbase-mod-table-cassandra/maintenance; node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | pv --line-mode | gzip -c > wikimedia_T_parsoid_html.log.gz
* 16:45 volans@deploy1002: Started deploy [netbox/deploy@87a36a7]: Test v2.10.4-wmf6 on netbox-next
* 15:05 bblack: re-enabling puppet on caches
* 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17861 and previous config saved to /var/cache/conftool/dbconfig/20211125-164153-ladsgroup.json
* 14:59 bblack: disabling puppet on caches (because puppet always breaks when you move files/modules around...)
* 16:18 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163++', diff saved to https://phabricator.wikimedia.org/P17860 and previous config saved to /var/cache/conftool/dbconfig/20211125-161833-jynus.json
* 13:57 bblack: rebooting cp2001 (test kernel update)
* 16:14 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163+', diff saved to https://phabricator.wikimedia.org/P17859 and previous config saved to /var/cache/conftool/dbconfig/20211125-161404-jynus.json
* 11:32 YuviPanda: rsync on labstore1002 finished, restarting to see what was skipped + errors
* 16:10 klausman: restarting pybal on lvs2009 [[phab:T289835|T289835]]
* 10:47 moritzm: installed patch security updates on 862 hosts
* 15:57 vgutierrez: restarting pybal on lvs2010 - [[phab:T289835|T289835]]
* 10:42 hashar: restarting Jenkins: upgrading Jenkins gearman plugin from 0.1.1-8-gf2024bd to 0.1.1-9-g08e9c42-change_192429_2  https://phabricator.wikimedia.org/T72597#1416913
* 15:55 jynus@cumin1001: dbctl commit (dc=all): 'Slowly repool db1163', diff saved to https://phabricator.wikimedia.org/P17856 and previous config saved to /var/cache/conftool/dbconfig/20211125-155538-jynus.json
* 07:48 mobrovac: restbase restarting cassandra on rb1005
* 15:47 jynus: reenable gtid on db1163
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  1 05:28:38 UTC 2015 (duration 28m 37s)
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17853 and previous config saved to /var/cache/conftool/dbconfig/20211125-152906-ladsgroup.json
* 05:27 csteipp: deployed patch for T103765
* 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 04:41 logmsgbot: krinkle Synchronized php-1.26wmf12/includes/resourceloader/ResourceLoader.php: Iee884208c5c4b minify cache key (duration: 00m 11s)
* 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1148.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 03:10 mutante: git pull on strontium
* 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17852 and previous config saved to /var/cache/conftool/dbconfig/20211125-152858-ladsgroup.json
* 03:00 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-01 03:00:21+00:00
* 15:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1001.eqiad.wmnet
* 02:53 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 12s)
* 15:19 klausman@cumin1001: conftool action : set/pooled=yes:weight=1; selector: cluster=ml_serve,service=kubesvc
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-07-01 02:26:55+00:00
* 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17851 and previous config saved to /var/cache/conftool/dbconfig/20211125-151354-ladsgroup.json
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 06m 50s)
* 15:13 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping1001.eqiad.wmnet
* 02:12 springle: upgrade db1034 trusty
* 15:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping3001.esams.wmnet
* 01:37 ori: Depooled mw1152. Req error dashboard shows elevated 5xx rates correlating with the server getting pooled, but the logs don't appear to corroborate it. Odd.
* 15:05 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping3001.esams.wmnet
* 01:03 ori: Disabling Puppet on mw1152 for 12h to hack apache config to log locally
* 15:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2001.codfw.wmnet
* 00:42 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9a8018981: Double $wgMaxShellMemory on HHVM scalers (512 Mb => 1024 Mb) (duration: 00m 12s)
* 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17850 and previous config saved to /var/cache/conftool/dbconfig/20211125-145849-ladsgroup.json
* 00:34 ori: pooled mw1152 (HHVM rendering) at weight 10 for testing
* 14:54 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts ping2001.codfw.wmnet
* 00:33 gwicke: rolling cassandra restart done
* 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17849 and previous config saved to /var/cache/conftool/dbconfig/20211125-144344-ladsgroup.json
* 00:23 gwicke: starting rolling restart of cassandra nodes to apply new config
* 14:42 XioNoX: Update ping redirect to point to new ping VMs - [[phab:T295767|T295767]]
* 00:01 greg-g: we're still here
* 14:25 jayme: uncordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet - [[phab:T293729|T293729]]
* 14:17 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:16 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 14:12 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 13:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping1002.eqiad.wmnet
* 13:32 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping1002.eqiad.wmnet
* 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2002.codfw.wmnet
* 13:28 Amir1: killing lingering process from mwmaint to depooled db1147
* 13:20 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping2002.codfw.wmnet
* 13:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping3002.esams.wmnet
* 13:05 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host ping3002.esams.wmnet
* 12:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 12:14 arturo: update repo bullseye-wikimedia/thirdparty/ceph-octopus ([[phab:T296175|T296175]])
* 12:14 jynus: disable temp. gtid on db1163
* 12:11 jynus@cumin1001: dbctl commit (dc=all): 'Temp. depool db1163 fully', diff saved to https://phabricator.wikimedia.org/P17847 and previous config saved to /var/cache/conftool/dbconfig/20211125-121138-jynus.json
* 12:04 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load even more', diff saved to https://phabricator.wikimedia.org/P17846 and previous config saved to /var/cache/conftool/dbconfig/20211125-120435-jynus.json
* 11:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase202[1-3].codfw.wmnet: Restarting for certificate updates - hnowlan@cumin1001
* 11:56 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1163 load', diff saved to https://phabricator.wikimedia.org/P17845 and previous config saved to /var/cache/conftool/dbconfig/20211125-115602-jynus.json
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17844 and previous config saved to /var/cache/conftool/dbconfig/20211125-110443-ladsgroup.json
* 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1147.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17843 and previous config saved to /var/cache/conftool/dbconfig/20211125-110435-ladsgroup.json
* 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17842 and previous config saved to /var/cache/conftool/dbconfig/20211125-104930-ladsgroup.json
* 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17841 and previous config saved to /var/cache/conftool/dbconfig/20211125-103425-ladsgroup.json
* 10:25 vgutierrez: rolling restart of varnish and HAProxy on cp2042.codfw.wmnet,cp1090.eqiad.wmnet,cp[5012].eqsin.wmnet,cp3065.esams.wmnet,cp[4026,4032].ulsfo.wmnet to disable PROXY protocol - [[phab:T290005|T290005]]
* 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17840 and previous config saved to /var/cache/conftool/dbconfig/20211125-101921-ladsgroup.json
* 09:55 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 09:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:39 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:29 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 09:27 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 09:24 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:23 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 09:19 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 09:16 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 09:10 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 09:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:59 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:51 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:50 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17837 and previous config saved to /var/cache/conftool/dbconfig/20211125-084834-ladsgroup.json
* 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:47 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:43 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 08:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:40 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1146.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 08:37 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 08:34 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:31 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:28 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:25 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:22 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:21 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:21 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:18 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:17 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:14 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 08:13 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:09 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:08 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 08:05 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 08:03 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:02 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:00 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:57 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:56 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1128.eqiad.wmnet with OS bullseye
* 07:51 jelto@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=(echostore{{!}}sessionstore)
* 07:49 marostegui: Stop mysql on db1133 to clone db1128 as a test host [[phab:T295965|T295965]]
* 07:49 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:48 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 07:47 jayme: elevated MediaWiki exceptions and fatals (from ~07:35) due to a mistake during re-deploy of eventgate-main
* 07:45 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 07:35 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 07:32 jelto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 07:29 elukey_: elukey@mwdebug2002:~$ sudo systemctl reset-failed ifup@ens5.service
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1128.eqiad.wmnet with OS bullseye
* 07:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1145.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:20 jelto@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntax
* 07:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:17 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 32 hosts with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 07:10 jelto: downtime PyBal backends health check on lvs1015 and lvs1016 for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 07:09 jelto: start re-deploy procedure in eqiad Kubernetes [[phab:T251305|T251305]]
* 06:31 marostegui: Restart tendril's DB
* 05:51 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS (duration: 05m 27s)
* 04:43 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.93` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
* 04:40 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7] (wcqs): Deploy 0.3.93 to WCQS
* 04:39 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:38 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:35 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@29c5cd7]: 0.3.93 (duration: 09m 23s)
* 04:30 ryankemper: [Elastic] Cleaning up dangling apt packages: `ryankemper@cumin1001:~$ sudo cumin -b 4 'elastic*' 'sudo apt autoremove -y'`
* 04:27 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.93` on canary `wdqs1003`; proceeding to rest of fleet
* 04:25 ryankemper@deploy1002: Started deploy [wdqs/wdqs@29c5cd7]: 0.3.93
* 04:25 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.93`. Pre-deploy tests passing on canary `wdqs1003`
* 03:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2072.codfw.wmnet with OS buster
* 02:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2072.codfw.wmnet with OS buster
* 02:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2071.codfw.wmnet with OS buster
* 02:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2070.codfw.wmnet with OS buster
* 02:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2071.codfw.wmnet with OS buster
* 01:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2070.codfw.wmnet with OS buster
* 01:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2068.codfw.wmnet with OS buster
* 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2067.codfw.wmnet with OS buster
* 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2068.codfw.wmnet with OS buster
* 01:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2067.codfw.wmnet with OS buster
* 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2066.codfw.wmnet with OS buster


== June 30 ==
== 2021-11-24 ==
* 23:30 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Fix EntityParserOutputGenerator (duration: 00m 21s)
* 23:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2066.codfw.wmnet with OS buster
* 22:55 ori: depooled mw1152
* 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2065.codfw.wmnet with OS buster
* 22:52 ori: Pooled HHVM image scaler (mw1152) at weight 1 for testing.
* 23:44 mutante: puppetmaster1001:~] $ sudo puppet cert sign gitlab-runner1001.eqiad.wmnet {{!}}  sudo install_console gitlab-runner1001.eqiad.wmnet ([[phab:T295481|T295481]])
* 22:52 gwicke: updated restbase1004 to openjdk-8
* 23:26 mutante: ganeti - bringing up new VM - sudo gnt-instance start gitlab-runner1001.eqiad.wmnet ; ran puppet on install1003; installing OS [[phab:T295481|T295481]]
* 22:46 bblack: restarting gitblit on antimony, because Java is so 1996
* 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2065.codfw.wmnet with OS buster
* 22:43 tgr: running eval.php (along the lines of https://gerrit.wikimedia.org/r/#/c/221783) on commonswiki to fix T104395
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2064.codfw.wmnet with OS buster
* 22:13 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Flow-occupy Wikipedia talk namespace on cawiki (duration: 00m 11s)
* 23:09 mutante: mwmaint1002 - sudo /usr/bin/find /var/lib/puppet/clientbucket/ -type f -size 1M -delete  - to fix Icinga alert about large files in client bucket
* 22:09 matt_flaschen: Done converting wikitext namespace to Flow on Catalan Wikipedia
* 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.eqiad.wmnet
* 22:03 matt_flaschen: Started convertNamespaceFromWikitext.php for Project_talk on Catalan Wikipedia
* 23:03 mutante: wcqs1001 -  sudo systemctl restart wcqs-blazegraph - after <+jinxer-wm> (BlazegraphFreeAllocatorsDecreasingRapidly) firing: Blazegraph instance wcqs1001:9195 is burning free allocators
* 21:46 RoanKattouw: Also ran populateContentModel.php --table=archive for talk namespaces on officewiki
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.eqiad.wmnet
* 21:45 RoanKattouw: Ran populateContentModel.php --table=archive --ns=5 on officewiki
* 22:50 mutante: Creating a new Ganeti VM and wondering which row to put it in? [ganeti1009:~] $ for row in A B C D; do echo "row $<nowiki>{</nowiki>row<nowiki>}</nowiki>: $(sudo gnt-instance list -o name -F "pnode.group == 'row_$<nowiki>{</nowiki>row<nowiki>}</nowiki>'" {{!}} wc -l) VMs"; done
* 21:29 RoanKattouw: Ran populateContentModel.php --table=page --ns=5 on cawiki
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gitlab-runner1001.wikimedia.org
* 21:19 logmsgbot: catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
* 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2064.codfw.wmnet with OS buster
* 21:19 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 14s)
* 22:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2063.codfw.wmnet with OS buster
* 21:14 logmsgbot: catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
* 22:38 mutante: running decom cookbook on gitlab-runner1001.wikimedia.org VM which was in state "ADMIN_down" and not used yet. to make room to recreate it as gitlab-runner1001.eqiad.wmnet [[phab:T295481|T295481]]
* 21:14 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 13s)
* 22:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gitlab-runner1001.wikimedia.org
* 21:01 RoanKattouw: Running populateContentModel.php on officewiki for page table in namespaces occupied by Flow (1,3,5,7,9,11,13,15,91,93,101,111,113,829)
* 22:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2063.codfw.wmnet with OS buster
* 20:58 logmsgbot: catrope Synchronized php-1.26wmf12/maintenance/: Add populateContentModel maintenance script (duration: 00m 13s)
* 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2062.codfw.wmnet with OS buster
* 20:58 logmsgbot: catrope Synchronized php-1.26wmf11/maintenance/: Add populateContentModel maintenance script (duration: 00m 17s)
* 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:53 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Log 'wbq_evaluation' (duration: 00m 12s)
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseQuality extensions on testwikidata (duration: 00m 14s)
* 21:35 legoktm@deploy1002: Synchronized wmf-config/: Improve docs on $wmgUseGlobalAbuseFilters and sort list of wikis (duration: 00m 57s)
* 20:39 hoo: Created `wbqc_constraints` on testwikidatawiki (s3).
* 21:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2062.codfw.wmnet with OS buster
* 20:23 logmsgbot: thcipriani rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf12
* 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2061.codfw.wmnet with OS buster
* 20:15 logmsgbot: thcipriani Purged l10n cache for 1.26wmf6
* 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 logmsgbot: thcipriani Purged l10n cache for 1.26wmf7
* 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:14 logmsgbot: thcipriani Purged l10n cache for 1.26wmf8
* 20:54 legoktm@deploy1002: Synchronized wmf-config/: Update configuration related to disabling Score functionality (duration: 00m 57s)
* 20:13 logmsgbot: thcipriani Purged l10n cache for 1.26wmf9
* 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2061.codfw.wmnet with OS buster
* 20:13 logmsgbot: thcipriani Purged l10n cache for 1.26wmf10
* 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17834 and previous config saved to /var/cache/conftool/dbconfig/20211124-194857-ladsgroup.json
* 20:05 logmsgbot: thcipriani Finished scap: testwiki to php-1.26wmf12 and rebuild l10n cache (duration: 34m 58s)
* 19:38 razzi: `sudo maintain-views --all-databases --replace-all` on clouddb1018 for [[phab:T292594|T292594]]
* 19:41 ostriches: OAI: disabled unused accounts
* 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17833 and previous config saved to /var/cache/conftool/dbconfig/20211124-193352-ladsgroup.json
* 19:30 logmsgbot: thcipriani Started scap: testwiki to php-1.26wmf12 and rebuild l10n cache
* 19:19 razzi: run `maintain-views --all-databases --replace-all` on clouddb1013 for [[phab:T292594|T292594]]
* 19:00 logmsgbot: demon Synchronized php-1.26wmf11/includes/WebResponse.php: rv my test (duration: 00m 12s)
* 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17832 and previous config saved to /var/cache/conftool/dbconfig/20211124-191847-ladsgroup.json
* 18:55 logmsgbot: demon Synchronized php-1.26wmf11/includes/WebResponse.php: (no message) (duration: 00m 12s)
* 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17831 and previous config saved to /var/cache/conftool/dbconfig/20211124-190343-ladsgroup.json
* 18:36 cmjohnson1: labcontrol1002 going down for a few minutes
* 18:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2002.codfw.wmnet
* 18:33 mutante: tendril - short downtime for switch to new repo
* 18:51 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2002.codfw.wmnet
* 18:17 gwicke: restarted cassandra on restbase1005 with g1gc GC and larger heap
* 18:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ncredir2001.codfw.wmnet
* 18:16 gwicke: restarted cassandra on restbase1004 with g1gc GC and larger heap
* 18:43 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 17:02 akosiaris: enabled and ran puppet on lvs400X, lvs300X, lvs100[123]. noops
* 18:42 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM ncredir2001.codfw.wmnet
* 16:58 bblack: re-enabling puppet on caches
* 18:42 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ncredir2001.codfw.wmnet
* 16:52 bblack: disabling puppet on cache clusters
* 18:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief-test2001.codfw.wmnet
* 16:48 akosiaris: enabled and ran puppet on all lvs servers @ codfw
* 18:36 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief-test2001.codfw.wmnet
* 16:22 akosiaris: enabled and ran puppet on lvs1004. noop as well
* 18:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM acmechief2001.codfw.wmnet
* 16:19 akosiaris: enabled and running puppet on lvs1005
* 18:30 vgutierrez@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM acmechief2001.codfw.wmnet
* 16:11 akosiaris: enabling and running puppet on lvs1006
* 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17830 and previous config saved to /var/cache/conftool/dbconfig/20211124-174723-ladsgroup.json
* 16:09 akosiaris: disabling puppet on all lvs and neon
* 17:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 16:07 gwicke: restarting cassandra instance on restbase1004
* 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1144.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:12 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Standardise a ton of ticket comments [[gerrit:221803]] (duration: 00m 13s)
* 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17829 and previous config saved to /var/cache/conftool/dbconfig/20211124-174615-ladsgroup.json
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX all wikipedias except enwiki [[gerrit:221831]] (duration: 00m 13s)
* 17:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741134{{!}}rdbms: Add full query to transaction profiler (T295706)]] (duration: 00m 56s)
* 14:46 kart_: Update cxserver to 0d21a80
* 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:10 mobrovac: restbase restarting cassandra on restbase1005
* 17:34 jhathaway@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=puppetboard
* 11:29 mobrovac: restbase restarting cassandra on restbase1005
* 17:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 10:41 mobrovac: restbase restarting on all nodes
* 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17828 and previous config saved to /var/cache/conftool/dbconfig/20211124-173110-ladsgroup.json
* 09:54 mobrovac: restbase restarting cassandra on restbase1004
* 17:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:53 mobrovac: restbase restarting cassandra on restbase1004
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:05 jynus: applying schema changes for Gather extension
* 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:56 jynus: initiating query profiling on db1018
* 17:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2016.codfw.wmnet
* 05:21 gwicke: restarting cassandra instance on restbase1004; was in small-write mode
* 17:22 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
* 05:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1034 (duration: 00m 12s)
* 17:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 04:37 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 30 04:37:00 UTC 2015 (duration 36m 59s)
* 17:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2016.codfw.wmnet
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-30 02:22:00+00:00
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM chartmuseum2001.codfw.wmnet
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 06m 09s)
* 17:20 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2015.codfw.wmnet
* 02:11 logmsgbot: krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 12s)
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2015.codfw.wmnet
* 01:56 logmsgbot: krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 11s)
* 17:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM chartmuseum2001.codfw.wmnet
* 01:41 logmsgbot: krinkle Synchronized php-1.26wmf11/includes/resourceloader/ResourceLoader.php: I7761242f01 (duration: 00m 14s)
* 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17827 and previous config saved to /var/cache/conftool/dbconfig/20211124-171604-ladsgroup.json
* 00:37 godog: restbase1* upgrade to cassandra 2.1.7 completed
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2006.codfw.wmnet
* 17:11 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
* 17:08 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
* 17:06 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2006.codfw.wmnet
* 17:05 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399] (duration: 06m 45s)
* 17:05 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2003.codfw.wmnet
* 17:01 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM registry2003.codfw.wmnet
* 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17826 and previous config saved to /var/cache/conftool/dbconfig/20211124-170100-ladsgroup.json
* 17:00 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubernetes2005.codfw.wmnet
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399] (duration: 00m 07s)
* 16:58 mforns@deploy1002: Started deploy [analytics/refinery@6253399] (thin): Regular analytics weekly train THIN [analytics/refinery@6253399]
* 16:58 mforns@deploy1002: Finished deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399] (duration: 32m 50s)
* 16:56 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubernetes2005.codfw.wmnet
* 16:50 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:44 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2005.codfw.wmnet
* 16:43 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:42 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:41 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2005.codfw.wmnet
* 16:41 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:40 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:38 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2006.codfw.wmnet
* 16:36 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2002.codfw.wmnet
* 16:36 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2006.codfw.wmnet
* 16:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/libs/rdbms/: Backport: [[gerrit:741132{{!}}rdbms: Make TransactionProfiler logs more useful (T295706)]] (duration: 00m 57s)
* 16:33 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2002.codfw.wmnet
* 16:33 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubetcd2004.codfw.wmnet
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2003.codfw.wmnet
* 16:31 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubetcd2004.codfw.wmnet
* 16:29 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2003.codfw.wmnet
* 16:25 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagetcd2001.codfw.wmnet
* 16:25 mforns@deploy1002: Started deploy [analytics/refinery@6253399]: Regular analytics weekly train [analytics/refinery@6253399]
* 16:23 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2001.codfw.wmnet
* 16:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagetcd2001.codfw.wmnet
* 16:21 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2001.codfw.wmnet
* 16:19 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubemaster2002.codfw.wmnet
* 16:16 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubemaster2002.codfw.wmnet
* 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 Amir1: start of "foreachwikiindblist s3 migrateRevisionActorTemp.php --sleep=2" in mwmaint1002 in a screen. It will take a month or so ([[phab:T275246|T275246]])
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:09 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 16:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 16:00 btullis: systemctl reset-failed ifup@ens5.service on schema2004 [[phab:T273026|T273026]]
* 15:48 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2004.codfw.wmnet
* 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17821 and previous config saved to /var/cache/conftool/dbconfig/20211124-154533-ladsgroup.json
* 15:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1143.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17820 and previous config saved to /var/cache/conftool/dbconfig/20211124-154236-ladsgroup.json
* 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafkamon2002.codfw.wmnet
* 15:39 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2004.codfw.wmnet
* 15:36 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM schema2003.codfw.wmnet
* 15:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM kafkamon2002.codfw.wmnet
* 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc2001.wikimedia.org
* 15:34 btullis@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM schema2003.codfw.wmnet
* 15:32 papaul: reboot ms-be2058 for firmware upgrade
* 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM irc2001.wikimedia.org
* 15:30 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2001.codfw.wmnet
* 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17819 and previous config saved to /var/cache/conftool/dbconfig/20211124-152731-ladsgroup.json
* 15:23 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2001.codfw.wmnet
* 15:21 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dragonfly-supernode2001.codfw.wmnet
* 15:17 jayme@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM dragonfly-supernode2001.codfw.wmnet
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab2001.wikimedia.org
* 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17817 and previous config saved to /var/cache/conftool/dbconfig/20211124-151226-ladsgroup.json
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:08 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:08 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2001.codfw.wmnet
* 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2001.codfw.wmnet
* 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17815 and previous config saved to /var/cache/conftool/dbconfig/20211124-145721-ladsgroup.json
* 14:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM search-loader2001.codfw.wmnet
* 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:49 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 14:39 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2031.codfw.wmnet
* 14:36 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2031.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2001.wikimedia.org
* 14:33 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:32 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:31 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2030.codfw.wmnet
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:28 godog: systemctl reset-failed ifup@ens5.service on logstash2024 [[phab:T273026|T273026]]
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:27 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2001.wikimedia.org
* 14:26 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2030.codfw.wmnet
* 14:23 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp2001.wikimedia.org
* 14:21 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2025.codfw.wmnet
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 14:19 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 14:15 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2025.codfw.wmnet
* 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2001.wikimedia.org
* 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2024.codfw.wmnet
* 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2001.wikimedia.org
* 14:00 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2024.codfw.wmnet
* 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM serpens.wikimedia.org
* 13:55 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2023.codfw.wmnet
* 13:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM serpens.wikimedia.org
* 13:49 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2023.codfw.wmnet
* 13:41 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2006.codfw.wmnet
* 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1142.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 13:39 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2006.codfw.wmnet
* 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17813 and previous config saved to /var/cache/conftool/dbconfig/20211124-133809-ladsgroup.json
* 13:37 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2005.codfw.wmnet
* 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
* 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17812 and previous config saved to /var/cache/conftool/dbconfig/20211124-133628-ladsgroup.json
* 13:36 XioNoX: add Jayme r/o user to all network devices
* 13:35 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2005.codfw.wmnet
* 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
* 13:30 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM logstash2004.codfw.wmnet
* 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM logstash2004.codfw.wmnet
* 13:27 filippo@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM logstash2004.codfw.wmnet
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-corp2001.wikimedia.org
* 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-corp2001.wikimedia.org
* 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17811 and previous config saved to /var/cache/conftool/dbconfig/20211124-131519-ladsgroup.json
* 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17810 and previous config saved to /var/cache/conftool/dbconfig/20211124-130200-ladsgroup.json
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM apt2001.wikimedia.org
* 12:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM apt2001.wikimedia.org
* 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM grafana2001.codfw.wmnet
* 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM grafana2001.codfw.wmnet
* 12:48 jbond: enable puppet post puppetdb reboot
* 12:48 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 jbond@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
* 12:46 jelto@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}apple-search{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxh
* 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'After maintenance db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17809 and previous config saved to /var/cache/conftool/dbconfig/20211124-124420-ladsgroup.json
* 12:43 jbond@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
* 12:37 jbond: disable puppet for puppetdb reboot
* 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2002.wikimedia.org
* 12:29 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:29 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2002.wikimedia.org
* 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader2001.wikimedia.org
* 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader2001.wikimedia.org
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM releases2002.codfw.wmnet
* 12:23 awight: EU scap deployment finished
* 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM releases2002.codfw.wmnet
* 12:21 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737195{{!}}Replace global with parent scope]] (duration: 00m 55s)
* 12:16 awight@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:737193{{!}}[lint] fully-qualify classname]] (duration: 00m 55s)
* 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netboxdb2001.codfw.wmnet
* 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netboxdb2001.codfw.wmnet
* 12:10 awight@deploy1002: Synchronized wmf-config: Config: [[gerrit:740766{{!}}VisualEditor template dialog: new sidebar and inline descriptions (T284203, T286992)]] (duration: 00m 57s)
* 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox2001.wikimedia.org
* 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:03 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox2001.wikimedia.org
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netbox-dev2001.wikimedia.org
* 12:02 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 12:01 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netbox-dev2001.wikimedia.org
* 11:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 11:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM rpki2002.codfw.wmnet
* 11:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM rpki2002.codfw.wmnet
* 11:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2003.codfw.wmnet
* 11:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 11:49 moritzm: systemctl reset-failed ifup@ens5.service on poolcounter2003 [[phab:T273026|T273026]]
* 11:48 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:45 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2003.codfw.wmnet
* 11:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM poolcounter2004.codfw.wmnet
* 11:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 11:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM poolcounter2004.codfw.wmnet
* 11:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:35 godog: bounce apache2 on logstash1025
* 11:35 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 11:32 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 Amir1: optimizing image.commonswiki in db1141 ([[phab:T296143|T296143]])
* 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17808 and previous config saved to /var/cache/conftool/dbconfig/20211124-112539-ladsgroup.json
* 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1141.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
* 11:23 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
* 11:15 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
* 11:13 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2002.codfw.wmnet
* 11:05 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2002.codfw.wmnet
* 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM webperf2001.codfw.wmnet
* 10:53 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 10:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM webperf2001.codfw.wmnet
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 10:50 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM xhgui2001.codfw.wmnet
* 10:48 XioNoX: rollback: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM xhgui2001.codfw.wmnet
* 10:47 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:46 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 10:44 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 10:42 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM people2002.codfw.wmnet
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:40 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM people2002.codfw.wmnet
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:38 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:36 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:33 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ping2001.codfw.wmnet
* 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ping2001.codfw.wmnet
* 10:27 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 10:25 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:25 XioNoX: disable ping-offload for codfw - [[phab:T294119|T294119]]
* 10:24 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:21 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:20 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:18 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 10:17 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:14 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:13 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:12 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:06 jelto: downtime PyBal backends health check for helm3 de-deploy [[phab:T251305|T251305]]. I'm keeping an eye on Icinga and will remove the downtime as soon as I'm finished
* 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2002.codfw.wmnet
* 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2002.codfw.wmnet
* 10:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
* 10:02 vgutierrez: repool cp5006 - [[phab:T290005|T290005]]
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetboard2001.codfw.wmnet
* 10:00 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
* 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetboard2001.codfw.wmnet
* 09:58 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM debmonitor2002.codfw.wmnet
* 09:56 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:56 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM debmonitor2002.codfw.wmnet
* 09:54 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:53 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
* 09:53 vgutierrez: restart varnish/haproxy on cp5006 - [[phab:T290005|T290005]]
* 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
* 09:52 jelto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install2003.wikimedia.org
* 09:49 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
* 09:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
* 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM install2003.wikimedia.org
* 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mx2001.wikimedia.org
* 09:45 vgutierrez: depool cp5006 - [[phab:T290005|T290005]]
* 09:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
* 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mx2001.wikimedia.org
* 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM planet2002.codfw.wmnet
* 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM planet2002.codfw.wmnet
* 09:30 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=apple-search,name=eqiad
* 09:24 jelto@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}blubberoid{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventstreams{{!}}eventstreams-internal{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}sessionstore{{!}}shellbox{{!}}shellbox-constraints{{!}}shellbox-media{{!}}shellbox-syntaxhighlight{{!}}she
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM failoid2002.codfw.wmnet
* 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM failoid2002.codfw.wmnet
* 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on zotero.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wikifeeds.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on termbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on tegola-vector-tiles.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on similar-users.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-timeline.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-syntaxhighlight.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-media.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox-constraints.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on shellbox.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on sessionstore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on recommendation-api.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on push-notifications.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on proton.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mobileapps.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mathoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on linkrecommendation.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams-internal.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventstreams.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-main.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-logging-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:14 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics-external.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on eventgate-analytics.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on echostore.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on cxserver.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on citoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on blubberoid.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apple-search.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:13 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on api-gateway.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on apertium.svc.codfw.wmnet with reason: helm3 de-deploy [[phab:T251305|T251305]]
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM deneb.codfw.wmnet
* 09:08 _joe_: switching search.wikimedia.org to be served by the apple-search service
* 09:04 jelto: start re-deploy procedure in codfw Kubernetes [[phab:T251305|T251305]]
* 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM deneb.codfw.wmnet
* 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 _joe_: repooling cp2027
* 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:741082{{!}}Set actor migration to write both on all wikis (T275246)]] (duration: 00m 57s)
* 08:51 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:41 vgutierrez: depool cp2027
* 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 07:23 elukey: reboot kubernetes1018 (role::insetup) to verify negotiated speed of eth interface
* 07:12 elukey: drop /tmp/blockmgr-20fe4b2b-31fb-4a85-b5b1-bebe254120f8 and other blockmgr-* dirs on stat1006 to free space on the root partition
* 06:47 Amir1: running optimize table with replication on db1155:3314 ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance ([[phab:T296143|T296143]])
* 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17807 and previous config saved to /var/cache/conftool/dbconfig/20211124-063228-root.json
* 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17806 and previous config saved to /var/cache/conftool/dbconfig/20211124-061725-root.json
* 06:05 marostegui: Upgrade db1128's kernel [[phab:T288720|T288720]]
* 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17805 and previous config saved to /var/cache/conftool/dbconfig/20211124-060221-root.json
* 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After optimize table ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17804 and previous config saved to /var/cache/conftool/dbconfig/20211124-054718-root.json
* 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2012.codfw.wmnet with OS buster


== June 29 ==
== 2021-11-23 ==
* 23:57 robh: mw2027 was offline (blank screen on serial console). mgmt powercycled
* 23:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2012.codfw.wmnet with OS buster
* 23:48 godog: start upgrading restbase1* to cassandra 2.1.7
* 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2011.codfw.wmnet with OS buster
* 23:41 gwicke: restarted cassandra instance on restbase1004.eqiad; log showed many small writes and clients saw timeouts
* 23:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2011.codfw.wmnet with OS buster
* 23:29 gwicke: deployed restbase 32db4ce1e1
* 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2010.codfw.wmnet with OS buster
* 23:21 logmsgbot: ori Synchronized php-1.26wmf11/includes/resourceloader: I0e5f2d3b2: resourceloader: Add timing metrics for key operations (duration: 01m 12s)
* 22:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2010.codfw.wmnet with OS buster
* 23:15 logmsgbot: catrope Synchronized wmf-config/: wikitech cleanup (duration: 01m 08s)
* 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS buster
* 23:11 RoanKattouw: ssh: connect to host mw2027.codfw.wmnet port 22: Connection timed out
* 21:58 tgr: UTC evening deploys done
* 23:11 RoanKattouw: Synced wmf-config/CommonSettings.php: Remove survey access point in Popups
* 21:57 tgr@deploy1002: Finished scap: (no justification provided) (duration: 10m 03s)
* 23:09 godog: stop ircecho on neon, icinga spam
* 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 22:53 gwicke: canary deploy of restbase 32db4ce1e1 on restbase1001.eqiad
* 21:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS buster
* 21:30 urandom: restarting restbase1004 to apply new metrics reporting interval
* 21:53 krinkle@deploy1002: Finished deploy [integration/docroot@a3435a7]: (no justification provided) (duration: 00m 07s)
* 20:19 subbu: deployed parsoid sha ea98be88
* 21:53 krinkle@deploy1002: Started deploy [integration/docroot@a3435a7]: (no justification provided)
* 18:18 logmsgbot: ori Synchronized php-1.26wmf11/includes/db/LoadBalancer.php: I0e5f2d3b2: Use APC for caching slave lag times (duration: 01m 09s)
* 21:47 tgr@deploy1002: Started scap: (no justification provided)
* 18:00 cmjohnson1: powering down ms-be1015
* 21:47 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740777{{!}}Add Image: Validate GEInfoboxTemplates size (T294518)]] (duration: 00m 56s)
* 16:06 bblack: re-enabling puppet on caches
* 21:39 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/includes/Api/ApiQueryGrowthTasks.php: Backport: [[gerrit:740776{{!}}Structured task caching/filtering cherry-picks step 3]] (duration: 00m 55s)
* 15:51 bblack: disabling puppet on caches temporarily ...
* 21:35 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments: Backport: [[gerrit:740775{{!}}Structured task caching/filtering cherry-picks step 2]] (duration: 00m 57s)
* 15:49 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/OpenStackManager: https://gerrit.wikimedia.org/r/#/c/221648/ (duration: 00m 13s)
* 21:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS buster
* 15:29 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221405/ (duration: 00m 15s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221612/ (duration: 00m 12s)
* 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-2x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 14s)
* 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-1.5x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221009/ (duration: 00m 11s)
* 20:04 legoktm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Echo/: re-enable cross-wiki notifications by default ([[phab:T296270|T296270]]) (duration: 00m 57s)
* 15:18 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221047/ (duration: 00m 13s)
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.link.js: https://gerrit.wikimedia.org/r/#/c/221605 (duration: 00m 13s)
* 19:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:02 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.formatter.js: https://gerrit.wikimedia.org/r/#/c/221604/ (duration: 00m 14s)
* 19:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:34 jynus: rebooting and reinstalling db1022
* 19:51 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|7d5f779a73594bb11f359bda055f2c7af8e92feb}}: Structured task caching/filtering cherry-picks, step 1 (duration: 00m 56s)
* 12:06 YuviPanda: restarting rsync with new exclusions file on labstore1002 to codfw
* 19:42 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|c26e407118e1cd8e1e3fea6e2f4e3e43a609ea62}}: GrowthExperiments backports (duration: 01m 03s)
* 12:06 YuviPanda: excluded maps, mwoffliner and video project from rsync of broken FS to speed it up
* 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:59 YuviPanda: interrupt rsync on labstore1001 to prevent it from copying mwoffliner files
* 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:00 _joe_: shutting down etcd1003, cleaning exported resources
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 2/2) (duration: 00m 56s)
* 10:32 _joe_: effectively removing etcd1003 from the cluster
* 19:17 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: {{Gerrit|bf82bfb3ddcaff04a1e90abc435ccb26f792780c}}: Add new icons, wordmarks & taglines for several wikis ([[phab:T290091|T290091]]; 1/2) (duration: 00m 56s)
* 10:17 _joe_: starting removal of etcd1003 from the etcd cluster
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 _joe_: joined conf1003 to the etcd cluster
* 19:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3993aacbfdbbfb6cdcc198ce369bf08b32ace865}}: Increase reading depth sampling rate to .1% ([[phab:T294777|T294777]]) (duration: 00m 57s)
* 08:20 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool db1022 for reinstall (duration: 00m 12s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:12 _joe_: adding conf1002 to the etcd cluster as a member
* 18:29 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 07:46 akosiaris: disabling ntp everywhere except selected hosts in anticipation of the leap second
* 18:25 ejegg: updated SmashPig standalone (IPN listener) from {{Gerrit|be68299b}} -> {{Gerrit|211f8e65}}
* 04:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 29 04:51:48 UTC 2015 (duration 51m 47s)
* 18:18 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 03:08 jgage: jmxtrans filled disks on all kafka brokers, 21GB log files. removed logs and restarted services.
* 18:18 cmjohnson1: upgrading msw-c1-eqiad [[phab:T259758|T259758]]
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-29 02:23:47+00:00
* 18:04 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 53s)
* 18:01 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 00:52 springle: restart eventlogging auto-purge on m4
* 18:00 moritzm: systemctl reset-failed ifup@ens5.service on durum2001 [[phab:T273026|T273026]]
* 00:51 springle: restart replication on dbstore2002
* 17:59 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 00:00 springle: pausing replication on dbstore2002
* 17:55 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2002.codfw.wmnet
* 17:49 mutante: miscweb1002 - rm -rf /srv/deployments/scholarships ([[phab:T243037|T243037]])
* 17:47 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2002.codfw.wmnet
* 17:42 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM durum2001.codfw.wmnet
* 17:35 ebernhardson: [[phab:T295478|T295478]] start snapshot of commonswiki_file from cirrus codfw -> swift eqiad
* 17:34 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM durum2001.codfw.wmnet
* 17:33 sukhe@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doh2002.wikimedia.org
* 17:31 cmjohnson1: upgrading msw's in row D eqiad [[phab:T259758|T259758]]
* 17:28 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2002.wikimedia.org
* 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2012.codfw.wmnet with OS stretch
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2002.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum2001.codfw.wmnet with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2002.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on doh2001.wikimedia.org with reason: apply new KVM machine settings
* 17:15 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2002.codfw.wmnet
* 17:14 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:14 sukhe@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM doh2001.wikimedia.org
* 17:11 sukhe@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM doh2001.wikimedia.org
* 17:10 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2002.codfw.wmnet
* 17:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM mwdebug2001.codfw.wmnet
* 17:02 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM mwdebug2001.codfw.wmnet
* 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM miscweb2002.codfw.wmnet
* 16:57 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM miscweb2002.codfw.wmnet
* 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM doc2001.codfw.wmnet
* 16:53 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM doc2001.codfw.wmnet
* 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2012.codfw.wmnet with OS stretch
* 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2001.codfw.wmnet
* 16:47 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2001.codfw.wmnet
* 16:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 16:39 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2011.codfw.wmnet with OS stretch
* 16:13 cmjohnson1: updating mgmt switches in row C, racks C2-C8 eqiad [[phab:T259758|T259758]]
* 15:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 15:46 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2010.codfw.wmnet with OS stretch
* 15:41 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 15:27 Emperor: rolling restart of thanos frontends [[phab:T294380|T294380]]
* 15:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 14:40 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:34 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=puppetboard
* 14:30 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:09 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 14:03 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:00 marostegui: Failover m5 from db1128 to db1132 - [[phab:T288720|T288720]]
* 14:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 13:50 godog: powercycle (again) ms-be2058
* 13:48 godog: add 80G to prometheus global in eqiad
* 13:31 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 13:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:01 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 12:52 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1002-dev.eqiad.wmnet
* 12:46 Lucas_WMDE: UTC morning backport+config window done
* 12:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:43 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1002-dev.eqiad.wmnet
* 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:737503{{!}}Set up beta test environment for QuickSurveys (T293798)]] (beta only) (duration: 00m 55s)
* 12:31 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740784{{!}}OSD: Handle cases where the image srcset attr is not set (T296260)]] (duration: 00m 56s)
* 12:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:26 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/modules/page/ext.proofreadpage.page.edit.js: Backport: [[gerrit:740778{{!}}OSD: Add a ready hook for scripts (T180569)]] (duration: 00m 56s)
* 12:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
* 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
* 11:54 btullis@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching O:aqs: restarting to pick up new JRE - btullis@cumin1001
* 11:51 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart (exit_code=97) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:51 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2002.codfw.wmnet
* 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2002.codfw.wmnet
* 11:25 godog: powercycle ms-be2058 - down and nothing on console
* 11:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5012.eqsin.wmnet with OS buster
* 11:15 vgutierrez: pool cp5012 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 Amir1: start of mwscript migrateRevisionActorTemp.php --wiki=testwiki --sleep=5 ([[phab:T275246|T275246]])
* 11:05 jayme: cordoned kubestage1003.eqiad.wmnet kubestage1004.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:05 jayme: uncordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet (we have issues with POD IP prefix allocation) - [[phab:T293729|T293729]]
* 11:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:02 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:740807{{!}}Set test wikis to write both for actor temp table migration (T275246)]] (duration: 00m 56s)
* 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1155.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 ([[phab:T296143|T296143]])', diff saved to https://phabricator.wikimedia.org/P17800 and previous config saved to /var/cache/conftool/dbconfig/20211123-102234-ladsgroup.json
* 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1121.eqiad.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 10:19 urbanecm@deploy1002: Finished scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates (duration: 11m 06s)
* 10:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:08 urbanecm@deploy1002: Started scap: {{Gerrit|c98acaa2ab27e630c0a1b55a64fb81b131c087f9}}: Backport localisation updates
* 10:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5012.eqsin.wmnet with OS buster
* 10:01 vgutierrez: depool cp5012 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 09:57 jayme: cordoned kubestage1001.eqiad.wmnet kubestage1002.eqiad.wmnet - [[phab:T293729|T293729]]
* 09:52 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bullseye
* 09:27 Amir1: dropping useless GRANTs on s6 eqiad replicas without replication ([[phab:T296274|T296274]])
* 09:16 Amir1: dropping useless GRANTs on s6 eqiad master without replication ([[phab:T296274|T296274]])
* 09:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
* 09:05 Amir1: fixing incorrect grants of wikiadmin on localhost in s6 master in codfw with replication
* 07:52 topranks: Adjusting BGP on cr1-eqiad and cr2-eqiad to keep MED unchanged in iBGP.
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 05:29 ryankemper: [[phab:T295705|T295705]] Downtimed `elastic2044` for one hour and doing a full reboot for good measure. Already ran the plugin upgrade: `DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins`
* 05:26 ryankemper: [[phab:T295705|T295705]] Rolling restart of `codfw` complete. `elastic2044` was manually restarted earlier today, so the cookbook skipped it (we pass in a datetime cutoff threshold); I'm manually upgrading and restarting that host
* 05:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 04:17 ryankemper: [[phab:T295705|T295705]] Properly disabled the sane-itizer; we don't want it running until after we (a) complete rolling restarts and (b) restore the missing `commonswiki_file` index (which is blocked on the restarts)
* 03:42 Amir1: ladsgroup@mwmaint1002:~$ cat broken_imgs {{!}} xargs -I <nowiki>{</nowiki><nowiki>}</nowiki> mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start=<nowiki>{</nowiki><nowiki>}</nowiki> --end=<nowiki>{</nowiki><nowiki>}</nowiki> ([[phab:T296001|T296001]])
* 03:37 Amir1: rebuilding metadata of all djvu files outside of commons ([[phab:T296001|T296001]])
* 03:06 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:58 ryankemper: [[phab:T295705|T295705]] `elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.codfw.wmnet', port=9243): Read timed out. (read timeout=60))` Probably a transient failure; will wait 10 mins and try again
* 02:57 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:55 ryankemper: [[phab:T295705|T295705]] `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation codfw "codfw plugin upgrade + restart" --upgrade --nodes-per-run 2 --start-datetime 2021-11-18T18:55:54 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_codfw`
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (2 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:17 urbanecm: UTC late window done
* 01:17 urbanecm@deploy1002: Finished scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4) (duration: 25m 50s)
* 01:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:51 urbanecm@deploy1002: Started scap: {{Gerrit|69aa4a7}}: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 4/4)
* 00:50 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/autoload.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 3/4) (duration: 00m 55s)
* 00:49 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specials/: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 2/4) (duration: 00m 55s)
* 00:48 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/includes/specialpage/SpecialPageFactory.php: {{Gerrit|7c0e074}}: Revert "Create redirect Special Pages for delete and protect action" ([[phab:T295611|T295611]]; [[phab:T296203|T296203]]; 1/4) (duration: 00m 56s)
* 00:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9209433dfc8b1f81a165ec75867337800db24b1}}: Enable reading depth instrumentation at low sampling rate ([[phab:T294777|T294777]]) (duration: 00m 56s)
* 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents: {{Gerrit|3f860c7}}: {{Gerrit|fa9fbf1}}: WikimediaEvents backports (2/2; [[phab:T294777|T294777]]) (duration: 00m 55s)
* 00:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/extension.json: {{Gerrit|3f860c72bca817c40486b90f0d8e0ffca72b2690}}: Restore ReadingDepth instrument (1/2) (duration: 00m 56s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/739908


== June 28 ==
== 2021-11-22 ==
* 23:51 logmsgbot: ori Synchronized php-1.26wmf11/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I6ffdc977e87: Parse older format of Geo cookies (duration: 00m 13s)
* 23:55 mutante: acmechief1001, acmechief-test1001: sudo systemctl restart reload-acme-chief-backend.timer
* 04:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 28 04:30:54 UTC 2015 (duration 30m 53s)
* 23:54 mutante: acmechief1001, acmechief-test1001: sudo systemctl start reload-acme-chief-backend.timer
* 02:20 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-28 02:20:52+00:00
* 23:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2011.codfw.wmnet with OS stretch
* 02:17 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 56s)
* 23:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2010.codfw.wmnet with OS stretch
* 23:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2011.codfw.wmnet with OS stretch
* 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2009.codfw.wmnet with OS stretch
* 22:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2010.codfw.wmnet with OS stretch
* 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2009.codfw.wmnet with OS stretch
* 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2028.codfw.wmnet with OS buster
* 21:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2028.codfw.wmnet with OS buster
* 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2027.codfw.wmnet with OS buster
* 21:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2027.codfw.wmnet with OS buster
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 legoktm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Lower CirrusSearch maxqueues to be closer to number of workers (duration: 00m 56s)
* 20:01 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 19:49 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 urbanecm: Evening B&C window completed
* 19:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/: {{Gerrit|10b8440069ac71434274462c545c6b2b2c9182d9}}: Use the WikiEditor ready hook instead of using() the lib ([[phab:T296033|T296033]]) (duration: 00m 56s)
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b6b05e30b3c9b4007fd31ab0698507d7a48d1caf}}: kswiki: set wgTranslateNumerals to false ([[phab:T296055|T296055]]) (duration: 00m 55s)
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4aa8d5bf465bfc3fee2ec547718af0c779f88ef4}}: Enable SandboxLink on lawiki ([[phab:T296073|T296073]]) (duration: 00m 56s)
* 19:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c082bec4c74c156b26af4349488835902c5bacd}}: Enable mapframe on the Indonesian Wikipedia ([[phab:T295571|T295571]]) (duration: 00m 56s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:05 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:01 vgutierrez: pool cp4032 (text) using HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 18:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 17:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:48 XioNoX: repool codfw
* 17:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4032.ulsfo.wmnet with OS buster
* 17:46 ejegg: updated fundraising python tools from {{Gerrit|d90f4c91}} -> {{Gerrit|d1d7b100}}
* 17:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:32 ebernhardson: restart both elasticsearch instances on elastic2044, reporting `connection refused` (after a brief period of `no route to host`) to masters even though the connection works outside elastic
* 17:01 ryankemper: [[phab:T295705|T295705]] Beginning rolling restart w/ plugin upgrade of `cloudelastic`: `ryankemper@cumin1001:~$ sudo cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic plugin upgrade + restart" --upgrade --nodes-per-run 3 --start-datetime 2021-11-22T16:59:38 --task-id [[phab:T295705|T295705]]` on tmux `rolling_restarts_cloudelastic`
* 17:00 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001
* 16:58 ryankemper: [Elastic] [[phab:T295705|T295705]] Rolling restart w/ plugin upgrade of `relforge` is complete
* 16:55 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting second and final relforge host: `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 16:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4032.ulsfo.wmnet with OS buster
* 16:52 ryankemper: [Elastic] [[phab:T295705|T295705]] Restarting first relforge host: `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service logstash.service`
* 16:51 jayme: fleet wide updated wmf-certificates to 0~20211122-1
* 16:50 vgutierrez: depool cp4032 to be reimaged as cache::text_haproxy - [[phab:T290005|T290005]]
* 16:49 ryankemper: [Elastic] [[phab:T295705|T295705]] Downtimed relforge* for 2 hours in order to perform a manual rolling restart of the two hosts `relforge1003` and `relforge1004`
* 16:44 ryankemper: [[phab:T295705|T295705]] Upgrading `relforge` elasticsearch packages: `ryankemper@cumin1001:~$ sudo cumin -b 2 'relforge*' 'DEBIAN_FRONTEND=noninteractive sudo apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" install elasticsearch-oss wmf-elasticsearch-search-plugins'`
* 16:39 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 16:15 urbanecm: Password reset for Miraki@arbcom_dewiki per private request
* 16:15 moritzm: installing postgresql-13 security updates on bullseye
* 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:55 XioNoX: Telia DDoS auto-mitigation enabled on all circuits - [[phab:T288926|T288926]]
* 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:28 Amir1: revoking DROP for wikiadmin from db1100 ([[phab:T249683|T249683]])
* 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2006.codfw.wmnet with OS bullseye
* 15:17 moritzm: set kvm:machine_version=pc-i440fx-2.8 for Ganeti cluster in codfw [[phab:T294119|T294119]]
* 15:16 jayme: imported wmf-certificates 0~20211122-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 15:13 _joe_: restarting pybal low-traffic in codfw, eqiad
* 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:58 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab-runner1001.wikimedia.org
* 14:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734426{{!}}Disable DPL on opt-in wikis where not in use (T287916)]] (duration: 00m 56s)
* 14:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2006.codfw.wmnet with OS bullseye
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734425{{!}}Disable DPL on Wikiversities where not in use (T287916)]] (duration: 00m 56s)
* 14:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734424{{!}}Disable DPL on Wikisources where not in use (T287916)]] (duration: 00m 56s)
* 14:44 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab-runner1001.wikimedia.org
* 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:23 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:06 akosiaris: repool wtp1025, wtp1041 to parsoid cluster. [[phab:T296098|T296098]]
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 14:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus2005.codfw.wmnet with OS bullseye
* 13:32 XioNoX: re-enable pybal on lvs2007 - [[phab:T295118|T295118]]
* 13:31 XioNoX: re-enable puppet on lvs2007
* 13:30 XioNoX: re-enabling V6 between cr2-codfw and asw-b-codfw - [[phab:T295118|T295118]]
* 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9
* 13:04 XioNoX: asw-b-codfw# set virtual-chassis member 7 mastership-priority 255 - [[phab:T295118|T295118]]
* 12:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:51 Lucas_WMDE: UTC morning backport+config window done
* 12:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:49 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740556{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:47 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/ProofreadPageLuaLibrary.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (2/2) (duration: 01m 03s)
* 12:45 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/ProofreadPage/includes/Pagination/Pagination.php: Backport: [[gerrit:740558{{!}}Lua: use LinkBatch to speed up the template dependencies (T296092)]] (1/2) (duration: 01m 04s)
* 12:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: 1.37.0 is out now, so there's no beta [[phab:T289585|T289585]] (duration: 01m 04s)
* 12:11 hashar@deploy1002: Synchronized php-1.38.0-wmf.9/skins/MinervaNeue: Fix banners to show CentralNotice - [[phab:T296077|T296077]] (duration: 01m 04s)
* 11:50 moritzm: installing qemu security updates on bullseye
* 11:46 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:43 moritzm: installing krb5 security updates on stretch
* 11:41 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 oblivian@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 11:36 oblivian@cumin1001: START - Cookbook sre.dns.netbox
* 11:34 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 11:20 XioNoX: re-enable LibertyGlobal in esams
* 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1125.eqiad.wmnet with OS bullseye
* 11:12 XioNoX: Revert "prepend_as_out for esams/knams"
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS buster
* 10:54 elukey: apt-get purge up to linux-image-4.9.0-14-amd64 on sodium to free /boot space
* 10:49 elukey: `apt-get remove linux-image-4.9.0-5-amd64 linux-image-4.9.0-6-amd64` on sodium to free /boot
* 10:45 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS buster
* 10:25 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 10:16 jbond: restart snmp gracefully cr2-eqord
* 10:12 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:45 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:38 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:35 moritzm: installing Linux 4.9.272 updates on Stretch hosts
* 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 09:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|24b3a7769ca97e3ed951d77d911f41afae5e4136}}: Growth: Disable filtering by unstarred mentees at arwiki, enwiki, fawiki ([[phab:T293182|T293182]]) (duration: 01m 04s)
* 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1125.eqiad.wmnet with OS bullseye
* 09:05 moritzm: installing 4.19.208-1 kernels on Stretch hosts with 4.19 kernels
* 09:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 moritzm: drain ganeti-test2003 for forthcoming reimage
* 08:44 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1125.eqiad.wmnet with OS bullseye
* 08:44 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|4418c4367b7420139cd8b30cb003d697b58c618f}}: ApiSetMentorStatus: Use READ_LATEST to request back timestamp ([[phab:T295305|T295305]]) (duration: 01m 08s)
* 08:42 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17793 and previous config saved to /var/cache/conftool/dbconfig/20211122-082525-root.json
* 08:15 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'apple-search' for release 'main' .
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17792 and previous config saved to /var/cache/conftool/dbconfig/20211122-081022-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17791 and previous config saved to /var/cache/conftool/dbconfig/20211122-075518-root.json
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17790 and previous config saved to /var/cache/conftool/dbconfig/20211122-074015-root.json
* 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance [[phab:T296143|T296143]]
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17789 and previous config saved to /var/cache/conftool/dbconfig/20211122-072511-root.json
* 07:17 Amir1: running optimize table on image table in commonswiki on codfw with replication enabled; this will cause replication lag ([[phab:T296143|T296143]])
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17788 and previous config saved to /var/cache/conftool/dbconfig/20211122-071006-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17787 and previous config saved to /var/cache/conftool/dbconfig/20211122-065502-root.json
* 06:46 marostegui: Revoke dump grants for scholarships database [[phab:T296166|T296166]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17786 and previous config saved to /var/cache/conftool/dbconfig/20211122-063959-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17785 and previous config saved to /var/cache/conftool/dbconfig/20211122-062455-root.json
* 03:30 Amir1: run optimize table on db2140 for image table ([[phab:T296143|T296143]])


== June 27 ==
== 2021-11-21 ==
* 23:30 bd808: Deleted corrupt shards on logstash1004 and logstash1005. Recovery in progress
* 13:17 dcausse: restarting blazegraph on wdqs1007 (jvm stuck for 10h)
* 20:12 ori: Delegated full access to Google Webmaster Tools for myself (olivneh@).
* 07:26 XioNoX: cr1-eqiad# deactivate protocols bgp group Confed_eqord
* 04:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 27 04:58:46 UTC 2015 (duration 58m 45s)
* 05:22 Amir1: running clean up of djvu files in all wikis ([[phab:T275268|T275268]])
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-27 02:23:40+00:00
* 05:13 Amir1: end of djvu metadata maint script run ([[phab:T275268|T275268]])
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 46s)


== June 26 ==
== 2021-11-20 ==
* 23:57 bd808: Logstash log ingestion working again after forcing recovery of replicas for logstash-2015.06.26; new logs were being rejected with only a primary shard available
* 01:02 mutante: lists1001 - restarted apache, icinga alerts for the web UI, but recovered
* 23:54 bd808: re-enabled allocation on logstash elasticsearch cluster
* 00:27 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:05 bblack: restarted gitblit on antimony, AGAIN
* 00:26 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:57 mutante: restarted gitblit
* 00:25 bblack: lvs3005 - re-enabling puppet + pybal
* 22:43 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: Temporarily make subpages in Flow-occupied namespaces non-Flow again (duration: 00m 14s)
* 00:25 legoktm@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:36 bd808: set indices.recovery.concurrent_streams to 4 on logstash ES cluster
* 00:25 legoktm@cumin1001: START - Cookbook sre.network.cf
* 22:36 godog: set indices.recovery.max_bytes_per_sec to 10mb on logstash ES cluster
* 00:24 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 22:25 godog: set indices.recovery.max_bytes_per_sec to 50mb on logstash ES cluster
* 00:23 cdanis@cumin1001: START - Cookbook sre.network.cf
* 22:25 jamesofur: Reset email address of User:Chwms; identity verified in person at editathon
* 00:06 bblack: lvs3005 - disabling puppet and stopping pybal (traffic will go to lvs3007)
* 22:09 bd808: restarted logstash on logstash1001
* 21:10 urandom: taking xenon down to be rebootstrapped
* 20:10 bd808: Deleted 4 corrupt indices (logstash-2015.05.30 logstash-2015.05.31 logstash-2015.06.03 logstash-2015.06.06) on logstash1004
* 19:58 bd808: stopping elasticsearch on logstash1004 to cleanup corrupt shards
* 17:05 mutante: zirconium - manual cleanup, removing planet
* 17:04 godog: reverted cronolog puppetmaster patch, restarting apache
* 14:17 Krenair: Deployed patch for T103391
* 12:23 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221105/ (duration: 00m 12s)
* 12:18 _joe_: added conf1001 to the etcd cluster
* 07:57 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/Popups: T103610 (duration: 00m 11s)
* 06:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 26 06:04:14 UTC 2015 (duration 4m 13s)
* 05:22 twentyafterfour: restarted apache on iridium to fix phabricator fatal
* 02:33 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-26 02:33:33+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 36s)
* 00:51 gwicke: reverted restbase1001 canary to 90817c2a
* 00:36 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/SyntaxHighlight_GeSHi (duration: 00m 11s)
* 00:16 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: T102852 (duration: 00m 12s)
* 00:15 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-2x.png: T102852 (duration: 00m 13s)
* 00:14 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-1.5x.png: T102852 (duration: 00m 12s)
* 00:05 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/modules/pygments.wrapper.css: I5d1510dc80d6d4712ca8411 (duration: 00m 12s)


== June 25 ==
== 2021-11-19 ==
* 23:53 mutante: planet1001 (ganeti) - signing puppet cert, initial run
* 23:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus2005.codfw.wmnet with OS bullseye
* 23:31 mutante: apt-get upgrade on zirconium
* 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 23:28 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 12s)
* 23:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host prometheus2005.codfw.wmnet with OS bullseye
* 23:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 11s)
* 23:15 mutante: LDAP - added mmartorana to wmf (91354e9e-5706-4289-9a60-{{Gerrit|98e8a7632853}}) [[phab:T295789|T295789]]
* 23:24 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: https://gerrit.wikimedia.org/r/#/c/220997/ (duration: 00m 13s)
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus2005.codfw.wmnet with OS bullseye
* 23:20 gwicke: canary update of restbase on restbase1001 to 4b961f166 (deploy d1c4d9961)
* 20:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS stretch
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218926/ (duration: 00m 12s)
* 20:21 mutante: phabricator - adding eigyan to WMF-NDA (phab project 61 - https://phabricator.wikimedia.org/project/members/61/ ) - since that is now standard when adding people to the wmf LDAP group ([[phab:T295928|T295928]])
* 23:11 logmsgbot: krenair Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/220784/ (duration: 00m 13s)
* 20:20 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2002.codfw.wmnet
* 23:03 legoktm: fixed content models on lrcwiki for Module namespace
* 20:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2002.codfw.wmnet
* 23:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220485/ (duration: 00m 16s)
* 20:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw2280.codfw.wmnet
* 22:02 logmsgbot: hoo Synchronized php-1.26wmf11/extensions/Wikidata/: Update Wikidata: Use SELECT FOR UPDATE in SqlIdGenerator (duration: 00m 20s)
* 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS stretch
* 21:29 godog: rm /var/lib/git/operations/puppet/modules/cassandra from labcontrol1001 labcontrol1002
* 19:51 mutante: shutting down undead server mw2280 - not in icinga and puppetdb but in debmonitor, and still has IP and puppet cert
* 21:10 godog: rm /var/lib/git/operations/puppet/modules/cassandra from rhodium
* 19:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw2280.codfw.wmnet
* 21:07 godog: rm /var/lib/git/operations/puppet/modules/cassandra from strontium and palladium
* 18:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 21:06 godog: push puppet.git after module/cassandra removal T92560
* 18:10 andrew@deploy1002: Finished deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone (duration: 04m 19s)
* 20:41 mutante: deleted SVN monitor from watchmouse
* 18:06 andrew@deploy1002: Started deploy [horizon/deploy@ba16257]: moving the proxy endpoint behind keystone
* 20:18 mutante: bye SVN - subversion URLs now redirect to phab or doc
* 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:08 logmsgbot: nikerabbit Finished scap: T103888 CX aliases (duration: 22m 37s)
* 17:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:46 logmsgbot: nikerabbit Started scap: T103888 CX aliases
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@ee83e27]: fixing sudo rule editing (duration: 04m 10s)
* 18:09 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf11
* 17:21 andrew@deploy1002: Started deploy [horizon/deploy@ee83e27]: fixing sudo rule editing
* 17:46 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 31s)
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:43 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:43 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:18 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: Ieab6b1473e6ce: Fix a mistake (duration: 00m 12s)
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/219599/ (duration: 00m 12s)
* 16:42 thcipriani@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] [[phab:T296098|T296098]]"
* 15:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/217539/ - noop for prod, labs only part (duration: 00m 12s)
* 16:35 thcipriani: rolling back to group0 for [[phab:T296098|T296098]]
* 15:56 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/217539/ (duration: 00m 13s)
* 16:20 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 15:51 logmsgbot: krenair Synchronized wmf-config/flaggedrevs.php: https://gerrit.wikimedia.org/r/#/c/203370/ (duration: 00m 12s)
* 15:31 akosiaris: roll restart wtp10* php7.2-fpm excluding wtp1025, wtp1041
* 15:49 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218539/ (duration: 00m 15s)
* 15:29 akosiaris: depooling wtp1041, wtp1025 from traffic. The entire parsoid cluster is under memory pressure; it looks like a rolling restart of php-fpm will alleviate the pressure and give us some time to dig further into the problem before the pressure builds up again.
* 15:32 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/220068/ - noop for prod, just labs (duration: 00m 12s)
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1025.eqiad.wmnet
* 15:30 logmsgbot: krenair Synchronized commonsuploads.dblist: https://gerrit.wikimedia.org/r/#/c/220715/ (duration: 00m 12s)
* 15:28 akosiaris@cumin1001: conftool action : set/pooled=no; selector: cluster=parsoid,name=wtp1041.eqiad.wmnet
* 15:24 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220747/ (duration: 00m 12s)
* 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220408/ (duration: 00m 12s)
* 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SemanticForms/includes/SF_AutoeditAPI.php: https://gerrit.wikimedia.org/r/#/c/220765/ (duration: 00m 12s)
* 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
* 15:04 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220706/ (duration: 00m 12s)
* 14:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
* 15:02 logmsgbot: krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/220653/ (duration: 00m 12s)
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS buster
* 13:30 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2003 (but not es2004) after maintenance (duration: 00m 12s)
* 14:15 jayme: fleet wide updated wmf-certificates to 0~20211119-1
* 10:57 jynus: rebooting es2003 and es2004
* 13:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS buster
* 10:40 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2003 and es2004 for maintenance (duration: 00m 13s)
* 13:23 moritzm: draining instances from ganeti-test2001 for reimage [[phab:T284811|T284811]]
* 10:09 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (duration: 00m 12s)
* 13:02 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:02 jynus: restarting mysqld on db1018
* 12:10 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:42 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool db1018 for maintenance (duration: 00m 13s)
* 12:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:33 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: I0e5f2d3b2: Wrap lines in <nowiki><pre></nowiki> and .mw-code by default (duration: 00m 12s)
* 11:54 hnowlan: roll-restarting cassandra on eqiad maps for java updates
* 06:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 25 06:59:13 UTC 2015 (duration 59m 12s)
* 11:36 jayme: imported wmf-certificates 0~20211119-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 04:04 ori: restarted apache2 on palladium
* 09:53 XioNoX: run `commit full` on asw-b-codfw - [[phab:T295118|T295118]]
* 03:11 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-25 03:11:01+00:00
* 09:30 XioNoX: re-enable cr2-codfw<->asw-b7-codfw link after disabling inet6 on cr2-codfw:ae2 - [[phab:T295118|T295118]]
* 03:04 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 19s)
* 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 02:40 bblack: puppet re-enabled on caches
* 08:46 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-25 02:37:44+00:00
* 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 44s)
* 08:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 02:04 bblack: disabling puppet on cp* caches for patch-testing
* 08:29 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: update wmf-netbox - ayounsi@cumin1001
* 00:43 awight: update crm from bd8a00196071ddd04efbff7b30567dd9357c9000 to e923225e423948bd70440e2d1131460b10cefac1
* 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 godog: upgrade cassandra to 2.1.7 on restbase1008
* 08:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes: Backport: [[gerrit:739841{{!}}Revert "Title: use PageStore instead of LinkCache"]] (duration: 01m 03s)
* 00:30 twentyafterfour: phabricator upgrade completed
* 08:23 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 25s)
* 00:28 godog: upgrade cassandra to 2.1.7 on restbase1004
* 08:22 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 00:12 legoktm: <twentyafterfour> Phabricator upgrade happening now. Will be down for a few minutes.
* 08:17 moritzm: installing mariadb-10.5 security updates on bullseye (as packaged in Debian, not the wmf-internal packages)
* 06:55 marostegui: Reboot db1132 to pick up new kernel [[phab:T288720|T288720]]
* 06:23 marostegui: Upgrade clouddb1019
* 05:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:55 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/media/DjVuImage.php: Backport: [[gerrit:739838{{!}}media: Store metadata of one-page documents correctly (T296001)]] (duration: 00m 56s)
* 02:54 tgr@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/GrowthExperiments/modules: Backport: [[gerrit:739837{{!}}Lazy-load structured task JS files (T296049)]] (duration: 00m 55s)
* 02:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 02:02 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 02:01 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 01:57 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor2001.codfw.wmnet
* 01:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 01:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 01:45 mutante: I think git-ssh6_22 is down (see alerts lvs2008/2009) due to the v6 issue from ongoing lvs maintenance. depooled in conftool
* 01:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 01:40 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor2001.codfw.wmnet
* 01:37 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:35 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/extensions/Cite/modules/ve-cite/ve.dm.MWReferenceNode.js: Backport for [[phab:T296044|T296044]] (duration: 00m 55s)
* 01:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:31 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2002.codfw.wmnet
* 01:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor2001.codfw.wmnet
* 01:19 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2002.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2001.codfw.wmnet
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2006.codfw.wmnet
* 01:05 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor2005.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2006.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor2005.codfw.wmnet
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2006.codfw.wmnet
* 00:55 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor2005.codfw.wmnet
* 00:33 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 00:08 brennen: end of UTC late deployment training window


== June 24 ==
== 2021-11-18 ==
* 23:18 logmsgbot: rmoen Synchronized wmf-config/mobile.php: Enable browse experiment on test and enwiki (duration: 00m 14s)
* 23:47 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 23:17 logmsgbot: rmoen Synchronized wmf-config/InitialiseSettings.php: Enable browse experiment on test and enwiki (duration: 00m 12s)
* 23:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet,service=miscweb
* 23:13 urandom: rolling restart of Cassandra staging cluster
* 23:28 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:04 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/CentralAuth: https://gerrit.wikimedia.org/r/#/c/220637/ (duration: 00m 13s)
* 23:27 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:03 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/UserMerge: https://gerrit.wikimedia.org/r/#/c/220638/ (duration: 00m 13s)
* 22:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 22:32 mutante: zirconium - stop using 443 at all, rm NameVirtualHost *:443
* 22:48 XioNoX: asw-b-codfw> request system power-off member 7
* 22:30 mutante: zirconium - deleting unused apache configs, bugzilla, etherpad, ...
* 22:44 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 21:09 godog: start cassandra on restbase1008
* 22:28 mutante: icinga (alert1001) - manually fix IP of mw1488.mgmt (was 0.0.0.0, is 10.65.1.26) in /etc/icinga/objects/puppet_hosts.cfg, running puppet
* 18:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf11
* 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1003.eqiad.wmnet
* 18:02 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/Flow/includes/Specials/SpecialEnableFlow.php: https://gerrit.wikimedia.org/r/#/c/220514/ (duration: 00m 15s)
* 21:53 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1003.eqiad.wmnet
* 17:24 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool es2001 and es2002 after maintenance (duration: 00m 13s)
* 21:50 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thumbor1004.eqiad.wmnet
* 17:05 thcipriani: scap completed with the exception of snapshot1001, whose disk is full
* 21:36 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts thumbor1004.eqiad.wmnet
* 17:04 logmsgbot: thcipriani scap failed: OSError [Errno 2] No such file or directory: '/var/lock/scap' (duration: 41m 33s)
* 21:31 XioNoX: asw-b-codfw> request system power-off member 7
* 16:22 logmsgbot: thcipriani Started scap: SWAT: Automatically add to shell group when adding to a project [[gerrit:220468]]
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1004.eqiad.wmnet
* 16:10 logmsgbot: ori Synchronized php-1.26wmf11/includes/page/Article.php: I0e5f2d3b2: Revert r47388 / 8d9243cf3: Use Title::getLocalURL() for rel=canonical links (duration: 00m 13s)
* 21:30 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=thumbor1003.eqiad.wmnet
* 15:57 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Revert Enable browse prototype on test- and enwiki (duration: 00m 15s)
* 21:01 ejegg: updated payments-wiki from {{Gerrit|abb2bd9d}} -> {{Gerrit|d1d6f024}}
* 15:49 jynus: rebooting es2001 and es2002
* 21:00 mutante: [puppetmaster1001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 15:44 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Enable browse prototype on test- and enwiki [[gerrit:219451]] (duration: 00m 12s)
* 21:00 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .git-ssh*.err
* 15:24 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ContentTranslation in testwiki [[gerrit:220385]] (duration: 00m 12s)
* 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:17 logmsgbot: thcipriani Synchronized php-1.26wmf11/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 12s)
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:14 andrewbogott: disabled puppet on labcontrol1001 to hotfix https://gerrit.wikimedia.org/r/#/c/220476/
* 20:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 15:08 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 13s)
* 20:51 dcausse: restart blazegraph on wdqs1006 (jvm stuck)
* 14:53 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2001 and es2002 for maintenance (duration: 00m 13s)
* 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 14:12 logmsgbot: krenair Synchronized php-1.26wmf10/extensions/SemanticForms/includes/SF_AutoeditAPI.php: T103653 live hack (duration: 00m 13s)
* 20:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 10:44 _joe_: restarting jmxtrans on analytics1021
* 20:45 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
* 10:31 jgage: restarting kafka on analytics1021
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:10 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Switchover master es1008 -> es1009 (duration: 00m 12s)
* 20:43 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 09:24 hashar: removing java 6 from gallium and lanthanum https://phabricator.wikimedia.org/T103491
* 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:17 hashar: apt-get upgrade on gallium and lanthanum
* 20:31 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 09:16 jynus: performing a master failover of es1008 into es1009
* 20:30 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 08:27 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1004 (duration: 00m 14s)
* 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 24 05:46:32 UTC 2015 (duration 46m 31s)
* 20:27 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/tests/phpunit/includes/page/PageStoreTest.php: Backport for [[phab:T295931|T295931]] (duration: 01m 03s)
* 05:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1045 (duration: 00m 13s)
* 20:25 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.9/includes/page/PageStore.php: Backport for [[phab:T295931|T295931]] (duration: 01m 04s)
* 05:03 jgage: removed old logs and did 'apt-get clean' on analytics1021 to make space
* 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:00 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-24 03:00:45+00:00
* 20:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 34s)
* 20:01 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-24 02:28:16+00:00
* 19:53 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1004.eqiad.wmnet
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 21s)
* 19:52 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1003.eqiad.wmnet
* 01:39 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2 (duration: 00m 13s)
* 19:52 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1006.eqiad.wmnet
* 01:01 gwicke: rolling restart of cassandra instances to rule out a single node in funky state causing elevated p99 latency
* 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:43 ori: experimenting with httpd on mw1041 again
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:19 gwicke: rolling restart of restbase instances to rule out backend connections as a source for high p99 latencies
* 19:24 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 00:14 ori: experimenting with HHVM shutdown via /stop on the admin server on mw1041
* 19:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4b4c0bca9aa6bceac86f40f03ad688b9d4481c58}}: Enable DiscussionTools automatic topic subscriptions as beta feature on most wikis ([[phab:T290500|T290500]]) (duration: 01m 04s)
* 19:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 twentyafterfour: upgrading php7.3 packages on phab1001
* 19:07 twentyafterfour: rebooting phab2001 to apply updated php and kernel packages
* 19:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 19:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on phab2001.codfw.wmnet with reason: kernel upgrade
* 18:57 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart with plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade + restart - ryankemper@cumin1001 - [[phab:T295705|T295705]]
* 18:52 XioNoX: asw-b-codfw> request system reboot member 7 - [[phab:T295118|T295118]]
* 18:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 17:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 XioNoX: asw-b-codfw> request system power-off member 7 - [[phab:T295118|T295118]]
* 15:39 XioNoX: lvs2007:~$ sudo service pybal stop - [[phab:T295118|T295118]]
* 15:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:35 XioNoX: cr2-codfw# set interfaces et-1/0/3 disable - [[phab:T295118|T295118]]
* 15:34 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:33 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 15:16 hnowlan: roll restarting cassandra on codfw maps for java updates
* 15:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:38 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 14:37 hnowlan: roll-restarting sessionstore for java updates
* 14:19 moritzm: installing testvm2003
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
* 13:34 moritzm: installing pam bugfix updates on bullseye hosts
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 13:22 moritzm: failover ganeti master in test cluster to ganeti-test2002 [[phab:T284811|T284811]]
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudcephosd1016.wikimedia.org
* 12:23 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudcephosd1016.wikimedia.org
* 12:21 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1025.eqiad.wmnet
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1025.eqiad.wmnet
* 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1026.eqiad.wmnet
* 12:16 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1026.eqiad.wmnet
* 12:15 marostegui: Upgrade dbstore1007 to 10.4.22 [[phab:T290841|T290841]] [[phab:T295970|T295970]]
* 12:15 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739550{{!}}Enable Tamil (ta) Section Translation in test wiki (T294223)]] (duration: 01m 05s)
* 12:06 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS buster
* 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS buster
* 11:29 arturo: aborrero@apt1001:~$ sudo -i reprepro export
* 11:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS buster
* 11:26 arturo: aborrero@apt1001:~$ sudo -i reprepro processincoming default /srv/wikimedia/incoming/python-flask-keystone_0.2~git20201012.b5cd4da-1_amd64.changes ([[phab:T295234|T295234]])
* 11:08 arturo: run aborrero@apt1001:~$ sudo -i reprepro processincoming default
* 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 11:07 arturo: added python-flask-oslolog_0.1~git20201012.7803a46-1 to bullseye-wikimedia ([[phab:T295234|T295234]])
* 11:06 arturo: aborrero@apt1001:~ $ for i in $(ll /srv/wikimedia/incoming/ {{!}} grep aborrero {{!}} awk -F' ' '<nowiki>{</nowiki>print $NF<nowiki>}</nowiki>') ; do rm /srv/wikimedia/incoming/$i ; done
* 11:05 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS buster
* 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 10:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS buster
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2002.codfw.wmnet with OS buster
* 10:17 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS buster
* 10:12 topranks: Re-pooling eqiad in DNS after completing iBGP policy changes on cr1-eqiad and cr2-eqiad [[phab:T295672|T295672]]
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:01 moritzm: updating perf on buster hosts
* 10:00 topranks: Re-enabling Equinix IXP port on cr1-eqiad following iBGP changes to address [[phab:T295650|T295650]]
* 09:56 ema: cp4021: repool w/ single backend experiment enabled [[phab:T288106|T288106]]
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS buster
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:41 ema: cp4021: stop ats-be and clear its cache [[phab:T288106|T288106]]
* 09:35 ema: cp4021: depool to enable single backend experiment [[phab:T288106|T288106]]
* 09:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS buster
* 09:32 vgutierrez: pool cp1090 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 09:18 jayme: systemctl start prune-production-images.service on deneb - [[phab:T287222|T287222]]
* 08:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS buster
* 08:46 vgutierrez: depool cp1090 to be reimaged as cache::upload_haproxy - [[phab:T290005|T290005]]
* 08:45 moritzm: installing mariadb-10.3 security updates on buster (as packaged in Debian, not the wmf-internal packages)
* 08:27 topranks: De-pool of Eqiad seems to be ok, transit/peering/transport links changed BW profile but nothing maxed, total LVS connections steady but have shifted to codfw.  Proceeding to reconfigure iBGP policy on cr1-eqiad and cr2-eqiad manually.
* 08:01 topranks: Depooling eqiad in authdns to allow for reconfiguration of CR routers on site ([[phab:T295672|T295672]])
* 07:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:35 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/maintenance/migrateRevisionActorTemp.php: Backport: [[gerrit:739636{{!}}maintenance: Add waitForReplication and sleep in migrateRevisionActorTemp (T275246)]] (duration: 01m 04s)
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17772 and previous config saved to /var/cache/conftool/dbconfig/20211118-073507-root.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17771 and previous config saved to /var/cache/conftool/dbconfig/20211118-072004-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17770 and previous config saved to /var/cache/conftool/dbconfig/20211118-070620-marostegui.json
* 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17769 and previous config saved to /var/cache/conftool/dbconfig/20211118-070559-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17768 and previous config saved to /var/cache/conftool/dbconfig/20211118-070500-root.json
* 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17767 and previous config saved to /var/cache/conftool/dbconfig/20211118-065055-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 40%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17766 and previous config saved to /var/cache/conftool/dbconfig/20211118-064957-root.json
* 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17765 and previous config saved to /var/cache/conftool/dbconfig/20211118-063552-root.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17764 and previous config saved to /var/cache/conftool/dbconfig/20211118-063453-root.json
* 06:31 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 ([[phab:T249683|T249683]])
* 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: After fixing GRANTs', diff saved to https://phabricator.wikimedia.org/P17763 and previous config saved to /var/cache/conftool/dbconfig/20211118-062048-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 20%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17762 and previous config saved to /var/cache/conftool/dbconfig/20211118-061949-root.json
* 06:17 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1156 ([[phab:T249683|T249683]])
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17761 and previous config saved to /var/cache/conftool/dbconfig/20211118-060446-root.json
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17760 and previous config saved to /var/cache/conftool/dbconfig/20211118-054942-root.json
* 05:47 marostegui: Upgrade clouddb1014
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repool after HW maintenance', diff saved to https://phabricator.wikimedia.org/P17759 and previous config saved to /var/cache/conftool/dbconfig/20211118-053438-root.json
* 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 due to network issues ([[phab:T295952|T295952]])', diff saved to https://phabricator.wikimedia.org/P17758 and previous config saved to /var/cache/conftool/dbconfig/20211118-050802-ladsgroup.json
* 04:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2006.codfw.wmnet
* 02:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor2005.codfw.wmnet
* 01:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2006.codfw.wmnet
* 01:48 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2006.codfw.wmnet
* 01:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2005.codfw.wmnet
* 01:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:42 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2005.codfw.wmnet
* 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP - Config: [[gerrit:739633{{!}}Revert "Stop setting wgActorTableSchemaMigrationStage, no longer read in core" (T275246)]] (duration: 01m 04s)
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2006.codfw.wmnet with OS stretch
* 00:28 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2006.codfw.wmnet with OS stretch
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thumbor2005.codfw.wmnet with OS stretch
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:20 ryankemper: [[phab:T290902|T290902]] Test host looks good, proceeding to rest of fleet `ryankemper@cumin1001:~$ sudo cumin -b 4 '*elastic*' 'sudo run-puppet-agent --force'`
* 00:18 urbanecm: UTC late B&C finished
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 ryankemper: [[phab:T290902|T290902]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379; running puppet agent on arbitrary elastic host: `ryankemper@elastic1051:~$ sudo run-puppet-agent --force`
* 00:17 ryankemper: [[phab:T290902|T290902]] Disabling puppet across all elastic*: `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/739379"'`
* 00:16 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5110fe77bb982cca82c8d474339a2b73d02c8024}}: Migrate wmfHostnames to wmgHostnames ([[phab:T45956|T45956]]) (duration: 01m 03s)
* 00:12 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/brwikimedia.png and respective HD variants
* 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:08 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|59c3fe66a0d140ae21f7269150a256a5e9786b24}}: Lossless optimization of the brwikimedia logo (duration: 01m 04s)
* 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 legoktm@cumin1001: START - Cookbook sre.hosts.reimage for host thumbor2005.codfw.wmnet with OS stretch


== June 23 ==
== 2021-11-17 ==
* 23:38 logmsgbot: ori Finished scap: scapping to all apaches for --restart test (duration: 07m 03s)
* 23:53 eileen: * revision {{Gerrit|8054869b}} -> {{Gerrit|b3e2a122}} (latest)
* 23:30 logmsgbot: ori Started scap: scapping to all apaches for --restart test
* 23:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
* 23:24 bblack: nginxes all updated for ssl stapling bugfix
* 23:49 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
* 23:24 logmsgbot: ori Finished scap: scapping to scap-test dsh group for --restart test (duration: 06m 02s)
* 23:45 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 23:18 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 23:45 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1006.eqiad.wmnet
* 23:16 logmsgbot: ori scap aborted: scapping to scap-test dsh group for --restart test (duration: 00m 06s)
* 23:44 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
* 23:16 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1006.eqiad.wmnet
* 22:14 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: RejectParserCacheValue may pass a WikiPage or Article (duration: 00m 13s)
* 23:35 legoktm@cumin1001: conftool action : set/weight=10; selector: name=thumbor1005.eqiad.wmnet
* 22:07 mutante: tmp. disabling puppet on mw1033
* 23:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:53 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: (no message) (duration: 00m 15s)
* 23:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:50 logmsgbot: ori Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 12s)
* 22:42 mutante: miscweb1002/2002 - moved /srv/deployment/scholarships to /root/ ([[phab:T243037|T243037]])
* 21:40 mutante: starting instance planet1001 on ganeti1003 - can't get console
* 21:42 ayounsi@deploy1002: Finished deploy [homer/deploy@dc007aa]: Homer CR738905 (duration: 01m 27s)
* 21:40 logmsgbot: legoktm Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 13s)
* 21:41 ayounsi@deploy1002: Started deploy [homer/deploy@dc007aa]: Homer CR738905
* 21:36 bd808: updated scap to 33f3002 (Ensure that the minimum batch size used by cluster_ssh is 1)
* 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:34 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: 3c8bb2c493: Update SyntaxHighlight_GeSHi for cherry-pick (duration: 00m 13s)
* 21:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:32 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf11
* 20:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:19 logmsgbot: mattflaschen Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change to add Flow_test to enwiki (duration: 00m 11s)
* 20:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:59 logmsgbot: ori scap failed: OSError [Errno 10] No child processes (duration: 01m 46s)
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:58 logmsgbot: ori Started scap: (no message)
* 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 ori: updated scap to master
* 20:33 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.38.0-wmf.7"
* 19:11 ori: running apache graceful-stop on mw1042 to test mod_status behavior during graceful stop
* 20:23 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 01m 03s)
* 19:02 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed) (duration: 03m 50s)
* 20:22 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 18:58 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11 try #2 (13 apaches failed)
* 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:53 logmsgbot: twentyafterfour Finished scap: New deployment branch: 1.26wmf11 (duration: 26m 37s)
* 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 godog: start rolling-downgrade of cassandra to 2.1.3 T102015
* 19:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.9/includes/export/WikiExporter.php: Backport: [[gerrit:739491{{!}}export: Ignore rev_page_id index (T285149)]] (duration: 01m 04s)
* 18:27 logmsgbot: twentyafterfour Started scap: New deployment branch: 1.26wmf11
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 logmsgbot: ori Finished scap: (no message) (duration: 04m 34s)
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:11 paravoid: reloading nginx on all cp* for reuseport
* 19:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 logmsgbot: ori Started scap: (no message)
* 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e167a53cec3c3b216100bab686f28e09c424435}}: Disable local file upload on the Chinese Wikisource ([[phab:T295265|T295265]]) (duration: 01m 05s)
* 17:57 ori: repooled scap-test servers (mw1170-mw1175 and mw1270-mw1275)
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 logmsgbot: ori Finished scap: (no message) (duration: 01m 42s)
* 19:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 logmsgbot: ori Started scap: (no message)
* 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b3a1d976cb1ef931c809b3670fb8c8b3f3a56e7}}: Make reply tool available as opt-out on commonswiki ([[phab:T295838|T295838]]) (duration: 01m 05s)
* 17:10 logmsgbot: ori Finished scap: (no message) (duration: 01m 34s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 logmsgbot: ori Started scap: (no message)
* 18:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS buster
* 17:06 logmsgbot: ori scap aborted: (no message) (duration: 01m 23s)
* 18:57 ejegg: updated fundraising CiviCRM from {{Gerrit|9c5f0b69}} -> {{Gerrit|8054869b}}
* 17:04 logmsgbot: ori Started scap: (no message)
* 18:56 vgutierrez: pool cp2042 (upload) running HAProxy as TLS terminator - [[phab:T290005|T290005]]
* 16:53 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4 (duration: 01m 30s)
* 18:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS buster
* 16:52 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 4
* 18:05 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:45 cscott: updated OCG to version db7a56965233a74c73917c78b5c8c84c867321d9
* 18:01 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 16:37 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3 (duration: 01m 12s)
* 17:59 vgutierrez: depool cp2042 to be reimaged as an HAProxy cache upload node - [[phab:T290005|T290005]]
* 16:35 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 3
* 17:41 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 16:35 bd808: updated scap to da64a65 (Cast pid read from file to an int)
* 17:25 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2002.codfw.wmnet
* 16:26 logmsgbot: bd808 Finished scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2 (duration: 01m 26s)
* 17:11 XioNoX: repool Telia eqiad-codfw transport
* 16:25 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart take 2
* 17:10 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
* 16:22 bd808: updated scap to 947b93f (Fix reference to _get_apache_list)
* 16:34 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts rpki2001.codfw.wmnet
* 16:12 logmsgbot: bd808 scap failed: AttributeError 'Scap' object has no attribute '_get_apache_list' (duration: 02m 15s)
* 16:32 mutante: LDAP - added jkieserman to wmf ([[phab:T295693|T295693]])
* 16:10 logmsgbot: bd808 Started scap: no-op sync to scap-test dsh group; Testing HHVM restart
* 16:28 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 16:01 paravoid: staggered upgrade of cp* fleet to nginx 1.9.2
* 16:28 XioNoX: drain Telia eqiad-codfw link
* 15:57 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: Follow-up 94e5fd2: Default wmgUseContentTranslation true only on Wikipedias [[gerrit:220161]] (duration: 00m 16s)
* 16:27 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts rpki2001.codfw.wmnet
* 15:49 jynus: rebooting es1004
* 16:21 XioNoX: move cr1-codfw<->cr2-eqdfw link to BO cable
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX as default except where it is not deployed [[gerrit:220078]] (duration: 00m 12s)
* 16:19 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable 'frwiki-recommender' campaign in frwiki [[gerrit:220071]] (duration: 00m 13s)
* 16:06 XioNoX: move cr1-codfw:xe-5/3/0 to BO cable
* 14:54 paravoid: reprepro: including nginx 1.9.2-1~bpo8+1 to jessie-wikimedia/backports
* 16:04 XioNoX: re-enable Telia BGP on cr1-codfw
* 14:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1003, depool es1004 (duration: 00m 12s)
* 16:01 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:04 cscott: reverted OCG to version ca4f64852de5b1de782b292b50038fbd2dd84266 (bundler failing with exit code 8)
* 15:59 bblack: netbox: added ganeti01 and ganeti02 cluster definitions for drmrs
* 13:57 cscott: updated OCG to version d7c698d5bf730d34057945e912ac75dc542dd788
* 15:58 XioNoX: disable Telia BGP on cr1-codfw
* 13:44 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 13s)
* 15:55 XioNoX: move codfw-ulsfo link to break-out cable
* 13:44 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/209744/ (duration: 00m 12s)
* 15:46 mutante: restarting pybal on lvs1015
* 12:54 moritzm: ssh on precise hosts has been updated to a backport of 6.6p1-2ubuntu2 (the version from trusty). this allows us to use modern crypto (plus labs can simplify key handling)
* 15:43 _joe_: restarting pybal on lvs2009
* 12:45 jynus: rebooting es1003
* 15:42 mutante: restarting pybal on lvs1016
* 12:18 moritzm: uploaded openssh_6.6p1-2ubuntu2~wmfprecise2 to precise-wikimedia on apt.wikimedia.org
* 15:39 _joe_: restarting pybal on lvs2010
* 12:10 logmsgbot: hoo Synchronized arbitraryaccess.dblist: Arbitrary access for ruwiki and cswiki. T102122 (duration: 00m 12s)
* 15:35 XioNoX: drain ulsfo-codfw link
* 11:33 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (part 2/2) (duration: 00m 12s)
* 14:47 moritzm: installing perl bugfix updates from Bullseye point release
* 11:25 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1002, depool es1003 (duration: 00m 12s)
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 09:41 moritzm: updated jsch on gallium and lanthanum to support modern SSH key exchange in Jenkins (actually that happened yesterday, but I forgot to log it back then)
* 14:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Ganeti update tests
* 09:41 moritzm: added jsch_0.1.50-1ubuntu1~wmfprecise1 to precise-wikimedia on carbon
* 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Change weights on s5 special slaves in eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17755 and previous config saved to /var/cache/conftool/dbconfig/20211117-134942-marostegui.json
* 09:09 akosiaris: failing over etherpad to db1016
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17754 and previous config saved to /var/cache/conftool/dbconfig/20211117-134835-marostegui.json
* 04:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 23 04:53:17 UTC 2015 (duration 53m 16s)
* 13:20 aborrero@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cloudbackup1001-dev.eqiad.wmnet
* 03:33 springle: xtrabackup clone db2023 to db1045
* 13:10 aborrero@cumin1001: START - Cookbook sre.ganeti.makevm for new host cloudbackup1001-dev.eqiad.wmnet
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-23 02:26:44+00:00
* 13:02 Lucas_WMDE: UTC morning backport+config window done
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 47s)
* 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 01:17 logmsgbot: krinkle Synchronized docroot and w: (no message) (duration: 00m 12s)
* 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 01:00 bd808: Pruned virt1000 from trebuchet minions list: redis-cli srem "deploy:scap/scap:minions" virt1000.wikimedia.org
* 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:24 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739467{{!}}Enable disambiguator notifications on 6 Wikipedias (T293319)]] (duration: 01m 04s)
* 12:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
* 12:17 topranks: Re-pooling ulsfo after completing routing changes on cr3-ulsfo and cr4-ulsfo ([[phab:T295672|T295672]])
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
* 12:11 moritzm: failover ganeti master in test cluster to ganeti-test2003
* 12:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:739391{{!}}Enable more languages for Section Translation in testwiki (T294223)]] (duration: 01m 52s)
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 moritzm: installing testvm2002
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17753 and previous config saved to /var/cache/conftool/dbconfig/20211117-105120-marostegui.json
* 10:45 dcausse: restarting blazegraph on wdqs1013 (jvm stuck)
* 10:45 topranks: Commencing manual config on cr3-ulsfo and cr4-ulsfo (site depooled) to reconfigure iBGP ([[phab:T295672|T295672]])
* 10:42 hnowlan: replaced all references to deploy1001 with deploy1002 in all .git/DEPLOY_HEAD directories on deploy1002:/srv/deployment
* 10:41 ema: A:cp re-enable puppet after testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ [[phab:T293879|T293879]]
* 10:37 jayme: imported wmf-certificates 0~20211110-1 to stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 10:31 ema: A:cp disable-puppet to merge and test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738949/ [[phab:T293879|T293879]]
* 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 10:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
* 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 10:14 topranks: De-pool ulsfo in DNS to allow safe reconfiguration / test of changes to CR routers iBGP ([[phab:T295672|T295672]])
* 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 10:00 moritzm: running "gnt-cluster upgrade --to 2.16" on ganeti test cluster
* 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:59 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
* 09:48 moritzm: running "gnt-cluster renew-crypto --new-cluster-certificate" on ganeti test cluster
* 09:39 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
* 09:35 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
* 09:19 _joe_: removing php 7.3 images from docker-registry.wikimedia.org
* 09:13 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
* 09:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
* 09:03 moritzm: installing ffmpeg security updates on stretch
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17752 and previous config saved to /var/cache/conftool/dbconfig/20211117-090124-root.json
* 08:56 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
* 08:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17751 and previous config saved to /var/cache/conftool/dbconfig/20211117-084621-root.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17750 and previous config saved to /var/cache/conftool/dbconfig/20211117-083117-root.json
* 08:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
* 08:24 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17749 and previous config saved to /var/cache/conftool/dbconfig/20211117-081613-root.json
* 08:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
* 08:11 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17748 and previous config saved to /var/cache/conftool/dbconfig/20211117-080110-root.json
* 07:49 elukey: restart coal, navtiming, statsv (refreshed by puppet) after https://gerrit.wikimedia.org/r/737970
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17747 and previous config saved to /var/cache/conftool/dbconfig/20211117-074606-root.json
* 07:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
* 07:34 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17746 and previous config saved to /var/cache/conftool/dbconfig/20211117-073102-root.json
* 07:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
* 07:29 elukey: `apt-get clean` on an-tool1005 to free space in the root partition
* 07:28 elukey: `sudo pkill -U jmixter` on stat100[5,8] to allow puppet to run and remove the offboarded user
* 07:22 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
* 07:20 Amir1: start of clean up of auto-review logs of ruwiki, deleting 3.5M rows ([[phab:T285608|T285608]])
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17745 and previous config saved to /var/cache/conftool/dbconfig/20211117-071559-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17744 and previous config saved to /var/cache/conftool/dbconfig/20211117-070055-root.json
* 06:58 marostegui: Upgrade db1180 to 10.4.22
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 for upgrade', diff saved to https://phabricator.wikimedia.org/P17743 and previous config saved to /var/cache/conftool/dbconfig/20211117-065740-marostegui.json
* 06:52 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
* 06:43 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
* 06:38 Amir1: start of deleting auto-review logs in arwiki ([[phab:T285608|T285608]]) deleting 23M rows
* 06:33 marostegui: Upgrade clouddb1018
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17742 and previous config saved to /var/cache/conftool/dbconfig/20211117-060426-marostegui.json
* 03:16 eileen: checkout revision ({{Gerrit|c67b18b9}} -> {{Gerrit|9c5f0b69}})
* 02:10 eileen: * revision {{Gerrit|817e514a}} -> {{Gerrit|c67b18b9}} (latest) civicrm
* 00:19 ryankemper: [[phab:T276198|T276198]] `ryankemper@cumin1001:~$ sudo cumin -b 3 '*elastic*' 'sudo run-puppet-agent --force'` Change looks good (no complaints from systemd), rolling out to rest of fleet / reenabling puppet
* 00:15 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1006.eqiad.wmnet
* 00:06 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1006.eqiad.wmnet


== June 22 ==
== 2021-11-16 ==
* 23:42 gwicke: restarted Cassandra on restbase1006
* 23:59 ryankemper: [[phab:T276198|T276198]] `ryankemper@elastic1049:~$ sudo run-puppet-agent --force` to test out https://gerrit.wikimedia.org/r/c/operations/puppet/+/739375
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend: For real this time (duration: 00m 14s)
* 23:57 ejegg: updated payments-wiki from {{Gerrit|49ad5962}} -> {{Gerrit|abb2bd9d}}
* 23:27 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: For real this time (duration: 00m 13s)
* 23:27 ryankemper: [[phab:T276198|T276198]] `ryankemper@elastic1049:~$ sudo run-puppet-agent --force`; `elasticsearch_6@production-search-eqiad.service` didn't restart, but it looks like there might be something slightly wrong with the new `ExecPreStart` line => `Executable path is not absolute, ignoring: systemd-tmpfiles --create /usr/lib/tmpfiles.d/elasticsearch.conf`
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 12s)
* 23:27 legoktm@cumin1001: conftool action : set/weight=5; selector: name=thumbor1005.eqiad.wmnet
* 23:17 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/MobileFrontend/: SWAT (duration: 00m 15s)
* 23:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1005.eqiad.wmnet
* 23:12 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable TinyRGB ICC profile swapping on testwiki (duration: 00m 13s)
* 23:22 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1005.eqiad.wmnet
* 22:51 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki/mediawiki.Title.js: I0e5f2d3b2: Fix undeclared dependency on jquery.mwExtension (duration: 00m 12s)
* 23:22 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1005.eqiad.wmnet
* 22:45 gwicke: restarting Cassandra on restbase1005 to get the metrics back
* 23:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1005.eqiad.wmnet
* 22:37 gwicke: restarting Cassandra on restbase1004 to get the metrics back
* 23:19 ryankemper: [[phab:T276198|T276198]] `ryankemper@cumin1001:~$ sudo cumin '*elastic*' 'sudo disable-puppet "Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/721644"'` (done a few mins ago)
* 22:33 gwicke: restarting Cassandra on restbase1003 to get the metrics back
* 20:51 mutante: [miscweb2002:/var/cache] $ sudo rm -rf scholarships/
* 22:24 gwicke: restarting Cassandra on restbase1002 to get the metrics back
* 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:19 bd808: scap error "@ERROR: access denied to common from localhost (127.0.0.1)" from mw2187 and mw2080 on sync-file test.
* 20:39 dcausse: restarting blazegraph on wdqs1005 (jvm stuck)
* 22:17 logmsgbot: bd808 Synchronized README: Testing sync-file after scap update (duration: 00m 12s)
* 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:08 RoanKattouw: Deployed patch for T103054
* 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 godog: reboot restbase1008
* 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:56 bd808: updated scap to 81b7c14 (Move dsh group file names to config)
* 20:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 21:55 bd808: trebuchet checkout for scap/scap failed on 23 hosts: mw1104, mw1222, mw2009, mw2011, mw2021, mw2028, mw2031, mw2034, mw2069, mw2076, mw2080, mw2086, mw2095, mw2099, mw2120, mw2127, mw2131, mw2136, mw2170, mw2187, mw2189, mw2197, virt1000
* 19:52 cmjohnson1: moving mgmt cables from old msw to new msw in b7-eqiad
* 21:50 bd808: trebuchet fetch for scap/scap failed on mw2086.codfw.wmnet, mw1222.eqiad.wmnet and virt1000.wikimedia.org
* 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS bullseye
* 21:41 gwicke: restarting Cassandra on restbase1001 to get the metrics back
* 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS bullseye
* 21:20 ori: Depooled mw1170-mw1175 and mw1270-mw1275 for testing Idddcfe46
* 19:46 joal@deploy1002: Finished deploy [analytics/refinery@194b11b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@194b11b] (duration: 06m 53s)
* 21:07 chasemp: rebooting mw1101 the hard way
* 19:43 cmjohnson1: moving mgmt cables from old msw to new msw in b5-eqiad
* 20:28 cscott: updated Parsoid to version d488783e
* 19:40 joal@deploy1002: Started deploy [analytics/refinery@194b11b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@194b11b]
* 19:34 akosiaris: delete pad:ips from etherpad
* 19:39 joal@deploy1002: Finished deploy [analytics/refinery@194b11b] (thin): Regular analytics weekly train THIN [analytics/refinery@194b11b] (duration: 00m 07s)
* 19:01 jynus: rebooting es1002
* 19:39 joal@deploy1002: Started deploy [analytics/refinery@194b11b] (thin): Regular analytics weekly train THIN [analytics/refinery@194b11b]
* 18:52 logmsgbot: ori Synchronized php-1.26wmf10/includes/OutputPage.php: I0e5f2d3b2: Construct clean canonical URLs for wiki pages, ignoring request URL (T67402) (duration: 00m 14s)
* 19:38 joal@deploy1002: Finished deploy [analytics/refinery@194b11b]: Regular analytics weekly train [analytics/refinery@194b11b] (duration: 22m 14s)
* 18:01 legoktm: live-hacking mw1017 to debug T103053
* 19:34 cmjohnson1: moving mgmt cables from old msw to new msw in b3-eqiad
* 17:49 mutante: Bugzilla has left the building
* 19:27 cmjohnson1: moving mgmt cables from old msw to new msw in b2-eqiad
* 16:31 jynus: resetting wikitech-static mysql contents to improve fragmentation
* 19:18 cmjohnson1: moving mgmt cables from old msw to new msw in b1-eqiad
* 16:26 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1001, depool es1002 (duration: 00m 14s)
* 19:16 joal@deploy1002: Started deploy [analytics/refinery@194b11b]: Regular analytics weekly train [analytics/refinery@194b11b]
* 16:12 andrewbogott: shutting down virt1000
* 19:15 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.6 (duration: 03m 17s)
* 16:08 andrewbogott: disabling puppet on virt1000
* 19:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 ottomata: deploying eventlogging 0.9. This includes changes for arbitrary eventlogging URIs in all eventlogging stages, as well as support for schema based kafka topic URIs.
* 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6001.drmrs.wmnet with OS bullseye
* 15:24 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/WikiEditor: SWAT: Reduce 'Edit' EventLogging schema sampling rate to 6.25% (1/16th) [[gerrit:219837]] (duration: 00m 13s)
* 19:11 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]] (duration: 36m 32s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Default wmgUseWikibaseQuality on beta to true. [[gerrit:219630]] (duration: 00m 14s)
* 19:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6004.drmrs.wmnet with OS bullseye
* 14:32 hashar: restarting Jenkins
* 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bullseye
* 13:26 jynus: rebooting es1001 for regular maintenance
* 19:11 cmjohnson1: moving mgmt cables from old msw to new msw in a7-eqiad
* 12:08 paravoid: powercycled ms-be1002, stuck at console
* 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:12 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool es1001 (duration: 00m 13s)
* 19:10 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bullseye
* 11:06 _joe_: restarting hhvm on the low-memory appservers (main and api)
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 09:23 hashar: upgrading Jenkins gearman plugin from 0.1.1 to latest master (f2024bd). Restarting Jenkins.
* 19:06 cmjohnson1: moving mgmt cables from old msw to new msw in a5-eqiad
* 05:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 22 05:11:22 UTC 2015 (duration 11m 21s)
* 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-22 02:31:32+00:00
* 19:01 cmjohnson1: moving mgmt cables from old msw to new msw in a4-eqiad
* 02:27 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 27s)
* 18:56 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6003.drmrs.wmnet with OS bullseye
* 00:44 jgage: restarted gitblit on antimony again
* 18:55 cmjohnson1: moving mgmt cables from old msw to new msw in a3-eqiad
* 18:51 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti6002.drmrs.wmnet with OS bullseye
* 18:41 cmjohnson1: moving mgmt cables from old msw to new msw in a2-eqiad
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.9  refs [[phab:T293950|T293950]]
* 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bullseye
* 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS bullseye
* 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bullseye
* 18:31 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS bullseye
* 18:28 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:26 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:20 mutante: removing scholarships.wikimedia.org from DNS - [[phab:T243037|T243037]]
* 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:56 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:27 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:23 herron: systemctl reset-failed ifup@ens13 on prometheus5001 [[phab:T273026|T273026]]
* 16:22 moritzm: systemctl reset-failed ifup@ens13 on durum5001 after restart [[phab:T273026|T273026]]
* 16:12 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:05 moritzm: powercycling ganeti5002
* 15:53 andrewbogott: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/525220 which makes read-only ldap the default for ldap clients
* 14:44 cmooney@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki2001.codfw.wmnet
* 14:31 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2001.codfw.wmnet
* 14:31 cmooney@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2002.codfw.wmnet
* 14:24 jynus: re-adding backup user to db1108:analytics_meta [[phab:T284150|T284150]]
* 14:22 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2002.codfw.wmnet
* 14:18 cmooney@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2001.codfw.wmnet
* 14:09 cmooney@cumin2002: START - Cookbook sre.hosts.decommission for hosts rpki2001.codfw.wmnet
* 13:58 cmooney@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host rpki2001.codfw.wmnet
* 13:51 cmooney@cumin2002: START - Cookbook sre.ganeti.makevm for new host rpki2001.codfw.wmnet
* 13:23 moritzm: installing debconf bugfix updates on buster
* 13:21 moritzm: prune unused packages from ping3001 [[phab:T295767|T295767]]
* 13:18 moritzm: prune unused packages from ping1001/ping2001 [[phab:T295767|T295767]]
* 13:05 moritzm: installing psmisc bugfix updates on buster hosts
* 13:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS buster
* 12:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:29 moritzm: installing Linux 4.19.208 updates on buster hosts (no reboots)
* 12:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS buster
* 12:22 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
* 12:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS buster
* 11:55 moritzm: failover ganeti master in test cluster to ganeti-test2002
* 11:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS buster
* 11:31 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
* 11:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS buster
* 10:21 ema: A:cp re-enable puppet after successful test on cp402[17] [[phab:T293879|T293879]]
* 10:20 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
* 10:15 moritzm: installing testvm2001
* 10:06 arturo: updating deb packages on stretch-wikimedia/thirdparty/kubeadm-k8s-1-21 ([[phab:T282942|T282942]])
* 10:02 ema: A:cp disable puppet to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/738910 on cp4021 [[phab:T293879|T293879]]
* 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS buster
* 09:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS buster
* 09:40 ayounsi@deploy1002: Finished deploy [homer/deploy@c570af3]: Homer CR738905 (duration: 01m 25s)
* 09:39 ayounsi@deploy1002: Started deploy [homer/deploy@c570af3]: Homer CR738905
* 09:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS buster
* 08:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS buster
* 08:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS buster
* 08:04 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS buster
* 07:25 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster
* 02:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:28 urbanecm: UTC late window done
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:23 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/WikimediaEvents/: 738399: 739004: WikimediaEvents backports ([[phab:T294738|T294738]]) (duration: 00m 56s)
* 00:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|50d9f2687cd11e6f838313a530c6bbd498d0b83e}}: GrowthExperiments: Set up GEHomepageNewAccountVariantsByPlatform ([[phab:T294737|T294737]]) (duration: 00m 56s)
* 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== June 21 ==
== 2021-11-15 ==
* 11:28 jynus: restarting apache on mw1110
* 23:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1005.eqiad.wmnet
* 06:55 gwicke: restarted bootstrap on restbase1009 earlier today; hardware hasn't died yet
* 22:59 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1005.eqiad.wmnet
* 05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 21 05:01:07 UTC 2015 (duration 1m 6s)
* 22:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thumbor1005.eqiad.wmnet with reason: reboot after first puppet run
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-21 02:27:13+00:00
* 22:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thumbor1005.eqiad.wmnet with reason: reboot after first puppet run
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 10m 23s)
* 21:46 bblack: dns6002 - reboot for another round of bios fixups
* 01:39 jgage: restarted gitblit on antimony at 00:43 UTC
* 21:32 bblack: dns6001 - reboot for another round of bios fixups
* 01:37 Krenair: testing morebots
* 21:21 legoktm: uploaded php7.4_7.4.25-1+wmf2+buster1_amd64.changes to apt.wm.o with patch for [[phab:T293568|T293568]]
* 21:19 mutante: removing mediawiki font packages from remaining regular appservers globally ([[phab:T294378|T294378]])
* 20:49 mutante: retiring https://scholarships.wikimedia.org - removing from ATS ([[phab:T243037|T243037]])
* 20:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS buster
* 20:09 Amir1: revoked all grants from wikiadmin and gave back an explicit list on clouddb1013:3311 ([[phab:T249683|T249683]])
* 20:08 Amir1: revoked all grants from wikiadmin and gave back an explicit list on clouddb1021:3311 ([[phab:T249683|T249683]])
* 20:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
* 20:03 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db1102:3312 ([[phab:T249683|T249683]])
* 19:57 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:46 Amir1: revoked all grants from wikiadmin and gave back an explicit list on db2101:3315 ([[phab:T249683|T249683]])
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 urbanecm: UTC evening B&C window done
* 19:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|898ebb1e8400a759ffc5553794f6a7200c97bf49}}: Enable talk for mobile users on enwiki ([[phab:T293946|T293946]]) (duration: 00m 57s)
* 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
* 19:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cdac608e84250207efeac9ea489a7e5be908ec70}}: Change votewiki language back to English ([[phab:T292685|T292685]]) (duration: 00m 56s)
* 19:06 mutante: removing font packages from MW API appservers [[phab:T294378|T294378]]
* 18:58 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
* 18:52 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 40 days, 0:00:00 on ps1-d1-codfw with reason: Testing new PDU devices [[phab:T265435|T265435]]
* 18:52 volans@cumin2002: START - Cookbook sre.hosts.downtime for 40 days, 0:00:00 on ps1-d1-codfw with reason: Testing new PDU devices [[phab:T265435|T265435]]
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1be2d3941530bbed54632dafb0b804d0ddf41299}}: Growth IP research survey: Fix platforms ([[phab:T294568|T294568]]) (duration: 00m 55s)
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d15948e6da61af2d1db271cb0c9d8bd9a5395d75}}: foundationwiki: Restrict editing in more namespaces ([[phab:T294900|T294900]]) (duration: 00m 56s)
* 18:19 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T294580
* 18:19 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikimedia.org/T294580
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 arnoldokoth: upgrading gitlab version on gitlab2001 ([[phab:T294580|T294580]])
* 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|00650753d77d7a526b6751669bf3548cf81fb02a}}: foundationwiki: Revoke edit from * ([[phab:T294900|T294900]]) (duration: 00m 56s)
* 16:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 16:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bullseye
* 16:34 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=enwiki 'MU test [[phab:T244635|T244635]] 1'
* 16:06 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:46 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/DjVuHandler.php: Backport: [[gerrit:738932{{!}}media: Avoid logspam in case of lack of 'data' in metadata]] (duration: 00m 55s)
* 15:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|734f3b0094799007e38dea1d152f0afeb3134e1b}}: uzwiki: Enable Growth features in dark mode ([[phab:T294245|T294245]]; 3/3) (duration: 00m 55s)
* 15:28 urbanecm@deploy1002: Synchronized wmf-config/config/uzwiki.yaml: {{Gerrit|734f3b0094799007e38dea1d152f0afeb3134e1b}}: uzwiki: Enable Growth features in dark mode ([[phab:T294245|T294245]]; 2/3) (duration: 00m 55s)
* 15:26 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|734f3b0094799007e38dea1d152f0afeb3134e1b}}: uzwiki: Enable Growth features in dark mode ([[phab:T294245|T294245]]; 1/3) (duration: 00m 55s)
* 15:26 urbanecm: mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=uzwiki --phab=[[phab:T294245|T294245]] # [[phab:T294245|T294245]]
* 15:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:24 elukey: import AMD ROCm 4.5 in thirdparty/amd-rocm45 for buster-wikimedia - [[phab:T295661|T295661]]
* 15:18 urbanecm: uzwiki: Create growthexperiments tables ([[phab:T294245|T294245]])
* 15:15 elukey: `reprepro --delete clearvanished` on apt1001 to clean-up thirdparty/amd-rocm38 (buster and stretch) - [[phab:T295661|T295661]]
* 14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4f17e85d4708b52fc98c34b489d7504d5e94523c}}: GrowthExperiments: Disable link recommendation frontend on dewiki (duration: 00m 56s)
* 14:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:15 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:734423{{!}}Disable DPL on Wikiquotes where not in use (T287916)]] (duration: 00m 56s)
* 13:55 moritzm: installing java-atk-wrapper bugfix updates
* 13:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 Amir1: start of djvu clean up in commons in a screen. Gonna take a couple of days ([[phab:T275268|T275268]])
* 13:40 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes: Backport: [[gerrit:738641{{!}}Revert "media: Port DjVuImage::retrieveMetaData() to use BoxedCommand"]] (duration: 01m 01s)
* 13:36 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:34 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:25 topranks: Adding new policy-statement to CR routers via homer to set next-hop self on iBGP sessions (not yet configured for any peers).
* 12:46 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
* 12:45 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:02 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|6b3bacd986ab041a5e3aee06c6de04e344dd8015}}: uzwiki: Enable VisualEditor by default ([[phab:T294245|T294245]]) (duration: 00m 56s)
* 11:59 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
* 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
* 11:07 cmooney@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host rpki1001.eqiad.wmnet
* 11:04 urbanecm: wikiadmin@10.64.0.164(ukwiki)> delete from growthexperiments_mentor_mentee where gemm_mentee_id = 464811 /* Martin Urbanec (WMF) */;
* 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
* 10:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/DjVuHandler.php: Backport: [[gerrit:738639{{!}}media: Make new DjVu metadata handler more defensive]] (duration: 00m 54s)
* 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
* 10:54 cmooney@cumin1001: START - Cookbook sre.ganeti.makevm for new host rpki1001.eqiad.wmnet
* 10:53 volans: upgrading python3-wmflib to 1.0.0-1 on all hosts buster+
* 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki1001.eqiad.wmnet
* 10:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/: {{Gerrit|05d6550218f21f89171fcb8c73230e0855cf41a4}}: MenteeOverviewDataUpdater: Use UserOptionsManager::saveOptions ([[phab:T295339|T295339]]) (duration: 00m 56s)
* 10:34 cmooney@cumin1001: START - Cookbook sre.hosts.decommission for hosts rpki1001.eqiad.wmnet
* 10:34 topranks: Rebuilding rpki1001.eqiad.wmnet with larger disk - going to decom and then re-create via cookbooks.
* 10:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/: Backport: [[gerrit:738638{{!}}media: Build and use JSON for metadata of djvu instead of XML (T275268 T192866)]] (duration: 00m 56s)
* 10:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:00 moritzm: update Java on Hadoop and Presto nodes
* 09:59 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.7/includes/media/: Backport: [[gerrit:738636{{!}}media: Port DjVuImage::retrieveMetaData() to use BoxedCommand (T289228)]] (duration: 00m 56s)
* 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 09:39 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 08:49 moritzm: installing glibc bugfix updates from bullseye point release
* 08:07 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6002.drmrs.wmnet with OS buster
* 07:41 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS buster


== June 20 ==
== 2021-11-14 ==
* 22:50 bblack: restarted gitblit java service on antimony
* 11:48 paravoid: disable cr1-eqiad:xe-3/0/6 (IXP port) to mitigate [[phab:T295650|T295650]]
* 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 20 04:27:14 UTC 2015 (duration 27m 13s)
* 02:21 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-20 02:21:30+00:00
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 02s)


== June 19 ==
== 2021-11-13 ==
* 23:32 gwicke: upgraded restbase1006 to cassandra 2.1.7
* 18:43 AndyRussG: Enabled debug logging for PayPal IPN listener (updated SmashPig config {{Gerrit|a9e30591}} -> {{Gerrit|9567cc4a}} on frpig1001)
* 23:30 gwicke: starting cassandra bootstrap on restbase1009
* 02:59 ryankemper: [Elastic] `relforge` cluster's back to green, rolling restarts complete
* 21:37 gwicke: upgraded cassandra on 1003 to 2.1.7 (pre-release, likely going out on Monday)
* 02:57 ryankemper: [Elastic] `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service`
* 18:32 godog: stop cassandra on restbase1008
* 02:56 ryankemper: [Elastic] Cluster's green, proceeding to next and final host
* 17:45 logmsgbot: krenair Synchronized private/PrivateSettings.php: sync 4a30446e for wikitech cleanup - T102361 (duration: 00m 12s)
* 02:52 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad.service elasticsearch_6@relforge-eqiad-small-alpha.service`
* 17:24 godog: install linux 3.19 on restbase100[789]
* 02:52 ryankemper: [Elastic] Downtimed relforge* for 2 hours in order to perform a rolling restart of the two hosts `relforge1003` and `relforge1004`
* 17:12 ori: salt -t30 -G 'php:hhvm' cmd.run 'rm -f /usr/local/bin/check_tc_space' (https://gerrit.wikimedia.org/r/#/c/219102/)
* 16:54 moritzm: updated/rebooted nescio/maerlant to 3.19
* 13:40 andrewbogott: test test test
* 02:19 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-19 02:19:33+00:00
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 08s)
* 00:49 springle: killed storm of research queries on dbstore1002, load avg 90+, replag, likely explosion, etc. emailing analytics@
* 00:13 logmsgbot: ebernhardson Synchronized php-1.26wmf10/extensions/Flow/tests/: no-op sync of flow test cases in wmf10 (duration: 00m 17s)
* 00:11 logmsgbot: ebernhardson Synchronized php-1.26wmf10/skins/Vector/: Bump Vector submodule in 1.26wmf10 for swat (duration: 00m 12s)


== June 18 ==
== 2021-11-12 ==
* 23:37 logmsgbot: ebernhardson Synchronized php-1.26wmf9/skins/Vector: Bump Vector in 1.26wmf9 for SWAT (duration: 00m 16s)
* 21:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:22 logmsgbot: ebernhardson Synchronized wmf-config/: Actually enable the feedback link on Special:Search (duration: 00m 17s)
* 20:57 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:08 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Enable wgCirrusSearchFeedbackLink on enwiki (duration: 00m 13s)
* 18:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 21:07 godog: start (bootstrap) cassandra on restbase1008
* 18:08 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:43 akosiaris: uploaded to apt.wikimedia.org trusty-wikimedia: apertium-urd-hin_0.1.0+svn~r60389-1
* 17:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:17 akosiaris: restarted salt on sca1001, truncate log files. keep a sample in /tmp/
* 17:35 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:03 chasemp: apache && hhvm restart for mw 1243 1250 1254 1256 1257
* 17:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:00 chasemp: apache && hhvm restart for mw...1256 1255 1254 1250 1243 1242 1071 1021
* 17:23 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:58 mutante: restarting hhvm on mw1021, mw1071
* 17:15 ottomata: restarting and arming keyholder on deploy1002 - [[phab:T295380|T295380]]
* 19:27 godog: bounce cassandra on restbase1003, new logging configuration
* 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:26 akosiaris: puppet-merged on strontium
* 16:59 otto@deploy1002: Finished deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided) (duration: 00m 04s)
* 19:15 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.26wmf10
* 16:59 otto@deploy1002: Started deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided)
* 19:06 godog: upgrade cassandra to 2.1.6 on restbase1003
* 16:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-urd_0.1.0~r57551-1
* 16:38 otto@deploy1002: Finished deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided) (duration: 01m 12s)
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hin_0.1.0~r57344-1
* 16:36 otto@deploy1002: Started deploy [airflow-dags/analytics@093f067] (hadoop-test): (no justification provided)
* 18:56 akosiaris: uploaded to apt.wikimedia.org jessie-wikimedia: apertium-cy-en_0.1.1~r57554-1
* 16:15 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:43 legoktm: fixed content model of MediaWiki:Common.css@lrcwiki
* 16:11 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 18:18 YuviPanda: restarted nutcracker on wikitech
* 14:38 moritzm: installing 5.10.70 kernels on bullseye systems (just the update, no coordinated reboot)
* 18:16 YuviPanda: restarted keystone on labcontrol1001
* 11:05 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2100.codfw.wmnet with OS buster
* 17:13 gwicke: bouncing cassandra on restbase1002
* 10:47 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS buster
* 17:11 godog: restart cassandra on restbase1004
* 10:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:53 gwicke: updated restbase to 7ffaf94b
* 10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:13 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Hovercards: Disable test release on Catalan and Greek Wikipedias [[gerrit:215932]] (duration: 00m 13s)
* 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 15:06 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for deployment on 20150618 [[gerrit:218886]] (duration: 00m 14s)
* 10:41 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 11:14 akosiaris: powercycling labstore2001
* 10:35 ema: A:cp re-enable puppet after successful testing of https://gerrit.wikimedia.org/r/c/operations/puppet/+/737424 on cp4027 [[phab:T293879|T293879]]
* 09:08 moritzm: added firejail_0.9.26-1~wmfjessie1 and firejail_0.9.26-1~wmftrusty1 to apt.wikimedia.org
* 10:25 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 08:45 jynus: very brief replication stop for s7, already corrected
* 10:17 ema: A:cp disable-puppet to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/737424 on cp4027 [[phab:T293879|T293879]]
* 06:51 Coren: rebooting labstore2001
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17736 and previous config saved to /var/cache/conftool/dbconfig/20211112-084813-root.json
* 06:32 legoktm: live hacking mw1017 for T102915
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17735 and previous config saved to /var/cache/conftool/dbconfig/20211112-083310-root.json
* 05:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 18 05:26:01 UTC 2015 (duration 26m 0s)
* 08:27 moritzm: imported openjdk-8 8u312-b07-1~deb11u1 to component/jdk8 for bullseye-wikimedia (rebuild of latest Java 8 security release for Bullseye)
* 02:48 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-18 02:48:44+00:00
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17734 and previous config saved to /var/cache/conftool/dbconfig/20211112-081806-root.json
* 02:46 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 05m 03s)
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 40%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17733 and previous config saved to /var/cache/conftool/dbconfig/20211112-080302-root.json
* 02:32 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-18 02:32:45+00:00
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17732 and previous config saved to /var/cache/conftool/dbconfig/20211112-074759-root.json
* 02:28 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 56s)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 20%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17731 and previous config saved to /var/cache/conftool/dbconfig/20211112-073255-root.json
* 02:04 springle: applied T99941 schema change to all remaining affected (i.e., old) wikis
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 10%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17730 and previous config saved to /var/cache/conftool/dbconfig/20211112-071752-root.json
* 02:01 tgr: ran https://gerrit.wikimedia.org/r/#/c/159350/7/backend/schema/mysql/developer_agreement.sql on mediawikiwiki
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Add weight for db1104', diff saved to https://phabricator.wikimedia.org/P17729 and previous config saved to /var/cache/conftool/dbconfig/20211112-070236-marostegui.json
* 01:32 ejegg: updated payments from f33d0a8687a120a2057a7e6acad67da63b17f97e to a17ee221db0dbde70c92e24fc188379b6dbad613
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 5%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P17728 and previous config saved to /var/cache/conftool/dbconfig/20211112-070141-root.json
* 01:20 logmsgbot: ori Synchronized php-1.26wmf10/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: 0c21a14a6e: Revert StashEdit: Use postWithToken (duration: 00m 13s)
* 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:06 twentyafterfour: applied hotfix for T102276 and restarted apache on iridium
* 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:00 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf10
* 00:15 tgr: UTC late deploys done
* 00:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:738284{{!}}Enable GrowthExperiments image recommendations on eswiki (T294878)]] (duration: 00m 56s)


== June 17 ==
== 2021-11-11 ==
* 23:35 logmsgbot: catrope Synchronized php-1.26wmf10/extensions/Gather: SWAT (duration: 00m 14s)
* 16:56 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 23:35 gwicke: rolled back restbase to 90817c2a
* 16:30 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 23:24 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/MobileFrontend: SWAT (duration: 00m 15s)
* 16:28 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 23:23 logmsgbot: catrope Synchronized php-1.26wmf9/extensions/Flow: SWAT (duration: 00m 15s)
* 16:28 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 22:45 gwicke: rolling restart of cassandra nodes
* 16:26 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 22:09 gwicke: rolling restart of restbase instances to apply puppet change after puppet actually ran on all nodes
* 16:26 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 21:58 gwicke: rolling restart of restbase instances to apply config change
* 16:26 jynus@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1139.eqiad.wmnet with OS buster
* 21:56 godog: restart nutcracker on mw1145
* 16:15 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
* 21:35 gwicke: restarting cassandra on restbase1005
* 16:12 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 20:47 mutante: temp. stopped icinga-wm
* 15:49 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
* 20:37 gwicke: deployed RESTBase 7ffaf94bfc
* 15:44 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6001.drmrs.wmnet with OS buster
* 20:24 cscott: updated Parsoid to version 402ddf66
* 15:18 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS buster
* 20:01 ottomata: resized antimony's / LV from 30G to 100G.  looks like /var/lib/git was getting filled up
* 15:16 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 19:43 jynus: rolling schema changes on hewiki
* 14:59 moritzm: installing krb5 security updates on buster/bullseye (client-side libs/tools only, KDCs already fixed)
* 19:29 godog: downgrade and restart cassandra to 2.1.3 on restbase1001, metrics not being pushed to graphite with 2.1.6
* 14:55 moritzm: installing PHP 7.0 security updates
* 19:05 godog: bounce cassandra on xenon
* 14:52 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 18:46 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ic03b152de: Make $wgUploadPath for commons https only for benefit instant commons (duration: 00m 14s)
* 14:50 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
* 18:11 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf10
* 14:46 moritzm: installing sqlalchemy security updates on stretch
* 17:45 godog: bounce cassandra on restbase1001
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:39 mutante: repooled mw1234
* 14:41 moritzm: installing libxstream-java security updates
* 17:24 ottomata: starting reinstall of Zookeeper analytics nodes (analytics102[345]): https://phabricator.wikimedia.org/T101713
* 14:38 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade. - btullis@cumin1001
* 17:16 godog: bounce cassandra on restbase1001
* 14:33 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 17:14 jynus: rolling schema changes on ruwiki master
* 14:32 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 17:13 mutante: running puppet via salt on api appservers in batches, switch to ganglia_new and carbon
* 14:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 godog: cassandra stopped sending graphite metrics after restart, investigating (test cluster works fine tho)
* 14:21 volans: uploaded python3-wmflib_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:58 jynus: rolling schema changes on ruwiki slaves
* 14:15 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 16:28 godog: start upgrading restbase1001 to cassandra 2.1.6 T102015
* 14:12 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1139.eqiad.wmnet with OS buster
* 16:02 logmsgbot: thcipriani Finished scap: Wikitech-Ldap host record roll-out (duration: 24m 35s)
* 14:10 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2100.codfw.wmnet with OS buster
* 15:37 logmsgbot: thcipriani Started scap: Wikitech-Ldap host record roll-out
* 14:05 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons. - btullis@cumin1001
* 15:19 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Give patrolmarks right to "*" on dewiki [[gerrit:218901]] (duration: 00m 13s)
* 13:59 moritzm: installing bind9 security updates (only client-side-tools/libs)
* 15:17 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Add a throttle exception for United Islands of Prague [[gerrit:217413]] (duration: 00m 14s)
* 13:48 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1139.eqiad.wmnet with OS buster
* 15:15 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable captcha on labswiki for now [[gerrit:218908]] (duration: 00m 13s)
* 13:45 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2100.codfw.wmnet with OS buster
* 15:10 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add extra namespace aliases for Italian Wikipedia [[gerrit:215708]] (duration: 00m 13s)
* 13:38 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 15:08 anomie: SWAT: Enable anti-abuse features on labswiki [[gerrit:218903]]
* 13:38 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:08 jynus: testing some schema changes on testwiki
* 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:00 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on nowiki and plwiki (duration: 00m 13s)
* 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:56 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on fiwiki and idwiki (duration: 00m 13s)
* 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:735367{{!}}Load Wikibase Client before other Wikibase extensions (T294224)]] (duration: 00m 55s)
* 13:26 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on bgwiki and eowiki (duration: 00m 13s)
* 13:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 akosiaris: reload pybal on lvs1006
* 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:50 mobrovac: finished deploying mathoid I40ef68 on SCA
* 13:01 Lucas_WMDE: UTC morning backport+config window formally over (I’ll do one more config change shortly)
* 10:48 akosiaris: repooled mathoid.svc.eqiad.wmnet: sca1002 backend
* 13:00 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:738213{{!}}GrowthExperiments: Add campaign pattern for control group (T295068)]] (duration: 00m 55s)
* 10:44 akosiaris: enable puppet on sca1002
* 12:50 lucaswerkmeister-wmde@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737189{{!}}Don't need to keep all config in memory]] (resync, previous deploy for this file was missing `git rebase`) (duration: 00m 55s)
* 10:43 akosiaris: enable puppet
* 12:47 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: Backport: [[gerrit:737960{{!}}CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (2/2 SpecialCreateAccountCampaign.php)]] (duration: 00m 55s)
* 10:43 akosiaris: depool sca1002 for mathoid.svc.eqiad.wmnet
* 12:46 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:737960{{!}}CreateAccountCampaign: Show/hide new HTML based on query param (T295068) (1/2 HomepageHooks.php)]] (duration: 00m 54s)
* 10:43 akosiaris: reloaded pybal on lvs1003
* 12:37 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1116.eqiad.wmnet with OS buster
* 10:28 akosiaris: repool sca1002, depool sca1001
* 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:18 mark: Halting pvmove of md124 on labstore1001
* 12:30 jynus@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2097.codfw.wmnet with OS buster
* 09:30 akosiaris: disable puppet on sca1001
* 12:28 kharlan@deploy1002: Synchronized php-1.38.0-wmf.7/includes/specialpage/LoginSignupSpecialPage.php: Backport: [[gerrit:737961{{!}}LoginSignup: Add function for overriding benefits container (T295068)]] (duration: 00m 57s)
* 09:09 akosiaris: depool sca1001, resource: mathoid
* 12:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:09 akosiaris: puppet disabled on sca1002
* 12:22 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:37 YuviPanda: run sudo salt -t 20 -b 100 '*' cmd.run 'sudo service salt-minion restart' on virt1000, attempt to get them to answer on labcontrol1001 instead
* 12:21 moritzm: imported openjdk-8 8u312-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security release for Buster)
* 06:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 17 06:52:58 UTC 2015 (duration 52m 57s)
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:56 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-17 02:56:49+00:00
* 12:15 awight@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737189{{!}}Don't need to keep all config in memory]] (duration: 00m 55s)
* 02:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1045 (duration: 00m 13s)
* 12:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:54 springle: found wikiversions.json modified on tin since 2015-06-16 23:27 (catrope?); stashed and reapplied the file in order to do a pull
* 12:13 awight@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:737192{{!}}Avoid error suppression]] (duration: 00m 55s)
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 04m 44s)
* 12:10 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host db2097.codfw.wmnet with OS buster
* 02:35 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-17 02:35:23+00:00
* 12:10 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host db1116.eqiad.wmnet with OS buster
* 02:32 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 06m 12s)
* 12:08 awight@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: [[gerrit:737187{{!}}Anchor relative import]] (duration: 00m 56s)
* 02:21 logmsgbot: ori Synchronized php-1.26wmf9/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 11:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
* 02:21 logmsgbot: ori Synchronized php-1.26wmf10/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I480cbc7ad (duration: 00m 12s)
* 11:31 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
* 00:10 paravoid: draining esams because of upcoming network maintenance window
* 11:28 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1001.eqiad.wmnet with OS buster
* 11:04 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host dbprov1001.eqiad.wmnet with OS buster
* 10:56 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2001.codfw.wmnet with OS buster
* 10:37 moritzm: updated routinator in thirdparty/routinator for bullseye-wikimedia to 0.10.12 [[phab:T292503|T292503]]
* 10:24 jynus@cumin2002: START - Cookbook sre.hosts.reimage for host dbprov2001.codfw.wmnet with OS buster
* 10:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS buster
* 10:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
* 10:15 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: working on network tests
* 10:15 vgutierrez: pool cp3065 running haproxy - [[phab:T290005|T290005]]
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions from s5 eqiad [[phab:T263127|T263127]]', diff saved to https://phabricator.wikimedia.org/P17725 and previous config saved to /var/cache/conftool/dbconfig/20211111-092528-marostegui.json
* 09:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS buster
* 09:10 vgutierrez: depool cp3065 to be reimaged as cache::upload_haproxy - [[phab:T290005|T290005]]
* 09:03 arturo: pull all packages for buster-wikimedia/thirdparty/kubeadm-k8s-1-21 ([[phab:T282942|T282942]])
* 08:17 marostegui: Upgrade db2078 [[phab:T288720|T288720]]
* 08:13 marostegui: Restart db1132 [[phab:T288720|T288720]]
* 06:56 elukey: `systemctl start prometheus-mysqld-exporter@analytics_meta` on db1108
* 06:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS buster
* 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS buster
* 06:06 marostegui: Stop replication on db1104 (old master) [[phab:T294321|T294321]]
* 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (old master) [[phab:T294321|T294321]]', diff saved to https://phabricator.wikimedia.org/P17723 and previous config saved to /var/cache/conftool/dbconfig/20211111-060242-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 primary and set section read-write [[phab:T294321|T294321]]', diff saved to https://phabricator.wikimedia.org/P17722 and previous config saved to /var/cache/conftool/dbconfig/20211111-060102-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - [[phab:T294321|T294321]]', diff saved to https://phabricator.wikimedia.org/P17721 and previous config saved to /var/cache/conftool/dbconfig/20211111-060031-marostegui.json
* 06:00 marostegui: Starting s8 eqiad failover from db1104 to db1109 - [[phab:T294321|T294321]]
* 05:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s8 [[phab:T294321|T294321]]
* 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s8 [[phab:T294321|T294321]]
* 02:52 eileen: civicrm revision {{Gerrit|7e38867f}} -> {{Gerrit|817e514a}} (latest)
* 00:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wgForeignUploadTargets on officewiki [[phab:T295510|T295510]] (duration: 00m 56s)


== June 16 ==
== 2021-11-10 ==
* 23:28 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Enable local upload on fawikivoyage; enable logging for T76305 (duration: 00m 13s)
* 23:46 ebernhardson: start test backup/restore of 1tb commonswiki from relforge to swift in eqiad
* 23:28 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: Set previous values for password length policies (duration: 00m 16s)
* 23:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateSpecialPages.php --wiki=foundationwiki --only=DoubleRedirects
* 23:17 logmsgbot: twentyafterfour Finished scap: testwiki to 1.26wmf10 (duration: 43m 04s)
* 23:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateSpecialPages.php --wiki=foundationwiki --only=BrokenRedirects
* 23:02 godog: restore INFO cassandra logging level on restbase1003
* 22:06 bblack: dns2002 - restart ntp.service to fix drmrs peering
* 22:44 godog: start cassandra on restbase1008
* 22:01 bblack: dns1002 - restart ntp.service to fix drmrs peering
* 22:43 godog: enable back some cassandra debugging on restbase1003
* 21:56 bblack: dns2001 - restart ntp.service to fix drmrs peering
* 22:33 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 21:53 bblack: dns1001 - restart ntp.service to see if drmrs associations cleared up after dns changes, etc
* 22:26 urandom: restored default logging level on restbase1003
* 21:24 bblack: asw1-b1[23]-drmrs: added ipv6 router-advertisement clauses, which work, but probably imperfectly :)
* 22:22 urandom: enabling even more debugging on restbase1003
* 19:52 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6001.wikimedia.org with OS buster
* 22:14 urandom: enable (some) debug logging on restbase1003
* 19:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns6002.wikimedia.org with OS buster
* 21:57 logmsgbot: twentyafterfour scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.SxGNHsmVYP" ' returned non-zero exit status 1 (duration: 01m 24s)
* 19:51 ottomata: altering <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.maps.tiles_change to increase to 6 partitions in kafka main-eqiad, main-codfw and jumbo-eqiad: https://phabricator.wikimedia.org/T293366#7497076
* 21:56 logmsgbot: twentyafterfour Started scap: testwiki to 1.26wmf10
* 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:34 logmsgbot: krinkle Synchronized php-1.26wmf9/extensions/WikimediaEvents/modules/ext.wikimediaEvents.resourceloader.js: T101806 live hack (duration: 00m 12s)
* 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:24 Coren: labstore1001 pvmove of slice2 to slice 51 started (some bursts of iowait expected but should have minimal end-user impact)
* 19:43 cjming: end of UTC evening backport & config window
* 18:36 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Fix usage tracking setting (duration: 00m 14s)
* 19:42 cjming: end of UTC late backport & config window
* 18:03 godog: bounce statsite on graphite1001, stuck while writing to graphite
* 19:41 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:737814{{!}}Lower mobile web click tracking rate (T295432)]] (duration: 00m 55s)
* 17:30 ejegg: update SmashPig on listener from e1e925c9fc2a60c1e14ef01d8b653dc09512f51f to 258f2c917b1ae50b01231927bcd6f58ecaa8940b
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:23 logmsgbot: krinkle Synchronized php-1.26wmf9/includes/resourceloader/ResourceLoader.php: undo live hack (duration: 00m 13s)
* 19:35 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:737814{{!}}Lower mobile web click tracking rate (T295432)]] (duration: 00m 57s)
* 17:09 logmsgbot: aude Synchronized arbitraryaccess.dblist: Enable arbitrary access on gomwiki and lrcwiki (duration: 00m 13s)
* 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on second batch of s3 wikis (duration: 00m 13s)
* 19:23 legoktm: uploaded php-pcov_1.0.6-4+wmf1~buster1_amd64.changes to apt.wm.o ([[phab:T243847|T243847]])
* 17:03 logmsgbot: bblack Synchronized wmf-config/InitialiseSettings.php: wgCanonicalServer: HTTPS for all (duration: 00m 15s)
* 18:57 mutante: removing mediawiki font packages from parsoid hosts - [[phab:T294378|T294378]]
* 16:44 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 18:37 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS buster
* 16:43 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
* 18:37 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster
* 16:43 logmsgbot: krenair Synchronized w/static/images/project-logos/gomwiki.png: (no message) (duration: 00m 14s)
* 18:19 dancy@deploy1002: Finished scap: Config: [[gerrit:737976{{!}}Get rid of obsolete train-versions.json file]] (duration: 15m 57s)
* 16:42 logmsgbot: krenair Synchronized langlist: gomwiki (duration: 00m 13s)
* 18:09 bblack: drmrs - rebooting a bunch of hosts to bios for further settings, please ignore any accidental alerts (they do *look* like they're alert-disabled)
* 16:41 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 18:08 vgutierrez: restart haproxy on cp4026 and cp5006 to enable hitless reloads - [[phab:T290005|T290005]]
* 16:40 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 13s)
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:29 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 18:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:27 logmsgbot: krenair Synchronized langlist: (no message) (duration: 00m 14s)
* 18:03 dancy@deploy1002: Started scap: Config: [[gerrit:737976{{!}}Get rid of obsolete train-versions.json file]]
* 16:25 logmsgbot: krenair Synchronized w/static/images/project-logos/lrcwiki.png: (no message) (duration: 00m 13s)
* 17:10 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns6001.wikimedia.org with OS buster
* 16:21 moritzm: updated copper, oxygen, labstore2001 and labnodepool1001 to the 3.19 kernel
* 16:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns6002.wikimedia.org with OS buster
* 16:11 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 13s)
* 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 14s)
* 16:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:06 logmsgbot: krenair rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:05 logmsgbot: krenair Synchronized database lists: (no message) (duration: 00m 15s)
* 16:32 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295480|T295480]]: Move all cirrussearch traffic to codfw (duration: 00m 55s)
* 15:43 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: templateeditor: add templateeditor right in hewiki [[gerrit:218426]] (duration: 00m 13s)
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Turn on wgGenerateThumbnailOnParse for wikitech. [[gerrit:218553]] (duration: 00m 12s)
* 16:28 elukey: move atskafka to the new CA bundle - [[phab:T291905|T291905]]
* 15:03 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Add wikis for CX deployment on 20150616 [[gerrit:218341]] (duration: 00m 12s)
* 16:26 elukey: move kafkatee instances (analytics-test,centralog) to the new CA bundle - [[phab:T291905|T291905]]
* 14:18 cmjohnson: barium is going down for disk replacement
* 16:14 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6002.wikimedia.org with OS buster
* 13:38 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on dewiki (duration: 00m 15s)
* 16:12 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host dns6001.wikimedia.org with OS buster
* 13:18 akosiaris: rebooted etherpad1001 for kernel upgrades
* 15:52 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T295480|T295480]]: Move all cirrussearch traffic to codfw (duration: 00m 56s)
* 12:51 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2005, es2006 and es2007 after maintenance (duration: 00m 13s)
* 14:09 legoktm: restarted mailman3/mailman3-web to pick up new DNS for m5-master
* 12:44 logmsgbot: aude Synchronized usagetracking.dblist: Enable Wikibase usage tracking on cswiki (duration: 00m 14s)
* 14:08 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
* 12:20 logmsgbot: aude Synchronized usagetracking.dblist: Enable usage tracking on ruwiki (duration: 00m 15s)
* 14:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:21 paravoid: restarting the puppetmaster
* 13:48 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade. - elukey@cumin1001
* 11:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 13s)
* 13:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:36 akosiaris: rebooting ganeti200{1..6}.codfw.wmnet for kernel upgrades
* 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
* 09:33 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Depool es2005, es2006 and es2007 for maintenance (duration: 00m 14s)
* 13:36 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons. - elukey@cumin1001
* 09:10 YuviPanda: deleted huge puppet-master.log on labcontrol1001
* 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:05 jynus: added m5-slave to dns servers
* 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:52 paravoid: restarting hhvm on mw1121
* 13:03 Lucas_WMDE: UTC morning backport+config window done
* 07:52 moritzm: blacklisted the overlayfs kernel module (prevents a reliable local root exploit on all Ubuntu systems). no systems in the fleet had an overlayfs mount present or the kernel module loaded, so there should be no impact on existing systems. Note: This is a bandaid, I'll create a Phab task to deploy this via puppet in the future (and to also blacklist additional desktopy kernel modules which increase our attack
* 13:01 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:737696{{!}}Enable the visual editor on the 2022 namespace on Wikimania wiki (T295267)]] (duration: 00m 55s)
* 07:39 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1005 (duration: 00m 14s)
* 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 16 06:24:04 UTC 2015 (duration 24m 3s)
* 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:18 godog: restore ES replication throttling to 20mb/s
* 12:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:737695{{!}}Update $wgNamespacesToBeSearchedDefault for Wikimania 2022 (T295267)]] (duration: 00m 55s)
* 06:13 godog: restore ES replication throttling to 40mb/s
* 12:46 XioNoX: delete route6 object for 2a02:ec80::/32 (split in two /48s)
* 06:08 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: unthrottle ES (duration: 00m 14s)
* 12:46 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bea7fa6] (eqiad): Update kartotherian-package to {{Gerrit|006c027}} (duration: 01m 20s)
* 05:56 godog: bump ES replication throttling to 60mb/s
* 12:45 XioNoX: delete ROA for 2a02:ec80::/32
* 05:50 manybubbles: ok - we're yellow and recovering. ops can take this from here. We have a root cause and we have things I can complain about to the elastic folks I plan to meet with today anyway. I'm going to finish waking up now.
* 12:45 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bea7fa6] (eqiad): Update kartotherian-package to {{Gerrit|006c027}}
* 05:49 manybubbles: reenabling puppet agent on elasticsearch machines
* 12:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bea7fa6] (codfw): Update kartotherian-package to {{Gerrit|006c027}} (duration: 01m 31s)
* 05:46 manybubbles: I expect them to be red for another few minutes during the initial master recovery
* 12:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bea7fa6] (codfw): Update kartotherian-package to {{Gerrit|006c027}}
* 05:45 manybubbles: started all elasticsearch nodes and now they are recovering.
* 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:41 godog: restart gmond on elastic1007
* 12:38 mbsantos@deploy1002: Finished deploy [tilerator/deploy@ba00d7a] (eqiad): Update tilerator-package to {{Gerrit|1221976}} (duration: 01m 15s)
* 05:39 logmsgbot: filippo Synchronized wmf-config/PoolCounterSettings-common.php: throttle ES (duration: 00m 13s)
* 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:25 manybubbles: shutting down all the elasticsearch on the elasticsearch nodes again - another full cluster restart should fix it like it did last time...............
* 12:36 mbsantos@deploy1002: Started deploy [tilerator/deploy@ba00d7a] (eqiad): Update tilerator-package to {{Gerrit|1221976}}
* 05:11 godog: restart elasticsearch on elastic1031
* 12:36 mbsantos@deploy1002: Finished deploy [tilerator/deploy@ba00d7a] (codfw): Update tilerator-package to {{Gerrit|1221976}} (duration: 02m 06s)
* 03:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 12s)
* 12:34 mbsantos@deploy1002: Started deploy [tilerator/deploy@ba00d7a] (codfw): Update tilerator-package to {{Gerrit|1221976}}
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf9) at 2015-06-16 02:27:51+00:00
* 12:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:735394{{!}}Remove tmpUseRequestLanguagesForRdfOutput Wikibase setting (T285795)]] (2/2) (duration: 00m 56s)
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf9/cache/l10n: (no message) (duration: 05m 52s)
* 12:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:735394{{!}}Remove tmpUseRequestLanguagesForRdfOutput Wikibase setting (T285795)]] (1/2) (duration: 00m 56s)
* 00:55 tgr: running extensions/Gather/maintenance/updateCounts.php for gather wikis - https://phabricator.wikimedia.org/T101460
* 12: