
Server Admin Log: Difference between revisions

== 2019-02-03 ==
* 20:25 elukey: powercycle mw1272 - no ssh, no tty available via com2 - DIMM correctable errors + OEM errors registered in getsel
* 18:56 elukey: started a tmux session on dbstore1002 to migrate all the tokudb tables of mediawikiwiki to InnoDB - (s3 replication broken)
* 17:53 elukey: start all slaves on dbstore1002 (After a crash + recovery) + moved mediawikiwiki.revision_actor_temp to Innodb to unblock s3 slave replication (still broken though)
* 04:55 legoktm@deploy1001: Synchronized wmf-config/extension-list: Remove WikibaseQuality from extensions-list ([[phab:T208499|T208499]]) (duration: 00m 51s)
* 01:10 elukey: powercycle mw1299 - can't ssh nor get a tty via console - racadm getsel shows "An OEM diagnostic event occurred."

== 2019-06-01 ==
* 22:49 krinkle@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/3D/modules/mmv.3d.js: [[phab:T224812|T224812]] / {{Gerrit|bd4fbfddbe1a0}} (duration: 01m 07s)


== 2019-02-02 ==
* 20:42 chaomodus: restarted pdfrender on scb1003
* 20:41 chaomodus: restarted pdfrender on scb1004
* 20:06 chaomodus: parsoid was failed on scandium and alerting, the service parsoid-vd was restarted and appears to have come back
* 05:44 jforrester@deploy1001: Synchronized php-1.33.0-wmf.14/extensions/VisualEditor/lib/ve/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js: b/src/ui/dialogs/ve.ui.FindAndReplaceDialog.js [[phab:T214963|T214963]] Hot-deploy VE fix to stop hitting user pref writes without debounce (duration: 01m 02s)

== 2019-05-31 ==
* 21:47 aaron@deploy1001: Synchronized wmf-config/db-eqiad.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 47s)
* 21:46 aaron@deploy1001: Synchronized wmf-config/db-codfw.php: Set "secret" field in $wgLBFactoryConf for ChronologyProtector HMACs (duration: 00m 50s)
* 21:10 bblack: cp3034: repool - [[phab:T222937|T222937]]
* 20:04 bblack: cp3034: depool for reimage - [[phab:T222937|T222937]]
* 18:44 marostegui: Start MySQL on es1019 - [[phab:T213422|T213422]]
* 18:34 jgleeson: payments-wiki updated from {{Gerrit|a76658f0a3}} to {{Gerrit|c6c7bbf71e}}
* 17:29 andrewbogott: added jeh to the 'ops' group in ldap
* 16:20 ariel@deploy1001: Finished deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now (duration: 00m 03s)
* 16:20 ariel@deploy1001: Started deploy [dumps/dumps@fd6100a]: remove orderrevs config option, unneeded now
* 15:05 bblack: cp3039: restart varnish-be for mbox lag (likely induced by 3049's depool for ATS conversion!)
* 15:00 Krinkle: krinkle@deploy1001: pulling down {{Gerrit|6f91b41}} for  php-1.34-wmf.7/extensions/ORES (without deploy), commit seems test-only
* 14:59 Krinkle: krinkle@deploy1001: git status in php-1.34-wmf.7/ is dirty (extensions/ORES)
* 14:52 bblack: pool cp3049 back into service - [[phab:T222937|T222937]]
* 14:32 onimisionipe: depool maps2004 (again) - [[phab:T224395|T224395]]
* 14:32 elukey: powercycle notebook1003 - host stuck due to user processes, no ssh available, OOM didn't trigger
* 14:20 _joe_: rolling restart of php-fpm across production to pick up the shorter revalidate frequency for [[phab:T224491|T224491]]
* 14:10 bblack: reboot cp3049 - [[phab:T222937|T222937]]
* 13:16 bblack: depool cp3049 for reimage - [[phab:T222937|T222937]]
* 11:46 jynus: stop and upgrade db2084
* 11:09 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after maintenance (duration: 00m 48s)
* 10:54 jynus: depool labsdb1010 for maintenance
* 10:47 arturo: merging multiple commits to labs/private.git. We now require `puppet-merge --labsprivate` and people may not be yet aware of that
* 09:28 jynus: stop and upgrade db2073
* 09:11 jynus: stop and upgrade db2095 (s2, s4, s6, s7)
* 08:33 jynus: upgrade and restart db2065
* 08:16 jynus: depool labsdb1011 for maintenance
* 07:54 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099 with low weight (duration: 00m 49s)
* 07:43 _joe_: restarting php-fpm on canaries
* 07:24 _joe_: repooling mw1348
* 07:24 jynus: upgrade and restart labsdb1009
* 07:15 _joe_: draining mw1348 from traffic
* 07:14 jynus: depool labsdb1009 for maintenance
* 06:55 jynus: upgrade and restart db2058
* 06:33 _joe_: repooled mw1348
* 06:21 jijiki: depool mw1348
* 06:16 _joe_: restarting php-fpm on mw1348
* 00:08 jgleeson: Updating civicrm from {{Gerrit|bb4acf3d8a}} to {{Gerrit|e028bfcd63}}


== 2019-02-01 ==
* 23:16 vgutierrez: restart pdfrender on scb1004
* 21:57 ejegg: updated payments-wiki-staging from {{Gerrit|7767c7027e}} to {{Gerrit|52a271e681}}
* 21:25 ejegg: updated payments-wiki-staging to fundraising/REL1_31 branch
* 07:13 bawolff_: reset 2FA on wikitech for [[User:Cicalese]]

== 2019-05-30 ==
* 23:36 XioNoX: remove BGP sessions to starhub on cr4-ulsfo (left the IXP)
* 22:59 marxarelli: deleted 95 docker images from contint1001, freeing ~ 8G on / cc: [[phab:T219850|T219850]]
* 22:59 XioNoX: add terms to drop specific icmp frag packets from cr1/2-eqiad - [[phab:T224186|T224186]]
* 22:53 marxarelli: deleting stale docker images from contint1001, cc: [[phab:T207707|T207707]] [[phab:T219850|T219850]]
* 22:25 mutante: phab2001 / phab1003 - why is 'git status' in /srv/phab/phabricator unclean with lots of file deletions but also not identical
* 22:24 mutante: phab2001 - scap pull - but it fails with directory /srv/mediawiki not found  that's so wrong
* 22:20 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/WikimediaEvents/: Avoid division by zero warnings [[phab:T224686|T224686]] (duration: 00m 49s)
* 22:19 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage/: Fix broken feed - [[phab:T224693|T224693]] (duration: 00m 51s)
* 21:27 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on test2wiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
* 21:12 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on testwiki db, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
* 21:11 Krinkle: krinkle@mwmaint1002 Add 1 row to pagetriage_tags table on enwiki, based on PageTriageTagsPatch-recreated.sql. [[phab:T224693|T224693]], [[phab:T189929|T189929]]
* 21:10 niharika29@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/PageTriage: Bump wgPageTriageCacheVersion [[phab:T224693|T224693]] (duration: 00m 51s)
* 21:07 XioNoX: add RPKI sessions on cr4-ulsfo - [[phab:T220669|T220669]]
* 20:39 twentyafterfour: phabricator: restart ssh-phab.service
* 19:49 mutante: sodium (mirrors) -  sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
* 18:49 Urbanecm: Morning SWAT finished
* 18:47 urbanecm@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/: [[:gerrit:513300{{!}}QuestionPoster: Correctly set timestamp when question is posted]] ([[phab:T223338|T223338]]) (duration: 00m 51s)
* 18:26 mutante: phab1003 - switch 'vcs' user to 'NP' to match phab1001 setup and then /srv/phab/phabricator# ./bin/config set diffusion.ssh-user vcs ([[phab:T224677|T224677]])
* 18:24 XioNoX: bounce eqord-ulsfo interface to try to fix BFD sessions
* 18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321, [[phab:T224491|T224491]]
* 18:12 Krinkle: Running `php7adm /opcache-free`  on mw1348 and mw1321
* 18:11 Krinkle: mw1348 (recent api/php72 100% experiment) shows signs of corruption
* 18:11 Krinkle: mw1321 php7.2 shows signs of corruption for over 2 hours – https://phabricator.wikimedia.org/T224491#5224464
* 18:03 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: (no justification provided) (duration: 00m 53s)
* 16:24 bblack: re-pool cp3047 into service as ats-be - [[phab:T222937|T222937]]
* 16:04 mutante: phab1001 - removing 2620:0:861:103:10:64:32:186/128 from eth0
* 16:03 mutante: phab1001 - removing 10.64.32.186/32 from eth0
* 16:01 mutante: phab1001 - removing git-ssh.wm.org IP from interface - phab1003 - activating IPv6 listen address for git-ssh
* 15:36 jynus: stop es1019 for maintenance [[phab:T213422|T213422]]
* 15:26 cmjohnson1: shutting down db1099 to swap DIMM [[phab:T221502|T221502]]
* 15:20 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with full weight; depool es1019 (duration: 00m 52s)
* 15:19 herron: performing rolling reboots of eqiad kafka main cluster hosts for security updates
* 15:06 onimisionipe: pooled maps2004 - osm import is complete - [[phab:T224395|T224395]]
* 14:44 andrewbogott: reimaging cloudvirtan1001 for [[phab:T224566|T224566]]
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:42 andrewbogott: reimaging cloudvirtan1001
* 14:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:22 bblack: rebooting cp3047 (post-reimage/puppetization for [[phab:T222937|T222937]])
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:00 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:00 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 jijiki: enable puppet on mw* in eqiad
* 13:44 volans: rm /root/.ssh/known_hosts on cumin[12]001
* 13:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:36 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.7
* 13:28 jijiki: Enabling puppet on mw*.codfw.net
* 13:22 zfilipin@deploy1001: Synchronized php-1.34.0-wmf.7/resources/src/jquery/jquery.suggestions.js: SWAT: [[gerrit:513237{{!}}jquery.suggestions: Do not show suggestions on prefilled values ([T224524])]] (duration: 00m 58s)
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1015.eqiad.wmnet
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1014.eqiad.wmnet
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1013.eqiad.wmnet
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1012.eqiad.wmnet
* 13:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1011.eqiad.wmnet
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1010.eqiad.wmnet
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1009.eqiad.wmnet
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1008.eqiad.wmnet
* 13:16 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1007.eqiad.wmnet
* 13:08 bblack: cp3047 puppet-disable + depool for reimage to ATS - [[phab:T222937|T222937]]
* 13:03 marostegui: Stop MySQL on db1099 for onsite maintenance - [[phab:T221502|T221502]]
* 13:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 [[phab:T221502|T221502]] (duration: 00m 56s)
* 13:00 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/tests/phpunit/includes/: [[phab:T222628|T222628]] (duration: 01m 06s)
* 12:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/includes/Linker.php: [[phab:T222628|T222628]] (duration: 01m 04s)
* 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:34 akosiaris: reboot ganeti2003 for kernel upgrades
* 11:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:14 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:14 _joe_: freed opcache on mw1281
* 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:07 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 Urbanecm: EU SWAT finished
* 11:04 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: [[:gerrit:Enable abusefilter blocking ability in plwiki]] ([[phab:T224617|T224617]]) (duration: 00m 58s)
* 11:00 jijiki: Disable puppet on mw* servers to merge 507939 - [[phab:T219150|T219150]]
* 10:42 jynus: upgrade and restart db1117 (temporary proxy fail for passive host, reduced redundancy for m*)
* 10:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:22 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:19 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:07 jynus: upgrade and restart test-s4 hosts (db1111, db1112)
* 09:42 jynus: stop and upgrade db1102
* 09:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 09:31 _joe_: depooling mw1261 for benchmarking for [[phab:T224491|T224491]]
* 09:26 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 55s)
* 08:54 jynus: stop and restart db1089 for upgrade
* 08:50 onimisionipe: maps2001 postgres initialization - [[phab:T224395|T224395]]
* 08:44 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1089 for maintenance (duration: 00m 57s)
* 08:32 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2087 for maintenance (duration: 01m 00s)
* 08:10 mobrovac: drop old Parsoid tables from cassandra -- [[phab:T223998|T223998]]
* 07:40 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]] (duration: 19m 28s)
* 07:33 _joe_: upgraded service-checker on icinga1001,2
* 07:21 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7]: Switch to OpenAPI v3 and drop page/html/title/revision/tid - [[phab:T218218|T218218]] [[phab:T215956|T215956]]
* 00:40 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2091 - [[phab:T224393|T224393]] (duration: 00m 56s)
* 00:24 mutante: re-enabling puppet on phab1001 now that it does not have the phab role anymore ([[phab:T221389|T221389]])
* 00:17 mutante: rsyncing /srv/repos again. pulling on phab2001 from phab1003 ([[phab:T221389|T221389]])
 
== 2019-05-29 ==
* 23:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wikibase sameAs A/B test config, part II (duration: 00m 56s)
* 23:36 jforrester@deploy1001: sync-file aborted: Remove wikibase sameAs A/B test config, part I (duration: 00m 00s)
* 23:35 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Remove wikibase sameAs A/B test config, part I (duration: 00m 56s)
* 23:26 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/AbuseFilter/includes/parser/AbuseFilterTokenizer.php: SWAT AbuseFilter: Tokenizer caching back to APC {{Gerrit|I8c6a4a95e}} (duration: 00m 54s)
* 23:19 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Replace FR constants with numbers {{Gerrit|Ia52f644948}} (duration: 00m 56s)
* 23:17 jforrester@deploy1001: Synchronized multiversion/MWScript.php: Mark refreshMessageBlobs.php as a global script (duration: 00m 56s)
* 23:15 mutante: repooled phab2001-vcs , fixes pybal / lvs alerts
* 23:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab2001-vcs.codfw.wmnet
* 23:10 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT Enable wgSpecialSearchFormOptions on production Wikidata [[phab:T55652|T55652]] (duration: 00m 57s)
* 23:01 mutante: phab2001 - same issue with tin.eqiad.wmnet still showing up when first trying to git clone
* 22:52 mutante: miscweb2001 - a2dismod mpm_event ; systemctl restart apache2 to fix php7.0 dependency issue
* 22:50 mutante: miscweb2001 - when first trying to git pull iegreview - still tries to resolve 'tin.eqiad.wmnet' which is long gone. fix is still to manually edit /srv/deployment/iegreview/iegreview-cache/cache/.git/config
* 22:46 jforrester@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Hot-deploy [[phab:T224634|T224634]] to fix CirrusSearch for extension registration (duration: 00m 57s)
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 21:47 mutante: installing OS on miscweb2001 VM failed at grub install step :( [[phab:T224323|T224323]]
* 21:47 mutante: sign puppet cert request for phab2001 after reinstall (for some reason it needed me to connect to console and hit enter, reimage script itself was stuck)
* 20:54 mutante: creating new ganeti VM miscweb2001.codfw.wmnet with same specs as krypton.eqiad.wmnet ([[phab:T224323|T224323]])
* 20:35 arlolra: Updated Parsoid to {{Gerrit|8546c79}} ([[phab:T219927|T219927]], [[phab:T211125|T211125]])
* 20:35 ejegg: updated payments-wiki from {{Gerrit|332aaa96e2}} to {{Gerrit|45b73e7749}}
* 20:28 arlolra@deploy1001: Finished deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}} (duration: 07m 46s)
* 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@6caac43]: Updating Parsoid to {{Gerrit|8546c79}}
* 20:10 bblack: pool cp3044 (esams cache_upload ats-be) - [[phab:T222937|T222937]]
* 19:46 reedy@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 00m 57s)
* 19:45 XioNoX: enable cr1-codfw:et-0/2/1 - [[phab:T224511|T224511]]
* 19:45 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/: Replace missing wfCollectionSuggestAction (duration: 01m 01s)
* 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab2001-vcs.codfw.wmnet
* 19:32 mutante: phab2001 - reinstalling with stretch - upgrade from jessie ([[phab:T190568|T190568]])
* 19:09 XioNoX: enable cr1-codfw:et-0/2/0 - [[phab:T224511|T224511]]
* 18:37 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp3044.esams.wmnet
* 17:44 XioNoX: enable cr1-codfw:et-0/0/1 - [[phab:T224511|T224511]]
* 17:13 XioNoX: enable cr1-codfw:et-0/0/0 - [[phab:T224511|T224511]]
* 17:02 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 3/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 17:00 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 2/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 16:59 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:501926{{!}}Change arwiki default user preferences]], part 1/3 ([[phab:T220186|T220186]]) (duration: 00m 56s)
* 16:48 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512942]] Revert: Hardcode korean help desk config (duration: 00m 56s)
* 16:45 sbisson@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512941]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 00m 56s)
* 16:42 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel.php: SWAT: [[gerrit:512940]] Prevent parsing of GEHelpPanelHelpDeskTitle from accessing the session (duration: 01m 00s)
* 16:32 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/GrowthExperiments/includes/HelpPanel/QuestionRecord.php: SWAT: [[gerrit:512950]] Revert: Fix phan job: ignore line using JsonSerializable (duration: 00m 57s)
* 16:08 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 15:55 jynus: upgrade and restart db2087
* 15:11 moritzm: draining ganeti2008 for eventual reboot to pick up MDS-enabled kernel
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:06 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 1 ([[phab:T188327|T188327]]) (duration: 00m 57s)
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:54 moritzm: draining ganeti2007 for eventual reboot to pick up MDS-enabled kernel
* 14:51 XioNoX: `request chassis fpc online slot 0` on cr1-codfw - [[phab:T224511|T224511]]
* 14:48 XioNoX: `request chassis fpc offline slot 0` on cr1-codfw - [[phab:T224511|T224511]]
* 14:47 XioNoX: disable et- interfaces on cr1-codfw - [[phab:T224511|T224511]]
* 14:45 andrewbogott: reimaging cloudcontrol1003 [[phab:T221770|T221770]]
* 14:34 moritzm: draining ganeti2006 for eventual reboot to pick up MDS-enabled kernel
* 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:32 andrewbogott: powering off cloudcontrol1003 as one last check to see what explodes before I reimage it
* 14:30 _joe_: installing the new service checker on restbase in eqiad
* 14:29 _joe_: installing new service checker version on restbase in codfw
* 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:58 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:58 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:48 urandom: decommissioning restbase1015-c -- [[phab:T223976|T223976]]
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:19 zfilipin@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.7 (duration: 00m 58s)
* 13:18 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.7
* 13:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:12 Urbanecm: mwscript emptyUserGroup.php --wiki=fawiki 'uploader' finished ([[phab:T221441|T221441]])
* 13:06 andrewbogott: stopping openstack services on cloudcontrol1003 in anticipation of a re-image
* 13:03 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:02 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:01 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 13:01 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 13:00 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:42 Zppix: [12:27:02]  jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 Zppix: [12:27:02] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:40 Zppix: [12:23:06] <jijiki> Rolling restart pdfrender on scb*
* 12:39 Zppix: [12:20:49] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:38 Zppix: [12:11:55] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 Zppix: [12:11:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:37 Zppix: [12:01:54] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 Zppix: [12:01:54] jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 12:36 Zppix: [12:00:21] marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]] (duration: 00m 56s)
* 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 Zppix: [11:59:19] marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Remove db2037 from config as it will be decommissioned [[phab:T221533|T221533]]
* 12:33 Zppix: [11:58:16] <arturo> [[phab:T221770|T221770]] icinga downtime cloudcontrol1003.wikimedia.org for upcoming rebuild as stretch
* 12:32 Zppix: [11:57:57] aborrero@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:32 Zppix: [11:57:55] aborrero@cumin1001 START - Cookbook sre.hosts.downtime
* 12:31 Zppix: [11:55:54] <Urbanecm> EU SWAT finished, maintenance script emptyUserGroup.php still running in separate tmux session
* 12:31 Zppix: [11:55:11] urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:511849{{!}}Set wgLocaltimezone for euwiki to Europe/Berlin]] ([[phab:T224091|T224091]]) (duration: 00m 57s)
* 12:30 Zppix: [11:55:10] jbond@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:29 Zppix: [11:55:09]  jbond@cumin1001 START - Cookbook sre.hosts.downtime
* 11:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:471260{{!}}RSS: Update URLs to the old Wikimedia Foundation blog to point to the new site]] ([[phab:T208458|T208458]]) (duration: 00m 57s)
* 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:46 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:46 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 11:45 Urbanecm: Started mwscript emptyUserGroup.php --wiki=fawiki 'uploader' ([[phab:T221441|T221441]])
* 11:44 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: [[:gerrit:505228{{!}}Remove uploader user group from fawiki and merge it with autoconfirmed]], part 2 ([[phab:T221441|T221441]]) (duration: 00m 55s)
* 11:43 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:505228{{!}}Remove uploader user group from fawiki and merge it with autoconfirmed]], part 1 ([[phab:T221441|T221441]]) (duration: 00m 55s)
* 11:40 Urbanecm: Purged angwikibooks HD logos
* 11:38 urbanecm@deploy1001: Synchronized static/images/project-logos/: [[:gerrit:512433{{!}}Add HD logo for angwikibooks]], logo files ([[phab:T150618|T150618]]) (duration: 00m 56s)
* 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512478{{!}}Enable transwiki import between sqwiki and sqwikiquote]] ([[phab:T221234|T221234]]) (duration: 00m 56s)
* 11:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:30 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509130 Enable Advanced Mobile Contributions Overflow menu (T223883)]] (duration: 00m 57s)
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512488{{!}}Remove bureaucrat protection level for all Serbian projects]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
* 11:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:512487{{!}}Fix Serbian projects wgRestrictionLevels]] ([[phab:T217005|T217005]]) (duration: 00m 57s)
* 11:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[:gerrit:506892{{!}}Add namespace aliases on zhwiktionary]] ([[phab:T222024|T222024]]) (duration: 00m 57s)
* 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:59 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 10:57 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2087 for maintenance (duration: 01m 11s)
* 10:57 Urbanecm: deleteBatch.php for srwikinews finished ([[phab:T212346|T212346]])
* 10:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:33 mobrovac@deploy1001: Finished deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3 (duration: 03m 36s)
* 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@92591a7] (dev-cluster): Switch to OpenAPI v3
* 09:51 gehel@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 09:45 _joe_: uploading a new service-checker version to jessie-wikimedia
* 09:18 gehel@cumin1001: START - Cookbook sre.elasticsearch.rolling-reboot
* 08:51 moritzm: draining ganeti2002 for eventual reboot to pick up MDS-enabled kernel
* 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:31 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:31 moritzm: draining ganeti2001 for eventual reboot to pick up MDS-enabled kernel
* 07:42 mobrovac: decommission restbase1015-b -- [[phab:T223976|T223976]]
* 07:40 godog: ms-be2043 start sdd rebuild - [[phab:T222654|T222654]]
* 07:03 jijiki: restarting pdfrender on scb1003
 
== 2019-05-28 ==
* 23:19 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 56s)
* 23:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 57s)
* 23:17 bstorm_: [[phab:T221339|T221339]] completed view updates on labsdb1009 without depooling
* 23:16 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/handlers/TextHandler/TextHandler.php: [[phab:T224367|T224367]] Fix regression in subtitles for non-English sites on Commons videos (duration: 00m 56s)
* 23:14 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/TimedMediaHandler/includes/ApiTimedText.php: [[phab:T224522|T224522]] Fix fatal in ApiTimedText following redirect pages (duration: 00m 58s)
* 23:11 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: FlaggedRevisions: Copy in rest of the config, for static registration {{Gerrit|I77d70519f}} {{Gerrit|Id0cd2e18c}} (duration: 00m 56s)
* 23:10 bstorm_: [[phab:T221339|T221339]] repooled labsdb1011
* 23:06 jforrester@deploy1001: Synchronized wmf-config/throttle.php: Remove expired throttle rules {{Gerrit|I4ba3d489}} (duration: 00m 55s)
* 23:06 bstorm_: [[phab:T221339|T221339]] depooled labsdb1011 and updated views
* 23:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[phab:T55652|T55652]] Enable wgSpecialSearchFormOptions on testwikidata (duration: 00m 56s)
* 22:49 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT Fix order of edit tabs for multi-tabs on SET wikis [[phab:T223793|T223793]] (duration: 00m 57s)
* 22:28 cstone_: Re-enabled fundraising thank you mail job
* 22:25 mutante: cp3034 - sudo -i varnish-backend-restart
* 22:18 cstone_: Updated fundraising civicrm from {{Gerrit|21afd001b6}} to {{Gerrit|bb4acf3d8a}}
* 22:14 mutante: cp3035 - varnish-backend-restart
* 22:13 bstorm_: repooled labsdb1010
* 22:09 mutante: cp3034 - restart varnish backend
* 22:09 XioNoX: restart varnish backend on cp3039
* 22:02 cstone_: Disabled fundraising thank you mail job
* 21:46 bstorm_: depool labsdb1010 for view updates
* 21:38 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update (duration: 14m 37s)
* 21:35 urandom: decommissioning restbase1015-a -- [[phab:T223976|T223976]]
* 21:24 smalyshev@deploy1001: Started deploy [wdqs/wdqs@5a69072]: Deploy GUI & Blazegraph update
* 21:23 ebernhardson: restart elasticsearch on cloudelastic1001 to test sanely sized readahead on /dev/dm-0
* 21:11 gehel@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=0)
* 20:58 mutante: phab1003 / phab2001 - removing 'apache restart' from root's crontab (gerrit:512977) ([[phab:T187790|T187790]])
* 20:28 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Update caption edit target counts (duration: 00m 57s)
* 19:17 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 19:15 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 55s)
* 19:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1064 from config as it will be decommissioned [[phab:T223217|T223217]] (duration: 00m 56s)
* 19:02 marostegui: Reboot db2091 for full OS and MySQL upgrade - [[phab:T224393|T224393]]
* 18:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMediaInfoEnableFilePageDepicts, no longer read (duration: 00m 57s)
* 18:51 jforrester@deploy1001: Synchronized wmf-config/Wikibase.php: Add forwards-compatibility for dataCdnMaxAge (duration: 01m 00s)
* 18:11 marostegui: Start mysql for s2 and s4 on db2091 [[phab:T224393|T224393]]
* 17:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:52 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:48 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:48 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:42 moritzm: rebooting yubiauth* servers for kernel update
* 17:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:35 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}} (duration: 05m 56s)
* 17:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@0735c45]: Update mobileapps to {{Gerrit|ab67b78}}
* 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:03 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:03 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:49 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:41 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:35 hoo: Ran scap pull on mw1240 (curl -H 'Host: www.wikidata.org' … mw1240.eqiad.wmnet/wiki/Special:SetEntitySchemaLabelDescriptionAliases/E10/en returned 404)
* 16:27 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:27 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1271:~$ scap pull
* 16:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:15 moritzm: rearmed keyholder on deploy2001 following reboot
* 16:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:09 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 papaul: shutting down db2091 for firmware upgrade
* 15:53 godog: put back wrongly-replaced sdf on ms-be2043 - [[phab:T222654|T222654]]
* 15:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:42 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:42 Lucas_WMDE: Extension:EntitySchema deployment finished successfully
* 15:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=wikidatawiki
* 15:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:512909{{!}}Enable extension EntitySchema in production]] (duration: 00m 56s)
* 15:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:34 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: [[gerrit:512911{{!}}Steal maintenance script user]] (duration: 00m 58s)
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:17 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
* 15:17 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912{{!}}Steal maintenance script user]] – forgot `git submodule update` before previous sync (duration: 00m 57s)
* 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.7/extensions/EntitySchema/: [[gerrit:512912{{!}}Steal maintenance script user]] (duration: 00m 59s)
* 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 14:57 jbond42: reboot ms-be2016
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 14:30 zfilipin@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.7
* 14:10 herron: beginning rolling reboots of codfw kafka-main cluster for security updates
* 14:10 zfilipin@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache (duration: 34m 22s)
* 14:04 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 13:50 _joe_: hhvm restarted on mwdebug1001
* 13:48 _joe_: stopping hhvm on mwdebug1001 for testing
* 13:39 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 13:35 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
* 13:32 gilles@deploy1001: Finished deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]] (duration: 00m 03s)
* 13:31 gilles@deploy1001: Started deploy [performance/asoranking@60369cc]: [[phab:T224388|T224388]]
* 13:31 gilles@deploy1001: deploy aborted: [[phab:T224388|T224388]] (duration: 00m 01s)
* 13:31 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
* 13:24 urandom: decommissioning restbase1014-c -- [[phab:T223976|T223976]]
* 13:23 gehel@cumin2001: END (ERROR) - Cookbook sre.elasticsearch.rolling-reboot (exit_code=97)
* 12:55 gehel@cumin2001: START - Cookbook sre.elasticsearch.rolling-reboot
* 12:51 gilles@deploy1001: Finished deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]] (duration: 00m 04s)
* 12:50 gilles@deploy1001: Started deploy [performance/asoranking@1c60db1]: [[phab:T224388|T224388]]
* 12:40 gilles@deploy1001: Finished deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]] (duration: 00m 06s)
* 12:40 gilles@deploy1001: Started deploy [performance/asoranking@157c25f]: [[phab:T224388|T224388]]
* 12:13 raynor: EU SWAT done
* 12:11 pmiazga@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:512743{{!}}Disable the rdf2latex Collection portlet format (T224433)]] (duration: 00m 55s)
* 12:00 raynor: EU SWAT re-opened
* 11:58 Lucas_WMDE: EU SWAT done
* 11:54 Lucas_WMDE: ^ error, no change to wiki
* 11:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/EntitySchema/maintenance/createPreexistingSchemas.php --wiki=testwikidatawiki
* 11:52 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema/: SWAT: [[gerrit:512689{{!}}Add maintenance script to create preexisting Schemas]] + [[gerrit:512717{{!}}Small maintenance script adjustments]] (duration: 00m 54s)
* 11:48 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/EntitySchema: SWAT: [[gerrit:512677{{!}}Skip configured IDs]] (duration: 00m 57s)
* 11:43 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511753{{!}}Add a list of IDs to skip in production]] (duration: 00m 54s)
* 11:37 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config: SWAT: [[gerrit:510204{{!}}Add feature flag config for breaking Wikibase API change (T223300)]] (duration: 00m 54s)
* 11:31 Urbanecm: Ran namespaceDupes.php for urwikibooks, urwikiquote, urwiktionary and aswikisource
* 11:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:512426{{!}}Use underscores instead of spaces in wgMetaNamespace(Talk) for several projects]] ([[phab:T223039|T223039]]) (duration: 00m 54s)
* 11:25 arturo: merging change to the puppet sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/508311
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: SWAT: [[:gerrit:512422{{!}}Add abusefilter-modify-restricted to abusefilter group on plwiki (T224308)]] (duration: 02m 36s)
* 10:54 zfilipin@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4182265560" --threads=30 --lang en  --quiet' returned non-zero exit status 1 (duration: 03m 00s)
* 10:51 zfilipin@deploy1001: Started scap: testwiki to php-1.34.0-wmf.7 and rebuild l10n cache
* 10:48 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.3 [keeping static files] (duration: 01m 32s)
* 10:45 zfilipin@deploy1001: Pruned MediaWiki: 1.34.0-wmf.4 [keeping static files] (duration: 06m 06s)
* 09:32 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Allow MW to honour the X-Request-Id header if set - [[phab:T201409|T201409]] (duration: 01m 12s)
* 09:28 moritzm: installing php5 security updates
* 09:00 moritzm: installing ffmpeg security updates
* 08:58 gehel: rebooting wdqs nodes for kernel upgrade
* 08:54 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - [[phab:T219148|T219148]] (duration: 01m 21s)
* 08:52 jiji@deploy1001: Started deploy [cpjobqueue/deploy@04cc66d]: Migrating ORESFetchScoresJob to PHP7 - [[phab:T219148|T219148]]
* 08:52 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf3 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
* 08:47 vgutierrez: uploaded acme-chief 0.17 to apt.wikimedia.org (buster) - [[phab:T220518|T220518]] [[phab:T213820|T213820]]
* 08:40 volans: [[phab:T224448|T224448]] sudo cumin -b 15 -p 95 'R:git::clone' 'run-puppet-agent -q --failed-only'
* 08:29 volans: restarting gerrit due to stuck threads - [[phab:T224448|T224448]]
* 07:17 moritzm: uploaded ffmpeg 3.2.14-1~deb9u1+wmf1 to component/vp9 of stretch-wikimedia (rebase of our vp9-row-mt backport to the latest stretch-security ffmpeg update)
* 07:02 mobrovac: decommission restbase1014-b -- [[phab:T223976|T223976]]
* 06:40 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 20% of anonymous users to PHP7.2 - [[phab:T219150|T219150]] (duration: 00m 51s)
* 00:38 urandom: decommissioning restbase1014-a -- [[phab:T223976|T223976]]
 
== 2019-05-27 ==
* 23:19 thcipriani: gerrit back after restarting due to [[phab:T224448|T224448]]
* 23:10 thcipriani: restarting gerrit due to active threads being stuck behind a sendemail thread.
* 22:52 gilles@deploy1001: Finished deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 22:52 gilles@deploy1001: Started deploy [performance/asoranking@bacfc37]: [[phab:T224388|T224388]]
* 22:19 gilles@deploy1001: Finished deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]] (duration: 00m 05s)
* 22:19 gilles@deploy1001: Started deploy [performance/asoranking@d0c156e]: [[phab:T224388|T224388]]
* 20:19 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 06s)
* 20:19 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 18:41 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/rdbms: {{Gerrit|66556bf37e8}} / [[phab:T223310|T223310]], [[phab:T223978|T223978]] (duration: 00m 50s)
* 18:06 krinkle@deploy1001: Synchronized errorpages/: {{Gerrit|4ffcbfc2ba3}} (duration: 00m 48s)
* 17:56 andrewbogott: re-imaging cloudservices1004 in order to make sure our apt magic is working properly
* 17:37 andrewbogott: refreshing puppet-compiler facts
* 16:40 volans: removed unreferenced files in /etc/dhcp/ on install[12]002
* 16:34 mobrovac: decommission restbase1013-c - [[phab:T223976|T223976]]
* 15:40 akosiaris: initialize termbox namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220402|T220402]]
* 15:36 akosiaris: initialize sessionstore namespace on eqiad/codfw/staging kubernetes clusters [[phab:T220401|T220401]]
* 13:03 godog: swift eqiad-prod: ms-be1033 weight to 0 - [[phab:T223518|T223518]]
* 11:33 onimisionipe: starting osm initial import on maps2004 - [[phab:T224395|T224395]]
* 10:35 mobrovac: decommission restbase1013-b - [[phab:T223976|T223976]]
* 10:31 onimisionipe: rebooting maps2004 - cassandra unit failed and got stuck
* 09:59 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - [[phab:T219148|T219148]] (duration: 01m 09s)
* 09:58 jiji@deploy1001: Started deploy [cpjobqueue/deploy@421c029]: Migrating wikibase-addUsagesForPage to PHP7 - [[phab:T219148|T219148]]
* 09:52 _joe_: disabling puppet on mw1261, running some tests for [[phab:T223180|T223180]]
* 08:52 arturo: 1 day downtime systemd check for cloudcontrol1003
* 08:27 jiji@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2091 - [[phab:T224393|T224393]] (duration: 00m 49s)
* 08:03 gehel: depool maps2004 - [[phab:T224395|T224395]]
* 07:05 gehel: running nodetool repair on maps2004 - [[phab:T224395|T224395]]
* 04:23 gilles@deploy1001: Finished deploy [performance/asoranking@61039f1]: (no justification provided) (duration: 00m 28s)
* 04:23 gilles@deploy1001: Started deploy [performance/asoranking@61039f1]: (no justification provided)
* 02:59 urandom: decommissioning restbase1013-a -- [[phab:T223976|T223976]]
 
== 2019-05-26 ==
* 20:39 urandom: decommissioning restbase1012-c -- [[phab:T223976|T223976]]
* 14:09 urandom: decommissioning restbase1012-b -- [[phab:T223976|T223976]]
* 13:37 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/debug: [[phab:T187147|T187147]] / {{Gerrit|2be7aa4bc4af36}} (duration: 00m 51s)
* 08:01 mobrovac: decommission restbase1012-a - [[phab:T223976|T223976]]
 
== 2019-05-25 ==
* 22:41 urandom: decommissioning restbase1011-c -- [[phab:T223976|T223976]]
* 22:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/Linker.php: [[phab:T222628|T222628]] / {{Gerrit|c735a545df3a}} (duration: 00m 51s)
* 19:12 andrewbogott: reimaging cloudservices1004 with Stretch
* 13:46 urandom: decommissioning restbase1011-b -- [[phab:T223976|T223976]]
* 12:28 godog: bounce thumbor on thumbor1002
* 12:21 godog: bounce thumbor on thumbor1002
* 11:48 _joe_: restarted thumbor-instances on thumbor1001
* 09:20 mobrovac: decommission restbase1011-b - [[phab:T223976|T223976]]
* 04:56 ariel@deploy1001: Finished deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants (duration: 00m 07s)
* 04:56 ariel@deploy1001: Started deploy [dumps/dumps@61114e0]: add namespaces param only once for abstracts with lang variants
* 00:30 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy [[phab:T224319|T224319]] for VisualEditor switching and auto-restore (duration: 00m 50s)
 
== 2019-05-24 ==
* 21:56 urandom: decommissioning restbase1011-a -- [[phab:T223976|T223976]]
* 16:34 XioNoX: add routinator package to reprepro/APT - [[phab:T220669|T220669]]
* 15:44 urandom: decommissioning restbase1010-c -- [[phab:T223976|T223976]]
* 15:30 XioNoX: disable bgp to telia on cr1-codfw for X-connect investigation - [[phab:T222967|T222967]]
* 15:01 jbond42: upload python{,3}-statsd.3.2.1-2 to jessie-wikimedia
* 14:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/libs/objectcache/: {{Gerrit|d262078b1}} / [[phab:T220470|T220470]] (duration: 01m 06s)
* 11:45 hoo: Updated the Wikidata property suggester with data from the 2019-05-13 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 11:32 jbond42: [actually] rebooting prometheus1004 now
* 11:32 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:32 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 jbond42: rebooting prometheus1004
* 10:56 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:56 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 jbond42: rebooting prometheus2003
* 10:25 jbond42: rebooting prometheus2004
* 10:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 10:09 mobrovac: decommission restbase1010-b - [[phab:T223976|T223976]]
* 07:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:32 moritzm: rebooting labweb* for kernel security update
* 07:05 mobrovac: restbase-dev1006 force-stop the cassandra instances, fsync exception during decomm - [[phab:T224260|T224260]]
* 06:47 moritzm: bounced ferm on mw2286, wasn't correctly started after reboot
* 06:45 mobrovac: restbase-dev1006 decommission cass-b - [[phab:T224260|T224260]]
* 06:43 _joe_: disable notifications in icinga for restbase-dev1006 [[phab:T224260|T224260]]
* 06:40 mobrovac: restbase-dev1006 decommission cass-a - [[phab:T224260|T224260]]
* 06:39 mobrovac: restbase-dev1006 stop restbase - [[phab:T224260|T224260]]
* 06:38 mobrovac: restbase-dev1006 puppet disabled - [[phab:T224260|T224260]]
* 06:26 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing (duration: 05m 41s)
* 06:20 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d] (dev-cluster): Remove Parsoid fallback and rate-limit stashing
* 06:20 mobrovac@deploy1001: Finished deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - [[phab:T215956|T215956]] [[phab:T224055|T224055]] (duration: 21m 30s)
* 06:17 marostegui: Stop MySQL on db2078:m1 to clone db2062 - [[phab:T220170|T220170]]
* 06:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to new hosts [[phab:T220170|T220170]] (duration: 00m 48s)
* 05:58 mobrovac@deploy1001: Started deploy [restbase/deploy@b153f5d]: Remove Parsoid fallback and rate-limit stashing - [[phab:T215956|T215956]] [[phab:T224055|T224055]]
* 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2062 from config [[phab:T220170|T220170]] (duration: 00m 48s)
* 05:34 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2062 from config [[phab:T220170|T220170]] (duration: 00m 49s)
* 05:30 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1011
* 00:32 XioNoX: remove lvs1001-5 bgp sessions from cr1/2-eqiad - [[phab:T224223|T224223]]
* 00:27 XioNoX: remove term protect-old-lvs-servers from cr1/2-eqiad - [[phab:T224223|T224223]]
* 00:20 urandom: decommissioning restbase1010-a -- [[phab:T223976|T223976]]
* 00:04 ebernhardson@deploy1001: Finished scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ [[phab:T223738|T223738]] Consider searching out of limits an error (duration: 21m 32s)
 
== 2019-05-23 ==
* 23:43 ebernhardson@deploy1001: Started scap: php-1.34.0-wmf.6/extensions/CirrusSearch/includes/ [[phab:T223738|T223738]] Consider searching out of limits an error
* 23:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VII–X, InitialiseSettings (duration: 00m 48s)
* 23:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VII–X, CommonSettings (duration: 00m 47s)
* 23:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup VI, InitialiseSettings (duration: 00m 47s)
* 22:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup VI, CommonSettings (duration: 00m 48s)
* 22:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup V, InitialiseSettings (duration: 00m 47s)
* 22:56 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup V, CommonSettings (duration: 00m 47s)
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup IV, InitialiseSettings (duration: 00m 47s)
* 22:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup IV, CommonSettings (duration: 00m 48s)
* 22:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup III, InitialiseSettings (duration: 00m 47s)
* 22:47 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup III, CommonSettings (duration: 00m 48s)
* 22:44 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup II, InitialiseSettings (duration: 00m 48s)
* 22:43 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup II, CommonSettings (duration: 00m 48s)
* 22:39 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Invariant config cleanup I, InitialiseSettings (duration: 00m 47s)
* 22:37 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Invariant config cleanup I, CommonSettings (duration: 00m 48s)
* 22:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseClusterSquid, never varied, no longer used (duration: 00m 48s)
* 22:29 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop reading wmgUseClusterSquid, never varied (duration: 00m 47s)
* 22:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 3 (duration: 00m 47s)
* 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 2 (duration: 00m 48s)
* 22:23 jforrester@deploy1001: Synchronized wmf-config/reverse-proxy.php: [[phab:T104148|T104148]] Duplicate …Squid variables into …Cdn ahead of MW renaming, part 1 (duration: 00m 48s)
* 22:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223793|T223793]] Drop wmgVisualEditorSingleEditTabSecondaryEditor and wmgVisualEditorSecondaryTabs from InitialiseSettings (duration: 00m 48s)
* 22:17 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T223793|T223793]] Read wmgVisualEditorIsSecondaryEditor in CommonSettings (duration: 00m 48s)
* 22:13 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223793|T223793]] Add wmgVisualEditorIsSecondaryEditor to InitialiseSettings (duration: 00m 49s)
* 19:48 ejegg: updated payments-wiki from {{Gerrit|786d76e212}} to {{Gerrit|332aaa96e2}}
* 18:54 urandom: decommissioning restbase1009-c -- [[phab:T223976|T223976]]
* 16:13 twentyafterfour: restarting phd on phab1003 to pick up new php module config
* 15:57 moritzm: rebooting furud/flerovium for kernel updates
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:33 ottomata: rolling restart of swift-proxy to apply creation of analytics_admin account
* 15:31 hashar@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Hardcode korean help desk config - [[phab:T224224|T224224]] (duration: 00m 48s)
* 15:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:31 jbond42: reboot thumbor2004
* 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 jbond42: reboot thumbor2003
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 jbond42: reboot thumbor2002
* 14:51 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 jbond42: reboot thumbor2001
* 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 jbond42: reboot thumbor1004
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 jbond42: reboot thumbor1003
* 14:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:28 jbond42: reboot thumbor1002
* 14:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1001.eqiad.wmnet
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 13:56 sbisson@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Echo: SWAT: [[gerrit:512070{{!}}Don't add CommentStoreComment as plaintext params]] (duration: 00m 50s)
* 13:55 urandom: decommissioning restbase1009-b -- [[phab:T223976|T223976]]
* 13:41 bblack: stopped pybal on lvs1001-6 - [[phab:T224223|T224223]]
* 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.6
* 13:00 godog: swift eqiad-prod: ms-be1033 weight to 1500 - [[phab:T223518|T223518]]
* 12:04 moritzm: powercycling mw2268 (stuck after reboot)
* 11:50 jbond42: will shortly start rolling reboots of thumbor servers
* 11:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:34 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:34 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 moritzm: rebooting auth1002 for kernel update
* 11:21 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 jmm@cumin1001: START - Cookbook sre.hosts.downtime
* 10:51 Amir1: Deploying EntitySchema to testwikidatawiki is done
* 10:50 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=wikidatawiki extensions/EntitySchema/sql/EntitySchema.sql ([[phab:T216955|T216955]])
* 10:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511844{{!}}deploy WikibaseSchema to test (T216956)]] (duration: 00m 56s)
* 10:44 Amir1: ladsgroup@mwmaint1002:/srv/mediawiki/php-1.34.0-wmf.5$ mwscript sql.php --wiki=testwikidatawiki extensions/EntitySchema/sql/EntitySchema.sql ([[phab:T216956|T216956]])
* 10:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1080 (duration: 00m 57s)
* 10:15 _joe_: restarted php7.2-fpm on mw1261 to assess the effect of a larger APCu shm size [[phab:T223180|T223180]]
* 10:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:00 moritzm: rebooting remaining mw servers in codfw (sans mcrouter proxies for now)
* 10:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:51 hashar@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection: Rename wfAjaxCollectionGetItemList() [[phab:T224093|T224093]] (duration: 00m 57s)
* 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 into API (duration: 00m 55s)
* 09:22 godog: bounce rsyslog on lithium - listener stuck - [[phab:T199406|T199406]]
* 09:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:10 moritzm: rebooting scb servers in eqiad
* 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 55s)
* 08:29 marostegui: Upgrade MySQL and kernel on db1080
* 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
* 08:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 moritzm: rebooting scb servers in codfw
* 07:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 (duration: 00m 56s)
* 07:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:33 moritzm: rebooting swift frontends in eqiad
* 07:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 53s)
* 07:11 marostegui: Stop MySQL on db1117:3323 to clone db1128 [[phab:T222682|T222682]]
* 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1136 (duration: 00m 55s)
* 06:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 55s)
* 06:28 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2065 from config as it will be moved to m3 to replace db2042 (duration: 00m 56s)
* 06:14 mobrovac: start ruwiki dumps to fill the new parsoid tables - [[phab:T215956|T215956]]
* 05:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2070 as m5 codfw master - [[phab:T221533|T221533]] (duration: 00m 54s)
* 05:29 marostegui: Promote db2070 to m5 codfw master instead of db2037 - [[phab:T221533|T221533]]
* 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2107 status - will be the new master (duration: 00m 54s)
* 05:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1136 into s7 [[phab:T222682|T222682]] (duration: 00m 55s)
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1136 into s7 [[phab:T222682|T222682]] (duration: 00m 55s)
* 04:57 mobrovac: decommission restbase1009-a - [[phab:T223976|T223976]]
* 04:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 55s)
* 04:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1080 (duration: 00m 58s)
* 04:24 mobrovac: start nl, pt, pl wiki dumps to fill the new parsoid tables - [[phab:T215956|T215956]]
* 03:50 twentyafterfour: m3 database activity levels look like they have returned to normal
* 03:48 twentyafterfour: puppet runs cleanly on phab1003
* 03:39 mutante: phab1003 - disabling puppet; /etc/php/7.2/fpm/conf.d# ln -s /etc/php/7.2/mods-available/ldap.ini 20-ldap.ini ; systemctl restart php7.2-fpm
* 03:27 twentyafterfour: restarted php-fpm on phab1003
* 02:56 mutante: phab1001 - removing community_metrics and project_changes cron jobs to avoid duplicate mails
* 02:51 mutante: phab1003 - chown -R phd /srv/repos/
* 02:41 twentyafterfour: downtimed the systemd state on phab1001 for 1 year
* 02:35 mutante: phabricator - going read-write again
* 02:24 twentyafterfour: manually started aphlict on phab1003
* 02:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
* 02:04 mutante: puppetmaster1001 - sudo -i conftool-merge
* 01:52 twentyafterfour: phabricator is now served by phab1003 though still in read-only mode for a bit longer
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=phab1003-vcs.eqiad.wmnet
* 01:49 mutante: puppetmaster1001 - conftool-merge
* 01:41 eileen: civicrm revision changed from {{Gerrit|e6e846708f}} to {{Gerrit|21afd001b6}}, config revision is {{Gerrit|87e78d3eac}}
* 01:37 mutante: depooled phab1001-vcs from git-ssh via conftool
* 01:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=phab1001-vcs.eqiad.wmnet
* 01:33 mutante: run puppet on mx1001/mx2001 - switch mail route for phab to phab1003
* 01:30 mutante: switched from phab1001 to phab1003 - applied on cp1008 varnish canary first
* 01:28 twentyafterfour: stopping phd on phab1001
* 01:18 mutante: phabricator going readonly momentarily
* 01:09 twentyafterfour: extended phab downtime in icinga; actual downtime hasn't started yet, as prep work is taking longer than expected
* 00:52 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e040c6c]: Deploy GUI update (duration: 09m 54s)
* 00:45 mutante: phab1003 - rsyncing /srv/repos from phab1001
* 00:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e040c6c]: Deploy GUI update
* 00:33 ejegg: updated payments-wiki from {{Gerrit|fa005a0640}} to {{Gerrit|786d76e212}}
 
== 2019-05-22 ==
* 23:30 twentyafterfour: scheduling downtime for phabricator from 0:00 to 1:00 utc
* 23:10 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511889/ (duration: 00m 55s)
* 22:18 mdholloway: mobileapps rolled back deployment (again) due to occasional references endpoint timeouts
* 22:17 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}, take 2 (duration: 07m 19s)
* 22:15 foks: reset user email and password for Nv8200pa
* 22:09 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}, take 2
* 22:09 mdholloway: mobileapps rolled back deployment due to endpoint check failure (not the same one as before); retrying momentarily
* 22:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}} (duration: 03m 25s)
* 22:08 foks: reset user email and password for DarkKyoushu
* 22:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@39e0ef1]: Update mobileapps to {{Gerrit|fcf3724}}
* 21:51 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/resourceloader/MessageBlobStore.php: [[phab:T222539|T222539]] / {{Gerrit|734b3d84f7}} (duration: 00m 56s)
* 21:47 krinkle@deploy1001: Synchronized php-1.34.0-wmf.6/includes/resourceloader/MessageBlobStore.php: [[phab:T222539|T222539]] / {{Gerrit|3cb01cc73ce9}} (duration: 00m 56s)
* 21:41 urandom: decommissioning restbase1008-c -- [[phab:T223976|T223976]]
* 20:46 mdholloway: mobileapps rolled back deployment due to endpoint check failures
* 20:43 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}, take 2 (duration: 04m 19s)
* 20:39 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}, take 2
* 20:38 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}} (duration: 02m 41s)
* 20:35 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07632e1]: Update mobileapps to {{Gerrit|b058298}}
* 19:26 jforrester@deploy1001: Finished scap: Re-build i18n and re-scap everything for i18n issues for [[phab:T224116|T224116]] [[phab:T224124|T224124]] [[phab:T220731|T220731]] (duration: 32m 55s)
* 18:53 jforrester@deploy1001: Started scap: Re-build i18n and re-scap everything for i18n issues for [[phab:T224116|T224116]] [[phab:T224124|T224124]] [[phab:T220731|T220731]]
* 18:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/FlaggedRevs: Hot-deploy reverting FlaggedRevs config for [[phab:T224116|T224116]] [[phab:T224124|T224124]] (duration: 00m 58s)
* 18:17 jforrester@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/UrlShortener/modules/ext.urlShortener.special.js: Fix i18n/command mix-up {{Gerrit|Ic99cf063a}} (duration: 01m 00s)
* 17:38 bblack: repool cp3046 as esams cache_upload ats-be node - [[phab:T222937|T222937]]
* 17:06 urandom: decommissioning restbase1008-b -- [[phab:T223976|T223976]]
* 16:17 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 to 1.34.0-wmf.5 [[phab:T224116|T224116]] [[phab:T224124|T224124]] # [[phab:T220731|T220731]]
* 15:11 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
* 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
* 15:07 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 15:04 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:04 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 15:00 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
* 14:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
* 14:57 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
* 14:49 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=nescio.wikimedia.org
* 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 jbond@cumin1001: conftool action : set/pooled=no; selector: name=nescio.wikimedia.org
* 14:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=maerlant.wikimedia.org
* 14:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:35 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=maerlant.wikimedia.org
* 14:17 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4002.wikimedia.org
* 14:14 hashar: 1.34.0-wmf.6 deployed to group1 with the exception of cawikinews due to [[phab:T224116|T224116]]
* 14:14 mobrovac: start it, es wiki dumps (fr and de completed) to fill the new parsoid tables - [[phab:T215956|T215956]]
* 14:11 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns4002.wikimedia.org
* 14:09 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns4001.wikimedia.org
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:02 marostegui: Stop MySQL on db2078 for upgrade
* 13:58 bblack: depool cp3046 for reimage to ats-be - [[phab:T222937|T222937]]
* 13:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:57 moritzm: rebooting swift frontends in codfw
* 13:46 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5002.wikimedia.org
* 13:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:43 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5002.wikimedia.org
* 13:42 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns5001.wikimedia.org
* 13:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:35 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns5001.wikimedia.org
* 13:27 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/templates/: [[phab:T224092|T224092]] (duration: 00m 58s)
* 13:13 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.6 (duration: 00m 54s)
* 13:06 urandom: decommissioning restbase1008-a -- [[phab:T223976|T223976]]
* 12:39 marostegui: Stop replication on db2048 (s1 codfw master) to rebuild revision table - this will generate lag on codfw - [[phab:T224017|T224017]]
* 12:35 bblack: cp3035: restarting varnish backend
* 12:34 marostegui: Stop replication on db1080 to rebuild revision table - [[phab:T224017|T224017]]
* 12:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1080 to rebuild revision table [[phab:T224017|T224017]] (duration: 00m 55s)
* 11:30 Amir1: EU SWAT is done
* 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:503342{{!}}Remove constraint-suggestions beta feature (T220609)]] (duration: 00m 57s)
* 11:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:509878{{!}}Add configuration for EntitySchema ShExSimpleUrl (T223120)]] (duration: 00m 56s)
* 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:511674{{!}}[SDC] Enable depicts qualifiers on testcommons]] (duration: 00m 57s)
* 10:01 vgutierrez: restarting varnish-backend on cp3039
* 09:52 mobrovac: start the en, fr and de wiki dumps again to populate the new parsoid table - [[phab:T215956|T215956]]
* 09:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - [[phab:T215956|T215956]] (duration: 27m 07s)
* 09:42 marostegui: Stop MySQL on db2078:m5 to clone db2070 - [[phab:T221533|T221533]]
* 09:16 mobrovac@deploy1001: Started deploy [restbase/deploy@b90fb8b]: Temporarily copy from old tables to new ones if the data is not found, the correct way this time - [[phab:T215956|T215956]]
* 08:52 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2070 from s1 to m5 (duration: 00m 55s)
* 08:51 marostegui@deploy1001: sync-file aborted: Move db2070 from s1 to m5 (duration: 00m 03s)
* 08:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1086 (duration: 00m 56s)
* 08:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1086 into API (duration: 00m 56s)
* 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1086 (duration: 00m 55s)
* 07:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s8 codfw weights [[phab:T220170|T220170]] (duration: 00m 55s)
* 07:36 mobrovac: decommission restbase1007-c - [[phab:T223976|T223976]]
* 07:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Tackle s4 codfw weights [[phab:T220170|T220170]] (duration: 01m 06s)
* 07:23 marostegui: Restart MySQL on db2090 to change binlog format [[phab:T220170|T220170]]
* 06:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2040 from config [[phab:T224079|T224079]] (duration: 00m 55s)
* 06:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2040 from config [[phab:T224079|T224079]] (duration: 00m 56s)
* 06:13 marostegui: Remove db2040 from zarcillo and tendril - [[phab:T224079|T224079]]
* 06:01 marostegui: Stop MySQL on db2040 - [[phab:T224079|T224079]]
* 05:42 marostegui: Stop MySQL on db1086 to clone db1136
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1086 (duration: 00m 55s)
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2118 and db2120 into s7 [[phab:T222772|T222772]] (duration: 00m 55s)
* 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2118 and db2120 into s7 [[phab:T222772|T222772]] (duration: 00m 55s)
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1118 from s1 api and pool db1134 instead [[phab:T224017|T224017]] (duration: 00m 57s)
* 04:41 gilles: purging ruwiki and eswiki to make them get the new origin trial tokens
* 04:39 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Renew origin trial tokens (duration: 00m 57s)
* 03:22 legoktm: removed 2fa for [[phab:T224075|T224075]]
* 01:46 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/SpecialWatchlist.php: {{Gerrit|68eeaa5b76738a6a07d148391220cdb6c8fd1d23}} (duration: 00m 57s)
* 01:22 aaron@deploy1001: Synchronized php-1.34.0-wmf.6/includes/specials/SpecialWatchlist.php: {{Gerrit|447bf504e498e2c18f29b90f7760514102236e4e}} (duration: 00m 57s)
 
== 2019-05-21 ==
* 23:47 maxsem@deploy1001: Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511668/ (duration: 00m 57s)
* 23:34 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/511667/ (duration: 00m 56s)
* 22:56 mutante: ms-be2034 - degraded systemd state was cleared; it was originally caused by "failed Session 72587 of user debmonitor"
* 22:56 mutante: ms-be2034 - sudo systemctl reset-failed
* 22:51 urandom: decommissioning restbase1007-b -- [[phab:T223976|T223976]]
* 21:35 ejegg: updated payments-wiki from {{Gerrit|d5ef5ad067}} to {{Gerrit|fa005a0640}}
* 21:21 mutante: re-enabling puppet on mc1* hosts
* 20:43 mutante: re-enabling puppet on all hosts using memcached class - except mc1*
* 20:31 mutante: mc2019 - stopping memcached and letting puppet restart it to confirm no issues after switching to systemd::service
* 20:20 mutante: disabling puppet on all servers using class memcached (57)
* 20:06 tzatziki: removing (another) two files for legal compliance
* 19:43 tzatziki: removing two files for legal compliance
* 19:12 thcipriani: gerrit back on 2.15.13
* 19:09 thcipriani: restart gerrit for 2.15.13 update
* 19:08 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming) (duration: 00m 20s)
* 19:08 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (cobalt, restart incoming)
* 19:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only) (duration: 00m 11s)
* 19:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@2de9001]: Gerrit to 2.15.13 (gerrit 2001 only)
* 18:50 bblack: repooling cp1085 frontends (weren't meant to be depooled)
* 18:38 bblack: re-pooling eqiad front edge traffic (onto new LVSes from [[phab:T184293|T184293]] )
* 18:36 XioNoX: update lvs static routes on cr1/2-eqiad - [[phab:T184293|T184293]]
* 18:06 andrewbogott: restarting rabbitmq-server on cloudcontrol1003 (turning on HA queues)
* 17:59 bblack: rebooting lvs1016 in attempt to clear interface config issues - [[phab:T224027|T224027]]
* 17:51 XioNoX: add BGP sessions to AS202053 in esams
* 17:31 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected (again, after merging last-minute fixup https://gerrit.wikimedia.org/r/c/operations/puppet/+/511759 )
* 17:25 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1016, bringing back pybal in "secondary" role for all 3 traffic classes (high-traffic1, high-traffic2, low-traffic), no traffic shift expected
* 17:24 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1006, basically no-op
* 17:21 bblack: eqiad LVS: low-traffic (all internal services): puppeting lvs1015, bringing back pybal in primary role, shifting traffic to lvs1015
* 17:20 bblack: eqiad LVS: low-traffic (all internal services): disable pybal on lvs1016 + lvs1015, shifting traffic to lvs1006
* 17:18 reedy@deploy1001: Synchronized php-1.34.0-wmf.6/extensions/Collection/includes/CollectionHooks.php: Fix paths (duration: 00m 56s)
* 17:17 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1005, basically no-op
* 17:15 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1002, bringing back pybal in backup role, no traffic shift
* 17:13 bblack: eqiad LVS: high-traffic2 (upload): puppeting lvs1014, bringing back pybal in primary role, shifting traffic to lvs1014
* 17:11 bblack: eqiad LVS: high-traffic2 (upload): disable pybal on lvs1014 + lvs1002, shifting traffic to lvs1005
* 17:09 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1004, basically no-op
* 17:07 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1001, bringing back pybal in backup role, no traffic shift
* 17:06 bblack: eqiad LVS: high-traffic1 (text): puppeting lvs1013, bringing back pybal in primary role, shifting traffic to lvs1013
* 17:04 bblack: eqiad LVS: high-traffic1 (text): disable pybal on lvs1013 + lvs1001, shifting traffic to lvs1004
* 16:55 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:55 jbond42: rebooting wtp1046-1048
* 16:55 bblack: starting Eqiad LVS re-arrangement shortly - [[phab:T184293|T184293]] - https://gerrit.wikimedia.org/r/c/operations/puppet/+/511717 (eqiad front edge is still depooled from public traffic)
* 16:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:50 jbond42: rebooting wtp1043-1045
* 16:46 mutante: rebooting phab1003 (non-prod)
* 16:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 jbond42: rebooting wtp1040-1042
* 16:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:39 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 jbond42: rebooting wtp1037-1039
* 16:26 mobrovac: truncate "others_T_parsoid".data
* 16:25 mobrovac: restbase truncate "commons_T_parsoid".data
* 16:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 jbond42: rebooting wtp1033-1034
* 16:18 mobrovac: restbase truncate "enwiki_T_parsoid".data
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 jbond42: rebooting wtp1031-1032
* 16:10 mobrovac: restbase truncate "wikipedia_T_parsoid".data
* 16:09 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:09 jbond42: rebooting wtp1029-1030
* 16:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:01 jbond42: rebooting wtp1027-1028
* 15:56 urandom: decommissioning restbase1007-a -- [[phab:T208087|T208087]]
* 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 jbond42: rebooting wtp1025-1026
* 15:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007 (duration: 02m 43s)
* 15:42 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found, rb1007
* 15:42 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found (duration: 02m 40s)
* 15:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:40 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:40 jbond42: rebooting wtp2019-2020
* 15:39 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Revert Temporarily copy from old tables to new ones if the data is not found
* 15:38 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2 (duration: 00m 45s)
* 15:38 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found, take #2
* 15:37 mobrovac@deploy1001: Finished deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - [[phab:T215956|T215956]] (duration: 07m 10s)
* 15:37 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Moving to 10% of users on php7 [[phab:T219150|T219150]] (duration: 00m 57s)
* 15:32 XioNoX: enable BGP to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:30 mobrovac@deploy1001: Started deploy [restbase/deploy@022cb98]: Temporarily copy from old tables to new ones if the data is not found - [[phab:T215956|T215956]]
* 15:24 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:23 jbond42: rebooting wtp2017-2018
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 jbond42: rebooting wtp2015-2016
* 15:10 XioNoX: disable BGP to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:05 jbond42: rebooting wtp2013-2014
* 15:02 crusnov@deploy1001: Finished deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - [[phab:T220422|T220422]] (duration: 00m 55s)
* 15:01 crusnov@deploy1001: Started deploy [netbox/deploy@3091b51]: deploy new version of ganeti-netbox sync - [[phab:T220422|T220422]]
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:57 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 jbond42: rebooting wtp2011-2012
* 14:57 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.6
* 14:50 jbond42: rebooting wtp2009-2010
* 14:50 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 jbond42: rebooting wtp2007-2008
* 14:44 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:44 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 jbond42: rebooting wtp2005-2006
* 14:31 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:31 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:31 jbond42: rebooting wtp2003-2004
* 14:27 hashar@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # [[phab:T220731|T220731]] (duration: 48m 09s)
* 14:26 volans: restarting wikibugs
* 14:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:25 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 14:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 jbond42: rebooting wtp2001-2002
* 13:50 bblack: rebooting lvs1013,14,15 for verification
* 13:39 hashar@deploy1001: Started scap: testwiki to php-1.34.0-wmf.6 and rebuild l10n cache # [[phab:T220731|T220731]]
* 13:37 hashar@deploy1001: Pruned MediaWiki: 1.34.0-wmf.1 (duration: 02m 12s)
* 13:36 hashar: scap clean --verbose --delete 1.34.0-wmf.1  # [[phab:T220731|T220731]]
* 13:29 hashar: scap clean --verbose --delete 1.33.0-wmf.25  # [[phab:T220731|T220731]]
* 13:25 godog: swift eqiad-prod: start depool ms-be1033 - [[phab:T223518|T223518]]
* 13:24 hashar: Applied security patches to 1.34.0-wmf.6 # [[phab:T220731|T220731]]
* 13:24 hashar: Applied security patches to 1.34.0-wmf.6
* 13:23 bblack: rebooting lvs1013 (possibly a few times, debugging startup issues)
* 13:20 hashar: scap prep 1.34.0-wmf.6  # [[phab:T220731|T220731]]
* 13:11 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
* 13:09 hashar: Restarting Jenkins [[phab:T224002|T224002]]
* 12:45 hashar: Cutting branch wmf/1.34.0-wmf.6 # [[phab:T220731|T220731]]
* 12:22 volans: restarting Icinga on icinga1001 to pick up new open files limits
* 12:08 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - [[phab:T219148|T219148]] (duration: 00m 54s)
* 12:07 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4588f16]: Migrating htmlCacheUpdate to PHP7 - [[phab:T219148|T219148]]
* 11:59 mobrovac: started dewiki dumps - [[phab:T215956|T215956]]
* 11:58 mobrovac: started frwiki dumps - [[phab:T215956|T215956]]
* 11:46 mobrovac: started enwiki dumps - [[phab:T215956|T215956]]
* 11:27 Amir1: EU SWAT is done
* 11:27 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:511658{{!}}Revert "Switch off php7 for investigation of production instabilities"]] (duration: 00m 50s)
* 11:20 volans: restarting Icinga on icinga2001 (passive server) to pick up new open file limits
* 11:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 jbond42: reboot wtp1025.eqiad.wmnet
* 11:10 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:505816{{!}}Define wmgUseEntitySchema (T221651)]], part II (duration: 00m 49s)
* 11:09 mobrovac@deploy1001: Finished deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - [[phab:T215956|T215956]] (duration: 25m 50s)
* 11:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:505816{{!}}Define wmgUseEntitySchema (T221651)]], part I (duration: 00m 50s)
* 11:07 godog: swift codfw-prod: remove ms-be201[345] - [[phab:T221068|T221068]]
* 10:59 _joe_: rolling restart of php7.2-fpm across the fleet to pick up a config change
* 10:44 mobrovac@deploy1001: Started deploy [restbase/deploy@cf00120]: Switch Parsoid to simple k/v bucket - [[phab:T215956|T215956]]
* 10:39 jijiki: updating prometheus-mcrouter-exporter on mw* servers
* 10:26 godog: pool new restbase hosts - [[phab:T219404|T219404]]
* 10:20 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1019.eqiad.wmnet
* 09:49 moritzm: updated buster netboot image to daily image from 20190521
* 09:26 moritzm: reimaging graphite2001 to buster for some d-i tests
* 08:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2104 as candidate master and as API (duration: 00m 51s)
* 08:56 marostegui: Stop MySQL on db2041 as it will be decommissioned [[phab:T223950|T223950]]
* 06:59 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Turning off php7 sampling for investigation in [[phab:T223952|T223952]] (duration: 00m 53s)
* 06:55 elukey: reboot of stat100[4,5,6,7] and notebook100[3,4] for kernel upgrades
* 06:31 marostegui: Stop mariadb on db2104 to convert it to s2 candidate master
* 06:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2104 (duration: 00m 51s)
* 05:50 marostegui: Remove db2041 from tendril and zarcillo - [[phab:T223950|T223950]]
* 05:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2041 for decommissioning [[phab:T223950|T223950]] (duration: 00m 51s)
* 05:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2041 for decommissioning [[phab:T223950|T223950]] (duration: 00m 51s)
* 05:16 marostegui: Stop MySQL on db2040
* 05:16 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2040 (duration: 00m 50s)
* 05:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2114 into s6 - [[phab:T222772|T222772]] (duration: 00m 50s)
* 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2114 into s6 - [[phab:T222772|T222772]] (duration: 00m 51s)
* 03:36 urandom: bootstrapping restbase1027-c -- [[phab:T219404|T219404]]
* 00:47 urandom: bootstrapping restbase1027-b -- [[phab:T219404|T219404]]
* 00:05 aaron@deploy1001: Synchronized php-1.34.0-wmf.5/includes/libs/objectcache/APCUBagOStuff.php: {{Gerrit|982299d635623279}} (duration: 00m 54s)
 
== 2019-05-20 ==
* 21:07 ejegg: updated payments-wiki from {{Gerrit|8397ccf9cc}} to {{Gerrit|d5ef5ad067}}
* 19:20 mobrovac: bootstrap restbase1027-a - [[phab:T219404|T219404]]
* 18:55 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/includes/Linker.php: [[phab:T222857|T222857]] / {{Gerrit|Iecc2140fabd3}} (duration: 00m 54s)
* 16:43 onimisionipe: rolling reboot of maps eqiad to pick up kernel upgrades
* 16:38 mobrovac: bootstrap restbase1026-c - [[phab:T219404|T219404]]
* 15:26 onimisionipe: rebooting codfw maps to pick up kernel upgrades
* 15:26 marostegui: Stop replication on labsdb1011 to start compressing tables - [[phab:T222978|T222978]]
* 15:13 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on group 0 ([[phab:T188327|T188327]]) (duration: 00m 55s)
* 14:54 bblack: rebooting lvs1013, lvs1014, lvs1015 (not in active service, yet)
* 14:43 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - [[phab:T219148|T219148]] (duration: 00m 55s)
* 14:42 jiji@deploy1001: Started deploy [cpjobqueue/deploy@89b0ad0]: Migrating RecordLintJob to PHP7 - [[phab:T219148|T219148]]
* 14:21 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1011
* 14:14 marostegui: Reload haproxy on dbproxy1010 to repool labsdb1010
* 13:58 mobrovac: bootstrap restbase1026-b - [[phab:T219404|T219404]]
* 12:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 50s)
* 11:44 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:44 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:28 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:28 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:21 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 mobrovac: bootstrap restbase1026-a - [[phab:T219404|T219404]]
* 11:16 fsero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:15 fsero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:01 arturo: icinga downtime toolschecker for 3h for [[phab:T223332|T223332]]
* 10:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:511398{{!}} Bumping portals to master (T128546)]] (duration: 00m 49s)
* 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:511398{{!}} Bumping portals to master (T128546)]] (duration: 00m 50s)
* 10:27 moritzm: rebooting contint1001 for kernel update
* 10:25 hashar: contint1001: docker image prune -f  {{!}} Total reclaimed space: 7.115GB {{!}} [[phab:T207707|T207707]]
* 10:20 hashar: Stopped Zuul gracefully
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:18 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:18 fsero: puppet reenabled, certs renewed - [[phab:T221346|T221346]]
* 10:08 fsero: rolling over certs into mcrouter proxies codfw - [[phab:T221346|T221346]]
* 10:03 fsero: rolling over certs into mcrouter proxies eqiad - [[phab:T221346|T221346]]
* 09:42 marostegui: Remove db2036 from tendril and zarcillo - [[phab:T223885|T223885]]
* 09:39 marostegui: Stop MySQL on db2036 [[phab:T223885|T223885]]
* 09:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2036, going to be decommissioned [[phab:T223885|T223885]] (duration: 00m 49s)
* 09:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2036, going to be decommissioned [[phab:T223885|T223885]] (duration: 00m 49s)
* 09:36 fsero: rolling over new certs to all mcrouter hosts except proxies - [[phab:T221346|T221346]]
* 09:26 fsero: continuing to roll over new certs - [[phab:T221346|T221346]]
* 09:01 fsero: disabling puppet on mcrouter hosts for regenerating certs - [[phab:T221346|T221346]]
* 08:49 moritzm: installing atftpd security updates
* 08:43 mobrovac: bootstrap restbase1025-c - [[phab:T219404|T219404]]
* 08:38 moritzm: installing samba security updates
* 08:36 moritzm: installing ghostscript security updates on jessie
* 08:25 moritzm: installing cups-filter security updates on jessie (prerequisite for ghostscript security update)
* 07:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 48s)
* 07:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2046 (duration: 00m 50s)
* 06:25 elukey: rebuild and upload memkeys 20181031-1 to stretch-wikimedia
* 06:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1126 and db1134 (duration: 00m 49s)
* 06:20 elukey: upgrade memkeys to version 20181031-1 on all the mc* hosts (was deployed only on a few of them) - [[phab:T208376|T208376]]
* 06:11 mobrovac: bootstrap restbase1025-b - [[phab:T219404|T219404]]
* 06:00 elukey: powercycle analytics1071 - soft lockups error messages in the dmesg
* 05:51 marostegui: Reload haproxy on dbproxy1010 to depool labsdb1010
* 05:42 marostegui: Reload haproxy on dbproxy1010 and dbproxy1011 to repool labsdb1009 and restore original weights
* 05:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1126 into s8, db1134 into s1 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1126 into s8, db1134 into s1 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:12 marostegui: Stop MySQL on db2046
* 05:11 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2046 (duration: 00m 50s)
* 05:07 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2038 (duration: 00m 49s)
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2038 (duration: 00m 55s)
* 02:42 cdanis: cdanis@cp1075.eqiad.wmnet ~ % sudo -i varnish-backend-restart
 
== 2019-05-19 ==
* 20:16 ariel@deploy1001: Finished deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace (duration: 00m 03s)
* 20:16 ariel@deploy1001: Started deploy [dumps/dumps@4febe0c]: for abstract dumps, skip any processing of pages not in main namespace
* 17:51 mobrovac: bootstrap restbase1025-a - [[phab:T219404|T219404]]
* 13:26 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: [[phab:T223734|T223734]]: Depool cloudelastic100[12] (duration: 00m 49s)
* 12:37 reedy@deploy1001: Synchronized wmf-config/interwiki-labs.php: update (duration: 00m 57s)
* 10:32 reedy@deploy1001: Synchronized wikiversions-labs.json: [[phab:T223770|T223770]] (duration: 00m 48s)
* 10:31 reedy@deploy1001: Synchronized dblists/all-labs.dblist: [[phab:T223770|T223770]] (duration: 00m 51s)
* 10:12 mobrovac: bootstrap restbase1024-c - [[phab:T219404|T219404]]
* 09:59 ebernhardson: eqiad psi elasticsearch high disk watermark to 89% to allow unallocated shard to initialize
* 09:56 ebernhardson: eqiad psi elasticsearch low disk watermark to 79% to allow unallocated shard to initialize
* 08:13 jijiki: varnish-backend-restart on cp1087
* 06:56 mobrovac: bootstrap restbase1024-b - [[phab:T219404|T219404]]
* 05:09 marostegui: varnish-backend-restart on cp1081
 
== 2019-05-18 ==
* 23:53 bblack: rebooting lvs1015 for interface changes
* 22:44 bblack: imaging lvs1013-lvs1015
* 21:01 bblack: depooling eqiad public front edge in authdns
* 19:18 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/Collection/templates/CollectionSuggestTemplate.php: [[phab:T223742|T223742]] / {{Gerrit|89bd434a21a745ec}} (duration: 00m 49s)
* 19:16 mobrovac: bootstrap restbase1024-a - [[phab:T219404|T219404]]
* 18:50 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T222146|T222146]] / {{Gerrit|9385b2dd66}} (duration: 00m 50s)
* 16:53 mobrovac: bootstrap restbase1023-c - [[phab:T219404|T219404]]
* 15:57 krinkle@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/TimedMediaHandler/includes/handlers/WebMHandler/WebMHandler.php: [[phab:T223445|T223445]] / {{Gerrit|a9df59c59d7a30}} (duration: 00m 51s)
* 14:59 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: whitespace is srs (duration: 00m 49s)
* 14:56 reedy@deploy1001: Synchronized wmf-config/flaggedrevs.php: Copy in default config (duration: 01m 04s)
* 13:51 urandom: bootstrapping restbase1023-b - [[phab:T219404|T219404]]
* 05:41 mobrovac: bootstrap rb1023-a - [[phab:T219404|T219404]]
* 02:37 urandom: bootstrapping restbase1022-c - [[phab:T219404|T219404]]
 
== 2019-05-17 ==
* 23:55 urandom: bootstrapping restbase1022-b - [[phab:T219404|T219404]]
* 23:11 foks: removing one file for legal compliance
* 15:20 hashar@deploy1001: Synchronized php-1.34.0-wmf.5/includes/api/ApiUpload.php: Revert "Always validate uploads over api" - [[phab:T223448|T223448]] ([[phab:T222994|T222994]] [[phab:T223446|T223446]]) (duration: 01m 00s)
* 15:18 hashar: Deploying hotfix https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/510924/ . Should restore upload of large files on commons and other wikis #[[phab:T223448|T223448]] (poke [[phab:T222994|T222994]]  [[phab:T223446|T223446]] )
* 14:51 mobrovac: bootstrap restbase1022-a - [[phab:T219404|T219404]]
* 14:43 fsero: reenabling puppet on mcrouter hosts for [[phab:T221346|T221346]]; checks are in place, so if there is any alert for cert expiration and mcrouter, this is the source :)
* 14:17 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1098 & db1131 after maintenance (duration: 00m 49s)
* 14:09 fsero: second round of setting up cert check, disabling puppet on mcrouter hosts [[phab:T221346|T221346]]
* 12:58 mobrovac: bootstrap restbase1021-c - [[phab:T219404|T219404]]
* 10:59 mobrovac: bootstrap restbase1021-b - [[phab:T219404|T219404]]
* 09:27 godog: swift remove ms-be101[345] from rings - [[phab:T220590|T220590]]
* 09:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1083 (duration: 00m 48s)
* 08:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:43 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1083 (duration: 00m 49s)
* 08:24 fsero: reenabling puppet after reverting [[phab:T221346|T221346]]
* 08:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1083 (duration: 00m 59s)
* 07:57 fsero: disabling puppet on mcrouter hosts for [[phab:T221346|T221346]]
* 07:12 marostegui: Compress s7 on labsdb1012 [[phab:T222978|T222978]]
* 06:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2111 and db2113 into s5 [[phab:T222772|T222772]] (duration: 00m 49s)
* 06:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2111 and db2113 into s5 [[phab:T222772|T222772]] (duration: 00m 50s)
* 05:19 marostegui: Stop MySQL on db1083 to clone db1134
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1083 (duration: 00m 50s)
* 05:00 mobrovac: bootstrap 1021-a - [[phab:T219404|T219404]]
 
== 2019-05-16 ==
* 21:02 Jeff_Green: authdns-update to switch payments.wikimedia.org back to eqiad cluster
* 19:24 onimisionipe: pooling elastic2038 - shards are properly balanced across nodes
* 18:31 onimisionipe: depooling elastic2038 to investigate more
* 17:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:26 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:26 jbond42: reboot ores1007-1009
* 17:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:15 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:15 jbond42: reboot ores1005-1006
* 17:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:10 jbond42: reboot ores1003-1004
* 17:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 17:05 jbond42: reboot ores1001-1002
* 17:00 jbond42: reboot orespoolcounter[12]002
* 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:53 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:52 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:51 jbond42: reboot orespoolcounter[12]001
* 16:44 jbond42: reboot ores2008-2009
* 16:38 jbond42: will first reboot ores2006-2007
* 16:36 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:36 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:36 jbond42: reboot ores2006-2009
* 16:28 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 16:28 jbond42: reboot ores2003-2005
* 16:22 XioNoX: add BGP session to Hetzner in AMS-IX
* 16:19 akosiaris: switch all etcd* kubestagetcd* servers from "drbd" ganeti disk template to "plain" ganeti disk template
* 16:17 jbond42: reboot ores2001-2002
* 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:59 akosiaris: build service-checker OCI container 0.0.2 with 0.1.5 service-checker version [[phab:T220401|T220401]]
* 15:49 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CirrusSearch/includes/InterwikiSearcher.php: Hot-deploy CirrusSearch interwiki no result UBN [[phab:T223449|T223449]] (duration: 00m 49s)
* 15:45 marostegui: Drop the following databases from tendril to recreate them with the right user: db1127,db1129,db1130, db1131, db1137,db1138
* 15:35 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/includes/specials/pagers/ContribsPager.php: Hot-deploy Contribs getNamespaceInfo UBN fix [[phab:T223440|T223440]] (duration: 00m 53s)
* 15:25 aborrero@puppetmaster1001: conftool action : set/pooled=yes; selector: name=labweb1001.wikimedia.org,service=labweb
* 15:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 jbond42: rebooting aqs1009
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:54 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 jbond42: rebooting aqs1008
* 14:45 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:45 jbond42: rebooting aqs1007
* 14:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:34 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:34 jbond42: rebooting aqs1006
* 14:28 jbond42: rebooting aqs1005
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:21 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:18 moritzm: powercycling mw2199, stuck during reboot
* 14:08 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:08 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:07 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 marostegui: and recreate the following hosts in tendril: db2103,db2104,db2105,db2106,db2107,db2108,db2109,db2110,db2111,db2112,db2113,db2115,db2116,db2117,db2119 [[phab:T222772|T222772]]
* 13:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:50 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:39 cmjohnson1: replacing pdu in rack B5 eqiad
* 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.5
* 13:00 arturo: labweb1001 depooled
* 12:59 mobrovac: bootstrap restbase1020-c - [[phab:T219404|T219404]]
* 12:21 godog: stop swift and rsync on ms-be10[16,17,18,32,33] for eqiad B5 pdu replacement - [[phab:T223126|T223126]]
* 12:03 jynus: stop and shutdown db1098,db1131,db1139 [[phab:T223126|T223126]]
* 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:54 moritzm: rebooting mw app servers in codfw for kernel update
* 11:32 hoo@deploy1001: Synchronized wmf-config/extension-list: Add EntitySchema to extension-list ([[phab:T221650|T221650]]) (duration: 00m 56s)
* 11:22 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1098 & db1131 for maintenance (duration: 00m 57s)
* 11:00 arturo: [[phab:T223148|T223148]] downtime cloudvirt[1014,1028].eqiad.wmnet and labweb1001.wikimedia.org for 8 hours
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 godog: bootstrap restbase1020-b - [[phab:T219404|T219404]]
* 10:27 jiji@deploy1001: Finished deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - [[phab:T219148|T219148]] (duration: 01m 07s)
* 10:26 jiji@deploy1001: Started deploy [cpjobqueue/deploy@4d55dff]: Migrating updateBetaFeaturesUserCounts to PHP7 - [[phab:T219148|T219148]]
* 08:52 akosiaris: upgrade mathoid to statsd_exporter 0.9 [[phab:T220709|T220709]]
* 08:48 akosiaris@deploy1001: scap-helm mathoid finished
* 08:48 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
* 08:48 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
* 08:48 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
* 08:47 akosiaris@deploy1001: scap-helm mathoid upgrade -f mathoid-values.yaml [namespace: mathoid, clusters: eqiad,codfw]
* 08:37 godog: bootstrap restbase1020-a - [[phab:T219404|T219404]]
* 08:32 elukey: depool/restart-nutcracker-pool mw1293/1313 - [[phab:T214275|T214275]]
* 08:22 elukey: depool/restart-nutcracker-pool mw1238 - [[phab:T214275|T214275]]
* 08:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1104 (duration: 00m 56s)
* 07:57 moritzm: installing linux 4.9.168-1+deb9u2~deb8u1 kernel on jessie hosts (no reboots, just installing the new package)
* 07:45 moritzm: removed intel-microcode 3.20180807a from jessie-wikimedia (superseded by newer version in security.debian.org, which doesn't get picked up by apt due to the higher apt priority of jessie-wikimedia)
* 07:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1104 into API (duration: 00m 56s)
* 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 (duration: 00m 57s)
* 06:59 moritzm: installing intel-microcode updates
* 05:34 elukey: roll restart of nutcracker on mw2* to pick up new config changes (no more memcached config) - [[phab:T214275|T214275]]
* 05:33 marostegui: Stop MySQL on db1104 to clone db1126
* 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 (duration: 00m 56s)
* 05:18 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2106, db2110, db2119 into s4 - [[phab:T222772|T222772]] (duration: 00m 56s)
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2106, db2110, db2119 into s4 - [[phab:T222772|T222772]] (duration: 00m 58s)
* 02:27 onimisionipe: pooling elastic2038 after unbanning - [[phab:T217398|T217398]]
 
== 2019-05-15 ==
* 22:16 mutante: phab1003 - start ssh-phab service after adding service IPs
* 22:01 eileen: civicrm update - lost the commit versions but 5.13.4 release
* 21:47 mutante: phab1003 - ip -6 addr del 2620:0:861:ed1a::3:16/128 dev lo - remove extra service IP for phab's separate sshd, duplicated with phab1001 ([[phab:T190568|T190568]])
* 21:24 jforrester@deploy1001: Synchronized wmf-config/MetaContactPages.php: Add movecomsignup contact page on meta [[phab:T218363|T218363]] (duration: 00m 56s)
* 21:23 eileen: civicrm revision changed from {{Gerrit|7d3ef1f2ae}} to {{Gerrit|c69c6e2e6a}}, config revision is {{Gerrit|a099f13a55}}
* 21:00 fdans@deploy1001: Finished deploy [analytics/refinery@ffa4931]: deploying analytics refinery (duration: 15m 31s)
* 20:45 tgr@deploy1001: Finished deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist ([[phab:T213362|T213362]]) (duration: 02m 41s)
* 20:45 fdans@deploy1001: Started deploy [analytics/refinery@ffa4931]: deploying analytics refinery
* 20:42 tgr@deploy1001: Started deploy [proton/deploy@9373c42]: Add gistcdn.githack.com to host blacklist ([[phab:T213362|T213362]])
* 20:20 robh: rebooting cloudvirt1015 into dell hardware tests per [[phab:T220853|T220853]]
* 20:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@8f28977]: Updating Parsoid to {{Gerrit|6658cad}} (duration: 06m 23s)
* 20:12 arlolra@deploy1001: Started deploy [parsoid/deploy@8f28977]: Updating Parsoid to {{Gerrit|6658cad}}
* 19:42 hashar: group 1 promoted to 1.34.0-wmf.5  apparently without any issue # [[phab:T220730|T220730]]
* 19:03 hashar@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.5 (duration: 00m 58s)
* 19:02 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.5
* 18:38 andyrussg@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/CentralNotice/: Revert CentralNotice (duration: 01m 00s)
* 17:32 thcipriani: deploy1001:sudo -u www-data /usr/local/bin/foreachwiki extensions/WikimediaMaintenance/refreshMessageBlobs.php
* 17:19 onimisionipe: unban elastic2038 from shard allocation - [[phab:T217398|T217398]]
* 17:19 XenoRyet: updated civicrm from {{Gerrit|4b6d569383}} to {{Gerrit|7d3ef1f2ae}}
* 17:09 elukey: powerup elastic2038 (was down for maintenance)
* 17:01 godog: bootstrap restbase1019-c - [[phab:T219404|T219404]]
* 16:58 bstorm_: [[phab:T212972|T212972]] updated all views on labsdb1012
* 16:50 elukey: restart Hadoop HDFS namenodes on an-master100[1,2] to pick up new settings
* 16:40 urandom: bootstrap restbase1019-c - [[phab:T219404|T219404]]
* 16:28 elukey: restart nutcracker on mw2240 to pick up the new config (no more memcached settings)
* 16:26 bstorm_: [[phab:T212972|T212972]] updated all views on labsdb1009
* 16:17 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223166|T223166]] (duration: 00m 56s)
* 16:16 reedy@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/WikimediaEvents/: [[phab:T219128|T219128]] (duration: 01m 13s)
* 16:14 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/WikimediaEvents/: [[phab:T219128|T219128]] (duration: 01m 06s)
* 16:03 jynus: disable puppet on all production databases
* 15:21 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: [[phab:T222980|T222980]] (duration: 00m 57s)
* 14:28 andrewbogott: repooling labweb1002
* 14:16 andrewbogott: depooling labweb1002 to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509916/
* 14:15 godog: bootstrap restbase1019-b - [[phab:T219404|T219404]]
* 13:21 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-new/read-new on testwikis and mediawikiwiki ([[phab:T188327|T188327]]) (duration: 00m 57s)
* 12:22 Lucas_WMDE: EU SWAT done
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: SWAT: [[gerrit:510217{{!}}VisualEditorHooks: Use isVisualAvailable() when changing tabs/editsections]] + [[gerrit:510218{{!}}DesktopArticleTarget.init: Allow veaction=edit to override namespace settings (T221892)]] (duration: 01m 15s)
* 12:20 akosiaris: depool esams, network issues
* 11:47 akosiaris@deploy1001: scap-helm mathoid finished
* 11:47 akosiaris@deploy1001: scap-helm mathoid cluster staging completed
* 11:46 akosiaris@deploy1001: scap-helm mathoid upgrade --wait -f mathoid-staging-values.yaml staging stable/mathoid [namespace: mathoid, clusters: staging]
* 11:41 akosiaris@deploy1001: scap-helm citoid finished
* 11:41 akosiaris@deploy1001: scap-helm citoid cluster eqiad completed
* 11:41 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad]
* 11:32 akosiaris@deploy1001: scap-helm citoid finished
* 11:32 akosiaris@deploy1001: scap-helm citoid cluster codfw completed
* 11:31 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw]
* 11:31 godog: bootstrap restbase1019-a - [[phab:T219404|T219404]]
* 11:29 akosiaris: upgrade to statsd_exporter 0.9 for citoid [[phab:T220709|T220709]]
* 11:27 akosiaris@deploy1001: scap-helm citoid finished
* 11:27 akosiaris@deploy1001: scap-helm citoid cluster staging completed
* 11:27 akosiaris@deploy1001: scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging]
* 10:31 elukey: superset.wikimedia.org moved to analytics-tool1004 (Buster + python 3.7 + Superset 0.32 upgrade)
* 10:27 moritzm: installing linux 4.9.168-1+deb9u2 kernel on stretch hosts (no reboots, just installing the new package)
* 10:04 elukey@deploy1001: Finished deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency (duration: 00m 26s)
* 10:04 elukey@deploy1001: Started deploy [analytics/superset/deploy@9cdb9c5]: Superset 0.32 - update pyhive dependency
* 09:33 hashar: Disable CI castor cache system since the instance is being migrated. Some / most CI jobs might have failed for the last 20 minutes or so [[phab:T223148|T223148]]
* 08:45 elukey@deploy1001: Finished deploy [analytics/superset/deploy@31c2c30]: Superset 0.32 (duration: 00m 26s)
* 08:44 elukey@deploy1001: Started deploy [analytics/superset/deploy@31c2c30]: Superset 0.32
* 08:36 elukey: stop superset on analytics-tool1003 as prep step for the migration to the new host - [[phab:T212243|T212243]]
* 08:31 moritzm: rebooting mw2164
* 07:33 elukey: restart nutcracker on mw2245 to pick up config changes (removal of memcached config)
* 07:29 elukey: powercycle an-worker1094 (OEM event occurred, checking if temporary)
* 07:21 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove the php7 beta feature [[phab:T219128|T219128]] (duration: 00m 59s)
* 06:24 elukey: force remount of /mnt/hdfs on stat1007 - fuse hdfs stuck
* 01:40 eileen: process control updated - omnigroupmember.load re-enabled
* 01:39 eileen: civicrm revision changed from {{Gerrit|5024c968ed}} to {{Gerrit|4b6d569383}}, config revision is {{Gerrit|a099f13a55}}
 
== 2019-05-14 ==
* 20:44 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin (duration: 00m 07s)
* 20:43 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: adding logstash-filter-truncate plugin
* 20:41 herron@deploy1001: Finished deploy [logstash/plugins@7fb8843]: (no justification provided) (duration: 00m 01s)
* 20:41 herron@deploy1001: Started deploy [logstash/plugins@7fb8843]: (no justification provided)
* 20:13 chaomodus: restarting gerrit on cobalt to pick up metrics export changes
* 19:37 herron: adding logstash filter truncate plugin to prod logstash collectors
* 19:28 gehel: shutting down elastic2038 for memory replacement - [[phab:T217398|T217398]]
* 19:25 gehel: ban elastic2038 from elasticsearch cluster for memory replacement - [[phab:T217398|T217398]]
* 18:21 mutante: mwmaint1002 - deleting /root/home-mwmaint2001 to save space - confirmed we have bacula backups of home on mwmaint2001
* 17:55 mutante: elastic2029 - enable puppet agent - was disabled without reason and nobody seems to have logged in recently
* 17:54 mutante: elastic2038 - restart nagios-nrpe-server - attempt to fix "CHECK_NRPE STATE UNKNOWN" for a single check
* 17:32 mutante: contint1001 - mkdir /srv/zuul-logs ; mv /var/log/zuul/debug.log* /srv/zuul-logs/ to prevent CI running out of disk again ([[phab:T207707|T207707]])
* 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@881b22b]: Update chromium-render to {{Gerrit|8cc96e7}} make timeout handler more robust ([[phab:T217724|T217724]]) (duration: 02m 23s)
* 17:20 mbsantos@deploy1001: Started deploy [proton/deploy@881b22b]: Update chromium-render to {{Gerrit|8cc96e7}} make timeout handler more robust ([[phab:T217724|T217724]])
* 16:30 jynus: stop replication and start table recompression on labsdb1009 [[phab:T222978|T222978]]
* 16:22 godog: statsd_exporter 0.9 upgrade on thumbor - [[phab:T220709|T220709]]
* 16:04 gilles@deploy1001: Finished deploy [performance/coal@5a32eb2]: [[phab:T221401|T221401]] (duration: 00m 06s)
* 16:04 gilles@deploy1001: Started deploy [performance/coal@5a32eb2]: [[phab:T221401|T221401]]
* 15:56 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix [[phab:T223281|T223281]] (duration: 00m 55s)
* 15:51 jforrester@deploy1001: Synchronized php-1.34.0-wmf.5/extensions/VisualEditor/includes/ApiVisualEditor.php: Hot-deploy VE unset variable fix [[phab:T223281|T223281]] (duration: 00m 57s)
* 15:49 crusnov@deploy1001: Finished deploy [netbox/deploy@81059c6]: Deploy new reqs for reports (duration: 00m 55s)
* 15:49 crusnov@deploy1001: Started deploy [netbox/deploy@81059c6]: Deploy new reqs for reports
* 15:43 jynus: reload haproxy config @ dbproxy1010, dbproxy1011
* 15:38 XioNoX: re-activate bgp to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:33 XioNoX: deactivate bgp to telia on cr1-codfw - [[phab:T222967|T222967]]
* 15:19 papaul: shutting down elastic2038 for memory replacement
* 15:14 hashar: mw1263: scap pull
* 14:53 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.5
* 14:50 moritzm: rebooting mw1263 for kernel update
* 14:47 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 62m 47s)
* 14:07 _joe_: apt-get clean on mwmaint1002
* 13:44 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
* 13:44 godog: rearm keyholder on deploy and cumin hosts
* 13:27 hashar@deploy1001: Finished scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache (duration: 14m 39s)
* 13:12 hashar: train delay, I forgot to sync 1.34.0-wmf.5
* 13:12 hashar@deploy1001: Started scap: testwiki to 1.34.0-wmf.5 and rebuild l10n cache
* 12:37 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: Hot-deploy [[phab:T223023|T223023]] fix {{Gerrit|I1b35b28e42}} for mobile VE edit section switches (duration: 00m 54s)
* 12:10 moritzm: rebooting mw2164 for kernel update
* 11:33 hashar@deploy1001: Pruned MediaWiki: 1.33.0-wmf.24 (duration: 03m 20s)
* 11:30 hashar: Deleting 1.33.0-wmf.24 from deploy1001 # [[phab:T220730|T220730]]
* 11:28 kart_: EU-Mid day SWAT Done.
* 11:25 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[gerrit:508818{{!}}Decrease idwiki MT threshold for publishing]] ([[phab:T222782|T222782]]) (duration: 00m 51s)
* 11:23 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.33.0-wmf.23 (duration: 14m 31s)
* 11:23 jbond42: cumin1001 ~ % sudo cumin A:all '/usr/local/sbin/run-puppet-agent --failed-only'
* 11:18 jbond42: enable puppet issue fixed https://gerrit.wikimedia.org/r/c/operations/puppet/+/510131
* 11:12 ema: pool cp3036 reimaged to ATS [[phab:T222937|T222937]]
* 11:09 hashar: Deleting 1.33.0-wmf.23 from deploy1001 # [[phab:T220730|T220730]]
* 11:09 jbond42: disable puppet
* 10:58 hashar: scap prep 1.34.0-wmf.5 # [[phab:T220730|T220730]]
* 10:16 hashar: Cutting branches for 1.34.0-wmf.5
* 10:01 ema: depool cp3036 and reimage as upload_ats [[phab:T222937|T222937]]
* 09:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db2034 from config [[phab:T219493|T219493]] (duration: 00m 49s)
* 09:53 marostegui@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 09:52 marostegui: Remove db2034 from tendril and zarcillo - [[phab:T219493|T219493]]
* 09:51 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove db2034 from config [[phab:T219493|T219493]] (duration: 00m 50s)
* 09:34 jynus: restart apache on ununpentium
* 09:29 marostegui: Parsercache deployment window FINISHED
* 09:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy second parsercache key change everywhere after deploying it in batches first [[phab:T210725|T210725]] (duration: 00m 50s)
* 09:15 godog: statsd_exporter 0.9 upgrade on ores - [[phab:T220709|T220709]]
* 09:02 godog: statsd_exporter 0.9 upgrade on logstash - [[phab:T220709|T220709]]
* 08:53 jynus: failing connections over dbproxy1006 to dbproxy1001
* 07:48 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
* 06:45 ema: cp-ats: upgrade trafficserver to 8.0.3-1wm2
* 06:20 ema: cp4021: upgrade trafficserver to 8.0.3-1wm2
* 06:15 ema: upload trafficserver 8.0.3-1wm2 to stretch-wikimedia
* 06:02 marostegui: Deploy parsercache change to eqiad canaries - [[phab:T210725|T210725]]
* 06:01 marostegui: Lock wmf-config deployment on deploy1001 to slowly change parsercache key on eqiad - [[phab:T210725|T210725]]
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Change parsercache on codfw [[phab:T210725|T210725]] (duration: 00m 54s)
* 01:55 mutante: re-scheduled nginx / HTTP availability icinga checks
* 01:42 mutante: cumin -b 6 'R:git::clone' 'run-puppet-agent -q --failed-only'
* 01:37 mutante: restarting Gerrit to apply 2 config changes - disable DNS reverse lookup (gerrit:508127) & list projects from index (gerrit:508892) - removes blockers for 2.16 upgrade ([[phab:T200739|T200739]])
* 00:32 mutante: restarting wikibugs because it left some channels
 
== 2019-05-13 ==
* 20:29 ejegg: updated payments-wiki from {{Gerrit|6e0172bac3}} to {{Gerrit|8397ccf9cc}}
* 20:24 halfak@deploy1001: Finished deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]] (duration: 04m 16s)
* 20:20 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]]
* 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis (duration: 00m 03s)
* 20:19 ariel@deploy1001: Started deploy [dumps/dumps@941d374]: lbzip2 decompression for 7z file production for big wikis
* 20:04 halfak@deploy1001: Started deploy [ores/deploy@c17a1a2]: [[phab:T202202|T202202]]
* 18:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]] (duration: 00m 49s)
* 18:28 ejegg: updated SmashPig standalone deploy {{Gerrit|22b6982}} Try turning off WSDL caching for Adyen
* 18:25 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T222954|T222954]] (duration: 00m 49s)
* 18:19 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]] (duration: 00m 50s)
* 18:17 ottomata: re-enabling all eventgate-analytics monolog events - [[phab:T222962|T222962]]
* 18:12 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T223006|T223006]] [[phab:T222740|T222740]] [[phab:T222044|T222044]] (duration: 00m 49s)
* 18:07 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 18:07 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/eqiad-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
* 18:05 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:05 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 18:04 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 18:03 fsero: deleting eventgate-analytics-production releases on codfw
* 18:01 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:01 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 17:57 fsero: deleting eventgate-analytics and eventgate-analytics-staging releases on staging
* 17:41 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: retry - disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]] (duration: 00m 50s)
* 17:11 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]] (duration: 00m 50s)
* 17:10 ottomata: disabling all eventgate-analytics monolog events for eventgate chart migration - [[phab:T222962|T222962]]
* 16:14 Amir1: removing tokipona language terms from items using maintenance script ([[phab:T200432|T200432]])
* 16:00 andrewbogott: reimaging cloudvirt1024 (for the last time I hope)
* 14:33 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
* 14:32 otto@deploy1001: Synchronized wmf-config/LabsServices.php: no-op in prod - Configure eventgate services in beta (duration: 00m 49s)
* 14:05 moritzm: uploaded puppet 4.8.2-5+wmf1 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia  ([[phab:T219803|T219803]])
* 14:00 elukey: roll restart of aqs on aqs1* to pick up new druid settings
* 13:50 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-fe2*' 'run-puppet-agent'
* 13:46 moritzm: updating puppet on deployment-puppetmaster03 to 4.8.2-5+wmf1 ([[phab:T219803|T219803]])
* 13:39 akosiaris: bump eventgate-analytics chart to 0.0.36. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. [[phab:T220709|T220709]]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-staging-values.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 13:38 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 13:36 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on all wikis ([[phab:T188327|T188327]]) (duration: 00m 50s)
* 13:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -b8 'ms-be2*' 'run-puppet-agent'
* 13:29 cdanis: swift codfw-prod: deploy {{Gerrit|I1035824d}}
* 13:25 moritzm: uploaded puppetdb 4.4.0-1~wmf2 to component/puppetdb4 for apt.wikimedia.org/stretch-wikimedia  ([[phab:T219803|T219803]])
* 13:07 akosiaris: bump cxserver chart to 0.0.7. Renames nodejs GC stats to microseconds and bumps the biggest bucket to 100ms. [[phab:T220709|T220709]]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
* 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster eqiad completed
* 13:06 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 13:06 akosiaris@deploy1001: scap-helm cxserver finished
* 13:06 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
* 13:05 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 13:04 arturo: install libjs-jquery from stretch in cloudnet servers [[phab:T222862|T222862]]
* 13:03 arturo: enable puppet in cloudvirt1024 to refresh some apt config [[phab:T222862|T222862]]
* 12:50 moritzm: updating puppetdb on deployment-puppetdb02 to 4.4.0-1~wmf2 ([[phab:T219803|T219803]])
* 12:36 cdanis: root@ms-be2013.codfw.wmnet ~ # umount /srv/swift-storage/sda1 && mount /srv/swift-storage/sda1 && umount /srv/swift-storage/sdb1 && mount /srv/swift-storage/sdb1
* 12:36 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/resources/src/startup/startup.js: {{Gerrit|I76a2c8d52fa}} (duration: 00m 51s)
* 12:33 cdanis: root@ms-be2013.codfw.wmnet ~ # mount /srv/swift-storage/sdf1
* 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdl1 && sudo mount /srv/swift-storage/sdl1
* 12:25 cdanis: cdanis@ms-be2015.codfw.wmnet ~ % sudo umount /srv/swift-storage/sdf1 && sudo mount /srv/swift-storage/sdf1
* 12:18 cdanis: cdanis@ms-be2015.codfw.wmnet /var/log % sudo mount /srv/swift-storage/sda1
* 12:08 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/lib/includes/Formatters/CachingKartographerEmbeddingHandler.php: [[phab:T223085|T223085]] (duration: 00m 50s)
* 11:59 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/composer.json: [[phab:T215746|T215746]] (duration: 00m 49s)
* 11:58 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/vendor/: [[phab:T215746|T215746]] (duration: 01m 30s)
* 11:43 reedy@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/: [[phab:T222639|T222639]] (duration: 00m 52s)
* 11:04 ema: cp-ats rolling restart to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/509456/
* 10:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/includes/http/HttpRequestFactory.php: [[phab:T222935|T222935]] Hot-deploy fix for HttpRequestFactory (duration: 00m 50s)
* 10:38 jbond42: update puppet5 and facter3 in eqiad
* 10:17 vgutierrez: rebooting cloudvirt1024 - [[phab:T209707|T209707]]
* 09:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1064 [[phab:T217396|T217396]] (duration: 00m 49s)
* 09:33 hashar: Upgrading Zuul 2.5.1-wmf7 -> 2.5.1-wmf9 [[phab:T105474|T105474]]
* 07:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully pool db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 50s)
* 07:08 elukey: slow roll restart of celery on ores* nodes to allow cores to be generated upon segfault - [[phab:T222866|T222866]]
* 07:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 50s)
* 06:53 moritzm: installing ghostscript security updates
* 06:44 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 49s)
* 06:09 marostegui: Compress s2, s6 and s7 on labsdb1012 - [[phab:T222978|T222978]]
* 05:50 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic for db1130 (s5) and db1138 (s4) [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:41 marostegui: Optimize tables on pc2007
* 05:18 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1130 into s5 and db1138 into s4 [[phab:T222682|T222682]] (duration: 00m 49s)
* 05:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1130 into s5 and db1138 into s4 [[phab:T222682|T222682]] (duration: 00m 51s)
 
== 2019-05-12 ==
* 15:32 elukey: rollback python-kafka on eventlog1002 to 1.4.1-1~stretch1 - [[phab:T222941|T222941]]
* 12:14 elukey: restart eventlogging on eventlog1002 - all processors stuck due to kafka python ([[phab:T222941|T222941]])
* 05:31 marostegui: Disable notifications for db1116:s8 Slave LAG check as this is a snapshot source
 
== 2019-05-11 ==
* 18:26 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 57s)
* 06:37 elukey: restart eventlogging on eventlog1002 - huge kafka consumer lag accumulated ([[phab:T222941|T222941]])
* 02:01 mutante: actinium - low disk space - apt-get clean - gzip /var/log/squid3/access.log.1
 
== 2019-05-10 ==
* 18:58 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
* 18:51 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
* 18:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'enable-puppet "Puppet breakages on all hosts -- cdanis"'
* 18:39 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin '*' 'disable-puppet "Puppet breakages on all hosts -- cdanis"'
* 16:50 reedy@deploy1001: Synchronized dblists/: Update size related dblists (duration: 00m 49s)
* 16:31 ebernhardson: drop archive indices from cloudelastic
* 16:11 ariel@deploy1001: Finished deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run (duration: 00m 05s)
* 16:11 ariel@deploy1001: Started deploy [dumps/dumps@70e8498]: look for dumpstatus json file per wiki run
* 16:05 ejegg: moved adyen smashpig job runner to frdev1001
* 15:25 _joe_: wiped opcache clean on all api, appservers
* 15:05 cdanis: cdanis@mw1239.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 15:05 Krinkle: fix opcache krinkle@mw1268:~$ scap pull
* 15:04 cdanis: cdanis@mw1268.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 15:03 Krinkle: ran 'scap pull' on mw1239.eqiad.wmnet to fix opcache corruption
* 14:56 jbond42: uploaded zuul_2.5.10-wmf9 to jessie-wikimedia
* 14:54 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T99740|T99740]] / {{Gerrit|d9dbecad9c7b}} (duration: 00m 51s)
* 14:33 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 14:32 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f lala.yaml staging stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-eqiad-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics finished
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 14:30 akosiaris@deploy1001: scap-helm eventgate-analytics upgrade -f eventgate-analytics-codfw-values.yaml production stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 13:30 ema: pool cp3038 w/ ATS backend [[phab:T222937|T222937]]
* 12:19 ema: depool cp3038 and reimage as upload_ats [[phab:T222937|T222937]]
* 11:52 jbond42: (un)load edac kernel modules on elastic1029 to test resetting counters
* 11:04 jbond42: restart refinery-eventlogging-saltrotate on an-coord1001
* 10:30 moritzm: installing symfony security updates
* 09:17 jynus: disabling replication lag alerts for backup source hosts on s1, s4, s8 [[phab:T206203|T206203]]
* 07:14 moritzm: uploaded linux-meta 1.21 for jessie-wikimedia (pointing to the new -9 ABI introduced with the 4.9.168 kernel)
* 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1100 into API (duration: 00m 50s)
* 06:55 ema: swift-fe: rolling restart to enable ensure_max_age [[phab:T222937|T222937]]
* 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 into API (duration: 00m 50s)
* 06:27 ema: ms-fe1005: pool with ensure_max_age [[phab:T222937|T222937]]
* 06:26 ariel@deploy1001: Finished deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis (duration: 00m 05s)
* 06:26 ariel@deploy1001: Started deploy [dumps/dumps@6f9a5a4]: remove sleep between incr dumps of wikis
* 06:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1100 (duration: 00m 50s)
* 06:18 ema: ms-fe1005: depool and test ensure_max_age [[phab:T222937|T222937]]
* 06:09 _joe_: depooling mw1261 for tests
* 05:41 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2105 db2109 into s3 [[phab:T222772|T222772]] (duration: 00m 49s)
* 05:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2105 db2109 into s3 [[phab:T222772|T222772]] (duration: 00m 52s)
* 05:40 elukey: execute kafka preferred-replica-election on kafka-jumbo1001 as an attempt to rebalance traffic (1002 seems to have been handling far more than the others for some days)
* 05:32 elukey: restart eventlogging daemons on eventlog1002 - kafka consumer errors in the logs, some lag built over time
* 05:08 marostegui: Stop MySQL on db1100
* 05:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 (duration: 00m 50s)
* 04:56 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2112 (duration: 00m 51s)
* 00:15 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for [[phab:T222471|T222471]] (duration: 00m 37s)
* 00:14 smalyshev@deploy1001: Started deploy [wdqs/wdqs@e13facb]: Downgrade LDF server back for [[phab:T222471|T222471]]
 
== 2019-05-09 ==
* 23:52 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]]: Dont write to private wikis on cloudelastic (duration: 00m 50s)
* 23:48 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: [[phab:T220819|T220819]] Uniquely identify connections in connection pool (duration: 00m 58s)
* 23:43 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/: [[phab:T220625|T220625]] Limit the clusters archive index is written to (duration: 00m 59s)
* 23:41 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/Wikibase/view/resources/jquery/wikibase/jquery.wikibase.entityselector.js: [[phab:T172937|T172937]] [[phab:T222346|T222346]] Revert Close entityselector after selecting exact match (duration: 00m 51s)
* 23:24 chaomodus: spicerack upgraded to 0.0.25 on cumin1001 and cumin2001
* 22:58 volans: uploaded spicerack_0.0.25-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 22:57 bawolff: Manually cleared extdistributor cache [[phab:T188692|T188692]]
* 22:50 mutante: labweb1001/labweb1002 - remove "runJob" cron job from www-data's crontab, it is already also a systemd timer and puppet was meant to remove it ([[phab:T222917|T222917]])
* 21:27 foks: change user email for Melamrawy (WMF)@global
* 21:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikipediaAppCaptionEditCounter ([[phab:T222211|T222211]]) (duration: 00m 52s)
* 19:56 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.4
* 19:28 XioNoX: renumber mr1-esams<->cr2-knams link to 91.198.174.224/31 - [[phab:T211254|T211254]]
* 19:24 XioNoX: renumber mr1-esams<->cr1-esams link to 91.198.174.240/31 - [[phab:T211254|T211254]]
* 18:22 XioNoX: simplify filter analytics-in4 term mysql-dbstore on cr1/2-eqiad
* 16:17 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight on db1084 (duration: 00m 59s)
* 16:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to  db1081 (duration: 01m 13s)
* 15:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to  db1081 (duration: 01m 01s)
* 15:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 01m 00s)
* 15:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2112 (duration: 00m 59s)
* 15:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1081 (duration: 00m 56s)
* 15:20 marostegui: Stop mysql on db2112 for onsite work
* 15:16 otto@deploy1001: scap-helm eventgate-main finished
* 15:16 otto@deploy1001: scap-helm eventgate-main cluster eqiad completed
* 15:16 otto@deploy1001: scap-helm eventgate-main install -n main -f main/eqiad-values.yaml stable/eventgate [namespace: eventgate-main, clusters: eqiad]
* 15:13 otto@deploy1001: scap-helm eventgate-main finished
* 15:13 otto@deploy1001: scap-helm eventgate-main cluster codfw completed
* 15:13 otto@deploy1001: scap-helm eventgate-main install -n main -f main/codfw-values.yaml stable/eventgate [namespace: eventgate-main, clusters: codfw]
* 15:12 papaul: shutting down db2114 for main board replacement
* 14:53 otto@deploy1001: scap-helm eventgate-main finished
* 14:52 otto@deploy1001: scap-helm eventgate-main cluster staging completed
* 14:52 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
* 14:48 moritzm: removing unused uwsgi packages from scb* hosts
* 14:13 otto@deploy1001: scap-helm eventgate-main finished
* 14:13 otto@deploy1001: scap-helm eventgate-main cluster staging completed
* 14:13 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
* 13:34 bblack: recdns: wiping dyna.wikimedia.org from pdns-recursors
* 13:13 fsero: running authdns-update for new docker-registry  [[phab:T221101|T221101]]
* 12:49 fsero: switching traffic from old-registry  to new registries registry[12]00[12] - [[phab:T221101|T221101]]
* 12:01 _joe_: reenabling puppet across the fleet
* 11:57 jbond42: all puppetmasters and puppetdbs should be restored
* 11:55 jbond42: clean up old source files sudo cumin A:puppetmaster 'rm /etc/apt/sources.list.d/component-facter3.list /etc/apt/sources.list.d/component-puppet5.list'
* 11:49 volans: updated netbox statuses for decommissioning and spare hosts according to [[phab:T222352|T222352]]
* 11:23 jbond42: running sudo apt-get install puppet-master=4.8.2-5~bpo8+1 puppet-master-passenger=4.8.2-5~bpo8+1 on labtestpuppetmaster2001
* 11:19 jbond42: running  sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger on labpuppetmaster1001
* 11:18 jbond42: starting puppetdb on puppetdb2001
* 11:15 jbond42: run sudo apt-get install puppetdb on puppetdb2001
* 11:14 jbond42: ran the following on puppetdb2001: sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5
* 11:14 jbond42: ran the following on puppetmaster200{1,2}: sudo apt-get install facter=2.4.6-1 puppet=4.8.2-5 puppet-master puppet-master-passenger
* 11:04 _joe_: disabling puppet across the fleet
* 11:02 volans: stopped ircecho to avoid spam
* 10:43 marostegui: Stop MySQL on db1081
* 10:41 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 57s)
* 10:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give API traffic to db1129 (new host on s2) (duration: 00m 57s)
* 10:15 _joe_: restarting low-traffic pybals in codfw, eqiad
* 10:05 akosiaris: restart proton on proton1001. Host Out of memory [[phab:T214975|T214975]]
* 09:57 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry) (duration: 00m 06s)
* 09:57 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (retry)
* 09:54 ariel@deploy1001: Finished deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more (duration: 00m 06s)
* 09:54 ariel@deploy1001: Started deploy [dumps/dumps@ab56fdd]: reduce sleep time between dumps of adds-changes wikis still more
* 09:31 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1129 (new host on s2) (duration: 00m 57s)
* 09:29 fsero@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=docker-registry,name=codfw
* 09:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
* 09:12 godog: bounce rsyslog on lithium
* 09:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
* 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 56s)
* 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 and db1129 (new host on s2) (duration: 00m 57s)
* 08:28 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1076 (duration: 00m 55s)
* 08:23 elukey: upload uwsgi 2.0.14+20161117-3+deb9u2+wmf1 packages to stretch-wikimedia - [[phab:T212697|T212697]]
* 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1129 with low weight on s2 - [[phab:T222682|T222682]] (duration: 00m 56s)
* 08:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1076 (duration: 00m 56s)
* 08:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Provision db1129, db2104, db2107, db2108 [[phab:T222772|T222772]] [[phab:T222682|T222682]] (duration: 00m 57s)
* 08:06 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Provision db1129, db2104, db2107, db2108 [[phab:T222772|T222772]] [[phab:T222682|T222682]] (duration: 00m 59s)
* 07:54 moritzm: installing jquery security updates for stretch
* 07:50 elukey: roll restart HDFS masters on an-master100[1,2] to pick up new logging settings
* 07:23 moritzm: installing twitter-bootstrap3 security updates
* 06:53 _joe_: restarted nagios-nrpe-server on proton1001
* 05:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify disk status for db2103, db2112, db2116 (duration: 00m 58s)
* 05:29 marostegui: Stop replication on db2098:s2
* 05:25 marostegui: Stop MySQL on db1076
* 05:22 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1076 (duration: 00m 57s)
* 05:09 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2103, db2112 and db2116 into s1 [[phab:T222772|T222772]] (duration: 01m 41s)
* 05:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2103, db2112 and db2116 into s1 [[phab:T222772|T222772]] (duration: 01m 22s)
* 04:55 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 59s)
* 00:57 twentyafterfour: stopped phd, now running `puppet agent --test` manually on phab1001
* 00:08 twentyafterfour: phabricator upgrade successful
* 00:04 twentyafterfour: starting phabricator deployment, momentary downtime expected (~1 minute)
 
== 2019-05-08 ==
* 23:06 krinkle@deploy1001: Synchronized php-1.34.0-wmf.3/includes/specials/SpecialWatchlist.php: [[phab:T218511|T218511]] / {{Gerrit|I42387498dff0b1}} (duration: 00m 57s)
* 23:00 krinkle@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/CirrusSearch/includes/Hooks.php: [[phab:T219342|T219342]] / {{Gerrit|164a7c135c800cf73f7fbfc}} (duration: 00m 59s)
* 22:20 ejegg: re-enabled fundraising jobs
* 22:15 ejegg: updated SmashPig standalone install from {{Gerrit|78b92b7fef}} to {{Gerrit|88fd9650ec}}
* 22:14 ejegg: disabled fundraising jobs for SmashPig update
* 22:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wmgUseAdvancedSearch, no longer read; drop rcenhancedfilters from BF whitelist (duration: 00m 57s)
* 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Unconditionally load AdvancedSearch everywhere, the config is always true (duration: 00m 57s)
* 22:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Beta Feature config cleanup: doc change plus drop advancedsearch and templatewizard-betafeature (duration: 00m 57s)
* 21:58 jforrester@deploy1001: Synchronized php-1.34.0-wmf.4/extensions/VisualEditor/includes/ApiVisualEditor.php: UBN [[phab:T209599|T209599]] ApiVisualEditor: Fix use of getBlockInfo() (duration: 00m 57s)
* 21:52 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/tests/phpunit/: Fix Block::newLoad for IPv6 range blocks - follow-up to {{Gerrit|Ie8bebd8}} [[phab:T222246|T222246]] (duration: 01m 09s)
* 21:50 niharika29@deploy1001: Synchronized php-1.34.0-wmf.4/includes/Block.php: Fix Block::newLoad for IPv6 range blocks - follow-up to {{Gerrit|Ie8bebd8}} [[phab:T222246|T222246]] (duration: 00m 59s)
* 21:49 niharika29@deploy1001: sync aborted: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to {{Gerrit|Ie8bebd8}} [[phab:T222246|T222246]] (duration: 00m 03s)
* 21:49 niharika29@deploy1001: Started scap: php-1.34.0-wmf.4/includes/Block.php Fix Block::newLoad for IPv6 range blocks - follow-up to {{Gerrit|Ie8bebd8}} [[phab:T222246|T222246]]
* 20:12 thcipriani: restarting gerrit due to threads stuck behind sendemail
* 20:10 gehel: upgrade to nodejs 10 for maps completed - [[phab:T210704|T210704]]
* 20:08 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 ([[phab:T215852|T215852]]) (duration: 00m 20s)
* 20:08 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps1001 ([[phab:T215852|T215852]])
* 20:07 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 ([[phab:T215852|T215852]]) (duration: 00m 24s)
* 20:07 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps1001 ([[phab:T215852|T215852]])
* 19:58 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 ([[phab:T215852|T215852]]) (duration: 00m 58s)
* 19:57 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]004 ([[phab:T215852|T215852]])
* 19:56 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 ([[phab:T215852|T215852]]) (duration: 00m 59s)
* 19:55 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]004 ([[phab:T215852|T215852]])
* 19:47 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 ([[phab:T215852|T215852]]) (duration: 00m 54s)
* 19:46 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps[12]003 ([[phab:T215852|T215852]])
* 19:46 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 ([[phab:T215852|T215852]]) (duration: 00m 56s)
* 19:45 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps[12]003 ([[phab:T215852|T215852]])
* 19:35 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 ([[phab:T215852|T215852]]) (duration: 01m 12s)
* 19:33 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator/kartotherian node 10 build into maps[12]002 ([[phab:T215852|T215852]])
* 19:32 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 ([[phab:T215852|T215852]]) (duration: 00m 57s)
* 19:31 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy tilerator node 10 build into maps[12]002 ([[phab:T215852|T215852]])
* 19:26 gehel: continue upgrade to nodejs 10 for maps - [[phab:T210704|T210704]]
* 19:22 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.4 (duration: 01m 48s)
* 19:21 cdanis: swift codfw-prod: deploy {{Gerrit|I59c88aed}} [[phab:T221068|T221068]]
* 19:20 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.4
* 19:01 cdanis: [[phab:T221904|T221904]] cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be2*[4,7].codfw.wmnet' 'for DISK in /sys/block/sd*/queue/scheduler ; do echo cfq > $DISK ; done'
* 18:09 mutante: restarting gerrit to apply logging changes (gerrit:508391)
* 17:58 bblack: public authdns: deploying the big DYNA/CNAME change in https://gerrit.wikimedia.org/r/c/operations/dns/+/507399
* 17:44 jforrester@deploy1001: Synchronized wmf-config/extension-list: Re-sort extension-list (prod no-op) (duration: 00m 56s)
* 17:42 jforrester@deploy1001: Synchronized wmf-config/env.php: Clean-up: Allow for running outside the cluster for local testing (no-op for prod) (duration: 00m 56s)
* 17:22 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Retry: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
* 17:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WikimediaEditorTasks on Beta commonswiki (duration: 00m 57s)
* 16:55 otto@deploy1001: scap-helm eventgate-main finished
* 16:55 otto@deploy1001: scap-helm eventgate-main cluster staging completed
* 16:55 otto@deploy1001: scap-helm eventgate-main upgrade main -f main/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-main, clusters: staging]
* 16:08 gehel: restart tileratorui on maps2001 - [[phab:T222801|T222801]]
* 15:59 jynus: restart db2117 after first puppet run
* 15:56 mforns@deploy1001: Finished deploy [analytics/refinery@698f213]: deploying analytics-refinery up to {{Gerrit|698f2137aa965b07548ae7565aafaa784628b13c}} with source=v0.0.89 (duration: 15m 38s)
* 15:52 gehel: reset authentication on cassandra / maps / codfw - [[phab:T222801|T222801]]
* 15:40 mforns@deploy1001: Started deploy [analytics/refinery@698f213]: deploying analytics-refinery up to {{Gerrit|698f2137aa965b07548ae7565aafaa784628b13c}} with source=v0.0.89
* 15:19 moritzm: installing ruby-i18n security updates
* 15:14 moritzm: installing rails security updates
* 15:04 XioNoX: fix typo on asw2-ulsfo<->cr2-ulsfo interface (Xlink2 instead of Xlink1)
* 14:21 otto@deploy1001: scap-helm eventgate-main finished
* 14:21 otto@deploy1001: scap-helm eventgate-main cluster staging completed
* 14:21 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 14:18 mbsantos@deploy1001: Finished deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 ([[phab:T215852|T215852]]) (duration: 00m 27s)
* 14:17 mbsantos@deploy1001: Started deploy [tilerator/deploy@2736a69] (stretch): Deploy tilerator node 10 build into maps2001 ([[phab:T215852|T215852]])
* 14:14 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 ([[phab:T215852|T215852]]) (duration: 00m 27s)
* 14:14 mbsantos@deploy1001: Started deploy [kartotherian/deploy@7774721] (stretch): Deploy kartotherian node 10 build into maps2001 ([[phab:T215852|T215852]])
* 14:05 fsero@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 14:03 gehel: starting upgrade to nodejs 10 for maps - [[phab:T210704|T210704]]
* 13:50 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 13:18 ema: cp3035: restart varnish-be
* 12:07 kart_: EU-Midday SWAT done.
* 12:06 kartik@deploy1001: Synchronized php-1.34.0-wmf.3: SWAT: [[gerrit:508559{{!}}Log warning and show error on empty username (T222529)]] (duration: 07m 29s)
* 11:56 akosiaris@deploy1001: scap-helm cxserver finished
* 11:56 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
* 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 11:56 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml staging stable/cxserver [namespace: cxserver, clusters: codfw]
* 11:54 akosiaris: bump prometheus-statsd-exporter for cxserver to 0.0.5 [[phab:T220709|T220709]]
* 11:54 akosiaris@deploy1001: scap-helm cxserver finished
* 11:54 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
* 11:54 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[gerrit:495677{{!}}Add publish restrictions config for enwiki]] ([[phab:T217237|T217237]]) (duration: 00m 58s)
* 11:06 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - [[phab:T219148|T219148]] (duration: 01m 30s)
* 11:05 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@abd7fdc]: Prepare the config to allow jobs to be switched to PHP7 individually - [[phab:T219148|T219148]]
* 10:17 _joe_: restarted pybal on lvs1016 to pick up changes for [[phab:T222705|T222705]]
* 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 [[phab:T222682|T222682]] (duration: 00m 57s)
* 09:51 _joe_: restarted proton on proton1001
* 09:50 _joe_: restarted pybal on lvs1006 to pick up changes for [[phab:T222705|T222705]]
* 09:49 _joe_: restarted pybal on lvs2003 to pick up changes for [[phab:T222705|T222705]]
* 09:45 marostegui: Stop replication on db2097:3311
* 09:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give more traffic to db1131 in s6 [[phab:T222682|T222682]] (duration: 01m 07s)
* 09:26 _joe_: restarting pybal on lvs2006 to pick up changes for [[phab:T222705|T222705]] (3/3)
* 09:24 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon2001 to test a uwsgi bug fix - [[phab:T212697|T212697]]
* 09:12 _joe_: restarting pybal on lvs2006 to pick up changes for [[phab:T222705|T222705]] (2/3)
* 08:57 _joe_: restarting pybal on lvs2006 to pick up changes for [[phab:T222705|T222705]]
* 08:56 godog: upload prometheus-statsd-exporter 0.9.0+ds1-1 to stretch-wikimedia [[phab:T220709|T220709]]
* 08:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1131 into s6 with low weight [[phab:T222682|T222682]] (duration: 00m 51s)
* 08:48 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1131 into s6 with low weight [[phab:T222682|T222682]] (duration: 00m 53s)
* 08:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1093 (duration: 00m 58s)
* 08:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1093 (duration: 00m 58s)
* 07:49 marostegui: Stop replication s1 on db2102
* 07:45 elukey: install uwsgi-core_2.0.14+20161117-3+deb9u2+wmf1 on netmon1002 to test a uwsgi bug fix - [[phab:T212697|T212697]]
* 07:42 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 57s)
* 07:41 vgutierrez: upgrading pybal to version 1.15.6 in lvs1001 - [[phab:T222705|T222705]]
* 07:40 godog: bounce prometheus on bast3002 to finalize migration
* 07:37 vgutierrez: upgrading pybal to version 1.15.6 in lvs1004 - [[phab:T222705|T222705]]
* 07:33 vgutierrez: upgrading pybal to version 1.15.6 in lvs1002 - [[phab:T222705|T222705]]
* 07:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db2115 into x1 [[phab:T222772|T222772]] (duration: 00m 56s)
* 07:32 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db2115 into x1 [[phab:T222772|T222772]] (duration: 01m 09s)
* 07:26 vgutierrez: upgrading pybal to version 1.15.6 in lvs1005 - [[phab:T222705|T222705]]
* 07:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 56s)
* 07:21 vgutierrez: upgrading pybal to version 1.15.6 in lvs1016 - [[phab:T222705|T222705]]
* 07:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool db1127 and db1137 into x1 [[phab:T222682|T222682]] (duration: 00m 56s)
* 07:14 vgutierrez: upgrading pybal to version 1.15.6 in lvs1006 - [[phab:T222705|T222705]]
* 07:13 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Pool db1127 and db1137 into x1 [[phab:T222682|T222682]] (duration: 01m 03s)
* 07:04 vgutierrez: upgrading pybal to version 1.15.6 in lvs2001 - [[phab:T222705|T222705]]
* 07:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs2004 - [[phab:T222705|T222705]]
* 06:58 vgutierrez: upgrading pybal to version 1.15.6 in lvs2002 - [[phab:T222705|T222705]]
* 06:51 vgutierrez: upgrading pybal to version 1.15.6 in lvs2005 - [[phab:T222705|T222705]]
* 06:42 vgutierrez: upgrading pybal to version 1.15.6 in lvs2003 - [[phab:T222705|T222705]]
* 06:36 vgutierrez: upgrading pybal to version 1.15.6 in lvs3001 - [[phab:T222705|T222705]]
* 06:32 vgutierrez: upgrading pybal to version 1.15.6 in lvs3003 - [[phab:T222705|T222705]]
* 06:29 elukey: restart uwsgi-netbox on netmon1002 after the daily segfault (upon restart)
* 06:29 vgutierrez: upgrading pybal to version 1.15.6 in lvs3002 - [[phab:T222705|T222705]]
* 06:24 vgutierrez: upgrading pybal to version 1.15.6 in lvs3004 - [[phab:T222705|T222705]]
* 06:20 marostegui: Stop MySQL on db2096
* 06:19 vgutierrez: upgrading pybal to version 1.15.6 in lvs4005 - [[phab:T222705|T222705]]
* 06:16 vgutierrez: upgrading pybal to version 1.15.6 in lvs4006 - [[phab:T222705|T222705]]
* 06:13 vgutierrez: upgrading pybal to version 1.15.6 in lvs4007 - [[phab:T222705|T222705]]
* 06:07 vgutierrez: upgrading pybal to version 1.15.6 in lvs5001 - [[phab:T222705|T222705]]
* 06:02 vgutierrez: upgrading pybal to version 1.15.6 in lvs5002 - [[phab:T222705|T222705]]
* 05:59 vgutierrez: upgrading pybal to version 1.15.6 in lvs5003 - [[phab:T222705|T222705]]
* 05:48 vgutierrez: upgrading pybal to version 1.15.6 in lvs2006 - [[phab:T222705|T222705]]
* 05:25 marostegui: Stop MySQL on db1093
* 05:01 marostegui: Optimize tables on pc1007
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 00m 59s)
 
== 2019-05-07 ==
* 23:31 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [[phab:T220625|T220625]] Configure wgCirrusSearchPrivateClusters (duration: 00m 58s)
* 22:06 ppchelko@deploy1001: Finished deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested [[phab:T215956|T215956]] (duration: 18m 12s)
* 21:48 ppchelko@deploy1001: Started deploy [restbase/deploy@8f5859f]: Do not cache html if stash was requested [[phab:T215956|T215956]]
* 21:47 ppchelko@deploy1001: deploy aborted: Do not cache html if stash was requested [[phab:T215956|T215956]] (duration: 00m 12s)
* 21:47 ppchelko@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Do not cache html if stash was requested [[phab:T215956|T215956]]
* 21:46 mutante: deploy1001 - re-enabled puppet - deployment can go ahead
* 21:06 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -p80 -b10 'C:profile::mediawiki::php and *.codfw.wmnet' 'run-puppet-agent' 'systemctl reload php7.2-fpm.service'
* 20:43 mutante: gerrit2001 - restarting apache.. failed
* 20:38 ejegg: updated payments-wiki from {{Gerrit|558427f731}} to {{Gerrit|6e0172bac3}}
* 20:31 mutante: gerrit2001 - temp disabling puppet - testing apache rewrites for [[phab:T218844|T218844]] on non-prod host
* 20:14 mutante: deploy1001 - temp disabled puppet - debugging issue with apache-fast-test script
* 19:52 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.4
* 19:42 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache (duration: 28m 55s)
* 19:13 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.4 and rebuild l10n cache
* 19:04 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.22 (duration: 02m 15s)
* 18:50 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.21 (duration: 08m 48s)
* 18:38 mutante: LDAP - adding awight to 'wmde' group ([[phab:T222538|T222538]])
* 18:08 mutante: restarting icinga via web UI button
* 17:45 thcipriani: starting branchcut for train (1.34.0-wmf.4)
* 17:31 arturo: rebooting cloudvirt1024 to test interfaces configuration
* 16:59 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 16:39 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 16:38 arturo: rebooting cloudvirt1024 to test interfaces configuration
* 16:05 fsero: created eventgate-main tokens - [[phab:T218346|T218346]]
* 16:05 fsero: created eventgate-main tokens
* 15:47 fsero: creating eventgate-main namespace on k8s clusters
* 15:38 vgutierrez: uploaded pybal 1.15.6 to apt.wikimedia.org (stretch && jessie)
* 15:21 ebernhardson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/CirrusSearch/maintenance/forceSearchIndex.php: [[phab:T222641|T222641]]: Cirrus maint script handle ancient logging rows (duration: 00m 52s)
* 14:53 cdanis: pool mw1271
* 14:53 cdanis: pool mw1256
* 14:44 cdanis: cdanis@mw1256.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 14:43 cdanis: cdanis@mw1271.eqiad.wmnet ~ % sudo php7adm /opcache-free
* 14:40 vgutierrez: uploaded pybal 1.15.5 to apt.wikimedia.org (stretch && jessie)
* 14:26 _joe_: repooling mw1320
* 14:25 _joe_: resetting opcache on mw1320
* 14:13 vgutierrez: uploaded pybal 1.15.4 to apt.wikimedia.org (stretch)
* 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1256.eqiad.wmnet
* 14:12 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
* 14:09 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1320.eqiad.wmnet
* 14:09 cdanis: depool mw1320
* 14:07 otto@deploy1001: scap-helm eventgate-analytics finished
* 14:07 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 14:07 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/eqiad-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: eqiad]
* 14:02 otto@deploy1001: scap-helm eventgate-analytics finished
* 14:02 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 14:02 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 14:01 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 14:01 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 13:59 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/codfw-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 13:58 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 13:57 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/codfw-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: codfw]
* 13:50 vgutierrez: uploaded prometheus-trafficserver-exporter 0.2.3 to apt.wikimedia.org (stretch) - [[phab:T221217|T221217]]
* 13:45 marostegui: Stop MySQL and poweroff db1093 for BBU replacement - [[phab:T222127|T222127]]
* 13:45 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093 for BBU replacement [[phab:T222127|T222127]] (duration: 00m 51s)
* 13:37 otto@deploy1001: scap-helm eventgate-analytics finished
* 13:37 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 13:37 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 13:17 cdanis: [[phab:T221904|T221904]] cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be1*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
* 13:08 ema: sudo ipmitool -I lanplus -H cp2009.mgmt.codfw.wmnet -U root mc reset cold [[phab:T222459|T222459]]
* 13:07 ema: sudo ipmitool -I lanplus -H "cp2009.mgmt.codfw.wmnet" -U root -E chassis power cycle [[phab:T222459|T222459]]
* 13:02 cdanis: [[phab:T221904|T221904]] cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -p95 -m async -b5 'ms-be2*' 'run-puppet-agent -q' 'systemctl restart swift-object-replicator' 'systemctl restart swift-object-auditor'
* 12:45 jynus: remove dbstore1001, dbstore2001, dbstore2002 from tendril and zarcillo [[phab:T220002|T220002]]
* 12:09 marostegui: Stop Replication on db1140:3320 to provision db1127 and db1137 [[phab:T222682|T222682]]
* 11:16 hashar: Downgraded Zuul back to 2.5.1-wmf7 # [[phab:T105474|T105474]] [[phab:T140297|T140297]]
* 11:08 hashar: Upgraded Zuul and it is broken. So downgrading back :-(
* 10:51 hashar: Gracefully stopping Zuul for upgrade
* 10:46 mlitn@deploy1001: Finished scap: SDC: Enable Depicts in UploadWizard on Commons (duration: 22m 45s)
* 10:40 ema: libvmod-uuid 1.4-1 uploaded to stretch-wikimedia [[phab:T221977|T221977]]
* 10:23 mlitn@deploy1001: Started scap: SDC: Enable Depicts in UploadWizard on Commons
* 10:16 hashar: contint1001: upgrading python-pbr from 0.8.2-1 to 1.10.0-1, no longer needed with recent Zuul # [[phab:T218559|T218559]]
* 10:16 hashar: contint1001, contint2002: rm /etc/apt/preferences.d/python_pbr.pref /etc/apt/preferences.d/python-pbr.pref # [[phab:T218559|T218559]]
* 10:08 jbond42: upload zuul_2.5.1-wmf8 package to jessie-wikimedia
* 09:51 godog: test statsd-exporter 0.9 upgrade on deployment-imagescaler03 - [[phab:T220709|T220709]]
* 09:47 jbond42: restart pdfrender on scb1004 - [[phab:T174916|T174916]]
* 08:51 arturo: [[phab:T222685|T222685]] remove facter from jessie-wikimedia/openstack-mitaka-jessie
* 08:39 ema: repool cp1083 [[phab:T222620|T222620]]
* 07:59 moritzm: updating base-files from recent stretch point release
* 07:51 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - [[phab:T216636|T216636]] (duration: 24m 46s)
* 07:27 godog: upgrade prometheus on bast3002 - [[phab:T187987|T187987]]
* 07:26 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c]: Remove section functionality from the REST API - [[phab:T216636|T216636]]
* 07:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API (duration: 03m 02s)
* 07:21 marostegui: Optimize tables on pc1010
* 07:18 mobrovac@deploy1001: Started deploy [restbase/deploy@d91ee4c] (dev-cluster): Remove section functionality from the REST API
* 06:59 moritzm: updating firmware-bnx2x (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2x firmware)
* 06:44 elukey: restart uwsgi-netbox on netmon1002 after segfault
* 05:23 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2045 to codfw x1 master [[phab:T219493|T219493]] (duration: 00m 55s)
* 05:12 marostegui: Change topology on x1 codfw to promote db2045 to master [[phab:T219493|T219493]]
* 02:12 tstarling@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Use Preprocessor_Hash unconditionally (duration: 00m 52s)
* 00:53 mutante: install2002 - disabling puppet, live hacking DHCP config for db2103 to not serve installer via http to debug install issue for [[phab:T221532|T221532]] which seems like [[phab:T190424|T190424]]#4548003
* 00:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.ArticleTarget.js: Hot-deploy fix for visual diffs on mobile in non-section mode [[phab:T222489|T222489]] (duration: 00m 53s)
* 00:32 ejegg: disabled fundraising scheduled jobs for CiviCRM maintenance
 
== 2019-05-06 ==
* 23:25 maxsem@deploy1001: Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/503546/ (duration: 00m 50s)
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync. (duration: 03m 53s)
* 22:43 RoanKattouw: Running refreshMessageBlobs.php on all wikis for [[phab:T222539|T222539]]
* 22:42 crusnov@deploy1001: Started deploy [netbox/deploy@0061190]: Deploy new version of ganeti-netbox sync.
* 21:59 mutante: LDAP - remove 'sukhe' from 'nda' and add to 'wmf' instead ([[phab:T221990|T221990]])
* 21:24 cdanis: experimenting with different disk scheduler on ms-be2014 -- cdanis@ms-be2014.codfw.wmnet ~ % for D in /sys/block/sd*/queue/scheduler ; echo cfq {{!}} sudo tee $D
* 21:15 godog: swift codfw-prod: push up-to-date rings, mistakenly pushed earlier an older version
* 19:48 gehel: rolling restart of cassandra on maps* for config change
* 19:47 RoanKattouw: Running recomputeNotifCounts.php  --notif-types=login-success on all Echo wikis for [[phab:T220762|T220762]]
* 19:31 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be1*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl restart swift-object-replicator'
* 19:22 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin -m async -b4 'ms-be2*' 'run-puppet-agent --enable "cdanis rollout I369f9b29"' 'systemctl systemctl restart swift-object-replicator'
* 19:01 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Begin homepage experiment on cswiki and kowiki ([[phab:T221266|T221266]]) (duration: 00m 51s)
* 18:47 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Remove link to pageviews tool when no data available ([[phab:T222405|T222405]]) (duration: 00m 52s)
* 18:32 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/skins/MinervaNeue/includes/menu/Definitions.php: Harden Definitions::insertCommunityPortal() method ([[phab:T222407|T222407]]) (duration: 00m 53s)
* 18:30 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'ms-be*' 'disable-puppet "cdanis rollout I369f9b29"'
* 18:24 jynus: restart and upgrade db1116
* 18:14 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Set $wgOresFrontendBaseUrl ([[phab:T219396|T219396]]) (duration: 00m 51s)
* 17:53 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 17:52 otto@deploy1001: scap-helm eventgate-main install -n main -f main/staging-values.yaml stable/eventgate [namespace: eventgate-main, clusters: staging]
* 17:19 elukey: restart netbox on netmon1002 as test
* 17:11 jynus: restart dbprov* hosts, in sequence, for kernel upgrade
* 16:42 jynus: restart db1114 mysql for upgrade testing
* 16:38 andrewbogott: re-imaging cloudvirt1024
* 16:34 jynus: restart db2102 mysql for upgrade testing
* 16:11 hashar: CI queue drained. Should be working fine again now
* 15:57 hashar: CI / Zuul is being slowed down and being investigated
* 15:48 moritzm: updating firmware-bnx2x (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2x firmware)
* 15:37 moritzm: updating firmware-bnx2 (from stretch point release, this is a NOP, the source package firmware-nonfree was updated for various Wifi chipsets we don't use, doublechecked by comparing check sums for old and new bnx2 firmware)
* 15:35 papaul: shutting down elastic2038 for DIMM swap
* 15:30 moritzm: updating base-files from recent stretch point release
* 15:14 ema: pool cp4026 w/ ATS backend [[phab:T219967|T219967]]
* 14:57 godog: capture strace / core for rsyslog on wezen / lithium and restart - [[phab:T199406|T199406]]
* 14:42 ema: powercycle cp1083
* 14:41 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1083.eqiad.wmnet
* 14:35 godog: swift eqiad-prod: finish decom ms-be101[45] - [[phab:T220590|T220590]]
* 14:25 moritzm: installing vips security updates
* 14:19 ema: depool cp4026 and reimage as upload_ats [[phab:T219967|T219967]]
* 14:11 otto@deploy1001: scap-helm eventgate-analytics finished
* 14:11 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 14:11 otto@deploy1001: scap-helm eventgate-analytics upgrade analytics -f analytics/staging-values.yaml --reset-values stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 14:09 hashar: CI workflow fixed by reverting a change deployed around 10:00 UTC # [[phab:T222614|T222614]]
* 14:03 ema: cp3038: restart varnish-be
* 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
* 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 13:56 otto@deploy1001: scap-helm eventgate-analytics install -n analytics -f analytics/staging-values.yaml stable/eventgate [namespace: eventgate-analytics, clusters: staging]
* 13:54 moritzm: installing zziplib security updates
* 13:52 hashar: CI does not run sometimes for some reason ... https://phabricator.wikimedia.org/T222614  :(
* 13:22 moritzm: installing audiofile security updates
* 13:20 moritzm: installing unzip security updates
* 12:43 moritzm: installing rsync security updates
* 12:24 moritzm: installing golang security updates on jessie
* 11:44 Amir1: EU SWAT is done
* 11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:508303{{!}}Enable Suggestion Constraint Status on Wikidata]] (duration: 00m 52s)
* 11:32 arturo: reverting puppet change to the sudo module
* 11:17 arturo: merging puppet change to the sudo module https://gerrit.wikimedia.org/r/c/operations/puppet/+/507376
* 10:59 ema: manual puppet-merge $sha on failed puppetmasters https://phabricator.wikimedia.org/P8477
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:508302{{!}} Bumping portals to master (T128546)]] (duration: 00m 51s)
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:508302{{!}} Bumping portals to master (T128546)]] (duration: 00m 52s)
* 10:05 arturo: upgrade udev in cloudservices2002-dev
* 09:59 arturo: [[phab:T222148|T222148]] upgrade udev & libudev1 on cloudvirt[1001-1003,1005].eqiad.wmnet
* 09:35 elukey: restart netbox on netmon1002 (trying to reproduce the segfault) - [[phab:T212697|T212697]]
* 09:03 godog: upgrade labmon1001 to prometheus 2 - [[phab:T187987|T187987]]
* 06:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some API traffic to db1093 (duration: 00m 52s)
* 05:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Give some weight to db1093 (duration: 00m 58s)
* 04:08 ariel@deploy1001: Finished deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs (duration: 00m 05s)
* 04:08 ariel@deploy1001: Started deploy [dumps/dumps@b4b7733]: reduce sleep time more between wikis for incrs
 
== 2019-05-05 ==
* 14:42 elukey: restart pdfrender on scb1004
* 03:10 chaomodus: fyi scb* flapping on some endpoints seems to be just noise, there is high load from mobileapi but things appear to be operating normally otherwise, several boxes are in the process of checking md which may account for service lags
* 02:40 andrewbogott: restarting mariadb on cloudservices1003
 
== 2019-05-04 ==
* 22:20 reedy@deploy1001: Synchronized docroot/mediawiki/xml/index.html: Add extra xml namespace links (duration: 01m 06s)
* 10:38 ariel@deploy1001: Finished deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis (duration: 00m 09s)
* 10:38 ariel@deploy1001: Started deploy [dumps/dumps@26b52ef]: misc small fixes, reduce sleep time for incr wikis
 
== 2019-05-03 ==
* 23:50 thcipriani: gerrit back
* 23:49 thcipriani: gerrit restart due to threads piling up
* 22:09 XioNoX: clear v4 BGP to AS17451 on cr1-eqsin/cr4-ulsfo
* 17:16 arturo: [[phab:T222148|T222148]] aborrero@labstore1005:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
* 17:15 arturo: [[phab:T222148|T222148]] aborrero@labstore1004:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
* 17:11 arturo: [[phab:T222148|T222148]] aborrero@labpuppetmaster1002:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
* 17:10 arturo: [[phab:T222148|T222148]] aborrero@labpuppetmaster1001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
* 17:09 arturo: [[phab:T222148|T222148]] aborrero@labtestpuppetmaster2001:~ $ sudo apt-get install libudev1 udev systemd systemd-sysv libsystemd0
* 17:08 arturo: [[phab:T222148|T222148]] drop libudev1 from openstack-mitaka-jessie/jessie-wikimedia (related to [[phab:T216497|T216497]])
* 17:07 arturo: [[phab:T222148|T222148]] drop udev from openstack-mitaka-jessie/jessie-wikimedia (related to [[phab:T216497|T216497]])
* 15:02 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=parsoid,dc=codfw
* 15:02 _joe_: repooling the wtp* servers depooled in codfw for load testing
* 14:56 _joe_: repool mw1275
* 13:49 jijiki: Restart nrpe on proton1001
* 12:26 gehel: replaying 30 minutes of eqiad search traffic on codfw - [[phab:T221121|T221121]]
* 12:21 ema: cp3038: varnish-backend-restart
* 11:10 _joe_: purging opcache on mw1275
* 10:47 ema: pool cp4025 w/ ATS backend [[phab:T219967|T219967]]
* 10:43 jbond42: [[phab:T220380|T220380]] remove zuul_2.5.0-8-gcbc7f62-wmf4jessie1 from jessie-wikimedia/thirdparty
* 10:42 jbond42: [[phab:T220380|T220380]] upload zuul_2.5.1-wmf7 to jessie-wikimedia
* 10:25 jijiki: Depool mw1275
* 10:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/WikibaseLexemeCirrusSearch/: [[gerrit:507847{{!}}Fix reference to classes that moved (T222347)]] (duration: 00m 55s)
* 09:49 ema: depool cp4025 and reimage as upload_ats [[phab:T219967|T219967]]
* 09:49 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[3-4].*
* 09:21 gehel: ban elastic2038 from elastic clusters pending memory issue investigation - [[phab:T217398|T217398]]
* 08:47 ema: pool cp4024 w/ ATS backend [[phab:T219967|T219967]]
* 08:27 jynus: starting table recompression on new backup source hosts on eqiad and codfw (stop replication) [[phab:T220572|T220572]]
* 07:45 ema: depool cp4024 and reimage as upload_ats [[phab:T219967|T219967]]
* 07:16 ema: cp1089: varnish-backend-restart
* 05:32 _joe_: restarting varnish backend on cp1077
* 05:05 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp201[5-6].*
* 04:57 oblivian@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=parsoid,dc=codfw,name=wtp20(1[7-9]{{!}}20).*
* 04:55 _joe_: progressively depooling parsoid servers in codfw to assess load tolerance
* 00:32 mutante: powercycling elastic2038
* 00:10 XioNoX: remove static route to 208.80.155.128/25 on cr1/2-eqiad - [[phab:T193496|T193496]]
* 00:06 mutante: restarting gerrit to pick up config changes for 2 mail threads and lower timeout (gerrit:507852, gerrit:507853)
 
== 2019-05-02 ==
* 22:10 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/MobileFrontend/resources/dist/mobile.editor.overlay.js: Hot-deploy [[phab:T222229|T222229]] to fix VE switching on MobileFrontend (duration: 00m 52s)
* 21:21 thcipriani: gerrit back
* 21:20 ejegg: updated payments-wiki from {{Gerrit|aa8dad50e7}} to {{Gerrit|558427f731}}
* 21:19 thcipriani: gerrit restart to pick up config changes: https://gerrit.wikimedia.org/r/504973/ and https://gerrit.wikimedia.org/r/507858/
* 21:00 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - [[phab:T222351|T222351]] (duration: 01m 48s)
* 20:58 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - [[phab:T222351|T222351]]
* 20:58 crusnov@deploy1001: Finished deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - [[phab:T222351|T222351]] (duration: 00m 33s)
* 20:57 crusnov@deploy1001: Started deploy [netbox/deploy@bf9aef2]: Upgrade Netbox to 2.5.12 - [[phab:T222351|T222351]]
* 19:41 ejegg: updated CiviCRM from {{Gerrit|01c4d15c9a}} to {{Gerrit|5024c968ed}}
* 19:40 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/resources/src/mediawiki.widgets/mw.widgets.SearchInputWidget.js: Hot-deploy [[phab:T222329|T222329]] fix part 2 (duration: 00m 50s)
* 19:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.3/includes/widget/SearchInputWidget.php: Hot-deploy [[phab:T222329|T222329]] fix part 1 (duration: 00m 53s)
* 19:31 James_F: Shuffled 1.34.0-wmf.3 security patch {{Gerrit|cee0e569f4}} for [[phab:T222324|T222324]] into the tip of the upstream branch now it's merged; no-op
* 19:27 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.3
* 19:03 mutante: phab2001 - apt-get autoremove ..removes a single python package not needed anymore
* 19:00 mutante: phab1001 - upgrading PHP packages on prod phab server
* 18:59 jynus: restart dbstore1001 for upgrade
* 18:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Don't fatal on deleted pages in 'recent questions' ([[phab:T222206|T222206]]) (duration: 01m 01s)
* 18:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics on all wikis ([[phab:T214080|T214080]]) (duration: 00m 58s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable SpecialHomepage on cswiki and kowiki ([[phab:T221266|T221266]]) (duration: 00m 58s)
* 18:09 mutante: phab1001 - install package upgrades for bash and cron
* 17:46 sbassett: Deployed patch for [[phab:T222324|T222324]] (1.34.0-wmf.3)
* 17:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@414387b]: Updating Parsoid to {{Gerrit|9786781}} (duration: 05m 45s)
* 17:39 arlolra@deploy1001: Started deploy [parsoid/deploy@414387b]: Updating Parsoid to {{Gerrit|9786781}}
* 16:42 gehel: replaying 30 minutes of eqiad search traffic on codfw - [[phab:T221121|T221121]]
* 16:10 jynus: restarted dbproxy1005 haproxy, weird connection issue
* 15:42 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Re-enable account creation on wikitech (duration: 00m 57s)
* 15:40 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Invalidate user sessions upon blocking on wikitech (duration: 00m 59s)
* 15:15 chasemp: add dsharpe to content admin on wikitech for user blocking
* 12:42 jynus: stopping several instances at dbstore1001 to clone them to db1139/40 [[phab:T220572|T220572]]
* 12:06 ema: swift-proxy rolling restart [[phab:T222071|T222071]]
* 12:01 ema: restart swift-proxy on ms-fe1005 [[phab:T222071|T222071]]
* 10:37 ariel@deploy1001: Finished deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120 (duration: 00m 15s)
* 10:36 ariel@deploy1001: Started deploy [dumps/dumps@53c9f22]: speed up adds-changes dumps by generating index.html less often. temp sleep 120
* 10:04 ema: pool cp4023 w/ ATS backend [[phab:T219967|T219967]]
* 09:41 jynus: testing backups on db2102 (increased network and disk usage) [[phab:T220572|T220572]]
* 09:07 jynus: reboot db2102 [[phab:T220572|T220572]]
* 09:02 ema: depool cp4023 and reimage as upload_ats [[phab:T219967|T219967]]
* 09:02 godog: rollout rsyslog upgrade 8.1901.0-1~bpo9+wmf1 to eqiad
* 08:55 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 5% of anonymous users to PHP7.2 - [[phab:T219150|T219150]] (duration: 01m 03s)
* 08:49 jijiki: Sending more traffic to PHP7.2 - [[phab:T219150|T219150]]
* 04:28 andrewbogott: upgraded mediawiki on wikitech-static to 1.32.1
* 04:25 kart_: Updated cxserver to 2019-05-02-040910-production ([[phab:T222305|T222305]])
* 04:23 andrewbogott: apt-get upgrade on wikitech-static
* 04:18 kartik@deploy1001: scap-helm cxserver finished
* 04:18 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 04:18 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 04:16 kartik@deploy1001: scap-helm cxserver finished
* 04:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 04:16 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 04:15 kartik@deploy1001: scap-helm cxserver finished
* 04:15 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 04:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 00:35 eileen: civicrm revision changed from {{Gerrit|3414657d36}} to {{Gerrit|01c4d15c9a}}, config revision is {{Gerrit|2119df9495}}
 
== 2019-05-01 ==
* 23:35 catrope@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/: Drop RENDER_NOW for impact module images ([[phab:T222223|T222223]]) (duration: 01m 04s)
* 23:19 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]] Start writing to cloudelastic for group0 (duration: 01m 05s)
* 22:07 mutante: LDAP - adding jaufrecht to wmf ([[phab:T222214|T222214]])
* 21:57 ebernhardson: start importing group2 to cloudelastic in parallel with group1
* 21:18 ebernhardson: start importing group1 into cloudelastic from mwmaint1002
* 20:15 halfak@deploy1001: Finished deploy [ores/deploy@52e9759]: [[phab:T222121|T222121]] (duration: 14m 03s)
* 20:01 halfak@deploy1001: Started deploy [ores/deploy@52e9759]: [[phab:T222121|T222121]]
* 19:17 thcipriani@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.3 (duration: 01m 53s)
* 19:15 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.3
* 17:59 elukey: force remount of /mnt/hdfs on notebook1003 (fuse hdfs got stuck)
* 17:43 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed (duration: 03m 15s)
* 17:40 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train - Second try after space freed
* 17:27 joal@deploy1001: Finished deploy [analytics/refinery@682ab7c]: Regular analytics weekly train (duration: 25m 18s)
* 17:02 joal@deploy1001: Started deploy [analytics/refinery@682ab7c]: Regular analytics weekly train
* 16:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]] Start writing to cloudelastic from testwiki (duration: 01m 01s)
* 16:52 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.QuestionPosterDialog.js: SWAT: [[gerrit:507598{{!}}Ensure text exists before logging enter-question-text action]] (duration: 01m 00s)
* 16:48 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: [[gerrit:507593{{!}}Re-use timestamp for section header and question storage]] (duration: 01m 01s)
* 16:41 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/includes/HelpPanel/QuestionPoster.php: SWAT: [[gerrit:507593{{!}}Re-use timestamp for section header and question storage]] (duration: 01m 01s)
* 16:23 sbisson@deploy1001: Synchronized php-1.34.0-wmf.3/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Mentorship.js: SWAT: [[gerrit:507580{{!}}Mentorship module: Add data-link-id to mentor's talkpage link]] (duration: 01m 01s)
* 16:17 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:507550{{!}}Enable cirrussearch-request logging to eventgate-analytics for group1 wikis]] (duration: 01m 00s)
* 15:58 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Re-enable password reset on wikitech (duration: 00m 58s)
* 14:54 reedy@deploy1001: Synchronized wmf-config/wikitech.php: propagate blocks to gerrit (duration: 00m 57s)
* 14:52 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add new logging channel for wikitech (duration: 00m 58s)
* 13:57 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T209572|T209572]] Disable Reporting API endpoint (duration: 00m 59s)
* 13:31 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T209572|T209572]] Enable Feature Policy Reporting origin trial (duration: 01m 01s)
* 13:28 jbond42: update puppet and facter on esams
* 12:53 gehel: start recording 30 minutes of traffic from elasticsearch eqiad - [[phab:T221121|T221121]]
* 11:27 gilles: [[phab:T216499|T216499]] [[phab:T216594|T216594]] [[phab:T216598|T216598]] mwscript purgeList.php ruwiki --all --verbose
* 11:22 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216499|T216499]] [[phab:T216598|T216598]] [[phab:T216594|T216594]] Renew origin trial tokens for ruwiki (duration: 01m 14s)
* 01:01 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@5d619e4]: Update spec x-amples (duration: 03m 58s)
* 00:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@5d619e4]: Update spec x-amples
* 00:30 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]] (duration: 00m 04s)
* 00:30 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]]
 
== 2019-04-30 ==
* 23:56 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
* 23:56 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
* 23:49 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]] (duration: 00m 04s)
* 23:49 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]]
* 23:35 ariel@deploy1001: Finished deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count (duration: 00m 03s)
* 23:35 ariel@deploy1001: Started deploy [dumps/dumps@d715ea0]: determine page ranges of content output files by cumul revision length as well as rev count
* 23:18 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 05s)
* 23:18 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
* 23:07 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]] (duration: 00m 05s)
* 23:07 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207481|T207481]]
* 22:21 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - [[phab:T215956|T215956]] (duration: 23m 56s)
* 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f]: Parsoid: Use the new stash tables for old revisions - [[phab:T215956|T215956]]
* 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too (duration: 03m 22s)
* 21:52 mobrovac@deploy1001: Started deploy [restbase/deploy@b3b140f] (dev-cluster): Parsoid: use the new stashing tables for old revisions too
* 21:44 sbassett: Deployed patch for [[phab:T222038|T222038]] (1.34.0-wmf.1 and 1.34.0-wmf.3)
* 21:44 sbassett: Deployed patch for [[phab:T222036|T222036]] (1.34.0-wmf.1 and 1.34.0-wmf.3)
* 21:13 thcipriani@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.34.0-wmf.3
* 21:10 mutante: netmon1002 - apt-get remove --purge php7.0* ; apt-get install php-common php-pear (pending upgrades) {{!}} netmon2001: apt autoremove
* 21:06 mutante: netmon2001 - apt-get install php-common php-pear (pending upgrades)
* 21:03 mutante: netmon2001 - apt-get remove --purge php7.0*
* 21:03 mutante: librenms - switched from PHP 7.0 to PHP 7.2 successful now. reverted manual changes for debugging on netmon1002
* 20:29 thcipriani@deploy1001: Finished scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache (duration: 31m 17s)
* 20:21 mutante: netmon1002 - loading PHP 7.2 module to debug issue for librenms. librenms very short downtime
* 19:58 thcipriani@deploy1001: Started scap: testwiki to 1.34.0-wmf.3 and rebuild l10n cache
* 19:56 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 (duration: 02m 07s)
* 19:47 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 (duration: 02m 24s)
* 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes [[phab:T222133|T222133]], [[phab:T222129|T222129]], [[phab:T222181|T222181]], [[phab:T222182|T222182]] (duration: 09m 17s)
* 19:44 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 (duration: 02m 25s)
* 19:43 mutante: switched netmon1002/netmon2001 from PHP 7.0 to 7.2 but reverted because LibreNMS still had an issue with it
* 19:40 thcipriani@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 (duration: 10m 11s)
* 19:35 smalyshev@deploy1001: Started deploy [wdqs/wdqs@4360316]: Redeploy GUI for fixes [[phab:T222133|T222133]], [[phab:T222129|T222129]], [[phab:T222181|T222181]], [[phab:T222182|T222182]]
* 19:27 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:27 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 19:27 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 19:26 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:26 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 19:26 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 19:25 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 19:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:24 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics/analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:40 cdanis: running puppet on ms-be201[3,5] to bump replication concurrency [[phab:T221068|T221068]]
* 18:24 cdanis: running puppet on ms-be2014 to bump replication concurrency [[phab:T221068|T221068]]
* 18:09 thcipriani: start branchcut for 1.34.0-wmf.3
* 17:16 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1f09e44]: Update mobileapps to {{Gerrit|142ba30}} ([[phab:T217837|T217837]]) (duration: 04m 16s)
* 17:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1f09e44]: Update mobileapps to {{Gerrit|142ba30}} ([[phab:T217837|T217837]])
* 16:57 ayounsi@deploy1001: Finished deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS (duration: 00m 09s)
* 16:57 ayounsi@deploy1001: Started deploy [librenms/librenms@0fd8da6]: Rollback LibreNMS
* 16:52 arturo: merging change to `profile::base` and `::raid` https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to [[phab:T221225|T221225]]
* 16:36 ayounsi@deploy1001: Finished deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207706|T207706]] (duration: 00m 11s)
* 16:36 ayounsi@deploy1001: Started deploy [librenms/librenms@2094575]: Upgrade LibreNMS to 1.51 - [[phab:T207706|T207706]]
* 16:27 XioNoX: upgrade librenms to 1.51
* 16:26 jbond42: upgrade puppet and facter in eqsin
* 16:04 ema: pool cp4022 w/ ATS backend [[phab:T219967|T219967]]
* 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 15:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:58 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 15:45 elukey: restart hadoop hdfs namenodes on an-master100[1,2] to pick up new logging settings - [[phab:T220702|T220702]]
* 15:18 jynus: stop s8 instance on dbstore2001 for cloning to db2100 [[phab:T220572|T220572]]
* 15:09 jiji@deploy1001: Synchronized wmf-config/CommonSettings.php: Send 1% of anonymous users to PHP7.2 - [[phab:T219150|T219150]] (duration: 00m 54s)
* 14:58 jbond42: enable-puppet "[[phab:T220987|T220987]]: global kafaka log shipping - staged rollout (jbond)"
* 14:56 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast3002*' 'run-puppet-agent --enable "filippo prometheus"'
* 14:49 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'labmon1001*' 'run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"'
* 14:44 jijiki: Sending 1% of anonymous users to PHP7.2 - [[phab:T219150|T219150]]
* 14:43 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'bast5001*' 'run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"'
* 14:26 jbond42: disable-puppet "[[phab:T220987|T220987]]: global kafaka log shipping - staged rollout (jbond)"
* 14:24 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2004*' 'run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"'
* 14:17 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'prometheus2003*' 'run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"'
* 14:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo enable-puppet 'cdanis testing original query.max-samples [[phab:T222105|T222105]]'
* 13:29 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
* 13:28 ema: depool cp4022 and reimage as upload_ats [[phab:T219967|T219967]]
* 13:20 arturo: reverting sudo puppet module changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/507317
* 13:16 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo systemctl restart prometheus@ops.service
* 13:15 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo disable-puppet 'cdanis testing original query.max-samples [[phab:T222105|T222105]]'
* 13:08 cdanis: OOMed the eqiad ops prometheus @ prometheus1003
* 13:02 cdanis: OOMed the eqiad ops prometheus @ prometheus1004
* 12:47 cdanis: cdanis@prometheus1003.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"
* 12:41 arturo: merging a sudo puppet module change
* 12:39 cdanis: cdanis@prometheus1004.eqiad.wmnet ~ % sudo run-puppet-agent --enable "staged rollout [[phab:T222105|T222105]] by cdanis"
* 12:34 elukey: moved /home to /srv/home (more space in a dedicated partition) on stat1005
* 12:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'R:prometheus::server' 'disable-puppet "staged rollout [[phab:T222105|T222105]] by cdanis"'
* 11:27 Lucas_WMDE: EU SWAT done
* 11:22 mlitn@deploy1001: Synchronized wmf-config/CommonSettings.php: Allow cross-site requests from mobile domains (duration: 00m 52s)
* 11:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:507032{{!}}Serialize empty lists as objects on Commons (T138104)]] (duration: 00m 54s)
* 11:12 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:507031{{!}}Serialize empty lists as objects on Wikidata (T138104)]] (duration: 00m 55s)
* 11:08 gilles@deploy1001: Finished deploy [performance/navtiming@d6756c0]: [[phab:T221848|T221848]] Proper fix for partitions_for_topic in python-kafka > 1.4.4 (duration: 00m 05s)
* 11:08 gilles@deploy1001: Started deploy [performance/navtiming@d6756c0]: [[phab:T221848|T221848]] Proper fix for partitions_for_topic in python-kafka > 1.4.4
* 11:02 ema: cp3038 mbox lag, restarting varnish-be
* 10:55 kart_: Updated cxserver to 2019-04-30-055331-production ([[phab:T219412|T219412]])
* 10:49 santhosh@deploy1001: scap-helm cxserver finished
* 10:49 santhosh@deploy1001: scap-helm cxserver cluster codfw completed
* 10:49 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 10:48 santhosh@deploy1001: scap-helm cxserver finished
* 10:48 santhosh@deploy1001: scap-helm cxserver cluster eqiad completed
* 10:48 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 10:45 santhosh@deploy1001: scap-helm cxserver finished
* 10:45 santhosh@deploy1001: scap-helm cxserver cluster staging completed
* 10:45 santhosh@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 10:32 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in codfw
* 10:32 arturo: [[phab:T222060|T222060]] reimaged labtestservices2003 as stretch spare system
* 10:32 arturo: [[phab:T222057|T222057]] reimaged labtestvirt2003 as spare system
* 10:12 godog: rollout rsyslog upgrade to 8.1901.0-1~bpo9+wmf1 in eqsin / ulsfo / esams
* 10:08 jynus: stop s7 and x1 instances on dbstore2* for cloning [[phab:T220572|T220572]]
* 09:31 fsero@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=docker-registry,service=docker-registry
* 09:26 fsero: creating lvs endpoints for docker registry - [[phab:T221101|T221101]]
* 09:02 elukey: roll restart hdfs namenodes on an-master100[1,2] to pick up new settings - [[phab:T220702|T220702]]
* 08:22 godog: bounce prometheus on bast4002 after backfill has finished - [[phab:T187987|T187987]]
* 08:11 gilles@deploy1001: Finished deploy [performance/navtiming@8f135ac]: [[phab:T221848|T221848]] Default to partition 0 when no partition is found (duration: 00m 05s)
* 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: [[phab:T221848|T221848]] Default to partition 0 when no partition is found
* 08:11 gilles@deploy1001: deploy aborted: [[phab:T221848|T221848]] Defalt to partition 0 when no partition is found (duration: 00m 00s)
* 08:11 gilles@deploy1001: Started deploy [performance/navtiming@8f135ac]: [[phab:T221848|T221848]] Defalt to partition 0 when no partition is found
* 07:53 gilles@deploy1001: Finished deploy [performance/navtiming@e900152]: [[phab:T221848|T221848]] add more logging around startup (duration: 00m 05s)
* 07:53 gilles@deploy1001: Started deploy [performance/navtiming@e900152]: [[phab:T221848|T221848]] add more logging around startup
* 07:29 moritzm: installing systemd updates for jessie
* 07:24 marostegui: Remove labservices1001 and labservices1002 from tendril [[phab:T221857|T221857]]
* 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1093's status (duration: 00m 51s)
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db1093's status (duration: 00m 55s)
* 04:26 mutante: LDAP - remove user pirroh from group nda ([[phab:T222085|T222085]] and cross-validate-accounts demands consistency)
* 02:23 mutante: analytics1050 - systemctl start mclog ... it was failed like recently on analytics1052 ([[phab:T212219|T212219]] ?)
* 02:09 tgr@deploy1001: Synchronized wmf-config/db-eqiad.php: SWAT: [[gerrit:507237{{!}}depool db1093]] (duration: 00m 54s)
* 01:30 mutante: contint2001..then contint1001 - deleting /etc/zuul/wikimedia and letting puppet re-clone it (gerrit:507070) ([[phab:T218844|T218844]])
 
== 2019-04-29 ==
* 23:59 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-production.php: [[phab:T220625|T220625]] Add cloudelastic servers to wgCirrusSearchClusters (5/5) (duration: 00m 52s)
* 23:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220625|T220625]] Add cloudelastic servers to wgCirrusSearchClusters (4/5) (duration: 00m 52s)
* 23:56 ebernhardson@deploy1001: Synchronized wmf-config/ProductionServices.php: [[phab:T220625|T220625]] Add cloudelastic servers to wgCirrusSearchClusters (3/5) (duration: 00m 50s)
* 23:55 ebernhardson@deploy1001: Synchronized wmf-config/LabsServices.php: [[phab:T220625|T220625]] Add cloudelastic servers to wgCirrusSearchClusters (2/5) (duration: 00m 52s)
* 23:54 ebernhardson@deploy1001: Synchronized tests/: [[phab:T220625|T220625]] Add cloudelastic servers to wgCirrusSearchClusters (1/5) (duration: 00m 53s)
* 23:34 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix (duration: 31m 04s)
* 23:33 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221154|T221154]]: Add static.inaturalist.org to $wgCopyUploadDomains for Commons (duration: 00m 54s)
* 23:03 smalyshev@deploy1001: Started deploy [wdqs/wdqs@65796ad]: New deploy with GUI fix
* 21:13 mutante: restarting gerrit
* 21:10 mutante: cobalt (gerrit) upgrading openjdk 8 minor version
* 20:40 arlolra: Updated Parsoid to {{Gerrit|c9dab9d}} ([[phab:T106578|T106578]], [[phab:T113194|T113194]], [[phab:T205338|T205338]], [[phab:T219072|T219072]], [[phab:T219938|T219938]], [[phab:T221384|T221384]], [[phab:T219943|T219943]])
* 20:37 XioNoX: add BGP session to AS4922 in eqiad
* 20:37 RoanKattouw: Deployed patch for [[phab:T222014|T222014]]
* 20:26 arlolra@deploy1001: Finished deploy [parsoid/deploy@7859b58]: Updating Parsoid to {{Gerrit|c9dab9d}} (duration: 06m 36s)
* 20:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[5-9].eqiad.wmnet
* 20:19 arlolra@deploy1001: Started deploy [parsoid/deploy@7859b58]: Updating Parsoid to {{Gerrit|c9dab9d}}
* 20:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[5-9].eqiad.wmnet
* 20:18 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw127[0-4].eqiad.wmnet
* 20:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw127[0-4].eqiad.wmnet
* 20:08 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[5-9].eqiad.wmnet
* 19:59 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[5-9].eqiad.wmnet
* 19:52 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw126[1-4].eqiad.wmnet
* 19:44 thcipriani: gerrit back
* 19:44 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw126[1-4].eqiad.wmnet
* 19:44 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[4-8].eqiad.wmnet
* 19:43 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/327763 [[phab:T221026|T221026]]
* 19:39 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 19:39 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw125[0-3].eqiad.wmnet
* 19:36 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
* 19:35 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[5-9].eqiad.wmnet
* 19:32 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[5-9].eqiad.wmnet
* 19:31 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
* 19:26 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-4].eqiad.wmnet
* 19:26 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw124[0-4].eqiad.wmnet
* 19:25 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[8-9].eqiad.wmnet
* 19:21 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[8-9].eqiad.wmnet
* 19:20 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw123[0-5].eqiad.wmnet
* 19:17 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-5].eqiad.wmnet
* 19:07 otto@deploy1001: sync-file aborted: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - [[phab:T214080|T214080]] (duration: 00m 02s)
* 19:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable cirrussearch-request logging to eventgate-analytics for group0 wikis - [[phab:T214080|T214080]] (duration: 00m 53s)
* 19:01 ottomata: deploying config change to enable cirrussearch-request logging to eventgate-analytics for group0 wikis - [[phab:T214080|T214080]]
* 18:59 RoanKattouw: Deployed patch for [[phab:T221739|T221739]]
* 18:45 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:45 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 18:45 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 18:44 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:44 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 18:44 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f analytics/eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 18:42 catrope@deploy1001: Synchronized static/images/project-logos/: Change wikimaniawiki logo to Wikimania 2019 version ([[phab:T221829|T221829]]) (duration: 00m 54s)
* 18:41 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:41 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:41 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[8-9].eqiad.wmnet
* 18:37 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[8-9].eqiad.wmnet
* 18:37 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Commons ([[phab:T138104|T138104]]) (duration: 00m 54s)
* 18:34 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=mw122[1-6].eqiad.wmnet
* 18:33 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:33 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:33 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Serialize empty lists as objects on Test Wikidata ([[phab:T138104|T138104]]) (duration: 00m 53s)
* 18:29 jbond@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
* 18:26 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:26 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:26 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f analytics/eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:22 Jeff_Green: authdns-update for [[phab:T221475|T221475]]
* 18:21 catrope@deploy1001: Synchronized docroot/noc: Publish throttle-analyze at noc ([[phab:T187894|T187894]]) (duration: 00m 53s)
* 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www4.bibl.ulaval.ca to wgCopyUploadsDomains ([[phab:T220704|T220704]]) (duration: 00m 53s)
* 17:35 Jeff_Green: authdns-update to deploy [[phab:T214525|T214525]]
* 17:15 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates (duration: 06m 58s)
* 17:08 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@9273213]: Blazegraph upgrade for new LDF version and GUI updates
* 16:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Drop wmgMediaInfoEnableUploadWizardDepicts from IS (duration: 00m 53s)
* 16:34 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 53s)
* 16:33 jforrester@deploy1001: sync-file aborted: SDC UW config cleanup: Switch to wmgMediaInfoEnableUploadWizardStatements in CS (duration: 00m 01s)
* 16:33 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC UW config cleanup: Add wmgMediaInfoEnableUploadWizardDepicts to IS (duration: 00m 53s)
* 16:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable feature flag for depicts in UW on Test Commons (duration: 00m 53s)
* 15:40 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Update WikimediaEditorTasks counter config ([[phab:T221951|T221951]]) (duration: 00m 58s)
* 14:49 herron: added uid=sukhe,ou=people,dc=wikimedia,dc=org to nda ldap group [[phab:T221990|T221990]]
* 13:56 jbond42: rolling security updates for imagemagick
* 13:45 fsero: DNS: creating docker-registry.svc.(eqiad{{!}}codfw).wmnet RRs
* 13:17 jbond42: rolling security updates for libpng
* 12:46 godog: resume rollout rsyslog 8.1901.0-1 to jessie hosts - [[phab:T219764|T219764]]
* 12:07 jynus: stop dbstore2002:s3 and dbstore2001:s5 for cloning to db2098/99 [[phab:T220572|T220572]]
* 11:56 kart_: EU-Midday SWAT done. Thanks.
* 11:56 kartik@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ContentTranslation: SWAT: [[gerrit{{!}}506971{{!}}Change the way we calculate total unmodified MT (T221930)]] (duration: 00m 56s)
* 11:30 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}505765{{!}}Add namespace "Aldono" at eo.wiktionary (T221525)]] (duration: 00m 54s)
* 11:21 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}506939{{!}} (T222018)]] (duration: 00m 53s)
* 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}506860{{!}}Allow admins to add or remove patroller group at enwikivoyage (T222008)]] (duration: 00m 55s)
* 09:27 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis (duration: 28m 19s)
* 09:13 jynus: stop dbstore2002:s4 for cloning to db2099 [[phab:T220572|T220572]]
* 08:59 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) - bis
* 08:39 godog: begin migration of bast4002 to prometheus v2 - [[phab:T187987|T187987]]
* 08:38 joal@deploy1001: Finished deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy) (duration: 15m 38s)
* 08:33 elukey: restart keyholder on deploy1001 + rearm keys
* 08:28 elukey: restart keyholder-proxy on deploy1001 (attempt to see if new analytics scap settings got applied)
* 08:25 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Enable unicode overrides table for php 7.2 [[phab:T219279|T219279]] (duration: 00m 53s)
* 08:25 jynus: stop dbstore2001:s2 for cloning to db2098 [[phab:T220572|T220572]]
* 08:23 oblivian@deploy1001: Synchronized wmf-config/Php72ToUpper.php: Adding unicode overrides table for php 7.2 [[phab:T219279|T219279]] (duration: 00m 54s)
* 08:23 joal@deploy1001: Started deploy [analytics/refinery@53a4eee]: Test of deploy-user change (analytics -> analytics-deploy)
* 07:58 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Move db2045 from s8 to x1 [[phab:T219493|T219493]] (duration: 00m 55s)
* 07:47 marostegui: Stop mysql on db2034 (lag will happen on x1 codfw) - [[phab:T219493|T219493]]
* 07:44 marostegui: Stop replication on db2034 (x1 master) for maintenance - [[phab:T219493|T219493]]
* 07:13 moritzm: updated stretch netboot image for 9.9 point release
 
== 2019-04-28 ==
* 17:46 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3037.esams.wmnet
* 17:46 jijiki: Depooling cp3037 - server and mgmt are unreachable
* 14:55 James_F: Updated trwiki's MediaWiki:Common.css to not over-ride the logo.
* 14:53 James_F: Manually purged the trwiki logos from Varnish as part of updating them for 2 year anniversary.
* 14:47 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki.png: trwiki: Update logo for 2 year anniversary, part III (duration: 00m 53s)
* 14:45 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-1.5x.png: trwiki: Update logo for 2 year anniversary, part II (duration: 00m 53s)
* 14:44 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-2x.png: trwiki: Update logo for 2 year anniversary, part I (duration: 00m 55s)
 
== 2019-04-27 ==
* 17:44 elukey: restart pdfrender on scb1002 (alert flapping)
* 12:37 jynus: correcting last log, stopping dbstore2002:s1 to clone it to db2097 [[phab:T220572|T220572]]
* 12:37 jynus: stopping dbstore2002:s6 to clone it to db2097 [[phab:T220572|T220572]]
* 00:11 foks: reset passwords for FritzSolms@global and Seanhood@global
 
== 2019-04-26 ==
* 20:15 foks: changing email and password for "Lemon martini@global"
* 19:38 foks: changing password for JDiPierro@global
* 19:21 bblack: varnish-backend-restart on cp4026, evidence of artificial 503s from mbox lag behavior, probably related to the semi-abuse client doing odd 404 traffic to ulsfo that's triggering bugs in swift's rewrite.py ....
* 19:04 foks: changing password for Subinsebastien
* 17:50 mutante: analytics1052 - reported broken systemd state in Icinga - service mcelog was in state failed - systemctl start mcelog - ([[phab:T212219|T212219]]?)
* 16:18 jynus: stop s6 mariadb instance on dbstore2001 [[phab:T220572|T220572]]
* 15:34 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: thumbor1001 ms-fe1005 ms-be1013 scb1001 restbase1007
* 15:05 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: ores1001.yaml wtp1025.yaml rdb1006.yaml
* 14:18 marostegui: Set pc1004-1006 and pc2004-2006 as unracked on netbox - [[phab:T209858|T209858]] [[phab:T210969|T210969]]
* 13:17 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: mw1311.yaml, mx2001 & dubnium
* 12:52 ema: cp4025: restart varnish-be due to mbox lag
* 12:50 jijiki: Restarting hhvm on mw1288
* 12:48 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on mc1019, maps1001 and logstash1007
* 12:45 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_upload,name=cp4021.ulsfo.wmnet,dc=ulsfo
* 12:44 ema: pool cp4021 w/ ATS backend [[phab:T219967|T219967]]
* 12:20 ema: repool cp3030 after directors.frontend.vcl testing [[phab:T219967|T219967]]
* 12:09 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on canary hosts: elastic1017, ganeti2001, analytics1042
* 11:26 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on lvs4007, dns2001 and multatuli
* 11:16 jbond42: upgrade puppet 4 => 5 and facter 2 => 3 on bast4002, aqs1004 and conf2001
* 10:28 moritzm: restarting Parsoid on wtp1025 for glibc update
* 10:19 ema: depool cp3030 for testing [[phab:T219967|T219967]]
* 09:48 marostegui: Remove labtestservices2001 from tendril - [[phab:T218022|T218022]]
* 09:11 moritzm: restarting AQS on aqs1004 for glibc update
* 08:42 elukey: restart pdfrender on scb1003 (alert flapping)
* 08:21 moritzm: uploaded php-xdebug 2.7.0+wmf1 for component/php72 ([[phab:T221923|T221923]])
* 07:20 moritzm: installing glibc updates on a number of analytics hosts
* 04:58 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1113:3316 [[phab:T221782|T221782]] (duration: 00m 56s)
* 00:31 eileen: civicrm revision changed from {{Gerrit|88736c7c11}} to {{Gerrit|34027da7df}}, config revision is {{Gerrit|2119df9495}}
 
== 2019-04-25 ==
* 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:41 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:41 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 23:39 eileen: civicrm revision changed from {{Gerrit|519fe8028e}} to {{Gerrit|88736c7c11}}, config revision is {{Gerrit|2119df9495}} - deployed patch to start recording payment_processor_id on recurring
* 22:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:56 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 22:56 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 21:31 andrewbogott: stopping nova services on labnet1001/1002
* 21:26 andrewbogott: revoking M5 grants as per https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506428/4/modules/role/templates/mariadb/grants/production-m5.sql.erb and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/506345/3/modules/role/templates/mariadb/grants/production-m5.sql.erb
* 21:12 tgr: [[phab:T221516|T221516]] running mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'FoldDownPro' 'MichaelOBFDP'
* 19:45 mobrovac@deploy1001: Finished deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - [[phab:T221432|T221432]] [[phab:T215956|T215956]] [[phab:T216636|T216636]] (duration: 20m 04s)
* 19:25 mobrovac@deploy1001: Started deploy [restbase/deploy@7187e0c]: Bump HTML content version in docs, remove Parsoid stash fall-back and start logging all sections requests - [[phab:T221432|T221432]] [[phab:T215956|T215956]] [[phab:T216636|T216636]]
* 19:24 mobrovac@deploy1001: Finished deploy [restbase/deploy@7187e0c] (dev-cluster): Bump HTML content version in docs and remove Parsoid stash fall-back (duration: 03m 10s)
* 19:21 mobrovac@deploy1001: Started deploy [restbase/deploy@7187e0c] (dev-cluster): Bump HTML content version in docs and remove Parsoid stash fall-back
* 18:32 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:506316{{!}}Cleanup old EchoCrossWikiBetaFeature]] (2/2) (duration: 00m 53s)
* 18:31 sbisson@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:506316{{!}}Cleanup old EchoCrossWikiBetaFeature]] (1/2) (duration: 00m 54s)
* 18:24 sbisson@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT: [[gerrit:506210{{!}}EventLogging: Make namespace int, use enum for impact module state]] (duration: 00m 54s)
* 16:58 XioNoX: add analytics firewall filter term schema to cr1/2-eqiad - [[phab:T221690|T221690]]
* 16:57 XioNoX: reorganize analytics firewall filters terms (description) on cr1/2-eqiad
* 16:34 moritzm: rolling restart of Cassandra on restbase1016-1018 to pick up Java security update
* 16:27 andrewbogott: repooled labweb1002
* 15:49 andrewbogott: depooling labweb1002 for easier debugging on labweb1001
* 15:09 thcipriani: gerrit back
* 15:07 thcipriani: gerrit restart to pickup new cache config changes
* 14:56 jynus: syncing facts for puppet compiler
* 14:51 jynus: update backup grants for dbprov1* on source dbs
* 12:48 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1113:3316 [[phab:T221782|T221782]] (duration: 00m 53s)
* 12:39 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 [[phab:T221782|T221782]] (duration: 00m 53s)
* 11:55 Lucas_WMDE_: EU SWAT done
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Kartographer/: SWAT: [[gerrit:506363{{!}}Support data-mw="interface" also in staticframe (T221439)]] (duration: 00m 54s)
* 11:46 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseQualityConstraints: SWAT: [[gerrit:505764{{!}}Remove beta feature for constraint suggestions (T220609)]] (duration: 00m 56s)
* 11:43 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseQualityConstraints: SWAT: [[gerrit:505763{{!}}Enable constraint suggestions for everyone (T220609)]] (duration: 00m 59s)
* 11:10 Lucas_WMDE_: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript maintenance/namespaceDupes.php --wiki=cswikisource --fix
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:506134{{!}}Create new namespace "Edice" for cswikisource (T221697)]] (duration: 00m 54s)
* 09:57 moritzm: installing multipath-tools update from stretch point release
* 09:49 moritzm: installing libcgroup security updates
* 08:30 moritzm: installing php5 security updates
* 08:08 jynus: update statistics grants for dbprov1* on tendril
* 07:56 moritzm: installing gnutls security updates
* 07:01 marostegui: Run compare.py for main tables between db2045 and db2080 [[phab:T220170|T220170]]
* 06:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Reorganize s8 codfw - [[phab:T220170|T220170]] (duration: 00m 54s)
* 06:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2080 after onsite maintenance to upgrade BIOS and firmware - [[phab:T216240|T216240]] (duration: 00m 54s)
* 06:04 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Promote db2079 to s8 codfw master [[phab:T220170|T220170]] (duration: 00m 52s)
* 05:47 marostegui: Start changing topology to make db2079 s8 codfw master - [[phab:T220170|T220170]]
* 05:28 marostegui: Deploy schema change on db1103:3314 to fix revision table partitioning and indexing - [[phab:T221782|T221782]]
* 05:27 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 [[phab:T221782|T221782]] (duration: 00m 54s)
* afk: updated fundraising CiviCRM from {{Gerrit|468f85e524}} to {{Gerrit|519fe8028e}}
* 00:12 maxsem@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/CirrusSearch/includes/Maintenance/AnalysisConfigBuilder.php: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CirrusSearch/+/506209/ (duration: 00m 54s)
* 00:00 eileen: process-control config revision is {{Gerrit|0098b7a118}} - adjust dedupe rule
 
== 2019-04-24 ==
* 22:46 mutante: icinga-downtime -h ms-be2034 -r swift-rebalancing -d 86400
* 22:19 mutante: deploying varnish/trafficserver change to cover www.wikiba.se (not prod yet)
* 22:19 mutante: icinga-downtime -h ms-be2039 -r swift-rebalancing -d 86400
* 21:31 mutante: icinga-downtime -h ms-be2038 -r swift-rebalancing -d 86400
* 20:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - [[phab:T215956|T215956]] (duration: 20m 39s)
* 20:21 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc]: Parsoid storage simplification step 1: switch Parsoid stashing to simple key/value - [[phab:T215956|T215956]]
* 20:01 mobrovac@deploy1001: Finished deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value (duration: 04m 18s)
* 19:57 mobrovac@deploy1001: Started deploy [restbase/deploy@8a6b6fc] (dev-cluster): Switch Parsoid stashing to simple key/value
* 18:47 mutante: pooled mw1297 as a new API server ([[phab:T192457|T192457]])
* 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet,cluster=api_appserver
* 18:45 mutante: mw1297 - scap pull
* 18:17 mutante: sudo icinga-downtime -h ms-be2031 -r swift-rebalancing -d 86400
* 17:52 mutante: contint1001 - for logfile in $(find /var/log/zuul/ ! -name "*.gz"); do gzip $logfile; done to get more disk space ([[phab:T207707|T207707]])
* 17:33 mutante: contint1001 - apt-get clean for 1% more disk space
* 17:23 mutante: proton1001 - restarting proton service - low RAM caused facter/puppet failures (https://tickets.puppetlabs.com/browse/PUP-8048); freed memory and fixed puppet run (cc: [[phab:T219456|T219456]] [[phab:T214975|T214975]])
* 16:33 catrope@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/: Fix exceptions in Homepage logging (duration: 00m 56s)
* 15:52 herron: performing rolling restart of pybal on low-traffic eqiad/codfw lvs hosts
* 15:32 jijiki: Restarting php7.2-fpm on mw2* in codfw for 505383 and [[phab:T211488|T211488]]
* 15:00 herron: switching kibana lvs to source hash scheduler
* 14:41 jijiki: restart pdfrender on scb1002
* 14:28 godog: begin rollout of rsyslog 8.1901.0-1 to jessie hosts - [[phab:T219764|T219764]]
* 13:38 marostegui: Poweroff db2080 for onsite maintenance - [[phab:T216240|T216240]]
* 13:01 jijiki: Restarting php7.2-fpm on mw13* for 505383 and [[phab:T211488|T211488]]
* 12:36 jijiki: restarting pdfrender on scb1004
* 12:23 moritzm: rolling restart of Cassandra on restbase/eqiad to pick up Java security update
* 11:59 jijiki: Restarting php7.2-fpm on mw12* for 505383 and [[phab:T211488|T211488]]
* 11:45 gehel: restarting relforge for jvm upgrade
* 11:33 jbond42: security update ghostscript on scb jessie servers
* 11:25 jijiki: Restarting php7.2-fpm on mw-canary for 505383 and [[phab:T211488|T211488]]
* 11:23 ladsgroup@deploy1001: Finished deploy [ores/deploy@060fc37]: (no justification provided) (duration: 16m 18s)
* 11:07 ladsgroup@deploy1001: Started deploy [ores/deploy@060fc37]: (no justification provided)
* 10:28 akosiaris@deploy1001: scap-helm cxserver finished
* 10:28 akosiaris@deploy1001: scap-helm cxserver cluster staging completed
* 10:28 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 10:23 jijiki: Restarting php-fpm on mw1238 for 505383 and [[phab:T211488|T211488]]
* 09:58 moritzm: installing rsync security updates on jessie
* 08:44 moritzm: rolling restart of Cassandra on restbase/codfw to pick up Java security update
* 08:29 godog: swift eqiad-prod: start decom for ms-be101[45] - [[phab:T220590|T220590]]
* 08:17 godog: bounce prometheus on bast5001 after migration and backfill
* 08:04 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 08:04 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 08:02 gehel@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
* 08:02 gehel@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
* 06:41 marostegui: Optimize tables on pc1010
* 06:38 elukey: restart pdfrender on scb1003
* 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2082 (duration: 00m 52s)
* 06:22 marostegui: Upgrade db2082
* 06:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2079, depool db2082 (duration: 00m 55s)
* 06:18 marostegui: Upgrade db2081
* 06:10 marostegui: Upgrade db2079
* 06:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2086, depool db2079 (duration: 00m 53s)
* 05:55 marostegui: Upgrade db2086
* 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and depool db2086 (duration: 00m 52s)
* 05:38 marostegui: Upgrade db2080 and db2083
* 05:38 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2080 and db2083 (duration: 00m 54s)
* 03:45 SMalyshev: repooled wdqs1003, it's good now
* 01:26 eileen: jobs restarted, process-control config revision is {{Gerrit|ef6d4761e5}}
* 01:06 eileen: civicrm revision changed from {{Gerrit|31982324b8}} to {{Gerrit|468f85e524}}, config revision is {{Gerrit|13b9eefe7b}}
* 01:02 eileen: process-control config revision is {{Gerrit|13b9eefe7b}}
* 00:29 mutante: mw1297 - rebooting for nutcracker issue
* 00:28 mutante: mw1297 - scap pull
* 00:08 mutante: DNS - add initiatives.wikimedia.org (and initiatives.m) for campaign wiki requested at [[phab:T167375|T167375]]
 
== 2019-04-23 ==
* 23:51 mutante: mw1297 - initial puppet run - will show up in Icinga in a little while but not pooled yet; all the things are being installed right now
* 23:48 ejegg: updated payments-wiki (inactive cluster) from {{Gerrit|7a312e371a}} to {{Gerrit|aa8dad50e7}}
* 23:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logger.js: SWAT GrowthExperiments: Fix validation errors due to state='' (duration: 00m 53s)
* 23:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/GrowthExperiments/includes/EventLogging/SpecialHomepageLogger.php: SWAT GrowthExperiments: Fix EventLogging errors (duration: 00m 53s)
* 23:25 mutante: generating mcrouter certs for appservers, added mw1297.eqiad.wmnet ([[phab:T192457|T192457]])
* 23:23 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: SWAT [[phab:T219728|T219728]] Add support for new Japanese era name 'Reiwa' (duration: 00m 52s)
* 23:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: SWAT [[phab:T221668|T221668]] VisualEditor: Restore external paste sanitization of DOM elements (duration: 00m 55s)
* 23:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[phab:T221521|T221521]] Add autoreviewer to wgRestrictionLevels on ptwikinews (duration: 00m 54s)
* 22:35 XioNoX: push firewall rule to pfw3-eqiad - [[phab:T221475|T221475]]
* 22:33 XioNoX: push firewall rule to pfw3-codfw - [[phab:T221475|T221475]]
* 21:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/ORES/includes/Specials/SpecialORESModels.php: [[phab:T221696|T221696]] (duration: 00m 55s)
* 21:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:43 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 21:33 thcipriani: restarting gerrit to pickup config changes
* 20:55 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints ([[phab:T221407|T221407]]) (duration: 13m 03s)
* 20:43 andrewbogott: updating designate pools on cloudservices1003 and 1004 using eqiad1_pool_config.yml template from the puppet repo
* 20:42 smalyshev@deploy1001: Started deploy [wdqs/wdqs@51b4728]: Deploy new Updater fix for constraints ([[phab:T221407|T221407]])
* 20:26 urandom: dropping disused restbase keyspaces -- [[phab:T221530|T221530]]
* 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:57 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 19:32 mutante: webperf* - running puppet to git pull docroot
* 19:11 thcipriani: gerrit restart
* 18:59 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/MassMessage: {{Gerrit|c640195}} (duration: 00m 56s)
* 18:09 SMalyshev: depool wdqs1003 to let it catch up
* 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:02 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 17:43 jijiki: Restarting memcached on mc1029 - [[phab:T208844|T208844]]
* 17:26 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@78985fb]: Update mobileapps to {{Gerrit|6d3a422}} ([[phab:T201382|T201382]] [[phab:T217837|T217837]]) (duration: 04m 06s)
* 17:22 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@78985fb]: Update mobileapps to {{Gerrit|6d3a422}} ([[phab:T201382|T201382]] [[phab:T217837|T217837]])
* 16:55 jijiki: Depool thumbor2004 for 505759 and pool back - [[phab:T187765|T187765]]
* 16:54 gehel: restart wdqs for jvm upgrade
* 16:49 jijiki: Depool thumbor1004 for 505759 and pool back - [[phab:T187765|T187765]]
* 16:43 jijiki: Depool thumbor2003 for 505759 and pool back - [[phab:T187765|T187765]]
* 16:40 jijiki: Depool thumbor1003 for 505759 and pool back - [[phab:T187765|T187765]]
* 16:38 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable api-request logging to eventgate-analytics for all wikis - [[phab:T214080|T214080]] (duration: 00m 53s)
* 16:33 ottomata: proceeding to enable api-request eventgate-analytics logging for all wikis
* 16:31 herron: added jfishback to wmf ldap group [[phab:T221660|T221660]]
* 16:12 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:12 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 16:07 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: set wglocaltimezone for sqwikiquote [[phab:T221627|T221627]] (duration: 00m 54s)
* 15:28 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Depicts functionality on Commons (duration: 00m 54s)
* 14:27 jijiki: Depool thumbor2002 for 505759 and pool back - [[phab:T187765|T187765]]
* 14:21 jijiki: Depool thumbor1002 for 505759 and pool back - [[phab:T187765|T187765]]
* 14:16 jijiki: Depool thumbor2001 for 505759 and pool back - [[phab:T187765|T187765]]
* 14:14 jijiki: Depool thumbor1001 for 505759 and pool back - [[phab:T187765|T187765]]
* 14:07 jijiki: Disable puppet on thumbor* to merge 505759
* 13:54 ema: depool cp4021 and reimage as upload_ats [[phab:T219967|T219967]]
* 13:17 jijiki: Restart nagios-nrpe-server on prometheus1003
* 12:15 godog: swift eqiad-prod: fully decom ms-be1013 - [[phab:T220590|T220590]]
* 11:59 moritzm: installing clamav security updates on fermium
* 11:56 kart_: EU-Midday SWAT is done.
* 11:54 kart_: 'SWAT: [[gerrit:505059]] deployment-prep: Use new poolcounter instance, [[gerrit:505060]] deployment-prep: Use new ms-fe host.'
* 11:53 kartik@deploy1001: Synchronized wmf-config/LabsServices.php: SWAT: [[gerrit:505643]] (duration: 00m 53s)
* 11:45 jijiki: Stop xenon-log, excimer-log and apache on mwlog*
* 11:43 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:505643]] Turn off logging for CitationUsage and CitationUsagePageLoad ([[phab:T213969|T213969]]) (duration: 00m 53s)
* 11:29 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix undefined variable from last SWAT (duration: 00m 54s)
* 11:27 moritzm: installing clamav security updates on mendelevium (OTRS host)
* 11:18 kartik@deploy1001: Synchronized wmf-config: SWAT: [[gerrit:505220]] Use higher unmodified MT threshold for Indonesian Wikipedia ([[phab:T221353|T221353]]) (duration: 00m 57s)
* 10:44 moritzm: uploaded ferm 2.4-1+wmf2+deb10u1 to buster-wikimedia ([[phab:T153468|T153468]])
* 09:23 godog: upgrade prometheus to v2 on bast5001, previous metrics will not be available until migration and backfill are complete - [[phab:T187987|T187987]]
* 09:19 elukey: dumping Kafka consumer offsets' history on logstash1012 for [[phab:T221202|T221202]]
* 09:00 fdans@deploy1001: Finished deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87 (duration: 14m 09s)
* 08:54 fsero: synchronizing old docker_registry content into new one - [[phab:T221101|T221101]]
* 08:46 fdans@deploy1001: Started deploy [analytics/refinery@0d63671]: deploying changes to pageview definition brought in refinery source 0.0.87
* 08:14 moritzm: removing debmonitor entries for labvirt* hosts
* 08:06 moritzm: installing wget security updates on jessie
* 07:27 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216499|T216499]] Set wgPriorityHintsRatio (duration: 00m 52s)
* 06:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool all slaves in x1 [[phab:T136427|T136427]] (duration: 00m 57s)
* 05:52 elukey: powercycle wtp2019 - no ssh, mgmt console stuck
* 05:16 marostegui: Deploy schema change on x1 master - lag will appear on x1 slaves - [[phab:T136427|T136427]]
* 05:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool all slaves in x1 [[phab:T136427|T136427]] (duration: 00m 54s)
 
== 2019-04-22 ==
* 18:46 gilles@deploy1001: Synchronized php-1.34.0-wmf.1/includes/media/ThumbnailImage.php: [[phab:T216499|T216499]] Only apply high priority hint half the time (duration: 00m 53s)
* 18:22 XioNoX: Add k8s BGP neighbors on cr1/2-eqiad - [[phab:T220822|T220822]]
* 18:15 XioNoX: Add k8s BGP neighbors on cr1/2-codfw - [[phab:T220822|T220822]]
* 08:47 marostegui: finished maintenance window on dbstore1003 and dbstore1005
* 08:37 marostegui: Upgrade dbstore1005
* 07:53 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1099 (duration: 00m 54s)
* 07:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
* 07:16 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
* 06:40 marostegui: Upgrade dbstore1003
* 06:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
* 05:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: More traffic to db1099 (duration: 00m 53s)
* 05:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1099 (duration: 00m 54s)
* 05:26 marostegui: Stop MySQL and reboot db1099 to see if memory errors clear up [[phab:T221502|T221502]]
* 05:25 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099 [[phab:T221502|T221502]] (duration: 01m 15s)
 
== 2019-04-21 ==
* 05:19 marostegui: Clean up some space on webperf2001 - [[phab:T221508|T221508]]
 
== 2019-04-20 ==
* 08:12 _joe_: depooling mw1261,mw1312 wikidata (at least) not working
* 07:58 jijiki: Pool thumbor1001
* 07:52 jijiki: depool thumbor1001, switch back to nginx - [[phab:T187765|T187765]]
* 07:50 _joe_: restarting php-fpm on mw1312, mw1261 to test the new settings over the weekend
 
== 2019-04-19 ==
* 23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet,cluster=api_appserver
* 23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet,cluster=api_appserver
* 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2150.codfw.wmnet,service=nginx,cluster=jobrunner
* 22:55 mutante: mw2244,mw2245,mw2150 - scap pull
* 22:53 mutante: mw2244,mw2245,mw2150 - rebooting for known nutcracker issue after first install
* 22:47 mutante: furud - remounted /mnt/hdfs for [[phab:T221483|T221483]]
* 21:42 mutante: mw2150,mw2244,mw2245: initial puppet run, added to mw roles
* 19:38 otto@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 52s)
* 19:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: No-op - enabling cirrussearch-request logging in beta (duration: 00m 53s)
* 19:36 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op - prep for enabling cirrussearch-request logging in beta (duration: 00m 53s)
* 16:20 bblack: wikipedia.org CNAME TTLs increase to 4H - https://gerrit.wikimedia.org/r/c/operations/dns/+/505249 - [[phab:T208263|T208263]]
* 16:18 ejegg: rolled back payments-wiki from {{Gerrit|eb3d0f35de}} to {{Gerrit|aa8dad50e7}}
* 15:55 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/logging/LogFormatter.php: [[phab:T220767|T220767]] (duration: 00m 53s)
* 15:54 bblack: restart pybal on lvs1016 (eqiad primary) for eventschemas service add
* 15:54 reedy@deploy1001: Synchronized php-1.34.0-wmf.1/includes/Linker.php: [[phab:T220767|T220767]] (duration: 00m 55s)
* 15:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=schema.*
* 15:42 bblack: restart pybal on lvs2003 (codfw primary) for eventschemas service add
* 15:39 bblack: restart pybal on lvs2006 (codfw backup) for eventschemas service add
* 15:32 bblack: restarting pybal on lvs1006 (eqiad backup) for eventschema service add
* 14:59 volans: uploaded spicerack_0.0.23-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 12:59 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216499|T216499]] [[phab:T216598|T216598]] Enable Priority Hints and Element Timing on eswiki (duration: 00m 56s)
* 08:45 akosiaris: restart gerrit to pick up https://gerrit.wikimedia.org/r/504981
* 06:39 elukey: roll restart of druid daemons on druid100[1-3] to pick up new jvm settings
 
== 2019-04-18 ==
* 23:16 mobrovac: evening SWAT completed
* 23:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes: (no justification provided) (duration: 00m 54s)
* 23:10 ejegg: updated payments-wiki from {{Gerrit|aa8dad50e7}} to {{Gerrit|eb3d0f35de}}
* 23:07 mobrovac@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wikimania years namespaces to wgNamespacesWithSubpages - [[phab:T220950|T220950]] (duration: 00m 53s)
* 23:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:00 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 22:40 ejegg: updated payments-wiki from {{Gerrit|aa8dad50e7}} to {{Gerrit|2f7cd8f195}}
* 22:14 mutante: LDAP - adding 'ldoan' and 'schang' to 'wmf' ([[phab:T221118|T221118]])
* 22:01 XioNoX: remove asw2-a-eqiad license keys for troubleshooting
* 21:58 ejegg: rolled back payments-wiki to {{Gerrit|aa8dad50e7}}
* 21:55 mutante: LDAP - adding rosalie-wmde to group 'wmde' ([[phab:T220691|T220691]])
* 21:52 ejegg: updated payments-wiki from {{Gerrit|aa8dad50e7}} to {{Gerrit|2f7cd8f195}}
* 21:28 mutante: puppetmaster1001 - mcrouter_generate_certs --generate
* 21:18 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt) (duration: 00m 10s)
* 21:18 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (cobalt)
* 21:17 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001) (duration: 00m 11s)
* 21:17 thcipriani@deploy1001: Started deploy [gerrit/gerrit@e3c340f]: plugin update -- no restart needed (gerrit2001)
* 21:14 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 21:14 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 20:56 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]]
* 20:52 cdanis: root@icinga1001.wikimedia.org /var/lib/icinga # for DOWNTIME in $(fgrep -B12 'comment=mobrovac: temp stop JQ for [[phab:T221368|T221368]] - cdanis@cumin1001' retention.dat {{!}} grep -A13 servicedowntime {{!}} grep downtime_id {{!}} cut -d= -f2); do  printf "[%lu] DEL_SVC_DOWNTIME;%u\n" $(date +%s) $DOWNTIME ; done > rw/icinga.cmd
* 20:40 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/utils/MessageUpdateJob.php: Translate jobs: Remove problematic Job::$params assignments, dir 2/2 - [[phab:T221368|T221368]] (duration: 01m 00s)
* 20:39 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Translate/tag: Translate jobs: Remove problematic Job::$params assignments, dir 1/2 - [[phab:T221368|T221368]] (duration: 01m 01s)
* 20:32 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'enable-puppet "mobrovac: temp stop JQ for [[phab:T221368|T221368]]"'
* 20:31 mobrovac@deploy1001: Finished deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors (duration: 00m 51s)
* 20:30 mobrovac@deploy1001: Started deploy [cpjobqueue/deploy@71941b1]: Ignore Kafka disconnect errors
* 19:36 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cookbook sre.hosts.downtime -r "mobrovac: temp stop JQ for [[phab:T221368|T221368]]" 'scb*'
* 19:36 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:36 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
* 19:29 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'scb*' 'disable-puppet "mobrovac: temp stop JQ for [[phab:T221368|T221368]]" && systemctl stop cpjobqueue'
* 19:17 mobrovac@deploy1001: Started restart [cpjobqueue/deploy@922cbc0]: Bounce CP4JQ, lots of transport broken failures - [[phab:T221368|T221368]]
* 19:11 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/EventFactory.php: Remove the use of page titles in JobExecutor, file 2/2 - [[phab:T221368|T221368]] (duration: 00m 59s)
* 19:10 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Remove the use of page titles in JobExecutor, file 1/2 - [[phab:T221368|T221368]] (duration: 01m 01s)
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.decommission
* 18:41 mutante: mw2150 - reimaging, not in confctl
* 18:02 dzahn@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw2151.codfw.wmnet,cluster=jobrunner,service=nginx
* 17:49 mutante: mw2151 - scap pull
* 17:46 mobrovac@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/EventBus/includes/JobExecutor.php: Default to a dummy title for invalid titles - [[phab:T221368|T221368]] (duration: 01m 01s)
* 17:20 twentyafterfour@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/AbuseFilter/includes/: sync https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/504863 (duration: 01m 00s)
* 16:20 bblack: Experimental DNS-level changes deploying for wikipedia.org domain - if wikipedia.org DNS problems appear, revert https://gerrit.wikimedia.org/r/c/operations/dns/+/504588 - [[phab:T208263|T208263]]
* 16:17 XioNoX: remove peering to 63199 in eqsin (down for 1 month, no reply to emails)
* 16:13 XioNoX: rollback dhcp option 82 test from asw2-b-eqiad
* 14:55 fsero: synchronizing docker_registry_codfw swift container from docker_registry
* 14:40 XioNoX: push firewall change to pfw3-eqiad - [[phab:T221278|T221278]]
* 13:30 jbond42: rolling updates of ruby2.1 on jessie
* 13:08 elukey: roll restart of cassandra on aqs* to pick up new openjdk upgrades
* 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:58 reedy@deploy1001: rebuilt and synchronized wikiversions files: group1 back to .25
* 12:36 anomie: Ran `php7adm /opcache-free` on mw1274 to test a theory related to [[phab:T221347|T221347]]. The log entries related to that task stopped immediately.
* 12:30 gehel: restarting blazegraph + updater on wdqs* for jvm upgrade
* 12:22 moritzm: installing Java security updates on restbase-dev hosts (along with Cassandra restarts)
* 12:21 gehel: restarting blazegraph + updater on wdqs1009 / wdqs1010 for jvm upgrade
* 12:19 moritzm: installing Java security updates on WDQS autodeploy/test hosts
* 10:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:40 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:35 moritzm: installing rails security updates on jessie hosts
* 10:21 moritzm: installing jasper updates on jessie hosts
* 09:44 akosiaris: update grafana service/ dashboard to have user, system, throttled CPU metrics under the CPU saturation row
* 09:41 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216597|T216597]] Run CPU benchmark for all samples on eswiki/ruwiki (duration: 01m 06s)
* 09:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:10 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:54 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 08:53 elukey: reboot kafka10[12-23] (old Analytics cluster) for kernel + openjdk upgrades
* 08:23 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 08:14 moritzm: installing libssh2 security updates on jessie
* 08:01 moritzm: restarting mw1261-mw1265 to pick up new libssh2
* 07:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:55 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:53 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2004.codfw.wmnet
* 07:28 moritzm: installing libssh2 security updates
* 07:19 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 06:58 moritzm: restarting icinga on icinga1001 ([[phab:T196336|T196336]])
* 06:37 moritzm: rolling reboots of Swift backends in eqiad for combined kernel/glibc/OpenSSL update
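
The 20:52 entry above removes Icinga service downtimes by grepping <code>retention.dat</code> for a comment and writing <code>DEL_SVC_DOWNTIME</code> lines to the command pipe. A minimal Python sketch of the same idea follows. It assumes that <code>retention.dat</code> stores each downtime as a <code>servicedowntime { ... }</code> block of <code>key=value</code> lines; the function names <code>parse_blocks</code> and <code>del_svc_downtime_commands</code> are hypothetical, and the sketch only builds the command strings without writing to <code>rw/icinga.cmd</code>.

```python
import time


def parse_blocks(text, block_name):
    """Yield a dict of key=value pairs for each `block_name { ... }` block.

    Assumption: blocks open with a line "<block_name> {" and close with "}".
    """
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"{block_name} {{":
            current = {}
        elif line == "}":
            if current is not None:
                yield current
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value


def del_svc_downtime_commands(text, comment_substring, now=None):
    """Build one external-command line per matching service downtime.

    Mirrors the printf format "[%lu] DEL_SVC_DOWNTIME;%u" used in the log entry.
    """
    timestamp = int(time.time()) if now is None else int(now)
    return [
        f"[{timestamp}] DEL_SVC_DOWNTIME;{int(block['downtime_id'])}"
        for block in parse_blocks(text, "servicedowntime")
        if "downtime_id" in block
        and comment_substring in block.get("comment", "")
    ]
```

Matching on whole blocks rather than fixed <code>-B12</code>/<code>-A13</code> line offsets avoids depending on how many fields each block happens to contain.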
 
== 2019-04-17 ==
* 22:46 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/includes/: {{Gerrit|I3a50508178159}} (duration: 01m 21s)
* 22:40 XioNoX: push firewall change to pfw3-codfw - [[phab:T221278|T221278]]
* 22:28 krinkle@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Score/: {{Gerrit|Id58156cfca805}} / [[phab:T219342|T219342]] (duration: 01m 03s)
* 21:30 XioNoX: enable option-82 on asw2-b:cloud-hosts1-b-eqiad vlan
* 21:10 thcipriani: gerrit back
* 21:07 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming) (duration: 00m 10s)
* 21:07 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (cobalt -- restart incoming)
* 21:06 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only) (duration: 00m 11s)
* 21:06 thcipriani@deploy1001: Started deploy [gerrit/gerrit@4dcb851]: Gerrit update (gerrit2001 only)
* 19:14 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]] (duration: 01m 49s)
* 19:13 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]]
* 18:04 thcipriani: gerrit back
* 18:01 thcipriani: gerrit restart for https://gerrit.wikimedia.org/r/504611/
* 17:59 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SDC: Enable Wikidata federation on Commons again [[phab:T214075|T214075]] (duration: 01m 00s)
* 17:20 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventGate api-request logging on group1 wikis (duration: 01m 00s)
* 17:18 mutante: LDAP - added 'brennen' to group 'gerritadmin' ([[phab:T218858|T218858]])
* 17:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/OATHAuth/: UBN [[phab:T221257|T221257]] train un-blocker (duration: 01m 02s)
* 17:09 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/Echo/includes/formatters/: Notifications: Revert {{Gerrit|7121b9c4}} per {{Gerrit|I8f9a6a19ba}} (duration: 01m 01s)
* 16:49 tzatziki: deleting three files for legal compliance
* 16:47 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/WikibaseMediaInfo/: SDC: Various fixes [[phab:T218922|T218922]] [[phab:T221071|T221071]] [[phab:T221110|T221110]] [[phab:T221123|T221123]] (duration: 01m 02s)
* 16:41 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/autoload.php: Update to point to new maintenance scripts (duration: 01m 00s)
* 16:39 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUpperCharTable.php: Maintenance script for _joe_ (duration: 00m 59s)
* 16:38 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/maintenance/language/generateUcfirstOverrides.php: Maintenance script for _joe_ (duration: 01m 00s)
* 16:21 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/languages/Language.php: [[phab:T219279|T219279]] Ability to set wgOverrideUcfirstCharacters part 1 try two (duration: 01m 00s)
* 16:18 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/includes/DefaultSettings.php: [[phab:T219279|T219279]] Ability to set wgOverrideUcfirstCharacters part 1b (duration: 01m 03s)
* 16:13 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 16:11 XioNoX: set fasw-c-eqiad:ge-[0-1]/0/17 in admin vlan - [[phab:T221232|T221232]]
* 16:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT [[phab:T220434|T220434]] Deploy Partial blocks to Chinese Wikipedia (duration: 01m 02s)
* 14:37 ariel@deploy1001: Finished deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter (duration: 00m 04s)
* 14:36 ariel@deploy1001: Started deploy [dumps/dumps@dcf04a0]: fix up paths for 1.34_wmf.1 for AbstractFilter
* 14:35 otto@deploy1001: scap-helm eventgate-analytics finished
* 14:35 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 14:35 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 14:34 otto@deploy1001: scap-helm eventgate-analytics finished
* 14:34 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 14:34 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 14:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:56 otto@deploy1001: scap-helm eventgate-analytics finished
* 13:56 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 13:56 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 13:52 elukey: upgrading hadoop cdh distribution to 5.16.1 on all the Hadoop-related nodes - [[phab:T218343|T218343]]
* 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 13:48 godog: reimage prometheus2004 - [[phab:T187987|T187987]]
* 12:57 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1004.eqiad.wmnet
* 12:44 godog: bounce prometheus instances on prometheus[12]003 after https://gerrit.wikimedia.org/r/c/operations/puppet/+/499742
* 12:33 moritzm: running some ferm tests on graphite2002
* 12:10 godog: briefly stop all prometheus on prometheus1003 to finish metrics rsync - [[phab:T187987|T187987]]
* 11:39 Lucas_WMDE: EU SWAT done
* 11:38 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:504380{{!}}Enable suggestion constraint status on testwikidata (T221108, T204439)]] (duration: 01m 01s)
* 10:58 volans@deploy1001: Finished deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9 (duration: 01m 00s)
* 10:57 volans@deploy1001: Started deploy [debmonitor/deploy@f049b3b]: Deploy Debmonitor v0.1.9
* 10:40 moritzm: installing Java security updates on kafka/analytics cluster
* 09:17 godog: swift eqiad-prod continue ms-be1013 decom - [[phab:T220590|T220590]]
* 09:09 elukey: restart eventlogging on eventlog1002 due to errors in processors and consumer lag accumulated after the last Kafka Jumbo roll restart
* 08:47 godog: reimage prometheus1004 - [[phab:T187987|T187987]]
* 08:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 fully (duration: 01m 00s)
* 08:29 moritzm: installing ghostscript security updates
* 07:51 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming: [[phab:T216597|T216597]] Event timing support (duration: 01m 01s)
* 07:45 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T216597|T216597]] Enable Event Timing origin trial on ruwiki and eswiki (duration: 01m 04s)
* 07:21 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 with low load (duration: 01m 18s)
* 07:07 moritzm: rolling reboots of Swift backends in codfw for combined kernel/glibc/OpenSSL update
 
== 2019-04-16 ==
* 23:42 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Return CirrusSearch to standard execution against eqiad cluster (duration: 01m 00s)
* 23:37 ebernhardson@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/CirrusSearch/includes/: Fix fatals on malformed search queries against overridden clusters (duration: 01m 06s)
* 22:42 thcipriani: gerrit back
* 22:39 thcipriani: restarting gerrit for configuration update https://gerrit.wikimedia.org/r/504448
* 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T165795|T165795]] Give bureaucrats the usermerge right (duration: 00m 59s)
* 22:20 jforrester@deploy1001: Synchronized php-1.34.0-wmf.1/extensions/NewUserMessage/includes/NewUserMessage.php: Disable onLocalUserCreated for known bot accounts (duration: 01m 01s)
* 22:17 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - [[phab:T215960|T215960]] (duration: 20m 02s)
* 22:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T165795|T165795]] Enable the UserMerge extension for clean-up on wikitech (duration: 01m 00s)
* 21:57 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d]: mobile-sections simplification: use the key/value bucket only - [[phab:T215960|T215960]]
* 21:56 eileen: civicrm revision changed from {{Gerrit|1bc1570967}} to {{Gerrit|31982324b8}}, config revision is {{Gerrit|e5a7908330}}
* 21:56 mobrovac@deploy1001: Finished deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only (duration: 05m 24s)
* 21:50 mobrovac@deploy1001: Started deploy [restbase/deploy@f1c767d] (dev-cluster): mobile-sections simplification: use the key/value bucket only
* 21:47 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]]
* 21:24 andrewbogott: deleting 'eqiad' endpoint in keystone
* 21:21 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]] (duration: 36m 47s)
* 21:09 XioNoX: add wpao to wmf/ops in LDAP - [[phab:T221142|T221142]]
* 21:02 cdanis@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 20:59 otto@deploy1001: scap-helm eventgate-analytics finished
* 20:58 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 20:58 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 20:55 andrewbogott: removing keystone endpoints for the 'eqiad' region
* 20:45 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.34.0-wmf.1  refs [[phab:T220726|T220726]]
* 20:43 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - [[phab:T215960|T215960]] (duration: 19m 52s)
* 20:43 otto@deploy1001: scap-helm eventgate-analytics finished
* 20:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 20:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 20:23 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6]: Use the simplified key/value bucket - [[phab:T215960|T215960]]
* 20:19 ariel@deploy1001: Finished deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only (duration: 00m 04s)
* 20:19 ariel@deploy1001: Started deploy [dumps/dumps@796ccb5]: use safe_load yaml and getReplicaServer.php, cleanup symlinks once per job only
* 20:11 mobrovac@deploy1001: Finished deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket (duration: 05m 24s)
* 20:05 mobrovac@deploy1001: Started deploy [restbase/deploy@dfca9e6] (dev-cluster): Use the simplified key/value bucket
* 20:04 otto@deploy1001: scap-helm eventgate-analytics finished
* 20:04 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 20:04 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:59 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:59 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 19:59 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:56 gehel: restarting cassandra on maps* for config change - [[phab:T221055|T221055]]
* 19:49 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:49 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 19:49 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:48 otto@deploy1001: scap-helm eventgate-analytics finished
* 19:48 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 19:48 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.debug_mode_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 19:11 twentyafterfour: twentyafterfour@deploy1001:/srv/mediawiki-staging$ scap prep 1.34.0-wmf.1
* 19:07 bblack: restarting varnish backend on cp1083
* 19:04 bblack: restarting varnish backend on cp1085
* 18:55 cdanis: cdanis@cp1085.eqiad.wmnet ~ % sudo -i depool
* 18:53 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:53 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:53 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set main_app.profiling_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:46 twentyafterfour: branching 1.34.0-wmf.1 refs [[phab:T220726|T220726]]
* 18:25 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:25 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:14 cmjohnson1: powering off mw1280 to replace DIMM
* 18:08 mutante: restbase2007, restbase2008 - re-enabled puppet which was disabled with reason 'decom'ed' but actually needed to run to decom after they had moved to role::spare::system ([[phab:T208087|T208087]])
* 17:56 reedy@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikimediaIncubator/: [[phab:T220623|T220623]] (duration: 00m 53s)
* 17:47 herron: beginning rolling ELK upgrade to 5.6.15
* 17:46 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[gerrit:504386{{!}}no-op preparatory change (T221107)]] (duration: 00m 52s)
* 17:36 arturo: toolforge k8s reallocation (from nova-network to neutron) is causing troubles with IRC bots, expect missing entries in the SAL
* 17:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:28 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 17:27 andrewbogott: restarting rabbitmq on cloudcontrol1003
* 17:26 cdanis@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,name=mw1280.eqiad.wmnet,cluster=api_appserver
* 17:25 arturo: rebooted cloudnet1003
* 17:24 gehel: force initialization of unassigned shards on elasticsearch eqiad
* 17:16 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:504374{{!}}no-op preparatory change (T221108)]] (duration: 00m 52s)
* 16:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript extensions/WikibaseQualityConstraints/maintenance/ImportConstraintEntities.php --wiki=testwikidatawiki --config-format=wgConf {{!}} tee [[phab:T221108|T221108]].php
* 16:53 mutante: bast2001 - shutdown -h now - decom'ed ([[phab:T219492|T219492]])
* 16:48 mutante: puppet node clean bast2001.wikimedia.org ; puppet node deactivate bast2001.wikimedia.org ; it showed up in Icinga again despite running decom cookbook ([[phab:T219492|T219492]])
* 16:47 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 16:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values --set wmfdebug_enabled=true stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 16:44 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:44 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 16:44 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 16:43 jynus: upgrading and shutting down db1078 [[phab:T219115|T219115]]
* 16:41 jynus: disabling notifications on db1078 [[phab:T219115|T219115]]
* 16:37 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 (duration: 00m 52s)
* 15:36 arturo: reimaging cloudnet2002-dev because role name change
* 15:21 otto@deploy1001: scap-helm eventgate-analytics finished
* 15:21 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 15:20 otto@deploy1001: scap-helm eventgate-analytics upgrade staging --version 0.0.28 -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:19 otto@deploy1001: scap-helm eventgate-analytics finished
* 15:19 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 15:19 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:18 otto@deploy1001: scap-helm eventgate-analytics finished
* 15:18 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 15:18 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:16 elukey: roll restart kafka on kafka-jumbo100[1-6] to pick up openjdk upgrades
* 14:58 gehel: manual data transfer from wdqs1008 to wdqs1009 - [[phab:T220830|T220830]]
* 14:56 ema: swift-fe-eqiad: nginx reload for new TLS certificate [[phab:T204245|T204245]]
* 14:53 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:52 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:51 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1005.eqiad.wmnet
* 14:45 ema: test https://gerrit.wikimedia.org/r/504340 on ms-fe1005 [[phab:T204245|T204245]]
* 14:30 ema: swift-fe-codfw: nginx reload for new TLS certificate [[phab:T204245|T204245]]
* 14:22 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:21 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:20 elukey: roll restart of all the druid daemons on druid100[1-6] to pick up new openjdk updates
* 14:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2005.codfw.wmnet
* 14:07 jijiki: Pooling thumbor1001
* 14:04 ema: test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504331/ on ms-fe2005 [[phab:T204245|T204245]]
* 14:01 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2005.codfw.wmnet
* 14:01 jijiki: Depooling thumbor1001
* 13:58 jijiki: Disable puppet on thumbor1001 for ~24h to serve traffic via haproxy - [[phab:T187765|T187765]]
* 13:54 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:53 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:52 jijiki: Enable puppet on thumbor*
* 13:42 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 13:41 gehel@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:39 gehel: resetting cookbooks repo on cumin1001 (local changes)
* 13:34 jijiki: Disabling puppet on thumbor* to merge 504284
* 13:13 ema: cp-ats: upgrade fifo-log-demux to 0.2 and restart services
* 13:10 ema: fifo-log-demux 0.2 uploaded to stretch-wikimedia
* 13:03 arturo: [[phab:T220095|T220095]] renaming/reimaging labtestcontrol2003 as cloudcontrol2003-dev
* 12:58 moritzm: installing ghostscript update on thumbor1001
* 12:54 gehel: cleanup redundant prometheus-elasticsearch units on elasticsearch servers
* 12:52 godog: swift eqiad-prod continue ms-be1013 decom - [[phab:T220590|T220590]]
* 12:17 moritzm: installing OpenSSL 1.0.2 updates on cp* Varnish hosts
* 12:07 arturo: rebooting cloudvirt200[123]-dev because of deep changes in config
* 11:18 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgWikibaseMusicalNotationLineWidthInches to config ([[phab:T218191|T218191]]) (duration: 00m 52s)
* 11:10 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Revert "WikibaseClient: Conditionally enable mapframe support" ([[phab:T218051|T218051]]) (duration: 00m 51s)
* 11:08 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable signatures in 2019: NS (ID 128) for wikimaniawiki ([[phab:T221062|T221062]]) (duration: 00m 52s)
* 10:49 gilles: [[phab:T221065|T221065]] eswiki purge finished
* 10:45 moritzm: installing libjs-bootstrap updates from Stretch point release
* 10:21 gilles: [[phab:T221065|T221065]] mwscript purgeList.php eswiki --all --verbose on mwmaint1002
* 10:21 moritzm: installing xapian-core update from stretch point release
* 10:18 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T221065|T221065]] Set up origin trials on Spanish Wikipedia mobile site (duration: 00m 52s)
* 09:59 jijiki: Enabling puppet again on dbproxy* and thumbor*
* 09:51 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Reduce db1078 load (duration: 00m 53s)
* 09:37 jijiki: Disabling puppet on dbproxy* and thumbor* to merge 502972
* 09:26 fsero: [late logging] swift container-to-container synchronization enabled between docker_registry_eqiad and docker_registry_codfw swift containers at 08:15:00 UTC
* 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
* 09:05 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=nginx
* 09:05 ema: cp1076: repool varnish-fe pointing to Varnish [[phab:T213263|T213263]]
* 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=varnish-fe
* 08:57 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1076.eqiad.wmnet,service=nginx
* 08:57 ema: cp1076: depool varnish-fe in preparation of traffic switchback to Varnish [[phab:T213263|T213263]]
* 08:40 hoo: Updated the Wikidata property suggester with data from the 2019-04-08 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 08:33 moritzm: rebooting ms-be1020 for combined kernel/glibc/OpenSSL update
* 08:01 moritzm: rebooting Swift frontends in codfw for combined kernel/glibc/OpenSSL security updates
* 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=varnish-fe
* 07:50 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2002.codfw.wmnet,service=nginx
* 07:50 ema: cp2002: repool varnish-fe pointing to Varnish [[phab:T213263|T213263]]
* 07:47 moritzm: rebooting Swift frontends in eqiad for combined kernel/glibc/OpenSSL security updates
* 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=varnish-fe
* 07:45 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2002.codfw.wmnet,service=nginx
* 07:45 ema: cp2002: depool varnish-fe in preparation of traffic switchback to Varnish [[phab:T213263|T213263]]
* 07:36 marostegui: Upgrade db2093
* 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=varnish-fe
* 07:33 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2005.codfw.wmnet,service=nginx
* 07:32 ema: cp2005: repool varnish-fe pointing to Varnish [[phab:T213263|T213263]]
* 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=varnish-fe
* 07:25 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2005.codfw.wmnet,service=nginx
* 07:25 ema: cp2005: depool varnish-fe in preparation of traffic switchback to Varnish [[phab:T213263|T213263]]
* 07:11 moritzm: upgrading Java on Hadoop/Kafka/Jumbo/Druid clusters
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 00m 31s)
* 01:46 aaron@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/Parser.php: {{Gerrit|73529ae6c5ffb6}} (duration: 00m 53s)
* 00:34 onimisionipe: pooled maps2003 - postgres init complete!
* 00:33 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|I7589aa153}} (duration: 00m 52s)
* 00:33 urandom: creating new restbase schema -- [[phab:T221031|T221031]]
 
== 2019-04-15 ==
* 23:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 23:20 cdanis: cdanis@icinga1001.wikimedia.org ~ % sudo systemctl restart tcpircbot-logmsgbot.service
* 23:17 bd808: scap: SWAT: [[gerrit:497423{{!}}wikitech: Use cn:caseExactMatch: as account search filter]] ([[phab:T165795|T165795]])
* 20:59 thcipriani: gerrit back
* 20:57 gehel: shutting down blazegraph and updater on wdqs1010, waiting for data reimport
* 20:55 thcipriani: gerrit restart to pick up gc log changes incoming
* 20:37 arlolra: Updated Parsoid to {{Gerrit|83c17fc9}}
* 20:23 Amir1: the ores deployment is over
* 19:49 XioNoX: export BGP communities (prepend x3 outside asia) to AS3491 in eqsin
* 19:46 mutante: bromine/vega: rm /etc/rsyncd.conf ; systemctl stop rsync (clean up old rsync config gerrit:503961)
* 19:45 XioNoX: update (and add) AS3491 BGP communities in eqsin
* 18:58 XioNoX: update mr1-* security policies - [[phab:T219384|T219384]]
* 18:41 onimisionipe: depooling maps2003 for postgres init
* 18:40 onimisionipe: pooling maps2002 - postgres init complete
* 18:39 Amir1: Morning SWAT is done
* 18:35 shdubsh: logstash1009: disabling puppet and testing logstash config
* 18:09 mutante: LDAP - adding legoktm and qchris to gerritadmin group ([[phab:T219086|T219086]])
* 17:45 anomie: Backporting fix for [[phab:T220991|T220991]]
* 17:41 akosiaris: force puppet agent run on maps* after moving config-vars.yaml file for kartotherian, tilerator, tileratorui [[phab:T220982|T220982]]
* 17:33 mutante: LDAP - re-adding 'pbj' to 'nda' group, extended access until May 6th, transparency report contractor
* 17:23 mutante: wikibugs - qdel'ed jobs and restarted another time, make it rejoin
* 17:17 onimisionipe: wdqs deployment is complete! For some reason scap did not log here
* 17:17 herron: restarted logstash on logstash1007
* 17:15 mutante: restarted wikibugs because it stopped talking
* 16:08 onimisionipe: pooling maps2001 - postgres reinit is complete
* 15:55 Reedy: changed /srv/mediawiki/docroot/wikimedia.org to a symlink to standard-docroot
* 15:53 XioNoX: add cloud-in4 firewall filter to codfw - [[phab:T211921|T211921]]
* 15:31 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9* on all elastic nodes
* 15:30 onimisionipe: restarting prometheus-wmf-elasticsearch-exporter-9200 on all elastic nodes
* 15:28 _joe_: systemctl reset-failed on ms-be1027, debmonitor session
* 15:24 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T219871|T219871]])
* 14:55 gehel: deploying tilerator to maps1001 to validate deployment is working - [[phab:T220982|T220982]]
* 14:55 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T219871|T219871]])
* 14:43 _joe_: running apply-config-tilerator on maps1001
* 14:40 _joe_: running apply-config-karthoterian on maps1001
* 14:22 cdanis: [[phab:T220982|T220982]] cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
* 14:21 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps1*' "disable-puppet 'bad permissions - [[phab:T220982|T220982]] - cdanis'"
* 14:18 cdanis: cdanis@cumin1001.eqiad.wmnet ~ % sudo cumin 'maps*' 'sudo chmod -R a+r /srv/deployment/tilerator /srv/deployment/kartotherian'
* 14:18 gehel: resetting permissions on maps servers for /srv/deployment/kartotherian and /srv/deployment/tilerator
* 14:04 moritzm: rebooting ms-fe1005 for combined kernel/glibc/OpenSSL update
* 13:57 jbond42: upgrading puppet 4 -> 5 and facter 2 -> 3 on mediawiki::canary_appserver, mediawiki::appserver::canary_api and cache::cache roles
* 13:56 gehel: restart tilerator / kartotherian on all maps servers for openssl update
* 13:55 godog: start ms-be1013 decom - [[phab:T220590|T220590]]
* 13:42 godog: reboot ms-be1013
* 13:09 moritzm: installing wget security updates on trusty hosts
* 12:59 moritzm: restarting archiva on archiva1001 for OpenJDK security update
* 12:50 moritzm: restarting Apache on matomo1001 to pick up OpenSSL update
* 12:14 moritzm: rolling restart of HHVM/Apache on deployment servers to pick up OpenSSL update
* 11:59 fsero: pointing boron docker builds to the new registry temporarily (docker builds on boron might fail)
* 11:35 Amir1: EU swat is done
* 11:26 moritzm: rolling restart of HHVM/Apache on labweb* to pick up OpenSSL update
* 09:58 moritzm: installing openssl1.0 security updates
* 09:18 gehel: unbanning elastic1029 from cluster
* 08:58 moritzm: updating mediawiki servers in eqiad to version 1.8.1 of the PHP extension for wikidiff
* 08:29 onimisionipe: increase wal_keep_segments on codfw maps master
* 08:19 moritzm: updating mediawiki servers in codfw to version 1.8.1 of the PHP extension for wikidiff
* 07:50 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/initSiteStats.php --wiki=hywwiki --active  ([[phab:T220936|T220936]])
* 05:31 marostegui: Upgrade db1100
* 05:07 marostegui: powercycle mw1280 (crashed)
 
== 2019-04-14 ==
* 06:10 ebernhardson: unban elastic1027 from eqiad-psi
* 05:36 ebernhardson: unbanning elastic1027 after about half the shards left and load dropped
* 05:31 ebernhardson: ban elastic1027 from elasticsearch-psi in eqiad
* 04:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad on elastic1027 due to 100% cpu for last 30+ minutes
 
== 2019-04-13 ==
* 18:46 godog: 3h downtime for cloudvirt1015
* 15:58 ebernhardson: restart elasticsearch on elastic1027
* 15:34 shdubsh: restart recommendation_api on scb1001
* 15:33 shdubsh: restart recommendation_api on scb2001
* 10:46 onimisionipe: depooling maps2001 for postgres init
* 08:05 gehel: repooling wdqs1008 - data transfer completed - [[phab:T220830|T220830]]
* 00:32 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/: {{Gerrit|Idc19cc29764a}} / [[phab:T220854|T220854]] - hot fix (duration: 05m 37s)
 
== 2019-04-12 ==
* 21:16 Krinkle: scap was unable to sync to 1 apache (connect to host cloudweb2001-dev.wikimedia.org port 22: Connection timed out)
* 21:10 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/ImageMap/includes/ImageMap.php: {{Gerrit|I0ee84f059da}} / [[phab:T217087|T217087]] (duration: 05m 12s)
* 19:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 19:27 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 17:17 onimisionipe: depooling maps2002 for postgres init
* 17:16 onimisionipe: repooling maps2001 - postgres init is complete
* 16:14 elukey: install ifstat on all the mc1* hosts for network bandwidth investigation
* 15:56 gehel: starting data transfer from wdqs1008 to wdqs1009 - [[phab:T220830|T220830]]
* 15:32 thcipriani: gerrit back
* 15:29 thcipriani: gerrit restart incoming
* 14:29 onimisionipe: depool maps2001 for postgres initialization
* 13:24 akosiaris: re-enable puppet across the fleet. Patch merged, recovery storm coming
* 13:18 akosiaris: disable puppet across the fleet to avoid incoming puppet alert storm
* 12:57 marostegui: Purge old rows and optimize tables on spare host pc1010 [[phab:T210725|T210725]]
* 12:53 urandom: decommissioning cassandra-c, restbase2008 -- [[phab:T208087|T208087]]
* 12:49 gehel: rolling restart of cassandra on maps* for jvm upgrade
* 12:22 arturo: [[phab:T220095|T220095]] disable icinga checks for labtestcontrol2003
* 12:16 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220807|T220807]] Reduce cawiki survey sampling rate (duration: 05m 11s)
* 11:56 moritzm: upgrading app server canaries to version 1.8.1 of the PHP wikidiff extension (HHVM already deployed) [[phab:T203069|T203069]]
* 11:46 moritzm: upgrading acmechief hosts to latest buster state
* 11:44 gilles@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T220807|T220807]] Oversample navtiming on cawiki and commonswiki (duration: 05m 14s)
* 11:37 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw complete ([[phab:T217806|T217806]])
* 11:19 moritzm: installed Java security updates on relforge* hosts
* 11:10 moritzm: installing Java security updates on remaining maps hosts
* 10:32 arturo: [[phab:T219626|T219626]] reimaging cloudcontrol2001-dev
* 10:13 elukey: matomo updated to 3.9.1 on matomo1001 + deb upload to wikimedia-stretch - [[phab:T218037|T218037]]
* 09:53 moritzm: updated mwdebug1001 to php-wikidiff 1.8.1
* 09:37 moritzm: updated mwdebug1002 to php-wikidiff 1.8.1
* 09:30 volans: reset mgmt card on labtestcontrol2003 - [[phab:T220783|T220783]]
* 09:07 moritzm: added the wikimedia repository key to the stretch build chroot on boron, fixes builds using the PHP72/SPICERACK hooks
* 09:05 arturo: [[phab:T218021|T218021]] disable icinga checks for labtestcontrol2001
* 08:35 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/NavigationTiming/modules/ext.navigationTiming.js: [[phab:T220788|T220788]] Fix veaction === null case (duration: 00m 54s)
* 08:02 moritzm: updated ssacli in thirdparty/hwraid component for stretch to 3.30-13.0 [[phab:T220787|T220787]]
* 07:12 marostegui: Manually install ssacli on db2[097{{!}}098{{!}}099{{!}}100{{!}}101{{!}}102] [[phab:T220787|T220787]] [[phab:T220572|T220572]]
* 07:04 moritzm: synced ssacli to thirdparty/hwraid components for jessie/stretch [[phab:T220787|T220787]]
* 01:00 mutante: puppet cert clean, puppet node clean, puppet node deactivate on cloudnet2001-dev.codfw.wmnet  ([[phab:T218025|T218025]])
* 00:25 tstarling@deploy1001: Synchronized wmf-config/profiler.php: increase excimer max depth (duration: 00m 53s)
* 00:02 ejegg: updated fundraising CiviCRM from {{Gerrit|24b968b1f9}} to {{Gerrit|1bc1570967}}
 
== 2019-04-11 ==
* 23:57 urandom: decommissioning cassandra-b, restbase2008 -- [[phab:T208087|T208087]]
* 22:15 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/WikibaseMediaInfo/resources/: Hot-deploy fix for WBMI variable cache miss [[phab:T220665|T220665]] (duration: 00m 55s)
* 20:46 mutante: deleting job of wikibugs-phab-listener in an attempt to restart it
* 19:47 cdanis: cdanis@mwdebug1001.eqiad.wmnet ~ % sudo systemctl stop hhvm && sudo rm /var/cache/hhvm/fcgi.hhbc.sq3 && sudo systemctl start hhvm
* 19:39 twentyafterfour: mediawiki error rate seems to be back to normal after deploying 1.33.0-wmf.25, the new branch looks stable refs [[phab:T206679|T206679]]
* 18:55 mutante: disabling puppet on hosts using class 'confd' to safely deploy gerrit:456317
* 18:55 Trey314159: reindexing Greek, Turkish, and Irish wikis on elastic@eqiad and elastic@codfw  ([[phab:T217806|T217806]])
* 18:01 onimisionipe: increase replication factor on maps codfw cluster
* 17:45 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment (duration: 00m 22s)
* 17:45 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@5394b59] (stretch): Insert maps2001 into stretch environment
* 17:22 mbsantos@deploy1001: Finished deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to {{Gerrit|8988283}} ([[phab:T213362|T213362]], [[phab:T216191|T216191]], [[phab:T212322|T212322]]) (duration: 01m 33s)
* 17:21 mbsantos@deploy1001: Started deploy [proton/deploy@5cb8bbe]: Update chromium-renderer to {{Gerrit|8988283}} ([[phab:T213362|T213362]], [[phab:T216191|T216191]], [[phab:T212322|T212322]])
* 16:25 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:48 otto@deploy1001: scap-helm eventgate-analytics finished
* 15:47 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 15:47 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:42 otto@deploy1001: scap-helm eventgate-analytics finished
* 15:42 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 15:42 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:41 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 15:36 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code (duration: 00m 22s)
* 15:35 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@13d9ebb] (stretch): Update stretch instance with latest code
* 15:23 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:503008{{!}}no-op comment update]] (duration: 01m 00s)
* 15:06 cdanis@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 14:53 paravoid: rebooting labnet1002
* 14:49 vgutierrez: uploaded acme-chief 0.16 to apt.wikimedia.org (buster) - [[phab:T207461|T207461]]
* 14:47 urandom: decommissioning cassandra-a, restbase2008 -- [[phab:T208087|T208087]]
* 14:46 akosiaris: cxserver Add garbage collection graphs under saturation. [[phab:T205911|T205911]]
* 14:18 Amir1: Deployment of Url shortener is done now
* 14:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy UrlShortener to metawiki, let's get the party started ([[phab:T108557|T108557]], [[phab:T44085|T44085]]) (duration: 01m 00s)
* 12:49 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: dc=codfw,cluster=maps,name=maps2001.codfw.wmnet
* 12:20 kartik@deploy1001: scap-helm cxserver finished
* 12:19 kartik@deploy1001: scap-helm cxserver cluster eqiad completed
* 12:19 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-eqiad-values.yaml production stable/cxserver [namespace: cxserver, clusters: eqiad]
* 12:16 kartik@deploy1001: scap-helm cxserver finished
* 12:16 kartik@deploy1001: scap-helm cxserver cluster codfw completed
* 12:15 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 12:12 kartik@deploy1001: scap-helm cxserver finished
* 12:12 kartik@deploy1001: scap-helm cxserver cluster staging completed
* 12:12 kartik@deploy1001: scap-helm cxserver upgrade -f cxserver-staging-values.yaml staging stable/cxserver [namespace: cxserver, clusters: staging]
* 11:40 zeljkof: EU SWAT finished
* 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:500692{{!}}Increase musical notation datatype string length limit (T218767)]] (duration: 01m 02s)
* 11:37 akosiaris@deploy1001: scap-helm cxserver finished
* 11:36 akosiaris@deploy1001: scap-helm cxserver cluster codfw completed
* 11:36 akosiaris@deploy1001: scap-helm cxserver upgrade -f cxserver-codfw-values.yaml production stable/cxserver [namespace: cxserver, clusters: codfw]
* 11:30 onimisionipe: removing maps2002 from cassandra cluster  due to dead node error
* 10:46 moritzm: upgrading remaining app servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 10:39 hashar: Upgrading CI Jenkins
* 10:21 volans: forcing puppet run on A:cp-upload_codfw
* 10:15 gehel: remove maps2001 from new cassandra cluster -[[phab:T198622|T198622]]
* 10:10 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 09:57 elukey: roll restart druid-coordinator/overlord on druid100[4-6] to pick up new jvm settings
* 09:01 moritzm: upgrading deployment servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 08:20 moritzm: upgrading remaining job runners to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 08:19 elukey: roll restart of druid-broker/historical on druid100[4-6] to pick up new settings
* 06:33 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (stretch-wikimedia / thirdparty/ci)
* 06:32 moritzm: uploaded jenkins 2.164.2 to apt.wikimedia.org (jessie-wikimedia / thirdparty)
* 06:24 moritzm: upgrading remaining API Servers to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 05:03 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove s3 read-only [[phab:T219115|T219115]] (duration: 00m 36s)
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Switchover s3 master eqiad from db1078 to db1075 [[phab:T219115|T219115]] (duration: 00m 36s)
* 05:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Set s3 on read-only [[phab:T219115|T219115]]  (duration: 00m 37s)
* 05:00 marostegui: Starting s3 failover from db1078 to db1075 - [[phab:T219115|T219115]]
* 04:32 marostegui: Disable puppet on db1078 and db1075 [[phab:T219115|T219115]]
* 04:18 marostegui: Start topology changes to move s3 slaves under db1075 [[phab:T219115|T219115]]
* 04:14 marostegui: Disable GTID on s3 hosts - https://phabricator.wikimedia.org/T219115
* 00:45 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/PageTriage/: UBN Fix for pageTriage and ORES [[phab:T220649|T220649]] (duration: 01m 04s)
* 00:12 twentyafterfour: deploying phabricator upgrade
 
== 2019-04-10 ==
* 20:43 urandom: decommissioning cassandra-c, restbase2007 -- [[phab:T208087|T208087]]
* 20:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert - Enabling api-request logging via eventgate-analytics for group1 wikis - [[phab:T214080|T214080]] (duration: 01m 00s)
* 19:48 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling api-request logging via eventgate-analytics for group1 wikis - [[phab:T214080|T214080]] (duration: 00m 59s)
* 19:42 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.25  refs [[phab:T206679|T206679]] (duration: 01m 48s)
* 19:40 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.25  refs [[phab:T206679|T206679]]
* 19:28 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.33.0-wmf.25  refs [[phab:T206679|T206679]]
* 19:26 XioNoX: enable sampling on cr2-eqiad external links, outbound
* 19:17 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.20 [keeping static files] (duration: 02m 18s)
* 19:14 ejegg: updated fundraising CiviCRM from {{Gerrit|d0e44a9e51}} to {{Gerrit|24b968b1f9}}
* 19:08 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.19 [keeping static files] (duration: 02m 22s)
* 17:44 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.18 [keeping static files] (duration: 02m 22s)
* 16:58 chaomodus: restarted nagios-nrpe-server on proton1001 (it died due to OOM)
* 16:51 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=druid1004.eqiad.wmnet
* 16:01 elukey: restart brokers on druid100[3-6] - locking after segments get deleted
* 15:46 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/includes/parser/DateFormatter.php: {{Gerrit|Ib2b3fb315dc93b}} / [[phab:T220563|T220563]] (duration: 01m 00s)
* 15:28 gilles@deploy1001: Synchronized php-1.33.0-wmf.25/includes/media/ThumbnailImage.php: [[phab:T216499|T216499]] Only apply high priority hint half the time (duration: 00m 59s)
* 15:26 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere (duration: 00m 21s)
* 15:26 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 everywhere
* 15:24 jforrester@deploy1001: Synchronized php-1.33.0-wmf.25/extensions/Score/: UBN Revert Score changes that broke VE [[phab:T220465|T220465]] (duration: 01m 01s)
* 15:19 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0 (duration: 00m 13s)
* 15:19 oblivian@deploy1001: Started deploy [docker-pkg/deploy@605690c]: Upgrade to docker-pkg 2.0.0
* 15:01 fsero: pooled back mwdebug200[1,2] [[phab:T219989|T219989]]
* 15:00 fsero: repooling mwdebug2002
* 15:00 jijiki: Enable puppet on thumbor1001, switch back to nginx, pool thumbor1004 - [[phab:T187765|T187765]]
* 14:57 fsero: repooling mwdebug2001
* 14:20 hashar: CI processing was a bit slower than usual over the past couple hours or so. It should be slightly faster now [[phab:T220606|T220606]]
* 14:13 joal@deploy1001: Finished deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints (duration: 14m 41s)
* 13:58 joal@deploy1001: Started deploy [analytics/aqs/deploy@fc1d232]: Deploying per-page limits for druid-endpoints
* 13:47 fsero: resizing disk on mwdebug2002 [[phab:T219989|T219989]]
* 13:42 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on group0 ([[phab:T188327|T188327]]) (duration: 01m 00s)
* 13:19 marostegui: Deploy schema change on aawiki aawikibooks aawiktionary abwiki abwiktionary acewiki advisorswiki advisorywiki adywiki afwiki on x1 - [[phab:T136427|T136427]]
* 12:41 urandom: decommissioning cassandra-b, restbase2007 -- [[phab:T208087|T208087]]
* 12:40 hashar: contint2001: stopped puppet and zuul-merger for debugging
* 12:17 jbond42: rolling security update of systemd on stretch systems
* 12:07 Amir1: EU swat is done
* 12:07 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Prep work for deploying UrlShortener extension ([[phab:T108557|T108557]]), part II (duration: 01m 00s)
* 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Prep work for deploying UrlShortener extension ([[phab:T108557|T108557]]), part I (duration: 01m 00s)
* 11:46 dcausse: elasticsearch search cluster: reindexing zh-min-nan wikis ([[phab:T219533|T219533]])
* 10:55 moritzm: upgrading nodejs on analytics-tool1002 to latest node 10 version from component/node10
* 10:46 gilles: [[phab:T220265|T220265]] setZoneAccess on all wikis finished
* 10:40 akosiaris: upgrade kubernetes-node on kubestage1002 (staging cluster) to 1.12.7-1 [[phab:T220405|T220405]]
* 10:33 moritzm: upgrading nodejs on aqs* to latest node 10 version from component/node10
* 10:25 fsero: resizing disk on mwdebug2001 [[phab:T219989|T219989]]
* 10:17 akosiaris: upload kubernetes_1.12.7-1 to apt.wikimedia.org/stretch-wikimedia component main [[phab:T220405|T220405]]
* 10:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1064 [[phab:T217453|T217453]] (duration: 00m 59s)
* 10:08 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1120 [[phab:T217453|T217453]] (duration: 01m 03s)
* 09:59 moritzm: upgrading labweb hosts (wikitech) to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 09:51 akosiaris: upgrade kubernetes-node on kubestage1001 (staging cluster) to 1.12.7-1 [[phab:T220405|T220405]]
* 09:50 moritzm: upgrading snapshot hosts to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1 [[phab:T220405|T220405]]
* 09:40 akosiaris: upgrade kubernetes-master on neon (staging cluster) to 1.12.7-1
* 09:05 moritzm: upgrading job runners mw1299-mw1311 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 08:56 elukey: restart druid-broker on druid100[4-6] - stuck after attempted datasource delete action
* 08:46 godog: roll-restart swift frontends - [[phab:T214289|T214289]]
* 08:36 elukey: update thirdparty/cloudera packages to cdh 5.16.1 for jessie/stretch-wikimedia - [[phab:T218343|T218343]]
* 08:26 onimisionipe@deploy1001: Finished deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment (duration: 00m 22s)
* 08:26 onimisionipe@deploy1001: Started deploy [kartotherian/deploy@f7518bb] (stretch): Insert maps2003 into stretch environment
* 08:12 gilles: [[phab:T220265|T220265]] foreachwiki extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend local-multiwrite
* 07:22 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" ([[phab:T220574|T220574]]) (duration: 04m 05s)
* 07:18 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@efd5bd5]: Revert "Bifurcate imageinfo queries to improve performance" ([[phab:T220574|T220574]])
* 07:12 onimisionipe: depooling maps200[34] to increase cassandra replication factor - [[phab:T198622|T198622]]
* 07:09 jijiki: Rolling restart thumbor service
* 07:08 jijiki: Upgrading python-thumbor-wikimedia on Thumbor servers to 2.4-1+deb9u1
* 06:59 marostegui: Deploy schema change on x1 master, with replication, lag will happen on x1 [[phab:T217453|T217453]]
* 06:59 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool x1 slaves [[phab:T217453|T217453]] (duration: 01m 13s)
* 05:52 _joe_: setting both mwdebug200{1,2} to pooled = inactive to remove them from scap dsh list and allow deployments, [[phab:T219989|T219989]]
* 05:12 _joe_: same on mwdebug2001
* 05:08 _joe_: removing hhvm cache on mwdebug2002
* 00:37 Krinkle: last scap sync-file failed to mwdebug2002.codfw and mwdebug2001.codfw due to insufficient disk space
* 00:20 krinkle@deploy1001: Synchronized php-1.33.0-wmf.25/resources/src/startup/: {{Gerrit|I3b9f1a13379a}} / {{Gerrit|Ie9db60e417cca}} (duration: 01m 01s)
 
== 2019-04-09 ==
* 23:14 twentyafterfour@deploy1001: Pruned MediaWiki: 1.33.0-wmf.17 [keeping static files] (duration: 06m 03s)
* 22:31 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.33.0-wmf.25  refs [[phab:T206679|T206679]] (duration: 39m 59s)
* 22:19 chaomodus: uploaded python-pynetbox to apt.wikimedia.org/stretch-wikimedia ([[phab:T217072|T217072]])
* 22:13 mobrovac@deploy1001: Finished deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19{{!}}20) up to date - [[phab:T208087|T208087]] (duration: 02m 32s)
* 22:11 mobrovac@deploy1001: Started deploy [restbase/deploy@c0a2977]: Bring RB on restbase20(19{{!}}20) up to date - [[phab:T208087|T208087]]
* 21:57 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.33.0-wmf.25  refs [[phab:T206679|T206679]]
* 21:48 urandom: decommissioning cassandra-a, restbase2007 -- [[phab:T208087|T208087]]
* 19:46 herron: added myself to ldap group cn=archiva-deployers,ou=groups,dc=wikimedia,dc=org
* 19:10 twentyafterfour: branching 1.33.0-wmf.25
* 18:53 crusnov@deploy1001: Finished deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script (duration: 00m 52s)
* 18:52 crusnov@deploy1001: Started deploy [netbox/deploy@018d83e]: Minor fix to Netbox-Ganeti sync script
* 18:50 thcipriani: gerrit back
* 18:48 thcipriani: gerrit restart
* 18:48 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming (duration: 00m 10s)
* 18:47 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (cobalt) -- restart incoming
* 18:46 thcipriani@deploy1001: Finished deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only) (duration: 00m 10s)
* 18:46 thcipriani@deploy1001: Started deploy [gerrit/gerrit@43d2d2e]: Gerrit update (gerrit2001 only)
* 18:42 volans: restart icinga on icinga1001 - [[phab:T196336|T196336]]
* 18:38 cdanis: [[phab:T196336|T196336]] cdanis@icinga1001$ sudo systemctl restart nsca
* 18:27 crusnov@deploy1001: Finished deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - [[phab:T215229|T215229]] (duration: 00m 57s)
* 18:26 crusnov@deploy1001: Started deploy [netbox/deploy@4aa3e47]: Add node sync to Netbox-Ganeti sync script - [[phab:T215229|T215229]]
* 18:11 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 03s)
* 18:11 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
* 18:07 urandom: bootstrapping cassandra-c, restbase2020 -- [[phab:T208087|T208087]]
* 17:58 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 02s)
* 17:58 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
* 17:56 elukey: restart keyholder-agent on deploy1001 to pick up new settings for analytics (+ arm all the keys)
* 17:42 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 04s)
* 17:42 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
* 17:42 elukey: restart keyholder-proxy.service on deploy1001 as attempt to reload perms for the analytics_deploy key
* 17:37 gilles@deploy1001: Finished deploy [performance/asoranking@4c83130]: (no justification provided) (duration: 00m 10s)
* 17:37 gilles@deploy1001: Started deploy [performance/asoranking@4c83130]: (no justification provided)
* 17:19 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b04c397]: Update mobileapps to {{Gerrit|3edfcad}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]]) (duration: 03m 50s)
* 17:15 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b04c397]: Update mobileapps to {{Gerrit|3edfcad}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]])
* 17:14 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/WikiExporter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1 (duration: 00m 51s)
* 17:09 twentyafterfour@deploy1001: Synchronized php-1.33.0-wmf.24/includes/export/XmlDumpWriter.php: deploy https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 (duration: 00m 52s)
* 17:04 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/includes/specials/SpecialUploadStash.php: [[phab:T220265|T220265]] Add support for X-Swift-Secret to upload stash (duration: 00m 53s)
* 17:03 twentyafterfour: deploying https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502538/1 and https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502537/1
* 17:01 arturo: [[phab:T220426|T220426]] reimaging+renaming labtestnet2002 to cloudweb2001-dev
* 16:49 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:49 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 16:49 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 16:46 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:46 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 16:46 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 16:45 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:45 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 16:45 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 16:41 herron: performing rolling restart of kafka main brokers and eventbus instances in eqiad to pick up security updates
* 16:32 otto@deploy1001: scap-helm eventgate-analytics finished
* 16:32 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 16:32 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 16:28 jijiki: Restarting thumbor service on thumbor1001
* 16:26 jijiki: Upgrading thumbor1001 to python-thumbor-wikimedia_2.4-1+deb9u1
* 16:18 jijiki: Uploading python-thumbor-wikimedia_2.4-1+deb9u1 to component/thumbor in stretch-wikimedia
* 15:05 moritzm: uploaded jenkins 2.164.1 for stretch-wikimedia/thirdparty/ci
* 15:04 moritzm: uploaded jenkins 2.164.1 for jessie-wikimedia/thirdparty
* 14:42 ejegg: updated payments-wiki from {{Gerrit|15bcb3d1a6}} to {{Gerrit|aa8dad50e7}}
* 14:10 ema: reboot lvs2010 with systemd 232 [[phab:T209707|T209707]]
* 14:09 godog: bootstrapping cassandra-b, restbase2020 -- [[phab:T208087|T208087]]
* 13:19 godog: bounce rsyslog on wezen
* 13:11 fsero: building envoy docker image
* 13:07 jbond42: rolling security updates of systemd on canary systems
* 12:35 godog: bounce rsyslog on lithium
* 12:13 elukey: powercycle logstash1012 - no ssh, no mgmt console available, seems completely stuck
* 12:10 jbond42: remove facter2.4 from wikimedia-buster
* 11:27 moritzm: upgrading API servers mw1276-mw1290 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 11:07 akosiaris: pool both DCs for newly created swift.recovery.wmnet RR
* 11:07 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=.*,dnsdisc=swift
* 11:00 ema: rebooting lvs2010 with systemd 241-1~bpo9+1 [[phab:T209707|T209707]]
* 10:57 moritzm: updated buster installer to daily build from 9th of April
* 10:09 godog: bootstrapping cassandra-a, restbase2020 -- [[phab:T208087|T208087]]
* 10:07 moritzm: rebooting stat1005 for some tests again
* 09:49 gilles@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/NavigationTiming: [[phab:T220476|T220476]] Add originCountry to paintTiming context (duration: 00m 54s)
* 09:46 moritzm: rebooting stat1005 for some tests
* 08:47 akosiaris: switch swift to be accessed from varnish+ats active/active rw
* 08:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove old comment from db1089 (duration: 00m 51s)
* 08:33 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2069 (duration: 00m 50s)
* 08:10 marostegui: Upgrade db2069
* 08:10 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2069 (duration: 00m 51s)
* 07:52 moritzm: upgrading app servers mw1319-mw1333 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 07:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Deploy parsercache key change everywhere [[phab:T210725|T210725]] (duration: 00m 53s)
* 07:37 moritzm: installing samba security updates
* 07:21 marostegui: Change parsercache keys on mw[1230-1235,1238-1239] - [[phab:T210725|T210725]]
* 07:10 jijiki: Depool thumbor1004 for testing - [[phab:T187765|T187765]]
* 07:09 marostegui: Change parsercache keys on mw[1221-1229] - [[phab:T210725|T210725]]
* 07:03 marostegui: Change parsercache keys on mw[1280-1289] - [[phab:T210725|T210725]]
* 06:51 dcausse: elasticsearch search cluster: reindex all spaceless languages in eqiad and codfw ([[phab:T219533|T219533]])
* 06:47 moritzm: installing libav security updates
* 06:39 marostegui: Change parsercache keys on mw[1260-1269] - [[phab:T210725|T210725]]
* 06:30 marostegui: Change parsercache keys on mw[1270-1279] - [[phab:T210725|T210725]]
* 06:01 marostegui: Deploy parsercache key change on canaries only - [[phab:T210725|T210725]]
* 03:23 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: {{Gerrit|Id04a3a4f40a884}} / [[phab:T219841|T219841]] (duration: 00m 52s)
* 03:16 onimisionipe: depooled maps2003 - [[phab:T219849|T219849]]
* 02:47 onimisionipe: restarting tilerator on maps2003 - [[phab:T219849|T219849]]
* 02:40 krinkle@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/ExternalGuidance/extension.json: {{Gerrit|I8614f63960bc763}} / [[phab:T219841|T219841]] (duration: 00m 53s)
* 01:27 eileen: civicrm revision changed from {{Gerrit|dfe89516b3}} to {{Gerrit|d0e44a9e51}}, config revision is {{Gerrit|2bcbf44521}}
* 00:45 urandom: bootstrapping cassandra-c, restbase2019 -- [[phab:T208087|T208087]]
* 00:07 ebernhardson@deploy1001: Synchronized wmf-config/: [[phab:T218716|T218716]]: Migrate configs to WikibaseCirrusSearch (duration: 00m 51s)
 
== 2019-04-08 ==
* 23:57 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T218954|T218954]]: Enable WBCS search on commons too (duration: 00m 50s)
* 23:45 ebernhardson@deploy1001: Synchronized wmf-config: [[phab:T218954|T218954]]: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 52s)
* 23:41 ebernhardson@deploy1001: Synchronized wmf-config: [[phab:T218954|T218954]]: Disable wbcs dispatching query builder on commons (3/3) (duration: 00m 51s)
* 23:33 ebernhardson@deploy1001: Synchronized wmf-config/Wikibase.php: [[phab:T218954|T218954]]: Disable wbcs dispatching query builder on commons (2/3) (duration: 00m 52s)
* 23:10 ebernhardson@deploy1001: Synchronized wmf-config/: [[phab:T218954|T218954]]: Disable wbcs dispatching query builder on commons (1/3) (duration: 00m 52s)
* 22:45 XioNoX: rollback enable sampling on cr2-eqiad external links
* 22:29 XioNoX: enable sampling on cr2-eqiad external links
* 22:18 XioNoX: enable sampling on eqiad Telia transit link
* 22:04 jforrester@deploy1001: Synchronized php-1.33.0-wmf.24/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: WBMI  [[phab:T220277|T220277]] (duration: 00m 57s)
* 22:01 XioNoX: pfw firewall rules update - [[phab:T217355|T217355]]
* 20:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to {{Gerrit|cdb9928}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]]) (duration: 07m 55s)
* 20:41 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to {{Gerrit|cdb9928}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]])
* 20:24 urandom: bootstrapping cassandra-b, restbase2019 -- [[phab:T208087|T208087]]
* 20:08 bearND: mobileapps deploy failed on canary (Check 'endpoints' failed). Rolled back canary.
* 20:08 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c7fa522]: Update mobileapps to {{Gerrit|cdb9928}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]]) (duration: 02m 10s)
* 20:05 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c7fa522]: Update mobileapps to {{Gerrit|cdb9928}} ([[phab:T220045|T220045]] [[phab:T219411|T219411]] [[phab:T219667|T219667]])
* 19:59 marxarelli: promotion of 1.33.0-wmf.24 to all wikis completed. error rates nominal aside from usual timeouts. cc: [[phab:T206678|T206678]], [[phab:T220037|T220037]]
* 19:51 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.33.0-wmf.24
* 19:48 marxarelli: promoting 1.33.0-wmf.24 to all wikis. cc: [[phab:T220037|T220037]], [[phab:T206678|T206678]]
* 19:41 dduvall@deploy1001: Synchronized php: group1 wikis to 1.33.0-wmf.24 (duration: 01m 46s)
* 19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.24
* 19:41 marxarelli: dduvall@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.2
* 19:35 marxarelli: starting promotion of 1.33.0-wmf.24 to group1
* 18:45 Lucas_WMDE: Morning SWAT done
* 18:31 bblack: deploying wiktionary CNAME experiment - https://phabricator.wikimedia.org/T208263#5094712
* 18:27 mobrovac@deploy1001: Finished deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - [[phab:T219910|T219910]] [[phab:T220221|T220221]] (duration: 21m 14s)
* 18:27 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable eventgate-analytics api-request logging for group0 wikis - [[phab:T214080|T214080]] (duration: 00m 56s)
* 18:24 mobrovac: restart pdfrender on scb2001 - [[phab:T174916|T174916]]
* 18:13 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:13 otto@deploy1001: scap-helm eventgate-analytics cluster eqiad completed
* 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-eqiad-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: eqiad]
* 18:12 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:12 otto@deploy1001: scap-helm eventgate-analytics cluster codfw completed
* 18:12 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 18:10 otto@deploy1001: scap-helm eventgate-analytics upgrade production -f eventgate-analytics-codfw-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: codfw]
* 18:09 otto@deploy1001: scap-helm eventgate-analytics finished
* 18:09 otto@deploy1001: scap-helm eventgate-analytics cluster staging completed
* 18:09 otto@deploy1001: scap-helm eventgate-analytics upgrade staging -f eventgate-analytics-staging-values.yaml --reset-values stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging]
* 18:06 mobrovac@deploy1001: Started deploy [restbase/deploy@9cf5364]: Lower AQS rate limits and fix recommendation-api spec - [[phab:T219910|T219910]] [[phab:T220221|T220221]]
* 17:50 arturo: [[phab:T220129|T220129]] renaming labtestmetal2001.codfw.wmnet to clouddb2001-dev.codfw.wmnet
* 17:42 XioNoX: add swift term to cr1/2-eqiad - [[phab:T220081|T220081]]
* 17:14 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix (duration: 11m 17s)
* 17:03 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@c30a540]: GUI updates, Updater with redirect fix and Blazegraph with XSS fix
* 16:59 mobrovac@deploy1001: Finished deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms (duration: 00m 16s)
* 16:59 mobrovac@deploy1001: Started deploy [mobileapps/deploy@64f09a0]: Force-deploy to scb1001 to test the config perms
* 16:55 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Replace needed WikimediaEditorTasks Beta Cluster config ([[phab:T220153|T220153]]) (duration: 00m 58s)
* 16:31 urandom: bootstrapping cassandra-a, restbase2019 -- [[phab:T208087|T208087]]
* 15:35 herron: aborting ores to logstash kafka logging pipeline switchover for now. puppet applied only to ores2009, reverting now
* 15:19 herron: switching ores to logstash kafka logging pipeline (via temporary puppet disable and rolling puppet agent runs)
* 15:09 jijiki: Pool mw2206 - [[phab:T215415|T215415]]
* 14:55 papaul: powering down mw2206  for DIMM replacement
* 14:49 otto@deploy1001: Finished deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho (duration: 18m 35s)
* 14:45 papaul: powering down elastic2048 for disk replacement
* 14:30 otto@deploy1001: Started deploy [analytics/refinery@7fa6fb7]: deploying oozie article recommender for baho
* 14:17 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting actor migration to write-both/read-new on test wikis and mediawikiwiki ([[phab:T188327|T188327]]) (duration: 00m 59s)
* 14:06 jijiki: Temporarily serve thumbor traffic on thumbor1001 via haproxy - [[phab:T187765|T187765]]
* 13:41 moritzm: upgrading job runners in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 12:31 hashar: contint2001: upgraded python-pbr 0.8.2-1 -> 1.10.0-1 # [[phab:T218559|T218559]]
* 12:25 moritzm: upgrading API servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 12:06 arturo: reboot cloudvirt1009 to clean some ACPI errors in dmesg
* 12:03 arturo: [[phab:T219776|T219776]] puppet node deactivate labtestnet2003.codfw.wmnet
* 12:00 hashar: contint1001 upgraded zuul to 2.5.1-wmf6 # [[phab:T208426|T208426]]
* 11:53 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: WikibaseClient: Conditionally enable mapframe support ([[phab:T218051|T218051]]) (duration: 00m 58s)
* 11:48 hashar: contint2001: stopping zuul-server , it is not meant to be running there
* 11:41 hoo@deploy1001: Synchronized wmf-config/abusefilter.php: Enable blocking feature of AbuseFilter in zh.wikipedia ([[phab:T210364|T210364]]) (duration: 00m 58s)
* 11:25 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create uploader user group for thwiki ([[phab:T216615|T216615]]) (duration: 00m 58s)
* 11:12 jijiki: Restarted thumbor services after librsvg upgrade
* 11:11 fsero: upgrading envoy to 1.9.1 [[phab:T215810|T215810]]
* 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:502190{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:502190{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 10:34 moritzm: upgrading app servers in codfw to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 10:23 jijiki: Running debdeploy to upgrade librsvg
* 09:43 gehel: force allocation of 3 unassigned shards on elasticsearch / cirrus / eqiad
* 09:30 arturo: [[phab:T219776|T219776]] puppet node clean labtestnet2003.codfw.wmnet
* 09:20 volans: restarting icinga on icinga1001 - [[phab:T196336|T196336]]
* 08:45 moritzm: upgrading API servers mw1221-mw1235 to HHVM 3.18.5+dfsg-1+wmf8+deb9u2 and wikidiff 1.8.1 ([[phab:T203069|T203069]])
* 08:34 akosiaris@deploy1001: scap-helm zotero finished
* 08:34 akosiaris@deploy1001: scap-helm zotero cluster staging completed
* 08:34 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-staging.yaml --reset-values staging stable/zotero [namespace: zotero, clusters: staging]
* 08:32 akosiaris@deploy1001: scap-helm zotero finished
* 08:32 akosiaris@deploy1001: scap-helm zotero cluster eqiad completed
* 08:32 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-eqiad.yaml production stable/zotero [namespace: zotero, clusters: eqiad]
* 08:32 akosiaris: lower CPU, memory limits for zotero pods. Set 1 cpu, 700Mi. This should help the pods to recover faster in some cases. The old memory leak issues we used to have seem to be no longer present
* 08:31 akosiaris@deploy1001: scap-helm zotero finished
* 08:31 akosiaris@deploy1001: scap-helm zotero cluster codfw completed
* 08:31 akosiaris@deploy1001: scap-helm zotero upgrade -f zotero-values-codfw.yaml production stable/zotero [namespace: zotero, clusters: codfw]
* 08:17 godog: delete fundraising folder from public grafana - [[phab:T219825|T219825]]