You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log: Difference between revisions

From Wikitech-static
imported>Stashbot
(bstorm_: updated views on labsdb1010 T252219)
imported>Stashbot
(cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - T321410)
 
(830 intermediate revisions by 4 users not shown)
== 2022-12-03 ==
* 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - [[phab:T321410|T321410]]


== 2020-05-29 ==
* 22:32 bstorm_: updated views on labsdb1010 [[phab:T252219|T252219]]
* 20:55 bstorm_: updating views on labsdb1011 [[phab:T252219|T252219]]
* 19:27 ryankemper: Successfully finished a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks re-enabled.
* 17:28 bstorm_: updating views on labsdb1009 [[phab:T252219|T252219]]
* 16:50 ryankemper: Performing a rolling restart of the `cloudelastic` clusters (chi, psi, omega) as part of elasticsearch plugins upgrade. Host and service checks disabled.
* 16:00 bstorm_: Updating views on labsdb1012 [[phab:T252219|T252219]]
* 15:59 ryankemper: Concluded rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade. Both hosts `relforge1001` and `relforge1002` are back up. Downtime lifted.
* 15:29 ryankemper: Performing a rolling restart of the `relforge` clusters as part of elasticsearch plugins upgrade
* 14:59 cdanis: disabling puppet on netflow* to deploy {{Gerrit|Ic71e96f0}} [[phab:T253128|T253128]]
* 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:47 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:41 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:35 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:15 mdholloway: ran extensions/MachineVision/maintenance/removeBlacklistedSuggestions.php on commonswiki ([[phab:T253821|T253821]])
* 12:49 hnowlan: reimaging restbase2009 after disk replacement
* 12:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 12:15 godog: roll-restart to upgrade thanos to 0.13.0rc0 - [[phab:T252186|T252186]] [[phab:T233956|T233956]]
* 11:32 moritzm: installing cups security updates (client-side libs/tools)
* 11:01 ema: upload prometheus-rdkafka-exporter 0.2 to buster-wikimedia [[phab:T253551|T253551]]
* 10:53 moritzm: updating mwdebug2002 to 7.2.31
* 10:02 marostegui: Compress InnoDB on db1138 [[phab:T232446|T232446]]
* 08:30 godog: update swift uid/gid on thanos hosts - [[phab:T123918|T123918]]
* 08:04 mutante: phabricator - restarted apache2 - back for me now
* 08:03 XioNoX: add new AMS-IX link to LACP bundle
* 08:01 mutante: phabricator - broken due to "PhabricatorRepositoryMirrorEngine::pushToGitRepository" starting git process that uses 100% CPU, stopped phd service
* 07:56 mutante: phabricator - killed pid 25070 (git) which used 100% of CPU, restarted phd service
* 07:25 moritzm: updating perf on buster systems to new version from 10.4 point release
* 07:15 moritzm: installing el-api update from latest Buster point release
* 07:12 moritzm: installing xdg-utils update from latest Buster point release
* 07:11 mutante: mw1293 (canary jobrunner) replace apache2.conf with version from mwdebug1001, restart apache, to debug for [[phab:T190111|T190111]]
* 07:00 moritzm: installing rake security updates
* 06:36 mutante: deneb - systemctl start docker-reporter-releng-images
* 05:20 marostegui: Deploy schema change on db1138 (no longer s4 master) - [[phab:T250055|T250055]]
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1081 to s4 master and remove read-only from s4 [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11334 and previous config saved to /var/cache/conftool/dbconfig/20200529-050224-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11333 and previous config saved to /var/cache/conftool/dbconfig/20200529-050153-marostegui.json
* 05:00 marostegui: Starting s4 failover from db1138 to db1081 - [[phab:T253808|T253808]]
* 04:25 marostegui: Start topology changes in s4 - [[phab:T253808|T253808]]


== 2020-05-28 ==
* 23:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/skins/Vector/resources/skins.vector.styles/Menu.less: [[phab:T253912|T253912]] Hotfix: Cannot rename emptyPortlet to empty-portlet yet (duration: 00m 59s)
* 22:41 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/WikibaseMediaInfo/src/Services/FilePageLookup.php: [[phab:T253792|T253792]] Follow-up {{Gerrit|1827c7a}}: Ensure inNamespace() is called only on Title object (duration: 00m 58s)
* 22:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T253821|T253821]] Update MachineVision block list for 2020-05-27 (duration: 00m 57s)
* 22:09 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move one CheckUser right change next to the other (duration: 00m 57s)
* 22:06 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove version wrapper around wgOverrideUcfirstCharacters; always true (duration: 00m 59s)
* 21:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34
* 21:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/filerepo/FileRepo.php: [[phab:T253922|T253922]] Mark two FileRepo functions public (duration: 01m 07s)
* 21:12 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/includes/specials/SpecialUserrights.php: [[phab:T253909|T253909]] Restore visibility (previously implicitely public) (duration: 01m 06s)
* 20:38 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/resources/skins.vector.styles: [[phab:T253905|T253905]] HOTFIX: Do not apply p-personal absolute positioning to all menus (duration: 01m 07s)
* 20:22 shdubsh: restart varnishmtail and atsmtail eqsin
* 20:11 shdubsh: restart ncredirmtail on ncredir5001
* 19:20 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back the train due to [[phab:T253905|T253905]]
* 19:20 twentyafterfour: group2 back to wmf.32 due to [[phab:T253905|T253905]]
* 19:20 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8] (duration: 00m 09s)
* 19:20 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8] (thin): Hotfix #2 today (thin): forgot jars [analytics/refinery@f6d73c8]
* 19:17 milimetric@deploy1001: Finished deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8] (duration: 16m 54s)
* 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 19:01 shdubsh: restart varnishmtail and atsmtail on cp5001.eqsin.wmnet
* 19:00 milimetric@deploy1001: Started deploy [analytics/refinery@f6d73c8]: Hotfix #2 today: forgot jars [analytics/refinery@f6d73c8]
* 17:03 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]] (duration: 01m 06s)
* 17:02 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 16:32 jforrester@deploy1001: Synchronized php-1.35.0-wmf.34/extensions/Wikibase: [[phab:T253804|T253804]] Use ThrowingEntityTermStoreWriter when writers shouldn't be called (duration: 01m 15s)
* 15:37 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182] (duration: 00m 10s)
* 15:37 milimetric@deploy1001: Started deploy [analytics/refinery@203d182] (thin): Three hotfixes (THIN) [analytics/refinery@203d182]
* 15:05 milimetric@deploy1001: Finished deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182] (duration: 25m 59s)
* 15:02 moritzm: installing exim4 security updates on jessie (stretch/buster already fixed)
* 14:39 milimetric@deploy1001: Started deploy [analytics/refinery@203d182]: Three hotfixes [analytics/refinery@203d182]
* 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 ema: atskafka 0.8 uploaded to buster-wikimedia [[phab:T253551|T253551]]
* 13:49 godog: roll-restart prometheus k8s-staging to enable thanos upload - [[phab:T252186|T252186]]
* 13:36 hashar: Restarting CI Jenkins for plugin rollback
* 11:49 moritzm: installing unbound security updates
* 11:03 kormat@cumin1001: dbctl commit (dc=all): 'Add db2138 to s2+s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11330 and previous config saved to /var/cache/conftool/dbconfig/20200528-110333-kormat.json
* 10:36 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:34 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 10:30 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 10:02 mutante: gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey ([[phab:T239151|T239151]])
* 09:51 XioNoX: failover VRRP in ulsfo
* 09:41 XioNoX: re-activate peering/transit on cr2-eqdfw - [[phab:T243080|T243080]]
* 09:35 mutante: restarting gerrit on gerrit1002 after fixing db_pass to the readonly one ([[phab:T243800|T243800]])
* 09:33 XioNoX: restart cr2-eqdfw for upgrade - [[phab:T243080|T243080]]
* 09:30 XioNoX: deactivate peering/transit on cr2-eqdfw - [[phab:T243080|T243080]]
* 09:25 _joe_: updating ACLs on all etcd servers
* 09:22 XioNoX: install new Junos on cr2-eqdfw - [[phab:T243080|T243080]]
* 09:16 XioNoX: rollback cr2-eqord ospf/bgp - [[phab:T243080|T243080]]
* 09:07 XioNoX: restart cr2-eqord for upgrade - [[phab:T243080|T243080]]
* 09:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 08:50 _joe_: upgrading etcd ACLs (adding new users) to conf1004
* 08:50 XioNoX: install new Junos on cr2-eqord - [[phab:T243080|T243080]]
* 08:46 XioNoX: deactivate peering/transit on cr2-eqord - [[phab:T243080|T243080]]
* 08:45 XioNoX: de-pref all OSPF links to cr2-eqord - [[phab:T243080|T243080]]
* 08:13 marostegui: Pool db1141 into labsdb analytics role - [[phab:T249188|T249188]]
* 07:33 gilles@deploy1001: Synchronized static/images: [[phab:T252108|T252108]] Deploying optimised static PNGs (duration: 01m 39s)
* 07:31 gilles@deploy1001: Synchronized static/apple-touch: [[phab:T252108|T252108]] Deploying optimised static PNGs (duration: 01m 12s)
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from API and set its weight to 0 on main traffic - preparation for tomorrow's failover [[phab:T253808|T253808]]', diff saved to https://phabricator.wikimedia.org/P11329 and previous config saved to /var/cache/conftool/dbconfig/20200528-063037-marostegui.json
* 04:44 marostegui: Run check_private data on db1141 - [[phab:T249188|T249188]]
* 04:22 marostegui: Stop MySQL on db1141 - [[phab:T249188|T249188]]


== 2022-12-02 ==
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
* 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:36 volans: fixed git checkout permissions [[phab:T324334|T324334]]
* 19:11 sukhe: restart pybal on lvs5004
* 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
* 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
* 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
* 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
* 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
* 18:20 sukhe: decomm lvs5001: restarting pybal
* 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
* 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - [[phab:T324334|T324334]]
* 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
* 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
* 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
* 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
* 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
* 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
* 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
* 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
* 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
* 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
* 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
* 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
* 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - [[phab:T324334|T324334]]
* 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
* 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
* 12:09 jynus: dropping all databases from db1133
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
* 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
* 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - [[phab:T321309|T321309]]
* 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
* 09:54 moritzm: installing debootstrap updates from bullseye point release
* 09:53 moritzm: rebalance ganeti codfw/C [[phab:T323222|T323222]]
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
* 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
* 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
* 07:41 moritzm: draining ganeti5001 for eventual decom [[phab:T322048|T322048]]
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
* 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
* 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45{{!}}46).eqiad.wmnet,cluster=jobrunner
* 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39{{!}}40).eqiad.wmnet,cluster=videoscaler
* 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
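In the db1134 repool above, the pooled percentage goes through 5, 10, 25, 50, 75 and 100 %, with one step every 15 minutes. The sketch below reproduces that timeline in Python. It assumes the db1134 step list and a fixed 15-minute interval, both inferred from the timestamps in this log. `repool_schedule` is a hypothetical helper for illustration only; it is not part of dbctl or of any cookbook.

```python
from datetime import datetime, timedelta

# Assumption: step list copied from the db1134 entries above; the 15-minute
# interval is inferred from their timestamps, not a documented default.
REPOOL_STEPS = [5, 10, 25, 50, 75, 100]
STEP_INTERVAL = timedelta(minutes=15)


def repool_schedule(start, steps=REPOOL_STEPS, interval=STEP_INTERVAL):
    """Hypothetical helper: return (HH:MM, percent) pairs for a gradual repool."""
    t = datetime.strptime(start, "%H:%M")
    schedule = []
    for pct in steps:
        schedule.append((t.strftime("%H:%M"), pct))
        t += interval  # wait one interval before raising the percentage again
    return schedule


if __name__ == "__main__":
    for when, pct in repool_schedule("07:56"):
        print(f"{when} db1134 (re)pooling @ {pct}%")
```

Starting at 07:56, the schedule ends at 09:11 with 100 %, which matches the last db1134 dbctl commit in the log.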


== 2020-05-27 ==
* 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add autoreviewrestore right to rollbacker group on hiwiki ([[phab:T252986|T252986]]) (duration: 01m 05s)
* 23:16 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add thwiki Draft namespace to wmgExemptFromUserRobotsControlExtra and enable VE there ([[phab:T252959|T252959]]) (duration: 01m 06s)
* 22:58 gehel@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 22:02 crusnov@deploy1001: Finished deploy [netbox/deploy@5251cf1]


== 2022-12-01 ==
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
* 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
* 22:57 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:856008{{!}}GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue]] (duration: 07m 28s)
* 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet  # [[phab:T306162|T306162]]
* 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
* 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:54 rzl@cumin1001: END (PASS
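Several decommission runs above target a bracketed host range such as `mw[1327-1346].eqiad.wmnet`. The sketch below expands such a range to show which hosts it covers. It assumes this is the ClusterShell NodeSet-style range notation used by cumin. The real syntax is much richer (comma lists, multiple ranges, set operations), so the hypothetical `expand_hosts` helper handles only a single numeric range.

```python
import re
from typing import List

# Matches one bracketed numeric range such as "[1327-1346]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern: str) -> List[str]:
    """Hypothetical helper: expand a single "[start-end]" range in a host pattern.

    Simplified illustration only; zero padding is taken from the start value.
    """
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]  # no range: the pattern is already a single host
    start, end = m.group(1), m.group(2)
    width = len(start)
    prefix, suffix = pattern[: m.start()], pattern[m.end():]
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    hosts = expand_hosts("mw[1327-1346].eqiad.wmnet")
    print(len(hosts), hosts[0], hosts[-1])
```

Each of the `mw[1307-1326]` and `mw[1327-1346]` batches expands to 20 hosts. The `mw[1347-1348]` batch expands to two.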


== Archives ==
See [[Server Admin Log/Archives]].


== 2020-05-26 ==
* 21:34 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I0fb124b3593}} (duration: 01m 05s)
* 21:30 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2714e2ae26404}} (duration: 01m 06s)
* 21:18 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|Ib0bf8d97b10b}}, [[phab:T253674|T253674]] (duration: 01m 06s)
* 20:29 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 20:08 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]] (duration: 70m 02s)
* 18:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.34  refs [[phab:T253022|T253022]]
* 18:07 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.30 (duration: 20m 45s)
* 18:02 bblack: cr[12]-eqiad: re-route ns0.wikimedia.org to authdns1001 - [[phab:T241770|T241770]]
* 18:02 ejegg: restarted fundraising jobs: recurring charge, audit processing, deduplication
* 17:57 moritzm: installing bind security updates for stretch (only client-side tools/libraries in use)
* 17:47 cdanis: netflow3001: disabling puppet and testing some pmacct/librdkafka config tweaks [[phab:T253128|T253128]]
* 17:16 James_F: 1.35.0-wmf.34 was branched at {{Gerrit|b5012a1e7d0bbd2bf7444b8708d421992bcbe2fb}} for [[phab:T253022|T253022]]
* 16:45 moritzm: installing jsp-api bugfix update from Buster point release
* 15:22 akosiaris: sync kubernetes eqiad namespaces configuration with helmfile
* 15:15 akosiaris: sync kubernetes codfw namespaces configuration with helmfile
* 15:08 arturo: delete/re-import docker/containerd.io packages in the right version in buster-wikimedia/thirdparty/kubeadm-k8s-1-<nowiki>{</nowiki>15,16<nowiki>}</nowiki> ([[phab:T250866|T250866]])
* 15:08 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add lazy-loading to Wikimedia Foundation powered-by icon [[phab:T239377|T239377]] (duration: 00m 57s)
* 15:01 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop enwiki mobile mainpage special casing [[phab:T32405|T32405]] (duration: 00m 59s)
* 14:58 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:57 akosiaris: sync staging namespaces configuration
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 14:57 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:56 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile.php, now removed (duration: 00m 55s)
* 14:56 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:54 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Move mobile.php into CommonSettings.php (duration: 00m 57s)
* 14:44 arturo: upgrade packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-16 ([[phab:T246122|T246122]])
* 14:44 jforrester@deploy1001: Synchronized docroot/noc/: Clear out symlink to mobile-labs.php, now removed (duration: 00m 58s)
* 14:43 moritzm: installing rails security updates
* 14:41 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Don't try to load mobile-labs.php from mobile.php (duration: 00m 57s)
* 14:38 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings.php: Move uncondition/no-sideeffect includes up (duration: 00m 57s)
* 14:35 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Clean up MWMultiVersion check in CommonSettings.php (duration: 00m 59s)
* 14:33 XioNoX: test bgp med on dns4002
* 14:31 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: SpecialVersionVersionUrl: Don't use confusing local variable name (duration: 00m 58s)
* 14:30 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Remove EOL REL1_32 (duration: 00m 58s)
* 13:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:04 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.32
* 12:43 godog: swift eqiad-prod: decom ms-be101[678] - [[phab:T252008|T252008]]
* 12:21 XioNoX: repool ulsfo - [[phab:T243080|T243080]]
* 12:11 XioNoX: cr4-ulsfo re-activate transit/ix/4/6 - [[phab:T243080|T243080]]
* 12:03 XioNoX: cr4-ulsfo> request vmhost reboot - [[phab:T243080|T243080]]
* 12:01 XioNoX: cr4-ulsfo deactivate transit/ix/4/6 - [[phab:T243080|T243080]]
* 11:49 XioNoX: cr3-ulsfo> request vmhost reboot - [[phab:T243080|T243080]]
* 11:42 XioNoX: cr4-ulsfo> request vmhost software add ... - [[phab:T243080|T243080]]
* 11:28 XioNoX: cr3-ulsfo> request vmhost software add ... - [[phab:T243080|T243080]]
* 11:27 awight: nnwiki updateCollation.php script has finished.
* 11:26 XioNoX: depool ulsfo for routers upgrade - [[phab:T243080|T243080]]
* 11:16 awight: EU SWAT done (pending a maintenance script to updateCollation)
* 11:14 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:598553{{!}}Add 'deletedtext' permission to researcher group (T253420)]] (duration: 01m 06s)
* 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:598509{{!}}[nnwiki] Change category collation to  (T253559)]] (duration: 01m 10s)
* 10:46 marostegui: Stop tendril's event scheduler
* 10:18 jynus: stop db2097 for hw maintenance [[phab:T252492|T252492]]
* 09:48 vgutierrez: rolling upgrade to ats 8.0.7-1wm11
* 09:41 _joe_: all jobrunners converted to use envoy for TLS termination
* 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw131[0-1].eqiad.wmnet
* 09:38 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw133[4-8].eqiad.wmnet
* 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-9].eqiad.wmnet
* 09:37 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw130[0-3].eqiad.wmnet
* 09:36 oblivian@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=mw129[3-9].eqiad.wmnet
* 09:31 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[0-3].eqiad.wmnet
* 09:27 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw130[4-7].eqiad.wmnet
* 09:22 gehel: repool wdqs1007, caught up on lag
* 09:09 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(0[89]{{!}}1[01]).eqiad.wmnet
* 09:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:02 mutante: decom'ing people1001 - replaced by people1002
* 09:01 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:01 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw13(1{{!}}3)8.eqiad.wmnet
* 08:57 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw133[4-7].eqiad.wmnet
* 08:55 _joe_: progressively converting jobrunners to envoy
* 08:41 oblivian@cumin1001: conftool action : set/weight=1:pooled=yes; selector: name=mw1337.eqiad.wmnet
* 07:20 moritzm: installing libssh security updates
* 07:03 vgutierrez: upgrade to ats 8.0.7-1wm11 on cp3064 and cp3065
* 06:49 marostegui: Deploy schema change on s3 directly on the master with 1 minute sleep in between wikis [[phab:T253342|T253342]]
* 06:47 marostegui: Deploy schema change on s1 directly on the master [[phab:T253342|T253342]]
* 06:44 marostegui: Deploy schema change on s4 directly on the master [[phab:T253342|T253342]]
* 06:35 XioNoX: reboot scs-ulsfo - [[phab:T253609|T253609]]
* 06:29 marostegui: Deploy schema change on s7 directly on the master [[phab:T253342|T253342]]
* 06:24 marostegui: Deploy schema change on s8 directly on the master [[phab:T253342|T253342]]
* 06:01 marostegui: Deploy schema change on s2 directly on the master [[phab:T253342|T253342]]
* 04:35 marostegui: Repool labsdb1011 - [[phab:T249188|T249188]]
* 04:14 marostegui: Stop slaves and stop mysql on labsdb1011 [[phab:T249188|T249188]]
* 03:55 tstarling@deploy1001: Synchronized php-1.35.0-wmf.31/includes/export/XmlDumpWriter.php: [[phab:T253468|T253468]] (duration: 01m 06s)
* 03:53 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/export/XmlDumpWriter.php: [[phab:T253468|T253468]] (duration: 01m 07s)
* 03:20 tstarling@deploy1001: Synchronized php-1.35.0-wmf.32/includes/specials/SpecialChangeContentModel.php: for UBN [[phab:T252963|T252963]] (duration: 01m 07s)
* 03:18 tstarling@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 32s)
 
== 2020-05-25 ==
* 23:34 ejegg: re-enabled fundraising queue consumers and job runners, except audits, dedupe, and recurring
* 21:38 eileen: civicrm revision changed from {{Gerrit|5428c5c449}} to {{Gerrit|d1cd99166f}}, config revision is {{Gerrit|6b05d6bb25}}
* 21:18 eileen: civicrm revision is {{Gerrit|7380e0e8ce}}, config revision is {{Gerrit|6b05d6bb25}}
* 21:01 ejegg: updated fundraising CiviCRM from {{Gerrit|737d88a5ee}} to {{Gerrit|7380e0e8ce}}
* 17:17 ejegg: updated fundraising CiviCRM from {{Gerrit|6b1d5902dd}} to {{Gerrit|737d88a5ee}}
* 17:09 ejegg: enabled contribution tracking queue on payments-wiki
* 16:24 ejegg: updated standalone SmashPig from {{Gerrit|2702b04329}} to {{Gerrit|44690f761c}}
* 16:17 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 16:16 XioNoX: enable IX4/6 BGP group on cr4-ulsfo - [[phab:T237575|T237575]]
* 16:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:55 XioNoX: disable IX4/6 BGP group on cr4-ulsfo - [[phab:T237575|T237575]]
* 15:17 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:15 ejegg: updated payments-wiki from {{Gerrit|3c465cb11c}} to {{Gerrit|d11efeb1cf}}, put it into maintenance mode
* 15:15 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:39 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:06 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:46 _joe_: uploaded doxygen 1.8.17-1 to wikimedia-buster component/ci
* 13:43 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-swift
* 13:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:10 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:09 vgutierrez: upgrade ATS to version 8.0.7-1wm11 on cp4026 and cp4032
* 12:52 godog: roll-restart pybal in low-traffic codfw
* 12:44 ema: upload atskafka 0.7 to buster-wikimedia, upgrade cp3050 [[phab:T253551|T253551]]
* 12:37 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:30 marostegui: Deploy schema change on s5 directly on the master [[phab:T253342|T253342]]
* 12:14 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:09 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 12:01 _joe_: converting the remaining appservers to use envoy for TLS termination
* 11:57 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:54 marostegui: Install a new tendril_purge_global_status_log event on db1115 (tendril) [[phab:T252331|T252331]]
* 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:48 marostegui: Stop event scheduler on db1115 (tendril) - [[phab:T252331|T252331]]
* 11:46 moritzm: uploaded CAS 6.1.5-1 to apt.wikimedia.org [[phab:T233947|T233947]]
* 11:36 _joe_: switch mw[1349-1355,1364-1373].eqiad.wmnet to envoy
* 11:27 marostegui: Extend /srv 1100G on db213[6-9] [[phab:T252985|T252985]]
* 11:23 marostegui: Extend /srv 1100G on db114[1-9] [[phab:T252512|T252512]]
* 11:21 marostegui: Extend db1141's (temporary labsdb test host) /srv 1TB extra - [[phab:T249188|T249188]]
* 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:09 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:01 ema: upload prometheus-rdkafka-exporter to buster-wikimedia [[phab:T253197|T253197]]
* 10:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:598439{{!}} Bumping portals to master (598439)]] (duration: 01m 05s)
* 10:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:598439{{!}} Bumping portals to master (598439)]] (duration: 01m 06s)
* 10:20 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:56 _joe_: transition done
* 09:49 _joe_: depooled mw1337, it was getting all traffic supposed to go to the jobrunners
* 09:45 vgutierrez: upload trafficserver 8.0.7-1wm10 to apt.wm.o (buster)
* 09:42 _joe_: converting mw1319-1333 to use envoy for TLS termination
* 09:17 _joe_: migrated mw1337 to use envoy for TLS termination [[phab:T247389|T247389]]
* 09:10 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:04 godog: turn on sni by default for check_http --ssl icinga invocations - [[phab:T253292|T253292]]
* 08:52 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:39 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:21 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: service=thanos-swift
* 08:05 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 07:36 moritzm: installed linux-image-amd64 on labstore1005 (current meta package for kernels following the Stretch update) [[phab:T224582|T224582]]
* 07:36 moritzm: installed linux-image-amd64 on labstore (current meta package for kernels following the Stretch update) [[phab:T224582|T224582]]
* 07:02 marostegui: Stop event scheduler on tendril [[phab:T252331|T252331]]
* 05:11 marostegui: Deploy schema change on s6, directly on the master - [[phab:T253342|T253342]]
* 04:54 marostegui: Depool labsdb1011 - [[phab:T249188|T249188]]
* 04:11 kart_: Updated cxserver to 2020-05-22-083137-production ([[phab:T246317|T246317]], [[phab:T252871|T252871]])
* 04:07 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:04 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:02 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
 
== 2020-05-24 ==
* 17:36 gehel: restarting elasticsearch psi on elastic1052
* 16:44 gehel: depool wdqs1007 to catch up on lag
* 16:43 gehel: restart blazegraph on wdqs1007
 
== 2020-05-23 ==
* 19:04 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/file/LocalFile.php: {{Gerrit|I0f7e885997d60}} (duration: 01m 06s)
* 18:58 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/file/LocalFile.php: {{Gerrit|I0f7e885997d60}} (duration: 01m 08s)
* 18:06 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/: {{Gerrit|I31a9bb6672}} (duration: 01m 06s)
* 18:05 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/: {{Gerrit|I31a9bb6672}} (duration: 01m 10s)
* 15:44 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I5ad8fe96b9098a8}} - Disable coalesceKeys on commonswiki (duration: 01m 09s)
* 14:58 Krinkle: scap-pull to reset state on mwdebug1002
* 14:50 Krinkle: Testing mc.php changes on mwdebug1002
* 08:04 elukey: powercycle an-presto1004 - unresponsive, racadm getsel shows CPU overheating alerts
 
== 2020-05-22 ==
* 22:42 krinkle@deploy1001: Synchronized php-1.35.0-wmf.31/includes/filerepo/: {{Gerrit|Ie19613ef7643a}} (duration: 01m 06s)
* 22:40 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/filerepo/: {{Gerrit|Ie19613ef7643a}} (duration: 01m 08s)
* 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:58 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:57 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:53 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:47 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:45 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:25 cdanis: fixing prometheus-nic-firmware-textfile.service wherever it is broken [[phab:T253374|T253374]]
* 15:25 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:24 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 15:06 marostegui: Decrease tendril_purge_global_status_log_5m storing rows time from 2 days to 1 day [[phab:T252331|T252331]]
* 15:01 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2137 into s4+s5 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11292 and previous config saved to /var/cache/conftool/dbconfig/20200522-150120-kormat.json
* 14:53 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/maintenance/blockUsers.php: (no justification provided) (duration: 01m 08s)
* 14:51 reedy@deploy1001: Synchronized php-1.35.0-wmf.32/maintenance/blockUsers.php: (no justification provided) (duration: 01m 09s)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11290 and previous config saved to /var/cache/conftool/dbconfig/20200522-143541-marostegui.json
* 14:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11289 and previous config saved to /var/cache/conftool/dbconfig/20200522-141513-marostegui.json
* 14:13 sukhe: upload dnsdist_1.4.0-1~deb10u1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11288 and previous config saved to /var/cache/conftool/dbconfig/20200522-140847-marostegui.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1144:331[45] and db1097:331[45]', diff saved to https://phabricator.wikimedia.org/P11286 and previous config saved to /var/cache/conftool/dbconfig/20200522-131452-marostegui.json
* 13:10 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:10 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:09 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:08 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1144:3314 and db1144:3315 to the list of hosts', diff saved to https://phabricator.wikimedia.org/P11284 and previous config saved to /var/cache/conftool/dbconfig/20200522-130707-marostegui.json
* 12:56 vgutierrez: depool cp4032 for some ats tests
* 12:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:04 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:48 marostegui: Stop MySQL on db1097:3314, db1097:3315 to clone db1144 - [[phab:T252512|T252512]]
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314, db1097:3315 - [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11281 and previous config saved to /var/cache/conftool/dbconfig/20200522-104437-marostegui.json
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:32 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:10 marostegui: Stop event_scheduler on db1115 - [[phab:T252331|T252331]]
* 10:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 10:05 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:00 jbond42: update pdns-recursor on dns recursors
* 09:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:41 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:22 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:09 elukey@deploy1001: Finished deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2 (duration: 00m 43s)
* 09:09 elukey@deploy1001: Started deploy [analytics/superset/deploy@be203c8]: Rollback superset to 0.35.2
* 08:41 vgutierrez: reverting hugepages experiment on cp2041
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11278 and previous config saved to /var/cache/conftool/dbconfig/20200522-082700-marostegui.json
* 08:18 elukey@deploy1001: Finished deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36 (duration: 01m 01s)
* 08:17 elukey@deploy1001: Started deploy [analytics/superset/deploy@59ba01d]: Upgrade Superset to 0.36
* 08:13 vgutierrez: test hugepages allocator on ATS in cp2041
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11277 and previous config saved to /var/cache/conftool/dbconfig/20200522-080629-marostegui.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11276 and previous config saved to /var/cache/conftool/dbconfig/20200522-074853-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149 and db1081', diff saved to https://phabricator.wikimedia.org/P11275 and previous config saved to /var/cache/conftool/dbconfig/20200522-072000-marostegui.json
* 07:07 elukey@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=druid1008.eqiad.wmnet
* 07:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
* 07:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=druid1007.eqiad.wmnet
* 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 - [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11272 and previous config saved to /var/cache/conftool/dbconfig/20200522-043418-marostegui.json
 
== 2020-05-21 ==
* 23:58 ejegg: updated civicrm from {{Gerrit|b658fd8233}} to {{Gerrit|6b1d5902dd}}
* 23:54 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/content/ContentHandlerFactory.php: {{Gerrit|If578893f5689}} (duration: 01m 06s)
* 23:47 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/LiquidThreads/classes/Thread.php: {{Gerrit|If3418cba06e}} (duration: 01m 07s)
* 23:41 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|I222457729a5b}} (duration: 01m 08s)
* 21:46 eileen: civicrm revision changed from {{Gerrit|ed4c9522ac}} to {{Gerrit|b658fd8233}}, config revision is {{Gerrit|9babae3954}}
* 21:10 foks: removing two files for legal compliance
* 20:44 bstorm_: labstore1005 is now running stretch and drbd devices are resyncing after several reboots and some significant effort [[phab:T224582|T224582]]
* 18:24 twentyafterfour: restarting phabricator on phab1001 to deploy https://phabricator.wikimedia.org/rPHEX2687d08786a9dadcbaa96709de991f471f239830
* 17:24 bblack: anycast experiment done, all back to normal
* 17:20 bblack: anycast experimentation commencing in ulsfo (test route withdrawal)...
* 17:04 bstorm_: starting labstore1005 upgrades [[phab:T224582|T224582]]
* 16:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Update mitigations for [[phab:T250887|T250887]] (duration: 01m 08s)
* 15:48 andrewbogott: rebuilding cloudnet1003.eqiad.wmnet with Debian Buster for [[phab:T253124|T253124]]
* 15:22 XioNoX: Add BGP between cr1/2-eqiad and authdns1001 - [[phab:T253196|T253196]]
* 15:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:08 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw217[0-2].codfw.wmnet
* 14:59 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw216[0-9].codfw.wmnet
* 14:58 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw215[8-9].codfw.wmnet
* 14:50 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:47 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 14:44 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 14:33 akosiaris: upload helmfile 0.109.0 to apt.wikimedia.org/buster-wikimedia and stretch-wikimedia, component main
* 13:51 vgutierrez: depool cp4032 for some ats tests
* 13:22 mutante: cloudnet1004 - reboot to test PXE boot
* 12:44 andrewbogott: reimaging cloudnet1004.eqiad.wmnet for [[phab:T253124|T253124]]
* 12:29 elukey: roll restart druid-public cluster (druid100[4-6], backend for the AQS API) to apply new settings + openjdk upgrade - [[phab:T252771|T252771]]
* 12:13 mutante: depooled mw2158 through mw2172 to make room again in C3 as planned ([[phab:T247018|T247018]])
* 12:12 marostegui: Repool labsdb1011 into the analytics role 🤞- [[phab:T249188|T249188]]
* 12:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw217[0-2].codfw.wmnet
* 12:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw216[0-9].codfw.wmnet
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11270 and previous config saved to /var/cache/conftool/dbconfig/20200521-120555-marostegui.json
* 12:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw215[8-9].codfw.wmnet
* 11:18 hnowlan: Removed changeprop from scb hosts
* 11:04 vgutierrez: rolling restart of ncredir servers for kernel update
* 10:17 vgutierrez: restart of acme-chief servers for kernel update
* 10:13 jbond42: deploy CI for puppet private repo
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11268 and previous config saved to /var/cache/conftool/dbconfig/20200521-101100-marostegui.json
* 10:07 mutante: replaced backend of people.wikimedia.org - people1001 will be inaccessible, replaced with people1002 on buster. all home dirs have been synced over, there should be no difference except you have to use people1002 now for uploads ([[phab:T247649|T247649]])
* 10:06 godog: test adding --sni to check_http -S on icinga2001 - [[phab:T253292|T253292]]
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11267 and previous config saved to /var/cache/conftool/dbconfig/20200521-095100-marostegui.json
* 09:28 mutante: deneb - sudo systemctl reset-failed to clear Icinga alerts about systemd degraded state
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143 and db1091', diff saved to https://phabricator.wikimedia.org/P11266 and previous config saved to /var/cache/conftool/dbconfig/20200521-091245-marostegui.json
* 09:01 mutante: LDAP - added lmata to wmf group ([[phab:T253277|T253277]])
* 08:55 XioNoX: Advertise Anycast 198.35.27.0/24 from esams - [[phab:T253196|T253196]]
* 08:52 XioNoX: Advertise Anycast 198.35.27.0/24 from eqsin - [[phab:T253196|T253196]]
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1143 with minimal weight for the first time [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11265 and previous config saved to /var/cache/conftool/dbconfig/20200521-084933-marostegui.json
* 08:47 XioNoX: Advertise Anycast 198.35.27.0/24 from eqiad/eqord - [[phab:T253196|T253196]]
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1143 to the list of s4 hosts, depooled - [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11264 and previous config saved to /var/cache/conftool/dbconfig/20200521-084226-marostegui.json
* 08:34 XioNoX: Advertise Anycast 198.35.27.0/24 from dfw - [[phab:T253196|T253196]]
* 08:27 XioNoX: Advertise Anycast 198.35.27.0/24 from ulsfo - [[phab:T253196|T253196]]
* 08:20 XioNoX: Delete ARIN route object for 198.35.26.0/23 - [[phab:T253196|T253196]]
* 08:13 XioNoX: Delete ROA for 198.35.26.0/23 - [[phab:T253196|T253196]]
* 08:10 XioNoX: repool ulsfo - [[phab:T253196|T253196]]
* 08:03 XioNoX: Shrink ulsfo's 198.35.26.0/23 to 198.35.26.0/24 - [[phab:T253196|T253196]]
* 07:29 XioNoX: depool ulsfo - [[phab:T253196|T253196]]
* 07:22 marostegui: Purge events from tendril.global_status_log older than 24h - [[phab:T252331|T252331]]
* 07:03 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 fully', diff saved to https://phabricator.wikimedia.org/P11263 and previous config saved to /var/cache/conftool/dbconfig/20200521-070335-jynus.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11261 and previous config saved to /var/cache/conftool/dbconfig/20200521-065858-marostegui.json
* 06:28 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 with 50% weight', diff saved to https://phabricator.wikimedia.org/P11260 and previous config saved to /var/cache/conftool/dbconfig/20200521-062823-jynus.json
* 06:04 vgutierrez: pool cp5012 - [[phab:T251219|T251219]]
* 05:42 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1019 with low weight', diff saved to https://phabricator.wikimedia.org/P11259 and previous config saved to /var/cache/conftool/dbconfig/20200521-054231-jynus.json
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set enwiki as read-only=off after maintenance [[phab:T251982|T251982]]', diff saved to https://phabricator.wikimedia.org/P11258 and previous config saved to /var/cache/conftool/dbconfig/20200521-050328-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set enwiki as read-only for maintenance [[phab:T251982|T251982]]', diff saved to https://phabricator.wikimedia.org/P11257 and previous config saved to /var/cache/conftool/dbconfig/20200521-050029-marostegui.json
* 01:03 krinkle@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|Ic9efa98312b}} (duration: 01m 08s)
 
== 2020-05-20 ==
* 20:16 herron: logstash1011:~# kafka-preferred-replica-election --zookeeper conf1004.eqiad.wmnet,conf1005.eqiad.wmnet,conf1006.eqiad.wmnet/kafka/logging-eqiad
* 19:27 robh: cp5012 still offline for mem tests; "fast" testing completed without errors and extended testing is in progress. System firmware was updated before testing. [[phab:T251219|T251219]]
* 18:10 XioNoX: accept 198.35.27.0/24 from Anycast peers on all routers - [[phab:T253196|T253196]]
* 18:01 XioNoX: add BGP between authdns2001 and cr1-codfw - [[phab:T253196|T253196]]
* 17:57 XioNoX: accept 198.35.27.0/24 from Anycast peers on cr3-ulsfo - [[phab:T253196|T253196]]
* 17:44 robh: cp5012 rebooting for troubleshooting
* 17:02 bblack: dns* + authdns* - disabling puppet to test https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/597311/
* 16:53 bblack: kraz.wikimedia.org ( https://wikitech.wikimedia.org/wiki/IRCD ) - stopping ircecho then ircd, then restarting them in reverse order - [[phab:T239993|T239993]]
* 16:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 16:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 15:42 elukey: update puppet compiler's facts
* 15:21 moritzm: installing libssh security updates
* 15:15 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 15:00 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T253096|T253096]] [itwikivoyage] Undeploy Insider and Listings extensions (duration: 01m 08s)
* 14:43 marostegui: Replace tendril_purge_global_status_log_5m event with the new one (purging every 2d of data and with a higher limit of rows) - [[phab:T252331|T252331]]
* 14:34 hnowlan@deploy1001: Finished deploy [restbase/deploy@6d2f88c]: Add awa.wikipedia.org to wikipedia list (duration: 19m 49s)
* 14:15 hnowlan@deploy1001: Started deploy [restbase/deploy@6d2f88c]: Add awa.wikipedia.org to wikipedia list
* 14:06 XioNoX: special-ranges6, remove 4000::/2 and 8000::/1
* 14:03 bblack: authdns1001 - poweroff for [[phab:T241770|T241770]]
* 14:00 bblack: cr2-eqiad - re-routing ns[01] public IPs from authdns1001 (going offline for hw work) to dns1002 - [[phab:T241770|T241770]] (redo from earlier, commit didn't take for whatever reason)
* 13:52 bblack: cr[12]-eqiad - re-routing ns[01] public IPs from authdns1001 (going offline for hw work) to dns1002 - [[phab:T241770|T241770]]
* 13:51 bblack: authdns1001 - downtimed for physical work - [[phab:T241770|T241770]]
* 13:24 milimetric@deploy1001: Finished deploy [analytics/refinery@a891999] (thin): Regular analytics weekly train THIN [analytics/refinery@a891999] (duration: 00m 10s)
* 13:23 milimetric@deploy1001: Started deploy [analytics/refinery@a891999] (thin): Regular analytics weekly train THIN [analytics/refinery@a891999]
* 13:23 milimetric@deploy1001: Finished deploy [analytics/refinery@a891999]: Regular analytics weekly train [analytics/refinery@a891999] (duration: 38m 33s)
* 13:23 godog: remove stale tcp service on lvs codfw low-traffic 10.2.1.53:10902
* 13:00 Amir1: creating two wikis are done
* 12:52 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 10m 49s)
* 12:45 milimetric@deploy1001: Started deploy [analytics/refinery@a891999]: Regular analytics weekly train [analytics/refinery@a891999]
* 12:41 ladsgroup@deploy1001: Synchronized static/images/project-logos/: Creating Wiktionary Konkani (gomwiktionary) - [[phab:T249506|T249506]] (duration: 01m 06s)
* 12:40 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating Wiktionary Konkani (gomwiktionary) - [[phab:T249506|T249506]] (duration: 01m 05s)
* 12:35 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Creating Wiktionary Konkani (gomwiktionary) - [[phab:T249506|T249506]]
* 12:33 ladsgroup@deploy1001: Synchronized dblists: Creating Wiktionary Konkani (gomwiktionary) - [[phab:T249506|T249506]] (duration: 01m 06s)
* 12:28 godog: roll-restart pybal on codfw low-traffic - [[phab:T233956|T233956]]
* 12:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 01s)
* 12:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 12:18 ladsgroup@deploy1001: Synchronized langlist: Create Awadhi Wikipedia (awawiki) - [[phab:T251371|T251371]] (duration: 01m 06s)
* 12:16 ladsgroup@deploy1001: Synchronized static/images/project-logos: Create Awadhi Wikipedia (awawiki) - [[phab:T251371|T251371]] (duration: 01m 06s)
* 12:14 ladsgroup@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create Awadhi Wikipedia (awawiki) - [[phab:T251371|T251371]] (duration: 01m 06s)
* 12:12 ladsgroup@deploy1001: rebuilt and synchronized wikiversions files: Create Awadhi Wikipedia (awawiki) - [[phab:T251371|T251371]]
* 12:07 ladsgroup@deploy1001: Synchronized dblists: (no justification provided) (duration: 01m 08s)
* 11:37 mutante: rebooting ganeti1009 and ganeti1011 to hopefully clear icinga alerts about microcode mitigations
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool new host db1142 and db1084', diff saved to https://phabricator.wikimedia.org/P11253 and previous config saved to /var/cache/conftool/dbconfig/20200520-111013-marostegui.json
* 11:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1018, es1015 fully', diff saved to https://phabricator.wikimedia.org/P11252 and previous config saved to /var/cache/conftool/dbconfig/20200520-110732-jynus.json
* 11:04 jbond42: roll out update of exim4
* 10:46 moritzm: installing 4.19.118 Linux packages on Buster hosts
* 10:28 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1142 and db1084 on s4', diff saved to https://phabricator.wikimedia.org/P11250 and previous config saved to /var/cache/conftool/dbconfig/20200520-101928-marostegui.json
* 10:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1018, es1015 at 50% weight', diff saved to https://phabricator.wikimedia.org/P11249 and previous config saved to /var/cache/conftool/dbconfig/20200520-100726-jynus.json
* 09:43 vgutierrez: disable KA for POST/PUT requests on esams - [[phab:T249335|T249335]]
* 09:36 XioNoX: create ROAs for 198.35.26.0/24 and 198.35.27.0/24 - [[phab:T253196|T253196]]
* 09:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1142 and db1084 on s4', diff saved to https://phabricator.wikimedia.org/P11247 and previous config saved to /var/cache/conftool/dbconfig/20200520-093141-marostegui.json
* 09:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:28 XioNoX: create ARIN inetnum 198.35.27.0/24 and route 198.35.26.0/24 + 198.35.27.0/24 - [[phab:T253196|T253196]]
* 09:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:26 marostegui: Upgrade db1083 (s1 master) to 10.1.43-2 without restarting [[phab:T251982|T251982]]
* 09:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for new host db1142 and start to repool db1084', diff saved to https://phabricator.wikimedia.org/P11246 and previous config saved to /var/cache/conftool/dbconfig/20200520-091153-marostegui.json
* 09:08 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 09:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:01 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1142 with minimum weight for the first time [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11245 and previous config saved to /var/cache/conftool/dbconfig/20200520-085757-marostegui.json
* 08:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:49 _joe_: converting mw1266-1275 to use envoy [[phab:T247389|T247389]]
* 08:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:42 XioNoX: Remove bogons4 for policy options on all routers - gerrit 597272
* 08:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:33 _joe_: disabling puppet on mw1266-1275 for migration to envoy
* 08:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 07:41 marostegui: alter table categorylinks engine=Innodb ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8,force on all labsdb1011 wikis - [[phab:T249188|T249188]]
* 07:24 moritzm: install systemd security updates
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 to clone db1142 [[phab:T252512|T252512]]', diff saved to https://phabricator.wikimedia.org/P11241 and previous config saved to /var/cache/conftool/dbconfig/20200520-071010-marostegui.json
* 00:05 RoanKattouw: Ran namespaceDupes.php on tiwiki and tiwiktionary for [[phab:T251287|T251287]]
* 00:03 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set sitename and meta namespace localizations for tiwiki and tiwiktionary ([[phab:T251287|T251287]]) (duration: 01m 06s)
 
== 2020-05-19 ==
* 23:59 RoanKattouw: Ran namespaceDupes.php on jvwiki and jvwiktionary for [[phab:T252754|T252754]]
* 23:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Insider/includes/InsiderHooks.php: [[phab:T252846|T252846]] Use SidebarBeforeOutput hook with correct format (duration: 01m 06s)
* 23:55 catrope@deploy1001: Finished scap: i18n scap for namespace localizations ([[phab:T251287|T251287]], [[phab:T252754|T252754]]) (duration: 62m 26s)
* 22:53 catrope@deploy1001: Started scap: i18n scap for namespace localizations ([[phab:T251287|T251287]], [[phab:T252754|T252754]])
* 18:46 herron: performing rolling restarts of codfw/eqiad ELK clusters for java updates
* 18:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant template editors editcontentmodel on enwiki ([[phab:T253081|T253081]]) (duration: 01m 06s)
* 18:35 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments features on frwiki ([[phab:T252420|T252420]]) (duration: 01m 08s)
* 17:09 arturo: added tesseract suite to stretch-wikimedia component/tesseract-410-bpo ([[phab:T247422|T247422]])
* 16:24 godog: power cycle thanos-fe* / thanos-be*
* 15:23 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2073 into s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11236 and previous config saved to /var/cache/conftool/dbconfig/20200519-152340-kormat.json
* 15:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 cdanis: canary on ~150 hosts looks great, re-enabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical'  'enable-puppet "cdanis deploying I68c97d5"'
* 15:04 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:04 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 moritzm: installing fuse update from Buster point release
* 14:47 cdanis: disabling puppet on all physical hosts ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕥☕ sudo cumin 'F:virtual = physical'  'disable-puppet "cdanis deploying I68c97d5"'
* 14:38 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:26 XioNoX: Set minimum-links 2 to AMS-IX LACP - [[phab:T253122|T253122]]
* 13:53 XioNoX: configure new AMS-IX port as quarantine - [[phab:T251121|T251121]]
* 13:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:09 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 13:09 jayme: updated helm: 2.16.7-1 -> 2.16.7-2 on deploy[1,2]001 and contint[1,2]001
* 13:09 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:03 kormat@cumin1001: dbctl commit (dc=all): 'Pool db2136 into s4 [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11233 and previous config saved to /var/cache/conftool/dbconfig/20200519-130313-kormat.json
* 12:40 ariel@deploy1001: Finished deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good (duration: 00m 04s)
* 12:40 ariel@deploy1001: Started deploy [dumps/dumps@a329605]: make page content fixup script move inprog files into place if good
* 12:37 jayme: imported helm 2.16.7-2 to main for buster-wikimedia, stretch-wikimedia, jessie-wikimedia
* 12:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 11:51 jynus: starting backups of es1, es2, es3 on eqiad into backup1002
* 11:41 jynus@cumin1001: dbctl commit (dc=all): 'Depool es1018, es1015, es1019', diff saved to https://phabricator.wikimedia.org/P11232 and previous config saved to /var/cache/conftool/dbconfig/20200519-114148-jynus.json
* 11:12 marostegui: Deploy schema change on db2124 (frwiki, jawiki, ruwiki) [[phab:T238966|T238966]]
* 10:34 mutante: releases2001 - restarted failed jenkins
* 10:33 mutante: releases2001 - Failed to restart jenkins.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
* 10:32 volans: flushed all Netbox caches (manage.py invalidate all) - [[phab:T253091|T253091]]
* 10:29 volans: start Netbox restore - [[phab:T253091|T253091]]
* 10:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:13 akosiaris: upgrade etherpad-lite to 1.8.4 on etherpad1002
* 09:58 hnowlan: roll-restart of eqiad restbase hosts for java security updates
* 09:58 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 09:55 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 09:55 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:54 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:10 godog: eqiad-prod: decom ms-be101[678] - [[phab:T252008|T252008]]
* 08:07 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqsin
* 08:04 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - esams
* 08:01 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - eqiad
* 07:55 volker-e@deploy1001: Finished deploy [design/style-guide@37c67dd]: Deploy design/style-guide:  (duration: 00m 06s)
* 07:54 volker-e@deploy1001: Started deploy [design/style-guide@37c67dd]: Deploy design/style-guide:
* 07:52 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - *dfw
* 07:49 XioNoX: Push 596597: BGP: standardize fixed part of IX4/IX6 groups - ulsfo
* 07:45 vgutierrez: rolling upgrade to trafficserver 8.0.7-1wm10 with puppet disabled on cp hosts
* 07:09 jynus: starting es4 & es5 eqiad backups with low concurrency
* 06:35 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 06:29 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 06:24 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 06:17 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 05:57 volker-e@deploy1001: Finished deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide:  (duration: 00m 06s)
* 05:57 volker-e@deploy1001: Started deploy [design/style-guide@7bfbd2a]: Deploy design/style-guide:
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only=off for maintenance [[phab:T251981|T251981]]', diff saved to https://phabricator.wikimedia.org/P11227 and previous config saved to /var/cache/conftool/dbconfig/20200519-050346-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 and s8 as read-only for maintenance [[phab:T251981|T251981]]', diff saved to https://phabricator.wikimedia.org/P11226 and previous config saved to /var/cache/conftool/dbconfig/20200519-050043-marostegui.json
* 04:27 marostegui: Repool labsdb1011 [[phab:T249188|T249188]]
* 03:29 volker-e@deploy1001: Finished deploy [design/style-guide@4b4bc51]: Deploy design/style-guide:  (duration: 00m 07s)
* 03:28 volker-e@deploy1001: Started deploy [design/style-guide@4b4bc51]: Deploy design/style-guide:
 
== 2020-05-18 ==
* 23:50 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:25 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:12 ryankemper: Restarted `wdqs-updater` across all wdqs nodes and restarted `wdqs-categories` across all nodes except 1010 (test wdqs server) and 1009 (automated deployment server)
* 22:55 Krinkle: Clear module_deps on dewiki (group2, old mw version, s5) to monitor regeneration
* 22:48 Krinkle: Clear module_deps on group0 (mostly s3) to monitor regeneration
* 22:35 Krinkle: Clear module_deps on commonswiki (group1, s4) to monitor regeneration
* 22:33 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@4886dc3]: 0.3.32 (duration: 17m 12s)
* 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:18 Krinkle: Clear module_deps on s2 wikis to monitor regeneration
* 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:15 ryankemper@deploy1001: Started deploy [wdqs/wdqs@4886dc3]: 0.3.32
* 22:02 Krinkle: Clear module_deps on hewiki (group1, s7) to monitor regeneration, ref [[phab:T247028|T247028]]
* 21:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:23 krinkle@deploy1001: Synchronized php-1.35.0-wmf.32/includes/resourceloader/dependencystore/: {{Gerrit|I015fa5885}}, {{Gerrit|I972a93806006}} (duration: 01m 07s)
* 21:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:27 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@12efc14]: Update mobileapps to {{Gerrit|c960b349}} (duration: 03m 31s)
* 20:24 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@12efc14]: Update mobileapps to {{Gerrit|c960b349}}
* 19:07 herron: performing rolling maintenance on kafka-main to pick up java security updates
* 19:00 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ic005093778d}} (duration: 01m 08s)
* 18:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: {{Gerrit|Ic005093778d}} (duration: 01m 06s)
* 18:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:38 volans: upgraded spicerack to 0.0.37-1 on cumin[12]001
* 18:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix English Wikipedia wordmark dimensions ([[phab:T252143|T252143]]) (duration: 01m 06s)
* 17:14 XioNoX: update domain object for 56.15.185.in-addr.arpa - [[phab:T247972|T247972]]
* 17:06 bblack: dns1001 - removing downtimes, back in service - [[phab:T241770|T241770]]
* 16:45 bstorm_: updated views on labsdb1011 for the wb_terms changes [[phab:T251598|T251598]]
* 16:32 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 16:17 bblack: dns1001 - reimaging for new NIC - [[phab:T241770|T241770]]
* 16:10 volans: uploaded spicerack_0.0.37-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 15:52 hnowlan: rolling codfw cassandra for java security updates
* 15:51 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 15:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 15:11 Krinkle: krinkle@mc1021 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 14:57 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:56 hnowlan: roll-restart of sessionstore cassandra hosts for java security update
* 14:55 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart
* 14:53 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 14:50 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 14:35 hnowlan@deploy1001: Finished deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this (duration: 01m 22s)
* 14:34 hnowlan@deploy1001: Started deploy [changeprop/deploy@16bf19f]: Stop consuming purges topic, purged is now doing this
* 14:33 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of esams [[phab:T133821|T133821]]
* 14:29 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqiad [[phab:T133821|T133821]]
* 14:23 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of eqsin, ulsfo [[phab:T133821|T133821]]
* 14:19 _joe_: start consuming $dc.resource-purge kafka topic from purged in all of codfw [[phab:T133821|T133821]]
* 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2073 while replacing it [[phab:T252985|T252985]]', diff saved to https://phabricator.wikimedia.org/P11216 and previous config saved to /var/cache/conftool/dbconfig/20200518-141505-kormat.json
* 14:12 bblack: dns1001 - shutting down for [[phab:T241770|T241770]]
* 14:09 volans: uploaded spicerack_0.0.36-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 14:07 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching back to authdns1001 (oops, that's not the server we're taking offline today!)
* 14:06 vgutierrez: upload trafficserver 8.0.7-1wm9 to apt.wm.o (buster)
* 14:02 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 14:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 13:57 bblack: authdns - ns[01] static routes on cr[12]-eqiad switching from authdns1001 to dns1002 for [[phab:T241770|T241770]]
* 13:29 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 13:00 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: VectorTemplate: SkinTemplateToolboxEnd hook isn't deprecated - [[phab:T252906|T252906]] (duration: 01m 07s)
* 11:52 marostegui: Install 10.1.43-2 on db1122 and db1109 - [[phab:T251981|T251981]]
* 11:27 Lucas_WMDE: EU SWAT done
* 11:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/Wikibase/: SWAT: [[gerrit:596616{{!}}Fix core's TitleFactory not being used correctly (T252803)]] (duration: 01m 12s)
* 11:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:597010{{!}}Update GrowthExperiments mentor list page for viwiki]] (duration: 01m 06s)
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:596916{{!}}Make the threshold for Chinese WP to prevent publishing 5% more strict (T252786)]] (duration: 01m 06s)
* 10:38 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:597033{{!}} Bumping portals to master (597033)]] (duration: 01m 06s)
* 10:37 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:597033{{!}} Bumping portals to master (597033)]] (duration: 01m 32s)
* 10:37 elukey: copy prometheus-druid-exporter 0.8-1 from stretch to buster wikimedia
* 10:20 _joe_: upgrading purged in the remaining datacenters
* 10:07 elukey: upload druid 0.12.3-1.1 to stretch{{!}}buster-wikimedia
* 10:02 vgutierrez: upload trafficserver 8.0.7-1wm8 to apt.wm.o (buster)
* 09:53 _joe_: upgrading purged in codfw, ulsfo
* 09:46 mutante: contint2001 - apt-get remove --purge openjdk-11-* - [[phab:T224591|T224591]]
* 09:43 _joe_: upload purged 0.13 to buster-wikimedia
* 08:44 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 08:25 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 08:13 godog: set weight to 0 for all but objects in ms-be10[678] - [[phab:T252008|T252008]]
* 07:57 mutante: replacing apache module with httpd module on deployment servers
* 07:47 moritzm: installing apt security updates on jessie systems
* 07:36 marostegui: Remove and add pc2007 from tendril as the Act is frozen after reimage - [[phab:T250666|T250666]]
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2088 after upgrade', diff saved to https://phabricator.wikimedia.org/P11214 and previous config saved to /var/cache/conftool/dbconfig/20200518-072234-marostegui.json
* 07:20 marostegui: Upload MariaDB 10.4.13 to the buster repo - [[phab:T250666|T250666]]
* 07:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:41 marostegui: Stop MySQL on db2088
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 for upgrade', diff saved to https://phabricator.wikimedia.org/P11213 and previous config saved to /var/cache/conftool/dbconfig/20200518-062452-marostegui.json
* 05:55 _joe_: installing purged 0.12 on cp2027
* 05:54 _joe_: uploaded purged 0.12 to apt.w.o
* 05:00 marostegui: Stop MySQL on labsdb1011 to copy its content to backup1001 [[phab:T249188|T249188]]
 
== 2020-05-16 ==
* 22:04 Krinkle: krinkle@mc1022 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 21:56 Krinkle: krinkle@mc1019 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 20:23 Krinkle: krinkle@mc1034,mc1035,mc1036 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 20:04 Krinkle: krinkle@mc1033 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:57 Krinkle: krinkle@mc1032 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:51 Krinkle: krinkle@mc1031 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:42 Krinkle: krinkle@mc1030 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:25 Krinkle: krinkle@mc1029 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 19:10 Krinkle: krinkle@mc1028 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet
* 18:58 Krinkle: krinkle@mc1027 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:54 Krinkle: krinkle@mc1026 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:30 Krinkle: krinkle@mc1024 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 18:24 Krinkle: krinkle@mc1025 Pruning the old `echo:seen:` Redis keys that didn't have a ttl yet, ref [[phab:T252945|T252945]]
* 17:56 Krinkle: krinkle@mc1023 Pruning old echo:seen: Redis keys that didn't use a ttl yet, ref [[phab:T252945|T252945]]
* 17:49 Krinkle: krinkle@mwmaint1002: Running cleanupRemovedModules.php to prune old module_deps rows [[phab:T113916|T113916]]
* 17:24 Krinkle: krinkle@mc1020 Prune old echo:seen: keys that have ttl:-1 from Redis main stash, ref [[phab:T252945|T252945]]
* 15:16 Krinkle: krinkle@mc1020 Looking at why there are still over 2M echo:seen keys in redis main stash
* 00:55 krinkle@deploy1001: Synchronized wmf-config/logging.php: {{Gerrit|I046868190b472}} (duration: 01m 13s)
* 00:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:16 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:10 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:06 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 00:05 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:05 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
 
== 2020-05-15 ==
* 23:50 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:46 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:46 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:46 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:43 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 23:35 ryankemper: Pooled wdqs2007 following successful query tests (all data transfers are done now)
* 22:53 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1b1578a57ef5}} (duration: 01m 07s)
* 22:51 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Iaa240eb8cf9}} (duration: 01m 06s)
* 21:41 ryankemper: depooled wdqs2007 while it catches up on lag
* 21:40 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:36 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:33 ryankemper: pooled wdqs2003 and wdqs1007 following successful query tests
* 19:46 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|If0fd1b51}} (duration: 01m 08s)
* 18:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:34 ryankemper: depooled wdqs2003 while lag catches up
* 18:32 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:55 vgutierrez: upload acme-chief 0.25 to apt.wm.o (buster) - [[phab:T252881|T252881]]
* 17:27 XioNoX: renumber cr2-eqord:xe-0/1/1 to xe-0/1/3 - [[phab:T221259|T221259]]
* 17:02 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:01 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:00 ryankemper: depooled wdqs1007 in preparation for impending wdqs data xfer
* 16:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:53 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:02 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:57 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:56 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:52 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:49 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:45 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:44 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:40 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:36 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:32 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:31 gehel@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:27 gehel@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 14:19 cdanis: reverting sysctl net.ipv4.udp_mem to original on netflow3001
* 14:18 cdanis: re-enable puppet on netflow*
* 14:14 cdanis: disable puppet on netflow*
* 14:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:47 ema: cp2029, cp3050: varnish-fe-restart to clear 'child restarted' alerts
* 13:47 vgutierrez: downgrade ats to version 8.0.7-1wm7 on cp4032
* 13:42 vgutierrez: upgrade ats to version 8.0.7-1wm8 on cp4032
* 13:37 mutante: rsyncing gerrit git data from gerrit1001 to gerrit1002 ([[phab:T200739|T200739]])
* 13:13 cdanis: increase samplicator recvbuf on netflow3001 & restart samplicator
* 13:01 cdanis: increasing sysctl net.ipv4.udp_mem on netflow3001
* 09:57 vgutierrez: upload trafficserver 8.0.7-1wm7 to apt.wm.o (buster)
* 09:21 ema: cp2029: attempt forced discard of stuck VCL [[phab:T236754|T236754]]
* 09:09 elukey: restart druid brokers on druid100[4-6] - locked up due to datasources dropped - [[phab:T226035|T226035]]
* 08:51 ema: cp2029: try out varnish 5.1.3-1wm15 [[phab:T236754|T236754]]
* 07:36 XioNoX: bumps prefix limit for AS16735 in eqiad
* 05:35 jynus: stop replication on pc2009, pc2010 for benchmarking [[phab:T252761|T252761]]
* 04:53 volker-e@deploy1001: Finished deploy [design/style-guide@dc956a3]: Deploy design/style-guide:  (duration: 00m 10s)
* 04:52 volker-e@deploy1001: Started deploy [design/style-guide@dc956a3]: Deploy design/style-guide:
* 04:42 vgutierrez: repool cp5006
* 04:28 vgutierrez: depool and reboot cp5006
 
== 2020-05-14 ==
* 23:24 catrope@deploy1001: Synchronized static/images/project-logos/: Revert temporary 20k logo for vecwiki ([[phab:T252770|T252770]]) (duration: 01m 06s)
* 23:23 RoanKattouw: Ran namespaceDupes.php for [[phab:T252343|T252343]]
* 23:20 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create Gapura (Portal) namespace on jvwiki ([[phab:T252343|T252343]]) (duration: 01m 06s)
* 23:09 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.ub.uni-heidelberg.de and hq.eso.org to $wgCopyUploadDomains ([[phab:T252600|T252600]], [[phab:T252726|T252726]]) (duration: 01m 07s)
* 21:43 ryankemper: depooled wdqs2006 while lag recovers
* 21:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:08 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:16 volans: moved codereview.tar.gz and with_r.tar.gz from miscweb1002 to cumin1001 to free space
* 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/skins/Vector/includes/VectorTemplate.php: Allow plain text labels in side bar - [[phab:T252727|T252727]] (duration: 01m 06s)
* 19:51 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:49 ryankemper: Depooled wdqs1006 in preparation for impending wdqs data xfer
* 18:36 Urbanecm: Morning SWAT done
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|15adbbc}}: [thwikisource] Set ProofReadPage separator to an empty string ([[phab:T252610|T252610]]) (duration: 01m 06s)
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b8399c}}: Undeploy graphoid from mediawikiwiki ([[phab:T242855|T242855]]) (duration: 01m 05s)
* 18:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|f03a45c}}: Adding import to test wikis from mediawikiwiki ([[phab:T242855|T242855]]) (duration: 01m 07s)
* 17:03 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 1 member 1 - [[phab:T252797|T252797]]
* 16:55 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 3 member 1 - [[phab:T252797|T252797]]
* 16:51 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port set pic-slot 0 port 48 member 2 - [[phab:T252797|T252797]]
* 16:50 XioNoX: request virtual-chassis vc-port set pic-slot 1 port 2 member 1 - [[phab:T252797|T252797]]
* 16:42 XioNoX: request virtual-chassis vc-port delete pic-slot 1 port 2 member 1 - [[phab:T252797|T252797]]
* 16:36 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 0 port 48 member 2 - [[phab:T252797|T252797]]
* 15:59 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:25 XioNoX: disable asw2-d1-eqiad:et-1/1/0 - [[phab:T251663|T251663]]
* 14:39 mutante: kuai kuai is https://twitter.com/Arlieth/status/1257714333133357056 {{!}} https://en.wikipedia.org/wiki/Kuai_Kuai_culture
* 13:31 _joe_: updating purged to 0.11 in eqiad,eqsin,esams
* 12:47 vgutierrez: rolling upgrade ats to version 8.0.7-1wm7
* 12:46 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 12:43 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 12:22 kormat: reverted iosched on pc1010 to `mq-deadline` [[phab:T252761|T252761]]
* 11:47 kormat: changed iosched on pc1010 to `none` as a test [[phab:T252761|T252761]]
* 11:07 matthiasmullie: EU swat done
* 11:05 mlitn@deploy1001: Synchronized php-1.35.0-wmf.32/extensions/WikibaseMediaInfo/: [MediaInfo] Enable media search for all users by default (duration: 01m 12s)
* 11:04 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp3064
* 10:31 fdans@deploy1001: Finished deploy [analytics/refinery@6f13979]: Regular analytics weekly train (duration: 17m 14s)
* 10:14 fdans@deploy1001: Started deploy [analytics/refinery@6f13979]: Regular analytics weekly train
* 09:58 elukey: remove matomo 3.11 from the main component of stretch-wikimedia
* 09:56 elukey: upgrade matomo on matomo1001 to 3.13.3 (latest upstream) - [[phab:T252741|T252741]]
* 09:30 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 09:29 elukey: upload matomo-3.13.3 to thirdparty/matomo on stretch{{!}}buster-wikimedia
* 09:22 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 08:57 elukey: imported gpg key 1FD752571FE36FF23F78F91B81E2E78B66FED89E in apt1001 (Matomo public debian repo)
* 08:56 moritzm: installing Java security updates on Presto
* 08:43 jayme: updated helm: 2.12.2-1 -> 2.16.7-1 on deploy[1,2]001 and contint1001. 2.12.2-4 -> 2.16.7-1 on contint2001
* 08:39 jayme: imported helm 2.16.7-1 to main for jessie-wikimedia
* 08:32 moritzm: installing Java security updates on Hadoop/AQS/Druid
* 08:20 jayme@deploy2001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 08:00 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp5011
* 07:03 moritzm: installing apt security updates
* 06:33 ryankemper: Pooled wdqs2005 following successful test queries
* 04:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:02 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:59 ryankemper: wdqs1005 has been de-pooled pending wdqs data xfer
* 02:57 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 02:57 ryankemper: wdqs1004 was repooled after successful test queries
* 02:55 ryankemper: wdqs2006 was repooled after successful test queries
* 01:32 ryankemper: depooled wdqs2006 while waiting for lag to recover
* 00:54 foks: change password for "Python eggs"
* 00:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:31 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:08 twentyafterfour: phabricator update appears to be stable.
* 00:05 twentyafterfour: updating phabricator. 1 patch + new translations. Expect only brief downtime.
 
== 2020-05-13 ==
* 23:46 cstone: SmashPig revision changed from {{Gerrit|cd1a49da5f}} to {{Gerrit|2702b04329}}
* 23:43 ejegg: updated payments-wiki from {{Gerrit|dabba1804c}} to {{Gerrit|3c465cb11c}}
* 23:36 ejegg: rolled back payments-wiki to {{Gerrit|dabba1804c}}
* 23:29 ejegg: updated payments-wiki from {{Gerrit|dabba1804c}} to {{Gerrit|3c465cb11c}}
* 22:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:39 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper: Depooled wdqs1004 for subsequent wdqs data xfer
* 22:29 ryankemper: Pooled wdqs2005 given that lag has returned to normal levels and the instance is responding to queries correctly
* 22:26 ryankemper: Pooled wdqs1008 given that lag has returned to normal levels and the instance is responding to queries correctly
* 21:30 elukey: powercycle analytics1055
* 21:05 eileen: civicrm revision changed from {{Gerrit|cfb6101e39}} to {{Gerrit|ed4c9522ac}}, config revision is {{Gerrit|2eb75f8dff}}
* 20:16 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242430|T242430]] Stop loading the ParsoidBatchAPI extension (duration: 01m 08s)
* 19:09 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.32 (duration: 01m 05s)
* 19:08 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.32
* 18:54 twentyafterfour: restarted php-fpm on phab1001
* 18:53 thcipriani: restarting gerrit
* 18:52 twentyafterfour: restarting apache on phab1001 for lack of a better idea
* 18:50 herron: restarted kafka broker on kafka-main1001 for java security updates
* 18:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|38db3e0}}: Update production wordmarks ([[phab:T252143|T252143]]) (duration: 01m 07s)
* 18:17 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|38db3e0}}: Update production wordmarks ([[phab:T252143|T252143]]) (duration: 01m 09s)
* 17:55 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 17:51 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:24 ryankemper: Manually depooled wdqs2005 while lag catches up following the data xfer
* 17:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:18 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:12 urandom: restarted cassandra-c, restbase2017
* 17:04 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:57 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:11 James_F: Running AbuseFilter updateVarDumps on group0 on mwmaint1002 [[phab:T246539|T246539]]
* 16:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 vgutierrez: upgrade ats to version 8.0.7-1wm7 on cp4032
* 15:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:30 jayme: imported scap 3.14.0-1 to main for buster-wikimedia
* 15:30 jayme: imported scap 3.14.0-1 to main for jessie-wikimedia
* 15:29 ryankemper: Manually de-pooling `wdqs1008.eqiad.wmnet` in preparation for wdqs data transfer
* 15:29 jayme: imported scap 3.14.0-1 to main for stretch-wikimedia
* 15:26 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 15:23 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 15:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:55 _joe_: upgrading + restarting purged across ulsfo and codfw [[phab:T133821|T133821]]
* 14:50 filippo@deploy1001: Finished deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 [[phab:T251222|T251222]] (duration: 00m 10s)
* 14:50 filippo@deploy1001: Started deploy [librenms/librenms@0a88d64]: Upgrade LibreNMS to 1.63 [[phab:T251222|T251222]]
* 14:35 vgutierrez: upload trafficserver 8.0.7-1wm6 to apt.wm.o (buster) - [[phab:T249335|T249335]] [[phab:T251537|T251537]]
* 13:59 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:57 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:55 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 11:39 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:595881{{!}}Add *.deutsche-digitale-bibliothek.de to the wgCopyUploadsDomains (T252296)]] (duration: 01m 06s)
* 11:17 Amir1: EU SWAT is done
* 11:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:596180{{!}}Disable wgLegacyJavaScriptGlobals on fawiki and wikidatawiki (T72470)]] (duration: 01m 06s)
* 11:09 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:06 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:595544{{!}}Anchor RegExp for Data Bridge in Beta (BETA-ONLY)]] (duration: 01m 06s)
* 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:00 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 10:55 volans: imported tqdm 4.11.2-1 packages into buster-wikimedia component/spicerack
* 10:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:09 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master [[phab:T252182|T252182]] (duration: 01m 05s)
* 09:55 jbond42: deployed a fix to the ferm-status script. Unmanaged ferm rules may get removed
* 09:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:37 marostegui: Upgrade db2102 to the new 10.4.13 - [[phab:T250666|T250666]]
* 09:32 _joe_: installing purged 0.11 on cp2027 [[phab:T133821|T133821]]
* 09:21 _joe_: installing purged 0.11 on cp2028 [[phab:T133821|T133821]]
* 09:11 moritzm: re-enabling puppet
* 09:08 mutante: rsyncing /home dirs from people.wikimedia.org to new backend people1002
* 09:00 moritzm: disabling puppet temporarily
* 08:53 _joe_: uploaded purged 0.11
* 08:52 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 as pc1 master [[phab:T252182|T252182]] (duration: 01m 17s)
* 07:42 jayme: imported helm 2.16.7-1 to main for stretch-wikimedia
* 07:41 jayme: imported helm 2.16.7-1 to main for buster-wikimedia
* 07:29 godog: roll-restart logstash in codfw/eqiad for configuration change
* 07:14 elukey: upload spark2_2.4.4-bin-hadoop2.6-2 for buster/stretch on apt1001
* 05:33 ryankemper: wdqs2004 was depooled ~3 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
* 05:32 ryankemper: wdqs1003 was depooled ~6 hours ago and was re-pooled ~10 mins ago after verifying the wdqs service was healthy
* 05:27 _joe_: restarting php-fpm on mw1374, children dying with SIGILL
* 05:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 05:11 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 05:10 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 05:10 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 04:52 kart_: Updated cxserver to 2020-05-11-082207-production ([[phab:T250004|T250004]])
* 04:47 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:44 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 04:42 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:33 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
 
== 2020-05-12 ==
* 23:09 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:06 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:15 hashar@deploy1001: Synchronized php-1.35.0-wmf.32/includes/revisionlist/RevisionItemBase.php: Fix RevisionItemBase::getId to actually return an int, as intended - [[phab:T252076|T252076]] (duration: 01m 06s)
* 19:55 dpifke@deploy1001: Finished deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - [[phab:T238086|T238086]] (duration: 00m 05s)
* 19:55 dpifke@deploy1001: Started deploy [performance/navtiming@48110b9]: Fixes swapped dc/host labels - [[phab:T238086|T238086]]
* 19:05 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.32
* 18:41 legoktm: started codereview-archiver script in screen on mwmaint1002
* 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:17 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:14 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:49 bblack: 'gdnsdctl replace' on all authdns to load new maxmind data
* 17:43 bblack: updating maxmind database on puppetmasters (usually automated weekly; we're mid-cycle)
* 17:10 James_F: Running AbuseFilter updateVarDumps on testwikis on mwmaint1002 [[phab:T246539|T246539]]
* 16:55 James_F: Running AbuseFilter updateVarDumps on closed wikis on mwmaint1002 [[phab:T246539|T246539]]
* 16:55 mstyles@deploy1001: Finished deploy [wdqs/wdqs@f617307]: v0.3.31 (duration: 14m 53s)
* 16:40 mstyles@deploy1001: Started deploy [wdqs/wdqs@f617307]: v0.3.31
* 16:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:34 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-query
* 15:15 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:14 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 15:13 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:13 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:09 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:04 moritzm: installing 4.9.118 Linux updates on Buster nodes (reboots happening later)
* 15:02 moritzm: upgrading contint2001 to openjdk-8 u252
* 15:01 godog: bounce pybal on lvs2010 and lvs2009 - [[phab:T252186|T252186]]
* 14:40 moritzm: imported openjdk-8 u252 forward port for buster-wikimedia component/jdk8
* 14:40 ema: rolling thumbor upgrade to 2.8-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 14:39 andrewbogott: rebuilding cloudcontrol1003 and 1004
* 14:38 hashar: 1.35.0-wmf.32 is on test wikis. Will be pushed to group0 later today during the American window (19:00 - 21:00 UTC) # [[phab:T249964|T249964]]
* 14:34 ema: thumbor2001: repool
* 14:33 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventLogging to EventGate: - Test everywhere, SearchSatisfaction on testwiki only - [[phab:T249261|T249261]] (duration: 01m 06s)
* 14:33 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.8-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 14:23 moritzm: installing Java security updates on WDQS hosts
* 14:20 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.32 (duration: 72m 04s)
* 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:05 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:00 ema: thumbor2001: depool due to minor bug in 2.7-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 13:54 ema: thumbor2001: pool thumbor 2.7-1+deb10u1 for prod traffic [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 13:50 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.7-1+deb10u1 [[phab:T252509|T252509]] [[phab:T219569|T219569]] [[phab:T236240|T236240]]
* 13:42 jbond42: disable puppet on all CP hosts to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/583342
* 13:36 kormat: reimaging pc2007 to buster [[phab:T252182|T252182]]
* 13:36 moritzm: rebooting netflow* hosts for kernel update
* 13:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:33 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm5 - [[phab:T249335|T249335]]
* 13:31 moritzm: rebooting deneb for kernel update
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 13:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:08 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.32
* 13:05 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.28 (duration: 23m 47s)
* 12:37 moritzm: installing iputils update from Buster point release
* 12:08 hashar: Cutting branch 1.35.0-wmf.32 # [[phab:T249964|T249964]]
* 12:08 gehel: restart blazegraph + updater on wdqs2002 - JVM upgrade
* 11:56 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 11:20 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:55 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp5011 - [[phab:T249335|T249335]]
* 10:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 10:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 10:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 10:43 kormat: reimaging pc2010 to buster [[phab:T252182|T252182]]
* 10:30 vgutierrez: upgrade trafficserver to version 8.0.7-1wm5 on cp4032 - [[phab:T249335|T249335]]
* 10:30 ema: rolling thumbor upgrade to 2.6-1+deb10u1 [[phab:T226707|T226707]]
* 10:19 ema: repool thumbor2001 with upgraded python-thumbor-wikimedia
* 10:13 ema: thumbor2001: upgrade python-thumbor-wikimedia to 2.6-1+deb10u1
* 10:04 godog: update compiler facts
* 09:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:34 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 09:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 09:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 09:29 filippo@cumin1001: conftool action : set/pooled=yes:weight=100; selector: cluster=thanos
* 09:07 moritzm: rebooting contint2001 for kernel update
* 09:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:46 godog: reboot thanos hosts for kernel upgrade
* 07:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:41 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:12 moritzm: rebooting the IDP hosts, SSO sessions will need to be renewed
* 07:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:56 vgutierrez: upload trafficserver 8.0.7-1wm4 to apt.wm.o (buster) - [[phab:T242767|T242767]] [[phab:T249335|T249335]]
* 05:29 marostegui: Restart docker-report-releng on deneb
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only=off for maintenance [[phab:T251502|T251502]]', diff saved to https://phabricator.wikimedia.org/P11180 and previous config saved to /var/cache/conftool/dbconfig/20200512-050339-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T251502|T251502]]', diff saved to https://phabricator.wikimedia.org/P11179 and previous config saved to /var/cache/conftool/dbconfig/20200512-050054-marostegui.json
* 04:46 marostegui: Stop mysql on labsdb1011 to transfer its content - [[phab:T249188|T249188]]
* 02:14 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 02:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:45 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:16 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:37 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
 
== 2020-05-11 ==
* 21:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 21:00 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 20:19 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:19 cdanis@cumin1001: START - Cookbook sre.network.cf
* 19:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 19:03 Zoranzoki21: [[phab:T235414|T235414]] is wrong task number, [[phab:T235415|T235415]] is correct
* 19:02 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add *.bollywoodhungama.in and *.britishmuseum.org to $wgCopyUploadDomains ([[phab:T235414|T235414]], [[phab:T251882|T251882]]) (duration: 00m 57s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove "Create a book" link on enwiki ([[phab:T241683|T241683]]) (duration: 00m 57s)
* 18:44 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable modern Vector on officewiki, reveal preference on testwiki ([[phab:T251285|T251285]]) (duration: 00m 58s)
* 18:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:40 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add tw-photometa.de to $wgCopyUploadsDomains ([[phab:T252141|T252141]]) (duration: 00m 58s)
* 18:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:28 catrope@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: Drop mainpage special casing for scowiki and itwiki ([[phab:T252048|T252048]], [[phab:T252065|T252065]]) (duration: 00m 58s)
* 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 18:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/includes/Revision/RevisionStore.php: [[phab:T252156|T252156]] [[phab:T212428|T212428]] RevisionStore: fall back to master db if main slot is missing (duration: 00m 58s)
* 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/AbuseFilter/maintenance/updateVarDumps.php: updateVarDumps: wait for replication after each batch (duration: 00m 58s)
* 17:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/skins/Vector/includes/VectorTemplate.php: [[phab:T251521|T251521]] Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
* 17:24 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/skins/Vector/includes/VectorTemplate.php: [[phab:T251521|T251521]] Correctly populate the language variants drop-down rather than breaking early (duration: 00m 59s)
* 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 17:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 16:47 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 04m 43s)
* 16:42 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
* 16:40 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:34 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
* 16:17 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: mediawikiwiki to 1.35.0-wmf.31 ([[phab:T249963|T249963]]) for testing [[phab:T252179|T252179]]
* 16:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:06 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaMaintenance: [[gerrit:595076{{!}}Revert "Remove use of WikiPage::doEditContent"]] (duration: 01m 06s)
* 16:05 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/UploadWizard: [[gerrit:595078{{!}}Revert "Remove use of WikiPage::doEditContent"]] (duration: 01m 06s)
* 16:04 hnowlan@deploy1001: Finished deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic (duration: 01m 58s)
* 16:04 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Babel: [[gerrit:595077{{!}}Revert "Remove use of WikiPage::doEditContent"]] (duration: 01m 07s)
* 16:03 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Translate: [[gerrit:595135{{!}}Revert "Remove uses of WikiPage::doEditContent"]] (duration: 01m 08s)
* 16:02 hnowlan@deploy1001: Started deploy [changeprop/deploy@82276cb]: Enabling consumption of purges topic
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:52 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:49 cdanis@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=eventgate-analytics.*
* 15:45 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:42 brennen: syncing backports to 1.35.0-wmf.31 ([[phab:T249963|T249963]]) for [[phab:T252179|T252179]]
* 15:42 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:01 moritzm: installing puma security updates
* 14:29 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:44 vgutierrez: upgrade ATS to 8.0.7-1wm4 in cp4032 - [[phab:T249335|T249335]]
* 13:36 hashar: Rolling back CI system switch to previous known state # [[phab:T224591|T224591]]
* 13:20 marostegui: Upgrade mysql package on s4 master in preparation for tomorrow's maintenance [[phab:T251502|T251502]]
* 12:50 hashar: Pointing CI Jenkins to contint2001 Gearman server [[phab:T224591|T224591]]
* 12:46 mutante: contint2001 - chown -R jenkins-slave:jenkins-slave /srv/.git
* 12:45 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv/.git/
* 12:43 mutante: contint1001 - rsync -avz --delete /srv/.git/ rsync://contint2001.wikimedia.org/ci--srv-/org/.git/
* 12:40 mutante: contint1001 - rsync -avz --delete /srv/org/wikimedia/integration/ rsync://contint2001.wikimedia.org/ci--srv-/org/wikimedia/integration/
* 12:24 mutante: contint2001 - find /var/lib/jenkins/ -group bacula -exec chown jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 12:21 mutante: contint2001 - find /var/lib/jenkins/ -user statsite -exec chown jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 12:19 mutante: contint2001 - chown -R jenkins:jenkins /srv/jenkins/*
* 12:19 mutante: contint1001 - rsync -avz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/
* 12:17 mutante: contint1001 - rsync -avz --delete /var/lib/jenkins/ rsync://contint2001.wikimedia.org/ci--var-lib-jenkins-
* 12:14 hashar: shutting down Zuul and Jenkins for system switch # [[phab:T224591|T224591]]
* 12:02 jynus@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:59 jynus@cumin1001: START - Cookbook sre.hosts.downtime
* 11:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:45 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 Lucas_WMDE: EU SWAT done
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/WikimediaEvents/: SWAT: [[gerrit:594693{{!}}Update Banner Interaction Schema (T250791, wmf.30)]] (duration: 01m 08s)
* 11:23 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikimediaEvents/: SWAT: [[gerrit:594694{{!}}Update Banner Interaction Schema (T250791, wmf.31)]] (duration: 01m 07s)
* 11:14 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:595478{{!}}Revert limit adjustment for Chinese translation with ContentTranslation (T252371)]] (duration: 01m 09s)
* 10:58 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:595498{{!}} Bumping portals to master (595498)]] (duration: 01m 06s)
* 10:56 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:595498{{!}} Bumping portals to master (595498)]] (duration: 01m 07s)
* 10:15 vgutierrez: upload trafficserver 8.0.7-1wm3 to apt.wm.o (buster) - [[phab:T242767|T242767]] [[phab:T249335|T249335]]
* 09:44 mutante: contint2001 - find /var/lib/jenkins -user statsite -exec chown jenkins:jenkins <nowiki>{</nowiki><nowiki>}</nowiki> \;
* 09:31 hashar: contint2001 started zuul-merger again (had permission issues in /var/lib/zuul )
* 09:07 mutante: contint1001 - rsync -avpz --delete /srv/jenkins/ rsync://contint2001.wikimedia.org/ci--srv-/jenkins/ ([[phab:T224591|T224591]])
* 09:05 mutante: contint2001 - mkdir /srv/jenkins
* 08:55 hashar: contint2001 stopping zuul-merger , permission problem
* 08:46 godog: bounce ferm on kubernetes1007 to resolve icinga UNKNOWN
* 08:40 mutante: rsyncing /var/lib/jenkins from contint1001 to contint2001 with --delete
* 08:32 mutante: rsynced data from contint1001 to contint2001 - paths per [[phab:T224591|T224591]]#6039192 for the migration later today
* 08:30 ema: cp3050: upgrade atskafka to 0.6 [[phab:T237993|T237993]]
* 08:30 _joe_: removing the iptables DROP rule on mc1020 [[phab:T251378|T251378]]
* 07:54 moritzm: installing squid security updates
* 07:21 moritzm: updated buster netboot images to 10.4 (updated to latest point release)
* 07:09 _joe_: dropping requests to mc1020 via a firewall rule [[phab:T251378|T251378]]
* 06:04 elukey: restart wikimedia-discovery-golden on stat1007 - apparently killed because no memory was left to allocate on the system
 
== 2020-05-10 ==
* 12:18 marostegui: Start event scheduler on db1115 after a massive delete - [[phab:T252324|T252324]]
* 11:05 marostegui: Stop event scheduler on db1115 to perform a massive delete - [[phab:T252324|T252324]]
* 10:27 dcausse: restarting blazegraph on wdqs1004 - [[phab:T242453|T242453]]
* 09:56 marostegui: Change scaling_governor from powersave to performance on db1115 - [[phab:T252324|T252324]]
* 09:25 marostegui: Stop MySQL and restart db1115 - [[phab:T252324|T252324]]
* 08:50 marostegui: Restart mysql on db1115 to change buffer pool size from 20GB to 40GB [[phab:T252324|T252324]]
* 08:44 elukey: Power cycle analytics1052 after eno1 issue
* 08:01 marostegui: Disable unused events like %_schema [[phab:T252324|T252324]] [[phab:T231185|T231185]]
* 07:11 marostegui: Restart mysql on db1115 [[phab:T231185|T231185]]
* 07:11 marostegui: Truncate tendril.processlist_query_log [[phab:T231185|T231185]]
 
== 2020-05-08 ==
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view for testwikidatawiki and testcommonswiki on labsdb1010 [[phab:T251598|T251598]]
* 21:45 bstorm_: cleaned up wb_terms_no_longer_updated view on labsdb1012 [[phab:T251598|T251598]]
* 21:33 bstorm_: cleaning up wb_terms_no_longer_updated view on labsdb1009 [[phab:T251598|T251598]]
* 21:06 ottomata: running preferred replica election for kafka-jumbo to get preferred leaders back after a broker reboot earlier today - [[phab:T252203|T252203]]
* 19:16 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:12 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:07 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:12 andrewbogott: reprepro copy buster-wikimedia stretch-wikimedia prometheus-openstack-exporter for [[phab:T252121|T252121]]
* 17:59 marostegui: Extend /srv by 500G on labsdb1011 [[phab:T249188|T249188]]
* 16:55 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:53 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:51 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:48 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:36 ottomata: starting kafka broker on kafka-jumbo1006, same issue on other brokers when they are leaders of offending partitions - [[phab:T252203|T252203]]
* 15:31 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:27 ottomata: stopping kafka broker on kafka-jumbo1006 to investigate camus import failures - [[phab:T252203|T252203]]
* 14:50 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only (duration: 00m 03s)
* 14:50 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: fix for camus wrapper, deploy to an-launcher1001 only
* 14:05 akosiaris: [[phab:T243106|T243106]] undo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:22 vgutierrez: rolling restart of ats-tls on eqiad, codfw, ulsfo and eqsin - [[phab:T249335|T249335]]
* 13:20 akosiaris: [[phab:T243106|T243106]] redo experiment with DROP iptable rules this time around. Use mw1331, mw1348
* 13:16 akosiaris: [[phab:T243106|T243106]] undo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348. Experiment done successfully, no issues to the infrastructure.
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle. Use mw1331, mw1348
* 12:49 akosiaris: [[phab:T243106|T243106]] redo experiment with REJECT, DROP iptable rules now that we have envoy in the middle
* 11:49 hnowlan: restarting cassandra on restbase2009 for java updates
* 11:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 akosiaris: repool eqiad eventgate-analytics. Test concluded
* 11:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:54 mutante: disabling puppet on puppetmasters temporarily to switch them carefully to use httpd module and not apache module which we want to get rid of
* 09:52 akosiaris: depool eqiad eventgate-analytics for a test involving reinitializing the eqiad kubernetes cluster
* 09:52 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:51 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 09:45 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=eventgate-analytics.*
* 08:20 vgutierrez: rolling restart of ats-tls on esams - [[phab:T249335|T249335]]
* 07:19 vgutierrez: ats-tls restart on cp3050 and cp3052 (max_connections_active_in experiment) - [[phab:T249335|T249335]]
* 07:07 mutante: phabricator rmdir /var/run/phd/pid - empty and now unused
* 07:01 moritzm: installing php5 security updates
* 05:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:10 marostegui: Upgrade pc1010
* 00:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert all wikis except test to 1.35.0-wmf.30 for [[phab:T252179|T252179]]
* 00:19 brennen: rolling 1.35.0-wmf.31 train back to group0 for [[phab:T252179|T252179]]
 
== 2020-05-07 ==
* 22:36 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 22:31 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Scribunto/includes/engines/LuaCommon/TitleLibrary.php: [[gerrit:595054{{!}}Handle RevisionAccessException with try-catch (T252156)]] (duration: 01m 08s)
* 20:40 ryankemper@cumin2001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:37 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 20:10 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingStreamNames: set initial stream names, as yet unused - [[phab:T238230|T238230]] (duration: 01m 07s)
* 19:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.35.0-wmf.30
* 19:09 brennen: rolling 1.35.0-wmf.31 back to group1
* 19:09 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki1001 - [[phab:T252010|T252010]]
* 19:05 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.31
* 18:25 ppchelko@deploy1001: Finished deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]] (duration: 01m 17s)
* 18:23 ppchelko@deploy1001: Started deploy [changeprop/deploy@383fba5]: Enable both purging types [[phab:T252142|T252142]]
* 18:15 Urbanecm: Morning SWAT done
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 2/2) (duration: 01m 06s)
* 18:13 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|899c175}}: Update project icons to refreshed SVGs ([[phab:T249047|T249047]]; part 1/2) (duration: 01m 08s)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|54bd2f1}}: Add the investigate right to the checkuser group on testwiki ([[phab:T251932|T251932]]) (duration: 01m 08s)
* 17:50 bsitzmann@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:46 bsitzmann@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 17:44 bsitzmann@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 17:44 otto@deploy1001: Finished deploy [analytics/refinery@4a2c530]: (no justification provided) (duration: 05m 31s)
* 17:38 otto@deploy1001: Started deploy [analytics/refinery@4a2c530]: (no justification provided)
* 17:18 ejegg: updated payments-wiki from {{Gerrit|afb84cc391}} to {{Gerrit|dabba1804c}}
* 16:46 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption (duration: 01m 05s)
* 16:45 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Rollback varnish consumption
* 16:42 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:36 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 16:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:29 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:26 hnowlan@deploy1001: Finished deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic (duration: 01m 45s)
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 hnowlan@deploy1001: Started deploy [changeprop/deploy@cd1386e]: Enabling consumption of purges topic
* 16:23 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic (duration: 00m 24s)
* 16:23 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling consumption of purges topic
* 15:59 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:51 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:36 jforrester@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Collection/includes/Specials/SpecialCollection.php: [[phab:T251460|T251460]] Set skin on BaseTemplates if you are using getSkin (duration: 01m 08s)
* 15:28 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:27 vgutierrez: rolling restart of ats-tls on text@esams - [[phab:T249335|T249335]]
* 15:26 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:09 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:59 moritzm: imported component/facter3 for stretch-wikimedia into "main"
* 14:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 moritzm: imported component/puppet5 for stretch-wikimedia into "main"
* 14:49 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:47 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 14:30 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:17 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:07 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:06 moritzm: imported component/facter3 for jessie-wikimedia into "main"
* 13:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 13:12 hashar@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 13:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 jynus: disabling puppet on all db hosts to control deployment of new paging alert [[phab:T172489|T172489]]
* 13:02 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers (duration: 02m 43s)
* 13:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:59 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI - new servers
* 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:43 zpapierski@deploy1001: Finished deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI (duration: 16m 20s)
* 12:27 zpapierski@deploy1001: Started deploy [wdqs/wdqs@94906d0]: Deploy WDQS 0.3.28 + GUI
* 12:13 addshore@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/Wikibase: [[gerrit:594920]] [[phab:T252079|T252079]] Revert "Move prefetching-term-lookup-callback service wiring" (duration: 01m 12s)
* 12:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 moritzm: imported component/puppet5 for jessie-wikimedia into "main"
* 11:31 jbond42: enable ferm-status script https://gerrit.wikimedia.org/r/c/operations/puppet/+/576102
* 11:10 matthiasmullie: EU swat done
* 11:07 mlitn@deploy1001: Synchronized php-1.35.0-wmf.31/extensions/WikibaseMediaInfo/: [MediaInfo] Add dummy concept chips without thumbnail (duration: 01m 09s)
* 10:07 moritzm: installing Java security updates on restbase/sessionstore
* 09:11 elukey: roll restart cassandra on aqs1005 to pick up new openjdk upgrades (canary)
* 08:32 moritzm: upgrading restbase-dev to latest OpenJDK security update
* 08:06 jynus: setting pc2007, pc2009 as read-write
* 07:44 godog: further decrease weight for ms-be10[678] - [[phab:T252008|T252008]]
* 05:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:33 elukey: restart hadoop yarn nodemanager on analytics1071
* 05:22 marostegui: Reimage db2078
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only=off for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11167 and previous config saved to /var/cache/conftool/dbconfig/20200507-050419-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 and s7 as read-only for maintenance [[phab:T251158|T251158]]', diff saved to https://phabricator.wikimedia.org/P11166 and previous config saved to /var/cache/conftool/dbconfig/20200507-050046-marostegui.json
* 02:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 02:55 brennen: reverting group1 to 1.35.0-wmf.30 for [[phab:T252079|T252079]]
* 00:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
 
== 2020-05-06 ==
* 23:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable GrowthExperiments guidance on testwiki (duration: 01m 07s)
* 23:18 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable password-reset-update on Wikipedias ([[phab:T245791|T245791]]) (duration: 01m 07s)
* 22:22 brennen@deploy1001: Synchronized php-1.35.0-wmf.31/includes/revisionlist/RevisionItem.php: [[gerrit:594803{{!}}RevisionItem: Fix providing timestamp in getRevisionLink ]] (duration: 01m 09s)
* 21:45 andrewbogott: updating puppet compiler facts
* 21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:35 ejegg: updated Fundraising CiviCRM from {{Gerrit|b15b2cfbb5}} to {{Gerrit|cfb6101e39}}
* 19:08 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.31 (duration: 01m 08s)
* 19:07 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.31
* 19:03 brennen: CORRECTION: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group1
* 19:03 brennen: 1.35.0-wmf.31 train unblocked ([[phab:T249963|T249963]]), rolling forward to group0
* 18:58 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594778/ fixes UBN [[phab:T252052|T252052]] (duration: 01m 09s)
* 18:54 volans: upgraded spicerack to spicerack_0.0.34-1_amd64.deb on cumin[12]001
* 18:45 volans: uploaded spicerack_0.0.34-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 18:44 volans@deploy1001: Finished deploy [homer/deploy@8224f0a]: Release v0.2.2 (duration: 00m 18s)
* 18:44 volans@deploy1001: Started deploy [homer/deploy@8224f0a]: Release v0.2.2
* 18:28 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/pagers/DeletedContribsPager.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/594768/ fixes [[phab:T252043|T252043]] (duration: 01m 08s)
* 17:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:31 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:12 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:06 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:05 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:21 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:41 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:27 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:36 mutante: puppetmaster - revoking cert for webserver-misc-apps, recreating it with static-codereview.wikimedia.org as an additional SAN ([[phab:T243056|T243056]])
* 13:32 hashar: Restarting CI Jenkins
* 13:27 mutante: puppetmaster - revoking cert for webserver-misc-static, not used anymore, merged into webserver-misc-apps
* 13:27 moritzm: installing graphicsmagick security updates
* 13:26 XioNoX: Upgrade Routinator 3000 to 0.7.0 on rpki2001 - [[phab:T252010|T252010]]
* 13:25 XioNoX: add routinator 3000 0.7.0 to buster-wikimedia - [[phab:T252010|T252010]]
* 13:19 ema: cp: upgrade purged to v0.10
* 13:08 godog: start swift decom ms-be101[678] - [[phab:T252008|T252008]]
* 11:22 kart_: EU SWAT done.
* 11:13 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:594668{{!}}Enable ContentTranslation in Armenian WP as a default tool (T249229)]] (duration: 01m 08s)
* 10:27 ema: cp2027: test purged v0.10
* 10:20 moritzm: restarting apache on dbmonitor/grafana/miscweb/graphite/netmon to pick up openldap update
* 10:00 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 09:52 jbond42: enable the remember-me feature of CAS
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 and remove db1103:3314 from vslow in s4', diff saved to https://phabricator.wikimedia.org/P11159 and previous config saved to /var/cache/conftool/dbconfig/20200506-093940-marostegui.json
* 09:12 marostegui: Upgrade package on s3 and s7 master (db1123 and db1086) in preparation for tomorrow's restart - [[phab:T251158|T251158]]
* 08:56 jbond42: restarting ps1-a4-eqiad.mgmt.eqiad.wmnet.
* 08:53 jynus: kill FTWRL on db2101
* 08:43 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: Reverting change on mw1407 [[phab:T99740|T99740]] (duration: 01m 16s)
* 08:02 _joe_: restarted php-fpm with tweaked parameters on mw1407, now briefly pooling for traffic ([[phab:T99740|T99740]])
* 07:38 kormat@cumin1001: dbctl commit (dc=all): 'Set es1023 (es5 master) to 0 weight after reimaging es1024 [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11158 and previous config saved to /var/cache/conftool/dbconfig/20200506-073856-kormat.json
* 07:32 vgutierrez: downgrade to ATS 8.0.7-1wm3 on cp4026, cp4031, cp5006 and cp5011
* 06:00 elukey: powercycle analytics1060 - host stuck - [[phab:T251973|T251973]]
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1103:3314 in vslow on s4 while db1121 is out [[phab:T250055|T250055]]', diff saved to https://phabricator.wikimedia.org/P11157 and previous config saved to /var/cache/conftool/dbconfig/20200506-050340-marostegui.json
* 05:02 marostegui: Deploy schema change on db1121
 
== 2020-05-05 ==
* 23:44 catrope@deploy1001: Synchronized wmf-config/flaggedrevs.php: Restore the reviewer group on fawiki ([[phab:T249643|T249643]]) (duration: 01m 06s)
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3) (duration: 00m 11s)
* 23:22 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part3)
* 23:22 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 14s)
* 23:21 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 23:21 crusnov@deploy1001: Finished deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1) (duration: 01m 20s)
* 23:20 crusnov@deploy1001: Started deploy [netbox/deploy@03cc2dd]: Netbox upgrade to 2.8.1 (part1)
* 22:00 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] take 2 (duration: 01m 06s)
* 21:57 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/parser/CoreParserFunctions.php: [[phab:T251952|T251952]] (duration: 01m 05s)
* 21:55 reedy@deploy1001: Synchronized php-1.35.0-wmf.31/includes/specials/SpecialNewpages.php: [[phab:T251950|T251950]] (duration: 01m 06s)
* 20:02 herron: added ryankemper to wmf and ops ldap groups [[phab:T251572|T251572]]
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 08s)
* 19:38 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 19:38 mforns@deploy1001: Finished deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 25m 18s)
* 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.31
* 19:13 mforns@deploy1001: Started deploy [analytics/refinery@6868fc0]: Regular analytics weekly train (2nd try) [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 19:12 brennen@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.31 (duration: 97m 23s)
* 19:02 brennen: train status: 1.35.0-wmf.31: presently pressing enter through scap-cdb-rebuild; at 8% ([[phab:T249963|T249963]], [[phab:T223287|T223287]])
* 18:39 cdanis: depool mw2221 for some manual testing
* 18:35 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 00m 09s)
* 18:35 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a] (thin): Regular analytics weekly train THIN [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 18:34 mforns@deploy1001: Finished deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c] (duration: 18m 54s)
* 18:15 mforns@deploy1001: Started deploy [analytics/refinery@ebd624a]: Regular analytics weekly train [analytics/refinery@ebd624a5e4c88ac6983387d4603971f8a326ee7c]
* 17:35 brennen@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.31
* 16:48 brennen: 1.35.0-wmf.31 was branched at {{Gerrit|4d3fed31a435e7bd24925a154f89a9407670986d}} for [[phab:T249963|T249963]]
* 16:34 brennen: triggering branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) via https://releases-jenkins.wikimedia.org/job/MediaWiki%20Train%20Branch%20Cut/build?delay=0sec
* 16:18 brennen: notice: planning branch cut for 1.35.0-wmf.31 ([[phab:T249963|T249963]]) at 16:30 UTC
* 15:47 cstone: SmashPig revision changed from {{Gerrit|8c30ed7fe5}} to {{Gerrit|cd1a49da5f}}
* 15:38 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 100% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11153 and previous config saved to /var/cache/conftool/dbconfig/20200505-153843-kormat.json
* 15:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:58 hnowlan@deploy1001: Finished deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb (duration: 01m 31s)
* 14:56 hnowlan@deploy1001: Started deploy [changeprop/deploy@6c65779]: Enabling on_transclusion_update on k8s, disabling on scb
* 14:45 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:43 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 14:32 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11149 and previous config saved to /var/cache/conftool/dbconfig/20200505-143158-kormat.json
* 13:46 akosiaris: deploy cxserver chart 0.0.15 to staging, codfw, eqiad. [[phab:T219921|T219921]]
* 13:45 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:41 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:41 hashar: Updated Jenkins job https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler to have it defined in JJB # [[phab:T97513|T97513]]
* 13:36 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:18 vgutierrez: upgrade ATS to version 8.1 () on cp4026, cp4032, cp5006 and cp5011
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 to 50% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11147 and previous config saved to /var/cache/conftool/dbconfig/20200505-131520-kormat.json
* 12:52 kormat@cumin1001: dbctl commit (dc=all): 'Repool es1024 at 25% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11145 and previous config saved to /var/cache/conftool/dbconfig/20200505-125254-kormat.json
* 12:37 XioNoX: push pfw policy - [[phab:T251769|T251769]]
* 12:07 jbond42: updating cas login page
* 12:07 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:03 moritzm: rolling restart of apache on puppetboard* to pick up OpenLDAP update
* 11:47 moritzm: rolling restart of apache on kibana hosts
* 11:41 mutante: LDAP - added eamedia to wmf group ([[phab:T251358|T251358]])
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11144 and previous config saved to /var/cache/conftool/dbconfig/20200505-113152-marostegui.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11143 and previous config saved to /var/cache/conftool/dbconfig/20200505-113100-marostegui.json
* 11:30 marostegui: Drop [[phab:T248086|T248086]]_wb_terms table on labsdb hosts - [[phab:T248086|T248086]]
* 11:26 moritzm: rolling restart of apache/FPM on mw1261-mw1265
* 11:22 kart_: EU SWAT done.
* 11:09 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592479{{!}}Adjust ContentTranslation MT threshold for Chinese WP to 70% (T246383)]] (duration: 01m 01s)
* 11:01 moritzm: installing remaining openldap security updates (client-side libs, tools)
* 11:00 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1024 for reimaging, add es1023 (master) for reading in the meantime [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11141 and previous config saved to /var/cache/conftool/dbconfig/20200505-110031-kormat.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11140 and previous config saved to /var/cache/conftool/dbconfig/20200505-104540-marostegui.json
* 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 [[phab:T248086|T248086]]', diff saved to https://phabricator.wikimedia.org/P11139 and previous config saved to /var/cache/conftool/dbconfig/20200505-104441-marostegui.json
* 10:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:23 arturo: copy prometheus-rabbitmq-exporter v0.4 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251660|T251660]])
* 10:18 arturo: copy prometheus-pdns-exporter v0.5.1 from stretch-wikimedia to buster-wikimedia in apt1001 ([[phab:T251575|T251575]])
* 10:16 mutante: temp disabling puppet on all ganeti hosts to carefully deploy change related to rapi cert location
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:36 moritzm: removing boron.eqiad.wmnet
* 09:36 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 09:03 gehel: restarting wdqs updater on all servers
* 08:53 moritzm: installing Java security updates on releases*
* 08:44 kormat: reimaging es1024 to buster [[phab:T250666|T250666]]
* 08:27 ema: cp2028 and cp2030 (both upload): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 08:26 moritzm: upgrading slapd on serpens/seaborgium
* 08:19 ema: cp2027 and cp2029 (both text): varnish-fe restart to clear cache and evaluate 'exp' admission policy [[phab:T144187|T144187]] [[phab:T249809|T249809]]
* 08:08 moritzm: installing Java security updates on notebook/stat hosts
* 07:54 gehel@deploy1001: Finished deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22 (duration: 04m 18s)
* 07:50 gehel@deploy1001: Started deploy [wdqs/wdqs@d37a059]: rollback wdqs to v 0.3.22
* 07:36 zpapierski@deploy1001: Started deploy [wdqs/wdqs@d37a059]: fix for the duplicated jars
* 06:59 addshore: depool wdqs1006 heavy lag
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only=off for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11133 and previous config saved to /var/cache/conftool/dbconfig/20200505-052334-marostegui.json
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 and s6 as read-only for maintenance [[phab:T251154|T251154]]', diff saved to https://phabricator.wikimedia.org/P11132 and previous config saved to /var/cache/conftool/dbconfig/20200505-052058-marostegui.json
* 05:19 marostegui: Start s5 and s6 maintenance - [[phab:T251154|T251154]]
* 04:39 marostegui: Restart mysql on tendril host: db1115 - [[phab:T231769|T231769]]
 
== 2020-05-04 ==
* 23:38 mstyles@deploy1001: Finished deploy [wdqs/wdqs@6518a8d]: v.0.3.26 (duration: 14m 39s)
* 23:37 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Use namespaced EventBus classes (duration: 00m 57s)
* 23:35 reedy@deploy1001: Synchronized wmf-config/logging.php: Use namespaced EventBus classes (duration: 00m 56s)
* 23:33 reedy@deploy1001: Synchronized rpc/RunSingleJob.php: Use namespaced EventBus classes (duration: 00m 58s)
* 23:29 reedy@deploy1001: Synchronized wmf-config/logging.php: Replace AuthManagerStatsdHandler with WikimediaEventsAuthManagerStatsdHandler::class (duration: 00m 57s)
* 23:23 mstyles@deploy1001: Started deploy [wdqs/wdqs@6518a8d]: v.0.3.26
* 22:42 sbassett@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T251835|T251835]]: Restore {{Gerrit|dc752af1e94684faacbe9662789815c6edbbdf46}} (duration: 00m 57s)
* 22:16 eileen: process-control config revision is {{Gerrit|2eb75f8dff}}
* 22:06 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:45 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Revert partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 21:41 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deploy partial mitigation for [[phab:T250887|T250887]] (duration: 00m 57s)
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]] (duration: 00m 05s)
* 18:19 dpifke@deploy1001: Started deploy [performance/navtiming@239d359]: Deploy navtiming with new/updated Prometheus metrics - [[phab:T249822|T249822]], [[phab:T238086|T238086]]
* 18:16 Urbanecm: Morning SWAT done
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c04fbdd}}: Adding upload_by_url user right to all registered users on Commons ([[phab:T251474|T251474]]) (duration: 00m 57s)
* 18:11 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/DiscussionTools/includes/DiscussionToolsHooks.php: SWAT: {{Gerrit|b85fc16}}: Enable on all ExtraSignaturesNamespaces ([[phab:T249036|T249036]]) (duration: 01m 00s)
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|18c1efb}}: Load DiscussionTools on en.wiki ([[phab:T249376|T249376]]) (duration: 00m 58s)
* 17:57 XioNoX: configure singtel interface on cr1-eqsin
* 17:36 volans: upgraded spicerack on cumin[12]001 to 0.0.33-1
* 17:02 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}] (duration: 00m 09s)
* 17:02 joal@deploy1001: Started deploy [analytics/refinery@2252f9a] (thin): Analytics hotfix deploy 2 THIN (sqoop) [{{Gerrit|2252f9a}}]
* 17:01 joal@deploy1001: Finished deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}] (duration: 16m 45s)
* 16:44 joal@deploy1001: Started deploy [analytics/refinery@2252f9a]: Analytics hotfix deploy 2 (sqoop) [{{Gerrit|2252f9a}}]
* 16:08 liw@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.30
* 15:59 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.30 (duration: 01m 05s)
* 15:58 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.30
* 15:53 root@cumin1001: END (PASS) - Cookbook sre.hosts.ipmi-password-reset (exit_code=0)
* 15:53 root@cumin1001: Updating IPMI password on 1 hosts - root@cumin1001
* 15:53 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:52 root@cumin1001: END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99)
* 15:52 root@cumin1001: START - Cookbook sre.hosts.ipmi-password-reset
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2025 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11128 and previous config saved to /var/cache/conftool/dbconfig/20200504-154747-kormat.json
* 15:45 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/includes/libs/rdbms/database/DatabaseMysqlBase.php: [[phab:T251457|T251457]] rdbms: don't treat lock() as a write operation (duration: 01m 04s)
* 15:43 jforrester@deploy1001: Synchronized php-1.35.0-wmf.30/resources/src/mediawiki.diff.styles/diff.less: [[phab:T250393|T250393]] Follow-up {{Gerrit|I07dd6f7}}: Fix font size in diff (duration: 01m 05s)
* 15:34 volans: uploaded spicerack_0.0.33-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 15:26 volans: deploy1001: deleted old .hhvm.hhbc files (/home/*/.hhvm.hhbc) https://phabricator.wikimedia.org/P11127
* 15:23 volans: deploy1001: deleted old .hhvm.hhbc files moved from tin (/home/*/home-tin/.hhvm.hhbc) https://phabricator.wikimedia.org/P11126
* 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 fully after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11125 and previous config saved to /var/cache/conftool/dbconfig/20200504-151243-kormat.json
* 15:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints (duration: 14m 36s)
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}] (duration: 00m 10s)
* 15:05 joal@deploy1001: Started deploy [analytics/refinery@3396279] (thin): Analytics hotfix deploy (sqoop) THIN [{{Gerrit|3396279}}]
* 15:05 joal@deploy1001: Finished deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}] (duration: 15m 07s)
* 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 ppchelko@deploy1001: Started deploy [restbase/deploy@74db57e]: Enable greek community wiki, fix analytics endpoints
* 14:50 joal@deploy1001: Started deploy [analytics/refinery@3396279]: Analytics hotfix deploy (sqoop) [{{Gerrit|3396279}}]
* 14:19 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 fully and db1101:3318 to 75% after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11123 and previous config saved to /var/cache/conftool/dbconfig/20200504-141919-kormat.json
* 14:15 XioNoX: add static nat for fran1001 - [[phab:T251763|T251763]]
* 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2025 for reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11122 and previous config saved to /var/cache/conftool/dbconfig/20200504-135039-kormat.json
* 13:34 kormat: reimaging es2025 to buster [[phab:T250666|T250666]]
* 13:27 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 some more after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11121 and previous config saved to /var/cache/conftool/dbconfig/20200504-132744-kormat.json
* 13:02 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248664|T248664]] Stop setting legacy wmgWikibase(Repo/Client)Repositories for TEST wikis (duration: 01m 06s)
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317 and db1101:3318 after reimaging [[phab:T250666|T250666]]', diff saved to https://phabricator.wikimedia.org/P11120 and previous config saved to /var/cache/conftool/dbconfig/20200504-124659-kormat.json
* 12:10 marostegui: Temporary enable slow query log on db1099:3311 - [[phab:T206103|T206103]]
* 12:09 Amir1: EU SWAT is done
* 11:53 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:592761{{!}}Increase wmgMemoryLimit from 660MB to 666MB]] (duration: 01m 06s)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311 [[phab:T206103|T206103]] after removing tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11119 and previous config saved to /var/cache/conftool/dbconfig/20200504-114727-marostegui.json
* 11:46 tgr@deploy1001: Synchronized php-1.35.0-wmf.30/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594134{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 06s)
* 11:46 marostegui: Remove index tmp_2 from recentchanges on db1099:3311 [[phab:T206103|T206103]]
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T206103|T206103]] to remove tmp_2 index', diff saved to https://phabricator.wikimedia.org/P11118 and previous config saved to /var/cache/conftool/dbconfig/20200504-114539-marostegui.json
* 11:43 tgr@deploy1001: Synchronized php-1.35.0-wmf.28/extensions/GrowthExperiments/modules/helppanel/ext.growthExperiments.HelpPanel.cta.js: SWAT: [[gerrit:594137{{!}}Help panel: Check if guidance feature flag is set before loading mobile peek (T251589)]] (duration: 01m 10s)
* 11:38 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 11:30 jbond42: rebooting ps1-a7-codfw.mgmt.eqiad.wmnet.
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4d00236}}: Enable cross-project search on frwikibooks ([[phab:T251683|T251683]]) (duration: 01m 05s)
* 11:25 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/elwikiversity*.png ([[phab:T251050|T251050]])
* 11:24 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64556ba}}: Correct typo in Greek Wikiversity logo ([[phab:T248391|T248391]]) (duration: 01m 06s)
* 11:20 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/jvwiki*.png ([[phab:T251050|T251050]])
* 11:20 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|3b8c618}}: Update jvwiki logos ([[phab:T251050|T251050]]) (duration: 01m 05s)
* 11:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cc94ea7}}: Enable VisualEditor for more namespaces on vecwiki ([[phab:T250419|T250419]]) (duration: 01m 07s)
* 10:49 arturo: update packages in buster-wikimedia {{!}} thirdparty/kubeadm-k8s-1-15 and thirdparty/kubeadm-k8s-1-16 ([[phab:T250866|T250866]])
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 05s)
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:594128{{!}} Bumping portals to master (563985)]] (duration: 01m 29s)
* 10:39 vgutierrez: rolling upgrade of ATS to version 8.0.7-1wm3
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 10:30 arturo: running `aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished` to cleanup buster-wikimedia{{!}}thirdparty/kubeadm-k8s ([[phab:T250866|T250866]])
* 09:46 vgutierrez: upload trafficserver 8.0.7-1wm2 to apt.wm.o (buster)
* 09:22 kormat: reimaging db1101 to buster [[phab:T250666|T250666]]
* 08:50 XioNoX: configure BGP peering with AS132203
* 08:20 godog: add 50G to prometheus-ops on prometheus100[34]
* 08:17 marostegui: Deploy schema change on s5 codfw - [[phab:T251188|T251188]]
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 and db1101:3318 for reimage', diff saved to https://phabricator.wikimedia.org/P11113 and previous config saved to /var/cache/conftool/dbconfig/20200504-075148-marostegui.json
* 07:31 marostegui: Drop unused flagged* tables from mediawikiwiki - [[phab:T248298|T248298]]
* 07:26 moritzm: removed jmorgan from cn=wmf
* 07:24 marostegui: Install 10.1.43-2 on s5 (db110) and s6 (db1131) masters in preparation for tomorrow's restart - [[phab:T251154|T251154]]
* 07:24 moritzm: removed Kerberos principal for lexnasser and jmorgan
* 07:23 moritzm: removed lexnasser from cn=nda
* 07:07 elukey: execute ifdown eno1; ifup eno1 on analytics1052 - interface neg speed flapping
* 06:41 elukey: upload prometheus-druid-exporter 0.8-1 to stretch-wikimedia
 
== 2020-05-03 ==
* 22:52 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/593929
* 22:42 Krinkle: scap pull mwmaint1002 and mw2001 for noc.wm.o. – https://gerrit.wikimedia.org/r/591459
* 21:37 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service (duration: 04m 22s)
* 21:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@0c68d62]: Update the recommendation API service
 
== 2020-05-02 ==
* 07:49 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(49{{!}}5[0-9]{{!}}6[0-2])\.eqiad\.wmnet
* 07:08 XioNoX: asw2-d-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 0 member 1
* 02:36 volker-e@deploy1001: Finished deploy [design/style-guide@f0d467b]: Deploy design/style-guide:  (duration: 00m 07s)
* 02:36 volker-e@deploy1001: Started deploy [design/style-guide@f0d467b]: Deploy design/style-guide:
 
== 2020-05-01 ==
* 19:56 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 18:57 gehel: restart blazegraph on wdqs1006 - [[phab:T242453|T242453]]
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11110 and previous config saved to /var/cache/conftool/dbconfig/20200501-142354-marostegui.json
* 14:18 hknust: holger@mwmaint1002 finished renameInvalidUsernames.php (fail) as part of [[phab:T219279|T219279]]
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11109 and previous config saved to /var/cache/conftool/dbconfig/20200501-140603-marostegui.json
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'More traffic to db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11108 and previous config saved to /var/cache/conftool/dbconfig/20200501-134707-marostegui.json
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly warm up db1104 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P11107 and previous config saved to /var/cache/conftool/dbconfig/20200501-132804-marostegui.json
* 13:06 hknust: holger@mwmaint1002 Starting renameInvalidUsernames.php as part of [[phab:T219279|T219279]]
* 13:01 vgutierrez: rolling restart of ats-tls in text@esams - [[phab:T249335|T249335]]
* 12:24 mutante: mw230* - rolling restart of php-fpm - icinga warnings about opcache health in codfw
* 12:20 mutante: mw2376 - restarting php-fpm - icinga warnings about opcache health in codfw
* 12:07 mutante: notebook1004 - puppet had failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 29038". killing 29038, running puppet [[phab:T251560|T251560]]
* 12:05 mutante: notebook1003 - puppet had failed due to removal of jmorgan while one of his processes was still running. "change to absent failed.. user jmorgan currently used by process 3288". killing 3288, running puppet [[phab:T251560|T251560]]
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:54 _joe_: depooled all servers in the app pool in rack D1
* 08:54 oblivian@cumin1001: conftool action : set/pooled=no:weight=30; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 08:50 oblivian@cumin1001: conftool action : set/weight=10; selector: name=mw13(49{{!}}5[0-5])\.eqiad\.wmnet
* 08:48 _joe_: repooling mw1407 with LCStoreStaticArray, increased opcache, puppet disabled
* 08:45 _joe_: repooling mw1409
* 08:39 _joe_: repool mw1352
* 08:37 _joe_: depooling mw1352
* 07:44 marostegui: Copy wikireplica dump from labsdb1009 to labsdb1011 - [[phab:T249188|T249188]]
* 01:36 bmansurov@deploy1001: Finished deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service (duration: 04m 33s)
* 01:32 bmansurov@deploy1001: Started deploy [recommendation-api/deploy@5f47cd7]: Update the recommendation API service
 
==Archives==
See [[Server admin log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

Latest revision as of 00:17, 3 December 2022

2022-12-03

  • 00:17 cwhite: draining shards from logstash1010, logstash1033, logstash1034, logstash1035 - T321410

2022-12-02

  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:41 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Force run after a permission problem - volans@cumin1001"
  • 19:39 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 19:36 volans: fixed git checkout permissions T324334
  • 19:11 sukhe: restart pybal on lvs5004
  • 19:07 mutante: gitlab-runner* - upgrading gitlab-runner package version
  • 18:55 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 863383"
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs5001.eqsin.wmnet
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:51 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:49 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs5001.eqsin.wmnet
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs5001.eqsin.wmnet with reason: downtimed, in the process of decom
  • 18:20 sukhe: decomm lvs5001: restarting pybal
  • 18:14 sukhe: cr[23]-eqsin*: set routing-options static route 103.102.166.224/28 next-hop 10.132.0.39
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:03 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Test run after git gc - volans@cumin1001"
  • 18:01 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:00 volans: performed git gc on all (auth)dns hosts in /srv/git/netbox_dns_snippets - T324334
  • 17:36 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862944"
  • 16:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:53 jnuche@deploy1002: Finished scap: testing k8s deployment (duration: 08m 35s)
  • 16:49 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:49 bblack: (above agent runs completed on all text nodes for requestctl-for-misc patch)
  • 16:44 jnuche@deploy1002: Started scap: testing k8s deployment
  • 16:44 bblack: running agent on A:cp-text for https://gerrit.wikimedia.org/r/c/operations/puppet/+/863375 (requestctl for misc)
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:28 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 16:21 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 16:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 15:55 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:48 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862998"
  • 15:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster
  • 15:40 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:40 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:33 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:22 bking@cumin2002: START - Cookbook sre.wdqs.restart
  • 15:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:13 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 15:06 volans: run `git gc` on /srv/netbox-exports/dns.git on netbox[12]002 - T324334
  • 14:48 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 12:09 jynus: dropping all databases from db1133
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti5001.eqsin.wmnet
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti5001.eqsin.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti5001.eqsin.wmnet
  • 10:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti5001.eqsin.wmnet with reason: Remove from cluster for decom
  • 10:01 vgutierrez: upload acme-chief 0.36 to apt.wm.o (bullseye) - T321309
  • 09:58 moritzm: installing publicsuffix updates from bullseye/buster point releases
  • 09:54 moritzm: installing debootstrap updates from bullseye point release
  • 09:53 moritzm: rebalance ganeti codfw/C T323222
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42215 and previous config saved to /var/cache/conftool/dbconfig/20221202-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42214 and previous config saved to /var/cache/conftool/dbconfig/20221202-085621-root.json
  • 08:41 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:41 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42213 and previous config saved to /var/cache/conftool/dbconfig/20221202-084116-root.json
  • 08:41 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:40 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42212 and previous config saved to /var/cache/conftool/dbconfig/20221202-082611-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42211 and previous config saved to /var/cache/conftool/dbconfig/20221202-081106-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: After cloning db1206', diff saved to https://phabricator.wikimedia.org/P42210 and previous config saved to /var/cache/conftool/dbconfig/20221202-075601-root.json
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P42209 and previous config saved to /var/cache/conftool/dbconfig/20221202-074300-ladsgroup.json
  • 07:41 moritzm: draining ganeti5001 for eventual decom T322048
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P42208 and previous config saved to /var/cache/conftool/dbconfig/20221202-072755-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P42207 and previous config saved to /var/cache/conftool/dbconfig/20221202-071250-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P42206 and previous config saved to /var/cache/conftool/dbconfig/20221202-065745-ladsgroup.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P42204 and previous config saved to /var/cache/conftool/dbconfig/20221202-061259-marostegui.json
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(45|46).eqiad.wmnet,cluster=jobrunner
  • 00:09 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw14(39|40).eqiad.wmnet,cluster=videoscaler
  • 00:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns5004.wikimedia.org with OS buster

2022-12-01

  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1347-1348].eqiad.wmnet
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:47 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:45 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1347-1348].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:43 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 23:37 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1347-1348].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1327-1346].eqiad.wmnet
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:35 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:34 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1327-1346].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 23:31 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:59 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1327-1346].eqiad.wmnet
  • 22:57 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue (duration: 07m 28s)
  • 22:57 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1320.eqiad.wmnet # T306162
  • 22:56 rzl: rzl@puppetmaster1001:~$ sudo puppet node deactivate mw1312.eqiad.wmnet # T306162
  • 22:54 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw[1307-1326].eqiad.wmnet
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 rzl@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:50 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Remove unused config variable GEMentorDashboardUseVue
  • 22:49 urbanecm@deploy1002: backport aborted: (duration: 00m 03s)
  • 22:42 andrewbogott: upgraded wikitech-static-ord (aka wikitech-static) to Debian Buster, installed php7.4, upgraded MW to 1_39. Will delete the rackspace backup image in a few days.
  • 22:19 rzl@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[1307-1326].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - rzl@cumin1001"
  • 22:07 rzl@cumin1001: START - Cookbook sre.dns.netbox
  • 22:02 cwhite: restart swift-proxy on thanos::frontend eqiad
  • 22:01 brennen: end of utc late backport & config window
  • 21:46 brennen@deploy1002: Finished scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) (duration: 07m 48s)
  • 21:40 brennen@deploy1002: brennen and kharlan: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:38 brennen@deploy1002: Started scap: Backport for GrowthExperiments: Enable user impact refresh script on pilot wikis (T322541)
  • 21:34 brennen@deploy1002: Finished scap: Backport for New configs for android schemas (duration: 09m 49s)
  • 21:26 brennen@deploy1002: brennen and sharvaniharan: Backport for New configs for android schemas synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:25 andrewbogott: saving an image of wikitech-static-ord (aka wikitech-static) before upgrading the host to Buster
  • 21:25 brennen@deploy1002: Started scap: Backport for New configs for android schemas
  • 21:22 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:21 brennen@deploy1002: Finished scap: Backport for Start writing to cul_actor on test wikis (T233004) (duration: 14m 56s)
  • 21:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mw[1307-1326].eqiad.wmnet
  • 21:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1307-1326].eqiad.wmnet
  • 21:08 brennen@deploy1002: brennen and zabe: Backport for Start writing to cul_actor on test wikis (T233004) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:06 brennen@deploy1002: Started scap: Backport for Start writing to cul_actor on test wikis (T233004)
  • 20:47 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab1004.wikimedia.org
  • 20:47 aokoth@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab1004.wikimedia.org
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 20:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:00 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikimedia.org/T324195
  • 19:59 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version https://phabricator.wikimedia.org/T324195
  • 19:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bullseye
  • 19:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 19:44 mutante: gitlab-runner1002 - upgrading gitlab-runner package
  • 19:44 rzl@cumin2002: conftool action : set/pooled=inactive; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 19:43 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42201 and previous config saved to /var/cache/conftool/dbconfig/20221201-194301-ladsgroup.json
  • 19:42 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 42 hosts with reason: decom
  • 19:41 mutante: gitlab2002 (gitlab-replica) - upgrading gitlab-ce
  • 19:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 19:39 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns5004.wikimedia.org with OS buster
  • 19:38 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:28 dancy@deploy1002: Finished scap: testing k8s deployment (duration: 06m 17s)
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42200 and previous config saved to /var/cache/conftool/dbconfig/20221201-192755-ladsgroup.json
  • 19:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:27 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs5004.eqsin.wmnet with OS buster
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 19:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 19:21 dancy@deploy1002: Started scap: testing k8s deployment
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:16 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.12 refs T320517
  • 19:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1061']
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P42199 and previous config saved to /var/cache/conftool/dbconfig/20221201-191248-ladsgroup.json
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 19:02 dancy@deploy1002: Installation of scap version "4.30.0" completed for 601 hosts
  • 19:01 dancy@deploy1002: Installing scap version "4.30.0" for 601 hosts
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42197 and previous config saved to /var/cache/conftool/dbconfig/20221201-185742-ladsgroup.json
  • 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 18:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1061']
  • 18:37 rzl@cumin2002: conftool action : set/pooled=no; selector: name=mw13(0[7-9]|[1-3]\d|4[0-8])\..*
  • 18:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 18:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 18:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 18:21 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:19 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 18:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1061']
  • 18:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bullseye
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T323907)', diff saved to https://phabricator.wikimedia.org/P42196 and previous config saved to /var/cache/conftool/dbconfig/20221201-181215-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42195 and previous config saved to /var/cache/conftool/dbconfig/20221201-181153-ladsgroup.json
  • 18:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1060']
  • 18:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 18:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS buster
  • 18:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42194 and previous config saved to /var/cache/conftool/dbconfig/20221201-175647-ladsgroup.json
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns5004.wikimedia.org with reason: host reimage
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1060']
  • 17:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bullseye
  • 17:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bullseye
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P42193 and previous config saved to /var/cache/conftool/dbconfig/20221201-174140-ladsgroup.json
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 17:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1060']
  • 17:33 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 17:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1057']
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42192 and previous config saved to /var/cache/conftool/dbconfig/20221201-172634-ladsgroup.json
  • 17:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1058']
  • 17:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1059']
  • 17:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns5004.wikimedia.org with OS buster
  • 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42191 and previous config saved to /var/cache/conftool/dbconfig/20221201-171335-ladsgroup.json
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1059']
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1058']
  • 17:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1056.eqiad.wmnet with OS bullseye
  • 17:01 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42190 and previous config saved to /var/cache/conftool/dbconfig/20221201-165828-ladsgroup.json
  • 16:56 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:55 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:50 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns5004
  • 16:50 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns5004
  • 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1057']
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:49 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:48 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5004 fix - robh@cumin2002"
  • 16:46 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T323907)', diff saved to https://phabricator.wikimedia.org/P42189 and previous config saved to /var/cache/conftool/dbconfig/20221201-164509-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42188 and previous config saved to /var/cache/conftool/dbconfig/20221201-164437-ladsgroup.json
  • 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:43 moritzm: installing ini4j security updates
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P42187 and previous config saved to /var/cache/conftool/dbconfig/20221201-164322-ladsgroup.json
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 16:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1057']
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42185 and previous config saved to /var/cache/conftool/dbconfig/20221201-162930-ladsgroup.json
  • 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bullseye
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42184 and previous config saved to /var/cache/conftool/dbconfig/20221201-162815-ladsgroup.json
  • 16:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P42183 and previous config saved to /var/cache/conftool/dbconfig/20221201-161424-ladsgroup.json
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1056']
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bullseye
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 16:00 effie: php7.4 upgrade + apache upgrade + rolling restarts of parsoid servers - T323358
  • 16:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42182 and previous config saved to /var/cache/conftool/dbconfig/20221201-155917-ladsgroup.json
  • 15:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1055']
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1056']
  • 15:57 effie: php7.4 upgrade + apache upgrade + rolling restarts of jobrunners/videoscalers servers - T323358
  • 15:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt1054']
  • 15:45 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1055']
  • 15:41 effie: php7.4 upgrade + apache upgrade + rolling restarts of api servers - T323358
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T323907)', diff saved to https://phabricator.wikimedia.org/P42181 and previous config saved to /var/cache/conftool/dbconfig/20221201-153918-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42180 and previous config saved to /var/cache/conftool/dbconfig/20221201-153856-ladsgroup.json
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns5001.wikimedia.org
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1054']
  • 15:36 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns5001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 15:34 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns5001.wikimedia.org
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42179 and previous config saved to /var/cache/conftool/dbconfig/20221201-152350-ladsgroup.json
  • 15:12 effie: php7.4 upgrade + apache upgrade + rolling restarts of app servers - T323358
  • 15:11 sukhe: [done] homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:10 sukhe: homer "cr*-eqsin*" commit "running homer for Gerrit: 862321"
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P42178 and previous config saved to /var/cache/conftool/dbconfig/20221201-150843-ladsgroup.json
  • 15:01 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185) (duration: 08m 06s)
  • 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:53 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and soda: Backport for Enable limited width on plwikisource MAIN namespace (T323185) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42177 and previous config saved to /var/cache/conftool/dbconfig/20221201-145337-ladsgroup.json
  • 14:52 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable limited width on plwikisource MAIN namespace (T323185)
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:50 moritzm: installing krb5 security updates
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:45 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) (duration: 06m 12s)
  • 14:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:42 XioNoX: add BGP sessions to RIPE RIS in drmrs
  • 14:40 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:39 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable new impact module on testwiki (T323526)
  • 14:36 kharlan@deploy1002: Finished scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) (duration: 06m 04s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:31 kharlan@deploy1002: kharlan and tgr: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:30 kharlan@deploy1002: Started scap: Backport for [no-op] GrowthExperiments: Enable D3 in production (T318854)
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:27 kharlan@deploy1002: Finished scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) (duration: 07m 25s)
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T323907)', diff saved to https://phabricator.wikimedia.org/P42176 and previous config saved to /var/cache/conftool/dbconfig/20221201-142735-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:21 kharlan@deploy1002: kharlan and kharlan: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:20 kharlan@deploy1002: Started scap: Backport for DatabaseUserImpactStore: Fix parameter style for upsert keys (T324188)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust DNS for LVS eqsin. - cmooney@cumin1001"
  • 13:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42175 and previous config saved to /var/cache/conftool/dbconfig/20221201-132000-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42174 and previous config saved to /var/cache/conftool/dbconfig/20221201-131950-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42172 and previous config saved to /var/cache/conftool/dbconfig/20221201-130443-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42171 and previous config saved to /var/cache/conftool/dbconfig/20221201-125821-ladsgroup.json
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P42170 and previous config saved to /var/cache/conftool/dbconfig/20221201-124936-ladsgroup.json
  • 12:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 12:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:43 moritzm: installing glibc security updates on buster
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42169 and previous config saved to /var/cache/conftool/dbconfig/20221201-124314-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42168 and previous config saved to /var/cache/conftool/dbconfig/20221201-123430-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P42167 and previous config saved to /var/cache/conftool/dbconfig/20221201-122807-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42166 and previous config saved to /var/cache/conftool/dbconfig/20221201-121301-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42165 and previous config saved to /var/cache/conftool/dbconfig/20221201-120102-ladsgroup.json
  • 11:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42164 and previous config saved to /var/cache/conftool/dbconfig/20221201-114555-ladsgroup.json
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P42163 and previous config saved to /var/cache/conftool/dbconfig/20221201-113049-ladsgroup.json
  • 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:18 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) (duration: 06m 56s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42162 and previous config saved to /var/cache/conftool/dbconfig/20221201-111542-ladsgroup.json
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:12 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and migr: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix broken search with vector-2022 on www.wikidata.org (T324148)
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T318605)', diff saved to https://phabricator.wikimedia.org/P42161 and previous config saved to /var/cache/conftool/dbconfig/20221201-110938-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42160 and previous config saved to /var/cache/conftool/dbconfig/20221201-110916-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T323907)', diff saved to https://phabricator.wikimedia.org/P42159 and previous config saved to /var/cache/conftool/dbconfig/20221201-105938-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42158 and previous config saved to /var/cache/conftool/dbconfig/20221201-105916-ladsgroup.json
  • 10:57 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thanos-web
  • 10:56 elukey: deleted knative controller + net-istio controllers on ml-serve-eqiad to clear out some weird state (causing high latencies for the k8s api)
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42157 and previous config saved to /var/cache/conftool/dbconfig/20221201-105410-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42156 and previous config saved to /var/cache/conftool/dbconfig/20221201-104409-ladsgroup.json
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P42155 and previous config saved to /var/cache/conftool/dbconfig/20221201-103903-ladsgroup.json
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42154 and previous config saved to /var/cache/conftool/dbconfig/20221201-103448-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42153 and previous config saved to /var/cache/conftool/dbconfig/20221201-103426-ladsgroup.json
  • 10:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:34 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5004.eqsin.wmnet to cluster eqsin and group 1
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P42152 and previous config saved to /var/cache/conftool/dbconfig/20221201-102903-ladsgroup.json
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42151 and previous config saved to /var/cache/conftool/dbconfig/20221201-102357-ladsgroup.json
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42150 and previous config saved to /var/cache/conftool/dbconfig/20221201-101920-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T318605)', diff saved to https://phabricator.wikimedia.org/P42149 and previous config saved to /var/cache/conftool/dbconfig/20221201-101754-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42148 and previous config saved to /var/cache/conftool/dbconfig/20221201-101733-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42147 and previous config saved to /var/cache/conftool/dbconfig/20221201-101356-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P42146 and previous config saved to /var/cache/conftool/dbconfig/20221201-100413-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42145 and previous config saved to /var/cache/conftool/dbconfig/20221201-100227-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42144 and previous config saved to /var/cache/conftool/dbconfig/20221201-094907-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P42143 and previous config saved to /var/cache/conftool/dbconfig/20221201-094720-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42142 and previous config saved to /var/cache/conftool/dbconfig/20221201-093214-ladsgroup.json
  • 09:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42141 and previous config saved to /var/cache/conftool/dbconfig/20221201-092455-ladsgroup.json
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42140 and previous config saved to /var/cache/conftool/dbconfig/20221201-092434-ladsgroup.json
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:19 kostajh: UTC morning deploys done
  • 09:18 kharlan@deploy1002: Finished scap: Backport for User impact: Fix per-page pageview numbers (T323253) (duration: 08m 31s)
  • 09:15 Emperor: depool, restart, repool swift-proxy on ms-fe1011
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:11 kharlan@deploy1002: kharlan and kharlan: Backport for User impact: Fix per-page pageview numbers (T323253) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:09 kharlan@deploy1002: Started scap: Backport for User impact: Fix per-page pageview numbers (T323253)
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42139 and previous config saved to /var/cache/conftool/dbconfig/20221201-090927-ladsgroup.json
  • 09:07 moritzm: rebuilding raid on ganeti2013 T323222
  • 09:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2013.codfw.wmnet
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P42138 and previous config saved to /var/cache/conftool/dbconfig/20221201-085421-ladsgroup.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:49 volans: restart idrac on mw1334, ipmi and remote ipmi works fine, ssh not responding
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42137 and previous config saved to /var/cache/conftool/dbconfig/20221201-084147-ladsgroup.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42136 and previous config saved to /var/cache/conftool/dbconfig/20221201-084125-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42135 and previous config saved to /var/cache/conftool/dbconfig/20221201-084026-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42134 and previous config saved to /var/cache/conftool/dbconfig/20221201-083914-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42131 and previous config saved to /var/cache/conftool/dbconfig/20221201-082619-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42130 and previous config saved to /var/cache/conftool/dbconfig/20221201-082519-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T318605)', diff saved to https://phabricator.wikimedia.org/P42129 and previous config saved to /var/cache/conftool/dbconfig/20221201-082215-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42128 and previous config saved to /var/cache/conftool/dbconfig/20221201-082154-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42127 and previous config saved to /var/cache/conftool/dbconfig/20221201-081444-ladsgroup.json
  • 08:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42126 and previous config saved to /var/cache/conftool/dbconfig/20221201-081433-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P42125 and previous config saved to /var/cache/conftool/dbconfig/20221201-081112-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P42124 and previous config saved to /var/cache/conftool/dbconfig/20221201-081013-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42123 and previous config saved to /var/cache/conftool/dbconfig/20221201-080647-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42122 and previous config saved to /var/cache/conftool/dbconfig/20221201-075927-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42120 and previous config saved to /var/cache/conftool/dbconfig/20221201-075606-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P42119 and previous config saved to /var/cache/conftool/dbconfig/20221201-075506-ladsgroup.json
  • 07:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 400474
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P42118 and previous config saved to /var/cache/conftool/dbconfig/20221201-075140-ladsgroup.json
  • 07:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 400474
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P42117 and previous config saved to /var/cache/conftool/dbconfig/20221201-074420-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42116 and previous config saved to /var/cache/conftool/dbconfig/20221201-073634-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T318605)', diff saved to https://phabricator.wikimedia.org/P42115 and previous config saved to /var/cache/conftool/dbconfig/20221201-073015-ladsgroup.json
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42114 and previous config saved to /var/cache/conftool/dbconfig/20221201-072914-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42113 and previous config saved to /var/cache/conftool/dbconfig/20221201-072659-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T323907)', diff saved to https://phabricator.wikimedia.org/P42111 and previous config saved to /var/cache/conftool/dbconfig/20221201-071641-ladsgroup.json
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42110 and previous config saved to /var/cache/conftool/dbconfig/20221201-071615-ladsgroup.json
  • 07:14 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:13 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 07:12 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42109 and previous config saved to /var/cache/conftool/dbconfig/20221201-071153-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T323547', diff saved to https://phabricator.wikimedia.org/P42108 and previous config saved to /var/cache/conftool/dbconfig/20221201-070758-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T323547', diff saved to https://phabricator.wikimedia.org/P42107 and previous config saved to /var/cache/conftool/dbconfig/20221201-070203-ladsgroup.json
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T323547', diff saved to https://phabricator.wikimedia.org/P42106 and previous config saved to /var/cache/conftool/dbconfig/20221201-070131-ladsgroup.json
  • 07:01 Amir1: Starting s1 eqiad failover from db1163 to db1118 - T323547
  • 07:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42105 and previous config saved to /var/cache/conftool/dbconfig/20221201-070108-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42104 and previous config saved to /var/cache/conftool/dbconfig/20221201-065737-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P42103 and previous config saved to /var/cache/conftool/dbconfig/20221201-065646-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P42102 and previous config saved to /var/cache/conftool/dbconfig/20221201-064602-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42101 and previous config saved to /var/cache/conftool/dbconfig/20221201-064230-ladsgroup.json
  • 06:42 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 06:42 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42100 and previous config saved to /var/cache/conftool/dbconfig/20221201-064140-ladsgroup.json
  • 06:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P42099 and previous config saved to /var/cache/conftool/dbconfig/20221201-063930-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42098 and previous config saved to /var/cache/conftool/dbconfig/20221201-063908-ladsgroup.json
  • 06:36 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 06:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42097 and previous config saved to /var/cache/conftool/dbconfig/20221201-063055-ladsgroup.json
  • 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P42096 and previous config saved to /var/cache/conftool/dbconfig/20221201-062724-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42095 and previous config saved to /var/cache/conftool/dbconfig/20221201-062402-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42094 and previous config saved to /var/cache/conftool/dbconfig/20221201-061218-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P42093 and previous config saved to /var/cache/conftool/dbconfig/20221201-060855-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T318605)', diff saved to https://phabricator.wikimedia.org/P42092 and previous config saved to /var/cache/conftool/dbconfig/20221201-060230-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42091 and previous config saved to /var/cache/conftool/dbconfig/20221201-060206-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T323547', diff saved to https://phabricator.wikimedia.org/P42090 and previous config saved to /var/cache/conftool/dbconfig/20221201-060157-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Primary switchover s1 T323547
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T323907)', diff saved to https://phabricator.wikimedia.org/P42089 and previous config saved to /var/cache/conftool/dbconfig/20221201-055359-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42088 and previous config saved to /var/cache/conftool/dbconfig/20221201-055349-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42087 and previous config saved to /var/cache/conftool/dbconfig/20221201-055337-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42086 and previous config saved to /var/cache/conftool/dbconfig/20221201-055239-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42085 and previous config saved to /var/cache/conftool/dbconfig/20221201-055218-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T323907)', diff saved to https://phabricator.wikimedia.org/P42084 and previous config saved to /var/cache/conftool/dbconfig/20221201-055142-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42083 and previous config saved to /var/cache/conftool/dbconfig/20221201-055120-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42082 and previous config saved to /var/cache/conftool/dbconfig/20221201-054653-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42081 and previous config saved to /var/cache/conftool/dbconfig/20221201-053831-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42080 and previous config saved to /var/cache/conftool/dbconfig/20221201-053711-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42079 and previous config saved to /var/cache/conftool/dbconfig/20221201-053613-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P42078 and previous config saved to /var/cache/conftool/dbconfig/20221201-053147-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42077 and previous config saved to /var/cache/conftool/dbconfig/20221201-052524-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P42076 and previous config saved to /var/cache/conftool/dbconfig/20221201-052325-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42075 and previous config saved to /var/cache/conftool/dbconfig/20221201-052223-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P42074 and previous config saved to /var/cache/conftool/dbconfig/20221201-052205-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P42073 and previous config saved to /var/cache/conftool/dbconfig/20221201-052107-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T322618)', diff saved to https://phabricator.wikimedia.org/P42072 and previous config saved to /var/cache/conftool/dbconfig/20221201-052014-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42071 and previous config saved to /var/cache/conftool/dbconfig/20221201-051942-ladsgroup.json
  • 05:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42070 and previous config saved to /var/cache/conftool/dbconfig/20221201-051640-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42069 and previous config saved to /var/cache/conftool/dbconfig/20221201-050818-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42068 and previous config saved to /var/cache/conftool/dbconfig/20221201-050658-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42067 and previous config saved to /var/cache/conftool/dbconfig/20221201-050600-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42066 and previous config saved to /var/cache/conftool/dbconfig/20221201-050548-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42065 and previous config saved to /var/cache/conftool/dbconfig/20221201-050527-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42064 and previous config saved to /var/cache/conftool/dbconfig/20221201-050435-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42063 and previous config saved to /var/cache/conftool/dbconfig/20221201-045020-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P42062 and previous config saved to /var/cache/conftool/dbconfig/20221201-044929-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42061 and previous config saved to /var/cache/conftool/dbconfig/20221201-044053-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42060 and previous config saved to /var/cache/conftool/dbconfig/20221201-044031-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P42059 and previous config saved to /var/cache/conftool/dbconfig/20221201-043514-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42058 and previous config saved to /var/cache/conftool/dbconfig/20221201-043422-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T322618)', diff saved to https://phabricator.wikimedia.org/P42057 and previous config saved to /var/cache/conftool/dbconfig/20221201-043315-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42056 and previous config saved to /var/cache/conftool/dbconfig/20221201-043253-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42055 and previous config saved to /var/cache/conftool/dbconfig/20221201-042525-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T323907)', diff saved to https://phabricator.wikimedia.org/P42054 and previous config saved to /var/cache/conftool/dbconfig/20221201-042251-ladsgroup.json
  • 04:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42053 and previous config saved to /var/cache/conftool/dbconfig/20221201-042229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42052 and previous config saved to /var/cache/conftool/dbconfig/20221201-042008-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P42051 and previous config saved to /var/cache/conftool/dbconfig/20221201-041758-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42050 and previous config saved to /var/cache/conftool/dbconfig/20221201-041747-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42049 and previous config saved to /var/cache/conftool/dbconfig/20221201-041652-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42048 and previous config saved to /var/cache/conftool/dbconfig/20221201-041322-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P42047 and previous config saved to /var/cache/conftool/dbconfig/20221201-041018-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42046 and previous config saved to /var/cache/conftool/dbconfig/20221201-040723-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P42045 and previous config saved to /var/cache/conftool/dbconfig/20221201-040240-ladsgroup.json
  • 04:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42044 and previous config saved to /var/cache/conftool/dbconfig/20221201-040145-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42043 and previous config saved to /var/cache/conftool/dbconfig/20221201-035816-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42042 and previous config saved to /var/cache/conftool/dbconfig/20221201-035512-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P42041 and previous config saved to /var/cache/conftool/dbconfig/20221201-035216-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42040 and previous config saved to /var/cache/conftool/dbconfig/20221201-034734-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P42039 and previous config saved to /var/cache/conftool/dbconfig/20221201-034639-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T322618)', diff saved to https://phabricator.wikimedia.org/P42038 and previous config saved to /var/cache/conftool/dbconfig/20221201-034627-ladsgroup.json
  • 03:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42037 and previous config saved to /var/cache/conftool/dbconfig/20221201-034527-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P42036 and previous config saved to /var/cache/conftool/dbconfig/20221201-034309-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P42035 and previous config saved to /var/cache/conftool/dbconfig/20221201-033710-ladsgroup.json
  • 03:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS buster
  • 03:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T323907)', diff saved to https://phabricator.wikimedia.org/P42034 and previous config saved to /var/cache/conftool/dbconfig/20221201-033449-ladsgroup.json
  • 03:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42033 and previous config saved to /var/cache/conftool/dbconfig/20221201-033132-ladsgroup.json
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42032 and previous config saved to /var/cache/conftool/dbconfig/20221201-033020-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P42031 and previous config saved to /var/cache/conftool/dbconfig/20221201-032922-ladsgroup.json
  • 03:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 03:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42030 and previous config saved to /var/cache/conftool/dbconfig/20221201-032901-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42029 and previous config saved to /var/cache/conftool/dbconfig/20221201-032803-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T322618)', diff saved to https://phabricator.wikimedia.org/P42028 and previous config saved to /var/cache/conftool/dbconfig/20221201-032553-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42027 and previous config saved to /var/cache/conftool/dbconfig/20221201-032531-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42026 and previous config saved to /var/cache/conftool/dbconfig/20221201-031608-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42025 and previous config saved to /var/cache/conftool/dbconfig/20221201-031546-ladsgroup.json
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P42024 and previous config saved to /var/cache/conftool/dbconfig/20221201-031514-ladsgroup.json
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42023 and previous config saved to /var/cache/conftool/dbconfig/20221201-031354-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42022 and previous config saved to /var/cache/conftool/dbconfig/20221201-031024-ladsgroup.json
  • 03:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:03 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42021 and previous config saved to /var/cache/conftool/dbconfig/20221201-030040-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42020 and previous config saved to /var/cache/conftool/dbconfig/20221201-030007-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T322618)', diff saved to https://phabricator.wikimedia.org/P42019 and previous config saved to /var/cache/conftool/dbconfig/20221201-025900-ladsgroup.json
  • 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P42018 and previous config saved to /var/cache/conftool/dbconfig/20221201-025848-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42017 and previous config saved to /var/cache/conftool/dbconfig/20221201-025838-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P42016 and previous config saved to /var/cache/conftool/dbconfig/20221201-025517-ladsgroup.json
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P42015 and previous config saved to /var/cache/conftool/dbconfig/20221201-024533-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42014 and previous config saved to /var/cache/conftool/dbconfig/20221201-024341-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42013 and previous config saved to /var/cache/conftool/dbconfig/20221201-024331-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P42012 and previous config saved to /var/cache/conftool/dbconfig/20221201-024131-ladsgroup.json
  • 02:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 02:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P42011 and previous config saved to /var/cache/conftool/dbconfig/20221201-024110-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42010 and previous config saved to /var/cache/conftool/dbconfig/20221201-024011-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T322618)', diff saved to https://phabricator.wikimedia.org/P42009 and previous config saved to /var/cache/conftool/dbconfig/20221201-023801-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P42008 and previous config saved to /var/cache/conftool/dbconfig/20221201-023750-ladsgroup.json
  • 02:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:33 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:32 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P42007 and previous config saved to /var/cache/conftool/dbconfig/20221201-023027-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P42006 and previous config saved to /var/cache/conftool/dbconfig/20221201-022825-ladsgroup.json
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42005 and previous config saved to /var/cache/conftool/dbconfig/20221201-022603-ladsgroup.json
  • 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P42004 and previous config saved to /var/cache/conftool/dbconfig/20221201-022244-ladsgroup.json
  • 02:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5027.eqsin.wmnet with OS buster
  • 02:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 02:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5027.eqsin.wmnet with OS buster
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42003 and previous config saved to /var/cache/conftool/dbconfig/20221201-021318-ladsgroup.json
  • 02:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T322618)', diff saved to https://phabricator.wikimedia.org/P42002 and previous config saved to /var/cache/conftool/dbconfig/20221201-021211-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:12 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-coord - cmjohnson@cumin1001"
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P42001 and previous config saved to /var/cache/conftool/dbconfig/20221201-021149-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P42000 and previous config saved to /var/cache/conftool/dbconfig/20221201-021057-ladsgroup.json
  • 02:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 02:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P41999 and previous config saved to /var/cache/conftool/dbconfig/20221201-020737-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T323907)', diff saved to https://phabricator.wikimedia.org/P41998 and previous config saved to /var/cache/conftool/dbconfig/20221201-020308-ladsgroup.json
  • 02:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:58 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cephosd - cmjohnson@cumin1001"
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41997 and previous config saved to /var/cache/conftool/dbconfig/20221201-015643-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41996 and previous config saved to /var/cache/conftool/dbconfig/20221201-015550-ladsgroup.json
  • 01:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P41995 and previous config saved to /var/cache/conftool/dbconfig/20221201-015340-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P41994 and previous config saved to /var/cache/conftool/dbconfig/20221201-015332-ladsgroup.json
  • 01:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 01:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41993 and previous config saved to /var/cache/conftool/dbconfig/20221201-015230-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T318605)', diff saved to https://phabricator.wikimedia.org/P41992 and previous config saved to /var/cache/conftool/dbconfig/20221201-015115-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41991 and previous config saved to /var/cache/conftool/dbconfig/20221201-015020-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41990 and previous config saved to /var/cache/conftool/dbconfig/20221201-015010-ladsgroup.json
  • 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P41989 and previous config saved to /var/cache/conftool/dbconfig/20221201-014136-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41988 and previous config saved to /var/cache/conftool/dbconfig/20221201-013503-ladsgroup.json
  • 01:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS buster
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41987 and previous config saved to /var/cache/conftool/dbconfig/20221201-012630-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T322618)', diff saved to https://phabricator.wikimedia.org/P41986 and previous config saved to /var/cache/conftool/dbconfig/20221201-012522-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41985 and previous config saved to /var/cache/conftool/dbconfig/20221201-012500-ladsgroup.json
  • 01:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS buster
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P41984 and previous config saved to /var/cache/conftool/dbconfig/20221201-011957-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41983 and previous config saved to /var/cache/conftool/dbconfig/20221201-010954-ladsgroup.json
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41982 and previous config saved to /var/cache/conftool/dbconfig/20221201-010450-ladsgroup.json
  • 01:04 ejegg: payments-wiki upgraded from 96c74911 to c52a6a39
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T322618)', diff saved to https://phabricator.wikimedia.org/P41981 and previous config saved to /var/cache/conftool/dbconfig/20221201-010240-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41980 and previous config saved to /var/cache/conftool/dbconfig/20221201-010219-ladsgroup.json
  • 00:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P41979 and previous config saved to /var/cache/conftool/dbconfig/20221201-005447-ladsgroup.json
  • 00:53 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41978 and previous config saved to /var/cache/conftool/dbconfig/20221201-004712-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41977 and previous config saved to /var/cache/conftool/dbconfig/20221201-003941-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T322618)', diff saved to https://phabricator.wikimedia.org/P41976 and previous config saved to /var/cache/conftool/dbconfig/20221201-003533-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T322618)', diff saved to https://phabricator.wikimedia.org/P41975 and previous config saved to /var/cache/conftool/dbconfig/20221201-003511-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P41974 and previous config saved to /var/cache/conftool/dbconfig/20221201-003205-ladsgroup.json
  • 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS buster
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1206.eqiad.wmnet with OS bullseye
  • 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41973 and previous config saved to /var/cache/conftool/dbconfig/20221201-002005-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41972 and previous config saved to /var/cache/conftool/dbconfig/20221201-001659-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T322618)', diff saved to https://phabricator.wikimedia.org/P41971 and previous config saved to /var/cache/conftool/dbconfig/20221201-001449-ladsgroup.json
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T322618)', diff saved to https://phabricator.wikimedia.org/P41970 and previous config saved to /var/cache/conftool/dbconfig/20221201-001427-ladsgroup.json
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1206.eqiad.wmnet with reason: host reimage
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P41969 and previous config saved to /var/cache/conftool/dbconfig/20221201-000458-ladsgroup.json

Archives

See Server Admin Log/Archives.