Difference between revisions of "Server Admin Log"


Revision as of 00:51, 20 August 2020

2020-08-20

  • 00:51 mutante: ms-be1039 - started failed ferm service
  • 00:35 ejegg: stopped fundraising scheduled jobs
  • 00:27 eileen: civicrm revision changed from c442a09153 to cf9fadbeed, config revision is 3cdffd4fc2

2020-08-19

  • 23:20 Urbanecm: Evening B&C window closed
  • 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a808999: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki (T260825) (duration: 01m 05s)
  • 22:41 eileen: civicrm revision changed from 34f95a3311 to c442a09153, config revision is 3cdffd4fc2
  • 21:27 eileen: civicrm revision changed from 154519cc1f to 34f95a3311, config revision is 3cdffd4fc2
  • 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167) (duration: 00m 06s)
  • 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention (T259167)
  • 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
  • 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. (T257906)
  • 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs T257973 (duration: 01m 04s)
  • 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs T257973
  • 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 60af096: Add autopatrolled group at arzwiki (T260761) (duration: 01m 04s)
  • 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
  • 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 924a03b: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T259927) (duration: 01m 04s)
  • 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 83b34e1: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication (T258695) (duration: 01m 04s)
  • 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 95d45f6: Dont index Draft (118) and Draft talk (119) on hywiki (T260804) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 04s)
  • 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: 803cb1a: Update taglines for various projects (T258552) (duration: 01m 06s)
  • 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: bb4aa44: Configure namespaces on commons to include categories (T198716) (duration: 01m 04s)
  • 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: b904333: Update project wordmarks (T254788; sync 2/2) (duration: 01m 04s)
  • 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: b904333: Update project wordmarks (T254788; sync 1/2) (duration: 01m 06s)
  • 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM (T260192)
  • 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a6f8354: Enable $wgMFNoindexPages for all wikis (T255458) (duration: 01m 07s)
  • 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:38 mutante: decom'ing releases2001.codfw.wmnet (
  • 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
  • 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
  • 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
  • 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
  • 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
  • 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
  • 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
  • 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 14:41 rzl: disable puppet on cumin1001 for switchdc testing
  • 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:34 gehel: depooling wdqs1007 and restarting blazegraph
  • 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
  • 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 _joe_: building and uploading fluent-bit, ratelimit images
  • 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 12:57 _joe_: building a new version of the base docker images
  • 11:29 awight: EU bacon finished
  • 11:28 effie: restart mwdebug* servers
  • 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Fix typos in flaggedrevs comments () (duration: 01m 19s)
  • 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:36 XioNoX: update firewall policies on pfw - T260585
  • 08:35 jayme: running puppet on A:all-mw-eqiad
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 08:20 godog: switch grafana.w.o to grafana 7 in codfw - T259143
  • 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:06 jayme: running puppet on A:all-mw-eqiad
  • 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - T259143
  • 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:13 eileen: tools revision changed from b4ebd1e564 to 0b9d971bc4
  • 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 02:53 cstone: civicrm revision changed from f5469d0a4c to 154519cc1f
  • 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
  • 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend (T180761) (duration: 05m 13s)
  • 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster

2020-08-18

  • 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 05s)
  • 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D (T258021) (duration: 01m 06s)
  • 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
  • 23:34 Urbanecm: Run scap pull at mw1301
  • 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
  • 23:32 mutante: rebooting mw1301 via mgmt
  • 23:22 mutante: killed reboot-cluster on cumin1001
  • 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ac34f72: Enable subpages in NS:0 in techconductwiki (T260350) (duration: 05m 14s)
  • 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
  • 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs T257973
  • 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
  • 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs T257973 (duration: 53m 12s)
  • 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
  • 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete T256164
  • 19:47 ejegg: updated payments-wiki from a7ee1790e0 to ef7ebd08cb
  • 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # T256164
  • 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/{REL1_27,REL1_29,REL1_30} # T256164
  • 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs T257973
  • 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs T257973
  • 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09|11|13).*
  • 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 15:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:02 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:56 papaul: replacing msw-c1,c2 and c4
  • 14:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P12293 and previous config saved to /var/cache/conftool/dbconfig/20200818-145337-marostegui.json
  • 14:48 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(55|64|65).*
  • 14:46 XioNoX: move v4 HE on cr3-ulsfo from peering to transit bgp group
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12292 and previous config saved to /var/cache/conftool/dbconfig/20200818-144415-marostegui.json
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12291 and previous config saved to /var/cache/conftool/dbconfig/20200818-143758-marostegui.json
  • 14:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12290 and previous config saved to /var/cache/conftool/dbconfig/20200818-142937-marostegui.json
  • 14:28 marostegui: Stop MYSQL on db2125 for on-site maintenance - T260670
  • 13:54 marostegui: Revoke DELETE and CREATE from xhgui user on m2 T260640
  • 13:53 XioNoX: bump Zayo v4 BGP session in eqiad
  • 13:49 XioNoX: move v4 HE on cr2-eqord from peering to transit bgp group
  • 13:37 XioNoX: move v4 cr1-eqiad from peering to transit bgp group
  • 13:04 kormat: disabling puppet on all db machines T259516
  • 12:57 _joe_: rebooting appservers in eqiad, 3 at a time
  • 12:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:34 kormat: deploying wmfmariadbpy 0.4
  • 12:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:53 XioNoX: add new icinga hosts to mr policies - T260533
  • 11:40 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:36 Lucas_WMDE: EU backport&config done
  • 11:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wikisource wordmark for trwikisource (T260658), part 2 (duration: 00m 55s)
  • 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/mobile/copyright/wikisource-wordmark-tr.svg' | mwscript purgeList.php # T260658
  • 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wikisource-wordmark-tr.svg: Config: Add Wikisource wordmark for trwikisource (T260658), part 1 (duration: 00m 55s)
  • 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Catalan Wikipedia (T232584) (duration: 01m 01s)
  • 11:06 jbond42: deploy net-snmp update to buster
  • 10:56 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw229.*
  • 10:55 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 10:54 marostegui: Reboot db2125 after running a full upgrade - T260670
  • 10:46 marostegui: Powercycle db2125 from the idrac T260670
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - host down T260670', diff saved to https://phabricator.wikimedia.org/P12288 and previous config saved to /var/cache/conftool/dbconfig/20200818-100718-marostegui.json
  • 09:45 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:43 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
  • 09:40 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[234].*
  • 09:40 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 09:35 kart_: Update cxserver to 2020-08-17-090424-production (T259980)
  • 09:32 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:29 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 09:28 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 09:28 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[02].*
  • 09:26 volans: upgraded spicerack to v0.0.39 on cumin hosts
  • 09:25 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 09:21 volans: uploaded spicerack_0.0.39-1+deb10u1 to apt.wikimedia.org buster-wikimedia
  • 09:05 hashar: Restarting CI Jenkins
  • 08:44 vgutierrez: restart ats-tls on cp5006
  • 08:24 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 08:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:16 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 08:10 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P12284 and previous config saved to /var/cache/conftool/dbconfig/20200818-080256-marostegui.json
  • 07:58 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:45 godog: VictorOps ack'd incidents will re-trigger after 24h if not resolved - T259465
  • 07:44 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12283 and previous config saved to /var/cache/conftool/dbconfig/20200818-074325-marostegui.json
  • 07:42 _joe_: performing rolling reboot of all codfw api servers
  • 07:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12282 and previous config saved to /var/cache/conftool/dbconfig/20200818-072349-marostegui.json
  • 07:19 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw213[5-9].codfw.wmnet
  • 07:16 jynus: update rest of phabricator passwords T250361
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12281 and previous config saved to /var/cache/conftool/dbconfig/20200818-071121-marostegui.json
  • 07:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:07 godog: prometheus eqiad: add 100G to prometheus/global
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 07:01 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 06:53 twentyafterfour: phabricator maintenance successful
  • 06:48 jynus: deploy another password change to phabricator service (potentially disruptive) T250361
  • 06:41 XioNoX: add cloudflare PNI IPs in eqiad - T259036
  • 06:21 jynus: deploy password change to phabricator service T146055
  • 06:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 06:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 05:52 _joe_: running puppet on mc1020 T260622
  • 05:02 twentyafterfour: phabricator appears to be fully functional
  • 05:01 twentyafterfour: phabricator read-only ended
  • 05:00 twentyafterfour: phabricator is now read-only
  • 05:00 marostegui: Failover m3 (phabricator) database master from db1128 to db1132 - T259589
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P12279 and previous config saved to /var/cache/conftool/dbconfig/20200818-043241-marostegui.json
  • 01:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
  • 01:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
  • 01:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
  • 01:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:48 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
  • 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 00:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
  • 00:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)

2020-08-17

  • 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
  • 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:41 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
  • 23:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:30 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
  • 23:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 23:25 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
  • 23:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
  • 22:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
  • 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:26 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
  • 22:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:09 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 22:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
  • 22:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:57 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T259360)
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:53 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add api-gateway.request stream config T259736, one host timed out (duration: 00m 55s)
  • 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 ppchelko@deploy1001: sync-file aborted: Add api-gateway.request stream config T259736 (duration: 05m 01s)
  • 21:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
  • 21:46 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
  • 21:42 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 21:38 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for T257687 (duration: 00m 57s)
  • 21:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:34 effie: temporarily blocking traffic to mc1020
  • 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
  • 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
  • 21:08 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 19:30 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:28 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:22 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3 (duration: 02m 57s)
  • 18:58 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 3
  • 18:58 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2 (duration: 11m 19s)
  • 18:46 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002, take 2
  • 18:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002 (duration: 131m 17s)
  • 18:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:43 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 808c17d: Change logo for lldwiki to match the requested one (T259432) (duration: 00m 56s)
  • 18:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: 67e8f88: Add logo files for lldwiki (T259432) (duration: 00m 56s)
  • 17:17 cdanis@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.*
  • 17:06 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 17:04 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw,name=mw2246.codfw.wmnet
  • 17:01 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 16:36 jynus: restart backup2001, backup1001 one after the other
  • 16:35 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki T259002
  • 16:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 16:27 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 00m 56s)
  • 16:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - remove unneeded override for SearchSatisfaction - T259163 (duration: 00m 56s)
  • 16:22 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:21 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: cluster=jobrunner,dc=codfw,name=mw2250.codfw.wmnet
  • 16:20 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw
  • 16:20 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:14 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1359.*
  • 16:12 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 16:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 15:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out (duration: 01m 31s)
  • 15:43 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 3. feeds timed out
  • 15:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out (duration: 20m 40s)
  • 15:36 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:30 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*-codfw*' commit 'revert skipping RPKI validation for Jio AS55836 I0fd4683 T260452'
  • 15:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054. take 2. feeds timed out
  • 15:22 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054 (duration: 02m 30s)
  • 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: T257943 T260556 T253478 T254490 T259054
  • 15:08 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:06 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 15:04 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
  • 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis (take 2) - T254606 (duration: 00m 53s)
  • 14:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 14:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis - T254606 (duration: 00m 55s)
  • 14:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - group0 - T254606 (duration: 00m 56s)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12277 and previous config saved to /var/cache/conftool/dbconfig/20200817-141449-marostegui.json
  • 14:09 marostegui: Sanitize thankyouwiki on db1124:3315, db2094:3315 - T260551
  • 14:03 marostegui: Sanitize lldwiki on db1124:3315 and db2094:3315 T259436
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12276 and previous config saved to /var/cache/conftool/dbconfig/20200817-140229-marostegui.json
  • 13:58 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259432)
  • 13:54 Urbanecm: Creating thankyouwiki and lldwiki is done
  • 13:54 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 52s)
  • 13:54 Urbanecm: Create account Pcoombe (WMF) at thankyouwiki, email set to pcoombe@wikimedia.org (T259002)
  • 13:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:49 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating thankyouwiki (T259002)
  • 13:48 urbanecm@deploy1001: Synchronized dblists: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:47 marostegui: Deploy MCR change on db1104
  • 13:47 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating thankyouwiki (T259002) (duration: 00m 56s)
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for MCR change', diff saved to https://phabricator.wikimedia.org/P12275 and previous config saved to /var/cache/conftool/dbconfig/20200817-134701-marostegui.json
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12274 and previous config saved to /var/cache/conftool/dbconfig/20200817-134619-marostegui.json
  • 13:46 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating thankyouwiki (T259002) (duration: 00m 55s)
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12273 and previous config saved to /var/cache/conftool/dbconfig/20200817-134604-marostegui.json
  • 13:41 jayme: imported td-agent-bit_1.5.3-0 to buster-wikimedia - T260536
  • 13:40 jayme: imported !log imported to buster-wikimedia
  • 13:39 marostegui: Upgrade db1088 (s6) to a newer mysql version (10.4.14)
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for mysql upgrade', diff saved to https://phabricator.wikimedia.org/P12272 and previous config saved to /var/cache/conftool/dbconfig/20200817-133905-marostegui.json
  • 13:34 jbond42: deploy json-c security update to buster
  • 13:33 marostegui: Restart mysql on db2102 (testing new package)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12271 and previous config saved to /var/cache/conftool/dbconfig/20200817-133043-marostegui.json
  • 13:29 urbanecm@deploy1001: Synchronized langlist: Creating lldwiki (T259432) (duration: 00m 54s)
  • 13:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:27 urbanecm@deploy1001: sync-file aborted: Creating lldwiki (T259432) (duration: 00m 00s)
  • 13:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating lldwiki (T259432) (duration: 00m 53s)
  • 13:25 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lldwiki (T259432)
  • 13:23 urbanecm@deploy1001: Synchronized dblists: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:22 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating lldwiki (T259432) (duration: 00m 56s)
  • 13:20 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating lldwiki (T259432) (duration: 00m 55s)
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12270 and previous config saved to /var/cache/conftool/dbconfig/20200817-131307-marostegui.json
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:09 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12269 and previous config saved to /var/cache/conftool/dbconfig/20200817-130127-marostegui.json
  • 12:58 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for MCR change', diff saved to https://phabricator.wikimedia.org/P12268 and previous config saved to /var/cache/conftool/dbconfig/20200817-124458-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12267 and previous config saved to /var/cache/conftool/dbconfig/20200817-124409-marostegui.json
  • 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:35 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:27 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12266 and previous config saved to /var/cache/conftool/dbconfig/20200817-122234-marostegui.json
  • 12:21 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12265 and previous config saved to /var/cache/conftool/dbconfig/20200817-121600-marostegui.json
  • 12:05 Lucas_WMDE: EU backport window done
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki --fix | tee T259429-fix
  • 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki | tee T259429-dryrun
  • 12:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Set Portal and Portal_talk namespaces in bjnwiki as an extra namespace. (T259429) (duration: 00m 55s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12264 and previous config saved to /var/cache/conftool/dbconfig/20200817-115741-marostegui.json
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 2 (duration: 00m 57s)
  • 11:53 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wiktionary-wordmark-es.svg\n' | mwscript purgeList.php # T254059
  • 11:53 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wiktionary-wordmark-es.svg: Config: Add Wiktionary wordmark for eswiktionary (T254059), part 1 (duration: 00m 56s)
  • 11:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki%s.png\n' '-1.5x' '-2x' | mwscript purgeList.php # T259006
  • 11:45 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: Change the logo of lzh Wikipedia (T259006) (duration: 00m 55s)
  • 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks (T260493) (duration: 00m 55s)
  • 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add Turkish powered by MW and Wikimedia project icons (T260492) (duration: 00m 57s)
  • 11:25 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:14 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] configure mediasearch A/B test (duration: 01m 08s)
  • 11:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:54 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:51 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:49 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:36 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:30 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:14 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:45 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:42 jynus: updating compiler facts for cloud puppet compiler project to include new host dbprov2003
  • 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:22 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 09:18 _joe_: running a full apt-get upgrade on mw1379-1380
  • 09:18 _joe_: re-upgrading imagemagick on mw1378
  • 09:16 _joe_: upgrading packages on mw1377
  • 09:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:05 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 08:25 jayme: forcing a puppet run on all mw-api servers in eqiad - T260329
  • 07:52 _joe_: repooling mw1382
  • 07:37 _joe_: running the same test on mw1382 T260329
  • 07:34 _joe_: repooling mw1381
  • 07:15 _joe_: running the same test on mw1381 T260329
  • 07:15 _joe_: repooled mw1281
  • 06:26 _joe_: stop testing on mw1281, T260329
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:28 marostegui: Stop mysql on db1099:3311, db1099:3318 for reimage
  • 05:28 _joe_: depooling mw1281 for testing for T260329
  • 05:25 marostegui: Deploy schema change on db1139:3311
  • 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for reimage and MCR change', diff saved to https://phabricator.wikimedia.org/P12263 and previous config saved to /var/cache/conftool/dbconfig/20200817-052147-marostegui.json
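
The cookbook entries above follow a regular shape: a `START` line, then an `END (STATUS)` line carrying an `exit_code`. A minimal sketch of tallying outcomes per cookbook, assuming the entries are available as plain-text lines. The helper name `tally_cookbook_results` and the regular expression are illustrative assumptions, not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Hypothetical pattern for lines such as:
#   09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>[^@\s]+)@(?P<host>\S+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook, status); START and free-form lines are skipped."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)",
        "09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single",
        "09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)",
    ]
    for (cookbook, status), n in sorted(tally_cookbook_results(sample).items()):
        print(cookbook, status, n)
```

Because `START` and `END` lines carry no host or run identifier, this sketch only counts outcomes; it does not try to pair a given `END` with its `START`.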

2020-08-16

  • 11:12 gehel: repooling wdqs1004 - caught up on lag

2020-08-15

  • 21:18 gehel: depooling wdqs1004 and restarting services, will wait to catch up on lag before repooling

2020-08-14

  • 19:41 effie: restart mwdebug1002
  • 16:58 cdanis: done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers T260449
  • 16:44 cdanis: ❌ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 16:36 cdanis: ❌ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
  • 02:41 eileen: tools revision changed from 9a89f45974 to b4ebd1e564

2020-08-13

  • 23:39 tzatziki: removing 3 files for legal compliance
  • 22:03 mutante: switching xhgui from tungsten to xhgui1001 - ran puppet on webperf*001 - T180761 T158837
  • 21:54 andrew@deploy1001: Finished deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388 (duration: 03m 53s)
  • 21:50 andrew@deploy1001: Started deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug T260388
  • 21:11 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002 and then all other releases* servers. 57GB, overwriting existing data from manual config (T247652)
  • 20:53 kormat: dropping xhgui.xhgui on m2
  • 19:35 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/DiscussionTools: Revert new reply API (again) T259855 (duration: 00m 57s)
  • 18:06 herron: restarted ES on logstash1010
  • 18:05 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Enabling new XHGui backend (T180761) (duration: 00m 56s)
  • 17:16 hnowlan: deployed ATS and varnish rules to route api.wikimedia.org
  • 16:26 hnowlan: created api.wikimedia.org
  • 15:49 hnowlan: moving api-gateway service to state production. critical set to false
  • 15:41 herron: restart ES on logstash1012
  • 14:56 fdans@deploy1001: Finished deploy [analytics/refinery@ba1a439]: Regular analytics weekly train (duration: 11m 34s)
  • 14:45 ema: repool mw1382 with kernel memory accounting disabled T260281
  • 14:45 fdans@deploy1001: Started deploy [analytics/refinery@ba1a439]: Regular analytics weekly train
  • 14:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:40 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:38 ema: reboot mw1382 with kernel memory accounting disabled T260281
  • 14:34 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:34 _joe_: rebooting mw1381 with a newer kernel, mw1383 as control with the old kernel T260329
  • 14:33 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:31 _joe_: installing kernel 4.19.0-0.bpo.9 on mw1381 T260329
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 14:00 elukey: create schema[12]00[34] in ganeti - T260347
  • 13:59 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:58 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:45 hnowlan: moving api-gateway service to monitoring_setup
  • 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 13:44 hashar: Gracefully restarting Zuul
  • 13:39 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 13:10 _joe_: forcing a puppet run on the api appservers in eqiad T260329
  • 13:07 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: revert enabling of lilypond (again) T257091 T260329 (duration: 00m 59s)
  • 11:24 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 11:09 hnowlan: restarting pybal on lvs2010 T254908
  • 11:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:06 hnowlan: restarting pybal on lvs2009 T254908
  • 11:05 hnowlan: restarting pybal on lvs1016 T254908
  • 11:05 jayme: depool mw1380 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 11:05 hnowlan: restarting pybal on lvs1015 T254908
  • 11:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:42 hnowlan: Moving api-gateway service from service_setup to lvs_setup and running puppet on LVS servers
  • 10:17 jayme: depool mw1379 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - T260329
  • 10:04 XioNoX: re-order OSPF interfaces on all routers (now partially netbox driven)
  • 09:37 ayounsi@deploy1001: Finished deploy [homer/deploy@89636df]: Homer release v0.2.5 (duration: 03m 03s)
  • 09:34 ayounsi@deploy1001: Started deploy [homer/deploy@89636df]: Homer release v0.2.5
  • 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to /var/cache/conftool/dbconfig/20200813-085547-marostegui.json
  • 08:45 _joe_: downgrading imagemagick on mw1378 T260329
  • 08:43 _joe_: downgrading imagemagick on mw1378 T260281
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:55 _joe_: downgrading curl/libcurl3/libcurl3-gnutls on mw1377 T260329
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12246 and previous config saved to /var/cache/conftool/dbconfig/20200813-074528-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12244 and previous config saved to /var/cache/conftool/dbconfig/20200813-071943-marostegui.json
  • 07:16 marostegui: Stop replication on db1082 to remove triggers on sanitarium for MCR changes
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12243 and previous config saved to /var/cache/conftool/dbconfig/20200813-071545-marostegui.json
  • 06:48 marostegui: Deploy MCR change on dbstore1003:3311
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12242 and previous config saved to /var/cache/conftool/dbconfig/20200813-060135-marostegui.json
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:43 marostegui: Stop MySQL on db2135 (codfw master), haproxy irc alert will fire T260324
  • 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12241 and previous config saved to /var/cache/conftool/dbconfig/20200813-052859-marostegui.json
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12240 and previous config saved to /var/cache/conftool/dbconfig/20200813-051222-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12239 and previous config saved to /var/cache/conftool/dbconfig/20200813-050107-marostegui.json
  • 02:56 mutante: testreduce1001 - systemctl reset-failed ; fix parsoid-vd systemd state and icinga alert
  • 00:37 mutante: removing jenkins_service_running checks from secondary servers where it's stopped, manually from icinga config, running puppet on icinga
  • 00:14 mutante: re-enabling puppet on releases* servers

2020-08-12

  • 23:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:37 wkandek: reboot mw1372
  • 23:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:32 wkandek: reboot mw1373
  • 23:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:31 wkandek: reboot mw1371
  • 23:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:28 wkandek: reboot mw1384
  • 23:27 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:27 wkandek: reboot mw1385
  • 23:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:24 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:22 wkandek: reboot mw1370
  • 23:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:19 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:18 wkandek: reboot mw1369
  • 23:18 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:17 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:17 wkandek: reboot mw1387
  • 23:16 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:16 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:16 wkandek: reboot mw1389
  • 23:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:14 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:09 wkandek: reboot mw1368
  • 23:09 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:08 wkandek: reboot mw1367
  • 23:08 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:07 wkandek: reboot mw1391
  • 23:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:05 ejegg: updated Fundraising CiviCRM from 72452e28a9 to f5469d0a4c
  • 23:05 wkandek: reboot mw1393
  • 23:04 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:04 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 23:01 wkandek: reboot mw1395
  • 23:01 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 23:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:53 wkandek: reboot mw1397
  • 22:53 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek: reboot mw1366
  • 22:52 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:52 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:52 wkandek: reboot mw1365
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:51 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:47 wkandek: reboot mw1399
  • 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:46 wkandek: reboot mw1364
  • 22:46 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:45 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:42 wkandek: reboot mw1401
  • 22:42 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:41 wkandek: reboot mw1355
  • 22:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:38 wkandek: reboot mw1354
  • 22:38 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:36 wkandek: reboot mw1396
  • 22:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:35 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:32 wkandek: reboot mw1353
  • 22:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:31 wkandek: reboot mw1352
  • 22:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:29 wkandek: reboot mw1348
  • 22:29 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:28 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:26 wkandek: reboot mw1347
  • 22:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:23 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:22 wkandek: reboot mw1350
  • 22:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:21 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:20 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:19 wkandek: reboot mw1346
  • 22:19 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:18 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:14 wkandek: reboot mw1345
  • 22:13 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:12 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:12 wkandek: reboot mw1349
  • 22:12 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:11 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:08 wkandek: reboot mw1333
  • 22:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:07 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
  • 22:03 wkandek: reboot mw1344
  • 22:03 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek: reboot mw1343
  • 22:02 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 22:02 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 22:00 wkandek: reboot mw1332
  • 22:00 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:56 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:55 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:53 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:50 wkandek: reboot mw1331
  • 21:50 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:48 wkandek: reboot mw1342
  • 21:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:46 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:46 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
  • 21:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 21:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:39 wkandek: reboot mw1341
  • 21:39 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:37 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 21:37 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:36 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:33 wkandek: reboot mw1329
  • 21:33 wkandek: reboot mw1328
  • 21:32 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:29 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:28 ejegg: updated payments-wiki from 77ff5d70fc to a7ee1790e0
  • 21:25 wkandek: reboot mw1340
  • 21:25 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:21 wkandek: reboot mw1339
  • 21:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:20 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:15 wkandek: reboot mw1327
  • 21:15 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:13 wkandek: reboot mw1326
  • 21:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:11 wkandek: reboot mw1317
  • 21:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:10 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:05 wkandek: reboot mw1316
  • 21:04 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:03 wkandek: reboot mw1325
  • 21:03 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:02 wkandek: reboot mw1324
  • 21:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:02 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 21:01 wkandek: reboot mw1315
  • 21:01 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 21:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:57 wkandek: reboot mw1323
  • 20:57 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:52 wkandek: reboot mw1322
  • 20:52 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:51 wkandek: reboot mw1314
  • 20:51 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:50 wkandek: reboot mw1313
  • 20:50 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:44 wkandek: reboot mw1312
  • 20:44 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:43 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:43 wkandek: reboot mw1321
  • 20:42 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:41 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:40 wkandek: reboot mw1297
  • 20:40 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:39 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:39 wkandek: reboot mw1320
  • 20:39 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:34 wkandek: reboot mw1290
  • 20:34 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:33 wkandek: reboot mw1319
  • 20:33 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:32 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:29 wkandek: reboot mw1275
  • 20:29 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:26 wkandek: reboot mw1289
  • 20:25 wkandek: reboot mw1288
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:23 wkandek: reboot mw1274
  • 20:23 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:20 wkandek: reboot mw1273
  • 20:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 20:13 wkandek: reboot mw1287
  • 20:13 wkandek: reboot mw1286
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 20:11 wkandek: reboot mw1272
  • 20:11 wkandek: reboot mw1271
  • 19:41 hashar: Upgrading Jenkins on contint2001 (primary)
  • 19:25 hashar: contint1001: sudo systemctl mask jenkins # spare server
  • 19:25 mutante: all releases* servers except 1001 - disable puppet; stop jenkins, mask jenkins
  • 19:22 mutante: releases1002 - stopped and masked jenkins service
  • 19:22 mutante: releases2001 - stopped and masked jenkins service
  • 19:20 mutante: upgrading jenkins on releases*001
  • 19:19 hashar: Upgrading Jenkins on contint1001 (spare)
  • 19:16 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.4
  • 19:13 mutante: uploaded new jenkins version to APT repo; upgrading jenkins on releases1002/2002
  • 19:08 effie: pool mw1396
  • 19:06 effie: repool mw1395 mw1397 mw1399
  • 18:56 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in client (duration: 02m 13s)
  • 18:47 wkandek: reboot mw1270
  • 18:47 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:45 wkandek: reboot mw1269
  • 18:41 root@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:38 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 18:25 wkandek: reboot mw1268
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
  • 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 18:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on hewiki (T255020) (duration: 01m 03s)
  • 18:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 18:04 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo (duration: 01m 06s)
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:56 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:51 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:49 effie: reboot mw1265 mw1282 mw1283
  • 17:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:45 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:37 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 17:19 effie: reboot mw1263 mw1264 mw1279 and mw1281
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 17:16 cdanis: for posterity: mw1359 has a bunch of special packages installed (previously recorded in SAL) and also has `sudo memleak-bpfcc -o 60000 -z 31 -Z 33 30` running in a tmux in an attempt to understand what's causing the page fragmentation in the appserver fleet
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 17:00 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:57 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Additional mitigations for T257687 (duration: 01m 03s)
  • 16:53 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:48 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:35 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:31 effie: reboot mw1277 mw1278 && mw1261 mw1262
  • 16:29 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 16:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 16:04 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I3726a6364d, T257079 (duration: 01m 02s)
  • 15:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:52 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:48 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:48 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:37 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:32 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:26 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 15:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:12 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install linux-headers-4.9.0-12-amd64
  • 15:10 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install python3-netaddr ieee-data
  • 15:09 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo dpkg -i bpfcc-tools_0.12.0-2_all.deb libbpfcc_0.12.0-2_amd64.deb python3-bpfcc_0.12.0-2_all.deb
  • 15:08 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:54 cdanis: again un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 14:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 14:44 cdanis: temporarily re-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 14:35 cdanis: un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
  • 14:32 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:31 cdanis: temporarily kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:02 kormat: uploaded wmfmariadbpy 0.3 to apt
  • 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:42 effie: restart mw1383 & mw1386
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.4 (duration: 01m 16s)
  • 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.4
  • 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 13:19 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:15 cdanis: ✔️ cdanis@mw1357.eqiad.wmnet ~ 🕘☕ sudo sysctl -w vm/compact_memory=1
  • 13:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 13:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 13:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:59 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:50 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:33 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
  • 12:27 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 12:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 12:15 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:51 ema: pool mw1363 after reboot
  • 11:49 jynus: creating artificial low replication lag on db2130 to test icinga alerts T253120
  • 11:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:37 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:28 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:21 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:13 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 11:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:08 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:07 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 11:00 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
  • 11:00 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:55 _joe_: rebooting mw1361
  • 10:51 jayme: rebooting mw1356
  • 10:49 _joe_: rebooting mw1378
  • 09:45 _joe_: repooling mw1377
  • 09:40 _joe_: rebooting mw1377
  • 09:22 _joe_: depool mw1357 too
  • 09:14 _joe_: depooling mw1377 for inspection
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1110', diff saved to https://phabricator.wikimedia.org/P12220 and previous config saved to /var/cache/conftool/dbconfig/20200812-091211-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12219 and previous config saved to /var/cache/conftool/dbconfig/20200812-090831-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12218 and previous config saved to /var/cache/conftool/dbconfig/20200812-085021-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12217 and previous config saved to /var/cache/conftool/dbconfig/20200812-083548-marostegui.json
  • 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage', diff saved to https://phabricator.wikimedia.org/P12215 and previous config saved to /var/cache/conftool/dbconfig/20200812-073130-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for MCR change', diff saved to https://phabricator.wikimedia.org/P12214 and previous config saved to /var/cache/conftool/dbconfig/20200812-045157-marostegui.json

2020-08-11

  • 23:41 Urbanecm: Evening B&C window completed
  • 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0f238f7: Update wgMFRemovableClasses (T231160) (duration: 01m 03s)
  • 23:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/MobileFrontend/extension.json: c22d65f: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 03s)
  • 23:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/MobileFrontend/extension.json: 81d54b0: Hide vertical nav-boxes on mobile domain (T231160) (duration: 01m 05s)
  • 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: 28faa27: Switching to updated license definition (duration: 01m 04s)
  • 21:52 krinkle@deploy1001: Synchronized php-1.36.0-wmf.3/includes/skins/SkinMustache.php: Ibe1f07346, T259872, T259858 (duration: 01m 04s)
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 19:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add streams for eventgate-main - T251935 (duration: 01m 04s)
  • 19:21 ejegg: updated payments-wiki from f199c071c3 to 77ff5d70fc
  • 18:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:48 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant investigate right to checkuser group on frwiki (T260171) (duration: 01m 04s)
  • 18:18 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Beta-only: Configured additional settings for API Portal beta wiki gerrit:619339 (duration: 01m 03s)
  • 18:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Direct GrowthExperiments help panel questions to mentors on cswiki (T250235) (duration: 01m 03s)
  • 17:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Remove extraneous mediawiki.api-request stream - T251935 (duration: 01m 01s)
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 17:38 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:53 hashar@deploy1001: Synchronized php-1.36.0-wmf.4/skins/MinervaNeue/: Revert "ServiceWiring: Avoid usage of deprecated Title::getSubjectPage()" - T260155 (duration: 01m 06s)
  • 16:12 herron: migrating lists.wikimedia.org services from fermium to lists1001 T224586
  • 15:36 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.4
  • 15:27 hashar@deploy1001: Finished scap: (no justification provided) (duration: 30m 51s)
  • 14:59 marostegui: Deploy MCR change on db1116:3318
  • 14:56 hashar@deploy1001: Started scap: (no justification provided)
  • 14:56 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.2 (duration: 04m 15s)
  • 14:55 jayme: updated helmfile to 0.125.2-1 on contint* and deploy*
  • 14:52 otto@deploy1001: Finished deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935 (duration: 01m 14s)
  • 14:51 otto@deploy1001: Started deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - T251935
  • 14:50 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.1 (duration: 02m 07s)
  • 14:48 jayme: imported helmfile_0.125.2-1 to buster-wikimedia, jessie-wikimedia, stretch-wikimedia
  • 14:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.41 (duration: 04m 20s)
  • 14:40 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.40 (duration: 10m 24s)
  • 14:37 papaul: replacing msw-b5,b6,b7 and b8
  • 14:30 hashar: Cleaning old MediaWiki versions that were never removed
  • 14:27 hashar@deploy1001: sync aborted: testwikis wikis to 1.36.0-wmf.4 (duration: 72m 36s)
  • 14:10 hashar: mw1319: scap pull
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 13:14 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.4
  • 13:12 hashar: Applied 1.36.0-wmf.4 security patches # T257972
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:52 kormat: uploaded wmfmariadbpy 0.2 packages to apt1001
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 11:54 marostegui: Install new MariaDB 10.4.14 on db2102
  • 11:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 11:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 11:18 Urbanecm: EU B&C window done
  • 11:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 619255|Enable ContentTranslation in Sundanese WP as a default tool (T258502) (duration: 00m 59s)
  • 10:39 volans: migrating *all* eqiad mgmt DNS records to the autogenerated ones via Netbox - T233183
  • 10:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0)
  • 10:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh
  • 10:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
  • 09:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
  • 09:29 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:25 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 marostegui: Rename tables on muswiki and mhwiktionary on s3 master (db1123) without replication T260112
  • 09:01 volans: renewed puppet certificate on scb1001.eqiad.wmnet
  • 08:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: e6ec237: Revert "Turn muswiki and mhwiktionary to read-only" (T259004) (duration: 00m 58s)
  • 08:45 urbanecm@deploy1001: Synchronized dblists/: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 3/3) (duration: 00m 58s)
  • 08:44 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 2/3) (duration: 00m 58s)
  • 08:43 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: 81f4594: Point muswiki and mhwiktionary to s5 (T259004; 1/3) (duration: 01m 02s)
  • 08:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a04bc1f: Turn muswiki and mhwiktionary to read-only (T259004) (duration: 01m 01s)
  • 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:54 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 06:45 XioNoX: Re-prioritize peering over transit eqiad/esams - T259614
  • 01:59 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: enabling fast stale mode T250248 (duration: 00m 58s)
  • 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167 (duration: 01m 03s)
  • 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167
  • 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins (T247652)
  • 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance (T247652)

2020-08-10

  • 23:56 eileen: tools revision changed from 22550f38c5 to 9a89f45974
  • 23:53 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced. httpbb tests have been created and pass. (T247652)
  • 23:52 mutante: https://releases.wikimedia.org switched to new backends running Debian buster. files have been synced of course.
  • 20:13 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/c/integration/config/+/619359/
  • 20:10 ejegg: updated payments-wiki from 932aacde54 to f199c071c3
  • 18:32 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@3e12dbb]: 0.3.44 (duration: 15m 18s)
  • 18:20 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:17 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 ryankemper@deploy1001: Started deploy [wdqs/wdqs@3e12dbb]: 0.3.44
  • 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on frwiki (T257891) (duration: 00m 58s)
  • 18:07 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Explicitly disable nativeGallery in Parsoid settings (no-op) (duration: 00m 58s)
  • 18:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump the weight of near match for search (T257922) (duration: 00m 59s)
  • 17:56 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:49 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-analytics streams - T251935 (duration: 01m 02s)
  • 17:46 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:38 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 17:34 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 17:31 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:12 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 volans@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:59 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:01 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 14:15 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 14:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:55 XioNoX: Re-prioritize peering over transit - codfw - T259614
  • 12:34 XioNoX: Re-prioritize peering over transit - eqsin - T259614
  • 12:07 XioNoX: standardize cr1-eqiad interfaces
  • 11:56 Urbanecm: EU B&C window done
  • 11:55 Urbanecm: Run `mwscript namespaceDupes.php --wiki=tiwiki --fix` at mwmaint1002 (T259295)
  • 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 14b2897: Define Portal namespace for tiwiki (T259295) (duration: 00m 59s)
  • 11:49 urbanecm@deploy1001: Synchronized static/images/project-logos/: bbbf701: Regenerate Bengali Wikipedia logo from source SVG (T259292) (duration: 00m 59s)
  • 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 0d8366f: Search Work NS by default at bnwikisource (T258982) (duration: 00m 59s)
  • 11:37 Urbanecm: Run `mwscript namespaceDupes.php --wiki=hywiki --fix` at mwmaint1002 (T259987)
  • 11:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 1771487: add two extra namespaces for hywiki (T259987) (duration: 00m 59s)
  • 11:28 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/shnwiktionary*.png with purgeList.php (T260010)
  • 11:27 XioNoX: standardize cr2-eqiad interfaces
  • 11:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: c5c96ca: Regenerate shnwiktionary logo from source svg (T260010) (duration: 00m 58s)
  • 11:21 XioNoX: repool ulsfo
  • 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: a15e3a2: Increase autoconfirmed threshold for Chinese Wikinews to 7 days and 20 edits at least (T259869) (duration: 00m 58s)
  • 11:13 XioNoX: Re-prioritize peering over transit - ulsfo - T259614
  • 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: ba0b2ab: Create TemplateEditor group on zhwiki (T260012) (duration: 00m 58s)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix --add-prefix=T259959 (T259959)
  • 11:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=ptwikinews --fix (T259959)
  • 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 58s)
  • 11:06 urbanecm@deploy1001: sync-file aborted: 010f63e: Add WN as an alias to project namespace in Portuguese Wikinews (T259959) (duration: 00m 00s)
  • 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 01s)
  • 10:42 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:37 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:36 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.pool (exit_code=99)
  • 10:36 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:33 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 10:29 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 10:23 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.pool (exit_code=0)
  • 10:18 jayme@cumin1001: START - Cookbook sre.discovery.pool
  • 10:14 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:10 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
  • 10:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
  • 09:56 hashar: Updated container for Jenkins job operations-dns-lint-docker https://gerrit.wikimedia.org/r/619267
  • 09:55 hashar: Updated container for Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/619266
  • 09:54 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.depool (exit_code=0)
  • 09:49 jayme@cumin1001: START - Cookbook sre.discovery.depool
  • 09:21 marostegui: Promote dbproxy1019 back T255408
  • 08:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:43 marostegui: Remove revision triggers from db2094:3318 T238966
  • 06:42 marostegui: Stop replication on s8 codfw master to deploy MCR change, this will generate lag on s8 codfw T238966
  • 04:46 marostegui: Depool dbproxy1019 for reimage T255408

2020-08-09

  • 21:58 ejegg: updated payments-wiki from cd012f37f1 to 932aacde54
  • 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)

2020-08-08

  • 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
  • 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
  • 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload

2020-08-07

  • 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: T259855 Revert new reply API (duration: 01m 06s)
  • 15:01 volans: import DNS names for network devices in Netbox - T258729
  • 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
  • 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
  • 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
  • 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
  • 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json

2020-08-06

  • 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question (T232410) (duration: 00m 59s)
  • 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki (T253291) (duration: 00m 59s)
  • 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: Update git submodules (vendor) (T259832) (duration: 01m 08s)
  • 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 20:47 shdubsh: restart logstash -- pipeline appears stuck
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:19 brennen: manually updating the vendor submodule on 1.36.0 for T259832
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - T251935 (duration: 00m 58s)
  • 19:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - T251935 (duration: 00m 59s)
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 19:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
  • 18:58 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 18:57 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:21 Urbanecm: Morning B&C window was completed
  • 18:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: fb4a808: Fix "Ask mentor" help panel button styling (T250235) (duration: 01m 07s)
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9db9659: Remove temporary logging for mediamoderation (T259742) (duration: 01m 07s)
  • 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 9695811: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") (T259574) (duration: 01m 06s)
  • 17:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
  • 17:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 17:37 brennen: train 1.36.0-wmf.3: proceeding to group1
  • 17:36 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: Fix array unpacking as argument list (T259745) (duration: 01m 07s)
  • 16:32 chrisalbon@deploy1001: Finished deploy [ores/deploy@f3c44be]: T258435 (duration: 14m 12s)
  • 16:18 dpifke@deploy1001: Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167 (duration: 00m 05s)
  • 16:18 dpifke@deploy1001: Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for T259167
  • 16:18 chrisalbon@deploy1001: Started deploy [ores/deploy@f3c44be]: T258435
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 15:10 fdans@deploy1001: Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3] (duration: 20m 01s)
  • 14:50 fdans@deploy1001: Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3]
  • 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - T251935 (duration: 01m 08s)
  • 13:32 jayme: updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
  • 13:24 jayme: imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:06 kart_: Updated cxserver to 2020-08-05-070016-production (T258919, T199523, T257943, T256194)
  • 12:03 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:59 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 11:57 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:54 Lucas_WMDE: EU backport window done
  • 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 09s)
  • 11:53 XioNoX: reboot cr2-eqord - T259621
  • 11:37 XioNoX: drain traffic away cr2-eqord - T259621
  • 11:27 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744) (duration: 01m 10s)
  • 11:22 XioNoX: reboot cr2-eqdfw - T259621
  • 11:13 XioNoX: drain traffic away cr2-eqdfw - T259621
  • 10:52 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:48 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:45 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 10:23 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:16 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:14 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:12 jynus@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:11 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
  • 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:57 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:57 marostegui: Truncate tables on zerowiki T227717
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:47 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:43 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 06:37 elukey: roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
  • 06:36 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
  • 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
  • 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
  • 03:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
  • 02:24 eileen: process-control config revision is 525eb71235 turn off delete deleted contacts
  • 01:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 01:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 01:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
  • 00:35 mutante: wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
  • 00:00 mutante: LDAP - removed demon from nda group

2020-08-05

  • 23:57 eileen: civicrm revision changed from 150c3476c4 to 72452e28a9, config revision is b6ece03513
  • 23:02 shdubsh: logstash in codfw looks stuck -- restarting
  • 19:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
  • 19:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 19:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:13 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
  • 19:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
  • 18:26 Lucas_WMDE: Morning backport window done
  • 18:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: Pass jQuery objects into jqueryMsg (duration: 01m 11s)
  • 18:14 mutante: test !log
  • 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable growth study quick survey (T257015) (duration: 01m 12s)
  • 17:30 shdubsh: test prometheus-icinga-exporter upgrade on icinga2001
  • 16:50 elukey: powercycle stat1005 after GPU issue
  • 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - T251935 (duration: 01m 05s)
  • 15:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:11 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 godog: bounce logstash on logstash100[789] - udp loss reported
  • 15:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 14:48 elukey: reboot stat1008 for unexpected maintenance (GPU stuck)
  • 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:32 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:25 moritzm: installing nmap bugfix updates from buster point release
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:14 moritzm: installing pillow security updates
  • 14:03 moritzm: installing node-minimist security updates
  • 13:51 moritzm: installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
  • 13:32 jayme: updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
  • 13:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:24 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 elukey: restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
  • 13:00 moritzm: installing libjpeg-turbo security updates on stretch
  • 12:52 XioNoX: netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
  • 12:49 jayme: imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
  • 12:46 moritzm: installing imagemagick security updates on buster
  • 12:33 moritzm: installing net-snmp security updates on icinga hosts
  • 11:36 awight: EU Bacon reclosed
  • 11:36 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Switch test wikis to new version of vector by default (3/3) (T254227) (duration: 01m 07s)
  • 11:29 awight: EU Bacon reopened
  • 11:28 awight: EU Bacon complete
  • 11:26 awight@deploy1001: Synchronized wmf-config: Config: FileImporter: full default deployment (T232542) (duration: 01m 04s)
  • 11:23 jayme: imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
  • 11:22 jayme: imported helm-diff_3.1.2-0 to buster-wikimedia
  • 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Add import sources for lijwikisource (T259633) (duration: 01m 07s)
  • 11:13 awight@deploy1001: sync-file aborted: Config: Add import sources for lijwikisource (T259633) (duration: 00m 13s)
  • 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Data Bridge on Test Wikidata clients (T232584) (duration: 01m 20s)
  • 10:39 XioNoX: reboot cr3-ulsfo - T259621
  • 10:28 XioNoX: drain traffic away cr3-ulsfo - T259621
  • 10:21 moritzm: installing libssh security updates
  • 10:18 XioNoX: reboot cr4-ulsfo - T259621
  • 09:58 XioNoX: drain traffic away cr4-ulsfo
  • 09:53 XioNoX: depool ulsfo - T259621
  • 09:32 elukey: set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
  • 09:07 jayme: imported helmfile_0.125.2-0 to jessie-wikimedia
  • 09:07 jayme: imported helmfile_0.125.2-0 to stretch-wikimedia
  • 09:05 jayme: imported helmfile_0.125.2-0 to buster-wikimedia
  • 08:39 marostegui: Remove revision triggers on db1125:3317
  • 08:39 marostegui: Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
  • 07:49 marostegui: Stop mysql on db1117:3323 (this will generate haproxy irc alerts) T259589
  • 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:26 moritzm: installing perl security updates on buster
  • 07:20 moritzm: installing libexif security updates on buster
  • 07:14 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:13 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 05:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
  • 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json

2020-08-04

  • 22:41 brennen: restarting php7.2-fpm on mw1404 for opcache issues
  • 21:45 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
  • 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 20:52 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 20:27 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch (duration: 02m 22s)
  • 20:25 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch
  • 20:15 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances (duration: 02m 07s)
  • 20:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances
  • 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.3
  • 19:11 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.3 (duration: 91m 03s)
  • 19:03 brennen: current 1.36.0-wmf.3 train status (T257971): mid scap-cdb-rebuild for testwiki sync; will proceed with group0 when finished.
  • 18:55 sukhe: upload pdns-recursor_4.3.3-1~deb10u1 to apt.wm.o (buster) - T252132
  • 18:49 mutante: letting puppet install envoy on all ores1* hosts
  • 18:46 mutante: letting puppet install envoy on all ores2* hosts
  • 18:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:19 mutante: temp disabling puppet on all ores hosts to add envoy
  • 17:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:40 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.3
  • 17:36 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:05 brennen: 1.36.0-wmf.3 was branched at 2d0cf09cdf for T257971
  • 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:24 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Set default topic_prefixes - T255888 (duration: 00m 58s)
  • 15:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 15:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:18 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove now unused wgEventServiceStreamConfig - T229863 (duration: 00m 58s)
  • 15:18 moritzm: installing jackson-databind security updates
  • 15:08 moritzm: installing qemu security updates on cloudvirt* Stretch hosts
  • 14:54 cmjohnson1: swapping kubernetes1010 network cable T257542
  • 14:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:41 cmjohnson1: powercycling analytics1050 T258370
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for MCR', diff saved to https://phabricator.wikimedia.org/P12161 and previous config saved to /var/cache/conftool/dbconfig/20200804-143524-marostegui.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12160 and previous config saved to /var/cache/conftool/dbconfig/20200804-142710-marostegui.json
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12159 and previous config saved to /var/cache/conftool/dbconfig/20200804-142220-marostegui.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12158 and previous config saved to /var/cache/conftool/dbconfig/20200804-141556-marostegui.json
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12157 and previous config saved to /var/cache/conftool/dbconfig/20200804-141004-marostegui.json
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
  • 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 13:51 hashar: Install newer openjdk on contint2001 and restarting CI Jenkins
  • 12:00 jayme: helm was updated: 2.16.7-2 -> 2.16.9-1 on chartmuseum*, contint*, deploy*
  • 11:43 Lucas_WMDE: EU backport window done
  • 11:41 marostegui: Deploy schema change on s3 codfw master, lag might show up on codfw s3 T259238
  • 11:37 moritzm: installing openjdk-11 security updates
  • 11:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: Load WikibaseRepo using extension registration in production (T257433) (duration: 00m 58s)
  • 11:12 Lucas_WMDE: Deployed patch for T86738 / T259565
  • 11:03 moritzm: installing e2fsprogs security updates for stretch
  • 10:47 moritzm: installing tomcat8 security updates
  • 10:47 vgutierrez: upgrade acme-chief to version 0.28
  • 10:33 vgutierrez: upload acme-chief 0.28 to apt.wm.o (buster) - T259338
  • 10:18 moritzm: installing imagemagick security updates on stretch
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for MCR and PK change T259524', diff saved to https://phabricator.wikimedia.org/P12156 and previous config saved to /var/cache/conftool/dbconfig/20200804-100035-marostegui.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12155 and previous config saved to /var/cache/conftool/dbconfig/20200804-095608-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12154 and previous config saved to /var/cache/conftool/dbconfig/20200804-094909-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 moritzm: installing python3.5 security updates
  • 08:15 moritzm: installing remaining cups security updates
  • 08:13 XioNoX: cleaning up a bunch of prefix limit reached issues
  • 08:00 marostegui: Failover m2 from db1132 to db1107 - T257540
  • 07:54 moritzm: installing poppler security updates on stretch
  • 07:43 jayme: imported helm_2.16.9-1 to jessie-wikimedia
  • 07:43 jayme: imported helm_2.16.9-1 to stretch-wikimedia
  • 07:38 jayme: imported helm_2.16.9-1 to buster-wikimedia
  • 07:34 elukey: upgrade druid analytics (backend for Turnilo/Superset/etc..) to 0.19
  • 07:32 XioNoX: remove nonstop-bridging from fasw-c-eqiad switches - T191667
  • 07:29 XioNoX: remove nonstop-bridging from eqiad asw2 switches - T191667
  • 07:28 XioNoX: remove nonstop-bridging from asw2-esams - T191667
  • 07:27 marostegui: Start topology changes on m2 - T257540
  • 07:25 moritzm: installing rails security updates
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P12153 and previous config saved to /var/cache/conftool/dbconfig/20200804-064223-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12152 and previous config saved to /var/cache/conftool/dbconfig/20200804-063026-marostegui.json
  • 06:27 _joe_: restarting docker daemon on kubestage1002, seems like a case of https://github.com/moby/moby/issues/29635
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore original weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12151 and previous config saved to /var/cache/conftool/dbconfig/20200804-062358-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12150 and previous config saved to /var/cache/conftool/dbconfig/20200804-062256-marostegui.json
  • 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 06:13 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling lilypond execution in safe mode 3rd attempt (duration: 00m 58s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12149 and previous config saved to /var/cache/conftool/dbconfig/20200804-061255-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12148 and previous config saved to /var/cache/conftool/dbconfig/20200804-061209-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for MCR', diff saved to https://phabricator.wikimedia.org/P12147 and previous config saved to /var/cache/conftool/dbconfig/20200804-061003-marostegui.json
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for reimage', diff saved to https://phabricator.wikimedia.org/P12146 and previous config saved to /var/cache/conftool/dbconfig/20200804-051843-marostegui.json
  • 05:04 marostegui: Reboot db1107 to pick up the latest kernel
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12145 and previous config saved to /var/cache/conftool/dbconfig/20200804-050150-marostegui.json
  • 03:56 legoktm: added Arlo to wmf-deployment Gerrit group
  • 03:53 legoktm: added subbu to wmf-deployment Gerrit group

2020-08-03

  • 23:43 mutante: mwdebug1001 - temp installing apt-file for debugging an issue on mwmaint
  • 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on fawiki (T253291) (duration: 00m 59s)
  • 21:35 sbassett: Deployed mitigations for T115888
  • 21:14 sbassett@deploy1001: Synchronized php-1.36.0-wmf.2/resources/src/mediawiki.jqueryMsg/mediawiki.jqueryMsg.js: (no justification provided) (duration: 01m 00s)
  • 18:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 18:09 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update (duration: 15m 53s)
  • 17:53 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update
  • 17:33 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 17:28 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: (no justification provided) (duration: 00m 35s)
  • 17:28 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: (no justification provided)
  • 16:58 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.36.0-wmf.1"
  • 16:21 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
  • 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
  • 15:55 _joe_: regenerating the TLS certs for blubberoid
  • 15:33 XioNoX: standardize all routers routing-options config
  • 15:27 marostegui: Change PK on frwiktionary.revision on db2087:3317, db2129, db2121, db2086:3317 T259524
  • 15:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P12143 and previous config saved to /var/cache/conftool/dbconfig/20200803-145111-marostegui.json
  • 14:40 moritzm: update Buster netboot images to Buster 10.5 T259519
  • 14:33 XioNoX: disable all ALGs from pfw3-codfw
  • 14:28 XioNoX: remove IGMP and PIM from pfw3-codfw security zones
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into dump and depool db1106', diff saved to https://phabricator.wikimedia.org/P12142 and previous config saved to /var/cache/conftool/dbconfig/20200803-142749-marostegui.json
  • 14:27 XioNoX: remove nonstop-bridging from fasw-c-codfw - T191667
  • 14:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:04 filippo@deploy1001: Finished deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017 (duration: 00m 23s)
  • 14:03 filippo@deploy1001: Started deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - T257017
  • 14:00 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'enable-puppet "cdanis deploying I92e9a05"'
  • 13:56 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'disable-puppet "cdanis deploying I92e9a05"'
  • 13:27 moritzm: installing libopenmpt security updates
  • 13:15 XioNoX: remove nonstop-bridging from asw-d-codfw - T191667
  • 13:14 XioNoX: remove nonstop-bridging from asw-c-codfw - T191667
  • 13:12 XioNoX: remove nonstop-bridging from asw-b-codfw - T191667
  • 13:11 XioNoX: remove nonstop-bridging from asw-a-codfw - T191667
  • 13:05 moritzm: installing json-c security updates
  • 12:53 XioNoX: move VRRP master to cr3-eqsin
  • 12:32 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
  • 12:26 moritzm: installing apache-log4j1.2 security updates
  • 12:20 moritzm: restarting nginx on francium to pick up luajit update
  • 12:13 kormat: disabling puppet on cumin hosts T259021
  • 11:55 moritzm: installing luajit security updates
  • 11:20 moritzm: installing ruby-rack security updates
  • 11:19 Urbanecm: EU B&C done
  • 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 346138d: Add extra namespaces for yuewiktionary (T258913) (duration: 01m 06s)
  • 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 8c2a2b2: Add gpophotoeng.gov.il to the wgCopyUploadsDomains allowlist for commonswiki (T258857) (duration: 01m 07s)
  • 11:03 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: ead6b9e: New throttle rule for Czech editathon (T259352) (duration: 01m 06s)
  • 11:03 moritzm: installing ruby2.5 security updates
  • 11:01 moritzm: removing cloudcephmon100[1-3].wikimedia.org from debmonitor (these eventually got re-installed as cloudcephmon100[1-3].eqiad.wmnet)
  • 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 06s)
  • 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:29 moritzm: installing NSS security updates on buster
  • 10:26 moritzm: restarting Apache on puppetboard to pick up curl security updates
  • 10:19 moritzm: restarting wtp1025 (parsoid canary) to pick up curl security updates
  • 09:46 moritzm: restarting mw1261-mw1265 to pick up curl security updates
  • 09:42 moritzm: installing curl security updates on stretch
  • 08:59 moritzm: installing ffmpeg security updates on jobrunners/video scalers (3.2.15 rebuilt with VP9/row-mt patches)
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12141 and previous config saved to /var/cache/conftool/dbconfig/20200803-082641-marostegui.json
  • 08:25 moritzm: installing qemu security updates on stretch
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12140 and previous config saved to /var/cache/conftool/dbconfig/20200803-082533-marostegui.json
  • 08:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify s5 wikis T259437 (duration: 01m 05s)
  • 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify s5 wikis T259437 (duration: 01m 40s)
  • 08:07 elukey: roll restart aqs on aqs* to pick up new druid settings
  • 07:10 marostegui: Remove revision triggers from db2095:3317 for MCR changes T238966
  • 07:09 marostegui: Deploy MCR change on s7 codfw, lag will appear on codfw T238966
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12139 and previous config saved to /var/cache/conftool/dbconfig/20200803-070702-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12138 and previous config saved to /var/cache/conftool/dbconfig/20200803-052715-marostegui.json
  • 05:04 marostegui: Remove db1108:3321 and db1108:3322 from tendril and add db1108:3351 and db1108:3352 T254462
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12137 and previous config saved to /var/cache/conftool/dbconfig/20200803-050148-marostegui.json

2020-08-01

  • 16:30 Amir1: wikiadmin@10.64.32.197(avkwiki)> delete from site_identifiers; (T259122)
  • 16:27 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T259122)

Archives

See Server Admin Log/Archives.