Difference between revisions of "Server Admin Log"


Revision as of 00:40, 16 January 2020

2020-01-16

  • 00:40 bstorm_: set max_connections on db1133 (m5-master) back to 500 since the neutron connections seem fairly stable now T242817
  • 00:23 catrope@deploy1001: Synchronized static/images/project-logos/: Restore pre-censorship trwiki logos (T242932) (duration: 01m 05s)
  • 00:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable topics for suggested edits on testwiki (duration: 01m 04s)

2020-01-15

  • 22:40 mutante: phabricator - disabling 'bzimport' user (T242860)
  • 21:03 jforrester@deploy1001: Synchronized php-1.35.0-wmf.14/languages/messages/MessagesMrj.php: Fix fallbacks of mrj (Hill Mari) T242409 T242796 (duration: 01m 05s)
  • 20:47 mutante: gerrit - adding Zoranzoki to members of extension-GoogleAdSense (endorsed by extension owner Siebrand) (T241509)
  • 20:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touched IS.php for sync (duration: 01m 05s)
  • 20:27 jforrester@deploy1001: sync-file aborted: Enable partial blocks on last wiki, (duration: 00m 01s)
  • 20:17 krinkle@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/MultimediaViewer/resources/: T229484 (duration: 01m 06s)
  • 19:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on last wiki, Commons T242570 (duration: 01m 03s)
  • 19:54 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable banner for wikis that recently opted in to partial blocks T240300 T242570 T242569 (duration: 01m 05s)
  • 18:10 anomie@deploy1001: Synchronized wmf-config/CommonSettings.php: Set OAuth 2 access token expiry to "infinity" (duration: 01m 04s)
  • 17:50 anomie@deploy1001: Synchronized private/PrivateSettings.php: Setting RSA keys for OAuth 2.0 (T242872) (duration: 01m 05s)
  • 16:27 elukey: import key 0xDBBF9D42B7B4BD70 (Apache BigTop) manually on install1002's gpg
  • 15:55 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/WikibaseQualityConstraints/extension.json: Fix service injection for special page (T242846) (duration: 01m 08s)
  • 15:40 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/Wikibase/client/includes/Api/PageTerms.php: Fix invalid iteration over false in PageTerms (T242856) (duration: 01m 06s)
  • 15:37 vgutierrez: rolling restart of ats-tls instances - T196558 T242778
  • 15:28 ema: cp3064: ats-tls-restart to apply https://gerrit.wikimedia.org/r/559711 T196558
  • 15:20 moritzm: installing OpenSSL security updates on db* hosts
  • 15:02 moritzm: installing OpenSSL security updates on mw*
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1252.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1251.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1250.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1249.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1248.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1247.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1246.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1245.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1244.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1243.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1242.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1241.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1240.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1239.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1238.eqiad.wmnet
  • 14:54 effie: lower weights on slower servers mw1238-mw1252
  • 14:53 effie: pool mw1238, mw1240, mw1246
  • 14:44 XioNoX: reject RPKI invalids in dfw - T220669
  • 14:30 moritzm: rolling restart of FPM on mw1261-mw1265 to pick up OpenSSL security update
  • 14:25 XioNoX: reject RPKI invalids in ams - T220669
  • 14:18 godog: reenable puppet on cp hosts, after https://gerrit.wikimedia.org/r/c/operations/puppet/+/563430 deployment
  • 14:08 effie: depool mw1238, mw1240, mw1246
  • 14:06 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.15 (duration: 01m 07s)
  • 14:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.15
  • 13:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:56 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:53 akosiaris: update calico policy on eqiad/codfw/staging. Add new urldownloaders. T224551
  • 13:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:02 _joe_: restarting gerrit
  • 12:50 XioNoX: reject RPKI invalids in eqsin - T220669
  • 12:38 vgutierrez: Pooling ulsfo for ncredir service - T242321
  • 12:27 awight: EU SWAT done
  • 12:24 awight@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/Cite: SWAT: Don't fail with a LogicException during section preview (T242434) (duration: 01m 10s)
  • 12:22 vgutierrez: upgrading ats on cp4026, cp4032, cp5006 and cp5012 - T242778 T242620
  • 12:06 XioNoX: reject RPKI invalids in ulsfo - T220669
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P10161 and previous config saved to /var/cache/conftool/dbconfig/20200115-115826-marostegui.json
  • 11:36 elukey: restart all varnishkafka daemons on cp4031
  • 11:09 legoktm: added SonarQubeBot to "Non-Interactive Users" group on Gerrit
  • 10:38 moritzm: installing openssl1.0 updates on stretch (update to 1.0.2u)
  • 10:08 ema: cache: rolling varnish-frontend-restart to add CAP_KILL to varnish-frontend.service T242411
  • 09:56 vgutierrez: repooling cp5012
  • 09:46 vgutierrez: depooling cp5012 for some ats parent select tests
  • 09:42 XioNoX: enable ping offload in esams - T190090
  • 09:32 marostegui: Deploy schema change on x1 eqiad hosts T242749
  • 09:19 elukey: roll-restart druid brokers on druid100[4-6] - locked up after segments deletion
  • 09:11 marostegui: Deploy schema change on x1 codfw - T242749
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10160 and previous config saved to /var/cache/conftool/dbconfig/20200115-085145-marostegui.json
  • 08:44 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 08:40 godog: roll restart ores in codfw/eqiad to apply logging pipeline changes
  • 08:40 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:40 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 08:40 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:23 godog: roll restart ores in codfw/eqiad to apply logging pipeline changes
  • 08:13 godog: testing ores logging to pipeline on ores2001
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10159 and previous config saved to /var/cache/conftool/dbconfig/20200115-070201-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10158 and previous config saved to /var/cache/conftool/dbconfig/20200115-065353-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10157 and previous config saved to /var/cache/conftool/dbconfig/20200115-065305-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10156 and previous config saved to /var/cache/conftool/dbconfig/20200115-064606-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10155 and previous config saved to /var/cache/conftool/dbconfig/20200115-064535-marostegui.json
  • 06:25 marostegui: Upgrade db1098:3316 and db1098:3317
  • 06:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Make testcommonswiki behavior consistent with commonswiki (duration: 01m 16s)
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 db1098:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10152 and previous config saved to /var/cache/conftool/dbconfig/20200115-062028-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10151 and previous config saved to /var/cache/conftool/dbconfig/20200115-061859-marostegui.json
  • 06:16 marostegui: Remove revision partitions from db2088:3311 - T239453
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json
  • 06:00 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s)
  • 05:54 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae
  • 01:32 mutante: lvs1015 powercycling, crashed, nothing on console, lots of unknowns in icinga
  • 01:17 mutante: dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. Did the check described on https://wikitech.wikimedia.org/wiki/HAProxy#Failover and it claimed on both that db1133 was DOWN, but checking db1133 itself showed it was up and working normally. In that case the docs said to 'systemctl reload haproxy'. Done on both and things recovered
  • 01:13 mutante: dbproxy1017 - systemctl reload haproxy
  • 00:22 bstorm_: restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue
  • 00:12 bstorm_: set max_connections to 600 temporarily while troubleshooting on m5 (db1133)
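
The entries above appear to share one shape: a UTC-style "HH:MM" time, an actor (either a bare nick or "user@host"), a colon, and a free-text message that may mention Phabricator task IDs such as T242817. As a minimal sketch, assuming that shape holds for every entry, the following Python splits one line into those parts. The names `Entry`, `parse_entry`, `ENTRY_RE` and `TASK_RE` are hypothetical and not part of any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape, inferred from the log lines above:
#   [bullet] HH:MM actor: message
# where actor is either "nick" or "user@host".
ENTRY_RE = re.compile(
    r"^\s*(?:[•*]\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^\s:]+): (?P<message>.*)$"
)
# Phabricator task references look like "T" followed by digits.
TASK_RE = re.compile(r"\bT\d+\b")


class Entry(NamedTuple):
    time: str
    user: str
    host: Optional[str]
    message: str
    tasks: list


def parse_entry(line: str) -> Optional[Entry]:
    """Split one log line into its parts; return None if it does not match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    # "user@host" becomes ("user", "host"); a bare nick has no host.
    user, _, host = m.group("actor").partition("@")
    message = m.group("message").strip()
    return Entry(m.group("time"), user, host or None, message,
                 TASK_RE.findall(message))


if __name__ == "__main__":
    sample = "  • 12:38 vgutierrez: Pooling ulsfo for ncredir service - T242321"
    print(parse_entry(sample))
```

Date headings such as "2020-01-15" do not match the pattern, so `parse_entry` returns None for them and a caller can use that to track the current day.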

2020-01-14

  • 20:11 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s)
  • 20:07 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s)
  • 18:11 vgutierrez: repooling cp5012
  • 18:06 vgutierrez: depool cp5012 for some ats parent select debugging
  • 17:43 vgutierrez: repooling cp4027
  • 17:39 vgutierrez: depooling cp4027 for some ats-tls parent balancing tests
  • 17:21 _joe_: upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604
  • 16:53 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15
  • 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 liw: branch is cut for 1.35.0-wmf.15; train window is closed, but I'll continue train since the next time slot seems to not have anything
  • 16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 marostegui: Enable puppet back on install1002 and install2002 - T242481
  • 16:31 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2) (duration: 43m 29s)
  • 16:26 marostegui: Disable temporarily puppet on install1002 and install2002 - T242481
  • 16:08 volans@deploy1001: Finished deploy [debmonitor/deploy@e72911c]: Release v0.2.4 (duration: 01m 09s)
  • 16:07 volans@deploy1001: Started deploy [debmonitor/deploy@e72911c]: Release v0.2.4
  • 15:47 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2)
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 marostegui: Copy data from db1080 to db1107 T242702
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for tranfer', diff saved to https://phabricator.wikimedia.org/P10144 and previous config saved to /var/cache/conftool/dbconfig/20200114-150223-marostegui.json
  • 15:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_44869219" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 55s)
  • 14:47 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10143 and previous config saved to /var/cache/conftool/dbconfig/20200114-144341-marostegui.json
  • 14:26 marostegui: Move db1114 under db1080
  • 14:24 marostegui: Stop db1080 and db1107 replication in sync
  • 14:21 XioNoX: push firewall policies to pfw3-eqiad - T242681
  • 14:15 XioNoX: push firewall policies to pfw3-codfw - T242681
  • 14:12 liw: branch cut for 1.35.0-wmf.15
  • 14:09 vgutierrez: upgrade ats to 8.0.5-1wm12 in cp5006 and cp5012 - T242620
  • 14:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 marostegui: Upgrade db1080
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for upgrade', diff saved to https://phabricator.wikimedia.org/P10142 and previous config saved to /var/cache/conftool/dbconfig/20200114-135238-marostegui.json
  • 12:16 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir3002.esams.wmnet
  • 12:16 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir3001.esams.wmnet
  • 12:14 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
  • 12:14 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:51 vgutierrez: restarting pybal on lvs4005 (high-traffic1 LVS) - T242321
  • 11:49 vgutierrez: restarting pybal on lvs4007 (secondary LVS) - T242321
  • 11:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
  • 11:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
  • 11:15 vgutierrez: Updating puppet-compiler facts
  • 10:40 vgutierrez: upgrade ats to 8.0.5-1wm12 in cp4026 and cp4032 - T242620
  • 10:07 moritzm: installing remaining cyrus-sasl security updates
  • 09:44 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/Wikibase/lib/includes/Store/Sql/Terms: wbterms: Add Statsd metrics in critical parts of the new term store (duration: 00m 57s)
  • 07:33 XioNoX: add peering to AS26744 in eqiad, eqord and eqdfw
  • 06:25 marostegui: Deploy schema change on flowdb (x1) directly on the master T242688
  • 06:23 marostegui: Deploy schema change on labswiki (wikitech) T242688
  • 06:20 marostegui: Deploy schema change on s3 master for officewiki and techconductwiki T242688
  • 06:01 marostegui: Remove partitions from revision table on db1103:3312
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10141 and previous config saved to /var/cache/conftool/dbconfig/20200114-060116-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after removing partitions from revision table', diff saved to https://phabricator.wikimedia.org/P10140 and previous config saved to /var/cache/conftool/dbconfig/20200114-060003-marostegui.json
  • 05:29 andrewbogott: rebooting cloudservices1004 to make sure all my upgrades are sustainable
  • 01:03 catrope@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Various topic search-related cherry-picks (duration: 00m 57s)

2020-01-13

  • 21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@690517c]: Referer Classify change (duration: 09m 08s)
  • 21:32 arlolra@deploy1001: Finished deploy [parsoid/deploy@dd92eeb]: Updating Parsoid to 5d37da1 (duration: 08m 21s)
  • 21:26 milimetric@deploy1001: Started deploy [analytics/refinery@690517c]: Referer Classify change
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@dd92eeb]: Updating Parsoid to 5d37da1
  • 20:37 clarakosi@deploy1001: Finished deploy [restbase/deploy@bfdd342]: Use parsoid_uri, add ngwiki. T241756, T240771 (duration: 15m 41s)
  • 20:21 clarakosi@deploy1001: Started deploy [restbase/deploy@bfdd342]: Use parsoid_uri, add ngwiki. T241756, T240771
  • 19:39 tgr: ran disableOATHAuthForUser.php for T242543
  • 19:22 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert a temporary CommonsMetadata cache validation hook that has been unneeded for a long time (duration: 00m 56s)
  • 15:56 moritzm: installing cyrus-sasl security updates
  • 15:19 moritzm: remove hassium in Ganeti T224567
  • 15:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 15:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 15:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 15:00 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@a1b4d34]: Deploy hdfs-rsync bug correction (duration: 00m 08s)
  • 15:00 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@a1b4d34]: Deploy hdfs-rsync bug correction
  • 14:58 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:57 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 14:55 moritzm: remove hassaleh in Ganeti T224567
  • 14:24 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 14:24 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 56s)
  • 13:11 moritzm: upgrade mw canaries to PHP 7.2.26 T241222
  • 12:08 Urbanecm: EU SWAT done
  • 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c7cf53c: Deploy partial blocks on enwiki (T242569) (duration: 00m 55s)
  • 11:58 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 11:57 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 11:42 moritzm: upgrading remaining mwdebug* servers and mw1261 to PHP 7.2.26 T241222
  • 11:04 volans@deploy1001: Finished deploy [debmonitor/deploy@265059b]: Release v0.2.3 (duration: 01m 10s)
  • 11:03 volans@deploy1001: Started deploy [debmonitor/deploy@265059b]: Release v0.2.3
  • 10:51 vgutierrez: pooling esams for ncredir - T242321
  • 09:38 moritzm: rename Ganeti group in ulsfo from "default" to "row_1"
  • 09:16 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10134 and previous config saved to /var/cache/conftool/dbconfig/20200113-075334-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10133 and previous config saved to /var/cache/conftool/dbconfig/20200113-073656-marostegui.json
  • 07:30 XioNoX: cr3-knams> clear bfd session fe80::5e5e:ab00:d3d:85c - T240659
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10132 and previous config saved to /var/cache/conftool/dbconfig/20200113-072611-marostegui.json
  • 06:45 marostegui: Upgrade db1112
  • 06:36 marostegui: Deploy schema change on db1112 with replication (lag will appear on s3 on labs) - T234052
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P10131 and previous config saved to /var/cache/conftool/dbconfig/20200113-063513-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for compression T232446', diff saved to https://phabricator.wikimedia.org/P10130 and previous config saved to /var/cache/conftool/dbconfig/20200113-062007-marostegui.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084', diff saved to https://phabricator.wikimedia.org/P10129 and previous config saved to /var/cache/conftool/dbconfig/20200113-061835-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10128 and previous config saved to /var/cache/conftool/dbconfig/20200113-061434-marostegui.json
  • 06:11 marostegui: Deploy schema change on s1 master (db1083) - T234052
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1013', diff saved to https://phabricator.wikimedia.org/P10127 and previous config saved to /var/cache/conftool/dbconfig/20200113-061106-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 T234052', diff saved to https://phabricator.wikimedia.org/P10126 and previous config saved to /var/cache/conftool/dbconfig/20200113-061025-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013', diff saved to https://phabricator.wikimedia.org/P10125 and previous config saved to /var/cache/conftool/dbconfig/20200113-060841-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10124 and previous config saved to /var/cache/conftool/dbconfig/20200113-060112-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 T234052', diff saved to https://phabricator.wikimedia.org/P10123 and previous config saved to /var/cache/conftool/dbconfig/20200113-060012-marostegui.json
  • 05:58 marostegui: Remove partitions from db1105:3312 - T239453
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10122 and previous config saved to /var/cache/conftool/dbconfig/20200113-055811-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312', diff saved to https://phabricator.wikimedia.org/P10121 and previous config saved to /var/cache/conftool/dbconfig/20200113-055554-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10120 and previous config saved to /var/cache/conftool/dbconfig/20200113-055315-marostegui.json
  • 05:51 marostegui: Deploy schema change on x1 master on flowdb with replication - T241387
  • 02:02 andrewbogott: restarted mariadb on cloudservices1003, cloudservices1004, cloudservices2001-dev, clouddb2001-dev for T239791
  • 00:58 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 00:53 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 00:23 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3061.esams.wmnet
  • 00:23 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 00:22 effie: depool and restart cp3065 cp3061 - T238305
  • 00:21 effie: depool and restart cp3065 cp3061

2020-01-12

  • 14:48 effie: restart php on mw1240
  • 14:46 effie: restart php on mw1238
  • 04:35 volker-e@deploy1001: Finished deploy [design/style-guide@8bec25e]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:35 volker-e@deploy1001: Started deploy [design/style-guide@8bec25e]: Deploy design/style-guide:
  • 02:57 volker-e@deploy1001: Finished deploy [design/style-guide@cebc152]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:57 volker-e@deploy1001: Started deploy [design/style-guide@cebc152]: Deploy design/style-guide:

2020-01-11

  • 05:34 volker-e@deploy1001: Finished deploy [design/style-guide@6a44c69]: Deploy design/style-guide: (duration: 00m 08s)
  • 05:34 volker-e@deploy1001: Started deploy [design/style-guide@6a44c69]: Deploy design/style-guide:

2020-01-10

  • 22:33 mutante: ms-be1026 sudo systemctl reset-failed (failed Session 372989 of user debmonitor)
  • 20:45 jeh: cloudcontrol200[13]-dev schedule downtime until Feb 28 2020 on systemd service check T242462
  • 20:29 jeh: cloudmetrics100[12] schedule downtime until Feb 28 2020 on prometheus check T242460
  • 20:03 urandom: drop legacy Parsoid/JS storage keyspaces, production env -- T242344
  • 19:56 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 19:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 19:48 mutante: LDAP - add Zbyszko Papierski to "wmf" group (T242341)
  • 19:47 mutante: LDAP - add Hugh Nowlan to "wmf" group (T242309)
  • 19:42 dcausse: restarting blazegraph on wdqs1005
  • 19:40 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad and codfw search clusters
  • 19:40 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon (duration: 05m 02s)
  • 19:35 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon
  • 19:13 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:53 mutante: welcome new (restbase) service deployer Clara Andrew-Wani (T242152)
  • 18:29 bd808: Restarted zuul on contint1001; no logs since 2020-01-10 17:55:28,452
  • 11:48 moritzm: stop/mask nginx on hassium/hassaleh T224567
  • 10:56 akosiaris: repool mathoid codfw for testing canary support in the mathoid helm chart
  • 10:56 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
  • 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 10:40 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:38 akosiaris: depool mathoid codfw in preparation for testing canary support in the mathoid helm chart
  • 10:37 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 10:24 moritzm: rename Ganeti group for esams from "default" to "row_OE" T236216
  • 10:21 moritzm: rename Ganeti group for eqsin from "default" to "row_1" T228099
  • 09:02 marostegui: Remove revision partitions from db2091:3312
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3312', diff saved to https://phabricator.wikimedia.org/P10113 and previous config saved to /var/cache/conftool/dbconfig/20200110-090143-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2088:3312', diff saved to https://phabricator.wikimedia.org/P10112 and previous config saved to /var/cache/conftool/dbconfig/20200110-085921-marostegui.json
  • 08:55 vgutierrez: restarting pybal on lvs3005 (high-traffic1) - T242321
  • 08:51 vgutierrez: restarting pybal on lvs3007 - T242321
  • 08:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3002.esams.wmnet
  • 08:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3001.esams.wmnet
  • 08:24 ema: cp3062: varnish-frontend-restart to clear things up after child crashes over the past days
  • 02:11 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.10 (duration: 04m 13s)
  • 00:45 catrope@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Expose tasktype/topic API parameter info (T240512) (duration: 01m 01s)
  • 00:35 shdubsh: restart prometheus on prometheus2004, enabling debug log

2020-01-09

  • 21:25 ebernhardson@deploy1001: Finished deploy [search/airflow@746c149]: Add skein to airflow venv (duration: 00m 55s)
  • 21:24 ebernhardson@deploy1001: Started deploy [search/airflow@746c149]: Add skein to airflow venv
  • 20:32 chasemp: add phabtest2 to #security temp to ensure reporting settings (T240605)
  • 20:06 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.14 refs T233862
  • 19:51 Urbanecm: Morning SWAT done
  • 19:51 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.14/resources/Resources.php: SWAT: 39bc331: Enable mediawiki.page.patrol.ajax on mobile (T242310) (duration: 01m 05s)
  • 19:35 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/MobileFrontend/: SWAT: 31d3be7: Hot fixes for mobile diff page (T242310) (duration: 01m 09s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/mobile.php: SWAT: 2f9ee90: Drop beta setting (T237290) (duration: 01m 06s)
  • 18:56 otto@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@f8e9d6f]: (no justification provided) (duration: 00m 08s)
  • 18:55 otto@deploy1001: Started deploy [analytics/hdfs-tools/deploy@f8e9d6f]: (no justification provided)
  • 18:05 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:03 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 17:38 volans@cumin1001: conftool action : set/weight=10; selector: name=elastic106.*.eqiad.wmnet
  • 17:38 volans@cumin1001: conftool action : set/weight=10; selector: name=elastic105[3-9].eqiad.wmnet
  • 17:37 volans: confctl set/weight=10 for elastic10[53-67] - T242348
  • 15:46 ema: cp3058: varnish-frontend-restart to clear things up after child crash yesterday
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P10110 and previous config saved to /var/cache/conftool/dbconfig/20200109-152545-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10109 and previous config saved to /var/cache/conftool/dbconfig/20200109-152157-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10108 and previous config saved to /var/cache/conftool/dbconfig/20200109-151434-marostegui.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10107 and previous config saved to /var/cache/conftool/dbconfig/20200109-150333-marostegui.json
  • 14:38 papaul: upgrading Firmware on backup2001
  • 14:27 marostegui: Upgrade db1078
  • 14:27 ema: cp3054: varnish-frontend-restart to clear things up after child crash yesterday
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P10105 and previous config saved to /var/cache/conftool/dbconfig/20200109-141057-marostegui.json
  • 14:04 moritzm: imported PHP 7.2.26 to component/php72 for stretch-wikimedia
  • 13:48 moritzm: upgrading mwdebug2002 to PHP 7.2.26 T241224
  • 13:47 moritzm: upgrading mwdebug2002 to PHP 7.2.26
  • 12:41 marostegui: Deploy schema change on s3 codfw, lag will appear on s3 codfw - T234052
  • 12:25 jynus: shutting down backup2001 T240177
  • 12:22 Urbanecm: EU SWAT done
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ed0357a: Set $wgArticleCountMethod to any for minwiktionary (T241694) (duration: 01m 08s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 06394ea: Add ipblock-exempt and extendedconfirmed to bot group on fawiki (T241904) (duration: 01m 05s)
  • 12:11 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wmgUseEntitySourceBasedFederation for test.wikidata.org (T241973) (duration: 01m 07s)
  • 11:23 moritzm: installing cyrus-sasl security updates
  • 11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106', diff saved to https://phabricator.wikimedia.org/P10104 and previous config saved to /var/cache/conftool/dbconfig/20200109-100948-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10103 and previous config saved to /var/cache/conftool/dbconfig/20200109-100552-marostegui.json
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10102 and previous config saved to /var/cache/conftool/dbconfig/20200109-095433-marostegui.json
  • 09:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10101 and previous config saved to /var/cache/conftool/dbconfig/20200109-095249-marostegui.json
  • 09:48 marostegui: Upgrade db1106
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for upgrade', diff saved to https://phabricator.wikimedia.org/P10100 and previous config saved to /var/cache/conftool/dbconfig/20200109-094748-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118', diff saved to https://phabricator.wikimedia.org/P10099 and previous config saved to /var/cache/conftool/dbconfig/20200109-093946-marostegui.json
  • 09:32 marostegui: Deploy schema change on db1106, this will generate a bit of lag on s1 labs
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10098 and previous config saved to /var/cache/conftool/dbconfig/20200109-093119-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10097 and previous config saved to /var/cache/conftool/dbconfig/20200109-082243-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10096 and previous config saved to /var/cache/conftool/dbconfig/20200109-081629-marostegui.json
  • 07:40 XioNoX: enable traceoptions for BFD on cr2-eqdfw - T240659
  • 07:37 marostegui: Upgrade db1118
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P10094 and previous config saved to /var/cache/conftool/dbconfig/20200109-073713-marostegui.json
  • 06:27 marostegui: Remove revision partitions from db2088:3312 T239453
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 T239453', diff saved to https://phabricator.wikimedia.org/P10093 and previous config saved to /var/cache/conftool/dbconfig/20200109-062608-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 db1096:3316 T239453', diff saved to https://phabricator.wikimedia.org/P10092 and previous config saved to /var/cache/conftool/dbconfig/20200109-062157-marostegui.json
  • 00:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no-op) set config page for newcomer tasks (T233465) (duration: 01m 05s)

2020-01-08

  • 23:44 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Roll commonswiki forward to 1.35.0-wmf.14
  • 23:34 jforrester@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/WikibaseMediaInfo/resources/statements/StatementWidget.js: T242286 Update StatementWidget initialization logic (duration: 01m 05s)
  • 23:14 XenoRyet: updated civicrm from 42e88f92a9 to 9ac771a913
  • 23:09 mutante: LDAP - added moushirael to 'wmf' (T242000)
  • 22:39 mutante: restarted zuul on contint1001
  • 21:56 arlolra: Updated Parsoid to f963e51 (T238934, T237318, T238022, T228217)
  • 21:46 XenoRyet: updated civicrm from 2468d85f95 to 42e88f92a9
  • 21:46 arlolra@deploy1001: Finished deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51 (duration: 08m 00s)
  • 21:38 arlolra@deploy1001: Started deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51
  • 21:30 mutante: phab1003 - running decom cookbook - shutdown host, removed from puppetmaster, debmonitor etc (T238957)
  • 21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "commonswiki to 1.35.0-wmf.11"
  • 21:21 halfak@deploy1001: Finished deploy [ores/deploy@039251f]: T242035 (duration: 16m 32s)
  • 21:07 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:04 halfak@deploy1001: Started deploy [ores/deploy@039251f]: T242035
  • 21:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 21:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 20:53 XenoRyet: updated civicrm from 51b6fca9b2 to 2468d85f95
  • 20:51 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.14 refs T233862 (duration: 01m 04s)
  • 20:50 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.14 refs T233862
  • 20:40 mutante: contint1001 - restarting zuul service
  • 20:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 19:31 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:16 mutante: LDAP - added 'sihe' to 'wmde' and 'nda' (T242080)
  • 19:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:13 joal@deploy1001: Finished deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin] (duration: 00m 07s)
  • 19:13 joal@deploy1001: Started deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin]
  • 19:13 joal@deploy1001: Finished deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train (duration: 08m 36s)
  • 19:04 joal@deploy1001: Started deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train
  • 18:46 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:46 marostegui: Remove partitions from dewiki.revision on db1096:3315 T239453
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P10090 and previous config saved to /var/cache/conftool/dbconfig/20200108-184510-marostegui.json
  • 18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10089 and previous config saved to /var/cache/conftool/dbconfig/20200108-184350-marostegui.json
  • 18:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756 (duration: 14m 27s)
  • 18:33 volans: restarted wikibugs
  • 18:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756 (duration: 02m 41s)
  • 18:18 ppchelko@deploy1001: Started deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756
  • 18:07 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 18:04 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 18:03 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 18:03 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 16:25 _joe_: running puppet on deploy1001 to remove my hot-patch to scap.cfg
  • 16:20 ema: rolling ats-be restart on !text@eqiad, !text@esams to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/562849/
  • 16:00 bblack: re-pooling esams text traffic in DNS
  • 15:45 ema: cumin -s10 -b1 'A:cp-text_eqiad' 'run-puppet-agent -q ; ats-backend-restart'
  • 15:40 vgutierrez: restarting ats-tls on esams text nodes
  • 15:37 ema: cumin -s10 -b1 'A:cp-text_esams' 'run-puppet-agent -q ; ats-backend-restart'
  • 15:37 bblack: authdns-update to depool esams
  • 15:26 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 34s)
  • 15:24 otto@deploy1001: sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 03m 56s)
  • 15:20 otto@deploy1001: sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 33s)
  • 15:12 otto@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 15:11 otto@deploy1001: sync-file aborted: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 00s)
  • 15:10 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 10s)
  • 15:02 XioNoX: Routinator 0.6.4 looking good on rpki2001, upgrading rpki1001 - T242197
  • 15:00 ottomata: deploying change to make EventBus use new TLS port for eventgate-analytics - T242224
  • 14:35 ema: repool cp4028 after successful X-Analytics-TLS patch test T237993
  • 14:23 ema: depool cp4028 to test X-Analytics-TLS patch T237993
  • 14:07 XioNoX: add routinator 0.6.4 to reprepro stretch-wikimedia - T242197
  • 14:00 ariel@deploy1001: Finished deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job (duration: 00m 05s)
  • 14:00 ariel@deploy1001: Started deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job
  • 12:46 _joe_: deleting releng/composer-php55:0.1.0 from the docker registry
  • 12:36 Lucas_WMDE: EU SWAT done
  • 12:34 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update Skolt Sami language name (T223544) (duration: 01m 06s)
  • 12:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.11/extensions/Cite: SWAT: Fix handling of `` (T241303) (duration: 01m 06s)
  • 12:17 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable tainted references on test.wikidata.org (T239621) (duration: 01m 19s)
  • 12:08 kart_: Updated cxserver to 2020-01-06-070550-production (T233405)
  • 12:04 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:01 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:00 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:47 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
  • 11:45 akosiaris@cumin1001: conftool action : set/weight=10; selector: service=echostore
  • 11:44 vgutierrez: uploaded varnish 5.1.3-1wm12 to apt.wikimedia.org (buster) - T242093
  • 11:44 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1001.*
  • 11:44 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1001.*
  • 11:07 moritzm: test failover of Ganeti master in eqsin T228099
  • 11:00 moritzm: drain ganeti5003 to test new Ganeti setup in eqsin T228099
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:41 moritzm: rebooting netflow5001 to pick up microcode
  • 10:08 moritzm: enabling spec-ctrl, ssbd, md-clear passthrough for new eqsin cluster T228099
  • 09:27 moritzm: installing urldownloader1002 T241979
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085', diff saved to https://phabricator.wikimedia.org/P10088 and previous config saved to /var/cache/conftool/dbconfig/20200108-091124-marostegui.json
  • 09:00 moritzm: installing urldownloader1001 T241979
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10087 and previous config saved to /var/cache/conftool/dbconfig/20200108-082930-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10086 and previous config saved to /var/cache/conftool/dbconfig/20200108-082050-marostegui.json
  • 08:09 marostegui: Upgrade db1085
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P10085 and previous config saved to /var/cache/conftool/dbconfig/20200108-080853-marostegui.json
  • 08:07 marostegui: Deploy schema change on s1 codfw, there will be lag on s1 codfw - T234052
  • 07:58 marostegui: Deploy schema change on clouddb2001-dev.labtestwiki - T234052
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P10084 and previous config saved to /var/cache/conftool/dbconfig/20200108-072017-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10083 and previous config saved to /var/cache/conftool/dbconfig/20200108-071312-marostegui.json
  • 07:07 marostegui: Remove partitions from dewiki.revision on db1097:3315 T239453
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10082 and previous config saved to /var/cache/conftool/dbconfig/20200108-070712-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10081 and previous config saved to /var/cache/conftool/dbconfig/20200108-070614-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10080 and previous config saved to /var/cache/conftool/dbconfig/20200108-070009-marostegui.json
  • 06:56 marostegui: Upgrade db1079
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P10079 and previous config saved to /var/cache/conftool/dbconfig/20200108-064404-marostegui.json
  • 06:42 marostegui: Remove partitions from revision table on s6 for db1096:3316 - T239453
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10078 and previous config saved to /var/cache/conftool/dbconfig/20200108-064144-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10077 and previous config saved to /var/cache/conftool/dbconfig/20200108-063550-marostegui.json
  • 05:41 XioNoX: enable netflow in eqsin
  • 03:54 volker-e@deploy1001: Finished deploy [design/style-guide@ad595d5]: Deploy design/style-guide: (duration: 00m 08s)
  • 03:54 volker-e@deploy1001: Started deploy [design/style-guide@ad595d5]: Deploy design/style-guide:
  • 00:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108) (duration: 00m 42s)
  • 00:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108)
  • 00:21 reedy@deploy1001: Synchronized wmf-config/throttle.php: T240845 (duration: 01m 04s)

2020-01-07

  • 23:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two) (duration: 00m 10s)
  • 23:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two)
  • 23:36 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .cloudceph*.err
  • 23:02 cdanis: cp3055.mgmt% racadm serveraction powercycle T240425
  • 20:42 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark (duration: 05m 55s)
  • 20:40 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.14 refs T233862
  • 20:36 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark
  • 20:30 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.14 refs T233862 (duration: 29m 01s)
  • 20:12 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark (duration: 05m 13s)
  • 20:06 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark
  • 20:01 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.14 refs T233862
  • 19:28 James_F: mwscript createAndPromote.php foundationwiki 'Jdforrester (WMF)' --force --custom-groups=interface-admin for T241950
  • 19:02 James_F: 1.35.0-wmf.14 was branched at fb16374 T233862
  • 18:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps (duration: 00m 14s)
  • 18:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps
  • 17:31 Urbanecm: Run scap pull at mwdebug1001, test over
  • 17:29 Urbanecm: Stashing at mwdebug1001
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2076 T241647', diff saved to https://phabricator.wikimedia.org/P10072 and previous config saved to /var/cache/conftool/dbconfig/20200107-172839-marostegui.json
  • 17:23 marostegui: Remove partitions from dewiki.revision from db2089:3315 T239453
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P10071 and previous config saved to /var/cache/conftool/dbconfig/20200107-171955-marostegui.json
  • 17:18 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2 (duration: 05m 53s)
  • 17:18 vgutierrez: restarting pybal on lvs1015 - T240715
  • 17:13 vgutierrez: restarting pybal on lvs1016 - T240715
  • 17:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1003.wikimedia.org
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1002.wikimedia.org
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1001.wikimedia.org
  • 16:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable banner on Special:Block for partial blocks early-adopter wikis T240300 (duration: 00m 57s)
  • 16:10 elukey: cr1/cr2-eqiad: set port 443 (was 8190) for term schema in analytics-in4
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10070 and previous config saved to /var/cache/conftool/dbconfig/20200107-154529-marostegui.json
  • 15:44 papaul: shutting down db2076 for FW upgrade
  • 15:41 moritzm: installing urldownloader2002 T241979
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10069 and previous config saved to /var/cache/conftool/dbconfig/20200107-152304-marostegui.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10068 and previous config saved to /var/cache/conftool/dbconfig/20200107-151633-marostegui.json
  • 15:11 moritzm: installing urldownloader2001 T241979
  • 15:09 moritzm: reimaging mw2282
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for upgrade', diff saved to https://phabricator.wikimedia.org/P10067 and previous config saved to /var/cache/conftool/dbconfig/20200107-150440-marostegui.json
  • 14:39 _joe_: uploading python3-docker-report to {buster,stretch}-wikimedia, T241206
  • 14:35 marostegui: Power off db2076 for on-site maintenance T241647
  • 14:32 marostegui: Stop MySQL on db2076 for maintenance T241647
  • 14:22 marostegui: Deploy schema change on s7 codfw master, this will generate lag on s7 codfw - T234052
  • 14:21 marostegui: Deploy schema change on s2 codfw master, this will generate lag on s2 codfw - T234052
  • 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P10066 and previous config saved to /var/cache/conftool/dbconfig/20200107-140300-marostegui.json
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 moritzm: reimaging mw2282
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P10065 and previous config saved to /var/cache/conftool/dbconfig/20200107-134251-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P10064 and previous config saved to /var/cache/conftool/dbconfig/20200107-133439-marostegui.json
  • 12:56 Lucas_WMDE: EU SWAT done
  • 12:56 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix WBRepoCanonicalUriProperty setting for testwikidatawiki (duration: 00m 54s)
  • 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix wgImportSources setting for wikidata dblist (duration: 00m 54s)
  • 12:39 Urbanecm: Run mwscript initSiteStats.php --wiki=tawiktionary --update (T241684)
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5be01f0: Modify $wgArticleCount to any for ta.wiktionary (T241684) (duration: 00m 55s)
  • 12:32 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: d6ee5fe: Modify ge.wikimedia project logos (T241327) (duration: 00m 57s)
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P10063 and previous config saved to /var/cache/conftool/dbconfig/20200107-122914-marostegui.json
  • 12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up unused configs in InitialiseSettings.php (T238154) (duration: 00m 54s)
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up unused configs in InitialiseSettings.php (T238154) (duration: 00m 55s)
  • 12:13 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 54s)
  • 12:12 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 54s)
  • 12:11 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 56s)
  • 11:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 11:12 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 11:11 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 11:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:39 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.11/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseTermIdsAcquirer.php: Temporarily add metrics of the need to reinsert in the new term store (duration: 00m 57s)
  • 10:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P10062 and previous config saved to /var/cache/conftool/dbconfig/20200107-100743-marostegui.json
  • 10:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10061 and previous config saved to /var/cache/conftool/dbconfig/20200107-100157-marostegui.json
  • 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10060 and previous config saved to /var/cache/conftool/dbconfig/20200107-095501-marostegui.json
  • 09:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 09:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10059 and previous config saved to /var/cache/conftool/dbconfig/20200107-094944-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10058 and previous config saved to /var/cache/conftool/dbconfig/20200107-094506-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for alter and upgrade', diff saved to https://phabricator.wikimedia.org/P10057 and previous config saved to /var/cache/conftool/dbconfig/20200107-092221-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for compression', diff saved to https://phabricator.wikimedia.org/P10056 and previous config saved to /var/cache/conftool/dbconfig/20200107-082236-marostegui.json
  • 08:11 ayounsi@deploy1001: Finished deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962 (duration: 00m 10s)
  • 08:11 ayounsi@deploy1001: Started deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1019', diff saved to https://phabricator.wikimedia.org/P10055 and previous config saved to /var/cache/conftool/dbconfig/20200107-074159-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for upgrade', diff saved to https://phabricator.wikimedia.org/P10054 and previous config saved to /var/cache/conftool/dbconfig/20200107-074035-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1013', diff saved to https://phabricator.wikimedia.org/P10053 and previous config saved to /var/cache/conftool/dbconfig/20200107-073922-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 for upgrade', diff saved to https://phabricator.wikimedia.org/P10052 and previous config saved to /var/cache/conftool/dbconfig/20200107-073543-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1018', diff saved to https://phabricator.wikimedia.org/P10051 and previous config saved to /var/cache/conftool/dbconfig/20200107-073508-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018 for upgrade', diff saved to https://phabricator.wikimedia.org/P10050 and previous config saved to /var/cache/conftool/dbconfig/20200107-072930-marostegui.json
  • 07:15 marostegui: Remove partitions from s5: db2084:3315 T239453
  • 07:13 marostegui: Remove partitions from revision table on s6: db1098 T239453
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10049 and previous config saved to /var/cache/conftool/dbconfig/20200107-070850-marostegui.json
  • 07:05 marostegui: Depool labsdb1011
  • 07:03 marostegui: Deploy schema change on s8 codfw (this will generate lag on s8 codfw) - T234052
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10048 and previous config saved to /var/cache/conftool/dbconfig/20200107-064846-marostegui.json
  • 01:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 01:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (sync again) (duration: 00m 53s)
  • 01:14 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (duration: 00m 54s)
  • 01:12 mutante: ganeti - creating urldownloader2002.wikimedia.org in codfw_B with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 01:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:09 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 01:04 mutante: ganeti - creating urldownloader2001.wikimedia.org in codfw_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 01:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:03 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: "cirrus: Shift more_like to codfw cirrus cluster" (duration: 00m 54s)
  • 01:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:59 mutante: ganeti - creating urldownloader1002.wikimedia.org in eqiad_C with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 00:58 mutante: ganeti - creating urldownloader1001.wikimedia.org in eqiad_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 00:57 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:57 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Revert "reduce query load on cirrus elastic clusters" (duration: 00m 54s)
  • 00:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:56 ebernhardson@deploy1001: sync-file aborted: Revery (duration: 00m 00s)
  • 00:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: use local search in production (T235717) (duration: 00m 54s)
  • 00:45 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: GrowthExperiments: use local search in production (T235717) (duration: 00m 58s)
  • 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Partial Blocks on every wiki excluding those that have opted-out (T218626) (duration: 00m 55s)

2020-01-06

  • 23:49 ejegg: updated payments-wiki from 827e3235dc to c3ca3ad6a7
  • 23:12 mutante: mailman - running /usr/local/sbin/rename_list wikimediamy wikimedia-my (T241988)
  • 22:34 eileen: civicrm revision changed from b7746c31aa to 51b6fca9b2, config revision is b8af24d7c8
  • 21:28 Amir1: starting rebuild of holes in new term store from Q1Mio to Q10Mio using screen in mwmaint1002 (T219123)
  • 20:06 ejegg: updated fundraising civicrm from 5642a92223 to b7746c31aa
  • 20:02 mutante: LDAP - added 'krli' (Kris Litson) to 'wmde' and 'nda' for superset access (T241722)
  • 19:39 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@41a22b8]: Bump to latest master (duration: 06m 57s)
  • 19:32 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@41a22b8]: Bump to latest master
  • 19:26 Urbanecm: Morning SWAT done
  • 19:25 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 0f045c3: Enable local uploads on inh.wiki (T239925) (duration: 00m 54s)
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7722ff3: 0bff587: Add www.digital.archives.go.jp/mediaphoto.mnhn.fr to the wgCopyUploadsDomains (T238476, T241637) (duration: 00m 54s)
  • 19:19 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 1324af9: Add throttle rule for ECLAC editathon in Santiago, Chile (T241414) (duration: 00m 54s)
  • 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c3a3248: Add sandboxlink for eswikivoyage (T241163) (duration: 00m 58s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d7a19ca: Enable GeoData extension in ruwikinews (T239000) (duration: 00m 56s)
  • 18:49 ebernhardson@deploy1001: Finished deploy [search/airflow@8db442c]: match cryptography package with debian buster (duration: 00m 53s)
  • 18:48 ebernhardson@deploy1001: Started deploy [search/airflow@8db442c]: match cryptography package with debian buster
  • 18:17 ebernhardson@deploy1001: Finished deploy [search/airflow@8ae8500]: Require apache-airflow[kerberos] python package (duration: 00m 27s)
  • 18:16 ebernhardson@deploy1001: Started deploy [search/airflow@8ae8500]: Require apache-airflow[kerberos] python package
  • 17:11 jakob@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 16:56 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 16:27 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@09133cf]: Fix for geoeditors monthly (duration: 11m 49s)
  • 15:47 herron: migrating mx1001 to secondary ganeti node T240906
  • 15:45 milimetric@deploy1001: Started deploy [analytics/refinery@09133cf]: Fix for geoeditors monthly
  • 15:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [officewiki] Grant ipblock-exempt to all users T231943 (duration: 00m 56s)
  • 15:06 ariel@deploy1001: Finished deploy [dumps/dumps@db81d78]: avoid aborts on some symlink cleanup failures (duration: 00m 06s)
  • 15:06 ariel@deploy1001: Started deploy [dumps/dumps@db81d78]: avoid aborts on some symlink cleanup failures
  • 15:04 XioNoX: remove BGP to AS13285 in ulsfo (IXP not listed in peeringdb anymore)
  • 14:56 XioNoX: remove BGP to AS13285 in eqiad (IXP not listed in peeringdb anymore)
  • 14:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WebAuthn everywhere (duration: 00m 54s)
  • 14:31 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WebAuthn everywhere (duration: 00m 57s)
  • 13:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:35 moritzm: reimaging mw2282 to validate correctness of apt::package_from_component for fresh installs
  • 12:58 Urbanecm: EU SWAT done
  • 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 88c800c: Add basic transwiki sources for ltwiki (T241288) (duration: 00m 54s)
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c44b4ff: Enable subpages for the main namespace on ge.wikimedia (T241329) (duration: 00m 55s)
  • 12:46 Urbanecm: mwscript namespaceDupes.php --wiki=napwikisource --fix (T231880)
  • 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 864a2f8: Set Author and Author_talk aliases for Autore NS at napwikisource (T231880) (duration: 00m 55s)
  • 12:43 Urbanecm: mwscript namespaceDupes.php --wiki=zhwiktionary --fix (T241023)
  • 12:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0baf554: Add new namespace and aliases for zh.wiktionary (T241023) (duration: 00m 54s)
  • 12:39 urbanecm@deploy1001: sync-file aborted: SWAT: 0ac5032: Add throttle exception for Amical Wikimedia Workshop (T241705) (duration: 00m 01s)
  • 12:39 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 12:37 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 0ac5032: Add throttle exception for Amical Wikimedia Workshop (T241705) (duration: 00m 56s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don’t check constraints on P6685 statements Bypassing T236104 (duration: 00m 55s)
  • 12:28 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.11/maintenance/rebuildLocalisationCache.php: SWAT: Add option to override storeClass in rebuildLocalisationCache (T105683 T99740) (duration: 00m 55s)
  • 12:25 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Add a bit for forcing LC caching backend in cli mode" (duration: 00m 54s)
  • 12:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don’t check constraints on P6685 statements (T227865) (duration: 00m 55s)
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set read new for item term store up to Q100K (T219123) (duration: 00m 55s)
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on cswiki (T241304) (duration: 00m 56s)
  • 11:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
  • 11:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
  • 10:56 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert T227416 mitigations (duration: 01m 05s)
  • 10:39 moritzm: installing libbsd security updates
  • 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:32 moritzm: reimaging mw2282 to validate correctness of apt::package_from_component for fresh installs
  • 07:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=2 --file=/tmp/1mio.lines (T219301)
  • 03:53 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=2 --file=/tmp/100k.lines (T219301)
  • 00:06 effie: pool cp3065 T238305
  • 00:05 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet

2020-01-05

  • 23:56 effie: powercycle cp3065.esams.wmnet T238305
  • 23:53 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 13:09 Urbanecm: mwmaint1002: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Coffeeandcrumbs /home/urbanecm/T241917 (T241917)

2020-01-04

  • 16:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime

2020-01-03

  • 22:14 volker-e@deploy1001: Finished deploy [design/style-guide@8054026]: Deploy design/style-guide: (duration: 00m 08s)
  • 22:14 volker-e@deploy1001: Started deploy [design/style-guide@8054026]: Deploy design/style-guide:
  • 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2084 instances T241103', diff saved to https://phabricator.wikimedia.org/P10035 and previous config saved to /var/cache/conftool/dbconfig/20200103-174447-jynus.json
  • 16:54 ejegg: updated fundraising CiviCRM from 217a1f8c63 to 5642a92223
  • 16:36 jynus: stopping db2084
  • 15:04 marostegui: Upgrade db2107
  • 14:58 marostegui: Deploy schema changes on s2 and s4 eqiad hosts T234052
  • 14:56 jbond42: clean up old /etc/apt/preferences.d/smartmontools.pref file
  • 14:48 jbond42: clean up old /etc/apt/preferences.d/puppet_all.pref file
  • 14:45 jbond42: clean up old /etc/apt/preferences.d/facter.pref file
  • 14:15 Urbanecm: Run undelete.php on a couple of pages at plwikisource per T241824
  • 13:50 marostegui: Deploy schema change on s4 codfw (lag will appear on codfw s4) - T234052
  • 13:46 moritzm: restarting exim on MXes to pick up SASL security update
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10033 and previous config saved to /var/cache/conftool/dbconfig/20200103-110028-marostegui.json
  • 10:20 moritzm: restarting apache on cloudmetrics* to pick up SASL security update
  • 10:11 moritzm: installing cyrus-sasl2 security updates on stretch/buster
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 schema change', diff saved to https://phabricator.wikimedia.org/P10032 and previous config saved to /var/cache/conftool/dbconfig/20200103-094252-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10031 and previous config saved to /var/cache/conftool/dbconfig/20200103-093829-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 schema change', diff saved to https://phabricator.wikimedia.org/P10030 and previous config saved to /var/cache/conftool/dbconfig/20200103-092107-marostegui.json
  • 08:17 marostegui: Deploy schema change on labswiki (wikitech) T234052
  • 07:10 marostegui: Deploy schema change on s2 codfw master, lag will appear on codfw - T234052
  • 06:57 marostegui: Deploy schema change on s6 eqiad hosts - T234052
  • 06:23 marostegui: Deploy schema change on db2089:3316
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10029 and previous config saved to /var/cache/conftool/dbconfig/20200103-062242-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10028 and previous config saved to /var/cache/conftool/dbconfig/20200103-062148-marostegui.json

2020-01-02

  • 23:33 ejegg: updated Fundraising CiviCRM from d534f4e966 to 217a1f8c63
  • 23:09 ejegg: updated Fundraising CiviCRM from 6936aa0262 to d534f4e966
  • 22:44 ejegg: updated fundraising CiviCRM from f4db7fdb31 to 6936aa0262
  • 20:48 ejegg: updated Fundraising CiviCRM from abf0019c44 to f4db7fdb31
  • 20:30 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploying revert of temporary patch for T241503 (permissions clean-up) (duration: 00m 53s)
  • 19:57 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploying temporary patch for T241503 (permissions clean-up) (duration: 00m 54s)
  • 18:53 ejegg: re-enabled fundraising cron jobs
  • 18:29 ejegg: disabled fundraising cron jobs
  • 16:15 moritzm: restarting Apache on graphite* hosts to pick up SASL security update
  • 16:11 moritzm: restarting Apache on webperf* hosts to pick up SASL security update
  • 15:52 moritzm: restarting Apache on puppetboard* hosts to pick up SASL security update
  • 15:46 moritzm: restarting FPM on parsoid canary to pick up SASL security update
  • 14:27 marostegui: Deploy schema change on s6 codfw master (db2129) with replication - T234052
  • 14:22 marostegui: Deploy schema change on s5 eqiad hosts - T234052
  • 14:05 moritzm: restarting PHP/Apache on mw canaries to pick up SASL security update
  • 13:47 moritzm: installing cyrus-sasl security updates on Stretch/Buster
  • 13:23 marostegui: Deploy schema change on s5 codfw master (db2123) with replication - T234052
  • 13:17 moritzm: upgrading jessie servers to intel-microcode 3.20191115.2
  • 13:14 foks: scramble password for Windy906
  • 13:00 XioNoX: enable BFD traceoptions on cr1-eqiad and cr3-knams - T240659
  • 12:41 moritzm: upgrade recently reimaged hosts to puppet 5 T239832
  • 12:32 moritzm: upgrade recently reimaged hosts to facter 3 T239832
  • 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:53 moritzm: restarting FPM on scandium to clear opcache health
  • 11:42 moritzm: reimaging mw2277 to validate fix for puppet5/facter3 installation on new installs T239832
  • 11:23 arturo: import more openstack packages into stretch-wikimedia thirdparty/openstack-pike-stretch (T241347)
  • 10:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:58 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2076 T241647', diff saved to https://phabricator.wikimedia.org/P10021 and previous config saved to /var/cache/conftool/dbconfig/20200102-085806-marostegui.json
  • 08:35 marostegui: Upgrade db2090
  • 08:26 marostegui: Upgrade db2075
  • 08:10 marostegui: Deploy schema change on officewiki.flow_wiki_ref on s3 master (db1123) T241387
  • 07:49 marostegui: Deploy schema change on techconductwiki.flow_wiki_ref (empty table) on s3 master (db1123) T241387
  • 07:26 marostegui: Upgrade db2079
  • 07:18 marostegui: Deploy schema change on labswiki.flow_wiki_ref (empty table) T241387
  • 06:46 marostegui: Deploy schema change on db2131 - T241387
  • 06:44 marostegui: Repool labsdb1009
  • 06:30 marostegui: Upgrade labsdb1009
  • 06:29 marostegui: Remove revision partitions from db2087:3316 T239453
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10020 and previous config saved to /var/cache/conftool/dbconfig/20200102-062650-marostegui.json
  • 06:22 marostegui: Depool labsdb1009
  • 00:22 ejegg: re-enabled fundraising cron jobs

2020-01-01

  • 21:13 ejegg: stopped fundraising cron jobs to calculate EOY summaries
  • 04:57 andrewbogott: depooling labweb1002 so I can hotfix labweb1001 for T240734


Archives

See Server admin log/Archives.