You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Server Admin Log

From Wikitech-static
Revision as of 16:40, 7 October 2018 by imported>Stashbot (dereckson: Reset user email for account "Dominic Mayers" (T206421))

2018-10-07

  • 16:40 dereckson: Reset user email for account "Dominic Mayers" (T206421)
  • 16:35 elukey: run a script in tmux (my username) on mw2201 to poll the status of a mcrouter key/route every 10s using its admin api (very lightweight but kill if needed)
  • 14:52 onimisionipe: repooling wdqs2003. Caught up on lag; lag issues also seem to be creeping up on wdqs200[1|2]
  • 04:29 SMalyshev: temp depooled wdqs2003
  • 03:12 ejegg: disabled all fundraising scheduled jobs - something that looks like disk issues on civi1001
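
The 16:35 entry above describes a lightweight loop that polls a status every 10s. The script itself is not recorded in this log, so the following is only a minimal sketch of that pattern. The `poll` helper and its `probe` callable are hypothetical names, and the actual mcrouter admin request is deliberately left out; a real `probe` would have to query the admin API on the host in question.

```python
import time
from typing import Callable, List


def poll(probe: Callable[[], str], interval_s: float, max_polls: int,
         sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call probe() max_polls times, sleeping interval_s between calls.

    Returns only the observed status changes, so a long run stays cheap
    to inspect. `probe` is a stand-in (assumption) for whatever command
    reads the route/key status.
    """
    changes: List[str] = []
    last = None
    for i in range(max_polls):
        status = probe()
        if status != last:
            # Record the status only when it differs from the previous poll.
            changes.append(status)
            last = status
        if i < max_polls - 1:
            # No sleep after the final poll.
            sleep(interval_s)
    return changes
```

Bounding the loop with `max_polls` rather than running forever makes such a helper easy to stop and to test; an open-ended run inside tmux, as in the log entry, would instead rely on someone killing it by hand.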

2018-10-06

  • 21:20 gehel: repooling wdqs2003: caught up on updater lag
  • 20:43 _joe_: restarting apache2 on puppetmaster1001
  • 19:16 onimisionipe: depooling wdqs2003
  • 18:10 elukey: restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue)
  • 17:07 onimisionipe: restarting wdqs-blazegraph on wdqs2003
  • 13:48 bblack: multatuli: update gdnsd package to 2.99.9930-beta-1+wmf1
  • 13:47 bblack: authdns1001: update gdnsd package to 2.99.9930-beta-1+wmf1 (correction to last msg)
  • 13:46 bblack: authdns1001: update gdnsd package to 2.99.9161-beta-1+wmf1
  • 12:57 bblack: rebooting cp1076
  • 12:49 bblack: depool cp1076, apparently has disk issues

2018-10-05

  • 23:50 bblack: <<<<<<< repooling eqiad edge caches, a few days ahead of intended switchback next Weds, to alleviate some traffic engineering concerns over the weekend >>>>>>
  • 20:48 mutante: T191183 - it's still showing the error page as before but that isn't due to apache issues, it just needs additional ferm rules
  • 20:44 mutante: gerrit - adding gerrit.wmfusercontent.org virtual host for avatars. applied first on gerrit2001, then on cobalt (T191183)
  • 20:03 ejegg: updated fundraising CiviCRM from ebc2e0076c to 7a0d14015e
  • 19:48 banyek: repooling labsdb1009 (T195747)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@f8776de]: Redeploy 1009 (duration: 00m 26s)
  • 19:44 smalyshev@deploy1001: Started deploy [wdqs/wdqs@f8776de]: Redeploy 1009
  • 18:37 bblack: authdns2001: upgraded gdnsd to 2.99.9930-beta
  • 18:31 bblack: gdnsd-2.99.9930-beta-1+wmf1 uploaded to stretch-wikimedia
  • 18:26 mutante: icinga - noop on all servers, no change, puppet re-enabled, operations normal
  • 18:08 mutante: disabling puppet on icinga for 5 min for extra safety before a change that should be noop
  • 17:58 banyek: depooling labsdb1009 (T195747)
  • 17:50 banyek: repooling labsdb1011 (T195747)
  • 17:12 elukey: set etcd in codfw as read/write (was readonly) and eqiad as readonly (was read/write)
  • 14:57 banyek: depooling labsdb1011 (T195747)
  • 14:56 banyek: depooling labsdb1011
  • 13:26 banyek: adding wmf-pt-kill_2.2.20-1+wmf3 package for stretch
  • 13:25 moritzm: installing python3.5/2.7 security updates
  • 13:02 volans: upgraded spicerack to version 0.0.9 on sarin/neodymium/cumin* - T199079
  • 12:13 vgutierrez: Creating certcentral1001.eqiad.wmnet in ganeti - T206308
  • 12:12 vgutierrez: Creating certcentral2001.codfw.wmnet in ganeti - T206308
  • 11:59 elukey: deleted bohrium from ganeti via gnt-instance
  • 11:43 moritzm: rebooting wezen for kernel security update
  • 11:29 moritzm: rebooting ruthenium for kernel security update
  • 10:40 jynus: restarting replication on labsdb1010/1 on s3 and s5
  • 10:37 volans: uploaded spicerack_0.0.9-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T199079
  • 10:17 moritzm: rearmed keyholder on netmon2001
  • 10:10 elukey: restart confd on labs-puppetmaster to pick up new etcd settings (eqiad -> codfw)
  • 10:03 _joe_: restarting navtiming.service on webperf1001 to pick up the dns change for etcd
  • 09:37 elukey: restart rsyslog on lithium - broken connection to tegmen - T199406
  • 09:37 banyek: disabling puppet on labsdb1009,labsdb1010,labsdb1011 (T203674)
  • 09:36 banyek: adding wmf-pt-kill_2.2.20-1+wmf2 package for stretch
  • 09:16 volans: rebooting tegmen, console stuck, possible re-occurrence of T199413 (to be confirmed)
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Move some wikis for s3 to s5 (duration: 00m 56s)
  • 09:06 elukey: stop etcdmirror replication on conf2002
  • 09:05 _joe_: restarting confd on all nodes in eqiad and esams
  • 08:58 _joe_: wiped cached values for the read-only etcd SRV record
  • 08:56 _joe_: read-write connections to etcd only go to codfw now
  • 08:35 _joe_: reenabling notifications for etcdmirror on conf1005
  • 08:02 jynus: start replication on db1069 (x1)
  • 07:54 jynus: starting replication on db1075, db1070, db1070:s3 with disabled gtid
  • 07:50 jynus: stopping dbstore1001:x1
  • 07:33 jynus: changing s3 master for db1070
  • 07:28 jynus: stopping s3 replication on db1070
  • 07:20 jynus: stopping x1 replication on db1069
  • 07:20 godog: temporarily stop prometheus on bast4001 to finalize data transfer - T179050
  • 07:19 jynus: stopping s3 replication on db1075
  • 07:18 jynus: stopping s5 replication on db1070
  • 07:09 moritzm: installing python3.4/2.7 security updates
  • 05:55 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T205599 - Ic28e00c30 (duration: 00m 57s)
  • 05:53 _joe_: upgrading python-etcd on conf1004-6, restarting etcdmirror
  • 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1092 status - T205514 (duration: 00m 57s)
  • 04:18 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/libs/filebackend/FileBackendStore.php: T205567 - I75f1eb6dc2cb (duration: 00m 56s)
  • 04:16 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/CirrusSearch/includes/DataSender.php: I0769c50c (duration: 01m 01s)
  • 00:31 mutante: LDAP: added user skvjold to group wmf (T204377)

2018-10-04

  • 22:51 ejegg: updated fundraising CiviCRM from 944b954bac to ebc2e0076c
  • 21:27 XioNoX: bounce phab1001 switch port - T201039
  • 20:47 ejegg: updated fundraising CiviCRM from ddf4865650 to 944b954bac
  • 20:23 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 00m 17s)
  • 20:22 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 20:10 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 14m 04s)
  • 19:56 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 19:30 marxarelli: rise in fatals "Fatal error: entire web request took longer than 60 seconds and timed out in /srv/mediawiki/php-1.32.0-wmf.24/includes/Title.php"
  • 19:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.24
  • 19:15 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50 (duration: 00m 53s)
  • 19:14 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50
  • 18:49 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:460202|]] (duration: 00m 59s)
  • 18:24 XioNoX: bounce lvs1002:eth1 switch port
  • 18:23 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable PageTriage/ORES on enwiki (T206149) (duration: 01m 01s)
  • 18:21 bblack: lvs1002: puppet disabled, stopping pybal (fail to 1005)
  • 18:07 _joe_: disabled notifications for etcd replication lag on conf1005, not in production
  • 17:47 banyek: repooling labsdb1010 (T195747)
  • 17:41 _joe_: uploaded new python-etcd packages for jessie, stretch
  • 17:38 XioNoX: asw2-b-eqiad recabling done - T201039
  • 17:34 elukey: pool kafka1002 (eventbus) after maintenance
  • 17:22 elukey: re-enable ircecho after alarms shower
  • 17:15 andrewbogott: triggering some alerts on labvirt1018 to figure out about alert thresholds
  • 17:06 elukey: stop ircecho on einsteinium - alarms shower
  • 17:02 gtirloni: tools - published updated toollabs-* Docker images
  • 16:54 ejegg: updated standalone SmashPig deploy from 82f9d49c23 to 5f21d3f2db
  • 16:52 XioNoX: Step 3) Add missing links - T201039
  • 16:45 shdubsh: etherpad1001 running systemctl reset-failed
  • 16:41 XioNoX: Connect/enable fpc2:0/51-fpc5:1/0 (5m DAC) - T201039
  • 16:39 XioNoX: Enable fpc5-fpc7 - T201039
  • 16:33 twentyafterfour: started phd on phab1001 and re-enabled puppet (I had it disabled to prevent starting phd during read-only)
  • 16:25 twentyafterfour: phabricator is read-write
  • 16:21 jynus: reloading dbproxy1003,8
  • 16:16 marostegui: Stop and reboot db1072 (phabricator master) for maintenance
  • 16:16 twentyafterfour: phabricator is read-only
  • 16:14 XioNoX: Enable all VC ports on FPC2 and FPC7 - T201039
  • 16:13 XioNoX: starting asw2-b-eqiad re-cabling - T201039
  • 16:08 twentyafterfour: logged downtime for phabricator in icinga, stopped phd queue processing in preparation for read-only mode
  • 16:07 jynus: reloading haproxy @ dbproxy1005
  • 16:00 marostegui: Stop MySQL on db1073 for mariadb and kernel upgrade - T201039 T148507
  • 15:58 arturo: icinga downtime every server in the main cloudvps deployment for 2h T201039
  • 15:56 arturo: icinga downtime every server with the cloudXXXX scheme for 2h T201039
  • 15:54 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444 (duration: 00m 55s)
  • 15:53 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444
  • 15:52 ppchelko@deploy1001: Finished deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444 (duration: 01m 40s)
  • 15:51 ppchelko@deploy1001: Started deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444
  • 15:41 elukey: depool kafka1002 from eventbus as precautionary step for T201039
  • 14:48 banyek: depooling labsdb1010 (T195747)
  • 14:09 marostegui: Sanitize enwikivoyage cebwiki shwiki srwiki mgwiktionary on db1124:3315 T184805
  • 13:46 pmiazga@deploy1001: Finished deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158) (duration: 02m 55s)
  • 13:43 pmiazga@deploy1001: Started deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158)
  • 13:14 banyek: muting alerts on s2replication @dbstore2002 and resuming compression of s2 database tables (T204930)
  • 13:14 banyek: muting alerts on dbstore2002 and resuming compression of s2 database tables (T204930)
  • 12:23 elukey: deploy etcdmirror on conf1005 - T205814
  • 12:06 zeljkof: EU SWAT finished
  • 12:06 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add permission "move-rootuserpages" to usergroup "eliminator" at ptwiki (T205595) (duration: 00m 57s)
  • 12:01 moritzm: rolling reboot of ms-fe hosts in codfw for kernel security update
  • 12:00 zeljkof: one more patch for EU SWAT
  • 11:57 zeljkof: EU SWAT finished
  • 11:57 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add *.nasimonline.ir to wgCopyUploadsDomains whitelist for Commons (T203371) (duration: 00m 56s)
  • 11:52 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: add Radlines.org to $wgCopyUploadsDomains (T203219) (duration: 00m 57s)
  • 11:42 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add .bollywoodhungama.in to wgCopyUploadsDomains (T203363) (duration: 00m 57s)
  • 11:35 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add some namespaces aliases for zhwikiversity (T201675) (duration: 00m 57s)
  • 11:27 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change acewiki default time zone to Asia/Jakarta (T205693) (duration: 00m 56s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org (T205747) (duration: 00m 57s)
  • 10:44 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.23/README: noop sync to verify that scap 3.8.7-1 works (at least on a basic level) (duration: 00m 59s)
  • 10:38 godog: upload scap 3.8.7-1 - T204383
  • 10:36 _joe_: uploading etcd-mirror to stretch-wikimedia T205814
  • 10:08 moritzm: rolling reboot of ms-fe hosts in eqiad for kernel security update
  • 09:13 arturo: T203177 schedule 8h icinga downtime for cloudcontrol1003,1004 and labmon1001
  • 08:52 moritzm: installing python2.7/python3.4/python3.5 security updates on jessie/stretch
  • 08:34 moritzm: installing ca-certificates updates for jessie/stretch
  • 08:09 marostegui: Restart icinga T196336
  • 08:00 gehel: re-enabling puppet on maps1004
  • 07:31 elukey: move Piwik/Matomo from bohrium to matomo1001 - T202962
  • 07:25 godog: reformat ms-be1041 with crc=1 finobt=0 - T199198
  • 06:57 jynus: starting multisource replication of s3 from s5 at eqiad master
  • 06:51 jynus: reenabling consistency configuration on s5 replica databases
  • 06:24 jynus: create manual backup of databases on eqiad s6, s7, s8, x1
  • 05:36 marostegui: Deploy schema change on db2048 (s1 master) - T205913
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2062 (duration: 00m 56s)
  • 05:30 marostegui: Deploy schema change on db2062 - T205913
  • 05:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 00m 57s)
  • 04:04 SMalyshev: repooled wdqs2003
  • 03:22 SMalyshev: depool wdqs2003 to let it catch up
  • 03:21 SMalyshev: repooled wdqs2001
  • 03:16 ejegg: re-enabled PayPal EC orphan rectifier
  • 03:06 ejegg: updated CiviCRM from 80cb98e33e to ddf4865650
  • 02:43 SMalyshev: depooled wdqs2001 to see if it catches up faster
  • 01:54 ejegg: updated payments-wiki from 8b673cfb4f to d623de9494

2018-10-03

  • 23:54 mutante: scheduled downtime for wdqs as it's flapping and already known
  • 23:45 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081); ext.visualEditor.mwlanguage: Actually load all of the code (T205834) (duration: 00m 57s)
  • 23:41 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081) (duration: 00m 59s)
  • 23:29 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Hide copyvio AFC filter option behind flag (T205918) (duration: 00m 57s)
  • 23:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: Make UID clock drift error have more details (T94522) (duration: 00m 58s)
  • 23:20 XenoRyet: shut off Paypal orphan rectifier
  • 23:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump Minerva A/B test rates to 100% on jawiki, ruwiki, fawiki (T200792) (duration: 00m 56s)
  • 22:49 shdubsh: re-enable puppet on einsteinium
  • 22:45 shdubsh: einsteinium: setting enable_notifications=1 and reloading icinga
  • 22:36 herron: herron@neodymium:~$ sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 22:20 shdubsh: einsteinium: setting enable_notifications=0 and starting icinga
  • 22:06 herron: herron@neodymium:~$ sudo cumin -b 40 -p 95 'R:file = /etc/nagios/nrpe_local.cfg' run-puppet-agent
  • 22:02 mutante: mw2242 - started nagios-nrpe-server
  • 22:01 shdubsh: icinga stopped manually
  • 21:57 mutante: einsteinium - disabling puppet
  • 21:25 bblack: upgraded gdnsd to 2.99.9161 on authdns1001
  • 21:17 dduvall@deploy1001: Synchronized php: group1 wikis to 1.32.0-wmf.24 (duration: 00m 55s)
  • 21:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.24
  • 21:12 dduvall@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikibaseQualityConstraints/src/ServiceWiring.php: deploying fix to 1.32.0-wmf.24 for T206161 (duration: 00m 57s)
  • 20:28 marxarelli: deployed proposed WikibaseQualityConstraints fix and wikiversions bump for wikidatawiki to mwdebug1001 and mwdebug1002 for verification (T206161)
  • 20:18 robh: optic swap on cr4-ulsfo:et-0/0/1
  • 20:03 bblack: upgraded gdnsd to 2.99.9161 on multatuli
  • 19:40 bblack: upgraded gdnsd to 2.99.9161 on authdns2001
  • 19:35 bblack: uploaded 2.99.9161-beta-1+wmf1 to stretch-wikimedia
  • 19:33 mateusbs17: running initial osm import in maps1004
  • 19:23 dduvall@deploy1001: Synchronized php: rollback group1 to 1.32.0-wmf.23 (duration: 00m 54s)
  • 19:18 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback group1 to 1.32.0-wmf.23
  • 19:15 marxarelli: rolling back group1 after rapid rise in fatals
  • 19:14 dduvall@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:49 RoanKattouw: Deployed patches for T206130
  • 18:36 papaul: reinstalling OS on lvs2010
  • 18:16 mutante: lvs2010 - scheduled downtime for host and services for 12 hours for reinstall
  • 18:09 mutante: lvs2009 - schedule downtime in icinga for 4 hours, reinstall in progress
  • 18:08 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20 (duration: 00m 57s)
  • 18:07 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20
  • 18:07 XioNoX: disable ulsfo Zayo transit/transport links
  • 17:42 XioNoX: re-enable cr1-eqiad:ae1 - T201145
  • 17:28 XioNoX: start of recabling asw2-a-eqiad between asw and cr1 - T201145
  • 17:26 XioNoX: disable cr1-eqiad:ae1 - T201145
  • 17:10 papaul: reinstalling OS on lvs2009
  • 16:24 reedy@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/Flow/: fixup flow exporting T203424 (duration: 01m 03s)
  • 15:45 ejegg: updated fundraising CiviCRM from e3e1963915 to 80cb98e33e
  • 14:42 jynus: fixed some prometheus metrics grants on dbstore1001:3306, db1116:3317 and db1116:3318
  • 14:07 banyek: converting wikidatawiki.change_tag to TokuDB on host dbstore1002 (T205544)
  • 12:54 urandom: DROP unused RESTBase tables - T204752
  • 12:26 stephanebisson: Finished mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 12:12 stephanebisson: Starting mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 11:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don't purge articlequality, draftquality scores (T203286) (duration: 00m 57s)
  • 11:45 banyek: converting enwiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:42 pmiazga@deploy1001: Synchronized wmf-config: SWAT: Remove dead config relating to wgRelatedArticlesEnabledBucketSize (T202306) (duration: 00m 57s)
  • 11:38 arturo: downtime cloudcontrol1003,1004 for 2h for T203177
  • 11:30 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create eliminator group at Vietnamese Wikibooks (T202207) (duration: 00m 58s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix a typo in zhwikiversitys importsources definition (T201328) (duration: 00m 57s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Fix a typo in lift account creation cap for cswiki event (T206119) (duration: 00m 56s)
  • 10:41 jynus: start compressing dbstore1001:x1 tables
  • 09:26 jynus: reducing io overhead temporarily in exchange for crash safety for s5 replicas T184805
  • 09:23 jynus: fixing replication filters on dbstore1002 (again)
  • 08:34 jynus: fixing replication filters on dbstore1002
  • 08:18 jynus: starting importing of certain s3 wikis into eqiad s5 master T184805
  • 07:51 jynus: deploying replication filters to s5 at labsdb1009/10/11 and dbstore1002 T184805
  • 07:06 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607) (duration: 00m 28s)
  • 07:05 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607)
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2055 (duration: 00m 55s)
  • 06:37 marostegui: Deploy schema change on db2055 - T205913
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2055 (duration: 00m 56s)
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2085:3311 (duration: 00m 56s)
  • 05:59 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07) (duration: 03m 32s)
  • 05:57 marostegui: Deploy schema change on db2085:3311 - T205913
  • 05:56 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07)
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085:3311 (duration: 00m 58s)
  • 05:26 marostegui: Deploy schema change on db1067 (s1 eqiad master), lag will be generated - T205913
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 57s)
  • 05:24 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/languages/Language.php: T206030 - I985dfa3eb17 (duration: 00m 56s)
  • 05:21 marostegui: Deploy schema change on db1075 (s3 eqiad master), lag will be generated - T205913
  • 05:20 marostegui: Deploy schema change on db2070 - T205913
  • 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2070 (duration: 00m 56s)
  • 04:45 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/NavigationTiming: T205580 - I04c52658fbf6d (duration: 01m 03s)
  • 00:42 Amir1: Evening SWAT is done
  • 00:41 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 57s)
  • 00:38 mutante: icinga1001 (not prod yet), removing all icinga packages, running puppet to reinstall them, debugging dpkg issue
  • 00:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 55s)

2018-10-02

  • 23:54 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/i18n/en.json: SWAT: Align copyvio log terminology (T199359) (duration: 00m 56s)
  • 23:38 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/modules/ext.pageTriage.views.list/ext.pageTriage.listControlNav.underscore: SWAT: Hide copyvio, none afc filter options behind flag (T205918) (duration: 00m 56s)
  • 23:33 ejegg: updated fundraising CiviCRM from c353eba283 to e3e1963915
  • 23:26 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ORES/tests/phpunit/includes/HooksTest.php: SWAT: Disable RCFilters in tests (duration: 00m 54s)
  • 23:16 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Fix using the old index when new indexes are not there (T205904) (duration: 00m 57s)
  • 22:53 shdubsh: powercycling icinga1001 after removing problematic entry from fstab
  • 22:26 gtirloni: labstore2003 re-started service block_sync
  • 21:39 XioNoX: Fix unused vlans XLink1/2 on asw2-a5
  • 21:15 banyek: enabling puppet on es2001
  • 21:12 banyek: re-enabling and starting backups on host es2001 (T205257)
  • 21:01 gtirloni: labstore2003 stopped service block_sync
  • 20:15 dduvall@deploy1001: Finished scap: group0 to php-1.32.0-wmf.24 (duration: 33m 00s)
  • 20:04 Jeff_Green: authdns-update to deploy new IP for frbast2001.frack.eqiad.wmnet
  • 19:50 XioNoX: update prefix-list fundraising-codfw-internal4 to /24 on pfw3-codfw - T204271
  • 19:42 dduvall@deploy1001: Started scap: group0 to php-1.32.0-wmf.24
  • 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.32.0-wmf.19 (duration: 07m 25s)
  • 19:21 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 19:19 XioNoX: update fw policies on pfw3-codfw - T204271
  • 18:39 XioNoX: replace 10.195.0.73/29 with 10.195.0.65/28 on pfw3-codfw - T204271
  • 18:26 XioNoX: remove old 10.195.0.65/29 from pfw3-codfw - T204271
  • 18:24 jynus: restarting ferm on dbstore2002 T205257
  • 18:08 arlolra: Updated Parsoid to 65d6f82 (T163438, T205674, T205673)
  • 18:07 ariel@deploy1001: Finished deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting (duration: 00m 06s)
  • 18:07 ariel@deploy1001: Started deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting
  • 18:01 arlolra@deploy1001: Finished deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82 (duration: 10m 44s)
  • 17:51 arlolra@deploy1001: Started deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82
  • 17:37 XioNoX: update NAT for frbast2001 on pfw3-codfw - T204271
  • 17:25 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 17:22 XioNoX: update fw policies on pfw3-codfw - T204271
  • 17:22 andrewbogott: upgraded wikitech-static to remotes/origin/REL1_31
  • 17:18 andrewbogott: upgrading debian packages and MediaWiki version on wikitech-static
  • 16:53 jynus: setup test s3 replication channel on db1110 (filtered)
  • 16:49 XioNoX: assign 10.195.0.129/29 to pfw3-codfw:reth0.2133 - T204271
  • 16:38 cmjohnson1: swapping failed disk db1067 T205780
  • 16:04 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency (duration: 01m 06s)
  • 16:03 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency
  • 15:50 marxarelli: cutting 1.32.0-wmf.24 branch
  • 15:33 gehel: cleanup old cronjob (cleanup GC logs) on all elasticsearch servers
  • 15:24 akosiaris: upgrade mathoid chart version to 0.0.11
  • 15:24 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 15:21 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid upgrade -h [namespace: mathoid, clusters: eqiad,codfw]
  • 14:11 banyek: powering off dbstore2002.codfw.wmnet for BBU change (T205257)
  • 13:47 marostegui: Deploy schema change on s4 eqiad, this will generate lag on eqiad - T205913
  • 13:06 marostegui: Deploy schema change on s7 eqiad, this will generate lag on eqiad - T205913
  • 12:47 banyek: converting enwiki.content to TokuDB on host dbstore1002 (T205544)
  • 12:47 banyek: converting enwiki.contents to TokuDB on host dbstore1002 (T205544)
  • 11:58 banyek: converting wikidatawiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:41 arturo: downtime labstore1007 load check in icinga for 1d
  • 11:21 zeljkof: EU SWAT finished
  • 11:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Use proper index on change_tag table (T205904) (duration: 00m 57s)
  • 10:58 mobrovac@deploy1001: Synchronized rpc/RunSingleJob.php: RunSingleJob: Delay job execution while in read-only mode - T204154 (duration: 00m 57s)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2092 (duration: 00m 56s)
  • 10:24 marostegui: Deploy schema change on db2092 - T203709
  • 10:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2092 (duration: 00m 56s)
  • 09:30 marostegui: Deploy schema change on s2 eqiad master, lag will be generated T205913
  • 08:43 banyek: disabling puppet on es2001 and disabling backups too
  • 08:28 marostegui: Deploy schema change on s6 eqiad master, lag will be generated T205913
  • 08:16 jynus: test recover some s3 wiki data onto db1110 (s5)
  • 08:04 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 56s)
  • 08:04 marostegui: Deploy schema change on s5 eqiad master, lag will be generated T205913
  • 08:01 banyek: converting wikidatawiki.content to TokuDB on host dbstore1002 (T205544)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2071 (duration: 00m 55s)
  • 07:50 marostegui: Deploy schema change on db2071 T205913
  • 07:50 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy (duration: 00m 17s)
  • 07:49 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2071 (duration: 00m 56s)
  • 07:48 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy (duration: 00m 16s)
  • 07:48 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2088:3311 (duration: 00m 56s)
  • 07:36 marostegui: Deploy schema change on db2088:3311 T205913
  • 07:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2088:3311 (duration: 00m 55s)
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2072 (duration: 00m 55s)
  • 07:18 marostegui: Deploy schema change on db2072 T205913
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2072 (duration: 01m 02s)
  • 05:22 _joe_: stopped tilerator on maps1004, was spamming like crazy
  • 01:18 ejegg: updated CiviCRM from e7a620a00c to c353eba283

2018-10-01

  • 23:44 eileen: update process control revision is b9c7ab286e - define but not enable Redis
  • 23:43 foks: disabling 2FA for two users
  • 23:31 twentyafterfour: finished creating database tables
  • 23:18 twentyafterfour: creating ipblocks_restrictions table (command run on mwmaint2001: foreachwiki sql.php maintenance/archives/patch-ipblocks_restrictions-table.sql)
  • 22:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts (duration: 06m 22s)
  • 22:46 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts
  • 22:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts (duration: 03m 57s)
  • 22:41 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts
  • 22:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures (duration: 12m 27s)
  • 22:29 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures
  • 21:17 arlolra: Updated Parsoid to 224ecde (T198504, T133673, T202666)
  • 20:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde (duration: 08m 22s)
  • 20:37 arlolra@deploy1001: Started deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde
  • 20:35 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (duration: 14m 00s)
  • 20:21 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph
  • 19:52 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only) (duration: 00m 30s)
  • 19:51 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only)
  • 19:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040 (duration: 03m 38s)
  • 19:24 ppchelko@deploy1001: Started deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040
  • 19:11 thcipriani: restarting ci jenkins for new plugins
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 20% rate (T200792) (duration: 00m 56s)
  • 18:28 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=enwiki --prefix (T201009)
  • 18:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/maintenance/includes/DeleteLocalPasswords.php: T201009 (duration: 00m 56s)
  • 18:17 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Ensure valid AFC option is selected (T205324, T205168); hide copyvio behind a global var and URL param (duration: 00m 57s)
  • 18:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 5% rate (T200792) (duration: 00m 59s)
  • 17:59 XioNoX: push fw change on pfw3-eqiad - T205888
  • 17:57 XioNoX: push fw change on pfw3-codfw - T205888
  • 17:28 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates(wdqs1009) (duration: 01m 46s)
  • 17:27 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates(wdqs1009)
  • 17:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093, db1064 (duration: 00m 57s)
  • 17:02 jynus: stopping some mariadb instances on dbstore1001 and starting compression T201392
  • 16:26 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@58f9ed3]: Fix KafkaConsumer not connected error
  • 15:16 jynus: stopping db1064 to clone it to dbstore1001
  • 15:00 akosiaris: upgrade etherpad to 1.7.0-2
  • 14:14 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting MCR migration stage to write-both/read-new on mediawikiwiki (T198308) (duration: 00m 56s)
  • 13:51 banyek: Downtimed the slave lag monitoring on dbstore1002 while the tables are getting converted (T205544)
  • 12:38 akosiaris: upload hfst_3.13.0~r3461-1+wmf2 to apt.wikimedia.org/jessie-wikimedia/main. T199962
  • 12:26 banyek: converting enwiki.categorylinks to TokuDB on host dbstore1002 (T205544)
  • 12:19 banyek: stopping replication on s2@dbstore20002: the tables are being compressed (T204930)
  • 12:19 banyek: stopping replication on s2@dbstore20002: the tables are being compressed
  • 12:15 banyek: enabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 12:13 zeljkof: EU SWAT finished
  • 12:12 zfilipin@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/ContentTranslation/: SWAT: Fix error in CXTransclusionNode#afterRender method (T205521) (duration: 00m 59s)
  • 11:56 jynus: stopping db1093 to clone it to dbstore1001
  • 11:52 arturo: install prometheus-openstack-exporter 0.0.8-3 in reprepro T203177
  • 11:41 zfilipin@deploy1001: Synchronized wmf-config: SWAT: Remove unused default source language config for CX (duration: 00m 57s)
  • 11:16 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2058 (duration: 00m 55s)
  • 11:09 _joe_: killed bash runner.sh by user ladsgroup on mwmaint2001
  • 10:58 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2058 (duration: 00m 57s)
  • 10:52 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093, db1064 (duration: 00m 57s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:21 godog: repair /dev/sdf1 /dev/sde1 on ms-be1041 - T199198
  • 10:15 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --prefix on all CentralAuth wikis (T201009)
  • 10:10 Amir1: mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --delete (T201009)
  • 09:33 godog: test formatting sdh and sdi on ms-be2040 with crc=0 - T199198
  • 09:15 volans: Set Racktables in read-only mode - T199083
  • 08:56 _joe_: rolling restart of parsoid in codfw; afterwards, parsoid will connect to the MediaWiki API via HTTPS
  • 08:54 _joe_: rolling restart of parsoid in eqiad
  • 07:54 banyek: disabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 07:54 banyek: disabling puppet on labsdb1009, labsdb1010, labsdb1011
  • 07:00 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462) (duration: 00m 16s)
  • 07:00 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462)
  • 06:39 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462) (duration: 00m 19s)
  • 06:39 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462)
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 56s)
  • 05:19 marostegui: Stop replication on dbstore1002 and db1103:3312 in sync
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 01s)
  • 05:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b (duration: 03m 18s)
  • 05:15 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b
  • 05:07 marostegui: Deploy schema change on s1 codfw master - T203709
  • 03:21 onimisionipe: restarting in-place reindexing of enwiki and viwiki at codfw - T204362


Archives

See Server admin log/Archives.