Server Admin Log

Revision as of 20:24, 5 January 2019 by imported>Elukey (→‎2019-01-05)

2019-01-05

  • 20:23 elukey: manual cleanup of big logs under /var/log/.. on analytics-tool1002 because the root partition was almost full

2019-01-04

  • 23:07 mutante: scandium: apt-get remove nodejs nodejs-legacy ; puppet agent -tv - after merging gerrit:482150 this fixed the "you have held broken packages" issue; now we are at a puppet dependency cycle with apt::pin T201366
  • 15:42 bawolff@deploy1001: Synchronized private/PrivateSettings.php: T212667 - More aggressive anti-spam measures for account creation on kowiki (duration: 00m 48s)
  • 14:08 moritzm: rebooting etcd1001-1003 to pick up SSBD-enabled qemu
  • 13:52 moritzm: rebooting etcd1004-1006 to pick up SSBD-enabled qemu
  • 13:33 moritzm: rebooting kubernetes staging etcd hosts to pick up SSBD-enabled qemu
  • 13:11 moritzm: rebooting kubernetes staging master to pick up SSBD-enabled qemu
  • 12:57 moritzm: rebooting kubernetes staging workers for kernel security update
  • 11:58 moritzm: installing libsndfile security updates
  • 11:33 moritzm: installing jasper security updates
  • 11:31 moritzm: installing libdatetime-timezone-perl updates for recent tz changes
  • 10:47 arturo: T212898 reimaging cloudvirt1024 as stretch
  • 10:46 moritzm: rolling restart of swift proxies to pick up OpenSSL update
  • 09:57 jijiki: restarting thumbor services to pick up 481141
  • 09:50 onimisionipe: restarting nginx on all wdqs hosts
  • 09:40 banyek: executing schema change on dbstore1002 - T85757
  • 09:13 moritzm: restarting nginx on puppetdb hosts to pick up new OpenSSL
  • 09:03 banyek: executing schema change on db1116 - T85757
  • 08:44 moritzm: restarting nginx on francium to pick up new OpenSSL
  • 08:16 elukey: restart eventlogging daemons on eventlog1002 to pick up openssl updates
  • 07:56 moritzm: installing OpenSSL security updates
  • 00:07 mutante: an-coord1001 - apt-get clean to free disk space, reacting to Icinga alert for running out of disk

2019-01-03

  • 23:08 volans: restarted pdfrender on scb1004
  • 22:29 volans: restarted all slaves on dbstore1002 (relayed from banyek)
  • 22:14 banyek: stopping all slaves on dbstore1002 (NOT labsdb)
  • 22:14 banyek: stopping all slaves on labsdb1002
  • 20:50 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: Fix error for testcommons (duration: 00m 44s)
  • 20:46 reedy@deploy1001: Synchronized dblists/group0.dblist: Add testcommonswiki to group0 (duration: 00m 43s)
  • 20:43 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 02m 05s)
  • 20:24 reedy@deploy1001: Synchronized wmf-config/db-codfw.php: T197616 (duration: 00m 44s)
  • 20:23 reedy@deploy1001: Synchronized wmf-config/db-eqiad.php: T197616 (duration: 00m 44s)
  • 20:13 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T197616 (duration: 00m 44s)
  • 20:12 reedy@deploy1001: Synchronized multiversion/MWMultiVersion.php: T197616 (duration: 00m 44s)
  • 20:11 reedy@deploy1001: rebuilt and synchronized wikiversions files: T197616
  • 20:09 reedy@deploy1001: Synchronized dblists/: T197616 (duration: 00m 45s)
  • 18:51 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2 (duration: 05m 27s)
  • 18:46 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@1182b3b]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests, part 2
  • 18:37 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests (duration: 04m 11s)
  • 18:33 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@c470ed2]: Update mobileapps to f6ad0e5: Set timeout for backend /page/html requests
  • 18:21 volans: restart pdfrender on scb1003
  • 17:58 ariel@deploy1001: Finished deploy [dumps/dumps@10dc8ad]: return properly if commands failed (duration: 00m 08s)
  • 17:58 ariel@deploy1001: Started deploy [dumps/dumps@10dc8ad]: return properly if commands failed
  • 16:32 XioNoX: remove old 10.64.22.0/24 IPs from cloud-instance-transport1-b-eqiad - T207663
  • 16:22 moritzm: rebooting kubernetes workers in eqiad for kernel security update
  • 16:02 arturo: reimaging cloudvirt1013 cloudvirt1026-1028 to stretch
  • 15:48 moritzm: restart parsoid on wtp1025 to pick up OpenSSL update for nodejs
  • 15:43 jijiki: Enabled puppet on mw servers after merging 481796 - T197616
  • 15:31 jijiki: Disabling puppet on mw servers to test 481796 - T197616
  • 15:14 ejegg: updated Fundraising CiviCRM from b33dcd3c94 to bcb4b7a7d1
  • 14:37 moritzm: rebooting kubernetes workers in codfw for kernel security update
  • 14:37 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1101:3317 after schema change - T85757 (duration: 00m 44s)
  • 14:32 banyek: repooling db1101:3317 after schema change - T85757
  • 14:21 moritzm: rebooting kubernetes masters in eqiad to pick up SSBD-enabled qemu
  • 14:14 moritzm: rebooting kubernetes masters in codfw to pick up SSBD-enabled qemu
  • 14:05 arturo: T209616 reimage cloudvirt1029 as debian stretch
  • 13:43 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1101:3317 for schema change - T85757 (duration: 00m 44s)
  • 13:41 banyek: depooling db1101:3317 for schema change - T85757
  • 13:38 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1098:3317 after schema change - T85757 (duration: 00m 44s)
  • 13:34 banyek: repooling db1098:3317 after schema change - T85757
  • 13:24 kartik@deploy1001: Finished deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18 (duration: 04m 30s)
  • 13:20 kartik@deploy1001: Started deploy [cxserver/deploy@3b2ede7]: Update cxserver to 2369a18
  • 12:58 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1098:3317 for schema change - T85757 (duration: 00m 45s)
  • 12:55 banyek: depooling db1098:3317 for schema change - T85757
  • 12:54 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1094 after schema change - T85757 (duration: 00m 45s)
  • 12:49 banyek: repooling db1094 after schema change - T85757
  • 12:41 arturo: T212302 reimaging cloudvirt1030 again to test final puppet code
  • 12:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1094 for schema change - T85757 (duration: 00m 46s)
  • 12:28 banyek: depooling db1094 for schema change - T85757
  • 12:27 moritzm: restarting tor on torrelay1001 to pick up OpenSSL security update
  • 11:02 _joe_: manually reloading icinga to pick up changes to commands.cfg
  • 10:55 moritzm: installing apache updates on puppetmasters
  • 10:22 moritzm: installing ghostscript security updates on jessie
  • 09:51 elukey: restart memcached on mc1023 to apply -R 200 - T208844
  • 09:46 moritzm: remove imagemagick remnants from ATS hosts (obsoleted by upstream packaging change which dropped the webp plugin)
  • 09:39 moritzm: installing nginx updates on puppetdb*
  • 09:26 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: repool es2019 - T212833 (duration: 01m 33s)
  • 09:18 banyek: repooling es2019 - T212833
  • 08:46 moritzm: rolling restart of proton to pick up OpenSSL update
  • 08:35 banyek: depooled es2019 as host was unresponsive - T212833
  • 08:35 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: depool es2019, host is unresponsive - T212833 (duration: 00m 49s)
  • 08:11 moritzm: installing OpenSSL security updates
  • 00:21 mutante: notebook1004 - started nagios-nrpe-server one more time

2019-01-02

  • 23:59 mutante: notebook1004 keeps running out of memory due to some user actions; that kills nagios-nrpe-server, which causes a bunch of Icinga alerts
  • 23:39 mutante: notebook1004 - systemctl start nagios-nrpe-server
  • 23:39 mutante: notebook1004 - systemctl status nagios-nrpe-server
  • 20:59 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,service=parsoid,name=wtp1028.eqiad.wmnet
  • 20:59 herron: repooling wtp1028 T212624
  • 20:52 herron: rebooting wtp1028 — looking for POST errors T212624
  • 20:05 Krinkle: mwmaint1002: foreachwikiindblist s5 deleteEqualMessages.php
  • 20:04 Krinkle: mwmaint1002: foreachwikiindblist s2 deleteEqualMessages.php
  • 18:35 volans: restarting icinga on icinga1001 T212669
  • 16:50 XioNoX: create BGP sessions to AS3214 in AMS-IX
  • 16:46 XioNoX: remove BGP sessions to AS42949 in AMS-IX (leaving the IX)
  • 16:43 XioNoX: remove BGP sessions to AS6866 in AMS-IX (leaving the IX)
  • 16:33 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1090:3317 after schema change - T85757 (duration: 00m 46s)
  • 16:30 arturo: reimaging cloudvirt1030 with stretch, server cleanup after puppet refactoring
  • 16:29 moritzm: restarting Superset to pick up openssl security update
  • 16:25 moritzm: restarting Hue to pick up openssl security update
  • 16:23 arturo: T212302 re-enable puppet in all {cloud,lab}virt* servers, all was fine
  • 16:22 banyek: repooling db1090:3317 after schema change (T85757)
  • 16:11 arturo: T212302 disable puppet in all {cloud,lab}virt* servers to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481194/
  • 15:39 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1090:3317 for schema change - T85757 (duration: 00m 44s)
  • 15:34 moritzm: installing OpenSSL security updates
  • 15:31 banyek: depooling db1090:3317 for schema change (T85757)
  • 15:13 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: repool db1086 after schema change - T85757 (duration: 00m 44s)
  • 15:07 banyek: repooling db1086 after schema change (T85757)
  • 14:49 banyek: executing schema change on db1086 - T85757
  • 14:48 moritzm: installing ghostscript security update for jessie
  • 14:47 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: depool db1086 for schema change - T85757 (duration: 00m 45s)
  • 14:38 banyek: depooling db1086 for schema change (T85757)
  • 14:15 ema: cp hosts: upgrade OpenSSL from 1.1.0f to 1.1.0j
  • 13:39 moritzm: installing ghostscript update for stretch
  • 13:33 moritzm: installing libav security updates
  • 13:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1119 T86338 T202167 (duration: 00m 44s)
  • 13:17 moritzm: installing openjpeg2 security updates
  • 13:17 banyek: executing schema change on db2040 (s7 codfw master); replication lag can be expected on codfw - T85757
  • 13:13 banyek: stopping replication on db2077 prior to executing schema change on codfw s7 master (db2040) - T85757
  • 13:06 marostegui: Deploy schema change on db1119 - T86338 T202167
  • 13:05 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1119 T86338 T202167 (duration: 00m 45s)
  • 13:01 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3311 T86338 T202167 (duration: 00m 47s)
  • 12:00 moritzm: rebooting labtestpuppetmaster2001 for kernel security update
  • 11:53 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe1006.eqiad.wmnet
  • 11:51 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.eqiad.wmnet
  • 11:50 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe1006.codfw.wmnet
  • 11:46 ema: replace TLS certificates on ms-fe eqiad hosts T212215
  • 11:41 moritzm: rebooting labtestweb2001 for kernel security update
  • 11:24 marostegui: Deploy schema change on db1099:3311 - T86338 T202167
  • 11:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1099:3311 T86338 T202167 (duration: 00m 45s)
  • 11:17 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2006.codfw.wmnet
  • 11:10 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=ms-fe2006.codfw.wmnet
  • 10:59 ema: replace TLS certificates on ms-fe codfw hosts T212215
  • 10:52 moritzm: rebooting centrallog1001 for kernel security update
  • 10:48 volans: testing the new spicerack package on cumin2001, in the unlikely event you need to use spicerack cookbooks today please use cumin1001
  • 10:45 godog: ms-be2018 Flashing Smart Array P840 in Slot 3 [ 3.00 -> 6.60 ]
  • 10:43 moritzm: removed labvirt1013 from debmonitor, got renamed in T212513
  • 10:42 volans: uploaded spicerack_0.0.10-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 10:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2096 (duration: 00m 44s)
  • 09:50 marostegui: Stop MySQL on db2096 for kernel and mysql upgrade
  • 09:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2096 (duration: 00m 45s)
  • 09:48 marostegui@deploy1001: sync-file aborted: Depool db2096 (duration: 00m 01s)
  • 09:18 moritzm: installing c3p0 security updates
  • 09:07 Zoranzoki21: Drop valid_tag from s8 by Marostegui - T212254
  • 09:06 godog: eqiad-prod: final weight for ms-be10[44-50].eqiad.wmnet - T209618
  • 08:56 moritzm: installing libarchive security updates
  • 07:38 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1078 - T212692 (duration: 00m 46s)
  • 07:30 marostegui: Fix login.logging table on db1078 - T212692
  • 07:30 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1078 - T212692 (duration: 00m 47s)
  • 07:01 marostegui: Deploy schema change on s1 codfw master (lag will be generated on s1 codfw) - T202167 T86338
  • 06:54 marostegui: Drop empty valid_tag table from labswiki labtestwiki - T212254
  • 06:49 marostegui: Drop empty valid_tag table from s5 - T212254
  • 06:25 marostegui: Drop valid_tag from s6 - T212254
  • 06:15 marostegui: Fix last chunks on db1124:338 - T212574


Archives

See Server admin log/Archives.