Release Engineering/SAL: Difference between revisions

imported>Labslogbot
(updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (cscott))
imported>Labslogbot
(deployment-prep recreating deployment-sca01 for T125003 (mobrovac))

Revision as of 01:28, 28 January 2016

2016-01-28

  • 01:28 mobrovac: deployment-prep recreating deployment-sca01 for T125003
  • 00:36 mobrovac: deployment-prep re-imaging deployment-mathoid for T125003
  • 00:02 jzerebecki: integration-slave-trusty-1016:~$ sudo -i rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/Donate

2016-01-27

  • 23:49 jzerebecki: integration-slave-precise-1011:~$ sudo -i /etc/init.d/salt-minion restart
  • 23:46 jzerebecki: work around https://phabricator.wikimedia.org/T117710 : salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky'
  • 21:19 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy)
  • 10:29 hashar: triggered a bunch of browser tests; deployment-redis01 was dead/faulty
  • 10:08 hashar: mass restarting redis-server process on deployment-redis01 (for https://phabricator.wikimedia.org/T124677 )
  • 10:07 hashar: mass restarting redis-server process on deployment-redis01
  • 09:00 hashar: beta: commenting out "latency-monitor-threshold 100" parameter from any /etc/redis/redis.conf we have ( https://phabricator.wikimedia.org/T124677 ). Puppet will not reapply it unless distribution is Jessie

2016-01-26

  • 16:51 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
  • 12:14 hashar: Added Jenkins IRC bot (wmf-insecte) to #wikimedia-perf for https://gerrit.wikimedia.org/r/#/c/265631/
  • 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
  • 04:18 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build (27 hours after the last time I did that)

2016-01-25

  • 18:59 twentyafterfour: started redis-server on deployment-redis01 by commenting out latency-monitor-threshold from the redis.conf
  • 15:22 hashar: CI: fixing kernels not upgrading via: rm /boot/grub/menu.lst ; update-grub -y (i.e.: regenerate the Grub menu from scratch)
  • 14:21 hashar: integration-slave-trusty-1015.integration.eqiad.wmflabs is gone. I have failed the kernel upgrade / grub update
  • 01:35 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build

2016-01-24

2016-01-22

  • 23:58 legoktm: removed skins from mwext-qunit workspace on trusty-1013 slave
  • 23:34 legoktm: rm -rf /mnt/jenkins-workspace/workspace/mediawiki-phpunit-php53 on slave precise 1012
  • 22:45 legoktm: deploying https://gerrit.wikimedia.org/r/265864
  • 22:27 hashar: rebooted all CI slaves using OpenStackManager
  • 22:09 hashar: rebooting deployment-redis01 (kernel upgrade)
  • 21:22 hashar: Image ci-jessie-wikimedia-1453497269 in wmflabs-eqiad is ready (with node 4.2 for https://phabricator.wikimedia.org/T119143 )
  • 21:14 hashar: updating nodepool snapshot based on new image
  • 21:12 hashar: rebuilding nodepool reference image
  • 20:04 hashar: Image ci-jessie-wikimedia-1453492820 in wmflabs-eqiad is ready
  • 20:00 hashar: Refreshing nodepool image to hopefully get Nodejs 4.2.4 https://phabricator.wikimedia.org/T124447 https://gerrit.wikimedia.org/r/#/c/265802/
  • 16:32 hashar: Nuked corrupted git repo on integration-slave-precise-1012 /mnt/jenkins-workspace/workspace/mediawiki-extensions-php53
  • 12:23 hashar: beta: reinitialized keyholder on deployment-bastion. The proxy apparently had no identity
  • 09:32 hashar: beta cluster Jenkins jobs have been stalled for 9 hours and 25 minutes. Disabling/reenabling the Gearman plugin to remove the deadlock

2016-01-21

  • 21:41 hashar: restored role::mail::mx on deployment-mx
  • 21:36 hashar: dropping role::mail::mx from deployment-mx to let puppet run
  • 21:33 hashar: rebooting deployment-jobrunner01 / kernel upgrade / /tmp is only 1 MByte
  • 21:19 hashar: fixing up deployment-jobrunner01: the /tmp and / disks are full
  • 19:57 thcipriani: ran REPAIR TABLE globalnames; on centralauth db
  • 19:48 legoktm: deploying https://gerrit.wikimedia.org/r/265552
  • 19:39 legoktm: deploying jjb changes for https://gerrit.wikimedia.org/r/264990
  • 19:25 legoktm: deploying https://gerrit.wikimedia.org/r/265546
  • 01:59 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions/SpellingDictionary$ rm -r modules/jquery.uls && git rm modules/jquery.uls
  • 01:00 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git pull && git submodule update --init --recursive
  • 00:57 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git reset HEAD SpellingDictionary

2016-01-20

  • 20:05 hashar: beta sudo find /data/project/upload7/math -type f -delete (probably some old left over)
  • 19:50 hashar: beta: on commons ran deleteArchivedFile.php : Nuked 7130 files
  • 19:49 hashar: beta : foreachwiki deleteArchivedRevisions.php -delete
  • 19:26 hasharAway: Nuked all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload
  • 19:19 hasharAway: beta: sudo find /data/project/upload7/*/*/temp -type f -delete
  • 19:14 hasharAway: beta: sudo rm /data/project/upload7/*/*/lockdir/*
  • 18:57 hasharAway: beta cluster code has been stalled for roughly 2h30
  • 18:55 hasharAway: disconnecting Gearman plugin to remove deadlock for beta cluster jobs
  • 17:06 hashar: clearing files from beta-cluster to prepare for Swift migration. python pwb.py delete.py -family:betacommons -lang:en -cat:'GWToolset Batch Upload' -verbose -putthrottle:0 -summary:'Clearing out old batched upload to save up disk space for Swift migration'

2016-01-19

2016-01-17

2016-01-16

2016-01-15

  • 12:17 hashar: restarting Jenkins for plugins updates
  • 02:49 bd808: Trying to fix submodules in deployment-bastion:/srv/mediawiki-staging/php-master/extensions for T123701

2016-01-14

2016-01-13

  • 21:06 hashar: beta cluster code is up to date again. Got delayed by roughly 4 hours.
  • 20:55 hashar: unlocked Jenkins jobs for beta cluster by disabling/reenabling Jenkins Gearman client
  • 10:15 hashar: beta: fixed puppet on deployment-elastic06 . Was still using cert/hostname without .deployment-prep. .... Mass update occurring.

2016-01-12

2016-01-11

  • 22:24 hashar: Deleting old references on Zuul-merger for mediawiki/core : /usr/share/python/zuul/bin/python /home/hashar/zuul-clear-refs.py --until 15 /srv/ssd/zuul/git/mediawiki/core
  • 22:21 hashar: gallium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
  • 22:21 hashar: scandium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
  • 07:35 legoktm: deploying https://gerrit.wikimedia.org/r/263319

2016-01-07

2016-01-06

  • 21:13 thcipriani: kicking integration puppetmaster, weird node unable to find definition.
  • 21:11 jzerebecki: on scandium: sudo -u zuul rm -rf /srv/ssd/zuul/git/mediawiki/services/mathoid
  • 21:04 legoktm: ^ on gallium
  • 21:04 legoktm: manually deleted /srv/ssd/zuul/git/mediawiki/services/mathoid to force zuul to re-clone it
  • 20:17 hashar: beta: dropped a few more /etc/apt/apt.conf.d/*-proxy files. webproxy is no longer reachable from labs
  • 09:44 hashar: CI/beta: deleting all git tags from /var/lib/git/operations/puppet and doing git repack
  • 09:39 hashar: restoring puppet hacks on beta cluster puppetmaster.
  • 09:35 hashar: beta/CI: salt -v '*' cmd.run 'rm -v /etc/apt/apt.conf.d/*-proxy' https://phabricator.wikimedia.org/T122953

2016-01-05

2016-01-04

2016-01-02

  • 03:17 yurik: purged varnishes on deployment-cache-text04

2016-01-01

  • 22:17 bd808: No nodepool ci-jessie-* hosts seen in Jenkins interface and rake-jessie jobs backing up

Archive