You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Release Engineering/SAL
2016-02-02
- 00:42 legoktm: due to T125474
- 00:42 legoktm: marked integration-slave-precise-1011 as offline
- 00:39 legoktm: precise-1011 slave hasn't had a puppet run in 6 days
2016-02-01
- 23:53 bd808: Logstash working again; I applied a change to the default mapping template for Elasticsearch that ensures that fields named "timestamp" are indexed as plain strings
- 23:46 bd808: Elasticsearch index template for beta logstash cluster making crappy guesses about syslog events; dropped 2016-02-01 index; trying to fix default mappings
- 23:09 bd808: HHVM logs causing rejections during document parse when inserting in Elasticsearch from logstash. They contain a "timestamp" field that looks like "Feb 1 22:56:39" which is making the mapper in Elasticsearch sad.
- 23:04 bd808: Elasticsearch on deployment-logstash2 rejecting all documents with 400 status. Investigating
- 22:50 bd808: Copying deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log to /srv for debugging later
- 22:48 bd808: deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log is 11G of fail!
- 22:46 bd808: root partition on deployment-logstash2 full
- 22:43 bd808: No data in logstash since 2016-01-30T06:55:37.838Z; investigating
- 15:33 hashar: Image ci-jessie-wikimedia-1454339883 in wmflabs-eqiad is ready
- 15:01 hashar: Refreshing Nodepool image. Might have npm/grunt properly set up
- 03:15 legoktm: deploying https://gerrit.wikimedia.org/r/267630
2016-01-31
- 13:35 hashar: Jenkins IRC bot started failing at Jan 30 01:04:00 2016 for whatever reason... Should be fine now
- 13:33 hashar: cancelling/aborting jobs that are stuck while reporting to IRC (mostly browser tests and beta cluster jobs)
- 13:32 hashar: Jenkins jobs are being blocked because they can no longer report back to IRC :-(((
- 13:28 hashar: Jenkins jobs are being blocked because they can no longer report back to IRC :-(((
2016-01-30
- 12:46 hashar: integration-slave-jessie-1001: fixed puppet.conf server name and ran puppet
2016-01-29
- 18:43 thcipriani: updated scap on beta
- 16:44 thcipriani: deployed scap updates on beta
- 11:58 _joe_: upgraded hhvm to 3.6 wm8 in deployment-prep
2016-01-28
- 23:22 MaxSem: Updated portals on betalabs to master
- 22:23 hashar: salt '*slave-precise*' cmd.run 'apt-get install php5-ldap' ( https://phabricator.wikimedia.org/T124613 ) will need to be puppetized
- 18:17 thcipriani: cleaning npm cache on slave machines: salt -v '*slave*' cmd.run 'sudo -i -u jenkins-deploy -- npm cache clean'
- 18:12 thcipriani: running npm cache clean on integration-slave-precise-1011: sudo -i -u jenkins-deploy -- npm cache clean
- 15:25 hashar: apt-get upgrade deployment-sca01 and deployment-sca02
- 15:09 hashar: fixing puppet.conf hostname on deployment-upload deployment-conftool deployment-tmh01 deployment-zookeeper01 and deployment-urldownloader
- 15:06 hashar: fixing puppet.conf hostname on deployment-upload.deployment-prep.eqiad.wmflabs and running puppet
- 15:00 hashar: Running puppet on deployment-memc02 and deployment-elastic07. It is catching up with a lot of changes
- 14:59 hashar: fixing puppet hostnames on deployment-elastic07
- 14:59 hashar: fixing puppet hostnames on deployment-memc02
- 14:55 hashar: Deleted salt keys deployment-pdf01.eqiad.wmflabs and deployment-memc04.eqiad.wmflabs (obsolete, entries with '.deployment-prep.' are already there)
- 07:38 jzerebecki: reload zuul for 4951444..43a030b
- 05:55 jzerebecki: doing https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
- 03:49 mobrovac: deployment-prep re-enabled puppet on deployment-restbase0x
- 02:49 mobrovac: deployment-prep deployment-restbase01 disabled puppet to set up cassandra for
- 02:27 mobrovac: deployment-prep recreating deployment-restbase01 for T125003
- 02:23 mobrovac: deployment-prep deployment-restbase02 disabled puppet to recreate deployment-restbase01 for T125003
- 01:42 mobrovac: deployment-prep recreating deployment-sca02 for T125003
- 01:28 mobrovac: deployment-prep recreating deployment-sca01 for T125003
- 00:36 mobrovac: deployment-prep re-imaging deployment-mathoid for T125003
- 00:02 jzerebecki: integration-slave-trusty-1016:~$ sudo -i rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/Donate
2016-01-27
- 23:49 jzerebecki: integration-slave-precise-1011:~$ sudo -i /etc/init.d/salt-minion restart
- 23:46 jzerebecki: work around https://phabricator.wikimedia.org/T117710 : salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky'
- 21:19 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy)
- 10:29 hashar: triggered a bunch of browser tests, deployment-redis01 was dead/faulty
- 10:08 hashar: mass restarting redis-server process on deployment-redis01 (for https://phabricator.wikimedia.org/T124677 )
- 10:07 hashar: mass restarting redis-server process on deployment-redis01
- 09:00 hashar: beta: commenting out "latency-monitor-threshold 100" parameter from any /etc/redis/redis.conf we have ( https://phabricator.wikimedia.org/T124677 ). Puppet will not reapply it unless distribution is Jessie
2016-01-26
- 16:51 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
- 12:14 hashar: Added Jenkins IRC bot (wmf-insecte) to #wikimedia-perf for https://gerrit.wikimedia.org/r/#/c/265631/
- 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
- 04:18 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build (27 hours after the last time I did that)
2016-01-25
- 18:59 twentyafterfour: started redis-server on deployment-redis01 by commenting out latency-monitor-threshold from the redis.conf
- 15:22 hashar: CI: fixing kernels not upgrading via: rm /boot/grub/menu.lst ; update-grub -y (i.e.: regenerate the Grub menu from scratch)
- 14:21 hashar: integration-slave-trusty-1015.integration.eqiad.wmflabs is gone. I botched its kernel upgrade / grub update
- 01:35 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build
2016-01-24
- 06:45 legoktm: deploying https://gerrit.wikimedia.org/r/266039
- 06:13 legoktm: deploying https://gerrit.wikimedia.org/r/266041
2016-01-22
- 23:58 legoktm: removed skins from mwext-qunit workspace on trusty-1013 slave
- 23:34 legoktm: rm -rf /mnt/jenkins-workspace/workspace/mediawiki-phpunit-php53 on slave precise 1012
- 22:45 legoktm: deploying https://gerrit.wikimedia.org/r/265864
- 22:27 hashar: rebooted all CI slaves using OpenStackManager
- 22:09 hashar: rebooting deployment-redis01 (kernel upgrade)
- 21:22 hashar: Image ci-jessie-wikimedia-1453497269 in wmflabs-eqiad is ready (with node 4.2 for https://phabricator.wikimedia.org/T119143 )
- 21:14 hashar: updating nodepool snapshot based on new image
- 21:12 hashar: rebuilding nodepool reference image
- 20:04 hashar: Image ci-jessie-wikimedia-1453492820 in wmflabs-eqiad is ready
- 20:00 hashar: Refreshing nodepool image to hopefully get Nodejs 4.2.4 https://phabricator.wikimedia.org/T124447 https://gerrit.wikimedia.org/r/#/c/265802/
- 16:32 hashar: Nuked corrupted git repo on integration-slave-precise-1012 /mnt/jenkins-workspace/workspace/mediawiki-extensions-php53
- 12:23 hashar: beta: reinitialized keyholder on deployment-bastion. The proxy apparently had no identity
- 09:32 hashar: beta cluster Jenkins jobs have been stalled for 9 hours and 25 minutes. Disabling/reenabling the Gearman plugin to remove the deadlock
2016-01-21
- 21:41 hashar: restored role::mail::mx on deployment-mx
- 21:36 hashar: dropping role::mail::mx from deployment-mx to let puppet run
- 21:33 hashar: rebooting deployment-jobrunner01 / kernel upgrade / /tmp is only 1 MB
- 21:19 hashar: fixing up deployment-jobrunner01: /tmp and / disks are full
- 19:57 thcipriani: ran REPAIR TABLE globalnames; on centralauth db
- 19:48 legoktm: deploying https://gerrit.wikimedia.org/r/265552
- 19:39 legoktm: deploying jjb changes for https://gerrit.wikimedia.org/r/264990
- 19:25 legoktm: deploying https://gerrit.wikimedia.org/r/265546
- 01:59 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions/SpellingDictionary$ rm -r modules/jquery.uls && git rm modules/jquery.uls
- 01:00 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git pull && git submodule update --init --recursive
- 00:57 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git reset HEAD SpellingDictionary
2016-01-20
- 20:05 hashar: beta: sudo find /data/project/upload7/math -type f -delete (probably old leftovers)
- 19:50 hashar: beta: on commons ran deleteArchivedFile.php : Nuked 7130 files
- 19:49 hashar: beta: foreachwiki deleteArchivedRevisions.php -delete
- 19:26 hasharAway: Nuked all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload
- 19:19 hasharAway: beta: sudo find /data/project/upload7/*/*/temp -type f -delete
- 19:14 hasharAway: beta: sudo rm /data/project/upload7/*/*/lockdir/*
- 18:57 hasharAway: beta cluster code has been stalled for roughly 2h30
- 18:55 hasharAway: disconnecting Gearman plugin to remove deadlock for beta cluster jobs
- 17:06 hashar: clearing files from beta-cluster to prepare for Swift migration. python pwb.py delete.py -family:betacommons -lang:en -cat:'GWToolset Batch Upload' -verbose -putthrottle:0 -summary:'Clearing out old batched upload to save up disk space for Swift migration'
2016-01-19
- 22:25 legoktm: deleting *zend* workspaces on precise slaves
- 21:58 thcipriani: trying https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update again
- 21:57 thcipriani: beta-scap-eqiad still can't find executor on deployment-bastion.eqiad
- 21:52 thcipriani: following steps at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update for deployment-bastion
- 19:34 legoktm: deleting all *zend* jobs from jenkins
- 09:40 hashar: Created github repo https://github.com/wikimedia/operations-debs-varnish4
- 03:59 legoktm: deploying https://gerrit.wikimedia.org/r/264912 and https://gerrit.wikimedia.org/r/264922
2016-01-17
- 18:02 legoktm: deploying https://gerrit.wikimedia.org/r/264605
2016-01-16
- 21:47 legoktm: deploying https://gerrit.wikimedia.org/r/264489
- 21:36 legoktm: deploying https://gerrit.wikimedia.org/r/264488
- 21:29 legoktm: deploying https://gerrit.wikimedia.org/r/264487
- 21:21 legoktm: deploying https://gerrit.wikimedia.org/r/264483 https://gerrit.wikimedia.org/r/264485
- 20:58 legoktm: deploying https://gerrit.wikimedia.org/r/264492
- 18:55 jzerebecki: reloading zuul for 996c558..5f8eb50
- 09:12 legoktm: deploying https://gerrit.wikimedia.org/r/264448
- 09:01 legoktm: deploying https://gerrit.wikimedia.org/r/264446 and https://gerrit.wikimedia.org/r/264447
- 07:46 legoktm: sudo -u jenkins-deploy mv /mnt/jenkins-workspace/workspace/mediawiki-core-phplint /mnt/jenkins-workspace/workspace/mediawiki-core-php53lint on all precise slaves
- 07:17 legoktm: deploying https://gerrit.wikimedia.org/r/264444
- 06:31 legoktm: deploying https://gerrit.wikimedia.org/r/264441
- 06:10 legoktm: added phpflavor-php53 label to all phpflavor-zend slaves
2016-01-15
- 12:17 hashar: restarting Jenkins for plugins updates
- 02:49 bd808: Trying to fix submodules in deployment-bastion:/srv/mediawiki-staging/php-master/extensions for T123701
2016-01-14
- 20:06 legoktm: deploying https://gerrit.wikimedia.org/r/264122
- 19:32 legoktm: deploying https://gerrit.wikimedia.org/r/264114
- 19:18 legoktm: deploying https://gerrit.wikimedia.org/r/264108
2016-01-13
- 21:06 hashar: beta cluster code is up to date again. Got delayed by roughly 4 hours.
- 20:55 hashar: unlocked Jenkins jobs for beta cluster by disabling/reenabling Jenkins Gearman client
- 10:15 hashar: beta: fixed puppet on deployment-elastic06. Was still using cert/hostname without .deployment-prep. Mass update occurring.
2016-01-12
- 23:30 legoktm: deploying https://gerrit.wikimedia.org/r/263757 https://gerrit.wikimedia.org/r/263756
- 13:32 hashar: beta cluster: running /usr/local/sbin/cleanup-pam-config
- 13:29 hashar: integration: running /usr/local/sbin/cleanup-pam-config on slaves
2016-01-11
- 22:24 hashar: Deleting old references on Zuul-merger for mediawiki/core : /usr/share/python/zuul/bin/python /home/hashar/zuul-clear-refs.py --until 15 /srv/ssd/zuul/git/mediawiki/core
- 22:21 hashar: gallium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
- 22:21 hashar: scandium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
- 07:35 legoktm: deploying https://gerrit.wikimedia.org/r/263319
2016-01-07
- 23:16 legoktm: deleted /mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/PdfHandler/.git/refs/heads/wmf/1.26wmf16.lock on slave 1013
- 06:32 legoktm: deploying https://gerrit.wikimedia.org/r/262868
- 02:24 legoktm: deploying https://gerrit.wikimedia.org/r/262855
- 01:25 jzerebecki: reloading zuul for b0a5335..c16368a
2016-01-06
- 21:13 thcipriani: kicking integration puppetmaster, weird node unable to find definition.
- 21:11 jzerebecki: on scandium: sudo -u zuul rm -rf /srv/ssd/zuul/git/mediawiki/services/mathoid
- 21:04 legoktm: ^ on gallium
- 21:04 legoktm: manually deleted /srv/ssd/zuul/git/mediawiki/services/mathoid to force zuul to re-clone it
- 20:17 hashar: beta: dropped a few more /etc/apt/apt.conf.d/*-proxy files. webproxy is no longer reachable from labs
- 09:44 hashar: CI/beta: deleting all git tags from /var/lib/git/operations/puppet and doing git repack
- 09:39 hashar: restoring puppet hacks on beta cluster puppetmaster.
- 09:35 hashar: beta/CI: salt -v '*' cmd.run 'rm -v /etc/apt/apt.conf.d/*-proxy' https://phabricator.wikimedia.org/T122953
2016-01-05
- 16:54 hashar_: Removed Elasticsearch from CI slaves https://phabricator.wikimedia.org/T89083 https://gerrit.wikimedia.org/r/#/c/259301/
- 03:45 Krinkle: integration-slave-trusty-1015: rm -rf /mnt/home/jenkins-deploy/.npm per https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/56577/console
2016-01-04
- 21:06 hashar: gallium has puppet enabled again
- 20:53 hashar: stopping puppet on gallium and live hacking Zuul configuration for https://phabricator.wikimedia.org/T122656
2016-01-02
- 03:17 yurik: purged varnishes on deployment-cache-text04
2016-01-01
- 22:17 bd808: No nodepool ci-jessie-* hosts seen in Jenkins interface and rake-jessie jobs backing up
Archive
- Archive 1 (September 2014 - December 2015)