Release Engineering/SAL

Revision as of 21:55, 21 March 2016

2016-03-21

  • 21:55 hashar: zuul: almost all MediaWiki extensions migrated to run the npm job on Nodepool (with Node.js 4.3) T119143. All tested. Will monitor the overnight build results tomorrow
  • 20:28 hashar: Mass running npm-node-4.3 jobs against MediaWiki extensions to make sure they all pass ( https://gerrit.wikimedia.org/r/#/c/278004/ | T119143 )
  • 17:40 elukey: executed git rebase --interactive on deployment-puppetmaster.deployment-prep.eqiad.wmflabs to remove https://gerrit.wikimedia.org/r/#/c/278713/
  • 15:46 elukey: manually hacked the cdh puppet submodule on deployment-puppetmaster.deployment-prep.eqiad.wmflabs - please let me know if it interferes with anybody's tests
  • 14:24 elukey: executed git submodule update --init on deployment-puppetmaster.deployment-prep.eqiad.wmflabs
  • 11:25 elukey: beta: cherry-picked https://gerrit.wikimedia.org/r/#/c/278713/ to test an update to the cdh module (analytics); a sketch of this cherry-pick workflow follows this list
  • 11:13 hashar: beta: rebased the puppet master, which had a conflict on https://gerrit.wikimedia.org/r/#/c/274711/ which got merged in the meantime (saves Elukey)
  • 11:02 hashar: beta: added Elukey (wikimedia ops) to the project as member and admin
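
The cherry-pick and submodule entries above (11:25, 14:24, 15:46, 17:40) follow a recurring puppetmaster pattern. Below is a minimal sketch of it, assuming the /var/lib/git/operations/puppet checkout named elsewhere in this log; the patchset number and the explicit remote URL are illustrative rather than taken from the entries.

    # on deployment-puppetmaster: fetch and cherry-pick the change under test
    cd /var/lib/git/operations/puppet
    git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/13/278713/1
    git cherry-pick FETCH_HEAD
    # pull in the submodule revisions the change refers to (e.g. the cdh module)
    git submodule update --init
    # once the change is merged or abandoned upstream, drop the local copy again
    git rebase --interactive origin/production   # remove the cherry-picked commit's line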

2016-03-19

  • 13:04 hashar: Jenkins: added ldap-labs-codfw.wikimedia.org as a fallback LDAP server T130446

2016-03-18

  • 17:16 jzerebecki: reloading zuul for e33494f..89a9659

2016-03-17

  • 21:10 thcipriani: updating scap on deployment-tin to test D133
  • 18:31 cscott: updated OCG to version c1a8232594fe846bd2374efd8f7c20d7e97ac449
  • 09:34 hashar: deployment-jobrunner01 deleted /var/log/apache/*.gz T130179
  • 09:04 hashar: Upgrading hhvm and related extensions on jobrunner01 T130179

2016-03-16

2016-03-15

  • 15:17 jzerebecki: added wikidata.beta.wmflabs.org in https://wikitech.wikimedia.org/wiki/Special:NovaAddress to deployment-cache-text04.deployment-prep.eqiad.wmflabs
  • 14:19 hashar: Image ci-jessie-wikimedia-1458051246 in wmflabs-eqiad is ready T124447
  • 14:14 hashar: Refreshing Nodepool snapshot images so they get a fresh copy of slave-scripts T124447 (a snapshot-refresh sketch follows this list)
  • 14:08 hashar: Deploying slave script change https://gerrit.wikimedia.org/r/#/c/277508/ "npm-install-dev.py: Use config.dev.yaml instead of config.yaml" for T124447
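
The 14:08-14:19 entries above show the usual deploy-then-refresh cycle for Nodepool. The sketch below assumes the Nodepool 0.1.x CLI in use at the time and reuses the provider and image names that appear in this log; treat the exact subcommands and arguments as assumptions.

    # rebuild the snapshot image on the labs provider so new instances include
    # the freshly deployed slave-scripts
    nodepool image-update wmflabs-eqiad ci-jessie-wikimedia
    # wait for the new snapshot to be marked ready
    nodepool image-list
    # delete already-used instances so replacements boot from the new snapshot
    nodepool list
    nodepool delete <node-id>    # <node-id> is a placeholder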

2016-03-14

  • 22:18 greg-g: new jobs weren't processing in Zuul, lego fixed it and blamed Reedy
  • 20:13 hashar: Updating Jenkins jobs mwext-Wikibase-* so they no longer rely on --with-phpunit ( ping @hoo https://gerrit.wikimedia.org/r/#/c/277330/ )
  • 17:03 Krinkle: Doing full Zuul restart due to deadlock (T128569)
  • 10:18 moritzm: re-enabled systemd unit for logstash on deployment-logstash2

2016-03-11

  • 22:42 legoktm: deploying https://gerrit.wikimedia.org/r/276901
  • 19:41 legoktm: legoktm@integration-slave-trusty-1001:/mnt/jenkins-workspace/workspace$ sudo rm -rf mwext-Echo-testextension-* # because it was broken

2016-03-10

  • 20:22 hashar: Nodepool Image ci-jessie-wikimedia-1457641052 in wmflabs-eqiad is ready
  • 20:19 hashar: Refreshing Nodepool to include the 'varnish' package T128188
  • 20:05 hashar: apt-get upgrade integration-slave-jessie1001 (bring in ffmpeg update and nodejs among other things)
  • 12:22 hashar: Nodepool Image ci-jessie-wikimedia-1457612269 in wmflabs-eqiad is ready
  • 12:18 hashar: Nodepool: rebuilding image to get mathoid/graphoid packages included (hopefully) T119693 T128280

2016-03-09

  • 17:56 bd808: Cleaned up git clone state in deployment-tin.deployment-prep:/srv/mediawiki-staging/php-master and queued beta-code-update-eqiad to try again (T129371)
  • 17:48 bd808: Git clone at deployment-tin.deployment-prep:/srv/mediawiki-staging/php-master in completely horrible state. Investigating
  • 17:22 bd808: Fixed https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/4452/
  • 17:19 bd808: Manually cleaning up broken rebase in deployment-tin.deployment-prep:/srv/mediawiki-staging
  • 16:27 bd808: Removed cherry-pick of https://gerrit.wikimedia.org/r/#/c/274696 ; manually cleaned up systemd unit and restarted logstash on deployment-logstash2
  • 14:59 hashar: Image ci-jessie-wikimedia-1457535250 in wmflabs-eqiad is ready T129345
  • 14:57 hashar: Rebuilding snapshot image to get Xvfb enabled at boot time T129345
  • 13:04 moritzm: cherrypicked patch to deployment-prep which provides a systemd unit for logstash
  • 10:52 hashar: Image ci-jessie-wikimedia-1457520493 in wmflabs-eqiad is ready
  • 10:29 hashar: Nodepool: created new image and refreshing snapshot in attempt to get Xvfb running T129320 T128090

2016-03-08

  • 23:42 legoktm: running CentralAuth's checkLocalUser.php --verbose=1 --delete=1 on deployment-tin for T115198
  • 21:33 hashar: Nodepool Image ci-jessie-wikimedia-1457472606 in wmflabs-eqiad is ready
  • 19:23 hashar: Zuul inject DISPLAY https://gerrit.wikimedia.org/r/#/c/273269/
  • 16:03 hashar: Image ci-jessie-wikimedia-1457452766 is ready T128090
  • 15:59 hashar: Nodepool: refreshing snapshot image to ship browsers+Xvfb for T128090
  • 14:27 hashar: Mass refreshed CI slave-scripts 1d2c60d..e27c292
  • 13:38 hashar: Rebased integration puppet master. Dropped a make-wmf-branch patch and the one for raita role
  • 11:26 hashar: Nodepool: created new snapshot to set puppet $::labsproject : ci-jessie-wikimedia-1457436175 hoping to fix hiera lookup T129092
  • 02:51 ori: deployment-prep Updating HHVM on deployment-mediawiki01
  • 02:27 ori: deployment-prep Updating HHVM on deployment-mediawiki02
  • 01:50 Krinkle: integration-saltmaster: salt -v '*slave-trusty*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky' (T117710)
  • 01:50 Krinkle: integration-saltmaster: salt -v '*slave-trusty*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer/src/skins/BlueSky'

2016-03-07

  • 21:03 hashar: Nodepool upgraded to 0.1.1-wmf.4; it no longer waits 1 minute before deleting a used node | T118573
  • 20:05 hashar: Upgrading Nodepool from 0.1.1-wmf3 to 0.1.1-wmf.4 with andrewbogott | T118573

2016-03-06

2016-03-04

  • 19:31 hashar: Nodepool Image ci-jessie-wikimedia-1457119603 in wmflabs-eqiad is ready - T128846
  • 13:29 hashar: Nodepool Image ci-jessie-wikimedia-1457097785 in wmflabs-eqiad is ready
  • 08:42 hashar: CI deleting integration-slave-precise-1001 (2 executors). It is not in labs DNS, which causes a bunch of issues, and the capacity is no longer needed. T128802
  • 02:49 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/274889
  • 00:11 Krinkle: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"

2016-03-03

  • 23:37 legoktm: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
  • 22:34 legoktm: mysql not running on integration-slave-precise-1002, manually starting (T109704)
  • 22:30 legoktm: mysql not running on integration-slave-precise-1011, manually starting (T109704)
  • 22:19 legoktm: mysql not running on integration-slave-precise-1012, manually starting (T109704)
  • 22:07 legoktm: deploying https://gerrit.wikimedia.org/r/274821
  • 21:58 Krinkle: Reloading Zuul to deploy (EventLogging and AdminLinks) https://gerrit.wikimedia.org/r/274821 /
  • 18:49 thcipriani: killing deployment-bastion since it is no longer used
  • 14:23 hashar: https://integration.wikimedia.org/ci/computer/integration-slave-trusty-1011/ is out of disk space

2016-03-02

2016-03-01

  • 23:10 Krinkle: Updated Jenkins configuration to also support php5 and hhvm for Console Sections detection of "PHPUnit"
  • 17:05 hashar: gerrit: set accounts inactive for Eloquence and Mgrover (former WMF employees whose mail bounces back)
  • 16:41 hashar: Restarted Jenkins
  • 16:32 hashar: A bunch of Jenkins jobs got stalled because I killed threads in Jenkins to unblock integration-slave-trusty-1003 :-(
  • 12:14 hashar: integration-slave-trusty-1003 is back online
  • 12:13 hashar: Might have killed the proper Jenkins thread to unlock integration-slave-trusty-1003
  • 12:03 hashar: Jenkins cannot pool back integration-slave-trusty-1003; the Jenkins master has a bunch of blocking threads piling up, with hudson.plugins.sshslaves.SSHLauncher.afterDisconnect() locked somehow
  • 11:41 hashar: Rebooting integration-slave-trusty-1003 (does not reply to salt / ssh)
  • 10:34 hashar: Image ci-jessie-wikimedia-1456827861 in wmflabs-eqiad is ready
  • 10:24 hashar: Refreshing Nodepool snapshot instances
  • 10:22 hashar: Refreshing Nodepool base image to speed instances boot time (dropping open-iscsi package https://gerrit.wikimedia.org/r/#/c/273973/ )

2016-02-29

  • 16:23 hashar: salt -v '*slave*' cmd.run 'rm -fR /mnt/jenkins-workspace/workspace/mwext*jslint' T127362
  • 16:17 hashar: Deleting all mwext-.*-jslint jobs from Jenkins. Paladox has migrated all of them to jshint/jsonlint generic jobs T127362
  • 16:16 hashar: Deleting all mwext-.*-jslint jobs from Jenkins. Paladox has migrated all of them to jshint/jsonlint generic jobs
  • 09:46 hashar: Jenkins installing Yaml Axis Plugin 0.2.0

2016-02-28

  • 01:30 Krinkle: Rebooting integration-slave-precise-1012 – Might help T109704 (MySQL not running)

2016-02-26

  • 15:14 jzerebecki: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'" T128191
  • 15:14 jzerebecki: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
  • 14:44 hashar: (since it started, don't be that scared!)
  • 14:44 hashar: Nodepool has triggered 40 000 instances
  • 11:53 hashar: Restarted memcached on deployment-memc02 T128177
  • 11:53 hashar: memcached process on deployment-memc02 seems to have a nice leak of socket usage (from lost) and plainly refuses connections (bunch of CLOSE_WAIT) T128177
  • 11:53 hashar: memcached process on deployment-memc02 seems to have a nice leak of socket usage (from lost) and plainly refuses connections (bunch of CLOSE_WAIT)
  • 11:40 hashar: deployment-memc04 find /etc/apt -name '*proxy' -delete (prevented apt-get update)
  • 11:26 hashar: beta: salt -v '*' cmd.run 'apt-get -y install ruby-msgpack' . I am tired of seeing puppet debug messages: "Debug: Failed to load library 'msgpack' for feature 'msgpack'"
  • 11:24 hashar: puppet keeps restarting nutcracker apparently T128177 (a memcached diagnostic sketch follows this list)
  • 11:20 hashar: Memcached error for key "enwiki:flow_workflow%3Av2%3Apk:63dc3cf6a7184c32477496d63c173f9c:4.8" on server "127.0.0.1:11212": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY
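
A minimal sketch of the kind of checks behind the 11:53 memcached entries above, using only standard tools; the port shown is memcached's default and the process name is assumed, since the entries spell out neither.

    # on deployment-memc02: count half-closed sockets (CLOSE_WAIT) on the host
    sudo ss -tan state close-wait | wc -l
    # count open file descriptors of the memcached process (leak indicator)
    sudo ls /proc/"$(pgrep -x memcached)"/fd | wc -l
    # functional probe: a healthy memcached answers "stats" on its port
    printf 'stats\r\nquit\r\n' | nc -q 1 127.0.0.1 11211 | head
    # what the 11:53 restart entry ultimately did
    sudo service memcached restart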

2016-02-25

  • 22:38 hashar: beta: maybe deployment-jobrunner01 is processing jobs a bit faster now. Seems like hhvm went wild
  • 22:23 hashar: beta: jobrunner01 had apache/hhvm killed somehow .... Blame me
  • 21:56 hashar: beta: stopped jobchron / jobrunner on deployment-jobrunner01 and restarting them by running puppet
  • 21:49 hashar: beta: did a git-deploy of jobrunner/jobrunner hoping to fix the puppet run on deployment-jobrunner01, and apparently it did! T126846
  • 11:21 hashar: deleting workspace /mnt/jenkins-workspace/workspace/browsertests-Wikidata-WikidataTests-linux-firefox-sauce on slave-trusty-1015
  • 10:08 hashar: Jenkins upgraded T128006
  • 01:44 legoktm: deploying https://gerrit.wikimedia.org/r/273170
  • 01:39 legoktm: deploying https://gerrit.wikimedia.org/r/272955 (undeployed) and https://gerrit.wikimedia.org/r/273136
  • 01:37 legoktm: deploying https://gerrit.wikimedia.org/r/273136
  • 00:31 thcipriani: running puppet on beta to update scap to latest packaged version: sudo salt -b '10%' -G 'deployment_target:scap/scap' cmd.run 'puppet agent -t'
  • 00:20 thcipriani: deployment-tin not accepting jobs for some time, ran through https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update, is back now

2016-02-24

  • 19:55 legoktm: legoktm@deployment-tin:~$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=enwiki
  • 18:30 bd808: "configuration file '/etc/nutcracker/nutcracker.yml' syntax is invalid"
  • 18:27 bd808: nutcracker dead on mediawiki01; investigating
  • 17:20 hashar: Deleted Nodepool instances so new ones get to use the new snapshot ci-jessie-wikimedia-1456333979
  • 17:12 hashar: Refreshing nodepool snapshot. It had been stale since Feb 15th T127755
  • 17:01 bd808: https://wmflabs.org/sal/releng missing SAL data since 2016-02-20T20:19 due to bot crash; needs to be backfilled from wikitech data (T127981)
  • 16:43 hashar: SAL on elasticsearch is stale https://phabricator.wikimedia.org/T127981
  • 15:07 hasharAW: beta app servers have lost access to memcached due to bad nutcracker conf | T127966
  • 14:41 hashar: beta: we lost a memcached server at 11:51am UTC

2016-02-23

  • 22:45 thcipriani: deployment-puppetmaster is in a weird rebase state
  • 22:25 legoktm: running sync-common manually on deployment-mediawiki02
  • 09:59 hashar: Deleted a bunch of mwext-.*-jslint jobs that are no longer in use (migrated to either 'npm' or 'jshint' / 'jsonlint' )

2016-02-22

  • 22:06 bd808: Restarted puppetmaster service on deployment-puppetmaster to "fix" error "invalid byte sequence in US-ASCII"
  • 17:46 jzerebecki: ssh integration-slave-trusty-1017.eqiad.wmflabs 'sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/.git/config.lock'
  • 16:47 gehel: deployment-prep upgrading deployment-logstash2 to elasticsearch 1.7.5
  • 10:26 gehel: deployment-prep upgrading elastic-search to 1.7.5 on deployment-elastic0[5-8]

2016-02-20

  • 20:19 Krinkle: beta-code-update-eqiad job repeatedly stuck at "IRC notifier plugin"
  • 19:29 Krinkle: beta-code-update-eqiad broken because deployment-tin:/srv/mediawiki-staging/php-master/extensions/MobileFrontend/includes/MobileFrontend.hooks.php was modified on the server without commit
  • 19:22 Krinkle: Various beta-mediawiki-config-update-eqiad jobs have been stuck 'queued' for > 24 hours

2016-02-19

2016-02-18

2016-02-17

2016-02-16

  • 23:22 yuvipanda: new instances on deployment-prep no longer get NFS because of https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ADeployment-prep&type=revision&diff=311783&oldid=311781
  • 23:18 hashar: jenkins@gallium find /var/lib/jenkins/config-history/nodes -maxdepth 1 -type d -name 'ci-jessie*' -exec rm -vfR {} \;
  • 23:17 hashar: Jenkins is accepting slave creations again. Root cause: /var/lib/jenkins/config-history/nodes/ had reached the 32k inode limit (a verification sketch follows this list).
  • 23:14 hashar: Jenkins: Could not create rootDir /var/lib/jenkins/config-history/nodes/ci-jessie-wikimedia-34969/2016-02-16_22-40-23
  • 23:02 hashar: Nodepool can not authenticate with Jenkins anymore. Thus it can not add slaves it spawned.
  • 22:56 hashar: contint: Nodepool instances pool exhausted
  • 21:14 andrewbogott: deployment-logstash2 migration finished
  • 20:49 jzerebecki: reloading zuul for 3bf7584..67fec7b
  • 19:58 andrewbogott: migrating deployment-logstash2 to labvirt1010
  • 19:00 hashar: tin: checking out mw 1.27.0-wmf.14
  • 15:23 hashar: integration-make-wmfbranch : /mnt/make-wmf-branch mount now has gid=wikidev and group setuid (i.e. mode 2775)
  • 15:20 hashar: integration-make-wmfbranch : change tmpfs to /mnt/make-wmf-branch (from /var/make-wmf-branch )
  • 11:30 jzerebecki: T117710 integration-saltmaster:~# salt -v '*slave-trusty*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer/src/skins/BlueSky'
  • 09:52 hashar: will cut the wmf branches this afternoon starting around 14:00 CET
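
A hedged sketch of how the 23:14-23:17 root cause above could be confirmed and cleaned up on gallium; df and find are standard tools, the final command is the cleanup quoted in the 23:18 entry, and reading the "32k" figure as the classic ext3/ext4 per-directory cap is an assumption.

    # inode usage of the filesystem holding Jenkins data
    df -i /var/lib/jenkins
    # number of per-node history directories that accumulated
    find /var/lib/jenkins/config-history/nodes -maxdepth 1 -type d | wc -l
    # cleanup actually logged at 23:18
    find /var/lib/jenkins/config-history/nodes -maxdepth 1 -type d -name 'ci-jessie*' -exec rm -vfR {} \;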

2016-02-15

2016-02-14

2016-02-13

  • 06:42 bd808: restarted nutcracker on deployment-mediawiki01
  • 06:32 bd808: jobrunner on deployment-jobrunner01 enabled after reverting changes from T87928 that caused T126830
  • 05:51 bd808: disabled jobrunner process on jobrunner01; queue full of jobs broken by T126830
  • 05:31 bd808: trebuchet clone of /srv/jobrunner/jobrunner broken on jobrunner01; failing puppet runs
  • 05:25 bd808: jobrunner process on deployment-jobrunner01 badly broken; investigating
  • 05:20 bd808: Ran https://phabricator.wikimedia.org/P2273 on deployment-jobrunner01.deployment-prep.eqiad.wmflabs; freed ~500M; disk utilization still at 94%

2016-02-12

  • 23:54 hashar: beta cluster broken since 20:30 UTC https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor haven't looked yet
  • 17:36 hashar: salt -v '*slave-trusty*' cmd.run 'apt-get -y install texlive-generic-extra' # T126422
  • 17:32 hashar: adding texlive-generic-extra on CI slaves by cherry picking https://gerrit.wikimedia.org/r/#/c/270322/ - T126422
  • 17:19 hashar: getting rid of integration-dev; it is broken somehow
  • 17:10 hashar: Nodepool back at spawning instances. contintcloud has been migrated in wmflabs
  • 16:51 thcipriani: running sudo salt '*' -b '10%' deploy.fixurl to fix deployment-prep trebuchet urls
  • 16:31 hashar: bd808 added support for saltbot to update tasks automagically!!!! T108720
  • 03:10 yurik: attempted to sync graphoid from gerrit 270166 from deployment-tin, but it wouldn't sync. Tried to git pull sca02, submodules wouldn't pull

2016-02-11

  • 22:53 thcipriani: shutting down deployment-bastion
  • 21:28 hashar: pooling back slaves 1001 to 1006
  • 21:18 hashar: re-enabling hhvm service on slaves ( https://phabricator.wikimedia.org/T126594 ) Some symlink is missing and only provided by the upstart script grrrrrrr https://phabricator.wikimedia.org/T126658
  • 20:52 legoktm: deploying https://gerrit.wikimedia.org/r/270098
  • 20:35 hashar: depooling the six recent slaves: /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so cannot open shared object file
  • 20:29 hashar: pooling integration-slave-trusty-1004 integration-slave-trusty-1005 integration-slave-trusty-1006
  • 20:14 hashar: pooling integration-slave-trusty-1001 integration-slave-trusty-1002 integration-slave-trusty-1003
  • 19:35 marxarelli: modifying deployment server node in jenkins to point to deployment-tin
  • 19:27 thcipriani: running sudo salt -b '10%' '*' cmd.run 'puppet agent -t' from deployment-salt
  • 19:27 twentyafterfour: Keeping notes on the ticket: https://phabricator.wikimedia.org/T126537
  • 19:24 thcipriani: moving deployment-bastion to deployment-tin
  • 17:59 hashar: recreated instances with proper names: integration-slave-trusty-{1001-1006}
  • 17:52 hashar: Created integration-slave-trusty-{1019-1026} as m1.large (note 1023 is an exception it is for Android). Applied role::ci::slave , lets wait for puppet to finish
  • 17:42 Krinkle: Currently testing https://gerrit.wikimedia.org/r/#/c/268802/ in Beta Labs
  • 17:27 hashar: Depooling all the ci.medium slaves and deleting them.
  • 17:27 hashar: I tried. The ci.medium instances are too small and MediaWiki tests really need 1.5GBytes of memory :-(
  • 16:00 hashar: rebuilding integration-dev https://phabricator.wikimedia.org/T126613
  • 15:27 Krinkle: Deploy Zuul config change https://gerrit.wikimedia.org/r/269976
  • 11:46 hashar: salt -v '*' cmd.run '/etc/init.d/apache2 restart' might help for Wikidata browser tests failing
  • 11:32 hashar: disabling hhvm service on CI slaves ( https://phabricator.wikimedia.org/T126594 , cherry picked both patches )
  • 10:50 hashar: re-enabled puppet on CI. All slaves transitioned to a 128MB tmpfs (was 512MB); a remount sketch follows this list
  • 10:16 hashar: pooling back integration-slave-trusty-1009 and integration-slave-trusty-1010 (tmpfs shrunk)
  • 10:06 hashar: disabling puppet on all CI slaves. Trying to lower tmpfs 512MB to 128MB ( https://gerrit.wikimedia.org/r/#/c/269880/ )
  • 02:45 legoktm: deploying https://gerrit.wikimedia.org/r/269853 https://gerrit.wikimedia.org/r/269893
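
The 10:06-10:50 entries above shrink the slaves' tmpfs from 512MB to 128MB while puppet is held off. Below is a sketch of that sequence, reusing the salt invocations seen elsewhere in this log; the tmpfs mountpoint is a placeholder because the entries do not name it.

    # hold puppet so it cannot race the change
    salt -v '*slave*' cmd.run 'puppet agent --disable "shrinking tmpfs"'
    # remount with the smaller size on one slave to verify nothing breaks
    salt -v 'integration-slave-trusty-1009*' cmd.run 'mount -o remount,size=128M /path/to/tmpfs'
    # with https://gerrit.wikimedia.org/r/#/c/269880/ cherry-picked on the
    # puppetmaster, re-enable puppet and roll out in small batches
    salt -v '*slave*' cmd.run 'puppet agent --enable'
    salt -v --batch=3 '*slave*' cmd.run 'puppet agent -tv'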

2016-02-10

  • 23:54 hashar_: depooling Trusty slaves that only have 2GB of RAM, which is not enough. https://phabricator.wikimedia.org/T126545
  • 22:55 hashar_: gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete ( https://phabricator.wikimedia.org/T126552 )
  • 22:34 Krinkle: Zuul is back up and processing Gerrit events, but jobs are still queued indefinitely. Jenkins is not accepting new jobs
  • 22:31 Krinkle: Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either.
  • 21:22 legoktm: cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again
  • 21:17 hashar: CI dust has settled. Krinkle and I have pooled a lot more Trusty slaves to accommodate the overload caused by switching to php55 (jobs run on Trusty)
  • 21:08 hashar: pooling trusty slaves 1009, 1010, 1021, 1022 with 2 executors (they are ci.medium)
  • 20:38 hashar: cancelling mediawiki-core-jsduck-publish and mediawiki-core-doxygen-publish jobs manually. They will catch up on next merge
  • 20:34 Krinkle: Pooled integration-slave-trusty-1019 (new)
  • 20:28 Krinkle: Pooled integration-slave-trusty-1020 (new)
  • 20:24 Krinkle: created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium)
  • 20:18 hashar: created integration-slave-trusty-1009 and 1010 (trusty ci.medium)
  • 20:06 hashar: creating integration-slave-trusty-1021 and integration-slave-trusty-1022 (ci.medium)
  • 19:48 greg-g: that cleanup was done by apergos
  • 19:48 greg-g: did cleanup across all integration slaves, some were very close to out of room. results: https://phabricator.wikimedia.org/P2587
  • 19:43 hashar: Dropping slaves Precise m1.large integration-slave-precise-1014 and integration-slave-precise-1013 , most load shifted to Trusty (php53 -> php55 transition)
  • 18:20 Krinkle: Creating a Trusty slave to support increased demand following the MediaWiki php53(precise)>php55(trusty) bump
  • 16:06 jzerebecki: reloading zuul for 41a92d5..5b971d1
  • 15:42 jzerebecki: reloading zuul for 639dd40..41a92d5
  • 14:12 jzerebecki: recover a bit of disk space: integration-saltmaster:~# salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/*WikibaseQuality*'
  • 13:46 jzerebecki: reloading zuul for 639dd40
  • 13:15 jzerebecki: reloading zuul for 3be81c1..e8e0615
  • 08:07 legoktm: deploying https://gerrit.wikimedia.org/r/269619
  • 08:03 legoktm: deploying https://gerrit.wikimedia.org/r/269613 and https://gerrit.wikimedia.org/r/269618
  • 06:41 legoktm: deploying https://gerrit.wikimedia.org/r/269607
  • 06:34 legoktm: deploying https://gerrit.wikimedia.org/r/269605
  • 02:59 legoktm: deleting 14GB broken workspace of mediawiki-core-php53lint from integration-slave-precise-1004
  • 02:37 legoktm: deleting /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer on trusty-1017, it had a skin cloned into it
  • 02:26 legoktm: queuing mwext jobs server-side to identify failing ones
  • 02:21 legoktm: deploying https://gerrit.wikimedia.org/r/269582
  • 01:03 legoktm: deploying https://gerrit.wikimedia.org/r/269576

2016-02-09

  • 23:17 legoktm: deploying https://gerrit.wikimedia.org/r/269551
  • 23:02 legoktm: gracefully restarting zuul
  • 22:57 legoktm: deploying https://gerrit.wikimedia.org/r/269547
  • 22:29 legoktm: deploying https://gerrit.wikimedia.org/r/269540
  • 22:18 legoktm: re-enabling puppet on all CI slaves
  • 22:02 legoktm: reloading zuul to see if it'll pick up the new composer-php53 job
  • 21:53 legoktm: enabling puppet on just integration-slave-trusty-1012
  • 21:52 legoktm: cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ onto integration-puppetmaster
  • 21:50 legoktm: disabling puppet on all trusty/precise CI slaves
  • 21:40 legoktm: deploying https://gerrit.wikimedia.org/r/269533
  • 17:49 marxarelli: disabled/enabled gearman in jenkins, connection works this time
  • 17:49 marxarelli: performed stop/start of zuul on gallium to restore zuul and gearman
  • 17:45 marxarelli: "Failed: Unable to Connect" in jenkins when testing gearman connection
  • 17:40 marxarelli: killed old zuul process manually and restarted the service
  • 17:39 marxarelli: restart of zuul fails as well. old process cannot be killed
  • 17:38 marxarelli: reloading zuul fails with "failed to kill 13660: Operation not permitted"
  • 16:06 bd808: Deleted corrupt integration-slave-precise-1003:/mnt/jenkins-workspace/workspace/mediawiki-core-php53lint/.git
  • 15:11 hashar: mira: /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.13 php-1.27.0-wmf.13
  • 14:51 hashar: ./make-wmf-branch -n 1.27.0-wmf.13 -o master
  • 14:50 hashar: pooling back integration-slave-precise1001 - 1004. Manually fetched git repos in workspace for mediawiki core php53
  • 14:49 hashar: make-wmf-branch instance: created a local ssh key pair and set the config to use User: hashar
  • 14:13 hashar: pooling https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/ Mysql is back .. Blame puppet
  • 14:12 hashar: depooling https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/ Mysql is gone somehow
  • 14:04 hashar: Manually git fetching mediawiki-core in /mnt/jenkins-workspace/workspace/mediawiki-core-php53lint of slaves precise 1001 to 1004 (git on Precise is remarkably slow)
  • 13:28 hashar: salt '*trusty*' cmd.run 'update-alternatives --set php /usr/bin/hhvm'
  • 13:28 hashar: salt '*precise*' cmd.run 'update-alternatives --set php /usr/bin/php5'
  • 13:18 hashar: salt -v --batch=3 '*slave*' cmd.run 'puppet agent -tv'
  • 13:15 hashar: removing https://gerrit.wikimedia.org/r/#/c/269370/ from CI puppet master
  • 13:14 hashar: slave recurse infinitely doing /bin/bash -eu /srv/deployment/integration/slave-scripts/bin/mw-install-mysql.sh then loop over /bin/bash /usr/bin/php maintenance/install.php --confpath /mnt/jenkins-workspace/workspace/mediawiki-core-qunit/src --dbtype=mysql --dbserver=127.0.0.1:3306 --dbuser=jenkins_u2 --dbpass=pw_jenkins_u2 --dbname=jenkins_u2_mw --pass testpass TestWiki WikiAdmin https://phabricator.wikimedia.org/T126327
  • 12:46 hashar: Mass testing php loop of death: salt -v '*slave*' cmd.run 'timeout 2s /srv/deployment/integration/slave-scripts/bin/php --version'
  • 12:40 hashar: mass rebooting CI slaves from wikitech
  • 12:39 hashar: salt -v '*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
  • 12:33 hashar: all slaves dying due to PHP looping
  • 12:02 legoktm: re-enabling puppet on all trusty/precise slaves
  • 11:20 legoktm: cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster
  • 11:20 legoktm: enabling puppet just on integration-slave-trusty-1012
  • 11:13 legoktm: disabling puppet on all *(trusty|precise)* slaves
  • 10:26 hashar: pooling in integration-slave-trusty-1018
  • 03:19 legoktm: deploying https://gerrit.wikimedia.org/r/269359
  • 02:53 legoktm: deploying https://gerrit.wikimedia.org/r/238988
  • 00:39 hashar: gallium edited /usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/trigger/gerrit.py and modified: replication_timeout = 300 -> replication_timeout = 10
  • 00:37 hashar: live hacking Zuul code to have it stop sleeping() on force merge
  • 00:36 hashar: killing zuul

2016-02-08

2016-02-06

  • 18:34 jzerebecki: reloading zuul for bdb2ed4..46ccca9

2016-02-05

  • 13:30 hashar: beta: cleaning out /data/project/logs/archive, which dates from the pre-logstash era. We have not logged this way since May 2015, apparently
  • 13:29 hashar: beta: deleting /data/project/swift-disk, created in August 2014 and unused since June 2015. It was a failed attempt at bringing swift to beta
  • 13:27 hashar: beta: reclaiming disk space from extensions.git. On bastion: find /srv/mediawiki-staging/php-master/extensions/.git/modules -maxdepth 1 -type d -print -execdir git gc \;
  • 13:03 hashar: integration-slave-trusty-1011 ran out of disk space. Did some brute-force cleanup and git gc (a cleanup sketch follows this list).
  • 05:21 Tim: configured mediawiki-extensions-qunit to only run on integration-slave-trusty-1017, did a rebuild and then switched it back
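
A sketch of the sort of brute-force cleanup and git gc mentioned at 13:03, using the workspace paths that appear throughout this log; the workspace to delete is a placeholder.

    # find what is eating /mnt on the slave
    sudo du -xsh /mnt/jenkins-workspace/workspace/* 2>/dev/null | sort -rh | head -20
    # drop a workspace that the next build will simply re-clone (placeholder name)
    sudo rm -rf /mnt/jenkins-workspace/workspace/<big-stale-workspace>
    # repack the remaining git checkouts to reclaim space
    sudo find /mnt/jenkins-workspace/workspace -maxdepth 3 -type d -name .git -execdir git gc \;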

2016-02-04

  • 22:08 jzerebecki: reloading zuul for bed7be1..f57b7e2
  • 21:51 hashar: salt-key -d integration-slave-jessie-1001.eqiad.wmflabs
  • 21:50 hashar: salt-key -d integration-slave-precise-1011.eqiad.wmflabs
  • 00:57 bd808: Got deployment-bastion processing Jenkins jobs again via instructions left by my past self at https://phabricator.wikimedia.org/T72597#747925
  • 00:43 bd808: Jenkins agent on deployment-bastion.eqiad doing the trick where it doesn't pick up jobs again

2016-02-03

  • 22:24 bd808: Manually ran sync-common on deployment-jobrunner01.eqiad.wmflabs to pickup wmf-config changes that were missing (InitializeSettings, Wikibase, mobile)
  • 17:43 marxarelli: Reloading Zuul to deploy previously undeployed Icd349069ec53980ece2ce2d8df5ee481ff44d5d0 and Ib18fe48fe771a3fe381ff4b8c7ee2afb9ebb59e4
  • 15:12 hashar: apt-get upgrade deployment-sentry2
  • 15:03 hashar: redeployed rcstream/rcstream on deployment-stream by using git-deploy on deployment-bastion
  • 14:55 hashar: upgrading deployment-stream
  • 14:42 hashar: pooled back integration-slave-trusty-1015 Seems ok
  • 14:35 hashar: manually triggered a bunch of browser tests jobs
  • 11:40 hashar: apt-get upgrade deployment-ms-be01 and deployment-ms-be02
  • 11:32 hashar: fixing puppet.conf on deployment-memc04
  • 11:09 hashar: restarting beta cluster puppetmaster just in case
  • 11:07 hashar: beta: apt-get upgrade on deployment-cache* hosts and checking puppet
  • 10:59 hashar: integration/beta: deleting /etc/apt/apt.conf.d/*proxy files. There is no need for them, in fact web proxy is not reachable from labs
  • 10:53 hashar: integration: switched puppet repo back to 'production' branch, rebased.
  • 10:49 hashar: various beta cluster hosts have puppet errors ..
  • 10:46 hashar: integration-slave-trusty-1013 heading toward running out of disk space on /mnt ...
  • 10:42 hashar: integration-slave-trusty-1016 out of disk space on /mnt ...
  • 03:45 bd808: Puppet failing on deployment-fluorine with "Error: Could not set uid on user[datasets]: Execution of '/usr/sbin/usermod -u 10003 datasets' returned 4: usermod: UID '10003' already exists"
  • 03:44 bd808: Freed 28G by deleting deployment-fluorine:/srv/mw-log/archive/*2015*
  • 03:42 bd808: Ran deployment-bastion.deployment-prep:/home/bd808/cleanup-var-crap.sh and freed 565M

2016-02-02

  • 18:32 marxarelli: Reloading Zuul to deploy If1f3cb60f4ccb2c1bca112900dbada03a8588370
  • 17:42 marxarelli: cleaning mwext-donationinterfacecore125-testextension-php53 workspace on integration-slave-precise-1013
  • 17:06 ostriches: running sync-common on mw2051 and mw1119
  • 09:38 hashar: Jenkins is fully up and operational
  • 09:33 hashar: restarting Jenkins
  • 08:47 hashar: pooling back integration-slave-precise1011 , puppet run got fixed ( https://phabricator.wikimedia.org/T125474 )
  • 03:48 legoktm: deploying https://gerrit.wikimedia.org/r/267828
  • 03:29 legoktm: deploying https://gerrit.wikimedia.org/r/266941
  • 00:42 legoktm: due to T125474
  • 00:42 legoktm: marked integration-slave-precise-1011 as offline
  • 00:39 legoktm: precise-1011 slave hasn't had a puppet run in 6 days

2016-02-01

  • 23:53 bd808: Logstash working again; I applied a change to the default Elasticsearch mapping template that ensures fields named "timestamp" are indexed as plain strings (a hedged template sketch follows this list)
  • 23:46 bd808: Elasticsearch index template for beta logstash cluster making crappy guesses about syslog events; dropped 2016-02-01 index; trying to fix default mappings
  • 23:09 bd808: HHVM logs causing rejections during document parse when inserting in Elasticsearch from logstash. They contain a "timestamp" field that looks like "Feb 1 22:56:39" which is making the mapper in Elasticsearch sad.
  • 23:04 bd808: Elasticsearch on deployment-logstash2 rejecting all documents with 400 status. Investigating
  • 22:50 bd808: Copying deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log to /srv for debugging later
  • 22:48 bd808: deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log is 11G of fail!
  • 22:46 bd808: root partition on deployment-logstash2 full
  • 22:43 bd808: No data in logstash since 2016-01-30T06:55:37.838Z; investigating
  • 15:33 hashar: Image ci-jessie-wikimedia-1454339883 in wmflabs-eqiad is ready
  • 15:01 hashar: Refreshing Nodepool image. Might have npm/grunt properly set up
  • 03:15 legoktm: deploying https://gerrit.wikimedia.org/r/267630
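
The 23:53 entry above fixes the "timestamp" mapping problem described at 23:04-23:09 by changing the default mapping template. The sketch below shows what such a template could look like against the Elasticsearch 1.x API used by logstash at the time; the template name and exact body are assumptions, not the change that was actually applied.

    curl -XPUT 'http://localhost:9200/_template/force_timestamp_string' -d '{
      "template": "logstash-*",
      "mappings": {
        "_default_": {
          "dynamic_templates": [
            { "timestamp_as_string": {
                "match": "timestamp",
                "mapping": { "type": "string", "index": "not_analyzed" }
            } }
          ]
        }
      }
    }'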

2016-01-31

  • 13:35 hashar: Jenkins IRC bot started failing at Jan 30 01:04:00 2016 for whatever reason.... Should be fine now
  • 13:33 hashar: cancelling/aborting jobs that are stuck while reporting to IRC (mostly browser tests and beta cluster jobs)
  • 13:32 hashar: Jenkins jobs are being blocked because they can no more report back to IRC :-(((
  • 13:28 hashar: Jenkins jobs are being blocked because they can no more report back to IRC :-(((

2016-01-30

  • 12:46 hashar: integration-slave-jessie-1001 : fixed puppet.conf server name and ran puppet

2016-01-29

  • 18:43 thcipriani: updated scap on beta
  • 16:44 thcipriani: deployed scap updates on beta
  • 11:58 _joe_: upgraded hhvm to 3.6 wm8 in deployment-prep

2016-01-28

  • 23:22 MaxSem: Updated portals on betalabs to master
  • 22:23 hashar: salt '*slave-precise*' cmd.run 'apt-get install php5-ldap' ( https://phabricator.wikimedia.org/T124613 ) will need to be puppetized
  • 18:17 thcipriani: cleaning npm cache on slave machines: salt -v '*slave*' cmd.run 'sudo -i -u jenkins-deploy -- npm cache clean'
  • 18:12 thcipriani: running npm cache clean on integration-slave-precise-1011 sudo -i -u jenkins-deploy -- npm cache clean
  • 15:25 hashar: apt-get upgrade deployment-sca01 and deployment-sca02
  • 15:09 hashar: fixing puppet.conf hostname on deployment-upload deployment-conftool deployment-tmh01 deployment-zookeeper01 and deployment-urldownloader
  • 15:06 hashar: fixing puppet.conf hostname on deployment-upload.deployment-prep.eqiad.wmflabs and running puppet
  • 15:00 hashar: Running puppet on deployment-memc02 and deployment-elastic07 . It is catching up with a lot of changes
  • 14:59 hashar: fixing puppet hostnames on deployment-elastic07
  • 14:59 hashar: fixing puppet hostnames on deployment-memc02
  • 14:55 hashar: Deleted salt keys deployment-pdf01.eqiad.wmflabs and deployment-memc04.eqiad.wmflabs (obsolete, entries with '.deployment-prep.' are already there)
  • 07:38 jzerebecki: reload zuul for 4951444..43a030b
  • 05:55 jzerebecki: doing https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
  • 03:49 mobrovac: deployment-prep re-enabled puppet on deployment-restbase0x
  • 02:49 mobrovac: deployment-prep deployment-restbase01 disabled puppet to set up cassandra for
  • 02:27 mobrovac: deployment-prep recreating deployment-restbase01 for T125003
  • 02:23 mobrovac: deployment-prep deployment-restbase02 disabled puppet to recreate deployment-restbase01 for T125003
  • 01:42 mobrovac: deployment-prep recreating deployment-sca02 for T125003
  • 01:28 mobrovac: deployment-prep recreating deployment-sca01 for T125003
  • 00:36 mobrovac: deployment-prep re-imaging deployment-mathoid for T125003
  • 00:02 jzerebecki: integration-slave-trusty-1016:~$ sudo -i rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/Donate

2016-01-27

  • 23:49 jzerebecki: integration-slave-precise-1011:~$ sudo -i /etc/init.d/salt-minion restart
  • 23:46 jzerebecki: work around https://phabricator.wikimedia.org/T117710 : salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky'
  • 21:19 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy)
  • 10:29 hashar: triggered a bunch of browser tests; deployment-redis01 was dead/faulty
  • 10:08 hashar: mass restarting redis-server process on deployment-redis01 (for https://phabricator.wikimedia.org/T124677 )
  • 10:07 hashar: mass restarting redis-server process on deployment-redis01
  • 09:00 hashar: beta: commenting out the "latency-monitor-threshold 100" parameter in every /etc/redis/redis.conf we have ( https://phabricator.wikimedia.org/T124677 ). Puppet will not reapply it unless the distribution is Jessie (one possible rollout sketch follows this list)
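
One possible way the 09:00 redis.conf change above could have been rolled out, reusing the salt cmd.run pattern seen elsewhere in this log; the host-targeting glob and the in-place sed are assumptions.

    # comment out the parameter on every beta redis host, then restart redis
    salt -v 'deployment-redis*' cmd.run "sed -i 's/^latency-monitor-threshold 100/# latency-monitor-threshold 100/' /etc/redis/redis.conf"
    salt -v 'deployment-redis*' cmd.run 'service redis-server restart'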

2016-01-26

  • 16:51 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
  • 12:14 hashar: Added Jenkins IRC bot (wmf-insecte) to #wikimedia-perf for https://gerrit.wikimedia.org/r/#/c/265631/
  • 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
  • 04:18 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build (27 hours after the last time I did that)

2016-01-25

  • 18:59 twentyafterfour: started redis-server on deployment-redis01 by commenting out latency-monitor-threshold from the redis.conf
  • 15:22 hashar: CI: fixing kernels not upgrading via: rm /boot/grub/menu.lst ; update-grub -y (i.e.: regenerate the Grub menu from scratch)
  • 14:21 hashar: integration-slave-trusty-1015.integration.eqiad.wmflabs is gone. I have failed the kernel upgrade / grub update
  • 01:35 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build

2016-01-24

2016-01-22

  • 23:58 legoktm: removed skins from mwext-qunit workspace on trusty-1013 slave
  • 23:34 legoktm: rm -rf /mnt/jenkins-workspace/workspace/mediawiki-phpunit-php53 on slave precise 1012
  • 22:45 legoktm: deploying https://gerrit.wikimedia.org/r/265864
  • 22:27 hashar: rebooted all CI slaves using OpenStackManager
  • 22:09 hashar: rebooting deployment-redis01 (kernel upgrade)
  • 21:22 hashar: Image ci-jessie-wikimedia-1453497269 in wmflabs-eqiad is ready (with node 4.2 for https://phabricator.wikimedia.org/T119143 )
  • 21:14 hashar: updating nodepool snapshot based on new image
  • 21:12 hashar: rebuilding nodepool reference image
  • 20:04 hashar: Image ci-jessie-wikimedia-1453492820 in wmflabs-eqiad is ready
  • 20:00 hashar: Refreshing nodepool image to hopefully get Nodejs 4.2.4 https://phabricator.wikimedia.org/T124447 https://gerrit.wikimedia.org/r/#/c/265802/
  • 16:32 hashar: Nuked corrupted git repo on integration-slave-precise-1012 /mnt/jenkins-workspace/workspace/mediawiki-extensions-php53
  • 12:23 hashar: beta: reinitialized keyholder on deployment-bastion. The proxy apparently had no identity (a re-arm sketch follows this list)
  • 09:32 hashar: beta cluster Jenkins jobs have been stalled for 9 hours and 25 minutes. Disabling/re-enabling the Gearman plugin to remove the deadlock
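
A short sketch of reinitializing keyholder as mentioned at 12:23 above, based on the standard WMF keyholder tool; the subcommands and the proxy socket path are recalled from memory and should be treated as assumptions.

    # on deployment-bastion: check the agent proxy, then re-arm it
    # (arming prompts for the key passphrase)
    sudo keyholder status
    sudo keyholder arm
    # confirm an identity is now offered through the proxy socket
    SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh-add -l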

2016-01-21

  • 21:41 hashar: restored role::mail::mx on deployment-mx
  • 21:36 hashar: dropping role::mail::mx from deployment-mx to let puppet run
  • 21:33 hashar: rebooting deployment-jobrunner01 / kernel upgrade / /tmp is only 1MBytes
  • 21:19 hashar: fixing up deployment-jobrunner01; /tmp and / disks are full
  • 19:57 thcipriani: ran REPAIR TABLE globalnames; on centralauth db
  • 19:48 legoktm: deploying https://gerrit.wikimedia.org/r/265552
  • 19:39 legoktm: deploying jjb changes for https://gerrit.wikimedia.org/r/264990
  • 19:25 legoktm: deploying https://gerrit.wikimedia.org/r/265546
  • 01:59 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions/SpellingDictionary$ rm -r modules/jquery.uls && git rm modules/jquery.uls
  • 01:00 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git pull && git submodule update --init --recursive
  • 00:57 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git reset HEAD SpellingDictionary

2016-01-20

  • 20:05 hashar: beta: sudo find /data/project/upload7/math -type f -delete (probably some old leftovers)
  • 19:50 hashar: beta: on commons ran deleteArchivedFile.php : Nuked 7130 files
  • 19:49 hashar: beta : foreachwiki deleteArchivedRevisions.php -delete
  • 19:26 hasharAway: Nuked all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload
  • 19:19 hasharAway: beta: sudo find /data/project/upload7/*/*/temp -type f -delete
  • 19:14 hasharAway: beta: sudo rm /data/project/upload7/*/*/lockdir/*
  • 18:57 hasharAway: beta cluster code has been stalled for roughly 2h30
  • 18:55 hasharAway: disconnecting Gearman plugin to remove deadlock for beta cluster jobs
  • 17:06 hashar: clearing files from beta-cluster to prepare for Swift migration. python pwb.py delete.py -family:betacommons -lang:en -cat:'GWToolset Batch Upload' -verbose -putthrottle:0 -summary:'Clearing out old batched upload to save up disk space for Swift migration'

2016-01-19

2016-01-17

2016-01-16

2016-01-15

  • 12:17 hashar: restarting Jenkins for plugins updates
  • 02:49 bd808: Trying to fix submodules in deployment-bastion:/srv/mediawiki-staging/php-master/extensions for T123701

2016-01-14

2016-01-13

  • 21:06 hashar: beta cluster code is up to date again. Got delayed by roughly 4 hours.
  • 20:55 hashar: unlocked Jenkins jobs for beta cluster by disabling/reenabling Jenkins Gearman client
  • 10:15 hashar: beta: fixed puppet on deployment-elastic06 . Was still using cert/hostname without .deployment-prep. .... Mass update occurring.

2016-01-12

2016-01-11

  • 22:24 hashar: Deleting old references on Zuul-merger for mediawiki/core : /usr/share/python/zuul/bin/python /home/hashar/zuul-clear-refs.py --until 15 /srv/ssd/zuul/git/mediawiki/core
  • 22:21 hashar: gallium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
  • 22:21 hashar: scandium in /srv/ssd/zuul/git/mediawiki/core$ git gc --prune=all && git remote update --prune
  • 07:35 legoktm: deploying https://gerrit.wikimedia.org/r/263319

2016-01-07

2016-01-06

  • 21:13 thcipriani: kicking integration puppetmaster, weird node unable to find definition.
  • 21:11 jzerebecki: on scandium: sudo -u zuul rm -rf /srv/ssd/zuul/git/mediawiki/services/mathoid
  • 21:04 legoktm: ^ on gallium
  • 21:04 legoktm: manually deleted /srv/ssd/zuul/git/mediawiki/services/mathoid to force zuul to re-clone it
  • 20:17 hashar: beta: dropped a few more /etc/apt/apt.conf.d/*-proxy files. webproxy is no longer reachable from labs
  • 09:44 hashar: CI/beta: deleting all git tags from /var/lib/git/operations/puppet and doing git repack
  • 09:39 hashar: restoring puppet hacks on beta cluster puppetmaster.
  • 09:35 hashar: beta/CI: salt -v '*' cmd.run 'rm -v /etc/apt/apt.conf.d/*-proxy' https://phabricator.wikimedia.org/T122953

2016-01-05

2016-01-04

2016-01-02

  • 03:17 yurik: purged varnishes on deployment-cache-text04

2016-01-01

  • 22:17 bd808: No nodepool ci-jessie-* hosts seen in Jenkins interface and rake-jessie jobs backing up

Archive