
Release Engineering/SAL: Difference between revisions

From Wikitech-static
imported>Labslogbot
(reloading zuul for 9398fa1..943f17b (jzerebecki))
imported>Stashbot
(taavi: reload zuul for https://gerrit.wikimedia.org/r/808021)
== 2016-03-02 ==
== 2022-06-23 ==
* 16:22 jzerebecki: reloading zuul for 9398fa1..943f17b
* 15:59 taavi: reload zuul for https://gerrit.wikimedia.org/r/808021
* 10:38 hashar: Zuul should no longer be caught in a death loop due to Depends-On on an event-schemas change. Hole filled with https://gerrit.wikimedia.org/r/#/c/274356/ T128569
* 08:53 hashar: gerrit set-account Jsahleen --inactive    T108854
* 01:19 thcipriani: force restarting zuul because the queue is very stuck https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Restart
* 01:13 thcipriani: following steps for gearman deadlock: https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Known_issues


== 2016-03-01 ==
== 2022-06-22 ==
* 23:10 Krinkle: Updated Jenkins configuration to also support php5 and hhvm for Console Sections detection of "PHPUnit"
* 17:36 taavi: gerrit: add tfellows to the extension-OpenBadges group per request in [[phab:T308278|T308278]]
* 17:05 hashar: gerrit: set accounts inactive for Eloquence and Mgrover. Former employees of wmf and mail bounceback
* 17:35 taavi: gerrit: create group extension-JsonData with robla in it, make it an owner of mediawiki/extensions/JsonData per request in [[phab:T303147|T303147]]
* 16:41 hashar: Restarted Jenkins
* 16:19 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/807586
* 16:32 hashar: A bunch of Jenkins jobs stalled because I killed threads in Jenkins to unblock integration-slave-trusty-1003 :-(
* 09:35 hashar: Switched `gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud` instance to use the project Puppet master `puppetmaster-1001.devtools.eqiad1.wikimedia.cloud`
* 12:14 hashar: integration-slave-trusty-1003 is back online
* 09:08 hashar: contint1001 , contint2002: deleting `.git/logs` from all zuul-merger repositories. We do not need the reflog `sudo -u zuul find /srv/zuul/git -type d -name .git -print -execdir rm -fR .git/logs \;` # [[phab:T307620|T307620]]
* 12:13 hashar: Might have killed the proper Jenkins thread to unlock integration-slave-trusty-1003
* 09:00 hashar: contint1001 , contint2002: setting `core.logallrefupdates=false` on all Zuul merger git repositories: `sudo -u zuul find /srv/zuul/git -type d -name .git -print -execdir git config core.logallrefupdates false \;` # [[phab:T307620|T307620]]
* 12:03 hashar: Jenkins cannot pool back integration-slave-trusty-1003. The Jenkins master has a bunch of blocking threads piling up, with hudson.plugins.sshslaves.SSHLauncher.afterDisconnect() locked somehow
* 07:46 hashar: Building operations-puppet docker image for https://gerrit.wikimedia.org/r/c/integration/config/+/807180
* 11:41 hashar: Rebooting integration-slave-trusty-1003 (does not reply to salt / ssh)
* 10:34 hashar: Image ci-jessie-wikimedia-1456827861 in wmflabs-eqiad is ready
* 10:24 hashar: Refreshing Nodepool snapshot instances
* 10:22 hashar: Refreshing Nodepool base image to speed instances boot time (dropping open-iscsi package https://gerrit.wikimedia.org/r/#/c/273973/ )


== 2016-02-29 ==
== 2022-06-21 ==
* 16:23 hashar: salt -v '*slave*' cmd.run 'rm -fR /mnt/jenkins-workspace/workspace/mwext*jslint' T127362
* 22:01 brennen: gitlab-runners: re-registering all shared runners
* 16:17 hashar: Deleting all mwext-.*-jslint jobs from Jenkins. Paladox has migrated all of them to jshint/jsonlint generic jobs T127362
* 17:55 dancy: Upgrading scap to 4.9.4-1+0~20220621174226.320~1.gbp56e4d4 in beta cluster
* 16:16 hashar: Deleting all mwext-.*-jslint jobs from Jenkins. Paladox has migrated all of them to jshint/jsonlint generic jobs
* 09:46 hashar: Jenkins installing Yaml Axis Plugin 0.2.0


== 2016-02-28 ==
== 2022-06-20 ==
* 01:30 Krinkle: Rebooting integration-slave-precise-1012 – Might help T109704 (MySQL not running)
* 16:30 urbanecm: add sgimeno as a project member (Growth engineer with need for access)
* 15:50 ori: On deployment-cache-<nowiki>{</nowiki>text,upload<nowiki>}</nowiki>06, ran: touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service ([[phab:T310957|T310957]])
* 14:07 ori: restarted acme-chief on deployment-acme-chief03


== 2016-02-26 ==
== 2022-06-17 ==
* 15:14 jzerebecki: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'" T128191
* 17:15 ori: provisioned deployment-cache-text07 in deployment-prep to test query normalization via VCL
* 15:14 jzerebecki: salt -v --show-timeout '*slave*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
* 01:08 TimStarling: on deployment-docker-cpjobqueue01 and deployment-docker-changeprop01 I redeployed the changeprop configuration, reverting the PHP 7.4 hack
* 14:44 hashar: (since it started, don't be that scared!)
* 14:44 hashar: Nodepool has triggered 40 000 instances
* 11:53 hashar: Restarted memcached on deployment-memc02  T128177
* 11:53 hashar: memcached process on deployment-memc02 seems to have a nice leak of socket usages (from lost) and plainly refuse connections (bunch of CLOSE_WAIT)  T128177
* 11:53 hashar: memcached process on deployment-memc02 seems to have a nice leak of socket usages (from lost) and plainly refuse connections (bunch of CLOSE_WAIT)
* 11:40 hashar: deployment-memc04 find /etc/apt -name '*proxy' -delete  (prevented apt-get update)
* 11:26 hashar: beta: salt -v '*' cmd.run 'apt-get -y install ruby-msgpack'  . I am tired of seeing puppet debug messages: "Debug: Failed to load library 'msgpack' for feature 'msgpack'"
* 11:24 hashar: puppet keeps restarting nutcracker apparently T128177
* 11:20 hashar: Memcached error for key "enwiki:flow_workflow%3Av2%3Apk:63dc3cf6a7184c32477496d63c173f9c:4.8" on server "127.0.0.1:11212": SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY


== 2016-02-25 ==
== 2022-06-16 ==
* 22:38 hashar: beta: maybe deployment-jobrunner01 is processing jobs a bit faster now. Seems like hhvm went wild
* 12:24 hashar: gitlab: runner-1030: `docker volume prune -f`
* 22:23 hashar: beta: jobrunner01  had apache/hhvm killed somehow .... Blame me
* 12:24 hashar: gitlab: runner-1026: `docker volume prune -f`
* 21:56 hashar: beta: stopped jobchron / jobrunner on deployment-jobrunner01  and restarting them by running puppet
* 10:02 elukey: ran `scap install-world --batch` to allow scap/puppet to work on ml-cache100[2,3]
* 21:49 hashar: beta did a git-deploy of jobrunner/jobrunner hoping to fix puppet run on deployment-jobrunner01 and apparently it did! T126846
* 11:21 hashar: deleting workspace /mnt/jenkins-workspace/workspace/browsertests-Wikidata-WikidataTests-linux-firefox-sauce on slave-trusty-1015
* 10:08 hashar: Jenkins upgraded T128006
* 01:44 legoktm: deploying https://gerrit.wikimedia.org/r/273170
* 01:39 legoktm: deploying https://gerrit.wikimedia.org/r/272955 (undeployed) and https://gerrit.wikimedia.org/r/273136
* 01:37 legoktm: deploying https://gerrit.wikimedia.org/r/273136
* 00:31 thcipriani: running puppet on beta to update scap to latest packaged version: sudo salt -b '10%' -G 'deployment_target:scap/scap' cmd.run 'puppet agent -t'
* 00:20 thcipriani: deployment-tin not accepting jobs for some time, ran through https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update, is back now


== 2016-02-24 ==
== 2022-06-15 ==
* 19:55 legoktm: legoktm@deployment-tin:~$ mwscript extensions/ORES/maintenance/PopulateDatabase.php --wiki=enwiki
* 22:39 brennen: phabricator: tagged release/2022-06-15/1 ([[phab:T310742|T310742]])
* 18:30 bd808: "configuration file '/etc/nutcracker/nutcracker.yml' syntax is invalid"
* 16:31 hashar: integration-agent-docker-1035: docker image prune
* 18:27 bd808: nutcracker dead on mediawiki01; investigating
* 15:26 dancy: Upgrading scap to 4.9.4-1+0~20220615151557.315~1.gbped3b8d in beta cluster
* 17:20 hashar: Deleted Nodepool instances so new ones get to use the new snapshot ci-jessie-wikimedia-1456333979
* 17:12 hashar: Refreshing nodepool snapshot. It has been stalled since Feb 15th T127755
* 17:01 bd808: https://wmflabs.org/sal/releng missing SAL data since 2016-02-20T20:19 due to bot crash; needs to be backfilled from wikitech data (T127981)
* 16:43 hashar: SAL on Elasticsearch is stalled https://phabricator.wikimedia.org/T127981
* 15:07 hasharAW: beta app servers have lost access to memcached due to bad nutcracker conf | T127966
* 14:41 hashar: beta: we have lost a memcached server at 11:51am UTC


== 2016-02-23 ==
== 2022-06-14 ==
* 22:45 thcipriani: deployment-puppetmaster is in a weird rebase state
* 21:30 TheresNoTime: clear out stuck `beta-scap-sync-world` jobs (repeatedly per each queued `beta-mediawiki-config-update-eqiad` job), queued jobs now running. monitored until each job had run successfully. jobs up to date
* 22:25 legoktm: running sync-common manually on deployment-mediawiki02
* 17:18 brennen: starting 1.39.0-wmf.16 ([[phab:T308069|T308069]]) transcript in deploy1002:~brennen/1.39.0-wmf.16.log
* 09:59 hashar: Deleted a bunch of mwext-.*-jslint jobs that are no longer in use (migrated to either 'npm' or 'jshint' / 'jsonlint')
* 13:35 TheresNoTime: clear stuck `beta-scap-sync-world` job, other queued jobs now running. Cancel running `beta-update-databases-eqiad` job, will ensure it runs on the next timer
* 00:42 TimStarling: on deployment-deploy03 removed helm2, as was done in production


== 2016-02-22 ==
== 2022-06-13 ==
* 22:06 bd808: Restarted puppetmaster service on deployment-puppetmaster to "fix" error "invalid byte sequence in US-ASCII"
* 22:04 TheresNoTime: cleared out stalled Jenkins beta jobs on `deployment-deploy03`, manually started `beta-code-update-eqiad` job & watched to completion. all caught up
* 17:46 jzerebecki: ssh integration-slave-trusty-1017.eqiad.wmflabs 'sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/.git/config.lock'
* 04:33 hashar: Restarting Docker on contint1001.wikimedia.org , apparently can't build images anymore
* 16:47 gehel: deployment-prep upgrading deployment-logstash2 to elasticsearch 1.7.5
* 10:26 gehel: deployment-prep upgrading elastic-search to 1.7.5 on deployment-elastic0[5-8]


== 2016-02-20 ==
== 2022-06-12 ==
* 20:19 Krinkle: beta-code-update-eqiad job repeatedly stuck at "IRC notifier plugin"
* 21:13 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/804777
* 19:29 Krinkle: beta-code-update-eqiad broken because deployment-tin:/srv/mediawiki-staging/php-master/extensions/MobileFrontend/includes/MobileFrontend.hooks.php was modified on the server without commit
* 19:22 Krinkle: Various beta-mediawiki-config-update-eqiad jobs have been stuck 'queued' for > 24 hours


== 2016-02-19 ==
== 2022-06-10 ==
* 12:09 hashar: killed https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ which had been running for 13 hours. Blocked because the slave went offline due to labs reboots yesterday
* 15:20 James_F: Zuul: [mediawiki/extensions/SearchVue] Add initial CI jobs for [[phab:T309932|T309932]]
* 10:15 hashar: Creating a bunch of repositories on GitHub to fix Gerrit replication errors
* 08:28 hashar: Reloaded Zuul to remove mediawiki/services/parsoid from CI dependencies # https://gerrit.wikimedia.org/r/c/integration/config/+/803990
* 04:27 TimStarling: on deployment-deploy03 running scap sync-world -v with PHP 7.4 for [[phab:T295578|T295578]]
* 04:03 TimStarling: on deployment-deploy03 running scap sync-world -v with PHP 7.2 for [[phab:T295578|T295578]] sanity check


== 2016-02-18 ==
== 2022-06-09 ==
* 19:20 legoktm: deploying https://gerrit.wikimedia.org/r/271583 and https://gerrit.wikimedia.org/r/271581, both no-ops
* 22:49 dancy: Upgrading scap to 4.9.1-1+0~20220609211227.304~1.gbpe48c42 in beta cluster
* 18:14 legoktm: deploying https://gerrit.wikimedia.org/r/271012
* 16:39 brennen: gitlab shared runners: re-registering to apply image allowlist configuration
* 17:36 legoktm: deploying https://gerrit.wikimedia.org/r/271555
* 16:01 hashar: deleting instance  integration-slave-precise-1003  think we have enough precise slaves
* 10:44 hashar: Nodepool: JenkinsException: Could not parse JSON info for server[https://integration.wikimedia.org/ci/]


== 2016-02-17 ==
== 2022-06-08 ==
* 07:36 legoktm: deploying https://gerrit.wikimedia.org/r/271201
* 17:14 hashar: Reloaded Zuul for {{Gerrit|I39342265033e82ae13998f53defe6612dc6819b4}}
* 01:01 yuvipanda: attempting to turn off NFS on 52 instances on deployment-prep project
* 15:57 dancy: Set `profile::mediawiki::php::restarts::ensure: present` in deployment-prep hiera config for [[phab:T237033|T237033]]
* 09:28 hashar: Reloaded Zuul for "Add doc publish for Translate" https://gerrit.wikimedia.org/r/792134


== 2016-02-16 ==
== 2022-06-06 ==
* 23:22 yuvipanda: new instances on deployment-prep no longer get NFS because of https://wikitech.wikimedia.org/w/index.php?title=Hiera%3ADeployment-prep&type=revision&diff=311783&oldid=311781
* 14:37 James_F: Zuul: [mediawiki/extensions/ImageSuggestions] Mark as in production for [[phab:T302711|T302711]]
* 23:18 hashar: jenkins@gallium find /var/lib/jenkins/config-history/nodes -maxdepth 1 -type d -name 'ci-jessie*' -exec rm -vfR {} \;
* 23:17 hashar: Jenkins accepting slave creations again. Root cause is /var/lib/jenkins/config-history/nodes/ has reached the 32k inode limit.
* 23:14 hashar: Jenkins: Could not create rootDir /var/lib/jenkins/config-history/nodes/ci-jessie-wikimedia-34969/2016-02-16_22-40-23
* 23:02 hashar: Nodepool can not authenticate with Jenkins anymore. Thus it can not add slaves it spawned.
* 22:56 hashar: contint: Nodepool instances pool exhausted
* 21:14 andrewbogott: deployment-logstash2 migration finished
* 20:49 jzerebecki: reloading zuul for 3bf7584..67fec7b
* 19:58 andrewbogott: migrating deployment-logstash2 to labvirt1010
* 19:00 hashar: tin: checking out mw 1.27.0-wmf.14
* 15:23 hashar: integration-make-wmfbranch : /mnt/make-wmf-branch  mount now has gid=wikidev and group setuid (i.e. mode 2775)
* 15:20 hashar: integration-make-wmfbranch : change tmpfs to /mnt/make-wmf-branch  (from /var/make-wmf-branch )
* 11:30 jzerebecki: T117710 integration-saltmaster:~# salt -v '*slave-trusty*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer/src/skins/BlueSky'
* 09:52 hashar: will cut the wmf branches this afternoon starting around 14:00 CET


== 2016-02-15 ==
== 2022-06-02 ==
* 16:28 jzerebecki: reloading zuul for 2d16ad3..3bb0afa
* 15:33 dancy: Upgrading scap to 4.8.1-1+0~20220602153109.295~1.gbp318d9c in beta cluster
* 16:10 hashar: Image ci-jessie-wikimedia-1455552377 in wmflabs-eqiad is ready
* 11:26 hashar: Restarting Jenkins on contint2001
* 15:25 jzerebecki: reloading zuul for e174335..2d16ad3
* 11:19 hashar: Restarting Jenkins on releases1002
* 15:23 hashar: Image ci-jessie-wikimedia-1455549539 in wmflabs-eqiad is ready
* 15:19 hashar: Regenerating Nodepool snapshot. Slave scripts have 0 bytes...
* 15:04 hashar: Slave scripts added to Nodepool instances! Image ci-jessie-wikimedia-1455548346 in wmflabs-eqiad is ready
* 11:05 hashar: Image ci-jessie-wikimedia-1455534001 in wmflabs-eqiad is ready
* 07:52 legoktm: deploying https://gerrit.wikimedia.org/r/270686
* 06:52 legoktm: legoktm@gallium:/srv/org/wikimedia/doc$ sudo -u jenkins-slave rm -rf EventLogging/ GuidedTour/ MultimediaViewer/ TemplateData/
* 06:22 legoktm: deploying https://gerrit.wikimedia.org/r/270677
* 06:12 legoktm: deploying https://gerrit.wikimedia.org/r/270675
* 06:02 legoktm: deploying https://gerrit.wikimedia.org/r/270674
* 05:56 legoktm: deploying https://gerrit.wikimedia.org/r/270673
* 05:32 legoktm: deploying https://gerrit.wikimedia.org/r/270670
* 04:05 legoktm: deploying https://gerrit.wikimedia.org/r/270667
* 03:26 legoktm: deploying https://gerrit.wikimedia.org/r/270665
* 02:56 legoktm: deploying https://gerrit.wikimedia.org/r/270657


== 2016-02-14 ==
== 2022-05-31 ==
* 23:54 legoktm: deploying https://gerrit.wikimedia.org/r/270656
* 21:16 dancy: Upgrading scap to 4.8.0-1+0~20220531211114.292~1.gbp8dbbcf in beta cluster
* 23:25 legoktm: deploying https://gerrit.wikimedia.org/r/270654
* 17:40 dancy: Upgrading scap to 4.8.0-1+0~20220531173912.291~1.gbp21a7ef in beta cluster
* 23:13 legoktm: also deploying https://gerrit.wikimedia.org/r/#/c/265098/
* 17:33 dancy: Reverted to scap 4.8.0-1+0~20220524160924.288~1.gbp794a08 in beta cluster
* 23:11 legoktm: deploying https://gerrit.wikimedia.org/r/270651
* 17:07 dancy: Upgrading scap to 4.8.0-1+0~20220531170512.289~1.gbp143729 in beta cluster
* 05:18 bd808: tools.stashbot Testing after restart (T126419)


== 2016-02-13 ==
== 2022-05-30 ==
* 06:42 bd808: restarted nutcracker on deployment-mediawiki01
* 11:47 jelto: apply gitlab-settings to gitlab1004 - [[phab:T307142|T307142]]
* 06:32 bd808: jobrunner on deployment-jobrunner01 enabled after reverting changes from T87928 that caused T126830
* 11:46 jelto: apply gitlab-settings to gitlab1003 - [[phab:T307142|T307142]]
* 05:51 bd808: disabled jobrunner process on jobrunner01; queue full of jobs broken by T126830
* 05:31 bd808: trebuchet clone of /srv/jobrunner/jobrunner broken on jobrunner01; failing puppet runs
* 05:25 bd808: jobrunner process on deployment-jobrunner01 badly broken; investigating
* 05:20 bd808: Ran https://phabricator.wikimedia.org/P2273 on deployment-jobrunner01.deployment-prep.eqiad.wmflabs; freed ~500M; disk utilization still at 94%


== 2016-02-12 ==
== 2022-05-28 ==
* 23:54 hashar: beta cluster broken since 20:30 UTC  https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor  haven't looked
* 19:09 TheresNoTime: deployment-deploy04 live, not referenced by anything [[phab:T309437|T309437]]
* 17:36 hashar: salt -v '*slave-trusty*' cmd.run 'apt-get -y install texlive-generic-extra'    # T126422
* 17:32 hashar: adding texlive-generic-extra on CI slaves by cherry picking https://gerrit.wikimedia.org/r/#/c/270322/ - T126422
* 17:19 hashar: get rid of integration-dev   it is broken somehow
* 17:10 hashar: Nodepool back at spawning instances.  contintcloud has been migrated in wmflabs
* 16:51 thcipriani: running  sudo salt '*' -b '10%' deploy.fixurl to fix deployment-prep trebuchet urls
* 16:31 hashar: bd808 added support for saltbot to update tasks automagically!!!! T108720
* 03:10 yurik: attempted to sync graphoid from gerrit 270166 from deployment-tin, but it wouldn't sync.  Tried to git pull sca02, submodules wouldn't pull


== 2016-02-11 ==
== 2022-05-27 ==
* 22:53 thcipriani: shutting down deployment-bastion
* 22:55 zabe: zabe@deployment-mwmaint02:~$ mwscript extensions/WikiLambda/maintenance/updateTypedLists.php --wiki=wikifunctionswiki --db # started ~20 min ago
* 21:28 hashar: pooling back slaves 1001 to 1006
* 22:49 TheresNoTime: manually running database update script: samtar@deployment-deploy03:~$ /usr/local/bin/wmf-beta-update-databases.py
* 21:18 hashar: re enabling hhvm service on slaves ( https://phabricator.wikimedia.org/T126594 ) Some symlink is missing and only provided by the upstart script grrrrrrr https://phabricator.wikimedia.org/T126658
* 22:09 TheresNoTime: samtar@deployment-deploy03:~$ sudo keyholder arm
* 20:52 legoktm: deploying https://gerrit.wikimedia.org/r/270098
* 21:44 TheresNoTime: hard rebooted deployment-deploy03 as soft reboot unresponsive
* 20:35 hashar: depooling the six recent slaves: /usr/lib/x86_64-linux-gnu/hhvm/extensions/current/luasandbox.so cannot open shared object file
* 21:44 bd808: `sudo wmcs-openstack role add --user zabe --project deployment-prep projectadmin` ([[phab:T309419|T309419]])
* 20:29 hashar: pooling integration-slave-trusty-1004 integration-slave-trusty-1005 integration-slave-trusty-1006
* 21:10 zabe: zabe@deployment-deploy03:~$ sudo keyholder arm
* 20:14 hashar: pooling integration-slave-trusty-1001 integration-slave-trusty-1002 integration-slave-trusty-1003
* 20:53 bd808: `sudo wmcs-openstack role add --user samtar --project deployment-prep projectadmin` ([[phab:T309415|T309415]])
* 19:35 marxarelli: modifying deployment server node in jenkins to point to deployment-tin
* 20:49 dancy: Initiated hard reboot of deployment-deploy03.deployment-prep
* 19:27 thcipriani: running sudo salt -b '10%' '*' cmd.run 'puppet agent -t' from deployment-salt
* 19:27 twentyafterfour: Keeping notes on the ticket: https://phabricator.wikimedia.org/T126537
* 19:24 thcipriani: moving deployment-bastion to deployment-tin
* 17:59 hashar: recreated instances with proper names:  integration-slave-trusty-{1001-1006}
* 17:52 hashar: Created integration-slave-trusty-{1019-1026} as m1.large  (note 1023 is an exception it is for Android).  Applied role::ci::slave , lets wait for puppet to finish
* 17:42 Krinkle: Currently testing https://gerrit.wikimedia.org/r/#/c/268802/ in Beta Labs
* 17:27 hashar: Depooling all the ci.medium slaves and deleting them.
* 17:27 hashar: I tried.  The ci.medium instances are too small and MediaWiki tests really need 1.5GBytes of memory :-(
* 16:00 hashar: rebuilding integration-dev https://phabricator.wikimedia.org/T126613
* 15:27 Krinkle: Deploy Zuul config change https://gerrit.wikimedia.org/r/269976
* 11:46 hashar: salt -v '*' cmd.run '/etc/init.d/apache2 restart'  might help for Wikidata browser tests failing
* 11:32 hashar: disabling hhvm service on CI slaves ( https://phabricator.wikimedia.org/T126594 , cherry picked both patches )
* 10:50 hashar: reenabled puppet on CI. All transitioned to a 128MB tmpfs (was 512MB)
* 10:16 hashar: pooling back integration-slave-trusty-1009 and integration-slave-trusty-1010  (tmpfs shrunken)
* 10:06 hashar: disabling puppet on all CI slaves. Trying to lower tmpfs 512MB to 128MB  ( https://gerrit.wikimedia.org/r/#/c/269880/ )
* 02:45 legoktm: deploying https://gerrit.wikimedia.org/r/269853 https://gerrit.wikimedia.org/r/269893


== 2016-02-10 ==
== 2022-05-26 ==
* 23:54 hashar_: depooling Trusty slaves that only have 2GB of RAM, which is not enough.  https://phabricator.wikimedia.org/T126545
* 18:33 dancy: Updated Jenkins beta-* job configs
* 22:55 hashar_: gallium: find /var/lib/jenkins/config-history/config -type f -wholename '*/2015*' -delete  (  https://phabricator.wikimedia.org/T126552 )
* 16:51 TheresNoTime: manually triggered beta-update-databases-eqiad post-merge of {{Gerrit|2c7b5825}}
* 22:34 Krinkle: Zuul is back up and processing Gerrit events, but jobs are still queued indefinitely. Jenkins is not accepting new jobs
* 16:51 brennen: puppetmaster-1001.devtools: resetting ops/puppet checkout to production branch
* 22:31 Krinkle: Full restart of Zuul. Seems Gearman/Zuul got stuck. All executors were idling. No new Gerrit events processed either.
* 21:22 legoktm: cherry-picking https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster again
* 21:17 hashar: CI dust has settled.  Krinkle and I have pooled a lot more Trusty slaves to accommodate for the overload caused by switching to php55 (jobs run on Trusty)
* 21:08 hashar: pooling trusty slaves 1009, 1010, 1021, 1022  with 2 executors  (they are ci.medium)
* 20:38 hashar: cancelling mediawiki-core-jsduck-publish  and mediawiki-core-doxygen-publish jobs manually.  They will catch up on next merge
* 20:34 Krinkle: Pooled integration-slave-trusty-1019 (new)
* 20:28 Krinkle: Pooled integration-slave-trusty-1020 (new)
* 20:24 Krinkle: created integration-slave-trusty-1019 and integration-slave-trusty-1020 (ci1.medium)
* 20:18 hashar: created integration-slave-trusty-1009 and 1010 (trusty ci.medium)
* 20:06 hashar: creating integration-slave-trusty-1021 and integration-slave-trusty-1022 (ci.medium)
* 19:48 greg-g: that cleanup was done by apergos
* 19:48 greg-g: did cleanup across all integration slaves, some were very close to out of room. results:  https://phabricator.wikimedia.org/P2587
* 19:43 hashar: Dropping slaves Precise m1.large  integration-slave-precise-1014 and  integration-slave-precise-1013 , most load shifted to Trusty (php53 -> php55 transition)
* 18:20 Krinkle: Creating a Trusty slave to support increased demand following MediaWiki php53(precise)>php55(trusty) bump
* 16:06 jzerebecki: reloading zuul for 41a92d5..5b971d1
* 15:42 jzerebecki: reloading zuul for 639dd40..41a92d5
* 14:12 jzerebecki: recover a bit of disk space: integration-saltmaster:~# salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/*WikibaseQuality*'
* 13:46 jzerebecki: reloading zuul for 639dd40
* 13:15 jzerebecki: reloading zuul for 3be81c1..e8e0615
* 08:07 legoktm: deploying https://gerrit.wikimedia.org/r/269619
* 08:03 legoktm: deploying https://gerrit.wikimedia.org/r/269613 and https://gerrit.wikimedia.org/r/269618
* 06:41 legoktm: deploying https://gerrit.wikimedia.org/r/269607
* 06:34 legoktm: deploying https://gerrit.wikimedia.org/r/269605
* 02:59 legoktm: deleting 14GB broken workspace of  mediawiki-core-php53lint from  integration-slave-precise-1004
* 02:37 legoktm: deleting /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm-composer on trusty-1017, it had a skin cloned into it
* 02:26 legoktm: queuing mwext jobs server-side to identify failing ones
* 02:21 legoktm: deploying https://gerrit.wikimedia.org/r/269582
* 01:03 legoktm: deploying https://gerrit.wikimedia.org/r/269576


== 2016-02-09 ==
== 2022-05-25 ==
* 23:17 legoktm: deploying https://gerrit.wikimedia.org/r/269551
* 18:38 TheresNoTime: (@ ~18:20UTC) samtar@deployment-mwmaint02:~$ mwscript resetUserEmail.php --wiki=wikidatawiki Mahir256 [snip] [[phab:T309230|T309230]]
* 23:02 legoktm: gracefully restarting zuul
* 15:46 dancy: Restarted apache2 on gerrit1001
* 22:57 legoktm: deploying https://gerrit.wikimedia.org/r/269547
* 22:29 legoktm: deploying https://gerrit.wikimedia.org/r/269540
* 22:18 legoktm: re-enabling puppet on all CI slaves
* 22:02 legoktm: reloading zuul to see if it'll pickup the new composer-php53 job
* 21:53 legoktm: enabling puppet on just integration-slave-trusty-1012
* 21:52 legoktm: cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ onto integration-puppetmaster
* 21:50 legoktm: disabling puppet on all trusty/precise CI slaves
* 21:40 legoktm: deploying https://gerrit.wikimedia.org/r/269533
* 17:49 marxarelli: disabled/enabled gearman in jenkins, connection works this time
* 17:49 marxarelli: performed stop/start of zuul on gallium to restore zuul and gearman
* 17:45 marxarelli: "Failed: Unable to Connect" in jenkins when testing gearman connection
* 17:40 marxarelli: killed old zuul process manually and restarted service
* 17:39 marxarelli: restart of zuul fails as well. old process cannot be killed
* 17:38 marxarelli: reloading zuul fails with "failed to kill 13660: Operation not permitted"
* 16:06 bd808: Deleted corrupt integration-slave-precise-1003:/mnt/jenkins-workspace/workspace/mediawiki-core-php53lint/.git
* 15:11 hashar: mira: /srv/mediawiki-staging/multiversion/checkoutMediaWiki 1.27.0-wmf.13 php-1.27.0-wmf.13
* 14:51 hashar: ./make-wmf-branch -n 1.27.0-wmf.13 -o master
* 14:50 hashar: pooling back integration-slave-precise1001 - 1004.  Manually fetched git repos in workspace for  mediawiki core php53
* 14:49 hashar: make-wmf-branch instance: created a local ssh key pair and set the config to use User: hashar
* 14:13 hashar: pooling  https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/  Mysql is back .. Blame puppet
* 14:12 hashar: de pooling  https://integration.wikimedia.org/ci/computer/integration-slave-precise-1012/  Mysql is gone somehow
* 14:04 hashar: Manually git fetching  mediawiki-core in /mnt/jenkins-workspace/workspace/mediawiki-core-php53lint of slaves precise 1001 to 1004  (git on Precise is remarkably too slow)
* 13:28 hashar: salt '*trusty*' cmd.run 'update-alternatives --set php /usr/bin/hhvm'
* 13:28 hashar: salt '*precise*' cmd.run 'update-alternatives --set php /usr/bin/php5'
* 13:18 hashar: salt -v --batch=3 '*slave*' cmd.run 'puppet agent -tv'
* 13:15 hashar: removing https://gerrit.wikimedia.org/r/#/c/269370/ from CI puppet master
* 13:14 hashar: slave recurse infinitely doing /bin/bash -eu /srv/deployment/integration/slave-scripts/bin/mw-install-mysql.sh  then loop over /bin/bash /usr/bin/php maintenance/install.php --confpath /mnt/jenkins-workspace/workspace/mediawiki-core-qunit/src --dbtype=mysql --dbserver=127.0.0.1:3306 --dbuser=jenkins_u2 --dbpass=pw_jenkins_u2 --dbname=jenkins_u2_mw --pass testpass TestWiki WikiAdmin  https://phabricator.wikimedia.org/T126327
* 12:46 hashar: Mass testing php loop of death:  salt -v '*slave*' cmd.run 'timeout 2s /srv/deployment/integration/slave-scripts/bin/php --version'
* 12:40 hashar: mass rebooting CI slaves from wikitech
* 12:39 hashar: salt -v '*' cmd.run "bash -c 'cd /srv/deployment/integration/slave-scripts; git pull'"
* 12:33 hashar: all slaves dying due to PHP looping
* 12:02 legoktm: re-enabling puppet on all trusty/precise slaves
* 11:20 legoktm: cherry-picked https://gerrit.wikimedia.org/r/#/c/269370/ on integration-puppetmaster
* 11:20 legoktm: enabling puppet just on integration-slave-trusty-1012
* 11:13 legoktm: disabling puppet on all *(trusty|precise)* slaves
* 10:26 hashar: pooling in  integration-slave-trusty-1018
* 03:19 legoktm: deploying https://gerrit.wikimedia.org/r/269359
* 02:53 legoktm: deploying https://gerrit.wikimedia.org/r/238988
* 00:39 hashar: gallium edited /usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/trigger/gerrit.py  and modified:  replication_timeout = 300 -> replication_timeout = 10
* 00:37 hashar: live hacking Zuul code to have it stop sleeping() on force merge
* 00:36 hashar: killing zuul


== 2016-02-08 ==
== 2022-05-24 ==
* 23:48 legoktm: finally deploying https://gerrit.wikimedia.org/r/269327
* 15:15 dancy: Upgrading scap to 4.7.1-1+0~20220524151055.286~1.gbpe809e8 in beta cluster
* 23:14 hashar: zuul promote --pipeline gate-and-submit --changes 269065,2 https://gerrit.wikimedia.org/r/#/c/269065/
* 13:35 James_F: Zuul: [mediawiki/tools/code-utils] Add composer test CI for [[phab:T309099|T309099]]
* 23:10 hashar: pooling integration-slave-precise-1001 1002 1004
* 11:36 TheresNoTime: cleared stuck beta deployment jobs per https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code/db_update
* 22:47 hashar: Err need to reboot newly provisioned instances before adding them to Jenkins (kernel upgrade,apache restart etc)
* 22:45 hashar: Pooled https://integration.wikimedia.org/ci/computer/integration-slave-precise-1003/
* 22:25 hashar: integration-slave-precise-{1001-1004} applied role::ci::slave::labs, running puppet in slaves. I have added the instances as Jenkins slaves and put them offline. Whenever puppet is done, we can mark them online in Jenkins then monitor the jobs running on them are working properly
* 22:15 hashar: Provisioning integration-slave-precise-{1001-1004} https://phabricator.wikimedia.org/T126274 (need more php53 slots)
* 22:13 hashar: Deleted cache-rsync instance superseded by castor instance
* 22:10 hashar: Deleting pmcache.integration.eqiad.wmflabs (was to investigate various kind of central caches).
* 20:14 marxarelli: aborting pending mediawiki-extensions-php53 job for CheckUser
* 20:08 bd808: toggled "Enable Gearman" off and on in Jenkins to wake up deployment-bastion workers
* 14:54 hashar: nodepool: refreshed snapshot image , Image ci-jessie-wikimedia-1454942958 in wmflabs-eqiad is ready
* 14:47 hashar: regenerated nodepool reference image (got rid of grunt-cli https://gerrit.wikimedia.org/r/269126 )
* 09:41 legoktm: deploying https://gerrit.wikimedia.org/r/269093 https://gerrit.wikimedia.org/r/269094
* 09:36 hashar: restarting integration puppetmaster (out of memory / cannot fork)
* 06:11 bd808: tgr set $wgAuthenticationTokenVersion on beta cluster (test run for T124440)
* 02:09 legoktm[NE]: deploying https://gerrit.wikimedia.org/r/268047
* 00:57 legoktm[NE]: deploying https://gerrit.wikimedia.org/r/268031


== 2016-02-06 ==
== 2022-05-23 ==
* 18:34 jzerebecki: reloading zuul for bdb2ed4..46ccca9
* 19:21 inflatador: Deleted deployment-elastic0[5-7] in favor of newer bullseye hosts [[phab:T299797|T299797]]
* 18:37 dancy: Reverted to scap 4.7.1-1+0~20220505181519.270~1.gbpeb47ae in beta cluster
* 18:35 dancy: Upgrading beta cluster scap to 4.7.1-1+0~20220523183110.280~1.gbpaa0826
* 14:49 James_F: Zuul: Enforce Postgres and SQLite support via in-mediawiki-tarball
* 08:37 elukey: move kafka jumbo in deployment-prep to fixed uid/gid - [[phab:T296982|T296982]]
* 08:29 elukey: move kafka main in deployment-prep to fixed uid/gid - [[phab:T296982|T296982]]
* 08:06 elukey: move kafka logging in deployment-prep to fixed uid/gid - [[phab:T296982|T296982]]


== 2016-02-05 ==
== 2022-05-22 ==
* 13:30 hashar: beta: cleaning out /data/project/logs/archive, which was from the pre-logstash era. We have not logged this way since May 2015, apparently
* 18:39 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/795818/
* 13:29 hashar: beta: deleting /data/project/swift-disk, created in August 2014, unused since June 2015. Was a failed attempt at bringing Swift to beta
* 13:27 hashar: beta: reclaiming disk space from extensions.git. On bastion: find /srv/mediawiki-staging/php-master/extensions/.git/modules -maxdepth 1 -type d -print -execdir git gc \;
* 13:03 hashar: integration-slave-trusty-1011 went out of disk space. Did some brute clean up and git gc.
* 05:21 Tim: configured mediawiki-extensions-qunit to only run on integration-slave-trusty-1017, did a rebuild and then switched it back


== 2016-02-04 ==
== 2022-05-21 ==
* 22:08 jzerebecki: reloading zuul for bed7be1..f57b7e2
* 23:05 legoktm: deployed https://gerrit.wikimedia.org/r/c/integration/config/+/794756/
* 21:51 hashar: salt-key -d integration-slave-jessie-1001.eqiad.wmflabs
* 14:11 hashar: Icinga reports `Gerrit Health Check SSL Expiry` errors filed as [[phab:T308908|T308908]]
* 21:50 hashar: salt-key -d integration-slave-precise-1011.eqiad.wmflabs
* 00:57 bd808: Got deployment-bastion processing Jenkins jobs again via instructions left by my past self at https://phabricator.wikimedia.org/T72597#747925
* 00:43 bd808: Jenkins agent on deployment-bastion.eqiad doing the trick where it doesn't pick up jobs again


== 2016-02-03 ==
== 2022-05-20 ==
* 22:24 bd808: Manually ran sync-common on deployment-jobrunner01.eqiad.wmflabs to pickup wmf-config changes that were missing (InitializeSettings, Wikibase, mobile)
* 16:21 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/793809
* 17:43 marxarelli: Reloading Zuul to deploy previously undeployed Icd349069ec53980ece2ce2d8df5ee481ff44d5d0 and Ib18fe48fe771a3fe381ff4b8c7ee2afb9ebb59e4
* 15:12 hashar: apt-get upgrade deployment-sentry2
* 15:03 hashar: redeployed rcstream/rcstream on deployment-stream by using git-deploy on deployment-bastion
* 14:55 hashar: upgrading deployment-stream
* 14:42 hashar: pooled back integration-slave-trusty-1015  Seems ok
* 14:35 hashar: manually triggered a bunch of browser tests jobs
* 11:40 hashar: apt-get upgrade deployment-ms-be01 and deployment-ms-be02
* 11:32 hashar: fixing puppet.conf on deployment-memc04
* 11:09 hashar: restarting beta cluster puppetmaster just in case
* 11:07 hashar: beta: apt-get upgrade on deployment-cache* hosts and checking puppet
* 10:59 hashar: integration/beta:  deleting /etc/apt/apt.conf.d/*proxy  files.  There is no need for them, in fact web proxy is not reachable from labs
* 10:53 hashar: integration: switched puppet repo back to 'production' branch, rebased.
* 10:49 hashar: various beta cluster instances have puppet errors
* 10:46 hashar: integration-slave-trusty-1013 heading to out of disk space on /mnt ...
* 10:42 hashar: integration-slave-trusty-1016 out of disk space on /mnt ...
* 03:45 bd808: Puppet failing on deployment-fluorine with "Error: Could not set uid on user[datasets]: Execution of '/usr/sbin/usermod -u 10003 datasets' returned 4: usermod: UID '10003' already exists"
* 03:44 bd808: Freed 28G by deleting deployment-fluorine:/srv/mw-log/archive/*2015*
* 03:42 bd808: Ran deployment-bastion.deployment-prep:/home/bd808/cleanup-var-crap.sh and freed 565M


== 2016-02-02 ==
== 2022-05-19 ==
* 18:32 marxarelli: Reloading Zuul to deploy If1f3cb60f4ccb2c1bca112900dbada03a8588370
* 19:34 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/793527
* 17:42 marxarelli: cleaning mwext-donationinterfacecore125-testextension-php53 workspace on integration-slave-precise-1013
* 14:31 hashar: Reloaded zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/793458 {{!}} Don't re-trigger the test pipeline on patches with C+2 already
* 17:06 ostriches: running sync-common on mw2051 and mw1119
* 09:38 hashar: Jenkins is fully up and operational
* 09:33 hashar: restarting Jenkins
* 08:47 hashar: pooling back integration-slave-precise-1011, puppet run got fixed ( https://phabricator.wikimedia.org/T125474 )
* 03:48 legoktm: deploying https://gerrit.wikimedia.org/r/267828
* 03:29 legoktm: deploying https://gerrit.wikimedia.org/r/266941
* 00:42 legoktm: due to T125474
* 00:42 legoktm: marked integration-slave-precise-1011 as offline
* 00:39 legoktm: precise-1011 slave hasn't had a puppet run in 6 days


== 2016-02-01 ==
== 2022-05-18 ==
* 23:53 bd808: Logstash working again; I applied a change to the default mapping template for Elasticsearch that ensures that fields named "timestamp" are indexed as plain strings
* 19:31 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/793028
* 23:46 bd808: Elasticsearch index template for beta logstash cluster making crappy guesses about syslog events; dropped 2016-02-01 index; trying to fix default mappings
* 18:45 brennen: gitlab: created placeholder /repos/mediawiki group for squatting purposes
* 23:09 bd808: HHVM logs causing rejections during document parse when inserting in Elasticsearch from logstash. They contain a "timestamp" field that looks like "Feb  1 22:56:39" which is making the mapper in Elasticsearch sad.
* 08:29 hashar: Updating SSH Build agent from 1.31.5 to 1.32.0 on CI Jenkins to prevent an issue when uploading `remoting.jar`  # [[phab:T307339|T307339]]#7937268
* 23:04 bd808: Elasticsearch on deployment-logstash2 rejecting all documents with 400 status. Investigating
* 07:32 hashar: Deleting Jenkins agent configuration for `integration-castor03` # [[phab:T252071|T252071]]
* 22:50 bd808: Copying deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log to /srv for debugging later
* 22:48 bd808: deployment-logstash2.deployment-prep:/var/log/logstash/logstash.log is 11G of fail!
* 22:46 bd808: root partition on deployment-logstash2 full
* 22:43 bd808: No data in logstash since 2016-01-30T06:55:37.838Z; investigating
* 15:33 hashar: Image ci-jessie-wikimedia-1454339883 in wmflabs-eqiad is ready
* 15:01 hashar: Refreshing Nodepool image. Might have npm/grunt properly set up
* 03:15 legoktm: deploying https://gerrit.wikimedia.org/r/267630


== 2016-01-31 ==
== 2022-05-17 ==
* 13:35 hashar: Jenkins IRC bot started failing at Jan 30 01:04:00 2016 for whatever reason. Should be fine now
* 23:26 James_F: Zuul: [mediawiki/extensions/Phonos] Install basic quibble CI for [[phab:T308558|T308558]]
* 13:33 hashar: cancelling/aborting jobs that are stuck while reporting to IRC (mostly browser tests and beta cluster jobs)
* 13:32 hashar: Jenkins jobs are being blocked because they can no more report back to IRC :-(((
* 13:28 hashar: Jenkins jobs are being blocked because they can no more report back to IRC :-(((


== 2016-01-30 ==
== 2022-05-16 ==
* 12:46 hashar: integration-slave-jessie-1001: fixed puppet.conf server name and ran puppet
* 19:31 inflatador: bking@deployment-elastic07 halted deployment-elastic07 in beta ES cluster; will decom on Friday [[phab:T299797|T299797]]
* 19:02 inflatador: bking@deployment-elastic06 halted deployment-elastic06 in beta ES cluster; will decom on Friday [[phab:T299797|T299797]]
* 08:33 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/791809


== 2016-01-29 ==
== 2022-05-14 ==
* 18:43 thcipriani: updated scap on beta
* 23:19 James_F: Zuul: Add Dreamy_Jazz to CI allow list
* 16:44 thcipriani: deployed scap updates on beta
* 23:17 James_F: Zuul: [mediawiki/extensions/LocalisationUpdate] Move out of production section
* 11:58 _joe_: upgraded hhvm to 3.6 wm8 in deployment-prep
* 20:25 urbanecm: add TheresNoTime (samtar) as a project member per request


== 2016-01-28 ==
== 2022-05-13 ==
* 23:22 MaxSem: Updated portals on betalabs to master
* 22:59 James_F: Zuul: [mediawiki/extensions/SocialProfile] Add WikiEditor as a CI dependency
* 22:23 hashar: salt '*slave-precise*' cmd.run 'apt-get install php5-ldap'  ( https://phabricator.wikimedia.org/T124613 )  will need to be puppetized
* 22:52 James_F: Zuul: Add Tranve to CI allow list
* 18:17 thcipriani: cleaning npm cache on slave machines: salt -v '*slave*' cmd.run 'sudo -i -u jenkins-deploy -- npm cache clean'
* 22:01 hashar: reloaded zuul for https://gerrit.wikimedia.org/r/791688
* 18:12 thcipriani: running npm cache clean on integration-slave-precise-1011 sudo -i -u jenkins-deploy -- npm cache clean
* 18:58 inflatador: bking@deployment-elastic05 halted deployment-elastic05 in beta ES cluster; will decom in 1 wk [[phab:T299797|T299797]]
* 15:25 hashar: apt-get upgrade deployment-sca01 and deployment-sca02
* 17:18 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/791644/
* 15:09 hashar: fixing puppet.conf hostname on deployment-upload  deployment-conftool  deployment-tmh01 deployment-zookeeper01 and deployment-urldownloader
* 13:16 taavi: added user Zoranzoki21 to extension-HidePrefix gerrit group [[phab:T305317|T305317]]
* 15:06 hashar: fixing puppet.conf hostname on deployment-upload.deployment-prep.eqiad.wmflabs and running puppet
* 15:00 hashar: Running puppet on deployment-memc02 and deployment-elastic07 . It is catching up with lot of changes
* 14:59 hashar: fixing puppet hostnames on deployment-elastic07
* 14:59 hashar: fixing puppet hostnames on deployment-memc02
* 14:55 hashar: Deleted salt keys deployment-pdf01.eqiad.wmflabs and deployment-memc04.eqiad.wmflabs  (obsolete,  entries with '.deployment-prep.' are already there)
* 07:38 jzerebecki: reload zuul for 4951444..43a030b
* 05:55 jzerebecki: doing https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
* 03:49 mobrovac: deployment-prep re-enabled puppet on deployment-restbase0x
* 02:49 mobrovac: deployment-prep deployment-restbase01 disabled puppet to set up cassandra for
* 02:27 mobrovac: deployment-prep recreating deployment-restbase01 for T125003
* 02:23 mobrovac: deployment-prep deployment-restbase02 disabled puppet to recreate deployment-restbase01 for T125003
* 01:42 mobrovac: deployment-prep recreating deployment-sca02 for T125003
* 01:28 mobrovac: deployment-prep recreating deployment-sca01 for T125003
* 00:36 mobrovac: deployment-prep re-imaging deployment-mathoid for T125003
* 00:02 jzerebecki: integration-slave-trusty-1016:~$ sudo -i rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/Donate


== 2016-01-27 ==
== 2022-05-12 ==
* 23:49 jzerebecki: integration-slave-precise-1011:~$ sudo -i /etc/init.d/salt-minion restart
* 22:09 inflatador: bking@deployment-elastic05 banned deployment-elastic05 from beta ES cluster in preparation for decom [[phab:T299797|T299797]]
* 23:46 jzerebecki: work around https://phabricator.wikimedia.org/T117710 : salt --show-timeout '*slave*' cmd.run 'rm -rf /mnt/jenkins-workspace/workspace/mwext-testextension-hhvm/src/skins/BlueSky'
* 19:53 hashar: gerrit: triggering full replication to gerrit2001 to test [[phab:T307137|T307137]]
* 21:19 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf (should be no-op after yesterday's deploy)
* 16:00 hashar: contint2001 and contint1001 now automatically run `docker system prune --force` every day  and `docker system prune --force` on Sunday {{!}} https://gerrit.wikimedia.org/r/c/operations/puppet/+/773784/
* 10:29 hashar: triggered bunch of browser tests, deployment-redis01 was dead/faulty
* 15:05 brennen: gitlab-prod-1001.devtools: soft reboot
* 10:08 hashar: mass restarting redis-server process on deployment-redis01 (for https://phabricator.wikimedia.org/T124677 )
* 00:46 brennen: gitlab: disabling container registries on all existing projects ([[phab:T307537|T307537]])
* 10:07 hashar: mass restarting redis-server process on deployment-redis01
* 09:00 hashar: beta: commenting out "latency-monitor-threshold 100" parameter from any /etc/redis/redis.conf we have ( https://phabricator.wikimedia.org/T124677 ). Puppet will not reapply it unless distribution is Jessie


== 2016-01-26 ==
== 2022-05-11 ==
* 16:51 cscott: updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
* 23:20 brennen: gitlab-prod-1001.devtools: container registry currently enabled
* 12:14 hashar: Added Jenkins IRC bot (wmf-insecte) to #wikimedia-perf for https://gerrit.wikimedia.org/r/#/c/265631/
* 18:58 brennen: gitlab-prod-1001.devtools: setting to use devtools standalone puppetmaster
* 09:30 hashar: restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
* 04:18 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build (27 hours after the last time I did that)


== 2016-01-25 ==
== 2022-05-10 ==
* 18:59 twentyafterfour: started redis-server on deployment-redis01 by commenting out latency-monitor-threshold from the redis.conf
* 12:06 hashar: Updating Quibble jobs to image 1.4.5 with Memcached enabled {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/790641 {{!}} [[phab:T300340|T300340]]
* 15:22 hashar: CI: fixing kernels not upgrading via:  rm /boot/grub/menu.lst ; update-grub -y  (i.e.: regenerate the Grub menu from scratch)
* 10:55 hashar: Updating `wmf-quibble-*` jobs to Quibble 1.4.5 # https://gerrit.wikimedia.org/r/c/integration/config/+/790638/
* 14:21 hashar: integration-slave-trusty-1015.integration.eqiad.wmflabs  is gone. I have failed the kernel upgrade / grub update
* 08:36 hashar: Updating wikibase-client-docker and wikibase-repo-docker to Quibble 1.4.5 + supervisord https://gerrit.wikimedia.org/r/c/integration/config/+/790621
* 01:35 bd808: integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build
* 08:30 hashar: Updating MediaWiki coverage jobs to Quibble image 1.4.5 + supervisord https://gerrit.wikimedia.org/r/c/integration/config/+/790381
* 08:24 hashar: Updating codehealth jobs to Quibble 1.4.5 + supervisord https://gerrit.wikimedia.org/r/c/integration/config/+/790380/
* 08:23 hashar: Updating MediaWiki Phan jobs to Quibble 1.4.5 https://gerrit.wikimedia.org/r/c/integration/config/+/790377


== 2016-01-24 ==
== 2022-05-09 ==
* 06:45 legoktm: deploying https://gerrit.wikimedia.org/r/266039
* 21:43 James_F: Beta Cluster: Shutting down old deployment-restbase03 instance for [[phab:T295375|T295375]]
* 06:13 legoktm: deploying https://gerrit.wikimedia.org/r/266041
* 20:33 hashar: Manually cancelling deadlock build jobs for beta https://integration.wikimedia.org/ci/view/Beta/ # [[phab:T307963|T307963]]


== 2016-01-22 ==
== 2022-05-08 ==
* 23:58 legoktm: removed skins from mwext-qunit workspace on trusty-1013 slave
* 12:33 urbanecm: deployment-prep: urbanecm@deployment-mwmaint02:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMenteeOverviewFiltersToPresets.php --update # [[phab:T304057|T304057]]
* 23:34 legoktm: rm -rf /mnt/jenkins-workspace/workspace/mediawiki-phpunit-php53 on slave precise 1012
* 22:45 legoktm: deploying https://gerrit.wikimedia.org/r/265864
* 22:27 hashar: rebooted all CI slaves using OpenStackManager
* 22:09 hashar: rebooting deployment-redis01 (kernel upgrade)
* 21:22 hashar: Image ci-jessie-wikimedia-1453497269 in wmflabs-eqiad is ready (with node 4.2 for https://phabricator.wikimedia.org/T119143 )
* 21:14 hashar: updating nodepool snapshot based on new image
* 21:12 hashar: rebuilding nodepool reference image
* 20:04 hashar: Image ci-jessie-wikimedia-1453492820 in wmflabs-eqiad is ready
* 20:00 hashar: Refreshing nodepool image to hopefully get Nodejs 4.2.4 https://phabricator.wikimedia.org/T124447  https://gerrit.wikimedia.org/r/#/c/265802/
* 16:32 hashar: Nuked corrupted git repo on integration-slave-precise-1012 /mnt/jenkins-workspace/workspace/mediawiki-extensions-php53
* 12:23 hashar: beta: reinitialized keyholder on deployment-bastion.  The proxy apparently  had no identity
* 09:32 hashar: beta cluster Jenkins jobs have been stalled for 9 hours and 25 minutes. Disabling/reenabling the Gearman plugin to remove the deadlock


== 2016-01-21 ==
== 2022-05-06 ==
* 21:41 hashar: restored role::mail::mx on deployment-mx
* 12:55 hashar: Migrated Castor service from integration-castor03 to integration-castor05 # [[phab:T252071|T252071]]
* 21:36 hashar: dropping role::mail::mx from deployment-mx  to let  puppet  run
* 21:33 hashar: rebooting deployment-jobrunner01  / kernel upgrade /  /tmp is only 1MBytes
* 21:19 hashar: fixing up deployment-jobrunner01  /tmp and / disks are full
* 19:57 thcipriani: ran REPAIR TABLE globalnames; on centralauth db
* 19:48 legoktm: deploying https://gerrit.wikimedia.org/r/265552
* 19:39 legoktm: deploying jjb changes for https://gerrit.wikimedia.org/r/264990
* 19:25 legoktm: deploying https://gerrit.wikimedia.org/r/265546
* 01:59 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions/SpellingDictionary$ rm -r modules/jquery.uls && git rm modules/jquery.uls
* 01:00 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git pull && git submodule update --init --recursive
* 00:57 jzerebecki: jenkins-deploy@deployment-bastion:/srv/mediawiki-staging/php-master/extensions$ git reset HEAD SpellingDictionary


== 2016-01-20 ==
== 2022-05-05 ==
* 20:05 hashar: beta sudo find /data/project/upload7/math -type f -delete  (probably some old left over)
* 22:57 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789723
* 19:50 hashar: beta: on commons ran deleteArchivedFile.php : Nuked 7130 files
* 22:31 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789721
* 19:49 hashar: beta : foreachwiki deleteArchivedRevisions.php -delete
* 22:28 dduvall: created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789720
* 19:26 hasharAway: Nuked all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload
* 22:24 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789718
* 19:19 hasharAway: beta: sudo find /data/project/upload7/*/*/temp -type f -delete
* 22:21 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/789717
* 19:14 hasharAway: beta: sudo rm /data/project/upload7/*/*/lockdir/*
* 22:15 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789714
* 18:57 hasharAway: beta cluster code has been stalled for roughly 2h30
* 22:13 dduvall: created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789713
* 18:55 hasharAway: disconnecting Gearman plugin to remove deadlock for beta cluster jobs
* 22:09 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789711
* 17:06 hashar: clearing files from beta-cluster to prepare for Swift migration. python pwb.py delete.py -family:betacommons -lang:en -cat:'GWToolset Batch Upload' -verbose -putthrottle:0 -summary:'Clearing out old batched upload to save up disk space for Swift migration'
* 22:07 dduvall: created 2 new jobs to deploy https://gerrit.wikimedia.org/r/789710
* 21:57 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789707/1
* 21:51 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789706
* 21:48 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789704
* 21:44 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789703
* 21:38 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789698
* 21:35 dduvall: created 4 jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789697
* 21:26 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789694
* 21:22 dduvall: creating 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789693
* 18:27 dduvall: reenabled puppet on integration-agent-docker-1023.integration.eqiad1.wikimedia.cloud
* 18:25 dancy: Update to scap 4.7.1-1+0~20220505181519.270~1.gbpeb47ae in beta cluster
* 18:16 dduvall: disabled puppet on integration-agent-docker-1023.integration.eqiad1.wikimedia.cloud for deployment of https://gerrit.wikimedia.org/r/c/operations/puppet/+/768774
* 16:29 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789650
* 16:26 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789649
* 14:25 hashar: Created integration-castor05
* 12:28 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/789179 and https://gerrit.wikimedia.org/r/789232
* 07:45 hashar: deployment-prep: removed a few queued Jenkins  builds from https://integration.wikimedia.org/ci/view/Beta/


== 2016-01-19 ==
== 2022-05-04 ==
* 22:25 legoktm: deleting *zend* workspaces on precise slaves
* 21:29 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789285
* 21:58 thcipriani: trying https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update again
* 21:16 dduvall: created 1 new job to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789284
* 21:57 thcipriani: beta-scap-eqiad still can't find executor on deployment-bastion.eqiad
* 21:07 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789278
* 21:52 thcipriani: following steps at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update for deployment-bastion
* 21:00 dduvall: created 2 jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789277
* 19:34 legoktm: deleting all *zend* jobs from jenkins
* 20:48 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/789274
* 09:40 hashar: Created github repo https://github.com/wikimedia/operations-debs-varnish4
* 20:44 dduvall: creating 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789273
* 03:59 legoktm: deploying https://gerrit.wikimedia.org/r/264912 and https://gerrit.wikimedia.org/r/264922
* 20:31 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789265
* 20:25 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789264
* 20:22 urbanecm: urbanecm@deployment-mwmaint02:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki "There'sNoTime" "TheresNoTime" # [[phab:T307590|T307590]]
* 20:14 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789259/1
* 20:11 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789258
* 18:54 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789245
* 18:47 dduvall: creating 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789244
* 18:31 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789238
* 18:24 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789237
* 17:51 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789225
* 17:22 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789218
* 17:12 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789217
* 16:11 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789204
* 16:01 dduvall: created 2 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789203
* 16:01 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789195
* 15:42 dduvall: created 2 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/789194
* 13:44 James_F: Zuul: [mediawiki/services/function-evaluator] Use bespoke pipeline jobs only [[phab:T307507|T307507]]


== 2016-01-17 ==
== 2022-05-03 ==
* 18:02 legoktm: deploying https://gerrit.wikimedia.org/r/264605
* 23:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/788871
* 23:23 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/788868
* 22:03 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/788806
* 22:01 dduvall: created 4 new jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/788806
* 21:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/788798
* 21:27 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/788799
* 21:25 dduvall: created trigger-pipelinelib-pipeline-test and pipelinelib-pipeline-test jobs for https://gerrit.wikimedia.org/r/c/integration/config/+/788799
* 11:50 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/788682


== 2016-01-16 ==
== 2022-05-02 ==
* 21:47 legoktm: deploying https://gerrit.wikimedia.org/r/264489
* 15:09 dancy: Updating beta cluster scap to 4.7.1-1+0~20220502085300.264~1.gbp367de7?
* 21:36 legoktm: deploying https://gerrit.wikimedia.org/r/264488
* 10:06 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/786934 # [[phab:T301766|T301766]]
* 21:29 legoktm: deploying https://gerrit.wikimedia.org/r/264487
* 21:21 legoktm: deploying https://gerrit.wikimedia.org/r/264483 https://gerrit.wikimedia.org/r/264485
* 20:58 legoktm: deploying https://gerrit.wikimedia.org/r/264492
* 18:55 jzerebecki: reloading zuul for 996c558..5f8eb50
* 09:12 legoktm: deploying https://gerrit.wikimedia.org/r/264448
* 09:01 legoktm: deploying https://gerrit.wikimedia.org/r/264446 and https://gerrit.wikimedia.org/r/264447
* 07:46 legoktm: sudo -u jenkins-deploy mv /mnt/jenkins-workspace/workspace/mediawiki-core-phplint /mnt/jenkins-workspace/workspace/mediawiki-core-php53lint on all precise slaves
* 07:17 legoktm: deploying https://gerrit.wikimedia.org/r/264444
* 06:31 legoktm: deploying https://gerrit.wikimedia.org/r/264441
* 06:10 legoktm: added phpflavor-php53 label to all phpflavor-zend slaves


== 2016-01-15 ==
== 2022-04-29 ==
* 12:17 hashar: restarting Jenkins for plugins updates
* 21:49 brennen: created https://gitlab.wikimedia.org/toolforge-repos and https://gitlab.wikimedia.org/cloudvps-repos for cloud tenants ([[phab:T305301|T305301]])
* 02:49 bd808: Trying to fix submodules in deployment-bastion:/srv/mediawiki-staging/php-master/extensions for T123701
* 18:37 James_F: Zuul: Add SimilarEditors dependency on QuickSurveys extension for [[phab:T297687|T297687]]


== 2016-01-14 ==
== 2022-04-28 ==
* 20:06 legoktm: deploying https://gerrit.wikimedia.org/r/264122
* 20:31 James_F: Zuul: Add PHP81 as voting for libraries, PHP extensions etc. for [[phab:T293509|T293509]]
* 19:32 legoktm: deploying https://gerrit.wikimedia.org/r/264114
* 18:57 brennen: finished editing mediawiki-new-errors
* 19:18 legoktm: deploying https://gerrit.wikimedia.org/r/264108
* 18:50 brennen: adding some filters to mediawiki-new-errors, including one based on https://wikitech.wikimedia.org/wiki/Performance/Runbook/Kibana_monitoring#Filtering_by_query_string
* 09:03 hashar: Gerrit upgraded to 3.4.4  at roughly 8:00 UTC


== 2016-01-13 ==
== 2022-04-27 ==
* 21:06 hashar: beta cluster code is up to date again. Got delayed by roughly 4 hours.
* 19:06 hashar: Updating operations/software/gerrit branches and tags from upstream # [[phab:T292759|T292759]]
* 20:55 hashar: unlocked Jenkins jobs for beta cluster by disabling/reenabling  Jenkins Gearman client
* 15:20 hashar: Updating non-quibble jobs to composer 2.3.3 {{!}} [[phab:T303867|T303867]] {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/777029
* 10:15 hashar: beta: fixed puppet on deployment-elastic06 . Was still using cert/hostname without .deployment-prep. .... Mass update occurring.


== 2016-01-12 ==
== 2022-04-26 ==
* 23:30 legoktm: deploying https://gerrit.wikimedia.org/r/263757 https://gerrit.wikimedia.org/r/263756
* 15:40 brennen: train 1.39.0-wmf.9 ([[phab:T305215|T305215]]): no current blockers - expect to start train ops after the toolhub deployment window wraps, so some time after 17:00 UTC; taking a pre-train stroll-around-the-block break before that.
* 13:32 hashar: beta cluster: running /usr/local/sbin/cleanup-pam-config
* 13:46 James_F: Deleting deployment-mx02.deployment-prep.eqiad1.wikimedia.cloud for [[phab:T306068|T306068]]
* 13:29 hashar: integration running /usr/local/sbin/cleanup-pam-config  on slaves
* 13:38 James_F: Zuul: [mediawiki/extensions/SimilarEditors] Install basic prod CI for [[phab:T306897|T306897]]
* 12:33 hashar: Manually pruned dangling docker images on contint1001 and contint2001
* 08:30 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/780824
* 08:09 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/785204


== 2016-01-11 ==
== 2022-04-25 ==
* 22:24 hashar: Deleting old references on Zuul-merger for mediawiki/core : <tt>/usr/share/python/zuul/bin/python /home/hashar/zuul-clear-refs.py --until 15 /srv/ssd/zuul/git/mediawiki/core </tt>
* 17:29 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/779450
* 22:21 hashar: gallium in /srv/ssd/zuul/git/mediawiki/core$  git gc --prune=all && git remote update --prune
* 15:31 James_F: Zuul: [mediawiki/extensions/RegularTooltips] Add basic quibble CI
* 22:21 hashar: scandium  in /srv/ssd/zuul/git/mediawiki/core$  git gc --prune=all && git remote update --prune
* 07:35 legoktm: deploying https://gerrit.wikimedia.org/r/263319


== 2016-01-07 ==
== 2022-04-20 ==
* 23:16 legoktm: deleted /mnt/jenkins-workspace/workspace/mediawiki-extensions-qunit/src/extensions/PdfHandler/.git/refs/heads/wmf/1.26wmf16.lock on slave 1013
* 16:25 zabe: root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service
* 06:32 legoktm: deploying https://gerrit.wikimedia.org/r/262868
* 02:24 legoktm: deploying https://gerrit.wikimedia.org/r/262855
* 01:25 jzerebecki: reloading zuul for b0a5335..c16368a


== 2016-01-06 ==
== 2022-04-18 ==
* 21:13 thcipriani: kicking integration puppetmaster, weird node unable to find definition.
* 19:27 brennen: gitlab runners: deleting a number of stale runners with no contacts in > 2 months which are most likely no longer extant
* 21:11 jzerebecki: on scandium: sudo -u zuul rm -rf /srv/ssd/zuul/git/mediawiki/services/mathoid
* 16:49 brennen: phabricator: created phame blog https://phabricator.wikimedia.org/phame/blog/view/22/ for [[phab:T306329|T306329]]
* 21:04 legoktm: ^ on gallium
* 16:48 brennen: phabricator: adding self to acl*blog-admins
* 21:04 legoktm: manually deleted /srv/ssd/zuul/git/mediawiki/services/mathoid to force zuul to re-clone it
* 15:33 James_F: Shutting off deployment-wdqs01 from the Beta Cluster project per [[phab:T306054|T306054]]; it's apparently unused, so this shouldn't break anything.
* 20:17 hashar: beta: dropped a few more /etc/apt/apt.conf.d/*-proxy files. webproxy is no longer reachable from labs
* 09:44 hashar: CI/beta: deleting all git tags from /var/lib/git/operations/puppet and doing git repack
* 09:39 hashar: restoring puppet hacks on beta cluster puppetmaster.
* 09:35 hashar: beta/CI:  salt -v '*' cmd.run 'rm -v /etc/apt/apt.conf.d/*-proxy'  https://phabricator.wikimedia.org/T122953


== 2016-01-05 ==
== 2022-04-14 ==
* 16:54 hashar_: Removed elastic search from CI slaves https://phabricator.wikimedia.org/T89083 https://gerrit.wikimedia.org/r/#/c/259301/
* 22:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/779969
* 03:45 Krinkle: integration-slave-trusty-1015: rm -rf /mnt/home/jenkins-deploy/.npm per https://integration.wikimedia.org/ci/job/mediawiki-core-qunit/56577/console
* 16:09 brennen: removed or renamed 4 filters from mediawiki-new-errors per check-new-error-tasks/check.sh


== 2016-01-04 ==
== 2022-04-12 ==
* 21:06 hashar: gallium has puppet enabled again
* 21:49 brennen: Updating dev-images docker-pkg files on primary contint for elastic 7.10.2
* 20:53 hashar: stopping puppet on gallium and live hacking Zuul configuration for https://phabricator.wikimedia.org/T122656
* 21:46 brennen: Updating dev-images docker-pkg files on primary contint for elastic 6.8.23
* 21:37 brennen: Updating dev-images docker-pkg files on primary contint for apache & elasticsearch changes ([[phab:T304290|T304290]], [[phab:T305143|T305143]])
* 16:05 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/779500
* 15:55 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/779498 https://gerrit.wikimedia.org/r/779141


== 2016-01-02 ==
== 2022-04-08 ==
* 03:17 yurik: purged varnishs on deployment-cache-text04
* 11:08 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/778287


== 2016-01-01 ==
== 2022-04-07 ==
* 22:17 bd808: No nodepool ci-jessie-* hosts seen in Jenkins interface and rake-jessie jobs backing up
* 06:07 urbanecm: deployment-prep: foreachwiki extensions/GrowthExperiments/maintenance/T304461.php --delete # [[phab:T304461|T304461]], output is at P24204
* 05:54 urbanecm: deployment-prep: mwscript extensions/GrowthExperiments/maintenance/T304461.php --wiki=<nowiki>{</nowiki>enwiki,cswiki<nowiki>}</nowiki> --delete # [[phab:T304461|T304461]]


== Archive ==
== 2022-04-06 ==
* [[/Archive 1|Archive 1]] (September 2014 - December 2015)
* 20:03 thcipriani: rebooting phabricator
* 11:44 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add BetaFeatures to phan deps for [[phab:T304596|T304596]]
 
== 2022-04-04 ==
* 22:43 James_F: dockerfiles: [composer-scratch] Upgrade composer to 2.3.3 and cascade for [[phab:T294260|T294260]]
* 18:49 hashar: Reloading Zuul to revert https://gerrit.wikimedia.org/r/776179
* 18:23 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/776179
* 17:50 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/775796
* 12:12 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/776723
* 10:28 James_F: Zuul: [mediawiki/extensions/WikiLambda] Publish PHP and JS documentation
* 08:54 jnuche: redeploying Zuul
 
== 2022-04-02 ==
* 12:00 zabe: apply https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/773903 on deployment-prep centralauth databases
 
== 2022-03-31 ==
* 20:58 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/775957
 
== 2022-03-29 ==
* 14:20 James_F: Zuul: [mediawiki/extensions/IPInfo] Add EventLogging phan dependency for [[phab:T304948|T304948]]
* 12:32 hashar: integration-agent-docker-1039: clearing leftover pipelinelib builds: `sudo rm -fR /srv/jenkins/workspace/workspace/*`  [[phab:T304932|T304932]] [[phab:T302477|T302477]]
* 05:35 hashar: Relocate castor directory on integration-castor03 from `/srv/jenkins-workspace/caches` to `/srv/castor` https://gerrit.wikimedia.org/r/c/operations/puppet/+/774771
 
== 2022-03-28 ==
* 16:55 hashar: integration: created instance integration-castor04 with flavor `g3.cores8.ram32.disk20` (twice the RAM of integration-castor03) # [[phab:T252071|T252071]]
* 16:49 hashar: integration: created 320G volume https://horizon.wikimedia.org/project/volumes/3f90c3f2-158d-4e45-a919-0f048f47c3b6/ . Intended to migrate integration-castor03 [[phab:T252071|T252071]]
* 10:34 hashar: contint2001 and contint1001: pruning obsolete branches from the zuul-merger: `sudo -H -u zuul find /srv/zuul/git -type d -name .git -print -execdir git -c url."https://gerrit.wikimedia.org/r/".insteadOf="ssh://jenkins-bot@gerrit.wikimedia.org:29418/" remote prune origin \;` [[phab:T220606|T220606]]
* 10:25 hashar: Changed `Trainsperiment Survey Questions` surveys permissions to be open outside of WMF and limited to 1 answer (forcing signin) https://docs.google.com/forms/u/0/d/e/1FAIpQLSd0Nc2jGkAGW-5rTiKN2EHWzfw2HeHm13N-ZCw1xUdE3z6woQ/formrestricted
* 10:18 hashar: contint2001 and contint1001: pruning all git reflog entries from the zuul-merger: `sudo -u zuul find /srv/zuul/git -name .git -type d -execdir git reflog expire --expire=all --all`.  They are useless and no longer generated since https://gerrit.wikimedia.org/r/c/operations/puppet/+/757943
* 09:53 hashar: Tag Quibble 1.4.5 @ {{Gerrit|abe16d574}} {{!}} [[phab:T291549|T291549]]
 
== 2022-03-27 ==
* 13:23 James_F: Zuul: [releng/phatality] Make the node14 CI job voting [[phab:T304736|T304736]]
 
== 2022-03-26 ==
* 02:37 Reedy: beta-update-databases-eqiad is back to @hourly
 
== 2022-03-25 ==
* 23:51 Reedy: temporarily turning off periodic building of beta-update-databases-eqiad until it's run to completion
* 23:21 Reedy: running /usr/local/bin/wmf-beta-update-databases.py manually
* 20:22 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/773866
* 20:02 brennen: mediawiki-new-errors: ran check-new-error-tasks/check.sh and cleared "resolved" filters
* 09:43 hashar: Building Quibble Docker images to rename quibble-with-apache to quibble-with-supervisord
 
== 2022-03-24 ==
* 20:00 hashar: reloading Zuul for {{Gerrit|Id844e1723a38eed627af03397cf0ad90c7b09a32}} # [[phab:T299320|T299320]]
* 20:00 James_F: Clearing integration-castor03:/srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/mwgate-node14-docker/_cacache/content-v2/sha512/22/ for [[phab:T304652|T304652]]
* 15:00 James_F: Zuul: [design/codex] Publish code coverage reports for [[phab:T303899|T303899]]
* 09:37 Lucas_WMDE: killed a beta-scap-sync-world job manually, let’s see if that helps getting beta updates unstuck
 
== 2022-03-23 ==
* 17:35 brennen: restarting phabricator for [[phab:T304540|T304540]], brief downtime expected
* 14:56 dancy: Updating scap to 4.5.0-1+0~20220321191814.216~1.gbp24bc64 in beta cluster
 
== 2022-03-22 ==
* 14:44 hashar: gerrit: `./deploy_artifacts.py --version=3.3.10 gerrit.war` [[phab:T304226|T304226]]
* 13:50 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/771945
 
== 2022-03-21 ==
* 08:35 hashar: The castor cache for mediawiki/core wmf/1.39-wmf.1 is actually empty!
* 08:32 hashar: Nuking npm castor cache /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/wmf-quibble-selenium-php72-docker/npm/ # [[phab:T300203|T300203]]
 
== 2022-03-18 ==
* 14:18 elukey: restart testing of kafka logging TLS certificates (may affect logstash in beta, ping me in case it is a problem)
* 13:22 hashar: Rolling back Quibble jobs from 1.4.4 [[phab:T304147|T304147]]
* 07:41 elukey: experimenting with PKI and kafka logging on deployment-prep, logstash dashboard/traffic may be down (please ping me in case it is a problem)
 
== 2022-03-17 ==
* 19:11 hashar: Building Docker images for Quibble 1.4.4
* 19:06 hashar: Tag Quibble 1.4.4 @ {{Gerrit|56b2c9ba52c}} # [[phab:T300340|T300340]]
* 16:25 hashar: Switching Quibble jobs to use memcached rather than APCu {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/770468 {{!}} [[phab:T300340|T300340]]
* 14:11 hashar: Update all jobs to support `CASTOR_HOST` env variable {{!}} https://gerrit.wikimedia.org/r/770921 {{!}} [[phab:T216244|T216244]] {{!}} [[phab:T252071|T252071]]
* 14:07 hashar: Building Docker image to support `CASTOR_HOST` {{!}} https://gerrit.wikimedia.org/r/770921 {{!}} [[phab:T216244|T216244]]
 
== 2022-03-16 ==
* 22:00 James_F: Docker: Publishing sonar-scanner:4.6.0.2311-3 for [[phab:T303958|T303958]]
* 20:13 James_F: Zuul: [mediawiki/services/function-evaluator and …/function-orchestrator] Switch to npm coverage job for [[phab:T302607|T302607]] and [[phab:T302608|T302608]]
* 19:48 zabe: apply https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/769424/ on deployment-prep
* 19:43 taavi: apply https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/771347/ on deployment-prep
 
== 2022-03-15 ==
* 18:26 brennen: gitlab: removed most existing /people groups
* 18:10 brennen: gitlab: finished migrating access for all existing people groups to direct project membership ([[phab:T274461|T274461]], [[phab:T300935|T300935]])
* 16:49 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/770963
* 14:30 hashar: CI Jenkins: globally defined CASTOR_HOST=integration-castor03.integration.eqiad.wmflabs via https://integration.wikimedia.org/ci/configure # [[phab:T216244|T216244]]
* 14:17 hashar: Apply label `castor` to node https://integration.wikimedia.org/ci/computer/integration-castor03/ # [[phab:T216244|T216244]]
* 01:37 James_F: Zuul: Switch services/function* publish job from node12 to node14
* 01:14 James_F: Zuul: [wikidata/query-builder] Switch branchdeploy from node12 to node14
* 00:08 James_F: Zuul: [wikipeg] Switch from node12 to node14 special job
 
== 2022-03-14 ==
* 23:57 James_F: Zuul: [ooui] Switch from node12 to node14
* 23:46 James_F: Docker: Publishing node14-test-browser-php80-composer:0.1.0
* 23:27 James_F: Zuul: Drop legacy node12 templates except the one for Services
* 23:10 James_F: Zuul: [oojs/router] Drop custom job and just use the generic node14 one
* 23:08 James_F: Zuul: [oojs/core] Switch from node12 to node14 jobs
* 22:46 James_F: Zuul: [unicodejs] Switch from node12 to node14
* 22:25 James_F: Zuul: [VisualEditor/VisualEditor] Switch from node12 to node14
* 19:51 James_F: Zuul: Migrate almost all libraries and tools from node12 to node14 for [[phab:T267890|T267890]]
* 15:36 James_F: Zuul: Switch extension-javascript-documentation from node12 to node14 for [[phab:T267890|T267890]]
* 15:21 James_F: Zuul: Switch all mwgate jobs from node12 to node14 for [[phab:T267890|T267890]]
* 09:52 hashar: Building Quibble Docker images for https://gerrit.wikimedia.org/r/757867 {{!}} [[phab:T300340|T300340]]
* 08:54 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/770079
 
== 2022-03-11 ==
* 04:02 zabe: zabe@deployment-mwmaint02:~$ mwscript extensions/CentralAuth/maintenance/populateGlobalEditCount.php --wiki=metawiki
 
== 2022-03-10 ==
* 20:45 zabe: apply https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/769416 on deployment-prep centralauth databases
* 20:25 James_F: Zuul: [mediawiki/extensions/VueTest] Add basic quibble CI
* 20:03 Krinkle: Updating docker-pkg files on contint primary for  https://gerrit.wikimedia.org/r/768843
* 15:12 hashar: updating Quibble jenkins jobs
* 14:26 James_F: Docker: Publishing new versions of quibble-buster and cascade adding unzip for [[phab:T250496|T250496]] / [[phab:T303417|T303417]].
* 11:43 Amir1: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/769668
* 09:59 dwalden: restarted apache on deployment-mediawiki11 # [[phab:T302699|T302699]]
 
== 2022-03-09 ==
* 17:08 hashar: Updating Gerrit Comment.soy to get rid of a literal `null` string being inserted in notification emails {{!}} https://gerrit.wikimedia.org/r/c/operations/puppet/+/768005 {{!}} https://phabricator.wikimedia.org/T288312
 
== 2022-03-08 ==
* 20:31 brennen: requiring 2fa for all users under /repos
 
== 2022-03-07 ==
* 10:53 zabe: restarted apache on deployment-mediawiki11 # [[phab:T302699|T302699]]
 
== 2022-03-04 ==
* 20:29 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/768146
* 19:13 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/768068
 
== 2022-03-03 ==
* 19:13 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/767864
* 15:37 James_F: Docker: Publishing sury-php images based on bullseye not stretch and cascade for [[phab:T278203|T278203]]
* 14:43 hashar: Reloading Zuul for {{Gerrit|Iae45cae8ec209a3e795fe4fd7dd92290565277db}}
* 12:47 hashar: Upgrading Quibble on CI Jenkins jobs from 1.3.0 to 1.4.3 https://gerrit.wikimedia.org/r/c/integration/config/+/767749/
* 10:30 hashar: Building Docker images for Quibble 1.4.3
* 10:22 hashar: Tagged Quibble 1.4.3 @ {{Gerrit|cf5cd1a0a07}}
* 09:24 hashar: Building Docker images for Quibble 1.4.2
* 09:20 hashar: Tag Quibble 1.4.2 @ {{Gerrit|63d2855a1e}} # [[phab:T302226|T302226]] [[phab:T302707|T302707]]
 
== 2022-03-02 ==
* 19:53 James_F: Zuul: Configure CI for the forthcoming REL1_38 branches for [[phab:T302908|T302908]]
* 15:56 dancy: Updating scap to 4.4.1-1+0~20220302155149.192~1.gbpe351d6 in beta
* 15:27 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/767493
* 15:04 taavi: resolve merge conflicts on deployment-puppetmaster04
 
== 2022-02-28 ==
* 19:29 brennen: removing mutante (dzahn) as application-level gitlab admin; adding as owner of /repos for the time being to facilitate some migrations
* 19:22 dancy: Update scap to 4.4.0-1+0~20220228192031.189~1.gbp0a8436 in beta
* 19:17 brennen: adding mutante (dzahn) as application-level gitlab admin
 
== 2022-02-26 ==
* 20:05 zabe: apply [[phab:T302658|T302658]] on deployment-prep centralauth databases
* 13:24 zabe: apply [[phab:T302660|T302660]] on deployment-prep centralauth databases
* 13:19 zabe: apply [[phab:T302659|T302659]] on deployment-prep centralauth databases
 
== 2022-02-24 ==
* 16:02 dancy: Updating beta cluster scap to 4.4.0-1+0~20220224155429.187~1.gbp66c5c2
* 13:44 hashar: integration/config now fully enforces shellcheck https://gerrit.wikimedia.org/r/756088
* 13:13 hashar: Built image docker-registry.discovery.wmnet/releng/castor:0.2.5
* 13:10 hashar: Updating castor-save-workspace-cache job https://gerrit.wikimedia.org/r/764817
* 11:54 hashar: Built image docker-registry.discovery.wmnet/releng/shellcheck:0.1.1
* 11:41 hashar: Built image docker-registry.discovery.wmnet/releng/sonar-scanner:4.6.0.2311-2
* 11:04 hashar: Built image docker-registry.discovery.wmnet/releng/operations-puppet:0.8.6
* 08:58 hashar: Built image docker-registry.discovery.wmnet/releng/mediawiki-phan-testrun:0.2.1
 
== 2022-02-23 ==
* 23:21 dancy: Update beta cluster scap to 4.3.1-1+0~20220223231645.183~1.gbp8ddb60
* 20:10 dancy: Updating scap in beta
* 19:23 hashar: Built docker-registry.discovery.wmnet/releng/logstash-filter-verifier:0.0.3
* 12:41 hashar: Depooling integration-agent-puppet-docker-1002 , pooling integration-agent-puppet-docker-1003 # [[phab:T252071|T252071]]
* 10:21 hashar: Created Bullseye instance integration-agent-puppet-docker-1003 https://horizon.wikimedia.org/project/instances/96cf9ddc-daa3-4c9f-8c21-cdd58e95973e/  # [[phab:T252071|T252071]]
* 08:37 hashar: Removing Stretch based integration-agent-qemu-1001 # [[phab:T284774|T284774]]
 
== 2022-02-22 ==
* 16:41 zabe: zabe@deployment-mwmaint02:~$ foreachwiki migrateUserGroup.php oversight suppress # [[phab:T112147|T112147]]
* 13:28 urbanecm: deployment-prep: Create database for incubatorwiki ([[phab:T210492|T210492]])
 
== 2022-02-21 ==
* 14:58 hashar: Reverting Quibble jobs from 1.4.0 to 1.3.0 # [[phab:T302226|T302226]]
* 07:31 hashar: Switching Quibble jobs from Quibble 1.3.0 to 1.4.0 # [[phab:T300340|T300340]] [[phab:T291549|T291549]] [[phab:T225730|T225730]]
* 07:27 hashar: Refreshing all Jenkins jobs
 
== 2022-02-20 ==
* 10:32 qchris: Manually triggering replication run of Gerrit's analytics/datahub to populate newly created analytics-datahub GitHub repo
 
== 2022-02-19 ==
* 12:19 taavi: restart trafficserver-tls on deployment-cache-text06
* 02:15 James_F: Zuul: [design/codex] Publish the Netlify preview on every patch for [[phab:T293705|T293705]]
* 00:35 James_F: Manually re-triggered a build of the docs of Codex (via `zuul-test-repo design/codex postmerge`) now that we actually set the environment vars for [[phab:T293705|T293705]]
 
== 2022-02-18 ==
* 22:54 James_F: Zuul: [branchdeploy-codex-node14-npm-docker] Create as experimental for [[phab:T293705|T293705]]
* 22:14 James_F: Jenkins: Defined BRANCHDEPLOY_AUTH_TOKEN_codex and BRANCHDEPLOY_SITE_ID_codex secrets for [[phab:T293705|T293705]]
* 13:44 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/763724 [[phab:T301453|T301453]]
* 09:21 hashar: Reloading Zuul for {{Gerrit|I1494abb5e9e28da951ffb72154a074a16a0f8381}}
 
== 2022-02-17 ==
* 21:48 brennen: added Dzahn (mutante) to acl*repository-admins on phabricator
* 15:58 zabe: root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service # [[phab:T301995|T301995]]
* 13:35 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/763207
* 13:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/763458
* 11:12 hashar: Bringing deployment-deploy03 back
* 11:07 hashar: Disabled deployment-deploy03 Jenkins agent in order to revert some mediawiki/core patch and test the outcome
 
== 2022-02-16 ==
* 18:20 hashar: Tag Quibble 1.4.1 @ {{Gerrit|d4bd2801de}} # [[phab:T300301|T300301]]
* 16:42 dancy: Updating to scap 4.3.1-1+0~20220216163646.173~1.gbp823710 in beta
* 12:55 jelto: apply gitlab-settings to gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud
* 10:09 hashar: Reloading Zuul for {{Gerrit|I997fee0f160ca3049b8085879831bfe175096ced}}
* 09:59 hashar: Reloading Zuul for {{Gerrit|I2ffa016563ad37f1e7c13dcce81deb8ab411c9e2}}
 
== 2022-02-15 ==
* 21:12 dancy: rebooting deployment-mediawiki12.deployment-prep.eqiad1.wikimedia.cloud to try to revive beta wikis
* 20:59 dancy: Killed runaway puppet agent on deployment-mediawiki11.deployment-prep.eqiad1.wikimedia.cloud
* 16:24 hashar: Restarting CI Jenkins for plugins updates
* 16:21 hashar: Upgrading Jenkins plugins on releases Jenkins
* 16:06 hashar: Rollback fresh-test Jenkins job to the version intended to run on integration-agent-qemu-1001
* 15:26 hashar: Reloading Zuul for {{Gerrit|If80b4b4cfa5c1a869ceb220f5b11c272b384a721}}
 
== 2022-02-14 ==
* 16:28 dancy: Updating scap in beta cluster to 4.3.1-1+0~20220211225318.167~1.gbp315b2c
* 16:16 Amir1: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/c/integration/config/+/762471
* 15:41 hashar: Messing up with fresh-test Jenkins job to polish up Qemu / qcow2 integration
* 14:26 jnuche: Jenkins upgrade complete [[phab:T301361|T301361]]
* 13:54 jnuche: Jenkins contint instances are going to be restarted soon
 
== 2022-02-12 ==
* 18:22 urbanecm: deployment-prep: reboot deployment-eventgate-3 ([[phab:T289029|T289029]])
 
== 2022-02-10 ==
* 17:29 jeena: reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/761602
 
== 2022-02-09 ==
* 15:22 taavi: deleted shutoff deployment-mx02
 
== 2022-02-08 ==
* 17:34 taavi: remove scap from deployment-kafka-main/jumbo
* 16:23 taavi: hard reboot misbehaving deployment-echostore01
* 13:39 taavi: delete /srv/mediawiki-staging.save on deployment-deploy03
 
== 2022-02-07 ==
* 20:55 taavi: added Zabe as member of the deployment-prep project [[phab:T301179|T301179]]
* 18:19 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/760550
 
== 2022-02-04 ==
* 00:21 Krinkle: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/759622
 
== 2022-02-03 ==
* 18:41 taavi: deployment-prep: route /w/api.php to deployment-mediawiki11, trying to reduce load on a single server
* 14:53 hashar: Building Docker images for Quibble 1.4.0  (prepared by kostajh)
* 13:51 kostajh: Tag Quibble 1.4.0 @ {{Gerrit|4231bc2832395d94e29a332fe8d863301a0cd441}} # [[phab:T300340|T300340]] [[phab:T291549|T291549]] [[phab:T225730|T225730]]
 
== 2022-02-02 ==
* 16:50 dancy: Upgrading scap to 4.2.2-1+0~20220202164708.157~1.gbp376a16 in beta.
* 16:12 dancy: Upgrading scap to 4.2.2-1+0~20220201161808.156~1.gbp1c1c64 in beta
 
== 2022-02-01 ==
* 17:27 addshore: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/734654
* 00:34 tgr: deployment-prep un-cherry-picked gerrit 758584 from beta puppetmaster, patch is now merged [[phab:T300591|T300591]]
* 00:12 tgr: deployment-prep cherry-picked gerrit 758584 to beta puppetmaster [[phab:T300591|T300591]]
 
== 2022-01-31 ==
* 19:01 James_F: Re-configured Jenkins job mediawiki-i18n-check-docker to {{Gerrit|9e3ea96c548d7a84be763d38c2d118bc861cf189}} for [[phab:T222216|T222216]]
* 10:49 hashar: Added integration-agent-qemu-1003 with label `Qemu` # [[phab:T284774|T284774]]
 
== 2022-01-28 ==
* 21:45 taavi: running recountCategories.php on all beta wikis per [[phab:T299823|T299823]]#7652496
* 14:27 hashar: taking heapdump  of CI Jenkins `sudo -u jenkins /usr/lib/jvm/java-11-openjdk-amd64/bin/jmap -dump:live,format=b,file=/var/lib/jenkins/202201281527.hprof xxxx`
 
== 2022-01-27 ==
* 20:26 hashar: Successfully published image docker-registry.discovery.wmnet/releng/logstash-filter-verifier:0.0.2  # [[phab:T299431|T299431]]
* 19:34 Amir1: Reloading Zuul to deploy 757464
* 16:00 hashar: Pooling back agents 1035 1036 1037 1038 , they could not connect due to ssh host mismatch since yesterday they all got attached to instance 1033 and accepted that host key # [[phab:T300214|T300214]]
* 09:16 hashar: integration: cumin --force 'name:docker' 'apt install rsync'  # [[phab:T300236|T300236]]
* 09:05 hashar: integration: cumin --force 'name:docker' 'apt install rsync'  # [[phab:T300214|T300214]]
* 00:24 thcipriani: restarting jenkins
 
== 2022-01-26 ==
* 20:29 hashar: Completed migration of integration-agent-docker-XXXX instances from Stretch to Bullseye - [[phab:T252071|T252071]]
* 19:55 hashar: deleting integration-agent-docker-1014 which only has the `codehealth` label. A short-lived experiment no longer used since October 2nd 2019 - https://gerrit.wikimedia.org/r/c/integration/config/+/540362 - [[phab:T234259|T234259]]
* 18:56 hashar: integration: pooled in Jenkins a few more Bullseye docker agents for [[phab:T252071|T252071]]
* 18:17 hashar: integration: pooled in Jenkins a few Bullseye docker agents for [[phab:T252071|T252071]]
* 16:45 hashar: integration: creating  integration-agent-docker-1023  based on buster with new flavor `g3.cores8.ram24.disk20.ephemeral60.4xiops` # [[phab:T290783|T290783]]
 
== 2022-01-25 ==
* 20:17 James_F: Zuul: [mediawiki/extensions/CentralAuth] Drop UserMerge dependency
* 16:39 James_F: Zuul: Mark Math extension as now tarballed in parameter_functions for [[phab:T232948|T232948]]
* 15:57 James_F: Zuul: [mediawiki/extensions/Math] Add Math to the main gate for [[phab:T232948|T232948]]
* 13:44 hashar: Jenkins CI: added Logger https://integration.wikimedia.org/ci/log/ProcessTree%20-%20T299995/ to watch `hudson.util.ProcessTree` for [[phab:T299995|T299995]]
* 10:02 hashar: integration: removing usage of `role::ci::slave::labs::docker::docker_lvm_volume` in Horizon following https://gerrit.wikimedia.org/r/c/operations/puppet/+/755948  . Docker role instances now always have a 24G partition for Docker
* 09:59 hashar: integration-agent-qemu-1001: resized /srv to 100% disk free: `lvextend -r -l +100%FREE /dev/mapper/vd-second--local--disk` # [[phab:T299996|T299996]]
* 09:59 hashar: integration-agent-qemu-1001: resizing /dev/mapper/vd-second--local--disk (/srv) to 20G : `resize2fs -p /dev/mapper/vd-second--local--disk 20G` # [[phab:T299996|T299996]]
* 09:51 hashar: integration-agent-qemu-1001: resizing /dev/mapper/vd-second--local--disk (/srv) to 20G : `resize2fs -p /dev/mapper/vd-second--local--disk 20G`
* 09:51 hashar: integration-agent-qemu-1003: nuked /dev/vd/second-local-disk and /srv to make room for a docker logical volume. That has fixed puppet  [[phab:T299996|T299996]]
* 09:22 Reedy: unblocked beta again
* 07:32 Krinkle: integration-castor03:/srv/jenkins-workspace/caches$ sudo rm -rf castor-mw-ext-and-skins/
 
== 2022-01-24 ==
* 21:44 Reedy: unstick beta ci jobs
* 21:19 jeena: reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/756523
* 20:36 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/756139
* 17:28 hashar: Nuke castor caches on integration-castor03 : sudo rm -fR /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/<nowiki>{</nowiki>quibble-vendor-mysql-php72-selenium-docker,wmf-quibble-selenium-php72-docker<nowiki>}</nowiki>  # [[phab:T299933|T299933]]
* 17:28 hashar: Nuke castor caches on integration-castor03 : sudo rm -fR /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/<nowiki>{</nowiki>quibble-vendor-mysql-php72-selenium-docker,wmf-quibble-selenium-php72-docker<nowiki>}</nowiki>
 
== 2022-01-22 ==
* 13:40 taavi: apply [[phab:T299827|T299827]] on deployment-prep centralauth database
* 11:44 taavi: restart varnish-frontend.service on deployment-cache-upload06 to clear puppet agent failure alerts
 
== 2022-01-21 ==
* 18:12 taavi: resolved merge conflicts on deployment-puppetmaster04
* 15:50 hashar: integration-puppetmaster-02: deleted 2021 snapshot tags in puppet repo and ran `git gc --prune=now`
 
== 2022-01-20 ==
* 20:24 James_F: Zuul: [Kartographer] Add parsoid as dependency for CI jobs
* 20:22 James_F: Zuul: [DiscussionTools] Add Gadgets as dependency for Phan jobs
* 20:04 dancy: Jenkins beta jobs are back online, using scap prep auto now.
* 19:19 dancy: Pausing beta Jenkins jobs to make a copy of /srv/mediawiki-staging in preparation for testing
* 19:10 dancy: Unpacking scap (4.1.1-1+0~20220120175448.144~1.gbp517f9d) over (4.1.1-1+0~20220113154148.133~1.gbp6e3a17) on deploy03
* 18:07 hashar: Updating Quibble jobs to have MediaWiki files written on the hosts /srv partition (38G) instead of inside the container which ends in /var/lib/docker (24G) https://gerrit.wikimedia.org/r/755743  # [[phab:T292729|T292729]]
* 16:31 hashar: Rebalancing /var/lib/docker and /srv partitions on CI agents {{!}} https://gerrit.wikimedia.org/r/755713
* 12:12 hashar: contint2001 deleting all the Docker images (they will be pulled as needed)
* 12:10 hashar: contint2001 : docker container prune && docker image prune
* 12:07 hashar: contint1001 deleting all the Docker images (they will be pulled as needed)
* 12:04 hashar: contint1001 `docker image prune`
* 11:51 hashar: Cleaning very old Docker images on contint1001.wikimedia.org
 
== 2022-01-19 ==
* 18:20 hashar: Adding  https://integration.wikimedia.org/ci/computer/contint1001/ back to the pool again
* 17:31 hashar: Adding  https://integration.wikimedia.org/ci/computer/contint1001/ back to the pool after the machine got powercycled # [[phab:T299542|T299542]]
* 10:38 Reedy: kill some stuck jobs [[phab:T299485|T299485]]
 
== 2022-01-18 ==
* 19:56 hashar: building Docker images for https://gerrit.wikimedia.org/r/754951
* 18:01 taavi: added ryankemper as a member of the deployment-prep project
* 15:00 hashar: Updating Jenkins jobs for Quibble 1.3.0  with proper PHP version in the images # [[phab:T299389|T299389]]
* 11:39 hashar: Rolling back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0  # [[phab:T299389|T299389]]
* 08:07 hashar: Updating Jenkins jobs for Quibble to pass `--parallel-npm-install` https://gerrit.wikimedia.org/r/c/integration/config/+/754569
* 08:02 hashar: Updating Jenkins jobs for Quibble 1.3.0
 
== 2022-01-17 ==
* 16:28 hashar: Building Quibble 1.3.0 Docker images
* 16:16 hashar: Tagged Quibble 1.3.0 @ {{Gerrit|2b2c7f9a45}} # [[phab:T297480|T297480]] [[phab:T226869|T226869]] [[phab:T294931|T294931]]
* 08:32 hashar: Refreshing all Jenkins jobs with jjb to take in account recent changes related to the Jinja2 docker macro
 
== 2022-01-14 ==
* 15:56 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/753981
* 14:59 hashar: Starting VM integration-agent-docker-1022 which was in shutdown state since December and is Bullseye based # [[phab:T290783|T290783]]
* 13:49 hashar: Restarting all CI Docker agents via Horizon to apply new flavor settings [[phab:T265615|T265615]] [[phab:T299211|T299211]]
* 01:47 dancy: revert to scap 4.1.1-1+0~20220113154148.133~1.gbp6e3a17 in beta
 
== 2022-01-13 ==
* 18:02 dancy: Updating scap to 4.1.1-1+0~20220113154506.135~1.gbp523480 on all beta hosts
* 17:54 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/753792
* 16:27 dancy: testing scap prep auto on deployment-deploy03
* 15:52 dancy: Update scap to 4.1.1-1+0~20220113154506.135~1.gbp523480 on deployment-deploy03
* 11:27 hashar: Updating Jenkins job to normalize usage of `docker run --workdir` https://gerrit.wikimedia.org/r/c/integration/config/+/753457
* 10:52 hashar: Restarting Jenkins CI for plugins update
* 10:42 hashar: Applied Jenkins built-in node migration to CI Jenkins (`master` > `built-in` renaming) # [[phab:T298691|T298691]]
* 10:14 taavi: cancelled stuck deployment-prep jobs on jenkins
 
== 2022-01-12 ==
* 18:58 hashar: Applied plugins update to https://releases-jenkins.wikimedia.org/
 
== 2022-01-11 ==
* 09:18 hashar: Updating all Jenkins jobs following recent "noop" refactorings
 
== 2022-01-10 ==
* 17:13 dancy: Update beta scap to 4.1.0-1+0~20220107203309.130~1.gbpcd0ace
* 14:01 James_F: Zuul: Add gate-and-submit-l10n to Isa for [[phab:T222291|T222291]]
 
== 2022-01-05 ==
* 19:15 taavi: run `sudo chown -R jenkins-deploy:wikidev public/dists/bullseye-deployment-prep/` on deployment-deploy03
* 17:31 hashar: Deploying Zuul change https://gerrit.wikimedia.org/r/c/integration/config/+/751697  to get rid of the wmf-quibble-apache jobs # [[phab:T285649|T285649]]
* 10:48 hashar: CI: switching MediaWiki selenium from php built-in server to Apache # https://gerrit.wikimedia.org/r/751697
* 09:24 hashar: Updating Quibble jobs to use latest image (provides `quibble-with-apache` entrypoint) https://gerrit.wikimedia.org/r/c/integration/config/+/751685/
 
== 2022-01-04 ==
* 12:49 hashar: Reloading Zuul for "api-testing: rename jobs to shorter forms"  https://gerrit.wikimedia.org/r/751422
* 09:48 hashar: Building Quibble Docker images with Apache included https://gerrit.wikimedia.org/r/c/integration/config/+/748104
* 09:47 hashar: Reloading Zuul for "Add CentralAuth to phan dependency list for GrowthExperiments" https://gerrit.wikimedia.org/r/751383
 
== 2022-01-03 ==
* 14:37 hashar: Upgraded Java 11 on contint2001 && contint1001.  Restarted CI Jenkins.
* 14:35 hashar: Upgraded Java 11 on releases1002 && releases2002
 
 
{{SAL-archives/Release Engineering}}


__NOTOC__
<noinclude>[[Category:SAL]]</noinclude>

Revision as of 15:59, 23 June 2022

2022-06-23

2022-06-22

  • 17:36 taavi: gerrit: add tfellows to the extension-OpenBadges group per request in T308278
  • 17:35 taavi: gerrit: create group extension-JsonData with robla in it, make it an owner of mediawiki/extensions/JsonData per request in T303147
  • 16:19 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/807586
  • 09:35 hashar: Switched `gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud` instance to use the project Puppet master `puppetmaster-1001.devtools.eqiad1.wikimedia.cloud`
  • 09:08 hashar: contint1001 , contint2002: deleting `.git/logs` from all zuul-merger repositories. We do not need the reflog `sudo -u zuul find /srv/zuul/git -type d -name .git -print -execdir rm -fR .git/logs \;` # T307620
  • 09:00 hashar: contint1001 , contint2002: setting `core.logallrefupdates=false` on all Zuul merger git repositories: `sudo -u zuul find /srv/zuul/git -type d -name .git -print -execdir git config core.logallrefupdates false \;` # T307620
  • 07:46 hashar: Building operations-puppet docker image for https://gerrit.wikimedia.org/r/c/integration/config/+/807180

2022-06-21

  • 22:01 brennen: gitlab-runners: re-registering all shared runners
  • 17:55 dancy: Upgrading scap to 4.9.4-1+0~20220621174226.320~1.gbp56e4d4 in beta cluster

2022-06-20

  • 16:30 urbanecm: add sgimeno as a project member (Growth engineer with need for access)
  • 15:50 ori: On deployment-cache-{text,upload}06, ran: touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service (T310957)
  • 14:07 ori: restarted acme-chief on deployment-acme-chief03

2022-06-17

  • 17:15 ori: provisioned deployment-cache-text07 in deployment-prep to test query normalization via VCL
  • 01:08 TimStarling: on deployment-docker-cpjobqueue01 and deployment-docker-changeprop01 I redeployed the changeprop configuration, reverting the PHP 7.4 hack

2022-06-16

  • 12:24 hashar: gitlab: runner-1030: `docker volume prune -f`
  • 12:24 hashar: gitlab: runner-1026: `docker volume prune -f`
  • 10:02 elukey: ran `scap install-world --batch` to allow scap/puppet to work on ml-cache100[2,3]

2022-06-15

  • 22:39 brennen: phabricator: tagged release/2022-06-15/1 (T310742)
  • 16:31 hashar: integration-agent-docker-1035: docker image prune
  • 15:26 dancy: Upgrading scap to 4.9.4-1+0~20220615151557.315~1.gbped3b8d in beta cluster

2022-06-14

  • 21:30 TheresNoTime: clear out stuck `beta-scap-sync-world` jobs (repeatedly per each queued `beta-mediawiki-config-update-eqiad` job), queued jobs now running. monitored until each job had run successfully. jobs up to date
  • 17:18 brennen: starting 1.39.0-wmf.16 (T308069) transcript in deploy1002:~brennen/1.39.0-wmf.16.log
  • 13:35 TheresNoTime: clear stuck `beta-scap-sync-world` job, other queued jobs now running. Cancel running `beta-update-databases-eqiad` job, will ensure it runs on the next timer
  • 00:42 TimStarling: on deployment-deploy03 removed helm2, as was done in production

2022-06-13

  • 22:04 TheresNoTime: cleared out stalled Jenkins beta jobs on `deployment-deploy03`, manually started `beta-code-update-eqiad` job & watched to completion. all caught up
  • 04:33 hashar: Restarting Docker on contint1001.wikimedia.org , apparently can't build images anymore

2022-06-12

2022-06-10

  • 15:20 James_F: Zuul: [mediawiki/extensions/SearchVue] Add initial CI jobs for T309932
  • 08:28 hashar: Reloaded Zuul to remove mediawiki/services/parsoid from CI dependencies # https://gerrit.wikimedia.org/r/c/integration/config/+/803990
  • 04:27 TimStarling: on deployment-deploy03 running scap sync-world -v with PHP 7.4 for T295578
  • 04:03 TimStarling: on deployment-deploy03 running scap sync-world -v with PHP 7.2 for T295578 sanity check

2022-06-09

  • 22:49 dancy: Upgrading scap to 4.9.1-1+0~20220609211227.304~1.gbpe48c42 in beta cluster
  • 16:39 brennen: gitlab shared runners: re-registering to apply image allowlist configuration

2022-06-08

  • 17:14 hashar: Reloaded Zuul for I393422
  • 15:57 dancy: Set `profile::mediawiki::php::restarts::ensure: present` in deployment-prep hiera config for T237033
  • 09:28 hashar: Reloaded Zuul for "Add doc publish for Translate" https://gerrit.wikimedia.org/r/792134

2022-06-06

  • 14:37 James_F: Zuul: [mediawiki/extensions/ImageSuggestions] Mark as in production for T302711

2022-06-02

  • 15:33 dancy: Upgrading scap to 4.8.1-1+0~20220602153109.295~1.gbp318d9c in beta cluster
  • 11:26 hashar: Restarting Jenkins on contint2001
  • 11:19 hashar: Restarting Jenkins on releases1002

2022-05-31

  • 21:16 dancy: Upgrading scap to 4.8.0-1+0~20220531211114.292~1.gbp8dbbcf in beta cluster
  • 17:40 dancy: Upgrading scap to 4.8.0-1+0~20220531173912.291~1.gbp21a7ef in beta cluster
  • 17:33 dancy: Reverted to scap 4.8.0-1+0~20220524160924.288~1.gbp794a08 in beta cluster
  • 17:07 dancy: Upgrading scap to 4.8.0-1+0~20220531170512.289~1.gbp143729 in beta cluster

2022-05-30

  • 11:47 jelto: apply gitlab-settings to gitlab1004 - T307142
  • 11:46 jelto: apply gitlab-settings to gitlab1003 - T307142

2022-05-28

  • 19:09 TheresNoTime: deployment-deploy04 live, not referenced by anything T309437

2022-05-27

  • 22:55 zabe: zabe@deployment-mwmaint02:~$ mwscript extensions/WikiLambda/maintenance/updateTypedLists.php --wiki=wikifunctionswiki --db # started ~20 min ago
  • 22:49 TheresNoTime: manually running database update script: samtar@deployment-deploy03:~$ /usr/local/bin/wmf-beta-update-databases.py
  • 22:09 TheresNoTime: samtar@deployment-deploy03:~$ sudo keyholder arm
  • 21:44 TheresNoTime: hard rebooted deployment-deploy03 as soft reboot unresponsive
  • 21:44 bd808: `sudo wmcs-openstack role add --user zabe --project deployment-prep projectadmin` (T309419)
  • 21:10 zabe: zabe@deployment-deploy03:~$ sudo keyholder arm
  • 20:53 bd808: `sudo wmcs-openstack role add --user samtar --project deployment-prep projectadmin` (T309415)
  • 20:49 dancy: Initiated hard reboot of deployment-deploy03.deployment-prep

2022-05-26

  • 18:33 dancy: Updated Jenkins beta-* job configs
  • 16:51 TheresNoTime: manually triggered beta-update-databases-eqiad post-merge of 2c7b5825
  • 16:51 brennen: puppetmaster-1001.devtools: resetting ops/puppet checkout to production branch

2022-05-25

  • 18:38 TheresNoTime: (@ ~18:20UTC) samtar@deployment-mwmaint02:~$ mwscript resetUserEmail.php --wiki=wikidatawiki Mahir256 [snip] T309230
  • 15:46 dancy: Restarted apache2 on gerrit1001

2022-05-24

2022-05-23

  • 19:21 inflatador: Deleted deployment-elastic0[5-7] in favor of newer bullseye hosts T299797
  • 18:37 dancy: Reverted to scap 4.7.1-1+0~20220505181519.270~1.gbpeb47ae in beta cluster
  • 18:35 dancy: Upgrading beta cluster scap to 4.7.1-1+0~20220523183110.280~1.gbpaa0826
  • 14:49 James_F: Zuul: Enforce Postgres and SQLite support via in-mediawiki-tarball
  • 08:37 elukey: move kafka jumbo in deployment-prep to fixed uid/gid - T296982
  • 08:29 elukey: move kafka main in deployment-prep to fixed uid/gid - T296982
  • 08:06 elukey: move kafka logging in deployment-prep to fixed uid/gid - T296982

2022-05-22

2022-05-21

2022-05-20

2022-05-19

2022-05-18

  • 19:31 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/793028
  • 18:45 brennen: gitlab: created placeholder /repos/mediawiki group for squatting purposes
  • 08:29 hashar: Updating SSH Build agent from 1.31.5 to 1.32.0 on CI Jenkins to prevent an issue when uploading `remoting.jar` # T307339#7937268
  • 07:32 hashar: Deleting Jenkins agent configuration for `integration-castor03` # T252071

2022-05-17

  • 23:26 James_F: Zuul: [mediawiki/extensions/Phonos] Install basic quibble CI for T308558

2022-05-16

2022-05-14

  • 23:19 James_F: Zuul: Add Dreamy_Jazz to CI allow list
  • 23:17 James_F: Zuul: [mediawiki/extensions/LocalisationUpdate] Move out of production section
  • 20:25 urbanecm: add TheresNoTime (samtar) as a project member per request

2022-05-13

2022-05-12

  • 22:09 inflatador: bking@deployment-elastic05 banned deployment-elastic05 from beta ES cluster in preparation for decom T299797
  • 19:53 hashar: gerrit: triggering full replication to gerrit2001 to test T307137
  • 16:00 hashar: contint2001 and contint1001 now automatically run `docker system prune --force` every day and `docker system prune --force` on Sunday | https://gerrit.wikimedia.org/r/c/operations/puppet/+/773784/
  • 15:05 brennen: gitlab-prod-1001.devtools: soft reboot
  • 00:46 brennen: gitlab: disabling container registries on all existing projects (T307537)

2022-05-11

  • 23:20 brennen: gitlab-prod-1001.devtools: container registry currently enabled
  • 18:58 brennen: gitlab-prod-1001.devtools: setting to use devtools standalone puppetmaster

2022-05-10

2022-05-09

2022-05-08

  • 12:33 urbanecm: deployment-prep: urbanecm@deployment-mwmaint02:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMenteeOverviewFiltersToPresets.php --update # T304057

2022-05-06

  • 12:55 hashar: Migrated Castor service from integration-castor03 to integration-castor05 # T252071

2022-05-05

2022-05-04

2022-05-03

2022-05-02

2022-04-29

2022-04-28

2022-04-27

2022-04-26

  • 15:40 brennen: train 1.39.0-wmf.9 (T305215): no current blockers - expect to start train ops after the toolhub deployment window wraps, so some time after 17:00 UTC; taking a pre-train stroll-around-the-block break before that.
  • 13:46 James_F: Deleting deployment-mx02.deployment-prep.eqiad1.wikimedia.cloud for T306068
  • 13:38 James_F: Zuul: [mediawiki/extensions/SimilarEditors] Install basic prod CI for T306897
  • 12:33 hashar: Manually pruned dangling docker images on contint1001 and contint2001
  • 08:30 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/780824
  • 08:09 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/785204

2022-04-25

2022-04-20

  • 16:25 zabe: root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service

2022-04-18

  • 19:27 brennen: gitlab runners: deleting a number of stale runners with no contacts in > 2 months which are most likely no longer extant
  • 16:49 brennen: phabricator: created phame blog https://phabricator.wikimedia.org/phame/blog/view/22/ for T306329
  • 16:48 brennen: phabricator: adding self to acl*blog-admins
  • 15:33 James_F: Shutting off deployment-wdqs01 from the Beta Cluster project per T306054; it's apparently unused, so this shouldn't break anything.

2022-04-14

2022-04-12

2022-04-08

2022-04-07

  • 06:07 urbanecm: deployment-prep: foreachwiki extensions/GrowthExperiments/maintenance/T304461.php --delete # T304461, output is at P24204
  • 05:54 urbanecm: deployment-prep: mwscript extensions/GrowthExperiments/maintenance/T304461.php --wiki={enwiki,cswiki} --delete # T304461

2022-04-06

  • 20:03 thcipriani: rebooting phabricator
  • 11:44 James_F: Zuul: [mediawiki/extensions/WikiEditor] Add BetaFeatures to phan deps for T304596

2022-04-04

2022-04-02

2022-03-31

2022-03-29

  • 14:20 James_F: Zuul: [mediawiki/extensions/IPInfo] Add EventLogging phan dependency for T304948
  • 12:32 hashar: integration-agent-docker-1039: clearing leftover pipelinelib builds: `sudo rm -fR /srv/jenkins/workspace/workspace/*` T304932 T302477
  • 05:35 hashar: Relocate castor directory on integration-castor03 from `/srv/jenkins-workspace/caches` to `/srv/castor` https://gerrit.wikimedia.org/r/c/operations/puppet/+/774771

2022-03-28

2022-03-27

  • 13:23 James_F: Zuul: [releng/phatality] Make the node14 CI job voting T304736

2022-03-26

  • 02:37 Reedy: beta-update-databases-eqiad is back to @hourly

2022-03-25

  • 23:51 Reedy: temporarily turning off periodic building of beta-update-databases-eqiad until it's run to completion
  • 23:21 Reedy: running /usr/local/bin/wmf-beta-update-databases.py manually
  • 20:22 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/773866
  • 20:02 brennen: mediawiki-new-errors: ran check-new-error-tasks/check.sh and cleared "resolved" filters
  • 09:43 hashar: Building Quibble Docker images to rename quibble-with-apache to quibble-with-supervisord

2022-03-24

  • 20:00 hashar: reloading Zuul for Id844e1 # T299320
  • 20:00 James_F: Clearing integration-castor03:/srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/mwgate-node14-docker/_cacache/content-v2/sha512/22/ for T304652
  • 15:00 James_F: Zuul: [design/codex] Publish code coverage reports for T303899
  • 09:37 Lucas_WMDE: killed a beta-scap-sync-world job manually, let’s see if that helps getting beta updates unstuck

2022-03-23

  • 17:35 brennen: restarting phabricator for T304540, brief downtime expected
  • 14:56 dancy: Updating scap to 4.5.0-1+0~20220321191814.216~1.gbp24bc64 in beta cluster

2022-03-22

2022-03-21

  • 08:35 hashar: The castor cache for mediawiki/core wmf/1.39-wmf.1 is actually empty!
  • 08:32 hashar: Nuking npm castor cache /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/wmf-quibble-selenium-php72-docker/npm/ # T300203

2022-03-18

  • 14:18 elukey: restart testing of kafka logging TLS certificates (may affect logstash in beta, ping me in case it is a problem)
  • 13:22 hashar: Rolling back Quibble jobs from 1.4.4 T304147
  • 07:41 elukey: experimenting with PKI and kafka logging on deployment-prep, logstash dashboard/traffic may be down (please ping me in case it is a problem)

2022-03-17

2022-03-16

2022-03-15

2022-03-14

  • 23:57 James_F: Zuul: [ooui] Switch from node12 to node14
  • 23:46 James_F: Docker: Publishing node14-test-browser-php80-composer:0.1.0
  • 23:27 James_F: Zuul: Drop legacy node12 templates except the one for Services
  • 23:10 James_F: Zuul: [oojs/router] Drop custom job and just use the generic node14 one
  • 23:08 James_F: Zuul: [oojs/core] Switch from node12 to node14 jobs
  • 22:46 James_F: Zuul: [unicodejs] Switch from node12 to node14
  • 22:25 James_F: Zuul: [VisualEditor/VisualEditor] Switch from node12 to node14
  • 19:51 James_F: Zuul: Migrate almost all libraries and tools from node12 to node14 for T267890
  • 15:36 James_F: Zuul: Switch extension-javascript-documentation from node12 to node14 for T267890
  • 15:21 James_F: Zuul: Switch all mwgate jobs from node12 to node14 for T267890
  • 09:52 hashar: Building Quibble Docker images for https://gerrit.wikimedia.org/r/757867 | T300340
  • 08:54 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/770079

2022-03-11

  • 04:02 zabe: zabe@deployment-mwmaint02:~$ mwscript extensions/CentralAuth/maintenance/populateGlobalEditCount.php --wiki=metawiki

2022-03-10

2022-03-09

2022-03-08

  • 20:31 brennen: requiring 2fa for all users under /repos

2022-03-07

  • 10:53 zabe: restarted apache on deployment-mediawiki11 # T302699

2022-03-04

2022-03-03

2022-03-02

  • 19:53 James_F: Zuul: Configure CI for the forthcoming REL1_38 branches for T302908
  • 15:56 dancy: Updating scap to 4.4.1-1+0~20220302155149.192~1.gbpe351d6 in beta
  • 15:27 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/767493
  • 15:04 taavi: resolve merge conflicts on deployment-puppetmaster04

2022-02-28

  • 19:29 brennen: removing mutante (dzahn) as application-level gitlab admin; adding as owner of /repos for the time being to facilitate some migrations
  • 19:22 dancy: Update scap to 4.4.0-1+0~20220228192031.189~1.gbp0a8436 in beta
  • 19:17 brennen: adding mutante (dzahn) as application-level gitlab admin

2022-02-26

  • 20:05 zabe: apply T302658 on deployment-prep centralauth databases
  • 13:24 zabe: apply T302660 on deployment-prep centralauth databases
  • 13:19 zabe: apply T302659 on deployment-prep centralauth databases

2022-02-24

  • 16:02 dancy: Updating beta cluster scap to 4.4.0-1+0~20220224155429.187~1.gbp66c5c2
  • 13:44 hashar: integration/config now fully enforces shellcheck https://gerrit.wikimedia.org/r/756088
  • 13:13 hashar: Built image docker-registry.discovery.wmnet/releng/castor:0.2.5
  • 13:10 hashar: Updating castor-save-workspace-cache job https://gerrit.wikimedia.org/r/764817
  • 11:54 hashar: Built image docker-registry.discovery.wmnet/releng/shellcheck:0.1.1
  • 11:41 hashar: Built image docker-registry.discovery.wmnet/releng/sonar-scanner:4.6.0.2311-2
  • 11:04 hashar: Built image docker-registry.discovery.wmnet/releng/operations-puppet:0.8.6
  • 08:58 hashar: Built image docker-registry.discovery.wmnet/releng/mediawiki-phan-testrun:0.2.1

2022-02-23

  • 23:21 dancy: Update beta cluster scap to 4.3.1-1+0~20220223231645.183~1.gbp8ddb60
  • 20:10 dancy: Updating scap in beta
  • 19:23 hashar: Built docker-registry.discovery.wmnet/releng/logstash-filter-verifier:0.0.3
  • 12:41 hashar: Depooling integration-agent-puppet-docker-1002 , pooling integration-agent-puppet-docker-1003 # T252071
  • 10:21 hashar: Created Bullseye instance integration-agent-puppet-docker-1003 https://horizon.wikimedia.org/project/instances/96cf9ddc-daa3-4c9f-8c21-cdd58e95973e/ # T252071
  • 08:37 hashar: Removing Stretch based integration-agent-qemu-1001 # T284774

2022-02-22

  • 16:41 zabe: zabe@deployment-mwmaint02:~$ foreachwiki migrateUserGroup.php oversight suppress # T112147
  • 13:28 urbanecm: deployment-prep: Create database for incubatorwiki (T210492)

2022-02-21

  • 14:58 hashar: Reverting Quibble jobs from 1.4.0 to 1.3.0 # T302226
  • 07:31 hashar: Switching Quibble jobs from Quibble 1.3.0 to 1.4.0 # T300340 T291549 T225730
  • 07:27 hashar: Refreshing all Jenkins jobs

2022-02-20

  • 10:32 qchris: Manually triggering replication run of Gerrit's analytics/datahub to populate newly created analytics-datahub GitHub repo

2022-02-19

  • 12:19 taavi: restart trafficserver-tls on deployment-cache-text06
  • 02:15 James_F: Zuul: [design/codex] Publish the Netlify preview on every patch for T293705
  • 00:35 James_F: Manually re-triggered a build of the docs of Codex (via `zuul-test-repo design/codex postmerge`) now that we actually set the environment vars for T293705

2022-02-18

2022-02-17

  • 21:48 brennen: added Dzahn (mutante) to acl*repository-admins on phabricator
  • 15:58 zabe: root@deployment-cache-upload06:~# touch /srv/trafficserver/tls/etc/ssl_multicert.config && systemctl reload trafficserver-tls.service # T301995
  • 13:35 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/763207
  • 13:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/763458
  • 11:12 hashar: Bringing deployment-deploy03 back
  • 11:07 hashar: Disabled deployment-deploy03 Jenkins agent in order to revert some mediawiki/core patch and test the outcome

2022-02-16

  • 18:20 hashar: Tag Quibble 1.4.1 @ d4bd2801de # T300301
  • 16:42 dancy: Updating to scap 4.3.1-1+0~20220216163646.173~1.gbp823710 in beta
  • 12:55 jelto: apply gitlab-settings to gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud
  • 10:09 hashar: Reloading Zuul for I997fee
  • 09:59 hashar: Reloading Zuul for I2ffa01

2022-02-15

  • 21:12 dancy: rebooting deployment-mediawiki12.deployment-prep.eqiad1.wikimedia.cloud to try to revive beta wikis
  • 20:59 dancy: Killed runaway puppet agent on deployment-mediawiki11.deployment-prep.eqiad1.wikimedia.cloud
  • 16:24 hashar: Restarting CI Jenkins for plugins updates
  • 16:21 hashar: Upgrading Jenkins plugins on releases Jenkins
  • 16:06 hashar: Rollback fresh-test Jenkins job to the version intended to run on integration-agent-qemu-1001
  • 15:26 hashar: Reloading Zuul for If80b4b

2022-02-14

  • 16:28 dancy: Updating scap in beta cluster to 4.3.1-1+0~20220211225318.167~1.gbp315b2c
  • 16:16 Amir1: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/762471
  • 15:41 hashar: Messing up with fresh-test Jenkins job to polish up Qemu / qcow2 integration
  • 14:26 jnuche: Jenkins upgrade complete T301361
  • 13:54 jnuche: Jenkins contint instances are going to be restarted soon

2022-02-12

  • 18:22 urbanecm: deployment-prep: reboot deployment-eventgate-3 (T289029)

2022-02-10

2022-02-09

  • 15:22 taavi: deleted shutoff deployment-mx02

2022-02-08

  • 17:34 taavi: remove scap from deployment-kafka-main/jumbo
  • 16:23 taavi: hard reboot misbehaving deployment-echostore01
  • 13:39 taavi: delete /srv/mediawiki-staging.save on deployment-deploy03

2022-02-07

2022-02-04

2022-02-03

  • 18:41 taavi: deployment-prep: route /w/api.php to deployment-mediawiki11, trying to reduce load on a single server
  • 14:53 hashar: Building Docker images for Quibble 1.4.0 (prepared by kostajh)
  • 13:51 kostajh: Tag Quibble 1.4.0 @ 4231bc2 # T300340 T291549 T225730

2022-02-02

  • 16:50 dancy: Upgrading scap to 4.2.2-1+0~20220202164708.157~1.gbp376a16 in beta.
  • 16:12 dancy: Upgrading scap to 4.2.2-1+0~20220201161808.156~1.gbp1c1c64 in beta

2022-02-01

2022-01-31

  • 19:01 James_F: Re-configured Jenkins job mediawiki-i18n-check-docker to 9e3ea96 for T222216
  • 10:49 hashar: Added integration-agent-qemu-1003 with label `Qemu` # T284774

2022-01-28

  • 21:45 taavi: running recountCategories.php on all beta wikis per T299823#7652496
  • 14:27 hashar: taking heapdump of CI Jenkins `sudo -u jenkins /usr/lib/jvm/java-11-openjdk-amd64/bin/jmap -dump:live,format=b,file=/var/lib/jenkins/202201281527.hprof xxxx`

2022-01-27

  • 20:26 hashar: Successfully published image docker-registry.discovery.wmnet/releng/logstash-filter-verifier:0.0.2 # T299431
  • 19:34 Amir1: Reloading Zuul to deploy 757464
  • 16:00 hashar: Pooling back agents 1035 1036 1037 1038. They could not connect due to an ssh host key mismatch, since yesterday they all got attached to instance 1033 and accepted that host key # T300214
  • 09:16 hashar: integration: cumin --force 'name:docker' 'apt install rsync' # T300236
  • 09:05 hashar: integration: cumin --force 'name:docker' 'apt install rsync' # T300214
  • 00:24 thcipriani: restarting jenkins

2022-01-26

  • 20:29 hashar: Completed migration of integration-agent-docker-XXXX instances from Stretch to Bullseye - T252071
  • 19:55 hashar: deleting integration-agent-docker-1014 which only has the `codehealth` label. A short-lived experiment no longer used since October 2nd 2019 - https://gerrit.wikimedia.org/r/c/integration/config/+/540362 - T234259
  • 18:56 hashar: integration: pooled in Jenkins a few more Bullseye docker agents for T252071
  • 18:17 hashar: integration: pooled in Jenkins a few Bullseye docker agents for T252071
  • 16:45 hashar: integration: creating integration-agent-docker-1023 based on buster with new flavor `g3.cores8.ram24.disk20.ephemeral60.4xiops` # T290783

2022-01-25

  • 20:17 James_F: Zuul: [mediawiki/extensions/CentralAuth] Drop UserMerge dependency
  • 16:39 James_F: Zuul: Mark Math extension as now tarballed in parameter_functions for T232948
  • 15:57 James_F: Zuul: [mediawiki/extensions/Math] Add Math to the main gate for T232948
  • 13:44 hashar: Jenkins CI: added Logger https://integration.wikimedia.org/ci/log/ProcessTree%20-%20T299995/ to watch `hudson.util.ProcessTree` for T299995
  • 10:02 hashar: integration: removing usage of `role::ci::slave::labs::docker::docker_lvm_volume` in Horizon following https://gerrit.wikimedia.org/r/c/operations/puppet/+/755948 . Docker role instances now always have a 24G partition for Docker
  • 09:59 hashar: integration-agent-qemu-1001: resized /srv to 100% disk free: `lvextend -r -l +100%FREE /dev/mapper/vd-second--local--disk` # T299996
  • 09:59 hashar: integration-agent-qemu-1001: resizing /dev/mapper/vd-second--local--disk (/srv) to 20G : `resize2fs -p /dev/mapper/vd-second--local--disk 20G` # T299996
  • 09:51 hashar: integration-agent-qemu-1001: resizing /dev/mapper/vd-second--local--disk (/srv) to 20G : `resize2fs -p /dev/mapper/vd-second--local--disk 20G`
  • 09:51 hashar: integration-agent-qemu-1003: nuked /dev/vd/second-local-disk and /srv to make room for a docker logical volume. That has fixed puppet T299996
  • 09:22 Reedy: unblocked beta again
  • 07:32 Krinkle: integration-castor03:/srv/jenkins-workspace/caches$ sudo rm -rf castor-mw-ext-and-skins/

2022-01-24

  • 21:44 Reedy: unstick beta ci jobs
  • 21:19 jeena: reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/756523
  • 20:36 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/756139
  • 17:28 hashar: Nuke castor caches on integration-castor03 : sudo rm -fR /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/{quibble-vendor-mysql-php72-selenium-docker,wmf-quibble-selenium-php72-docker} # T299933
  • 17:28 hashar: Nuke castor caches on integration-castor03 : sudo rm -fR /srv/jenkins-workspace/caches/castor-mw-ext-and-skins/master/{quibble-vendor-mysql-php72-selenium-docker,wmf-quibble-selenium-php72-docker}

2022-01-22

  • 13:40 taavi: apply T299827 on deployment-prep centralauth database
  • 11:44 taavi: restart varnish-frontend.service on deployment-cache-upload06 to clear puppet agent failure alerts

2022-01-21

  • 18:12 taavi: resolved merge conflicts on deployment-puppetmaster04
  • 15:50 hashar: integration-puppetmaster-02: deleted 2021 snapshot tags in puppet repo and ran `git gc --prune=now`

2022-01-20

  • 20:24 James_F: Zuul: [Kartographer] Add parsoid as dependency for CI jobs
  • 20:22 James_F: Zuul: [DiscussionTools] Add Gadgets as dependency for Phan jobs
  • 20:04 dancy: Jenkins beta jobs are back online, using scap prep auto now.
  • 19:19 dancy: Pausing beta Jenkins jobs to make a copy of /srv/mediawiki-staging in preparation for testing
  • 19:10 dancy: Unpacking scap (4.1.1-1+0~20220120175448.144~1.gbp517f9d) over (4.1.1-1+0~20220113154148.133~1.gbp6e3a17) on deploy03
  • 18:07 hashar: Updating Quibble jobs to have MediaWiki files written on the host's /srv partition (38G) instead of inside the container which ends in /var/lib/docker (24G) https://gerrit.wikimedia.org/r/755743 # T292729
  • 16:31 hashar: Rebalancing /var/lib/docker and /srv partitions on CI agents | https://gerrit.wikimedia.org/r/755713
  • 12:12 hashar: contint2001 deleting all the Docker images (they will be pulled as needed)
  • 12:10 hashar: contint2001 : docker container prune && docker image prune
  • 12:07 hashar: contint1001 deleting all the Docker images (they will be pulled as needed)
  • 12:04 hashar: contint1001 `docker image prune`
  • 11:51 hashar: Cleaning very old Docker images on contint1001.wikimedia.org

2022-01-19

2022-01-18

  • 19:56 hashar: building Docker images for https://gerrit.wikimedia.org/r/754951
  • 18:01 taavi: added ryankemper as a member of the deployment-prep project
  • 15:00 hashar: Updating Jenkins jobs for Quibble 1.3.0 with proper PHP version in the images # T299389
  • 11:39 hashar: Rolling back Quibble 1.3.0 jobs due to php configuration files with at least releng/quibble-buster73:1.3.0 # T299389
  • 08:07 hashar: Updating Jenkins jobs for Quibble to pass `--parallel-npm-install` https://gerrit.wikimedia.org/r/c/integration/config/+/754569
  • 08:02 hashar: Updating Jenkins jobs for Quibble 1.3.0

2022-01-17

  • 16:28 hashar: Building Quibble 1.3.0 Docker images
  • 16:16 hashar: Tagged Quibble 1.3.0 @ 2b2c7f9a45 # T297480 T226869 T294931
  • 08:32 hashar: Refreshing all Jenkins jobs with jjb to take into account recent changes related to the Jinja2 docker macro

2022-01-14

  • 15:56 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/753981
  • 14:59 hashar: Starting VM integration-agent-docker-1022 which was in shutdown state since December and is Bullseye based # T290783
  • 13:49 hashar: Restarting all CI Docker agents via Horizon to apply new flavor settings T265615 T299211
  • 01:47 dancy: revert to scap 4.1.1-1+0~20220113154148.133~1.gbp6e3a17 in beta

2022-01-13

  • 18:02 dancy: Updating scap to 4.1.1-1+0~20220113154506.135~1.gbp523480 on all beta hosts
  • 17:54 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/753792
  • 16:27 dancy: testing scap prep auto on deployment-deploy03
  • 15:52 dancy: Update scap to 4.1.1-1+0~20220113154506.135~1.gbp523480 on deployment-deploy03
  • 11:27 hashar: Updating Jenkins job to normalize usage of `docker run --workdir` https://gerrit.wikimedia.org/r/c/integration/config/+/753457
  • 10:52 hashar: Restarting Jenkins CI for plugins update
  • 10:42 hashar: Applied Jenkins built-in node migration to CI Jenkins (`master` > `built-in` renaming) # T298691
  • 10:14 taavi: cancelled stuck deployment-prep jobs on jenkins

2022-01-12

2022-01-11

  • 09:18 hashar: Updating all Jenkins jobs following recent "noop" refactorings

2022-01-10

  • 17:13 dancy: Update beta scap to 4.1.0-1+0~20220107203309.130~1.gbpcd0ace
  • 14:01 James_F: Zuul: Add gate-and-submit-l10n to Isa for T222291

2022-01-05

2022-01-04

2022-01-03

  • 14:37 hashar: Upgraded Java 11 on contint2001 && contint1001. Restarted CI Jenkins.
  • 14:35 hashar: Upgraded Java 11 on releases1002 && releases2002


Archives