
Difference between revisions of "Release Engineering/SAL"

imported>Labslogbot
(deploying https://gerrit.wikimedia.org/r/226753 (legoktm))
imported>Stashbot
(hashar: Building Docker images for [tox-buster] Install shellcheck and cascade [integration/config] - https://gerrit.wikimedia.org/r/721881)
== 2015-08-02 ==
== 2021-09-17 ==
* 01:28 legoktm: deploying https://gerrit.wikimedia.org/r/226753
* 19:05 hashar: Building Docker images for [tox-buster] Install shellcheck and cascade [integration/config] - https://gerrit.wikimedia.org/r/721881
* 01:20 legoktm: deploying https://gerrit.wikimedia.org/r/228507
* 18:08 Krinkle: Re-recreating qemu-1002 as integration-agent-qemu-1003 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref [[phab:T284774|T284774]]
* 00:36 legoktm: deploying https://gerrit.wikimedia.org/r/228583
* 18:07 Krinkle: Re-recreating qemu-1002 as integration-agent-qemu-1003 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref [[phab:T28477|T28477]]
* 16:19 dpifke: Enabled TLS on Jumbo Kafka instances in deployment-prep.


== 2015-08-01 ==
== 2021-09-16 ==
* 23:27 legoktm: deploying https://gerrit.wikimedia.org/r/228492
* 22:17 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/performance/navtiming/+/721567 in deployment-prep, should only affect deployment-webperf11.
* 16:51 addshore: reload zuul for Promote branchdeploy for query-builder to test https://gerrit.wikimedia.org/r/721581 (https://phabricator.wikimedia.org/T278706)
* 13:27 Reedy: Actually reloading Zuul to deploy  https://gerrit.wikimedia.org/r/683589
* 13:01 addshore: reload zuul for https://gerrit.wikimedia.org/r/683589 (https://phabricator.wikimedia.org/T278706) 1 experimental branchdeploy job


== 2015-07-31 ==
== 2021-09-15 ==
* 01:32 jzerebecki: reload zuul for 83a30e5..f2d2517
* 22:59 dpifke: Un-cherry-picked  https://gerrit.wikimedia.org/r/c/operations/puppet/+/721047 in deployment-prep for now; more work needed.
* 22:14 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/analytics/statsv/+/721044 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/721047 in deployment-prep; should only affect deployment-webperf11.
* 08:58 Amir1: update wb_changes_dispatch set chd_disabled = 1 where chd_site = 'enwiki'; ([[phab:T290985|T290985]])


== 2015-07-30 ==
== 2021-09-14 ==
* 23:50 bd808: upgraded nutcracker to 0.4.1-1+wm2~precise1 on deployment-bastion
* 17:17 Amir1: delete from wb_changes_dispatch where chd_db = 'enwiki'; ([[phab:T290985|T290985]])
* 21:48 legoktm: deploying https://gerrit.wikimedia.org/r/228155
* 17:09 ostriches: cleaned up /var space on deployment-videoscaler01
* 09:17 hashar: apt-get upgrade on all Trusty slaves
* 09:13 hashar_: integration: upgrading Zuul package on Precise/Trusty instances ( https://phabricator.wikimedia.org/T106499 )


== 2015-07-29 ==
== 2021-09-13 ==
* 23:55 marxarelli: clearing disk space on integrations-slave-trusty-1012 with `find /mnt/jenkins-workspace/workspace -mindepth 1 -maxdepth 1 -type d -mtime +15 -exec rm -rf {} \;`
* 09:33 hashar: Castor cache: nuked files that were last changed more than six months ago to free up disk space
* 18:15 bd808: upgraded nutcracker on deployment-jobrunner01
* 18:14 bd808: upgraded nutcracker on deployment-videoscaler01
* 18:08 bd808: rm deployment-fluorine:/a/mw-log/archive/*-201506*
* 18:08 bd808: rm deployment-fluorine:/a/mw-log/archive/*-201505*
* 18:02 bd808: rm deployment-videoscaler01:/var/log/atop.log.?*
* 16:49 thcipriani: lots of "Error connecting to 10.68.16.193: Can't connect to MySQL server on '10.68.16.193'" deployment-db1 seems up and functional :(
* 16:27 thcipriani: deployment-prep login timeouts, tried restarting apache, hhvm, and nutcracker on mediawiki{01..03}
* 14:38 bblack: cherry-picked https://gerrit.wikimedia.org/r/#/c/215624 (updated to PS8) into deployment-puppetmaster ops/puppet
* 14:28 bblack: cherry-picked https://gerrit.wikimedia.org/r/#/c/215624 into deployment-puppetmaster ops/puppet
* 12:38 hashar_: salt minions are back somehow
* 12:36 hashar_: salt on deployment-salt is missing most of the instances :-(((
* 03:00 ostriches: deployment-bastion: please please someone rebuild me to not have a stupid 2G /var partition
* 03:00 ostriches: deployment-bastion: purged a bunch of atop and pacct logs, and apt cache...clogging up /var again.
* 02:34 legoktm: deploying https://gerrit.wikimedia.org/r/227640
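The 23:55 workspace cleanup logged above uses `find ... -mindepth 1 -maxdepth 1 -type d -mtime +15 -exec rm -rf {} \;`. As a rough illustration only, the same selection rule can be sketched in Python. The function names `stale_workspaces` and `purge` and the dry-run default are assumptions made for this sketch; they are not part of the logged command or of any Wikimedia tooling.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60


def stale_workspaces(root, min_days=15, now=None):
    """List directories directly under root, mimicking
    `find root -mindepth 1 -maxdepth 1 -type d -mtime +min_days`.

    Hypothetical helper: find's `-mtime +N` discards the fractional
    part of the age in days and then requires the result to exceed N.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(Path(root).iterdir()):
        # -type d matches real directories only, not symlinks to them
        if entry.is_symlink() or not entry.is_dir():
            continue
        age_days = int((now - entry.lstat().st_mtime) // SECONDS_PER_DAY)
        if age_days > min_days:
            stale.append(entry)
    return stale


def purge(root, min_days=15, dry_run=True):
    """Delete the stale directories unless dry_run is set (assumed default)."""
    targets = stale_workspaces(root, min_days)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Calling `purge("/mnt/jenkins-workspace/workspace")` with the default `dry_run=True` only returns the candidate directories, which allows a review of the list before anything is removed.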


== 2015-07-28 ==
== 2021-09-10 ==
* 23:43 marxarelli: running `jenkins-jobs update config/ 'mwext-mw-selenium'` to deploy I7afa07e9f559bffeeebaf7454cc6b39a37e04063
* 21:52 James_F: Created experimental integration-agent-docker-1021 for [[phab:T252071|T252071]]
* 21:05 bd808: upgraded nutcracker on mediawiki03
* 21:48 James_F: Deleting CI agent integration-agent-docker-1001 for [[phab:T252071|T252071]]
* 21:04 bd808: upgraded nutcracker on mediawiki02
* 21:44 James_F: Pulling oldest CI agent integration-agent-docker-1001 from rotation so it can be replaced by a bullseye one for [[phab:T252071|T252071]]
* 21:01 bd808: upgraded nutcracker on mediawiki01
* 21:23 James_F: Zuul: [integration/config] Add shellcheck job for scripts defined in jjb as an experimental job
* 19:49 jzerebecki: reloading zuul b1b2cab..b02830e
* 17:41 James_F: Zuul: [cloud/toolforge/jobs-framework-emailer] Add basic tox CI
* 11:18 hashar: Assigning label "BetaClusterBastion" to https://integration.wikimedia.org/ci/computer/deployment-bastion.eqiad/
* 02:18 James_F: Zuul: [wikipeg] Switch JS+PHP job from node10 to node12
* 11:12 hashar: Jenkins jobs for the beta cluster ended up stuck again.  Found a workaround by removing the Jenkins label  on deployment-bastion node and reinstating it.  Seems to get rid of the deadlock ( ref: https://phabricator.wikimedia.org/T72597#1487801 )
* 02:13 James_F: Zuul: [wikipeg] Provide wikipeg-special-node12-plus-php80-composer-docker as an experimental job
* 09:50 hashar: deployment-apertium01 is back!  The ferm rules were outdated / not maintained by puppet, dropped ferm entirely.
* 02:10 James_F: Zuul: [oojs/ui] Switch special JS+PHP job from node10 to node12
* 09:40 hashar: rebooting deployment-apertium01 to ensure its ferm rules are properly loaded on boot ( https://phabricator.wikimedia.org/T106658 )
* 01:58 James_F: Zuul: [oojs/ui] Add ooui-special-node12-plus-php80-composer-docker as experimental
* 00:46 legoktm: deploying https://gerrit.wikimedia.org/r/227383
* 01:48 James_F: Zuul: [wikipeg] Drop php72 special test job, the php80 one suffices


== 2015-07-27 ==
== 2021-09-09 ==
* 23:04 marxarelli: running `jenkins-jobs update config/ 'browsertests-*'` to deploy I3c61ff4089791375e21aadfa045d503dfd73ca0e
* 22:20 brennen: gitlab-ansible-test: resetting instance data
* 13:26 hashar: Precise slaves had faulty elasticsearch: apt-get install --reinstall elasticsearch
* 19:42 James_F: Docker: Building node<nowiki>{</nowiki>10,12<nowiki>}</nowiki>-test-browser-php80-composer for [[phab:T290651|T290651]]
* 13:21 hashar: puppet stalled on Precise Jenkins slaves :-(
* 10:56 hashar: Successfully published image docker-registry.discovery.wmnet/releng/helm-linter:0.2.17
* 08:52 hashar: upgrading packages on Precise slaves
* 10:37 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/719496
* 08:49 hashar: rebooting all Trusty jenkins slaves
* 08:39 hashar: upgrading python-pip on Trusty from 1.5.4-1ubuntu1 to 1.5.4-1ubuntu3 . Fix up pip silently removing system packages ( https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771794 )
* 08:12 hashar: On CI slaves, bumping HHVM from  3.6.1+dfsg1-1+wm3 to 3.6.5+dfsg1-1+wm1
* 08:11 hashar: apt-get upgrade Trusty Jenkins slaves


== 2015-07-24 ==
== 2021-09-08 ==
* 17:35 marxarelli: updating integration slave scripts from integration-saltmaster to deploy I6906fadede546ce2205797da1c6b267aed586e17
* 20:16 thcipriani: self +2 on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/719500 to unbreak beta
* 17:17 marxarelli: running `jenkins-jobs update config/ 'mediawiki-selenium-integration' 'mwext-mw-selenium'` to deploy Ib289d784c7b3985bd4823d967fbc07d5759dc756
* 18:47 brennen: contint1001 / contint2001: /srv/dev-images: git remote set-url origin 'https://gitlab.wikimedia.org/releng/dev-images.git'
* 17:05 marxarelli: running `jenkins-jobs update config/ 'mediawiki-selenium-integration'` to deploy and test Ib289d784c7b3985bd4823d967fbc07d5759dc756
* 18:28 dduvall: Running ./fab deploy_docker to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/719350
* 17:04 hashar: integration-saltmaster, in  a '''screen''' : salt -b 1 '*slave*' cmd.run '/usr/local/sbin/puppet-run'|tee hashar-massrun.log
* 17:05 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/719355
* 17:04 hashar: cancelled last command
* 16:37 thcipriani: Reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/719493
* 17:03 hashar: integration-saltmaster : salt -b 1 '*slave*' cmd.run '/usr/local/sbin/puppet-run' &  && disown && exit
* 14:55 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/714328/
* 16:55 hashar: Might have fixed the puppet/pip mess on CI slaves by creating a symlink from /usr/bin/pip to /usr/local/bin/pip ( https://gerrit.wikimedia.org/r/#/c/226729/1..2/modules/contint/manifests/packages/python.pp,unified )
* 14:05 brennen: runner-1002.gitlab-runners: tested upgrade of gitlab-runner to 14.2.0, seemed to go fine, will do remaining runners ([[phab:T289802|T289802]])
* 16:36 hashar: puppet on Jenkins slaves might have some intermittent issues due to pip installation  https://gerrit.wikimedia.org/r/226729
* 15:29 hashar: removing pip obsolete download-cache setting ( https://gerrit.wikimedia.org/r/#/c/226730/ )
* 15:27 hashar: upgrading pip to 7.1.0 via pypi ( https://gerrit.wikimedia.org/r/#/c/226729/ ).  Revert plan is to uncherry pick the patch on the puppetmaster and:  pip uninstall pip
* 12:46 hashar: Jenkins: switching gearman plugin from our custom compiled 0.1.1-9-g08e9c42-change_192429_2  to upstream 0.1.2. They are actually the exact same versions.
* 08:40 hashar: upgrading zuul to zuul_2.0.0-327-g3ebedde-wmf3precise1 to fix a regression ( https://phabricator.wikimedia.org/T106531 )
* 08:39 hashar: upgrading zuul


== 2015-07-23 ==
== 2021-09-07 ==
* 23:03 marxarelli: running `jenkins-jobs update config/ 'browsertests-*'` to deploy I2d0f83d0c6a406d46627578cb8db0706d1b8655d
* 20:49 James_F: Marked https://gerrit.wikimedia.org/g/mediawiki/tools/cli as read-only and pointed users to GitLab.
* 16:38 marxarelli: Reloading Zuul to deploy I96b6218a208f133209452c71bcf01a1088305aea
* 20:46 brennen: migrating dev-images to https://gitlab.wikimedia.org/releng/dev-images and deactivating on gerrit
* 15:39 urandom: applied wip logstash & cassandra changes (https://gerrit.wikimedia.org/r/#/c/226025/) to deployment-prep
* 18:16 dduvall: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/719311
* 13:24 hashar: apt-get upgrade integration-puppetmaster and rebooting it
* 18:11 dduvall: creating 2 new jenkins jobs for deployment of https://gerrit.wikimedia.org/r/c/integration/config/+/719311
* 13:23 hashar: integration puppetmaster in bad shape: Warning: Error 400 on SERVER: Cannot allocate memory - fork(2)
* 16:25 James_F: Docker: Publishing node12-test-browser-php<nowiki>{</nowiki>72,80<nowiki>}</nowiki>-composer images
* 10:58 hashar: beta : salt '*' cmd.run 'rm /etc/apt/apt.conf.d/20auto-upgrades.ucf-dist'
* 16:19 James_F: Zuul: [mediawiki/extensions/BlueSpiceDistributionConnector] Add 4 dependencies
* 10:52 hashar: Beta cluster puppetmaster is now deployment-puppetmaster.deployment-prep.eqiad.wmflabs . Migrated all instances (solves https://phabricator.wikimedia.org/T106649 )
* 10:30 hashar: regenerated puppet cert on deployment-salt , the old puppetmaster now a puppet client
* 10:23 hashar: running apt-get upgrade on deployment-parsoidcache02
* 09:32 hashar: puppet broken on deployment-fluorine : Error: Could not request certificate: Neither PUB key nor PRIV key:: header too long
* 08:39 hashar: Disabling puppet agent on ALL beta cluster instances
* 08:18 hashar: creating deployment-puppetmaster m1.medium :D
* 01:57 jzerebecki: reconnected slave and needed to kill a few pending beta jobs, works again
* 01:50 jzerebecki: trying https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
* 01:09 legoktm: beta-mediawiki-config-update-eqiad jobs stuck
* 00:41 jzerebecki: clean up doc dir after job changes gallium:~$ sudo -iu jenkins-slave rm -r /srv/org/wikimedia/doc/MobileFrontend/master/{app-0c945a27f43452df695771ddb60b3d14.js,data-500abda2bcb0df13609e38707dfa7f4e.js,eg-iframe.html,extjs,favicon.ico,index.html,member-icons,output,resources,source,styles-3eba09980fa05ead185cb17d9c0deb0f.css}
* 00:14 jzerebecki: reloading zuul 369e6eb..73dc1f6 for https://gerrit.wikimedia.org/r/#/c/223527/


== 2015-07-22 ==
== 2021-09-03 ==
* 10:24 hashar: Upgrading Zuul on Jenkins Precise slaves to zuul_2.0.0-327-g3ebedde-wmf2precise1_amd64.deb
* 23:02 Krinkle: Creating integration-agent-qemu-1002 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref [[phab:T284774|T284774]]
* 09:32 hashar_: Reupgrading Zuul to zuul_2.0.0-327-g3ebedde-wmf2precise1_amd64.deb with an approval fix ( https://gerrit.wikimedia.org/r/#/c/226274/ ) for gate-and-submit no more matching Code-Review+2 events ( https://phabricator.wikimedia.org/T106436 )
* 17:42 dduvall: deploying blubberoid:2021-09-03-160524-production to eqiad/codfw (https://gerrit.wikimedia.org/r/c/blubber/+/716519) ([[phab:T289367|T289367]])
* 17:36 dduvall: staging blubberoid to deploy https://gerrit.wikimedia.org/r/c/blubber/+/716519


== 2015-07-21 ==
== 2021-09-02 ==
* 22:54 greg-g: 22:50 <  chasemp> "then git reset --hard 9588d0a6844fc9cc68372f4bf3e1eda3cffc8138 in  /etc/zuul/wikimedia"
* 15:17 brennen: gitlab-test: testing upgrade path to 14.x
* 22:53 greg-g: 22:47 <  chasemp> service zuul stop && service zuul-merger stop && sudo apt-get install  zuul=2.0.0-304-g685ca22-wmf1precise1
* 21:48 greg-g: Zuul not responding
* 20:23 hasharConfcall: Zuul no longer reports back to Gerrit due to an error with the Gerrit label
* 20:10 hasharConfcall: Zuul restarted with 2.0.0-327-g3ebedde-wmf2precise1
* 19:48 hasharConfcall: Upgrading Zuul to zuul_2.0.0-327-g3ebedde-wmf2precise1  Previous version failed because python-daemon was too old, now shipped in the venv  https://phabricator.wikimedia.org/T106399
* 15:04 hashar: upgraded Zuul on gallium from zuul_2.0.0-306-g5984adc-wmf1precise1_amd64.deb to zuul_2.0.0-327-g3ebedde-wmf1precise1_amd64.deb . now uses python-daemon 2.0.5
* 13:37 hashar: upgraded Zuul on gallium from zuul_2.0.0-304-g685ca22-wmf1precise1 to zuul_2.0.0-306-g5984adc-wmf1precise1 . Uses a new version of GitPython
* 02:15 bd808: upgraded to elasticsearch-1.7.0.deb on deployment-logstash2


== 2015-07-20 ==
== 2021-09-01 ==
* 16:55 thcipriani: restarted puppetmaster on deployment-salt, was acting whacky
* 21:22 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/716041 in beta.
* 16:55 urbanecm: deployment-prep: Unlock scap
* 15:50 urbanecm: deployment-prep: Lock scap again
* 15:40 urbanecm: deployment-prep: Lock scap to be able to test something
* 14:08 urbanecm: deployment-prep: Create foundationwiki ([[phab:T290164|T290164]])
* 14:07 urbanecm: urbanecm@deployment-mediawiki11:~$ sudo run-puppet-agent # [[phab:T290164|T290164]]
* 13:58 urbanecm: urbanecm@deployment-cache-text06:~$ sudo run-puppet-agent # [[phab:T290164|T290164]]


== 2015-07-17 ==
== 2021-08-31 ==
* 21:45 hashar: upgraded nodepool to 0.0.1-104-gddd6003-wmf4 . That fixes graceful stop via SIGUSR1 and lets me complete the systemd integration
* 21:38 dduvall: deploying new blubberoid to eqiad/codfw following successful testing in staging
* 20:03 hashar: stopping Zuul to get rid of a faulty registered function "build:Global-Dev Dashboard Data". Job is gone already.
* 21:35 dduvall: staging new blubberoid release to deploy https://gerrit.wikimedia.org/r/c/blubber/+/715276
* 14:29 hashar: Restarting CI Jenkins for plugins upgrade


== 2015-07-16 ==
== 2021-08-30 ==
* 16:08 hashar_: kept nodepool stopped on labnodepool1001.eqiad.wmnet because it spams the cron log
* 19:53 urbanecm: urbanecm@deployment-deploy01:/srv/mediawiki-staging$ git submodule update portals # to clear dirty staging dir at beta
* 10:27 hashar: fixing puppet on deployment-bastion. Stalled since July 7th - https://phabricator.wikimedia.org/T106003
* 19:27 urbanecm: deployment-prep: reboot deployment-eventgate-3 ([[phab:T289029|T289029]])
* 10:26 hashar: deployment-bastion: apt-get upgrade
* 19:16 brennen: gitlab-test: powering off gitlab (former main test instance)
* 02:34 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/224313 for scap testing
* 19:11 James_F: Zuul: [mediawiki/services/apple-search] Add postmerge publish
* 19:10 brennen: gitlab-test: upgrading gitlab on gitlab-ansible-test to 13.12.9; re-associating floating IP to gitlab-ansible-test
* 18:23 brennen: gitlab-test: associating floating IP to primary test box
* 16:36 James_F: Zuul: [mediawiki/services/apple-search] Remove composer-package
* 15:16 James_F: Zuul: [mediawiki/services/apple-search] Add pipeline CI for [[phab:T289224|T289224]]
* 12:20 Amir1: foreachwikiindblist wikisource refreshImageMetadata.php --mediatype=OFFICE --batch-size=10 --verbose --split --sleep 5


== 2015-07-15 ==
== 2021-08-27 ==
* 20:53 bd808: Added JanZerebecki as deployment-prep root
* 19:24 James_F: Docker: Publish initial node14 CI images for [[phab:T267888|T267888]]
* 17:53 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/224829/
* 16:10 bd808: sudo rm -rf /tmp/scap_l10n_* on deployment-bastion
* 15:33 bd808: root (/) is full on deployment-bastion, trying to figure out why
* 14:39 bd808: mkdir mira.deployment-prep:/home/l10nupdate because puppet's managehome flag doesn't seem to be doing that :(
* 05:00 bd808: created mira.deployment-prep.eqiad.wmflabs to begin testing multi-master scap


== 2015-07-14 ==
== 2021-08-25 ==
* 00:45 bd808: /srv/deployment/scap/scap on deployment-mediawiki02 had corrupt git cache info; moved to scap-corrupt and forced a re-sync
* 17:48 twentyafterfour: updating zuul config after deploying https://gerrit.wikimedia.org/r/714538
* 00:41 bd808: trebuchet deploy of scap to mediawiki02 failed. investigating
* 00:41 bd808: Updated scap to d7db8de (Don't assume current l10n cache files are .cdb)


== 2015-07-13 ==
== 2021-08-24 ==
* 20:44 thcipriani: might be some failures, puppetmaster refused to stop as usual, had to kill pid and restart
* 23:15 James_F: Zuul: Configure the REL1_37 test and gate pipelines [[phab:T289587|T289587]]
* 20:39 thcipriani: restarting puppetmaster on deployment-salt, seeing weird errors on instances
* 22:18 thcipriani: phab1001:sudo /srv/phab/phabricator/bin/bulk make-silent --id 2822 (releng-logspam -> unstewarded production error)
* 10:24 hashar: pushed mediawiki/ruby/api tags for versions 0.4.0 and 0.4.1
* 20:02 Reedy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/712990
* 10:12 hashar: deployment-prep: killing puppetmaster
* 10:06 hashar: integration: kicking puppet master. It is stalled somehow


== 2015-07-11 ==
== 2021-08-23 ==
* 04:35 bd808: Updated /var/lib/git/labs/private to latest upstream
* 20:37 James_F: Docker: Publish php-ast with 1.0.14 ([[phab:T289429|T289429]]) and no longer support PHP 7.0 or 7.1 (last trace!)
* 03:54 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/224219/
* 03:54 bd808: fixed rebase conflict with "Enable firejail containment for zotero" by removing stale cherry-pick


== 2015-07-10 ==
== 2021-08-21 ==
* 16:12 hashar: nodepool puppetization going on :-D
* 15:03 majavah: fixing deployment-prep puppet merge conflicts re: swift
* 03:01 legoktm: deploying https://gerrit.wikimedia.org/r/223992


== 2015-07-09 ==
== 2021-08-20 ==
* 22:16 hashar: integration: pulled labs/private.git : dbef45d..d41010d
* 21:11 urbanecm: urbanecm@deployment-deploy01:/srv/mediawiki-staging/private$ rm mwblocker.log # remove weird blank log file
* 18:56 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/713439
* 18:55 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
* 18:54 urbanecm: urbanecm@deployment-mwmaint01:~$ for i in <nowiki>{</nowiki>1..20<nowiki>}</nowiki>; do echo "test $i" {{!}} mwscript edit.php --wiki=<nowiki>{</nowiki>cswiki,enwiki<nowiki>}</nowiki> --user="Martin Urbanec (test $i)" --summary="test" Sandbox; done
* 18:49 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki
* 18:49 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
* 18:46 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ for i in <nowiki>{</nowiki>1..20<nowiki>}</nowiki>; do mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=enwiki "Martin Urbanec (test $i)"; done
* 18:40 urbanecm: urbanecm@deployment-mwmaint01:~$ for i in <nowiki>{</nowiki>1..20<nowiki>}</nowiki>; do mwscript createAndPromote.php --wiki=cswiki "Martin Urbanec (test $i)" "$password"; done # to test a feature that needs a lot of different accounts
* 16:30 majavah: restart sssd on deployment-cache-text06, [[phab:T286502|T286502]]?
* 16:24 majavah: deployment-prep: configure wikifunctions.beta.wmflabs.org dns zones and add to acme-chief [[phab:T284162|T284162]]


== 2015-07-08 ==
== 2021-08-19 ==
* 23:17 bd808: Kibana functional again. Imported some dashboards from prod instance.
* 22:58 twentyafterfour: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/713612/
* 22:48 marxarelli: cherry-picked https://gerrit.wikimedia.org/r/#/c/223691/ on integration-puppetmaster
* 22:33 bd808: about half of the indices on deployment-logstash2 lost. I assume it was caused by shard rebalancing to logstash1 that I didn't notice before I shut it down and deleted it :(
* 22:32 bd808: Upgraded elasticsearch on logstash2 to 1.6.0
* 22:00 bd808: Kibana messed up. Half of the logstash elasticsearch indices are gone from deployment-logstash2
* 21:05 legoktm: deployed https://gerrit.wikimedia.org/r/223669
* 11:47 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/223530
* 09:26 hashar: upgraded plugins on jenkins and restarting it


== 2015-07-07 ==
== 2021-08-17 ==
* 23:58 bd808: updated scap to 303e72e (Increment deployment stats after sync-wikiversions)
* 21:23 brennen: Updating dev-images docker-pkg files on primary contint for [[gerrit:709831{{!}}add buster-apache2, log errors to stdio]] ([[phab:T283416|T283416]])
* 21:23 bd808: deleted instance deployment-logstash1
* 20:48 marxarelli: cherry-picking https://gerrit.wikimedia.org/r/#/c/158016/ on deployment-salt
* 20:07 bd808: Forced puppet run on deployment-restbase01; run picked up changes that should have been applied yesterday, not sure why puppet wasn't running from cron properly
* 19:58 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/223391/
* 18:51 bd808: restarted puppetmaster on deployment-salt to pick up logging config changes
* 18:14 bd808: Changed role::protoproxy::ssl::beta to role::tlsproxy::ssl::beta for deployment-cache-*
* 18:10 bd808: puppet broken on deployment-cache-* by https://gerrit.wikimedia.org/r/#/c/222124/
* 15:45 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/223301/


== 2015-07-06 ==
== 2021-08-16 ==
* 23:34 marxarelli: Reloading Zuul to deploy I33ac72e7df498e58f0e25d8c59f167d13eae06cf
* 23:08 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/711731
* 23:24 bd808: restarted nutcracker on deployment-mediawiki01
* 21:32 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/223184/ to deployment-salt
* 20:57 bd808: restarted puppetmaster on deployment-salt
* 20:55 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/223172/ for testing
* 20:50 hashar: removing lanthanum from Jenkins slave configuration. Server is gone ( https://phabricator.wikimedia.org/T86658 )
* 20:34 hashar: lanthanum: deleting gerrit replicas under /srv/ssd/gerrit
* 20:32 hashar: Gerrit: reloading replication plugin: <tt>gerrit plugin reload replication</tt>
* 14:08 hashar: Disconnected lanthanum Jenkins slave. Being phased out https://phabricator.wikimedia.org/T86658


== 2015-07-03 ==
== 2021-08-15 ==
* 14:07 hashar: adding puppetmaster::certcleaner class to integration and beta  puppetmaster
* 17:44 James_F: Zuul: [mediawiki/extensions/CIForms] Add basic quibble CI
* 14:03 hashar: rebased puppetmaster on integration project
* 13:59 hashar: removing puppetmaster::autosigner from integration-puppetmaster
* 13:58 hashar: removing puppetmaster::autosigner from deployment-salt. It is now automatic per https://gerrit.wikimedia.org/r/#/c/220306/
* 13:55 hashar: restarted puppetmaster on deployment-salt
* 05:20 legoktm: deploying https://gerrit.wikimedia.org/r/222539
* 01:18 legoktm: deploying https://gerrit.wikimedia.org/r/166074
* 00:41 legoktm: deploying https://gerrit.wikimedia.org/r/222503


== 2015-07-02 ==
== 2021-08-13 ==
* 10:07 hashar: adding mobrovac to the integration project so he can ssh to slaves and sudo as jenkins-deploy user
* 20:09 urbanecm: Manually start `beta-update-databases-eqiad` CI job
* 20:06 urbanecm: deployment-prep: sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-databases.py
* 20:03 urbanecm: Kill beta-scap-sync-world job for the usual reason
* 13:13 majavah: `mwscript extensions/CentralAuth/maintenance/importMissingLocalNames.php --wiki metawiki` on the beta cluster


== 2015-07-01 ==
== 2021-08-11 ==
* 15:44 hashar: Kunal's awesome dashboard for repos https://www.mediawiki.org/wiki/User:Legoktm/ci
* 00:52 James_F: Zuul: Add Aca to the CI allow list
* 15:34 hashar: https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-HEAD/ is fixed. populated the git repos manually
* 00:52 James_F: Zuul: [mediawiki/extensions/SimpleCalendar] Add basic quibble CI
* 15:21 hashar: manually populating mediawiki/core on Precise instances for mediawiki-core-phpcs-HEAD job using: <tt>git config remote.origin.url https://gerrit.wikimedia.org/r/p/mediawiki/core</tt> <tt>git fetch</tt>
* 15:14 hashar: https://integration.wikimedia.org/ci/job/mediawiki-core-phpcs-HEAD/ broken while cloning mediawiki/core :-(
* 10:47 hashar: puppet fixed by restarting the puppet master
* 10:41 hashar: restarting Jenkins
* 10:40 hashar: upgrading Jenkins gearman plugin from 0.1.1-8-gf2024bd to 0.1.1-9-g08e9c42-change_192429_2  https://phabricator.wikimedia.org/T72597#1416913
* 10:38 hashar: restarted puppetmaster on integration
* 10:36 hashar: Error: /Stage[main]/Ldap::Client::Utils/File[/usr/local/sbin/archive-project-volumes]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/ldap/scripts/archive-project-volumes
* 10:36 hashar: integration: puppet now fails on instances :-/
* 10:29 hashar: rebased puppet.git on integration-puppetmaster.  Autoupdater was blocked by a couple 3-way merges.


== 2015-06-30 ==
== 2021-08-10 ==
* 10:07 hashar: deployment-bastion sudo -u l10nupdate bash -c 'cd /srv/l10nupdate/mediawiki/extension && git submodule foreach git gc'
* 16:57 James_F: Zuul: Update e-mail address for Zabe in the allow list
* 09:43 hashar: deployment-bastion sudo -u jenkins-deploy bash -c 'cd /srv/mediawiki-staging/php-master/extensions && git submodule foreach git gc'
* 16:06 James_F: Ran `sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/Wikibase` on doc1001 for [[phab:T288396|T288396]]
* 09:40 hashar: deployment-bastion sudo -u l10nupdate bash -c 'cd /srv/l10nupdate/mediawiki/core/.git && git gc'
* 09:39 hashar: deployment-bastion: sudo -u l10nupdate bash -c 'cd /srv/l10nupdate/mediawiki/extensions/.git && git gc'
* 09:38 hashar: deployment-bastion sudo -u jenkins-deploy bash -c 'cd /srv/mediawiki-staging/php-master/extensions/.git && git gc'
* 09:31 hashar: beta: running git gc on deployment-bastion Trebuchet directories. As trebuchet: find /srv/deployment/*/*/.git -type d -name .git -print -exec bash -c 'cd {} && git gc' \;
* 07:09 legoktm: deploying https://gerrit.wikimedia.org/r/221835


== 2015-06-29 ==
== 2021-08-09 ==
* 23:19 bd808: Moved logstash irc bot from logstash1 to logstash2
* 21:02 urbanecm: Remove hanging beta-scap-sync-world job in CI to unblock beta auto-updates
* 22:25 legoktm: deploying https://gerrit.wikimedia.org/r/221749
* 18:08 thcipriani: restarted nutcracker on beta cluster salt '*-mediawiki*' cmd.run 'service nutcracker restart'
* 10:42 hashar: manually rebasing integration-puppetmaster git repo
* 10:24 hashar: restarted puppetmaster on deployment-salt
* 10:23 hashar: puppet master stalled due to: [ldap-yaml-enc.p] <defunct> .  Killing it
* 10:21 hashar: sees beta cluster puppetmaster is suffering from some random issue


== 2015-06-27 ==
== 2021-08-07 ==
* 02:42 legoktm: deploying https://gerrit.wikimedia.org/r/221343 & https://gerrit.wikimedia.org/r/221344
* 00:48 James_F: Docker: Publish quibble-buster-php73-coverage 1.1.1 for [[phab:T287918|T287918]].
* 02:36 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/221342
* 00:29 James_F: Zuul: Add skin-coverage jobs to all Wikimedia production skins [[phab:T287918|T287918]]
* 02:22 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/221338
* 00:27 James_F: Zuul: Provide a skin-coverage template [[phab:T287918|T287918]]
* 02:15 legoktm: deploying https://gerrit.wikimedia.org/r/#/c/221337/
* 01:56 legoktm: deploying https://gerrit.wikimedia.org/r/221333 & https://gerrit.wikimedia.org/r/221334
* 01:42 legoktm: deploying https://gerrit.wikimedia.org/r/221331
* 01:36 legoktm: deploying https://gerrit.wikimedia.org/r/221330
* 01:28 legoktm: deploying https://gerrit.wikimedia.org/r/221329
* 01:13 legoktm: deploying https://gerrit.wikimedia.org/r/221328
* 00:15 legoktm: deploying https://gerrit.wikimedia.org/r/221316 & https://gerrit.wikimedia.org/r/221318


== 2015-06-26 ==
== 2021-08-06 ==
* 22:39 marxarelli: Reloading Zuul to deploy I3deec5e5a7ce7eee75268d0546eafb3e4145fdc7
* 23:53 James_F: Docker: Publishing quibble-buster-php73-coverage 1.1.0 for [[phab:T287918|T287918]]
* 22:20 marxarelli: Reloading Zuul to deploy I7affe14e878d5c1fc4bcb4dfc7f2d1494cd795b7
* 23:47 James_F: Zuul: [mediawiki/skins/Mirage] Not a production skin; move to right section
* 21:45 legoktm: deploying https://gerrit.wikimedia.org/r/221295
* 21:21 marxarelli: running `jenkins-jobs update` to deploy I7affe14e878d5c1fc4bcb4dfc7f2d1494cd795b7
* 18:46 marxarelli: running `jenkins-jobs update '*bundle*'` to deploy Icb31cf57bee0483800b41a2fb60d236fcd2d004e


== 2015-06-25 ==
== 2021-08-05 ==
* 23:38 legoktm: deploying https://gerrit.wikimedia.org/r/221001
* 22:26 brennen: gitlab: setting CI access on all repos to "Project members only", per https://www.mediawiki.org/wiki/GitLab/Policy#Permissions - may need revisited depending on effects
* 21:21 thcipriani: updated deployment-salt to match puppet by rm /var/lib/git/operations/puppet/modules/cassandra per godog's instructions
* 22:04 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/710108
* 19:09 hashar: purged all WikidataQuality workspaces.  Got renamed to WikibaseQuality*
* 15:10 hashar: integration: sudo cumin --force 'name:docker' 'docker container prune -f && docker image prune -f'
* 14:22 jzerebecki: reloading zuul for https://gerrit.wikimedia.org/r/#/c/220737/2
* 15:09 hashar: integration-agent-docker-1020: docker container prune && docker image prune
* 14:20 jzerebecki: killing a fellow's idle shell zuul@gallium:~$ kill 13602
* 15:02 hashar: Building Docker image helm-linter for https://gerrit.wikimedia.org/r/c/integration/config/+/710276
* 11:03 hashar: Rebooting  integration-raita and integration-vmbuilder-trusty
* 11:01 hashar: Unmounting /data/project and /home NFS mounts from integration-raita and integration-vmbuilder-trusty https://phabricator.wikimedia.org/T90610
* 10:45 hashar: deployment-sca02 deleted /var/lib/puppet/state/agent_catalog_run.lock from June 5th
* 08:57 hashar: Fixed puppet "Can't dup Symbol" on deployment-pdf01  by deleting puppet, /var/lib/puppet and reinstalling it from scratch https://phabricator.wikimedia.org/T87197
* 08:39 hashar: apt-get upgrade deployment-salt
* 08:08 hashar: deployment-pdf01 deleted /var/log/ocg/ content. Last entry is from July 25th 2014 and puppet complains with <tt>e[/var/log/ocg]: Not removing directory; use 'force' to override</tt>
* 08:04 hashar: apt-get upgrade deployment-pdf01
* 06:37 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/220712
* 06:33 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/220705


== June 24 ==
== 2021-08-04 ==
* 19:31 hashar: rebooting deployment-cache-upload02
* 22:53 bd808: Updated composer-github-oauthtoken in Jenkins config to use a newer personal access token from GitHub per notices about https://github.blog/2021-04-05-behind-githubs-new-authentication-token-formats/
* 19:28 hashar: fixing DNS puppet etc on deployment-cache-upload02
* 17:49 brennen: gitlab-test: testing upgrade to 13.12.9
* 19:24 hashar: rebooting deployment-zookeeper to get rid of the /home NFS https://phabricator.wikimedia.org/T102169 
* 16:41 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/710066 # [[phab:T288111|T288111]]
* 19:06 hashar: beta: salt 'i-00*' cmd.run "echo 'domain integration.eqiad.wmflabs\nsearch integration.eqiad.wmflabs eqiad.wmflabs\nnameserver 208.80.154.20\noptions timeout:5' > /etc/resolv.conf"
* 15:13 thcipriani: puppet fixed on deployment-deploy<nowiki>{</nowiki>01,03<nowiki>}</nowiki>
* 19:06 hashar: fixing DNS / puppet and salt on i-000008d5.eqiad.wmflabs  i-000002de.eqiad.wmflabs i-00000958.eqiad.wmflabs
* 15:08 thcipriani: rebase deployment-puppetmaster04:labs/private causing deployment-deploy<nowiki>{</nowiki>01,03<nowiki>}</nowiki> failure for...¯\_(ツ)_/¯
* 15:35 hashar: integration-dev recovered!  puppet hasn't run for ages but caught up with changes
* 15:13 hashar: removed /var/lib/puppet/state/agent_catalog_run.lock on integration-dev
* 09:52 hashar: Java 6 removed from gallium / lanthanum and CI labs slaves.
* 09:18 hashar: getting rid of java 6 on CI machines ( https://phabricator.wikimedia.org/T103491 )
* 07:58 hashar: Bah, puppet re-enabled NFS on deployment-parsoidcache02 for some reason
* 07:57 hashar: disabling NFS on deployment-parsoidcache02
* 00:38 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/219513/
* 00:32 marxarelli: running `jenkins-jobs update` to create 'mwext-MobileFrontend-mw-selenium' with I7affe14e878d5c1fc4bcb4dfc7f2d1494cd795b7
* 00:20 marxarelli: running `jenkins-jobs update` to create 'mediawiki-selenium-integration' with I7affe14e878d5c1fc4bcb4dfc7f2d1494cd795b7


== June 23 ==
== 2021-08-03 ==
* 23:29 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/220350
* 21:12 brennen: Updating dev-images docker-pkg files on primary contint for [[gerrit:698880{{!}}add buster php images]]
* 21:34 bd808: updated scap to 33f3002 (Ensure that the minimum batch size used by cluster_ssh is 1)
* 19:53 legoktm: deleted broken renames from centralauth.renameuser_status on beta cluster
* 18:28 jzerebecki: zuul reload for https://gerrit.wikimedia.org/r/#/c/219778/4
* 16:33 bd808: updated scap to da64a65 (Cast pid read from file to an int)
* 16:20 bd808: updated scap to 947b93f (Fix reference to _get_apache_list)
* 12:24 hashar: rebooting integration-labvagrant (stuck)
* 00:07 legoktm: deploying https://gerrit.wikimedia.org/r/220020


== June 22 ==
== 2021-08-02 ==
* 22:23 legoktm: deploying https://gerrit.wikimedia.org/r/219603
* 23:27 Reedy: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/709563
* 21:47 bd808: scap emitting soft failures due to missing python-netifaces on deployment-videoscaler01; should be fixed by a current puppet run
* 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/709562
* 21:37 bd808: Updated scap to 81b7c14 (Move dsh group file names to config)
* 23:05 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/709558
* 14:58 hashar: disabled  sshd MAC/KEX hardening on beta (was https://gerrit.wikimedia.org/r/#/c/219828/ )
* 22:54 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/709555 https://gerrit.wikimedia.org/r/709556
* 14:32 hashar: restarting Jenkins
* 21:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/709543
* 14:30 hashar: Reenable sshd MAC/KEX hardening on beta by cherry picking https://gerrit.wikimedia.org/r/#/c/219828/
* 13:17 moritzm: activated firejail service containment for graphoid, citoid and mathoid in deployment-sca
* 11:07 hashar: fixing puppet on integration-zuul-server
* 10:29 hashar: rebooted deployment-kafka02 to get rid of /home NFS share
* 10:25 hashar: fixed puppet.conf on deployment-urldownloader
* 10:20 hashar: enabled puppet agent on deployment-urldownloader
* 10:05 hashar: removing puppet lock on deployment-elastic07 ( rm /var/lib/puppet/state/agent_catalog_run.lock )
* 09:40 hashar: fixed puppet certificates on integration-lightslave-jessie-1002 by deleting the SSL certs
* 09:31 hashar: can't reach integration-lightslave-jessie-1002, probably NFS related
* 09:22 hashar: upgrading Jenkins gearman plugin from 0.1.1 to latest master (f2024bd).


== June 21 ==
== 2021-07-30 ==
* 02:40 legoktm_: deploying https://gerrit.wikimedia.org/r/219401
* 21:27 dduvall: "Total reclaimed space: 141.4GB" on releases1002 following docker prune
* 21:24 dduvall: running `docker system prune -af` on releases1002


== June 20 ==
== 2021-07-29 ==
* 03:12 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/219449
* 22:12 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/708596
* 14:06 brennen: gerrit: added ldap/ops to ownership of operations/gitlab-ansible
* 13:21 addshore: reload zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/708764 "Add michaelcochez to whitelist users"


== June 19 ==
== 2021-07-28 ==
* 18:39 thcipriani: running `salt -b 2 '*' cmd.run 'puppet agent -t'` from deployment salt to remount /data/projects
* 17:53 andrewbogott: rebooting deployment-logstash03 as it's in an inconsistent config state
* 18:36 thcipriani: added role::deployment::repo_config to deployment-prep hiera, to be removed after patched in ops/puppet
* 14:44 hashar: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/708536
* 16:48 thcipriani: primed keyholder on deployment-bastion
* 15:35 hashar: nodepool manages to boot instances and ssh to them. Now attempting to add them as slave in Jenkins!


== June 17 ==
== 2021-07-27 ==
* 20:43 legoktm: deploying https://gerrit.wikimedia.org/r/219021
* 16:55 dduvall: creating new gitlab runner instance runner-1002 for testing
* 18:56 legoktm: deploying https://gerrit.wikimedia.org/r/218981
* 16:32 hashar: cleaned some obsolete caches under integration-castor03 /srv/jenkins-workspace/caches
* 16:40 legoktm: deploying https://gerrit.wikimedia.org/r/218938 & https://gerrit.wikimedia.org/r/218939
* 14:16 jzerebecki: deploying zuul config ca3bd69..00eb921
* 13:53 jzerebecki: applying https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Gearman_deadlock
* 13:00 jzerebecki: done
* 12:32 jzerebecki: also needed to kill a few beta jobs, like https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update says. now proceeding with https://gerrit.wikimedia.org/r/#/c/214603/8
* 12:23 jzerebecki: before doing that actually trying https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Jenkins_execution_lock to try to unlock https://integration.wikimedia.org/ci/computer/deployment-bastion.eqiad/
* 12:17 jzerebecki: changing many jenkins jobs while deploying https://gerrit.wikimedia.org/r/#/c/214603/8


== June 16 ==
== 2021-07-26 ==
* 15:55 bd808: Resolved rebase conflicts on deployment-salt caused by code review changes of https://gerrit.wikimedia.org/r/#/c/216325 prior to merge
* 20:45 brennen: runner-1001: installed docker & gitlab-runner, registered runner-1001 to the gitlab instance for pipeline experimentation ([[phab:T287279|T287279]])
* 13:05 hashar: upgrading HHVM on CI trusty slaves https://phabricator.wikimedia.org/T102616  <tt>salt -v -t 30  --out=json -C 'G@oscodename:trusty and *slave*' pkg.install pkgs='["hhvm","hhvm-dev","hhvm-fss","hhvm-luasandbox","hhvm-tidy","hhvm-wikidiff2"]'</tt>
* 18:10 dduvall: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/707879
* 11:45 hashar: integration-slave-trusty-1021 downgrading hhvm plugins to match hhvm 3.3.1
* 18:00 dduvall: creating 2 new jenkins jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/707879
* 11:42 hashar: integration-slave-trusty-1021 downgrading hhvm, hhvm-dev from 3.3.6 to 3.3.1
* 08:24 hashar: jjb: update jobs to Quibble 1.0.1 # https://gerrit.wikimedia.org/r/c/integration/config/+/708038
* 11:19 hashar: rebooting integration-dev, unreachable
* 11:09 hashar: apt-get upgrade on integration-slave-trusty-1021
* 08:19 hashar: rebooting integration-slave-jessie-1001, unreachable


== June 15 ==
== 2021-07-23 ==
* 23:39 legoktm: deploying https://gerrit.wikimedia.org/r/218549
* 19:26 brennen: gitlab-runners: launched runner-1001, g3.cores8.ram36.disk20 to install baseline experimental runner ([[phab:T287279|T287279]])
* 23:22 legoktm: deploying https://gerrit.wikimedia.org/r/218527
* 14:08 hashar: Building Docker images for quibble 1.0.1
* 21:10 bd808: Put cherry-picks of https://gerrit.wikimedia.org/r/#/c/216325/ and https://gerrit.wikimedia.org/r/#/c/216337/ back on deployment-salt
* 13:45 hashar: Tag Quibble 1.0.1 @ {{Gerrit|5a2548699a}} # [[phab:T287001|T287001]]
* 19:59 hashar: manually rebased puppet repo on integration-puppetmaster (some patch got merged)
* 17:21 legoktm: deploying https://gerrit.wikimedia.org/r/218391
* 15:02 hashar: rebooting integration-slave-jessie-1001.integration.eqiad.wmflabs  (unresponsive)
* 14:37 hashar: rebooting integration-dev since it is unresponsive
* 13:22 hashar: cleaned integration-puppetmaster certificate
* 13:09 hashar: deleting integration-saltmaster puppet cert


== June 13 ==
== 2021-07-21 ==
* 07:53 legoktm: deploying https://gerrit.wikimedia.org/r/217997
* 21:06 brennen: gitlab1001: running ansible for logging typo fix ([[phab:T274462|T274462]])
* 03:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/217993
* 20:36 dancy: Newest scap deployed to beta cluster
* 01:11 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/217982
* 19:06 brennen: gitlab1001: running ansible to deploy nginx logging and status changes ([[phab:T274462|T274462]], [[phab:T275170|T275170]])
* 16:46 dancy: restarting Gerrit to fix plugins
* 16:07 dancy: Updating plugins on releases-jenkins
* 15:00 urbanecm: deployment-prep: Change password for `Martin Urbanec` at votewiki


== June 12 ==
== 2021-07-20 ==
* 21:29 jzerebecki: reloading zuul with 9ceb1ea..3b862a7 for https://gerrit.wikimedia.org/r/#/c/176377/3
* 23:17 brennen: removed erroneous listing of myself as a train deployer for this week from deployment schedule, added hashar ([[phab:T281156|T281156]])
* 21:22 legoktm: deploying https://gerrit.wikimedia.org/r/217448
* 18:39 hashar: Rolling back Jenkins jobs from Quibble 1.0.0 to 0.0.47  # [[phab:T287001|T287001]]
* 19:02 jzerebecki: done
* 19:00 jzerebecki: doing https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Gearman_deadlock
* 14:56 jzerebecki: reloaded zuul 38c009d..6753a47


== June 11 ==
== 2021-07-19 ==
* 14:44 hashar: deployment-prep and integration labs project got migrated out of ec2id.  Flawless / self maintaining task thanks to Andrew B. !
* 18:42 brennen: gerrit1001: ran puppet; noted that quotes were added to jvm configuration values
* 14:38 hashar: integration-saltmaster: salt-key --accept-all --yes
* 14:30 hashar: rebasing puppetmaster on integration-puppetmaster ca27502..c409503
* 14:28 hashar: rebasing puppetmaster on deployment-salt  ca27502..c409503
* 14:28 hashar: cert madness on integration and deployment-prep ( https://gerrit.wikimedia.org/r/#/c/202924/ )
* 10:44 hashar: operations-dns-lint can't be migrated yet until we figure out a solution to provide some missing GeoIP file https://phabricator.wikimedia.org/T98737
* 10:33 hashar: integration: pooling https://integration.wikimedia.org/ci/computer/integration-lightslave-jessie-1002/ with labels <tt>DebianJessie</tt> and <tt>contintLabsSlave</tt>. Does not have Zuul package installed though.
* 10:27 hashar: integration: do not install zuul on light slaves (i.e.: integration-lightslave-jessie-1002 ). Jessie does not have a zuul package yet  https://gerrit.wikimedia.org/r/#/c/217476/1
* 10:03 hashar: integration: cherry picked https://gerrit.wikimedia.org/r/#/c/217466/1 and https://gerrit.wikimedia.org/r/#/c/217467/1 and applied role::ci::slave::labs::light to integration-lightslave-jessie-1002
* 09:41 hashar: [[Hiera:Integration]] change puppet master from 'integration-puppetmaster' to 'integration-puppetmaster.integration.eqiad.wmflabs' https://phabricator.wikimedia.org/T102108
* 09:20 hashar: creating integration-lightslave-jessie-1002, an m1.small (1CPU) instance that would be a very basic Jenkins slave.  The reason is role::ci::slave::labs includes too many things which are not ready for Jessie yet ( https://phabricator.wikimedia.org/T94836 ).  Will let us migrate operations-dns-lint to it since prod switched to Jessie (https://phabricator.wikimedia.org/T98003)


== June 10 ==
== 2021-07-18 ==
* 20:18 legoktm: deploying https://gerrit.wikimedia.org/r/217277
* 08:48 majavah: set shared_acme_certificates: <nowiki>{</nowiki><nowiki>}</nowiki> on deployment-prep shared hiera, [[phab:T276653|T276653]]
* 10:42 hashar: restarted jobchron/jobrunner on deployment-jobrunner01
* 10:42 hashar: manually nuked and repopulated jobqueue:aggregator:s-wikis:v2 on deployment-redis01. It now only contains entries from all-labs.dblist
* 09:46 hashar: deployment-videoscaler restarted jobchron
* 08:19 mobrovac: reboot deployment-restbase01 due to ssh problems


== June 9 ==
== 2021-07-16 ==
* 22:13 thcipriani: are we back?
* 21:01 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/698774
* 17:31 twentyafterfour: Branching 1.26wmf9
* 17:44 brennen: Try #2: Updating dev-images docker-pkg files on primary contint for [[gerrit:704162{{!}}Add a Swift language dev image for CI testing]] ([[phab:T284195|T284195]])
* 17:10 hashar: restart puppet master on deployment-salt. Was overloaded with wait I/O since roughly 1am UTC
* 17:12 hashar: Tag Quibble 1.0.0 @ {{Gerrit|a13d133f7d1}} # [[phab:T286187|T286187]] [[phab:T280506|T280506]] [[phab:T90875|T90875]] [[phab:T218534|T218534]] [[phab:T227352|T227352]]
* 16:56 hashar: restarted puppetmaster on deployment-salt


== June 8 ==
== 2021-07-14 ==
* 14:12 hashar: clearing disk space on trusty 1011 and 1012
* 22:47 brennen: Updating dev-images docker-pkg files on primary contint for [[gerrit:704162{{!}}Add a Swift language dev image for CI testing]] ([[phab:T284195|T284195]])
* 14:12 hashar: clearing disk space on trusty 1011 and 1012
* 21:14 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/703912 in beta puppet again, should only affect deployment-webperf12.
* 08:56 hashar: rebooted trusty-1013 trusty-1015  ( https://phabricator.wikimedia.org/T101658 ) and repooled them in Jenkins
* 08:48 hashar: rebooting integration-slave-trusty-1012 (stalled, can't log in)
* 04:30 legoktm: deploying https://gerrit.wikimedia.org/r/216520
* 00:40 legoktm: deploying https://gerrit.wikimedia.org/r/216600


== June 7 ==
== 2021-07-13 ==
* 20:43 Krinkle: Rebooting integration-slave-trusty-1015 to see if it comes back so we can inspect logs (T101658)
* 23:22 dpifke: Re-cherry-picking newer https://gerrit.wikimedia.org/r/c/operations/puppet/+/703912 patch in deployment-prep.  Should only affect deployment-webperf12.
* 20:16 Krinkle: Per Yuvi's advice, disabled "Shared project storage" (/data/project NFS mount) for the integration project. Mostly unused. Two existing directories were archived to /home/krinkle/integration-nfs-data-project/  
* 19:29 James_F: Manually deleted Jade extension coverage from doc1001 for [[phab:T281430|T281430]]
* 17:51 Krinkle: integration-slave-trusty-1012, trusty-1013 and 1015 unresponsive to pings or ssh. Other trusty slaves still reachable.
* 19:24 James_F: Zuul: [mediawiki/extensions/Jade] Mark repo as archived [[phab:T281430|T281430]]
* 16:33 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/703912 in deployment-prep puppet.  Should only affect deployment-webperf12.
* 10:22 hashar: gerrit: pushed upstream tags for plugins/gitiles  # [[phab:T278990|T278990]]
* 09:13 James_F: Zuul: [mediawiki/extensions/Report] Add basic quibble CI job
* 08:24 hashar: Updated operations/software/gerrit branches to 3.2.11 # [[phab:T278990|T278990]]
* 07:46 hashar: Wiping all Docker images from contint2001l


== June 6 ==
== 2021-07-12 ==
* 21:05 legoktm: deploying https://gerrit.wikimedia.org/r/216500
* 18:45 majavah: upgrade deployment-cache-text06 to use varnish 6 (with profile::cache::varnish::frontend::packages_component), and run apt upgrade, [[phab:T286506|T286506]]
* 18:43 majavah: deployment-cache-text06 varnish not starting, [[phab:T286506|T286506]], causing an outage on text traffic on deployment-prep
* 18:23 majavah: hard reboot deployment-cache-text06 once I got in using a root ssh key
* 16:15 majavah: hard reboot deployment-cache-text06, refusing to let me log in and console full of errors
* 14:48 Amir1: ran $ ./jjb-update 'wikidata-query-gui-build' ([[phab:T286479|T286479]])
* 14:44 majavah: fix merge conflict on deployment-puppetmaster04
* 13:19 James_F: Zuul: Add Voidwalker to the CI allow list
* 13:19 James_F: Zuul: Add R4356th to the CI allow list
* 13:05 James_F_: Zuul: [pywikibot/i18n] Add gate-and-submit-l10n pipeline [[phab:T286207|T286207]]


== June 5 ==
== 2021-07-09 ==
* 23:55 bd808: added deployment-logstash2 host and told cluster to move all logstash data there
* 14:47 bd808: Silenced puppet failure alert for deployment-parsoid12 for 7 days ([[phab:T286375|T286375]])
* 21:22 bd808: restarted puppetmaster on deployment-salt ("Could not request certificate: Error 500 on SERVER: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">")
* 00:18 bd808: Silenced puppet failure alert for deployment-kafka-jumbo-3 for the next 7 days ([[phab:T286358|T286358]])
* 21:17 hashar: Pooled in mediawiki-extensions-qunit which runs qunit tests with karma with multiple extensions . https://gerrit.wikimedia.org/r/#/c/216132/ . https://phabricator.wikimedia.org/T99877
* 19:45 thcipriani: set use_dnsmasq: false on Hiera:Integration
* 19:40 hashar: refreshed Jenkins jobs mediawiki-extensions-hhvm and mediawiki-extensions-zend with  https://gerrit.wikimedia.org/r/#/c/216100/3 (refactoring)
* 18:56 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/216182
* 18:52 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/216159


== June 4 ==
== 2021-07-08 ==
* 18:06 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/214501
* 08:21 majavah: kick stuck puppet agent on deployment-logstash04
* 16:50 legoktm: deploying https://gerrit.wikimedia.org/r/215935
* 15:07 hashar: integration-jessie-slave1001 : upgrading salt from 2014.1.13 to 2014.7.5
* 14:58 thcipriani: running sudo salt '*' cmd.run 'sed -i "s/GlobalSign_CA.pem/ca-certificates.crt/" /etc/ldap/ldap.conf' on integration-saltmaster
* 14:54 hashar: integration-jessie-slave1001 : running dpkg --configure -a
* 09:26 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/215870


== June 3 ==
== 2021-07-07 ==
* 23:31 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/209991
* 18:10 majavah: create shellbox.svc.deployment-prep.eqiad1.wikimedia.cloud. record as a CNAME to deployment-shellbox instance [[phab:T286298|T286298]]
* 20:49 hashar: restarted zuul entirely to remove some stalled jobs
* 20:47 marxarelli: Reloading Zuul to deploy I96649bc92a387021a32d354c374ad844e1680db2
* 20:28 hashar: Restarting Jenkins to release a deadlock
* 20:22 hashar: deployment-bastion Jenkins slave is stalled again :-(  No code update happening on beta cluster
* 18:50 thcipriani: change use_dnsmasq: false for deployment-prep
* 18:24 thcipriani: updating deployment-salt puppet in prep for use_dnsmasq=false
* 11:58 kart_: Cherry-picked 213840 to test logstash
* 10:08 hashar: Update JJB fork again f966521..4135e14 . Will remove the http notification to zuul {{bug:T93321}}. REFRESHING ALL JOBS!
* 10:03 hashar: Further updated JJB fork  c7231fe..f966521
* 09:10 hashar: Refreshing almost all jenkins jobs to take into account the Jenkins Git plugin upgrade https://phabricator.wikimedia.org/T101105
* 03:07 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/215571


== June 2 ==
== 2021-07-06 ==
* 20:58 bd808: redis-cli srem "deploy:scap/scap:minions" i-000002f4.eqiad.wmflabs
* 01:16 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/703219
* 20:54 bd808: deleted unused deployment-rsync01 instance
* 20:49 bd808: Updated scap to 62d5cb2 (Lint JSON files)
* 20:40 marxarelli: cherry-picked https://gerrit.wikimedia.org/r/#/c/208024/ on integration-puppetmaster
* 20:38 marxarelli: manually rebased operations/puppet on integration-puppetmaster to fix empty commit from cherry-pick
* 17:01 hashar: updated JJB fork to e3199d9..c7231fe
* 15:16 hashar: updated integration/jenkins-job-builder to  e3199d9
* 13:16 hashar: restarted deployment-salt


== June 1 ==
== 2021-07-05 ==
* 08:18 hashar: Jenkins: upgrading git plugin from 1.5.0 to latest
* 15:02 Amir1: deployed 703212 ([[phab:T286058|T286058]])


== May 31 ==
== 2021-07-02 ==
* 21:31 legoktm: deploying https://gerrit.wikimedia.org/r/214982
* 17:32 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/701068 ([[phab:T274463|T274463]])
* 20:50 legoktm: deploying https://gerrit.wikimedia.org/r/214939
* 09:49 James_F: Zuul: [pywikibot/core] Add deeptest to gate-and-submit section
* 00:59 legoktm: deployed https://gerrit.wikimedia.org/r/214889


== May 29 ==
== 2021-06-30 ==
* 22:45 legoktm: deploying https://gerrit.wikimedia.org/r/214775
* 22:42 brennen: gitlab: published https://gitlab.wikimedia.org/releng/gitlab-settings
* 19:48 legoktm: deleting corrupt mwext-qunit@2 workspace on integration-slave-trusty-1017
* 22:39 brennen: gitlab: creating people, people/wmf, and people/wmf/release-engineering groups; mandating 2fa for people/wmf
* 17:21 legoktm: deploying https://gerrit.wikimedia.org/r/214652 and https://gerrit.wikimedia.org/r/214653
* 17:57 thcipriani: restart ci jenkins following upgrade
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 12:36 James_F: Zuul: [mediawiki/core] Drop PHP70/71 testing for REL1_31


== May 28 ==
== 2021-06-29 ==
* 20:50 bd808: Ran "del jobqueue:aggregator:h-ready-queues:v2" on deployment-redis01
* 21:45 urbanecm: urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 13:46 hashar: upgrading Jenkins git plugin from 1.4.6+wmf1 to 1.7.1 {{bug|T100655}}  and restarting Jenkins


== May 27 ==
== 2021-06-28 ==
* 15:09 hashar: Jenkins slaves are all back up. Root cause was some ssh algorithm in their sshd which is not supported by Jenkins jsch embedded lib.
* 12:06 majavah: revert manual scap downgrade on deployment-mediawiki11
* 14:30 hashar: manually rebasing puppet git on deployment-salt (stalled)
* 11:59 majavah: downgrade scap to 3.17.1-1 (matching production) on deployment-mediawiki11, testing for [[phab:T285125|T285125]]
* 14:27 hashar: restarting deployment-salt / some process is 100% wa/IO
* 13:38 hashar: restarted integration puppetmaster (memory leak)
* 13:35 hashar: integration-puppetmaster apparently out of memory
* 13:30 hashar: All Jenkins slaves are disconnected due to some ssh error. CI is down.


== May 24 ==
== 2021-06-24 ==
* 10:27 duh: deploying https://gerrit.wikimedia.org/r/213218
* 01:11 Krinkle: deployment-memc08 and -memc09: apt-get install memkeys (already installed on deployment-mediawiki11)


== May 23 ==
== 2021-06-23 ==
* 21:43 legoktm: deploying https://gerrit.wikimedia.org/r/212960
* 16:32 Reedy: beta update jobs have been stuck for ~9.5 hours. Going to attempt to unstick


== May 20 ==
== 2021-06-22 ==
* 17:19 thcipriani|afk: add --fail to curl inside mwext-Wikibase-qunit jenkins job
* 23:03 jeena: increasing releases-jenkins executors from 2 to 4
* 15:59 bd808: Applied role::beta::puppetmaster on deployment-salt to get Puppet logstash reports back
* 21:25 jeena: Updating dev-images docker-pkg files on primary contint [[phab:T273682|T273682]]
* 09:31 hashar: Removed DISPLAY=:94 being set from Zuul for most jobs https://gerrit.wikimedia.org/r/c/integration/config/+/700857/


== May 19 ==
== 2021-06-21 ==
* 02:54 bd808: Primed keyholder agent via `sudo -u keyholder env SSH_AUTH_SOCK=/run/keyholder/agent.sock ssh-add /etc/keyholder.d/mwdeploy_rsa`
* 23:27 Krinkle: Jobs for `deployment-deploy01` are waiting for executor but host is completely idle for over 10min. Disconnecting and relaunching.
* 02:40 Krinkle: deployment-bastion.eqiad magically back online and catching up jobs, though failing due to T99644
* 02:36 Krinkle: Jenkins is unable to launch slave agent on deployment-bastion.eqiad. Using "Jenkins Script Console" throws HTTP 503.
* 02:30 Krinkle: Various beta-mediawiki-config-update-eqiad jobs have been stuck for over 13 hours.


== May 12 ==
== 2021-06-19 ==
* 15:18 hashar: downgrading hhvm on CI slaves
* 13:44 majavah: remove deployment-deploy02 [[phab:T278689|T278689]]
* 15:10 hashar: mediawiki-phpunit-hhvm Jenkins job is broken due to an hhvm upgrade {{bug|T98876}}
* 08:05 majavah: creating deployment-logstash05 and configure it like 04, looks like elasticsearch does not like clusters with only one host [[phab:T283013|T283013]]
* 00:48 bd808: beta cluster central syslog going to logstash rather than deployment-bastion (see https://gerrit.wikimedia.org/r/#/c/210253)
* 00:36 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/210253/
* 00:16 legoktm: deploying https://gerrit.wikimedia.org/r/210251


== May 11 ==
== 2021-06-16 ==
* 22:50 legoktm: deploying https://gerrit.wikimedia.org/r/210219
* 21:47 James_F: Zuul: Install CI for mediawiki/libs/NormalizedException [[phab:T284732|T284732]]
* 22:29 bd808: removed duplicate local group l10nupdate from deployment-bastion that was shadowing the ldap group of the same name
* 05:44 majavah: restart trafficserver-tls.service on deployment-cache-upload06, was using an expired cert
* 22:24 bd808: removed duplicate local group mwdeploy from deployment-bastion that was shadowing the ldap group of the same name
* 22:15 bd808: Removed role::logging::mediawiki from deployment-bastion
* 20:55 legoktm: deleted operations-puppet-tox-py27 workspace on integration-slave-precise-1012, it was corrupt (fatal: loose object b48ccc3ef5be2d7252eb0f0f417f1b5b7c23fd5f (stored in .git/objects/b4/8ccc3ef5be2d7252eb0f0f417f1b5b7c23fd5f) is corrupt)
* 13:54 hashar: Jenkins: removing label hasContintPackages from production slaves, it is no longer needed :)


== May 9 ==
== 2021-06-14 ==
* 00:10 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/209830 to deployment-bastion:/srv/deployment/scap/scap and deployed with trebuchet
* 22:06 brennen: gitlab-test: repointing floating IP to ansible test box, running ansible to test issue & wiki default config
* 21:41 brennen: gitlab-test: repointing floating IP to main test instance; gitlab-ctl reconfigure to test some feature flags
* 16:57 James_F: Zuul: [mediawiki/tools/api-testing] Publish docs on postmerge [[phab:T236915|T236915]]
* 16:31 James_F: Zuul: [mediawiki/tools/api-testing] Add npm run doc and publishing [[phab:T236915|T236915]]
* 16:28 James_F: Zuul: Add Yashvarshney02 to Trusted users
* 16:25 James_F: Zuul: Add Jay (CIS-A2K) to Trusted users
* 16:22 James_F: Zuul: Add initial CI for cloud/toolforge/jobs-framework-cli


== May 8 ==
== 2021-06-13 ==
* 23:59 bd808: Created /data/project/logs/WHERE_DID_THE_LOGS_GO.txt to point folks to the right places
* 18:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/699567
* 23:54 bd808: Switched MediaWiki debug logs to deployment-fluorine:/srv/mw-log
* 18:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/699541
* 20:05 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/209801
* 18:15 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/209769/
* 05:14 bd808: apache2 access logs now only locally on instances in /var/log/apache2/other_vhosts_access.log; error log in /var/log/apache2.log and still relayed to deployment-bastion and logstash (works like production now)
* 04:49 bd808: Symbolic link not allowed or link target not accessible: /srv/mediawiki/docroot/bits/static/master/extensions
* 04:47 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/209680/


== May 7 ==
== 2021-06-11 ==
* 20:48 bd808: Updated kibana to bb9fcf6 (Merge remote-tracking branch 'upstream/kibana3')
* 20:49 brennen: gitlab1001: resetting application data, re-running ansible playbook
* 18:00 greg-g: brought deployment-bastion.eqiad back online in Jenkins (after Krinkle disconnected it some hours ago). Jobs are processing
* 15:50 James_F: Zuul: [node-rdkafka-statsd] Switch to service-pipeline-test [[phab:T284345|T284345]]
* 16:05 bd808: Updated scap to 5d681af (Better handling for php lint checks)
* 15:25 James_F: Zuul: [node-rdkafka-factory] Switch to service-pipeline-test [[phab:T284345|T284345]]
* 14:05 Krinkle: deployment-bastion.eqiad has been stuck for 10 hours.
* 14:47 majavah: generate and add my (taavi) own root key to deployment-prep
* 14:05 Krinkle: As of two days now, Jenkins always returns Wikimedia 503 Error page after logging in. Log in session itself is fine.
* 14:14 hashar: deployment-imagescaler03: delete local mwdeploy user with uid 497  # [[phab:T73480|T73480]]
* 05:02 legoktm: slaves are going up/down likely due to automated labs migration script
* 12:31 hashar: deployment-prep: removed deployment-shellbox puppet certificate and regenerated it. Ran puppet and it passes all fine.
* 10:30 hashar: deployment-prep: cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/699207 to add a motd on all instances # [[phab:T100837|T100837]]
* 02:03 Reedy: beta-update-databases-eqiad seemingly broken by CategoryTree fix for [[phab:T271011|T271011]].  Comment left on gerrit patch and task,  not reverting patch in master at this stage


== May 6 ==
== 2021-06-10 ==
* 15:13 bd808: Updated scap to 57036d2 (Update statsd events)
* 21:12 James_F: Zuul: [mediawiki/extensions/ProofreadPage] Add Scribunto as phan dep too [[phab:T281195|T281195]]
* 21:05 James_F: Zuul: [mediawiki/extensions/ProofreadPage] Add Scribunto as a dependency [[phab:T281195|T281195]]


== May 5 ==
== 2021-06-08 ==
* 19:06 jzerebecki: integration-slave-trusty-1015:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-qunit/src/node_modules
* 20:32 brennen: gitlab1001: k6_gitlab: running test data creation
* 15:42 legoktm: deploying https://gerrit.wikimedia.org/r/208975 & https://gerrit.wikimedia.org/r/208976
* 19:59 brennen: gitlab1001: gitlab-ansible run to reset configuration
* 04:36 legoktm: deploying https://gerrit.wikimedia.org/r/208899
* 19:40 brennen: gitlab1001: resetting all application data for a second attempt at test data creation
* 04:04 legoktm: deploying https://gerrit.wikimedia.org/r/208889,90,91,92
* 17:40 brennen: gitlab1001: running k6 data generator
* 17:25 James_F: Zuul: [mediawiki/extensions/Wikibase] Switch legacy Ruby jobs to 2.5 [[phab:T280491|T280491]]


== May 4 ==
== 2021-06-07 ==
* 23:50 hashar: restarted Jenkins (deadlock with deployment-bastion)
* 22:02 urbanecm: urbanecm@deployment-sessionstore04:~$ sudo service cassandra start # [[phab:T263617|T263617]]
* 23:49 hashar: restarted Jenkins
* 22:02 urbanecm: urbanecm@deployment-sessionstore04:~$ sudo touch /etc/cassandra/service-enabled #[[phab:T263617|T263617]]
* 22:50 hashar: Manually retriggering last change of operations/mediawiki-config.git with: <tt>zuul enqueue --trigger gerrit --pipeline postmerge --project operations/mediawiki-config --change 208822,1</tt>
* 21:40 James_F: Docker: Pushing node12-test and node12-test-browser 0.0.2 for [[phab:T284492|T284492]]
* 22:49 hashar: restarted Zuul to clear out a bunch of operations/mediawiki-config.git jobs
* 11:51 hashar: zuul enqueue --trigger gerrit --pipeline postmerge --project operations/software/tegola --change 698470,1 # request by mbsantos for https://gerrit.wikimedia.org/r/c/operations/software/tegola/+/698470
* 22:20 hashar: restarting Jenkins from gallium :/
* 22:18 thcipriani: jenkins restarted
* 22:12 thcipriani: preparing jenkins for shutdown
* 21:59 hashar: disconnected and reconnected Jenkins Gearman client
* 21:41 thcipriani: deployment-bastion still not accepting jobs from jenkins
* 21:35 thcipriani: disconnecting deployment-bastion and reconnecting, again
* 20:54 thcipriani: marking node deployment-bastion offline due to stuck jenkins execution lock
* 19:03 legoktm: deploying https://gerrit.wikimedia.org/r/208339
* 17:46 bd808: integration-slave-precise-1014 died trying to clone mediawiki/core.git with "fatal: destination path 'src' already exists and is not an empty directory."


== May 2 ==
== 2021-06-05 ==
* 06:53 legoktm: deploying https://gerrit.wikimedia.org/r/208366
* 20:34 James_F: Zuul: [mediawiki/extensions/TitleIcon] Switch to non-composer, with-selenium
* 06:45 legoktm: deploying https://gerrit.wikimedia.org/r/208364
* 05:49 legoktm: deploying https://gerrit.wikimedia.org/r/208358
* 05:25 legoktm: deploying https://gerrit.wikimedia.org/r/207132
* 04:18 legoktm: deploying https://gerrit.wikimedia.org/r/208342 and https://gerrit.wikimedia.org/r/208340
* 03:56 legoktm: reset mediawiki-extensions-hhvm workspace on integration-slave-trusty-1015 (bad .git lock)


== April 30 ==
== 2021-06-04 ==
* 19:26 Krinkle: Repooled integration-slave-trusty-1013. IP unchanged.
* 23:57 Krinkle: integration-agent-qemu-1001 back up, Thanks andrewbogott
* 19:00 Krinkle: Depooled integration-slave-trusty-1013 for labs maintenance (per andrewbogott)
* 23:47 Krinkle: Qemu jobs are stuck. Jenkins is unable to connect to integration-agent-qemu-1001
* 14:17 hashar: Jenkins: properly downgraded IRC plugin from 2.26 to 2.25
* 20:13 James_F: Zuul: Switch almost all node10 jobs to node12 [[phab:T284345|T284345]]
* 13:40 hashar: Jenkins: downgrading IRC plugin from 2.26 to 2.25
* 20:11 James_F: Zuul: [VisualEditor/VisualEditor] Switch node10 jobs to node12 [[phab:T284345|T284345]]
* 12:09 hashar: restarting Jenkins https://phabricator.wikimedia.org/T96183
* 19:20 James_F: Docker: Publishing node12 CI images [[phab:T284343|T284343]]


== April 29 ==
== 2021-06-03 ==
* 17:15 thcipriani: removed l10nupdate user from /etc/passwd on deployment-bastion
* 19:06 hashar: contint1001 and contint2001: deleted all workspaces under /srv/jenkins-slave/workspace/* # [[phab:T284125|T284125]]
* 15:00 hashar: Instances are being moved out from labvirt1005 which has some faulty memory. List of instances at https://phabricator.wikimedia.org/T97521#1245217
* 00:39 James_F: Zuul: Add EventLogging to dependencies of PropertySuggester
* 14:25 hashar: upgrading zuul on integration-slave-precise-1011 for  https://phabricator.wikimedia.org/T97106
* 00:24 James_F: Zuul: Add Anysite to CI allowlist
* 14:11 hashar: rebooting integration-saltmaster stalled.
* 00:19 James_F: Zuul: [mediawiki/services/image-suggestion-api] Use bespoke pipeline [[phab:T281132|T281132]]
* 13:11 hashar: Rebooting deployment-parsoid05 via wikitech interface.
* 13:02 hashar: labvirt1005 seems to have hardware issue. Impacts a bunch of beta cluster / integration instances as listed on https://phabricator.wikimedia.org/T97521#1245217
* 12:22 hashar: deployment-parsoid05 slow down is https://phabricator.wikimedia.org/T97421  . Running apt-get upgrade and rebooting it but its slowness issue might be with the underlying hardware
* 12:13 hashar: killing puppet on deployment-parsoid05  eats all CPU for some reason
* 02:40 legoktm: deploying https://gerrit.wikimedia.org/r/207363 and https://gerrit.wikimedia.org/r/207368


== April 28 ==
== 2021-06-02 ==
* 23:37 hoo: Ran foreachwiki extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --load-from 'http://meta.wikimedia.beta.wmflabs.org/w/api.php' --force-protocol http (because some sites are http only, although the sitematrix claims otherwise)
* 19:21 urbanecm: deployment-prep: Enlarge mwlog's /srv partition to 5 GB (was 2 GB)
* 23:33 hoo: Ran foreachwiki extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --load-from 'http://meta.wikimedia.beta.wmflabs.org/w/api.php' to fix all sites tables
* 13:49 hashar: zuul enqueue --trigger gerrit --pipeline postmerge --project search/MjoLniR --change 685569,4  #  Rerun sonar analysis against https://gerrit.wikimedia.org/r/c/search/MjoLniR/+/685569
* 23:18 hoo: Ran mysql> INSERT INTO sites (SELECT * FROM wikidatawiki.sites); on enwikinews to populate the sites table
* 23:18 hoo: Ran mysql> INSERT INTO sites (SELECT * FROM wikidatawiki.sites); on testwiki to populate the sites table
* 17:48 James_F: Restarting grrrit-wm for config change.
* 16:24 bd808: Updated scap to ef15380 (Make scap localization cache build $TMPDIR aware)
* 15:42 bd808: Freed 5G on deployment-bastion by deleting abandoned /tmp/scap_l10n_* directories
* 14:01 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/206967/
* 00:17 greg-g: after the 3rd or so time doing it (while on the Golden Gate Bridge, btw) it worked
* 00:11 greg-g: still nothing...
* 00:10 greg-g: after disconnecting, marking temp offline, bringing back online, and launching slave agent: "Slave successfully connected and online"
* 00:07 greg-g: deployment-bastion is idle, yet we have 3 pending jobs waiting for an executor on it - will disconnect/reconnect it in Jenkins


== April 27 ==
== 2021-05-28 ==
* 21:45 bd808: Manually triggered beta-mediawiki-config-update-eqiad for zuul build df1e789c726ad4aae60d7676e8a4fc8a2f6841fb
* 09:41 addshore: reload zuul for https://gerrit.wikimedia.org/r/697033
* 21:20 bd808: beta-scap-eqiad job green again after adding a /srv/ disk to deployment-jobrunner01
* 08:02 hashar: Successfully published image docker-registry.discovery.wmnet/releng/ci-bullseye:0.1.0 # [[phab:T283777|T283777]]
* 21:08 bd808: Applied role::labs::lvm::srv on deployment-jobrunner01 and forced puppet run
* 21:08 bd808: Deleted deployment-jobrunner01:/srv/* in preparation for applying role::labs::lvm::srv
* 21:06 bd808: deployment-jobrunner01 missing role::labs::lvm::srv
* 21:00 bd808: Root partition full on deployment-jobrunner01
* 20:53 bd808: removed mwdeploy user from deployment-bastion:/etc/passwd
* 20:15 Krinkle: Relaunched Gearman connection
* 19:53 Krinkle: Jenkins unable to re-create Gearman connection. (HTTP 503 error from /configure). Have to force restart Jenkins
* 17:32 Krinkle: Relaunch slave agent on deployment-bastion
* 17:31 Krinkle: Jenkins slave deployment-bastion deadlock waiting for executors


== April 26 ==
== 2021-05-27 ==
* 06:09 thcipriani|afk: rm scap l10nfiles from /tmp on deployment-bastion root partition 100% again...
* 23:14 brennen: gitlab1001: gitlab-ctl stop nginx - pausing httpd for the weekend
* 20:36 brennen: gitlab1001: temporarily disabling backup cron jobs
* 17:46 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/696565 https://gerrit.wikimedia.org/r/696566
* 16:52 brennen: gitlab1001: ran gitlab-ctl start; logins now working; will add banner to effect that this is all provisional state
* 16:05 brennen: gitlab1001: re-running ansible and puppet per [[phab:T279545|T279545]]
* 00:14 James_F: Zuul: [wikimedia/irc/ircservserv-config] Fix bad copy-paste
* 00:06 James_F: Zuul: [wikimedia/irc/ircservserv-config] Add bespoke pipeline jobs


== April 25 ==
== 2021-05-26 ==
* 16:00 thcipriani|afk: manually ran logrotate on deployment-jobrunner01, root partition at 100%
* 17:37 brennen: gitlab1001: reset admin password and ran `gitlab-ctl stop` ([[phab:T279545|T279545]])
* 15:16 thcipriani|afk: clear /tmp/scap files on deployment-bastion, root partition at 100%
* 16:24 brennen: running gitlab-ansible's install-gitlab-server.sh against gitlab1001.wikimedia.org


== April 24 ==
== 2021-05-24 ==
* 18:01 thcipriani: ran sudo chown -R mwdeploy:mwdeploy /srv/mediawiki on deployment-bastion to fix beta-scap-eqiad, hopefully
* 06:19 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/693639
* 17:26 thcipriani: remove deployment-prep from domain in /etc/puppet/puppet.conf on deployment-stream, puppet now OK
* 17:20 thcipriani: rm stale lock on deployment-rsync01, puppet fine
* 17:10 thcipriani: gzip /var/log/account/pacct.0 on deployment-bastion: ought to revisit logrotate on that instance.
* 17:00 thcipriani: rm stale /var/lib/puppet/state/agent_catalog_run.lock on deployment-kafka02
* 09:56 hashar: restarted mysql on both deployment-db1 and deployment-db2. The service is apparently not started on instance boot.  https://phabricator.wikimedia.org/T96905
* 09:08 hashar: beta: manually rebased operations/puppet.git
* 08:43 hashar: Enabling puppet on deployment-eventlogging02.eqiad.wmflabs {{bug|T96921}}


== April 23 ==
== 2021-05-21 ==
* 06:11 Krinkle: Running git-cache-update inside screen on integration-slave-trusty-1021 at /mnt/git
* 18:30 James_F: Zuul: Add phan to all extensions and skins [[phab:T283097|T283097]]
* 06:11 Krinkle: integration-slave-trusty-1021 stays depooled (see T96629 and T96706)
* 17:59 James_F: Docker: Publishing mediawiki-phan* images where the job passes if .phan/config.php is absent [[phab:T283097|T283097]]
* 04:35 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/206044 and https://gerrit.wikimedia.org/r/206072
* 08:30 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/693361 https://gerrit.wikimedia.org/r/691323
* 00:29 bd808: cherry-picked and applied https://gerrit.wikimedia.org/r/#/c/205969/ (logstash: Convert $::realm switches to hiera)
* 00:17 bd808: beta cluster fatal monitor full of "Bad file descriptor: AH00646: Error writing to /data/project/logs/apache-access.log"
* 00:03 bd808: cleaned up redis leftovers on deployment-logstash1


== April 22 ==
== 2021-05-20 ==
* 23:57 bd808: cherry-picked and applied https://gerrit.wikimedia.org/r/#/c/205968 (remove redis from logstash)
* 21:36 Krinkle: Fix broken Jenkins config for console sections of selenium jobs to accommodate updates to wdio
* 23:33 bd808: reset deployment-salt:/var/lib/git/operations/puppet HEAD to production; forced update with upstream; re-cherry-picked I46e422825af2cf6f972b64e6d50040220ab08995
* 01:43 James_F: Zuul: Add Southparkfan to CI allowlist
* 23:28 bd808: deployment-salt:/var/lib/git/operations/puppet in detached HEAD state; looks to be for cherry pick of I46e422825af2cf6f972b64e6d50040220ab08995 ?
* 01:33 James_F: Zuul: [labs/codesearch] Add "test" deployment pipeline job too
* 21:40 thcipriani: restarted mariadb on deployment-db{1,2}
* 00:18 James_F: Zuul: [labs/codesearch] Install deployment pipeline
* 20:20 thcipriani: gzipped /var/log/pacct.0 on deployment-bastion
* 00:14 James_F: Publishing quibble-buster-php73-coverage:0.0.47-s2 with no memory limit for coverage jobs [[phab:T280669|T280669]]
* 19:50 hashar: zuul/jenkins are back up (blame Jenkins)
* 19:40 hashar: reenabling Jenkins gearman client
* 19:30 hashar: Gearman went back. Reenabling Jenkins as a Gearman client
* 19:27 hashar: Zuul gearman is stalled.  Disabling Jenkins gearman client to free up connections
* 17:58 Krinkle: Creating integration-slave-trusty-1021 per T96629 (using ci1.medium type)
* 14:34 hashar: beta: failures on instances are due to them being moved on different openstack compute nodes (virt***)
* 13:51 jzerebecki: integration-slave-trusty-1015:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-qunit/src/node_modules
* 12:48 hashar: beta: Andrew B. starting to migrate beta cluster instances on new virt servers
* 11:34 hashar: integration: apt-get upgrade on integration-slave-trusty* instances
* 11:31 hashar: integration: Zuul package has been uploaded for Trusty!  Deleting the .deb from /home/hashar/


== April 21 ==
== 2021-05-19 ==
* 09:27 hashar: Nodepool created its first instance ever! :)
* 23:44 James_F: Publishing quibble-buster-php73-coverage:0.0.47-s1 with a 4GiB memory limit for coverage jobs [[phab:T280669|T280669]]
* 01:51 legoktm: deploying https://gerrit.wikimedia.org/r/205494
* 16:42 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps [[phab:T279275|T279275]]
* 16:39 James_F: Zuul: Add H.krishna123 to the list of trusted users [[phab:T279552|T279552]]


== April 20 ==
== 2021-05-17 ==
* 23:34 legoktm: deploying https://gerrit.wikimedia.org/r/205465
* 18:01 James_F: Zuul: [mediawiki/extensions/RelatedLinks] Archive [[phab:T279221|T279221]]
* 19:20 legoktm: mediawiki-extensions-hhvm workspace on integration-slave-trusty-1011 had bad lock file, wiping
* 16:10 hashar: deployment-salt kill -9 of puppetmaster processes
* 16:08 hashar: deployment-salt killed git-sync-upstream    netcat to labmon1001.eqiad.wmnet 8125  was eating all memory
* 16:04 hashar: beta: manually rebasing  operations/puppet on deployment-salt . Might have killed some live hack in the process :/
* 13:58 hashar: In Gerrit, hidden integration/jenkins-job-builder-config and integration/zuul-config historical repositories. Suggested by addshore on {{bug|T96522}}
* 03:39 legoktm: deploying https://gerrit.wikimedia.org/r/205174


== April 19 ==
== 2021-05-16 ==
* 06:12 legoktm: deploying https://gerrit.wikimedia.org/r/205076
* 19:58 Krinkle: deployment-mediawiki11$ apt-get install memkeys
* 09:29 Majavah: fix labs/private merge conflicts on deployment-puppetmaster04


== April 18 ==
== 2021-05-15 ==
* 05:18 legoktm: deploying https://gerrit.wikimedia.org/r/204995
* 21:51 James_F: Zuul: [mediawiki/tools/cli] Make mw-cli-test experimental for now [[phab:T248779|T248779]]
* 03:09 Krinkle: Finished set up of integration-slave-trusty-1017. Pooled.
* 21:40 James_F: Zuul: [mediawiki/tools/cli] Add new bespoke job [[phab:T248779|T248779]]
* 21:09 James_F: Zuul: [mediawiki/extensions/MediaWikiAuth] Mark repo as archived [[phab:T282955|T282955]]
* 18:38 James_F: Zuul: Temporarily remove TwoColConflict from gated extensions [[phab:T234002|T234002]] [[phab:T282935|T282935]].
* 09:30 Majavah: create deployment-logstash04 to install elk7
* 09:22 Majavah: beta: cherry-pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/688315, remove cherry-pick for https://gerrit.wikimedia.org/r/c/operations/puppet/+/683837 [[phab:T277990|T277990]]
* 06:23 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/691494/ [[phab:T281986|T281986]]


== April 17 ==
== 2021-05-14 ==
* 17:52 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/204812
* 21:31 Krinkle: Delete now-unreadable unread echo notifications from deploymentwiki and clear cache badge count cache (echo_unread_wikis: 9892 rows affected, Echo/maintenance/recomputeNotifCounts.php),  [[phab:T198673|T198673]]
* 17:45 Krinkle: Creating integration-slave-trusty-1017
* 21:10 Krinkle: Delete beta cluster commonswiki.globalusage data for deploymentwiki, [[phab:T198673|T198673]],  https://wikitech.wikimedia.org/wiki/Delete_a_wiki (86 rows affected)
* 16:29 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/204791
* 21:09 Krinkle: Delete beta cluster centralauth rows relating to deploymentwiki, [[phab:T198673|T198673]], https://wikitech.wikimedia.org/wiki/Delete_a_wiki (12600 rows affected)
* 16:00 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/204783
* 20:51 Krinkle: I broke beta  `InvalidArgumentException: mcrouter-with-onhost-tier not present in $wgObjectCaches` - working on it
* 12:42 hashar: restarting Jenkins
* 13:08 addshore: Github, Allowed Wikimedia Helper Bot for GitHub to read `github/workflows/dependabot-gerrit.yml`
* 12:38 hashar: Switching zuul on lanthanum.eqiad.wmnet to the Debian package version
* 10:26 addshore: reload zuul for WMDE: Add Marta to trusted emails [integration/config] - https://gerrit.wikimedia.org/r/691117
* 12:14 hashar: Switching Zuul scheduler on gallium.wikimedia.org to the Debian package version
* 02:16 James_F: Zuul: [mediawiki/skins/MinervaNeue] Drop Ruby-based selenium job [[phab:T174018|T174018]] [[phab:T177260|T177260]] [[phab:T280901|T280901]]
* 12:12 hashar: Jenkins: enabled plugin "ZMQ Event Publisher"  and publishing all jobs result on TCP port 8888
* 02:09 James_F: Zuul: [operations/container/miscweb] Install bespoke pipeline CI [[phab:T281538|T281538]]
* 05:37 legoktm: deploying https://gerrit.wikimedia.org/r/204706
* 01:11 Krinkle: Repool integration-slave-precise-1013 and integration-slave-trusty-1015 (live hack with libeatmydata enabled for mysql; T96308)


== April 16 ==
== 2021-05-13 ==
* 22:08 Krinkle: Rebooting integration-slave-precise-1013 (depooled; experimenting with libeatmydata)
* 23:14 Krinkle: root@integration-agent-qemu-1001:/home/addshore# rm qmeutest/out.img (Reclaiming space to be able to make other changes)
* 22:07 Krinkle: Rebooted integration-slave-trusty-1015 (experimenting with libeatmydata)
* 22:48 longma: restarting zuul to try getting tests to run for https://gerrit.wikimedia.org/r/c/mediawiki/skins/Modern/+/684509
* 18:31 Krinkle: Rebooting integration-slave-precise-1012 and integration-slave-trusty-1012
* 17:57 Krinkle: Repooled instances. Conversion of mysql.datadir to tmpfs complete.
* 17:22 Krinkle: Gracefully depool integration slaves to deploy https://gerrit.wikimedia.org/r/#/c/204528/ (T96230)
* 14:35 thcipriani: running dpkg --configure -a on deployment-bastion to correct puppet failures


== April 15 ==
== 2021-05-12 ==
* 23:21 Krinkle: beta-update-databases-eqiad stuck waiting for executors on a node that has plenty executors available
* 19:40 dpifke: Un-cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/439774 in deployment-prep puppet.  Hasn't been touched in over a year and is preventing merges.
* 21:15 hashar: Jenkins browser test jobs sometimes deadlock because of the IRC notification plugin  https://phabricator.wikimedia.org/T96183
* 19:32 James_F: Zuul: Add PageImages, PageViewInfo, and Graph to MediaWiki gated extension set [[phab:T249674|T249674]]
* 20:34 hashar: hard restarting Jenkins
* 19:23 dpifke: Cherry picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/683695 in deployment-prep puppet. Should only affect beta logstash.
* 19:24 Krinkle: Aborting browser tests jobs. Stuck for over 5 hours.  
* 16:15 James_F: Docker: Publishing civicrm 0.2.1 for [[phab:T277500|T277500]]
* 19:24 Krinkle: Aborting beta-scap-eqiad. Has been stuck for 2 hours on "Notifying IRC" after "Connection time out" from scap.
* 16:00 James_F: Zuul: Add GrowthExperiments to the MediaWiki gated extension set [[phab:T247507|T247507]] [[phab:T249674|T249674]]
* 08:22 hashar: restarted Jenkins
* 15:36 _joe_: updating helm-linter docker image
* 08:20 hashar: Exception in thread "RequestHandlerThread[#2]" java.lang.OutOfMemoryError: Java heap space
* 08:16 hashar: Jenkins process went wild taking all CPU busy on gallium


== April 14 ==
== 2021-05-11 ==
* 20:43 legoktm: starting SULF on beta cluster
* 23:32 James_F: Zuul: Add Disambiguator to the MediaWiki gated extension set [[phab:T237538|T237538]] [[phab:T249674|T249674]]
* 20:42 marktraceur: stopping all beta jobs, aborting running (and stuck) beta DB update, kicking bastion, to try and get beta to update
* 23:25 James_F: Zuul: [mediawiki/extensions/NCBITaxonomyLookup] Enable basic quibble CI
* 19:49 Krinkle: All systems go.
* 19:48 Krinkle: Jenkins configuration panel won't load ("Loading..." stays indefinitely, "Uncaught TypeError: Cannot convert to object at prototype.js:195")
* 19:46 Krinkle: Jenkins restarted. Relaunching Gearman
* 19:42 Krinkle: Jenkins still unable to obtain Gearman connection. (HTTP 503 error from /configure). Have to force restart Jenkins.
* 19:42 Krinkle: deployment-bastion jobs were stuck. marktraceur cancelled queue and relaunched slave. Now processing again.
* 15:27 Krinkle: puppetmaster: Re-apply I05c49e5248cb operations/puppet patch to re-fix T91524. Somehow the patch got lost.
* 08:46 hashar: does qa-morebots work?


== April 13 ==
== 2021-05-10 ==
* 20:14 Krinkle: Restarting Zuul, Jenkins and aborting all builds. Everything got stuck following NFS outage in lab
* 14:38 James_F: Zuul: [mediawiki/extensions/VoteNY] Add SocialProfile as a phan dependency
* 19:28 Krinkle: Restarting Zuul, Jenkins and aborting all builds. Everything crashed following NFS outage in labs
* 14:04 CFisch_WMDE: Improve comment around ReferencePreviews beta cluster default ([[phab:T271206|T271206]])
* 17:01 legoktm: deploying https://gerrit.wikimedia.org/r/203858
* 14:04 CFisch_WMDE: Forward renamed config name for improved template search features ([[phab:T277028|T277028]])
* 13:56 Krinkle: Delete old integration-slave1001...1004 (T94916)
* 10:43 hashar: reducing number of executors on Precise instances from 5 to 4 and on Trusty instances from 6 to 4.  The Jenkins scheduler tends to assign the unified jobs to the same slave, which overloads a single slave while others are idling.
* 10:43 hashar: reducing number of executors from 5 to 4
* 08:46 hashar: jenkins removed #wikimedia-qa IRC channel from the global configuration
* 08:42 hashar: kill -9 jenkins because it was stuck in some deadlock related to the IRC plugin :(
* 08:34 zeljkof: restarting stuck Jenkins


== April 12 ==
== 2021-05-07 ==
* 23:58 bd808: sudo ln -s /srv/l10nupdate/mediawiki /var/lib/l10nupdate/mediawiki on deployment-bastion
* 16:37 James_F: Zuul: [operations/software/mailman-templates] Add CI of debian-glue [[phab:T282018|T282018]]
* 23:11 greg-g: 0bytes left on /var on deployment-bastion


== April 11 ==
== 2021-05-06 ==
* 23:13 legoktm: deploying https://gerrit.wikimedia.org/r/203628
* 02:52 James_F: jjb: Enable Sonar analysis for mjolnir builds [[phab:T264877|T264877]]
* 22:58 legoktm: deploying https://gerrit.wikimedia.org/r/203619 & https://gerrit.wikimedia.org/r/203626
* 02:14 James_F: Zuul: [mediawiki/extensions/UploadWizard] Drop tox job, not useful
* 06:13 legoktm: deployed https://gerrit.wikimedia.org/r/203520
* 00:16 James_F: Zuul: [mediawiki/services/parsoid] Drop parsoidsvc-parsertests-docker job [[phab:T271562|T271562]]
* 05:49 legoktm: deploying https://gerrit.wikimedia.org/r/203519 https://gerrit.wikimedia.org/r/203516 https://gerrit.wikimedia.org/r/203518


== April 10 ==
== 2021-05-05 ==
* 13:50 Krinkle: Pool integration-slave-precise-1012..integration-slave-precise-1014
* 23:52 James_F: Zuul: Ensure Parsoid's tests include the Disambiguator extension (take 2) [[phab:T271863|T271863]] [[phab:T237538|T237538]]
* 11:43 hashar: Filed https://phabricator.wikimedia.org/T95675 to migrate "Global-Dev Dashboard Data" to JJB/Zuul
* 20:38 James_F: Docker: Publishing quibble-buster* 0.0.47 images
* 11:40 Krinkle: Deleting various jobs from Jenkins that can be safely deleted (no longer in jjb-config). Will report the others to T91410 for inspection.
* 19:58 James_F: Docker: Publishing quibble-stretch* 0.0.47-s1 images, now with node
* 11:29 Krinkle: Fixed job "Global-Dev Dashboard Data" to be restricted to node "gallium" because it fails to connect to gp.wmflabs.org from lanthanum 1/2 builds.
* 18:23 James_F: Docker: Publishing quibble-stretch* 0.0.47 images
* 11:26 Krinkle: Re-established Gearman connection from Jenkins
* 16:54 hashar: Tag Quibble 0.0.47 @ {{Gerrit|8b200cfb0}} # [[phab:T271863|T271863]] [[phab:T199403|T199403]] [[phab:T281607|T281607]]
* 11:20 Krinkle: Jenkins unable to re-establish Gearman connection. Full restart.
* 14:17 CFisch_WMDE: Disable ReferencePreviews beta mode on beta labs ([[phab:T271206|T271206]])
* 10:39 Krinkle: Deleting the old integration1401...integration1405 instances. They've been depooled for 24h and their replacements are OK. This is to free up quota to create new Precise instances.
* 13:22 Majavah: cherry picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/684034 https://gerrit.wikimedia.org/r/c/operations/puppet/+/684088 - might cause beta to go read only for a bit - [[phab:T110115|T110115]]
* 10:35 Krinkle: Creating integration-slave-precise-1012...integration-slave-precise-1014
* 10:31 Krinkle: Pool integration-slave-precise-1011
* 09:02 hashar: integration: Refreshed Zuul packages under /home/hashar
* 08:57 Krinkle: Fixed puppet failure for missing Zuul package on integration-dev by applying patch-integration-slave-trusty.sh


== April 9 ==
== 2021-05-04 ==
* 19:50 legoktm: deployed https://gerrit.wikimedia.org/r/202932
* 23:34 James_F: Zuul: Add Adam Hammad to CI allow list
* 17:20 Krinkle: Creating integration-slave-precise-1011
* 17:02 Amir1: stop exim4 and upgrade it in deployment-mx02
* 17:11 Krinkle: Depool integration-slave1402...integration-slave1405
* 16:52 Krinkle: Pool integration-slave-trusty-1011...integration-slave-trusty-1016
* 16:00 hashar: integration-slave-jessie-1001  recreated. Applying it role::ci::slave::labs which should also bring in the package builder role under /mnt/pbuilder
* 15:32 thcipriani: added mwdeploy_rsa to keyholder agent.sock via chmod 400 /etc/keyholder.d/mwdeploy_rsa && SSH_AUTH_SOCK=/run/keyholder/agent.sock ssh-add /etc/keyholder.d/mwdeploy_rsa && chmod 440 /etc/keyholder.d/mwdeploy_rsa; permissions in puppet may be wrong?
* 14:24 hashar: deleting integration-slave-jessie-1001: extended disk is too small
* 13:14 hashar: integration-zuul-packaged applied role::labs::lvm::srv
* 13:01 hashar: integration-zuul-packaged  applied zuul::merger and zuul::server
* 12:59 Krinkle: Creating integration-slave-trusty-1011 - integration-slave-trusty-1016
* 12:40 hashar: spurts out <tt>Permission denied (publickey).</tt>
* 12:39 hashar: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ is still broken :-(
* 12:31 hashar: beta: reset hard of operations/puppet repo on the puppetmaster since it has been stalled for 9+days https://phabricator.wikimedia.org/T95539
* 10:46 hashar: repacked extensions in deployment-bastion staging area: <tt>find /mnt/srv/mediawiki-staging/php-master/extensions -maxdepth 2  -type f -name .git  -exec bash  -c 'cd `dirname {}` && pwd && git repack -Ad && git gc' \;</tt>
* 10:31 hashar: deployment-bastion has a lock file remaining /mnt/srv/mediawiki-staging/php-master/extensions/.git/refs/remotes/origin/master.lock
* 09:55 hashar: restarted Zuul to clear out some stalled jobs
* 09:35 Krinkle: Pooled integration-slave-trusty-1010
* 08:59 hashar: rebooted deployment-bastion and cleared some files under /var/
* 08:51 hashar: deployment-bastion is out of disk space on /var/  :(
* 08:50 hashar: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ timed out after 30 minutes while trying to  git pull
* 08:50 hashar: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/  job stalled for some reason
* 06:15 legoktm: deploying https://gerrit.wikimedia.org/r/202998
* 06:02 legoktm: deploying https://gerrit.wikimedia.org/r/202992
* 05:11 legoktm: deleted core dumps from integration-slave1002, /var had filled up
* 04:36 legoktm: deploying https://gerrit.wikimedia.org/r/202938
* 00:32 legoktm: deploying https://gerrit.wikimedia.org/r/202279


== April 8 ==
== 2021-05-03 ==
* 21:56 legoktm: deploying https://gerrit.wikimedia.org/r/202930
* 19:42 James_F: Zuul: [mediawiki/services/image-suggestion-api] Publish images post-merge [[phab:T281256|T281256]]
* 21:15 legoktm: deleting non-existent jobs' workspaces on labs slaves
* 17:05 James_F: Docker: Publishing quibble-buster images with python3-distutils so quibble can build
* 19:09 Krinkle: Re-establishing Gearman-Jenkins connection
* 16:07 James_F: Zuul: Add Luca Mauri to the CI allow list
* 19:00 Krinkle: Restarting Jenkins
* 13:55 CFisch_WMDE: enable new search features for the template dialog ([[phab:T271802|T271802]])
* 19:00 Krinkle: Jenkins Master unable to re-establish Gearman connection
* 19:00 Krinkle: Zuul queue is not being distributed properly. Many slaves are idling waiting to receive builds but not getting any.
* 18:29 Krinkle: Another attempt at re-creating the Trusty slave pool (T94916)
* 18:07 legoktm: deploying https://gerrit.wikimedia.org/r/202289 and https://gerrit.wikimedia.org/r/202445
* 18:01 Krinkle: Jobs for Precise slaves are not starting. Stuck in Zuul as 'queued'. Disconnected and restarted slave agent on them. Queue is back up now.
* 17:36 legoktm: deployed https://gerrit.wikimedia.org/r/180418
* 13:32 hashar: Disabled Zuul install based on git clone / setup.py by cherry picking https://gerrit.wikimedia.org/r/#/c/202714/ .  Installed the Zuul debian package on all slaves
* 13:31 hashar: integration: running <tt>apt-get upgrade</tt> on Trusty slaves
* 13:30 hashar: integration: upgrading python-gear and python-six on Trusty slaves
* 12:43 hasharLunch: Zuul is back and it is nasty
* 12:24 hasharLunch: killed zuul on gallium :/


== April 7 ==
== 2021-05-02 ==
* 16:26 Krinkle: git-deploy: Deploying integration/slave-scripts 4c6f541
* 18:58 Majavah: add dns record upload.wikimedia.beta.wmflabs.org. -> 185.15.56.35 (deployment-cache-upload floating address)
* 12:57 hashar: running apt-get upgrade on integration-slave-trusty* hosts
* 18:50 Majavah: adjust deployment-cache* hieradata to treat upload.wikimedia.beta.wmflabs.org like upload.beta.wmflabs.org
* 12:45 hashar: recreating integration-slave-trusty-1005
* 18:42 Krinkle: Cherry-pick "mediawiki: Remove 'deployment.wikimedia' vhost from Beta Cluster" - <https://gerrit.wikimedia.org/r/c/operations/puppet/+/684117>, ref [[phab:T198673|T198673]]
* 12:26 hashar: deleting integration-slave-trusty-1005: it has been provisioned with role::ci::website instead of role::ci::slave::labs
* 18:41 Krinkle: Run `puppet agent -tv` on deployment-cache-text06 and deployment-mediawiki11
* 12:11 hashar: retriggering a bunch of browser tests hitting beta.wmflabs.org
* 18:37 Krinkle: Cherry-pick "mediawiki: Remove 'deployment.wikimedia' vhost from Beta Cluster" - https://gerrit.wikimedia.org/r/c/operations/puppet/+/684117
* 12:07 hashar: Puppet being fixed, it is finishing the installation of integration-slave-trusty-*** hosts
* 12:03 hashar: Browser tests against beta cluster were all failing due to an improper DNS resolver being applied on CI labs instances {{bug|T95273}}. Should be fixed now.
* 12:00 hashar: running puppet on all integration machines and resigning puppet client certs
* 11:31 hashar: integration-puppetmaster is back and operational with local puppet client working properly.
* 11:28 hashar: restored /etc/puppet/fileserver.conf
* 11:08 hashar: dishing out puppet SSL configuration on all integration nodes. Can't figure it out so let's restart from scratch
* 10:52 hashar: made puppetmaster certname = integration-puppetmaster.eqiad.wmflabs instead of the ec2 id :(
* 10:49 hashar: manually hacking integration-puppetmaster /etc/puppet/puppet.conf config file which is missing the [master] section
* 09:37 hashar: integration project has been switched to a new labs DNS resolver ( https://lists.wikimedia.org/pipermail/labs-l/2015-April/003585.html ) . It is missing the dnsmasq hack to resolve beta cluster URLs to the instance IP instead of the public IP.  Causes a wide range of jobs to fail.
* 01:25 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/202300


== April 6 ==
== 2021-05-01 ==
* 23:19 bd808: Updated scap to f9b9a82 (Remove exotic unicode from ascii logo)
* 19:19 James_F: Zuul: Add atagar to the CI allow list
* 22:34 legoktm: deployed https://gerrit.wikimedia.org/r/202229
* 10:37 Majavah: installing deployment-urldownloader03 to replace 02 - [[phab:T278641|T278641]]
* 20:55 legoktm: deploying https://gerrit.wikimedia.org/r/202233
* 04:05 Krinkle: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/684004
* 20:46 legoktm: deploying https://gerrit.wikimedia.org/r/202225
* 17:37 legoktm: deploying https://gerrit.wikimedia.org/r/201032
* 12:38 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/201984 https://gerrit.wikimedia.org/r/202020 https://gerrit.wikimedia.org/r/202026
* 04:20 legoktm: deploying https://gerrit.wikimedia.org/r/201669


== April 5 ==
== 2021-04-30 ==
* 11:13 Krinkle: New integration-slave-trusty-1001..1005 must remain unpooled. Provisioning failed. details at https://phabricator.wikimedia.org/T94916#1180522
* 20:13 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/683987
* 10:48 Krinkle: Puppet on integration-puppetmaster has been failing for the past 2 days: "Failed when searching for node i-0000063a.eqiad.wmflabs: You must set the 'external_nodes' parameter to use the external node terminus" (=integration-dev.eqiad.wmflabs)
* 19:21 James_F: Docker: Publishing mediawiki-phan-taint-check-demo:0.1.1 for [[phab:T257301|T257301]]
* 10:22 Krinkle: Creating integration-slave-trusty-1001-1005 per T94916.
* 14:21 Majavah: add profile::pki::client to all deployment-prep instances to trust deployment-prep cfssl certificates, already deployed on production
* 14:15 Majavah: revert above as it's not working, [[phab:T206158|T206158]]
* 14:13 Majavah: deployment-cache-text: trying out using HTTPS for backend traffic to deployment-mediawiki11 [[phab:T206158|T206158]]
* 12:37 Majavah: force reboot deployment-cache-text06, not letting me log in, this will disrupt beta cluster availability
* 02:37 James_F: Docker: Publishing node10 images based on buster [[phab:T278203|T278203]] [[phab:T240955|T240955]]


== April 3 ==
== 2021-04-29 ==
* 23:47 greg-g: for Krinkle 23:31 "Finished npm upgrade on trusty slaves."
* 12:19 Majavah: dropping jade_diff_judgement, jade_diff_label, jade_revision_judgement, jade_revision_label tables on all-labs.dblist [[phab:T281418|T281418]]
* 23:08 Krinkle: Finished npm upgrade on precise slaves. Rolling trusty slaves now.
* 22:55 bd808: Updated scap to a1a5235 (Add a logo banner to scap)
* 21:31 Krinkle: Upgrading npm from v2.4.1 to v2.7.6 (rolling, slave by slave graceful)
* 21:11 ^d: puppet re-enabled on staging-palladium, running fine again
* 21:05 Krinkle: Delete unfinished/unpooled instances integration-slave-precise-1011-1014. (T94916)
* 14:49 hashar: integration-slave-jessie-1001 : manually installed jenkins-debian-glue Debian packages. It is pending upload by ops to apt.wikimedia.org {{bug|T95006}}
* 12:56 hashar: installed zuul_2.0.0-304-g685ca22-wmf1precise1_amd64.deb on integration-slave-precise-101* instances
* 12:56 hashar: installed zuul_2.0.0-304-g685ca22-wmf1precise1_amd64.deb on integration-slave-precise-1011.eqiad.wmflabs
* 12:35 hashar: Switching Jessie slave from role::ci::slave::labs::common to role::ci::slave::labs  which will bring a whole lot of packages and break
* 12:28 hashar: integration-slave-jessie-1001 applying role::ci::slave::labs::common  to pool it as a very basic Jenkins slave
* 12:19 hashar: enabled puppetmaster::autosigner on integration-puppetmaster
* 11:58 hashar: Applied role::ci::slave::labs on integration-slave-precise-101[1-4] that Timo created earlier
* 11:58 hashar: Cherry picked a couple patches to fix puppet Package[] definitions issues
* 11:49 hashar: made integration-puppetmaster self-update its puppet clone
* 11:42 hashar: recreating integration-slave-precise-1011  stalled with a puppet oddity related to Package['gdb'] defined twice {{bug|T94917}}
* 11:30 hashar: integration-puppetmaster migrated down to Precise
* 11:23 hashar: rebooting integration-publisher : can't ssh to it
* 10:37 hashar: disabled some hiera configuration related to puppetmaster.
* 10:22 hashar: Created instance i-00000a4a with image "ubuntu-12.04-precise" and hostname i-00000a4a.eqiad.wmflabs.
* 10:21 hashar: downgrading integration-puppetmaster from Trusty to Precise https://phabricator.wikimedia.org/T94927
* 05:42 legoktm: deploying https://gerrit.wikimedia.org/r/200744
* 03:58 Krinkle: Jobs were throwing NOT_RECOGNISED.  Relaunched Gearman. Jobs are now happy again.
* 03:51 Krinkle: Jenkins is unable to re-establish Gearman connection. Have to force restart Jenkins master.
* 03:42 Krinkle: Reloading Jenkins config repaired the broken references. However Jenkins is still unable to make new references properly. New builds are 404'ing the same way.
* 03:26 Krinkle: Reloading Jenkins configuration from disk
* 03:18 Krinkle: Build metadata exists properly at /var/lib/jenkins/jobs/:jobname/builds/:nr, but the "last*Build" symlinks are outdated.
* 03:12 Krinkle: As of 03:03, recent builds are mysteriously missing their entry in Jenkins. They show up on the dashboard when running, but their build log is never published (url is 404). E.g. https://integration.wikimedia.org/ci/job/integration-docroot-deploy/105 and https://integration.wikimedia.org/ci/job/jshint/239
* 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/201644
* 00:31 greg-g: rm'd .gitignore in /srv/mediawiki-staging/php-master/skins due to https://gerrit.wikimedia.org/r/#/c/200307/ clashing with a local untracked version


== April 2 ==
== 2021-04-28 ==
* 22:56 Krinkle: New integration-slave-precise-101x are unfinished and must remain depooled. See T94916.
* 15:27 James_F: Zuul: [mediawiki/libs/metrics-platform] Add pipeline-based CI jobs [[phab:T279180|T279180]]
* 22:53 Krinkle: Most puppet failures blocking T94916 may be caused by the fact that integration-puppetmaster was inadvertently changed to Trusty; puppetmaster version of Trusty is not yet supported by ops
* 07:26 hashar: contint2001: sudo -u jenkins find *quibble* -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete # [[phab:T249268|T249268]]
* 21:41 Krinkle: It seems integration-slave-jessie-1001 has role::ci::slave::labs::common instead of role::ci::slave::labs. Intentional?
* 07:26 hashar: contint2001: sudo -u jenkins find *quibble* -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete
* 21:25 Krinkle: Re-creating integration-dev-slave-precise in preparation of re-creating precise slaves
* 07:19 hashar: contint2001: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip <nowiki>{</nowiki><nowiki>}</nowiki> \+  # [[phab:T249268|T249268]]
* 14:51 hashar: applying role::ci::slave::labs::common on integration-slave-jessie-1001
* 01:19 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 in beta, should only affect deployment-webperf11.
* 14:49 hashar: integration: nice thing, newly created instances are automatically made to point to integration-puppetmaster via hiera! Just have to sign the certificate on the master using: puppet ca list ; puppet ca sign i-000xxxx.eqiad.wmflabs
* 14:42 hashar: Created [[Nova_Resource:I-00000a3b.eqiad.wmflabs|integration-slave-jessie-1001]] to try out CI slave on Jessie ([[phab:T94836]])
* 14:11 hashar: reduced integration-slave1004 executors from 6 to 5 to make it on par with the other precise slaves
* 14:10 hashar: integration-slave100[1-4] are now using Zuul provided by a Debian package as of https://gerrit.wikimedia.org/r/#/c/195272/ PS 16
* 14:04 hashar: uninstall the pip installed zuul version from Precise labs slaves by doing: pip uninstall zuul && rm /usr/local/bin/zuul* . Switching them all to a Debian package
* 13:45 hashar: pooling back integration-slave1001 and 1002 which are using zuul-cloner provided by a debian package
* 13:35 hashar: reloading Jenkins configuration files from disk to make it aware of a change manually applied to most jobs config.xml files for https://gerrit.wikimedia.org/r/#/c/201451/
* 13:01 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/201458
* 12:19 hashar: preventing jobs from running on integration-slave1001 by replacing its label with 'DoNotLabelThisSlaveHashar'. Going to install Zuul debian package on it
* 09:37 hashar: rebooting integration-zuul-server  homedir seems to be stalled/missing
* 08:12 hashar: upgrading packages on integration-dev
* 05:14 greg-g: and right when I log'd that, things seem to be recovering
* 05:12 greg-g: the shinken alerts about beta cluster issues are due to wmflabs having issues.


== April 1 ==
== 2021-04-27 ==
* 07:17 Krinkle: Creating integration-slave1410 as test. Will re-create our pool later today.
* 19:16 James_F: Docker: Rebuilding all Sury-php derivatives for [[phab:T277742|T277742]].
* 06:26 Krinkle: Apply puppetmaster::autosigner to integration-puppetmaster
* 17:52 Majavah: delete deployment-sessionstore03 [[phab:T263617|T263617]] [[phab:T278641|T278641]]
* 05:51 legoktm: deleting non-existent job workspaces from integration slaves
* 16:35 James_F: Docker: Publishing composer-scratch 1.10.22 and its cascade for [[phab:T281283|T281283]]
* 05:42 Krinkle: Free up space on integration-slave1001-1004 by removing obsolete phplint and qunit workspaces
* 14:18 hashar: Updating most jenkins jobs to change cleanup commands from stretch to buster {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/680476
* 02:05 Krinkle: Restarting Jenkins again..
* 12:44 hashar: Restarted CI Jenkins for plugins upgrade
* 01:35 legoktm: started zuul on gallium
* 12:24 hashar: Upgraded releases Jenkins from 2.263.3 to 2.277.2  (with ldap plugin 1.26)
* 01:00 Krinkle: Restarting Jenkins
* 12:11 hashar: Upgrading Jenkins plugins on the releases jenkins
* 01:00 Krinkle: Jenkins is unable to start Gearman connection (HTTP 503);
* 06:40 Majavah: installing deployment-sessionstore04 [[phab:T263617|T263617]]
* 01:00 Krinkle: Force restarted Zuul, didn't help
* 05:29 Majavah: restart cassandra on deployment-sessionstore03 refs [[phab:T281198|T281198]]
* 00:55 Krinkle: Jenkins stuck. Builds are queued in Zuul but nothing is sent to Jenkins.


== March 31 ==
== 2021-04-26 ==
* 21:00 greg-g: puppet-compiler02: This node is offline because Jenkins failed to launch the slave agent on it.
* 16:53 James_F: Zuul: Add AnjaliKumari to the CI allow list
* 20:15 legoktm: deploying https://gerrit.wikimedia.org/r/200926
* 18:48 legoktm: DEPLOYING https://gerrit.wikimedia.org/r/200327
* 15:44 thcipriani: primed keyholder on deployment-bastion to ensure jenkins-deploy can ssh
* 12:25 hashar: qa-morebots is back


== March 30 ==
== 2021-04-24 ==
* 22:58 legoktm: 1001-1003 were depooled, restarted and repooled. 1004 is depooled and restarted
* 17:47 James_F: Zuul: [mediawiki/extensions/MultimediaViewer] Drop Ruby selenium test job
* 22:40 legoktm: rebooting precise jenkins slaves
* 21:40 greg-g: Beta Cluster is down due to WMF Labs issues, being taken care of now (by Coren and Yuvi)
* 19:53 legoktm: deleted core dumps from integration-slave1001
* 19:11 legoktm: deploying https://gerrit.wikimedia.org/r/200646
* 16:29 jzerebecki: another damaged git repo integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-qunit/src/vendor/
* 16:07 jzerebecki: removing workspaces of deleted jobs integration-slave100{1,2,3,4}:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-{client,repo,repo-api}-tests{,@*}
* 15:14 jzerebecki: integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-repo-api-tests-sqlite
* 15:05 jzerebecki: integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-repo-api-tests-mysql/src/extensions/cldr
* 14:36 jzerebecki: integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-*-tests{,@*}
* 13:06 jzerebecki: integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-client-tests@*
* 13:05 jzerebecki: integration-slave1001:~$ sudo -u jenkins-deploy rm -rf /mnt/jenkins-workspace/workspace/mwext-Wikibase-client-tests


== March 29 ==
== 2021-04-23 ==
* 07:29 legoktm: deploying https://gerrit.wikimedia.org/r/#/c/200333/
* 22:14 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/682029
* 07:07 legoktm: deploying https://gerrit.wikimedia.org/r/#/c/200332/
* 16:30 Majavah: remove deployment-prep hiera settings for phabricator, given there is no phabricator instance on that project
* 03:51 legoktm: deploying https://gerrit.wikimedia.org/r/200330
* 09:12 Majavah: signing puppet certs for deployment-eventlog08 and running puppet for the first time to stop annoying email alerts
* 03:09 legoktm: deploying https://gerrit.wikimedia.org/r/#/c/200329/
* 00:10 legoktm: deploying https://gerrit.wikimedia.org/r/#/c/200323/


== March 28 ==
== 2021-04-22 ==
* 04:02 bd808: manually updated beta-code-update-eqiad job to remove sudo to mwdeploy; needs associated jjb change for T94261
* 06:06 legoktm: reloading zuul to deploy https://gerrit.wikimedia.org/r/680697
* 02:53 Reedy: killed a few stuck beta ci jobs
* 02:51 Krinkle: The 'beta-mediawiki-config-update-eqiad' jobs have been stuck for ~ 8 hours
* 02:19 James_F: Zuul: Switch bundle-yard-publish jobs to Ruby 2.5 [[phab:T280874|T280874]]
* 01:49 James_F: Zuul: [mediawiki/vagrant] Add mediawiki-vagrant-ruby2.5-rake-docker as experimental [[phab:T280874|T280874]]
* 01:44 James_F: Docker: Publishing rake-vagrant-ruby2.5:0.1.0 for [[phab:T280874|T280874]]
* 00:45 James_F: Zuul: Add experimental Ruby 2.5 jobs for two repos [[phab:T280874|T280874]]


== March 27 ==
== 2021-04-21 ==
* 23:28 bd808: applied beta::autoupdater directly to deployment-bastion via wikitech interface
* 23:43 James_F: Zuul: [operations/puppet-lint/wmf_styleguide-check] Switch to Ruby 2.5
* 23:21 bd808: Duplicate declaration: Git::Clone[operations/mediawiki-config] is already declared in file /etc/puppet/modules/beta/manifests/autoupdater.pp:46; cannot redeclare at /etc/puppet/modules/scap/manifests/master.pp:22
* 23:25 James_F: Zuul: Provide experimental Ruby 2.5 rake jobs [[phab:T280874|T280874]]
* 23:01 bd808: restarted puppetmaster
* 22:56 James_F: Zuul: Add mwgate-ruby2.5-rake-docker experimentally to mwgate-rake
* 22:52 hashar: integration: jzerebecki addition and sudo policy  tracked for history purpose as {{bug|T94280}}
* 22:36 James_F: Docker: Publishing rake-ruby2.5:0.1.0 for [[phab:T280874|T280874]]
* 22:52 bd808: chown -R l10nupdate:wikidev /srv/mediawiki-staging/php-master/cache/l10n
* 18:47 James_F: Add ImageMap to the list of Parsoid's ext dependencies
* 22:44 bd808: deployment-bastion: chown -R jenkins-deploy:wikidev /srv/mediawiki-staging/
* 22:41 bd808: forcing puppet run on deployment-bastion
* 22:41 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/200248/ and https://gerrit.wikimedia.org/r/#/c/199988/
* 22:40 hashar: integration: created sudo policy allowing members to run any command as jenkins-deploy on all hosts.
* 22:40 hashar: added jzerebecki to the integration labs project as a normal member
* 22:34 hashar: integration-slave1001 rm -fR mwext-Wikibase-repo-api-tests/src/vendor
* 21:13 greg-g: things be better
* 20:56 greg-g: Beta Cluster is down, known
* 18:50 marxarelli: running `jenkins-jobs update` to update 'browsertests-UploadWizard-*' with Id33ffde07f0c15e153d52388cf130be4c59b4559
* 17:50 legoktm: deleted core dumps from integration-slave1002
* 17:48 legoktm: marked integration-slave1002 as offline, /var filled up
* 05:42 legoktm: marked integration-slave1001 as offline due to https://phabricator.wikimedia.org/T94138


== March 26 ==
== 2021-04-20 ==
* 23:47 legoktm: deploying https://gerrit.wikimedia.org/r/200069
* 07:19 CFisch_WMDE: enable changes to the descriptions in the VE transclusion dialog ([[phab:T273425|T273425]])
* 19:22 bd808: Manually added missing !log entries from 2015-03-25 from my bouncer logs
* 07:17 CFisch_WMDE: enable suggested values parameter in TemplateData and VisualEditor ([[phab:T271825|T271825]])
* 17:14 greg-g: jobs appear to be processing according to zuul, the Jenkins UI just takes forever to load, apparently
* 17:12 greg-g: "Please wait while Jenkins is getting ready to work"
* 17:08 greg-g: 0:07 <      robh> kill -9 and restarted per instructions
* 16:53 greg-g: Still.... "Please wait while Jenkins is restarting..."
* 16:49 greg-g: "Please wait while Jenkins is restarting..."
* 16:39 greg-g: going to do a safe-restart of Jenkins https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Restart_all_of_Jenkins
* 16:38 greg-g: nothing executing on deployment-bastion, that is
* 16:38 greg-g: same, nothing executing
* 16:37 greg-g: did that checklist once, jobs still not executing, doing again
* 16:32 greg-g: I'll start going through the checklist at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
* 16:30 hashar: deadlock on deployment-bastion slave. Someone needs to restart Jenkins :(
* 13:25 hashar: yamllint job fixed by altering the label https://gerrit.wikimedia.org/r/#/c/199876/
* 13:17 hashar: Changes blocked because there is nothing able to run yamllint  ( zuul-gearman.py status|grep build:yamllint  ,  shows 8 jobs pending and no worker available)


== March 25 ==
== 2021-04-19 ==
* 23:23 bd808: chown -R jenkins-deploy:project-deployment-prep /srv/mediawiki-staging/php-master/cache/gitinfo
* 23:04 James_F: Zuul: Add legacy-quibble-rubyselenium-docker as experimental [[phab:T280491|T280491]]
* 23:14 bd808: chown -R l10nupdate:project-deployment-prep /srv/mediawiki-staging/php-master/cache/l10n
* 17:58 Majavah: apply hack (https://phabricator.wikimedia.org/T277206#7015609) to deployment-puppetmaster04 to unbreak maintenance scripts until we have conftool
* 23:14 bd808: chown -R l10nupdate:project-deployment-prep /srv/mediawiki-staging/php-master/cache/l10n
* 15:24 James_F: Re-pushing mwselenium-quibble-docker back to master for [[phab:T280491|T280491]]
* 23:04 bd808: chown -R mwdeploy:project-deployment-prep /srv/mediawiki-staging
* 22:58 bd808: File permissions in deployment-bastion:/srv/mediawiki-staging as part mwdeploy:mwdeploy and part mwdeploy:project-deployment-prep and part jenkins-deploy:project-deployment-prep
* 21:52 legoktm: deploying https://gerrit.wikimedia.org/r/199736
* 18:49 legoktm: deploying https://gerrit.wikimedia.org/r/196745
* 15:13 bd808: Updated scap to include 4a63a63 (Copy l10n CDB files to rebuildLocalisationCache.php tmp dir)
* 03:44 legoktm: deploying https://gerrit.wikimedia.org/r/199555 and https://gerrit.wikimedia.org/r/199559
* 00:52 Krinkle: Restarted Jenkins-Gearman connection
* 00:50 Krinkle: Jenkins is unable to start Gearman connection (HTTP 503); Restarting Jenkins.
* 00:32 legoktm: disabling/enabling gearman in jenkins


== March 24 ==
== 2021-04-17 ==
* 23:32 Krinkle: Force restart Zuul
* 07:23 Majavah: restart uwsgi-ores on deployment-ores01 for [[phab:T280420|T280420]]
* 22:25 hashar: marked gallium and lanthanum slaves as temp offline, then back. Seems to have cleared some Jenkins internal state and resumed the build
* 21:55 bd808: Ran trebuchet for scap to keep cherry-pick of I01b24765ce26cf48d9b9381a476c3bcf39db7ab8 on top of active branch; puppet was forcing back to prior trebuchet sync tag
* 21:42 hashar: Reconfigured [https://integration.wikimedia.org/ci/view/Beta/job/mediawiki-core-code-coverage/ mediawiki-core-code-coverage]
* 21:22 hashar: Zuul gate is deadlocked for up to half an hour due to change being force merged :(
* 21:15 hashar: beta: deleted untracked file /srv/mediawiki-staging/php-master/extensions/.gitignore . That fixed the Jenkins job https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/
* 20:31 twentyafterfour: sudo ln -s /srv/l10nupdate/ /var/lib/
* 20:31 twentyafterfour: sudo mv /var/lib/l10nupdate/ /srv/
* 20:28 bd808: deployment-bastion -- rm -r pacct.1.gz pacct.2.gz pacct.3.gz pacct.4.gz pacct.5.gz pacct.6.gz
* 20:24 bd808: Deleted junk in deployment-bastion:/tmp
* 18:57 legoktm: deploying https://gerrit.wikimedia.org/r/199305
* 18:25 legoktm: deploying https://gerrit.wikimedia.org/r/199216
* 17:06 legoktm: deploying https://gerrit.wikimedia.org/r/199273
* 11:23 hashar: beta-scap-eqiad keeps regenerating l10n cache https://phabricator.wikimedia.org/T93737 
* 08:35 hashar: restarting Jenkins for some plugins upgrades
* 08:07 legoktm: deployed https://gerrit.wikimedia.org/r/199190
* 07:21 legoktm: deploying https://gerrit.wikimedia.org/r/199205
* 07:17 legoktm: deploying https://gerrit.wikimedia.org/r/199204
* 07:08 legoktm: deploying https://gerrit.wikimedia.org/r/199201
* 06:46 legoktm: freed ~6G on lanthanum by deleting mediawiki-extensions-zend* workspaces
* 05:04 legoktm: deleting workspaces of jobs that no longer exist in jjb on lanthanum
* 04:11 legoktm: deploying https://gerrit.wikimedia.org/r/198792
* 03:14 Krinkle: Deleting old job workspaces on gallium not touched since 2013
* 02:42 Krinkle: Restarting Zuul, wikimedia-fundraising-civicrm is stuck as of 46min ago waiting for something already merged
* 02:32 legoktm: toggling gearman off/on in jenkins
* 01:47 twentyafterfour: deployed scap/scap-sync-20150324-014257 to beta cluster
* 00:23 Krinkle: Restarted Zuul


== March 23 ==
== 2021-04-16 ==
* 23:18 hasharDinner: Stopping Jenkins for an upgrade
* 23:20 James_F: Docker: Publishing quibble-buster-php73-coverage version with performance tuning config tweaks [[phab:T234020|T234020]] [[phab:T280167|T280167]]
* 23:16 legoktm: deleting mwext-*-lint* workspaces on gallium, shouldn't be needed
* 22:39 James_F: Docker: Publishing quibble-buster-php72-bundle
* 23:11 legoktm: deleting mwext-*-qunit* workspaces on gallium, shouldn't be needed
* 22:00 James_F: Docker: Publishing quibble-fresnel based on buster not stretch [[phab:T278203|T278203]]
* 23:07 legoktm: deleting mwext-*-lint workspaces on gallium, shouldn't be needed
* 19:43 James_F: Zuul: Drop the now duplicate PHP72 'buster' quibble jobs [[phab:T252432|T252432]]
* 23:00 legoktm: lanthanum is now online again, with 13G free disk space
* 19:11 Krinkle: Remove `profile::mediawiki::install_hhvm: false` Hiera config in Horizon for deployment-prep. This variable is no longer used. ref [[phab:T235142|T235142]]
* 22:58 legoktm: deleting mwext-*-qunit* workspaces on lanthanum, shouldn't be needed any more
* 19:06 Krinkle: Change profile::mail::mx::verp_post_connect_server in Horizon for deployment-prep from `deployment.wikimedia.beta.wmflabs.org` to `meta.wikimedia.beta.wmflabs.org`, ref [[phab:T198673|T198673]]
* 22:54 legoktm: deleting mwext-*-qunit-mobile workspaces on lanthanum, shouldn't be needed any more
* 19:05 Krinkle: Change profile::mail::mx::verp_bounce_post_url in Horizon for deployment-prep from `http://deployment.wikimedia.beta.wmflabs.org/w/api.php` to `http://meta.wikimedia.beta.wmflabs.org/w/api.php`, ref [[phab:T198673|T198673]]
* 22:48 legoktm: deleting mwext-*-lint workspaces on lanthanum, shouldn't be needed any more
* 18:47 Krinkle: Delete forceupdate.beta.wmflabs.org from DNS for deployment-prep (created 2020-03-18, comment "I'm going to delete this in a moment")
* 22:45 legoktm: took lanthanum offline in jenkins
* 17:44 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/680392
* 20:59 bd808: Last log copied from #wikimedia-labs
* 17:02 dancy: Updating dev-images docker-pkg files on primary contint
* 20:58 bd808: 20:41 cscott deployment-prep updated OCG to version 11f096b6e45ef183826721f5c6b0f933a387b1bb
* 16:46 James_F: Zuul: Make php72_buster jobs voting for extension-quibble template [[phab:T252434|T252434]]
* 19:28 YuviPanda: created staging-rdb01.eqiad.wmflabs
* 19:19 YuviPanda: disabled puppet on staging-palladium to test a puppet patch
* 18:41 legoktm: deploying https://gerrit.wikimedia.org/r/198762
* 13:11 hashar: and I restarted qa-morebots a minute or so ago (see https://wikitech.wikimedia.org/wiki/Morebots#Example:_restart_the_ops_channel_morebot )
* 13:11 hashar: Jenkins: deleting unused jobs mwext-.*-phpcs-HEAD and mwext-.*-lint


== March 21 ==
== 2021-04-15 ==
* 17:53 legoktm: deployed https://gerrit.wikimedia.org/r/198503
* 18:26 paladox: gerrit: created openstack/horizon/trove-dashboard per andrewbogott (with parent set as openstack/horizon/horizon)
* 00:02 Krinkle: Reestablished Jenkins-Gearman connection
* 16:47 Majavah: manually rebase deployment-puppetmaster04 due to local hacks having conflicts


== March 20 ==
== 2021-04-14 ==
* 23:08 marxarelli: Reloading Zuul to deploy I693ea49572764c96f5335127902404167ca86487
* 16:19 James_F: Docker: Publish quibble-buster-php73-coverage fixing loading of pcov [[phab:T234020|T234020]]
* 22:50 marxarelli: Running `jenkins-jobs update` to create job mediawiki-vagrant-bundle17-yard-publish
* 19:00 Krinkle: Reloading Zuul to deploy  https://gerrit.wikimedia.org/r/198276
* 17:17 Krinkle: Reloading Zuul to deploy I5edff10a4f0
* 12:32 mobrovac: deployment-salt ops/puppet: un-cherry-picked I48b1a139b02845c94c85cd231e54da67c62512c9
* 12:30 mobrovac: deployment-prep disabled puppet on deployment-restbase[1,2] until https://gerrit.wikimedia.org/r/#/c/197662/ is merged
* 08:36 mobrovac: deployment-salt ops/puppet: cherry-picking I48b1a139b02845c94c85cd231e54da67c62512c9
* 04:57 legoktm: deployed https://gerrit.wikimedia.org/r/198184
* 00:21 legoktm: deployed https://gerrit.wikimedia.org/r/198161
* 00:14 legoktm: deployed https://gerrit.wikimedia.org/r/198160


== March 19 ==
== 2021-04-13 ==
* 23:59 legoktm: deployed https://gerrit.wikimedia.org/r/198154
* 17:00 halfak: failed deploy to ORES (connection to host failed)
* 21:48 hashar: Jenkins: depooled/repooled lanthanum slave, it was no longer processing any jobs.
* 16:57 halfak: deploying ores {{Gerrit|f08a3cb}}
* 14:09 hashar: Further updated our JJB fork to upstream commit 4bf020e07 which is version 1.1.0-3
* 16:41 marxarelli: deleting errant wmf/1.36.0-wmf.39 branches in mediawiki/core and submodule repos
* 13:22 hashar: refreshed our JJB fork 7ad4386..8928b66 . No difference in our jobs.
* 12:26 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/678829 https://gerrit.wikimedia.org/r/678830 [[phab:T280004|T280004]]
* 11:25 hashar: refreshing configuration of all beta* jenkins jobs
* 07:46 awight: enable syntax highlighting line numbering on all namespaces ([[phab:T267911|T267911]])
* 06:18 legoktm: deployed https://gerrit.wikimedia.org/r/197860 & https://gerrit.wikimedia.org/r/197858
* 05:20 legoktm: deleting 'mediawiki-ruby-api-bundle-*' 'mediawiki-selenium-bundle-*' 'mwext-*-bundle-*' jobs
* 05:06 legoktm: deployed https://gerrit.wikimedia.org/r/197853
* 00:57 Krinkle: Reloading Zuul to deploy Ie1d7bf114b34f9


== March 18 ==
== 2021-04-12 ==
* 17:52 legoktm: deployed https://gerrit.wikimedia.org/r/197674 and https://gerrit.wikimedia.org/r/197675
* 15:46 Urbanecm: deployment-prep: Run `mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php` on all beta wikis with GrowthExperiments installed (wikis that are both in all-labs and growthexperiments, plus enwiki; [[phab:T279853|T279853]])
* 17:27 legoktm: deployed https://gerrit.wikimedia.org/r/197651
* 15:40 Urbanecm: deployment-prep: urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # [[phab:T279853|T279853]]
* 15:20 hashar: setting gallium # of executors from 5 back to 3.  When jobs run on it, that slows down the zuul scheduler and merger!
* 14:39 Majavah: remove https://gerrit.wikimedia.org/r/c/operations/puppet/+/263024 cherry pick from beta cluster per [[phab:T106915|T106915]]#6279270 - [[phab:T135427|T135427]]
* 15:06 legoktm: deployed https://gerrit.wikimedia.org/r/194990
* 14:38 Majavah: fix parsoid CI ferm rule local hack puppet patch on deployment-puppetmaster04 after it broke due to operations/puppet changes
* 02:02 bd808: Updated scap to I58e817b (Improved test for content preceding <?php opening tag)
* 11:30 Urbanecm: deployment-prep: Beta is down due to my change, fix on its way (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/678578)
* 01:48 marxarelli: memory usage, swap, io wait seem to be back to normal on deployment-salt and kill/start of puppetmaster
* 01:45 marxarelli: kill 9'd puppetmaster processes on deployment-salt after repeated attempts to stop
* 01:28 marxarelli: restarting salt master on deployment-salt
* 01:20 marxarelli: deployment-salt still unresponsive, lots of io wait (94%) + swapping
* 00:32 marxarelli: seeing heavy swapping on deployment-salt; puppet processes using 250M+ memory each


== March 17 ==
== 2021-04-11 ==
* 21:42 YuviPanda: recreated staging-sca01, let’s wait and see if it just automagically configures itself :)
* 14:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/678378
* 21:40 YuviPanda: deleted staging-sca01 because why not :)
* 00:52 James_F: dockerfiles: [quibble-buster-php73-coverage] Switch from xdebug to pcov [[phab:T234020|T234020]]
* 17:52 Krinkle: Reloading Zuul to deploy I206c81fe9bb88feda6
* 00:08 James_F: Zuul: [mediawiki/core] Enforce PHP 8.0 composer test for REL1_3<nowiki>{</nowiki>5,6<nowiki>}</nowiki> [[phab:T274971|T274971]]
* 16:28 bd808: Updated scap to include I61dcf7ae6d52a93afc6e88d3481068f09a45736d (Run rebuildLocalisationCache.php as www-data)
* 00:02 James_F: Docker: Publishing quibble-buster images [[phab:T252432|T252432]]
* 16:25 bd808: chown -R trebuchet:wikidev && chmod -R g+rwX deployment-bastion:/srv/deployment/scap/scap
* 16:16 YuviPanda: created staging-sca01
* 14:39 hashar: me versus debian packaging tool chain http://xkcd.com/1168/  
* 09:24 hashar: deleted operations-puppet-validate
* 09:21 hashar: deleted mwext-Wikibase-lint job, not triggered anymore


== March 16 ==
== 2021-04-10 ==
* 21:55 legoktm: deployed https://gerrit.wikimedia.org/r/197213
* 17:36 James_F: Zuul: Add Meno25 to &email_allowlist list
* 21:25 legoktm: deployed https://gerrit.wikimedia.org/r/#/c/196095/
* 18:50 legoktm: deployed https://gerrit.wikimedia.org/r/197109
* 18:38 legoktm: deployed https://gerrit.wikimedia.org/r/196743 & https://gerrit.wikimedia.org/r/196746
* 18:24 legoktm: deleted rcstream-* jobs
* 18:11 legoktm: deployed https://gerrit.wikimedia.org/r/197094
* 10:02 hashar: restarting Jenkins
* 02:00 legoktm: deleting all 'mwext-*-composer-*' jobs that should never have been used


== March 15 ==
== 2021-04-09 ==
* 07:39 legoktm: deleting non-generic, unused *-rubylint1.9.3lint & *-ruby2.0lint jobs
* 18:41 Krinkle: Logstash in beta shows no messages from mediawiki for 24 hours, [[phab:T233134|T233134]].
* 00:56 Krinkle: Reload Zuul to deploy Idb2f15a94a67
* 18:39 Krinkle: Logstash in beta has no messages for 24 hours.
* 12:59 addshore: reload zuul for https://gerrit.wikimedia.org/r/673503 and https://gerrit.wikimedia.org/r/678242 [[phab:T277750|T277750]] phan for wikibase release branches
* 09:51 Majavah: deleting deployment-jobrunner03 [[phab:T278664|T278664]]
* 03:32 James_F: Docker: Publishing new quibble images with pcov instead of xdebug for test coverage [[phab:T234080|T234080]]


== March 14 ==
== 2021-04-08 ==
* 03:52 legoktm: deployed https://gerrit.wikimedia.org/r/196540
* 15:57 James_F: Zuul: [mediawiki/extensions/WikiToLDAP] Add quibble and phan job
* 06:52 Majavah: deployment-docker-cpjobqueue01 edit configuration to pick up https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/676563


== March 13 ==
== 2021-04-07 ==
* 01:49 legoktm: deleted a bunch of unused *-tox-* jobs
* 16:03 James_F: Zuul: Add Bharatkhatri in the CI allow list
* 01:03 legoktm: deployed https://gerrit.wikimedia.org/r/191063 & https://gerrit.wikimedia.org/r/196505
* 16:03 James_F: Zuul: [wikidata/query-builder] Add gate-and-submit-l10n template
* 00:17 Krinkle: Reloading Zuul to deploy I46c60d520
* 15:12 Majavah: remove jessie-deployment-prep from deployment-deploy01 aptly
* 14:27 Majavah: delete deployment-mediawiki-07 and deployment-parsoid11 [[phab:T278664|T278664]]
* 00:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/677388
* 00:43 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/677382


== March 12 ==
== 2021-04-06 ==
* 23:34 Krinkle: Depooling integration-slave1402 to play with T92351
* 20:26 James_F: Zuul: [mediawiki/extensions/D3Loader] Mark as archived [[phab:T277626|T277626]]
* 20:26 Krinkle: Re-established Gearman connection from Zuul due to deadlock
* 20:26 James_F: Zuul: [mediawiki/extensions/Gravatar] Add basic quibble and phan jobs [[phab:T279260|T279260]]
* 17:39 YuviPanda: killll deployment-rsync01, wasn’t being used for anything discernable, and that’s not how proxies work in prod
* 19:58 James_F: Zuul: Configure the REL1_36 test and gate pipelines [[phab:T279459|T279459]]
* 15:31 Krinkle: Reloading Zuul to deploy Ia289ebb0
* 17:58 James_F: Zuul: [mediawiki/services/function-<nowiki>{</nowiki>orchestr,evalu<nowiki>}</nowiki>ator] Publish images
* 15:22 Krinkle: Fix Jenkins UI (was stuck in German)
* 15:05 YuviPanda: jenkins loves german again
* 07:11 YuviPanda: scap still failing on beta, I'll check when I'm back from lunch
* 07:11 YuviPanda: rebooted puppetmaster, was dead


== March 11 ==
== 2021-04-05 ==
* 19:47 legoktm: deployed https://gerrit.wikimedia.org/r/195990
* 18:20 brennen: resizing gitlab-ansible-test to g3.cores8.ram16.disk20
* 15:11 Krinkle: Jenkins UI in German, again
* 17:45 brennen: halting gitlab-test for resize
* 14:05 Krinkle: Jenkins web dashboard is in German
* 11:02 hashar: created integration-zuul-packaged.eqiad.wmflabs to test out the Zuul debian package
* 09:07 hashar: Deleted refs/heads/labs branch in integration/zuul.git
* 09:01 hashar: https://gerrit.wikimedia.org/r/#/c/195287/
* 09:01 hashar: made Zuul clone on labs to use the master branch instead of the labs one. There is no point in keeping separate ones anymore


== March 10 ==
== 2021-04-02 ==
* 15:22 apergos: after update of salt in deployment-prep git deploy restart is likely broken. details; https://phabricator.wikimedia.org/T92276
* 10:53 Majavah: change deployment-wikifeeds01 config to use deployment-mediawiki11
* 14:50 Krinkle: Browsertest job was stuck for > 10hrs. Jobs should not be allowed to run that long.
* 10:47 Majavah: update web proxy parsoid-beta.wmflabs.org to point to deployment-parsoid12


== March 9 ==
== 2021-04-01 ==
* 23:57 legoktm: deployed https://gerrit.wikimedia.org/r/195486
* 16:16 Majavah: hard reboot unresponsive deployment-cache-text06
* 22:49 Krinkle: Reloading Zuul to deploy I229d24c57d90ef
* 12:52 Majavah: update floating ip 185.15.56.9 from deployment-parsoid11 to deployment-parsoid12
* 20:37 legoktm: doing the gearman shuffle dance thing
* 11:00 Majavah: restart changeprop container on deployment-docker-mobileapps01 to pick up config changes
* 19:42 Krinkle: Reloading Zuul to deploy I48cb4db87
* 10:45 hashar: Updating all Jenkins jobs with jjb to deploy https://gerrit.wikimedia.org/r/676298
* 19:35 Krinkle: Delete integration-slave1010
* 19:31 Krinkle: Restarted slave agent on gallium
* 19:30 Krinkle: Re-established Gearman connection from Jenkins


== March 8 ==
== 2021-03-30 ==
* 17:40 Krinkle: Delete integration-slave1006, integration-slave1007 and integration-slave1008
* 18:05 Majavah: remove <nowiki>{</nowiki>trusty,precise<nowiki>}</nowiki>-deployment-prep repos from deployment-deploy01 aptly
* 00:06 legoktm: deployed https://gerrit.wikimedia.org/r/195072
* 17:51 Majavah: arm deployment-deploy01 keyholder with all the keys
* 14:50 Majavah: cherry pick 675807 675814 and 675815 to deployment-puppetmaster to unblock work on deployment-deploy03 until sre has merged those [[phab:T278689|T278689]]
* 14:44 Majavah: remove deployment-puppetmaster04 local patch adding releng/phatality to scap to see if it unbreaks deployment-deploy03 puppet runs
* 13:55 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675802/ on beta to unblock my progress until merged
* 13:35 Majavah: create and install deployment-deploy03 [[phab:T278689|T278689]]
* 13:17 Majavah: armed deployment-cumin keyholder, found passphrase at deployment-puppetmaster04:/var/lib/git/labs/private/files/ssh/tin/cumin_rsa.passphrase
* 07:26 Majavah: shutoff deployment-mediawiki-09 [[phab:T278664|T278664]]
* 06:25 Majavah: switch w-beta.wmflabs.org web proxy to deployment-mediawiki11
* 06:18 Majavah: restart restbase on deployment-restbase03 to pick up config changes to use deployment-mediawiki11


== March 7 ==
== 2021-03-29 ==
* 22:10 legoktm: deployed https://gerrit.wikimedia.org/r/195069
* 15:37 Majavah: hard reboot deployment-sessionstore03 [[phab:T263617|T263617]]
* 14:44 Krinkle: Depool integration-slave1008 and integration-slave1010 (not deleting yet, just in case)
* 15:16 Majavah: manually run puppet on deployment-sessionstore03, starting Cassandra (which was stopped) [[phab:T263617|T263617]]
* 14:43 Krinkle: Depool integration-slave1006 and integration-slave1007 (not deleting yet, just in case)
* 13:04 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675503/ on deployment-puppetmaster04 ([[phab:T278664|T278664]]), also apply same change on horizon. this will switch traffic from deployment-mediawiki-07 to deployment-mediawiki11
* 14:41 Krinkle: Pool integration-slave1404
* 10:29 Majavah: remove deployment-mediawiki10, too much live debugging, not in use
* 14:35 Krinkle: Reloading Zuul to deploy I864875aa4acc
* 09:56 Majavah: taavi@deployment-mediawiki10:~$ sudo ln -s /usr/local/share/ca-certificates/Puppet_Internal_CA.crt /etc/ssl/certs/aeffde42.0 && sudo update-ca-certificates
* 06:28 Krinkle: Reloading Zuul to deploy I8d7e0bd315c4fc2
* 09:29 Urbanecm: Manually run puppet on mediawiki10
* 04:53 Krinkle: Reloading Zuul to deploy I585b7f026
* 09:28 Urbanecm: Re-enable puppet on mediawiki10
* 04:51 Krinkle: Pool integration-slave1403
* 08:49 Urbanecm: Disable puppet on mediawiki10 - investigating failing curl certificate check
* 03:55 Krinkle: Pool integration-slave1402
* 06:46 Majavah: cherry-pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675357/ on deployment-puppetmaster04 - [[phab:T278664|T278664]]
* 03:31 Krinkle: Reloading Zuul to deploy I30131a32c7f1
* 05:40 Majavah: move role::labs::lvm::srv puppet classes from deployment-mediawiki- prefix to current individual appservers, [[phab:T278664|T278664]]
* 02:59 James_F: Pushed Ib4f6e9 and Ie26bb17 to grrrit-wm and restarted
* 02:54 Krinkle: Reloading Zuul to deploy Ia82a0d45ac431b5


== March 6 ==
== 2021-03-26 ==
* 23:30 Krinkle: Pool integration-slave1401
* 08:16 hashar: reloading zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/675004
* 22:24 Krinkle: Re-establishing Gearman connection from Jenkins (deployment-bastion was deadlocked)
* 07:05 Majavah: delete remaining shutdown deployment-prep jessies: deployment-sca[01-02], deployment-logstash2 ([[phab:T218729|T218729]])
* 22:16 Krinkle: beta-scap-eqiad has been waiting for 50 minutes for an executor on deployment-bastion.eqiad (which has 5/5 slots idle)
* 21:36 Krinkle: Provisioning integration-slave1401 - integration-slave1404
* 20:14 legoktm: deployed https://gerrit.wikimedia.org/r/194939 for reals this time
* 20:12 legoktm: deployed https://gerrit.wikimedia.org/r/194939
* 18:22 ^d: staging: set has_ganglia to false in hiera
* 16:57 legoktm: deployed https://gerrit.wikimedia.org/r/194892
* 16:40 Krinkle: Jenkins auto-depooled integration-slave1008 due to low /tmp space. Purged /tmp/npm-* to bring back up.
* 16:27 Krinkle: Delete integration-slave1005
* 09:17 hasharConf: Jenkins: upgrading and restarting. Wish me luck.
* 06:29 Krinkle: Re-creating integration-slave1401 - integration-slave1404
* 02:21 legoktm: deployed https://gerrit.wikimedia.org/r/194340
* 02:12 Krinkle: Pooled integration-slave1405
* 01:52 legoktm: deployed https://gerrit.wikimedia.org/r/194461


== March 5 ==
== 2021-03-25 ==
* 22:01 Krinkle: Reloading Zuul to deploy I97c1d639313b
* 20:20 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/674964
* 21:15 hashar: stopping Jenkins
* 16:24 Majavah: install openssl security update and restart trafficserver-tls on deployment-cache-*
* 21:08 hashar: killing browser tests running
* 03:24 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/674749
* 20:48 Krinkle: Re-establishing Gearman connection from Jenkins
* 20:44 Krinkle: Deleting integration-slave1201-integration-slave1204, and integration-slave1401-integration-slave1404.
* 20:18 Krinkle: Finished creation and provisioning of integration-slave1405
* 19:34 legoktm: deploying https://gerrit.wikimedia.org/r/194461, lots of new jobs
* 18:50 Krinkle: Re-creating integration-slave1405
* 17:52 twentyafterfour: pushed wmf/1.25wmf20 branch to submodule repos
* 16:18 greg-g: now there are jobs running on the zuul status page
* 16:16 greg-g: getting "/zuul/status.json: Service Temporarily Unavailable" after the zuul restart
* 16:12 ^d: restarted zuul
* 16:06 greg-g: jenkins doesn't have anything queued and is processing jobs apparently, not sure why zuul is showing two jobs queued for almost 2 hours (one with all tests passing, the other with nothing tested yet)
* 16:04 greg-g: not sure it helped
* 16:02 greg-g: about to disconnect/reconnect gearman per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Known_issues
* 00:34 legoktm: deployed https://gerrit.wikimedia.org/r/194421


== March 4 ==
== 2021-03-24 ==
* 17:34 Krinkle: Depooling all new integration-slave12xx and integration-slave14xx instances again (See T91524)
* 07:42 Majavah: remove deployment-logstash2 hiera from horizon, instance was shut off earlier by moritzm [[phab:T238707|T238707]]
* 17:11 Krinkle: Pooled integration-slave1201, integration-slave1202, integration-slave1203, integration-slave1204
* 17:06 Krinkle: Pooled integration-slave1402, integration-slave1403, integration-slave1404, integration-slave1405
* 16:56 Krinkle: Pooled integration-slave1401
* 16:26 Krinkle: integration-slave12xx and integration-slave14xx are now provisioned. Old slaves will be depooled later and eventually deleted.


== March 3 ==
== 2021-03-23 ==
* 22:00 hashar: reboot integration-puppetmaster in case it solves a NFS mount issue
* 20:46 James_F: Zuul: [mediawiki/extensions/CopyToClipboard] Archive per [[phab:T274015|T274015]]
* 20:33 legoktm: manually created centralauth.users_to_rename table
* 20:44 James_F: Zuul: [mediawiki/extensions/Wikibase] Add experimental Postgres job [[phab:T207226|T207226]]
* 18:28 Krinkle: Lots of Jenkins builds are stuck even though they're "Finished". All services look up. (Filed T91430.)
* 19:33 James_F: Zuul: [operations/homer/public] Add bespoke tox-publish job
* 17:18 Krinkle: Reloading Zuul to deploy Icad0a26dc8 and Icac172b16
* 04:45 James_F: dockerfiles: [quibble-buster] Switch npm to our own build, and cascade [[phab:T252434|T252434]]
* 15:39 hashar: cancelled logrotate update of all jobs since that seems to kill the Jenkins/Zuul gearman connection. Probably because all jobs are registered on each config change.
* 02:39 James_F: Zuul: Make php72_buster jobs voting for skin-quibble template [[phab:T252434|T252434]]
* 15:31 hashar: updating all jobs in Jenkins based on PS2 of https://gerrit.wikimedia.org/r/194109
* 02:23 James_F: Zuul: [mediawiki/vendor] Make php72_buster jobs voting for master branch [[phab:T252434|T252434]]
* 10:56 hashar: Created instance i-000008fb with image "ubuntu-14.04-trusty" and hostname i-000008fb.eqiad.wmflabs.
* 01:35 James_F: Zuul: [mediawiki/core] Make php72_buster jobs voting for master branch [[phab:T252434|T252434]]
* 10:52 hashar: deleting integration-puppetmaster to recreate it with a new image {{bug|T87484}}. Will have to reapply I5335ea7cbfba33e84b3ddc6e3dd83a7232b8acfd and I30e5bfeac398e0f88e538c75554439fe82fcc1cf
* 00:03 brennen: re-associating floating IP for gitlab-test to gitlab-ansible-test box for speed & function use
* 03:47 Krinkle: git-deploy: Deploying integration/slave-scripts 05a5593..1e64ed9
* 01:11 marxarelli: gzip'd /var/log/account/pacct.0 on deployment-bastion to free space


== March 2 ==
== 2021-03-22 ==
* 21:35 twentyafterfour: <Krenair> (per #mediawiki-core, have deleted the job queue key in redis, should get regenerated. also cleared screwed up log and restarted job runner service)
* 23:06 James_F: Zuul: [labs/tools/majavah-bot] Run generic tox tests
* 15:39 Krinkle: Removing /usr/local/src/zuul from integration-slave12xx and integration-slave14xx to let puppet re-install zuul-cloner (T90984)
* 19:28 James_F: Zuul: [mediawiki/services/function-orchestrator] Add code coverage job
* 13:39 Krinkle: integration-slave12xx and integration-slave14xx instances still depooled due to T90984
* 12:07 Majavah: delete deployment-restbase[01-02], [[phab:T250574|T250574]]
* 11:36 dcaro: Created subzone svc.deployment-prep.eqiad1.wikimedia.cloud. ([[phab:T276624|T276624]])
* 11:33 dcaro: Created subzone beta.wmcloud.org ([[phab:T276624|T276624]])


== February 27 ==
== 2021-03-19 ==
* 21:58 Krinkle: Ragekilled all queued jobs related to beta and force restarted Jenkins slave agent on deployment-bastion.eqiad
* 15:11 dpifke: Re-cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/623068 in beta.
* 21:56 Krinkle: Job beta-update-databases-eqiad and node deployment-bastion.eqiad have been stuck for the past 4 hours
* 13:35 addshore: reload zuul for https://gerrit.wikimedia.org/r/673472 (https://phabricator.wikimedia.org/T277750)
* 21:49 marxarelli: Reloading Zuul to deploy I273270295fa5a29422a57af13f9e372bced96af1 and I81f5e785d26e21434cd66dc694b4cfe70c1fa494
* 12:48 Majavah: shutdown deployment-sca*, services on them are too old and broken to be useful according to the SREs, have no maintainers and the hosts are running Jessie, [[phab:T218729|T218729]]
* 18:08 Krenair: Kicked deployment-bastion node in jenkins to try to fix jobs
* 11:49 Majavah: remove now-deleted deployment-cumin02 from hiera allowed cumin masters
* 06:42 legoktm: deployed https://gerrit.wikimedia.org/r/193057
* 08:54 Majavah: remove deployment-restbase02 from cassandra and shut it down [[phab:T250574|T250574]]
* 01:01 Krinkle: Keeping all integration-slave12xx and slave14xx instances depooled.
* 08:45 Majavah: disable puppet and stop restbase service on deployment-restbase02 for [[phab:T250574|T250574]]
* 00:53 Krinkle: Finished provisioning of integration-slave12xx and slave14xx instances. Initial testing failed due to "/usr/local/bin/zuul-cloner: No such file or directory"


== February 26 ==
== 2021-03-18 ==
* 23:24 Krinkle: integration-puppetmaster /var disk is full (1.8 of 1.9GB) - /var/log/puppet/reports is 1.1GB - purging
* 22:29 marxarelli: deploying https://gerrit.wikimedia.org/r/c/blubber/+/673332 and https://gerrit.wikimedia.org/r/c/blubber/+/671199 to eqiad/codfw ([[phab:T277109|T277109]])
* 23:23 Krinkle: Puppet failing on new instances due to "Error 400 on SERVER: cannot generate tempfile `/var/lib/puppet/yaml/"
* 22:28 marxarelli: staging https://gerrit.wikimedia.org/r/c/blubber/+/673332 and https://gerrit.wikimedia.org/r/c/blubber/+/671199
* 13:27 Krinkle: Provisioning the new integration-slave12xx and integration-slave14xx instances
* 18:27 brennen: updating gitlab-test to 13.9.4-ce
* 05:05 legoktm: deployed https://gerrit.wikimedia.org/r/192980
* 18:03 James_F: zuul: [mediawiki/extensions/Wikibase] Use composer not vendor on REL1_35 [[phab:T277750|T277750]]
* 03:48 Krinkle: Creating integration-slave1201,02,03,04 and integration-slave1401,02,03,04,05 per T74011 (not yet setup/provisioned, keep depooled)  
* 17:39 marxarelli: deploying https://gerrit.wikimedia.org/r/c/blubber/+/671199 to staging
* 03:39 Krinkle: Cleaned up and re-pooled integration-slave1006 (was depooled since yesterday)
* 16:48 hashar: Purging openjdk-8 packages from Jenkins agents # [[phab:T269354|T269354]]
* 03:39 Krinkle: Cleaned up and re-pooled integration-slave1007 and integration-slave1008 (was auto-depooled by Jenkins)
* 16:10 addshore: reload zuul for https://gerrit.wikimedia.org/r/673208 and https://gerrit.wikimedia.org/r/673211 [[phab:T277750|T277750]] (apitests php versions)
* 01:54 Krinkle: integration-slave1007 and integration-slave1008 were auto-depooled due to main disk (/ and its /tmp) being < 900 MB free
* 15:55 addshore: reload zuul for Introduce query-builder job so it can use npm 6.14.* instead [integration/config] - https://gerrit.wikimedia.org/r/673183 [[phab:T277060|T277060]]
* 01:20 legoktm: actually deployed https://gerrit.wikimedia.org/r/192772 this time
* 15:52 hashar: Disconnecting a bunch of Jenkins agents to upgrade them to Java 11  # [[phab:T269354|T269354]]
* 01:16 legoktm: deployed https://gerrit.wikimedia.org/r/192772
* 13:20 Majavah: manually systemctl daemon-reload && systemctl start srv-swift\\x2dstorage-lv\\x2da1.mount on deployment-ms-be* nodes for [[phab:T276179|T276179]]
* 09:10 addshore: reload zuul for Remove mwselenium-quibble-docker [integration/config] - https://gerrit.wikimedia.org/r/673206
* 08:44 Majavah: delete now unused deployment-ircd [[phab:T277081|T277081]]
* 08:40 Majavah: delete deployment-db06, 07/08 have been working fine for a week now


== February 25 ==
== 2021-03-17 ==
* 23:55 Krinkle: Re-established Jenkins-Gearman connection
* 20:30 hashar: Reloaded Zuul for {{Gerrit|I2368478e4c4ab8752581f55a7c5ab493fafdeb41}}
* 23:54 Krinkle: Zuul queue is growing. Nothing is added to its dashboard. Jenkins executors all idle. Gearman deadlock?
* 15:37 Majavah: shutdown deployment-restbase01 for [[phab:T250574|T250574]]
* 20:38 legoktm: deployed https://gerrit.wikimedia.org/r/192564
* 15:32 Majavah: taavi@deployment-restbase01:~$ sudo nodetool decommission # [[phab:T250574|T250574]]
* 20:18 legoktm: deployed https://gerrit.wikimedia.org/r/192267
* 14:53 addshore: reload zuul for https://gerrit.wikimedia.org/r/673028 Run more Wikibase tests jobs for REL1_35 branch
* 17:22 ^d: reloading zuul to pick up utfnormal jobs
* 01:21 James_F: Zuul: [labs/tools/wikisource-ocr] Remove CI
* 02:15 Krinkle: integration-slave1006 has <700MB free disk space (including /tmp)


== February 24 ==
== 2021-03-16 ==
* 18:41 marxarelli: Running `jenkins-jobs update` to create browsertests-CentralAuth-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce
* 21:37 longma: Updating dev-images docker-pkg files on primary contint
* 17:55 Krinkle: It seems xdebug was enabled on integration slaves running trusty. This makes errors in build logs incomprehensible.
* 21:22 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/672792
* 20:06 marxarelli: restarting zuul due to seemingly stuck dependency chain
* 16:22 James_F: Docker: Publishing quibble-stretch-php72-apache:0.0.46-s1
* 10:29 addshore: reload zuul for https://gerrit.wikimedia.org/r/670898 Add configuration for new wikidata/query-builder repo
* 10:17 hashar: Building docker-registry.wikimedia.org/releng/sonar-scanner:4.6.0.2311-1  # [[phab:T277527|T277527]]


== February 21 ==
== 2021-03-15 ==
* 03:01 Krinkle: Reloading Zuul to deploy I3bcd3d17cb886740bd67b33b573aa25972ddb574
* 08:25 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/670782
* 08:13 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/672118


== February 20 ==
== 2021-03-13 ==
* 07:25 Krinkle: Finished setting up integration-slave1010 and added it to Jenkins slave pool
* 17:10 twentyafterfour: restart apache on gerrit1001
* 00:54 Krinkle: Setting up integration-slave1010 (replacement for integration-slave1009)


== February 19 ==
== 2021-03-12 ==
* 23:13 bd808: added Thcipriani to under_NDA sudoers group; WMF staff
* 22:57 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/671295
* 19:45 Krinkle: Destroying integration-slave1009 and re-imaging
* 22:28 marxarelli: running `tox -e jenkins-jobs -- --conf jenkins_jobs.ini update ./jjb '*-pipeline-*'` to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/668199
* 19:02 bd808: VICTORY! deployment-bastion jenkins slave unstuck
* 19:13 Majavah: taavi@deployment-cumin:~$ sudo cumin -b 1 -s 5 'wdqs2*' 'run-puppet-agent -q'
* 19:01 bd808: toggling gearman plugin in jenkins admin console
* 19:01 legoktm: legoktm@deployment-puppetmaster04:/var/lib/git/labs$ sudo mv private-back /root/private-back-2020-06
* 18:58 bd808: took deployment-bastion jenkins connection offline and online 5 times; gearman plugin still stuck
* 14:10 addshore: reload zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/670967/ [[phab:T276428|T276428]]
* 18:41 bd808: cleaned up mess in /tmp on integration-slave1008
* 18:38 bd808: brought integration-slave1007 back online
* 18:37 bd808: cleaned up mess in /tmp on integration-slave1007
* 18:29 bd808: restarting jenkins because I messed up and disabled gearman plugin earlier
* 16:30 bd808: disconnected and reconnected deployment-bastion.eqiad again
* 16:28 bd808: reconnected deployment-bastion.eqiad to jenkins
* 16:28 bd808: disconnected deployment-bastion.eqiad from jenkins
* 16:27 bd808: killed all pending jobs for deployment-bastion.eqiad
* 16:26 bd808: disconnected deployment-bastion.eqiad from jenkins
* 16:20 legoktm: updated phpunit for https://gerrit.wikimedia.org/r/188398


== February 18 ==
== 2021-03-11 ==
* 23:50 marxarelli: Reloading Zuul to deploy Id311d632e5032ed153277ccc9575773c0c8f30f1
* 22:11 marxarelli: reverted https://gerrit.wikimedia.org/r/670963 and re-running failed job([[phab:T277236|T277236]])
* 23:37 marxarelli: Running `jenkins-jobs update` to create mediawiki-vagrant-bundle17-cucumber job
* 22:10 marxarelli: running `tox -e jenkins-jobs -- --conf jenkins_jobs.ini update ./jjb/ trigger-research-mwaddlink-pipeline-test research-mwaddlink-pipeline-test` to revert https://gerrit.wikimedia.org/r/c/integration/config/+/668199 for select jobs ([[phab:T277236|T277236]])
* 23:15 marxarelli: Running `jenkins-jobs update` to update mediawiki-vagrant-bundle17 jobs
* 22:03 marxarelli: running `tox -e jenkins-jobs -- --conf jenkins_jobs.ini update ./jjb/ trigger-research-mwaddlink-pipeline-test research-mwaddlink-pipeline-test` to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/668199 for select jobs ([[phab:T277236|T277236]])
* 22:56 marxarelli: Reloading Zuul to deploy I3b71f4dc484d5f9ac034dc1050faf3ba6f321752
* 16:58 Majavah: copy a tarball of deployment-fluorine02 /home to deployment-mwlog01 root home dir, delete deployment-fluorine02 [[phab:T276419|T276419]]
* 22:42 marxarelli: running `jenkins-jobs update` to create mediawiki-vagrant-bundle17 jobs
* 16:49 Majavah: delete deployment-etcd-01 [[phab:T276462|T276462]]
* 22:13 hashar: saving Jenkins configuration at https://integration.wikimedia.org/ci/configure to reset the locale
* 13:51 Majavah: shut down deployment-db06, now unused [[phab:T277070|T277070]]
* 16:41 bd808: beta-scap-eqiad job fixed after manually rebuilding git clones of scap/scap on rsync01 and videoscaler01
* 13:48 Majavah: set deployment-db07 as r/w [[phab:T277070|T277070]]
* 16:39 bd808: rebuilt corrupt deployment-videoscaler01:/srv/deployment/scap/scap
* 13:48 Majavah: stop mariadb to ensure reads have stopped on deployment-db06 [[phab:T277070|T277070]]
* 16:36 bd808: rebuilt corrupt deployment-rsync01:/srv/deployment/scap/scap
* 13:41 Majavah: stop slave on deployment-db06 [[phab:T276968|T276968]]
* 16:26 bd808: scap failures only from deployment-videoscaler01 and deployment-rsync01
* 13:37 Majavah: make deployment-db06 and deployment-db08 be replicas of deployment-db07 [[phab:T277070|T277070]]
* 16:25 bd808: scap failing with "ImportError: cannot import name cli" after latest update; investigating
* 13:34 Majavah: stop and reset slave on deployment-db07 [[phab:T277070|T277070]]
* 16:23 bd808: redis-cli srem 'deploy:scap/scap:minions' i-0000059b.eqiad.wmflabs i-000007f8.eqiad.wmflabs i-0000022e.eqiad.wmflabs i-0000044e.eqiad.wmflabs i-000004ba.eqiad.wmflabs
* 13:32 Majavah: set deployment-db06 as read only [[phab:T277070|T277070]]
* 16:16 bd808: 5 deleted instances in trebuchet redis cache for salt/salt repo
* 09:12 hashar: Updated plugins on https://releases-jenkins.wikimedia.org/
* 16:16 bd808: updated scap to 7c64584 (Add universal argument to ignore ssh_auth_sock)
* 16:14 bd808: scap clone on deployment-mediawiki02 corrupt; git fsck did not fix; will delete and refetch
* 01:41 bd808: fixed git rebase conflict on deployment-salt caused by outdated cherry-pick; cherry-picks are merged now so reset to tracking origin/production


== February 17 ==
== 2021-03-10 ==
* 17:47 hashar: beta cluster is mostly down because the  instance supporting the main database (deployment-db1) is down. The root cause is an outage on the labs infra
* 19:36 Majavah: shutdown deployment-ircd [[phab:T277081|T277081]]
* 03:43 Krinkle: Depooled integration-slave1009 (Debugging T89180)
* 18:46 Majavah: switch floating ip 185.15.56.34 to deployment-ircd02 [[phab:T277081|T277081]]
* 03:38 Krinkle: Depooled integration-slave1009
* 18:05 Majavah: create deployment-ircd02 for [[phab:T277081|T277081]]
* 17:26 marxarelli: `rm -rf /srv/dump` on deployment-db06 and reenabling puppet
* 17:25 marxarelli: `rm -rf /srv/restore` on deployment-db08 and reenabling puppet
* 17:24 marxarelli: `rm -rf /srv/backup /srv/restore` on deployment-db07 and reenabling puppet
* 17:09 Majavah: set beta cluster mediawiki as read write on mw config ([[phab:T276968|T276968]])
* 17:03 Majavah: make deployment-db06 read-write [[phab:T276968|T276968]]
* 16:50 Majavah: `reset slave;` on new master deployment-db06 [[phab:T276968|T276968]]
* 16:49 Majavah: add deployment-db07 as a replica of db06 for [[phab:T276968|T276968]]
* 16:45 Urbanecm: root@deployment-db07:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1 # [[phab:T276968|T276968]]
* 16:12 Majavah: deployment-db08 CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='redacted', MASTER_PORT=3306, MASTER_HOST='deployment-db06.deployment-prep.eqiad1.wikimedia.cloud', MASTER_LOG_FILE='deployment-db06-bin.000059', MASTER_LOG_POS=522469730; ([[phab:T276968|T276968]])
* 16:06 Urbanecm: start root@deployment-db07:/srv/sqldata.db06# rsync --progress -r deployment-db06:/srv/sqldata/ . ([[phab:T276968|T276968]])
* 15:57 Majavah: set deployment-db06 as readonly from mysql side [[phab:T276968|T276968]]
* 15:54 Urbanecm: Start `root@deployment-db08:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1` ([[phab:T276968|T276968]])
* 15:54 Urbanecm: Start mariadb on db08 ([[phab:T276968|T276968]])
* 15:22 Urbanecm: rsync deployment-db06:/srv/sqldata to deployment-db08:/srv/sqldata in a tmux session on deployment-db08 ([[phab:T276968|T276968]])
* 14:52 Majavah: delete deployment-db08 /srv/sqldata to attempt procedure in https://phabricator.wikimedia.org/T276968#6900199
* 10:16 arturo: briefly stopping deployment-puppetdb03 to disable VMX CPU flag
* 00:28 marxarelli: mariadb successfully started on db07 following transfer/extraction using mariabackup and following mysql_upgrade ([[phab:T276968|T276968]])
* 00:10 marxarelli: restore of db06 failed yet again. trying mariabackup db06 -> db07 instead of mysqldump (after fixing docs/usage of the former) ([[phab:T276968|T276968]])


== February 14 ==
== 2021-03-09 ==
* 00:55 marxarelli: gzip'd /var/log/account/pacct.0 on deployment-bastion
* 21:54 marxarelli: restoring from db06 dump on db07 and db08 following `DROP VIEW IF EXISTS user` workaround ([[phab:T276968|T276968]])
* 00:02 bd808: Stopped udp2log and started udp2log-mw on deployment-bastion
* 20:53 marxarelli: restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127 ([[phab:T276968|T276968]])
* 20:53 marxarelli: restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127
* 20:39 marxarelli: doing `--skip-grant-tables` on deployment-db08 and creating a new root@127.0.0.1 user ([[phab:T276968|T276968]])
* 20:33 Majavah: install mariadb on deployment-db08 [[phab:T276968|T276968]]
* 19:59 marxarelli: creating new instance deployment-db08 to use as new beta replica db ([[phab:T276968|T276968]])
* 19:56 marxarelli: deleting deployment-db05 to free up quota for new replica ([[phab:T276968|T276968]])
* 19:50 marxarelli: restoring database dump on deployment-db07 ([[phab:T276968|T276968]])
* 18:49 marxarelli: restarting db dump on db06 `mysqldump -h 127.0.0.1 --events --routines --triggers --all-databases -f --single-transaction` ([[phab:T276968|T276968]])
* 18:38 Majavah: installing mariadb 10.4 via role::mariadb::beta to db07 [[phab:T276968|T276968]]
* 18:25 marxarelli: "View 'labswiki.tag_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them when using LOCK TABLES" during mysqldump on db06 ([[phab:T276968|T276968]])
* 18:21 Majavah: create deployment-db07 as g2.cores8.ram16.disk160 Buster [[phab:T276968|T276968]]
* 18:20 marxarelli: disabled puppet on deployment-db06 and started mysqldump ([[phab:T276968|T276968]])
* 18:09 Majavah: set deployment-db05 to read-only to avoid issues with [[phab:T276968|T276968]]
* 18:04 marxarelli: deleting shut down memc* deployment-prep instances to free up quota for replacement db instances ([[phab:T276968|T276968]])
* 17:25 marxarelli: seeing "[ 2886.337845] EXT4-fs error (device vda3): ext4_validate_block_bitmap:" for deployment-db05
* 17:22 marxarelli: restarting deployment-db05 via horizon
* 17:22 marxarelli: deployment-db05 seems to be acting up (intermittent connection failures) which is causing issues with beta-update-databases-eqiad, which is (possibly) causing post-merge jobs to pile up
* 16:47 marxarelli: still seeing "JobOffer[deployment-deploy01 #3] rejected beta-scap-eqiad: Waiting for next available executor on ‘deployment-deploy01’" despite available executors
* 16:27 marxarelli: builds once again being scheduled on deployment-deploy01
* 16:24 marxarelli: cycling gearman plugin on integration.wikimedia.org
* 16:16 marxarelli: taking deployment-deploy01 agent offline to mitigate stuck post-merge jobs
* 13:32 arturo: hard-reboot deployment-db05 because issues related to [[phab:T276922|T276922]]
* 12:34 arturo: briefly rebooting VM deployment-db05, we need to reboot its hypervisor cloudvirt1038 and failed to migrate to other


== February 13 ==
== 2021-03-08 ==
* 23:25 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/190231/ to deployment-salt for testing
* 21:38 brennen: Updating dev-images docker-pkg files on primary contint for https://gerrit.wikimedia.org/r/c/releng/dev-images/+/663159
* 14:03 Krinkle: Jenkins UI stuck in Spanish. Resetting configuration.
* 08:57 hashar: Nuked castor cache for labs/striker # [[phab:T276605|T276605]]
* 13:05 Krinkle: Reloading Zuul to deploy I0eaf2085576165b


== February 12 ==
== 2021-03-07 ==
* 11:11 hashar: changed passwords of selenium users.
* 17:46 James_F: Deleting deployment-snapshot01, shut off since 2020-10-03.
* 10:41 hashar: Removing MEDIAWIKI_PASSWORD* global env variables from Jenkins configuration {{bug|T89226}}
* 17:43 James_F: Deleting deployment-cumin02, shut off since 2020-10-16.
* 17:18 Majavah: shutdown deployment-memc[04-05] [[phab:T276707|T276707]]
* 16:51 Majavah: cherry pick 669436 and 669436 to deployment-puppetmaster04 [[phab:T276707|T276707]]
* 15:52 Majavah: redis::shards change shard01 from deployment-memc04 to deployment-memc08, shard02 from deployment-memc05 to deployment-memc10 [[phab:T276707|T276707]]
* 15:44 Majavah: create deployment-memc10 on Buster [[phab:T276707|T276707]], beta cluster is almost on full quota but will get better when old shutdown Jessie instances will be deleted
* 15:28 Majavah: remove shard04 (deployment-memc07) from redis::shards, switch shard03 from deployment-memc06 to deployment-memc09, [06-07] are both already shut down and 09 is a new in setup Buster machine to replace it, [[phab:T276707|T276707]] [[phab:T250585|T250585]]
* 13:14 Majavah: create deployment-memc09 on Buster [[phab:T276707|T276707]]


== February 11 ==
== 2021-03-06 ==
* 19:39 Krinkle: Jenkins UI is stuck in French. Resetting..
* 19:45 Majavah: restart deployment-logstash03 to see if it fixes it being empty
* 17:56 greg-g: hashar saved Jenkins global configuration at https://integration.wikimedia.org/ci/configure  to hopefully reset the web interface default locale
* 09:48 Majavah: cherry-pick https://gerrit.wikimedia.org/r/668995 on deployment-puppetmaster04 [[phab:T276654|T276654]]
* 09:57 hashar: restarting Jenkins to upgrade the Credentials plugin
* 08:09 Majavah: deployment-acme-chief change authorized regex for mx to use .eqiad1.wikimedia.cloud domain to fix [[phab:T276652|T276652]]
* 09:25 hashar: bunch of puppet failure since 8:00am UTC. Seems to be DNS timeouts.


== February 10 ==
== 2021-03-05 ==
* 09:18 hashar: reenabling puppet-agent on deployment-salt . Was disabled with no reason nor sal entry.
* 20:25 James_F: Disabling deployment-memc06 on the grounds that it's an unreferenced Jessie box we don't want any more [[phab:T250585|T250585]]
* 06:32 Krinkle: Fix lanthanum:/srv/ssd/jenkins-slave/workspace/mediawiki-extensions-zend@3/src/extensions/Flow/.git/config.lock
* 20:23 James_F: Disabling deployment-memc07 on the grounds that it's an unreferenced Jessie box we don't want any more [[phab:T250585|T250585]]
* 00:50 bd808: Updated integration/slave-scripts to "Load extensions using wfLoadExtensions() if possible" (b532a9a)
* 19:36 Majavah: release deployment-prep floating ip 185.15.56.7, was used for mailman upgrade which is now on its own project
* 19:30 Majavah: shutdown deployment-etcd-01 to see if anything breaks, will delete if nothing has broken during next week [[phab:T276462|T276462]]
* 19:15 Majavah: beta cluster etcd was switched from deployment-etcd-01 to deployment-etcd02 ref [[phab:T276462|T276462]]
* 17:50 Majavah: switch deployment-prep hiera key etcd_host to use deployment-etcd02 ref [[phab:T276462|T276462]]
* 13:40 Majavah: create deployment-etcd02 and sign its puppet certificate [[phab:T276462|T276462]]
* 13:13 Majavah: move profile::etcd::cluster_name hiera key from deployment-etcd prefix to deployment-etcd-01 vm specific
* 11:48 Majavah: live hack beta puppetmaster to fix hopefully trust store location; [[phab:T276521|T276521]] and possibly others
* 08:32 Majavah: deployment-logstash03 try to recreate /etc/rsyslog.d using puppet to try to repair [[phab:T241481|T241481]], directory is different on deployment-logstash2


== February 9 ==
== 2021-03-04 ==
* 22:40 Krinkle: Various mediawiki-extensions-zend builds are jammed half-way through phpunit execution (filed T89050)
* 15:47 hashar: Refreshing jobs based on releng/tox-buster to use latest image.  That brings in tox installed with python3 instead of python2 # [[phab:T276384|T276384]]
* 21:31 hashar: Deputized legoktm to the Gerrit 'integration' group. Brings +2 on integration/* repos.
* 15:00 Majavah: remove graphoid role from deployment-sca[01-02] ref [[phab:T276102|T276102]] and it being decommissioned in [[phab:T242855|T242855]]
* 20:38 hashar: reconnected jenkins slave agents 1006 1007 and 1008
* 13:18 Majavah: shutdown deployment-fluorine02 for a scream test for [[phab:T276419|T276419]], I believe everything has been moved to deployment-mwlog01
* 20:37 hashar: deleted /tmp on integration slaves 1006 1007 and 1008. Filled with npm temp directories
* 12:38 Majavah: `git rebase origin/production` on deployment-puppetmaster04 to update few settings for [[phab:T276419|T276419]]
* 15:51 hashar: integration : allowed ssh from gallium 208.80.154.135/32 to the instances
* 12:19 Majavah: Beta cluster is now using deployment-mwlog01 instead of deployment-fluorine02 for MediaWiki logs. fluorine02 is still used for some other misc services, these will be migrated soon
* 09:20 hashar: starting puppet agent on integration-puppetmaster
* 12:06 Majavah: deployment-prep Delete lists.beta.wmflabs.org DNS record, points to an unassigned floating IP and not used according to Amir
* 11:02 Majavah: live hacking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668338/ on deployment-deploy01 to test new deployment-mwlog01 ref [[phab:T276419|T276419]]
* 10:51 Majavah: stop bogus service udp2log on deployment-mwlog01, no idea what it is but it was using the same port as udp2log-mw.service is
* 09:20 hashar: Restored analytics/udp2log cause it got to be packaged for Buster # [[phab:T276422|T276422]] [[phab:T180301|T180301]]
* 07:47 legoktm: rebuilding php*-compile images https://gerrit.wikimedia.org/r/668259
* 06:33 Majavah: create Buster VM deployment-mwlog01 to eventually replace deployment-fluorine02 which is still on Stretch


== February 7 ==
== 2021-03-03 ==
* 16:23 hashar: puppet is broken on integration project for some reason. No clue what is going on :-( {{bug|T88960}}
* 20:30 legoktm: added Majavah as projectadmin in deployment-prep (Beta Cluster)
* 16:19 hashar: restarted puppetmaster on integration-puppetmaster.eqiad.wmflabs
* 19:59 James_F: Zuul: [mediawiki/services/function-schemata] Revert "Use bespoke pipeline jobs"
* 00:42 Krinkle: Jenkins is alerting for integration-slave1006, integration-slave1007 and integration-slave1008 having low /tmp space free (< 0.8GB)
* 16:49 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Run phan with Echo
* 11:40 addshore: reload zuul for https://gerrit.wikimedia.org/r/667623
* 10:33 hashar: Replaced java-codehealth-patch job in favor of running sonar:sonar inline in all the java jobs.  Thanks gehel!  # [[phab:T264873|T264873]] {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/666869/


== February 6 ==
== 2021-03-02 ==
* 22:40 Krinkle: Installed dsh on integration-dev
* 22:22 Krinkle: Run `sudo systemctl restart memcached` on deployment-mediawiki-07
* 05:46 Krinkle: Reloading Zuul to deploy I096749565 and I405bea9d3e
* 22:22 Krinkle: Set `profile::mediawiki::mcrouter_wancache::use_onhost_memcached: true` manually in Horizon for deployment-mediawiki-07 (TODO: Move to cloud/eqiad1 in operations/puppet.git).
* 01:35 Krinkle: Upgraded all integration slaves to npm v2.4.1


== February 5 ==
== 2021-03-01 ==
* 13:11 hasharAway: restarted Zuul server to clear out stalled jobs
* 18:25 marxarelli: deleting unused docker-registry-uploader jenkins credential
* 12:25 hashar: Upgrading puppet-lint from 0.3.2 to 1.1.0 on all repositories. All jobs are non-voting besides mediawiki-vagrant-puppetlint-lenient, which passes just fine with 1.1.0
* 14:41 andrewbogott: changed profile::redis::multidc::discovery from 'false' to "" to comply with strict typing in the deployment-memc puppet prefix.
* 03:21 Krinkle: Reloading Zuul to deploy I08a524ea195c
* 00:22 marxarelli: Reloaded Zuul to deploy Iebdd0d2ddd519b73b1fc5e9ce690ecb59da9b2db


== February 4 ==
== 2021-02-27 ==
* 10:43 hashar: beta-scap-eqiad job is broken because mwdeploy can no longer ssh from deployment-bastion to deployment-mediawiki01 . Filed as {{bug|T88529}}
* 22:03 Reedy: re-armed beta keyholder... I think...
* 10:30 hashar: piok


== February 3 ==
== 2021-02-26 ==
* 13:55 hashar: ElasticSearch /var/log/ filling up is {{bug|T88280}}
* 19:47 James_F: Zuul: [mediawiki/services/geoshapes] Add typescript service CI [[phab:T274380|T274380]]
* 09:15 hashar: Running puppet on deployment-eventlogging02 has been stalled for 3d15h. No log :-(
* 01:11 legoktm: update credentials in https://integration.wikimedia.org/ci/credentials/store/system/domain/service-pipeline/credential/docker-registry-uploader/ for new ci-build user ([[phab:T275559|T275559]])
* 09:08 hashar: cleaning /var/log on deployment-elastic06 and deployment-elastic07
* 00:44 Krinkle: Restarting Jenkins-Gearman connection


== February 2 ==
== 2021-02-24 ==
* 21:39 Krinkle: Deployed I94f65b56368 and reloading Zuul
* 22:47 James_F: Docker: Actually re-building Rust images for 1.50.0
* 22:16 legoktm: rebuilding Rust docker images


== January 31 ==
== 2021-02-23 ==
* 20:31 hashar: canceling a bunch of browser tests jobs that are deadlocked waiting for SauceLabs.  The http request has no timeout {{bug|T88221}}
* 18:20 James_F: Zuul: [mediawiki/services/function-schemata] Add generic pipeline CI
* 16:24 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Test with Echo (Notifications)


== January 29 ==
== 2021-02-20 ==
* 01:39 James_F: Restarting Jenkins because deployment-bastion.eqiad isn't depooling even after restart.
* 18:35 James_F: Zuul: [mediawiki/services/function-evaluator] Drop direct CI; uses pipeline
* 00:47 Krenair: running instructions at https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code.2Fdb_update
* 18:13 James_F: Zuul: [mediawiki/extensions/LockAuthor] Enable basic quibble CI
* 00:26 Krinkle: integration-slave1007 rm -rf /mnt/jenkins-workspace/workspace/oojs*
* 00:19 Krinkle: Jenkins slave on deployment-bastion.eqiad has been stuck for the past 5 hours


== January 28 ==
== 2021-02-19 ==
* 22:53 Krinkle: integration-slave1007 rm -rf /mnt/jenkins-workspace/workspace/mwext-DonationInterface-np*
* 13:51 hashar: Reupdating tox jobs since https://gerrit.wikimedia.org/r/c/integration/config/+/664897  did not get merged
* 22:43 Krinkle: /srv/deployment/integration/slave-scripts got corrupted by puppet on labs slaves. No longer has the appropriate permission flags.
* 13:49 hashar: Updating Jenkins jobs for "Remove dependency on Maven binaries and wrapper script." {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/651791/
* 16:52 marktraceur: restarting nginx on deployment-upload so beta images might work again


== January 27 ==
== 2021-02-18 ==
* 18:54 Krinkle: rm -rf integration-slave1007 mwext-VisualEditor-*
* 01:50 Urbanecm: Kill stuck beta-scap-eqiad job and start a new one to sync beta
* 00:06 brennen: gerrit: added abstract-wikipedia to members for extension-WikiLambda, mediawiki-services-function-schemata


== January 26 ==
== 2021-02-17 ==
* 23:22 bd808: rm integration-slave1006:/mnt/jenkins-workspace/workspace/mediawiki-phpunit-hhvm/src/.git/HEAD.lock (file was timestamped Jan 22 23:55)
* 20:05 hashar: Updating all Jenkins jobs for https://gerrit.wikimedia.org/r/664897 # [[phab:T275049|T275049]]
* 21:06 bd808: I just merged a scap change that probably will break the beta-recomile-math-textvc-eqiad job -- https://gerrit.wikimedia.org/r/#/c/186808/
* 17:59 hashar: Building Docker images for https://gerrit.wikimedia.org/r/c/integration/config/+/664680  # [[phab:T275049|T275049]]
* 03:26 James_F: Zuul: [mediawiki/core] PHP 8.0 version of composertest job to experimental


== January 24 ==
== 2021-02-16 ==
* 01:05 hashar: restarting Jenkins (deadlock on deployment-bastion slave)
* 20:33 brennen: updating gitlab-test.wmcloud.org to 13.8.4-ce.0
* 19:58 hashar: Updating Jenkins job wikimedia-fundraising-civicrm-docker to stop cloning the drupal repository # [[phab:T273822|T273822]]
* 17:02 greg-g: doing the https://www.mediawiki.org/wiki/Continuous_integration/Jenkins#Hung_beta_code/db_update dance
* 04:18 James_F: Manually updated doc1001 via https://www.mediawiki.org/wiki/Continuous_integration/Documentation_generation#Updating_the_doc.wikimedia.org_site
* 04:00 James_F: Zuul: Add Tim Abdullin from S&F to CI allow list


== January 20 ==
== 2021-02-15 ==
* 18:50 Krinkle: Reconfigure Jenkins default language back to 'en' as it was set to Turkish
* 15:58 hashar: Successfully published image docker-registry.discovery.wmnet/releng/operations-puppet:0.8.1 # [[phab:T209953|T209953]]


== January 17 ==
== 2021-02-14 ==
* 20:20 James_F: Brought deployment-bastion.eqiad back online, but without effect AFAICS.
* 21:31 James_F: Zuul: Add 'check php' support for library repos
* 20:19 James_F: Marking deployment-bastion.eqiad as temporarily offline to try to fix the backlog.
* 20:12 James_F: Zuul: [mediawiki/services/graphoid] Archive [[phab:T274738|T274738]]


== January 16 ==
== 2021-02-13 ==
* 23:26 bd808: cherry-picked https://gerrit.wikimedia.org/r/#/c/185570/ to fix puppet errors on deployment-prep
* 03:50 James_F: Zuul: [mediawiki/libs/IDLeDOM] Turn on jenkins CI for the `idle-dom` library
* 12:43 _joe_: added hhvm.pcre_cache_type = "lru" to beta hhvm config
* 12:32 _joe_: installing the new HHVM package on mediawiki hosts
* 11:59 akosiaris: removed ferm from all beta hosts via salt


== January 15 ==
== 2021-02-12 ==
* 17:06 greg-g: turned off the beta-scap-eqiad jenkins job due to the persistent failing (https://phabricator.wikimedia.org/T86901) and the impending labs outage
* 17:19 brennen: Publishing from dev-images docker-pkg files on primary contint for fr-tech images
* 14:50 hashar: beta-scap-eqiad broken since ~ 7:52am UTC.  Depends on mwdeploy user homedir to be fixed in LDAP https://phabricator.wikimedia.org/T86903
* 12:05 Lucas_WMDE: canceled one beta-scap-eqiad job per https://w.wiki/J5$
* 10:55 hashar: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ is broken since roughly 7:52am UTC.


== January 14 ==
== 2021-02-11 ==
* 23:22 mutante: cherry-picked I1e5f9f7bcbbe6c4 on deployment-bastion
* 21:44 Krinkle: Logstash in beta is not receiving any events [[phab:T274593|T274593]]
* 20:37 hashar: Restarting Zuul
* 17:36 James_F: Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests
* 20:36 hashar: Zuul applied Ori patch to fix a git lock contention in Zuul-cloner {{bug|T86730}} . Tagged wmf-deploy-20150114-1
* 17:14 James_F: Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension [[phab:T274069|T274069]]
* 16:58 greg-g: rm -rf'd the Wikigrok checkout in integration-slave1006:/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions to (hopefully) fix https://phabricator.wikimedia.org/T86730
* 09:50 hashar: Successfully built Docker images for Quibble 0.0.46
* 14:56 anomie: Cherry-pick https://gerrit.wikimedia.org/r/#/c/173336/11/ to Beta Labs
* 09:07 hashar: Building Quibble 0.0.46 Docker images on contint1001 (it is faster than contint2001)
* 02:05 bd808: There is some kind of race / conflict with the mediawiki-extensions-hhvm; I cleaned up the same error for a different extension yesterday
* 01:24 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/663339
* 02:04 bd808: integration-slave1006 IOError: Lock for file '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/WikiGrok/.git/config' did already exist, delete '/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/WikiGrok/.git/config.lock' in case the lock is illegal


== January 13 ==
== 2021-02-10 ==
* 22:37 hashar: Restarted Zuul, deadlocked waiting for Gerrit
* 22:55 longma: Deploying zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/661796
* 21:38 ori: deployment-prep upgraded nutcracker on mw1/mw2 to 0.4.0+dfsg-1+wm1
* 20:59 brennen: Attempting one more update from dev-images docker-pkg on contint2001 for [[phab:T274306|T274306]]
* 17:49 hashar: If Zuul status page ( https://integration.wikimedia.org/zuul/ ) shows a lot of changes with completed jobs and the number of results growing, Zuul is deadlocked waiting for Gerrit. Have to restart it on gallium.wikimedia.org with /etc/init.d/zuul restart
* 18:36 Urbanecm: deployment-prep: Run scap sync-world as jenkins-deploy
* 17:43 hashar: Restarted deadlocked Zuul , which drops ALL events.  Reason is Gerrit lost connection with its database which is not handled by Zuul . See https://wikitech.wikimedia.org/wiki/Incident_documentation/20150106-Zuul
* 18:36 Urbanecm: deployment-prep deploy01: Run cd /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler && git reset HEAD * && git checkout -- * to fix disappeared extension
* 17:32 James_F: No effect from restarting Gearman. Getting Timo to restart Zuul.
* 17:30 James_F: No effect. Restarting Gearman.
* 17:26 James_F: Trying a shutdown/re-enable of Jenkins.
* 13:59 YuviPanda: running scap via jenkins, hitting buttons on https://integration.wikimedia.org/ci/job/beta-scap-eqiad/
* 13:58 YuviPanda: scap failed
* 13:58 YuviPanda: running scap, because why not
* 13:58 YuviPanda: modified PrivateSettings.php to make it use wikiadmin user rather than mw user
* 13:51 YuviPanda: created user wikiadmin on deployment-db1
* 04:31 James_F: Zuul now appears fixed.
* 04:29 marktraceur: FORCE RESTART ZUUL (James_F told me to)
* 04:28 marktraceur: Attempting graceful zuul restart
* 04:26 marktraceur: Reloaded zuul to see if it will help
* 04:24 James_F: Took the gallium Jenkins slave offline, disconnected and relaunched; no effect.
* 04:19 James_F: Disabled and re-enabled Gearman, no effect.
* 04:15 James_F: Flagged and unflagged Jenkins for restart, no effect.
* 04:10 James_F: Jenkins/zuul/whatever not working, investigating.
* 01:12 marxarelli: Added twentyafterfour as an admin to the integration project
* 01:08 bd808: Added Dduvall as an admin in the integration project
* 00:55 bd808: zuul is plugged up because a gate-and-submit job failed on integration-slave1006 (ZeroBanner clone problem) and then the patch was force merged
* 00:48 bd808: deleted integration-slave1006:/mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/extensions/ZeroBanner to try and clear the git clone problem there
* 00:35 bd808: git clone failure in https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm/131/console blocking merge of core patch


== January 12 ==
== 2021-02-09 ==
* 21:17 hashar: qa-morebots moved from #wikimedia-qa to #wikimedia-releng  {{bug|T86053}}
* 22:27 brennen: once more, with feeling: attempting docker-pkg run for dev-images again
* 20:57 greg-g: yuvi removed webserver:php5-mysql role from  deployment-sentry2, thus getting puppet on it to unfail
* 22:09 brennen: attempting to run docker-pkg manually for dev-images on contint2001
* 20:57 greg-g: test-qa
* 21:08 brennen: Updating dev-images docker-pkg files on primary contint for [[gerrit:635361]] and [[gerrit:632173]]
* 11:41 hashar: foo
* 20:10 James_F: layout: [operations/software/wmfmariadbpy] Use tox, not tox-mysqld
* 10:28 hashar: Removing Jenkins IRC notifications from #wikimedia-qa , please switch to #wikimedia-releng
* 19:31 James_F: Docker: Building and publishing tox-buster &c. with tox 3.21.4 [[phab:T274232|T274232]]
* 09:06 hashar: Tweak Zuul configuration to pin python-daemon <= 2.0  and deploying tag wmf-deploy-20150112-1. {{bug|T86513}}
* 04:43 Krinkle: Submitted wikimedia/minify to Packagist. https://packagist.org/packages/wikimedia/minify ref [[phab:T273247|T273247]]
* 04:42 Krinkle: Submitted wikimedia/minify to Packagist. https://packagist.org/packages/wikimedia/minify


== January 8 ==
== 2021-02-08 ==
* 19:21 Krinkle: Force restart Zuul
* 21:40 brennen: enabled administrative approval for new account signups on gitlab-test
* 19:21 Krinkle: Gearman is back up but Zuul itself still stuck (no longer processing new events, doing "Updating information for .." for the same three jobs over and over again)
* 21:40 brennen: upgraded gitlab instance on gitlab-test from 13.3.3-ce.0 to 13.8.3-ce.0
* 19:08 Krinkle: Relaunched Gearman from Jenkins manager
* 17:22 hashar: Built image docker-registry.discovery.wmnet/releng/quibble-buster-php72:0.0.45-s4
* 19:05 Krinkle: Zuul/Gearman stuck
* 17:17 hashar: Building some docker images on contint.wikimedia.org
* 18:26 YuviPanda: purged nscd cache on all deployment-prep hosts
* 16:51 hashar: Now really reloading zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/662729
* 16:34 Krinkle: Reload Zuul to deploy I9bed999493feb715
* 16:43 hashar: Reloading zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/662729
* 14:58 hashar: [[Nova_Resource:Contintcloud|contintcloud labs project]] has been created! {{bug|T86170}}. Added Krinkle and 20after4 as project admins.
* 14:55 hashar: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/658946  # [[phab:T272863|T272863]]
* 14:44 hashar: on gallium and lanthanum, pushing integration/jenkins.git which would: 1b6a290 - Upgrade JSHint from v2.5.6 to 2.5.11
* 09:06 _joe_: cherry-picked  https://gerrit.wikimedia.org/r/c/operations/puppet/+/657139 in deployment-prep


== January 7 ==
== 2021-02-06 ==
* 10:57 hashar: Taught Jenkins configuration about Java 8. Name: "Ubuntu - OpenJdk 8"  JAVA_HOME: /usr/lib/jvm/java-8-openjdk-amd64/  . Only available on Trusty slaves though
* 21:13 Reedy: unstuck beta jobs
* 10:56 hashar: installed openjdk 8 on CI Trusty labs slaves https://phabricator.wikimedia.org/T85964
* 10:34 hashar: varnish text cache is back up. Had to delete /etc/varnish and reinstall varnish from scratch + rerun puppet.
* 10:25 hashar: deleting /etc/varnish on deployment-cache-text02 and running puppet
* 10:24 hashar: beta varnish text cache is broken. The vcl refuses to load because of undefined probes
* 10:01 hashar: restarted deployment-cache-mobile03 and deployment-cache-text02
* 09:49 hashar: rebooting deployment-cache-bits01
* 00:41 Krinkle: rm -rf slave-scripts and re-cloning from integration/jenkins.git on all slaves (under sudo, just like puppet originally did) - git-status and jshint both work fine now
* 00:40 Krinkle: Permissions of deployment/integration/slave-scripts on labs slave are all screwed up (git-status says files are dirty, but when run as root git-status is clean and jshint also works fine via sudo)
* 00:29 Krinkle: Tried reconnecting Gearman, relaunching slave agents. Force-restarting Zuul now.
* 00:15 Krinkle: Permissions in deployment/integration/slave-scripts on integration-slave1003 are screwed up as well


== January 6 ==
== 2021-02-04 ==
* 22:13 hashar: jshint complains with:  Error: Cannot find module './lib/node'  :-(
* 11:40 Lucas_WMDE: canceled one beta-scap-eqiad job per https://w.wiki/J5$
* 22:12 hashar: integration-slave1005 chmod -R go+r /srv/deployment/integration/slave-scripts
* 00:17 James_F: Zuul: [mediawiki/libs/Minify] Install initial CI [[phab:T273247|T273247]]
* 22:08 hashar: integration-slave1007 chmod -R go+r /srv/deployment/integration/slave-scripts  . cscott mentioned build failures of parsoidsvc-jslint  which could not read /srv/deployment/integration/slave-scripts/tools/node_modules/jshint/src/cli.js
* 02:29 ori: qdel -f'd qa-morebots and started a new instance


== December 22 ==
== 2021-02-03 ==
* 20:06 bd808: Saved settings in https://integration.wikimedia.org/ci/configure to get jenkins ui language back to english from korean
* 21:38 James_F: Zuul: Archive VirtualKeyboard extension [[phab:T273801|T273801]]
* 00:30 James_F: Docker: Publish rust images with default-libmysqlclient-dev
* 00:24 James_F: Zuul: [mediawiki/extensions/UseResource] Rename from TemplateScripts
* 00:17 James_F: Zuul: Enable CI for mediawiki/libs/Dodo and mediawiki/libs/WebIDL [[phab:T273295|T273295]]


== December 21 ==
== 2021-02-02 ==
* 08:31 Krinkle: /var on integration-slave1005 had 93% of 2GB full. Removed some large items in /var/cache/apt/archives that seemed unneeded and don't exist on other slaves.
* 19:43 hashar: Pruning dangling Docker images on contint2001
* 19:39 hasharDinner: Pruning dangling Docker images on contint1001
* 19:28 James_F: Zuul: [mediawiki/extensions/PageNotice] Tag as in-wikimedia-production, move [[phab:T61245|T61245]]
* 11:27 hashar: gerrit: fixed notifications queries having single quotes instead of double quotes for qchris, arturo and twentyafterfour
* 10:59 hashar: Marking https://integration.wikimedia.org/ci/computer/compiler1002.puppet-diffs.eqiad.wmflabs/ as offline due to disk space issue # [[phab:T273599|T273599]]


== December 19 ==
== 2021-02-01 ==
* 23:01 greg-g: Krinkle restarted Gearman, which got the jobs to flow again
* 16:04 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/mediawiki/core/+/660063 in beta.
* 20:51 Krinkle: integration-slave1005 (new Ubuntu Trusty instance) is now pooled
* 10:55 Urbanecm: deployment-prep: Create beta votewiki ([[phab:T272608|T272608]])
* 18:51 Krinkle: Re-created and provisioning integration-slave1005 (UbuntuTrusty)
* 09:15 hashar: devtools: switched gerrit-prod-1001 to local puppetmaster
* 18:23 bd808: redis input to logstash stuck; restarted service
* 18:16 bd808: ran `apt-get dist-upgrade` on logstash01
* 18:02 bd808: removed local mwdeploy user & group from videoscaler01
* 18:01 bd808: deployment-videoscaler01 has mysteriously acquired a local mwdeploy user instead of the ldap one
* 17:58 bd808: forcing puppet run on deployment-videoscaler01
* 07:24 Krinkle: Restarting Gearman connection to Jenkins
* 07:24 Krinkle: Attempt #5 at re-creating integration-slave1001. Completed provisioning per Setup instructions. Pooled.
* 05:33 Krinkle: Rebasing integration-puppetmaster with latest upstream operations/puppet (5 local patches) and labs/private
* 00:06 bd808: restored local commit with ssh keys for scap to deployment-salt


== December 18 ==
== 2021-01-29 ==
* 23:57 bd808: temporarily disabled jenkins scap job
* 18:51 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes
* 23:56 bd808: killed some ancient screen sessions on deployment-bastion
* 23:53 bd808: Restarted udp2log-mw on deployment-bastion
* 23:53 bd808: Restarted salt-minion on deployment-bastion
* 23:47 bd808: Updated scap to latest HEAD version
* 21:57 Krinkle: integration-slave1005 is not ready. It's incompletely setup due to https://phabricator.wikimedia.org/T84917
* 19:29 marxarelli: restarted puppetmaster on deployment-salt
* 19:29 marxarelli: seeing "Could not evaluate: getaddrinfo: Temporary failure in name resolution" in the deployment-* puppet logs
* 14:17 hashar: deleting instance deployment-parsoid04 and removing it from Jenkins
* 14:08 hashar: restarted varnish backend on parsoidcache02
* 14:00 hashar: parsoid05 seems happy: curl http://localhost:8000/_version: <tt>{"name":"parsoid","version":"0.2.0-git","sha":"d16dd2db6b3ca56e73439e169d52258214f0aeb2"}</tt>
* 13:56 hashar: applying latest changes of Parsoid on parsoid05 via: <tt>zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/services/parsoid --change 180671,2</tt>
* 13:56 hashar: parsoid05: disabling puppet, stopping parsoid, rm -fR /srv/deployment/parsoid ; rerunning the Jenkins beta-parsoid-update-eqiad to hopefully recreate everything properly
* 13:52 hashar: making parsoid05 a Jenkins slave to replace parsoid04
* 13:24 hashar: apt-get upgrade on parsoidcache02 and parsoid04
* 13:23 hashar: updated labs/private on puppet master to fix a puppet dependency cycle with sudo-ldap
* 13:19 hashar: rebased puppetmaster repo
* 12:53 hashar: reenqueuing last merged change of Parsoid in Zuul postmerge pipeline in order to trigger the beta-parsoid-update-eqiad job properly. <tt>zuul enqueue --trigger gerrit --pipeline postmerge --project mediawiki/services/parsoid --change 180671,2</tt>
* 12:52 hashar: deleting the workspace for the beta-parsoid-update-eqiad jenkins job on deployment-parsoid04 . Some files belong to root, which prevents the job from processing
* 09:13 hashar: enabled MediaWiki core 'structure' PHPUnit tests for all extensions.  Will require folks to fix their incorrect AutoLoader and ResourceLoader entries. {{gerrit|180496}} {{bug|T78798}}


== December 17 ==
== 2021-01-28 ==
* 21:02 hashar: cancelled all browser tests, suspecting them to deadlock Jenkins somehow :(
* 22:15 marxarelli: deleting unused integration-registry-1003 instance
* 17:46 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/659339
* 17:42 marxarelli: updating jenkins jobs to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/659339


== December 16 ==
== 2021-01-27 ==
* 17:17 bd808: git-sync-upstream runs cleanly on deployment-salt again!
* 10:29 apergos: decommissioned deployment-snapshot01 at last, long since replaced by deployment-snapshot02
* 17:16 bd808: removed cherry pick of Ib2a0401a7aa5632fb79a5b17c0d0cef8955cf990 (-2 by _joe_; replaced by Ibcad98a95413044fd6c5e9bd3c0a6fb486bd5fe9)
* 17:15 bd808: removed cherry pick of I3b6e37a2b6b9389c1a03bd572f422f898970c5b4 (modified in gerrit by bd808 and not repicked; merged)
* 17:15 bd808: removed cherry pick of I08c24578596506a1a8baedb7f4a42c2c78be295a (-2 by _joe_ in gerrit; replaced by Iba742c94aa3df7497fbff52a856d7ba16cf22cc7)
* 17:13 bd808: removed cherry pick of I6084f49e97c855286b86dbbd6ce8e80e94069492 (merged by Ori with a change)
* 17:09 bd808: trying to fix it without using important changes
* 17:08 bd808: deployment-salt:/var/lib/git/operations/puppet is a rebase hell of cherry-picks that don't apply
* 13:51 hashar: deleting integration-slave1001 and recreating it. It is blocked on boot and we can't console on it https://phabricator.wikimedia.org/T76250


== December 15 ==
== 2021-01-26 ==
* 23:24 Krinkle: integration-slave1001 isn't coming back (T76250), building integration-slave1005 as its replacement.
* 18:27 marxarelli: restarting jenkins on releases-jenkins.wikimedia.org following plugin updates
* 12:53 YuviPanda: manually restarted diamond on all betalabs host, to see if that is why metrics aren’t being sent anymore
* 18:26 marxarelli: updating pipeline plugins on releases-jenkins.wikimedia.org
* 09:41 hashar: deleted hhvm core files in /var/tmp/core from both mediawiki01 and mediawiki02 {{T1259}} and {{T71979}}
* 18:26 marxarelli: updating git plugins on releases-jenkins.wikimedia.org
* 16:06 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/658449
* 05:43 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/658355


== December 13 ==
== 2021-01-25 ==
* 18:51 bd808: Running chmod -R g+s /data/project/upload7 on deployment-mediawiki02
* 15:57 Lucas_WMDE: canceled one beta-scap-eqiad job per https://w.wiki/J5$
* 18:25 bd808: Running chmod -R u=rwX,g=rwX,o=rX /data/project/upload7 from deployment-mediawiki02
* 18:16 bd808: chown done for /data/project/upload7
* 17:51 bd808: Running chown -R apache:apache on /data/project/upload7 from deployment-mediawiki02
* 17:11 bd808: Labs DNS seems to be flaking out badly and causing random scap and puppet failures
* 16:58 bd808: restarted puppetmaster on deployment-salt
* 16:31 bd808: apache user renumbered on deployment-mediawiki03
* 16:23 bd808: apache and hhvm restarted on beta app servers following apache user renumber
* 16:09 bd808: apache and hhvm stopped on beta app server tier. All requests expected to return 503 from varnish
* 16:03 bd808: Starting work on [[phab:T78076]] to renumber apache users in beta
* 08:21 YuviPanda|zzz: forcing puppet run on all deployment-prep hosts


== December 12 ==
== 2021-01-22 ==
* 22:38 bd808: Fixed scap by deleting /srv/mediawiki/~tmp~ on deployment-rsync01
* 08:07 legoktm: manually started mwcore-phpunit-coverage-master job with 6hr timeout
* 22:27 hashar: Creating 1300 Jenkins jobs to run extensions PHPUnit tests under either HHVM or Zend  PHP flavors.
* 18:35 bd808: Added puppet config to record !log messages in logstash
* 17:32 bd808: forcing puppet runs on deployment-mediawiki0[12]; hiera settings specific to beta were not applied on the hosts leading to all kinds of problems
* 17:12 bd808: restarted hhvm on deployment-mediawiki0[12] and purged hhbc database
* 17:00 bd808: restarted apache2 on deployment-mediawiki01
* 16:59 bd808: restarted apache2 on deployment-mediawiki02


== December 11 ==
== 2021-01-19 ==
* 22:13 hashar: Adding chrismcmahon to the 'integration' Gerrit group so he can +2 changes made to integration/config.git
* 17:11 James_F: Zuul: [mediawiki/services/function-orchestrator] Add pipeline CI [[phab:T271761|T271761]]
* 21:47 hashar: Jenkins re adding [https://integration.wikimedia.org/ci/computer/integration-slave1009/ integration-slave1009] to the pool of slaves
* 19:45 bd808|LUNCH: I got nerd sniped into looking at beta. Major personal productivity failure.
* 19:43 bd808|LUNCH: nslcd log noise is probably a red herring -- https://access.redhat.com/solutions/58684
* 19:39 bd808|LUNCH: lots of nslcd errors in syslog on deployment-rsync01 which may be causing scap failures
* 07:45 YuviPanda: shut up shinken-wm


== December 10 ==
== 2021-01-18 ==
* 22:17 bd808: restarted logstash on logstash1001. redis event queue not being processed
* 23:38 James_F: Zuul: [labs/tools/bodh-backend] Provide CI with tox-docker [[phab:T272320|T272320]]
* 10:30 hashar: Adding hhvm on Trusty slaves, using depooled integration-slave1009 as the main work area


== December 9 ==
== 2021-01-17 ==
* 16:33 bd808: restarted puppetmaster to pick up changes to custom functions
* 03:44 James_F: Zuul: [mediawiki/core] Add composer (not vendor) experimental PHP 8.0 job [[phab:T248925|T248925]]
* 16:19 bd808: forced install of sudo-ldap across beta with: salt '*' cmd.run 'env SUDO_FORCE_REMOVE=yes DEBIAN_FRONTEND=noninteractive apt-get -y install sudo-ldap'


== December 8 ==
== 2021-01-16 ==
* 23:45 bd808: deleted hhvm core on mediawiki01
* 23:24 James_F: Docker: Building cascade of new php-ast image [[phab:T271428|T271428]]
* 23:43 bd808: Ran `apt-get clean` on deployment-mediawiki01


== December 5 ==
== 2021-01-14 ==
* 22:21 bd808: 1.1G free on deployment-mediawiki02:/var after removing a lot of crap from logs and /var/tmp/cores
* 19:08 James_F: Zuul: [mediawiki/extensions/HeadScript] Add quibble job
* 22:06 bd808: /var full on deployment-mediawiki02 again :(((
* 10:50 hashar: applying mediawiki::multimedia class on contint slaves ( https://phabricator.wikimedia.org/T76661 | https://gerrit.wikimedia.org/r/#/c/177770/ )
* 01:01 bd808: Deleted a ton of jeprof.*.heap files from deployment-mediawiki02:/
* 00:54 YuviPanda: cleared out pngs from mediawiki02 to kill low space warning
* 00:53 YuviPanda: mediawiki02 instance is low on space, /tmp has lots of... pngs?


== December 4 ==
== 2021-01-13 ==
* 22:48 YuviPanda: manually rebased puppet on deployment-prep
* 15:26 hashar: Pruned Docker containers and images on all Docker based Jenkins agents
* 00:29 bd808: deleted instance "udplog"


== December 3 ==
== 2021-01-12 ==
* 19:11 bd808: Cleaned up legacy jobrunner scripts on deployment-jobrunner01 (/etc/default/mw-job-runner /etc/init.d/mw-job-runner /usr/local/bin/jobs-loop.sh)
* 21:15 brennen: Updating dev-images docker-pkg files on primary contint for https://gerrit.wikimedia.org/r/c/releng/dev-images/+/640567
* 20:13 James_F: Zuul: Remove Disambiguator from Parsoid dependencies (again)
* 20:06 James_F: Zuul: Add parsoid as a dependency of the Disambiguator extension
* 20:03 James_F: Zuul: Allow parsoid to be added to dependency and gatedextensions lists
* 19:33 James_F: Zuul: Revert Parsoid integration job injection.
* 02:35 James_F: Zuul: [mediawiki/vendor] Experimental composer-php80 job, not 72
* 01:11 James_F: Zuul: Ensure Parsoid's integration job tests against the Disambiguator extension [[phab:T237538|T237538]]
* 01:07 James_F: Zuul: [labs/tools/stewardbots] Enable PHP 8.0 jobs; drop special template


== December 2 ==
== 2021-01-11 ==
* 23:39 bd808: Cause of full disk on deployment-mediawiki01 was an hhvm core file; fixed now
* 08:59 hashar: gerrit: created integration/jenkinsci/gearman-plugin.git to maintain the Jenkins Gearman plugin # [[phab:T271683|T271683]]
* 23:35 bd808: /var full on deployment-mediawiki01
* 11:27 hashar: deleting /srv/vdb/varnish* files on all varnish instances ( https://phabricator.wikimedia.org/T76091 )
* 10:23 hashar: restarted parsoid on deployment-parsoid05
* 05:26 Krinkle: integration-slave1001 has been down since the failed reboot on 28 November 2014. Still unreachable over ssh and no Jenkins slave agent.


== December 1 ==
== 2021-01-09 ==
* 18:54 bd808: Got jenkins updates working again by taking deployment-bastion node offline, killing waiting jobs and bringing it back online again.
* 04:30 James_F: Zuul: [mediawiki/libs/RemexHtml] Enable PHP 8.0 jobs, now passing [[phab:T271575|T271575]]
* 18:51 bd808: updates in beta suck with the "Waiting for next available executor" deadlock again
* 04:30 James_F: Zuul: [mediawiki/libs/Equivset] Enable PHP 8.0 jobs, now passing [[phab:T271575|T271575]]
* 17:59 bd808: Testing rsyslog event forwarding to logstash via puppet cherry-pick


== November 27 ==
== 2021-01-08 ==
* 12:28 hashar: enabled puppet master autoupdate by setting <tt>puppetmaster_autoupdate: true</tt> in [[Hiera:Integration]] . https://phabricator.wikimedia.org/T75878
* 02:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/654964
* 12:28 hashar: rebased integration puppetmaster : 5d35de4..1a5ebee
* 00:32 bd808: Testing local hack on deployment-salt to switch order of hiera backends
* 00:16 bd808: Testing a proposed puppet patch to allow pointing hhvm logs back to deployment-bastion


== November 26 ==
== 2021-01-07 ==
* 00:51 bd808: cherry-picked patch for redis logstash input from MW {{gerrit|175896}}
* 18:34 James_F: Zuul: Provide experimental PHP 7.2-on-buster jobs [[phab:T252434|T252434]]
* 00:50 bd808: Restored puppet cherry-picks from reflog [[phab:T75947]]
* 18:32 James_F: Zuul: [labs/tools/massmailer] Add gate-and-submit-l10n jobs [[phab:T271426|T271426]]
* 16:01 hashar: Tag Quibble 0.0.46 @ {{Gerrit|df9e75329ab}} # [[phab:T225218|T225218]] [[phab:T266441|T266441]] [[phab:T263500|T263500]]


== November 25 ==
== 2021-01-06 ==
* 23:45 hashar: Fixed upload cache on beta cluster, the Varnish backend had a mmap SILO error that prevented the backend from starting. https://phabricator.wikimedia.org/T75922
* 19:36 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/620729 in beta puppet.
* 21:05 bd808: Running `sudo find . -type d ! -perm -o=w -exec chmod 0777 {} +` to fix upload permissions
* 18:33 dpifke: Doing a test run in beta of the scap commands I'm going to run in today's backport window to roll out profiler changes.
* 18:01 legoktm: cleared out renameuser_status table (old broken global merges)
* 18:00 legoktm: 4086 rows deleted from localnames, 3929 from localuser
* 17:59 legoktm: clearing out localnames/localuser where wikis don't exist on beta
* 17:10 legoktm: ran migratePass0.php on all wikis
* 17:09 legoktm: ran checkLocalUser.php --delete on all wikis
* 17:08 legoktm: PHP Notice:  Undefined index: wmgExtraLanguageNames in /mnt/srv/mediawiki/php-master/includes/SiteConfiguration.php on line 307
* 17:07 legoktm: ran checkLocalNames.php --delete on all wikis
* 04:37 jgage: restarted jenkins at 20:31


== November 24 ==
== 2021-01-05 ==
* 17:24 greg-g: stupid https
* 23:55 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/651267 and https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/621095/3 in beta. Should be a no-op (removes hopefully unused code).
* 16:40 bd808|deploy: My problem with en.wikipedia.beta.wmflabs.org was caused by a forceHTTPS cookie being set in my browser and redirecting to the broken https endpoint
* 16:33 bd808|deploy: scap fixed by reverting bad config patch; still looking into failures from en.wikipedia.beta.wmflabs.org
* 16:27 bd808: Looking at scap crash
* 15:18 YuviPanda: restored local hacks + fixed 'em to account for 47dcefb74dd4faf8afb6880ec554c7e087aa947b on deployment-salt puppet repo, puppet failures recovering now


== November 21 ==
== 2021-01-04 ==
* 17:06 bd808: deleted salt keys for deleted instances: i-00000289, i-0000028a, i-0000028b, i-0000028e, i-000002b7, i-000006ad
* 22:48 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.0 composer job [[phab:T269719|T269719]]
* 15:57 hashar: fixed puppet cert on deployment-restbase01
* 22:30 hasharAway: IRC notifications from Jenkins / wmf-insecte disabled  for now due to [[phab:T271122|T271122]]
* 15:50 hashar: deployment-sca01 regenerating puppet CA for deployment-sca01
* 21:08 hasharAway: Change Jenkins IRC login to mw-jenkinsbot # [[phab:T271122|T271122]]
* 15:34 hashar: Regenerated puppet master certificate on deployment-salt. It needs to be named  deployment-salt.eqiad.wmflabs not i-0000015c.eqiad.wmflabs.  Puppet agent works on deployment-salt now.
* 17:33 thcipriani: fixed beta-scap-eqiad by removing local mwdeploy user/group using vipw/vigr and chown -R mwdeploy:mwdeploy /srv/mediawiki for deployment-prep hosts
* 15:19 hashar: I have revoked the deployment-salt certificates. All puppet agents are thus broken!
* 15:01 hashar: deployment-salt cleaning certs with puppet cert clean
* 14:52 hashar: manually switching restbase01 puppet master from virt1000 to deployment-salt.eqiad.wmflabs
* 14:50 hashar: deployment-restbase01 has some puppet error: Error 400 on SERVER: Must provide non empty value. on node i-00000727.eqiad.wmflabs . That is due to puppet pickle() function being given an empty variable


== November 20 ==
== 2021-01-02 ==
* 15:25 hashar: 15:01 Restarted Jenkins AND Zuul.  Beta cluster jobs are still deadlocked.
* 19:13 James_F: Zuul: Add CI for the Mirage skin [[phab:T270979|T270979]]
* 13:21 hashar: for integration, set puppet master report retention to 360 minutes ( https://wikitech.wikimedia.org/wiki/Hiera:Integration , see https://bugzilla.wikimedia.org/show_bug.cgi?id=73472#c14 )
* 13:20 hashar: rebased puppet master on integration project
* 13:20 hashar: rebased puppet master


== November 19 ==
== 2021-01-01 ==
* 21:27 bd808: Ran `GIT_SSH=/var/lib/git/ssh git pull --rebase` in deployment-salt:/srv/var-lib/git/labs/private
* 18:26 James_F: zuul: Try in a second way to only run mwext coverage jobs on master [[phab:T270976|T270976]]
* 18:13 James_F: zuul: [mediawiki/extensions/AbuseFilter] Make sqlite tests voting [[phab:T251967|T251967]]


== November 18 ==
{{SAL-archives/Release Engineering}}
* 15:32 hashar: Deleting job https://integration.wikimedia.org/ci/job/mediawiki-vendor-integration/ replaced by mediawiki-phpunit. Clearing out workspaces {{bug|73515}}


== November 17 ==
* 09:24 YuviPanda: moved *old* /var/log/eventlogging into /home/yuvipanda so puppet can run without bitching
* 04:57 YuviPanda: cleaned up coredump on mediawiki02 on deployment-prep
== November 14 ==
* 21:03 marxarelli: loaded and re-saved jenkins configuration to get it back to english
* 17:27 bd808: /var full on deployment-mediawiki02. Adjusted ~bd808/cleanup-hhvm-cores for core found in /var/tmp/core rather than the expected /var/tmp/hhvm
* 11:14 hashar: Recreated a labs Gerrit setup on integration-zuul-server . Available from http://integration.wmflabs.org/gerrit/ using OpenID for authentication.
== November 13 ==
* 11:13 hashar: apt-get upgrade / maintenance on all slaves
* 11:02 hashar: bringing back integration-slave1008 to the pool. The label had a typo. https://integration.wikimedia.org/ci/computer/integration-slave1008/
== November 12 ==
* 21:03 hashar: Restarted Jenkins due to a deadlock with deployment-bastion slave
== November 9 ==
* 16:51 bd808: Running `chmod -R =rwX .` in /data/project/upload7
== November 8 ==
* 08:06 YuviPanda: that fixed it
* 08:04 YuviPanda: disabling/enabling gearman
== November 6 ==
* 23:43 bd808: https://integration.wikimedia.org/ci/job/mwext-MobileFrontend-qunit-mobile/ happier after I deleted the clone of mw/core that was somehow corrupted
* 21:01 cscott: bounced zuul, jobs seem to be running again
* 20:58 cscott: about to restart zuul as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Known_issues
* 00:53 bd808: HHVM not installed on integration-slave1009? "/srv/deployment/integration/slave-scripts/bin/mw-run-phpunit-hhvm.sh: line 42: hhvm: command not found" -- https://integration.wikimedia.org/ci/job/mediawiki-core-regression-hhvm-master/2542/console
== November 5 ==
* 16:14 bd808: Updated scap to include Ic4574b7fed679434097be28c061927ac459a86fc (Revert "Make scap restart HHVM")
== October 31 ==
* 17:13 godog: bouncing zuul in jenkins as per https://www.mediawiki.org/wiki/Continuous_integration/Zuul#Known_issues
== October 30 ==
* 16:34 hashar: cleared out /var/ on integration-puppetmaster
* 16:34 bd808: Upgraded kibana to v3.1.1
* 15:54 hashar: Zuul: merging in https://review.openstack.org/#/c/128921/3 which should fix jobs being stuck in queue on merge/gearman failures. {{bug|72113}}
* 15:45 hashar: Upgrading Zuul reference copy from upstream c9d11ab..1f4f8e1 
* 15:43 hashar: Going to upgrade Zuul and monitor the result over the next hour.
== October 29 ==
* 22:58 bd808: Stopped udp2log and started udp2log-mw on deployment-bastion
* 19:46 bd808: Logging seems broken following merge of https://gerrit.wikimedia.org/r/#/c/119941/24. Investigating
== October 28 ==
* 21:39 bd808: RoanKattouw creating deployment-parsoid05 as a replacement for the totally broken deployment-parsoid04
== October 24 ==
* 13:36 hashar: That bumps hhvm on contint from 3.3.0-20140925+wmf2  to 3.3.0-20140925+wmf3
* 13:36 hashar: apt-get upgrade on Trusty Jenkins slaves
== October 23 ==
* 22:43 hashar: Jenkins resumed activity.  Beta cluster code is being updated
* 21:36 hashar: Jenkins: disconnected / reconnected slave node  deployment-bastion.eqiad
== October 22 ==
* 20:54 bd808: Enabled puppet on deployment-logstash1
* 09:07 hashar: Jenkins: upgrading gearman-plugin from 0.0.7-1-g3811bb8 to 0.1.0-1-gfa5f083 .  Ie bring us to latest version + 1 commit
== October 21 ==
* 21:10 hashar: contint: refreshed slave-scripts 0b85d48..8c3f228  sqlite files will be cleared out after 20 minutes (instead of 60 minutes)  {{bug|71128}}
* 20:51 cscott: deployment-prep _joe_ promises to fix this properly tomorrow am
* 20:51 cscott: deployment-prep turned off puppet on deployment-pdf01, manually fixed broken /etc/ocg/mw-ocg-service.js
* 20:50 cscott: deployment-prep updated OCG to version 523c8123cd826c75240837c42aff6301032d8ff1
* 10:55 hashar: deleted salt master key on deployment-elastic{06,07}, restarted salt-minion and reran puppet.  It is now passing on both instances \O/
* 10:48 hashar: rerunning puppet manually on deployment-elastic{06,07}
* 10:48 hashar: beta: signing puppet cert for deployment-elastic{06,07}.  On deployment-salt ran:  puppet ca sign  i-000006b6.eqiad.wmflabs; puppet ca sign i-000006b7.eqiad.wmflabs
* 09:29 hashar: forget me  deployment-logstash1 has a puppet agent error but it is simply because the agent is disabled "'debugging logstash config'"
* 09:28 hashar: deployment-logstash1 disk full
== October 20 ==
* 17:41 bd808: Disabled redis input plugin and restarted logstash on deployment-logstash1
* 17:39 bd808: Disabled puppet on deployment-logstash1 for some live hacking of logstash config
* 15:27 apergos: upgraded salt-master on virt1000 (master for labs)
== October 17 ==
* 22:34 subbu: live fixed bad logger config in /srv/deployment/parsoid/deploy/conf/wmf/betalabs.localsettings.js and verified that parsoid doesn't crash anymore -- fix now on gerrit and being merged
* 20:48 hashar: qa-morebots is back
* 20:30 hashar: beta: switching Parsoid config file to the one in mediawiki/services/parsoid/deploy.git instead of the puppet maintained config file https://gerrit.wikimedia.org/r/#/c/166610/ for subbu.  Parsoid seems happy :)
* hashar: qa-morebots disappeared :(  {{bug|72179}}
* hashar: deployment-logstash1 unlocking puppet by deleting left over /var/lib/puppet/state/agent_catalog_run.lock
* hashar: logstash1 instance being filled up is {{bug|72175}}  probably caused by the Diamond collector spamming /server-status?auto
* hashar: deployment-logstash1 deleting files under /var/log/apache2/  gotta file a bug to prevent access log from filling the partition
== October 16 ==
* 06:14 apergos: updated remaining beta instances to salt-minion 2014.1.11 from salt ppa
== October 15 ==
* 12:56 apergos: updated i-000002f4, i-0000059b, i-00000504, i-00000220 salt-minion to 2014.1.11
* 12:20 apergos: updated salt-master and salt-minion on the deployment-salt host _only_  to 2014.1.11 (using salt ppa for now)
* 01:08 Krinkle: Pooled integration-slave1009
* 01:00 Krinkle: Setting up integration-slave1009 ({{bug|72014}} fixed)
* 01:00 Krinkle: integration-publisher and integration-zuul-server were rebooted by me yesterday. Seems they only show up in graphite now. Maybe they were shutdown or had puppet stuck.
== October 14 ==
* 21:00 JohnLewis: icinga says deployment-sca01 is good (yay)
* 20:42 JohnLewis: deleted and recreated deployment-sca01 (still needs puppet set up)
* 20:24 JohnLewis: rebooted deployment-sca01
* 09:26 hashar: renamed deployment-cxserver02 node slaves to 03 and updated the ip address
* 06:49 Krinkle: Did a slow-rotating graceful depool/reboot/repool of all integration-slave's over the past hour to debug problems whilst waiting for puppet to unblock and set up new slaves.
* 06:43 Krinkle: Keeping the new integration-slave1009 unpooled because setup could not be completed due to {{bug|72014}}.
* 06:43 Krinkle: Pooled integration-slave1004
* 05:40 Krinkle: Setting up integration-slave1004 and integration-slave1009 ({{bug|71873}} fixed)
== October 10 ==
* 20:53 Krinkle: Deleted integration-slave1004 and integration-slave1009. When {{bug|71873}} is fixed, they'll need to be re-created.
* 19:11 Krinkle: integration-slave1004 (new instance, not set up yet) was broken ({{bug|71741}}). The bug seems fixed for new instances so, I deleted and re-created it. Will be setting up as a Precise instance and pool it.
* 19:09 Krinkle: integration-slave1009 (new instance) remains unpooled as it is not yet fully set up ({{bug|71874}}). See [[Nova_Resource:Integration/Setup]]
== October 9 ==
* 20:17 bd808: rebooted deployment-sca01 via wikitech ui
* 20:16 bd808: deployment-sca01 dead -- Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
* 19:44 bd808: added role::deployment::test to deployment-rsync01 and deployment-mediawiki03 for trebuchet testing
* 19:07 bd808: updated scap to include 8183d94 (Fix "TypeError bufsize must be an integer")
* 09:34 hashar: migrating deployment-cxserver02 to beta cluster puppet and salt masters
* 09:22 hashar: Renamed Jenkins slave deployment-cxserver01 to deployment-cxserver02 and updated IP. It is marked offline until the instance is ready and has the relevant puppet classes applied.
* 09:19 hashar: deleting deployment-cxserver01 (borked since virt1005 outage) creating deployment-cxserver02 to replace it {{bug|71783}}
== October 7 ==
* 19:19 bd808: ^d deleted all files/directories in gallium:/var/lib/jenkins-slave/tmpfs
* 18:24 bd808: /var/lib/jenkins-slave/tmpfs full (100%) on gallium
* 11:54 Krinkle: The new integration-slave1009 must remain unpooled because Setup failed (puppet unable to mount /mnt, {{bug|71874}}) - see also [[Nova Resource:Integration/Setup]]
* 11:53 Krinkle: Deleted integration-slave1004 because {{bug|71741}}
* 10:16 hashar: beta: apt-get upgraded all instances beside the lucid one.
* 09:57 hashar: beta: deleting old occurrences of /etc/apt/preferences.d/puppet_base_2.7
* 09:53 hashar: apt-get upgrade on all beta cluster instances
* 09:34 Krinkle: Rebase integration-puppetmaster on latest operations-puppet (patches: I7163fd38bcd082a1, If2e96bfa9a1c46)
* 09:32 Krinkle: Apply I44d33af1ce85 instead of Ib95c292190d on integration-puppetmaster (remove php5-parsekit package)
* 09:28 hashar: upgrading php5-fss on both beta-cluster and integration instances. {{bug|66092}} https://rt.wikimedia.org/Ticket/Display.html?id=7213
* 08:55 Krinkle: Building additional contint slaves in labs (integration-slave1004 with precise and integration-slave1009 with trusty)
* 08:21 Krinkle: Reload Zuul to deploy 5e905e7c9dde9f47482d
== October 3 ==
* 22:53 bd808: Had to stop and start zuul due to NoConnectedServersError("No connected Gearman servers") in zuul.log on gallium
* 22:34 bd808|deploy: Merged Ie731eaa7e10548a947d983c0539748fe5a3fe3a2 (Regenerate autoloader) to integration/phpunit for bug 71629
* 14:01 manybubbles: rebuilding beta's simplewiki cirrus index
* 08:24 hashar: deployment-bastion clearing up /var/log/account a bit {{bug|69604}}. Puppet patch pending :]
== October 2 ==
* 19:42 bd808: Updated scap to include eff0d01 Fix format specifier for error message
* 11:58 hashar: Migrated all mediawiki-core-regression* jobs to Zuul cloner {{bug|71549}}
* 11:57 hashar: Migrated all mediawiki-core-regression* jobs to Zuul cloner
== October 1 ==
* 20:57 bd808: hhvm servers broken because of I5f9b5c4e452e914b33313d0774fb648c1cdfe7ad
* 17:29 bd808: Stopped service udp2log and started service udp2log-mw on deployment-bastion
* 16:21 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/163078/ into scap for beta. hhvm will be restarted on each scap. Keep your eyes open for weird problems like 503 responses that this may cause.
* 14:14 hashar: rebased contint puppetmaster
== September 30 ==
* 23:47 bd808: jobrunner using outdated ip address for redis01. Testing patch to use hostname rather than hardcoded ip
* 21:45 bd808: jobrunner not running. ebernhardson is debugging.
* 21:38 bd808: /srv on rsync01 now has 3.2G of free space and should be fine for quite a while again.
* 21:37 bd808: I figured out the disk space problem on rsync01 (just as I was ready to replace it with rsync02). The old /src/common-local directory was still there which doubled the disk utilization. /src/mediawiki is the correct sync dir now following prod changes.
* 21:15 bd808: local l10nupdate users on bastion, mediawiki01 and rsync01
* 21:06 bd808: Local mwdeploy user on deployment-bastion making things sad
* 20:36 bd808: lots and lots of "file has vanished" errors from rsync. Not sure why
* 20:35 bd808: Initial puppet run with role::beta::rsync_slave applied on rsync02 failed spectacularly in /Stage[main]/Mediawiki::Scap/Exec[fetch_mediawiki] stage
* 20:02 bd808: Started building deployment-rsync02 to replace deployment-rsync01
* 19:59 bd808|LUNCH: /srv partition on deployment-rsync01 full again. We need a new rsync server with more space
* 17:44 bd808: Updated scap to 064425b (Remove restart-nutcracker and restart-twemproxy scripts)
* 16:08 bd808: Occasional memcached-serious errors in beta for something trying to connect to the default memcached port (11211) rather than the nutcracker port (11212).
* 15:58 bd808: scap happy again after fixing rogue group/user on rsync01 \o/ Not sure why they were created but likely an ldap hiccup during a puppet run
* 15:56 bd808: removed local group/user mwdeploy on deployment-rsync01
* 15:54 bd808: Local mwdeploy (gid=996) shadowing ldap group gid=603(mwdeploy) on deployment-rsync01
* 15:49 bd808: apt-get dist-upgrade fixed hhvm on deployment-mediawiki03
* 15:45 hashar: Updating our Jenkins job builder fork  686265a..ee80dbc (no job changed)
* 15:44 bd808: scap failing in beta due to "Permission denied (publickey)" talking to deployment-rsync01.eqiad.wmflabs
* 15:39 bd808: hhvm not starting after puppet run on deployment-mediawiki03. Investigating.
* 15:36 bd808: enabling puppet and forcing run on deployment-mediawiki03
* 15:34 bd808: enabling puppet and forcing run on deployment-mediawiki02
* 15:29 bd808:  puppet showed no changes on mediawiki01‽
* 15:27 bd808: enabling puppet and forcing run on deployment-mediawiki01
* 15:13 bd808: Fixed logstash by installing http://packages.elasticsearch.org/logstash/1.4/debian/pool/main/l/logstash-contrib/logstash-contrib_1.4.2-1-efd53ef_all.deb
* 15:02 bd808: Logstash doesn't bundle the prune filter by default any more -- http://logstash.net/docs/1.4.2/filters/prune
* 14:59 bd808: Logstash rules need to be adjusted for latest upstream version: "Couldn't find any filter plugin named 'prune'"
* 12:37 hashar: Fixed some file permissions under deployment-bastion:/srv/mediawiki-staging/php-master/vendor/.git  some files belonged to root instead of mwdeploy
* 00:34 bd808: Updated kibana to latest upstream head 8653aba
== September 29 ==
* 14:22 hashar: apt-get upgrade and reboot of all integration-slaveXX instances
* 14:07 hashar: updated puppetmaster labs/private on both integration and beta cluster projects ( a41fcdd..84f0906 )
* 08:57 hashar: rebased puppetmaster
== September 26 ==
* 22:16 bd808: Deleted deployment-mediawiki04 (i-000005ba.eqiad.wmflabs) and removed from salt and trebuchet
* 07:50 hashar: Pooled back integration-slave1006 , was removed because of {{bug|71314}}
* 07:41 hashar: Updated our Jenkins Job Builder fork 2d74b16..686265a
== September 25 ==
* 23:35 bd808: Done messing with puppet repo. Replaced 2 local commits with proper gerrit cherry picks. Removed a cherry-pick that had been rearranged and merged. Removed a cherry-pick that had been abandoned in gerrit.
* 23:10 bd808: removed cherry-pick of abandoned https://gerrit.wikimedia.org/r/#/c/156223/; if beta wikis stop working this would be a likely culprit
* 22:36 bd808: Trying to reduce the number of untracked changes in puppet repo. Expect some short term breakage.
* 22:21 bd808: cleaned up puppet repo with `git rebase origin/production; git submodule update --init --recursive`
* 22:18 bd808: puppet repo on deployment-salt out of whack. I will try to fix.
* 08:15 hashar: beta: puppetmaster rebased
* 08:10 hashar: beta: dropped a patch that reverted OCG LVS configuration ( https://gerrit.wikimedia.org/r/#/c/146860/ ), it has been fixed by https://gerrit.wikimedia.org/r/#/c/148371/
* 08:04 hashar: attempting to rebase beta cluster puppet master. Currently at 74036376
== September 24 ==
* 15:30 hashar_: install additional fonts on jenkins slaves for browser screenshots ( https://gerrit.wikimedia.org/r/#/c/162604/ and https://bugzilla.wikimedia.org/69535 )
* 09:57 hashar_: upgraded Zuul on all integration labs instances
* 09:33 hashar_: Jenkins switched mwext-UploadWizard-qunit back to Zuul cloner by applying pending change {{gerrit|161459}}
* 09:19 hashar_: Upgrading Zuul to f0e3688  Cherry pick https://review.openstack.org/#/c/123437/1 which fixes {{bug|71133}} ''Zuul cloner: fails on extension jobs against a wmf branch''
== September 23 ==
* 23:08 bd808: Jenkins and deployment-bastion talking to each other again after six (6!) disconnect, cancel jobs, reconnect cycles
* 22:53 greg-g: The dumb "waiting for executors" bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=70597
* 22:51 bd808: Jenkins stuck trying to update database in beta again with the dumb "waiting for executors" bug/problem
== September 22 ==
* 16:09 bd808: Ori updating HHVM to 3.3.0-20140918+wmf1 (from deployment-prep SAL)
* 09:37 hashar_: Jenkins: deleting old mediawiki extensions jobs (<tt>rm -fR /var/lib/jenkins/jobs/*testextensions-master</tt>).  They are no longer triggered and are superseded by the <tt>*-testextension</tt> jobs.
== September 20 ==
* 21:30 bd808: Deleted /var/log/atop.* on deployment-bastion to free some disk space in /var
* 21:29 bd808: Deleted /var/log/account/pacct.* on deployment-bastion to free some disk space in /var
== September 19 ==
* 21:16 hashar: puppet is broken on Trusty integration slaves because they try to install the non-existent package php-parsekit. WIP, will get it sorted out eventually.
* 14:57 hashar: Jenkins friday deploy: migrate all MediaWiki extension qunit jobs to Zuul cloner.
== September 17 ==
* 12:20 hashar: upgrading jenkins 1.565.1 -> 1.565.2
== September 16 ==
* 16:36 bd808: Updated scap to 663f137 (Check php syntax with parallel `php -l`)
* 04:01 jeremyb: deployment-mediawiki02: salt was broken with a msgpack exception. mv -v /var/cache/salt{,.old} && service salt-minion restart fixed it. also did salt-call saltutil.sync_all
* 04:00 jeremyb: deployment-mediawiki02: (/run was 99%)
* 03:59 jeremyb: deployment-mediawiki02: rm -rv /run/hhvm/cache && service hhvm restart
* 00:51 jeremyb: deployment-pdf01 removed base::firewall (ldap via wikitech)
== September 15 ==
* 22:53 jeremyb: deployment-pdf01: pkill -f grain-ensure
* 21:36 bd808: Trying to fix salt with `salt '*' service.restart salt-minion`
* 21:32 bd808: only hosts responding to salt in beta are deployment-mathoid, deployment-pdf01 and deployment-stream
* 21:29 bd808: salt calls failing in beta with errors like "This master address: 'salt' was previously resolvable but now fails to resolve!"
* 20:18 hashar: restarted salt-master
* 19:50 hashar: killed on deployment-bastion  a bunch of <tt>python /usr/local/sbin/grain-ensure contains ... </tt> and <tt>/usr/bin/python /usr/bin/salt-call --out=json grains.append deployment_target scap</tt> commands
* 18:57 hashar: scap breakage due to ferm is logged as https://bugzilla.wikimedia.org/show_bug.cgi?id=70858
* 18:48 hashar: https://gerrit.wikimedia.org/r/#/c/160485/ tweaked a default ferm configuration file which caused puppet to reload ferm.  It ends up having rules that prevent ssh from other host thus breaking rsync \O/
* 18:37 hashar: beta-scap-eqiad job is broken since ~17:20 UTC https://integration.wikimedia.org/ci/job/beta-scap-eqiad/21680/console  || rsync: failed to connect to deployment-bastion.eqiad.wmflabs (10.68.16.58): Connection timed out (110)
== September 13 ==
* 01:07 bd808: Moved /srv/scap-stage-dir to /srv/mediawiki-staging; put a symlink in as a failsafe
* 00:31 bd808: scap staging dir needs some TLC on deployment-bastion; working on it
* 00:30 bd808: Updated scap to I083d6e58ecd68a997dd78faabe60a3eaf8dfaa3c
== September 12 ==
* 01:28 ori: services promoted User:Catrope to projectadmin
== September 11 ==
* 20:59 spagewmf: https://integration.wikimedia.org/ci/ is down with 503 errors
* 16:13 bd808: Now that scap is pointed to labmon1001.eqiad.wmnet the deployment-graphite.eqiad.wmflabs host can probably be deleted; it never really worked anyway
* 16:12 bd808: Updated scap to include I0f7f5cae72a87f68d861340d11632fb429c557b9
* 15:09 bd808: Updated hhvm-luasandbox to latest version on mediawiki03 and verified that mediawiki0[12] were already updated
* 15:01 bd808: Fixed incorrect $::deployment_server_override var on deployment-videoscaler01; deployment-bastion.eqiad.wmflabs is correct and deployment-salt.eqiad.wmflabs is not
* 10:05 ori: deployment-prep upgraded luasandbox and hhvm across the cluster
* 08:41 spagewmf: deployment-mediawiki01/02 are not getting latest code
* 05:10 bd808: Reverted cherry-pick of I621d14e4b75a8415b16077fb27ca956c4de4c4c3 in scap; not the actual problem
* 05:02 bd808: Cherry-picked I621d14e4b75a8415b16077fb27ca956c4de4c4c3 to scap  to try and fix l10n update issue
== September 10 ==
* 19:38 bd808: Fixed beta-recompile-math-texvc-eqiad job on deployment-bastion
* 19:38 bd808: Made /usr/local/apache/common-local a symlink to /srv/mediawiki on deployment-bastion
* 19:37 bd808: Deleted old /srv/common-local on deployment-videoscaler01
* 19:32 bd808: Killed jobs-loop.sh tasks on deployment-jobrunner01
* 19:30 bd808: Removed old mw-job-runner cron job on deployment-jobrunner01
* 19:19 bd808: Deleted /var/log/account/pacct* and /var/log/atop.log.* on deployment-jobrunner01 to make some temporary room in /var
* 19:14 bd808: Deleted /var/log/mediawiki/jobrunner.log and restarted jobrunner on deployment-jobrunner01:
* 19:11 bd808: /var full on deployment-jobrunner01
* 19:05 bd808: Deleted /srv/common-local on deployment-jobrunner01
* 19:04 bd808: Changed /usr/local/apache/common-local symlink to point to /srv/mediawiki on deployment-jobrunner01
* 19:03 bd808: w00t!!! scap jobs is green again -- https://integration.wikimedia.org/ci/job/beta-scap-eqiad/20965/
* 19:00 bd808: sync-common finished on deployment-jobrunner01; trying Jenkins scap job again
* 18:53 bd808: Removed symlink and made /srv/mediawiki a proper directory on deployment-jobrunner01; Running sync-common to populate.
* 18:45 bd808: Made /srv/mediawiki a symlink to /srv/common-local on deployment-jobrunner01
* 10:20 jeremyb: deployment-bastion /var at 97%, freed up ~500MB. apt-get clean && rm -rv /var/log/account/pacct*
* 10:17 jeremyb: deployment-bastion good puppet run
* 10:16 jeremyb: deployment-salt had an oom-kill recently, and some box (maybe master, maybe client?) had a disk fill up
* 10:15 jeremyb: deployment-mediawiki0[12] both had good puppet runs
* 10:15 jeremyb: deployment-salt started puppetmaster && puppet run
* 10:14 jeremyb: deployment-bastion killed puppet lock
* 03:04 bd808: Ori made puppet changes that moved the MediaWiki install dir to /srv/mediawiki (https://gerrit.wikimedia.org/r/#/c/159431/). I didn't see that in SAL so I'm adding it here.
== September 9 ==
* 03:06 bd808: Restarted jenkins agent on deployment-bastion twice to resolve executor deadlock (bug 70597)
== September 7 ==
* 07:00 jeremyb: testing 1,2,3
__NOTOC__
<noinclude>[[Category:SAL]]</noinclude>

Revision as of 19:05, 17 September 2021

2021-09-17

  • 19:05 hashar: Building Docker images for [tox-buster] Install shellcheck and cascade [integration/config] - https://gerrit.wikimedia.org/r/721881
  • 18:08 Krinkle: Re-recreating qemu-1002 as integration-agent-qemu-1003 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref T284774
  • 18:07 Krinkle: Re-recreating qemu-1002 as integration-agent-qemu-1003 (Debian 11 Bullseye, g3.cores8.ram24.disk20.ephemeral40.4xiops), ref T28477
  • 16:19 dpifke: Enabled TLS on Jumbo Kafka instances in deployment-prep.

2021-09-16

2021-09-15

2021-09-14

  • 17:17 Amir1: delete from wb_changes_dispatch where chd_db = 'enwiki'; (T290985)

2021-09-13

  • 09:33 hashar: Castor cache: nuked files that were last changed more than six months ago to free up disk space

2021-09-10

  • 21:52 James_F: Created experimental integration-agent-docker-1021 for T252071
  • 21:48 James_F: Deleting CI agent integration-agent-docker-1001 for T252071
  • 21:44 James_F: Pulling oldest CI agent integration-agent-docker-1001 from rotation so it can be replaced by a bullseye one for T252071
  • 21:23 James_F: Zuul: [integration/config] Add shellcheck job for scripts defined in jjb as an experimental job
  • 17:41 James_F: Zuul: [cloud/toolforge/jobs-framework-emailer] Add basic tox CI
  • 02:18 James_F: Zuul: [wikipeg] Switch JS+PHP job from node10 to node12
  • 02:13 James_F: Zuul: [wikipeg] Provide wikipeg-special-node12-plus-php80-composer-docker as an experimental job
  • 02:10 James_F: Zuul: [oojs/ui] Switch special JS+PHP job from node10 to node12
  • 01:58 James_F: Zuul: [oojs/ui] Add ooui-special-node12-plus-php80-composer-docker as experimental
  • 01:48 James_F: Zuul: [wikipeg] Drop php72 special test job, the php80 one suffices

2021-09-09

  • 22:20 brennen: gitlab-ansible-test: resetting instance data
  • 19:42 James_F: Docker: Building node{10,12}-test-browser-php80-composer for T290651
  • 10:56 hashar: Successfully published image docker-registry.discovery.wmnet/releng/helm-linter:0.2.17
  • 10:37 hashar: Reloading Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/719496

2021-09-08

2021-09-07

2021-09-03

2021-09-02

  • 15:17 brennen: gitlab-test: testing upgrade path to 14.x

2021-09-01

  • 21:22 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/716041 in beta.
  • 16:55 urbanecm: deployment-prep: Unlock scap
  • 15:50 urbanecm: deployment-prep: Lock scap again
  • 15:40 urbanecm: deployment-prep: Lock scap to be able to test something
  • 14:08 urbanecm: deployment-prep: Create foundationwiki (T290164)
  • 14:07 urbanecm: urbanecm@deployment-mediawiki11:~$ sudo run-puppet-agent # T290164
  • 13:58 urbanecm: urbanecm@deployment-cache-text06:~$ sudo run-puppet-agent # T290164

2021-08-31

  • 21:38 dduvall: deploying new blubberoid to eqiad/codfw following successful testing in staging
  • 21:35 dduvall: staging new blubberoid release to deploy https://gerrit.wikimedia.org/r/c/blubber/+/715276
  • 14:29 hashar: Restarting CI Jenkins for plugins upgrade

2021-08-30

  • 19:53 urbanecm: urbanecm@deployment-deploy01:/srv/mediawiki-staging$ git submodule update portals # to clear dirty staging dir at beta
  • 19:27 urbanecm: deployment-prep: reboot deployment-eventgate-3 (T289029)
  • 19:16 brennen: gitlab-test: powering off gitlab (former main test instance)
  • 19:11 James_F: Zuul: [mediawiki/services/apple-search] Add postmerge publish
  • 19:10 brennen: gitlab-test: upgrading gitlab on gitlab-ansible-test to 13.12.9; re-associating floating IP to gitlab-ansible-test
  • 18:23 brennen: gitlab-test: associating floating IP to primary test box
  • 16:36 James_F: Zuul: [mediawiki/services/apple-search] Remove composer-package
  • 15:16 James_F: Zuul: [mediawiki/services/apple-search] Add pipeline CI for T289224
  • 12:20 Amir1: foreachwikiindblist wikisource refreshImageMetadata.php --mediatype=OFFICE --batch-size=10 --verbose --split --sleep 5

2021-08-27

  • 19:24 James_F: Docker: Publish initial node14 CI images for T267888

2021-08-25

2021-08-24

  • 23:15 James_F: Zuul: Configure the REL1_37 test and gate pipelines T289587
  • 22:18 thcipriani: phab1001:sudo /srv/phab/phabricator/bin/bulk make-silent --id 2822 (releng-logspam -> unstewarded production error)
  • 20:02 Reedy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/712990

2021-08-23

  • 20:37 James_F: Docker: Publish php-ast with 1.0.14 (T289429) and no longer support PHP 7.0 or 7.1 (last trace!)

2021-08-21

  • 15:03 majavah: fixing deployment-prep puppet merge conflicts re: swift

2021-08-20

  • 21:11 urbanecm: urbanecm@deployment-deploy01:/srv/mediawiki-staging/private$ rm mwblocker.log # remove weird blank log file
  • 18:56 jeena: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/713439
  • 18:55 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
  • 18:54 urbanecm: urbanecm@deployment-mwmaint01:~$ for i in {1..20}; do echo "test $i" | mwscript edit.php --wiki={cswiki,enwiki} --user="Martin Urbanec (test $i)" --summary="test" Sandbox; done
  • 18:49 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki
  • 18:49 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki
  • 18:46 urbanecm: urbanecm@deployment-mwmaint01:/srv/mediawiki/php-master$ for i in {1..20}; do mwscript extensions/CentralAuth/maintenance/createLocalAccount.php --wiki=enwiki "Martin Urbanec (test $i)"; done
  • 18:40 urbanecm: urbanecm@deployment-mwmaint01:~$ for i in {1..20}; do mwscript createAndPromote.php --wiki=cswiki "Martin Urbanec (test $i)" "$password"; done # to test a feature that needs a lot of different accounts
  • 16:30 majavah: restart sssd on deployment-cache-text06, T286502?
  • 16:24 majavah: deployment-prep: configure wikifunctions.beta.wmflabs.org dns zones and add to acme-chief T284162

2021-08-19

2021-08-17

2021-08-16

2021-08-15

  • 17:44 James_F: Zuul: [mediawiki/extensions/CIForms] Add basic quibble CI

2021-08-13

  • 20:09 urbanecm: Manually start `beta-update-databases-eqiad` CI job
  • 20:06 urbanecm: deployment-prep: sudo -u jenkins-deploy /usr/local/bin/wmf-beta-update-databases.py
  • 20:03 urbanecm: Kill beta-scap-sync-world job for the usual reason
  • 13:13 majavah: `mwscript extensions/CentralAuth/maintenance/importMissingLocalNames.php --wiki metawiki` on the beta cluster

2021-08-11

  • 00:52 James_F: Zuul: Add Aca to the CI allow list
  • 00:52 James_F: Zuul: [mediawiki/extensions/SimpleCalendar] Add basic quibble CI

2021-08-10

  • 16:57 James_F: Zuul: Update e-mail address for Zabe in the allow list
  • 16:06 James_F: Ran `sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/Wikibase` on doc1001 for T288396

2021-08-09

  • 21:02 urbanecm: Remove hanging beta-scap-sync-world job in CI to unblock beta auto-updates

2021-08-07

  • 00:48 James_F: Docker: Publish quibble-buster-php73-coverage 1.1.1 for T287918.
  • 00:29 James_F: Zuul: Add skin-coverage jobs to all Wikimedia production skins T287918
  • 00:27 James_F: Zuul: Provide a skin-coverage template T287918

2021-08-06

  • 23:53 James_F: Docker: Publishing quibble-buster-php73-coverage 1.1.0 for T287918
  • 23:47 James_F: Zuul: [mediawiki/skins/Mirage] Not a production skin; move to right section

2021-08-05

2021-08-04

2021-08-03

2021-08-02

2021-07-30

  • 21:27 dduvall: "Total reclaimed space: 141.4GB" on releases1002 following docker prune
  • 21:24 dduvall: running `docker system prune -af` on releases1002

2021-07-29

2021-07-28

2021-07-27

  • 16:55 dduvall: creating new gitlab runner instance runner-1002 for testing
  • 16:32 hashar: cleaned some obsolete caches under integration-castor03 /srv/jenkins-workspace/caches

2021-07-26

2021-07-23

  • 19:26 brennen: gitlab-runners: launched runner-1001, g3.cores8.ram36.disk20 to install baseline experimental runner (T287279)
  • 14:08 hashar: Building Docker images for quibble 1.0.1
  • 13:45 hashar: Tag Quibble 1.0.1 @ 5a2548699a # T287001

2021-07-21

  • 21:06 brennen: gitlab1001: running ansible for logging typo fix (T274462)
  • 20:36 dancy: Newest scap deployed to beta cluster
  • 19:06 brennen: gitlab1001: running ansible to deploy nginx logging and status changes (T274462, T275170)
  • 16:46 dancy: restarting Gerrit to fix plugins
  • 16:07 dancy: Updating plugins on releases-jenkins
  • 15:00 urbanecm: deployment-prep: Change password for `Martin Urbanec` at votewiki

2021-07-20

  • 23:17 brennen: removed erroneous listing of myself as a train deployer for this week from deployment schedule, added hashar (T281156)
  • 18:39 hashar: Rolling back Jenkins jobs from Quibble 1.0.0 to 0.0.47 # T287001

2021-07-19

  • 18:42 brennen: gerrit1001: ran puppet; noted that quotes were added to jvm configuration values

2021-07-18

  • 08:48 majavah: set shared_acme_certificates: {} on deployment-prep shared hiera, T276653

2021-07-16

2021-07-14

2021-07-13

  • 23:22 dpifke: Re-cherry-picking newer https://gerrit.wikimedia.org/r/c/operations/puppet/+/703912 patch in deployment-prep. Should only affect deployment-webperf12.
  • 19:29 James_F: Manually deleted Jade extension coverage from doc1001 for T281430
  • 19:24 James_F: Zuul: [mediawiki/extensions/Jade] Mark repo as archived T281430
  • 16:33 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/operations/puppet/+/703912 in deployment-prep puppet. Should only affect deployment-webperf12.
  • 10:22 hashar: gerrit: pushed upstream tags for plugins/gitiles # T278990
  • 09:13 James_F: Zuul: [mediawiki/extensions/Report] Add basic quibble CI job
  • 08:24 hashar: Updated operations/software/gerrit branches to 3.2.11 # T278990
  • 07:46 hashar: Wiping all Docker images from contint2001

2021-07-12

  • 18:45 majavah: upgrade deployment-cache-text06 to use varnish 6 (with profile::cache::varnish::frontend::packages_component), and run apt upgrade, T286506
  • 18:43 majavah: deployment-cache-text06 varnish not starting, T286506, causing an outage on text traffic on deployment-prep
  • 18:23 majavah: hard reboot deployment-cache-text06 once I got in using a root ssh key
  • 16:15 majavah: hard reboot deployment-cache-text06, refusing to let me log in and console full of errors
  • 14:48 Amir1: ran $ ./jjb-update 'wikidata-query-gui-build' (T286479)
  • 14:44 majavah: fix merge conflict on deployment-puppetmaster04
  • 13:19 James_F: Zuul: Add Voidwalker to the CI allow list
  • 13:19 James_F: Zuul: Add R4356th to the CI allow list
  • 13:05 James_F_: Zuul: [pywikibot/i18n] Add gate-and-submit-l10n pipeline T286207

2021-07-09

  • 14:47 bd808: Silenced puppet failure alert for deployment-parsoid12 for 7 days (T286375)
  • 00:18 bd808: Silenced puppet failure alert for deployment-kafka-jumbo-3 for the next 7 days (T286358)

2021-07-08

  • 08:21 majavah: kick stuck puppet agent on deployment-logstash04

2021-07-07

  • 18:10 majavah: create shellbox.svc.deployment-prep.eqiad1.wikimedia.cloud. record as a CNAME to deployment-shellbox instance T286298

2021-07-06

2021-07-05

  • 15:02 Amir1: deployed 703212 (T286058)

2021-07-02

2021-06-30

  • 22:42 brennen: gitlab: published https://gitlab.wikimedia.org/releng/gitlab-settings
  • 22:39 brennen: gitlab: creating people, people/wmf, and people/wmf/release-engineering groups; mandating 2fa for people/wmf
  • 17:57 thcipriani: restart ci jenkins following upgrade
  • 17:54 thcipriani: restart releases-jenkins following upgrade
  • 12:36 James_F: Zuul: [mediawiki/core] Drop PHP70/71 testing for REL1_31

2021-06-29

  • 21:45 urbanecm: urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # T285811

2021-06-28

  • 12:06 majavah: revert manual scap downgrade on deployment-mediawiki11
  • 11:59 majavah: downgrade scap to 3.17.1-1 (matching production) on deployment-mediawiki11, testing for T285125

2021-06-24

  • 01:11 Krinkle: deployment-memc08 and -memc09: apt-get install memkeys (already installed on deployment-mediawiki11)

2021-06-23

  • 16:32 Reedy: beta update jobs have been stuck for ~9.5 hours. Going to attempt to unstick

2021-06-22

2021-06-21

  • 23:27 Krinkle: Jobs for `deployment-deploy01` are waiting for executor but host is completely idle for over 10min. Disconnecting and relaunching.

2021-06-19

  • 13:44 majavah: remove deployment-deploy02 T278689
  • 08:05 majavah: creating deployment-logstash05 and configure it like 04, looks like elasticsearch does not like clusters with only one host T283013

2021-06-16

  • 21:47 James_F: Zuul: Install CI for mediawiki/libs/NormalizedException T284732
  • 05:44 majavah: restart trafficserver-tls.service on deployment-cache-upload06, was using an expired cert

2021-06-14

  • 22:06 brennen: gitlab-test: repointing floating IP to ansible test box, running ansible to test issue & wiki default config
  • 21:41 brennen: gitlab-test: repointing floating IP to main test instance; gitlab-ctl reconfigure to test some feature flags
  • 16:57 James_F: Zuul: [mediawiki/tools/api-testing] Publish docs on postmerge T236915
  • 16:31 James_F: Zuul: [mediawiki/tools/api-testing] Add npm run doc and publishing T236915
  • 16:28 James_F: Zuul: Add Yashvarshney02 to Trusted users
  • 16:25 James_F: Zuul: Add Jay (CIS-A2K) to Trusted users
  • 16:22 James_F: Zuul: Add initial CI for cloud/toolforge/jobs-framework-cli

2021-06-13

2021-06-11

  • 20:49 brennen: gitlab1001: resetting application data, re-running ansible playbook
  • 15:50 James_F: Zuul: [node-rdkafka-statsd] Switch to service-pipeline-test T284345
  • 15:25 James_F: Zuul: [node-rdkafka-factory] Switch to service-pipeline-test T284345
  • 14:47 majavah: generate and add my (taavi) own root key to deployment-prep
  • 14:14 hashar: deployment-imagescaler03: delete local mwdeploy user with uid 497 # T73480
  • 12:31 hashar: deployment-prep: removed deployment-shellbox puppet certificate and regenerated it. Ran puppet and it passes all fine.
  • 10:30 hashar: deployment-prep: cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/699207 to add a motd on all instances # T100837
  • 02:03 Reedy: beta-update-databases-eqiad seemingly broken by CategoryTree fix for T271011. Comment left on gerrit patch and task, not reverting patch in master at this stage

2021-06-10

  • 21:12 James_F: Zuul: [mediawiki/extensions/ProofreadPage] Add Scribunto as phan dep too T281195
  • 21:05 James_F: Zuul: [mediawiki/extensions/ProofreadPage] Add Scribunto as a dependency T281195

2021-06-08

  • 20:32 brennen: gitlab1001: k6_gitlab: running test data creation
  • 19:59 brennen: gitlab1001: gitlab-ansible run to reset configuration
  • 19:40 brennen: gitlab1001: resetting all application data for a second attempt at test data creation
  • 17:40 brennen: gitlab1001: running k6 data generator
  • 17:25 James_F: Zuul: [mediawiki/extensions/Wikibase] Switch legacy Ruby jobs to 2.5 T280491

2021-06-07

  • 22:02 urbanecm: urbanecm@deployment-sessionstore04:~$ sudo service cassandra start # T263617
  • 22:02 urbanecm: urbanecm@deployment-sessionstore04:~$ sudo touch /etc/cassandra/service-enabled #T263617
  • 21:40 James_F: Docker: Pushing node12-test and node12-test-browser 0.0.2 for T284492
  • 11:51 hashar: zuul enqueue --trigger gerrit --pipeline postmerge --project operations/software/tegola --change 698470,1 # request by mbsantos for https://gerrit.wikimedia.org/r/c/operations/software/tegola/+/698470

2021-06-05

  • 20:34 James_F: Zuul: [mediawiki/extensions/TitleIcon] Switch to non-composer, with-selenium

2021-06-04

  • 23:57 Krinkle: integration-agent-qemu-1001 back up, Thanks andrewbogott
  • 23:47 Krinkle: Qemu jobs are stuck. Jenkins is unable to connect to integration-agent-qemu-1001
  • 20:13 James_F: Zuul: Switch almost all node10 jobs to node12 T284345
  • 20:11 James_F: Zuul: [VisualEditor/VisualEditor] Switch node10 jobs to node12 T284345
  • 19:20 James_F: Docker: Publishing node12 CI images T284343

2021-06-03

  • 19:06 hashar: contint1001 and contint2001: deleted all workspaces under /srv/jenkins-slave/workspace/* # T284125
  • 00:39 James_F: Zuul: Add EventLogging to dependencies of PropertySuggester
  • 00:24 James_F: Zuul: Add Anysite to CI allowlist
  • 00:19 James_F: Zuul: [mediawiki/services/image-suggestion-api] Use bespoke pipeline T281132

2021-06-02

2021-05-28

2021-05-27

  • 23:14 brennen: gitlab1001: gitlab-ctl stop nginx - pausing httpd for the weekend
  • 20:36 brennen: gitlab1001: temporarily disabling backup cron jobs
  • 17:46 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/696565 https://gerrit.wikimedia.org/r/696566
  • 16:52 brennen: gitlab1001: ran gitlab-ctl start; logins now working; will add banner to effect that this is all provisional state
  • 16:05 brennen: gitlab1001: re-running ansible and puppet per T279545
  • 00:14 James_F: Zuul: [wikimedia/irc/ircservserv-config] Fix bad copy-paste
  • 00:06 James_F: Zuul: [wikimedia/irc/ircservserv-config] Add bespoke pipeline jobs

2021-05-26

  • 17:37 brennen: gitlab1001: reset admin password and ran `gitlab-ctl stop` (T279545)
  • 16:24 brennen: running gitlab-ansible's install-gitlab-server.sh against gitlab1001.wikimedia.org

2021-05-24

2021-05-21

2021-05-20

  • 21:36 Krinkle: Fix broken Jenkins config for console sections of selenium jobs to accommodate updates to wdio
  • 01:43 James_F: Zuul: Add Southparkfan to CI allowlist
  • 01:33 James_F: Zuul: [labs/codesearch] Add "test" deployment pipeline job too
  • 00:18 James_F: Zuul: [labs/codesearch] Install deployment pipeline
  • 00:14 James_F: Publishing quibble-buster-php73-coverage:0.0.47-s2 with no memory limit for coverage jobs T280669

2021-05-19

  • 23:44 James_F: Publishing quibble-buster-php73-coverage:0.0.47-s1 with a 4GiB memory limit for coverage jobs T280669
  • 16:42 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add Scribunto & EventLogging deps T279275
  • 16:39 James_F: Zuul: Add H.krishna123 to the list of trusted users T279552

2021-05-17

  • 18:01 James_F: Zuul: [mediawiki/extensions/RelatedLinks] Archive T279221

2021-05-16

  • 19:58 Krinkle: deployment-mediawiki11$ apt-get install memkeys
  • 09:29 Majavah: fix labs/private merge conflicts on deployment-puppetmaster04

2021-05-15

2021-05-14

  • 21:31 Krinkle: Delete now-unreadable unread echo notifications from deploymentwiki and clear the badge count cache (echo_unread_wikis: 9892 rows affected, Echo/maintenance/recomputeNotifCounts.php), T198673
  • 21:10 Krinkle: Delete beta cluster commonswiki.globalusage data for deploymentwiki, T198673, https://wikitech.wikimedia.org/wiki/Delete_a_wiki (86 rows affected)
  • 21:09 Krinkle: Delete beta cluster centralauth rows relating to deploymentwiki, T198673, https://wikitech.wikimedia.org/wiki/Delete_a_wiki (12600 rows affected)
  • 20:51 Krinkle: I broke beta `InvalidArgumentException: mcrouter-with-onhost-tier not present in $wgObjectCaches` - working on it
  • 13:08 addshore: Github, Allowed Wikimedia Helper Bot for GitHub to read `github/workflows/dependabot-gerrit.yml`
  • 10:26 addshore: reload zuul for WMDE: Add Marta to trusted emails [integration/config] - https://gerrit.wikimedia.org/r/691117
  • 02:16 James_F: Zuul: [mediawiki/skins/MinervaNeue] Drop Ruby-based selenium job T174018 T177260 T280901
  • 02:09 James_F: Zuul: [operations/container/miscweb] Install bespoke pipeline CI T281538

2021-05-13

2021-05-12

2021-05-11

  • 23:32 James_F: Zuul: Add Disambiguator to the MediaWiki gated extension set T237538 T249674
  • 23:25 James_F: Zuul: [mediawiki/extensions/NCBITaxonomyLookup] Enable basic quibble CI

2021-05-10

  • 14:38 James_F: Zuul: [mediawiki/extensions/VoteNY] Add SocialProfile as a phan dependency
  • 14:04 CFisch_WMDE: Improve comment around ReferencePreviews beta cluster default (T271206)
  • 14:04 CFisch_WMDE: Forward renamed config name for improved template search features (T277028)

2021-05-07

  • 16:37 James_F: Zuul: [operations/software/mailman-templates] Add CI of debian-glue T282018

2021-05-06

  • 02:52 James_F: jjb: Enable Sonar analysis for mjolnir builds T264877
  • 02:14 James_F: Zuul: [mediawiki/extensions/UploadWizard] Drop tox job, not useful
  • 00:16 James_F: Zuul: [mediawiki/services/parsoid] Drop parsoidsvc-parsertests-docker job T271562

2021-05-05

2021-05-04

  • 23:34 James_F: Zuul: Add Adam Hammad to CI allow list
  • 17:02 Amir1: stop exim4 and upgrade it in deployment-mx02

2021-05-03

  • 19:42 James_F: Zuul: [mediawiki/services/image-suggestion-api] Publish images post-merge T281256
  • 17:05 James_F: Docker: Publishing quibble-buster images with python3-distutils so quibble can build
  • 16:07 James_F: Zuul: Add Luca Mauri to the CI allow list
  • 13:55 CFisch_WMDE: enable new search features for the template dialog (T271802)

2021-05-02

  • 18:58 Majavah: add dns record upload.wikimedia.beta.wmflabs.org. -> 185.15.56.35 (deployment-cache-upload floating address)
  • 18:50 Majavah: adjust deployment-cache* hieradata to treat upload.wikimedia.beta.wmflabs.org like upload.beta.wmflabs.org
  • 18:42 Krinkle: Cherry-pick "mediawiki: Remove 'deployment.wikimedia' vhost from Beta Cluster" - <https://gerrit.wikimedia.org/r/c/operations/puppet/+/684117>, ref T198673
  • 18:41 Krinkle: Run `puppet agent -tv` on deployment-cache-text06 and deployment-mediawiki11
  • 18:37 Krinkle: Cherry-pick "mediawiki: Remove 'deployment.wikimedia' vhost from Beta Cluster" - https://gerrit.wikimedia.org/r/c/operations/puppet/+/684117

2021-05-01

  • 19:19 James_F: Zuul: Add atagar to the CI allow list
  • 10:37 Majavah: installing deployment-urldownloader03 to replace 02 - T278641
  • 04:05 Krinkle: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/684004

2021-04-30

  • 20:13 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/683987
  • 19:21 James_F: Docker: Publishing mediawiki-phan-taint-check-demo:0.1.1 for T257301
  • 14:21 Majavah: add profile::pki::client to all deployment-prep instances to trust deployment-prep cfssl certificates, already deployed on production
  • 14:15 Majavah: revert above as it's not working, T206158
  • 14:13 Majavah: deployment-cache-text: trying out using HTTPS for backend traffic to deployment-mediawiki11 T206158
  • 12:37 Majavah: force reboot deployment-cache-text06, not letting me log in, this will disrupt beta cluster availability
  • 02:37 James_F: Docker: Publishing node10 images based on buster T278203 T240955

2021-04-29

  • 12:19 Majavah: dropping jade_diff_judgement, jade_diff_label, jade_revision_judgement, jade_revision_label tables on all-labs.dblist T281418

2021-04-28

  • 15:27 James_F: Zuul: [mediawiki/libs/metrics-platform] Add pipeline-based CI jobs T279180
  • 07:26 hashar: contint2001: sudo -u jenkins find *quibble* -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete # T249268
  • 07:26 hashar: contint2001: sudo -u jenkins find *quibble* -path '*/archive/log/rawSeleniumVideoGrabs/*' -delete
  • 07:19 hashar: contint2001: sudo -u jenkins find /srv/jenkins/builds/mediawiki-fresnel-patch-docker -name "*trace.json" -exec gzip {} \+ # T249268
  • 01:19 dpifke: Cherry-picking https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 in beta, should only affect deployment-webperf11.

2021-04-27

  • 19:16 James_F: Docker: Rebuilding all Sury-php derivatives for T277742.
  • 17:52 Majavah: delete deployment-sessionstore03 T263617 T278641
  • 16:35 James_F: Docker: Publishing composer-scratch 1.10.22 and its cascade for T281283
  • 14:18 hashar: Updating most jenkins jobs to change cleanup commands from stretch to buster | https://gerrit.wikimedia.org/r/c/integration/config/+/680476
  • 12:44 hashar: Restarted CI Jenkins for plugins upgrade
  • 12:24 hashar: Upgraded releases Jenkins from 2.263.3 to 2.277.2 (with ldap plugin 1.26)
  • 12:11 hashar: Upgrading Jenkins plugins on the releases jenkins
  • 06:40 Majavah: installing deployment-sessionstore04 T263617
  • 05:29 Majavah: restart cassandra on deployment-sessionstore03 refs T281198

2021-04-26

  • 16:53 James_F: Zuul: Add AnjaliKumari to the CI allow list

2021-04-24

  • 17:47 James_F: Zuul: [mediawiki/extensions/MultimediaViewer] Drop Ruby selenium test job

2021-04-23

  • 22:14 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/682029
  • 16:30 Majavah: remove deployment-prep hiera settings for phabricator, given there is no phabricator instance on that project
  • 09:12 Majavah: signing puppet certs for deployment-eventlog08 and running puppet for the first time to stop annoying email alerts

2021-04-22

  • 06:06 legoktm: reloading zuul to deploy https://gerrit.wikimedia.org/r/680697
  • 02:53 Reedy: killed a few stuck beta ci jobs
  • 02:51 Krinkle: The 'beta-mediawiki-config-update-eqiad' jobs have been stuck for ~ 8 hours
  • 02:19 James_F: Zuul: Switch bundle-yard-publish jobs to Ruby 2.5 T280874
  • 01:49 James_F: Zuul: [mediawiki/vagrant] Add mediawiki-vagrant-ruby2.5-rake-docker as experimental T280874
  • 01:44 James_F: Docker: Publishing rake-vagrant-ruby2.5:0.1.0 for T280874
  • 00:45 James_F: Zuul: Add experimental Ruby 2.5 jobs for two repos T280874

2021-04-21

  • 23:43 James_F: Zuul: [operations/puppet-lint/wmf_styleguide-check] Switch to Ruby 2.5
  • 23:25 James_F: Zuul: Provide experimental Ruby 2.5 rake jobs T280874
  • 22:56 James_F: Zuul: Add mwgate-ruby2.5-rake-docker experimentally to mwgate-rake
  • 22:36 James_F: Docker: Publishing rake-ruby2.5:0.1.0 for T280874
  • 18:47 James_F: Add ImageMap to the list of Parsoid's ext dependencies

2021-04-20

  • 07:19 CFisch_WMDE: enable changes to the descriptions in the VE transclusion dialog (T273425)
  • 07:17 CFisch_WMDE: enable suggested values parameter in TemplateData and VisualEditor (T271825)

2021-04-19

  • 23:04 James_F: Zuul: Add legacy-quibble-rubyselenium-docker as experimental T280491
  • 17:58 Majavah: apply hack (https://phabricator.wikimedia.org/T277206#7015609) to deployment-puppetmaster04 to unbreak maintenance scripts until we have conftool
  • 15:24 James_F: Re-pushing mwselenium-quibble-docker back to master for T280491

2021-04-17

  • 07:23 Majavah: restart uwsgi-ores on deployment-ores01 for T280420

2021-04-16

  • 23:20 James_F: Docker: Publishing quibble-buster-php73-coverage version with performance tuning config tweaks T234020 T280167
  • 22:39 James_F: Docker: Publishing quibble-buster-php72-bundle
  • 22:00 James_F: Docker: Publishing quibble-fresnel based on buster not stretch T278203
  • 19:43 James_F: Zuul: Drop the now duplicate PHP72 'buster' quibble jobs T252432
  • 19:11 Krinkle: Remove `profile::mediawiki::install_hhvm: false` Hiera config in Horizon for deployment-prep. This variable is no longer used. ref T235142
  • 19:06 Krinkle: Change profile::mail::mx::verp_post_connect_server in Horizon for deployment-prep from `deployment.wikimedia.beta.wmflabs.org` to `meta.wikimedia.beta.wmflabs.org`, ref T198673
  • 19:05 Krinkle: Change profile::mail::mx::verp_bounce_post_url in Horizon for deployment-prep from `http://deployment.wikimedia.beta.wmflabs.org/w/api.php` to `http://meta.wikimedia.beta.wmflabs.org/w/api.php`, ref T198673
  • 18:47 Krinkle: Delete forceupdate.beta.wmflabs.org from DNS for deployment-prep (created 2020-03-18, comment "I'm going to delete this in a moment")
  • 17:44 dancy: Updating docker-pkg files on contint primary for https://gerrit.wikimedia.org/r/c/integration/config/+/680392
  • 17:02 dancy: Updating dev-images docker-pkg files on primary contint
  • 16:46 James_F: Zuul: Make php72_buster jobs voting for extension-quibble template T252434

2021-04-15

  • 18:26 paladox: gerrit: created openstack/horizon/trove-dashboard per andrewbogott (with parent set as openstack/horizon/horizon)
  • 16:47 Majavah: manually rebase deployment-puppetmaster04 due to local hacks having conflicts

2021-04-14

  • 16:19 James_F: Docker: Publish quibble-buster-php73-coverage fixing loading of pcov T234020

2021-04-13

2021-04-12

  • 15:46 Urbanecm: deployment-prep: Run `mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php` on all beta wikis with GrowthExperiments installed (wikis that are both in all-labs and growthexperiments, plus enwiki; T279853)
  • 15:40 Urbanecm: deployment-prep: urbanecm@deployment-deploy01:~$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # T279853
  • 14:39 Majavah: remove https://gerrit.wikimedia.org/r/c/operations/puppet/+/263024 cherry pick from beta cluster per T106915#6279270 - T135427
  • 14:38 Majavah: fix parsoid CI ferm rule local hack puppet patch on deployment-puppetmaster04 after it broke due to operations/puppet changes
  • 11:30 Urbanecm: deployment-prep: Beta is down due to my change, fix on its way (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/678578)

2021-04-11

  • 14:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/678378
  • 00:52 James_F: dockerfiles: [quibble-buster-php73-coverage] Switch from xdebug to pcov T234020
  • 00:08 James_F: Zuul: [mediawiki/core] Enforce PHP 8.0 composer test for REL1_3{5,6} T274971
  • 00:02 James_F: Docker: Publishing quibble-buster images T252432

2021-04-10

  • 17:36 James_F: Zuul: Add Meno25 to &email_allowlist list

2021-04-09

2021-04-08

2021-04-07

  • 16:03 James_F: Zuul: Add Bharatkhatri in the CI allow list
  • 16:03 James_F: Zuul: [wikidata/query-builder] Add gate-and-submit-l10n template
  • 15:12 Majavah: remove jessie-deployment-prep from deployment-deploy01 aptly
  • 14:27 Majavah: delete deployment-mediawiki-07 and deployment-parsoid11 T278664
  • 00:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/677388
  • 00:43 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/677382

2021-04-06

  • 20:26 James_F: Zuul: [mediawiki/extensions/D3Loader] Mark as archived T277626
  • 20:26 James_F: Zuul: [mediawiki/extensions/Gravatar] Add basic quibble and phan jobs T279260
  • 19:58 James_F: Zuul: Configure the REL1_36 test and gate pipelines T279459
  • 17:58 James_F: Zuul: [mediawiki/services/function-{orchestr,evalu}ator] Publish images

2021-04-05

  • 18:20 brennen: resizing gitlab-ansible-test to g3.cores8.ram16.disk20
  • 17:45 brennen: halting gitlab-test for resize

2021-04-02

  • 10:53 Majavah: change deployment-wikifeeds01 config to use deployment-mediawiki11
  • 10:47 Majavah: update web proxy parsoid-beta.wmflabs.org to point to deployment-parsoid12

2021-04-01

  • 16:16 Majavah: hard reboot unresponsive deployment-cache-text06
  • 12:52 Majavah: update floating ip 185.15.56.9 from deployment-parsoid11 to deployment-parsoid12
  • 11:00 Majavah: restart changeprop container on deployment-docker-mobileapps01 to pick up config changes
  • 10:45 hashar: Updating all Jenkins jobs with jjb to deploy https://gerrit.wikimedia.org/r/676298

2021-03-30

  • 18:05 Majavah: remove {trusty,precise}-deployment-prep repos from deployment-deploy01 aptly
  • 17:51 Majavah: arm deployment-deploy01 keyholder with all the keys
  • 14:50 Majavah: cherry pick 675807 675814 and 675815 to deployment-puppetmaster to unblock work on deployment-deploy03 until sre has merged those T278689
  • 14:44 Majavah: remove deployment-puppetmaster04 local patch adding releng/phatality to scap to see if it unbreaks deployment-deploy03 puppet runs
  • 13:55 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675802/ on beta to unblock my progress until merged
  • 13:35 Majavah: create and install deployment-deploy03 T278689
  • 13:17 Majavah: armed deployment-cumin keyholder, found passphrase at deployment-puppetmaster04:/var/lib/git/labs/private/files/ssh/tin/cumin_rsa.passphrase
  • 07:26 Majavah: shutoff deployment-mediawiki-09 T278664
  • 06:25 Majavah: switch w-beta.wmflabs.org web proxy to deployment-mediawiki11
  • 06:18 Majavah: restart restbase on deployment-restbase03 to pick up config changes to use deployment-mediawiki11

2021-03-29

  • 15:37 Majavah: hard reboot deployment-sessionstore03 T263617
  • 15:16 Majavah: manually run puppet on deployment-sessionstore03, starting Cassandra (which was stopped) T263617
  • 13:04 Majavah: cherry pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675503/ on deployment-puppetmaster04 (T278664), also apply same change on horizon. this will switch traffic from deployment-mediawiki-07 to deployment-mediawiki11
  • 10:29 Majavah: remove deployment-mediawiki10, too much live debugging, not in use
  • 09:56 Majavah: taavi@deployment-mediawiki10:~$ sudo ln -s /usr/local/share/ca-certificates/Puppet_Internal_CA.crt /etc/ssl/certs/aeffde42.0 && sudo update-ca-certificates
  • 09:29 Urbanecm: Manually run puppet on mediawiki10
  • 09:28 Urbanecm: Re-enable puppet on mediawiki10
  • 08:49 Urbanecm: Disable puppet on mediawiki10 - investigating failing curl certificate check
  • 06:46 Majavah: cherry-pick https://gerrit.wikimedia.org/r/c/operations/puppet/+/675357/ on deployment-puppetmaster04 - T278664
  • 05:40 Majavah: move role::labs::lvm::srv puppet classes from deployment-mediawiki- prefix to current individual appservers, T278664

2021-03-26

2021-03-25

2021-03-24

  • 07:42 Majavah: remove deployment-logstash2 hiera from horizon, instance was shut off earlier by moritzm T238707

2021-03-23

  • 20:46 James_F: Zuul: [mediawiki/extensions/CopyToClipboard] Archive per T274015
  • 20:44 James_F: Zuul: [mediawiki/extensions/Wikibase] Add experimental Postgres job T207226
  • 19:33 James_F: Zuul: [operations/homer/public] Add bespoke tox-publish job
  • 04:45 James_F: dockerfiles: [quibble-buster] Switch npm to our own build, and cascade T252434
  • 02:39 James_F: Zuul: Make php72_buster jobs voting for skin-quibble template T252434
  • 02:23 James_F: Zuul: [mediawiki/vendor] Make php72_buster jobs voting for master branch T252434
  • 01:35 James_F: Zuul: [mediawiki/core] Make php72_buster jobs voting for master branch T252434
  • 00:03 brennen: re-associating floating IP for gitlab-test to gitlab-ansible-test box for speed & function use

2021-03-22

  • 23:06 James_F: Zuul: [labs/tools/majavah-bot] Run generic tox tests
  • 19:28 James_F: Zuul: [mediawiki/services/function-orchestrator] Add code coverage job
  • 12:07 Majavah: delete deployment-restbase[01-02], T250574
  • 11:36 dcaro: Created subzone svc.deployment-prep.eqiad1.wikimedia.cloud. (T276624)
  • 11:33 dcaro: Created subzone beta.wmcloud.org (T276624)

2021-03-19

2021-03-18

2021-03-17

  • 20:30 hashar: Reloaded Zuul for I236847
  • 15:37 Majavah: shutdown deployment-restbase01 for T250574
  • 15:32 Majavah: taavi@deployment-restbase01:~$ sudo nodetool decommission # T250574
  • 14:53 addshore: reload zuul for https://gerrit.wikimedia.org/r/673028 Run more Wikibase tests jobs for REL1_35 branch
  • 01:21 James_F: Zuul: [labs/tools/wikisource-ocr] Remove CI

2021-03-16

  • 21:37 longma: Updating dev-images docker-pkg files on primary contint
  • 21:22 marxarelli: reloading zuul to deploy https://gerrit.wikimedia.org/r/c/integration/config/+/672792
  • 20:06 marxarelli: restarting zuul due to seemingly stuck dependency chain
  • 16:22 James_F: Docker: Publishing quibble-stretch-php72-apache:0.0.46-s1
  • 10:29 addshore: reload zuul for https://gerrit.wikimedia.org/r/670898 Add configuration for new wikidata/query-builder repo
  • 10:17 hashar: Building docker-registry.wikimedia.org/releng/sonar-scanner:4.6.0.2311-1 # T277527

2021-03-15

2021-03-13

  • 17:10 twentyafterfour: restart apache on gerrit1001

2021-03-12

2021-03-11

2021-03-10

  • 19:36 Majavah: shutdown deployment-ircd T277081
  • 18:46 Majavah: switch floating ip 185.15.56.34 to deployment-ircd02 T277081
  • 18:05 Majavah: create deployment-ircd02 for T277081
  • 17:26 marxarelli: `rm -rf /srv/dump` on deployment-db06 and reenabling puppet
  • 17:25 marxarelli: `rm -rf /srv/restore` on deployment-db08 and reenabling puppet
  • 17:24 marxarelli: `rm -rf /srv/backup /srv/restore` on deployment-db07 and reenabling puppet
  • 17:09 Majavah: set beta cluster mediawiki as read write on mw config (T276968)
  • 17:03 Majavah: make deployment-db06 read-write T276968
  • 16:50 Majavah: `reset slave;` on new master deployment-db06 T276968
  • 16:49 Majavah: add deployment-db07 as a replica of db06 for T276968
  • 16:45 Urbanecm: root@deployment-db07:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1 # T276968
  • 16:12 Majavah: deployment-db08 CHANGE MASTER to MASTER_USER='repl', MASTER_PASSWORD='redacted', MASTER_PORT=3306, MASTER_HOST='deployment-db06.deployment-prep.eqiad1.wikimedia.cloud', MASTER_LOG_FILE='deployment-db06-bin.000059', MASTER_LOG_POS=522469730; (T276968)
  • 16:06 Urbanecm: start root@deployment-db07:/srv/sqldata.db06# rsync --progress -r deployment-db06:/srv/sqldata/ . (T276968)
  • 15:57 Majavah: set deployment-db06 as readonly from mysql side T276968
  • 15:54 Urbanecm: Start `root@deployment-db08:/opt/wmf-mariadb104/bin# ./mysql_upgrade -h 127.0.0.1` (T276968)
  • 15:54 Urbanecm: Start mariadb on db08 (T276968)
  • 15:22 Urbanecm: rsync deployment-db06:/srv/sqldata to deployment-db08:/srv/sqldata in a tmux session on deployment-db08 (T276968)
  • 14:52 Majavah: delete deployment-db08 /srv/sqldata to attempt procedure in https://phabricator.wikimedia.org/T276968#6900199
  • 10:16 arturo: briefly stopping deployment-puppetdb03 to disable VMX CPU flag
  • 00:28 marxarelli: mariadb successfully started on db07 following transfer/extraction using mariabackup and following mysql_upgrade (T276968)
  • 00:10 marxarelli: restore of db06 failed yet again. trying mariabackup db06 -> db07 instead of mysqldump (after fixing docs/usage of the former) (T276968)

2021-03-09

  • 21:54 marxarelli: restoring from db06 dump on db07 and db08 following `DROP VIEW IF EXISTS user` workaround (T276968)
  • 20:53 marxarelli: restore on db07 failed. appears to be a bug w/ mariadb/mysqldump 10.4 compat https://jira.mariadb.org/browse/MDEV-22127 (T276968)
  • 20:39 marxarelli: doing `--skip-grant-tables` on deployment-db08 and creating a new root@127.0.0.1 user (T276968)
  • 20:33 Majavah: install mariadb on deployment-db08 T276968
  • 19:59 marxarelli: creating new instance deployment-db08 to use as new beta replica db (T276968)
  • 19:56 marxarelli: deleting deployment-db05 to free up quota for new replica (T276968)
  • 19:50 marxarelli: restoring database dump on deployment-db07 (T276968)
  • 18:49 marxarelli: restarting db dump on db06 `mysqldump -h 127.0.0.1 --events --routines --triggers --all-databases -f --single-transaction` (T276968)
  • 18:38 Majavah: installing mariadb 10.4 via role::mariadb::beta to db07 T276968
  • 18:25 marxarelli: "View 'labswiki.tag_summary' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them" when using LOCK TABLES during mysqldump on db06 (T276968)
  • 18:21 Majavah: create deployment-db07 as g2.cores8.ram16.disk160 Buster T276968
  • 18:20 marxarelli: disabled puppet on deployment-db06 and started mysqldump (T276968)
  • 18:09 Majavah: set deployment-db05 to read-only to avoid issues with T276968
  • 18:04 marxarelli: deleting shut down memc* deployment-prep instances to free up quota for replacement db instances (T276968)
  • 17:25 marxarelli: seeing "[ 2886.337845] EXT4-fs error (device vda3): ext4_validate_block_bitmap:" for deployment-db05
  • 17:22 marxarelli: restarting deployment-db05 via horizon
  • 17:22 marxarelli: deployment-db05 seems to be acting up (intermittent connection failures) which is causing issues with beta-update-databases-eqiad, which is (possibly) causing post-merge jobs to pile up
  • 16:47 marxarelli: still seeing "JobOffer[deployment-deploy01 #3] rejected beta-scap-eqiad: Waiting for next available executor on ‘deployment-deploy01’" despite available executors
  • 16:27 marxarelli: builds once again being scheduled on deployment-deploy01
  • 16:24 marxarelli: cycling gearman plugin on integration.wikimedia.org
  • 16:16 marxarelli: taking deployment-deploy01 agent offline to mitigate stuck post-merge jobs
  • 13:32 arturo: hard-reboot deployment-db05 because of issues related to T276922
  • 12:34 arturo: briefly rebooting VM deployment-db05, we need to reboot its hypervisor cloudvirt1038 and failed to migrate the VM to another one

2021-03-08

2021-03-07

  • 17:46 James_F: Deleting deployment-snapshot01, shut off since 2020-10-03.
  • 17:43 James_F: Deleting deployment-cumin02, shut off since 2020-10-16.
  • 17:18 Majavah: shutdown deployment-memc[04-05] T276707
  • 16:51 Majavah: cherry pick 669436 and 669436 to deployment-puppetmaster04 T276707
  • 15:52 Majavah: redis::shards change shard01 from deployment-memc04 to deployment-memc08, shard02 from deployment-memc05 to deployment-memc10 T276707
  • 15:44 Majavah: create deployment-memc10 on Buster T276707, beta cluster is almost at full quota but this will improve once the old shut-down Jessie instances are deleted
  • 15:28 Majavah: remove shard04 (deployment-memc07) from redis::shards, switch shard03 from deployment-memc06 to deployment-memc09; [06-07] are both already shut down and 09 is a new Buster machine being set up as the replacement, T276707 T250585
  • 13:14 Majavah: create deployment-memc09 on Buster T276707

2021-03-06

  • 19:45 Majavah: restart deployment-logstash03 to see if it fixes it being empty
  • 09:48 Majavah: cherry-pick https://gerrit.wikimedia.org/r/668995 on deployment-puppetmaster04 T276654
  • 08:09 Majavah: deployment-acme-chief change authorized regex for mx to use .eqiad1.wikimedia.cloud domain to fix T276652

2021-03-05

  • 20:25 James_F: Disabling deployment-memc06 on the grounds that it's an unreferenced Jessie box we don't want any more T250585
  • 20:23 James_F: Disabling deployment-memc07 on the grounds that it's an unreferenced Jessie box we don't want any more T250585
  • 19:36 Majavah: release deployment-prep floating ip 185.15.56.7, was used for mailman upgrade which is now on its own project
  • 19:30 Majavah: shutdown deployment-etcd-01 to see if anything breaks, will delete if nothing has broken during next week T276462
  • 19:15 Majavah: beta cluster etcd was switched from deployment-etcd-01 to deployment-etcd02 ref T276462
  • 17:50 Majavah: switch deployment-prep hiera key etcd_host to use deployment-etcd02 ref T276462
  • 13:40 Majavah: create deployment-etcd02 and sign its puppet certificate T276462
  • 13:13 Majavah: move profile::etcd::cluster_name hiera key from deployment-etcd prefix to deployment-etcd-01 vm specific
  • 11:48 Majavah: live hack beta puppetmaster to hopefully fix trust store location; T276521 and possibly others
  • 08:32 Majavah: deployment-logstash03 try to recreate /etc/rsyslog.d using puppet to try to repair T241481, directory is different on deployment-logstash2

2021-03-04

  • 15:47 hashar: Refreshing jobs based on releng/tox-buster to use latest image. That brings in tox installed with python3 instead of python2 # T276384
  • 15:00 Majavah: remove graphoid role from deployment-sca[01-02] ref T276102 and it being decommissioned in T242855
  • 13:18 Majavah: shutdown deployment-fluorine02 for a scream test for T276419, I believe everything has been moved to deployment-mwlog01
  • 12:38 Majavah: `git rebase origin/production` on deployment-puppetmaster04 to update few settings for T276419
  • 12:19 Majavah: Beta cluster is now using deployment-mwlog01 instead of deployment-fluorine02 for MediaWiki logs. fluorine02 is still used for some other misc services, these will be migrated soon
  • 12:06 Majavah: deployment-prep Delete lists.beta.wmflabs.org DNS record, points to an unassigned floating IP and not used according to Amir
  • 11:02 Majavah: live hacking https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668338/ on deployment-deploy01 to test new deployment-mwlog01 ref T276419
  • 10:51 Majavah: stop bogus service udp2log on deployment-mwlog01, no idea what it is but it was using the same port as udp2log-mw.service is
  • 09:20 hashar: Restored analytics/udp2log because it has to be packaged for Buster # T276422 T180301
  • 07:47 legoktm: rebuilding php*-compile images https://gerrit.wikimedia.org/r/668259
  • 06:33 Majavah: create Buster VM deployment-mwlog01 to eventually replace deployment-fluorine02 which is still on Stretch

2021-03-03

2021-03-02

  • 22:22 Krinkle: Run `sudo systemctl restart memcached` on deployment-mediawiki-07
  • 22:22 Krinkle: Set `profile::mediawiki::mcrouter_wancache::use_onhost_memcached: true` manually in Horizon for deployment-mediawiki-07 (TODO: Move to cloud/eqiad1 in operations/puppet.git).

2021-03-01

  • 18:25 marxarelli: deleting unused docker-registry-uploader jenkins credential
  • 14:41 andrewbogott: changed profile::redis::multidc::discovery from 'false' to "" to comply with strict typing in the deployment-memc puppet prefix.

2021-02-27

  • 22:03 Reedy: re-armed beta keyholder... I think...

2021-02-26

2021-02-24

  • 22:47 James_F: Docker: Actually re-building Rust images for 1.50.0
  • 22:16 legoktm: rebuilding Rust docker images

2021-02-23

  • 18:20 James_F: Zuul: [mediawiki/services/function-schemata] Add generic pipeline CI
  • 16:24 James_F: Zuul: [mediawiki/extensions/DiscussionTools] Test with Echo (Notifications)

2021-02-20

  • 18:35 James_F: Zuul: [mediawiki/services/function-evaluator] Drop direct CI; uses pipeline
  • 18:13 James_F: Zuul: [mediawiki/extensions/LockAuthor] Enable basic quibble CI

2021-02-19

2021-02-18

  • 01:50 Urbanecm: Kill stuck beta-scap-eqiad job and start a new one to sync beta
  • 00:06 brennen: gerrit: added abstract-wikipedia to members for extension-WikiLambda, mediawiki-services-function-schemata

2021-02-17

2021-02-16

2021-02-15

  • 15:58 hashar: Successfully published image docker-registry.discovery.wmnet/releng/operations-puppet:0.8.1 # T209953

2021-02-14

  • 21:31 James_F: Zuul: Add 'check php' support for library repos
  • 20:12 James_F: Zuul: [mediawiki/services/graphoid] Archive T274738

2021-02-13

  • 03:50 James_F: Zuul: [mediawiki/libs/IDLeDOM] Turn on jenkins CI for the `idle-dom` library

2021-02-12

  • 17:19 brennen: Publishing from dev-images docker-pkg files on primary contint for fr-tech images
  • 12:05 Lucas_WMDE: canceled one beta-scap-eqiad job per https://w.wiki/J5$

2021-02-11

  • 21:44 Krinkle: Logstash in beta is not receiving any events T274593
  • 17:36 James_F: Zuul: [mediawiki/extensions/Acrolinx] Disable running selenium tests
  • 17:14 James_F: Zuul: [mediawiki/extensions/GoogleAppEngine] Archive the extension T274069
  • 09:50 hashar: Successfully built Docker images for Quibble 0.0.46
  • 09:07 hashar: Building Quibble 0.0.46 Docker images on contint1001 (it is faster than contint2001)
  • 01:24 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/663339

2021-02-10

  • 22:55 longma: Deploying zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/661796
  • 20:59 brennen: Attempting one more update from dev-images docker-pkg on contint2001 for T274306
  • 18:36 Urbanecm: deployment-prep: Run scap sync-world as jenkins-deploy
  • 18:36 Urbanecm: deployment-prep deploy01: Run cd /srv/mediawiki-staging/php-master/extensions/PagedTiffHandler && git reset HEAD * && git checkout -- * to fix disappeared extension

2021-02-09

2021-02-08

2021-02-06

  • 21:13 Reedy: unstuck beta jobs

2021-02-04

  • 11:40 Lucas_WMDE: canceled one beta-scap-eqiad job per https://w.wiki/J5$
  • 00:17 James_F: Zuul: [mediawiki/libs/Minify] Install initial CI T273247

2021-02-03

  • 21:38 James_F: Zuul: Archive VirtualKeyboard extension T273801
  • 00:30 James_F: Docker: Publish rust images with default-libmysqlclient-dev
  • 00:24 James_F: Zuul: [mediawiki/extensions/UseResource] Rename from TemplateScripts
  • 00:17 James_F: Zuul: Enable CI for mediawiki/libs/Dodo and mediawiki/libs/WebIDL T273295

2021-02-02

  • 19:43 hashar: Pruning dangling Docker images on contint2001
  • 19:39 hasharDinner: Pruning dangling Docker images on contint1001
  • 19:28 James_F: Zuul: [mediawiki/extensions/PageNotice] Tag as in-wikimedia-production, move T61245
  • 11:27 hashar: gerrit: fixed notifications queries having single quotes instead of double quotes for qchris, arturo and twentyafterfour
  • 10:59 hashar: Marking https://integration.wikimedia.org/ci/computer/compiler1002.puppet-diffs.eqiad.wmflabs/ as offline due to disk space issue # T273599

2021-02-01

2021-01-29

  • 18:51 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes

2021-01-28

2021-01-27

  • 10:29 apergos: decommissioned deployment-snapshot01 at last, long since replaced by deployment-snapshot02

2021-01-26

  • 18:27 marxarelli: restarting jenkins on releases-jenkins.wikimedia.org following plugin updates
  • 18:26 marxarelli: updating pipeline plugins on releases-jenkins.wikimedia.org
  • 18:26 marxarelli: updating git plugins on releases-jenkins.wikimedia.org
  • 16:06 dancy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/658449
  • 05:43 legoktm: reloaded zuul for https://gerrit.wikimedia.org/r/658355

2021-01-25

2021-01-22

  • 08:07 legoktm: manually started mwcore-phpunit-coverage-master job with 6hr timeout

2021-01-19

  • 17:11 James_F: Zuul: [mediawiki/services/function-orchestrator] Add pipeline CI T271761

2021-01-18

  • 23:38 James_F: Zuul: [labs/tools/bodh-backend] Provide CI with tox-docker T272320

2021-01-17

  • 03:44 James_F: Zuul: [mediawiki/core] Add composer (not vendor) experimental PHP 8.0 job T248925

2021-01-16

  • 23:24 James_F: Docker: Building cascade of new php-ast image T271428

2021-01-14

  • 19:08 James_F: Zuul: [mediawiki/extensions/HeadScript] Add quibble job

2021-01-13

  • 15:26 hashar: Pruned Docker containers and images on all Docker based Jenkins agents

2021-01-12

  • 21:15 brennen: Updating dev-images docker-pkg files on primary contint for https://gerrit.wikimedia.org/r/c/releng/dev-images/+/640567
  • 20:13 James_F: Zuul: Remove Disambiguator from Parsoid dependencies (again)
  • 20:06 James_F: Zuul: Add parsoid as a dependency of the Disambiguator extension
  • 20:03 James_F: Zuul: Allow parsoid to be added to dependency and gatedextensions lists
  • 19:33 James_F: Zuul: Revert Parsoid integration job injection.
  • 02:35 James_F: Zuul: [mediawiki/vendor] Experimental composer-php80 job, not 72
  • 01:11 James_F: Zuul: Ensure Parsoid's integration job tests against the Disambiguator extension T237538
  • 01:07 James_F: Zuul: [labs/tools/stewardbots] Enable PHP 8.0 jobs; drop special template

2021-01-11

  • 08:59 hashar: gerrit: created integration/jenkinsci/gearman-plugin.git to maintain the Jenkins Gearman plugin # T271683

2021-01-09

  • 04:30 James_F: Zuul: [mediawiki/libs/RemexHtml] Enable PHP 8.0 jobs, now passing T271575
  • 04:30 James_F: Zuul: [mediawiki/libs/Equivset] Enable PHP 8.0 jobs, now passing T271575

2021-01-08

2021-01-07

2021-01-06

2021-01-05

2021-01-04

  • 22:48 James_F: Zuul: [mediawiki/services/parsoid] Enable PHP 8.0 composer job T269719
  • 22:30 hasharAway: IRC notifications from Jenkins / wmf-insecte disabled for now due to T271122
  • 21:08 hasharAway: Change Jenkins IRC login to mw-jenkinsbot # T271122
  • 17:33 thcipriani: fixed beta-scap-eqiad by removing local mwdeploy user/group using vipw/vigr and chown -R mwdeploy:mwdeploy /srv/mediawiki for deployment-prep hosts

2021-01-02

  • 19:13 James_F: Zuul: Add CI for the Mirage skin T270979

2021-01-01

  • 18:26 James_F: zuul: Try in a second way to only run mwext coverage jobs on master T270976
  • 18:13 James_F: zuul: [mediawiki/extensions/AbuseFilter] Make sqlite tests voting T251967

Archives