
Nova Resource:Integration/Setup: Difference between revisions

From Wikitech-static
imported>Addshore
imported>Samtar
(update post T252071 completion)
 
(7 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== Roles ==
== Roles ==


=== integration-slave-{type}-XXXX ===
=== integration-agent-{type}-XXXX ===
# Create instance:
''Updated September 2019 based on [[phab:T226233|T226233]]''
#* <code>ci1.medium</code>
''Updated January 2021''
#* Security group: Default<br>(Puppet master is auto-configured via [[Hiera:Integration]])
 
# Wait about 10 minutes (during which the instance is created, initial setup happens).
Instances are created via https://horizon.wikimedia.org/project/instances/. You will need to pick a source image and an instance flavor.
# Connect to the instance over SSH.
 
# [[Standalone puppetmaster|Configure the server]] to use the integration-puppetmaster.
 
# Reconfigure instance from wikitech: Enable <code>role::ci::slave::labs</code>.
* '''Source''': pick the <code>debian-11.0-bullseye</code> image now that [[:phab:T252071|T252071]] is complete
# Via SSH, [[#puppet|force a puppet run]] (provisions the instance, takes about an hour).
 
# You might have to force run puppet a few more times to complete provisioning
 
# Once puppet is done, perform package upgrade: <tt>apt-get -y dist-upgrade</tt>
For the flavor the important parts are:
# '''Reboot the instance''' (Before adding to Jenkins). This cleans state, launches daemons, and fixes Shinken monitoring ([[phabricator:T91351]]).
* have enough disk space (docker role notably requests 24G for /var/lib/docker and you would need enough disk remaining for /srv).
* have a <code>4xiops</code> flavor, which dramatically raises the disk IO rate limit applied to all WMCS instances.
 
* '''Flavor''': pick <code>g3.cores8.ram24.disk20.ephemeral40.4xiops</code>
 
* Create a new instance named <code>integration-agent-{type}-XXXX</code> where <code>{type}</code> is a role (example: <code>docker</code>) and <code>XXXX</code> increments starting from 1001.
 
 
 
Wait a few minutes while the instance is created and initial setup happens. Then connect to the instance over SSH and fix puppet:
 
* <code>sudo rm -fR /var/lib/puppet/ssl && sudo puppet agent -tv</code>
* If that complains:
** get the instance fully qualified domain name (FQDN): <code>hostname --fqdn</code>
** On <code>integration-puppetmaster-02.integration.eqiad.wmflabs</code>, clean the old and invalid certificate(s): <code>sudo puppet cert clean <FQDN OF INSTANCE HERE></code>
 
Apply the Puppet role:
 
* https://horizon.wikimedia.org/project/instances/
* Click the instance then head to the tab <code>Puppet</code>
* Pick <code>role::ci::slave::labs::docker</code>
 
The Docker agent will have a 24G <code>/var/lib/docker</code> partition; the remaining disk space is allocated to <code>/srv</code>.
 
Run Puppet on the instance (<code>puppet agent -tv</code>) and verify:
* If a Docker agent, make sure there is a <code>/var/lib/docker</code> partition for Docker
* Clean unused packages: <code>apt-get autoremove --purge</code>
* Upgrade packages: <code>apt-get -y dist-upgrade</code>
 
'''Reboot the instance''' (before adding it to Jenkins). This cleans state, picks up the new Linux kernel if any, and launches daemons. Once it is back, you can add it to Jenkins.
 
Add the instance to Jenkins
 
# Create "New Node" in [https://integration.wikimedia.org/ci/computer/ Jenkins management]
# Create "New Node" in [https://integration.wikimedia.org/ci/computer/ Jenkins management]
#* Name: (short hostname of instance)
#* Name: (short hostname of instance)
#* Type: Dumb
#* Type: Permanent Agent
#* Executors: 1
#* Executors: 1 (for Docker agents: 4, for Qemu agents: 1)
#* Filesystem root: <code>/mnt/jenkins-workspace</code>
#* Remote root directory: <code>/srv/jenkins/workspace</code>
#* Labels:
#* Labels:
#** <code>contintLabsSlave</code>
#** For Docker agents: <code>Docker</code>
#** <code>UbuntuTrusty phpflavor-hhvm phpflavor-php55</code>
#** For Qemu agents: <code>Qemu</code>
#* Usage: <code>EXCLUSIVE</code> (Only build jobs with label restrictions matching this node)
#* Usage: <code>EXCLUSIVE</code> (Only build jobs with label restrictions matching this node)
#* Launch method: SSH
#* Launch method: SSH
Line 26: Line 58:
#** Credentials: jenkins-deploy (key from role::ci::slave::labs::common)
#** Credentials: jenkins-deploy (key from role::ci::slave::labs::common)
#* Availability: <code>Always</code> (Keep this slave on-line as much as possible)
#* Availability: <code>Always</code> (Keep this slave on-line as much as possible)
The Jenkins master will automatically trust the SSH key upon the first connection.


=== integration-dev ===
=== integration-dev ===
Line 36: Line 70:


== Utilities ==
== Utilities ==
=== npm upgrade ===
Always use the original npm that came with Ubuntu to perform the upgrade. This installs the new version in <code>/usr/local/bin</code>, preserving the original.
* For each instance, gracefully depool it (!log it), wait for any running builds on it to complete, and run the following:
** If Ubuntu Trusty:
<source lang=bash>
# Note: Must sudo without preserving user environment, using -i
$ sudo -i
root$ /usr/share/npm/bin/npm-cli.js install -g npm@2.7.6
# Purge both caches
root$ rm -rf /mnt/home/jenkins-deploy/.npm /root/.npm
</source>
* Re-pool instance.
=== puppet ===
=== puppet ===
{{outdated}}
{{outdated}}
Use <code>sudo /usr/local/sbin/puppet-run &</code>. Don't use <s><code>sudo puppet agent -t</code></s>, because that is not what cron uses and leads to inconsistencies with e.g. umask and other factors affecting default values used at runtime.
Use <code>sudo /usr/local/sbin/puppet-run &</code>. Don't use <s><code>sudo puppet agent -t</code></s>, because that is not what cron uses and leads to inconsistencies with e.g. umask and other factors affecting default values used at runtime.
=== screenshot ===
<source lang=bash>
# at integration-slave1010 over ssh
$ import -display :94 -window root "image $(date).png"
# local shell
$ scp integration-slave1010.eqiad.wmflabs:image*.png . && open
</source>
=== Debug MediaWiki frontend ===
Take slave of choosing offline in Jenkins web interface (e.g. press "Mark offline" after [https://integration.wikimedia.org/ci/computer/ selecting a node])
<source lang=bash>
sudo -iu jenkins-deploy
$ cd /mnt/jenkins-workspace/workspace/mediawiki-core-qunit/
$ export WORKSPACE=$PWD; export BUILD_TAG=debug; export EXECUTOR_NUMBER=999
$ cd src/
$ git remote update && git checkout origin/master && git reset --hard HEAD
$ /srv/deployment/integration/slave-scripts/bin/mw-install-mysql.sh
$ /srv/deployment/integration/slave-scripts/bin/mw-apply-settings.sh
$ . /srv/deployment/integration/slave-scripts/bin/mw-set-env-localhost.sh
$ echo -e "<?php\n\$wgServer = '${MW_SERVER}';\n\$wgScriptPath = '${MW_SCRIPT_PATH}';\n\$wgScript=\$wgStylePath=\$wgLogo=false;\n\$wgEnableJavaScriptTest = true;\n" >> "$MW_INSTALL_PATH/LocalSettings.php"
$ . /srv/deployment/integration/slave-scripts/bin/npm-setup.sh
$ rm -rf node_modules && npm install
$ ln -s "$MW_INSTALL_PATH" /srv/localhost-worker/$BUILD_TAG
</source>
Now the install is ready for interacting with.
For example:
<source lang=bash>
$ curl --include "$MW_SERVER/$MW_SCRIPT_PATH/index.php?title=Special:BlankPage"
$ curl --include "$MW_SERVER/$MW_SCRIPT_PATH/load.php?debug=true&modules=startup&only=scripts"
</source>
Or
<source lang=bash>
$ cd $MW_INSTALL_PATH
$ grunt qunit
</source>
'''Remember:''' Clean up MySQL database afterwards by running (needs the same environment, e.g. <code>WORKSPACE</code>, <code>EXECUTOR_NUMBER</code>):
<source lang=bash>
/srv/deployment/integration/slave-scripts/bin/mw-teardown-mysql.sh
</source>

Latest revision as of 15:44, 10 June 2022

Roles

integration-agent-{type}-XXXX

Updated September 2019 based on T226233.

Updated January 2021.

Instances are created via https://horizon.wikimedia.org/project/instances/. You will need to pick a source image and an instance flavor.


  • Source: pick the debian-11.0-bullseye image now that T252071 is complete


For the flavor the important parts are:

  • have enough disk space (the docker role notably requests 24G for /var/lib/docker, and enough disk must remain for /srv).
  • have a 4xiops flavor, which dramatically raises the disk IO rate limit applied to all WMCS instances.
  • Flavor: pick g3.cores8.ram24.disk20.ephemeral40.4xiops
  • Create a new instance named integration-agent-{type}-XXXX, where {type} is a role (example: docker) and XXXX increments starting from 1001.
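
The naming rule above can be sketched as a small shell helper. This is only an illustration: next_agent_name is a hypothetical function, not part of any Wikimedia tooling, and it assumes you feed it the existing instance names (for example copied from Horizon) on standard input, one per line.

```shell
#!/bin/sh
# Hypothetical helper: print the next integration-agent-{type}-XXXX name.
# $1 is the role (e.g. docker); existing instance names are read from stdin.
next_agent_name() {
    role="$1"
    # Extract the numeric suffix of instances with the same role, keep the highest.
    highest=$(sed -n "s/^integration-agent-${role}-\([0-9][0-9]*\)\$/\1/p" | sort -n | tail -n 1)
    if [ -z "$highest" ]; then
        # No instance of that role yet: numbering starts at 1001.
        echo "integration-agent-${role}-1001"
    else
        echo "integration-agent-${role}-$((highest + 1))"
    fi
}
```

With no existing instance of the given role, the helper falls back to 1001, matching the starting number stated above.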


Wait a few minutes while the instance is created and initial setup happens. Then connect to the instance over SSH and fix puppet:

  • sudo rm -fR /var/lib/puppet/ssl && sudo puppet agent -tv
  • If that complains:
    • get the instance fully qualified domain name (FQDN): hostname --fqdn
    • On integration-puppetmaster-02.integration.eqiad.wmflabs, clean the old and invalid certificate(s): sudo puppet cert clean <FQDN OF INSTANCE HERE>
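
As a reading aid, the certificate reset steps can be written as a dry-run helper that only prints which command goes on which host. puppet_cert_reset_plan is a hypothetical name introduced here for illustration; the commands and the puppetmaster hostname are the ones listed above, and nothing is executed.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the certificate reset steps
# for the agent whose FQDN is given as $1.
puppet_cert_reset_plan() {
    fqdn="$1"
    puppetmaster="integration-puppetmaster-02.integration.eqiad.wmflabs"
    # Step 1, on the new instance: drop the local SSL state and run puppet.
    echo "on ${fqdn}: sudo rm -fR /var/lib/puppet/ssl && sudo puppet agent -tv"
    # Step 2, only if step 1 complains: clean the stale certificate on the puppetmaster.
    echo "on ${puppetmaster}: sudo puppet cert clean ${fqdn}"
}
```

On the instance itself it could be called as puppet_cert_reset_plan "$(hostname --fqdn)".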

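Apply the Puppet role:

  • https://horizon.wikimedia.org/project/instances/
  • Click the instance then head to the tab Puppet
  • Pick role::ci::slave::labs::docker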

The Docker agent will have a 24G /var/lib/docker partition; the remaining disk space is allocated to /srv.

Run Puppet on the instance (puppet agent -tv) and verify:

  • If a Docker agent, make sure there is a /var/lib/docker partition for Docker
  • Clean unused packages: apt-get autoremove --purge
  • Upgrade packages: apt-get -y dist-upgrade
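
One way to check the /var/lib/docker partition is to look for it in the kernel's mount table. The sketch below assumes a Linux host where /proc/mounts is available; is_mount_point is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 is listed as a mount point in the
# mounts table $2 (defaults to /proc/mounts, whose second field is the mount point).
is_mount_point() {
    path="$1"
    table="${2:-/proc/mounts}"
    awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$table"
}
```

For example: is_mount_point /var/lib/docker && echo "docker partition present".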

Reboot the instance (before adding it to Jenkins). This cleans state, picks up the new Linux kernel if any, and launches daemons. Once it is back, you can add it to Jenkins.

Add the instance to Jenkins

  1. Create "New Node" in Jenkins management
    • Name: (short hostname of instance)
    • Type: Permanent Agent
    • Executors: 1 (for Docker agents: 4, for Qemu agents: 1)
    • Remote root directory: /srv/jenkins/workspace
    • Labels:
      • For Docker agents: Docker
      • For Qemu agents: Qemu
    • Usage: EXCLUSIVE (Only build jobs with label restrictions matching this node)
    • Launch method: SSH
      • Host: (internal IP of instance)
      • Credentials: jenkins-deploy (key from role::ci::slave::labs::common)
    • Availability: Always (Keep this slave on-line as much as possible)

The Jenkins master will automatically trust the SSH key upon the first connection.
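
The per-type values in the node form can be summarised with a short helper. This is a sketch only: jenkins_node_settings is a hypothetical function, the key=value output is an ad hoc format for reading rather than anything Jenkins consumes, and it assumes the agent type can be read from the instance name as described earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the Jenkins node form values for an agent,
# given its FQDN or short hostname as $1.
jenkins_node_settings() {
    fqdn="$1"
    name="${fqdn%%.*}"    # short hostname: everything before the first dot
    case "$name" in
        integration-agent-docker-*) executors=4; label="Docker" ;;
        integration-agent-qemu-*)   executors=1; label="Qemu" ;;
        *) echo "unknown agent type: $name" >&2; return 1 ;;
    esac
    echo "name=$name"
    echo "executors=$executors"
    echo "labels=$label"
    echo "remote_root=/srv/jenkins/workspace"
}
```

Agent types other than Docker and Qemu are rejected because the list above does not specify labels for them.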

integration-dev

  1. Create instance:
    • m1.medium
    • Security group: Default
  2. Wait 10 minutes
  3. Reconfigure instance from wikitech: Enable role::ci::slave::labs.
  4. Via SSH, force a puppet run (applies role).

Utilities

puppet

Use sudo /usr/local/sbin/puppet-run &. Don't use sudo puppet agent -t, because that is not what cron uses and leads to inconsistencies with e.g. umask and other factors affecting default values used at runtime.