You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Nova Resource:Integration/Setup: Difference between revisions

imported>Krinkle
No edit summary
imported>Samtar
(update post T252071 completion)
 
Latest revision as of 15:44, 10 June 2022

Roles

integration-agent-{type}-XXXX

Updated September 2019 based on T226233
Updated January 2021

Instances are created via https://horizon.wikimedia.org/project/instances/. You will need to pick a source image and an instance flavor.


  • Source: pick the debian-11.0-bullseye image now that T252071 is complete


For the flavor, the important requirements are:

  • enough disk space: the Docker role requests 24G for /var/lib/docker, and enough disk must remain for /srv.
  • a 4xiops flavor, which greatly raises the disk I/O rate limit applied to all WMCS instances.

  • Flavor: pick g3.cores8.ram24.disk20.ephemeral40.4xiops
  • Create a new instance named integration-agent-{type}-XXXX, where {type} is a role (example: docker) and XXXX increments starting from 1001.
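The flavor requirements and the naming rule above can be checked mechanically. The following is a minimal Bash sketch, not an official tool. parse_flavor, flavor_ok and next_agent_name are hypothetical helper names. The 10G /srv minimum is an arbitrary assumption, and flavor_ok also assumes that the 24G Docker volume is carved out of the ephemeral disk.

```bash
#!/usr/bin/env bash
# Sketch only: sanity-check a flavor name and pick the next instance name.
# Field meanings, the ephemeral-disk assumption and the /srv minimum are assumptions.

# Split a flavor name such as g3.cores8.ram24.disk20.ephemeral40.4xiops
# into "cores ram disk ephemeral iops" (0 or "none" when a field is absent).
parse_flavor() {
  local flavor=$1 cores=0 ram=0 disk=0 eph=0 iops=none part
  local IFS=.
  for part in $flavor; do
    case $part in
      cores*)     cores=${part#cores} ;;
      ram*)       ram=${part#ram} ;;
      disk*)      disk=${part#disk} ;;
      ephemeral*) eph=${part#ephemeral} ;;
      *xiops)     iops=$part ;;
    esac
  done
  echo "$cores $ram $disk $eph $iops"
}

# Succeed if the flavor has the 4xiops boost and, assuming the 24G Docker
# volume comes out of the ephemeral disk, at least MIN_SRV_GB
# (hypothetical default: 10) would remain for /srv.
flavor_ok() {
  local min_srv_gb=${2:-10} cores ram disk eph iops
  read -r cores ram disk eph iops <<< "$(parse_flavor "$1")"
  [[ $iops == 4xiops ]] && (( eph - 24 >= min_srv_gb ))
}

# Print integration-agent-TYPE-NNNN, one above the highest number already
# used for TYPE among the instance names read on stdin (1001 if none).
next_agent_name() {
  local type=$1 max=1000 name n
  local re='^integration-agent-'"$type"'-([0-9]{4})$'
  while read -r name; do
    if [[ $name =~ $re ]]; then
      n=$((10#${BASH_REMATCH[1]}))
      if (( n > max )); then max=$n; fi
    fi
  done
  echo "integration-agent-${type}-$((max + 1))"
}
```

For example, pipe the project's instance names into `next_agent_name docker` to get the next Docker agent name. You can copy the names from Horizon, or list them with `openstack server list -f value -c Name` if you have CLI access to the project.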


Wait a few minutes while the instance is created and its initial setup runs. Then connect to the instance over SSH and fix Puppet:

  • sudo rm -fR /var/lib/puppet/ssl && sudo puppet agent -tv
  • If that complains:
    • get the instance's fully qualified domain name (FQDN): hostname --fqdn
    • on integration-puppetmaster-02.integration.eqiad.wmflabs, clean the old, invalid certificate(s): sudo puppet cert clean <FQDN OF INSTANCE HERE>
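The Puppet reset above can be wrapped in a small Bash sketch. reset_puppet_ssl and cert_clean_cmd are hypothetical helper names. The script only automates the commands listed above. It prints the cleanup command for the puppetmaster rather than running it, because that step happens on a different host. It relies on `puppet agent --test` enabling detailed exit codes, under which 2 means changes were applied successfully.

```bash
#!/usr/bin/env bash
# Sketch of the "fix puppet" steps. Helper names are hypothetical.

PUPPETMASTER=integration-puppetmaster-02.integration.eqiad.wmflabs

# Print the command that revokes an instance's stale certificate on the
# project puppetmaster (to be run there, not on the new instance).
cert_clean_cmd() {
  printf 'sudo puppet cert clean %s\n' "$1"
}

# Wipe the local Puppet SSL state and re-run the agent. --test implies
# --detailed-exitcodes: 0 = no changes, 2 = changes applied, others = failure.
reset_puppet_ssl() {
  local rc=0
  sudo rm -fR /var/lib/puppet/ssl
  sudo puppet agent -tv || rc=$?
  if (( rc != 0 && rc != 2 )); then
    echo "Puppet run failed (exit $rc). On $PUPPETMASTER, run:" >&2
    cert_clean_cmd "$(hostname --fqdn)" >&2
    return 1
  fi
}
```

Source the file on the new instance, then call `reset_puppet_ssl`. After cleaning the certificate on the puppetmaster, call it again.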

Apply the Puppet role:

  • Click the instance, then go to the Puppet tab
  • Pick role::ci::slave::labs::docker

The Docker agent will have a 24G /var/lib/docker partition; the remaining disk space is allocated to /srv.

Run Puppet on the instance (puppet agent -tv) and verify:

  • For a Docker agent, make sure /var/lib/docker is a separate partition
  • Clean unused packages: apt-get autoremove --purge
  • Upgrade packages: apt-get -y dist-upgrade
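The post-Puppet checks above can be sketched in Bash as follows. is_separate_partition and verify_docker_agent are hypothetical helpers. The partition test compares the device numbers reported by GNU `stat`, so it assumes a Linux host. It also does not detect bind mounts of the same filesystem.

```bash
#!/usr/bin/env bash
# Sketch of the post-Puppet checks for a Docker agent. Helper names are hypothetical.

# Succeed if DIR exists and lives on a different filesystem than its parent,
# i.e. it is the root of its own partition.
is_separate_partition() {
  local dir=$1
  [[ -d $dir ]] || return 1
  [[ $(stat -c %d "$dir") != "$(stat -c %d "$dir/..")" ]]
}

# Run the check and clean-ups listed above (apt-get needs root).
verify_docker_agent() {
  if is_separate_partition /var/lib/docker; then
    # Show sizes so the 24G Docker volume and the /srv remainder can be eyeballed.
    df -h /var/lib/docker /srv
  else
    echo "/var/lib/docker is not a separate partition" >&2
    return 1
  fi
  sudo apt-get autoremove --purge
  sudo apt-get -y dist-upgrade
}
```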

Reboot the instance before adding it to Jenkins. The reboot cleans up state, boots the new Linux kernel if one was installed, and starts daemons. Once the instance is back up, add it to Jenkins.
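You can estimate whether the reboot is needed for a new kernel by comparing the running kernel with the newest vmlinuz-* file in /boot. This Bash sketch assumes Debian's /boot/vmlinuz-VERSION naming and GNU `sort -V`. The helper names are hypothetical. Reboot before adding the agent regardless of the result, since that is the documented step.

```bash
#!/usr/bin/env bash
# Sketch: detect a newly installed kernel that is not running yet.

# Print the newest kernel version found as vmlinuz-* in BOOT_DIR (default /boot).
newest_installed_kernel() {
  local boot_dir=${1:-/boot} f
  for f in "$boot_dir"/vmlinuz-*; do
    [[ -e $f ]] && printf '%s\n' "${f##*/vmlinuz-}"
  done | sort -V | tail -n 1
}

# Succeed if a kernel is installed and the running one (default: uname -r)
# is not the newest.
kernel_reboot_needed() {
  local boot_dir=${1:-/boot} running=${2:-$(uname -r)} newest
  newest=$(newest_installed_kernel "$boot_dir")
  [[ -n $newest && $newest != "$running" ]]
}
```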

Add the instance to Jenkins

  1. Create "New Node" in Jenkins management
    • Name: (short hostname of instance)
    • Type: Permanent Agent
    • Executors: 4 for Docker agents, 1 for Qemu agents
    • Remote root directory: /srv/jenkins/workspace
    • Labels:
      • For Docker agents: Docker
      • For Qemu agents: Qemu
    • Usage: EXCLUSIVE (Only build jobs with label restrictions matching this node)
    • Launch method: SSH
      • Host: (internal IP of instance)
      • Credentials: jenkins-deploy (key from role::ci::slave::labs::common)
    • Availability: Always (Keep this slave on-line as much as possible)

The Jenkins master automatically trusts the agent's SSH host key on the first connection.
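The node form above can also be expressed as a Jenkins node config.xml, which the Jenkins CLI `create-node` command reads from stdin. This Bash sketch renders such a file. The element names reflect our understanding of Jenkins' permanent-agent format; they are not a copy of the production config. The default credentials ID `jenkins-deploy` is hypothetical, since Jenkins may store that credential under a different ID. The SSH host key verification strategy is left out, so you would need to set it to match the trust-on-first-connection behaviour described above.

```bash
#!/usr/bin/env bash
# Sketch: render a permanent-agent config.xml matching the form fields above.
# Usage: jenkins_node_xml NAME IP TYPE [CREDENTIALS_ID]   (TYPE: docker|qemu)
jenkins_node_xml() {
  local name=$1 ip=$2 type=$3 cred=${4:-jenkins-deploy}  # credentials ID is hypothetical
  local executors label
  case $type in
    docker) executors=4; label=Docker ;;
    qemu)   executors=1; label=Qemu ;;
    *) echo "unknown agent type: $type" >&2; return 1 ;;
  esac
  # \$Always keeps the literal Java inner-class name in the unquoted heredoc.
  cat <<EOF
<slave>
  <name>${name}</name>
  <remoteFS>/srv/jenkins/workspace</remoteFS>
  <numExecutors>${executors}</numExecutors>
  <mode>EXCLUSIVE</mode>
  <retentionStrategy class="hudson.slaves.RetentionStrategy\$Always"/>
  <launcher class="hudson.plugins.sshslaves.SSHLauncher">
    <host>${ip}</host>
    <port>22</port>
    <credentialsId>${cred}</credentialsId>
  </launcher>
  <label>${label}</label>
  <nodeProperties/>
</slave>
EOF
}
```

For example: `jenkins_node_xml integration-agent-docker-1001 "$AGENT_IP" docker | java -jar jenkins-cli.jar -s "$JENKINS_URL" create-node integration-agent-docker-1001`. Add whatever authentication your Jenkins requires. Creating the node through the web form as described above remains the documented route.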

integration-dev

  1. Create instance:
    • m1.medium
    • Security group: Default
  2. Wait 10 minutes
  3. Reconfigure the instance from Wikitech: enable role::ci::slave::labs.
  4. Via SSH, force a Puppet run (this applies the role).

Utilities

puppet

Use sudo /usr/local/sbin/puppet-run &. Don't use sudo puppet agent -t, because that is not what cron uses and leads to inconsistencies with e.g. umask and other factors affecting default values used at runtime.