You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

SRE Onboarding

From Wikitech-static
Revision as of 12:34, 14 April 2020 by imported>Kormat (move gpg generation instruction to pre-boarding)

Phabricator ticket template

This is a template to copy/paste into a Phabricator ticket to get all the needed checkboxes to onboard a member of the SRE team.

This list is based on existing onboarding tickets; adjust it as needed:

[] Create shell user (can connect to bastions)
[] server root shell (membership in ops group)
[] Phabricator User + 2FA
[] Phabricator permissions to see NDA and Ops restricted tickets, and membership in the trusted users group (exempt from anti-vandalism limits): https://phabricator.wikimedia.org/project/profile/29/ https://phabricator.wikimedia.org/project/profile/61/ https://phabricator.wikimedia.org/project/profile/974/
[] Add to private IRC channels https://office.wikimedia.org/wiki/IRC#Channel_operators_commands
[] Add to ops mailing lists (at minimum `ops` and `ops-private`)
[] Add to Exim mail aliases (`root` via `private.git:modules/privateexim/files/wikimedia.org`)
[] Icinga user and permissions (icinga commands, paging/notifications)
[] Add to wmf and ops LDAP groups (for web services)
[] Access to Office Wiki (granted by OIT)
[] Gerrit login and +2 on operations/puppet (this is automatic from being added to LDAP groups above)
[] Access to pwstore
[] Access to the Google group for maint-announce mails (add the user directly via the "web only participation" option at https://groups.google.com/a/wikimedia.org/forum/#!managemembers/ops-maintenance/add, although anyone in the wikimedia.org organization should be able to join)
[] Add to "Ops vendor maintenance" Calendar

Pre-boarding checklist

  1. Get access to @wikimedia.org email
  2. Generate two SSH keys:
    • prod - for production logins, e.g. bastions.
    • non-prod - for everything else, e.g. Cloud Services (WMCS) and Gerrit. (This key is sometimes referred to as the cloud or labs key.)
  3. Select a shell (Unix) username. This is important; choose wisely.
  4. Select a username on Wikitech. This is also important; choose wisely.
    • The Wikitech username (aka LDAP username) will be visible in many places, including Gerrit, and you will be seeing it ALL THE TIME. Your full name ("Jimmy Wales") is not the worst idea here.
  5. Generate a GPG key
    • Run gpg2 --full-generate-key and follow the prompts. The default expiration is "never"; limit it to 1 year instead (answer 1y when asked how long the key should be valid).
  6. Have a mobile phone number that can receive work SMS messages and occasional calls.
  7. Choose an IRC nickname. Make sure nick enforcement is turned on and/or get an IRC cloak.
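Steps 2 and 5 can be scripted. The sketch below is one possible way to do it, not a Wikimedia convention: the ed25519 key type, the file names id_ed25519_prod and id_ed25519_nonprod, and the SSH_KEY_PASSPHRASE variable are illustrative assumptions. For the GPG key it uses GnuPG's non-interactive --quick-generate-key with "default default 1y", which keeps GnuPG's usual primary key plus encryption subkey but sets the one-year expiry recommended above.

```shell
# Sketch only: key type, file names and SSH_KEY_PASSPHRASE are assumptions.

# Create the "prod" and "non-prod" SSH keys in directory $1, with comment $2.
# If SSH_KEY_PASSPHRASE is set (even to ""), it is passed to ssh-keygen -N;
# otherwise ssh-keygen prompts for a passphrase interactively.
gen_ssh_keys() {
    dir=$1
    comment=$2
    for kind in prod nonprod; do
        key="$dir/id_ed25519_$kind"
        # Never overwrite an existing key.
        if [ -e "$key" ]; then
            echo "refusing to overwrite $key" >&2
            return 1
        fi
        if [ -n "${SSH_KEY_PASSPHRASE+set}" ]; then
            ssh-keygen -q -t ed25519 -C "$comment ($kind)" -f "$key" \
                -N "$SSH_KEY_PASSPHRASE" || return 1
        else
            ssh-keygen -q -t ed25519 -C "$comment ($kind)" -f "$key" || return 1
        fi
    done
}

# Generate a GPG key for user ID $1 (e.g. "Jane Doe <jdoe@wikimedia.org>")
# that expires after one year; gpg2 asks for the passphrase itself.
gen_gpg_key() {
    gpg2 --quick-generate-key "$1" default default 1y
}

# Usage:
#   gen_ssh_keys "$HOME/.ssh" "jdoe@wikimedia.org"
#   gen_gpg_key "Jane Doe <jdoe@wikimedia.org>"
```

Keeping the two SSH keys in separate files makes it harder to accidentally use the prod key outside production.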

Wikitech

After creating an account on Wikitech, go to preferences

Technical details for each checkbox

Note: the shell user, Phabricator and LDAP user steps should be done as part of the regular shell access process documented at https://wikitech.wikimedia.org/wiki/Production_shell_access. The only difference is that the "creating a ticket" step can be skipped.

Some of these instructions are for the existing team member doing the onboarding, and some can be done by the new team member themselves. It is not set in stone who exactly runs which commands.

add to wmf and ops LDAP groups

Connect to the maintenance host (currently: mwmaint1002.eqiad.wmnet):

[mwmaint1002:~] $ sudo modify-ldap-group ops
[mwmaint1002:~] $ sudo modify-ldap-group wmf

Important: if you do this step, you must also do the next step and create a Puppet change.

add to puppet admins module

Clone the operations/puppet repository and open modules/admin/data/data.yaml.

Add the user to the "ldap_only_users:" section if they only have LDAP membership.

If they have both shell access and LDAP membership, they belong in the general section only; do not list them in both.
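To double-check the "not duplicated" rule before sending the Puppet change, a small awk script can flag usernames listed in both sections. This is a sketch under assumptions: it treats "users:" as the general section and expects each user to be a key indented by exactly two spaces under its section; the real file layout may differ, and the function name find_duplicate_users is hypothetical.

```shell
# Sketch: print usernames that appear both under "users:" and under
# "ldap_only_users:" in a data.yaml file ($1).
# Assumed layout (check against the real file):
#   users:
#     jdoe:
#       ...
#   ldap_only_users:
#     jdoe:
find_duplicate_users() {
    awk '
        # A top-level key (column 0) starts a new section.
        /^[^ #][^:]*:/ { section = $0; sub(/:.*/, "", section); next }
        # A key indented by exactly two spaces is a user entry.
        /^  [^ #][^:]*:/ {
            name = $0
            sub(/^  /, "", name)
            sub(/:.*/, "", name)
            if (section == "users") in_users[name] = 1
            else if (section == "ldap_only_users") in_ldap[name] = 1
        }
        END { for (n in in_ldap) if (n in in_users) print n }
    ' "$1" | sort
}

# Usage, from the root of an operations/puppet checkout:
#   find_duplicate_users modules/admin/data/data.yaml
```

Empty output means no user is listed in both sections.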

gerrit login and +2 on operations/puppet

Use your Wikitech credentials to log in to Gerrit. Go to preferences and add your non-prod SSH public key. Your onboarding person should give you +2 voting rights on the operations/puppet repository.
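Gerrit uses the key you add there for SSH access. A minimal sketch of an SSH client stanza that pins the non-prod key for Gerrit, assuming Gerrit's default SSH port (29418) and a hypothetical key path ~/.ssh/id_ed25519_nonprod; your Gerrit username is shown on your Gerrit settings page. The function name gerrit_ssh_config is also hypothetical.

```shell
# Sketch: print an ssh_config stanza for Gerrit.
# Assumptions: Gerrit's default SSH port 29418, and a hypothetical default
# path for the non-prod private key.
gerrit_ssh_config() {
    user=$1                                    # your Gerrit username
    key=${2:-$HOME/.ssh/id_ed25519_nonprod}    # non-prod private key
    cat <<EOF
Host gerrit.wikimedia.org
    User $user
    Port 29418
    IdentityFile $key
    IdentitiesOnly yes
EOF
}

# Usage:
#   gerrit_ssh_config jdoe >> ~/.ssh/config
#   ssh gerrit.wikimedia.org gerrit version   # prints the Gerrit version if the key is accepted
```

IdentitiesOnly stops the SSH client from offering other keys (such as the prod key) to Gerrit.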

phabricator login

phabricator permissions for restricted tickets

Your onboarding person or any Phabricator admin should add you to both of these groups (please have 2FA enabled on your account before proceeding):

shell user (connecting to bastions)

server root shell

This requires approval either in the weekly SRE meeting or from a manager. Then edit puppet/modules/admin/data/data.yaml and add yourself to the groups:ops:members list.

add to private IRC channels

add to ops mailing lists

Either ask the list admins (email them at <list name>-owner@lists.wikimedia.org) or have your onboarding buddy add you.

add to exim mail aliases

  • ssh to a puppetmaster and cd to the directory containing the aliases:
 $ cd /srv/private/modules/privateexim/files
  • open the file wikimedia.org for editing using sudo
  • add your email username (without @wikimedia.org) to the root alias and save
  • stage and commit the change
 $ sudo git add wikimedia.org
 $ sudo git commit -m 'Added <EMAIL> to root@ alias'
  • create email filters using the webmail :)
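The alias edit above can also be done non-interactively. A minimal sketch, assuming the file uses the usual aliases(5) layout with the alias on a single "name: member, member" line (no continuation lines); the function name add_to_alias is hypothetical. It does nothing if the name is already present and leaves the file untouched if the alias is missing.

```shell
# Sketch: append member $3 to alias $2 in aliases(5)-style file $1, e.g.
#   root: alice, bob   ->   root: alice, bob, carol
# Assumption: the alias is defined on one line, without continuation lines.
add_to_alias() {
    file=$1; alias_name=$2; member=$3
    tmp=$(mktemp) || return 1
    awk -v a="$alias_name" -v m="$member" '
        index($0, a ":") == 1 {
            found = 1; present = 0
            # Compare each comma-separated member after trimming blanks.
            n = split(substr($0, length(a) + 2), parts, ",")
            for (i = 1; i <= n; i++) {
                p = parts[i]
                gsub(/^[ \t]+|[ \t]+$/, "", p)
                if (p == m) present = 1
            }
            if (!present) $0 = $0 ", " m
        }
        { print }
        END { if (!found) exit 2 }    # alias not found: report failure
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$file"    # overwrite in place, keeping owner and mode
    rm -f "$tmp"
}

# Usage (root shell, in /srv/private/modules/privateexim/files):
#   add_to_alias wikimedia.org root jdoe && git diff
```

Reviewing git diff before committing confirms that only the root alias line changed.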

icinga login

The web UI is at https://icinga.wikimedia.org. For security reasons it sits behind Apache simple auth, which Icinga itself knows nothing about. To log in, you only need a valid LDAP (Wikitech wiki) user that is in one of the groups "ops", "wmf" or "nda"; this is the same user you use for Gerrit and (most likely) Phabricator. This login is read-only and does not include the right to execute commands from the web UI; for that, see the next section.

icinga permissions

This section assumes you already have a working login (see above). It is only needed for the additional privileges to run commands from the Icinga web UI, such as scheduling downtimes, disabling notifications and leaving comments. It involves setting up a contact with paging in the private repo and adding that contact to cgi.cfg for global permissions to run host commands.

  • go to the private repo on the puppetmaster (the same one that holds passwords and the private exim alias files)
    • add a contact in /srv/private/modules/secret/secrets/nagios/contacts.cfg
    • pick your timezone (to add a new one, edit timeperiods.cfg in the public repo)
    • if your mobile phone provider has an email-to-SMS gateway, use that address as "address1" and ignore "pager". If it does not, use "AQL" (a service WMF pays for). You can copy the format from other existing users.
    • commit locally with git, run Puppet on the Icinga server, and check that the Icinga config is syntactically correct
  • go back to the public repo, add your new contact to the contactgroup called "sms" and merge the change in Gerrit; then run Puppet on the Icinga server again and check that the config is not broken (icinga -v /etc/icinga/icinga.cfg should report 0 errors and 0 warnings)
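The "0 errors and 0 warnings" check can be made scriptable, for example to run right after Puppet. A minimal sketch, assuming the pre-flight output contains summary lines of the form "Total Warnings: N" and "Total Errors: N" (as the Nagios-derived Icinga 1.x check prints); the function name icinga_config_ok is hypothetical.

```shell
# Sketch: succeed only if `icinga -v` output (read on stdin) reports
# zero warnings and zero errors.
# Assumption: summary lines look like "Total Warnings: N" and
# "Total Errors:   N"; if either line is missing, the check fails.
icinga_config_ok() {
    awk '
        /^Total Warnings:/ { warnings = $NF + 0; seen_w = 1 }
        /^Total Errors:/   { errors = $NF + 0; seen_e = 1 }
        END {
            ok = seen_w && seen_e && warnings == 0 && errors == 0
            exit (ok ? 0 : 1)
        }
    '
}

# Usage on the Icinga server:
#   sudo icinga -v /etc/icinga/icinga.cfg | icinga_config_ok && echo "config OK"
```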

In general, Icinga gives privileges to any "contact" (user) who is a contact for a specific service or host. So if a custom contact group for a service or host is defined in Puppet and the user is a member of that contact group, they have the right to run commands. This is the preferred way to give external users rights to "their" services. In SRE, we use a global override, configured in cgi.cfg, to give ourselves unlimited privileges on all services and hosts regardless of contact groups.

  • find "cgi.cfg" in the public repo and add your new contact name to all the "privileged" lines. Careful: the name must match the "CN" field in LDAP, which can be different from your shell username.
  • extra caveat: Apache simple auth doesn't care about the capitalization of your username, so you could be logged in as "foo" or "Foo" in the Icinga web UI. Icinga itself, however, matches the contact name case-sensitively when granting privileges, so it is possible to be logged in with the "wrong" variant that doesn't get them. In that case, log out and log in with the other variant. There is no "logout" link, so you have to close your browser session, clear the cache, or use another browser.
  • in the Icinga web UI, pick a random test host and locate the box with "host commands". To confirm you have the privileges, use "schedule downtime" to schedule a short downtime of a few minutes, or use "send custom notification" and watch the IRC channel for a response from the icinga-wm bot. If this works, you are all set.
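Adding the contact to every privileged line by hand is error-prone. A minimal sketch, assuming the "privileged" lines are the standard Nagios/Icinga 1.x cgi.cfg directives named authorized_for_*, each holding a comma-separated list of contact names; it skips authorized_for_read_only, which restricts rather than grants. The function name add_cgi_privileges is hypothetical, and the file in the public repo may be a template with a different layout.

```shell
# Sketch: add contact $2 to every "authorized_for_*=" line in cgi.cfg file $1.
# Assumptions: these are the "privileged" lines and their values are
# comma-separated contact names. authorized_for_read_only is skipped because
# it limits a user. Matching is exact and case-sensitive, like Icinga's.
add_cgi_privileges() {
    file=$1
    contact=$2
    tmp=$(mktemp) || return 1
    awk -v c="$contact" '
        /^authorized_for_[a-z_]+=/ && !/^authorized_for_read_only=/ {
            n = split(substr($0, index($0, "=") + 1), names, ",")
            have = 0
            for (i = 1; i <= n; i++) {
                x = names[i]
                gsub(/[ \t]/, "", x)    # tolerate "a, b" spacing
                if (x == c) have = 1
            }
            sep = (n > 0) ? "," : ""
            if (!have) $0 = $0 sep c
        }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$file"    # overwrite in place, keeping owner and mode
    rm -f "$tmp"
}

# Usage (name must match your LDAP CN exactly, including capitalization):
#   add_cgi_privileges path/to/cgi.cfg "Jane Doe"
```

Because matching is case-sensitive, running it with "jdoe" after "JDoe" adds a second, distinct entry, which mirrors the capitalization caveat above.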

access to pwstore

  • upload the public part of the GPG key you generated during pre-boarding to a key server: gpg2 --keyserver pgp.mit.edu --send-keys <key id>
  • ask SREs to sign it; two signatures from members of the SRE team are required.
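To avoid typos in <key id>, you can take the full fingerprint from gpg's machine-readable listing and use it both for --send-keys and when asking for signatures. A minimal sketch, assuming GnuPG's --with-colons output format, in which an "fpr" record follows each key record and carries the fingerprint in its 10th field; the function name primary_fingerprint is hypothetical.

```shell
# Sketch: print the primary-key fingerprint from `gpg2 --with-colons`
# output read on stdin. Each "pub" record is followed by an "fpr" record
# whose 10th colon-separated field is the fingerprint; subkey ("sub")
# fingerprints that come later are ignored.
primary_fingerprint() {
    awk -F: '
        $1 == "pub" { want = 1; next }
        $1 == "fpr" && want { print $10; exit }
    '
}

# Usage:
#   fpr=$(gpg2 --with-colons --fingerprint <key id> | primary_fingerprint)
#   gpg2 --keyserver pgp.mit.edu --send-keys "$fpr"
```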
