SRE Onboarding: Difference between revisions

imported>Kormat
m (move gpg generation instruction to pre-boarding)
imported>Muehlenhoff
(More removals)
== Phabricator ticket template ==
This is a template to copy/paste into a Phabricator ticket to get all the needed checkboxes to onboard a member of the SRE team.
This is taken from existing onboarding tickets, to be edited:


<pre>
[] Create shell user (can connect to bastions)
[] server root shell (membership in ops group)
[] Phabricator User + 2FA
[] Phabricator permissions to see NDA and Ops restricted tickets, and added to trusted users for antivandal exempt: https://phabricator.wikimedia.org/project/profile/29/ https://phabricator.wikimedia.org/project/profile/61/ https://phabricator.wikimedia.org/project/profile/974/
[] Add to private IRC channels https://office.wikimedia.org/wiki/IRC#Channel_operators_commands
[] Add to ops mailing lists (`ops` and `ops-private` minimum requirements)
[] Add to Exim mail aliases (`root` via `private.git:modules/privateexim/files/wikimedia.org`)
[] Icinga user and permissions (icinga commands, paging/notifications)
[] Add to wmf and ops LDAP groups (for web services)
[] Access to Office Wiki (OIT grants that)
[] Gerrit login and +2 on operations/puppet (this is automatic from being added to LDAP groups above)
[] Access to pwstore
[] Access to Google group for maint-announce mails (directly added user via "web only participation" option from https://groups.google.com/a/wikimedia.org/forum/#!managemembers/ops-maintenance/add though anyone in wikimedia org should be able to join)
[] Add to "Ops vendor maintenance" Calendar
</pre>
== Pre-boarding checklist ==
# Get access to @wikimedia.org email
# [[Production shell access|Generate 2 SSH keys]]
#* ''prod'' - for production logins, e.g. [[:en:Bastion_host|bastions]].
#* ''non-prod'' - for everything else, e.g. [[horizonlabs:|Cloud Services (WMCS)]] and [https://gerrit.wikimedia.org/ Gerrit]. (Note: this key is sometimes referred to as the ''cloud'' or ''labs'' key.)
# Select a shell/unix name, '''<u>this is important, choose wisely</u>'''.
# Select a username on [https://wikitech.wikimedia.org/ Wikitech], '''<u>this is important, choose wisely</u>'''.
#* The [https://wikitech.wikimedia.org/ Wikitech] username (aka ''LDAP username'') will be visible in many places (including [https://gerrit.wikimedia.org/ Gerrit]). '''You will be seeing it ALL THE TIME.''' Your full name ("Jimmy Wales") is not the worst idea here.
# Generate a GPG key
#* Run <code>gpg2 --full-generate-key</code> and follow the prompts (the default expiration is "never", but you should limit it to 1 year).
# Have a mobile phone number to receive work SMSes and occasionally calls
# Choose an IRC nickname. Make sure nick enforcement is turned on and/or get an [[metawiki:IRC/Cloaks|IRC cloak]].
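The two-key setup from the checklist can be sketched as a small shell function. The key type (ed25519), the file names, and the key comment format are assumptions chosen for illustration, not Wikimedia requirements; [[Production shell access]] remains the authoritative reference.

```shell
# Sketch: create one SSH key per purpose so the prod key is never reused elsewhere.
# Assumptions: ed25519 keys and hypothetical file names id_ed25519_wmf_<purpose>.
generate_wmf_keys() (
  keydir="${1:-$HOME/.ssh}"
  mkdir -p "$keydir" && chmod 700 "$keydir"
  for purpose in prod nonprod; do
    keyfile="$keydir/id_ed25519_wmf_$purpose"
    if [ -e "$keyfile" ]; then
      echo "skipping $purpose: $keyfile already exists"
      continue
    fi
    # ssh-keygen prompts for a passphrase; use a different one for each key.
    ssh-keygen -t ed25519 -f "$keyfile" -C "$(id -un)-wmf-$purpose"
  done
)
```

Calling <code>generate_wmf_keys</code> with no argument writes to <code>~/.ssh</code>; pass a directory as the first argument to write elsewhere. Existing key files are left untouched.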
=== Wikitech ===
After creating an account on [https://wikitech.wikimedia.org/ Wikitech], go to [[wikitech:Special:Preferences|preferences]]:
* User Profile: Enable 2FA (hint: use [https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2&hl=el Google Authenticator])
* Openstack: Add your ''non-prod'' key
== Technical details for each checkbox ==
Note: creating the shell user, the Phabricator user, and the LDAP user should be done as part of the regular shell access process documented at https://wikitech.wikimedia.org/wiki/Production_shell_access. The only difference is that we can skip the "creating a ticket" part.
Some of these instructions are for the existing team member who is doing the onboarding, and some can be carried out by the person being onboarded. It is not set in stone who exactly runs the commands.
=== add to wmf and ops LDAP groups ===
Connect to the maintenance host (currently: mwmaint1002.eqiad.wmnet) and run:
 [mwmaint1002:~] $ sudo modify-ldap-group ops
 [mwmaint1002:~] $ sudo modify-ldap-group wmf
Important: If you do this step you must also do the next step and create a puppet change.
=== add to puppet admins module ===
Clone the operations/puppet repo and open <code>modules/admin/data/data.yaml</code>.
Add the user to the "ldap_only_users:" section if they have only LDAP membership.
If they have shell access AND LDAP membership, they should be listed in the general section only and not be duplicated.
=== gerrit login and +2 on operations/puppet ===
Use your [https://wikitech.wikimedia.org/ Wikitech] credentials to log in to [https://gerrit.wikimedia.org/ Gerrit]. Go to [https://gerrit.wikimedia.org/r/#/settings/ssh-keys preferences] and add your ''non-prod'' key. Your onboarding person should give you +2 voting rights on the operations/puppet repository.
=== phabricator login ===
* Log in to [https://phabricator.wikimedia.org/auth/start/?next=%2F Phabricator] using your [https://wikitech.wikimedia.org/ Wikitech] credentials.
* Enable ''2FA'' on your [https://phabricator.wikimedia.org/auth/start/?next=%2F Phabricator] account: Settings -> Multi-Factor Auth
=== phabricator permissions for restricted tickets ===
Your onboarding person or any Phabricator admin should add you to both of these groups (<u>please have 2FA enabled on your account before proceeding</u>):
* [[phab:project/members/974/|WMF-NDA-Requests]]
* [[phab:project/members/61/|WMF-NDA]]
=== shell user (connecting to bastions) ===
* Read [[phab:L3|Wikimedia Server Access Responsibilities]] for your responsibilities, and [[Bastion]] for more details on the bastion hosts. Pick the one geographically closest to you.
* Read [[Production shell access]] and copy the SSH config described there.
* Prepare a [[Adding_users_on_puppet|patch for your user on puppet]].
=== server root shell ===
This requires either an approval in the weekly SRE meeting or from a manager.
Go to <code>puppet/modules/admin/data/data.yaml</code> and add yourself to the <tt>groups:ops:members</tt> list
=== add to private IRC channels === 
=== add to ops mailing lists ===
You can either ask the [https://lists.wikimedia.org/mailman/listinfo/ops list admins] (email them at <tt><list name>-owner@lists.wikimedia.org</tt>) or have your onboarding buddy do it for you.
=== add to exim mail aliases ===
* <tt>ssh</tt> to a puppetmaster and <tt>cd</tt> to the directory containing aliases
  $ cd /srv/private/modules/privateexim/files
* open <code>wikimedia.org</code> for editing using <tt>sudo</tt>
* add your email (without @wikimedia.org) to the <tt>root</tt> alias and save
* commit it
  $ sudo git commit -m 'Added <EMAIL> to root@ alias'
* create email filters using the webmail :)
=== icinga login ===
The web UI is at https://icinga.wikimedia.org. There is Apache simple auth in front of it for security reasons but Icinga itself has no idea about that.
To log in, you just need a valid LDAP (Wikitech wiki) user that is in one of the groups "ops", "wmf" or "nda". This is the same user you use for Gerrit or (most likely) Phabricator.
This is a read-only login that doesn't involve the right to execute commands from the web UI. For this see further below.
=== icinga permissions ===
First see the part above about having a working login user. This part is only needed for additional privileges to run commands from the Icinga web UI such as scheduling downtimes, disabling notifications, leaving comments etc. It includes setting up a contact in the private repo with paging and being added to cgi.cfg for global permissions to run host commands.
* go to the private repo on the puppetmaster (the same one that holds passwords and the private exim alias files)
** add a contact in /srv/private/modules/secret/secrets/nagios/contacts.cfg
** pick your timezone (if you want a new one, add it in the public repo in timeperiods.cfg)
** if your mobile phone provider has an email2SMS gateway, use the address for that as "address1"; you can ignore "pager". If you don't have one, use "AQL" (WMF pays for this service). You can copy the format from other existing users.
** git commit locally, run puppet on the Icinga server, and check that the Icinga server config is syntactically correct
* go back to the public repo and add your new contact to the contactgroup called "sms", merge in Gerrit, run puppet on the Icinga server again, and check that the config isn't broken (<code>icinga -v /etc/icinga/icinga.cfg</code> should show 0 errors and 0 warnings)
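The "0 errors or warnings" check can be automated with a small filter over the verification output. This is a sketch only: it assumes the output contains <code>Total Warnings: N</code> and <code>Total Errors: N</code> summary lines in the Nagios style, and the function name <code>icinga_config_clean</code> is hypothetical.

```shell
# Sketch: succeed only if the Icinga pre-flight check reports zero warnings and errors.
# Assumption: the output contains "Total Warnings: N" and "Total Errors: N" lines.
icinga_config_clean() {
  # Reads the output of `icinga -v <config>` on stdin.
  awk '
    /^Total Warnings:/ { warnings = $3; seen_w = 1 }
    /^Total Errors:/   { errors = $3; seen_e = 1 }
    END { exit !(seen_w && seen_e && warnings == 0 && errors == 0) }
  '
}

# Example use on the Icinga server (not executed here):
#   sudo icinga -v /etc/icinga/icinga.cfg | icinga_config_clean && echo "config OK"
```

The filter also fails when neither summary line is present, so a crashed or truncated check is not mistaken for a clean one.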
In general Icinga will give privileges to any "contact" (user) who is a contact for a specific service or host. So if a custom contact group for a service/host is defined in puppet and the user is a member of that contactgroup then they have the right to run commands. This should be the preferred method for external users to give them rights to "their" services. In SRE we are using a global override to give ourselves unlimited privileges on all services/hosts regardless of contact groups. The global override is configured in cgi.cfg.
* find "cgi.cfg" in the public repo and add your new contact name to all the "privileged" lines. Careful: this needs to match the "CN" field in [[LDAP#Common_LDAP_administrative_actions|LDAP]], which can be different from your shell user name
* extra caveat: Apache simple auth doesn't care about the capitalization of your user name, so you could be logged in as "foo" or "Foo" in the Icinga web UI. Icinga itself, however, matches the contact name to give you privileges, and it does care about capitalization. This means it's possible to be logged in with the "wrong" version that doesn't get the Icinga privileges. In that case log out and log in with the other variant. There is no "logout" link, so you have to close your browser session / empty the cache or use another browser.
* in the Icinga web UI pick a random test host and locate the box with "host commands". Use "schedule downtime" and schedule a short downtime of a few minutes, or use "send custom notification" and watch the IRC channel for a response from the bot icinga-wm to confirm you have the privileges. If this works you are good.
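The capitalization caveat above can be illustrated with a short comparison. This is a hypothetical helper, not part of Icinga or Apache: it only mimics the described behaviour, where the login layer ignores case but the privilege match is exact.

```shell
# Sketch: compare a login name with an Icinga contact name the way the text
# describes: the auth layer ignores case, Icinga's privilege match does not.
classify_login() (
  login="$1"
  contact="$2"
  lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }
  if [ "$login" = "$contact" ]; then
    echo "exact match: privileges apply"
  elif [ "$(lower "$login")" = "$(lower "$contact")" ]; then
    echo "case mismatch: logged in, but no privileges"
  else
    echo "different user"
  fi
)
```

For example, <code>classify_login foo Foo</code> reports a case mismatch, which is the situation where you need to log out and log in again with the other variant.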
=== access to pwstore ===
* Using the GPG key you generated in [[#Pre-boarding checklist|pre-boarding]], run <code>gpg2 --keyserver pgp.mit.edu --send-keys <key id></code> to upload your public key to a [[:en:Key_server_(cryptographic)|key server]]
* reach out to any SRE to get your key signed. It requires two signatures from members of the SRE team.
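To find the value to use as <code><key id></code>, one option is to read the fingerprint from GnuPG's machine-readable listing. The helper name <code>first_fingerprint</code> and the example address are hypothetical; the sketch relies on the colon-separated format printed by <code>--with-colons</code>, where the fingerprint is field 10 of an <code>fpr</code> record.

```shell
# Sketch: print the fingerprint of the first key in a colon-format GnuPG listing.
first_fingerprint() {
  # Reads `gpg2 --list-secret-keys --with-colons <email>` output on stdin.
  # The first "fpr" record belongs to the primary key; field 10 is the fingerprint.
  awk -F: '$1 == "fpr" { print $10; exit }'
}

# Example use (not executed here; the address is a placeholder):
#   fpr=$(gpg2 --list-secret-keys --with-colons you@wikimedia.org | first_fingerprint)
#   gpg2 --keyserver pgp.mit.edu --send-keys "$fpr"
```

Using the full fingerprint rather than a short key ID avoids ambiguity when the signers look up your key.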


== See Also ==
*[[Ops_Onboarding_Chats|Ops Onboarding Chat Sessions]]
*[[Infrastructure naming conventions]]
*[[Adding_users_on_puppet|Adding users on Puppet]]
*https://office.wikimedia.org/wiki/Technology/Onboarding/Checklists/Template
*https://office.wikimedia.org/wiki/SRE/SRE_sessions

Revision as of 16:30, 16 April 2020
