Analytics/Onboarding: Difference between revisions

imported>Fdans
(Added requirement of adding team member to labs projects)
imported>Nintendofan885
m (Resolving double redirect)
 
=Background=
#REDIRECT [[Data Engineering/Team/Onboarding]]
You will need many accounts, memberships and secret keys to become a productive member of the Analytics team. Here's an overview of things you should do in the first week. <b>Please update this document as you go along!</b> Last but not least, and most importantly: '''welcome to the Analytics team!'''
= First Steps =
This section covers the first logistical steps for joining the Wikimedia staff. Take your time to explore and look around; don't rush!
 
== Wikimedia tech employee orientation ==
Starting point for each new Wikimedia employee: https://office.wikimedia.org/wiki/New_tech_employee_orientation
 
== Wikimedia account ==
Your manager and Wikimedia's IT department will help you open several accounts, including your work email. After this step, you will be able to communicate and participate in the day-to-day discussions between staff members. Be patient and don't be intimidated by the large amount of information and email that you'll receive!
 
==E-mail lists==
Reading mailing lists is important. All projects we build or use are open source and, like most open-source projects, they have communities that come together on mailing lists. There is much knowledge to be gained from these lists.
 
Once you have a Wikimedia e-mail address you should subscribe yourself to these e-mail lists:
* [[mail:analytics-internal|analytics-internal@]] ([https://lists.wikimedia.org/mailman/private/analytics-internal/ Archive])
 
* [[mail:analytics|analytics@]] ([[mailarchive:analytics|Archive]])
* [[mail:wikimetrics|wikimetrics@]]
* analytics-alerts - needs a phabricator task like [[phab:T123141|https://phabricator.wikimedia.org/T123141]]
* Please request access to:
** [[mail:Ops|Operations]]
** [[mail:Engineering|Engineering]]
For an overview of all available mailing lists see [https://lists.wikimedia.org/mailman/listinfo https://lists.wikimedia.org/mailman/listinfo]
 
* Optionally you may want to read archives or subscribe to the following mailing lists:
# [https://lists.wikimedia.org/mailman/listinfo/mediawiki-l/ Mediawiki]
# [https://lists.wikimedia.org/mailman/listinfo/mobile-l/ Mobile]
# [https://lists.wikimedia.org/mailman/listinfo/mediawiki-api/ Mediawiki API]
 
If there are mailing lists you want to read without subscribing you may consider using the following gateways:
# [http://www.mail-archive.com/find.php?q=wikimedia&sa=Search&lists=all Mail Archive]
# [http://news.gmane.org/index.php?prefix=gmane.org.wikimedia Gmane]
# [http://www.gossamer-threads.com/lists/wiki/ Gossamer Threads]
==IRC==
Most of our communication happens on IRC, so you should set up an IRC nick:
# Install an IRC client -- ask team members for recommendations (some options are [http://quassel-irc.org/ quassel], [http://irssi.org/ irssi], [http://www.pidgin.im/ pidgin], [http://xchat.org/ xchat], [http://www.codeux.com/textual/ textual] or [https://adium.im/ adium] if you're on a Mac)
# Follow the instructions on [https://meta.wikimedia.org/wiki/IRC/Cloaks https://meta.wikimedia.org/wiki/IRC/Cloaks] to request an IRC cloak
# Connect to #wikimedia-analytics on Freenode
# Other channels you might be interested in: #wikimedia-labs, #wikimedia-operations, #wikimedia-office
 
==Office wiki ==
Make sure you have an employee account and that you can use the office wiki. Your office wiki user will be given to you once you get your Wikimedia e-mail address.
 
https://office.wikimedia.org/wiki/Getting_Started_With_User_Info_and_Talk_Pages
 
==Headset==
Please buy a high-quality headset -- your colleagues will love you for this. For more tips see [https://office.wikimedia.org/wiki/Office_IT/Projects/Telepresence https://office.wikimedia.org/wiki/Office_IT/Projects/Telepresence]
 
==Culture==
We are part of a movement with a unique culture.  It's worth taking the time to read a bit about how our biggest project works.  This policy could be a useful start, as it introduces the core concepts from a concrete point of view: https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons
 
=Getting permissions=
Please follow the next subsections (order matters!) to get permissions for various fundamental services. Access to Production will be covered in a separate section.
 
==Wikitech/Labs==
Labs is a cluster of virtual machines. Access is completely decoupled from production, and different SSH keys should be used.
 
Labs is not production, but we have several tools hosted on the cluster. Access to Labs requires a [https://wikitech.wikimedia.org wikitech] account:
# [https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page Create] account
# [[Help:Access|Request]] Shell Access
# Log in
# You need to [[Help:Access|set up]] ssh keys
# [[Special:Preferences|Upload]] your public SSH key. Keep in mind that Labs is a testing environment, so this SSH key should only be used for testing. If you need access to machines in the production cluster, use a different SSH key (see the section below about production access).
# Configure your '''~/.ssh/config''' with bastion hosts
# Ask someone in the team to add you to the relevant projects in Labs. Once you're added you should be able to see the projects' instances listed [[Special:NovaInstance|here]].
# Get familiar with the labs environment, how to use the labs interface to spin up nodes, remove nodes, etc
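
The requirement to keep a dedicated SSH key for Labs, separate from any production key, can be sketched as follows. This is only an illustration: the key type (ed25519), the file naming scheme and the e-mail address are assumptions, not something this page prescribes. The helper only builds the <code>ssh-keygen</code> command line; it does not run it.

```python
import shlex
from pathlib import Path
from typing import Optional


def keygen_command(environment: str, email: str,
                   ssh_dir: Optional[Path] = None) -> list[str]:
    """Build an ssh-keygen invocation for one environment's dedicated key."""
    if ssh_dir is None:
        ssh_dir = Path.home() / ".ssh"
    # Hypothetical naming scheme: one key file per environment.
    key_path = ssh_dir / f"id_ed25519_{environment}"
    return [
        "ssh-keygen",
        "-t", "ed25519",                   # key type (an assumption, see above)
        "-f", str(key_path),               # private key file; public key is <file>.pub
        "-C", f"{email} ({environment})",  # comment that identifies the key
    ]


if __name__ == "__main__":
    # One key for Labs (testing) and a different one for production.
    for env in ("labs", "production"):
        print(shlex.join(keygen_command(env, "you@example.org")))
```

Running the script prints two commands that you can review and paste into a terminal. The resulting <code>.pub</code> file for Labs is the one to upload in the step above.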
 
== Phabricator ==
https://phabricator.wikimedia.org is the version of Phabricator that we use. Follow [https://www.mediawiki.org/wiki/Phabricator/Help#Creating_your_account_and_notifications this page] to log in for the first time (please use the sunflower icon as suggested by the tutorial to leverage the single sign on).
 
==Mediawiki==
# [https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup Create an account]
# Log in
 
==Gerrit==
# Gerrit is the code review tool we use, built on top of Git
# Log in to [https://gerrit.wikimedia.org/r/#/ Gerrit] using your Wikitech/Labs credentials.
# To verify everything works, clone a repo from [https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytic https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytics] using SSH.
# Take a look at how to deal with gerrit in different work scenarios: http://etherpad.wikimedia.org/p/analytics-gerrit
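
As a rough illustration of the "clone a repo using SSH" check, the sketch below builds the <code>git clone</code> command for a Gerrit project. It assumes Gerrit's default SSH port (29418) is the one in use here; the username and the repository name <code>analytics/some-repo</code> are placeholders, not real values.

```python
import shlex

GERRIT_HOST = "gerrit.wikimedia.org"
GERRIT_SSH_PORT = 29418  # Gerrit's default SSH port; assumed to apply here


def gerrit_clone_command(username: str, repo: str) -> list[str]:
    """Build a git clone command that fetches `repo` from Gerrit over SSH."""
    url = f"ssh://{username}@{GERRIT_HOST}:{GERRIT_SSH_PORT}/{repo}"
    return ["git", "clone", url]


if __name__ == "__main__":
    # Both arguments are placeholders: use your own shell username and a
    # project name taken from the Gerrit project list.
    print(shlex.join(gerrit_clone_command("your-shell-name", "analytics/some-repo")))
```

If the printed command succeeds when you run it, your Gerrit account and SSH key are set up correctly.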
 
= Accessing production infrastructure=
With great power comes great responsibility. Please read [https://wikitech.wikimedia.org/wiki/SSH_access Wikimedia's SSH access guidelines] carefully and familiarize yourself with your new SSH config before proceeding. Moreover, we manage very sensitive data, so please read [[Analytics/Data access]] to familiarize yourself with our procedures.
 
== Shell access to Wikimedia cluster and production infrastructure==
Access requests are filed as Phabricator tickets for the ops team to see, and they need to be approved by a manager (example: [[phab:T96053|https://phabricator.wikimedia.org/T96053]]).
 
Talk with [[User:Aotto|Andrew Otto]] about how to submit your SSH public key. You will likely need to proxy your SSH connection through a known machine to access some of the hosts. You should not use the same SSH key for Labs (testing) and stat1 machines (production).
 
The easiest approach is to ask a team member for their .ssh/config file and copy the proxy setup.
 
Keep in mind that different processes are required to access production machines (stat1) and testing machines (Labs).
 
== Sample ssh config ==
See [[SSH access|SSH access#SSH configuration]] for sample SSH config. If you're in the analytics team you will probably SSH into both Labs and Production, so add relevant config for both in your ~/.ssh/config file.
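
To make the "config for both" idea concrete, here is a minimal sketch that renders <code>~/.ssh/config</code> stanzas for two environments, each with its own key and its own bastion. All hostnames (<code>*.example.org</code>) and key file names are hypothetical placeholders; take the real values from the linked SSH access page or from a team member's config. The keywords used (<code>Host</code>, <code>User</code>, <code>IdentityFile</code>, <code>IdentitiesOnly</code>, <code>ProxyJump</code>) are standard OpenSSH client options.

```python
from typing import Optional


def host_block(pattern: str, user: str, identity_file: str,
               proxy_jump: Optional[str] = None) -> str:
    """Render one Host stanza of an OpenSSH client config."""
    lines = [
        f"Host {pattern}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
        "    IdentitiesOnly yes",  # offer only the key named above
    ]
    if proxy_jump:
        lines.append(f"    ProxyJump {proxy_jump}")  # hop through the bastion
    return "\n".join(lines)


def render_config(user: str) -> str:
    """Render stanzas for Labs and production, each with a separate key."""
    labs_key = "~/.ssh/id_ed25519_labs"        # hypothetical file name
    prod_key = "~/.ssh/id_ed25519_production"  # hypothetical file name
    blocks = [
        # Hypothetical hostnames: replace with the real bastions and domains.
        host_block("labs-bastion.example.org", user, labs_key),
        host_block("*.labs.example.org", user, labs_key,
                   proxy_jump="labs-bastion.example.org"),
        host_block("prod-bastion.example.org", user, prod_key),
        host_block("*.prod.example.org", user, prod_key,
                   proxy_jump="prod-bastion.example.org"),
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_config("your-shell-name"))
```

The bastion names are chosen so that they do not match the wildcard patterns; otherwise the <code>ProxyJump</code> line would also apply to the bastion itself.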
== Logging in ==
Once you have your SSH setup in place and your credentials have been approved by Ops (using the Phabricator task created before), you will be able to explore the Analytics infrastructure. Please start from [[Analytics]] and check the instructions for individual projects, for example:
* [[Analytics/Cluster]]
* [[Analytics/Vital Signs]]
* [[Analytics/EventLogging]]
Talk with the people of your team on IRC about their work and pointers to their projects, so you will get a more precise idea about who does what. Be patient, it will take a while to get a good overall picture!
 
= Process =
==Google Calendar==
Add the Analytics Team Calendar to your default view. Someone on the team (we can all manage sharing) should go to https://calendar.google.com and add you:
* My Calendars -> Settings
* Click Team Analytics -> Share This Calendar
* Add the new person
 
== Scrum ==
[[Analytics/Scrum_Planning]]
=Equipment=
 
==Hardware==
As far as equipment goes you will need a good development machine.
 
Minimum machine specs:
 
* >=4GB RAM
* i7 >= 2.4 GHz quad-core or better
* 300GB disk
 
Recommended machine specs:
 
* >=8GB RAM
* i7 >= 2.4 GHz quad-core or better
* SSD (if you're going to be working on wikimetrics)
* 300GB disk
 
At first sight you might think these specs are not required, but you will have to run VMs: you will be using Vagrant
to re-create various environments (sometimes with multiple nodes), so you will need hardware that can handle that.
=Optional accounts=
 
You could consider creating accounts for:
* [https://github.com GitHub]
 
=Operating System=
 
The machines we deploy on run Ubuntu, so it would be more convenient for you to have
[http://www.ubuntu.com/download/desktop Ubuntu], or any other UNIX-based operating system, installed on your development machine.
It will considerably facilitate your work.
You may choose [http://distrowatch.com/ any other Linux distribution] you're familiar with.
 
A Mac is also a viable choice.
 
=Misc=
 
This is a collection of things you might find useful in your work.
 
==Sync tools==
 
You may find the following tools useful for syncing files between your local machine and remote machines (one-way or two-way). Some of them also let you mount remote directories as if they were local directories:
 
# '''[http://fuse.sourceforge.net/sshfs.html sshfs]'''
# '''[http://linux.die.net/man/1/rsync rsync]'''
# [http://code.google.com/p/lsyncd/ lsyncd]
# [http://www.cis.upenn.edu/~bcpierce/unison/ unison]
# '''[http://linux.die.net/man/1/scp scp]'''
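
As one possible way to use rsync from the list above, the sketch below assembles a one-way sync command from a local directory to a remote machine. The host and paths are placeholders. The flags are standard rsync options: <code>-a</code> (archive mode), <code>-z</code> (compress in transit), <code>--dry-run</code> (show what would change without changing it) and <code>--delete</code> (remove remote files that no longer exist locally). The helper defaults to a dry run as a precaution; that default is a choice made for this example, not a team convention.

```python
import shlex


def rsync_command(local_dir: str, remote_host: str, remote_dir: str,
                  dry_run: bool = True, delete: bool = False) -> list[str]:
    """Build a one-way rsync command from local_dir to remote_host:remote_dir."""
    cmd = ["rsync", "-az"]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    if delete:
        cmd.append("--delete")   # mirror deletions to the remote side
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = local_dir.rstrip("/") + "/"
    cmd += [source, f"{remote_host}:{remote_dir}"]
    return cmd


if __name__ == "__main__":
    # Placeholder host and paths.
    print(shlex.join(rsync_command("project", "remote.example.org",
                                   "/home/you/project")))
```

Review the dry-run output first, then call the helper with <code>dry_run=False</code> to produce the command that performs the actual transfer.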
 
==IDEs and editors==
 
For Java development, you may use whatever IDE you feel comfortable with. [http://www.eclipse.org/kepler/ Eclipse] is the IDE du jour, but you might also want to look at [http://www.jetbrains.com/idea/ IDEA].
For remote development you may find [http://www.vim.org/ vim] useful (or a combination of a sync tool and your favorite editor/IDE).
Other editors you might find useful include [http://www.sublimetext.com/2 Sublime Text] and [http://www.gnu.org/software/emacs/ Emacs].
 
==Searching==
 
You may find the following tools useful to search through configuration files or code:
 
# [http://beyondgrep.com/ Ack] (mainly for grepping code. [http://www.youtube.com/watch?v=sKmyl5D8Da8 video presentation])
# [http://www.gnu.org/software/grep/manual/grep.html grep]
# [http://www.gnu.org/software/findutils/manual/html_mono/find.html GNU find]
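
When you need the same kind of search from inside a script, a small Python function can stand in for a recursive grep. The sketch below is only an illustration of the idea and is not a replacement for the tools listed above; the function name and its parameters are made up for this example.

```python
import re
from pathlib import Path
from typing import Iterator


def search_tree(root: Path, pattern: str,
                glob: str = "*") -> Iterator[tuple[Path, int, str]]:
    """Yield (file, line number, line) for each line under root matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line


if __name__ == "__main__":
    # Search Python files under the current directory for lines with "import".
    for path, number, line in search_tree(Path("."), r"\bimport\b", "*.py"):
        print(f"{path}:{number}:{line}")
```

The output format mirrors <code>grep -rn</code>, with the file, the line number and the matching line separated by colons.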
 
[[Category:Wikimedia Foundation teams internals]]
 
==Environment simulation==
 
It may be useful to familiarize yourself with [http://www.vagrantup.com/ Vagrant] and [http://docs.puppetlabs.com/learning/ Puppet] so that you can recreate smaller environments and conditions on your machine to test the software you're developing or contributing to.
 
== Important Talks ==
Talks recommended by other members of the Analytics team:
* [https://www.youtube.com/watch?v=-We4GZbH3Iw&feature=youtu.be&t=33m51s The Paramecium Talk], Aaron Halfaker
* [https://www.youtube.com/watch?v=XPsSXczerDQ Kafka @ Wikimedia foundation], Andrew Otto
* [https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg Hadoop and Beyond. An overview of Analytics infrastructure]

Latest revision as of 11:05, 30 November 2021