
Analytics/Onboarding: Difference between revisions

From Wikitech-static
imported>Milimetric
imported>Elukey
(Improvements for newcomers)
Line 1: Line 1:
=Background=
=Background=
You will need lots of accounts, memberships and other secret keys to become a real productive member of the Analytics team. Here's an overview of things you should do in the first week. <b>Please update this document as you go along </b>
You will need lots of accounts, memberships and other secret keys to become a real productive member of the Analytics team. Here's an overview of things you should do in the first week. <b>Please update this document as you go along!</b>
 
 
= First Things =
= First Things =
==e-mail list==
==E-mail lists==
Once you have a wikimedia e-mail address you should subscribe yourself to two e-mail lists:
Once you have a wikimedia e-mail address you should subscribe yourself to two e-mail lists:
analytics-internal@
* [[mail:analytics-internal|analytics-internal@]] ([https://lists.wikimedia.org/mailman/private/analytics-internal/ Archive])
analytics@
 
Link:
https://lists.wikimedia.org/mailman/listinfo/analytics
 
https://lists.wikimedia.org/mailman/listinfo/analytics-internal


Archives:
* [[mail:analytics|analytics@]] ([[mailarchive:analytics|Archive]])
 
https://lists.wikimedia.org/pipermail/analytics/
https://lists.wikimedia.org/mailman/private/analytics-internal/


==IRC==
==IRC==
Line 39: Line 28:
We are part of a movement with a unique culture.  It's worth taking the time to read a bit about how our biggest project works.  This policy could be a useful start, as it introduces the core concepts from a concrete point of view: https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons
We are part of a movement with a unique culture.  It's worth taking the time to read a bit about how our biggest project works.  This policy could be a useful start, as it introduces the core concepts from a concrete point of view: https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons


=Getting permits=
=Getting permit=
Sample ticket asking for permits: https://phabricator.wikimedia.org/T96053
The very first thing to do will be to get a labs/Wikitech (they are the same thing) account and a Phabricator account so you can file a task like https://phabricator.wikimedia.org/T96053 (likely somebody else will do it for you).
* The very first thing to do will be to get a labs account and a phabricator account so you can file tickets for anything else you need.
* Labs is a cluster of virtual machines. Access is completely decouple from production and different ssh keys should be used.


==Labs==
Please follow the next subsections (order matters!).
Labs is not production but we have several tools hosted on the cluster, accessing to labs requires a wikitech account (I know, confusing)
 
==Wikitech/Labs==
Labs is a cluster of virtual machines. Access is completely decouple from production and different ssh keys should be used.
 
Labs is not production but we have several tools hosted on the cluster, accessing to labs requires a [https://wikitech.wikimedia.org wikitech] account:
# Create account: [https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page]
# Create account: [https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page]
# Request Shell Access: https://wikitech.wikimedia.org/wiki/Help:Access
# Request Shell Access: https://wikitech.wikimedia.org/wiki/Help:Access
Line 56: Line 47:


== Phabricator ==
== Phabricator ==
https://phabricator.wikimedia.org is the version of phabricator that we use.You can browse tasks but to create/edit you need access to LDAP.
https://phabricator.wikimedia.org is the version of Phabricator that we use. Follow [[mediawikiwiki:Phabricator/Help#Creating_your_account_and_notifications|https://www.mediawiki.org/wiki/Phabricator/Help#Creating_your_account_and_notifications]] to log in for the first time (please use the sunflower icon as suggested by the tutorial to leverage the single sign on).
 
 
==Mediawiki==
==Mediawiki==
# Create account: [https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup]
# Create account: [https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup]
Line 65: Line 54:
==Gerrit==
==Gerrit==
# Gerrit is the code review workflow we use, build on top of git
# Gerrit is the code review workflow we use, build on top of git
# Log in to [https://gerrit.wikimedia.org/r/#/ Gerrit] using your Labs credentials.
# Log in to [https://gerrit.wikimedia.org/r/#/ Gerrit] using your Wikitech/Labs credentials.
# To verify everything works, clone a repo repo from [https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytic https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytics] using SSH.
# To verify everything works, clone a repo repo from [https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytic https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytics] using SSH.
# Take a look at how to deal with gerrit in different work scenarios: http://etherpad.wikimedia.org/p/analytics-gerrit
# Take a look at how to deal with gerrit in different work scenarios: http://etherpad.wikimedia.org/p/analytics-gerrit


==Mailinglists==
==Mailing lists==


Reading mailing lists is important. All projects we build or use are opensource, and as most opensource projects, they have communities which come together on mailing lists. There is much knowledge to be gained in these mailing lists.
Reading mailing lists is important. All projects we build or use are open-source, and as most open-source projects, they have communities which come together on mailing lists. There is much knowledge to be gained in these mailing lists.


* Please subscribe to:
* Please subscribe to:
# [https://lists.wikimedia.org/mailman/listinfo/analytics https://lists.wikimedia.org/mailman/listinfo/analytics]
# [https://lists.wikimedia.org/mailman/listinfo/analytics https://lists.wikimedia.org/mailman/listinfo/analytics]
# [[mail:analytics-internal|https://lists.wikimedia.org/mailman/listinfo/analytics-internal]]
# [https://lists.wikimedia.org/mailman/listinfo/wikimetrics https://lists.wikimedia.org/mailman/listinfo/wikimetrics]
# [https://lists.wikimedia.org/mailman/listinfo/wikimetrics https://lists.wikimedia.org/mailman/listinfo/wikimetrics]
# [https://lists.wikimedia.org/mailman/listinfo/wmfresearch https://lists.wikimedia.org/mailman/listinfo/wmfresearch]
# [https://lists.wikimedia.org/mailman/listinfo/wmfresearch https://lists.wikimedia.org/mailman/listinfo/wmfresearch]
* Please request acces to:
* Please request acces to:
# Analytics internal (email Toby)
** Operations
# Operations
** Engineering  
# Engineering  
For an overview of all available mailing lists see [https://lists.wikimedia.org/mailman/listinfo https://lists.wikimedia.org/mailman/listinfo]
For an overview of all available mailinglists see [https://lists.wikimedia.org/mailman/listinfo https://lists.wikimedia.org/mailman/listinfo]


* Optionally you may want to read archives or subscribe to the following mailing lists:
* Optionally you may want to read archives or subscribe to the following mailing lists:
Line 91: Line 80:
# [http://www.mail-archive.com/find.php?q=wikimedia&sa=Search&lists=all Mail Archive]
# [http://www.mail-archive.com/find.php?q=wikimedia&sa=Search&lists=all Mail Archive]
# [http://news.gmane.org/index.php?prefix=gmane.org.wikimedia Gmane]
# [http://news.gmane.org/index.php?prefix=gmane.org.wikimedia Gmane]
# [http://www.gossamer-threads.com/lists/wiki/ Gossamer Threads]
# [http://www.gossamer-threads.com/lists/wiki/ Gossamer Threads]<br>
 
 
= Accessing production infrastructure=
= Accessing production infrastructure=



Revision as of 22:52, 5 January 2016

Background

You will need lots of accounts, memberships and other secret keys to become a real productive member of the Analytics team. Here's an overview of things you should do in the first week. Please update this document as you go along!

First Things

E-mail lists

Once you have a wikimedia e-mail address you should subscribe yourself to two e-mail lists:

IRC

Most of our communication happens on IRC, so you should set up an IRC nick:

  1. Install an IRC client -- ask team members for recommendations (some options are quassel, irssi, pidgin, xchat, textual or adium if you're on a Mac)
  2. Follow instructions on https://meta.wikimedia.org/wiki/IRC/Cloaks to request an IRC cloak
  3. Connect to #wikimedia-analytics on Freenode
  4. Other channels you might be interested in:
#wikimedia-labs, #wikimedia-operations, #wikimedia-office

Office wiki

Make sure you have an employee account and that you can use the office wiki. Your office wiki user will be given to you once you get your wikimedia e-mail address.

https://office.wikimedia.org/wiki/Getting_Started_With_User_Info_and_Talk_Pages

Headset

Please buy a high-quality headset -- your colleagues will love you for this. For more tips see https://office.wikimedia.org/wiki/Office_IT/Projects/Telepresence

Culture

We are part of a movement with a unique culture. It's worth taking the time to read a bit about how our biggest project works. This policy could be a useful start, as it introduces the core concepts from a concrete point of view: https://en.wikipedia.org/wiki/Wikipedia:Biographies_of_living_persons

Getting permit

The very first thing to do will be to get a labs/Wikitech (they are the same thing) account and a Phabricator account so you can file a task like https://phabricator.wikimedia.org/T96053 (likely somebody else will do it for you).

Please follow the next subsections (order matters!).

Wikitech/Labs

Labs is a cluster of virtual machines. Access is completely decoupled from production, and different SSH keys should be used.

Labs is not production, but we have several tools hosted on the cluster. Access to Labs requires a wikitech account:

  1. Create account: https://wikitech.wikimedia.org/w/index.php?title=Special:UserLogin&type=signup&returnto=Main+Page
  2. Request Shell Access: https://wikitech.wikimedia.org/wiki/Help:Access
  3. Log in
  4. You need to set up ssh keys: [[1]]
  5. Upload your public SSH key: https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack
    1. Please keep in mind that Labs is a testing environment, so this SSH key should only be used for testing. If you need access to machines in the production cluster, use a different SSH key.
  6. Configure your ~/.ssh/config with bastion hosts
  7. Get familiar with the labs environment, how to use the labs interface to spin up nodes, remove nodes, etc
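The advice above about keeping separate keys for Labs and production could be sketched as follows. This is only an illustration, not an official procedure: the helper name `new_key` is hypothetical, and the file names are assumptions chosen to match the sample ssh config later on this page. The key generation itself is left commented out because `ssh-keygen` prompts for a passphrase interactively.

```shell
# Hypothetical helper: create one RSA key pair at the given path.
# ssh-keygen will prompt for a passphrase; choose a strong one.
new_key() {
    ssh-keygen -t rsa -b 4096 -f "$1" -C "$2"
}

# Assumed file names, matching the IdentityFile entries in the sample ssh config.
labs_key="$HOME/.ssh/id_rsa"        # Labs (testing) key
prod_key="$HOME/.ssh/id_rsa_prod"   # production key

# Run these once, interactively:
#   new_key "$labs_key" "labs"
#   new_key "$prod_key" "production"

# Only the public halves (the .pub files) are ever uploaded or shared.
echo "Labs public key:       ${labs_key}.pub"
echo "Production public key: ${prod_key}.pub"
```

Keeping the two pairs in separate files means a compromised testing key does not grant access to production hosts.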

Phabricator

https://phabricator.wikimedia.org is the version of Phabricator that we use. Follow https://www.mediawiki.org/wiki/Phabricator/Help#Creating_your_account_and_notifications to log in for the first time (please use the sunflower icon as suggested by the tutorial to leverage the single sign on).

Mediawiki

  1. Create account: https://www.mediawiki.org/w/index.php?title=Special:UserLogin&returnto=Analytics&type=signup
  2. Log in

Gerrit

  1. Gerrit is the code review workflow we use, built on top of Git
  2. Log in to Gerrit using your Wikitech/Labs credentials.
  3. To verify everything works, clone a repo from https://gerrit.wikimedia.org/r/#/admin/projects/?filter=analytics using SSH.
  4. Take a look at how to deal with gerrit in different work scenarios: http://etherpad.wikimedia.org/p/analytics-gerrit
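As a rough sketch of step 3, the snippet below assembles an SSH clone URL and prints the command rather than running it. It assumes Gerrit's default SSH port (29418); the user name is a placeholder and the project name `analytics/example` is hypothetical, so substitute a real project from the list linked above.

```shell
# Placeholders: replace with your own Wikitech/Labs shell user name and a real project.
gerrit_user="your-username"
project="analytics/example"   # hypothetical project name

# 29418 is Gerrit's default SSH port (assumed to apply here).
clone_url="ssh://${gerrit_user}@gerrit.wikimedia.org:29418/${project}"

# Print the command instead of running it, so nothing touches the network.
echo "git clone ${clone_url}"
```

If the clone succeeds when you run the printed command, your SSH key and Gerrit account are set up correctly.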

Mailing lists

Reading mailing lists is important. All projects we build or use are open-source and, like most open-source projects, they have communities which come together on mailing lists. There is much knowledge to be gained from these mailing lists.

  • Please subscribe to:
  1. https://lists.wikimedia.org/mailman/listinfo/analytics
  2. https://lists.wikimedia.org/mailman/listinfo/analytics-internal
  3. https://lists.wikimedia.org/mailman/listinfo/wikimetrics
  4. https://lists.wikimedia.org/mailman/listinfo/wmfresearch
  • Please request access to:
    • Operations
    • Engineering

For an overview of all available mailing lists see https://lists.wikimedia.org/mailman/listinfo

  • Optionally you may want to read archives or subscribe to the following mailing lists:
  1. Mediawiki
  2. Mobile
  3. Mediawiki API

If there are mailing lists you want to read without subscribing you may consider using the following gateways:

  1. Mail Archive
  2. Gmane
  3. Gossamer Threads

Accessing production infrastructure

Servers

Every Analytics team member should request access to the following machines:

  • stat1001
  • stat1002
  • stat1003?
  • bast1001.wikimedia.org
  • bastion.wmflabs.org
  • Hadoop cluster
  • vanadium/hafnium
  • Eventlogging DB through 1003, which means you are in analytics research group

Shell access to wikimedia cluster and production infrastructure

  1. Tickets are filed for the ops team to see and need to be approved by a manager: [2]

Talk with Andrew Otto about how to submit your public key (never share the private key). You will likely need to proxy your SSH connection through a known machine to access some of the hosts above. You should not use the same SSH key for Labs (testing) and stat1 machines (production).

The easiest approach is to ask a team member for their .ssh/config file and copy the proxy setup.

Please keep in mind that different processes are required to access production machines (stat1) and testing machines (Labs).

Sample ssh config

Sample ssh config:

### Short names
Host <some host you want your system to auto-complete>

## Use bastion-eqiad.wmflabs.org as proxy to labs
Host bastlabs
    HostName bastion-eqiad.wmflabs.org
    User <your-username>

Host *.eqiad.wmflabs !bastion-eqiad.wmflabs.org
    User <your-username>
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh -a -W %h:%p bastlabs

## Prod
Host bastprod
    HostName bast1001.wikimedia.org
    User <your-username>

Host *.eqiad.wmnet *.wikimedia.org !bast1001.wikimedia.org
    User <your-username>
    IdentityFile ~/.ssh/id_rsa_prod
    ProxyCommand ssh -a -W %h:%p bastprod


Process

Google Calendar

  1. Add the Analytics Team Calendar to your default view. Someone on the team (we can all manage sharing) should go to https://calendar.google.com and add you:
    • My Calendars -> Settings
    • Click Team Analytics -> Share This Calendar
    • Add the new person

Scrum

Analytics/Scrum_Planning


Equipment

Hardware

As far as equipment goes you will need a good development machine.

Minimum machine specs:

  • >=4GB RAM
  • i7 >= 2.4 GHz quad-core or better
  • 300GB disk

Recommended machine specs:

  • >=8GB RAM
  • i7 >= 2.4 GHz quad-core or better
  • SSD (if you're going to be working on wikimetrics)
  • 300GB disk

At first sight you might think these specs are excessive, but you will have to run VMs and use Vagrant to re-create various environments (sometimes with multiple nodes), so you will need the hardware for that.


Optional accounts

You could consider creating accounts for:

Operating System

The machines we deploy on run Ubuntu, so it is more convenient to have Ubuntu, or another UNIX-based operating system, installed on your development machine. It will make your work considerably easier. You may choose any other Linux distribution you're familiar with.

Mac is also a very possible choice.

Misc

This is a collection of things you might find useful in your work.

Sync tools

You may find the following tools useful for syncing files between your local machine and remote machines (one-way or two-way). Some of them also let you mount remote directories as if they were local directories:

  1. sshfs
  2. rsync
  3. lsync
  4. unison
  5. scp
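A minimal sketch of one-way syncing with rsync, using two throwaway local directories so it can be tried without access to any remote host. The file name is made up for the demonstration, and the remote forms in the comments use `<host>` placeholders rather than real machines.

```shell
# Throwaway source and destination directories for the demonstration.
src="$(mktemp -d)"
dst="$(mktemp -d)"
printf 'draft\n' > "$src/notes.txt"

if command -v rsync >/dev/null 2>&1; then
    # -a preserves permissions and timestamps; the trailing slash on the
    # source copies the directory's contents rather than the directory itself.
    rsync -a "$src/" "$dst/"
    ls "$dst"
fi

# Remote forms (placeholders, not run here):
#   rsync -a "$src/" <host>:<remote-dir>/      # push local files to a remote machine
#   sshfs <host>:<remote-dir> <local-mountpoint>   # mount a remote directory locally
```

With an ssh config like the sample above, the remote forms go through the bastion proxy automatically.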

IDEs and editors

For Java development, you may use whatever IDE you feel comfortable with. Eclipse is the IDE du jour, but you might want to look at IDEA also. For remote development you may find vim to be useful (or a combination of a sync tool and your favorite editor/IDE). Other editors you might find useful include Sublime Text and Emacs.

Searching

You may find the following tools useful to search through configuration files or code:

  1. Ack (mainly for grepping code. video presentation)
  2. grep
  3. GNU find
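To illustrate how grep and find can be used on configuration files, here is a small self-contained sketch. The directory layout, file names and setting names are invented for the example and do not correspond to any real repository.

```shell
# Build a throwaway directory so the commands have something to search.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/conf"
printf 'hadoop.namenode=example-host\n' > "$demo_dir/conf/cluster.properties"
printf 'unrelated=1\n' > "$demo_dir/conf/other.properties"

# grep: recursively list the files that mention a given setting name.
matches="$(grep -rl 'hadoop.namenode' "$demo_dir")"

# find: locate files by name pattern, regardless of their contents.
props="$(find "$demo_dir" -type f -name '*.properties' | sort)"

echo "Files mentioning the setting:"
echo "$matches"
echo "All .properties files:"
echo "$props"
```

grep answers "which files contain this text", while find answers "which files have this name"; the two are often combined when exploring an unfamiliar repository.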

Environment simulation

It may be useful to familiarize yourself with Vagrant and Puppet so that you can recreate smaller environments/conditions on your machine to test the software you're developing or contributing to.

Important Talks

Talks recommended by other members of the Analytics team:

Random Docs

Thorough description of our Hadoop infrastructure : https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg