
Analytics/Systems/Cluster/Access: Difference between revisions

imported>Ottomata
imported>Ottomata
No edit summary


== HTTP Access ==
Access to HTTP GUIs in the Analytics Cluster is currently very restricted.  You must have shell accounts on analytics nodes. You must use a SOCKS proxy or ssh tunnels to access HTTP services.


At the very minimum, you must have a shell account on the primary NameNode (analytics1001), because HDFS uses POSIX accounts on the NameNode to grant access to files.


Done!
== sshuttle ==
[https://github.com/apenwarr/sshuttle sshuttle] is a 'Transparent proxy server that works as a poor man's VPN. Forwards over ssh. Doesn't require admin. Works with Linux and MacOS. Supports DNS tunneling.'.  You can use this to proxy traffic through a bastion host to the cluster.
Download and install sshuttle following [https://github.com/apenwarr/sshuttle#this-is-how-you-use-it these instructions].  Then run:
  ./sshuttle --dns -vvr bast1001.wikimedia.org 10.0.0.0/8
'''Be warned, this will proxy DNS requests through the Wikimedia network, and any requests to an IP on the internal Wikimedia network will be proxied through the bastion.'''
While this is running, you should be able to navigate to internally hosted web services from your browser.  Try accessing the ResourceManager jobbrowser at http://analytics1001.eqiad.wmnet:8088/


== ssh tunnel(s) ==


You might want to check out the FairScheduler interface here too.  It will show you usage of the cluster per user:  http://localhost:8088/cluster/scheduler
== SOCKS proxy & FoxyProxy ==
:''Also see the explanation in [[Help:Access#Setting up the proxy]]''
For this to work, you need automatic ssh proxying to stat1004.eqiad.wmnet through bast1001.wikimedia.org. You can add the following to your <tt>.ssh/config</tt> file if you don't already have something more generic (see [[SSH access]]):
  Host analytics* stat1*
      ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org
Once that works (verify that you can <kbd>ssh</kbd> into stat1004.eqiad.wmnet), you can open up a SOCKS proxy through stat1004.eqiad.wmnet:
  ssh -N -D 8999 stat1004.eqiad.wmnet
Finally, configure your browser to connect via host: localhost port 8999.  If you use FoxyProxy, you can set up specific URL patterns that you would like to proxy.  https?://analytics.* should do.
Once there, you should be able to navigate to services. Try out http://analytics1001.eqiad.wmnet:8088/cluster to be sure that it works.
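The proxy can also be checked from the command line before configuring a browser. The following is a minimal sketch, assuming <kbd>curl</kbd> is installed locally; <code>socks_curl_cmd</code> is a hypothetical helper name, not something provided by the cluster. It only prints the command to run, using curl's <code>--socks5-hostname</code> option so that the internal hostname is resolved on the far side of the proxy rather than locally.

```shell
#!/bin/sh
# socks_curl_cmd URL [PORT]
# Hypothetical helper: print a curl command that requests URL's headers
# through the local SOCKS proxy opened with `ssh -N -D PORT ...`.
# PORT defaults to 8999, the port used in the ssh command above.
socks_curl_cmd() {
  url="$1"
  port="${2:-8999}"
  # --socks5-hostname lets the proxy side resolve the hostname, which is
  # needed for names that only exist in internal DNS.
  printf 'curl --silent --head --socks5-hostname localhost:%s %s\n' "$port" "$url"
}

# Print the check for the ResourceManager UI mentioned above.
socks_curl_cmd http://analytics1001.eqiad.wmnet:8088/cluster
```

Running the printed command while the <kbd>ssh -N -D 8999</kbd> session is open should return HTTP response headers if the proxy is working.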

Revision as of 20:51, 14 December 2017

Command line access

You can access Hadoop and Hive on the stats machines stat1005 and stat1004. For information on getting access, see Analytics/Data access and production shell access.

HTTP Access

Access to HTTP GUIs in the Analytics Cluster is currently very restricted. You must have shell accounts on analytics nodes.

At the very minimum, you must have a shell account on the primary NameNode (analytics1001), because HDFS uses POSIX accounts on the NameNode to grant access to files.
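As a rough illustration of the POSIX account requirement, the sketch below checks whether a username exists as a POSIX account on the host where it is run. It assumes you run it on the NameNode itself, and <code>has_posix_account</code> is a hypothetical helper name. It only shows that the account exists; it says nothing about HDFS group membership or file permissions.

```shell
#!/bin/sh
# has_posix_account USER
# Hypothetical helper: succeeds (exit status 0) if USER resolves to a POSIX
# account on the current host, and fails otherwise.
has_posix_account() {
  id -u "$1" >/dev/null 2>&1
}

# Check the account of whoever is running the script.
if has_posix_account "$(id -un)"; then
  echo "account present"
else
  echo "account missing"
fi
```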

The Hue (Hadoop User Experience) GUI is available at https://hue.wikimedia.org. Log in using your shell username and your LDAP credentials. If you already have cluster access but can't log into Hue, it is likely that your LDAP account needs to be manually synced. Ask an Analytics Opsen (ottomata (aotto@wikimedia.org) or elukey (ltoscano@wikimedia.org)) for help.

Admin Instructions to sync a Hue LDAP account

When a new Hadoop user is added, an admin should give them a Hue account. If this ticket is resolved, this process should be automatic.

  1. Log in to https://hue.wikimedia.org
  2. In the upper right, click on your username and select Manage Users. (You will only be able to do this if you are a Hue admin. Another admin can make you one.)
  3. Click 'Add/Sync LDAP User'
  4. Fill in the form with the user's shell username (not their LDAP/Wikitech login), deselect both 'Distinguished name' and 'Create home directory', and click 'Add/Sync user'

Done!

ssh tunnel(s)

If you are in the wmf LDAP group (every WMF employee/contractor) and you only care about the Yarn ResourceManager UI, you can log in directly at yarn.wikimedia.org.

Otherwise, if you have access to the nodes you want to send HTTP requests to, you can access specific HTTP services using direct ssh tunneling.

To access the Hadoop ResourceManager jobbrowser, try running:

 ssh -N stat1004.eqiad.wmnet -L 8088:analytics1001.eqiad.wmnet:8088

or

 ssh -N bast1001.wikimedia.org -L 8088:analytics1001.eqiad.wmnet:8088

And then navigate to http://localhost:8088/cluster in your browser.

You might want to check out the FairScheduler interface here too. It will show you usage of the cluster per user: http://localhost:8088/cluster/scheduler
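
The two tunnel commands above follow the same pattern: a local port, a target host and port, and an ssh host to tunnel through. The sketch below wraps that pattern in a small shell function that prints the matching ssh command. <code>tunnel_cmd</code> is a hypothetical helper name, and defaulting to stat1004.eqiad.wmnet is an assumption taken from the first example, not a site convention.

```shell
#!/bin/sh
# tunnel_cmd LOCAL_PORT TARGET_HOST TARGET_PORT [SSH_HOST]
# Hypothetical helper: print the ssh command that forwards
# localhost:LOCAL_PORT to TARGET_HOST:TARGET_PORT through SSH_HOST.
# SSH_HOST defaults to stat1004.eqiad.wmnet (an assumption for this sketch).
tunnel_cmd() {
  local_port="$1"
  target_host="$2"
  target_port="$3"
  ssh_host="${4:-stat1004.eqiad.wmnet}"
  printf 'ssh -N %s -L %s:%s:%s\n' \
    "$ssh_host" "$local_port" "$target_host" "$target_port"
}

# ResourceManager UI through the default stats machine.
tunnel_cmd 8088 analytics1001.eqiad.wmnet 8088
# The same service through the bastion instead.
tunnel_cmd 8088 analytics1001.eqiad.wmnet 8088 bast1001.wikimedia.org
```

If local port 8088 is already in use, pass a different first argument (for example 9088) and browse to that port on localhost instead.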