You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Analytics/Systems/Cluster/Access: Difference between revisions

From Wikitech-static
imported>Ottomata
No edit summary
imported>MMiller
(→‎HTTP Access: Clarifying how to get access to Hue.)

Revision as of 17:54, 4 June 2018

Command line access

You can access Hadoop and Hive from the stats machines stat1005 and stat1004. For information on getting access, see Analytics/Data access and production shell access.
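As a rough sketch, once you are logged in to stat1004 or stat1005, a check like the one below can confirm that the Hadoop and Hive command-line tools work for you. The function name check_cluster_access is hypothetical. The /user/<shell username> HDFS home directory is an assumption based on the usual HDFS convention, not something this page states.

```shell
# Hypothetical helper: run on a stats machine (stat1004 or stat1005).
# ASSUMPTION: your HDFS home directory is /user/<shell username> (usual HDFS convention).
check_cluster_access() {
    local user="${1:-$USER}"
    # 'hdfs dfs -test -d' exits 0 only if the path exists and is a directory.
    if ! hdfs dfs -test -d "/user/${user}"; then
        echo "no HDFS home directory at /user/${user}" >&2
        return 1
    fi
    echo "HDFS home directory /user/${user} exists"
    # One trivial HiveQL statement; the function's exit status is Hive's.
    hive -e 'SHOW DATABASES;' >/dev/null
}

# Usage: check_cluster_access        (checks your own account)
```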

HTTP Access

Access to the HTTP GUIs in the Analytics Cluster is currently very restricted: you must have a shell account on the analytics nodes.

At the very minimum, you must have a shell account on the primary NameNode (analytics1001), because HDFS uses the POSIX accounts on the NameNode to grant access to files.
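Because HDFS takes group membership from the NameNode's POSIX accounts, the standard `hdfs groups` command (which prints the groups the NameNode resolves for a user, as a line like `user : group1 group2`) is a quick way to see what HDFS thinks you belong to. The sketch below wraps it in a hypothetical hdfs_user_in_group helper; the group name in the usage comment is a placeholder, not a real Analytics group.

```shell
# Hypothetical helper: succeeds if the NameNode maps USER into GROUP.
# Relies on 'hdfs groups USER' printing a line like "USER : group1 group2".
hdfs_user_in_group() {
    local user="$1" group="$2"
    # Drop the "USER :" prefix, put one group per line, then look for an exact match.
    hdfs groups "$user" | sed 's/^[^:]*://' | tr -s ' ' '\n' | grep -qxF -- "$group"
}

# Usage (placeholder group name):
#   hdfs_user_in_group "$USER" some-group && echo "HDFS sees you in some-group"
```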

The Hue (Hadoop User Experience) GUI is available at https://hue.wikimedia.org. Log in with your UNIX shell username and your Wikimedia developer account (Wikitech) password. If you already have cluster access but can't log in to Hue, your account probably needs to be synced manually. For help, ask one of the Analytics Opsen, ottomata (aotto at wikimedia.org) or elukey (ltoscano at wikimedia.org), or file a Phabricator task.

Admin Instructions to sync a Hue account

When a new Hadoop user is added, an admin should give them a Hue account. Once Phabricator task T127850 is resolved, this process should become automatic.

  1. Log in to https://hue.wikimedia.org
  2. In the upper right, click your username and select Manage Users. (You can only do this if you are a Hue admin; another admin can make you one.)
  3. Click 'Add/Sync LDAP User'
  4. Fill in the form with their UNIX shell username (not their Wikimedia developer account username), deselect both 'Distinguished name' and 'Create home directory', and click 'Add/Sync user'

Done!

ssh tunnel(s)

If you are in the wmf LDAP group (open to every WMF employee and contractor) and you only need the YARN ResourceManager UI, you can log in directly at yarn.wikimedia.org.

Otherwise, use an SSH tunnel to send HTTP requests to an internal analytics server. For example, to reach the Hadoop ResourceManager job browser, run:

 ssh -N bast1002.wikimedia.org -L 8088:analytics1001.eqiad.wmnet:8088

Then open http://localhost:8088/cluster in your browser. The FairScheduler interface, which shows cluster usage per user, is at http://localhost:8088/cluster/scheduler.
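If you open this tunnel often, two small shell functions can save typing. In this sketch, rm_tunnel wraps the ssh command above (defaulting to the same bastion, host and port) and lets you pick a different local port if 8088 is already taken. rm_api fetches the ResourceManager's REST API under /ws/v1/cluster/ through the tunnel, which can be handier than the browser for scripting. Both function names and the use of curl are assumptions for illustration, not existing Analytics tooling.

```shell
# Hypothetical helper: forward LOCAL_PORT on this machine to TARGET:PORT via BASTION.
# Equivalent to: ssh -N BASTION -L LOCAL_PORT:TARGET:PORT
rm_tunnel() {
    local bastion="${1:-bast1002.wikimedia.org}"
    local target="${2:-analytics1001.eqiad.wmnet}"
    local port="${3:-8088}"
    local local_port="${4:-$port}"
    # -N: run no remote command, only forward the port.
    ssh -N "$bastion" -L "${local_port}:${target}:${port}"
}

# Hypothetical helper: GET a ResourceManager REST endpoint through the tunnel,
# e.g. "info" (cluster status) or "scheduler" (scheduler queues and usage).
rm_api() {
    local endpoint="$1" local_port="${2:-8088}"
    curl -sf -H 'Accept: application/json' \
        "http://localhost:${local_port}/ws/v1/cluster/${endpoint}"
}

# Usage:
#   rm_tunnel &          # keep the tunnel open in the background
#   sleep 2              # give ssh a moment to connect
#   rm_api scheduler     # JSON description of the scheduler queues
#   kill %1              # close the tunnel when done
```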

For more information, see Proxy access to cluster.