Discovery/Analytics

Discovery uses the Analytics Cluster to support CirrusSearch. The source code is in the wikimedia/discovery/analytics repository.

How to deploy

  1. Ssh into Tin
  2. Run:
    cd /srv/deployment/wikimedia/discovery/analytics
    git deploy start
    git checkout master
    git pull
    git deploy sync

    (git deploy sync will complain that only “2/3 minions completed fetch”; you can answer “y” to that.)

    This part brings the discovery analytics code from gerrit to stat1002.
  3. Ssh into stat1002
  4. Run sudo -u analytics-search /srv/deployment/wikimedia/discovery/analytics/bin/discovery-deploy-to-hdfs --verbose --no-dry-run

    This part brings the discovery analytics code to HDFS (but it does not resubmit Oozie jobs). The sketch below shows a quick way to verify the new version landed.
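
A quick way to confirm the HDFS deploy landed is to list the deployed versions; a minimal sketch, assuming the /mnt/hdfs fuse mount on stat1002 (used elsewhere on this page):

 # the newest entry should be the version you just deployed
 ls -d /mnt/hdfs/wmf/discovery/20* | sort | tail -n 3
 # or query HDFS directly
 hdfs dfs -ls /wmf/discovery/ | tail -n 3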

How to deploy Oozie production jobs

Oozie jobs are deployed from stat1002. The following environment variables are used to kick off all jobs:

  • REFINERY_VERSION should be set to the concrete, 'deployed' version of refinery that you want to deploy from, like 2015-01-05T17.59.18Z--7bb7f07. (Do not use current there, or your job is likely to break when someone deploys refinery afresh).
  • DISCOVERY_VERSION should be set to the concrete, 'deployed' version of discovery analytics that you want to deploy from, like 2016-01-22T20.19.59Z--e00dbef. (Do not use current there, or your job is likely to break when someone deploys discovery analytics afresh).
  • PROPERTIES_FILE should be set to the properties file that you want to deploy; relative to the refinery root. Like oozie/popularity_score/bundle.properties.
  • START_TIME should denote the time the job should run for the first time, like 2016-01-05T11:00Z. This should be coordinated between the popularity_score and transfer_to_es jobs so that they are asking for the same days. Generally you want to set this to the next day the job should run (see the sketch after this list).
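
As a non-authoritative convenience, here is one way to list candidate versions and compute a start time from stat1002 (assumes the /mnt/hdfs fuse mount and GNU date; copy the values you want into the exports below):

 # recent concrete refinery and discovery-analytics deploys
 ls -d /mnt/hdfs/wmf/refinery/20* | sort | tail -n 3
 ls -d /mnt/hdfs/wmf/discovery/20* | sort | tail -n 3
 # a possible START_TIME for "tomorrow at 11:00 UTC"
 date -u -d tomorrow +%Y-%m-%dT11:00Z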

popularity_score

 export DISCOVERY_VERSION=$(ls -d /mnt/hdfs/wmf/discovery/20* | sort | tail -n 1 | sed 's/^.*\///')
 export REFINERY_VERSION=$(ls -d /mnt/hdfs/wmf/refinery/20* | sort | tail -n 1 | sed 's/^.*\///')
 export PROPERTIES_FILE=oozie/popularity_score/coordinator.properties
 export START_TIME=2016-01-05T11:00Z
 
 cd /mnt/hdfs/wmf/discovery/$DISCOVERY_VERSION
 sudo -u analytics-search oozie job \
   -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
   -run \
   -config $PROPERTIES_FILE \
   -D discovery_oozie_directory=hdfs://analytics-hadoop/wmf/discovery/$DISCOVERY_VERSION/oozie \
   -D analytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/$REFINERY_VERSION/oozie \
   -D queue_name=production \
   -D start_time=$START_TIME
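
Submitting prints an oozie coordinator id. A hedged sketch for checking on it afterwards (the id is a placeholder, and the hive table name is an assumption based on the workflow names shown under Debugging):

 # replace the id with the one printed by the -run command above
 sudo -u analytics-search oozie job \
   -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
   -info <oozie coordinator id>
 # optionally, once a run has completed, look for fresh partitions (assumes hive CLI access)
 hive -e 'SHOW PARTITIONS discovery.popularity_score' | tail -n 5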

transfer_to_es

The firewall between analytics and codfw is not yet opened up, so instead of submitting the full bundle, the overrides below blank out oozie.bundle.application.path and submit a single coordinator pointed at the eqiad cluster.

 export DISCOVERY_VERSION=$(ls -d /mnt/hdfs/wmf/discovery/20* | sort | tail -n 1 | sed 's/^.*\///')
 export REFINERY_VERSION=$(ls -d /mnt/hdfs/wmf/refinery/20* | sort | tail -n 1 | sed 's/^.*\///')
 export PROPERTIES_FILE=oozie/transfer_to_es/bundle.properties
 export START_TIME=2016-01-05T11:00Z
 
 cd /mnt/hdfs/wmf/discovery/$DISCOVERY_VERSION
 sudo -u analytics-search oozie job \
   -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
   -run \
   -config $PROPERTIES_FILE \
   -D discovery_oozie_directory=hdfs://analytics-hadoop/wmf/discovery/$DISCOVERY_VERSION/oozie \
   -D analytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/$REFINERY_VERSION/oozie \
   -D queue_name=production \
   -D start_time=$START_TIME \
   -D oozie.bundle.application.path= \
   -D oozie.coord.application.path=hdfs://analytics-hadoop/wmf/discovery/$DISCOVERY_VERSION/oozie/transfer_to_es/coordinator.xml \
   -D elasticsearch_url=http://elastic1017.eqiad.wmnet:9200
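
Because oozie.bundle.application.path is blanked out, the submission returns a coordinator id rather than a bundle id. A minimal sketch for keeping an eye on it (the id is a placeholder; use the one printed by -run):

 sudo -u analytics-search oozie job \
   -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
   -info <oozie coordinator id>
 # materialized actions appear as <coordinator id>@1, @2, ... as each day's data becomes available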

Oozie Test Deployments

There is no hadoop cluster in beta cluster or labs, so changes have to be tested in production. When submitting a job, please ensure you override all appropriate values so that the production data paths and tables are not affected. After testing your job, be sure to kill it (the correct one!) from hue. Note that most of the time you won't need to do a full test through oozie; you can instead call the script directly with spark-submit.
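
If you prefer the CLI over hue for cleanup, a small sketch for killing a finished test job (double-check the id; kill only your own test coordinator or workflow):

 oozie job -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
           -kill <your test job id>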

deploy test code to hdfs

 git clone http://gerrit.wikimedia.org/r/wikimedia/discovery/analytics ~/discovery-analytics
 <copy some command from the gerrit ui to pull down and checkout your patch>
 ~/discovery-analytics/bin/discovery-deploy-to-hdfs --base hdfs:///user/$USER/discovery-analytics --verbose --no-dry-run
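
To double-check the upload before submitting anything, you can list what landed under your user directory (a small sketch; either the fuse mount or hdfs dfs works):

 ls /mnt/hdfs/user/$USER/discovery-analytics/
 hdfs dfs -ls /user/$USER/discovery-analytics/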

popularity_score

 export DISCOVERY_VERSION=current
 export REFINERY_VERSION=current
 export PROPERTIES_FILE=oozie/popularity_score/coordinator.properties
 cd /mnt/hdfs/user/$USER/discovery-analytics/$DISCOVERY_VERSION
 oozie job -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
           -run \
           -config $PROPERTIES_FILE \
           -D discovery_oozie_directory=hdfs://analytics-hadoop/user/$USER/discovery-analytics/$DISCOVERY_VERSION/oozie \
           -D analytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/$REFINERY_VERSION/oozie \
           -D start_time=2016-01-22T00:00Z \
           -D discovery_data_directory=hdfs://analytics-hadoop/user/$USER/discovery-analytics-data \
           -D popularity_score_table=$USER.discovery_popularity_score

transfer_to_es

 export DISCOVERY_VERSION=current
 export REFINERY_VERSION=current
 export PROPERTIES_FILE=oozie/transfer_to_es/bundle.properties
 cd /mnt/hdfs/user/$USER/discovery-analytics/$DISCOVERY_VERSION
 oozie job -oozie http://analytics1027.eqiad.wmnet:11000/oozie \
           -run \
           -config $PROPERTIES_FILE \
           -D discovery_oozie_directory=hdfs://analytics-hadoop/user/$USER/discovery-analytics/$DISCOVERY_VERSION/oozie \
           -D analytics_oozie_directory=hdfs://analytics-hadoop/wmf/refinery/$REFINERY_VERSION/oozie \
           -D start_time=2016-01-22T00:00Z \
           -D discovery_data_directory=hdfs://analytics-hadoop/user/$USER/discovery-analytics-data \
           -D elasticsearch_url=http://stat1002.eqiad.wmnet:9876 \
           -D spark_number_executors=3 \
           -D popularity_score_table=$USER.discovery_popularity_score \
           -D oozie.bundle.application.path= \
           -D oozie.coord.application.path=hdfs://analytics-hadoop/user/$USER/discovery-analytics/$DISCOVERY_VERSION/oozie/transfer_to_es/coordinator.xml

Debugging

Web Based

Hue

Active oozie bundles, coordinators, and workflows can be seen in hue (https://hue.wikimedia.org). Hue uses Wikimedia LDAP for authentication. All production jobs for discovery are owned by the analytics-search user. This is only useful for viewing the state of things; actual manipulations need to be done by sudo'ing to the analytics-search user on stat1002.eqiad.wmnet and using the CLI tools.
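
As a quick sanity check that you can act as analytics-search from the CLI, something like the following should report the oozie server status (a sketch, using the oozie server address used throughout this page):

 # from your own machine
 ssh stat1002.eqiad.wmnet
 # then, on stat1002
 sudo -u analytics-search oozie admin \
   -oozie http://analytics1027.eqiad.wmnet:11000/oozie -status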

Yarn

Yarn shows the status of currently running jobs on the hadoop cluster. Yarn is accessible over a SOCKS proxy (see Analytics/Cluster/Access) to analytics1001.eqiad.wmnet. Currently running jobs are visible via the running applications page (http://analytics1001.eqiad.wmnet:8088/cluster/apps/RUNNING).
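
Setting up the proxy is covered on the SOCKS proxy page; a rough sketch only (the bastion host is a placeholder, use whatever that page specifies, and point your browser's SOCKS proxy at localhost:8080):

 ssh -N -D 8080 <bastion host from the SOCKS proxy docs>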

CLI

CLI commands are typically run by ssh'ing into stat1002.eqiad.wmnet.

Oozie

The oozie CLI command can be used to get info about currently running bundles, coordinators, and workflows, although it is often easier to get that from Hue (see above). Use oozie help for more detailed info; here are a few useful commands. Get the appropriate oozie id from hue.
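
To avoid repeating the -oozie flag on every command, the oozie client honours the OOZIE_URL environment variable; this is purely optional (the examples below spell the flag out explicitly):

 export OOZIE_URL=http://analytics1027.eqiad.wmnet:11000/oozie
 # with OOZIE_URL set, the -oozie flag can be dropped
 oozie job -info <some oozie id>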

Re-run a failed workflow
 sudo -u analytics-search oozie job -oozie http://analytics1027.eqiad.wmnet:11000/oozie -rerun 0000612-160202151345641-oozie-oozi-W
Show info about running job
 oozie job -info 0000612-160202151345641-oozie-oozi-W

This will output something like the following

 Job ID : 0000612-160202151345641-oozie-oozi-W
 ------------------------------------------------------------------------------------------------------------------------------------
 Workflow Name : discovery-transfer_to_es-discovery.popularity_score-2016,2,1->http://elastic1017.eqiad.wmnet:9200-wf
 App Path      : hdfs://analytics-hadoop/wmf/discovery/2016-02-02T21.16.44Z--2b630f1/oozie/transfer_to_es/workflow.xml
 Status        : RUNNING
 Run           : 0
 User          : analytics-search
 Group         : -
 Created       : 2016-02-02 23:01 GMT
 Started       : 2016-02-02 23:01 GMT
 Last Modified : 2016-02-03 03:36 GMT
 Ended         : -
 CoordAction ID: 0000611-160202151345641-oozie-oozi-C@1
 Actions
 ------------------------------------------------------------------------------------------------------------------------------------
 ID                                                                            Status    Ext ID                 Ext Status Err Code  
 ------------------------------------------------------------------------------------------------------------------------------------
 0000612-160202151345641-oozie-oozi-W@:start:                                  OK        -                      OK         -         
 ------------------------------------------------------------------------------------------------------------------------------------
 0000612-160202151345641-oozie-oozi-W@transfer                                 RUNNING   job_1454006297742_16752RUNNING    -         
 ------------------------------------------------------------------------------------------------------------------------------------

You can continue down the rabbit hole to find more information. In this case the above oozie workflow kicked off a transfer job, which matches an element of the related workflow.xml:

 oozie job -info 0000612-160202151345641-oozie-oozi-W@transfer

This will output something like the following

 ID : 0000612-160202151345641-oozie-oozi-W@transfer
 ------------------------------------------------------------------------------------------------------------------------------------
 Console URL       : http://analytics1001.eqiad.wmnet:8088/proxy/application_1454006297742_16752/
 Error Code        : -
 Error Message     : -
 External ID       : job_1454006297742_16752
 External Status   : RUNNING
 Name              : transfer
 Retries           : 0
 Tracker URI       : resourcemanager.analytics.eqiad.wmnet:8032
 Type              : spark
 Started           : 2016-02-02 23:01 GMT
 Status            : RUNNING
 Ended             : -
 ------------------------------------------------------------------------------------------------------------------------------------

Of particular interest are the Console URL and the related application id (application_1454006297742_16752). This id can be used with the yarn command to retrieve the logs of the application that was run. Note, though, that two separate jobs are created: the @transfer job shown above is an oozie runner, which for spark jobs does little to no actual work and mostly just kicks off another job to run the spark application.

List jobs by user

Don't attempt to use the oozie jobs -filter ... command; it will stall out oozie. Instead use hue and the filters available there.

Yarn

Yarn is the actual job runner.

List running spark jobs
 yarn application -appTypes SPARK -list
Fetch application logs

The yarn application id can be used to fetch application logs:

 sudo -u analytics-search yarn logs -applicationId application_1454006297742_16752 | less

Remember that each spark job kicked off by oozie has two application ids. The one reported by oozie job -info is the oozie job runner; the id of the sub-job that actually runs the spark application isn't as easy to find. Your best bet is often to poke around in the yarn web ui (or the CLI, as in the sketch below) to find the job with the right name, such as Discovery Transfer To http://elastic1017.eqiad.wmnet:9200.
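
If you'd rather stay on the CLI than the yarn web ui, a hedged sketch for locating the child spark application by name (the grep pattern is an assumption based on the job name above):

 yarn application -appTypes SPARK -list | grep -i 'Discovery Transfer'
 # then fetch its logs using the id from the first column
 sudo -u analytics-search yarn logs -applicationId <child application id> | less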