Analytics/Cluster/Refinery

Revision as of 13:46, 23 February 2016 by imported>Mforns (add check that code is pulled)

Refinery is the software infrastructure used on the Analytics Cluster. The source code is in the analytics/refinery repository.

This repository uses jars built from analytics/refinery/source; see that project's deployment page for how to deploy them.

How to deploy

  1. SSH into Tin
  2. Run:
    cd /srv/deployment/analytics/refinery
    git deploy start
    git checkout master
    git pull
    git deploy sync

    (git deploy sync will complain that only "2/3 minions completed fetch". You can answer "y" (yes) to that prompt.)

    This step brings the refinery code from Gerrit to stat1002.
  3. SSH into stat1002
  4. cd into /srv/deployment/analytics/refinery and check with git log that the code has been pulled.
  5. Run sudo -u hdfs /srv/deployment/analytics/refinery/bin/refinery-deploy-to-hdfs --verbose --no-dry-run

    This step copies the refinery code to HDFS (but it does not resubmit Oozie jobs).
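
The check in step 4 can also be done by comparing commit hashes rather than reading git log by eye. The following is a minimal sketch, not part of the refinery tooling: the function name check_deployed is hypothetical, and it assumes a git version that supports the -C option. It reports whether a checkout's HEAD matches the commit you expected to have been pulled.

```shell
#!/bin/sh
# Hypothetical helper (not part of refinery): verify that a checkout
# is at the commit that was expected to be deployed.
# Usage: check_deployed <repo_dir> <expected_commit>
check_deployed() {
    repo_dir=$1
    expected=$2

    # Commit currently checked out in repo_dir.
    actual=$(git -C "$repo_dir" rev-parse HEAD) || return 2

    # Resolve the expected commit to a full hash. This fails if the
    # commit is not present locally, i.e. the code was not pulled.
    if ! wanted=$(git -C "$repo_dir" rev-parse --verify --quiet "${expected}^{commit}" 2>/dev/null); then
        echo "MISSING: $expected not found in $repo_dir" >&2
        return 1
    fi

    if [ "$actual" = "$wanted" ]; then
        echo "OK: $repo_dir is at $actual"
    else
        echo "MISMATCH: $repo_dir is at $actual, expected $wanted" >&2
        return 1
    fi
}
```

One possible use: after the git pull on Tin, note the output of git rev-parse HEAD, then on stat1002 run check_deployed /srv/deployment/analytics/refinery followed by that hash. A non-zero exit status suggests the sync has not reached the host yet.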

How to deploy Oozie jobs

Please see the Deployment section in the Oozie docs.