User:QChris/TestClusterSetup
How to set up your own test cluster with Hadoop, Hive, Oozie, and Pig.
After following the step-by-step instructions below, you will have a cluster consisting of three nodes:
- foo-master.eqiad.wmflabs: the cluster's master, and main namenode.
- foo-worker1.eqiad.wmflabs: a worker node. You can use this to schedule Oozie jobs or run Hive queries.
- foo-worker2.eqiad.wmflabs: a worker node. You can use this to schedule Oozie jobs or run Hive queries.
Please swap foo with your login name if you follow these instructions!
Relevant Bugs
- bug 68161, because the puppet code from 2014-07-17 does not allow bringing a cluster up.
Creating instances
- Create a master instance.
  - Go to the “Add instance” page.
  - Set Instance name to foo-master.
  - Set Instance type to m1.small [...].
    - (A bigger instance won't buy you anything. When testing, you'll most likely not run into disk space problems, but rather “number of filenames” limitations. So let's be nice to wikitech.)
  - Click “Submit”.
- Create a first worker instance.
  - (Use the same settings as for the master instance, but set Instance name to foo-worker1.)
- Create a second worker instance.
  - (Use the same settings as for the master instance, but set Instance name to foo-worker2.)
- Wait for the “Puppet status” of all instances to say “ok”.
  - (This will take ~15 minutes. The page does not refresh on its own, so reload it from time to time.)
Setting up Hadoop
- Set up Hadoop master.
  - Go to the “configure” page for the foo-master instance.
  - Select role::analytics::hadoop::master.
  - Set hadoop_cluster_name to analytics-hadoop.
  - Set hadoop_namenodes to foo-master.eqiad.wmflabs.
  - Click “Submit”.
  - Log in to the foo-master instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff
    - (This will take ~7 minutes.)
- Set up first worker instance.
  - Go to the “configure” page for the foo-worker1 instance.
  - Select role::analytics::hadoop::worker.
  - Set hadoop_cluster_name to analytics-hadoop.
  - Set hadoop_namenodes to foo-master.eqiad.wmflabs.
    - (This is not a typo; foo-master.eqiad.wmflabs is the namenode.)
  - Click “Submit”.
  - Log in to the foo-worker1 instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff
- Set up second worker instance.
  - Repeat the steps from “Set up first worker instance”, but use foo-worker2 instead of foo-worker1.
- Check that the two workers are known to the namenode.
  - Log in to the foo-master instance.
  - Run sudo su - -c 'sudo -u hdfs hdfs dfsadmin -printTopology'
  - If everything is working, the output will show a “default-rack” with a separate line for each worker node, like:

Rack: /default-rack
   10.68.17.180:50010 (foo-worker1.eqiad.wmflabs)
   10.68.17.181:50010 (foo-worker2.eqiad.wmflabs)
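
As a further sanity check, you can round-trip a small file through HDFS from any of the nodes. A minimal sketch; the paths and file content are arbitrary:

  # Write a small file into HDFS and read it back.
  echo "hello hadoop" > /tmp/smoke-test.txt
  hdfs dfs -mkdir -p /tmp/smoke-test
  hdfs dfs -put /tmp/smoke-test.txt /tmp/smoke-test/
  hdfs dfs -cat /tmp/smoke-test/smoke-test.txt    # should print "hello hadoop"
  hdfs dfs -rm -r /tmp/smoke-test                 # clean up afterwards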
Setting up Oozie, Hive
- Set up Hadoop master.
  - Go to the “configure” page for the foo-master instance.
  - Select role::analytics::hive::server.
  - Select role::analytics::oozie::server.
  - Click “Submit”.
  - Log in to the foo-master instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff
- Set up first worker instance.
  - Go to the “configure” page for the foo-worker1 instance.
  - Select role::analytics::hive::client.
  - Select role::analytics::oozie::client.
  - Select role::analytics::pig.
  - Click “Submit”.
  - Log in to the foo-worker1 instance.
  - Run sudo puppet agent --onetime --verbose --no-daemonize --no-splay --show_diff
- Set up second worker instance.
  - Repeat the steps from “Set up first worker instance”, but use foo-worker2 instead of foo-worker1.
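
To verify that the clients can reach the servers, you can run a trivial Hive query and ask the Oozie server for its status. A minimal sketch; the URL assumes the Oozie server listens on its default port 11000 on foo-master:

  # On foo-worker1: Hive should answer a trivial query.
  hive -e 'SHOW DATABASES;'
  # The Oozie server should report its system mode as NORMAL.
  oozie admin -oozie http://foo-master.eqiad.wmflabs:11000/oozie -status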
Bootstrapping content
At this point, the basic setup is done: HDFS, Hive, Oozie, and Pig are functional.
Sample data
You can use the scripts from the cluster-scripts repository to bootstrap some parts. For example:
- Log in to the foo-worker1 instance.
- In the cluster-scripts directory, run ./prepare_for_user.sh
- Log out.
- Log in to the foo-worker1 instance again.
- In the cluster-scripts directory, run ./hive_add_table_webrequest_sequence_stats.sh
- In the cluster-scripts directory, run ./hive_add_table_webrequest.sh
- In the cluster-scripts directory, run ./insert_webrequest_fixture_duplicate_monitoring.sh
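
Afterwards, the bootstrapped tables should show up in Hive. A quick check (the table names are inferred from the script names above):

  hive -e 'SHOW TABLES;'            # expect webrequest and webrequest_sequence_stats, among others
  hive -e 'DESCRIBE webrequest;'    # inspect the columns of the webrequest table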
Sample Deployment
- In the refinery directory, run bin/deploy --no-dry-run --verbose
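
To confirm that the deployment reached HDFS, you can list the deployed artifacts. The path below is hypothetical; substitute whatever target directory your refinery configuration uses:

  # /wmf/refinery is a placeholder, not necessarily what bin/deploy targets.
  hdfs dfs -ls /wmf/refinery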
Sample Oozie job
- In the refinery directory, run oozie job -config oozie/webrequest/compute_sequence_stats/job.properties -run -D hadoopMaster=foo-master.eqiad.wmflabs
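
On success, the command prints a job id, which you can then follow with the standard Oozie CLI:

  # Replace <job-id> with the id printed by the -run command above.
  oozie job -info <job-id>    # status of the job and its actions
  oozie jobs -len 10          # the ten most recent jobs on the server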
Cluster Tuning
- Replication issues. (hdfs-site.xml) The default block size is rather big for labs, and it'll pretty quickly give you random errors about not being able to replicate blocks. You can work around it by setting dfs.namenode.fs-limits.min-block-size to 100, and dfs.blocksize to 128k (see the hdfs-site.xml sketch after this list). That should basically kill the replication issue. If you run into it nonetheless, run hdfs dfs -expunge.
- Oozie needing 5 minutes between job creations. (oozie-site.xml) Add the property oozie.service.CoordMaterializeTriggerService.lookup.interval with value 2 to bring the interval down from the default 5 minutes to 2 seconds (see the oozie-site.xml sketch after this list). Restart the oozie server afterwards.
- If yarn applications (e.g. Hive jobs, Oozie jobs, ...) do not finish in time (like >10 minutes on simple selects), check yarn application -list. If the jobs do not change there either, check mapred job -list. If UsedMem < NeededMem, or RsvdMem > 0 for more than 5 minutes, you'll likely need more data nodes. Just add foo-worker3, foo-worker4, ... according to the description for foo-worker1 above.
- If the namenode switches into safemode, it's typically because free disk space (on the plain filesystem, not HDFS) is running low in /var on the namenode. Look for big directories in /var/log. After some cleanup there, you can get the namenode out of safe mode by running sudo su - -c "sudo -u hdfs hdfs dfsadmin -safemode leave" (see the check sequence after this list).
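
For the replication workaround above, the two properties would look roughly like this in hdfs-site.xml (a sketch; merge them into the existing file, then restart the HDFS daemons):

  <!-- Allow very small blocks, so tiny test files pass the minimum-block-size check. -->
  <property>
    <name>dfs.namenode.fs-limits.min-block-size</name>
    <value>100</value>
  </property>
  <!-- Shrink the block size from its large default down to 128k for labs-scale data. -->
  <property>
    <name>dfs.blocksize</name>
    <value>128k</value>
  </property>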
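
Likewise, the Oozie tuning above translates to one property in oozie-site.xml (a sketch; restart the oozie server after the change):

  <!-- Materialize coordinator actions every 2 seconds instead of every 5 minutes. -->
  <property>
    <name>oozie.service.CoordMaterializeTriggerService.lookup.interval</name>
    <value>2</value>
  </property>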
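
For the safemode case, a short check-and-cleanup sequence on foo-master might look like this (a sketch; which directories are safe to clean depends on your setup):

  sudo su - -c "sudo -u hdfs hdfs dfsadmin -safemode get"    # "ON" means the namenode is in safe mode
  sudo du -sh /var/log/* | sort -h | tail                    # biggest directories under /var/log last
  # ... clean up as appropriate, then:
  sudo su - -c "sudo -u hdfs hdfs dfsadmin -safemode leave"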