You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Analytics/Systems/Cluster/Camus"

imported>Milimetric
m (Milimetric moved page Analytics/Cluster/Camus to Analytics/Systems/Cluster/Camus: Reorganizing documentation)
 
imported>Elukey
Revision as of 09:35, 11 October 2018

This information is for members of the Analytics team.

Analytics production Camus jobs are launched via the hdfs user's crontab on an-coord1001 (check site.pp in Puppet first for the role camus).

How to stop Camus

The quickest way is to ssh to an-coord1001.eqiad.wmnet (check site.pp in Puppet first for the role camus) and comment out the relevant crontab entries:

ssh an-coord1001.eqiad.wmnet
sudo -u hdfs crontab -e   # comment out the Camus entries you need to stop, then save
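If you prefer not to edit the crontab interactively, the entries can also be commented out with a filter. This is a minimal sketch: comment_out_camus is a hypothetical helper (not an existing tool), and it assumes every Camus entry in the hdfs crontab contains the lowercase string "camus". Check the generated file before installing it.

```shell
#!/bin/sh
# Hypothetical helper: comment out every active crontab line that mentions
# "camus" (assumption: all Camus entries contain that string).
# Reads a crontab on stdin and writes the modified crontab to stdout.
comment_out_camus() {
  # Lines that are already comments are printed unchanged (b = jump to end);
  # any other line containing "camus" gets a "# " prefix.
  sed -e '/^[[:space:]]*#/b' -e '/camus/s/^/# /'
}

# Possible usage on the camus host (keep a backup first):
#   sudo -u hdfs crontab -l > /tmp/hdfs-crontab.bak
#   sudo -u hdfs crontab -l | comment_out_camus > /tmp/hdfs-crontab.new
#   sudo -u hdfs crontab /tmp/hdfs-crontab.new
# To restore later:
#   sudo -u hdfs crontab /tmp/hdfs-crontab.bak
```

Keeping the backup makes re-enabling Camus a single crontab command instead of a manual re-edit.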

Check Camus Production logs

  • ssh to an-coord1001.eqiad.wmnet (check site.pp in Puppet first for the role camus).
  • Logs are stored in /var/log/camus, one (rotated) file per Camus run type (as of this revision: webrequest, eventlogging, mediawiki and eventbus).
  • Both Camus output and camus-partition-checker output are logged to those files.
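To scan all current log files for failures in one go, something like the following sketch may help. camus_log_errors is a hypothetical helper; it assumes that errors show up as lines containing "ERROR" or "Exception" and that compressed rotations end in ".gz", neither of which is confirmed by this page.

```shell
#!/bin/sh
# Hypothetical helper: print ERROR/Exception lines from the Camus log files,
# prefixed with file name and line number, skipping gzip-compressed rotations.
# Assumptions: errors contain "ERROR" or "Exception"; compressed logs end in ".gz".
camus_log_errors() {
  log_dir=${1:-/var/log/camus}
  for f in "$log_dir"/*; do
    case $f in
      *.gz) continue ;;          # skip compressed rotations
    esac
    [ -f "$f" ] || continue      # skip directories and an unmatched glob
    grep -n -H -E 'ERROR|Exception' "$f" || true   # no match is not an error
  done
}

# Possible usage on the camus host:
#   camus_log_errors | tail -n 50
```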

How to produce to Kafka

kafkacat -P -b kafka1012.eqiad.wmnet:9092 -t test < test_message.txt

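The test message file contains one JSON message per line (in producer mode kafkacat splits its input on newlines by default), for example: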

{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "1"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "2"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "3"}}
{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "4"}}
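To produce more than a handful of messages, the test file can be generated instead of written by hand. A minimal sketch: make_test_messages is a hypothetical helper that writes N messages with the same shape as the example above, one per line, so each line becomes one Kafka message.

```shell
#!/bin/sh
# Hypothetical helper: write N newline-delimited JSON test messages
# (default 4) to stdout, in the same shape as the example file.
make_test_messages() {
  n=${1:-4}
  i=1
  while [ "$i" -le "$n" ]; do
    printf '{"id":123456,"name":"pepito perez", "muchoStuff":{"a": "%s"}}\n' "$i"
    i=$((i + 1))
  done
}

# Possible usage:
#   make_test_messages 100 > test_message.txt
#   kafkacat -P -b kafka1012.eqiad.wmnet:9092 -t test < test_message.txt
```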

How to validate your data against your Avro schema

We have found the PHP Avro bindings to behave differently from the Java ones, so please validate messages with the Java avro-tools jar:

java -jar avro-tools-1.7.6.jar jsontofrag --schema-file CirrusSearchRequestSet.avsc searchmessage.json
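When a file holds many messages, it can help to validate them one at a time so that a failing message can be located by line number. This is a sketch under stated assumptions: validate_avro_lines is a hypothetical wrapper, and it assumes java is on PATH and that jsontofrag exits with a non-zero status when a datum does not match the schema.

```shell
#!/bin/sh
# Hypothetical wrapper around avro-tools jsontofrag: validate each line of a
# newline-delimited JSON file separately so a failing message can be located.
# Assumptions: java is on PATH and jsontofrag exits non-zero when a datum
# does not match the schema.
validate_avro_lines() {
  tools_jar=$1 schema=$2 messages=$3
  tmp_msg=$(mktemp)
  n=0
  failed=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    [ -z "$line" ] && continue   # skip blank lines
    printf '%s\n' "$line" > "$tmp_msg"
    # Discard the binary encoding; only the exit status matters here.
    # stdin comes from /dev/null so java cannot consume the message file.
    if ! java -jar "$tools_jar" jsontofrag --schema-file "$schema" "$tmp_msg" \
        > /dev/null < /dev/null; then
      echo "line $n does not match $schema" >&2
      failed=1
    fi
  done < "$messages"
  rm -f "$tmp_msg"
  return "$failed"
}

# Possible usage:
#   validate_avro_lines avro-tools-1.7.6.jar CirrusSearchRequestSet.avsc searchmessage.json
```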

How to run a Camus job to decode Avro from a Kafka topic

Camus is our MapReduce job, but its jar also contains some of the code we depend on, so the Camus jar (camus-wmf) appears twice: as the job jar passed to hadoop jar and in the list of library jars.

You need a local properties file to pass to Camus (see the "-P /home/user/avro-kafka/camus.avro.json.properties" argument below).

The "real" production properties files live in Puppet: [1]

#!/bin/sh
# Directory holding the Camus and refinery-camus jars
JAR_DIR=/home/user/avro-kafka

# Comma-separated jar list shipped to the MapReduce tasks via -libjars
export LIBJARS=$JAR_DIR/camus-wmf-0.1.0-wmf6.jar,$JAR_DIR/camus-etl-kafka-0.1.0-wmf6.jar,$JAR_DIR/camus-api-0.1.0-wmf6.jar,$JAR_DIR/camus-kafka-coders-0.1.0-wmf6.jar,$JAR_DIR/camus-schema-registry-0.1.0-wmf6.jar,$JAR_DIR/camus-parent-0.1.0-wmf6-tests.jar,$JAR_DIR/refinery-camus-0.0.20-SNAPSHOT.jar

# Same jars, colon-separated, for the local hadoop client JVM
export HADOOP_CLASSPATH=$(echo "$LIBJARS" | tr ',' ':')

/usr/bin/hadoop jar $JAR_DIR/camus-wmf-0.1.0-wmf6.jar com.linkedin.camus.etl.kafka.CamusJob \
  -libjars "$LIBJARS" \
  -Dcamus.job.name="some_avro_test" \
  -P $JAR_DIR/camus.avro.json.properties >> ./log_camus_avro_test.txt 2>&1
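After the job finishes you will usually want to look at what it wrote to HDFS. etl.destination.path is the Camus setting for the top-level output directory, so one option is to read it from the properties file passed with -P. A minimal sketch: camus_prop is a hypothetical helper that assumes plain key=value lines without spaces around the "=".

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a Java-style
# .properties file (assumes plain "key=value" lines; the last match wins).
camus_prop() {
  awk -v key="$2" '
    index($0, key "=") == 1 { value = substr($0, length(key) + 2) }
    END { print value }
  ' "$1"
}

# Possible usage: list what the test job wrote to HDFS.
#   hdfs dfs -ls "$(camus_prop /home/user/avro-kafka/camus.avro.json.properties etl.destination.path)"
```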