
Service restarts

imported>Muehlenhoff
(Add preliminary notes for hadoop)
Revision as of 18:53, 25 November 2015

This page collects procedures to restart services (or reboot the underlying server) in the WMF production cluster.

Hadoop workers

Three of the Hadoop workers run an additional JournalNode process. These are configured in the Puppet manifest.

  • The other Hadoop workers run two services (hadoop-hdfs-datanode and hadoop-yarn-nodemanager). The services on the Hadoop workers can be restarted in arbitrary order. The service restarts have no user-visible impact (and the machines can also be rebooted).
  • TODO: Add notes for JournalNode hosts
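
For the workers without a JournalNode, the two restarts could be wrapped as in the following minimal sketch. The function name restart_hadoop_worker_services and the SERVICE override are hypothetical and only serve the illustration; the service names are the ones listed above, and the loop order is arbitrary.

```shell
#!/bin/sh
# Sketch only. SERVICE is a hypothetical override for dry runs;
# by default the regular `service` command is called.
SERVICE="${SERVICE:-service}"

# Restart both Hadoop worker services; stop at the first failure.
restart_hadoop_worker_services() {
    for svc in hadoop-hdfs-datanode hadoop-yarn-nodemanager; do
        "$SERVICE" "$svc" restart || return 1
    done
}
```

Running restart_hadoop_worker_services with SERVICE=echo prints the two service invocations without executing them.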

Kafka brokers

One Kafka broker can be restarted/rebooted at a time:

service kafka restart

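Ensure that all replicas are fully replicated (see https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Safe_Broker_Restarts). After restarting a broker, a replica election should be performed (see https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Replica_Elections).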
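
One plausible way to string these steps together is sketched below. It is not the documented WMF procedure: the ZooKeeper address, the use of the upstream scripts kafka-topics.sh and kafka-preferred-replica-election.sh, and all function names are assumptions, and the ordering (check, restart, wait, elect) is one reading of the text above. The --zookeeper option matches Kafka releases of this era; newer releases use different options and tools.

```shell
#!/bin/sh
# Sketch only: every default below is an assumption, not WMF configuration.
ZOOKEEPER="${ZOOKEEPER:-localhost:2181}"
KAFKA_TOPICS="${KAFKA_TOPICS:-kafka-topics.sh}"
KAFKA_ELECTION="${KAFKA_ELECTION:-kafka-preferred-replica-election.sh}"
SERVICE="${SERVICE:-service}"

# Print how many partitions are currently reported as under-replicated.
under_replicated_count() {
    out=$("$KAFKA_TOPICS" --describe --under-replicated-partitions \
        --zookeeper "$ZOOKEEPER") || return 1
    printf '%s\n' "$out" | grep -c 'Partition:' || true
}

# Poll until nothing is under-replicated.
# $1: number of attempts (default 30), $2: seconds between attempts (default 10).
wait_fully_replicated() {
    attempts="${1:-30}"
    delay="${2:-10}"
    while [ "$attempts" -gt 0 ]; do
        count=$(under_replicated_count) || return 2
        [ "$count" -eq 0 ] && return 0
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Restart the local broker only if the cluster is fully replicated,
# wait for replication to recover, then trigger a replica election.
restart_broker() {
    wait_fully_replicated 1 0 || return 1
    "$SERVICE" kafka restart || return 1
    wait_fully_replicated "$@" || return 1
    "$KAFKA_ELECTION" --zookeeper "$ZOOKEEPER"
}
```

Calling restart_broker 60 10 on a broker host would wait up to ten minutes for replication to recover before the election. The function refuses to restart when partitions are already under-replicated, which keeps the one-broker-at-a-time rule from being violated by accident.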

ntpd

We run four ntpd servers (chromium, hydrogen, acamar, achenar) and all of these are configured for use by the other servers in the cluster. As such, as long as only one server is restarted/rebooted at a time, everything is fine. The ntpd running locally on the individual servers can easily be restarted at any time.
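
Before moving on to the next ntpd server, it may be useful to confirm that the restarted one has selected a time source again. The following minimal sketch does this with ntpq, which marks the selected system peer with a leading asterisk in its peer listing. The function name ntp_has_sync_peer and the NTPQ override are hypothetical, and this check is a suggestion rather than part of the documented procedure.

```shell
#!/bin/sh
# Sketch only. NTPQ is a hypothetical override so the check can be
# exercised without a running ntpd.
NTPQ="${NTPQ:-ntpq}"

# Succeed if the peer listing contains a selected system peer,
# i.e. a line whose first character is the tally code '*'.
ntp_has_sync_peer() {
    "$NTPQ" -pn | grep -q '^\*'
}
```

Note that ntpd usually needs several minutes after a restart before it selects a peer, so a failing check shortly after the restart is expected.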