Search
Looking for lsearchd? That's been moved
In case of emergency
Here is the (sorted) emergency contact list for Search issues on the WMF production cluster:
- David Causse
- Erik Bernhardson
- James Douglas
- Nik Everett
You can get contact information from officewiki.
Overview
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
Elastica
Elastica is a MediaWiki extension that wraps the Elastica PHP library, which CirrusSearch uses to talk to Elasticsearch. It has no configuration.
CirrusSearch
CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
Turning on/off
- $wmgUseCirrus will set the wiki to use CirrusSearch by default for all searches. Users can get the old search by adding &srbackend=LuceneSearch to the search results page url.
Configuration
The canonical location of the configuration documentation is in the CirrusSearch.php file in the extension source. It also contains the defaults. A pool counter configuration example lives in the README in the extension source.
WMF runs the default configuration except for the Elasticsearch server locations; any overrides live in the CirrusSearch-{common|production|labs}.php files in mediawiki-config.
Elasticsearch
Elasticsearch is a multi-node search service built on Lucene. No individual node is a single point of failure.
Install
To install Elasticsearch on a node, use role::elasticsearch::server in puppet. This will install the elasticsearch Debian package and set up the service and appropriate monitoring. The node should automatically join the cluster.
Configuration
In production, role::elasticsearch::server requires no configuration. In labs a config parameter called elasticsearch::cluster_name is required; it names the cluster that the machine should join.
role::elasticsearch::server uses ::elasticsearch to install Elasticsearch with the following configuration:
Environment | Memory | Cluster Name | Multicast Group | Servers |
---|---|---|---|---|
labs | 2G | configured | 224.2.2.4 | |
beta | 4G | beta-search | 224.2.2.4 | deployment-elastic0[5678] |
production | 30G | production-search-${::site} | 224.2.2.5 (eqiad) / 224.2.2.6 (pmtpa) | elastic10([012][0-9] or 3[01]) (eqiad) / none yet (pmtpa) |
Note that the multicast group and cluster name differ between production sites. This is because the network is flat from the perspective of Elasticsearch's multicast, and it is much easier to reserve separate multicast IPs than to manage multicast TTLs.
Currently we have not made provisions to run multiple clusters in production or beta but that may be required at some point.
Files
File | Purpose |
---|---|
/etc/default/elasticsearch | Simple settings that must be set either on the command line (memory) or very early in execution (mlockall) |
/etc/elasticsearch/logging.yml | Logging configuration |
/etc/logrotate.d/elasticsearch | Log rotation managed by logrotate |
/etc/elasticsearch/elasticsearch.yml | Everything not covered above, e.g. cluster name, multicast group |
Current setup
elastic1002, elastic1007, elastic1014 are all master eligible.
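If you need to check which of these currently holds the elected master role, the _cat API can be queried from any node; a minimal sketch (assumes the _cat endpoints available since Elasticsearch 1.0):
# Show the currently elected master (assumption: the _cat API is available on the deployed version).
curl -s localhost:9200/_cat/master?v
# List all nodes; the master column should show '*' for the elected master and 'm' for other master-eligible nodes.
curl -s localhost:9200/_cat/nodes?v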
Administration
All of our (CirrusSearch's) maintenance scripts are designated to run on terbium.eqiad.wmnet.
Adding new wikis
All wikis now get Cirrus by default as primary. They have to opt out in InitializeSettings.php if they only want Cirrus as a secondary search option and want to use lsearchd instead. To add a new Cirrus wiki:
- Estimate the number of shards required (one, the default, is fine for new wikis).
- Create the search index
- addWiki.php should do this automatically for new wikis
- Populate the search index
Estimating the number of shards required
If the wiki is small then skip this step. The defaults are good for small wikis. If it isn't, use one of the three methods below to come up with numbers and add them to mediawiki-config's InitializeSettings.php file under wmgCirrusSearchShardCount.
See #Resharding if you are correcting a shard count error.
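The per-wiki shard counts are just entries in InitializeSettings.php. To see what is already configured before adding yours, something like this works from a mediawiki-config checkout (the wmf-config path is an assumption about the checkout layout):
# Show the existing wmgCirrusSearchShardCount entries (path assumed from a standard mediawiki-config checkout).
grep -A 20 'wmgCirrusSearchShardCount' wmf-config/InitializeSettings.php | less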
If it hasn't been indexed yet
You should compare its content and general page counts with those of a wiki with similarly sized pages. Mostly this means a wiki of the same type in a different language; enwikisource and frwikisource are similar, for example. Wiktionary pages are very small, as are wikidatawiki's. In any case, pick a number proportional to the count for the comparable wiki. Don't worry too much about being wrong - resharding is as simple as changing the value in the config and performing an in place reindex. To count the pages, log in to tools-login.pmtpa.wmflabs and run:
sql $wiki
SELECT COUNT(*) FROM page WHERE page_namespace = 0;
SELECT COUNT(*) FROM page WHERE page_namespace != 0;
sql $comparableWiki
SELECT COUNT(*) FROM page WHERE page_namespace = 0;
SELECT COUNT(*) FROM page WHERE page_namespace != 0;
Wikis whose indexes are split in ways other than content/general should use the same methodology, but be sure to do it once per index and use the right namespaces for the counts.
If it has been indexed
Take the actual size of the primary shards (in GB), divide by two, round to a whole number, and add a ballpark allowance for growth so you don't have to do this again really soon. Normally I add one to the number if the primary shards already total at least one GB and it isn't a Wiktionary. For example, primaries totalling 8 GB work out to 8 / 2 = 4 shards, plus one for growth makes 5. Don't worry too much about being wrong because you can change this with an in place reindex. Anyway, to get the size, log in to an elasticsearch machine and run this:
curl -s localhost:9200/_stats?pretty | grep 'general\|content\|"size"\|count' | less
Count and size each appear twice: the first occurrence covers only the primary shards and the second includes all replicas. You can ignore the replica numbers.
We max out the number of shards at 5 for content indexes and 10 for non-content indexes.
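If you only care about one wiki, you can ask for just its indexes instead of grepping the whole _stats output. A sketch, assuming the usual ${wiki}_content / ${wiki}_general index aliases:
# Primary and total store size plus doc counts for a single wiki's indexes
# (the index alias names are an assumption; adjust if the wiki splits its indexes differently).
wiki=enwikisource
curl -s "localhost:9200/${wiki}_content,${wiki}_general/_stats/store,docs?pretty" | grep '"count"\|"size_in_bytes"'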
Create the index
mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki $wiki
That'll create the search index with all the proper configuration. addWiki.php should have done this automatically for you.
Populate the search index
mkdir -p ~/log
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --skipLinks --indexOnSkip --queue | tee ~/log/$wiki.parse.log
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --skipParse --queue | tee ~/log/$wiki.links.log
If the last line of the output of the --skipLinks run doesn't end with "jobs left on the queue", wait a few minutes before launching the second command. The job queue doesn't update its counts quickly, so the first run might queue everything before the counts catch up and still see the queue as empty.
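If you'd rather check the queue directly than trust the script output, showJobs.php gives a per-type breakdown; for example:
# Rough count of outstanding Cirrus jobs for the wiki (showJobs.php is a core maintenance script).
mwscript showJobs.php --wiki $wiki --group | grep -i cirrus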
Health/Activity Monitoring
We have Nagios checks that monitor Elasticsearch's claimed health (green, yellow, red) and Ganglia monitoring of a number of attributes exposed by Elasticsearch.
Waiting for Elasticsearch to "go green"
Elasticsearch has a built in cluster health monitor. Red means there are unassigned primary shards; this is bad because some requests will fail. Yellow means there are unassigned replica shards but all primary shards are assigned. This is mostly fine because all requests can technically still be served, but there is less redundancy than normal and fewer shards to handle queries, so performance may be affected in the yellow state; requests won't outright fail, though. Anyway, this is how you wait for the cluster to "go green":
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep \"status\" | grep \"green\"; do
    cat /tmp/status
    echo
    sleep 1
done
So you aren't just watching numbers scroll by for fun, here is what they all mean:
{ "cluster_name" : "labs-search-project", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "active_primary_shards" : 30, "active_shards" : 55, "relocating_shards" : 0, "initializing_shards" : 1, "unassigned_shards" : 0 }
- cluster_name: The name of the cluster.
- status: The status we're waiting for.
- timed_out: Did this request for status time out? I've never seen one time out.
- number_of_nodes: How many nodes (data or otherwise) are in the cluster. We plan to have some master-only nodes at some point, so this should eventually be three more than number_of_data_nodes.
- number_of_data_nodes: How many nodes that hold data are in the cluster.
- active_primary_shards: How many of the primary shards are active. This should be indexes * shards. This number shouldn't change frequently, because when a primary shard goes offline one of its replica shards takes over immediately - too fast to notice.
- active_shards: How many shards are active. This should be indexes * shards * (1 + replicas). This goes down when a machine leaves the cluster. When possible those shards are reassigned to other machines, which works so long as those machines don't already hold a copy of the same shard.
- relocating_shards: How many shards are currently relocating.
- initializing_shards: How many shards are initializing. This mostly means they are being restored from a peer. It should max out at cluster.routing.allocation.node_concurrent_recoveries; we still use the default of 2.
- unassigned_shards: How many shards have yet to be assigned. These are just waiting in line to become initializing_shards.
See the Trouble section for what can go wrong with this.
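If you just want to keep an eye on the numbers rather than block until green, a plain watch loop is enough; a minimal sketch:
# Refresh the cluster health summary every five seconds.
watch -n 5 'curl -s localhost:9200/_cluster/health?pretty'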
Restarting a node
Elasticsearch rebalances shards when machines join and leave the cluster. Because this can take some time, we've stopped puppet from restarting Elasticsearch on config file changes. We expect to manually perform rolling restarts to pick up config changes or software updates (via apt-get), at least for the time being. There are two ways to do this: the fast way and the safe way. At this point we prefer the fast way if the downtime is quick and the safe way if it isn't. The fast way is the Elasticsearch-recommended way. The safe way keeps the cluster green the whole time but is really slow and can leave the cluster a tad unbalanced if it is running close to the edge on disk space.
The fast way:
es-tool restart-fast
This will instruct Elasticsearch not to allocate new replicas for the duration of the downtime. It should be faster than a plain restart because unchanged indexes can be recovered locally on the restarted machine rather than copied from other nodes. It still takes some time.
The safe way starts with this to instruct Elasticsearch to move all shards off of this node as fast as it can without going under the appropriate number of replicas:
ip=$(facter ipaddress)
curl -XPUT localhost:9200/_cluster/settings?pretty -d "{
    \"transient\" : {
        \"cluster.routing.allocation.exclude._ip\" : \"$ip\"
    }
}"
Now you wait for all the shards to move off of this one:
host=$(facter hostname)
function moving() {
    curl -s localhost:9200/_cluster/state | jq -c '.nodes as $nodes | .routing_table.indices[].shards[][] | select(.relocating_node) | {index, shard, from: $nodes[.node].name, to: $nodes[.relocating_node].name}'
}
while moving | grep $host; do echo; sleep 1; done
Now double check that they are all off. If they aren't, see the advice under stuck in yellow; it is probably because Elasticsearch can't find another place for them to go:
curl -s localhost:9200/_cluster/state?pretty | awk '
    BEGIN {more=1}
    {if (/"nodes"/) nodes=1}
    {if (/"metadata"/) nodes=0}
    {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
    {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
    {if (/"node"/) {node=$3; gsub(/[",]/, "", node)}}
    {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
    {if (more && /"index"/) {
        index_name=$3
        gsub(/[",]/, "", index_name)
        print "node="node_names[node]" shard="index_name":"shard
    }}
' | grep $host
Now you can restart this node and the cluster will stay green:
sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty ; do
    sleep 1
done
Next you tell Elasticsearch that it can move shards back to this node:
curl -XPUT localhost:9200/_cluster/settings?pretty -d "{
    \"transient\" : {
        \"cluster.routing.allocation.exclude._ip\" : \"\"
    }
}"
Finally, you can use the same code that you used to watch shards moving off this node to watch shards moving back to it:
while moving | grep $host; do echo; sleep 1; done
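As a final sanity check that the node rejoined and is holding shards again, the _cat API is handy (assuming it is available on the deployed version); for example:
# Confirm the node is back in the cluster and count the shards it currently holds.
curl -s localhost:9200/_cat/nodes?v | grep $host
curl -s localhost:9200/_cat/shards | grep $host | wc -l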
Rolling restarts
This script will perform a rolling restart across all nodes using the fast way mentioned above:
# Build the servers file with servers to restart
export prefix=elastic10
export suffix=.eqiad.wmnet
rm -f servers
for i in $(seq -w 1 31); do
echo $prefix$i$suffix >> servers
done
# Restart them
cat << __commands__ > /tmp/commands
# sudo apt-get update
# sudo apt-get install elasticsearch
# wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
# sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
sudo es-tool restart-fast
echo "Bouncing gmond to make sure the statistics are up to date..."
sudo /etc/init.d/ganglia-monitor restart
__commands__
for server in $(cat servers); do
scp /tmp/commands $server:/tmp/commands
ssh $server bash /tmp/commands
done
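The loop above bounces the servers back to back. If you want an explicit guard so the next node isn't restarted while the cluster is still recovering, you can append the wait-for-green loop from earlier to the command file; this is an extra precaution, not something the original script does:
# Optional extra lines for /tmp/commands: block until the cluster reports green before moving on.
until curl -s localhost:9200/_cluster/health?pretty | grep '"status" : "green"'; do
    sleep 5
done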
Recovering from an Elasticsearch outage/interruption in updates
The same script that populates the search index can be run over a more limited list of pages:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>
or:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --fromId <id> --toId <id>
So the simplest way to recover from an Elasticsearch outage should be to use --from and --to with the times of the outage. Don't be stingy with the dates - it is better to reindex too many pages than too few.
If there was an outage it's probably good to also do:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --deletes --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>
This will pick up deletes which need to be iterated separately.
This is the script I have in my notes for recovering from an outage:
function outage() {
    wiki=$1
    start=$2
    end=$3
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $start --to $end --deletes | tee -a ~/cirrus_log/$wiki.outage.log
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $start --to $end --queue | tee -a ~/cirrus_log/$wiki.outage.log
}
while read wiki ; do
    outage $wiki '2015-03-01T20:00:00Z' '2015-03-02T00:00:00Z'
done < /srv/mediawiki/all.dblist
In place reindex
Some releases require an in place reindex, usually because of analyzer changes. Sometimes the analyzer only changes for wikis in particular languages, so only those wikis need an update. In any case, this is how you perform an in place reindex:
function reindex() {
    wiki=$1
    TZ=UTC export REINDEX_START=$(date +%Y-%m-%dT%H:%M:%SZ)
    mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki $wiki --reindexAndRemoveOk --indexIdentifier now --reindexProcesses 10 2>&1 | tee ~/cirrus_log/$wiki.reindex.log
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $REINDEX_START --deletes | tee -a ~/cirrus_log/$wiki.reindex.log
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $REINDEX_START | tee -a ~/cirrus_log/$wiki.reindex.log
}
while read wiki ; do
    reindex $wiki
done < ../reindex_me
Don't worry about incompatibility during the update - CirrusSearch is maintained so that queries and updates always work against the currently deployed index as well as the new index configuration. Once the new index has finished building (the second command) it'll replace the old one automatically without any interruption of service. Some updates will have been lost during the reindex process; the third command will catch those updates.
The reindex_me file should be the wikis you want to reindex. For group 0 use:
/a/common/group0.dblist
For group 1 use:
rm -f allcirrus
while read wiki ; do
    echo 'global $wgCirrusSearchServers; if (isset($wgCirrusSearchServers)) {echo wfWikiId();}' | mwscript maintenance/eval.php --wiki $wiki
done < /usr/local/apache/common/all.dblist | sed '/^$/d' | tee allcirrus
mv allcirrus allcirrus.old
sort allcirrus.old | uniq > allcirrus
rm allcirrus.old
diff allcirrus /usr/local/apache/common/group0.dblist | grep '<' | cut -d' ' -f 2 | diff - /usr/local/apache/common/wikipedia.dblist | grep '<' | cut -d' ' -f 2 > cirrus_group1
cp cirrus_group1 reindex_me
For group 2 use:
cat allcirrus /usr/local/apache/common/wikipedia.dblist | sort | uniq -c | grep " 2" | cut -d' ' -f 8 > cirrus_wikipedia
cp cirrus_wikipedia reindex_me
Full reindex
Some releases will require a full, from scratch, reindex. This is due to changes in the way MediaWiki sends data to Elasticsearch. These changes typically have to be performed for all deployed wikis and are time consuming. First, do an in place reindex as above. Then use this to make scripts that run the full reindex:
function make_index_commands() {
wiki=$1
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --queue --maxJobs 10000 --pauseForJobs 1000 \
--forceUpdate --buildChunks 250000 |
sed -e 's/^/mwscript extensions\/CirrusSearch\/maintenance\//' |
sed -e 's/$/ | tee -a cirrus_log\/'$wiki'.parse.log/'
}
function make_many_reindex_commands() {
wikis=$(pwd)/$1
rm -rf cirrus_scripts
mkdir cirrus_scripts
pushd cirrus_scripts
while read wiki ; do
make_index_commands $wiki
done < $wikis | split -n r/5
for script in x*; do sort -R $script > $script.sh && rm $script; done
popd
}
Then run the scripts it makes in screen sessions.
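The scripts end up in cirrus_scripts/ as xaa.sh through xae.sh (five of them because of the split -n r/5). One way to run them, assuming you want one detached screen session per script:
# Launch each generated script in its own screen session (session names are arbitrary).
cd cirrus_scripts
for script in x*.sh; do
    screen -S "reindex-$script" -dm bash "$script"
done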
Adding new nodes
Add the new nodes in puppet making sure to include all the current roles and account, sudo, and lvs settings.
Once the node is installed and puppet has run, you should be left with a large RAID 0 ext4 mount point at /var/lib/elasticsearch. Two things should be tweaked about this mount point:
service elasticsearch stop
umount /var/lib/elasticsearch
# Remove the default 5% blocks reserved for privileged processes.
tune2fs -m 0 /dev/md2
# add the noatime option to fstab.
sed -i 's@/var/lib/elasticsearch ext4 defaults@/var/lib/elasticsearch ext4 defaults,noatime@' /etc/fstab
mount /var/lib/elasticsearch
service elasticsearch start
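To double check that both tweaks took effect after remounting, something like this should do (the /dev/md2 device name is the same assumption as above):
# Reserved blocks should now be 0 and the mount should include noatime.
tune2fs -l /dev/md2 | grep -i 'reserved block count'
mount | grep /var/lib/elasticsearch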
Add the node to lvs.
You can watch the node suck up shards:
curl -s localhost:9200/_cluster/state?pretty | awk '
    BEGIN {more=1}
    {if (/"nodes"/) nodes=1}
    {if (/"metadata"/) nodes=0}
    {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
    {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
    {if (/"RELOCATING"/) relocating=1}
    {if (/"routing_nodes"/) more=0}
    {if (/"node"/) {from_node=$3; gsub(/[",]/, "", from_node)}}
    {if (/"relocating_node"/) {to_node=$3; gsub(/[",]/, "", to_node)}}
    {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
    {if (more && relocating && /"index"/) {
        index_name=$3
        gsub(/[",]/, "", index_name)
        print "from="node_names[from_node]" to="node_names[to_node]" shard="index_name":"shard
        relocating=0
    }}
'
Removing nodes
Push all the shards off the nodes you want to remove like this:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.x.x.x,10.x.x.y,..."
    }
}'
Then wait for there to be no more relocating shards:
echo "Waiting until relocating shards is 0..." until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep '"relocating_shards" : 0,'; do cat /tmp/status | grep relocating_shards sleep 2 done
Use the cluster state API to make sure all the shards are off of the node:
curl localhost:9200/_cluster/state?pretty | less
Elasticsearch will leave a shard on a node even if it can't find another place for it to go. If you see that, you have probably removed too many nodes. Finally, you can decommission those nodes in puppet. There is no need to do it one at a time because those nodes aren't hosting any shards any more.
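Once the nodes are gone, remember to clear the allocation exclude list so it doesn't surprise anyone later; this is the same call used when restarting a node, just with an empty value:
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : ""
    }
}'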
Deploying plugins
Plugins are deployed via the repository at https://gerrit.wikimedia.org/r/operations/software/elasticsearch/plugins.
To use it, clone it, install git-fat (e.g. pip install --user git-fat), then run git fat init in the root of the repository. Add your jars and commit as normal, but note that git-fat will add them as text files containing hashes.
To get the jars into place, build them against http://archiva.wikimedia.org/ and then upload them to the "Wikimedia Release Repository". You can also download them from Maven Central, verify their checksums, and then upload them to the release repository. If you need a dependency, make sure to verify its checksum and upload it to the "Wikimedia Mirrored Repository".
Once you've got the jars into Archiva, wait a while for the git-fat archive to build. Then you can sync it to beta by going to the Gerrit page for the change, copying the checkout link, and then:
ssh deployment-tin.eqiad.wmflabs
cd /srv/deployment/elasticsearch/plugins/
git deploy start
<paste checkout link>
git deploy sync
<follow instructions>
Since the git deploy/git fat process can be a bit temperamental, verify that the plugins made it:
export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {5..8}; do
    echo $prefix$i$suffix >> servers
done
cat << __commands__ > /tmp/commands
find /srv/deployment/elasticsearch/plugins/ -name "*.jar" -type f | xargs du -h | sort -k 2
__commands__
for server in $(cat servers); do
    scp /tmp/commands $server:/tmp/commands
    ssh $server bash /tmp/commands
done
If a file is 4.0k it is probably not a valid jar file.
Now once the files are synced in beta do a rolling restart in beta and they'll be loaded. Magic, eh?
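After the rolling restart you can ask Elasticsearch itself which plugins each node loaded, which is a more direct check than eyeballing jar sizes; a sketch (assumes the /_nodes/plugins endpoint of the 1.x line we run):
# List the plugins each node reports after the restart.
curl -s localhost:9200/_nodes/plugins?pretty | less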
To get the plugins to production, repeat the above process, but instead of deployment-bastion.eqiad.wmflabs use tin.eqiad.wmnet, and instead of pasting the checkout links do a git pull. Use the following as the list of servers to check and for the rolling restart.
export prefix=elastic100
export suffix=.eqiad.wmnet
rm -f servers
for i in {1..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic101
for i in {0..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic102
for i in {0..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic103
for i in {0..1}; do
    echo $prefix$i$suffix >> servers
done
Trouble
If Elasticsearch is in trouble, it can show itself in many ways: searches failing on the wiki, the job queue getting backed up with updates, the pool counter overflowing with unperformed searches, or just plain old high load that won't go away. Luckily, Elasticsearch is very good at recovering from failure, so most of the time these problems aren't life threatening. For some more specific problems and their mitigation techniques, see the /Trouble subpage.