Search

Looking for lsearchd? That's been moved

Overview

This system has three components: Elastica, CirrusSearch, and Elasticsearch.

Elastica

Elastica is a MediaWiki extension that wraps the Elastica library, which provides the interface to Elasticsearch. It has no configuration.

CirrusSearch

CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.

Turning on/off

  • $wmgUseCirrus will set the wiki to use CirrusSearch by default for all searches. Users can get the old search by adding &srbackend=LuceneSearch to the search results page URL, as in the example below.
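
For example (the wiki and search term here are just illustrative):

https://en.wikipedia.org/w/index.php?search=example&srbackend=LuceneSearch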

Configuration

The canonical location of the configuration documentation is in the CirrusSearch.php file in the extension source. It also contains the defaults. A pool counter configuration example lives in the README in the extension source.

WMF runs the default configuration except for the Elasticsearch server locations; any overrides live in the CirrusSearch-{common|production|labs}.php files in the mediawiki-config repository.
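
A quick way to see what is actually overridden (a sketch, assuming a checkout of the operations/mediawiki-config repository with the usual wmf-config layout):

grep -n 'wgCirrusSearch' wmf-config/CirrusSearch-common.php wmf-config/CirrusSearch-production.php wmf-config/CirrusSearch-labs.php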

Elasticsearch

Elasticsearch is a multi-node search server built on Lucene. No individual node is a single point of failure.

Install

To install Elasticsearch on a node, use role::elasticsearch::server in puppet. This will install the elasticsearch Debian package and set up the service and appropriate monitoring. The node should automatically join the cluster.

Configuration

role::elasticsearch::server has no configuration.

In labs, one config parameter is required: elasticsearch::cluster_name, which names the cluster that the machine should join.

role::elasticsearch::server uses ::elasticsearch to install Elasticsearch with the following configuration:

Environment   Memory   Cluster Name                                 Multicast Group
labs          2G       configured via elasticsearch::cluster_name   224.2.2.4
beta          4G       beta-search                                  224.2.2.4
production    30G      production-search-${::site}                  224.2.2.5 (eqiad) / 224.2.2.6 (pmtpa)

Note that the multicast group and cluster name are different in different production sites. This is because the network is flat from the perspective of Elasticsearch's multicast and it is way easier to reserve separate IPs than it is to keep up with multicast ttl.

Currently we have not made provisions to run multiple clusters in production or beta but that may be required at some point.

Files
/etc/default/elasticsearch             Simple settings that must be set either on the command line (memory) or very early in execution (memlockall)
/etc/elasticsearch/logging.yml         Logging configuration
/etc/logrotate.d/elasticsearch         Log rolling managed by logrotate
/etc/elasticsearch/elasticsearch.yml   Everything not covered above, e.g. cluster name, multicast group
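
To sanity check what those settings render to on a node, the cluster name and multicast group can be read straight out of the generated file (the keys here are standard Elasticsearch 1.x setting names):

grep -E 'cluster\.name|multicast' /etc/elasticsearch/elasticsearch.yml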

Current setup

Cluster as of August 2014

elastic1002, elastic1007, elastic1014 are all master eligible.
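
To see which of those nodes currently holds the master role, the _cat API (available in Elasticsearch 1.x) is the quickest check:

curl -s localhost:9200/_cat/master?v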

Administration

All of our (CirrusSearch's) maintenance scripts are meant to be run on terbium.eqiad.wmnet.

Adding new wikis

All wikis now get Cirrus as their primary search by default. A wiki has to opt out in InitializeSettings.php if it only wants Cirrus as a secondary search option and wants to keep using lsearchd. To add a new Cirrus wiki:

  1. Estimate the number of shards required (one, the default, is fine for new wikis).
  2. Create the search index
    1. addWiki.php should do this automatically for new wikis
  3. Populate the search index

Estimating the number of shards required

If the wiki is small then skip this step - the defaults are good for small wikis. If it isn't, use one of the methods below to come up with numbers and add them to mediawiki-config's InitializeSettings.php file under wmgCirrusSearchShardCount.

See #Resharding if you are correcting a shard count error.
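
For comparison, the shard counts already assigned to other wikis are easy to pull out of the config (a sketch, assuming a checkout of operations/mediawiki-config):

grep -A 20 'wmgCirrusSearchShardCount' wmf-config/InitializeSettings.php | less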

If it hasn't been indexed yet

You should compare its content and general page counts with those of a wiki with similarly sized pages - mostly this means a wiki of the same type in a different language. enwikisource and frwikisource are similar, for example. Wiktionary pages are very small, as are wikidatawiki's. In any case, pick a number proportional to the one used for the comparable wiki. Don't worry too much about being wrong - resharding is as simple as changing the value in the config and performing an in place reindex. To count the pages, log in to tools-login.pmtpa.wmflabs and then:

sql $wiki
 SELECT COUNT(*) FROM page WHERE page_namespace = 0;
 SELECT COUNT(*) FROM page WHERE page_namespace != 0;
sql $comparableWiki
 SELECT COUNT(*) FROM page WHERE page_namespace = 0;
 SELECT COUNT(*) FROM page WHERE page_namespace != 0;

Wikis whose indexes are split in other ways than content/general should use the same methodology, but be sure to do it once per index and use the right namespaces for the counts.

If it has been indexed

You can get the actual size of the primary shards (in GB), divide by two, round the result to a whole number, and add a ballpark allowance for growth so you don't have to do this again really soon. Normally I add one to the number if the primary shards already total at least one GB and it isn't a Wiktionary. Don't worry too much about being wrong because you can change this later with an in place reindex. Anyway, to get the size, log in to an Elasticsearch machine and run this:

curl -s localhost:9200/_stats?pretty | grep 'general\|content\|"size"\|count' | less

Count and size each appear twice. The first value is for the primary shards only and the second includes all replicas. You can ignore the replica numbers.
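
An alternative that only shows the primary sizes (so there is nothing to ignore) is the _cat API; pri.store.size is the combined size of the primary shards, and enwiki below is just a stand-in for the index you care about. If the column names differ on the deployed version, plain _cat/indices?v shows everything:

curl -s 'localhost:9200/_cat/indices?v&h=index,pri,pri.store.size' | grep -i 'index\|enwiki'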

We max out the number of shards at 5 for content indexes and 10 for non-content indexes.

Create the index

mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki $wiki

That'll create the search index with all the proper configuration. addWiki.php should have done this automatically for new wikis.

Populate the search index

mkdir -p ~/log
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --skipLinks --indexOnSkip --queue | tee ~/log/$wiki.parse.log
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --skipParse --queue | tee ~/log/$wiki.links.log

If the last line of output from the --skipLinks run doesn't end with "jobs left on the queue", wait a few minutes before launching the second job. The job queue doesn't update its counts quickly, so the second job might queue everything before the counts catch up and still see the queue as empty.
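
If you want to watch the queue rather than guess, showJobs.php (a standard MediaWiki maintenance script) prints the counts, and --group breaks them down by job type so the CirrusSearch jobs are visible:

mwscript maintenance/showJobs.php --wiki $wiki --group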

Health/Activity Monitoring

We have nagios checks that monitor Elasticsearch's claimed health (green, yellow, red) and ganglia monitoring on a number of attributes exposed by Elasticsearch.

Waiting for Elasticsearch to "go green"

Elasticsearch has a built in cluster health monitor. Red means there are unassigned primary shards and is bad because some requests will fail. Yellow means there are unassigned replica shards but the primary shards are doing just fine. This is mostly fine because technically all requests can still be served, but there is less redundancy than normal and there are fewer shards to handle queries. Performance may be affected in the yellow state but requests won't just fail. Anyway, this is how you wait for the cluster to "go green":

until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep \"status\" | grep \"green\"; do
  cat /tmp/status
  echo
  sleep 1
done

So you aren't just watching numbers scroll by for fun, here is what they all mean:

{
  "cluster_name" : "labs-search-project",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 30,
  "active_shards" : 55,
  "relocating_shards" : 0,
  "initializing_shards" : 1,
  "unassigned_shards" : 0
}
cluster_name
The name of the cluster.
status
The status we're waiting for.
timed_out
Has this request for status timed out? I've never seen one time out.
number_of_nodes
How many nodes (data or otherwise) are in the cluster? We plan to have some master-only nodes at some point, which would make this three more than number_of_data_nodes.
number_of_data_nodes
How many nodes that hold data are in the cluster?
active_primary_shards
How many of the primary shards are active? This should be indexes * shards. This number shouldn't change frequently because when a primary shard goes offline one of its replica shards should take over immediately. It should be too fast to notice.
active_shards
How many shards are active? This should be indexes * shards * (1 + replicas). This will go down when a machine leaves the cluster. When possible those shards will be reassigned to other machines. This is possible so long as those machines don't already hold a copy of that replica.
relocating_shards
How many shards are currently relocating?
initializing_shards
How many shards are initializing? This mostly means they are being restored from a peer. This should max out at cluster.routing.allocation.node_concurrent_recoveries. We still use the default of 2.
unassigned_shards
How many shards have yet to be assigned? These are just waiting in line to become initializing_shards.
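
If the cluster sits in yellow or red and you want to know which index is responsible, the same health endpoint can report more detail (level=indices and level=shards are standard parameters):

curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | less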

See the Trouble section for what can go wrong with this.

Restarting a node

Elasticsearch rebalances shards when machines join and leave the cluster. Because this can take some time, we've stopped puppet from restarting Elasticsearch on config file changes. We expect to manually perform rolling restarts to pick up config changes or software updates (via apt-get), at least for the time being. There are two ways to do this: the fast way and the safe way. At this point we prefer the fast way if the downtime is quick and the safe way if it isn't. The fast way is the Elasticsearch recommended way. The safe way keeps the cluster green the whole time but is really slow and can leave the cluster a tad unbalanced if it is running close to the edge on disk space.

The fast way:

es-tool restart-fast

This will instruct Elasticsearch not to allocate new replicas for the duration of the downtime. It should be faster than simply restarting because unchanged indexes can be restored on the restarted machine quickly. It still takes some time.

The safe way starts with this to instruct Elasticsearch to move all shards off of this node as fast as it can without going under the appropriate number of replicas:

ip=$(facter ipaddress)
curl -XPUT localhost:9200/_cluster/settings?pretty -d "{
    \"transient\" : {
        \"cluster.routing.allocation.exclude._ip\" : \"$ip\"
    }
}"

Now you wait for all the shards to move off of this one:

host=$(facter hostname)
function moving() {
   curl -s localhost:9200/_cluster/state | jq -c '.nodes as $nodes |
       .routing_table.indices[].shards[][] |
       select(.relocating_node) | {index, shard, from: $nodes[.node].name, to: $nodes[.relocating_node].name}'
}
while moving | grep $host; do echo; sleep 1; done

Now you double check that they are all off. See the advice under stuck in yellow if they aren't. It is probably because Elasticsearch can't find another place for them to go:

curl -s localhost:9200/_cluster/state?pretty | awk '
    BEGIN {more=1}
    {if (/"nodes"/) nodes=1}
    {if (/"metadata"/) nodes=0}
    {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
    {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
    {if (/"node"/) {node=$3; gsub(/[",]/, "", node)}}
    {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
    {if (more && /"index"/) {
        index_name=$3
        gsub(/[",]/, "", index_name)
        print "node="node_names[node]" shard="index_name":"shard
    }}
' | grep $host

Now you can restart this node and the cluster will stay green:

sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty ; do
    sleep 1
done

Next you tell Elasticsearch that it can move shards back to this node:

curl -XPUT localhost:9200/_cluster/settings?pretty -d "{
    \"transient\" : {
        \"cluster.routing.allocation.exclude._ip\" : \"\"
    }
}"

Finally you can use the same code that you used to find shards moving from this node to find shards moving back to this node:

while moving | grep $host; do echo; sleep 1; done

Rolling restarts

This script will perform a rolling restart across all nodes using the fast way mentioned above:

# Build the servers file with servers to restart
export prefix=elastic10
export suffix=.eqiad.wmnet
rm -f servers
for i in $(seq -w 1 31); do
    echo $prefix$i$suffix >> servers
done

# Restart them
cat << __commands__ > /tmp/commands
# sudo apt-get update
# sudo apt-get install elasticsearch
# wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
# sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
sudo es-tool restart-fast
echo "Bouncing gmond to make sure the statistics are up to date..."
sudo /etc/init.d/ganglia-monitor restart
__commands__

for server in $(cat servers); do
    scp /tmp/commands $server:/tmp/commands
    ssh $server bash /tmp/commands
done

Recovering from an Elasticsearch outage/interruption in updates

The same script that populates the search index can be run over a more limited list of pages:

mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>

or:

mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --fromId <id> --toId <id>

So the simplest way to recover from an Elasticsearch outage should be to use --from and --to with the times of the outage. Don't be stingy with the dates - it is better to reindex too many pages than too few.

If there was an outage it's probably good to also do:

mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --deletes --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>

This will pick up deletes which need to be iterated separately.

This is the script I have in my notes for recovering from an outage:

function outage() {
   wiki=$1
   start=$2
   end=$3
   mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $start --to $end --deletes | tee -a ~/cirrus_log/$wiki.outage.log
   mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $start --to $end --queue | tee -a ~/cirrus_log/$wiki.outage.log
}
while read wiki ; do
   outage $wiki '2015-03-01T20:00:00Z' '2015-03-02T00:00:00Z'
done < /srv/mediawiki/all.dblist

In place reindex

Some releases require an in place reindex. This is due to analyzer changes. Sometimes the analyzer will only change for wikis in particular languages, so only those wikis will need an update. In any case, this is how you perform an in place reindex:

function reindex() {
    wiki=$1
    export REINDEX_START=$(TZ=UTC date +%Y-%m-%dT%H:%M:%SZ)
    mwscript extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php --wiki $wiki --reindexAndRemoveOk --indexIdentifier now --reindexProcesses 10 2>&1 | tee ~/cirrus_log/$wiki.reindex.log
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $REINDEX_START --deletes | tee -a ~/cirrus_log/$wiki.reindex.log
    mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from $REINDEX_START | tee -a ~/cirrus_log/$wiki.reindex.log
}
while read wiki ; do
    reindex $wiki
done < ../reindex_me

Don't worry about incompatibility during the update - CirrusSearch is maintained so that queries and updates will always work against the currently deployed index as well as the new index configuration. Once the new index has finished building (the second command) it'll replace the old one automatically without any interruption of service. Some updates will have been lost during the reindex process; the third command will catch those updates.

The reindex_me file should be the wikis you want to reindex. For group 0 use:

/a/common/group0.dblist

For group 1 use:

rm -f allcirrus
while read wiki ; do
    echo 'global $wgCirrusSearchServers; if (isset($wgCirrusSearchServers)) {echo wfWikiId();}' |
        mwscript maintenance/eval.php --wiki $wiki
done < /usr/local/apache/common/all.dblist | sed '/^$/d' | tee allcirrus
mv allcirrus allcirrus.old
sort allcirrus.old | uniq > allcirrus
rm allcirrus.old
diff allcirrus /usr/local/apache/common/group0.dblist | grep '<' | cut -d' ' -f 2 | diff - /usr/local/apache/common/wikipedia.dblist | grep '<' | cut -d' ' -f 2 > cirrus_group1
cp cirrus_group1 reindex_me

For group 2 use:

cat allcirrus /usr/local/apache/common/wikipedia.dblist | sort | uniq -c | grep "  2" | cut -d' ' -f 8 > cirrus_wikipedia
cp cirrus_wikipedia reindex_me

Full reindex

Some releases will require a full, from scratch, reindex. This is due to changes in the way MediaWiki sends data to Elasticsearch. These changes typically have to be performed for all deployed wikis and are time consuming. First, do an in place reindex as above. Then use this to make scripts that run the full reindex:

 function make_index_commands() {
     wiki=$1
     mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --queue --maxJobs 10000 --pauseForJobs 1000 \
         --forceUpdate --buildChunks 250000 |
         sed -e 's/^/mwscript extensions\/CirrusSearch\/maintenance\//' |
         sed -e 's/$/ | tee -a cirrus_log\/'$wiki'.parse.log/'
 }
 function make_many_reindex_commands() {
     wikis=$(pwd)/$1
     rm -rf cirrus_scripts
     mkdir cirrus_scripts
     pushd cirrus_scripts
     while read wiki ; do
         make_index_commands $wiki
     done < $wikis | split -n r/5
     for script in x*; do sort -R $script > $script.sh && rm $script; done
     popd
 }

Then run the scripts it makes in screen sessions.
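
One way to do that (a sketch; the x*.sh names come from the split step above and the screen session names are arbitrary):

cd cirrus_scripts
for script in x*.sh; do
    # one detached screen session per generated script, named after the script
    screen -dmS ${script%.sh} bash $script
done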

Adding new nodes

Add the new nodes in puppet making sure to include all the current roles and account, sudo, and lvs settings.

Once the node is installed and puppet has run, you should be left with a large RAID 0 ext4 mount point at /var/lib/elasticsearch. Two things should be tweaked about this mount point:

service elasticsearch stop
umount /var/lib/elasticsearch

# Remove the default 5% blocks reserved for privileged processes.
tune2fs -m 0 /dev/md2

# add the noatime option to fstab.
sed -i 's@/var/lib/elasticsearch ext4    defaults@/var/lib/elasticsearch ext4    defaults,noatime@' /etc/fstab

mount /var/lib/elasticsearch
service elasticsearch start

Add the node to LVS.

You can watch the node suck up shards:

curl -s localhost:9200/_cluster/state?pretty | awk '
    BEGIN {more=1}
    {if (/"nodes"/) nodes=1}
    {if (/"metadata"/) nodes=0}
    {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
    {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
    {if (/"RELOCATING"/) relocating=1}
    {if (/"routing_nodes"/) more=0}
    {if (/"node"/) {from_node=$3; gsub(/[",]/, "", from_node)}}
    {if (/"relocating_node"/) {to_node=$3; gsub(/[",]/, "", to_node)}}
    {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
    {if (more && relocating && /"index"/) {
        index_name=$3
        gsub(/[",]/, "", index_name)
        print "from="node_names[from_node]" to="node_names[to_node]" shard="index_name":"shard
        relocating=0
    }}
'

Removing nodes

Push all the shards off the nodes you want to remove like this:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.exclude._ip" : "10.x.x.x,10.x.x.y,..."
    }
}'

Then wait for there to be no more relocating shards:

echo "Waiting until relocating shards is 0..."
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep '"relocating_shards" : 0,'; do
   cat /tmp/status | grep relocating_shards
   sleep 2
done

Use the cluster state API to make sure all the shards are off of the node:

curl localhost:9200/_cluster/state?pretty | less

Elasticsearch will leave a shard on a node even if it can't find another place for it to go. If you see that then you have probably removed too many nodes. Finally, you can decommission those nodes in puppet. There is no need to do it one at a time because those nodes aren't hosting any shards any more.
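
As a quicker alternative to reading the full cluster state above, the _cat API lists every shard along with the node it is assigned to (the hostnames here are just placeholders for the nodes being removed):

curl -s localhost:9200/_cat/shards | grep -E 'elastic1001|elastic1002'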

Deploying plugins

Plugins are deployed via the repository at https://gerrit.wikimedia.org/r/operations/software/elasticsearch/plugins.

To use it, clone it, install git-fat (e.g. pip install --user git-fat), then run git fat init in the root of the repository. Add your jars and commit as normal but note that git-fat will add them as text files containing hashes.
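
A minimal sketch of that workflow (the clone URL is the repository above; the jar name and target directory are hypothetical):

git clone https://gerrit.wikimedia.org/r/operations/software/elasticsearch/plugins
cd plugins
pip install --user git-fat
git fat init
cp ~/build/my-plugin-1.0.jar my-plugin/       # hypothetical jar and plugin directory
git add my-plugin/my-plugin-1.0.jar
git commit -m "Add my-plugin 1.0"             # git-fat stores a small hash stub in git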

To get the jars into place, build them against http://archiva.wikimedia.org/ and then upload them to the "Wikimedia Release Repository". You can also download them from Maven Central, verify their checksums, and then upload them to the release repository. If you need a dependency, make sure to verify its checksum and upload it to the "Wikimedia Mirrored Repository".

Once you've got the jars into Archiva wait a while for the git-fat archive to build. Then you can sync it to beta by going to the gerrit page for the change, copying the checkout link, then:

ssh deployment-tin.eqiad.wmflabs
cd /srv/deployment/elasticsearch/plugins/
git deploy start
<paste checkout link>
git deploy sync
<follow instructions>

Since the git deploy/git fat process can be a bit temperamental, verify that the plugins made it:

export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {5..8}; do
    echo $prefix$i$suffix >> servers
done
cat << __commands__ > /tmp/commands
find /srv/deployment/elasticsearch/plugins/ -name "*.jar" -type f | xargs du -h | sort -k 2
__commands__
for server in $(cat servers); do
    scp /tmp/commands $server:/tmp/commands
    ssh $server bash /tmp/commands
done

If a file is 4.0k it is probably not a valid jar file.

Now once the files are synced in beta do a rolling restart in beta and they'll be loaded. Magic, eh?

To get the plugins to production, repeat the above process, but use tin.eqiad.wmnet instead of deployment-bastion.eqiad.wmflabs and do a git pull instead of pasting the checkout links. Use the following as the list of servers to check and for the rolling restart.

export prefix=elastic100
export suffix=.eqiad.wmnet
rm -f servers
for i in {1..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic101
for i in {0..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic102
for i in {0..9}; do
    echo $prefix$i$suffix >> servers
done
export prefix=elastic103
for i in {0..1}; do
    echo $prefix$i$suffix >> servers
done

Trouble

If Elasticsearch is in trouble, it can show itself in many ways: searches failing on the wiki, the job queue getting backed up with updates, the pool counter overflowing with unperformed searches, or just plain old high load that won't go away. Luckily, Elasticsearch is very good at recovering from failure, so most of the time these sorts of problems aren't life threatening. For some more specific problems and their mitigation techniques, see the /Trouble subpage.