{{Navigation Wikimedia infrastructure|expand=search}}
{{See also|Help:Toolforge/Elasticsearch|Help:CirrusSearch elasticsearch replicas|mw:codesearch}}
[[File:CirrusSearch components.svg|alt=CirrusSearch components diagram|thumb|CirrusSearch components diagram]]

== In case of emergency ==
Here is the (sorted) emergency contact list for Search issues on the WMF production cluster:
* {{ircnick|dcausse|David Causse}}
* {{ircnick|ebernhardson|Erik Bernhardson}}
* {{ircnick|ryankemper|Ryan Kemper}}
* {{ircnick|gehel|Guillaume Lederrey}}
* {{ircnick|inflatador|Brian King}}
You can get contact information from officewiki.

== Overview ==
This system has three components: Elastica, CirrusSearch, and Elasticsearch.

If you want to extend the data that is available to CirrusSearch, have a look [[Search/TechnicalInteractionsWithSearch|here]].

=== Elastica ===
[[mw:Extension:Elastica|Elastica]] is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration.
{| class="wikitable"
|-
! Environment !! Memory !! Cluster Name !! Servers
|-
| '''labs''' || 2G || configured ||
|-
| '''beta''' || 4G || beta-search || Found in [https://github.com/wikimedia/puppet/blob/production/hieradata/cloud/eqiad1/deployment-prep/common.yaml#L209 hieradata/cloud/eqiad1/deployment-prep/common.yaml] in [https://github.com/wikimedia/puppet the puppet repo]
|-
| '''eqiad''' || 30G || production-search-eqiad || <nowiki>elastic10([012][0-9]|3[01])</nowiki>
|-
| '''codfw''' || 30G || production-search-codfw || <nowiki>elastic20(0[0-9]|1[0-9]|2[0-4])</nowiki>
|}
Batch updates use a custom job type, the <code>CirrusSearch\Job\MassIndex</code> job. The main script iterates the entire <code>page</code> table and inserts jobs in batches of 10 titles. The <code>MassIndex</code> job kicks off <code>CirrusSearch\Updater::updateFromPages()</code> to perform the actual updates. This is the same process as <code>CirrusSearch\Updater::updateFromTitle</code>; <code>updateFromTitle</code> simply does a couple of extra checks around redirect handling that are unnecessary here.

==== Scheduled batch updates from analytics network ====
Jobs are scheduled in the WMF analytics network by the search platform airflow instance to gather various information collected there and ship it back to elasticsearch. The airflow jobs build one or more files per wiki containing elasticsearch bulk update statements, upload them to swift, and send a message over kafka indicating availability of new information to import. The mjolnir-msearch-daemon running on search-loader instances in the production network receives the kafka messages, downloads the bulk updates from swift, and pipes them into the appropriate elasticsearch clusters. This includes information such as page popularity and ML predictions from various WMF projects (link recommendation, ORES, more in the future).
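A minimal sketch of what a pair of such bulk update statements could look like; the index name, document id, and field are illustrative placeholders, not values taken from a real dump:
<syntaxhighlight lang="bash">
# Bulk updates are newline-delimited JSON: an action header line followed by a
# partial document line, posted to the _bulk endpoint of the target cluster.
cat <<'EOF' | curl -s -H 'Content-Type: application/x-ndjson' -XPOST localhost:9200/_bulk --data-binary @-
{"update": {"_index": "testwiki_content", "_id": "12345"}}
{"doc": {"popularity_score": 0.00042}}
EOF
</syntaxhighlight>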
==== Saneitizer (background repair process) ====
The saneitizer is a process to keep the CirrusSearch indices sane. Its primary purpose is to compare the revision_id held in cirrussearch and in the primary wiki databases, to verify that cirrus pages are properly being updated. Pages that have a mismatched revision_id in cirrussearch are sent to the indexing pipeline to be reindexed.

The saneitizer has a secondary purpose of ensuring all indexed pages have been rendered from wikitext within the last few months. It accomplishes this by indexing every n'th page it visits in such a way that after n loops over the dataset all pages will have been re-indexed.
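A hypothetical spot check of the comparison the saneitizer performs, assuming the document _id is the page id and the indexed revision is stored in a <code>version</code> field; the index name and page id below are made up:
<syntaxhighlight lang="bash">
# What revision does elasticsearch think this page has?
curl -s 'https://search.svc.eqiad.wmnet:9243/testwiki_content/_search?q=_id:12345&_source=version' \
  | jq '.hits.hits[0]._source.version'
# Compare the result against the latest revision id recorded in the wiki's database;
# a mismatch is what the saneitizer would repair by re-queueing the page for indexing.
</syntaxhighlight>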
=== Job queue ===

===== Backend write jobs =====
These are triggered by primary and secondary update jobs to represent an individual write request to the cluster. One job is inserted for every cluster to write to. In the backend the jobqueue is configured to partition the jobs by cluster into separate queues. This partitioning ensures slowdowns indexing to one cluster do not cause similar slowdowns in the remaining clusters.
* ElasticaWrite
* The kibana [https://logstash.wikimedia.org/#/dashboard/elasticsearch/cirrus logging dashboard] for CirrusSearch contains all of the low-volume logging.
* CirrusSearchRequests - A textual log line per request from mediawiki to elasticsearch plus a json payload of information. Logged via udp2log to [[Mwlog1002|mwlog1002.eqiad.wmnet]]. This is generally 1500-3000 log lines per second. Can be turned off by setting <code>$wgCirrusSearchLogElasticRequests = false</code>.
* CirrusSearchRequestSet - This is a replacement for CirrusSearchRequests and is batched together at the php execution level. This simplifies the job of user analysis as they can look at the sum of what we did for a user's requests, rather than the individual pieces. This is logged from mediawiki to the <code>mediawiki_CirrusSearchRequests</code> kafka topic. Can be turned off by setting <code>$wgCirrusSearchSampleRequestSetLog = 0</code>.
== Administration ==
All of our (CirrusSearch's) scripts have been designated to run on [[mwmaint1002|mwmaint1002.eqiad.wmnet]].

=== {{anchor|Hardware Failures}} Hardware failures ===
Elasticsearch is robust to losing nodes. In case maintenance is required on a node (failed disk, RAM issues, whatever...), the node can be depooled and shut down without any further action. There is no need to synchronize these shutdowns or restarts with the Search Platform team (a ping to tell us it is happening is always welcome).
* any node can be taken down at any time
* up to 3 nodes from the same datacenter row can be taken down at any time
* for more complex operations, please synchronize with the Search Platform team first
=== {{anchor|Tuning Shard Counts}} Tuning shard counts ===
Optimal shard count requires a tradeoff between several competing factors.

'''Quick background:''' Each Elasticsearch shard is actually a Lucene index which requires some amount of file descriptors, disk usage, compute, and RAM. A higher shard count therefore causes more overhead, due to resource contention as well as "fixed costs".

Since Elasticsearch is designed to be resilient to instance failures, if a node drops out of the cluster, shards must rebalance across the remaining nodes (likewise for changes in instance count, etc). Shard rebalancing is rate-limited by network throughput, so excessively large shards can leave the cluster stuck "recovering" (rebalancing) for an unacceptable amount of time.

Thus the optimal shard size is a balancing act between overhead (which is minimized by having larger shards) and rebalancing time (which is minimized by smaller, more numerous shards). Less importantly, due to the problem of fragmentation, we also don't want a given shard to be too large a percentage of the available disk capacity.

Currently (as of July 2020), in most cases we don't want shards to exceed 50GB, and ideally they wouldn't be smaller than 10GB (though for small indices this is unavoidable). Once our Elasticsearch cluster has 10G networking, we can increase our desired shard size, because higher network throughput decreases the time required to redistribute large shards.

The final consideration is that different indices receive different levels of query volume. As of July 2020, '''enwiki''' and '''dewiki''' are hammered the hardest. This means nodes that have, for example, '''enwiki_content''' shards assigned to them will receive disproportionate load relative to nodes that lack those shards. As such, we want to set our total shard count in relation to the number of servers we have available. For example, a cluster with 36 servers would ideally have a total shard count that is close to a multiple of 36. We generally like to go slightly "under" to be able to weather losing nodes. In short, '''we want to maintain the invariant that ''most'' servers have the same number of shards for a given heavy index''', with a few servers having one less shard so that we have headroom to lose nodes.

'''In conclusion:''' we want shards close to 50GB, but we also need our shard count set to avoid having any particularly hot servers, while making sure the shard count isn't completely even so that we can still afford to lose a few nodes. We absolutely want to avoid having a ridiculous number of extremely tiny shards, since the thrash/resource contention would grind the cluster to a halt.
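A rough sizing sketch following the reasoning above; all numbers are illustrative, not current production values:
<syntaxhighlight lang="bash">
index_size_gb=350          # total primary data for the index
max_shard_gb=50            # upper bound per shard from the guidance above
servers=36                 # nodes in the cluster
# Minimum primaries to keep every shard at or below the size cap.
primaries=$(( (index_size_gb + max_shard_gb - 1) / max_shard_gb ))   # -> 7
# With total_shards_per_node=1 each copy of a shard lands on a different node,
# so primaries * (replicas + 1) should sit a little under the server count.
replicas=3
echo "total shards: $(( primaries * (replicas + 1) )) on $servers nodes"   # -> 28 on 36
</syntaxhighlight>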
=== Adding new wikis ===
All wikis have Cirrus enabled as the search engine. To add a new Cirrus wiki:
# Estimate the number of shards required (one, the default, is fine for new wikis).
# Create the search index
# Populate the search index
===== If it has been indexed =====
You can get the actual size of the primary shards (in GB), divide by two, round the result to a whole number, and add a spitball estimate for growth so you don't have to do this again really soon. Normally I add one to the number if the primary shards already total at least one GB and it isn't a wiktionary. Don't worry too much about being wrong because you can change this with an in-place reindex. Anyway, to get the size, log in to an elasticsearch machine and run this:
<syntaxhighlight lang="shell">
curl -s localhost:$port/_stats?pretty | grep 'general\|content\|"size"\|count' | less
</syntaxhighlight>
Count and size are repeated twice. The first time is for the primary shards and the second includes all replicas. You can ignore the replica numbers.
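A worked example of the rule of thumb above, with made-up numbers:
<syntaxhighlight lang="bash">
primary_gb=5                        # total primary shard size reported by _stats
shards=$(( (primary_gb + 1) / 2 ))  # divide by two, rounding -> 3
shards=$(( shards + 1 ))            # +1 headroom: over 1GB and not a wiktionary
echo $shards                        # -> 4
</syntaxhighlight>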
==== Create the index ====
<syntaxhighlight lang="shell">
mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki $wiki --cluster=all
</syntaxhighlight>
That'll create the search index on all necessary clusters with all the proper configuration.
==== Populate the search index ====
<syntaxhighlight lang="shell">
mkdir -p ~/log
clusters='eqiad codfw cloudelastic'
for cluster in $clusters; do
    mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --skipLinks --indexOnSkip --queue | tee ~/log/$wiki.$cluster.parse.log
    mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --skipParse --queue | tee ~/log/$wiki.$cluster.links.log
done
</syntaxhighlight>
If the last line of the output of the --skipLinks run doesn't end with "jobs left on the queue", wait a few minutes before launching the second job. The job queue doesn't update its counts quickly and the second job might queue everything before the counts catch up and still see the queue as empty. If this wiki is a private wiki then <code>cloudelastic</code> should be removed from the set of clusters. No harm if it's included, but it will (should?) throw exceptions and complain.
=== Health/Activity Monitoring ===

=== Waiting for Elasticsearch to "go green" ===
Elasticsearch has a built-in cluster health monitor. <code>red</code> means there are unassigned primary shards and is bad because some requests will fail. <code>yellow</code> means there are unassigned replica shards but the primaries are doing just fine. This is mostly fine because technically all requests can still be served, but there is less redundancy than normal and there are fewer shards to handle queries. Performance may be affected in the <code>yellow</code> state but requests won't just fail. Anyway, this is how you wait for the cluster to "go green":
 until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep \"status\" | grep \"green\"; do
     cat /tmp/status
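The loop above is cut short in this revision diff; a complete version of the same wait loop would look like this (the sleep interval is an arbitrary choice):
<syntaxhighlight lang="bash">
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep '"status"' | grep green; do
    cat /tmp/status
    echo "Not green yet, sleeping..."
    sleep 10
done
</syntaxhighlight>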
The safe way starts with this to instruct Elasticsearch to move all shards off of this node as fast as it can without going under the appropriate number of replicas:
 ip=$(facter ipaddress)
 curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings?pretty -d "{
     \"transient\" : {
         \"cluster.routing.allocation.exclude._ip\" : \"$ip\"
     }
 }"
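To confirm the exclusion took effect and to watch the node drain, the standard cluster APIs can be used, for example:
<syntaxhighlight lang="bash">
# The transient settings should now list this node's IP under exclude._ip.
curl -s localhost:9200/_cluster/settings?pretty | grep -A2 exclude
# relocating_shards returns to 0 once everything has moved off the node.
curl -s localhost:9200/_cluster/health?pretty | grep relocating_shards
</syntaxhighlight>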
 {if (/"metadata"/) nodes=0}
 {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
 {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
 {if (/"node"/) {node=$3; gsub(/[",]/, "", node)}}
 {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
 {if (more && /"index"/) {
     index_name=$3
     gsub(/[",]/, "", index_name)
     print "node="node_names[node]" shard="index_name":"shard
 }}
 ' | grep $host
Now you can restart this node and the cluster will stay green:
 sudo /etc/init.d/elasticsearch restart
 done
Next you tell Elasticsearch that it can move shards back to this node:
 curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings?pretty -d "{
     \"transient\" : {
         \"cluster.routing.allocation.exclude._ip\" : \"\"
     }
 }"
=== Rolling restarts ===
Rolling restarts are done with multiple servers in parallel to speed up the operation. Servers from the same availability zone (datacenter row) are restarted at the same time to ensure that all indices still have sufficient shards. 3 nodes in parallel is conservative; 4 has been tested multiple times and works if the cluster isn't under high load.

To optimize recovery, writes are paused during restart and a [https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-synced-flush.html sync flush] is issued. Pausing writes will increase backlog in the kafka queues, and writes will be dropped if they are paused for too long (at this point ~3 hours). Writes are re-enabled between each group of restarts to allow the queue to be processed.

Each server runs multiple elasticsearch instances (2 at this point); both need to be restarted.

The full restart logic is embedded in cookbooks for the different specific use cases:
* [https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/rolling-reboot.py sre.elasticsearch.rolling-reboot]: reboots the server
* [https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/rolling-restart.py sre.elasticsearch.rolling-restart]: only restarts elasticsearch
* [https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/rolling-upgrade.py sre.elasticsearch.rolling-upgrade]: upgrades elasticsearch

Those cookbooks are well tested on the main elasticsearch clusters, but not as much on the smaller ones (relforge, cloudelastic). The small clusters are simple enough to restart manually. The cookbooks are idempotent: they can be stopped and restarted, and servers already restarted will be ignored by the next run. To be on the safe side, only stop a cookbook while it is waiting on the cluster to go back to green (context managers are used for all operations that need to be undone, but a force kill might not leave them time to clean up).

'''Example run:'''

On one of the cumin hosts:

<code>sudo -i cookbook sre.elasticsearch.rolling-restart search_eqiad "restart for JVM upgrade" --start-datetime 2019-06-12T08:00:00 --nodes-per-run 3</code>

where:
* '''search_eqiad''' is the cluster to be restarted
* '''--start-datetime 2019-06-12T08:00:00''' is the time at which the operation is starting (which allows the cookbook to be restarted without touching the already restarted servers).
* '''--nodes-per-run 3''' is the maximum number of nodes to restart concurrently

During rolling restarts, it is a good idea to monitor a few elasticsearch-specific things:
* health of the cluster: <code>watch -d -n 5 curl -s -k 'https://localhost:9243/_cluster/health?pretty'</code> (9243 is the port of the main cluster, replace with 9443 and 9643 to monitor the psi and omega clusters)
* unallocated shards: <code>watch -d -n 30 "curl -s -k 'https://localhost:9243/_cat/shards' | grep -v STARTED | sort"</code>
* ongoing recoveries: <code>watch -d -n 5 "curl -s -k 'https://localhost:9243/_cat/recovery' | grep -v done | sort"</code>
* number of nodes restarted more than a day ago: <code>watch "curl -s -k 'https://localhost:9243/_cat/nodes?h=n,u' | sort | grep 'd\$' | wc -l"</code>
* elasticsearch uptime on each node: <code>watch "curl -s -k 'https://localhost:9243/_cat/nodes?h=n,u' | sort"</code>
* shard allocation enabled (primaries/all): <code>watch "curl -s -k 'https://localhost:9243/_cluster/settings?pretty'"</code>

'''Things that can go wrong:'''
* ''some shards are not reallocated:'' Elasticsearch stops trying to recover shards after too many failures. To force reallocation, use the [https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/force-shard-allocation.py sre.elasticsearch.force-shard-allocation cookbook].
* ''writes are not thawed:'' in some rare cases (that we don't entirely understand) writes are frozen during the restart but not thawed properly. Use the [https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/elasticsearch/force-unfreeze.py sre.elasticsearch.force-unfreeze cookbook] to manually recover.
* ''master re-election takes too long:'' There is no way to preemptively force a master re-election. When the current master is restarted, an election will occur. This sometimes takes long enough to have impact: an alert may fire and some traffic might be dropped. This recovers as soon as the new master is elected (1 or 2 minutes). We don't have a good way around this at the moment.
* ''cookbooks are force killed or error out:'' The cookbooks use context managers for most operations that need to be undone (stop puppet, freeze writes, etc...). A force kill might not leave time to clean up. Some operations, like pool/depool, are not rolled back on exception, because an unknown exception might leave the server in an unknown state that does require manual checking.
=== Cold restart ===
 start=$2
 end=$3
 mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --from $start --to $end --deletes | tee -a ~/cirrus_log/$wiki.outage.log
 mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --from $start --to $end --queue | tee -a ~/cirrus_log/$wiki.outage.log
 }
 while read wiki ; do
=== In place reindex ===
Some releases require an in-place reindex. Usually an in-place reindex is needed if any part of the mapping (fields, analyzers, etc.) has changed. This mode of reindexing creates a new index with up-to-date mappings, but does not change the source data already stored in the index, only the interpretation of the data. All reindex operations are very expensive; it can take multiple weeks to sequentially work through the wikis and clusters.

Sometimes the change may only apply to wikis in particular languages, so only those wikis will need an update. In any case, this is how you perform an in-place reindex:
 cluster=$1
 wiki=$2
 mkdir -p "$HOME/cirrus_log/"
 reindex_log="$HOME/cirrus_log/$wiki.$cluster.reindex.log"
 if [ -z "$cluster" -o -z "$wiki" ]; then
     echo "Usage: reindex [cluster] [wiki]"
     return 1
 fi
 TZ=UTC export REINDEX_START=$(date +%Y-%m-%dT%H:%M:%SZ)
 echo "Started at $REINDEX_START" > $reindex_log
 mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki $wiki --cluster $cluster --reindexAndRemoveOk --indexIdentifier now 2>&1 | tee -a $reindex_log && \
 mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --deletes | tee -a $reindex_log && \
 mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --queue | tee -a $reindex_log
 mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --archive | tee -a $reindex_log
 }
If you only need to reindex a certain elasticsearch index/wiki type (for example, just <code>dewiki_content</code> rather than all of <code>dewiki</code>), use the following altered version of the <code>reindex</code> function:
 function reindex_single_es_index() {
     cluster="$1"
     wiki="$2"
     es_index_suffix="$3"
     mkdir -p "$HOME/cirrus_log/"
     reindex_log="$HOME/cirrus_log/$wiki.$cluster.reindex.log"
     if [ -z "$cluster" -o -z "$wiki" -o -z "$es_index_suffix" ]; then
         echo "Usage: reindex [cluster] [wiki] [es_index_suffix]"
         return 1
     fi
     TZ=UTC export REINDEX_START=$(date +%Y-%m-%dT%H:%M:%SZ)
     echo "Started at $REINDEX_START" > "$reindex_log"
     mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki $wiki --cluster $cluster --indexType $es_index_suffix --reindexAndRemoveOk --indexIdentifier now 2>&1 | tee -a "$reindex_log" && \
     mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --deletes | tee -a "$reindex_log" && \
     mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --queue | tee -a "$reindex_log"
     mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --archive | tee -a "$reindex_log"
 }
Every cluster is isolated, so it is suggested to run a reindex process per cluster. Essentially create a tmux session with a pane for each cluster, and run the following in each (with the cluster variable adjusted as appropriate). This must be done for the <code>eqiad</code>, <code>codfw</code> and <code>cloudelastic</code> clusters.

Because the CirrusSearch reindex process involves hitting the SQL databases (data is fed from SQL into Elasticsearch), it's helpful to have each tmux pane run the reindex in a slightly different order to avoid thrash. (This is a recommended performance optimization but is not mandatory.)
 cluster=eqiad
 expanddblist all | while read wiki ; do
     reindex $cluster $wiki
 done
Don't worry about incompatibility during the update - CirrusSearch is maintained so that queries and updates will always work against the currently deployed index as well as the new index configuration. Once the new index has finished building (the second command) it'll replace the old one automatically without any interruption of service. Some updates will have been lost during the reindex process; the third command will catch those updates.
=== Full reindex ===
If for some reason a wiki index has to be recreated from scratch (bad backups, no other cluster to copy from, some other sort of total failure) CirrusSearch can regenerate the index from the SQL databases. Historically this was also used to add new fields or apply changes to how cirrussearch renders fields, but that use case has been replaced by the Saneitizer background process. This process is expensive and will take significant time; it is a last resort.

First, do an in-place reindex as above. This is necessary because the in-place reindex updates the index mappings, while the full reindex uses the existing mappings to bring in the new data. Then, use this to make scripts that run the full reindex:
<syntaxhighlight lang="bash">
function make_index_commands() {
    wiki=$1
    popd
}
</syntaxhighlight>
Then run the scripts it makes in screen sessions.
Dumps of all indices are created weekly by a cron as part of [https://github.com/wikimedia/puppet/blob/production/modules/snapshot/manifests/cron/cirrussearch.pp the snapshot puppet module]. The dumps are available on [https://dumps.wikimedia.org/other/cirrussearch/ dumps.wikimedia.org] and can be reimported via the [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html elasticsearch _bulk API].
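A sketch of reimporting one of those dumps into a test index via the _bulk API; <code>$DUMP</code> and the target index name are placeholders, and the 500-line chunking is only there to keep each bulk request small (the dump is gzipped, newline-delimited action/source pairs):
<syntaxhighlight lang="bash">
zcat "$DUMP" | split -l 500 --filter='curl -s -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/test_content/_bulk" --data-binary @-; echo'
</syntaxhighlight>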
=== Adding new masters ===
New masters must be listed as "unicast_hosts" in [https://github.com/wikimedia/puppet/blob/production/hieradata/role/eqiad/elasticsearch/cirrus.yaml our production puppet repo's cirrus.yaml] file.

After this change is merged (and puppet-merged), we also need to run [https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/scripts/push_cross_cluster_conf.py this script], which informs the other ES clusters of the new masters. The script requires manually creating an 'lst' file; [https://phabricator.wikimedia.org/T294805#7701855 here's an example of how to do that].
=== Adding new nodes ===
mount point. Two things should be tweaked about this mount point:

<syntaxhighlight lang="bash">
service elasticsearch stop
umount /var/lib/elasticsearch
mount /var/lib/elasticsearch
service elasticsearch start
</syntaxhighlight>
Add the node to lvs.
 {if (/"metadata"/) nodes=0}
 {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
 {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
 {if (/"RELOCATING"/) relocating=1}
 {if (/"routing_nodes"/) more=0}
 {if (/"node"/) {from_node=$3; gsub(/[",]/, "", from_node)}}
 {if (/"relocating_node"/) {to_node=$3; gsub(/[",]/, "", to_node)}}
 {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
 {if (more && relocating && /"index"/) {
     index_name=$3
     gsub(/[",]/, "", index_name)
     print "from="node_names[from_node]" to="node_names[to_node]" shard="index_name":"shard
     relocating=0
 }}
 '
=== Removing nodes ===
Push all the shards off the nodes you want to remove like this:
 curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
     "transient" : {
         "cluster.routing.allocation.exclude._ip" : "10.x.x.x,10.x.x.y,..."
     }
 }'
Standard [[Server_Lifecycle#Reclaim_or_Decommission|decommissioning documentation]] applies. When decommissioning nodes, you should ensure that elasticsearch is cleanly stopped before the node is cut from the network.

=== Deploying Debian Packages ===
When we upgrade our Elasticsearch or Debian distro versions, we have to redeploy packages, including:
* elasticsearch-madvise
* liblogstash-gelf-java
* wmf-elasticsearch-search-plugins

In the case where upstream has not changed significantly, you (an SRE or someone with SRE access) can simply copy the Debian package from the previous distro repository, [[labsconsole:Reprepro#Copying_between_distributions|as described on the reprepro page]].
=== Deploying plugins ===
 if [ -z "$1" ]; then
     echo "Please provide a service name."
     echo "Usage: systemctl-jstack elasticsearch_6@production-search-codfw.service"
     exit 1
 fi
 PID=$(systemctl show $1 -p MainPID | sed -e 's/^MainPID=//')
==== Status ====
We have deployed full-size elasticsearch clusters in eqiad and codfw. The clusters are queried via LVS at search.svc.{eqiad,codfw}.wmnet:9243.

The logic for ongoing updates was implemented in https://gerrit.wikimedia.org/r/#/c/237264/. Switching Elasticsearch to codfw is tested as part of the [[Switch_Datacenter|Datacenter Switch]] tests. By default application servers are configured to query the elasticsearch cluster in their own datacenter. All traffic can be shifted to a single datacenter for maintenance operations when necessary.
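A quick way to check both clusters through the LVS endpoints named above (sketch):
<syntaxhighlight lang="bash">
for dc in eqiad codfw; do
    curl -s "https://search.svc.${dc}.wmnet:9243/_cluster/health?pretty" | grep -E '"(cluster_name|status)"'
done
</syntaxhighlight>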
==== Overview ====
=== Removing Duplicate Indices ===
When index creation bails out, perhaps due to a thrown exception or some such, the cluster can be left with multiple indices of a particular type but only one active. CirrusSearch contains a script, <code>scripts/check_indices.py</code>, that will check the configuration of all wikis in the cluster and then compare the set of expected indices with those actually in the elasticsearch clusters.

 /srv/mediawiki/php/extensions/CirrusSearch/scripts/check_indices.py | jq .

When everything is as expected the output will be an empty JSON array:
 []
When something is wrong the output will contain an object for each cluster, and each cluster will have a list of problems found:
 [
   {
     "cluster_name": "production-search-eqiad",
     "problems": [
       {
         "reason": "Looks like Cirrus, but did not expect to exist.Deleted wiki?",
         "index": "ebernhardson_test_first"
       }
     ],
     "url": "https://search.svc.eqiad.wmnet:9243"
   }
 ]
Indices still have to be manually deleted after reviewing the output above. As we gain confidence in the output through manual review, the tool can be updated with options to automatically apply deletes.
 curl -XDELETE https://search.svc.eqiad.wmnet/ebernhardson_test_first
=== Rebalancing shards to even out disk space use ===
To move shards away from the nodes with the least disk free, we can lower the [https://www.elastic.co/guide/en/elasticsearch/reference/1.7/disk.html <code>cluster.routing.allocation.disk.watermark.high</code>] setting temporarily. The high watermark is the limit at which Elasticsearch will start moving shards off a node to free up space. The high watermark can be modified like this:
 curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
     "transient" : {
         "cluster.routing.allocation.disk.watermark.high" : "70%"
     }
 }'
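When disk usage is back under control, reset the transient override so the default watermark applies again (setting a transient cluster setting to null clears it):
<syntaxhighlight lang="bash">
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "cluster.routing.allocation.disk.watermark.high" : null
    }
}'
</syntaxhighlight>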
=== Keep an eye on the number of indices on a few nodes ===
* Kafka queues are having issues
* Job runners are having issues
=== Stuck in old GC hell ===
When memory pressure is excessive, the GC behaviour of an instance will switch from primarily invoking the young GC to primarily invoking the old GC. If the issue is limited to a single instance, it's possible the instance has an unlucky set of shards that require more memory than the average instance. Resolving this situation involves the steps below (a command sketch follows the list):
# Ban the node from cluster routing
# Wait for all shards to drain
# Restart the instance
# Unban the node from cluster routing
Draining all the shards away ensures the instance will get a new selection of shards when it is unbanned. Restarting the instance is not strictly required, but it's nice to start with a fresh JVM after the old one has been misbehaving.
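A sketch of the ban / drain / unban cycle, reusing the allocation-exclusion mechanism shown in the restart instructions above:
<syntaxhighlight lang="bash">
ip=$(facter ipaddress)
# Ban: no shards may be allocated to this instance's IP.
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d "{
    \"transient\" : { \"cluster.routing.allocation.exclude._ip\" : \"$ip\" }
}"
# ...wait until _cat/shards no longer lists this node, restart the elasticsearch
# instance on the host, then unban it:
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : { "cluster.routing.allocation.exclude._ip" : "" }
}'
</syntaxhighlight>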
=== Pool Counter rejections (search is currently too busy) ===
Sometimes users' searches [https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_184#Search_busy may fail with the message]: "An error has occurred while searching: Search is currently too busy. Please try again later."

This is generally due to a spike in traffic which triggers MediaWiki's [https://www.mediawiki.org/wiki/PoolCounter PoolCounter], which in the case of CirrusSearch will drop requests if the traffic spike is too large.

There is an alert, <code>mediawiki_cirrus_pool_counter_rejections_rate</code>, which should warn if a concerning number of CirrusSearch PoolCounter rejections are detected. The alert can be [https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=alert1001&service=Mediawiki+CirrusSearch+pool+counter+rejections+rate checked in icinga].

If the alert fires, take some time to consider whether the current PoolCounter thresholds should be increased to raise the allowable queue size on MediaWiki's end. If the threshold cannot be increased, then all there is to do is verify that the traffic spike is organic as opposed to the result of some bug upstream.

=== Deploying Elasticsearch config (.yml/jvm) changes ===
Elasticsearch needs to be restarted in a controlled manner. When puppet deploys configuration changes for elasticsearch, the changes are applied on disk but the running service is not restarted. After puppet has successfully run on all instances, follow the [[#Rolling restarts]] runbook.
== Other elasticsearch clients ==
* Translate Extension
* Phabricator
* [[Toolhub.wikimedia.org|Toolhub]]

=== Constraints on elasticsearch clients ===
For a quick look at tasks on a node, you can use something like:
 curl -s 'localhost:9200/_tasks/?detailed&nodes=_local' | \
   jq '.nodes | to_entries | .[0].value.tasks | to_entries
     | map({
       ms:(.value.running_time_in_nanos / 1000000)
     }) | sort_by(.ms)' | \
   less
== Other Resources ==
* [https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html The elasticsearch cat APIs]. This contains much of the information you want to know about the cluster. The allocation, shards, nodes, indices, health and recovery reports within are often useful for diagnosing issues.
* [https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html The elasticsearch cluster settings api]. This contains other interesting information about the current configuration of the cluster. Temporary settings changes, such as changing logging levels, are applied here.

Information about specific CirrusSearch extra functionality:
* [[Search/articletopic|articletopic]]
* [[Wikidata query service/Usage#Deepcat search|deepcategory]]
* For the old lucene-search docs, see [[Search/Old]].

[[Category:Search]]
Revision as of 09:40, 18 May 2022
This page covers the usage of Elastica, CirrusSearch, and Elasticsearch in Wikimedia projects.
In case of emergency
Here is the (sorted) emergency contact list for Search issues on the WMF production cluster:
- David Causse (dcausse)
- Erik Bernhardson (ebernhardson)
- Ryan Kemper (ryankemper)
- Guillaume Lederrey (gehel)
- Brian King (inflatador)
You can get contact information from officewiki.
Overview
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
If you want to extend the data that is available to CirrusSearch, have a look here.
Elastica
Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the Elastica library. It has no configuration.
CirrusSearch
CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
Turning on/off
$wmgUseCirrus will set the wiki to use CirrusSearch by default for all searches. Doing so would break searches and search updates. You probably don't want to do this.
Configuration
The canonical location of the configuration documentation is in the CirrusSearch.php file in the extension source. It also contains the defaults. A pool counter configuration example lives in the README in the extension source. WMF runs the default configuration except for elasticsearch server locations, but overrides would live in the CirrusSearch-{common|production|labs}.php files in the mediawiki-config files.
Local Build
See docs: https://github.com/wikimedia/wikimedia-discovery-discovery-parent-pom
Install
To install Elasticsearch on a node, use role::elasticsearch::server in puppet. This will install the elasticsearch Debian package and set up the service and appropriate monitoring. It should automatically join the cluster.
Configuration
role::elasticsearch::server has no configuration. In labs a config parameter is required. It is called elasticsearch::cluster_name and names the cluster that the machine should join.
role::elasticsearch::server uses ::elasticsearch to install elasticsearch with the following configuration:
Environment | Memory | Cluster Name | Servers
---|---|---|---
labs | 2G | configured |
beta | 4G | beta-search | Found in hieradata/cloud/eqiad1/deployment-prep/common.yaml in the puppet repo
eqiad | 30G | production-search-eqiad | elastic10([012][0-9]|3[01])
codfw | 30G | production-search-codfw | elastic20(0[0-9]|1[0-9]|2[0-4])
Note that the multicast group and cluster name are different in different production sites. This is because the network is flat from the perspective of Elasticsearch's multicast and it is way easier to reserve separate IPs than it is to keep up with multicast ttl.
Files
/etc/default/elasticsearch | Simple settings that must be initialized either on the command line (memory) or really early (memlockall) in execution |
/etc/elasticsearch/logging.yml | Logging configuration |
/etc/logrotate.d/elasticsearch | Log rolling managed by logrotate |
/etc/elasticsearch/elasticsearch.yml | Everything not covered above, e.g. cluster name, multicast group |
Puppet
The elasticsearch cluster is provisioned via the elasticsearch module of the operations/puppet repository. Configuration for specific machines is in the hieradata folder of the same operations/puppet repository. A few of the relevant files are linked here, but a grep for 'elasticsearch' in hieradata and poking around the elasticsearch puppet module are the best ways to understand the current configuration.
Provisioning
- elasticsearch init defaults - Sets configuration needed to start the elasticsearch daemons, such as the heap size
- elasticsearch.yml - primary daemon configuration
- logging.yml - Default logging configuration. For debugging this is often supplemented at run time through the cluster update settings api.
Configuration
- Common configuration for all elasticsearch servers (this includes the separate elasticsearch cluster used with the ELK stack, not covered in this document).
- Common configuration for all eqiad elasticsearch servers.
- regex.yaml - Contains rack identifiers for each elasticsearch server. This is fed into elasticsearch.yml for rack aware shard distribution.
- Master eligibility is applied to individual selected hosts.
- A new elasticsearch server will serve queries as soon as it joins the cluster. It needs to be added to conftool-data to be added to LVS endpoint and receive direct traffic.
SSL certificates
- Elasticsearch exposes HTTPS endpoints via nginx. Puppet CA is used for certificates. Certificates have to be regenerated after the first installation to ensure they contain the correct SAN entry. The procedure is documented on Puppet CA page.
WMF Production setup
Hardware
The WMF, as of September 2015, operates a single elasticsearch cluster which serves all search requests for WikiMedia sites. This is hosted on elastic{01..31}.eqiad.wmnet.
Up to date rack placement can be found in ops/puppet here
servers | cores | memory | disk |
---|---|---|---|
elastic1017-1031 | 32 | 128G | 2x300GB SSD RAID0 * |
elastic1032-1052 | 32 | 128G | 2x800GB SSD RAID1 |
elastic2001-2024 | 32 | 128G | 2x800GB SSD RAID1 |
- Software RAID by partition is raid1 for the OS but elastic data will not survive a disk loss
Data distribution
As far as elasticsearch is concerned all of these servers are exactly the same. The hardware within the machines is not taken into account by the shard allocation algorithms. This does lead to some issues where the older machines have a higher load than the newer machines, but fixing it is not yet supported by elasticsearch.
Our largest user of elasticsearch resources is, by far, queries to enwiki. The data for enwiki is split into 7 shards with 1 master plus 3 replica. This means that each individual enwiki query is answered by 7 machines and at any given moment 28 (7*4) of the 31 machines in the cluster are serving enwiki queries. Other popular wikis are included in the table below(wikis can be listed twice, there is a 'content' index and a separate 'everything' index for each wiki):
wiki | primaries | replicas | total |
---|---|---|---|
itwiki | 9 | 2 | 27 |
dewiki, enwiki | 7 | 3 | 28 |
zhwiki, ukwiki, svwiki, nlwiki, frwiki, wikidatawiki, frwikisource, eswiki, enwikisource, ruwiki, itwiki, plwiki, jawiki, ptwiki | 7 | 2 | 21 |
viwiki, jawiki, eswiki | 6 | 2 | 18 |
arwiki, cawiki, commonswiki, enwiktionary, fawiki, ptwiki, ruwiki, zhwiki zhwikisource | 5 | 2 | 15 |
Many many more wikis served by < 15 machines |
Load balancing requests
MediaWiki talks to the elasticsearch cluster through LVS. Application servers talk to a single LVS ip address and this balances the requests out across the cluster. The read and write requests are distributed evenly across all available elasticsearch instances with no consideration for data locality.
Shard balance across the cluster
The production configuration sets index.routing.allocation.total_shards_per_node to 1 for all indexes. This means that each server will only contain a single shard for any given index. This, combined with setting the number of shards and replicas to an appropriate number, ensures that the indices with heavy query volume are spread out across the entire cluster.
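The setting can be checked on a given index with a standard index-settings query, for example:

 curl -s 'https://search.svc.eqiad.wmnet:9243/enwiki_content/_settings?pretty' | grep total_shards_per_node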
The elasticsearch shard distribution algorithm is relatively naive with respect to our use case. It is optimized for having shards of approximately equal size and query volume in all indexes. The WMF use case is very different. As of September 2015 there are 6419 shards with approximately the following size breakdown:
size | num shards | percent |
---|---|---|
x > 50gb | 33 | 0.5% |
50gb > x > 10gb | 56 | 0.8% |
10gb > x > 1gb | 1307 | 20.3% |
1gb > x > 100mb | 1114 | 17.3% |
100mb > x | 3909 | 60.8% |
Due to this load across the cluster needs to be occasionally monitored and shards moved around. This is further discussed in #Trouble.
Indexing
CirrusSearch updates the elasticsearch index by building and upserting almost the entire document on every edit. The revision id of the edit is used as the elasticsearch version number to ensure out-of-order writes by the job queue have no effect on the index correctness. There is, as of September 2015, only one property that is not upserted as normal on every edit. The CirrusSearch\OtherIndex functionality which adds information to the commonswiki index about duplicate files only updates the commonswiki index when the duplicated file, on the other wiki, is indexed.
Realtime updates
The CirrusSearch extension updates the elasticsearch indexes for each and every mediawiki edit. The chain of events between a user clicking the 'save page' button and elasticsearch being updated is roughly as follows:
- MW core approves of the edit and inserts the
LinksUpdate
object intoDeferredUpdates
DeferredUpdates
runs theLinksUpdate
in the web request process, but after closing the connection to the user (so no extra delays).- When
LinksUpdate
completes it runs aLinksUpdateComplete
hook which CirrusSearch listens for. In response to this hook CirrusSearch insertsCirrusSearch\Job\LinksUpdate
for this page into the job queue (backed by Kafka in wmf prod). - The
CirrusSearch\Job\LinksUpdate job
runsCirrusSearch\Updater::updateFromTitle()
to re-build the document that represents this page in elasticsearch. For each wikilink that was added or removed this insertsCirrusSearch\Job\IncomingLinkCount
to the job queue. - The
CirrusSearch\Job\IncomingLinkCount
job runsCirrusSearch\Updater::updateLinkedArticles()
for the title that was added or removed.
Other processes that write to elasticsearch (such as page deletion) are similar. All writes to elasticsearch are funneled through the CirrusSearch\Updater
class, but this class does not directly perform writes to the elasticsearch database. This class performs all the necessary calculations and then creates the CirrusSearch\Job\ElasticaWrite
job to actually make the request to elasticsearch. When the job is run it creates CirrusSearch\DataSender
which transforms the job parameters into the full request and issues it. This is done so that any updates that fail (network errors, cluster maintenance, etc) can be re-inserted into the job queue and executed at a later time without having to re-do all the heavy calculations of what actually needs to change.
Batch updates from the database
CirrusSearch indices can also be populated from the database to bootstrap brand new clusters or to backfill existing indices for periods of time where updates, for whatever reason, were not written to the elasticsearch cluster. These updates are performed with the forceSearchIndex.php
maintenance script, the usage of which is described in multiple parts of the #Administration section.
Batch updates use a custom job type, the CirrusSearch\Job\MassIndex
job. The main script iterates the entire page
table and inserts jobs in batches of 10 titles. The MassIndex
job kicks off CirrusSearch\Updater::updateFromPages()
to perform the actual updates. This is the same process as CirrusSearch\Updater::updateFromTitle
, updateFromTitle
simply does a couple extra checks around redirect handling that is unnecessary here.
Scheduled batch updates from analytics network
Jobs are scheduled in the WMF analytics network by the search platform airflow instance to gather various information collected there and ship it back to elasticsearch. The airflow jobs build one or more files per wiki containing elasticsearch bulk update statements, upload them to swift, and send a message over kafka indicating availability of new information to import. The mjolnir-msearch-daemon running on search-loader instances in the production network receives the kafka messages, downloads the bulk updates from swift, and pipes them into the appropriate elasticsearch clusters. This includes information such as page popularity and ML predictions from various WMF projects (link recommendation, ORES, more in the future).
Saneitizer (background repair process)
The saneitizer is a process to keep the CirrusSearch indices sane. Its primary purpose is to compare the revision_id held in cirrussearch against the primary wiki databases, to verify that cirrus pages are being properly updated. Pages with a mismatched revision_id in cirrussearch are sent to the indexing pipeline to be reindexed.
The saneitizer has a secondary purpose of ensuring all indexed pages have been rendered from wikitext within the last few months. It accomplishes this by indexing every n'th page it visits in such a way that after n loops over the dataset all pages will have been re-indexed.
TODO fill in info on the saneitizer (leaving as stub for now)
Job queue
CirrusSearch uses the mediawiki job queue for all operations that write to the indices. The jobs can be roughly split into a few groups, as follows:
Primary update jobs
These are triggered by the actions of either users or administrators.
- DeletePages - Removes titles from the search index when they have been deleted.
- LinksUpdate - Updates a page after it has been edited.
- MassIndex - Used by the
forceSearchIndex.php
maintenance script to distribute indexing load across the job runners.
Secondary update jobs
These are triggered by primary update jobs to update pages or indices other than the main document.
- OtherIndex - Updates the commonswiki index with information about file uploads to all wikis to prevent showing the user duplicate uploads.
- IncomingLinkCount - Triggers against the linked page when a link is added to or removed from a page. Updates the list of links coming into a page from other pages on the same wiki. This information is used as part of the rescoring algorithms. While this always sends the full list to elasticsearch, our custom search-extra plugin for elasticsearch adds functionality that only performs the update if there is more than a 20% change in the content of the incoming links field. Although only one field is being updated, elasticsearch documents are immutable, so elasticsearch internally deletes the old document and re-indexes the new version. The 20% change optimization prevents the cluster from being overloaded with writes when a popular template edit triggers link changes in significant numbers of pages.
Backend write jobs
These are triggered by primary and secondary update jobs to represent an individual write request to the cluster. One job is inserted for every cluster to write to. In the backend the jobqueue is configured to partition the jobs by cluster into separate queues. This partitioning ensures slowdowns indexing to one cluster do not cause similar slowdowns in the remaining clusters.
- ElasticaWrite
Logging
CirrusSearch logs a wide variety of data. A few logs in particular are of interest:
- The kibana logging dashboard for CirrusSearch contains all of the low-volume logging.
- CirrusSearchRequests - A textual log line per request from mediawiki to elasticsearch plus a json payload of information. Logged via udp2log to mwlog1002.eqiad.wmnet. This is generally 1500-3000 log lines per second. Can be turned off by setting
$wgCirrusSearchLogElasticRequests = false
. - CirrusSearchRequestSet - This is a replacement for CirrusSearchRequests and is batched together at the php execution level. This simplifies the job of user analysis as they can look at the sum of what we did for a user's requests, rather than the individual pieces. This is logged from mediawiki to the mediawiki_CirrusSearchRequests kafka topic. Can be turned off by setting $wgCirrusSearchSampleRequestSetLog = 0.
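If you need to spot-check what is flowing into that kafka topic, something like the following should work from a host with kafkacat installed and access to the relevant brokers; the broker address below is a placeholder, not a real hostname:
# Consume the last few CirrusSearchRequestSet messages for inspection.
# KAFKA_BROKER is a placeholder; substitute an actual production broker.
KAFKA_BROKER=kafka-broker.example.wmnet:9092
kafkacat -C -b "$KAFKA_BROKER" -t mediawiki_CirrusSearchRequests -o -5 -c 5 -e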
Administration
All of our (CirrusSearch's) scripts have been designated to run on mwmaint1002.eqiad.wmnet.
Hardware failures
Elasticsearch is robust to losing nodes. In case maintenance is required on a node (failed disk, RAM issues, whatever...), the node can be depooled and shut down without any further action. There is no need to synchronize these shutdowns or restarts with the Search Platform team (a ping to tell us it is happening is always welcome).
- any node can be taken down at any time
- any node can rejoin the cluster at any time
- up to 3 nodes from the same datacenter row can be taken down at any time
- for more complex operations, please synchronize with the Search Platform team first
Tuning shard counts
Optimal shard count requires making a tradeoff between several competing factors.
Quick background: Each Elasticsearch shard is actually a Lucene index which requires some amount of file descriptors/disk usage, compute, and RAM. So, a higher shard count causes more overhead, due to resource contention as well as "fixed costs".
Now, since Elasticsearch is designed to be resilient to instance failures, if a node drops out of the cluster, shards must rebalance across the remaining nodes (likewise for changes in instance count, etc). Shard rebalancing is rate-limited by network throughput, and thus excessively large shards can cause the cluster to be stuck "recovering" (rebalancing) for an unacceptable amount of time.
Thus the optimal shard size is a balancing act between overhead (which is optimized via having larger shards), and rebalancing time (which is optimized via smaller, more numerous shards). Less importantly, due to the problem of fragmentation, we also don't want a given shard to be too large a % of the available disk capacity.
Currently (01/07/2020 DD/MM/YY), in most cases we don't want shards to exceed 50GB, and ideally they wouldn't be smaller than 10GB (but note that for small indices this is unavoidable). Once our Elasticsearch cluster has 10G networking, we can increase our desired shard size, due to higher networking throughput decreasing the time required to redistribute large shards.
The final consideration is that different indices receive different levels of query volume. As of July 2020, enwiki and dewiki are hammered the hardest. This means nodes that have, for example, enwiki_content shards assigned to them will receive disproportionate load relative to nodes that lack those shards. As such, we want to set our total shard count in relation to the number of servers we have available. For example, a cluster with 36 servers would ideally have a total shard count that is close to a multiple of 36. We generally like to go slightly "under" to be able to weather losing nodes. In short, we want to maintain the invariant that most servers have the same number of shards for a given heavy index, with a few servers having 1 less shard so that we have headroom to lose nodes.
In conclusion: we want shards close to 50GB, but we also need our shard count set to avoid having any particularly hot servers, while making sure the shard count isn't completely even so that we can still afford to lose a few nodes. We absolutely want to avoid having a ridiculous number of extremely tiny shards, since the thrash/resource contention would grind the cluster to a halt.
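As a quick way to check how current shard sizes compare against those targets, something like this lists the largest primary shards from any node in the cluster; the port and the 40GB threshold are just illustrative:
# List primary shards larger than ~40GB, biggest first.
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,store,node&bytes=gb' \
  | awk '$3 == "p" && $4+0 > 40 {print}' \
  | sort -k4 -nr \
  | head -n 25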
Adding new wikis
All wikis have Cirrus enabled as the search engine. To add a new Cirrus wiki:
- Estimate the number of shards required (one, the default, is fine for new wikis).
- Create the search index
- Populate the search index
Estimating the number of shards required
If the wiki is small then skip this step. The defaults are good for small wikis. If it isn't, use one of the three methods below to come up with numbers and add them to mediawiki-config's InitializeSettings.php file under wmgCirrusSearchShardCount.
See #Resharding if you are correcting a shard count error.
If it hasn't been indexed yet
You should compare its content and general page counts with a wiki with similarly sized pages. Mostly this means a wiki in a different language of the same type. Enwikisource and frwikisource are similar, for example. Wiktionary pages are very small, as are wikidatawiki's. In any case pick a number proportional to the number for the other wiki. Don't worry too much about being wrong - resharding is as simple as changing the value in the config and performing an in-place reindex. To count the pages I log in to tools-login.pmtpa.wmflabs and then run:
sql $wiki
SELECT COUNT(*) FROM page WHERE page_namespace = 0;
SELECT COUNT(*) FROM page WHERE page_namespace != 0;
sql $comparableWiki
SELECT COUNT(*) FROM page WHERE page_namespace = 0;
SELECT COUNT(*) FROM page WHERE page_namespace != 0;
Wikis whose indexes are split in ways other than content/general should use the same methodology, but be sure to do it once per index and use the right namespaces for the counts.
If it has been indexed
You can get the actual size of the primary shards (in GB), divide by two, round the result to a whole number, and add a ballpark estimate for growth so you don't have to do this again really soon. Normally I add one to the number if the primary shards already total at least one GB and it isn't a wiktionary. Don't worry too much about being wrong because you can change this with an in-place reindex. Anyway, to get the size log in to an elasticsearch machine and run this:
curl -s localhost:$port/_stats?pretty | grep 'general\|content\|"size"\|count' | less
Count and size are repeated twice. The first time is for the primary shards and the second includes all replicas. You can ignore the replica numbers.
We max out the number of shards at 5 for content indexes and 10 for non-content indexes.
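Putting the above together, a rough sketch of the estimation for a single content index might look like this; the index name, port, and the per-shard size target are assumptions to adapt as needed:
# Rough shard-count estimate for one index: primary store size / target shard size.
wiki=frwikisource
port=9200
primary_gb=$(curl -s "localhost:$port/${wiki}_content/_stats/store" \
  | jq '._all.primaries.store.size_in_bytes / (1024*1024*1024)')
# Aim for roughly 50GB per shard, always at least 1 shard, capped at 5 for content indexes.
echo "$primary_gb" | awk '{n = int($1 / 50) + 1; if (n > 5) n = 5; print "suggested shards:", n}'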
Create the index
mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki $wiki --cluster=all
That'll create the search index on all necessary clusters with all the proper configuration.
Populate the search index
mkdir -p ~/log
clusters='eqiad codfw cloudelastic'
for cluster in $clusters; do
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --skipLinks --indexOnSkip --queue | tee ~/log/$wiki.$cluster.parse.log
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --skipParse --queue | tee ~/log/$wiki.$cluster.links.log
done
If the last line of the output of the --skipLinks line doesn't end with "jobs left on the queue" wait a few minutes before launching the second job. The job queue doesn't update its counts quickly and the job might queue everything before the counts catch up and still see the queue as empty. If this wiki is a private wiki then cloudelastic
should be removed from the set of clusters. No harm if it's included, but it will (should?) throw exceptions and complain.
Health/Activity Monitoring
We have nagios checks that monitor Elasticsearch's claimed health (green, yellow, red) and ganglia monitoring on a number of attributes exposed by Elasticsearch.
Expected eligible masters check and alert
There is an icinga check that verifies that the number of eligible masters matches the expected count. If it deviates, we should see icinga alerts.
Check the actual eligible masters with:
curl -s localhost:9200/_cat/nodes?h=node.role,name | grep mdi
and then compare with puppet configuration.
If an expected eligible master node is down, we can either bring it up or promote another node to a master. Also make sure the masters are spread out across the rack.
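A quick way to see which node is currently the elected master, and how many master-eligible nodes the cluster sees, for comparison with the puppet configuration:
# Show the currently elected master node.
curl -s localhost:9200/_cat/master?v
# Count the master-eligible nodes seen by the cluster.
curl -s localhost:9200/_cat/nodes?h=node.role | grep -c m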
Monitoring the job queue
Note: This information is outdated. showJobs.php prints no output, presumably due to the switch to the Kafka Job Queue. Someone please update this with the correct commands!
The current state of the jobqueue is visible, but only for individual wikis. Enwiki is almost always the busiest index, so we can monitor the state with:
mwscript showJobs.php --wiki=enwiki --group | grep ^cirrus
Under normal operation this will output something like:
ebernhardson@mwmaint1001:~$ mwscript showJobs.php --wiki=enwiki --group | grep ^cirrus
cirrusSearchIncomingLinkCount: 19028 queued; 0 claimed (0 active, 0 abandoned); 234 delayed
cirrusSearchLinksUpdatePrioritized: 35 queued; 4 claimed (4 active, 0 abandoned); 0 delayed
This output is explained in the manual. Most CirrusSearch jobs are normal to see here, but there is one exception. The cirrusSearchElasticaWrite
is typically created and run in-process from other jobs. The only time cirrusSearchElasticaWrite
is inserted into the job queue is when the write operations cannot be performed against the cluster. This may be due to writes being frozen, a network partition between the job runners and the elasticsearch cluster, or the index being written to being red.
Waiting for Elasticsearch to "go green"
Elasticsearch has a built in cluster health monitor. red
means there are unassigned primary shards and is bad because some requests will fail. Yellow
means there are unassigned replica shards but the primaries are doing just fine. This is mostly fine because technically all requests can still be served, but there is less redundancy than normal and there are fewer shards to handle queries. Performance may be affected in the yellow
state but requests won't just fail. Anyway, this is how you wait for the cluster to "go green".
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep \"status\" | grep \"green\"; do
  cat /tmp/status
  echo
  sleep 1
done
So you aren't just watching numbers scroll by for fun, I'll explain what they all mean:
{ "cluster_name" : "labs-search-project", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 4, "number_of_data_nodes" : 4, "active_primary_shards" : 30, "active_shards" : 55, "relocating_shards" : 0, "initializing_shards" : 1, "unassigned_shards" : 0 }
- cluster_name: The name of the cluster.
- status: The status we're waiting for.
- timed_out: Has this request for status timed out? I've never seen one time out.
- number_of_nodes: How many nodes (data or otherwise) are in the cluster. We plan to have some master-only nodes at some point, so this should be three more than number_of_data_nodes.
- number_of_data_nodes: How many nodes that hold data are in the cluster.
- active_primary_shards: How many of the primary shards are active. This should be indexes * shards. This number shouldn't change frequently because when a primary shard goes offline one of the replica shards should take over immediately. It should be too fast to notice.
- active_shards: How many shards are active. This should be indexes * shards * (1 + replicas). This will go down when a machine leaves the cluster. When possible those shards will be reassigned to other machines. This is possible so long as those machines don't already hold a copy of that replica.
- relocating_shards: How many shards are currently relocating.
- initializing_shards: How many shards are initializing. This mostly means they are being restored from a peer. This should max out at cluster.routing.allocation.node_concurrent_recoveries. We still use the default of 2.
- unassigned_shards: How many shards have yet to be assigned. These are just waiting in line to become initializing_shards.
See the Trouble section for what can go wrong with this.
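For a more compact view of the same numbers, the _cat health endpoint prints them on a single line:
# One-line cluster health summary (same fields as above, tabular).
curl -s 'localhost:9200/_cat/health?v'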
Pausing Indexing
Various forms of cluster maintenance, such as rolling restarts, take significantly less time when the content of the indices stays static and is not being constantly written to. To support this CirrusSearch includes the concept of 'frozen' indices. When an index is frozen no ElasticaWrite jobs are executed. When an ElasticaWrite job tries to run it sees the index it wants to write to is frozen and re-inserts itself into the job queue with an exponential backoff. Individual indices can be frozen, as can the entire cluster.
To freeze all writes to the cluster, run the script below. It does not matter which --wiki=XXX
you use, this is a cluster wide setting stored in an elasticsearch index. The cluster to freeze is selected with --cluster=XXX
:
mwscript extensions/CirrusSearch/maintenance/freezeWritesToCluster --wiki=enwiki --cluster=eqiad
Once run, all writes will back up into the job queue. The current state of the jobqueue is visible, but only for individual wikis. Enwiki is almost always the busiest index, so we can monitor the state with:
mwscript showJobs.php --wiki=enwiki --group | grep ^cirrus
To thaw out writes to the cluster re-run the initial script with the --thaw
option
mwscript extensions/CirrusSearch/maintenance/freezeWritesToCluster --wiki=enwiki --cluster=eqiad --thaw
After thawing the cluster you must monitor the job queue to ensure the number of cirrusSearchElasticaWrite
jobs is decreasing as expected. These jobs include very little calculation and quickly drain from the queue.
Ensure you do not leave the cluster in the frozen state for too long. As of Sept 2015 too long is six hours. In WMF production all jobs need to fit within the redis cluster's available memory. If cirrus jobs are allowed to fill up all available redis memory there will be an outage and it will not be limited to search.
In some cases (as yet unexplained) removing the "freeze-everything" document which controls the freeze fails, and the document reappears randomly. Re-creating and re-deleting this document fixes the issue. The force-unfreeze
cookbook can be used for that.
Restarting a node
Elasticsearch rebalances shards when machines join and leave the cluster. Because this can take some time we've stopped puppet from restarting Elasticsearch on config file changes. We expect to manually perform rolling restarts to pick up config changes or software updates (via apt-get), at least for the time being. There are two ways to do this: the fast way, and the safe way. At this point we prefer the fast way if the downtime is quick and the safe way if it isn't. The fast way is the Elasticsearch recommended way. The safe way keeps the cluster green the whole time but is really slow and can cause the cluster to get a tad unbalanced if it is running close to the edge on disk space.
The fast way:
es-tool restart-fast
This will instruct Elasticsearch not to allocate new replicas for the duration of the downtime. It should be faster than the safe way because unchanged indexes can be restored on the restarted machine quickly. It still takes some time.
The safe way starts with this to instruct Elasticsearch to move all shards off of this node as fast as it can without going under the appropriate number of replicas:
ip=$(facter ipaddress)
curl -H 'Content-Type: Application/json' -XPUT localhost:9200/_cluster/settings?pretty -d "{
  \"transient\": {
    \"cluster.routing.allocation.exclude._ip\": \"$ip\"
  }
}"
Now you wait for all the shards to move off of this one:
host=$(facter hostname)
function moving() {
  curl -s localhost:9200/_cluster/state | jq -c '.nodes as $nodes | .routing_table.indices[].shards[][] | select(.relocating_node) | {index, shard, from: $nodes[.node].name, to: $nodes[.relocating_node].name}'
}
while moving | grep $host; do echo; sleep 1; done
Now you double check that they are all off. See the advice under stuck in yellow if they aren't. It is probably because Elasticsearch can't find another place for them to go:
curl -s localhost:9200/_cluster/state?pretty | awk '
  BEGIN {more=1}
  {if (/"nodes"/) nodes=1}
  {if (/"metadata"/) nodes=0}
  {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
  {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
  {if (/"node"/) {node=$3; gsub(/[",]/, "", node)}}
  {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
  {if (more && /"index"/) {
    index_name=$3
    gsub(/[",]/, "", index_name)
    print "node="node_names[node]" shard="index_name":"shard
  }}
' | grep $host
Now you can restart this node and the cluster will stay green:
sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty ; do
  sleep 1
done
Next you tell Elasticsearch that it can move shards back to this node:
curl -H 'Content-Type: Application/json' -XPUT localhost:9200/_cluster/settings?pretty -d "{
  \"transient\": {
    \"cluster.routing.allocation.exclude._ip\": \"\"
  }
}"
Finally you can use the same code that you used to find shards moving from this node to find shards moving to this node:
while moving | grep $host; do echo; sleep 1; done
Rolling restarts
Rolling restarts are done with multiple servers in parallel to speed up the operation. Servers from the same availability zone (datacenter row) are restarted at the same time to ensure that all indices still have sufficient shards. 3 nodes in parallel is conservative, 4 has been tested multiple times and works if the cluster isn't under high load.
To optimize recovery, writes are paused during restart and a sync flush is issued. Pausing writes will increase backlog in the kafka queues and writes will be dropped if writes are paused for too long (at this point ~3 hours). Writes are re-enabled between each group of restarts to allow the queue to be processed.
Each server runs multiple elasticsearch instances (2 at this point), both need to be restarted.
The full restart logic is embedded in cookbooks for the different specific use cases:
- sre.elasticsearch.rolling-reboot: reboots the server
- sre.elasticsearch.rolling-restart: only restarts elasticsearch
- sre.elasticsearch.rolling-upgrade: upgrade elasticsearch
Those cookbooks are well tested on the main elasticsearch clusters, but not as much on the smaller ones (relforge, cloudelastic). The small clusters are simple enough to restart manually. The cookbooks are idempotent: they can be stopped and restarted, and servers already restarted will be ignored by the next run. To be on the safe side, only stop a cookbook while it is waiting for the cluster to go back to green (context managers are used for all operations that need to be undone, but a force kill might not leave them time to clean up).
Example run: On one of the cumin host:
sudo -i cookbook sre.elasticsearch.rolling-restart search_eqiad "restart for JVM upgrade" --start-datetime 2019-06-12T08:00:00 --nodes-per-run 3
where:
- search_eqiad is the cluster to be restarted
- --start-datetime 2019-06-12T08:00:00 is the time at which the operation is starting (which allows the cookbook to be restarted without restarting the already restarted servers).
- --nodes-per-run 3 is the maximum number of nodes to restart concurrently
During rolling restarts, it is a good idea to monitor a few elasticsearch specific things:
- health of the cluster:
watch -d -n 5 curl -s -k 'https://localhost:9243/_cluster/health?pretty'
(9243 is the port of the main cluster, replace with 9443 and 9643 to monitor the psi and omega clusters) - unallocated shards:
watch -d -n 30 "curl -s -k 'https://localhost:9243/_cat/shards' | grep -v STARTED | sort"
- ongoing recoveries:
watch -d -n 5 "curl -s -k 'https://localhost:9243/_cat/recovery' | grep -v done | sort"
- number of nodes restarted more than a day ago:
watch "curl -s -k 'https://localhost:9243/_cat/nodes?h=n,u' | sort | grep 'd\$' | wc -l"
- elasticsearch uptime on each node:
watch "curl -s -k 'https://localhost:9243/_cat/nodes?h=n,u' | sort"
- shard allocation enabled (primaries/all):
watch "curl -s -k 'https://localhost:9243/_cluster/settings?pretty'"
Things that can go wrong:
- some shards are not reallocated: Elasticsearch stops trying to recover shards after too many failures. To force reallocation, use the sre.elasticsearch.force-shard-allocation cookbook.
- writes are not thawed: in some rare cases (that we don't entirely understand) writes are frozen during the restart, but not thawed properly. Use the sre.elasticsearch.force-unfreeze cookbooks to manually recover.
- master re-election takes too long: There is no way to preemptively force a master re-election. When the current master is restarted, an election will occur. This sometimes takes long enough to have an impact: an alert might fire and some traffic might be dropped. This recovers as soon as the new master is elected (1 or 2 minutes). We don't have a good way around this at the moment.
- cookbooks are force killed or in error: The cookbooks use context managers for most operations that need to be undone (stop puppet, freeze writes, etc...). A force kill might not leave time to clean up. Some operations, like pool / depool, are not rolled back in case of exception, because an unknown exception might leave the server in an unknown state that requires manual checking.
Cold restart
A cold restart (shutting down all nodes and restarting all of them at the same time) is governed by the following settings:
- recover_after_nodes
- recover_after_time
- expected_nodes
Our current understanding is that elasticsearch will wait for expected_nodes
to be available, and recover anyway after recover_after_time
if at least recover_after_nodes
are present.
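To see what these are set to on a given node, grepping the on-disk elasticsearch configuration is usually enough; the path below assumes the standard layout under /etc/elasticsearch/:
# Show the gateway recovery settings currently configured on this node.
grep -r 'recover_after_nodes\|recover_after_time\|expected_nodes' /etc/elasticsearch/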
Recovering from an Elasticsearch outage/interruption in updates
The same script that populates the search index can be run over a more limited list of pages:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>
or:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --fromId <id> --toId <id>
So the simplest way to recover from an Elasticsearch outage should be to use --from and --to with the times of the outage. Don't be stingy with the dates - it is better to reindex too many pages than too few.
If there was an outage it's probably good to also do:
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --deletes --from <YYYY-mm-ddTHH:mm:ssZ> --to <YYYY-mm-ddTHH:mm:ssZ>
This will pick up deletes which need to be iterated separately.
This is the script I have in my notes for recovering from an outage:
function outage() {
  wiki=$1
  start=$2
  end=$3
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --from $start --to $end --deletes | tee -a ~/cirrus_log/$wiki.outage.log
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --from $start --to $end --queue | tee -a ~/cirrus_log/$wiki.outage.log
}
while read wiki ; do
  outage $wiki '2015-03-01T20:00:00Z' '2015-03-02T00:00:00Z'
done < /srv/mediawiki/dblists/all.dblist
In place reindex
Some releases require an in-place reindex. Usually an in-place reindex is needed if any part of the mapping - fields, analyzers, etc. - has changed. This mode of reindexing creates a new index with up-to-date mappings, but does not change the source data already stored in the index, only the interpretation of the data. All reindex operations are very expensive; it can take multiple weeks to sequentially work through the wikis and clusters.
Sometimes the change may only affect wikis in particular languages, so only those wikis will need an update. In any case, this is how you perform an in-place reindex:
function reindex() {
  cluster=$1
  wiki=$2
  mkdir -p "$HOME/cirrus_log/"
  reindex_log="$HOME/cirrus_log/$wiki.$cluster.reindex.log"
  if [ -z "$cluster" -o -z "$wiki" ]; then
    echo "Usage: reindex [cluster] [wiki]"
    return 1
  fi
  export REINDEX_START=$(TZ=UTC date +%Y-%m-%dT%H:%M:%SZ)
  echo "Started at $REINDEX_START" > $reindex_log
  mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki $wiki --cluster $cluster --reindexAndRemoveOk --indexIdentifier now 2>&1 | tee -a $reindex_log && \
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --deletes | tee -a $reindex_log && \
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --queue | tee -a $reindex_log
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --archive | tee -a $reindex_log
}
If you only need to reindex a certain elasticsearch index/wiki type (for example, just dewiki_content rather than all of dewiki), use the following altered version of the reindex function:
function reindex_single_es_index() {
  cluster="$1"
  wiki="$2"
  es_index_suffix="$3"
  mkdir -p "$HOME/cirrus_log/"
  reindex_log="$HOME/cirrus_log/$wiki.$cluster.reindex.log"
  if [ -z "$cluster" -o -z "$wiki" -o -z "$es_index_suffix" ]; then
    echo "Usage: reindex_single_es_index [cluster] [wiki] [es_index_suffix]"
    return 1
  fi
  export REINDEX_START=$(TZ=UTC date +%Y-%m-%dT%H:%M:%SZ)
  echo "Started at $REINDEX_START" > "$reindex_log"
  mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki $wiki --cluster $cluster --indexType $es_index_suffix --reindexAndRemoveOk --indexIdentifier now 2>&1 | tee -a "$reindex_log" && \
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --deletes | tee -a "$reindex_log" && \
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --queue | tee -a "$reindex_log"
  mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki $wiki --cluster $cluster --from $REINDEX_START --archive | tee -a "$reindex_log"
}
Every cluster is isolated, so it is suggested to run a reindex process per-cluster. Essentially create a tmux session with a pane for each cluster, and run the following in each (with CLUSTER adjusted as appropriate). This must be done for eqiad
, codfw
and cloudelastic
clusters.
Because the CirrusSearch reindex process involves hitting the SQL databases (data is fed from SQL into Elasticsearch), it's helpful to have each tmux pane run the reindex in a slightly different order to avoid thrash. (This is a recommended performance optimization but is not mandatory, to be clear)
cluster=eqiad
expanddblist all | while read wiki ; do
  reindex $cluster $wiki
done
Don't worry about incompatibility during the update - CirrusSearch is maintained so that queries and updates will always work against the currently deployed index as well as the new index configuration. Once the new index has finished building (the second command) it'll replace the old one automatically without any interruption of service. Some updates will have been lost during the reindex process. The third command will catch those updates.
Full reindex
If for some reason a wiki index has to be recreated from scratch (bad backups, no other cluster to copy from, some other sort of total failure) CirrusSearch can regenerate the index from the SQL databases. Historically this was also used to add new fields or apply changes to how cirrussearch renders fields, but that use case has been replaced by the Saneitizer background process. This process is expensive and will take significant time, it is a last resort.
First, do an in-place reindex as above. This is necessary due to the fact that in-place indexing updates the index mappings, while full reindex uses existing mappings to bring in the new data. Then, use this to make scripts that run the full reindex:
function make_index_commands() {
wiki=$1
mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki $wiki --queue --maxJobs 10000 --pauseForJobs 1000 \
--buildChunks 250000 |
sed -e 's/^php\s*/mwscript extensions\/CirrusSearch\/maintenance\//' |
sed -e 's/$/ | tee -a cirrus_log\/'$wiki'.parse.log/'
}
function make_many_reindex_commands() {
wikis=$(pwd)/$1
rm -rf cirrus_scripts
mkdir cirrus_scripts
pushd cirrus_scripts
while read wiki ; do
make_index_commands $wiki
done < $wikis | split -n r/5
for script in x*; do sort -R $script > $script.sh && rm $script; done
popd
}
Then run the scripts it makes in screen sessions.
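For example, a minimal way to launch the generated scripts (assuming they were written to cirrus_scripts/ as above and named x*.sh) is one detached screen session per script:
# Run each generated reindex script in its own detached screen session.
cd cirrus_scripts
for script in x*.sh; do
  screen -dmS "reindex-${script%.sh}" bash "$script"
done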
Dumps
Dumps of all indices are created weekly by a cron as part of the snapshot puppet module. The dumps are available on dumps.wikimedia.org and can be reimported via the elasticsearch _bulk API.
Adding new masters
New masters must be listed as "unicast_hosts" in our production puppet repo's cirrus.yaml file.
After this change is merged (and puppet-merged), we also need to run this script, which informs the other ES clusters of the new masters. The script requires manually creating an 'lst' file, here's an example on how to do that.
Adding new nodes
Add the new nodes in puppet making sure to include all the current roles and account, sudo, and lvs settings.
Once the node is installed and puppet has run, you should be left with a large RAID 0 ext4 /var/lib/elasticsearch
mount point. Two things should be tweaked about this mount point:
service elasticsearch stop
umount /var/lib/elasticsearch
# Remove the default 5% blocks reserved for privileged processes.
tune2fs -m 0 /dev/md2
# add the noatime option to fstab.
sed -i 's@/var/lib/elasticsearch ext4 defaults@/var/lib/elasticsearch ext4 defaults,noatime@' /etc/fstab
mount /var/lib/elasticsearch
service elasticsearch start
Add the node to lvs.
You can watch the node suck up shards:
curl -s localhost:9200/_cluster/state?pretty | awk '
  BEGIN {more=1}
  {if (/"nodes"/) nodes=1}
  {if (/"metadata"/) nodes=0}
  {if (nodes && !/"name"/) {node_name=$1; gsub(/[",]/, "", node_name)}}
  {if (nodes && /"name"/) {name=$3; gsub(/[",]/, "", name); node_names[node_name]=name}}
  {if (/"RELOCATING"/) relocating=1}
  {if (/"routing_nodes"/) more=0}
  {if (/"node"/) {from_node=$3; gsub(/[",]/, "", from_node)}}
  {if (/"relocating_node"/) {to_node=$3; gsub(/[",]/, "", to_node)}}
  {if (/"shard"/) {shard=$3; gsub(/[",]/, "", shard)}}
  {if (more && relocating && /"index"/) {
    index_name=$3
    gsub(/[",]/, "", index_name)
    print "from="node_names[from_node]" to="node_names[to_node]" shard="index_name":"shard
    relocating=0
  }}
'
Removing nodes
Push all the shards off the nodes you want to remove like this:
curl -H 'Content-Type: Application/json' -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.x.x.x,10.x.x.y,..."
  }
}'
Then wait for there to be no more relocating shards:
echo "Waiting until relocating shards is 0..." until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/status | grep '"relocating_shards" : 0,'; do cat /tmp/status | grep relocating_shards sleep 2 done
Use the cluster state API to make sure all the shards are off of the node:
curl localhost:9200/_cluster/state?pretty | less
Elasticsearch will leave the shard on a node even if it can't find another place for it to go. If you see that then you have probably removed too many nodes. Finally, you can decommission those nodes in puppet. There is no need to do it one at a time because those nodes aren't hosting any shards any more.
Standard decommissioning documentation applies. When decommissioning nodes, you should ensure that elasticsearch is cleanly stopped before the node is cut from the network.
Deploying Debian Packages
When we upgrade our Elasticsearch or Debian distro versions, we have to redeploy packages, including:
- elasticsearch-madvise
- liblogstash-gelf-java
- wmf-elasticsearch-search-plugins
In the case where upstream has not changed significantly, you (an SRE or someone with SRE access) can simply copy the Debian package from the previous distro repository, as described on the reprepro page.
Deploying plugins
Plugins (for search) are deployed using the wmf-elasticsearch-search-plugins debian package. This package is built using the ops-es-plugins repository and deployed to our apt repo like any other debian package we maintain.
The list of plugins we currently maintain is:
They are currently released using the classic maven process (release:prepare|perform) and deployed to maven central.
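In practice that means running, from the plugin's repository, something like the following (assuming the ossrh credentials described below are already configured):
# Tag the release and bump the version, then build and deploy the tagged release to central.
mvn release:prepare
mvn release:perform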
The repository name used for deploying to central is ossrh and thus the engineer performing the release must have their ~/.m2/settings.xml configured with the following credentials:
<settings>
<servers>
<server>
<id>ossrh</id>
<username>username</username>
<password>password</password>
</server>
</servers>
</settings>
To request write permission to this repository one must create a request like this one and obtain a +1 from a person who has already been granted access.
Please see discovery-parent-pom for general advice on the build process of the java projects we maintain.
Inspecting logs
There are two different types of logs: elasticsearch logs on each node, and cirrus logs.
Elasticsearch logs
The logs generated by elasticsearch are located in /var/log/elasticsearch/
:
- production-search-eqiad.log is the main log; it contains all errors (mostly failed queries due to syntax errors). It's generally a good idea to keep this log open on the master and on the node involved in administration tasks.
- production-search-eqiad_index_search_slowlog.log contains the queries that are considered slow (thresholds are described in /etc/elasticsearch/elasticsearch.yml).
- elasticsearch_hot_threads.log is a snapshot of java threads considered "hot" (generated by the python script elasticsearch_hot_threads_logger.py, which takes a snapshot of hot threads every 10 or 50 seconds depending on the load of the node).
NOTE: the log verbosity can be changed at runtime, see elastic search logging for more details.
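For example, a sketch of raising one logger at runtime via a transient cluster setting; the logger name here is only illustrative, and the setting should be put back to null when done:
# Temporarily raise recovery logging to DEBUG on the whole cluster.
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "logger.org.elasticsearch.indices.recovery": "DEBUG" }
}'
# Reset it afterwards.
curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "logger.org.elasticsearch.indices.recovery": null }
}'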
Cirrus logs
The logs generated by cirrus are located on mwlog1001.eqiad.wmnet:/a/mw-log/
:
- CirrusSearch.log: the main log. Around 300-500 lines generated per second.
- CirrusSearchRequests.log: contains all requests (queries and updates) sent by cirrus to elastic. Generates between 1500 and 3000+ lines per second.
- CirrusSearchSlowRequests.log: contains all slow requests (the threshold is currently set to 10s but can be changed with $wgCirrusSearchSlowSearch). Few lines per day.
- CirrusSearchChangeFailed.log: contains all failed updates. Few lines per day except in case of cluster outage.
Useful commands:
See all errors in realtime (useful when doing maintenance on the elastic cluster)
tail -f /a/mw-log/CirrusSearch.log | grep -v DEBUG
WARNING: you can rapidly get flooded if the pool counter is full.
Measure the throughput between cirrus and elastic (requests/sec) in realtime
tail -f /a/mw-log/CirrusSearchRequests.log | pv -l -i 5 > /dev/null
NOTE: this is an estimation because I'm not sure that all requests are logged here. For instance: I think that the requests sent to the frozen_index are not logged here. You can add something like 150 or 300 qps (guessed by counting the number of "Allowed write" in CirrusSearch.log
)
Measure the number of prefix queries per second for enwiki in realtime
tail -f /a/mw-log/CirrusSearchRequests.log | grep enwiki_content | grep " prefix search " | pv -l -i 5 > /dev/null
Using jstack or jmap or other similar tools to view logs
Our elasticsearch systemd unit sets PrivateTmp=true, which is overall a good thing. But this prevents jstack / jmap / etc. from connecting to the JVM as they expect a socket in the temp directory. The easy/safe workaround to viewing these logs is via nsenter (See T230774). Example:
gehel@elastic2050:~$ sudo systemctl status elasticsearch_6@production-search-codfw.service | grep "Main PID"
Main PID: 1110 (java)
gehel@elastic2050:~$ sudo nsenter -t 1110 -m sudo -u elasticsearch jstack 1110
2019-08-20 08:58:53
Full thread dump OpenJDK 64-Bit Server VM (25.222-b10 mixed mode):
"Attach Listener" #9275 daemon prio=9 os_prio=0 tid=0x00007f9c48002000 nid=0x31f81 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"elasticsearch[elastic2050-production-search-codfw][fetch_shard_store][T#15]" #7796 daemon prio=5 os_prio=0 tid=0x00007f9b58025000 nid=0x2a438 waiting on condition [0x00007f8a64a5e000]
java.lang.Thread.State: WAITING (parking)
[...]
Or we can go further with:
systemctl-jstack() {
  if [ -z "$1" ]; then
    echo "Please provide a service name."
    echo "Usage: systemctl-jstack elasticsearch_6@production-search-codfw.service"
    return 1
  fi
  PID=$(systemctl show $1 -p MainPID | sed -e 's/^MainPID=//')
  USER=$(systemctl show $1 -p User | sed -e 's/^User=//')
  sudo nsenter -t $PID -m sudo -u $USER jstack $PID
}
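Usage is then simply (redirecting to a file, since thread dumps are long):
# Dump a thread stack for the codfw production instance.
systemctl-jstack elasticsearch_6@production-search-codfw.service > /tmp/jstack.$(date +%s).txt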
Multi-DC Operations
Status
We have deployed full size elasticsearch clusters in eqiad and codfw. The clusters are queried via LVS at search.svc.{eqiad,codfw}.wmnet:9243. The logic for ongoing updates was implemented in https://gerrit.wikimedia.org/r/#/c/237264/. Switching Elasticsearch to codfw is tested as part of the Datacenter Switch tests. By default application servers are configured to query the elasticsearch cluster in their own datacenter. All traffic can be shifted to a single datacenter for maintenance operations when necessary.
Overview
There is no current plan to utilize a second job queue in codfw for the secondary elastic cluster. As update jobs are spawned in eqiad they will be spawned in duplicate with one for each cluster. This will keep both clusters in sync. Prior to that we need to do initial index population.
DC Switch
Point CirrusSearch to codfw by editing wmgCirrusSearchDefaultCluster in InitialiseSettings.php. The default value is "local", which means that if mediawiki switches DC, everything should be automatic.
Having search traffic flow between 2 datacenters increases the privacy risks. HTTPS has been deployed (on port 9243) to mitigate this risk.
Removing Duplicate Indices
When index creation bails out, perhaps due to a thrown exception or some such, the cluster can be left with multiple indices of a particular type but only one active. CirrusSearch contains a script, scripts/check_indices.py
, that will check the configuration of all wikis in the cluster and then compare the set of expected indices with those actually in the elasticsearch clusters.
/srv/mediawiki/php/extensions/CirrusSearch/scripts/check_indices.py | jq .
When everything is as expected the output will be an empty json array:
[]
When something is wrong the output will contain an object for each cluster, and then the cluster will have a list of problems found:
[ { "cluster_name": "production-search-eqiad", "problems": [ { "reason": "Looks like Cirrus, but did not expect to exist.Deleted wiki?", "index": "ebernhardson_test_first" } ], "url": "https://search.svc.eqiad.wmnet:9243" } ]
Indices still have to be manually deleted after reviewing the output above. As we gain confidence in the output through manual review, the tool can be updated with options to automatically apply deletes.
curl -XDELETE https://search.svc.eqiad.wmnet:9243/ebernhardson_test_first
Rebalancing shards to even out disk space use
Viewing free disk space per node:
curl -s localhost:9200/_cat/nodes?h=d,h | sort -nr
To move shards away from the nodes with the least disk free, we can lower cluster.routing.allocation.disk.watermark.high
settings temporarily. The high watermark is the limit at which Elasticsearch will start moving shards out of the node to free up space. High watermark can be modified by:
curl -H 'Content-Type: Application/json' -XPUT localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.disk.watermark.high": "70%"
  }
}'
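Remember to reset the watermark once disk usage is back to normal; setting the transient value to null restores the default:
# Restore the default high watermark once the nodes are back under control.
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "cluster.routing.allocation.disk.watermark.high": null }
}'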
Keep an eye on the number of indices on a few nodes
When banning nodes to prepare for some maintenance operations, it is useful to see how many shards are left on those nodes:
watch -d -n 10 \
  "curl -s localhost:9200/_cat/shards \
  | awk '{print \$8}' \
  | egrep 'elastic10(21|22|23|24)' \
  | sort \
  | uniq -c"
Elasticsearch Curator
elasticsearch-curator is a python tool which provides high level configuration for some maintenance operations. Its configuration is based on action files. Some standard actions are deployed on each elasticsearch node in /etc/curator
. For example, you can disable shard routing with:
sudo curator --config /etc/curator/config.yaml /etc/curator/disable-shard-allocation.yaml
Curator must be run as root to access its log file (/var/log/curator.log
).
Explain unallocated shards
Elasticsearch 5.2 introduced a new API endpoint to explain why a shard is unassigned:
curl -s localhost:9200/_cluster/allocation/explain?pretty
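The same endpoint also accepts a body to ask about one specific shard; the index name below is just an example:
# Explain allocation for a specific shard (a replica of shard 0 of enwiki_content).
curl -s -H 'Content-Type: application/json' 'localhost:9200/_cluster/allocation/explain?pretty' -d '{
  "index": "enwiki_content",
  "shard": 0,
  "primary": false
}'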
Shards stuck in recovery
We've seen cases where some shards are stuck in recovery and never complete the recovery process. Temporarily reducing the number of replicas and increasing it again once the cluster is green seems to fix the issue. So far, this has happened only during the 5.6.4 to 6.5.4 upgrade.
curl -k -X PUT "https://localhost:9243/$index/_settings" -H 'Content-Type: application/json' -d '
{
  "index" : {
    "auto_expand_replicas" : "0-2",
    "number_of_replicas" : 2
  }
}
'
No updates
If no updates are flowing, the usual culprits are:
- indexing is frozen
- Kafka queues are having issues
- Job runners are having issues
Stuck in old GC hell
When memory pressure is excessive the GC behaviour of an instance will switch from primarily invoking the young GC to primarily invoking the old GC. If the issue is limited to a single instance it's possible the instance has an unlucky set of shards that require more memory than the average instance. Resolving this situation involves:
- Ban the node from cluster routing
- Wait for all shards to drain
- Restart instance
- Unban the node from cluster routing
Draining all the shards away ensures the instance will get a new selection of shards when it is unbanned. Restarting the instance is not strictly required, but it's nice to start with a fresh JVM after the old one has been misbehaving.
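A minimal sketch of the ban/unban steps using a transient allocation exclude; the node name is an example, use the instance's full node name as shown by _cat/nodes:
# Ban the node from shard allocation so its shards drain elsewhere.
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "cluster.routing.allocation.exclude._name": "elastic1021-production-search-eqiad" }
}'
# ...wait for the shards to drain and restart the instance, then unban it:
curl -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d '{
  "transient": { "cluster.routing.allocation.exclude._name": null }
}'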
Pool Counter rejections (search is currently too busy)
Sometimes users' searches may fail with the message: "An error has occurred while searching: Search is currently too busy. Please try again later."
This is generally due to a spike in traffic which triggers MediaWiki's PoolCounter, which in the case of CirrusSearch will drop requests if the traffic spike is too large.
There is an alert mediawiki_cirrus_pool_counter_rejections_rate
which should warn if a concerning number of CirrusSearch PoolCounter rejections are detected. The alert can be checked in icinga.
If the alert fires, take some time to consider if the current PoolCounter thresholds should be increased to increase the allowable queue size on MediaWiki's end. If the threshold cannot be increased, then all there is to do is verify that the traffic spike is organic as opposed to the result of some bug upstream.
Deploying Elasticsearch config (.yml/jvm) changes
Elasticsearch needs to be restarted in a controlled manner. When puppet deploys configuration changes for elasticsearch the changes are applied on disk but the running service is not restarted. After puppet has successfully run on all instances follow the #Rolling restarts runbook.
Other elasticsearch clients
Cirrus is not the only application using our "search" elasticsearch cluster. In particular:
- API Feature requests
- Translate Extension
- Phabricator
- Toolhub
Constraints on elasticsearch clients
Reading this page should already give you some idea of how our elasticsearch cluster is managed. A short checklist that can help you:
- Writes need to be robust: Our elasticsearch cluster can go into read-only mode for various reasons (loss of a node, maintenance operations, ...). While this is relatively rare, it is part of normal operation. The client application is responsible for implementing queuing / retry.
- Writes need to be stoppable: During cluster restart, we need to stop all writes to ensure nodes can be restarted fast. A client application needs to provide a way to pause and restart writes. Ideally it should provide a way to flush pending writes as well (this can be used when writes are re-enabled to process all pending writes quickly, before the next read-only period).
- Multi-DC aware: We have 2 independent clusters; writes have to be duplicated to both clusters, including index creation. The client application has to provide a mechanism to switch from one cluster to the other (in case of loss of a datacenter, major maintenance on the cluster, ...).
Trouble
If Elasticsearch is in trouble, it can show itself in many ways. Searches could fail on the wiki, job queue could get backed up with updates, pool counter overflowing with unperformed searches, or just plain old high-load that won't go away. Luckily, Elasticsearch is very good at recovering from failure so most of the time these sorts of problems aren't life threatening. For some more specific problems and their mitigation techniques, see the /Trouble subpage.
Tasks Api
Sometimes pathological queries can get "stuck" burning cpu as they try to execute for many minutes. We can ask elasticsearch what tasks are currently running with the tasks api:
curl https://search.svc.eqiad.wmnet:9243/_tasks?detailed > tasks
This will dump a multi-megabyte json file that indicates all tasks (such as shard queries, or the parent query that issued shard queries) and how long they have been running. The inclusion of the detailed flag means all shard queries will report the index they are running against, and the parent query will report the exact text of the json search query that cirrussearch provided. For an extended incident this can be useful to track down queries that have been running extended periods of time, but for brief blips it may not have much information.
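If a specific task needs to be killed, the tasks API also supports cancellation; the task id below is a made-up example of the node_id:task_number values found in the dump above:
# Cancel a single long-running (cancellable) task by id.
curl -s -XPOST 'https://search.svc.eqiad.wmnet:9243/_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel'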
For a quick look at tasks on a node, you can use something like:
curl -s 'localhost:9200/_tasks/?detailed&nodes=_local' | \
  jq '.nodes | to_entries | .[0].value.tasks | to_entries
      | map({ action: .value.action, desc: .value.description[0:100], ms: (.value.running_time_in_nanos / 1000000) })
      | sort_by(.ms)' | \
  less
Other Resources
- The elasticsearch cat APIs. This contains much of the information you want to know about the cluster. The allocation, shards, nodes, indices, health and recovery reports within are often useful for diagnosing information.
- The elasticsearch cluster settings api. The contains other interesting information about the current configuration of the cluster. Temporary settings changes, such as changing logging levels, are applied here.
Information about specific CirrusSearch extra functionality:
- articletopic
- deepcategory
- For the old lucene-search docs, see Search/Old.