Search/S3 Plugin Enable


Revision as of 15:35, 14 July 2022

Brian's notes for deploying the S3 plugin. See T309720 for more details, and this page for more about the test environment.

Why Enable the S3 Plugin?

Currently, there is no easy way to copy data from one cluster to another. The easiest way to do this is to use thanos-swift (an object storage service with an S3-compatible API) to move data around. Elasticsearch supports the S3 API much better than it supports Swift's native API. This is (sadly) pretty common, as the Search Platform team has already seen with Flink.

Complicating factors

We have an unorthodox Elastic environment: specifically, we run 2 or 3 Elasticsearch instances on a single host. As a result, using the elasticsearch-keystore requires special care.

Getting the elastic keystore path right

By default, Elasticsearch requires the S3 client key and secret key (aka username and password) to be stored in its keystore, instead of in config files or its API (this is typical for values it considers sensitive). Unfortunately, elasticsearch-keystore invokes java with the wrong es.path.conf for our multi-instance setup. We can override this by setting ES_PATH_CONF when invoking elasticsearch-keystore:

export ES_PATH_CONF=/etc/elasticsearch/production-search-psi-codfw; /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
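The same approach covers the secret key. A sketch of the full sequence (the instance path is the example from above; adjust for the instance you are configuring):

```shell
# Point elasticsearch-keystore at the correct per-instance config directory.
export ES_PATH_CONF=/etc/elasticsearch/production-search-psi-codfw

# Each "add" command prompts for the value on stdin:
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
/usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.secret_key

# Confirm both settings landed in the right keystore:
/usr/share/elasticsearch/bin/elasticsearch-keystore list
```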

Permissions are also very important! The keystore file must be owned by root:elasticsearch with mode 0640. If the elasticsearch service fails to start after a keystore change, check the paths and permissions. A brand-new elasticsearch-keystore file appearing in /etc/elasticsearch/ means the ES_PATH_CONF environment variable was not respected. If the elasticsearch-keystore file is owned by root:root, the service will not start.
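A small sanity check for the ownership and mode described above can save a failed restart; this helper is a sketch, not part of our tooling (pass it the real instance's keystore path):

```shell
# Verify a keystore file has the expected owner/group (root:elasticsearch)
# and mode (0640). Prints "ok" on success, a diagnostic otherwise.
check_keystore_perms() {
  local f="$1"
  local mode owner group
  mode=$(stat -c '%a' "$f")
  owner=$(stat -c '%U' "$f")
  group=$(stat -c '%G' "$f")
  if [ "$mode" != "640" ]; then
    echo "bad mode: $mode (want 640)"
    return 1
  fi
  if [ "$owner" != "root" ] || [ "$group" != "elasticsearch" ]; then
    echo "bad ownership: $owner:$group (want root:elasticsearch)"
    return 1
  fi
  echo "ok"
}

# Example (path is the instance from this page):
# check_keystore_perms /etc/elasticsearch/production-search-psi-codfw/elasticsearch.keystore
```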

The keystore file has no validation

"...the keystore has no validation to block unsupported settings. Adding unsupported settings to the keystore will cause Elasticsearch to fail to start." More at Elastic's website

The keystore file must be identical across all cluster nodes

Since we don't use shared storage and the keystore file isn't a simple flat file, we do some interesting stuff with puppet to make this work.

Path-style and bucket-style access

We use the thanos-swift cluster as our object store, via its S3-compatible API. "Real" S3 supports bucket-based (virtual-hosted) access, which relies on per-bucket DNS records. We don't have this, so we must use path-style access. Unfortunately, Elastic added, removed, then re-added support for this feature. As of this writing, we are on Elasticsearch 6.8, which does not support path-style access. As a result, we use our own custom version of the S3 plugin, installed via our elastic-plugins deb package.
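For illustration, the two addressing styles differ only in where the bucket name appears in the URL (bucket and endpoint names below are the ones used elsewhere on this page):

```shell
endpoint="thanos-swift.discovery.wmnet"
bucket="elasticsearch-snapshot"

# Virtual-hosted ("bucket-style") access: needs a DNS record per bucket.
echo "https://${bucket}.${endpoint}/some/object"

# Path-style access: only the endpoint's own DNS record is required.
echo "https://${endpoint}/${bucket}/some/object"
```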

Overloading LVS when creating a snapshot

When creating a snapshot, particularly from the 30+ node production clusters, we can end up overloading the LVS instances that load balance requests going into thanos-swift. We've been notified by network ops previously when creating a 32 shard snapshot with the default max_snapshot_bytes_per_sec of 40mb. Reducing it to 20mb allowed the snapshot to complete without setting off anyone's alarms. This may not be necessary when snapshotting from a small cluster or an index with only a couple of shards.
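If a repository is already registered with a throttle that turns out to be too high, re-issuing the registration PUT with lower values updates the repository in place. A sketch, reusing the repository name and settings from the registration example on this page:

```shell
# Re-register the "elastic_snaps" repository with a lower snapshot throttle.
# Settings not listed here revert to their defaults, so repeat the ones you
# still need (bucket, client, endpoint, path_style_access).
curl -H 'Content-Type: application/json' -XPUT \
    http://127.0.0.1:9200/_snapshot/elastic_snaps -d '
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-snapshot",
    "client": "default",
    "endpoint": "https://thanos-swift.discovery.wmnet",
    "path_style_access": "true",
    "max_snapshot_bytes_per_sec": "20mb"
  }
}'
```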

Premature end of Content-Length delimited message body

When restoring a snapshot we've seen failures where it appears that data being loaded from thanos-swift into Elasticsearch doesn't make it all the way there. Elasticsearch will make multiple attempts to restore the shard, and each time it will receive a different amount of data that doesn't match the expected index size. We don't know exactly which values made it work, but a successful snapshot/restore of a 32 shard, 1+TB index was achieved by setting these values quite low: a chunk_size of 50mb, with max_restore_bytes_per_sec pulled down to 10mb in a 6 node cluster. Note that when using a chunk_size of less than 100mb, buffer_size has to be reduced to match. Workable values likely lie somewhere between these and the defaults, but we haven't experimented yet.

index.routing.allocation.total_shards_per_node

When moving an index between clusters, it will by default load into the new cluster with the total_shards_per_node value that was appropriate for the source cluster but may not be appropriate for the target cluster. If a restore is attempted into a smaller cluster than the source without changing this value, the restore will eventually fail with an allocation_explanation (from /_cluster/allocation/explain) of cannot allocate because allocation is not permitted to any of the nodes. This value needs to be overridden in the index_settings section of the snapshot restore API call.
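To diagnose such a failure, the allocation explain API can be queried for a specific shard. A sketch (the index name is the example used elsewhere on this page; the shard number is arbitrary):

```shell
# Ask the cluster why a specific primary shard cannot be allocated.
curl -H 'Content-Type: application/json' -XGET \
    http://127.0.0.1:9200/_cluster/allocation/explain -d '
{
  "index": "commonswiki_file_1647921177",
  "shard": 0,
  "primary": true
}'
```

Calling the endpoint with no body explains the first unassigned shard it finds, which is often enough.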


API Calls

Register S3 repository

The below API call registers a snapshot repository with the arbitrary name "elastic_snaps". The exact values of max_(snapshot|restore)_bytes_per_sec depend on circumstances. Snapshotting a 32 shard index needs a lower setting than a 1 shard index. Restoring a 32 shard index to a 35 node cluster needs different settings than restoring the same index to a 6 node cluster. The values below are conservative and hopefully work everywhere, but may take longer than strictly necessary to complete an operation.

More details at Elastic.co.

curl -H 'Content-Type: application/json' -XPUT  http://127.0.0.1:9200/_snapshot/elastic_snaps -d '
{
  "type": "s3",
  "settings": {
   "bucket": "elasticsearch-snapshot",
   "client": "default",
   "endpoint": "https://thanos-swift.discovery.wmnet",
   "path_style_access": "true",
   "chunk_size": "50mb",
   "buffer_size": "50mb",
   "max_snapshot_bytes_per_sec": "20mb",
   "max_restore_bytes_per_sec": "10mb"
  }
}'

Create snapshot

The following API call creates a snapshot, using the S3 repository registered in the above API call.

curl -X PUT "https://search.svc.codfw.wmnet:9243/_snapshot/elastic_snaps/snapshot_t309648?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "commonswiki_file",
  "include_global_state": false,
  "metadata": {
    "taken_by": "bking",
    "taken_because": "T309648"
  }
}
'

{
  "accepted" : true
}
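Snapshotting a large index takes a while, and the create call returns as soon as the snapshot is accepted. Progress can be checked with the snapshot status API; a sketch against the same repository and snapshot names as above:

```shell
# Detailed per-shard progress (stage, bytes done vs. total):
curl "https://search.svc.codfw.wmnet:9243/_snapshot/elastic_snaps/snapshot_t309648/_status?pretty"

# Lighter-weight summary; "state" moves from IN_PROGRESS to SUCCESS (or FAILED):
curl "https://search.svc.codfw.wmnet:9243/_snapshot/elastic_snaps/snapshot_t309648?pretty"
```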

Restore Snapshot

To restore the snapshot we should override a few index settings. The settings below may or may not be appropriate for your use case; review as necessary. In the specific example below, number_of_replicas was set to 0 with the intent of expanding it post-restore using normal index recovery mechanisms. total_shards_per_node was set to a large value to allow a 32 shard index to be restored into a 6 node cluster.

curl -H 'Content-Type: application/json' -XPOST  \
    https://cloudelastic.wikimedia.org:9243/_snapshot/elastic_snaps/snapshot_t309648/_restore -d '{
        "indices": "commonswiki_file_1647921177",
        "include_global_state": false,
        "index_settings": {
            "index.number_of_replicas": 0,
            "index.auto_expand_replicas": null,
            "index.routing.allocation.total_shards_per_node": 8
        }
    }'

Post Restore

If number_of_replicas was set to 0 during restore, it's critical that we bring it back to the expected value post-restore. Additionally, in this specific example the refresh_interval was increased to the default expected in that specific cluster.

curl -XPUT https://cloudelastic.wikimedia.org:9243/commonswiki_file/_settings -H 'Content-Type: application/json' -d '{
    "index.number_of_replicas": 1,
    "index.refresh_interval": "5m"
}'
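To confirm the settings took effect and to watch the replica copies being rebuilt, something like the following should work (same cluster and index as above):

```shell
# Verify the index settings after the update:
curl "https://cloudelastic.wikimedia.org:9243/commonswiki_file/_settings?pretty"

# Per-shard recovery progress; replica expansion shows up here while in flight:
curl "https://cloudelastic.wikimedia.org:9243/_cat/recovery/commonswiki_file?v&active_only=true"
```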