Search/S3 Plugin Enable

Revision as of 22:14, 1 July 2022 by imported>Bking

Brian's notes for deploying the S3 plugin. See T309720 for more details, and this page for more about the test environment.

Why Enable the S3 Plugin?

Currently, there is no easy way to restore data from production to cloudelastic. The easiest option is to move the data through thanos-swift. Elasticsearch supports the S3 API better than it supports the Swift API, so we talk to thanos-swift over S3. This is (sadly) pretty common, as the Search platform team has already seen with Flink.

Complicating factors

We have an unorthodox Elastic environment: specifically, we run 2 or 3 Elasticsearch instances on a single host. As a result, using the elasticsearch-keystore requires special care.

Getting the elastic keystore path right

By default, elasticsearch-keystore invokes java with the wrong es.path.conf. We can override by setting ES_PATH_CONF when invoking elasticsearch-keystore:

export ES_PATH_CONF=/etc/elasticsearch/production-search-psi-codfw; /usr/share/elasticsearch/bin/elasticsearch-keystore add s3.client.default.access_key
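Because several instances share one host, it can be useful to confirm which settings ended up in which keystore. The following is a minimal sketch, not the team's tooling: it assumes every instance has its own directory directly under /etc/elasticsearch/, and list_keystores, KEYSTORE_BIN and CONF_ROOT are hypothetical names introduced for illustration. It relies only on the list subcommand of elasticsearch-keystore.

```shell
#!/usr/bin/env bash
# Sketch: list keystore entries for every Elasticsearch instance on this host.
# Assumption: each subdirectory of CONF_ROOT is one instance's config directory.
KEYSTORE_BIN="${KEYSTORE_BIN:-/usr/share/elasticsearch/bin/elasticsearch-keystore}"
CONF_ROOT="${CONF_ROOT:-/etc/elasticsearch}"

list_keystores() {
  local conf
  for conf in "$CONF_ROOT"/*/; do
    # Skip the unexpanded pattern when no subdirectory exists.
    [ -d "$conf" ] || continue
    conf="${conf%/}"
    echo "== $conf"
    # Set ES_PATH_CONF for this single invocation only.
    ES_PATH_CONF="$conf" "$KEYSTORE_BIN" list
  done
}

# Usage (run as a user allowed to read the keystores):
#   list_keystores
```

Setting ES_PATH_CONF per invocation, rather than exporting it, avoids leaving the variable pointing at the wrong instance for later commands in the same shell.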

Permissions are also very important! The keystore file must be owned by root:elasticsearch with mode 0640. If the elasticsearch service fails to start after a keystore change, check the paths and permissions. A brand-new elasticsearch-keystore file in /etc/elasticsearch/ means the ES_PATH_CONF environment variable was not respected. If the elasticsearch-keystore file is owned by root:root, the service will not start.
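The ownership and mode check can be scripted. This is a rough sketch assuming GNU stat and assuming the keystore file is named elasticsearch.keystore (the Elasticsearch default) inside the instance's config directory; keystore_perms and check_keystore are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: compare a keystore file's owner, group and mode with the
# expected root:elasticsearch and 640.

keystore_perms() {
  # GNU stat format: %U = owner name, %G = group name, %a = octal mode
  stat -c '%U:%G %a' "$1"
}

check_keystore() {
  local file="$1"
  local expected="root:elasticsearch 640"
  local actual
  # Return 2 when the file cannot be inspected (for example, wrong path).
  actual="$(keystore_perms "$file")" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file is $actual"
  else
    echo "MISMATCH: $file is $actual, expected $expected"
    return 1
  fi
}

# Usage (the file name is an assumption, see above):
#   check_keystore /etc/elasticsearch/production-search-psi-codfw/elasticsearch.keystore
```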

The keystore file has no validation

"...the keystore has no validation to block unsupported settings. Adding unsupported settings to the keystore will cause Elasticsearch to fail to start." More at Elastic's website

The keystore file must be identical across all cluster nodes

Since we don't use shared storage and the keystore file isn't a simple flat file, we do some interesting stuff with puppet to make this work.

Path-style and bucket-style access

We use the thanos-swift cluster as our object store, via its S3-compatible API. "Real" S3 supports bucket-style access, which relies on DNS records. We don't have those records, so we must use path-style access. Unfortunately, Elastic added, removed, then re-added support for this feature. As of this writing, we are on Elasticsearch 6.8, which does not support path-style access. As a result, we use our own custom version of the S3 plugin, installed via our elastic-plugins deb package.
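To see whether an S3 repository plugin is loaded on each node, the _cat/plugins API can be queried. A minimal sketch, assuming the custom build still registers under the upstream plugin name repository-s3 (this is not confirmed here); s3_plugin_nodes and ES_URL are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: print the nodes that report an S3 repository plugin.
# Assumption: the custom plugin keeps the upstream name "repository-s3".
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

s3_plugin_nodes() {
  # Columns requested: node name, plugin component, plugin version.
  # grep exits non-zero when no node reports the plugin.
  curl -s "$ES_URL/_cat/plugins?h=name,component,version" | grep 'repository-s3'
}

# Usage:
#   s3_plugin_nodes || echo "no node reports repository-s3"
```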

API Calls

Register S3 repository

The below API call registers a snapshot repository with the arbitrary name "elastic_snaps".

More details at Elastic.co.

curl -H 'Content-Type: application/json' -XPUT http://127.0.0.1:9200/_snapshot/elastic_snaps -d '
{
  "type": "s3",
  "settings": {
    "bucket": "elasticsearch-snapshot",
    "client": "default",
    "endpoint": "https://thanos-swift.discovery.wmnet",
    "path_style_access": "true"
  }
}
'
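Once the repository is registered, it may be worth confirming that the nodes can actually reach the bucket. The sketch below uses the standard snapshot repository endpoints (GET on the repository and POST on its _verify path); verify_repo, show_repo and ES_URL are hypothetical names, and the default repository name is the elastic_snaps used above.

```shell
#!/usr/bin/env bash
# Sketch: inspect and verify a registered snapshot repository.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

show_repo() {
  # Print the stored repository definition (type and settings).
  local repo="${1:-elastic_snaps}"
  curl -s "$ES_URL/_snapshot/$repo?pretty"
}

verify_repo() {
  # Ask the cluster to check that its nodes can access the repository.
  local repo="${1:-elastic_snaps}"
  curl -s -XPOST "$ES_URL/_snapshot/$repo/_verify?pretty"
}

# Usage:
#   show_repo
#   verify_repo elastic_snaps
```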

Create snapshot

The following API call creates a snapshot, using the S3 repository registered in the above API call.

curl -X PUT "localhost:9200/_snapshot/elastic_snaps/snapshot_t309648_attempt_2?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "commonswiki_file",
  "include_global_state": false,
  "metadata": {
    "taken_by": "bking",
    "taken_because": "T309648"
  }
}
'

{
  "accepted" : true
}
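The "accepted" response only means the snapshot has started. A hedged sketch of two possible follow-ups: checking the snapshot's state, and restoring it on the destination cluster. It assumes the same repository is registered on that cluster; snapshot_state, restore_snapshot and ES_URL are hypothetical names, and the restore body uses only the standard indices and include_global_state options. Note that Elasticsearch will not restore over an existing open index.

```shell
#!/usr/bin/env bash
# Sketch: check a snapshot's state, then restore one index from it.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

snapshot_state() {
  # $1 = repository, $2 = snapshot name.
  # The response includes a "state" field such as IN_PROGRESS or SUCCESS.
  curl -s "$ES_URL/_snapshot/$1/$2?pretty"
}

restore_snapshot() {
  # $1 = repository, $2 = snapshot name, $3 = index to restore.
  local body
  body="$(printf '{"indices": "%s", "include_global_state": false}' "$3")"
  curl -s -XPOST "$ES_URL/_snapshot/$1/$2/_restore?pretty" \
    -H 'Content-Type: application/json' -d "$body"
}

# Usage:
#   snapshot_state elastic_snaps snapshot_t309648_attempt_2
#   restore_snapshot elastic_snaps snapshot_t309648_attempt_2 commonswiki_file
```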