
MediaWiki and EtcdConfig

Revision as of 02:43, 13 March 2018 by imported>Krinkle (+Category:MediaWiki production)

This page is about use of EtcdConfig at Wikimedia Foundation. See EtcdConfig on doc/mediawiki about the general functionality.

MediaWiki supports fetching configuration data from Etcd. This page describes how that functionality is used at the Wikimedia Foundation.

How data is organized

As with the Etcd values Wikimedia uses for other applications, we manage configuration values for MediaWiki via Conftool. The objects that MediaWiki expects to read from Etcd follow a dedicated schema, structured as follows:

{
    val: <VARIABLE>
}

Here, VARIABLE is any JSON structure, which translates to the corresponding PHP structure. As the conftool schema shows, any value is allowed as long as it is valid JSON. Still, we want edits to be safe, because any change in Etcd spreads rapidly to all app servers in the cluster. For that reason we can define json-schema based validators for each record we add via conftool, so that invalid edits are rejected before they reach MediaWiki.
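The idea of rejecting an invalid edit before it reaches MediaWiki can be sketched in a few lines of Python. This is only an illustration, not conftool's actual validation code: the function names are hypothetical, the rule that a record holds exactly one key named val is an assumption based on the structure shown above, and the check for a ReadOnly value (false, or a non-empty string giving the reason) is an assumed example rule rather than the real schema.

```python
import json


def validate_record(raw, check_val):
    """Parse a raw JSON record and validate its 'val' member.

    Returns the decoded value, or raises ValueError so that the
    edit can be rejected. Hypothetical helper, not conftool code.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    # Assumption: a record is an object with exactly one key, 'val'.
    if not isinstance(record, dict) or set(record) != {"val"}:
        raise ValueError("record must be an object with exactly one key, 'val'")
    if not check_val(record["val"]):
        raise ValueError("'val' failed the per-record check")
    return record["val"]


def is_read_only_value(val):
    # Assumed example rule: false (writable) or a non-empty string (the reason).
    return val is False or (isinstance(val, str) and val != "")
```

With these definitions, validate_record('{"val": false}', is_read_only_value) returns the decoded value, while a record such as {"val": 42} raises ValueError and would never be written.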

As usual, conftool organizes objects in file-like paths according to tags. The mwconfig object type has only one tag: the scope of the variable. At the time of writing, the only scopes are the common scope and one scope per datacenter. The tree of records therefore currently has the form <basedir>/mediawiki-config/<scope>/<name>, for example:

common/wmfMasterDatacenter
eqiad/ReadOnly
codfw/ReadOnly
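As a small illustration of that layout, the following Python sketch builds the full key for a record from its scope and name. The function name is hypothetical, and the base directory used in the usage note is a placeholder, since this page does not state the real value of <basedir>.

```python
def record_key(basedir, scope, name):
    """Build <basedir>/mediawiki-config/<scope>/<name> from its parts.

    Hypothetical helper for illustration; not part of conftool.
    """
    for part in (scope, name):
        # A slash inside a component would silently change the tree depth.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{basedir.rstrip('/')}/mediawiki-config/{scope}/{name}"
```

For instance, with a placeholder base directory of /example-base, record_key("/example-base", "eqiad", "ReadOnly") gives /example-base/mediawiki-config/eqiad/ReadOnly.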

Get the values MediaWiki gets from etcd

With conftool, it is easy to see all the values that MediaWiki fetches:

$ confctl --object-type mwconfig select 'name=.*' get

To see which of these variables are actually used by MediaWiki, check the relevant mediawiki-config file.

Edit an existing value

If you want to edit an existing value, you can use confctl to fetch it, edit it in your preferred editor, and resubmit it.

For example, to modify what ends up in $wgReadOnly in eqiad, you can run:

$ sudo -i confctl --object-type mwconfig select 'scope=eqiad,name=ReadOnly' edit

Your changes are expected to propagate fully to both MediaWiki clusters within about 15 seconds. Edit actions are also logged automatically to the SAL.

Add a new configuration variable

Adding a new variable and consuming it from MediaWiki is a rather cumbersome process, and that's a good thing! We don't want too much data to be stored in etcd, nor too many variables. When you think about adding a variable to etcd, always ask yourself whether it represents state or configuration. For example, the pooled/depooled status of a database is state, while switching VE on or off on a wiki is configuration. Those two examples are extreme and clear-cut, but you'll find the distinction is not always that clear. When in doubt, avoid moving configs to etcd!

If you do have a variable you want to add, this is a three-step process.

  1. Define a json schema for your variable or group of variables, and add it to the json-schemas in operations/puppet, and also add a rule to the main conftool schema file for matching tags and names that correspond to your validation.
  2. Add an entry in conftool-data for your new object; during puppet-merge, an empty object will be created.
  3. Add an entry to fetch and use the variable in mediawiki-config. Please avoid instantiating or requiring etcd.php multiple times, as we want to fetch data from etcd once and only once.

Operational guarantees and failure scenarios

Flowchart for the loading of data from etcd within MediaWiki

Whenever a request is received, MediaWiki tries to fetch the config data from the local cache (APC on fastcgi, a local hash on cli); if the data is missing or stale, it tries to fetch it from etcd. A locking mechanism guarantees that at most one thread per application server requests the data. Once fetched, the data is cached for 10 seconds, so EtcdConfig should generate about 6 reads per appserver per minute, a small volume that should not become an issue for the etcd clusters. MediaWiki picks one of the servers listed in a SRV record at random, and connects to the next one if the first is not available.

If no server is available, the cached data is used, even if stale. An appserver therefore continues to work as expected for as long as it is not restarted, even if the etcd cluster fails completely. During a failure, at most one of the concurrent requests will still try to fetch the configuration, so the overall slowdown for users is limited.
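The loading behaviour described in the two paragraphs above can be sketched in Python. This is a simplified model under stated assumptions, not the code of EtcdConfig::load: the class and parameter names are hypothetical, the server list is passed in directly instead of being resolved from a SRV record, the fetch callable stands in for the real HTTP request, and an in-process lock stands in for the per-appserver lock.

```python
import random
import threading
import time


class CachedConfig:
    """TTL cache that falls back to stale data when every server fails."""

    def __init__(self, servers, fetch, ttl=10.0, clock=time.monotonic):
        self.servers = list(servers)
        self.fetch = fetch  # fetch(server) -> dict; raises OSError on failure
        self.ttl = ttl
        self.clock = clock
        self._lock = threading.Lock()
        self._data = None
        self._expires = 0.0

    def _fetch_any(self):
        # Try the servers in random order, moving on when one is unavailable.
        for server in random.sample(self.servers, len(self.servers)):
            try:
                return self.fetch(server)
            except OSError:
                continue
        return None

    def load(self):
        if self._data is not None and self.clock() < self._expires:
            return self._data  # fresh cache hit
        # With a stale copy at hand, do not wait: only the lock holder refreshes.
        # With nothing cached, wait for the lock, as there is nothing to serve.
        if self._lock.acquire(blocking=self._data is None):
            try:
                if self._data is None or self.clock() >= self._expires:
                    fresh = self._fetch_any()
                    if fresh is not None:
                        self._data = fresh
                        self._expires = self.clock() + self.ttl
            finally:
                self._lock.release()
        if self._data is None:
            raise RuntimeError("no cached config and no server reachable")
        return self._data  # possibly stale
```

With a TTL of 10 seconds, a busy appserver refreshes at most once per 10-second window, which is where the figure of roughly 60 / 10 = 6 reads per minute comes from. Serving the stale copy when every server fails is the availability-over-consistency trade-off in miniature.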

For the full details of how this works, refer to the flowchart linked here, or read the code for EtcdConfig::load. As the description above shows, the implementation favours availability over consistency, since it allows stale reads.

Alerting

While this behaviour avoids the worst failure scenarios, an appserver can still end up out of sync with the etcd servers. In that case, an icinga alert will fire for "MediaWiki EtcdConfig up-to-date". The general response is to check the server first, then the etcd cluster, and finally to restart the Fcgi server.