RESTBase: Difference between revisions
imported>Hnowlan (→Administration: Add final details about pooling servers, and cert management) |
imported>Hnowlan (Description of alert being created) |
Revision as of 16:12, 17 August 2021
This page is currently a draft. More information and discussion about changes to this draft can be found on the talk page.
FIXME: This document needs expansion
RESTBase is an API proxy serving the REST API at /api/rest_v1/. It uses Cassandra as a storage backend.
It is currently running on hosts with the profile::restbase class.
Deployment and config changes
RESTBase is deployed by Scap.
What to check after a deploy
Deploys do not always go according to plan, and regressions are not always obvious. Here is a list of things you should check after each deploy:
- Does the API documentation still load? Consider exercising some of the endpoints from the UI (perhaps by requesting an html render).
- Check error logs in logstash.
- Have a look at the metrics in Grafana. Have latencies increased, or error rates jumped? Is memory utilization consistent with expectations? What about storage (op rates, exceptions, etc)?
- Consider making an edit to a page using Visual Editor.
- Take a look at some recent Visual Editor-performed changes (French Wikipedia works great for this, as they use VE by default). Do the diffs look reasonable?
- Keep a close eye on #wikimedia-operations; if someone spots a problem, they're likely to raise the issue there.
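The first check in the list above can be scripted as a rough smoke test. The sketch below is only an illustration: it assumes curl is available and that the documentation is served with HTTP 200 at the /api/rest_v1/ root of a public wiki domain (en.wikipedia.org is an arbitrary default). The function name check_rest_docs is hypothetical.

```shell
# Hypothetical helper: request the REST API documentation root and report
# whether it answered with HTTP 200. The default domain is illustrative.
check_rest_docs() {
  domain="${1:-en.wikipedia.org}"
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code
  status="$(curl -s -o /dev/null -w '%{http_code}' "https://${domain}/api/rest_v1/")"
  if [ "$status" = "200" ]; then
    echo "OK: ${domain} returned ${status}"
  else
    echo "FAIL: ${domain} returned ${status}"
    return 1
  fi
}
```

Calling check_rest_docs with no argument uses the default domain. A passing result only shows that the documentation root responds; it does not replace the log, metrics and Visual Editor checks.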
Other considerations
Be sure to log all actions ahead of time in #wikimedia-operations. Don't be shy about including details.
Administration
Adding a new RESTBase host
Provisioning
Once a server is in the right state to be deployed to, first ensure that the correct DNS setup has been completed. RESTBase hosts have three additional addresses added as aliases for the system's main interface. These are usually created along with the host's base DNS record (restbaseNNNN) during setup via Netbox using the checkbox for Restbase hosts, but you should make sure that restbaseNNNN-a, restbaseNNNN-b and restbaseNNNN-c have all been created in DNS. Ideally these addresses are sequential from the base address but this is not required. These addresses should be specified in DNS and Netbox.
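A quick way to confirm that the three aliases exist is to try resolving each one. The sketch below is a minimal example, assuming the aliases live in the same domain as the host itself and that getent is available on the machine you run it from. The function names are hypothetical.

```shell
# Print the three instance aliases expected for a base host name.
instance_aliases() {
  for suffix in a b c; do
    echo "$1-${suffix}"
  done
}

# Report every alias that does not resolve. getent consults the system
# resolver, so run this from a host that can see the internal DNS.
# Arguments: base host name, domain (assumed to match the host's domain).
check_instance_dns() {
  missing=0
  for name in $(instance_aliases "$1"); do
    if ! getent hosts "${name}.$2" >/dev/null; then
      echo "missing DNS record: ${name}.$2"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_instance_dns restbaseNNNN DC.wmnet prints one line per missing alias and returns non-zero if any are missing.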
Once the host is configured correctly:
- Add host definitions in hiera - look at the existing hosts in the chosen datacentre to determine which rack the hosts should go into. Generally distribution between racks should be as even in number as possible. The jbod_devices parameter is dependent on the disk layout of the hosts in question. If you're not changing the layout elsewhere, it's generally okay to reuse the layout of previous hosts - verify this on the new host before reprovisioning it.
  - Add the host definitions to the restbase hieradata
  - Add the hosts to the datacentre hieradata - these are used only for configuring the rate limiting service and its firewall in puppet and in Scap3.
- Add certificates for the host on the main puppetmaster's /srv/private repo. Search the commit history for previous restbase host additions for examples. Generate the new certificates from the cassandra certs directory by running cassandra-ca-manager restbase.yaml.
- Change the hosts' roles - by default hosts will be using insetup.
Example links for historical changes shown for context.
Generally it's a good idea to do the above one host at a time. Adding new RESTBase hosts to the Cassandra cluster can take a long time and so it's best to just proceed slowly one by one if you have many hosts to add at once rather than have to worry about juggling downtimes.
Cassandra setup
Once the host is fully provisioned, it can be added instance by instance (restbaseNNNN-a, then restbaseNNNN-b, then restbaseNNNN-c) to the Cassandra cluster. Topology changes are costly events in Cassandra, so only ever add one node to the cluster at a time. To start bootstrapping the "a" instance, run sudo touch /etc/cassandra-a/service-enabled and then sudo run-puppet-agent. This will start the respective Cassandra instance, and in time it will start bootstrapping itself. To monitor the progress of node a's updates, run cassandra-streams -nt nodetool-a from the bootstrapping host or c-any-nt status -r | grep restbaseNNNN from another host in the cluster. An instance can be considered fully bootstrapped when c-any-nt status -r | grep restbaseNNNN shows the node in status "UN". Instances still in the process of joining will show status "UJ".
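The status check described above can be wrapped in a small polling loop rather than re-run by hand. This is a sketch under assumptions: it expects the status output to carry the two-letter state in the first column and the resolved host name in the second, as nodetool status normally prints them. The function names and the five-minute default interval are hypothetical.

```shell
# Read `c-any-nt status -r` output on stdin and print the two-letter state
# (for example UN or UJ) of the named instance. Assumes column 1 is the
# state and column 2 the host name, with or without a domain suffix.
instance_state() {
  awk -v inst="$1" '$2 == inst || index($2, inst ".") == 1 { print $1 }'
}

# Poll until the instance reports UN. Arguments: instance name, optional
# interval in seconds (the 300 second default is arbitrary).
wait_for_instance() {
  while [ "$(c-any-nt status -r | instance_state "$1")" != "UN" ]; do
    sleep "${2:-300}"
  done
  echo "$1 is UN"
}
```

Run wait_for_instance restbaseNNNN-a from another host in the cluster, and start the next instance only once it returns.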
The process of adding a single node can take a long time - even beginning the bootstrap process can take upwards of an hour, and the process itself can take around 5-6 hours. For this reason, it is imperative that you manage your downtimes appropriately to prevent disruption (in short, use the sre.hosts.downtime cookbook).
When waiting for the bootstrapping process to start, it is perfectly normal to see the message "Migration task failed to complete" in the system logs for the instance in question. This can be ignored.
Service deployment and pooling
- Add hosts to the deployment list in the Restbase deploy repo (ssh://gerrit.wikimedia.org:29418/mediawiki/services/restbase/deploy)
- If there have been changes to the restbase service since you applied the correct roles to the host (the latest deployed version should be pulled via Puppet during the first puppet runs), deploy restbase to the hosts: from deployment.eqiad.wmnet, cd /srv/deployment/restbase/deploy/, git pull and then scap deploy -f -l restbaseNNNN.DC.wmnet "First deploy to restbaseNNNN"
- Add the hosts to conftool-data
- If the hosts are healthy in Icinga at this point and if you feel it is safe as regards deployment timing and so on, pool the hosts:
confctl select "dc=DC,cluster=restbase,service=restbase,name=restbaseNNNN.DC.wmnet" set/pooled=yes
confctl select "dc=DC,cluster=restbase,service=restbase-ssl,name=restbaseNNNN.DC.wmnet" set/pooled=yes
confctl select "dc=DC,cluster=restbase,service=restbase-backend,name=restbaseNNNN.DC.wmnet" set/pooled=yes
- Verify that the hosts have been added and are healthy via the pybal API
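The three confctl commands in the pooling step differ only in the service name, so they can be issued from one loop. The sketch below reuses exactly the selector and action shown above. The function name pool_restbase_host is hypothetical.

```shell
# Pool one host for each of the three services named in the pooling step.
# Arguments: datacentre, fully qualified host name.
pool_restbase_host() {
  for service in restbase restbase-ssl restbase-backend; do
    confctl select "dc=$1,cluster=restbase,service=${service},name=$2" set/pooled=yes || return 1
  done
}
```

For example, pool_restbase_host DC restbaseNNNN.DC.wmnet. The loop stops at the first failing confctl call so that a partial pooling is noticed rather than silently continued.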
Renewing expired certificates
Every now and again Cassandra certificates will come close to expiry (for example: SSL WARNING - Certificate restbase2016-a valid until 2020-11-29 09:26:14 +0000 (expires in 53 days)). Certificates need to be deleted and recreated in the Puppet secrets directory; see the Cassandra documentation for details.
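To see how much time is left before a certificate named in such a warning expires, the "valid until" timestamp can be compared against the current time. This is a minimal sketch that assumes GNU date (for the -d option). The function name days_between is hypothetical.

```shell
# Print the whole number of days between two timestamps.
# Relies on GNU date's -d option to parse the timestamps.
days_between() {
  start="$(date -u -d "$1" +%s)"
  end="$(date -u -d "$2" +%s)"
  echo $(( (end - start) / 86400 ))
}
```

For example, days_between now "2020-11-29 09:26:14 +0000" prints the days remaining for the certificate in the warning above (a negative number once it has expired).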
Monitoring
instance-data
In production, the instance-data path is usually a RAID array. It is used for hints, commitlogs and caches - all vital to the stable operation of the Cassandra instances. Under unusual circumstances (a large rebalancing, an instance behaving erroneously, etc.) this mount can fill up quickly and space will sometimes be required to back out of this condition. For this reason, we set a lower threshold for disk free on this path than for other disks.
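When investigating an alert on this path, a manual check along the following lines can help. It is a sketch only: the mount path and the percentage limit are passed in as arguments because the real values are not stated here, and the function names are hypothetical. It reads the POSIX df -P output format, where the fifth column is the used capacity.

```shell
# Read `df -P <path>` output on stdin and print the used percentage as a
# bare number. In POSIX df output the fifth column looks like "42%".
used_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage of a mount exceeds a limit.
# Arguments: mount path, maximum used percentage (both supplied by you).
check_mount() {
  used="$(df -P "$1" | used_percent)"
  if [ "$used" -gt "$2" ]; then
    echo "WARNING: $1 is ${used}% full (limit $2%)"
    return 1
  fi
  echo "OK: $1 is ${used}% full"
}
```

Call it as check_mount PATH LIMIT, with the instance-data mount point of the host in question and whatever limit you want to compare against.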
Debugging
To temporarily switch to local logging for debugging, you can change the config.yaml log stanza like this:
logging:
  name: restbase
  streams:
  # level can be trace, debug, info, warn, error
  - level: info
    path: /tmp/debug.log
Alternatively, you can log to stdout by commenting out the streams sub-object. This is useful for debugging startup failures like this:
cd /srv/deployment/restbase/deploy/
sudo -u restbase node restbase/server.js -c /etc/restbase/config.yaml -n 0
The -n 0 parameter avoids forking off any workers, which reduces log noise. Instead, a single worker is started up right in the master process.
Analytics and metrics
Hive query for Action API and REST API traffic:
use wmf;
SELECT
SUM(IF (uri_path LIKE '/api/rest_v1/%', 1, 0)) as count_rest,
SUM(IF (uri_path LIKE '/w/api.php%', 1, 0)) as count_action
FROM wmf.webrequest
WHERE webrequest_source = 'text'
AND year = 2017
AND month = 9
AND (uri_path LIKE '/api/rest_v1/%' OR uri_path LIKE '/w/api.php%');