Application servers: Difference between revisions
imported>Muehlenhoff (→Restarting: Remove salt references) |
imported>Quiddity m ((syntaxhighlight) lang="bash") |
Revision as of 20:27, 11 October 2017
The Apache configs are maintained in a Git repository at operations/puppet.git:/modules/mediawiki/files/apache/sites/. Before July 11th 2012 these were in Subversion.
Testing config
- Submit change to Gerrit in the modules/mediawiki/files/apache/sites directory (project: operations/puppet)
- Disable puppet on one of the mediawiki test servers (assuming you choose mwdebug1001): sudo puppet agent --disable 'insert reason'
- Apply change locally under /etc/apache2/sites-enabled/
- On mwdebug1001: sudo apache2ctl restart
- Test your change by making a relevant HTTP request. Use the debug header to make the request go to mwdebug1001. See Debugging in production for details.
- When you're done: sudo puppet agent --enable
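The order of the steps above matters, in particular re-enabling puppet at the end. A minimal dry-run sketch of the sequence follows; it only prints the commands and runs nothing. The `show` and `test_cycle` helpers and the `HOST`/`REASON` variables are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Dry-run sketch of the test cycle on one mwdebug host.
# Nothing is executed: every command is only printed, so the sequence
# can be reviewed before it is typed on the real host.

HOST="${HOST:-mwdebug1001}"          # test server chosen in the steps above
REASON="${REASON:-insert reason}"    # shown to others who find puppet disabled

# Print a command instead of running it (hypothetical helper).
show() {
    printf '%s$ %s\n' "$HOST" "$*"
}

test_cycle() {
    show "sudo puppet agent --disable '$REASON'"
    show "# edit the relevant file under /etc/apache2/sites-enabled/"
    show "sudo apache2ctl restart"
    show "# send test requests routed to $HOST, then:"
    show "sudo puppet agent --enable"
}

test_cycle
```

Keeping the disable and enable steps in one function makes it harder to leave a test host with puppet disabled by accident.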
Deploying config
Consider listing any configuration update on the Deployments page before rolling it out. A bad configuration going live can easily result in a site outage.
- Test your change in deployment-prep and make sure that it works as expected.
- Submit change to gerrit in the modules/mediawiki/files/apache/sites directory (project: operations/puppet)
- Disable puppet across the affected mediawiki application servers.
- Cumin can help in finding the precise set of hosts. For example, this is a recent query:
cumin 'R:File = "/etc/apache2/sites-available/04-remnant.conf"' 'disable-puppet "elukey - precaution for https://gerrit.wikimedia.org/r/#/c/380774/"' -b 10
In this case the change was related to a RewriteRule change in 04-remnant.conf; adapt the query each time to the file(s) modified by the Gerrit change.
- Merge via gerrit, then run the usual puppet-merge on puppetmaster1001
- Create a plain text file with some significant URLs that should be modified by the Gerrit change. An example is the de-facto testing authority /home/oblivian/baseurls on tin. This text file will be used on tin with apache-fast-test later on to verify that the change works as expected.
- For example, if you are adding or modifying a new RewriteRule, please add to your text file some URLs that are expected to change.
- Go to one of the mwdebug servers and enable/run puppet. Apache will reload its configuration automatically; check that no error messages are emitted. Running apachectl -t after the puppet run helps verify that the new configuration is syntactically correct (this does not imply that it will work as intended).
- Some Apache directive changes need a full restart to get applied, not a simple reload. These changes are very rare and they are clearly indicated in Apache's documentation, so please verify it beforehand. Simple RewriteRule changes require only an Apache reload.
- On tin run apache-fast-test against the selected mwdebug host using /home/oblivian/baseurls and your new test file. Both of them need to return a positive confirmation that everything looks good.
- Example of usage related to the previously mentioned change (https://gerrit.wikimedia.org/r/#/c/380774):
elukey@tin:~$ apache-fast-test /home/oblivian/baseurls mwdebug1002.eqiad.wmnet
testing 19 urls on 1 servers, totalling 19 requests
spawning threads..
http://elefante-a-pallini.ro.sa/
* 200 OK 929
http://wikimedia.org/research
* 301 Moved Permanently https://wikimedia.qualtrics.com/SE/?SID=SV_6R04ammTX8uoJFP
http://www.wikipedia.org/wiki/it:Francesco_Totti
* 302 Found http://it.wikipedia.org/wiki/Francesco_Totti
http://zero.wikipedia.org/
* 302 Found http://en.zero.wikipedia.org/wiki/Special:ZeroRatedMobileAccess
[.. cut ..]
* 301 Moved Permanently https://meta.wikimedia.org/wiki/Special:UrlShortener
elukey@tin:~$ apache-fast-test wikidata_redirect mwdebug1002.eqiad.wmnet
testing 1 urls on 1 servers, totalling 1 requests
spawning threads..
https://commons.wikimedia.org/data/main/Data:Bundestagswahl2017/wahlkreis46.map
* 301 Moved Permanently https://commons.wikimedia.org/wiki/Special:PageData/main/Data:Bundestagswahl2017/wahlkreis46.map
- Enable/Run puppet on another mediawiki application server that is taking traffic, de-pooling it beforehand via confctl. Verify again from tin that everything is working as expected, running apache-fast-test.
- Repool the host mentioned above and verify in the Apache access logs that everything looks fine. If you want to be extra paranoid, you can check the host level metrics via https://grafana.wikimedia.org/dashboard/db/prometheus-apache-hhvm-dc-stats and make sure that nothing is out of the ordinary.
- Re-enable puppet across the appservers previously disabled via cumin.
- Keep an eye on the operations channel and make sure that puppet runs fine on these hosts.
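Two inputs in the procedure above change with every Gerrit change: the cumin query, which depends on the modified Apache file, and the URL file passed to apache-fast-test. The sketch below builds both and prints the resulting commands without running cumin or apache-fast-test. The `cumin_query` and `write_url_file` helpers are hypothetical, and the one-URL-per-line file layout is an assumption, not something this page specifies.

```shell
#!/bin/sh
# Sketch: derive the cumin host query from the changed Apache file and
# prepare a URL list for apache-fast-test. Only prints commands and writes
# a local temp file; it does not call cumin or apache-fast-test itself.

# Build the cumin query selecting hosts that manage the given file.
cumin_query() {
    printf 'R:File = "%s"' "$1"
}

# Write one URL per line to a test file (assumed layout).
write_url_file() {
    out="$1"; shift
    printf '%s\n' "$@" > "$out"
}

CONF="/etc/apache2/sites-available/04-remnant.conf"
URLS="$(mktemp)"
write_url_file "$URLS" \
    "https://commons.wikimedia.org/data/main/Data:Bundestagswahl2017/wahlkreis46.map"

# Commands to review and then run by hand.
echo "cumin '$(cumin_query "$CONF")' 'disable-puppet \"<your name> - <reason>\"' -b 10"
echo "apache-fast-test $URLS mwdebug1002.eqiad.wmnet"
```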
Restarting
One, to test a change
SSH to the web server you want to test on, then restart Apache on that web server only. Test your change with curl, as in this foundation example:
curl -H 'Host: wikimediafoundation.org' "http://localhost/fundraising"
The raw HTML for the page will now be displayed in your window. You can copy and paste that into a file on your hard drive and open it with your browser to see the effect. The Host header is the site name that follows http:// in the browser's address bar, and /fundraising is the path that follows the site name. The example gets http://wikimediafoundation.org/fundraising.
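The mapping from a public URL to the local curl call can be sketched as a small helper. `url_to_curl` is a hypothetical name; it only prints the command, so it can be tried anywhere before being used on the web server.

```shell
#!/bin/sh
# Sketch: split a public URL into the Host header and the path, then print
# the curl command to run on the web server itself.

url_to_curl() {
    url="$1"
    rest="${url#*://}"               # drop the scheme
    host="${rest%%/*}"               # text before the first slash
    case "$rest" in
        */*) path="/${rest#*/}" ;;   # text after the first slash
        *)   path="/" ;;             # URL without a path
    esac
    printf "curl -H 'Host: %s' \"http://localhost%s\"\n" "$host" "$path"
}

url_to_curl "http://wikimediafoundation.org/fundraising"
```

For the foundation example this prints the same command as shown above.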
Logging
Apache errors are logged to /a/mw-log/apache2.log on fluorine.
Apache access logs are mostly disabled. Statistics are drawn from Varnish front ends instead.
Apache setup checklist
- Follow the Automated installation instructions for the base install
- Run the following on the server:
- apt-get update && apt-get dist-upgrade -y && apt-get install wikimedia-task-appserver && reboot && exit
- Wait for the server to come back online, ensure it starts apache correctly
- echo 'GET /' | nc localhost 80 or any of the number of tests listed below
- If the server is part of the memcached group, follow instructions on Memcached
- Run the setup of Ganglia
- If the server is new, you will need to do the following:
- Login to the LVS server for apaches (lvs3 as of 2009-02-13) and add the new servers to /etc/pybal/apaches
- If the server is not new do the following:
- Ensure the server is now enabled in pybal on the LVS server in the file /etc/pybal/apaches
- Add the server to the DSH groups if it is new; if it is not new, check whether its entries are commented out:
- Add/Uncomment the host to /usr/local/dsh/node_groups/apaches and mediawiki-installation, as well as any other groups needed
- Reload nagios to accept the changes to the node groups:
- cd /home/wikipedia/conf/nagios && ./sync
- Verify that the server is taking traffic and doing work
- ipvsadm -L | grep SERVERNAME
- traffic logs?
Test cases
Here are some test cases you can use to test the apache configuration after changing something.
GET /wiki/Foo HTTP/1.1
Host: en.wikipedia.org
User-agent: testthing

GET /wiki/Foo HTTP/1.1
Host: www.wikipedia.org
User-agent: testthing

GET /wiki/Main_Page HTTP/1.1
Host: www.wikipedia.com
User-agent: testthing

GET / HTTP/1.1
Host: wikipedia.com
User-agent: testthing

GET / HTTP/1.1
Host: wikibooks.org
User-agent: testthing

GET / HTTP/1.1
Host: wikiquote.org
User-agent: testthing

GET / HTTP/1.1
Host: dk.wikipedia.org
User-agent: testthing

GET / HTTP/1.1
Host: foo.wikipedia.org
User-agent: testthing

GET /wiki/Main_Page HTTP/1.1
Host: test.wikipedia.org
User-agent: testthing

GET /wiki/Foo HTTP/1.1
Host: en.wikipedia.org
User-Agent: Exalead

GET /wiki/Foo HTTP/1.1
Host: meta.wikimedia.org
User-agent: testthing

GET / HTTP/1.1
Host: en.wiktionary.org
User-agent: testthing
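One way to replay these test cases is to generate the raw request and pipe it to nc, as in the setup checklist. The sketch below only prints the request. `build_request` is a hypothetical helper, and the added `Connection: close` header is an assumption so that the server closes the connection after one reply.

```shell
#!/bin/sh
# Sketch: print one of the test requests in raw HTTP form, ready to be
# piped to nc on the server under test.

build_request() {
    # $1 = path, $2 = Host header, $3 = User-agent (defaults to "testthing")
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nUser-agent: %s\r\nConnection: close\r\n\r\n' \
        "$1" "$2" "${3:-testthing}"
}

build_request /wiki/Foo en.wikipedia.org
# On an application server this could be sent with:
#   build_request /wiki/Foo en.wikipedia.org | nc localhost 80
```

The third argument covers the one case above that uses a different user agent, for example `build_request /wiki/Foo en.wikipedia.org Exalead`.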
Hardware Repair
Application Servers
When taking down application servers (running mediawiki) for things like disk replacement or other hardware repair, _do not forget to_:
- before: remove from dsh group
These are in puppet, operations/puppet repo, in modules/dsh/files/group. The important one for Mediawiki sync is "mediawiki-installation".
- before: de-pool in pybal
- TODO: Document what to do if it's a scap proxy (see hieradata/common/dsh/config.yaml)
See pybal. You can just grep for the server name and set 'enabled': False and save.
- before: check nobody is scapping right now (best: announce with a !log line in IRC)
This is an IRC thing on freenode in #wikimedia-dev/-tech/-operations
- during: acknowledge Icinga monitoring checks (best: with related ticket number as comment)
Do this by logging in via browser on icinga.wikimedia.org. search for the hostname, check all services and use the "acknowledge" option. You'll see the IRC bots outputting this as well and they will stop repeating things over and over in the channels.
- after: re-add to dsh groups
Revert the above.
- after: re-pool in pybal
Revert the above.
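The de-pool edit described above (find the server name, set 'enabled': False) can be sketched with sed. The sample pool file below is made up for illustration and its layout is an assumption; check the real file before relying on the pattern. `depool` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: flip 'enabled' to False for one host in a pybal-style pool file.
# The sample file and host names are hypothetical.

depool() {
    # $1 = pool file, $2 = host name as it appears in the file
    # Escape dots so sed matches the host name literally.
    host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    tmp="$(mktemp)"
    # Only touch the line that mentions the quoted host name.
    sed "/'$host_re'/ s/'enabled': True/'enabled': False/" "$1" > "$tmp" &&
        mv "$tmp" "$1"
}

POOL="$(mktemp)"
cat > "$POOL" <<'EOF'
{'host': 'mw1001.example', 'weight': 10, 'enabled': True}
{'host': 'mw1002.example', 'weight': 10, 'enabled': True}
EOF

depool "$POOL" mw1002.example
grep mw1002 "$POOL"
```

Re-pooling after the repair is the same substitution with True and False swapped.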