You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Application servers: Difference between revisions

From Wikitech-static
imported>Giuseppe Lavagetto
m (Changed reference to palladium)
imported>Krinkle
 
(29 intermediate revisions by 16 users not shown)
Line 1: Line 1:
The Apache configs are maintained in a Git repository at [https://github.com/wikimedia/operations-puppet/tree/production/modules/mediawiki/files/apache/sites operations/puppet.git:/modules/mediawiki/files/apache/sites/]. Before July 11th 2012 these were in Subversion.
{{Navigation Wikimedia infrastructure|expand=mw}}
{{See|See also '''[[Application servers/Runbook]]''' for how to perform common tasks, or diagnose issues.}}


==Testing config==
The '''Application servers''' (or '''app servers''') are the several hundred Apache servers that run [[MediaWiki]] (PHP application).
* Submit change to [[Gerrit]] in the <code>modules/mediawiki/files/apache/sites</code> directory (project: operations/puppet)
* Disable puppet on [[mw1017]]: <code>sudo puppet agent --disable 'insert reason'</code>
* Apply change locally under <code>/etc/apache2/sites-enabled/</code>
* On [[mw1017]]: <code>sudo apache2ctl restart</code>
* Test your change by making a relevant HTTP request. Use the debug header to make the request go to mw1017. See [[Debugging in production]] for details.
* When you're done, <code>sudo puppet agent --enable</code>


==Deploying config==
==Service==
Consider scheduling any configuration update on the [[Deployments]] page. A bad configuration going live can easily result in a site outage.
Puppet roles:
* <code>mediawiki::appserver</code>, <code>mediawiki::canary_appserver</code>
* <code>mediawiki::appserver::api</code>, <code>mediawiki::appserver::canary_api</code>
* <code>mediawiki::maintenance</code>
* <code>mediawiki::jobrunner</code>


* Submit change to [[gerrit]] in the <code>modules/mediawiki/files/apache/sites</code> directory (project: operations/puppet)
Relevant puppet classes:
* disable puppet across the mw-cluster (so you can test the change to a single host): <code>salt --batch-size=25% 'mw*' cmd.run 'puppet agent --disable'</code>
* <code>[https://gerrit.wikimedia.org/g/operations/puppet/+/HEAD/modules/profile/manifests/mediawiki/webserver.pp profile::mediawiki::webserver]</code>, this provisions Apache, and any other packages or resources needed by MediaWiki on app servers.
* Merge via gerrit
** <code>[https://gerrit.wikimedia.org/g/operations/puppet/+/HEAD/modules/profile/manifests/mediawiki/httpd.pp profile::mediawiki::httpd]</code>, the Apache service.
* on one puppetmaster frontend: <code>puppet-merge</code>
** <code>[https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/HEAD/modules/mediawiki/manifests/web/prod_sites.pp mediawiki::web::prod_sites]</code>, the Apache configuration for all production websites (including wikipedia.org).
{{warning}} Check that the config does not break Apache! To do this, disable puppet on the appservers, do the tagged puppet run on a single server, and restart Apache on it manually before pushing out to all. Use a test script to check multiple URLs to see if redirect changes work.
** Additional Apache configurations are at [https://github.com/wikimedia/operations-puppet/tree/production/modules/mediawiki/files/apache/sites modules/mediawiki/files/apache/sites/]. Prior to 2012, Apache configurations were in a Subversion repository.
* go to one server and manually re-enable puppet & run puppet
:* confirm the configuration changes go through
* go to one mw server and do <code>apache2ctl configtest</code>
* create a plain text file with some significant URLs you touched and use "apache-fast-test url.file mw1234" on tin to test against your one test host.
* once test works, re-enable puppet across the mw hosts: <code>salt --batch-size=25% 'mw*' cmd.run 'puppet agent --enable'</code>
* sync the code via a tagged puppet run on appservers: <code>salt --batch-size=25% 'mw*' cmd.run 'puppet agent -t --tags mw-apache-config'</code>


* Test change from external
==Architecture==
[[File:WMF_infrastructure_2022.png|thumb|400px|Each appserver cluster and their role.]]
{{See also|MediaWiki at WMF|HTTP timeouts#App servers}}


==Restarting==
The application servers are load-balanced via [[LVS]]. Connections between our CDN (HTTP cache proxies) and app servers are encrypted with TLS, which is terminated locally on the app server using [[Envoy]]. Envoy then hands the request off to the local Apache.


===All===
'''Apache''' is in charge of handling redirects, rewrite rules, and determining the [[MediaWiki at WMF#Document root|document root]]. It then uses <code>php-fpm</code> to invoke the MediaWiki software.
Use [[Salt]] from a salt master like [[neodymium]].


===One, to test a change===
The Apache [https://httpd.apache.org/docs/2.4/mpm.html MPM] we use is <code>[https://httpd.apache.org/docs/2.4/mod/worker.html mod_worker]</code>, which decides how <code>php-fpm</code> processes are spawned.
SSH to the web server you want to test on, then restart Apache on that web server only. Test your change with curl, as in this wikimediafoundation.org example:
 
<code>curl -H 'Host: wikimediafoundation.org' "http://localhost/fundraising"</code>
 
The raw HTML for the page is printed to your terminal. To see the effect, save that output to a file on your hard drive and open it with your browser. The <code>Host</code> header is the name of the web site, that is, the part after http:// in your browser's address bar; <code>/fundraising</code> is the path after the site name. The example fetches http://wikimediafoundation.org/fundraising.
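
When only the status code or redirect target matters, reading the full HTML is unnecessary. The following is a minimal sketch of a wrapper around the curl command above. <code>vhost_request</code> is a hypothetical helper name, not an existing tool, and it assumes a curl version that supports the <code>%{redirect_url}</code> write-out variable.

```shell
#!/bin/bash
# vhost_request is a hypothetical helper name, not an existing Wikimedia tool.
# Usage: vhost_request HOST PATH [SERVER]
# Requests PATH from SERVER (default: localhost) with the given Host header
# and prints the HTTP status code followed by the redirect target, if any.
# With DRY_RUN=1 it prints the plain curl command for that request instead
# of sending anything.
vhost_request() {
    local host="$1" path="$2" server="${3:-localhost}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "curl -H 'Host: ${host}' \"http://${server}${path}\""
        return 0
    fi
    # -s: no progress meter; -o /dev/null: discard the response body;
    # -w: print only the status code and the URL a redirect would go to.
    curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' \
        -H "Host: ${host}" "http://${server}${path}"
}
```

For example, <code>vhost_request wikimediafoundation.org /fundraising</code> run on the app server under test would query the local Apache for the same page as the example above.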


==Logging==


Apache errors are logged to /a/mw-log/apache2.log on fluorine.
Apache errors are logged to <code>/srv/mw-log/apache2.log</code> on <code>mwlog1001</code>.


Apache access logs are mostly disabled. Statistics are drawn from [[Varnish]] front ends instead.


==Apache setup checklist==
==Update our PHP packages==
 
* Follow the [[Automated installation]] instructions for the base install
* Run the following on the server:
:* <tt>apt-get update && apt-get dist-upgrade -y && apt-get install wikimedia-task-appserver && reboot && exit </tt>
* Wait for the server to come back online, ensure it starts apache correctly
** <tt>echo 'GET /' | nc localhost 80</tt> or any of the number of tests listed below
* If the server is part of the memcached group, follow instructions on [[Memcached]]
* Run the setup of [[Ganglia]]
* If the server is new, you will need to do the following:
:* Login to the LVS server for apaches (lvs3 as of 2009-02-13) and add the new servers to /etc/pybal/apaches
* If the server is not new do the following:
:* Ensure the server is now enabled in pybal on the LVS server in the file /etc/pybal/apaches
* You will need to add the server to [[DSH]] groups if new, or check if they are commented, if the server is not new:
:* Add/Uncomment the host to /usr/local/dsh/node_groups/apaches and mediawiki-installation, as well as any other groups needed
:* Reload nagios to accept the changes to the node groups:
::* <tt>cd /home/wikipedia/conf/nagios && ./sync </tt>
* Verify that the server is taking traffic and doing work
:* <tt>ipvsadm -L | grep SERVERNAME </tt>
:* traffic logs?
 
==Test cases==
 
Here are some test cases you can use to test the apache configuration after changing something.
 
<pre>
GET /wiki/Foo HTTP/1.1
Host: en.wikipedia.org
User-agent: testthing
 
GET /wiki/Foo HTTP/1.1
Host: www.wikipedia.org
User-agent: testthing
 
GET /wiki/Main_Page HTTP/1.1
Host: www.wikipedia.com
User-agent: testthing
 
GET / HTTP/1.1
Host: wikipedia.com
User-agent: testthing
 
GET / HTTP/1.1
Host: wikibooks.org
User-agent: testthing


GET / HTTP/1.1
We use custom PHP packages, which are co-installable and more recent than what is shipped in the underlying Debian releases. If you want to add/update a patch you can do the following:
Host: wikiquote.org
User-agent: testthing


GET / HTTP/1.1
* On an existing app server obtain the source with "apt-get source php7.4"
Host: dk.wikipedia.org
* Copy the source files (*debian.tar.xz, *dsc, *orig.tar.xz(.asc)) to build2001.codfw.wmnet
User-agent: testthing
* Unpack the source with "dpkg-source -x $DSCFILE"
* Make the modification within the source tree (e.g. applying a local patch or a backport from upstream)
* Run "dpkg-source --commit" to record your changes in a debian/patches/foo patch file (debian/patches/series is automatically amended)
* Bump the changelog with "dch -i" (which spawns an editor)
* Finally build the updated package with "PHP74=yes DIST=buster-wikimedia pdebuild"
* Test the resulting build (and if all is fine, import to apt.wikimedia.org)


GET / HTTP/1.1
== Hardware repair ==
Host: foo.wikipedia.org
{{Outdated-inline|year=2015}}
User-agent: testthing


GET /wiki/Main_Page HTTP/1.1
Host: test.wikipedia.org
User-agent: testthing
GET /wiki/Foo HTTP/1.1
Host: en.wikipedia.org
User-Agent: Exalead
GET /wiki/Foo HTTP/1.1
Host: meta.wikimedia.org
User-agent: testthing
GET / HTTP/1.1
Host: en.wiktionary.org
User-agent: testthing
</pre>
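
The raw requests above can be turned into a plain list of URLs, which is easier to feed to curl or to keep in a text file of URLs to test. This is only a sketch: <code>requests_to_urls</code> is a hypothetical helper name, and it assumes every request consists of a <code>GET</code> line followed by its <code>Host:</code> line, as in the test cases above.

```shell
#!/bin/bash
# requests_to_urls is a hypothetical helper name.
# Reads raw HTTP requests on stdin (a "GET <path> HTTP/1.1" line followed by
# a "Host: <name>" line) and prints one http:// URL per request.
# Other header lines, such as User-agent, are ignored.
requests_to_urls() {
    awk '
        # Remember the path of the most recent request line.
        $1 == "GET"            { path = $2 }
        # Once the Host header is seen, emit the combined URL.
        tolower($1) == "host:" { print "http://" $2 path }
    '
}
```

Saving the test cases to a file (here called <code>testcases.txt</code> purely for illustration) and running <code>requests_to_urls < testcases.txt</code> would print one URL per request.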
== Hardware Repair ==
==== Application Servers ====
When taking down application servers (running mediawiki) for things like disk replacement or other hardware repair, _do not forget to_:
* before: remove from dsh group
Line 131: Line 56:
See [[pybal]]. You can just grep for the server name and set 'enabled': False and save.
* before: check nobody is scapping right now (best: announce with a !log line in IRC)
This is an IRC thing on freenode in #wikimedia-dev/-tech/-operations
This is an IRC thing on libera.chat in {{irc|wikimedia-dev}}/{{irc|wikimedia-tech}}/{{irc|wikimedia-operations}}
* during: acknowledge Icinga monitoring checks (best: with related ticket number as comment)
Do this by logging in via browser on icinga.wikimedia.org. Search for the hostname, check all services and use the "acknowledge" option. You'll see the IRC bots outputting this as well and they will stop repeating things over and over in the channels.
Line 139: Line 64:
Revert the above.


 
== See also ==
* [[Apache log format]]
* [[UID]]


[[Category:Servers by usage| Apache]]
[[Category:MediaWiki production| ]]
[[Category:SRE Service Operations]]

Latest revision as of 00:18, 27 September 2022

The Application servers (or app servers) are the several hundred Apache servers that run MediaWiki (PHP application).

Service

Puppet roles:

  • mediawiki::appserver, mediawiki::canary_appserver
  • mediawiki::appserver::api, mediawiki::appserver::canary_api
  • mediawiki::maintenance
  • mediawiki::jobrunner

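Relevant puppet classes:

  • profile::mediawiki::webserver, this provisions Apache, and any other packages or resources needed by MediaWiki on app servers.
      • profile::mediawiki::httpd, the Apache service.
      • mediawiki::web::prod_sites, the Apache configuration for all production websites (including wikipedia.org).
      • Additional Apache configurations are at modules/mediawiki/files/apache/sites/. Prior to 2012, Apache configurations were in a Subversion repository.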

Architecture

Each appserver cluster and their role.

The application servers are load-balanced via LVS. Connections between our CDN (HTTP cache proxies) and app servers are encrypted with TLS, which is terminated locally on the app server using Envoy. Envoy then hands the request off to the local Apache.

Apache is in charge of handling redirects, rewrite rules, and determining the document root. It then uses php-fpm to invoke the MediaWiki software.

The Apache MPM we use is mod_worker, which decides how php-fpm processes are spawned.

Logging

Apache errors are logged to /srv/mw-log/apache2.log on mwlog1001.

Apache access logs are mostly disabled. Statistics are drawn from Varnish front ends instead.
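
To look for recent errors, the log file can be filtered with standard tools. The following is a rough sketch: <code>recent_log_matches</code> is a hypothetical helper name, it assumes shell access to the log host and read permission on the file, and it makes no assumption about the format of the log lines.

```shell
#!/bin/bash
# recent_log_matches is a hypothetical helper name.
# Usage: recent_log_matches LOGFILE PATTERN [COUNT]
# Prints the last COUNT (default: 20) lines of LOGFILE that contain the
# fixed string PATTERN. No particular log line format is assumed.
recent_log_matches() {
    local logfile="$1" pattern="$2" count="${3:-20}"
    # -F treats PATTERN as a literal string rather than a regular expression;
    # "--" protects against patterns that start with a dash.
    grep -F -- "$pattern" "$logfile" | tail -n "$count"
}
```

For example, <code>recent_log_matches /srv/mw-log/apache2.log mw1234</code> on mwlog1001 would show the latest lines mentioning that string, assuming the lines of interest contain it.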

Update our PHP packages

We use custom PHP packages, which are co-installable and more recent than what is shipped in the underlying Debian releases. If you want to add/update a patch you can do the following:

  • On an existing app server obtain the source with "apt-get source php7.4"
  • Copy the source files (*debian.tar.xz, *dsc, *orig.tar.xz(.asc)) to build2001.codfw.wmnet
  • Unpack the source with "dpkg-source -x $DSCFILE"
  • Make the modification within the source tree (e.g. applying a local patch or a backport from upstream)
  • Run "dpkg-source --commit" to record your changes in a debian/patches/foo patch file (debian/patches/series is automatically amended)
  • Bump the changelog with "dch -i" (which spawns an editor)
  • Finally build the updated package with "PHP74=yes DIST=buster-wikimedia pdebuild"
  • Test the resulting build (and if all is fine, import to apt.wikimedia.org)
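
The build-host part of the steps above can be sketched as two shell functions, one for before and one for after the manual modification. This is an illustration only: the function names <code>run</code>, <code>unpack_source</code> and <code>commit_and_build</code> are hypothetical, the name of the unpacked source directory has to be supplied by the caller, and setting <code>DRY_RUN=1</code> (also an assumption of this sketch) only prints the commands.

```shell
#!/bin/bash
# All function names here are hypothetical; only the commands they wrap
# come from the steps listed above.

# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unpack the source package described by the given .dsc file.
unpack_source() {
    run dpkg-source -x "$1"
}

# After editing the unpacked tree by hand: record the change as a patch,
# bump the changelog (opens an editor), and build the package.
# "$1" is the unpacked source directory. The body runs in a subshell so
# the caller's working directory is left unchanged.
commit_and_build() (
    run cd "$1" || exit 1
    run dpkg-source --commit || exit 1
    run dch -i || exit 1
    run env PHP74=yes DIST=buster-wikimedia pdebuild
)
```

Running <code>DRY_RUN=1 commit_and_build SRCDIR</code> first is a cheap way to review the sequence before executing it. Obtaining the source on an app server, copying it to the build host, and importing the result to apt.wikimedia.org are left out of the sketch.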

Hardware repair

When taking down application servers (running mediawiki) for things like disk replacement or other hardware repair, _do not forget to_:

  • before: remove from dsh group

These are in puppet, operations/puppet repo, in modules/dsh/files/group. The important one for Mediawiki sync is "mediawiki-installation".

  • before: de-pool in pybal
  • TODO: Document what to do if it's a scap proxy (see hieradata/common/dsh/config.yaml)

See pybal. You can just grep for the server name and set 'enabled': False and save.

  • before: check nobody is scapping right now (best: announce with a !log line in IRC)

These are IRC channels on libera.chat: #wikimedia-dev, #wikimedia-tech, and #wikimedia-operations.

  • during: acknowledge Icinga monitoring checks (best: with related ticket number as comment)

Do this by logging in via browser on icinga.wikimedia.org. Search for the hostname, check all services and use the "acknowledge" option. You'll see the IRC bots outputting this as well and they will stop repeating things over and over in the channels.

  • after: re-add to dsh groups

Revert the above.

  • after: re-pool in pybal

Revert the above.

See also

  • Apache log format
  • UID