You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Debugging in production

From Wikitech-static

Latest revision as of 21:47, 28 July 2022 (imported>Krinkle)

Debugging a web request

Externally

Use the X-Wikimedia-Debug header to make a request bypass the Varnish cache and be routed to a specific debug server.
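As a sketch, the same routing can be requested from the command line with curl. This assumes the header value takes the form backend=<debug server FQDN>, so check the X-Wikimedia-Debug documentation before relying on it. The function name debug_request and the MWDEBUG_BACKEND variable are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool): send a single request through the
# public edge with the X-Wikimedia-Debug header set, so that it bypasses the
# Varnish cache and is routed to one debug server.
# Assumption: the header value has the form "backend=<debug server FQDN>".
MWDEBUG_BACKEND="${MWDEBUG_BACKEND:-mwdebug1002.eqiad.wmnet}"

debug_request() {
  local url="$1"
  # -i prints the response headers along with the body.
  curl -i -H "X-Wikimedia-Debug: backend=${MWDEBUG_BACKEND}" "$url"
}
```

For example, run debug_request 'https://test.wikipedia.org/wiki/Main_Page'. To target a different host for one call, run MWDEBUG_BACKEND=mwdebug1001.eqiad.wmnet debug_request 'https://test.wikipedia.org/wiki/Main_Page'.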

Locally

You can make a local self-request from any web server by using curl. For example, a regular MediaWiki request over HTTPS:

mwdebug1002$ curl -i --connect-to ::$HOSTNAME 'https://test.wikipedia.org/w/load.php'
HTTP/1.1 200 OK
Server: mwdebug1002.eqiad.wmnet
…
/* This file is the Web entry point for MediaWiki's ResourceLoader: … */

Or over HTTP:

mwdebug1002$ curl -i --connect-to ::$HOSTNAME 'http://test.wikipedia.org/wiki/Main_Page'
HTTP/1.1 302 Found
Server: mwdebug1002.eqiad.wmnet
Location: https://test.wikipedia.org/wiki/Main_Page

mwdebug1001$ curl -i --connect-to ::$HOSTNAME 'http://www.wikimedia.org/'
HTTP/1.1 200 OK
Server: mwdebug1001.eqiad.wmnet
…
<!DOCTYPE html>
<html lang="mul" dir="ltr">
<head>
<meta charset="utf-8">
<title>Wikimedia</title>
<meta name="description" content="Wikimedia is a global movement whose mission is to bring free educational content to the world.">
…

And over HTTP, as if from an external HTTPS request (this is currently the only way to debug in the Beta Cluster, since internal HTTPS is not available there, see T206158):

deployment-mediawiki11$ curl -i --connect-to ::$HOSTNAME -H 'X-Forwarded-Proto: https' 'http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page'
HTTP/1.1 200 OK
Server: deployment-mediawiki11.deployment-prep.eqiad1.wikimedia.cloud
…
<!DOCTYPE html>
…
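To confirm that a self-request was really answered by the local server and not, for example, by the production edge, a hypothetical helper can print just the Server response header. This is a minimal sketch that reuses the --connect-to form recommended here. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make a local self-request on this web server and print
# only the Server response header, to confirm which host answered.
# Any extra arguments (e.g. -H 'X-Forwarded-Proto: https') go to curl as-is.
self_request_server() {
  local url="$1"
  shift
  # --connect-to "::$HOSTNAME": match any host name and port in the URL, but
  # open the connection to this machine instead of resolving the URL via DNS.
  # -D - writes response headers to stdout; -o /dev/null discards the body.
  curl -sS -o /dev/null -D - --connect-to "::${HOSTNAME}" "$@" "$url" \
    | tr -d '\r' \
    | grep -i '^server:'
}
```

On mwdebug1002, self_request_server 'https://test.wikipedia.org/w/load.php' should print a Server: line naming that host, as in the output above. For the Beta Cluster variant, pass the extra header: self_request_server 'http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page' -H 'X-Forwarded-Proto: https'.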


Note about Host header: Prior to 2015, the more traditional approach of using curl 'http://localhost/wiki/Main_Page' -H 'Host: test.wikipedia.org' was supported. Per T190111, this is no longer possible, because connections via "localhost" are handled by a higher-priority Apache VirtualHost that serves the health check responses (unrelated to MediaWiki).

Note about FQDN address: Prior to 2019, it was common to work around the above "localhost" issue by using the internal FQDN (mw0000.eqiad.wmnet) or its internal IP address instead. This is easiest via $HOSTNAME or $(hostname -f), e.g. curl -i -H 'Host: test.wikipedia.org' "http://$HOSTNAME/w/load.php". This still works today for HTTP requests. It does not work reliably for HTTPS requests, however, because the web server has no certificate for the internal hostname. You can bypass this with curl --insecure (or curl -k for short).

Note about --resolve option: Prior to 2020, other documentation pages recommended --resolve as the main strategy, e.g. curl -i --resolve "test.wikipedia.org:443:$(hostname -i)" 'https://test.wikipedia.org/w/load.php'. This still works today and is functionally equivalent to the current recommendation with --connect-to. The --resolve option is no longer recommended because it is too easy to misuse without noticing that it was silently ignored. For example, suppose the --resolve entry names a different host than your URL (and with redirects, several host names can be involved). In that case, curl silently sends your request to the main production edge instead. This is easy to miss unless you enable verbose (-v) mode and check which server curl actually connected to. A wildcard host name such as --resolve "*:443:$(hostname -i)" mitigates this, but the port still has to be right, so for a plain HTTP request the entry is again silently ignored. It also requires an IP address, and thus the extra hostname -i command. The --connect-to option lets you omit both the host and the port, and it accepts a host name (instead of an IP address) as the destination. This allows the simpler and more memorable "::$HOSTNAME" form.
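The three approaches discussed in this note can be compared side by side. This sketch only wraps the curl invocations quoted above in hypothetical functions. It assumes hostname -i prints this host's address, as in the examples.

```shell
#!/usr/bin/env bash
# Sketch comparing ways of pinning a request to the local web server.
# The function names are hypothetical; the curl flags are the ones quoted above.
url='https://test.wikipedia.org/w/load.php'

# Older approach: host name AND port must match the URL exactly, and an IP
# address is required. A mismatch is silently ignored (normal DNS is used).
resolve_request() {
  curl -i --resolve "test.wikipedia.org:443:$(hostname -i)" "$url"
}

# Wildcard variant: any host name matches, but the port must still be right.
resolve_wildcard_request() {
  curl -i --resolve "*:443:$(hostname -i)" "$url"
}

# Current recommendation: empty source host and port match anything, and the
# destination may be a host name rather than an IP address.
connect_to_request() {
  curl -i --connect-to "::${HOSTNAME}" "$url"
}
```

Adding -v to any of these makes curl report which address it actually connected to. This is the quickest way to spot a --resolve entry that was silently ignored.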

Pushing code to a debug server

Developers can stage code updates on one of the mwdebug hosts before deploying to the entire production cluster; see Pre-deployment testing in production.

Conditional code

Note that any changes you make this way will be overwritten by cluster-wide deployments. Long-term changes should therefore go into a block wrapped in if ( $wgDBname === 'testwiki' ), to prevent them from accidentally running on all wikis. Short-term changes (anything not committed to the Git repository) should either be committed and rolled out, or reverted as soon as possible.

PHP7 Opcache

When editing files on a debug server directly, remember to clear the PHP 7 opcache afterwards. Without this, changes to files on disk might not take effect.

mwdebug1001$ php7adm /opcache-free

When using Scap to pull down a change from the deployment host, this happens automatically.
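The edit, clear, and re-test cycle can be wrapped in a small hypothetical helper. This sketch assumes you run it on the debug server itself. It passes php7adm's own output through unchanged and prints only the HTTP status code of the follow-up request.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after editing files in place on a debug server, clear
# the PHP 7 opcache, then re-request a URL locally so the edited code runs.
refresh_and_check() {
  local url="$1"
  php7adm /opcache-free
  # Print only the status code; the body is discarded.
  curl -sS -o /dev/null -w '%{http_code}\n' --connect-to "::${HOSTNAME}" "$url"
}
```

For example, run refresh_and_check 'https://test.wikipedia.org/w/load.php'.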

Testing it

Use X-Wikimedia-Debug in a browser to route one of your regular web requests to the debug server you have staged code on.

Debugging databases

From a maintenance host, use the sql command, or use mwscript mysql.php directly.

Take note that MediaWiki uses different names for some of our DB clusters. For example, "x1" and "x2" are known as "extension1" and "extension2" for the purposes of the --cluster parameter and of the internal $wgLBFactoryConf values it corresponds to.

Examples:

$ sql test2wiki
# Connected to s3.test2wiki database on a live replica in production.

$ sql centralauth
# Connected to s7.centralauth 

$ mwscript mysql.php --wiki aawiki --wikidb centralauth
# (idem)

$ sql wikishared
# Connected to x1.wikishared

$ mwscript mysql.php --wiki aawiki --cluster extension1 --wikidb wikishared
# (idem)

$ mwscript mysql.php --wiki aawiki --cluster extension2 --list-hosts
db0001
db0002
db0003

Debugging a maintenance script

SSH to an mwdebug host, then run:

source /usr/local/lib/mw-deployment-vars.sh
sudo -u "$MEDIAWIKI_WEB_USER" php -m debug "$MEDIAWIKI_DEPLOYMENT_DIR/multiversion/MWScript.php" someScript.php --wiki=testwiki --scriptSpecificParameters "goHere"

Debugging logs

To debug how log messages are sent to Logstash from MediaWiki php-fpm servers, read Application_servers/Runbook#Logging.

Debugging in shell

To open a command-line shell to PHP, log in to an mwdebug server or the Maintenance server and run:

$ mwscript eval.php dbname_here

Where dbname_here is the wiki's database name, e.g. aawiki. You can call arbitrary MediaWiki code from here, and use return .. as a shortcut for var_dump( .. ); to read out any information.
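For a one-off check, it can be convenient to pipe a snippet into eval.php instead of opening an interactive session. This is a sketch under the assumption that eval.php reads from standard input when it is not attached to a terminal. mw_eval is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a PHP snippet through eval.php non-interactively.
# Assumption: eval.php reads its input from stdin when stdin is not a terminal.
mw_eval() {
  local wiki="$1" code="$2"
  printf '%s\n' "$code" | mwscript eval.php "$wiki"
}
```

For example, run mw_eval aawiki 'return $wgDBname;'. The single quotes keep the shell from expanding $wgDBname before PHP sees it.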

Ad-hoc log messages

The recommended approach to ad-hoc logging in production is wfDebugLog( 'AdHocDebug', 'Hi...' );. This reliably sends the message to Logstash from web-facing contexts, jobrunners, and CLI maintenance scripts alike. It also avoids the risk of unintentionally disclosing sensitive data attached to objects in memory.

If the message is logged from an mwdebug host, via the CLI or via WikimediaDebug, it will show up on the Logstash dashboard: mwdebug.

If the data is expected to come from a different host (e.g. because the issue is only reproducible there, or because you are waiting for the condition to be hit organically), the message will show up on the Logstash dashboard: mediawiki. There you can query for channel:AdHocDebug, or page through the channel list and zoom in on the appropriate channel.

Ad-hoc command line logging

To reproduce an issue programmatically, it is recommended to follow #Debugging in shell instead, without modifying source code on disk or running modified programs.

If an issue is difficult to reproduce and you need to modify a maintenance script to quickly log some information, you can use the wfDebugLog() approach above. Alternatively, to keep the information local and not write to Logstash, you can use one of the following:

  • error_log('Hi ...', 4);
  • syslog(LOG_DEBUG, 'Hi...');

error_log type 4 sends the message to the SAPI logging handler, which is STDERR in the CLI. For web requests via Apache, STDERR is not defined and these messages go to syslog instead. For such web requests, they will end up in Logstash as type:apache2 message:"Got error 'PHP message: Hi...". For mwdebug hosts, they end up on the mwdebug Logstash dashboard. Note, however, that they will not match type:mediawiki queries and do not show up on the general "mediawiki" or "mediawiki-errors" dashboards in Logstash. For other hosts, you may find them on the apache2log Logstash dashboard.

Syslog messages end up on disk, readable via sudo tail /var/log/syslog. They are also readable without sudo on the syslog Logstash dashboard, for example by querying host:mw0000 or message:Hi to find specific entries.
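To find such entries on the host itself, a hypothetical helper can search the syslog file for your marker text. This is a minimal sketch. It assumes the default /var/log/syslog location named above and that you have sudo rights there. "Hi" stands in for whatever marker your message uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent syslog lines containing a marker.
# Defaults: marker "Hi" (placeholder from the examples) and /var/log/syslog.
recent_syslog_matches() {
  local marker="${1:-Hi}" logfile="${2:-/var/log/syslog}"
  # -F matches the marker literally; "--" guards markers starting with "-".
  sudo grep -F -- "$marker" "$logfile" | tail -n 20
}
```

For example, recent_syslog_matches 'Hi' shows the last 20 matching lines.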