Performance/Runbook/Measure backend performance: Difference between revisions

imported>Krinkle
imported>Ori
(More on benchmarking and load testing in production; remove reference to vtune (doesn't work on ARM, untested by us, alternatives exist, let's not recommend something we haven't tried))
Revision as of 17:48, 9 August 2022

This page provides an entry point for measuring the performance of existing code.

If you are starting a new project or otherwise have not yet set performance objectives, first read our guidance on Backend performance.

Production impact

Once code is in production, either during an incident or in the hours/days after deploying a new feature to a large wiki, the following are good starting points for measuring potential impact:

  • Grafana: Application Servers RED. This provides an overview of the MW server cluster as a whole; in particular, look for response duration ("latency").
  • Grafana: MediaWiki Exceptions, counts of production errors as reported to Logstash.
  • Grafana: WANObjectCache. Memcached should be accessed via the WANObjectCache interface. Among many hidden operational benefits, this also provides rich telemetry on how groups of cache keys are behaving. Use the "by keygroup" breakdown in Grafana for your feature, and look for "cache hit rate" and "regeneration time". This measures the decisions and time taken by getWithSetCallback.
  • Logstash: mediawiki-errors (restricted), details of those production errors (learn more: OpenSearch Dashboards).
  • Logstash: slow queries (restricted), details of database queries from MW that exceed general performance thresholds.
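
To make the "cache hit rate" and "regeneration time" metrics from the WANObjectCache item above more concrete, here is a toy sketch in plain PHP. It is not the WANObjectCache API: the `ToyCache` class, its `getWithCallback()` method, and the key name are hypothetical and exist only to show what the two numbers describe.

```php
<?php
declare( strict_types=1 );

/** Toy cache (hypothetical) that records hits, misses, and time spent regenerating values. */
final class ToyCache {
	private array $store = [];
	public int $hits = 0;
	public int $misses = 0;
	public float $regenMs = 0.0;

	/** Return the cached value for $key, or compute and store it via $callback. */
	public function getWithCallback( string $key, callable $callback ) {
		if ( array_key_exists( $key, $this->store ) ) {
			$this->hits++;
			return $this->store[$key];
		}
		$this->misses++;
		$start = hrtime( true ); // monotonic clock, in nanoseconds
		$value = $callback();
		$this->regenMs += ( hrtime( true ) - $start ) / 1e6;
		$this->store[$key] = $value;
		return $value;
	}

	/** Fraction of lookups that were served from the cache. */
	public function hitRate(): float {
		$total = $this->hits + $this->misses;
		return $total ? $this->hits / $total : 0.0;
	}
}

$cache = new ToyCache();
for ( $i = 0; $i < 10; $i++ ) {
	$cache->getWithCallback( 'example-key', static function () {
		usleep( 2000 ); // stand-in for an expensive computation
		return 'value';
	} );
}
printf( "hit rate %.0f%%, regeneration time %.1f ms\n", $cache->hitRate() * 100, $cache->regenMs );
```

A low hit rate or a high regeneration time for a key group is usually the signal worth investigating on the dashboard.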

New code generally rolls out over the course of a week, each week, starting with smaller wikis, and moving to higher traffic sites like Wikipedia. This naturally ramps up and load-tests changes to all code as part of our Train deployment process. See wikitech:Deployments/One week for more information.

The flame graphs linked above include the deployment branch (train week), which helps identify which version of the code is affected, and allows for week-over-week comparison by loading each in a separate tab. When comparing week over week, make sure to pick a day where the majority of the flame graph is from a single deployment branch. If the graph is clearly split between two major versions, pick a day earlier instead.

In addition to the above continuous monitoring, you can also use WikimediaDebug to capture a performance profile of a relevant user action or web request before and after a deployment, and compare the two in more detail that way.

Local development

You can capture detailed trace logs, timing measures, and flame graphs from your local MediaWiki install.

If you use MediaWiki-Docker, the packages needed are already installed and you can follow the MediaWiki-Docker/Configuration recipes/Profiling page instead. Otherwise, refer to Manual:Profiling on mediawiki.org for how to install the relevant packages in your local development environment.

It is recommended that you include the DevelopmentSettings.php preset in your LocalSettings.php file. This is done for you by default in MediaWiki-Docker. Among other things, this enables a moderate baseline of various debug modes. There is an additional section of commented-out Ad-hoc debugging settings that you can copy to LocalSettings.php as well, and enable as and when you need them (such as $wgDebugToolbar and $wgDebugDumpSql, referred to below).

Database queries

As you develop your database queries, use EXPLAIN and MySQL's DESCRIBE statements to find which indexes are involved in a particular query.

You can find out exactly which queries are coming from your code by enabling $wgDebugToolbar in LocalSettings.php (see also Manual:How to debug). This provides an overview of all queries issued for a given page. For API or other miscellaneous web requests, you can consult the debug log file, which logs all SQL queries when $wgDebugDumpSql is enabled.

When adding a new query to your code (e.g. via the Database::select() helper from our Rdbms library), try to run a version of that query at least once with the EXPLAIN statement, and make sure that it is effectively using indexes. While a select query without an index may run fast for you locally, it is going to perform differently when there are several billion objects in the database.

With the Debug toolbar enabled, look out for:

  • Repeat queries. Data should be queried authoritatively once by a service class and then re-used or passed around as needed. If two unrelated callers to a service class regularly need the same data, consider an in-class cache, and limit the size of this cache to avoid uncontrolled growth (e.g. in API batch requests, jobs, or CLI scripts). Even if you don't have a UI for batch operations, a higher-level feature may still cause your code to be called in a loop. We provide MapCacheLRU and HashBagOStuff in MediaWiki core to make it easy to ad-hoc keep a limited number of key-value pairs in a class instance.
  • Generated queries. If you see many similar queries that differ in only one variable, they may be coming from a loop that should instead query the data in a batch upfront.
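
The two points above can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not MediaWiki code: `FakeDb` and `NameLookup` are hypothetical classes, the "database" is an array that merely counts how often it is queried, and the hand-rolled size limit stands in for what MapCacheLRU would provide in MediaWiki core.

```php
<?php
declare( strict_types=1 );

/** Hypothetical stand-in for a database that counts the queries it receives. */
final class FakeDb {
	public int $queryCount = 0;
	private array $names = [ 1 => 'Alice', 2 => 'Bob', 3 => 'Carol', 4 => 'Dave' ];

	/** Return id => name for all requested IDs using a single "query". */
	public function selectNames( array $ids ): array {
		$this->queryCount++;
		return array_intersect_key( $this->names, array_flip( $ids ) );
	}
}

/** Hypothetical service class with a size-limited in-class cache. */
final class NameLookup {
	private FakeDb $db;
	private int $maxSize;
	/** @var array<int,string> Least recently used entry first */
	private array $cache = [];

	public function __construct( FakeDb $db, int $maxSize ) {
		$this->db = $db;
		$this->maxSize = $maxSize;
	}

	/** Load many names with one query, so later getName() calls hit the cache. */
	public function preload( array $ids ): void {
		$missing = array_values( array_diff( $ids, array_keys( $this->cache ) ) );
		if ( $missing ) {
			foreach ( $this->db->selectNames( $missing ) as $id => $name ) {
				$this->remember( $id, $name );
			}
		}
	}

	public function getName( int $id ): ?string {
		if ( isset( $this->cache[$id] ) ) {
			// Re-insert to mark the entry as most recently used
			$this->remember( $id, $this->cache[$id] );
			return $this->cache[$id];
		}
		$name = $this->db->selectNames( [ $id ] )[$id] ?? null;
		if ( $name !== null ) {
			$this->remember( $id, $name );
		}
		return $name;
	}

	private function remember( int $id, string $name ): void {
		unset( $this->cache[$id] );
		$this->cache[$id] = $name;
		if ( count( $this->cache ) > $this->maxSize ) {
			// Evict the least recently used entry (the first key)
			unset( $this->cache[ array_key_first( $this->cache ) ] );
		}
	}
}

$db = new FakeDb();
$lookup = new NameLookup( $db, 3 );

// Batch upfront: one query, then every lookup in the loop is a cache hit.
$lookup->preload( [ 1, 2, 3 ] );
foreach ( [ 1, 2, 3, 1, 2 ] as $id ) {
	$lookup->getName( $id );
}
print "Queries after preload and loop: {$db->queryCount}\n";

// A fourth name exceeds the limit of 3, so the least recently used entry (ID 3) is evicted.
$lookup->getName( 4 );
print "Queries after an uncached lookup: {$db->queryCount}\n";
```

Without the `preload()` call, the same loop would issue one query per distinct ID, which is the "generated queries" pattern the Debug toolbar would show.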

For more details, see also: Roan Kattouw's 2010 talk on security, scalability and performance for extension developers, Roan's MySQL optimization tutorial from 2012 (slides), and Tim Starling's 2013 performance talk.

  • You must consider the cache characteristics of your underlying systems and modify your testing methodology accordingly. For example, if your database has a 4 GB cache, you'll need to make sure that cache is cold as otherwise your data is likely still in the cache from previous queries.
  • Particularly with databases, but in general, performance is heavily dependent on the size of the data you are storing (as well as caching) -- make sure you do your testing with realistic data sizes.
  • Spinning disks are really slow; use cache or solid state whenever you can. However, as the data size grows, the advantages of solid state (avoiding seek times) are reduced.

Benchmarking

Quantify a proposed performance improvement by measuring it.

In PHP, you can ad-hoc measure using microtime(), for example:

$t = microtime( true );
 
$instance = createMyInstance();
$instance->myMethod();
$instance->myOtherMethod();
 
print __METHOD__ . ': ' . ( ( microtime( true ) - $t ) * 1000 );
 
//> 13.321235179901 milliseconds

Or from maintenance/eval.php:

> $t = microtime(true); for( $i = 0; $i < 100000000; $i++ ) { md5('testing'); } print microtime(true)-$t;
13.321235179901

MediaWiki has benchmarking scripts in maintenance/benchmarks, including the generic utility benchmarkEval.php:

php benchmarkEval.php --code="md5('testing')" --inner=1000000 --count=100

Multiple timing runs will vary substantially. To minimise the impact of this:

  • Use a large loop count to benchmark for a long time — at least 10 seconds.
  • Avoid any other system activity while the benchmark runs. If you are using your laptop, kill your browser and anything else that might wake up periodically.
  • Don't use a VM if there is any other activity on the same hardware.
  • Avoid unnecessary I/O within the benchmark. For example, disable logging.
  • Benchmark a small amount of code in a tight loop, so that the relative effect of the intervention will be larger.
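
As a rough sketch of the advice above, the following plain-PHP helper repeats a tight loop several times and reports the minimum, median, and maximum run time, which makes run-to-run variation visible. The `benchSketch()` function and its parameters are hypothetical and not part of MediaWiki; it uses PHP's `hrtime()` (available since PHP 7.3) rather than `microtime()`. The loop counts here are kept small for illustration and should be raised for a real benchmark.

```php
<?php
declare( strict_types=1 );

/**
 * Time a callable over several runs and summarise the results.
 * Hypothetical helper for illustration only.
 *
 * @param callable $fn Code under test
 * @param int $inner Iterations per run (the tight loop)
 * @param int $runs Number of timed runs
 * @return array Milliseconds per run, keyed by 'min', 'median', 'max'
 */
function benchSketch( callable $fn, int $inner, int $runs ): array {
	if ( $runs < 1 ) {
		throw new InvalidArgumentException( 'At least one run is required' );
	}
	$samples = [];
	for ( $r = 0; $r < $runs; $r++ ) {
		$start = hrtime( true ); // monotonic clock, in nanoseconds
		for ( $i = 0; $i < $inner; $i++ ) {
			$fn();
		}
		$samples[] = ( hrtime( true ) - $start ) / 1e6;
	}
	sort( $samples );
	$mid = intdiv( count( $samples ), 2 );
	$median = count( $samples ) % 2
		? $samples[$mid]
		: ( $samples[$mid - 1] + $samples[$mid] ) / 2;
	return [
		'min' => $samples[0],
		'median' => $median,
		'max' => $samples[count( $samples ) - 1],
	];
}

$stats = benchSketch( static function () {
	md5( 'testing' );
}, 100000, 11 );
printf(
	"min %.2f ms, median %.2f ms, max %.2f ms\n",
	$stats['min'], $stats['median'], $stats['max']
);
```

A large gap between the minimum and the maximum suggests that background activity is disturbing the measurement; comparing minimums or medians between two versions of the code is usually more stable than comparing single runs.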

Extremely accurate performance measurements can be made using hardware performance counters. On Linux you can use perf.

For example, running a benchmark under `perf stat -e instructions` will give a metric which is not affected by background activity on the same host. It tells you how much machine code is executed, which may be a decent model for cost depending on what you're measuring.

Benchmarking and load testing in production

You can use ApacheBench to load-test an application server:

# Makes 100 requests (serially) and outputs summary statistics. Use -c to control concurrency.
ab -n100 -X mw2377.codfw.wmnet:80 -H 'X-Forwarded-Proto: https' http://test2.wikipedia.org/wiki/Foobar

To see what the output looks like, see task T279664#8122195.

Or use `perf` to benchmark a command-line entrypoint:

$ perf stat -r100 sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/mediawiki/php-1.39.0-wmf.22/maintenance/getText.php --wiki=test2wiki Foobar >/dev/null

As usual, exercise care when subjecting production servers to synthetic load.

Beta Cluster

The Beta Cluster is hosted in Wikimedia Cloud. This is a good place to detect functional problems, but may not be a representative environment for performance measures as it runs in a virtualised multi-tenant environment. This means the machines are less powerful than production, and often under heavy load. See also T67394.

See also

Credits

Portions of this page were copied from "Performance profiling for Wikimedia" on mediawiki.org as written by Sharihareswara (WMF) in 2014.