
Performance/Guides/Backend performance practices


These are the performance guidelines for MediaWiki backend development that is aimed at deployment to Wikimedia Foundation wikis. Use these guidelines together with the Architecture guidelines and Security guidelines.

Getting started

These guidelines focus on both individually perceived performance (your code runs relatively fast) and scalability (your code can run fast, even on large wikis and when run many times concurrently).

General principles

  1. Identify hot spots. Which parts of your code take the most time to execute? These may deserve extra care during code review in the future. The team should maintain awareness of these hot spots by regularly (e.g. monthly) verifying and measuring your code's performance in production. When making choices in the interest of performance, measure instead of relying on intuition. Results can be surprising and may change over time. See also: How to measure backend performance.
  2. Identify relevant latency metrics, and take responsibility for them. As developers you know best how your code is exercised and which high-level actions are meaningful to your product. Monitoring their latency will let you know when an experience becomes slower and/or when there are capacity problems. Consider documenting your latency metrics, and review them on a monthly or quarterly basis to prioritise maintenance as needed to uphold the quality of service. Example: Save Timing documentation.
  3. Treat the infrastructure as one that we share with others. Your code can become a high-traffic workload in Wikimedia production! Each web request has a strict HTTP timeout, and a SQL query may be automatically stopped after 30 seconds. Perform long-running tasks on a dedicated server instead, e.g. via the JobQueue. See also #Long-running queries.
  4. Wikimedia heavily depends on its caching layers to survive. However, our caching exists primarily to increase capacity, not speed. Hitting Memcached first means we need fewer database servers. A slow database query behind Memcached is still a slow query!
  5. Choose the right persistence layer for your needs. Only cache data if your code can also performantly respond when the cached data is missing; otherwise, use stronger persistence for your data. See also #Persistence layer. When designing tables, or adding new database queries, consider that every query must use an index (including write queries!). See also: #Index.
  6. The cache hit ratio should be as high as possible. When introducing new batched calls or requests, ensure that the underlying requests retain effective cache utilization. Highly variable URLs or data that is cached under multiple permutations of cache keys tend to worsen the cache-hit ratio. See #Leverage the platform! for how our platform offers batching abilities that enable high cache-hit ratios.
  7. When performance is subpar, this can often be indicative of a deeper issue in how the code is solving a particular need. Think about the root cause, and whether certain costs can be structurally avoided or deferred. See also: #You are not alone.
  8. Before adding cookies, consult with Performance Team or SRE Traffic. See also #Cookies.

Ballpark numbers

  • When accessing information (e.g. a view or API request over GET), aim for your backend to respond within 50ms at the median and within 200ms at the p99. In other words, common requests to popular data that benefit from internally warm caches respond within 50ms (e.g. database server cache, or Memcached/APCu), and requests that encounter internal cache misses still gather and render all data within 200ms.
  • When performing write actions (e.g. a POST request), aim for your backend to respond within 500ms at the p99. Make sure that the amount of work a request performs is naturally and deterministically limited (e.g. do not rely on server-level memory limits or timeouts). It is encouraged to deny, and disincentivize, usage patterns that we cannot maintain at scale. Remember that you can schedule tasks via the Job queue and Deferred updates, which let you run code asynchronously either on the same server after the response has been sent ("post-send deferred update"), or a few seconds later in a separate cluster ("jobqueue job"); see the sketch below.
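
For illustration, here is a minimal sketch of a post-send deferred update; updateMyFeatureStats() is a hypothetical stand-in for work the client does not need to wait for:

  // Runs after the HTTP response has been flushed to the client, on the same server.
  DeferredUpdates::addCallableUpdate( static function () {
      updateMyFeatureStats(); // hypothetical expensive bookkeeping
  }, DeferredUpdates::POSTSEND );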

Percentiles

For backend code, we recommend looking at high percentiles (such as the 95th and 99th percentiles) instead of mean averages or medians (learn why: [1][2][3]). Backend latencies tend to vary based on factors like cache warmth and server load (instead of varying based on what a person is doing), which means even an issue with 1% of requests may affect everyone on a regular basis (though perhaps not at the same time). Monitoring the average would systematically ignore well over half the audience. Monitoring the 99th percentile tells you how your code behaves when it matters most.

Performance data often hides two different stories: one for users accessing the application with a warm cache, and another with a cold cache. Calculating averages on such a dataset is deceptive. When benchmarking, create at least 10,000 observations to calculate a 50th and 90th percentile. If these numbers differ greatly, that indicates a performance problem. For example, if the code requires network fetches and you have many resources to fetch, there will be a group that leverages cached resources (thus avoiding the roundtrips) and a group that does not. A rule of thumb about statistical significance in a performance context is that one needs 10,000 data points to calculate a 90th percentile, 100,000 for a 99th percentile, and 1 million for a 99.9th.
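
As a rough sketch of how such a benchmark could be summarised in plain PHP ($timings stands in for your collected observations, in milliseconds):

  // Durations in milliseconds collected from repeated benchmark runs.
  $timings = [ /* ... at least ~10,000 observations ... */ ];
  sort( $timings );

  // Nearest-rank percentile; adequate for a quick p50 vs p90 comparison.
  function percentile( array $sorted, float $p ): float {
      $index = (int)ceil( ( $p / 100 ) * count( $sorted ) ) - 1;
      return $sorted[ max( 0, $index ) ];
  }

  $p50 = percentile( $timings, 50 );
  $p90 = percentile( $timings, 90 );
  // A large gap between $p50 and $p90 usually points at a warm-cache vs
  // cold-cache split rather than uniformly slow code.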

Latency

Aim for your software to provide a reasonably fast experience, regardless of network latency. Client-side latency depends on numerous factors, including: your backend response time, the CDN response time, the round-trip time from the client device to our servers and back (RTT), and the transfer rate (bandwidth) that the client's network is capable of.

The RTT and bandwidth are not always related. For example, a Gigabit connection with an RTT of 2 seconds will not transfer anything in less than 2 seconds, regardless of payload size. You can think of bandwidth as the size of a truck on a highway (or the number of highway lanes), and RTT as how fast one may travel. If 1000 kg are transferred by the truck in 1 hour, this does not mean that 1 kg will arrive in 3.6 seconds. We recommend the free e-book High Performance Browser Networking by Ilya Grigorik. Note in particular that mobile devices take more time to reactivate their network connection after a period of inactivity.

Strategies for reducing or masking latency:

  • Ensure that responses to unregistered users are generally cacheable by the CDN (i.e. requests from browsers without an active login or edit session).
    • For pageviews, page actions, and special pages, this is controlled by OutputPage.
    • A good example of a cacheable Action API module is ApiOpenSearch.
    • A good example of a cacheable REST API route is Rest\Handler\SearchHandler.
  • Split high-level operations into smaller re-usable methods. This allows different parts of your code to only ask for what is needed, instead of accumulating many kinds of computations that "always" happen whenever anything interacts with your feature (which essentially guarantees spending more time than needed).
  • Set a tight timeout. Once you've set a p99 latency objective for your backend responses, think about what might cause a response to fall in that last 1%. For example, if your feature includes internal requests to other services, what timeouts do they have? What retries do they allow? Consider what happens if suddenly a majority of web traffic starts to exercise your 1% scenario. Do we start to exhaust all backend capacity? Or do we fail quickly and free up resources so that other parts of the site stay up? Once you've set and achieved a latency objective, put a limit in place to shift outliers from "slow" toward "error" (a sketch follows below). This way callers can quickly re-try, which some services may even do automatically.
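
For example, a hedged sketch of putting a tight timeout on an internal HTTP call via MediaWiki's HttpRequestFactory (the URL and timeout values are purely illustrative):

  use MediaWiki\MediaWikiServices;

  $http = MediaWikiServices::getInstance()->getHttpRequestFactory();
  $req = $http->create(
      'https://internal.svc.example/info', // hypothetical internal endpoint
      [ 'timeout' => 2, 'connectTimeout' => 0.5 ],
      __METHOD__
  );
  $status = $req->execute();
  if ( !$status->isOK() ) {
      // Fail fast: degrade gracefully or return an error, freeing this worker
      // instead of letting it wait out a slow dependency.
  }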

See also Page load performance#Latency for advice on preloading and stale revalidation.

Leverage the platform

Benefit by building atop prior work for scaling MediaWiki to our performance needs.

How often will my code run?

Think about how often the server will execute your code. We tend to categorise server-side concerns in these buckets:

  1. Always.
    • Critical code running unconditionally on every backend request. This should be kept minimal with clearly low cost. This includes early Setup hooks, extension.json callbacks, and service wiring.
    • The typical budget for new needs from core or extension hooks during the Setup phase is under 0.1ms.
  2. On pageviews.
    • Code running on all HTML web responses. This is almost always, but not for action=raw requests, or HTTP 304 "Not Modified" responses to a pageview URL.
    • The typical budget for new output on a pageview is 1ms. We aim to respond within 50ms to a majority of requests (#Ballpark numbers).
  3. When parsing page content.
    • Most backend requests for page views fetch page content from the ParserCache and render a skinned pageview. The parsing of page content happens only on a relatively small portion of pageviews (e.g. a cache miss), as well as during the response to saving of edits. For that reason, it is acceptable to perform a limited number of more expensive operations during the parser run, e.g. from a parser hook. These hooks generally do not execute while someone is waiting during a pageview, as the result of these operations is retained in the parser cache.
    • The typical budget for components that render content is 10ms (for most pages using your feature), or up to 100ms for pages that utilize your feature in an unusually complex manner.
  4. When performing a write action.
    • Editing is the most common non-read action. The latency of processing an edit is actively monitored as Save Timing.
      • The typical budget for components that parse content or otherwise synchronously hook into edit submissions is 10ms (for most pages), or up to 100ms for unusually complex pages. We aim to respond within 1 second for write actions, and there are many extensions participating in that shared budget (#Ballpark numbers).

Leverage the platform!

  • Move work to post-send Deferred updates or Jobs if it doesn't have to happen in the critical path of a web response.
  • When interacting with Memcached, use the getWithSetCallback idiom and use WANObjectCache (a minimal example follows after this list). Avoid calling Memcached directly. WANCache automatically takes care of numerous "at scale" needs such as stampede protection, purging, mutex locks, and warms your caches by automatically regenerating values before they expire. This reduces the chances of hot keys ever getting a cache miss. WANCache also integrates with dev tooling such as rate and latency stats on Grafana: WANObjectCache.
  • Utilize core service classes whenever possible and look for batch methods that can process several of your requests in parallel. Our database abstraction layer, cache interfaces, and HTTP clients all support batching. For example: IDatabase::select, IDatabase::selectFieldValues, BagOStuff::getMulti, HttpRequestFactory::createMultiClient, and FileBackend::doOperations.
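
A minimal sketch of the getWithSetCallback idiom mentioned above; the cache key parts and the loadRenderedWidget() helper are hypothetical:

  use MediaWiki\MediaWikiServices;

  $pageId = 42; // illustrative
  $cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
  $value = $cache->getWithSetCallback(
      $cache->makeKey( 'mywidget-rendered', $pageId ),
      $cache::TTL_HOUR,
      static function ( $oldValue, &$ttl, &$setOpts ) use ( $pageId ) {
          // Only runs on a cache miss; WANObjectCache wraps this callback with
          // stampede protection and pre-emptive regeneration.
          return loadRenderedWidget( $pageId ); // hypothetical computation
      }
  );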

You are not alone

If your code is amplifying a pre-existing performance issue in another component or service, identify these components and ensure the relevant teams are made aware. The Performance Team can help you find and/or advocate for these cross-component issues.

Work with the Performance Team to understand general performance targets before you start architecting your system. For example, a user-facing application might have an acceptable latency of 200ms while a database might have something like 20ms or less, especially if further access is decided based on the results of previous queries. You don't want to prematurely optimize, but you need to understand if your targets are physically possible.

You might not need to design your own backend; consider using an existing service, or having someone design an interface for you. Consider modularization. Low-level performance is hard, avoid reinventing the wheel!

Persistence layer

Choose the right persistence layer for your needs. In some cases, a cache can be used instead of a persistence layer.

MediaWiki is configured in production to leverage local services that include Memcached, JobQueue, ExternalStore and MainStash (MariaDB), and Swift. These reside on a low-latency network in the same data center.

The most appropriate layer is the lightest store that meets your requirements. Avoid incurring overhead that serves unneeded guarantees. Here's when we recommend each:

  1. Cache - short-term storage of ephemeral unstructured data of small or medium size (up to 800KB).
  2. MainStash - replicated storage of ephemeral unstructured data and small to large blobs (avoid larger than 2MB), backed by MySQL/MariaDB.
  3. JobQueue - reliable short-term persistence of a job description. Once the job is completed, the job parameter data is automatically pruned.
  4. Database - long-term storage of structured data and blobs, backed by MySQL/MariaDB via dedicated table schemas.
  5. FileRepo - long-term storage of binary media files, backed by Swift. This is used by user file uploads, but can also be used for generated files like the Score extension (for MIDI files) and ConfirmEdit extension (for captchas). See wikitech:Media storage for details.

Most caches are backed by Memcached; see Object cache on mediawiki.org for guidance on when and how to use WANObjectCache and the other cache interfaces. Use a cache for data that may persist between web requests, but that is okay to quietly fail and be pruned at any time. Use Memcached to store objects if the database could recreate them but it would be computationally expensive to do so — you don't want to recreate them too often. You can imagine a spectrum between caches and stores, varying on how long developers expect objects to live in the service before getting evicted; refer to #Caching layers for more details.

Permanent names: Whenever possible, store resources under identifiers that won't change, e.g. revision IDs, page IDs, and user IDs instead of page titles or user names.

Bad example: Uploaded files in MediaWiki, which as of 2022 are still identified only by their display title. When end-users rename the title of an uploaded file, an expensive, slow, and fragile operation takes place: several database tables are updated, potentially touching a large number of rows (every matching "oldimage" table row), and the database transaction even has to be synchronised with changes to the underlying Swift file servers to relocate the physical location of each version of the uploaded file. Atomicity across different unrelated systems is impossible in practice, which inevitably leads to temporarily broken state and the need to best-effort undo everything if any part fails. It may also require purging or moving of generated thumbnails and transcoded video files, and purging of ParserCache entries of all articles on all wikis that refer to the file. All that, merely because files don't have stable identifiers.[4]

Object size: Avoid storing multi-megabyte data in Memcached; it is discarded automatically. Verify whether caching is effective in terms of hit ratio, time saved, and backend load saved. For example, prior to 2015, ResourceLoader stored minified JavaScript code in Memcached. While the cache had a high hit ratio and saved redundant minification computations, it turned out to be faster to run the minifier than to fetch and wait for cached data over the network from a Memcached server. It was cheaper and faster to recalculate than to retrieve, and doing so freed up web server load, Memcached load, and network traffic. Another example is Watchlist data, which involves lookups like "is this page watched by the current user". We choose not to cache this: the lookup is indexed and just as fast to query from a database as from Memcached, and, more importantly, caching it would make things worse. The wide range of page-user combinations means a lookup is almost never a cache hit, so you end up waiting for a Memcached server and then for a database server as well. WANObjectCache in Grafana provides statistics about how effective your cache keys are in terms of cache-hit ratio, fetch time, and computation time.

Job queue examples

Use cases

  • TimedMediaHandler: Creating transcoded variants of a video that has been uploaded.
  • UploadWizard: Uses the core API for large uploads. This queues a job to take care of saving chunks of a file, reassembling them, and turning them into a file the user can view. The uploader can meanwhile start writing the file description, metadata, etc., while the UploadWizard frontend uploads the file one chunk at a time in the background.
  • HTMLCacheUpdate: Purge all articles from the CDN (Varnish) that use a given template, and bump the page_touched column in the database, which instructs ParserCache to re-parse the article the next time someone views it.
  • Other extensions that use the job queue include RenameUser, TranslationNotification, Translate, GWToolset, and MassMessage.

Example: Send a notification

Suppose we need to store a new notification. Good solution: run the actual notification action (sending an email) via the job queue. Bad solution: send the email during the web request, delaying the browser response and holding open the database transaction until the email is sent (with no automatic re-try).
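
A hedged sketch of the good solution, assuming a hypothetical 'notificationEmailSend' job type registered by the extension:

  use MediaWiki\MediaWikiServices;

  // Queue the email instead of sending it during the web request; a job runner
  // executes it shortly afterwards and can re-try on transient failures.
  MediaWikiServices::getInstance()->getJobQueueGroup()->lazyPush(
      new JobSpecification( 'notificationEmailSend', [
          'recipientId' => $recipientUserId,   // illustrative parameters
          'notificationId' => $notificationId,
      ] )
  );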

Example: Count active users

The Beta features extension lets a user opt in to a "Beta feature" and displays, to the user, how many users have opted in. The preference data is stored in the user_properties table. Counting the number of opted-in users every time the count is displayed would not have acceptable performance. Thus, MediaWiki stores the count in the betafeatures_user_counts database table, and additionally in Memcached. It's important for a change in preference to immediately apply to a user's own experience on subsequent pageviews, but it's not important to immediately increase or decrease the count. Therefore, BetaFeatures instead recomputes the count every half hour at most. Specifically, the extension queues a job that runs a SELECT query. This query can take a long time on large wikis - up to several minutes! Once done, the next time someone views the Beta features preferences page, the count is fetched from the "count" table where the job stored it, and then kept in Memcached for a while. Code: UpdateBetaFeatureUserCountsJob.php and BetaFeaturesHooks.php.

Indexing

When designing new tables, or adding new database queries, consider that all queries must use an index (including write queries!). Use EXPLAIN on your queries and create new indices where required.

Unless you're dealing with a tiny table, you need to index writes (similarly to reads). Watch out for deadlocks and for lock-wait timeouts. Try to do updates and deletes by primary key, rather than by some secondary key. Try to avoid UPDATE/DELETE queries on rows that do not exist. Make sure join conditions are cleanly indexed.

You cannot index blobs, but you can index blob prefixes (the substring comprising the first several characters of the blob).

Compound keys - namespace-title pairs are all over the database. You need to order your query by asking for the namespace first, then the title (see the sketch below)!
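
For instance, a sketch of a query that follows the (page_namespace, page_title) index order, assuming $dbr is a replica connection obtained from the load balancer:

  $res = $dbr->newSelectQueryBuilder()
      ->select( [ 'page_id', 'page_namespace', 'page_title' ] )
      ->from( 'page' )
      ->where( [ 'page_namespace' => NS_TEMPLATE ] )   // namespace first...
      ->orderBy( [ 'page_namespace', 'page_title' ] )  // ...then title
      ->limit( 50 )
      ->caller( __METHOD__ )
      ->fetchResultSet();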

Use EXPLAIN and MySQL DESCRIBE queries to find out which indexes are used by a specific query. If the EXTRA column says "Using temporary table" or "Using filesort", that's often bad! If "possible_keys" is NULL, that's often bad (small sorts and temporary tables are tolerable though). An "obvious" index may not actually be used due to poor "selectivity". See the Measure backend performance in production guide, and for more details, see Roan Kattouw's 2010 talk on security, scalability and performance for extension developers, Roan's MySQL optimization tutorial from 2012 (slides), and Tim Starling's 2013 performance talk.

Indexing is not a silver bullet; more isn't always better. Once an index gets big enough that it doesn't fit into RAM anymore, it slows down dramatically. Additionally, an index can make reads faster, but writes slower.

Good example: See the ipblocks and page_props tables. One of them also offers a reverse index, which gives you a cheap alternative to ORDER BY.

Bad example: See this changeset (a fix). As the note states, "needs to be id/type, not type/id, according to the definition of the relevant index in wikibase.sql: wb_entity_per_page (epp_entity_id, epp_entity_type)". Rather than using the index that was built on the id-and-type combination, the previous code (that this is fixing) attempted to specify an index on "type-and-id", which did not exist. Because MariaDB could not use the index, it had to order the results without one, which meant sorting 20 million rows with no index.

Multiple datacenters

See Database transactions

MediaWiki at WMF is actively responding to requests from multiple data centers.

When our CDN forwards a cache-miss request to one of the core data centers, the responding service (e.g. MediaWiki/Apache, or Thumbor) must only contact other services local to that datacenter, e.g. reading data from local database replicas. The same applies when writing to caching layers (with an exception for WANObjectCache purges that asynchronously broadcast to other data centers).

Write operations to primary data stores (MySQL, Swift, Cassandra) are more complex: they may only happen during HTTP POST or PUT requests from end-users, and should be synchronously committed within the primary datacenter (possibly with asynchronous replication to remote datacenters). Updates to derived or secondary data stores (e.g. a search index such as Elastic, or BlazeGraph) can use the JobQueue. We have configured the JobQueue such that it can independently accept new jobs locally in each data center. Behind the scenes, asynchronous workers will eventually relay and execute the job in the primary DC. That is, the logic of Job::run() executes in the primary DC only, where you can safely perform writes to primary database tables.

HTTP POST/PUT requests to MediaWiki will be routed to the primary datacenter. An independent non-MediaWiki API service might be able to run write APIs correctly in multiple datacenters at once if it has limited semantics and has no relational integrity dependencies on other source data persistence layers. For example, if the service simply takes end-user input and stores blobs keyed under new UUIDs, with no way for writes to conflict. If updates or deletions are later added as a feature, then Last-Write-Wins might be considered a "correct" approach to handling write conflicts between datacenters (e.g. if only one user has permission to change any given blob then all conflicts are self-inflicted). If write conflicts are not manageable, then your API requests need to be routed by our CDN to the primary datacenter only.

Shared resources

Be mindful that your code is running in an environment with finite resources used to service numerous APIs, some of which must handle high traffic with low latency. Each application server and intermediary proxy layer that handles a request has limited CPU, disk space, disk bandwidth, network bandwidth, and worker threads. There is a finite pool of application and proxy servers. The same holds true of persistence layers such as databases, object stores, queues, and cache servers.

We recommend thinking of locks and semaphores as an intentionally limited resource as well. When code sits idly waiting for a lock or semaphore during a web response, that code still occupies one of the workers on the backend servers, as well as a worker thread on each intermediary layer that is waiting to proxy the response back to the browser (e.g. Apache, Varnish, Envoy, etc.). If too many workers are waiting for locks, this can cause overloads or even outages.

Transactions

Every web request and every database operation, in general, should occur within a transaction. MediaWiki automatically creates a transaction for you during web requests when interacting with the main MySQL databases (we use the default "REPEATABLE-READ" isolation level). This offers you a simpler developer experience, where your code is atomic and consistent by default. For example, when you select an ID from one table, you can rely on being able to find the associated row in another table. This does mean MySQL must hold on to any locks until the web request (and thus the transaction) is completed, which makes us more prone to lock contention.

MySQL uses "record locks" to lock individual rows (or gaps) that prevent other requests from creating new rows that would be similiar or conflict with one your database writes. Gap locks can be confusing to understand and hence cause surprising cases of contention. Refer to On MySQL locks (Kumar, 2013) to learn about gap locks.

Be careful when mixing the operations of a database transaction with significant operations to other services (e.g. those that take over 10ms). Also, think about the order in which rows are locked by your writes (e.g. UPDATE and INSERT queries). When you update, delete, or insert rows in the database, ask yourself:

  • What table index can MySQL use to guarantee that your SELECT query is fast?
  • What row(s) and row gaps will this cause MySQL to lock?
  • Are there other functions or classes that write to the same table?
  • What else will happen between the write query (and thus the lock) and the end of the web response (when the transaction is committed)?

You can choose to instruct MySQL to explicitly lock a row or gap during a SELECT and return the latest committed data (plus any changes pending in the current transaction, as always). This can be useful when reading data that will be used to determine the next update queries, and can be done via "SELECT ... FOR UPDATE". Note that this can lock the same way that an UPDATE query with the same WHERE clause would. The use of "SELECT ... LOCK IN SHARE MODE" is strongly discouraged as it can lead to deadlocks. The use of LOCK TABLES is problematic and not supported at all on our platform.
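
A minimal sketch, assuming $dbw is a primary database connection and using the page table purely as an illustration:

  // Lock the row we are about to modify and read its latest committed state.
  $row = $dbw->newSelectQueryBuilder()
      ->select( [ 'page_touched' ] )
      ->from( 'page' )
      ->where( [ 'page_id' => $pageId ] )
      ->forUpdate()
      ->caller( __METHOD__ )
      ->fetchRow();
  // ... decide what to write based on $row, then UPDATE within the same
  // transaction; the lock is released when the transaction commits.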

Each individual write query should normally take less than 5ms to complete, even on the largest production databases. This may require designing and indexing for that during the planning phase, which DBAs can help you with.

Each overall transaction (all writes from one web request) should normally stay open for less than 250ms in total. This is important to reduce contention and thus how long other requests may be required to wait. Try to move updates made to highly contentious tables to the end of the request (e.g. via DeferredUpdates). For example, for an API request that stores an uploaded file in Swift (which may take a while if the file is large) and also updates a row in the database, we recommend performing the file operation first and then the database write. This reduces the time between the start of the database transaction and the end of the request.

Be mindful of hooks that run at various points during the web response; extensions can use them to implement additional logic, which may run extra database queries as part of the hook. To minimize the risk of timeouts, deadlocks, and non-atomic updates, aim for speed and simplicity for any database writes during the main transaction round. Updates that take non-trivial time or are complex should use post-send DeferredUpdates or the JobQueue when possible, to better isolate different features from one another. Use purges to invalidate caches rather than in-place cache updates.

Example: Here's a common mistake that can cause inappropriate locking. The user_properties table follows the three-column "Entity-attribute-value" pattern:

  1. user ID (entity).
  2. user preference name (attribute).
  3. user preference value.
  4. Primary key: user ID + preference name.

When saving a change, it may be tempting to delete all existing rows for the user ID and insert everything you know about the new state. But this would cause database contention. What MediaWiki does instead is to only make changes to the table by the unique primary key for each row. First, select the existing rows; then use "upsert" for the rows that have changed (which adds missing rows and replaces existing ones); and lastly "delete" the rows that are no longer needed (sketched below). This is an important example where more code and more queries produce a significantly faster and more efficient outcome at scale, despite perhaps feeling slower when measured in isolation as a single action in an empty world.
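
A hedged sketch of that pattern with illustrative preference data, assuming $dbw is a primary database connection:

  // Rows that are new or changed: add or replace them by primary key.
  $dbw->upsert(
      'user_properties',
      [ [ 'up_user' => $userId, 'up_property' => 'skin', 'up_value' => 'vector-2022' ] ],
      [ [ 'up_user', 'up_property' ] ],
      [ 'up_value' => 'vector-2022' ],
      __METHOD__
  );

  // Rows that are no longer needed: delete only those, again by primary key.
  $dbw->delete(
      'user_properties',
      [ 'up_user' => $userId, 'up_property' => [ 'obsolete-pref' ] ],
      __METHOD__
  );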

General guidelines:

  • Nearly all tables should have an immutable "ID" column, especially if there is a "name" column. ID columns should usually be unsigned integers with values assigned by the database (e.g. AUTO_INCREMENT). In rare cases, a UUID column might be used instead. ID columns are essential for efficient indexes, low-maintenance foreign keys, and simplicity of schema migration scripts.
  • Large blob values that will be permanent and numerous typically go in ExternalStore. If the blobs are ephemeral and either numerous or frequently updated, then they usually go in MainStashDB. Heavy blob writes on the main database clusters (e.g. s1-s8) can lead to contention and poor performance for many reasons (e.g. purge lag, replication lag, poor use of buffer pool, poor use of disk space)
  • Try to design the schema, classes, and web APIs around single-record create/update/delete access patterns and multi-record read/list access patterns. If the API needs to support multi-record write operations, then a loop will often suffice. If hundreds of records are often involved or if the entry point becomes high traffic, then batched write query optimizations become justified. Most batching optimizations should be focused on APIs that can fetch multiple records by ID/name or page through records by ID/name/timestamp (possibly involving a JOIN with other records).
  • Wrap write operations that are meant to be atomic in IDatabase::startAtomic() and IDatabase::endAtomic() for each affected database. These methods can nest, so atomic method A can invoke atomic methods B, C, and D. Declaring atomic sections matters most when the code runs in CLI mode (where automatic transactions are disabled); see the sketch after this list. Only use IDatabase::ATOMIC_CANCELABLE if you need to call IDatabase::cancelAtomic() in some cases, as it incurs extra round trips.
  • When reading data via SELECT to determine what write query to issue, you should normally use "SELECT ... FOR UPDATE". Avoid using "LOCK IN SHARE MODE".
  • APIs for multi-record writes should require explicit specification of the IDs of the affected items when possible (IDs are preferred over names). If deadlocks or lock wait timeouts become a problem, consider using one of the following strategies (in order of preference).
    1. Consider moving high-contention row writes to the end of the transaction via IDatabase::onTransactionPrecommitOrIdle().
    2. Consider moving slow non-database operations to before high-contention database queries. If it is OK for the database operation to sometimes succeed while the non-database operation failed, consider using a DeferredUpdate.
    3. Consider moving slow database operations to a DeferredUpdate if they do not need to be highly atomic with respect to the main changes.
    4. If the records must be implicitly specified by a hierarchical relationship (e.g. "all comment rows for this thread row"), consider first locking the parent record via "SELECT ... FOR UPDATE" at the start of the operation. This can reduce deadlocks caused by concurrent operations to similar parts of the underlying B*-Trees in MySQL.
    5. If the records must be implicitly specified based on a broader conditional (e.g. inequalities, JOINs, sub-queries), consider selecting the ID of the matching rows first, without "FOR UPDATE" and then changing the rows with those IDs that still match the condition (e.g. restate any WHERE clause). This can reduce contention and deadlocks caused by MySQL "gap locking". Note other concurrent transactions can insert rows that would have matched the condition and commit before the transaction ends. Any associated risk to the API semantics should be considered. Sometimes, leaving out the restated WHERE in an UPDATE/DELETE query can reduce contention (e.g. if query planner index use is poor), but this increases the risk of anomalies (e.g. changing rows that no longer matched due to concurrent transaction writes).
    6. For tables with a parent/child relationship (e.g. page/revision or thread/comment), where the parent table has a "touched"/"CAS" column, a hybrid of the two above approaches can be used. If all operations to a child table row first do a "SELECT ... FOR UPDATE" of the corresponding parent table row, then check that the SELECT without "FOR UPDATE" yields the same values for "touched"/"CAS" column, and update that column by the end of the operation, then the child table writes could first SELECT the IDs of the matching rows, without FOR UPDATE, and then change them in second query.
    7. A variation of the above pattern uses ILoadBalancer::CONN_TRX_AUTOCOMMIT database handles and replaces FOR UPDATE with IDatabase::getScopedLock() calls on keys named after each specified parent row ID (e.g. "page-<ID>").
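
A minimal sketch of an atomic section around related writes (hypothetical table and column names; $dbw is a primary database connection):

  $dbw->startAtomic( __METHOD__ );
  $dbw->insert( 'mythread', [ 'thread_title' => $title ], __METHOD__ );
  $threadId = $dbw->insertId();
  $dbw->insert(
      'mycomment',
      [ 'comment_thread' => $threadId, 'comment_text' => $text ],
      __METHOD__
  );
  $dbw->endAtomic( __METHOD__ );
  // In a web request this joins the main transaction round; in CLI mode, where
  // automatic transactions are disabled, the atomic section supplies one.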

Long-running queries

We generally develop service classes and database queries with the expectation that they will run in response to a web request. Other contexts where your code may get called are JobQueue jobs and command-line Maintenance scripts.

MySQL uses snapshots for SELECT queries, and the snapshotting persists until the end of the database connection or transaction. Snapshots implement "REPEATABLE-READ" semantics, which ensure that within your query session you see the database as it existed at a single point in time (the time of the first SELECT). Keeping one connection open for more than a few seconds is generally problematic on regular database replicas: long connections cause MySQL to keep a temporary copy of all rows changed in the meantime, because you might query them later.

Queries that select or read data and take a long time (5 seconds or more) must be run offline and via database hosts dedicated to that purpose.

Note that when assessing whether your queries will take "a long time" or cause contention, measure them. These numbers will always be relative to the performance of the server in general, and to how often it will be run. You can also search the DBPerformance channel on logstash for the URLs or table names handled by your code.

Good example:

  • Special pages that display data based on complex queries are generated periodically by a maintenance script, run via a cron job. The queries in question use the "vslow" query group, which directs the connection to a live database replica in production set aside for slow queries (see the sketch after this list). See also: updateSpecialPages.php, QueryPage::isExpensive, and $wgMiserMode.
  • Analytics reports are standalone and separate from any deployed MediaWiki core. These are generated via periodic cron jobs on analytics clients (aka "stat" machine) that query the Analytics DB Replicas, which contain a full unredacted near-realtime mirror of the production MediaWiki databases.
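
A hedged sketch of sending a deliberately slow query to the "vslow" replica group from a maintenance script (the query itself is illustrative):

  use MediaWiki\MediaWikiServices;

  $lb = MediaWikiServices::getInstance()->getDBLoadBalancer();
  // Ask for a replica from the "vslow" group, which is set aside for slow queries.
  $dbr = $lb->getConnection( DB_REPLICA, 'vslow' );
  $res = $dbr->newSelectQueryBuilder()
      ->select( [ 'page_namespace', 'COUNT(*) AS pages' ] )
      ->from( 'page' )
      ->groupBy( 'page_namespace' )
      ->caller( __METHOD__ )
      ->fetchResultSet();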

Advisory locks

Beware of common pitfalls around simple advisory locks (e.g. from LockManager, PoolCounter, BagOStuff::getScopedLock, or Database::getScopedLock).

Make your lock key as granular and narrow as possible to allow high concurrency of other similar operations that don't conflict.

Likewise, aim to hold the lock for a short duration by releasing it as soon as the critical operation is done. This allows high throughput of operations that modify the same data. When a burst of requests all need to modify the same data (e.g. traffic spike about the same subject, or from the same bot account), each of those will be waiting in their respective "lock" function call for the previous one. The sooner you can release a lock the better!

Blocking locks are prone to deadlocks, which is a race condition where two requests are both waiting and can't continue (e.g. request 1 waits for lock B held by request 2, and request 2 waits for lock A held by request 1). This is more likely to happen if you have multiple different locks throughout your code execution. If possible, organise the code such that its locks can be acquired up front and then later released all at once. This way, once you have the locks, your code can't get stuck half-way.

Blocking locks tend to be used when code is written such that each conflicting operation is performed one after the other (additive/incremental).

Whenever possible, prefer non-blocking locks. This is possible when the code is designed such that an operation only needs to be performed by a single server at any given time, and other servers fall back to doing nothing or showing an error message. For example, can the code work such that multiple refreshes of the same data are redundant? Or, for changes that overwrite data, would one party lose either way?
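
A minimal sketch of a non-blocking lock using BagOStuff::getScopedLock; the key name is illustrative, and a timeout of 0 means "do not wait":

  use MediaWiki\MediaWikiServices;

  $stash = MediaWikiServices::getInstance()->getMainObjectStash();
  $lock = $stash->getScopedLock( 'myfeature-refresh:' . $pageId, 0 );
  if ( !$lock ) {
      // Another request is already refreshing this data; a second refresh
      // would be redundant, so simply do nothing.
      return;
  }
  // ... perform the refresh; the lock is released automatically once $lock
  // goes out of scope.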

Rate limiting

If your product exposes new user actions that make database modifications beyond the standard page creation / page editing mechanism, then firstly consider whether this is appropriate and scalable. You're going to have a lot less maintenance overhead and operational risk if you adopt "Everything is a wiki page". See also Choose boring technology by Dan McKinley.

If you do have to expose new "write" actions, make sure a rate limit is applied.

Example:

  • UrlShortener exposes an API to create new short URLs, which needs a rate limit. This is typically powered by User::pingLimiter (a sketch follows below). T133109
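
A hedged sketch inside a hypothetical API module; the 'myshorturl-create' action key is illustrative and would need a matching entry in $wgRateLimits:

  // In an ApiBase subclass handling the write action:
  $user = $this->getUser();
  if ( $user->pingLimiter( 'myshorturl-create' ) ) {
      // Over the configured rate limit: refuse the action.
      $this->dieWithError( 'apierror-ratelimited' );
  }
  // ... otherwise proceed with creating the short URL.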

For expensive computations that are not write actions, such as power-user features that may expose slow or expensive computations, consider implementing a throttle based on PoolCounter to limit overall server load.

Example:

  • Special:Contributions exposes a database read query that can be slow. This is rate limited by PoolCounter (a sketch follows below). See also T234450 and T160985.
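
A hedged sketch of throttling an expensive read with PoolCounter, loosely following that approach; the pool name must match a pool configured in $wgPoolCounterConf, and runExpensiveReportQuery() is hypothetical:

  $work = new PoolCounterWorkViaCallback(
      'MyExpensiveReport',            // hypothetical pool type
      'report:' . $targetUserId,      // key: limits concurrency per target
      [
          'doWork' => static function () use ( $targetUserId ) {
              return runExpensiveReportQuery( $targetUserId );
          },
          'error' => static function ( $status ) {
              // Too many concurrent requests: show an error instead of piling up.
              return false;
          },
      ]
  );
  $result = $work->execute();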

Work involved during cache misses

Wikimedia uses and depends heavily on many different caching layers, so your code needs to work in that environment! (But it also must work if the cache is empty.)

Cache-on-save: Wikimedia sites use a preemptive cache-repopulation strategy: if your code will create or modify a large object when the user hits "save" or "submit", then along with saving the modified object in the database/filestore, populate the right cache with it (or schedule a job in the job queue to do so). This will give users faster results than if those large things were regenerated dynamically when a request misses the cache. Localization (i18n) messages, SpamBlacklist data, and parsed text (upon save) are all aggressively cached. (See "Caching layers" for more.)

At the moment, this strategy does not work well for memcached for Wikimedia's multi-datacenter use case. A workaround when using WANObjectCache is to use getWithSetCallback as normal, but with "lockTSE" set and with a "check" key passed in. The key can be "bumped" via touchCheckKey to perform invalidations instead of using delete. This avoids cache stampedes on purge for hot keys, which is usually the main goal.
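
A hedged sketch of that workaround; the key names and computeExpensiveThing() are hypothetical:

  use MediaWiki\MediaWikiServices;

  $cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
  $checkKey = $cache->makeKey( 'mything-check', $thingId );

  $value = $cache->getWithSetCallback(
      $cache->makeKey( 'mything', $thingId ),
      $cache::TTL_DAY,
      static function () use ( $thingId ) {
          return computeExpensiveThing( $thingId ); // hypothetical computation
      },
      [
          'lockTSE' => 30,               // serialize regeneration of hot keys
          'checkKeys' => [ $checkKey ],  // invalidated via the check key below
      ]
  );

  // On change, "purge" by bumping the check key rather than deleting the value:
  $cache->touchCheckKey( $checkKey );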

If something is very expensive to recompute, then use a cache that is somewhat closer to a store. For instance, you might use the backend (secondary) Varnishes, which are often called a cache, but are really closer to a store, because objects tend to persist longer there (on disk).

Cache misses are normal: Avoid writing code that, on cache miss, is ridiculously slow. (For instance, it's not okay to run COUNT(*) and assume that a Memcached layer between the database and the user will make it all right; cache misses and timeouts eat a lot of resources. Caches are not magic.) The cluster has a limit of 180 seconds per script (see the limit in Puppet); if your code is so slow that a function exceeds the max execution time, it will be killed.

Write your queries such that an uncached computation will take a reasonable amount of time. To figure out what is reasonable for your circumstance, ask the Performance Team.

If you can't make it fast, see if you can do it in the background. For example, see some of the statistics special pages that run expensive queries. These can then be run at a dedicated time on large installations. But again, this requires manual setup work -- only do this if you have to.

Watch out for cached HTML: HTML output may be cached for a long time and still needs to be supported by the CSS and JS. Problems where old JS/CSS hang around are in some ways easier to test, but stale HTML can be insidious!

Good example: See the TwnMainPage extension. It offloads the recalculation of statistics (site stats and user stats) to the job queue, adding jobs to the queue before the cache expires. In case of cache miss, it does not show anything; see CachedStat.php. It also sets a limit of 1 second for calculating message group stats; see SpecialTwnMainPage.php.

Bad example: a change "disabled varnish cache, where previously it was set to cache in varnish for 10 seconds. Given the amount of hits that page gets, even a 10 second cache is probably helpful."

Caching layers

Caching layers to be aware of:

  1. Browser caches
    In addition to the browser's HTTP cache, we also cache JavaScript and CSS modules in HTML5 LocalStorage which acts as a defragmented browser cache that significantly reduces network requests (details at ResourceLoader/Architecture#Store).
  2. CDN cache (aka "Edge caching" or "Varnish frontend")
    The Varnish caches store entire HTTP responses, including thumbnails of images, frequently-requested article content, ResourceLoader modules, and most anything else that can be retrieved by URL.
    Wikimedia operates CDN front-ends in multiple data centers around the world to reduce latency between browser and server. See MediaWiki at WMF for some details.
  3. Object cache (backed by Memcached at WMF)
    Our object caching layers scale all the way from small ephemeral memory on each individual web server (e.g. PHP-APCu, uncoordinated and unreplicated), to large Memcached clusters, and even multi-DC replicated clusters backed by SQL databases (e.g. ParserCache). To help decide which abstraction layer to use when, refer to Object cache on mediawiki.org.
  4. MySQL internal buffer pool and query cache (transparently automated, not controllable)


Think about how you will invalidate or expire content from the various caching layers. Is it by purging? By directly pushing updates (setting keys into the cache)? Or by bumping the timestamp or version number of a URL or cache key? Your application needs will determine your cache purging strategy.

Since the CDN cache serves content by URL, URLs ought to be deterministic -- that is, they should not serve different content from the same URL. Different content belongs at a different URL. This should be true especially for anonymous users. (For logged-in users, WMF's configuration contains additional wrinkles involving session cookies).

Bad examples:

Cookies

For cookies, besides the above concerns around "Caching layers", there is also the issue that cookies bloat the payload of every request, that is, they result in more data sent back and forth, often unnecessarily. While the effect of bloated header payloads on page performance is less immediate than the impact of blowing up Varnish cache hit ratios, it is no less measurable or important. Please consider the usage of localStorage or sessionStorage as an alternative to cookies. See also Google's advice on minimizing request overhead.

Bad example: The word "Token" within your cookie name. During an incident involving the GettingStarted extension, the cookie name matched a regular expression that the Varnish CDN recognises as indicating a logged-in pageview, which must never be cached. See the code, an initial revert, another early fix, another revert commit, the Varnish layer workaround, the followup fix, the GettingStarted fix part 1 and part 2, and the regex fix.

Further reading

Articles

Talks

ResourceLoader

For frontend performance guidelines, see Performance/Guides/Frontend performance practices.

Meta

Sources that helped influence these guidelines, and future drafts and ideas:

References

  1. Measuring Wikipedia page load times (2018), Timo Tijhof.
  2. HPBN Book, Chapter: Primer on Web Performance (2013), Ilya Grigorik.
  3. “How Not To Measure Latency” (2016), a tech talk by Gil Tene.
  4. For more information about the sad state of media storage, refer to task T28741 and the various parent tasks and "Mentions" tasks detailing its problems.