You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

News/Wiki Replicas 2020 Redesign

From Wikitech-static
< News
Revision as of 16:33, 4 November 2020 by imported>Jhernandez (Jhernandez moved page News/Wikireplicas 2020 Redesign to News/Wiki Replicas 2020 Redesign: The name is "Wiki Replicas" not "Wikireplicas")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

In the interest of making and keeping Wiki Replicas a stable and performant service, a new backend architecture is needed. There is some impact in the features and usage patterns.

What is changing?

Unfortunately, we need to make this architectural change in order to continue to provide a stable and performant service. The main changes are:

Timeline

  • October 2020
    • Hardware purchased; prepping for installation
  • October - December 2020
    • New hardware is brought online, critical services updated
    • Testing with early adopters
  • January 2021
    • At this point, both clusters should be operational, allowing for a transition period
    • Begin user migration
  • February 2021
    • Old hardware is decommissioned. All tools and services have been transitioned.

What should I do?

Update existing tools to ensure queries are executed against the proper database connection

Because you Can only query a database if you are connected to it explicitly, you should utilize the database hostnames in the form of <db_name>.db.svc.eqiad.wmflabs.

Don't connect to enwiki and then query against frwiki for example. Instead connect to frwiki.db.svc.eqiad.wmflabs, then execute the query.

Check your existing tools and services queries for cross slice JOINs

Because Cross slice JOINs will no longer work as written, check your services/tools and if you find cross JOINs you will have to separate the queries and use a connection for each database, and do the required data processing or filtering within the tool or service instead.

You can see examples in the Solutions to common problems section.

What are the primary changes with the new architecture?

Given the large change to the underlying architecture, our existing tools and queries will be impacted. In particular, each instance will now contain only a single slice. So:

Cross slice JOINs will no longer work as written

Unfortunately, this means cross joins from, for instance, enwiki to commonswiki or wikidatawiki are no longer possible.

There are different ways to accomplish the same results given the limitations, but they vary based on the query and where it was being run from. See Solutions to common problems for migration suggestions.

Can only query a database if you are connected to it explicitly

For example, connecting to enwiki to query against avwiki will no longer work.

You will need to make a specific connection to each database you want to query against. See Update existing tools to ensure queries are executed against the proper database connection and Solutions to common problems for examples.

Solutions to common problems

WIP

How do I cross reference data between wikis like I do with cross joins today?

All existing data can still be queried, but you will have to use specific connections to each database to do so.

The resulting datasets can then be processed in code as desired as part of a tool or service. The key will be to migrate the filtering in the SQL query to the codebase instead.

Unfortunately, the diversity of tools, services, and queries means there isn't a direct answer that can be provided. Browse around the Solutions to common problems to see if you can find an example that helps you, or otherwise reach out for support.

How will the wikis map to physical nodes/slices?

By connecting directly to the database for a query, this implementation detail shouldn't be something to worry about. However, the current mappings are here. These mappings are not expected to radically change as part of this process.


Why are we doing this?

Wiki Replicas existing setup relies on a single MariaDB instance for all Wikimedia data. As of October 2020, this includes over 900 wikis, comprised of 6,000 tables, and totalling over 7 TB’s after full compression.

This setup limits our ability to scale and maintain a performant and stable service. Based on community feedback and our own observations running the service:

  1. Given the large amount of data, the service struggles to keep nodes in sync. This results in poor performance
    • Recovering from data corruption is a concern, given the current performance
    • Even maintaining current capacity within the cluster has become challenging and more hardware is needed
  2. Usage and query complexity continue to increase, resulting in reduced performance
  3. Our current architectural design limits our ability to optimize for complex queries, such as those used to do analytics and OLAP

New architecture

Given the challenges outlined above, we plan to change the architectural model from "Multi-source" (lots of slices smashed into one MariaDB instance) to "Multi-instance" (multiple MariaDB instances on a single physical node with each instance only handling a single slice).

The new physical layout will be 8 slices * 2 types (web + analytics) * 2 instances each == 32 database instances. Deployed as multi-instance with 4 instances per physical node == 8 nodes.

Impact on other services

Services like Quarry and PAWS will need to be updated to understand where to route a query as there will now be multiple MariaDB instances.

How can I help?

Mitigation for cross slice JOIN's will require changes to existing tools and services. Providing help and guidance to developers and other community members is crucial to ensuring existing workflows can continue uninterrupted. Consider:

  • Sharing examples / descriptions of how a tool or service was updated
  • Writing a common solution or some example code others can utilize and reference
  • Helping others on IRC and the mailing lists

Related Tickets