This page contains information about the infrastructure used for the Add a Link structured task project (task T252822)
- The Link Recommendation Service is accessible via HTTP (see task T258978): it accepts a POST request containing the wikitext of an article and returns a structured response of link recommendations for that article. It has no caching or storage of its own; the client (MediaWiki) is responsible for that (task T261411).
- The search index stores metadata about which articles have link recommendations, via a per-article field we set
- A MySQL table per wiki is used for caching the actual link recommendations (task T261411); each row contains serialized link recommendations for a particular article.
- A maintenance script regularly generates link recommendations by iterating over each Search/articletopic and calling the Link Recommendation Service
- The maintenance script caches the results in the MySQL table, then sends an event to Event_Platform/EventGate, where the Search pipeline ensures that the index is updated with the links/nolinks metadata for the article.
- On page edit (when the edit is not made via the Add Link UX), link recommendations are regenerated via the job queue, using the same code and APIs as the maintenance script
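To make the client's side of the flow above concrete, here is a minimal sketch of parsing the service's structured response before caching it. The response shape, field names, and probability threshold are all assumptions for illustration; the real schema is defined by the service (task T258978).

```python
import json

# Hypothetical structured response from the Link Recommendation Service;
# the actual schema may differ in field names and nesting.
example_response_body = json.dumps({
    "links": [
        {"phrase": "machine learning", "target": "Machine learning",
         "probability": 0.93},
        {"phrase": "neural network", "target": "Artificial neural network",
         "probability": 0.41},
    ],
})

def usable_recommendations(response_body, threshold=0.5):
    """Parse the service response and keep links above a probability cutoff,
    as a client might before serializing them into its per-wiki cache table."""
    links = json.loads(response_body)["links"]
    return [link for link in links if link["probability"] >= threshold]

kept = usable_recommendations(example_response_body)
print([link["target"] for link in kept])  # -> ['Machine learning']
```

A real client would obtain `response_body` from an HTTP POST to the service and then store the filtered list, serialized, as one row per article in the MySQL cache table.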
Link Recommendation Service
The repository for training the link recommendation model as well as for the query service is at https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/.
The service will be deployed in production using the Deployment pipeline.
The link recommendation model is trained on a stat* machine, where several MySQL tables per wiki are filled with dictionary lookup data. The production query service (which MediaWiki interacts with) reads data from those MySQL tables.
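As a rough illustration of the dictionary-lookup idea, the sketch below uses an in-memory dict as a stand-in for those per-wiki MySQL tables. The table contents, anchor normalization, and probability threshold are assumptions for illustration, not the actual mwaddlink schema.

```python
# Toy stand-in for the per-wiki dictionary lookup tables: anchor text
# mapped to candidate link targets with model-style probabilities.
ANCHORS = {
    "mercury": {"Mercury (planet)": 0.7, "Mercury (element)": 0.3},
    "python": {"Python (programming language)": 0.8, "Pythonidae": 0.2},
}

def best_target(anchor_text, min_probability=0.5):
    """Return the most likely link target for an anchor text, or None when
    no candidate is confident enough to recommend."""
    candidates = ANCHORS.get(anchor_text.lower())
    if not candidates:
        return None
    title, probability = max(candidates.items(), key=lambda item: item[1])
    return title if probability >= min_probability else None

print(best_target("Mercury"))  # -> Mercury (planet)
print(best_target("quartz"))   # -> None
```

In production the lookup runs against MySQL rather than a dict, but the shape of the query (anchor in, ranked candidate targets out) is the same.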
Diagrams forthcoming :)
Open questions
- Store the link recommendations in WANObjectCache or in a MySQL table? (task T261411; needs SRE/DBA input)
- How to get a MySQL database from a stat* server to a production MySQL instance (SRE/Analytics)
- Do we want to use the job queue to regenerate link recommendations (Growth)
Resolved questions / decisions
- 15 October 2020: use wikitext for training the model, generating dictionary data, and as input to the mwaddlink query service. We will search for phrases in VE's editable content surface rather than attempt to apply offsets from wikitext / Parsoid HTML.
19 - 23 October 2020
12 - 16 October 2020
- Growth / Research: Parsoid HTML vs wikitext, repo structure, MySQL vs SQLite, misc other things
- Growth: Engineers met to discuss schedule, order of tasks, etc.
5 - 9 October 2020
- Growth / Editing: Exploring ways to bring link recommendation data into VisualEditor
- Growth / Research: Discussing repository structures in preparation for deployment pipeline setup
- Growth / SRE / Research: Discussing how to get mwaddlink-query / mwaddlink into production
Teams / Contact
Roles / responsibilities
- Growth: User-facing code, integration with our existing newcomer tasks framework, plus a maintenance script to populate the cache with recommendations
- Research: Implementing code to train models and provide a query client (research/mwaddlink repo)
- SRE: Working with Growth + Research to put the link recommendation service into production
- Search Platform: Implementing the event pipeline to update the search index metadata for a document when new link recommendations are generated
- Release Engineering: Consulting with Growth for deployment pipeline
- Editing: Consulting with Growth for VE integration
- Parsing: Consulting with Growth for VE integration