Add Link
This page contains information about the infrastructure used for the Add a Link structured task project ({{Phabricator|T252822}}).
== High-level summary ==
The '''Link Recommendation Service''' recommends phrases of text in an article to link to other articles on a wiki. Users can then accept or reject these recommendations.
# The service is an application hosted on Kubernetes with an API accessible via HTTP (see {{Phabricator|T258978}}). It accepts a POST request containing the wikitext of an article and responds with a structured list of link recommendations for that article. It does not have caching or storage; the client (MediaWiki) is responsible for that ({{Phabricator|T261411}}).
# The search index stores metadata about which articles have link recommendations via a field we set per article ({{Phabricator|T261407}}, {{Phabricator|T262226}}).
# A MySQL table per wiki is used for caching the actual link recommendations ({{Phabricator|T261411}}); each row contains the serialized link recommendations for a particular article.
# A maintenance script ({{Phabricator|T261408}}) runs hourly per enabled wiki to generate link recommendations by iterating over each [[Search/articletopic|article topic]] and calling the Link Recommendation Service to request recommendations (a hedged example of invoking the script by hand follows this list).
#* The maintenance script caches the results in the MySQL table, then sends an event to [[Event_Platform/EventGate]], where the [[Search]] pipeline ensures that the index is updated with the links/nolinks metadata for the article.
#* On page edit (when the edit is not made via the Add Link UX), link recommendations are regenerated via the job queue, using the same code and APIs as the maintenance script (n.b. we might do this differently; not yet implemented).
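The maintenance script referenced above is <code>refreshLinkRecommendations.php</code> in GrowthExperiments (see ''Enabling on a new wiki'' below). A minimal, hedged sketch of running it by hand for a single wiki, following the <code>mwscript</code> convention used elsewhere on this page; in production the hourly systemd job takes care of this:
<syntaxhighlight lang="bash">
# Sketch only: run the link recommendation refresh by hand for one wiki
# (cswiki is just an example). The script iterates over article topics,
# calls the Link Recommendation Service, and caches results in MySQL.
mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php cswiki
</syntaxhighlight>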
=== Diagram: Fetching and completing link recommendation tasks ===
[[File:Link recommendation service (task fetch and completion).svg|Link recommendation service (task fetch and completion)]]
Source: [[Add_Link/Diagram:_Fetching_and_completing_link_recommendation_tasks]]
== Link Recommendation Service ==
=== Repository ===
The code for training the link recommendation model and for the query service lives in a single repository:
{{SourceLinks|url=https://gerrit.wikimedia.org/r/plugins/gitiles/research/mwaddlink/|text=research/mwaddlink|text2=|url2=}}
=== Machine learning model ===
An explanation of how the model works can be found on the [[:m:Research:Link_recommendation_model_for_add-a-link_structured_task|research page on Meta]].
=== Local development ===
Please see the [[gerrit:plugins/gitiles/research/mwaddlink/+/refs/heads/main|README]] in the research/mwaddlink repository for the available setup options, including docker-compose, Vagrant, and running directly on the host system.
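For orientation, a minimal sketch of the docker-compose route (the clone URL follows the usual Gerrit pattern; required environment variables, ports, and dataset loading are not shown — the README is authoritative):
<syntaxhighlight lang="bash">
# Clone the repository and start the development stack with docker-compose.
# Consult the README for environment variables and for loading datasets.
git clone "https://gerrit.wikimedia.org/r/research/mwaddlink"
cd mwaddlink
docker-compose up
</syntaxhighlight>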
=== API ===
* [https://api.wikimedia.org/wiki/API_reference/Service/Link_recommendation api.wikimedia.org documentation]
* [https://api.wikimedia.org/service/linkrecommendation/apidocs/ Swagger]
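For a quick manual test against the public endpoint, something like the following should work; this assumes the public path mirrors the internal path shape used in the deployment checks below, so treat the API documentation above as authoritative:
<syntaxhighlight lang="bash">
# Fetch link recommendations for one article via the public API gateway.
# threshold and max_recommendations mirror the parameters used in the
# deployment verification commands later on this page.
curl -s "https://api.wikimedia.org/service/linkrecommendation/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15" | jq .
</syntaxhighlight>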
=== Deployment ===
The service is deployed in production using the [[Deployment pipeline]]. The configuration specific to the service is in the deployment-charts repository:
{{SourceLinks|url=https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/charts/linkrecommendation/|text=charts/linkrecommendation|url2=https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master/helmfile.d/services/linkrecommendation/|text2=helmfile.d/services/linkrecommendation}}
=== Dataset pipeline ===
The link recommendation model is trained on the [[stat1008]] server (because of the pipeline's high CPU needs and because stat1008 has access to production systems) with the <code>run-pipeline.sh</code> script. That script aggregates MediaWiki data from Hive into several MySQL lookup tables per wiki (for more details, see the ''Training the model'' section of the README). Those tables (stored in the <code>staging</code> database with an <code>lr_</code> prefix) are then exported and published via [[datasets.wikimedia.org]] with the <code>publish-datasets.sh</code> command. The production query service (that MediaWiki interacts with) polls for changes and imports those datasets into its own MySQL instance in Kubernetes ({{Phabricator|T266826}}).
The canonical location for training new models and publishing datasets is <code>/home/mgerlach/REPOS/mwaddlink-gerrit</code> on stat1008.
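A rough sketch of a full training-and-publish run, with arguments intentionally omitted; the exact invocation is documented in the ''Training the model'' section of the README:
<syntaxhighlight lang="bash">
# On stat1008, from the repository checkout mentioned above.
cd /home/mgerlach/REPOS/mwaddlink-gerrit
# Build the per-wiki lookup tables (staging database, lr_ prefix) from Hive data.
./run-pipeline.sh
# Export those tables and publish them to datasets.wikimedia.org,
# where the production service will pick them up.
./publish-datasets.sh
</syntaxhighlight>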
=== Monitoring ===
* [https://grafana.wikimedia.org/d/CI6JRnLMz/linkrecommendation?orgId=1 Grafana dashboard]
* [https://logstash.wikimedia.org/app/kibana#/dashboard/6027a870-7ffc-11eb-8ab2-63c7f3b019fc Logstash]
== Resolved questions / decisions ==
* 10 December: How to get a MySQL database from a stat* server to a production MySQL instance (SRE/Analytics) ({{Phabricator|T266826}})
* 23 October: Store the link recommendations in WANObjectCache or in a MySQL table? {{Phabricator|T261411}} (needs SRE/DBA input)
* 15 October: Use wikitext for training the model, generating dictionary data, and as input to the mwaddlink query service. Phrases will be searched for in VE's editable content surface rather than attempting to apply offsets from wikitext / Parsoid HTML.
== Deployment ==
{{Notice|text=The canonical documentation is at [[Deployments on kubernetes]]}}
{{Warning|content=If you change the default values.yaml, you need to release a new chart version by bumping the version in Chart.yaml.}}
Make your patch to <code>operations/deployment-charts</code>. Typically it will only change the value of the <code>main_app.version</code> field in <code>helmfile.d/services/linkrecommendation/values.yaml</code>, setting it to the new image tag mentioned in PipelineBot's comment on the last merged <code>research/mwaddlink</code> patch ([[gerrit:c/operations/deployment-charts/+/772398|example]]). See [[Deployments on kubernetes]] for tips, and note that 1) self-merges are OK and 2) the repository on the deployment server will update about a minute after the patch is merged.
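For illustration only, a hedged sketch of making that change locally (the image tag shown is made up; take the real one from PipelineBot's comment, and note the warning above about bumping <code>Chart.yaml</code> when default chart values change):
<syntaxhighlight lang="bash">
# In a local checkout of operations/deployment-charts (sketch, not a recipe).
cd helmfile.d/services/linkrecommendation
grep -n 'version:' values.yaml   # shows the current main_app.version image tag
# Edit values.yaml so that main_app.version points at the new tag, e.g.
#   main_app:
#     version: 2022-05-25-000000-production   # hypothetical tag
git commit -am "linkrecommendation: bump image version"
</syntaxhighlight>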
Then, SSH to a [[Deployment server]].
=== staging ===
{{Terminal|title=Staging|text=$ cd /srv/deployment-charts/helmfile.d/services/linkrecommendation/
$ git log # Make sure your deployment patch is there
$ helmfile -e staging -i apply # scan output to see if the changes are expected, press "enter"
$ service-checker-swagger staging.svc.eqiad.wmnet https://staging.svc.eqiad.wmnet:4005 -t 2 -s /apispec_1.json
# Manually verifying requests
$ curl "https://staging.svc.eqiad.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"
# Against production
$ diff <(curl -s "https://linkrecommendation.discovery.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15" {{!}} jq .) <(curl -s "https://staging.svc.eqiad.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15" {{!}} jq .)
}}
=== eqiad ===
{{Terminal|title=eqiad|text=$ cd /srv/deployment-charts/helmfile.d/services/linkrecommendation/
$ git log # Make sure your deployment patch is there
$ helmfile -e eqiad -i apply # scan output to see if the changes are expected, press "enter"
# Internal traffic release
$ service-checker-swagger linkrecommendation.discovery.wmnet https://linkrecommendation.discovery.wmnet:4005 -t 2 -s /apispec_1.json
# External traffic release
$ service-checker-swagger linkrecommendation.discovery.wmnet https://linkrecommendation.discovery.wmnet:4006 -t 2 -s /apispec_1.json
# Manually verifying requests
$ curl "https://linkrecommendation.discovery.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"
$ curl "https://linkrecommendation.discovery.wmnet:4006/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"}}
=== codfw ===
{{Terminal|title=codfw|text=$ cd /srv/deployment-charts/helmfile.d/services/linkrecommendation/
$ git log # Make sure your deployment patch is there
$ helmfile -e codfw -i apply # scan output to see if the changes are expected, press "enter"
# NB the following requests will go to the active datacenter, so if eqiad is active and you're deploying to codfw, these requests will go to eqiad.
# Internal traffic release
$ service-checker-swagger linkrecommendation.discovery.wmnet https://linkrecommendation.discovery.wmnet:4005 -t 2 -s /apispec_1.json
# External traffic release
$ service-checker-swagger linkrecommendation.discovery.wmnet https://linkrecommendation.discovery.wmnet:4006 -t 2 -s /apispec_1.json
# Manually verifying requests
$ curl "https://linkrecommendation.discovery.wmnet:4005/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"
$ curl "https://linkrecommendation.discovery.wmnet:4006/v1/linkrecommendations/wikipedia/cs/Barack_Obama?threshold=0.5&max_recommendations=15"}}
=== Checking output from a container ===
{{Terminal|text=$ kube_env linkrecommendation staging
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
linkrecommendation-staging-7476db744d-w8bms 3/3 Running 0 7h47m
tiller-974b97fc7-rq4dn 1/1 Running 0 30h
$ kubectl logs -f linkrecommendation-staging-7476db744d-w8bms
Error from server (BadRequest): a container name must be specified for pod linkrecommendation-staging-7476db744d-w8bms, choose one of: [linkrecommendation-staging staging-metrics-exporter linkrecommendation-staging-tls-proxy]
$ kubectl logs -f linkrecommendation-staging-7476db744d-w8bms -c linkrecommendation-staging}}
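If the installed <code>kubectl</code> is new enough, you can also follow every container in the pod at once rather than naming one:
<syntaxhighlight lang="bash">
# Tail logs from all containers of the pod from the example above
# (requires a kubectl version that supports --all-containers).
kubectl logs -f linkrecommendation-staging-7476db744d-w8bms --all-containers
</syntaxhighlight>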
== Enabling on a new wiki ==
Enabling on a new wiki, once the models have been set up, involves the following steps:
* Add task configuration for the <code>link-recommendation</code> task type. Typically this would be done by running a command like
:<syntaxhighlight lang="bash">
PHAB=T123456
for WIKI in wiki1 wiki2 wiki3 ...; do
    ORIGIN=`mwscript getConfiguration.php $WIKI --settings 'wgCanonicalServer' --format json | jq --raw-output '.wgCanonicalServer'`
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
        --page MediaWiki:NewcomerTasks.json \
        --create-only \
        --json \
        --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
        link-recommendation \
        '{ "type": "link-recommendation", "group": "easy" }'
    jq "select(.wiki==\"$WIKI\" and .probability > 0.25) | .section" wiki_sections.jsonl \
        | jq --slurp --compact-output "unique" \
        | mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --json \
            --summary "machine-generated configuration for excluding sections from link recommendations ([[phab:$PHAB]]), feel free to improve" \
            link-recommendation.excludedSections \
            "`cat`"
    echo "$ORIGIN/wiki/MediaWiki:NewcomerTasks.json"
    echo "$ORIGIN/w/index.php?title=MediaWiki:NewcomerTasks.json&diff=next"
    read # give time for manual verification
done
</syntaxhighlight>
:on [[mwmaint1002]]. (The <code>wiki_sections.jsonl</code> file can be found [https://phabricator.wikimedia.org/F35092312 here]; see [https://phabricator.wikimedia.org/T306792#7897336 T306792#7897336] for how it was produced. A small worked example of the jq filtering step follows this list.)
* Set <code>$wgGENewcomerTasksLinkRecommendationsEnabled</code> to true for the target wikis.
* Wait a few days so the systemd job running <code>refreshLinkRecommendations.php</code> can generate tasks.
* Set <code>$wgGELinkRecommendationsFrontendEnabled</code> to true for the target wikis.
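As a worked example of the section-exclusion step above: the jq pipeline turns per-wiki section statistics into a unique JSON array of section titles. The input lines below are made up for illustration:
<syntaxhighlight lang="bash">
# Hypothetical wiki_sections.jsonl entries for a single wiki.
printf '%s\n' \
  '{"wiki":"cswiki","section":"Reference","probability":0.91}' \
  '{"wiki":"cswiki","section":"Externí odkazy","probability":0.87}' \
  '{"wiki":"cswiki","section":"Galerie","probability":0.10}' \
  | jq 'select(.wiki=="cswiki" and .probability > 0.25) | .section' \
  | jq --slurp --compact-output 'unique'
# Output: ["Externí odkazy","Reference"]
</syntaxhighlight>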
== Updates ==
=== 9 November - 10 December 2020 ===
* Growth / Research: Continued refactoring of research/mwaddlink for production-ready status
* Growth: Backend patches for GrowthExperiments for consuming research/mwaddlink data
* Growth / SRE: Deployed linkrecommendation service to production (no datasets yet though)
* DBA: Created database and read/write users for the production kubernetes instance to access
* Search: Working on consuming event(s) generated by the service
=== 2 - 6 November 2020 ===
* Growth / Analytics Engineering: [https://docs.google.com/document/d/1aX19lWCDz-oP4Nl0CcdpYC-0_C4ipo_24_opMBx-hDA/edit#heading=h.ua4w3d1iuahi Discuss pipeline for MySQL on stat1008 -> production MySQL]
=== 26 - 30 October 2020 ===
* Growth / Research: Recap architecture and discuss milestones
* Growth / SRE / DBA: Agreed to use MySQL for lookup tables for the link recommendation service
* Growth: Continued prototyping of the VisualEditor integration; continued work on deployment pipeline; initial work on HTTP API via Flask; addition of MySQL cache table in GrowthExperiments along with general infrastructure for reading/writing to the cache
=== 19 - 23 October 2020 ===
* Growth / Research: Working on deployment pipeline for mwaddlink
* Growth: Prototyping VisualEditor integration
* Growth: Beginning work on maintenance script and supporting classes
=== 12 - 16 October 2020 ===
* Growth / Research: Parsoid HTML vs wikitext, repo structure, MySQL vs SQLite, misc other things
* Growth: Engineers meet to discuss schedule, order of tasks, etc.
=== 5 - 9 October 2020 ===
* Growth / Editing: Exploring ways to bring link recommendation data into VisualEditor
* Growth / Research: Discussing repository structures in preparation for deployment pipeline setup
* Growth / SRE / Research: Discussing how to get mwaddlink-query / mwaddlink into production
== Teams / Contact ==
[[mw:Growth|Growth]] (primary stakeholder; the technical contact for the project is [[mw:User:KHarlan_(WMF)|Kosta Harlan]], the product owner is [[mw:User:MMiller_(WMF)|Marshall Miller]]). Other teams: [[mw:Wikimedia_Search_Platform|Search Platform]], [[mw:Wikimedia_Site_Reliability_Engineering|SRE]], Release Engineering, [[mw:Wikimedia_Research|Research]], [[mw:Editing_team|Editing]], [[mw:Parsing|Parsing]]
=== Roles / responsibilities ===
* Growth: User-facing code, integration with our existing newcomer tasks framework, plus the maintenance script that populates the cache with recommendations
* Research: Implementing the code to train models and provide a query client (the research/mwaddlink repo)
* SRE: Working with Growth and Research to put the link recommendation service into production
* Search Platform: Implementing the event pipeline that updates the search index metadata for a document when new link recommendations are generated
* Release Engineering: Consulting with Growth on the deployment pipeline
* Editing: Consulting with Growth on the VE integration
* Parsing: Consulting with Growth on the VE integration
== Background reading ==