
Add Image

Revision as of 23:17, 4 August 2022 by imported>Gergő Tisza

This page contains information about the infrastructure used for the Add Image structured task project (T285587). For project information, see mw:Growth/Personalized first day/Structured tasks/Add an image.

High-level summary

Add Image is the infrastructure behind a feature which recommends images to be added to articles which don't have any, and provides a streamlined editing interface for doing so. It consists of:

  • A WIP data pipeline (for details see this merge request, for a quick overview see this image) which
    • creates a Hive dataset of articles (on any project wiki) with no images, and image recommendations based on images in other Wikimedia projects which are connected to the article in some way via Wikidata;
    • loads that dataset into the CirrusSearch index as recommendation.image/exists|1 weighted tags;
    • exports the dataset to Cassandra.
  • A hasrecommendation:image CirrusSearch keyword for searching for articles with recommendations
  • An internal image recommendation API (repo, user docs, ops docs) that provides the information in the Cassandra dataset for the queried page IDs.
    • Previously, a proof-of-concept API implementation (sandbox, repo, project page) was used; it is still the only API that is publicly available. (Tracker for a single public API: T306349.)
  • Integration with the structured task functionality of the GrowthExperiments extension: a browsing interface on Special:Homepage and VisualEditor-based custom editing interface.
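As a rough illustration of how the hasrecommendation:image keyword could be used from a script, the sketch below builds a request URL for the standard MediaWiki Action API search module (action=query, list=search). The helper name build_search_url and the example API endpoint are assumptions made for illustration; whether a given wiki actually has the keyword enabled and populated is not guaranteed by this sketch.

```python
from urllib.parse import urlencode


def build_search_url(api_base: str, extra_terms: str = "", limit: int = 10) -> str:
    """Build an Action API search URL for articles with image recommendations.

    `api_base` is a hypothetical endpoint supplied by the caller; this helper
    only assembles the URL and does not perform any network request.
    """
    # The CirrusSearch keyword described above; extra terms are appended as-is.
    query = "hasrecommendation:image"
    if extra_terms:
        query = f"{query} {extra_terms}"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# Example endpoint is an assumption; substitute the wiki you are testing on.
url = build_search_url("https://en.wikipedia.org/w/api.php")
print(url)
```

The URL is only constructed here, so the sketch can be checked offline before pointing it at a real wiki.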

Infobox exclusion

The GrowthExperiments extension adds a new hastemplatecollection:<collection> CirrusSearch keyword for searching for articles containing any one of a list of templates (typically a list so long that hastemplate: cannot be used). This keyword is used for excluding articles with infoboxes: the extension defines the infobox and infoboxtest collections based on the GEInfoboxTemplates and GEInfoboxTemplatesTest community configuration fields, respectively.

To update the infobox list, first set GEInfoboxTemplatesTest, then check which infobox-containing articles the change would affect. The search hastemplatecollection:infoboxtest -hastemplatecollection:infobox lists the articles that would be added to the filter, and the search -hastemplatecollection:infoboxtest hastemplatecollection:infobox lists the articles that would be removed from it.
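The two comparison searches can also be generated programmatically, for example when checking several wikis in a row. The following is a minimal sketch; the function name collection_diff_queries is hypothetical, and the collection names default to the infobox and infoboxtest collections mentioned above.

```python
def collection_diff_queries(current: str = "infobox", candidate: str = "infoboxtest") -> dict:
    """Return the search strings comparing two template collections.

    "added":   articles matched by the candidate collection but not the current one.
    "removed": articles matched by the current collection but not the candidate one.
    """
    added = f"hastemplatecollection:{candidate} -hastemplatecollection:{current}"
    removed = f"-hastemplatecollection:{candidate} hastemplatecollection:{current}"
    return {"added": added, "removed": removed}


queries = collection_diff_queries()
for label, search in queries.items():
    print(f"{label}: {search}")
```

Each string can be pasted into the wiki's search box or passed as the search term of an API request.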

The list of infoboxes is generated by the tgr/infobox-templates script.

See also

  • Add Link, the previous structured task project