Add Image: Difference between revisions
imported>Gergő Tisza (Created page with "This page contains information about the infrastructure used for the Add Image structured task project (phab:T285587). For project information, see mw:Growth/Personalized first day/Structured tasks/Add an image. == High-level summary == Add Image is the infrastructure behind a feature which recommends images to be added to articles which don't have any, and provides a streamlined editing interface for doing so. It consists of: * A dataset (currently a one-off)...") |
imported>Kosta Harlan |
Latest revision as of 07:57, 17 May 2023
This page contains information about the infrastructure used for the Add Image structured task project (T285587). For project information, see mw:Growth/Personalized first day/Structured tasks/Add an image. (Parts of the infrastructure are used for other image recommendation features, too; currently this page is written from the POV of maintaining Growth team features.)
High-level summary
Add Image is the infrastructure behind a feature which recommends images to be added to articles which don't have any, and provides a streamlined editing interface for doing so. It consists of:
- A WIP data pipeline (see mw:Structured_Data_Across_Wikimedia/Image_Suggestions/Data_Pipeline; for details see merge request 51 at gitlab:repos/generated-data-platform/datapipelines/-/merge_requests/51/diffs, for a quick overview see the image phab:F35030053) which
  - creates a Hive dataset of articles (on any project wiki) with no images, and image recommendations based on images in other Wikimedia projects which are connected to the article in some way via Wikidata;
  - loads that dataset into the CirrusSearch index as recommendation.image/exists|1 weighted tags (see Search/WeightedTags);
  - exports the dataset to Cassandra.
- A hasrecommendation:image CirrusSearch keyword for searching for articles with recommendations.
- An internal image recommendation API (repo: https://gerrit.wikimedia.org/r/plugins/gitiles/generated-data-platform/datasets/image-suggestions, user docs: mw:Platform_Engineering_Team/Data_Value_Stream/Data_Gateway#Image_Suggestions, ops docs: Image-suggestion) that provides the information in the Cassandra dataset for the queried page IDs.
  - Previously a proof-of-concept API implementation (sandbox: https://image-suggestion-api.wmcloud.org/?doc, repo: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/services/image-suggestion-api, project page: mw:Core_Platform_Team/Initiatives/Image_Suggestion_API) was used, and it is still the only API that's publicly available. (Tracker for a single public API: T306349.)
- Integration with the structured task functionality of the GrowthExperiments extension: a browsing interface on Special:Homepage and a VisualEditor-based custom editing interface.
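As a rough illustration of how the hasrecommendation:image keyword could be exercised from outside the wiki, the sketch below builds a standard MediaWiki search API URL (action=query, list=search). The function name, the example wiki URL and the default limit are assumptions made for illustration; whether the keyword returns results depends on the wiki having recommendation data loaded.

```python
from urllib.parse import urlencode


def recommendation_search_url(api_base, extra_terms="", limit=10):
    """Build a search API URL for articles tagged with an image recommendation.

    api_base is the wiki's api.php endpoint (hypothetical example below);
    extra_terms are optional additional CirrusSearch terms.
    """
    query = "hasrecommendation:image"
    if extra_terms:
        query = f"{query} {extra_terms}"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 0,  # main (article) namespace only
        "srlimit": limit,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# Example: print a URL that can be opened in a browser or fetched with curl.
print(recommendation_search_url("https://cs.wikipedia.org/w/api.php"))
```

The URL is only constructed, not fetched, so the sketch can be run without network access.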
Infobox exclusion
The GrowthExperiments extension adds a new hastemplatecollection:<collection> CirrusSearch keyword for searching for articles containing any one of a list of templates (typically a list so long that hastemplate: cannot be used). This is used for excluding articles with infoboxes: the extension defines the infobox and infoboxtest collections based on the GEInfoboxTemplates and GEInfoboxTemplatesTest community configuration fields (see mw:Growth/Community_configuration).
To update the list, set GEInfoboxTemplatesTest to the proposed list, then run two searches to see which infobox-containing articles the change would affect:
- "hastemplatecollection:infoboxtest -hastemplatecollection:infobox" shows the articles that would be added to the filter;
- "-hastemplatecollection:infoboxtest hastemplatecollection:infobox" shows the articles that would be removed from it.
The list of infoboxes is generated by the tgr/infobox-templates script (https://gitlab.wikimedia.org/tgr/infobox-templates).
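The comparison described above can be sketched in a few lines. The first helper only assembles the two search strings from the collection names given on this page; the second is a hypothetical convenience for comparing two template lists locally before touching community configuration. The function names and the sample template names are assumptions, not part of GrowthExperiments.

```python
def filter_change_queries(live="infobox", test="infoboxtest"):
    """Return the two CirrusSearch queries that preview a filter change."""
    return {
        # Matched by the test collection but not the live one.
        "added": f"hastemplatecollection:{test} -hastemplatecollection:{live}",
        # Matched by the live collection but not the test one.
        "removed": f"-hastemplatecollection:{test} hastemplatecollection:{live}",
    }


def template_changes(current, proposed):
    """Return (added, removed) template names between two lists, sorted."""
    current_set, proposed_set = set(current), set(proposed)
    return sorted(proposed_set - current_set), sorted(current_set - proposed_set)


if __name__ == "__main__":
    for label, query in filter_change_queries().items():
        print(f"{label}: {query}")
    # Hypothetical template names, for illustration only.
    print(template_changes(["Infobox A", "Infobox B"], ["Infobox B", "Infobox C"]))
```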
Enabling image recommendations on a new wiki
(See mw:Growth#Deployment_table about current status.)
The data pipeline and API work for all wikis. To enable on the MediaWiki side:
- Make sure the image recommendation task type is enabled in community config:

    export PHAB=T123456 # deployment task
    for WIKI in wiki1 wiki2 wiki3 ...; do
        mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
            --page MediaWiki:NewcomerTasks.json \
            --create-only \
            --json \
            --summary "Growth features configuration boilerplate ([[phab:$PHAB]])" \
            image-recommendation \
            '{ "type": "image-recommendation", "group": "medium" }';
    done

- Fetch the list of infoboxes with python infobox-templates.py --format=json $LANG (using tgr/infobox-templates, https://gitlab.wikimedia.org/tgr/infobox-templates) and set it in community configuration, e.g.

    export PHAB=T123456 # deployment task
    mwscript extensions/GrowthExperiments/maintenance/changeWikiConfig.php $WIKI \
        --page MediaWiki:GrowthExperimentsConfig.json \
        --json \
        --summary "machine-generated list of infobox generators ([[phab:$PHAB]])" \
        GEInfoboxTemplates \
        "`jq --compact-output . <infobox-templates.py output file>`"

- Set $wgGENewcomerTasksImageRecommendationsEnabled.
- Set $wgGEHomepageNewAccountVariantsByPlatform to 50/50 control/imagerecommendation.
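When the infobox step has to be repeated for several wikis, it can help to assemble the command line programmatically and review it before running anything. The sketch below only builds and prints the argument list for the second command above; it does not execute mwscript. The function name and the sample template names are hypothetical, and the sketch assumes the infobox-templates.py output is plain JSON that can be re-serialized compactly, which is what jq --compact-output does in the shell version.

```python
import json
import shlex

SCRIPT = "extensions/GrowthExperiments/maintenance/changeWikiConfig.php"


def infobox_config_command(wiki, phab_task, templates):
    """Return the mwscript argument list that sets GEInfoboxTemplates."""
    # Compact JSON without ASCII escaping, similar to `jq --compact-output .`
    value = json.dumps(templates, separators=(",", ":"), ensure_ascii=False)
    return [
        "mwscript", SCRIPT, wiki,
        "--page", "MediaWiki:GrowthExperimentsConfig.json",
        "--json",
        "--summary",
        f"machine-generated list of infobox generators ([[phab:{phab_task}]])",
        "GEInfoboxTemplates",
        value,
    ]


# Hypothetical wiki and template names, for illustration only.
cmd = infobox_config_command("cswiki", "T123456", ["Infobox", "Infobox - osoba"])
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list until the last moment avoids quoting mistakes with the JSON value and the summary text.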
Troubleshooting
To test the (new) API, go to a production host and run:

    curl -H 'Accept: application/json' 'http://localhost:6030/public/image_suggestions/suggestions/<wiki id>/<page id>' | jq .
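For checking several pages in a row, the request URL can be generated rather than typed by hand. This is a minimal sketch that follows the path layout of the curl command above; the function name and the idea of a configurable base are assumptions, and the default base only makes sense on a host where the service listens on localhost:6030.

```python
from urllib.parse import quote


def suggestions_url(wiki_id, page_id, base="http://localhost:6030"):
    """Build the image suggestions URL for one wiki ID and page ID."""
    # int() rejects anything that is not a numeric page ID.
    return (
        f"{base}/public/image_suggestions/suggestions/"
        f"{quote(wiki_id, safe='')}/{int(page_id)}"
    )


# Values taken from the example response on this page.
print(suggestions_url("cswiki", 319))
```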
Example API response
{
  "wiki": "cswiki",
  "page_id": 319,
  "id": "65f0a7ce-ea3b-11ed-80ee-f4e9d4dbbe90",
  "image": "Vladimir_Smicer.jpg",
  "confidence": 50,
  "found_on": null,
  "kind": [
    "istype-section-topics-p18"
  ],
  "origin_wiki": "commonswiki",
  "page_qid": null,
  "page_rev": 22568084,
  "section_heading": "sport",
  "section_index": null
}
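When inspecting responses during troubleshooting, a small typed wrapper can make the interesting fields easier to read. The sketch below parses one suggestion object shaped like the example above. It assumes the object is available on its own as a JSON string (the real API may wrap suggestions in a larger structure), and the class name, helper names and confidence threshold are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    wiki: str
    page_id: int
    image: str
    confidence: int
    kind: List[str]
    origin_wiki: str
    section_heading: Optional[str]


def parse_suggestion(raw):
    """Parse one suggestion object from its JSON text."""
    data = json.loads(raw)
    return Suggestion(
        wiki=data["wiki"],
        page_id=data["page_id"],
        image=data["image"],
        confidence=data["confidence"],
        kind=list(data["kind"]),
        origin_wiki=data["origin_wiki"],
        section_heading=data.get("section_heading"),
    )


def meets_confidence(suggestion, threshold):
    """Return True if the suggestion's confidence is at least threshold."""
    return suggestion.confidence >= threshold


# Subset of the example response shown on this page.
SAMPLE = """{"wiki": "cswiki", "page_id": 319, "image": "Vladimir_Smicer.jpg",
"confidence": 50, "kind": ["istype-section-topics-p18"],
"origin_wiki": "commonswiki", "section_heading": "sport"}"""

print(parse_suggestion(SAMPLE))
```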
See also
- Add Link, the previous structured task project
- Image-suggestion