WMDE/Wikidata/Dispatching

== Overview ==


* Changes on Wikidata are buffered in the wb_changes table.
* A DispatchChanges job is queued on Wikidata after an edit to an entity that has at least one wiki subscribed to it.
* The DispatchChanges job queues an EntityChangeNotification job into the job queue of each wiki subscribed to that entity.
** These EntityChangeNotification jobs get the whole Change(s) as a parameter.
* When the EntityChangeNotification job runs on a client wiki, the ChangeHandler handles the change, which includes, for example, cache purges, refreshing links and injecting rc records.


Full docs: https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_change-propagation.html


The process runs on wikidatawiki, testwikidatawiki and beta.


== Occasional stuck changes ==


Rarely, it might happen that a change gets stuck in ''wb_changes'' for as yet unknown reasons. This can be resolved by running the <code>ResubmitChanges.php</code> maintenance script. See for example {{phab|T294008}}.
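If you suspect stuck changes, a quick way to look for them is to query ''wb_changes'' for rows older than the job queue should ever leave around (a sketch, assuming the MediaWiki-style timestamps shown in the examples below):
<syntaxhighlight lang="mysql">
-- Changes older than one hour should normally have been dispatched and
-- deleted long ago; any rows here are candidates for ResubmitChanges.php.
SELECT change_id, change_object_id, change_time
FROM wb_changes
WHERE change_time < DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y%m%d%H%i%s')
ORDER BY change_time
LIMIT 10;
</syntaxhighlight>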


== Control ==


=== Stop dispatching ===


<code>$repoSettings['localClientDatabases']</code> controls which wikis may get EntityChangeNotification jobs queued. Set this to the empty list to stop new EntityChangeNotification jobs from being queued, except for the client wiki that is also its own repo, i.e. wikidatawiki.


== Monitoring ==
'''Dispatching state on the repo'''
* https://www.wikidata.org/wiki/Special:DispatchStats
* https://grafana.wikimedia.org/d/hGFN2TH7z/edit-dispatching-via-jobs
** some of these metrics have alerts associated with them: https://grafana.wikimedia.org/d/TUJ0V-0Zk/wikidata-alerts
* https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?var-job=DispatchChanges
* https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?var-job=EntityChangeNotification
* https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?var-job=wikibase-InjectRCRecords
* TODO: add relevant logs on any servers, useful kafkacat commands, etc.


== How it actually works ==


=== Data Storage ===
When someone makes an edit to an Entity in Wikidata, if and only if that Entity has at least one client wiki subscribed to it, a new row is added to the ''wb_changes'' table in wikidatawiki. Here's an example:
<syntaxhighlight lang="mysql">
MariaDB [wikidatawiki_p]> select * from wb_changes limit 1\G
*************************** 1. row ***************************
         change_id: 1014161077
       change_type: wikibase-item~update
       change_time: 20190924171504
  change_object_id: Q1
change_revision_id: 1019310059
    change_user_id: 142191
       change_info: {"compactDiff":"{\"arrayFormatVersion\":1,\"labelChanges\":[],\"descriptionChanges\":[\"el\",\"eo\",\"en\",\"zh\",\"sr-ec\",\"wuu\",\"vi\",\"sr-el\",\"it\",\"zh-hk\",\"ar\",\"pt-br\",\"tg-cyrl\",\"cs\",\"et\",\"gl\",\"id\",\"es\",\"en-gb\",\"ru\",\"he\",\"nl\",\"pt\",\"zh-tw\",\"nb\",\"tr\",\"zh-cn\",\"tl\",\"th\",\"ro\",\"ca\",\"pl\",\"fr\",\"bg\",\"ast\",\"zh-sg\",\"bn\",\"de\",\"zh-my\",\"ko\",\"da\",\"fi\",\"zh-mo\",\"hu\",\"ja\",\"en-ca\",\"ka\",\"nn\",\"zh-hans\",\"sr\",\"sq\",\"nan\",\"oc\",\"sv\",\"zh-hant\",\"sk\",\"uk\",\"yue\"],\"statementChanges\":[],\"siteLinkChanges\":[],\"otherChanges\":false}","metadata":{"page_id":68145928,"parent_id":1019293753,"comment":"\/* wbeditentity-update:0| *\/ Bot: - Add descriptions:(58 langs).","rev_id":1019310059,"user_text":"Mr.Ibrahembot","central_user_id":15992302,"bot":1}}</syntaxhighlight>
The change_info column holds the compact serialization of the change and is used later to dispatch the change. This table is trimmed by the DispatchChanges jobs handling its entries; it usually contains fewer than 10 rows and almost always fewer than 100.
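A quick sanity check that this self-trimming is working is to count the backlog directly (a sketch):
<syntaxhighlight lang="mysql">
-- If this number keeps growing into the hundreds, the DispatchChanges
-- jobs are probably not consuming wb_changes as they should.
SELECT COUNT(*) AS backlog FROM wb_changes;
</syntaxhighlight>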


Wikidata (and other repos like Commons) keep track of the client wikis that subscribe to their entities. This is stored in the ''wb_changes_subscription'' table:
<syntaxhighlight lang="mysql">
MariaDB [wikidatawiki_p]> select * from wb_changes_subscription where cs_entity_id like 'Q%' limit 5;
+-----------+--------------+------------------+
| cs_row_id | cs_entity_id | cs_subscriber_id |
+-----------+--------------+------------------+
| 100946988 | Q1           | afwiki           |
|  57021716 | Q2           | alswiki          |
| 116682143 | Q2           | amwiki           |
|  57060845 | Q2           | anwiki           |
|  57107362 | Q2           | arcwiki          |
+-----------+--------------+------------------+
</syntaxhighlight>
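This is also the lookup that dispatching ultimately needs: given an entity, which client wikis care about it. As a sketch:
<syntaxhighlight lang="mysql">
-- Which client wikis are subscribed to Q1, and would therefore get an
-- EntityChangeNotification job when Q1 changes?
SELECT cs_subscriber_id
FROM wb_changes_subscription
WHERE cs_entity_id = 'Q1';
</syntaxhighlight>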


Client wikis themselves keep track of exactly which parts of Wikidata (=repo) entities they are using and on which pages, in a table called ''wbc_entity_usage'' (note that this table is on all client wikis, not only on Wikidata).

This is an example from Afrikaans Wikipedia:
<syntaxhighlight lang="mysql">
MariaDB [afwiki_p]> select * from wbc_entity_usage where eu_entity_id = 'Q1';
+-----------+--------------+-----------+------------+
| eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
+-----------+--------------+-----------+------------+
|    398481 | Q1           | C         |      39420 |
|    929039 | Q1           | L.af      |      70835 |
|    132666 | Q1           | O         |      39420 |
|    115881 | Q1           | S         |      39420 |
|    398482 | Q1           | T         |      39420 |
|    929040 | Q1           | T         |      70835 |
+-----------+--------------+-----------+------------+
</syntaxhighlight>
"C" means "statements" (a.k.a. "claims"), "L.af" means "label in the Afrikaans language", "O" means "other" (currently aliases), "S" means "sitelinks" (to show them in the sidebar), and "T" means "title". Note that Q1 is used on two different pages with different aspects. An item can be used on millions of pages in a client wiki.


=== The workflow using an example ===
'''Note: Change dispatching is only responsible for triggering a refresh (and injecting rows into RecentChanges). Fetching the actual data happens elsewhere in the code (see [https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_change-propagation.html#autotoc_md177 WikiPageUpdater in the change-propagation docs]).'''


The actual dispatching happens in the [https://github.com/wikimedia/Wikibase/blob/master/repo/includes/ChangeModification/DispatchChangesJob.php DispatchChangesJob].

1. The ''DispatchChanges'' job gets created with only the entity ID as a parameter. It queries the ''wb_changes'' table for all changes with that entity ID; if multiple changes were made to that entity in quick succession, it might pick up several of them. If it found any changes, it then queries for all client wikis subscribed to that entity ID.

Let's assume only ''afwiki'' is subscribed to that particular entity. The job then queues an ''EntityChangeNotification'' job for ''afwiki''. That job gets all the change(s) data directly as a parameter.

After queueing this job on ''afwiki'', the ''DispatchChanges'' job deletes the rows that it had received from the ''wb_changes'' table. A rough SQL sketch of these steps follows below.
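In SQL terms, the job's work boils down to something like the following (a sketch; the real logic lives in the DispatchChangesJob PHP class, and the entity ID here is just the example used below):
<syntaxhighlight lang="mysql">
-- 1. Collect all pending changes for the edited entity.
SELECT change_id, change_info
FROM wb_changes
WHERE change_object_id = 'Q3180666';

-- 2. Find the client wikis subscribed to it; one EntityChangeNotification
--    job is queued per subscriber, carrying the change data as a parameter.
SELECT cs_subscriber_id
FROM wb_changes_subscription
WHERE cs_entity_id = 'Q3180666';

-- 3. Delete the changes that were handed over to the jobs (bounded by the
--    highest change_id seen, so concurrent newer edits are not lost).
DELETE FROM wb_changes
WHERE change_object_id = 'Q3180666'
  AND change_id <= 1014161077;
</syntaxhighlight>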
2. Then, on ''afwiki'', the ''EntityChangeNotification'' job runs with those changes (i.e. telling ''afwiki'': "Hey, the Chinese label of Q3180666 and the Hindi description of Q469681 have changed"). The job checks that data against the aspects it actually uses, via the ''wbc_entity_usage'' table on ''afwiki'':
<syntaxhighlight lang="mysql">
MariaDB [afwiki_p]> select * from wbc_entity_usage where eu_entity_id in ('Q3180666', 'Q469681') limit 5;
+-----------+--------------+-----------+------------+
| eu_row_id | eu_entity_id | eu_aspect | eu_page_id |
+-----------+--------------+-----------+------------+
|    872799 | Q3180666     | C.P1015   |     224030 |
|    872807 | Q3180666     | C.P1048   |     224030 |
|    872798 | Q3180666     | C.P1053   |     224030 |
|    872815 | Q3180666     | C.P1157   |     224030 |
|    872813 | Q3180666     | C.P1222   |     224030 |
+-----------+--------------+-----------+------------+
</syntaxhighlight>
This is to avoid triggering a ''refreshLinks'' or ''InjectRCRecords'' job when the used aspects of the changed entities haven't actually changed.


For example, if the page on ''afwiki'' only uses the English label, and the Persian aliases have changed, no action is needed here.
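Concretely, that check amounts to intersecting the changed aspects with the used ones (a sketch; the aspect codes are those from the ''wbc_entity_usage'' example above):
<syntaxhighlight lang="mysql">
-- A change to the Persian aliases maps to aspect 'O' ("other").
-- This returns the pages that would need a refresh; a page that only
-- uses the English label is subscribed as 'L.en', so it never matches.
SELECT eu_page_id
FROM wbc_entity_usage
WHERE eu_entity_id = 'Q469681'
  AND eu_aspect = 'O';
</syntaxhighlight>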


If there's a match or matches, the dispatcher triggers jobs to refresh the page(s) to use the new data and injects rows into the ''recentchanges'' table of the client.
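On the client side, these injected rows can be told apart from local edits by their source field (a sketch, assuming the usual <code>wb</code> value that Wikibase uses for ''rc_source''):
<syntaxhighlight lang="mysql">
-- Recent changes on afwiki that were injected by Wikidata change
-- dispatching rather than made by local edits.
SELECT rc_timestamp, rc_title
FROM recentchanges
WHERE rc_source = 'wb'
ORDER BY rc_timestamp DESC
LIMIT 5;
</syntaxhighlight>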


<small>Note: There used to be an aspect called "X", meaning "all", which basically said "notify me about any change to the given item", but it is deprecated now.</small>
