Last updated March 2019
Data & Writing
- Current EPM (edits per minute) on Wikidata ranges between 300 and 1,100.
- Peak EPM appears to have been in Feb 2019, at 2,000 EPM (per https://grafana.wikimedia.org/d/000000170/wikidata-edits?refresh=1m&panelId=1&fullscreen&orgId=1&from=1550907044608&to=1550934601798)
- The 2019-20 prediction is that the general range of the edit rate will increase slightly, but not drastically.
- We may also see more high-EPM spikes.
Wikidata edit rate (per year)
2019-20 prediction: no vast increase in rate, 200 million - 250 million edits per year.
Data from Hadoop: https://phabricator.wikimedia.org/P8193. Yearly EPM is calculated as X/365/24/60, where X is the yearly edit count.
- 2012, 2,912,964
- 2013, 94,323,394
- 2014, 87,411,229
- 2015, 102,362,226 (194 EPM)
- 2016, 135,511,683 (257 EPM)
- 2017, 192,353,549 (365 EPM)
- 2018, 208,944,716 (397 EPM)
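The EPM figures in parentheses are each year's edit count divided by the number of minutes in a 365-day year (365 * 24 * 60 = 525,600), rounded down. A minimal Python sketch that reproduces them from the Hadoop counts; the names `YEARLY_EDITS` and `sustained_epm` are illustrative, not part of any existing tool.

```python
# Minutes in a 365-day year: the "/365/24/60" divisor used on this page.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

# Yearly edit counts from the Hadoop query (P8193) that have an EPM listed.
YEARLY_EDITS = {
    2015: 102_362_226,
    2016: 135_511_683,
    2017: 192_353_549,
    2018: 208_944_716,
}


def sustained_epm(yearly_edits: int) -> int:
    """Average edits per minute over a year, truncated like the figures above."""
    return yearly_edits // MINUTES_PER_YEAR


for year, edits in YEARLY_EDITS.items():
    print(year, sustained_epm(edits))
```

Like the page, the sketch treats every year as 365 days, so leap years such as 2016 are divided by slightly too few minutes; at these volumes the truncated EPM comes out the same.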
Yearly edit rate equivalent sustained EPMs
To put the yearly figures in perspective, the table below converts yearly edit counts to the equivalent sustained (average) EPM for the year.
|Yearly edits||Sustained EPM|
|200 million||380 EPM|
|300 million||570 EPM|
|600 million||1141 EPM|
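The conversion also runs the other way: a sustained EPM multiplied by 525,600 gives the yearly edit count it implies. A short sketch, with hypothetical helper names, that reproduces the rows above and turns the current 300-1,100 EPM range into yearly bounds:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def yearly_to_epm(yearly_edits: int) -> int:
    """Sustained EPM for a yearly edit count, truncated to a whole number."""
    return yearly_edits // MINUTES_PER_YEAR


def epm_to_yearly(epm: int) -> int:
    """Yearly edit count implied by holding an EPM for a whole 365-day year."""
    return epm * MINUTES_PER_YEAR


# Reproduce the rows of the conversion table.
for millions in (200, 300, 600):
    print(f"{millions} million -> {yearly_to_epm(millions * 1_000_000)} EPM")

# Yearly totals implied by the current EPM range (300 to 1,100).
low, high = epm_to_yearly(300), epm_to_yearly(1100)
print(f"300-1100 EPM sustained -> {low:,} to {high:,} edits/year")
```

Holding either end of the current range for a whole year would mean roughly 158 million or 578 million edits.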
Revision count
As of March 2019 there are 881,499,873 revisions. This will probably pass 1 billion by the end of 2019. In 2018 the yearly edit count was 208,944,716. The revision count is predicted to keep increasing by around 200 million - 250 million per year in 2019-20.
- Long term: reaching 4,294,967,295 revisions, the maximum value of an unsigned INT rev_id, after which rev_id needs to become a BIGINT.
Based on what we know now, we predict that we will not need BIGINTs on the revision table until at least 2025, and likely later.
|Year||Edits in year||Total revisions at end of year|
|2019||200-250 million||1.1-1.2 billion|
|2020||200-300 million||1.3-1.5 billion|
|2021||200-350 million||1.5-1.85 billion|
|2022||200-400 million||1.7-2.25 billion|
|2023||200-450 million||1.9-2.7 billion|
|2024||200-500 million||2.1-3.2 billion|
|2025||200-550 million||2.3-3.75 billion|
|2026||200-600 million||2.5-4.35 billion|
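The projection table can be rebuilt by adding each year's edit range to the previous year's cumulative range, starting from the end-of-2019 estimate. A minimal sketch, assuming the low estimate adds a flat 200 million edits per year and the high estimate adds 300 million in 2020 and 50 million more each following year, as in the table; it reports the first year each estimate passes 4,294,967,295 (2^32 - 1, the largest unsigned INT rev_id). Function and constant names are illustrative.

```python
UINT_MAX = 2**32 - 1  # 4,294,967,295: largest value of an unsigned 32-bit INT rev_id


def project_revisions(first_year: int = 2019, years: int = 8):
    """Return (year, low, high) cumulative revision estimates, in millions.

    Assumption: low adds a flat 200 million per year; high adds 300 million
    in the year after first_year and 50 million more each year after that.
    """
    low, high = 1_100, 1_200  # end-of-2019 estimate: 1.1-1.2 billion
    rows = [(first_year, low, high)]
    for i in range(1, years):
        low += 200
        high += 250 + 50 * i
        rows.append((first_year + i, low, high))
    return rows


def first_year_over(rows, column: int):
    """First year whose estimate (column 1 = low, 2 = high) exceeds UINT_MAX."""
    for row in rows:
        if row[column] * 1_000_000 > UINT_MAX:
            return row[0]
    return None


rows = project_revisions()
for year, low, high in rows:
    print(f"{year}: {low / 1000:.2f}-{high / 1000:.2f} billion")
print("high estimate passes UINT_MAX in", first_year_over(rows, 2))
print("low estimate passes UINT_MAX in", first_year_over(rows, 1))
```

Under these assumptions only the high estimate passes the limit within the table's range, in 2026, which fits the expectation of not needing BIGINT rev_ids until at least 2025.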
- Average size of items remains pretty steady, ~18KB in March 2019
- The 2019-20 prediction is that this will not increase to over ~30KB.
- Lexeme size isn't tracked, but lexemes are assumed to be much smaller than items.
- In 2019 the max size of entities was increased from 2500 to 3000.
Storage in memcached
Currently (March 2019) the size of entities could become an issue for storage in the shared memcached cache if they reach 1MB.
See WMDE/Wikidata/Caching#WikiPageEntityRevisionLookup for more details.
Right now the biggest shared cache entity is less than 200KB, meaning the max entity size limit would have to increase to around 15,000 to become an issue.
However, changes to the way the serialization is stored could accelerate this.
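The 15,000 estimate follows from scaling: if the biggest cached entry (under 200KB) grows in proportion to the entity size limit (3000), it reaches 1MB when the limit is about five times larger. A sketch of that arithmetic, assuming proportional growth and 1MB = 1024KB; the function name is hypothetical. The unit of the size limit is not stated here, and the result simply comes out in the same unit.

```python
# Inputs from this page (March 2019).
CURRENT_LIMIT = 3000            # max entity size limit after the 2019 increase
BIGGEST_CACHED_KB = 200         # biggest shared-cache entity is *less than* this
MEMCACHED_ITEM_LIMIT_KB = 1024  # 1MB, taking 1MB as 1024KB


def overflow_limit(current_limit: float, biggest_kb: float, cache_limit_kb: float) -> float:
    """Size limit at which the biggest cached entry would reach cache_limit_kb.

    Assumes the biggest cached entry grows in proportion to the size limit.
    Because biggest_kb is an upper bound, the result is a lower bound.
    """
    return current_limit * cache_limit_kb / biggest_kb


print(round(overflow_limit(CURRENT_LIMIT, BIGGEST_CACHED_KB, MEMCACHED_ITEM_LIMIT_KB)))
```

This prints 15360; taking 1MB as 1000KB gives exactly 15,000. Since 200KB is an upper bound on today's biggest entry, both figures are lower bounds on the limit that would cause trouble.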
Number of Entities by type
2019-20 predicted growth: 10 million - 20 million new items, resulting in no more than 73 million items.
Growth per year:
- 2016-17, 5.3 million
- 2017-18, 17.7 million
- 2018-19, 11.3 million
2019-20 predicted growth: 1,500 - 3,000 new properties, resulting in no more than 9,000 properties.
This takes into account that the rate of property creation has increased every year, and that Commons will start using properties in 2019, which may lead to more property creation.
Growth per year:
- 2016-17, 900
- 2017-18, 1200
- 2018-19, 1500
Lexemes were only released to the world in 2018, so their growth is hard to predict.
In the 9 months to March 2019, the number of lexemes increased from 3,509 to 43,500.
Unless something drastic happens, we would comfortably stay below 1 million lexemes for 2019-2020.
There is no prediction for Forms or Senses here.
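A straight-line extrapolation of the lexeme figures shows why 1 million looks safe. This sketch assumes creation continues at the average monthly rate seen over the 9 months to March 2019 and projects to the end of 2020 (21 more months); linear growth is an assumption for illustration, not a forecast from this page.

```python
START, END, MONTHS = 3_509, 43_500, 9  # lexeme counts over the 9 months to March 2019
MONTHS_TO_END_2020 = 21                # April 2019 through December 2020
TARGET = 1_000_000

# Average lexemes created per month over the observed period.
monthly_rate = (END - START) / MONTHS

# Linear projection of the total at the end of 2020.
projected = END + monthly_rate * MONTHS_TO_END_2020

# Monthly rate that would be needed to reach 1 million by the end of 2020.
required_rate = (TARGET - END) / MONTHS_TO_END_2020

print(f"observed rate: {monthly_rate:,.0f} lexemes/month")
print(f"linear projection for end of 2020: {projected:,.0f}")
print(f"rate needed to reach 1 million: {required_rate:,.0f}/month "
      f"({required_rate / monthly_rate:.1f}x the observed rate)")
```

Reaching 1 million by the end of 2020 would need roughly ten times the observed creation rate, which is the kind of "something drastic" the estimate allows for.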
There is no grafana tracking for mediainfo entities currently.
DB query for counting the current number of MediaInfo entities: https://quarry.wmflabs.org/query/34303
March 2019: 273,540 MediaInfo entities, out of 52 million files.
MediaInfo entities have the potential to match the number of files on Commons (over 50 million).
DB Tables size
Latest info on auto inc fields running out of space: https://phabricator.wikimedia.org/P8198
wb_terms is very big (on disk) and will see no further adoption.
It is going to be removed in 2019.
TBA current growth predictions.
text & revisions
These tables will share the same growth pattern in terms of auto inc ids and the need to switch to bigints.
See predicted revision count in WMDE/Wikidata/Growth#Revision_count.
recentchanges & cu_changes
Based on the predicted revision increase rate (see WMDE/Wikidata/Growth#Revision_count), we would fill the current auto increment fields between 2022 and 2024.
Data below from March 2019 (table_schema: wikidatawiki; both columns are signed int(11)):
|table_name||column_name||max_value||auto_increment||auto_increment_ratio|
|recentchanges||rc_id||2147483647||919219099||0.4280|
|cu_changes||cuc_id||2147483647||899023427||0.4186|
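The 2022-2024 window can be checked against the revision count table. This sketch assumes each counter keeps its March 2019 lead over the revision count (rc_id is about 38 million ahead, cuc_id about 18 million) and otherwise grows one-for-one with revisions, then finds the first year each column's end-of-year value passes 2,147,483,647. The names are illustrative.

```python
INT_MAX = 2**31 - 1  # max_value of a signed int(11) column: 2,147,483,647

REVISIONS_MARCH_2019 = 881_499_873
AUTO_INCREMENT = {"rc_id": 919_219_099, "cuc_id": 899_023_427}  # March 2019

# End-of-year cumulative revision estimates (low, high) from the revision count table.
REVISION_PROJECTION = {
    2019: (1.10e9, 1.20e9),
    2020: (1.30e9, 1.50e9),
    2021: (1.50e9, 1.85e9),
    2022: (1.70e9, 2.25e9),
    2023: (1.90e9, 2.70e9),
    2024: (2.10e9, 3.20e9),
    2025: (2.30e9, 3.75e9),
}


def first_full_year(column: str, estimate: int):
    """First year the column's counter is projected to exceed INT_MAX.

    Assumes the counter keeps its March 2019 lead over the revision count and
    otherwise grows one-for-one with revisions. estimate: 0 = low, 1 = high.
    """
    lead = AUTO_INCREMENT[column] - REVISIONS_MARCH_2019
    for year in sorted(REVISION_PROJECTION):
        if REVISION_PROJECTION[year][estimate] + lead > INT_MAX:
            return year
    return None


for column in AUTO_INCREMENT:
    print(column, "high:", first_full_year(column, 1), "low:", first_full_year(column, 0))
```

With these inputs, both columns fill in 2022 under the high estimate. Under the low estimate they end 2024 just short of the limit (rc_id at roughly 2.14 billion) and fill in 2025, so the later end of the window is sensitive to small changes in the assumed rate.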
WikibaseQualityConstraints check data
TBA (we are going to persistently store this stuff)
Usage & Reading
TBA more stuff?
Wikidata.org / Repo
3rd party federated wikis
At some point we will develop federation for 3rd parties. This will likely result in an increase in requests to Special:EntityData and/or the API. More details to come.
3rd party WDQS updaters
As identified in https://phabricator.wikimedia.org/T217897#5020183, WDQS updaters, both internal to WMF and external, hit Special:EntityData a lot. These requests account for most of the cache misses on wikidata.org.
The PHP processing for these requests is fairly lightweight, but continued uncached requests will directly increase reads from the shared entity revision cache in memcached.
This is naturally predicted to increase, but it is mainly for the WMF Discovery team to worry about.
Internal WMF requests will likely grow too (particularly from Wikibase quality constraints), as the checks are planned to run after every edit. Thus, as the edit rate increases, so does the number of these checks.
Lydia growth thoughts from early 2019
- Creation Rate
- Interest in project is growing
- On the other hand, some groups are splitting out into their own projects
- Creation rate will not slow down
- Creation rate may slow down a bit, but follow existing trend
- Huge growth expected; the number of M (MediaInfo) entities is expected to be similar to the number of files on Commons
- Commons is also expected to grow at a high rate
- Properties for commons?
- No significant rise in number expected.
- Early stage of project, significant growth expected
- Auto-generating forms and senses: how much data is actually stored (curatable?) vs. generated on the fly (i.e. only materialized when requested)
- General edit rate growth
- Client editing (from Wikipedias)
- volumes of edits comparable to bot edit volume currently
- Client editing (from Wikipedias)
- Growth in the size of the entity
- On average each item will have more data
- Data used on client wikis
- With client editing and better referencing, use of Wikidata data will grow significantly
- Ongoing: adding infoboxes to Wikimedia Commons categories
- Single template in use, would changing the template cause some issues?
- Recent Changes
- WMF should be taking care of this
- External to wikidata?
- Non-WMF federated wikis accessing Wikidata data