User:AKhatun/Wikidata Subgraph Query Analysis: Difference between revisions

imported>AKhatun
m (Typo and minor edits)
imported>AKhatun
(→‎Human subgraph (Q5) query analysis: Add query time and UA analysis)
Line 17: Line 17:
# If the query uses literals that occur 99% of the time in a particular subgraph. The literals can occur with or without language tags, and both versions are compared when checking for a match. Note that only whole literals are matched between queries and Wikidata. Queries that ask for partial matches, using regex for example, are not included. The assumption is that such queries are more likely to contain other items from the subgraph and are caught anyway.
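The literal rule above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes literals are kept in their quoted SPARQL form (for example <code>"Douglas Adams"@en</code>), and the function names are hypothetical.

```python
import re

# Assumption: a language tag is an "@tag" suffix directly after the closing quote.
_LANG_TAG = re.compile(r'(?<=")@[A-Za-z]+(?:-[A-Za-z0-9]+)*$')


def strip_language_tag(literal: str) -> str:
    """Return the literal without a trailing language tag, if it has one."""
    return _LANG_TAG.sub("", literal)


def literal_variants(literal: str) -> set:
    """Both versions of a literal: as written, and without its language tag."""
    return {literal, strip_language_tag(literal)}


def query_matches_subgraph(query_literals, subgraph_literals) -> bool:
    """True if any whole query literal equals a subgraph literal,
    comparing the tagged and untagged versions on both sides."""
    subgraph_forms = set()
    for lit in subgraph_literals:
        subgraph_forms |= literal_variants(lit)
    return any(literal_variants(q) & subgraph_forms for q in query_literals)
```

Only whole literals are compared, so a query literal that is merely a substring of a subgraph literal does not count as a match.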


The following analysis uses Wikidata dump of <code>20211101</code> and WDQS public SPARQL queries of 10/2021. '''All query related numbers below are monthly values'''.
The following analysis uses Wikidata dump of <code>20211101</code> and WDQS public SPARQL queries of 10/2021 unless otherwise stated. '''All query related numbers below are monthly values'''.


== Query count and time ==
Line 25: Line 25:
* Total query time for all queries for a month is ~16,000 hours.


The table below lists the top 50 most queried subgraphs with subgraph size and query time information. A breakdown of what caused the match is also present, which corresponds to the parameters mentioned in [[#What are subgraph related queries]]. It also ranks the subgraphs by size, query count, and query time consumed.
The table below lists the top 50 most queried subgraphs, with subgraph size and query time information for <code>11/2021</code>. A breakdown of what caused the match is also present, which corresponds to the parameters mentioned in [[#What are subgraph related queries]]. It also ranks the subgraphs by size, query count, and query time consumed. A more complete list containing 341 subgraphs, which together form ~90% of Wikidata triples, is available here: [csvlink|all_subgraph_data.csv]. The difference between the October and November values is shown in the next table for comparison. In some places, the query count percentages differ considerably.


A more complete list containing 341 subgraphs, that form ~90% of Wikidata triples, is available here: [csvlink|all_subgraph_data.csv]
{| class="wikitable sortable mw-collapsible"
|+ {{nowrap|Top 50 most queried subgraphs in Wikidata with subgraph size information}}
|-
! Subgraph rank by size !! Subgraph rank by query count !! Subgraph rank by query time !! Subgraph !! Subgraph label !! %of triples !! %of entities !! Days to recover (4.77M rate) !!Query count !! %count of all queries !! Query time (hr) !! %time of all queries !!Avg time/query !! %count of query from Qid !! %count of query from instance items !! %count of query from items !! %count of query from properties !! %count of query from literals
|-
|3||1||1||Q5||human||7.254||10.045||204||60,868,572||31.941||5248||34.195||0.31||2.541||18.199||12.198||19.457||1.435
|-
|5||2||11||Q16521||taxon||2.885||3.5||81||27,172,995||14.259||480||3.131||0.064||12.19||0.746||12.862||0.87||0.433
|-
|34||3||7||Q4830453||business||0.107||0.208||3||9,228,037||4.842||554||3.607||0.216||1.646||2.95||2.24||0.001||0.177
|-
|6||4||5||Q101352||family name||1.646||0.511||46||5,990,617||3.144||659||4.292||0.396||0.041||3.057||2.791||0.018||0.038
|-
|15||5||2||Q11424||film||0.359||0.284||10||5,067,305||2.659||1541||10.042||1.095||0.451||1.469||1.348||0.003||0.543
|-
|1||6||13||Q13442814||scholarly article||48.935||39.815||1378||4,944,995||2.595||263||1.713||0.191||0.017||1.942||1.938||0.405||0.396
|-
|7||7||3||Q4167410||Wikimedia disambiguation page||1.354||1.464||38||3,292,873||1.728||765||4.982||0.836||0.164||0.192||0.472||0.0||1.163
|-
|2||8||25||Q6999||astronomical object||8.684||8.943||245||2,444,109||1.283||79||0.516||0.117||0.003||1.218||1.222||0.023||0.004
|-
|92||9||14||Q6881511||enterprise||0.036||0.052||1||1,937,486||1.017||234||1.528||0.436||0.083||0.812||0.538||0.0||0.071
|-
|26||10||29||Q484170||commune of France||0.179||0.048||5||1,934,902||1.015||70||0.455||0.13||0.024||0.869||0.085||0.115||0.01
|-
|19||11||22||Q13406463||Wikimedia list article||0.249||0.355||7||1,766,742||0.927||117||0.765||0.239||0.034||0.372||0.628||0.0||0.137
|-
|63||12||12||Q5398426||television series||0.055||0.063||2||1,379,486||0.724||411||2.68||1.073||0.048||0.376||0.369||0.0||0.167
|-
|37||13||47||Q7725634||literary work||0.087||0.203||2||1,377,546||0.723||42||0.273||0.11||0.39||0.181||0.243||0.0||0.009
|-
|16||14||4||Q486972||human settlement||0.298||0.612||8||1,328,064||0.697||699||4.557||1.896||0.328||0.39||0.236||0.0||0.005
|-
|163||15||15||Q891723||public company||0.015||0.013||0||1,175,813||0.617||219||1.426||0.67||0.042||0.415||0.185||0.001||0.092
|-
|90||16||6||Q43229||organization||0.037||0.082||1||1,067,340||0.56||600||3.908||2.023||0.259||0.227||0.146||0.0||0.021
|-
|13||17||24||Q3305213||painting||0.426||0.579||12||926,701||0.486||86||0.558||0.333||0.017||0.426||0.284||0.002||0.008
|-
|87||18||36||Q47461344||written work||0.037||0.078||1||881,216||0.462||53||0.345||0.216||0.289||0.079||0.114||0.0||0.003
|-
|25||19||32||Q532||village||0.199||0.294||6||872,310||0.458||61||0.399||0.253||0.003||0.417||0.198||0.0||0.015
|-
|4||20||28||Q4167836||Wikimedia category||5.806||5.175||164||808,536||0.424||74||0.484||0.331||0.037||0.363||0.292||0.0||0.024
|-
|61||21||51||Q7889||video game||0.055||0.048||2||753,351||0.395||37||0.244||0.179||0.006||0.181||0.314||0.002||0.01
|-
|20||22||41||Q8502||mountain||0.248||0.559||7||749,283||0.393||47||0.306||0.225||0.002||0.369||0.351||0.0||0.001
|-
|28||23||33||Q482994||album||0.16||0.288||5||704,746||0.37||59||0.388||0.304||0.012||0.15||0.189||0.0||0.098
|-
|89||24||17||Q4164871||position||0.037||0.128||1||645,434||0.339||175||1.141||0.977||0.003||0.305||0.025||0.0||0.011
|-
|8||25||16||Q7187||gene||0.911||1.273||26||604,364||0.317||208||1.354||1.238||0.084||0.1||0.022||0.015||0.127
|-
|11||26||26||Q11173||chemical compound||0.684||1.302||19||588,469||0.309||76||0.496||0.466||0.135||0.11||0.092||0.002||0.014
|-
|55||27||54||Q215380||musical group||0.062||0.087||2||585,266||0.307||37||0.241||0.227||0.01||0.205||0.16||0.0||0.011
|-
|31||28||39||Q16970||church building||0.128||0.227||4||577,677||0.303||48||0.315||0.301||0.003||0.288||0.214||0.0||0.002
|-
|71||29||55||Q732577||publication||0.047||0.076||1||569,536||0.299||37||0.238||0.231||0.283||0.015||0.296||0.0||0.0
|-
|22||30||43||Q79007||street||0.23||0.626||6||535,623||0.281||44||0.289||0.298||0.028||0.246||0.218||0.001||0.001
|-
|23||31||34||Q4022||river||0.216||0.425||6||520,347||0.273||56||0.365||0.388||0.002||0.254||0.192||0.0||0.002
|-
|242||32||8||Q14204246||Wikimedia project page||0.008||0.033||0||498,708||0.262||548||3.572||3.957||0.026||0.19||0.038||0.0||0.064
|-
|36||33||63||Q3947||house||0.096||0.216||3||465,249||0.244||33||0.212||0.252||0.0||0.238||0.223||0.0||0.002
|-
|32||34||31||Q41176||building||0.124||0.29||3||463,636||0.243||65||0.423||0.504||0.042||0.189||0.168||0.001||0.002
|-
|307||35||62||Q783794||company||0.005||0.012||0||459,638||0.241||33||0.213||0.256||0.081||0.146||0.1||0.0||0.006
|-
|29||36||48||Q23397||lake||0.136||0.279||4||456,054||0.239||42||0.273||0.331||0.002||0.227||0.211||0.0||0.001
|-
|119||37||42||Q3957||town||0.023||0.015||1||450,870||0.237||46||0.297||0.364||0.057||0.162||0.034||0.0||0.003
|-
|64||38||40||Q811979||architectural structure||0.054||0.12||2||445,779||0.234||48||0.313||0.388||0.097||0.126||0.117||0.0||0.001
|-
|80||39||59||Q34442||road||0.041||0.073||1||440,960||0.231||34||0.22||0.276||0.008||0.129||0.171||0.0||0.001
|-
|275||40||180||Q21198342||manga series||0.007||0.015||0||437,382||0.23||11||0.074||0.093||0.01||0.052||0.2||0.0||0.003
|-
|72||41||23||Q86850539||Whitaker's Latin frequency type C||0.047||0.011||1||436,103||0.229||95||0.622||0.788||0.0||0.0||0.0||0.0||0.228
|-
|138||42||139||Q18340514||events in a specific year or time period||0.019||0.048||1||431,649||0.227||16||0.104||0.133||0.0||0.21||0.068||0.0||0.004
|-
|261||43||53||Q2085381||publisher||0.007||0.015||0||420,459||0.221||37||0.243||0.319||0.001||0.21||0.068||0.0||0.004
|-
|44||44||38||Q55488||railway station||0.074||0.104||2||410,774||0.216||49||0.319||0.43||0.001||0.172||0.163||0.0||0.002
|-
|108||45||27||Q33506||museum||0.027||0.044||1||409,716||0.215||75||0.486||0.655||0.017||0.184||0.134||0.0||0.001
|-
|181||46||19||Q34770||language||0.013||0.011||0||402,013||0.211||145||0.947||1.302||0.009||0.169||0.02||0.0||0.017
|-
|112||47||86||Q15632617||fictional human||0.025||0.056||1||395,934||0.208||25||0.166||0.232||0.007||0.138||0.09||0.0||0.004
|-
|42||48||119||Q22808320||Wikimedia human name disambiguation page||0.077||0.075||2||381,873||0.2||19||0.125||0.181||0.0||0.164||0.142||0.0||0.001
|-
|143||49||75||Q11032||newspaper||0.017||0.043||0||380,153||0.199||28||0.181||0.263||0.002||0.169||0.143||0.0||0.019
|-
|38||50||117||Q3331189||version, edition, or translation||0.087||0.191||2||374,597||0.197||19||0.126||0.186||0.117||0.037||0.134||0.0||0.038
|}
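The "Avg time/query" column appears to be in seconds. As a quick check, dividing the query time (converted from hours to seconds) by the query count reproduces the listed values for a few rows. The function name is illustrative, and since the hour values in the table are rounded, the results are approximate.

```python
def avg_seconds_per_query(query_time_hours: float, query_count: int) -> float:
    """Average query time in seconds, from total hours and number of queries."""
    return query_time_hours * 3600 / query_count


# (query time in hours, query count) copied from the table rows above.
rows = {
    "Q5 human": (5248, 60_868_572),
    "Q16521 taxon": (480, 27_172_995),
    "Q11424 film": (1541, 5_067_305),
}

for label, (hours, count) in rows.items():
    print(label, round(avg_seconds_per_query(hours, count), 3))
```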
=== Comparison of subgraph queries across time ===


{| class="wikitable sortable"
{| class="wikitable sortable mw-collapsible"
|+ Top 50 most queries subgraphs in Wikidata with subgraph size information
|+ {{nowrap|Comparison of subgraph queries across time (Oct, Nov 2021)}}
|-
|-
! Subgraph rank by size !! Subgraph rank by query count !! Subgraph rank by query time !! Subgraph !! Subgraph label !! %of triples !! %of entities !! Days to recover (4.77M rate) !!Query count !! %count of all queries !! Query time (hr) !! %time of all queries !! %count of query from Qid !! %count of query from instance items !! %count of query from items !! %count of query from properties !! %count of query from literals
! Subgraph rank by size !! Subgraph !! Subgraph label !!%of entities !! %of triples!! Oct query count !! style="background: #ffdead;" | Oct %count of queries !! Oct query time (hr) !! Oct %time of queries !! Nov query count !! style="background: #ffdead;" | Nov %count of queries !! Nov query time (hr) !! Nov %time of queries
|-
|-
|3||1||1||Q5||human||7.324||9.986||203||68,659,369||31.058||6314||0.393||1.827||17.705||10.324||20.176||1.11
|3||Q5||human||9.986||7.324||68,659,369|| style="background: #ffdead;" |31.058||6,314||39.3||60,868,572|| style="background: #ffdead;" |31.941||5,248||34.195
|-
|-
|5||2||4||Q16521||taxon||2.871||3.427||79||56,437,140||25.529||495||0.031||22.986||1.251||23.665||0.965||0.496
|5||Q16521||taxon||3.427||2.871||56,437,140|| style="background: #ffdead;" |'''25.529'''||495||3.1||27,172,995|| style="background: #ffdead;" |'''14.259'''||480||3.131
|-
|-
|6||3||3||Q101352||family name||1.546||0.509||43||5,564,173||2.517||640||0.04||0.064||2.425||2.34||0.016||0.032
|34||Q4830453||business||0.207||0.108||4,041,395|| style="background: #ffdead;" |'''1.828'''||343||2.1||9,228,037|| style="background: #ffdead;" |'''4.842'''||554||3.607
|-
|-
|15||4||2||Q11424||film||0.364||0.281||10||4,757,084||2.152||1613||0.1||0.563||1.308||1.089||0.008||0.407
|6||Q101352||family name||0.509||1.546||5,564,173|| style="background: #ffdead;" |'''2.517'''||640||4.0||5,990,617|| style="background: #ffdead;" |'''3.144'''||659||4.292
|-
|-
|34||5||7||Q4830453||business||0.108||0.207||3||4,041,395||1.828||343||0.021||0.953||0.788||0.416||0.0||0.101
|15||Q11424||film||0.281||0.364||4,757,084|| style="background: #ffdead;" |2.152||1,613||10.0||5,067,305|| style="background: #ffdead;" |2.659||1,541||10.042
|-
|-
|7||6||9||Q4167410||Wikimedia disambiguation page||1.374||1.459||38||3,737,550||1.691||223||0.014||0.195||0.484||0.554||0.0||0.938
|1||Q13442814||scholarly article||39.794||49.668||1,649,268|| style="background: #ffdead;" |'''0.746'''||142||0.9||4,944,995|| style="background: #ffdead;" |'''2.595'''||263||1.713
|-
|-
|177||7||20||Q34770||language||0.013||0.011||0||1,713,196||0.775||73||0.005||0.008||0.757||0.009||0.0||0.005
|7||Q4167410||Wikimedia disambiguation page||1.459||1.374||3,737,550|| style="background: #ffdead;" |1.691||223||'''1.4'''||3,292,873|| style="background: #ffdead;" |1.728||765||'''4.982'''
|-
|-
|1||8||13||Q13442814||scholarly article||49.668||39.794||1375||1,649,268||0.746||142||0.009||0.005||0.261||0.278||0.124||0.386
|2||Q6999||astronomical object||8.942||8.75||448,032|| style="background: #ffdead;" |'''0.203'''||51||0.3||2,444,109|| style="background: #ffdead;" |'''1.283'''||79||0.516
|-
|-
|4||9||17||Q4167836||Wikimedia category||5.85||5.165||162||1,383,343||0.626||96||0.006||0.019||0.594||0.152||0.0||0.01
|92||Q6881511||enterprise||0.052||0.036||943,613|| style="background: #ffdead;" |'''0.427'''||164||1.0||1,937,486|| style="background: #ffdead;" |'''1.017'''||234||1.528
|-
|-
|10||10||14||Q11173||chemical compound||0.693||1.302||19||1,307,852||0.592||133||0.008||0.022||0.548||0.449||0.001||0.014
|26||Q484170||commune of France||0.043||0.18||866,766|| style="background: #ffdead;" |'''0.392'''||46||0.3||1,934,902|| style="background: #ffdead;" |'''1.015'''||70||0.455
|-
|-
|20||11||22||Q13406463||Wikimedia list article||0.252||0.352||7||1,283,160||0.58||73||0.005||0.018||0.409||0.357||0.0||0.048
|20||Q13406463||Wikimedia list article||0.352||0.252||1,283,160|| style="background: #ffdead;" |0.58||73||0.5||1,766,742|| style="background: #ffdead;" |0.927||117||0.765
|-
|-
|63||12||6||Q5398426||television series||0.055||0.062||2||1,206,285||0.546||366||0.023||0.05||0.332||0.252||0.0||0.128
|63||Q5398426||television series||0.062||0.055||1,206,285|| style="background: #ffdead;" |0.546||366||2.3||1,379,486|| style="background: #ffdead;" |0.724||411||2.68
|-
|-
|243||13||24||Q14204246||Wikimedia project page||0.008||0.033||0||1,114,113||0.504||62||0.004||0.009||0.227||0.016||0.0||0.275
|42||Q7725634||literary work||0.176||0.077||468,204|| style="background: #ffdead;" |0.212||22||0.1||1,377,546|| style="background: #ffdead;" |0.723||42||0.273
|-
|-
|92||14||11||Q6881511||enterprise||0.036||0.052||1||943,613||0.427||164||0.01||0.034||0.338||0.144||0.0||0.042
|16||Q486972||human settlement||0.602||0.302||721,789|| style="background: #ffdead;" |0.327||73||'''0.5'''||1,328,064|| style="background: #ffdead;" |0.697||699||'''4.557'''
|-
|-
|26||15||29||Q484170||commune of France||0.18||0.043||5||866,766||0.392||46||0.003||0.006||0.278||0.004||0.098||0.007
|165||Q891723||public company||0.013||0.015||837,595|| style="background: #ffdead;" |0.379||157||1.0||1,175,813|| style="background: #ffdead;" |0.617||219||1.426
|-
|-
|165||16||12||Q891723||public company||0.015||0.013||0||837,595||0.379||157||0.01||0.034||0.277||0.061||0.0||0.054
|91||Q43229||organization||0.08||0.037||806,840|| style="background: #ffdead;" |0.365||123||'''0.8'''||1,067,340|| style="background: #ffdead;" |0.56||600||'''3.908'''
|-
|-
|12||17||19||Q3305213||painting||0.432||0.578||12||834,752||0.378||79||0.005||0.012||0.332||0.187||0.005||0.012
|12||Q3305213||painting||0.578||0.432||834,752|| style="background: #ffdead;" |0.378||79||0.5||926,701|| style="background: #ffdead;" |0.486||86||0.558
|-
|-
|91||18||16||Q43229||organization||0.037||0.08||1||806,840||0.365||123||0.008||0.128||0.213||0.097||0.0||0.012
|86||Q47461344||written work||0.078||0.038||774,947|| style="background: #ffdead;" |0.351||67||0.4||881,216|| style="background: #ffdead;" |0.462||53||0.345
|-
|-
|89||19||8||Q4164871||position||0.037||0.128||1||788,077||0.356||332||0.021||0.004||0.343||0.016||0.0||0.003
|25||Q532||village||0.292||0.201||584,789|| style="background: #ffdead;" |0.265||21||0.1||872,310|| style="background: #ffdead;" |0.458||61||0.399
|-
|-
|28||20||30||Q482994||album||0.161||0.287||4||776,845||0.351||37||0.002||0.012||0.287||0.209||0.0||0.016
|4||Q4167836||Wikimedia category||5.165||5.85||1,383,343|| style="background: #ffdead;" |0.626||96||0.6||808,536|| style="background: #ffdead;" |0.424||74||0.484
|-
|-
|86||21||23||Q47461344||written work||0.038||0.078||1||774,947||0.351||67||0.004||0.244||0.085||0.039||0.0||0.003
|62||Q7889||video game||0.047||0.056||741,401|| style="background: #ffdead;" |0.335||30||0.2||753,351|| style="background: #ffdead;" |0.395||37||0.244
|-
|-
|62||22||35||Q7889||video game||0.056||0.047||2||741,401||0.335||30||0.002||0.006||0.195||0.256||0.005||0.007
|19||Q8502||mountain||0.559||0.253||227,393|| style="background: #ffdead;" |0.103||16||0.1||749,283|| style="background: #ffdead;" |0.393||47||0.306
|-
|-
|16||23||21||Q486972||human settlement||0.302||0.602||8||721,789||0.327||73||0.005||0.095||0.22||0.107||0.0||0.006
|28||Q482994||album||0.287||0.161||776,845|| style="background: #ffdead;" |0.351||37||0.2||704,746|| style="background: #ffdead;" |0.37||59||0.388
|-
|-
|8||24||18||Q7187||gene||0.927||1.273||26||628,916||0.284||94||0.006||0.107||0.063||0.007||0.021||0.113
|89||Q4164871||position||0.128||0.037||788,077|| style="background: #ffdead;" |0.356||332||2.1||645,434|| style="background: #ffdead;" |0.339||175||1.141
|-
|-
|25||25||46||Q532||village||0.201||0.292||6||584,789||0.265||21||0.001||0.001||0.246||0.109||0.0||0.013
|8||Q7187||gene||1.273||0.927||628,916|| style="background: #ffdead;" |0.284||94||0.6||604,364|| style="background: #ffdead;" |0.317||208||1.354
|-
|-
|70||26||27||Q732577||publication||0.048||0.076||1||512,416||0.232||53||0.003||0.229||0.003||0.23||0.0||0.0
|10||Q11173||chemical compound||1.302||0.693||1,307,852|| style="background: #ffdead;" |0.592||133||0.8||588,469|| style="background: #ffdead;" |0.309||76||0.496
|-
|-
|42||27||45||Q7725634||literary work||0.077||0.176||2||468,204||0.212||22||0.001||0.017||0.16||0.104||0.0||0.007
|54||Q215380||musical group||0.087||0.063||461,181|| style="background: #ffdead;" |0.209||17||0.1||585,266|| style="background: #ffdead;" |0.307||37||0.241
|-
|-
|138||28||57||Q18340514||events in a specific year or time period||0.019||0.048||1||463,683||0.21||17||0.001||0.0||0.2||0.056||0.0||0.005
|31||Q16970||church building||0.226||0.129||396,936|| style="background: #ffdead;" |0.18||25||0.2||577,677|| style="background: #ffdead;" |0.303||48||0.315
|-
|-
|54||29||60||Q215380||musical group||0.063||0.087||2||461,181||0.209||17||0.001||0.009||0.164||0.073||0.0||0.008
|70||Q732577||publication||0.076||0.048||512,416|| style="background: #ffdead;" |0.232||53||0.3||569,536|| style="background: #ffdead;" |0.299||37||0.238
|-
|-
|2||30||28||Q6999||astronomical object||8.75||8.942||242||448,032||0.203||51||0.003||0.0||0.175||0.085||0.015||0.003
|22||Q79007||street||0.62||0.231||225,188|| style="background: #ffdead;" |0.102||20||0.1||535,623|| style="background: #ffdead;" |0.281||44||0.289
|-
|-
|41||31||56||Q22808320||Wikimedia human name disambiguation page||0.078||0.075||2||433,986||0.196||17||0.001||0.0||0.174||0.154||0.0||0.001
|23||Q4022||river||0.425||0.219||280,190|| style="background: #ffdead;" |0.127||20||0.1||520,347|| style="background: #ffdead;" |0.273||56||0.365
|-
|-
|53||32||63||Q134556||single||0.065||0.103||2||431,003||0.195||16||0.001||0.001||0.167||0.138||0.0||0.004
|243||Q14204246||Wikimedia project page||0.033||0.008||1,114,113|| style="background: #ffdead;" |0.504||62||'''0.4'''||498,708|| style="background: #ffdead;" |0.262||548||'''3.572'''
|-
|-
|37||33||32||Q3331189||version, edition, or translation||0.087||0.19||2||410,352||0.186||34||0.002||0.103||0.053||0.118||0.004||0.028
|36||Q3947||house||0.216||0.098||118,886|| style="background: #ffdead;" |0.054||9||0.1||465,249|| style="background: #ffdead;" |0.244||33||0.212
|-
|-
|31||34||41||Q16970||church building||0.129||0.226||4||396,936||0.18||25||0.002||0.005||0.172||0.112||0.0||0.001
|32||Q41176||building||0.287||0.125||271,666|| style="background: #ffdead;" |0.123||36||0.2||463,636|| style="background: #ffdead;" |0.243||65||0.423
|-
|-
|71||35||25||Q86850539||Whitaker's Latin frequency type C||0.048||0.011||1||355,247||0.161||56||0.003||0.0||0.0||0.0||0.0||0.16
|310||Q783794||company||0.012||0.005||124,932|| style="background: #ffdead;" |0.057||19||0.1||459,638|| style="background: #ffdead;" |0.241||33||0.213
|-
|-
|11||36||65||Q8054||protein||0.67||1.05||19||349,573||0.158||16||0.001||0.079||0.034||0.002||0.02||0.066
|29||Q23397||lake||0.278||0.138||130,027|| style="background: #ffdead;" |0.059||14||0.1||456,054|| style="background: #ffdead;" |0.239||42||0.273
|-
|-
|49||37||167||Q2225692||fourth-level administrative division in Indonesia||0.07||0.088||2||344,964||0.156||5||0.0||0.0||0.147||0.098||0.0||0.009
|121||Q3957||town||0.015||0.023||294,685|| style="background: #ffdead;" |0.133||24||0.1||450,870|| style="background: #ffdead;" |0.237||46||0.297
|-
|-
|223||38||87||Q571||book||0.009||0.022||0||340,900||0.154||12||0.001||0.114||0.016||0.01||0.0||0.023
|64||Q811979||architectural structure||0.119||0.055||282,739|| style="background: #ffdead;" |0.128||28||0.2||445,779|| style="background: #ffdead;" |0.234||48||0.313
|-
|-
|112||39||76||Q476028||association football club||0.026||0.038||1||320,422||0.145||14||0.001||0.006||0.12||0.029||0.0||0.003
|80||Q34442||road||0.073||0.041||215,771|| style="background: #ffdead;" |0.098||14||0.1||440,960|| style="background: #ffdead;" |0.231||34||0.22
|-
|-
|21||40||10||Q2668072||collection||0.248||0.534||7||312,822||0.142||166||0.01||0.056||0.084||0.058||0.0||0.001
|280||Q21198342||manga series||0.014||0.007||208,503|| style="background: #ffdead;" |0.094||5||0.0||437,382|| style="background: #ffdead;" |0.23||11||0.074
|-
|-
|113||41||54||Q15632617||fictional human||0.026||0.056||1||306,319||0.139||18||0.001||0.006||0.1||0.05||0.0||0.003
|71||Q86850539||Whitaker's Latin frequency type C||0.011||0.048||355,247|| style="background: #ffdead;" |0.161||56||0.3||436,103|| style="background: #ffdead;" |0.229||95||0.622
|-
|-
|121||42||42||Q3957||town||0.023||0.015||1||294,685||0.133||24||0.001||0.047||0.079||0.014||0.0||0.002
|138||Q18340514||events in a specific year or time period||0.048||0.019||463,683|| style="background: #ffdead;" |0.21||17||0.1||431,649|| style="background: #ffdead;" |0.227||16||0.104
|-
|-
|133||43||58||Q506240||television film||0.02||0.019||1||290,899||0.132||17||0.001||0.009||0.098||0.07||0.0||0.02
|264||Q2085381||publisher||0.014||0.007||179,442|| style="background: #ffdead;" |0.081||23||0.1||420,459|| style="background: #ffdead;" |0.221||37||0.243
|-
|-
|136||44||5||Q15416||television program||0.019||0.05||1||286,609||0.13||386||0.024||0.024||0.084||0.072||0.0||0.01
|45||Q55488||railway station||0.104||0.075||258,862|| style="background: #ffdead;" |0.117||20||0.1||410,774|| style="background: #ffdead;" |0.216||49||0.319
|-
|-
|72||45||79||Q105543609||musical work/composition||0.048||0.099||1||285,889||0.129||13||0.001||0.004||0.095||0.061||0.004||0.009
|108||Q33506||museum||0.044||0.028||252,308|| style="background: #ffdead;" |0.114||54||0.3||409,716|| style="background: #ffdead;" |0.215||75||0.486
|-
|-
|64||46||38||Q811979||architectural structure||0.055||0.119||2||282,739||0.128||28||0.002||0.09||0.035||0.024||0.0||0.001
|177||Q34770||language||0.011||0.013||1,713,196|| style="background: #ffdead;" |0.775||73||0.5||402,013|| style="background: #ffdead;" |0.211||145||0.947
|-
|-
|23||47||51||Q4022||river||0.219||0.425||6||280,190||0.127||20||0.001||0.002||0.12||0.045||0.0||0.002
|113||Q15632617||fictional human||0.056||0.026||306,319|| style="background: #ffdead;" |0.139||18||0.1||395,934|| style="background: #ffdead;" |0.208||25||0.166
|-
|-
|32||48||31||Q41176||building||0.125||0.287||3||271,666||0.123||36||0.002||0.034||0.084||0.065||0.002||0.001
|41||Q22808320||Wikimedia human name disambiguation page||0.075||0.078||433,986|| style="background: #ffdead;" |0.196||17||0.1||381,873|| style="background: #ffdead;" |0.2||19||0.125
|-
|-
|45||49||50||Q55488||railway station||0.075||0.104||2||258,862||0.117||20||0.001||0.001||0.109||0.072||0.0||0.001
|144||Q11032||newspaper||0.043||0.017||230,085|| style="background: #ffdead;" |0.104||11||0.1||380,153|| style="background: #ffdead;" |0.199||28||0.181
|-
|-
|192||50||143||Q3464665||television series season||0.011||0.02||0||254,318||0.115||6||0.0||0.031||0.077||0.009||0.0||0.0
|37||Q3331189||version, edition, or translation||0.19||0.087||410,352|| style="background: #ffdead;" |0.186||34||0.2||374,597|| style="background: #ffdead;" |0.197||19||0.126
|}
|}


Line 311: Line 416:
{|
{|
|
|
{| class="wikitable sortable"
{| class="wikitable sortable mw-collapsible"
|+ Relationship between subgraphs and user agents
|+ {{nowrap|Relationship between subgraphs and user agents}}
! #of Subgraphs (X) !! #of User agents querying X subgraphs!! %of User agents querying X subgraphs
! #of Subgraphs (X) !! #of User agents querying X subgraphs!! %of User agents querying X subgraphs
|-
|-
Line 611: Line 716:
[[File:subgraph_pair_heatmap.png]]
[[File:subgraph_pair_heatmap.png]]


== Triples analysis ==
== Human subgraph (Q5) query analysis ==
 
The following analysis was done with query data of <code>November, 2021</code>.
 
Queries estimated to be related to the human subgraph accounted for '''31.94%''' of all queries in Wikidata. '''31.09%''' of all queries used only the human subgraph, and the remaining '''0.85%''' used a mix of the human subgraph and various other subgraphs. As described in [[#What are subgraph related queries]], subgraphs are related to queries through properties, subject or object URIs, subgraph instance items, etc. Here is the breakdown for the human subgraph, taken from [[#Query count and time]]. A query can be related to the human subgraph for more than one of the following reasons.
* Number of queries: 60,868,572 (31.94%)
* Percent of queries matching the subgraph Qid, i.e., containing Q5: 2.54%
* Percent of queries matching instance items: 18%
* Percent of queries matching subject/object URIs: 12%
* Percent of queries matching properties: 19.45%
* Percent of queries matching literal strings: 1.43%
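Because a query can match for several reasons at once, the per-reason percentages add up to more than the overall share of the human subgraph. The short sketch below makes that overlap explicit, using the unrounded Q5 values from the table in [[#Query count and time]]; the variable names are illustrative.

```python
# Share of all queries related to the human subgraph (percent).
human_share = 31.941

# Share of all queries matched by each reason (percent), from the Q5 table row.
match_reasons = {
    "Qid": 2.541,
    "instance items": 18.199,
    "subject/object URIs": 12.198,
    "properties": 19.457,
    "literals": 1.435,
}

sum_of_reasons = sum(match_reasons.values())

# Any excess over the overall share comes from queries counted under
# more than one reason.
overlap = sum_of_reasons - human_share

print(f"sum of reasons: {sum_of_reasons:.2f}%")
print(f"counted more than once: {overlap:.2f} percentage points")
```

The same additivity check applies to the split between human-only queries (31.09%) and mixed queries (0.85%), which together give the 31.94% total.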
 
Some of these categories account for a large percentage of queries. It is worth looking at which items, properties, and URIs are queried the most. The distribution of these items' usage across queries also shows how narrow or wide the search space is.
 
=== Instance items matched ===
* Total items used: 7,969,182
* Total queries that use these items: 34,680,808 (18% of all queries)
* The distribution shows that there are some high-usage items (~10k-20k queries), a small number of medium-usage items (~5k queries), and the rest form a long tail of low-usage items (<1k queries) in the human subgraph.
 
{|
| rowspan="2" |
{| class="wikitable sortable"
|+ Top items that cause a query to be related to Human subgraph (Q5)
|-
! Instance item !! Instance item label !!#of queries
|-
|Q22686||Donald Trump||19759
|-
|Q1747297||Robert Oliveri||19247
|-
|Q509260||John Zimmerman||19193
|-
|Q6499255||Laura Nader||19135
|-
|Q209394||Michael Wood||19101
|-
|Q937||Albert Einstein||19098
|-
|Q7340648||Rob Whitehurst||19026
|-
|Q52354375||Irene Aparicio||18970
|-
|Q6232209||John F. Cassidy||18964
|-
|Q22986632||Lori Lynn Ross||18954
|-
|Q3976229||Stuart Lancaster||18953
|-
|Q106466114||Gary Michael Ritchie||18947
|-
|Q86599148||James Spicer||18926
|-
|Q87653156||David A. Cook||18919
|-
|Q16015822||Jerry Fleck||18917
|-
|Q7179427||Petur Hliddal||18914
|-
|Q19878977||Jackie Carson||18902
|-
|Q99859767||Kathy McCarty||18898
|-
|Q90307934||Ann Harris||18893
|-
|Q1070508||Cheryl Carasik||18834
|-
|Q9682||Elizabeth II||18816
|-
|Q6279||Joe Biden||18277
|-
|Q64840837||Dylan Arnold||18161
|-
|Q76||Barack Obama||18035
|-
|Q107626126||Mauricio Lara||18010
|}
|
[[File:human_instance_count_all.png|600px]]
|-
|
[[File:human_instance_count_20k.gif|650px]]
|}
 
=== Properties matched ===
* Total properties used: 1,091 (recall that these are properties that occur 99% of the time in the human subgraph)
* Total queries that use these properties: 37,078,566 (19.45% of all queries)
* The distribution shows that there are 3 properties with ~20-30M queries and 7 properties with ~1-5M queries, while each of the remaining 1,000+ properties matches ~100K queries or fewer. In short, the distribution is extremely skewed by only ~10 properties that are highly related to the human subgraph.
 
{|
| rowspan="2" |
{| class="wikitable sortable"
|+ Top properties that cause a query to be related to Human subgraph (Q5)
|-
! Property !! Property label !!#of queries
|-
|P570||date of death||30151024
|-
|P569||date of birth||30084200
|-
|P27||country of citizenship||24186000
|-
|P106||occupation||5259920
|-
|P734||family name||4871326
|-
|P735||given name||4616631
|-
|P19||place of birth||2379702
|-
|P2949||WikiTree person ID||1707373
|-
|P20||place of death||1222037
|-
|P4985||TMDb person ID||916399
|-
|P39||position held||750380
|-
|P3602||candidacy in election||599067
|-
|P69||educated at||561380
|-
|P26||spouse||471589
|-
|P108||employer||384111
|-
|P2562||married name||279197
|-
|P937||work location||258707
|-
|P1066||student of||158339
|-
|P184||doctoral advisor||152318
|-
|P1960||Google Scholar author ID||151507
|-
|P185||doctoral student||150982
|-
|P54||member of sports team||150573
|-
|P1153||Scopus author ID||150545
|-
|P119||place of burial||144027
|-
|P3829||Publons author ID||138839
|}
|
[[File:human_pred_count_all_log.png|600px]]
|-
|
[[File:human_pred_count.gif|650px]]
|}
 
=== Subject/Object URI matched ===
* Total URIs used: 7,926,297 (recall that these are URIs that occur 99% of the time in the human subgraph)
* Total queries that use these URIs: 23,245,152 (12.2% of all queries)
* The top URIs/items show the obvious and most common ways the human subgraph is queried: queries about specific people, about groups of people, and about their Wikipedia pages. More about types of queries below.
* The distribution is a smooth logarithmic curve: a single item appears in 165k queries, and the rest fall off from ~40k in a logarithmic pattern.
 
{|
| rowspan="2" |
{| class="wikitable sortable"
|+ Top URIs that cause a query to be related to Human subgraph (Q5)
|-
! URI !! URI label !!#of queries
|-
|Q3391743||visual artist||165540
|-
|Q1925963||graphic artist||38897
|-
|Q28389||screenwriter||33718
|-
|en.wikipedia.org/wiki/Lee_Child||-||33179
|-
|en.wikipedia.org/wiki/Emily_Wilson_(journalist)||-||30837
|-
|en.wikipedia.org/wiki/M.I.A._(rapper)||-||29388
|-
|Q10800557||film actor||29318
|-
|en.wikipedia.org/wiki/Shannon_Lee||-||29216
|-
|en.wikipedia.org/wiki/Eugene_Gordon_Lee||-||29205
|-
|en.wikipedia.org/wiki/Lee_Childs||-||29203
|-
|en.wikipedia.org/wiki/Emily_Wilson_(classicist)||-||26864
|-
|en.wikipedia.org/wiki/Emily_Wilson_(actress)||-||26862
|-
|en.wikipedia.org/wiki/Adhir_Kalyan||-||26862
|-
|en.wikipedia.org/wiki/Emily_Wilson_(footballer)||-||26861
|-
|en.wikipedia.org/wiki/Emily_Wilson_Walker||-||26861
|-
|Q10798782||television actor||24679
|-
|Q185351||jurist||22130
|-
|Q1650915||researcher||21206
|-
|Q2374149||botanist||20385
|-
|Q250867||Catholic priest||20314
|-
|Q10873124||chess player||19832
|-
|Q12299841||cricketer||19414
|-
|Q14373094||rugby league player||19396
|-
|Q509260||John Zimmerman||19193
|-
|Q6499255||Laura Nader||19135
|}
|
[[File:human_uri_count_all_log.png|600px]]
|-
|
[[File:human_uri_count.gif|650px]]
|}
 
=== Query time ===
 
* Queries on the human subgraph take 34% of the total query time and account for ~32% of all queries.
* Average time per query is 0.3 seconds (300 ms). Most queries in this subgraph are small and simple.
* The query time distribution is shown in the chart below, both in absolute counts and as a percentage of queries in the human subgraph.
 
[[File:human_time_class.png]]
 
=== User agent ===
 
The top user agents that query the human subgraph are listed below. This helps us view the distribution of usage: whether a few user agents dominate the usage or whether usage is well distributed across user agents. The top 10 user agents in terms of query count and also in terms of query time are shown in the table below.
 
{| class="wikitable sortable mw-collapsible"
|+ {{nowrap|Top user agents in human subgraph}}
|-
! User agent !! Query count !! % query in human subgraph !! % query overall !! Query time(hr) !!% query time in human subgraph !! % query time overall
|-
|mix-n-match||6960988||11.436||3.653||79||1.51||0.516
|-
|searx1||6615319||10.868||3.471||778||14.832||5.072
|-
|UA#3||3491821||5.737||1.832||75||1.426||0.487
|-
|UA#4||3073725||5.05||1.613||175||3.327||1.138
|-
|UA#5||2933240||4.819||1.539||80||1.516||0.518
|-
|UA#6||2488807||4.089||1.306||19||0.364||0.125
|-
|UA#7||2182220||3.585||1.145||44||0.841||0.288
|-
|WikidataQueryServiceR||2044045||3.358||1.073||36||0.68||0.232
|-
|UA#9||1970264||3.237||1.034||27||0.524||0.179
|-
|searx2||1909144||3.137||1.002||200||3.808||1.302
|-
|UA#11||75523||0.124||0.04||434||8.271||2.828
|-
|UA#12||55357||0.091||0.029||319||6.083||2.08
|-
|searx3||1428789||2.347||0.75||151||2.871||0.982
|-
|OB-bot||287534||0.472||0.151||144||2.736||0.935
|-
|UA#15||50915||0.084||0.027||134||2.553||0.873
|-
|UA#16||31298||0.051||0.016||112||2.132||0.729
|-
|searx4||771932||1.268||0.405||92||1.761||0.602
|}
 
The query time breakdown was plotted for the top 20 user agents (in terms of time). Most queries have query time of 10ms to 1s, as observed earlier. Some user agents have most queries in the range 10ms to 100ms and some others have most queries in the range 100ms to 1s.
 
[[File:human_ua_query_class_percent_limy15.png]]
 
== Taxon subgraph (Q16521) query analysis ==
The following analysis was done with query data of <code>November, 2021</code>.

Revision as of 23:02, 15 December 2021

Analysis on Subgraphs in Wikidata showed how large each of the subgraphs in Wikidata is and how connected they are. This page shows the results from analysis on the queries that relate to these subgraphs. The questions that needed to be answered were:

  • How many queries (in percent) access each subgraph?
  • How many queries access multiple subgraphs at once? i.e., how much overlap can we expect in subgraphs?
  • How long do these queries take?
  • How many user-agents access each subgraph? How many of them access lots of subgraphs, or are they confined to a small set of subgraphs? Do some of them dominate queries in multiple subgraphs?
  • Are there chunks of similar queries in these subgraphs? i.e., how diverse are the queries in each subgraph?

TL;DR

What are subgraph related queries

We define some parameters to identify whether a query touches on a subgraph based on the items and properties a query uses. Some queries may even touch on multiple subgraphs. See more on what a subgraph means here. Note: Subgraphs have overlaps.

The parameters that define which subgraph a query belongs to are:

  1. If the query uses the subgraph's Qid. Example: Q5 containing queries are part of Q5 subgraph.
  2. If the query uses items that are instances of a particular subgraph.
  3. If the query uses items that occur 99% of the time in a particular subgraph.
  4. If the query uses properties that occur 99% of the time in a particular subgraph.
  5. If the query uses literals that occur 99% of the time in a particular subgraph. The literals can occur with or without language tags. Both versions are compared to check for a match. Note that whole literals are matched between queries and Wikidata. Queries that ask for partial matches, using regex for example, are not included. The assumption is that such queries are likely to contain other items from the subgraph and are caught anyway.
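The five parameters above can be sketched as a set-membership check. This is only an illustration, not the code used for the analysis: the `SubgraphProfile` class, its field names, and `match_reasons` are hypothetical, and the sketch assumes the items, properties, and literals have already been extracted from each query and that the per-subgraph lookup sets have been precomputed from the dump.

```python
from dataclasses import dataclass, field


@dataclass
class SubgraphProfile:
    """Hypothetical lookup sets for one subgraph, assumed precomputed from the dump."""
    qid: str                                           # parameter 1: the subgraph's Qid
    instance_items: set = field(default_factory=set)   # parameter 2: instance items
    items_99: set = field(default_factory=set)         # parameter 3: items 99% in subgraph
    properties_99: set = field(default_factory=set)    # parameter 4: properties 99% in subgraph
    literals_99: set = field(default_factory=set)      # parameter 5: literals 99% in subgraph


def strip_language_tag(literal):
    """Drop a trailing @lang tag from a quoted literal; leave other literals unchanged."""
    body, sep, tag = literal.rpartition("@")
    if sep and body.endswith('"') and tag.replace("-", "").isalnum():
        return body
    return literal


def match_reasons(profile, items, properties, literals):
    """Return the parameters under which a query matches the subgraph."""
    reasons = []
    if profile.qid in items:
        reasons.append("qid")
    if items & profile.instance_items:
        reasons.append("instance item")
    if items & profile.items_99:
        reasons.append("item")
    if properties & profile.properties_99:
        reasons.append("property")
    # Whole literals only, compared both with and without their language tag.
    both_forms = set(literals) | {strip_language_tag(lit) for lit in literals}
    if both_forms & profile.literals_99:
        reasons.append("literal")
    return reasons


human = SubgraphProfile(
    qid="Q5",
    instance_items={"Q937"},
    items_99={"Q3391743"},
    properties_99={"P569", "P570"},
    literals_99={'"Laura Nader"'},  # illustrative literal, not taken from the analysis
)
reasons = match_reasons(human, {"Q937"}, {"P569", "P31"}, {'"Laura Nader"@en'})
print(reasons)
```

A query can match under several parameters at once, which is why the per-parameter percentages reported for a subgraph can add up to more than that subgraph's share of queries.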

The following analysis uses Wikidata dump of 20211101 and WDQS public SPARQL queries of 10/2021 unless otherwise stated. All query related numbers below are monthly values.

Query count and time

  • All queries here refer to queries with status code 200 or 500, i.e., well-formed queries that either succeeded or timed out.
  • WDQS receives ~220M queries a month.
  • Total query time for all queries for a month is ~16,000 hours.
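As a rough sanity check, the two monthly totals above imply an average time per query. The short sketch below only redoes that arithmetic with the rounded figures from the text, and applies the same formula to the human (Q5) row of the top-50 table; the variable names are illustrative.

```python
TOTAL_QUERIES = 220_000_000   # ~220M queries per month (rounded, from the text)
TOTAL_HOURS = 16_000          # ~16,000 hours per month (rounded, from the text)

# Average seconds per query over all of WDQS.
avg_seconds = TOTAL_HOURS * 3600 / TOTAL_QUERIES
print(f"overall: {avg_seconds:.3f} s per query")

# Same calculation for the human subgraph (Q5): 5248 hours over 60,868,572 queries.
q5_avg = 5248 * 3600 / 60_868_572
print(f"Q5: {q5_avg:.2f} s per query")
```

Because the hour totals are rounded, these averages are approximate; the Q5 result agrees with the 0.31 listed in the "Avg time/query" column.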

The table below lists the top 50 most queried subgraphs with subgraph size and query time information of 11/2021. A breakdown of what caused the match is also present, which corresponds to the parameters mentioned in #What are subgraph related queries. It also ranks the subgraphs by size, query count, and query time consumed. A more complete list containing 341 subgraphs, that form ~90% of Wikidata triples, is available here: [csvlink|all_subgraph_data.csv]. The difference between values from October and November is shown in the next table for comparison purposes. In some places, the query count percentages differ a lot.

Top 50 most queried subgraphs in Wikidata with subgraph size information
Subgraph rank by size Subgraph rank by query count Subgraph rank by query time Subgraph Subgraph label %of triples %of entities Days to recover (4.77M rate) Query count %count of all queries Query time (hr) %time of all queries Avg time/query %count of query from Qid %count of query from instance items %count of query from items %count of query from properties %count of query from literals
3 1 1 Q5 human 7.254 10.045 204 60,868,572 31.941 5248 34.195 0.31 2.541 18.199 12.198 19.457 1.435
5 2 11 Q16521 taxon 2.885 3.5 81 27,172,995 14.259 480 3.131 0.064 12.19 0.746 12.862 0.87 0.433
34 3 7 Q4830453 business 0.107 0.208 3 9,228,037 4.842 554 3.607 0.216 1.646 2.95 2.24 0.001 0.177
6 4 5 Q101352 family name 1.646 0.511 46 5,990,617 3.144 659 4.292 0.396 0.041 3.057 2.791 0.018 0.038
15 5 2 Q11424 film 0.359 0.284 10 5,067,305 2.659 1541 10.042 1.095 0.451 1.469 1.348 0.003 0.543
1 6 13 Q13442814 scholarly article 48.935 39.815 1378 4,944,995 2.595 263 1.713 0.191 0.017 1.942 1.938 0.405 0.396
7 7 3 Q4167410 Wikimedia disambiguation page 1.354 1.464 38 3,292,873 1.728 765 4.982 0.836 0.164 0.192 0.472 0.0 1.163
2 8 25 Q6999 astronomical object 8.684 8.943 245 2,444,109 1.283 79 0.516 0.117 0.003 1.218 1.222 0.023 0.004
92 9 14 Q6881511 enterprise 0.036 0.052 1 1,937,486 1.017 234 1.528 0.436 0.083 0.812 0.538 0.0 0.071
26 10 29 Q484170 commune of France 0.179 0.048 5 1,934,902 1.015 70 0.455 0.13 0.024 0.869 0.085 0.115 0.01
19 11 22 Q13406463 Wikimedia list article 0.249 0.355 7 1,766,742 0.927 117 0.765 0.239 0.034 0.372 0.628 0.0 0.137
63 12 12 Q5398426 television series 0.055 0.063 2 1,379,486 0.724 411 2.68 1.073 0.048 0.376 0.369 0.0 0.167
37 13 47 Q7725634 literary work 0.087 0.203 2 1,377,546 0.723 42 0.273 0.11 0.39 0.181 0.243 0.0 0.009
16 14 4 Q486972 human settlement 0.298 0.612 8 1,328,064 0.697 699 4.557 1.896 0.328 0.39 0.236 0.0 0.005
163 15 15 Q891723 public company 0.015 0.013 0 1,175,813 0.617 219 1.426 0.67 0.042 0.415 0.185 0.001 0.092
90 16 6 Q43229 organization 0.037 0.082 1 1,067,340 0.56 600 3.908 2.023 0.259 0.227 0.146 0.0 0.021
13 17 24 Q3305213 painting 0.426 0.579 12 926,701 0.486 86 0.558 0.333 0.017 0.426 0.284 0.002 0.008
87 18 36 Q47461344 written work 0.037 0.078 1 881,216 0.462 53 0.345 0.216 0.289 0.079 0.114 0.0 0.003
25 19 32 Q532 village 0.199 0.294 6 872,310 0.458 61 0.399 0.253 0.003 0.417 0.198 0.0 0.015
4 20 28 Q4167836 Wikimedia category 5.806 5.175 164 808,536 0.424 74 0.484 0.331 0.037 0.363 0.292 0.0 0.024
61 21 51 Q7889 video game 0.055 0.048 2 753,351 0.395 37 0.244 0.179 0.006 0.181 0.314 0.002 0.01
20 22 41 Q8502 mountain 0.248 0.559 7 749,283 0.393 47 0.306 0.225 0.002 0.369 0.351 0.0 0.001
28 23 33 Q482994 album 0.16 0.288 5 704,746 0.37 59 0.388 0.304 0.012 0.15 0.189 0.0 0.098
89 24 17 Q4164871 position 0.037 0.128 1 645,434 0.339 175 1.141 0.977 0.003 0.305 0.025 0.0 0.011
8 25 16 Q7187 gene 0.911 1.273 26 604,364 0.317 208 1.354 1.238 0.084 0.1 0.022 0.015 0.127
11 26 26 Q11173 chemical compound 0.684 1.302 19 588,469 0.309 76 0.496 0.466 0.135 0.11 0.092 0.002 0.014
55 27 54 Q215380 musical group 0.062 0.087 2 585,266 0.307 37 0.241 0.227 0.01 0.205 0.16 0.0 0.011
31 28 39 Q16970 church building 0.128 0.227 4 577,677 0.303 48 0.315 0.301 0.003 0.288 0.214 0.0 0.002
71 29 55 Q732577 publication 0.047 0.076 1 569,536 0.299 37 0.238 0.231 0.283 0.015 0.296 0.0 0.0
22 30 43 Q79007 street 0.23 0.626 6 535,623 0.281 44 0.289 0.298 0.028 0.246 0.218 0.001 0.001
23 31 34 Q4022 river 0.216 0.425 6 520,347 0.273 56 0.365 0.388 0.002 0.254 0.192 0.0 0.002
242 32 8 Q14204246 Wikimedia project page 0.008 0.033 0 498,708 0.262 548 3.572 3.957 0.026 0.19 0.038 0.0 0.064
36 33 63 Q3947 house 0.096 0.216 3 465,249 0.244 33 0.212 0.252 0.0 0.238 0.223 0.0 0.002
32 34 31 Q41176 building 0.124 0.29 3 463,636 0.243 65 0.423 0.504 0.042 0.189 0.168 0.001 0.002
307 35 62 Q783794 company 0.005 0.012 0 459,638 0.241 33 0.213 0.256 0.081 0.146 0.1 0.0 0.006
29 36 48 Q23397 lake 0.136 0.279 4 456,054 0.239 42 0.273 0.331 0.002 0.227 0.211 0.0 0.001
119 37 42 Q3957 town 0.023 0.015 1 450,870 0.237 46 0.297 0.364 0.057 0.162 0.034 0.0 0.003
64 38 40 Q811979 architectural structure 0.054 0.12 2 445,779 0.234 48 0.313 0.388 0.097 0.126 0.117 0.0 0.001
80 39 59 Q34442 road 0.041 0.073 1 440,960 0.231 34 0.22 0.276 0.008 0.129 0.171 0.0 0.001
275 40 180 Q21198342 manga series 0.007 0.015 0 437,382 0.23 11 0.074 0.093 0.01 0.052 0.2 0.0 0.003
72 41 23 Q86850539 Whitaker's Latin frequency type C 0.047 0.011 1 436,103 0.229 95 0.622 0.788 0.0 0.0 0.0 0.0 0.228
138 42 139 Q18340514 events in a specific year or time period 0.019 0.048 1 431,649 0.227 16 0.104 0.133 0.0 0.21 0.068 0.0 0.004
261 43 53 Q2085381 publisher 0.007 0.015 0 420,459 0.221 37 0.243 0.319 0.001 0.21 0.068 0.0 0.004
44 44 38 Q55488 railway station 0.074 0.104 2 410,774 0.216 49 0.319 0.43 0.001 0.172 0.163 0.0 0.002
108 45 27 Q33506 museum 0.027 0.044 1 409,716 0.215 75 0.486 0.655 0.017 0.184 0.134 0.0 0.001
181 46 19 Q34770 language 0.013 0.011 0 402,013 0.211 145 0.947 1.302 0.009 0.169 0.02 0.0 0.017
112 47 86 Q15632617 fictional human 0.025 0.056 1 395,934 0.208 25 0.166 0.232 0.007 0.138 0.09 0.0 0.004
42 48 119 Q22808320 Wikimedia human name disambiguation page 0.077 0.075 2 381,873 0.2 19 0.125 0.181 0.0 0.164 0.142 0.0 0.001
143 49 75 Q11032 newspaper 0.017 0.043 0 380,153 0.199 28 0.181 0.263 0.002 0.169 0.143 0.0 0.019
38 50 117 Q3331189 version, edition, or translation 0.087 0.191 2 374,597 0.197 19 0.126 0.186 0.117 0.037 0.134 0.0 0.038

Comparison of subgraph queries across time

Comparison of subgraph queries across time (Oct, Nov 2021)
Subgraph rank by size Subgraph Subgraph label %of entities %of triples Oct query count Oct %count of queries Oct query time (hr) Oct %time of queries Nov query count Nov %count of queries Nov query time (hr) Nov %time of queries
3 Q5 human 9.986 7.324 68,659,369 31.058 6,314 39.3 60,868,572 31.941 5,248 34.195
5 Q16521 taxon 3.427 2.871 56,437,140 25.529 495 3.1 27,172,995 14.259 480 3.131
34 Q4830453 business 0.207 0.108 4,041,395 1.828 343 2.1 9,228,037 4.842 554 3.607
6 Q101352 family name 0.509 1.546 5,564,173 2.517 640 4.0 5,990,617 3.144 659 4.292
15 Q11424 film 0.281 0.364 4,757,084 2.152 1,613 10.0 5,067,305 2.659 1,541 10.042
1 Q13442814 scholarly article 39.794 49.668 1,649,268 0.746 142 0.9 4,944,995 2.595 263 1.713
7 Q4167410 Wikimedia disambiguation page 1.459 1.374 3,737,550 1.691 223 1.4 3,292,873 1.728 765 4.982
2 Q6999 astronomical object 8.942 8.75 448,032 0.203 51 0.3 2,444,109 1.283 79 0.516
92 Q6881511 enterprise 0.052 0.036 943,613 0.427 164 1.0 1,937,486 1.017 234 1.528
26 Q484170 commune of France 0.043 0.18 866,766 0.392 46 0.3 1,934,902 1.015 70 0.455
20 Q13406463 Wikimedia list article 0.352 0.252 1,283,160 0.58 73 0.5 1,766,742 0.927 117 0.765
63 Q5398426 television series 0.062 0.055 1,206,285 0.546 366 2.3 1,379,486 0.724 411 2.68
42 Q7725634 literary work 0.176 0.077 468,204 0.212 22 0.1 1,377,546 0.723 42 0.273
16 Q486972 human settlement 0.602 0.302 721,789 0.327 73 0.5 1,328,064 0.697 699 4.557
165 Q891723 public company 0.013 0.015 837,595 0.379 157 1.0 1,175,813 0.617 219 1.426
91 Q43229 organization 0.08 0.037 806,840 0.365 123 0.8 1,067,340 0.56 600 3.908
12 Q3305213 painting 0.578 0.432 834,752 0.378 79 0.5 926,701 0.486 86 0.558
86 Q47461344 written work 0.078 0.038 774,947 0.351 67 0.4 881,216 0.462 53 0.345
25 Q532 village 0.292 0.201 584,789 0.265 21 0.1 872,310 0.458 61 0.399
4 Q4167836 Wikimedia category 5.165 5.85 1,383,343 0.626 96 0.6 808,536 0.424 74 0.484
62 Q7889 video game 0.047 0.056 741,401 0.335 30 0.2 753,351 0.395 37 0.244
19 Q8502 mountain 0.559 0.253 227,393 0.103 16 0.1 749,283 0.393 47 0.306
28 Q482994 album 0.287 0.161 776,845 0.351 37 0.2 704,746 0.37 59 0.388
89 Q4164871 position 0.128 0.037 788,077 0.356 332 2.1 645,434 0.339 175 1.141
8 Q7187 gene 1.273 0.927 628,916 0.284 94 0.6 604,364 0.317 208 1.354
10 Q11173 chemical compound 1.302 0.693 1,307,852 0.592 133 0.8 588,469 0.309 76 0.496
54 Q215380 musical group 0.087 0.063 461,181 0.209 17 0.1 585,266 0.307 37 0.241
31 Q16970 church building 0.226 0.129 396,936 0.18 25 0.2 577,677 0.303 48 0.315
70 Q732577 publication 0.076 0.048 512,416 0.232 53 0.3 569,536 0.299 37 0.238
22 Q79007 street 0.62 0.231 225,188 0.102 20 0.1 535,623 0.281 44 0.289
23 Q4022 river 0.425 0.219 280,190 0.127 20 0.1 520,347 0.273 56 0.365
243 Q14204246 Wikimedia project page 0.033 0.008 1,114,113 0.504 62 0.4 498,708 0.262 548 3.572
36 Q3947 house 0.216 0.098 118,886 0.054 9 0.1 465,249 0.244 33 0.212
32 Q41176 building 0.287 0.125 271,666 0.123 36 0.2 463,636 0.243 65 0.423
310 Q783794 company 0.012 0.005 124,932 0.057 19 0.1 459,638 0.241 33 0.213
29 Q23397 lake 0.278 0.138 130,027 0.059 14 0.1 456,054 0.239 42 0.273
121 Q3957 town 0.015 0.023 294,685 0.133 24 0.1 450,870 0.237 46 0.297
64 Q811979 architectural structure 0.119 0.055 282,739 0.128 28 0.2 445,779 0.234 48 0.313
80 Q34442 road 0.073 0.041 215,771 0.098 14 0.1 440,960 0.231 34 0.22
280 Q21198342 manga series 0.014 0.007 208,503 0.094 5 0.0 437,382 0.23 11 0.074
71 Q86850539 Whitaker's Latin frequency type C 0.011 0.048 355,247 0.161 56 0.3 436,103 0.229 95 0.622
138 Q18340514 events in a specific year or time period 0.048 0.019 463,683 0.21 17 0.1 431,649 0.227 16 0.104
264 Q2085381 publisher 0.014 0.007 179,442 0.081 23 0.1 420,459 0.221 37 0.243
45 Q55488 railway station 0.104 0.075 258,862 0.117 20 0.1 410,774 0.216 49 0.319
108 Q33506 museum 0.044 0.028 252,308 0.114 54 0.3 409,716 0.215 75 0.486
177 Q34770 language 0.011 0.013 1,713,196 0.775 73 0.5 402,013 0.211 145 0.947
113 Q15632617 fictional human 0.056 0.026 306,319 0.139 18 0.1 395,934 0.208 25 0.166
41 Q22808320 Wikimedia human name disambiguation page 0.075 0.078 433,986 0.196 17 0.1 381,873 0.2 19 0.125
144 Q11032 newspaper 0.043 0.017 230,085 0.104 11 0.1 380,153 0.199 28 0.181
37 Q3331189 version, edition, or translation 0.19 0.087 410,352 0.186 34 0.2 374,597 0.197 19 0.126

More on query time

The query time can be broken down into classes for better visualization. Below is a figure with the query class distribution (number of queries per query time class per subgraph) for the top 50 subgraphs. Some of the takeaways are:

  • Most subgraphs have most of their queries in the range of 10-100ms.
  • The second most common class is 100ms to 1s.
  • collection and photograph have most of their queries (~150k) timed at 1-10s. Around 10 more subgraphs have a small number of queries (~10-20k) in this time range.
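The classing described above can be sketched as a simple bucketing step. The boundaries at 10ms, 100ms, 1s and 10s follow the classes named in the text; the open-ended classes below 10ms and at or above 10s, the function names, and the sample rows are assumptions for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries in milliseconds. The classes 10-100ms, 100ms-1s and 1-10s
# come from the text; the two open-ended classes are assumptions.
BOUNDS_MS = [10, 100, 1_000, 10_000]
LABELS = ["<10ms", "10-100ms", "100ms-1s", "1-10s", ">=10s"]


def time_class(ms):
    """Map a query time in milliseconds to its class label (lower bound inclusive)."""
    return LABELS[bisect_right(BOUNDS_MS, ms)]


def class_distribution(rows):
    """rows: iterable of (subgraph, query_time_ms). Returns {subgraph: Counter of classes}."""
    dist = {}
    for subgraph, ms in rows:
        dist.setdefault(subgraph, Counter())[time_class(ms)] += 1
    return dist


# Made-up sample rows, only to show the shape of the result.
sample = [("Q5", 35), ("Q5", 420), ("Q5", 80), ("Q11424", 1500)]
dist = class_distribution(sample)
print(dist)
```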

File:Top 50 query time class.png

User agent

Analysis on user agents is an approximation because user-agent strings don't exactly represent distinct users. For example, lots of people use the same bot or script without changing the user-agent, or the same person or bot uses multiple user-agent strings. Nevertheless, the available data lets us make an estimate.

User agent count

  • Total number of unique user agents across all subgraphs: 981,180
  • First, the subgraphs with the most and the fewest distinct user-agents are listed. Even the subgraph with the fewest user-agents has at least 10, so all the large subgraphs are used by multiple users.
  • The subgraphs with the most user-agents are of various types. Several of them (gene, protein, biological process, molecular function) appear to be closely related, so it is possible that the same queries are counted in several of these subgraphs. More on subgraph connectivity in #Subgraph Connectivity.
Subgraphs with most user-agents
Subgraph Subgraph label %Query #User agents Fraction of all user agents
Q11424 film 2.152 251420 0.256
Q8054 protein 0.158 234659 0.239
Q7187 gene 0.284 187029 0.191
Q2996394 biological process 0.072 124415 0.127
Q14860489 molecular function 0.044 89445 0.091
Q5 human 31.058 55377 0.056
Q898273 protein domain 0.019 38484 0.039
Q16521 taxon 25.529 25193 0.026
Q86850539 Whitaker's Latin frequency type C 0.161 20158 0.021
Q4167410 Wikimedia disambiguation page 1.691 13818 0.014
Q14204246 Wikimedia project page 0.504 13443 0.014
Q476028 association football club 0.145 12086 0.012
Q235557 file format 0.045 7701 0.008
Q1520033 count noun 0.05 7662 0.008
Q417841 protein family 0.007 4906 0.005
Q484170 commune of France 0.392 4764 0.005
Q4830453 business 1.828 4383 0.004
Q4164871 position 0.356 4319 0.004
Q7278 political party 0.109 4073 0.004
Q3918 university 0.104 3565 0.004
Subgraphs with least user-agents
Subgraph Subgraph label %Query #User agents Fraction of all user agents
Q106006703 local regulations of the People's Republic of China 0.0 11 0.0
Q67015940 Government Boys' Primary School 0.0 13 0.0
Q7604693 Statutory Rules of Northern Ireland 0.0 13 0.0
Q106474968 ethnic group by settlement in Macedonia 0.003 15 0.0
Q6453643 decree law 0.0 15 0.0
Q97695005 committee group motion 0.0 15 0.0
Q100532807 Irish Statutory Instrument 0.0 16 0.0
Q10429085 report 0.0 19 0.0
Q99045339 written question 0.0 20 0.0
Q1505023 Interpellation 0.0 20 0.0
Q96739634 individual motion 0.0 21 0.0
Q67035425 ASTM standard 0.0 21 0.0
Q61278455 health sub-centre 0.001 23 0.0
Q26267864 Wikimedia KML file 0.005 23 0.0
Q3508250 Syndicat intercommunal 0.02 24 0.0
Q107102664 cell line from embryonic stem cells 0.0 24 0.0
Q7604686 UK Statutory Instrument 0.0 27 0.0
Q6451276 Congressional Research Service report 0.001 28 0.0
Q61443650 sub post office 0.0 33 0.0
Q26894053 basketball team season 0.009 34 0.0
  • There are 50 subgraphs with more than 1000 user agents, and 300 subgraphs with fewer than 1000 user agents. Most subgraphs are therefore queried by a relatively small number of user agents. The distribution of user-agent counts below 1000 is shown in the figure below, which makes the small user-agent counts of most subgraphs clear.

File:Ua lessthan1k dist.png

User agent distribution in subgraphs

  • Next, the user agent vs query count distribution was analyzed for some of the top subgraphs. While the user agent count gives us an idea of how many users may be using a subgraph, it does not tell us whether all of them query the subgraph equally or whether very few user agents perform most of the queries.
  • ~30 out of 341 subgraphs have a single user agent that issues >=50% of all queries in that subgraph.
  • 6 subgraphs have a single user agent that issues around 80-90% of the queries.
  • So the pattern of a single dominating query source is not widespread among subgraphs, but it is present in a few subgraphs nonetheless.
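The dominance check in the list above can be sketched as follows. This is a minimal illustration under assumed inputs: one `(subgraph, user_agent)` pair per query, with hypothetical function names and made-up sample data; it is not the code used for the analysis.

```python
from collections import Counter, defaultdict


def top_agent_share(rows):
    """rows: iterable of (subgraph, user_agent), one per query.
    Returns {subgraph: (top_user_agent, percent_of_that_subgraph's_queries)}."""
    per_subgraph = defaultdict(Counter)
    for subgraph, agent in rows:
        per_subgraph[subgraph][agent] += 1
    shares = {}
    for subgraph, counts in per_subgraph.items():
        agent, n = counts.most_common(1)[0]
        shares[subgraph] = (agent, 100 * n / sum(counts.values()))
    return shares


def dominated(shares, threshold=50.0):
    """Subgraphs whose top user agent issues at least `threshold` percent of queries."""
    return sorted(s for s, (_, pct) in shares.items() if pct >= threshold)


# Made-up sample: subgraph "A" is dominated by one agent, "B" is evenly spread.
sample = [("A", "ua1")] * 9 + [("A", "ua2")] + [("B", "ua1"), ("B", "ua2"), ("B", "ua3")]
shares = top_agent_share(sample)
print(shares["A"], dominated(shares))
```

Raising `threshold` to 80 would correspond to the stricter 80-90% group mentioned above.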

The figure below shows the query percentages of the top 2 user agents for each of the 341 subgraphs. It shows whether the top user agents of a subgraph dominate its queries.

The figure below shows 100 subgraphs with their user agent query usage distribution in percent. Usage greater than 50% is marked in red. A bird's-eye view of the plots shows that some subgraphs have a dominating user agent, while most other subgraphs have 1 or 2 user agents that query the most. The rest of the user agents form the long tail of the distribution.

Top user agents in subgraphs

  • The top user agents in various subgraphs are listed below. More analysis on Q5 (human) and Q16521 (taxon) is done at the end of the page as they are the most queried subgraphs.
Top user agents in various subgraphs
Subgraph Subgraph label User agent Query count (in subgraph) Query percent (within subgraph) Query percent overall
Q16521 taxon mix-n-match 50622670 89.697 22.899
Q5 human UA # 2 9017930 13.134 4.079
Q5 human mix-n-match 8548335 12.45 3.867
Q5 human UA # 3 5059258 7.369 2.289
Q5 human UA # 4 4020496 5.856 1.819
Q5 human UA # 5 3828747 5.576 1.732
Q101352 family name UA # 5 3828747 68.811 1.732
Q5 human UA # 6 2685807 3.912 1.215
Q5 human UA # 7 2434486 3.546 1.101
Q4830453 business UA # 8 2403677 59.476 1.087
Q5 human UA # 9 2020598 2.943 0.914
Q16521 taxon Hub 1984437 3.516 0.898
Q5 human UA # 11 1877700 2.735 0.849
Q5 human UA # 12 1781161 2.594 0.806
Q16521 taxon UA # 13 1294113 2.293 0.585

User agent vs Subgraph

So far we have explored the user-agent count and distribution per subgraph. It is also important to look at how each user agent's queries spread across subgraphs. In other words,

  • Do users have a very specific use case, so that their queries span only a few subgraphs, or are their queries spread across a lot of subgraphs?
  • Are there some user agents that query the most in multiple subgraphs? This could be due to the nature of the use case or simply because some subgraphs overlap a lot.

We start by looking at how many user agents access how many subgraphs. From the table below, we see that most user agents (89% of them) query only one subgraph. Some user agents query a lot of subgraphs as well. A clearer picture is seen in the plot below.

Relationship between subgraphs and user agents
#of Subgraphs (X) #of User agents querying X subgraphs %of User agents querying X subgraphs
1 875724 89.252
2 91962 9.373
5 3562 0.363
3 2388 0.243
6 1539 0.157
7 799 0.081
9 628 0.064
8 463 0.047
4 460 0.047
12 332 0.034
16 308 0.031
15 282 0.029
10 281 0.029
17 242 0.025
18 235 0.024
14 202 0.021
11 184 0.019
19 177 0.018
13 167 0.017
20 119 0.012
21 75 0.008
22 47 0.005
25 46 0.005
23 39 0.004
24 39 0.004
27 32 0.003
26 28 0.003
28 26 0.003
29 25 0.003
30 20 0.002
31 17 0.002
35 16 0.002
37 16 0.002
47 15 0.002
34 15 0.002
61 13 0.001
32 12 0.001
50 12 0.001
36 11 0.001
44 11 0.001
49 10 0.001
65 9 0.001
56 9 0.001
72 9 0.001
51 9 0.001
121 9 0.001
95 9 0.001
124 9 0.001
42 9 0.001
39 9 0.001
File:Ua vs subgraph.png
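A table like the one above can be derived by counting the distinct subgraphs each user agent touches and then counting user agents per subgraph count. The sketch below assumes `(user_agent, subgraph)` pairs as input and uses a hypothetical function name and made-up sample data; the last lines recompute the share of the first table row from the figures reported on this page.

```python
from collections import Counter, defaultdict


def subgraphs_per_agent_histogram(rows):
    """rows: iterable of (user_agent, subgraph).
    Returns Counter {number_of_subgraphs: number_of_user_agents}."""
    seen = defaultdict(set)
    for agent, subgraph in rows:
        seen[agent].add(subgraph)
    return Counter(len(subgraphs) for subgraphs in seen.values())


# Made-up sample: ua1 and ua3 touch one subgraph each, ua2 touches two.
sample = [("ua1", "Q5"), ("ua1", "Q5"), ("ua2", "Q5"), ("ua2", "Q16521"), ("ua3", "Q11424")]
hist = subgraphs_per_agent_histogram(sample)
print(hist)

# Share of user agents that query exactly one subgraph, from the reported counts.
single_subgraph_pct = 100 * 875_724 / 981_180
print(f"{single_subgraph_pct:.3f}%")
```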

Next we isolate user agents from each subgraph that query drastically more (>=10% difference) than other user agents in the same subgraph, and that perform at least 100k queries (0.05% of all queries) a month. A list of ~30 such user agents was found. The subgraph distributions of all these user agents were plotted to find the large buckets where they tend to query. The plot is shown below, followed by some explicit observations.

File:Imp ua dist censored.png
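One possible reading of the selection rule above is sketched here: a user agent stands out in a subgraph if it leads the runner-up by at least 10 percentage points of that subgraph's queries and has at least 100k queries. That interpretation of ">=10% difference", the function name, and the sample counts are all assumptions.

```python
def standout_agents(counts_by_subgraph, min_gap_pct=10.0, min_queries=100_000):
    """counts_by_subgraph: {subgraph: {user_agent: query_count}}.
    Returns {subgraph: user_agent} for subgraphs whose top agent leads the
    runner-up by >= min_gap_pct percentage points and has >= min_queries queries."""
    result = {}
    for subgraph, counts in counts_by_subgraph.items():
        total = sum(counts.values())
        if total == 0:
            continue
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        top_agent, top_n = ranked[0]
        runner_up_n = ranked[1][1] if len(ranked) > 1 else 0
        gap_pct = 100 * (top_n - runner_up_n) / total
        if gap_pct >= min_gap_pct and top_n >= min_queries:
            result[subgraph] = top_agent
    return result


# Made-up counts: only S1 has a clear, high-volume leader.
sample = {
    "S1": {"a": 900_000, "b": 100_000},
    "S2": {"a": 50_000, "b": 45_000},
    "S3": {"a": 200_000, "b": 190_000, "c": 10_000},
}
standouts = standout_agents(sample)
print(standouts)
```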

Percentages below are percent of all monthly queries.

  • mix-n-match (UA #17):
    • a lot of taxon queries (Q16521), 23%
    • a lot of human queries (Q5), 4%
  • UA #6:
    • 1% in business (Q4830453)
  • UA #14:
    • 1% in human (Q5)
    • 0.5% in film (Q11424)
  • UA #23:
    • 1.73% in family name (Q101352)
    • 1.73% in human (Q5)
    • both have identical counts, meaning they could be the same queries that touch both the human and family name subgraphs

For reference:

  • 100% is 221,067,674 queries
  • 10% is 22,106,767 queries
  • 1% is 2,210,676 queries
  • 0.1% is 221,067 queries
  • 0.05% is 110,533 queries
  • 0.01% is 22,106 queries
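The reference values above are the monthly total scaled down and rounded down to whole queries. A small sketch of that arithmetic, using integer math to avoid floating-point surprises (the helper name and its hundredths-of-a-percent argument are illustrative choices):

```python
TOTAL = 221_067_674  # monthly query count taken as 100% in the list above


def queries_at(percent_hundredths):
    """Queries for a percentage given in hundredths of a percent, rounded down.
    Example: 5 means 0.05%, 10_000 means 100%."""
    return TOTAL * percent_hundredths // 10_000


for label, hundredths in [("100%", 10_000), ("10%", 1_000), ("1%", 100),
                          ("0.1%", 10), ("0.05%", 5), ("0.01%", 1)]:
    print(f"{label}: {queries_at(hundredths):,} queries")
```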

Subgraph connectivity through queries

Subgraph connectivity was explored to some extent using only Wikidata in Wikidata_Subgraph_Analysis. This was based on what items or properties were common between subgraphs and how many direct connections were present between them. A visualization was created to show the strength of this connectivity between subgraphs here: wikidata_graph. This section aims to analyze the connectivity of subgraphs through the queries, i.e., how often subgraphs are queried together.

  • Subgraph queries: The queries that touch on at least one of the top 341 subgraphs make up 72% of all queries.
  • First we look at how many subgraphs most queries access. The tables below show the query groups that access the most and the fewest subgraphs.
  • 70% of all queries (97% of subgraph queries) touch on 1 or 2 subgraphs. 64% of all queries (90% of subgraph queries) touch on only 1 subgraph.
Queries with most subgraphs accessed
#of Subgraphs #of Queries
341 25
333 1
315 2
313 3
258 1
181 3
152 1
142 1
133 2
130 2
129 1
128 2
127 4
126 4
125 9
Queries with least subgraphs accessed
#of Subgraphs #of Queries %of Queries
1 142507736 64.463
2 12464811 5.638
3 1767253 0.799
4 586173 0.265
5 364445 0.165
6 221485 0.1
7 188012 0.085
8 112922 0.051
9 102524 0.046
10 68871 0.031
11 50341 0.023
12 38102 0.017
13 34075 0.015
14 24003 0.011
15 17935 0.008

File:NumQuery vs numSubgraph.png

  • It is hard to see which subgraphs occur together from the data above. So the subgraphs that occurred together were broken into pairs, and the pairs of subgraphs that occur together the most were listed.
  • There are 57,970 subgraph pairs that occur together in queries. The total possible subgraph pair count is (340*341)/2 = 57,970. This shows that every subgraph is connected to every other subgraph through queries! Of course, the number of queries varies widely.
  • A list of the most frequently co-queried subgraph pairs is shown below.
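The pair breakdown described above can be sketched with unordered pairs per query. This assumes each query has already been reduced to the set of subgraph Qids it touches; the function name and sample queries are made up, and only the 341-subgraph pair count is taken from the text.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_counts(queries):
    """queries: iterable of sets of subgraph Qids touched by each query.
    Returns a Counter of unordered subgraph pairs."""
    counts = Counter()
    for subgraphs in queries:
        # Sorting first makes (A, B) and (B, A) the same key.
        counts.update(combinations(sorted(subgraphs), 2))
    return counts


# Made-up sample: single-subgraph queries contribute no pairs.
sample = [
    {"Q5", "Q101352"},
    {"Q5", "Q101352"},
    {"Q5", "Q11424", "Q101352"},
    {"Q5"},
]
pairs = pair_counts(sample)
print(pairs.most_common(1))

# Number of distinct unordered pairs among 341 subgraphs.
possible_pairs = comb(341, 2)
print(possible_pairs)
```

Comparing `len(pairs)` with `comb(n, 2)` for the number of subgraphs `n` is the check behind the statement that every subgraph is connected to every other one.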
Top pairs of subgraphs that are queried together
Subgraph 1 Subgraph 2 Query
Subgraph Subgraph label Subgraph Subgraph label #of Query %of Query
Q101352 family name Q5 human 4935675 2.233
Q4830453 business Q6881511 enterprise 883757 0.4
Q11424 film Q5 human 771698 0.349
Q4830453 business Q891723 public company 735902 0.333
Q3305213 painting Q4167410 Wikimedia disambiguation page 629633 0.285
Q4164871 position Q5 human 541257 0.245
Q47461344 written work Q732577 publication 493402 0.223
Q11424 film Q14204246 Wikimedia project page 483338 0.219
Q6881511 enterprise Q891723 public company 480426 0.217
Q4167410 Wikimedia disambiguation page Q5 human 466217 0.211
Q14204246 Wikimedia project page Q4167410 Wikimedia disambiguation page 436192 0.197
Q13406463 Wikimedia list article Q5 human 394815 0.179
Q4830453 business Q5 human 354945 0.161
Q13442814 scholarly article Q4167410 Wikimedia disambiguation page 316720 0.143
Q13442814 scholarly article Q5 human 282237 0.128
Q13406463 Wikimedia list article Q18340514 events in a specific year or time period 274841 0.124
Q3331189 version, edition, or translation Q5 human 273761 0.124
Q571 book Q5 human 259234 0.117
Q16521 taxon Q5 human 222118 0.1
Q4167410 Wikimedia disambiguation page Q811979 architectural structure 204572 0.093
Q4167410 Wikimedia disambiguation page Q838948 work of art 200810 0.091
Q5398426 television series Q5 human 197997 0.09
Q47461344 written work Q5 human 194750 0.088
Q43229 organization Q4830453 business 179640 0.081
Q5 human Q6881511 enterprise 172486 0.078
Q43229 organization Q5 human 171567 0.078
Q2225692 fourth-level administrative division in Indonesia Q532 village 171086 0.077
Q215380 musical group Q5 human 168318 0.076
Q15632617 fictional human Q5 human 163992 0.074
Q3305213 painting Q838948 work of art 161979 0.073
  • The distribution of the number of times each subgraph pair in Wikidata occurs in queries is shown below. Note that the pair (A,B) is the same as the pair (B,A), so there is no duplication in the plots. Since the distribution is extremely skewed, three plots with different limits on the number of occurrences are shown. Only a small number of pairs occur together often (see the table above), whereas a huge number of pairs occur together only a few times.

File:Subgraph pair dist.png

  • Below is a heatmap of the number of queries, where both the x and y axes represent subgraph indices (names of subgraphs are not shown due to space).
  • The diagonal shows queries that use only 1 subgraph, represented as Q5-Q5 or Q42-Q42 for example. Others are represented as Q5-Q42 or Q42-Q5.
  • The plot is symmetric.
  • The many vertical and horizontal lines indicate that lots of subgraphs pair with many other subgraphs.

File:Subgraph pair heatmap.png

Human subgraph (Q5) query analysis

The following analysis was done with query data of November, 2021.

The queries that were estimated to be related to the human subgraph accounted for 31.94% of all queries in Wikidata. 31.09% of queries used only the human subgraph and the remaining 0.85% used a mix of human and various other subgraphs. As described in #What are subgraph related queries, subgraphs are related to queries through properties, subject or object URIs, subgraph instance items, etc. Here is a breakdown for the human subgraph taken from #Query count and time. A query can be related to the human subgraph for more than one of the following reasons.

  • Number of queries: 60,868,572 (31.94%)
  • Percent of queries matching subgraph Qid, i.e, has Q5: 2.54%
  • Percent of queries matching instance items: 18%
  • Percent of queries matching subject/object URIs: 12%
  • Percent of queries matching properties: 19.45%
  • Percent of queries matching literal strings: 1.43%

Some of these categories account for large percentages. It is worth looking at which items/properties/URIs are queried the most. Looking at the distribution of such items' usage in queries also shows how narrow or wide the search space is.

Instance items matched

  • Total items used: 7,969,182
  • Total queries that use these items: 34,680,808 (18% of all queries)
  • The distribution shows there are some high usage (~10k-20k queries) items and a small number of medium usage (~5k queries) items, while the rest form a long tail of low usage (<1k queries) items in the human subgraph.
Top items that cause a query to be related to Human subgraph (Q5)
Instance item Instance item label #of queries
Q22686 Donald Trump 19759
Q1747297 Robert Oliveri 19247
Q509260 John Zimmerman 19193
Q6499255 Laura Nader 19135
Q209394 Michael Wood 19101
Q937 Albert Einstein 19098
Q7340648 Rob Whitehurst 19026
Q52354375 Irene Aparicio 18970
Q6232209 John F. Cassidy 18964
Q22986632 Lori Lynn Ross 18954
Q3976229 Stuart Lancaster 18953
Q106466114 Gary Michael Ritchie 18947
Q86599148 James Spicer 18926
Q87653156 David A. Cook 18919
Q16015822 Jerry Fleck 18917
Q7179427 Petur Hliddal 18914
Q19878977 Jackie Carson 18902
Q99859767 Kathy McCarty 18898
Q90307934 Ann Harris 18893
Q1070508 Cheryl Carasik 18834
Q9682 Elizabeth II 18816
Q6279 Joe Biden 18277
Q64840837 Dylan Arnold 18161
Q76 Barack Obama 18035
Q107626126 Mauricio Lara 18010

File:Human instance count all.png

File:Human instance count 20k.gif

Properties matched

  • Total properties used: 1,091 (Recall these are properties that occur 99% of the times in the human subgraph)
  • Total queries that use these properties: 37,078,566 (19.45% of all queries)
  • The distribution shows there are 3 properties with ~20-30M queries and 7 properties with ~1-5M queries, while the remaining 1000+ properties each match ~100K queries or fewer. In short, the distribution is extremely skewed by only ~10 properties that are highly related to the human subgraph.
Top properties that cause a query to be related to Human subgraph (Q5)
Property Property label #of queries
P570 date of death 30151024
P569 date of birth 30084200
P27 country of citizenship 24186000
P106 occupation 5259920
P734 family name 4871326
P735 given name 4616631
P19 place of birth 2379702
P2949 WikiTree person ID 1707373
P20 place of death 1222037
P4985 TMDb person ID 916399
P39 position held 750380
P3602 candidacy in election 599067
P69 educated at 561380
P26 spouse 471589
P108 employer 384111
P2562 married name 279197
P937 work location 258707
P1066 student of 158339
P184 doctoral advisor 152318
P1960 Google Scholar author ID 151507
P185 doctoral student 150982
P54 member of sports team 150573
P1153 Scopus author ID 150545
P119 place of burial 144027
P3829 Publons author ID 138839

File:Human pred count all log.png

File:Human pred count.gif

Subject/Object URI matched

  • Total URIs used: 7,926,297 (recall that these are URIs that occur in the human subgraph 99% of the time)
  • Total queries that use these URIs: 23,245,152 (12.2% of all queries)
  • The top URIs/items show the obvious and most common ways the human subgraph is queried: queries about specific people, about groups of people, and about their Wikipedia pages. More about types of queries below.
  • The distribution falls off smoothly on a logarithmic scale: a single item appears in ~165k queries, and the rest decline from ~40k in a logarithmic pattern.
Top URIs that cause a query to be related to Human subgraph (Q5)
URI URI label # of queries
Q3391743 visual artist 165540
Q1925963 graphic artist 38897
Q28389 screenwriter 33718
en.wikipedia.org/wiki/Lee_Child - 33179
en.wikipedia.org/wiki/Emily_Wilson_(journalist) - 30837
en.wikipedia.org/wiki/M.I.A._(rapper) - 29388
Q10800557 film actor 29318
en.wikipedia.org/wiki/Shannon_Lee - 29216
en.wikipedia.org/wiki/Eugene_Gordon_Lee - 29205
en.wikipedia.org/wiki/Lee_Childs - 29203
en.wikipedia.org/wiki/Emily_Wilson_(classicist) - 26864
en.wikipedia.org/wiki/Emily_Wilson_(actress) - 26862
en.wikipedia.org/wiki/Adhir_Kalyan - 26862
en.wikipedia.org/wiki/Emily_Wilson_(footballer) - 26861
en.wikipedia.org/wiki/Emily_Wilson_Walker - 26861
Q10798782 television actor 24679
Q185351 jurist 22130
Q1650915 researcher 21206
Q2374149 botanist 20385
Q250867 Catholic priest 20314
Q10873124 chess player 19832
Q12299841 cricketer 19414
Q14373094 rugby league player 19396
Q509260 John Zimmerman 19193
Q6499255 Laura Nader 19135

File:Human uri count all log.png

File:Human uri count.gif

Query time

  • The human subgraph accounts for 34% of total query time and ~32% of all queries.
  • Average time per query is 0.3 seconds (300 ms). Most queries in this subgraph are small and simple.
  • The query time distribution is shown in the chart below, both in absolute counts and as a percentage of queries in the human subgraph.

File:Human time class.png
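The 0.3 second average can be cross-checked from the user agent table later in this section. The sketch below scales one row (searx1) up to subgraph totals using its percentage columns. It is a back-of-the-envelope check rather than the method used in the analysis, and the table's hours are rounded, so the result is approximate.

```python
# Figures from the searx1 row of the user agent table in this section.
searx1_queries = 6_615_319
searx1_query_share = 10.868 / 100   # share of human subgraph queries
searx1_hours = 778                  # rounded to whole hours in the table
searx1_time_share = 14.832 / 100    # share of human subgraph query time

# Scale the single row up to totals for the whole human subgraph.
subgraph_queries = searx1_queries / searx1_query_share
subgraph_hours = searx1_hours / searx1_time_share

# Average time per query, in seconds.
avg_seconds = subgraph_hours * 3600 / subgraph_queries

print(f"{subgraph_queries / 1e6:.1f}M queries, "
      f"{subgraph_hours:.0f} hours, {avg_seconds:.2f} s per query")
```

This gives roughly 61 million queries and a little over 5,000 hours for the human subgraph, or about 0.31 s per query, in line with the figure above.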

User agent

The top user agents that query the human subgraph are listed below. This shows how usage is distributed: whether a few user agents dominate, or usage is spread fairly evenly across user agents. The table lists the top 10 user agents by query count, followed by the remaining user agents from the top 10 by query time; user agents that rank in both lists appear once.

Top user agents in human subgraph
User agent Query count % of queries in human subgraph % of queries overall Query time (hr) % of query time in human subgraph % of query time overall
mix-n-match 6960988 11.436 3.653 79 1.51 0.516
searx1 6615319 10.868 3.471 778 14.832 5.072
UA#3 3491821 5.737 1.832 75 1.426 0.487
UA#4 3073725 5.05 1.613 175 3.327 1.138
UA#5 2933240 4.819 1.539 80 1.516 0.518
UA#6 2488807 4.089 1.306 19 0.364 0.125
UA#7 2182220 3.585 1.145 44 0.841 0.288
WikidataQueryServiceR 2044045 3.358 1.073 36 0.68 0.232
UA#9 1970264 3.237 1.034 27 0.524 0.179
searx2 1909144 3.137 1.002 200 3.808 1.302
UA#11 75523 0.124 0.04 434 8.271 2.828
UA#12 55357 0.091 0.029 319 6.083 2.08
searx3 1428789 2.347 0.75 151 2.871 0.982
OB-bot 287534 0.472 0.151 144 2.736 0.935
UA#15 50915 0.084 0.027 134 2.553 0.873
UA#16 31298 0.051 0.016 112 2.132 0.729
searx4 771932 1.268 0.405 92 1.761 0.602
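Dividing query time by query count gives a mean time per query for each user agent, which separates high-volume agents that send cheap queries from low-volume agents that send expensive ones. The sketch below does this for a few rows copied from the table. Hours are rounded in the table, so the means are approximate.

```python
# (query count, query time in hours) for selected rows of the table above.
user_agents = {
    "mix-n-match": (6_960_988, 79),
    "searx1": (6_615_319, 778),
    "UA#11": (75_523, 434),
    "UA#12": (55_357, 319),
    "UA#16": (31_298, 112),
}

# Mean seconds per query for each user agent.
mean_seconds = {
    name: hours * 3600 / count
    for name, (count, hours) in user_agents.items()
}

# User agents ordered from most to least expensive per query.
slowest_first = sorted(mean_seconds, key=mean_seconds.get, reverse=True)

for name in slowest_first:
    print(f"{name:12s} {mean_seconds[name]:8.3f} s/query")
```

For these rows, UA#11 and UA#12 average around 20 seconds per query, while mix-n-match averages well under 0.1 seconds, even though mix-n-match sends roughly a hundred times more queries.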

The query time breakdown was plotted for the top 20 user agents (by query time). As observed earlier, most queries take between 10 ms and 1 s. For some user agents most queries fall in the 10 ms to 100 ms range, while for others most fall in the 100 ms to 1 s range.

File:Human ua query class percent limy15.png
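A per-user-agent breakdown like the one plotted above can be produced by assigning each query to a time class and reporting the percentage in each class. The sketch below assumes power-of-ten class boundaries, which match the ranges mentioned in the text but may not be the exact classes used in the chart. The sample durations are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Assumed class boundaries in milliseconds (lower bound inclusive).
EDGES_MS = [10, 100, 1_000, 10_000]
LABELS = ["<10ms", "10ms-100ms", "100ms-1s", "1s-10s", ">=10s"]


def time_class(duration_ms):
    """Map a query duration in milliseconds to its time class label."""
    return LABELS[bisect_right(EDGES_MS, duration_ms)]


def class_percentages(durations_ms):
    """Percentage of queries in each time class, for one user agent."""
    counts = Counter(time_class(d) for d in durations_ms)
    total = len(durations_ms)
    return {label: 100 * counts[label] / total for label in LABELS}


# Hypothetical query durations (ms) for a single user agent.
sample = [5, 12, 45, 80, 250]
print(class_percentages(sample))
```

Here three of the five sample queries land in the 10ms-100ms class, so this hypothetical agent would appear in the chart as dominated by that range.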

Taxon subgraph (Q16521) query analysis

The following analysis was done with query data from November 2021.