You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
User:AKhatun/Wikidata Subgraph Query Analysis: Difference between revisions
imported>AKhatun (→Taxon subgraph (Q16521) query analysis: Add services, triples, and path analysis) |
imported>AKhatun (→Number of subgraph accessed vs query time: Clarify) |
Line 716: | Line 716: | ||
* The exact numbers for this heatmap are present in [https://github.com/tanny411/Wikidata-WDQS-Analysis/blob/master/subgraph_query_analysis/data/subgraph_pair_heatmap_df.csv subgraph_pair_heatmap_df] as a csv file. | * The exact numbers for this heatmap are present in [https://github.com/tanny411/Wikidata-WDQS-Analysis/blob/master/subgraph_query_analysis/data/subgraph_pair_heatmap_df.csv subgraph_pair_heatmap_df] as a csv file. | ||
[[File:subgraph_pair_heatmap.png]] | [[File:subgraph_pair_heatmap.png]] | ||
=== Number of subgraph accessed vs query time === | |||
To check whether accessing more subgraphs correlates with longer query time, various subsets of subgraphs were taken and their query time distributions were observed. Then the number (and percentage) of queries that access a given number of subgraphs was plotted for each query time group. A simple correlation plot of time against subgraph count was not feasible due to the large number of queries, but the following plots give a good idea of the relationship. There is a slight correlation, but it is not strong. All query time groups are dominated by queries that access 1 or 2 subgraphs. Queries accessing more subgraphs do appear comparatively more often in the <code>More than 10s</code> group. | |||
The following analysis was done with data from November 2021. Thus there are slight differences in numbers from the above analysis, which was done with October 2021 data. | |||
[[File:various_subset_time_classes.png]] | |||
[[File:timeGroupWise_numSubgraphAccessed.png]] | |||
== Human subgraph (Q5) query analysis == | == Human subgraph (Q5) query analysis == | ||
Line 730: | Line 740: | ||
Some of these breakdowns have large percentages. It is worth looking at what items/properties/URIs are queried the most. Also looking at the distribution of such items' usage in queries shows how narrow or wide the search space is. | Some of these breakdowns have large percentages. It is worth looking at what items/properties/URIs are queried the most. Also looking at the distribution of such items' usage in queries shows how narrow or wide the search space is. | ||
Here is a detailed breakdown of what kind of match caused a query to be part of the human subgraph: | |||
{| | |||
| | |||
{| class="wikitable sortable" | |||
|+ Human subgraph query breakdown | |||
|- | |||
! item !! predicate !! URI !! human Q-id !! literal !! # query !! % all query !! % human query | |||
|- | |||
|0||1||0||0||0||{{formatnum:17785347}}||9.333||29.219 | |||
|- | |||
|1||1||1||0||0||{{formatnum:12215379}}||6.41||20.068 | |||
|- | |||
|1||0||0||0||0||{{formatnum:10705360}}||5.618||17.588 | |||
|- | |||
|1||0||1||0||0||{{formatnum:7253287}}||3.806||11.916 | |||
|- | |||
|1||1||0||0||0||{{formatnum:3137130}}||1.646||5.154 | |||
|- | |||
|0||0||1||0||0||{{formatnum:2512142}}||1.318||4.127 | |||
|- | |||
|0||0||0||0||1||{{formatnum:1775347}}||0.932||2.917 | |||
|- | |||
|0||1||0||1||0||{{formatnum:1694236}}||0.889||2.783 | |||
|- | |||
|0||0||0||1||0||{{formatnum:930137}}||0.488||1.528 | |||
|- | |||
|1||1||0||1||0||{{formatnum:598261}}||0.314||0.983 | |||
|- | |||
|1||1||1||1||0||{{formatnum:508706}}||0.267||0.836 | |||
|- | |||
|0||1||0||0||1||{{formatnum:407610}}||0.214||0.67 | |||
|- | |||
|0||0||0||1||1||{{formatnum:350982}}||0.184||0.577 | |||
|- | |||
|0||1||1||1||0||{{formatnum:311340}}||0.163||0.511 | |||
|- | |||
|0||1||1||0||0||{{formatnum:226959}}||0.119||0.373 | |||
|- | |||
|1||0||0||1||0||{{formatnum:178650}}||0.094||0.294 | |||
|- | |||
|0||1||1||1||1||{{formatnum:135684}}||0.071||0.223 | |||
|- | |||
|1||0||1||1||0||{{formatnum:76736}}||0.04||0.126 | |||
|- | |||
|0||1||0||1||1||{{formatnum:56971}}||0.03||0.094 | |||
|- | |||
|1||0||1||0||1||{{formatnum:3451}}||0.002||0.006 | |||
|- | |||
|1||0||0||0||1||{{formatnum:2844}}||0.001||0.005 | |||
|- | |||
|0||0||1||0||1||{{formatnum:702}}||0.0||0.001 | |||
|- | |||
|1||1||1||1||1||{{formatnum:437}}||0.0||0.001 | |||
|- | |||
|1||1||0||1||1||{{formatnum:393}}||0.0||0.001 | |||
|- | |||
|0||0||1||1||0||{{formatnum:304}}||0.0||0.0 | |||
|- | |||
|1||1||0||0||1||{{formatnum:93}}||0.0||0.0 | |||
|- | |||
|1||0||0||1||1||{{formatnum:59}}||0.0||0.0 | |||
|- | |||
|1||1||1||0||1||{{formatnum:17}}||0.0||0.0 | |||
|- | |||
|1||0||1||1||1||{{formatnum:5}}||0.0||0.0 | |||
|- | |||
|0||1||1||0||1||{{formatnum:3}}||0.0||0.0 | |||
|- | |||
! colspan="5" | Total !! 60,868,572 !! 31.94 !! 100 | |||
|} | |||
| | |||
[[File:human_venn.png|800px]] | |||
|} | |||
=== Instance items matched === | === Instance items matched === | ||
Line 1,363: | Line 1,448: | ||
The following analysis was done with query data of <code>November, 2021</code>. | The following analysis was done with query data of <code>November, 2021</code>. | ||
The queries that were estimated to be related to the taxon subgraph accounted for '''14.26%''' of all queries in Wikidata. '''13.57%''' queries used only the taxon subgraph and the rest ''' | The queries that were estimated to be related to the taxon subgraph accounted for '''14.26%''' of all queries in Wikidata. '''13.57%''' queries used only the taxon subgraph and the rest '''0.69%''' queries used a mix of taxon and various other subgraphs. As described in [[#What are subgraph related queries]], subgraphs are related to queries through Properties, Subject or Object URIs, Subgraph instance items, etc. Here is a breakdown for taxon subgraph taken from [[#Query count and time]]. A query can be said to be related to taxon subgraph due to multiple of the following reasons. | ||
* Number of queries: 27,172,995 (14.26%) | * Number of queries: 27,172,995 (14.26%) | ||
* Percent of queries matching subgraph Qid, i.e., has Q16521: '''12.19'''% | * Percent of queries matching subgraph Qid, i.e., has Q16521: '''12.19'''% | ||
Line 1,372: | Line 1,457: | ||
Percent of queries matching subject/object URIs (12.86) includes Qid (12.19) and instance items (0.75) in them. This makes Qid match almost the only reason for queries to match taxon subqueries. Therefore we look at the top URIs that cause these matches. Also looking at the distribution of such items' usage in queries shows how narrow or wide the search space is (in this case, quite narrow as almost all queries match the Qid itself). | Percent of queries matching subject/object URIs (12.86) includes Qid (12.19) and instance items (0.75) in them. This makes Qid match almost the only reason for queries to match taxon subqueries. Therefore we look at the top URIs that cause these matches. Also looking at the distribution of such items' usage in queries shows how narrow or wide the search space is (in this case, quite narrow as almost all queries match the Qid itself). | ||
Here is a detailed breakdown of what kind of match caused a query to be part of the taxon subgraph: | |||
{| | |||
| | |||
{| class="wikitable sortable" | |||
|+ Taxon subgraph query breakdown | |||
|- | |||
! item !! predicate !! URI !! taxon Q-id !! literal !! # query !! % all query !! % taxon query | |||
|- | |||
|0||0||1||1||0||{{formatnum:22955853}}||12.046||84.48 | |||
|- | |||
|0||1||0||0||0||{{formatnum:1415482}}||0.743||5.209 | |||
|- | |||
|1||0||0||0||0||{{formatnum:638533}}||0.335||2.35 | |||
|- | |||
|0||0||1||0||0||{{formatnum:624147}}||0.328||2.297 | |||
|- | |||
|1||0||1||0||0||{{formatnum:501232}}||0.263||1.845 | |||
|- | |||
|0||0||0||0||1||{{formatnum:443593}}||0.233||1.632 | |||
|- | |||
|0||0||1||1||1||{{formatnum:233109}}||0.122||0.858 | |||
|- | |||
|1||1||1||0||0||{{formatnum:132462}}||0.07||0.487 | |||
|- | |||
|1||0||0||0||1||{{formatnum:66408}}||0.035||0.244 | |||
|- | |||
|0||1||0||0||1||{{formatnum:60111}}||0.032||0.221 | |||
|- | |||
|1||1||0||0||0||{{formatnum:38147}}||0.02||0.14 | |||
|- | |||
|1||0||1||1||0||{{formatnum:30652}}||0.016||0.113 | |||
|- | |||
|0||0||1||0||1||{{formatnum:13104}}||0.007||0.048 | |||
|- | |||
|1||0||1||0||1||{{formatnum:9026}}||0.005||0.033 | |||
|- | |||
|0||1||1||1||0||{{formatnum:5847}}||0.003||0.022 | |||
|- | |||
|1||1||1||1||0||{{formatnum:5248}}||0.003||0.019 | |||
|- | |||
|0||1||1||1||1||{{formatnum:24}}||0.0||0.0 | |||
|- | |||
|0||0||0||1||0||{{formatnum:14}}||0.0||0.0 | |||
|- | |||
|0||1||1||0||0||{{formatnum:3}}||0.0||0.0 | |||
|- | |||
! colspan="5" | Total !! 27,172,995 !! 14.26 !! 100 | |||
|} | |||
| | |||
[[File:taxon_venn.png|800px]] | |||
|} | |||
=== Instance items matched === | === Instance items matched === | ||
Line 1,546: | Line 1,684: | ||
|} | |} | ||
| | | | ||
[[File:taxon_uri_count_all_log.png| | [[File:taxon_uri_count_all_log.png|500px]] | ||
|- | |- | ||
| | | | ||
[[File:taxon_uri_count_all_log_except4largest.png| | [[File:taxon_uri_count_all_log_except4largest.png|500px]] | ||
|} | |} | ||
Revision as of 15:52, 23 December 2021
Analysis on Subgraphs in Wikidata showed how large each subgraph in Wikidata is and how connected the subgraphs are. This page shows the results of analyzing the queries that relate to these subgraphs. The questions that needed to be answered were:
- What number (and percentage) of queries access each subgraph?
- How many queries access multiple subgraphs at once? i.e., how much overlap can we expect in subgraphs?
- How long do these queries take?
- How many user-agents access each subgraph? How many of them access lots of subgraphs, or are they confined to a small set of subgraphs? Do some of them dominate queries in multiple subgraphs?
- Are there chunks of similar queries in these subgraphs? i.e., how diverse are the queries in each subgraph?
TL;DR
We define some parameters to identify whether a query touches on a subgraph based on the items and properties a query uses. Some queries may even touch on multiple subgraphs. See more on what a subgraph means here. Note: Subgraphs have overlaps.
The parameters that define which subgraph a query belongs to are:
- If the query uses the subgraph's Qid. Example: queries containing Q5 are part of the Q5 subgraph.
- If the query uses items that are instance of a particular subgraph.
- If the query uses items that occur 99% of the time in a particular subgraph.
- If the query uses properties that occur 99% of the time in a particular subgraph.
- If the query uses literals that occur 99% of the time in a particular subgraph. The literals can occur with or without language tags. Both versions are compared to check for a match. Note that whole literals are matched between queries and Wikidata. Queries that ask for partial matches, using regex for example, are not included. The assumption is that such queries are more likely to contain other items from the subgraph and are caught anyway.
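The matching parameters above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the `SubgraphIndex` structure, its field names, and the assumption that a query has already been parsed into sets of items, properties, and literals are all hypothetical, and the example values (such as P569 being a 99% property of Q5) are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SubgraphIndex:
    """Hypothetical lookup sets for one subgraph (all names are illustrative)."""
    qid: str
    instance_items: set = field(default_factory=set)       # items that are instance of the subgraph
    frequent_items: set = field(default_factory=set)       # items occurring 99% of the time in it
    frequent_properties: set = field(default_factory=set)  # properties occurring 99% of the time in it
    frequent_literals: set = field(default_factory=set)    # whole literals occurring 99% of the time in it


def strip_language_tag(literal: str) -> str:
    """Turn '"Douglas Adams"@en' into '"Douglas Adams"'; leave other literals unchanged."""
    if literal.startswith('"') and '"@' in literal:
        return literal[: literal.rindex('"@') + 1]
    return literal


def match_reasons(items, properties, literals, index: SubgraphIndex) -> set:
    """Return the set of reasons a parsed query matches the subgraph (empty set if none)."""
    items, properties, literals = set(items), set(properties), set(literals)
    reasons = set()
    if index.qid in items:
        reasons.add("qid")
    if items & index.instance_items:
        reasons.add("instance_item")
    if items & index.frequent_items:
        reasons.add("item")
    if properties & index.frequent_properties:
        reasons.add("property")
    # Literals are compared both with and without their language tag.
    without_tags = {strip_language_tag(lit) for lit in literals}
    if (literals | without_tags) & index.frequent_literals:
        reasons.add("literal")
    return reasons


# Made-up example data for the Q5 (human) subgraph.
human = SubgraphIndex(
    qid="Q5",
    instance_items={"Q42"},
    frequent_properties={"P569"},
    frequent_literals={'"Douglas Adams"'},
)
print(match_reasons({"Q42"}, {"P569"}, {'"Douglas Adams"@en'}, human))
```

A query can match for several reasons at once, which is why the function returns a set rather than a single label.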
The following analysis uses the Wikidata dump of 20211101 and WDQS public SPARQL queries of 10/2021 unless otherwise stated. All query related numbers below are monthly values.
Query count and time
- All queries here refer to queries with status code 200 or 500, i.e., correct queries that either succeeded or timed out.
- WDQS receives ~220M queries a month.
- Total query time for all queries for a month is ~16,000 hours.
The table below lists the top 50 most queried subgraphs with subgraph size and query time information of 11/2021. A breakdown of what caused the match is also present, which corresponds to the parameters mentioned in #What are subgraph related queries. It also ranks the subgraphs by size, query count, and query time consumed. A more complete list containing 341 subgraphs, which form ~90% of Wikidata triples, is available here: subgraph data for November 2021, and subgraph data for October 2021. The difference between values from October and November is shown in the next table for comparison purposes. In some places, the query count percentages differ slightly.
Subgraph rank by size | Subgraph rank by query count | Subgraph rank by query time | Subgraph | Subgraph label | %of triples | %of entities | Days to recover (4.77M rate) | Query count | %count of all queries | Query time (hr) | %time of all queries | Avg time/query | %count of query from Qid | %count of query from instance items | %count of query from items | %count of query from properties | %count of query from literals |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 1 | Q5 | human | 7.254 | 10.045 | 204 | 60,868,572 | 31.941 | 5248 | 34.195 | 0.31 | 2.541 | 18.199 | 12.198 | 19.457 | 1.435 |
5 | 2 | 11 | Q16521 | taxon | 2.885 | 3.5 | 81 | 27,172,995 | 14.259 | 480 | 3.131 | 0.064 | 12.19 | 0.746 | 12.862 | 0.87 | 0.433 |
34 | 3 | 7 | Q4830453 | business | 0.107 | 0.208 | 3 | 9,228,037 | 4.842 | 554 | 3.607 | 0.216 | 1.646 | 2.95 | 2.24 | 0.001 | 0.177 |
6 | 4 | 5 | Q101352 | family name | 1.646 | 0.511 | 46 | 5,990,617 | 3.144 | 659 | 4.292 | 0.396 | 0.041 | 3.057 | 2.791 | 0.018 | 0.038 |
15 | 5 | 2 | Q11424 | film | 0.359 | 0.284 | 10 | 5,067,305 | 2.659 | 1541 | 10.042 | 1.095 | 0.451 | 1.469 | 1.348 | 0.003 | 0.543 |
1 | 6 | 13 | Q13442814 | scholarly article | 48.935 | 39.815 | 1378 | 4,944,995 | 2.595 | 263 | 1.713 | 0.191 | 0.017 | 1.942 | 1.938 | 0.405 | 0.396 |
7 | 7 | 3 | Q4167410 | Wikimedia disambiguation page | 1.354 | 1.464 | 38 | 3,292,873 | 1.728 | 765 | 4.982 | 0.836 | 0.164 | 0.192 | 0.472 | 0.0 | 1.163 |
2 | 8 | 25 | Q6999 | astronomical object | 8.684 | 8.943 | 245 | 2,444,109 | 1.283 | 79 | 0.516 | 0.117 | 0.003 | 1.218 | 1.222 | 0.023 | 0.004 |
92 | 9 | 14 | Q6881511 | enterprise | 0.036 | 0.052 | 1 | 1,937,486 | 1.017 | 234 | 1.528 | 0.436 | 0.083 | 0.812 | 0.538 | 0.0 | 0.071 |
26 | 10 | 29 | Q484170 | commune of France | 0.179 | 0.048 | 5 | 1,934,902 | 1.015 | 70 | 0.455 | 0.13 | 0.024 | 0.869 | 0.085 | 0.115 | 0.01 |
19 | 11 | 22 | Q13406463 | Wikimedia list article | 0.249 | 0.355 | 7 | 1,766,742 | 0.927 | 117 | 0.765 | 0.239 | 0.034 | 0.372 | 0.628 | 0.0 | 0.137 |
63 | 12 | 12 | Q5398426 | television series | 0.055 | 0.063 | 2 | 1,379,486 | 0.724 | 411 | 2.68 | 1.073 | 0.048 | 0.376 | 0.369 | 0.0 | 0.167 |
37 | 13 | 47 | Q7725634 | literary work | 0.087 | 0.203 | 2 | 1,377,546 | 0.723 | 42 | 0.273 | 0.11 | 0.39 | 0.181 | 0.243 | 0.0 | 0.009 |
16 | 14 | 4 | Q486972 | human settlement | 0.298 | 0.612 | 8 | 1,328,064 | 0.697 | 699 | 4.557 | 1.896 | 0.328 | 0.39 | 0.236 | 0.0 | 0.005 |
163 | 15 | 15 | Q891723 | public company | 0.015 | 0.013 | 0 | 1,175,813 | 0.617 | 219 | 1.426 | 0.67 | 0.042 | 0.415 | 0.185 | 0.001 | 0.092 |
90 | 16 | 6 | Q43229 | organization | 0.037 | 0.082 | 1 | 1,067,340 | 0.56 | 600 | 3.908 | 2.023 | 0.259 | 0.227 | 0.146 | 0.0 | 0.021 |
13 | 17 | 24 | Q3305213 | painting | 0.426 | 0.579 | 12 | 926,701 | 0.486 | 86 | 0.558 | 0.333 | 0.017 | 0.426 | 0.284 | 0.002 | 0.008 |
87 | 18 | 36 | Q47461344 | written work | 0.037 | 0.078 | 1 | 881,216 | 0.462 | 53 | 0.345 | 0.216 | 0.289 | 0.079 | 0.114 | 0.0 | 0.003 |
25 | 19 | 32 | Q532 | village | 0.199 | 0.294 | 6 | 872,310 | 0.458 | 61 | 0.399 | 0.253 | 0.003 | 0.417 | 0.198 | 0.0 | 0.015 |
4 | 20 | 28 | Q4167836 | Wikimedia category | 5.806 | 5.175 | 164 | 808,536 | 0.424 | 74 | 0.484 | 0.331 | 0.037 | 0.363 | 0.292 | 0.0 | 0.024 |
61 | 21 | 51 | Q7889 | video game | 0.055 | 0.048 | 2 | 753,351 | 0.395 | 37 | 0.244 | 0.179 | 0.006 | 0.181 | 0.314 | 0.002 | 0.01 |
20 | 22 | 41 | Q8502 | mountain | 0.248 | 0.559 | 7 | 749,283 | 0.393 | 47 | 0.306 | 0.225 | 0.002 | 0.369 | 0.351 | 0.0 | 0.001 |
28 | 23 | 33 | Q482994 | album | 0.16 | 0.288 | 5 | 704,746 | 0.37 | 59 | 0.388 | 0.304 | 0.012 | 0.15 | 0.189 | 0.0 | 0.098 |
89 | 24 | 17 | Q4164871 | position | 0.037 | 0.128 | 1 | 645,434 | 0.339 | 175 | 1.141 | 0.977 | 0.003 | 0.305 | 0.025 | 0.0 | 0.011 |
8 | 25 | 16 | Q7187 | gene | 0.911 | 1.273 | 26 | 604,364 | 0.317 | 208 | 1.354 | 1.238 | 0.084 | 0.1 | 0.022 | 0.015 | 0.127 |
11 | 26 | 26 | Q11173 | chemical compound | 0.684 | 1.302 | 19 | 588,469 | 0.309 | 76 | 0.496 | 0.466 | 0.135 | 0.11 | 0.092 | 0.002 | 0.014 |
55 | 27 | 54 | Q215380 | musical group | 0.062 | 0.087 | 2 | 585,266 | 0.307 | 37 | 0.241 | 0.227 | 0.01 | 0.205 | 0.16 | 0.0 | 0.011 |
31 | 28 | 39 | Q16970 | church building | 0.128 | 0.227 | 4 | 577,677 | 0.303 | 48 | 0.315 | 0.301 | 0.003 | 0.288 | 0.214 | 0.0 | 0.002 |
71 | 29 | 55 | Q732577 | publication | 0.047 | 0.076 | 1 | 569,536 | 0.299 | 37 | 0.238 | 0.231 | 0.283 | 0.015 | 0.296 | 0.0 | 0.0 |
22 | 30 | 43 | Q79007 | street | 0.23 | 0.626 | 6 | 535,623 | 0.281 | 44 | 0.289 | 0.298 | 0.028 | 0.246 | 0.218 | 0.001 | 0.001 |
23 | 31 | 34 | Q4022 | river | 0.216 | 0.425 | 6 | 520,347 | 0.273 | 56 | 0.365 | 0.388 | 0.002 | 0.254 | 0.192 | 0.0 | 0.002 |
242 | 32 | 8 | Q14204246 | Wikimedia project page | 0.008 | 0.033 | 0 | 498,708 | 0.262 | 548 | 3.572 | 3.957 | 0.026 | 0.19 | 0.038 | 0.0 | 0.064 |
36 | 33 | 63 | Q3947 | house | 0.096 | 0.216 | 3 | 465,249 | 0.244 | 33 | 0.212 | 0.252 | 0.0 | 0.238 | 0.223 | 0.0 | 0.002 |
32 | 34 | 31 | Q41176 | building | 0.124 | 0.29 | 3 | 463,636 | 0.243 | 65 | 0.423 | 0.504 | 0.042 | 0.189 | 0.168 | 0.001 | 0.002 |
307 | 35 | 62 | Q783794 | company | 0.005 | 0.012 | 0 | 459,638 | 0.241 | 33 | 0.213 | 0.256 | 0.081 | 0.146 | 0.1 | 0.0 | 0.006 |
29 | 36 | 48 | Q23397 | lake | 0.136 | 0.279 | 4 | 456,054 | 0.239 | 42 | 0.273 | 0.331 | 0.002 | 0.227 | 0.211 | 0.0 | 0.001 |
119 | 37 | 42 | Q3957 | town | 0.023 | 0.015 | 1 | 450,870 | 0.237 | 46 | 0.297 | 0.364 | 0.057 | 0.162 | 0.034 | 0.0 | 0.003 |
64 | 38 | 40 | Q811979 | architectural structure | 0.054 | 0.12 | 2 | 445,779 | 0.234 | 48 | 0.313 | 0.388 | 0.097 | 0.126 | 0.117 | 0.0 | 0.001 |
80 | 39 | 59 | Q34442 | road | 0.041 | 0.073 | 1 | 440,960 | 0.231 | 34 | 0.22 | 0.276 | 0.008 | 0.129 | 0.171 | 0.0 | 0.001 |
275 | 40 | 180 | Q21198342 | manga series | 0.007 | 0.015 | 0 | 437,382 | 0.23 | 11 | 0.074 | 0.093 | 0.01 | 0.052 | 0.2 | 0.0 | 0.003 |
72 | 41 | 23 | Q86850539 | Whitaker's Latin frequency type C | 0.047 | 0.011 | 1 | 436,103 | 0.229 | 95 | 0.622 | 0.788 | 0.0 | 0.0 | 0.0 | 0.0 | 0.228 |
138 | 42 | 139 | Q18340514 | events in a specific year or time period | 0.019 | 0.048 | 1 | 431,649 | 0.227 | 16 | 0.104 | 0.133 | 0.0 | 0.21 | 0.068 | 0.0 | 0.004 |
261 | 43 | 53 | Q2085381 | publisher | 0.007 | 0.015 | 0 | 420,459 | 0.221 | 37 | 0.243 | 0.319 | 0.001 | 0.21 | 0.068 | 0.0 | 0.004 |
44 | 44 | 38 | Q55488 | railway station | 0.074 | 0.104 | 2 | 410,774 | 0.216 | 49 | 0.319 | 0.43 | 0.001 | 0.172 | 0.163 | 0.0 | 0.002 |
108 | 45 | 27 | Q33506 | museum | 0.027 | 0.044 | 1 | 409,716 | 0.215 | 75 | 0.486 | 0.655 | 0.017 | 0.184 | 0.134 | 0.0 | 0.001 |
181 | 46 | 19 | Q34770 | language | 0.013 | 0.011 | 0 | 402,013 | 0.211 | 145 | 0.947 | 1.302 | 0.009 | 0.169 | 0.02 | 0.0 | 0.017 |
112 | 47 | 86 | Q15632617 | fictional human | 0.025 | 0.056 | 1 | 395,934 | 0.208 | 25 | 0.166 | 0.232 | 0.007 | 0.138 | 0.09 | 0.0 | 0.004 |
42 | 48 | 119 | Q22808320 | Wikimedia human name disambiguation page | 0.077 | 0.075 | 2 | 381,873 | 0.2 | 19 | 0.125 | 0.181 | 0.0 | 0.164 | 0.142 | 0.0 | 0.001 |
143 | 49 | 75 | Q11032 | newspaper | 0.017 | 0.043 | 0 | 380,153 | 0.199 | 28 | 0.181 | 0.263 | 0.002 | 0.169 | 0.143 | 0.0 | 0.019 |
38 | 50 | 117 | Q3331189 | version, edition, or translation | 0.087 | 0.191 | 2 | 374,597 | 0.197 | 19 | 0.126 | 0.186 | 0.117 | 0.037 | 0.134 | 0.0 | 0.038 |
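The Avg time/query column appears consistent with dividing the query time by the query count, assuming the result is expressed in seconds (the unit is not stated in the table). A minimal check on three rows of the table above, where the function name is illustrative:

```python
def avg_seconds_per_query(query_time_hours: float, query_count: int) -> float:
    """Average time per query in seconds, assuming query time is given in hours."""
    return query_time_hours * 3600 / query_count


# (query time in hours, query count) copied from the November 2021 table.
rows = {
    "Q5": (5248, 60_868_572),
    "Q16521": (480, 27_172_995),
    "Q11424": (1541, 5_067_305),
}

for qid, (hours, count) in rows.items():
    print(qid, round(avg_seconds_per_query(hours, count), 3))
```

Because the hours in the table are rounded, values recomputed this way may differ from the table in the last digit for some rows.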
Comparison of subgraph queries across time
Subgraph rank by size | Subgraph | Subgraph label | %of entities | %of triples | Oct query count | Oct %count of queries | Oct query time (hr) | Oct %time of queries | Nov query count | Nov %count of queries | Nov query time (hr) | Nov %time of queries |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | Q5 | human | 9.986 | 7.324 | 68,659,369 | 31.058 | 6,314 | 39.3 | 60,868,572 | 31.941 | 5,248 | 34.195 |
5 | Q16521 | taxon | 3.427 | 2.871 | 56,437,140 | 25.529 | 495 | 3.1 | 27,172,995 | 14.259 | 480 | 3.131 |
34 | Q4830453 | business | 0.207 | 0.108 | 4,041,395 | 1.828 | 343 | 2.1 | 9,228,037 | 4.842 | 554 | 3.607 |
6 | Q101352 | family name | 0.509 | 1.546 | 5,564,173 | 2.517 | 640 | 4.0 | 5,990,617 | 3.144 | 659 | 4.292 |
15 | Q11424 | film | 0.281 | 0.364 | 4,757,084 | 2.152 | 1,613 | 10.0 | 5,067,305 | 2.659 | 1,541 | 10.042 |
1 | Q13442814 | scholarly article | 39.794 | 49.668 | 1,649,268 | 0.746 | 142 | 0.9 | 4,944,995 | 2.595 | 263 | 1.713 |
7 | Q4167410 | Wikimedia disambiguation page | 1.459 | 1.374 | 3,737,550 | 1.691 | 223 | 1.4 | 3,292,873 | 1.728 | 765 | 4.982 |
2 | Q6999 | astronomical object | 8.942 | 8.75 | 448,032 | 0.203 | 51 | 0.3 | 2,444,109 | 1.283 | 79 | 0.516 |
92 | Q6881511 | enterprise | 0.052 | 0.036 | 943,613 | 0.427 | 164 | 1.0 | 1,937,486 | 1.017 | 234 | 1.528 |
26 | Q484170 | commune of France | 0.043 | 0.18 | 866,766 | 0.392 | 46 | 0.3 | 1,934,902 | 1.015 | 70 | 0.455 |
20 | Q13406463 | Wikimedia list article | 0.352 | 0.252 | 1,283,160 | 0.58 | 73 | 0.5 | 1,766,742 | 0.927 | 117 | 0.765 |
63 | Q5398426 | television series | 0.062 | 0.055 | 1,206,285 | 0.546 | 366 | 2.3 | 1,379,486 | 0.724 | 411 | 2.68 |
42 | Q7725634 | literary work | 0.176 | 0.077 | 468,204 | 0.212 | 22 | 0.1 | 1,377,546 | 0.723 | 42 | 0.273 |
16 | Q486972 | human settlement | 0.602 | 0.302 | 721,789 | 0.327 | 73 | 0.5 | 1,328,064 | 0.697 | 699 | 4.557 |
165 | Q891723 | public company | 0.013 | 0.015 | 837,595 | 0.379 | 157 | 1.0 | 1,175,813 | 0.617 | 219 | 1.426 |
91 | Q43229 | organization | 0.08 | 0.037 | 806,840 | 0.365 | 123 | 0.8 | 1,067,340 | 0.56 | 600 | 3.908 |
12 | Q3305213 | painting | 0.578 | 0.432 | 834,752 | 0.378 | 79 | 0.5 | 926,701 | 0.486 | 86 | 0.558 |
86 | Q47461344 | written work | 0.078 | 0.038 | 774,947 | 0.351 | 67 | 0.4 | 881,216 | 0.462 | 53 | 0.345 |
25 | Q532 | village | 0.292 | 0.201 | 584,789 | 0.265 | 21 | 0.1 | 872,310 | 0.458 | 61 | 0.399 |
4 | Q4167836 | Wikimedia category | 5.165 | 5.85 | 1,383,343 | 0.626 | 96 | 0.6 | 808,536 | 0.424 | 74 | 0.484 |
62 | Q7889 | video game | 0.047 | 0.056 | 741,401 | 0.335 | 30 | 0.2 | 753,351 | 0.395 | 37 | 0.244 |
19 | Q8502 | mountain | 0.559 | 0.253 | 227,393 | 0.103 | 16 | 0.1 | 749,283 | 0.393 | 47 | 0.306 |
28 | Q482994 | album | 0.287 | 0.161 | 776,845 | 0.351 | 37 | 0.2 | 704,746 | 0.37 | 59 | 0.388 |
89 | Q4164871 | position | 0.128 | 0.037 | 788,077 | 0.356 | 332 | 2.1 | 645,434 | 0.339 | 175 | 1.141 |
8 | Q7187 | gene | 1.273 | 0.927 | 628,916 | 0.284 | 94 | 0.6 | 604,364 | 0.317 | 208 | 1.354 |
10 | Q11173 | chemical compound | 1.302 | 0.693 | 1,307,852 | 0.592 | 133 | 0.8 | 588,469 | 0.309 | 76 | 0.496 |
54 | Q215380 | musical group | 0.087 | 0.063 | 461,181 | 0.209 | 17 | 0.1 | 585,266 | 0.307 | 37 | 0.241 |
31 | Q16970 | church building | 0.226 | 0.129 | 396,936 | 0.18 | 25 | 0.2 | 577,677 | 0.303 | 48 | 0.315 |
70 | Q732577 | publication | 0.076 | 0.048 | 512,416 | 0.232 | 53 | 0.3 | 569,536 | 0.299 | 37 | 0.238 |
22 | Q79007 | street | 0.62 | 0.231 | 225,188 | 0.102 | 20 | 0.1 | 535,623 | 0.281 | 44 | 0.289 |
23 | Q4022 | river | 0.425 | 0.219 | 280,190 | 0.127 | 20 | 0.1 | 520,347 | 0.273 | 56 | 0.365 |
243 | Q14204246 | Wikimedia project page | 0.033 | 0.008 | 1,114,113 | 0.504 | 62 | 0.4 | 498,708 | 0.262 | 548 | 3.572 |
36 | Q3947 | house | 0.216 | 0.098 | 118,886 | 0.054 | 9 | 0.1 | 465,249 | 0.244 | 33 | 0.212 |
32 | Q41176 | building | 0.287 | 0.125 | 271,666 | 0.123 | 36 | 0.2 | 463,636 | 0.243 | 65 | 0.423 |
310 | Q783794 | company | 0.012 | 0.005 | 124,932 | 0.057 | 19 | 0.1 | 459,638 | 0.241 | 33 | 0.213 |
29 | Q23397 | lake | 0.278 | 0.138 | 130,027 | 0.059 | 14 | 0.1 | 456,054 | 0.239 | 42 | 0.273 |
121 | Q3957 | town | 0.015 | 0.023 | 294,685 | 0.133 | 24 | 0.1 | 450,870 | 0.237 | 46 | 0.297 |
64 | Q811979 | architectural structure | 0.119 | 0.055 | 282,739 | 0.128 | 28 | 0.2 | 445,779 | 0.234 | 48 | 0.313 |
80 | Q34442 | road | 0.073 | 0.041 | 215,771 | 0.098 | 14 | 0.1 | 440,960 | 0.231 | 34 | 0.22 |
280 | Q21198342 | manga series | 0.014 | 0.007 | 208,503 | 0.094 | 5 | 0.0 | 437,382 | 0.23 | 11 | 0.074 |
71 | Q86850539 | Whitaker's Latin frequency type C | 0.011 | 0.048 | 355,247 | 0.161 | 56 | 0.3 | 436,103 | 0.229 | 95 | 0.622 |
138 | Q18340514 | events in a specific year or time period | 0.048 | 0.019 | 463,683 | 0.21 | 17 | 0.1 | 431,649 | 0.227 | 16 | 0.104 |
264 | Q2085381 | publisher | 0.014 | 0.007 | 179,442 | 0.081 | 23 | 0.1 | 420,459 | 0.221 | 37 | 0.243 |
45 | Q55488 | railway station | 0.104 | 0.075 | 258,862 | 0.117 | 20 | 0.1 | 410,774 | 0.216 | 49 | 0.319 |
108 | Q33506 | museum | 0.044 | 0.028 | 252,308 | 0.114 | 54 | 0.3 | 409,716 | 0.215 | 75 | 0.486 |
177 | Q34770 | language | 0.011 | 0.013 | 1,713,196 | 0.775 | 73 | 0.5 | 402,013 | 0.211 | 145 | 0.947 |
113 | Q15632617 | fictional human | 0.056 | 0.026 | 306,319 | 0.139 | 18 | 0.1 | 395,934 | 0.208 | 25 | 0.166 |
41 | Q22808320 | Wikimedia human name disambiguation page | 0.075 | 0.078 | 433,986 | 0.196 | 17 | 0.1 | 381,873 | 0.2 | 19 | 0.125 |
144 | Q11032 | newspaper | 0.043 | 0.017 | 230,085 | 0.104 | 11 | 0.1 | 380,153 | 0.199 | 28 | 0.181 |
37 | Q3331189 | version, edition, or translation | 0.19 | 0.087 | 410,352 | 0.186 | 34 | 0.2 | 374,597 | 0.197 | 19 | 0.126 |
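To make the October to November differences easier to read, the relative change in query count can be computed per subgraph. The sketch below uses two rows from the comparison table; the function name is illustrative and not part of the original analysis.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


# (October query count, November query count) copied from the comparison table.
query_counts = {
    "Q16521 (taxon)": (56_437_140, 27_172_995),
    "Q4830453 (business)": (4_041_395, 9_228_037),
}

for label, (october, november) in query_counts.items():
    print(f"{label}: {percent_change(october, november):+.1f}%")
```

Taxon queries roughly halve between the two months while business queries more than double, which shows how much monthly values can shift for individual subgraphs.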
More on query time
The query time can be broken down into classes for better visualization. Below is a figure with the query class distribution (number of queries per query time class per subgraph) for the top 50 subgraphs. Some of the takeaways are:
- Most subgraphs have most of their queries in the range of 10-100ms.
- The second most common class is 100ms to 1s.
- collection and photograph have most queries (~150k) timed at 1-10s. Around 10 more subgraphs have a few (~10-20k) queries in this time range.
File:Top 50 query time class.png
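Grouping query times into classes can be sketched as below. The 10-100ms, 100ms-1s, 1-10s, and More than 10s classes come from this page; the `<10ms` class and the handling of values that fall exactly on a boundary are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Class boundaries in milliseconds. The "<10ms" class is an assumption.
BOUNDS_MS = [10, 100, 1_000, 10_000]
LABELS = ["<10ms", "10-100ms", "100ms-1s", "1-10s", "More than 10s"]


def time_class(query_time_ms: float) -> str:
    """Map a query time in milliseconds to its time class label."""
    return LABELS[bisect_right(BOUNDS_MS, query_time_ms)]


def class_distribution(query_times_ms) -> Counter:
    """Count how many queries fall into each time class."""
    return Counter(time_class(t) for t in query_times_ms)


# Made-up query times in milliseconds.
print(class_distribution([5, 42, 87, 350, 2_500, 60_000]))
```

Running `class_distribution` once per subgraph would give the per-subgraph counts that the figure summarizes.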
User agent
Analysis on user agents is an approximation because user-agent strings do not exactly represent distinct users. For example, many people use the same bot or script without changing the user-agent, and the same person or bot may use multiple user-agent strings. Still, the available data gives a reasonable estimate.
User agent count
- Total number of unique user agents across all subgraphs: 981,180
- First, the subgraphs with the most and the fewest distinct user-agents are listed. Every subgraph has at least 10 user-agents, so the large subgraphs are used by multiple users.
- The largest numbers of user-agents are found across a variety of subgraph types. gene, protein, biological process, and molecular function appear to be similar to one another. It is possible the same queries represent several of these subgraphs. More on subgraph connectivity in #Subgraph Connectivity.
- There are 50 subgraphs with more than 1000 user agents, and 300 subgraphs with fewer than 1000 user agents. Most subgraphs are therefore not queried very widely. The distribution of user-agent counts below 1000 is shown in the figure below. This clearly shows the small number of user agents in most subgraphs.
User agent distribution in subgraphs
- Next, the user agent vs query count distribution was analyzed for some of the top subgraphs. While user agent count gives us an idea of how many users may be using a subgraph, it is not clear whether all of them query the subgraph equally, or very few user agents perform most of the queries.
- ~30 out of 341 subgraphs have a single user agent that issues >=50% of all queries of that particular subgraph.
- 6 subgraphs have a user agent issuing around 80-90% of the queries.
- So the trend of a single dominating query source is not widespread among subgraphs, but is present in a few subgraphs nonetheless.
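One way to find the dominating user agent per subgraph is to count queries per (subgraph, user agent) and take the share of the largest one. This is a minimal sketch under the assumption that the query log is available as (subgraph, user agent) pairs, one per query; the function name and the user agent names are illustrative.

```python
from collections import Counter, defaultdict


def top_user_agent_share(records):
    """Return {subgraph: (top user agent, its percent of that subgraph's queries)}.

    records: iterable of (subgraph, user_agent) pairs, one per query.
    """
    per_subgraph = defaultdict(Counter)
    for subgraph, user_agent in records:
        per_subgraph[subgraph][user_agent] += 1

    shares = {}
    for subgraph, counts in per_subgraph.items():
        user_agent, n = counts.most_common(1)[0]
        shares[subgraph] = (user_agent, 100 * n / sum(counts.values()))
    return shares


# Made-up log: one agent issues 9 of the 10 taxon queries.
log = [("Q16521", "bot-a")] * 9 + [("Q16521", "bot-b")] + [("Q5", "bot-a"), ("Q5", "bot-c")] * 2
shares = top_user_agent_share(log)
dominated = {s for s, (_, pct) in shares.items() if pct > 50}
print(shares["Q16521"], dominated)
```

Filtering on a share of 50% or more reproduces the kind of count reported in the list above.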
The figure below shows the top 2 user-agent query percentages for 341 subgraphs. This shows whether there is a dominating pattern in a subgraph with the top user agents per subgraph.
The figure below shows 100 subgraphs with their user agent query usage distribution in percent. Usage greater than 50% is marked in red. A bird's-eye view of the plots shows how some subgraphs have a dominating user agent, while most other subgraphs have at least 1 or 2 user agents that query the most. The rest of the user agents form the long tail of the distribution.
Top user agents in subgraphs
- The top user agents in various subgraphs are listed below. More analysis on Q5 (human) and Q16521 (taxon) is done at the end of the page as they are the most queried subgraphs.
Subgraph | Subgraph label | User agent | Query count (in subgraph) | Query percent (within subgraph) | Query percent overall |
---|---|---|---|---|---|
Q16521 | taxon | mix-n-match | 50622670 | 89.697 | 22.899 |
Q5 | human | UA # 2 | 9017930 | 13.134 | 4.079 |
Q5 | human | mix-n-match | 8548335 | 12.45 | 3.867 |
Q5 | human | UA # 3 | 5059258 | 7.369 | 2.289 |
Q5 | human | UA # 4 | 4020496 | 5.856 | 1.819 |
Q5 | human | UA # 5 | 3828747 | 5.576 | 1.732 |
Q101352 | family name | UA # 5 | 3828747 | 68.811 | 1.732 |
Q5 | human | UA # 6 | 2685807 | 3.912 | 1.215 |
Q5 | human | UA # 7 | 2434486 | 3.546 | 1.101 |
Q4830453 | business | UA # 8 | 2403677 | 59.476 | 1.087 |
Q5 | human | UA # 9 | 2020598 | 2.943 | 0.914 |
Q16521 | taxon | Hub | 1984437 | 3.516 | 0.898 |
Q5 | human | UA # 11 | 1877700 | 2.735 | 0.849 |
Q5 | human | UA # 12 | 1781161 | 2.594 | 0.806 |
Q16521 | taxon | UA # 13 | 1294113 | 2.293 | 0.585 |
User agent vs Subgraph
So far we have explored the user-agent count and distribution per subgraph. It is also important to look at how each user agent's queries spread across subgraphs. In other words,
- Do users have a very specific use case, so that their queries span only a few subgraphs? Or are their queries spread across a lot of subgraphs?
- Are there some user agents that query the most in multiple subgraphs? This could be due to the nature of the use case or simply because some subgraphs overlap a lot.
We start by looking at how many user agents access how many subgraphs. From the table below, we see that most user agents (89% of them) query only one subgraph. Some user agents query a lot of subgraphs as well. A clearer picture is seen in the plot below.
File:Ua vs subgraph.png
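Counting how many user agents access how many subgraphs can be sketched as follows, assuming the query log is available as (user agent, subgraph) pairs. The function name and sample data are illustrative.

```python
from collections import Counter, defaultdict


def subgraphs_per_user_agent(records) -> Counter:
    """Return {number of distinct subgraphs: number of user agents querying that many}.

    records: iterable of (user_agent, subgraph) pairs, one per query.
    """
    seen = defaultdict(set)
    for user_agent, subgraph in records:
        seen[user_agent].add(subgraph)
    return Counter(len(subgraphs) for subgraphs in seen.values())


# Made-up log with three user agents.
log = [
    ("ua1", "Q5"), ("ua1", "Q5"),
    ("ua2", "Q5"), ("ua2", "Q16521"),
    ("ua3", "Q11424"),
]
distribution = subgraphs_per_user_agent(log)
print(distribution)
```

Dividing the count for key 1 by the total number of user agents gives the share of user agents that query a single subgraph.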
Next we isolate user agents from each subgraph that query drastically more (>=10% difference) than other user agents in the same subgraph, and that perform at least 100k queries (0.05% of all queries) a month. A list of ~30 such user agents was found. The subgraph distributions of all these user agents were plotted to find the large buckets where they tend to query. The plot is shown below, followed by some explicit observations.
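The selection step can be sketched as below. The exact rule is not spelled out on this page, so the sketch assumes one interpretation: the top user agent of a subgraph is kept when its share exceeds the second-ranked user agent's share by at least 10 percentage points and it issues at least 100k queries. The function name, parameters, and sample counts are illustrative.

```python
def dominant_user_agents(counts_by_subgraph, min_gap_pct=10.0, min_queries=100_000):
    """Return (subgraph, user agent, query count) for dominating user agents.

    counts_by_subgraph: {subgraph: {user_agent: monthly query count}}.
    Assumed rule: top share minus second share >= min_gap_pct, and count >= min_queries.
    """
    dominant = []
    for subgraph, counts in counts_by_subgraph.items():
        if not counts:
            continue
        total = sum(counts.values())
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        top_ua, top_n = ranked[0]
        second_n = ranked[1][1] if len(ranked) > 1 else 0
        gap_pct = 100 * (top_n - second_n) / total
        if gap_pct >= min_gap_pct and top_n >= min_queries:
            dominant.append((subgraph, top_ua, top_n))
    return dominant


# Made-up monthly counts.
sample = {
    "A": {"ua1": 900_000, "ua2": 50_000, "ua3": 50_000},  # large gap, enough queries
    "B": {"ua1": 40_000, "ua2": 1_000},                   # large gap, too few queries
    "C": {"ua1": 300_000, "ua2": 290_000},                # enough queries, small gap
}
print(dominant_user_agents(sample))
```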
Percentages below are percent of all monthly queries.
For reference:
Subgraph connectivity through queries
Subgraph connectivity was explored to some extent using only Wikidata in Wikidata_Subgraph_Analysis. This was based on what items or properties were common between subgraphs and how many direct connections were present between them. A visualization was created to show the strength of this connectivity between subgraphs here: wikidata_graph. This section aims to analyze the connectivity of subgraphs through the queries, i.e, how often are some subgraphs queried together.
- Subgraph queries: The total number of queries that touch on at least one of the top 341 subgraphs is 72% of all queries.
- First we look at how many subgraphs most queries access. The tables below show the least and most query groups by number of subgraphs accessed.
- 70% of all queries (97% of subgraph queries) touch on 1 or 2 subgraphs. 64% of all queries (90% of subgraph queries) touch on only 1 subgraph.
|
|
File:NumQuery vs numSubgraph.png
- It is hard to see which subgraphs occur together from the data above. So the subgraphs that occurred together were broken into pairs, and the pairs of subgraphs that occur together the most were listed.
- There are 57,970 subgraph pairs that occur together in queries. The total possible subgraph pair count is (340*341)/2 = 57,970. This shows that every subgraph is connected to every other subgraph through queries! Of course, the number of queries varies widely.
- A list of some of the most queried subgraph pairs is shown below.
Subgraph 1 | Subgraph 1 label | Subgraph 2 | Subgraph 2 label | # of queries | % of queries |
---|---|---|---|---|---|
Q101352 | family name | Q5 | human | 4649345 | 2.44 |
Q4830453 | business | Q6881511 | enterprise | 1858183 | 0.975 |
Q11424 | film | Q5 | human | 1096150 | 0.575 |
Q5 | human | Q7725634 | literary work | 1067191 | 0.56 |
Q4830453 | business | Q891723 | public company | 973565 | 0.511 |
Q13406463 | Wikimedia list article | Q5 | human | 970047 | 0.509 |
Q16521 | taxon | Q5 | human | 890304 | 0.467 |
Q4167410 | Wikimedia disambiguation page | Q5 | human | 840151 | 0.441 |
Q4830453 | business | Q5 | human | 680786 | 0.357 |
Q3305213 | painting | Q4167410 | Wikimedia disambiguation page | 606434 | 0.318 |
Q6881511 | enterprise | Q891723 | public company | 572986 | 0.301 |
Q13442814 | scholarly article | Q5 | human | 527538 | 0.277 |
Q47461344 | written work | Q732577 | publication | 514321 | 0.27 |
Q4164871 | position | Q5 | human | 480484 | 0.252 |
Q13442814 | scholarly article | Q4167410 | Wikimedia disambiguation page | 446490 | 0.234 |
Q482994 | album | Q5 | human | 409139 | 0.215 |
Q13406463 | Wikimedia list article | Q16521 | taxon | 401466 | 0.211 |
Q13406463 | Wikimedia list article | Q4167410 | Wikimedia disambiguation page | 349421 | 0.183 |
Q14204246 | Wikimedia project page | Q4167410 | Wikimedia disambiguation page | 341845 | 0.179 |
Q43229 | organization | Q5 | human | 337868 | 0.177 |
Q5398426 | television series | Q5 | human | 323501 | 0.17 |
Q215380 | musical group | Q5 | human | 320532 | 0.168 |
Q47461344 | written work | Q5 | human | 313149 | 0.164 |
Q5 | human | Q6881511 | enterprise | 285110 | 0.15 |
Q3331189 | version, edition, or translation | Q5 | human | 283741 | 0.149 |
Q5 | human | Q86850539 | Whitaker's Latin frequency type C | 280866 | 0.147 |
Q11424 | film | Q13406463 | Wikimedia list article | 272316 | 0.143 |
Q13406463 | Wikimedia list article | Q18340514 | events in a specific year or time period | 270710 | 0.142 |
Q16521 | taxon | Q4167410 | Wikimedia disambiguation page | 266507 | 0.14 |
Q4167410 | Wikimedia disambiguation page | Q86850539 | Whitaker's Latin frequency type C | 249340 | 0.131 |
- The distribution of the number of times each subgraph pair in Wikidata occurs in queries is shown below. Note that the pair (A,B) is the same as the pair (B,A), so there is no duplication in the plots. Since the distribution is extremely skewed, three plots with various limits on the number of occurrences are shown. Only a small number of pairs occur together often (they are listed in the table above), whereas a huge number of pairs occur together only a few times.
- Below is a heatmap of the number of queries, where both the x and y axes represent subgraph indices (names of subgraphs are not shown due to space). The subgraphs are sorted by most queried subgraphs.
- The diagonal shows queries that use only 1 subgraph, represented as Q5-Q5 or Q42-Q42, for example. Others are represented as Q5-Q42 or Q42-Q5.
- The plot is symmetrical.
- The many vertical and horizontal lines indicate that lots of subgraphs pair with many other subgraphs.
- The exact numbers for this heatmap are present in subgraph_pair_heatmap_df as a csv file.
File:Subgraph pair heatmap.png
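The pair counting described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the input shape (one set of subgraph Qids per query), the function name, and the sample queries are hypothetical. Single-subgraph queries are placed on the diagonal, matching the heatmap description.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_subgraph_pairs(queries):
    """Count how many queries touch each unordered pair of subgraphs.

    `queries` is an iterable of sets of subgraph Qids, one set per query
    (hypothetical input shape). A query touching only one subgraph is
    counted on the diagonal, e.g. ("Q5", "Q5").
    """
    pair_counts = Counter()
    for subgraphs in queries:
        ordered = sorted(subgraphs)  # fixed order, so (A, B) and (B, A) coincide
        if len(ordered) == 1:
            pair_counts[(ordered[0], ordered[0])] += 1
        for pair in combinations(ordered, 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up sample queries.
sample_queries = [
    {"Q5"},
    {"Q5", "Q101352"},
    {"Q101352", "Q5"},
    {"Q5", "Q11424", "Q16521"},
]
pair_counts = count_subgraph_pairs(sample_queries)

# Number of distinct unordered pairs among 341 subgraphs: (340*341)/2.
possible_pairs = comb(341, 2)
print(possible_pairs, pair_counts.most_common(3))
```

Sorting each query's subgraphs before pairing is what removes the (A,B)/(B,A) duplication; a query touching k subgraphs contributes k*(k-1)/2 pairs.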
Number of subgraph accessed vs query time
To see whether there is a correlation between accessing more subgraphs and query time, various subsets of subgraphs were taken and their query time distributions were observed. Then, the number (and percentage) of queries that access various numbers of subgraphs was plotted for each query time group. A simple correlation plot of time against number of subgraphs was not possible due to the large number of queries, but the following plots give a good idea of the correlation. There is a slight correlation, but it is not a strong one. All query time groups are dominated by queries that access 1 or 2 subgraphs. Queries accessing more subgraphs do appear comparatively more often in the More than 10s group.
The following analysis was done with data from November 2021. Thus there are slight differences in numbers from the above analysis, which was done with October 2021 data.
File:Various subset time classes.png
File:TimeGroupWise numSubgraphAccessed.png
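The grouping behind these plots (query time group versus number of subgraphs accessed) can be sketched as follows. This is an assumption-laden illustration: the bucket edges and most of the labels are guesses (only some labels, such as More than 10s, appear on this page), and the input shape and sample values are made up.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Assumed bucket edges in milliseconds and assumed labels.
EDGES_MS = [10, 100, 1_000, 10_000]
LABELS = ["Less than 10ms", "10ms to 100ms", "100ms to 1s", "1s to 10s", "More than 10s"]


def time_class(query_time_ms):
    """Return the query time group; each bucket includes its lower edge."""
    return LABELS[bisect_right(EDGES_MS, query_time_ms)]


def subgraph_share_by_time_class(queries):
    """For each time group, percent of queries per number of subgraphs.

    `queries` is an iterable of (query_time_ms, num_subgraphs) pairs
    (hypothetical input shape).
    """
    counts = defaultdict(Counter)
    for query_time_ms, num_subgraphs in queries:
        counts[time_class(query_time_ms)][num_subgraphs] += 1
    shares = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        shares[label] = {n: 100 * c / total for n, c in counter.items()}
    return shares


# Made-up sample data.
sample = [(5, 1), (50, 1), (50, 2), (20_000, 1), (20_000, 3), (20_000, 3), (15_000, 2)]
shares = subgraph_share_by_time_class(sample)
print(shares["More than 10s"])
```

Normalizing within each time group, rather than over all queries, is what makes groups of very different sizes comparable.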
Human subgraph (Q5) query analysis
The following analysis was done with query data from November 2021.
The queries that were estimated to be related to the human subgraph accounted for 31.94% of all queries in Wikidata. 25.78% of queries used only the human subgraph and the remaining 6.16% used a mix of the human subgraph and various other subgraphs. As described in #What are subgraph related queries, subgraphs are related to queries through Properties, Subject or Object URIs, Subgraph instance items, etc. Here is a breakdown for the human subgraph taken from #Query count and time. A query can be related to the human subgraph for more than one of the following reasons.
- Number of queries: 60,868,572 (31.94%)
- Percent of queries matching subgraph Qid, i.e, has Q5: 2.54%
- Percent of queries matching instance items: 18%
- Percent of queries matching subject/object URIs: 12%
- Percent of queries matching properties: 19.45%
- Percent of queries matching literal strings: 1.43%
Some of these breakdowns have large percentages. It is worth looking at which items/properties/URIs are queried the most. Looking at the distribution of these items' usage in queries also shows how narrow or wide the search space is.
Here is a detailed breakdown of what kind of match caused a query to be part of the human subgraph:
Instance items matched
- Total items used: 7,969,182
- Total queries that use these items: 34,680,808 (18% of all queries)
- The distribution shows there are some high usage (~10k-20k queries) items, a small number of medium usage (~5k queries) items, and the rest form a long tail of low usage (<1k queries) items in the human subgraph.
Properties matched
- Total properties used: 1,091 (Recall these are properties that occur 99% of the time in the human subgraph)
- Total queries that use these properties: 37,078,566 (19.45% of all queries)
- The distribution shows there are 3 properties with ~20-30M queries, 7 properties with ~1-5M queries, and the remaining 1,000+ properties each match ~100K queries or fewer. In short, the distribution is extremely skewed by only ~10 properties that are highly related to the human subgraph.
Subject/Object URI matched
- Total URIs used: 7,926,297 (Recall these are URIs that occur 99% of the time in the human subgraph)
- Total queries that use these URIs: 23,245,152 (12.2% of all queries)
- The top URIs/items show the obvious and most common ways the human subgraph is queried: queries about specific people, about groups of people, and about their Wikipedia pages. More about types of queries below.
- The distribution is a smooth logarithmic curve, with only one item present in 165k queries and the rest going down from 40k in a logarithmic pattern.
Query time
- The total query time of the human subgraph is 34% of total query time, and its query count is ~32% of all queries.
- Average time per query is 0.3 seconds (300 ms). Most queries in this subgraph are small and simple.
- The query time distribution is shown in the chart below, both in absolute counts and as a percentage of queries in the human subgraph.
User agent
The top user agents that query the human subgraph are listed below. This helps us view the distribution of usage: whether a few user agents dominate or usage is well distributed across user agents. The top 10 user agents in terms of query count and in terms of query time are shown in the table below.
User agent | Query count | % query in human subgraph | % query overall | Query time(hr) | % query time in human subgraph | % query time overall |
---|---|---|---|---|---|---|
mix-n-match | 6960988 | 11.436 | 3.653 | 79 | 1.51 | 0.516 |
searx1 | 6615319 | 10.868 | 3.471 | 778 | 14.832 | 5.072 |
UA#3 | 3491821 | 5.737 | 1.832 | 75 | 1.426 | 0.487 |
UA#4 | 3073725 | 5.05 | 1.613 | 175 | 3.327 | 1.138 |
UA#5 | 2933240 | 4.819 | 1.539 | 80 | 1.516 | 0.518 |
UA#6 | 2488807 | 4.089 | 1.306 | 19 | 0.364 | 0.125 |
UA#7 | 2182220 | 3.585 | 1.145 | 44 | 0.841 | 0.288 |
WikidataQueryServiceR | 2044045 | 3.358 | 1.073 | 36 | 0.68 | 0.232 |
UA#9 | 1970264 | 3.237 | 1.034 | 27 | 0.524 | 0.179 |
searx2 | 1909144 | 3.137 | 1.002 | 200 | 3.808 | 1.302 |
UA#11 | 75523 | 0.124 | 0.04 | 434 | 8.271 | 2.828 |
UA#12 | 55357 | 0.091 | 0.029 | 319 | 6.083 | 2.08 |
searx3 | 1428789 | 2.347 | 0.75 | 151 | 2.871 | 0.982 |
OB-bot | 287534 | 0.472 | 0.151 | 144 | 2.736 | 0.935 |
UA#15 | 50915 | 0.084 | 0.027 | 134 | 2.553 | 0.873 |
UA#16 | 31298 | 0.051 | 0.016 | 112 | 2.132 | 0.729 |
searx4 | 771932 | 1.268 | 0.405 | 92 | 1.761 | 0.602 |
The query time breakdown was plotted for the top 20 user agents (in terms of time). Most queries have query time of 10ms to 1s, as observed earlier. Some user agents have most queries in the range 10ms to 100ms and some others have most queries in the range 100ms to 1s.
File:Human ua query class percent limy15.png
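The table above gives query counts and total query time in hours, from which an average time per query can be derived for each user agent. The short sketch below does this for three rows copied from the table; the function name is an assumption, and since the hours in the table are rounded, the results are approximate.

```python
def average_seconds_per_query(query_count, query_time_hours):
    """Average query time in seconds, from a total given in hours."""
    return query_time_hours * 3600 / query_count


# (query count, query time in hours), copied from the table above.
rows = {
    "mix-n-match": (6960988, 79),
    "searx1": (6615319, 778),
    "UA#11": (75523, 434),
}
averages = {
    user_agent: average_seconds_per_query(count, hours)
    for user_agent, (count, hours) in rows.items()
}
for user_agent, seconds in averages.items():
    print(f"{user_agent}: {seconds:.2f} s per query")
```

This gives roughly 0.04 s for mix-n-match, 0.42 s for searx1, and 20.7 s for UA#11, which illustrates why a user agent with few queries can still rank high by query time.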
Query types
Queries are grouped into types by the operations a query uses and the order in which they are used. This groups similar queries together even when they seek different information, and also separates simple queries from complicated ones. The human subgraph has ~11,500 different types of queries. Note that some query groups can be very similar in what they ask for, while most groups differ a lot. The top query groups are listed below. The top 10 types of queries account for about 60% of the queries in the human subgraph. The rest form a really long tail of small query counts.
Query group (operation list) | # of queries | % of queries in human subgraph |
---|---|---|
bgp, service, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, bgp, (leftjoin, bgp)x10, path, leftjoin, leftjoin, bgp, path, leftjoin, leftjoin, bgp, leftjoin, bgp, path, leftjoin, (leftjoin, bgp)x15, leftjoin, path, bgp, sequence, leftjoin, bgp, join, bgp, (leftjoin, bgp)x35, leftjoin, path, bgp, sequence, (leftjoin, bgp)x8, service, join, group, (extend)x68,project | 9,511,204 | 15.626 |
table, bgp, join, filter, project, distinct | 6,889,023 | 11.318 |
bgp, service, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, path, bgp, sequence, leftjoin, bgp, join, bgp, (leftjoin, bgp)x9, leftjoin, bgp, path, leftjoin, leftjoin, bgp, path, leftjoin, leftjoin, bgp, leftjoin, bgp, path, leftjoin, (leftjoin, bgp)x15 leftjoin, path, bgp, sequence, leftjoin, bgp, join, bgp, (leftjoin, bgp)x36, leftjoin, path, bgp, sequence, (leftjoin, bgp)x8, service, join, group, (extend)x68, project | 4,298,916 | 7.063 |
table, bgp, join, bgp, leftjoin, bgp, leftjoin, bgp, join, filter, project | 3,444,363 | 5.659 |
bgp, bgp, leftjoin, filter, bgp, extend, filter, union, bgp, extend, filter, union, project | 3,073,725 | 5.05 |
bgp, project | 2,454,919 | 4.033 |
table, bgp, leftjoin, bgp, join, filter, project | 2,429,518 | 3.991 |
bgp, bgp, service, join, filter, project, distinct | 1,172,912 | 1.927 |
table, bgp, join, bgp, service, join, order, project | 1,047,351 | 1.721 |
table, extend, bgp, join, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, path, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, leftjoin, bgp, service, join, filter, project | 1,033,220 | 1.697 |
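Grouping queries by their ordered operation list, as in the table above, can be sketched as follows. This is a simplified illustration, not the original code: it assumes each query has already been reduced to a list of operation names, the function names are hypothetical, and it collapses only runs of a single repeated operation (such as (extend)x3), not repeated multi-operation sequences such as (leftjoin, bgp)x10 shown in the table.

```python
from collections import Counter
from itertools import groupby


def query_type(operations):
    """Render an ordered operation list as a query-type key.

    Consecutive repeats of one operation are collapsed to '(op)xN'.
    """
    parts = []
    for op, run in groupby(operations):
        n = len(list(run))
        parts.append(op if n == 1 else f"({op})x{n}")
    return ", ".join(parts)


def query_type_shares(operation_lists):
    """Return (query type, count, percent) tuples, most frequent first."""
    counts = Counter(query_type(ops) for ops in operation_lists)
    total = sum(counts.values())
    return [(qtype, n, round(100 * n / total, 3)) for qtype, n in counts.most_common()]


# Made-up operation lists for four queries.
sample = [
    ["bgp", "project"],
    ["bgp", "project"],
    ["table", "bgp", "join", "filter", "project", "distinct"],
    ["bgp", "extend", "extend", "extend", "project"],
]
shares = query_type_shares(sample)
for row in shares:
    print(row)
```

Because the key keeps the order of operations, two queries with the same operations in a different order fall into different groups, as described above.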
Looking at the top 20 query groups (which make up 70% of all human subgraph queries), the following query types were found:
1. Query lists lots of predicates with time/date precision of some items
2. Mix n Match: Birth and death of certain people, with filters
3. Like 1, slightly different
4. Name and family information of people in specific languages
5. All properties of some humans (these queries are generic but here used for humans)
6. Uses P227 (or some international ID) and retrieves the Wikipedia page for it
7. Same as 6 but written differently
8. Just wants labels in specific languages
9. Same as 6 but written differently
10. All contact info of people (fb, insta, youtube, twitter, etc)
11. Labels and Wikipedia article of people
12. Only label and description of people
13. Search for films by director filter
14. CEO or high officials of companies searched by name strings
15. Timeline of people of a particular occupation and particular gender
16. Occupation, name, birth, death of people
17. Education institution and its reference of people
18. Label and Wikipedia page of people in specific languages
19. List all people, or all people of certain occupation, or entity related to a given Wikipedia page (generic query)
20. Notable works, labels, and IDs of works like ISBNs
UA vs query types
Getting the number of query types per user agent tells us about the variety of queries a user agent makes to WDQS. This also breaks down the human subgraph queries into finer groups. The following plot shows the number of query types for each user agent in the human subgraph.
This shows us that most user agents make only 1 type of query. Only 8 user agents make queries of >500 types, and ~50 user agents make queries of >100 types. Looking into the query counts of each of these UA - query type groups, we find that most have few queries (<10,000) and only ~10 groups have >10,000 queries, but all of these are small, simple queries. The figure below shows the number of queries per query type for the top 8 user agents. As we can see, their distributions look alike although their query counts and query types differ.
File:Query vs query type 8ua.png
Query type vs time class
There are close to 11,500 query types, and 20 of these types make up 70% of all queries in the human subgraph (22% of all queries). Not all of them are equally time consuming: some are simple queries, while others are long and complex. The following plot shows these 20 query types with query time classes. The values above the bars show both the percentage in the human subgraph and the overall query percentage. The subplots are titled with the percentage of queries in that query type.
Services
The queries use ~50 unique services. The top 10 services are the most used; the rest are used in fewer than 50 queries, mostly in fewer than 10 queries. 20 of these services are used in only 1 query.
Service | Query count | % query in human subgraph |
---|---|---|
wikibase:label | 29,925,798 | 49.165 |
wikibase:mwapi | 14,014,458 | 23.024 |
gas:service | 46,588 | 0.077 |
bd:slice | 42,764 | 0.07 |
http://dbpedia.org/sparql | 22,751 | 0.037 |
https://query.wikidata.org/sparql | 22,751 | 0.037 |
wikibase:around | 1,733 | 0.003 |
https://sophox.org/sparql | 628 | 0.001 |
wikibase:box | 195 | 0.0 |
mediawiki:categoryTree | 45 | 0.0 |
Triples
The query type analysis in the Query types section gives us a good idea of what kind of queries the human subgraph receives. Looking at the triples themselves also helps us peek into what most of the queries look like and what the most common subjects, objects, and properties are. The table below lists these along with the top Wikidata items and properties used overall. From the numbers, it seems the top items are probably part of the same queries.
Paths
Paths are more complex predicates that chain properties with logic. Complex paths can increase the scope of a query and also increase its runtime. The table below lists the most used paths in human subgraph queries. While most paths are not very complex or long, there is a lot of variety in the ways paths are formed to perform queries. Ordinary properties are not considered paths. The following list contains not only the paths, but also their breakdown into component paths (as done by Jena ARQ while parsing SPARQL queries). For instance, (p:P31/ps:P31)/(wdt:P279)* is recorded as:
(p:P31/ps:P31)/(wdt:P279)*
(p:P31/ps:P31)
p:P31
ps:P31
(wdt:P279)*
wdt:P279
The unit forms, wdt:P279 for example, were removed from the path list since they are part of other paths and not paths themselves. Other entries that were obviously part of a longer path, and not paths themselves, were also removed from the list for better visualization of the distinct paths used in the queries.
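The breakdown of a path into its component paths can be illustrated with a small string-based sketch. This is not Jena ARQ and not the code used for this analysis; it is a simplified stand-in that assumes paths use only prefixed names (no full IRIs in angle brackets), the operators /, |, ^, *, + and ?, and balanced parentheses. All function names are hypothetical.

```python
def strip_outer_parens(path):
    """Remove parentheses that wrap the whole expression, e.g. '(a/b)' -> 'a/b'."""
    while path.startswith("(") and path.endswith(")"):
        depth = 0
        for i, ch in enumerate(path):
            depth += (ch == "(") - (ch == ")")
            if depth == 0 and i < len(path) - 1:
                return path  # the first '(' closes early, as in '(a)/(b)'
        path = path[1:-1]
    return path


def split_top_level(path, sep):
    """Split on `sep` only where it is not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(path):
        depth += (ch == "(") - (ch == ")")
        if ch == sep and depth == 0:
            parts.append(path[start:i])
            start = i + 1
    parts.append(path[start:])
    return parts


def decompose(path):
    """Return the path followed by all of its component paths, depth first."""
    path = path.strip()
    body = strip_outer_parens(path)
    children = []
    for sep in "|/":  # alternatives bind more loosely than sequences
        parts = split_top_level(body, sep)
        if len(parts) > 1:
            children = parts
            break
    else:
        if body[-1] in "*+?":  # postfix modifier
            children = [strip_outer_parens(body[:-1])]
        elif body[0] == "^":  # inverse path
            children = [strip_outer_parens(body[1:])]
    found = [path]
    for child in children:
        found.extend(decompose(child))
    return found


def is_unit(path):
    """True for a plain property such as 'wdt:P279', which is not a path."""
    return not any(ch in path for ch in "/|*+?^()")


components = decompose("(p:P31/ps:P31)/(wdt:P279)*")
paths_only = [p for p in components if not is_unit(p)]
print(components)
print(paths_only)
```

For the example above, `components` reproduces the six recorded entries, and `paths_only` drops the unit forms. The additional manual removal of entries that are obviously part of a longer path is not covered by this sketch.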
Path | Query count | % Query count in human subgraph |
---|---|---|
p:P570/psv:P570 | 13,867,481 | 22.783 |
p:P569/psv:P569 | 13,863,357 | 22.77 |
p:P625/psv:P625 | 13,810,408 | 22.689 |
p:P577/psv:P577 | 13,810,371 | 22.689 |
p:P571/psv:P571 | 13,810,310 | 22.689 |
p:P576/psv:P576 | 13,810,242 | 22.689 |
p:P582/psv:P582 | 13,810,146 | 22.688 |
psv:P2046/<http://wikiba.se/ontology#quantityUnit> | 13,810,142 | 22.688 |
p:P580/psv:P580 | 13,810,142 | 22.688 |
psv:P281/<http://wikiba.se/ontology#quantityUnit> | 13,810,142 | 22.688 |
p:P619/psv:P619 | 13,810,138 | 22.688 |
p:P620/psv:P620 | 13,810,138 | 22.688 |
wdt:P31/(wdt:P279)* | 1,148,817 | 1.887 |
wdt:P31|wdt:P279 | 1,040,140 | 1.709 |
p:P169|p:P488 | 704,491 | 1.157 |
ps:P169|ps:P488 | 704,491 | 1.157 |
p:P2572/ps:P2572 | 501,987 | 0.825 |
((((((((((((wdt:P17|wdt:P101)|wdt:P112)|wdt:P135)|wdt:P136)|wdt:P279)|wdt:P361)|wdt:P460)|wdt:P793)|wdt:P800)|wdt:P1269)|wdt:P1344)|wdt:P1830)|(p:P2572/ps:P2572) | 501,987 | 0.825 |
ps:P106/(wdt:P279)* | 429,856 | 0.706 |
ps:P31/(wdt:P279)* | 429,284 | 0.705 |
wdt:P106/(wdt:P279)* | 251,007 | 0.412 |
p:P569/ps:P569 p:P570/ps:P570 | 246,947 | 0.406 |
p:P569/ps:P569 p:P570/ps:P570 | 202,546 | 0.333 |
wdt:P50|wdt:P2093 | 197,426 | 0.324 |
Taxon subgraph (Q16521) query analysis
The following analysis was done with query data from November 2021.
The queries that were estimated to be related to the taxon subgraph accounted for 14.26% of all queries in Wikidata. 13.57% of queries used only the taxon subgraph and the remaining 0.69% used a mix of the taxon subgraph and various other subgraphs. As described in #What are subgraph related queries, subgraphs are related to queries through Properties, Subject or Object URIs, Subgraph instance items, etc. Here is a breakdown for the taxon subgraph taken from #Query count and time. A query can be related to the taxon subgraph for more than one of the following reasons.
- Number of queries: 27,172,995 (14.26%)
- Percent of queries matching subgraph Qid, i.e., has Q16521: 12.19%
- Percent of queries matching instance items: 0.75%
- Percent of queries matching subject/object URIs: 12.86%
- Percent of queries matching properties: 0.87%
- Percent of queries matching literal strings: 0.43%
The percentage of queries matching subject/object URIs (12.86%) includes the Qid matches (12.19%) and the instance item matches (0.75%). This makes the Qid match almost the only reason for queries to match the taxon subgraph. Therefore we look at the top URIs that cause these matches. Looking at the distribution of these items' usage in queries also shows how narrow or wide the search space is (in this case, quite narrow, as almost all queries match the Qid itself).
Here is a detailed breakdown of what kind of match caused a query to be part of the taxon subgraph:
Instance items matched
- Total items used: 588,668
- Total queries that use these items: 1,421,708 (0.75% of all queries)
- The distribution shows there are only 3 high usage (>100k queries) items, and the rest form a long tail of low usage (<1k queries) items in the taxon subgraph.
- Note that these numbers are for queries from November 2021. The data changes from one month to another.
Properties matched
- Total properties used: 162 (Recall these are properties that occur 99% of the time in the taxon subgraph)
- Total queries that use these properties: 1,657,324 (0.87% of all queries)
- Most of these look like external IDs. Only 31 of these properties are not IDs.
- The distribution shows there is 1 property with >1M queries, 7 properties with >100K queries, 14 properties with 2-8K queries, and the rest of the properties match ~1K queries or fewer. In short, the distribution is extremely skewed by only ~10 properties.
Subject/Object URI matched
- Total URIs used: 651,945 (Recall these are URIs that occur 99% of the time in the taxon subgraph)
- Total queries that use these URIs: 24,510,707 (12.86% of all queries)
- The top URI is in fact the Qid of the taxon subgraph, Q16521, which matches 12.19% of all queries. We look into the queries directly later in this section.
- We analyze the top 100K URIs. Of these, 66% are Wikidata items and 31% are Wikipedia links.
- The distribution shows that the top 4 URIs occur in queries orders of magnitude more often than the other URIs. Of course, this data is only for November 2021, but the high usage of the taxon Qid was also observed in the October 2021 data.
The Wikipedia links are from 80 different languages. The tables below show some of the top languages in terms of unique links queried, along with the top 5 links for each of these languages.
Query time
- The total query time of the taxon subgraph is ~3% of total query time, and its query count is 14.26% of all queries.
- Average time per query is 0.064 seconds (64 ms). Almost all queries in this subgraph are small and simple.
- The query time distribution is shown in the chart below: in absolute counts, as a percentage of queries in the taxon subgraph, and as a percentage of all queries.
User agent
The top user agents that query the taxon subgraph are listed below. This helps us view the distribution of usage: whether a few user agents dominate or usage is well distributed across user agents. The top 10 user agents in terms of query count and in terms of query time are shown in the table below. The # query type column is discussed later in this section.
User agent | Query count | % query in taxon subgraph | % query overall | Query time(hr) | % query time in taxon subgraph | % query time overall | # query type |
---|---|---|---|---|---|---|---|
mix-n-match | 22959293 | 84.493 | 12.048 | 163 | 33.949 | 1.063 | 5 |
Hub | 1318251 | 4.851 | 0.692 | 17 | 3.455 | 0.108 | 1 |
WikidataQueryServiceR | 568799 | 2.093 | 0.298 | 9 | 1.837 | 0.058 | 34 |
UA#4 | 325563 | 1.198 | 0.171 | 10 | 2.044 | 0.064 | 5 |
UA#5 | 265565 | 0.977 | 0.139 | 2 | 0.495 | 0.015 | 5 |
UA#6 | 199650 | 0.735 | 0.105 | 53 | 11.133 | 0.349 | 24 |
UA#7 | 168536 | 0.62 | 0.088 | 2 | 0.441 | 0.014 | 2 |
sparqlwrapper | 161781 | 0.595 | 0.085 | 126 | 26.257 | 0.822 | 33 |
UA#9 | 107736 | 0.396 | 0.057 | 3 | 0.627 | 0.02 | 1 |
EasyContent | 103292 | 0.38 | 0.054 | 2 | 0.346 | 0.011 | 1 |
UA#11 | 1330 | 0.005 | 0.001 | 12 | 2.481 | 0.078 | 8 |
UA#12 | 45065 | 0.166 | 0.024 | 8 | 1.644 | 0.051 | 9 |
AhrefsBot | 56580 | 0.208 | 0.03 | 8 | 1.575 | 0.049 | 12 |
Apache-Jena-ARQ | 6654 | 0.024 | 0.003 | 6 | 1.27 | 0.04 | 11 |
The query time breakdown was plotted for the top 20 user agents (in terms of time).
File:Taxon ua query class percent.png File:Taxon ua query class percent log.png
Query types
Queries are grouped into types by the operations a query uses and the order in which they are used. This groups similar queries together even when they seek different information, and also separates simple queries from complicated ones. The taxon subgraph has ~1,100 different types of queries (far less variety than the ~11,500 query types in the human subgraph). Note that some query groups can be very similar in what they ask for, although most groups differ a lot. The top query groups are listed below. The top 3 types of queries alone account for ~90% of the queries in the taxon subgraph, and the top 40 account for 99%. The rest form a long tail of small query counts.
Query group (operation list) | # of queries | % of queries in taxon subgraph |
---|---|---|
['path', 'table', 'bgp', 'join', 'bgp', 'union', 'join', 'project'] | 13,013,162 | 47.89 |
['path', 'table', 'bgp', 'join', 'bgp', 'union', 'bgp', 'union', 'join', 'project'] | 9,943,644 | 36.594 |
['bgp', 'project', 'distinct', 'slice'] | 1,318,254 | 4.851 |
['table', 'bgp', 'leftjoin', 'bgp', 'join', 'filter', 'project'] | 236,468 | 0.87 |
['bgp', 'project'] | 230,329 | 0.848 |
['path', 'bgp', 'path', 'sequence', 'table', 'join', 'filter', 'order', 'project', 'distinct'] | 199,441 | 0.734 |
['bgp', 'bgp', 'service', 'join', 'project', 'slice'] | 169,045 | 0.622 |
['table', 'extend', 'extend', 'bgp', 'join', 'project'] | 152,885 | 0.563 |
['table', 'path', 'bgp', 'sequence', 'path', 'bgp', 'sequence', 'path', 'bgp', 'sequence', 'leftjoin', 'leftjoin', 'leftjoin', 'bgp', 'leftjoin', 'bgp', 'leftjoin', 'bgp', 'leftjoin', 'bgp', 'leftjoin', 'bgp', 'leftjoin', 'bgp', 'service', 'join', 'project', 'distinct'] | 152,353 | 0.561 |
['bgp', 'extend', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'path', 'extend', 'union', 'path', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'bgp', 'extend', 'union', 'path', 'extend', 'union', 'bgp', 'extend', 'union', 'path', 'extend', 'union', 'bgp', 'extend', 'union', 'path', 'extend', 'union', 'project', 'distinct'] | 139,177 | 0.512 |
Looking at the top 5 query groups (that make up >90% of all taxon subgraph queries), the following query types were found:
- Search with taxon name, synonyms, altlabels etc
- Search with taxon ID. E.g.
SELECT DISTINCT ?subject WHERE { ?subject wdt:P3151 '47126' .}
- Get labels of certain items
- Get labels of certain items in specific languages
- Get external IDs of items
UA vs query types
Getting the number of query types per user agent informs us of the variety of queries a user agent makes to WDQS. This also breaks down the taxon subgraph queries into finer groups. The following plot shows the number of query types for each user agent in the taxon subgraph.
This shows us that most user agents make only 1 type of query. Only 3 user agents make queries of >100 types, and 5 user agents make queries of 50-100 types. The figure below shows the number of queries per query type for the top 8 user agents.
The number of query types for the top user agents in terms of query count and time is listed in the table in the taxon User agent section.
File:Taxon query vs query type 8ua.png
Query type vs time class
There are close to 1,100 query types, and the top 3 types alone account for ~90% of the queries in the taxon subgraph (12.8% of all queries). Not all of them are equally time consuming: some are simple queries, while others are long and complex. The following plot shows the top 10 query types with query time classes. The values above the bars show both the percentage in the taxon subgraph and the overall query percentage. The subplots are titled with the percentage of queries in that query type.
In sum, most queries in the taxon subgraph are small and simple, take between 10 and 100 ms to run, and come mostly from 1 or 2 user agents.
File:Taxon top 10 qtype qtime.png
Services
The queries use 12 unique services. The top 4 services are the most used, although their usage is still quite low; the rest are used in fewer than 30 queries.
Service | Query count | % query in taxon subgraph |
---|---|---|
wikibase:label | 853,272 | 3.14 |
wikibase:mwapi | 8,198 | 0.03 |
gas:service | 1,206 | 0.004 |
bd:sample | 512 | 0.002 |
https://query.wikidata.org/bigdata/namespace/wdq/sparql | 28 | 0.0 |
idsm:wikidata | 28 | 0.0 |
https://query.wikidata.org/sparql | 14 | 0.0 |
http://sparql.wikipathways.org/sparql | 13 | 0.0 |
wikibase:around | 3 | 0.0 |
https://sparql.wikipathways.org/sparql | 2 | 0.0 |
https://sophox.org/sparql | 2 | 0.0 |
https://spang.dbcls.jp/sparql | 1 | 0.0 |
Triples
The query type analysis in the Query types section gives us a good idea of what kind of queries the taxon subgraph receives. Looking at the triples themselves also helps us peek into what most of the queries look like and what the most common subjects, objects, and properties are. The table below lists these along with the top Wikidata items and properties used overall.
Subject | Predicate | Object | # query | % query of taxon subgraph |
---|---|---|---|---|
q | wdt:P31/(wdt:P279)* | wd:Q16521 | 22,956,806 | 84.484* |
bd:serviceParam | wikibase:language | en | 379,833 | 1.398 |
item | wdt:P225 | taxonName | 265,426 | 0.977 |
item | p:P105 | taxonRank1 | 265,360 | 0.977 |
taxonRank1 | ps:P105 | taxonRank | 265,360 | 0.977 |
item | rdfs:label | label | 241,505 | 0.889 |
item | skos:altLabel | altLabel | 236,495 | 0.870 |
bd:serviceParam | wikibase:language | [AUTO_LANGUAGE],en | 205,404 | 0.756 |
items | rdfs:label | itemlabel | 199,441 | 0.734 |
items | (wdt:P279)? | types | 195,916 | 0.721 |
*Coincides with the number of queries from mix-n-match. Almost all mix-n-match queries in the taxon subgraph have this triple, and this triple occurs only in mix-n-match queries.
Paths
Paths are more complex predicates that chain properties with logic. Complex paths can increase the scope of a query and also increase its runtime. The table below lists the most used paths in taxon subgraph queries. While most paths are not very complex or long, there is a lot of variety in the ways paths are formed to perform queries. Ordinary properties are not considered paths. The following list contains not only the paths, but also their breakdown into component paths (as done by Jena ARQ while parsing SPARQL queries). For instance, (p:P31/ps:P31)/(wdt:P279)* is recorded as:
(p:P31/ps:P31)/(wdt:P279)*
(p:P31/ps:P31)
p:P31
ps:P31
(wdt:P279)*
wdt:P279
The unit forms, wdt:P279 for example, were removed from the path list since they are part of other paths and not paths themselves. Other entries that were obviously part of a longer path, and not paths themselves, were also removed from the list for better visualization of the distinct paths used in the queries.
Path | Query count | % Query in taxon subgraph |
---|---|---|
wdt:P31/(wdt:P279)* | 23,035,627 | 84.774 |
((wdt:P31)*/(wdt:P279)*)/(wd:P361)* | 195,916 | 0.721 |
(wdt:P171)+ | 168,128 | 0.619 |
wdt:P1416|wdt:P108 | 139,191 | 0.512 |
(wdt:P159)?/wdt:P625 | 139,191 | 0.512 |
wdt:P50|wdt:P2093 | 139,191 | 0.512 |
^wdt:P31/wdt:P235 | 139,191 | 0.512 |
wdt:P31/(wdt:P279)? | 139,184 | 0.512 |
(((((((((((((((((((((((((wdt:P171|(wdt:P171/wdt:P171))|((wdt:P171/wdt:P171)/wdt:P171))|(((wdt:P171/wdt:P171)/wdt:P171)/wdt:P171))|(wdt:P171/wdt:P171*4))|(wdt:P171/wdt:P171*5))|(wdt:P171/wdt:P171*6))|(wdt:P171/wdt:P171*7))|(wdt:P171/wdt:P171*8))|(wdt:P171/wdt:P171*9))|(wdt:P171/wdt:P171*10))|(wdt:P171/wdt:P171*11))|(wdt:P171/wdt:P171*12))|(wdt:P171/wdt:P171*13))|(wdt:P171/wdt:P171*14))|(wdt:P171/wdt:P171*15))|(wdt:P171/wdt:P171*16))|(wdt:P171/wdt:P171*17))|(wdt:P171/wdt:P171*18))|(wdt:P171/wdt:P171*19))|(wdt:P171/wdt:P171*20))|(wdt:P171/wdt:P171*21))|(wdt:P171/wdt:P171*22))|(wdt:P171/wdt:P171*23))|(wdt:P171/wdt:P171*24))|(wdt:P171/wdt:P171*25)) | 112,475 | 0.414 |
wdt:P31|wdt:P279 | 103,573 | 0.381 |
(wdt:P1647)* | 11,178 | 0.041 |
(((((((((((((((wdt:P17|wdt:P101)|wdt:P112)|wdt:P135)|wdt:P136)|wdt:P279)|wdt:P361)|wdt:P460)|wdt:P793)|wdt:P800)|wdt:P1269)|wdt:P1344)|wdt:P1830)|(p:P2572/ps:P2572))|wdt:P3342)|wdt:P3602)|wdt:P5004 | 11,002 | 0.04 |