User:AKhatun/Wikidata Subgraph Analysis: Difference between revisions
imported>AKhatun (Form article structure) |
imported>AKhatun (Add subgraph info table) |
||
Line 1: | Line 1: | ||
= TL;DR = | = TL;DR = | ||
= What are subgraphs? = | = What are subgraphs? = | ||
= Subgraph sizes (item, subgraph, days to | Wikidata contains all kinds of data from various aspects of knowledge. All of these data are highly inter-connected, but we can find some patterns. We find subgraphs within Wikidata and find out how large these subgraphs are, how connected they are, and finally how much these subgraphs are used (queried). | ||
== | In order to find subgraphs, the following steps were taken: | ||
* Consider all items that are <code>instance of (P31)</code> the same item to be part of one subgraph. For example: all items that are <code>instance of</code> [[wikidata:wiki/Q13442814|Q13442814]] form one subgraph. | |||
* Some subgraphs were merged where it was obvious. For example: all subclasses of [[wikidata:wiki/Q6999|astronomical object]] were considered part of [[wikidata:wiki/Q6999|astronomical object]], as they were all indeed some sort of astronomical object. This method of subclass merging cannot be applied everywhere without manual inspection. | |||
* Some large subgraphs were almost completely part of another subgraph. For example: all items under [[wikidata:wiki/Q56476470|Review Articles]] are also instances of [[wikidata:wiki/Q13442814|scholarly article]]. In such cases, review articles were not considered a separate subgraph. | |||
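The grouping and merging steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the item IDs such as <code>Q_paper_1</code>, the class <code>Q_star_class</code>, and the <code>MERGE_INTO</code> map are hypothetical placeholders, and the real analysis ran over the full Wikidata dump.

```python
from collections import defaultdict

# Hypothetical toy data: (item, class) pairs standing in for P31 "instance of" triples.
INSTANCE_OF = [
    ("Q_paper_1", "Q13442814"),      # scholarly article
    ("Q_paper_2", "Q13442814"),
    ("Q_paper_2", "Q56476470"),      # the same item is also a review article
    ("Q_star_1", "Q_star_class"),    # hypothetical subclass of astronomical object
    ("Q_planet_1", "Q6999"),         # astronomical object
]

# Hypothetical, manually curated map: subgraph -> subgraph it is folded into.
MERGE_INTO = {
    "Q_star_class": "Q6999",     # subclass merged into its parent
    "Q56476470": "Q13442814",    # review articles are not kept as a separate subgraph
}


def build_subgraphs(instance_of, merge_into):
    """Group items by their P31 value, then fold merged classes into their target."""
    subgraphs = defaultdict(set)
    for item, cls in instance_of:
        root = merge_into.get(cls, cls)  # unmerged classes map to themselves
        subgraphs[root].add(item)
    return dict(subgraphs)


subgraphs = build_subgraphs(INSTANCE_OF, MERGE_INTO)
for root, items in sorted(subgraphs.items()):
    print(root, len(items))
```

Using sets means an item that reaches the same subgraph through two classes (such as a review article that is also a scholarly article) is counted only once.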
= Subgraph sizes = | |||
Using only <code>instance of</code>, Wikidata has <code>82,919</code> subgraphs. The distribution of the sizes of these subgraphs has a clear long tail, with very few subgraphs incorporating most items in Wikidata. Subgraph size can be calculated in two ways: | |||
* The number of items it contains | |||
* The number of triples related to the items in a subgraph. This is what we refer to as '''subgraph size''' from here on. | |||
Takeaways: | |||
* Most calculations from here on use the top 50 subgraphs, which form 85% of Wikidata. | |||
* The top 340 subgraphs (0.5% of all subgraphs, after merging some) form 90% of Wikidata (91% of all items and 90% of all triples). These subgraphs have '''>=10,000''' items each. | |||
* The remaining 99.5% of subgraphs have <10,000 items each, and together form 10% of Wikidata. | |||
'''Below is the distribution of the number of items in a subgraph.''' | |||
{| | |||
| [[File:number_of_groups_vs_number_of_items.png|550px]] | |||
| [[File:number_of_groups_vs_number_of_items_log.png|550px]] | |||
|} | |||
To be more specific, | |||
{| class="wikitable" | |||
|+ Subgraph item distribution | |||
|- | |||
! !! Number of subgraphs !! !! Number of items !! | |||
|- | |||
| rowspan="8" | There are | |||
| 54,602 | |||
| rowspan="8" | subgraph(s) with more than | |||
| 1 | |||
| rowspan="8" | item(s) | |||
|- | |||
| 23,724 || 10 | |||
|- | |||
| 6,625 || 100 | |||
|- | |||
| 1,712 || 1,000 | |||
|- | |||
| 392 || 10,000 | |||
|- | |||
| 63 || 100,000 | |||
|- | |||
| 10 || 1,000,000 | |||
|- | |||
| 1 || 10,000,000 | |||
|} | |||
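A table like the one above can be derived from a list of per-subgraph item counts by counting how many subgraphs exceed each threshold. The sketch below uses a short hypothetical list of sizes, not the real Wikidata counts, and the function name <code>count_above</code> is made up for illustration.

```python
def count_above(sizes, thresholds):
    """For each threshold, count the subgraphs with strictly more items than it."""
    return {t: sum(1 for s in sizes if s > t) for t in thresholds}


# Hypothetical item counts for seven subgraphs (not real Wikidata numbers).
sizes = [3, 15, 150, 2_000, 12, 1, 50_000]
thresholds = [1, 10, 100, 1_000, 10_000]

counts = count_above(sizes, thresholds)
for threshold, n in counts.items():
    print(f"{n} subgraph(s) with more than {threshold:,} item(s)")
```

Because every threshold is applied to the full list, the counts are cumulative: a subgraph with 50,000 items is counted in every row.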
'''Below is the subgraph size comparison of the top 340 subgraphs in Wikidata (90%).''' | |||
[[File: subgraph_distribution_triples.png]] | |||
[[File: subgraph_distribution_percents.png]] | |||
'''Below is the subgraph size comparison of the top 50 subgraphs in Wikidata (85%).''' | |||
[[File: top_50_subgraph_distribution_triples.png]] | |||
[[File: top_50_subgraph_distribution_percents.png]] | |||
Here is an interactive graph showing the comparison of subgraph sizes in terms of item count and triple count: [https://tanny411.github.io/Wikidata-WDQS-Analysis/subgraph_stats.html subgraph stats]. | |||
Here are some subgraph size visualizations in WDQS: | |||
* Size as percentage of Wikidata each subgraph occupies: [https://query.wikidata.org/#%23defaultView%3ABubbleChart%0ASELECT%20%3Fsubgraph%20%3FsubgraphLabel%20%3Ftriple_percentage%0AWHERE%20%7B%0A%20%20%20%20VALUES%20%28%3Fsubgraph%20%3Ftriple_percentage%29%20%7B%20%28wd%3AQ13442814%2049.557%29%20%28wd%3AQ6999%208.68%29%20%28wd%3AQ5%207.272%29%20%28wd%3AQ4167836%205.778%29%20%28wd%3AQ16521%202.839%29%20%28wd%3AQ101352%201.448%29%20%28wd%3AQ4167410%201.366%29%20%28wd%3AQ7187%200.927%29%20%28wd%3AQ11266439%200.864%29%20%28wd%3AQ11173%200.692%29%20%28wd%3AQ8054%200.67%29%20%28wd%3AQ3305213%200.432%29%20%28wd%3AQ13100073%200.391%29%20%28wd%3AQ11424%200.358%29%20%28wd%3AQ486972%200.301%29%20%28wd%3AQ29654788%200.297%29%20%28wd%3AQ815382%200.269%29%20%28wd%3AQ13406463%200.256%29%20%28wd%3AQ13433827%200.253%29%20%28wd%3AQ8502%200.252%29%20%28wd%3AQ2668072%200.248%29%20%28wd%3AQ79007%200.23%29%20%28wd%3AQ4022%200.219%29%20%28wd%3AQ30612%200.21%29%20%28wd%3AQ532%200.201%29%20%28wd%3AQ484170%200.179%29%20%28wd%3AQ17633526%200.165%29%20%28wd%3AQ482994%200.161%29%20%28wd%3AQ23397%200.138%29%20%28wd%3AQ54050%200.13%29%20%28wd%3AQ16970%200.128%29%20%28wd%3AQ41176%200.124%29%20%28wd%3AQ56436498%200.116%29%20%28wd%3AQ4830453%200.107%29%20%28wd%3AQ47150325%200.107%29%20%28wd%3AQ3947%200.098%29%20%28wd%3AQ1348305%200.086%29%20%28wd%3AQ3331189%200.084%29%20%28wd%3AQ18593264%200.081%29%20%28wd%3AQ27020041%200.081%29%20%28wd%3AQ22808320%200.078%29%20%28wd%3AQ7725634%200.076%29%20%28wd%3AQ355304%200.076%29%20%28wd%3AQ23442%200.075%29%20%28wd%3AQ11060274%200.075%29%20%28wd%3AQ55488%200.074%29%20%28wd%3AQ12308941%200.072%29%20%28wd%3AQ277338%200.071%29%20%28wd%3AQ2225692%200.07%29%20%28wd%3AQ5633421%200.069%29%20%28wd%3AQ5084%200.068%29%20%28wd%3AQ9842%200.068%29%20%28wd%3AQ134556%200.065%29%20%28wd%3AQ19389637%200.063%29%20%28wd%3AQ215380%200.062%29%20%28wd%3AQ93184%200.061%29%20%28wd%3AQ21014462%200.06%29%20%28wd%3AQ452237%200.06%29%20%28wd%3AQ23038290%200.059%29%20%28wd%3AQ11753321%200.
058%29%20%28wd%3AQ3558970%200.056%29%20%28wd%3AQ811979%200.055%29%20%28wd%3AQ7889%200.055%29%20%28wd%3AQ5398426%200.055%29%20%28wd%3AQ473972%200.052%29%20%28wd%3AQ1260524%200.051%29%20%28wd%3AQ47521%200.05%29%20%28wd%3AQ427087%200.05%29%20%28wd%3AQ7604686%200.05%29%20%28wd%3AQ732577%200.048%29%20%28wd%3AQ86850539%200.048%29%20%28wd%3AQ105543609%200.047%29%20%28wd%3AQ57733494%200.046%29%20%28wd%3AQ59199015%200.045%29%20%28wd%3AQ21191270%200.045%29%20%28wd%3AQ59542487%200.044%29%20%28wd%3AQ96739634%200.043%29%20%28wd%3AQ11879590%200.043%29%20%28wd%3AQ1504425%200.043%29%20%28wd%3AQ34442%200.041%29%20%28wd%3AQ61443690%200.04%29%20%28wd%3AQ253019%200.04%29%20%28wd%3AQ125191%200.039%29%20%28wd%3AQ49008%200.039%29%20%28wd%3AQ39816%200.038%29%20%28wd%3AQ47461344%200.038%29%20%28wd%3AQ22698%200.038%29%20%28wd%3AQ11446%200.037%29%20%28wd%3AQ23894233%200.037%29%20%28wd%3AQ43229%200.036%29%20%28wd%3AQ4164871%200.036%29%20%28wd%3AQ6881511%200.036%29%20%28wd%3AQ191067%200.035%29%20%28wd%3AQ12323%200.033%29%20%28wd%3AQ985488%200.033%29%20%28wd%3AQ2065736%200.033%29%20%28wd%3AQ735428%200.031%29%20%28wd%3AQ67383935%200.031%29%20%28wd%3AQ5185279%200.03%29%20%28wd%3AQ21672098%200.03%29%20%28wd%3AQ1248784%200.03%29%20%28wd%3AQ61089180%200.029%29%20%28wd%3AQ124714%200.029%29%20%28wd%3AQ59541917%200.029%29%20%28wd%3AQ1002697%200.029%29%20%28wd%3AQ19855165%200.028%29%20%28wd%3AQ860861%200.028%29%20%28wd%3AQ55659167%200.028%29%20%28wd%3AQ33506%200.028%29%20%28wd%3AQ26211545%200.027%29%20%28wd%3AQ17343829%200.026%29%20%28wd%3AQ476028%200.026%29%20%28wd%3AQ15632617%200.025%29%20%28wd%3AQ2341654%200.025%29%20%28wd%3AQ24862%200.025%29%20%28wd%3AQ108325%200.024%29%20%28wd%3AQ3257686%200.024%29%20%28wd%3AQ839954%200.024%29%20%28wd%3AQ41253%200.023%29%20%28wd%3AQ3957%200.023%29%20%28wd%3AQ24046192%200.023%29%20%28wd%3AQ737498%200.022%29%20%28wd%3AQ22969563%200.022%29%20%28wd%3AQ99045339%200.022%29%20%28wd%3AQ179700%200.021%29%20%28wd%3AQ27555384%200.021%29%20%28wd%3AQ28564%200.021%29%20%28wd%3AQ1
2284%200.02%29%20%28wd%3AQ187971%200.02%29%20%28wd%3AQ820655%200.02%29%20%28wd%3AQ506240%200.02%29%20%28wd%3AQ2996394%200.02%29%20%28wd%3AQ100532807%200.019%29%20%28wd%3AQ39614%200.019%29%20%28wd%3AQ18340514%200.019%29%20%28wd%3AQ1115575%200.019%29%20%28wd%3AQ15416%200.019%29%20%28wd%3AQ2151232%200.019%29%20%28wd%3AQ820477%200.018%29%20%28wd%3AQ489357%200.018%29%20%28wd%3AQ18918145%200.018%29%20%28wd%3AQ34023%200.018%29%20%28wd%3AQ11032%200.017%29%20%28wd%3AQ5783996%200.017%29%20%28wd%3AQ9826%200.017%29%20%28wd%3AQ21199%200.016%29%20%28wd%3AQ3914%200.016%29%20%28wd%3AQ4989906%200.016%29%20%28wd%3AQ1516079%200.016%29%20%28wd%3AQ163740%200.016%29%20%28wd%3AQ1134686%200.016%29%20%28wd%3AQ15647814%200.016%29%20%28wd%3AQ3918%200.016%29%20%28wd%3AQ928830%200.016%29%20%28wd%3AQ56428020%200.016%29%20%28wd%3AQ220659%200.016%29%20%28wd%3AQ1580166%200.016%29%20%28wd%3AQ24529780%200.016%29%20%28wd%3AQ187685%200.016%29%20%28wd%3AQ192287%200.016%29%20%28wd%3AQ23058136%200.015%29%20%28wd%3AQ39594%200.015%29%20%28wd%3AQ16917%200.015%29%20%28wd%3AQ891723%200.015%29%20%28wd%3AQ585956%200.015%29%20%28wd%3AQ634099%200.015%29%20%28wd%3AQ838948%200.014%29%20%28wd%3AQ773668%200.014%29%20%28wd%3AQ104093746%200.014%29%20%28wd%3AQ14350%200.014%29%20%28wd%3AQ126807%200.014%29%20%28wd%3AQ23925393%200.014%29%20%28wd%3AQ15303838%200.014%29%20%28wd%3AQ41298%200.014%29%20%28wd%3AQ123705%200.013%29%20%28wd%3AQ23413%200.013%29%20%28wd%3AQ34770%200.013%29%20%28wd%3AQ16466010%200.013%29%20%28wd%3AQ46831%200.013%29%20%28wd%3AQ16510064%200.013%29%20%28wd%3AQ26887310%200.013%29%20%28wd%3AQ12280%200.013%29%20%28wd%3AQ46190676%200.013%29%20%28wd%3AQ2514025%200.012%29%20%28wd%3AQ34763%200.012%29%20%28wd%3AQ1529096%200.012%29%20%28wd%3AQ7278%200.012%29%20%28wd%3AQ97695005%200.011%29%20%28wd%3AQ310890%200.011%29%20%28wd%3AQ62447%200.011%29%20%28wd%3AQ7075%200.011%29%20%28wd%3AQ3464665%200.011%29%20%28wd%3AQ10870555%200.011%29%20%28wd%3AQ4421%200.011%29%20%28wd%3AQ3052382%200.011%29%20%28wd%3AQ189004%200.011%2
9%20%28wd%3AQ852190%200.011%29%20%28wd%3AQ10429085%200.01%29%20%28wd%3AQ5358913%200.01%29%20%28wd%3AQ7777570%200.01%29%20%28wd%3AQ207326%200.01%29%20%28wd%3AQ212198%200.01%29%20%28wd%3AQ121117%200.01%29%20%28wd%3AQ15184295%200.01%29%20%28wd%3AQ131681%200.01%29%20%28wd%3AQ751876%200.01%29%20%28wd%3AQ65954115%200.01%29%20%28wd%3AQ47345468%200.01%29%20%28wd%3AQ18663566%200.01%29%20%28wd%3AQ159334%200.01%29%20%28wd%3AQ1195098%200.01%29%20%28wd%3AQ19860854%200.009%29%20%28wd%3AQ847017%200.009%29%20%28wd%3AQ210272%200.009%29%20%28wd%3AQ179049%200.009%29%20%28wd%3AQ133056%200.009%29%20%28wd%3AQ26267864%200.009%29%20%28wd%3AQ6453643%200.009%29%20%28wd%3AQ30198%200.009%29%20%28wd%3AQ202444%200.009%29%20%28wd%3AQ965568%200.009%29%20%28wd%3AQ1302249%200.009%29%20%28wd%3AQ16887380%200.009%29%20%28wd%3AQ1681353%200.009%29%20%28wd%3AQ66826848%200.009%29%20%28wd%3AQ67015940%200.009%29%20%28wd%3AQ571%200.009%29%20%28wd%3AQ106006703%200.009%29%20%28wd%3AQ5707594%200.009%29%20%28wd%3AQ7604693%200.009%29%20%28wd%3AQ178561%200.009%29%20%28wd%3AQ1303167%200.009%29%20%28wd%3AQ728937%200.009%29%20%28wd%3AQ1505023%200.009%29%20%28wd%3AQ26703203%200.009%29%20%28wd%3AQ185113%200.009%29%20%28wd%3AQ3231690%200.009%29%20%28wd%3AQ24354%200.008%29%20%28wd%3AQ3950%200.008%29%20%28wd%3AQ2042028%200.008%29%20%28wd%3AQ174782%200.008%29%20%28wd%3AQ16560%200.008%29%20%28wd%3AQ14204246%200.008%29%20%28wd%3AQ2593777%200.008%29%20%28wd%3AQ155076%200.008%29%20%28wd%3AQ740445%200.008%29%20%28wd%3AQ12317349%200.008%29%20%28wd%3AQ176799%200.008%29%20%28wd%3AQ15773317%200.008%29%20%28wd%3AQ327333%200.008%29%20%28wd%3AQ61443650%200.008%29%20%28wd%3AQ42195%200.008%29%20%28wd%3AQ1577547%200.008%29%20%28wd%3AQ27686%200.008%29%20%28wd%3AQ19692072%200.007%29%20%28wd%3AQ2334719%200.007%29%20%28wd%3AQ2116450%200.007%29%20%28wd%3AQ44613%200.007%29%20%28wd%3AQ65661087%200.007%29%20%28wd%3AQ2309609%200.007%29%20%28wd%3AQ1497375%200.007%29%20%28wd%3AQ1266946%200.007%29%20%28wd%3AQ2732840%200.007%29%20%28wd%3AQ107102664%20
0.007%29%20%28wd%3AQ2085381%200.007%29%20%28wd%3AQ575759%200.007%29%20%28wd%3AQ26895936%200.007%29%20%28wd%3AQ35127%200.007%29%20%28wd%3AQ417841%200.007%29%20%28wd%3AQ55850593%200.007%29%20%28wd%3AQ55659107%200.007%29%20%28wd%3AQ40080%200.007%29%20%28wd%3AQ4414033%200.007%29%20%28wd%3AQ618779%200.007%29%20%28wd%3AQ40231%200.007%29%20%28wd%3AQ102496%200.007%29%20%28wd%3AQ17018380%200.007%29%20%28wd%3AQ7321974%200.007%29%20%28wd%3AQ51591359%200.007%29%20%28wd%3AQ29023906%200.007%29%20%28wd%3AQ169930%200.007%29%20%28wd%3AQ21198342%200.007%29%20%28wd%3AQ166735%200.007%29%20%28wd%3AQ15221623%200.006%29%20%28wd%3AQ207524%200.006%29%20%28wd%3AQ420927%200.006%29%20%28wd%3AQ31855%200.006%29%20%28wd%3AQ21573182%200.006%29%20%28wd%3AQ35509%200.006%29%20%28wd%3AQ1520033%200.006%29%20%28wd%3AQ43460564%200.006%29%20%28wd%3AQ15243209%200.006%29%20%28wd%3AQ56731284%200.006%29%20%28wd%3AQ11755880%200.006%29%20%28wd%3AQ106474968%200.006%29%20%28wd%3AQ18761202%200.006%29%20%28wd%3AQ107103143%200.006%29%20%28wd%3AQ18127%200.006%29%20%28wd%3AQ67015883%200.006%29%20%28wd%3AQ187223%200.006%29%20%28wd%3AQ358%200.006%29%20%28wd%3AQ25379%200.006%29%20%28wd%3AQ2467461%200.006%29%20%28wd%3AQ726%200.006%29%20%28wd%3AQ7397%200.006%29%20%28wd%3AQ1081138%200.006%29%20%28wd%3AQ13406554%200.005%29%20%28wd%3AQ27671617%200.005%29%20%28wd%3AQ160091%200.005%29%20%28wd%3AQ34038%200.005%29%20%28wd%3AQ953806%200.005%29%20%28wd%3AQ67035425%200.005%29%20%28wd%3AQ4504495%200.005%29%20%28wd%3AQ26894053%200.005%29%20%28wd%3AQ783794%200.005%29%20%28wd%3AQ50231%200.005%29%20%28wd%3AQ294414%200.005%29%20%28wd%3AQ814254%200.005%29%20%28wd%3AQ842402%200.005%29%20%28wd%3AQ14860489%200.005%29%20%28wd%3AQ3508250%200.005%29%20%28wd%3AQ1539532%200.005%29%20%28wd%3AQ149566%200.004%29%20%28wd%3AQ13417114%200.004%29%20%28wd%3AQ50386450%200.004%29%20%28wd%3AQ17524420%200.004%29%20%28wd%3AQ46135307%200.004%29%20%28wd%3AQ1238720%200.004%29%20%28wd%3AQ235557%200.004%29%20%28wd%3AQ459297%200.004%29%20%28wd%3AQ20541692%200.004%29
%20%28wd%3AQ2231510%200.004%29%20%28wd%3AQ14659%200.003%29%20%28wd%3AQ61278455%200.003%29%20%28wd%3AQ81505329%200.003%29%20%28wd%3AQ6451276%200.003%29%20%28wd%3AQ482%200.003%29%20%28wd%3AQ8436%200.003%29%20%28wd%3AQ46686%200.003%29%20%28wd%3AQ6503489%200.003%29%20%28wd%3AQ898273%200.002%29%20%7D%0A%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D query link] | |||
* Size as percentage of Wikidata items each subgraph contains: [https://query.wikidata.org/#%23defaultView%3ABubbleChart%0ASELECT%20%3Fsubgraph%20%3FsubgraphLabel%20%3Fitem_percentage%0AWHERE%20%7B%0A%20%20%20%20VALUES%20%28%3Fsubgraph%20%3Fitem_percentage%29%20%7B%20%28wd%3AQ13442814%2039.752%29%20%28wd%3AQ6999%208.95%29%20%28wd%3AQ5%209.964%29%20%28wd%3AQ4167836%205.152%29%20%28wd%3AQ16521%203.424%29%20%28wd%3AQ101352%200.51%29%20%28wd%3AQ4167410%201.449%29%20%28wd%3AQ7187%201.273%29%20%28wd%3AQ11266439%200.899%29%20%28wd%3AQ11173%201.302%29%20%28wd%3AQ8054%201.05%29%20%28wd%3AQ3305213%200.577%29%20%28wd%3AQ13100073%200.626%29%20%28wd%3AQ11424%200.28%29%20%28wd%3AQ486972%200.602%29%20%28wd%3AQ29654788%200.042%29%20%28wd%3AQ815382%200.117%29%20%28wd%3AQ13406463%200.357%29%20%28wd%3AQ13433827%200.545%29%20%28wd%3AQ8502%200.559%29%20%28wd%3AQ2668072%200.534%29%20%28wd%3AQ79007%200.617%29%20%28wd%3AQ4022%200.425%29%20%28wd%3AQ30612%200.38%29%20%28wd%3AQ532%200.292%29%20%28wd%3AQ484170%200.043%29%20%28wd%3AQ17633526%200.305%29%20%28wd%3AQ482994%200.287%29%20%28wd%3AQ23397%200.277%29%20%28wd%3AQ54050%200.348%29%20%28wd%3AQ16970%200.226%29%20%28wd%3AQ41176%200.283%29%20%28wd%3AQ56436498%200.155%29%20%28wd%3AQ4830453%200.207%29%20%28wd%3AQ47150325%200.201%29%20%28wd%3AQ3947%200.216%29%20%28wd%3AQ1348305%200.085%29%20%28wd%3AQ3331189%200.169%29%20%28wd%3AQ18593264%200.157%29%20%28wd%3AQ27020041%200.169%29%20%28wd%3AQ22808320%200.075%29%20%28wd%3AQ7725634%200.176%29%20%28wd%3AQ355304%200.186%29%20%28wd%3AQ23442%200.158%29%20%28wd%3AQ11060274%200.13%29%20%28wd%3AQ55488%200.103%29%20%28wd%3AQ12308941%200.04%29%20%28wd%3AQ277338%200.061%29%20%28wd%3AQ2225692%200.089%29%20%28wd%3AQ5633421%200.103%29%20%28wd%3AQ5084%200.126%29%20%28wd%3AQ9842%200.168%29%20%28wd%3AQ134556%200.103%29%20%28wd%3AQ19389637%200.162%29%20%28wd%3AQ215380%200.087%29%20%28wd%3AQ93184%200.098%29%20%28wd%3AQ21014462%200.137%29%20%28wd%3AQ452237%200.082%29%20%28wd%3AQ23038290%200.111%29%20%28wd%3AQ11753321%2
00.048%29%20%28wd%3AQ3558970%200.06%29%20%28wd%3AQ811979%200.121%29%20%28wd%3AQ7889%200.047%29%20%28wd%3AQ5398426%200.062%29%20%28wd%3AQ473972%200.09%29%20%28wd%3AQ1260524%200.093%29%20%28wd%3AQ47521%200.133%29%20%28wd%3AQ427087%200.091%29%20%28wd%3AQ7604686%200.094%29%20%28wd%3AQ732577%200.076%29%20%28wd%3AQ86850539%200.011%29%20%28wd%3AQ105543609%200.098%29%20%28wd%3AQ57733494%200.065%29%20%28wd%3AQ59199015%200.119%29%20%28wd%3AQ21191270%200.067%29%20%28wd%3AQ59542487%200.037%29%20%28wd%3AQ96739634%200.065%29%20%28wd%3AQ11879590%200.025%29%20%28wd%3AQ1504425%200.013%29%20%28wd%3AQ34442%200.072%29%20%28wd%3AQ61443690%200.137%29%20%28wd%3AQ253019%200.044%29%20%28wd%3AQ125191%200.062%29%20%28wd%3AQ49008%200.136%29%20%28wd%3AQ39816%200.103%29%20%28wd%3AQ47461344%200.078%29%20%28wd%3AQ22698%200.093%29%20%28wd%3AQ11446%200.082%29%20%28wd%3AQ23894233%200.037%29%20%28wd%3AQ43229%200.08%29%20%28wd%3AQ4164871%200.128%29%20%28wd%3AQ6881511%200.051%29%20%28wd%3AQ191067%200.078%29%20%28wd%3AQ12323%200.079%29%20%28wd%3AQ985488%200.097%29%20%28wd%3AQ2065736%200.077%29%20%28wd%3AQ735428%200.023%29%20%28wd%3AQ67383935%200.078%29%20%28wd%3AQ5185279%200.111%29%20%28wd%3AQ21672098%200.028%29%20%28wd%3AQ1248784%200.026%29%20%28wd%3AQ61089180%200.012%29%20%28wd%3AQ124714%200.074%29%20%28wd%3AQ59541917%200.014%29%20%28wd%3AQ1002697%200.058%29%20%28wd%3AQ19855165%200.072%29%20%28wd%3AQ860861%200.052%29%20%28wd%3AQ55659167%200.071%29%20%28wd%3AQ33506%200.044%29%20%28wd%3AQ26211545%200.044%29%20%28wd%3AQ17343829%200.04%29%20%28wd%3AQ476028%200.038%29%20%28wd%3AQ15632617%200.056%29%20%28wd%3AQ2341654%200.065%29%20%28wd%3AQ24862%200.033%29%20%28wd%3AQ108325%200.057%29%20%28wd%3AQ3257686%200.041%29%20%28wd%3AQ839954%200.05%29%20%28wd%3AQ41253%200.059%29%20%28wd%3AQ3957%200.015%29%20%28wd%3AQ24046192%200.024%29%20%28wd%3AQ737498%200.024%29%20%28wd%3AQ22969563%200.052%29%20%28wd%3AQ99045339%200.036%29%20%28wd%3AQ179700%200.05%29%20%28wd%3AQ27555384%200.051%29%20%28wd%3AQ28564%200.043%29%20%28wd
%3AQ12284%200.051%29%20%28wd%3AQ187971%200.055%29%20%28wd%3AQ820655%200.028%29%20%28wd%3AQ506240%200.019%29%20%28wd%3AQ2996394%200.045%29%20%28wd%3AQ100532807%200.036%29%20%28wd%3AQ39614%200.047%29%20%28wd%3AQ18340514%200.048%29%20%28wd%3AQ1115575%200.013%29%20%28wd%3AQ15416%200.049%29%20%28wd%3AQ2151232%200.065%29%20%28wd%3AQ820477%200.048%29%20%28wd%3AQ489357%200.043%29%20%28wd%3AQ18918145%200.032%29%20%28wd%3AQ34023%200.048%29%20%28wd%3AQ11032%200.043%29%20%28wd%3AQ5783996%200.041%29%20%28wd%3AQ9826%200.033%29%20%28wd%3AQ21199%200.011%29%20%28wd%3AQ3914%200.039%29%20%28wd%3AQ4989906%200.03%29%20%28wd%3AQ1516079%200.04%29%20%28wd%3AQ163740%200.028%29%20%28wd%3AQ1134686%200.02%29%20%28wd%3AQ15647814%200.015%29%20%28wd%3AQ3918%200.015%29%20%28wd%3AQ928830%200.016%29%20%28wd%3AQ56428020%200.014%29%20%28wd%3AQ220659%200.021%29%20%28wd%3AQ1580166%200.056%29%20%28wd%3AQ24529780%200.039%29%20%28wd%3AQ187685%200.035%29%20%28wd%3AQ192287%200.017%29%20%28wd%3AQ23058136%200.03%29%20%28wd%3AQ39594%200.036%29%20%28wd%3AQ16917%200.027%29%20%28wd%3AQ891723%200.013%29%20%28wd%3AQ585956%200.038%29%20%28wd%3AQ634099%200.02%29%20%28wd%3AQ838948%200.024%29%20%28wd%3AQ773668%200.013%29%20%28wd%3AQ104093746%200.033%29%20%28wd%3AQ14350%200.028%29%20%28wd%3AQ126807%200.036%29%20%28wd%3AQ23925393%200.034%29%20%28wd%3AQ15303838%200.019%29%20%28wd%3AQ41298%200.031%29%20%28wd%3AQ123705%200.027%29%20%28wd%3AQ23413%200.023%29%20%28wd%3AQ34770%200.011%29%20%28wd%3AQ16466010%200.019%29%20%28wd%3AQ46831%200.025%29%20%28wd%3AQ16510064%200.03%29%20%28wd%3AQ26887310%200.034%29%20%28wd%3AQ12280%200.027%29%20%28wd%3AQ46190676%200.028%29%20%28wd%3AQ2514025%200.02%29%20%28wd%3AQ34763%200.028%29%20%28wd%3AQ1529096%200.019%29%20%28wd%3AQ7278%200.02%29%20%28wd%3AQ97695005%200.012%29%20%28wd%3AQ310890%200.011%29%20%28wd%3AQ62447%200.02%29%20%28wd%3AQ7075%200.024%29%20%28wd%3AQ3464665%200.02%29%20%28wd%3AQ10870555%200.023%29%20%28wd%3AQ4421%200.026%29%20%28wd%3AQ3052382%200.022%29%20%28wd%3AQ189004%200.027%2
9%20%28wd%3AQ852190%200.026%29%20%28wd%3AQ10429085%200.011%29%20%28wd%3AQ5358913%200.017%29%20%28wd%3AQ7777570%200.011%29%20%28wd%3AQ207326%200.025%29%20%28wd%3AQ212198%200.016%29%20%28wd%3AQ121117%200.019%29%20%28wd%3AQ15184295%200.031%29%20%28wd%3AQ131681%200.021%29%20%28wd%3AQ751876%200.025%29%20%28wd%3AQ65954115%200.028%29%20%28wd%3AQ47345468%200.015%29%20%28wd%3AQ18663566%200.016%29%20%28wd%3AQ159334%200.019%29%20%28wd%3AQ1195098%200.028%29%20%28wd%3AQ19860854%200.023%29%20%28wd%3AQ847017%200.024%29%20%28wd%3AQ210272%200.023%29%20%28wd%3AQ179049%200.014%29%20%28wd%3AQ133056%200.023%29%20%28wd%3AQ26267864%200.012%29%20%28wd%3AQ6453643%200.013%29%20%28wd%3AQ30198%200.025%29%20%28wd%3AQ202444%200.021%29%20%28wd%3AQ965568%200.012%29%20%28wd%3AQ1302249%200.018%29%20%28wd%3AQ16887380%200.023%29%20%28wd%3AQ1681353%200.029%29%20%28wd%3AQ66826848%200.025%29%20%28wd%3AQ67015940%200.028%29%20%28wd%3AQ571%200.021%29%20%28wd%3AQ106006703%200.017%29%20%28wd%3AQ5707594%200.026%29%20%28wd%3AQ7604693%200.018%29%20%28wd%3AQ178561%200.015%29%20%28wd%3AQ1303167%200.021%29%20%28wd%3AQ728937%200.015%29%20%28wd%3AQ1505023%200.013%29%20%28wd%3AQ26703203%200.016%29%20%28wd%3AQ185113%200.022%29%20%28wd%3AQ3231690%200.012%29%20%28wd%3AQ24354%200.013%29%20%28wd%3AQ3950%200.017%29%20%28wd%3AQ2042028%200.021%29%20%28wd%3AQ174782%200.016%29%20%28wd%3AQ16560%200.014%29%20%28wd%3AQ14204246%200.033%29%20%28wd%3AQ2593777%200.02%29%20%28wd%3AQ155076%200.025%29%20%28wd%3AQ740445%200.019%29%20%28wd%3AQ12317349%200.015%29%20%28wd%3AQ176799%200.023%29%20%28wd%3AQ15773317%200.011%29%20%28wd%3AQ327333%200.015%29%20%28wd%3AQ61443650%200.026%29%20%28wd%3AQ42195%200.02%29%20%28wd%3AQ1577547%200.02%29%20%28wd%3AQ27686%200.013%29%20%28wd%3AQ19692072%200.021%29%20%28wd%3AQ2334719%200.029%29%20%28wd%3AQ2116450%200.02%29%20%28wd%3AQ44613%200.013%29%20%28wd%3AQ65661087%200.017%29%20%28wd%3AQ2309609%200.022%29%20%28wd%3AQ1497375%200.011%29%20%28wd%3AQ1266946%200.018%29%20%28wd%3AQ2732840%200.016%29%20%28wd%3AQ10
7102664%200.017%29%20%28wd%3AQ2085381%200.014%29%20%28wd%3AQ575759%200.017%29%20%28wd%3AQ26895936%200.025%29%20%28wd%3AQ35127%200.014%29%20%28wd%3AQ417841%200.025%29%20%28wd%3AQ55850593%200.011%29%20%28wd%3AQ55659107%200.019%29%20%28wd%3AQ40080%200.017%29%20%28wd%3AQ4414033%200.011%29%20%28wd%3AQ618779%200.021%29%20%28wd%3AQ40231%200.02%29%20%28wd%3AQ102496%200.016%29%20%28wd%3AQ17018380%200.017%29%20%28wd%3AQ7321974%200.018%29%20%28wd%3AQ51591359%200.011%29%20%28wd%3AQ29023906%200.021%29%20%28wd%3AQ169930%200.013%29%20%28wd%3AQ21198342%200.014%29%20%28wd%3AQ166735%200.016%29%20%28wd%3AQ15221623%200.017%29%20%28wd%3AQ207524%200.018%29%20%28wd%3AQ420927%200.014%29%20%28wd%3AQ31855%200.012%29%20%28wd%3AQ21573182%200.02%29%20%28wd%3AQ35509%200.016%29%20%28wd%3AQ1520033%200.011%29%20%28wd%3AQ43460564%200.016%29%20%28wd%3AQ15243209%200.011%29%20%28wd%3AQ56731284%200.013%29%20%28wd%3AQ11755880%200.012%29%20%28wd%3AQ106474968%200.015%29%20%28wd%3AQ18761202%200.012%29%20%28wd%3AQ107103143%200.013%29%20%28wd%3AQ18127%200.011%29%20%28wd%3AQ67015883%200.012%29%20%28wd%3AQ187223%200.014%29%20%28wd%3AQ358%200.016%29%20%28wd%3AQ25379%200.014%29%20%28wd%3AQ2467461%200.016%29%20%28wd%3AQ726%200.014%29%20%28wd%3AQ7397%200.012%29%20%28wd%3AQ1081138%200.011%29%20%28wd%3AQ13406554%200.014%29%20%28wd%3AQ27671617%200.012%29%20%28wd%3AQ160091%200.013%29%20%28wd%3AQ34038%200.011%29%20%28wd%3AQ953806%200.013%29%20%28wd%3AQ67035425%200.018%29%20%28wd%3AQ4504495%200.011%29%20%28wd%3AQ26894053%200.013%29%20%28wd%3AQ783794%200.012%29%20%28wd%3AQ50231%200.015%29%20%28wd%3AQ294414%200.013%29%20%28wd%3AQ814254%200.011%29%20%28wd%3AQ842402%200.016%29%20%28wd%3AQ14860489%200.012%29%20%28wd%3AQ3508250%200.015%29%20%28wd%3AQ1539532%200.019%29%20%28wd%3AQ149566%200.014%29%20%28wd%3AQ13417114%200.012%29%20%28wd%3AQ50386450%200.015%29%20%28wd%3AQ17524420%200.013%29%20%28wd%3AQ46135307%200.012%29%20%28wd%3AQ1238720%200.013%29%20%28wd%3AQ235557%200.013%29%20%28wd%3AQ459297%200.014%29%20%28wd%3AQ20541692%20
0.015%29%20%28wd%3AQ2231510%200.012%29%20%28wd%3AQ14659%200.011%29%20%28wd%3AQ61278455%200.012%29%20%28wd%3AQ81505329%200.013%29%20%28wd%3AQ6451276%200.015%29%20%28wd%3AQ482%200.017%29%20%28wd%3AQ8436%200.011%29%20%28wd%3AQ46686%200.011%29%20%28wd%3AQ6503489%200.014%29%20%28wd%3AQ898273%200.012%29%20%7D%0A%20%20%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D query link] | |||
'''Number of days to recovery''' | |||
Given the current rate of growth, how long would it take Wikidata to return to its original size if some number of triples were removed from it? This helps us estimate what to temporarily remove from Wikidata in a situation where the Wikidata backend maxes out. The growth rate of triples is not constant, but treating the growth as approximately linear, as shown in the [https://grafana.wikimedia.org/goto/pyO_iMRnk Grafana dashboard], Wikidata grows at a rate of '''4.77M''' triples per day. This rate was calculated from the number of triples at the start and end of a 90-day interval (11/3/21 to 6/6/21). The actual rate could be somewhat faster or slower. Dividing the number of triples in a subgraph by this rate gives a rough approximation of the number of days we can gain by removing that subgraph from Wikidata. | |||
{| class="wikitable sortable" | |||
|+ Top 50 Subgraphs in Wikidata | |||
|- | |||
! Rank !! Subgraph !! Subgraph Name !! Number of items !! % of WD items !! Number of triples !! % of WD Triples !! Number of days to recover | |||
|- | |||
|1||Q13442814||scholarly article||37,362,641||39.75||6,539,020,889||49.73||1370.86 | |||
|- | |||
|2||Q6999||astronomical object||8,412,914||8.95||1,136,682,291||8.64||238.3 | |||
|- | |||
|3||Q5||human||9,315,444||9.91||954,536,943||7.26||200.11 | |||
|- | |||
|4||Q4167836||Wikimedia category||4,840,195||5.15||753,127,982||5.73||157.89 | |||
|- | |||
|5||Q16521||taxon||3,180,248||3.38||367,926,462||2.8||77.13 | |||
|- | |||
|6||Q101352||family name||481,445||0.51||187,299,892||1.42||39.27 | |||
|- | |||
|7||Q4167410||Wikimedia disambiguation page||1,359,804||1.45||180,124,174||1.37||37.76 | |||
|- | |||
|8||Q7187||gene||1,196,361||1.27||122,421,508||0.93||25.66 | |||
|- | |||
|9||Q11266439||Wikimedia template||845,852||0.9||114,308,711||0.87||23.96 | |||
|- | |||
|10||Q11173||chemical compound||1,223,387||1.3||91,228,463||0.69||19.13 | |||
|- | |||
|11||Q8054||protein||986,599||1.05||88,483,828||0.67||18.55 | |||
|- | |||
|12||Q3305213||painting||539,468||0.57||56,769,083||0.43||11.9 | |||
|- | |||
|13||Q13100073||village-level division in China||588,477||0.63||51,615,572||0.39||10.82 | |||
|- | |||
|14||Q11424||film||263,070||0.28||47,176,067||0.36||9.89 | |||
|- | |||
|15||Q486972||human settlement||563,958||0.6||39,590,792||0.3||8.3 | |||
|- | |||
|16||Q13406463||Wikimedia list article||334,939||0.36||33,742,245||0.26||7.07 | |||
|- | |||
|17||Q13433827||encyclopedia article||512,141||0.55||33,373,227||0.25||7.0 | |||
|- | |||
|18||Q8502||mountain||525,553||0.56||33,340,188||0.25||6.99 | |||
|- | |||
|19||Q2668072||collection||500,968||0.53||32,670,637||0.25||6.85 | |||
|- | |||
|20||Q79007||street||578,926||0.62||30,252,119||0.23||6.34 | |||
|- | |||
|21||Q4022||river||399,552||0.42||28,833,476||0.22||6.04 | |||
|- | |||
|22||Q30612||clinical trial||356,838||0.38||27,731,502||0.21||5.81 | |||
|- | |||
|23||Q532||village||274,840||0.29||26,483,275||0.2||5.55 | |||
|- | |||
|24||Q17633526||Wikinews article||286,950||0.3||21,830,150||0.17||4.58 | |||
|- | |||
|25||Q482994||album||269,095||0.29||21,181,015||0.16||4.44 | |||
|- | |||
|26||Q23397||lake||260,135||0.28||18,053,096||0.14||3.78 | |||
|- | |||
|27||Q54050||hill||327,277||0.35||17,228,390||0.13||3.61 | |||
|- | |||
|28||Q16970||church building||211,291||0.22||16,821,530||0.13||3.53 | |||
|- | |||
|29||Q41176||building||265,925||0.28||16,293,008||0.12||3.42 | |||
|- | |||
|30||Q56436498||village in India||145,824||0.16||15,383,416||0.12||3.23 | |||
|- | |||
|31||Q4830453||business||193,858||0.21||14,101,220||0.11||2.96 | |||
|- | |||
|32||Q47150325||calendar day of a given year||189,366||0.2||14,078,486||0.11||2.95 | |||
|- | |||
|33||Q3947||house||197,736||0.21||12,468,434||0.1||2.61 | |||
|- | |||
|34||Q3331189||version, edition, or translation||157,486||0.17||10,997,589||0.08||2.31 | |||
|- | |||
|35||Q18593264||item of collection or exhibition||147,402||0.16||10,732,969||0.08||2.25 | |||
|- | |||
|36||Q27020041||sports season||158,877||0.17||10,693,504||0.08||2.24 | |||
|- | |||
|37||Q355304||watercourse||174,620||0.19||10,080,421||0.08||2.11 | |||
|- | |||
|38||Q7725634||literary work||164,860||0.18||10,049,521||0.08||2.11 | |||
|- | |||
|39||Q23442||island||148,587||0.16||9,885,277||0.08||2.07 | |||
|- | |||
|40||Q11060274||print||119,806||0.13||9,700,063||0.07||2.03 | |||
|- | |||
|41||Q811979||architectural structure||145,957||0.16||9,666,936||0.07||2.03 | |||
|- | |||
|42||Q5084||hamlet||118,188||0.13||9,013,534||0.07||1.89 | |||
|- | |||
|43||Q9842||primary school||157,451||0.17||8,916,373||0.07||1.87 | |||
|- | |||
|44||Q19389637||biographical article||151,026||0.16||8,238,397||0.06||1.73 | |||
|- | |||
|45||Q21014462||cell line||128,805||0.14||7,955,975||0.06||1.67 | |||
|- | |||
|46||Q47521||stream||124,853||0.13||6,654,366||0.05||1.4 | |||
|- | |||
|47||Q59199015||group of stereoisomers||111,599||0.12||5,843,270||0.04||1.23 | |||
|- | |||
|48||Q61443690||branch post office||129,183||0.14||5,313,033||0.04||1.11 | |||
|- | |||
|49||Q49008||prime number||127,545||0.14||5,188,768||0.04||1.09 | |||
|- | |||
|50||Q4164871||position||120,117||0.13||4,720,668||0.04||0.99 | |||
|} | |||
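The last column of the table follows from dividing a subgraph's triple count by the assumed linear growth rate of 4.77M triples per day. A minimal Python sketch of that arithmetic, using triple counts taken from the table (the function name <code>days_to_recover</code> is chosen here only for illustration):

```python
# Assumed constant growth rate from the text above: 4.77M triples per day.
GROWTH_TRIPLES_PER_DAY = 4.77e6


def days_to_recover(triples_removed, growth_per_day=GROWTH_TRIPLES_PER_DAY):
    """Days for Wikidata to grow back by the number of triples removed."""
    return triples_removed / growth_per_day


# Triple counts copied from the table above.
top_subgraphs = {
    "scholarly article": 6_539_020_889,
    "astronomical object": 1_136_682_291,
    "human": 954_536_943,
}

for name, triples in top_subgraphs.items():
    print(f"{name}: {days_to_recover(triples):.2f} days")
```

Since the real growth rate varies over time, these figures are only as good as the straight-line approximation behind them.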
== Triples == | |||
=== Direct triples vs statements === | === Direct triples vs statements === | ||
=== Triples per item === | === Triples per item === |
Revision as of 17:25, 27 October 2021
TL;DR
What are subgraphs?
Wikidata contains all kinds of data from various aspects of knowledge. All of these data are highly inter-connected, but we can find some patterns. We find subgraphs within Wikidata and find out how large these subgraphs are, how connected they are, and finally how much these subgraphs are used (queried).
In order to find subgraphs, the following steps were taken:
- Consider all items that are instance of (P31) the same item to be part of one subgraph. For example: all items that are instance of Q13442814 form one subgraph.
- Some subgraphs were merged where it was obvious. For example: all subclasses of astronomical object were considered part of astronomical object, as they were all indeed some sort of astronomical object. This method of subclass merging cannot be applied everywhere without manual inspection.
- Some large subgraphs were almost completely part of another subgraph. For example: all items under Review Articles are also instances of scholarly article. In such cases, review articles were not considered a separate subgraph.
Subgraph sizes
Using only instance of, Wikidata has 82,919 subgraphs. The distribution of the sizes of these subgraphs has a clear long tail, with very few subgraphs incorporating most items in Wikidata. Subgraph size can be calculated in two ways:
- The number of items it contains
- The number of triples related to the items in a subgraph. This is what we refer to as subgraph size from here on.
Takeaways:
- Most calculations from here on will take the top 50 subgraphs, which form 85% of Wikidata
- 340 top subgraphs (0.5% of all subgraphs, after merging some) form 90% of Wikidata (91% of all items and 90% of all triples). These subgraphs have >=10,000 items each.
- Rest 99.5% of the subgraphs have <10,000 items each, and together form 10% of Wikidata.
Below is the distribution of the number of items per subgraph.
File:Number of groups vs number of items.png | File:Number of groups vs number of items log.png |
To be more specific,
Number of subgraphs | Having more than this number of items |
---|---|
54,602 | 1 |
23,724 | 10 |
6,625 | 100 |
1,712 | 1,000 |
392 | 10,000 |
63 | 100,000 |
10 | 1,000,000 |
1 | 10,000,000 |
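A table of this shape can be derived from a list of per-subgraph item counts by counting how many exceed each power-of-ten threshold. The following is a minimal sketch on made-up counts; the function name and the sample list are assumptions for illustration, not the analysis data.

```python
def count_above_thresholds(item_counts, thresholds):
    """For each threshold, count subgraphs with strictly more items."""
    return {t: sum(1 for n in item_counts if n > t) for t in thresholds}


# Made-up item counts for seven subgraphs, from very large to single-item.
sample_counts = [37_362_641, 8_412_914, 120_117, 950, 12, 3, 1]
thresholds = [10 ** k for k in range(8)]  # 1, 10, ..., 10,000,000

for threshold, n_subgraphs in count_above_thresholds(sample_counts, thresholds).items():
    print(f"{n_subgraphs} subgraph(s) with more than {threshold:,} item(s)")
```

The comparison is strict (`>`), matching the "more than" wording of the table.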
Below is the subgraph size comparison of top 340 subgraphs in Wikidata (90%).
File:Subgraph distribution triples.png File:Subgraph distribution percents.png
Below is the subgraph size comparison of top 50 subgraphs in Wikidata (85%).
File:Top 50 subgraph distribution triples.png File:Top 50 subgraph distribution percents.png
Here is an interactive graph showing the comparison of subgraph sizes in terms of item count and triple count: subgraph stats.
Here are some subgraph size visualizations in WDQS:
- Size as percentage of Wikidata each subgraph occupies: query link
- Size as percentage of Wikidata items each subgraph contains: query link
Number of days to recovery
Given the current rate of growth, how long would it take Wikidata to get back to its original size if some amount of triples were removed from it? This helps us estimate what to temporarily remove from Wikidata in the situation where the Wikidata backend maxes out. The growth rate of triples is not constant, but treating the growth shown in the Grafana dashboard as an approximately straight line, Wikidata grows at a rate of 4.77M triples per day. This rate was calculated from the number of triples at the start and end of a 90-day interval (11/3/21 to 6/6/21). The actual rate could be faster or a bit slower than this. The number of days to recover for a subgraph is its number of triples divided by this rate, which gives a rough approximation of the number of days we can gain by removing some parts of Wikidata.
Rank | Subgraph | Subgraph Name | Number of items | % of WD items | Number of triples | % of WD Triples | Number of days to recover |
---|---|---|---|---|---|---|---|
1 | Q13442814 | scholarly article | 37,362,641 | 39.75 | 6,539,020,889 | 49.73 | 1370.86 |
2 | Q6999 | astronomical object | 8,412,914 | 8.95 | 1,136,682,291 | 8.64 | 238.3 |
3 | Q5 | human | 9,315,444 | 9.91 | 954,536,943 | 7.26 | 200.11 |
4 | Q4167836 | Wikimedia category | 4,840,195 | 5.15 | 753,127,982 | 5.73 | 157.89 |
5 | Q16521 | taxon | 3,180,248 | 3.38 | 367,926,462 | 2.8 | 77.13 |
6 | Q101352 | family name | 481,445 | 0.51 | 187,299,892 | 1.42 | 39.27 |
7 | Q4167410 | Wikimedia disambiguation page | 1,359,804 | 1.45 | 180,124,174 | 1.37 | 37.76 |
8 | Q7187 | gene | 1,196,361 | 1.27 | 122,421,508 | 0.93 | 25.66 |
9 | Q11266439 | Wikimedia template | 845,852 | 0.9 | 114,308,711 | 0.87 | 23.96 |
10 | Q11173 | chemical compound | 1,223,387 | 1.3 | 91,228,463 | 0.69 | 19.13 |
11 | Q8054 | protein | 986,599 | 1.05 | 88,483,828 | 0.67 | 18.55 |
12 | Q3305213 | painting | 539,468 | 0.57 | 56,769,083 | 0.43 | 11.9 |
13 | Q13100073 | village-level division in China | 588,477 | 0.63 | 51,615,572 | 0.39 | 10.82 |
14 | Q11424 | film | 263,070 | 0.28 | 47,176,067 | 0.36 | 9.89 |
15 | Q486972 | human settlement | 563,958 | 0.6 | 39,590,792 | 0.3 | 8.3 |
16 | Q13406463 | Wikimedia list article | 334,939 | 0.36 | 33,742,245 | 0.26 | 7.07 |
17 | Q13433827 | encyclopedia article | 512,141 | 0.55 | 33,373,227 | 0.25 | 7.0 |
18 | Q8502 | mountain | 525,553 | 0.56 | 33,340,188 | 0.25 | 6.99 |
19 | Q2668072 | collection | 500,968 | 0.53 | 32,670,637 | 0.25 | 6.85 |
20 | Q79007 | street | 578,926 | 0.62 | 30,252,119 | 0.23 | 6.34 |
21 | Q4022 | river | 399,552 | 0.42 | 28,833,476 | 0.22 | 6.04 |
22 | Q30612 | clinical trial | 356,838 | 0.38 | 27,731,502 | 0.21 | 5.81 |
23 | Q532 | village | 274,840 | 0.29 | 26,483,275 | 0.2 | 5.55 |
24 | Q17633526 | Wikinews article | 286,950 | 0.3 | 21,830,150 | 0.17 | 4.58 |
25 | Q482994 | album | 269,095 | 0.29 | 21,181,015 | 0.16 | 4.44 |
26 | Q23397 | lake | 260,135 | 0.28 | 18,053,096 | 0.14 | 3.78 |
27 | Q54050 | hill | 327,277 | 0.35 | 17,228,390 | 0.13 | 3.61 |
28 | Q16970 | church building | 211,291 | 0.22 | 16,821,530 | 0.13 | 3.53 |
29 | Q41176 | building | 265,925 | 0.28 | 16,293,008 | 0.12 | 3.42 |
30 | Q56436498 | village in India | 145,824 | 0.16 | 15,383,416 | 0.12 | 3.23 |
31 | Q4830453 | business | 193,858 | 0.21 | 14,101,220 | 0.11 | 2.96 |
32 | Q47150325 | calendar day of a given year | 189,366 | 0.2 | 14,078,486 | 0.11 | 2.95 |
33 | Q3947 | house | 197,736 | 0.21 | 12,468,434 | 0.1 | 2.61 |
34 | Q3331189 | version, edition, or translation | 157,486 | 0.17 | 10,997,589 | 0.08 | 2.31 |
35 | Q18593264 | item of collection or exhibition | 147,402 | 0.16 | 10,732,969 | 0.08 | 2.25 |
36 | Q27020041 | sports season | 158,877 | 0.17 | 10,693,504 | 0.08 | 2.24 |
37 | Q355304 | watercourse | 174,620 | 0.19 | 10,080,421 | 0.08 | 2.11 |
38 | Q7725634 | literary work | 164,860 | 0.18 | 10,049,521 | 0.08 | 2.11 |
39 | Q23442 | island | 148,587 | 0.16 | 9,885,277 | 0.08 | 2.07 |
40 | Q11060274 | | 119,806 | 0.13 | 9,700,063 | 0.07 | 2.03 |
41 | Q811979 | architectural structure | 145,957 | 0.16 | 9,666,936 | 0.07 | 2.03 |
42 | Q5084 | hamlet | 118,188 | 0.13 | 9,013,534 | 0.07 | 1.89 |
43 | Q9842 | primary school | 157,451 | 0.17 | 8,916,373 | 0.07 | 1.87 |
44 | Q19389637 | biographical article | 151,026 | 0.16 | 8,238,397 | 0.06 | 1.73 |
45 | Q21014462 | cell line | 128,805 | 0.14 | 7,955,975 | 0.06 | 1.67 |
46 | Q47521 | stream | 124,853 | 0.13 | 6,654,366 | 0.05 | 1.4 |
47 | Q59199015 | group of stereoisomers | 111,599 | 0.12 | 5,843,270 | 0.04 | 1.23 |
48 | Q61443690 | branch post office | 129,183 | 0.14 | 5,313,033 | 0.04 | 1.11 |
49 | Q49008 | prime number | 127,545 | 0.14 | 5,188,768 | 0.04 | 1.09 |
50 | Q4164871 | position | 120,117 | 0.13 | 4,720,668 | 0.04 | 0.99 |
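The last column of the table can be reproduced by dividing each subgraph's triple count by the growth rate of 4.77M triples per day quoted above. The sketch below assumes that constant linear rate; the function and constant names are illustrative.

```python
# Growth rate quoted in the text: 4.77M triples per day (assumed constant).
GROWTH_TRIPLES_PER_DAY = 4_770_000


def days_to_recover(triples_removed, growth_per_day=GROWTH_TRIPLES_PER_DAY):
    """Days until Wikidata regrows by the number of triples removed."""
    return triples_removed / growth_per_day


# Triple counts taken from the table above.
triples = {
    "scholarly article": 6_539_020_889,
    "human": 954_536_943,
    "position": 4_720_668,
}

for name, count in triples.items():
    print(f"{name}: {days_to_recover(count):.2f} days")
```

Because the estimate is linear, the days gained by removing several subgraphs together is the sum of their individual values, as long as the subgraphs do not share triples.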