You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

User:AndreaWest/WDQS Q and A: Difference between revisions

From Wikitech-static
imported>AndreaWest
(Added query analysis questions)
 
imported>DCausse
No edit summary

Revision as of 18:32, 26 January 2022

The following are the many questions that I have asked over the last few days and the answers that I have received so far. I appreciate everyone's patience and insights.

SPARQL SERVICEs vs Functions

Question: Are Wikidata custom SERVICEs different from federated queries?

Answer
I think it's mostly a matter of feasibility, given the two different ways to implement extensions in Blazegraph. I believe service extensions allow control over many more aspects of Blazegraph's internals.

Background:

  • Using the SERVICE keyword means they are treated like federated queries. You can see this in the fact that the federated endpoints Wikidata supports are also referenced using SERVICE.
  • Currently, there are SERVICEs for labels, geospatial queries, dates and a few other features.
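To make the shared syntax concrete, the block below puts a federated query next to a custom service call. The remote endpoint IRI is hypothetical (WDQS only federates with an allowlist of endpoints), and the mwapi call follows the EntitySearch pattern from the WDQS MWAPI documentation; treat both as sketches rather than tested queries. On WDQS the wikibase:, bd: and mwapi: prefixes are predefined, so no PREFIX lines are needed there.

```sparql
# 1. Federated query: SERVICE names a remote SPARQL endpoint by IRI.
#    Hypothetical endpoint; WDQS rejects endpoints that are not on its allowlist.
SELECT ?s ?p ?o WHERE {
  SERVICE <https://sparql.example.org/sparql> { ?s ?p ?o . }
}
LIMIT 10

# 2. Custom service: same keyword, but wikibase:mwapi is a Blazegraph
#    extension that calls the MediaWiki API (here, entity search for "cheese").
SELECT ?item ?num WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "EntitySearch" .
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam mwapi:search "cheese" .
    bd:serviceParam mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .   # bind each search hit to ?item
    ?num wikibase:apiOrdinal true .             # bind the hit's rank to ?num
  }
}
ORDER BY ASC(?num)
LIMIT 20
```

Both are two separate queries and must be run one at a time. Syntactically the only difference is that the first SERVICE target is an endpoint IRI, while the second is a name the engine resolves to built-in code.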

Question: Why were custom SPARQL functions not added and used instead?

I don't have a clear answer to this; it might be because functions are not always appropriate for what we needed. Technically speaking (and this is Blazegraph-specific), extension points that extend the SERVICE tag seem to have more control over the context of the query: they can access all variables and bind new ones.
Answer
For the custom services we have:
  • the label service wants to access all variables in order to do some magic with ?item / ?itemLabel pairs; rewriting this as a function would certainly be a lot more verbose.
  • the mwapi service is very similar to a federation endpoint, so I think it made sense to implement it as a SERVICE.
  • gas:service (owned by Blazegraph; see https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#GAS_Service) certainly uses a SERVICE because a function was not appropriate.
Implementing a feature as a SERVICE allows deeper integration with Blazegraph.
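The verbosity argument for the label service is easier to see side by side. The sketch below fetches a few house cats (P31 = instance of, Q146 = house cat) first with the label service and then with a hand-written equivalent. The second form is only an approximation that has been assumed for illustration, not how the service is implemented. On WDQS the wd:, wdt:, rdfs:, wikibase: and bd: prefixes are predefined.

```sparql
# With the label service: ?itemLabel is bound automatically because its name
# is ?item + "Label"; the service inspects the query's variables to do this.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10

# Hand-written approximation: one OPTIONAL + FILTER per labelled variable.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  OPTIONAL { ?item rdfs:label ?itemLabel . FILTER(LANG(?itemLabel) = "en") }
}
LIMIT 10
```

The two are not strictly equivalent. When no English label exists, the label service falls back to the entity ID, while the OPTIONAL form leaves ?itemLabel unbound. Each additional labelled variable (or description) also adds another OPTIONAL block to the manual version.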

Question: Is/was there a reason to prefer the geospatial SERVICE over native GeoSPARQL support? I.e., instead of using a SERVICE, you would just "natively" write a query such as:

 SELECT ?geom ?feature {
   ?f a :Location ;
      rdfs:label ?feature ;
      geo:hasGeometry ?geom .
   ?geom geof:within (38.855 -77.111 38.885 -77.052) }
Answer
Blazegraph does not seem to support GeoSPARQL, but we seem to have implemented parts of it, such as geof:distance, geof:globe, geof:latitude and geof:longitude. Why did we implement geof:within as a service like wikibase:box rather than as a function? I have no clue; I bet Blazegraph's internal function extension point is not flexible enough to allow a similar feature.
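For comparison, the bounding box from the question can be expressed with the wikibase:box service, following the syntax in the WDQS user manual. The sketch reads the question's tuple as (lat1 lon1 lat2 lon2). WKT points are written longitude first, so the corners become Point(-77.111 38.855) and Point(-77.052 38.885). The wd:, wdt:, wikibase:, bd: and geo: prefixes are predefined on WDQS.

```sparql
# Items whose coordinate location (P625) falls inside the box.
SELECT ?place ?location WHERE {
  SERVICE wikibase:box {
    ?place wdt:P625 ?location .
    # WKT order is "Point(longitude latitude)".
    bd:serviceParam wikibase:cornerSouthWest "Point(-77.111 38.855)"^^geo:wktLiteral .
    bd:serviceParam wikibase:cornerNorthEast "Point(-77.052 38.885)"^^geo:wktLiteral .
  }
}
LIMIT 100
```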
General comment: it might make sense to revisit the various features we added where actual standards are available.
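For reference, a standards-based version of the question's query would use the OGC GeoSPARQL simple-features function geof:sfWithin on WKT literals. The question's geof:within is not a standard GeoSPARQL function. This is a sketch for a GeoSPARQL-compliant store, not for WDQS, which (per the answer above) does not implement it. The ":" vocabulary and :Location class are the question's hypothetical ones, and the box is assumed to be the same one as in the question.

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/>   # hypothetical vocabulary from the question

SELECT ?geom ?feature WHERE {
  ?f a :Location ;
     rdfs:label ?feature ;
     geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  # Box as a closed WKT polygon; the default CRS (CRS84) is longitude, latitude.
  FILTER(geof:sfWithin(?wkt,
    "POLYGON((-77.111 38.855, -77.052 38.855, -77.052 38.885, -77.111 38.885, -77.111 38.855))"^^geo:wktLiteral))
}
```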

Wikidata Query Questions

Question: Across all of Wikidata, what is the count/prevalence of items that are only used as subjects (i.e., never used as objects)?

Question: Same question as above, but for scholarly article items only.
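One way to phrase these two questions as a query is sketched below. It makes several assumptions. "Used as an object" is taken to mean "is the value of some truthy (wdt:) statement", so structural triples such as sitelinks (schema:about) and qualifier or reference values are ignored. Every item is also assumed to carry a wikibase:sitelinks count, which is used here only to enumerate items. Over the full graph this will very likely exceed the WDQS 60-second timeout, so a local copy or a dump-based count is probably needed in practice.

```sparql
# Count items that never appear as the value of a truthy (wdt:) statement.
SELECT (COUNT(?item) AS ?subjectOnlyItems) WHERE {
  ?item wikibase:sitelinks ?sitelinkCount .      # assumption: present on every item
  # ?item wdt:P31 wd:Q13442814 .                 # uncomment: scholarly articles only
  FILTER NOT EXISTS {
    ?other ?wdtProperty ?item .                  # ?item used as an object...
    ?property wikibase:directClaim ?wdtProperty . # ...of a wdt: predicate
  }
}
```

Dividing the result by the total item count (the same query without the FILTER) gives the prevalence.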

Question: Do queries ever use the SPARQL query forms CONSTRUCT and DESCRIBE, or the update operations INSERT and DELETE? (Or only SELECT and ASK?)

Question: Do queries use SPARQL functions such as CONCAT? If so, which functions are used?

Question: Which SPARQL functions are used, and does their use correlate with timeouts?

Question: In the second table of WDQS Triples Analysis (Node Type Distribution), NODE_LITERAL appears as the SUBJECT 30 times. What is an example of this? A literal should never occur as a subject at all.
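A direct way to look for such triples is to filter subjects by node type. RDF does not allow literal subjects, so any rows returned would point at data or tooling that deserves a closer look. This is a sketch: it scans every triple and would almost certainly time out on the public endpoint, so it is meant for a local copy of the data.

```sparql
# List any triples whose subject is a literal.
SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  FILTER(isLiteral(?s))
}
LIMIT 30
```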