
User:AKhatun/Intro to WMF Search Data: Difference between revisions

From Wikitech-static
imported>AKhatun
(Start page, add intro)
 
imported>AKhatun
(Add brief description of data sources)
## Sometimes the word or phrase you searched for has very few results. If the system thinks you meant something else, it will suggest a different (possibly correct) search. [https://en.wikipedia.org/w/index.php?search=alsha&title=Special:Search&profile=advanced&fulltext=1&ns0=1 Search for alsha], and it will say ''Did you mean: alpha''. It still shows the few results it found for ''alsha'', but you can click on ''alpha'' to view those results instead.
# On the side are results from sister projects.
# At the bottom of the results are results from other language wikis, if applicable. For example, [https://en.wikipedia.org/w/index.php?search=%E0%A6%AC%E0%A6%A8%E0%A7%8D%E0%A6%AF+%E0%A6%AA%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%A8%E0%A6%BF&title=Special:Search&profile=advanced&fulltext=1&ns0=1 search for বন্য প্রানি] (a non-English query) in the English Wikipedia.
# Some wikis have results from Wikidata at the bottom as well.


=== Useful resources ===
* [[mw:Wikimedia_Discovery/So_Many_Search_Options]]
* Default MediaWiki search: [[m:Help:Searching]]
* CirrusSearch: [[mw:Help:CirrusSearch]]
A few blog posts (find more at [[wmfblog:|diff.wikimedia.org]]):
* [[wmfblog:2015/12/23/search-and-discovery-on-wikipedia]]
* [[wmfblog:2021/02/22/in-search-of-the-perfect-search-for-wikipedia]]
* [[wmfblog:2019/03/12/the-anatomy-of-search-a-place-for-my-stuff]]


== Data Sources ==
{| class="wikitable"
|+ Sources of data related to Search
|-
! Table name !! Database !! Description !! Docs !! Code
|-
| mediawiki_cirrussearch_request || event || Also known as query logs. Contains all search events including the query, the various hits returned from one or more wiki projects, time taken, and other backend information || [https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/cirrussearch/request/latest.yaml Schema] || -
|-
| searchsatisfaction || event || Table of various search events such as searchResultPage, click, checkin, etc., along with the query, the number of hits returned, and other search-specific details. || [https://meta.wikimedia.org/wiki/Schema:SearchSatisfaction Schema] || [https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/master/modules/ext.wikimediaEvents/searchSatisfaction.js Source Code]
|-
| query_clicks_hourly || discovery || A cross of mediawiki_cirrussearch_request and searchsatisfaction to list each search query with its list of hits returned and clicks by the user || [https://wikitech.wikimedia.org/wiki/Obsolete:Analytics/Data_Lake/Traffic/CirrusQueryClicks#discovery.query_clicks_hourly Schema] || [https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/airflow/dags/query_clicks.py#L87 Source Code]
|-
| query_clicks_daily || discovery || Sessionized version of the discovery.query_clicks_hourly table. Only contains queries with click-throughs. || [https://wikitech.wikimedia.org/wiki/Obsolete:Analytics/Data_Lake/Traffic/CirrusQueryClicks#discovery.query_clicks_daily Schema] || [https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/airflow/dags/query_clicks.py#L341 Source Code]
|-
| search_satisfaction_daily || discovery || A sessionized daily version of the event.searchsatisfaction table. Each search session and most of its related information are aggregated in individual rows || - || [https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/spark/generate_daily_search_satisfaction.py Source Code]
|-
| fulltext_head_queries || discovery || Aggregate of queries and their results after making some minor alterations to the query string (e.g. please and "please" both become please) || - || [https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/spark/fulltext_head_queries.py Source Code]
|}
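
The kind of query-string alteration described for fulltext_head_queries can be sketched in a few lines of Python. This is only an illustration based on the single example given in the table (please and "please" are counted together). The function names <code>normalize_query</code> and <code>head_queries</code> are hypothetical, and the real job in the linked source code may apply different or additional rules.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def normalize_query(query: str) -> str:
    """Trim whitespace and drop one pair of surrounding double quotes.

    Assumption: this mirrors only the 'please' vs '"please"' example;
    it is not taken from the actual fulltext_head_queries job.
    """
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        q = q[1:-1].strip()
    return q


def head_queries(queries: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count normalized queries and return the most frequent ones."""
    counts = Counter(normalize_query(q) for q in queries)
    return counts.most_common(top_n)
```

For example, <code>head_queries(['please', '"please"', 'alpha'], top_n=1)</code> groups the first two strings under the same key.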


= Table details =
== event.mediawiki_cirrussearch_request ==
== event.searchsatisfaction ==
== discovery.query_clicks_hourly ==
== discovery.query_clicks_daily ==
== discovery.search_satisfaction_daily ==
== discovery.fulltext_head_queries ==

Revision as of 21:03, 2 August 2022

Search Data

The Search Platform team at the Foundation temporarily stores some data from searches made on various Wikimedia projects. Analyzing this data can help us understand which improvements would benefit users and how we can create a better search experience for them. To do this, we first need to understand how search works and what data is stored. This page is intended to help you get started with search and search data, with resources, links, and brief explanations. It is not an exhaustive list or a complete explanation of everything related to search.

Where can you search from?

How search works

As soon as you start typing in any of the search boxes mentioned above, the search process has already started. Every letter or group of letters typed fires a search event; pressing Enter or clicking the magnifying glass icon fires an event; and clicking a search result on the search results page fires another event. More about events later.

Here are some of the possibilities with searching:

  1. You start typing in the GO box or any other MediaWiki search bar. After each letter you type, you get a drop-down of title suggestions. These are called autocomplete searches. If you type multiple letters in quick succession, you may get these suggestions only when you pause.
    1. You can click one of the title suggestions and go to that page directly.
    2. Or, you can press Enter or select "search for pages containing <your text>". This takes you to the search results page.
  2. On the search results page, you will see your search results, results from other language wikis (if applicable), results from sister projects, and advanced search options. This is also the search special page. You can continue to perform other searches from here or read your results.
    1. Sometimes, the word or phrase you searched for may have no results. If the system thinks you meant something else, it will search for that and show those results instead. For example, search for azpw: the results are populated for the word aziz, and the page says "Showing results for aziz. No results found for azpw."
    2. Sometimes the word or phrase you searched for has very few results. If the system thinks you meant something else, it will suggest a different (possibly correct) search. Search for alsha, and it will say "Did you mean: alpha". It still shows the few results it found for alsha, but you can click on alpha to view those results instead.
  3. On the side are results from sister projects.
  4. At the bottom of the results are results from other language wikis, if applicable. For example, search for বন্য প্রানি (a non-English query) in the English Wikipedia.
  5. Some wikis have results from Wikidata at the bottom as well.
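
The event sequence described in this section can be pictured with a small toy model. The sketch below is not how the search front end is implemented. It assumes one autocomplete event per typed letter (the text notes that quick typing may group letters), it only covers the path through the search results page, and the function name and event label format are hypothetical, loosely modelled on the searchResultPage and click event names mentioned under Data Sources.

```python
from typing import List, Optional


def session_events(typed: str,
                   submitted: bool = False,
                   clicked_result: Optional[int] = None) -> List[str]:
    """Return the ordered (hypothetical) event labels for one search.

    typed:          the text entered in the search box
    submitted:      True if the user pressed Enter / the magnifying glass
    clicked_result: position of the clicked result on the results page, if any
    """
    # Assumption: every additional letter fires one autocomplete event.
    events = [f"autocomplete:{typed[:i]}" for i in range(1, len(typed) + 1)]
    if submitted:
        events.append(f"searchResultPage:{typed}")
        # A result can only be clicked once the results page is shown.
        if clicked_result is not None:
            events.append(f"click:{clicked_result}")
    return events
```

Calling <code>session_events("ab", submitted=True, clicked_result=3)</code> yields two autocomplete events, one results-page event, and one click event, in that order.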

Useful resources

A few blog posts. Find more at diff.wikimedia.org.

Data Sources

{| class="wikitable"
|+ Sources of data related to Search
|-
! Table name !! Database !! Description !! Docs !! Code
|-
| mediawiki_cirrussearch_request || event || Also known as query logs. Contains all search events including the query, the various hits returned from one or more wiki projects, time taken, and other backend information || Schema || -
|-
| searchsatisfaction || event || Table of various search events such as searchResultPage, click, checkin, etc., along with the query, the number of hits returned, and other search-specific details. || Schema || Source Code
|-
| query_clicks_hourly || discovery || A cross of mediawiki_cirrussearch_request and searchsatisfaction to list each search query with its list of hits returned and clicks by the user || Schema || Source Code
|-
| query_clicks_daily || discovery || Sessionized version of the discovery.query_clicks_hourly table. Only contains queries with click-throughs. || Schema || Source Code
|-
| search_satisfaction_daily || discovery || A sessionized daily version of the event.searchsatisfaction table. Each search session and most of its related information are aggregated in individual rows || - || Source Code
|-
| fulltext_head_queries || discovery || Aggregate of queries and their results after making some minor alterations to the query string (e.g. please and "please" both become please) || - || Source Code
|}
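
As a starting point for exploring these tables, the sketch below builds a SQL string that counts one day of events in event.searchsatisfaction by event type. Only the table name comes from this page. The partition columns <code>year</code>, <code>month</code>, <code>day</code> and the nested field <code>event.action</code> are assumptions that should be checked against the linked schema before running anything, and the helper name is hypothetical.

```python
def build_daily_action_counts_query(year: int, month: int, day: int) -> str:
    """Build a SQL string counting search events per action for one day.

    Assumptions (not confirmed by this page): the table is partitioned by
    year/month/day, and the event type is stored in `event.action`.
    """
    # int() guards against accidentally interpolating non-numeric input.
    year, month, day = int(year), int(month), int(day)
    return (
        "SELECT event.action AS action, COUNT(*) AS n_events\n"
        "FROM event.searchsatisfaction\n"
        f"WHERE year = {year} AND month = {month} AND day = {day}\n"
        "GROUP BY event.action\n"
        "ORDER BY n_events DESC"
    )


if __name__ == "__main__":
    print(build_daily_action_counts_query(2022, 8, 2))
```

The function only returns text. Running the query requires whatever SQL engine and access you have for these databases.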

Table details

event.mediawiki_cirrussearch_request

event.searchsatisfaction

discovery.query_clicks_hourly

discovery.query_clicks_daily

discovery.search_satisfaction_daily

discovery.fulltext_head_queries