What We Own
We maintain the big data platform including the data lake, ingestion and processing pipelines, as well as a number of systems to explore and visualize the data.
Please see Analytics/Systems for a more comprehensive list of the systems we maintain.
|System name and link||Type||Accessibility|
|Archiva||Repository for Java archives||Private|
|AQS - Analytics Query Service||REST API for analytics data||Public|
|Clients (stat100X)||Analytics client nodes to access Hadoop and various services||Private|
|Cluster (Hadoop, Gobblin, Hive, Oozie, Spark...)||Hadoop||Private|
|Dashiki||Framework for building dashboards||Public|
|Druid||Data storage engine optimized for exploratory analytics||Private|
|EventLogging||Ad-hoc streaming pipeline||Private|
|EventStreams||MediaWiki event streams||Public|
|Hue||Web interface for Hive, Oozie, and other Cluster services||Private|
|Kafka||Data transport and streaming system||Private|
|MariaDB||Data storage for MediaWiki replicas and EventLogging||Private|
|Matomo (formerly known as Piwik)||Small-scale web analytics platform||Private|
|Presto||High-performance SQL query engine for big data||Private|
|Superset||Web interface for data visualization and exploration||Private|
|Jupyter||Hosted notebooks for data analysis||Private|
|Turnilo||Web interface for exploring data stored in Druid||Private|
|Wikistats (1 and 2)||Community Dashboard with high-level metrics||Public|
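Since AQS is listed above as a public REST API, a request URL can be sketched without cluster access. This is a minimal sketch, assuming the per-article pageviews route served under wikimedia.org/api/rest_v1; the helper name `per_article_url` and its defaults are hypothetical, and the code only builds the URL rather than calling the service.

```python
from urllib.parse import quote

# Assumed base URL of the public REST API that fronts AQS pageview data.
AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def per_article_url(project, article, start, end,
                    access="all-access", agent="user", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores instead of spaces and must be percent-encoded,
    # including any slash that is part of the title itself.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([AQS_BASE, "per-article", project, access, agent,
                     title, granularity, start, end])


if __name__ == "__main__":
    print(per_article_url("en.wikipedia", "Albert Einstein",
                          "20240101", "20240107"))
```

The resulting URL can be fetched with any HTTP client; the response format and rate limits are described in the AQS documentation and are not assumed here.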
The list of scheduled manual maintenance tasks is documented here.
Please also refer to Analytics/Data Lake for more links to reference material.
- Webrequests [Traffic logs] and derived tables, including:
- MediaWiki raw databases
- EventLogging (in the event database in Hive)
- Edits history, Page history, User history
- Other reports
Please also refer to Analytics/Systems/Cluster for more reference information about the pipelines we manage.
- Traffic data
  - Webrequest, pageviews, and unique devices
- Edits data
  - Historical data about revisions, pages, and users (e.g. MediaWiki History)
- Content data
  - Wikitext (latest & historical) and wikidata-entities
- Events data
  - EventLogging, EventBus and event streams data (raw, refined, sanitized)
- ORES scores