Data Platform/Systems
(Redirected from Data Engineering/Systems)
These subpages document, in technical detail, the systems that process data for analytics at the Wikimedia Foundation, including their setup, maintenance, and architecture.
Child Pages of Data Platform/Systems
AQS · Airflow · Analytics Meta · Archiva · Bigtop Packages · Blunderbuss · Ceph · Cluster · Conda · Coordinator · DB Replica · Dashiki · DataHub · Data Quality · Data deletion and sanitization · Dealing with data loss alarms · Druid · Edit data loading · Edit history administration · Edit serving layer · Event Data retention · Exporting from HDFS to Swift · FerretDB · Geolocation · Gobblin · Hadoop · Hadoop Event Ingestion Lifecycle · Hive · Hive to Druid Ingestion Pipeline · Iceberg · Java · Jupyter · Kerberos · Maintenance Schedule · Managing systemd timers · Manual maintenance · Matomo · MediaWiki replicas · Mediawiki History Snapshot Check · Mediawiki history reduced algorithm · OpenSearch-on-K8s · Page and user history reconstruction · Page and user history reconstruction algorithm · PostgreSQL · Presto · R · Refine · Reportupdater · Revision augmentation and denormalization · Siege · Spark · Stat hosts · Superset · System users · Turnilo · Varnishkafka · Wikistats · Wikistats 2 · analytics.wikimedia.org · ua-parser
All Subpages of Data Platform/Systems
- AQS
- AQS/OpenAPI spec style guide
- AQS/Scaling
- AQS/Scaling/2016/Hardware Refresh
- AQS/Scaling/2017/Cluster Expansion
- AQS/Scaling/2020/Cluster Expansion
- AQS/Scaling/LoadTesting
- Airflow
- Airflow/Developer guide
- Airflow/Developer guide/Normalize a DAG
- Airflow/Developer guide/Python Job Repos
- Airflow/Instances
- Airflow/Kubernetes
- Airflow/Kubernetes/Administration
- Airflow/Kubernetes/Operations
- Airflow/Kubernetes/Operations/K8s-Migration
- Airflow/Upgrading
- Analytics Meta
- Archiva
- Bigtop Packages
- Blunderbuss
- Ceph
- Ceph/Troubleshooting
- Ceph/Upgrading
- Cluster
- Cluster/Geotagging
- Cluster/Hadoop/Load
- Cluster/OpenSearch
- Cluster/Spark History
- Conda
- Coordinator
- DB Replica
- Dashiki
- Dashiki/Configuration
- DataHub
- DataHub/Administration
- DataHub/Data Catalog Documentation Guide
- DataHub/Upgrading
- Data Quality
- Data deletion and sanitization
- Dealing with data loss alarms
- Druid
- Druid/Alerts
- Druid/Load test
- Edit data loading
- Edit history administration
- Edit serving layer
- Event Data retention
- Event Data retention/AppInstallId
- Exporting from HDFS to Swift
- FerretDB
- Geolocation
- Gobblin
- Hadoop
- Hadoop/Administration
- Hadoop/Alerts
- Hadoop/Test
- Hadoop Event Ingestion Lifecycle
- Hive
- Hive/Alerts
- Hive/Avro
- Hive/Compression
- Hive/Counting uniques
- Hive/Queries
- Hive/Queries/Wikidata
- Hive/Querying using UDFs
- Hive to Druid Ingestion Pipeline
- Iceberg
- Iceberg/Migration Dependencies
- Java
- Jupyter
- Jupyter/Administration
- Kerberos
- Kerberos/Administration
- Maintenance Schedule
- Managing systemd timers
- Manual maintenance
- Matomo
- MediaWiki replicas
- Mediawiki History Snapshot Check
- Mediawiki history reduced algorithm
- OpenSearch-on-K8s
- OpenSearch-on-K8s/Administration
- Page and user history reconstruction
- Page and user history reconstruction algorithm
- PostgreSQL
- PostgreSQL/Backup and Restore
- PostgreSQL/Clusters
- PostgreSQL/Operations
- Presto
- Presto/Administration
- Presto/Query Logger
- R
- Refine
- Refine/Deploy Refinery
- Refine/Deploy Refinery-source
- Reportupdater
- Revision augmentation and denormalization
- Siege
- Spark
- Spark/Administration
- Spark/Kubernetes
- Stat hosts
- Superset
- Superset/Administration
- Superset/Date functions
- System users
- Turnilo
- Varnishkafka
- Wikistats
- Wikistats/Deprecation of Wikistats 1
- Wikistats/Traffic
- Wikistats 2
- Wikistats 2/Map Component
- Wikistats 2/Metrics/FAQ
- Wikistats 2/Smoke Testing
- analytics.wikimedia.org
- ua-parser
- ua-parser/2019-09-18 Update