Category:Data pipelines

From Wikitech

Documentation of data ingestion and processing pipelines.

  • Includes documentation describing how specific datasets are derived or computed, for example: MediaWiki history computation (ingestion from DB, history rebuilding, computation of metrics, extraction onto other systems, ad-hoc querying).
  • Does not include documentation for the data platform infrastructure or system components that implement a given data pipeline, for example: Airflow, Gobblin.