User:Luke Bowmaker/Data Value Streams
Revision as of 14:44, 11 August 2022

Data Value Streams

  • TO DO: Populate table
Each value stream below lists its objectives, followed by its PM, EM, team board, and roadmap link.
Event Streams
  • Conclude the work from the Event Stream Experiments
  • Deliver a consolidated, enriched and ordered stream that is available to the community
  • Deliver a way for internal teams to query the current state of MediaWiki with a delay of 3-4 hours, removing the reliance on the monthly dumps
  • Deploy Flink to the new DSE k8s cluster as an experimental/development environment
  • Deploy Flink to a production multi-dc environment
  • Build tooling to support engineers who want to build event-driven services
  • Build event-driven data integration services that let teams remain agnostic of the underlying database architecture
  • Build a current state store to allow bootstrapping of services and a view of the current state of MediaWiki
PM: Luke | EM: Will | Board: #event-platform
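As illustration only, the "consolidated, ordered stream" objective amounts to merging several per-topic streams on an event timestamp. A minimal sketch, assuming each source stream is already ordered and events carry a `ts` field (the topic names and event shape here are hypothetical):

```python
import heapq

# Hypothetical per-topic event streams, each already ordered by timestamp.
edits = [
    {"ts": 1, "topic": "mediawiki.page-edit", "page": "A"},
    {"ts": 4, "topic": "mediawiki.page-edit", "page": "B"},
]
moves = [
    {"ts": 2, "topic": "mediawiki.page-move", "page": "C"},
    {"ts": 3, "topic": "mediawiki.page-move", "page": "D"},
]

def consolidate(*streams):
    """Lazily merge ordered streams into one stream ordered by timestamp."""
    return heapq.merge(*streams, key=lambda e: e["ts"])

merged = list(consolidate(edits, moves))
print([e["ts"] for e in merged])  # [1, 2, 3, 4]
```

Because `heapq.merge` is lazy, the same shape works for unbounded streams, not just lists; real enrichment and cross-datacenter ordering (the Flink work above) is of course far more involved.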
Data Pipelines & Services
  • Deliver a way for engineers, analysts and data users to create, deploy and test their own data pipelines.
    • Make Airflow multi-tenant, with APIs.
    • Deliver clear documentation on how to write, deploy and monitor pipelines.
    • Deliver APIs that let users hook their own development environments into Airflow.
  • Deliver a consistent and reliable Airflow experience to teams who need it.
    • Allow for creation of data pipelines that interact with our data, without Kerberos acting as a blocker.
    • Deploy Airflow to Kubernetes (ideally the DSE cluster).
    • Provide a CI/CD interface for deploying and monitoring data pipelines.
  • Find a sustainable way to continue migrating existing ETL jobs to Airflow while making progress on other initiatives.
  • Support the Structured Data team's implementation of SDAW grant work: Section Topics Data Pipeline (Q1) and Section Level Image Suggestions (Q2).
PM: Emil (Infra) / Luke (Services) | EM: Olja | Board: #Data-Pipelines | Roadmap: Data Pipeline work for SDAW
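At its core, the pipeline definition these deliverables describe is a set of tasks with dependencies, executed in dependency order. A minimal stdlib sketch (task names are hypothetical; a real deployment would use Airflow DAGs, not this):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: task name -> set of upstream tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# Resolve an execution order that respects every dependency.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['extract', 'transform', 'load', 'report']
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is the same validation an orchestrator performs when a DAG is registered.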

RESTBase Deprecation
  • Port API backends used by VisualEditor to core
  • Migrate AQS 2.0 to k8s production
  • Review services to understand required changes
  • Investigate and understand access control needs
  • Define a RESTBase end of life plan and fully deprecate RESTBase
PM: Luke | EM: Atieno | Board: #api_platform
Shared Data Infrastructure
  • Deploy the Data Science and Engineering Kubernetes cluster
    • Deploy a Kubernetes cluster using existing training-wing hardware.
    • Deploy a high-performance Ceph cluster for persistent-volume storage.
    • Expand the initial cluster with additional compute nodes.
  • Deploy a stateless pilot (Kubeflow or Flink?)
  • Deploy a stateful pilot (Data Warehouse)
  • Migrate JupyterHub
PM: Emil | EM: Olja | Board: #DSE(A)-Cluster
Metrics Platform
  • Develop and drive adoption of client libraries that generate Metrics Platform events.
  • Integrate feature-flag functionality into the Metrics Platform libraries.
  • Deliver a mechanism to run A/B tests using the Metrics Platform libraries.
PM: Emil | EM: Will | Board: #Metrics_Platform
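For the A/B-test and feature-flag objectives, the usual building block is deterministic bucketing: hash the (experiment, user) pair so a user always lands in the same group without storing per-user state. A minimal sketch under that assumption (function and experiment names are hypothetical, not the Metrics Platform API):

```python
import hashlib

def bucket(user_id: str, experiment: str,
           groups=("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment group.

    Hashing experiment:user keeps assignment stable across sessions
    and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[index]

# Same user + experiment always lands in the same group.
assert bucket("user-42", "new-search") == bucket("user-42", "new-search")
```

Feature flags fall out of the same mechanism: a flag is an experiment whose "treatment" group gates the feature.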
API Platform
PM: Desiree | EM: [Virginia]
Community Datasets (Dumps)
  • Automate the generation of dumps (using a data pipeline?)
  • Leverage the Event Stream Experiments to make dump generation more incremental?
PM: Emil? | EM: Will
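The incremental idea above is that instead of rebuilding a full dump each month, the previous snapshot is updated by replaying the event log since it was taken. A toy sketch of that replay, with hypothetical event and snapshot shapes:

```python
# Previous dump snapshot: page title -> content.
snapshot = {"PageA": "old text", "PageB": "stale"}

# Hypothetical events emitted since the snapshot was taken.
events = [
    {"type": "edit",   "page": "PageA", "text": "new text"},
    {"type": "create", "page": "PageC", "text": "brand new"},
    {"type": "delete", "page": "PageB"},
]

def apply_events(snapshot, events):
    """Produce the next snapshot by replaying events over the old one."""
    state = dict(snapshot)
    for e in events:
        if e["type"] in ("edit", "create"):
            state[e["page"]] = e["text"]
        elif e["type"] == "delete":
            state.pop(e["page"], None)
    return state

updated = apply_events(snapshot, events)
print(sorted(updated))  # ['PageA', 'PageC']
```

The hard parts in practice are exactly the Event Streams objectives above: a complete, ordered, deduplicated event stream to replay.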
Platform Engineering Reliability
  • Upgrade Thumbor to Python 3
  • Deploy Thumbor to Commons Beta and test
  • Deploy Thumbor to k8s
  • Tune Thumbor for prod traffic
  • Deprecate legacy Thumbor
PM: Desiree | EM: Mat | Board: #thumbor


  • 3 week sprints
  • Daily stand-ups (except Friday)
  • Retro after every sprint
  • Sprint Planning
  • Dependency mapping
  • Scrum of Scrums once a sprint?
  • Clinic/Ops session once a week