User:Luke Bowmaker/Data Value Streams: Difference between revisions

imported>Luke Bowmaker
(Updated event-platform Phab board links)
imported>Vpoundstone
m (fixed paragraph spacing)
 
Line 1: Line 1:
== Data Value Streams ==
== Value Streams ==
 
* TO DO: Populate table
{| class="wikitable"
{| class="wikitable"
|+
|+
Line 7: Line 5:
!Value Stream!!Objectives!!PM
!Value Stream!!Objectives!!PM
!EM
!EM
!Team
!Board
!Board
!Roadmap Link
!Roadmap Link
|-
|-
|'''Event Streams'''||
|'''Event Platform'''||
* Conclude the work from the Event Stream Experiments
* Conclude the work from the Event Stream Experiments
* Deliver a consolidated, enriched and ordered stream that is available to the community
* Deliver a consolidated, enriched and ordered stream that is available to the community
Line 22: Line 19:
|Luke
|Luke
|Will
|Will
|
|[[phab:tag/event-platform|#event-platform]]
|[[phab:tag/event-platform|#event-platform]]
|https://miro.com/app/board/uXjVOtmVf40=/
|https://miro.com/app/board/uXjVOtmVf40=/
Line 39: Line 35:
|Emil (Infra)/Luke (Services)
|Emil (Infra)/Luke (Services)
|Olja
|Olja
|
|[[phab:project/view/5616/|#Data-Pipelines]]
|[[phab:project/view/5616/|#Data-Pipelines]]
|Data Pipeline work for SDAW: https://miro.com/app/board/uXjVOk3r2pI=/?share_link_id=950082448064
|Data Pipeline work for SDAW: https://miro.com/app/board/uXjVOk3r2pI=/?share_link_id=950082448064
Line 52: Line 47:
|Luke
|Luke
|Atieno
|Atieno
|
|https://phabricator.wikimedia.org/project/view/5144/ (MediaWiki)
|[[phab:tag/api_platform/|#api_platform]]
[[phab:tag/api_platform/|#api_platform]] (AQS)
|https://miro.com/app/board/uXjVOuAghYU=/
|https://miro.com/app/board/uXjVOuAghYU=/
|-
|-
Line 66: Line 61:
|Emil
|Emil
|Olja
|Olja
|
|[[phab:project/board/5959/|#DSE(A)-Cluster]]
|[[phab:project/board/5959/|#DSE(A)-Cluster]]
|https://docs.google.com/presentation/d/1KfVkb5kgjARlM_WLfnc4g6_HEfNx0b4BNUcbnwMTmVI/edit?usp=sharing
|https://docs.google.com/presentation/d/1KfVkb5kgjARlM_WLfnc4g6_HEfNx0b4BNUcbnwMTmVI/edit?usp=sharing
Line 76: Line 70:
|Emil
|Emil
|Will
|Will
|
|[[phab:project/view/5324/|#Metrics_Platform]]  
|[[phab:project/view/5324/|#Metrics_Platform]]  
|https://miro.com/app/board/uXjVOmLYLsA=/?share_link_id=971253282885
|https://miro.com/app/board/uXjVOmLYLsA=/?share_link_id=971253282885
|-
|-
|API Platform|| ||Desiree [Virginia]
|'''API Platform'''|| Establish and adapt the practices of API Lifecycle Management
|
 
|
* Design API services based on user needs and use cases that align to WMF goals and strategy
|
* Build and test validated API Services
|
* Deliver secure and useful API services
* Maintain API services until deprecation
Develop an API Gateway to provide an interface that:
* routes public requests to internal services
 
* specifies rate-limits and access
* scales services for internal and external users
Launch an API Portal that is a self-service hub where internal and external developers can:
 
* browse available API services
* access API documentation
* share their projects, tips & tricks
 
Reporting & analytics capabilities that monitor API services so we can:
 
* diagnose and troubleshoot integration issues
* be alerted to potential issues so they can be addressed proactively
* produce specific functional and usage information crucial to data-driven decision making about applications and services
 
|Desiree [Virginia]
|Atieno
|[[phab:tag/api_platform/|#api_platform]]
|To do
|-
|-
|'''Community Datasets (Dumps)'''||
|'''Community Datasets (Dumps)'''||
Line 91: Line 106:
|Emil?
|Emil?
|Will
|Will
|
|
|
|
|
Line 103: Line 117:
|Desiree
|Desiree
|Mat
|Mat
|
|[[phab:tag/thumbor/|#thumbor]]
|[[phab:tag/thumbor/|#thumbor]]
|https://miro.com/app/board/uXjVOtKCKBc=/
|https://miro.com/app/board/uXjVOtKCKBc=/
|}
|}
 
*
 
==Rituals==
 
*3 week sprints
*Daily stand-ups (except Friday)
*Retro after every sprint
*Sprint Planning
*Dependency mapping
*Scrum of Scrums once a sprint?
*Clinic/Ops session once a week

Latest revision as of 17:18, 12 August 2022

Value Streams

Value Stream Objectives PM EM Board Roadmap Link
Event Platform
  • Conclude the work from the Event Stream Experiments
  • Deliver a consolidated, enriched and ordered stream that is available to the community
  • Deliver a way for internal teams to query the current state of MediaWiki with a delay of 3-4 hours, removing the reliance on the monthly dumps
  • Deploy Flink to the new DSE k8s cluster as an experimental/development environment
  • Deploy Flink to a production multi-dc environment
  • Build tooling to support Engineers who want to build event driven services
  • Build event driven data integration services that allow teams to be agnostic of the underlying database architecture
  • Build a current state store to allow bootstrapping of services and a view of the current state of MediaWiki
Luke Will #event-platform https://miro.com/app/board/uXjVOtmVf40=/
Data Pipelines & Services
  • Deliver a way for engineers, analysts and data users to create, deploy and test their own data pipelines.
    • Make Airflow multi-tenant, with APIs.
    • Deliver clear documentation on how to write, deploy and monitor pipelines.
    • Deliver APIs that let users hook their own development environments into Airflow.
  • Deliver a consistent and reliable Airflow experience to teams who need it.
    • Allow for creation of data pipelines that interact with our data, without Kerberos acting as a blocker.
    • Deploy Airflow to K8s (ideally DSE)
    • Provide a CI/CD interface for deploying and monitoring data pipelines.
  • Find a sustainable way to continue migrating existing ETL jobs to Airflow, while making progress on other initiatives.
  • Support the Structured Data team's implementation of SDAW grant work: Section Topics Data Pipeline (Q1) and Section Level Image Suggestions (Q2)
Emil (Infra)/Luke (Services) Olja #Data-Pipelines Data Pipeline work for SDAW: https://miro.com/app/board/uXjVOk3r2pI=/?share_link_id=950082448064

https://miro.com/app/board/uXjVOmLYLsA=/?share_link_id=971253282885

RESTBase Deprecation
  • Port API Backends used by Visual Editor to CORE
  • Migrate AQS 2.0 to k8s production
  • Review services to understand required changes
  • Investigate and understand access control needs
  • Define a RESTBase end of life plan and fully deprecate RESTBase
Luke Atieno https://phabricator.wikimedia.org/project/view/5144/ (MediaWiki)

#api_platform (AQS)

https://miro.com/app/board/uXjVOuAghYU=/
Shared Data Infrastructure
  • Deploy Data Science and Engineering Kubernetes Cluster
    • Deploy a K8s cluster using existing training wing hardware.
    • Deploy a high-performance Ceph cluster for persistent volume storage.
    • Expand initial cluster with additional compute nodes.
  • Deploy a stateless pilot (Kubeflow or Flink?)
  • Deploy a stateful pilot (Data Warehouse)
  • Migrate JupyterHub
Emil Olja #DSE(A)-Cluster https://docs.google.com/presentation/d/1KfVkb5kgjARlM_WLfnc4g6_HEfNx0b4BNUcbnwMTmVI/edit?usp=sharing
Metrics Platform
  • Develop and drive adoption of client libraries to generate MP events.
  • Integrate feature flag functionality into the Metrics Platform libraries
  • Deliver a mechanism to run A/B tests using MP libraries.
Emil Will #Metrics_Platform https://miro.com/app/board/uXjVOmLYLsA=/?share_link_id=971253282885
API Platform
Establish and adapt the practices of API Lifecycle Management
  • Design API services based on user needs and use cases that align to WMF goals and strategy
  • Build and test validated API Services
  • Deliver secure and useful API services
  • Maintain API services until deprecation

Develop an API Gateway to provide an interface that:

  • routes public requests to internal services
  • specifies rate-limits and access
  • scales services for internal and external users

Launch an API Portal that is a self-service hub where internal and external developers can:

  • browse available API services
  • access API documentation
  • share their projects, tips & tricks

Reporting & analytics capabilities that monitor API services so we can:

  • diagnose and troubleshoot integration issues
  • be alerted to potential issues so they can be addressed proactively
  • produce specific functional and usage information crucial to data-driven decision making about applications and services
Desiree [Virginia] Atieno #api_platform To do
Community Datasets (Dumps)
  • Automate the generation of Dumps (Using a data pipeline?)
  • Leverage the events experiments to make it more incremental?
Emil? Will
Platform Engineering Reliability
  • Upgrade Thumbor to Py3
  • Deploy Thumbor to Commons Beta and test
  • Deploy Thumbor to k8s
  • Tune Thumbor for prod traffic
  • Deprecate legacy Thumbor
Desiree Mat #thumbor https://miro.com/app/board/uXjVOtKCKBc=/