Event Platform
{{Navigation Event Platform}}
The Wikimedia Event Platform exists to enable software and systems engineers to build loosely coupled software systems with [https://en.wikipedia.org/wiki/Event-driven_architecture event-driven architectures].  Systems built this way can be easily distributed and decoupled, allowing for inherent horizontal scalability and versatility with respect to yet-unknown use cases.


== Philosophy and data model ==
In typical web stacks, when data changes occur, those changes are reflected as an update to a state store.  For example, if a user renames a Wikipedia page, that page's record in a database table would have its title field updated with the new value.  Modeling data this way works if all you care about is the current state.  But by keeping only the latest state, we discard a huge piece of useful information: the history of changes.
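As an illustration, here is a minimal Python sketch of the state-store model described above (the record and field names are hypothetical):

<syntaxhighlight lang="python">
# State-store model: the rename is applied as an in-place update,
# so the previous title is discarded.
page = {"page_id": 123, "title": "title_A"}

# A user renames the page...
page["title"] = "title_B"

# ...and nothing records that the page was ever called "title_A".
print(page)  # {'page_id': 123, 'title': 'title_B'}
</syntaxhighlight>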


Instead of updating a database when a state change happens, we can choose to model the state change as what it is: an 'event'.  An event is something happening at a specific time, e.g. 'page id 123 was renamed from title_A to title_B by user X at 2020-06-25T12:34:00Z'.  If we keep the history of all events, we can always use them to recreate the current state, as well as the state at any point in the past.  An event-based data model decouples the occurrence of an event from any downstream state changes.  Multiple consumers can be notified of the event's occurrence, which enables engineers to build new systems based on the events without interfering with the producers or other consumers of those events.
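Contrast this with the state-store sketch above: a minimal Python sketch of the event-based model, where keeping the full history lets us reconstruct the state as of any point in time.  The event fields are illustrative and do not follow any real Event Platform schema.

<syntaxhighlight lang="python">
# Each state change is recorded as an immutable event rather than an
# in-place update.  Field names here are hypothetical.
events = [
    {"page_id": 123, "old_title": None, "new_title": "title_A",
     "user": "X", "dt": "2020-06-01T00:00:00Z"},
    {"page_id": 123, "old_title": "title_A", "new_title": "title_B",
     "user": "X", "dt": "2020-06-25T12:34:00Z"},
]

def titles_as_of(events, as_of):
    """Replay the event history to reconstruct page titles at a given time.

    ISO 8601 UTC timestamps in the same format compare correctly as strings.
    """
    state = {}
    for event in events:
        if event["dt"] <= as_of:
            state[event["page_id"]] = event["new_title"]
    return state

print(titles_as_of(events, "2020-06-10T00:00:00Z"))  # {123: 'title_A'}
print(titles_as_of(events, "2020-07-01T00:00:00Z"))  # {123: 'title_B'}
</syntaxhighlight>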


== Background reading ==
* [https://martinfowler.com/articles/201701-event-driven.html Martin Fowler - What do you mean by "Event-Driven"]
* [https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/ Confluent - Turning the Database Inside Out]
* [https://www.confluent.io/blog/event-driven-2-0/ Confluent - Event Driven 2.0]
* [https://assets.confluent.io/m/7a91acf41502a75e/original/20180328-EB-Confluent_Designing_Event_Driven_Systems.pdf Confluent - Designing Event Driven Systems]
* [https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/ Confluent - The Data Dichotomy: Rethinking the Way We Treat Data and Services]
* [https://www.confluent.io/blog/stream-data-platform-1/ Confluent - Stream data platform 1]
* [https://www.confluent.io/blog/messaging-single-source-truth/ Confluent - Messaging as the Single Source of Truth]
* [https://www.confluent.io/blog/build-services-backbone-events/ Confluent - Build Services on a Backbone of Events]
* [https://medium.com/@hugo.oliveira.rocha/what-they-dont-tell-you-about-event-sourcing-6afc23c69e9a What they don't tell you about event sourcing]
* [https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/ Confluent - Kafka Streams: Stream Processing Made Simple] (This is about Kafka Streams, but the discussion of streams vs. tables and stateful streaming is good.)
* [https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/ WMF Tech Blog - Wikimedia’s Event Data Platform] ([https://techblog.wikimedia.org/2020/09/10/wikimedias-event-data-platform-or-json-is-ok-too/ part 1], [https://techblog.wikimedia.org/2020/09/17/wikimedias-event-data-platform-json-event-schemas/ part 2], [https://techblog.wikimedia.org/2020/09/24/wikimedias-event-data-platform-event-intake/ part 3])
* [https://www.confluent.io/events/current-2022/wikipedias-event-data-platform-or-json-is-okay-too/ Wikipedia’s Event Data Platform, Or: JSON Is Okay Too], Andrew Otto, Confluent Current 2022 ([[c:File:Current_2022_-_Wikimedia's_Event_Data_Platform,_or_JSON_is_okay_too.pdf|slides]])


== Event Platform generic concepts ==
{| class="wikitable"
|-
| '''event''' || A strongly typed and schema-ed piece of data, usually representing something happening at a definite time.  E.g. 'revision create', 'user button click', 'page view', etc.
|-
| '''event schema''' (AKA 'schema') || The datatype of an event.  The event schema describes the data model of any given event and is used for validation upon receipt of events, as well as for data storage integration (an event schema can be mapped to an RDBMS schema, etc.).  An event schema is just like a programming data type.
|-
| '''event stream''' (AKA 'stream') || A continuous collection of events (loosely) ordered by time.
|-
| '''event bus''' || A publish & subscribe (PubSub) message queue where events are initially produced to and consumed from.  WMF uses [https://kafka.apache.org/ Apache Kafka].  (NOTE: here 'event bus' is meant to be a generic term and is not referencing the MW EventBus extension).
|-
| '''event producer''' || Producers create events and produce them to specific event streams in the event bus.
|-
| '''event consumer''' || Consumers consume events from specific event streams in the event bus.
|-
| '''event intake service''' || An HTTP service that accepts events, validates them against their schema, and produces them to a specific event stream in the event bus.  WMF uses [[Event Platform/EventGate|EventGate]].  (A sketch of producing an event to such a service appears below this table.)
|-
| [https://en.wikipedia.org/wiki/Event_stream_processing '''Event stream processing'''] || Refers to any software that consumes, transforms, and produces event streams.  This includes simple event processing, as well as [https://en.wikipedia.org/wiki/Complex_event_processing complex event processing] and [https://docs.confluent.io/platform/current/streams/concepts.html#stateful-stream-processing stateful stream processing].  This is usually done using a distributed framework of some kind, e.g. [https://flink.apache.org/ Apache Flink], [https://spark.apache.org/ Apache Spark], or [https://kafka.apache.org/documentation/streams/ Kafka Streams], but also includes simpler homegrown technologies like [[Changeprop|Change-propagation]].
|}
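Tying these concepts together, the sketch below shows a hypothetical producer submitting one event to an intake service over HTTP.  The schema URI, stream name, and endpoint URL are placeholder assumptions, not real endpoints, and the event's field layout is illustrative; see [[Event Platform/EventGate|EventGate]] for the actual intake API.

<syntaxhighlight lang="python">
import requests  # third-party HTTP client: pip install requests

# A hypothetical event.  An event is strongly typed: here it declares
# the schema it conforms to and the stream it belongs to.  These values
# are placeholders, not a real Event Platform schema or stream.
event = {
    "$schema": "/example/button_click/1.0.0",  # placeholder schema URI
    "meta": {
        "stream": "example.button_click",      # placeholder stream name
        "dt": "2022-11-15T16:52:00Z",          # event timestamp
    },
    "button_id": "save",
}

# The intake service validates the event against its schema and, if it
# is valid, produces it to the named stream in the event bus (Kafka).
response = requests.post(
    "https://intake.example.org/v1/events",    # placeholder endpoint
    json=[event],                              # a batch of one event
)
response.raise_for_status()
</syntaxhighlight>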


== Platform Architecture Diagram ==
[[File:Modern Event Platform Architecture Diagram.jpeg|400px]]


== History ==


The first event platform at WMF was [[EventLogging]].  This system originally used ZeroMQ to transport messages between its various services, but was later improved to use Kafka.  It was built to collect client-side performance measures and click tracking data, and has also been adopted in our mobile apps.  It used a Meta-Wiki (meta.wikimedia.org) namespace to store its schemas, which were then referenced by stable revision ID to validate incoming events before producing them to the primary ZeroMQ/Kafka topics for consumption.


The original [[EventLogging]] with ZeroMQ eventually reached a scaling problem in 2014. It ran from a single server without automatic failover and did not support multiple <code>eventlogging-processor</code> servers running concurrently, sustaining around 1000 events per second.<ref>[https://docs.google.com/presentation/d/1dmifA6lGl9k6mRZPjdCBiH2ebzRFvhTuBPFONoQ3xQE/edit#slide=id.g468cc95171_0_97 Event Infrastructure at WMF (2007-2018) by Andrew Otto] (WMF-restricted)</ref>  In 2014, EventLogging intake was migrated to Kafka, with the eventlogging-processor instances able to scale horizontally to multiple servers, and with Kafka naturally distributing traffic and failing over as needed.


EventLogging was designed for instrumenting features for telemetry, tracking interactions and recording measurements to give us insights into how features are used by real users. The system did not have built-in support for responding to incoming events or taking actions in response to them. Such larger pipelines were supported using the python-eventlogging client and deploying a separate microservice based on a minimal template.


In 2015, an effort was made to extend the analytics focus of EventLogging to "from and to production" event-driven pipelines.  This effort was dubbed "EventBus" and culminated in three new components: the EventBus extension for MediaWiki, the <code>mediawiki/event-schemas</code> Git repository, and <code>eventlogging-service-eventbus</code>.  eventlogging-service-eventbus was the internal frontend that accepted HTTP POST requests.  It validated events against more tightly controlled production schemas and produced them to Kafka. EventBus was used to build the ChangeProp service. We originally intended to merge the analytics and production uses of EventLogging.


In 2018, we started the "Modern Event Platform" program, which included EventBus's original analytics+production unification goal, as well as rebuilding other parts of WMF's event processing stack using open source (non-homegrown) components where possible.  The EventLogging Python codebase was specific to WMF and MediaWiki, making it difficult to accomplish this unification, so it was decided to build a new, more generic and extensible JSONSchema event service, eventually named [[Event Platform/EventGate|EventGate]].
 
In 2019, EventGate, along with other Modern Event Platform components, replaced eventlogging-service-eventbus, and is intended to eventually replace the 'analytics' deployment of EventLogging services (e.g. eventlogging-processor).
 
Today, events logged via EventLogging don't go to a [[:en:MySQL|MySQL]] database; instead, data analysts can find the data in [[:en:Apache_Hadoop|Hadoop]], which enables a greater volume of events to be stored.
 
In 2022, the "[[mw:Platform_Engineering_Team/Event_Platform_Value_Stream|Event Platform Value Stream]]" working group was created and tasked with working on the Stream Connectors and Stream Processing components.  Evolution of these components is being driven by work to improve the ability to externalize the current state of MediaWiki pages using event streams.
 
== References ==
<references />

* [https://phabricator.wikimedia.org/T185233 T185233 Modern Event Platform]


[[Category:Services]]
[[Category:Event Platform]]
