Difference between revisions of "Event Platform/Stream Configuration"

Latest revision as of 13:52, 19 October 2021


Stream configuration refers to settings that distributed producers or consumers of a stream might need, e.g. the sampling rate or the schema title of the events that are allowed in the stream. Stream configuration was originally a feature requested for Product engineers, so that they could vary event stream producer settings without deploying code. It has since become a critical part of Event Platform, used by multiple services.

EventStreamConfig

EventStreamConfig is a MediaWiki extension that provides PHP and HTTP APIs for requesting stream configuration. Stream configuration entries are declared in the $wgEventStreams global in mediawiki-config.

This centralized EventStreamConfig is used by several services to automate discovery and configuration of stream producer and consumer clients:

  • EventGate service clusters use stream config to restrict which types of events are allowed in which streams via the schema_title setting.
  • The MediaWiki EventLogging extension uses stream config to vary things like event stream sampling rate.
  • The Analytics Cluster uses stream config to automate ingestion of streams into Hive.
  • EventStreams uses stream config to discover streams and auto-generate OpenAPI docs.

Usage

See the EventStreamConfig README.

Common Settings Documentation

In lieu of a better place, we'll try to document some of the common stream config settings here.

stream

wgEventStreams is keyed by stream name. The stream name is also available as the stream setting in API results.
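
As a rough illustration of the relationship between the key and the stream setting, the sketch below copies each key into its entry. The stream names, schema titles, and the helper with_stream_names are hypothetical and are not taken from the EventStreamConfig extension.

```python
# Hypothetical stream config, keyed by stream name as wgEventStreams is.
# The stream names and schema titles are made up for illustration.
EVENT_STREAMS = {
    "example.button_click": {"schema_title": "example/button_click"},
    "example.page_view": {"schema_title": "example/page_view"},
}


def with_stream_names(event_streams):
    """Return a copy in which every entry also carries its key as `stream`."""
    return {
        name: {**settings, "stream": name}
        for name, settings in event_streams.items()
    }


results = with_stream_names(EVENT_STREAMS)
print(results["example.page_view"]["stream"])
```

A consumer holding a single entry can then read its stream name without also keeping track of the key it was found under.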

schema_title

This must exactly match the title of the event JSONSchema that is allowed in this stream.
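
A minimal sketch of such an exact-match check follows. It assumes the schema is available as a parsed dict with a title field; the function name schema_allowed_in_stream and the sample values are hypothetical, and this is not how EventGate itself is implemented.

```python
def schema_allowed_in_stream(stream_config, schema):
    """Return True only if the schema title equals schema_title exactly.

    The comparison is a plain string equality: no case folding, trimming,
    or prefix matching is applied.
    """
    expected = stream_config.get("schema_title")
    return expected is not None and schema.get("title") == expected


# Hypothetical stream config entry and schemas.
config = {"schema_title": "example/page_view"}

print(schema_allowed_in_stream(config, {"title": "example/page_view"}))
print(schema_allowed_in_stream(config, {"title": "Example/Page_View"}))
```

With these inputs the first call returns True and the second returns False, since the titles differ in case.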

destination_event_service

This refers to the name of the EventGate HTTP event intake service the stream should be produced through. Producer clients use this to figure out where to send the stream. The EventGate services also use this to determine if a stream is allowed to be produced through them.

NOTE: This should one day be moved into a producers config subobject.
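
The two uses described above, a producer picking an intake endpoint and a service deciding whether to accept a stream, could be sketched as follows. The service names, URLs, and function names are hypothetical placeholders, not real EventGate deployments or APIs.

```python
# Hypothetical mapping from event service name to intake URL.
INTAKE_URLS = {
    "eventgate-example-a": "https://intake-a.example.org/v1/events",
    "eventgate-example-b": "https://intake-b.example.org/v1/events",
}


def intake_url_for(stream_config):
    """Producer side: find where events for this stream should be sent."""
    return INTAKE_URLS[stream_config["destination_event_service"]]


def service_accepts(service_name, stream_config):
    """Service side: accept a stream only if it names this service."""
    return stream_config.get("destination_event_service") == service_name


# Hypothetical stream config entry.
config = {
    "schema_title": "example/page_view",
    "destination_event_service": "eventgate-example-a",
}

print(intake_url_for(config))
print(service_accepts("eventgate-example-b", config))
```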

canary_events_enabled

This aids in monitoring ingestion pipelines for event streams. If this is true (the default if not set), artificial canary events will periodically be produced into the stream. The canary events are created from the first event example in the schema, but with meta.dt set to the current timestamp and meta.domain set to "canary". Consumers of streams with canary_events_enabled: true should filter out all events where meta.domain == "canary".
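
The behavior described above can be sketched from both sides: building a canary event from a schema's first example, and filtering canary events out on the consumer side. The function names are hypothetical, the schema is assumed to be a parsed dict with an examples list, and the ISO 8601 timestamp format is an assumption rather than the format the real canary producer uses.

```python
import copy
from datetime import datetime, timezone


def canary_events_enabled(stream_config):
    """The setting defaults to true when it is not present."""
    return stream_config.get("canary_events_enabled", True)


def make_canary_event(schema):
    """Build a canary event from the first example in the schema."""
    event = copy.deepcopy(schema["examples"][0])
    meta = event.setdefault("meta", {})
    # Assumption: an ISO 8601 UTC timestamp is acceptable for meta.dt.
    meta["dt"] = datetime.now(timezone.utc).isoformat()
    meta["domain"] = "canary"
    return event


def without_canaries(events):
    """Consumer side: drop every event whose meta.domain is "canary"."""
    return [
        event for event in events
        if event.get("meta", {}).get("domain") != "canary"
    ]


# Hypothetical schema with a single example event.
schema = {
    "title": "example/page_view",
    "examples": [
        {"meta": {"dt": "2021-01-01T00:00:00Z", "domain": "example.org"}},
    ],
}

canary = make_canary_event(schema)
real = {"meta": {"dt": "2021-10-19T13:52:00Z", "domain": "example.org"}}
print(len(without_canaries([canary, real])))
```

The deep copy keeps the schema's own example untouched, so the same schema object can be reused for later canary events.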

consumers and producers

These subobject config settings should be used to configure specific clients that produce or consume this stream. The keys in each subobject should be the names of the clients. Clients look up their configuration from the API by this name.

As of 2021-09, this is only used for the Analytics Hadoop ingestion pipeline. See also https://phabricator.wikimedia.org/T273235.
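
A client looking up its own settings by name might look roughly like the sketch below. The client name example_ingestion, its enabled setting, and the helper client_settings are hypothetical and only illustrate the lookup by role and client name.

```python
def client_settings(stream_config, role, client_name):
    """Return the settings for one named client, or {} if none are declared.

    `role` is either "consumers" or "producers".
    """
    return stream_config.get(role, {}).get(client_name, {})


# Hypothetical stream config entry with one named consumer client.
config = {
    "schema_title": "example/page_view",
    "consumers": {
        "example_ingestion": {"enabled": True},
    },
}

print(client_settings(config, "consumers", "example_ingestion"))
print(client_settings(config, "producers", "example_ingestion"))
```

Returning an empty dict for unknown clients lets a client fall back to its own defaults when the stream declares nothing for it.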