Event Platform/Stream Configuration: Difference between revisions
imported>Ottomata |
imported>Ottomata (Added Common Settings Documentation) |
Revision as of 13:51, 3 September 2021
Stream configuration refers to configuration that distributed producers or consumers of a stream might want, e.g. the sampling rate or the schema title of the events that are allowed in the stream. Stream configuration was originally a requested feature of Event Platform for Product engineers, so they could more easily vary some event stream producer setting without having to do code deploys. It has since become a critical part of Event Platform, used by multiple services.
EventStreamConfig
EventStreamConfig is a MediaWiki extension that implements PHP and HTTP APIs for requesting stream configuration. Stream configuration entries are declared in the $wgEventStreams global list in mediawiki-config.
This centralized EventStreamConfig is used by several services to automate discovery and configuration of stream producer and consumer clients:
- EventGate service clusters use stream config to restrict which types of events are allowed in which streams via the schema_title setting.
- The MediaWiki EventLogging extension uses stream config to vary things like event stream sampling rate.
- The Analytics Cluster uses stream config to automate ingestion of streams into Hive.
- EventStreams uses stream config to discover streams and auto-generate OpenAPI docs.
Usage
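See the EventStreamConfig README (https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/EventStreamConfig/#mediawiki-config).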
Common Settings Documentation
In lieu of a better place, we'll try to document some of the common stream config settings here.
stream
The name of the stream.
schema_title
This must exactly match the title of the event JSONSchema that is allowed in this stream.
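As a rough illustration of the exact-match rule, the sketch below compares an event schema's title against a stream's schema_title. The function name, the stream name, and the schema title are hypothetical, and this is not how EventGate itself is implemented; it only shows that the comparison is a strict string equality.

```python
def is_schema_allowed(schema_title: str, stream_config: dict) -> bool:
    """Return True only if the schema's title equals the stream's schema_title exactly."""
    # A missing schema_title yields None, which never equals a string title.
    return schema_title == stream_config.get("schema_title")


# Hypothetical stream config entry; the values are illustrative only.
example_config = {"stream": "example.stream", "schema_title": "example/event"}

print(is_schema_allowed("example/event", example_config))  # exact match
print(is_schema_allowed("Example/Event", example_config))  # differs in case, so rejected
```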
destination_event_service
This refers to the name of the EventGate HTTP event intake service the stream should be produced through. Producer clients use this to figure out where to send the stream. The EventGate services also use this to determine if a stream is allowed to be produced through them.
NOTE: This should one day be moved into a producers config subobject.
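The two uses of destination_event_service described above can be sketched as follows. This is a minimal sketch under stated assumptions: the stream names, service names, and URLs are all hypothetical (the .invalid hosts are not real endpoints), and the mapping from service name to URL is assumed to be something the producer client already knows.

```python
# Hypothetical stream config entries; names and values are illustrative only.
stream_configs = [
    {"stream": "example.stream_a", "destination_event_service": "eventgate-example-1"},
    {"stream": "example.stream_b", "destination_event_service": "eventgate-example-2"},
]

# Hypothetical mapping from service name to intake URL (not real endpoints).
service_urls = {
    "eventgate-example-1": "https://eventgate-example-1.invalid/v1/events",
    "eventgate-example-2": "https://eventgate-example-2.invalid/v1/events",
}


def intake_url_for(stream_name, configs, urls):
    """Producer side: find the URL that a stream's events should be sent to."""
    for config in configs:
        if config["stream"] == stream_name:
            return urls[config["destination_event_service"]]
    raise KeyError(f"no stream config for {stream_name!r}")


def service_accepts(service_name, stream_name, configs):
    """Service side: accept a stream only if its config names this service."""
    return any(
        c["stream"] == stream_name
        and c.get("destination_event_service") == service_name
        for c in configs
    )
```

A producer would call intake_url_for("example.stream_a", stream_configs, service_urls) before sending, while a service named "eventgate-example-2" would reject that same stream because service_accepts returns False for it.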
canary_events_enabled
This aids in monitoring ingestion pipelines for event streams. If this is true (the default if not set), artificial canary events will periodically be produced into the stream. The canary events are created from the first event example in the schema, but with meta.dt set to the current timestamp and with meta.domain: "canary". Consumers of streams with canary_events_enabled: true should filter out all events where meta.domain == "canary".
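A consumer-side filter for canary events might look like the sketch below. It assumes events have already been decoded into Python dicts; the function names and the sample events are hypothetical, and only the meta.domain == "canary" test comes from the description above.

```python
def is_canary(event: dict) -> bool:
    """Return True if the event is an artificial canary event."""
    # Events without a meta object or a domain are treated as real events.
    return event.get("meta", {}).get("domain") == "canary"


def drop_canary_events(events):
    """Return only the events that are not canary events, preserving order."""
    return [event for event in events if not is_canary(event)]


# Hypothetical sample events; the field values are illustrative only.
sample_events = [
    {"meta": {"domain": "canary", "dt": "2021-09-03T13:51:00Z"}},
    {"meta": {"domain": "example.org", "dt": "2021-09-03T13:51:01Z"}},
]

real_events = drop_canary_events(sample_events)
```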
consumers and producers
These subobject config settings should be used to configure specific clients that produce or consume this stream. Each key in the subobject should be the name of a client. Clients look up their configuration from the API by this name.
As of 2021-09, this is only used for the Analytics Hadoop ingestion pipeline. See also https://phabricator.wikimedia.org/T273235.
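The lookup by client name can be sketched as below, assuming the stream config has already been fetched from the API and decoded into a dict. The client name, the setting inside it, and the helper function are hypothetical; the source only states that the subobject keys are client names.

```python
def client_settings(stream_config: dict, role: str, client_name: str) -> dict:
    """Return the settings for one named client, or an empty dict if none are set.

    role is either "consumers" or "producers".
    """
    if role not in ("consumers", "producers"):
        raise ValueError(f"unknown role: {role!r}")
    return stream_config.get(role, {}).get(client_name, {})


# Hypothetical stream config; the client name and its setting are illustrative only.
example_stream_config = {
    "stream": "example.stream",
    "consumers": {
        "example_ingestion_job": {"enabled": True},
    },
}

settings = client_settings(example_stream_config, "consumers", "example_ingestion_job")
```

Returning an empty dict for an unknown client lets each client fall back to its own defaults when the stream config says nothing about it.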