{{Navigation Event Platform}}
See [[EventStreams]] for an overview of the EventStreams service.

EventStreams is a service-template-node based service. It glues together KafkaSSE with common Wikimedia service features, like logging, error reporting, metrics, configuration and deployment.

Internally, EventStreams is available at <tt>eventstreams.svc.${::site}.wmnet</tt>.  It is routed to by varnish and LVS from <tt>stream.wikimedia.org</tt>.
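For a quick manual check, a stream can be consumed directly with curl. This is only an illustrative sketch: the <tt>/v2/stream/recentchange</tt> path is the same one used by the monitoring check described below, and the internal service port is deliberately left as a placeholder.

<syntaxhighlight lang="bash">
# Consume the recentchange stream from the public endpoint (Server-Sent Events).
# -N disables buffering so events print as they arrive; head stops after a few lines.
curl -s -N https://stream.wikimedia.org/v2/stream/recentchange | head -n 20

# From inside the production network the service can be reached directly.
# The port is omitted on purpose -- check the service/LVS configuration.
# curl -s -N http://eventstreams.svc.eqiad.wmnet:<port>/v2/stream/recentchange
</syntaxhighlight>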


EventStreams in production is configured and deployed using WMF's [[Deployment pipeline]].


 
== Configuration ==
EventStreams is configured in the [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master operations/deployment-charts] repository.
Configuration is spread between the defaults in charts/eventstreams and the production-specific configuration in helmfile.d.


helmfile.d values.yaml files contain the list of <tt>allowed_streams</tt> that EventStreams will expose. The actual Kafka topics for each stream are retrieved from [[Event_Platform/Stream_Configuration|EventStreamConfig]]. Our event topics are prefixed by datacenter name; this is abstracted away from EventStreams consumers via the stream -> topic mapping. For example, the <tt>revision-create</tt> stream is composed of the datacenter-prefixed topics:

<syntaxhighlight lang="yaml">
      revision-create:
        description: |-
          Mediawiki Revision create events.
          Schema: https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki/revision/create
        topics:
          - eqiad.mediawiki.revision-create
          - codfw.mediawiki.revision-create
</syntaxhighlight>
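The <tt>allowed_streams</tt> list itself is much simpler. A hypothetical sketch is below; the key names and nesting are assumptions, so check charts/eventstreams and the existing production values files for the real structure:

<syntaxhighlight lang="yaml">
# Hypothetical sketch only -- key names and nesting are assumptions.
main_app:
  allowed_streams:
    - recentchange
    - revision-create
</syntaxhighlight>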


== Kafka ==
EventStreams is backed by the <tt>main</tt> Kafka clusters. As of 2018-08, EventStreams is multi-DC capable.  EventStreams in eqiad consumes from the Kafka main-eqiad cluster, and EventStreams in codfw consumes from the Kafka main-codfw cluster.  [[Kafka/Administration#MirrorMaker|Kafka MirrorMaker]] is responsible for mirroring the topics from eqiad to codfw and vice versa.
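To manually confirm that both datacenter-prefixed variants of a topic exist on a main Kafka cluster, something like the following can be used. The broker hostname here is illustrative, not authoritative:

<syntaxhighlight lang="bash">
# List cluster metadata and filter for the DC-prefixed variants of a topic.
# Substitute a real main-eqiad (or main-codfw) broker for the hostname below.
kafkacat -L -b kafka-main1001.eqiad.wmnet:9092 | grep 'mediawiki.revision-create'
</syntaxhighlight>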


=== NodeJS Kafka Client ===
KafkaSSE uses [https://github.com/Blizzard/node-rdkafka node-rdkafka] (as do other production NodeJS services that use Kafka).


== Repositories ==


{| class="wikitable"
! Repository !! Description
|-
| [https://github.com/wikimedia/kafkasse KafkaSSE] (github) || Generic Kafka Consumer -> SSE NodeJS library.
|-
| eventstreams (github) || EventStreams implementation using KafkaSSE and service-template-node.
|-
| [https://gerrit.wikimedia.org/r/plugins/gitiles/operations/deployment-charts/+/refs/heads/master operations/deployment-charts] || Helm chart repository for all production Kubernetes based services, including EventStreams.
|}


== Deployment ==
See: [[Deployments_on_kubernetes|Deployments on kubernetes]]
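As a rough sketch (the linked page is authoritative, and paths and environment names may have changed), a chart/config deployment from the deployment host looks roughly like this:

<syntaxhighlight lang="bash">
# Run on the deployment host once the operations/deployment-charts change is merged.
# Paths and environment names are illustrative.
cd /srv/deployment-charts/helmfile.d/services/eventstreams
helmfile -e staging -i apply   # verify in staging first
helmfile -e eqiad -i apply
helmfile -e codfw -i apply
</syntaxhighlight>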


== Submitting changes ==


=== Change to KafkaSSE library ===
KafkaSSE is [https://github.com/wikimedia/kafkasse hosted on GitHub], so you must either submit a pull request or push a change there.


kafka-sse is an npm dependency of EventStreams.

If you update kafka-sse, you should bump the package version and publish to npm: https://www.npmjs.com/package/kafka-sse
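A typical publish flow looks roughly like this (sketch only; the version bump type depends on the change):

<syntaxhighlight lang="bash">
# In a kafka-sse working copy, after the change has landed on the main branch:
npm version patch   # or minor / major, depending on the change
npm publish         # requires publish rights on the kafka-sse npm package
git push && git push --tags
</syntaxhighlight>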


=== Change to mediawiki/services/eventstreams repository ===
EventStreams is [https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/eventstreams hosted in gerrit].  Use git review to submit patches.  If you've modified the KafkaSSE repository, you should update the kafka-sse dependency version in package.json.  Merged changes in this repository will result in a new Docker image being built.
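For example (a sketch; the version is a placeholder for the kafka-sse release you just published):

<syntaxhighlight lang="bash">
# In a mediawiki/services/eventstreams working copy:
npm install --save kafka-sse@<new-version>   # placeholder -- use the published version
git add package.json                         # also add the lock file if the repo tracks one
git commit -m "Bump kafka-sse dependency"
git review                                   # submit to gerrit
</syntaxhighlight>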


=== Update operations/deployment-charts repository ===
Once a new Docker image has been made, you'll need to update the <tt>image_version</tt> in helmfile.d eventstreams values.yaml.
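The newly built image tag can be looked up in the Docker registry. The image repository name below is an assumption, so confirm the exact name the pipeline uses for eventstreams:

<syntaxhighlight lang="bash">
# List available image tags (image name is an assumption -- verify it):
curl -s https://docker-registry.wikimedia.org/v2/wikimedia/mediawiki-services-eventstreams/tags/list | jq .
</syntaxhighlight>

Set the chosen tag as <tt>image_version</tt> in the values.yaml and submit the change with git review.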


=== Deploy ===
See: [[Deployments_on_kubernetes#Code_deployment/configuration_changes]]


<syntaxhighlight lang="bash">
ssh deployment.eqiad.wmnet # or deployment-tin.deployment-prep.eqiad.wmflabs
cd /srv/deployment/eventstreams/deploy
git pull && git submodule update
scap deploy
</syntaxhighlight>


 
== Logs ==
Logs are sent to logstash. You can [https://logstash.wikimedia.org/goto/a1e376b072956aa5c29cc5436ece63ea view them in Kibana].


== Metrics ==
https://grafana.wikimedia.org/dashboard/db/eventstreams


== Throughput limits ==
As of 2019-07, the public EventStreams stream.wikimedia.org endpoint is configured in varnish to allow only 25 concurrent connections per varnish backend. There are 10 text varnishes in codfw and 8 in eqiad, so the varnish concurrent connection limit for EventStreams is 200 in eqiad and 250 in codfw, for a total of 450 concurrent connections.
We have had incidents where a rogue client spawns too many connections. The EventStreams code has some primitive logic to try to reduce the number of concurrent connections from the same X-Client-IP, but this will not fully prevent the issue from happening. Check the total number of connections in https://grafana.wikimedia.org/dashboard/db/eventstreams if new connections receive a 502 error from varnish.


== Alerts ==
EventStreams is configured with a [https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/eventstreams/monitoring.pp monitoring check] that verifies that the /v2/stream/recentchange URL has data on it. The check is done against the public stream.wikimedia.org endpoint. If this public check fails, it is likely that all backend service processes have the same issue.


== Incidents ==
* [[Incident_documentation/20170829-EventStreams]]
* [[Incident documentation/20190628-EventStreams]]
[[Category:Event Platform]]
