
Difference between revisions of "Analytics/Systems/EventLogging/Architecture"

imported>Ottomata
(Updated for current arch)
imported>Ottomata
 
Notice: This documentation is outdated. See the Event Platform documentation (Event_Platform#Event_Platform_documentation_pages).



Latest revision as of 15:11, 29 July 2021


This page explains the topology of WMF's EventLogging system and how its parts interact, using the following diagram as a reference:

EventLogging architecture

  • varnishkafka sends raw client-side events (URL-encoded JSON in the query string) from Varnish to the eventlogging-client-side Kafka topic.
  • An eventlogging-processor consumes these raw events, processes and validates them, and sends them back to Kafka as JSON strings. The processed events are produced to two kinds of topics: eventlogging-valid-mixed and eventlogging_<schemaName>. eventlogging-valid-mixed contains the valid events from all schemas, with the exception of blacklisted high-volume schemas. Each eventlogging_<schemaName> topic holds all events for that schema.
  • eventlogging-valid-mixed is consumed by eventlogging-consumer processes, which store the events in MySQL and in the eventlogging log files. The eventlogging_<schemaName> topics are consumed by Camus and stored in HDFS, partitioned by <schemaName>/<year>/<month>/<day>/<hour>.
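The flow in the list above can be illustrated with a minimal Python sketch. It is not the EventLogging implementation: the function names, the `HIGH_VOLUME_SCHEMAS` set, the assumption that a decoded event carries its schema name under a `schema` key, and the zero-padding of the partition path are all hypothetical choices made for illustration.

```python
import json
from datetime import datetime, timezone
from urllib.parse import unquote

# Hypothetical stand-in for the list of blacklisted high-volume schemas.
HIGH_VOLUME_SCHEMAS = {"ExampleHighVolumeSchema"}


def decode_raw_event(query_string):
    """Decode URL-encoded JSON carried in a query string into a dict."""
    return json.loads(unquote(query_string.lstrip("?")))


def target_topics(event):
    """Return the Kafka topics a valid event would be produced to.

    Assumes the event names its schema under the "schema" key.
    """
    schema = event["schema"]
    topics = ["eventlogging_" + schema]
    if schema not in HIGH_VOLUME_SCHEMAS:
        topics.append("eventlogging-valid-mixed")
    return topics


def hdfs_partition(schema, ts):
    """Build a <schemaName>/<year>/<month>/<day>/<hour> partition path.

    Zero-padding of month, day, and hour is an assumption.
    """
    return "{}/{:04d}/{:02d}/{:02d}/{:02d}".format(
        schema, ts.year, ts.month, ts.day, ts.hour
    )


raw = "?%7B%22schema%22%3A%22Edit%22%2C%22event%22%3A%7B%22action%22%3A%22save%22%7D%7D"
event = decode_raw_event(raw)
print(target_topics(event))
print(hdfs_partition(event["schema"], datetime(2021, 7, 29, 15, tzinfo=timezone.utc)))
```

Schema validation, which the real processor performs before producing to the valid topics, is left out of this sketch.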

The EventLogging back-end comprises several processes that consume from and produce to Kafka, which makes it a single-purpose, standalone stream processor. The /etc/eventlogging.d file hierarchy contains the instance definitions for those processes, with a subfolder for each service type. A systemd task uses this file hierarchy to provision a job for each instance definition. Each instance definition file contains the command-line arguments for the service program, one argument per line.
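As a rough illustration of the "one argument per line" layout, the following Python sketch walks a directory shaped like /etc/eventlogging.d and turns each instance definition file into an argument list. The function names are hypothetical, and skipping blank lines is an assumption; this is not the code the systemd task actually runs.

```python
from pathlib import Path


def read_instance_args(path):
    """Read an instance definition file: one command-line argument per line.

    Blank lines are skipped (an assumption, not documented behaviour).
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def discover_instances(root):
    """Map (service_type, instance_name) to that instance's argument list.

    Each subfolder of `root` is treated as a service type and each file
    inside it as one instance definition.
    """
    instances = {}
    for service_dir in sorted(Path(root).iterdir()):
        if not service_dir.is_dir():
            continue
        for definition in sorted(service_dir.iterdir()):
            if definition.is_file():
                key = (service_dir.name, definition.name)
                instances[key] = read_instance_args(definition)
    return instances
```

Calling `discover_instances("/etc/eventlogging.d")` on a host with that hierarchy would return one entry per instance, which a supervisor could then pass as arguments to the matching service program.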

An 'eventloggingctl' shell script provides a convenient wrapper for managing EventLogging processes.