Event Platform/Flaws

Event Platform has tried to be flawless, but of course it is not. Like any system, it carries design decisions that seemed sound at the time but, with hindsight, were not.

meta field

The meta field is confusing. It was originally created as a single subobject grouping the fields that EventBus events needed to operate. Keeping those fields in one subobject made it easier to copy and paste them between schemas, which we had to do before we had jsonschema-tools, schema $refs, and materialization.

Ideally, the fields in meta would be top-level and named appropriately. If we could get rid of meta we would, but doing so would be a lot of work.

dt fields

Every event needs to have an 'event time' field, specifying the time at which the event happened. Ideally, this would be the only field we'd need to require for all events. We would then use this field for Kafka timestamps and Hive hourly partitioning.

However, we accept events from unauthenticated external clients, so we cannot fully trust the timestamps they send. A client might send an event with a timestamp in the distant past or future, which would cause issues for data ingestion. In cases where we can't trust the event time, we fall back to using the server-side receive time.
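The fallback described above can be sketched roughly as follows. This is only an illustration, not Event Platform code: the function names, the MAX_SKEW tolerance, and the rule for deciding that an event time is untrustworthy are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; the real system may decide trust differently.
MAX_SKEW = timedelta(days=7)


def parse_dt(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2021-04-01T12:00:00Z' as UTC-aware."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption for this sketch: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def ingestion_time(event_dt: str, receive_dt: str,
                   max_skew: timedelta = MAX_SKEW) -> datetime:
    """Return the client event time if plausible, else the server receive time."""
    received = parse_dt(receive_dt)
    try:
        claimed = parse_dt(event_dt)
    except ValueError:
        # Unparseable client timestamp: fall back to the receive time.
        return received
    if abs(received - claimed) > max_skew:
        # Too far in the past or future: fall back to the receive time.
        return received
    return claimed
```

With these assumptions, an event claiming to be from 1999 but received in 2021 would be ingested under its receive time, while an event a minute older than its receive time would keep its own event time.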

So we need two timestamp fields for ingestion: event time and server receive time. As of 2021-04, the intention is to always use meta.dt for server receive time and dt for event time, and then make which one is used for ingestion configurable. These field names are not particularly descriptive, but creating and using new dt fields is a nontrivial amount of work. We may soon decide to rename dt to event_dt, as this is a little easier than renaming meta.dt.
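As a minimal sketch of what "configurable" could look like, the snippet below selects the ingestion timestamp by a dotted field path. Only the field names dt and meta.dt come from the text above; the stream names, the INGESTION_TIMESTAMP_FIELD mapping, and the helper functions are hypothetical and do not reflect the actual Event Platform configuration.

```python
def get_path(event: dict, path: str):
    """Look up a dotted path such as 'meta.dt' in a nested dict."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(path)
        current = current[key]
    return current


# Hypothetical per-stream setting: which field supplies the ingestion timestamp.
INGESTION_TIMESTAMP_FIELD = {
    "internal_stream": "dt",       # trusted producer: use event time
    "external_stream": "meta.dt",  # untrusted clients: use server receive time
}


def timestamp_for(stream: str, event: dict) -> str:
    """Return the configured timestamp value, defaulting to meta.dt."""
    field = INGESTION_TIMESTAMP_FIELD.get(stream, "meta.dt")
    return get_path(event, field)


event = {
    "dt": "2021-04-01T11:59:58Z",            # event time, set by the producer
    "meta": {"dt": "2021-04-01T12:00:00Z"},  # server receive time
}
```

Defaulting to meta.dt for unknown streams is a choice made for this sketch, on the reasoning that the server receive time is the safer value when nothing is known about the producer.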