etcd (https://etcd.io/) is an open source key-value store with a focus on reliability that is used to store configuration and state data for distributed systems. At WMF we run a number of etcd clusters. This document addresses the two etcd Main clusters, one installed in each of the primary datacenters, eqiad and codfw. A number of applications, including the MediaWiki read/write configuration, store state data in etcd.
Relevant service categories (wiki categories) for grouping by similar services, owner, etc.
Etcd is a foundational service
No hard dependencies beyond hardware and networking. It is worth pointing out that server hardware and networking have their own failure rates, with availability in the 99% range. Etcd as configured is able to tolerate certain types of failure within a local datacenter.
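To make the two points above concrete, here is a minimal Python sketch. It relies on two assumptions that are not stated in this document: that each cluster uses etcd's usual majority quorum, so a cluster of n voting members stays writable while a majority is reachable, and that the hardware and networking dependencies fail independently, so their availabilities multiply. The member counts and the 0.99 figures are illustrative only and are not the actual WMF cluster sizes or measured values.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority remains reachable."""
    return members - quorum(members)


def serial_availability(*availabilities: float) -> float:
    """Combined availability of dependencies that must all be up.

    Assumes the dependencies fail independently of each other.
    """
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # Illustrative cluster sizes, not the real WMF configuration.
    for n in (1, 3, 5):
        print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")

    # Two hypothetical dependencies (hardware, networking) at 99% each.
    print(f"combined availability: {serial_availability(0.99, 0.99):.4f}")
```

Under these assumptions a three-member cluster survives the loss of one member, and two dependencies at 99% each leave a combined ceiling of roughly 98%, which bounds what etcd itself can offer.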
Confd: a lightweight configuration management tool focused on keeping local configuration files up-to-date using data stored in etcd
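The pattern confd implements can be sketched as follows. This is not confd's own code or template syntax (confd is written in Go and uses Go templates). It is a hedged Python illustration of the idea: render a local file from key-value data and rewrite the file only when the rendered content changes. A plain dict stands in for values read from etcd, and the key name `primary_dc` and the file name `example.conf` are hypothetical.

```python
import os
import tempfile
from string import Template


def render(template_text: str, values: dict) -> str:
    """Fill $name placeholders in the template from the key-value data."""
    return Template(template_text).substitute(values)


def sync_file(path: str, content: str) -> bool:
    """Write `content` to `path` only if it differs.

    Returns True when the file was created or changed, False otherwise.
    """
    try:
        with open(path, encoding="utf-8") as existing:
            if existing.read() == content:
                return False
    except FileNotFoundError:
        pass

    # Write to a temporary file in the same directory, then rename it
    # over the target so readers never see a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(content)
    os.replace(tmp_path, path)
    return True


if __name__ == "__main__":
    # Hypothetical data; a real deployment would read this from etcd.
    values = {"primary_dc": "eqiad"}
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "example.conf")
        changed = sync_file(target, render("primary = $primary_dc\n", values))
        print("file changed:", changed)
```

Returning whether the file changed lets a caller decide if a dependent service needs a reload, which is the usual reason for skipping no-op writes.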
Etcd is owned by the Service Operations SRE team, which is responsible for all aspects of the service, including operation, scalability, backups, and software updates.
- Escalation points and Key contacts:
Supporting documentation and relevant information
- Design documents
- Operational documentation
- Phabricator component query links
- Netbox links
- Links to other relevant SRE Tooling™
- Links to Runbooks
- Related service request types
- Any supporting or underpinning services (e.g. dependencies)
- Who is entitled to request/view the service