SessionService/Storage


Requirements

Availability / Fault-tolerance

The SessionService storage system must be resilient to wholesale machine outages, without user-facing impact or the need for immediate operator intervention. These properties must hold true both within a given data-center and across all sites.

  • TODO: How many machine failures must be survivable (within a given data-center, across all sites)?

Consistency

SessionService storage must provide read-your-write consistency; it must not be possible for a client to receive a stale read when racing asynchronous replication after a write. If conditions do not permit read-your-write consistency (due to machine failures, a network partition, etc.), then the operation must fail. These properties must hold true both for writes occurring within a given data-center and for those that happen across sites[1].
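
To make the requirement concrete, the following is a minimal, hypothetical sketch of the contract. It is not tied to either candidate below; the names SessionStore, ConsistencyUnavailable, and the backend quorum methods are illustrative assumptions only.

  # Illustrative only: the read-your-write / fail-out contract expressed as a
  # thin wrapper. The backend object and its quorum methods are hypothetical.

  class ConsistencyUnavailable(Exception):
      """Raised when the store cannot meet read-your-write consistency
      (e.g. too few replicas reachable). Callers must treat this as a hard
      failure rather than fall back to a possibly-stale read."""

  class SessionStore:
      def __init__(self, backend):
          self.backend = backend

      def write(self, session_id, data):
          # The write must be acknowledged by enough replicas that any
          # subsequent read can observe it, or it must not succeed at all.
          if not self.backend.write_quorum(session_id, data):
              raise ConsistencyUnavailable("write not durably replicated")

      def read(self, session_id):
          # A read that cannot reach enough replicas must fail out rather
          # than silently return data that might pre-date the last write.
          ok, data = self.backend.read_quorum(session_id)
          if not ok:
              raise ConsistencyUnavailable("read quorum not available")
          return data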

Performance

Storage for the SessionService must be capable of delivering a throughput of NNN reads/second and NNN writes/second. Average read and write latency must be no more than NNN ms and NNN ms respectively. 99th-percentile read and write latency should be no greater than NNN ms and NNN ms respectively.

  • TODO: Need values for NNN above.
  • TODO: When we say writes, what does this entail? Are writes something that always happens at a global quorum? In other words, do writes (and reads, for that matter) need to be qualified according to their function/purpose?

Candidates

MySQL

Description

  1. Each data-center would contain N MySQL servers, each configured as a master that in turn acts as a slave to all of the others (multisync)
  2. HAProxy would be used for load-balancing and fail-over of client connections to the pool of DB servers (see the client-side sketch below)
  3. Within each data-center, each server would do semisync replication to some number of its slaves (blocking a write from succeeding unless/until that number of slaves has acknowledged it)
  4. Cross-DC, the masters of one DC would do asynchronous-only replication to the slaves in the remote DC

See also: https://github.com/jynus/pysessions
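
Below is a minimal client-side sketch of this option, assuming a hypothetical HAProxy endpoint (sessions-db.local:3306) and a hypothetical sessions(id, value) table. The semisync guarantee itself is server-side configuration; from the client's point of view, the commit simply does not return until the configured number of slaves has acknowledged the write.

  # Sketch only: PyMySQL client talking to the DB pool through HAProxy.
  # Host, port, credentials, and schema are assumptions for illustration.
  import json
  import pymysql

  conn = pymysql.connect(
      host="sessions-db.local",  # HAProxy frontend for the pool (assumed)
      port=3306,
      user="sessions",
      password="secret",
      database="sessions",
      autocommit=True,           # each statement commits; with semisync, the
  )                              # commit blocks until N slaves have ACKed

  def write_session(session_id, data):
      with conn.cursor() as cur:
          cur.execute(
              "REPLACE INTO sessions (id, value) VALUES (%s, %s)",
              (session_id, json.dumps(data)),
          )

  def read_session(session_id):
      with conn.cursor() as cur:
          cur.execute("SELECT value FROM sessions WHERE id = %s", (session_id,))
          row = cur.fetchone()
      return json.loads(row[0]) if row else None

Note that a subsequent read routed by HAProxy to a server that has not yet applied the write may race the intra-DC replication, which is one reason the conflict-resolution question below matters.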

Questions

  • How many replicas would exist in each data-center, and what slave count would be used for the semisync replication?
  • How is conflict resolution managed on reads?
  • In the prototype, the dataset is sharded across 256 tables; why so?

Cassandra

Description

  1. Each data-center would contain 3 Cassandra nodes, and each data-center would be configured for a replication factor of 3[2]
  2. Cassandra drivers (by convention) perform auto-discovery of nodes, distribute requests across available nodes, and de-pool in the event of failure. Any node can serve any request (coordinating if necessary).
  3. Cassandra employs tunable consistency to determine how synchronous (or asynchronous) replication is, and how many copies to consult for conflict resolution on a read. If we assume read-your-write guarantees, a per-site replication factor of 3 permits both low-latency quorum reads/writes within a data-center, and either global or "each" quorums when cross-site consistency is required (see the sketch below).
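
A minimal sketch of point 3 using the DataStax Python driver (cassandra-driver); the contact point, keyspace, and table are assumptions. LOCAL_QUORUM keeps reads and writes low-latency within the local data-center, while EACH_QUORUM requires a quorum acknowledgement in every data-center before a write succeeds; if a required quorum is unreachable, the driver raises an error, matching the fail-out behaviour required above.

  # Sketch only: tunable consistency with the DataStax Python driver.
  # Contact point, keyspace, and table names are illustrative assumptions.
  from cassandra import ConsistencyLevel
  from cassandra.cluster import Cluster
  from cassandra.query import SimpleStatement

  cluster = Cluster(["cassandra-1.local"])   # the driver discovers the rest
  session = cluster.connect("sessions")      # hypothetical keyspace

  # Low-latency write/read confined to the local data-center.
  local_write = SimpleStatement(
      "INSERT INTO sessions (id, value) VALUES (%s, %s)",
      consistency_level=ConsistencyLevel.LOCAL_QUORUM,
  )
  local_read = SimpleStatement(
      "SELECT value FROM sessions WHERE id = %s",
      consistency_level=ConsistencyLevel.LOCAL_QUORUM,
  )

  # Write that must be acknowledged by a quorum in each data-center.
  global_write = SimpleStatement(
      "INSERT INTO sessions (id, value) VALUES (%s, %s)",
      consistency_level=ConsistencyLevel.EACH_QUORUM,
  )

  session.execute(local_write, ("abc123", "serialized-session-data"))
  row = session.execute(local_read, ("abc123",)).one()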

Questions

  • ...

Footnotes

  1. "For now, reliable server-side session invalidation is a hard requirement, in both single master and multi-DC operation. Were we using something like JWT for the session cookie value, then that requirement could be much more relaxed, but I think we're a number of circuitous discussions away from being able to implement signed client-side tokens." -- Darian Patrick
  2. Cassandra's rack-awareness ensures that replicas are distributed across racks (or rows, as the case may be). This isn't terribly useful when both node and replica count are 3 as specified here, but if it ever becomes necessary to expand the cluster...