You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Establishing an Upper Bound
Preserving the ability to decommission hosts
From an operational perspective, it is important to retain the ability to decommission at least one host in each rack. As an example, consider a cluster that has 4 machines to a rack, all of which are treated homogeneously (i.e. they all store the same amount of data). In order to retain the ability to decommission one host in each rack, you must have the free space to transfer one host’s worth of data to the remaining 3, this would establish a hard upper bound of 75%.
NOTE: It may make more sense to express capacity and utilization in terms of data units, rather than percentages. However, when considering an upper bound that permits the decommission of at least one host, a percentage will always be required. That percentage is calculated as:
((N-1) / N) * 100 (where
N is the number of hosts in a rack).
Another consideration to establishing an upper bound on utilization is the runway needed to procure and provision additional resources. There needs to exist an upper bound on utilization, after which no additional use-cases will be added (or changes made to existing use-cases that would result in increased utilization). From this point of “No”, there should be sufficient time given the rate of growth, to budget for, procure, and provision additional storage well before running critically low.
Compaction is an asynchronous process that optimizes the layout of data on disk for read performance. Various compaction strategies exist within Cassandra, each meant to optimize for different storage and/or access patterns, but all work by creating new files to atomically replace existing ones. This requires that we maintain enough free space to store as much as 2x of all in-progress compactions (space enough for the tables being combined, plus the newly combined files, until the compaction succeeds and the old data can be safely removed). As a pathological worst case, imagine an entire dataset stored as one keyspace, with a single configured data file directory, sans incremental repair. A non-split major compaction using
SizeTieredCompactionStrategy would result in a single file as output, requiring 100% in additional overhead for working space. Again, this represents a worst-case scenario that is easily avoided, but it demonstrates the need to understand your configuration, and plan accordingly.
For the RESTBase cluster, Cassandra hosts are uniformly distributed over 3 rows in each data-center. Cassandra's
NetworkTopologyStrategy is used to create a correlation between a Cassandra rack, and each of the data-center rows where nodes are located; Three replicas are stored in each data-center, one per row/rack.
Multiple data-centers are used in order to support active-active geographic distribution (uncoordinated reads and writes from either or both sites). The use of 3-way replication in each data-center is primarily a matter of guaranteeing consistency and availability requirements, vis-a-vis tunable consistency, and not about data redundancy (all data in RESTBase is thus far secondary in nature, and can be recovered or regenerated from primary data sources).
NOTE: All hosts in the RESTBase cluster are treated homogeneously, even when the hardware is not. This necessitates that capacity be based off of the least common denominator of any resource. For example: If a single host in a 24 host cluster has a total storage capacity of
N, and the other 23 have
N+10, total capacity is
|Date||Total capacity||Critical limit||Soft limit||Current utilization|
- Planning for 100% utilization is dangerous and impractical, so the actual limit needs to be somewhat lower than this in practice.
- On the basis that we need reserve 25% for decommissions, and 1.2TB for other overheads.
- Given a growth-rate of 850GB/month, and the need for a 3-month runway.