You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Redis: Difference between revisions

imported>CRusnov
(Added documentation about how to use Redis from other services in our infrastructure.)
imported>Alexandros Kosiaris
(Compact Netbox redis databases in 1 line. Mostly for keeping context in a single line)
 
(22 intermediate revisions by 10 users not shown)
Line 1: Line 1:
{{Navigation Wikimedia infrastructure|expand=mw}}
{{Navigation Wikimedia infrastructure|expand=mw}}
{{See|For the Redis service at Toolforge, see  [[Nova Resource:Tools/Redis]].}}
{{See|For the Redis service at Toolforge, see  [[Help:Toolforge/Redis for Toolforge]].}}
'''Redis''' is used in Wikimedia production for:
There are a number of '''Redis''' clusters and instances in Wikimedia production.
* changeprop (role::redis::misc)
* stashing user sessions ([[mw:Manual:$wgSessionCacheType|$wgSessionCacheType]]) and the data stored in the main stash ([[mw:Manual:$wgMainStash|$wgMainStash]]).
* [[Maps]]
* As a cache and queue backend in [[ORES]]
* Receiver of sampled profile data from PHP, as part as the sampling/profiling pipeline ([[Arc Lamp]]).
* stores lists of articles for new editors to edit as part of the [[mw:Extension:GettingStarted|GettingStarted]] MediaWiki extension


[[mw:MediaWiki-Vagrant|MediaWiki-Vagrant]] and [[Help:MediaWiki-Vagrant in Labs|MediaWiki-Vagrant in Cloud VPS]] are configured by default to use redis for [[mw:Manual:$wgMainCacheType|$wgMainCacheType]], [[mw:Manual:$wgSessionCacheType|$wgSessionCacheType]], etc.
* <s>redis_maps (maps* hosts) used by [[Maps]].</s>
* redis_misc (rdb* hosts) used by multiple services detailed below.
* webperf (mwlog1002 host) used by [[Arc Lamp]] for collecting PHP profiling samples.


== Role redis::misc (Redis 3.x) ==
Outside production, we have [[mw:MediaWiki-Vagrant|MediaWiki-Vagrant]] and [[Help:MediaWiki-Vagrant in Cloud VPS|MediaWiki-Vagrant in Cloud VPS]] which are configured by default to use a Redis instance for local object caching and session store.


The role <code>redis::misc</code> is for our general purpose master-slave cluster in eqiad and codfw DCs. Each <code>rdb*</code> node has 5 instances (ports 6378, 6379, 6380, 6381, 6382) because redis is sigle threaded.
== <s>Cluster redis_maps</s> ==
<s>See [[Maps]].</s>


=== Servers ===
== Cluster redis_gitlab ==
Each master has its respected slave. Masters use odd numbers (e.g. rdb1005) and slaves an even one (e.g. rdb1006). Master-slave instances use the same ports e.g. <code>rdb1005:6379</code> is the master of <code>rdb1006:6379</code>
See [[Gitlab]].
 
== Cluster redis_misc ==
 
The role <code>redis::misc</code> is for our general purpose master-replica cluster in eqiad and codfw DCs. Each <code>rdb*</code> node has 5 instances (ports 6378, 6379, 6380, 6381, 6382) because redis is single threaded. A mapping of usages is below.
 
The servers are set up as 2 independent pairs. This is for high availability (HA), and it is up to each application to make use of both pairs; not all applications are able to do so.
 
Consumers:
 
* [[Changeprop]]: Uses Redis for rate limiting (actively uses both instances).
* changeprop-jobqueue: Uses Redis for job deduplication (actively uses both instances).
* [[API_Gateway]]: Uses Redis for rate limiting.
* [[ORES]]: Uses Redis for caching and queueing (one active instance).
* docker-registry: (one active instance).
 
=== Pair 1 ===
 
{| class="wikitable"
|-
! Port !! redis db !! Usage
|-
| 6378 || 0, 1 ||Netbox tasks (db 0) and NetBox caching (db 1)
|-
| 6379 || 0 || [[changeprop]]/[[Job queue|cpjobqueue]]/[[API Gateway|api-gateway]]
|-
| 6380 || 0 ||unallocated
|-
| 6381 || 0 || filebackend.php (redisLockManager)
|-
| 6382 || 0 ||filebackend.php (redisLockManager)
|}
Masters: '''rdb1009''' and '''rdb2007'''
 
Cumin alias: '''redis-misc-pair1-master'''
 
 
Replicas: '''rdb1010''' and '''rdb2008'''
 
Cumin alias: '''redis-misc-pair1-slave'''
 
=== Pair 2 ===
 
{| class="wikitable"
|-
! Port !! redis db !! Usage
|-
| 6378 || 0 || [[ORES]] cache
|-
| 6379 || 0 || [[changeprop]]/[[Job queue|cpjobqueue]]/[[API Gateway|api-gateway]]
|-
| 6380 || 0 || [[ORES]] queue
|-
| 6381 || 0 || filebackend.php (redisLockManager)
|-
| 6382 || 0 || [[docker-registry]]
|}
Masters: '''rdb1011''' and '''rdb2009'''
 
Cumin alias: '''redis-misc-pair2-master'''
 
 
Replicas: '''rdb1012''' and '''rdb2010'''


'''eqiad:'''
Cumin alias: '''redis-misc-pair2-slave'''
*rdb1005 (m) - rdb1006 (s)
*rdb1009 (m) - rdb1010 (s)
'''codfw:'''
*rdb2003 (m) - rdb2004 (s)
*rdb2005 (m) - rdb2006 (s)


=== Servers ===
Each master has a replica. Masters use odd numbers (e.g. rdb1005) and replicas the subsequent even number (e.g. rdb1006). Master-replica instances use the same ports e.g. <code>rdb0003:6379</code> would replicate to <code>rdb0004:6379</code>
=== Services ===
=== Services ===


'''Change propagation''' (or '''changeprop''') is a service running on <code>scb*</code> servers listening to topics on Kafka for events, and translates them into http requests to various systems. It is also responsible for cache evictions to happen on all services like [[RESTBase]]. Changeprop talks to redis via [[Nutcracker]].
Change propagation (or changeprop) is a service that runs on [[Kubernetes]] nodes by listening to topics on Kafka for events, and then translating them into HTTP requests to various systems. It is also responsible for cache evictions to happen on all services like [[RESTBase]]. Changeprop talks to Redis via [[Nutcracker]].
* <code>hieradata/role/eqiad/scb.yaml</code>
*[[phab:source/operations-deployment-charts/browse/master/helmfile.d/services/changeprop/|Helmfile service definition]]
* <code>hieradata/role/codfw/scb.yaml</code>
*[https://logstash.wikimedia.org/app/kibana#/dashboard/change-prop?_g=h@44136fa&_a=h@8c02121 Kibana changeprop Dashboard]


=== Related puppet code ===
=== Related puppet code ===
Line 41: Line 95:
=== Other Info ===
=== Other Info ===


* Instance passwords can be easily found under <code>/etc/redis/<instance>.conf</code>
* Instance passwords can be found under <code>/etc/redis/<instance>.conf</code>
* [https://grafana.wikimedia.org/d/000000174/redis?orgId=1 Grafana redis::misc Dashboard]
* [https://grafana.wikimedia.org/d/000000174/redis?orgId=1 Grafana dashboard: Redis]


== Using Redis ==
== Using Redis ==
Line 78: Line 132:


Configuration is likewise pretty straightforward with perhaps the exception of the snapshotting, aof and memory settings; here's the [https://raw.github.com/antirez/redis/2.6/redis.conf sample config file].
Configuration is likewise pretty straightforward with perhaps the exception of the snapshotting, aof and memory settings; here's the [https://raw.github.com/antirez/redis/2.6/redis.conf sample config file].
== Former clusters ==
=== Cluster redis_sessions ===
The "redis_sessions" cluster was co-located on the main <code>mc*</code> hosts that also serve memcached, as was used by MediaWiki.
The cluster had a capacity of 8GB in total (16 shards with 520MB each, downsized to 8 shards as of April 2021, [[phab:T280582|T280582]]).
The cluster was stable in its utilization at a fairly constant 3GB of live data at any given time (as of July 2021, [[phab:T212129#6283230|T212129#6283230]]).
Past consumers in [[MediaWiki at WMF|MediaWiki]]:
* [[mw:MainStash|MainStash]] backend, generic interface used by various features and extensions to store secondary data that should persist for multiple weeks without LRU eviction. The MainStash backend was moved out to the [[X2|x2 database]] as part of [[phab:T212129|T212129]]
* Prior to 2020, MediaWiki core session data was stored in Redis, via [[mw:Manual:$wgSessionCacheType|$wgSessionCacheType]], and has since moved to Cassandra ([[phab:T206016|T206016]]).
*Prior to Oct 2021, [[mw:Extension:GettingStarted|GettingStarted]] extension, which stored lists of articles for new editors to edit.
*Prior to Jul 2022, CentralAuth authentication tokens (short-lived). Moved to memcached via mcrouter-primary-dc ([[phab:T278392|T278392]]).
*Prior to Jul 2022, CentralAuth session data. Moved to Cassandra ([[phab:T267270|T267270]]).
*Prior to Aug 2022, Rdbms-ChronologyProtector offsets (short-lived). Moved to dc-local memcached ([[phab:T314453|T314453]]).
The decommission task is [[phab:T267581|T267581: Phasing out "redis_sessions" cluster]].


==See also==
==See also==
* [[memcached]]
* [[memcached]]
* [[nutcracker]] (AKA twemproxy), the proxy used by all application servers to contact memcached (but not redis as of 2015, except it does again as of 2016)
* [[nutcracker]] (AKA twemproxy), the proxy used by all application servers to contact memcached (but not redis as of 2015, except it does again as of 2016)
* [https://architecturenotes.co/redis/ Redis explained]


[[Category:Caching]]
[[Category:Caching]]
[[Category:MediaWiki production]]
[[Category:MediaWiki production]]
[[Category:SRE Service Operations]]

Latest revision as of 11:15, 16 May 2023

There are a number of Redis clusters and instances in Wikimedia production.

  • redis_maps (maps* hosts) used by Maps (struck through in the page source, i.e. no longer current).
  • redis_misc (rdb* hosts) used by multiple services detailed below.
  • webperf (mwlog1002 host) used by Arc Lamp for collecting PHP profiling samples.

Outside production, we have MediaWiki-Vagrant and MediaWiki-Vagrant in Cloud VPS which are configured by default to use a Redis instance for local object caching and session store.

Cluster redis_maps

See Maps.

Cluster redis_gitlab

See Gitlab.

Cluster redis_misc

The role redis::misc is for our general purpose master-replica cluster in the eqiad and codfw DCs. Each rdb* node runs 5 instances (ports 6378, 6379, 6380, 6381, 6382) because Redis is single-threaded. A mapping of usages is below.

The servers are set up as 2 independent pairs. This is for high availability (HA), and it is up to each application to make use of both pairs; not all applications are able to do so.

Consumers:

  • Changeprop: Uses Redis for rate limiting (actively uses both instances).
  • changeprop-jobqueue: Uses Redis for job deduplication (actively uses both instances).
  • API_Gateway: Uses Redis for rate limiting.
  • ORES: Uses Redis for caching and queueing (one active instance).
  • docker-registry: (one active instance).

Pair 1

Port | Redis DB | Usage
6378 | 0, 1     | Netbox tasks (db 0) and NetBox caching (db 1)
6379 | 0        | changeprop/cpjobqueue/api-gateway
6380 | 0        | unallocated
6381 | 0        | filebackend.php (redisLockManager)
6382 | 0        | filebackend.php (redisLockManager)

Masters: rdb1009 and rdb2007

Cumin alias: redis-misc-pair1-master


Replicas: rdb1010 and rdb2008

Cumin alias: redis-misc-pair1-slave

Pair 2

Port | Redis DB | Usage
6378 | 0        | ORES cache
6379 | 0        | changeprop/cpjobqueue/api-gateway
6380 | 0        | ORES queue
6381 | 0        | filebackend.php (redisLockManager)
6382 | 0        | docker-registry

Masters: rdb1011 and rdb2009

Cumin alias: redis-misc-pair2-master


Replicas: rdb1012 and rdb2010

Cumin alias: redis-misc-pair2-slave
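
The two tables above can also be written down as data, which makes it easy to ask which ports are still free. The following is only a sketch: the dictionary is transcribed from the tables as they stand in this revision, and the names ALLOCATIONS and unallocated_ports are made up for illustration. Hiera remains the source of truth.

```python
# Port allocation per pair, transcribed from the tables above.
# None marks a port that the table lists as unallocated.
ALLOCATIONS = {
    "pair1": {
        6378: "Netbox tasks (db 0) and NetBox caching (db 1)",
        6379: "changeprop/cpjobqueue/api-gateway",
        6380: None,
        6381: "filebackend.php (redisLockManager)",
        6382: "filebackend.php (redisLockManager)",
    },
    "pair2": {
        6378: "ORES cache",
        6379: "changeprop/cpjobqueue/api-gateway",
        6380: "ORES queue",
        6381: "filebackend.php (redisLockManager)",
        6382: "docker-registry",
    },
}


def unallocated_ports(allocations):
    """Return (pair, port) tuples for every port without a listed usage."""
    return [
        (pair, port)
        for pair, ports in sorted(allocations.items())
        for port, usage in sorted(ports.items())
        if usage is None
    ]


print(unallocated_ports(ALLOCATIONS))
```

With the tables as they are today, the only entry reported is port 6380 on pair 1.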

Servers

Each master has a replica. Masters use odd numbers (e.g. rdb1005) and replicas the subsequent even number (e.g. rdb1006). Master and replica instances use the same ports, e.g. rdb0003:6379 would replicate to rdb0004:6379.
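
The naming convention above is simple enough to express in a few lines. This is a minimal sketch that assumes host names always have the form rdb followed by digits; the helper names replica_for and replication_pair are hypothetical and do not exist in Puppet or any of our tooling.

```python
import re


def replica_for(master: str) -> str:
    """Derive the replica host name from a master host name.

    Assumes the convention described above: masters carry an odd number
    and the replica is the subsequent even number.
    """
    match = re.fullmatch(r"(rdb)(\d+)", master)
    if match is None:
        raise ValueError(f"not an rdb host name: {master!r}")
    prefix, digits = match.groups()
    number = int(digits)
    if number % 2 == 0:
        raise ValueError(f"masters use odd numbers, got {master!r}")
    # Keep the original zero padding, e.g. rdb0003 -> rdb0004.
    return f"{prefix}{number + 1:0{len(digits)}d}"


def replication_pair(master: str, port: int) -> tuple:
    """Return (master endpoint, replica endpoint); both use the same port."""
    return (f"{master}:{port}", f"{replica_for(master)}:{port}")


print(replication_pair("rdb0003", 6379))
```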

Services

Change propagation (or changeprop) is a service that runs on Kubernetes nodes. It listens to Kafka topics for events and translates them into HTTP requests to various systems. It is also responsible for triggering cache evictions on services such as RESTBase. Changeprop talks to Redis via Nutcracker.

Related puppet code

  • hieradata/role/common/redis/misc/master.yaml
  • hieradata/role/common/redis/misc/slave.yaml
  • modules/role/manifests/redis/misc/master.pp
  • modules/role/manifests/redis/misc/slave.pp

Other Info

  • Instance passwords can be found under /etc/redis/<instance>.conf
  • Grafana dashboard: Redis

Using Redis

Connecting

redis-cli is installed on all servers where redis-server is installed. Running it leaves you at a Redis prompt where you can enter commands interactively.

Some useful commands

  • AUTH <somepass> authenticate
  • INFO status information, including:
# Replication
role:slave
master_host:10.64.0.24
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
<snip>
# Keyspace
db0:keys=9351936,expires=9291239,avg_ttl=0
  • KEYS <pattern-here> lists all keys matching the given pattern. Use this sparingly! The query can take seconds to complete, and because Redis is single-threaded the instance serves no other clients in the meantime.
  • QUIT closes the connection.
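
When checking many instances it can be handy to parse the INFO output instead of reading it by eye. The sketch below works on the sample text shown above and only assumes the layout visible there (section headers starting with #, then key:value lines). The function names parse_info, replica_is_healthy and keyspace_stats are illustrative, not part of any Redis client.

```python
SAMPLE_INFO = """\
# Replication
role:slave
master_host:10.64.0.24
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
<snip>
# Keyspace
db0:keys=9351936,expires=9291239,avg_ttl=0
"""


def parse_info(text):
    """Parse INFO-style text into {section: {key: value}} (values stay strings)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = {}
            continue
        # Skip anything that is not a key:value line, e.g. "<snip>".
        if current is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        sections[current][key] = value
    return sections


def replica_is_healthy(info):
    """True if this instance reports itself as a replica with its master link up."""
    rep = info.get("Replication", {})
    return rep.get("role") == "slave" and rep.get("master_link_status") == "up"


def keyspace_stats(info, db="db0"):
    """Split a Keyspace value such as 'keys=1,expires=0,avg_ttl=0' into integers."""
    fields = info["Keyspace"][db].split(",")
    return {name: int(value) for name, value in (f.split("=") for f in fields)}


info = parse_info(SAMPLE_INFO)
print(info["Replication"]["master_host"], replica_is_healthy(info))
print(keyspace_stats(info))
```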

Using Redis from other Services

Some services may require or be able to use Redis; the redis_misc cluster described above is appropriate for that.

As noted above, each pair of Redis servers in each data center has five separate instances on different ports, not all of which are in use. The first step to using Redis in a production service is to choose an unused instance/port pair. It can be located by examining Hiera data for what is currently in use: a relatively straightforward way to do this is to run git grep '\Wrdb[12]' within a Puppet tree, which shows every use of an rdb address. A similar procedure may be used to find a port that is unallocated.
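
The grep step above can be taken one step further by collecting the ports each host is referenced with. This is a rough sketch only: the sample lines and the .example host suffix are hypothetical and do not reflect real Hiera keys, and the regular expression assumes references are written as host:port. In practice you would feed it the output of the git grep command mentioned above.

```python
import re
from collections import defaultdict

# Same idea as git grep '\Wrdb[12]': a non-word character, then an rdb host.
# Additionally captures an optional domain suffix and a ":port" part.
RDB_REF = re.compile(r"\W(rdb[12]\d+)[\w.-]*:(\d+)")

# The five instance ports that every rdb* node runs.
ALL_PORTS = (6378, 6379, 6380, 6381, 6382)

# Hypothetical lines standing in for grep output; not real Hiera data.
SAMPLE_LINES = [
    'service_a::redis: "rdb1009.example:6379"',
    "service_b::redis_host: 'rdb1011.example:6378'",
    'service_c::redis: "rdb1011.example:6380"',
    "unrelated: value",
]


def ports_in_use(lines):
    """Map each referenced rdb host to the sorted list of ports seen for it."""
    used = defaultdict(set)
    for line in lines:
        for host, port in RDB_REF.findall(line):
            used[host].add(int(port))
    return {host: sorted(ports) for host, ports in used.items()}


def free_ports(lines, host):
    """Ports on the given host that no scanned line refers to."""
    taken = set(ports_in_use(lines).get(host, []))
    return [port for port in ALL_PORTS if port not in taken]


print(ports_in_use(SAMPLE_LINES))
print(free_ports(SAMPLE_LINES, "rdb1011"))
```

A port that no line mentions is only a candidate; confirm it against the usage tables before claiming it.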

Once a port/host combination for each datacenter is chosen, refer to it from the Puppet configuration of the service that will use it.

Using Redis from a service requires a password. It may be obtained from the Hiera key ::passwords::redis::main_password in hieradata/role/common/redis/misc/master.yaml in the private repository. The current convention is to introduce a new private Hiera key that stores the password for your service's use; this is inefficient and subject to change.
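
Many client libraries accept the chosen host, port, password and database number as a single redis:// URL (redis-py's Redis.from_url is one example). Below is a minimal sketch of assembling such a URL. The host name and password are placeholders, and whether your service takes a URL or separate settings depends on the service.

```python
from urllib.parse import quote


def redis_url(host, port, password, db=0):
    """Build a redis:// URL with an empty user name and a percent-encoded password."""
    # Passwords may contain characters such as '@' or '/', so encode everything.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


# Placeholder values for illustration only; never hard-code a real password.
print(redis_url("rdb1009.example", 6380, "p@ss/word", db=1))
```

The database number matters when one instance is shared, as with Netbox, which uses db 0 and db 1 on the same port.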

Other references

Commands are easy; they all depend on the data type (hash, set, list, etc.). Here's a quick reference.

Configuration is likewise pretty straightforward, with perhaps the exception of the snapshotting, AOF and memory settings; here's the sample config file.

Former clusters

Cluster redis_sessions

The "redis_sessions" cluster was co-located on the main mc* hosts that also serve memcached, and was used by MediaWiki.

The cluster had a capacity of 8GB in total (16 shards with 520MB each, downsized to 8 shards as of April 2021, T280582).

The cluster was stable in its utilization at a fairly constant 3GB of live data at any given time (as of July 2021, T212129#6283230).

Past consumers in MediaWiki:

  • MainStash backend, a generic interface used by various features and extensions to store secondary data that should persist for multiple weeks without LRU eviction. The MainStash backend was moved to the x2 database as part of T212129.
  • Prior to 2020, MediaWiki core session data was stored in Redis, via $wgSessionCacheType, and has since moved to Cassandra (T206016).
  • Prior to Oct 2021, GettingStarted extension, which stored lists of articles for new editors to edit.
  • Prior to Jul 2022, CentralAuth authentication tokens (short-lived). Moved to memcached via mcrouter-primary-dc (T278392).
  • Prior to Jul 2022, CentralAuth session data. Moved to Cassandra (T267270).
  • Prior to Aug 2022, Rdbms-ChronologyProtector offsets (short-lived). Moved to dc-local memcached (T314453).

The decommission task is T267581: Phasing out "redis_sessions" cluster.

See also

  • memcached
  • nutcracker (AKA twemproxy), the proxy used by all application servers to contact memcached. It was not used for Redis as of 2015, but is again as of 2016.
  • Redis explained