Cache servers
WMF currently runs around 100 physical servers in 5 data centres with the purpose of caching HTTP traffic. The servers are split into logical clusters according to the type of content they cache. Each logical cluster corresponds to a puppet role, for example cache::text and cache::upload.
Move server to another cluster
The following procedure describes how to move a cache server to another cluster. The example shows how cp3043.esams.wmnet can be moved from cache_text to cache_upload.
Depool and downtime
On the cluster::management node (neodymium.eqiad.wmnet at the time of this writing):
sudo -i confctl select name=cp3043.esams.wmnet set/pooled=no
Make sure that the depool took effect, for instance by looking at the frontend and backend traffic instance breakdown dashboards, or by querying conftool directly as sketched below.
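To confirm the pooled state from the command line, you can query conftool for the host. This is a minimal sketch, assuming the same conftool setup as the depool command above:
sudo -i confctl select name=cp3043.esams.wmnet get
The output should report "pooled": "no" for each of the host's services.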
On the alerting_host node (currently einsteinium.wikimedia.org):
sudo -i icinga-downtime -h cp3043 -d 7200 -r "move cp3043 to cache_upload --$USER"
Remove host from its current cluster
Remove the host from the list of cache_text machines in conftool and hiera (commit example)
Puppet needs to be run on all other (non-cp3043) cache_text nodes to pick up the hiera changes. Conftool changes take effect automatically upon puppet-merge. You can double-check that the node has been removed from the list of cache_text servers in esams, as sketched below.
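For reference, the conftool entry being removed lives in the puppet repository's conftool-data tree; the layout below is an illustrative sketch, not an authoritative copy of the file:
# conftool-data/node/esams.yaml (illustrative)
esams:
  cache_text:
    cp3043.esams.wmnet: [varnish-fe, varnish-be]
Once the change is merged, you can list the remaining cache_text servers in esams with confctl and confirm cp3043 no longer appears:
sudo -i confctl select 'dc=esams,cluster=cache_text' get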
Add host to new cluster
Disable puppet on all cache nodes belonging to the new cluster (cache_upload in this example); see the cumin sketch after this list.
Add the node to cache_upload in hiera, change the server role (example), and follow the Server Lifecycle/Reimage procedure.
Add node to conftool.
Run puppet on all cache_upload nodes (see the cumin sketch below).
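Both the puppet-disable step and the cluster-wide puppet run are typically done with cumin from a cluster management host. The following is a minimal sketch; the A:cp-upload alias is an assumption, so check the current cumin aliases before running it:
# Disable puppet on all cache_upload nodes before merging the hiera change
sudo cumin 'A:cp-upload' "disable-puppet 'cp3043 to cache_upload - $USER'"
# After merging and reimaging, re-enable and run puppet across the cluster
sudo cumin 'A:cp-upload' "run-puppet-agent --enable 'cp3043 to cache_upload - $USER'"
The reason string passed to --enable is expected to match the one used by disable-puppet.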
Final verification and pooling
Ensure that the new node is working as expected (e.g. varnishtest -k /usr/share/varnish/tests/upload/*.vtc, or test requests locally against varnish-fe and varnish-be, as sketched below).
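A local smoke test can be as simple as a curl against the frontend and backend Varnish instances. This sketch assumes the conventional ports (80 for varnish-fe, 3128 for varnish-be) and uses a hypothetical object path; substitute a real upload.wikimedia.org URL:
curl -v -H 'Host: upload.wikimedia.org' http://localhost:80/wikipedia/commons/test.jpg
curl -v -H 'Host: upload.wikimedia.org' http://localhost:3128/wikipedia/commons/test.jpg
Both requests should return an HTTP response from Varnish rather than a connection error.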
Repool.
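Repooling mirrors the depool command from the first step, flipping pooled back to yes:
sudo -i confctl select name=cp3043.esams.wmnet set/pooled=yes
As with the depool, verify on the traffic dashboards that requests are flowing to the host again.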