Traffic cache hardware

This is an overview of our currently deployed and active cache hardware at the Traffic layer.

Hardware classes

We have purchased and retired multiple classes of server hardware over the years, in staggered timeframes. In general we will always have multiple overlapping hardware classes in service as various warranty and support periods expire. These are the currently active hardware configuration classes:

Label | Model     | CPU Type/Speed              | Phys Cores | RAM    | Cache storage                             | NIC speed, type, driver | DC Ops Config
F1    | Dell R440 | 2x Xeon Gold 5118 @ 2.3GHz  | 24         | 384 GB | 1x Samsung PM1725a 1.6TB NVMe (U.2 SFF)   | 10G, BCM57412, bnxt     | F-10G, +storage card
F2    | Dell R440 | 2x Xeon Gold 5118 @ 2.3GHz  | 24         | 384 GB | 1x Samsung PM1725b 1.6TB NVMe (HHHL card) | 10G, BCM57412, bnxt     | F-10G, +storage card
F3    | Dell R440 | 2x Xeon Gold 5118 @ 2.3GHz  | 24         | 384 GB | 1x Samsung PM1725b 1.6TB NVMe (HHHL card) | 10/25G, BCM57412, bnxt  | F-10G, +storage card, +10/25G NIC variant
F4-T  | Dell R450 | 2x Xeon Gold 5318Y @ 2.1GHz | 48         | 512 GB | 1x 6.4TB NVMe card                        | 10/25G, BCM57414, bnxt  | F (*)
F4-U  | Dell R450 | 2x Xeon Gold 5318Y @ 2.1GHz | 48         | 512 GB | 2x 6.4TB NVMe cards                       | 10/25G, BCM57414, bnxt  | F (*)
  • (*) F4: The new DC Ops Config F from mid-2022 is exclusive to these edge cache roles, so our F4 configs built on it include the storage cards and NIC upgrades as part of the base definition.
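
For back-of-the-envelope capacity math, the class table above can be reduced to a few numeric attributes per class. Below is a minimal, hypothetical Python sketch of that mapping; the figures are copied from the table (with F4-U's two 6.4TB cards summed per node), and the dict itself is purely illustrative rather than how the fleet is actually modeled.

    # Hypothetical sketch only: figures copied from the hardware class table above.
    # This is not how the fleet is actually modeled; it exists purely for rough math.
    HARDWARE_CLASSES = {
        # label: (phys_cores, ram_gb, raw_nvme_tb_per_node, nic_gbps)
        "F1":   (24, 384, 1.6, 10),   # 1x PM1725a 1.6TB (U.2 SFF)
        "F2":   (24, 384, 1.6, 10),   # 1x PM1725b 1.6TB (HHHL card)
        "F3":   (24, 384, 1.6, 25),   # as F2, with the 10/25G NIC variant
        "F4-T": (48, 512, 6.4, 25),   # 1x 6.4TB NVMe card
        "F4-U": (48, 512, 12.8, 25),  # 2x 6.4TB NVMe cards
    }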

Deployed hardware

Currently deployed hardware by data center and caching cluster.

Current as of December 2022
Data center | cache_text | cache_upload | total
eqiad       | 8x F1      | 8x F1        | 16x F1
codfw       | 8x F2      | 8x F2        | 16x F2
esams       | 8x F2      | 8x F2        | 16x F2
ulsfo       | 8x F4-T    | 8x F4-U      | 16x F4
eqsin       | 8x F4-T    | 8x F4-U      | 16x F4
drmrs       | 8x F3      | 8x F3        | 16x F3
total       | 48x Fn     | 48x Fn       | 96x Fn
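
As a rough illustration of what the deployment table implies, the sketch below continues the hypothetical HARDWARE_CLASSES mapping from the Hardware classes section and sums cores, RAM, and raw NVMe cache space per site. The node counts are copied from the December 2022 table (cache_text and cache_upload combined); the output is back-of-the-envelope only, not an authoritative capacity figure.

    # Hypothetical sketch, continuing HARDWARE_CLASSES from the section above.
    # Per-site node counts copied from the December 2022 deployment table
    # (cache_text + cache_upload combined).
    DEPLOYED = {
        "eqiad": {"F1": 16},
        "codfw": {"F2": 16},
        "esams": {"F2": 16},
        "ulsfo": {"F4-T": 8, "F4-U": 8},
        "eqsin": {"F4-T": 8, "F4-U": 8},
        "drmrs": {"F3": 16},
    }

    for site, nodes in DEPLOYED.items():
        cores = sum(HARDWARE_CLASSES[c][0] * n for c, n in nodes.items())
        ram_gb = sum(HARDWARE_CLASSES[c][1] * n for c, n in nodes.items())
        nvme_tb = sum(HARDWARE_CLASSES[c][2] * n for c, n in nodes.items())
        print(f"{site}: {sum(nodes.values())} nodes, {cores} cores, "
              f"{ram_gb} GB RAM, {nvme_tb:.1f} TB raw NVMe cache")

For ulsfo, for example, this works out to 16 nodes, 768 physical cores, 8192 GB of RAM, and 153.6 TB of raw NVMe cache.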

Proposed FY22-23 changes + refreshes

Note as of 2022-12-08: the deployment of the new hardware in eqsin + ulsfo is now complete. Shipping some or all of the 8x F2/F3 nodes back to eqiad is still pending.

  • ulsfo and eqsin get refreshed to the new-standard 16x F4 config in the first half of the FY.
  • The 8x off-cycle (newer, still in warranty) F-nodes in ulsfo and eqsin are shipped to eqiad.
  • Eqiad installs these into the new E+F rows for a number of reasons:
    • Utilize the new rows in eqiad in general (more load/redundancy spread)
    • Test impact of expanded server counts in general
    • Re-use this good hardware instead of tossing it, so we don't waste it just because it was purchased off-cycle
    • Buys us time to push the natural eqiad warranty refresh out another FY, spreading refresh cycles out better (too many this year!)
    • Allow the F4 refreshes in ulsfo+eqsin to be whole-DC upgrades, since F4 enables whole-DC architecture changes in traffic routing.
  • esams gets refreshed in Q4 to the same new F4 config as ulsfo+eqsin (we have some time and space to adjust this based on earlier outcomes if necessary)
Data center | cache_text    | cache_upload  | total                  | Note
eqiad       | 8x F1 + 4x F2 | 8x F1 + 4x F3 | 16x F1 + 4x F2 + 4x F3 | reinforced this FY
codfw       | 8x F2         | 8x F2         | 16x F2                 | no changes this FY
esams       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q4
ulsfo       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q1
eqsin       | 8x F4-T       | 8x F4-U       | 16x F4                 | refreshed to F4 in Q2
drmrs       | 8x F3         | 8x F3         | 16x F3                 | no changes this FY
total       | 52x Fn        | 52x Fn        | 104x Fn                |
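
The per-cluster and overall totals in the proposed table can be sanity-checked with the same kind of throwaway sketch, this time counted per cluster. This is a hypothetical, self-contained snippet with the planned end-of-FY counts copied from the table above; it simply confirms that the columns sum to 52x Fn each and 104x Fn overall.

    # Hypothetical sketch: planned end-of-FY node counts per site and cluster,
    # copied from the proposed table above, to check the 52x / 104x totals.
    PROPOSED = {
        "eqiad": {"cache_text": {"F1": 8, "F2": 4}, "cache_upload": {"F1": 8, "F3": 4}},
        "codfw": {"cache_text": {"F2": 8}, "cache_upload": {"F2": 8}},
        "esams": {"cache_text": {"F4-T": 8}, "cache_upload": {"F4-U": 8}},
        "ulsfo": {"cache_text": {"F4-T": 8}, "cache_upload": {"F4-U": 8}},
        "eqsin": {"cache_text": {"F4-T": 8}, "cache_upload": {"F4-U": 8}},
        "drmrs": {"cache_text": {"F3": 8}, "cache_upload": {"F3": 8}},
    }

    per_cluster = {"cache_text": 0, "cache_upload": 0}
    for clusters in PROPOSED.values():
        for cluster, nodes in clusters.items():
            per_cluster[cluster] += sum(nodes.values())

    assert per_cluster == {"cache_text": 52, "cache_upload": 52}
    assert sum(per_cluster.values()) == 104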