SRE/Infrastructure naming conventions
This page documents the naming conventions of servers, routers, and data center sites.
Our servers currently fall broadly into two categories:
- Clustered servers: These use numeral sequences with a descriptive prefix (see #Networking and #Servers). For example: db1001.
- Miscellaneous servers: These used unique hostnames (see #Miscellaneous servers). For example: helium. This naming convention is deprecated and not used for new hosts, but some older miscellaneous-named hosts still exist.
Name reuse
Historically, we did not reuse names of past servers for new servers. For example, after db1001 is decommissioned, no other server will be named db1001. Ganeti VMs sometimes reuse hostnames, but bare metal typically will not.
The notable exception is networking gear, which is deterministically named by rack. For example, the access switch in Eqiad rack A8 is named asw-a8-eqiad. If it is replaced, the new switch will take the same name.
All hardware in the datacenter space is tracked in Netbox, which can be used to check for existing hostnames for both hardware and Ganeti instances.
Data centers
Data centers are named as vendor initials (at time of lease signing) followed by the IATA code for a nearby major airport.
For example: our Dallas site is named codfw. The vendor is CyrusOne, and DFW is the nearby major airport. (Technically, Love Field airport is closer but less well-known.)
DC | Vendor | Airport Code |
---|---|---|
codfw | CyrusOne | DFW |
drmrs | Digital Realty | MRS |
eqdfw | Equinix | DFW |
eqiad | Equinix | IAD |
eqord | Equinix | ORD |
eqsin | Equinix | SIN |
esams | EvoSwitch | AMS |
knams | Kennisnet | AMS |
ulsfo | United Layer | SFO |
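
To make the composition rule concrete, here is a minimal illustrative sketch (not an official tool); the vendor-initial strings passed in are assumptions for the example, since initials are chosen at lease signing rather than derived mechanically.

```python
# Illustrative sketch of the data-center naming rule: vendor initials + IATA airport code.
def dc_name(vendor_initials: str, iata_code: str) -> str:
    """Compose a data-center name such as 'codfw' from 'co' + 'DFW'."""
    return (vendor_initials + iata_code).lower()

assert dc_name("co", "DFW") == "codfw"   # CyrusOne, Dallas/Fort Worth
assert dc_name("eq", "IAD") == "eqiad"   # Equinix, Washington Dulles
assert dc_name("dr", "MRS") == "drmrs"   # Digital Realty, Marseille
```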
Networking
Naming for network equipment is based on role and location.
This also applies to power distribution units, serial console servers, and other networking infrastructure.
Name prefix | Role | Example |
---|---|---|
asw | access switch | asw-a1-eqiad |
cr | core router | cr1-eqiad |
mr | management router | mr1-eqiad |
msw | management switch | msw1-eqiad & msw-b2-eqiad |
pfw | payments firewall | pfw1-eqiad |
ps1 / ps2 | power strips/distribution units | ps1-b3-eqiad |
scs | serial console server | scs-a8-eqiad |
fasw | Fundraising access switch | fasw-c-codfw |
cloudsw | Cloud L3 switches | cloudsw1-c8-eqiad |
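
For illustration only, a small sketch of how these names break down into role prefix, optional device number, optional rack, and site; the regular expression is inferred from the examples in the table above and is an assumption, not a canonical parser.

```python
import re

# Illustrative parser for network-gear names like "asw-a8-eqiad" or "cr1-eqiad".
DEVICE_RE = re.compile(
    r"^(?P<prefix>[a-z]+)(?P<num>\d+)?(?:-(?P<rack>[a-z]\d*))?-(?P<site>[a-z]+)$"
)

for name in ("asw-a8-eqiad", "cr1-eqiad", "msw-b2-eqiad", "cloudsw1-c8-eqiad"):
    m = DEVICE_RE.match(name)
    print(name, m.groupdict() if m else "no match")
```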
OpenStack deployments
[Datacenter Site][numeric identifier](optional dev suffix to indicate non-external non-customer facing deployments) - [r (if region)][letter for AZ]
- Current Eqiad/Codfw deployments will not fully meet these standards until rebuilt: [eqiad0 (deployment), eqiad (region), nova (AZ)]
Deployment | Region | Availability Zone |
---|---|---|
eqiad0 | eqiad0-r | eqiad0-rb |
eqiad1 | eqiad1-r | eqiad1-rb |
codfw0dev | codfw0dev-r | codfw0dev-rb |
codfw1dev | codfw1dev-r | codfw1dev-rb |
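
A hedged sketch of how the deployment, region, and availability-zone names in the table compose; the helper and parameter names here are hypothetical, introduced only to spell out the pattern.

```python
# Illustrative composition of OpenStack deployment naming, inferred from the table above.
def openstack_names(site: str, number: int, dev: bool = False, az_letter: str = "b") -> dict:
    deployment = f"{site}{number}" + ("dev" if dev else "")
    region = f"{deployment}-r"      # region = deployment + "-r"
    az = f"{region}{az_letter}"     # availability zone = region + letter
    return {"deployment": deployment, "region": region, "az": az}

print(openstack_names("eqiad", 1))            # eqiad1 / eqiad1-r / eqiad1-rb
print(openstack_names("codfw", 1, dev=True))  # codfw1dev / codfw1dev-r / codfw1dev-rb
```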
Disks
- Arrays must use the "Storage array" device role in Netbox.
- Naming follows two conventions:
  - Array is attached to a single host:
    - hostname_of_host_system-arrayN
      - Example: ms2001-array1, ms2001-array2
    - All arrays get a number, even if there is only a single array.
      - Example: dataset1001-array1
  - Array is attached to multiple hosts:
    - Labs uses this for labstore; each shelf connects to two different hosts, so the older single-host naming scheme fails.
    - servicehostgroup-arrayN-site
      - Example: labstore-array1-codfw, labstore-array2-codfw
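
To make the two conventions concrete, here is a small hypothetical sketch; the function and parameter names are made up for this example and are not part of the convention.

```python
# Illustrative helpers for the two storage-array naming conventions above.
def single_host_array(hostname: str, n: int) -> str:
    # Array attached to a single host: hostname_of_host_system-arrayN
    return f"{hostname}-array{n}"

def multi_host_array(service_group: str, n: int, site: str) -> str:
    # Array attached to multiple hosts: servicehostgroup-arrayN-site
    return f"{service_group}-array{n}-{site}"

assert single_host_array("ms2001", 1) == "ms2001-array1"
assert multi_host_array("labstore", 2, "codfw") == "labstore-array2-codfw"
```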
Kubernetes
Any cluster that is not the main wikikube cluster should follow these conventions:
- Cluster name: <identifier>-k8s (ex: dse-k8s, aux-k8s)
- Control plane service name: <identifier>-k8s-ctrl
- Ingress service name: <identifier>-k8s-ingress [-ro|-rw] for active/active or active/passive
- Hostnames for control plane: <identifier>-k8s-ctrlXXXX.$site.wmnet
- Hostnames for kubelets: <identifier>-k8s-workerXXXX.$site.wmnet
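
A minimal sketch of how these pieces expand for a hypothetical "dse" cluster in eqiad; the specific host numbers are examples following the per-datacenter ranges in #Servers, not actual allocations.

```python
# Illustrative expansion of the Kubernetes naming convention above.
identifier, site = "dse", "eqiad"

cluster_name = f"{identifier}-k8s"                         # dse-k8s
control_plane_svc = f"{identifier}-k8s-ctrl"               # dse-k8s-ctrl
ingress_svc_rw = f"{identifier}-k8s-ingress-rw"            # read-write ingress service
ctrl_host = f"{identifier}-k8s-ctrl1001.{site}.wmnet"      # control plane host
worker_host = f"{identifier}-k8s-worker1001.{site}.wmnet"  # kubelet host
```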
Servers
Any system that runs in a dedicated services cluster with other machines is named after its role/service task. As a rule, we attempt to name after the service, not just the software package. Also, servers within a group are numbered based on the datacenter they are located in.
Data center | Numeral range | Example |
---|---|---|
pmtpa / sdtpa (decommissioned) | 1-999 | cp7 |
eqiad | 1000-1999 | db1001 |
codfw | 2000-2999 | mw2187 |
esams / knams | 3000-3999 | cp3031 |
ulsfo | 4000-4999 | bast4001 |
eqsin | 5000-5999 | dns5001 |
drmrs | 6000-6999 | cp6011 |
When adding a new datacenter, make sure to update operations/puppet.git's /typos file, which checks hostnames.
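
For illustration, the numeral ranges above can be expressed as a small lookup that maps a clustered hostname back to its datacenter; the parsing here is an assumption for the example and handles only the simple prefix-plus-number form.

```python
import re

# Illustrative lookup of a host's datacenter from its numeral range (see table above).
RANGES = [
    (1, 999, "pmtpa/sdtpa (decommissioned)"),
    (1000, 1999, "eqiad"),
    (2000, 2999, "codfw"),
    (3000, 3999, "esams/knams"),
    (4000, 4999, "ulsfo"),
    (5000, 5999, "eqsin"),
    (6000, 6999, "drmrs"),
]

def datacenter_of(hostname: str) -> str:
    m = re.match(r"^[a-z-]+?(\d+)$", hostname)  # e.g. "db1001" -> 1001
    if not m:
        raise ValueError(f"not a clustered hostname: {hostname}")
    n = int(m.group(1))
    for low, high, dc in RANGES:
        if low <= n <= high:
            return dc
    raise ValueError(f"no datacenter range covers {n}")

assert datacenter_of("db1001") == "eqiad"
assert datacenter_of("mw2187") == "codfw"
assert datacenter_of("cp6011") == "drmrs"
```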
Name prefix | Description | Status | Points of contact |
---|---|---|---|
acmechief | ACME certificate manager | In use | Traffic |
acmechief-test | ACME certificate manager staging environment | In use | Traffic |
alert | Alerting host (Icinga / Alertmanager) | In use | Observability |
amssq | esams caching server | No longer used (deprecated) | |
amslvs | esams LVS | No longer used (deprecated) | |
analytics | analytics nodes (Hadoop, Hive, Impala, and various other things) | Being replaced by an-worker | Data Engineering SREs |
analytics-master | analytics master nodes | Being replaced by an-master | Data Engineering SREs |
analytics-tool | virtual machines in production (Ganeti) running analytics tools/websites | Being replaced by an-tool | Data Engineering SREs |
an-coord | analytics coordination node | In use | Data Engineering SREs |
an-db | analytics postgresql database cluster | In use | Data Engineering SREs |
an-master | analytics master node | In use, replacing analytics-master | Data Engineering SREs |
an-mariadb | analytics-meta mariadb databases | In use | Data Engineering SREs |
an-tool | analytics tools node | In use | Data Engineering SREs |
an-test-(coord/master/worker) | analytics hadoop test cluster nodes | In use | Data Engineering SREs |
an-worker | analytics worker node | In use, replacing analyticsNNNN | Data Engineering SREs |
an-scheduler | analytics job scheduler node | In use | Data Engineering SREs |
an-airflow | analytics job scheduler node dedicated to the Discovery team | In use | Data Engineering SREs |
aphlict | notification server for Phabricator | In use | Service Operations |
apt | Advanced Package Tool Repository (Debian APT repo) | In use | Infrastructure Foundations |
aqs | Analytics Query Service | In use | Data Engineering SREs |
archiva | Archiva Artifact Repository | In use | Data Engineering SREs |
auth | Authentication server | In use | Infrastructure Foundations |
authdns | Authoritative DNS (gdnsd) | In use | Traffic |
backup | Backup hosts | In use | Data Persistence |
backupmon | Backup monitoring hosts | In use | Data Persistence |
bast | bastion host | In use | Infrastructure Foundations |
censorship | Censorship monitoring databases and scripts | No longer used (deprecated) | |
centrallog | Centralized syslog | In use | Observability |
cephosd | Ceph servers for use with Data Engineering and similar storage requirements | In use | Data Engineering SREs |
certcentral | Central certificates service | No longer used (deprecated) | |
chartmuseum | Helm Chart repository ChartMuseum | In use | Service Operations |
cloud*-dev | Any cloud role + '-dev' = internal deployment (PoC, Staging, etc) | In use | WMCS |
cloudbackup | Backup storage system for WMCS | In use | WMCS |
cloudcephmon | Ceph monitor and manager daemon for WMCS | In use | WMCS |
cloudcephosd | Ceph object storage data nodes for WMCS | In use | WMCS |
cloudceph | Converged Ceph object storage and monitor nodes for WMCS (only used for testing) | No longer used | |
cloudcontrol | OpenStack deployment controller for WMCS | In use | WMCS |
clouddb | Wiki replica servers for WMCS | In use | WMCS, with support from DBAs |
cloudelastic | Replication of ElasticSearch for WMCS | In use | WMCS |
cloudgw | Cloud gateway server for WMCS | In use | WMCS |
cloudmetrics | Monitoring server for WMCS | In use | WMCS |
cloudnet | Network gateway for tenants of WMCS (Neutron l3) | In use | WMCS |
cloudservices | Misc OpenStack components (Designate) for WMCS | In use | WMCS |
cloudstore | Storage system for WMCS | In use | WMCS |
cloudvirt | OpenStack Hypervisor (libvirtd + KVM) for WMCS | In use | WMCS |
cloudvirtan | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to Analytics) | No longer used | |
cloudvirt-wqds | OpenStack Hypervisor (libvirtd + KVM) for WMCS (dedicated to WDQS) | | WMCS |
cloudweb | WMCS management websites (wikitech, horizon, striker) | In use | WMCS |
conf | Configuration system host (etcd, zookeeper...) | In use | Service Operations |
contint | Continuous Integration | In use | Service Operations |
cp | Cache proxy (Varnish) | In use | Traffic |
cumin | Cluster management (cumin/spicerack/debdeploy/etc...) | In use | Infrastructure Foundations |
datahubsearch | DataHub OpenSearch Cluster - used for Data Catalog MVP | In use | Data Engineering SREs |
dataset | dataset dumps storage | No longer used (deprecated) | |
db | Database host | In use | Data Persistence |
dbmonitor | Database monitoring | In use | Data Persistence |
dborch | Database orchestration (MySQL Orchestrator) | In use | Data Persistence |
dbprov | Database backup generation and data provisioning | In use | Data Persistence |
dbproxy | Database proxy | In use | Data Persistence |
dbstore | Database analytics | In use | Data Engineering SREs & Data Persistence |
debmonitor | Debian packages monitoring | In use | Infrastructure Foundations |
deploy | Deployment hosts | In use | Service Operations |
dns | DNS recursors | In use | Infrastructure Foundations |
doc | Documentation server (CI) | In use | Service Operations (Supportive Services) & Release Engineering |
doh | Wikidough Anycasted | In use | Traffic |
an-druid | Druid Cluster (Analytics). Due to naming legacy, druid100[1-3] are also in this cluster. | In use | Data Engineering SREs |
druid | Druid Cluster (Public) | In use | Data Engineering SREs |
dse-k8s-etcd | etcd server for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dse-k8s-ctrl | control plane server for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dse-k8s-worker | worker node for the kubernetes cluster of Data Science and Engineering | In use | Data Engineering SREs |
dumpsdata | dataset generation fileset serving to snapshot hosts | In use | Platform Engineering |
durum | Check service for Wikidough | In use | Traffic |
elastic | elasticsearch servers | In use | Search Platform SREs |
es | Database host for MediaWiki external storage (wiki content, compressed) | In use | Data Persistence |
etcd | Etcd server | In use | Service Operations |
etherpad | Etherpad server | In use | Service Operations |
eventlog | EventLogging host | In use | Data Engineering SREs |
flowspec | Network controller | In use (testing) | Infrastructure Foundations |
fr* | Fundraising servers, e.g. frdb, frlog, frpm (puppetmaster) | In use | fr-tech SREs |
ganeti | Ganeti Virtualization Cluster | In use | Infrastructure Foundations |
ganeti-test | Ganeti Virtualization Cluster (test setup) | In use | Infrastructure Foundations |
gerrit | Gerrit code review (gerrit1001 in eqiad is currently used) | In use (deprecated) | Service Operations & Release Engineering |
gitlab | Gitlab servers | In use (phab:T274459) | Service Operations |
grafana | Grafana server | In use | Observability |
graphite | Graphite server | In use | Observability |
icinga | Icinga servers | In use | Observability |
idp | Identity provider (Apereo CAS) | In use | Infrastructure Foundations |
install | Installation server | In use | Infrastructure Foundations |
kafka | Kafka brokers | In use | Data Engineering SREs & Infrastructure Foundations |
kafka-jumbo | Large general purpose Kafka cluster | In use | Data Engineering SREs & Infrastructure Foundations |
kafkamon | Kafka monitoring (VMs) | In use | Data Engineering SREs & Infrastructure Foundations |
karapace | DataHub Schema Registry server (standalone) - Used for the Data Catalog MVP | In use | Data Engineering SREs |
knsq | knams squid | No longer used (deprecated) | |
krb | Kerberos KDC/Kadmin | In use | Infrastructure Foundations & Data Engineering SREs |
kubernetes | Kubernetes cluster (k8s) | In use | Service Operations |
kubestage | Kubernetes staging cluster | In use | Service Operations |
kubestagetcd | Etcd cluster for the Kubernetes staging cluster | In use | Service Operations |
kubetcd | Etcd cluster for the Kubernetes cluster | In use | Service Operations |
lab | labs virtual node | No longer used (deprecated) | |
labcontrol | Controller node for WMCS (aka "labs") | No longer used (deprecated) | |
labnet | Networking host for WMCS | No longer used (deprecated) | |
labnodepool | Dedicated WMCS host for Nodepool (CI) | No longer used (deprecated) | |
labpuppetmaster | Puppetmasters for WMCS | No longer used (deprecated) | |
labsdb | Replication of production databases for WMCS | No longer used (deprecated) | |
labservices | Services for WMCS | No longer used (deprecated) | |
labstore | Disk storage for WMCS | In use (deprecated) | WMCS |
labtest* | Test hosts for WMCS | No longer used (deprecated) | |
labvirt | Virtualization node for WMCS | No longer used (deprecated) | |
labweb | Management websites for WMCS | No longer used (deprecated) | |
lists | Mailing lists running Mailman | In use | Legoktm and Ladsgroup |
logstash | elasticsearch/logstash/kibana node | In use | Observability |
lvs | lvs load balancer | In use | Traffic |
maps | Maps cluster | In use | Content Transform Team and hnowlan |
maps-test | maps test cluster | No longer used (deprecated) | |
mc | memcached server | In use | Service Operations |
mc-gp | memcached gutter pool server | In use | Service Operations |
mc-wf | memcached servers for Wikifunctions | In use | Service Operations |
ml-staging | Machine learning staging env etcd and control plane machines | In use | ML team |
ml-serve | Machine learning serving cluster (ml-serve-ctrl* are VMs for k8s control plane) | In use | ML team |
ml-cache | Machine learning caching nodes | In use | ML team |
mirror | public mirror, e.g. Debian mirror, Ubuntu mirror | In use | Infrastructure Foundations |
miscweb | miscellaneous web server | In use | Service Operations |
ms | media storage | No longer used (deprecated) | Data Persistence (Media Storage) |
ms-backup | media storage backup generation (workers) | In use | Data Persistence (Media Storage) |
ms-be | media storage backend | In use | Data Persistence (Media Storage) |
ms-fe | media storage frontend | In use | Data Persistence (Media Storage) |
mw | MediaWiki application server (MediaWiki PHP webservers, api, jobrunners, videoscalers) | In use | Service Operations |
mwdebug | MediaWiki application server for debugging and deployment staging (Ganeti VMs) | In use | Service Operations |
mwlog | MediaWiki logging host | In use | Service Operations |
mwmaint | MediaWiki maintenance host (formerly "terbium") | In use | Service Operations |
mx | Mail relays | In use | Infrastructure Foundations |
nas | NAS boxes (NetApp) | Unused | |
netflow | Network visibility | In use | Infrastructure Foundations |
netmon | Network monitor (librenms, rancid, etc) | In use | Infrastructure Foundations |
netbox | Netbox front-end instances | In use | Infrastructure Foundations |
netbox-dev | Netbox test instances | In use | Infrastructure Foundations |
netboxdb | Netbox back-end database instances | In use | Infrastructure Foundations |
notebook | Jupyterhub experimental server | Unused | |
nfs | NFS server | Unused | |
peek | Security Team workflow and project management tooling | In use | Security Team |
ocg | offline content generator (PDF) | No longer used (deprecated) | |
ores | ORES cluster | In use | Machine Learning SREs |
orespoolcounter | ORES PoolCounter | In use | Machine Learning SREs |
oresrdb | ORES Redis systems | No longer used (deprecated) | |
pc | Parser cache database | In use | SRE Data Persistence (DBAs), with support from Platform and Performance |
PDF Collections | No longer used (deprecated) | ||
people | peopleweb (people.wikimedia.org) | In use | Service Operations & Infrastructure Foundations |
parse | parsoid | Soon in use | Service Operations |
phab | Phabricator host (currently iridium is eqiad phab host) | In use | Service Operations |
ping | Ping offload server | In use | Infrastructure Foundations |
planet | Planet server | In use (mistake) | Service Operations |
pki | PKI Server (CFSSL) | In use | Infrastructure Foundations |
pki-root | PKI Root CA Server (CFSSL) | In use | Infrastructure Foundations |
poolcounter | PoolCounter cluster | In use | Service Operations |
prometheus | Prometheus cluster | In use | Observability |
proton | Proton cluster | No longer used (deprecated) | |
puppetboard | PuppetDB Web UI | In use | Service Operations |
puppetdb | PuppetDB cluster | In use | Service Operations |
puppetmaster | Puppet masters | In use | Infrastructure Foundations |
pybal-test | PyBal testing and development | In use | Traffic |
rbf | Redis Bloom Filter server | Unused | |
rcs | Obsolete:RCStream server (recent changes stream) | No longer used (deprecated) | |
rdb | Redis server | In use | Service Operations |
registry | Docker registries | In use | Service Operations |
releases | Software Releases | In use | Service Operations |
relforge | Discovery's Relevance Forge (see discovery/relevanceForge.git, T131184) | In use | Search Platform SREs |
restbase | RESTBase server | In use | Service Operations |
rpki | RPKI#Validation | In use | Infrastructure Foundations |
sca | Service Cluster A - Includes various services | No longer used (deprecated) | |
scb | Service Cluster B - Includes various services. It's effectively the next generation of the sca cluster above | No longer used (deprecated) | |
schema | Event Schemas HTTP server | In use | Data Engineering SREs & Service Operations |
search-loader | Analytics to Elastic Search model data loader | In use | Search Platform SREs |
sessionstore | Cassandra cluster for sessionstore | In use | Service Operations |
snapshot | Data dump processing node | In use | Platform Engineering |
sq | squid server | No longer used (deprecated) | |
srv | apache server | No longer used (deprecated) | |
stat | statistics computation hosts (see Analytics/Data access) | In use | Data Engineering SREs |
storage | storage host | No longer used (deprecated) | |
testreduce | parsoid visual diff testing | In use | Service Operations |
thanos-be | Prometheus long term storage backend | In use | Observability |
thanos-fe | Prometheus long term storage frontend | In use | Observability |
thumbor | Thumbor | In use | Service Operations (& Performance) |
tmh | MediaWiki videoscaler (TimedMediaHandler). See T105009 and T115950. | No longer used (deprecated) | |
torrelay | Tor relay | No longer used (deprecated) | |
urldownloader | url-downloader | In use (added in T224551) | Service Operations |
virt | labs virtualization nodes | No longer used (deprecated) | |
wcqs | wikicommons query service | In use | Search Platform SREs |
wdqs | wikidata query service | In use | Search Platform SREs |
webperf | webperf metrics (performance team). See T179036. | In use | Performance & Service Operations |
wtp | wiki-text processor node (parsoid) | In use | Service Operations |
xhgui | A graphical interface for PHP debug profiles. See Performance/Runbook/XHGui service. | In use | Performance & Service Operations |
dragonfly-supernode | Supernode for Dragonfly P2P network (distributing docker images) (T286054) | In use | Service Operations |
Miscellaneous servers
Historically, we used per-datacenter naming schemes for any one-off or single host. This included any software that wasn't load balanced across multiple machines, or general task machines that could cluster (to an extent) but required opsen work to do so.
Instead of being named for their purpose, these hosts were named according to a naming convention for their datacenter:
- Hosts in eqiad were named for chemical elements, in order of increasing atomic number.
- Hosts in codfw were named for stars. Stars in the Orion constellation were reserved for fundraising (Alnilam, Alnitak, Bellatrix, Betelgeuse, Heka, Meissa, Mintaka, Nair Al Saif, Rigel, Saiph, Tabit, Thabit).
- Hosts in esams or knams were named for notable Dutch people.
These naming schemes are deprecated in favour of specialized cluster names above. Even if you're certain that the foobar service will only ever use a single host, you should name that host "foobar1001" (or 2001, 3001, etc. as appropriate to the datacenter).
One-off names were easy to come up with—especially for machines that did more than one kind of thing, where it's hard to identify a single descriptive name—but they were also opaque. Engineers had to know that the eqiad MediaWiki maintenance host was "terbium" and the codfw package-build host was "deneb." Naming these machines "mwmaint1001" and "build2001" is easier for sleepy oncallers to remember in an emergency, and friendlier to new hires who have to learn all the names at once.
Some older hosts in production still use these naming schemes, but new hosts should not use them.