SRE/Infrastructure Foundations/Ownership
Latest revision as of 09:16, 11 October 2021
|Install server||Bare metal Infrastructure||An install server consists of DHCP, TFTP, webproxy (Squid) and apt.wikimedia.org (reprepro) servers.|
|Ganeti||Bare metal Infrastructure||Cluster virtual machine management tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. Ganeti supports both hypervisors; at WMF only KVM is enabled.|
|Puppet||Configuration Management Systems||Puppet is the main configuration management tool used on the Wikimedia clusters.|
|PCC||Configuration Management Systems||PCC is the Puppet compiler. Compilers run the Puppet Server and PuppetDB services, as well as a file sync client. When triggered by a web endpoint, file sync takes changes from the working directory on the primary server and deploys the code to a live code directory. File sync then deploys that code to all compilers.|
|Puppetboard||Configuration Management Systems||Puppetboard is a web interface to PuppetDB aiming to replace the reporting functionality of Puppet Enterprise console.||https://puppetboard.wikimedia.org/|
|Debmonitor||Configuration Management Systems||DebMonitor is a Debian package tracker website and tool developed at the Wikimedia Foundation and used to track installed and upgradable packages across the fleet. It consists of the DebMonitor website and the DebMonitor client.||https://debmonitor.wikimedia.org/|
|Homer||Configuration Management Systems||Homer is our homemade network configuration manager. It takes variables from Netbox and YAML files and runs them through Jinja templates to generate Juniper-compatible configuration. Homer can then send those configurations to selected network devices, for a diff or a safe commit.||https://phabricator.wikimedia.org/tag/homer/|
|Spicerack||Orchestration Tooling||Spicerack is a Python library to orchestrate tasks in the Wikimedia Foundation production environment. It comes with an easy API and a cookbook entry point script that make it possible to write simple Cookbooks to automate and orchestrate tasks.|
|Server Lifecycle/Reimage||Orchestration Tooling||Fully automated OS (re)installation for physical hosts.|
|Debdeploy||Orchestration Tooling||Debdeploy allows the deployment of software updates in Debian (or Debian-based) environments on a large scale. It is based on Cumin; updates are initiated via the debdeploy tool running on the Cumin master. Servers can be grouped into arbitrary sets of servers/services based on the Cumin syntax.|
|Conftool||Orchestration Tooling||Conftool is a set of tools we use to sync and manage the dynamic state configuration for services (Varnish backends, the PyBal pools, the DNS discovery entries, and some variables in the MediaWiki configuration). This configuration is stored in etcd, a distributed key/value store.|
|Dbctl||Orchestration Tooling||Dbctl is a tool based on Conftool that stores MediaWiki's database load balancer configuration in etcd.|
|Cumin||Orchestration Tooling||Cumin is a flexible and scalable automation and orchestration framework for executing multiple commands on multiple hosts in parallel.
It makes complex host selections easy through a user-friendly query language that can interface with different backend modules and combine their results for a fine-grained selection. The transport layer can also be selected, and can provide multiple execution strategies. The outputs of the executed commands are automatically grouped into an easy-to-read result.|
|Wmflib||Orchestration Tooling||A Python package that contains custom modules to interact with the WMF production infrastructure.
Unlike Spicerack and its Cookbooks, it does not require any special privilege to run, so it can be used in any script throughout the fleet. It removes the need to re-implement the same functionality each time.|
|PKI||Infrastructure security and packaging||A public key infrastructure is a set of roles, policies, hardware, software and procedures needed to create, manage, distribute, use, store and revoke digital certificates and manage public-key encryption. We currently use CFSSL to provide and manage PKI solutions. Clients can use the CFSSL API endpoint, which requires the Puppet agent certificate for client authentication. In addition to client authentication, API requests must be signed with an HMAC using a secret key (available in the Puppet private repo).|
|CAS-SSO||Infrastructure security and packaging||The Wikimedia Developer SSO Portal at idp.wikimedia.org is a single sign-on (SSO) infrastructure built on Apereo CAS. When logging into a CAS-enabled website without an active SSO session you'll be redirected to the CAS login page. The CAS service collects LDAP group memberships and makes them available to services for making authorisation choices. After authentication, users are redirected to the initiating service.||https://phabricator.wikimedia.org/tag/cas-sso/|
|Reprepro||Infrastructure security and packaging||Reprepro is able to manage multiple repositories for multiple distribution versions in one package pool. It can process updates from an
|Cowbuilder||Infrastructure security and packaging||A module used to populate a Debian/Ubuntu package building environment. Meant to be used in the Wikimedia environment but could be adapted for other environments as well.|
|Netbox||Infrastructure security||Netbox is an "IP address management (IPAM) and data center infrastructure management (DCIM) tool".||https://phabricator.wikimedia.org/tag/netbox/||https://netbox.wikimedia.org/|
|Netmon||Infrastructure security||Netmon is a network monitoring system with high-performance traffic sniffing technology.|
|RPKI||Infrastructure security||Resource Public Key Infrastructure is a public key infrastructure framework to support improved security for the Internet's BGP routing infrastructure. RPKI provides a way to connect Internet number resource information to a trust anchor.|
|Cloudflare||Infrastructure security||Cloudflare's Magic Transit protects IP subnets from DDoS attacks. It uses Cloudflare's global network to mitigate attacks, employing two networking protocols: BGP for routing and GRE for encapsulation.|
|NEL||Infrastructure security||Network Error Logging is a mechanism that can be configured via the NEL HTTP response header. This header allows web sites and applications to opt-in to receive reports about failed (and, if desired, successful) network fetches from supporting browsers.|
|Failoid||Miscellanea||Fallback backend that immediately closes the connection. It is used in the DNS/Discovery setup.|
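
The PKI entry above says that API requests must be signed with an HMAC using a secret key, in addition to client certificate authentication. The following is a minimal sketch of what such signing could look like, using only the Python standard library. The helper names `sign_request` and `verify_request`, the `token`/`request` envelope fields, the hex-encoded key, and the choice of SHA-256 are all assumptions made for illustration; this page does not state the actual wire format, so it should not be read as the CFSSL API.

```python
import base64
import hashlib
import hmac
import json


def sign_request(payload: dict, hex_key: str) -> dict:
    """Wrap a JSON payload in an envelope carrying an HMAC-SHA256 token.

    Hypothetical helper: the envelope layout is an assumption, not a
    documented format taken from this page.
    """
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    key = bytes.fromhex(hex_key)  # assumes the shared secret is hex-encoded
    token = hmac.new(key, body, hashlib.sha256).digest()
    return {
        "token": base64.b64encode(token).decode("ascii"),
        "request": base64.b64encode(body).decode("ascii"),
    }


def verify_request(envelope: dict, hex_key: str) -> bool:
    """Recompute the HMAC over the request body and compare in constant time."""
    body = base64.b64decode(envelope["request"])
    expected = hmac.new(bytes.fromhex(hex_key), body, hashlib.sha256).digest()
    return hmac.compare_digest(expected, base64.b64decode(envelope["token"]))
```

A receiver holding the same secret can call `verify_request` to reject requests that were not signed with it or whose body was altered in transit. The real secret would come from the Puppet private repo and would never be hard-coded.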