GitLab: Difference between revisions

From Wikitech-static
imported>Dzahn
imported>Jelto
(adjust instance names according to T307142)
 
(14 intermediate revisions by 5 users not shown)
Line 1: Line 1:
This page contains SRE-related topics for GitLab. For GitLab application-specific information, please see https://www.mediawiki.org/wiki/GitLab (under Implementation).
{{Sidebar
| style = background: white; padding:10px; padding-{{dir|{{pagelang}}|left|right}}:13px; margin:{{dir|{{pagelang}}|5px 12px 5px 0|5px 0 5px 12px}}; width: 350px;
| name = GitLab
| title = GitLab
| image = [[File:Gitlab-logo.svg.svg|center|250px]]
| headingstyle = font-size: 130%; padding: .5em;
| contentstyle = text-align: {{dir|{{pagelang}}|right|left}}; font-size: 14px; padding: .5em; line-height: 1.5;
| abovestyle = text-align: {{dir|{{pagelang}}|right|left}};
| content1 =
{{Special:PrefixIndex/{{FULLPAGENAME}}/ |hideredirects=1 |stripprefix=1}}
* External resources:
** [[mw:GitLab]] - User documentation
** [https://gitlab.wikimedia.org/ Production GitLab]
** [https://docs.gitlab.com Upstream GitLab docs]
** [https://phabricator.wikimedia.org/project/view/5057/ GitLab in Phabricator]
}}


==Backup and restore==
This page contains SRE-related topics for GitLab. For GitLab application-specific information, user documentation, and policy, please see [[mw:GitLab]] on mediawiki.org.
This section describes the backup configuration and restore procedure for the GitLab instance.


===Backups===
GitLab is reachable at https://gitlab.wikimedia.org/. We run multiple instances of GitLab:
To back up application data, GitLab's built-in [https://docs.gitlab.com/ee/raketasks/backup_restore.html#back-up-gitlab backup functionality] is used. Application data backups are created by calling the <code>/usr/bin/gitlab-backup create</code> command. Configuration backups are created by calling <code>/usr/bin/gitlab-ctl backup-etc</code>. Both commands are executed once a day by cronjobs [[gerrit:plugins/gitiles/operations/gitlab-ansible/+/refs/heads/master/roles/gitlab_server/defaults/main.yml#14|created with Ansible]] and create full backups. To configure the backups, please refer to the [[gerrit:plugins/gitiles/operations/gitlab-ansible/+/refs/heads/master/roles/gitlab_server/templates/gitlab-crontab.j2|backup related variables]] in Ansible.


So GitLab will create two new .tar archives every day:
* gitlab1004 runs production GitLab serving https://gitlab.wikimedia.org/
* gitlab1003 runs a passive GitLab [[GitLab/Replica|replica]] serving https://gitlab-replica.wikimedia.org/
* gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud runs a production-like [[GitLab/Test Instance|test instance]] in WMCS/VPS serving https://gitlab.devtools.wmcloud.org/
* gitlab1001 and gitlab2001 are old Ganeti VMs which will be decommissioned soon


*full data backup in <code><nowiki>{{gitlab_backup_path}}</nowiki></code>
== GitLab instances ==
*full config backup in <code>/etc/gitlab/config_backup</code>


Partial backups are [[gerrit:plugins/gitiles/operations/gitlab-ansible/+/refs/heads/master/roles/gitlab_server/defaults/main.yml#22|disabled currently]]. For the initialization phase daily full backups are used. In the future we may start implementing partial and incremental backups.
gitlab1003, gitlab1004 and the test instance gitlab-prod-1001 are set up using Puppet. The configuration currently lives in [[gerrit:plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/gitlab.pp|profile::gitlab]]. The former configuration from [[gerrit:plugins/gitiles/operations/gitlab-ansible|gitlab-ansible]] was migrated completely to Puppet (see [[phab:T283076|T283076]]). GitLab is installed as an [https://docs.gitlab.com/omnibus/ Omnibus installation] on all instances, so all GitLab components are installed using the official packages and run on a single host. The reasons for this setup can be found in the [[mw:GitLab/Initialization|Initialization docs on MediaWiki]].


====Backup retention====
GitLab login is implemented with SSO using [[CAS-SSO|CAS/SSO]]. Users are redirected to idp.wikimedia.org (idp.wmcloud.org on WMCS/VPS) to log in to the SSO portal. Authentication is currently open to all users with a Wikimedia developer account for the production instance. Access to the replica and test instance is restricted to WMF/NDA groups.
Data backups and config backups will be deleted after three days on the production instance (see [[phab:T274463#7147179|T274463#7147179]]). Release Engineering wanted to have three days of local retention for fast troubleshooting and restores. Deletion of the data backups is handled by GitLab (using the <code>gitlab_backup_keep_time</code> variable). Deletion of the config backup is implemented in the [[gerrit:plugins/gitiles/operations/gitlab-ansible/+/refs/heads/master/roles/gitlab_server/defaults/main.yml#14|backup cronjob]] (using the <code>gitlab_backup_config_keep_num</code> variable).  
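
The count-based deletion of config backups described above could look roughly like the following. This is a minimal sketch, not the actual cronjob: the function name <code>prune_config_backups</code> is hypothetical, and it assumes that the config archives are named <code>gitlab_config_<timestamp>_<date>.tar</code>, so that sorting by name also sorts by age.

```shell
#!/bin/sh
# Sketch only: prune_config_backups is a hypothetical helper, not the
# production cronjob. Assumes archives are named gitlab_config_<epoch>_*.tar,
# so lexical order equals chronological order.
prune_config_backups() {
    dir="$1"    # directory holding the config archives
    keep="$2"   # number of newest archives to keep

    # Load the matching archives into the positional parameters, oldest first.
    set -- "$dir"/gitlab_config_*.tar

    # If the glob matched nothing, "$1" is the literal pattern: nothing to do.
    [ -e "$1" ] || return 0

    # Remove the oldest archive until only $keep archives are left.
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}

# Example call (path taken from this page, count is an assumption):
#   prune_config_backups /etc/gitlab/config_backup 3
```

Pruning by count rather than by age keeps the most recent archives around even if a backup run is skipped for a few days.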


===Storing backups in Bacula===
== GitLab runners ==
For enhanced reliability, backups are also stored in [[Bacula]]. Bacula is the standard for secure, encrypted backup storage at the WMF.


For the initialization phase we decided to only back up the most recent .tar file with the data backup and the most recent .tar file with the configuration backup. These .tar files are shipped to Bacula once a day as a full backup (see [[Bacula#Backup Strategy|backup strategy daily]]). This backup strategy is '''not the default''' used by most services. The following concerns and advantages came up in our discussion when comparing daily full backups with weekly full backups plus daily incremental backups (see [[phab:T274463|T274463]] and comments in [[gerrit:c/operations/puppet/+/697850|/puppet/+/697850]]):
GitLab offers CI/CD capabilities. For our current setup and Runner documentation, see [[GitLab/Gitlab Runner]].


* Incremental backups of GitLab's self-contained full backups would introduce an artificial technical dependency between revisions without having an actual dependency. To restore a backup Bacula would have to merge and diff all recent incremental backups and combine them with the last full backup. However, the latest backup should be enough to restore GitLab to the previous state.
== SSH fingerprints ==
* The default backup policy would conflict with the requirement of Release Engineering to have three days of local backup retention on the GitLab host. This conflict would cause up to three times as much disk usage in Bacula compared to a non-default backup policy.
* Incremental-only backups would solve the problem of additional disk usage but can't be used long term due to technical limitations of Bacula, according to Data Persistence. A restore from many incremental revisions would take a long time and a lot of computing resources. Furthermore, we would introduce a dependency between revisions (see above).
Because of the reasons above we decided against the default strategy and instead use daily full backups. For this decision it was necessary to implement two changes:


* Add a new Daily Full policy to Bacula (see [[gerrit:c/operations/puppet/+/700183|/puppet/+/700183]])
See [[Help:SSH_Fingerprints/gitlab.wikimedia.org]] for an overview of all fingerprints at once.
* Create dedicated folder structure for GitLab latest backup (see [[gerrit:c/operations/gitlab-ansible/+/700084|/gitlab-ansible/+/700084]] and below)


==== "Latest" backup ====
Each GitLab server has four IPs on the same network interface: one IPv4 and one IPv6 address for the server itself, used by the standard sshd that admins connect to on the individual backend (gitlab1001.wikimedia.org/gitlab2001.wikimedia.org), and one IPv4 and one IPv6 address for the service (gitlab.wikimedia.org).
To implement the strategy of daily full backups, a dedicated folder structure is needed for Bacula. We have to make sure that Bacula does not save all of the last three backups available on the GitLab host; it must only back up the directory with the most recent files. For this purpose we created an additional <code>./latest</code> directory inside each of the backup directories ([[gerrit:c/operations/gitlab-ansible/+/700084|using Ansible]]). Since our goal is to replace the Ansible code with Puppet eventually, we also ensured the "latest" backup dirs exist using Puppet. We did this in two places: the profile class currently used in production ([[gerrit:700622]]) and the backup class from the gitlab module currently used only in cloud ([[gerrit:700595]]). Ideally we want to get to a situation where both production and cloud machines are set up automatically by the same Puppet role, both using the module. The backup scripts on the GitLab machine will update the <code>latest.tar</code> file. <syntaxhighlight lang="bash">
/srv/gitlab-backup/
├── 1624752267_2021_06_27_13.11.5_gitlab_backup.tar
├── 1624838667_2021_06_28_13.11.5_gitlab_backup.tar
├── 1624925067_2021_06_29_13.11.5_gitlab_backup.tar
└── latest
    └── latest.tar
</syntaxhighlight>
Bacula is then configured to use only the <code>/latest</code> folder and save the most recent backup. Here is the fileset used in Bacula:<syntaxhighlight lang="bash">
    bacula::director::fileset { 'gitlab':
        includes => [ '/srv/gitlab-backup/latest', '/etc/gitlab/config_backup/latest' ]
    }
</syntaxhighlight>
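
How the backup scripts might refresh <code>latest.tar</code> can be sketched as follows. This is an illustration under assumptions, not the script used on the GitLab hosts: the function name <code>update_latest</code> is hypothetical, and it relies on the archive names starting with a Unix timestamp, as in the directory listing above.

```shell
#!/bin/sh
# Sketch only: update_latest is a hypothetical helper, not the production
# backup script. It copies the newest data archive to latest/latest.tar.
update_latest() {
    backup_dir="$1"
    newest=""

    # Archive names start with a Unix timestamp, so the glob's lexical order
    # is also chronological and the last match is the newest archive.
    for archive in "$backup_dir"/*_gitlab_backup.tar; do
        [ -e "$archive" ] || continue
        newest="$archive"
    done

    # Fail if there is no archive to copy.
    [ -n "$newest" ] || return 1

    mkdir -p "$backup_dir/latest"

    # Copy to a temporary name first, then rename, so that a concurrent
    # reader never sees a partially written latest.tar.
    cp -- "$newest" "$backup_dir/latest/latest.tar.tmp" &&
        mv -f -- "$backup_dir/latest/latest.tar.tmp" "$backup_dir/latest/latest.tar"
}

# Example call (path taken from the listing above):
#   update_latest /srv/gitlab-backup
```

Copying instead of moving the archive leaves the three locally retained backups untouched while Bacula only ever sees a single file.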


===Restore===
If you connect to the service as a user, you should expect to see the fingerprint for the service IP, but currently you will see the fingerprint of the backend you are connecting to. Currently this is gitlab1004, but it could change when we switch data centers or fail over.
WIP


=== Failover ===
We are looking into getting a new configuration option into GitLab upstream to properly fix this. Meanwhile, you can find fingerprints linked on [[Help:SSH_Fingerprints/gitlab.wikimedia.org]].
WIP
 
Also see the status of this ticket: [[phab:T296944]]
 
== How to create or migrate a repo / group / project ==
 
See [[mw:GitLab/Hosting a project on GitLab]] for full user documentation.
 
== Tickets ==
 
*[[phab:T274459]] (VM creation request)
*[[phab:T296944]] (Self-reported GitLab SSH host key fingerprints don’t appear to match actual host key fingerprints)
*[[phab:T295481]] (Setup GitLab Runner in trusted environment)
 
[[Category:SRE Service Operations]]

Latest revision as of 12:25, 8 June 2022

This page contains SRE-related topics for GitLab. For GitLab application-specific information, user documentation, and policy, please see mw:GitLab on mediawiki.org.

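GitLab is reachable at https://gitlab.wikimedia.org/. We run multiple instances of GitLab:

  • gitlab1004 runs production GitLab serving https://gitlab.wikimedia.org/
  • gitlab1003 runs a passive GitLab replica serving https://gitlab-replica.wikimedia.org/
  • gitlab-prod-1001.devtools.eqiad1.wikimedia.cloud runs a production-like test instance in WMCS/VPS serving https://gitlab.devtools.wmcloud.org/
  • gitlab1001 and gitlab2001 are old Ganeti VMs which will be decommissioned soon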

GitLab instances

gitlab1003, gitlab1004 and the test instance gitlab-prod-1001 are set up using Puppet. The configuration currently lives in profile::gitlab. The former configuration from gitlab-ansible was migrated completely to Puppet (see T283076). GitLab is installed as an Omnibus installation on all instances, so all GitLab components are installed using the official packages and run on a single host. The reasons for this setup can be found in the Initialization docs on MediaWiki.

GitLab login is implemented with SSO using CAS/SSO. Users are redirected to idp.wikimedia.org (idp.wmcloud.org on WMCS/VPS) to log in to the SSO portal. Authentication is currently open to all users with a Wikimedia developer account for the production instance. Access to the replica and test instance is restricted to WMF/NDA groups.

GitLab runners

GitLab offers CI/CD capabilities. For our current setup and Runner documentation, see GitLab/Gitlab Runner.

SSH fingerprints

See Help:SSH_Fingerprints/gitlab.wikimedia.org for an overview of all fingerprints at once.

Each GitLab server has four IPs on the same network interface: one IPv4 and one IPv6 address for the server itself, used by the standard sshd that admins connect to on the individual backend (gitlab1001.wikimedia.org/gitlab2001.wikimedia.org), and one IPv4 and one IPv6 address for the service (gitlab.wikimedia.org).

If you connect to the service as a user, you should expect to see the fingerprint for the service IP, but currently you will see the fingerprint of the backend you are connecting to. Currently this is gitlab1004, but it could change when we switch data centers or fail over.

We are looking into getting a new configuration option into GitLab upstream to properly fix this. Meanwhile, you can find fingerprints linked on Help:SSH_Fingerprints/gitlab.wikimedia.org.

Also see the status of this ticket: phab:T296944
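
Until the mismatch is fixed, one way to check which host key the service actually presents is to compare it with the value published on the fingerprints page. The following is a minimal sketch, not an official procedure: the helper name fingerprint_matches is hypothetical, and the expected fingerprint has to be copied by hand from Help:SSH_Fingerprints/gitlab.wikimedia.org.

```shell
#!/bin/sh
# Sketch only: fingerprint_matches is a hypothetical helper.
# It reads lines in the format printed by `ssh-keygen -lf -` on stdin, e.g.
#   256 SHA256:<fingerprint> <host> (ED25519)
# and succeeds if any line carries the expected fingerprint in field 2.
fingerprint_matches() {
    expected="$1"
    awk -v want="$expected" '
        $2 == want { found = 1 }
        END { exit !found }
    '
}

# Example use (needs network access; replace the placeholder with the value
# from the fingerprints help page):
#   ssh-keyscan gitlab.wikimedia.org 2>/dev/null \
#       | ssh-keygen -lf - \
#       | fingerprint_matches "SHA256:<fingerprint from the help page>" \
#       && echo "host key matches" || echo "host key does NOT match"
```

Because the service currently presents the key of whichever backend answers, a match against the backend's published fingerprint is the expected result for now.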

How to create or migrate a repo / group / project

See mw:GitLab/Hosting a project on GitLab for full user documentation.

Tickets

  • phab:T274459 (VM creation request)
  • phab:T296944 (Self-reported GitLab SSH host key fingerprints don’t appear to match actual host key fingerprints)
  • phab:T295481 (Setup GitLab Runner in trusted environment)