
Media storage/Backups


The backup servers are listed in Puppet (hieradata). Worker servers download and upload files during backup and recovery, and also pre-process them (e.g. hashing them and checking their integrity). Storage servers run minio (an S3-API-compatible service) and hold the data long term. For now, server discovery is completely static, as this gives us high flexibility to depool a server.

Each datacenter has its own separate set of credentials.

At the moment, the clustering functionality of minio is not used, meaning each storage server (minio server) has to be accessed individually. Sharding is based on the sha256 hash of the file: the hash space is divided evenly among the configured servers, in the order in which they are configured. For example, with:

endpoints:
  - https://backup1004.eqiad.wmnet:9000
  - https://backup1005.eqiad.wmnet:9000
  - https://backup1006.eqiad.wmnet:9000
  - https://backup1007.eqiad.wmnet:9000

files whose hash starts (in hexadecimal) with 0-3 go to backup1004, 4-7 to backup1005, 8-B to backup1006 and C-F to backup1007. This mapping is not guaranteed to stay the same: servers will likely be unavailable for maintenance at times, and the number of servers will be expanded, so eventually one will be forced to use the metadata database to find the server where a file is stored.
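The mapping in the example can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code the backup software runs: the function names are hypothetical, and the even split of the 256-bit hash space is inferred from the four-server example. The real implementation may behave differently, particularly when the number of servers is not a power of two.

```python
import hashlib

# Endpoints in configured order, copied from the example above.
ENDPOINTS = [
    "https://backup1004.eqiad.wmnet:9000",
    "https://backup1005.eqiad.wmnet:9000",
    "https://backup1006.eqiad.wmnet:9000",
    "https://backup1007.eqiad.wmnet:9000",
]


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal sha256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shard_for_digest(hex_digest, endpoints=ENDPOINTS):
    """Pick an endpoint by splitting the sha256 hash space into equal ranges.

    ASSUMPTION: ranges are equal-sized and assigned in configured order,
    which reproduces the 0-3 / 4-7 / 8-B / C-F split for four servers.
    """
    position = int(hex_digest, 16)                  # digest as a 256-bit integer
    index = position * len(endpoints) // 2 ** 256   # which equal-sized range it falls in
    return endpoints[index]
```

For instance, `shard_for_digest(sha256_of_file("some_file"))` would return the endpoint expected to hold `some_file` under these assumptions. Once servers are depooled or added, the metadata database remains the authoritative source.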

How to access the web UI of minio

The integrated web client of minio, while simple, makes it easy to manage, list, upload and download files on the backend through a more user-friendly interface.

minio access is firewalled: its service port is open only to the backup workers and to Prometheus (for metrics gathering) in the same datacenter. To gain access, one needs to tunnel HTTPS on port 9000 to a local port with ssh through a server that has access (e.g. a worker server in the same datacenter).

For example:

ssh -L 1234:backup1004.eqiad.wmnet:9000 ms-backup1001.eqiad.wmnet

will tunnel the minio service to the local port 1234 through ms-backup1001.

For the actual active worker and storage servers, consult hieradata.
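When the tunnel has to be opened from a script, the same command can be built and checked from Python. This is a minimal sketch using the example hostnames above; the helper names are hypothetical, and the `-N` flag (do not run a remote command) is an addition that is not part of the command shown earlier.

```python
import socket
import subprocess

MINIO_PORT = 9000  # minio service port, as stated in the text above


def build_tunnel_command(storage_host, worker_host, local_port):
    """Build the ssh argument list that forwards local_port to minio on storage_host."""
    forward = f"{local_port}:{storage_host}:{MINIO_PORT}"
    # -N: only forward ports, do not execute a remote command (added for scripting).
    return ["ssh", "-N", "-L", forward, worker_host]


def open_tunnel(storage_host, worker_host, local_port):
    """Start ssh in the background; the caller must terminate() the returned process."""
    return subprocess.Popen(build_tunnel_command(storage_host, worker_host, local_port))


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A typical use would be `tunnel = open_tunnel("backup1004.eqiad.wmnet", "ms-backup1001.eqiad.wmnet", 1234)`, then polling `port_is_open(1234)` before opening the browser, and calling `tunnel.terminate()` when done.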

File:MinIO login.png
Minio login screen

Then open https://localhost:1234 in your browser. The https is important, as non-TLS traffic is not allowed.

Your browser will complain about the lack of a trusted TLS certificate. This is because the service uses the discovery CA, which is only deployed to the WMF production cluster. Either install it on your client PC or click "Accept risk and continue".

A login screen should appear. Credentials are deployed on the worker servers at /etc/mediabackup/mediabackups_recovery.conf. They are also available on private-puppet:hieradata/common.yaml and private-puppet:hieradata/{eqiad, codfw}.yaml.

File:MinIO browser.png
Minio file browser

It is highly recommended to use the backup recovery credentials (not the backup generation ones), as the recovery credentials are read-only. If writing is needed, it is more reliable to use the command line client (mc), unless you know what you are doing.

After logging in, a browser screen should appear, allowing you to navigate the file structure, download files, etc. The minio browser does not disable the delete and upload options when using read-only credentials; those operations will simply fail.