Labsdb redaction
This page is a work in progress.
This page documents how data is sanitized for the public databases that Wikimedia Cloud Services provides.
Step 1 Sanitarium
See MariaDB/Sanitarium and Labsdbs for more details.
Sanitarium runs 7 mysql instances to replicate the db shards. It removes sensitive columns, tables and databases in the simple case where there are no conditions (e.g. it ensures user_password does not go into labs).
- For tables that should not be replicated, the replicate-wild-ignore-table mysql config option is set with the $private_tables puppet variable
- For databases that should not be replicated (private wikis), replicate-wild-ignore-table is set with the databases from the $private_wikis puppet variable (Note, this is separate from private.dblist)
- For columns that should be redacted, they are redacted via triggers that are set based on the list of columns at modules/role/files/mariadb/filtered_tables.txt
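As a rough illustration of the first two points, the sketch below builds replicate-wild-ignore-table lines from two lists. The function name and the sample table and wiki names are hypothetical, and the real configuration is generated by puppet rather than by this code. The pattern form (`%.table` for a table in every database, `db.%` for a whole database) follows the option's documented `db_name.tbl_name` wildcard syntax.

```python
def ignore_rules(private_tables, private_wikis):
    """Build replicate-wild-ignore-table lines for a mysql config file.

    Hypothetical helper: the two arguments stand in for the
    $private_tables and $private_wikis puppet variables.
    """
    rules = []
    # A table that must be skipped in every database: "%.<table>".
    for table in private_tables:
        rules.append(f"replicate-wild-ignore-table = %.{table}")
    # A whole private wiki database: "<db>.%".
    for wiki in private_wikis:
        rules.append(f"replicate-wild-ignore-table = {wiki}.%")
    return rules


# Sample values are made up for illustration, not taken from puppet.
for line in ignore_rules(["exampleprivatetable"], ["examplewiki"]):
    print(line)
```

Note that in these patterns `_` is also a single-character wildcard, so a real generator may need to escape it with a backslash when a literal underscore is intended.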
Data from this host is then replicated onto the labsdb hosts. Doing this redaction on a separate host outside of labs helps isolate the security of the data and ensures that a privilege escalation on labs does not compromise the very sensitive data in the db.
There is also a report, check_private_data_report, to make sure redaction happened properly (FIXME: How is this run?)
The code related to sanitarium currently lives in operations/puppet in modules/role/files/mariadb.
- modules/role/files/mariadb/redact_sanitarium.sh: adds triggers to redact the appropriate columns
- modules/role/files/mariadb/filtered_tables.txt: what columns to filter
- modules/role/files/mariadb/check_private_data_report and check_private_data.py: audit to make sure no private data is there
- $private_wikis and $private_tables in manifests/realm.pp
This code was formerly part of operations/software/redactron.git, but that repo is no longer used.
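To make the trigger-based column redaction more concrete, here is a minimal sketch of how a column list could be turned into trigger statements. It is not the logic of redact_sanitarium.sh: the "table,column" line format, the trigger names, and blanking columns to an empty string are all assumptions made for illustration, and example_column is a made-up name.

```python
from collections import defaultdict


def parse_filtered(text):
    """Group columns by table from 'table,column' lines (assumed format)."""
    columns = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        table, column = [part.strip() for part in line.split(",")[:2]]
        columns[table].append(column)
    return dict(columns)


def trigger_sql(table, cols):
    """Return BEFORE INSERT and BEFORE UPDATE triggers that blank the columns."""
    # Assumption: redaction means overwriting the value with ''.
    assignments = ", ".join(f"NEW.{c} = ''" for c in cols)
    return [
        f"CREATE TRIGGER {table}_{event.lower()} BEFORE {event} ON {table} "
        f"FOR EACH ROW SET {assignments};"
        for event in ("INSERT", "UPDATE")
    ]


sample = "user,user_password\nuser,example_column\n"
for table, cols in parse_filtered(sample).items():
    for statement in trigger_sql(table, cols):
        print(statement)
```

Redacting on both insert and update matters because a replicated UPDATE would otherwise write the sensitive value back into a row that was blanked on insert.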
Step 2 Labsdb views
In operations/puppet.git, modules/role/templates/labs/db/views/maintain-views.yaml contains the views that define what is public. These include conditional redactions that cannot be done at sanitarium (e.g. revision deletion), and the views also serve as defense in depth in case one of the sanitarium redactions fails.
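A conditional redaction of this kind can be sketched as a view whose column expressions return NULL when a flag is set. The code below is only an illustration: it does not reflect the actual structure of maintain-views.yaml, and the database names, the column names, and the `rev_deleted & 4` condition are assumptions chosen for the example.

```python
def view_sql(view_db, source_db, table, columns):
    """Build a CREATE VIEW statement with per-column conditional redaction.

    columns maps a column name to the SQL condition under which it is
    hidden, or to None if the column is always exposed (hypothetical spec).
    """
    exprs = []
    for name, hide_if in columns.items():
        if hide_if is None:
            exprs.append(name)
        else:
            # Return NULL instead of the value whenever the condition holds.
            exprs.append(f"IF({hide_if}, NULL, {name}) AS {name}")
    select_list = ", ".join(exprs)
    return (
        f"CREATE OR REPLACE VIEW {view_db}.{table} AS "
        f"SELECT {select_list} FROM {source_db}.{table};"
    )


# Illustrative names only; not copied from the real view definitions.
print(view_sql(
    "examplewiki_p", "examplewiki", "revision",
    {"rev_id": None, "rev_user": "rev_deleted & 4"},
))
```

Because the condition depends on the value of another column in the same row, it cannot be expressed as an unconditional column or table filter, which is why it belongs in the view layer.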
Document redaction decisions
TODO: include documentation/rationale on any info publicly exposed that is not publicly exposed by MW.
Other
Note: operations/software/redactron.git and operations/software/labsdb-auditor.git contain historical software which is no longer used.