Netbox
Revision as of 16:53, 25 October 2019
Netbox is an "IP address management (IPAM) and data center infrastructure management (DCIM)" tool.
At Wikimedia it was evaluated in Phab:T170144 as a replacement for Racktables.
The actual migration between the two systems took place in Phab:T199083.
Web UI
- https://netbox.wikimedia.org/
- login using your LDAP/Wikitech credentials
- Currently you need to be a member of the "ops" LDAP group to be able to log in
Backups
The following paths are backed up in Bacula:
/srv/netbox-dumps/
/srv/postgres-backup/
A puppetized cron job (class postgresql::backup) automatically creates a daily dump file of all local Postgres databases (pg_dumpall) and stores it in /srv/postgres-backup.
This path is then backed up by Bacula.
For more details, the related subtask to setup backups was Phab:T190184.
Restore
To restore files from Bacula back to the client, use bconsole on helium and refer to Bacula#Restore_(aka_Panic_mode) for detailed steps.
To restore postgres databases from a dump file:
- unzip the latest dump file from /srv/postgres-backup
- sudo -u postgres /usr/bin/psql < psql-all-dbs-20180804.sql
Some more details from when the restore procedure was tested can be found at Phab:T190184#4481629.
Automatic CSV Dumps
Each hour at :37, a script dumps the most pertinent tables into a timestamped target directory under /srv/netbox-dumps. Sixteen of these dumps are retained for backup purposes; rotation is performed by the script /srv/deployment/netbox/deploy/scripts/rotatedump. The script only rotates directories matching the pattern 20*, so if a manually retained dump is desired, one can simply run the script (su netbox -c /srv/deployment/netbox/deploy/scripts/rotatedump) and rename the resulting dump so it falls outside that pattern, perhaps with a descriptive prefix.
Note that historical copies are also available from Bacula, as this is one of the directories that are backed up.
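The rotation behavior described above can be sketched as follows. This is a hypothetical reimplementation for illustration only, not the actual rotatedump script; the retention count of 16 and the 20* pattern come from the text above:

```python
import shutil
from pathlib import Path

RETAIN = 16  # number of timestamped dumps kept, per the description above


def rotate_dumps(dump_dir: Path, retain: int = RETAIN) -> None:
    """Keep the newest `retain` timestamped dump directories.

    Only directories matching the pattern 20* are considered; a dump
    renamed with a descriptive prefix falls outside the pattern and is
    never touched by rotation.
    """
    candidates = sorted(
        (d for d in dump_dir.glob("20*") if d.is_dir()), reverse=True
    )
    # Newest first; everything beyond the retention count is deleted.
    for old in candidates[retain:]:
        shutil.rmtree(old)
```

A dump renamed to e.g. keep-pre-upgrade-20191025 would survive every rotation, which is exactly the escape hatch the convention relies on.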
Dumping Database for Testing Purposes
The Netbox database contains a few bits of sensitive information, and if it is going to be used for testing purposes in WMCS it should be sanitized first.
- Create a copy of the main database: createdb netbox-sanitize && pg_dump netbox | psql netbox-sanitize
- Run the SQL code below on the netbox-sanitize database.
- Dump and drop the database: pg_dump netbox-sanitize > netbox-sanitized.sql; dropdb netbox-sanitize
-- truncate secrets
TRUNCATE secrets_secret CASCADE;
TRUNCATE secrets_sessionkey CASCADE;
TRUNCATE secrets_userkey CASCADE;
-- sanitize dcim_serial
UPDATE dcim_device SET serial = concat('SERIAL', id::TEXT);
-- truncate user table
TRUNCATE auth_user CASCADE;
-- sanitize dcim_interface.mac_address
UPDATE dcim_interface SET mac_address = CONCAT(
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0')) :: macaddr;
-- sanitize circuits_circuit.cid
UPDATE circuits_circuit SET cid = concat('CIRCUIT', id::TEXT);
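For illustration, the MAC address sanitization above can be mirrored in Python; random_mac is a hypothetical helper written for this sketch, not part of Netbox:

```python
import random


def random_mac() -> str:
    """Mirror the SQL above: six random octets in the range 0x01..0xff,
    zero-padded to two hex digits and joined with colons."""
    return ":".join(f"{random.randint(1, 255):02x}" for _ in range(6))
```

Each octet is drawn from 1..255, matching FLOOR(random() * 255 + 1) in the SQL, so no octet is ever 00.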
Reports
Netbox reports are a way of validating data within Netbox. They are available at https://netbox.wikimedia.org/extras/reports/ and are defined in the repository https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-reports/.
In summary, reports produce a series of log lines that indicate some status connected to a machine, each with a level of error, warning, or success. Purely informational log lines with no particular disposition may also be emitted.
Report Conventions
Because of limitations to the UI for Netbox reports, certain conventions have emerged:
- Reports should emit one log_error line for each failed item. If the item doesn't exist as a Netbox object, None may be passed in place of the first argument.
- If any log_warning lines are produced, they should be grouped after the loop which produces the log_error lines.
- Reports should emit one log_success containing a summary of successes, as the last log line in the report.
- Log messages referring to a single object should be formatted like <verb/condition> <noun/subobject>[: <explanatory extra information>]. Examples:
  - malformed asset tag: WNF1212
  - missing purchase date
- Summary log messages should be formatted like <count> <verb/condition> <noun/subobject>
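The conventions above can be illustrated with a small sketch. In a real deployment a report subclasses extras.reports.Report; here a minimal stand-in base class is defined so the logging pattern can run on its own, and AssetTagReport, the device dicts, and the WMF-prefix check are all hypothetical examples, not actual Wikimedia reports:

```python
class Report:
    """Stand-in for extras.reports.Report (an assumption for illustration):
    collects (level, object, message) tuples instead of rendering a UI."""

    def __init__(self):
        self.log = []

    def log_error(self, obj, message):
        self.log.append(("error", obj, message))

    def log_warning(self, obj, message):
        self.log.append(("warning", obj, message))

    def log_success(self, obj, message):
        self.log.append(("success", obj, message))


class AssetTagReport(Report):
    """Hypothetical report checking that devices have well-formed asset tags."""

    def test_asset_tags(self, devices):
        failed = 0
        for device in devices:
            tag = device.get("asset_tag")
            if not tag or not tag.startswith("WMF"):
                # One log_error per failed item, formatted as
                # <verb/condition> <noun/subobject>: <extra information>.
                self.log_error(device["name"], f"malformed asset tag: {tag}")
                failed += 1
        # One summary log_success as the last log line:
        # <count> <verb/condition> <noun/subobject>.
        self.log_success(
            None, f"{len(devices) - failed} well-formed asset tags"
        )


report = AssetTagReport()
report.test_asset_tags([
    {"name": "db1001", "asset_tag": "WMF1001"},
    {"name": "db1002", "asset_tag": "WNF1212"},
])
```

Note that None is passed as the object of the summary line, matching the convention for log lines that refer to no single Netbox object.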
Report Alert
The report results are at https://netbox.wikimedia.org/extras/reports/
Most alerting reports flag non-critical data integrity mismatches caused by changes in infrastructure; they act as a secondary check and are the responsibility of DC-ops.
| Report | Typical Responsibility | Typical Error(s) |
|---|---|---|
| Accounting | Faidon or DC-ops | |
| Cables | DC-ops | |
| Coherence (does not alert) | | |
| LibreNMS | DC-ops or Netops | |
| Management | DC-ops | |
| PuppetDB | Whoever changed / reimaged host | <device> missing from PuppetDB or <device> missing from Netbox. These occur because the data in PuppetDB does not match the data in Netbox, typically related to missing or unexpected devices. Generally these errors fix themselves once the reimage is complete, but the Netbox record for the host may need to be updated for decommissioning and similar operations. |
| Juniper (does not alert) | DC-ops or Netops | |
Juniper Report
The Juniper Installed Base report requires manual steps to update:
- Log in to my.juniper.net
- Go to the Products tab
- Hit the Export button, select "No filter, All Columns, and Accounts", then Export
- Download the spreadsheet from the 🔔 (Notification) menu.
- Copy it to netbox1001.wikimedia.org:/tmp/juniper_installed_base.csv
- Run the report
A possible future evolution is to query that data directly from Juniper's APIs.