Netbox: Difference between revisions

imported>CRusnov
(Reorganize top of article to be more random viewer friendly. Add section where we can put pointers and procedures for errors produced by the system.)
imported>CRusnov
m (→‎Would like to remove interface: minor editorial and typo fix to make it more clear whats happening)
Revision as of 18:20, 22 January 2021

Netbox is an "IP address management (IPAM) and data center infrastructure management (DCIM) tool".


At Wikimedia it is used as the DCIM and IPAM system, as well as being used as an integration point for switch and port management, DNS management and similar operations.

History

Web UI

  • https://netbox.wikimedia.org/
  • log in using your LDAP/Wikitech credentials
  • currently you need to be in either the "ops" or "wmf" LDAP group to be able to log in

Backups

The following paths are backed up in Bacula:

/srv/netbox-dumps/
/srv/postgres-backup/

A puppetized cron job (class postgresql::backup) automatically creates a daily dump file of all local Postgres databases (pg_dumpall) and stores it in /srv/postgres-backup.

This path is then backed up by Bacula.

For more details, see Phab:T190184, the subtask in which the backups were set up.

Restore

First, analyze the Netbox changelog to choose the best course of action for the restore.

The general options are:

  • Manually (or via the API) replay the actions listed in the changelog in reverse order. The changelog entries don't have the full raw data, and some of them might show names instead of the IDs required by the API.
  • Use the CSV dumps to recover data. Restoring from them is not trivial either, because some of the Netbox exports are not immediately re-importable due to reference resolution.
  • Restore a database dump. This ensures consistency at a given point in time, and could even be used to perform a partial restore using pg_restore.

To restore files from Bacula back to the client, use bconsole on helium and refer to Bacula#Restore_(aka_Panic_mode) for detailed steps.

Restore the DB dump

  • Check the dump list on both hosts (as of May 2020 netboxdb[12]001, in /srv/postgres-backup); a more recent one might be on the other host.
  • Copy if needed the dump to the master host (as of May 2020 netboxdb1001)
  • Unzip the chosen dump file
  • Take a one-off backup right before starting the restore (the .bak suffix is important so that the file is not auto-evicted):
 /usr/bin/pg_dumpall | /bin/gzip > /srv/postgres-backup/${USER}-DESCRIPTION.psql-all-dbs-$(date +\%Y\%m\%d).sql.gz.bak
  • Connect to the DB, list and drop the Netbox database:
psql
postgres=# \l
...
postgres=# DROP DATABASE netbox;
DROP DATABASE
postgres=#
  • Restore the DB with:
 sudo -u postgres /usr/bin/psql < ${DUMP_FILE}

Flush caches after a restore

After a restore, the Netbox caches must be flushed, both to ensure consistency and to make the changes visible.

To perform the flush, SSH into the Netbox master host (as of May 2020 netbox1001) and execute:

cd /srv/deployment/netbox
. venv/bin/activate  # Activate the Netbox Python virtualenv
cd deploy/src/netbox
python manage.py invalidate all  # Perform the flush

Automatic CSV Dumps

Each hour at :37, a script dumps the most pertinent tables to a timestamped target directory in /srv/netbox-dumps. Sixteen of these dumps are retained for backup purposes. This is handled by the script in /srv/deployment/netbox/deploy/scripts/rotatedump. The script only rotates directories matching the pattern 20*, so if a manual, retained dump is desired, one can simply run the script (su netbox -c /srv/deployment/netbox/deploy/scripts/rotatedump) and rename the resulting dump so that it falls outside of this pattern, perhaps with a descriptive prefix.
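
The retention rule described above can be sketched as follows. This is not the rotatedump script itself, only a minimal illustration of the idea, assuming that dump directory names start with a timestamp that sorts chronologically; the function name rotate_dumps and its parameters are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

RETAIN = 16  # number of timestamped dumps kept, per the description above


def rotate_dumps(base: Path, retain: int = RETAIN) -> List[str]:
    """Remove the oldest timestamped dump directories, keeping `retain` of them.

    Only directories whose names start with "20" are considered, so a dump
    renamed with a descriptive prefix is never touched.
    Returns the names of the removed directories, oldest first.
    """
    # Assumption: names sort chronologically (e.g. a YYYYMMDD-style prefix).
    candidates = sorted(p for p in base.glob("20*") if p.is_dir())
    to_remove = candidates[:-retain] if retain > 0 else candidates
    for path in to_remove:
        shutil.rmtree(path)
    return [p.name for p in to_remove]
```

Calling rotate_dumps(Path("/srv/netbox-dumps")) would, under these assumptions, leave the sixteen newest 20* directories and any renamed dumps in place.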

Note that historical copies are also available from Bacula, as this is one of the directories that are backed up.

Dumping Database for Testing Purposes

The Netbox database contains a few bits of sensitive information, and if it is going to be used for testing purposes in WMCS it should be sanitized first.

  1. Create a copy of the main database: createdb netbox-sanitize && pg_dump netbox | psql netbox-sanitize
  2. Run the SQL code below on the netbox-sanitize database.
  3. Dump and drop the database: pg_dump netbox-sanitize > netbox-sanitized.sql; dropdb netbox-sanitize
-- truncate secrets
TRUNCATE secrets_secret CASCADE;
TRUNCATE secrets_sessionkey CASCADE;
TRUNCATE secrets_userkey CASCADE;

-- sanitize dcim_device.serial
UPDATE dcim_device SET serial = concat('SERIAL', id::TEXT);

-- truncate user table
TRUNCATE auth_user CASCADE;

-- sanitize dcim_interface.mac_address
UPDATE dcim_interface SET mac_address = CONCAT(
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0'), ':',
                   LPAD(TO_HEX(FLOOR(random() * 255 + 1) :: INT)::TEXT, 2, '0')) :: macaddr;

-- sanitize circuits_circuit.cid
UPDATE circuits_circuit SET cid = concat('CIRCUIT', id::TEXT);
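
For readers who want to reason about what the SQL above produces, the same replacement values can be sketched in Python. This only mirrors the SQL expressions and is not part of the sanitization procedure; the function names placeholder and random_mac are hypothetical.

```python
import random


def placeholder(prefix: str, row_id: int) -> str:
    """Mirror concat('SERIAL', id::TEXT) and concat('CIRCUIT', id::TEXT)."""
    return f"{prefix}{row_id}"


def random_mac(rng: random.Random) -> str:
    """Mirror the SQL MAC expression: six octets, each drawn from 1..255.

    FLOOR(random() * 255 + 1) never yields 0, so no octet is ever "00";
    each octet is rendered as two lowercase hex digits, like LPAD(TO_HEX(...)).
    """
    return ":".join(f"{rng.randint(1, 255):02x}" for _ in range(6))
```

Serials and circuit IDs stay unique because they are derived from the row ID, while MAC addresses are random and could in principle collide.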

Custom Links

Netbox allows setting up custom links to other websites, using Jinja2 templating for both the displayed name and the actual link, which allows for considerable flexibility. The current setup (as of Feb. 2020) has the following links:

  • Grafana (for all physical devices and VMs)
  • Icinga (for all physical devices and VMs)
  • Debmonitor (for all physical devices and VMs)
  • Procurement Ticket (only for physical devices that have a ticket that matches either Phabricator or RT)
  • Hardware config (for Dell and HP physical devices, pointing to the manufacturer page for warranty information based on their serial number)

Netbox Extras

CustomScripts, Reports and other associated tools for Netbox are collected in the netbox-extras repository at https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-extras/. This repository is deployed to the Netbox frontends under /srv/deployment/netbox-extras. It is not automatically deployed on merge; after a merge, `git pull` must be run manually on both frontends. This can be most comfortably accomplished with Cumin on a Cumin host:

 sudo cumin 'A:netbox' 'cd /srv/deployment/netbox-extras; git pull'

This will have the dual purpose of resetting any local changes and updating the deployment to the latest version.

Reports

Netbox reports are a way of validating data within Netbox. They are available at https://netbox.wikimedia.org/extras/reports/ and are defined in the repository https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox-extras/ under reports/.

In summary, reports produce a series of log lines that indicate some status connected to a machine, and may be either error, warning, or success. Log lines with no particular disposition for information purposes may also be emitted.

Report Conventions

Because of limitations to the UI for Netbox reports, certain conventions have emerged:

  1. Reports should emit one log_error line for each failed item. If the item doesn't exist as a Netbox object, None may be passed in place of the first argument.
  2. If any log_warning lines are produced, they should be grouped after the loop which produces log_error lines.
  3. Reports should emit one log_success which contains a summary of successes, as the last log in the report.
  4. Log messages referring to a single object should be formatted like <verb/condition> <noun/subobject>[: <explanatory extra information>]. Examples:
    1. malformed asset tag: WNF1212
    2. missing purchase date
  5. Summary log messages should be formatted like <count> <verb/condition> <noun/subobject>
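
The conventions above can be illustrated with a small sketch. To keep it runnable outside Netbox, it uses a hypothetical StubReport class that only records calls; it is assumed to mirror the log_error, log_warning and log_success methods named above. The asset tag pattern and the AssetTagReport class are likewise assumptions made for illustration, not the contents of an actual report.

```python
import re
from typing import Dict, List, Optional, Tuple

ASSET_TAG_RE = re.compile(r"^WMF\d+$")  # assumed format, for illustration only


class StubReport:
    """Hypothetical stand-in for the Netbox report base class; records log calls."""

    def __init__(self) -> None:
        self.lines: List[Tuple[str, Optional[str], str]] = []

    def log_error(self, obj: Optional[str], message: str) -> None:
        self.lines.append(("error", obj, message))

    def log_warning(self, obj: Optional[str], message: str) -> None:
        self.lines.append(("warning", obj, message))

    def log_success(self, obj: Optional[str], message: str) -> None:
        self.lines.append(("success", obj, message))


def item_message(condition: str, noun: str, extra: Optional[str] = None) -> str:
    """Format '<verb/condition> <noun/subobject>[: <extra information>]'."""
    base = f"{condition} {noun}"
    return f"{base}: {extra}" if extra else base


class AssetTagReport(StubReport):
    def test_asset_tags(self, devices: List[Dict[str, Optional[str]]]) -> None:
        warnings = []
        successes = 0
        for device in devices:
            tag = device.get("asset_tag")
            if tag is None:
                # Collected now, emitted after the loop (convention 2).
                warnings.append((device["name"], item_message("missing", "asset tag")))
            elif not ASSET_TAG_RE.match(tag):
                # One log_error line per failed item (convention 1).
                self.log_error(device["name"], item_message("malformed", "asset tag", tag))
            else:
                successes += 1
        for obj, message in warnings:
            self.log_warning(obj, message)
        # A single summary log_success as the last line (conventions 3 and 5).
        self.log_success(None, f"{successes} valid asset tags")
```

Errors are emitted inside the loop, warnings are collected and emitted afterwards, and the single success line comes last, so the UI shows the failures first and the summary at the end.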

Report Alert

The report results are at https://netbox.wikimedia.org/extras/reports/

Most reports that alert flag non-critical data integrity mismatches caused by changes in infrastructure. They act as a secondary check and are the responsibility of DC-ops.

Reports and their Errors
Report                   | Typical Responsibility          | Typical Error(s)
Accounting               | Faidon or DC-ops                |
Cables                   | DC-ops                          |
Coherence (does not alert) |                               |
LibreNMS                 | DC-ops or Netops                |
Management               | DC-ops                          |
PuppetDB                 | Whoever changed / reimaged host | <device> missing from PuppetDB or <device> missing from Netbox. These occur because the data in PuppetDB does not match the data in Netbox, typically related to missing devices or unexpected devices. Generally these errors fix themselves once the reimage is complete, but the Netbox record for the host may need to be updated for decommissioning and similar operations.
Juniper (does not alert) | DC-ops or Netops                |
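
The two PuppetDB errors listed above amount to a comparison between two sets of host names. The following is a minimal sketch of that comparison, not the actual report: the function name compare_inventories is hypothetical, and the real report may apply additional filtering (for example by device status) that is not shown here.

```python
from typing import Iterable, List


def compare_inventories(netbox_devices: Iterable[str],
                        puppetdb_hosts: Iterable[str]) -> List[str]:
    """Return one message per host that is present on only one side."""
    netbox = set(netbox_devices)
    puppetdb = set(puppetdb_hosts)
    # Known to Netbox but not reporting to PuppetDB (e.g. mid-reimage).
    messages = [f"{name} missing from PuppetDB" for name in sorted(netbox - puppetdb)]
    # Reporting to PuppetDB but without a matching Netbox record.
    messages += [f"{name} missing from Netbox" for name in sorted(puppetdb - netbox)]
    return messages
```

A host that is being reimaged would appear in the first group until it reports to PuppetDB again, which matches the note that these errors usually resolve on their own.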

Juniper Report

The Juniper Installed Base report needs manual steps to be updated:

  1. Log in to my.juniper.net
  2. Go to the Products tab
  3. Hit the export button, select "No filter, All Columns, and Accounts" then Export
  4. Download the spreadsheet from the 🔔 (Notification) menu.
  5. Copy it to netbox1001.wikimedia.org:/tmp/juniper_installed_base.csv
  6. Run the report

A possible future evolution is to query that data directly from Juniper's APIs.

Exports

A set of resources that export Netbox data in various formats.

DNS

A git repository of DNS zonefile snippets is generated from Netbox data and exported via HTTPS in read-only mode. It is consumed by the DNS#Authoritative_nameservers and by the Continuous Integration tests run for the operations/dns Gerrit repository. The repository is available via:

 $ git clone https://netbox-exports.wikimedia.org/dns.git

To update the repository, see DNS/Netbox#Update_generated_records.

Extra Errors, Notes and Procedures

Would like to remove interface

This error is produced in the Interface Automation script when cleaning up old interfaces during an import.

Interfaces are considered for removal if they don't appear in the list provided by the data source (generally speaking, PuppetDB). Each candidate is then checked for an associated IP address or cable; if either is present, the interface is left in place so as not to lose data. This is considered a bug, so if you see this error in any output, feel free to open a ticket against #netbox in Phabricator.
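
The decision described above can be sketched as a small function. This is an illustration of the logic only, not the Interface Automation script; the Interface dataclass and the plan_cleanup function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Interface:
    name: str
    has_ip: bool = False
    has_cable: bool = False


def plan_cleanup(existing: Iterable[Interface],
                 reported_names: Set[str]) -> Tuple[List[str], List[str]]:
    """Split stale interfaces into (safe to remove, kept to avoid data loss).

    An interface is stale when the data source no longer reports it.
    A stale interface with an IP address or a cable is kept, which is the
    situation in which the error above would be raised.
    """
    to_remove: List[str] = []
    blocked: List[str] = []
    for iface in existing:
        if iface.name in reported_names:
            continue  # still reported by the data source, nothing to do
        if iface.has_ip or iface.has_cable:
            blocked.append(iface.name)
        else:
            to_remove.append(iface.name)
    return to_remove, blocked
```

Any name in the second list corresponds to an interface that the import wanted to remove but left in place, and is therefore worth a ticket.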