LibreNMS
LibreNMS is an autodiscovering PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, Juniper, Foundry, and many more.
LibreNMS is a community-based fork of the last GPL-licensed version of Observium.
Service
Currently hosted on netmon1003 and netmon2002.
Replaces Observium, which ran on Streber.
- The software is not installed via a Debian package
- The software is installed in:
/srv/deployment/librenms/
- RRD data is stored in:
/srv/librenms/
- User credentials are stored in MySQL; to check the configured authentication mechanism:
# grep auth_mechanism /srv/deployment/librenms/librenms/config.php
- Authentication is done via LDAP
How to
Add a device to LibreNMS
Configure the read-only SNMP v2c community on the device.
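Before adding a device, it can help to confirm that it actually answers SNMP v2c queries. This is a minimal sketch with several assumptions. It assumes the net-snmp command-line tools (snmpget) are installed where you run it. check_snmp is a hypothetical helper name. The community string is whatever was configured on the device.

```shell
# Hypothetical helper (not part of LibreNMS): check that a device answers
# SNMP v2c reads with the given community before adding it.
# Requires the net-snmp command-line tools (snmpget).
check_snmp() {
  local fqdn="$1" community="$2"
  # 1.3.6.1.2.1.1.5.0 is sysName.0; a numeric OID avoids depending on local MIB files.
  # -t 2 -r 1: 2 second timeout, 1 retry.
  if snmpget -v2c -c "$community" -t 2 -r 1 "$fqdn" 1.3.6.1.2.1.1.5.0; then
    echo "SNMP OK: $fqdn"
  else
    echo "SNMP FAILED: $fqdn (check the community string and device ACLs)" >&2
    return 1
  fi
}

# Example: check_snmp <fqdn> '<community>'
```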
Via web UI:
https://librenms.wikimedia.org/addhost/
Use the device FQDN and keep all the other fields as they are (do not force-add it). Note: because of a bug, set the port to "161".
The device should be discovered and polled within the next 10 minutes.
Via CLI:
$ ssh librenms.wikimedia.org
$ cd /srv/deployment/librenms/librenms
$ sudo -u librenms php addhost.php <fqdn>
Added device <fqdn> (XXX)
$ sudo -u librenms php discovery.php -h XXX && sudo -u librenms php poller.php -h XXX
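The CLI steps above can be wrapped so that the device ID comes from the addhost.php output instead of being copied by hand. This is a sketch only. add_and_poll and parse_device_id are hypothetical helpers. They assume addhost.php prints a line of the form "Added device <fqdn> (<id>)" with a numeric ID, as in the example output above.

```shell
# Hypothetical wrapper around the CLI steps above; run it on the LibreNMS host.
# Assumes addhost.php prints a line of the form "Added device <fqdn> (<id>)".

# Extract the numeric device ID from addhost.php output (read on stdin).
parse_device_id() {
  sed -n 's/^Added device .* (\([0-9][0-9]*\))$/\1/p'
}

# Add a device, then run discovery and polling for it right away.
# The body runs in a subshell so the "cd" does not leak into the caller.
add_and_poll() (
  fqdn="$1"
  cd /srv/deployment/librenms/librenms || exit 1
  out=$(sudo -u librenms php addhost.php "$fqdn") || exit 1
  printf '%s\n' "$out"
  id=$(printf '%s\n' "$out" | parse_device_id)
  if [ -z "$id" ]; then
    echo "could not find a device ID in the addhost.php output" >&2
    exit 1
  fi
  sudo -u librenms php discovery.php -h "$id" &&
    sudo -u librenms php poller.php -h "$id"
)

# Example: add_and_poll <fqdn>
```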
Upgrade LibreNMS
Assume your remotes are configured as follows (as shown by git remote -v), with each new version tracked in its own branch:
origin ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (fetch)
origin ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (push)
upstream https://github.com/librenms/librenms.git (fetch)
upstream https://github.com/librenms/librenms.git (push)
new=<new version>
old=<old version>
git fetch upstream
git checkout -b upstream-$new $new
# If you are missing composer, install it first: apt install -y composer php-gd
composer install --no-dev  # you will be prompted for any missing PHP requirements
git add -f vendor
git commit -m "Add composer requirements for LibreNMS $new"
mkdir scap
git checkout upstream-$old -- scap/scap.cfg
git add scap
git commit -m "Add Scap config"
git push origin upstream-$new
WARNING: At this point, make sure we are not leaving behind "our" patches to the old version. Check whether any patches were applied on top of upstream-$old and cherry-pick them onto upstream-$new. For an example of an upgrade that left patches behind, see https://phabricator.wikimedia.org/T273716#7430992
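The patch check can be partly scripted. This is a minimal sketch with stated assumptions. It must run in the LibreNMS repository, with a local upstream-$old branch and the upstream release tag $old already fetched. list_local_patches is a hypothetical helper. The two commit subjects it skips are the ones created by the upgrade steps on this page.

```shell
# Hypothetical helper: list commits on upstream-$old that are not in the upstream
# release tag $old, oldest first, skipping the two commits every upgrade
# recreates ("Add composer requirements ..." and "Add Scap config").
list_local_patches() {
  local old="$1"
  git log --reverse --format='%H %s' "${old}..upstream-${old}" |
    grep -v -e ' Add composer requirements for LibreNMS ' -e ' Add Scap config$'
}

# Review the list first, then apply the remaining patches in order, e.g.:
#   git checkout upstream-$new
#   git cherry-pick $(list_local_patches "$old" | cut -d' ' -f1)
```

git cherry-pick accepts several commits and applies them in the order given, stopping at the first conflict.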
On deploy1002:
cd /srv/deployment/librenms/librenms/
git fetch origin
git branch  # note the current branch, it is needed for a rollback
git checkout upstream-<version>
scap deploy "Upgrade LibreNMS to <version> - <task>"
Run Puppet on the netmon* hosts from a cumin host (cumin1001.eqiad.wmnet or cumin2002.codfw.wmnet):
cumin O:netmon run-puppet-agent
On the netmon_server host (find it with git grep -h netmon_server: hieradata/ in the Puppet repository), run the daily maintenance script:
cd /srv/deployment/librenms/librenms
sudo -u librenms ./daily.sh
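The host lookup can be scripted from a checkout of the Puppet repository. This is a minimal sketch. It assumes the key's value in hieradata/ is a plain hostname, optionally quoted. netmon_server is a hypothetical function name. If it prints more than one host, check which hiera file applies.

```shell
# Hypothetical helper: print the value(s) of the netmon_server hiera key.
# Assumes lines of the form "<...>netmon_server: <host>", value optionally quoted.
netmon_server() {
  git grep -h 'netmon_server:' -- hieradata/ |
    awk '$1 ~ /netmon_server:$/ { print $2 }' |
    tr -d "\"'" |
    sort -u
}

# Example, from the Puppet repository:
#   ssh "$(netmon_server)"
```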
Rollback
On deploy1002:
cd /srv/deployment/librenms/librenms/
git fetch origin
git checkout <previous branch>
scap deploy "Rollback LibreNMS to <version> - <task>"
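If the previous branch was not noted during the upgrade, git can usually recover it. This only works if nobody has checked out another branch in that repository since. The @{-1} syntax refers to the branch checked out before the current one, which is where git checkout - would return. This is a minimal sketch; previous_branch is a hypothetical helper name, and git reflog remains the authoritative history.

```shell
# Hypothetical helper: print the branch that was checked out before the
# current one (git's @{-1} syntax, as used by "git checkout -").
previous_branch() {
  git rev-parse --abbrev-ref '@{-1}'
}

# Example, in /srv/deployment/librenms/librenms/ on the deployment host:
#   git checkout "$(previous_branch)"
```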
Then run Puppet again from a cumin host:
cumin O:netmon run-puppet-agent
Check the logs
LibreNMS writes logs to 4 different locations:
- /srv/deployment/librenms/librenms/logs/librenms.log
- /var/log/librenms.log
- /var/log/librenms/daily.log
- /var/log/apache2/librenms.wikimedia.org.error.log
It would be great to have the first 3 in a single location.
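Until the logs are consolidated, one way to watch all of them at once is to follow every listed file that exists on the host. This is a minimal sketch. existing_librenms_logs and LIBRENMS_LOGS are hypothetical names, and the paths are the four listed above. Run it as root (for example after sudo -i), since /var/log/apache2 is typically not readable by regular users.

```shell
# The four log locations listed above.
LIBRENMS_LOGS='/srv/deployment/librenms/librenms/logs/librenms.log
/var/log/librenms.log
/var/log/librenms/daily.log
/var/log/apache2/librenms.wikimedia.org.error.log'

# Hypothetical helper: print the log files that exist on this host.
# The optional first argument is a directory prefix, only useful for testing.
existing_librenms_logs() {
  local root="${1:-}"
  printf '%s\n' "$LIBRENMS_LOGS" | while IFS= read -r f; do
    if [ -e "$root$f" ]; then
      printf '%s\n' "$root$f"
    fi
  done
}

# Follow all of them at once (tail prefixes each chunk with the file name):
#   tail -F $(existing_librenms_logs)
```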
Mass update PDU alerting thresholds
PDUs get automatically generated thresholds; the queries below override them with sane defaults for the eqiad and codfw PDUs. They need to be rerun whenever new PDUs are provisioned.
https://phabricator.wikimedia.org/T247358
https://phabricator.wikimedia.org/T245655
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'power'
AND sensor_descr LIKE 'Phase%'
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 1400;
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_class = 'current'
AND (sensor_descr LIKE '%Phase%' OR sensor_descr LIKE '%Line%')
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 12;
UPDATE librenms.sensors JOIN librenms.devices
ON sensors.device_id = devices.device_id
AND sensor_descr LIKE 'Cord%'
AND sensor_class = 'power'
AND (hostname LIKE '%eqiad%' OR hostname LIKE '%codfw%')
SET sensors.sensor_custom = 'Yes', sensor_limit = 3440;
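To review the limits before and after running the updates, a read-only query along the lines below can help. This is a sketch only, with three caveats. pdu_sensor_limits is a hypothetical helper. It assumes the mysql client on the host can reach the librenms database with your credentials. Its description filter only approximates the three UPDATE statements above.

```shell
# Hypothetical helper: list the current limits of roughly the sensors targeted
# by the UPDATE queries above (read-only, changes nothing).
pdu_sensor_limits() {
  mysql librenms -e "
    SELECT d.hostname, s.sensor_class, s.sensor_descr,
           s.sensor_limit, s.sensor_custom
    FROM sensors s
    JOIN devices d ON s.device_id = d.device_id
    WHERE s.sensor_class IN ('power', 'current')
      AND (s.sensor_descr LIKE '%Phase%' OR s.sensor_descr LIKE '%Line%'
           OR s.sensor_descr LIKE 'Cord%')
      AND (d.hostname LIKE '%eqiad%' OR d.hostname LIKE '%codfw%')
    ORDER BY d.hostname, s.sensor_class, s.sensor_descr;"
}
```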
Features
Interface grouping
LibreNMS can group interfaces based on the prefix of their description, for example "Transit:" or "Peering:". These groups are shown under the "ports" dropdown.
Prefixes not shown in the dropdown are still reachable by editing the URL, for example:
https://librenms.wikimedia.org/iftype/type=transport-tun/
https://librenms.wikimedia.org/iftype/type=transport/
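Building such a URL from a description prefix can be sketched as below. It assumes the type is the prefix lowercased, without its trailing colon. That assumption rests only on the two example URLs above. iftype_url is a hypothetical helper name.

```shell
# Hypothetical helper: build the LibreNMS iftype URL for a description prefix.
# Assumes the type is the lowercased prefix without its trailing colon.
iftype_url() {
  local prefix="${1%:}"   # "Transit:" -> "Transit"
  local iftype
  iftype=$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')
  printf 'https://librenms.wikimedia.org/iftype/type=%s/\n' "$iftype"
}

# Example: iftype_url 'Transport:'
```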
Prometheus push-gateway
Alertmanager integration
Known limitations
- When failed over to the codfw (backup) instance (see https://phabricator.wikimedia.org/T247967):
  - Polling time for eqiad devices increased significantly due to the added latency. For the most populated rows (eqiad B and D), poll times occasionally exceeded 5 minutes, resulting in alerts and potentially missed data
  - The LibreNMS web UI got significantly slower (at least from Europe), partly because of the added latency to reach codfw and partly because the database is still in eqiad