You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

LibreNMS

Edited by Ayounsi and Filippo Giunchedi

Revision as of 16:17, 3 August 2020

LibreNMS is an autodiscovering, PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, Juniper, Foundry, and many more.

LibreNMS is a community-based fork of the last GPL-licensed version of Observium.

Service

Currently hosted on netmon1002 and netmon2001.

Replaces Observium, which ran on Streber.

  • Software is not installed via a Debian package
  • Software installed in: /srv/deployment/librenms/
  • RRD data stored in: /srv/librenms/
  • User credentials are stored in MySQL (check the configured mechanism with: grep auth_mechanism /srv/deployment/librenms/librenms/config.php)
  • Authentication is done via LDAP

How to

Add a device to LibreNMS

Configure the read-only SNMPv2c community on the device.

Via webUI:

https://librenms.wikimedia.org/addhost/

Use the device FQDN and keep all the other fields as they are (do not force-add it). Note: because of a bug, set the port to "161" explicitly.

The device should be discovered and polled within the next 10 minutes.

Via CLI:

$ ssh librenms.wikimedia.org
$ cd /srv/deployment/librenms/librenms
$ sudo -u librenms php addhost.php <fqdn>
Added device <fqdn> (XXX)
$ sudo -u librenms php discovery.php -h XXX && sudo -u librenms php poller.php -h XXX
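The CLI steps above can be wrapped in a small shell function so the device ID does not have to be copied by hand. This is a minimal sketch, not a tool shipped with LibreNMS: `parse_device_id` and `add_and_poll` are hypothetical names, and the parsing assumes `addhost.php` prints a success line of exactly the form `Added device <fqdn> (XXX)` with a numeric ID, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the CLI steps above; not part of LibreNMS.
LIBRENMS_DIR=/srv/deployment/librenms/librenms

# Read addhost.php output on stdin and print the numeric device ID.
# Assumes the success line looks like: Added device <fqdn> (<id>)
parse_device_id() {
  sed -n 's/^Added device .* (\([0-9][0-9]*\))$/\1/p'
}

# Add a device, then run discovery and polling for it right away instead
# of waiting for the next scheduled run. Note: changes the current directory.
add_and_poll() {
  local fqdn=$1 out id
  cd "$LIBRENMS_DIR" || return 1
  out=$(sudo -u librenms php addhost.php "$fqdn") || return 1
  printf '%s\n' "$out"
  id=$(printf '%s\n' "$out" | parse_device_id)
  if [ -z "$id" ]; then
    echo "could not find a device ID in addhost.php output" >&2
    return 1
  fi
  sudo -u librenms php discovery.php -h "$id" &&
    sudo -u librenms php poller.php -h "$id"
}
```

Usage: `add_and_poll <fqdn>`. If the output format of `addhost.php` differs, the function stops before running discovery rather than polling the wrong device.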

Upgrade LibreNMS

This assumes your remotes are configured as follows, and that each new version is tracked in its own branch.

origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (fetch)
origin	ssh://<username>@gerrit.wikimedia.org:29418/operations/software/librenms (push)
upstream	https://github.com/librenms/librenms.git (fetch)
upstream	https://github.com/librenms/librenms.git (push)
new=<new version>
old=<old version>
git fetch upstream
git checkout -b upstream-$new $new

# If you are missing composer: apt install -y composer php-gd
composer install --no-dev # you will be prompted for any missing PHP requirements
git add -f vendor
git commit -m "Add composer requirements for LibreNMS $new"

mkdir scap
git checkout upstream-$old -- scap/scap.cfg
git add scap
git commit -m "Add Scap config"

git push origin upstream-$new
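The branch preparation steps above can be collected into one function that stops at the first failing command, with a dry-run mode to review the commands first. A minimal sketch: `prepare_upgrade`, `run` and the `DRY_RUN` switch are hypothetical, and it assumes you are in a clone of operations/software/librenms with the remotes shown above and that upstream tags are named exactly like the version you pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper bundling the upgrade-branch steps above.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: prepare_upgrade <new version> <old version>
prepare_upgrade() {
  local new=$1 old=$2
  if [ -z "$new" ] || [ -z "$old" ]; then
    echo "usage: prepare_upgrade <new version> <old version>" >&2
    return 2
  fi
  run git fetch upstream || return 1
  run git checkout -b "upstream-$new" "$new" || return 1
  # Vendor the PHP dependencies (needs composer and php-gd installed).
  run composer install --no-dev || return 1
  run git add -f vendor || return 1
  run git commit -m "Add composer requirements for LibreNMS $new" || return 1
  # Reuse the Scap config from the previous version's branch.
  run mkdir -p scap || return 1
  run git checkout "upstream-$old" -- scap/scap.cfg || return 1
  run git add scap || return 1
  run git commit -m "Add Scap config" || return 1
  run git push origin "upstream-$new"
}
```

For example, `DRY_RUN=1 prepare_upgrade <new version> <old version>` prints the sequence; running it again without `DRY_RUN` executes it.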

On deploy1001:

cd /srv/deployment/librenms/librenms/
git fetch origin
git branch # note the current branch
git checkout upstream-<version>
scap deploy Upgrade LibreNMS to <version> - <task>
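The deploy steps above can be sketched as a function that builds the scap message and prints the branch in use before switching, since that branch is what a rollback returns to. This is hypothetical (`deploy_message` and `deploy_librenms` are illustrative names) and assumes `<task>` is a Phabricator ID such as T123456.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the deploy1001 steps above; not an existing tool.

# Build the scap log message. Assumes <task> is a Phabricator ID like T123456.
deploy_message() {
  local version=$1 task=$2
  case $task in
    # empty, bare "T", non-digits after "T", or not starting with "T"
    '' | T | T*[!0-9]* | [!T]*)
      echo "task should be a Phabricator ID such as T123456, got: $task" >&2
      return 1 ;;
  esac
  printf 'Upgrade LibreNMS to %s - %s\n' "$version" "$task"
}

deploy_librenms() {
  local version=$1 task=$2 msg
  msg=$(deploy_message "$version" "$task") || return 1
  cd /srv/deployment/librenms/librenms/ || return 1
  git fetch origin || return 1
  # Record the branch in use now: it is the <previous branch> for a rollback.
  echo "previous branch: $(git rev-parse --abbrev-ref HEAD)"
  git checkout "upstream-$version" || return 1
  scap deploy "$msg"
}
```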

Run Puppet on the netmon* hosts (e.g. from cumin1001):

cumin O:netmon run-puppet-agent

On the netmon_server host (find it with git grep -h netmon_server: hieradata/):

cd /srv/deployment/librenms/librenms
sudo -u librenms ./daily.sh
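To find the host to run `daily.sh` on, the output of the `git grep` command above can be reduced to a de-duplicated host list. A minimal sketch: `netmon_servers` is a hypothetical helper, and it assumes the hiera lines look like `netmon_server: <host>`, optionally quoted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce the output of
#   git grep -h netmon_server: hieradata/
# to a sorted, de-duplicated list of host names.
# Assumes lines of the form "netmon_server: <host>", optionally quoted.
netmon_servers() {
  awk '$1 == "netmon_server:" {
         gsub(/["\047]/, "", $2)   # strip double and single quotes
         if ($2 != "") print $2
       }' | sort -u
}
```

Usage, from a checkout of the repository containing `hieradata/`: `git grep -h netmon_server: hieradata/ | netmon_servers`.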

Rollback

On deploy1001:

cd /srv/deployment/librenms/librenms/
git fetch origin
git checkout <previous branch>
scap deploy Rollback LibreNMS to <version> - <task>
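The rollback steps above can be sketched the same way, deriving the version for the scap message from the branch name. This is hypothetical (`rollback_message` and `rollback_librenms` are illustrative names) and assumes branches follow the `upstream-<version>` naming used on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the rollback steps above; not an existing tool.

# Build the scap message from a branch name such as upstream-1.65.
rollback_message() {
  local branch=$1 task=$2
  case $branch in
    upstream-?*) ;;
    *) echo "expected a branch named upstream-<version>, got: $branch" >&2
       return 1 ;;
  esac
  printf 'Rollback LibreNMS to %s - %s\n' "${branch#upstream-}" "$task"
}

# Usage: rollback_librenms <previous branch> <task>
rollback_librenms() {
  local branch=$1 task=$2 msg
  msg=$(rollback_message "$branch" "$task") || return 1
  cd /srv/deployment/librenms/librenms/ || return 1
  git fetch origin || return 1
  git checkout "$branch" || return 1
  scap deploy "$msg"
}
```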

Then run Puppet again from a cumin host:

cumin O:netmon run-puppet-agent

Check the logs

LibreNMS writes logs to four different locations:

  • /srv/deployment/librenms/librenms/logs/librenms.log
  • /var/log/librenms.log
  • /var/log/librenms/daily.log
  • /var/log/apache2/librenms.wikimedia.org.error.log

Ideally, the first three would be consolidated into a single location.
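Until the logs are consolidated, a small function can show the tail of all four at once. A minimal sketch: `recent_librenms_logs` and `LIBRENMS_LOGS` are hypothetical names; run it as a user that can read the files (e.g. with sudo).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last lines of each LibreNMS log listed above.
# One path per line; override LIBRENMS_LOGS to check other files.
LIBRENMS_LOGS="/srv/deployment/librenms/librenms/logs/librenms.log
/var/log/librenms.log
/var/log/librenms/daily.log
/var/log/apache2/librenms.wikimedia.org.error.log"

# Usage: recent_librenms_logs [number of lines per file, default 20]
recent_librenms_logs() {
  local lines=${1:-20} f
  printf '%s\n' "$LIBRENMS_LOGS" | while IFS= read -r f; do
    if [ -r "$f" ]; then
      echo "==> $f <=="
      tail -n "$lines" "$f"
    else
      echo "==> $f (missing or unreadable) <=="
    fi
  done
}
```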

Mass update PDU alerting thresholds

PDUs get automatically generated alerting thresholds. The queries below set sane defaults for eqiad/codfw PDUs and need to be run whenever new PDUs are provisioned.
https://phabricator.wikimedia.org/T247358
https://phabricator.wikimedia.org/T245655

UPDATE librenms.sensors JOIN librenms.devices
  ON sensors.device_id = devices.device_id
SET sensors.sensor_custom = 'Yes', sensors.sensor_limit = 1400
WHERE sensors.sensor_class = 'power'
  AND sensors.sensor_descr LIKE 'Phase%'
  AND (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%');

UPDATE librenms.sensors JOIN librenms.devices
  ON sensors.device_id = devices.device_id
SET sensors.sensor_custom = 'Yes', sensors.sensor_limit = 12
WHERE sensors.sensor_class = 'current'
  AND (sensors.sensor_descr LIKE '%Phase%' OR sensors.sensor_descr LIKE '%Line%')
  AND (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%');

UPDATE librenms.sensors JOIN librenms.devices
  ON sensors.device_id = devices.device_id
SET sensors.sensor_custom = 'Yes', sensors.sensor_limit = 3440
WHERE sensors.sensor_class = 'power'
  AND sensors.sensor_descr LIKE 'Cord%'
  AND (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%');
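Before running the updates, it can help to review which sensors they will touch. A minimal sketch: `pdu_threshold_preview_sql` is a hypothetical helper that prints one SELECT matching the union of the three UPDATE filters above; it assumes you pipe it into a mysql client with read access to the librenms database.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a SELECT covering the same PDU sensors as the
# three UPDATE statements above, so the affected rows can be reviewed first.
pdu_threshold_preview_sql() {
  cat <<'SQL'
SELECT devices.hostname, sensors.sensor_class, sensors.sensor_descr,
       sensors.sensor_limit, sensors.sensor_custom
FROM librenms.sensors
JOIN librenms.devices ON sensors.device_id = devices.device_id
WHERE (devices.hostname LIKE '%eqiad%' OR devices.hostname LIKE '%codfw%')
  AND ((sensors.sensor_class = 'power'
        AND (sensors.sensor_descr LIKE 'Phase%' OR sensors.sensor_descr LIKE 'Cord%'))
    OR (sensors.sensor_class = 'current'
        AND (sensors.sensor_descr LIKE '%Phase%' OR sensors.sensor_descr LIKE '%Line%')))
ORDER BY devices.hostname, sensors.sensor_class, sensors.sensor_descr;
SQL
}
```

Usage, assuming suitable database access: `pdu_threshold_preview_sql | sudo mysql`.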

Features

Interface grouping

LibreNMS can group interfaces based on their description prefix, for example "Transit:" or "Peering:". The groups are shown under the "Ports" dropdown.

Prefixes not shown in the dropdown are still reachable by editing the URL, for example:

https://librenms.wikimedia.org/iftype/type=transport-tun/

https://librenms.wikimedia.org/iftype/type=transport/
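For prefixes missing from the dropdown, the URL can be built from the description prefix. A minimal sketch: `iftype_url` is a hypothetical helper, and it assumes the group name in the URL is the text before ":" in the description, lowercased, consistent with the transport and transport-tun examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the iftype URL for an interface group.
# Assumes the group name is the description prefix (text before ':'),
# lowercased.
iftype_url() {
  local type=${1%%:*}
  type=$(printf '%s' "$type" | tr '[:upper:]' '[:lower:]')
  printf 'https://librenms.wikimedia.org/iftype/type=%s/\n' "$type"
}
```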

IRC Alerting

The LibreNMS IRC bot, librenms-wmf, posts alerts and recoveries to the -operations channel. This helps correlate series of alerts across different monitoring tools.

More details about the alerts can be found at https://librenms.wikimedia.org/alerts/.

If the bot misbehaves or is too noisy and needs to be stopped, there are three options:

Then file a task to track the issue.

If IRC alerting is not working:

  1. Make sure the bot is running.
  2. Make sure the FIFO file librenms/.ircbot.alert (created by the IRC bot) is writable by www-data (the user the LibreNMS app runs as).
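Step 2 can be checked with a short function. A minimal sketch: `check_irc_fifo` is a hypothetical helper; pass it the full path to librenms/.ircbot.alert on the host, and it uses sudo to test write access as www-data (or another user given as the second argument).

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 2 above: check that the IRC bot's FIFO
# exists and that the web app user can write to it.
# Usage: check_irc_fifo <full path to .ircbot.alert> [user, default www-data]
check_irc_fifo() {
  local fifo=$1 user=${2:-www-data}
  if [ ! -p "$fifo" ]; then
    echo "not a FIFO (it is created by the IRC bot): $fifo" >&2
    return 1
  fi
  if [ "$user" = "$(id -un)" ]; then
    test -w "$fifo"
  else
    sudo -u "$user" test -w "$fifo"
  fi || {
    echo "not writable by $user: $fifo" >&2
    return 1
  }
  echo "ok: $fifo is a FIFO writable by $user"
}
```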

Known limitations

  • When failed over to the codfw (backup) instance (see https://phabricator.wikimedia.org/T247967):
    • Polling time for eqiad devices increased significantly due to the added latency. For the most populated rows (eqiad B and D), poll times occasionally exceed 5 minutes, resulting in alerts and potentially missed data
    • The LibreNMS web UI became significantly slower (at least from Europe), partly because of the added latency to reach codfw and partly because the database is still in eqiad

External links