Monitoring/systemd unit state

The "systemd unit state" [[Icinga]] checks tests if there are any failed [[systemd]] units.
The "systemd unit state" [[Icinga]] checks test if there are any failed [[systemd]] units. Units commonly include, but are not limited to, services (.service), mount points (.mount), devices (.device) and sockets (.socket). See {{manpage|name=systemd.unit|section=5}} for details on each of these.


For this type of alert, you should ssh to the server in question and run <code>systemctl list-units --state=failed</code> (or the shortcut <code>systemctl --failed</code>) to check which unit is failing.
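
When a unit has failed, the output looks roughly like this (the unit name and description below are hypothetical):

<pre>
$ systemctl --failed
  UNIT             LOAD   ACTIVE SUB    DESCRIPTION
● example.service  loaded failed failed An example daemon

1 loaded units listed.
</pre>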


Try manually starting it with <code>systemctl start ''unit''</code>.
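
A sketch of this step with the hypothetical <code>example.service</code>; <code>systemctl is-active</code> confirms whether the start succeeded:

<pre>
$ sudo systemctl start example.service
$ systemctl is-active example.service
active
</pre>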


You can use <code>systemctl status ''unit''</code>, <code>journalctl -u ''unit''</code> and <code>journalctl -xn</code> to see more details and logs to figure out why it failed.
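
Here <code>-x</code> augments log lines with explanatory help texts and <code>-n</code> limits output to the most recent entries. For example, with the same hypothetical unit:

<pre>
$ systemctl status example.service
$ journalctl -u example.service --since "1 hour ago"
$ journalctl -xn 50
</pre>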


Sometimes the failure has been fixed already and you just need to clear the list of failed units with <code>systemctl reset-failed ''unit''</code>.
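
For example, clearing the failed state of the hypothetical unit and confirming the list is empty (output illustrative):

<pre>
$ sudo systemctl reset-failed example.service
$ systemctl --failed
0 loaded units listed.
</pre>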
 


== See also ==
* {{phabricator|T199911}} for an ongoing issue with "Systemd session creation fails under I/O load"
* [https://linux-audit.com/auditing-systemd-solving-failed-units-with-systemctl/ Auditing systemd: solving failed units with systemctl]
* [https://www.digitalocean.com/community/tutorials/how-to-use-journalctl-to-view-and-manipulate-systemd-logs How To Use Journalctl to View and Manipulate Systemd Logs]


[[Category:Runbooks]]
