Systemd Http Health Check

Systemd http health check

The Short Answer

systemd has a native (socket-based) health-check mechanism, but it isn't HTTP-based. You can, however, write a shim that polls your service over HTTP and forwards the result to the native mechanism.

The Long Answer

The Right Thing in the systemd world is to use the sd_notify socket mechanism to inform the init system when your application is fully available. Use Type=notify for your service to enable this functionality.

You can do this with the sd_notify() library call, or you can read the socket path from the NOTIFY_SOCKET environment variable and have your own code write READY=1 to that socket once the application is returning 200s.
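For illustration, here is a minimal C++ sketch of what "write READY=1 to that socket" amounts to without linking libsystemd: a single datagram sent to the AF_UNIX socket named in NOTIFY_SOCKET, where a leading @ denotes a Linux abstract-namespace socket. The function name notify_systemd is hypothetical; in production, sd_notify(0, "READY=1") from <systemd/sd-daemon.h> does the same job and handles more edge cases.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical helper: send one notification (e.g. "READY=1") to the socket
// named in $NOTIFY_SOCKET. Returns 1 if sent, 0 if NOTIFY_SOCKET is unset
// (not started by systemd with Type=notify), or -errno on failure.
int notify_systemd(const std::string& state)
{
    const char* path = std::getenv("NOTIFY_SOCKET");
    if (path == nullptr || path[0] == '\0')
        return 0;

    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::size_t len = std::strlen(path);
    if (len >= sizeof(addr.sun_path))
        return -ENAMETOOLONG;
    std::memcpy(addr.sun_path, path, len);

    socklen_t addr_len = static_cast<socklen_t>(offsetof(sockaddr_un, sun_path) + len);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';  // Linux abstract namespace: no trailing NUL
    else
        addr_len += 1;            // filesystem path: include the trailing NUL

    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -errno;

    ssize_t n = sendto(fd, state.data(), state.size(), 0,
                       reinterpret_cast<const sockaddr*>(&addr), addr_len);
    int saved_errno = errno;
    close(fd);
    return n < 0 ? -saved_errno : 1;
}
```

You would call notify_systemd("READY=1") once your health endpoint returns 200. Outside systemd, NOTIFY_SOCKET is unset and the call is a harmless no-op that returns 0.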

If you want to delegate this to a separate process that polls your application over HTTP and then writes to the socket, you can do that; just make sure NotifyAccess is set appropriately, e.g. NotifyAccess=all (by default, only the main process of the service is allowed to send notifications).
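A polling shim could look roughly like this sketch. It assumes the service answers plain HTTP on 127.0.0.1 and accepts an HTTP/1.0 request; probe_http and parse_status_code are hypothetical helpers. It reads only the first chunk of the response, which is normally enough for the status line, but a real shim would probably use an HTTP client library with proper timeouts.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Parse "HTTP/1.1 200 OK" -> 200; returns -1 if the status line is malformed.
int parse_status_code(const std::string& response)
{
    if (response.compare(0, 5, "HTTP/") != 0)
        return -1;
    std::size_t sp = response.find(' ');
    if (sp == std::string::npos || sp + 4 > response.size())
        return -1;
    int code = 0;
    for (std::size_t i = sp + 1; i < sp + 4; ++i) {
        char c = response[i];
        if (c < '0' || c > '9')
            return -1;
        code = code * 10 + (c - '0');
    }
    return code;
}

// Send "GET <path> HTTP/1.0" to 127.0.0.1:<port> and return the status code,
// or -1 if connecting, sending, or reading the reply failed.
int probe_http(std::uint16_t port, const std::string& path)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int status = -1;
    if (connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr)) == 0) {
        std::string req = "GET " + path + " HTTP/1.0\r\nHost: localhost\r\n\r\n";
        // MSG_NOSIGNAL: don't die with SIGPIPE if the server hangs up early
        if (send(fd, req.data(), req.size(), MSG_NOSIGNAL) ==
            static_cast<ssize_t>(req.size())) {
            char buf[256];
            ssize_t n = recv(fd, buf, sizeof(buf), 0);
            if (n > 0)
                status = parse_status_code(std::string(buf, static_cast<std::size_t>(n)));
        }
    }
    close(fd);
    return status;
}
```

The shim would call probe_http(port, "/health") in a loop with a short sleep and, once it returns 200, write READY=1 to the notify socket. Because the shim is not the service's main process, this only works with NotifyAccess=all.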

If you're also interested in detecting cases where the application fails after it was fully initialized, and triggering a restart, the sd_notify socket is appropriate in this scenario as well:

Set WatchdogSec= in the unit (or send WATCHDOG_USEC=... over the socket) to define the maximum permissible time between successful tests, then send WATCHDOG=1 whenever a self-test succeeds. If no WATCHDOG=1 arrives within the configured period, systemd considers the service failed and kills it; with Restart=on-failure (or Restart=on-watchdog) it will then be restarted.
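When WatchdogSec= is set, systemd passes the timeout to the service in the WATCHDOG_USEC environment variable (and usually WATCHDOG_PID as well). The sketch below, using the hypothetical name watchdog_ping_interval, derives a ping interval from it the way sd_watchdog_enabled(3) recommends, i.e. pinging at half the timeout; libsystemd's sd_watchdog_enabled() is the library equivalent.

```cpp
#include <chrono>
#include <cstdlib>
#include <optional>
#include <unistd.h>

// Hypothetical helper: how often to send WATCHDOG=1, or std::nullopt if the
// watchdog is not enabled for this process.
std::optional<std::chrono::microseconds> watchdog_ping_interval()
{
    const char* usec = std::getenv("WATCHDOG_USEC");
    if (usec == nullptr)
        return std::nullopt;

    // If WATCHDOG_PID is set, the watchdog is meant for that PID only
    if (const char* pid = std::getenv("WATCHDOG_PID")) {
        char* end = nullptr;
        long value = std::strtol(pid, &end, 10);
        if (*end != '\0' || value != static_cast<long>(getpid()))
            return std::nullopt;
    }

    char* end = nullptr;
    unsigned long long timeout = std::strtoull(usec, &end, 10);
    if (end == usec || *end != '\0' || timeout == 0)
        return std::nullopt;

    // Ping at half the timeout to leave headroom for scheduling delays
    return std::chrono::microseconds(static_cast<long long>(timeout / 2));
}
```

The service would then run its self-test once per interval and send WATCHDOG=1 only when the test passes, so a hung or unhealthy process stops pinging and the watchdog fires.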

Systemd unit, check status with external script

Short answer

This is impossible in systemd. The systemctl status verb always does the same thing; it cannot be overridden per unit with a custom action.

Long answer

You can write a foo-status.service unit file with Type=oneshot and ExecStart= pointing to your custom status script, and then run systemctl start foo-status. However, this only gives you zero/nonzero information (any nonzero exit code of the script is reported as 1).

To get the real exit code of your status script, run systemctl show -pExecMainStatus foo-status. However, if you go this far, it is simpler to run your script directly.
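Both routes can be sketched in C++: run_status_script runs the script directly via std::system and recovers its real exit code with WEXITSTATUS, while parse_exec_main_status pulls the number out of the ExecMainStatus=N line that systemctl show prints. Both names are hypothetical, and the command string is handed to /bin/sh, so it must come from a trusted source.

```cpp
#include <cstdlib>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <sys/wait.h>

// Hypothetical helper: run the status script via /bin/sh and return its real
// exit code (0-255), or -1 if it could not be run or was killed by a signal.
int run_status_script(const std::string& command)
{
    int status = std::system(command.c_str());
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

// Hypothetical helper: extract N from the "ExecMainStatus=N" line printed by
// `systemctl show -pExecMainStatus foo-status`.
std::optional<int> parse_exec_main_status(const std::string& show_output)
{
    const std::string key = "ExecMainStatus=";
    std::istringstream lines(show_output);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) != 0)
            continue;
        try {
            return std::stoi(line.substr(key.size()));
        } catch (const std::exception&) {  // empty or non-numeric value
            return std::nullopt;
        }
    }
    return std::nullopt;
}
```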

How to add a health check on Tomcat?

The simplest approach is to create a directory named health inside webapps and put an index.html file with the following contents in webapps/health/:

<HTML>
  <HEAD>
    <TITLE>Tomcat status</TITLE>
  </HEAD>
  <BODY>
    <H1>Tomcat Running</H1>
  </BODY>
</HTML>

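Then test it with the following URL:

http://localhost:8080/health

Keep in mind that this only shows that Tomcat itself is serving requests, not that your other web applications are healthy.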

How to get the state of a service with sd-bus?

Using the sd-bus API is the right approach (header #include <systemd/sd-bus.h>; link with -lsystemd).

First you need to get access to a bus object. I do this:

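#include <systemd/sd-bus.h>
#include <cstdlib>    // free
#include <cstring>    // strerror
#include <stdexcept>  // std::runtime_error
#include <string>

Systemctl::Systemctl() :
    m_bus(nullptr)
{
    // Connect to the system bus (sd_bus_default_user() for the user bus)
    int r = sd_bus_default_system(&m_bus);

    if (r < 0)
        throw std::runtime_error(std::string("Could not open systemd bus: ")
                                 + strerror(-r));
}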

If you're having problems opening the bus:

Run as root/sudo
Write a polkit policy that grants your user/group access to the systemd actions you need
Use the user bus (sd_bus_default_user()) instead of the system bus (sd_bus_default_system()) if you are working with user units

Don't forget to release the bus when you are done:

Systemctl::~Systemctl()
{
    sd_bus_unref(m_bus);
}

Now, on to your three questions:

Query the status

For each unit, I have a class that holds the escaped unit name (e.g. foo_2eservice for foo.service) as m_name, and a reference to the bus as m_bus. Call this method with any property name; you seem to be most interested in "ActiveState" or "SubState".

std::string Unit::GetPropertyString(const std::string& property) const
{
    sd_bus_error err = SD_BUS_ERROR_NULL;
    char* msg = nullptr;
    int r;

    // m_name holds the escaped unit name, e.g. "foo_2eservice"
    r = sd_bus_get_property_string(m_bus,
        "org.freedesktop.systemd1",                           // destination
        ("/org/freedesktop/systemd1/unit/" + m_name).c_str(), // object path
        "org.freedesktop.systemd1.Unit",                      // interface
        property.c_str(),                                     // property name
        &err,
        &msg);

    if (r < 0)
    {
        // err.message may be NULL for local errors, so fall back to strerror()
        std::string err_msg(err.message ? err.message : strerror(-r));
        sd_bus_error_free(&err);

        std::string err_str("Failed to get " + property + " for service "
                            + m_name + ". Error: " + err_msg);

        throw std::runtime_error(err_str);
    }

    sd_bus_error_free(&err);

    // The returned string is malloc()ed by sd-bus: copy it, then free it
    std::string ret(msg);
    free(msg);

    return ret;
}
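Two small helpers may be handy around GetPropertyString. escape_bus_label is a hypothetical re-implementation of the object-path escaping that turns foo.service into foo_2eservice (ASCII letters pass through, digits pass through unless they come first, every other byte becomes _ plus two lowercase hex digits); libsystemd exposes the same escaping as sd_bus_path_encode(). ActiveState and parse_active_state are likewise hypothetical names that map the string property onto an enum; they cover the six classic values, and anything newer systemd releases add falls through to Unknown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: escape a unit name for use in a systemd object path,
// e.g. "foo.service" -> "foo_2eservice".
std::string escape_bus_label(const std::string& name)
{
    static const char hex[] = "0123456789abcdef";
    if (name.empty())
        return "_";

    std::string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        bool digit = (c >= '0' && c <= '9');
        if (alpha || (digit && i > 0)) {
            out += static_cast<char>(c);
        } else {
            out += '_';
            out += hex[c >> 4];
            out += hex[c & 0x0f];
        }
    }
    return out;
}

// Hypothetical enum for the classic values of the ActiveState property.
enum class ActiveState { Active, Reloading, Inactive, Failed, Activating, Deactivating, Unknown };

ActiveState parse_active_state(const std::string& s)
{
    if (s == "active")       return ActiveState::Active;
    if (s == "reloading")    return ActiveState::Reloading;
    if (s == "inactive")     return ActiveState::Inactive;
    if (s == "failed")       return ActiveState::Failed;
    if (s == "activating")   return ActiveState::Activating;
    if (s == "deactivating") return ActiveState::Deactivating;
    return ActiveState::Unknown;  // e.g. values added by newer systemd versions
}
```

For example, a unit object would be constructed with escape_bus_label("foo.service"), and parse_active_state(unit.GetPropertyString("ActiveState")) then gives a value that is easy to switch on.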

Monitor the status

The first step is to add a D-Bus match rule so that you receive the "PropertiesChanged" signal for the unit. Note that you'll get a signal whenever any property changes, not just the state. The sd_bus_add_match() call also has room for a callback, though I haven't experimented with it.

void Systemctl::SubscribeToUnitChanges(const std::string& escaped_name)
{
    /* sd_bus_match_signal() is an easier helper, but it was only introduced in
     * systemd 237. Stretch is on 232 while buster is on 241, so we need the
     * hand-built match string below for as long as we still support stretch:
    sd_bus_match_signal(
        m_bus,
        nullptr, // slot
        nullptr, // sender
        std::string("/org/freedesktop/systemd1/unit/" + escaped_name).c_str(), // path
        "org.freedesktop.DBus.Properties", // interface
        "PropertiesChanged", // member
        nullptr, // callback
        nullptr // userdata
    );
    */
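    std::string match = "type='signal'";
    match += ",path='/org/freedesktop/systemd1/unit/" + escaped_name + "'";
    match += ",interface='org.freedesktop.DBus.Properties'";
    match += ",member='PropertiesChanged'";

    // A nullptr slot makes the match "floating": it lives as long as the bus
    int r = sd_bus_add_match(
        m_bus,
        nullptr, // slot
        match.c_str(),
        nullptr, // callback
        nullptr  // userdata
    );
    if (r < 0)
        throw std::runtime_error("Could not subscribe to " + escaped_name
                                 + ": " + strerror(-r));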
}

Instead, what I do is periodically poll the bus for the subscribed changes and update each affected unit:

bool Systemctl::ProcessBusChanges()
{
    bool changed = false;
    sd_bus_message* msg = nullptr;
    int r;

    // Collect the escaped names of all units that sent a signal
    std::list<std::string> escaped_names;
    while ((r = sd_bus_process(m_bus, &msg)) > 0)
    {
        // Note: Once sd_bus_process returns 0, we are supposed to call
        // sd_bus_wait, or check for changes on sd_bus_get_fd, before calling
        // this function again. Polling on a timer breaks that rule. I don't
        // really know the consequences.
        if (msg)
        {
            // sd_bus_message_get_path() can return NULL
            const char* p = sd_bus_message_get_path(msg);
            std::string path = p ? p : "";
            sd_bus_message_unref(msg);
            msg = nullptr;

            // The escaped unit name is the last path component
            escaped_names.push_back(path.substr(path.find_last_of('/') + 1));
            changed = true;
        }
    }
    if (r < 0)
        LogError("sd_bus_process failed: %s\n", strerror(-r));

    // Refresh each affected unit once, even if it sent several signals
    escaped_names.sort();
    escaped_names.unique();
    for (const auto& unit : escaped_names)
    {
        auto it = m_units.find(unit);
        if (it != m_units.end())
            it->second.RefreshDynamicProperties();
    }

    return changed;
}

If it reports that something changed on the bus, I go ahead and re-read the dynamic properties of each affected unit that I monitor.
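To follow the rule mentioned in the code comment instead of polling on a timer, the loop can block on the bus file descriptor (sd_bus_get_fd()) until there is something to read, and only then call ProcessBusChanges(). This sketch uses a hypothetical wait_readable helper built on poll(2). It simplifies by waiting for POLLIN only, whereas sd_bus_get_events() and sd_bus_get_timeout() report exactly what sd-bus needs, and sd_bus_wait() wraps all of that for you.

```cpp
#include <cerrno>
#include <poll.h>

// Hypothetical helper: block until fd is readable (or hung up) or timeout_ms
// elapses (-1 = wait forever). Returns 1 if ready, 0 on timeout, -errno on error.
int wait_readable(int fd, int timeout_ms)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    for (;;) {
        int r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;  // interrupted by a signal: retry (restarts the timeout)
        if (r < 0)
            return -errno;
        return r > 0 ? 1 : 0;
    }
}
```

A monitoring loop might then look like while (wait_readable(sd_bus_get_fd(m_bus), 1000) >= 0) ProcessBusChanges(); so the process sleeps until systemd actually sends something.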

Change the status

This one is easy. I use the following, where method is one of "StartUnit", "StopUnit", or "RestartUnit", and "replace" is the job mode passed to the manager.

static void CallMethodSS(sd_bus* bus,
                         const std::string& name,
                         const std::string& method)
{
    sd_bus_error err = SD_BUS_ERROR_NULL;
    sd_bus_message* msg = nullptr;
    int r;

    r = sd_bus_call_method(bus,
        "org.freedesktop.systemd1",         /* <service>   */
        "/org/freedesktop/systemd1",        /* <path>      */
        "org.freedesktop.systemd1.Manager", /* <interface> */
        method.c_str(),                     /* <method>    */
        &err,                               /* object to return error in */
        &msg,                               /* return message on success */
        "ss",                               /* <input_signature (string-string)> */
        name.c_str(), "replace");           /* <arguments...>: unit name, job mode */

    if (r < 0)
    {
        // err.message may be NULL for local errors, so fall back to strerror()
        std::string err_str("Could not send " + method +
                            " command to systemd for service: " + name +
                            ". Error: " + (err.message ? err.message : strerror(-r)));

        sd_bus_error_free(&err);
        sd_bus_message_unref(msg);
        throw std::runtime_error(err_str);
    }

    // Extra stuff that might be useful: the reply carries the object path
    // of the queued job
    const char* response = nullptr;
    r = sd_bus_message_read(msg, "o", &response);
    if (r < 0)
    {
        LogError("Failed to parse response message: %s\n", strerror(-r));
    }

    sd_bus_error_free(&err);
    sd_bus_message_unref(msg);
}

How to monitor a systemd service using telegraf?

So it turns out there is indeed a Telegraf input plugin that monitors systemd services: systemd_units.

This is the configuration I've implemented:

# Gather systemd units state
[[inputs.systemd_units]]
  ## Set timeout for systemctl execution
  timeout = "1s"

  # Filter for a specific unit type, default is "service", other possible
  # values are "socket", "target", "device", "mount", "automount", "swap",
  # "timer", "path", "slice" and "scope":
  unittype = "service"

  # Filter for a specific pattern, default is "" (i.e. all), other possible
  # values are valid pattern for systemctl, e.g. "a*" for all units with
  # names starting with "a"
  pattern = ""
  ## pattern = "telegraf* influxdb*"
  ## pattern = "a*"

After getting the metrics into InfluxDB, this is the Flux query I used to extract the data I needed:

from(bucket: "veeva")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_field"] == "active_code")
  |> filter(fn: (r) => r["_measurement"] == "systemd_units")
  |> filter(fn: (r) => r["active"] == "active")
  |> filter(fn: (r) => r["host"] == "10.192.21.66")
  |> filter(fn: (r) => r["name"] == "myservice.service")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

I then visualized the result of this query in a Grafana panel.

Telegraf plugin documentation: https://docs.influxdata.com/telegraf/v1.22/plugins/#systemd_timings

systemctl short status output format for specific service

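You can try systemctl is-active sshd.service, systemctl is-enabled sshd.service and systemctl is-failed sshd.service. Each prints a one-word state (e.g. active, enabled, failed) and sets its exit code accordingly; add --quiet if you only want the exit code.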

Output from systemctl start/restart/stop

To my knowledge, there is no such option. That being said, you can go ahead and "make your own":

We're going to add a small shell function to our ~/.bashrc file:

echo 'startstat(){ systemctl start "$@"; systemctl status "$@"; }' >> ~/.bashrc

Note that this only works in bash sessions for the user whose ~/.bashrc you edited, so it won't be available in scripts or shells that don't read ~/.bashrc. Open a new shell (or run source ~/.bashrc) before using it.

You can then start services and immediately get their status by running:

startstat [arguments to pass to BOTH systemctl start AND systemctl status]

Sample usage:

startstat systemd-networkd

If you want to wait a little before checking the status, you can add a sleep in between:

Just open ~/.bashrc (e.g. nano ~/.bashrc), scroll to the bottom (or to whichever line the function ended up on), and add sleep [seconds]; between the systemctl start command and the systemctl status command.

Note that systemctl start already waits for the start job to finish (unless you pass --no-block), so status runs after the start has completed. Putting a & after the start command would instead send it to the background, so status would run without waiting for it.
