How do you debug a process for which you have no output and no exit code?

Here is the story of how I solved such problem. No, it was not DNS.

# Putting things in… Motion

Motion lets you build DIY motion-detection for various types of cameras. It allows pretty extensive configuration through a text-based configuration file. A few configuration options, e.g. on_motion_detected, execute arbitrary programs or scripts to notify the user of the event.

To make things concrete, the following configuration should log messages into the system log1.

video_device /dev/video0

# ... other configuration options setting up the input size, etc.

############################################################
# Script execution configuration parameters
############################################################

# Command to be executed when an event starts.
on_event_start logger "Motion detected!"

I happily tried this and… it did not work, no log was printed.

To figure out why, I configured Motion’s log to print debug entries. It confirmed that the event was being detected and the command executed:

[DBG] [EVT] exec_command: Executing external command 'logger on_event_start'
[INF] [ALL] on_event_start            logger on_event_start

But it did not give me any additional clue. So, what next? Being opensource, we can look at what happens when it tries executing the command:

// https://github.com/Motion-Project/motion/blob/8d69cbf27a1c2fcffd7ff0f5558f3e7de277b8d1/src/event.c#L75

/**
 * exec_command
 *      Execute 'command' with 'arg' as its argument.
 *      if !arg command is started with no arguments
 *      Before we call execl we need to close all the file handles
 *      that the fork inherited from the parent in order not to pass
 *      the open handles on to the shell
 */
static void exec_command(struct context *cnt, char *command, char *filename, int filetype)
{
    char stamp[PATH_MAX];
    mystrftime(cnt, stamp, sizeof(stamp), command, &cnt->current_image->timestamp_tv, filename, filetype);

    if (!fork()) {

        /* Detach from parent */
        setsid();

        execl("/bin/sh", "sh", "-c", stamp, " &", NULL);

        /* if above function succeeds the program never reach here */
        MOTION_LOG(ALR, TYPE_EVENTS, SHOW_ERRNO
            ,_("Unable to start external command '%s'"), stamp);

        exit(1);
    }

    MOTION_LOG(DBG, TYPE_EVENTS, NO_ERRNO
        ,_("Executing external command '%s'"), stamp);
}

Relatively straightforward2: the classic fork-then-exec pattern. Being able to read sh’s exit code or its standard error would be gold to debug this, but motion does not expose them.

Worry not, this just means we can have some fun!

# Enters gdb

I had a suspect that sh would tell me more about the problem and I wanted to somehow convince it to spit its output. I originally thought to replace /bin/sh with a wrapper that would log to file. Feasible, but annoying and likely to wreak havoc at the system level.

Then, I thought of gdb. If we could pause the process execution at the right time — right after the fork — maybe we could redirect the standard error to a file. Without modifying the /bin/sh executable at all.

It turns out that it is possible, by being clever about how file descriptors are allocated on Unix and abusing the print command in GDB to evaluate arbitrary espressione.

The snippet below:

  1. Hooks gdb to the motion process.
  2. Configures gdb to follow the child in case of forks.
  3. Sets a breakpoint at the function exec_command above.
  4. Closes file-descriptor 1 (std-out) and opens a new file descriptor, that will be assigned the now-free 1 descriptor.
  5. Does the same with file-descriptor 2 for std-out.
$ sudo gdb -q -p $(pgrep motion) $(which notion)
Attaching to process 865984
[New LWP 865999]
[New LWP 866000]
[New LWP 866001]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/k7zgvzp2r31zkg9xqgjim7mbknryv6bs-glibc-2.39-52/lib/libthread_db.so.1".
0x00007f92146e90c5 in clock_nanosleep@GLIBC_2.2.5 () from /nix/store/k7zgvzp2r31zkg9xqgjim7mbknryv6bs-glibc-2.39-52/lib/libc.so.6
(gdb) set follow-fork-mode child
(gdb) b exec_command
Breakpoint 1 at 0x40ca00
(gdb) c

# Moving a hand in front of the camera triggers the breakpoint.
# The process forks, we stay with the child.
# [...]

Continuing.
(gdb) p (int)close(1)
$1 = 0
(gdb) p (int)creat("/tmp/stdout", 0600)
$2 = 1
(gdb) p (int)close(2)
$3 = 0
(gdb) p (int)creat("/tmp/stderr", 0600)
$4 = 2
(gdb) c
Continuing.

Et voilà! By peeking at /tmp/stderr we have our answer: logger: command not found.

# The smoking gun: NixOS

To understand why things did not work, I need to tell you that I am playing with NixOS. My Nix configuration runs motion through a systemd service that looks roughly like this:

systemd.services.motion = {
  description = "motion";
  serviceConfig = {
    # The configuration that I am using (not shown here) checks a few more things,
    # that is why it is wrapped in `bash`.
    ExecStart = "${pkgs.bash}/bin/bash -c '${pkgs.motion}/bin/motion -n -c /etc/motion/motion.conf'";
  };
    WantedBy = [ "default.target" ];
};

How does NixOS configure the PATH of a systemd service? 3

sudo systemctl cat motion
# /etc/systemd/system/motion.service
[Service]
# ...
Environment="PATH=/nix/store/php4qidg2bxzmm79vpri025bqi0fa889-coreutils-9.5/bin:/nix/store/jjcsr5gs4qanf7ln5c6wgcq4sn75a978-findutils-4.9.0/bin:/nix/store/28gpmx3z6ss3znd7fhmrzmvk3x5lnfbk-gnugrep-3.11/bin:/nix/store/5zjms21vpxlk>

It explicitly sets a PATH variable that contains all runtime dependencies of the Nix packages being wrapped into a service — motion and bash, in my case. Of course, Nix has no way of knowing that the configuration requires logger. But we can bring it into the path with the additional path key:

systemd.services.motion = {
  description = "motion";
  # 👇 this
  path = [pkgs.logger];
  serviceConfig = {
    # The configuration that I am using (not shown here) checks a few more things,
    # that is why it is wrapped in `bash`.
    ExecStart = "${pkgs.bash}/bin/bash -c '${pkgs.motion}/bin/motion -n -c /etc/motion/motion.conf'";
  };
    WantedBy = [ "default.target" ];
};

The change fixed the issue and things started to run as expected.

# Wrapping up

I am sure there are dozens of other ways to debug this and I’d love to hear them! Reach out if you want to exchange debugging stories. Until next time!

# Footnotes

  1. Today I learned about the logger command

  2. It took me a while to remember my first-year computer science classes and figure out that sh, the second argument to execl, sets arg0

  3. Before looking at the systemd definition, I looked directly a the process environment with sudo xargs -0 -L1 -a /proc/$(pgrep motion)/environ