Debugging a silent process
How do you debug a process for which you have no output and no exit code?
Here is the story of how I solved such problem. No, it was not DNS.
# Putting things in… Motion
Motion lets you build DIY motion-detection for various types of cameras.
It allows pretty extensive configuration through a text-based configuration
file. A few configuration options, e.g. on_motion_detected
, execute
arbitrary programs or scripts to notify the user of the event.
To make things concrete, the following configuration should log messages into the system log1.
video_device /dev/video0
# ... other configuration options setting up the input size, etc.
############################################################
# Script execution configuration parameters
############################################################
# Command to be executed when an event starts.
on_event_start logger "Motion detected!"
I happily tried this and… it did not work, no log was printed.
To figure out why, I configured Motion’s log to print debug entries. It confirmed that the event was being detected and the command executed:
[DBG] [EVT] exec_command: Executing external command 'logger on_event_start'
[INF] [ALL] on_event_start logger on_event_start
But it did not give me any additional clue. So, what next? Being opensource, we can look at what happens when it tries executing the command:
// https://github.com/Motion-Project/motion/blob/8d69cbf27a1c2fcffd7ff0f5558f3e7de277b8d1/src/event.c#L75
/**
* exec_command
* Execute 'command' with 'arg' as its argument.
* if !arg command is started with no arguments
* Before we call execl we need to close all the file handles
* that the fork inherited from the parent in order not to pass
* the open handles on to the shell
*/
static void exec_command(struct context *cnt, char *command, char *filename, int filetype)
{
char stamp[PATH_MAX];
mystrftime(cnt, stamp, sizeof(stamp), command, &cnt->current_image->timestamp_tv, filename, filetype);
if (!fork()) {
/* Detach from parent */
setsid();
execl("/bin/sh", "sh", "-c", stamp, " &", NULL);
/* if above function succeeds the program never reach here */
MOTION_LOG(ALR, TYPE_EVENTS, SHOW_ERRNO
,_("Unable to start external command '%s'"), stamp);
exit(1);
}
MOTION_LOG(DBG, TYPE_EVENTS, NO_ERRNO
,_("Executing external command '%s'"), stamp);
}
Relatively straightforward2: the classic fork-then-exec pattern. Being
able to read sh
’s exit code or its standard error would be gold to debug this,
but motion
does not expose them.
Worry not, this just means we can have some fun!
# Enters gdb
I had a suspect that sh
would tell me more about the problem and I wanted to
somehow convince it to spit its output. I originally thought to replace /bin/sh
with a wrapper that would log to file. Feasible, but annoying and likely to
wreak havoc at the system level.
Then, I thought of gdb
. If we could pause the process execution at the right
time — right after the fork — maybe we could redirect the standard error to
a file. Without modifying the /bin/sh
executable at all.
It turns out that it is possible, by being clever about how file
descriptors are allocated on Unix and abusing the print
command in GDB to
evaluate arbitrary espressione.
The snippet below:
- Hooks
gdb
to themotion
process. - Configures
gdb
to follow the child in case of forks. - Sets a breakpoint at the function
exec_command
above. - Closes file-descriptor
1
(std-out) and opens a new file descriptor, that will be assigned the now-free1
descriptor. - Does the same with file-descriptor
2
for std-out.
$ sudo gdb -q -p $(pgrep motion) $(which notion)
Attaching to process 865984
[New LWP 865999]
[New LWP 866000]
[New LWP 866001]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/k7zgvzp2r31zkg9xqgjim7mbknryv6bs-glibc-2.39-52/lib/libthread_db.so.1".
0x00007f92146e90c5 in clock_nanosleep@GLIBC_2.2.5 () from /nix/store/k7zgvzp2r31zkg9xqgjim7mbknryv6bs-glibc-2.39-52/lib/libc.so.6
(gdb) set follow-fork-mode child
(gdb) b exec_command
Breakpoint 1 at 0x40ca00
(gdb) c
# Moving a hand in front of the camera triggers the breakpoint.
# The process forks, we stay with the child.
# [...]
Continuing.
(gdb) p (int)close(1)
$1 = 0
(gdb) p (int)creat("/tmp/stdout", 0600)
$2 = 1
(gdb) p (int)close(2)
$3 = 0
(gdb) p (int)creat("/tmp/stderr", 0600)
$4 = 2
(gdb) c
Continuing.
Et voilà! By peeking at /tmp/stderr
we have our answer: logger: command not found
.
# The smoking gun: NixOS
To understand why things did not work, I need to tell you that I am playing
with NixOS. My Nix configuration runs motion
through a systemd
service that
looks roughly like this:
systemd.services.motion = {
description = "motion";
serviceConfig = {
# The configuration that I am using (not shown here) checks a few more things,
# that is why it is wrapped in `bash`.
ExecStart = "${pkgs.bash}/bin/bash -c '${pkgs.motion}/bin/motion -n -c /etc/motion/motion.conf'";
};
WantedBy = [ "default.target" ];
};
How does NixOS configure the PATH
of a systemd
service? 3
sudo systemctl cat motion
# /etc/systemd/system/motion.service
[Service]
# ...
Environment="PATH=/nix/store/php4qidg2bxzmm79vpri025bqi0fa889-coreutils-9.5/bin:/nix/store/jjcsr5gs4qanf7ln5c6wgcq4sn75a978-findutils-4.9.0/bin:/nix/store/28gpmx3z6ss3znd7fhmrzmvk3x5lnfbk-gnugrep-3.11/bin:/nix/store/5zjms21vpxlk>
It explicitly sets a PATH
variable that contains all runtime dependencies of
the Nix packages being wrapped into a service — motion
and bash
, in my
case. Of course, Nix has no way of knowing that the configuration requires
logger
. But we can bring it into the path with the additional path
key:
systemd.services.motion = {
description = "motion";
# 👇 this
path = [pkgs.logger];
serviceConfig = {
# The configuration that I am using (not shown here) checks a few more things,
# that is why it is wrapped in `bash`.
ExecStart = "${pkgs.bash}/bin/bash -c '${pkgs.motion}/bin/motion -n -c /etc/motion/motion.conf'";
};
WantedBy = [ "default.target" ];
};
The change fixed the issue and things started to run as expected.
# Wrapping up
I am sure there are dozens of other ways to debug this and I’d love to hear them! Reach out if you want to exchange debugging stories. Until next time!
# Footnotes
-
Today I learned about the
logger
command. ↩ -
It took me a while to remember my first-year computer science classes and figure out that
sh
, the second argument toexecl
, setsarg0
. ↩ -
Before looking at the
systemd
definition, I looked directly a the process environment withsudo xargs -0 -L1 -a /proc/$(pgrep motion)/environ
. ↩