A curious thing happened recently. We had some of our team asked to take a thread dump of a java process, easy enough. They knew that they could
kill -3 to get the JVM to perform a thread dump if they didn't have access to
jstack, and they knew that it would go to one of STDOUT/STDERR. However they couldn't find where that output went and assumed it was lost and gave up on capturing the stack output.
Now granted we have a mix of
daemontools and
systemd across the environments, and if you're used to one you may not know how to find it in the other. However I think what was missing was some basic linux process information, eg: thinking of
daemontools or
systemd is attacking this from the wrong end of the problem.
Background
Let's have a look at my shell for a start
dime@crobxps:~$ echo what I type is standard in, and what comes out is standard out
what I type is standard in, and what comes out is standard out
That's easy enough, I'm typing in to my terminal, which is STDIN for
bash, then this is feeding echo with the rest of the text,
echo then sends the text back to the terminal via STDOUT.
How about some manipulation of that, what if I don't want to display anything?
dime@crobxps:~$ echo Redirect stdout to /dev/null > /dev/null
dime@crobxps:~$
Excellent, we've told bash via
STDIN that we want the
echo process to run, but when running it redirect all its STDOUT to
/dev/null, so we don't see it on screen.
OK what about STDERR? How do we manipulate that?
dime@crobxps:~$ echo Redirect stderr to stdout, and then stdout to /dev/null 2>&1 > /dev/null
dime@crobxps:~$
As it says there, what we've done is told bash to redirect STDERR to STDOUT, and then direct STDOUT to
/dev/null.
All of this is manipulating bash to tell it how to handle the the STDERR and STDOUT when it launches the
echo process.
But it brings up two things. STDERR is 2, and STDOUT is 1. These are both file descriptors, which are the id's used to read and write from files. STDERR and STDOUT can be written to and read from like files. Every process will (or should) have a files open with these file descriptor id's which will be the STDOUT/STDERR for the process. There's a third one with id of 0, which is STDIN.
OK, now what?
They're files? They're files!
I know that the process will have files open that map to STDIN/STDOUT/STDERR. So if I can look at what files the process has open I can see where these might be mapped to.
Lets look at the bash process I'm running, we can see which files it has open by looking in
/proc
dime@crobxps:~$ ls -l /proc/$(pgrep bash)/fd
total 0
lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:14 255 -> /dev/tty1
I'm using
pgrep to find the PID of the
bash process, then I'm looking in the
fd directory for this process in
/proc. Here I can see 0, 1 and 2 all map to the
/dev/tty1 device, which is the terminal I'm using. That's where my STDIN, STDOUT and STDERR are.
So if I have a java process, where would I find the thread dump output?
dime@crobxps:~$ ls -l /proc/$(pgrep java)/fd
total 0
lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/null
lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /var/log/javaservice/output
lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /var/log/javaservice/error
There we go. I would have to look in either
/var/log/javaservice/output or
/var/log/javaservice/error to find the thread dump for my process.
What if they're not files?
Systemd does something a little different. It will map STDIN/STDOUT to sockets, for example (and this is very configurable with systemd):
$ sudo ls -l /proc/$(pgrep -f /usr/bin/gpg-agent)/fd
total 0
lr-x------ 1 root root 64 Jun 27 02:20 0 -> /dev/null
lrwx------ 1 root root 64 Jun 27 02:20 1 -> 'socket:[753362]'
lrwx------ 1 root root 64 Jun 27 02:20 2 -> 'socket:[753362]'
So how do I find where the output is for this one? You can use
ps too tell you which systemd unit started the process, and then from that you can get the StandardOutput and StandardError properties for that unit, like so:
dime at crobbox in ~
$ ps -o unit $(pgrep -f /usr/bin/gpg-agent)
UNIT
user@1000.service
dime at crobbox in ~
$ sudo systemctl show user@1000.service -p StandardOutput, StandardError
StandardOutput=journal
StandardOutput=inherit
The journal value here means it ends up in the systemd journal. This property could also be a file which you would look in to find the output you need.
If it's set to journal you can view this units output like this:
dime at crobbox in ~
$ sudo journalctl -u user@1000.service -t gpg-agent | tail
May 02 02:22:04 crobbox gpg-agent[13117]: listening on: std=6 extra=4 browser=5 ssh=3
Jun 24 08:36:48 crobbox gpg-agent[13117]: SIGTERM received - shutting down ...
Jun 24 08:36:48 crobbox gpg-agent[13117]: gpg-agent (GnuPG) 2.2.8 stopped
-- Reboot --
Jun 27 02:20:26 crobbox gpg-agent[5470]: gpg-agent (GnuPG) 2.2.8 starting in supervised mode.
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 3 for extra socket (/run/user/1000/gnupg/S.gpg-agent.extra)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 4 for ssh socket (/run/user/1000/gnupg/S.gpg-agent.ssh)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 5 for browser socket (/run/user/1000/gnupg/S.gpg-agent.browser)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 6 for std socket (/run/user/1000/gnupg/S.gpg-agent)
Jun 27 02:20:26 crobbox gpg-agent[5470]: listening on: std=6 extra=3 browser=5 ssh=4
Here I can see all the STDOUT and STDERR for the
gpg-agent process on this host. The
-t flag here will filter the output to one identifier since this systemd service has a lot of other log lines in it from other processes. I know I just wanted to see the
gpg-agent lines I have filtered it down to just these with
-t.
Conclusion
Now you will never lose track of where you STDOUT or STDERR output has gone. You know that file descriptors 0, 1 and 2 are on every process, and where they are mapped too. And if the process is controlled by systemd you can navigate it's config to find out where the output has gone.