A curious thing happened recently. We had some of our team asked to take a thread dump of a java process, easy enough. They knew that they could kill -3 to get the JVM to perform a thread dump if they didn't have access to jstack, and they knew that it would go to one of STDOUT/STDERR. However they couldn't find where that output went and assumed it was lost and gave up on capturing the stack output.
Now granted we have a mix of daemontools and systemd across the environments, and if you're used to one you may not know how to find it in the other. However I think what was missing was some basic linux process information, eg: thinking of daemontools or systemd is attacking this from the wrong end of the problem.
Let's have a look at my shell for a start
That's easy enough, I'm typing in to my terminal, which is STDIN for bash, then this is feeding echo with the rest of the text, echo then sends the text back to the terminal via STDOUT.
How about some manipulation of that, what if I don't want to display anything?
Excellent, we've told bash via STDIN that we want the echo process to run, but when running it redirect all its STDOUT to /dev/null, so we don't see it on screen.
OK what about STDERR? How do we manipulate that?
As it says there, what we've done is told bash to redirect STDERR to STDOUT, and then direct STDOUT to /dev/null.
All of this is manipulating bash to tell it how to handle the the STDERR and STDOUT when it launches the echo process.
But it brings up two things. STDERR is 2, and STDOUT is 1. These are both file descriptors, which are the id's used to read and write from files. STDERR and STDOUT can be written to and read from like files. Every process will (or should) have a files open with these file descriptor id's which will be the STDOUT/STDERR for the process. There's a third one with id of 0, which is STDIN.
OK, now what?
I know that the process will have files open that map to STDIN/STDOUT/STDERR. So if I can look at what files the process has open I can see where these might be mapped to.
Lets look at the bash process I'm running, we can see which files it has open by looking in /proc
I'm using pgrep to find the PID of the bash process, then I'm looking in the fd directory for this process in /proc. Here I can see 0, 1 and 2 all map to the /dev/tty1 device, which is the terminal I'm using. That's where my STDIN, STDOUT and STDERR are.
So if I have a java process, where would I find the thread dump output?
There we go. I would have to look in either /var/log/javaservice/output or /var/log/javaservice/error to find the thread dump for my process.
Systemd does something a little different. It will map STDIN/STDOUT to sockets, for example (and this is very configurable with systemd):
So how do I find where the output is for this one? You can use ps too tell you which systemd unit started the process, and then from that you can get the StandardOutput and StandardError properties for that unit, like so:
If it's set to journal you can view this units output like this:
Here I can see all the STDOUT and STDERR for the gpg-agent process on this host. The -t flag here will filter the output to one identifier since this systemd service has a lot of other log lines in it from other processes. I know I just wanted to see the gpg-agent lines I have filtered it down to just these with -t.
Now you will never lose track of where you STDOUT or STDERR output has gone. You know that file descriptors 0, 1 and 2 are on every process, and where they are mapped too. And if the process is controlled by systemd you can navigate it's config to find out where the output has gone.
Now granted we have a mix of daemontools and systemd across the environments, and if you're used to one you may not know how to find it in the other. However I think what was missing was some basic linux process information, eg: thinking of daemontools or systemd is attacking this from the wrong end of the problem.
Background
Let's have a look at my shell for a start
dime@crobxps:~$ echo what I type is standard in, and what comes out is standard out what I type is standard in, and what comes out is standard out
That's easy enough, I'm typing in to my terminal, which is STDIN for bash, then this is feeding echo with the rest of the text, echo then sends the text back to the terminal via STDOUT.
How about some manipulation of that, what if I don't want to display anything?
dime@crobxps:~$ echo Redirect stdout to /dev/null > /dev/null dime@crobxps:~$
Excellent, we've told bash via STDIN that we want the echo process to run, but when running it redirect all its STDOUT to /dev/null, so we don't see it on screen.
OK what about STDERR? How do we manipulate that?
dime@crobxps:~$ echo Redirect stderr to stdout, and then stdout to /dev/null 2>&1 > /dev/null dime@crobxps:~$
As it says there, what we've done is told bash to redirect STDERR to STDOUT, and then direct STDOUT to /dev/null.
All of this is manipulating bash to tell it how to handle the the STDERR and STDOUT when it launches the echo process.
But it brings up two things. STDERR is 2, and STDOUT is 1. These are both file descriptors, which are the id's used to read and write from files. STDERR and STDOUT can be written to and read from like files. Every process will (or should) have a files open with these file descriptor id's which will be the STDOUT/STDERR for the process. There's a third one with id of 0, which is STDIN.
OK, now what?
They're files? They're files!
I know that the process will have files open that map to STDIN/STDOUT/STDERR. So if I can look at what files the process has open I can see where these might be mapped to.
Lets look at the bash process I'm running, we can see which files it has open by looking in /proc
dime@crobxps:~$ ls -l /proc/$(pgrep bash)/fd total 0 lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/tty1 lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /dev/tty1 lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /dev/tty1 lrwx------ 1 dime dime 0 Jul 13 12:14 255 -> /dev/tty1
I'm using pgrep to find the PID of the bash process, then I'm looking in the fd directory for this process in /proc. Here I can see 0, 1 and 2 all map to the /dev/tty1 device, which is the terminal I'm using. That's where my STDIN, STDOUT and STDERR are.
So if I have a java process, where would I find the thread dump output?
dime@crobxps:~$ ls -l /proc/$(pgrep java)/fd total 0 lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/null lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /var/log/javaservice/output lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /var/log/javaservice/error
There we go. I would have to look in either /var/log/javaservice/output or /var/log/javaservice/error to find the thread dump for my process.
What if they're not files?
Systemd does something a little different. It will map STDIN/STDOUT to sockets, for example (and this is very configurable with systemd):
$ sudo ls -l /proc/$(pgrep -f /usr/bin/gpg-agent)/fd total 0 lr-x------ 1 root root 64 Jun 27 02:20 0 -> /dev/null lrwx------ 1 root root 64 Jun 27 02:20 1 -> 'socket:[753362]' lrwx------ 1 root root 64 Jun 27 02:20 2 -> 'socket:[753362]'
So how do I find where the output is for this one? You can use ps too tell you which systemd unit started the process, and then from that you can get the StandardOutput and StandardError properties for that unit, like so:
dime at crobbox in ~ $ ps -o unit $(pgrep -f /usr/bin/gpg-agent) UNIT user@1000.service dime at crobbox in ~ $ sudo systemctl show user@1000.service -p StandardOutput, StandardError StandardOutput=journal StandardOutput=inheritThe journal value here means it ends up in the systemd journal. This property could also be a file which you would look in to find the output you need.
If it's set to journal you can view this units output like this:
dime at crobbox in ~ $ sudo journalctl -u user@1000.service -t gpg-agent | tail May 02 02:22:04 crobbox gpg-agent[13117]: listening on: std=6 extra=4 browser=5 ssh=3 Jun 24 08:36:48 crobbox gpg-agent[13117]: SIGTERM received - shutting down ... Jun 24 08:36:48 crobbox gpg-agent[13117]: gpg-agent (GnuPG) 2.2.8 stopped -- Reboot -- Jun 27 02:20:26 crobbox gpg-agent[5470]: gpg-agent (GnuPG) 2.2.8 starting in supervised mode. Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 3 for extra socket (/run/user/1000/gnupg/S.gpg-agent.extra) Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 4 for ssh socket (/run/user/1000/gnupg/S.gpg-agent.ssh) Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 5 for browser socket (/run/user/1000/gnupg/S.gpg-agent.browser) Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 6 for std socket (/run/user/1000/gnupg/S.gpg-agent) Jun 27 02:20:26 crobbox gpg-agent[5470]: listening on: std=6 extra=3 browser=5 ssh=4
Here I can see all the STDOUT and STDERR for the gpg-agent process on this host. The -t flag here will filter the output to one identifier since this systemd service has a lot of other log lines in it from other processes. I know I just wanted to see the gpg-agent lines I have filtered it down to just these with -t.
Conclusion
Now you will never lose track of where you STDOUT or STDERR output has gone. You know that file descriptors 0, 1 and 2 are on every process, and where they are mapped too. And if the process is controlled by systemd you can navigate it's config to find out where the output has gone.
No comments:
Post a Comment