Friday 22 May 2020

Headless install of raspberry pi

Most of this is already documented elsewhere, but here's a simplified version. It was current as of 2019, and I haven't got a pi handy to test with right now.

The goal here is to set up an SD card with the installer, and then power on the Raspberry Pi with that SD card and have it appear on the network after it's finished installing without any interaction. This is handy when you don't have a spare screen or keyboard to be able to perform the install interactively.

How to:
1. Format an SD card.
2. Get the NOOBS distribution zip, extract it to the previously formatted SD card.
3. Create a file called 'ssh' in the root directory of the SD card to enable the SSH server so you can login after it boots.
4. Create a file called wpa_supplicant.conf in the root directory of the SD card to join a wifi network automagically, with contents like:

country=GB
update_config=1
ctrl_interface=/var/run/wpa_supplicant

network={
    scan_ssid=1
    ssid="Network name goes here"
    psk="password goes here"
}

5. Open the recovery.cmdline from the SD card and append 'silentinstall' to the line

6. Make sure you set all your line endings to unix style in the editor you use.

7. Eject the SD card, insert it in the pi and boot it. It should appear on your network after it finishes installing (not hours, 10s of minutes).
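
Steps 3 to 5 above can also be scripted once the NOOBS zip has been extracted. This is a minimal sketch rather than a tested installer: prepare_sd and its arguments are hypothetical names, the country code defaults to GB, and it assumes GNU sed for the in-place edit.

```shell
# prepare_sd BOOT_DIR SSID PSK [COUNTRY]
# Hypothetical helper: BOOT_DIR is wherever the SD card's root directory is mounted.
prepare_sd() {
  local boot_dir="$1" ssid="$2" psk="$3" country="${4:-GB}"

  # Step 3: an empty file called 'ssh' enables the SSH server
  : > "${boot_dir}/ssh"

  # Step 4: wifi settings (a here-document written by the shell has unix line endings)
  cat > "${boot_dir}/wpa_supplicant.conf" <<EOF
country=${country}
update_config=1
ctrl_interface=/var/run/wpa_supplicant

network={
    scan_ssid=1
    ssid="${ssid}"
    psk="${psk}"
}
EOF

  # Step 5: append 'silentinstall' to the first line of recovery.cmdline, once only.
  # Trailing whitespace (including a stray carriage return) is replaced.
  if ! grep -q 'silentinstall' "${boot_dir}/recovery.cmdline"; then
    sed -i '1s/[[:space:]]*$/ silentinstall/' "${boot_dir}/recovery.cmdline"
  fi
}
```

For example, with the card mounted at a path of your choosing, you would run prepare_sd /path/to/sdcard "Network name goes here" "password goes here".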

Saturday 13 July 2019

Uh oh, where is STDOUT and STDERR for my process?

A curious thing happened recently. Some of our team were asked to take a thread dump of a java process, easy enough. They knew that they could use kill -3 to get the JVM to perform a thread dump if they didn't have access to jstack, and they knew that it would go to one of STDOUT/STDERR. However, they couldn't find where that output went, assumed it was lost, and gave up on capturing the stack output.

Now granted, we have a mix of daemontools and systemd across the environments, and if you're used to one you may not know how to find it in the other. However, I think what was missing was some basic linux process information, i.e. thinking of daemontools or systemd is attacking this from the wrong end of the problem.

Background


Let's have a look at my shell for a start
dime@crobxps:~$ echo what I type is standard in, and what comes out is standard out
what I type is standard in, and what comes out is standard out

That's easy enough, I'm typing in to my terminal, which is STDIN for bash, then this is feeding echo with the rest of the text, echo then sends the text back to the terminal via STDOUT.

How about some manipulation of that, what if I don't want to display anything?
dime@crobxps:~$ echo Redirect stdout to /dev/null > /dev/null
dime@crobxps:~$

Excellent, we've told bash via STDIN that we want the echo process to run, but when running it redirect all its STDOUT to /dev/null, so we don't see it on screen.

OK what about STDERR? How do we manipulate that?
dime@crobxps:~$ echo Redirect stderr to stdout, and then stdout to /dev/null 2>&1 > /dev/null
dime@crobxps:~$

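As it says there, we've told bash to redirect STDERR to STDOUT, and then to direct STDOUT to /dev/null. Order matters here: redirections are applied left to right, so STDERR ends up pointing at the terminal (where STDOUT pointed at that moment) and only STDOUT goes to /dev/null. To silence both you would write > /dev/null 2>&1 instead.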

All of this is manipulating bash to tell it how to handle the STDERR and STDOUT when it launches the echo process.

But it brings up two things. STDERR is 2, and STDOUT is 1. These are both file descriptors, which are the IDs used to read from and write to files. STDERR and STDOUT can be written to and read from like files. Every process will (or should) have files open with these file descriptor IDs, which will be the STDOUT/STDERR for the process. There's a third one with an ID of 0, which is STDIN.
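
Because 1 and 2 are just descriptors, the order in which you redirect them changes the result. A small sketch to try this out (emit, stderr_survives and both_silenced are hypothetical names used only for illustration):

```shell
# emit writes one line to STDOUT (fd 1) and one line to STDERR (fd 2)
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Redirections are applied left to right.
# fd 2 is pointed at wherever fd 1 goes right now, then fd 1 alone is sent to
# /dev/null, so the STDERR line is still visible.
stderr_survives() { emit 2>&1 > /dev/null; }

# fd 1 is sent to /dev/null first, then fd 2 follows it, so nothing is visible.
both_silenced() { emit > /dev/null 2>&1; }
```

Running stderr_survives in a terminal still prints the STDERR line, while both_silenced prints nothing.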

OK, now what?

They're files? They're files!


I know that the process will have files open that map to STDIN/STDOUT/STDERR. So if I can look at what files the process has open I can see where these might be mapped to.

Let's look at the bash process I'm running, we can see which files it has open by looking in /proc
dime@crobxps:~$ ls -l /proc/$(pgrep bash)/fd
total 0
lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /dev/tty1
lrwx------ 1 dime dime 0 Jul 13 12:14 255 -> /dev/tty1

I'm using pgrep to find the PID of the bash process, then I'm looking in the fd directory for this process in /proc. Here I can see 0, 1 and 2 all map to the /dev/tty1 device, which is the terminal I'm using. That's where my STDIN, STDOUT and STDERR are.

So if I have a java process, where would I find the thread dump output?
dime@crobxps:~$ ls -l /proc/$(pgrep java)/fd
total 0
lrwx------ 1 dime dime 0 Jul 13 12:14 0 -> /dev/null
lrwx------ 1 dime dime 0 Jul 13 12:14 1 -> /var/log/javaservice/output
lrwx------ 1 dime dime 0 Jul 13 12:13 2 -> /var/log/javaservice/error

There we go. I would have to look in either /var/log/javaservice/output or /var/log/javaservice/error to find the thread dump for my process.
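
If you do this often, the lookup can be wrapped in a small function. A minimal sketch, assuming a Linux /proc filesystem and that you have permission to read the target process's fd directory (show_std_fds is a hypothetical name):

```shell
# show_std_fds PID: print where STDIN (0), STDOUT (1) and STDERR (2) point.
show_std_fds() {
  local pid="$1" fd
  for fd in 0 1 2; do
    # Each entry in /proc/PID/fd is a symlink to whatever the descriptor has open
    printf '%s -> %s\n' "${fd}" "$(readlink "/proc/${pid}/fd/${fd}")"
  done
}
```

For example, show_std_fds $$ reports on the shell you are typing in.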

What if they're not files?


Systemd does something a little different. It will map STDOUT/STDERR to sockets, for example (and this is very configurable with systemd):
$ sudo ls -l /proc/$(pgrep -f /usr/bin/gpg-agent)/fd
total 0
lr-x------ 1 root root 64 Jun 27 02:20 0 -> /dev/null
lrwx------ 1 root root 64 Jun 27 02:20 1 -> 'socket:[753362]'
lrwx------ 1 root root 64 Jun 27 02:20 2 -> 'socket:[753362]'

So how do I find where the output is for this one? You can use ps to tell you which systemd unit started the process, and then from that you can get the StandardOutput and StandardError properties for that unit, like so:
dime at crobbox in ~
$ ps -o unit $(pgrep -f /usr/bin/gpg-agent)
UNIT
user@1000.service

dime at crobbox in ~
$ sudo systemctl show user@1000.service -p StandardOutput,StandardError
StandardOutput=journal
StandardError=inherit

The journal value here means it ends up in the systemd journal. This property could also be a file which you would look in to find the output you need.
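
The values of these two properties tell you where to look next. Here is a rough sketch of that decision, reading the Key=value lines printed by systemctl show on STDIN. describe_std_targets is a hypothetical helper, and it only covers a handful of the values documented in systemd.exec(5):

```shell
# describe_std_targets: read "Key=value" lines and explain where the output goes.
describe_std_targets() {
  local key value
  while IFS='=' read -r key value; do
    # Only the two output properties are of interest
    case "${key}" in
      StandardOutput|StandardError) ;;
      *) continue ;;
    esac
    case "${value}" in
      journal*)        echo "${key}: systemd journal" ;;
      file:*|append:*) echo "${key}: file ${value#*:}" ;;
      null)            echo "${key}: discarded" ;;
      inherit)         echo "${key}: inherited from another descriptor of the unit" ;;
      *)               echo "${key}: ${value}" ;;
    esac
  done
}
```

You would feed it the output of the systemctl show command above, for example by piping that command into describe_std_targets.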

If it's set to journal you can view this unit's output like this:
dime at crobbox in ~
$ sudo journalctl -u user@1000.service -t gpg-agent | tail
May 02 02:22:04 crobbox gpg-agent[13117]: listening on: std=6 extra=4 browser=5 ssh=3
Jun 24 08:36:48 crobbox gpg-agent[13117]: SIGTERM received - shutting down ...
Jun 24 08:36:48 crobbox gpg-agent[13117]: gpg-agent (GnuPG) 2.2.8 stopped
-- Reboot --
Jun 27 02:20:26 crobbox gpg-agent[5470]: gpg-agent (GnuPG) 2.2.8 starting in supervised mode.
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 3 for extra socket (/run/user/1000/gnupg/S.gpg-agent.extra)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 4 for ssh socket (/run/user/1000/gnupg/S.gpg-agent.ssh)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 5 for browser socket (/run/user/1000/gnupg/S.gpg-agent.browser)
Jun 27 02:20:26 crobbox gpg-agent[5470]: using fd 6 for std socket (/run/user/1000/gnupg/S.gpg-agent)
Jun 27 02:20:26 crobbox gpg-agent[5470]: listening on: std=6 extra=3 browser=5 ssh=4

Here I can see all the STDOUT and STDERR for the gpg-agent process on this host. This systemd service has a lot of log lines in it from other processes, and I only wanted to see the gpg-agent lines, so I've used the -t flag to filter the output down to that one identifier.

Conclusion


Now you will never lose track of where your STDOUT or STDERR output has gone. You know that file descriptors 0, 1 and 2 are on every process, and how to find where they are mapped to. And if the process is controlled by systemd you can navigate its config to find out where the output has gone.

Tuesday 4 June 2019

Finding a song I only vaguely remember with Docker and Mysql? (and luck)

The Background

Last night, while watching Love Island (yeah, so what? 😃) after coming back from an ad break they played a song I recognised from the early 2000s by Tim Deluxe (slightly nsfw, depending on your workplace) https://www.youtube.com/watch?v=3FjauOAhBHw

This reminded me of another song that I think was around at the same time (Disclaimer: 'same time' here might mean a period of my life spanning a few years, in this case 'University'), but I was struggling to remember what it was called, or who it was by.

All I remember of that mostly forgotten tune was an animated dog walking around, the basic bass melody part, and that I think it had something about the weekend in the name. I remember talking to someone about it at university, so this was somewhere between 2001 and 2003. I must have seen it at some time on a Friday or Saturday morning on TV in Australia on Rage.

First step to try and find this song I'm thinking about: Some random google searching. Trying 'dog walking music video', 'music video with a dog from early 2000s', and various combinations of that revealed nothing closer to the song I was thinking of. I tried the images and videos tabs, nothing looked familiar.

Next step: asking friends from where I grew up. Nope, no one saw it. There were some suggestions though: Nope, also nope, still nope. Then someone suggested the Internet Music Video Database, this could be something! However, since this is user entered data I wasn't too hopeful. I browsed through all videos released in 2000 to 2003 on this website (800-900 per year!) but could see nothing familiar.

So back to Rage, that's my only lead.

One big advantage here is that some time when the internet was getting a bit more popular Rage started publishing their playlists from every night online. People have taken these playlists and created websites that allow you to re-live these very late Friday and Saturday nights: rageagain.com and rageaholic.tv.

I noticed on rageagain.com that there's a link where you can download a dump of their database.
Well this makes it interesting. I'm sure if I could find the name I'd recognise it. And this is where I can step in and help myself!

The Findening

First I'll get the data from rageagain.com:

wget http://www.pjgalbraith.com/wp-content/uploads/rageagain-01-01-2013.sql.zip

It looks like it's a dump from mysql using phpmyadmin which should be easy enough to handle. I need a quick mysql instance so I'll look at the official docker mysql images first on Docker Hub.
In the examples on Docker Hub there's a docker-compose example that also includes a basic administration interface. First thing to do is get docker installed, I'll leave that as an exercise for yourself.

Next, create my docker-compose file, save the following as mysql.yml:

# Use root/example as user/password credentials
version: '3.1'

services:
  db:
    image: mysql
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: example
  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

These are all the defaults given in the example, I'm not too interested in security here since this is a once off viewing of the data.
Bring the containers up from my docker compose file:

docker-compose -f mysql.yml up

Once everything's downloaded you should be able to access mysql adminer via http://localhost:8080/.

To login use the username of 'root' and the password of 'example' (or whatever you changed it to), leave the other values as defaults. Next, create a new database to put your import in by clicking on the Create Database link, give it a name of 'rageagain' and then click on Save.

Now you can import the data from rageagain.com. Click on the Import link on the left hand side, click on choose files and select the sql file you have from the zip file downloaded earlier. Click on Execute, wait a minute or so and then your database will be populated.

Now what?

I know that:
  • I heard this song some time between 2000 and 2003 
  • I vaguely remember something about the weekend or a holiday in the title.

The data in the database is divided into playlists and tracks. So let's try some queries and see if I recognise anything:

First, anything played between 2000 and 2003 with 'Weekend' anywhere in the title:

SELECT distinct t.artist, t.track
FROM `playlists` p, `tracks` t
where year(p.date) in ('2000','2001','2002','2003') and t.playlist_id = p.id
and UPPER(t.track) like '%WEEKEND%'
order by t.artist, t.track



8 rows, but nothing familiar there.

OK, how about looking for each day name instead?

SELECT distinct t.artist, t.track
FROM `playlists` p, `tracks` t
where year(p.date) in ('2000','2001','2002','2003') and t.playlist_id = p.id
and UPPER(t.track) REGEXP 'FRIDAY|SATURDAY|SUNDAY'
order by t.artist, t.track


23 rows in total, this should be easy. AND THERE IT IS!

Johnny Corporate - Sunday Shoutin' 

As soon as I saw the name I recognised it. (This looks like it was fast, and it was. I did try a couple more queries to check the data, but it was about this fast.) And to verify it's the one I'm thinking of, the film clip: https://www.youtube.com/watch?v=3TLFdGNAo4g

It is! I've found it! Now I can get back to thinking about something else...

Saturday 4 May 2019

Part 2 - Streaming externally

This is Part 2. Part 1 is here.

In the first part an HD stream was created from the raspberry pi, turned into RTMP with ffmpeg, and then published and consumed through nginx. In this part I'll add an external host that can proxy this stream to the outside world. This means viewers never connect directly to my home internet connection.

We need a couple of things:
  • An external host that you can ssh into with a public key from the raspberry pi.
  • An nginx instance running on your external ssh host
Next we need to create a couple of things which is what I'll cover below:
  • Script on the external host which will keep the SSH connection alive
  • Systemd unit that will restart the ssh tunnel if dropped
  • Nginx config to use the SSH tunnel to connect to the nginx instance running on the raspberry pi.
So why do it this way? There's one thing I didn't want: when no one is viewing the stream, I do not want the stream to consume my home internet upload bandwidth. If I wasn't concerned with this I'd just publish to an S3 bucket instead.

Keep Alive script

This short script keeps some random traffic flowing over the tty part of the SSH connection so that the port mapped connection stays up.

For this I'm using a script I've used in other places that just outputs a random word every few seconds to keep the connection up. If I don't do this, the ssh connection will drop after a timeout when the terminal has had no traffic. If it does drop, the connection will re-establish thanks to systemd.

This is the contents of ${HOME}/keepalive.sh on the external host:
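#!/bin/bash
DICT=/usr/share/dict/words
# Number of lines (words) in the dictionary
RANGE=$(wc -l < "${DICT}")
while true; do
  sleep 5
  # Random line number from 1 to RANGE (or shuf, or jot, but awk is everywhere)
  number=$(awk -v range="${RANGE}" 'BEGIN{srand(); print int(rand()*range)+1}')
  # Print only that line; line 0 does not exist, hence the +1 above
  sed -n "${number}p" "${DICT}"
done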

It's pretty simple, but the logic is:
  1. Get the number of words in the dictionary file on the local system
  2. Wait for 5 seconds
  3. Using awk, get a random number up to number of lines in the dictionary
  4. Display that word
  5. Go to step 2.
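
To check the word selection without waiting in the loop, steps 1, 3 and 4 can be pulled out into a function. This is a sketch for experimenting, not part of the original script. random_word is a hypothetical name, and the dictionary path is only a default:

```shell
# random_word [DICT]: print one random line from a word list.
random_word() {
  local dict="${1:-/usr/share/dict/words}" range number
  # Step 1: number of lines in the dictionary
  range=$(wc -l < "${dict}")
  # Step 3: rand() is in [0,1), so int(rand()*range)+1 is between 1 and range
  number=$(awk -v range="${range}" 'BEGIN { srand(); print int(rand() * range) + 1 }')
  # Step 4: print only that line
  sed -n "${number}p" "${dict}"
}
```

Note that awk's srand() seeds from the clock in seconds, so calling this several times within the same second returns the same word. With a 5 second sleep between calls that doesn't matter.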

SSH Tunnel Systemd Unit

This one is also pretty simple, it belongs on the raspberry pi running the webcam. This service will:
  1. Wait for nginx to start
  2. Open a ssh connection to the external host
  3. Create a tunnel over that ssh connection from the local nginx server to a port on the external host that the external nginx host can proxy to.
  4. Start the keepalive.sh script above
You need to create a file called /etc/systemd/system/sshtunnel.service with these contents (remember to update the ssh line with your external user and host name):
[Unit]
Description=sshtunnel
After=nginx.service
After=systemd-user-sessions.service
After=rc-local.service
Before=getty.target
StartLimitIntervalSec=0

[Service]
ExecStart=/usr/bin/ssh externaluser@externalproxyhostname -R 8080:127.0.0.1:80 /home/dime/keepalive.sh
Type=simple
Restart=always
RestartSec=5
User=pi
Group=pi

[Install]
WantedBy=multi-user.target
Once the config is in place you need to enable it with:
systemctl enable sshtunnel

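Once it starts it'll create that tunnel so the external host can proxy through to the raspberry pi. If the connection ever drops, systemd will restart it after 5 seconds. The After=nginx.service line only orders startup, so the tunnel is not restarted when nginx restarts. It doesn't need to be: the tunnel just forwards a port, and carries on working once nginx is back.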

Nginx configuration for the external host

Assuming you might have an existing nginx setup externally, you'll need to add a section like this to your existing nginx conf:
server { # in any existing server config
  # Streaming webcam location
  location /window/ {
    proxy_pass http://127.0.0.1:8080/;
  }
}

And that's it. There is some improvement to be made around caching so when two or more viewers are watching the stream only one *.ts file is transferred, for now though keep it simple.

Starting

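Once it's all in place you need to `systemctl restart nginx` and then `systemctl start sshtunnel` on the raspberry pi webcam, this will:
- Restart nginx
- Restart the webcam stream (ffmpeg loses its connection to nginx and exits, and systemd starts it again)
- Start the sshtunnel service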

If all goes to plan you should now be able to access your webcam via your external proxy, at the location configured in nginx above:
http://externalproxyhostname/window/

Next Time

For Part 3, some parameters will be adjusted in raspivid and ffmpeg to allow us to store 24 hours of video.

Sunday 24 March 2019

Smooth streaming video from a raspberry pi camera


Here's how to set up a smooth streaming video webcam from a raspberry pi. In my case I have a raspberry pi zero w with the pi camera which streams a feed from a window.

Prerequisites:
  • Raspberry pi (zero, zero w, B, whatever - they all do the heavy h.264 encoding, unsure on A)
  • Raspberry pi camera
  • raspivid installed and working (with raspi-config). Not covered here.
  • Network connection to raspberry pi

What we'll be doing is:
  • Compiling FFMPEG from source
  • Compiling NGINX with the rtmp module
  • Creating a simple index file to load a javascript hls library
  • Creating a script to start the streaming
  • Systemd units to keep everything going

In part 2, I'll show how to stream this via an external proxy.

FFMPEG

You'll now need to compile ffmpeg on the pi
# install libx264 dev tools
sudo apt-get install build-essential libx264-dev
# Get a copy of the latest ffmpeg
git clone git://source.ffmpeg.org/ffmpeg.git
cd ffmpeg/
# Configure ffmpeg with x264 and non-free codecs
./configure --enable-gpl --enable-nonfree --enable-libx264
# Make and install
make && sudo make install

Nginx with rtmp module

Since Nginx doesn't natively support the rtmp protocol, you have to compile it in.

Here's how:
# Download and install development tools
sudo apt-get install build-essential libpcre3 libpcre3-dev libssl-dev
# Get the rtmp module
git clone https://github.com/arut/nginx-rtmp-module.git
# Get the nginx source (at the time this was 1.14.1, review your versions...)
wget http://nginx.org/download/nginx-1.14.1.tar.gz
# Extract the nginx source and go in to the directory
tar xvzf nginx-1.14.1.tar.gz && cd nginx-1.14.1
# Configure nginx with defaults + ssl + rtmp module
./configure --with-http_ssl_module --add-module=../nginx-rtmp-module --with-cc-opt=-Wno-error
# Compile, and install
make && sudo make install

Next create a directory to store the rtmp content in for streaming
sudo mkdir /webcam
sudo chown nobody: /webcam

Create an nginx configuration that uses the rtmp module in /usr/local/nginx/conf/nginx.conf:
worker_processes 4;
pid /run/nginx.pid;
error_log  logs/error.log debug;

events {
  worker_connections  512;
}

http {
  include mime.types;
  #default_type application/octet-stream;
  sendfile off;
  keepalive_timeout 65;

  server {
    listen 80;

    root /webcam/;

    location / {
      rewrite ^/webcam/(.*) /$1; # Allows http://site/ or http://site/webcam/
      index index.html;
      add_header Cache-Control no-cache;
      add_header 'Access-Control-Allow-Origin' '*';
      types {
        application/vnd.apple.mpegurl m3u8;
        video/mp2t ts;
        text/html html;
      }
    }
  }

}

rtmp {
  server {
    listen 1935;
    ping 30s;
    notify_method get;
    application video {
      live on;           # Enable live streaming
      meta copy;
      hls on;            # Enable HLS output
      hls_path /webcam; # Where to write HLS files
    }
  }
}

In summary of above:
- Users will connect to http://host/webcam/index.html and this will load a javascript library that loads the stream (next section)
- The rtmp stream will be published from ffmpeg to rtmp://host/video/streamname, and it will be served out as HLS at http://host/webcam/streamname.m3u8

HTML file to load stream

Create a /webcam/index.html file that loads a public hls streaming javascript library - this will give you the streaming video interface. Make sure to update the URL to where you are hosting your stream:
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
  <video autoplay="" controls="" id="video"></video>
  <script>
    if (Hls.isSupported()) {
      var video = document.getElementById('video');
      var hls = new Hls();
      // bind them together
      hls.attachMedia(video);
      hls.on(Hls.Events.MEDIA_ATTACHED, function () {
        console.log("video and hls.js are now bound together !");
        hls.loadSource("http://raspberrypi.local/webcam/stream.m3u8");
        hls.on(Hls.Events.MANIFEST_PARSED, function (event, data) {
          console.log("manifest loaded, found " + data.levels.length + " quality level");
        });
        video.play();
      });
    }
</script>

Script to start the stream

Next part is to create a script that will launch raspivid to do the encoding, then feed that in to ffmpeg to do the rtmp handling and point that output at nginx.

I've put this script in /home/pi/webcam.sh (remember to chmod a+x this file)
#!/bin/bash
/usr/bin/raspivid -o - -t 0 -b 1000000 -w 1280 -h 720 -g 50 | \
/usr/local/bin/ffmpeg -i - -vcodec copy -map 0:0 -strict experimental \
-f flv rtmp://127.0.0.1/video/stream

Quick summary of those commands:
- Start raspivid
- output to STDOUT (-o -)
- Do it forever (-t 0)
- Set the bitrate to 1000000 bps, close enough to 1Mbit for jazz (-b 1000000)
- Set the width and height of the video to 1280x720 (-w 1280 -h 720)
- Set the GOP length to 50 frames, which means the image is completely refreshed every 50 frames (-g 50)
- Pipe this output in to ffmpeg (| /usr/local/bin/ffmpeg)
- Tell ffmpeg to get input from STDIN (-i -)
- Set the video codec to copy, so the h.264 stream is passed through without re-encoding (-vcodec copy)
- Map stream 0 of the input to stream 0 of the output (-map 0:0)
- Enable some experimental features (-strict experimental)
- Format to flv (-f flv)
- Set the output to the nginx server with the rtmp module (rtmp://127.0.0.1/video/stream)
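
The bitrate set with -b also tells you roughly how much data the stream produces, which is handy when planning disk space or upload bandwidth. A minimal sketch of the arithmetic, assuming the stream really averages the configured bitrate and ignoring container overhead (stream_bytes is a hypothetical helper name):

```shell
# stream_bytes BITRATE_BPS SECONDS: approximate size in bytes of a stream.
# Assumes a constant bitrate; real h.264 output will vary around this figure.
stream_bytes() {
  local bitrate_bps="$1" seconds="$2"
  # bits per second * seconds, divided by 8 bits per byte
  echo $(( bitrate_bps * seconds / 8 ))
}
```

At the 1000000 bps used above, stream_bytes 1000000 3600 gives 450000000 bytes (about 450 MB) per hour, and stream_bytes 1000000 86400 gives 10800000000 bytes (about 10.8 GB) per day.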


Use Systemd to keep everything going

Create a systemd service in /etc/systemd/system/webcam.service that will start the webcam after nginx has started:
[Unit]
Description=webcam
After=nginx.service
After=systemd-user-sessions.service
After=rc-local.service
Before=getty.target


[Service]
ExecStart=/home/pi/webcam.sh
Type=simple
Restart=always
User=pi
Group=pi

[Install]
WantedBy=multi-user.target

Create an nginx systemd service to start the local install of nginx in /etc/systemd/system/nginx.service:
[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/local/nginx/sbin/nginx -t
ExecStart=/usr/local/nginx/sbin/nginx
ExecReload=/usr/local/nginx/sbin/nginx -s reload
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Start everything up

sudo systemctl daemon-reload
sudo systemctl enable nginx.service
sudo systemctl enable webcam.service
sudo systemctl start nginx.service
sudo systemctl start webcam.service

Now, connect your browser to http://raspberrypi.local/webcam/ and you should see your video stream which will look something like this:

Note that this is not live. It's a sample captured from the camera and hosted elsewhere.

Tuesday 16 May 2017

Finding deleted files and summing all numbers in a column


Here are two handy things, which can be combined in to one!

First, is there a mismatch between what du says is used on disk and what df is reporting as used? You probably have some processes holding open deleted files. The links have been removed, so du is unable to sum their size, but since they're still allocated on disk df reports them as used space.

You often see this with log files where someone has gone in to clean up files due to a disk space alert, but the process is still writing to those files. As an aside, it's better to avoid this whole situation and use >logfile.name to truncate a file rather than rm to delete it when you're unable to restart the process right away.

Point one, find the deleted files, you can do this easily with:

sudo lsof | grep deleted

Also if you know which process is holding open the files then you can use the proc filesystem and look in the process's fd directory, the links will have (deleted) after their names if they are removed. Historical tip: The flash plugin used to protect streaming videos (such as youtube) by saving the file in /tmp and then deleting the file. You could just go in to proc and retrieve it if you like, that's changed now of course with HTML5 and DRM. Anyway back on topic.

The second part of this is: now I have the list of deleted files how do I sum up all the size values to see how much disk space is used?

Here you can use awk to extract the size column, paste to remove the newlines and put '+' in their place (awk can probably do this, but this is easier for me to remember), then pipe that in to bc to get a total:

sudo lsof | grep deleted | awk '{ print $7}' | paste -sd+ - | bc

Easy. That number should be close to the mismatch between the du and df values (a file held open by several processes is listed, and so counted, once per process).
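
For what it's worth, awk can do the summing by itself. A minimal sketch, assuming the size is in column 7 as in the command above (sum_column is a hypothetical helper name):

```shell
# sum_column [N]: add up column N (default 7) of whatever arrives on STDIN.
sum_column() {
  local col="${1:-7}"
  # Skip rows where the column is not a plain number (headers, blanks);
  # %.0f prints large totals in full rather than in exponent notation
  awk -v col="${col}" '$col ~ /^[0-9]+$/ { total += $col } END { printf "%.0f\n", total }'
}
```

With this defined, the pipeline becomes sudo lsof | grep deleted | sum_column 7.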

Wednesday 10 May 2017

Tar and rename files using substring replacement

In my day to day I'm asked to tar up logs quite often, but often from hosts which have the same log file names. I've got this little snippet saved that can rename the log file paths in the tar file so we don't clobber log files when extracting at the destination end.

find /var/log/jbossas/standalone -mtime -1 -type f -print | \
   xargs tar --transform "flags=r;s|var/log/jbossas/standalone|$(hostname)|" \
   -czvf /var/tmp/logs_$(hostname)_$(date +%Y%m%d).tgz

The first part of the command is running find and looking for any files that are modified in the past 24 hours (use -daystart for the past day). We print any filenames found and pass that to xargs, which will then run tar and add the files to the output tar file. However in the middle is this transform option, it's doing a simple substring replacement of "var/log/jbossas/standalone" with "$(hostname)".

And that's it. Simple tar filename transformation.
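
To try the renaming on a throwaway directory before pointing it at real logs, the same idea can be wrapped in a function. This is a sketch assuming GNU tar (for --transform) and GNU find/xargs (for -print0 and -0). tar_renamed and its arguments are hypothetical names:

```shell
# tar_renamed SRC_DIR PREFIX OUT: archive files modified in the last 24 hours
# under SRC_DIR into OUT, storing them under PREFIX instead of the original path.
tar_renamed() {
  local src_dir="$1" prefix="$2" out="$3"
  # Match the path without its leading '/', as in the command above
  local strip="${src_dir#/}"
  find "${src_dir}" -type f -mtime -1 -print0 |
    xargs -0 tar --transform "flags=r;s|${strip}|${prefix}|" -czf "${out}"
}
```

Afterwards, tar -tzf on the output file lists the stored names, so you can confirm the prefix before sending the archive anywhere.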

This is of course completely ignoring solutions like Splunk or Graylog, but often vendors want their raw log files to look at.