Tuesday 16 May 2017

Finding deleted files and summing all numbers in a column


Here are two handy things, which can be combined in to one!

First, is there a mismatch between what du says is used on disk and what df is reporting as used? Well you probably have some processes holding open deleted files, the links will be removed so du will be unable to sum their size, but since they're still allocated on disk then df will report it as used space.

You often see this with log files where someone has gone in to clean up files due to a disk space alert, but the process is still writing to those files. As an aside, it's better avoid this whole situation and use >logfile.name to truncate a file rather than rm to delete the file on a process you're unable to restart right away.

Point one, find the deleted files, you can do this easily with:

sudo lsof | grep deleted

Also if you know which process is holding open the files then you can use the proc filesystem and look in the processes fd directory, the links will have (deleted) after their names if they are removed. Historical tip: The flash plugin used to protect streaming videos (such as youtube) by saving the file in /tmp and then deleting the file. You could just go in to proc and retrieve it if you like, that's changed now of course with HTML5 and DRM. Anyway back on topic.

The second part of this is: now I have the list of deleted files how do I sum up all the size values to see how much disk space is used?

Here you can use awk to extract column, paste to remove the newline and put '+' in it's place (awk can probably do this, but this is easier for me to remember), then pipe that in to bc to get a total:

sudo lsof | grep deleted | awk '{ print $7}' | paste -sd+ | bc

Easy. That number should be the same as the mismatch between du and df values.

No comments:

Post a Comment