Wednesday, 1 March 2017

Process log files inside a gzipped tar file with Python

So I have a task: collect the information from a set of lines in JBoss server.log files.

My first step is to collect the log files from the server. It's a production server, and I don't want to process log files there because I might inadvertently impact the running application. So I'll tar them up and scp them off the host:

tar cfz /var/tmp/server.log.tgz /apps/jbooss/aslogs/server.log.201[67]*
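If Python is available on the server, roughly the same archive can be built with the tarfile module itself. This is a sketch, and `tar_logs` is a hypothetical helper name. It also shows a detail worth knowing before reading the archive back: like GNU tar, tarfile strips the leading '/' from stored names, so members are recorded as `apps/jbooss/aslogs/server.log...` with the directory path included, not as bare file names.

```python
import glob
import tarfile

def tar_logs(output_path, pattern):
    """Write every file matching the glob `pattern` into a gzipped tar at `output_path`."""
    # 'w:gz' is the tarfile counterpart of `tar cfz`
    with tarfile.open(output_path, 'w:gz') as tar:
        # glob understands the same [67] character class as the shell does
        for path in sorted(glob.glob(pattern)):
            # tar.add() stores the name without its leading '/', as GNU tar does
            tar.add(path)

# Hypothetical invocation mirroring the shell command above:
# tar_logs('/var/tmp/server.log.tgz', '/apps/jbooss/aslogs/server.log.201[67]*')
```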

Now I have a tgz file full of the log files I want to process on my local machine. There are several ways to attack this. I could just untar the files and process them individually, but that would use up disk space I might forget to clean up, and I'd rather not untar the file if I don't need to. Processing time isn't a concern for me.

My solution is to use Python's tarfile module and write a generator function that yields each log file line I want, so I can process the lines one at a time.

Here's what I came up with:

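import os
import tarfile

def tar_read_log_lines(input_tar, logfile_name):
    """Yield [member name, line] for every line of each matching file in input_tar."""
    # 'r:*' lets tarfile detect the compression (gzip, bz2, ...) automatically
    with tarfile.open(input_tar, 'r:*') as tar:
        for member in tar.getmembers():
            # Only regular files can be read (extractfile() returns None for directories),
            # and tar stores the directory path, so match on the base name
            if member.isfile() and os.path.basename(member.name).startswith(logfile_name):
                memberfile = tar.extractfile(member)
                for line in memberfile:
                    # Lines come back as bytes; decode them so they can be handled as text
                    yield [member.name, line.decode('utf-8', 'replace')]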


This function takes the path to a compressed tar file and a file name prefix to match, and yields each line of every matching file as a list of [file name, line].
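Because the function is a generator, picking out "the lines I want" can be another small generator chained onto it, and still nothing is extracted to disk. A minimal sketch: `matching_lines` is a hypothetical helper, and the `\bERROR\b` pattern in the usage note is only an illustrative assumption about what the interesting lines contain. It assumes the pattern and the lines have the same type: a `str` pattern for decoded text, or a bytes pattern such as `rb'ERROR'` if the lines are still raw bytes.

```python
import re

def matching_lines(linedata, pattern):
    """Yield only the [filename, line] pairs whose line matches the regex `pattern`."""
    regex = re.compile(pattern)
    for filename, line in linedata:
        # search() finds the pattern anywhere in the line
        if regex.search(line):
            yield [filename, line]
```

Usage would look like `for filename, line in matching_lines(tar_read_log_lines('server.log.tgz', 'server.log'), r'\bERROR\b'): ...`, with each stage pulling one line at a time from the stage before it.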

Here's how we make use of that in a simple line count:

>>> from collections import defaultdict
>>> linecounts = defaultdict(int)
>>> for filename, line in tar_read_log_lines('server.log.tgz', 'server.log'):
...     linecounts[filename] += 1
...
>>> for name in linecounts:
...     print(name, linecounts[name])
...
server.log.2017-02-19 72045
server.log.2017-02-18 86586
server.log.2017-01-21 20864
server.log.2017-01-20 30641
--------8<-snip-----------
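The same tally can also be written with `collections.Counter`, a dict subclass made for counting. A small sketch, where `count_lines_per_file` is a hypothetical name; sorting the items prints the files in name (and therefore date) order rather than in the order they were first seen.

```python
from collections import Counter

def count_lines_per_file(linedata):
    """Return a Counter mapping each file name to its number of lines."""
    return Counter(filename for filename, _line in linedata)

# Stand-in data; in practice pass tar_read_log_lines('server.log.tgz', 'server.log')
counts = count_lines_per_file([['a', 'x\n'], ['b', 'y\n'], ['a', 'z\n']])
for filename, count in sorted(counts.items()):
    print(filename, count)
```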

This works well. It's slow to start, but not slow enough to make me investigate other methods. Now I can take the lines I want and process them without having to untar the file.
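The slow start is likely due to `getmembers()`: it reads through the whole compressed archive to build the member list before the first line is yielded. If that ever matters, one option is tarfile's stream mode, `'r|*'`, which reads members one at a time in archive order. A sketch under that assumption; `tar_stream_log_lines` is a hypothetical name, and stream mode only works forward, so each member must be read before moving to the next (which this loop does).

```python
import os
import tarfile

def tar_stream_log_lines(input_tar, logfile_name):
    """Like tar_read_log_lines, but streams the archive instead of listing it first."""
    # 'r|*' opens the archive as a forward-only stream with transparent compression
    with tarfile.open(input_tar, 'r|*') as tar:
        for member in tar:  # headers are read lazily as the loop advances
            if member.isfile() and os.path.basename(member.name).startswith(logfile_name):
                memberfile = tar.extractfile(member)
                for line in memberfile:
                    yield [member.name, line.decode('utf-8', 'replace')]
```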
