I went away thinking about using a multiline regex to capture the Java stack output from yesterday's post, and it was bugging me slightly that I didn't do that in the first place.
This morning I quickly wrote up a regex version of the stack capture, which iterates over each stack rather than each line in the file.
    import re

    # stackfiles is assumed to be defined already: the list of captured stack file names
    stackdata = []

    # Loop through all the stack files that have been captured
    for stackfile in stackfiles:
        # The filenames captured by the script are built up of these values:
        process_start_time, stack_capture_time, capture_count = stackfile.split('_')[1:]

        # Open the current stack file (not stackfiles[0]) and get the contents
        with open(stackfile, 'r') as f:
            stackdump = f.read()

        # Iterate over each stack found in the file
        # The regular expression here is finding any line beginning with ", up to two new
        # lines in a row. But also capturing the text between the quotes as a group. Both
        # * characters are non-greedy with ?.
        for stack in re.finditer(r'^"(.*?)".*?\n\n', stackdump, re.DOTALL | re.MULTILINE):
            fullstack = stack.group(0)
            threadname = stack.group(1)

            # Do the same genericising of the thread name by stripping the digits.
            # re.sub works on Python 2 and 3; translate(None, '0123456789') is Python 2 only.
            threadtype = re.sub(r'\d', '', threadname)

            # Find the thread state line, but if it doesn't exist, leave it as None
            threadstate_match = re.search(r'java\.lang\.Thread\.State: (.*)', fullstack)
            threadstate = threadstate_match.group(1) if threadstate_match else None

            stackdata.append([process_start_time, stack_capture_time, capture_count,
                              threadtype, threadname, threadstate])
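The regex is pretty simple, as explained in the code comments above. The things to be careful with are DOTALL, so that . also matches newlines; MULTILINE, so that ^ matches at the start of every line rather than only at the start of the file; and the ? after each *, which makes the * non-greedy so it only matches the shortest possible string.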
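To see what those flags and the non-greedy ? actually change, here is a small sketch run against a made-up two-thread dump. The thread names, frames and the SAMPLE_DUMP variable are invented for illustration and are not taken from a real capture.

```python
import re

# Hypothetical, shortened thread dump: two stacks, each ended by a blank line.
SAMPLE_DUMP = (
    '"worker-1" prio=10 tid=0x01 nid=0x1a runnable\n'
    '   java.lang.Thread.State: RUNNABLE\n'
    '\tat com.example.Job.run(Job.java:10)\n'
    '\n'
    '"worker-2" prio=10 tid=0x02 nid=0x1b waiting on condition\n'
    '   java.lang.Thread.State: WAITING (parking)\n'
    '\tat com.example.Queue.take(Queue.java:20)\n'
    '\n'
)

FLAGS = re.DOTALL | re.MULTILINE

# Non-greedy, as in the post: each match stops at the first blank line.
non_greedy = re.compile(r'^"(.*?)".*?\n\n', FLAGS)
non_greedy_names = [m.group(1) for m in non_greedy.finditer(SAMPLE_DUMP)]

# Greedy: the * runs to the last blank line, so everything becomes one match.
greedy = re.compile(r'^"(.*)".*\n\n', FLAGS)
greedy_matches = [m.group(0) for m in greedy.finditer(SAMPLE_DUMP)]

# Without DOTALL the . cannot cross a newline, so no stack is matched at all.
no_dotall = re.compile(r'^"(.*?)".*?\n\n', re.MULTILINE)
no_dotall_matches = no_dotall.findall(SAMPLE_DUMP)

print(non_greedy_names)
print(len(greedy_matches))
print(no_dotall_matches)
```

With the non-greedy pattern each thread comes back as its own match. The greedy variant swallows the whole sample as a single match, and dropping DOTALL finds nothing because a stack always spans several lines.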
With this version I can then scan the full stack to make some kind of guess about what each thread is for. For example, if the keyword 'oracle' appears anywhere in the stack, I would classify that thread as an Oracle thread even if it's not named as such.
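A minimal sketch of that keyword scan might look like the following. The classify_stack helper, the keywords tuple and the sample dump are all hypothetical names and data introduced here for illustration. The fallback to the digit-stripped thread name is an assumption based on the genericising step in the code above.

```python
import re

STACK_RE = re.compile(r'^"(.*?)".*?\n\n', re.DOTALL | re.MULTILINE)


def classify_stack(threadname, fullstack, keywords=('oracle',)):
    """Guess a thread type from the full stack text.

    Returns the first keyword found anywhere in the stack (case-insensitive),
    otherwise falls back to the thread name with its digits removed.
    """
    lowered = fullstack.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return re.sub(r'\d', '', threadname)


# Hypothetical dump: the first thread is not named after Oracle,
# but one of its frames is in an oracle.* package.
sample_dump = (
    '"pool-3-thread-7" prio=10 tid=0x03 nid=0x1c runnable\n'
    '   java.lang.Thread.State: RUNNABLE\n'
    '\tat oracle.jdbc.driver.OraclePreparedStatement.executeQuery(Unknown Source)\n'
    '\n'
    '"pool-3-thread-8" prio=10 tid=0x04 nid=0x1d in Object.wait()\n'
    '   java.lang.Thread.State: WAITING (on object monitor)\n'
    '\tat java.lang.Object.wait(Native Method)\n'
    '\n'
)

results = [
    (m.group(1), classify_stack(m.group(1), m.group(0)))
    for m in STACK_RE.finditer(sample_dump)
]
print(results)
```

Because the whole stack is available as one string, the check is a plain substring test. A longer keyword list would let the same helper recognise other libraries.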
At a later date I'll run both versions and collect stats to see which is faster.