R – Shell script/regex: extraction across multiple lines

extractlinesregexshell

I'm trying to write a log parsing script to extract failed events. I can pull these with grep:

$ grep -A5 "FAILED" log.txt

2008-08-19 17:50:07 [7052] [14] DEBUG:      data: 3a 46 41 49 4c 45 44 20 20 65 72 72 3a 30 32 33   :FAILED  err:023
2008-08-19 17:50:07 [7052] [14] DEBUG:      data: 20 74 65 78 74 3a 20 00                            text: .
2008-08-19 17:50:07 [7052] [14] DEBUG:    Octet string dump ends.
2008-08-19 17:50:07 [7052] [14] DEBUG: SMPP PDU dump ends.
2008-08-19 17:50:07 [7052] [14] DEBUG: SMPP[test] handle_pdu, got DLR
2008-08-19 17:50:07 [7052] [14] DEBUG: DLR[internal]: Looking for DLR smsc=test, ts=1158667543, dst=447872123456, type=2
--
2008-08-19 17:50:07 [7052] [8] DEBUG:      data: 3a 46 41 49 4c 45 44 20 20 65 72 72 3a 30 32 34   :FAILED  err:024
2008-08-19 17:50:07 [7052] [8] DEBUG:      data: 20 74 65 78 74 3a 20 00                            text: .
2008-08-19 17:50:07 [7052] [8] DEBUG:    Octet string dump ends.
2008-08-19 17:50:07 [7052] [8] DEBUG: SMPP PDU dump ends.
2008-08-19 17:50:07 [7052] [8] DEBUG: SMPP[test] handle_pdu, got DLR
2008-08-19 17:50:07 [7052] [8] DEBUG: DLR[internal]: Looking for DLR smsc=test, ts=1040097716, dst=447872987654, type=2

What I'm interested in is, for each block, the error code (i.e. the "023" part of ":FAILED err:023" on the first line) and the dst number (i.e."447872123456" from "dst=447872123456" on the last line.)

Can anyone help with a shell one-liner to extract those two values, or provide some hints as to how I should approach this?

Best Solution

grep -A 5 FAILED log.txt | \              # Get FAILED and dst and other lines
    egrep '(FAILED|dst=)' | \             # Just the FAILED/dst lines
    egrep -o "err:[0-9]*|dst=[0-9]*" | \  # Just the err: and dst= phrases
    cut -d':' -f 2 | \                    # Strip "err:" from err: lines
    cut -d '=' -f 2 | \                   # Strip "dst=" from dst= lines
    xargs -n 2                            # Combine pairs of numbers

023 447872123456
024 447872987654

As with all shell "one"-liners, there is almost certainly a more elegant way to do this. However, I find the iterative approach very successful for getting what I want: start with too much information (your grep), then narrow down the lines I want (with grep), then snip out the parts of each line that I want (with cut).

While using the linux toolbox takes more lines, you only have to know the basics of a few commands to do just about anything you want. An alternative is to use awk, python, or other scripting languages, which require more specialized programming knowledge but will take less screen space.