combining log4j log lines - linux

I am dealing with some Hive logs created with log4j on Linux. The conversion pattern used is
%d{ISO8601} %-5p [%t]: %c{2} (%F:%M(%L)) - %m%n
I realise some log records are broken across several lines, for example:
2017-02-10 10:03:29,933 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command:
create table my_table
(std_id STRING, std_number STRING)
2017-02-10 10:03:31,296 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
Is there a command in Linux which I can use to combine the broken lines to give an output like this:
2017-02-10 10:03:29,933 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: create table my_table (std_id STRING, std_number STRING)
2017-02-10 10:03:31,296 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed

This should work:
<myfile.log sed -nr \
-e '/^.{4}-.{2}-.{2} .{2}:.{2}:.{2},.{3} [A-Z]+ /{x;1!{s/\n/ /g;p};${g;p};d}' \
-e 'H' \
-e '${g;s/\n/ /g;p}'
Explanation
Let /^.{4}-.{2}-.{2} .{2}:.{2}:.{2},.{3} [A-Z]+ / be the pattern that indicates a new log record, e.g. 2017-03-17 03:20:19,372 WARN
1st -e
When the current row indicates a new log record
x: Exchange the contents of the pattern space (which holds the current row) and the hold space.
1!{s/\n/ /g;p}: If this is not the first row of the file, replace the newlines in the pattern space with spaces and print it.
${g;p}: If this is the last row of the file, get it from the hold space and print it.
d: Delete the pattern space and start a new cycle (ignoring following commands)
2nd -e
'H': append the pattern space to the hold space (only when not a new log record)
3rd -e
If this was the last row of the file ($) and not a new log record
g: Overwrite the pattern space with the content of the hold space
s/\n/ /g;p: replace the newlines with spaces and print it
Demo
$ cat>myfile.log
2017-02-10 10:03:27,374 INFO [main]: ql.Driver (Driver.java:compile(463)) - Semantic Analysis Completed
2017-02-10 10:03:29,933 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command:
create table my_table
(std_id STRING, std_number STRING)
2017-02-10 10:03:31,296 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2017-03-17 03:04:09,297 INFO [main]: ql.Driver (SessionState.java:printInfo(927)) - OK
2017-03-17 03:20:19,372 WARN [Driver]: client.SparkClientImpl (SparkClientImpl.java:run(451)) - Child process exited with code 1.
2017-03-17 03:03:55,282 ERROR [main]: ql.Driver (SessionState.java:printError(936)) - FAILED: ParseException line 1:14 cannot recognize input near 'valeus' '(' '1' in statement
org.apache.hadoop.hive.ql.parse.ParseException: line 1:14 cannot recognize input near 'valeus' '(' '1' in statement
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
$ <myfile.log sed -nr -e '/^.{4}-.{2}-.{2} .{2}:.{2}:.{2},.{3} [A-Z]+ /{x;1!{s/\n/ /g;p};${g;p};d}' -e 'H' -e '${g;s/\n/ /g;p}'
2017-02-10 10:03:27,374 INFO [main]: ql.Driver (Driver.java:compile(463)) - Semantic Analysis Completed
2017-02-10 10:03:29,933 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: create table my_table (std_id STRING, std_number STRING)
2017-02-10 10:03:31,296 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2017-03-17 03:04:09,297 INFO [main]: ql.Driver (SessionState.java:printInfo(927)) - OK
2017-03-17 03:20:19,372 WARN [Driver]: client.SparkClientImpl (SparkClientImpl.java:run(451)) - Child process exited with code 1.
2017-03-17 03:03:55,282 ERROR [main]: ql.Driver (SessionState.java:printError(936)) - FAILED: ParseException line 1:14 cannot recognize input near 'valeus' '(' '1' in statement org.apache.hadoop.hive.ql.parse.ParseException: line 1:14 cannot recognize input near 'valeus' '(' '1' in statement at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:204)
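An awk alternative (just a sketch, assuming every record begins with a YYYY-MM-DD timestamp at the start of the line, as in the sample) would be:
awk '
  /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] /{   # a new log record starts here
    if (rec != "") print rec                        # flush the previously collected record
    rec = $0
    next
  }
  { rec = rec " " $0 }                              # continuation line: append with a space
  END { if (rec != "") print rec }                  # print the last record
' myfile.log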

Related

How to get this complete line of a file into a list using Robot Framework?

I saved one output in Test1.txt like below.
Test1.txt:-
sq.service - SYSV: sudha md server daemon
Loaded: loaded (/etc/rc.d/init.d/md; bad; vendor preset: disabled)
Active: active (running) since Wed 2021-01-13 21:06:14 JST; 5 months 0 days ago
docs: man:systemd-mon-gator(8)
And I got the output like that by executing the command.
***Test Cases***
Check the Regular Expresssion
${Cmd_Output}= Get File Test1.txt
${matches}= String.Get Regexp Matches ${Cmd_Output} (?s).*(Active):\\s*(\\w+).*;\\s*(\\d)\\s*(\\w+)\\s*(\\d+)\\s*(\\w+) 1 2 3 4 5 6
log to console ${matches}
I got output like this:
Check the Regular Expresssi ..[('Active', 'active', '5', 'months', '0', 'days')]
Check the Regular Expresssi | PASS |
------------------------------------------------------------------------------
Is there a more optimized way to get this complete line "Active: active (running) since Wed 2021-01-13 21:06:14 JST; 5 months 0 days ago" into a list in Robot Framework?
You can use the following keywords from the String library of Robot Framework:
Get Line and Split To Lines
1. Closest to what you want; you can also split this output using the Split String keyword to get it into a list (see the sketch after the output below).
***Test Cases***
Check the Regular Expresssion
${Cmd_Output}= Get File ${filepath}
${Lines}= Get Line ${Cmd_Output} 2
log to console ${Lines}
Output -
==============================================================================
Check the Regular Expresssion .. Active: active (running) since Wed 2021-01-13 21:06:14 JST; 5 months 0 days ago
Check the Regular Expresssion | PASS |
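If you then want that line broken into a list of words, a small sketch combining Get Line with Split String (whitespace is the default separator; ${Line} and ${Words} are just illustrative variable names) could be:
${Line}= Get Line ${Cmd_Output} 2
${Words}= Split String ${Line}
log to console ${Words}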
2. Another way to meet your requirement.
***Test Cases***
Check the Regular Expresssion
${Cmd_Output}= Get File ${filepath}
${Lines}= Split To Lines ${Cmd_Output}
log to console ${Lines}
Output -
==============================================================================
Check the Regular Expresssion ..['sq.service - SYSV: sudha md server daemon', ' Loaded: loaded (/etc/rc.d/init.d/md; bad; vendor preset: disabled)', ' Active: active (running) since Wed 2021-01-13 21:06:14 JST; 5 months 0 days ago', ' docs: man:systemd-mon-gator(8)']
Check the Regular Expresssion | PASS |

SLURM: How to view completed jobs full name?

sacct -n returns every job name trimmed, for example "QmefdYEri+".
[Q] How could I view the complete name of the job, instead of its trimmed version?
--
$ sacct -n
1194 run.sh debug root 1 COMPLETED 0:0
1194.batch batch root 1 COMPLETED 0:0
1195 run_alper+ debug root 1 COMPLETED 0:0
1195.batch batch root 1 COMPLETED 0:0
1196 QmefdYEri+ debug root 1 COMPLETED 0:0
1196.batch batch root 1 COMPLETED 0:0
I use the scontrol command when I am interested in one particular jobid as shown below (output of the command taken from here).
$ scontrol show job 106
JobId=106 Name=slurm-job.sh
UserId=rstober(1001) GroupId=rstober(1001)
Priority=4294901717 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:07 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2013-01-26T12:55:02 EligibleTime=2013-01-26T12:55:02
StartTime=2013-01-26T12:55:02 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=defq AllocNode:Sid=atom-head1:3526
ReqNodeList=(null) ExcNodeList=(null)
NodeList=atom01
BatchHost=atom01
NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/home/rstober/slurm/local/slurm-job.sh
WorkDir=/home/rstober/slurm/local
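If all you need from that output is the full name, a one-liner sketch (based on the Name= field shown on the first line above) can pull it out:
$ scontrol show job 106 | sed -n 's/.*Name=\([^ ]*\).*/\1/p'
slurm-job.sh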
If you want to use sacct, you can modify the number of characters that are displayed for any given field as explained in the documentation:
-o, --format    Comma separated list of fields (use "--helpformat" for a list of available fields). NOTE: When using the format option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.
e.g. format=name%30 will print 30 characters of field name right justified. A %-30 will print 30 characters left justified.
Therefore, you can do something like this:
sacct --format="JobID,JobName%30,Partition,Account,AllocCPUS,State,ExitCode"
if you want the JobName column to be 30 characters wide.
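For example, to see the full name of job 1196 from the question (a sketch; the width of 60 and the -X/--allocations flag, which hides the .batch steps, are choices you may want to adjust):
$ sacct -j 1196 -X -n --format="JobID,JobName%60"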

Application Status script

I have start and stop scripts in place and need a script that reports status on a Linux server. My start script is below; can you please let me know if I can add some arguments/commands to get my application's status?
#!/bin/ksh
java_home=`cat /apps/abc.properties | grep "$1|" | cut "-d|" -f2`
service_executable=`cat /apps/abc.properties | grep "$1|" | cut "-d|" -f3`
service_home=`cat /apps/abc.properties | grep "$1|" | cut "-d|" -f4`
service_opts=`cat /apps/abc.properties | grep "$1|" | cut "-d|" -f5`
export JAVA_HOME=$java_home
export PATH=$JAVA_HOME/bin:$PATH
echo start $service_home
cd $service_home/bin
nohup $service_executable start $service_opts
abc.properties has the values below:
abc-3.7.3|/apps/java/jdk1.8.0_66|rmc|/apps/rmc/abc-3.7.3|-M-Drmc.mmc.bind.port=8770
abc-3.7.3-spii|/apps/java/jdk1.8.0_66|rmc|/apps/rmc/abc-3.7.3-spii|-M-Drmc.mmc.bind.port=8770
I want a script that can check each version of the application (JVM) using port numbers and give me the status, for example abc-3.7.3 "running" / abc-3.7.3-spii "down".
A quick version would be a script that extracts the application name and the port number from the input file (e.g. read APP + stuff + PORT) and then checks whether the port is open (e.g. grep for the port number in the netstat output).
This is a very short form that does just that:
while IFS="|=" read App _ _ _ _ Port; do
netstat -lnt | grep -qw ":$Port" && echo "$App : running" || echo "$App : down"
done < abc.properties
Is it good enough?
Thanks for your answer. I tried using "if lsof -Pi :8080 -sTCP:LISTEN -t >/dev/null ;" and it worked fine on RHEL6 (lsof 4.82), but on RHEL5 I'm seeing the error below:
lsof: unsupported TCP/TPI info selection: C
lsof: unsupported TCP/TPI info selection: P
lsof: unsupported TCP/TPI info selection: :
lsof: unsupported TCP/TPI info selection: L
lsof: unsupported TCP/TPI info selection: I
lsof: unsupported TCP/TPI info selection: S
lsof: unsupported TCP/TPI info selection: T
lsof: unsupported TCP/TPI info selection: E
lsof: unsupported TCP/TPI info selection: N
lsof 4.78
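The RHEL5 lsof (4.78) rejects the protocol:state argument to -s, which is what produces those messages. A workaround sketch on that box is to test the port the same way as the netstat loop above (the port 8080 and the application name here are just placeholders from your test):
if netstat -lnt | grep -qw ':8080'; then
    echo "abc-3.7.3 : running"    # something is listening on the port
else
    echo "abc-3.7.3 : down"
fi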

how to extract numbers in the same location from many log files

I have a file test1.log:
04/15/2016 02:22:46 PM - kneaddata.knead_data - INFO: Running kneaddata v0.5.1
04/15/2016 02:22:46 PM - kneaddata.utilities - INFO: Decompressing gzipped file ...
Input Reads: 69766650 Surviving: 55798391 (79.98%) Dropped: 13968259 (20.02%)
TrimmomaticSE: Completed successfully
04/15/2016 02:32:04 PM - kneaddata.utilities - DEBUG: Checking output file from Trimmomatic : /home/liaoming/kneaddata_v0.5.1/WGC066610D/WGC066610D_kneaddata.trimmed.fastq
04/15/2016 05:32:31 PM - kneaddata.utilities - DEBUG: 55798391 reads; of these:
55798391 (100.00%) were unpaired; of these:
55775635 (99.96%) aligned 0 times
17313 (0.03%) aligned exactly 1 time
5443 (0.01%) aligned >1 times
0.04% overall alignment rate
and the other files are in the same format but with different contents, like test2.log, test3.log, up to test60.log.
I would like to extract two numbers from each of these files. For example, for test1.log the two numbers would be 55798391 and 55775635.
So the final generated file counts.txt would be something like this:
test1 55798391 55775635
test2 51000000 40000000
.....
test60 5000000 30000000
awk to the rescue!
$ awk 'FNR==9{f=$1} FNR==10{print FILENAME,f,$1}' test{1..60}.log
If the files are not in the same directory, either call awk within a loop or create the file list and pipe it to xargs awk:
$ for i in {1..60}; do awk ... test$i/test$i.log; done
$ for i in {1..60}; do echo test$i/test$i.log; done | xargs awk ...
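If the counts do not always land on the same line numbers in every file, a pattern-based variant (just a sketch, assuming the two numbers always sit on the "reads; of these:" and "aligned 0 times" lines as in the sample) may be safer:
$ awk '/reads; of these:/ { for (i = 1; i < NF; i++) if ($(i+1) == "reads;") n = $i }
       /aligned 0 times/  { print FILENAME, n, $1 }' test{1..60}.log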

How do I cut a section with a start and an end using Bash?

When I run pactl list I get a lot of information. Out of all that output, I am trying to get only the part that starts with Sink #0, up to the end of that section.
1) The information:
Sink #0
State: SUSPENDED
Name: auto_null
Description: Dummy Output
Driver: module-null-sink.c
Sample Specification: s16le 2ch 44100Hz
Channel Map: front-left,front-right
Owner Module: 14
Mute: no
Volume: 0: 0% 1: 0%
0: -inf dB 1: -inf dB
balance 0.00
Base Volume: 100%
0.00 dB
Monitor Source: auto_null.monitor
Latency: 0 usec, configured 0 usec
Flags: DECIBEL_VOLUME LATENCY
Properties:
device.description = "Dummy Output"
device.class = "abstract"
device.icon_name = "audio-card"
Source #0
State: SUSPENDED
Name: auto_null.monitor
Description: Monitor of Dummy Output
Driver: module-null-sink.c
Sample Specification: s16le 2ch 44100Hz
Channel Map: front-left,front-right
Owner Module: 14
Mute: no
Volume: 0: 80% 1: 80%
0: -5.81 dB 1: -5.81 dB
balance 0.00
Base Volume: 100%
0.00 dB
Monitor of Sink: auto_null
Latency: 0 usec, configured 0 usec
Flags: DECIBEL_VOLUME LATENCY
Properties:
device.description = "Monitor of Dummy Output"
device.class = "monitor"
device.icon_name = "audio-input-microphone"
2) What I am trying:
#!/bin/bash
command=$(pactl list);
# just get Sink #0 section not one line
Part1=$(grep "Sink #0" $command);
for i in $Part1
do
# show only Sink #0 lines
echo $i;
done
3) The output is very strange:
grep: dB: No such file or directory
How can I get that section using my Bash script? Is there a better way to do this kind of filtering?
Follow up: I was also trying to keep it simple, such as:
pactl list | grep Volume | head -n1 | cut -d' ' -f2- | tr -d ' '
pactl list       -> command to list
grep Volume      -> target get
head -n1         -> show 1 row
cut -d' ' -f2-   -> cut empty
tr -d ' '        -> Dont know..
You can use several features of the sed editor to achieve your goal.
sed -n '/^Sink/,/^$/p' pactl_Output.txt
-n says "don't perform the standard option of printing each line of output
/^Sink/,/^$/ is a range regular expr, that says find a line that begins with Sink, then keep looking at lines until you find an empty line (/^$/).
the final char, p says Print what you have matched.
If there are spaces or tabs on the empty line, use " ...,/^$[${spaceChar}${tabChar}]*\$/p". Note the change from single quoting to dbl-quoting which will allow the variables ${spaceChar} and ${tabChar} to be expanded to their real values. You may need to escape the closing '$'. YOu'll need to define spaceChar and tabChar before you use them, like spaceChar=" " . No way here on S.O. for you to see the tabChar, but not all sed's support the \t version. It's your choice to go with pressing tab key or use \t. I would go with tab key as it is more portable.
While it is probably possible to accomplish your goal with bash, sed was designed for this sort of problem.
I hope this helps.
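Applied to your script, that could look like this (just a sketch, assuming the blocks in the pactl list output are separated by blank lines, as the sed range above expects):
#!/bin/bash
# capture only the "Sink #0" block from pactl list
Part1=$(pactl list | sed -n '/^Sink #0/,/^$/p')
echo "$Part1"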
Try:
Part1=`echo "$command" | grep "Sink #0"`
instead of
Part1=$(grep "Sink #0" $command);
