Linux grep and sort log files

I looked almost everywhere (there, there, there, there and there) with no luck.
What I have here is a bunch of log files in a directory, where I need to look for a specific ID (myID) and sort the output by date. Here is an example:
in file1.log:
2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
in file2.log:
2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
in file3.log:
2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
Expected output:
2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
What I am doing now (and it works pretty well), is :
grep -hri --color=always "myID" | sort -n
The only problem is that with the -h option of grep, the file names are hidden. I'd like to keep the file names AND keep the sorting.
I tried :
grep -ri --color=always "myID" | sort -n -t ":" -k1,1 -k2,2
But it doesn't work. Basically, the grep command outputs the name of the file followed by ":"; I'd like to sort the results starting from that character.
Thanks a lot

Try this:
grep --color=always "myID" file*.log | sort -t : -k2,2 -k3,3n -k4,4n
Output:
file3.log:2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
file1.log:2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
file2.log:2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}
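With -t :, field 1 is the file name, field 2 is the date plus the hour, field 3 the minutes, and field 4 the seconds, which is what the three keys sort on. You can check the splitting without any log files by simulating the grep output with printf (a quick sketch using the sample lines from the question):
printf '%s\n' \
  'file1.log:2015-09-26 15:39:50,788 - DEBUG - blabla' \
  'file3.log:2015-09-26 15:39:48,788 - ERROR - bar' \
  'file2.log:2015-09-26 15:39:51,788 - ERROR - foo' \
  | sort -t : -k2,2 -k3,3n -k4,4n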

Another solution, a little bit longer but I think it should work:
grep -l "myID" file* > /tmp/file_names && grep -hri "myID" file* | sort -n > /tmp/grep_result && paste /tmp/file_names /tmp/grep_result | column -s $'\t' -t
What it does, basically: first, store the file names:
grep -l "myID" file* > /tmp/file_names
Store grep sorted results:
grep -hri "myID" file* | sort -n > /tmp/grep_result
Paste the results column-wise (using a tab separator):
paste /tmp/file_names /tmp/grep_result | column -s $'\t' -t

Field numbering for sort is 1-based, so -k1 will be your filename part. That means that in your attempt you are sorting by filename, then by the date and hour of your log line. Also, -n means numeric ordering, which won't play nicely with the yyyy-mm-dd hh:mm:ss format (it reads yyyy-mm-dd hh as just the first number, i.e. the year).
You can use:
sort -t ":" -k2
Note that I specified column 2 as the start, and left the end blank. The end defaults to the end-of-line.
If you want to sort specific columns, you need to explicitly set the start and end, for example: -k2,2. You can use this to sort out-of-sequence columns, for example -k4,4 -k2,2 will sort by column 4 and use column 2 for tie-breaking.
You could also use -k2,4, which would stop sorting at the colon just before your log details (i.e. it would use 2015-09-26 15:39:48,788 - ERROR - bar)
Finally, perhaps you want to have your log files in a consistent order if the time is the same:
sort -t ":" -k2,4 -k1,1

Try the Rust-based tool Super Speedy Syslog Searcher
(assuming you have Rust installed)
cargo install super_speedy_syslog_searcher
then
s4 file1.log file2.log file3.log | grep "myID"
The only problem is that with the -h option of grep, the file names are hidden. I'd like to keep the file names AND keep the sorting.
You could try
$ s4 --color=never -nw file1.log file2.log file3.log | grep "myID"
file3.log:2015-09-26 15:39:48,788 - ERROR - bar : {'id' : myID}
file1.log:2015-09-26 15:39:50,788 - DEBUG - blabla : {'id' : myID}
file2.log:2015-09-26 15:39:51,788 - ERROR - foo : {'id' : myID}


How to format gcloud compute instances list output to excel format

Tried various approaches, but the nearest to working one:
Replace multiple spaces with a single one
Replace commas (,) in the INTERNAL_IP column with a pipe (|)
Remove the 4th column (PREEMPTIBLE), as it was causing IPs in the INTERNAL_IP column to shift under it
Replace spaces with commas (,) to prepare a CSV file
But it did not work; it gets messed up at the PREEMPTIBLE column.
gcloud compute instances list > file1
tr -s " " < file1 > file2    # replace multiple spaces with a single one
sed 's/,/|/g' file2 > file3  # replace , with a pipe
awk '{$4=""; print $0}' file3    # remove the 4th column
sed -e 's/\s\+/,/g' file3 > final.csv
Output of gcloud compute instances list command:
Expected format:
Any help or suggestion is appreciated. Thank you in advance.
Edit:
Attached sample input and expected output files:
sample_input.txt
expected_output.xlsx
CSV format is supported by the gcloud CLI, so everything you are doing can be done without sed/awk (perhaps with | tail -n +2 if you want to skip the column header):
gcloud compute instances list --format="csv(NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS)" > final.csv
Or if you wanted to do something with the data in your bash script:
while IFS="," read -r NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
do
echo "NAME=$NAME ZONE=$ZONE MACHINE_TYPE=$MACHINE_TYPE PREEMPTIBLE=$PREEMPTIBLE INTERNAL_IP=$INTERNAL_IP EXTERNAL_IP=$EXTERNAL_IP STATUS=$STATUS"
done < <(gcloud compute instances list --format="csv(NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS)" | tail -n +2)
Based on the attached sample input & expected output files I have made the following changes:
Some of the instances have multiple internal IPs, and they are separated by ",". I have replaced that "," with "-" using sed 's/,/-/g' to avoid conflicts with other fields, as we are generating a CSV.
Displaying $4 & $5 in the 5th & 7th columns so that they will be aligned with the column headers INTERNAL_IP and STATUS.
grep -v 'NAME' command_output.txt | sed 's/,/-/g' | awk ' BEGIN {print "NAME,ZONE,MACHINE_TYPE,PREEMPTIBLE,INTERNAL_IP,EXTERNAL_IP,STATUS"} {print $1","$2","$3","" "","$4","" "","$5}'
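As a hedged illustration of what that pipeline does to one row (the real rows are in the attached sample_input.txt; this input line is made up):
echo 'instance-1 us-central1-a n1-standard-1 10.128.0.2,10.128.0.3 RUNNING' | sed 's/,/-/g' | awk '{print $1","$2","$3","" "","$4","" "","$5}'
instance-1,us-central1-a,n1-standard-1, ,10.128.0.2-10.128.0.3, ,RUNNING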

Counting total occurrences of each 'version' across multiple files

I have a number of files in a directory on Linux, each of which contains a version line in the format: #version x (where x is the version number).
I'm trying to find a way to count the number of times each different version appears across all the files, and output something like:
#version 1: 12
#version 2: 36
#version 3: 2
I don't know all the potential versions that might exist, so I'm really trying to match lines that contain #version.
I've tried things like grep -c, but that only gives the total of all lines containing #version; I can't find a nice way to split on the different version numbers.
One possibility, piping multiple commands:
strings * | grep '#version \w' | sort | uniq --count | awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}'
Operations breakdown:
strings *: Extract text strings from all files (*) in the current directory.
| grep '#version \w': Pipe the strings into grep to find all occurrences of #version followed by a word character.
| sort: Pipe the matching strings into sort, since uniq needs sorted input.
| uniq --count: Pipe the sorted #version lines into uniq to output the count of each distinct #version... string.
| awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}': Pipe the unique counts into awk to reformat the output as: #version ...: count.
Testing the process:
cd /tmp
mkdir testing 2>/dev/null || true
cd testing
# Create 10 testfile#.txt with random #version 1 to 4
for i in {1..10}; do
echo "#version $(($RANDOM%4+1))" >"testfile${i}.txt"
done
# Now get the counts per version
strings * \
| grep '#version \w' \
| sort \
| uniq --count \
| awk '{printf("%s: %s\n", substr($0, index($0, $2)), $1)}'
Example of test output:
#version 1: 4
#version 2: 2
#version 3: 1
#version 4: 3
Something like this may do the trick:
grep -h '#version' * | sort | uniq -c | awk '{print $2,$3": found "$1}'
example files:
filename:filecontent
file1:#version 1
file1.1:#version 1
file111:#version 1
file2:#version 2
file3:#version 3
file4:#version 4
file44:#version 4
Output:
#version 1: found 3
#version 2: found 1
#version 3: found 1
#version 4: found 2
grep -h '#version' * gets all matching lines from all files; sort orders the results for uniq -c, which counts the duplicates; then awk rearranges the output into the desired format.
Note: grep might have a slightly different separator than : on your OS.
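If you'd rather skip the sort | uniq round-trip, a single awk pass can do the counting (a sketch; any POSIX awk should work, and the trailing sort is only there because awk's in-memory array order is unspecified):
awk '/#version/ { n[$0]++ } END { for (v in n) print v ": " n[v] }' * | sort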

GREP or AWK complex pattern match (linux)

So I have the text output below from a 'mediainfo VIDEO.mkv':
General
Unique ID : 190778803810831492312123193779943 (0x8F265C1B107A4D595F723237C370C7074FB7)
Complete name : VIDEO.mkv
Format : Matroska
Format version : Version 4 / Version 2
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main#L3#Main
Codec ID : V_MPEGH/ISO/HEVC
I need to GREP or AWK out the Format : HEVC below Video. I wasn't sure how to proceed, as I could regex 'Format' but then I get back multiple rows (Matroska and HEVC). I haven't found any handy hints.
Ideas?
If "Matroska" is fixed you can do it by mediinfo VIDEO.mkv | grep "Format " test.fi | grep -v "Matroska"
If output format is fixed then you do it by mediinfo VIDEO.mkv | grep "Format " test.fi | tail -n1
grep -v will ignore matching line, tail will print specified number o lines from the last.
mediainfo VIDEO.mkv | awk -v RS= '/^Video/{print $7}'
HEVC
You can use awk with RS set to blank (paragraph mode), match the Video block, and print the desired field number.
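If counting on field position 7 feels fragile, a slightly more defensive variant (a sketch, assuming the layout shown above) treats each line of the Video paragraph as a field and picks the Format line by name:
mediainfo VIDEO.mkv | awk -v RS= -F '\n' '/^Video/ { for (i = 1; i <= NF; i++) if ($i ~ /^Format +:/) { sub(/^Format +: */, "", $i); print $i } }'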
Obviously many ways to solve this, but sed seems like a natural fit here:
$ sed -n '/Video/,$ { s/Format *: //p }' file
HEVC
The -n suppresses default printing, the /Video/,$ address range limits the substitution to lines from the Video section onward, and the p flag prints only the lines where the substitution succeeded.

generate report based on log file using shell script

I have to create a report based on the Nagios log file. I am going to write a shell script for this.
The log file is as follows :
[1420520400] CURRENT SERVICE STATE: abc.com;service;CRITICAL;HARD;3;OK : OK : Last on 10-01-2015, Users = 2, Employees = 0
[1420520400] CURRENT SERVICE STATE: def.com;service;CRITICAL;HARD;3;WARNING : Last on 10-01-2015, Users = 2, Employees = 0
[1420520400] CURRENT SERVICE STATE: ghi.com;service;CRITICAL;HARD;3;CRITICAL : Last on 2014-11-19, Users = 2, Employees = 0
From this file, I want to generate the report as follows :
Name : abc.com
Date : 10-01-2015
Users : 2
Employees : 0
Name : def.com
Date : 10-01-2015
Users : 2
Employees : 0
Name : ghi.com
Date : 2014-11-19
Users : 2
Employees : 0
It would be great if anyone could help me achieve this.
This command will give you the above output from the log file; just change input.log to the actual file name.
$ cat input.log |cut -d';' -f1,6|sed -e 's/\<CURRENT SERVICE STATE\>/NAME=/g'|sed -e 's/\<OK\>//g'|sed -e 's/\<Last on\>/Date =/g'|tr -d ':'|sed 's/WARNING//g'|sed 's/CRITICAL//g'|cut -c 14-|tr -s ' '|tr ',;' '\n'
Output:
Here I used '=', but you can make the output exactly match the above if you use the following command:
$ cut -d';' -f1,6 input.log|sed -e 's/\<CURRENT SERVICE STATE\>/NAME=/g'|sed -e 's/\<OK\>//g'|sed -e 's/\<Last on\>/Date =/g'|tr -d ':'|sed 's/WARNING//g'|sed 's/CRITICAL//g'|cut -c 14-|tr -s ' '|tr ',;' '\n' |tr '=' ':'
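A more structured alternative (a sketch, assuming every line carries the six ';'-separated fields shown above) is a single awk program that splits the last field on commas:
awk -F';' '{
    name = $1; sub(/.*CURRENT SERVICE STATE: /, "", name)    # host name after the state marker
    split($6, f, ",")    # "... Last on DATE, Users = N, Employees = M"
    date = f[1]; sub(/.*Last on /, "", date)
    users = f[2]; sub(/.*= /, "", users)
    emps = f[3]; sub(/.*= /, "", emps)
    printf "Name : %s\nDate : %s\nUsers : %s\nEmployees : %s\n\n", name, date, users, emps
}' input.log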

Retrieving information from a text file. Linux

Basically I am trying to read information from three text files in which it contains unique information.
The way the text file is setup is this:
textA.txt
----------------
something.awesome.com
something2.awesome.com
something3.awesome.com
...
textB.txt
----------------
123
456
789
...
textC.txt
----------------
12.345.678.909
87.65.432.1
102.254.326.12
....
Now, what it's supposed to look like when I output it is something like this:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
The code I am trying now is this:
for each in `cat site.txt` ; do
site=`echo $each | cut -f1`
for line in `cat port.txt` ; do
port=`echo $line | cut -f1`
for this in `cat ip.txt` ; do
connect=`echo $this | cut -f1`
echo "$site : $port : $connect"
done
done
done
The result I am getting is just crazy wrong and just not what I want. I don't know how to fix this.
I want to be able to call the information through variable form.
paste textA.txt textB.txt textC.txt | sed -e 's/\t/ : /g'
Output is:
something.awesome.com : 123 : 12.345.678.909
something2.awesome.com : 456 : 87.65.432.1
something3.awesome.com : 789 : 102.254.326.12
Edit: Here is a solution using pure bash:
#!/bin/bash
# open the three input files on dedicated file descriptors
exec 7<textA.txt
exec 8<textB.txt
exec 9<textC.txt
while true
do
    # read one line from each file per iteration
    read site <&7
    read port <&8
    read connect <&9
    [ -z "$site" ] && break    # stop when the first file runs out
    echo "$site : $port : $connect"
done
# close the descriptors
exec 7>&-
exec 8>&-
exec 9>&-
Have you looked at using paste?
e.g.
$ paste textA.txt textB.txt
etc. The -d option specifies the separator character.
A related utility is the SQL-like join, which you can use in scenarios where you have to join using fields common to your input files.
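For instance, a hedged sketch with hypothetical inputs that share a key in the first column (join expects both files sorted on that field):
# hosts.txt contains "a.example 123"; ips.txt contains "a.example 10.0.0.1"
join hosts.txt ips.txt
# -> a.example 123 10.0.0.1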
head -2 /etc/hosts | tail -1 | awk '{print$2}'
where /etc/hosts is the name of a file.
(head -2) is used to retrieve the top 2 lines from the file.
(tail -1) is used to retrieve only the last line output by (head -2).
(awk '{print$2}') is used to print the 2nd column of the line output by (tail -1).
