grep using variable and regex - linux

I am trying to grep a log file for entries within the last 24 hours. I came up with the following command:
grep "$(date +%F\ '%k')"\|"$(date +%F --date='yesterday')\ [$(date +%k)-23]" /path/to/log/file
I know regular expressions can be used in grep, but I am not very familiar with regex. As you can see, I am grepping for anything from today, or anything from yesterday at or after the current hour. This isn't working, and I am guessing it is due to the way I am trying to pass a command as a variable in the grep regex. I also wouldn't be opposed to using awk. With awk I came up with the following, but it is not checking the variables properly:
t=$(date +%F) | y=$(date +%F --date='yesterday') | hr=$(date +%k) | awk '{ if ($1=$t || $1=$y && $2>=$hr) { print $0 }}' /path/to/log/file
I would assume systime could be used with awk rather than setting variables, but I am not familiar with systime at all. Any suggestions with either command would be greatly appreciated! Oh, and here's the log format:
2012-12-26 16:33:16 SMTP connection from [127.0.0.1]:46864 (TCP/IP connection count = 1)
2012-12-26 16:33:16 SMTP connection from (localhost) [127.0.0.1]:46864 closed by QUIT
2012-12-26 16:38:19 SMTP connection from [127.0.0.1]:48451 (TCP/IP connection count = 1)
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48451 closed by QUIT
2012-12-26 16:38:21 SMTP connection from [127.0.0.1]:48860 (TCP/IP connection count = 1)

Here's one way using GNU awk. Run like:
awk -f script.awk file
Contents of script.awk:
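BEGIN {
    # current time, in seconds since the epoch
    time = systime()
}
{
    # "2012-12-26 16:33:16" -> "2012 12 26 16 33 16", the format mktime() expects
    spec = $1 " " $2
    gsub(/[-:]/, " ", spec)
}
# print the line if its timestamp is less than 24 hours (86400 seconds) old
time - mktime(spec) < 86400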
Alternatively, here's the one-liner:
awk 'BEGIN { t = systime() } { s = $1 " " $2; gsub(/[-:]/, " ", s) } t - mktime(s) < 86400' file
Also, the correct way to pass shell vars to awk is to use the -v flag. I've made a few adjustments to your awk command to show you what I mean, but I recommend against doing this:
awk -v t="$(date +%F)" -v y="$(date +%F --date='yesterday')" -v hr="$(date +%H)" '$1==t || ($1==y && $2>=hr)' file
Explanation:
So before awk starts processing the file, the BEGIN block is processed first. In this block we create a variable called time / t and set it using the systime() function, which simply returns the current time as the number of seconds since the epoch. Then, for every line in your log file, awk creates another variable called spec / s, set to the first and second fields separated by a single space. The - and : characters also need to be globally substituted with spaces, because mktime() expects a string of the form "YYYY MM DD HH MM SS"; this is done using gsub(). Then it's just a little arithmetic to test whether the datetime in the log file is within the last 24 hours (less than 86400 seconds ago). If the test is true, the line is printed. Maybe a little extra reading would help; see Time Functions and String Manipulation Functions. HTH.
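Because these log timestamps are ISO 8601 (YYYY-MM-DD HH:MM:SS), they also sort correctly as plain strings. A minimal sketch that relies on this, instead of the GNU awk time functions, could compare each line against a cutoff string. The helper name since is hypothetical, and the commented usage line assumes GNU date for its -d option.

```shell
#!/bin/sh
# since CUTOFF FILE  (hypothetical helper)
# Print lines of FILE whose leading "YYYY-MM-DD HH:MM:SS" timestamp is at or
# after CUTOFF. ISO 8601 timestamps compare correctly as strings, so no
# systime()/mktime() is needed.
since() {
    awk -v cutoff="$1" '($1 " " $2) >= cutoff' "$2"
}

# Example (assumes GNU date; -d is not POSIX):
# since "$(date -d '24 hours ago' '+%F %T')" /path/to/log/file
```

The trade-off is that the cutoff is computed by the shell rather than inside awk, so this works with any POSIX awk but still depends on the log keeping this exact timestamp format.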

Related

BASH scripting - unable to split string from grepped output and pass it one by one to a variable

I'm a beginner to bash scripting and have been writing a script to check different log files, and I'm a bit stuck here.
clientlist=/path/to/logfile/which/consists/of/client/names
#i will grep only the client name from the file which has multiple log lines
clients=$(grep --color -i 'list of client assets:' $clientlist | cut -d":" -f1 )
echo "Clients : $clients"
#For example "Clients: Apple
# Samsung
# Nokia"
#number of clients may vary from time to time
assets=("$clients".log)
echo assets: "$assets"
The code above greps the client names from the log file, and I'm trying to use each grepped client name to construct a log file name for that client.
The number of clients is indefinite and may vary from time to time.
The code I have returns the client name as a whole
assets: Apple
Samsung
Nokia.log
and I'm a bit unsure how to split the string and pass the names on one by one to get the assets, with .log appended to each client name. How can I do this?
Apple.log
Samsung.log
Nokia.log
(Apologies if I have misunderstood the task)
Using awk
If your input file (I'll call it clients.txt) is:
Clients: Apple
Samsung
Nokia
The following awk step:
awk '{print $NF".log"}' clients.txt
outputs:
Apple.log
Samsung.log
Nokia.log
(You can pipe straight into awk and omit the file name if the piped stream looks like the file contents in the example above.)
It is highly likely that a simple awk procedure can perform the entire task, starting with the 'clientlist' you process with grep (awk has all the functionality of grep built in), but I'd need to know the structure of the original file to extract the client names.
One awk idea:
assets=( $(awk -F: '/list of client assets:/ {print $2".log"}' "${clientlist}") )
# or
mapfile -t assets < <(awk -F: '/list of client assets:/ {print $2".log"}' "${clientlist}")
Where:
-F: - define input field delimiter as :
/list of client assets:/ - for lines that contain the string list of client assets:, print the 2nd :-delimited field and append the string .log to the end
One sed idea:
assets=( $(sed 's/.*://; s/$/.log/' "${clientlist}") )
# or
mapfile -t assets < <(sed 's/.*://; s/$/.log/' "${clientlist}")
Where:
s/.*:// - strip off everything up to the :
s/$/.log/ - append .log at the end of the line
Both generate:
$ typeset -p assets
declare -a assets=([0]="Apple.log" [1]="Samsung.log" [2]="Nokia.log")
$ echo "${assets[@]}"
Apple.log Samsung.log Nokia.log
$ printf "%s\n" "${assets[@]}"
Apple.log
Samsung.log
Nokia.log
$ for i in "${!assets[@]}"; do echo "assets[$i] = ${assets[$i]}"; done
assets[0] = Apple.log
assets[1] = Samsung.log
assets[2] = Nokia.log
NOTE: the alternative answers using mapfile address the word-splitting issue raised in Charles Duffy's comment (see Bash Pitfall #50); readarray is a synonym for mapfile.
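If you already have the newline-separated $clients string from the question, a bash-only sketch can also build the array one line at a time with read. This keeps a client name containing spaces as a single element. It assumes one client name per line with no surrounding whitespace to trim; the sample value of clients below is only for illustration.

```shell
#!/usr/bin/env bash
# Example value, shaped like the $clients output in the question.
clients=$'Apple\nSamsung\nNokia'

assets=()
# Read $clients line by line; IFS= and -r keep each line exactly as-is.
while IFS= read -r client; do
    if [[ -n $client ]]; then          # skip blank lines
        assets+=("$client.log")
    fi
done <<< "$clients"

printf '%s\n' "${assets[@]}"
```

The here-string (<<<) and arrays are bash features, so this needs bash rather than plain sh.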

I need to make an awk script to parse text in a file. I am not sure if I am doing it correctly

Hi, I need to make an awk script to parse a csv file and sort it in bash.
I need to get a list of presidents from Wikipedia and sort their years in office by year.
When it is all sorted out, each era needs to be in a text file.
I'm not sure I am doing it correctly.
Here is a portion of my csv file:
28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913,4.03.1921,Democratic ,WoodrowWilson.gif,thmb_WoodrowWilson.gif,New Jersey
29,Warren G. Harding,http:..en.wikipedia.org.wiki.Warren_G._Harding,4.03.1921,2.8.1923,Republican ,WarrenGHarding.gif,thmb_WarrenGHarding.gif,Ohio
I want to include $2, which I think is the name, and sort by $4, which I think is the date the president took office.
Here is my actual awk file:
#!/usr/bin/awk -f
-F, '{
if (substr($4,length($4)-3,2) == "17")
{ print $2 > Presidents1700 }
else if (substr($4,length($4)-3,2) == "18")
{ print $2 > Presidents1800 }
else if (substr($4,length($4)-3,2) == "19")
{ print $2 > Presidents1900 }
else if (substr($4,length($4)-3,2) == "20")
{ print $2 > Presidents2000 }
}'
Here is my function running it:
SplitFile() {
printf "Task 4: Spliting file based on century\n"
awk -f $AFILE ${custFolder}/${month}/$DFILE
}
Where $AFILE is my awk file, and the directories listed on the right lead to my actual file.
Here is a portion of my output. It's actually several hundred lines long, but this is what a portion of it looks like:
awk: presidentData/10/presidents.csv:47: 46,Joseph Biden,http:..en.wikipedia.org.wiki.Joe_Biden,20.01.2021,Incumbent , Democratic , Joe_Biden.jpg,thmb_Joe_Biden.jpg,Pennsilvania
awk: presidentData/10/presidents.csv:47: ^ syntax error
I know the output is not very helpful; I would rather just post a screenshot, but I can't. I tried getting help, but these online classes can be really hard and getting help at a distance is tough. The syntax errors above seem to be pointing to commas in the csv file.
After the edits, it's clear you are trying to classify the presidents by century outputting the century in which the president served.
As stated in my comments above, you don't include single quotes or command-line arguments in an awk script file. Instead, you use the BEGIN {...} rule to set the field separator, FS = ",". Then there are several ways to split up the fourth field; split() is as easy as anything else.
That leaves you with the year in which the president took office in the third element of arr (split() numbers the elements from 1, so for 4.03.1913 arr[1] is the day, arr[2] the month and arr[3] the year). Then it's just a matter of comparing against the largest year first and working down from there, redirecting the output to the output file for that century.
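To see what split() produces for the fourth field, a quick check on the first sample row could look like the following. A single-character separator such as "." is taken literally by awk rather than as the regex "any character", so no escaping is needed here.

```shell
#!/bin/sh
# Split the took-office date "4.03.1913" (field 4) on "." and show the pieces.
row='28,Woodrow Wilson,http:..en.wikipedia.org.wiki.Woodrow_Wilson,4.03.1913,4.03.1921,Democratic ,WoodrowWilson.gif,thmb_WoodrowWilson.gif,New Jersey'
printf '%s\n' "$row" |
    awk -F, '{ n = split($4, arr, "."); print n, arr[1], arr[2], arr[3] }'
```

This should print the number of pieces followed by the day, month and year, i.e. 3 4 03 1913.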
Continuing with what you started, your awk script will look similar to:
#!/usr/bin/awk -f
BEGIN { FS = "," }
{
split ($4, arr, ".")
if (arr[3] >= 2000)
print $2 > "Presidents2000"
else if (arr[3] >= 1900)
print $2 > "Presidents1900"
else if (arr[3] >= 1800)
print $2 > "Presidents1800"
else if (arr[3] >= 1700)
print $2 > "Presidents1700"
}
Now make it executable (for convenience). Presuming the script is in the file pres.awk:
$ chmod +x pres.awk
Now simply call the awk script passing the .csv filename as the argument, e.g.
$ ./pres.awk my.csv
Now list the files named Presid* and see what is created:
$ ls -al Presid*
-rw-r--r-- 1 david david 33 Oct 8 22:28 Presidents1900
And verify the contents are what you needed:
$ cat Presidents1900
Woodrow Wilson
Warren G. Harding
Presuming that is the output you are looking for based on your attempt.
(Note: you need to quote the output file name so that, e.g., Presidents1900 isn't taken as the name of a variable that hasn't been set yet.)
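Since the four branches differ only in the century, a more compact alternative could compute the century arithmetically and build the output file name from it. This is a sketch, not a drop-in replacement: the function name split_by_century is hypothetical, and unlike the if/else chain it would also create files for years before 1700, should any occur. Parenthesising the concatenated name keeps the > redirection unambiguous across awk implementations.

```shell
#!/bin/sh
# split_by_century CSV  (hypothetical helper)
# Write each president's name (field 2) into PresidentsNNNN, where NNNN is the
# century of the year they took office (the third "."-separated part of field 4).
split_by_century() {
    awk -F, '{
        split($4, arr, ".")
        century = int(arr[3] / 100) * 100   # e.g. 1913 -> 1900
        print $2 > ("Presidents" century)   # build the file name, then redirect
    }' "$1"
}

# Usage: split_by_century my.csv
```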
Let me know if you have further questions.

Count occurrences of string in logfile in last 5 minutes in bash

I have log file containing logs like this:
[Oct 13 09:28:15] WARNING.... Today is good day...
[Oct 13 09:28:15] Info... Tommorow will be...
[Oct 13 09:28:15] WARNING.... Yesterday was...
I need shell command to count occurrences of certain string in last 5 minutes.
I have tried this:
$(awk -v d1="$(date --date="-5 min" "+%b %_d %H:%M:%S")" -v d2="$(date "+%b %_d %H:%M:%S")" '$0 > d1 && $0 < d2 || $0 ~ d2' "$1" |
grep -ci "$2")
and I call the script like this: sh ${script} /var/log/message "day", but it does not work.
Your immediate problem is that you are comparing dates in random string format. To Awk (and your computer generally) a string which starts with "Dec" is "less than" a string which starts with "Oct" (this is what date +%b produces). Generally, you would want both your log files and your programs to use dates in some standard computer-readable format, usually ISO 8601.
Unfortunately, though, sometimes you can't control that, and need to adapt your code accordingly. The solution then is to normalize the dates before comparing them.
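awk -v d1="$(date -d "-5 min" +"%F-%T")" -v d2="$(date +"%F-%T")" '
BEGIN { split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        for (i=1; i<=12; ++i) mon["[" m[i]] = i }
{ # "[Oct 13 09:28:15]" -> e.g. "2023-10-13-09:28:15": year from d1, month and day zero-padded, "]" dropped
  timestamp = sprintf("%s%02d-%02d-%s", substr(d1, 1, 5), mon[$1], $2, substr($3, 1, 8)) }
timestamp > d1 && timestamp <= d2' "$1" | grep -ci "$2"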
This will not work across New Year boundaries, but should hopefully at least help get you started in the right direction. (I suppose you could check if the year in d2 is different, and then check if the month in $1 is January, and then add 1 to the year from d1 in timestamp; but I leave this as an exercise for the desperate. This still won't work across longer periods of time, but the OP calls for a maximum period of 5 minutes, so the log can't straddle multiple years. Or if it does, you have a more fundamental problem.)
Perhaps note as well that date -d is a GNU extension which is not portable to POSIX (so this will not work e.g. on MacOS without modifications).
(Also, for production use, I would refactor the grep -ci into the Awk script; see also useless use of grep.)
Finally, the command substitution $(...) around your entire command line is wrong; this would instruct your shell to use the output from Awk and run it as a command.
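Folding the grep -ci into Awk, as suggested above, could look like the following sketch. The function name count_recent and its argument order are made up for illustration, and d1/d2 are passed in rather than computed inside so the function can be tried against fixed times. Note that index() on lowercased text matches the search string literally, whereas grep -i would treat it as a regular expression. The New Year limitation described above still applies.

```shell
#!/bin/sh
# count_recent D1 D2 STRING FILE  (hypothetical helper)
# D1 and D2 use the same "%F-%T" format as above, e.g. 2023-10-13-09:25:00.
# Counts lines of FILE timestamped after D1 and up to D2 that contain STRING,
# ignoring case. The log's year is assumed to be the same as D1's year.
count_recent() {
    awk -v d1="$1" -v d2="$2" -v pat="$3" '
    BEGIN {
        split("Jan:Feb:Mar:Apr:May:Jun:Jul:Aug:Sep:Oct:Nov:Dec", m, ":")
        for (i = 1; i <= 12; ++i) mon["[" m[i]] = i
        pat = tolower(pat)
    }
    {   # "[Oct 13 09:28:15]" -> "YYYY-10-13-09:28:15", year taken from d1
        ts = sprintf("%s%02d-%02d-%s", substr(d1, 1, 5), mon[$1], $2, substr($3, 1, 8))
    }
    ts > d1 && ts <= d2 && index(tolower($0), pat) { n++ }
    END { print n + 0 }' "$4"
}

# Example (assumes GNU date):
# count_recent "$(date -d '-5 min' +%F-%T)" "$(date +%F-%T)" day /path/to/log
```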

Can't input date variable in bash

I have a directory, /user/reports, containing many files; one of them is:
report.active_user.30092018.77325.csv
I need the number after the date, i.e. 77325, from the file name above.
I created the command below to get that value from the file name:
ls /user/reports | awk -F. '/report.active_user.30092018/ {print $(NF-1)}'
Now I want the current date to be passed into that command as a variable:
ls /user/reports | awk -F. '/report.active_user.$(date +'%d%m%Y')/ {print $(NF-1)}'
But I am not getting the required output.
I tried a bash script:
#!/usr/bin/env bash
_date=`date +%d%m%Y`
active=$(ls /user/reports | awk -F. '/report.active_user.${_date}/ {print $(NF-1)}')
echo $active
But still output is blank.
Please help with proper syntax.
As @cyrus said, the shell does not expand variables (or command substitutions) inside single quotes: everything between them is taken literally. You need double quotes, or the variable must be left outside the single quotes.
Bad use case
number=10
string='I am a sentence without the value of $number'
echo "$string"
Correct use case
number=10
string_with_number="I am a sentence with the value $number"
echo "$string_with_number"
You can use single quotes, as long as they don't enclose the variable:
number=10
string_with_number='I am a sentence with the value '"$number"
echo "$string_with_number"
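Applied to the awk command from the question, the same point suggests passing the date into awk with -v instead of trying to expand it inside the single-quoted program. A sketch, where the function name active_number and its directory argument are hypothetical, and a glob replaces ls:

```shell
#!/bin/sh
# active_number DIR  (hypothetical helper)
# Prints NNNNN from DIR/report.active_user.DDMMYYYY.NNNNN.csv for today's date.
active_number() {
    printf '%s\n' "$1"/report.active_user.* |
        awk -F. -v d="$(date +%d%m%Y)" '
            # the dots are escaped so the dynamic regex matches them literally
            $0 ~ ("report\\.active_user\\." d "\\.") { print $(NF-1) }'
}

# Usage: active=$(active_number /user/reports)
```

Because $(NF-1) counts from the end of the line, dots elsewhere in the directory path do not affect the result.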
Don't parse ls
You don't need awk for this; the shell's own capabilities are enough:
for file in /user/reports/report.active_user."$(date "+%d%m%Y")".*; do
    [ -e "$file" ] || continue # the pattern stays unexpanded if nothing matches
    tmp=${file%.*}    # remove the extension
    number=${tmp##*.} # remove the prefix up to and including the last dot
    echo "$number"
done
See https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion

BASH - Extract Data from String

I have a log that returns thousands of lines of data, I want to extract a few values from that.
In the log there is only one line containing the unique unit reference, so I can grep for it using:
grep "unit=Central-C152" logfile.txt
That produces a line of output similar to the following:
a3cd23e,85d58f5,53f534abef7e7,unit=Central-C152,locale=32325687-8595-9856-1236-12546975,11="School",1="Mr Green",2="Qual",3="SWE",8="report",5="channel",7="reset",6="velum"
The format of the line may change in that the order of the values won't always be in the same position.
I'm trying to work out how to get the values of 2 and 7 into separate variables.
I had thought about cut on , or =, but as the values aren't in a set order I couldn't work out the best way to do it.
I'm trying to get:
var state=value of 2 without quotes
var mode=value of 7 without quotes
Can anyone advise on the best way to do this?
Thanks
Could you please try the following to set the variables' values.
state=$(awk '/unit=Central-C152/ && match($0,/2=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
mode=$(awk '/unit=Central-C152/ && match($0,/7=\"[^"]*/){print substr($0,RSTART+3,RLENGTH-3)}' Input_file)
You can then print them with:
echo "$state"
echo "$mode"
Explanation of the command:
awk '                                           ##Start the awk program.
/unit=Central-C152/ && match($0,/2=\"[^"]*/){   ##If the line contains unit=Central-C152, use match() to find 2=" followed by everything up to the next ".
  print substr($0,RSTART+3,RLENGTH-3)           ##Print the match without its first 3 characters (2="), i.e. just the value.
}
' Input_file                                    ##Name of the input file.
You are probably better off doing all of the processing in Awk.
awk -F, '/unit=Central-C152/ {
for(i=1;i<=NF;++i)
if($i ~ /^[27]="/) {
b[++k] = $i
sub(/^[27]="/, "", b[k])
sub(/"$/, "", b[k])
gsub(/\\/, "", b[k])
}
print "state " b[1] ", mode " b[2]
}' logfile.txt
This presupposes that the fields always occur in the same order (2 before 7). You may need to change or disable the gsub() that removes backslashes from the values.
If you want to do more than print the values, refactoring whatever Bash code you have into Awk is often a better approach than doing this processing in Bash.
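Because the question says the order of the values may change, an order-independent sketch could look up each KEY=" field by name instead. The helper name field is hypothetical, the unit=Central-C152 filter is taken from the question, and it assumes no value contains a comma (the line is split on ,).

```shell
#!/bin/sh
# field KEY FILE  (hypothetical helper)
# Print the unquoted value of KEY="..." on the unit=Central-C152 line of FILE.
field() {
    awk -F, -v key="$1" '/unit=Central-C152/ {
        for (i = 1; i <= NF; ++i)
            if (index($i, key "=\"") == 1) {    # field starts with KEY="
                v = substr($i, length(key) + 3) # drop KEY="
                sub(/"$/, "", v)                # drop the closing quote
                print v
            }
    }' "$2"
}

# Usage:
# state=$(field 2 logfile.txt)
# mode=$(field 7 logfile.txt)
```

Requiring the match at position 1 of the field means a key like 1 does not accidentally match 11="School".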
Assuming you already have the line in a variable such as with:
line="$(grep 'unit=Central-C152' logfile.txt | head -1)"
You can then simply use bash's built-in parameter expansion features:
f2=${line#*2=\"} ; f2=${f2%%\"*} ; echo ${f2}
f7=${line#*7=\"} ; f7=${f7%%\"*} ; echo ${f7}
The first command on each line strips off the start of the line up to and including <field-number>=". The second command then strips off everything from the first remaining quote onward. The third, of course, simply echoes the value.
When I run those commands against your input line, I see:
Qual
reset
which is, from what I can see, what you were after.

Resources