I have a line in a bash script that calculates the sum of unique IP requests to a certain page.
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END { print " ", sum, "total"}'
I am trying to get the value of sum to a variable outside the awk statement so I can compare pages to each other. So far I have tried various combinations of something like this:
unique_sum=0
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print ; $unique_sum=sum} END { print " ", sum, "total"}'
echo "${unique_sum}"
This results in an echo of "0". I've tried placing $unique_sum=sum in the END block, various combinations of initializing the variable (awk -v unique_sum=0 ...) and placing the variable assignment outside of the quoted sections.
So far, my Google-fu is failing horribly as most people just send the whole of the output to a variable. In this example, many lines are printed (one for each IP) in addition to the total. Failing a way to capture the 'sum' variable, is there a way to capture that last line of output?
This is probably one of the most sophisticated things I've tried in awk so my confidence that I've done anything useful is pretty low. Any help will be greatly appreciated!
You can't assign a shell variable inside an awk program. In general, no child process can alter the environment of its parent. You have to have the awk program print the calculated value, and then the shell can capture that value and assign it to a variable:
output=$( grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}' | sort | uniq -c | awk '{sum += 1; print } END {print sum}' )
unique_sum=$( sed -n '$p' <<< "$output" ) # grab the last line of the output
sed '$d' <<< "$output" # print the output except for the last line
echo " $unique_sum total"
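To make the point about child processes concrete, here is a minimal sketch. The three sample lines are made up and only stand in for the uniq -c output; the awk programs are simplified to the counting part.

```shell
# Hypothetical sample data standing in for the "uniq -c" output.
counts=$(printf '%s\n' '3 10.0.0.1' '1 10.0.0.2' '2 10.0.0.3')

unique_sum=0
# This only changes awk's own variable named unique_sum; the shell never sees it.
printf '%s\n' "$counts" | awk '{ unique_sum += 1 }'
echo "after awk: $unique_sum"

# Printing the value and capturing it with $( ) is what actually works.
unique_sum=$(printf '%s\n' "$counts" | awk 'END { print NR }')
echo "captured: $unique_sum"
```

The first echo still reports 0, the second reports the number of lines awk counted.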
That pipeline can be simplified quite a lot: awk can do what grep can do, so first
grep $YESTERDAY $ACCESSLOG | grep "$1" | awk -F" - " '{print $1}'
is (longer, but only one process)
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '$0 ~ date && $0 ~ patt {print $1}' "$ACCESSLOG"
And the last awk program just counts the lines, so it can be replaced with wc -l
All together:
unique_output=$(
awk -F" - " -v date="$YESTERDAY" -v patt="$1" '
$0 ~ date && $0 ~ patt {print $1}
' "$ACCESSLOG" | sort | uniq -c
)
echo "$unique_output"
unique_sum=$( wc -l <<< "$unique_output" )
echo " $unique_sum total"
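If you want to try this without a real log, here is a self-contained sketch. The log lines, the date string and the page name are all invented for illustration, and I use printf with a pipe instead of <<< so it also runs in plain sh.

```shell
# Hypothetical access log; format and values are assumptions for the demo.
ACCESSLOG=$(mktemp)
cat > "$ACCESSLOG" <<'EOF'
10.0.0.1 - - [01/Jan/2024:10:00:00] "GET /index.html"
10.0.0.2 - - [01/Jan/2024:10:05:00] "GET /index.html"
10.0.0.1 - - [01/Jan/2024:11:00:00] "GET /index.html"
10.0.0.3 - - [01/Jan/2024:12:00:00] "GET /about.html"
10.0.0.4 - - [02/Jan/2024:09:00:00] "GET /index.html"
EOF
YESTERDAY='01/Jan/2024'
page='/index.html'      # stands in for "$1" in the script

# One awk process filters on date and page, then sort | uniq -c counts per IP.
unique_output=$(
  awk -F" - " -v date="$YESTERDAY" -v patt="$page" '
    $0 ~ date && $0 ~ patt { print $1 }
  ' "$ACCESSLOG" | sort | uniq -c
)
# tr strips the padding that some wc implementations add.
unique_sum=$(printf '%s\n' "$unique_output" | wc -l | tr -d ' ')
rm -f "$ACCESSLOG"

echo "$unique_output"
echo " $unique_sum total"
```

One caveat: if nothing matches, unique_output is empty but wc -l still reports 1, because a single empty line is fed to it. Check for an empty string first if that case matters to you.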
Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 is my string and the result I want is vm-1.0.3
What is the best way to do this
Below is what I tried
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | awk -F - {'print $1,$2'}
vm 1.0.3
I also tried
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | awk -F - {'print $1"-",$2'}
vm- 1.0.3
Here I do not need space in between
I tried using cut and I got the expected result
$ echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F _ {'print $2'} | cut -c 1-8
vm-1.0.3
What is the best way to do the same?
Making assumptions from the one example you provided about the general form of your input, so that form is handled robustly, using any sed:
$ echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' |
sed 's/^[^-]*-[^-]*-[^_]*_\(.*\)-[^-]*$/\1/'
vm-1.0.3
or any awk:
$ echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' |
awk 'sub(/^[^-]+-[^-]+-[^_]+_/,"") && sub(/-[^-]+$/,"")'
vm-1.0.3
You don't need two calls to awk, but here is your own approach with the single quotes placed outside the curly braces and the hyphen printed explicitly:
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 |
awk -F_ '{print $2}' | awk -F- '{print $1 "-" $2}'
If your string has the same format, let the field separator be either - or _
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F"[-_]" '{print $4 "-" $5}'
Or split the second field on - and print the first 2 parts
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 | awk -F_ '{
split($2,a,"-")
print a[1] "-" a[2]
}'
Or with gnu-awk a bit more specific match with a capture group:
echo Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2 |
awk 'match($0, /^Apps-[^_]*_(vm-[0-9]+\.[0-9]+\.[0-9]+)/, a) {print a[1]}'
Output
vm-1.0.3
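The three-argument match() is a gawk extension. A rough portable equivalent uses the two-argument match() together with RSTART and RLENGTH. This is only a sketch and assumes the part you want always follows an underscore and looks like vm-N.N.N; the variable names are mine.

```shell
name='Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2'

# match() sets RSTART/RLENGTH for "_vm-1.0.3"; skip the leading underscore.
version=$(printf '%s\n' "$name" |
  awk 'match($0, /_vm-[0-9]+\.[0-9]+\.[0-9]+/) {
         print substr($0, RSTART + 1, RLENGTH - 1)
       }')
echo "$version"
```

Lines that do not match print nothing, so an empty result tells you the name had a different shape.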
This is the easiest I can think of:
echo "Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2" | cut -c 25-32
Obviously you need to be sure about the position of your characters. On top of that, you seem to have two separators, '_' and '-', and both characters are also part of the name of your entry.
echo 'Apps-10.00.00R000-B1111_vm-1.0.3-x86_64.qcow2' | sed -E 's/^.*_vm-([0-9]+)\.([0-9]+)\.([0-9]+)-.*/vm-\1.\2.\3/'
I have a linux script for selecting the node.
For example:
4
40*r13n15:40*r10n61:40*r11n18:40*r09n15
The correct result should be:
r13n15
r10n61
r11n18
r09n15
My linux script content is like:
hostNum=`bjobs -X -o "nexec_host" $1 | grep -v NEXEC`
hostSer=`bjobs -X -o "exec_host" $1 | grep -v EXEC`
echo $hostNum
echo $hostSer
for i in `seq 1 $hostNum`
do
echo $hostSer | awk -F ':' '{print '$i'}' | awk -F '*' '{print $2}'
done
Unfortunately, I get no node information at all.
I have tried:
echo $hostSer | awk -F ':' '{print "'$i'"}' | awk -F '*' '{print $2}'
and
echo $hostSer | awk -F ':' '{print '"$i"'}' | awk -F '*' '{print $2}'
But these are wrong too. Can anyone help?
One more awk:
$ echo "$variable" | awk 'NR%2==0' RS='[*:\n]'
r13n15
r10n61
r11n18
r09n15
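By setting the record separator (RS) to the regex [*:\n], the string is broken into individual tokens, after which you can just print every second record (NR%2==0). Note that treating a multi-character RS as a regex is an extension supported by gawk and mawk rather than something POSIX guarantees.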
You can use multiple separators in awk. Please try below:
h='40*r13n15:40*r10n61:40*r11n18:40*r09n15'
echo "$h"| awk -F '[:*]' '{ for (i=2;i<=NF;i+=2) print $i }'
Edited to make it generic, based on the comment from RavinderSingh13.
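As for why the original loop printed nothing useful: '{print '$i'}' expands to {print 1}, {print 2} and so on, which prints the number itself rather than field number i. If you want to keep the loop, one way is to pass the shell variable in with -v and use $i inside awk. A sketch with the sample values hard-coded in place of the bjobs output:

```shell
# Sample values from the question, hard-coded instead of calling bjobs.
hostNum=4
hostSer='40*r13n15:40*r10n61:40*r11n18:40*r09n15'

nodes=$(
  for i in $(seq 1 "$hostNum"); do
    # -v i=... makes the loop counter an awk variable, so $i is field number i.
    printf '%s\n' "$hostSer" |
      awk -F: -v i="$i" '{ split($i, part, "*"); print part[2] }'
  done
)
echo "$nodes"
```

This still starts one awk per node, so the single-awk answers are the better choice for long host lists.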
Using bash, I want to print a number followed by sizes of 2 paths on one line. i.e. output of 3 commands on one line.
All the 3 items should be separated by ":"
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
I am getting the output as -
10001:61M
:101M
But I want the output as -
10001:61M:101M
This should work for you. The two key elements added are
tr -d '\n'
which strips the newline characters from the end of the output, and the echo ":" that supplies the extra colon for formatting.
Hope this helps! Here's a link to the docs for the tr command.
https://ss64.com/bash/tr.html
echo -n "10001:"; du -sch /abc/def/* | grep 'total' | awk '{ print $1 }' | tr -d '\n'; echo ":" | tr -d '\n'; du -sch /ghi/jkl/* | grep 'total' | awk '{ print $1 }'
Save your values to variables, and then use printf:
printf '%s:%s:%s\n' "$first" "$second" "$third"
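A sketch of how the variables could be filled. total_size is a helper name I made up, and the two temporary directories only stand in for /abc/def and /ghi/jkl. Command substitution strips the trailing newline, so no tr is needed.

```shell
# total_size (hypothetical helper): print the "total" figure from du -sch.
total_size() {
  LC_ALL=C du -sch "$@" 2>/dev/null |
    awk '$2 == "total" { size = $1 } END { print size }'
}

# Throwaway directories standing in for the real paths.
dir_a=$(mktemp -d)
dir_b=$(mktemp -d)
printf 'x' > "$dir_a/f1"
printf 'y' > "$dir_b/f2"

first=10001
second=$(total_size "$dir_a"/*)   # $( ) drops the trailing newline
third=$(total_size "$dir_b"/*)

line=$(printf '%s:%s:%s' "$first" "$second" "$third")
echo "$line"

rm -rf "$dir_a" "$dir_b"
```

The sizes printed depend on your filesystem, but everything ends up on one line separated by colons.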
I want to print the longest and shortest username found in /etc/passwd. If I run the code below it works fine for the shortest (head -1), but doesn't run for the longest (sort -n |tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'
Here the issue is:
The pipeline ends with the first sort -n |head -1 | awk '{print $2}' command, so only that command receives the piped input and produces output.
The second command gets no piped input, so it waits for input on STDIN, which is the keyboard; you could type the input and press Ctrl+D to get output.
Run the code as below to get the desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |tail -1 | awk '{print $2}'
All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.
The problem is that you have two pipelines when you really need one. You have grep | while read do ... done | sort | head | awk and, separately, sort | tail | awk: the first sort has an input (the while loop), the second sort doesn't. So the script hangs because your second sort has no piped input; or rather it does have an input, but it's STDIN.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as an input to both sort commands
repeat your while loop
use awk to do both the head and tail
The first two involve iterating over the password file twice, which may be okay, depending on what you're ultimately trying to do. But a small awk script can give you both the first and last line in one pass, by way of an NR==1 rule and the END block.
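A variation on the first option is to keep the sorted list in a shell variable instead of a temporary file and feed it to head and tail separately. This is only a sketch: the passwd entries below are made up so the example does not depend on your system, and the variable names are mine.

```shell
# Made-up passwd entries so the result is predictable.
passwd_file=$(mktemp)
cat > "$passwd_file" <<'EOF'
root:x:0:0:root:/root:/bin/bash
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
systemd-bus-proxy:x:499:499::/:/sbin/nologin
daemon:x:2:2:Daemon:/sbin:/bin/bash
EOF

# Build "length name" pairs once, sorted numerically by length.
sorted=$(awk -F: '{ print length($1), $1 }' "$passwd_file" | sort -n)
rm -f "$passwd_file"

# Each pipeline now gets its own copy of the input.
shortest=$(printf '%s\n' "$sorted" | head -n 1 | awk '{ print $2 }')
longest=$(printf '%s\n' "$sorted" | tail -n 1 | awk '{ print $2 }')

echo "$shortest"
echo "$longest"
```

The file is read once and sorted once; only the cheap head and tail steps are repeated.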
While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string length provided by the shell itself (see: POSIX shell specification). For example you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(note: if you have multiple users of the same length, this just picks the first; you can use a temp file if you want to save all names, and use -le and -ge for the comparison.)
If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'
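To see what sed -e 1b -e '$!d' does on its own, here is a tiny sketch with made-up, already sorted "length name" lines:

```shell
# 1b   : on line 1, branch to the end of the script, so the line is printed
# $!d  : on every line that is not the last, delete it
result=$(printf '%s\n' '2 at' '4 root' '6 daemon' '17 systemd-bus-proxy' |
  sed -e 1b -e '$!d')
echo "$result"
```

Only the first and last lines survive, which are the shortest and longest entries after sort -n.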
I am trying to join the output of the ps and pwdx commands. Can anyone point out the mistake in my command?
ps -eo %p,%c,%u,%a --no-headers | awk -F',' '{ for(i=1;i<=NF;i++) {printf $i", "} ; printf pwdx $1; printf "\n" }'
I expect the last column in each row to be the process directory. But it just shows the value of $1 instead of the output of the command pwdx $1.
This is my output sample (1 row):
163957, processA , userA , /bin/processA -args, 163957
I expected
163957, processA , userA , /bin/processA -args, /app/processA
Can anyone point out what I may be missing?
Try this (gensub requires GNU awk):
ps -eo %p,%c,%u,%a --no-headers | awk -F',' '{ printf "%s,", $0; "pwdx " $1 | getline; print gensub("^[0-9]*: *","","1",$0);}'
Explanation:
awk '{print pwdx $1}' will concatenate the awk variable pwdx (which is empty) and $1 (pid). So, effectively, you were getting only the pid at the output.
In order to run a command and get its output, you need to use this awk construct:
awk '{"some command" | getline; do_something_with $0}'
# After getline, the output will be present in $0.
#For multiline output, use this:
awk '{while ("some command" | getline){do_something_with $0}}'
# Each individual line will be present in subsequent run of the while loop.
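A variant worth knowing is getline into a variable, which leaves $0 untouched, followed by close() so the command's pipe is released before the next input line. The sketch below uses echo piped through tr as a harmless stand-in for pwdx, and the two input words are made up.

```shell
out=$(printf '%s\n' 'alpha' 'beta' |
  awk '{
    # Stand-in command; with real data this string would be "pwdx " $1.
    cmd = "echo " $1 " | tr a-z A-Z"
    result = ""
    cmd | getline result     # first line of the command output; $0 is unchanged
    close(cmd)               # release the pipe before the next record
    print $1 "," result
  }')
echo "$out"
```

Because the command string is built from input data, only do this with input you trust.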
Simplifying your example to focus on how to execute the pwdx command within awk and capture the result of this command into an awk variable as this is where you were having issues:
ps -eo %p,%c,%u,%a --no-headers | awk -F',' '{ system("pwdx "$1) | getline vpwdx; printf vpwdx $1}'
produces:
15651665: /
16651690: /
16901691: /home/fpm
169134248: /home/fpm
3424834254: /home/fpm/tmp
3425440181: /home/fpm/UDK2015
...