Using awk to modify output - linux

I have a command that is giving me the output:
/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611
I need the output to be:
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
The closest I got was:
$ echo /home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611 | awk '{ printf "%s", $1 }; END { printf "\n" }'
/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08
I'm not familiar with awk, but I believe it is the right tool for this. Does anyone have any ideas?

Or just a sed oneliner:
echo /home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611 \
| sed -E 's/.*:(.*\.xml).*/\1/'

$ echo "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611" |
cut -d: -f2 |
cut -d. -f1-2
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
Note that this relies on the dot . being present as in counted-file.xml.

$ awk -F'[:.]' -v OFS="." '{print $2,$3}' <<< "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611"
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml

Not sure if this is OK for you:
sed 's/^.*:\(.*\)\.[^.]*$/\1/'
with your example:
kent$ echo "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611"|sed 's/^.*:\(.*\)\.[^.]*$/\1/'
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
This grep line works too (GNU grep, since -P enables Perl-compatible regular expressions):
grep -Po ':\K.*(?=\..*?$)'
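Since the question asked about awk specifically, here is a minimal sketch that uses two sub() calls: one drops everything up to the first colon, the other drops the final dot-suffix. The function name strip_line is made up for illustration, and the sketch assumes the timestamp is always the last dot-separated part of the line.

```shell
# strip_line is a hypothetical helper name, not part of the original question.
strip_line() {
  awk '{
    sub(/^[^:]*:/, "")    # remove the leading "path:" prefix
    sub(/\.[^.]*$/, "")   # remove the trailing ".timestamp" suffix
    print
  }'
}

echo "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611" | strip_line
```

Unlike the cut pipeline, this does not depend on how many dots appear in the file name itself, only on the last one.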

Related

Awk: parse node names out of "40*r13n15:40*r10n61:40*r11n18:40*r09n15"

I have a linux script for selecting the node.
For example:
4
40*r13n15:40*r10n61:40*r11n18:40*r09n15
The correct result should be:
r13n15
r10n61
r11n18
r09n15
My linux script content is like:
hostNum=`bjobs -X -o "nexec_host" $1 | grep -v NEXEC`
hostSer=`bjobs -X -o "exec_host" $1 | grep -v EXEC`
echo $hostNum
echo $hostSer
for i in `seq 1 $hostNum`
do
echo $hostSer | awk -F ':' '{print '$i'}' | awk -F '*' '{print $2}'
done
Unfortunately, this prints no node information.
I have tried:
echo $hostSer | awk -F ':' '{print "'$i'"}' | awk -F '*' '{print $2}'
and
echo $hostSer | awk -F ':' '{print '"$i"'}' | awk -F '*' '{print $2}'
But these are wrong too. Can anyone help?
One more awk:
$ echo "$variable" | awk 'NR%2==0' RS='[*:\n]'
r13n15
r10n61
r11n18
r09n15
By setting the record separator (RS) to the character class [*:\n], the string is broken into individual tokens, after which you can just print every 2nd record (NR%2==0).
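A regular expression as RS is an extension that GNU awk and mawk accept; as far as I know, POSIX only defines the behavior for a single-character RS. If that matters, the same "every second token" idea can be sketched with tr doing the splitting. The name split_nodes is an assumption made for this example.

```shell
# split_nodes is a hypothetical name used only for this sketch.
split_nodes() {
  # Turn every "*" and ":" into a newline, then keep the even-numbered lines.
  tr '*:' '\n\n' | awk 'NR % 2 == 0'
}

printf '%s\n' '40*r13n15:40*r10n61:40*r11n18:40*r09n15' | split_nodes
```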
You can use multiple separators in awk. Please try below:
h='40*r13n15:40*r10n61:40*r11n18:40*r09n15'
echo "$h"| awk -F '[:*]' '{ for (i=2;i<=NF;i+=2) print $i }'
Edited to make it generic, based on the comment from RavinderSingh13.
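As for why the original loop printed nothing: inside '{print '$i'}' the shell substitutes the value of i, so awk sees {print 1} and prints the number 1 rather than field 1. The second awk then finds no "*" in that output and prints an empty line. Passing the index with -v and writing $n keeps it a field reference. Below is a sketch with the sample values hard-coded in place of the bjobs calls; print_nodes is a made-up name.

```shell
# Sample values from the question stand in for the bjobs output.
hostNum=4
hostSer='40*r13n15:40*r10n61:40*r11n18:40*r09n15'

# print_nodes is a hypothetical name for this sketch.
print_nodes() {
  i=1
  while [ "$i" -le "$hostNum" ]; do
    # -v n="$i" hands the index to awk, so $n means "field number n".
    printf '%s\n' "$hostSer" | awk -F ':' -v n="$i" '{ print $n }' | awk -F '*' '{ print $2 }'
    i=$((i + 1))
  done
}

print_nodes
```

This keeps the structure of the original script; the single-awk answers above avoid the loop entirely and are the better choice in practice.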

Optimize Multiline Pipe to Awk in Bash Function

I have this function:
field_get() {
while read data; do
echo $data | awk -F ';' -v number=$1 '{print $number}'
done
}
which can be used like this:
cat filename | field_get 1
in order to extract the first field from some piped in input. This works but I'm iterating on each line and it's slower than expected.
Does anybody know how to avoid this iteration?
I tried to use:
stdin=$(cat)
echo $stdin | awk -F ';' -v number=$1 '{print $number}'
but the line breaks get lost and it treats all the stdin as a single line.
IMPORTANT: I need to pipe the input in, because in general it does not come from a plain cat of a file. Assume the input is multiline; that is exactly the problem. I know I can use "awk something filename", but that won't help me.
Just lose the while. Awk is a while loop in itself:
field_get() {
awk -F ';' -v number="$1" '{print $number}'
}
$ echo 1\;2\;3 | field_get 2
2
Update:
Not sure what you mean by your comment on multiline pipe and file but:
$ cat foo
1;2
3;4
$ cat foo | field_get 1
1
3
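On the attempt with stdin=$(cat): the line breaks disappear because the unquoted $stdin is word-split by the shell before echo ever sees it. Quoting the expansion preserves them, so a buffered variant is possible, although piping straight into awk as above is simpler. A sketch, with field_get_buffered as a made-up name:

```shell
# field_get_buffered is a hypothetical variant of field_get that reads all of stdin first.
field_get_buffered() {
  stdin=$(cat)
  # The double quotes around "$stdin" keep the embedded newlines intact.
  printf '%s\n' "$stdin" | awk -F ';' -v number="$1" '{ print $number }'
}

printf '1;2\n3;4\n' | field_get_buffered 1
```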
Use either stdin or a file:
field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
Test Results:
$ field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
$ echo '1;2;3;4' >testfile
$ field_get 3 testfile
3
$ echo '1;2;3;4' | field_get 2
2
No need to use a while loop and then awk; awk itself can read the input file. Here $1 is the argument passed to your script:
cat script.ksh
awk -F ';' -v field="$1" '{print $field}' Input_file
./script.ksh 1
This is a job for the cut command:
cut -d';' -f1 somefile
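Because cut reads standard input when no file is given, it can also replace awk inside the original function. This is a sketch under the assumption that a single-character delimiter is enough, which holds for ';'.

```shell
# Same interface as the field_get function in the question: field number in $1, data on stdin.
field_get() {
  cut -d ';' -f "$1"
}

printf '1;2\n3;4\n' | field_get 1
```

One behavioral difference to be aware of: cut passes lines that contain no delimiter through unchanged unless -s is given, whereas awk would print an empty line for any field other than the first.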

Getting error while running script to find disk space

I am running the script below:
#!/bin/bash
threshold="20"
i=2
result=`df -kh |grep -v “Filesystem” | awk ‘{ print $5 }’ | sed ‘s/%//g’`
for percent in $result; do
if ((percent > threshold))
then
partition=`df -kh | head -$i | tail -1| awk ‘{print $1}’`
echo “$partition at $(hostname -f) is ${percent}% full”
fi
let i=$i+1
done
But I get the following error:
awk: ‘{
awk: ^ invalid char '▒' in expression
sed: -e expression #1, char 1: unknown command: `▒'
Please help me to resolve this.
Which awk does not work? (Your script works fine on my Ubuntu.)
This line:
result=`df -kh |grep -v "Filesystem" | awk '{ print $5 }' | sed 's/%//g'`
could be changed to:
result=$(df -kh | awk '!/Filesystem/ {print $5+0}')
Avoid old, outdated backticks when parentheses work, like this: var=$(code...)
This:
partition=`df -kh | head -$i | tail -1| awk '{print $1}'`
could be changed to:
partition=$(df -kh | awk -v line="$i" 'NR==line {print $1}')
This:
let i=$i+1
could be changed to:
((i++))
This would then give something like this:
#!/bin/bash
threshold="20"
i=2
result=$(df -kh | awk '!/Filesystem/ {print $5+0}')
for percent in $result; do
if ((percent > threshold))
then
partition=$(df -kh | awk -v line="$i" 'NR==line {print $1}')
echo "$partition at $(hostname -f) is ${percent}% full"
fi
((i++))
done
You're using ‘ and ’ (typographic quotes) instead of the plain single quote ', and likewise “ and ” instead of ". Replace them with straight quotes in a plain-text editor.
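If the script was pasted from a web page or a word processor, the typographic quotes can also be replaced mechanically. The following is a minimal sketch using sed; fix_quotes and the file names in the comment are placeholders, and it assumes the file is UTF-8 encoded.

```shell
# fix_quotes is a hypothetical helper; it reads text on stdin and writes the repaired text to stdout.
fix_quotes() {
  # One substitution per character, so no multi-byte bracket expressions are needed.
  sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}

# Example: fix_quotes < broken.sh > fixed.sh   (file names are placeholders)
```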
You got the answer to your syntax error; now rewrite the whole script as just:
#!/bin/bash
df -kh |
awk -v t=20 -v h="$(hostname -f)" '(NR>1)&&($5+0>t){printf "%s at %s is %s full\n",$1,h,$5}'
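The same one-pass idea can be wrapped in a function that reads df-style text from standard input, which makes it easy to try against canned data. This is only a sketch: report_full and its argument order are assumptions, and it expects the usage percentage in column 5 as in the df output above.

```shell
# report_full is a hypothetical wrapper: $1 = threshold percent, $2 = host name.
report_full() {
  # Skip the header line (NR > 1); "$5 + 0" turns a string like "60%" into the number 60.
  awk -v t="$1" -v h="$2" 'NR > 1 && $5 + 0 > t { printf "%s at %s is %s full\n", $1, h, $5 }'
}

# Typical call: df -kh | report_full 20 "$(hostname -f)"
```

If long device names cause df to wrap a row onto two lines, df -P (the POSIX output format) should keep each filesystem on one line.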

Replace from nth occurrence of pattern till the end of line with sed

For example:
/some/long/path/we/need/to/shorten
I need to delete everything from the 6th occurrence of '/' onward, including the slash itself:
/some/long/path/we/need
Using sed I came up with this solution, but it's kind of workaround-ish:
path=/some/long/path/we/need/to/shorten
slashesToKeep=5
n=$((2 + slashesToKeep))
echo "$path" | sed "s/[^/]*//$n;s/\/\/.*//g"
Cleaner solution much appreciated!
Input
/some/long/path/we/need/to/shorten
Code
Cut Solution
echo '/some/long/path/we/need/to/shorten' | cut -d '/' -f 1-6
AWK Solution
echo '/some/long/path/we/need/to/shorten' | awk -F '/' '{ for(i=1; i<=6; i++) {print $i} }' | tr '\n' '/'|sed 's/.$//'
Output
/some/long/path/we/need
This might work for you (GNU sed):
sed 's/\/[^\/]*//6g' file
Awk:
awk -F'/' 'BEGIN{OFS=FS}{NF=6}1'
In action:
$ echo /some/long/path/we/need/to/shorten | awk -F'/' 'BEGIN{OFS=FS}{NF=6}1'
/some/long/path/we/need
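To avoid hard-coding the 6, the cut approach can take the number of components as a parameter. A sketch, assuming an absolute path; shorten_path is a made-up name.

```shell
# shorten_path is a hypothetical helper: $1 = path, $2 = number of components to keep.
shorten_path() {
  # An absolute path has an empty field before the first "/", hence the + 1.
  printf '%s\n' "$1" | cut -d '/' -f "1-$(($2 + 1))"
}

shorten_path /some/long/path/we/need/to/shorten 5
```

With a path shorter than the requested depth, cut simply prints the path unchanged, whereas the NF=6 awk variant would pad the result with extra slashes.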

Need to grab data in between tilde characters

Can anyone advise how to extract data between tilde characters on Linux? I need to get the IP address, but the data is formatted like the line below.
Details:
20110906000418~118.221.246.17~DATA~DATA~DATA
One more:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | sed -r 's/[^~]*~([^~]+)~.*/\1/'
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d'~' -f2
This uses the cut command with the delimiter set to ~. The -f2 switch then outputs just the 2nd field.
If the text you give is in a file (called filename), try:
grep "[0-9]*~" filename | cut -d'~' -f2
With cut:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d~ -f2
With awk:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | awk -F~ '{ print $2 }'
In awk:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | awk -F~ '{print $2}'
Just use bash
$ string="20110906000418~118.221.246.17~DATA~DATA~DATA"
$ echo ${string#*~}
118.221.246.17~DATA~DATA~DATA
$ string=${string#*~}
$ echo ${string%%~*}
118.221.246.17
one more, using perl:
$ perl -F~ -lane 'print $F[1]' <<< '20110906000418~118.221.246.17~DATA~DATA~DATA'
118.221.246.17
bash:
#!/bin/bash
IFS='~'
while read -a array;
do
echo ${array[1]}
done < ip
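The loop above changes IFS for the rest of the script. Putting the assignment directly in front of read limits it to that one command, and naming the fields avoids the array. A sketch under those assumptions; second_field and the variable names are made up.

```shell
# second_field is a hypothetical helper; it reads "~"-separated lines on stdin.
second_field() {
  # IFS='~' applies only to this read; "rest" soaks up all remaining fields.
  while IFS='~' read -r stamp ip rest; do
    printf '%s\n' "$ip"
  done
}

printf '%s\n' '20110906000418~118.221.246.17~DATA~DATA~DATA' | second_field
```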
If the layout of the string is fixed (the IP always starts at offset 15 and is 14 characters long), the following parameter expansion performs substring extraction:
$ a=20110906000418~118.221.246.17~DATA~DATA~DATA
$ echo ${a:15:14}
118.221.246.17
or using a regular expression with expr:
$ echo $(expr "$a" : '[^~]*~\([^~]*\)~.*')
118.221.246.17
last one, again using pure bash methods:
$ tmp=${a#*~}
$ echo $tmp
118.221.246.17~DATA~DATA~DATA
$ echo ${tmp%%~*}
118.221.246.17
