Optimize Multiline Pipe to Awk in Bash Function - linux

I have this function:
field_get() {
while read data; do
echo $data | awk -F ';' -v number=$1 '{print $number}'
done
}
which can be used like this:
cat filename | field_get 1
in order to extract the first field from some piped in input. This works but I'm iterating on each line and it's slower than expected.
Does anybody know how to avoid this iteration?
I tried to use:
stdin=$(cat)
echo $stdin | awk -F ';' -v number=$1 '{print $number}'
but the line breaks get lost and it treats all the stdin as a single line.
IMPORTANT: I need to pipe in the input, because in general the input does not come from just cat-ing a file. Assume the input is multiline; that is actually the problem. I know I can use "awk something filename", but that won't help me.

Just lose the while. Awk is a while loop in itself:
field_get() {
awk -F ';' -v number="$1" '{print $number}'
}
$ echo 1\;2\;3 | field_get 2
2
Update:
Not sure what you mean by your comment on multiline pipe and file but:
$ cat foo
1;2
3;4
$ cat foo | field_get 1
1
3
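To make the difference concrete, here is a minimal sketch that puts both variants side by side. The name field_get_loop is made up for this comparison, and the loop body is adapted from the question with quoting added; both functions should print the same fields, but the loop starts one awk process per input line while the second starts a single awk for the whole stream.

```shell
# Per-line version adapted from the question: one awk process for every input line.
# (field_get_loop is a hypothetical name used only for this comparison.)
field_get_loop() {
    while IFS= read -r data; do
        printf '%s\n' "$data" | awk -F ';' -v number="$1" '{print $number}'
    done
}

# Single-process version from the answer: awk consumes all of standard input itself.
field_get() {
    awk -F ';' -v number="$1" '{print $number}'
}

printf '1;2\n3;4\n5;6\n' | field_get_loop 2
printf '1;2\n3;4\n5;6\n' | field_get 2
```

On large inputs the process start-up cost of the loop version is what dominates, which is likely the slowness described in the question.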

Use either stdin or file
field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
Test Results:
$ field_get() {
awk -F ';' -v number="$1" '{print $number}' "${2:-/dev/stdin}"
}
$ echo '1;2;3;4' >testfile
$ field_get 3 testfile
3
$ echo '1;2;3;4' | field_get 2
2

There is no need for a while loop around awk; awk can read the input file itself. Here $1 is the argument passed to your script.
cat script.ksh
awk -v field="$1" '{print $field}' Input_file
./script.ksh 1

This is a job for the cut command:
cut -d';' -f1 somefile
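Since the question asks for piped input, the same idea can be wrapped in a function: cut reads standard input when no file operand is given. This is only a sketch, and field_get_cut is a hypothetical name chosen here to avoid clashing with the awk version.

```shell
# Hypothetical wrapper: with no file operand, cut reads standard input.
field_get_cut() {
    cut -d ';' -f "$1"
}

printf '1;2\n3;4\n' | field_get_cut 2
```

One behavioural difference to keep in mind: cut prints a line unchanged when it contains no delimiter (unless -s is given), whereas the awk version prints an empty line for a missing field.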

Related

Awk: parse node names out of "40*r13n15:40*r10n61:40*r11n18:40*r09n15"

I have a linux script for selecting the node.
For example:
4
40*r13n15:40*r10n61:40*r11n18:40*r09n15
The correct result should be:
r13n15
r10n61
r11n18
r09n15
My linux script content is like:
hostNum=`bjobs -X -o "nexec_host" $1 | grep -v NEXEC`
hostSer=`bjobs -X -o "exec_host" $1 | grep -v EXEC`
echo $hostNum
echo $hostSer
for i in `seq 1 $hostNum`
do
echo $hostSer | awk -F ':' '{print '$i'}' | awk -F '*' '{print $2}'
done
Unfortunately, this prints no node information.
I have tried:
echo $hostSer | awk -F ':' '{print "'$i'"}' | awk -F '*' '{print $2}'
and
echo $hostSer | awk -F ':' '{print '"$i"'}' | awk -F '*' '{print $2}'
But these are wrong too. Can anyone help?
One more awk:
$ echo "$variable" | awk 'NR%2==0' RS='[*:\n]'
r13n15
r10n61
r11n18
r09n15
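By setting the record separator (RS) to the regular expression [*:\n], the string is split into individual tokens on *, : and newline, after which you can print every second record (NR%2==0). Note that gawk and mawk treat a multi-character RS as a regular expression, while POSIX leaves that case unspecified.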
You can use multiple separators in awk. Please try below:
h='40*r13n15:40*r10n61:40*r11n18:40*r09n15'
echo "$h"| awk -F '[:*]' '{ for (i=2;i<=NF;i+=2) print $i }'
**edited to make it generic based on the comment from RavinderSingh13.
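As for why the original loop printed nothing useful: in '{print '$i'}' the shell pastes the value of i into the program text, so awk sees {print 1} and prints the literal number 1, which the second awk then cannot split on *. A minimal sketch of the original loop with the shell value passed in through -v instead (node_names and its two parameters are hypothetical, standing in for the bjobs output):

```shell
# Hypothetical helper: $1 is the exec_host string, $2 the number of hosts.
node_names() {
    local hostSer=$1 hostNum=$2 i
    for i in $(seq 1 "$hostNum"); do
        # -v copies the shell value into the awk variable n; $n is then field number n.
        echo "$hostSer" | awk -F ':' -v n="$i" '{print $n}' | awk -F '*' '{print $2}'
    done
}

node_names '40*r13n15:40*r10n61:40*r11n18:40*r09n15' 4
```

The single-awk answers above are still the better choice; this only shows the smallest change that makes the original approach work.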

Linux usernames /etc/passwd listing

I want to print the longest and shortest username found in /etc/passwd. If I run the code below it works fine for the shortest (head -1), but doesn't work for the longest (sort -n |tail -1 | awk '{print $2}'). Can anyone help me figure out what's wrong?
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
sort -n |tail -1 | awk '{print $2}'
Here the issue is:
The pipeline ends with the first sort -n |head -1 | awk '{print $2}' command: that command receives its input through the pipe and produces its output.
The second command is a separate pipeline that is given no input, so it waits for input on STDIN, which is the keyboard. You can type the input and press Ctrl+D to obtain output.
Please run the code like below to get desired output:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |head -1 | awk '{print $2}'
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n |tail -1 | awk '{print $2}'
All you need is:
$ awk -F: '
NR==1 { min=max=$1 }
length($1) > length(max) { max=$1 }
length($1) < length(min) { min=$1 }
END { print min ORS max }
' /etc/passwd
No explicit loops or pipelines or multiple commands required.
The problem is that you have two pipelines when you really need one. You have grep | while read do ... done | sort | head | awk and then sort | tail | awk: the first sort has an input (the while loop), but the second sort doesn't. The script hangs because the second sort has no piped input; or rather it does have an input, but it's STDIN.
There are various ways to resolve this:
save the output of the while loop to a temporary file and use that as an input to both sort commands
repeat your while loop
use awk to do both the head and tail
The first two involve iterating over the password file twice, which may be okay, depending on what you're ultimately trying to do. But a small awk script can give you both the first and last line in one pass, by way of an NR==1 rule and an END block.
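The third option in the list above could look roughly like this. It is a sketch rather than a drop-in replacement: shortest_longest is a hypothetical name, it reads passwd-style lines from standard input, and it assumes usernames contain no spaces.

```shell
# Hypothetical helper: reads passwd-style lines on stdin, prints shortest then longest name.
shortest_longest() {
    cut -d: -f1 |
        awk '{ print length($0), $0 }' |
        sort -n |
        awk 'NR == 1 { print $2 }              # first sorted line: shortest name
             { last = $2 }                     # remember the name on the current line
             END { if (NR > 0) print last }'   # name from the final line: longest
}

printf 'root:x:0:0\nat:x:1:1\nsystemd-bus-proxy:x:2:2\n' | shortest_longest
```

For the real file the call would be shortest_longest </etc/passwd. With a single input line the same name is printed twice, once as shortest and once as longest.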
While you already have good answers, you can also use POSIX shell to accomplish your goal without any pipe at all, using the parameter expansion and string length provided by the shell itself (see: POSIX shell specification). For example you could do the following:
#!/bin/sh
sl=32;ll=0;sn=;ln=; ## short len, long len, short name, long name
while read -r line; do ## read each line
u=${line%%:*} ## get user
len=${#u} ## get length
[ "$len" -lt "$sl" ] && { sl="$len"; sn="$u"; } ## if shorter, save len, name
[ "$len" -gt "$ll" ] && { ll="$len"; ln="$u"; } ## if longer, save len, name
done </etc/passwd
printf "shortest (%2d): %s\nlongest (%2d): %s\n" $sl "$sn" $ll "$ln"
Example Use/Output
$ sh cketcpw.sh
shortest ( 2): at
longest (17): systemd-bus-proxy
Using either pipe/head/tail/awk or the shell itself is fine. It's good to have alternatives.
(note: if you have multiple users of the same length, this just picks the first, you can use a temp file if you want to save all names and use -le and -ge for the comparison.)
If you want both the head and the tail from the same input, you may want something like sed -e 1b -e '$!d' after you sort the data to get the top and bottom lines using sed.
So your script would be:
#!/bin/bash
grep -Eo '^([^:]+)' /etc/passwd |
while read NAME
do
echo ${#NAME} ${NAME}
done |
sort -n | sed -e 1b -e '$!d'
Alternatively, a shorter way:
cut -d":" -f1 /etc/passwd | awk '{ print length, $0 }' | sort -n | cut -d" " -f2- | sed -e 1b -e '$!d'

how to extract grep and cut into a bash array

I tried:
here is content of file.txt
some other text
#1.something1=kjfk
#2.something2=dfkjdk
#3.something3=3232
some other text
bash script:
ids=( `grep "something" file.txt | cut -d'.' -f1` )
for id in "${ids[@]}"; do
echo $id
done
result:
(nothing newline...)
(nothing newline...)
(nothing newline...)
but all it prints is an empty line for every id found. What am I missing?
Your grep and cut should be working but you can use awk and reduce 2 commands into one:
while read -r id; do
echo "$id"
done < <(awk -F '\\.' '/something/{print $1}' file.txt)
To populate an array:
ids=()
while read -r id; do
ids+=( "$id" )
done < <(awk -F '\\.' '/something/{print $1}' file.txt)
You can use grep's -o option to output only the text matched by a regular expression:
$ ids=($(grep -Eo '^#[0-9]+' file.txt))
$ echo ${ids[@]}
#1 #2 #3
This of course doesn't check for the existence of a period on the line... If that's important, then you could either expand things with another pipe:
$ ids=($(grep -Eo '^#[0-9]+\.something' file.txt | grep -o '^#[0-9]*'))
or you could trim the array values after populating the array:
$ ids=($(grep -Eo '^#[0-9]+\.something' file.txt))
$ echo ${ids[@]}
#1.something #2.something #3.something
$ for key in "${!ids[@]}"; do ids[key]="${ids[key]%.*}"; done
$ echo ${ids[@]}
#1 #2 #3
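Another way to fill the array, assuming bash 4 or later, is the mapfile builtin, which stores each input line as one element and so avoids the word splitting that ids=( $(...) ) relies on. The sketch below keeps the grep and cut from the question; load_ids and the temporary sample file are hypothetical and only there to make the example self-contained.

```shell
# Hypothetical helper: fills the global array ids from the file named in $1.
# mapfile -t (bash 4+) reads one line per element and strips the trailing newline.
load_ids() {
    mapfile -t ids < <(grep 'something' "$1" | cut -d '.' -f 1)
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'some other text' '#1.something1=kjfk' '#2.something2=dfkjdk' \
    '#3.something3=3232' 'some other text' > "$sample"

load_ids "$sample"
for id in "${ids[@]}"; do
    printf '%s\n' "$id"
done
rm -f "$sample"
```

Quoting "${ids[@]}" in the loop keeps each element intact even if it were to contain spaces.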

Using awk to modify output

I have a command that is giving me the output:
/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611
I need the output to be:
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
The closest I got was:
$ echo /home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611 | awk '{ printf "%s", $1 }; END { printf "\n" }'
/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08
I'm not familiar with awk but I believe this is the command I want to use, any one have any ideas?
Or just a sed oneliner:
echo /home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611 \
| sed -E 's/.*:(.*\.xml).*/\1/'
$ echo "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611" |
cut -d: -f2 |
cut -d. -f1-2
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
Note that this relies on the dot . being present as in counted-file.xml.
$ awk -F'[:.]' -v OFS="." '{print $2,$3}' <<< "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611"
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
not sure if this is ok for you:
sed 's/^.*:\(.*\)\.[^.]*$/\1/'
with your example:
kent$ echo "/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611"|sed 's/^.*:\(.*\)\.[^.]*$/\1/'
ea66574ff0daad6d0406f67e4571ee08 counted-file.xml
this grep line works too:
grep -Po ':\K.*(?=\..*?$)'
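If the line is already in a shell variable, plain parameter expansion can do the same trimming without starting another process. A minimal sketch, assuming the path prefix contains no colon and the timestamp is the part after the last dot; strip_md5_line is a hypothetical name.

```shell
# Hypothetical helper: trims the path prefix and the trailing timestamp from one line.
strip_md5_line() {
    local line=$1
    line=${line#*:}     # drop everything up to and including the first colon
    line=${line%.*}     # drop the last dot and everything after it
    printf '%s\n' "$line"
}

strip_md5_line '/home/konnor/md5sums:ea66574ff0daad6d0406f67e4571ee08 counted-file.xml.20131003-083611'
```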

Need to grab data in between tilde characters

Can anyone advise how to search on Linux for data between tilde characters? I need to get the IP address; however, the data is formatted like the example below.
Details:
20110906000418~118.221.246.17~DATA~DATA~DATA
One more:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | sed -r 's/[^~]*~([^~]+)~.*/\1/'
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d'~' -f2
This uses the cut command with the delimiter set to ~. The -f2 switch then outputs just the 2nd field.
If the text you give is in a file (called filename), try:
grep "[0-9]*~" filename | cut -d'~' -f2
With cut:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | cut -d~ -f2
With awk:
echo "20110906000418~118.221.246.17~DATA~DATA~DATA" | awk -F~ '{ print $2 }'
In awk:
echo '20110906000418~118.221.246.17~DATA~DATA~DATA' | awk -F~ '{print $2}'
Just use bash
$ string="20110906000418~118.221.246.17~DATA~DATA~DATA"
$ echo ${string#*~}
118.221.246.17~DATA~DATA~DATA
$ string=${string#*~}
$ echo ${string%%~*}
118.221.246.17
one more, using perl:
$ perl -F~ -lane 'print $F[1]' <<< '20110906000418~118.221.246.17~DATA~DATA~DATA'
118.221.246.17
bash:
#!/bin/bash
IFS='~'
while read -a array;
do
echo ${array[1]}
done < ip
If the string has a fixed layout, the following parameter expansion extracts the substring by offset and length:
$ a=20110906000418~118.221.246.17~DATA~DATA~DATA
$ echo ${a:15:14}
118.221.246.17
or using a regular expression with expr:
$ echo $(expr "$a" : '[^~]*~\([^~]*\)~.*')
118.221.246.17
last one, again using pure bash methods:
$ tmp=${a#*~}
$ echo $tmp
118.221.246.17~DATA~DATA~DATA
$ echo ${tmp%%~*}
118.221.246.17
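For input with many lines, the shell's read can split each line on ~ directly. In this sketch IFS is set only for the read command, so the rest of the script keeps its normal word splitting; second_tilde_field and the variable names stamp, ip and rest are hypothetical.

```shell
# Hypothetical helper: prints the second ~-separated field of every line on stdin.
second_tilde_field() {
    # IFS='~' applies to this read only; the third variable collects all remaining fields.
    while IFS='~' read -r stamp ip rest; do
        printf '%s\n' "$ip"
    done
}

printf '%s\n' '20110906000418~118.221.246.17~DATA~DATA~DATA' | second_tilde_field
```

For very large files a single cut or awk call is still likely to be faster than a shell loop.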
