String comparison in awk

I need to compare two strings in alphabetical order, not just test them for equality. Is there a way to do string comparison in awk?

Sure it can:
pax$ echo 'hello
goodbye' | gawk '{if ($0 == "hello") {print "HELLO"}}'
HELLO
You can also do ordered (less-than/greater-than) comparisons:
pax$ printf 'aaa\naab\naac\naad\n' | gawk '{if ($1 < "aac"){print}}'
aaa
aab
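One subtlety: awk chooses between numeric and string comparison based on its operands. If both sides are fields that look like numbers, awk compares them numerically. If one side is a quoted string constant, as in $1 < "aac", it compares them as strings. Here is a minimal sketch, assuming a POSIX awk. Appending an empty string is a common way to force a string comparison:

```shell
#!/bin/sh
# Both fields look numeric, so awk compares them as numbers: 10 < 9 is false.
echo '10 9' | awk '{ print (($1 < $2) ? "less" : "not less") }'

# Appending "" turns each field into a string, so "10" and "9" are compared
# character by character, and "1" sorts before "9".
echo '10 9' | awk '{ print (($1 "" < $2 "") ? "less" : "not less") }'
```

The first command prints "not less" and the second prints "less".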

You can compare strings in awk directly with the standard relational operators (==, !=, <, <=, >, >=). In C, by contrast, you would have to use strcmp().
echo "xxx yyy" > test.txt
awk '$1 != $2 { print $1 $2 }' test.txt

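The nawk manual documents these operators. For example, this prints false because "aaa" sorts before "bbb" (the outer parentheses make the whole conditional expression the argument to print):
echo aaa bbb | awk '{ print (($1 >= $2) ? "true" : "false") }'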
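awk has no strcmp(), but you can build a C-style three-way comparison from < and >. The sketch below uses a hypothetical shell function, strcmp_awk, which is not a standard command. It appends "" to both values because awk treats -v values that look numeric as numbers, which would give a numeric comparison. Note that -v also processes backslash escapes in the values it receives.

```shell
#!/bin/sh
# strcmp_awk: hypothetical helper that prints -1, 0 or 1, like C's strcmp().
strcmp_awk() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        a = a ""; b = b ""      # force a string comparison even for "10" vs "9"
        if (a < b)      print "-1"
        else if (a > b) print "1"
        else            print "0"
    }'
}

strcmp_awk aaa bbb     # -1: "aaa" sorts first
strcmp_awk 10 9        # -1: string order, so "10" sorts before "9"
strcmp_awk same same   # 0
```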


How to extract words between two characters in Linux?

I have the following stored in a file named tmp.txt:
user/config/jars/content-config-factory-3.2.0.0.jar
I need to store this word in a variable:
variable=content-config-factory
I have written the following:
while read line
do
var=$(echo $line | awk 'BEGIN{FS="\/"; OFS=" "} {print $NF}' )
var=$(echo $var | awk 'BEGIN{FS="-"; OFS=" "} {print $(1)}' )
echo $var
done < tmp.txt
This returns the result "content" instead of "content-config-factory".
Can anyone please tell me how to efficiently extract a word between two characters in a string?
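An awk solution: use "/" as the field separator, then delete the last hyphen-separated part (the version and .jar suffix) from the final field:
awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}' tmp.txt
Test: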
$ echo "user/config/jars/content-config-factory-3.2.0.0.jar" | awk -F/ '{sub("-[^-]+$", "", $NF); print $NF}'
content-config-factory
You can also get the expected result with sed:
variable=$(sed 's:.*/\(.*\)-.*:\1:' FileName)
echo "$variable"
Output:
content-config-factory
You could use GNU grep with Perl-compatible regular expressions (-P):
grep -oP '(?<=/)[^/]*(?=-\d+\.)' file
Example:
$ var=$(echo 'user/config/jars/content-config-factory-3.2.0.0.jar' | grep -oP '(?<=/)[^/]*(?=-\d+\.)')
$ echo "$var"
content-config-factory
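If efficiency inside the asker's while read loop is the goal, a different technique avoids starting awk, sed or grep for every line: POSIX shell parameter expansion. This is a minimal sketch that assumes every line ends in -<version>.jar, as in the example. The function name jar_base_name is hypothetical.

```shell
#!/bin/sh
# jar_base_name: hypothetical helper that strips the directory part and the
# trailing "-<version>.jar" using only shell parameter expansion.
jar_base_name() {
    name=${1##*/}     # remove the longest prefix ending in "/"
    name=${name%-*}   # remove the shortest suffix starting with "-"
    printf '%s\n' "$name"
}

printf '%s\n' 'user/config/jars/content-config-factory-3.2.0.0.jar' |
while IFS= read -r line; do
    jar_base_name "$line"   # prints content-config-factory
done
```

Because ${name%-*} removes the shortest matching suffix, only the part after the last hyphen is dropped, so the hyphens inside the name survive.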

How can I get certain columns and certain rows from a file with egrep and awk?

This is my data; the file name is example.txt:
id name lastname point
1234;emanuel;emenike;2855
1357;christian;baroni;398789
1390;alex;souza;23143
8766;moussa;sow;5443
For the IDs 1234 and 1390, I want to see the name and point columns, like this:
emanuel 2855
alex 23143
How can I do this on the Linux command line with awk and egrep?
You can try this:
awk -F\; '$1=="1234" || $1=="1390" {print $2,$4}' file
Using grep and cut (as written, this relies on GNU extensions: \| in a basic regex and cut's --output-delimiter):
grep '^\(1234\|1390\);' input | cut -d\; --output-delimiter=' ' -f2,4
A variation in awk that matches the ID with a regex:
awk -F\; '$1~/^(1234|1390)$/ {print $2,$4}' file
emanuel 2855
alex 23143
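If the set of IDs changes often, you can pass it in from the shell instead of hard-coding it in the pattern. This sketch assumes a comma-separated list of IDs and the semicolon-separated layout of example.txt. The function name select_ids is hypothetical, and the here-document stands in for the real file.

```shell
#!/bin/sh
# select_ids: hypothetical helper. Reads semicolon-separated rows on stdin and
# prints the name and point columns for rows whose id is in the list $1.
select_ids() {
    awk -F';' -v ids="$1" '
        BEGIN { n = split(ids, list, ","); for (i = 1; i <= n; i++) want[list[i]] }
        $1 in want { print $2, $4 }
    '
}

# Normally: select_ids "1234,1390" < example.txt
select_ids "1234,1390" <<'EOF'
id name lastname point
1234;emanuel;emenike;2855
1357;christian;baroni;398789
1390;alex;souza;23143
8766;moussa;sow;5443
EOF
```

The header line has no semicolons, so its $1 is the whole line, which is not in the set, and it is skipped.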
With awk:
awk -F';' '$1~/^1234$/ || $1~/^1390$/ {print $2,$4}' file
Example:
$ cat ccc
id name lastname point
1234;emanuel;emenike;2855
1357;christian;baroni;398789
1390;alex;souza;23143
8766;moussa;sow;5443
$ awk -F';' '$1~/^1234$/ || $1~/^1390$/ {print $2,$4}' ccc
emanuel 2855
alex 23143
Use the GNU version of awk (gawk) in a two-step approach to make your solution very flexible:
Step 1:
Parse your data file (e.g., example.txt) to generate a gawk library file (here called "function_library.awk") that contains a lookup function:
$ /PATH/TO/generate_awk_function.sh /PATH/TO/example.txt
"generate_awk_function.sh" is just a shell script wrapping a gawk program that prints the function:
#! /bin/bash -
gawk 'BEGIN {
FS=";"
OFS="\t"
print "#### gawk function library \"function_library.awk\""
print "function lookup_value(key, value_for_key) {"
}
{
if (NR > 1 ) print "\tvalue_for_key["$1"] = \"" $2 OFS $4 "\""
}
END {
print " print value_for_key[key]"
print "}"
}' "$1" > function_library.awk
This generates the following lookup function:
$ cat function_library.awk
#### gawk function library "function_library.awk"
function lookup_value(key, value_for_key) {
value_for_key[1234] = "emanuel 2855"
value_for_key[1357] = "christian 398789"
value_for_key[1390] = "alex 23143"
value_for_key[8766] = "moussa 5443"
print value_for_key[key]
}
Adapt "generate_awk_function.sh" for your needs:
a) FS=";" sets the field separator of your input file (here a semicolon)
b) OFS="\t" sets the output field separator (here a TAB)
You only need to regenerate this lookup function when example.txt changes.
Step 2:
Read your IDs to look up your results:
$ cat id.txt
1234
1390
$ gawk -i function_library.awk '{lookup_value($1)}' id.txt
emanuel 2855
alex 23143
You can also use this approach in a pipe like this:
$ cat id.txt | gawk -i function_library.awk '{lookup_value($1)}'
or like this:
$ echo 1234 | gawk -i function_library.awk '{lookup_value($1)}'
If your lookup string (1234) or file (id.txt) contains additional unwanted data ("noise"), you can adapt this approach with ordinary awk features:
a) Here, too, you can define a field separator, e.g., a colon (:):
$ gawk -F":" -i function_library.awk '{lookup_value($5)}' id.txt
b) You can use the nth field of your lookup string, e.g., the 5th field instead of the 1st, just by changing the argument of lookup_value from $1 to $5:
$ gawk -i function_library.awk '{lookup_value($5)}' id.txt
Please be aware that the '-i' command-line option is only supported by the GNU version of awk (= gawk).
HTH
bernie
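For comparison, a common single-pass awk idiom performs the same lookup without generating code, and it works in any POSIX awk, not just gawk. It reads id.txt first (while NR == FNR) to remember the wanted IDs, then prints the matching rows of the data file. This is a sketch: the function name lookup_ids is hypothetical, and it assumes mktemp is available for the demo files. The output follows the order of the data file rather than id.txt, and the idiom misbehaves if the ID file is empty.

```shell
#!/bin/sh
# lookup_ids: hypothetical helper wrapping the two-file NR == FNR idiom.
# $1 = file of IDs (one per line), $2 = semicolon-separated data file.
lookup_ids() {
    awk -F';' '
        NR == FNR  { want[$1]; next }   # first file: remember the IDs
        $1 in want { print $2, $4 }     # second file: print matching rows
    ' "$1" "$2"
}

# Demo with the sample data from the question in temporary files.
ids=$(mktemp); data=$(mktemp)
printf '%s\n' 1234 1390 > "$ids"
printf '%s\n' 'id name lastname point' '1234;emanuel;emenike;2855' \
    '1357;christian;baroni;398789' '1390;alex;souza;23143' \
    '8766;moussa;sow;5443' > "$data"
lookup_ids "$ids" "$data"    # emanuel 2855, then alex 23143
rm -f "$ids" "$data"
```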

Linux - parsing data, what language to use

I am looking to parse data out of a column-based format. I feel like I am "hacking" bash/awk commands together to pull out the strings and numbers. If the numbers or text come in a different format, the script might fail unexpectedly and I will get errors.
Data:
RSSI (dBm): -86 Tx Power: 0
RSRP (dBm): -114 TAC: 4r5t (12341)
RSRQ (dB): -10 Cell ID: efefwg (4261431)
SINR (dB): 2.2
My method:
Using bash and awk
#!/bin/bash
DATA_OUTPUT=$(get_data)
RSSI=$(echo "${DATA_OUTPUT}" | awk '$1 == "RSSI" {print $3}')
RSRP=$(echo "${DATA_OUTPUT}" | awk '$1 == "RSRP" {print $3}')
RSRQ=$(echo "${DATA_OUTPUT}" | awk '$1 == "RSRQ" {print $3}')
SINR=$(echo "${DATA_OUTPUT}" | awk '$1 == "SINR" {print $3}')
TX_POWER=$(echo "${DATA_OUTPUT}" | awk '$4 == "Tx" {print $6}')
echo "$SINR"
echo ">$SINR<"
However the output of the above comes out very strange.
2.2 # that's fine!
<2.2 # what??? expecting >2.2<
Little things like this make me question using awk and bash to parse the data. Should I use C++ or some other language? Or is there a better way of doing this?
Thank you
This should be your starting point (the match() can be simplified or removed if your input data has tab-separated or fixed-width fields):
$ cat file
RSSI (dBm): -86 Tx Power: 0
RSRP (dBm): -114 TAC: 4r5t (12341)
RSRQ (dB): -10 Cell ID: efefwg (4261431)
SINR (dB): 2.2
$ cat tst.awk
{
tail = $0
while ( match(tail,/[^:]+:[[:space:]]+[^[:space:]]+[[:space:]]*([^[:space:]]*$)?/) )
{
nvPair = substr(tail,RSTART,RLENGTH)
sub(/ \([^)]+\):/,":",nvPair) # remove (dB) or (dBm)
sub(/:[[:space:]]+/,":",nvPair) # remove spaces after :
sub(/[[:space:]]+$/,"",nvPair) # remove trailing spaces
split(nvPair,tmp,/:/)
name2value[tmp[1]] = tmp[2] # name2value["RSSI"] = "-86"
tail = substr(tail,RSTART+RLENGTH)
}
}
END {
for (name in name2value) {
value = name2value[name]
printf "%s=\"%s\"\n", name, value
}
}
$ awk -f tst.awk file
Tx Power="0"
RSSI="-86"
TAC="4r5t (12341)"
Cell ID="efefwg (4261431)"
RSRP="-114"
RSRQ="-10"
SINR="2.2"
Hopefully it's clear that in the above script after the match() loop you can simply say things like print name2value["Tx Power"] to print the value of that key phrase.
If your data was created on DOS/Windows, run dos2unix on it or pipe it through tr -d '\r' first to remove the carriage returns.
Your data contains DOS-style \r\n line endings. When you do this:
echo ">$SINR<"
the actual output is:
>2.2\r<
The carriage return sends the cursor back to the start of the line, so the closing < is printed over the opening >, which is why you see <2.2.
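You can reproduce this effect without the asker's device. This sketch fakes one CRLF-terminated line, using the sample text from the question, and uses od -c to make the hidden \r visible. To keep the result independent of the awk implementation, it splits on ": " instead of whitespace. The tr -d '\r' step is one portable way to remove the carriage return before awk sees the data.

```shell
#!/bin/sh
# One line of sample data with a DOS-style ending (command substitution
# removes the trailing \n but keeps the \r).
crlf_line=$(printf 'SINR (dB): 2.2\r\n')

# Splitting on ": " leaves the trailing \r inside the value.
raw=$(printf '%s\n' "$crlf_line" | awk -F': ' '$1 == "SINR (dB)" { print $2 }')
printf '>%s<\n' "$raw" | od -c    # od -c lists a \r byte between "2.2" and "<"

# Deleting carriage returns before awk runs gives the clean value.
clean=$(printf '%s\n' "$crlf_line" | tr -d '\r' | awk -F': ' '$1 == "SINR (dB)" { print $2 }')
printf '>%s<\n' "$clean"          # prints >2.2<
```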
You can strip the carriage returns like this:
DATA_OUTPUT=$(get_data | sed 's/\r$//')
But instead of parsing the output over and over, I'd rewrite it like this:
while read -ra fields; do
case ${fields[0]} in
RSSI) rssi=${fields[2]};;
RSRP) rsrp=${fields[2]};;
RSRQ) rsrq=${fields[2]};;
SINR) sinr=${fields[2]};;
esac
if [[ ${fields[3]} == "Tx" ]]; then tx_power=${fields[5]}; fi
done < <(get_data | sed 's/\r$//' )
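The loop above relies on bash features (read -a and process substitution). Below is a sketch of the same idea for a plain POSIX sh. It assumes the four-line layout shown in the question. Here, get_data is a hypothetical stub that prints the sample data with CRLF endings in place of the asker's real command.

```shell
#!/bin/sh
# Hypothetical stand-in for the asker's get_data command (CRLF line endings).
get_data() {
    printf '%s\r\n' 'RSSI (dBm): -86 Tx Power: 0' \
        'RSRP (dBm): -114 TAC: 4r5t (12341)' \
        'RSRQ (dB): -10 Cell ID: efefwg (4261431)' \
        'SINR (dB): 2.2'
}

# Strip carriage returns once, then split each line into whitespace-separated words.
data=$(get_data | tr -d '\r')
while read -r f1 f2 f3 f4 f5 f6 rest; do
    case $f1 in
        RSSI) rssi=$f3 ;;
        RSRP) rsrp=$f3 ;;
        RSRQ) rsrq=$f3 ;;
        SINR) sinr=$f3 ;;
    esac
    if [ "$f4" = "Tx" ]; then tx_power=$f6; fi
done <<EOF
$data
EOF

printf 'RSSI=%s RSRP=%s RSRQ=%s SINR=%s TX=%s\n' "$rssi" "$rsrp" "$rsrq" "$sinr" "$tx_power"
```

Feeding the loop from a here-document rather than a pipe keeps it in the current shell, so the variables are still set after the loop ends.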

awk: compare a value with an input

Can a user input variable ($userinput) be compared with a value in awk?
awk -F: '$1 < $userinput { printf .... }'
This comparison expression seems OK to me, but it gives an error. Why?
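Shell variables are not expanded inside single quotes, so awk receives the literal text $userinput. In awk, that means "the field whose number is the value of the awk variable userinput". Pass the shell value in with -v instead:
awk -v userinput="$userinput" -F: '$1 < userinput {}'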
A real example :
read -p "Give me an integer >>> " int
awk -v input="$int" '$1 < input {print $1, "is less than", input}' <<< 1
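Wrapping this in a shell function makes the pattern reusable. This is a sketch with a hypothetical helper name. Adding 0 to both sides forces a numeric comparison, because a value that does not look like a number would otherwise be compared as a string.

```shell
#!/bin/sh
# print_less_than: hypothetical helper. Prints every input line whose first
# field is numerically smaller than the limit given as $1.
print_less_than() {
    # -v copies the shell value into the awk variable "limit";
    # "+ 0" on both sides forces a numeric comparison.
    awk -v limit="$1" '($1 + 0) < (limit + 0) { print $1, "is less than", limit }'
}

printf '%s\n' 1 5 12 | print_less_than 10
```

This prints "1 is less than 10" and "5 is less than 10", and skips 12.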

get the number after zeroes

I need to get 88090000 from the string below by dropping the leading zeroes. How can I do that using awk?
There can be any number of zeroes before the number; I need the number that follows them.
0000000088090000
I appreciate your help.
Just add 0. The arithmetic converts the string to a number, which drops the leading zeroes:
$ awk '{ print $0 + 0 }' <<< '0000000088090000'
88090000
Using regular expressions:
echo '0000000088090000' | awk '{ sub(/^0+/, ""); print }'
One way:
echo "0000000088090000" | awk '{ printf "%d\n", $0 }'
Using sed:
[jaypal:~/Temp] echo "0000000088090000" | sed 's/^0*//'
88090000
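Two edge cases are worth knowing. First, $0 + 0 converts to a floating-point number, so integers above 2^53 (about 9 × 10^15) can lose precision. Second, the sub() version prints an empty line for an all-zero input. Here is a sketch that handles both by working on the text. The function name is hypothetical.

```shell
#!/bin/sh
# strip_leading_zeros: hypothetical helper. Removes leading zeros as text, so
# long digit strings are not rounded the way numeric conversion ($0 + 0) can
# round them, and an all-zero input still yields "0".
strip_leading_zeros() {
    printf '%s\n' "$1" | awk '{ sub(/^0+/, ""); if ($0 == "") $0 = "0"; print }'
}

strip_leading_zeros 0000000088090000   # 88090000
strip_leading_zeros 0000               # 0
```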
