Using a variable defined outside awk - Linux

I have written the following lines:
ARRAY=($(awk 'FS = ";" {print $3}' file.txt))
LINE_CREATOR=`echo "aaaa;bbbb;cccccccc" |
'{awk -F";"};
END
for (i in ARRAY)
{
print $'${ARRAY['i']}'
}
}'`
file.txt looks like this:
1;8;3
4;6;1
7;9;2
Explanation:
The array contains the values 3 1 2,
so the loop should iterate over the array and extract fields $3, $1 and $2 from "aaaa;bbbb;cccccccc" using awk.
The final output should be:
ccccccccaaaabbbb
I still get errors when I run my script.

I'm making a few guesses here but I think that this does what you want:
$ echo "aaaa;bbbb;cccccccc" | awk -F\; 'NR == FNR { n = split($0, a); next }
{ printf "%s", a[$3] } END { print "" }' - file
ccccccccaaaabbbb
NR == FNR means that the block is only run for the first input. The - argument tells awk to read standard input first. The string is split on FS (;) into the array a, and next skips the rest of the script.
The second block is only run for the second input (the text file). The values in the third field are used to print the elements in the array a.
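To see the two-input pattern in isolation, here is a minimal self-contained sketch. It assumes a POSIX shell and awk; the scratch directory and the variable names tmpdir and result are illustrative and not part of the original post.

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1;8;3' '4;6;1' '7;9;2' > "$tmpdir/file.txt"

# "-" makes awk read standard input first, so NR == FNR is true only there.
result=$(echo "aaaa;bbbb;cccccccc" |
  awk -F';' '
    NR == FNR { split($0, a); next }   # first input: a[1]="aaaa", a[2]="bbbb", a[3]="cccccccc"
    { printf "%s", a[$3] }             # second input: the third field selects an element of a
    END { print "" }                   # finish with a newline
  ' - "$tmpdir/file.txt")

echo "$result"
rm -rf "$tmpdir"
```

Because the field numbers are read from the file line by line, the output order follows the order of the lines in file.txt.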

If you want to pass the indexes as an awk variable, here is another way:
$ awk -F';' -v ix="$(cut -d\; -f3 file | paste -sd\;)" '
BEGIN{n=split(ix,a)}
{for(i=1;i<n;i++) printf "%s",$a[i];
printf "%s\n",$a[n]}' <<< "aaaa;bbbb;cccccccc"
ccccccccaaaabbbb

Related

awk with bash variable along with condition to be checked

I need to search and replace a pattern from file
[ec2_server]
server_host=something
[list_server]
server_host=old_name
to
[ec2_server]
server_host=something
[list_server]
server_host=new_name
I'm able to get it working with
awk '/\[list_server]/ { print; getline; $0 = "server_host=new_name" } 1'
But I'm trying to parameterize the search pattern, the parameter name to change and the parameter value to change.
PATTERN_TO_SEARCH=[list_server]
PARAM_NAME=server_host
PARAM_NEW_VALUE=new_name
But it is not working when I parameterize and pass the variables to awk
awk -v patt=$PATTERN_TO_SEARCH -v parm=$PARAM_NAME -v parmval=$PARAM_NEW_VALUE '/\patt/ { print; getline; $0 = "parm=parmval" } 1' file.txt
You have two instances of the same problem: you're trying to use a
variable name inside a string value. Awk can't read your mind: it
can't intuit that sometimes when you write "HOME" you mean "print the
value of the variable HOME" and other times you mean "print the word
HOME".
We need to make two separate changes:
First, to use a variable in your search pattern, you can use
syntax like this:
awk -v patt='some text' '$0 == patt {print}'
(Note that here we're using an equality match, ==; you can also use a regular expression match, ~, but in this particular case that would only complicate things).
With your example file content, running:
awk -v patt='[list_server]' '$0 == patt {print}' file.txt
Produces:
[list_server]
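To illustrate why a regular-expression match would complicate things here: as a regex, [list_server] is a bracket expression that matches any single one of the characters inside the brackets, so ~ matches far more than the header line. A small sketch, assuming a POSIX shell and awk; the variable names sample, eq and re are illustrative.

```shell
sample='[ec2_server]
server_host=something
[list_server]
server_host=old_name'

# String equality: only the exact header line matches.
eq=$(printf '%s\n' "$sample" | awk -v patt='[list_server]' '$0 == patt')

# Regex match: the value is treated as a bracket expression, so any line
# containing one of the characters l, i, s, t, _, e, r or v matches.
re=$(printf '%s\n' "$sample" |
  awk -v patt='[list_server]' '$0 ~ patt { n++ } END { print n + 0 }')

echo "equality match: $eq"
echo "lines matched by regex: $re"
```

With this sample every line contains at least one of those characters, so the regex form selects all four lines.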
Next, when you write $0 = "parm=parmval", you're setting $0 to the literal string parm=parmval. If you want to perform variable substitution, consider using sprintf():
awk \
-v patt="$PATTERN_TO_SEARCH" \
-v parm="$PARAM_NAME" \
-v parmval="$PARAM_NEW_VALUE" \
'
$0 == patt { print; getline; $0 = sprintf("%s=%s", parm, parmval) } 1
' file.txt
Which gives us:
[ec2_server]
server_host=something
[list_server]
server_host=new_name
Experts recommend avoiding getline because of its edge cases, so I would write the awk code as follows: when the section header is found, set a flag (found, a variable of my own), and on the next line print the replacement. The header is matched with a regex built from the value passed in from the shell.
To replace only the value, the field separator is set to = for the whole Input_file. With this approach you do not need to pass in a variable holding the server_host name, since it is already present in Input_file and can be taken from $1.
An awk solution with the values written directly in the -v assignments, using a regex comparison in the main program:
awk -v var="list_server" -v newVal="NEW_VALUE" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
Or, an awk solution that takes the values from shell variables and uses the same regex match inside awk:
varS="list_server" ##Shell variable
newvalue="NEW_VALUE" ##Shell variable
awk -v var="$varS" -v newVal="$newvalue" '
BEGIN{ FS=OFS="=" }
$0 ~ "^\\[" var "\\]$"{
found=1
print
next
}
found{
print $1 OFS newVal
found=""
next
}
1
' Input_file
$ awk -v pat="$PATTERN_TO_SEARCH" -v parm="$PARAM_NAME" -v parmval="$PARAM_NEW_VALUE" '
f{$0=parm"="parmval; f=0} $0==pat{f=1} 1
' file
[ec2_server]
server_host=something
[list_server]
server_host=new_name
This assumes the "${PARAM_NAME}" row immediately follows the search-pattern row:
_P2S_='[list_server]'
_PNM_='server_host'
_PNV_='new_name'
echo "${...input...}" | gtee >( gpaste - | gcat -b >&2; echo ) | gcat - |
{m,n,g}awk -v __="${_P2S_}=${_PNM_}=${_PNV_}" -F= 'BEGIN {
$(_-=_)=__;___= $(_ = NF); FS ="^"(OFS = $--_ FS)
__= $-(_+=-_--) } (NR-_)< NF ? ($NF =___)^(_-=_) :_=NR*(-!!_)^(__!=$!_)' |
gcat -b | gcat -n | ecp
1 [ec2_server]
2 server_host=something
3 [list_server]
4 server_host=old_name
1 1 [ec2_server]
2 2 server_host=something
3
4 3 [list_server]
5 4 server_host=new_name

Sorting a file using fields with specific value

Recently, I had to sort several files according to records' ID; the catch was that there can be several types of records, and in each of those the field I had to use for sorting is on a different position. The fields, however, are easily identifiable thanks to key=value structure. To show a simple sample of the general structure:
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
I came up with a pipeline as follows, which did the job:
awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i ~ "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'
In other words the algorithm is as follows:
Split the record by both field and key-value separators (| and =)
Iterate through the elements and search for the id key
Print the next element (value of id key), a separator, and the whole line
Sort numerically
Remove prepended identifier to preserve records' structure
Processing the sample gives the output:
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Is there a way, though, to do this task using single awk command?
You may try this GNU awk code to do this in a single command:
awk -F'|' '{
for(i=1; i<=NF; ++i)
if ($i ~ /^id=/) {
a[gensub(/^id=/, "", 1, $i)] = $0
break
}
}
END {
PROCINFO["sorted_in"] = "#ind_num_asc"
for (i in a)
print a[i]
}' file
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
We use | as the field delimiter, and when a field starts with id= we store the full record in array a, indexed by the text after the =.
Setting PROCINFO["sorted_in"] = "#ind_num_asc" makes the for loop traverse array a in ascending numeric order of the index, so printing the values gives the sorted output.
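PROCINFO["sorted_in"] is specific to GNU awk. As a rough sketch for other awks, the same idea can be done with a small insertion sort in the END block. This assumes every record has a numeric id= field and that the input is small enough to hold in memory; the names ids, rec, n and sorted are illustrative.

```shell
sorted=$(printf '%s\n' \
  'fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC' \
  'fieldD=valueD|recordType=B|id=1|fieldE=valueE' \
  'fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3' |
awk -F'|' '
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^id=/) {               # found the id=... field
        n++
        ids[n] = substr($i, 4) + 0     # numeric id (text after "id=")
        rec[n] = $0                    # the whole record
        break
      }
  }
  END {
    # Insertion sort on ids[], moving rec[] in step with it.
    for (i = 2; i <= n; i++) {
      k = ids[i]; r = rec[i]
      for (j = i - 1; j >= 1 && ids[j] > k; j--) {
        ids[j + 1] = ids[j]; rec[j + 1] = rec[j]
      }
      ids[j + 1] = k; rec[j + 1] = r
    }
    for (i = 1; i <= n; i++) print rec[i]
  }')
echo "$sorted"
```

Unlike the array-indexed-by-id version, this sketch keeps records that share the same id instead of overwriting them, at the cost of a quadratic sort.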
Using GNU awk for the 3rd arg to match() and sorted_in:
$ cat tst.awk
match($0,/(^|\|)id=([0-9]+)/,a) {
ids2vals[a[2]] = $0
}
END {
PROCINFO["sorted_in"] = "#ind_num_asc"
for ( id in ids2vals ) {
print ids2vals[id]
}
}
$ awk -f tst.awk file
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Try Perl: perl -e 'print map { s/^.*? //; $_ } sort { $a <=> $b } map { ($id) = /id=(\d+)/; "$id $_" } <>' file
Some explanation of the code I use:
print #print the resulting list of lines
map {
s/^.*? //;
$_
} #remove numeric id from start of line
sort { $a <=> $b } #sort numerically
map {
($id) = /id=(\d+)/;
"$id $_"
} # capture id and place it in start of line
<> # read all lines from file
Or try sed and sort: sed 's/^\(.*id=\([0-9][0-9]*\).*\)$/\2 \1/' file | sort -n | sed 's/^[^ ][^ ]* //'
Based on the samples shown, please try the following awk + sort + cut solution. It was written and tested in GNU awk and should work in any awk.
awk '
match($0,/id=[0-9]+/){
print substr($0,RSTART,RLENGTH)";"$0
}
' Input_file | sort -t'=' -k2n | cut -d';' -f2-
Explanation: Adding detailed explanation for above code.
awk ' ##Starting awk program from here.
match($0,/id=[0-9]+/){ ##Using awk match function to match id= followed by digits.
print substr($0,RSTART,RLENGTH)";"$0 ##printing sub string of matched value followed by current line along with semi-colon in it.
}
' Input_file | ##Mentioning Input_file here and passing awk output as a standard input to next command.
sort -t'=' -k2n | ##Sorting output with delimiter of = and by 2nd field then passing output to next command as an input.
cut -d';' -f2- ##Using cut command making delimiter as ; and printing everything from 2nd field onwards.

Count number of ';' in column

I use the following command to count the number of ; in the first line of a file:
awk -F';' '(NR==1){print NF;}' $filename
I would like to do the same for all lines in the file, that is, count the number of ; on every line.
What I have :
$ awk -F';' '(NR==1){print NF;}' $filename
11
What I would like to have :
11
11
11
11
11
11
A straightforward way to count ; per line is:
awk '{print gsub(/;/,"&")}' Input_file
To skip empty lines, try:
awk 'NF{print gsub(/;/,"&")}' Input_file
To do this the OP's way, subtract 1 from the value of NF:
awk -F';' '{print (NF-1)}' Input_file
OR
awk -F';' 'NF{print (NF-1)}' Input_file
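The difference between the two approaches shows up on empty lines, where NF is 0. A small sketch with made-up input, assuming a POSIX shell and awk; the variable names input, by_gsub and by_nf are illustrative.

```shell
input='a;b;c

x;y'

# gsub() returns the number of replacements, so an empty line gives 0.
by_gsub=$(printf '%s\n' "$input" | awk '{ print gsub(/;/, "&") }' | paste -sd' ' -)

# NF-1 gives -1 on the empty line because NF is 0 there.
by_nf=$(printf '%s\n' "$input" | awk -F';' '{ print NF - 1 }' | paste -sd' ' -)

echo "gsub: $by_gsub"
echo "NF-1: $by_nf"
```

This is why the NF-based variants add an NF condition when empty lines may be present.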
I'd say you can solve your problem with the following:
awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
You want to keep a running count of all the occurrences seen so far (variable a).
NF is the number of fields, which is one more than the number of separators, so you need to subtract 1 for each line. This is the NF-1 part.
However, you don't want to count -1 for empty lines, where NF is 0. To skip those you need the if (NF) part.
Here's a (perhaps contrived) example:
$ cat test.txt
;;
; ; ; ;;
; asd ;;a
a ; ;
$ awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
12
Notice the empty line at the end (to test against the "no separator" case).
A different approach using tr and wc:
$ tr -cd ';' < file | wc -c
42
Your code returns a number one more than the number of semicolons; NF is the number of fields you get from splitting on a semicolon (so for example, if there is one semicolon, the line is split in two).
If you want to add this number from each line, that's easy;
awk -F ';' '{ sum += NF-1 } END { print sum }' "$filename"
If the number of fields is consistent, you could also just count the number of lines and multiply;
awk -F ';' 'END { print NR * (NF-1) }' "$filename"
But that's obviously wrong if you can't guarantee that all lines contain exactly the same number of fields.

How to replace fields using substr comparison

I have two files. I need to take the last 6 characters of field 11 from file1 and look them up in file2; if they match, I need to replace field 9 of file1 with field 1 and field 2 of file2.
file1:
12345||||||756432101000||756432||756432101000||
aaaaa||||||986754812345||986754||986754812345||
ccccc||||||134567222222||134567||134567222222||
file2:
101000|AAAA
812345|20030
The expected output is:
12345||||||756432101000||101000AAAA ||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
I have tried:
awk -F '|' -v OFS='|' 'NR==FNR{a[$1,$2];next} {b=substr($11,length($11)-7)} b in a {$9=a[$1,$2]}1'
I'd write it this way as a full script in a file, rather than a one-liner:
#!/usr/bin/awk -f
BEGIN {
FS = "|";
OFS = FS;
}
NR == FNR { # first file on the command line (file2): the replacements to use
map[$1] = $2
next;
}
{ # second file on the command line (file1): the main file to manipulate
b = substr($11,length($11)-5);
if (b in map) {
$9 = b map[b]
}
print
}
$ awk -F '|' -v OFS='|' 'NR==FNR{a[$1]=$2;next} {b=substr($11,length($11)-5)} b in a {$9=b a[b]}1' file2 file1
12345||||||756432101000||101000AAAA||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
How it works
awk implicitly loops through every line in both files, starting with file2 because it is specified first on the command line.
-F '|'
This tells awk to use | as the field separator on input
-v OFS='|'
This tells awk to use | as the field separator on output
NR==FNR{a[$1]=$2;next}
While reading the first file, file2, this saves the second field, $2, as the value of associative array a with the first field, $1, as the key.
next tells awk to skip the rest of the commands and start over on the next line.
b=substr($11,length($11)-5)
This extracts the last six characters of field 11 and saves them in variable b.
b in a {$9=b a[b]}
This tests to see if b is one of the keys of associative array a. If it is, this assigns the ninth field, $9, to the combination of b and a[b].
1
This is awk's cryptic shorthand for print-the-line.
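Two details of the walkthrough can be checked in isolation: substr(s, length(s)-5) returns the last six characters of s, and b in a tests for a key without creating it. A minimal sketch using a BEGIN-only program; the sample values come from the question, while the variable names s and out are illustrative.

```shell
out=$(awk 'BEGIN {
  s = "756432101000"
  b = substr(s, length(s) - 5)     # start at position 7 of 12: the last six characters
  a["101000"] = "AAAA"
  if (b in a) print b a[b]         # key exists: print the key followed by its value
  if (!("222222" in a)) print "no match"
}')
echo "$out"
```

The first test succeeds because the extracted suffix is a key of a; the second shows that a key absent from the lookup file leaves the record unchanged.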
You are almost there:
$ awk -F '|' -v OFS='|' 'NR==FNR{a[$1]=$2;next} {b=substr($11,length($11)-5)} b in a {$9=b a[b]}1' file2 file1
12345||||||756432101000||101000AAAA ||756432101000||
aaaaa||||||986754812345||81234520030||986754812345||
ccccc||||||134567222222||134567||134567222222||
$

Awk Search file for string

I have implemented a function that searches a column in a file for a string, and it works well. What I would like to know is how to modify it to search all the columns for a string.
awk -v s=$1 -v c=$2 '$c ~ s { print $0 }' $3
Thanks
If "all the columns" means "the entire file" then:
grep -- "$string" "$file"
Here is an example of one way to modify your current script to search for two different strings in two different columns. You can extend it to work for as many as you wish, however for more than a few it would be more efficient to do it another way.
awk -v s1="$1" -v c1="$2" -v s2="$3" -v c2="$4" '$c1 ~ s1 || $c2 ~ s2 { print $0 }' "$5"
As you can see, this technique won't scale well.
Another technique treats the column numbers and strings as a file and should scale better:
awk 'FNR == NR {strings[++c] = $1; columns[c] = $2; next}
{
for (i = 1; i <= c; i++) {
if ($columns[i] ~ strings[i]) {
print
}
}
}' <(printf '%s %d\n' "${searches[@]}") "$inputfile"
The array ${searches[@]} should contain strings and column numbers alternating.
There are several ways to populate ${searches[@]}. Here's one:
#!/bin/bash
# (this is bash and should precede the AWK above in the script file)
unset searches
for arg in "${@:1:$#-1}"
do
searches+=("$arg")
shift
done
inputfile=$1 # the last remaining argument
# now the AWK stuff goes here
To run the script, you'd do this:
$ ./scriptname foo 3 bar 7 baz 1 filename
awk -v pat="$string" '$0 ~ pat' infile
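If the goal is to test each column separately rather than the line as a whole (for example, to anchor the pattern to a single field), one option is to loop over the fields. A hedged sketch, assuming whitespace-separated columns; the function name search_all_columns and the sample data are hypothetical.

```shell
# Usage: search_all_columns PATTERN FILE
search_all_columns() {
  awk -v s="$1" '
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ s) { print; next }   # print the line once, then move to the next line
    }
  ' "$2"
}

tmp=$(mktemp)
printf '%s\n' 'foo bar baz' 'qux foo quux' 'one two three' > "$tmp"
matches=$(search_all_columns '^foo$' "$tmp")
echo "$matches"
rm -f "$tmp"
```

Here ^foo$ only matches a column that is exactly foo, which a whole-line match with $0 ~ pat cannot express directly.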
