Trying to understand a simple Linux code - linux

I am trying to figure out what the following command means in Linux:
awk 'match($0, "##SA") ==0 {print $0} ' $1 > ${G_DEST_DIR}/${G_DEST_FILENAME}
Does it remove the first line from the file given as the parameter and place it under dest_dir?

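It does not remove the first line. match() returns 0 when the pattern is not found, so this awk command prints all lines from the input file that don't match the pattern:
##SA
The output of awk is redirected to the file named by:
${G_DEST_DIR}/${G_DEST_FILENAME}
Note that $1 here is a shell variable (the script's first positional parameter), which supplies the input file for awk.
The same awk command can be shortened to: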
awk '!/##SA/' "$1" > "${G_DEST_DIR}/${G_DEST_FILENAME}"
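A quick way to check that the two forms agree is to run both on a small sample. This is only a sketch: the file name sample.txt and its contents are made up for illustration.

```shell
# Hypothetical sample input: two of the four lines contain the marker ##SA.
tmp=$(mktemp -d)
printf '%s\n' 'keep 1' '##SA drop' 'keep 2' 'x ##SA y' > "$tmp/sample.txt"

# Original form: match() returns 0 when the pattern is absent.
long=$(awk 'match($0, "##SA") == 0 {print $0}' "$tmp/sample.txt")

# Shortened form: negate the regex match; printing is the default action.
short=$(awk '!/##SA/' "$tmp/sample.txt")

printf '%s\n' "$short"
rm -r "$tmp"
```

Both variables should hold the same two lines, the ones without ##SA.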

Related

Rename file as third word on it (bash)

I have several autogenerated files and I want to rename each of them according to the 3rd word in its first line (in this case, that would be 42.txt).
First line:
ligand CC##HOc3ccccc3 42 P10000001
Is there a way to do it?
Say you have file.txt containing:
ligand CC##HOc3ccccc3 42 P10000001
and you want to rename file.txt to 42.txt based on the 3rd field in the file.
Using awk
The easiest way is simply to use mv with awk in a command substitution, e.g.:
mv file.txt "$(awk 'NR==1 {print $3; exit}' file.txt).txt"
Here the command substitution $(...) runs awk 'NR==1 {print $3; exit}' file.txt, which outputs the 3rd field (e.g. 42). NR==1 ensures only the first line is considered, and the exit at the end of that rule stops awk from reading further lines, which saves time if file.txt is a 100000-line file. Quoting the substitution protects the new name from word splitting.
Confirmation
file.txt is now renamed 42.txt, e.g.
$ cat 42.txt
ligand CC##HOc3ccccc3 42 P10000001
Using read
You can also use read to read just the first line, take its 3rd word as the name, and then mv the file, e.g.
$ read -r a a name a <file.txt; mv file.txt "$name".txt
The temporary variable a above is just used to read and discard the other words in the first line of the file.
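Since the question mentions several files, the read approach can be wrapped in a loop. The following is a minimal sketch under stated assumptions: the directory, the file names a.txt and b.txt, and their contents are hypothetical, and files whose target name already exists are simply skipped.

```shell
# Hypothetical test directory with two autogenerated files.
dir=$(mktemp -d)
printf 'ligand CC##HOc3ccccc3 42 P10000001\nmore data\n' > "$dir/a.txt"
printf 'ligand CCO 7 P10000002\n' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    name=
    # Read the first line only; the variable _ discards unwanted words.
    read -r _ _ name _ < "$f"
    # Skip files with no 3rd word, and never overwrite an existing file.
    if [ -n "$name" ] && [ ! -e "$dir/$name.txt" ]; then
        mv -- "$f" "$dir/$name.txt"
    fi
done

listing=$(ls "$dir")
printf '%s\n' "$listing"
rm -r "$dir"
```

The glob is expanded once before the loop starts, so files that have already been renamed are not visited again.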

split and write the files with AWK -Bash

INPUT_FILE.txt in c:\Pro\usr\folder1
ABCDEFGH123456
ABCDEFGH123456
ABCDEFGH123456
BBCDEFGH123456
BBCDEFGH123456
I used the AWK command below in a .sh script, which runs from c:\Pro\usr\folder2, to split the file into multiple txt files with a suffix of _kg, based on the first 8 characters of each line.
awk '{ F=substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
This works well, but the files are written to the directory bash is currently in. How can I route the created files to another location, such as c:\Pro\usr\folder3?
Thanks
The following awk code may help with this; it is written for GNU awk, based on the samples shown.
awk -v outPath='c:\\Pro\\usr\\folder3' -v FPAT='^.{8}' '{outFile=($1"_kg.txt");outFile=outPath"\\"outFile;print >> (outFile);close(outFile)}' Input_file
Explanation: The awk variable outPath holds the path mentioned by the OP. FPAT (a GNU awk variable holding a regex that describes what a field looks like, rather than what separates fields) is set so that the first field is the first 8 characters of the line. In the main program, outFile is built from the 1st field followed by _kg.txt and prefixed with outPath. The whole line is then appended to that file with >>, and the file is closed each time to avoid a "too many open files" error. Appending matters here: after a close, reopening the file with > would truncate it and keep only the last line.
Pass the destination folder as a variable to awk:
awk -v dest='c:\\Pro\\usr\\folder3\\' '{F=dest substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
The doubled backslashes are required: awk processes escape sequences in -v assignments, so each \\ becomes a single backslash.
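The same idea can be tried out with POSIX-style paths, which avoids the backslash question entirely. This is a sketch only: the temporary directories below stand in for folder1 and folder3, and the sample lines are taken from the question.

```shell
# Hypothetical source and destination directories.
src=$(mktemp -d)
dest=$(mktemp -d)
printf '%s\n' ABCDEFGH123456 ABCDEFGH123456 BBCDEFGH123456 > "$src/input_file.txt"

# dest is passed with -v and ends in a slash so that the prefix and the
# file name join cleanly. >> appends, so each reopen keeps earlier lines.
awk -v dest="$dest/" '{
    F = dest substr($0, 1, 8) "_kg.txt"
    print $0 >> F
    close(F)
}' "$src/input_file.txt"

count_a=$(wc -l < "$dest/ABCDEFGH_kg.txt")
count_b=$(wc -l < "$dest/BBCDEFGH_kg.txt")
echo "$count_a $count_b"
```

Because >> appends, running the command a second time adds the lines again; clearing the destination files first is one way to avoid that.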

How to get the rest of the Pattern using any linux command?

I am trying to update a file and apply some transformations using any Linux tool.
For example, here I am trying with awk.
It would be great to know how to get the rest of the pattern.
awk -F '/' '{print $1"/raw"$2}' <<< "string1/string2/string3/string4/string5"
string1/rawstring2
Here I don't know how many "/" there are, and I want to get the output:
string1/rawstring2/string3/string4/string5
Something like
awk -F/ -v OFS=/ '{ $2 = "raw" $2 } 1' <<< "string1/string2/string3/string4/string5"
Just modify the desired field and print the changed line. OFS has to be set so that a slash rather than a space separates the fields on output, and the pattern 1 triggers the default action of printing $0. It's an idiom you'll see a lot in awk.
Also possible with sed:
sed -E 's|([^/]*/)|\1raw|' <<< "string1/string2/string3/string4/string5"
The \1 in the replacement string reproduces the part matched inside the parentheses, and raw is appended after it.
Equivalent to
sed 's|\([^/]*/\)|\1raw|' <<< "string1/string2/string3/string4/string5"
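To see that the awk version does not depend on the number of slashes, it can be run on inputs of different lengths. This is a small sketch; add_raw is a hypothetical helper name used only for this illustration.

```shell
# Prefix the 2nd slash-separated field with "raw", whatever the field count.
add_raw() {
    awk -F/ -v OFS=/ '{ $2 = "raw" $2 } 1'
}

out5=$(printf '%s\n' 'string1/string2/string3/string4/string5' | add_raw)
out2=$(printf '%s\n' 'a/b' | add_raw)
printf '%s\n%s\n' "$out5" "$out2"
```

Note that a line with no slash at all would gain a second field, so an input of a would come out as a/raw.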

awk output to variable [duplicate]

[Dd])
echo"What is the record ID?"
read rID
numA= awk -f "%" '{print $1'}< practice.txt
I cannot figure out how to set numA = to the output of the awk in order to compare rID and numA. numA is equal to the first field of a txt file which is separated by %. Any suggestions?
You can capture the output of any command in a variable via command substitution:
numA=$(awk -F '%' '{print $1}' < practice.txt)
Unless your file contains only one line, however, the awk command you presented (as corrected above) is unlikely to be what you want to use. If the practice.txt file contains, say, answers to multiple questions, one per line, then you probably want to structure the script altogether differently.
You don't need to use awk, just use parameter expansion:
numA=${rID%%\%*}
This is the correct syntax:
numA=$(awk -F'%' '{print $1}' practice.txt)
However, it will be easier to do the comparison in awk by passing the bash variable in:
awk -F'%' -v r="$rID" '$1==r{... do something ...}' practice.txt
Since you didn't specify any details, it's difficult to suggest more.
To remove the line matching rID from the file, do this:
awk -F'%' -v r="$rID" '$1!=r' practice.txt > output
This prints the lines where the condition is met ($1 not equal to rID), which is equivalent to deleting the ones that are equal. You can mimic in-place replacement with:
awk ... practice.txt > temp && mv temp practice.txt
where you fill in ... from the line above.
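Put together, the delete-and-replace step might look like the sketch below. The file contents and the ID value are made up for illustration; only the awk command itself comes from the answer.

```shell
# Hypothetical %-separated records; rID is the record to delete.
work=$(mktemp -d)
printf '%s\n' '101%Alice%x' '102%Bob%y' '103%Carol%z' > "$work/practice.txt"
rID=102

# Write the surviving lines to a temp file, then replace the original
# only if awk succeeded.
awk -F'%' -v r="$rID" '$1 != r' "$work/practice.txt" > "$work/temp" &&
    mv "$work/temp" "$work/practice.txt"

remaining=$(cat "$work/practice.txt")
printf '%s\n' "$remaining"
rm -r "$work"
```

Chaining with && means the original file is left alone if awk fails for any reason.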
Try using
$ numA=`awk -F'%' '{ if($1 != $0) { print $1; exit; }}' practice.txt`
From the question, "numA is equal to the first field of a txt file which is separated by %"
-F'%', meaning % is the only separator we care about
if($1 != $0), meaning ignore lines that don't have the separator
print $1; exit;, meaning exit after printing the first field that we encounter separated by %. Remove the exit if you don't want to stop after the first field.

how to pass the filename as variable to a awk command from a shell script

In my shell script I have the following line:
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' test.csv)
which generates a list of the distinct values from column "col" (taken from "$1") of the file test.csv.
There may be several files in the same location, so I need to loop over them with a for statement. To do this I have to replace the filename test.csv with a variable, $i for example, which holds the current entry from the list of files.
To do this, I modified my line to:
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' $j)
Unfortunately, I receive the error message:
awk: cannot open test.csv (No such file or directory)
Can anyone tell me why this error occur and how can I solve it, please?
Thank you,
As you commented in your previous question, you are calling it with
abc$ ./test.sh 2
So you just need to add another parameter when you call it:
abc$ ./test.sh 2 "test.csv"
and the script can be like this:
PO_list=$(awk -v col="$1" -F";" '!seen[$col]++ {print $col}' "$2")
#                                                            ^^^^
Whenever you want to use other parameters, remember they are positional. Hence, the first one is $1, second is $2 and so on.
In case the file happens to be in another directory, you can replace ./test.sh 2 "test.csv" by something like ./test.sh 2 "/full/path/of/test.csv" or whatever relative path you may need.
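For the loop over several files that the question describes, one possible sketch is shown below. The file names one.csv and two.csv, their contents, and the summary variable are all hypothetical; col plays the role of "$1" in the original script.

```shell
# Hypothetical directory holding two semicolon-separated files.
data=$(mktemp -d)
printf '%s\n' 'a;PO1' 'b;PO2' 'c;PO1' > "$data/one.csv"
printf '%s\n' 'd;PO3' 'e;PO3' > "$data/two.csv"
col=2

summary=""
for f in "$data"/*.csv; do
    # The quoted shell variable "$f" is the file awk reads.
    PO_list=$(awk -v col="$col" -F';' '!seen[$col]++ {print $col}' "$f")
    # Record "name=value,value;" for each file, using the base name only.
    summary="$summary${f##*/}=$(printf '%s' "$PO_list" | tr '\n' ',');"
done

printf '%s\n' "$summary"
rm -r "$data"
```

Looping over a glob with the directory included keeps the path attached to each name, which avoids the "cannot open" error that appears when only the bare file name is passed from a different working directory.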
