What's wrong with my AWK when coping with the last column? - linux

I wrote a shell script to extract data from a file named "POSCAR", which was produced on a Windows 10 system. It looks like this:
System
1.0
23.0000000000 0.0000000000 0.0000000000
0.0000000000 23.0000000000 0.0000000000
0.0000000000 0.0000000000 17.0000000000
C H
24 7
Direct
The 6th and 7th rows hold the element symbols and the numbers of atoms. I want to build the string C24H7, so I wrote this script:
#!/bin/bash
path=$PWD
fin="POSCAR"
e_tot=`sed -n 6p $fin |awk '{printf "%.1d", NF }'`
echo There are $e_tot columns.
ele=""
for ii in $(seq 1 1 $e_tot)
do
echo $ii
aa=`sed -n 6p $fin |awk -v ll=$ii '{printf "%s", $ll}'`
mm=`sed -n 7p $fin |awk -v ll=$ii '{printf "%d", $ll}'`
col=$aa$mm
ele=$ele$col
done
The output is weird for the last column. I get C24H, but the "7" is lost, or it ends up on the next line.
I suspect it is related to the last character of the row, which was produced by Windows and is not recognized by Linux; I don't know what that character is.
Using BEGIN{FS="[ \n\t]+"} in awk does not help.
What is wrong?
Thank you.

With awk:
awk 'NR==6{a=$1;b=$2}NR==7{print a $1 b $2}' file
C24H7
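The asker's suspicion about Windows line endings is plausible: if POSCAR was saved with CRLF line endings, the last field on each line carries a trailing carriage return, which can garble the assembled string when it is printed. A minimal sketch, assuming that is the cause, which strips the carriage return inside awk before pairing symbols with counts (the shortened sample file is created only for illustration):

```shell
#!/bin/bash
# Build a sample file with Windows (CRLF) line endings; contents are illustrative.
tmp=$(mktemp)
printf 'System\r\n1.0\r\n23.0 0.0 0.0\r\n0.0 23.0 0.0\r\n0.0 0.0 17.0\r\nC H\r\n24 7\r\nDirect\r\n' > "$tmp"

# Strip a trailing carriage return from every line, then pair each symbol
# on line 6 with the count in the same column on line 7.
formula=$(awk '
    { sub(/\r$/, "") }                       # drop the CR left by CRLF endings
    NR == 6 { n = split($0, sym, " ") }      # element symbols
    NR == 7 { for (i = 1; i <= n; i++) printf "%s%s", sym[i], $i; print "" }
' "$tmp")

echo "$formula"
rm -f "$tmp"
```

Unlike the fixed $1/$2 one-liner, this loops over however many element columns line 6 contains.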

Related

grep reverse with exact matching

I have a list file, which has id and number and am trying to get those lines from a master file which do not have those ids.
List file
nw_66 17296
nw_67 21414
nw_68 21372
nw_69 27387
nw_70 15830
nw_71 32348
nw_72 21925
nw_73 20363
master file
nw_1 5896
nw_2 52814
nw_3 14537
nw_4 87323
nw_5 56466
......
......
nw_n xxxxx
So far I have tried this, but it does not work as expected:
for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;
Kindly help
Give this awk one-liner a try:
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master
Maybe this helps:
awk 'NR == FNR {id[$1]=1;next}
{
if (id[$1] == "") {
print $0
}
}' listfile masterfile
We accept 2 files as input above, first one is listfile, second is masterfile.
NR == FNR would be true while awk is going through listfile. In the associative array id[], all ids in listfile are made a key with value as 1.
When awk goes through masterfile, it prints a line only if $1, i.e. the id, is not a key in the array id.
The OP attempted the following line:
for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;
This line does not work because, for every entry $i, you print all lines of master.txt that do not match "$i". As a consequence, you end up with multiple copies of master.txt, each missing a single line.
Example:
$ for i in 1 2; do grep -v -w "$i" <(seq 1 3); done
2 \ copy of seq 1 3 without entry 1
3 /
1 \ copy of seq 1 3 without entry 2
3 /
Furthermore, the attempt reads the file master.txt multiple times. This is very inefficient.
The Unix tool grep can check multiple expressions stored in a file in a single pass. This is done with the -f flag. Normally this looks like:
$ grep -f list.txt master.txt
The OP can use this now in the following way:
$ grep -vwf <(awk '{print $1}' list.txt) master.txt
But this matches the ids anywhere on the line, not only in column 1.
The awk solution presented by Kent is more flexible and allows the OP to define a more tuned match:
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master
Here the OP clearly states, I want to match column 1 of list with column 1 of master and I don't care about spaces or whatever is in column 2. The grep solution could still match entries in column 2.
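A small sketch of that difference, using made-up sample data in which the id nw_66 also appears in column 2 of one master line (the temporary files and that extra line are assumptions for illustration):

```shell
#!/bin/bash
# Hypothetical sample data: nw_66 is an id in the list and also a column-2 value.
list=$(mktemp); master=$(mktemp)
printf 'nw_66 17296\n' > "$list"
printf 'nw_1 5896\nnw_2 nw_66\nnw_66 17296\n' > "$master"

# grep matches the id as a word anywhere on the line, so it also drops "nw_2 nw_66".
grep_out=$(grep -vwf <(awk '{print $1}' "$list") "$master")

# awk compares column 1 only, so "nw_2 nw_66" is kept.
awk_out=$(awk 'NR==FNR{a[$1]=1;next}!a[$1]' "$list" "$master")

printf 'grep:\n%s\nawk:\n%s\n' "$grep_out" "$awk_out"
rm -f "$list" "$master"
```

With this data, grep keeps only the nw_1 line, while awk keeps both the nw_1 and nw_2 lines.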

How to cut word which is having three digits in a file (100) - shell scripting

I have a file with the data below. I want to get the queue names (FID.MAGNET.ERROR.*) that have a depth of 100 or more. Please help.
file name MQData -
Which command should I use to get the queue names whose depth is 100 or more (three or more digits)?
"Three digits" and "100 or more" mean different things.
0000 has more than three digits but is not 100 or more, though perhaps your data does not contain such cases.
If the length is what matters, I would use awk 'length($1)>2{print $2}' file
If the value is what matters, I would use awk '($1+0)>=100{print $2}' file
The $1+0 ensures the comparison is done numerically even if $1 has leading zeros. Take a look at this example:
kent$ awk 'BEGIN{if("01001"+0>100)print "OK";else print "NOK"}'
OK
kent$ awk 'BEGIN{if("01001">100)print "OK";else print "NOK"}'
NOK
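Applied to a file, the value-based filter could look like the sketch below. The sample contents are hypothetical (the original MQData was not shown); it assumes the depth is in column 1 and the queue name in column 2, as the commands above do.

```shell
#!/bin/bash
# Hypothetical sample: column 1 is the queue depth, column 2 the queue name.
tmp=$(mktemp)
printf '5 FID.MAGNET.ERROR.A\n0150 FID.MAGNET.ERROR.B\n100 FID.MAGNET.ERROR.C\n99 FID.MAGNET.ERROR.D\n' > "$tmp"

# Adding 0 forces a numeric comparison, so "0150" is treated as 150.
deep=$(awk '($1+0) >= 100 {print $2}' "$tmp")

echo "$deep"
rm -f "$tmp"
```

Only the queues with depths 0150 and 100 are printed; 5 and 99 are filtered out.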
awk '$1 >= 100 {print $2}' MQData
Does that work?
You can skip lines with grep -v. I use echo -e to create a multi-line stream.
echo -e "1 xx\n22 yy\n333 zz\n100 To be deleted" | grep -Ev "^. |^.. |^100 "

bash print first to nth column in a line iteratively

I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is a blank screen.
You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[@]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[@]:1}"; do
echo "$snp"
done
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
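If the loop with head and awk is kept anyway, another common way around the quoting problem is awk's -v option, which copies a shell value into an awk variable so the program can stay in single quotes. A minimal sketch; the sample file and the variable name col are only illustrative:

```shell
#!/bin/bash
# Sample header file, matching the example input in the question.
tmp=$(mktemp)
printf 'ID Name Age Sex State Ext\n1 A 12 M UT 811\n' > "$tmp"

cols=""
for i in {2..5}; do
    # -v col="$i" copies the shell value into the awk variable col,
    # so the awk program can stay inside single quotes.
    snp=$(head -n 1 "$tmp" | awk -v col="$i" '{print $col}')
    cols="$cols$snp "
done

echo "$cols"
rm -f "$tmp"
```

Inside awk, $col means "the field whose number is stored in col", which is why no shell expansion is needed within the program text.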
Hope this helps!
You can print it using awk itself:
awk 'NR==1{for (i=2; i<=5; i++) print $i}' smaller.txt
The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
snp=$(head -n 1 smaller.txt | awk -v i="$i" '{print $i}')
That is:
Do not use set. set is for setting shell options, positional parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
Pass the loop variable to awk with -v i="$i"; inside single quotes the shell never expands $i, and awk's own i would otherwise be unset.
That said, I agree with gniourf_gniourf's approach.
Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head -n 1 smaller.txt)
do
echo "$n"
done
Something like:
for x1 in $(head -n1 smaller.txt );do
echo $x1
done

Separating Awk input in Unix

I am trying to write an awk program that takes two dates whose parts are separated by /, for example 3/22/2013, and breaks each one into three separate numbers so that I can work with the 3, the 22, and the 2013 separately.
I would like the program to be called like
awk -f program_file 2/23/2013 4/15/2013
so far I have:
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
}
This accepts both dates, but I am not sure how to break them up. Additionally, the program above must be called with nawk; with awk, it reports that it cannot open 2/23/2013.
Thanks in advance.
You cannot do it that way, because awk treats the two arguments as input files; that is, your date strings are taken as file names. That is why you got that error message.
If the two dates are stored in shell variables, you could do:
awk -v d1="$d1" -v d2="$d2" 'BEGIN{split(d1,one,"/");split(d2,two,"/");...}{...}'
The ... part is your logic. In the line above, the split parts are stored in the arrays one and two. For example, to print just the elements of one:
kent$ d1=2/23/2013
kent$ d2=4/15/2013
kent$ awk -v d1="$d1" -v d2="$d2" 'BEGIN{n=split(d1,one,"/");split(d2,two,"/"); for(x=1;x<=n;x++)print one[x]}'
2
23
2013
Or, as others suggested, you could use awk's field separator, but you have to do it this way:
kent$ echo $d1|awk -F/ '{print $1,$2,$3}'
2 23 2013
If you pass the two variables in one shot, -F/ won't work unless the two dates are on different lines.
Hope it helps.
How about this?
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $1}'
2
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $2}'
23
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $3}'
2013
You could decide to use / as a field separator, and pass -F / to GNU awk (or to nawk)
If you're on a machine with nawk and awk, there's a chance you're on Solaris and using /bin/awk or /usr/bin/awk, both of which are old, broken awk which must never be used. Use /usr/xpg4/bin/awk on Solaris instead.
Anyway, to your question:
$ cat program_file
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
split(d1,array,/\//)
print array[1]
print array[2]
print array[3]
exit
}
$ awk -f program_file 2/23/2013 4/15/2013
2
23
2013
There may be better approaches though. Post some more info about what you're trying to do if you'd like help.
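To handle both dates rather than only the first, the same BEGIN-only idea can loop over ARGV. A sketch, assuming month/day/year order as in the question; the output labels and the array name part are only illustrative:

```shell
#!/bin/bash
# The awk program runs entirely in BEGIN and exits there, so the date
# arguments are never opened as input files.
out=$(awk 'BEGIN {
    for (a = 1; a < ARGC; a++) {
        n = split(ARGV[a], part, "/")        # part[1]=month, part[2]=day, part[3]=year
        printf "month=%s day=%s year=%s\n", part[1], part[2], part[3]
    }
    exit
}' 2/23/2013 4/15/2013)

echo "$out"
```

The exit at the end of BEGIN is what keeps awk from trying to read 2/23/2013 as a file.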

How do you implement an array into an AWK expression?

I am writing a script, and I have delimited file that looks like this.
1|Anderson|399.00|123
2|Smith|29.99|234
3|Smith|98.00|345
4|Smith|29.98|456
5|Edwards|399.00|567
6|Kreitzer|234.56|456
Here's an awk statement that will grab all the values in column one of a row that contain "Smith".
echo $(awk -F '|' 'BEGIN {count=0;} $2=="Smith" {count++; print $1}' customer)
The output would be:
2 3 4
How can I also store the values in an array as awk increments the count? I tried this:
echo $(awk -F '|' 'BEGIN {count=0;} $2=="Smith" {count++; arr[count]=$1; print $1}' customer)
Edit: Later into the script, when I type
echo ${array[1]}
nothing outputs.
Your code seems to be right. Perhaps I have not understood your question correctly?
I slightly extended your code to print the values stored in the array at the end of execution, preceded by a marker line.
echo $(awk -F '|' 'BEGIN {count=0;} $2=="Smith" {count++; arr[count]=$1; print $1} END { print "Printing the inputs"; for (i in arr) print arr[i] }' customer)
2 3 4 Printing the inputs 2 3 4
Further, look at this site for more examples.
Your question is not very clear. Looking for something like this?
awk -F "|" '$2=="Smith" {arr[count++]=$1}
END {n=length(arr); for (i=0;i<n;i++) print (i+1), arr[i]}' in.file
OUTPUT
1 2
2 3
3 4
Found an easy solution. Set the output of awk into a variable. Then turn the variable into an array.
list=$(awk -F '|' 'BEGIN {count=0;} $2=="Smith" {count++; print $1}' customer)
array=($list)
Typing:
echo ${array[1]}
Will give you the second entry in the array
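The array filled inside awk exists only within the awk process, which is why the shell never sees it. As an alternative to word-splitting a variable, bash 4 and later can read awk's output straight into a shell array with mapfile. A minimal sketch using the sample data from the question (the temporary file and the array name ids are illustrative):

```shell
#!/bin/bash
# Sample data from the question.
tmp=$(mktemp)
printf '1|Anderson|399.00|123\n2|Smith|29.99|234\n3|Smith|98.00|345\n4|Smith|29.98|456\n' > "$tmp"

# mapfile (bash 4+) reads one array element per output line of awk.
mapfile -t ids < <(awk -F '|' '$2 == "Smith" {print $1}' "$tmp")

echo "count=${#ids[@]} second=${ids[1]}"
rm -f "$tmp"
```

Because each line becomes one element, this also works when the printed values contain spaces, which array=($list) would split apart.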
