bash print first to nth column in a line iteratively - linux

I am trying to get the column names of a file and print them iteratively. I guess the problem is with the print $i but I don't know how to correct it. The code I tried is:
#! /bin/bash
for i in {2..5}
do
set snp = head -n 1 smaller.txt | awk '{print $i}'
echo $snp
done
Example input file:
ID Name Age Sex State Ext
1 A 12 M UT 811
2 B 12 F UT 818
Desired output:
Name
Age
Sex
State
Ext
But the output I get is blank screen.

You'd better just read the first line of your file and store the result as an array:
read -a header < smaller.txt
and then printf the relevant fields:
printf "%s\n" "${header[#]:1}"
Moreover, this uses bash only, and involves no unnecessary loops.
Edit. To also answer your comment, you'll be able to loop through the header fields thus:
read -a header < smaller.txt
for snp in "${header[#]:1}"; do
echo "$snp"
done
Edit 2. Your original method had many many mistakes. Here's a corrected version of it (although what I wrote before is a much preferable way of solving your problem):
for i in {2..5}; do
snp=$(head -n 1 smaller.txt | awk "{print \$$i}")
echo "$snp"
done
set probably doesn't do what you think it does.
Because of the single quotes in awk '{print $i}', the $i never gets expanded by bash.
This algorithm is not good since you're calling head and awk 4 times, whereas you don't need a single external process.
Hope this helps!

You can print it using awk itself:
awk 'NR==1{for (i=2; i<=5; i++) print $i}' smaller.txt

The main problem with your code is that your assignment syntax is wrong. Change this:
set snp = head -n 1 smaller.txt | awk '{print $i}'
to this:
snp=$(head -n 1 smaller.txt | awk '{print $i}')
That is:
Do not use set. set is for setting shell options, numbered parameters, and so on, not for assigning arbitrary variables.
Remove the spaces around =.
To run a command and capture its output as a string, use $(...) (or `...`, but $(...) is less error-prone).
That said, I agree with gniourf_gniourf's approach.

Here's another alternative; not necessarily better or worse than any of the others:
for n in $(head smaller.txt)
do
echo ${n}
done

somthin like
for x1 in $(head -n1 smaller.txt );do
echo $x1
done

Related

How to search the full string in file which is passed as argument in shell script?

i am passing a argument and that argument i have to match in file and extract the information. Could you please how I can get it?
Example:
I have below details in file-
iMedical_Refined_load_Procs_task_id=970113
HV_Rawlayer_Execution_Process=988835
iMedical_HV_Refined_Load=988836
DHS_RawLayer_Execution_Process=988833
iMedical_DHS_Refined_Load=988834
If I am passing 'hv' as argument so it should to pick 'iMedical_HV_Refined_Load' and give the result - '988836'
If I am passing 'dhs' so it should pick - 'iMedical_DHS_Refined_Load' and give the result = '988834'
I tried below logic but its not giving the result correctly. What Changes I need to do-
echo $1 | tr a-z A-Z
g=${1^^}
echo $g
echo $1
val=$(awk -F= -v s="$g" '$g ~ s{print $2}' /medaff/Scripts/Aggrify/sltconfig.cfg)
echo "TASK ID is $val"
Assuming your matching criteria is the first string after delimiter _ and the output needed is the numbers after the = char, then you can try this sed
$ sed -n "/_$1/I{s/[^=]*=\(.*\)/\1/p}" input_file
$ read -r input
hv
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988836
$ read -r input
dhs
$ sed -n "/_$input/I{s/[^=]*=\(.*\)/\1/p}" input_file
988834
If I'm reading it right, 2 quick versions -
$: cat 1
awk -F= -v s="_${1^^}_" '$1~s{print $2}' file
$: cat 2
sed -En "/_${1^^}_/{s/^.*=//;p;}" file
Both basically the same logic.
In pure bash -
$: cat 3
while IFS='=' read key val; do [[ "$key" =~ "_${1^^}_" ]] && echo "$val"; done < file
That's a lot less efficient, though.
If you know for sure there will be only one hit, all these could be improved a bit by short-circuit exits, but on such a small sample it won't matter at all. If you have a larger dataset to read, then I strongly suggest you formalize your specs better than "in this set I should get...".

grep reverse with exact matching

I have a list file, which has id and number and am trying to get those lines from a master file which do not have those ids.
List file
nw_66 17296
nw_67 21414
nw_68 21372
nw_69 27387
nw_70 15830
nw_71 32348
nw_72 21925
nw_73 20363
master file
nw_1 5896
nw_2 52814
nw_3 14537
nw_4 87323
nw_5 56466
......
......
nw_n xxxxx
so far am trying this but not working as expected.
for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;
Kindly help
Give this awk one-liner a try:
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master
Maybe this helps:
awk 'NR == FNR {id[$1]=1;next}
{
if (id[$1] == "") {
print $0
}
}' listfile masterfile
We accept 2 files as input above, first one is listfile, second is masterfile.
NR == FNR would be true while awk is going through listfile. In the associative array id[], all ids in listfile are made a key with value as 1.
When awk goes through masterfile, it only prints a line if $1 i.e. the id is not a key in array ids.
The OP attempted the following line:
for i in $(awk '{print $1}' list.txt); do grep -v -w $i master.txt; done;
This line will not work as for every entry $i, you print all entries in master.txt tat are not equivalent to "$i". As a consequence, you will end up with multiple copies of master.txt, each missing a single line.
Example:
$ for i in 1 2; do grep -v -w "$i" <(seq 1 3); done
2 \ copy of seq 1 3 without entry 1
3 /
1 \ copy of seq 1 3 without entry 2
3 /
Furthermore, the attempt reads the file master.txt multiple times. This is very inefficient.
The unix tool grep allows one the check multiple expressions stored in a file in a single go. This is done using the -f flag. Normally this looks like:
$ grep -f list.txt master.txt
The OP can use this now in the following way:
$ grep -vwf <(awk '{print $1}' list.txt) master.txt
But this would do matches over the full line.
The awk solution presented by Kent is more flexible and allows the OP to define a more tuned match:
awk 'NR==FNR{a[$1]=1;next}!a[$1]' list master
Here the OP clearly states, I want to match column 1 of list with column 1 of master and I don't care about spaces or whatever is in column 2. The grep solution could still match entries in column 2.

Replace string in a file from a file [duplicate]

This question already has answers here:
Difference between single and double quotes in Bash
(7 answers)
Closed 5 years ago.
I need help with replacing a string in a file where "from"-"to" strings coming from a given file.
fromto.txt:
"TRAVEL","TRAVEL_CHANNEL"
"TRAVEL HD","TRAVEL_HD_CHANNEL"
"FROM","TO"
First column is what to I'm searching for, which is to be replaced with the second column.
So far I wrote this small script:
while read p; do
var1=`echo "$p" | awk -F',' '{print $1}'`
var2=`echo "$p" | awk -F',' '{print $2}'`
echo "$var1" "AND" "$var2"
sed -i -e 's/$var1/$var2/g' test.txt
done <fromto.txt
Output looks good (x AND y), but for some reason it does not replace the first column ($var1) with the second ($var2).
test.txt:
"TRAVEL"
Output:
"TRAVEL" AND "TRAVEL_CHANNEL"
sed -i -e 's/"TRAVEL"/"TRAVEL_CHANNEL"/g' test.txt
"TRAVEL HD" AND "TRAVEL_HD_CHANNEL"
sed -i -e 's/"TRAVEL HD"/"TRAVEL_HD_CHANNEL"/g' test.txt
"FROM" AND "TO"
sed -i -e 's/"FROM"/"TO"/g' test.txt
$ cat test.txt
"TRAVEL"
input:
➜ cat fromto
TRAVEL TRAVEL_CHANNEL
TRAVELHD TRAVEL_HD
➜ cat inputFile
TRAVEL
TRAVELHD
The work:
➜ awk 'BEGIN{while(getline < "fromto") {from[$1] = $2}} {for (key in from) {gsub(key,from[key])} print}' inputFile > output
and output:
➜ cat output
TRAVEL_CHANNEL
TRAVEL_CHANNEL_HD
➜
This first (BEGIN{}) loads your input file into an associate array: from["TRAVEL"] = "TRAVEL_HD", then rather inefficiently performs search and replace line by line for each array element in the input file, outputting the results, which I piped to a separate outputfile.
The caveat, you'll notice, is that the search and replaces can interfere with each other, the 2nd line of output being a perfect example since the first replacement happens. You can try ordering your replacements differently, or use a regex instead of a gsub. I'm not certain if awk arrays are guaranteed to have a certain order, though. Something to get you started, anyway.
2nd caveat. There's a way to do the gsub for the whole file as the 2nd step of your BEGIN and probably make this much faster, but I'm not sure what it is.
you can't do this oneshot you have to use variables within a script
maybe something like below sed command for full replacement
-bash-4.4$ cat > toto.txt
1
2
3
-bash-4.4$ cat > titi.txt
a
b
c
-bash-4.4$ sed 's|^\s*\(\S*\)\s*\(.*\)$|/^\2\\>/s//\1/|' toto.txt | sed -f - titi.txt > toto.txt
-bash-4.4$ cat toto.txt
a
b
c
-bash-4.4$

awk usage in a variable

actlist file contains around 15 records. I want to print/store each row in a variable to perform further action. script runs but echo $j displays blank value. What is the issue?
my script:
#/usr/bin/sh
acList=/root/john/actlist
Rowcount=`wc -l $acList | awk -F " " '{print $1}'`
for ((i=1; i<=Rowcount; i++)); do
j=`awk 'FNR == $i{print}' $acList`
echo $j
done
file: actlist
cat > actlist
5663233332 2223 2
5656556655 5545 5
4454222121 5555 5
.
.
.
The issue happens to be related to quotes and to the way the shell interpolates variables.
More specifically, when you write
j=`awk "FNR == $i{print}" $acList`
the AWK code must be enclosed into double quotes. This is necessary if you want the shell to be able to substitute the $i with the actual value stored in the i variable.
On the other hand, if you write
j=`awk 'FNR == $i{print}' $acList`
i.e. with single quotes, the $i will be interpreted as a literal string.
Hence the fixed code will read:
#/usr/bin/sh
acList=/root/john/actlist
Rowcount=`wc -l $acList | awk -F " " '{print $1}'`
for ((i=1; i<=Rowcount; i++)); do
j=`awk "FNR == $i{print}" $acList`
echo $j
done
Remember: it is always the shell that does variable interpolation before calling other commands.
Having said that, there are some places, in supplied code, where some improvements could be devised. But that's another story.
Unfortunately all your script does is print the contents of the input file so we can't help you figure out the right approach to do whatever it is you REALLY want to do without more information on what that is but chances are this is the right starting point:
acList=/root/john/actlist
awk '
{ print }
' "$acList"
I think you would probably be better off with this for parsing your file:
#!/bin/bash
while read a b c; do
echo $a, $b, $c
done < "$actlist"
Output:
5663233332, 2223, 2
5656556655, 5545, 5
4454222121, 5555, 5
Updated
Whilst the above demonstrates the concept I was suggesting, as #EdMorton rightly says in the comments section, the following code would be more robust for a production environment.
#!/bin/bash
while IFS= read -r a b c; do
echo "$a, $b, $c"
done < "$actlist"

Get the index of a specific string from a dynamically generated output

I tried a lot of things, but now I am at my wit's end.
My problem is I need the index of a specific string from my dynamically generated output.
In example, I want the index from the string 'cookie' in this output:
1337 cat dog table cookie 42
So in this example I would need this result:
5
One problem is that I need that number for a later executed awk command. Another problem is that this generated output has a flexible length and you are not able to 'sed' something with . - or something else. There is no such pattern like this.
Cheers
Just create an array mapping the string value to it's index and then print the entry:
$ cat file
1337 cat dog table cookie 42
$ awk -v v="cookie" '{v2i[v]=0; for (i=1;i<=NF;i++) v2i[$i]=i; print v2i[v]}' file
5
The above will print 0 if the string doesn't exist as a field on the given line.
By the way, you say you need that number output from the above "for a later executed awk command". Wild idea - why not do both steps in one awk command?
Ugly, but possible:
echo '1337 cat dog table cookie 42' \
| tr ' ' '\n' \
| grep -Fn cookie \
| cut -f1 -d:
Here is a way to find position of word in a string using gnu awk (due to RS), and store it to a variable.
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{print NF+1;exit}' RS="$pat")
echo "$pos"
5
If you do not have gnu awk
pat="cookie"
pos=$(echo "1337 cat dog table cookie 42" | awk '{for (i=1;i<=NF;i++) if ($i~p) print i}' p="$pat")
echo "$pos"
5
Here is pure bash way of doing it with arrays, no sed or awk or GNUs required ;-)
# Load up array, you would use your own command in place of echo
array=($(echo 1337 cat dog table cookie 42))
# Show what we have
echo ${array[*]}
1337 cat dog table cookie 42
# Find which element contains our pattern
for ((i=0;i<${#array[#]};i++)); do [ ${array[$i]} == "cookie" ] && echo $(($i+1)); done
5
Of course, you could set a variable to use later instead of echoing $i+1. You may also want some error checking in case pattern isn't found, but you get the idea!
Here is another answer, not using arrays, or "sed" or "awk" or "tr", just based on the bash IFS separating the values for you:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0 # f will be your answer
i=0 # i counts the fields
for x in $output; do \
((i++)); [[ "$x" = "cookie" ]] && f=$i; \
done
echo $f
Result:
4
Or you can put it all on one line, if you remove the backslashes, like this:
#!/bin/bash
output="cat dog mouse cookie 42" # Or output=$(yourProgram)
f=0;i=0;for x in $output; do ((i++)); [[ "$x" = "cookie" ]] && f=$i; done
echo $f
Explanation:
The "[[a=b]] && c" part is just shorthand for
if [a=b]; then
c
fi
It relies on shortcut evaluation of logicals. Basically, we are asking shell to determine if the two statements "a equals b" AND the statement "c" are both true. If a is not equal to b, it already knows it doesn't need to evaluate c because they already can't both be true - so f doesn't get the value of i. If, on the other hand, a is equal to b, the shell must still evaluate statement "c" to see if it is also true - and when it does so, f will get the value of i.
Pat="cookie"
YourInput | sed -n "/${Pat}/ {s/.*/ & /;s/ ${Pat} .*/I/;s/[[:blank:]\{1,\}[^[:blank:]\{1,\}/I/g
s/I\{9\}/9/;s/I\{8\}/8/;s/I\{7\}/7/;s/IIIIII/6/;s/IIIII/5/;s/IIII/4/;s/III/3/;s/II/2/;s/I/1/
p;q;}
$ s/.*/0/p"
if there is more than 9 cols, a more complex sed could be made or pass through a wc -c instead

Resources