Creating a nested for loop in bash script - linux

I am trying to create a nested for loop which will count from 1 to 10 and the second or nested loop will count from 1 to 5.
for ((i=1;i<11;i++));
do
for ((a=1;a<6;a++));
do
echo $i:$a
done
done
What I thought the output of this loop would be:
1:1
2:2
3:3
4:4
5:5
6:1
7:2
8:3
9:4
10:5
But instead the output was
1:1
1:2
1:3
1:4
1:5
2:1
...
2:5
3:1
...
and same thing till 10:5
How can I modify my loop to get the result I want?
Thanks

Your logic is wrong. You don't want a nested loop at all. Try this:
for ((i=1;i<11;++i)); do
echo "$i:$((i-5*((i-1)/5)))"
done
This uses integer division to subtract the right number of multiples of 5 from the value of $i:
when $i is between 1 and 5, (i-1)/5 is 0, so nothing is subtracted
when $i is between 6 and 10, (i-1)/5 is 1, so 5 is subtracted
etc.

You must not use a nested loop in this case. What you have is a second co-variable, i.e. something that increments similar to the outer loop variable. It's not at all independent of the outer loop variable.
That means you can calculate the value of a from i:
for ((i=1;i<11;i++)); do
((a=((i-1)%5)+1))
echo $i:$a
done
%5 makes sure that the value is always between 0 and 4. For that to line up, the input has to run from 0 to 9, which is why we use i-1. Afterwards, the +1 moves 0...4 to 1...5.
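The same mapping works for any cycle length. Here is a minimal sketch, assuming a hypothetical helper called wrap (the name and the second parameter are not from the answer above) that takes the counter and the cycle length:

```shell
# wrap is a hypothetical helper: maps 1,2,3,... onto 1..n, starting over
# at 1 after every n values. $1 is the counter, $2 is the cycle length n.
wrap() {
  echo $(( ($1 - 1) % $2 + 1 ))
}

# Same output as the loop above: 1:1 ... 5:5, 6:1 ... 10:5
for i in $(seq 1 10); do
  echo "$i:$(wrap "$i" 5)"
done
```

Only positive counters are considered here; the sign of % for negative operands would need separate handling.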

I know @AaronDigulla's answer is what the OP wants. This is another way to get the output :)
paste -d':' <(seq 10) <(seq 5;seq 5)

a=0
for ((i=1;i<11;i++))
do
a=$((a%5))
((a++))
echo $i:$a
done
If you really need it to use 2 loops, try
for ((i=1;i<3;i++))
do
for ((a=1;a<6;a++))
do
echo "$(( (i-1)*5 + a )):$a"
done
done

Related

Print out even numbers between two given integers

How would I print out even numbers between two numbers?
I have a script where a user enters two values, and those two values are placed into their respective array elements. How would I print the even numbers between the two values?
See man seq. You can use
seq first incr last
for example
seq 4 2 18
to print even numbers from 4 to 18 (inclusive)
If you have bash 4 or later:
printf '%s\n' {4..18..2}
Or a C-style for loop:
for ((i=4;i<=18;i+=2)); do echo "$i"; done
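All of the examples above assume the lower bound is already even. Since the question takes both values from the user, here is a hedged sketch that first rounds an odd lower bound up; print_evens is a hypothetical helper name, and the bounds are treated as inclusive, as in the seq example:

```shell
# print_evens is a hypothetical helper: prints the even numbers in the
# inclusive range $1..$2, whatever the parity of $1.
print_evens() {
  first=$1
  # An odd value leaves a remainder of 1 (or -1 if negative); step up by one.
  if [ $(( first % 2 )) -ne 0 ]; then
    first=$(( first + 1 ))
  fi
  seq "$first" 2 "$2"
}

print_evens 3 11
```

With the values from the array, the call would look like print_evens "${arr[0]}" "${arr[1]}", where arr is only an assumed name for the array in the question.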

I do not understand the break command

in:
#!/bin/sh
for var1 in 1 2 3
do
for var2 in 0 5
do
if [ $var1 -eq 2 -a $var2 -eq 0 ]
then
break 2
else
echo "$var1 $var2"
fi
done
done
the output is:
1 0
1 5
and then script stops.
However, if the break command's argument (2) is removed, the output is:
1 0
1 5
3 0
3 5
What I am asking is why 3 0 and 3 5 are printed, when the script is conditioned not to break? The script didn't print 2 0 and 2 5, and 3 0 and 3 5 should signal a break as well...
The optional argument to break tells it which loop to break out of. If the argument is omitted, it breaks out of the innermost loop. With an argument n it breaks out of the nth enclosing loop.
So break 2 breaks out of the for var1 loop, because it's the 2nd enclosing loop. If you change it to break, it just breaks out of the for var2 loop, so it goes to the next iteration of for var1.
To summarize the comments, there were two issues:
Why is 3 0 printed after a break, but not after break 2?
This was because the condition ([ $var1 -eq 2 -a $var2 -eq 0 ]) checked for equality rather than -ge, greater or equal. With -ge there will be no echos where both numbers are greater.
The break 2 instead exited both loops, thereby giving the same effect in this particular case. If the loop had been for var1 in 1 2 0, break 2 would have also prevented 0 0 from showing up since both loops would have been stopped.
Why is 2 5 not printed after a break?
This is because the entire inner loop stops on a break, so no other iterations will have their chance to echo. To instead skip the current iteration and immediately try the next one, use continue.
Just a simple break breaks out of one loop - the inner for loop in your case.
However, if you use an additional integer in the break statement, as in break 2, then it breaks out of the specified number of loops - that is, both for loops in your case. Since there are no more than two nested loops and there is no more code after the outermost loop, it is effectively the same as ending the script.
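To compare the three behaviours side by side, here is a small sketch built on the loops from the question. The demo function and its mode names (inner, both, skip) are made up for illustration, and the test uses two [ ] commands joined by && instead of -a:

```shell
# demo is a hypothetical helper: runs the question's loops and, when
# var1 is 2 and var2 is 0, reacts according to the mode given in $1.
demo() {
  for var1 in 1 2 3; do
    for var2 in 0 5; do
      if [ "$var1" -eq 2 ] && [ "$var2" -eq 0 ]; then
        case $1 in
          inner) break ;;     # leave only the var2 loop
          both)  break 2 ;;   # leave the var2 loop and the var1 loop
          skip)  continue ;;  # skip this pair, carry on with the next var2
        esac
      else
        echo "$var1 $var2"
      fi
    done
  done
}

echo "break:";    demo inner   # prints 1 0, 1 5, 3 0, 3 5
echo "break 2:";  demo both    # prints 1 0, 1 5
echo "continue:"; demo skip    # prints 1 0, 1 5, 2 5, 3 0, 3 5
```

Note that break inside a shell case statement applies to the enclosing loop, not to the case itself.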

Gnuplot: store last data point as variable

I was wondering if there is an easy way, using Gnuplot, to get the last point in a data file say data.txt and store the values in a variable.
From this question,
accessing the nth datapoint in a datafile using gnuplot
I know that I can get the X-Value using stats and the GPVAL_DATA_X_MAX variable, but is there a simple trick to get the corresponding y-value?
A third possibility is to write each ordinate value into the same user variable during plotting; the last value remains in it:
plot dataf using 1:(lasty=$2)
print lasty
If you want to use Gnuplot, you can
plot 'data.txt'
plot[GPVAL_DATA_X_MAX:] 'data.txt'
show variable GPVAL_DATA_Y_MAX
OR
plot 'data.txt'
plot[GPVAL_DATA_X_MAX:] 'data.txt'
print GPVAL_DATA_Y_MAX
If you know how your file is organised (separators, trailing empty lines) and you have access to standard Unix tools, you can make use of Gnuplot’s system command. For example, if you have no trailing newlines and your values are separated by tabs, you can do the following:
x = system("tail -n 1 data.txt | cut -f 1")
y = system("tail -n 1 data.txt | cut -f 2")
(tail gets the last n lines of a file. cut extracts the column f.)
Note that x and y are strings if obtained this way, but for most applications this should not matter. If you must convert them, you can still add zero.
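If the file may end in empty lines, the tail approach picks up an empty line instead of the last data point. A hedged alternative on the shell side is to let awk remember the last non-empty line; last_point is a hypothetical helper name, and the sketch assumes whitespace-separated columns (tabs or spaces):

```shell
# last_point is a hypothetical helper: prints the first two fields of the
# last non-empty line of the given file (or of standard input if no file
# is given).
last_point() {
  awk 'NF { x = $1; y = $2 } END { print x, y }' "$@"
}

# Sample data with two trailing empty lines
printf '1\t11\n2\t12\n8\t18\n\n\n' | last_point
```

The output is a single line holding both values, which could be captured with gnuplot's system command in the same way as the tail pipeline.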
Let me add a 4th solution, because:
To be very precise, the OP asked about the last x-value and the corresponding y-value.
@TomSolid's solution will return the maximum x-value and its corresponding y-value.
However, strictly speaking the maximum value might not necessarily be the last value, unless the x-data is sorted in ascending order.
The result for the example below would be 10 and 14 instead of 8 and 18.
@Karl's solution will return the last y-value, but it also plots something, although you may just want to extract the value and plot something else. Ideally, you could combine extraction and plotting.
@Wrzlprmft's solution uses the Unix tool tail, which is not platform-independent (on Windows you would first have to install such utilities).
Hence, here is a solution:
platform-independent and gnuplot-only
returns the last x-value and corresponding y-value
doesn't create any dummy plot
Script:
### get the last x-value and corresponding y-value
reset session
$Data <<EOD
1 11
2 12
3 13
10 14
5 15
6 16
7 17
8 18
EOD
stats $Data u (lastX=$1,lastY=$2) nooutput
print lastX, lastY
### end of script
Result:
8.0 18.0

Gnuplot plot specific lines from data file

I have a data file with 24 lines and 3 columns. How can I plot only the data from specific lines, e.g. 3,7,9,14,18,21?
Now I use the following command
plot 'xy.dat' using 0:2:3:xticlabels(1) with boxerrorbars ls 2
which plots all 24 lines.
I tried the every command but couldn't figure out a way that works.
Untested, but something like this
plot "<(sed -n -e 3p -e 7p -e 9p xy.dat)" using ...
Another option may be to annotate your datafile, if as it seems, it contains multiple datasets. Let's say you created your datafile like this:
1 2 3
2 1 3 # SetA
2 7 3 # SetB
2 2 1 # SetA SetB SetC
Then if you wanted just SetA you would use this sed command in the plot statement
sed -ne '/SetA/s/#.*//p' xy.dat
2 1 3
2 2 1
That says..."in general, don't print anything (-n), but, if you do see a line containing SetA, delete the hash sign and everything after it and print the line".
or if you wanted SetB, you would use
sed -ne '/SetB/s/#.*//p' xy.dat
2 7 3
2 2 1
or if you wanted the whole data file, but stripped of our comments
sed -e 's/#.*//' xy.dat
If you wanted SetB and SetC, use
sed -ne '/Set[BC]/s/#.*//p' xy.dat
2 7 3
2 2 1
If the lines you want have something in common that you can evaluate, e.g. the label in column 1 begins with an "a", you can skip all other lines by letting the using statement return NaN for them:
plot dataf using (strcol(1)[1:1] eq "a" ? $0 : NaN):2:xticlabels(1)
This here is an ugly hack you can use in case the desired line numbers are just arbitrary:
linnum = " 1 3 7 12 16 21 "
plot dataf using (strstrt(linnum," ".int($0)." ") != 0 ? $0 : NaN):2
strstrt(a,b) returns the position of string b in string a, zero if it does not appear. I add the two spaces to make the line numbers unique.
But I would recommend using an external program to preprocess the data in that case, see the other answer.
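As an illustration of such preprocessing, here is a minimal sketch that selects lines by number with awk. pick_lines is a hypothetical helper name; line numbers are 1-based, and the lines come out in file order regardless of the order in which the numbers are listed:

```shell
# pick_lines is a hypothetical helper: prints only those lines of the file
# in $1 ("-" for standard input) whose 1-based numbers follow as arguments.
pick_lines() {
  src=$1
  shift
  awk -v list="$*" '
    BEGIN {
      # Turn the space-separated list into a lookup table
      n = split(list, nums, " ")
      for (k = 1; k <= n; k++) want[nums[k]] = 1
    }
    (FNR in want)
  ' "$src"
}

# Demo on 24 generated lines (101..124)
seq 101 124 | pick_lines - 3 7 9 14 18 21
```

The filtered output can then be handed to gnuplot through the same "< command" form of plot that the sed answer uses.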
Yes, there is a solution with every. Since you want to plot with boxerrorbars it can be done in a plot for-loop.
no external tools, i.e. gnuplot-only and hence platform-independent
no strictly increasing line numbers, but arbitrary sequence of lines possible
Script:
### plot only certain lines appearing in a list
reset session
# create some random test data
set print $Data
do for [i=1:24] {
print sprintf("line%02d %g %g", i, rand(0)*5+1, rand(0)*0.5)
}
set print
myLines = "3 7 9 14 18 21"
myLine(i) = int(word(myLines,i)-1)
set offsets 0.5,0.5,0,0
set style fill solid 0.3
set boxwidth 0.6
set xtics out
set key noautotitle
set yrange [0:]
plot for [i=1:words(myLines)] $Data u (i):2:3:xtic(1) \
every ::myLine(i)::myLine(i) w boxerrorbars lc "blue"
### end of script

Extract rows and substrings from one file conditional on information of another file

I have a file 1.blast with coordinate information like this
1 gnl|BL_ORD_ID|0 100.00 33 0 0 1 3
27620 gnl|BL_ORD_ID|0 95.65 46 2 0 1 46
35296 gnl|BL_ORD_ID|0 90.91 44 4 0 3 46
35973 gnl|BL_ORD_ID|0 100.00 45 0 0 1 45
41219 gnl|BL_ORD_ID|0 100.00 27 0 0 1 27
46914 gnl|BL_ORD_ID|0 100.00 45 0 0 1 45
and a file 1.fasta with sequence information like this
>1
TCGACTAGCTACGACTCGGACTGACGAGCTACGACTACGG
>2
GCATCTGGGCTACGGGATCAGCTAGGCGATGCGAC
...
>100000
TTTGCGAGCGCGAAGCGACGACGAGCAGCAGCGACTCTAGCTACTG
I am looking for a script that takes the first column of 1.blast, extracts the sequences with those IDs (first column, $1) from 1.fasta, and removes from each sequence the positions between $7 and $8. For the first two matches the output would be
>1
ACTAGCTACGACTCGGACTGACGAGCTACGACTACGG
>27620
GTAGATAGAGATAGAGAGAGAGAGGGGGGAGA
...
(please notice that the first three entries from >1 are not in this sequence)
The IDs are consecutive, meaning I can extract the required information like this:
awk '{print 2*$1-1, 2*$1, $7, $8}' 1.blast
This gives me then a matrix that contains in the first column the right sequence identifier row, in the second column the right sequence row (= one after the ID row) and then the two coordinates that should be excluded. So basically a matrix that contains all required information which elements from 1.fasta shall be extracted
Unfortunately I do not have much experience with scripting, hence I am now a bit lost: how do I feed the values into, e.g., a suitable sed command?
I can get specific rows like this:
sed -n 3,4p 1.fasta
and the string that I want to remove e.g. via
sed -n 5p 1.fasta | awk '{print substr($0,2,5)}'
But my problem is now, how can I pipe the information from the first awk call into the other commands so that they extract the right rows and remove from the sequence rows then the given coordinates. So, substr isn't the right command, I would need a command remstr(string,start,stop) that removes everything between these two positions from a given string, but I think that I could do in an own script. Especially the correct piping is a problem here for me.
If you do bioinformatics and work with DNA sequences (or even more complicated things like sequence annotations), I would recommend having a look at Bioperl. This obviously requires knowledge of Perl, but has quite a lot of functionality.
In your case you would want to generate Bio::Seq objects from your fasta-file using the Bio::SeqIO module.
Then, you would need to read the fasta-entry-numbers and positions wanted into a hash. With the fasta-name as the key and the value being an array of two values for each subsequence you want to extract. If there can be more than one such subsequence per fasta-entry, you would have to create an array of arrays as the value entry for each key.
With this data structure, you could then go ahead and extract the sequences using the subseq method from Bio::Seq.
I hope this is a way to go for you, although I'm sure that this is also feasible with pure bash.
This isn't an answer, it is an attempt to clarify your problem; please let me know if I have gotten the nature of your task correct.
foreach row in blast:
get the proper (blast[$1]) sequence from fasta
drop bases (blast[$7..$8]) from sequence
print blast[$1], shortened_sequence
If I've got your task correct, you are being hobbled by your programming language (bash) and the peculiar format of your data (a record split across rows). Perl or Python would be far more suitable to the task; indeed Perl was written in part because multiple file access in awk of the time was really difficult if not impossible.
You've come pretty far with the tools you know, but it looks like you are hitting the limits of their convenient expressibility.
As both thunk and msw have pointed out, more suitable tools are available for this kind of task, but here is a script that can teach you something about how to handle it with awk:
Content of script.awk:
## Process first file from arguments.
FNR == NR {
    ## Save ID and the range of characters to remove from sequence.
    blast[ $1 ] = $(NF-1) " " $NF
    next
}
## Process second file. For each FASTA id...
$1 ~ /^>/ {
    ## Get number.
    id = substr( $1, 2 )
    ## Read next line (the sequence).
    getline sequence
    ## if the ID is one found in the other file, get ranges and
    ## extract those characters from sequence.
    if ( id in blast ) {
        split( blast[id], ranges )
        sequence = substr( sequence, 1, ranges[1] - 1 ) substr( sequence, ranges[2] + 1 )
        ## Print both lines with the shortened sequence.
        printf "%s\n%s\n", $0, sequence
    }
}
Assuming the 1.blast from your question and a customized 1.fasta to test it:
>1
TCGACTAGCTACGACTCGGACTGACGAGCTACGACTACGG
>2
GCATCTGGGCTACGGGATCAGCTAGGCGATGCGAC
>27620
TTTGCGAGCGCGAAGCGACGACGAGCAGCAGCGACTCTAGCTACTGTTTGCGA
Run the script like:
awk -f script.awk 1.blast 1.fasta
That yields:
>1
ACTAGCTACGACTCGGACTGACGAGCTACGACTACGG
>27620
TTTGCGA
Of course I'm assuming some things, most importantly that FASTA sequences are not longer than one line.
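The core step, cutting a range of positions out of a string, can also be tried on its own. Below is a minimal sketch of the remstr(string,start,stop) function the question wishes for, written as a shell function around the same two substr calls; the function is hypothetical, and positions are taken as 1-based and inclusive:

```shell
# remstr is a hypothetical helper named after the function the question
# asks for: prints string $1 with positions $2..$3 (1-based, inclusive)
# removed.
remstr() {
  printf '%s\n' "$1" |
    awk -v start="$2" -v stop="$3" \
      '{ print substr($0, 1, start - 1) substr($0, stop + 1) }'
}

# First sequence from the question, positions 1 to 3 removed
remstr TCGACTAGCTACGACTCGGACTGACGAGCTACGACTACGG 1 3
```

This prints ACTAGCTACGACTCGGACTGACGAGCTACGACTACGG, the same shortened sequence as in the output above.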
Updated the answer:
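awk '
NR==FNR && NF {
    # first file (1.fasta): store each sequence under its numeric ID
    id=substr($1,2)
    getline seq
    a[id]=seq
    next
}
($1 in a) && NF {
    # second file (1.blast): drop positions $7..$8 (1-based, inclusive)
    a[$1]=substr(a[$1],1,$7-1) substr(a[$1],$8+1)
    print ">"$1"\n"a[$1]
} ' 1.fasta 1.blast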

Resources