bash for loops not looping (awk, bash, linux)

Here is a sample dataset (10 cols, 2 rows):
8 1 4 10 7 9 2 3 6 5
0.001475 10.001 20.25 30.5 40.75 51 61.25 71.5 81.75 92
I would like to output ten files for each dataset. Each file will contain a unique value from the second row, and the filename will contain the value from the corresponding column of the first row (for example, a file containing 0.001475 would be named foo_bar_8.1D).
See my code below, intended for use on the following datasets:
OrderTimesKC_voxel_tuning_1.txt
OrderTimesKC_voxel_tuning_2.txt
OrderTimesKC_voxel_tuning_3.txt
OrderTimesKC_voxel_tuning_4.txt
OrderTimesKC_voxel_tuning_5.txt
Script:
subj='KC'
for j in {1..5}; do
    for x in {1..10}; do
        a=$(awk 'FNR == 1 {print $"$x"}' OrderTimes"$subj"_voxel_tuning_"$j".txt) #a == row 1, column x
        b=$(awk 'FNR == 2 {print $"$x"}' OrderTimes"$subj"_voxel_tuning_"$j".txt) #b == row 2, column x
        echo $b > voxTim_"$subj"_"$j"_"$a".1D
    done
done
The files currently produced are:
voxTim_KC_1_8?1?4?10?7?9?2?3?6?5.1D
voxTim_KC_2_8?1?4?10?7?9?2?3?6?5.1D
voxTim_KC_3_8?1?4?10?7?9?2?3?6?5.1D
voxTim_KC_4_8?1?4?10?7?9?2?3?6?5.1D
voxTim_KC_5_8?1?4?10?7?9?2?3?6?5.1D
Each of these files contains ten values, indicating that the loop is not working correctly.
What I want is:
voxTim_KC_1_1.1D, voxTim_KC_1_2.1D, voxTim_KC_1_3.1D.....
voxTim_KC_2_1.1D, voxTim_KC_2_2.1D, voxTim_KC_2_3.1D.....
and so on..
Thank you!

awk to the rescue!
You can use awk more effectively. For example, this script extracts the two rows from each input file and creates 10 files (or however many columns there are) containing the data:
$ awk 'FNR==1{c++; n=split($0,r1); next}
       FNR==2{split($0,r2);
       for(i=1;i<=n;i++) print r2[i] > "file."c"."r1[i]".1D"}' input1 input2
This will create a set of files for the given input1 and input2 files: c counts the input files, split() stores row 1 in r1, and each value of row 2 is written to its own file. You can use this as a template and get rid of the for loops.
For example
$ tail -n 2 *
==> input1 <==
8 1 4 10 7 9 2 3 6 5
0.001475 10.001 20.25 30.5 40.75 51 61.25 71.5 81.75 92
==> input2 <==
98 91 94 910 97 99 92 93 96 95
0.001475 10.001 20.25 30.5 40.75 51 61.25 71.5 81.75 92
after running the script
$ ls
file.1.1.1D file.1.2.1D file.1.4.1D file.1.6.1D file.1.8.1D file.2.91.1D file.2.92.1D file.2.94.1D file.2.96.1D file.2.98.1D input1
file.1.10.1D file.1.3.1D file.1.5.1D file.1.7.1D file.1.9.1D file.2.910.1D file.2.93.1D file.2.95.1D file.2.97.1D file.2.99.1D input2
and contents
$ tail -n 2 file.1*
==> file.1.1.1D <==
10.001
==> file.1.10.1D <==
30.5
==> file.1.2.1D <==
61.25
==> file.1.3.1D <==
71.5
==> file.1.4.1D <==
20.25
etc...
Actually, you can simplify it further to
$ awk 'FNR==1{c++; n=split($0,r1)}
       FNR==2{for(i=1;i<=n;i++) print $i > ("file."c"."r1[i]".1D")}' input1 input2
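Applied to the question's actual filenames, the same template might look like this (a sketch only; it hard-codes the subject KC as in the question, assumes gawk, and relies on brace expansion supplying the five files in numeric order so that the file counter c matches the _1 .. _5 suffix; close() keeps the number of simultaneously open output files down):
$ awk 'FNR==1{c++; n=split($0,r1); next}
       FNR==2{for(i=1;i<=n;i++) {f="voxTim_KC_" c "_" r1[i] ".1D"; print $i > f; close(f)}}' OrderTimesKC_voxel_tuning_{1..5}.txt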

Just with bash. (The original script fails because $"$x" sits inside single quotes, so the shell never expands $x; awk evaluates $"", i.e. $0, and returns the whole row each time.)
subj=KC
for j in {1..5}; do
    {
        read -ra a   # read the 1st line into array 'a'
        read -ra b   # read the 2nd line into array 'b'
        for i in {0..9}; do
            echo "${b[i]}" > "voxTim_${subj}_${j}_${a[i]}.1D"
        done
    } < "OrderTimes${subj}_voxel_tuning_${j}.txt"
done
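For reference, the original script could also be repaired in place: pass the shell loop variable to awk with -v instead of trying to expand it inside single quotes (a sketch of just the two fixed lines, the rest of the loop unchanged):
a=$(awk -v x="$x" 'FNR == 1 {print $x}' OrderTimes"$subj"_voxel_tuning_"$j".txt)
b=$(awk -v x="$x" 'FNR == 2 {print $x}' OrderTimes"$subj"_voxel_tuning_"$j".txt)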

Related

Compare two files with different numbers of columns and print matching rows to a new file

I have two files with more than 10000 rows each. File1 has 1 column and File2 has 4 columns (shown side by side below):
23 23 88 90 0
34 43 74 58 5
43 54 87 52 3
54 73 52 35 4
. .
. .
I want to compare each value in file-1 with the first column of file-2. If it exists there, print the value along with the other three values from file-2. In this example the output will be:
23 88 90 0
43 74 58 5
54 87 52 3
.
.
I have written the following script, but it is taking too much time to execute:
s1=1; s2=$(wc -l < File1.txt)
while [ $s1 -le $s2 ]
do
    n=$(awk 'NR=="$s1" {print $1}' File1.txt)
    p1=1; p2=$(wc -l < File2.txt)
    while [ $p1 -le $p2 ]
    do
        awk '{if ($1==$n) printf ("%s %s %s %s\n", $1, $2, $3, $4);}' > ofile.txt
        (( p1++ ))
    done
    (( s1++ ))
done
Is there any short/ easy way to do it?
You can do it very concisely using awk:
awk 'FNR==NR{found[$1]++; next} $1 in found'
Test
>>> cat file1
23
34
43
54
>>> cat file2
23 88 90 0
43 74 58 5
54 87 52 3
73 52 35 4
>>> awk 'FNR==NR{found[$1]++; next} $1 in found' file1 file2
23 88 90 0
43 74 58 5
54 87 52 3
What does it do?
FNR==NR checks whether FNR, the per-file record number, equals NR, the overall record number. The two are equal only while the first file, file1, is being read, because FNR is reset to 1 each time awk opens a new file.
{found[$1]++; next} If the check is true, this creates an entry in an associative array indexed by $1, the first column of file1, and skips to the next record.
$1 in found This check only runs for the second file, file2. If the column-1 value $1 is an index in the associative array found, the entire line is printed (no action is written because printing is awk's default action).
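Written with the default print action spelled out, the same one-liner reads:
awk 'FNR==NR{found[$1]++; next} $1 in found {print}' file1 file2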

keep groups of lines with specific keywords (bash)

I have a text file with plenty of lines in this format (the lines between every two # form a group):
# some str for test
hdfv 12 9 b
cgj 5 11 t
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
---
key.txt:
string to
---
output:
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
I want to search for keywords ("string", "to") in the lines that start with #; if the keywords do not exist in key.txt (a file with two columns), I should remove that line and the following lines of that group. I've written this code without result (the keywords appear together in the input file, as in the example):
cat input.txt | while IFS=$'#' read -r -a myarray
do
    a=${myarray[1]}
    b=${myarray[0]}
    unset IFS
    read -r a x y z <<< "$a"
    key=$(echo "$x $y")
    if grep "$key" key.txt > /dev/null
    then
        echo $key exists
    else
        grep -v -e "$a" -e "$b" input.txt > $$ && mv $$ input.txt
    fi
done
Can someone help me?
A simple way to get the correct block is to use awk with the right record separator (RS):
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) print}' key.txt input.txt
another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
This version reinserts the # that was consumed and removes the extra empty line. There may be simpler ways to do this, but it works.
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) {sub(/^ /,RS);sub(/\n$/,x);print}}' key.txt input.txt
#another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
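A variant of the same idea (a sketch, not part of the original answer): rather than changing RS mid-stream, set it only for the data file by placing the assignment between the file arguments, so key.txt is still read line by line and each matching group is printed with its # restored:
awk 'FNR==NR{a[$0]; next} {for (k in a) if ($0 ~ k) printf "#%s", $0}' key.txt RS='#' input.txt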

AWK--Comparing the value of two variables in two different files

I have two text files, A.txt and B.txt. Each line of A.txt contains a single value.
A.txt
100
222
398
B.txt
1 2 103 2
4 5 1026 74
7 8 209 55
10 11 122 78
What I am looking for is something like this:
for each line of A
search B;
if (the value of third column in a line of B - the value of the variable in A > 10)
print that line of B;
Is there any awk way of doing that?
How about something like this?
I had some trouble understanding your question, but maybe this will give you some pointers.
#!/bin/bash
# Read the interesting values from file2 into an array,
for line in $(cat 2.txt | awk '{print $3}')
do
    arr+=($line)
done
# Line counter,
linenr=0
# Loop through every line in file 1,
for val in $(cat 1.txt)
do
    # Increment line counter,
    ((linenr++))
    # Loop through every element in the array (the values from the 3rd column of file2)
    for el in "${!arr[@]}"
    do
        # If that value minus the value from file 1 is bigger than 10, print the matching line
        if [[ $((${arr[$el]} - $val)) -gt 10 ]]
        then
            sed -n "$(($el+1))p" 2.txt
            # echo "Value ${arr[$el]} (on line $(($el+1)) from 2.txt) - $val (on line $linenr from 1.txt) equals $((${arr[$el]} - $val)) and is hence bigger than 10"
        fi
    done
done
Note: this is a quick and dirty solution and there is room for improvement, but I think it'll do the job.
Use awk like this:
cat f1
1
4
9
16
cat f2
2 4 10 8
3 9 20 8
5 1 15 8
7 0 30 8
awk 'FNR==NR{a[NR]=$1;next} $3-a[FNR] < 10' f1 f2
2 4 10 8
5 1 15 8
UPDATE: Based on OP's edited question:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) print}' f1 f2
Note how simple the awk-based solution is compared to the nested for loops.
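One caveat (not in the original answer): for (i in a) prints a line of f2 once per value of f1 that satisfies the condition, so the same line can appear several times. If each line should be printed at most once, break after the first match, e.g.:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) {print; break}}' f1 f2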

Cutting Element in Unix Based on Column Value

Without a shell script, in a single line: what command can help you drop a row based on a column's value?
For example:
In
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
If the 3rd column is not zero, how do I remove the row entirely in one statement? Is this possible in Unix?
Assuming that the question is really asking about 'if the third column is non-zero, do not print it' or (equivalently) 'only print the row if the third column is 0':
Using awk:
awk '$3 == 0' data
(If the third column is zero, print the input; otherwise, ignore it. You could add { print } after the 0 to make the action explicit.)
Using perl:
perl -nae 'print if $F[2] == 0' data
Using sed:
sed -n '/ 0$/p' data
Using grep:
grep '[^0-9]0$' input
(Both the sed and grep versions rely on the third column being the last one on the line.)
This does the replacement in place:
perl -i -F -pane 'undef $_ if($F[2]!=0)' your_file
tested:
> cat temp
118 Balboni,Steve 23
11 Baker,Doug 0
120 Armas,Tony 13
133 Allanson,Andy 5
158 Baines,Harold 13
33 Bando,Chris 1
44 Adduci,James 1
50 Aguayo,Luis 3
5 Allen,Rod 0
94 Anderson,Brady 1
>
>
> perl -i -F -pane 'undef $_ if($F[2]!=0)' temp
> cat temp
11 Baker,Doug 0
5 Allen,Rod 0
>
If you wish to print lines that have no third column as well as those in which the 3rd column is explicitly 0 (i.e., if you consider a blank field to be zero), try:
awk '!$3'
If you do not want to print lines with only 2 columns, try:
awk 'NF>2 && !$3'
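If you want the awk version to edit the file in place like the perl one, GNU awk 4.1 or later has an inplace extension (this assumes gawk is available; plain awk has no in-place mode):
gawk -i inplace '$3 == 0' data   # keeps only rows whose 3rd column is 0, rewriting 'data'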

How to extract one column from multiple files, and paste those columns into one file?

I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like this (columns in the order problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2, etc.):
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
    l=`echo $i | sed 's/sample/extracted_col_/'`
    `awk '{print $5, $10}' $i > $l`
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Add the file's line number to an array, with column 5 as the value.
(a[FNR] ? a[FNR] FS : "") is a ternary expression used to build up the array value as a record. It asks whether this line number is already an index in the array. If so, take the existing value followed by the default field separator and append the fifth column to it; if the line number is not yet in the array, don't prepend anything and just use the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
For only ~4000 files, you should be able to do:
find . -name 'sample_problem*_part*.txt' | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name 'sample_problem*_part*.txt' | sort ... | xargs paste
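Note that a plain sort orders names lexically, so sample_problem10 would come before sample_problem2; assuming GNU sort is available, -V (version sort) yields the natural numeric order:
find . -name 'sample_problem*_part*.txt' | sort -V | xargs paste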
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python transpose.py ?    # '?' is the placeholder for missing values
where transpose.py is:
#!/usr/bin/env python
"""Write lines from stdin as columns to stdout."""
import sys
from itertools import izip_longest

missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in izip_longest(*[column.split() for column in sys.stdin],
                        fillvalue=missing_value):
    print " ".join(row)
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').
Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l sample_problem1_part1.txt | cut -d' ' -f1)
for ((i=1; i<=$lines; i++)); do
    for file in sample_problem*; do
        # get line number $i and delete everything except the last column,
        # then print it; echo -n means that no newline is appended
        echo -n $(sed -n ${i}'s%.*\ %%p' $file)" "
    done
    echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The time for my script increases linearly with the number of lines. It would take a very long time to merge files with 1000 lines. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!
You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt
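With thousands of files, spelling out one process substitution per file stops being practical. One alternative (a sketch; the .col5 suffix and extracted_all.txt are made-up names, and it assumes filenames without whitespace) extracts each fifth column into its own temporary file, then pastes them in version-sorted order:
# extract column 5 of every part file into a temporary .col5 file
for f in sample_problem*_part*.txt; do
    awk '{print $5}' "$f" > "$f.col5"
done
# paste the temporary columns side by side in natural version order (GNU ls -v)
paste $(ls -1v *.col5) > extracted_all.txt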
