Print statement within a loop - linux

I have a few text files named file1.txt, file2.txt and so on.
I would like to print the mean of each file after applying some weights to it. My script is
#!/bin/sh
m1=3.2; m2=1.2; m3=0.2 #mean of file1.txt, file2.txt ...
for i in {1..100} #files
do for j in 20 30 35 45 #weightages
do
k=m$i*$j #This is an example, calculated as mean of file$i.txt * j
printf "%5s %8.3f\n" "$i" "$k" >> ofile.txt
done
done
The above is printing as
ofile.txt
1 64
1 96
1 112
1 144
2 24
2 36
. .
Desired output format:
ofile.txt
1 64 96 112 144
2 24 36 42 54
3 4 6 7 9
. . . . .
where the 1st column is the file number and the 2nd to 5th columns are m*j

This is off the top of my head, so you might need to correct a few things.
#!/bin/bash
m1=3.2; m2=1.2; m3=0.2 #mean of file1.txt, file2.txt ...
for i in {1..100} #files
do
printf "%5s" "$i" >> ofile.txt
for j in 20 30 35 45 #weightages
do
k=m$i*$j #This is an example, calculated as mean of file$i.txt * j
printf "\t%8.3f" "$k" >> ofile.txt
done
printf "\n" >> ofile.txt
done

#!/bin/bash
m1=3.2; m2=1.2; m3=0.2 #mean of file1.txt, file2.txt ...
for i in {1..100} #files
do
ofile_line="$i " #start each row with the file number
for j in 20 30 35 45 #weightages
do
k=m$i*$j #This is an example, calculated as mean of file$i.txt * j
support=$(printf "%8.3f" "$k")
ofile_line="${ofile_line}${support} "
done
echo "${ofile_line}" >> ofile.txt
done
You don't need a \n in echo "${ofile_line}" >> ofile.txt because echo breaks the line for you.
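In both answers the line k=m$i*$j is only a placeholder, so here is a minimal sketch that actually does the multiplication. It assumes the means are already known and are passed in as arguments; weighted_rows is a made-up helper name, and awk is used because the shell itself has no floating-point arithmetic.

```shell
# weighted_rows is a hypothetical helper, not part of the original scripts.
# $1: space-separated weights; remaining arguments: one mean per file.
weighted_rows() {
    weights=$1
    shift
    i=0
    for mean in "$@"; do
        i=$((i + 1))
        # Start the row with the file number, without a newline.
        printf '%d' "$i"
        for w in $weights; do
            # Delegate the floating-point product to awk.
            printf ' %s' "$(awk -v m="$mean" -v w="$w" 'BEGIN { printf "%g", m * w }')"
        done
        # Finish the row once all weights are printed.
        printf '\n'
    done
}

# Example values taken from the question.
weighted_rows "20 30 35 45" 3.2 1.2 0.2
```

Redirect the call to ofile.txt to get the file from the question. Swap "%g" for "%8.3f" if you want fixed-width columns.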

Related

Print the count of files in a specific format iteratively in shell script

I have the following folder structure:
A/B/C/D/E/00
A/B/C/D/E/01
.
.
A/B/C/D/E/23
Similarly,
M/N/O/P/Q/00
M/N/O/P/Q/01
.
.
M/N/O/P/Q/23
Now, each folder from 00 to 23 has many files inside, which I would like to count.
If I run this simple command:
ls /A/B/C/D/E/00 | wc -l
I can get the count of files in each of these sub directories. I want to automate this or get it iteratively. Can anyone suggest a way?
Also, the final output I am looking at is a file that should look like this:
C E RESULT OF ls /A/B/C/D/E/00 | wc -l RESULT OF ls /A/B/C/D/E/01 | wc -l
M Q RESULT OF ls /M/N/O/P/Q/00 | wc -l RESULT OF ls /M/N/O/P/Q/01 | wc -l
So, the output should look like this finally
C E 23 23 4 6 7 4 76 98 57 2 67 9 12 34 67 0 2 3 78 98 12 3 57 213
M Q 12 10 2 34 32 1 35 65 87 8 32 2 65 87 98 0 4 12 1 35 34 76 9 67
Please note that the values after the letters are the file counts of the 24 folders 00, 01 through 23.
Using the eval approach: I can hardcode and get the exact results. But, I wanted it in a way that would show me the data for the previous day. So this is what I did:
d=$(date --date="1 days ago" +%Y%m%d)
month=$(date +%Y%m)
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{20150800..20150823})'| wc -l)"'
This works perfectly because in the given location there are files inside child directories 20150800,20150801..20150823. However when I try to generalize this like below, it gives me the total count of the folder instead of the count of each sub folder:
eval echo YZ $d '"$(ls "/A/B/YZ/$month/$d/"'{"$d"00.."$d"23})'| wc -l)"'
Something like this (not tested):
for d in [A-Z]/[A-Z]/[A-Z]/[A-Z]/[A-Z]/[0-9][0-9]
do
[[ -d $d ]] && echo "$d" : $(ls "$d" | wc -l)
done
Note that this gives an incorrect count if one of the file names contains a newline character.
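To get the one-row-per-parent layout the question asks for, something along these lines might work. It is only a sketch: count_row is a made-up helper name, the label ("C E" and so on) is passed in by the caller rather than derived from the path, and hidden files are not counted, the same as with plain ls. It counts glob matches instead of parsing ls, so file names containing newlines are counted once each.

```shell
# count_row is a hypothetical helper: prints "<label> <count> <count> ..."
# for every two-digit subdirectory of the given parent directory.
# The body runs in a subshell, so "set --" does not disturb the caller.
count_row() (
    label=$1 parent=$2
    printf '%s' "$label"
    for sub in "$parent"/[0-9][0-9]/; do
        [ -d "$sub" ] || continue
        # Load the directory entries into the positional parameters.
        set -- "$sub"*
        # An unmatched glob stays literal, so treat that case as zero entries.
        [ -e "$1" ] || [ -L "$1" ] || set --
        printf ' %d' "$#"
    done
    printf '\n'
)
```

Usage would be something like count_row "C E" /A/B/C/D/E >> counts.txt, repeated for each parent directory.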

keep groups of lines with specific keywords (bash)

I have a text file with plenty of lines in this format (the lines between every two # defined as a group):
# some str for test
hdfv 12 9 b
cgj 5 11 t
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
---
key.txt:
string to
---
output:
# another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
I need to check the keywords (string, to) in each line that starts with #. If the keywords do not appear in key.txt (a file with two columns), I should remove that line and the following lines of that group. I've written this code, without success (the keywords appear together in the input file, as in the example):
cat input.txt | while IFS=$'#' read -r -a myarray
do
a=${myarray[1]}
b=${myarray[0]}
unset IFS
read -r a x y z <<< "$a"
key=$(echo "$x $y")
if grep "$key" key.txt > /dev/null
then
echo $key exists
else
grep -v -e "$a" -e "$b" input.txt > $$ && mv $$ input.txt
fi
done
Can someone help me?
A simple way to get the correct block is to use awk with the right record separator:
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) print}' key.txt input.txt
another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
This should reinsert the # that is used and remove the extra empty line. There may be simpler ways to do this, but this works.
awk 'FNR==NR {a[$0];next} { RS="#";for (i in a) if ($0~i) {sub(/^ /,RS);sub(/\n$/,x);print}}' key.txt input.txt
#another string to examine
kinj 58 96 f
dfg 7 26 u
fds 9 76 j
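If changing RS feels fragile, a line-by-line version is another option. This is a rough sketch under a few assumptions: keep_groups is a made-up name, key.txt holds one phrase per line and is not empty, and the match is a literal substring test on the header line rather than a regex. It leaves the # headers untouched, so nothing has to be reinserted.

```shell
# keep_groups is a hypothetical helper.
# $1: file of key phrases (one per line); $2: input whose groups start with "#".
keep_groups() {
    awk '
        # First file: remember every non-empty phrase.
        FNR == NR { if ($0 != "") keys[$0]; next }
        # A header line starts a new group; decide whether to keep it.
        /^#/ {
            keep = 0
            for (k in keys)
                if (index($0, k)) { keep = 1; break }
        }
        # Print the header and body lines of kept groups only.
        keep
    ' "$1" "$2"
}
```

Called as keep_groups key.txt input.txt > filtered.txt, it writes the surviving groups to a new file instead of editing input.txt while it is being read.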

AWK--Comparing the value of two variables in two different files

I have two text files, A.txt and B.txt. Each line of A.txt holds a single number:
A.txt
100
222
398
B.txt
1 2 103 2
4 5 1026 74
7 8 209 55
10 11 122 78
What I am looking for is something like this:
for each line of A
search B;
if (the value of third column in a line of B - the value of the variable in A > 10)
print that line of B;
Is there an awk way to do that?
How about something like this?
I had some trouble understanding your question, but maybe this will give you some pointers:
#!/bin/bash
# Read interesting values from file2 into an array,
for line in $(cat 2.txt | awk '{print $3}')
do
arr+=($line)
done
# Linecounter,
linenr=0
# Loop through every line in file 1,
for val in $(cat 1.txt)
do
# Increment linecounter,
((linenr++))
# Loop through every index of the array (containing values from the 3rd column of file2)
for el in "${!arr[@]}";
do
# If that value - the value from file 1 is bigger than 10, print values
if [[ $((${arr[$el]} - $val )) -gt 10 ]]
then
sed -n "$(($el+1))p" 2.txt
# echo "Value ${arr[$el]} (on line $(($el+1)) from 2.txt) - $val (on line $linenr from 1.txt) equals $((${arr[$el]} - $val )) and is hence bigger than 10"
fi
done
done
Note: this is quick and dirty and there is room for improvement, but I think it'll do the job.
Use awk like this:
cat f1
1
4
9
16
cat f2
2 4 10 8
3 9 20 8
5 1 15 8
7 0 30 8
awk 'FNR==NR{a[NR]=$1;next} $3-a[FNR] < 10' f1 f2
2 4 10 8
5 1 15 8
UPDATE: Based on OP's edited question:
awk 'FNR==NR{a[NR]=$1;next} {for (i in a) if ($3-a[i] > 10) print}' f1 f2
Note how much simpler the awk-based solution is than the nested for loops.
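With the updated one-liner a line of the second file is printed once for every value in the first file that it exceeds by more than 10, so duplicates are expected. Here is a small sketch that labels each printed line with the value it was compared against. compare_files is a made-up name, and the indexed loop is only there to keep the output order predictable.

```shell
# compare_files is a hypothetical helper.
# $1: file with one number per line; $2: file whose third column is tested.
compare_files() {
    awk '
        # First file: store the numbers in input order.
        FNR == NR { a[++n] = $1; next }
        # Second file: test the third column against every stored number.
        {
            for (i = 1; i <= n; i++)
                if ($3 - a[i] > 10)
                    print a[i] ": " $0
        }
    ' "$1" "$2"
}
```

For the A.txt and B.txt from the question, compare_files A.txt B.txt prints the 1026 line three times (once each for 100, 222 and 398) and the 209 and 122 lines once each, for 100.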

print $i unless $i is less than 10, using awk or otherwise

I have some data with a series of values on each line like this:
49.01024263 49.13389087 49.38177387 (more numbers...)
42.71585143 43.48711477 44.25625756 (etc.)
43.18826160 43.15332580 43.13094893
30.69076014 28.74489096 26.85725970
Eventually the numbers reach values less than 10; at that point I'd like to delete all the remaining numbers in that line.
So far I have this, but it's returning several errors.
awk '{for (i=1;i++)do{if ($i > 10.0 ) print $i ; next ; else ; exit}}' input > output
What could I be doing wrong?
Any better ways to carry out this task?
try this line:
awk '{for(i=1;i<=NF;i++)if($i>10)printf "%s ",$i;else break;print ""}' file
test with an example:
kent$ cat f
30 20 15 9 8
50 40 30 20 7 2000
100 200 300 400 5 444
kent$ awk '{for(i=1;i<=NF;i++)if($i>10)printf "%s ",$i;else break;print ""}' f
30 20 15
50 40 30 20
100 200 300 400
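A variation on the same idea, in case it helps. The one-liner above stops at the first value that is not greater than 10 and leaves a trailing space on each line. The sketch below takes the cut-off as an optional argument, keeps values equal to it (the question only says "less than 10"), and builds each line without the trailing space. truncate_below is a made-up name.

```shell
# truncate_below is a hypothetical helper; it reads lines from stdin.
# $1: cut-off value (defaults to 10).
truncate_below() {
    awk -v min="${1:-10}" '{
        out = ""
        # Copy fields until the first one that is below the cut-off.
        for (i = 1; i <= NF && $i + 0 >= min; i++)
            out = out (i > 1 ? OFS : "") $i
        print out
    }'
}
```

Use it as truncate_below < input > output, or truncate_below 5 < input for a different cut-off.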

How to extract one column from multiple files, and paste those columns into one file?

I want to extract the 5th column from multiple files, named in a numerical order, and paste those columns in sequence, side by side, into one output file.
The file names look like:
sample_problem1_part1.txt
sample_problem1_part2.txt
sample_problem2_part1.txt
sample_problem2_part2.txt
sample_problem3_part1.txt
sample_problem3_part2.txt
......
Each problem file (1,2,3...) has two parts (part1, part2). Each file has the same number of lines.
The content looks like:
sample_problem1_part1.txt
1 1 20 20 1
1 7 21 21 2
3 1 22 22 3
1 5 23 23 4
6 1 24 24 5
2 9 25 25 6
1 0 26 26 7
sample_problem1_part2.txt
1 1 88 88 8
1 1 89 89 9
2 1 90 90 10
1 3 91 91 11
1 1 92 92 12
7 1 93 93 13
1 5 94 94 14
sample_problem2_part1.txt
1 4 330 30 a
3 4 331 31 b
1 4 332 32 c
2 4 333 33 d
1 4 334 34 e
1 4 335 35 f
9 4 336 36 g
The output should look like: (in a sequence of problem1_part1, problem1_part2, problem2_part1, problem2_part2, problem3_part1, problem3_part2,etc.,)
1 8 a ...
2 9 b ...
3 10 c ...
4 11 d ...
5 12 e ...
6 13 f ...
7 14 g ...
I was using:
paste sample_problem1_part1.txt sample_problem1_part2.txt > \
sample_problem1_partall.txt
paste sample_problem2_part1.txt sample_problem2_part2.txt > \
sample_problem2_partall.txt
paste sample_problem3_part1.txt sample_problem3_part2.txt > \
sample_problem3_partall.txt
And then:
for i in `find . -name "sample_problem*_partall.txt"`
do
l=`echo $i | sed 's/sample_/extracted_col_/'`
awk '{print $5, $10}' $i > $l
done
And:
paste extracted_col_problem1_partall.txt \
extracted_col_problem2_partall.txt \
extracted_col_problem3_partall.txt > \
extracted_col_problemall_partall.txt
It works fine with a few files, but it's a crazy method when the number of files is large (over 4000).
Could anyone help me with simpler solutions that are capable of dealing with multiple files, please?
Thanks!
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Store column 5 in an array, keyed by the file's line number (FNR).
(a[FNR] ? a[FNR] FS : "") is a ternary operation that builds up the array's value as a record. It asks whether the array already holds a value for this line number. If so, it takes the existing value followed by the default field separator before appending the fifth column. Otherwise it prepends nothing and the value is just the fifth column.
At the end of the script:
Use a C-style loop to iterate through the array, printing each of the array's values.
For only ~4000 files, you should be able to do:
find . -name 'sample_problem*_part*.txt' | xargs paste
If find is giving names in the wrong order, pipe it to sort:
find . -name 'sample_problem*_part*.txt' | sort ... | xargs paste
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python transpose.py ?
where transpose.py:
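#!/usr/bin/env python
"""Write lines from stdin as columns to stdout."""
import sys
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2
# Placeholder used where a column has fewer rows than the longest one
missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in zip_longest(*[column.split() for column in sys.stdin],
                       fillvalue=missing_value):
    print(" ".join(row))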
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').
Try this one. My script assumes that every file has the same number of lines.
# get number of lines
lines=$(wc -l sample_problem1_part1.txt | cut -d' ' -f1)
for ((i=1; i<=$lines; i++)); do
for file in sample_problem*; do
# get line number $i and delete everything except the last column
# and then print it
# echo -n means that no newline is appended
echo -n $(sed -n ${i}'s%.*\ %%p' $file)" "
done
echo
done
This works. For 4800 files, each 7 lines long, it took 2 minutes 57.865 seconds on an AMD Athlon(tm) X2 Dual Core Processor BE-2400.
PS: The time for my script increases linearly with the number of lines. It would take a very long time to merge files with 1000 lines. You should consider learning awk and using the script from steve. I tested it: for 4800 files, each with 1000 lines, it took only 65 seconds!
You can pass awk output to paste and redirect it to a new file as follows:
paste <(awk '{print $3}' file1) <(awk '{print $3}' file2) <(awk '{print $3}' file3) > file.txt
