cat one file to multiple files and output multiple files - cat

I'm trying to cat 1 main-file to multiple single files. The output file should be named main-file_file1.
For example
main-file + file1 = main-file_file1
main-file + file2 = main-file_file2
main-file + file3 = main-file_file3
...
main-file + fileN = main-file_fileN
I guess I could
cat main-file file1 > main-file_file1
but I have 100 files to cat onto main-file, so that won't be very efficient.
Any suggestions?

You need a bash for loop (assuming your shell is bash)! In your case you would do something like this:
for i in {1..100}; do cat main-file "file$i" > "main-file_file$i"; done
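If the files don't follow a strict file1 through file100 naming scheme, a glob-based loop is a sketch that works for arbitrary names (assuming every input file name starts with "file"):
for f in file*; do cat main-file "$f" > "main-file_$f"; done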

Related

Looping through a table and appending information from that table to another file

This is my first post and I'm fairly new to bash scripting.
We ran some experiments where I work, and to plot them in gnuplot we need to append a reaction label to each result.
We have a file that looks like this:
G135b CH2O+HCO=O2+C2H3
R020b 2CO+H=OH+C2O
R021b 2CO+O=O2+C2O
and a result file (which I can't access right now, sorry) whose first column matches that of the file shown above, followed by multiple values. The lines are not in the same order.
Now I want to loop through the result file, take the value of the first column, search for it in the file shown, and append the reaction label to that line.
How can I loop through all the lines of the result file and store the value of the first column in a temporary variable?
I want to use this variable like this:
grep "^$var" shownfile | awk '{print $2}'
(which gives back something like: CH2O+HCO=O2+C2H3)
How can I append that result to the corresponding line of the result file?
Edit: I also wrote a script to go from a file that looks like this:
G135b : 0.178273 C H 2 O + H C O = O 2 + C 2 H 3
to this:
G135b CH2O+HCO=O2+C2H3
which is:
#!/bin/bash
# work on newfile in the current directory
dir=$(pwd)
cut -f1,3 "$dir/newfile" > tmpfile
sed -i 's/://g' tmpfile   # strip the colons
sed -i 's/ //g' tmpfile   # strip the spaces
cp tmpfile newfile
How do I make cut edit a file in place, the way sed -i does? My workaround is pretty ugly because it creates another file in the current directory.
Thank you :)
The join command works here: by default it performs an inner join of two files on the first column of each. Note that join expects both inputs to be sorted on the join field; the sample files below already are.
$ cat data
G135b CH2O+HCO=O2+C2H3
R020b 2CO+H=OH+C2O
R021b 2CO+O=O2+C2O
$ cat result_file
G135b a b c
R020b a b
R021b a b x y z
$ join data result_file
G135b CH2O+HCO=O2+C2H3 a b c
R020b 2CO+H=OH+C2O a b
R021b 2CO+O=O2+C2O a b x y z
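If the real files are not already sorted on the first column, process substitution keeps it a one-liner (a sketch):
$ join <(sort data) <(sort result_file)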
Using awk, it would be something like:
# first pass (NR==FNR): remember the reaction label for each ID
NR == FNR { data[$1] = $2; next }
# second pass: append the label matching column 1
{ print $0 " " data[$1] }
Save that in a file called reactions.awk, then call awk -f reactions.awk shownfile resultfile.
A shorter variant collects column 2 of every line under its key, though it only keeps the second column of the result file and does not preserve line order:
awk '{a[$1]=a[$1]" "$2} END{for (i in a){print i a[i]}}' file1 file2

opening a file > replacing text in a loop in unix

I have a file abc.csv, which is as follows:
sn,test,control
1,file1,file2
2,file3,file4
I have another configuration file, text.cfg, as follows:
model.input1 = /usr/bin/file1
model.input2 = /usr/bin/file2
Now I want to replace file1 and file2 with file3 and file4 respectively, then with file5 and file6, and so on, until abc.csv is exhausted.
My attempt at the problem is:
IFS=","
X=""
Y=""
while read NUM TEST CONTROL
do
    echo "Serial Number :$NUM"
    echo "Test File :$TEST"
    echo "Control File:$CONTROL"
    sed -i -e "s/$X/$TEST/g" -e "s/$Y/$CONTROL/g" text.cfg
    X=$TEST
    Y=$CONTROL
done < abc.csv
The output should be new config files, as follows. This is test1.cfg:
model.input1 = /usr/bin/file1
model.input2 = /usr/bin/file2
Then the second file should be test2.cfg:
model.input1 = /usr/bin/file3
model.input2 = /usr/bin/file4
and so on.
awk -F, 'NR>1{i = 0; while(++i < NF){print "model.input"i" = /usr/bin/"$(i+1) >> "test"$1".cfg"} }' File
Output:
AMD$ cat test1.cfg
model.input1 = /usr/bin/file1
model.input2 = /usr/bin/file2
AMD$ cat test2.cfg
model.input1 = /usr/bin/file3
model.input2 = /usr/bin/file4
Hope this helps.

concatenate files awk/linux

I have n files in a folder, each of which starts with lines as shown below.
##contig=<ID=chr38,length=23914537>
##contig=<ID=chrX,length=123869142>
##contig=<ID=chrMT,length=16727>
##samtoolsVersion=0.1.19-44428cd
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120
chr1 412573 SNP74 A C 2040.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1 602567 BICF2G630707977 A G 877.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 604894 BICF2G630707978 A G 2044.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 693376 . GCCCCC GCCCC 761.73 . AC=2;AC1=2;AF=1.00;AF1=1;
There are n such files. I want to concatenate all the files into a single file such that all lines beginning with # are deleted from every file and the remaining rows are concatenated, retaining only one header line. Example output is shown below:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120
chr1 412573 SNP74 A C 2040.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1 602567 BICF2G630707977 A G 877.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 604894 BICF2G630707978 A G 2044.77 PASS AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1 693376 . GCCCCC GCCCC 761.73 . AC=2;AC1=2;AF=1.00;AF1=1;
Specifically with awk:
awk '$0!~/^#/{print $0}' file1 file2 file3 > outputfile
Broken down: you check whether the line ($0) does not match (!~) a pattern beginning with # (/^#/) and, if so, print the line. You take the input files and write to (>) outputfile. Note that this also drops the #CHROM header line.
Your problem is not terribly well specified, but I think you are just looking for:
sed '/^##/d' $FILE_LIST > output
where FILE_LIST is the list of input files (you may be able to use a glob such as *). Note that this keeps the #CHROM header line from every input file, not just the first.
If I understood correctly, you could do:
echo "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT P922_120" > mergedfile
for file in $FILES; do grep -v "^#" "$file" >> mergedfile; done
Note that $FILES could be a glob such as *; the -v option makes grep select non-matching lines, and anchoring the pattern as ^# only drops lines that start with #.
I believe what you want is
awk '$0 ~ /^##/ { next } $0 ~ /^#/ && !printed_header { print; printed_header=1 } $0 !~ /^#/ { print }' file1 file2 file3
Or you can use grep like this:
grep -vh "^##" *
The -v means inverted matching, so the command means: look for all lines NOT starting with ## in all files, and don't print the filenames (-h).
Or, if you want to emit exactly one header line at the start:
(grep -hm1 "^#CHROM" * | head -1; grep -hv "^#" *) > out.txt

Searching for text

I'm trying to write a shell script that searches for text within a file and prints out the text and associated information to a separate file.
From this file containing list of gene IDs:
DDIT3 ENSG00000175197
DNMT1 ENSG00000129757
DYRK1B ENSG00000105204
I want to search for these gene IDs (ENSG*) and their RPKM1 and RPKM2 values in a gtf file:
chr16 gencodeV7 gene 88772891 88781784 0.126744 + . gene_id "ENSG00000174177.7"; transcript_ids "ENST00000453996.1,ENST00000312060.4,ENST00000378384.3,"; RPKM1 "1.40735"; RPKM2 "1.61345"; iIDR "0.003";
chr11 gencodeV7 gene 55850277 55851215 0.000000 + . gene_id "ENSG00000225538.1"; transcript_ids "ENST00000425977.1,"; RPKM1 "0"; RPKM2 "0"; iIDR "NA";
and print/write them to a separate output file:
Gene_ID RPKM1 RPKM2
ENSG00000108270 7.81399 8.149
ENSG00000101126 12.0082 8.55263
I've done it on the command line for each ID using:
grep -w "ENSGno" rnaseq.gtf | awk '{print $10,$13,$14,$15,$16}' > output.file
but when it comes to writing the shell script, I've tried various combinations of for, while, read, and do, changing the variables, but without success. Any ideas would be great!
You can do something like:
while read -r line
do
    var=$(echo "$line" | awk '{print $2}')
    grep -w "$var" rnaseq.gtf | awk '{print $10,$13,$14,$15,$16}' >> output.file
done < geneIDs.file
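A single-pass awk alternative avoids re-reading rnaseq.gtf once per gene ID (a sketch, assuming the ID is the second field of geneIDs.file and that field 10 of the gtf is the quoted, versioned gene_id as in the sample lines):
awk 'NR==FNR {ids[$2]; next}
     {gid=$10; gsub(/[";]/,"",gid); sub(/\.[0-9]+$/,"",gid)}
     gid in ids {print $10,$13,$14,$15,$16}' geneIDs.file rnaseq.gtf > output.file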

extracting data from two lists using a shell script

I am trying to create a shell script that pulls a line from a file and checks another file for an instance of the same. If it finds an entry, it adds it to another file, looping through the first list until it has gone through the whole file. The data in the first file looks like this -
email@address.com;
email2@address.com;
and so on.
The other file, in which I am looking for a match and placing the match in the blank file, looks like this -
12334 email@address.com;
32213 email2@address.com;
I want it to retain the numbers as well as the matching data. I have an idea of how this should work, but I need to know how to implement it.
My Idea
#!/bin/bash
read -p "enter first file name:" file1
read -p "enter second file name:" file2
FILE_DATA=( $(/bin/cat "$file1") )
FILE_DATA1=( $(/bin/cat "$file2") )
for i in "${!FILE_DATA[@]}"
do
    echo "${FILE_DATA1[$i]}" | grep "${FILE_DATA[$i]}" >> output.txt
done
I want the output to look like this, but only for addresses that match -
12334 email@address.com;
32213 email2@address.com;
Thank You
I quite like manipulating text as if with SQL:
$ cat file1
b@address.com
a@address.com
c@address.com
d@address.com
$ cat file2
10712 e@address.com
11457 b@address.com
19985 f@address.com
22519 d@address.com
$ join -1 1 -2 2 <(sort file1) <(sort -k2 file2) | awk '{print $2,$1}'
11457 b@address.com
22519 d@address.com
Sort both files on the key (we use the emails as keys here).
Join on the keys (file1 column 1, file2 column 2).
Format the output (use awk to swap the columns back).
As you've learned about diff and comm, now it's time to learn about another tool in the unix toolbox, join.
Join does just what the name indicates: it joins two files together, based on keys embedded in the files.
The number one constraint on using join is that both files must be sorted on the same column.
file1
a abc
b bcd
c cde
file2
a rec1
b rec2
c rec3
join file1 file2
a abc rec1
b bcd rec2
c cde rec3
You can consult the join man page for how to reduce and reorder the columns of output. For example:
join -o 1.1,2.2 file1 file2
a rec1
b rec2
c rec3
You can use your code for file name input to turn this into a generalizable script.
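For instance, a minimal sketch reusing the read -p prompts from your draft:
#!/bin/bash
read -p "enter first file name:" file1
read -p "enter second file name:" file2
join -1 1 -2 2 <(sort "$file1") <(sort -k2 "$file2") | awk '{print $2,$1}' > output.txt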
Your solution using a pipeline inside a for loop will work for small sets of data, but as the data grows, the cost of starting a new process for each word you search for will drag down the run time.
I hope this helps.
Read file1.txt line by line, assigning each line to the variable ADDR; grep file2.txt for the contents of ADDR and append the output to file_result.txt.
while read ADDR; do grep "${ADDR}" file2.txt >> file_result.txt; done < file1.txt
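grep can also take the whole list of patterns from a file in one pass; a sketch (-F treats each address as a fixed string rather than a regex):
grep -Ff file1.txt file2.txt > file_result.txt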
This awk one-liner can help you do that:
awk 'NR==FNR{a[$1]++;next}($2 in a){print $0 > "f3.txt"}' f1.txt f2.txt
NR and FNR are awk's built-in variables that store line numbers. NR does not get reset to 0 when moving to the second file, but FNR does. So while NR==FNR is true, we add every first-column value to the array a. Once the first file is done, we check the second column of the second file: if a match is present in the array, we put the entire line in f3.txt; if not, we ignore it.
Using data from Kev's solution:
[jaypal:~/Temp] cat f1.txt
b@address.com
a@address.com
c@address.com
d@address.com
[jaypal:~/Temp] cat f2.txt
10712 e@address.com
11457 b@address.com
19985 f@address.com
22519 d@address.com
[jaypal:~/Temp] awk 'NR==FNR{a[$1]++;next}($2 in a){print $0 > "f3.txt"}' f1.txt f2.txt
[jaypal:~/Temp] cat f3.txt
11457 b@address.com
22519 d@address.com
