Inserting multiple files into a template - linux

I have a template file and need to insert data from multiple files into this file. The template (template.txt) is laid out like so:
Title
Data 1
Data 2
Data 3
I need to put each data set under its title. So say the data files are:
Data1.dat:
1 2 3
4 5 6
Data2.dat:
0 0 0
0 0 0
Data3.dat:
500 300 100
400 200 000
The final product needs to be:
Title
Data 1
1 2 3
4 5 6
Data 2
0 0 0
0 0 0
Data 3
500 300 100
400 200 000
How can I make this possible? I can insert one data set into the template using:
sed '/Data 1/r Data1.dat' template.txt
I want to be able to do it for as many data files as needed and can't figure out how to automate it.
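Building on the sed command you already have, one way to automate it is a loop over the data files that applies the same r command for each title. A sketch, assuming GNU sed's -i option and the DataN.dat naming from the question; the sample files are recreated first so the snippet runs standalone:

```shell
#!/bin/sh
# Recreate the sample files from the question so the demo runs standalone.
printf 'Title\nData 1\nData 2\nData 3\n' > template.txt
printf '1 2 3\n4 5 6\n' > Data1.dat
printf '0 0 0\n0 0 0\n' > Data2.dat
printf '500 300 100\n400 200 000\n' > Data3.dat

# Start from the template, then let sed append each data file under its title.
cp template.txt output.txt
for f in Data*.dat; do
    title=$(printf '%s' "${f%.dat}" | sed 's/Data/Data /')  # "Data1.dat" -> "Data 1"
    sed -i "/^${title}\$/r $f" output.txt                   # GNU sed in-place edit
done
cat output.txt
```

Each pass inserts one file's contents after its matching title line, so after the loop every data set sits under its title.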

This would do it:
while read -r line; do
    file=$(echo "$line" | sed 's/ //g;s/$/.dat/')
    echo "$line"
    if [ -f "$file" ]; then cat "$file"; fi
done < template.txt

There are a lot of details left out of your post that could affect this, but given what you've asked, this seems to work, assuming each line is the exact filename with the spaces removed and the .dat extension missing, and that all the files exist.
$ cat template.txt
Title
Data 1
Data 2
Data 3
$ awk 'NR==1{print;next;}{print;filename=$0;gsub(" ","",filename);system("cat "filename".dat");}' template.txt
Title
Data 1
1 2 3
4 5 6
Data 2
0 0 0
0 0 0
Data 3
500 300 100
400 200 000
NR==1{print;next;} emits the first line (the title) and then moves on to the next line.
{print;filename=$0".dat";gsub(" ","",filename);system("cat "filename");}
For all other lines: print the line, assign it to a variable with .dat appended, then remove the spaces and make a system call to cat that file.
Another example:
$ awk 'NR==1{print;next;}{print;filename=$0".dat";gsub(" ","",filename);system("cat "filename);}' template.txt
Title
Data 1
1 2 3
4 5 6
Data 2
0 0 0
0 0 0
Data 3
500 300 100
400 200 000
Data 4
4-1 4-2 4-3
4-1 4-2 4-3
4-1 4-2 4-3
Data 5
5-1 5-2 5-3
5-1 5-2 5-3
5-1 5-2 5-3
Data 6
6-1 6-2 6-3
6-1 6-2 6-3
6-1 6-2 6-3
Data 7
7-1 7-2 7-3
7-1 7-2 7-3
7-1 7-2 7-3

$ cat tst.awk
NR==FNR {
    if ( FNR==1 ) {
        print
    }
    else {
        filename = $0 ".dat"
        gsub(/[[:space:]]/,"",filename)
        title[filename] = $0
        ARGV[ARGC] = filename
        ARGC++
    }
    next
}
FNR==1 { print title[FILENAME] }
{ print }
$ awk -f tst.awk template.txt
Title
Data 1
1 2 3
4 5 6
Data 2
0 0 0
0 0 0
Data 3
500 300 100
400 200 000

Related

Update a column matching criteria in TAB separated line using bash

I have a TAB-separated txt file looking like this:
Serving Sector Target Sector HO Attempts HO Successful Attempts
1002080 1002081 8 8
1002080 1002084 0 0
1002080 1002974 2 2
1002080 2104-2975 5 5
1002080 1002976 2 2
1002080 1012237 10 10
1002080 1012281 0 0
In some situations the Target Sector (column 2) might be in this format: 2104-2975 (ABCD-YYYY).
In those cases I wish to update this string in column 2 to the correct format (BC0YYYY = 1002975).
This is what I have written so far:
while read -r line; do
    if echo "$line" | grep -qE '([0-9])-([0-9])'  # if the line matches the criteria
    then
        string=$(echo "$line" | awk -F '\t' '{print $2}')  # fetch column 2
        LAC=${string%-*}   # LAC = ABCD
        CI=${string##*-}   # CI = YYYY
        if [ ${#CI} -lt 5 ]; then CI="0"$CI; fi  # if the length of CI is less than 5, prepend a 0
        LAC2=$(echo "$LAC" | cut -c2-3)  # LAC2 = BC
        GERANCELL=$LAC2$CI
    fi
done < input.txt
Anyone know how to update the 2nd column of the line with the new value $GERANCELL?
A straightforward sed solution could be:
sed 's/[0-9]\([0-9][0-9]\)[0-9]-\([0-9]\{4\}\)/\10\2/' input.txt
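As a quick check, here is that substitution run over the one row that needs fixing (tab-separated, as in the file):

```shell
printf '1002080\t2104-2975\t5\t5\n' |
sed 's/[0-9]\([0-9][0-9]\)[0-9]-\([0-9]\{4\}\)/\10\2/'
# \1 keeps the middle two digits "10" (B and C), a literal 0 follows,
# and \2 keeps the four digits after the hyphen: 2104-2975 -> 1002975
```

Note that in the replacement, sed reads \10 as backreference \1 followed by a literal 0, since only \1 through \9 exist.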
perl -lane'BEGIN{$"="\t"}if($F[1]=~/-/){($a,$b)=(split/-/,$F[1]);$F[1]=sprintf "%02d%05d",substr($a,1,2),$b;print "@F"}else{print "@F"}' file
Serving Sector Target Sector HO Attempts HO Successful Attempts
1002080 1002081 8 8
1002080 1002084 0 0
1002080 1002974 2 2
1002080 1002975 5 5
1002080 1002976 2 2
1002080 1012237 10 10
1002080 1012281 0 0
Using awk:
- Set the field and output field separators to a tab.
- Match 4 or more digits, a hyphen, then 4 or more digits in field 2.
- If there is a match, split on the hyphen.
- From the first part, take the second and third characters.
- From the second part, prepend a 0 if its length is smaller than 5.
- Set the result to field 2 (or print the values of all 4 columns immediately).
- The final }1 evaluates to true, printing the whole line.
Example
awk '
BEGIN {FS=OFS="\t"}
match($2, /^[0-9][0-9][0-9][0-9]+-[0-9][0-9][0-9][0-9]+$/) {
    split($2, a, "-")
    $2 = substr(a[1],2,2) (length(a[2]) < 5 ? "0" : "") a[2]
}1
' input_file.txt | column -t -s $'\t'
Formatted output with column -t
Serving Sector Target Sector HO Attempts HO Successful Attempts
1002080 1002081 8 8
1002080 1002084 0 0
1002080 1002974 2 2
1002080 1002975 5 5
1002080 1002976 2 2
1002080 1012237 10 10
1002080 1012281 0 0
1002080 2509300 0 0
1002080 2519287 0 0
If supported, you can shorten the pattern in the match function to /^[0-9]{4,}-[0-9]{4,}$/

How to extract the number after specific word using awk?

I have several lines of text. I want to extract the number after a specific word using awk.
I tried the following code but it does not work.
First, create the test file with vi test.text. There are 3 columns (the fields are generated by some other pipeline commands using awk).
Index AllocTres CPUTotal
1 cpu=1,mem=256G 18
2 cpu=2,mem=1024M 16
3 4
4 cpu=12,gres/gpu=3 12
5 8
6 9
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21
Please note there are several empty fields in this file.
What I want to achieve is to extract the number after the first gres/gpu= in each line (if no gres/gpu= occurs in the line, the default number is 0), using a pipeline like cat test.text | awk '{some_commands}', to output 4 columns:
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
Firstly: awk does not need cat; it can read files on its own. Combining cat and awk is generally discouraged as a useless use of cat.
For this task I would use GNU AWK in the following way. Let file.txt hold the AllocTres column of test.text, blank lines included for the rows where that field is empty:
cpu=1,mem=256G
cpu=2,mem=1024M

cpu=12,gres/gpu=3


cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1
then
awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt
output
0
0
0
3
0
0
4
3
Explanation: I inform GNU AWK that the field separator (FS) is gres/gpu=, then for each line I print the 2nd field increased by zero. For lines without gres/gpu=, $2 is an empty string; used in an arithmetic context that is the same as zero, so zero plus zero gives zero. For lines with at least one gres/gpu=, adding zero makes GNU AWK take the longest prefix of $2 that is a legal number: "3" (4th line) becomes 3, "4,gres/gpu:ret6000=2" (7th line) becomes 4, and "3,gres/gpu:1080ti=1" (8th line) becomes 3.
(tested in GNU Awk 5.0.1)
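The same separator trick extends to the 4-column output the question asks for. A sketch over a few of the sample rows, with the spacing approximated as single spaces:

```shell
printf '%s\n' \
  'Index AllocTres CPUTotal' \
  '1 cpu=1,mem=256G 18' \
  '4 cpu=12,gres/gpu=3 12' \
  '7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20' |
awk '
BEGIN {FS="gres/gpu="}
NR==1 {print $0" GPUAllocated"; next}  # pass the header through, extended
{print $0" "($2+0)}                    # $2+0 is 0 when gres/gpu= is absent
'
```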
With your shown samples, you can try the following code in GNU awk (written and tested there). A simple explanation: awk's match function is used with the regex gres\/gpu=([0-9]+) (escaping the / here), with one capturing group to capture all the digits coming after =. Once a match is found, the current line is printed, followed by the array arr's 1st element, +0 (to print zero in case no match is found for a line).
awk '
FNR==1{
    print $0,"GPUAllocated"
    next
}
{
    match($0,/gres\/gpu=([0-9]+)/,arr)
    print $0,arr[1]+0
}
' Input_file
Using sed
$ sed '1s/$/\tGPUAllocated/;s~.*gres/gpu=\([0-9]\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
awk '
BEGIN{FS="\t"}
NR==1{
    $(NF+1)="GPUAllocated"
}
NR>1{
    $(NF+1)=FS 0
}
/gres\/gpu=/{
    split($0, a, "=")
    gp=a[3]; gsub(/[ ,].*/, "", gp)
    $NF=FS gp
}1' test.text
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3

How to replace a number with another number in a specific column using awk

This is probably basic, but I am completely new to the command line and to using awk.
I have a file like this:
1 RQ22067-0 -9
2 RQ34365-4 1
3 RQ34616-4 1
4 RQ34720-1 0
5 RQ14799-8 0
6 RQ14754-1 0
7 RQ22101-7 0
8 RQ22073-1 0
9 RQ30201-1 0
I want the 0s to change to 1 in column 3, and any occurrence of 1 or 2 to change to 2 in column 3. So essentially I am only changing numbers in column 3, but I am not changing the -9.
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
I have tried using (see below) but it has not worked
>> awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
>> awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
Thank you.
With this code in your question:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
1. you're running both commands on the same input file and writing their output to the same output file, so only the output of the 2nd script will be present in the output, and
2. you're trying to change 0 to 1 first and THEN change 1 to 2, so the $3s that start out as 0 would end up as 2; you need to reverse the order of the operations.
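The ordering problem is easy to demonstrate on a single value:

```shell
# Wrong order: 0 becomes 1, and that new 1 then becomes 2.
echo 0 | awk '{gsub("0","1",$1); gsub("1","2",$1)} 1'   # prints 2
# Right order: 1 -> 2 runs first, so a 0 only becomes 1.
echo 0 | awk '{gsub("1","2",$1); gsub("0","1",$1)} 1'   # prints 1
```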
This is what you should be doing, using your existing code:
awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
For example:
$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The gsub()s should also just be sub()s, as you only want to perform each substitution once, and you don't need to enclose the numbers in quotes, so you could just do:
awk '{sub(1,2,$3); sub(0,1,$3)}1' file
You can check the value of column 3 and then update the field value.
Check for 1 in the first rule, because if the first check were for 0, the value would be set to 1 and the next check would then set it to 2, resulting in all 2's.
awk '
{
    if($3==1) $3 = 2
    if($3==0) $3 = 1
}
1' file
Output
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
With your shown samples, try the following code using ternary operators. A simple explanation: check if the 3rd field is 1, and if so set it to 2; else check if it is 0, and if so set it to 1; otherwise keep it as it is; finally, print the line.
awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file
Generic solution: adding a generic solution here, with 3 awk variables: fieldNumber, in which you mention all the field numbers you want to check; existValue, the values to match (in the condition); and newValue, the new values that should be there after the replacement.
awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
    num=split(fieldNumber,arr1,",")
    num1=split(existValue,arr2,",")
    num2=split(newValue,arr3,",")
    for(i=1;i<=num1;i++){
        value[arr2[i]]=arr3[i]
    }
}
{
    for(i=1;i<=num;i++){
        if($arr1[i] in value){
            $arr1[i]=value[$arr1[i]]
        }
    }
}
1
' Input_file
This might work for you (GNU sed):
sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file
- Surround the 3rd column with newlines.
- Make a copy.
- Replace all 0's by 1's and all 1's by 2's.
- Append the original.
- Pattern match on the newlines and replace the 3rd column in the original by the 3rd column in the amended line.
Also with awk:
awk 'NR > 1 {s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The substitutions are made with sub() on a copy of $3, and then the copy with the changes is assigned back to $3.
When you don't like the simple
sed 's/1$/2/; s/0$/1/' file
you might want to play with
sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e' file

How to reorder columns of hundreds of tab-delimited files in linux?

I have large-scale tab-delimited files (a couple of hundred), but the order of the columns differs across the files (the same columns, just in different locations). Hence, I need to reorder the columns in all the files and write them back in tab-delimited format.
I would like to write a shell script that takes a specified order of columns, reorders the columns in every file accordingly, and writes the files back. Can someone help me with it?
Here is how the header of my files looks like:
file 1:
sLS72 chrX
A B E C F H
2 1 4 5 7 8
0 0 0 0 0 0
and the header of my second file:
S721 chrX
A E B F H C
12 11 2 3 4 1
0 0 0 0 0 0
here is the order of the columns that I want to achieve:
Order=[A ,B ,C ,E,F,H]
and here is the expected outputs for each file based on this ordering:
sLS72 chrX
A B C E F H
2 1 5 4 7 8
0 0 0 0 0 0
file 2:
S721 chrX
A B C E F H
12 2 1 11 3 4
0 0 0 0 0 0
I was trying to use awk:
awk -F'\t' '{s2=$A; $3=$B; $4=$C; $5=$E; $1=s}1' OFS='\t' in file
but the point is that, first, the order of the columns is different in different files, and second, the names of the columns start on the second line of the file. In other words, the first line is the header, which I don't want to change, but the second line holds the column names, so I want to reorder all files based on that line. It's kind of tricky.
$ awk -v order="A B C E F H" '
    BEGIN {n=split(order,ho)}
    FNR==1 {print; next}
    FNR==2 {for(i=1;i<=NF;i++) hn[$i]=i}
    {for(i=1;i<=n;i++) printf "%s",$hn[ho[i]] (i==n?ORS:OFS)}' file1 > tmp && mv tmp file1
$ cat file1
sLS72 chrX
A B C E F H
2 1 5 4 7 8
0 0 0 0 0 0
If working on multiple files at the same time, change it to
$ awk -v ...
    {... printf "%s",$hn[ho[i]] (i==n?ORS:OFS) > (FILENAME"_reordered") }' dir/files*
and do a mass rename afterwards. An alternative is to run the original script in a loop over the files.
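That loop alternative might look like the sketch below; the file list and the order string are assumptions, and a sample file is built first so the snippet runs standalone:

```shell
# Build a small sample file to run against (file 1 from the question).
printf '%s\n' 'sLS72 chrX' 'A B E C F H' '2 1 4 5 7 8' '0 0 0 0 0 0' > file1

for f in file1; do   # in practice: for f in dir/files*; do
    awk -v order="A B C E F H" '
        BEGIN {n=split(order,ho)}
        FNR==1 {print; next}
        FNR==2 {for(i=1;i<=NF;i++) hn[$i]=i}
        {for(i=1;i<=n;i++) printf "%s",$hn[ho[i]] (i==n?ORS:OFS)}
    ' "$f" > "$f.tmp" && mv "$f.tmp" "$f"   # rewrite each file in place
done
cat file1
```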

Compare columns from two files and print not match

I want to compare the first 4 columns of file1 and file2. I want to print all lines from file1 + the lines from file2 that are not in file1.
File1:
2435 2 2 7 specification 9-8-3-0
57234 1 6 4 description 0-0 55211
32423 2 44 3 description 0-0 24242
File2:
2435 2 2 7 specification
7624 2 2 1 namecomplete
57234 1 6 4 description
28748 34 5 21 gateway
32423 2 44 3 description
832758 3 6 namecomplete
output:
2435 2 2 7 specification 9-8-3-0
57234 1 6 4 description 0-0 55211
32423 2 44 3 description 0-0 24242
7624 2 2 1 namecomplete
28748 34 5 21 gateway
832758 3 6 namecomplete
I don't understand how to print things that don't match.
You can do it with an awk script like this:
script.awk
FNR == NR {
    mem[ $1 $2 $3 $4 $5 ] = 1
    print
    next
}
{
    key = $1 $2 $3 $4 $5
    if( ! ( key in mem ) ) print
}
And run it like this: awk -f script.awk file1 file2 .
The first part memorizes the first 5 fields as a key and prints the whole line, then moves to the next line. This part applies only to lines from the first file.
The second part applies only to lines from the second file. It checks whether the key is in mem; if not, the line was not in file1, so it is printed.
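One subtlety: concatenating fields without a separator can make different rows collide (fields "1 23" and "12 3" both produce the key "123"). A variant using awk's comma/SUBSEP subscripts avoids that; a sketch with a couple of the sample rows inlined so it runs standalone:

```shell
printf '%s\n' '2435 2 2 7 specification 9-8-3-0' '57234 1 6 4 description 0-0 55211' > file1
printf '%s\n' '2435 2 2 7 specification' '7624 2 2 1 namecomplete' > file2

awk '
FNR == NR {
    mem[$1,$2,$3,$4,$5] = 1   # comma joins the fields with SUBSEP, an unlikely byte
    print
    next
}
!(($1,$2,$3,$4,$5) in mem)
' file1 file2
```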
