How to compare two text files on the first column: if it matches, print the value; if not, print zero? - linux

1.txt contains:
1
2
3
4
5
.
.
180
2.txt contains:
3 0.5
4 0.8
9 9.0
120 3.0
179 2.0
So I want my output to work like this: if a value from 1.txt matches the first column of 2.txt, it should print the second-column value from 2.txt; if it does not match, it should print zero.
The output should be:
1 0.0
2 0.0
3 0.5
4 0.8
5 0.0
.
.
8 0.0
9 9.0
10 0.0
11 0.0
.
.
.
120 3.0
121 0.0
.
.
150 0.0
.
179 2.0
180 0.0

awk 'NR==FNR{a[$1]=$2;next}{if($1 in a){print $1,a[$1]}else{print $1,"0.0"}}' 2.txt 1.txt
Brief explanation,
NR==FNR{a[$1]=$2;next}: record each $1 of 2.txt as an index of array a, with $2 as its value
If $1 of 1.txt exists as an index in array a, print $1 and a[$1], else print $1 and "0.0"
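An equivalent, slightly more compact form (a sketch; same behaviour) uses awk's ternary operator instead of the if/else:
awk 'NR==FNR{a[$1]=$2;next} {print $1, ($1 in a ? a[$1] : "0.0")}' 2.txt 1.txt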

Could you please try the following and let me know if it helps you.
awk 'FNR==NR{a[$1];next} {for(i=prev+1;i<=($1-1);i++){print i,"0.0"}}{prev=$1;$1=$1;print}' OFS="\t" 1.txt 2.txt
Explanation of code:
awk '
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when 1.txt is being read.
a[$1]; ##Creating an array a whose index is $1.
next ##next will skip all further statements from here.
}
{
for(i=prev+1;i<=($1-1);i++){ ##Starting a for loop from prev+1 up to one less than the value of the first field.
print i,"0.0"} ##Printing value of variable i and 0.0 here.
}
{
prev=$1; ##Setting $1 value to variable prev here.
$1=$1; ##Reassigning $1 so awk rebuilds the record with the TAB output separator.
print ##Printing the current line here.
}' OFS="\t" 1.txt 2.txt ##Setting OFS as TAB and mentioning Input_file(s) name here.
Execution of above code:
Input_file(s):
cat 1.txt
1
2
3
4
5
6
7
cat 2.txt
3 0.5
4 0.8
9 9.0
Output will be as follows:
awk 'FNR==NR{a[$1];next} {for(i=prev+1;i<=($1-1);i++){print i,"0.0"}}{prev=$1;$1=$1;print}' OFS="\t" 1.txt 2.txt
1 0.0
2 0.0
3 0.5
4 0.8
5 0.0
6 0.0
7 0.0
8 0.0
9 9.0
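Note that this version derives its range from 2.txt alone (the array a built from 1.txt is never consulted), so it stops at the last key of 2.txt (9 above) and will not pad trailing IDs that exist only in 1.txt (e.g. up to 180 in the original question). A possible extension, not part of the original answer, records the maximum key of 1.txt and pads in an END block:
awk 'FNR==NR{if($1>max){max=$1};next}
     {for(i=prev+1;i<$1;i++){print i,"0.0"};prev=$1;$1=$1;print}
     END{for(i=prev+1;i<=max;i++){print i,"0.0"}}' OFS="\t" 1.txt 2.txt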

This might work for you (GNU sed):
sed -r 's#^(\S+)\s.*#/^\1\\s*$/c&#' file2 | sed -i -f - -e 's/$/ 0.0/' file1
Create a sed script from file2: for each line of file1 whose first field matches a first field from file2, the c command changes that line to the corresponding line from file2. All other lines are then zeroed, i.e. lines not changed have 0.0 appended.
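For the sample 2.txt above, the first sed invocation generates a script of change commands like the following, which the second sed then applies to file1 in place (the trailing s/$/ 0.0/ only reaches lines the c commands did not replace, because c ends the cycle for its line):
/^3\s*$/c3 0.5
/^4\s*$/c4 0.8
/^9\s*$/c9 9.0
/^120\s*$/c120 3.0
/^179\s*$/c179 2.0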

Related

How to match two text files of different lengths and different columns, with a header, using the join command in Linux

I have two text files of different lengths, A.txt and B.txt.
A.txt looks like:
ID pos val1 val2 val3
1 2 0.8 0.5 0.6
2 4 0.9 0.6 0.8
3 6 1.0 1.2 1.3
4 8 2.5 2.2 3.4
5 10 3.2 3.4 3.8
B.txt looks like:
pos category
2 A
4 B
6 A
8 C
10 B
I want to match the pos column in both files and get the output like this:
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I used the join command join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt
but C.txt comes out without a header:
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I want to get the output with a header from the join command. Kindly help me out.
Thanks in advance
In case you are OK with awk, could you please try the following. Written and tested with the shown samples in GNU awk.
awk 'FNR==NR{a[$1]=$2;next} ($2 in a){$2=a[$2] OFS $2} 1' B.txt A.txt | column -t
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when B.txt is being read.
a[$1]=$2 ##Creating array a with index of 1st field and value is 2nd field of current line.
next ##next will skip all further statements from here.
}
($2 in a){ ##Checking condition if 2nd field is present in array a then do following.
$2=a[$2] OFS $2 ##Prepending array a's value to the 2nd field, as per the expected output.
}
1 ##1 will print current line.
' B.txt A.txt | column -t ##Mentioning Input_file names and passing awk program output to column to make it look better.
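With the shown samples, this prints the requested table:
ID  category  pos  val1  val2  val3
1   A         2    0.8   0.5   0.6
2   B         4    0.9   0.6   0.8
3   A         6    1.0   1.2   1.3
4   C         8    2.5   2.2   3.4
5   B         10   3.2   3.4   3.8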
As you requested... It is perfectly possible to get the desired output using just GNU join:
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$
The key to getting the correct output is using the sort -g option, and specifying the join output column order using the -o option.
To "pretty print" the output, pipe to column -t
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5 | column -t
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$

How to divide a column based on the corresponding value in another file?

I have multiple files (66) and want to divide column 3 of each file by its corresponding value in info.file, and insert the new value as column 4 of each file.
My manual code is:
awk '{print $4=$3/NUmber from info.file}1' file
But this takes me hours to do for each individual file, so I want to automate it for all files. Thanks
file1:
chrm name value
4 a 8
3 b 4
file2:
chrm name value
3 g 6
5 s 12
info.file:
file_name average
file1 8
file2 6
file3 10
output:
file1:
chrm name value new_value
4 a 8 1
3 b 4 0.5
file2:
chrm name value new_value
3 g 6 1
5 s 12 2
Without error handling:
$ awk 'NR==FNR {a[$1]=$2; next}
FNR==1 {out=FILENAME".new"; print $0, "new_value" > out; next}
{v=$NF/a[FILENAME]; $++NF=v; print > out}' info file1 file2
will generate updated files
$ head file{1,2}.new | column -t
==> file1.new <==
chrm name value new_value
4 a 8 1
3 b 4 0.5
==> file2.new <==
chrm name value new_value
3 g 6 1
5 s 12 2
Explanation
NR==FNR {a[$1]=$2; next} scan the first file and save the file/value pairs in the associative array
FNR==1 in the header line of each data file
out=FILENAME".new" set an output filename
print $0, "new_value" > out print existing header appended with the new column name
v=$NF/a[FILENAME] for every data line, scale the last field and assign to v
$++NF=v increment number of fields and assign the new computed value to the last field
print > out print the new line to the same file set before
info file1 file2 the list of files should be preceded by the info file
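Since this is explicitly without error handling, a possible guard (a sketch, not part of the original answer; nextfile is a GNU awk feature) skips and reports data files that have no entry in info:
$ awk 'NR==FNR {a[$1]=$2; next}
       FNR==1  {if (!(FILENAME in a)) {print "no average for " FILENAME > "/dev/stderr"; nextfile}
                out=FILENAME".new"; print $0, "new_value" > out; next}
       {v=$NF/a[FILENAME]; $++NF=v; print > out}' info file1 file2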
I have prepared the following double-nested awk command for you:
awk 'NR>1{system("awk -v div="$2" -f div_column3.awk "$1" | column -t > new_"$1);}' info.file
with div_column3.awk being an awk script file with the following content:
$ cat div_column3.awk
NR==1{print $0" new_value"}NR>1{print $0" "$3/div}
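Running the outer command then leaves files new_file1, new_file2, ... beside the originals; with the shown samples, new_file1 should contain:
chrm  name  value  new_value
4     a     8      1
3     b     4      0.5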

How to remove only the first two leading spaces in all lines of a file

My input file is like:
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
I JUST want to remove the leading 2 spaces in all the lines.
I used
sed "s/^[ \t]*//" -i inputfile.txt
but it deletes all leading whitespace from all the lines. I just want to shift the complete text in the file two positions to the left.
Any solutions to this?
You can specify that you want to delete two matches of the character set in the brackets:
sed -r -i "s/^[ \t]{2}//" inputfile.txt
See the output:
$ sed -r "s/^[ \t]{2}//" file
*CONTROL_ADAPTIVE
$ adpfreq adptol adpopt maxlvl tbirth tdeath lcadp ioflag
0.10 5.000 2 3 0.0 0.0 0 0
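If your sed lacks -r, the same edit can be written in basic regular expressions with an escaped interval, or, for exactly two spaces, with two literal blanks (a sketch; note that \t inside the bracket expression is a GNU sed extension, as in the question's own command):
sed -i 's/^[ \t]\{2\}//' inputfile.txt
sed -i 's/^  //' inputfile.txt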

Split text file into parts based on a pattern taken from the text file

I have many text files of fixed-width data, e.g.:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.07826 -3.0
15.104348 -4.0
15.130435 -5.0
15.156522 -6.0
15.182609 -6.9999995
15.208695 -8.0
The data comprise 3 or 4 runs of a simulation, all stored in the one text file, with no separator between runs. In other words, there is no empty line or anything, e.g. if there were only 3 'records' per run it would look like this for 3 runs:
$ head model-q-060.txt
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
It's a COMSOL Multiphysics output file, for those interested. Visually you can tell where the data for a new run begin, as the first x-value is repeated (actually the entire second line is probably the same for all of them). So I need to first open the file and get this x-value, save it, and then use it as a pattern to match with awk or csplit. I am struggling to work this out!
csplit will do the job:
$ csplit -z -f 'temp' -b '%02d.txt' model-q-060.txt /^15\.0\\s/ {*}
but I have to know the pattern to split on. This question is similar but each of my text files might have a different pattern to match: Split files based on file content and pattern matching.
Ben.
Here's a simple awk script that will do what you want:
BEGIN { fn=0 }
NR==1 { next }
NR==2 { delim=$1 }
$1 == delim {
    f = sprintf("test%02d.txt", fn++)
    print "Creating " f
}
{ print $0 > f }
initialize output file number
ignore the first line
extract the delimiter from the second line
for every input line whose first token matches the delimiter, set up the output file name
for all lines, write to the current output file
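If the script is saved as, say, split_runs.awk (the name is arbitrary), running it on the 3-run sample above creates one file per run and reports each as it goes; note the header line is skipped rather than written out:
$ awk -f split_runs.awk model-q-060.txt
Creating test00.txt
Creating test01.txt
Creating test02.txt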
This should do the job - test somewhere you don't have a lot of temp*.txt files: :)
rm -f temp*.txt
cat > f1.txt <<EOF
% x y
15.0 0.0
15.026087 -1.0
15.052174 -2.0
15.0 0.0
15.038486 -1.0
15.066712 -2.0
15.0 0.0
15.041089 -1.0
15.087612 -2.0
EOF
first=`awk 'NR==2{print $1}' f1.txt|sed 's/\\./\\\\./'`
echo --- Splitting by: $first
csplit -z -f temp -b %02d.txt f1.txt /^"$first"\\s/ {*}
for i in temp*.txt; do
echo ---- $i
cat $i
done
The output of the above is:
--- Splitting by: 15\.0
51
153
153
136
---- temp00.txt
% x y
---- temp01.txt
15.0 0.0
15.026087 -1.0
15.052174 -2.0
---- temp02.txt
15.0 0.0
15.038486 -1.0
15.066712 -2.0
---- temp03.txt
15.0 0.0
15.041089 -1.0
15.087612 -2.0
Of course, you will run into trouble if the second line's value (15.0 in the above example) also occurs in the middle of a run - solving that would be a tad harder - exercise left for the reader...
If the number of lines per run is constant, you could use this:
cat your_file.txt | grep -P "^\d" | \
split --lines=$(expr \( $(wc -l "your_file.txt" | \
awk '{print $1}') - 1 \) / number_of_runs)
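For the 3-run sample above, with number_of_runs replaced by 3, this splits the 9 data lines into split's default output files of 3 lines each:
$ cat f1.txt | grep -P "^\d" | split --lines=$(expr \( $(wc -l "f1.txt" | awk '{print $1}') - 1 \) / 3)
$ ls x??
xaa xab xac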

Permutation of columns without repetition

Can anybody give me some piece of code or algorithm or something else to solve the following problem?
I have several files, each with a different number of columns, like:
$> cat file-1
1 2
$> cat file-2
1 2 3
$> cat file-3
1 2 3 4
I would like, for each distinct pair of columns (combinations without repeated column pairs), to take the absolute value of their difference divided by the sum of all values in the row:
in file-1's case I need to get:
0.3333 # because |1-2|/(1+2)
in file-2's case I need to get:
0.1666 0.1666 0.3333 # because |1-2|/(1+2+3) and |2-3|/(1+2+3) and |1-3|/(1+2+3)
in file-3's case I need to get:
0.1 0.2 0.3 0.1 0.2 0.1 # because |1-2|/(1+2+3+4) and |1-3|/(1+2+3+4) and |1-4|/(1+2+3+4) and |2-3|/(1+2+3+4) and |2-4|/(1+2+3+4) and |3-4|/(1+2+3+4)
This should work, though I am guessing you have made a minor mistake in your expected output. Based on your third pattern, the file-2 line should be ordered differently -
Instead of:
in file-2 case I need to get:
0.1666 0.1666 0.3333 # because |1-2|/(1+2+3) and |2-3|/(1+2+3) and |1-3|/(1+2+3)
It should be:
in file-2 case I need to get:
0.1666 0.3333 0.1666 # because |1-2|/(1+2+3) and |1-3|/(1+2+3) and |2-3|/(1+2+3)
Here is the awk one liner:
awk '
NF{
a=0;
for(i=1;i<=NF;i++)
a+=$i;
for(j=1;j<=NF;j++)
{
for(k=j;k<NF;k++)
printf("%s ",-($j-$(k+1))/a)
}
print "";
next;
}1' file
Short version:
awk '
NF{for (i=1;i<=NF;i++) a+=$i;
for (j=1;j<=NF;j++){for (k=j;k<NF;k++) printf("%2.4f ",-($j-$(k+1))/a)}
print "";a=0;next;}1' file
Input File:
[jaypal:~/Temp] cat file
1 2
1 2 3
1 2 3 4
Test:
[jaypal:~/Temp] awk '
NF{
a=0;
for(i=1;i<=NF;i++)
a+=$i;
for(j=1;j<=NF;j++)
{
for(k=j;k<NF;k++)
printf("%s ",-($j-$(k+1))/a)
}
print "";
next;
}1' file
0.333333
0.166667 0.333333 0.166667
0.1 0.2 0.3 0.1 0.2 0.1
Test from shorter version:
[jaypal:~/Temp] awk '
NF{for (i=1;i<=NF;i++) a+=$i;
for (j=1;j<=NF;j++){for (k=j;k<NF;k++) printf("%2.4f ",-($j-$(k+1))/a)}
print "";a=0;next;}1' file
0.3333
0.1667 0.3333 0.1667
0.1000 0.2000 0.3000 0.1000 0.2000 0.1000
@Jaypal just beat me to it! Here's what I had:
awk '{for (x=1;x<=NF;x++) sum += $x; for (i=1;i<=NF;i++) for (j=2;j<=NF;j++) if (i < j) printf ("%.1f ",-($i-$j)/sum)} END {print ""}' file.txt
Output:
0.1 0.2 0.3 0.1 0.2 0.1
This prints to one decimal place.
@Jaypal, is there a quick way to printf an absolute value? Perhaps something like: abs(value)?
EDIT:
@Jaypal, yes, I've tried searching too and couldn't find anything simple :-( It seems if ($i < 0) $i = -$i is the way to go. I guess you could use sed to remove any minus signs:
awk '{for (x=1;x<=NF;x++) sum += $x; for (i=1;i<=NF;i++) for (j=2;j<=NF;j++) if (i < j) printf ("%.1f ", ($i-$j)/sum)} {print ""}' file.txt | sed "s%-%%g"
Cheers!
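For reference, awk has no built-in abs(), but a one-line user-defined function avoids both the negation trick and the sed post-processing (a sketch; it also resets sum per line, so multi-line files work):
awk 'function abs(x) { return x < 0 ? -x : x }
     { sum = 0
       for (x = 1; x <= NF; x++) sum += $x         # row total
       for (i = 1; i < NF; i++)                    # each unordered pair (i,j) with i < j
         for (j = i + 1; j <= NF; j++)
           printf("%.4f ", abs($i - $j) / sum)
       print "" }' file.txt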
As it looks like homework, I will act accordingly.
To find the total count of numbers present in the file, you can use:
cat filename | wc -w
Find the first_number by:
cat filename | cut -d " " -f 1
To find the sum in a file:
cat filename | tr " " "+" | bc
Now that you have total_nos, use something like:
for i in $(seq 1 1 $total_nos)
do
    # Find the numerator: first_number - $i
    # Use the sum you got from above to get the desired value.
done
