Awk command to convert hex to signed decimal - linux

I have a text file which consists of 3 columns of hex numbers (the values vary; these are used only as an example):
X Y Z
0a0a 0b0b 0c0c
0a0a 0b0b 0c0c
0a0a 0b0b 0c0c
0a0a 0b0b 0c0c
I want to convert these numbers to signed decimal and print them in the same structure they are in, so I did:
awk '{x="0x"$1;
y="0x"$2;
z="0x"$3;
printf ("%d %d %d" x,y,z);}' input_file.txt > output_file.txt
The list that I get as an output consists only of unsigned values.

You can use an awk function to do the conversion from two's complement:
function hex2int( hexstr, nbits )
{
    max = 2 ^ nbits
    med = max / 2
    num = strtonum( "0x" hexstr )
    return ((num < med) ? num : ( (num > med) ? num - max : -med ))
}
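(A note on the math, in case the ternary looks opaque: an n-bit two's-complement value equals num itself when num < 2^(n-1), and num - 2^n otherwise. The -med branch only fires when num == med, where it is equivalent to num - max, so it is there purely for explicitness.)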
4-bit conversion examples:
print hex2int( "7", 4 ) # +7
print hex2int( "2", 4 ) # +2
print hex2int( "1", 4 ) # +1
print hex2int( "0", 4 ) # 0
print hex2int( "f", 4 ) # -1
print hex2int( "d", 4 ) # -3
print hex2int( "9", 4 ) # -7
print hex2int( "8", 4 ) # -8
8-bit conversion examples:
print hex2int( "7f", 8 ) # +127
print hex2int( "40", 8 ) # +64
print hex2int( "01", 8 ) # +1
print hex2int( "00", 8 ) # 0
print hex2int( "ff", 8 ) # -1
print hex2int( "40", 8 ) # -64
print hex2int( "81", 8 ) # -127
print hex2int( "80", 8 ) # -128
Putting it all together using a 16-bit conversion:
#!/usr/bin/awk -f
function hex2int( hex )
{
    # max and med are globals, set once in the BEGIN block below
    num = strtonum( "0x" hex )
    return ((num < med) ? num : ( (num > med) ? num - max : -med ))
}
BEGIN {
    nbits = 16
    max = 2 ^ nbits
    med = max / 2
}
{
    for( i = 1; i <= NF; i++ )
    {
        if( NR == 1 )
        {
            # pass the header line through unchanged
            printf "%s%s", $i, OFS
        }
        else
        {
            printf "%d%s", hex2int($i), OFS
        }
    }
    printf "%s", ORS
}
# eof #
Input file:
X Y Z
0a0a 0b0b 0c0c
abcd ef01 1234
ffff fafa baba
12ab abca 4321
Testing:
$ awk -f script.awk -- input.txt
Output:
X Y Z
2570 2827 3084
-21555 -4351 4660
-1 -1286 -17734
4779 -21558 17185
Reference: https://en.wikipedia.org/wiki/Two's_complement
Hope it helps!
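For the 16-bit case in the question, the same idea also fits on one line. A sketch, assuming GNU awk (for strtonum) and that the first line is the X Y Z header:

gawk 'NR==1 { print; next } { for (i=1; i<=NF; i++) { n = strtonum("0x" $i); $i = (n < 32768 ? n : n - 65536) } print }' input_file.txt > output_file.txt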

Related

Compute sum for column 2 and average for all other columns in multiple files without considering missing values

I want to calculate the sum of column 2 and the average of all other columns from 15 files: ifile1.txt, ifile2.txt, ..., ifile15.txt. The number of columns and rows is the same in each file, but some values are missing. Part of the data looks like this:
ifile1.txt ifile2.txt ifile3.txt
3 ? ? ? . 1 2 1 3 . 4 ? ? ? .
1 ? ? ? . 1 ? ? ? . 5 ? ? ? .
4 6 5 2 . 2 5 5 1 . 3 4 3 1 .
5 5 7 1 . 0 0 1 1 . 4 3 4 0 .
. . . . . . . . . . . . . . .
I would like to produce a new file which shows the sum of column 2 and the average of all other columns from these 15 files, ignoring the missing values.
ofile.txt
2.66 2 1 3 . (i.e. average of 3 1 4, sum of ? 2 ?, average of ? 1 ?, average of ? 3 ?, and so on)
2.33 ? ? ? .
3 15 4.33 1.33 .
3 8 4 0.66 .
. . . . .
This question is similar to my earlier question Average of multiple files without considering missing values, where the script was written to average all columns:
awk '
{
for (i = 1;i <= NF;i++) {
Sum[FNR,i]+=$i
Count[FNR,i]+=$i!="?"
}
}
END {
for( i = 1; i <= FNR; i++){
for( j = 1; j <= NF; j++) printf "%s ", Count[i,j] != 0 ? Sum[i,j]/Count[i,j] : "?"
print ""
}
}
' ifile*
But I haven't been able to modify it to produce my desired output.
Based on your previous awk script, I modified it as follows:
$ cat awk_script
{
    for (i = 1; i <= NF; i++) {
        Sum[FNR,i] += $i
        Count[FNR,i] += $i != "?"
    }
}
END {
    for (i = 1; i <= FNR; i++) {
        for (j = 1; j <= NF; j++)
            if (j == 2) { printf "%s\t", Count[i,j] != 0 ? Sum[i,j] : "?" }
            else {
                if (Count[i,j] != 0) {
                    val = Sum[i,j]/Count[i,j]
                    printf "%s%s\t", int(val), match(val,/\.[0-9]/) != 0 ? "." substr(val,RSTART+1,2) : ""
                } else printf "?\t"
            }
        print ""
    }
}
And the output would be:
$ awk -f awk_script ifile*
2.66 2 1 3 0
2.33 ? ? ? 0
3 15 4.33 1.33 0
3 8 4 0.66 0
0 0 0 0 0
Brief explanation:
if(j==2): print the sum of the values instead of the average
for the averages, the expected values are truncated rather than rounded, so the decimal part is extracted with substr(val,RSTART+1,2) and the integer part with int(val) (see the small example below)
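As a standalone illustration of that truncation trick (my example, not from the original answer):

$ echo 2.6666 | awk '{ match($1, /\.[0-9]/); print int($1) (RSTART ? "." substr($1, RSTART+1, 2) : "") }'
2.66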
$ cat tst.awk
BEGIN { dfltVal="?"; OFS="\t" }
{
for (colNr=1; colNr<=NF; colNr++) {
if ($colNr != dfltVal) {
sum[FNR,colNr] += $colNr
cnt[FNR,colNr]++
}
}
}
END {
for (rowNr=1; rowNr<=FNR; rowNr++) {
for (colNr=1; colNr<=NF; colNr++) {
val = dfltVal
if ( cnt[rowNr,colNr] != 0 ) {
val = int(100 * sum[rowNr,colNr] / (colNr==2 ? 1 : cnt[rowNr,colNr])) / 100
}
printf "%s%s", val, (colNr<NF ? OFS : ORS)
}
}
}
$ awk -f tst.awk file1 file2 file3
2.66 2 1 3
2.33 ? ? ?
3 15 4.33 1.33
3 8 4 0.66
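A side note on the int(100 * ...) / 100 line (my reading, not the author's words): it truncates the average to two decimals without rounding, which is what produces the 2.66 and 0.66 above, and dividing by 1 instead of cnt when colNr == 2 turns that column's average into a plain sum. For example:

$ awk 'BEGIN { print int(100 * 8 / 3) / 100 }'
2.66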

Average of multiple files without considering missing values

I want to calculate the average of 15 files: ifile1.txt, ifile2.txt, ..., ifile15.txt. The number of columns and rows is the same in each file, but some values are missing. Part of the data looks like this:
ifile1.txt ifile2.txt ifile3.txt
3 ? ? ? . 1 2 1 3 . 4 ? ? ? .
1 ? ? ? . 1 ? ? ? . 5 ? ? ? .
4 6 5 2 . 2 5 5 1 . 3 4 3 1 .
5 5 7 1 . 0 0 1 1 . 4 3 4 0 .
. . . . . . . . . . . . . . .
I would like to produce a new file which shows the average of these 15 files, ignoring the missing values.
ofile.txt
2.66 2 1 3 . (i.e. average of 3 1 4, average of ? 2 ? and so on)
2.33 ? ? ? .
3 5 4.33 1.33 .
3 2.67 4 0.66 .
. . . . .
This question is similar to my earlier question Average of multiple files in shell, where the script was:
awk 'FNR == 1 { nfiles++; ncols = NF }
{ for (i = 1; i < NF; i++) sum[FNR,i] += $i
if (FNR > maxnr) maxnr = FNR
}
END {
for (line = 1; line <= maxnr; line++)
{
for (col = 1; col < ncols; col++)
printf " %f", sum[line,col]/nfiles;
printf "\n"
}
}' ifile*.txt
But I haven't been able to modify it.
Use this:
paste ifile*.txt | awk '{n=f=0; for(i=1;i<=NF;i++){if($i*1){f++;n+=$i}}; print n/f}'
paste will show all files side by side
awk calculates the averages per line:
n=f=0; set the variables to 0.
for(i=1;i<=NF;i++) loop through all the fields.
if($i*1) if the field is a non-zero number (multiplying by 1 yields a true value).
f++;n+=$i increment f (the count of numeric fields) and add the field's value to n.
print n/f print the average (see the caveat below).
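One caveat the bullets gloss over: $i*1 is also false when the field is a legitimate 0, so zero values are silently skipped. Testing for the placeholder explicitly, as the other answers here do, avoids that. A quick demonstration of the safer test (my example):

$ echo '0 ? 4' | awk '{ n=f=0; for (i=1; i<=NF; i++) if ($i != "?") { f++; n += $i }; print n/f }'
2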
awk '
{
for (i = 1;i <= NF;i++) {
Sum[FNR,i]+=$i
Count[FNR,i]+=$i!="?"
}
}
END {
for( i = 1; i <= FNR; i++){
for( j = 1; j <= NF; j++) printf "%s ", Count[i,j] != 0 ? Sum[i,j]/Count[i,j] : "?"
print ""
}
}
' ifile*
assuming the files are correctly formatted (no trailing blank lines, etc.):
awk 'FNR == 1 { nfiles++; ncols = NF }
{ for (i = 1; i < NF; i++)
if ( $i != "?" ) { sum[FNR,i] += $i ; count[FNR,i]++ ;}
if (FNR > maxnr) maxnr = FNR
}
END {
for (line = 1; line <= maxnr; line++)
{
for (col = 1; col < ncols; col++)
if ( count[line,col] > 0 ) printf " %f", sum[line,col]/count[line,col];
else printf " ? " ;
printf "\n" ;
}
}' ifile*.txt
I just check for the '?' ...
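(A detail worth spelling out: both the original script and this answer loop with i < NF and col < ncols rather than <=, so the last field of each row, the trailing "." in the sample data, is never summed or printed.)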

Use awk to check a specific combination in other files

I have 3 files
base.txt
12345 6 78
13579 2 46
24680 1 35
123451 266 78
135792 6572 46
246803 12587 35
1stcheck.txt
Some odd stuff
AB 12345/6/78 Fx00
BC 13579/2/47 0xFF
CD 24680/1/35 5x88
AB 123451/266_10/78 Fx00 #10 is mod(266,256)
BC 135792/6572_172/46 0xFF #172 is mod(6572,256)
CD 246803/12587_43/35 5x88 #43 is mod(12587,256)
There may be some other odd stuff
2ndcheck.txt
12345u_6_78.dat
13579u_2_46.dat
24680u_0_35.dat
123451u_10_78.dat #10 is mod(266,256)
135792u_172_46.dat #172 is mod(6572,256)
246803u_43_35.dat #43 is mod(12587,256)
The info in 1stcheck.txt and 2ndcheck.txt is just the content of base.txt with some template/format applied.
I'd like to have
report.txt
12345 6 78 passed passed
(the first "passed" comes from matching 12345/6/78 in 1stcheck.txt, the second from 12345u_6_78 in 2ndcheck.txt)
13579 2 46 failed passed
24680 1 35 passed failed
123451 266 78 passed passed
135792 6572 46 passed passed
246803 12587 35 passed passed
Please keep performance in mind, since:
base.txt, 2ndcheck.txt ~ 8-12 MB
1stcheck.txt ~ 70 MB
Many thanks
You'll have to decide if this is memory efficient: it does have to store data from all files in arrays before printing the table.
Requires GNU awk:
gawk '
# base file: store keys (and line numbers for output ordering)
FILENAME == ARGV[1] {key[$0] = FNR; next}
# 1st check: if key appears in base, store result as pass
FILENAME == ARGV[2] {
    k = $2
    gsub(/\//, " ", k)
    if (k in key) pass1[k] = 1
}
# 2nd check: if key appears in base, store result as pass
FILENAME == ARGV[3] {
    if ( match($0, /([0-9]+)._([0-9]+)_([0-9]+)\.dat/, m) ) {
        k = m[1] " " m[2] " " m[3]
        if (k in key) pass2[k] = 1
    }
    next
}
# print the result table
END {
    PROCINFO["sorted_in"] = "#val_num_asc"  # traverse array by line number
    for (k in key) {
        printf "%s\t%s\t%s\n", k \
            , (k in pass1 ? "passed" : "failed") \
            , (k in pass2 ? "passed" : "failed")
    }
}
' base.txt 1stcheck.txt 2ndcheck.txt
12345 6 78 passed passed
13579 2 46 failed passed
24680 1 35 passed failed
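(Two GNU-awk-only details worth flagging: the three-argument match(), which captures the parenthesized groups into the array m, and PROCINFO["sorted_in"] = "#val_num_asc", which makes for (k in key) visit the keys in ascending order of their stored values, i.e. the base.txt line numbers, so the report keeps the original file order.)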
Based on @glenn jackman's suggestion, I was able to solve my problem:
gawk '
# Store keys for the 1st check
FILENAME == ARGV[1] {
    k = $2
    gsub(/\//, " ", k)
    key_first[k]; next
}
# Store keys for the 2nd check
FILENAME == ARGV[2] {
    if ( match($0, /([0-9]+)._([0-9]+)_([0-9]+)\.dat/, m) ) {
        k = m[1] " " m[2] " " m[3]
        key_second[k]
    }
    next
}
# base file: run both the 1st and 2nd check
FILENAME == ARGV[3] {
    if ($2 > 256) {
        first = $1 " " $2 "_" ($2%256) " " $3
    }
    else {
        first = $1 " " $2 " " $3
    }
    second = $1 " " $2%256 " " $3
    if (first in key_first) pass1[$0] = 1
    if (second in key_second) pass2[$0] = 1
    key[$0] = FNR; next
}
# print the result table
END {
    PROCINFO["sorted_in"] = "#val_num_asc"  # traverse array by line number
    for (k in key) {
        printf "%s\t%s\t%s\n", k \
            , (k in pass1 ? "sic_passed" : "sic_failed") \
            , (k in pass2 ? "gd_passed" : "gd_failed")
    }
}
' 1stcheck.txt 2ndcheck.txt base.txt

Finding averages from reading a file using Bash Scripting

I am trying to write a bash script that reads a file 'names.txt' and computes the average of people's grades. For instance, names.txt looks something like this:
900706845 Harry Thompson 70 80 90
900897665 Roy Ludson 90 90 90
The script should read each line and print the person's ID#, the average of the three test scores, and the corresponding letter grade, so the output needs to look like this:
900706845 80 B
900897665 90 A
Here's what I have:
#!/bin/bash
cat names.txt | while read x
do
$SUM=0; for i in 'names.txt'; do SUM=$(($SUM + $i));
done;
echo $SUM/3
done
I understand the echo will only print out the averages at this point, but I am trying to at least get it to compute the averages before I attempt the other parts as well. Baby steps!
Like this maybe:
#!/bin/bash
while read a name1 name2 g1 g2 g3
do
avg=$(echo "($g1+$g2+$g3)/3" | bc)
echo $a $name1 $name2 $avg
done < names.txt
Output:
900706845 Harry Thompson 80
900897665 Roy Ludson 90
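If you also want the letter grade from bash itself, one way is an if/elif ladder on an integer average. A sketch, reusing the cutoffs from the awk answers below (nothing here is from the original answer):

#!/bin/bash
while read id first last g1 g2 g3
do
    avg=$(( (g1 + g2 + g3) / 3 ))    # integer (truncating) average
    if   [ "$avg" -ge 90 ]; then grade=A
    elif [ "$avg" -ge 80 ]; then grade=B
    elif [ "$avg" -ge 70 ]; then grade=C
    elif [ "$avg" -ge 60 ]; then grade=D
    else grade=F
    fi
    echo "$id $avg $grade"
done < names.txt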
Customize gradeLetter for your own needs:
#!/bin/sh
awk '
function gradeLetter(g)
{
if (g >= 90) return "A";
if (g >= 80) return "B";
if (g >= 70) return "C";
if (g >= 60) return "D";
return "E"
}
{
avgGrade = ($(NF) + $(NF - 1) + $(NF - 2)) / 3;
print $1, avgGrade, gradeLetter(avgGrade)
}' names.txt
With an awk one-liner:
awk '{ AVG = int( ( $(NF-2) + $(NF-1) + $(NF) ) / 3 ) ; if ( AVG >= 90 ) { GRADE = "A" } else if ( AVG >= 80 ) { GRADE = "B" } else if ( AVG >= 70 ) { GRADE = "C" } else if ( AVG >= 60 ) { GRADE = "D" } else { GRADE = "F" } ; print $1, AVG, GRADE }' file
Let's look at the details:
awk '{
# Calculate average
AVG = int( ( $(NF-2) + $(NF-1) + $(NF) ) / 3 )
# Calculate grade
if ( AVG >= 90 ) { GRADE = "A" }
else if ( AVG >= 80 ) { GRADE = "B" }
else if ( AVG >= 70 ) { GRADE = "C" }
else if ( AVG >= 60 ) { GRADE = "D" }
else { GRADE = "F" }
print $1, AVG, GRADE
}' file
The ID#s and averages can be obtained as follows:
$ awk '{sum=0; for(i=4;i<=NF;i++) sum+=$i ; print $1, sum/(NF-3)}' names.txt
900706845 80
900897665 90
Guessing at how to compute grades, one can do:
$ awk '{sum=0; for(i=4;i<=NF;i++) sum+=$i ; ave=sum/(NF-3); print $1, ave, substr("FFFFFDCBA", ave/10, 1) }' names.txt
900706845 80 B
900897665 90 A
The above solutions work for any number of tests but names are limited to 2 words. If there will always be three tests but names can be any number of words, then use:
$ awk '{ave=($(NF-2)+$(NF-1)+$NF)/3; print $1, ave, substr("FFFFFDCBA", ave/10, 1) }' names.txt
900706845 80 B
900897665 90 A
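In case the lookup string looks cryptic (my note, not part of the original answer): ave/10 is used as a character position into "FFFFFDCBA", so the sample average of 80 picks character 8, "B", and 90 picks character 9, "A"; averages below 60 land on one of the leading "F"s. For example:

$ awk 'BEGIN { print substr("FFFFFDCBA", 80/10, 1) }'
B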

2d histogram making

I have a data file containing two columns, like
1.1 2.2
3.1 4.5
1.2 4.5
3.2 4.6
1.1 2.3
4.2 4.9
4.2 1.1
I would like to make a histogram from the two columns, i.e. to get this output (if the step size, or bin size since we are talking about histograms, equals 0.1 in this case):
1.0 1.0 0
1.0 1.1 0
1.0 1.2 0
...
1.1 1.0 0
1.1 1.1 0
1.1 1.2 0
...
1.1 2.0 0
1.1 2.1 0
1.1 2.2 1
...
...
Can anybody suggest something? It would be nice if I could set the range of values of the columns. In the above case the 1st column's values go from 1 to 4, and the same goes for the second column.
EDIT: updated to handle more general input, e.g. floating-point numbers. The step size in the above case is 0.1, but it would be nice if it were tunable, i.e. if the step size (bin size) could be, for example, 0.2 or 1.0.
If the step size is, for example, 1.0, then values like 1.1 and 1.8 fall into the same bin and have to be counted together, as in this example (with a range of, say, 0.0 ... 4.0 for both columns):
1.1 1.8
2.5 2.6
1.4 2.1
1.3 1.5
3.3 4.0
3.8 3.9
4.0 3.2
4.0 4.0
output (if the bin size is 1.0):
1 1 2
1 2 1
1 3 0
1 4 0
2 1 0
2 2 1
2 3 0
2 4 0
3 1 0
3 2 0
3 3 1
3 4 1
4 1 0
4 2 0
4 3 1
4 4 1
awk '
{
    b[$1, $2]++    # count occurrences of each ($1, $2) pair
}
END {
    # print an l-by-l grid of counts; x is never set, so it is the
    # empty string, and each block of i values ends with a blank line
    for (i = 0; ++i <= l;) {
        for (j = 0; ++j <= l;)
            printf "%d %d %d %s\n", i, j, \
                b[i, j], (j < l ? x : ORS)
    }
}' l=4 infile
You may try this (not thoroughly tested):
awk -v l=4 -v bs=0.1 'BEGIN {
    if (!bs) {
        print "invalid bin size" > "/dev/stderr"
        exit
    }
    split(bs, t, ".")
    t[2] || fl++
    m = "%." length(t[2]) "f"
}
{
    fk = fl ? int($1) : sprintf(m, $1)
    sk = fl ? int($2) : sprintf(m, $2)
    f[fk]; s[sk]; b[fk, sk]++
}
END {
    if (!bs) exit 1
    for (i = 1; int(i) <= l; i += bs) {
        for (j = 1; int(j) <= l; j += bs) {
            if (fl) {
                fk = int(i); sk = int(j); m = "%d"
            }
            else {
                fk = sprintf(m, i); sk = sprintf(m, j)
            }
            printf "%s" m OFS m OFS "%d\n", (i > 1 && fk != p ? ORS : x), fk, sk, b[fk, sk]
            p = fk
        }
    }
}' infile
You can try this in bash:
for x in {1..4} ; do
for y in {1..4} ; do
echo $x%$y 0
done
done \
| join -1 1 -2 2 - -a1 <(sed 's/ /%/' FILE \
| sort \
| uniq -c \
| sort -k2 ) \
| sed 's/ 0 / /;s/%/ /'
It creates the table with all zeros in the last column, joins it with the real results (classic frequency table sort | uniq -c) and removes the zeros from the lines where a different number should be shown.
One solution in perl (sample output and usage to follow):
#!/usr/bin/perl -W
use strict;

my ($min, $step, $max, $file) = @ARGV
    or die "Syntax: $0 <min> <step> <max> <file>\n";

my %seen;
open F, "$file"
    or die "Cannot open file $file: $!\n";

# build the full grid of bins, initialised to zero
my @l = map { chomp; $_ } qx/seq $min $step $max/;
foreach my $first (@l) {
    foreach my $second (@l) {
        $seen{"$first $second"} = 0;
    }
}

# count each input line against its bin
foreach my $line (<F>) {
    chomp $line;
    $line or next;
    $seen{$line}++;
}

my $len = @l; # size of the list
my $i = 0;
foreach my $key (sort keys %seen) {
    printf("%s %d\n", $key, $seen{$key});
    $i++;
    print "\n" unless $i % $len;
}
exit(0);
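Usage follows the syntax in the die message (the script name hist2d.pl is my assumption):

$ perl hist2d.pl 1 0.1 4 infile

Note that $seen{$line}++ counts an input line against the grid only when it textually matches a "min + k*step" value pair as printed by seq; lines in any other format show up as extra keys in the output.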
