How does ASN.1 encode an object identifier?

I am having trouble understanding the basic concepts of ASN.1.
If a type is an OID, does the corresponding number get actually encoded in the binary data?
For instance in this definition:
id-ad-ocsp OBJECT IDENTIFIER ::= { id-ad 1 }
Does the corresponding 1.3.6.1.5.5.7.48.1 get encoded in the binary exactly like this?
I am asking this because I am trying to understand a specific value I see in a DER file (a certificate), which is 04020500, and I am not sure how to interpret it.

Yes, the OID is encoded in the binary data. The OID 1.3.6.1.5.5.7.48.1 you mention becomes 2b 06 01 05 05 07 30 01 (the first two numbers are encoded together in a single byte, 40 * 1 + 3 = 43 = 0x2b, and each remaining number is encoded in a single byte as well because they are all smaller than 128).
A nice description of OID encoding is found here.
But the best way to analyze your ASN.1 data is to paste it into an online decoder, e.g. http://lapo.it/asn1js/.
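The value 04020500 from the question can be read like any other DER element: a tag byte, a length byte, then the content. Below is a minimal sketch, assuming single-byte tags and short-form lengths (under 128 bytes) only; `read_tlv` is a hypothetical helper name, not a library function.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Read one DER element; return (tag, content, next_offset).

    Assumes a single-byte tag and a short-form length (< 128).
    """
    tag = data[offset]
    length = data[offset + 1]
    if length & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = offset + 2
    return tag, data[start:start + length], start + length


outer_tag, outer_content, _ = read_tlv(bytes.fromhex("04020500"))
inner_tag, inner_content, _ = read_tlv(outer_content)
print(hex(outer_tag), outer_content.hex())  # 0x4 0500
print(hex(inner_tag), inner_content.hex())  # 0x5 followed by an empty string
```

Tag 0x04 is OCTET STRING and tag 0x05 is NULL, so 04020500 is a two-byte OCTET STRING whose content is itself an encoded NULL (05 00).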

If every number in your OID is less than or equal to 127, you are lucky, because each one fits in a single octet. The tricky part is larger numbers, which are common, such as in 1.2.840.113549.1.1.5 (sha1WithRSAEncryption). These examples focus on decoding, but encoding is just the reverse.
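1. The first two 'digits' are represented with a single byte
You can decode them by reading the first byte into an integer
var firstByteNumber = 42;
var firstDigit = Math.floor(firstByteNumber / 40); // integer division
var secondDigit = firstByteNumber % 40;
Produces the values
1.2
(This simple split assumes the first digit is 0 or 1. A first value of 80 or more means the first digit is 2 and the second digit is the value minus 80.)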
2. Subsequent bytes are represented using Variable Length Quantity, also called base 128.
VLQ has two forms,
Short Form - If the octet starts with 0, then it is simply represented using the remaining 7 bits.
Long Form - If the octet starts with a 1 (most significant bit), combine the next 7 bits of that octet plus the 7 bits of each subsequent octet until you come across an octet with a 0 as the most significant bit (this marks the last octet).
The value 840 would be represented with the following two bytes,
10000110
01001000
Combine to 00001101001000 and read as int.
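The bit-combining step above can be sketched in a few lines of Python. This is only an illustration of the base-128 rule; `decode_vlq` is a made-up name, and it assumes it is handed exactly the octets of one value.

```python
def decode_vlq(octets: bytes) -> int:
    """Decode one base-128 value whose octets are given in full."""
    value = 0
    for octet in octets:
        value = (value << 7) | (octet & 0x7F)  # append the low 7 bits
    return value


print(decode_vlq(bytes([0b10000110, 0b01001000])))  # 840
print(decode_vlq(bytes.fromhex("86f70d")))          # 113549
```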
Great resource for BER encoding, http://luca.ntop.org/Teaching/Appunti/asn1.html
The first octet has value 40 * value1 + value2. (This is unambiguous,
since value1 is limited to values 0, 1, and 2; value2 is limited to
the range 0 to 39 when value1 is 0 or 1; and, according to X.208, n is
always at least 2.)
The following octets, if any, encode value3, ...,
value_n. Each value is encoded base 128, most significant digit first,
with as few digits as possible, and the most significant bit of each
octet except the last in the value's encoding set to "1." Example: The
first octet of the BER encoding of RSA Data Security, Inc.'s object
identifier is 40 * 1 + 2 = 42 = 0x2a. The encoding of 840 = 6 * 128 +
0x48 is 86 48 and the encoding of 113549 = 6 * 128^2 + 0x77 * 128 + 0x0d
is 86 f7 0d. This leads to the following BER encoding:
06 06 2a 86 48 86 f7 0d
Finally, here is an OID decoder I just wrote in Perl.
sub getOid {
    my $bytes = shift;    # reference to an array of single-byte strings
    # the first 2 nodes are 'special'
    use integer;
    my $firstByte = shift @$bytes;
    my $number = unpack "C", $firstByte;
    my $nodeFirst = $number / 40;
    my $nodeSecond = $number % 40;
    my @oidDigits = ($nodeFirst, $nodeSecond);
    while (@$bytes) {
        my $num = convertFromVLQ($bytes);
        push @oidDigits, $num;
    }
    return join '.', @oidDigits;
}
sub convertFromVLQ {
    my $bytes = shift;
    my $firstByte = shift @$bytes;
    my $bitString = unpack "B*", $firstByte;
    my $firstBit = substr $bitString, 0, 1;
    my $remainingBits = substr $bitString, 1, 7;
    my $remainingByte = pack "B*", '0' . $remainingBits;
    my $remainingInt = unpack "C", $remainingByte;
    if ($firstBit eq '0') {
        # short form: the value is the low 7 bits
        return $remainingInt;
    }
    else {
        # long form: keep appending 7-bit groups until the MSB is 0
        my $bitBuilder = $remainingBits;
        my $nextFirstBit = "1";
        while ($nextFirstBit eq "1") {
            my $nextByte = shift @$bytes;
            my $nextBits = unpack "B*", $nextByte;
            $nextFirstBit = substr $nextBits, 0, 1;
            my $nextSevenBits = substr $nextBits, 1, 7;
            $bitBuilder .= $nextSevenBits;
        }
        # left-pad to 32 bits so the value can be unpacked as one integer
        my $MAX_BITS = 32;
        my $missingBits = $MAX_BITS - (length $bitBuilder);
        my $padding = '0' x $missingBits;
        $bitBuilder = $padding . $bitBuilder;
        my $finalByte = pack "B*", $bitBuilder;
        my $finalNumber = unpack "N", $finalByte;
        return $finalNumber;
    }
}

OID encoding for dummies :) :
each OID component is encoded to one or more bytes (octets)
OID encoding is just a concatenation of these OID component encodings
first two components are encoded in a special way (see below)
if the OID component's binary value has 7 bits or fewer (i.e. it is less than 128), the encoding is just a single octet holding the component value (note that the most significant, leftmost bit will always be 0)
otherwise, if it has 8 or more bits, the value is "spread" over multiple octets: split the binary representation into 7-bit chunks (from the right), left-pad the first one with zeroes if needed, and form octets from these septets by setting the most significant (left) bit to 1, except in the last chunk, which has a 0 there.
the first two components (X.Y) are encoded as if they were a single component with the value 40*X + Y
This is a rewording of ITU-T recommendation X.690, clause 8.19
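Read in the decoding direction, the rules above can be sketched as follows. This is a hedged illustration rather than a validated parser: `decode_oid` is a hypothetical name, it takes only the content octets (no tag or length), and it does not check for malformed input.

```python
def decode_oid(content: bytes) -> str:
    """Decode the content octets of an OBJECT IDENTIFIER (no tag/length)."""
    components = []
    value = 0
    for octet in content:
        value = (value << 7) | (octet & 0x7F)
        if not octet & 0x80:          # MSB clear: last octet of this component
            components.append(value)
            value = 0
    first = components[0]             # holds 40*X + Y
    x = min(first // 40, 2)           # X can only be 0, 1 or 2
    y = first - 40 * x
    return ".".join(str(c) for c in [x, y] + components[1:])


print(decode_oid(bytes.fromhex("2b06010505073001")))  # 1.3.6.1.5.5.7.48.1
print(decode_oid(bytes.fromhex("2a864886f70d")))      # 1.2.840.113549
```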

This is a simplistic Python 3 implementation of the above, i.e. it converts the string form of an object identifier into its ASN.1 DER or BER encoding.
def encode_variable_length_quantity(v: int) -> list:
    # Break it up in groups of 7 bits starting from the least significant bit.
    # For every group of 7 bits other than the lowest one, set the MSB to 1.
    m = 0x00
    output = []
    while v >= 0x80:
        output.insert(0, (v & 0x7f) | m)
        v = v >> 7
        m = 0x80
    output.insert(0, v | m)
    return output
def encode_oid_string(oid_str: str) -> tuple:
    a = [int(x) for x in oid_str.split('.')]
    oid = [a[0]*40 + a[1]]  # The first two items are coded as a1*40 + a2
    # The rest are variable-length quantities
    for n in a[2:]:
        oid.extend(encode_variable_length_quantity(n))
    oid.insert(0, len(oid))  # Add the length (short form: fewer than 128 content bytes)
    oid.insert(0, 0x06)  # Add the type (0x06 for OBJECT IDENTIFIER)
    return tuple(oid)
if __name__ == '__main__':
    oid = encode_oid_string("1.2.840.10045.3.1.7")
    print(oid)

Addressing to a LUT with IEEE-754 format

When a LUT is used, it is common to obtain its address via bitwise operations on the bits of a
fixed-point number. An example:
// The LUT has 8 addresses: from 0 to 7 with a step of 1.
// The input number, x, is in u4.8 format
// Input number is 1.748 --> Fixed representation is 447: 0001.10111111
address = bits[1:4] + bits[4] // := 2; Returned value: 2
// Input number is 4.69 --> Fixed representation is 1201: 0100.10110001
address = bits[1:4] + bits[4] // := 5; Returned value: 5
// Input number is 7.22 --> Fixed representation is 1848: 0111.00111000
address = bits[1:4] + bits[4] // := 7; Returned value: 7
OK, now suppose that the LUT has 16 stored values: from 0 to 7.5 with a step of 0.5. An example:
// The LUT has 16 addresses: from 0 to 7.5 with a step of 0.5.
// The input number, x, is in u4.8 format
// Input number is 1.748 --> Fixed representation is 447: 0001.10111111
address = bits[1:5] + bits[5] // := 3; Returned value: 1.5
// Input number is 4.69 --> Fixed representation is 1201: 0100.10110001
address = bits[1:5] + bits[5] // := 9; Returned value: 4.5
// Input number is 7.22 --> Fixed representation is 1848: 0111.00111000
address = bits[1:5] + bits[5] // := 14; Returned value: 7
The example only illustrates that the goal is to obtain the address that corresponds to the value nearest to the input, based on the step. I achieved this in fixed point with a matching rate greater than 99 % in all the tests.
But the question is: how to implement it in fp32 (IEEE-754)? Because the fp32 representation doesn't have separate integer and fractional parts, I have no idea how to implement it...
EDIT: FIRST APPROACH
As @njuffa remarked in the comments, the way to address a LUT with an IEEE-754 input is to use the MSBs of the mantissa. These bits contain the address, and it is necessary to take a specific range of bits that is always less than or equal to the address length. I have calculated the number of bits needed by taking the exponent into account. E.g. if the LUT has a step of 1/256, this is how I resolved the address:
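The fixed-point examples above boil down to "truncate to the address field, then add the next lower bit as a rounding bit". Here is a small sketch of that idea; the function name, its parameters and the final clamp to the last entry are assumptions made for illustration, not something stated in the question.

```python
def lut_address(fixed: int, frac_bits: int, step_log2: int, lut_size: int) -> int:
    """Nearest-entry address for an unsigned fixed-point input.

    fixed      raw integer (value * 2**frac_bits)
    step_log2  k, where the LUT step is 2**-k (k = 0 means a step of 1)
    """
    shift = frac_bits - step_log2              # bits below the address field
    address = fixed >> shift                   # truncate to a multiple of the step
    if shift > 0:
        address += (fixed >> (shift - 1)) & 1  # rounding bit
    return min(address, lut_size - 1)          # clamp to the last entry (assumption)


for x in (447, 1201, 1848):                    # 1.748, 4.69, 7.22 in u4.8
    print(lut_address(x, 8, 0, 8), lut_address(x, 8, 1, 16))
# prints "2 3", then "5 9", then "7 14"
```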
// To normalize the exponent (same sign convention as in the test
// below: exp is positive for inputs smaller than 1)
exp = 127 - exponent;
// msb indicates the number of bits to take from
// the mantissa, starting at its MSB. I have
// checked this for large LUTs (2¹⁸ stored values) and it
// always works well :)
// The step is 1/256, stored as step = 256: np.log2(step) = 8 -->
// the number of bits in the step!
msb = int(np.log2(step) - exp)
// And finally, get the bits from the mantissa
address = mantissa[31:msb]
Finally, an addition is necessary if rounding to nearest is required; if table interpolation is used, rounding to nearest is not necessary.
I have noticed that when the input value is close to zero, the address is sometimes incorrect. The difference from the reference test is always one.
This way of resolving the address is correct only if the LUT step is a power of 2. For example, if the step of the LUT were pi/(4*512) with a range from 0 to pi/4, the LUT would store 512 values, but with a step scaled by pi, so as far as I know a division would be necessary.
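The same address can also be computed with integer shifts on the raw fp32 bit pattern instead of string slicing. The sketch below is an illustration under stated assumptions: the input is non-negative and finite, the step is a power of two, no rounding to nearest is applied (it mirrors MODE 0), and `fp32_address` and its parameter name are made up here.

```python
import struct


def fp32_address(x: float, log2_entries_per_unit: int = 8) -> int:
    """floor(x * 2**k) taken from the fp32 bit pattern (x >= 0, finite)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    biased = (bits >> 23) & 0xFF
    if biased == 0:                              # zero or subnormal: far below the step
        return 0
    significand = (bits & 0x7FFFFF) | 0x800000   # restore the hidden bit
    # value = significand * 2**(biased - 127 - 23)
    shift = 23 - (biased - 127) - log2_entries_per_unit
    return significand >> shift if shift >= 0 else significand << -shift


print(fp32_address(1.748), fp32_address(0.5), fp32_address(0.003))  # 447 128 0
```

Because the shift discards every bit below the step, inputs smaller than one step naturally map to address 0 without a special case.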
This is the test that I have performed to verify that the address is correct.
import numpy as np
import struct
# if mode == 0: No nearest rounding for address
# if mode == 1: Nearest rounding for address
MODE = 0
step = 256     # Step size in the LUT is 1/step
max_value = 4  # Only for test purposes
LUT = np.arange(0, max_value, 1/step)
# Reference test
def ref_addressing(x):
    if MODE == 0:
        ref_address = (np.floor(x * step)).astype(np.int32)
    elif MODE == 1:
        ref_address = (np.round(x * step)).astype(np.int32)
    return ref_address
# Test
def test_addressing(x):
    j = 0
    test_address = np.zeros(len(x), dtype=np.int32)
    for x_ in x:
        test_address[j] = ieee754_address(x_)
        j = j + 1
    return test_address
# Convert value to IEEE-754 Standard
def float_to_bin(num):
    bits, = struct.unpack('!I', struct.pack('!f', num))
    return "{:032b}".format(bits)
# Resolves the address with the MSBs of the mantissa
def ieee754_address(x):
    ieee754 = float_to_bin(x)         # Get the number in IEEE-754 format
    exp = 127 - int(ieee754[1:9], 2)  # Calculate the exponent
    mnt = ieee754[9::]                # Get the mantissa
    # How many bits are needed for successful addressing?
    # np.log2(step): Maximum number of bits in the step size
    # Obviously, these values are fixed in the hardware
    # implementation. Only for testing.
    msb = int(np.log2(step) - exp)
    # Get the address. Don't forget the hidden bit!
    address = int('1'+mnt[0:msb], 2)
    # If rounding to the nearest, MODE == 1
    if MODE == 1:
        # int(mnt[msb], 2) --> Rounding bit
        address = address + int(mnt[msb], 2)
    # The address would be negative if exp > int(np.log2(step))
    if exp > int(np.log2(step)):
        address = 0
    return address
# Uniform random values to check all the range
r = np.random.uniform(0.0, max_value, 65536)
# Perform the tests
ref_address = ref_addressing(r)
test_address = test_addressing(r)
# Statistics and report
diffs = ref_address - test_address
errors = len(np.where(diffs != 0)[0])
p = (1 - (errors / len(r))) * 100
print("-----------------------------------")
print("Total test's samples : %i " % (len(r)))
print("-----------------------------------")
print("Equal addressing : %r " % (np.array_equal(ref_address, test_address)))
print("Errors : %i " % errors)
print("Probability matching : %0.3f %% " % p)
print("-----------------------------------")
And a sample of one test run (it shows False when one or more addresses are wrong):
-----------------------------------
Total test's samples : 65536
-----------------------------------
Equal addrressing : False
Errors : 2
Probability matching : 99.997 %
-----------------------------------

How to find the lexicographically smallest string by reversing a substring?

I have a string S which consists of a's and b's. Perform the below operation once. Objective is to obtain the lexicographically smallest string.
Operation: Reverse exactly one substring of S
e.g.
if S = abab then Output = aabb (reverse ba of string S)
if S = abba then Output = aabb (reverse bba of string S)
My approach
Case 1: If all characters of the input string are same then output will be the string itself.
Case 2: if S is of the form aaaaaaa....bbbbbb.... then answer will be S itself.
otherwise: Find the first occurrence of b in S; say its position is i. String S will look like
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |
     i
In order to obtain the lexicographically smallest string, the substring to be reversed starts at index i. See below for the possible endings j.
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |           |               |               |
     i           j               j               j
Reverse substring S[i:j] for every j and find the smallest string.
The complexity of the algorithm will be O(|S|*|S|) where |S| is the length of the string.
Is there a better way to solve this problem? Probably O(|S|) solution.
What I am thinking is that if we can pick the correct j in linear time, then we are done. We pick the j where the number of a's is maximal. If there is a single maximum then the problem is solved, but what if there isn't? I have tried a lot. Please help.
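For reference, the quadratic approach described above can be written down directly; it is useful as a baseline for checking any faster idea. This is a minimal sketch (the function name is made up), assuming S contains only a's and b's.

```python
def smallest_by_one_reversal(s: str) -> str:
    """Brute force: try every reversal that starts at the first 'b'."""
    i = s.find("b")
    if i == -1:
        return s                      # only a's: nothing to improve
    best = s
    for j in range(i + 1, len(s)):
        if s[j] == "a":               # only reversals ending in 'a' can help
            candidate = s[:i] + s[i:j + 1][::-1] + s[j + 1:]
            best = min(best, candidate)
    return best


print(smallest_by_one_reversal("abab"))  # aabb
print(smallest_by_one_reversal("abba"))  # aabb
```

Each candidate costs O(|S|) to build and compare, and there are up to |S| candidates, which gives the O(|S|*|S|) bound mentioned in the question.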

So, I came up with an algorithm that seems to be more efficient than O(|S|^2), but I'm not quite sure of its complexity. Here's a rough outline:
1. Strip off the leading a's, storing them in the variable start.
2. Group the rest of the string into letter chunks.
3. Find the indices of the groups with the longest sequences of a's.
4. If only one index remains, proceed to 10.
5. Filter these indices so that the length of the [first] group of b's after reversal is at a minimum.
6. If only one index remains, proceed to 10.
7. Filter these indices so that the length of the [first] group of a's (not including the leading a's) after reversal is at a maximum.
8. If only one index remains, proceed to 10.
9. Go back to 5, except inspect the [second/third/...] groups of a's and b's this time.
10. Return start, plus the reversed groups up to index, plus the remaining groups.
Since any substring that is being reversed begins with a b and ends in an a, no two hypothesized reversals are palindromes and thus two reversals will not result in the same output, guaranteeing that there is a unique optimal solution and that the algorithm will terminate.
My intuition says this approach is probably O(log(|S|)*|S|), but I'm not too sure. An example implementation (albeit not a very good one) in Python is provided below.
from itertools import groupby
def get_next_bs(i, groups, off):
    d = 1 + 2*off
    before_bs = len(groups[i-d]) if i >= d else 0
    after_bs = len(groups[i+d]) if i <= d and len(groups) > i + d else 0
    return before_bs + after_bs
def get_next_as(i, groups, off):
    d = 2*(off + 1)
    return len(groups[d+1]) if i < d else len(groups[i-d])
def maximal_reversal(s):
    # example input: 'aabaababbaababbaabbbaa'
    first_b = s.find('b')
    start, rest = s[:first_b], s[first_b:]
    # 'aa', 'baababbaababbaabbbaa'
    groups = [''.join(g) for _, g in groupby(rest)]
    # ['b', 'aa', 'b', 'a', 'bb', 'aa', 'b', 'a', 'bb', 'aa', 'bbb', 'aa']
    try:
        max_length = max(len(g) for g in groups if g[0] == 'a')
    except ValueError:
        return s  # no a's after the start, no reversal needed
    indices = [i for i, g in enumerate(groups) if g[0] == 'a' and len(g) == max_length]
    # [1, 5, 9, 11]
    off = 0
    while len(indices) > 1:
        min_bs = min(get_next_bs(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_bs(i, groups, off) == min_bs]
        # off 0: [1, 5, 9], off 1: [5, 9], off 2: [9]
        if len(indices) == 1:
            break
        max_as = max(get_next_as(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_as(i, groups, off) == max_as]
        # off 0: [1, 5, 9], off 1: [5, 9]
        off += 1
    i = indices[0]
    groups[:i+1] = groups[:i+1][::-1]
    return start + ''.join(groups)
    # 'aaaabbabaabbabaabbbbaa'

TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:
Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.
If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.
If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.
So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.
Here's a quick code example; it could undoubtedly be coded more elegantly:
function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97;                   // read next digit
        if (digit == 0) {
            if (first == undefined) continue;                   // skip leading zeros
            if (++len > max || len == max && reverse < value) { // better endpoint found
                max = len;
                last = pos;
                value = reverse;
            }
        } else {
            if (first == undefined) first = pos;                // end of leading zeros
            len = 0;
        }
        reverse += unit * digit;                                // update reverse value
        unit <<= 1;
        value = value * 2 + digit;                              // update endpoint value
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "→" + result.to);
(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)
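As a cross-check of the idea, here is a hedged Python sketch of the same single-pass scan. It is not a line-for-line port of the JavaScript: it compares the two values after the current digit has been appended to both, and it relies on Python's arbitrary-precision integers, so the string length is not limited by the integer width (at the price of big-number arithmetic, so the scan is not strictly linear). The name `best_reversal` is an assumption.

```python
def best_reversal(s: str):
    """Return (first, last) of the substring to reverse, or None if none helps."""
    first = last = None
    reverse = 0        # s[first..pos] read lsb-to-msb, i.e. fully reversed
    value = 0          # best candidate so far, extended msb-to-lsb
    unit = 1
    run = max_run = 0
    for pos, ch in enumerate(s):
        digit = 1 if ch == "b" else 0
        if first is None:
            if digit == 0:
                continue              # skip leading a's
            first = pos
        reverse += unit * digit
        unit <<= 1
        value = value * 2 + digit
        run = run + 1 if digit == 0 else 0
        # a longer run of a's always wins; equal runs are decided by value
        if digit == 0 and (run > max_run or (run == max_run and reverse < value)):
            max_run, last, value = run, pos, reverse
    return None if last is None else (first, last)


print(best_reversal("aaabbaabaaabbabaaabaaab"))  # (3, 21)
```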
You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.
Consider this string as an example:
aaabbaabaaabbabaaabaaab
or, written with zeros and ones for simplicity:
00011001000110100010001
We iterate over the leading zeros until we find the first one:
0001
   ^
This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:
reverse = 1, unit = 1
Then at every step, we double the unit and update the reverse number:
0001 reverse = 1
00011 unit = 2; reverse = 1 + 1 * 2 = 3
000110 unit = 4; reverse = 3 + 0 * 4 = 3
0001100 unit = 8; reverse = 3 + 0 * 8 = 3
At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:
endpoint = {position = 6, value = 3}
Then we go on iterating over the string, but at every step, we update the value of the possible endpoint, but now as a normal (msb-to-lsb) binary number:
00011001 unit = 16; reverse = 3 + 1 * 16 = 19
endpoint.value *= 2 + 1 = 7
000110010 unit = 32; reverse = 19 + 0 * 32 = 19
endpoint.value *= 2 + 0 = 14
0001100100 unit = 64; reverse = 19 + 0 * 64 = 19
endpoint.value *= 2 + 0 = 28
00011001000 unit = 128; reverse = 19 + 0 * 128 = 19
endpoint.value *= 2 + 0 = 56
At this point we find that we have a sequence of 3 zeros, which is longer than the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:
endpoint = {position = 10, value = 19}
And then we go on iterating over the string:
000110010001 unit = 256; reverse = 19 + 1 * 256 = 275
endpoint.value *= 2 + 1 = 39
0001100100011 unit = 512; reverse = 275 + 1 * 512 = 787
endpoint.value *= 2 + 1 = 79
00011001000110 unit = 1024; reverse = 787 + 0 * 1024 = 787
endpoint.value *= 2 + 0 = 158
000110010001101 unit = 2048; reverse = 787 + 1 * 2048 = 2835
endpoint.value *= 2 + 1 = 317
0001100100011010 unit = 4096; reverse = 2835 + 0 * 4096 = 2835
endpoint.value *= 2 + 0 = 634
00011001000110100 unit = 8192; reverse = 2835 + 0 * 8192 = 2835
endpoint.value *= 2 + 0 = 1268
000110010001101000 unit = 16384; reverse = 2835 + 0 * 16384 = 2835
endpoint.value *= 2 + 0 = 2536
Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:
endpoint.value = 2536 < reverse = 2835
so we keep the end-point set to position 10 and we go on iterating over the string:
0001100100011010001 unit = 32768; reverse = 2835 + 1 * 32768 = 35603
endpoint.value *= 2 + 1 = 5073
00011001000110100010 unit = 65536; reverse = 35603 + 0 * 65536 = 35603
endpoint.value *= 2 + 0 = 10146
000110010001101000100 unit = 131072; reverse = 35603 + 0 * 131072 = 35603
endpoint.value *= 2 + 0 = 20292
0001100100011010001000 unit = 262144; reverse = 35603 + 0 * 262144 = 35603
endpoint.value *= 2 + 0 = 40584
And we find another sequence of 3 zeros, so we compare this position to the stored end-point:
endpoint.value = 40584 > reverse = 35603
and we find it has a smaller value, so we replace the possible end-point with the current position:
endpoint = {position = 21, value = 35603}
And then we iterate over the final digit:
00011001000110100010001 unit = 524288; reverse = 35603 + 1 * 524288 = 559891
endpoint.value *= 2 + 1 = 71207
So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:
00011001000110100010001 -> 00000010001011000100111
   ^                 ^
start = 3         end = 21
Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.
#include <string>
#include <vector>
struct range {unsigned int first; unsigned int last;};
range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward(0), reverse(0);
    bool leading_zeros = true;
    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';
        if (!digit) {
            if (leading_zeros) continue;
            // better end-point: longer run of zeros, or equal run and smaller value
            if (++run > max_run || run == max_run && reverse < forward) {
                max_run = run;
                last = pos;
                forward = reverse;
            }
        }
        else {
            if (leading_zeros) {
                leading_zeros = false;
                first = pos;
            }
            run = 0;
        }
        forward.push_back(digit);
        reverse.insert(reverse.begin(), digit);
    }
    return range {first, last};
}

Convert binary ( integer and fraction) from VHDL to decimal, negative value in C code

I have 14-bit data fed from an FPGA (written in VHDL). The Nios II processor reads the 14-bit data from the FPGA and does some processing tasks; the Nios II system is programmed in C.
The 14-bit data can be positive, zero or negative. In the Altera compiler, I can only define the data as 8, 16 or 32 bits wide, so I define it as 16-bit data.
First, I need to check whether the data is negative; if it is, I need to pad the two MSBs with '1' bits so the system detects it as a negative value instead of a positive one.
Second, I need to convert this binary representation into a decimal value with BOTH integer and fractional parts.
I learned from this link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?) that I could convert a binary number (consisting of both integer and fraction) to a decimal value.
To be specific, I am able to use this code quoted from that link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?), reproduced below:
#include <stdio.h>
#include <math.h>
double convert(const char binary[]){
    int bi,i;
    int len = 0;
    int dot = -1;
    double result = 0;
    for(bi = 0; binary[bi] != '\0'; bi++){
        if(binary[bi] == '.'){
            dot = bi;
        }
        len++;
    }
    if(dot == -1)
        dot=len;
    for(i = dot; i >= 0 ; i--){
        if (binary[i] == '1'){
            result += (double) pow(2,(dot-i-1));
        }
    }
    for(i=dot; binary[i] != '\0'; i++){
        if (binary[i] == '1'){
            result += 1.0/(double) pow(2.0,(double)(i-dot));
        }
    }
    return result;
}
int main()
{
    char bin[] = "1101.11";
    char bin1[] = "1101";
    char bin2[] = "1101.";
    char bin3[] = ".11";
    printf("%s -> %f\n",bin, convert(bin));
    printf("%s -> %f\n",bin1, convert(bin1));
    printf("%s -> %f\n",bin2, convert(bin2));
    printf("%s -> %f\n",bin3, convert(bin3));
    return 0;
}
I am wondering whether this code can be used to check for a negative value. I tried a binary string of 11111101.11 and it gives the output 253.75...
I have two questions:
1. What modifications do I need to make in order to read a negative value?
I know that I can use a bit mask (as below) to check whether the MSB is 1; if it is, I know the value is negative...
if (data_14bit & 0x2000) // if true, the value is negative
The issue is that, since it involves a fractional part (not only an integer), I am confused about whether the method still works...
2. If the binary number is originally not in string format, is there any way I could convert it to a string? The binary number is originally fed from an FPGA block written in VHDL, say 14 bits, with the MSB as the sign bit, the following 6 bits as the magnitude of the integer part and the last 6 bits as the magnitude of the fractional part. I need the decimal value in C code for the Altera Nios II processor.

OK, so I'm focusing on the fact that you want to reuse the algorithm you mention at the beginning of your question, and I assume that the binary representation of your signed number is two's complement. Judging by your comments, though, I'm not sure that your input has the same form as the one the algorithm uses.
First, pad the 2 MSBs to get a 16-bit representation
data_16bit = (data_14bit & 0x2000) ? (data_14bit | 0xC000) : data_14bit;
If the value is positive it remains unchanged, and if it is negative this gives the correct two's complement representation on 16 bits.
For the fractional part, everything is the same as in the algorithm you mentioned in your question.
For the integer part, everything is the same except the treatment of the MSB.
For an unsigned number the MSB (i.e. bit[15]) represents pow(2,15-6) (6 is the width of the fractional part), whereas for a signed number in two's complement representation it represents -pow(2,15-6), meaning that the algorithm becomes
/* integer part operation */
while(p >= 1)
{
    rem = (int)fmod(p, 10);
    p = (int)(p / 10);
    dec = dec + rem * pow(2, t) * (9 != t ? 1 : -1);
    ++t;
}
or, said differently, if you don't want the extra * operator
/* integer part operation */
while(p >= 1)
{
    rem = (int)fmod(p, 10);
    p = (int)(p / 10);
    if( 9 != t)
    {
        dec = dec + rem * pow(2, t);
    }
    else
    {
        dec = dec - rem * pow(2, t);
    }
    ++t;
}
For the second algorithm that you mention, given your format, if dot == 10 and i == 0 we are at the MSB (10 integer bits followed by the dot, so pow(2,(dot-i-1)) is 512), and the code becomes
for(i = dot - 1; i >= 0 ; i--)
{
    if (binary[i] == '1')
    {
        if(10 != dot || i)
        {
            result += (double) pow(2,(dot-i-1));
        }
        else
        {
            // result -= (double) pow(2,(dot-i-1));
            // Due to your number format i == 0 and dot == 10 so
            result -= 512;
        }
    }
}
WARNING: in brice's algorithm the input is a character string like "11011.101", whereas according to your description you have an integer input, so I'm not sure that this algorithm is suited to your case

I think this should work:
#include <stdint.h>
float convert14BitsToFloat(int16_t in)
{
    /* Sign-extend in, since it is 14 bits */
    if (in & 0x2000) in |= 0xC000;
    /* convert to float with 6 fractional bits (64 = 2^6) */
    return (float)in / 64.0f;
}
To convert any number to a string, I would use sprintf. Be aware that it may significantly increase the size of your application. If you don't need the float and want to keep the application small, you should write your own conversion function.
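The sign extension and the division by 2^6 can be checked quickly on a host machine before moving to the Nios II. The sketch below is only a sanity check in Python, assuming the 14-bit word is two's complement with 6 fractional bits, as the answers assume; `q14_to_float` is a hypothetical name.

```python
def q14_to_float(raw: int, frac_bits: int = 6) -> float:
    """Interpret the low 14 bits of raw as two's complement fixed point."""
    raw &= 0x3FFF                 # keep 14 bits
    if raw & 0x2000:              # sign bit set
        raw -= 0x4000             # same effect as sign extension
    return raw / (1 << frac_bits)


print(q14_to_float(0b00000001_101100))   # 1.6875
print(q14_to_float(0x3FFF))              # -0.015625
print(q14_to_float(0x2000))              # -128.0
```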

How to aid Smaz in further compressing repeating characters?

Smaz is able to compress a short string (< 100 bytes) where other compression tools fail.
But it has a problem: it doesn't optimize repeating characters by itself.
For example the string "this is a short string" compresses fine:
\x9b8\xac>\xbb\xf2>\xc3F
It is 9 bytes long. But if you have a short string with repeating characters you have a problem.. for example the string "this is a string with many aaaaaaaaaaaaaaaaaaaaaa's" compresses into this:
\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\xfe'\n
It is still smaller, but the many "\x04"'s look like a waste of space.
I've been thinking about calculating a letter occurrence and replacing it with a sort of "bookmark".. for example "aaaaaaaaaa" with ten "a" occurrences becomes "a//10".
This is a test Python snippet I wrote off the top of my head; it is very ugly as of now
a = set("this is a string with many aaaaaaaaaaaaaaaaaaaaaa's")
b = "this is a string with many aaaaaaaaaaaaaaaaaaaaaa's"
for i in a:
    if i+i in b: # if char occ. > 2
        o = b.count(i) - 2
        s = i*o
        c = b.replace(s, i+'//'+str(o))
        print(c)
It then becomes:
this is a string with many a//22's
Smaz compressed
\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\xc5\xc5\xff\x0222'\n
My worry is: what if the string contains a URL? Is it safe to escape it with "//"? And then there are regex strings. How can it be escaped in that case?
Finally, my clear and concise question is: How do you safely shorten repeating characters that Smaz doesn't compress by itself?

Here's an example of safe compression of repeating bytes. My result for your data example
"this is a string with many aaaaaaaaaaaaaaaaaaaaaa's"
is:
"this is a string with many \x16a's"
It's 31 bytes long, a 39% reduction. "\x16" represents the one byte hexadecimal (22 decimal) count of repeating "a"'s.
What result do you get if you "Smaz" my result?
My result for your Smaz output example
"\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\xfe"
is:
"\x9b8\xac>\xc3F\xf3\xe3\xad\x01\tG\x16\x04\xfe"
It's 15 bytes long, a 56% reduction. "\x16" represents the one byte hexadecimal (22 decimal) count of repeating compressed "\x04"'s ("a"'s).
Here's my code in Go.
package main
import (
	"fmt"
)
func Compress(src []byte) (dst []byte) {
	for len(src) > 0 {
		c := src[0]
		n := 1
		for ; n < len(src) && src[n] == c; n++ {
		}
		src = src[n:]
		for n > 0 {
			m := (n-1)%31 + 1
			n -= m
			if m == 1 && !(1 <= c && c <= 31) {
				dst = append(dst, c)
			} else {
				dst = append(dst, byte(m), c)
			}
		}
	}
	return dst
}
func Decompress(src []byte) (dst []byte) {
	for i := 0; i < len(src); i++ {
		n, c := byte(1), src[i]
		if i+1 < len(src) && (1 <= c && c <= 31) {
			n, c = c, src[i+1]
			i++
		}
		for j := byte(0); j < n; j++ {
			dst = append(dst, c)
		}
	}
	return dst
}
func test(data string) {
	src := []byte(data)
	fmt.Printf("%d %q\n", len(src), src)
	compress := Compress(src)
	fmt.Printf("%d %q\n", len(compress), compress)
	decompress := Decompress(compress)
	fmt.Printf("%d %q\n", len(decompress), decompress)
	fmt.Println(string(Decompress(Compress(src))) == string(src))
}
func main() {
	data := "this is a string with many aaaaaaaaaaaaaaaaaaaaaa's"
	test(data)
	fmt.Println()
	smaz := "\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\xfe"
	test(smaz)
}
Output:
51 "this is a string with many aaaaaaaaaaaaaaaaaaaaaa's"
31 "this is a string with many \x16a's"
51 "this is a string with many aaaaaaaaaaaaaaaaaaaaaa's"
true
34 "\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\xfe"
15 "\x9b8\xac>\xc3F\xf3\xe3\xad\x01\tG\x16\x04\xfe"
34 "\x9b8\xac>\xc3F\xf3\xe3\xad\tG\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\x04\xfe"
true

How do you safely shorten repeating characters that Smaz doesn't compress by itself?
You can't without changing the Smaz algorithm and being incompatible with Smaz.
Smaz is purpose-built to be effective on small strings because its dictionary is universal and pre-computed. Other compression schemes need to build up a dictionary that depends on the data set, and they typically need a few hundred bytes of input before you see positive returns. Repeating sequences are rare in short strings.
For your proposed Smaz variant with run-length encoding to work, you would have to give up one of the 256 precious byte slots Smaz reserves for its codes. You could change one of the byte slots to mean "a byte indicating length follows, followed by the byte to be repeated", i.e., 3 bytes to communicate [REPEAT BYTE] [BYTE indicating 2 - 257 times] [BYTE CODE TO REPEAT]. You could reassign the Smaz byte code 253 from its present meaning of ".com" for the purpose of run-length encoding. But be aware that your compression will then be slightly less effective for general data containing ".com".
Also be aware that searching for repeating sequences in a hypothetical Smaz variant with run-length encoding would necessarily take more CPU time, because the compressor has to backtrack.
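The three-byte layout described above can be sketched in Python. This is only an illustration, not part of Smaz: the names rle_encode and rle_decode are hypothetical, the marker value 253 is taken from the suggestion above, and the minimum run length of 4 is an assumption (shorter runs would not shrink). The sketch also assumes the input never contains the marker byte, which a real variant would have to guarantee through its codebook and its handling of verbatim bytes.

```python
MARKER = 253     # assumed reserved slot, as suggested in the text above
MIN_RUN = 4      # assumption: a 3-byte token only pays off for runs of 4+
MAX_RUN = 257    # the count byte stores run - 2, so it covers 2..257


def rle_encode(data: bytes, marker: int = MARKER) -> bytes:
    """Replace long runs with [marker, run - 2, byte]."""
    if marker in data:
        raise ValueError("input already uses the reserved marker byte")
    out = bytearray()
    i = 0
    while i < len(data):
        # Extend the run starting at i, capped at MAX_RUN bytes.
        j = i
        while j < len(data) and data[j] == data[i] and j - i < MAX_RUN:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out += bytes([marker, run - 2, data[i]])
        else:
            out += data[i:j]  # short runs are copied unchanged
        i = j
    return bytes(out)


def rle_decode(data: bytes, marker: int = MARKER) -> bytes:
    """Expand every [marker, count, byte] token back into a run."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == marker:
            if i + 2 >= len(data):
                raise ValueError("truncated run-length token")
            out += bytes([data[i + 2]]) * (data[i + 1] + 2)
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

With these assumptions, a run of 22 "\x04" bytes becomes the three bytes 253, 20, 4, and rle_decode restores the original input.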

Find non-unique characters in a given string in O(n) time with constant space i.e with no extra auxiliary array

Given a string s containing only lowercase letters (a - z), find (i.e. print) the characters that are repeated.
For example, if string s = "aabcacdddec"
Output: a c d
Three approaches to this problem exist:
[brute force] Check every char of the string (i.e. compare s[i] with every other char and print it if both are the same)
Time complexity: O(n^2)
Space complexity: O(1)
[sort and then compare adjacent elements] After sorting (in O(n log n) time), traverse the string and check if s[i] and s[i + 1] are equal
Time complexity: O(n log n) + O(n) = O(n log n)
Space complexity: O(1)
[store the character count in an array] Create an array of size 26 (to keep track of a - z) and for every s[i], increment the value stored at index = s[i] - 'a' in the array. Finally traverse the array and print all elements (i.e. 'a' + i) with a value greater than 1
Time complexity: O(n)
Space complexity: O(1) but we have a separate array for storing the frequency of each element.
Is there an O(n) approach that DOES NOT use any array/hash table/map (etc)?
HINT: Use BIT Vectors

This is the element distinctness problem, so generally speaking, no, there is no way to solve it in O(n) without extra space.
However, if you regard the alphabet as constant size (a-z characters only is pretty constant), you can either create a bitset of these characters in O(1) space [it is constant!], or check for each character of the alphabet, in O(n), whether it occurs more than once. The latter is O(constant*n), which is still in O(n).
Pseudo code for 1st solution:
bit seen[] = new bit[SIZE_OF_ALPHABET] //constant!
bit printed[] = new bit[SIZE_OF_ALPHABET] //so is this!
for each i in seen.length: //init:
    seen[i] = 0
    printed[i] = 0
for each character c in string: //traverse the string:
    i = intValue(c)
    //already seen it and didn't print it? print it now!
    if seen[i] == 1 and printed[i] == 0:
        print c
        printed[i] = 1
    else:
        seen[i] = 1
Pseudo code for 2nd solution:
for each character c from a-z: //constant number of repeats is O(1)
    count = 0
    for each character x in the string: //O(n)
        if x==c:
            count += 1
    if count > 1:
        print c
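As a rough sketch of the first solution, the two bit arrays can be packed into two plain integers. The function name repeated_chars is hypothetical, the input is assumed to contain only a-z, and the returned list merely stands in for printing (it holds at most 26 entries).

```python
def repeated_chars(s: str) -> list[str]:
    """Return each repeated letter once, in order of its first repeat."""
    seen = 0      # bit i set: letter chr(ord('a') + i) was seen
    printed = 0   # bit i set: that letter was already reported
    result = []
    for ch in s:
        bit = 1 << (ord(ch) - ord("a"))
        if (seen & bit) and not (printed & bit):
            result.append(ch)
            printed |= bit
        seen |= bit
    return result


print(" ".join(repeated_chars("aabcacdddec")))  # prints "a c d"
```

Only 26 bits of each integer are ever used, so the auxiliary space stays constant regardless of the length of the string.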

Implementation in Java
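public static void findDuplicate(String str) {
    int checker = 0; // bit i is set once the letter 'a' + i has been seen
    int printed = 0; // bit i is set once the letter 'a' + i has been printed
    char c = 'a';
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - c;
        int bit = 1 << val;
        if ((checker & bit) != 0) {
            // Seen before: print it, but only the first time it repeats.
            if ((printed & bit) == 0) {
                System.out.println((char) (c + val));
                printed |= bit;
            }
        } else {
            checker |= bit;
        }
    }
}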
Uses an int as storage and bitwise operators to find the duplicates.
It runs in O(n). An explanation follows.
Input: "abddc"
i==0
STEP #1 : val = 97 - 97 (0). str.charAt(0) is 'a', and converting the char to int gives 97 (ASCII of 'a')
STEP #2 : 1 << val equals (1 << 0), which equals 1; finally 0 & 1 is 0
STEP #3 : checker = 0 | (1 << 0) equals 0 | 1, which equals 1; checker is 1
i==1
STEP #1 : val = 98 - 97 (1). str.charAt(1) is 'b', and converting the char to int gives 98 (ASCII of 'b')
STEP #2 : 1 << val equals (1 << 1), which equals 2; finally 1 & 2 is 0
STEP #3 : checker = 1 | (1 << 1) equals 1 | 2, which equals 3; checker is 3
i==2
STEP #1 : val = 100 - 97 (3). str.charAt(2) is 'd', and converting the char to int gives 100 (ASCII of 'd')
STEP #2 : 1 << val equals (1 << 3), which equals 8; finally 3 & 8 is 0
STEP #3 : checker = 3 | (1 << 3) equals 3 | 8, which equals 11; checker is 11
i==3
STEP #1 : val = 100 - 97 (3). str.charAt(3) is 'd', and converting the char to int gives 100 (ASCII of 'd')
STEP #2 : 1 << val equals (1 << 3), which equals 8; finally 11 & 8 is 8
Now print 'd' since the result is nonzero (8 > 0)
You can also use a bit vector; depending on the language, it may be more space-efficient. In Java I would prefer an int for this fixed, constant case of just 26 letters.
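Hand traces like the one above are easy to get wrong, so here is a small Python sketch that recomputes the same values. The helper name trace_checker is hypothetical; it mirrors the single-int checker logic and assumes the input contains only a-z.

```python
def trace_checker(s: str) -> list[tuple[str, int, int, bool]]:
    """Return (char, val, checker after the step, is_duplicate) per character."""
    checker = 0
    steps = []
    for ch in s:
        val = ord(ch) - ord("a")        # 'a' is 97 in ASCII
        bit = 1 << val
        duplicate = (checker & bit) != 0
        if not duplicate:
            checker |= bit              # remember the letter
        steps.append((ch, val, checker, duplicate))
    return steps


for ch, val, checker, duplicate in trace_checker("abddc"):
    print(ch, val, checker, "duplicate" if duplicate else "")
```

For "abddc" the checker takes the values 1, 3, 11, 11 and 15, and only the second 'd' is flagged as a duplicate.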

The size of the character set is a constant, so you could scan the input 26 times. All you need is a counter to store the number of times you've seen the character corresponding to the current iteration. At the end of each iteration, print that character if your counter is greater than 1.
It's O(n) in runtime and O(1) in auxiliary space.
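A minimal Python sketch of this scanning approach, assuming the input contains only a-z. The function name repeated_by_scanning is hypothetical, and the returned list stands in for printing.

```python
import string


def repeated_by_scanning(s: str) -> list[str]:
    """Scan the input once per letter of the alphabet, using one counter."""
    found = []
    for c in string.ascii_lowercase:   # 26 passes, a constant
        count = 0
        for x in s:                    # O(n) per pass
            if x == c:
                count += 1
        if count > 1:
            found.append(c)
    return found


print(" ".join(repeated_by_scanning("aabcacdddec")))  # prints "a c d"
```

Unlike the bitmask version, this reports the repeated letters in alphabetical order rather than in the order in which they first repeat.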

Implementation in C# (recursive solution)
static void getNonUniqueElements(string s, string nonUnique)
{
    if (s.Length > 0)
    {
        // Take the first character and look for it in the rest of the string.
        char ch = s[0];
        s = s.Substring(1);
        // LastIndexOf returns -1 when ch is absent; index 0 is a valid match.
        if (s.LastIndexOf(ch) >= 0)
        {
            // Record each repeated character only once.
            if (nonUnique.LastIndexOf(ch) < 0)
                nonUnique += ch;
        }
        getNonUniqueElements(s, nonUnique);
    }
    else
    {
        Console.WriteLine(nonUnique);
        return;
    }
}

static void Main(string[] args)
{
    getNonUniqueElements("aabcacdddec", "");
    Console.ReadKey();
}
