Why does Scapy recalculate the fragment size? - python-3.x

I am trying to fragment a 120-byte IP payload into fragments of 100 bytes. However, the output contains two packets, one of 138 bytes and the other of 50 bytes (the Ethernet and IP headers are 14 and 20 bytes respectively). The first packet carries payload bytes 0 to 103 and the second carries bytes 104 to 119. I cannot understand why it works this way, so I looked at the source of the fragment function defined in layers/inet.py, line 552.
Scapy recalculates the fragment size as follows:
def fragment(self, fragsize=1480):
    """Fragment IP datagrams"""
    fragsize = (fragsize + 7) // 8 * 8  # <- RECALCULATION OF FRAGMENT SIZE
    lst = []
    fnb = 0
    fl = self
    while fl.underlayer is not None:
        fnb += 1
        fl = fl.underlayer
    for p in fl:
        s = raw(p[fnb].payload)
        nb = (len(s) + fragsize - 1) // fragsize
        for i in range(nb):
            q = p.copy()
            del(q[fnb].payload)
            del(q[fnb].chksum)
            del(q[fnb].len)
            if i != nb - 1:
                q[fnb].flags |= 1
            q[fnb].frag += i * fragsize // 8
            r = conf.raw_layer(load=s[i * fragsize:(i + 1) * fragsize])
            r.overload_fields = p[fnb].payload.overload_fields.copy()
            q.add_payload(r)
            lst.append(q)
    return lst
Can somebody explain why it is doing so?
N.B:
Ethernet header size 14 byte
IPv4 header size 20 byte

See https://github.com/secdev/scapy/issues/2424#issuecomment-576879663
From https://www.rfc-editor.org/rfc/rfc791#section-3.2 (page 25, top):
If an internet datagram is fragmented, its data portion must be broken on 8 octet boundaries.
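To answer your question: the fragment size must be a multiple of 8, because the IP fragment offset is counted in units of 8 octets. Scapy therefore rounds the requested size up to the next multiple of 8.
104 is a multiple of 8; 100 is not.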
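The numbers from the question can be reproduced without Scapy. The sketch below only mirrors the rounding and slicing arithmetic of the fragment function quoted above; split_payload is a hypothetical helper name, not part of Scapy.

```python
def split_payload(payload: bytes, fragsize: int) -> list:
    """Split a payload the way the quoted fragment() does (illustrative helper)."""
    # Round the requested size up to the next multiple of 8
    fragsize = (fragsize + 7) // 8 * 8
    # Each entry is (fragment offset in 8-octet units, fragment data)
    return [(start // 8, payload[start:start + fragsize])
            for start in range(0, len(payload), fragsize)]

fragments = split_payload(bytes(120), 100)
for offset_units, data in fragments:
    # 14-byte Ethernet header + 20-byte IPv4 header + fragment data
    print(offset_units, len(data), 14 + 20 + len(data))
```

With a 120-byte payload and a requested size of 100, this yields one fragment of 104 bytes and one of 16 bytes, which matches the 138-byte and 50-byte frames seen in the question.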


Serial communication : Send a list from Python3 to Arduino

I have not managed to solve my problem, even after reading a lot about it online over the last few days.
I am trying to send a variable-length list from my Python3 program to my Arduino Leonardo.
The length of these lists varies, but there are only three possible lengths:
first possibility : [0, 0, 1, 176, 1, 0, 0]
second possibility : [0, 1, 11, 255]
third possibility : [0, 2, 0]
(most of the values inside these lists are variables)
My Python3 code :
with Serial(port = port, baudrate=9600, timeout=1, writeTimeout=1) as port_serie :
    if port_serie.isOpen() :
        for value in Datas_To_Send : # Send the data
            s = struct.pack('!{0}B'.format(len(Datas_To_Send)), *Datas_To_Send)
            port_serie.write(s)
This code sends binary values like this one :
b'\x00\x00\x01\xb0\x01\x00\x00'
(the original list to send was : [0, 0, 1, 176, 1, 0, 0])
The problem is that I have no idea how to recover my original list of values in my Arduino code...
My Arduino code (quite basic) :
void changeSettings() {
  if ( Serial.available() > 0 ) {
    int byte_read = Serial.read() ;
    Serial.println(byte_read, DEC) ;
  }
}
The output of this code is simply the decimal ASCII code of each character received...
Output (for the binary value I gave as example) :
98
39
92
120
48
48
92
120
48
48
92
120
48
49
92
120
98
48
92
120
48
49
92
120
48
48
92
120
48
48
39
10
Do you have any idea how to get the original list back?
Thank you
It seems you need to transmit either 7, 4 or 3 values, correct?
Are all the values under 256?
So, I would send 1 byte that is either 7, 4 or 3, followed by that many bytes holding the list's elements. If any item in your list is greater than 255 and less than 65,536, you'll need to send 2 bytes per element.
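The length-prefix idea can be tried on the Python side alone, without a serial port. This is a minimal sketch assuming every value fits in one byte; frame and unframe are hypothetical helper names, and unframe stands in for what the receiver would do.

```python
import struct

def frame(values):
    """Prefix the values with a one-byte count (each value must be 0..255)."""
    return struct.pack('!{0}B'.format(len(values) + 1), len(values), *values)

def unframe(data):
    """Read the count first, then exactly that many values."""
    count = data[0]
    return list(data[1:1 + count])

packet = frame([0, 0, 1, 176, 1, 0, 0])
print(packet)           # the bytes that would be passed to the serial write call
print(unframe(packet))  # the list recovered by the receiver
```

Because the count travels first, the receiver never has to guess which of the three list lengths is on the wire.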
Well, I found the solution thanks to tips from @MarkSetchel and Olmagzar (a Discord user).
Python3 code :
if Datas_To_Send :
    long = len(Datas_To_Send)
    Datas_To_Send.insert(0, long)
    with Serial(port = '/dev/cu.usbmodem14101', baudrate=9600, timeout=1, writeTimeout=1) as port_serie :
        if port_serie.isOpen() :
            s = struct.pack('!{0}B'.format(len(Datas_To_Send)), *Datas_To_Send)
            port_serie.write(s)
            port_serie.close()
So I add the list length at the first position of my list "Datas_To_Send".
That way I just have to read it first on the Arduino side to know how many items I have to read.
Arduino code :
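void changeSettings() {
  if (Serial.available() > 0)
  {
    // The first byte is the number of values that follow
    unsigned char len = Serial.read();
    unsigned char Datas[len] ;
    // Read all len values (stopping at len - 1 would drop the last one)
    for (int i = 0; i < len; i++)
    {
      // Wait for the next byte; Serial.read() returns -1 on an empty buffer
      while (Serial.available() == 0) {}
      unsigned char byte_read = Serial.read();
      Datas[i] = byte_read ;
    }
  }
}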

Addressing a LUT with IEEE-754 format

When a LUT is used, it is common to compute its address with bitwise operations on the bits of a number in fixed point. An example:
// The LUT has 8 addresses: from 0 to 7 with a step of 1.
// The input number, x, is in u4.8 format
// Input number is 1.748 --> Fixed representation is 447: 0001.10111111
address = bits[1:4] + bits[4] // := 2; Returned value: 2
// Input number is 4.69 --> Fixed representation is 1201: 0100.10110001
address = bits[1:4] + bits[4] // := 5; Returned value: 5
// Input number is 7.22 --> Fixed representation is 1848: 0111.00111000
address = bits[1:4] + bits[4] // := 7; Returned value: 7
Ok, now suppose that the LUT has 16 stored values: from 0 to 7.5 with a step of 0.5. An example:
// The LUT has 16 addresses: from 0 to 7.5 with a step of 0.5.
// The input number, x, is in u4.8 format
// Input number is 1.748 --> Fixed representation is 447: 0001.10111111
address = bits[1:5] + bits[5] // := 3; Returned value: 1.5
// Input number is 4.69 --> Fixed representation is 1201: 0100.10110001
address = bits[1:5] + bits[5] // := 9; Returned value: 4.5
// Input number is 7.22 --> Fixed representation is 1848: 0111.00111000
address = bits[1:5] + bits[5] // := 14; Returned value: 7
The example is only meant to illustrate that the goal is to obtain the address that corresponds to the value nearest to the input value, based on the step. I achieved this in fixed point with a matching probability greater than 99 % in all the tests.
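The two fixed-point examples above can be sketched in Python as a shift with round-half-up, which is equivalent to taking the address bits and adding the next lower bit. The function name and parameters are illustrative assumptions, not taken from any library.

```python
def fixed_lut_address(x_fixed: int, frac_bits: int = 8, addr_frac_bits: int = 0) -> int:
    """Nearest LUT address for an unsigned fixed-point input (illustrative helper).

    frac_bits: fractional bits of the input (8 for u4.8).
    addr_frac_bits: fractional bits kept in the address (0 for step 1, 1 for step 0.5).
    """
    shift = frac_bits - addr_frac_bits
    # Adding half of the dropped range before shifting rounds to the nearest address
    return (x_fixed + (1 << (shift - 1))) >> shift

for fixed in (447, 1201, 1848):  # 1.748, 4.69 and 7.22 in u4.8
    print(fixed, fixed_lut_address(fixed), fixed_lut_address(fixed, addr_frac_bits=1))
```

An input close to the top of the range would round up past the last address, so a real design would presumably clamp the result; the examples above do not hit that case.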
But the question is: how to implement it in fp32 (IEEE-754)? Because the fp32 representation doesn't have an integer part and a fractional part, I have no idea how to implement it...
EDIT: FIRST APPROACH
As @njuffa remarked in the comments, the way to address a LUT with an IEEE-754 input is to use the MSBs of the mantissa. These bits contain the address, and it is necessary to take a specific range of them, always less than or equal to the address length. I have calculated the number of bits needed by taking the bits of the exponent into account. E.g. if the LUT has a step of 1/256, this is how I resolve the address:
// Remove the exponent bias (note the sign: exp is the negated exponent,
// as in the test below)
exp = 127 - exponent;
// msb indicates the number of bits to get
// from the mantissa. These bits are the MSBs. I have
// checked for large LUTs (2¹⁸ stored values) and it always
// works well :)
// The step is 1/256, so step = 256 and np.log2(step) = 8 --> The number of
// bits in the step!
msb = int(np.log2(step) - exp)
// And finally, get the bits from the mantissa
address = mantissa[31:msb]
Finally, an addition is necessary if rounding to nearest is required; if table interpolation is used, rounding to nearest is not necessary.
I have noticed that when the input value is close to zero, the address is sometimes incorrect. The difference from the reference test is always one.
This way of resolving the address is only right if the step of the LUT is a power of 2. For example, if the LUT had a step of pi/(4*512) over a range from 0 to pi/4, it would store 512 values, but with a step scaled by pi, so as far as I know a division would be necessary.
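For the power-of-two case, the same idea can be written with integer operations on the raw fp32 bits instead of string slicing. This is a minimal sketch assuming a non-negative, finite input and truncation (no rounding to nearest); lut_address and its parameter name are hypothetical.

```python
import struct

def lut_address(x: float, log2_entries_per_unit: int = 8) -> int:
    """floor(x * 2**log2_entries_per_unit) computed from the fp32 bit pattern."""
    bits, = struct.unpack('!I', struct.pack('!f', x))
    biased_exp = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if biased_exp == 0:
        return 0  # zero or subnormal: far below one LUT step
    significand = mantissa | 0x800000  # restore the hidden bit
    # x = significand * 2**(e - 23), so x * 2**k needs a right shift by 23 - e - k
    shift = 23 - (biased_exp - 127) - log2_entries_per_unit
    if shift >= 24:
        return 0  # every significand bit falls below the first address bit
    if shift <= 0:
        return significand << -shift
    return significand >> shift

for value in (1.748, 0.5, 1 / 256, 0.001, 3.999):
    print(value, lut_address(value))
```

Note that packing a Python float with '!f' first rounds it to the nearest fp32 value, so a comparison against a float64 reference can differ by one right at a step boundary.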
This is the test that I performed to verify that the address is correct.
import numpy as np
import struct
# if MODE == 0: No nearest rounding for address
# if MODE == 1: Nearest rounding for address
MODE = 0
step = 256 # Entries per unit: the LUT step is 1/step
max_value = 4 # Only for test purposes
LUT = np.arange(0, max_value, 1/step)
# Reference test
def ref_addressing(x):
    if MODE == 0:
        ref_address = (np.floor(x * step)).astype(np.int32)
    elif MODE == 1:
        ref_address = (np.round(x * step)).astype(np.int32)
    return ref_address
# Test
def test_addressing(x):
    j = 0
    test_address = np.zeros(len(x), dtype=np.int32)
    for x_ in x:
        test_address[j] = ieee754_address(x_)
        j = j + 1
    return test_address
# Convert value to IEEE-754 Standard
def float_to_bin(num):
    bits, = struct.unpack('!I', struct.pack('!f', num))
    return "{:032b}".format(bits)
# Resolves the address with the MSBs of the mantissa
def ieee754_address(x):
    ieee754 = float_to_bin(x) # Get the number in IEEE754 Standard
    exp = 127 - int(ieee754[1:9], 2) # Calculate the (negated) exponent
    mnt = ieee754[9::] # Get the mantissa
    # How many bits are needed for successful addressing?
    # np.log2(step): Maximum number of bits in the step size
    # Obviously, these values are fixed in the hardware
    # implementation. Only for testing.
    msb = int(np.log2(step) - exp)
    # Get the address. Don't forget the hidden bit!
    address = int('1'+mnt[0:msb], 2)
    # If rounding to the nearest, MODE == 1
    if MODE == 1:
        # int(mnt[msb], 2) --> Rounding bit
        address = address + int(mnt[msb], 2)
    # The input is below one LUT step if exp > int(np.log2(step))
    if exp > int(np.log2(step)):
        address = 0
    return address
# Uniform random values to check all the range
r = np.random.uniform(0.0, max_value, 65536)
# Perform the tests
ref_address = ref_addressing(r)
test_address = test_addressing(r)
# Statistics and report
diffs = ref_address - test_address
errors = len(np.where(diffs != 0)[0])
p = (1 - (errors / len(r))) * 100
print("-----------------------------------")
print("Total test's samples : %i " % (len(r)))
print("-----------------------------------")
print("Equal addressing : %r " % (np.array_equal(ref_address, test_address)))
print("Errors : %i " % errors)
print("Probability matching : %0.3f %% " % p)
print("-----------------------------------")
And a sample of one test run (it shows False when one or more addresses are not right):
-----------------------------------
Total test's samples : 65536
-----------------------------------
Equal addressing : False
Errors : 2
Probability matching : 99.997 %
-----------------------------------

How to find the lexicographically smallest string by reversing a substring?

I have a string S which consists of a's and b's. The operation below must be performed exactly once. The objective is to obtain the lexicographically smallest string.
Operation: Reverse exactly one substring of S
e.g.
if S = abab then Output = aabb (reverse ba of string S)
if S = abba then Output = aabb (reverse bba of string S)
My approach
Case 1: If all characters of the input string are same then output will be the string itself.
Case 2: if S is of the form aaaaaaa....bbbbbb.... then answer will be S itself.
otherwise: Find the first occurrence of b in S, say at position i. String S will look like
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |
     i
In order to obtain the lexicographically smallest string, the substring to be reversed must start at index i. See below for the possible end positions j.
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
     |        |              |                |
     i        j              j                j
Reverse substring S[i:j] for every j and find the smallest string.
The complexity of the algorithm will be O(|S|*|S|) where |S| is the length of the string.
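The quadratic approach described above can be sketched as follows, mainly as a reference to check faster solutions against. The function name is hypothetical, and j is treated as an inclusive end index.

```python
def smallest_after_one_reversal(s: str) -> str:
    """Brute force: try every reversal that starts at the first 'b' and ends on an 'a'."""
    i = s.find('b')
    if i == -1:
        return s  # only a's: any reversal leaves the string unchanged
    best = s  # reversing a single character is allowed and changes nothing
    for j in range(i + 1, len(s)):
        if s[j] == 'a':
            candidate = s[:i] + s[i:j + 1][::-1] + s[j + 1:]
            if candidate < best:
                best = candidate
    return best

for example in ("abab", "abba", "aabb", "aaabbaabaaabbabaaabaaab"):
    print(example, "->", smallest_after_one_reversal(example))
```

Each candidate costs O(|S|) to build and compare and there are up to |S| candidates, which gives the O(|S|*|S|) bound mentioned above.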
Is there a better way to solve this problem, ideally an O(|S|) solution?
My thinking is that if we can pick the correct j in linear time, then we are done. We would pick the j that ends the longest run of a's. If there is a single longest run, the problem is solved, but what if there are several? I have tried a lot. Please help.
So, I came up with an algorithm that seems to be more efficient than O(|S|^2), but I'm not quite sure of its complexity. Here's a rough outline:
1. Strip off the leading a's, storing them in the variable start.
2. Group the rest of the string into letter chunks.
3. Find the indices of the groups with the longest sequences of a's.
4. If only one index remains, proceed to 10.
5. Filter these indices so that the length of the [first] group of b's after reversal is at a minimum.
6. If only one index remains, proceed to 10.
7. Filter these indices so that the length of the [first] group of a's (not including the leading a's) after reversal is at a minimum.
8. If only one index remains, proceed to 10.
9. Go back to 5, except inspect the [second/third/...] groups of a's and b's this time.
10. Return start, plus the reversed groups up to index, plus the remaining groups.
Since any substring that is being reversed begins with a b and ends in an a, no two hypothesized reversals are palindromes and thus two reversals will not result in the same output, guaranteeing that there is a unique optimal solution and that the algorithm will terminate.
My intuition says this approach is probably O(log(|S|)*|S|), but I'm not too sure. An example implementation in Python (albeit not a very good one) is provided below.
from itertools import groupby
def get_next_bs(i, groups, off):
    d = 1 + 2*off
    before_bs = len(groups[i-d]) if i >= d else 0
    after_bs = len(groups[i+d]) if i <= d and len(groups) > i + d else 0
    return before_bs + after_bs
def get_next_as(i, groups, off):
    d = 2*(off + 1)
    return len(groups[d+1]) if i < d else len(groups[i-d])
def maximal_reversal(s):
    # example input: 'aabaababbaababbaabbbaa'
    first_b = s.find('b')
    start, rest = s[:first_b], s[first_b:]
    # 'aa', 'baababbaababbaabbbaa'
    groups = [''.join(g) for _, g in groupby(rest)]
    # ['b', 'aa', 'b', 'a', 'bb', 'aa', 'b', 'a', 'bb', 'aa', 'bbb', 'aa']
    try:
        max_length = max(len(g) for g in groups if g[0] == 'a')
    except ValueError:
        return s # no a's after the start, no reversal needed
    indices = [i for i, g in enumerate(groups) if g[0] == 'a' and len(g) == max_length]
    # [1, 5, 9, 11]
    off = 0
    while len(indices) > 1:
        min_bs = min(get_next_bs(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_bs(i, groups, off) == min_bs]
        # off 0: [1, 5, 9], off 1: [5, 9], off 2: [9]
        if len(indices) == 1:
            break
        max_as = max(get_next_as(i, groups, off) for i in indices)
        indices = [i for i in indices if get_next_as(i, groups, off) == max_as]
        # off 0: [1, 5, 9], off 1: [5, 9]
        off += 1
    i = indices[0]
    groups[:i+1] = groups[:i+1][::-1]
    return start + ''.join(groups)
    # 'aaaabbabaabbabaabbbbaa'
TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:
Iterate over the string, and update its value interpreted as a reverse (lsb-to-msb) binary number.
If you find the last zero of a sequence of zeros that is longer than the current maximum, store the current position, and the current reverse value. From then on, also update this value, interpreting the rest of the string as a forward (msb-to-lsb) binary number.
If you find the last zero of a sequence of zeros that is as long as the current maximum, compare the current reverse value with the current value of the stored end-point; if it is smaller, replace the end-point with the current position.
So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.
Here's a quick code example; it could undoubtedly be coded more elegantly:
function reverseSubsequence(str) {
    var reverse = 0, max = 0, first, last, value, len = 0, unit = 1;
    for (var pos = 0; pos < str.length; pos++) {
        var digit = str.charCodeAt(pos) - 97; // read next digit
        if (digit == 0) {
            if (first == undefined) continue; // skip leading zeros
            if (++len > max || len == max && reverse < value) { // better endpoint found
                max = len;
                last = pos;
                value = reverse;
            }
        } else {
            if (first == undefined) first = pos; // end of leading zeros
            len = 0;
        }
        reverse += unit * digit; // update reverse value
        unit <<= 1;
        value = value * 2 + digit; // update endpoint value
    }
    return {from: first || 0, to: last || 0};
}
var result = reverseSubsequence("aaabbaabaaabbabaaabaaab");
document.write(result.from + "→" + result.to);
(The code could be simplified by comparing reverse and value whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)
You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.
Consider this string as an example:
aaabbaabaaabbabaaabaaab
or, written with zeros and ones for simplicity:
00011001000110100010001
We iterate over the leading zeros until we find the first one:
0001
   ^
This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:
reverse = 1, unit = 1
Then at every step, we double the unit and update the reverse number:
0001                     unit = 1; reverse = 1
00011                    unit = 2; reverse = 1 + 1 * 2 = 3
000110                   unit = 4; reverse = 3 + 0 * 4 = 3
0001100                  unit = 8; reverse = 3 + 0 * 8 = 3
At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:
endpoint = {position = 6, value = 3}
Then we go on iterating over the string, but at every step, we also update the value of the possible endpoint, now as a normal (msb-to-lsb) binary number:
00011001                 unit = 16; reverse = 3 + 1 * 16 = 19
                         endpoint.value = 3 * 2 + 1 = 7
000110010                unit = 32; reverse = 19 + 0 * 32 = 19
                         endpoint.value = 7 * 2 + 0 = 14
0001100100               unit = 64; reverse = 19 + 0 * 64 = 19
                         endpoint.value = 14 * 2 + 0 = 28
00011001000              unit = 128; reverse = 19 + 0 * 128 = 19
                         endpoint.value = 28 * 2 + 0 = 56
At this point we find that we have a sequence of 3 zeros, which is longer than the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:
endpoint = {position = 10, value = 19}
And then we go on iterating over the string:
000110010001             unit = 256; reverse = 19 + 1 * 256 = 275
                         endpoint.value = 19 * 2 + 1 = 39
0001100100011            unit = 512; reverse = 275 + 1 * 512 = 787
                         endpoint.value = 39 * 2 + 1 = 79
00011001000110           unit = 1024; reverse = 787 + 0 * 1024 = 787
                         endpoint.value = 79 * 2 + 0 = 158
000110010001101          unit = 2048; reverse = 787 + 1 * 2048 = 2835
                         endpoint.value = 158 * 2 + 1 = 317
0001100100011010         unit = 4096; reverse = 2835 + 0 * 4096 = 2835
                         endpoint.value = 317 * 2 + 0 = 634
00011001000110100        unit = 8192; reverse = 2835 + 0 * 8192 = 2835
                         endpoint.value = 634 * 2 + 0 = 1268
000110010001101000       unit = 16384; reverse = 2835 + 0 * 16384 = 2835
                         endpoint.value = 1268 * 2 + 0 = 2536
Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:
endpoint.value = 2536 < reverse = 2835
so we keep the end-point set to position 10 and we go on iterating over the string:
0001100100011010001      unit = 32768; reverse = 2835 + 1 * 32768 = 35603
                         endpoint.value = 2536 * 2 + 1 = 5073
00011001000110100010     unit = 65536; reverse = 35603 + 0 * 65536 = 35603
                         endpoint.value = 5073 * 2 + 0 = 10146
000110010001101000100    unit = 131072; reverse = 35603 + 0 * 131072 = 35603
                         endpoint.value = 10146 * 2 + 0 = 20292
0001100100011010001000   unit = 262144; reverse = 35603 + 0 * 262144 = 35603
                         endpoint.value = 20292 * 2 + 0 = 40584
And we find another sequence of 3 zeros, so we compare this position to the stored end-point:
endpoint.value = 40584 > reverse = 35603
and we find that the current position has a smaller value, so we replace the possible end-point with the current position:
endpoint = {position = 21, value = 35603}
And then we iterate over the final digit:
00011001000110100010001  unit = 524288; reverse = 35603 + 1 * 524288 = 559891
                         endpoint.value = 35603 * 2 + 1 = 71207
So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:
00011001000110100010001 -> 00000010001011000100111
   ^                 ^
start = 3         end = 21
Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.
#include <string>
#include <vector>
struct range {unsigned int first; unsigned int last;};
range lexiLeastRev(std::string const &str) {
    unsigned int len = str.length(), first = 0, last = 0, run = 0, max_run = 0;
    std::vector<bool> forward(0), reverse(0);
    bool leading_zeros = true;
    for (unsigned int pos = 0; pos < len; pos++) {
        bool digit = str[pos] - 'a';
        if (!digit) {
            if (leading_zeros) continue;
            if (++run > max_run || run == max_run && reverse < forward) {
                max_run = run;
                last = pos;
                forward = reverse;
            }
        }
        else {
            if (leading_zeros) {
                leading_zeros = false;
                first = pos;
            }
            run = 0;
        }
        forward.push_back(digit);
        reverse.insert(reverse.begin(), digit);
    }
    return range {first, last};
}

Why does compressing a SAS dataset increase its size?

I wrote code that creates a SAS dataset with the compress=yes option. However, the log shows that compressing the resulting dataset increased its size:
1374 +proc sql;
1375 + create table seg.KRG_EO_PVS_CUST_PROD_&op_cyc.
1376 + (
1377 + COMPRESS = YES
1378 + ) as
1379 + select
^L32 The SAS System 02:15 Thursday, August 20, 2015
1380 + W6DFFTE1.DIB_CUST_ID length = 8
1381 + format = 15.
1382 + informat = 15.
1383 + label = 'The logical customer id',
1384 + W6DFFTE1.DIB_PROD_ID length = 8
1385 + format = 15.
1386 + informat = 15.
1387 + label = 'The product id',
1388 + case when W5TM24S0.OFFER_FLAG = "1" then "1" else "0" end as OFFER_FLAG length = 1,
1389 + sum(W6DFFTE1.TOT_QUANTITY ) as TOT_QUANTITY length = 8
1390 + format = 10.
1391 + informat = 5.
1392 + label = 'Number of items purchased'
1393 + from
1394 + work.W6DFFTE1 left join
1395 + work.W5TM24S0
1396 + on
1397 + (
1398 + W5TM24S0.DIB_STORE_ID = W6DFFTE1.DIB_STORE_ID
1399 + and W5TM24S0.DIB_SCAN_ID = W6DFFTE1.DIB_SCAN_ID
1400 + )
1401 + group by
1402 + W6DFFTE1.DIB_CUST_ID,
1403 + W6DFFTE1.DIB_PROD_ID,
1404 + W5TM24S0.OFFER_FLAG
1405 + ;
NOTE: Compressing data set SEG.KRG_EO_PVS_CUST_PROD_20150701 increased size by 43.27 percent.
Compressed is 1961732 pages; un-compressed would require 1369265 pages.
NOTE: Table SEG.KRG_EO_PVS_CUST_PROD_20150701 created, with 346423801 rows and 4 columns.
I just want to know the probable reasons for this.
SAS compression is pretty primitive and compress=yes just lets SAS save disk space by not writing actual bytes of data for unused length in character variables. It looks like your data is three numeric variables, plus a one-character-long variable. This is not much to work with, plus it would have to add whatever formatting overhead is involved with a compressed file.
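The figures in the log can be checked with a little arithmetic. The sketch below only uses numbers that appear in the log and the lengths declared in the query; the variable names are illustrative.

```python
compressed_pages = 1961732
uncompressed_pages = 1369265
rows = 346423801

# Size increase reported by SAS, recomputed from the page counts
increase_pct = (compressed_pages / uncompressed_pages - 1) * 100
print("increase: %.2f %%" % increase_pct)

# Declared lengths: three 8-byte numeric columns and one 1-byte character column
row_bytes = 8 + 8 + 8 + 1
print("bytes per row:", row_bytes)
print("rows per uncompressed page: %.1f" % (rows / uncompressed_pages))
```

With only 25 bytes per row and a single one-character column, there is very little repeated padding for the compression to remove, so any per-row bookkeeping it adds can easily outweigh the savings.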
If you need to compress files for medium or long term storage, you're much better off using a separate zip or tar utility.
EDIT: I don't mean to disparage SAS compression. I believe the designers were more concerned with preserving relatively fast disk access than with providing actual zip-style compression.

How does ASN.1 encode an object identifier?

I am having trouble understanding the basic concepts of ASN.1.
If a type is an OID, does the corresponding number get actually encoded in the binary data?
For instance in this definition:
id-ad-ocsp OBJECT IDENTIFIER ::= { id-ad 1 }
Does the corresponding 1.3.6.1.5.5.7.48.1 get encoded in the binary exactly like this?
I am asking this because I am trying to understand a specific value I see in a DER file (a certificate), which is 04020500, and I am not sure how to interpret it.
Yes, the OID is encoded in the binary data. The OID 1.3.6.1.5.5.7.48.1 you mention becomes 2b 06 01 05 05 07 30 01 (the first two numbers are encoded in a single byte, and each remaining number is also encoded in a single byte because they are all smaller than 128).
A nice description of OID encoding is found here.
But the best way to analyze your ASN.1 data is to paste it into an online decoder, e.g. http://lapo.it/asn1js/.
If all your digits are less than or equal to 127 then you are very lucky because they can be represented with a single octet each. The tricky part is when you have larger numbers which are common, such as 1.2.840.113549.1.1.5 (sha1WithRsaEncryption). These examples focus on decoding, but encoding is just the opposite.
1. First two 'digits' are represented with a single byte
You can decode by reading the first byte into an integer
var firstByteNumber = 42;
var firstDigit = firstByteNumber / 40; // integer division
var secondDigit = firstByteNumber % 40;
Produces the values
1.2
2. Subsequent bytes are represented using Variable Length Quantity, also called base 128.
VLQ has two forms,
Short Form - If the octet starts with 0, then it is simply represented using the remaining 7 bits.
Long Form - If the octet starts with a 1 (most significant bit), combine the next 7 bits of that octet plus the 7 bits of each subsequent octet until you come across an octet with a 0 as the most significant bit (this marks the last octet).
The value 840 would be represented with the following two bytes,
10000110
01001000
Combine to 00001101001000 and read as int.
Great resource for BER encoding, http://luca.ntop.org/Teaching/Appunti/asn1.html
The first octet has value 40 * value1 + value2. (This is unambiguous,
since value1 is limited to values 0, 1, and 2; value2 is limited to
the range 0 to 39 when value1 is 0 or 1; and, according to X.208, n is
always at least 2.)
The following octets, if any, encode value3, ...,
value_n. Each value is encoded base 128, most significant digit first,
with as few digits as possible, and the most significant bit of each
octet except the last in the value's encoding set to "1." Example: The
first octet of the BER encoding of RSA Data Security, Inc.'s object
identifier is 40 * 1 + 2 = 42 = 2a (hex). The encoding of 840 = 6 * 128 +
48 (hex) is 86 48 and the encoding of 113549 = 6 * 128^2 + 77 (hex) * 128 + d (hex)
is 86 f7 0d. This leads to the following BER encoding:
06 06 2a 86 48 86 f7 0d
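The decoding direction described above can be sketched in a few lines of Python. It takes only the content octets (without the 06 tag and the length byte); decode_oid_content is a hypothetical name, and the split of the first value follows the 40 * value1 + value2 rule quoted above.

```python
def decode_oid_content(content: bytes) -> str:
    """Decode the content octets of a BER/DER OBJECT IDENTIFIER into dotted form."""
    values = []
    acc = 0
    for octet in content:
        # Base 128, most significant group first; a set top bit means "more follows"
        acc = (acc << 7) | (octet & 0x7F)
        if not octet & 0x80:
            values.append(acc)
            acc = 0
    first = values[0]
    value1 = min(first // 40, 2)  # value1 can only be 0, 1 or 2
    value2 = first - 40 * value1
    return '.'.join(str(v) for v in [value1, value2] + values[1:])

print(decode_oid_content(bytes.fromhex('2a864886f70d')))
print(decode_oid_content(bytes.fromhex('2b06010505073001')))
```

The two inputs are the content octets of the RSA Data Security example above and of the id-ad-ocsp OID from the question.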
Finally, here is an OID decoder I just wrote in Perl.
sub getOid {
    my $bytes = shift;
    # first 2 nodes are 'special'
    use integer;
    my $firstByte = shift @$bytes;
    my $number = unpack "C", $firstByte;
    my $nodeFirst = $number / 40;
    my $nodeSecond = $number % 40;
    my @oidDigits = ($nodeFirst, $nodeSecond);
    while (@$bytes) {
        my $num = convertFromVLQ($bytes);
        push @oidDigits, $num;
    }
    return join '.', @oidDigits;
}
sub convertFromVLQ {
    my $bytes = shift;
    my $firstByte = shift @$bytes;
    my $bitString = unpack "B*", $firstByte;
    my $firstBit = substr $bitString, 0, 1;
    my $remainingBits = substr $bitString, 1, 7;
    my $remainingByte = pack "B*", '0' . $remainingBits;
    my $remainingInt = unpack "C", $remainingByte;
    if ($firstBit eq '0') {
        return $remainingInt;
    }
    else {
        my $bitBuilder = $remainingBits;
        my $nextFirstBit = "1";
        while ($nextFirstBit eq "1") {
            my $nextByte = shift @$bytes;
            my $nextBits = unpack "B*", $nextByte;
            $nextFirstBit = substr $nextBits, 0, 1;
            my $nextSevenBits = substr $nextBits, 1, 7;
            $bitBuilder .= $nextSevenBits;
        }
        my $MAX_BITS = 32;
        my $missingBits = $MAX_BITS - (length $bitBuilder);
        my $padding = 0 x $missingBits;
        $bitBuilder = $padding . $bitBuilder;
        my $finalByte = pack "B*", $bitBuilder;
        my $finalNumber = unpack "N", $finalByte;
        return $finalNumber;
    }
}
OID encoding for dummies :) :
each OID component is encoded to one or more bytes (octets)
OID encoding is just a concatenation of these OID component encodings
the first two components are encoded in a special way (see below)
if the binary value of an OID component has at most 7 bits, the encoding is just a single octet holding the component value (note that the most significant, leftmost, bit will always be 0)
otherwise, if it has 8 or more bits, the value is "spread" over multiple octets: split the binary representation into 7-bit chunks (from the right), left-pad the first one with zeroes if needed, and form octets from these septets by adding a most significant (left) bit of 1, except for the last chunk, which gets a 0 bit there.
the first two components (X.Y) are encoded as if they were a single component with the value 40*X + Y
This is a rewording of ITU-T recommendation X.690, chapter 8.19
This is a simplistic Python 3 implementation of the above, converting the string form of an object identifier into ASN.1 DER or BER form.
def encode_variable_length_quantity(v:int) -> list:
    # Break it up in groups of 7 bits starting from the lowest significant bit
    # For all the other groups of 7 bits than lowest one, set the MSB to 1
    m = 0x00
    output = []
    while v >= 0x80:
        output.insert(0, (v & 0x7f) | m)
        v = v >> 7
        m = 0x80
    output.insert(0, v | m)
    return output
def encode_oid_string(oid_str:str) -> tuple:
    a = [int(x) for x in oid_str.split('.')]
    oid = [a[0]*40 + a[1]] # First two items are coded by a1*40+a2
    # The rest is Variable-length_quantity
    for n in a[2:]:
        oid.extend(encode_variable_length_quantity(n))
    oid.insert(0, len(oid)) # Add a Length
    oid.insert(0, 0x06) # Add a Type (0x06 for Object Identifier)
    return tuple(oid)
if __name__ == '__main__':
    oid = encode_oid_string("1.2.840.10045.3.1.7")
    print(oid)
