Urlencode/decode, different representation of the same string - node.js

I am a bit out of my comfort zone here, so I'm not even sure I'm aproaching the problem appropriately. Anyhow, here goes:
So I have a problem where I shall hash some info with sha1 that will work as that info's id.
when a client wants to signal what current info is being used, it sends a percent-encoded sha1-string.
So one example is, my server hashes some info and gets a hex representation like so:
44 c1 b1 0d 6a de ce 01 09 fd 27 bc 81 7f 0e 90 e3 b7 93 08
and the client sends me
D%c1%b1%0dj%de%ce%01%09%fd%27%bc%81%7f%0e%90%e3%b7%93%08
Removing the % we get
D c1 b1 0dj de ce 01 09 fd 27 bc 81 7f 0e 90 e3 b7 93 08
which matches my hash except for the beginning D and the j after the 0d, but replacing those with their ascii hex no, we have identical hash.
So, as I have read and understood the urlencoding, the standard would allow a client to send the D as either D or %44? So different clients would be able to send different representations off the same hash, and I will not be able to just compare them for equality?
I would prefer to be able to compare the urlencoded strings as they are when they are sent, but one way to do it would be to decode them, removing all '%' and get the ascii hex value for whatever mismatch I get, much like the D and the j in my above example.
This all seems to be a very annoying way to do things, am I missing something, please tell me I am? :)
I am doing this in node.js but I suppose the solution would be language/platform agnostic.

I made this crude solution for now:
var unreserved = 'A B C D E F G H I J S O N K L M N O P Q R S T U V W X Y Za b c d e f g h i j s o n k l m n o p q r s t u v w x y z + 1 2 3 4 5 6 7 8 9 0 - _ . ~';
function hexToPercent(hex){
var index = 0,
end = hex.length,
delimiter = '%',
step = 2,
result = '',
tmp = '';
if(end % step !== 0){
console.log('\'' + hex + '\' must be dividable by ' + step + '.');
return result;
}
while(index < end){
tmp = hex.slice(index, index + step);
if(unreserved.indexOf(String.fromCharCode('0x' + tmp)) !== -1){
result = result + String.fromCharCode('0x' + tmp);
}
else{
result = result + delimiter + tmp;
}
index = index + step;
}
return result;
}

Related

Get If (condition), then (assign value), else (assign other value) statement in Linear Programming

I'm looking for a linear programming equation that satisfied the conditions;
Given that all variables here are binary variables
if A+B = 2; then C = 1; else C = 0
Also,
if A+B+D = 3; then E = 1; else E = 0
How would one phrase this and satisfy these conditions as well as linearity conditions?
I've tried
A + B - 2 <= M(1-y) and 1 - C <= My
for the first constraint but it doesn't seem to work
For the first equation, you can use:
C + 1 >= A + B
2C <= A + B
If there is a natural sense (max/min) for C in the problem, one of those is sufficient.
Similarly for the second:
E + 2 >= A + B + D
3E <= A + B + D

Changing protobuff optional field to oneof

I have the following message:
message Message {
int64 id = 1;
google.protobuf.FloatValue weight = 2;
google.protobuf.FloatValue override_weight = 3;
}
and I wish to change the type of weight and override_weight(optional fields) to google.protobuf.DoubleValue so what I did was the fllowing:
message Message {
int64 id = 1;
oneof weight_oneof {
google.protobuf.FloatValue weight = 2 [deprecated=true];
google.protobuf.DoubleValue double_weight = 4;
}
oneof override_weight_oneof {
google.protobuf.FloatValue override_weight = 3 [deprecated=true];
google.protobuf.DoubleValue double_override_weight = 5;
}
}
My question is, lets assume I have old messages who were compiled by the previous protobuff message compiler for the old message, would I be able to parse them as the new message?
The documentation is very vague about this:
"Move optional fields into or out of a oneof: You may lose some of your information (some fields will be cleared) after the message is serialized and parsed. However, you can safely move a single field into a new oneof and may be able to move multiple fields if it is known that only one is ever set."
Has anyone tried this before? what is the best practice for this situation?
As far as I know fields in an oneof are just serialize using their tag number. The serialized data does not indicate if a field is part of an oneof. This is all handled by the serializer and deserializer. So as long as the tag numbers do not conflict it can be assumed that it will work in both directions, old messages to a new serializer and new messages to an old serializer.
You could test this using an online protobuf deserializer.
Verification:
The code does indeed produce the same byte strings. Below you will find the message definitions and python code I used. The python code will output a byte string you can copy and use in the decoder of Marc Gravell.
syntax = "proto3";
message MessageA {
int64 id = 1;
float weight = 2;
float override_weight = 3;
}
message MessageB {
int64 id = 1;
oneof weight_oneof {
float weight = 2 [deprecated=true];
double double_weight = 4;
}
oneof override_weight_oneof {
float override_weight = 3 [deprecated=true];
double double_override_weight = 5;
}
}
import Example_pb2
# Set some data in the original message
msgA = Example_pb2.MessageA()
msgA.id = 1234
msgA.weight = 3.21
msgA.override_weight = 5.43
# Output the serialized bytes in a pretty format
str = 'msgA = '
for x in msgA.SerializeToString():
str += "{:02x} ".format(x)
print(str)
# Next set the original fields in the new message
msgB = Example_pb2.MessageB()
msgB.id = 1234
msgB.weight = 3.21
msgB.override_weight = 5.43
# Output the serialized bytes in a pretty format
str = 'msgB 1 = '
for x in msgB.SerializeToString():
str += "{:02x} ".format(x)
print(str)
# And finally set the new fields in msgB
msgB.double_weight = 3.21
msgB.double_override_weight = 5.43
# Output the serialized bytes in a pretty format
str = 'msgB 2 = '
for x in msgB.SerializeToString():
str += "{:02x} ".format(x)
print(str)
The output of the python script was:
msgA = 08 d2 09 15 a4 70 4d 40 1d 8f c2 ad 40
msgB 1 = 08 d2 09 15 a4 70 4d 40 1d 8f c2 ad 40
msgB 2 = 08 d2 09 21 ae 47 e1 7a 14 ae 09 40 29 b8 1e 85 eb 51 b8 15 40
As you can see message A and message B yield the same byte string when setting the original fields. Only when you set the new fields you get a different string.

Use regexprep with cell array for colons to format

I have a cell array formatted as:
t = {'23:34:22.959511';
'22:34:11.885113';
'12:34:08.995146';
'11:34:02.383092'}
I am trying to format the output as 4 column vectors as:
a = 23
22
12
11
b = 34
34
34
34
c = 22
11
08
02
d = 959511
885113
995146
383092
I am using regexprep to operate on the data:
a = regexprep(t,':34:22.959511', '')
However this only pertains to only one string in the data set and not all strings.
How do I divide the string into 4 column vectors -- using regexprep for colon: and display the output below?
If you're willing to use other solutions that regexp: strplit can split on any desired character:
a = zeros(numel(t),1);
b = zeros(numel(t),1);
c = zeros(numel(t),1);
d = zeros(numel(t),1);
for ii = 1:numel(t)
C = strsplit(t{ii}, ':');
a(ii) = str2double(C{1});
b(ii) = str2double(C{2});
tmp = strsplit(C{3},'.'); % Additional split for dot
c(ii) = str2double(tmp{1});
d(ii) = str2double(tmp{2});
end
Of course this only works when your data always has this structure (two colons, then one dot)
Here's a way:
r = cell2mat(cellfun(#str2double, regexp(t, ':|\.', 'split'), 'uniformoutput', false));
This gives
r =
23 34 22 959511
22 34 11 885113
12 34 8 995146
11 34 2 383092
If you really need four separate variables, you can use:
r = num2cell(r,1);
[a, b, c, d] = r{:};
I would recommend using split instead of strsplit. split will operate on vectors and if you use the string datatype you can just call double on the string to get the numeric value
>> profFunc
Adriaan's Solution: 5.299892
Luis Mendo's Solution: 3.449811
My Solution: 0.094535
function profFunc()
n = 1e4; % Loop to get measurable timings
t = ["23:34:22.959511";
"22:34:11.885113";
"12:34:08.995146";
"11:34:02.383092"];
tic
for i = 1:n
a = zeros(numel(t),1);
b = zeros(numel(t),1);
c = zeros(numel(t),1);
d = zeros(numel(t),1);
for ii = 1:numel(t)
C = strsplit(t{ii}, ':');
a(ii) = str2double(C{1});
b(ii) = str2double(C{2});
tmp = strsplit(C{3},'.'); % Additional split for dot
c(ii) = str2double(tmp{1});
d(ii) = str2double(tmp{2});
end
end
fprintf('Adriaan''s Solution: %f\n',toc);
tic
for i = 1:n
r = cell2mat(cellfun(#str2double, regexp(t, ':|\.', 'split'), 'uniformoutput', false));
r = num2cell(r,1);
[a, b, c, d] = r{:};
end
fprintf('Luis Mendo''s Solution: %f\n',toc);
tic
for i = 1:n
x = split(t,[":" "."]);
x = double(x);
a = x(:,1);
b = x(:,2);
c = x(:,3);
d = x(:,4);
end
fprintf('My Solution: %f\n',toc);

Read Hex file and append string matlab

I am reading H.264 bitstream as Hex file in matlab. I want to insert some string whenever some certain condition met. Like in the attached image if hex value of 00 00 00 01 occurs anywhere in the file i want to add some string like ABC before 00 00 00 01 in the file. String comparison is easy but how to do a Hex comparison?
Here is my code of reading file as hex file
f = fopen(theFile);
if f==-1
return
end
c = fread(f);
theSize=prod(size((c)));
c=sprintf('%02x\n',c);
c(3:3:end)='';
m=floor(length(c)/nChars);
hex='';
hex=reshape(c(1:m*nChars),nChars,m)';
if mod(length(c),nChars)
hex=strvcat(hex,c(m*nChars+1:end));
end
More specifically i want this c code converted into matlab
QByteArray data, basePattern;
basePattern.resize(3);
//start code:
basePattern[0] = (char) 0x00;
basePattern[1] = (char) 0x00;
basePattern[2] = (char) 0x01;
char end1 = 0x25, end2 = 0x45, end3 = 0x65;
x = myfile;//read using fopen
if (x == end1 || x == end2 {
}
Hex values are really just integers:
x = uint8(hex2dec({'01', '02', '0A', '0B', '25', '45', '65', '00', '01', 'AA'}))
x =
1
2
10
11
37
69
101
0
1
170
And they can be compared directly:
x(3) == uint8(hex2dec('0a'))
ans =
1
So putting it all together, you should create a new buffer and search through the bytes for the pattern, if it's found, insert you data, if not found, just append the byte:
pat0 = uint8(hex2dec('00'));
pat1 = uint8(hex2dec('00'));
pat2 = uint8(hex2dec('01'));
pos = 1;
data = % the uint8 array read in from the file.
new_data = uint8([]);
while pos < length(data) - 2
if data(pos+0) == pat0 && data(pos+1) == pat1 && data(pos+2) == pat2
% insert new data buffer and append pattern
new_data = [new_data my_data_to_insert pat0 pat1 pat2];
pos = pos + 3;
else
% append
new_data = [new_data data(pos)];
pos = pos + 1;
end
end
% append last 2 bytes
new_data = [new_data data(end-1:end)];

How does ASN.1 encode an object identifier?

I am having trouble understanding the basic concepts of ASN.1.
If a type is an OID, does the corresponding number get actually encoded in the binary data?
For instance in this definition:
id-ad-ocsp OBJECT IDENTIFIER ::= { id-ad 1 }
Does the corresponding 1.3.6.1.5.5.7.48.1 get encoded in the binary exactly like this?
I am asking this because I am trying to understand a specific value I see in a DER file (a certificate), which is 04020500, and I am not sure how to interpret it.
Yes, the OID is encoded in the binary data. The OID 1.3.6.1.5.5.7.48.1 you mention becomes 2b 06 01 05 05 07 30 01 (the first two numbers are encoded in a single byte, all remaining numbers are encoded in a single bytes as well because they're all smaller than 128).
A nice description of OID encoding is found here.
But the best way to analyze your ASN.1 data is to paste in into an online decoder, e.g. http://lapo.it/asn1js/.
If all your digits are less than or equal to 127 then you are very lucky because they can be represented with a single octet each. The tricky part is when you have larger numbers which are common, such as 1.2.840.113549.1.1.5 (sha1WithRsaEncryption). These examples focus on decoding, but encoding is just the opposite.
1. First two 'digits' are represented with a single byte
You can decode by reading the first byte into an integer
var firstByteNumber = 42;
var firstDigit = firstByteNumber / 40;
var secondDigit = firstByteNumber % 40;
Produces the values
1.2
2. Subsequent bytes are represented using Variable Length Quantity, also called base 128.
VLQ has two forms,
Short Form - If the octet starts with 0, then it is simply represented using the remaining 7 bits.
Long Form - If the octet starts with a 1 (most significant bit), combine the next 7 bits of that octet plus the 7 bits of each subsequent octet until you come across an octet with a 0 as the most significant bit (this marks the last octet).
The value 840 would be represented with the following two bytes,
10000110
01001000
Combine to 00001101001000 and read as int.
Great resource for BER encoding, http://luca.ntop.org/Teaching/Appunti/asn1.html
The first octet has value 40 * value1 + value2. (This is unambiguous,
since value1 is limited to values 0, 1, and 2; value2 is limited to
the range 0 to 39 when value1 is 0 or 1; and, according to X.208, n is
always at least 2.)
The following octets, if any, encode value3, ...,
valuen. Each value is encoded base 128, most significant digit first,
with as few digits as possible, and the most significant bit of each
octet except the last in the value's encoding set to "1." Example: The
first octet of the BER encoding of RSA Data Security, Inc.'s object
identifier is 40 * 1 + 2 = 42 = 2a16. The encoding of 840 = 6 * 128 +
4816 is 86 48 and the encoding of 113549 = 6 * 1282 + 7716 * 128 + d16
is 86 f7 0d. This leads to the following BER encoding:
06 06 2a 86 48 86 f7 0d
Finally, here is a OID decoder I just wrote in Perl.
sub getOid {
my $bytes = shift;
#first 2 nodes are 'special';
use integer;
my $firstByte = shift #$bytes;
my $number = unpack "C", $firstByte;
my $nodeFirst = $number / 40;
my $nodeSecond = $number % 40;
my #oidDigits = ($nodeFirst, $nodeSecond);
while (#$bytes) {
my $num = convertFromVLQ($bytes);
push #oidDigits, $num;
}
return join '.', #oidDigits;
}
sub convertFromVLQ {
my $bytes = shift;
my $firstByte = shift #$bytes;
my $bitString = unpack "B*", $firstByte;
my $firstBit = substr $bitString, 0, 1;
my $remainingBits = substr $bitString, 1, 7;
my $remainingByte = pack "B*", '0' . $remainingBits;
my $remainingInt = unpack "C", $remainingByte;
if ($firstBit eq '0') {
return $remainingInt;
}
else {
my $bitBuilder = $remainingBits;
my $nextFirstBit = "1";
while ($nextFirstBit eq "1") {
my $nextByte = shift #$bytes;
my $nextBits = unpack "B*", $nextByte;
$nextFirstBit = substr $nextBits, 0, 1;
my $nextSevenBits = substr $nextBits, 1, 7;
$bitBuilder .= $nextSevenBits;
}
my $MAX_BITS = 32;
my $missingBits = $MAX_BITS - (length $bitBuilder);
my $padding = 0 x $missingBits;
$bitBuilder = $padding . $bitBuilder;
my $finalByte = pack "B*", $bitBuilder;
my $finalNumber = unpack "N", $finalByte;
return $finalNumber;
}
}
OID encoding for dummies :) :
each OID component is encoded to one or more bytes (octets)
OID encoding is just a concatenation of these OID component encodings
first two components are encoded in a special way (see below)
if OID component binary value has less than 7 bits, the encoding is just a single octet, holding the component value (note, most significant bit, leftmost, will always be 0)
otherwise, if it has 8 and more bits, the value is "spread" into multiple octets - split the binary representation into 7 bit chunks (from right), left-pad the first one with zeroes if needed, and form octets from these septets by adding most significant (left) bit 1, except from the last chunk, which will have bit 0 there.
first two components (X.Y) are encoded like it is a single component with a value 40*X + Y
This is a rewording of ITU-T recommendation X.690, chapter 8.19
This is a simplistic Python 3 implementation of the of above, resp. a string form of an object identifier into ASN.1 DER or BER form.
def encode_variable_length_quantity(v:int) -> list:
# Break it up in groups of 7 bits starting from the lowest significant bit
# For all the other groups of 7 bits than lowest one, set the MSB to 1
m = 0x00
output = []
while v >= 0x80:
output.insert(0, (v & 0x7f) | m)
v = v >> 7
m = 0x80
output.insert(0, v | m)
return output
def encode_oid_string(oid_str:str) -> tuple:
a = [int(x) for x in oid_str.split('.')]
oid = [a[0]*40 + a[1]] # First two items are coded by a1*40+a2
# A rest is Variable-length_quantity
for n in a[2:]:
oid.extend(encode_variable_length_quantity(n))
oid.insert(0, len(oid)) # Add a Length
oid.insert(0, 0x06) # Add a Type (0x06 for Object Identifier)
return tuple(oid)
if __name__ == '__main__':
oid = encode_oid_string("1.2.840.10045.3.1.7")
print(oid)

Resources