Reed solomon encoding over Galois Field

Reed solomon encoding over Galois Field - reed-solomon

I am passing a secret key (length of 16 ASCII chars = 128 bits) through Reed Solomon encoder which is operating over Galois field 16 (2^16).
My question is: should this key be considered as 128 bits or 256?
I got lost here because I know that ASCII character = 8 bits so 16 ASCII character = 128 bits.
I read an article where it says once you passed the key through GF(16) then it will be 256, not 128 and I should pass only a key with 8 ASCII character instead. is this correct?
please see the function that I am using below where i used Matlab communication toolbox
function code = errorCorrectingCode(data, LENGTH)
% ERRORCORRECTINGCODE takes input data and runs it through Reed-Solomon Code
% See the following link:
% http://www.mathworks.com/help/comm/ref/encode.html
% >>> example: rsdec(rsenc(gf([1 2 3 ; 4 5 6],3),7,3),7,3);
% Apply Reed-Solomon encoding operation to data to obtain ECCed code.
% convert to GF(2^16) elements:
% (two characters map to one element, so 16char = 8)
elements = keyToField(data);
msg = elements';
% create encoding:
code = rsenc(msg,LENGTH,length(msg));
end
function gfArray = keyToField(keystr)
% KEYTOFIELD takes a key string composed of N characters,
% converting every two characters to a field array in the
% field GF(2^16).
% define number of elements in gfArray as floor(N / 2):
numElts = floor(length(keystr) / 2);
% define flag that checks if keystr is odd or not:
oddLength = mod(length(keystr),2);
% initialize pre-output:
gfArray_bin = zeros((numElts + oddLength),16);
% loop thru pairs of chars in the key:
for idx=1:numElts
curr = 2 * idx;
gfArray_bin(idx,:) = [dec2bin(double(keystr(curr - 1)),8) dec2bin(double(keystr(curr)),8)] - 48;
end
% take care of last element if odd:
if (oddLength)
gfArray_bin(end,:) = dec2bin(double(keystr(end)),16) - 48;
end
% convert everything to decimal again:
gfArray_dec = zeros(size(gfArray_bin,1),1);
for jdx=1:size(gfArray_bin,1)
% shorthand for current row:
bitrow = gfArray_bin(jdx,:);
% convert from a row of 1's and 0's to decimal representation:
gfArray_dec(jdx) = sum(bitrow .* 2.^(numel(bitrow)-1:-1:0));
end
% generate output by wrapping gfArray_dec in field array:
gfArray = gf(gfArray_dec,16);
end

Related

how to decrypt a string that is encrypted using XOR

I have tried to encrypt a string using a XOR operator and took the output in alphabets. Now when I am trying to decrypt it I'm not getting the string again.
Encryption code:
string= "Onions"
keyword = "MELLON"
def xor(string, key):
st=[]
ke=[]
xored=[]
for i in string:
asc= (ord(i))
st.append(int(asc))
print(st)
for i in key:
asc= (ord(i))
ke.append(int(asc))
print(ke)
for i in range(len(string)):
s1=st[i]
k1=ke[i]
abc = s1^k1
le = ord('A')+abc
ch = chr(le)
if le> 90:
le= le-26
ch = chr(le)
print(s1,k1)
print('XOR =',abc)
print(ch)
xored.append(ch)
print(xored)
return("" . join(xored))
Need help!!

The algorithm does not perform a pure XOR, but maps values conditionally to another value, leading to a relation that is no longer bijective.
To illustrate this point. See what this script outputs:
keyword = "MELLON"
print(xor("Onions", keyword) == xor("OTGEHs", keyword))
It will output True!
So this means you have two words that are encrypted to the same string. This also means that if you need to do the reverse, there is no way to know which of these is the real original word.
If you want to decryption to be possible, make sure to only use operations that lead to a bijective mapping. For instance, if you only use a XOR, without adding or subtracting values, it will be OK.
Here is an approach where only lower and uppercase letters of the Latin alphabet are allowed (for both arguments):
def togglecrypt(string, key):
mapper = "gUMtuAqhaEDcsGjBbreSNJYdFTiOmHKwnXWxzClQLRVyvIkfPpoZ"
res = []
for i, ch in enumerate(string):
shift = mapper.index(key[i % len(key)]) % 26
i = mapper.index(ch)
if i < 26:
j = 26 + (i + shift) % 26
else:
j = (i - shift) % 26
res.append(mapper[j])
return("".join(res))
keyword = "MELLON"
encoded = togglecrypt("Onions", keyword)
print(encoded) # TdsDAn
print(togglecrypt(encoded, keyword)) # Onions

Join two bytes in BASCOM-AVR

How can I join two bytes to make an 16-bit int variable in BASCOM-AVR?

PARTIAL ANSWER:
Subquestion 1
If one byte is stored in the variable BYTE1 and the other is stored in the variable BYTE2, you can merge them into WORD1 in many BASICS with WORD1 = BYTE1: WORD1 = (WORD1 SHL 8) OR BYTE2. This makes BYTE1 into the high-order bits of WORD1, and BYTE2 into the low-order bits.
Subquestion 2
If you want to mask (or select) specific bits of a word, use the AND operator, summing up the bit values of the bits of interest - for example, if you want to select the first and third bits (counting the first bit as the LSB of the word) of the variable FLAGS, you would look at the value of FLAGS AND 5 - 5 is binary 0000000000000101, so you are guaranteeing that all bits in the result will be 0 except for the first and third, which will carry whatever value they are showing in FLAGS (this is 'bitwise AND').

Function to shift-left/right binary:
Byte1# = 255
PRINT HEX$(Byte1#)
Byte1# = SHL(Byte1#, 8) ' shift-left 8 bits
PRINT HEX$(Byte1#)
END
' function to shift-left binary bits
FUNCTION SHL (V#, X)
SHL = V# * 2 ^ X
END FUNCTION
' function to shift-right binary bits
FUNCTION SHR (V#, X)
SHR = V# / 2 ^ X
END FUNCTION

You can find this in BASCOM index:
varn = MAKEINT(LSB , MSB)
The equivalent code is:
varn = (256 * MSB) + LSB
Varn: Variable that will be assigned with the converted value.
LSB: Variable or constant with the LS Byte.
MSB: Variable or constant with the MS Byte.
For example:
varn = MAKEINT(&B00100010,&B11101101)
The result is &B1110110100100010.

Generate an integer for encryption from a string and vice versa

I am trying to write an RSA code in python3. I need to turn user input strings (containing any characters, not only numbers) into integers to then encrypt them. What is the best way to turn a sting into an integer in Python 3.6 without 3-rd party modules?

how to encode a string to an integer is far from unique... there are many ways! this is one of them:
strg = 'user input'
i = int.from_bytes(strg.encode('utf-8'), byteorder='big')
the conversion in the other direction then is:
s = int.to_bytes(i, length=len(strg), byteorder='big').decode('utf-8')
and yes, you need to know the length of the resulting string before converting back. if length is too large, the string will be padded with chr(0) from the left (with byteorder='big'); if length is too small, int.to_bytes will raise an OverflowError: int too big to convert.

The #hiro protagonist's answer requires to know the length of the string. So I tried to find another solution and good answers here: Python3 convert Unicode String to int representation. I just summary here my favourite solutions:
def str2num(string):
return int(binascii.hexlify(string.encode("utf-8")), 16)
def num2str(number):
return binascii.unhexlify(format(number, "x").encode("utf-8")).decode("utf-8")
def numfy(s, max_code=0x110000):
# 0x110000 is max value of unicode character
number = 0
for e in [ord(c) for c in s]:
number = (number * max_code) + e
return number
def denumfy(number, max_code=0x110000):
l = []
while number != 0:
l.append(chr(number % max_code))
number = number // max_code
return ''.join(reversed(l))
Intersting: testing some cases shows me that
str2num(s) = numfy(s, max_code=256) if ord(s[i]) < 128
and
str2num(s) = int.from_bytes(s.encode('utf-8'), byteorder='big') (#hiro protagonist's answer)

Trying to while-loop my way through a binary file until a certain byte but the loop never ends

I've been working on an import script and have managed to hammer out most of the issues up until this point-I need to loop through the vertices until I reach a byte header that proceeds them but despite trying re.match, re.search, and !=
the while loop simply continues till the end of the file. I'm not sure where I went wrong given regex works with the if statement prior to this section of code.
while re.match(b'\x05\xC0.x\6E', byte) is None:
#Fill the vertex list by converting byte to its little endian float value
vertex[0] = struct.unpack('<f', byte)
byte = f.read(4)
vertex[1] = struct.unpack('<f', byte)
byte = f.read(4)
vertex[2] = struct.unpack('<f', byte)
#Append the vertices list with the completed vertex
vertices.append(vertex)
vertex_count += 1
#Read in what will either be the next X coordinate or a file header
byte = f.read(4)

The code is reading 4 bytes each time, but the pattern is 6 bytes-long.
>>> len(b'\x05\xC0.x\6E')
6
>>> b'\x05\xC0.x\6E' == b'\x05' + b'\xC0' + b'.' + b'x' + b'\6' + b'E'
True
The pattern will never match. That's why it continue until the end of the file.
IMHO, you mean this: (swapping the last \ and x)
b'\x05\xC0.\x6E'
>>> import re
>>> re.match(b'\x05\xC0.x\6E', b'\x05\xC0\x00\x6E') # no match
>>> re.match(b'\x05\xC0.\x6E', b'\x05\xC0\x00\x6E') # match
<_sre.SRE_Match object; span=(0, 4), match=b'\x05\xc0\x00n'>

Base64 length calculation?

After reading the base64 wiki ...
I'm trying to figure out how's the formula working :
Given a string with length of n , the base64 length will be
Which is : 4*Math.Ceiling(((double)s.Length/3)))
I already know that base64 length must be %4==0 to allow the decoder know what was the original text length.
The max number of padding for a sequence can be = or ==.
wiki :The number of output bytes per input byte is approximately 4 / 3 (33%
overhead)
Question:
How does the information above settle with the output length ?

Each character is used to represent 6 bits (log2(64) = 6).
Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.
So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.
The number of unused padding chars resulting from the rounding up to a multiple of 4 will obviously be 0, 1, 2 or 3.

4 * n / 3 gives unpadded length.
And round up to the nearest multiple of 4 for padding, and as 4 is a power of 2 can use bitwise logical operations.
((4 * n / 3) + 3) & ~3

For reference, the Base64 encoder's length formula is as follows:
As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula for padding is 4(Ceiling(n/3)).
The Wikipedia article shows exactly how the ASCII string Man encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.
You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.
Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.

Integers
Generally we don't want to use doubles because we don't want to use the floating point ops, rounding errors etc. They are just not necessary.
For this it is a good idea to remember how to perform the ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y (while avoiding negative numbers, but beware of overflow).
Readable
If you go for readability you can of course also program it like this (example in Java, for C you could use macro's, of course):
public static int ceilDiv(int x, int y) {
return (x + y - 1) / y;
}
public static int paddedBase64(int n) {
int blocks = ceilDiv(n, 3);
return blocks * 4;
}
public static int unpaddedBase64(int n) {
int bits = 8 * n;
return ceilDiv(bits, 6);
}
// test only
public static void main(String[] args) {
for (int n = 0; n < 21; n++) {
System.out.println("Base 64 padded: " + paddedBase64(n));
System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
}
}
Inlined
Padded
We know that we need 4 characters blocks at the time for each 3 bytes (or less). So then the formula becomes (for x = n and y = 3):
blocks = (bytes + 3 - 1) / 3
chars = blocks * 4
or combined:
chars = ((bytes + 3 - 1) / 3) * 4
your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.
Unpadded
Less common is the unpadded variant, for this we remember that each we need a character for each 6 bits, rounded up:
bits = bytes * 8
chars = (bits + 6 - 1) / 6
or combined:
chars = (bytes * 8 + 6 - 1) / 6
we can however still divide by two (if we want to):
chars = (bytes * 4 + 3 - 1) / 3
Unreadable
In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):
Padded
((n + 2) / 3) << 2
Unpadded
((n << 2) | 2) / 3
So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
Notes:
Obviously you may need to add 1 to the calculations to include a null termination byte.
For Mime you may need to take care of possible line termination characters and such (look for other answers for that).

(In an attempt to give a succinct yet complete derivation.)
Every input byte has 8 bits, so for n input bytes we get:
n × 8      input bits
Every 6 bits is an output byte, so:
ceil(n × 8 / 6)  =  ceil(n × 4 / 3)      output bytes
This is without padding.
With padding, we round that up to multiple-of-four output bytes:
ceil(ceil(n × 4 / 3) / 4) × 4  =  ceil(n × 4 / 3 / 4) × 4  =  ceil(n / 3) × 4      output bytes
See Nested Divisions (Wikipedia) for the first equivalence.
Using integer arithmetics, ceil(n / m) can be calculated as (n + m – 1) div m,
hence we get:
(n * 4 + 2) div 3      without padding
(n + 2) div 3 * 4      with padding
For illustration:
n with padding (n + 2) div 3 * 4 without padding (n * 4 + 2) div 3
------------------------------------------------------------------------------
0 0 0
1 AA== 4 AA 2
2 AAA= 4 AAA 3
3 AAAA 4 AAAA 4
4 AAAAAA== 8 AAAAAA 6
5 AAAAAAA= 8 AAAAAAA 7
6 AAAAAAAA 8 AAAAAAAA 8
7 AAAAAAAAAA== 12 AAAAAAAAAA 10
8 AAAAAAAAAAA= 12 AAAAAAAAAAA 11
9 AAAAAAAAAAAA 12 AAAAAAAAAAAA 12
10 AAAAAAAAAAAAAA== 16 AAAAAAAAAAAAAA 14
11 AAAAAAAAAAAAAAA= 16 AAAAAAAAAAAAAAA 15
12 AAAAAAAAAAAAAAAA 16 AAAAAAAAAAAAAAAA 16
Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.

Here is a function to calculate the original size of an encoded Base 64 file as a String in KB:
private Double calcBase64SizeInKBytes(String base64String) {
Double result = -1.0;
if(StringUtils.isNotEmpty(base64String)) {
Integer padding = 0;
if(base64String.endsWith("==")) {
padding = 2;
}
else {
if (base64String.endsWith("=")) padding = 1;
}
result = (Math.ceil(base64String.length() / 4) * 3 ) - padding;
}
return result / 1000;
}

I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.
The answer is (floor(n / 3) + 1) * 4 + 1
This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.
Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.

For all people who speak C, take a look at these two macros:
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1)
// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4)
Taken from here.

While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c
525
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
710
So it seems the formula of 3 bytes being represented by 4 base64 characters seems correct.

I don't see the simplified formula in other responses. The logic is covered but I wanted a most basic form for my embedded use:
Unpadded = ((4 * n) + 2) / 3
Padded = 4 * ((n + 2) / 3)
NOTE: When calculating the unpadded count we round up the integer division i.e. add Divisor-1 which is +2 in this case

Seems to me that the right formula should be:
n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)

I believe that this one is an exact answer if n%3 not zero, no ?
(n + 3-n%3)
4 * ---------
3
Mathematica version :
SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]
Have fun
GI

Simple implementantion in javascript
function sizeOfBase64String(base64String) {
if (!base64String) return 0;
const padding = (base64String.match(/(=*)$/) || [])[1].length;
return 4 * Math.ceil((base64String.length / 3)) - padding;
}

If there is someone interested in achieve the #Pedro Silva solution in JS, I just ported this same solution for it:
const getBase64Size = (base64) => {
let padding = base64.length
? getBase64Padding(base64)
: 0
return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}
const getBase64Padding = (base64) => {
return endsWith(base64, '==')
? 2
: 1
}
const endsWith = (str, end) => {
let charsFromEnd = end.length
let extractedEnd = str.slice(-charsFromEnd)
return extractedEnd === end
}

In windows - I wanted to estimate size of mime64 sized buffer, but all precise calculation formula's did not work for me - finally I've ended up with approximate formula like this:
Mine64 string allocation size (approximate)
= (((4 * ((binary buffer size) + 1)) / 3) + 1)
So last +1 - it's used for ascii-zero - last character needs to allocated to store zero ending - but why "binary buffer size" is + 1 - I suspect that there is some mime64 termination character ? Or may be this is some alignment issue.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Reed solomon encoding over Galois Field - reed-solomon

Related

how to decrypt a string that is encrypted using XOR

Join two bytes in BASCOM-AVR

Generate an integer for encryption from a string and vice versa

Trying to while-loop my way through a binary file until a certain byte but the loop never ends

Base64 length calculation?

Categories

Resources