Elixir: Base64 decode TTN message

objective:
I'm getting a Base64-encoded raw payload from The Things Network. I'm trying to decode it on my server instead of using their JavaScript payload decoders.
The Base64-encoded message is: AWYQkQCsCPMANA==
what works:
My usual decoding goes like this:
def decode(string) do
  string = String.upcase(string)
  # take the first 2 bytes, which is 4 hex characters
  {flagsbat, string} = String.split_at(string, 4)
  parse(Base.decode16!(flagsbat), string)
end
And the function head of the actual decoder:
defp parse(<<a::1, b::1, _reserve::1, c::1, d::1, e::1, f::1, g::1, bat>>, <<data::binary>>) do
And that part works fine for a string like "01660dfa0109038d0030". But somehow when TTN sends me the Base64 encoded raw payload, everything fails.
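For reference, here is a minimal sketch (in Python, purely for illustration, not part of the original question) of what that first split-and-parse step extracts from a hex string like "01660dfa0109038d0030":

data = bytes.fromhex("01660dfa0109038d0030")          # the bytes behind the hex string
flags = [(data[0] >> (7 - i)) & 1 for i in range(8)]  # the bits a, b, _reserve, c, d, e, f, g
bat = data[1]                                         # the battery byte
print(flags, bat)                                     # [0, 0, 0, 0, 0, 0, 0, 1] 102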
what doesn't work:
I'm trying to call Base.decode64!(raw_payload) |> decode()
It gives me the error: ** (ArgumentError) non-alphabet digit found: "\x01" (byte 1)
Interestingly, I have found that if I Base.decode64!("AWYQkQCsCPMANA=="), I get <<1, 102, 16, 145, 0, 172, 8, 243, 0, 52>>, while https://cryptii.com/pipes/base64-to-hex returns 01 66 10 91 00 ac 08 f3 00 34. Why?
EDIT:
To make it clear:
{flagsbat, string} = "AWYQkQCsCPMANA==" |> Base.decode64!() |> String.upcase() |> String.split_at(4)
Base.decode16!(flagsbat) # this throws the error
{flagsbat, string} = "0166109100ac08f30034" |> String.upcase() |> String.split_at(4)
Base.decode16!(flagsbat) # works
So how do I get a string that I can split and parse from the Base64-encoded raw payload?
"0166109100ac08f30034" is what I get if I Base64 decode "AWYQkQCsCPMANA==" on https://cryptii.com/pipes/base64-to-hex

It turns out the two results are the same bytes, just displayed differently: Elixir inspects the decoded binary as a list of byte values, while the cryptii pipe shows the same bytes as hex. Base.decode64! already returns raw bytes, but decode/1 expects a hex string, so the decoded binary needs to be Base16-encoded first:
Base.decode64!("AWYQkQCsCPMANA==") |> Base.encode16() |> decode()
That does the trick, and the result can be passed into the decoding function normally.
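For illustration, the same round trip in Python (not part of the original answer, just to make the bytes-versus-hex point concrete):

import base64

decoded = base64.b64decode("AWYQkQCsCPMANA==")  # raw bytes, shown by Elixir as <<1, 102, ...>>
print(list(decoded))   # [1, 102, 16, 145, 0, 172, 8, 243, 0, 52]
print(decoded.hex())   # '0166109100ac08f30034', the hex string the parser expects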

Related

base64.decode: Invalid encoding before padding

I'm working on a Flutter project and I'm currently getting an error with some of the strings I try to decode using the base64.decode() method. I've created a short Dart snippet which reproduces the problem I'm facing with a specific string:
import 'dart:convert';

void main() {
  final message = 'RU5UUkVHQUdSQVRJU1==';
  print(utf8.decode(base64.decode(message)));
}
I'm getting the following error message:
Uncaught Error: FormatException: Invalid encoding before padding (at character 19)
RU5UUkVHQUdSQVRJU1==
I've tried decoding the same string with JavaScript and it works fine. I'd be glad if someone could explain why I'm getting this error, and possibly show me a solution. Thanks.
Base64 encoding breaks binary data into 3-byte (24-bit) groups and represents each group as four printable ASCII characters, 6 bits per character. It does that in essentially two steps.
The first step is to break the binary string down into 6-bit blocks. Base64 only uses 6 bits (corresponding to 2^6 = 64 characters) to ensure encoded data is printable and humanly readable. Almost none of the special characters available in ASCII are used (only +, / and the = pad).
The 64 characters (hence the name Base64) are 10 digits, 26 lowercase characters, and 26 uppercase characters, as well as the plus sign (+) and the forward slash (/). There is also a 65th character known as a pad, the equal sign (=). This character is used when the last group of binary data doesn't contain a full 24 bits.
So RU5UUkVHQUdSQVRJU1== doesn't follow the encoding pattern.
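A quick check of the canonical encoding in Python (an illustration, not from the original answer):

import base64

# 13 bytes = four full 3-byte groups plus one leftover byte, so the canonical
# encoding ends with a character whose pad bits are zero, followed by '==':
print(base64.b64encode(b'ENTREGAGRATIS'))  # b'RU5UUkVHQUdSQVRJUw=='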
Use the Underscore Character "_" as the Padding Character and Decode With the Pad Bytes Deleted
For some reason, dart:convert's base64.decode chokes on this =-padded string with the "invalid encoding before padding" error. This happens even if you use the library's own padding method base64.normalize, which pads the string with the correct padding character =.
= is indeed the correct padding character for Base64 encoding. It is used to fill out Base64 strings when fewer than 24 bits are available in the final input group. See RFC 4648, Section 4.
However, RFC 4648, Section 5 defines a URL-safe variant of Base64 whose alphabet uses the underscore character _ (and the hyphen -) in place of / and +. Dart's decoder accepts that alphabet, so _ gets through where this non-canonical = padding does not.
Using _ as the padding character will therefore cause base64.decode to decode without error, although this is a non-standard workaround rather than RFC-sanctioned padding.
In order to further decode the generated list of bytes to Utf8, you will need to delete the padding bytes or you will get an "Invalid UTF-8 byte" error.
See the code below.
import 'dart:convert';

void main() {
  // String message = 'RU5UUkVHQUdSQVRJU1=='; // as of Dart 2.18.2 this generates an "invalid encoding before padding" error
  // String message = base64.normalize('RU5UUkVHQUdSQVRJU1'); // also generates the same error
  String message = 'RU5UUkVHQUdSQVRJU1';
  print("Encoded String: $message");
  print("Decoded String: ${decodeB64ToUtf8(message)}");
}

String decodeB64ToUtf8(String message) {
  message = padBase64(message); // pad with underscores => 'RU5UUkVHQUdSQVRJU1__'
  List<int> dec = base64.decode(message);
  // remove the bytes contributed by the padding characters
  dec = dec.sublist(0, dec.length - RegExp(r'_').allMatches(message).length);
  return utf8.decode(dec);
}

String padBase64(String rawBase64) {
  return (rawBase64.length % 4 > 0)
      ? rawBase64 + List.filled(4 - (rawBase64.length % 4), "_").join()
      : rawBase64;
}
The string RU5UUkVHQUdSQVRJU1== is not a compliant base 64 encoding according to RFC 4648, which in section 3.5, "Canonical Encoding," states:
The padding step in base 64 and base 32 encoding can, if improperly implemented, lead to non-significant alterations of the encoded data. For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below. If this property do not hold, there is no canonical representation of base-encoded data, and multiple base-encoded strings can be decoded to the same binary data. If this property (and others discussed in this document) holds, a canonical encoding is guaranteed.
In some environments, the alteration is critical and therefore decoders MAY chose to reject an encoding if the pad bits have not been set to zero. The specification referring to this may mandate a specific behaviour.
(Emphasis added.)
Here we will manually go through the base 64 decoding process.
Taking your encoded string RU5UUkVHQUdSQVRJU1== and performing the mapping from the base 64 character set (given in "Table 1: The Base 64 Alphabet" of the aforementioned RFC), we have:
R U 5 U U k V H Q U d S Q V R J U 1 = =
010001 010100 111001 010100 010100 100100 010101 000111 010000 010100 011101 010010 010000 010101 010001 001001 010100 110101 ______ ______
(using __ to represent the padding characters).
Now, grouping these by 8 instead of 6, we get
01000101 01001110 01010100 01010010 01000101 01000111 01000001 01000111 01010010 01000001 01010100 01001001 01010011 0101____ ________
E N T R E G A G R A T I S P
The important part is at the end, where there are some non-zero bits followed by padding. The Dart implementation is correctly determining that the padding provided doesn't make sense, because the last four bits of the final data character do not decode to zeros.
As a result, the decoding of RU5UUkVHQUdSQVRJU1== is ambiguous. Is it ENTREGAGRATIS or ENTREGAGRATISP? It's precisely this reason why the RFC states, "These pad bits MUST be set to zero by conforming encoders."
In fact, because of this, I'd argue that an implementation that decodes RU5UUkVHQUdSQVRJU1== to ENTREGAGRATIS without complaint is problematic, because it's silently discarding non-zero bits.
The RFC-compliant encoding of ENTREGAGRATIS is RU5UUkVHQUdSQVRJUw==.
The RFC-compliant encoding of ENTREGAGRATISP is RU5UUkVHQUdSQVRJU1A=.
This further highlights the ambiguity of your input RU5UUkVHQUdSQVRJU1==, which matches neither.
I suggest you check your encoder to determine why it's providing you with non-compliant encodings, and make sure you're not losing information as a result.
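As an aside, CPython's decoder is one of the lenient implementations described above: it silently discards the non-zero pad bits. An illustration (not part of the original answer):

import base64

print(base64.b64decode('RU5UUkVHQUdSQVRJU1=='))  # b'ENTREGAGRATIS', pad bits dropped silently
print(base64.b64encode(b'ENTREGAGRATIS'))        # b'RU5UUkVHQUdSQVRJUw==', the canonical form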

Problem converting a ByteArray to a String with the UTF-8 charset in Kotlin

I'm a little confused:
// default charset is UTF-8
val bytes = byteArrayOf(78, 23, 41, 51, -32, 42)
val str = String(bytes)
// here I get the array [78, 23, 41, 51, -17, -65, -67, 42]
val weird = str.toByteArray()
I put random values into the byte array. Why is the round trip inconsistent?
The issue here is that your bytes aren't a valid UTF-8 sequence.
Any sequence of bytes can be interpreted as valid ISO Latin-1, for example.  (There may be issues with bytes having values 0–31, but those generally don't stop the characters being stored and processed.)  Similar applies to most other 8-bit character sets.
But the same isn't true of UTF-8.  While all sequences of bytes in the range 1–127 are valid UTF-8 (and interpreted the same as they are in ASCII and most 8-bit encodings), bytes in the range 128–255 can only appear in certain well-defined combinations.  (This has several very useful properties: it lets you identify UTF-8 with a very high probability; it also avoids issues with synchronisation, searching, sorting, &c.)
In this case, the sequence in the question (which is 4E 17 29 33 E0 2A in unsigned hex) isn't valid UTF-8.
So when you try to convert it to a string using the default encoding (UTF-8), the JVM substitutes the replacement character, U+FFFD (which looks like this: �), in place of each invalid byte sequence.
Then, when you convert that back to UTF-8, you get the UTF-8 encoding of the replacement character, which is EF BF BD. Interpreted as signed bytes, that is -17 -65 -67, exactly as in the question.
So Kotlin/JVM is handling the invalid input as best it can.
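The same round trip can be reproduced in Python (a sketch for illustration; Python's 'replace' error handler mirrors the JVM's default handling of malformed input):

data = bytes([78, 23, 41, 51, 0xE0, 42])       # 0xE0 is -32 as a signed byte
text = data.decode('utf-8', errors='replace')  # 0xE0 0x2A is not a valid UTF-8 sequence
print(list(text.encode('utf-8')))              # [78, 23, 41, 51, 239, 191, 189, 42]
# 239 191 189 is EF BF BD, i.e. -17 -65 -67 as signed bytes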

What's the purpose of encoding and decoding on the same line of code?

I have seen code in which a string is encoded and immediately decoded. What benefit does that give?
I have come across code like the snippet below a few times. What is its purpose?
string_to_hash = "ahmer"
base64_string = base64.b64encode(string_to_hash).decode('utf-8')
b64encode takes bytes and returns bytes, so passing a str as above raises a TypeError in Python 3. It should be:
base64_string = base64.b64encode(string_to_hash.encode('utf-8')).decode('utf-8')
string -> bytes -> b64encoded bytes -> b64 encoded string
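A runnable version of that chain (a sketch for illustration):

import base64

string_to_hash = "ahmer"
encoded_bytes = string_to_hash.encode('utf-8')  # str -> bytes
b64_bytes = base64.b64encode(encoded_bytes)     # bytes -> Base64 bytes
base64_string = b64_bytes.decode('utf-8')       # Base64 bytes -> str
print(base64_string)                            # YWhtZXI=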

Incorporate Base64 encoded data in Python Web Service call

I am trying to make a web service call in Python 3. A subset of the request includes a base64 encoded string, which is coming from a list of Python dictionaries.
So I dump the list and encode the string:
j = json.dumps(dataDictList, indent=4, default = myconverter)
encodedData = base64.b64encode(j.encode('ASCII'))
Then, when I build my request, I add in that string. Because it comes back as bytes, I need to change it to a string:
...
\"data\": \"''' + str(encodedData) + '''\"
...
The response I'm getting from the web service is that my request is malformed. When I print out str(encodedData) I get:
b'WwogICAgewogICAgICAgICJEQVlfREFURSI6ICIyMDEyLTAzLTMxIDAwOjAwOjAwIiwKICAgICAgICAiQ0FMTF9DVFJfSUQiOiA1LAogICAgICAgICJUT1RfRE9MTEFSX1NBTEVTIjogMTk5MS4wLAogICAgICAgICJUT1RfVU5JVF9TQUxFUyI6IDQ0LjAsCiAgICAgICAgIlRPVF9DT1NUIjogMTYxOC4xMDM3MDAwMDAwMDA2LAogICAgICAgICJHUk9TU19ET0xMQVJfU0FMRVMiOiAxOTkxLjAKICAgIH0KXQ=='
If I copy this into a base64 decoder, I get gibberish until I remove the b' at the beginning as well as the last single quote. I think those are causing my request to fail. According to this note, though, I would think that the b' is ignored: What does the 'b' character do in front of a string literal?
I'll appreciate any advice.
Thank you.
Passing a bytes object into str causes it to be formatted for display; it doesn't convert the bytes into a string (you need to know the encoding for that to work):
In [1]: x = b'hello'
In [2]: str(x)
Out[2]: "b'hello'"
Note that str(x) actually starts with b' and ends with '. If you want to decode the bytes into a string, use bytes.decode:
In [5]: x = base64.b64encode(b'hello')
In [6]: x
Out[6]: b'aGVsbG8='
In [7]: x.decode('ascii')
Out[7]: 'aGVsbG8='
You can safely decode the base64 bytes as ASCII. Also, your JSON should be encoded as UTF-8, not ASCII. The following changes should work:
j = json.dumps(dataDictList, indent=4, default=myconverter)
encodedData = base64.b64encode(j.encode('utf-8')).decode('ascii')
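To avoid the str(bytes) artifacts entirely, the request body can also be built as a dict and serialized in one go. A sketch, with hypothetical stand-ins for the question's dataDictList and myconverter:

import base64
import json

dataDictList = [{"DAY_DATE": "2012-03-31 00:00:00", "CALL_CTR_ID": 5}]  # sample stand-in
myconverter = str  # hypothetical fallback serializer for non-JSON types

j = json.dumps(dataDictList, indent=4, default=myconverter)
encodedData = base64.b64encode(j.encode('utf-8')).decode('ascii')  # a str, no b'...' wrapper
body = json.dumps({"data": encodedData})  # valid JSON, safe to send as the request body
print(body)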

Python hashlib & decode() on a Bytes Object

I'm not understanding something about hashlib. I'm not sure why I can decode a regular bytes object, but can't decode a hash that's returned as a bytes object. I keep getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 1: invalid start byte
Here's my test code that produces this error; it occurs on the h2 = h.decode('utf-8') line:
import hashlib
pw = 'wh#teV)r'
salt = 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
pwd = pw + salt
h = hashlib.sha512(pwd.encode('utf-8')).digest()
print(h)
h2 = h.decode('utf-8')
print(h2)
If I don't hash it, it works perfectly fine...
>>> pw = 'wh#teV)r'
>>> salt = 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
>>> pwd = pw + salt
>>> h = pwd.encode('utf-8')
>>> print(h)
b'wh#teV)rb7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
>>> h2 = h.decode('utf-8')
>>> print(h2)
wh#teV)rb7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua
So I'm guessing I'm not understanding something about the hash, but I have no clue what I'm missing.
In the second example you're just encoding to UTF-8 and then decoding the result straight back.
In the first example, on the other hand, you're encoding to UTF-8, messing about with the bytes, and then trying to decode it as though it's still UTF-8. Whether the resulting bytes are still valid as UTF-8 is purely down to chance (and even if it is still valid UTF-8, the Unicode string it represents will bear no relation to the original string).
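If the goal is a printable string, encode the digest bytes rather than decoding them. A minimal sketch (an addition for illustration, not from the original answer):

import base64
import hashlib

pwd = 'wh#teV)r' + 'b7u2qw^T&^#U#Lta)hvx7ivRoxr^tDyua'
digest = hashlib.sha512(pwd.encode('utf-8')).digest()

print(digest.hex())                                     # hex string of the raw digest
print(hashlib.sha512(pwd.encode('utf-8')).hexdigest())  # same result, computed directly
print(base64.b64encode(digest).decode('ascii'))         # or Base64, for a shorter string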
