system.convert.tobase64string (in F#) - string

https://msdn.microsoft.com/fr-fr/library/system.convert.tobase64string.aspx
I don't really understand what this function does.
> System.Convert.ToBase64String [|7uy;243uy|];;
val it : string = "B/M="
> System.Convert.ToBase64String [|243uy;7uy|];;
val it : string = "8wc="
> System.Convert.ToBase64String [|243uy|];;
val it : string = "8w=="
> System.Convert.ToBase64String [|243uy;7uy;3uy|];;
val it : string = "8wcD"
> System.Convert.ToBase64String [|243uy;7uy;3uy;5uy|];;
val it : string = "8wcDBQ=="
I would have expected that function to be "commutative", that if we assume the first answer then System.Convert.ToBase64String [|243uy;7uy|];; would yield
val it : string = "M=B/"
Also, I'm surprised that for arrays of 1, 2 and 3 elements the result is a string of length 4, and then it jumps to 8 for an array of size 4...
I didn't find any explanations...
I thought System.Text.Encoding.UTF8.GetString would do what I want to do (manually produce a string hash) but, embarrassingly,
> System.Text.Encoding.UTF8.GetString [|243uy;7uy|];;
val it : string = "�"
> System.Text.Encoding.UTF8.GetString [|7uy;243uy|];;
val it : string = "�"
look the same
with individually
> System.Text.Encoding.UTF8.GetChars [|7uy;243uy|];;
val it : char [] = [|'\007'; '�'|]
> System.Text.Encoding.UTF8.GetChars [|243uy;7uy|];;
val it : char [] = [|'�'; '\007'|]
thanks

Not knowing your background, forgive me for quickly covering the basics:
Number bases: First, you'll want to understand how bases work in general, and binary (Base 2) in specific. If you don't understand what I mean when I say that 243 = 0b11110011 or 0b00000111 = 7, you'll want to check out the following link: http://betterexplained.com/articles/numbers-and-bases/ -- he explains it better than I could. :) We need to know this because in order to understand Base64 encoding, you need to be able to convert a number between decimal and binary, and vice versa.
Base64: To understand Base64 encoding, we need to go down to the level of the bits. See https://en.wikipedia.org/wiki/Base64 for a pretty clear picture. Base64's purpose historically is to help prepare binary data to be sent between computer systems via methods which don't reliably handle non-text data, such as email before it supported attachments. It does this by combining the bytes' data and re-slicing their boundaries so that each unit has 6 bits instead of 8. 6 bits can hold 2^6 = 64 different values, and an entire printable character is used to represent this base64 unit. Usually, A-Z represents values 0-25, a-z = 26-51, 0-9 = 52-61, and two final characters, + for value 62, and / for value 63. This representation of the data takes more memory -- it takes 4 bytes of printable characters to represent 3 bytes of binary data, but the benefit is that the data is now representable as text (and will thus be more likely to be the same binary data across different systems and character sets). When the data is received, the process is reversed, and you get your binary data back.
There is one extra piece. Binary data is consumed in chunks of 3 bytes -- this is so 24 incoming bits can easily be resliced into 4 base64 chars of 6 bits each. If the end of the data doesn't completely fill the last chunk, the encoding will be padded so that on the other end the decoding will leave off the correct number of bytes. A single = will show that the last chunk is padded one byte, and double equals (==) will show that the last chunk is padded two bytes. The decoding process will then know the final chunk of created bytes will be 1 or 2 bytes short.
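The reslicing and padding rules described above can be sketched in a few lines of Python. This is only an illustration, not how .NET implements it: the function name encode_base64 and the constant ALPHABET are made up for this example, and the result is cross-checked against Python's standard base64 module.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    """Encode bytes by reslicing each 3-byte chunk into four 6-bit values."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 absent bytes in the last chunk
        # Pack the chunk into 24 bits, filling absent bytes with zeros.
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Slice the 24 bits into four 6-bit indexes, most significant first.
        chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built purely from filler bits are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.append("".join(chars))
    return "".join(out)


for sample in (bytes([7, 243]), bytes([243, 7]), bytes([243]), bytes([243, 7, 3, 5])):
    # The hand-rolled encoder should agree with the standard library.
    assert encode_base64(sample) == base64.b64encode(sample).decode("ascii")
    print(list(sample), "->", encode_base64(sample))
```

The four samples are the byte arrays from the question, so the printed strings can be compared directly with the F# Interactive output.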
Understanding some of your examples
Let's take a look at your first example -- you're sending the bytes 7 and 243 into the function:
System.Convert.ToBase64String[|7uy;243uy|]
//   byte 1     byte 2     no 3rd
//   dec 7      dec 243    byte!
//   00000111   11110011   00000000
// reslice 3*8 into 4*6
//   000001   111111   001100   000000
//   dec 1    dec 63   dec 12   padding   <-- refer to wikipedia base64 table
//   char B   char /   char M   char =
"B/M="
For your next example you're sending the bytes 243 and 7 into the function:
System.Convert.ToBase64String[|243uy;7uy|]
//   byte 1     byte 2     no 3rd
//   dec 243    dec 7      byte!
//   11110011   00000111   00000000
// reslice 3*8 into 4*6
//   111100   110000   011100   000000
//   dec 60   dec 48   dec 28   padding   <-- refer to wikipedia base64 table
//   char 8   char w   char c   char =
"8wc="
Not 'commutative'
You thought that the encoding should be "commutative". Just to be clear, this is a property that means the order in which you perform an operation is irrelevant, such as adding or multiplying -- 1 + 2 + 3 can be added in any sequence, and the answer is still 6. I'm not sure that this property makes sense for this kind of operation -- the data itself isn't changing, you're merely giving it a different representation. Said differently, if you resequenced the input data but got the same output, either the data would no longer be the same, or the bytes you swapped were identical. :)
Perhaps you meant that you expected swapping bytes would mean the base64 characters would swap? After having gone through a couple of examples, you can see that within each chunk the output characters straddle the input byte boundaries -- you're not swapping on clean boundary breaks. However, if you swapped a chunk of three bytes with another chunk of three bytes, then the two resulting groups of 4 output characters would cleanly swap. For example:
> let bytes = System.Text.Encoding.UTF8.GetBytes( "foobar" );;
val bytes : byte [] = [|102uy; 111uy; 111uy; 98uy; 97uy; 114uy|]
> System.Convert.ToBase64String bytes;;
val it : string = "Zm9vYmFy"
> let bytes = System.Text.Encoding.UTF8.GetBytes( "barfoo" );;
val bytes : byte [] = [|98uy; 97uy; 114uy; 102uy; 111uy; 111uy|]
> System.Convert.ToBase64String bytes;;
val it : string = "YmFyZm9v"
You can see that swapping a 3 byte chunk resulted in a clean swap of the 4 character output chunks.
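The same observation can be reproduced outside F#. Below is a small Python sketch using the standard base64 module; the helper name groups is invented for this example. It splits the encoded text into 4-character groups so the effect of swapping whole chunks versus swapping single bytes is easy to see.

```python
import base64


def groups(data: bytes) -> list:
    """Return the Base64 text of data split into 4-character groups."""
    text = base64.b64encode(data).decode("ascii")
    return [text[i:i + 4] for i in range(0, len(text), 4)]


print(groups(b"foobar"))  # one group per 3-byte chunk
print(groups(b"barfoo"))  # chunks swapped: the same groups, in the other order
print(groups(b"ofobar"))  # bytes swapped inside a chunk: that group itself changes
```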
Unexpected output size
You mention that you were surprised to see a 4 character output for input byte arrays of length 1, 2, and 3. By now you have probably recognized that the Base64 encoding algorithm processes input in chunks of 3 bytes, producing 4 characters of output for each 3 byte chunk. The trailing equals signs are the padding signifying that the input bytes didn't fill the last chunk. To be clear:
1 byte in = 4 chars out, last two are padding
2 bytes in = 4 chars out, last one is padding
3 bytes in = 4 chars out, no padding
4 bytes in = 8 chars out, last two are padding
etc.
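The list above follows from two small formulas: the output length is 4 characters per started 3-byte chunk, and the padding count depends only on the input length modulo 3. A minimal Python sketch, with the hypothetical helper name base64_shape:

```python
def base64_shape(n_bytes: int) -> tuple:
    """Return (output_length, padding_count) for padded Base64 of n_bytes bytes."""
    chunks = -(-n_bytes // 3)        # ceiling division: started 3-byte chunks
    padding = (3 - n_bytes % 3) % 3  # 0, 1 or 2 trailing '=' characters
    return 4 * chunks, padding


for n in (1, 2, 3, 4, 16, 32):
    length, padding = base64_shape(n)
    print(f"{n:2} bytes in = {length:2} chars out, {padding} padding")
```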
Decoding
Although you don't mention it, Base64 is an encoding, which means there is a decoding. Once you have your chars, you can easily get your bytes back again:
> System.Convert.FromBase64String "B/M=";;
val it : byte [] = [|7uy; 243uy|]
// or a more realistic string example:
> let bytes = System.Text.Encoding.UTF8.GetBytes( "foobar" );;
val bytes : byte [] = [|102uy; 111uy; 111uy; 98uy; 97uy; 114uy|]
> System.Convert.ToBase64String bytes;;
val it : string = "Zm9vYmFy"
> let bytes = System.Convert.FromBase64String "Zm9vYmFy";;
val bytes : byte [] = [|102uy; 111uy; 111uy; 98uy; 97uy; 114uy|]
> System.Text.Encoding.UTF8.GetString( bytes );;
val it : string = "foobar"
Producing a string hash
The last unanswered idea is that you want to make a string hash. I'm not sure if you meant to encode a string as Base64. That would probably be unnecessary, since strings are usually already made of printable chars, although the result does look kind of encrypted-ish (my guess is someone would think to Base64 decode it pretty quickly). However, just in case you wanted to get a hash of the string, remember that all .NET objects already have this functionality through GetHashCode, although that value is only meant for in-process use such as hash tables and is not guaranteed to be stable across runs or runtime versions. Perhaps the following is what you're looking for?
> "Hash this please!".GetHashCode();;
val it : int = -1297461057
Regardless, hope this helped make things more clear. Good luck!

Related

base64.decode: Invalid encoding before padding

I'm working on a Flutter project and I'm currently getting an error with some of the strings I try to decode using the base64.decode() method. I've created a short Dart program which reproduces the problem I'm facing with a specific string:
import 'dart:convert';
void main() {
final message = 'RU5UUkVHQUdSQVRJU1==';
print(utf8.decode(base64.decode(message)));
}
I'm getting the following error message:
Uncaught Error: FormatException: Invalid encoding before padding (at character 19)
RU5UUkVHQUdSQVRJU1==
I've tried decoding the same string with JavaScript and it works fine. I would be glad if someone could explain why I am getting this error, and possibly show me a solution. Thanks.
Base64 encoding takes binary data three full bytes (24 bits) at a time, breaks each group into four 6-bit segments, and represents those as printable ASCII characters. It does that in essentially two steps.
The first step is to break the binary string down into 6-bit blocks. Base64 uses only 6 bits per character (corresponding to 2^6 = 64 characters) to ensure the encoded data is printable. None of the ASCII control characters are used.
The 64 characters (hence the name Base64) are 10 digits, 26 lowercase characters, 26 uppercase characters as well as the Plus sign (+) and the Forward Slash (/). There is also a 65th character known as a pad, which is the Equal sign (=). This character is used when the final group of input contains fewer than 24 bits.
So RU5UUkVHQUdSQVRJU1== doesn't follow the encoding pattern.
Use Underline character "_" as Padding Character and Decode With Pad Bytes Deleted
For some reason dart:convert's base64.decode chokes on strings padded with = with the "invalid encoding before padding error". This happens even if you use the package's own padding method base64.normalize which pads the string with the correct padding character =.
= is indeed the correct padding character for base64 encoding. It is used to fill out base64 strings when fewer than 24 bits are available in the input group. See RFC 4648, Section 4.
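Note, however, that RFC 4648 Section 5 (the URL- and filename-safe variant) still uses = for padding; in that alphabet, _ is an ordinary data character that stands for value 63 in place of /. As far as I can tell, dart:convert's decoder accepts both alphabets, so appending _ sidesteps the error by supplying extra data bits rather than real padding.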
Using _ as the padding character will cause base64.decode to decode without error.
In order to further decode the generated list of bytes to Utf8, you will need to delete the padding bytes or you will get an "Invalid UTF-8 byte" error.
See the code below. Here is the same code as a working dartpad.dev example.
import 'dart:convert';
void main() {
//String message = 'RU5UUkVHQUdSQVRJU1=='; //as of dart 2.18.2 this will generate an "invalid encoding before padding" error
//String message = base64.normalize('RU5UUkVHQUdSQVRJU1'); // will also generate same error
String message = 'RU5UUkVHQUdSQVRJU1';
print("Encoded String: $message");
print("Decoded String: ${decodeB64ToUtf8(message)}");
}
decodeB64ToUtf8(String message) {
message =
padBase64(message); // pad with underline => ('RU5UUkVHQUdSQVRJU1__')
List<int> dec = base64.decode(message);
//remove padding bytes
dec = dec.sublist(0, dec.length - RegExp(r'_').allMatches(message).length);
return utf8.decode(dec);
}
String padBase64(String rawBase64) {
return (rawBase64.length % 4 > 0)
? rawBase64 += List.filled(4 - (rawBase64.length % 4), "_").join("")
: rawBase64;
}
The string RU5UUkVHQUdSQVRJU1== is not a compliant base 64 encoding according to RFC 4648, which in section 3.5, "Canonical Encoding," states:
The padding step in base 64 and base 32 encoding can, if improperly
implemented, lead to non-significant alterations of the encoded data.
For example, if the input is only one octet for a base 64 encoding,
then all six bits of the first symbol are used, but only the first
two bits of the next symbol are used. These pad bits MUST be set to
zero by conforming encoders, which is described in the descriptions
on padding below. If this property do not hold, there is no
canonical representation of base-encoded data, and multiple base-
encoded strings can be decoded to the same binary data. If this
property (and others discussed in this document) holds, a canonical
encoding is guaranteed.
In some environments, the alteration is critical and therefore
decoders MAY chose to reject an encoding if the pad bits have not
been set to zero. The specification referring to this may mandate a
specific behaviour.
(Emphasis added.)
Here we will manually go through the base 64 decoding process.
Taking your encoded string RU5UUkVHQUdSQVRJU1== and performing the mapping from the base 64 character set (given in "Table 1: The Base 64 Alphabet" of the aforementioned RFC), we have:
R U 5 U U k V H Q U d S Q V R J U 1 = =
010001 010100 111001 010100 010100 100100 010101 000111 010000 010100 011101 010010 010000 010101 010001 001001 010100 110101 ______ ______
(using __ to represent the padding characters).
Now, grouping these by 8 instead of 6, we get
01000101 01001110 01010100 01010010 01000101 01000111 01000001 01000111 01010010 01000001 01010100 01001001 01010011 0101____ ________
E N T R E G A G R A T I S P
The important part is at the end, where there are some non-zero bits followed by padding. The Dart implementation is correctly determining that the padding provided doesn't make sense provided that the last four bits of the previous character do not decode to zeros.
As a result, the decoding of RU5UUkVHQUdSQVRJU1== is ambiguous. Is it ENTREGAGRATIS or ENTREGAGRATISP? It's precisely this reason why the RFC states, "These pad bits MUST be set to zero by conforming encoders."
In fact, because of this, I'd argue that an implementation that decodes RU5UUkVHQUdSQVRJU1== to ENTREGAGRATIS without complaint is problematic, because it's silently discarding non-zero bits.
The RFC-compliant encoding of ENTREGAGRATIS is RU5UUkVHQUdSQVRJUw==.
The RFC-compliant encoding of ENTREGAGRATISP is RU5UUkVHQUdSQVRJU1A=.
This further highlights the ambiguity of your input RU5UUkVHQUdSQVRJU1==, which matches neither.
I suggest you check your encoder to determine why it's providing you with non-compliant encodings, and make sure you're not losing information as a result.
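To check an input for this problem before handing it to a strict decoder, the pad-bit rule can be tested directly. The following Python sketch is an illustration only: the name has_zero_pad_bits is made up, and it assumes the standard alphabet and an input whose length is already a multiple of 4.

```python
import string

# Standard Base64 alphabet (RFC 4648, Table 1).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def has_zero_pad_bits(encoded: str) -> bool:
    """Return True if the unused bits before the '=' padding are all zero."""
    stripped = encoded.rstrip("=")
    pad = len(encoded) - len(stripped)
    if pad == 0:
        return True  # no padding, so every bit of every character is significant
    last_value = ALPHABET.index(stripped[-1])
    unused_bits = 2 * pad  # one '=' leaves 2 unused bits, '==' leaves 4
    return last_value & ((1 << unused_bits) - 1) == 0


print(has_zero_pad_bits("RU5UUkVHQUdSQVRJU1=="))  # the string from the question
print(has_zero_pad_bits("RU5UUkVHQUdSQVRJUw=="))  # canonical ENTREGAGRATIS
print(has_zero_pad_bits("RU5UUkVHQUdSQVRJU1A="))  # canonical ENTREGAGRATISP
```

The first call reports False because the final 1 (value 53, binary 110101) carries the non-zero bits 0101 in the four positions that should be empty.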

Advice for decoding binary/hex WAV file metadata - Pro Tools UMID chunk

Pro Tools (AVID's DAW software) has a process for managing and linking to all of its unique media using a Unique ID field, which gets embedded into the WAV file in the form of a umid metadata chunk. Examining a particular file inside Pro Tools, I can see that the file's Unique ID comes in the form of an 11 character string, looking like: rS9ipS!x6Tf.
When I examine the raw data inside the WAV file, I find a 32-byte block of data - 4 bytes for the chars 'umid'; 4 bytes for the size of the following data block - 24; then the 24-byte data block, which, when examined in Hex Fiend, looks like this:
00000000 0000002A 5B7A5FFB 0F23DB11 00000000 00000000
As you can see, there are only 9 bytes that contain any non-zero information, but this is somehow being used to store the 11 char Unique ID field. It looks to me as if something is being done to interpret this raw data to retrieve that Unique ID string, but none of my attempts to decode the raw data have been fruitful. I have tried using https://gchq.github.io/CyberChef/ to run it through all the different formats that would make sense, but nothing is pointing me in the right direction. I have also tried looking at the data in 6-bit increments to see if it's being compressed in some way (9 bytes * 8 bits == 72 == 12 blocks * 6 bits) but have not had any luck stumbling on a pattern yet.
So I'm wondering if anyone has any specific tips/tricks/suggestions on how best to figure out what might be happening here - how to unpack this data in such a way that I might be able to end up with enough information to generate those 11 chars, of what I'm guessing would most likely be UTF-8.
Any and all help/suggestions welcome! Thanks.
It seems to be a base64 encoding, only with a slightly different character map. Here is the Python implementation that I find best matches Pro Tools.
char_map = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#!"

def encode_unique_id(uint64_value):
    # unique id is a uint64_t, clamp
    value = uint64_value & 0xFFFFFFFFFFFFFFFF
    if value == 0:
        return ""
    # calculate the min number of bytes
    # needed store value for int
    byte_length = 0
    tmp = value
    while tmp:
        tmp = tmp >> 8
        byte_length += 1
    # calculate number of chars needed to store encoding
    char_total, remainder = divmod(byte_length * 8, 6)
    if remainder:
        char_total += 1
    s = ""
    for i in range(char_total):
        value, index = divmod(value, 64)
        s += char_map[index]
    return s
Running encode_unique_id(0x2A5B7A5FFB0F23DB11) should give you rS9ipS!x6Tf
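Going the other way, from the displayed string back to the integer, is a matter of reading each character as a 6-bit digit with the least significant digit first. The sketch below is my own inverse of the encoder above, not anything taken from Pro Tools; the name decode_unique_id is hypothetical and the character map is copied from the answer.

```python
CHAR_MAP = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#!"


def decode_unique_id(text: str) -> int:
    """Rebuild the 64-bit value from its least-significant-first 6-bit digits."""
    value = 0
    for position, char in enumerate(text):
        # Each character contributes 6 bits, starting from the low end.
        value |= CHAR_MAP.index(char) << (6 * position)
    # The encoder clamps to a uint64, so do the same here.
    return value & 0xFFFFFFFFFFFFFFFF


print(hex(decode_unique_id("rS9ipS!x6Tf")))
```

Note that the result holds only the low 8 bytes of the block shown in the question (5B7A5FFB 0F23DB11); the leading 2A byte is dropped by the 64-bit clamp in the encoder and so is not represented in the 11 characters.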

Output UUID in Go as a short string

Is there a built in way, or reasonably standard package that allows you to convert a standard UUID into a short string that would enable shorter URL's?
I.e. taking advantage of using a larger range of characters such as [A-Za-z0-9] to output a shorter string.
I know we can use base64 to encode the bytes, as follows, but I'm after something that creates a string that looks like a "word", i.e. no + and /:
id = base64.StdEncoding.EncodeToString(myUuid.Bytes())
A universally unique identifier (UUID) is a 128-bit value, which is 16 bytes. For human-readable display, many systems use a canonical format using hexadecimal text with inserted hyphen characters, for example:
123e4567-e89b-12d3-a456-426655440000
This has length 16*2 + 4 = 36. You may choose to omit the hyphens, which gives you:
fmt.Printf("%x\n", uuid)
fmt.Println(hex.EncodeToString(uuid))
// Output: 32 chars
123e4567e89b12d3a456426655440000
123e4567e89b12d3a456426655440000
You may choose to use base32 encoding (which encodes 5 bits with 1 symbol in contrast to hex encoding which encodes 4 bits with 1 symbol):
fmt.Println(base32.StdEncoding.EncodeToString(uuid))
// Output: 26 chars
CI7EKZ7ITMJNHJCWIJTFKRAAAA======
Trim the trailing = signs when transmitting, so this will always be 26 chars. Note that you have to append "======" before decoding the string with base32.StdEncoding.DecodeString().
If this is still too long for you, you may use base64 encoding (which encodes 6 bits with 1 symbol):
fmt.Println(base64.RawURLEncoding.EncodeToString(uuid))
// Output: 22 chars
Ej5FZ-ibEtOkVkJmVUQAAA
Note that base64.RawURLEncoding produces a base64 string (without padding) which is safe for URL inclusion, because the 2 extra chars in the symbol table (beyond [0-9a-zA-Z]) are - and _, both which are safe to be included in URLs.
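For readers who want to compare the three lengths side by side, here is a rough Python equivalent using only the standard library. It is a sketch for comparison, not a translation of the Go code: the variable names and the helper decode_base64url are invented for this example, and the UUID is the sample value used above.

```python
import base64
import uuid

value = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
raw = value.bytes  # the 16 raw bytes of the UUID

as_hex = raw.hex()
# Python's encoders always pad, so the '=' characters are stripped by hand.
as_base32 = base64.b32encode(raw).decode("ascii").rstrip("=")
as_base64url = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

for name, text in (("hex", as_hex), ("base32", as_base32), ("base64url", as_base64url)):
    print(f"{name:9} {len(text):2} chars  {text}")


def decode_base64url(text: str) -> bytes:
    """Decode unpadded URL-safe Base64 by restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


assert decode_base64url(as_base64url) == raw
```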
Unfortunately for you, the base64 string may contain 2 extra chars beyond [0-9a-zA-Z]. So read on.
Interpreted, escaped string
If you are averse to these 2 extra characters, you may choose to turn your base64 string into an interpreted, escaped string similar to the interpreted string literals in Go. For example, if you want to insert a backslash in an interpreted string literal, you have to double it because backslash is a special character that starts an escape sequence, e.g.:
fmt.Println("One backslash: \\") // Output: "One backslash: \"
We may choose to do something similar to this. We have to designate a special character: be it 9.
Reasoning: base64.RawURLEncoding uses the charset: A..Za..z0..9-_, so 9 represents the highest code with alphanumeric character (61 decimal = 111101b). See advantage below.
So whenever the base64 string contains a 9, replace it with 99. And whenever the base64 string contains the extra characters, use a sequence instead of them:
9 => 99
- => 90
_ => 91
This is a simple replacement table which can be captured by a value of strings.Replacer:
var escaper = strings.NewReplacer("9", "99", "-", "90", "_", "91")
And using it:
fmt.Println(escaper.Replace(base64.RawURLEncoding.EncodeToString(uuid)))
// Output:
Ej5FZ90ibEtOkVkJmVUQAAA
This will slightly increase the length as sometimes a sequence of 2 chars will be used instead of 1 char, but the gain will be that only [0-9a-zA-Z] chars will be used, as you wanted. The average length will be less than 1 additional character: 23 chars. Fair trade.
Logic: For simplicity let's assume all possible uuids have equal probability (uuid is not completely random, so this is not the case, but let's set this aside as this is just an estimation). Last base64 symbol will never be a replaceable char (that's why we chose the special char to be 9 instead of like A), 21 chars may turn into a replaceable sequence. The chance for one being replaceable: 3 / 64 = 0.047, so on average this means 21*3/64 = 0.98 sequences which turn 1 char into a 2-char sequence, so this is equal to the number of extra characters.
To decode, use an inverse decoding table captured by the following strings.Replacer:
var unescaper = strings.NewReplacer("99", "9", "90", "-", "91", "_")
Example code to decode an escaped base64 string:
fmt.Println("Verify decoding:")
s := escaper.Replace(base64.RawURLEncoding.EncodeToString(uuid))
dec, err := base64.RawURLEncoding.DecodeString(unescaper.Replace(s))
fmt.Printf("%x, %v\n", dec, err)
Output:
123e4567e89b12d3a456426655440000, <nil>
Try all the examples on the Go Playground.
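The same escaping idea can be sketched in Python. Chained str.replace calls would not be safe here, because a later replacement could re-match the output of an earlier one, so this version makes a single left-to-right pass with re.sub, which is what the Go strings.Replacer does as well. The names ESCAPES, UNESCAPES, escape and unescape are made up for this illustration.

```python
import re

# Escape table from the answer: 9 is the escape character.
ESCAPES = {"9": "99", "-": "90", "_": "91"}
UNESCAPES = {escaped: plain for plain, escaped in ESCAPES.items()}


def escape(text: str) -> str:
    """Replace 9, - and _ with two-character sequences in one pass."""
    return re.sub(r"[9_-]", lambda m: ESCAPES[m.group()], text)


def unescape(text: str) -> str:
    """Invert escape(): every 9 in escaped text starts a two-character sequence."""
    return re.sub(r"9[019]", lambda m: UNESCAPES[m.group()], text)


encoded = "Ej5FZ-ibEtOkVkJmVUQAAA"
print(escape(encoded))
assert unescape(escape(encoded)) == encoded
```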
As suggested here, if you just want a fairly random string to use as a slug, it is better not to bother with UUIDs at all.
You can simply use Go's native math/rand library to make random strings of the desired length:
package main

import (
	"encoding/hex"
	"fmt"
	"math/rand"
)

func main() {
	b := make([]byte, 4) // 4 bytes encode to 8 hex characters
	rand.Read(b)         // math/rand is not cryptographically secure
	s := hex.EncodeToString(b)
	fmt.Println(s)
}
Another option is math/big. While base64 has a constant output of 22
characters, math/big can get down to 2 characters, depending on the input:
package main
import (
"encoding/base64"
"fmt"
"math/big"
)
type uuid [16]byte
func (id uuid) encode() string {
return new(big.Int).SetBytes(id[:]).Text(62)
}
func main() {
var id uuid
for n := len(id); n > 0; n-- {
id[n - 1] = 0xFF
s := base64.RawURLEncoding.EncodeToString(id[:])
t := id.encode()
fmt.Printf("%v %v\n", s, t)
}
}
Result:
AAAAAAAAAAAAAAAAAAAA_w 47
AAAAAAAAAAAAAAAAAAD__w h31
AAAAAAAAAAAAAAAAAP___w 18owf
AAAAAAAAAAAAAAAA_____w 4GFfc3
AAAAAAAAAAAAAAD______w jmaiJOv
AAAAAAAAAAAAAP_______w 1hVwxnaA7
AAAAAAAAAAAA_________w 5k1wlNFHb1
AAAAAAAAAAD__________w lYGhA16ahyf
AAAAAAAAAP___________w 1sKyAAIxssts3
AAAAAAAA_____________w 62IeP5BU9vzBSv
AAAAAAD______________w oXcFcXavRgn2p67
AAAAAP_______________w 1F2si9ujpxVB7VDj1
AAAA_________________w 6Rs8OXba9u5PiJYiAf
AAD__________________w skIcqom5Vag3PnOYJI3
AP___________________w 1SZwviYzes2mjOamuMJWv
_____________________w 7N42dgm5tFLK9N8MT7fHC7
https://golang.org/pkg/math/big
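The conversion that big.Int's Text(62) performs is ordinary repeated division by 62, treating the 16 bytes as one big-endian integer. Here is a Python sketch of that idea; the name to_base62 is hypothetical, and the digit order (0-9, then a-z, then A-Z) is inferred from the results listed above.

```python
import string

# Digit order inferred from the output above: 0-9, a-z, A-Z.
DIGITS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(data: bytes) -> str:
    """Render bytes as a base-62 number, dropping leading zero digits."""
    number = int.from_bytes(data, "big")
    if number == 0:
        return "0"
    out = []
    while number:
        number, digit = divmod(number, 62)
        out.append(DIGITS[digit])
    return "".join(reversed(out))


print(to_base62(bytes(15) + b"\xff"))
print(to_base62(bytes(14) + b"\xff\xff"))
print(to_base62(b"\xff" * 16))
```

Because leading zeros are dropped, the length varies with the value, and decoding needs to left-pad the result back to 16 bytes.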

Why SHA256 hashes finish with " = "?

I've made a web service which returns a security token after a successful authentication.
However when debugging I noticed that every hash the webservice returned finishes with "=" such as:
"tINH0JxmryvB6pRkEii1iBYP7FRedDqIEs0Ppbw83oc="
"INv7q72C1HvIixY1qmt5tNASFBEc0PnXRSb780Y5aeI="
"QkM8Kog8TtCczysDmKu6ZOjwwYlcR2biiUzxkb3uBio="
"6eNuCU6RBkwKMmVV6Mhm0Q0ehJ8Qo5SqcGm3LIl62uQ="
"dAPKN8aHl5tgKpmx9vNoYvXfAdF+76G4S+L+ep+TzU="
"O5qQNLEjmmgCIB0TOsNOPCHiquq8ALbHHLcWvWhMuI="
"N9ERYp+i7yhEblAjaKaS3qf9uvMja0odC7ERYllHCI="
"wsBTpxyNLVLbJEbMttFdSfOwv6W9rXba4GGodVVxgo="
"sr+nF83THUjYcjzRVQbnDFUQVTkuZOZYe3D3bmF1D8="
"9EosvgyYOG5a136S54HVmmebwiBJJ8a3qGVWD878j5k="
"8ORZmAXZ4dlWeaMOsyxAFphwKh9SeimwBzf8eYqTis="
"gVepn2Up5rjVplJUvDHtgIeaBL+X6TPzm2j9O2JTDFI="
Why such a behavior ?
This is because you don't see the raw bytes of the hash but rather the Base64 encoding.
Base64-encoding converts a block of 3 bytes to a block of four characters. This works well if the number of bytes is divisible by 3. If it is not, then you use a padding-character so the number of resulting characters is still divisible by 4.
So:
(no of bytes)%3 = 0 => no padding needed
(no of bytes)%3 = 1 => pad with ==
(no of bytes)%3 = 2 => pad with =
A SHA256-hash is 256 bit, that's 32 bytes. So you will get 40 characters for the first 30 bytes, 3 characters for the last 2 bytes and the padding will always be one =.
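This is easy to confirm for any input. A minimal Python sketch using the standard hashlib and base64 modules; the function name token_for and the sample messages are placeholders, not the web service's actual inputs.

```python
import base64
import hashlib


def token_for(message: bytes) -> str:
    """Return the Base64 text of the SHA-256 digest of message."""
    digest = hashlib.sha256(message).digest()  # always 32 bytes
    return base64.b64encode(digest).decode("ascii")


for sample in (b"", b"alice", b"some other input"):
    token = token_for(sample)
    # 32 bytes -> 11 chunks -> 44 characters, the last one being '='.
    print(len(token), token)
```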
These strings are encoded using base64. The = characters are used as padding, so that the last block of a base64 string contains four characters.
The following Ruby code could be used to get base64 decoded string:
require 'base64'
s = "tINH0JxmryvB6pRkEii1iBYP7FRedDqIEs0Ppbw83oc="
puts Base64.decode64(s).bytes.map{|e| '%02x' % e}.join
Output: b48347d09c66af2bc1ea94641228b588160fec545e743a8812cd0fa5bc3cde87

Why does ToBase64String change a 16 byte string to 24 bytes

I have the following code. When I check the value of variable i it is 16 bytes but then when the output is converted to Base64 it is 24 bytes.
byte[] bytOut = ms.GetBuffer();
int i = 0;
for (i = 0; i < bytOut.Length; i++)
if (bytOut[i] == 0)
break;
// convert into Base64 so that the result can be used in xml
return System.Convert.ToBase64String(bytOut, 0, i);
Is this expected? I am trying to cut down storage and this is one of my problems.
Base64 expresses the input string made of 8-bit bytes using 64 human-readable characters (64 characters = 6 bits of information).
The key to the answer to your question is that the encoding works in 24 bit chunks, so every 24 bits or fraction thereof results in 4 characters of output.
16 bytes * 8 bits = 128 bits of information
128 bits / 24 bits per chunk = 5.333 chunks
So the final output will be 6 chunks or 24 characters.
The fractional chunks are handled with equal signs, which represent the trailing "null bits". In your case, the output will always end in '=='.
Yes, you'd expect to see some expansion. You're representing your data in a base with only 64 characters. All those unprintable ASCII characters still need a way to be encoded though. So you end up with slight expansion of the data.
Here's a link that explains how much: Base64: What is the worst possible increase in space usage?
Edit: Based on your comment above, if you need to reduce size, you should look at compressing the data before you encrypt. This will get you the max benefit from compression. Compressing encrypted binary does not work.
This is because a base64 string can contain only 64 different characters (so that it stays displayable), whereas a byte can take 256 different values and therefore carries more information.
Base64 is a great way to represent binary data in a string using only standard, printable characters. It is not, however, a good way to represent string data because it takes more characters than the original string.
