Output UUID in Go as a short string

Is there a built-in way, or a reasonably standard package, to convert a standard UUID into a short string that would enable shorter URLs?
I.e. taking advantage of a larger range of characters, such as [A-Za-z0-9], to output a shorter string.
I know we can use base64 to encode the bytes, as follows, but I'm after something that creates a string that looks like a "word", i.e. no + and /:
id = base64.StdEncoding.EncodeToString(myUuid.Bytes())

A universally unique identifier (UUID) is a 128-bit value, which is 16 bytes. For human-readable display, many systems use a canonical format using hexadecimal text with inserted hyphen characters, for example:
123e4567-e89b-12d3-a456-426655440000
This has length 16*2 + 4 = 36. You may choose to omit the hyphens, which gives you:
fmt.Printf("%x\n", uuid)
fmt.Println(hex.EncodeToString(uuid))
// Output: 32 chars
123e4567e89b12d3a456426655440000
123e4567e89b12d3a456426655440000
You may choose to use base32 encoding (which encodes 5 bits with 1 symbol in contrast to hex encoding which encodes 4 bits with 1 symbol):
fmt.Println(base32.StdEncoding.EncodeToString(uuid))
// Output: 26 chars plus 6 "=" padding chars
CI7EKZ7ITMJNHJCWIJTFKRAAAA======
Trim the trailing = signs when transmitting; a 16-byte input always encodes to exactly 26 chars before the padding. Note that you have to append "======" again before decoding the string with base32.StdEncoding.DecodeString().
If this is still too long for you, you may use base64 encoding (which encodes 6 bits with 1 symbol):
fmt.Println(base64.RawURLEncoding.EncodeToString(uuid))
// Output: 22 chars
Ej5FZ-ibEtOkVkJmVUQAAA
Note that base64.RawURLEncoding produces a base64 string (without padding) which is safe for URL inclusion, because the 2 extra chars in the symbol table (beyond [0-9a-zA-Z]) are - and _, both of which are safe to include in URLs.
Unfortunately for you, the base64 string may contain 2 extra chars beyond [0-9a-zA-Z]. So read on.
Interpreted, escaped string
If you want to avoid these 2 extra characters, you may choose to turn your base64 string into an interpreted, escaped string, similar to the interpreted string literals in Go. For example, if you want to insert a backslash in an interpreted string literal, you have to double it, because the backslash is a special character that starts an escape sequence, e.g.:
fmt.Println("One backslash: \\") // Output: "One backslash: \"
We may choose to do something similar. We have to designate a special character: let it be 9.
Reasoning: base64.RawURLEncoding uses the charset A..Za..z0..9-_, so 9 is the alphanumeric character with the highest code (61 decimal = 111101b). See the advantage below.
So whenever the base64 string contains a 9, replace it with 99. And whenever the base64 string contains the extra characters, use a sequence instead of them:
9 => 99
- => 90
_ => 91
This is a simple replacement table which can be captured by a value of strings.Replacer:
var escaper = strings.NewReplacer("9", "99", "-", "90", "_", "91")
And using it:
fmt.Println(escaper.Replace(base64.RawURLEncoding.EncodeToString(uuid)))
// Output:
Ej5FZ90ibEtOkVkJmVUQAAA
This will slightly increase the length, as sometimes a sequence of 2 chars will be used instead of 1 char, but the gain is that only [0-9a-zA-Z] chars will be used, as you wanted. On average this adds slightly less than 1 character, for an average length of about 23 chars. Fair trade.
Logic: For simplicity, let's assume all possible UUIDs have equal probability (a UUID is not completely random, so this is not the case, but let's set this aside as this is just an estimate). The last base64 symbol carries only 2 data bits followed by 4 zero bits, so it can only be A, Q, g or w and will never be a replaceable char (that's why we chose the special char to be 9 instead of something like A). The other 21 chars may turn into a replaceable sequence. The chance of one being replaceable is 3 / 64 = 0.047, so on average 21*3/64 = 0.98 chars turn into a 2-char sequence, which equals the average number of extra characters.
To decode, use an inverse decoding table captured by the following strings.Replacer:
var unescaper = strings.NewReplacer("99", "9", "90", "-", "91", "_")
Example code to decode an escaped base64 string:
fmt.Println("Verify decoding:")
s := escaper.Replace(base64.RawURLEncoding.EncodeToString(uuid))
dec, err := base64.RawURLEncoding.DecodeString(unescaper.Replace(s))
fmt.Printf("%x, %v\n", dec, err)
Output:
123e4567e89b12d3a456426655440000, <nil>
Try all the examples on the Go Playground.

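As suggested here, if you just want a fairly random string to use as a slug, it is better not to bother with UUIDs at all.
You can simply use Go's standard math/rand package to make random strings of the desired length (note that math/rand is not cryptographically secure, so the output may be predictable):
package main
import (
"encoding/hex"
"fmt"
"math/rand"
)
func main() {
b := make([]byte, 4) // 4 bytes encode to 8 hex characters
rand.Read(b)
s := hex.EncodeToString(b)
fmt.Println(s)
}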

Another option is math/big. While base64 has a constant output of 22
characters, math/big can get down to 2 characters, depending on the input:
package main
import (
"encoding/base64"
"fmt"
"math/big"
)
type uuid [16]byte
func (id uuid) encode() string {
return new(big.Int).SetBytes(id[:]).Text(62)
}
func main() {
var id uuid
for n := len(id); n > 0; n-- {
id[n - 1] = 0xFF
s := base64.RawURLEncoding.EncodeToString(id[:])
t := id.encode()
fmt.Printf("%v %v\n", s, t)
}
}
Result:
AAAAAAAAAAAAAAAAAAAA_w 47
AAAAAAAAAAAAAAAAAAD__w h31
AAAAAAAAAAAAAAAAAP___w 18owf
AAAAAAAAAAAAAAAA_____w 4GFfc3
AAAAAAAAAAAAAAD______w jmaiJOv
AAAAAAAAAAAAAP_______w 1hVwxnaA7
AAAAAAAAAAAA_________w 5k1wlNFHb1
AAAAAAAAAAD__________w lYGhA16ahyf
AAAAAAAAAP___________w 1sKyAAIxssts3
AAAAAAAA_____________w 62IeP5BU9vzBSv
AAAAAAD______________w oXcFcXavRgn2p67
AAAAAP_______________w 1F2si9ujpxVB7VDj1
AAAA_________________w 6Rs8OXba9u5PiJYiAf
AAD__________________w skIcqom5Vag3PnOYJI3
AP___________________w 1SZwviYzes2mjOamuMJWv
_____________________w 7N42dgm5tFLK9N8MT7fHC7
https://golang.org/pkg/math/big
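One thing the variable-length output implies is that leading zero bytes are dropped, so decoding has to pad the value back to 16 bytes. A minimal sketch of the round trip, assuming Go 1.15 or later for big.Int.FillBytes; decode62 is a made-up helper, and encode62 simply repeats the encode method above on a plain [16]byte:

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// encode62 reads the 16 bytes as one big-endian integer and prints it in base 62.
func encode62(id [16]byte) string {
	return new(big.Int).SetBytes(id[:]).Text(62)
}

// decode62 (hypothetical helper) parses a base-62 string back into 16 bytes,
// restoring the leading zero bytes that the integer form does not keep.
func decode62(s string) ([16]byte, error) {
	var id [16]byte
	n, ok := new(big.Int).SetString(s, 62)
	if !ok || n.Sign() < 0 || n.BitLen() > 128 {
		return id, errors.New("not a base-62 encoded 128-bit value")
	}
	n.FillBytes(id[:]) // left-pads with zero bytes
	return id, nil
}

func main() {
	var id [16]byte
	id[15] = 0xFF

	s := encode62(id)
	back, err := decode62(s)
	fmt.Println(s, back == id, err) // 47 true <nil>
}
```

The BitLen check matters because FillBytes panics when the value does not fit into the buffer.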


Converting string to 64 bit floats in node.js

This question has two inter-related parts which I am deeply struggling with. I just want to point out I have literally been struggling with this for days!
Part 1. I have a string of data which is like the following:
?\21167\200Z\251\3072\26467\000\350-\306%\33767\240\314\203\310\030\n77\200U\374\307\013577\000#\036\306\376_77\200s\215\307\361\21277\000t\235\306\344\26577\000\204\000\307\327\34077\000\264\217\306\313\01387\000R&\307\276687\000\312\210\306\261a87\000\364\026\306\244\21487p\257\"\311\227\26787\000#U\306\212\34287\000\324\210\306}\r97\000\274*\306p897\000\270\016\306J\27197\000\250u\306=\34497\000\224r\3060\017:7\000\316\213\306#::7#\254\223\310\026e:7\000d\251\306\374\272:7\000(-\306\360\345:7\000\322\202\306\343\020;7\200D\230\307\311f;7\000\\\314\306\274\221;7\0000\246\306\257\274;7\000\230\342\305\242\347;7\000\000\220\310\225\022<7\0002\003\307\210=<7\200X\316\307|h<7\000-\033\307o\223<7\000\000\367\305b\276<7\000|\237\306U\351<7\000Di\306H\024=7\000\356>\307;?=7\000\024u\306!\225=7\240J\317\310\025\300=7\000\224#\306\010\353=7\000\373\027\307\373\025>7\0008p\310\324\226>7\000\360\205\306\272\354>7\000\226m\307\255\027?7\000\224\304\306\241B?7\200D\240\310\224m?7\000\304x\306\207\230?7\000B\337\306z\303?7\000;\217\307`\031#7\000\252q\307SD#7\000\244d\3069\232#7\000\270\324\306-\305#7\200\311\266\307 \360#7\340P\233\310\023\033A7\000\014\245\307\006FA7\000B\324\307\371pA7\000\362\002\307\354\233A7\000L#\306\337\306A7\000$\031\306\322\361A7\000\002\004\307\306\034B7\000,U\306\254rB7\000\274\341\306\237\235B7 W\212\310\222\310B7\000\177N\307\205\363B7\0004\351\306x\036C7\000\004\037\306kIC7#;f\310^tC7\000\371Q\307E\312C7\000$f\3078\365C7\000\032\320\306+ D7\000\226\252\306\036KD7\200\350\216\310\021vD7\000\270?\306\367\313D7\000#\032\306\353\366D7\200\2100\310\336!E7\000\270[\306\321LE7\000\"{\307\304wE7\000^\301\306\267\242E7\000\246\240\306\252\315E7\000b\006\307\235\370E7\200\003M\311\220#F7\000\374\203\306\203NF7\000T{\306wyF7\000PT\306j\244F7\000\364^\306]\317F7\000\324\003\307P\372F7\000\303 \307K^67\000\204\000\306X367\000\324\341\306e\01067\000\271\001\307r\33557\000\374\004\306\177\26257\000\240\235\305\214\20757\000\217\242\307\231\\57\000PS\306\246157 
\013\236\310\262\00657\000 \222\307\277\33347\000\244\210\306\314\26047\000\226\353\306\346Z47\200\257P\310\331\20547\000\213M\307\363/47\000$\021\306\000\00547\000\373\241\307\r\33237\000z\355\307\032\25737\000\010\244\306&\20437\000\271p\3103Y37\000\307f\307#.37\000\241^\307M\00337\000l&\306Z\33027\000\226\213\306g\25527\000fx\307t\20227\300\352\020\310\201W27\000\253\231\307\215,27\000v\267\306\247\32617#\373r\310\264\25317\000X\202\305\301\20017\300\357\020\310\316U17\000\016\227\306\333*17\000\030\010\307\350\37707\300XN\310\365\32407\000\350U\306\001\25207\000\344\363\306\016\17707\200\324\360\307\033T07\000\022\"\307()07 \213\375\3105\376/7\000\265\222\307O\250/7\000\251r\307B\323/7\200O\215\307\\}/7\200\235\376\307iR/7\000\324\274\307\202\374.7\000S*\307u\'/7\000R-\307\217\321.7\000j\177\307\234\246.7\200)\346\307\251{.7\000\364\350\307\266P.7\000!\310\307\303%.7\000-/\307\320\372-7\200\'\207\307\334\317-7\300\357\006\310\351\244-7\000$(\310\366y-7\000\177:\307\003O-7\200t\361\307\020$-7\0001\217\307\035\371,7\000`G\305*\316,7\200\t3\3107\243,7\300\022\017\310Dx,7\000\244m\307PM,7\000\327o\307]\",7\000\004s\307j\367+7 \360\257\310w\314+7\000\265;\307\204\241+7\000D2\306\221v+7\000\261`\307\236K+7\000 \313\306\253 +7 \337!\311\267\365*7#6L\310\304\312*7\000\271;\310\321\237*7\000%\205\307\336t*7\000\0145\307\353I*7\200E\242\307\370\036*7\000~E\307\005\364)7\000\311\031\307\022\311)7\000\302\374\307\037\236)7\200\276\t\310+s)7\000\261Z\3108H)7\200\350\325\307E\035)7\240\321\201\310R\362(7\000\276\334\307_\307(7\000\246\016\310l\234(7\300\006\254\310yq(7\000\266\024\310\206F(7\200Q2\310\223\033(7\200d\235\307\237\360\'7\000\311\314\307\254\305\'7#i\250\310\271\232\'7\300\254Y\310\306o\'7#\260\t\310\323D\'7\000\336\232\307\340\031\'7\000w&\307\355\356&7\360\226\017\311\372\303&7\000\240\226\306\006\231&7\200\253Q\310\023n&7#OT\310 
C&7\000\034&\310-\030&7\000\213d\310:\355%7\000\000\200\302\363\202\0317\000\031\273G\000X\0317\000\246\276G\r-\0317\000\314\271F\032\002\0317\000\242\037G\'\327\0307\200\334\237G4\254\0307\3401\260HA\201\0307\000\262\307FNV\0307\000\230_FZ+\0307\000\022\256Fg\000\0307\200\354\232Gt\325\0277\000\246\355F\201\252\0277\000\336\355F\216\177\0277\000:$G\233T\0277\000\324jF\250)\0277\000\273\231G\265\376\0267p\365\031I\302\323\0267\000j\240G\333}\0267\000\370\247E\350R\0267\000tbF\365\'\0267\000Q7G\034\247\0257\000\304\036F)|\0257\000\214mF5Q\0257`\254\203HB&\0257\000\213|GO\373\0247\000o#Hi\245\0247\000\026\213Fvz\0247\000\014\346F\203O\0247\000\277\236G\220$\0247\000\014\214F\235\371\0237\200\355\345G\251\316\0237\000\312\253F\266\243\0237\200\001\203G\303x\0237\200\034\206G\320M\0237\000\030\236G\335\"\0237#x\033H\352\367\0227\000\3630G\367\314\0227\000>\260F\004\242\0227\000|\242F\021w\0227\000\240uH\035L\0227\000Z\005G7\366\0217\000\':GQ\240\0217\000-\tGkJ\0217\000\340LFx\037\0217\000\224\316F\204\364\0207\000\270\365F\221\311\0207\000\004\332F\236\236\0207\000\224\021G\253s\0207\000\334UG\270H\0207\000\274\211F\305\035\0207\200\243\353G\337\307\0177\000z\037H\370q\0177\000\t3G\022\034\0177\000\335<G\037\361\0167#\275+H,\306\0167\200\323\272G9\233\0167\300\030)HSE\0167\000\026\237F_\032\0167\000L\227Fl\357\r7\000~\243Fy\304\r7\000\306\037G\206\231\r7\000\334XF\240C\r7\000 dF\272\355\0147\000\374\324F\307\302\0147\000\014\207F\340l\0147\000\327CG\372\026\0147\000\370\261F\007\354\0137\000\025\003G\024\301\0137# 
\\H!\226\0137\000\254\247F.k\0137\000\024\270F;#\0137\000\362\216Fa\277\n7\000\020\023F{i\n7\000(\033G\347\255\0317\0008\226E\332\330\0317\000\000\276F\315\003\0327\000\230\253E\300.\0327\000XEF\263Y\0327\000cSH\246\204\0327\200`\224G\231\257\0327\000\366\337G\214\332\0327\000P\353Es0\0337\000&\212Ff[\0337\000\263JGY\206\0337\000\212gG?\334\0337\200\260\202GL\261\0337\000\324PF2\007\0347\340\023\037I%2\0347\000\256\314F\030]\0347\000\250#F\013\210\0347\000<\251G\377\262\0347\000\320hE\362\335\0347\000\270nF\3303\0357\200\263\327G\313^\0357\000\016\227F\276\211\0357\200\313\244G\261\264\0357\0005\262G\244\337\0357\200\307\301G\230\n\0367\2001\200G\2135\0367\000\020:F~`\0367\200\330\234Gq\213\0367\200\222\314Gd\266\0367\000\357BGW\341\0367\000\240\234FJ\014\0377\200\025\332G=7\0377\000d\021F0b\0377\000dhH$\215\0377\000\002iG\027\270\0377\200[\246G\n\343\0377\200e\264G\375\r 7\000 2G\3608 7\000#\302E\343c 7\200y\250G\326\216 7\000\335DG\311\271 7\000\323\263G\275\344 7\000B\214F\260\017!7\200\246\243G\226e!7\0006`G\243:!7\200\036\344G\211\220!7\200u\371G|\273!7\000\340\202Go\346!7\000\272\377Fb\021\"7#v<HU<\"7\000\"\330FIg\"7\000\004FG<\222\"7\200;\271G/\275\"7\200C\243H\"\350\"7\000\370\337F\025\023#7\300\r\001H\010>#7\000\215^H\373h#7\200K\202G\356\223#7\000\236\350G\341\276#7\200k\303G\325\351#7\000F\346G\310\024$7\3002\030H\273?$7\200\220\276G\241\225$7`d\371H\256j$7\240\275\036I\224\300$7#y\036H\207\353$7\000#\237Gz\026%7\000^<Gal%7\200^?HnA%7\2000\231HT\227%7\200\310\323GG\302%7\300+DH
I am reading data from a file and I end up with a variable myString with the above data in it. When I do typeof on it, it says it's a string, and when I do a console.log on it, it outputs the above exactly. So I created a new node.js file and put all of the data above inside quotes:
newString = "?\21167\200Z\251\3072\26467\000 etc"
When I console.log(newString) it outputs
?‰67€Z©Ç2´67è-Æ%ß67 Ìƒ77€UüÇ 577ÉÆþ_77€sÇñŠ77tÆäµ77„Ç×à7 etc
Rather than the original string "?\21167\200Z\251\3072\26467\000..." etc
What is going on here?
Part 2. Because of Part 1, I have struggled to extract the 64 bit floats from this string.
I can do it in Python!:
import numpy as np
data_new = b"?\21167\200Z\251\3072\26467\000\350-\306%\33767\240\314\2 etc etc"
print(np.frombuffer(data_new, dtype="f4,f4"))
Is there an easy way to do this in node?
Explanation
In JavaScript, \<OCTAL SEQUENCE> inside a string literal will give you the Unicode character represented by that codepoint in octal. This feature is deprecated in modern JavaScript but still works for backward-compatibility reasons (except in strict mode, where it throws a SyntaxError).
The octal sequence must be in the range 0o0..0o377 (decimal 0..255 or hexadecimal 0x0..0xff) for this to happen.
'\1'.codePointAt(0).toString(16) // 1
String.fromCodePoint(0x01) // <Start of Heading>
// https://www.compart.com/en/unicode/U+0001
'\377'.codePointAt(0).toString(16) // ff
String.fromCodePoint(0xff) // "ÿ"
If the digits after the backslash are not valid octal, the backslash is simply "swallowed":
'\9'.codePointAt(0).toString(16) // 39
String.fromCodePoint(0x39) // "9"
// 9 is not a valid octal number, so the string
// simply evaluates to the character "9"
Solutions
Option 1
Replace all the backslashes \ with double-backslashes \\ to escape them, then you should be good to go.
'\\1'.codePointAt(0).toString(16) // 5c
String.fromCodePoint(0x5c) // Reverse Solidus (AKA backslash)
// https://www.compart.com/en/unicode/U+005C
Option 2
Use String.raw with a template literal.
String.raw`\1\2\3` === "\\1\\2\\3" // true
Note that this will not work if there is a backslash at the very end of the string:
String.raw`\` // Uncaught SyntaxError: Unexpected end of input

How can I get the Unicode value of a character in go?

I try to get the unicode value of a string character in Go as an Int value.
I do this:
value = strconv.Itoa(int(([]byte(char))[0]))
where char contains a string with one character.
That works for many cases. It doesn't work for umlauts like ä, ö, ü, Ä, Ö, Ü.
E.g. Ä results in 65, which is the same as for A.
How can I do that?
Supplement: I had two problems. The first was solved with any of the answers below. The second was a bit more tricky. My input was not Go normalized UTF-8 code, e.g. umlauts were represented by two characters instead of one. As ANisus said the solution is found in the package golang.org/x/text/unicode/norm. The line above is now two lines:
rune, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(char)))
value = strconv.Itoa(int(rune))
Any hints to make this shorter welcome ...
Strings are utf8 encoded, so to decode a character from a string to get the rune (unicode code point), you can use the unicode/utf8 package.
Example:
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
str := "AÅÄÖ"
for len(str) > 0 {
r, size := utf8.DecodeRuneInString(str)
fmt.Printf("%d %v\n", r, size)
str = str[size:]
}
}
Result:
65 1
197 2
196 2
214 2
Edit: (To clarify Michael's supplement)
A character such as Ä may be created using different unicode code points:
Precomposed: Ä (U+00C4)
Using combining diaeresis: A (U+0041) + ¨ (U+0308)
In order to get the precomposed form, one can use the normalization package, golang.org/x/text/unicode/norm. The NFC form (Canonical Decomposition, followed by Canonical Composition) will turn U+0041 + U+0308 into U+00C4:
c := "\u0041\u0308"
r, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(c)))
fmt.Printf("%+q", r) // '\u00c4'
The "character" type in Go is the rune which is an alias for int32, see also Rune literals. A rune is an integer value identifying a Unicode code point.
In Go strings are represented and stored as the UTF-8 encoded byte sequence of the text. The range form of the for loop iterates over the runes of the text:
s := "äöüÄÖÜ世界"
for _, r := range s {
fmt.Printf("%c - %d\n", r, r)
}
Output:
ä - 228
ö - 246
ü - 252
Ä - 196
Ö - 214
Ü - 220
世 - 19990
界 - 30028
Try it on the Go Playground.
Read this blog article if you want to know more about the topic:
Strings, bytes, runes and characters in Go
You can use the unicode/utf8 package:
r, _ := utf8.DecodeRuneInString("Ä") // named r to avoid shadowing the built-in type rune
fmt.Println(r) // 196 if Ä is the precomposed U+00C4

How to convert string of hex digits to string with hexadecimal byte escapes in hbase shell (JRuby)

I have JRuby (actually Apache HBase shell).
I have lot of strings which represent bytes, every character is hex digit, 2 chars per byte. Something like:
id = "faed31"
But I need string of escaped characters:
=> "\xfa\xed1"
Any solution? I failed to find one by googling and have only a very general impression of Ruby.
Here is the code which actually solves all my tasks including output that was wanted:
# Convert binary string to hex digits.
def bin_to_hex(s)
s.each_byte.map { |b| b.to_s(16).rjust(2, '0') }.join
end
# Converts hex string to binary string.
def hex_to_bin(s)
s.scan(/../).map { |x| x.hex.chr }.join
end
# HBase special 'convert and print' routine to get hex digits, process them and print.
def print_hex_to_bin(s)
Kernel.print "\"" + Bytes.toStringBinary(s.scan(/../).map { |x| x.hex.chr }.join.to_java_bytes) + "\"\n"
end
Composed mostly based on http://anthonylewis.com/2011/02/09/to-hex-and-back-with-ruby/

Ensure a base64 encoded string never includes a nonalphanumerical character

Is there any way to ensure that a base64 encoded string never includes a non-alphanumerical character?
For example, if I have a long string that I encode, is there something I can prepend or append to it that will ensure that when encoded with base64 will only include letters and numbers in the encoded string? Something like this:
String: 192.168.1.1
Encoded: MTkyLjE2OC4xLjE= <- I want to 'get rid' of the equal sign.
I tried appending } at the end of the string (new string is now 192.168.1.1}), and this worked (new encoded string: MTkyLjE2OC4xLjF9), but is there a method of ensuring every combination works?
Is this possible?
You can just rtrim() the equals signs away, which is what most people do.
But as for your question: the padding disappears when the length of the string is a multiple of 3. So:
$pad = (3 - strlen($str) % 3) % 3; if ($pad) { $str .= str_repeat(' ', $pad); }
But yeah, the parser will add the equals signs back in automatically, up to a multiple of 4, when you pass the string back in, so you don't need to keep them.
It's about the length. The = sign is padding to make the output a multiple of 4 base64 characters. 3 input bytes translate to 4 base64 characters, so you just need to make your input string a multiple of 3 bytes in length somehow. In your case:
192.168.1.1 - 11 characters long, base64 ends with =
192.168.1.1$ - 12 characters long, base64 doesn't end with =
Choose a padding character you can easily remove.
The other alternative is to strip the = from the output, then make sure you append = signs to make a multiple of 4 characters before you try to base64 decode...

Remove trailing "=" when base64 encoding

I am noticing that whenever I base64 encode a string, a "=" is appended at the end. Can I remove this character and then reliably decode it later by adding it back, or is this dangerous? In other words, is the "=" always appended, or only in certain cases?
I want my encoded string to be as short as possible, that's why I want to know if I can always remove the "=" character and just add it back before decoding.
The = is padding.
Wikipedia says
An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when the unencoded binary text is not a multiple of 3 bytes) ; these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes (the last non-pad character is normally encoded so that the last 6-bit block it represents will be zero-padded on its least significant bits, at most two pad characters may occur at the end of the encoded stream).
If you control the other end, you could remove it when in transport, then re-insert it (by checking the string length) before decoding.
Note that the data will not be valid Base64 in transport.
Also, another user pointed out (relevant to PHP users):
Note that in PHP base64_decode will accept strings without padding, hence if you remove it to process it later in PHP it's not necessary to add it back. – Mahn Oct 16 '14 at 16:33
So if your destination is PHP, you can safely strip the padding and decode without fancy calculations.
I wrote part of Apache's commons-codec-1.4.jar Base64 decoder, and in that logic we are fine without padding characters. End-of-file and End-of-stream are just as good indicators that the Base64 message is finished as any number of '=' characters!
The URL-Safe variant we introduced in commons-codec-1.4 omits the padding characters on purpose to keep things smaller!
http://commons.apache.org/codec/apidocs/src-html/org/apache/commons/codec/binary/Base64.html#line.478
I guess a safer answer is, "depends on your decoder implementation," but logically it is not hard to write a decoder that doesn't need padding.
In JavaScript you could do something like this:
// if this is your Base64 encoded string
var str = 'VGhpcyBpcyBhbiBhd2Vzb21lIHNjcmlwdA==';
// make URL friendly:
str = str.replace(/\+/g, '-').replace(/\//g, '_').replace(/\=+$/, '');
// reverse to original encoding
if (str.length % 4 != 0){
str += ('===').slice(0, 4 - (str.length % 4));
}
str = str.replace(/-/g, '+').replace(/_/g, '/');
See also this Fiddle: http://jsfiddle.net/7bjaT/66/
= is added for padding. The length of a base64 string should be multiple of 4, so 1 or 2 = are added as necessary.
Read: No, you shouldn't remove it.
On Android I am using this:
Global
String CHARSET_NAME ="UTF-8";
Encode
String base64 = new String(
Base64.encode(byteArray, Base64.URL_SAFE | Base64.NO_PADDING | Base64.NO_CLOSE | Base64.NO_WRAP),
CHARSET_NAME);
return base64.trim();
Decode
byte[] bytes = Base64.decode(base64String,
Base64.URL_SAFE | Base64.NO_PADDING | Base64.NO_CLOSE | Base64.NO_WRAP);
equals this on Java:
Encode
private static String base64UrlEncode(byte[] input)
{
Base64 encoder = new Base64(true);
byte[] encodedBytes = encoder.encode(input);
return StringUtils.newStringUtf8(encodedBytes).trim();
}
Decode
private static byte[] base64UrlDecode(String input) {
byte[] originalValue = StringUtils.getBytesUtf8(input);
Base64 decoder = new Base64(true);
return decoder.decode(originalValue);
}
I have never had problems with the trailing "=", and I am using Bouncycastle as well.
If you're encoding bytes (at fixed bit length), then the padding is redundant. This is the case for most people.
Base64 consumes 6 bits at a time and produces a byte of 8 bits that only uses six bits worth of combinations.
If your string is 1 byte (8 bits), you'll have an output of 12 bits as the smallest multiple of 6 that 8 will fit into, with 4 bits extra. If your string is 2 bytes, you have to output 18 bits, with two bits extra. For multiples of six against multiple of 8 you can have a remainder of either 0, 2 or 4 bits.
The padding says to ignore those extra four (==) or two (=) bits. It is there to tell the decoder about those leftover bits.
The padding isn't really needed when you're encoding bytes. A base64 encoder can simply ignore left over bits that total less than 8 bits. In this case, you're best off removing it.
The padding might be of some use for streaming and arbitrary length bit sequences as long as they're a multiple of two. It might also be used for cases where people want to only send the last 4 bits when more bits are remaining if the remaining bits are all zero. Some people might want to use it to detect incomplete sequences though it's hardly reliable for that. I've never seen this optimisation in practice. People rarely have these situations, most people use base64 for discrete byte sequences.
If you see answers suggesting to leave it on, that's not a good encouragement if you're simply encoding bytes, it's enabling a feature for a set of circumstances you don't have. The only reason to have it on in that case might be to add tolerance to decoders that don't work without the padding. If you control both ends, that's a non-concern.
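The bit arithmetic above can be checked against Go's encoders, which report the encoded length for a given input size. A minimal sketch; lengths is a made-up helper, while EncodedLen is part of encoding/base64:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// lengths (hypothetical helper) returns the base64 length for n input
// bytes, first with "=" padding and then without it.
func lengths(n int) (padded, unpadded int) {
	return base64.StdEncoding.EncodedLen(n), base64.RawStdEncoding.EncodedLen(n)
}

func main() {
	for _, n := range []int{1, 2, 3, 16} {
		p, u := lengths(n)
		// u*6 - n*8 is the number of leftover zero bits in the last symbol.
		fmt.Printf("%2d bytes: %2d padded, %2d unpadded, %d leftover bits\n",
			n, p, u, u*6-n*8)
	}
}
```

For 1, 2 and 3 input bytes this gives 4, 2 and 0 leftover bits, matching the remainders described above, and 16 bytes come out as 24 characters padded or 22 unpadded.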
If you're using PHP, the following snippet will restore the stripped string to its original format with proper padding:
<?php
$str = 'base64 encoded string with equal signs stripped';
$str = str_pad($str, strlen($str) + (4 - ((strlen($str) % 4) ?: 4)), '=');
echo $str, "\n";
Using Python you can remove base64 padding and add it back like this:
from math import ceil
stripped = original.rstrip('=')
original = stripped.ljust(ceil(len(stripped) / 4) * 4, '=')
Yes, there are valid use cases where padding is omitted from a Base 64 encoding.
The JSON Web Signature (JWS) standard (RFC 7515) requires Base 64 encoded data to omit
padding. It expects:
Base64 encoding [...] with all trailing '='
characters omitted (as permitted by Section 3.2) and without the
inclusion of any line breaks, whitespace, or other additional
characters. Note that the base64url encoding of the empty octet
sequence is the empty string. (See Appendix C for notes on
implementing base64url encoding without padding.)
The same applies to the JSON Web Token (JWT) standard (RFC 7519).
In addition, Julius Musseau's answer has indicated that Apache's Base 64 decoder doesn't require padding to be present in Base 64 encoded data.
I do something like this with java8+
private static String getBase64StringWithoutPadding(String data) {
if(data == null) {
return "";
}
Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
return encoder.encodeToString(data.getBytes());
}
This method gets an encoder which leaves out padding.
As mentioned in other answers already padding can be added after calculations if you need to decode it back.
On Android you may run into trouble if you want to use the android.util.Base64 class, since it cannot be used in plain unit tests, only in integration tests, which run in the Android environment.
On the other hand, if you use java.util.Base64, the compiler warns you that your SDK level may be too low (below 26) to use it.
So I suggest that Android developers use
implementation "commons-codec:commons-codec:1.13"
Encoding object
fun encodeObjectToBase64(objectToEncode: Any): String{
val objectJson = Gson().toJson(objectToEncode).toString()
return encodeStringToBase64(objectJson.toByteArray(Charsets.UTF_8))
}
fun encodeStringToBase64(byteArray: ByteArray): String{
return Base64.encodeBase64URLSafeString(byteArray).toString() // encode with no padding
}
Decoding to Object
fun <T> decodeBase64Object(encodedMessage: String, encodeToClass: Class<T>): T{
val decodedBytes = Base64.decodeBase64(encodedMessage)
val messageString = String(decodedBytes, StandardCharsets.UTF_8)
return Gson().fromJson(messageString, encodeToClass)
}
Of course, you may omit the Gson parsing and pass your String, converted to a ByteArray, straight into the method.
