Doing base64 decoding on a string in Go

I have a particular string that I need to run base64 decode on in Go. This string looks something like this:
qU4aaakFmjaaaaI5aaa\/EN\/aaa\/SaaaJaaa6aa+nGnk=
Please note this is not the exact same string but it does have the same shape and number of characters, padding characters and it has those \/ things on the same positions in the string.
Let's call it key.
In PHP if I run
base64_decode($key);
the decode operation is successful.
If in Python I run
base64.b64decode(key)
the decode operation is once more successful. Problem is, I can't do base64 decoding on this thing in Go.
dcd, err := base64.StdEncoding.DecodeString("qU4aaakFmjaaaaI5aaa\\/EN\\/aaa\\/SaaaJaaa6aa+nGnk=")
if err != nil {
log.Fatal(err)
}
return dcd
This will return the error
illegal base64 data at input byte 19
In the Go version, I have to escape those backslashes. It seems that the error appears at byte 19. Bearing in mind that this string that I am using as an example has the same length as the string that is actually causing the problem I would believe that the error happens right at the byte with the \ character. What can I do about this?

The alphabet of standard Base64 does not contain the backslash, so the input qU4aaakFmjaaaaI5aaa\/EN\/aaa\/SaaaJaaa6aa+nGnk= is not a valid Base64-encoded string.
The forward slash is a valid character in Base64, just not the backslash. It's possible that \/ is an escape sequence designating a single slash. If so, replace the \/ sequences with a single / and you're good to go.
For example:
s := `qU4aaakFmjaaaaI5aaa\/EN\/aaa\/SaaaJaaa6aa+nGnk=`
s = strings.ReplaceAll(s, `\/`, `/`)
dcd, err := base64.StdEncoding.DecodeString(s)
if err != nil {
log.Fatal(err)
}
fmt.Println(string(dcd))
Which outputs (try it on the Go Playground):
�Ni��6�i�9i����i��i��i��i��y
If \/ is not a special sequence and you want to discard all invalid characters from the input, this is how it could be done:
var valid = map[rune]bool{}
func init() {
for _, r := range "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" {
valid[r] = true
}
}
func clean(s string) string {
return strings.Map(func(r rune) rune {
if valid[r] {
return r
}
return -1
}, s)
}
func main() {
s := `qU4aaakFmjaaaaI5aaa\/EN\/aaa\/SaaaJaaa6aa+nGnk=`
s = clean(s)
dcd, err := base64.StdEncoding.DecodeString(s)
if err != nil {
log.Fatal(err)
}
fmt.Println(string(dcd))
}
Output is the same. Try this one on the Go Playground.

Related

How to correctly process a string with escapes in Go?

I am creating a program, which is processing and calculating sizes of open-source repositories and libraries, and saving the data to database for further analysis.
I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1
Then I parse that into a format /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 which is a valid path in my filesystem, where I've downloaded that particular Go Library.
After that, I am passing that path to the gocloc -program (https://github.com/hhatto/gocloc)
And parse the result.
But the issue is, when I am saving that string /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 into a variable, Go actually adds another escape to the string I am saving so it's actually /home/username/dev/glass/tmp/pkg/mod/github.com/\\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1 in memory. (fmt.Println - for example removes that)
Problem is, when I am passing that string as an argument to os/exec, which runs gocloc and that path string, it runs command with two escapes - and that's not a valid path.
Is there any way to work around this? One idea is to just create a shell script for what I want to do.
This is the function that parses github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 to the format github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1. After that is saved into a variable, the variable has one more escape than it should have.
func parseUrlToVendorDownloadFormat(input string) string {
// Split the input string on the first space character
parts := strings.SplitN(input, " ", 2)
if len(parts) != 2 {
return ""
}
// Split the package name on the '/' character
packageNameParts := strings.Split(parts[0], "/")
// Add the '\!' prefix and lowercase each part of the package name
for i, part := range packageNameParts {
if hasUppercase(part) {
packageNameParts[i] = "\\!" + strings.ToLower(part)
}
}
// Join the modified package name parts with '/' characters
packageName := strings.Join(packageNameParts, "/")
return strings.ReplaceAll(packageName+"#"+parts[1], `\\!`, `\!`)
}
Afterwards, the string is parsed to the format /home/username/dev/glass/tmp/pkg/mod/github.com/\!azure/go-ansiterm#v0.0.0-20210617225240-d185dfc1b5a1
and passed to this function:
// Alternative goCloc - command.
func linesOfCode(dir string) (int, error) {
// Run the `gocloc` command in the specified directory and get the output
cmd := exec.Command("gocloc", dir)
output, err := cmd.Output()
if err != nil {
return 0, err
}
lines, err := parseTotalLines(string(output))
if err != nil {
return 0, err
}
return lines, nil
}
Which uses this parse function:
// Parse from the GoCloc response.
func parseTotalLines(input string) (int, error) {
// Split the input string into lines
lines := strings.Split(input, "\n")
// Find the line containing the "TOTAL" row
var totalLine string
for _, line := range lines {
if strings.Contains(line, "TOTAL") {
totalLine = line
break
}
}
// If the "TOTAL" line was not found, return an error
if totalLine == "" {
return 0, fmt.Errorf("could not find TOTAL line in input")
}
// Split the "TOTAL" line into fields
fields := strings.Fields(totalLine)
// If the "TOTAL" line doesn't have enough fields, return an error
if len(fields) < 4 {
return 0, fmt.Errorf("invalid TOTAL line: not enough fields")
}
// Get the fourth field (the code column)
codeStr := fields[3]
// Remove any commas from the code column
codeStr = strings.Replace(codeStr, ",", "", -1)
// Parse the code column as an integer
code, err := strconv.Atoi(codeStr)
if err != nil {
return 0, err
}
return code, nil
}
What I've tried:
Use gocloc as a library, didn't get it to work.
Use single quotes instead of escapes, didn't get it to work, but I think there might be something.
One way to get around this, might be to create separate shell script and pass the dir to that as an argument, and get rid of the escapes there, I don't know ...
If you want to see all the source code: https://github.com/haapjari/glass. More specifically, it's the file https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/plugin.go with the function enrichWithLibraryData(), and the utils functions, which are here: https://github.com/haapjari/glass/blob/main/pkg/plugins/goplg/utils.go (the examples above)
Any ideas? How to proceed? Thanks in advance!
I have an input string: github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Parsed to a format: github.com/\!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Your parser seems to have an error. I would expect Azure to become !azure, without a backslash:
github.com/!azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1.
Go Modules Reference
To avoid ambiguity when serving from case-insensitive file systems, the $module and $version elements are case-encoded by replacing every uppercase letter with an exclamation mark followed by the corresponding lower-case letter. This allows modules example.com/M and example.com/m to both be stored on disk, since the former is encoded as example.com/!m.
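The case-encoding rule quoted above can be sketched with the standard library alone. The helper name escapeModulePath is hypothetical, and the sketch handles only ASCII uppercase letters without validating the path:

```go
package main

import (
	"fmt"
	"strings"
)

// escapeModulePath (hypothetical helper) applies the case-encoding from the
// Go Modules Reference: every ASCII uppercase letter becomes '!' followed by
// its lowercase form. No backslash is involved at any point.
func escapeModulePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeModulePath("github.com/Azure/go-ansiterm")) // github.com/!azure/go-ansiterm
	fmt.Println(escapeModulePath("example.com/M"))                // example.com/!m
}
```

Because exec.Command passes each argument to the program directly, without going through a shell, the ! in the resulting path needs no escaping. For real use, the golang.org/x/mod/module package offers EscapePath, which also validates the path.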

How to match by regexp 3 and 4 bytes UTF-8

I just want to find a 3-byte character in Go using regexp.
But it panics with
regexp: Compile(\x{E29AA4}): error parsing regexp: invalid escape sequence: \x{E29AA4
func get_words_from(text string) []string {
words := regexp.MustCompile(`\x{E29AA4}`)
return words.FindAllString(text, -1)
}
func main() {
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(get_words_from(text))
}
You can try on playground
Decode the UTF-8 byte sequence E2 9A A4 with e.g. utf8.DecodeRune() and use the resulting rune in the regexp:
func get_words_from(text string) []string {
r, _ := utf8.DecodeRune([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(string(r))
return words.FindAllString(text, -1)
}
You may also simply convert the byte slice to string (which interprets it as UTF-8 encoded bytes):
func get_words_from2(text string) []string {
s := string([]byte{0xE2, 0x9A, 0xA4})
words := regexp.MustCompile(s)
return words.FindAllString(text, -1)
}
Or use the equivalent unicode code point (which is 0x26a4) in the regexp string:
func get_words_from3(text string) []string {
words := regexp.MustCompile("\u26a4")
return words.FindAllString(text, -1)
}
Note that "\u26a4" is an interpreted string literal and will be unescaped by the Go compiler (not the regexp package).
All examples return (try the examples on the Go Playground):
[⚤ ⚤]
To filter out all runes that have 3 or more bytes in UTF-8, you may use a for range and utf8.RuneLen():
text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
fmt.Println(text)
var out []rune
for _, r := range text {
if utf8.RuneLen(r) < 3 {
out = append(out, r)
}
}
fmt.Println(string(out))
This outputs (try it on the Go Playground):
One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./ авt𒀅hr𓀋ee!
One,ВАПОЛтлдоtwo ыаплды ыапю.ы./ авthree!
Or use strings.Map(), where you return -1 for such runes, which then will be left out in the result:
out := strings.Map(func(r rune) rune {
if utf8.RuneLen(r) < 3 {
return r
}
return -1
}, text)
fmt.Println(string(out))
This outputs the same. Try this one on the Go Playground.
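I also found that the character ⚤ can be matched with "\xE2\x9A\xA4" written in an interpreted string literal (where the Go compiler turns the escapes into the UTF-8 bytes) instead of the invalid \x{E29AA4}. In the regexp syntax itself, \x{...} takes a Unicode code point rather than UTF-8 bytes, so \x{26A4} works while \x{E29AA4} is rejected because it exceeds the maximum code point 0x10FFFF.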
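Since \x{...} in Go's regexp syntax names a code point, matching by UTF-8 length can also be expressed as code point ranges: U+0800 through U+FFFF encode to 3 bytes and U+10000 through U+10FFFF encode to 4 bytes. A minimal sketch, with variable names chosen for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// U+0800..U+FFFF: code points that UTF-8 encodes in 3 bytes.
	threeByte = regexp.MustCompile(`[\x{800}-\x{FFFF}]`)
	// U+10000..U+10FFFF: code points that UTF-8 encodes in 4 bytes.
	fourByte = regexp.MustCompile(`[\x{10000}-\x{10FFFF}]`)
)

func main() {
	text := "One,ВАПОЛтлдо⚤two ыаплд⚤ы ыапю.ы./\tавt𒀅hr𓀋ee!"
	fmt.Println(threeByte.FindAllString(text, -1))
	fmt.Println(fourByte.FindAllString(text, -1))
}
```

The Cyrillic letters in the sample text are 2-byte runes, so neither pattern matches them.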

Go - Is it possible to convert a raw string literal to an interpreted string literal?

Is it possible to convert raw string literals to interpreted string literals in Go? (See language specification)
I have a raw string literal, but I want to print out to the console what one would get with an interpreted string literal—that is, text output formatted using escape sequences.
For example, printing this raw string literal gives
s := `\033[1mString in bold.\033[0m`
println(s) // \033[1mString in bold.\033[0m
but I want the same result one would get with
s := "\033[1mString in bold.\033[0m"
println(s) // String in bold. (In bold)
For context, I am trying to print the contents of a text file that is formatted with escape sequences using
f, _ := ioutil.ReadFile("file.txt")
println(string(f))
but the output is in the former way.
Use strconv.Unquote():
s := `\033[1mString in bold.\033[0m`
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
panic(err)
}
fmt.Println("normal", s2)
This will output:
normal String in bold.
Note that the string value passed to strconv.Unquote() must contain wrapping double quotes or backticks, and since the source s does not contain the wrapping quotes, I pre- and suffixed those like this:
`"` + s + `"`
See related questions:
How do I make raw unicode encoded content readable?
Golang convert integer to unicode character
How to transform Go string literal code to its value?
How to convert escape characters in HTML tags?
First, if you do not have a raw string, write the text to the file as it is, without changing sequences such as "\n", in case you want to get the same bytes back:
s := "\033[1mString in bold.\033[0m"
rune := string([]rune(s))
b := []byte(rune)
f, err := os.OpenFile("filename.txt", os.O_RDWR, 0644)
if err != nil {
return err
}
if _, err := f.Write(b); err != nil {
return err
}
With this approach the bytes don't change, so the SHA-256 checksum stays the same, and you get the data back unchanged after reading it from the file.
Second, for reading the data and printing:
bytes, err := os.ReadFile("filename.txt")
if err != nil {
return err
}
s = strconv.Quote(string(bytes))
fmt.Println(s)
and you will get "\x1b[1mString in bold.\x1b[0m"

Converting unicode to "java

I have a problem with character conversion. It all starts with this string: U+1F618. According to fileformat.info, this string is now (almost) in the HTML Entity (hex) notation.
But I need this character to be converted into a C/C++/Java source code-notation. I really don't know if this is the official name for the notation, but I assume this site to be correct :).
So basically my question is, instead of outputting to the real emoji, how can I get the value \uD83D\uDE18?
package main
import (
"fmt"
"html"
"strconv"
"strings"
)
func main() {
original := "\\U0001f618"
// Hex String
h := strings.ReplaceAll(original, "\\U", "0x")
// Hex to Int
i, _ := strconv.ParseInt(h, 0, 64)
// Unescape the string (HTML Entity -> String).
str := html.UnescapeString(string(i))
// Display the emoji.
fmt.Println(str)
// but I want something like this: \uD83D\uDE18
}
If you have the input as a string, e.g.
s := "\\U0001f618"
You may use strconv.Unquote() to unquote it. Be sure the string you pass to it is quoted (it must be wrapped with backticks or double quotes):
s2, err := strconv.Unquote(`"` + s + `"`)
fmt.Println(s2, err)
This will give you an s2 string that contains your emoji:
😘 <nil>
Java's string model is a char[] which contains UTF-16 code units. Go's in-memory representation of a string is the UTF-8 encoded byte sequence.
To convert a Go string to UTF-16, you may use the unicode/utf16 package of the standard lib. For example utf16.Encode() encodes a series of runes (unicode codepoints) to UTF-16. You get a series of runes from a Go string with a simple type conversion: []rune("some string").
u16 := utf16.Encode([]rune(s2))
fmt.Printf("%X\n", u16)
The above prints the UTF-16 code units in hexadecimal format:
[D83D DE18]
To get the format you want, use this loop:
buf := &strings.Builder{}
for _, v := range u16 {
fmt.Fprintf(buf, "\\u%04X", v) // zero-pad so every escape has exactly four hex digits
}
fmt.Println(buf.String())
Which outputs:
\uD83D\uDE18
Try the examples on the Go Playground.
You can capture this series of conversions in a function:
func convert(s string) (string, error) {
s2, err := strconv.Unquote(`"` + s + `"`)
if err != nil {
return "", err
}
buf := &strings.Builder{}
for _, v := range utf16.Encode([]rune(s2)) {
fmt.Fprintf(buf, "\\u%04X", v) // zero-pad so every escape has exactly four hex digits
}
return buf.String(), nil
}
Using it:
fmt.Println(convert("\\U0001f618"))
Which outputs (try it on the Go Playground):
\uD83D\uDE18 <nil>
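A variation on the same idea, as a sketch only: if the input is already a decoded Go string and plain ASCII should stay readable, only the other code units need to become \uXXXX escapes. The helper name javaEscape is hypothetical, and it does not escape backslashes or quotes, which a real Java string literal would also require:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// javaEscape (hypothetical helper) keeps printable ASCII as-is and writes
// every other UTF-16 code unit as \uXXXX, zero-padded to four hex digits.
// It does not escape backslashes or quotes.
func javaEscape(s string) string {
	var b strings.Builder
	for _, u := range utf16.Encode([]rune(s)) {
		if u >= 0x20 && u < 0x7F {
			b.WriteByte(byte(u))
			continue
		}
		fmt.Fprintf(&b, "\\u%04X", u)
	}
	return b.String()
}

func main() {
	fmt.Println(javaEscape("\U0001F618")) // \uD83D\uDE18
	fmt.Println(javaEscape("caf\u00e9"))  // caf\u00E9
}
```

The second call shows why the zero padding matters: U+00E9 has only two significant hex digits, but a Java \u escape always needs four.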

Decode base64 with white space

I have a base64 encoded string I'm trying to decode with Go. The string contains white spaces which should be ignored.
A sample code I'm trying:
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
out, err := base64.URLEncoding.DecodeString(s)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(out))
This code returns:
illegal base64 data at input byte 93
After changing the string padding, and using StdEncoding instead of URLEncoding:
s = strings.Replace(s, "%3D", "=", -1)
out, err := base64.StdEncoding.DecodeString(s)
if err != nil {
fmt.Println(err)
return
}
fmt.Println(string(out))
The output will be:
{"threeDSServerTransID":"13fe71d4-d10d-4220-a216-2006d1dd4cb8","acsTrc���������������������������������������������������������������������������nsID":"d7c45f99-9478-44a6-b1f2-100000003366","messageType":"CReq","messageVersion":"2.1.0","challengeWindowSize":"02"}
How can I decode the string correctly?
What you have is most likely "cut off" from a URL, and it is in URL-encoded form. So to get a Base64 string, you have to first decode it, you may use url.PathUnescape() for this.
Once you have the unescaped string, you may decode it using the base64.StdEncoding encoder. Note that just because it is / was part of a URL, that doesn't make it a base64 string that used the alphabet of the URL-safe version of Base64.
Also the + signs in the middle of it are really just "junk". They shouldn't be there in the first place, so double-check how you get your input, but now that they are there, you have to remove them. For that, you may use strings.Replace().
Final code to decode your invalid input:
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
s = strings.Replace(s, "+", "", -1)
var err error
if s, err = url.PathUnescape(s); err != nil {
panic(err)
}
out, err := base64.StdEncoding.DecodeString(s)
if err != nil {
panic(err)
}
fmt.Println(string(out))
Complete output (try it on the Go Playground):
{"threeDSServerTransID":"13fe71d4-d10d-4220-a216-2006d1dd4cb8",
"acsTransID":"d7c45f99-9478-44a6-b1f2-100000003366","messageType":"CReq",
"messageVersion":"2.1.0","challengeWindowSize":"02"}
Note that the + sign is a valid symbol in the alphabet of the standard Base64, and you can even decode the Base64 without removing the + symbols, but then you get junk data remaining in the JSON keys in the result.
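If the input contained real white space rather than + signs (as the question title suggests, though the sample string does not show it), note that Go's base64 decoder skips only \r and \n, not spaces or tabs. A minimal sketch that drops all white space before decoding, with the hypothetical helper name decodeIgnoringSpace and a made-up sample input:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
	"unicode"
)

// decodeIgnoringSpace (hypothetical helper) removes all Unicode white space
// from s and then decodes the rest as standard Base64.
func decodeIgnoringSpace(s string) ([]byte, error) {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // a negative return value drops the rune
		}
		return r
	}, s)
	return base64.StdEncoding.DecodeString(cleaned)
}

func main() {
	// Made-up sample: "hello world" with a space, a tab and a space inserted.
	out, err := decodeIgnoringSpace("aGVs bG8g\td29y bGQ=")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // hello world
}
```

Unlike removing every + sign, this cannot discard characters that belong to the Base64 alphabet.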
The input string has three problems:
First, the + signs in the middle of it.
Second, there is a URL-encoded character at the end (%3D, which is an encoded =).
Third, the string appears not to be valid Base64.
To remove the plus signs in the middle, find the indices where the run starts and ends, and build a new string without it.
To remove the URL-encoded part at the end, terminate the string earlier (at index 249 of the fixed string).
There is a further problem with the string at index 148 of the fixed string, which I would guess is due to bad data.
But the code fragment below shows how to overcome the first two things:
package main
import (
"fmt"
"encoding/base64"
"strings"
)
func main() {
s := "eyJ0aHJlZURTU2VydmVyVHJhbnNJRCI6IjEzZmU3MWQ0LWQxMGQtNDIyMC1hMjE2LTIwMDZkMWRkNGNiOCIsImFjc1RyY++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++W5zSUQiOiJkN2M0NWY5OS05NDc4LTQ0YTYtYjFmMi0xMDAwMDAwMDMzNjYiLCJtZXNzYWdlVHlwZSI6IkNSZXEiLCJtZXNzYWdlVmVyc2lvbiI6IjIuMS4wIiwiY2hhbGxlbmdlV2luZG93U2l6ZSI6IjAyIn0%3D"
a := strings.Index(s, "+")
b := strings.LastIndex(s, "+") + 1
fixed := s[0:a] + s[b:249]
out, err := base64.StdEncoding.DecodeString(fixed)
if err != nil {
fmt.Println(err)
fmt.Println(fixed)
}
fmt.Println(a,b)
fmt.Println(string(out))
}
