Here is my code, and I don't understand why the decode function doesn't work.
A little insight would be great, please.
func EncodeB64(message string) (retour string) {
	base64Text := make([]byte, base64.StdEncoding.EncodedLen(len(message)))
	base64.StdEncoding.Encode(base64Text, []byte(message))
	return string(base64Text)
}

func DecodeB64(message string) (retour string) {
	base64Text := make([]byte, base64.StdEncoding.DecodedLen(len(message)))
	base64.StdEncoding.Decode(base64Text, []byte(message))
	fmt.Printf("base64: %s\n", base64Text)
	return string(base64Text)
}
It gives me:
[Decode error - output not utf-8][Decode error - output not utf-8]
The len prefix is superficial and causes the invalid utf-8 error:
package main

import (
	"encoding/base64"
	"fmt"
	"log"
)

func main() {
	str := base64.StdEncoding.EncodeToString([]byte("Hello, playground"))
	fmt.Println(str)
	data, err := base64.StdEncoding.DecodeString(str)
	if err != nil {
		log.Fatal("error:", err)
	}
	fmt.Printf("%q\n", data)
}
Output
SGVsbG8sIHBsYXlncm91bmQ=
"Hello, playground"
EDIT: I read too fast, the len was not used as a prefix. dystroy got it right.
DecodedLen returns the maximal length.
This length is useful for sizing your buffer but part of the buffer won't be written and thus won't be valid UTF-8.
You have to use only the real written length returned by the Decode function.
l, _ := base64.StdEncoding.Decode(base64Text, []byte(message))
log.Printf("base64: %s\n", base64Text[:l])
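Applied to the original DecodeB64, a minimal sketch of the fix (I've also surfaced the error that the original discarded):
func DecodeB64(message string) (string, error) {
	base64Text := make([]byte, base64.StdEncoding.DecodedLen(len(message)))
	n, err := base64.StdEncoding.Decode(base64Text, []byte(message))
	if err != nil {
		return "", err
	}
	return string(base64Text[:n]), nil // keep only the n bytes actually written
}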
To sum up the other two posts, here are two simple functions to encode/decode Base64 strings with Go:
// Don't forget to import "encoding/base64"!
func base64Encode(str string) string {
	return base64.StdEncoding.EncodeToString([]byte(str))
}

func base64Decode(str string) (string, error) {
	data, err := base64.StdEncoding.DecodeString(str)
	if err != nil {
		return "", err
	}
	return string(data), nil
}
Try it!
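A quick round trip with these helpers (hypothetical usage, assuming fmt and log are imported):
encoded := base64Encode("Hello, playground")
decoded, err := base64Decode(encoded)
if err != nil {
	log.Fatal(err) // only happens on malformed input
}
fmt.Println(encoded, "->", decoded)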
@Denys Séguret's answer is almost 100% correct. As an improvement, to avoid wasting memory on unused space in base64Text, you should use base64.DecodedLen. Take a look at how base64.DecodeString uses it.
It should look like this:
func main() {
	message := base64.StdEncoding.EncodeToString([]byte("Hello, playground"))
	base64Text := make([]byte, base64.StdEncoding.DecodedLen(len(message)))
	n, _ := base64.StdEncoding.Decode(base64Text, []byte(message))
	fmt.Println("base64Text:", string(base64Text[:n]))
}
Try it here.
More or less like above, but using []byte and as part of a bigger struct:
func (s secure) encodePayload(body []byte) string {
	// Base64 encode
	return base64.StdEncoding.EncodeToString(body)
}

func (s secure) decodePayload(body []byte) ([]byte, error) {
	// Base64 decode
	b64 := make([]byte, base64.StdEncoding.DecodedLen(len(body)))
	n, err := base64.StdEncoding.Decode(b64, body)
	if err != nil {
		return nil, err
	}
	return b64[:n], nil
}
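A hypothetical usage sketch, assuming a minimal secure type (the two methods above don't touch any fields):
package main

import (
	"encoding/base64"
	"fmt"
	"log"
)

type secure struct{}

// encodePayload and decodePayload as defined above...

func main() {
	s := secure{}
	enc := s.encodePayload([]byte("Hello, playground"))
	dec, err := s.decodePayload([]byte(enc))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(enc, string(dec))
}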
I have a JSON file as follows.
secret.json:
{
  "secret": "strongPassword"
}
I want to print out an encrypted value of the key "secret".
So far I've tried the following.
package main

import (
	"encoding/json"
	"fmt"
	"io/ioutil"

	"go.mozilla.org/sops"
)

type secretValue struct {
	Value string `json:"secret"`
}

func main() {
	file, _ := ioutil.ReadFile("secret.json")
	getSecretValue := secretValue{}
	_ = json.Unmarshal([]byte(file), &getSecretValue)
	encryptedValue, err := sops.Tree.Encrypt([]byte(getSecretValue.Value), file)
	if err != nil {
		panic(err)
	}
	fmt.Println(encryptedValue)
}
As you might have guessed, I'm pretty new to Go and the code above doesn't work.
How can I improve the code to print out the encrypted value?
Please note that I'm writing code like this only to see how SOPS works using Go. I wouldn't print out a secret value like this in production.
Edit:
I think the problem is the arguments for the Encrypt function. According to the documentation, it should take []byte key and Cipher arguments, but I don't know whether I'm setting the []byte key correctly, or where that Cipher comes from. Is it from the crypto/cipher package?
Edit 2:
Thank you @HolaYang for the great answer.
I tried to make your answer work with the external JSON file as follows, but it gave me an error message saying: cannot use fileContent (type secretValue) as type []byte in argument to (&"go.mozilla.org/sops/stores/json".Store literal).LoadPlainFile.
package main

import (
	hey "encoding/json"
	"fmt"
	"io/ioutil"

	"go.mozilla.org/sops"
	"go.mozilla.org/sops/aes"
	"go.mozilla.org/sops/stores/json"
)

type secretValue struct {
	Value string `json:"secret"`
}

func main() {
	// fileContent := []byte(`{
	//     "secret": "strongPassword"
	// }`)
	file, _ := ioutil.ReadFile("secret.json")
	fileContent := secretValue{}
	//_ = json.Unmarshal([]byte(file), &fileContent)
	_ = hey.Unmarshal([]byte(file), &fileContent)
	encryptKey := []byte("0123456789012345") // length 16
	branches, _ := (&json.Store{}).LoadPlainFile(fileContent) // <- the compile error: fileContent is a secretValue, not []byte
	tree := sops.Tree{Branches: branches}
	r, err := tree.Encrypt(encryptKey, aes.NewCipher())
	if err != nil {
		panic(err)
	}
	fmt.Println(r)
}
Let's look at the function declaration of sops.Tree.Encrypt (there's a typo in your code: Encrypt is a method on a sops.Tree instance, not a package-level function).
Based on that, we should take these steps:
1. Construct a sops.Tree instance from the JSON file.
2. Use a concrete Cipher for the encryption.
Try it this way.
The demo below uses AES as the Cipher. Note that sops can only encrypt the whole tree through this interface.
package main

import (
	"fmt"
	"io/ioutil"

	"go.mozilla.org/sops"
	"go.mozilla.org/sops/aes"
	"go.mozilla.org/sops/stores/json"
)

func main() {
	/*
		fileContent := []byte(`{
			"secret": "strongPassword"
		}`)
	*/
	fileContent, _ := ioutil.ReadFile("xxx.json")
	encryptKey := []byte("0123456789012345") // AES key, length 16
	branches, _ := (&json.Store{}).LoadPlainFile(fileContent)
	tree := sops.Tree{Branches: branches}
	r, err := tree.Encrypt(encryptKey, aes.NewCipher())
	if err != nil {
		panic(err)
	}
	fmt.Println(r)
}
I'm new to Go and have been using Split to my advantage. Recently I came across a problem: I wanted to split something and keep the splitting character in the second slice, rather than removing it, or leaving it in the first slice as SplitAfter does.
For example, the following code:
strings.Split("email#email.com", "#")
returned: ["email", "email.com"]
strings.SplitAfter("email#email.com", "#")
returned: ["email#", "email.com"]
What's the best way to get ["email", "#email.com"]?
Use strings.Index to find the # and slice to get the two parts:
var part1, part2 string
if i := strings.Index(s, "#"); i >= 0 {
	part1, part2 = s[:i], s[i:]
} else {
	// handle the case with no #
}
Run it on the playground.
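For completeness, here is a runnable version of the same approach (the sample address and the no-# fallback are my assumptions):
package main

import (
	"fmt"
	"strings"
)

func main() {
	s := "email#email.com"
	var part1, part2 string
	if i := strings.Index(s, "#"); i >= 0 {
		part1, part2 = s[:i], s[i:]
	} else {
		part1 = s // no separator: keep everything in part1
	}
	fmt.Println(part1, part2) // email #email.com
}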
Could this work for you?
s := strings.Split("email#email.com", "#")
address, domain := s[0], "#"+s[1]
fmt.Println(address, domain)
// email #email.com
Then combining them and creating a string:
var buffer bytes.Buffer
buffer.WriteString(address)
buffer.WriteString(domain)
result := buffer.String()
fmt.Println(result)
// email#email.com
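For just two pieces, plain concatenation (result := address + domain) is simpler; bytes.Buffer mainly pays off when assembling many fragments in a loop.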
You can use bufio.Scanner:
package main

import (
	"bufio"
	"strings"
)

// email is a bufio.SplitFunc: it emits the text before the first '#'
// as one token, and everything from the '#' onward as the next.
func email(data []byte, atEOF bool) (int, []byte, error) {
	for i, b := range data {
		if b == '#' {
			if i > 0 {
				return i, data[:i], nil // the part before the '#'
			}
			return len(data), data, nil // the '#' and everything after it
		}
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil // no '#' at all: emit the rest
	}
	return 0, nil, nil // request more data
}

func main() {
	s := bufio.NewScanner(strings.NewReader("email#email.com"))
	s.Split(email)
	for s.Scan() {
		println(s.Text())
	}
}
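With the input above, this prints email and then #email.com, each on its own line.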
https://golang.org/pkg/bufio#Scanner.Split
Is there any way to range over a string in Go templates (that is, from code in the template itself, not from native Go)? It doesn't seem to be supported directly ("The value of the pipeline must be an array, slice, map, or channel."), but is there some hack, like splitting the string into an array of single-character strings or something?
Note that I am unable to edit any Go source: I'm working with a compiled binary here. I need to make this happen from the template code alone.
You can use a FuncMap to split the string into characters.
package main

import (
	"log"
	"os"
	"text/template"
)

func main() {
	tmpl, err := template.New("foo").Funcs(template.FuncMap{
		"to_runes": func(s string) []string {
			r := []string{}
			for _, c := range []rune(s) {
				r = append(r, string(c))
			}
			return r
		},
	}).Parse(`{{range . | to_runes }}[{{.}}]{{end}}`)
	if err != nil {
		log.Fatal(err)
	}
	err = tmpl.Execute(os.Stdout, "hello world")
	if err != nil {
		log.Fatal(err)
	}
}
The output should be:
[h][e][l][l][o][ ][w][o][r][l][d]
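Because to_runes converts the string through []rune, multi-byte UTF-8 characters are kept whole rather than being split into individual bytes.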
I am very, very careful about memory, as I have to write programs that need to cope with massive datasets.
Currently my application quickly reaches 32GB of memory, starts swapping, and then gets killed by the system.
I do not understand how this can be, since all variables are collectable (created in functions and quickly released) except Tokens and TokensCount in the Trainer struct. TokensCount is just a uint. Tokens is a 1,000,000-row slice of [5]uint32 and string, so that means 20 bytes + string, which we could call a maximum of 50 bytes per record. 50 * 1,000,000 = 50MB of memory required. So this script should not use much more than 50MB + overhead + temporary collectable variables in the functions (maybe another 50MB max). The maximum potential size of Tokens is 5,000,000, as this is the size of dictionary, but even then it would be only 250MB of memory. dictionary is a map and apparently uses around 600MB of memory, as that is how the app starts, but this is not an issue because dictionary is only loaded once and never written to again.
Instead it uses 32GB of memory then dies. By the speed that it does this I expect it would happily get to 1TB of memory if it could. The memory appears to increase in a linear fashion with the size of the files being loaded, meaning that it appears to never clear any memory at all. Everything that enters the app is allocated more memory and memory is never freed.
I tried implementing runtime.GC() in case the garbage collection wasn't running often enough, but this made no difference.
Since the memory usage increases linearly, this would imply that there is a memory leak in getTokens() or LoadZip(). I don't know how this could be, since they are both functions that do one task and then return. It could also be that the tokens variable in Start() is the cause of the leak. Basically, it looks like every file that is loaded and parsed is never released from memory, as that is the only way the memory could fill up linearly and keep rising to 32GB++.
Absolute nightmare! What's wrong with Go? Any way to fix this?
package main

import (
	"bytes"
	"code.google.com/p/go.text/transform"
	"code.google.com/p/go.text/unicode/norm"
	"compress/zlib"
	"fmt"
	"github.com/AlasdairF/BinSearch"
	"io/ioutil"
	"os"
	"regexp"
	"runtime"
	"strings"
	"unicode"
	"unicode/utf8"
)

type TokensStruct struct {
	binsearch.Key_string
	Value [][5]uint32
}

type Trainer struct {
	Tokens      TokensStruct
	TokensCount uint
}

func checkErr(err error) {
	if err == nil {
		return
	}
	fmt.Println(`Some Error:`, err)
	panic(err)
}

// Local helper function for normalization of UTF8 strings.
func isMn(r rune) bool {
	return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}

// This map is used by removeAccentsBytesDashes to convert non-accented characters.
var transliterations = map[rune]string{'Æ': "E", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th", 'ß': "ss", 'æ': "e", 'ð': "d", 'ł': "l", 'ø': "oe", 'þ': "th", 'Œ': "OE", 'œ': "oe"}

// removeAccentsBytesDashes converts accented UTF8 characters into their non-accented equivalents, from a []byte.
func removeAccentsBytesDashes(b []byte) ([]byte, error) {
	mnBuf := make([]byte, len(b))
	t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
	n, _, err := t.Transform(mnBuf, b, true)
	if err != nil {
		return nil, err
	}
	mnBuf = mnBuf[:n]
	tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*2))
	for i, w := 0, 0; i < len(mnBuf); i += w {
		r, width := utf8.DecodeRune(mnBuf[i:])
		if r == '-' {
			tlBuf.WriteByte(' ')
		} else {
			if d, ok := transliterations[r]; ok {
				tlBuf.WriteString(d)
			} else {
				tlBuf.WriteRune(r)
			}
		}
		w = width
	}
	return tlBuf.Bytes(), nil
}

func LoadZip(filename string) ([]byte, error) {
	// Open file for reading
	fi, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer fi.Close()
	// Attach zlib reader
	fz, err := zlib.NewReader(fi)
	if err != nil {
		return nil, err
	}
	defer fz.Close()
	// Pull
	data, err := ioutil.ReadAll(fz)
	if err != nil {
		return nil, err
	}
	return norm.NFC.Bytes(data), nil // return normalized
}

func getTokens(pibn string) []string {
	var data []byte
	var err error
	data, err = LoadZip(`/storedir/` + pibn + `/text.zip`)
	checkErr(err)
	data, err = removeAccentsBytesDashes(data)
	checkErr(err)
	data = bytes.ToLower(data)
	data = reg2.ReplaceAll(data, []byte("$2")) // remove contractions
	data = reg.ReplaceAllLiteral(data, nil)
	tokens := strings.Fields(string(data))
	return tokens
}

func (t *Trainer) Start() {
	data, err := ioutil.ReadFile(`list.txt`)
	checkErr(err)
	pibns := bytes.Fields(data)
	for i, pibn := range pibns {
		tokens := getTokens(string(pibn))
		t.addTokens(tokens)
		if i%100 == 0 {
			runtime.GC() // I added this just to try to stop the memory craziness, but it makes no difference
		}
	}
}

func (t *Trainer) addTokens(tokens []string) {
	for _, tok := range tokens {
		if _, ok := dictionary[tok]; ok {
			if indx, ok2 := t.Tokens.Find(tok); ok2 {
				ar := t.Tokens.Value[indx]
				ar[0]++
				t.Tokens.Value[indx] = ar
				t.TokensCount++
			} else {
				t.Tokens.AddKeyAt(tok, indx)
				t.Tokens.Value = append(t.Tokens.Value, [5]uint32{0, 0, 0, 0, 0})
				copy(t.Tokens.Value[indx+1:], t.Tokens.Value[indx:])
				t.Tokens.Value[indx] = [5]uint32{1, 0, 0, 0, 0}
				t.TokensCount++
			}
		}
	}
	return
}

func LoadDictionary() {
	dictionary = make(map[string]bool)
	data, err := ioutil.ReadFile(`dictionary`)
	checkErr(err)
	words := bytes.Fields(data)
	for _, word := range words {
		strword := string(word)
		dictionary[strword] = false
	}
}

var reg = regexp.MustCompile(`[^a-z0-9\s]`)
var reg2 = regexp.MustCompile(`\b(c|l|all|dall|dell|nell|sull|coll|pell|gl|agl|dagl|degl|negl|sugl|un|m|t|s|v|d|qu|n|j)'([a-z])`) // contractions
var dictionary map[string]bool

func main() {
	trainer := new(Trainer)
	LoadDictionary()
	trainer.Start()
}
If you're tokenizing from a large string, make sure to avoid memory pinning. From the comments above, it sounds like the tokens are substrings of a large string.
You may need to add a little extra to your getTokens() function so that it guarantees the tokens aren't pinning memory.
func getTokens(...) {
	// near the end of the function, before returning tokens:
	for i, t := range tokens {
		tokens[i] = string([]byte(t)) // copy each token out of the large backing string
	}
}
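(On Go 1.18 or newer, strings.Clone(t) expresses the same copy more directly; the string([]byte(t)) conversion is the classic way to do it.)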
By the way, reading the whole file into memory using ioutil.ReadFile all at once looks dubious. Are you sure you can't use bufio.Scanner?
I'm looking at the code more closely... if you are truly concerned about memory, take advantage of io.Reader. You should try to avoid sucking in the content of a whole file at once. Use io.Reader and the transform "along the grain". The way you're using it now is against the grain of its intent. The whole point of the transform package you're using is to construct flexible Readers that can stream through data.
For example, here's a simplification of what you're doing:
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"unicode/utf8"

	"code.google.com/p/go.text/transform"
)

type AccentsTransformer map[rune]string

func (a AccentsTransformer) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	for nSrc < len(src) {
		// If we're at the edge, note this and return.
		if !atEOF && !utf8.FullRune(src[nSrc:]) {
			err = transform.ErrShortSrc
			return
		}
		r, width := utf8.DecodeRune(src[nSrc:])
		if r == utf8.RuneError && width == 1 {
			err = fmt.Errorf("Decoding error")
			return
		}
		if d, ok := a[r]; ok {
			if nDst+len(d) > len(dst) {
				err = transform.ErrShortDst
				return
			}
			copy(dst[nDst:], d)
			nSrc += width
			nDst += len(d)
			continue
		}
		if nDst+width > len(dst) {
			err = transform.ErrShortDst
			return
		}
		copy(dst[nDst:], src[nSrc:nSrc+width])
		nDst += width
		nSrc += width
	}
	return
}

func main() {
	transliterations := AccentsTransformer{'Æ': "E", 'Ø': "OE"}
	testString := "cØØl beÆns"
	b := transform.NewReader(bytes.NewBufferString(testString), transliterations)
	scanner := bufio.NewScanner(b)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		fmt.Println("token:", scanner.Text())
	}
}
It then becomes really easy to chain transformers together. So, for example, if we wanted to remove all hyphens from the input stream, it's just a matter of using transform.Chain appropriately:
func main() {
	transliterations := AccentsTransformer{'Æ': "E", 'Ø': "OE"}
	removeHyphens := transform.RemoveFunc(func(r rune) bool {
		return r == '-'
	})
	allTransforms := transform.Chain(transliterations, removeHyphens)
	testString := "cØØl beÆns - the next generation"
	b := transform.NewReader(bytes.NewBufferString(testString), allTransforms)
	scanner := bufio.NewScanner(b)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		fmt.Println("token:", scanner.Text())
	}
}
I have not exhaustively tested the code above, so please don't just copy-and-paste it without sufficient tests. :P I just cooked it up fast. But this kind of approach, avoiding whole-file reading, will scale better because it reads the file in chunks.
1. How large are "list.txt" and "dictionary"? If they are that large, no wonder the memory usage is so large.
pibns := bytes.Fields(data)
How much is len(pibns)?
2. Start the GC debug (run GODEBUG="gctrace=1" ./yourprogram) to see whether any GC is happening.
3. Do some profiling, like this:
// imports needed: fmt, log, os, runtime, runtime/pprof, time
func lookupMem() {
	timestamp := fmt.Sprint(time.Now().Unix())
	if f, err := os.Create("mem_prof" + timestamp); err != nil {
		log.Printf("record memory profile failed: %v", err)
	} else {
		runtime.GC()
		pprof.WriteHeapProfile(f)
		f.Close()
	}
	if f, err := os.Create("heap_prof" + "." + timestamp); err != nil {
		log.Printf("heap profile failed: %v", err)
	} else {
		p := pprof.Lookup("heap")
		p.WriteTo(f, 2)
		f.Close()
	}
}
func (t *Trainer) Start() {
	.......
	if i%1000 == 0 {
		// if len(pibns) is not very large, record some meminfo
		lookupMem()
	}
	.......
}
I have lots of small files, and I don't want to read them line by line.
Is there a function in Go that will read a whole file into a string variable?
Use ioutil.ReadFile:
func ReadFile(filename string) ([]byte, error)
ReadFile reads the file named by filename and returns the contents. A successful call
returns err == nil, not err == EOF. Because ReadFile reads the whole file, it does not treat
an EOF from Read as an error to be reported.
You will get a []byte instead of a string. It can be converted if really necessary:
s := string(buf)
Edit: the ioutil package is now deprecated: "Deprecated: As of Go 1.16, the same functionality is now provided by package io or package os, and those implementations should be preferred in new code. See the specific function documentation for details." Because of Go's compatibility promise, ioutil.ReadFile is safe, but @openwonk's updated answer is better for new code.
If you just want the content as a string, then the simple solution is to use the ReadFile function from the io/ioutil package. This function returns a slice of bytes which you can easily convert to a string.
Go 1.16 or later
Replace ioutil with os for this example.
package main

import (
	"fmt"
	"os"
)

func main() {
	b, err := os.ReadFile("file.txt") // just pass the file name
	if err != nil {
		fmt.Print(err)
	}
	fmt.Println(b) // print the content as 'bytes'

	str := string(b) // convert content to a 'string'
	fmt.Println(str) // print the content as a 'string'
}
Go 1.15 or earlier
package main

import (
	"fmt"
	"io/ioutil"
)

func main() {
	b, err := ioutil.ReadFile("file.txt") // just pass the file name
	if err != nil {
		fmt.Print(err)
	}
	fmt.Println(b) // print the content as 'bytes'

	str := string(b) // convert content to a 'string'
	fmt.Println(str) // print the content as a 'string'
}
I think the best thing to do, if you're really concerned about the efficiency of concatenating all of these files, is to copy them all into the same bytes buffer.
buf := bytes.NewBuffer(nil)
for _, filename := range filenames {
	f, _ := os.Open(filename) // Error handling elided for brevity.
	io.Copy(buf, f)           // Error handling elided for brevity.
	f.Close()
}
s := string(buf.Bytes()) // or simply buf.String()
This opens each file, copies its contents into buf, then closes the file. Depending on your situation, you may not actually need to convert it; the last line is just to show that buf.Bytes() has the data you're looking for.
This is how I did it:
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
)

func main() {
	filerc, err := os.Open("filename")
	if err != nil {
		log.Fatal(err)
	}
	defer filerc.Close()

	buf := new(bytes.Buffer)
	buf.ReadFrom(filerc)
	contents := buf.String()
	fmt.Print(contents)
}
You can use strings.Builder:
package main

import (
	"io"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("file.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	b := new(strings.Builder)
	io.Copy(b, f)
	print(b.String())
}
Or, if you don't mind []byte, you can use os.ReadFile:
package main

import "os"

func main() {
	b, err := os.ReadFile("file.txt")
	if err != nil {
		panic(err)
	}
	os.Stdout.Write(b)
}
For Go 1.16 or later, you can embed the file at compile time using the //go:embed directive and the embed package.
For example:
package main

import (
	_ "embed"
	"fmt"
)

//go:embed file.txt
var s string

func main() {
	fmt.Println(s) // print the content as a 'string'
}
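Note that file.txt must exist at build time, in (or below) the directory of the Go source file, or compilation fails; the blank import of embed is what enables the //go:embed directive when embedding into a plain string variable.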
I'm not at a computer, so this is only a draft, but it should make the idea clear.
package main

import (
	"io/ioutil"
	"path/filepath"
	"sync"
)

func main() {
	const dir = "/etc/"
	filesInfo, err := ioutil.ReadDir(dir)
	if err != nil {
		panic(err)
	}
	fileNames := make([]string, 0, len(filesInfo))
	for _, v := range filesInfo {
		if !v.IsDir() {
			fileNames = append(fileNames, v.Name())
		}
	}
	contents := make([]string, len(fileNames))
	var wg sync.WaitGroup
	wg.Add(len(fileNames))
	for i := range fileNames {
		go func(i int) {
			defer wg.Done()
			buf, err := ioutil.ReadFile(filepath.Join(dir, fileNames[i]))
			if err != nil {
				return // draft: skip unreadable files
			}
			contents[i] = string(buf)
		}(i)
	}
	wg.Wait()
}
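If you want to collect per-file errors instead of skipping them, the golang.org/x/sync/errgroup package is a tidier alternative to a bare WaitGroup, but the draft above sticks to the standard library.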