I'm converting a node service to Go. For this I need a compatible MD5 hash generator (not for storing passwords!!). However, in this example, I keep getting different results:
Node's crypto takes an encoding parameter when creating md5s.
> crypto.createHash("md5").update("1Editor’s notebook: Escaping temptation for turf145468066").digest("hex")
'c7c3210bd977b049f42c487b8c6d0463'
In golang (test_encode.go):
package main

import (
	"crypto/md5"
	"encoding/hex"
	"testing"
)

func TestFoo(t *testing.T) {
	const result = "c7c3210bd977b049f42c487b8c6d0463"
	stringToEncode := "1Editor’s notebook: Escaping temptation for turf145468066"
	hash := md5.Sum([]byte(stringToEncode))
	hashStr := hex.EncodeToString(hash[:])
	if hashStr != result {
		t.Error("Got", hashStr, "expected", result)
	}
}
Then go test test_encode.go results in:
--- FAIL: TestFoo (0.00s)
encode_test.go:17: Got c3804ddcc59fabc09f0ce2418b3a8335 expected c7c3210bd977b049f42c487b8c6d0463
FAIL
FAIL command-line-arguments 0.006s
I've tracked it down to the encoding parameter of crypto.update in the node code, and the fact that the string has a ’ quote character in it. If I specify "utf8", it works:
crypto.createHash("md5").update("1Editor’s notebook: Escaping temptation for turf145468066", "utf8").digest("hex")
BUT: I can't change the node code, so the go code has to be compatible. Any ideas on what to do?
As you've already noted: you must convert the UTF-8 string to whatever encoding is used in your node application. This can be done with encoding packages such as:
golang.org/x/text/encoding/charmap
isoString, err := charmap.ISO8859_1.NewEncoder().Bytes([]byte(stringToEncode))
Considering that the character ’ does not exist in ISO-8859-1, we can assume you have a different encoding. Now you just need to figure out which one!
In the worst case, you might have to use a package other than charmap.
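To illustrate the mechanics with charmap, here is a runnable sketch. Windows-1252 is only a placeholder assumption (it happens to contain ’); substitute whatever encoding your node application actually uses:

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"

	"golang.org/x/text/encoding/charmap"
)

func main() {
	stringToEncode := "1Editor’s notebook: Escaping temptation for turf145468066"

	// Encode the UTF-8 string into the assumed legacy charset first.
	encoded, err := charmap.Windows1252.NewEncoder().Bytes([]byte(stringToEncode))
	if err != nil {
		panic(err)
	}

	hash := md5.Sum(encoded)
	fmt.Println(hex.EncodeToString(hash[:]))
}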
After a lot of digging in node and V8 I was able to conclude the following:
require("crypto").createHash("md5").update(inputString).digest("hex");
Is pretty dangerous, as not specifying an encoding encodes the input string as "ASCII". Which, after a lot of digging, is equivalent to the following (verified on a large input set on my end):
// toNodeASCIIString converts a string to a node-compatible "ASCII" byte
// slice: each code point is reduced modulo 256, so everything above the
// low byte is silently discarded.
func toNodeASCIIString(inputString string) []byte {
	stringAsRunes := []rune(inputString)
	bytes := make([]byte, len(stringAsRunes))
	for i, r := range stringAsRunes {
		// Keep only the low 8 bits of each code point.
		bytes[i] = byte(r % 256)
	}
	return bytes
}
What it basically does is mod each code point by 256, forgetting a large part of the input string.
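Assuming that equivalence holds, the failing test from the question can be made to pass by routing the input through toNodeASCIIString first. A sketch (the test name is mine; the imports are those of the question's test file):

func TestNodeCompatibleMD5(t *testing.T) {
	const result = "c7c3210bd977b049f42c487b8c6d0463"
	stringToEncode := "1Editor’s notebook: Escaping temptation for turf145468066"

	// Reduce the string to node's default encoding before hashing.
	hash := md5.Sum(toNodeASCIIString(stringToEncode))
	hashStr := hex.EncodeToString(hash[:])
	if hashStr != result {
		t.Error("Got", hashStr, "expected", result)
	}
}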
The node example above is pretty much the standard, copy-pasted-everywhere way to create MD5 hashes in node. I have not checked, but I'm assuming this works the same for all other hashes (SHA1, SHA256, etc.).
I would love to hear someone's thoughts on why this is not a huge security hole.
Related
I'm creating a TCP server that needs to process some data. I have a net.Conn instance "connection" from which I will read said data. The lower part of the snippet produces an error saying that it cannot use the esc value as a byte value.
const (
	esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(esc)
Clearly, some conversion is needed, but when I try
const (
	esc = "\a\n"
)
....
c := bufio.NewReader(connection)
data, err := c.ReadBytes(byte(esc))
The compiler notes that I cannot convert esc to byte. Is it because I declared "\a\n" as a const at the package level? Or is there something else in how I'm framing the bytes to be read?
You can't convert esc to byte because you can't convert strings into single bytes. You can convert a string into a byte slice ([]byte).
The bufio.Reader only supports single-byte delimiters; for a multi-byte delimiter like this you should use a bufio.Scanner with a custom split function instead.
Perhaps a modified version of https://stackoverflow.com/a/37531472/1205448
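For example, a minimal sketch of such a split function (splitAt is a name I made up; strings.NewReader stands in for the net.Conn here):

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitAt returns a bufio.SplitFunc that splits the stream at the given
// multi-byte delimiter, dropping the delimiter from the returned tokens.
func splitAt(delim []byte) bufio.SplitFunc {
	return func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		if atEOF && len(data) == 0 {
			return 0, nil, nil
		}
		if i := bytes.Index(data, delim); i >= 0 {
			return i + len(delim), data[:i], nil
		}
		if atEOF {
			return len(data), data, nil // no trailing delimiter: return the rest
		}
		return 0, nil, nil // request more data
	}
}

func main() {
	scanner := bufio.NewScanner(strings.NewReader("first\a\nsecond\a\nthird"))
	scanner.Split(splitAt([]byte("\a\n")))
	for scanner.Scan() {
		fmt.Printf("%q\n", scanner.Bytes())
	}
}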
How does Go type conversion internally work?
What is the memory utilisation for a type cast?
For example:
var str1 string
str1 = "26MB string data"
byt := []byte(str1)
str2 := string(byt)
Whenever I convert a variable to another type, will it consume more memory?
I am concerned about this because when I try to unmarshal, I get "fatal error: runtime: out of memory"
err = json.Unmarshal([]byte(str1), &obj)
str1's value comes from an HTTP response, read using ioutil.ReadAll, hence it contains the complete response.
It's called conversion in Go (not casting), and this is covered in Spec: Conversions:
Specific rules apply to (non-constant) conversions between numeric types or to and from a string type. These conversions may change the representation of x and incur a run-time cost. All other conversions only change the type but not the representation of x.
So generally converting does not make a copy, only changes the type. Converting to / from string usually does, as string values are immutable, and for example if converting a string to []byte would not make a copy, you could change the content of the string by changing elements of the resulting byte slice.
See related question: Does conversion between alias types in Go create copies?
There are some exceptions (compiler optimizations) when converting to / from string does not make a copy, for details see golang: []byte(string) vs []byte(*string).
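To see the copy, and the reason for it, in action, a quick sketch:

s := "hello"
b := []byte(s) // makes a copy: strings are immutable, byte slices are not
b[0] = 'H'
fmt.Println(s)         // hello (the original string is untouched)
fmt.Println(string(b)) // Hello (this conversion copies again)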
If you already have your JSON content as a string value which you want to unmarshal, you should not convert it to []byte just for the sake of unmarshaling. Instead use strings.NewReader() to obtain an io.Reader which reads from the passed string value, and pass this reader to json.NewDecoder(), so you can unmarshal without having to make a copy of your big input JSON string.
This is how it could look:
input := "BIG JSON INPUT"
dec := json.NewDecoder(strings.NewReader(input))
var result YourResultType
if err := dec.Decode(&result); err != nil {
	// Handle error
}
Also note that this solution can be optimized further if the big JSON string is read from an io.Reader, in which case you can skip reading it into memory entirely; just pass that reader to json.NewDecoder() directly, e.g.:
dec := json.NewDecoder(jsonSource)
var result YourResultType
if err := dec.Decode(&result); err != nil {
	// Handle error
}
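Since your str1 comes from an HTTP response, a sketch of that case (assuming the response is fetched with net/http; the URL is a placeholder):

resp, err := http.Get("https://example.com/big.json") // placeholder URL
if err != nil {
	// Handle error
}
defer resp.Body.Close()

var result YourResultType
if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
	// Handle error
}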
I am dealing with some legacy data where I routinely need to convert a uint16 to a 2 byte string.
Here is what I am using (where i is a uint16):
string([]byte {byte(i >> 8), byte(i & 0xFF)})
https://play.golang.org/p/423CAL-SJv
This seems rather clunky. Is there an existing library function to do this? I have looked at both the strings and binary packages, but nothing seemed immediately obvious.
While that is perfectly fine for what you're trying to do, the encoding/binary package has much more functionality for reading and writing binary values.
You can use
i := uint16(0x474F)
b := make([]byte, 2)
binary.BigEndian.PutUint16(b, i)
fmt.Println(string(b))
// GO
https://play.golang.org/p/IdDnnOtS2V
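If that legacy data also has to be read back, the same package covers the reverse direction:

s := "GO"
i := binary.BigEndian.Uint16([]byte(s))
fmt.Printf("%#x\n", i) // 0x474f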
Try the following:
t := strconv.Itoa(123)
There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
There are two places in the language where Go does UTF-8 decoding of strings for you:
when you do for i, r := range s the r is a Unicode code point as a value of type rune
when you do the conversion []rune(s), Go decodes the whole string to runes.
(Note that rune is an alias for int32, not a completely different type.)
In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you can't accept the U+FFFD replacement and need to throw an error on mis-encoded input.
Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is utf8.RuneError and returned by functions in utf8.
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
package main

import "fmt"

func main() {
	a := []byte{0xff}
	s := string(a)
	fmt.Println(s)
	for _, r := range s {
		fmt.Println(r)
	}
	rs := []rune(s)
	fmt.Println(rs)
}
Output will look different in different environments, but in the Playground it looks like
�
65533
[65533]
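And if your application does need to reject mis-encoded input instead of accepting U+FFFD, a minimal check (utf8.Valid is from unicode/utf8):

b := []byte{0xff}
if !utf8.Valid(b) {
	// reject, sanitize, or report the invalid input here
}
s := string(b) // otherwise convert as usual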
I am having trouble getting a random sha256 hash using a timestamp seed:
https://play.golang.org/p/2-_VPe3oFr (don't use the playground - the time is always the same there)
Does anyone understand why it always returns the same result? (non-playground runs)
Because you do this:
timestamp := time.Now().Unix()
log.Print(fmt.Sprintf("%x", sha256.Sum256([]byte(string(timestamp))))[:45])
You print the hex form of the SHA-256 digest of the data:
[]byte(string(timestamp))
What is it exactly?
timestamp is of type int64; converting it to string is covered by this spec rule:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
But its value is not a valid Unicode code point, so it will always be "\uFFFD", which is efbfbd (UTF-8 encoded), and your code always prints the SHA-256 of the data []byte{0xef, 0xbf, 0xbd}, which is (or rather its first 45 hex digits, because you slice the result):
83d544ccc223c057d2bf80d3f2a32982c32c3c0db8e26
I guess you wanted to generate some random bytes and calculate the SHA-256 of that, something like this:
rand.Seed(time.Now().UnixNano()) // seed math/rand, or every run yields the same bytes
data := make([]byte, 10)
for i := range data {
	data[i] = byte(rand.Intn(256))
}
fmt.Printf("%x", sha256.Sum256(data))
Note that if you'd use the crypto/rand package instead of math/rand, you could fill a slice of bytes with random values using the rand.Read() function, and you don't even have to set a seed (and so you don't even need the time package):
data := make([]byte, 10)
if _, err := rand.Read(data); err == nil {
	fmt.Printf("%x", sha256.Sum256(data))
}
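If instead the goal really was to derive the hash from the timestamp itself, convert the integer to bytes explicitly, for example via strconv (a sketch):

timestamp := time.Now().Unix()
data := []byte(strconv.FormatInt(timestamp, 10)) // decimal text, not string(timestamp)
fmt.Printf("%x\n", sha256.Sum256(data))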
Yes. This:
string(timestamp)
does not do what you think it does; see the spec. Long story short, the timestamp is not a valid Unicode code point, so the result is always "\uFFFD".