How does type conversion internally work? What is the memory utilization for the same?

How does Go type conversion internally work?
What is the memory utilisation for a type cast?
For example:
var str1 string
str1 = "26MB string data"
byt := []byte(str1)
str2 := string(byt)
Whenever I convert a variable to another type, will it consume more memory?
I am concerned about this because when I try to unmarshal, I get "fatal error: runtime: out of memory":
err = json.Unmarshal([]byte(str1), &obj)
The value of str1 comes from an HTTP response that is read with ioutil.ReadAll, so it contains the complete response body.

It's called conversion in Go (not casting), and this is covered in Spec: Conversions:
Specific rules apply to (non-constant) conversions between numeric types or to and from a string type. These conversions may change the representation of x and incur a run-time cost. All other conversions only change the type but not the representation of x.
So in general a conversion does not make a copy; it only changes the type. Converting to or from string usually does copy, because string values are immutable: if converting a string to []byte did not make a copy, you could change the content of the string by changing elements of the resulting byte slice.
See related question: Does convertion between alias types in Go create copies?
There are some exceptions (compiler optimizations) when converting to / from string does not make a copy, for details see golang: []byte(string) vs []byte(*string).
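The copying behaviour can be observed directly. The following is a minimal sketch (the helper name mutateCopy is made up for illustration): it converts a string to a byte slice, changes the slice, and shows that the original string is unaffected, which is only possible because the conversion copied the bytes.

```go
package main

import "fmt"

// mutateCopy is a hypothetical helper. It converts s to a byte slice,
// overwrites the first byte of the slice, and returns the untouched
// original together with the string built from the modified bytes.
func mutateCopy(s string) (original, mutated string) {
	b := []byte(s) // new backing array; the bytes of s are copied into it
	if len(b) > 0 {
		b[0] = 'X' // modifies only the copy, never s
	}
	return s, string(b) // string(b) copies the bytes once more
}

func main() {
	orig, mut := mutateCopy("hello")
	fmt.Println(orig, mut) // prints: hello Xello
}
```

For a 26 MB string, each of these two conversions needs roughly another 26 MB for its copy.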
If you already have your JSON content as a string value which you want to unmarshal, you should not convert it to []byte just for the sake of unmarshaling. Instead use strings.NewReader() to obtain an io.Reader which reads from the passed string value, and pass this reader to json.NewDecoder(), so you can unmarshal without having to make a copy of your big input JSON string.
This is how it could look:
input := "BIG JSON INPUT"
dec := json.NewDecoder(strings.NewReader(input))
var result YourResultType
if err := dec.Decode(&result); err != nil {
	// Handle error
}
Also note that this can be optimized further if the big JSON string is itself read from an io.Reader. In that case you can skip reading it into memory first and pass the reader to json.NewDecoder() directly, e.g.:
dec := json.NewDecoder(jsonSource)
var result YourResultType
if err := dec.Decode(&result); err != nil {
	// Handle error
}
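As a self-contained sketch of both variants: the payload type and the helpers decodeFrom and decodeString below are hypothetical names, and strings.NewReader stands in for a real source such as an HTTP response body. Neither path performs a []byte(str) conversion in user code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// payload is an assumed result type used only for this illustration.
type payload struct {
	Name  string `json:"name"`
	Count int    `json:"count"`
}

// decodeFrom decodes one JSON value from any io.Reader, for example an
// HTTP response body, without first reading it into a string.
func decodeFrom(r io.Reader) (payload, error) {
	var p payload
	err := json.NewDecoder(r).Decode(&p)
	return p, err
}

// decodeString covers the case where the JSON is already held as a string:
// strings.NewReader reads from the string without copying it.
func decodeString(s string) (payload, error) {
	return decodeFrom(strings.NewReader(s))
}

func main() {
	p, err := decodeString(`{"name":"demo","count":3}`)
	fmt.Println(p, err) // prints: {demo 3} <nil>
}
```

With a real HTTP call, resp.Body would be passed to decodeFrom in place of the string reader.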

Related

How to convert String to Primitive.ObjectID in Golang?

There are similar questions, but most of them use Hex() for the ObjectID-to-string conversion. I'm using String() instead. How do I convert the result back to a primitive.ObjectID?
The String() method of types may result in an arbitrary string representation. Parsing it may not always be possible, as it may not contain all the information the original value holds, or it may not be "rendered" in a way that is parsable unambiguously. There's also no guarantee the "output" of String() doesn't change over time.
Current implementation of ObjectID.String() does this:
func (id ObjectID) String() string {
	return fmt.Sprintf("ObjectID(%q)", id.Hex())
}
Which results in a string like this:
ObjectID("4af9f070cc10e263c8df915d")
This is parsable: take the hex digits and pass them to primitive.ObjectIDFromHex().
For example:
id := primitive.NewObjectID()
s := id.String()
fmt.Println(s)
hex := s[10:34]
id2, err := primitive.ObjectIDFromHex(hex)
fmt.Println(id2, err)
This will output (try it on the Go Playground):
ObjectID("4af9f070cc10e263c8df915d")
ObjectID("4af9f070cc10e263c8df915d") <nil>
This solution could be improved to find the " characters in the string representation and use their indices instead of the fixed 10 and 34, but you shouldn't be transferring and parsing the result of ObjectID.String() in the first place. Use the ObjectID.Hex() method instead; its result can be passed as-is to primitive.ObjectIDFromHex().
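If the String() output has to be parsed anyway, the quote-based extraction could look like the sketch below. It uses only the standard library so that it runs without the MongoDB driver; hexFromObjectIDString is a hypothetical helper, and it assumes the ObjectID("...") format shown above, which is not guaranteed to stay stable.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// hexFromObjectIDString is a hypothetical helper: it extracts the text
// between the first and last double quote and checks that it is the hex
// encoding of exactly 12 bytes, the size of an ObjectID.
func hexFromObjectIDString(s string) (string, error) {
	start := strings.IndexByte(s, '"')
	end := strings.LastIndexByte(s, '"')
	if start < 0 || end <= start {
		return "", fmt.Errorf("no quoted value in %q", s)
	}
	h := s[start+1 : end]
	raw, err := hex.DecodeString(h)
	if err != nil {
		return "", fmt.Errorf("invalid hex in %q: %w", s, err)
	}
	if len(raw) != 12 {
		return "", fmt.Errorf("want 12 bytes, got %d in %q", len(raw), s)
	}
	return h, nil
}

func main() {
	h, err := hexFromObjectIDString(`ObjectID("4af9f070cc10e263c8df915d")`)
	fmt.Println(h, err) // prints: 4af9f070cc10e263c8df915d <nil>
}
```

The returned value is what would then be handed to primitive.ObjectIDFromHex().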

Adding an integer value to a string in a single statement

I was wondering how I can add an integer value to a string value like "10". I know I can accomplish this by converting the string into an int first, adding the integer, and then converting the result back into a string. But can I accomplish this in a single statement in Go? For example, I can do it in multiple lines like this:
i, err := strconv.Atoi("10")
// handle error
i = i + 5
s := strconv.Itoa(i)
But is there any way that I can accomplish this in a single statement?
There is no ready function in the standard library for what you want to do. The reason is that adding a number to a number stored as a string, and returning the result as another string, is (terribly) inefficient.
The model (memory representation) of the string type does not support adding numbers to it efficiently (not to mention that string values are immutable, a new one has to be created); the memory model of int does support adding efficiently for example (and CPUs also have direct operations for that). No one wants to add ints to numbers stored as string values. If you want to add numbers, have your numbers ready just as that: numbers. When you want to print or transmit, only then convert it to string (if you must).
But everything becomes a single statement if you have a ready util function for it:
func add(s string, n int) (string, error) {
	i, err := strconv.Atoi(s)
	if err != nil {
		return "", err
	}
	return strconv.Itoa(i + n), nil
}
Using it:
s, err := add("10", 5)
fmt.Println(s, err)
Output (try it on the Go Playground):
15 <nil>

[]byte(string) vs []byte(*string)

I'm curious as why Go doesn't provide a []byte(*string) method. From a performance perspective, wouldn't []byte(string) make a copy of the input argument and add more cost (though this seems odd since strings are immutable, why copy them)?
[]byte("something") is not a function (or method) call, it's a type conversion.
The type conversion "itself" does not copy the value. Converting a string to a []byte does, however, and it has to: the resulting byte slice is mutable, so without a copy you could modify the content of the string, which must stay immutable, as the Spec: String types section dictates:
Strings are immutable: once created, it is impossible to change the contents of a string.
Note that there are a few cases when a string <=> []byte conversion does not make a copy because it is optimized "away" by the compiler. These are rare, "hard coded" cases in which there is proof that an immutable string cannot / will not end up modified.
Such an example is looking up a value from a map where the key type is string, and you index the map with a []byte, converted to string of course (source):
key := []byte("some key")
var m map[string]T
// ...
v, ok := m[string(key)] // Copying key here is optimized away
Another optimization is when ranging over the bytes of a string that is explicitly converted to a byte slice:
s := "something"
for i, v := range []byte(s) { // Copying s is optimized away
	// ...
}
(Note that without the conversion the for range would iterate over the runes of the string and not over its UTF-8-encoded bytes.)
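One way to check whether the map-lookup optimization applies in a given toolchain is to count allocations. The sketch below uses testing.AllocsPerRun from the standard library; the helper names and the package-level sink variable are assumptions made for this example, and the numbers it prints depend on the compiler version, so no particular output is claimed.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that the converted string escapes and the
// conversion in convertAllocs cannot be optimized away.
var sink string

// lookupAllocs reports the average number of allocations for a map lookup
// written as m[string(key)].
func lookupAllocs(m map[string]int, key []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		_ = m[string(key)]
	})
}

// convertAllocs reports the average number of allocations for a plain
// string(key) conversion whose result is kept.
func convertAllocs(key []byte) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = string(key)
	})
}

func main() {
	key := []byte("a key that is comfortably longer than thirty-two bytes")
	m := map[string]int{string(key): 1}
	fmt.Println("map lookup allocs:      ", lookupAllocs(m, key))
	fmt.Println("plain conversion allocs:", convertAllocs(key))
}
```

If the optimization is in effect, the lookup should report fewer allocations than the plain conversion.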
I'm curious as why Golang doesn't provide a []byte(*string) method.
Because it doesn't make sense.
A pointer (to any type) cannot be represented (in any obviously meaningful way) as a []byte.
From a performance perspective, wouldn't []byte(string) make a copy of the input argument and add more cost (though this seems odd since strings are immutable, why copy them)?
Converting from []byte to string (and vice versa) does involve a copy, because strings are immutable but byte slices are not.
However, using a pointer wouldn't solve that problem.

How to detect when bytes can't be converted to string in Go?

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?
You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.
But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
There are two places in the language where Go decodes the UTF-8 in a string for you:
When you do for i, r := range s, r is a Unicode code point as a value of type rune.
When you do the conversion []rune(s), Go decodes the whole string to runes.
(Note that rune is an alias for int32, not a completely different type.)
In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. More is in the spec sections on for statements and conversions between strings and other types. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, like if you can't accept the U+FFFD replacement and need to throw an error on mis-encoded input.
Since that behavior is baked into the language, you can expect it from libraries, too. U+FFFD is available as utf8.RuneError and is returned by the decoding functions in the unicode/utf8 package for invalid input.
Here's a sample program showing what Go does with a []byte holding invalid UTF-8:
package main

import "fmt"

func main() {
	a := []byte{0xff}
	s := string(a)
	fmt.Println(s)
	for _, r := range s {
		fmt.Println(r)
	}
	rs := []rune(s)
	fmt.Println(rs)
}
Output will look different in different environments, but in the Playground it looks like
�
65533
[65533]
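For applications that must reject mis-encoded input instead of accepting U+FFFD, the utf8.Valid check mentioned at the start of this answer can be wrapped around the conversion. This is a minimal sketch; toValidString and errInvalidUTF8 are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is an assumed sentinel error for this illustration.
var errInvalidUTF8 = errors.New("input is not valid UTF-8")

// toValidString converts b to a string only if b is entirely valid UTF-8.
func toValidString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

func main() {
	s, err := toValidString([]byte("héllo"))
	fmt.Printf("%q %v\n", s, err) // prints: "héllo" <nil>

	s, err = toValidString([]byte{0xff})
	fmt.Printf("%q %v\n", s, err) // prints: "" input is not valid UTF-8
}
```

If the data is already a string, utf8.ValidString performs the same check without a conversion.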

Go - Comparing strings/byte slices input by the user

I am getting input from the user, however when I try to compare it later on to a string literal it does not work. That is just a test though.
I would like to set it up so that when a blank line is entered (just hitting the enter/return key) the program exits. I don't understand why the strings are not comparing because when I print it, it comes out identical.
in := bufio.NewReader(os.Stdin)
input, err := in.ReadBytes('\n')
if err != nil {
	fmt.Println("Error: ", err)
}
if string(input) == "example" {
	os.Exit(0)
}
string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
[]byte is a byte slice. A slice can be empty or nil.
A string is a read-only sequence of bytes; when it holds UTF-8 text, a single Unicode character can occupy more than 1 byte.
By convention a string carries the meaning of its data (an encoding); a []byte does not.
The equality operator is defined for the string type but not for slice types.
As you see they are two different types with different properties.
There is a great blog post explaining different string related types [1]
Regarding the issue in your code snippet:
Bear in mind that in.ReadBytes(char) returns a byte slice that includes the delimiter char. So in your code input ends with '\n'. If you want your code to work as intended, try this:
if string(input) == "example\n" { // or "example\r\n" when on windows
	os.Exit(0)
}
Also make sure that your terminal's encoding matches that of your .go source file, and be aware of the different line-ending styles (Windows uses "\r\n"). Go source code is UTF-8 encoded.
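An alternative that works for both line-ending styles is to strip the line ending before comparing. The sketch below is one possible approach, not the only one; trimEOL and readLine are hypothetical helpers, and strings.NewReader stands in for os.Stdin so that the example runs unattended.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// trimEOL removes any trailing "\n" or "\r\n" from line.
func trimEOL(line string) string {
	return strings.TrimRight(line, "\r\n")
}

// readLine reads up to and including the next '\n' and returns the text
// without its line ending.
func readLine(r *bufio.Reader) (string, error) {
	line, err := r.ReadString('\n')
	return trimEOL(line), err
}

func main() {
	// Stand-in for bufio.NewReader(os.Stdin): two lines, then a blank line.
	in := bufio.NewReader(strings.NewReader("example\r\nmore\n\nnever read\n"))
	for {
		line, err := readLine(in)
		if err != nil || line == "" {
			break // a blank line (or the end of input) ends the loop
		}
		fmt.Printf("got %q\n", line)
	}
}
```

With this input the loop prints got "example" and got "more", then stops at the blank line, which is the exit condition asked for in the question.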
[1] Comparison of Go data types for string processing.

Resources