How do int-to-string casts work in Go?

I only started Go today, so this may be obvious but I couldn't find anything on it.
What does var x uint64 = 0x12345678; y := string(x) give y?
I know var x uint8 = 65; y := string(x) would give y the byte 65, character A, and common sense would suggest (since types larger than uint8 are allowed to be cast to strings) that they would simply be packed into native byte order (i.e. little endian) and assigned to the variable.
This does not seem to be the case:
hex.EncodeToString([]byte(y)) ==> "efbfbd"
First thought says this is an address with the last byte being left off because of some weird null terminator thingy, but if I allocate two x and y variables with two different values and print them out I get the same result.
var x, x2 uint64 = 0x10000000, 0x20000000
y, y2 := string(x), string(x2)
fmt.Println(hex.EncodeToString([]byte(y))) // "efbfbd"
fmt.Println(hex.EncodeToString([]byte(y2))) // "efbfbd"
Maddeningly I can't find the implementation for the string type anywhere although I probably haven't looked hard enough.

This is covered in the Spec: Conversions: Conversions to and from a string type:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
So effectively, when you convert an integer value to a string, the result can only be a string having one rune (character). And since Go stores strings as UTF-8 encoded byte sequences in memory, that is what you will see if you convert your string to []byte:
Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
When you try to convert the 0x12345678, 0x10000000 and 0x20000000 values to string, they are outside the range of valid Unicode code points. So, as per the spec, they are converted to "\uFFFD", which in UTF-8 encoding is []byte{239, 191, 189}. When encoded to a hex string:
fmt.Println(hex.EncodeToString([]byte("\uFFFD"))) // Output: efbfbd
Or simply:
fmt.Printf("%x", "\uFFFD") // Output: efbfbd
Read the blog post Strings, bytes, runes and characters in Go for more details about string internals.
And btw since Go 1.5 the Go runtime is implemented (mostly) in Go, so these conversions are now implemented in Go and can be found in the runtime package: runtime/string.go, look for the intstring() function.
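The following is a minimal sketch of the spec rule. It uses only the standard library (encoding/hex and unicode/utf8), and the helper name utf8Hex is made up for illustration. It converts from rune, the form go vet prefers, and prints which inputs utf8.ValidRune accepts. All the sample values fit in 32 bits. Converting a larger uint64 to rune first would truncate it, whereas string(x) on the uint64 itself yields "\uFFFD".

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// utf8Hex converts a single code point to a string and returns the
// hex encoding of the UTF-8 bytes that the conversion produced.
func utf8Hex(r rune) string {
	return hex.EncodeToString([]byte(string(r)))
}

func main() {
	// 'A', U+263A (☺), the largest valid code point, one past it,
	// and one of the values from the question.
	for _, r := range []rune{65, 0x263A, 0x10FFFF, 0x110000, 0x12345678} {
		fmt.Printf("%#x valid=%v -> %s\n", r, utf8.ValidRune(r), utf8Hex(r))
	}
}
```

Every invalid input collapses to the same three bytes, efbfbd. That is why the two different values in the question printed identical results.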

Related

string(int), string(int32) and string([]int32) are all valid but string([]int) is invalid - what's the rationale here?

(I'm using Go 1.14.6.)
The following statements would all output the character a:
Println(string(int(97) ) )
Println(string(int32(97) ) )
Println(string([]int32{97} ) )
But
Println(string([]int{97} ) )
would cause compile error
cannot convert []int literal (type []int) to type string
The behavior is confusing to me. If it handles string(int) the same as string(int32), why does it handle string([]int) differently from string([]int32)?
rune, which represents a Unicode code point, is an alias for int32. So effectively string([]int32{}) is the same as string([]rune{}), which converts a slice of runes (something like the characters of a string) to a string. This is useful.
int is neither int32 nor rune, so it's not obvious what converting a []int to a string should mean. It's ambiguous, so the language spec does not allow it.
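If you really do have a []int of code points, one option is to convert it to []rune yourself and then use the allowed string([]rune) conversion. This is only a sketch, and intsToString is a hypothetical helper. It chooses to map values that do not fit in a rune to U+FFFD rather than silently truncating them.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// intsToString is a hypothetical helper that treats each int as a
// Unicode code point, like string([]rune) does for runes.
func intsToString(xs []int) string {
	rs := make([]rune, len(xs))
	for i, x := range xs {
		// rune(x) keeps only the low 32 bits, so check that x fits;
		// otherwise use U+FFFD instead of a silently different character.
		if r := rune(x); int(r) == x {
			rs[i] = r
		} else {
			rs[i] = utf8.RuneError
		}
	}
	// string([]rune) itself replaces invalid code points with U+FFFD.
	return string(rs)
}

func main() {
	fmt.Println(intsToString([]int{97, 98, 99}))
}
```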
Converting an integer number to string results in a string value with a single rune. Spec: Conversions:
Conversions to and from a string type
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
This is confusing to many, as many expect the conversion result to be the (decimal) representation as a string. The Go authors have recognized this and have taken steps to deprecate and remove it from the language in the future. In Go 1.15, go vet already warns about such conversions. Go 1.15 release notes: Vet:
New warning for string(x)
The vet tool now warns about conversions of the form string(x) where x has an integer type other than rune or byte. Experience with Go has shown that many conversions of this form erroneously assume that string(x) evaluates to the string representation of the integer x. It actually evaluates to a string containing the UTF-8 encoding of the value of x. For example, string(9786) does not evaluate to the string "9786"; it evaluates to the string "\xe2\x98\xba", or "☺".
Code that is using string(x) correctly can be rewritten to string(rune(x)). Or, in some cases, calling utf8.EncodeRune(buf, x) with a suitable byte slice buf may be the right solution. Other code should most likely use strconv.Itoa or fmt.Sprint.
This new vet check is enabled by default when using go test.
We are considering prohibiting the conversion in a future release of Go. That is, the language would change to only permit string(x) for integer x when the type of x is rune or byte. Such a language change would not be backward compatible. We are using this vet check as a first trial step toward changing the language.

Binary Formatting Variables in TCL

I am trying to create a binary message to send over a socket, but I'm having trouble with the way TCL treats all variables as strings. I need to calculate the length of a string and know its value in binary.
set length [string length $message]
set binaryMessagePart [binary format s* { $length 0 }]
However, when I run this I get the error 'expected integer but got "$length"'. How do I get this to work and return the value for the integer 5 and not the char 5?
To calculate the length of a string, use string length. To calculate the length of a string in a particular encoding, convert the string to that encoding and use string length:
set enc "utf-8"; # Or whatever; you need to know this ahead of time for sanity's sake
set encoded [encoding convertto $enc $message]
set length [string length $encoded]
Note that with the encoded length, this will be in bytes whereas the length prior to encoding is in characters. For some messages and some encodings, the difference can be substantial.
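For comparison with the Go material elsewhere on this page, Go draws the same line between bytes and characters. A Go string already holds its UTF-8 bytes, so len counts bytes, while utf8.RuneCountInString counts code points. Here is a small sketch with a made-up sample message:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths reports the size of s in bytes and in characters (runes).
func lengths(s string) (bytes, chars int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, c := lengths("héllo") // é is one character but two UTF-8 bytes
	fmt.Println(b, c)
}
```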
To compose a binary message with the length and the body of the message (a fairly common binary format), use binary format like this:
# Assumes the length is big-endian; for little-endian, use i instead of I
set binPart [binary format "Ia*" $length $encoded]
What you were doing wrong was using s*, which consumes a list of integers and produces a sequence of little-endian 16-bit (short) integer values in the output string. You were feeding it a list that was literally $length 0. Braces suppress variable substitution, so the first element was the string $length, which is not an integer (integers don't start with $). We could instead have used [list $length 0] to produce the argument to s*, and that would have worked, but that doesn't seem quite right for the context of the question.
In binary format, these are the common formats (there are many more):
a is for string data (mnemonically “ASCII”); this is binary string data, and you need to encode it first.
i and I are for 32-bit numbers (mnemonically “int” like in many programming languages, but especially C). Upper case is big-endian, lower case is little-endian.
s and S are for 16-bit numbers (mnemonically “short”).
c is for 8-bit numbers (mnemonically “char” from C).
w and W are for 64-bit numbers (mnemonically “wide integers”).
f and d are for IEEE binary floating point numbers (mnemonically “float” and “double” respectively, so 4 and 8 bytes).
All can be followed by an optional count, either a number or a *. For the numeric formats, instead of inserting a single number they insert a list of them (and so consume a list). A number gives a fixed count, and * means “all of the list”. For the string format a, a number uses a fixed number of bytes in the message (truncating or padding with zero bytes as necessary), and * means “all of the string” (never truncating or padding).

Why string cannot convert to other data type array except uint8 and int32?

When I try to convert a string to []int, compilation fails. I found that a string can only be converted to slices of int32 (rune) and uint8 (byte).
This is my test code:
s1 := "abcd"
b1 := []byte(s1)
r1 := []rune(s1)
i1 := []int8(s1) //error
The short answer is because the language specification does not allow it.
The allowed conversions for non-constant values: Spec: Conversions:
A non-constant value x can be converted to type T in any of these cases:
x is assignable to T.
ignoring struct tags (see below), x's type and T have identical underlying types.
ignoring struct tags (see below), x's type and T are pointer types that are not defined types, and their pointer base types have identical underlying types.
x's type and T are both integer or floating point types.
x's type and T are both complex types.
x is an integer or a slice of bytes or runes and T is a string type.
x is a string and T is a slice of bytes or runes.
The longer answer is:
Spec: Conversions: Conversions to and from a string type:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
Converting a slice of runes to a string type yields a string that is the concatenation of the individual rune values converted to strings.
Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Converting a value of a string type to a slice of runes type yields a slice containing the individual Unicode code points of the string.
Converting a string to []byte is "useful" because the result is the UTF-8 encoded byte sequence of the text. This is exactly how Go stores strings in memory, and it is usually the data you should store or transmit in order to deliver a string over byte streams (such as an io.Writer). Similarly, it is what you get out of an io.Reader.
Converting a string to []rune is also useful: it results in the characters (runes) of the text, so you can easily inspect or operate on the characters of a string (which is often needed in real-life applications).
Converting a string to []int8 is not so useful, given that byte streams operate on bytes (byte is an alias for uint8, not int8). If in a specific case you need a []int8 from a string, you can write your own converter (which would most likely convert the individual bytes of the string to int8 values).
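Here is a minimal sketch of such a custom converter; stringToInt8s is a hypothetical name. It works on the bytes of the string, not its characters. As a result, any byte of 0x80 or above, such as those in a multi-byte UTF-8 character, comes out negative.

```go
package main

import "fmt"

// stringToInt8s reinterprets each byte of s (not each character) as an
// int8. The conversion wraps, so bytes >= 0x80 become negative values.
func stringToInt8s(s string) []int8 {
	out := make([]int8, len(s))
	for i := 0; i < len(s); i++ {
		out[i] = int8(s[i])
	}
	return out
}

func main() {
	fmt.Println(stringToInt8s("abcd")) // [97 98 99 100]
	fmt.Println(stringToInt8s("é"))    // [-61 -87]
}
```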
From the Go documentation:
string is the set of all strings of 8-bit bytes, conventionally but not
necessarily representing UTF-8-encoded text. A string may be empty, but
not nil. Values of string type are immutable.
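So the contents of a string are the same kind of data as a []uint8 (a sequence of bytes), even though the two are distinct types.
That is why you can convert it to []uint8 ([]byte), and, by decoding the UTF-8, to []rune ([]int32).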
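What string and []uint8 share is the underlying bytes. The following small sketch (bytesAndRunes is just an illustrative name) shows the difference between the two conversions. Converting to []uint8 gives the raw UTF-8 bytes, while converting to []rune decodes them into code points:

```go
package main

import "fmt"

// bytesAndRunes returns the raw UTF-8 bytes of s and its decoded code points.
func bytesAndRunes(s string) ([]uint8, []rune) {
	return []uint8(s), []rune(s) // []uint8 is the same type as []byte
}

func main() {
	s := "héllo"
	b, r := bytesAndRunes(s)
	fmt.Println(len(b), len(r)) // bytes vs. code points
	fmt.Println(s[1] == b[1])   // indexing a string also yields a byte
}
```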

What is the difference between the string and []byte in Go?

s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string
what is the difference between the string and []byte in Go?
When should I use one or the other?
Why?
bb := []byte{'h','e','l','l','o',127}
ss := string(bb)
fmt.Println(ss)
hello
The output is just "hello", and the 127 seems to be missing, which sometimes feels weird to me.
string and []byte are different types, but they can be converted to one another (Spec: Conversions to and from a string type):
3. Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
4. Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Blog: Arrays, slices (and strings): The mechanics of 'append':
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Also read: Strings, bytes, runes and characters in Go
When to use one over the other?
It depends on what you need. Strings are immutable, so they can be shared, and you have a guarantee that they won't get modified.
Byte slices can be modified (meaning the content of the backing array).
Also, if you need to frequently convert a string to a []byte (e.g. because you need to write it into an io.Writer), you should consider storing it as a []byte in the first place.
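Here is a short sketch of the mutability difference. Converting a string to []byte copies the bytes, so changing the slice never affects the original string. upperFirst is a made-up helper that only handles ASCII letters.

```go
package main

import "fmt"

// upperFirst returns a copy of s with its first byte upper-cased
// (ASCII letters only, for illustration). s itself is never modified.
func upperFirst(s string) string {
	b := []byte(s) // the conversion copies the bytes into a new slice
	if len(b) > 0 && 'a' <= b[0] && b[0] <= 'z' {
		b[0] -= 'a' - 'A' // mutate the copy in place
	}
	// s[0] = 'H' would not compile: the bytes of a string are read-only.
	return string(b) // converting back copies the bytes again
}

func main() {
	s := "hello"
	fmt.Println(upperFirst(s), s)
}
```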
Also note that you can have string constants but there are no slice constants. This may be a small optimization. Also note that:
The expression len(s) is constant if s is a string constant.
Also, if you are using code that is already written (the standard library, 3rd party packages or your own), in most cases the types of the parameters you pass and the values you get back are already given. For example, if you read data from an io.Reader, you need a []byte that you pass to receive the read bytes; you can't use a string for that.
This example:
bb := []byte{'h','e','l','l','o',127}
What happens here is that you used a composite literal (slice literal) to create and initialize a new slice of type []byte (using a short variable declaration). You specified constants to list the initial elements of the slice. You also used the byte value 127 (the ASCII DEL control character), which, depending on the platform / console, may or may not have a visual representation. The byte is still in the string; it just may not show up when printed.
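A quick way to check that the 127 byte was not lost is to look at the length and print the string with %q, which escapes non-printable characters. Here is a sketch; the helper quoted is only for illustration:

```go
package main

import "fmt"

// quoted converts bb to a string and returns it in Go-quoted form,
// where non-printable bytes appear as escapes.
func quoted(bb []byte) string {
	return fmt.Sprintf("%q", string(bb))
}

func main() {
	bb := []byte{'h', 'e', 'l', 'l', 'o', 127}
	ss := string(bb)
	fmt.Println(len(ss)) // 6: all six bytes are in the string
	fmt.Println(quoted(bb))
}
```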
Late, but I hope this helps.
In simple words
Bit: 0 or 1; this is how machines represent all information
Byte: 8 bits; UTF-8 encodes each character as one to four bytes
[ ]type: a slice of a given data type. Slices are dynamically sized arrays.
[ ]byte: a byte slice, i.e. a dynamically sized array of bytes; in UTF-8 text, one character may span several elements.
String: a read-only slice of bytes, i.e. immutable
With all this in mind:
s := "Go"
bs := []byte(s)
fmt.Printf("%s", bs) // Output: Go
fmt.Printf("%d", bs) // Output: [71 111]
or
bs := []byte{71, 111}
fmt.Printf("%s", bs) // Output: Go
%s prints the byte slice as a string
%d prints the decimal value of each byte (here, the UTF-8 bytes of the text)
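Here are the verbs above, plus %x and %q, applied to the same byte slice in a small sketch (formats is a made-up helper name):

```go
package main

import "fmt"

// formats renders the same byte slice with several fmt verbs.
func formats(bs []byte) (s, d, x, q string) {
	s = fmt.Sprintf("%s", bs) // bytes interpreted as text
	d = fmt.Sprintf("%d", bs) // decimal value of each byte
	x = fmt.Sprintf("%x", bs) // hex, two digits per byte
	q = fmt.Sprintf("%q", bs) // quoted Go string literal
	return s, d, x, q
}

func main() {
	fmt.Println(formats([]byte("Go")))
}
```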
IMPORTANT:
As strings are immutable, they cannot be changed in memory: each time you add or remove something from a string, Go creates a new string in memory. On the other hand, byte slices are mutable, so when you update the elements of a byte slice in place, no new memory is allocated (append only reallocates when the slice's capacity is exceeded).
So choosing the right structure can make a difference in your app's performance.

Go - Comparing strings/byte slices input by the user

I am getting input from the user, however when I try to compare it later on to a string literal it does not work. That is just a test though.
I would like to set it up so that when a blank line is entered (just hitting the enter/return key) the program exits. I don't understand why the strings are not comparing because when I print it, it comes out identical.
in := bufio.NewReader(os.Stdin)
input, err := in.ReadBytes('\n')
if err != nil {
fmt.Println("Error: ", err)
}
if string(input) == "example" {
os.Exit(0)
}
string vs []byte
string definition:
string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.
byte definition:
byte is an alias for uint8 and is equivalent to uint8 in all ways. It is used, by convention, to distinguish byte values from 8-bit unsigned integer values.
What does it mean?
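[]byte is a byte slice. A slice can be empty, and unlike a string it can also be nil.
A string usually holds UTF-8 encoded text, where a single Unicode character can take more than one byte.
A string conventionally carries the meaning of the data (its encoding, UTF-8); a []byte is just raw bytes.
The equality operator is defined for the string type but not for slice types (a slice can only be compared to nil).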
As you see they are two different types with different properties.
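Because == is not defined between slices, comparing the raw input to a literal goes through either bytes.Equal or a conversion to string. Here is a small sketch (matches is a hypothetical helper) that also shows why a trailing newline breaks the comparison:

```go
package main

import (
	"bytes"
	"fmt"
)

// matches reports whether input holds exactly the text lit.
// input == []byte(lit) would not compile: slices are only comparable to nil.
func matches(input []byte, lit string) bool {
	return bytes.Equal(input, []byte(lit)) // same result as string(input) == lit
}

func main() {
	fmt.Println(matches([]byte("example"), "example"))
	fmt.Println(matches([]byte("example\n"), "example")) // the newline makes it differ
}
```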
There is a great blog post explaining different string related types [1]
Regarding the issue in your code snippet:
Bear in mind that in.ReadBytes(delim) returns a byte slice that includes the delimiter. So in your code, input ends with '\n'. If you want your code to work as intended, try this:
if string(input) == "example\n" { // or "example\r\n" when on windows
os.Exit(0)
}
Also make sure your terminal's code page matches the encoding of your .go source file (Go source files are UTF-8, and the standard Go compiler works with UTF-8 internally). Be aware of different line-ending styles (Windows uses "\r\n").
[1] Comparison of Go data types for string processing.
