I want to create a dummy string of a given length to do a performance test. For example, I want to first test with a 1 KB string and then maybe a 10 KB string, etc. I don't care which character (or rune?) it gets filled with. I understand that a string in Go is backed by a byte array. So, I want the final string to be backed by a byte array of 1 KB (if I give 1024 as the argument).
For example, I tried the brute force code below:
...
oneKBPayload := createPayload(1024, 'A')
...
//I don't mind even if the char argument is removed and 'A' is used for example
func createPayload(len int, char rune) string {
payload := make([]byte, len)
for i := 0; i < len; i++ {
payload = append(payload, byte(char))
}
return string(payload[:])
}
and it produced a result of (for 10 length)
"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000AAAAAAAAAA"
I realize that it has something to do with the encoding. But how to fix this so that I create any string which is backed by a byte array of the given length so that when I write it over the network, I generate the intended payload.
Your createPayload() creates a byte slice with the given length, which is filled with zeros by default (the zero value of byte). You then append len more bytes to this slice, so the result is twice the length you intended. That's why the printed string shows the zeros followed by the 'A' characters.
If you change it to:
payload := make([]byte, 0, len)
Then the result will be what you want.
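For completeness, here is a minimal sketch of the whole function with that one-line change applied. The parameter is renamed to n so it no longer shadows the built-in len; that rename is my own choice, not part of the original code. It also assumes char fits in a single byte (e.g. an ASCII character).

```go
package main

import "fmt"

// createPayload returns a string of n bytes, each equal to byte(char).
// Assumption: char fits in a single byte (e.g. an ASCII character).
func createPayload(n int, char rune) string {
	// Length 0, capacity n: append fills the slice without leaving leading zeros.
	payload := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		payload = append(payload, byte(char))
	}
	return string(payload)
}

func main() {
	s := createPayload(10, 'A')
	fmt.Println(len(s), s) // 10 AAAAAAAAAA
}
```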
But easier would be to simply use strings.Repeat() which repeats a given string value n times. Repeat a one-rune (or more specifically a one-byte) string value n times, and you get what you want:
s := strings.Repeat("A", 10)
fmt.Println(len(s), s)
This will output (try it on the Go Playground):
10 AAAAAAAAAA
If you don't care about the content of the string only about its length, then simply convert a byte slice like this:
s := string(make([]byte, 1024))
fmt.Println(len(s))
Or alternatively like this:
s2 := string([]byte{1023: 0})
fmt.Println(len(s2))
Both print 1024. Try them on the Go Playground.
If you do care about the content and you already have a byte slice allocated, this is how you can efficiently fill it: Is there analog of memset in go?
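As a rough illustration of filling an already allocated slice, one common approach is to set the first byte and then repeatedly copy the filled prefix onto the rest, doubling it each time. This is only a sketch under that assumption, not necessarily what the linked question recommends, and the name fill is made up for the example.

```go
package main

import "fmt"

// fill sets every byte of buf to v by copying a doubling prefix.
// The name fill is hypothetical, used only for this example.
func fill(buf []byte, v byte) {
	if len(buf) == 0 {
		return
	}
	buf[0] = v
	// Each copy doubles the already-filled prefix (copy stops at the end of buf).
	for filled := 1; filled < len(buf); filled *= 2 {
		copy(buf[filled:], buf[:filled])
	}
}

func main() {
	buf := make([]byte, 1024)
	fill(buf, 'A')
	s := string(buf)
	fmt.Println(len(s), s[:8]) // 1024 AAAAAAAA
}
```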
I am trying to replace a specific position character from an array of strings. Here is what my code looks like:
package main
import (
"fmt"
)
func main() {
str := []string{"test","testing"}
str[0][2] = 'y'
fmt.Println(str)
}
Now, running this gives me the error:
cannot assign to str[0][2]
Any idea how to do this? I have tried using strings.Replace, but AFAIK it will replace all the occurrence of the given character, while I want to replace that specific character. Any help is appreciated. TIA.
Strings in Go are immutable, you can't change their content. To change the value of a string variable, you have to assign a new string value.
An easy way is to first convert the string to a byte or rune slice, do the change and convert back:
s := []byte(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
This will output (try it on the Go Playground):
[teyt testing]
Note: I converted the string to byte slice, because this is what happens when you index a string: it indexes its bytes. A string stores the UTF-8 byte sequence of the text, which may not necessarily map bytes to characters one-to-one.
If you need to replace the character at index 2 (the 3rd character) rather than the byte at index 2, use []rune instead:
s := []rune(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
In this example it doesn't matter though, but in general it may.
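To see a case where it does matter, here is a small sketch with a string containing a two-byte character. The helper names replaceByteAt and replaceRuneAt are made up for this example.

```go
package main

import "fmt"

// replaceByteAt overwrites the byte at byte index i.
func replaceByteAt(s string, i int, b byte) string {
	bs := []byte(s)
	bs[i] = b
	return string(bs)
}

// replaceRuneAt overwrites the rune at rune index i.
func replaceRuneAt(s string, i int, r rune) string {
	rs := []rune(s)
	rs[i] = r
	return string(rs)
}

func main() {
	s := "tést" // 'é' occupies two bytes in UTF-8
	fmt.Printf("%q\n", replaceByteAt(s, 2, 'y')) // "t\xc3yst" (invalid UTF-8)
	fmt.Printf("%q\n", replaceRuneAt(s, 2, 'y')) // "téyt"
}
```

The byte version overwrites the second half of 'é' and leaves a dangling lead byte, while the rune version replaces the third character as intended.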
Also note that strings.Replace() does not (necessarily) replace all occurrences:
func Replace(s, old, new string, n int) string
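The parameter n gives the maximum number of replacements to perform (a negative n means no limit). So the following also works, because the first "s" in "test" happens to be the character we want to replace (try it on the Go Playground):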
str[0] = strings.Replace(str[0], "s", "y", 1)
Yet another solution could be to slice the string up to the replaceable character and from the character after it, and just concatenate the two parts around the new character (try this one on the Go Playground):
str[0] = str[0][:2] + "y" + str[0][3:]
Care must be taken here too: the slice indices are byte indices, not character (rune) indices.
See related question: Immutable string and pointer address
Here's a function that will do that for you. It takes care of converting the string that you want to modify into a []rune, and then back out to string.
If your intention is to replace bytes rather than runes, you can:
copy this function's code, rename it from runeSub to byteSub
change the r rune parameter to b byte, and the []rune conversion to []byte
Also available on repl.it
package main
import "fmt"
// runeSub - given an array of strings (ss), replace the
// (ri)th rune (character) in the (si)th string
// of (ss), with the rune (r)
//
// ss - the array of strings
// si - the index of the string in ss that you want to modify
// ri - the index of the rune in ss[si] that you want to replace
// r - the rune you want to insert
//
// NOTE: this function has no panic protection from things like
// out-of-bound index values
func runeSub(ss []string, si, ri int, r rune) {
rr := []rune(ss[si])
rr[ri] = r
ss[si] = string(rr)
}
func main() {
ss := []string{"test","testing"}
runeSub(ss, 0, 2, 'y')
fmt.Println(ss)
}
In Python I can slice a string to get a sub-string of up to N characters and if the string is too short it will simply return the rest of the string, e.g.
"mystring"[:100] # Returns "mystring"
What's the easiest way to do the same in Go? Trying the same thing panics:
"mystring"[:100] // panic: runtime error: slice bounds out of range
Of course, I can write it all manually:
func Substring(s string, startIndex int, count int) string {
	maxCount := len(s) - startIndex
	if count > maxCount {
		count = maxCount
	}
	// The upper bound is an index, not a length, so add startIndex.
	return s[startIndex : startIndex+count]
}
fmt.Println(Substring("mystring", 0, n))
But that's rather a lot of work for something so simple and (I would have thought) common. What's more, I don't know how to generalise this function to slices of other types, since Go doesn't support generics. I'm hoping there is a better way. Even math.Min() doesn't easily work here, because it expects and returns float64.
Note that while a function remains the recommended solution (even if it has to be implemented separately for slices of different types), a version that slices by byte doesn't work well with strings that contain multi-byte characters.
fmt.Println(Substring("世界mystring", 0, 5)) would actually print 世�� instead of 世界mys.
See "Code points, characters, and runes": a character may be represented by a number of different sequences of code points, and therefore different sequences of UTF-8 bytes.
And in Go, a "code point" is a rune (as seen here).
Using rune would be more robust (again, in case of strings)
func SubstringRunes(s string, startIndex int, count int) string {
	runes := []rune(s)
	length := len(runes)
	maxCount := length - startIndex
	if count > maxCount {
		count = maxCount
	}
	// The upper bound is an index, not a length, so add startIndex.
	return string(runes[startIndex : startIndex+count])
}
See it in action in this playground.
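Since Go 1.18 added type parameters, the clamping logic can be written once for any slice type. The following is only a sketch: the names take and takeRunes are made up, and clamping negative arguments to zero is my own assumption about the desired behaviour.

```go
package main

import "fmt"

// take returns up to count elements of xs starting at index start.
// Out-of-range arguments are clamped instead of causing a panic.
// Requires Go 1.18+ (type parameters).
func take[T any](xs []T, start, count int) []T {
	if start < 0 {
		start = 0
	}
	if start > len(xs) {
		start = len(xs)
	}
	if count < 0 {
		count = 0
	}
	if count > len(xs)-start {
		count = len(xs) - start
	}
	return xs[start : start+count]
}

// takeRunes applies take to the runes of s, so it counts characters, not bytes.
func takeRunes(s string, start, count int) string {
	return string(take([]rune(s), start, count))
}

func main() {
	fmt.Println(takeRunes("世界mystring", 0, 5))  // 世界mys
	fmt.Println(takeRunes("mystring", 0, 100))  // mystring
	fmt.Println(take([]int{1, 2, 3, 4}, 1, 10)) // [2 3 4]
}
```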
s := "some string"
b := []byte(s) // convert string -> []byte
s2 := string(b) // convert []byte -> string
what is the difference between the string and []byte in Go?
When to use "he" or "she"?
Why?
bb := []byte{'h','e','l','l','o',127}
ss := string(bb)
fmt.Println(ss)
hello
The output is just "hello", with no visible sign of the 127, which seems weird to me.
string and []byte are different types, but they can be converted to one another:
3. Converting a slice of bytes to a string type yields a string whose successive bytes are the elements of the slice.
4. Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
Blog: Arrays, slices (and strings): The mechanics of 'append':
Strings are actually very simple: they are just read-only slices of bytes with a bit of extra syntactic support from the language.
Also read: Strings, bytes, runes and characters in Go
When to use one over the other?
Depends on what you need. Strings are immutable, so they can be shared and you have guarantee they won't get modified.
Byte slices can be modified (meaning the content of the backing array).
Also, if you need to frequently convert a string to a []byte (e.g. because you need to write it into an io.Writer), you should consider storing it as a []byte in the first place.
Also note that you can have string constants but there are no slice constants. This may be a small optimization. Also note that:
The expression len(s) is constant if s is a string constant.
Also if you are using code already written (either standard library, 3rd party packages or your own), in most of the cases it is given what parameters and values you have to pass or are returned. E.g. if you read data from an io.Reader, you need to have a []byte which you have to pass to receive the read bytes, you can't use a string for that.
This example:
bb := []byte{'h','e','l','l','o',127}
What happens here is that you used a composite literal (slice literal) to create and initialize a new slice of type []byte (using Short variable declaration). You specified constants to list the initial elements of the slice. You also used a byte value 127 which - depending on the platform / console - may or may not have a visual representation.
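To check that the 127 byte is really part of the string even when the console shows nothing for it, you can print the length and a hex dump of the bytes. A small sketch (the helper name describe is made up):

```go
package main

import "fmt"

// describe returns the byte length of s and a space-separated hex dump
// of its bytes, which makes non-printable bytes visible.
func describe(s string) (int, string) {
	return len(s), fmt.Sprintf("% x", s)
}

func main() {
	bb := []byte{'h', 'e', 'l', 'l', 'o', 127}
	ss := string(bb)
	n, dump := describe(ss)
	fmt.Println(n)    // 6
	fmt.Println(dump) // 68 65 6c 6c 6f 7f
}
```

The trailing 7f is the byte 127 (the ASCII DEL control character), so nothing was lost in the conversion.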
Late, but I hope this helps.
In simple words
Bit: 0 or 1, the unit in which machines represent all information
Byte: 8 bits. In UTF-8, a character is encoded as one to four bytes (ASCII characters take exactly one byte).
[]type: slice of a given data type. Slices are dynamically sized views over arrays.
[]byte: a byte slice, i.e. a dynamically sized sequence of bytes. Each element is a single byte, which is not necessarily a whole character.
String: read-only slice of bytes, i.e. immutable
With all this in mind:
s := "Go"
bs := []byte(s)
fmt.Printf("%s", bs) // Output: Go
fmt.Printf("%d", bs) // Output: [71 111]
or
bs := []byte{71, 111}
fmt.Printf("%s", bs) // Output: Go
%s formats the byte slice as a string
%d prints the decimal value of each byte
IMPORTANT:
As strings are immutable, they cannot be changed in memory; each time you add or remove something from a string, Go creates a new string in memory. Byte slices, on the other hand, are mutable, so updating a byte slice in place does not allocate anything new (appending beyond its capacity still does).
So choosing the right structure could make a difference in your app performance.
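As a rough sketch of what that can look like in practice: a byte slice can be modified in place, and when many pieces have to be joined into one string, strings.Builder avoids creating a new string for every concatenation. The function names here are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// upperInPlace upper-cases ASCII letters by mutating buf directly;
// the backing array is reused, nothing new is allocated.
func upperInPlace(buf []byte) {
	for i, b := range buf {
		if b >= 'a' && b <= 'z' {
			buf[i] = b - ('a' - 'A')
		}
	}
}

// joinWords builds one string from many pieces with strings.Builder
// instead of repeated += concatenation.
func joinWords(words []string) string {
	var sb strings.Builder
	for _, w := range words {
		sb.WriteString(w)
	}
	return sb.String()
}

func main() {
	buf := []byte("go")
	upperInPlace(buf)
	fmt.Println(string(buf))                          // GO
	fmt.Println(joinWords([]string{"by", "te", "s"})) // bytes
}
```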
The charCodeAt() method in JavaScript returns the numeric Unicode value of the character at the given index, e.g.
"s".charCodeAt(0) // returns 115
How would I go about getting the numeric Unicode value of the same string/letter in Go?
The character type in Go is rune which is an alias for int32 so it is already a number, just print it.
You still need a way to get the character at the specified position. Simplest way is to convert the string to a []rune which you can index. To convert a string to runes, simply use the type conversion []rune("some string"):
fmt.Println([]rune("s")[0])
Prints:
115
If you want it printed as a character, use the %c format string:
fmt.Println([]rune("absdef")[2]) // Also prints 115
fmt.Printf("%c", []rune("absdef")[2]) // Prints s
Also note that the for range on a string iterates over the runes of the string, so you can also use that. It is more efficient than converting the whole string to []rune:
i := 0
for _, r := range "absdef" {
	if i == 2 {
		fmt.Println(r)
		break
	}
	i++
}
Note that the counter i must be a distinct counter, it cannot be the loop iteration variable, as the for range returns the byte position and not the rune index (which will be different if the string contains multi-byte characters in the UTF-8 representation).
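To make the difference between byte position and rune index concrete, here is a small sketch that collects the index values for range yields on a string with multi-byte characters (the helper name runeOffsets is made up):

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts,
// i.e. the first value that for range yields on a string.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// 'a' is 1 byte, 'é' is 2 bytes and '你' is 3 bytes in UTF-8.
	fmt.Println(runeOffsets("aé你")) // [0 1 3]
}
```

The third rune has rune index 2 but starts at byte offset 3, which is why a separate counter is needed.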
Wrapping it into a function:
func charCodeAt(s string, n int) rune {
	i := 0
	for _, r := range s {
		if i == n {
			return r
		}
		i++
	}
	return 0
}
Try these on the Go Playground.
Also note that strings in Go are stored in memory as the UTF-8 encoded byte sequence of the text (read the blog post Strings, bytes, runes and characters in Go for more info). If you have guarantees that the string only uses characters whose code is less than 128 (i.e. ASCII), you can simply work with bytes: indexing a string in Go indexes its bytes, so for example "s"[0] is the byte value of 's', which is 115.
fmt.Println("s"[0]) // Prints 115
fmt.Println("absdef"[2]) // Prints 115
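For a non-ASCII character the two approaches give different numbers, which this small sketch shows (the helper names firstByte and firstRune are made up):

```go
package main

import "fmt"

// firstByte returns the first byte of s (s must be non-empty).
func firstByte(s string) byte { return s[0] }

// firstRune returns the first code point of s (s must be non-empty).
func firstRune(s string) rune { return []rune(s)[0] }

func main() {
	fmt.Println(firstByte("s"), firstRune("s")) // 115 115
	fmt.Println(firstByte("é"), firstRune("é")) // 195 233
}
```

For 'é' (U+00E9) the byte index returns only the first byte of its two-byte UTF-8 encoding, while the rune conversion returns the code point itself.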
Internally, a string in Go is a sequence of 8-bit bytes. For ASCII text, each byte is the ASCII value of one character (this does not hold for multi-byte UTF-8 characters):
str := "abc"
byteValue := str[0]
intValue := int(byteValue)
fmt.Println(byteValue) // 97
fmt.Println(intValue)  // 97
I wonder how I can get a Unicode character from a string. For example, if the string is "你好", how can I get the first character "你"?
From another place I get one way:
var str = "你好"
runes := []rune(str)
fmt.Println(string(runes[0]))
It does work.
But I still have some questions:
Is there another way to do it?
Why in Go does str[0] not get a Unicode character from a string, but it gets byte data?
First, you may want to read https://blog.golang.org/strings
It will answer part of your questions.
A string in Go can contain arbitrary bytes. When you write str[i], the result is a byte, and the index is always a number of bytes.
Most of the time, strings are encoded in UTF-8 though. You have multiple ways to deal with UTF-8 encoding in a string.
For instance, you can use the for...range statement to iterate on a string rune by rune.
var first rune
for _,c := range str {
first = c
break
}
// first now contains the first rune of the string
You can also leverage the unicode/utf8 package. For instance:
r, size := utf8.DecodeRuneInString(str)
// r contains the first rune of the string
// size is the size of the rune in bytes
If the string is encoded in UTF-8, there is no direct way to access the nth rune of the string, because the size of the runes (in bytes) is not constant. If you need this feature, you can easily write your own helper function to do it (with for...range, or with the unicode/utf8 package).
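Such a helper could look like the following sketch, which uses utf8.DecodeRuneInString to walk the string one rune at a time. The name nthRune and the boolean "found" result are my own choices for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// nthRune returns the n-th rune (0-based) of s by decoding one rune at a time.
// The boolean is false when n is negative or s has fewer than n+1 runes.
func nthRune(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		if n == 0 {
			return r, true
		}
		s = s[size:] // skip the bytes of the rune just decoded
		n--
	}
	return 0, false
}

func main() {
	r, ok := nthRune("你好", 1)
	fmt.Println(string(r), ok) // 好 true
	_, ok = nthRune("你好", 2)
	fmt.Println(ok) // false
}
```

Note that this is still a linear scan: it avoids allocating a []rune, but it cannot jump directly to the n-th rune.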
You can use the utf8string package:
package main
import "golang.org/x/exp/utf8string"
func main() {
s := utf8string.NewString("ÄÅàâäåçèéêëìîïü")
// example 1
r := s.At(1)
println(r == 'Å')
// example 2
t := s.Slice(1, 3)
println(t == "Åà")
}
https://pkg.go.dev/golang.org/x/exp/utf8string
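You can do this:
func main() {
	str := "cat"
	var s rune
	for i, c := range str {
		if i == 2 {
			s = c
		}
	}
	fmt.Println(string(s)) // t
}
s is now equal to 't', the rune that starts at byte offset 2. Note that i is a byte offset, not a rune index, so this only matches the character position when the preceding characters are single-byte (ASCII).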