Why strings.HasPrefix is faster than bytes.HasPrefix? - string

In my code, I have such benchmark:
const STR = "abcd"
const PREFIX = "ab"
var STR_B = []byte(STR)
var PREFIX_B = []byte(PREFIX)
func BenchmarkStrHasPrefix(b *testing.B) {
for i := 0; i < b.N; i++ {
strings.HasPrefix(STR, PREFIX)
}
}
func BenchmarkBytHasPrefix(b *testing.B) {
for i := 0; i < b.N; i++ {
bytes.HasPrefix(STR_B, PREFIX_B)
}
}
And I am little bit confused about results:
BenchmarkStrHasPrefix-4 300000000 4.67 ns/op
BenchmarkBytHasPrefix-4 200000000 8.05 ns/op
Why there is up to 2x difference?
Thanks.

The main reason is the difference in calling cost of bytes.HasPrefix() and strings.HasPrefix(). As #tomasz pointed out in his comment, strings.HashPrefix() is inlined by default while bytes.HasPrefix() is not.
Further reason is the different parameter types: bytes.HasPrefix() takes 2 slices (2 slice descriptors). strings.HasPrefix() takes 2 strings (2 string headers). Slice descriptors contain a pointer and 2 ints: length and capacity, see reflect.SliceHeader. String headers contain only a pointer and an int: the length, see reflect.StringHeader.
We can prove this if we manually inline the HasPrefix() functions in the benchmark functions, so we eliminate the calling costs (zero both). By inlining them, no function call will be made (to them).
HasPrefix() implementations:
// HasPrefix tests whether the byte slice s begins with prefix.
func HasPrefix(s, prefix []byte) bool {
return len(s) >= len(prefix) && Equal(s[0:len(prefix)], prefix)
}
// HasPrefix tests whether the string s begins with prefix.
func HasPrefix(s, prefix string) bool {
return len(s) >= len(prefix) && s[0:len(prefix)] == prefix
}
Benchmark functions after inlining them:
func BenchmarkStrHasPrefix(b *testing.B) {
s, prefix := STR, PREFIX
for i := 0; i < b.N; i++ {
_ = len(s) >= len(prefix) && s[0:len(prefix)] == prefix
}
}
func BenchmarkBytHasPrefix(b *testing.B) {
s, prefix := STR_B, PREFIX_B
for i := 0; i < b.N; i++ {
_ = len(s) >= len(prefix) && bytes.Equal(s[0:len(prefix)], prefix)
}
}
Running these you get very close results:
BenchmarkStrHasPrefix-2 300000000 5.88 ns/op
BenchmarkBytHasPrefix-2 200000000 6.17 ns/op
The reason for the small difference in the inlined benchmarks may be that both functions test the presence of the prefix by slicing the string and []byte operand. And since strings are comparable while byte slices are not, BenchmarkBytHasPrefix() requires an additional function call to bytes.Equal() compared to BenchmarkStrHasPrefix() (and the extra function call also includes making copies of its parameters: 2 slice headers).
Other things that may slightly contribute to the original different results: arguments used in BenchmarkStrHasPrefix() are constants, while parameters used in BenchmarkBytHasPrefix() are variables.
You shouldn't worry much about the performance difference, both functions complete in just a few nanoseconds.
Note that the "implementation" of bytes.Equal():
func Equal(a, b []byte) bool // ../runtime/asm_$GOARCH.s
This may be inlined in some platforms resulting in no additional call costs.

Related

Splitting a rune correctly in golang

I'm wondering if there is an easy way, such as well known functions to handle code points/runes, to take a chunk out of the middle of a rune slice without messing it up or if it's all needs to coded ourselves to get down to something equal to or less than a maximum number of bytes.
Specifically, what I am looking to do is pass a string to a function, convert it to runes so that I can respect code points and if the slice is longer than some maximum bytes, remove enough runes from the center of the runes to get the bytes down to what's necessary.
This is simple math if the strings are just single byte characters and be handled something like:
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
excess := len(in) - maxLen
start := maxLen/2 - excess/2
return in[:start] + in[start+excess:]
}
return in
}
but in a variable character width byte string it's either going to be a fair bit more coding looping through or there will be nice functions to make this easy. Does anyone have a code sample of how to best handle such a thing with runes?
The idea here is that the DB field the string will go into has a fixed maximum length in bytes, not code points so there needs to be some algorithm from runes to maximum bytes. The reason for taking the characters from the the middle of the string is just the needs of this particular program.
Thanks!
EDIT:
Once I found out that the range operator respected runes on strings this became easy to do with just strings which I found because of the great answers below. I shouldn't have to worry about the string being a well formed UTF format in this case but if I do I now know about the UTF module, thanks!
Here's what I ended up with:
package main
import (
"fmt"
)
func ShortenStringIDToMaxLength(in string, maxLen int) string {
if maxLen < 1 {
// Panic/log whatever is your error system of choice.
}
bytes := len(in)
if bytes > maxLen {
excess := bytes - maxLen
lPos := bytes/2 - excess/2
lastPos := 0
for pos, _ := range in {
if pos > lPos {
lPos = lastPos
break
}
lastPos = pos
}
rPos := lPos + excess
for pos, _ := range in[lPos:] {
if pos >= excess {
rPos = pos
break
}
}
return in[:lPos] + in[lPos+rPos:]
}
return in
}
func main() {
out := ShortenStringIDToMaxLength(`123456789 123456789`, 5)
fmt.Println(out, len(out))
}
https://play.golang.org/p/YLGlj_17A-j
Here is an adaptation of your algorithm, which removes incomplete runes from the beginning of your prefix and the end of your suffix :
func TrimLastIncompleteRune(s string) string {
l := len(s)
for i := 1; i <= l; i++ {
suff := s[l-i : l]
// repeatedly try to decode a rune from the last bytes in string
r, cnt := utf8.DecodeRuneInString(suff)
if r == utf8.RuneError {
continue
}
// if success : return the substring which contains
// this succesfully decoded rune
lgth := l - i + cnt
return s[:lgth]
}
return ""
}
func TrimFirstIncompleteRune(s string) string {
// repeatedly try to decode a rune from the beginning
for i := 0; i < len(s); i++ {
if r, _ := utf8.DecodeRuneInString(s[i:]); r != utf8.RuneError {
// if success : return
return s[i:]
}
}
return ""
}
func shortenStringIDToMaxLength(in string, maxLen int) string {
if len(in) > maxLen {
firstHalf := maxLen / 2
secondHalf := len(in) - (maxLen - firstHalf)
prefix := TrimLastIncompleteRune(in[:firstHalf])
suffix := TrimFirstIncompleteRune(in[secondHalf:])
return prefix + suffix
}
return in
}
link on play.golang.org
This algorithm only tries to drop more bytes from the selected prefix and suffix.
If it turns out that you need to drop 3 bytes from the suffix to have a valid rune, for example, it does not try to see if it can add 3 more bytes to the prefix, to have an end result closer to maxLen bytes.
You can use simple arithmetic to find start and end such that the string s[:start] + s[end:] is shorter than your byte limit. But you need to make sure that start and end are both the first byte of any utf-8 sequence to keep the sequence valid.
UTF-8 has the property that any given byte is the first byte of a sequence as long as its top two bits aren't 10.
So you can write code something like this (playground: https://play.golang.org/p/xk_Yo_1wTYc)
package main
import (
"fmt"
)
func truncString(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
start := (maxLen + 1) / 2
for start > 0 && s[start]>>6 == 0b10 {
start--
}
end := len(s) - (maxLen - start)
for end < len(s) && s[end]>>6 == 0b10 {
end++
}
return s[:start] + s[end:]
}
func main() {
fmt.Println(truncString("this is a test", 5))
fmt.Println(truncString("日本語", 7))
}
This code has the desirable property that it takes O(maxLen) time, no matter how long the input string (assuming it's valid utf-8).

Using the == symbol in golang and using a loop to compare if string a equals string b,which performance is better?

for i:=0;i<len(a);i++{
if a[i] != b[i]{
return false
}
}
and just
a == b
I've found that the same string have different address
a := "abc"
b := "abc"
println(&a)
println(&b)
answer is :
0xc420045f68
0xc420045f58
so == not using address to compare.
In fact, I would like to know how == compares two strings.
I am searching for a long time on net. But failed...
You should use the == operator to compare strings. It compares the content of the string values.
What you print is the address of a and b variables. Since they are 2 distinct non-zero size variables, their addresses cannot be the same by definition. The values they hold of course may or may not be the same. The == operator compares the values the variables hold, not the addresses of the variables.
Your solution with the loop might even result in a runtime panic, if the b string is shorter than a, as you index it with values that are valid for a.
The built-in == operator will likely always outperform any loop, as that is implemented in architecture specific assembly code. It is implemented in the runtime package, unexported function memequal().
Also note that the built-in comparison might even omit checking the actual contents of the texts if their string header points to the same data (and have equal length). There is no reason not to use ==.
The only reason where a custom equal function for string values would make sense is where the heuristics of your strings are known. E.g. if you know all the string values have the same prefix and they may only differ in their last character. In this case you could write a comparator function which only compares the last character of the strings to decide if they are equal (and only, optionally revert to actually compare the rest). This solution would of course not use a loop.
Use the Go == operator for string equality. The Go gc and gccgo compilers are optimizing compilers. The Go runtime has been optimized.
This comment in the strings package documentation for the strings.Compare function is also relevant to equality:
Compare is included only for symmetry with package bytes. It is
usually clearer and always faster to use the built-in string
comparison operators ==, <, >, and so on.
In Go, the runtime representation of a string is a struct:
type StringHeader struct {
Data uintptr // byte array pointer
Len int // byte array length
}
When you assign a Go string value to a variable,
s := "ABC"
the memory location allocated for the variable s is set to the value of a StringHeader struct describing the string. The address of the variable, &s, points to the struct, not to the underlying byte array.
A comparison for Go string equality compares the bytes of the underlying arrays, the values of *StringHeader.Data[0:StringHeader.Len].
In Go, we use the Go testing package to benchmark performance. For example, comparing the Go == operator to two Go string equality functions:
Output:
$ go test equal_test.go -bench=. -benchmem
BenchmarkEqualOper-4 500000000 3.19 ns/op 0 B/op 0 allocs/op
BenchmarkEqualFunc1-4 500000000 3.32 ns/op 0 B/op 0 allocs/op
BenchmarkEqualFunc2-4 500000000 3.61 ns/op 0 B/op 0 allocs/op
$ go version
go version devel +bb222cde10 Mon Jun 11 14:47:06 2018 +0000 linux/amd64
$
equal_test.go:
package main
import (
"reflect"
"testing"
"unsafe"
)
func EqualOper(a, b string) bool {
return a == b
}
func EqualFunc1(a, b string) bool {
if len(a) != len(b) {
return false
}
for i := 0; i < len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
func EqualFunc2(a, b string) bool {
if len(a) != len(b) {
return false
}
if len(a) == 0 {
return true
}
// string intern equality
if (*reflect.StringHeader)(unsafe.Pointer(&a)).Data == (*reflect.StringHeader)(unsafe.Pointer(&b)).Data {
return true
}
for i := 0; i < len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
var y, z = "aby", "abz"
func BenchmarkEqualOper(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualOper(a, b)
}
}
func BenchmarkEqualFunc1(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualFunc1(a, b)
}
}
func BenchmarkEqualFunc2(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualFunc2(a, b)
}
}

How to generate a random string of a fixed length in Go?

I want a random string of characters only (uppercase or lowercase), no numbers, in Go. What is the fastest and simplest way to do this?
Paul's solution provides a simple, general solution.
The question asks for the "the fastest and simplest way". Let's address the fastest part too. We'll arrive at our final, fastest code in an iterative manner. Benchmarking each iteration can be found at the end of the answer.
All the solutions and the benchmarking code can be found on the Go Playground. The code on the Playground is a test file, not an executable. You have to save it into a file named XX_test.go and run it with
go test -bench . -benchmem
Foreword:
The fastest solution is not a go-to solution if you just need a random string. For that, Paul's solution is perfect. This is if performance does matter. Although the first 2 steps (Bytes and Remainder) might be an acceptable compromise: they do improve performance by like 50% (see exact numbers in the II. Benchmark section), and they don't increase complexity significantly.
Having said that, even if you don't need the fastest solution, reading through this answer might be adventurous and educational.
I. Improvements
1. Genesis (Runes)
As a reminder, the original, general solution we're improving is this:
func init() {
rand.Seed(time.Now().UnixNano())
}
var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
func RandStringRunes(n int) string {
b := make([]rune, n)
for i := range b {
b[i] = letterRunes[rand.Intn(len(letterRunes))]
}
return string(b)
}
2. Bytes
If the characters to choose from and assemble the random string contains only the uppercase and lowercase letters of the English alphabet, we can work with bytes only because the English alphabet letters map to bytes 1-to-1 in the UTF-8 encoding (which is how Go stores strings).
So instead of:
var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
we can use:
var letters = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
Or even better:
const letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
Now this is already a big improvement: we could achieve it to be a const (there are string constants but there are no slice constants). As an extra gain, the expression len(letters) will also be a const! (The expression len(s) is constant if s is a string constant.)
And at what cost? Nothing at all. strings can be indexed which indexes its bytes, perfect, exactly what we want.
Our next destination looks like this:
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
func RandStringBytes(n int) string {
b := make([]byte, n)
for i := range b {
b[i] = letterBytes[rand.Intn(len(letterBytes))]
}
return string(b)
}
3. Remainder
Previous solutions get a random number to designate a random letter by calling rand.Intn() which delegates to Rand.Intn() which delegates to Rand.Int31n().
This is much slower compared to rand.Int63() which produces a random number with 63 random bits.
So we could simply call rand.Int63() and use the remainder after dividing by len(letterBytes):
func RandStringBytesRmndr(n int) string {
b := make([]byte, n)
for i := range b {
b[i] = letterBytes[rand.Int63() % int64(len(letterBytes))]
}
return string(b)
}
This works and is significantly faster, the disadvantage is that the probability of all the letters will not be exactly the same (assuming rand.Int63() produces all 63-bit numbers with equal probability). Although the distortion is extremely small as the number of letters 52 is much-much smaller than 1<<63 - 1, so in practice this is perfectly fine.
To make this understand easier: let's say you want a random number in the range of 0..5. Using 3 random bits, this would produce the numbers 0..1 with double probability than from the range 2..5. Using 5 random bits, numbers in range 0..1 would occur with 6/32 probability and numbers in range 2..5 with 5/32 probability which is now closer to the desired. Increasing the number of bits makes this less significant, when reaching 63 bits, it is negligible.
4. Masking
Building on the previous solution, we can maintain the equal distribution of letters by using only as many of the lowest bits of the random number as many is required to represent the number of letters. So for example if we have 52 letters, it requires 6 bits to represent it: 52 = 110100b. So we will only use the lowest 6 bits of the number returned by rand.Int63(). And to maintain equal distribution of letters, we only "accept" the number if it falls in the range 0..len(letterBytes)-1. If the lowest bits are greater, we discard it and query a new random number.
Note that the chance of the lowest bits to be greater than or equal to len(letterBytes) is less than 0.5 in general (0.25 on average), which means that even if this would be the case, repeating this "rare" case decreases the chance of not finding a good number. After n repetition, the chance that we still don't have a good index is much less than pow(0.5, n), and this is just an upper estimation. In case of 52 letters the chance that the 6 lowest bits are not good is only (64-52)/64 = 0.19; which means for example that chances to not have a good number after 10 repetition is 1e-8.
So here is the solution:
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
)
func RandStringBytesMask(n int) string {
b := make([]byte, n)
for i := 0; i < n; {
if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i++
}
}
return string(b)
}
5. Masking Improved
The previous solution only uses the lowest 6 bits of the 63 random bits returned by rand.Int63(). This is a waste as getting the random bits is the slowest part of our algorithm.
If we have 52 letters, that means 6 bits code a letter index. So 63 random bits can designate 63/6 = 10 different letter indices. Let's use all those 10:
const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
const (
letterIdxBits = 6 // 6 bits to represent a letter index
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
letterIdxMax = 63 / letterIdxBits // # of letter indices fitting in 63 bits
)
func RandStringBytesMaskImpr(n int) string {
b := make([]byte, n)
// A rand.Int63() generates 63 random bits, enough for letterIdxMax letters!
for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = rand.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(b)
}
6. Source
The Masking Improved is pretty good, not much we can improve on it. We could, but not worth the complexity.
Now let's find something else to improve. The source of random numbers.
There is a crypto/rand package which provides a Read(b []byte) function, so we could use that to get as many bytes with a single call as many we need. This wouldn't help in terms of performance as crypto/rand implements a cryptographically secure pseudorandom number generator so it's much slower.
So let's stick to the math/rand package. The rand.Rand uses a rand.Source as the source of random bits. rand.Source is an interface which specifies a Int63() int64 method: exactly and the only thing we needed and used in our latest solution.
So we don't really need a rand.Rand (either explicit or the global, shared one of the rand package), a rand.Source is perfectly enough for us:
var src = rand.NewSource(time.Now().UnixNano())
func RandStringBytesMaskImprSrc(n int) string {
b := make([]byte, n)
// A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = src.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return string(b)
}
Also note that this last solution doesn't require you to initialize (seed) the global Rand of the math/rand package as that is not used (and our rand.Source is properly initialized / seeded).
One more thing to note here: package doc of math/rand states:
The default Source is safe for concurrent use by multiple goroutines.
So the default source is slower than a Source that may be obtained by rand.NewSource(), because the default source has to provide safety under concurrent access / use, while rand.NewSource() does not offer this (and thus the Source returned by it is more likely to be faster).
7. Utilizing strings.Builder
All previous solutions return a string whose content is first built in a slice ([]rune in Genesis, and []byte in subsequent solutions), and then converted to string. This final conversion has to make a copy of the slice's content, because string values are immutable, and if the conversion would not make a copy, it could not be guaranteed that the string's content is not modified via its original slice. For details, see How to convert utf8 string to []byte? and golang: []byte(string) vs []byte(*string).
Go 1.10 introduced strings.Builder. strings.Builder is a new type we can use to build contents of a string similar to bytes.Buffer. Internally it uses a []byte to build the content, and when we're done, we can obtain the final string value using its Builder.String() method. But what's cool in it is that it does this without performing the copy we just talked about above. It dares to do so because the byte slice used to build the string's content is not exposed, so it is guaranteed that no one can modify it unintentionally or maliciously to alter the produced "immutable" string.
So our next idea is to not build the random string in a slice, but with the help of a strings.Builder, so once we're done, we can obtain and return the result without having to make a copy of it. This may help in terms of speed, and it will definitely help in terms of memory usage and allocations.
func RandStringBytesMaskImprSrcSB(n int) string {
sb := strings.Builder{}
sb.Grow(n)
// A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = src.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
sb.WriteByte(letterBytes[idx])
i--
}
cache >>= letterIdxBits
remain--
}
return sb.String()
}
Do note that after creating a new strings.Buidler, we called its Builder.Grow() method, making sure it allocates a big-enough internal slice (to avoid reallocations as we add the random letters).
8. "Mimicing" strings.Builder with package unsafe
strings.Builder builds the string in an internal []byte, the same as we did ourselves. So basically doing it via a strings.Builder has some overhead, the only thing we switched to strings.Builder for is to avoid the final copying of the slice.
strings.Builder avoids the final copy by using package unsafe:
// String returns the accumulated string.
func (b *Builder) String() string {
return *(*string)(unsafe.Pointer(&b.buf))
}
The thing is, we can also do this ourselves, too. So the idea here is to switch back to building the random string in a []byte, but when we're done, don't convert it to string to return, but do an unsafe conversion: obtain a string which points to our byte slice as the string data.
This is how it can be done:
func RandStringBytesMaskImprSrcUnsafe(n int) string {
b := make([]byte, n)
// A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
if remain == 0 {
cache, remain = src.Int63(), letterIdxMax
}
if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
b[i] = letterBytes[idx]
i--
}
cache >>= letterIdxBits
remain--
}
return *(*string)(unsafe.Pointer(&b))
}
(9. Using rand.Read())
Go 1.7 added a rand.Read() function and a Rand.Read() method. We should be tempted to use these to read as many bytes as we need in one step, in order to achieve better performance.
There is one small "problem" with this: how many bytes do we need? We could say: as many as the number of output letters. We would think this is an upper estimation, as a letter index uses less than 8 bits (1 byte). But at this point we are already doing worse (as getting the random bits is the "hard part"), and we're getting more than needed.
Also note that to maintain equal distribution of all letter indices, there might be some "garbage" random data that we won't be able to use, so we would end up skipping some data, and thus end up short when we go through all the byte slice. We would need to further get more random bytes, "recursively". And now we're even losing the "single call to rand package" advantage...
We could "somewhat" optimize the usage of the random data we acquire from math.Rand(). We may estimate how many bytes (bits) we'll need. 1 letter requires letterIdxBits bits, and we need n letters, so we need n * letterIdxBits / 8.0 bytes rounding up. We can calculate the probability of a random index not being usable (see above), so we could request more that will "more likely" be enough (if it turns out it's not, we repeat the process). We can process the byte slice as a "bit stream" for example, for which we have a nice 3rd party lib: github.com/icza/bitio (disclosure: I'm the author).
But Benchmark code still shows we're not winning. Why is it so?
The answer to the last question is because rand.Read() uses a loop and keeps calling Source.Int63() until it fills the passed slice. Exactly what the RandStringBytesMaskImprSrc() solution does, without the intermediate buffer, and without the added complexity. That's why RandStringBytesMaskImprSrc() remains on the throne. Yes, RandStringBytesMaskImprSrc() uses an unsynchronized rand.Source unlike rand.Read(). But the reasoning still applies; and which is proven if we use Rand.Read() instead of rand.Read() (the former is also unsynchronzed).
II. Benchmark
All right, it's time for benchmarking the different solutions.
Moment of truth:
BenchmarkRunes-4 2000000 723 ns/op 96 B/op 2 allocs/op
BenchmarkBytes-4 3000000 550 ns/op 32 B/op 2 allocs/op
BenchmarkBytesRmndr-4 3000000 438 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMask-4 3000000 534 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImpr-4 10000000 176 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrc-4 10000000 139 ns/op 32 B/op 2 allocs/op
BenchmarkBytesMaskImprSrcSB-4 10000000 134 ns/op 16 B/op 1 allocs/op
BenchmarkBytesMaskImprSrcUnsafe-4 10000000 115 ns/op 16 B/op 1 allocs/op
Just by switching from runes to bytes, we immediately have 24% performance gain, and memory requirement drops to one third.
Getting rid of rand.Intn() and using rand.Int63() instead gives another 20% boost.
Masking (and repeating in case of big indices) slows down a little (due to repetition calls): -22%...
But when we make use of all (or most) of the 63 random bits (10 indices from one rand.Int63() call): that speeds up big time: 3 times.
If we settle with a (non-default, new) rand.Source instead of rand.Rand, we again gain 21%.
If we utilize strings.Builder, we gain a tiny 3.5% in speed, but we also achieved 50% reduction in memory usage and allocations! That's nice!
Finally if we dare to use package unsafe instead of strings.Builder, we again gain a nice 14%.
Comparing the final to the initial solution: RandStringBytesMaskImprSrcUnsafe() is 6.3 times faster than RandStringRunes(), uses one sixth memory and half as few allocations. Mission accomplished.
You can just write code for it. This code can be a little simpler if you want to rely on the letters all being single bytes when encoded in UTF-8.
package main
import (
"fmt"
"time"
"math/rand"
)
func init() {
rand.Seed(time.Now().UnixNano())
}
var letters = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
func randSeq(n int) string {
b := make([]rune, n)
for i := range b {
b[i] = letters[rand.Intn(len(letters))]
}
return string(b)
}
func main() {
fmt.Println(randSeq(10))
}
Use package uniuri, which generates cryptographically secure uniform (unbiased) strings.
Disclaimer: I'm the author of the package
Simple solution for you, with least duplicate result:
import (
"fmt"
"math/rand"
"time"
)
func randomString(length int) string {
rand.Seed(time.Now().UnixNano())
b := make([]byte, length+2)
rand.Read(b)
return fmt.Sprintf("%x", b)[2 : length+2]
}
Check it out in the PlayGround
Two possible options (there might be more of course):
You can use the crypto/rand package that supports reading random byte arrays (from /dev/urandom) and is geared towards cryptographic random generation. see http://golang.org/pkg/crypto/rand/#example_Read . It might be slower than normal pseudo-random number generation though.
Take a random number and hash it using md5 or something like this.
If you want cryptographically secure random numbers, and the exact charset is flexible (say, base64 is fine), you can calculate exactly what the length of random characters you need from the desired output size.
Base 64 text is 1/3 longer than base 256. (2^8 vs 2^6; 8bits/6bits = 1.333 ratio)
import (
"crypto/rand"
"encoding/base64"
"math"
)
func randomBase64String(l int) string {
buff := make([]byte, int(math.Ceil(float64(l)/float64(1.33333333333))))
rand.Read(buff)
str := base64.RawURLEncoding.EncodeToString(buff)
return str[:l] // strip 1 extra character we get from odd length results
}
Note: you can also use RawStdEncoding if you prefer + and / characters to - and _
If you want hex, base 16 is 2x longer than base 256. (2^8 vs 2^4; 8bits/4bits = 2x ratio)
import (
"crypto/rand"
"encoding/hex"
"math"
)
func randomBase16String(l int) string {
buff := make([]byte, int(math.Ceil(float64(l)/2)))
rand.Read(buff)
str := hex.EncodeToString(buff)
return str[:l] // strip 1 extra character we get from odd length results
}
However, you could extend this to any arbitrary character set if you have a base256 to baseN encoder for your character set. You can do the same size calculation with how many bits are needed to represent your character set. The ratio calculation for any arbitrary charset is: ratio = 8 / log2(len(charset))).
Though both of these solutions are secure, simple, should be fast, and don't waste your crypto entropy pool.
Here's the playground showing it works for any size. https://play.golang.org/p/_yF_xxXer0Z
Another version, inspired from generate password in JavaScript crypto:
package main
import (
"crypto/rand"
"fmt"
)
var chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-"
func shortID(length int) string {
ll := len(chars)
b := make([]byte, length)
rand.Read(b) // generates len(b) random bytes
for i := 0; i < length; i++ {
b[i] = chars[int(b[i])%ll]
}
return string(b)
}
func main() {
fmt.Println(shortID(18))
fmt.Println(shortID(18))
fmt.Println(shortID(18))
}
Following icza's wonderfully explained solution, here is a modification of it that uses crypto/rand instead of math/rand.
const (
letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" // 52 possibilities
letterIdxBits = 6 // 6 bits to represent 64 possibilities / indexes
letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
)
func SecureRandomAlphaString(length int) string {
result := make([]byte, length)
bufferSize := int(float64(length)*1.3)
for i, j, randomBytes := 0, 0, []byte{}; i < length; j++ {
if j%bufferSize == 0 {
randomBytes = SecureRandomBytes(bufferSize)
}
if idx := int(randomBytes[j%length] & letterIdxMask); idx < len(letterBytes) {
result[i] = letterBytes[idx]
i++
}
}
return string(result)
}
// SecureRandomBytes returns the requested number of bytes using crypto/rand
func SecureRandomBytes(length int) []byte {
var randomBytes = make([]byte, length)
_, err := rand.Read(randomBytes)
if err != nil {
log.Fatal("Unable to generate random bytes")
}
return randomBytes
}
If you want a more generic solution, that allows you to pass in the slice of character bytes to create the string out of, you can try using this:
// SecureRandomString returns a string of the requested length,
// made from the byte characters provided (only ASCII allowed).
// Uses crypto/rand for security. Will panic if len(availableCharBytes) > 256.
func SecureRandomString(availableCharBytes string, length int) string {
// Compute bitMask
availableCharLength := len(availableCharBytes)
if availableCharLength == 0 || availableCharLength > 256 {
panic("availableCharBytes length must be greater than 0 and less than or equal to 256")
}
var bitLength byte
var bitMask byte
for bits := availableCharLength - 1; bits != 0; {
bits = bits >> 1
bitLength++
}
bitMask = 1<<bitLength - 1
// Compute bufferSize
bufferSize := length + length / 3
// Create random string
result := make([]byte, length)
for i, j, randomBytes := 0, 0, []byte{}; i < length; j++ {
if j%bufferSize == 0 {
// Random byte buffer is empty, get a new one
randomBytes = SecureRandomBytes(bufferSize)
}
// Mask bytes to get an index into the character slice
if idx := int(randomBytes[j%length] & bitMask); idx < availableCharLength {
result[i] = availableCharBytes[idx]
i++
}
}
return string(result)
}
If you want to pass in your own source of randomness, it would be trivial to modify the above to accept an io.Reader instead of using crypto/rand.
Here is my way ) Use math rand or crypto rand as you wish.
func randStr(len int) string {
buff := make([]byte, len)
rand.Read(buff)
str := base64.StdEncoding.EncodeToString(buff)
// Base 64 can be longer than len
return str[:len]
}
Here is a simple and performant solution for a cryptographically secure random string.
package main
import (
"crypto/rand"
"unsafe"
"fmt"
)
var alphabet = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
func main() {
fmt.Println(generate(16))
}
func generate(size int) string {
b := make([]byte, size)
rand.Read(b)
for i := 0; i < size; i++ {
b[i] = alphabet[b[i] % byte(len(alphabet))]
}
return *(*string)(unsafe.Pointer(&b))
}
Benchmark
Benchmark 95.2 ns/op 16 B/op 1 allocs/op
func Rand(n int) (str string) {
b := make([]byte, n)
rand.Read(b)
str = fmt.Sprintf("%x", b)
return
}
I usually do it like this if it takes an option to capitalize or not
func randomString(length int, upperCase bool) string {
rand.Seed(time.Now().UnixNano())
var alphabet string
if upperCase {
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
} else {
alphabet = "abcdefghijklmnopqrstuvwxyz"
}
var sb strings.Builder
l := len(alphabet)
for i := 0; i < length; i++ {
c := alphabet[rand.Intn(l)]
sb.WriteByte(c)
}
return sb.String()
}
and like this if you don't need capital letters
func randomString(length int) string {
rand.Seed(time.Now().UnixNano())
var alphabet string = "abcdefghijklmnopqrstuvwxyz"
var sb strings.Builder
l := len(alphabet)
for i := 0; i < length; i++ {
c := alphabet[rand.Intn(l)]
sb.WriteByte(c)
}
return sb.String()
}
If you are willing to add a few characters to your pool of allowed characters, you can make the code work with anything which provides random bytes through a io.Reader. Here we are using crypto/rand.
// len(encodeURL) == 64. This allows (x <= 265) x % 64 to have an even
// distribution.
const encodeURL = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
// A helper function create and fill a slice of length n with characters from
// a-zA-Z0-9_-. It panics if there are any problems getting random bytes.
func RandAsciiBytes(n int) []byte {
output := make([]byte, n)
// We will take n bytes, one byte for each character of output.
randomness := make([]byte, n)
// read all random
_, err := rand.Read(randomness)
if err != nil {
panic(err)
}
// fill output
for pos := range output {
// get random item
random := uint8(randomness[pos])
// random % 64
randomPos := random % uint8(len(encodeURL))
// put into output
output[pos] = encodeURL[randomPos]
}
return output
}
This is a sample code which I used to generate certificate number in my app.
func GenerateCertificateNumber() string {
CertificateLength := 7
t := time.Now().String()
CertificateHash, err := bcrypt.GenerateFromPassword([]byte(t), bcrypt.DefaultCost)
if err != nil {
fmt.Println(err)
}
// Make a Regex we only want letters and numbers
reg, err := regexp.Compile("[^a-zA-Z0-9]+")
if err != nil {
log.Fatal(err)
}
processedString := reg.ReplaceAllString(string(CertificateHash), "")
fmt.Println(string(processedString))
CertificateNumber := strings.ToUpper(string(processedString[len(processedString)-CertificateLength:]))
fmt.Println(CertificateNumber)
return CertificateNumber
}
/*
korzhao
*/
package rand
import (
crand "crypto/rand"
"math/rand"
"sync"
"time"
"unsafe"
)
// Doesn't share the rand library globally, reducing lock contention
type Rand struct {
Seed int64
Pool *sync.Pool
}
var (
MRand = NewRand()
randlist = []byte("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
)
// init random number generator
func NewRand() *Rand {
p := &sync.Pool{New: func() interface{} {
return rand.New(rand.NewSource(getSeed()))
},
}
mrand := &Rand{
Pool: p,
}
return mrand
}
// get the seed
func getSeed() int64 {
return time.Now().UnixNano()
}
func (s *Rand) getrand() *rand.Rand {
return s.Pool.Get().(*rand.Rand)
}
func (s *Rand) putrand(r *rand.Rand) {
s.Pool.Put(r)
}
// get a random number
func (s *Rand) Intn(n int) int {
r := s.getrand()
defer s.putrand(r)
return r.Intn(n)
}
// bulk get random numbers
func (s *Rand) Read(p []byte) (int, error) {
r := s.getrand()
defer s.putrand(r)
return r.Read(p)
}
func CreateRandomString(len int) string {
b := make([]byte, len)
_, err := MRand.Read(b)
if err != nil {
return ""
}
for i := 0; i < len; i++ {
b[i] = randlist[b[i]%(62)]
}
return *(*string)(unsafe.Pointer(&b))
}
24.0 ns/op 16 B/op 1 allocs/
As a follow-up to icza's brilliant solution, below I am using rand.Reader
func RandStringBytesMaskImprRandReaderUnsafe(length uint) (string, error) {
const (
charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
charIdxBits = 6 // 6 bits to represent a letter index
charIdxMask = 1<<charIdxBits - 1 // All 1-bits, as many as charIdxBits
charIdxMax = 63 / charIdxBits // # of letter indices fitting in 63 bits
)
buffer := make([]byte, length)
charsetLength := len(charset)
max := big.NewInt(int64(1 << uint64(charsetLength)))
limit, err := rand.Int(rand.Reader, max)
if err != nil {
return "", err
}
for index, cache, remain := int(length-1), limit.Int64(), charIdxMax; index >= 0; {
if remain == 0 {
limit, err = rand.Int(rand.Reader, max)
if err != nil {
return "", err
}
cache, remain = limit.Int64(), charIdxMax
}
if idx := int(cache & charIdxMask); idx < charsetLength {
buffer[index] = charset[idx]
index--
}
cache >>= charIdxBits
remain--
}
return *(*string)(unsafe.Pointer(&buffer)), nil
}
func BenchmarkBytesMaskImprRandReaderUnsafe(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
const length = 16
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
RandStringBytesMaskImprRandReaderUnsafe(length)
}
})
}
package main
import (
"encoding/base64"
"fmt"
"math/rand"
"time"
)
// customEncodeURL is like `bas64.encodeURL`
// except its made up entirely of uppercase characters:
const customEncodeURL = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKL"
// Random generates a random string.
// It is not cryptographically secure.
func Random(n int) string {
b := make([]byte, n)
rand.Seed(time.Now().UnixNano())
_, _ = rand.Read(b) // docs say that it always returns a nil error.
customEncoding := base64.NewEncoding(customEncodeURL).WithPadding(base64.NoPadding)
return customEncoding.EncodeToString(b)
}
func main() {
fmt.Println(Random(16))
}
const (
chars = "0123456789_abcdefghijkl-mnopqrstuvwxyz" //ABCDEFGHIJKLMNOPQRSTUVWXYZ
charsLen = len(chars)
mask = 1<<6 - 1
)
var rng = rand.NewSource(time.Now().UnixNano())
// RandStr 返回指定长度的随机字符串
func RandStr(ln int) string {
/* chars 38个字符
* rng.Int63() 每次产出64bit的随机数,每次我们使用6bit(2^6=64) 可以使用10次
*/
buf := make([]byte, ln)
for idx, cache, remain := ln-1, rng.Int63(), 10; idx >= 0; {
if remain == 0 {
cache, remain = rng.Int63(), 10
}
buf[idx] = chars[int(cache&mask)%charsLen]
cache >>= 6
remain--
idx--
}
return *(*string)(unsafe.Pointer(&buf))
}
BenchmarkRandStr16-8 20000000 68.1 ns/op 16 B/op 1 allocs/op

What is the fastest way to generate a long random string in Go?

Like [a-zA-Z0-9] string:
na1dopW129T0anN28udaZ
or hexadecimal string:
8c6f78ac23b4a7b8c0182d
By long I mean 2K and more characters.
This does about 200MBps on my box. There's obvious room for improvement.
type randomDataMaker struct {
src rand.Source
}
func (r *randomDataMaker) Read(p []byte) (n int, err error) {
for i := range p {
p[i] = byte(r.src.Int63() & 0xff)
}
return len(p), nil
}
You'd just use io.CopyN to produce the string you want. Obviously you could adjust the character set on the way in or whatever.
The nice thing about this model is that it's just an io.Reader so you can use it making anything.
Test is below:
func BenchmarkRandomDataMaker(b *testing.B) {
randomSrc := randomDataMaker{rand.NewSource(1028890720402726901)}
for i := 0; i < b.N; i++ {
b.SetBytes(int64(i))
_, err := io.CopyN(ioutil.Discard, &randomSrc, int64(i))
if err != nil {
b.Fatalf("Error copying at %v: %v", i, err)
}
}
}
On one core of my 2.2GHz i7:
BenchmarkRandomDataMaker 50000 246512 ns/op 202.83 MB/s
EDIT
Since I wrote the benchmark, I figured I'd do the obvious improvement thing (call out to the random less frequently). With 1/8 the calls to rand, it runs about 4x faster, though it's a big uglier:
New version:
func (r *randomDataMaker) Read(p []byte) (n int, err error) {
todo := len(p)
offset := 0
for {
val := int64(r.src.Int63())
for i := 0; i < 8; i++ {
p[offset] = byte(val & 0xff)
todo--
if todo == 0 {
return len(p), nil
}
offset++
val >>= 8
}
}
panic("unreachable")
}
New benchmark:
BenchmarkRandomDataMaker 200000 251148 ns/op 796.34 MB/s
EDIT 2
Took out the masking in the cast to byte since it was redundant. Got a good deal faster:
BenchmarkRandomDataMaker 200000 231843 ns/op 862.64 MB/s
(this is so much easier than real work sigh)
EDIT 3
This came up in irc today, so I released a library. Also, my actual benchmark tool, while useful for relative speed, isn't sufficiently accurate in its reporting.
I created randbo that you can reuse to produce random streams wherever you may need them.
You can use the Go package uniuri to generate random strings (or view the source code to see how they're doing it). You'll want to use:
func NewLen(length int) string
NewLen returns a new random string of the provided length, consisting of standard characters.
Or, to specify the set of characters used:
func NewLenChars(length int, chars []byte) string
This is actually a little biased towards the first 8 characters in the set (since 255 is not a multiple of len(alphanum)), but this will get you most of the way there.
import (
"crypto/rand"
)
func randString(n int) string {
const alphanum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
var bytes = make([]byte, n)
rand.Read(bytes)
for i, b := range bytes {
bytes[i] = alphanum[b % byte(len(alphanum))]
}
return string(bytes)
}
If you want to generate cryptographically secure random string, I recommend you to take a look at this page. Here is a helper function that reads n random bytes from the source of randomness of your OS and then use these bytes to base64encode it. Note that the string length would be bigger than n because of base64.
package main
import(
"crypto/rand"
"encoding/base64"
"fmt"
)
func GenerateRandomBytes(n int) ([]byte, error) {
b := make([]byte, n)
_, err := rand.Read(b)
if err != nil {
return nil, err
}
return b, nil
}
func GenerateRandomString(s int) (string, error) {
b, err := GenerateRandomBytes(s)
return base64.URLEncoding.EncodeToString(b), err
}
func main() {
token, _ := GenerateRandomString(32)
fmt.Println(token)
}
Here Evan Shaw's answer re-worked without the bias towards the first 8 characters of the string. Note that it uses lots of expensive big.Int operations so probably isn't that quick! The answer is crypto strong though.
It uses rand.Int to make an integer of exactly the right size len(alphanum) ** n, then does what is effectively a base conversion into base len(alphanum).
There is almost certainly a better algorithm for this which would involve keeping a much smaller remainder and adding random bytes to it as necessary. This would get rid of the expensive long integer arithmetic.
import (
"crypto/rand"
"fmt"
"math/big"
)
func randString(n int) string {
const alphanum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
symbols := big.NewInt(int64(len(alphanum)))
states := big.NewInt(0)
states.Exp(symbols, big.NewInt(int64(n)), nil)
r, err := rand.Int(rand.Reader, states)
if err != nil {
panic(err)
}
var bytes = make([]byte, n)
r2 := big.NewInt(0)
symbol := big.NewInt(0)
for i := range bytes {
r2.DivMod(r, symbols, symbol)
r, r2 = r2, r
bytes[i] = alphanum[symbol.Int64()]
}
return string(bytes)
}

How to reverse a string in Go?

How can we reverse a simple string in Go?
In Go1 rune is a builtin type.
func Reverse(s string) string {
runes := []rune(s)
for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
return string(runes)
}
Russ Cox, on the golang-nuts mailing list, suggests
package main
import "fmt"
func main() {
input := "The quick brown 狐 jumped over the lazy 犬"
// Get Unicode code points.
n := 0
rune := make([]rune, len(input))
for _, r := range input {
rune[n] = r
n++
}
rune = rune[0:n]
// Reverse
for i := 0; i < n/2; i++ {
rune[i], rune[n-1-i] = rune[n-1-i], rune[i]
}
// Convert back to UTF-8.
output := string(rune)
fmt.Println(output)
}
This works, without all the mucking about with functions:
func Reverse(s string) (result string) {
for _,v := range s {
result = string(v) + result
}
return
}
From Go example projects: golang/example/stringutil/reverse.go, by Andrew Gerrand
/*
Copyright 2014 Google Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Reverse returns its argument string reversed rune-wise left to right.
func Reverse(s string) string {
r := []rune(s)
for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
r[i], r[j] = r[j], r[i]
}
return string(r)
}
Go Playground for reverse a string
After reversing string "bròwn", the correct result should be "nwòrb", not "nẁorb".
Note the grave above the letter o.
For preserving Unicode combining characters such as "as⃝df̅" with reverse result "f̅ds⃝a",
please refer to another code listed below:
http://rosettacode.org/wiki/Reverse_a_string#Go
This works on unicode strings by considering 2 things:
range works on string by enumerating unicode characters
string can be constructed from int slices where each element is a unicode character.
So here it goes:
func reverse(s string) string {
o := make([]int, utf8.RuneCountInString(s));
i := len(o);
for _, c := range s {
i--;
o[i] = c;
}
return string(o);
}
There are too many answers here. Some of them are clear duplicates. But even from the left one, it is hard to select the best solution.
So I went through the answers, thrown away the one that does not work for unicode and also removed duplicates. I benchmarked the survivors to find the fastest. So here are the results with attribution (if you notice the answers that I missed, but worth adding, feel free to modify the benchmark):
Benchmark_rmuller-4 100000 19246 ns/op
Benchmark_peterSO-4 50000 28068 ns/op
Benchmark_russ-4 50000 30007 ns/op
Benchmark_ivan-4 50000 33694 ns/op
Benchmark_yazu-4 50000 33372 ns/op
Benchmark_yuku-4 50000 37556 ns/op
Benchmark_simon-4 3000 426201 ns/op
So here is the fastest method by rmuller:
func Reverse(s string) string {
size := len(s)
buf := make([]byte, size)
for start := 0; start < size; {
r, n := utf8.DecodeRuneInString(s[start:])
start += n
utf8.EncodeRune(buf[size-start:], r)
}
return string(buf)
}
For some reason I can't add a benchmark, so you can copy it from PlayGround (you can't run tests there). Rename it and run go test -bench=.
I noticed this question when Simon posted his solution which, since strings are immutable, is very inefficient. The other proposed solutions are also flawed; they don't work or they are inefficient.
Here's an efficient solution that works, except when the string is not valid UTF-8 or the string contains combining characters.
package main
import "fmt"
func Reverse(s string) string {
n := len(s)
runes := make([]rune, n)
for _, rune := range s {
n--
runes[n] = rune
}
return string(runes[n:])
}
func main() {
fmt.Println(Reverse(Reverse("Hello, 世界")))
fmt.Println(Reverse(Reverse("The quick brown 狐 jumped over the lazy 犬")))
}
I wrote the following Reverse function which respects UTF8 encoding and combined characters:
// Reverse reverses the input while respecting UTF8 encoding and combined characters
func Reverse(text string) string {
textRunes := []rune(text)
textRunesLength := len(textRunes)
if textRunesLength <= 1 {
return text
}
i, j := 0, 0
for i < textRunesLength && j < textRunesLength {
j = i + 1
for j < textRunesLength && isMark(textRunes[j]) {
j++
}
if isMark(textRunes[j-1]) {
// Reverses Combined Characters
reverse(textRunes[i:j], j-i)
}
i = j
}
// Reverses the entire array
reverse(textRunes, textRunesLength)
return string(textRunes)
}
func reverse(runes []rune, length int) {
for i, j := 0, length-1; i < length/2; i, j = i+1, j-1 {
runes[i], runes[j] = runes[j], runes[i]
}
}
// isMark determines whether the rune is a marker
func isMark(r rune) bool {
return unicode.Is(unicode.Mn, r) || unicode.Is(unicode.Me, r) || unicode.Is(unicode.Mc, r)
}
I did my best to make it as efficient and readable as possible. The idea is simple, traverse through the runes looking for combined characters then reverse the combined characters' runes in-place. Once we have covered them all, reverse the runes of the entire string also in-place.
Say we would like to reverse this string bròwn. The ò is represented by two runes, one for the o and one for this unicode \u0301a that represents the "grave".
For simplicity, let's represent the string like this bro'wn. The first thing we do is look for combined characters and reverse them. So now we have the string br'own. Finally, we reverse the entire string and end up with nwo'rb. This is returned to us as nwòrb
You can find it here https://github.com/shomali11/util if you would like to use it.
Here are some test cases to show a couple of different scenarios:
func TestReverse(t *testing.T) {
assert.Equal(t, Reverse(""), "")
assert.Equal(t, Reverse("X"), "X")
assert.Equal(t, Reverse("b\u0301"), "b\u0301")
assert.Equal(t, Reverse("😎⚽"), "⚽😎")
assert.Equal(t, Reverse("Les Mise\u0301rables"), "selbare\u0301siM seL")
assert.Equal(t, Reverse("ab\u0301cde"), "edcb\u0301a")
assert.Equal(t, Reverse("This `\xc5` is an invalid UTF8 character"), "retcarahc 8FTU dilavni na si `�` sihT")
assert.Equal(t, Reverse("The quick bròwn 狐 jumped over the lazy 犬"), "犬 yzal eht revo depmuj 狐 nwòrb kciuq ehT")
}
//Reverse reverses string using strings.Builder. It's about 3 times faster
//than the one with using a string concatenation
func Reverse(in string) string {
var sb strings.Builder
runes := []rune(in)
for i := len(runes) - 1; 0 <= i; i-- {
sb.WriteRune(runes[i])
}
return sb.String()
}
//Reverse reverses string using string
func Reverse(in string) (out string) {
for _, r := range in {
out = string(r) + out
}
return
}
BenchmarkReverseStringConcatenation-8 1000000 1571 ns/op 176 B/op 29 allocs/op
BenchmarkReverseStringsBuilder-8 3000000 499 ns/op 56 B/op 6 allocs/op
Using strings.Builder is about 3 times faster than using string concatenation
Here is quite different, I would say more functional approach, not listed among other answers:
func reverse(s string) (ret string) {
for _, v := range s {
defer func(r rune) { ret += string(r) }(v)
}
return
}
This is the fastest implementation
func Reverse(s string) string {
size := len(s)
buf := make([]byte, size)
for start := 0; start < size; {
r, n := utf8.DecodeRuneInString(s[start:])
start += n
utf8.EncodeRune(buf[size-start:], r)
}
return string(buf)
}
const (
s = "The quick brown 狐 jumped over the lazy 犬"
reverse = "犬 yzal eht revo depmuj 狐 nworb kciuq ehT"
)
func TestReverse(t *testing.T) {
if Reverse(s) != reverse {
t.Error(s)
}
}
func BenchmarkReverse(b *testing.B) {
for i := 0; i < b.N; i++ {
Reverse(s)
}
}
A simple stroke with rune:
func ReverseString(s string) string {
runes := []rune(s)
size := len(runes)
for i := 0; i < size/2; i++ {
runes[size-i-1], runes[i] = runes[i], runes[size-i-1]
}
return string(runes)
}
func main() {
fmt.Println(ReverseString("Abcdefg 汉语 The God"))
}
: doG ehT 语汉 gfedcbA
You could also import an existing implementation:
import "4d63.com/strrev"
Then:
strrev.Reverse("abåd") // returns "dåba"
Or to reverse a string including unicode combining characters:
strrev.ReverseCombining("abc\u0301\u031dd") // returns "d\u0301\u031dcba"
These implementations supports correct ordering of unicode multibyte and combing characters when reversed.
Note: Built-in string reverse functions in many programming languages do not preserve combining, and identifying combining characters requires significantly more execution time.
func ReverseString(str string) string {
output :=""
for _, char := range str {
output = string(char) + output
}
return output
}
// "Luizpa" -> "apziuL"
// "123日本語" -> "語本日321"
// "⚽😎" -> "😎⚽"
// "´a´b´c´" -> "´c´b´a´"
This code preserves sequences of combining characters intact, and
should work with invalid UTF-8 input too.
package stringutil
import "code.google.com/p/go.text/unicode/norm"
func Reverse(s string) string {
bound := make([]int, 0, len(s) + 1)
var iter norm.Iter
iter.InitString(norm.NFD, s)
bound = append(bound, 0)
for !iter.Done() {
iter.Next()
bound = append(bound, iter.Pos())
}
bound = append(bound, len(s))
out := make([]byte, 0, len(s))
for i := len(bound) - 2; i >= 0; i-- {
out = append(out, s[bound[i]:bound[i+1]]...)
}
return string(out)
}
It could be a little more efficient if the unicode/norm primitives
allowed iterating through the boundaries of a string without
allocating. See also https://code.google.com/p/go/issues/detail?id=9055 .
If you need to handle grapheme clusters, use unicode or regexp module.
package main
import (
"unicode"
"regexp"
)
func main() {
str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}
func ReverseGrapheme(str string) string {
buf := []rune("")
checked := false
index := 0
ret := ""
for _, c := range str {
if !unicode.Is(unicode.M, c) {
if len(buf) > 0 {
ret = string(buf) + ret
}
buf = buf[:0]
buf = append(buf, c)
if checked == false {
checked = true
}
} else if checked == false {
ret = string(append([]rune(""), c)) + ret
} else {
buf = append(buf, c)
}
index += 1
}
return string(buf) + ret
}
func ReverseGrapheme2(str string) string {
re := regexp.MustCompile("\\PM\\pM*|.")
slice := re.FindAllString(str, -1)
length := len(slice)
ret := ""
for i := 0; i < length; i += 1 {
ret += slice[length-1-i]
}
return ret
}
It's assuredly not the most memory efficient solution, but for a "simple" UTF-8 safe solution the following will get the job done and not break runes.
It's in my opinion the most readable and understandable on the page.
func reverseStr(str string) (out string) {
for _, s := range str {
out = string(s) + out
}
return
}
The following two methods run faster than the fastest solution that preserve combining characters, though that's not to say I'm missing something in my benchmark setup.
//input string s
bs := []byte(s)
var rs string
for len(bs) > 0 {
r, size := utf8.DecodeLastRune(bs)
rs += fmt.Sprintf("%c", r)
bs = bs[:len(bs)-size]
} // rs has reversed string
Second method inspired by this
//input string s
bs := []byte(s)
cs := make([]byte, len(bs))
b1 := 0
for len(bs) > 0 {
r, size := utf8.DecodeLastRune(bs)
d := make([]byte, size)
_ = utf8.EncodeRune(d, r)
b1 += copy(cs[b1:], d)
bs = bs[:len(bs) - size]
} // cs has reversed bytes
NOTE: This answer is from 2009, so there are probably better solutions out there by now.
Looks a bit 'roundabout', and probably not very efficient, but illustrates how the Reader interface can be used to read from strings. IntVectors also seem very suitable as buffers when working with utf8 strings.
It would be even shorter when leaving out the 'size' part, and insertion into the vector by Insert, but I guess that would be less efficient, as the whole vector then needs to be pushed back by one each time a new rune is added.
This solution definitely works with utf8 characters.
package main
import "container/vector";
import "fmt";
import "utf8";
import "bytes";
import "bufio";
func
main() {
toReverse := "Smørrebrød";
fmt.Println(toReverse);
fmt.Println(reverse(toReverse));
}
func
reverse(str string) string {
size := utf8.RuneCountInString(str);
output := vector.NewIntVector(size);
input := bufio.NewReader(bytes.NewBufferString(str));
for i := 1; i <= size; i++ {
rune, _, _ := input.ReadRune();
output.Set(size - i, rune);
}
return string(output.Data());
}
func Reverse(s string) string {
r := []rune(s)
var output strings.Builder
for i := len(r) - 1; i >= 0; i-- {
output.WriteString(string(r[i]))
}
return output.String()
}
Simple, Sweet and Performant
func reverseStr(str string) string {
strSlice := []rune(str) //converting to slice of runes
length := len(strSlice)
for i := 0; i < (length / 2); i++ {
strSlice[i], strSlice[length-i-1] = strSlice[length-i-1], strSlice[i]
}
return string(strSlice) //converting back to string
}
Reversing a string by word is a similar process. First, we convert the string into an array of strings where each entry is a word. Next, we apply the normal reverse loop to that array. Finally, we smush the results back together into a string that we can return to the caller.
package main
import (
"fmt"
"strings"
)
func reverse_words(s string) string {
words := strings.Fields(s)
for i, j := 0, len(words)-1; i < j; i, j = i+1, j-1 {
words[i], words[j] = words[j], words[i]
}
return strings.Join(words, " ")
}
func main() {
fmt.Println(reverse_words("one two three"))
}
Another hack is to use built-in language features, for example, defer:
package main
import "fmt"
func main() {
var name string
fmt.Scanln(&name)
for _, char := range []rune(name) {
defer fmt.Printf("%c", char) // <-- LIFO does it all for you
}
}
For simple strings it possible to use such construction:
func Reverse(str string) string {
if str != "" {
return Reverse(str[1:]) + str[:1]
}
return ""
}
For Unicode strings it might look like this:
func RecursiveReverse(str string) string {
if str == "" {
return ""
}
runes := []rune(str)
return RecursiveReverse(string(runes[1:])) + string(runes[0])
}
A version which I think works on unicode. It is built on the utf8.Rune functions:
func Reverse(s string) string {
b := make([]byte, len(s));
for i, j := len(s)-1, 0; i >= 0; i-- {
if utf8.RuneStart(s[i]) {
rune, size := utf8.DecodeRuneInString(s[i:len(s)]);
utf8.EncodeRune(rune, b[j:j+size]);
j += size;
}
}
return string(b);
}
rune is a type, so use it. Moreover, Go doesn't use semicolons.
func reverse(s string) string {
l := len(s)
m := make([]rune, l)
for _, c := range s {
l--
m[l] = c
}
return string(m)
}
func main() {
str := "the quick brown 狐 jumped over the lazy 犬"
fmt.Printf("reverse(%s): [%s]\n", str, reverse(str))
}
try below code:
package main
import "fmt"
func reverse(s string) string {
chars := []rune(s)
for i, j := 0, len(chars)-1; i < j; i, j = i+1, j-1 {
chars[i], chars[j] = chars[j], chars[i]
}
return string(chars)
}
func main() {
fmt.Printf("%v\n", reverse("abcdefg"))
}
for more info check http://golangcookbook.com/chapters/strings/reverse/
and http://www.dotnetperls.com/reverse-string-go
func reverseString(someString string) string {
runeString := []rune(someString)
var reverseString string
for i := len(runeString)-1; i >= 0; i -- {
reverseString += string(runeString[i])
}
return reverseString
}
Strings are immutable object in golang, unlike C inplace reverse is not possible with golang.
With C , you can do something like,
void reverseString(char *str) {
int length = strlen(str)
for(int i = 0, j = length-1; i < length/2; i++, j--)
{
char tmp = str[i];
str[i] = str[j];
str[j] = tmp;
}
}
But with golang, following one, uses byte to convert the input into bytes first and then reverses the byte array once it is reversed, convert back to string before returning. works only with non unicode type string.
package main
import "fmt"
func main() {
s := "test123 4"
fmt.Println(reverseString(s))
}
func reverseString(s string) string {
a := []byte(s)
for i, j := 0, len(s)-1; i < j; i++ {
a[i], a[j] = a[j], a[i]
j--
}
return string(a)
}
Here is yet another solution:
func ReverseStr(s string) string {
chars := []rune(s)
rev := make([]rune, 0, len(chars))
for i := len(chars) - 1; i >= 0; i-- {
rev = append(rev, chars[i])
}
return string(rev)
}
However, yazu's solution above is more elegant since he reverses the []rune slice in place.

Resources