Golang: verify that string is a valid hex string?

I have a struct:
type Name struct {
hexID string
age uint8
}
What is the easiest way to check that the hexID field is a valid hex string, and to raise an error if it is not?
For example:
var n Name
n.hexID = "Hello World >)" // not a valid hex
n.hexID = "aaa12Eb9990101010101112cC" // valid hex
Or is there perhaps a struct tag that does this?

Iterate over the characters of the string, and check if each is a valid hex digit.
func isValidHex(s string) bool {
for _, r := range s {
if !(r >= '0' && r <= '9' || r >= 'a' && r <= 'f' || r >= 'A' && r <= 'F') {
return false
}
}
return true
}
Testing it:
fmt.Println(isValidHex("Hello World >)"))
fmt.Println(isValidHex("aaa12Eb9990101010101112cC"))
Output (try it on the Go Playground):
false
true
Note: you may be tempted to use hex.DecodeString() and check the returned error: if it's nil, the string is valid. But note that this function expects the string to have even length (it produces bytes from the hex digits, and 2 hex digits form a byte). Moreover, if you don't need the result (as a byte slice), it is slower and creates unnecessary garbage (for the gc to collect).
Another solution could be using big.Int.SetString():
func isValidHex(s string) bool {
_, ok := new(big.Int).SetString(s, 16)
return ok
}
This outputs the same, try it on the Go Playground. But this again is slower and uses memory allocations (generates garbage).
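Since the question also asks for an error when the string is invalid, here is a minimal sketch of the same loop returning an error instead of a bool. The name validateHex and the error text are illustrative assumptions, not part of any library. The main function also shows the even-length requirement of hex.DecodeString mentioned above, because the valid example string has 25 digits.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// validateHex is a hypothetical helper: it returns nil when every byte of s
// is a hex digit, and an error naming the first offending position otherwise.
// Like isValidHex above, it accepts the empty string.
func validateHex(s string) error {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f' || c >= 'A' && c <= 'F') {
			return fmt.Errorf("invalid hex digit %q at byte offset %d", c, i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateHex("Hello World >)"))
	fmt.Println(validateHex("aaa12Eb9990101010101112cC"))

	// The valid string above has an odd number of digits (25),
	// so hex.DecodeString still reports an error for it.
	_, err := hex.DecodeString("aaa12Eb9990101010101112cC")
	fmt.Println(err)
}
```

Indexing bytes instead of ranging over runes is safe here because any byte of a multi-byte UTF-8 sequence is outside the ASCII hex digits and is rejected anyway.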

What about this one?
regexp.MatchString("[^0-9A-Fa-f]", n.hexID)
The returned bool is true if the string contains a character that is not a hex digit.

Comment: I'm completely confused now which one to use :( – armaka
Different implementations have different performance. For example,
func isHexRock(s string) bool {
for _, b := range []byte(s) {
if !(b >= '0' && b <= '9' || b >= 'a' && b <= 'f' || b >= 'A' && b <= 'F') {
return false
}
}
return true
}
func isHexIcza(s string) bool {
for _, r := range s {
if !(r >= '0' && r <= '9' || r >= 'a' && r <= 'f' || r >= 'A' && r <= 'F') {
return false
}
}
return true
}
var rxNotHex = regexp.MustCompile("[^0-9A-Fa-f]")
func isHexOjacoGlobal(s string) bool {
return !rxNotHex.MatchString(s)
}
func isHexOjacoLocal(s string) bool {
notHex, err := regexp.MatchString("[^0-9A-Fa-f]", s)
if err != nil {
panic(err)
}
return !notHex
}
Some benchmark results:
BenchmarkRock-4 36386997 30.92 ns/op 0 B/op 0 allocs/op
BenchmarkIcza-4 21100798 52.86 ns/op 0 B/op 0 allocs/op
BenchmarkOjacoGlobal-4 5958829 209.9 ns/op 0 B/op 0 allocs/op
BenchmarkOjacoLocal-4 227672 4648 ns/op 1626 B/op 22 allocs/op

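I think strconv.ParseInt can also handle this, as long as the value fits in an int64 (longer hex strings, such as the valid example in the question, return a range error, and a leading sign is accepted):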
_, err := strconv.ParseInt("deadbeef", 16, 64)
if err != nil {
fmt.Println("not a hex string")
} else {
fmt.Println("it is a hex string")
}
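A small sketch of where the strconv.ParseInt approach stops working as a general hex validator. The helper name classifyHex and its return strings are assumptions made for illustration; the strconv and errors calls are standard library.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// classifyHex is a hypothetical helper that reports how strconv.ParseInt
// treats s when asked to parse it as a base-16 int64.
func classifyHex(s string) string {
	_, err := strconv.ParseInt(s, 16, 64)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, strconv.ErrRange):
		return "out of range"
	default:
		return "syntax error"
	}
}

func main() {
	inputs := []string{
		"deadbeef",                  // fits in 64 bits
		"aaa12Eb9990101010101112cC", // valid hex digits, but far too long for int64
		"-ff",                       // a sign is accepted, although '-' is not a hex digit
		"xyz",                       // not hex at all
	}
	for _, s := range inputs {
		fmt.Printf("%-27q %s\n", s, classifyHex(s))
	}
}
```

So this check suits short numeric IDs; for arbitrary-length hex strings a character loop is the safer choice.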

Related

Which is better to get the last X character of a Golang String?

When I have the string "hogemogehogemogehogemoge世界世界世界", which code is better for getting the last rune while avoiding memory allocation?
There is a similar question about getting the last X characters of a Golang string:
How to get the last X Characters of a Golang String?
I want to know which is preferred if I just want to get the last rune, without any additional operation.
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
// which one allocates less memory?
s := "hogemogehogemogehogemoge世界世界世界a"
getLastRune(s, 3)
getLastRune2(s, 3)
}
func getLastRune(s string, c int) {
// DecodeLastRuneInString
j := len(s)
for i := 0; i < c && j > 0; i++ {
_, size := utf8.DecodeLastRuneInString(s[:j])
j -= size
}
lastByRune := s[j:]
fmt.Println(lastByRune)
}
func getLastRune2(s string, c int) {
// string -> []rune
r := []rune(s)
lastByRune := string(r[len(r)-c:])
fmt.Println(lastByRune)
}
世界a
世界a
Whenever performance and allocations are the question, you should run benchmarks.
First let's modify your functions to not print but rather return the result:
func getLastRune(s string, c int) string {
j := len(s)
for i := 0; i < c && j > 0; i++ {
_, size := utf8.DecodeLastRuneInString(s[:j])
j -= size
}
return s[j:]
}
func getLastRune2(s string, c int) string {
r := []rune(s)
if c > len(r) {
c = len(r)
}
return string(r[len(r)-c:])
}
And the benchmark functions:
var s = "hogemogehogemogehogemoge世界世界世界a"
func BenchmarkGetLastRune(b *testing.B) {
for i := 0; i < b.N; i++ {
getLastRune(s, 3)
}
}
func BenchmarkGetLastRune2(b *testing.B) {
for i := 0; i < b.N; i++ {
getLastRune2(s, 3)
}
}
Running them:
go test -bench . -benchmem
Results:
BenchmarkGetLastRune-4 30000000 36.9 ns/op 0 B/op 0 allocs/op
BenchmarkGetLastRune2-4 10000000 165 ns/op 0 B/op 0 allocs/op
getLastRune() is more than 4 times faster. Neither of them is making any allocations, but this is due to a compiler optimization (converting a string to []rune and back generally requires allocation).
If we run the benchmarks with optimizations disabled:
go test -gcflags '-N -l' -bench . -benchmem
Results:
BenchmarkGetLastRune-4 30000000 46.2 ns/op 0 B/op 0 allocs/op
BenchmarkGetLastRune2-4 10000000 197 ns/op 16 B/op 1 allocs/op
Compiler optimizations or not, getLastRune() is the clear winner.
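For the narrower case in the question, getting only the last rune, the loop is not needed. A minimal sketch, assuming a helper name lastRune that is not part of the original answer:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lastRune is a hypothetical helper: it returns the final rune of s as a
// substring of s, so no new string data is allocated.
// It returns "" when s is empty.
func lastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s)
	return s[len(s)-size:]
}

func main() {
	fmt.Println(lastRune("hogemogehogemogehogemoge世界世界世界"))
	fmt.Println(lastRune("abc"))
}
```

utf8.DecodeLastRuneInString reports a size of 0 for an empty string, which is why the slice expression stays in bounds.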

Parse hex string to image/color

How can I parse RGB color in web color format (3 or 6 hex digits) to Color from image/color? Does go have any built-in parser for that?
I want to be able to parse both #XXXXXX and #XXX colors formats.
color docs says nothing about it: https://golang.org/pkg/image/color/
but this task is very common, so I believe that go has some functions for that (which I just didn't find).
Update: I created small Go library based on accepted answer: github.com/g4s8/hexcolor
Foreword: I released this utility (the 2. Fast solution) in github.com/icza/gox, see colorx.ParseHexColor().
1. Elegant solution
Here's another solution using fmt.Sscanf(). It is certainly not the fastest solution, but it is elegant. It scans right into the fields of a color.RGBA struct:
func ParseHexColor(s string) (c color.RGBA, err error) {
c.A = 0xff
switch len(s) {
case 7:
_, err = fmt.Sscanf(s, "#%02x%02x%02x", &c.R, &c.G, &c.B)
case 4:
_, err = fmt.Sscanf(s, "#%1x%1x%1x", &c.R, &c.G, &c.B)
// Double the hex digits:
c.R *= 17
c.G *= 17
c.B *= 17
default:
err = fmt.Errorf("invalid length, must be 7 or 4")
}
return
}
Testing it:
hexCols := []string{
"#112233",
"#123",
"#000233",
"#023",
"invalid",
"#abcd",
"#-12",
}
for _, hc := range hexCols {
c, err := ParseHexColor(hc)
fmt.Printf("%-7s = %3v, %v\n", hc, c, err)
}
Output (try it on the Go Playground):
#112233 = { 17 34 51 255}, <nil>
#123 = { 17 34 51 255}, <nil>
#000233 = { 0 2 51 255}, <nil>
#023 = { 0 34 51 255}, <nil>
invalid = { 0 0 0 255}, input does not match format
#abcd = { 0 0 0 255}, invalid length, must be 7 or 4
#-12 = { 0 0 0 255}, expected integer
2. Fast solution
If performance does matter, fmt.Sscanf() is a really bad choice. It requires a format string which the implementation has to parse, and according to it parse the input, and use reflection to store the result to the pointed values.
Since the task is basically just "parsing" a hexadecimal value, we can do better than this. We don't even have to call into a general hex parsing library (such as encoding/hex), we can do that on our own. We don't even have to treat the input as a string, or even as a series of runes, we may lower to the level of treating it as a series of bytes. Yes, Go stores string values as UTF-8 byte sequences in memory, but if the input is a valid color string, all its bytes must be in the range of 0..127 which map to bytes 1-to-1. If that would not be the case, the input would already be invalid, which we will detect, but what color we return in that case should not matter (does not matter).
Now let's see a fast implementation:
var errInvalidFormat = errors.New("invalid format")
func ParseHexColorFast(s string) (c color.RGBA, err error) {
c.A = 0xff
if len(s) == 0 || s[0] != '#' { // the length check avoids a panic on an empty string
return c, errInvalidFormat
}
hexToByte := func(b byte) byte {
switch {
case b >= '0' && b <= '9':
return b - '0'
case b >= 'a' && b <= 'f':
return b - 'a' + 10
case b >= 'A' && b <= 'F':
return b - 'A' + 10
}
err = errInvalidFormat
return 0
}
switch len(s) {
case 7:
c.R = hexToByte(s[1])<<4 + hexToByte(s[2])
c.G = hexToByte(s[3])<<4 + hexToByte(s[4])
c.B = hexToByte(s[5])<<4 + hexToByte(s[6])
case 4:
c.R = hexToByte(s[1]) * 17
c.G = hexToByte(s[2]) * 17
c.B = hexToByte(s[3]) * 17
default:
err = errInvalidFormat
}
return
}
Testing it with the same inputs as in the first example, the output is (try it on the Go Playground):
#112233 = { 17 34 51 255}, <nil>
#123 = { 17 34 51 255}, <nil>
#000233 = { 0 2 51 255}, <nil>
#023 = { 0 34 51 255}, <nil>
invalid = { 0 0 0 255}, invalid format
#abcd = { 0 0 0 255}, invalid format
#-12 = { 0 17 34 255}, invalid format
3. Benchmarks
Let's benchmark these 2 solutions. The benchmarking code will include calling them with long and short formats. Error case is excluded.
func BenchmarkParseHexColor(b *testing.B) {
for i := 0; i < b.N; i++ {
ParseHexColor("#112233")
ParseHexColor("#123")
}
}
func BenchmarkParseHexColorFast(b *testing.B) {
for i := 0; i < b.N; i++ {
ParseHexColorFast("#112233")
ParseHexColorFast("#123")
}
}
And here are the benchmark results:
go test -bench . -benchmem
BenchmarkParseHexColor-4 500000 2557 ns/op 144 B/op 9 allocs/op
BenchmarkParseHexColorFast-4 100000000 10.3 ns/op 0 B/op 0 allocs/op
As we can see, the "fast" solution is roughly 250 times faster and uses no allocation (unlike the "elegant" solution).
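Both solutions multiply a single hex digit by 17 to handle the short #XXX form. A brief sketch of why that doubles the digit; the function name expandNibble is an assumption used only for this illustration:

```go
package main

import "fmt"

// expandNibble turns a 4-bit value d (0..15) into the byte whose two hex
// digits are both d, for example 0xA becomes 0xAA.
func expandNibble(d byte) byte {
	return d * 17 // same as d<<4 | d, because 17 == 16 + 1
}

func main() {
	for d := byte(0); d < 16; d++ {
		fmt.Printf("%x -> %02x\n", d, expandNibble(d))
	}
}
```

The largest input, 15, gives 255, so the multiplication never overflows a byte.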
An RGBA color is just 4 bytes, one each for the red, green, blue, and alpha channels. For three or six hex digits the alpha byte is usually implied to be 0xFF (AABBCC is considered the same as AABBCCFF, as is ABC).
So parsing the color string is as simple as normalizing it, such that it is of the form "RRGGBBAA" (4 hex encoded bytes), and then decoding it:
package main
import (
"encoding/hex"
"fmt"
"image/color"
"log"
)
func main() {
colorStr := "102030FF"
colorStr, err := normalize(colorStr)
if err != nil {
log.Fatal(err)
}
b, err := hex.DecodeString(colorStr)
if err != nil {
log.Fatal(err)
}
color := color.RGBA{b[0], b[1], b[2], b[3]}
fmt.Println(color) // Output: {16 32 48 255}
}
func normalize(colorStr string) (string, error) {
// left as an exercise for the reader
return colorStr, nil
}
Try it on the playground: https://play.golang.org/p/aCX-vyfMG4G
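The answer leaves normalize as an exercise. One possible way to fill it in is sketched below; the accepted input forms (3, 6 or 8 digits with an optional leading '#') and the error text are assumptions, not something the original answer specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize is one hypothetical implementation of the exercise: it accepts
// "RGB", "RRGGBB" or "RRGGBBAA", each with an optional leading '#', and
// returns the 8-digit "RRGGBBAA" form. It does not validate the digits
// themselves; hex.DecodeString reports those errors afterwards.
func normalize(colorStr string) (string, error) {
	s := strings.TrimPrefix(colorStr, "#")
	switch len(s) {
	case 3: // double each digit ("abc" -> "aabbcc"), then add opaque alpha
		s = string([]byte{s[0], s[0], s[1], s[1], s[2], s[2]}) + "FF"
	case 6: // alpha is implied to be fully opaque
		s += "FF"
	case 8: // already in the target form
	default:
		return "", fmt.Errorf("invalid color %q: want 3, 6 or 8 hex digits", colorStr)
	}
	return s, nil
}

func main() {
	for _, in := range []string{"#abc", "102030", "102030FF", "#12"} {
		out, err := normalize(in)
		fmt.Printf("%q -> %q, %v\n", in, out, err)
	}
}
```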
You can convert any 2 hex digits into an integer using strconv.ParseUint
strconv.ParseUint(str, 16, 8)
The 16 indicates base 16 (hex) and the 8 indicates the bit count, in this case, one byte.
You can use this to parse each 2 characters into their components
https://play.golang.org/p/B56B8_NvnVR
func ParseHexColor(v string) (out color.RGBA, err error) {
if len(v) != 7 {
return out, errors.New("hex color must be 7 characters")
}
if v[0] != '#' {
return out, errors.New("hex color must start with '#'")
}
var red, redError = strconv.ParseUint(v[1:3], 16, 8)
if redError != nil {
return out, errors.New("red component invalid")
}
out.R = uint8(red)
var green, greenError = strconv.ParseUint(v[3:5], 16, 8)
if greenError != nil {
return out, errors.New("green component invalid")
}
out.G = uint8(green)
var blue, blueError = strconv.ParseUint(v[5:7], 16, 8)
if blueError != nil {
return out, errors.New("blue component invalid")
}
out.B = uint8(blue)
out.A = 0xff // a 6-digit web color carries no alpha, so treat it as opaque
return
}
Edit: Thanks to Peter for the correction

Using the == symbol in golang and using a loop to compare if string a equals string b, which performance is better?

for i:=0;i<len(a);i++{
if a[i] != b[i]{
return false
}
}
and just
a == b
I've found that the same string has different addresses:
a := "abc"
b := "abc"
println(&a)
println(&b)
The output is:
0xc420045f68
0xc420045f58
So == does not compare addresses.
In fact, I would like to know how == compares two strings.
I have searched the net for a long time without success.
You should use the == operator to compare strings. It compares the content of the string values.
What you print is the address of a and b variables. Since they are 2 distinct non-zero size variables, their addresses cannot be the same by definition. The values they hold of course may or may not be the same. The == operator compares the values the variables hold, not the addresses of the variables.
Your solution with the loop might even result in a runtime panic, if the b string is shorter than a, as you index it with values that are valid for a.
The built-in == operator will likely always outperform any loop, as that is implemented in architecture specific assembly code. It is implemented in the runtime package, unexported function memequal().
Also note that the built-in comparison might even omit checking the actual contents of the texts if their string header points to the same data (and have equal length). There is no reason not to use ==.
The only reason where a custom equal function for string values would make sense is where the heuristics of your strings are known. E.g. if you know all the string values have the same prefix and they may only differ in their last character. In this case you could write a comparator function which only compares the last character of the strings to decide if they are equal (and only, optionally revert to actually compare the rest). This solution would of course not use a loop.
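To make the distinction between variable addresses and string contents concrete, here is a small sketch. The helper fromBytes is an assumption introduced only so that the second string is built at run time instead of sharing the literal's data.

```go
package main

import "fmt"

// fromBytes builds a string at run time, so its bytes are stored
// separately from the data of the string literal it is compared with.
func fromBytes(b []byte) string {
	return string(b)
}

func main() {
	a := "abc"
	b := fromBytes([]byte{'a', 'b', 'c'})

	fmt.Println(a == b)   // compares the contents of the two strings
	fmt.Println(&a == &b) // compares the addresses of the two variables
}
```

The first comparison is true because the bytes match; the second is false because a and b are distinct variables, whatever they hold.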
Use the Go == operator for string equality. The Go gc and gccgo compilers are optimizing compilers. The Go runtime has been optimized.
This comment in the strings package documentation for the strings.Compare function is also relevant to equality:
Compare is included only for symmetry with package bytes. It is
usually clearer and always faster to use the built-in string
comparison operators ==, <, >, and so on.
In Go, the runtime representation of a string is a struct:
type StringHeader struct {
Data uintptr // byte array pointer
Len int // byte array length
}
When you assign a Go string value to a variable,
s := "ABC"
the memory location allocated for the variable s is set to the value of a StringHeader struct describing the string. The address of the variable, &s, points to the struct, not to the underlying byte array.
A comparison for Go string equality compares the bytes of the underlying arrays, the values of *StringHeader.Data[0:StringHeader.Len].
In Go, we use the Go testing package to benchmark performance. For example, comparing the Go == operator to two Go string equality functions:
Output:
$ go test equal_test.go -bench=. -benchmem
BenchmarkEqualOper-4 500000000 3.19 ns/op 0 B/op 0 allocs/op
BenchmarkEqualFunc1-4 500000000 3.32 ns/op 0 B/op 0 allocs/op
BenchmarkEqualFunc2-4 500000000 3.61 ns/op 0 B/op 0 allocs/op
$ go version
go version devel +bb222cde10 Mon Jun 11 14:47:06 2018 +0000 linux/amd64
$
equal_test.go:
package main
import (
"reflect"
"testing"
"unsafe"
)
func EqualOper(a, b string) bool {
return a == b
}
func EqualFunc1(a, b string) bool {
if len(a) != len(b) {
return false
}
for i := 0; i < len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
func EqualFunc2(a, b string) bool {
if len(a) != len(b) {
return false
}
if len(a) == 0 {
return true
}
// string intern equality
if (*reflect.StringHeader)(unsafe.Pointer(&a)).Data == (*reflect.StringHeader)(unsafe.Pointer(&b)).Data {
return true
}
for i := 0; i < len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
var y, z = "aby", "abz"
func BenchmarkEqualOper(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualOper(a, b)
}
}
func BenchmarkEqualFunc1(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualFunc1(a, b)
}
}
func BenchmarkEqualFunc2(B *testing.B) {
a, b := y, z
for i := 0; i < B.N; i++ {
_ = EqualFunc2(a, b)
}
}

How to compare strings in golang? [closed]

I want to make a function that calculates the length of the common segment (starting from the beginning) in two strings. For example:
foo:="Makan"
bar:="Makon"
The result should be 3.
foo:="Indah"
bar:="Ihkasyandehlo"
The result should be 1.
It's not clear what you are asking because you limited your test cases to ASCII characters.
I've added a Unicode test case and I've included answers for bytes, runes, or both.
play.golang.org:
package main
import (
"fmt"
"unicode/utf8"
)
func commonBytes(s, t string) (bytes int) {
if len(s) > len(t) {
s, t = t, s
}
i := 0
for ; i < len(s); i++ {
if s[i] != t[i] {
break
}
}
return i
}
func commonRunes(s, t string) (runes int) {
if len(s) > len(t) {
s, t = t, s
}
i := 0
for ; i < len(s); i++ {
if s[i] != t[i] {
break
}
}
return utf8.RuneCountInString(s[:i])
}
func commonBytesRunes(s, t string) (bytes, runes int) {
if len(s) > len(t) {
s, t = t, s
}
i := 0
for ; i < len(s); i++ {
if s[i] != t[i] {
break
}
}
return i, utf8.RuneCountInString(s[:i])
}
func main() {
Tests := []struct {
word1, word2 string
}{
{"Makan", "Makon"},
{"Indah", "Ihkasyandehlo"},
{"日本語", "日本語"},
}
for _, test := range Tests {
fmt.Println("Words: ", test.word1, test.word2)
fmt.Println("Bytes: ", commonBytes(test.word1, test.word2))
fmt.Println("Runes: ", commonRunes(test.word1, test.word2))
fmt.Print("Bytes & Runes: ")
fmt.Println(commonBytesRunes(test.word1, test.word2))
}
}
Output:
Words: Makan Makon
Bytes: 3
Runes: 3
Bytes & Runes: 3 3
Words: Indah Ihkasyandehlo
Bytes: 1
Runes: 1
Bytes & Runes: 1 1
Words: 日本語 日本語
Bytes: 9
Runes: 3
Bytes & Runes: 9 3
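One caveat with the byte-wise loop: it can stop in the middle of a multi-byte rune when two different runes share their leading bytes, and utf8.RuneCountInString then counts the leftover bytes as extra runes. A hedged sketch of a rune-by-rune variant follows; the name commonPrefix is an assumption, and it counts only complete, identical runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// commonPrefix is a hypothetical rune-aware variant: it returns the length
// of the common prefix of s and t in bytes and in runes, counting only
// complete runes that are identical in both strings.
func commonPrefix(s, t string) (bytes, runes int) {
	for bytes < len(s) && bytes < len(t) {
		rs, ws := utf8.DecodeRuneInString(s[bytes:])
		rt, wt := utf8.DecodeRuneInString(t[bytes:])
		if rs != rt || ws != wt {
			break
		}
		// Invalid bytes all decode to utf8.RuneError, so compare them raw.
		if rs == utf8.RuneError && s[bytes:bytes+ws] != t[bytes:bytes+wt] {
			break
		}
		bytes += ws
		runes++
	}
	return bytes, runes
}

func main() {
	fmt.Println(commonPrefix("Makan", "Makon"))
	fmt.Println(commonPrefix("Indah", "Ihkasyandehlo"))
	// 語 and 誤 share their first two UTF-8 bytes but are different runes.
	fmt.Println(commonPrefix("日本語", "日本誤"))
}
```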
Note that if you were working with Unicode characters, the result could be quite different.
Try for instance using utf8.DecodeRuneInString().
See this example:
package main
import "fmt"
import "unicode/utf8"
func index(s1, s2 string) int {
res := 0
for i, w := 0, 0; i < len(s2); i += w {
if i >= len(s1) {
return res
}
runeValue1, width := utf8.DecodeRuneInString(s1[i:])
runeValue2, width := utf8.DecodeRuneInString(s2[i:])
if runeValue1 != runeValue2 {
return res
}
if runeValue1 == utf8.RuneError || runeValue2 == utf8.RuneError {
return res
}
w = width
res = i + w
}
return res
}
func main() {
foo := "日本本a語"
bar := "日本本b語"
fmt.Println(index(foo, bar))
foo = "日本語"
bar = "日otest"
fmt.Println(index(foo, bar))
foo = "\xF0"
bar = "\xFF"
fmt.Println(index(foo, bar))
}
Here, the result would be:
9 (3 common runes of width '3')
3 (1 rune of width '3')
0 (invalid rune, meaning utf8.RuneError)
Do you mean something like this? Note that it compares bytes, so it handles only ASCII, not multi-byte UTF-8.
package main
import (
"fmt"
)
func equal(s1, s2 string) int {
eq := 0
if len(s1) > len(s2) {
s1, s2 = s2, s1
}
for key := range s1 {
if s1[key] == s2[key] {
eq++
} else {
break
}
}
return eq
}
func main() {
fmt.Println(equal("buzzfizz", "buzz"))
fmt.Println(equal("Makan", "Makon"))
fmt.Println(equal("Indah", "Ihkasyandehlo"))
}

What is the best way to test for an empty string in Go?

Which method is best (most idiomatic) for testing non-empty strings (in Go)?
if len(mystring) > 0 { }
Or:
if mystring != "" { }
Or something else?
Both styles are used within Go's standard library.
if len(s) > 0 { ... }
can be found in the strconv package: http://golang.org/src/pkg/strconv/atoi.go
if s != "" { ... }
can be found in the encoding/json package: http://golang.org/src/pkg/encoding/json/encode.go
Both are idiomatic and are clear enough. It is more a matter of personal taste and about clarity.
Russ Cox writes in a golang-nuts thread:
The one that makes the code clear.
If I'm about to look at element x I typically write
len(s) > x, even for x == 0, but if I care about
"is it this specific string" I tend to write s == "".
It's reasonable to assume that a mature compiler will compile
len(s) == 0 and s == "" into the same, efficient code.
...
Make the code clear.
As pointed out in Timmmm's answer, the Go compiler does generate identical code in both cases.
This seems to be premature microoptimization. The compiler is free to produce the same code for both cases or at least for these two
if len(s) != 0 { ... }
and
if s != "" { ... }
because the semantics is clearly equal.
Assuming that strings made only of spaces should count as empty, so all leading and trailing whitespace is removed first:
import "strings"
if len(strings.TrimSpace(s)) == 0 { ... }
Because:
len("") // is 0
len(" ") // one space is 1
len("  ") // two spaces is 2
Checking for length is a good answer, but you could also account for an "empty" string that is also only whitespace. Not "technically" empty, but if you care to check:
package main
import (
"fmt"
"strings"
)
func main() {
stringOne := "merpflakes"
stringTwo := " "
stringThree := ""
if len(strings.TrimSpace(stringOne)) == 0 {
fmt.Println("String is empty!")
}
if len(strings.TrimSpace(stringTwo)) == 0 {
fmt.Println("String two is empty!")
}
if len(stringTwo) == 0 {
fmt.Println("String two is still empty!")
}
if len(strings.TrimSpace(stringThree)) == 0 {
fmt.Println("String three is empty!")
}
}
As of now, the Go compiler generates identical code in both cases, so it is a matter of taste. GCCGo does generate different code, but barely anyone uses it so I wouldn't worry about that.
https://godbolt.org/z/fib1x1
According to the official guidelines and from a performance point of view the two appear equivalent (see ANisus's answer), but s != "" is preferable because of a type-checking advantage: it fails at compile time if the variable is not a string, while len(s) == 0 compiles for several other data types.
I think == "" is faster and more readable.
package main
import(
"fmt"
)
func main() {
n := 1
s:=""
if len(s)==0{
n=2
}
fmt.Printf("%d\n", n)
}
Comparing len(s) == 0 with s == "" under dlv debug playground.go, I got this.
With s == "":
playground.go:6 0x1008d9d20 810b40f9 MOVD 16(R28), R1
playground.go:6 0x1008d9d24 e28300d1 SUB $32, RSP, R2
playground.go:6 0x1008d9d28 5f0001eb CMP R1, R2
playground.go:6 0x1008d9d2c 09070054 BLS 56(PC)
playground.go:6 0x1008d9d30* fe0f16f8 MOVD.W R30, -160(RSP)
playground.go:6 0x1008d9d34 fd831ff8 MOVD R29, -8(RSP)
playground.go:6 0x1008d9d38 fd2300d1 SUB $8, RSP, R29
playground.go:7 0x1008d9d3c e00340b2 ORR $1, ZR, R0
playground.go:7 0x1008d9d40 e01f00f9 MOVD R0, 56(RSP)
playground.go:8 0x1008d9d44 ff7f05a9 STP (ZR, ZR), 80(RSP)
playground.go:9 0x1008d9d48 01000014 JMP 1(PC)
playground.go:10 0x1008d9d4c e0037fb2 ORR $2, ZR, R0
With len(s) == 0:
playground.go:6 0x100761d20 810b40f9 MOVD 16(R28), R1
playground.go:6 0x100761d24 e2c300d1 SUB $48, RSP, R2
playground.go:6 0x100761d28 5f0001eb CMP R1, R2
playground.go:6 0x100761d2c 29070054 BLS 57(PC)
playground.go:6 0x100761d30* fe0f15f8 MOVD.W R30, -176(RSP)
playground.go:6 0x100761d34 fd831ff8 MOVD R29, -8(RSP)
playground.go:6 0x100761d38 fd2300d1 SUB $8, RSP, R29
playground.go:7 0x100761d3c e00340b2 ORR $1, ZR, R0
playground.go:7 0x100761d40 e02300f9 MOVD R0, 64(RSP)
playground.go:8 0x100761d44 ff7f06a9 STP (ZR, ZR), 96(RSP)
playground.go:9 0x100761d48 ff2700f9 MOVD ZR, 72(RSP)
playground.go:9 0x100761d4c 01000014 JMP 1(PC)
playground.go:10 0x100761d50 e0037fb2 ORR $2, ZR, R0
playground.go:10 0x100761d54 e02300f9 MOVD R0, 64(RSP)
playground.go:10 0x100761d58 01000014 JMP 1(PC)
playground.go:6 0x104855d2c 09070054 BLS 56(PC)
It would be cleaner and less error-prone to use a function like the one below:
func empty(s string) bool {
return len(strings.TrimSpace(s)) == 0
}
To add to the comments above, mainly about how to do performance testing.
I tested with the following code:
import (
"testing"
)
var ss = []string{"Hello", "", "bar", " ", "baz", "ewrqlosakdjhf12934c r39yfashk fjkashkfashds fsdakjh-", "", "123"}
func BenchmarkStringCheckEq(b *testing.B) {
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, s := range ss {
if s == "" {
c++
}
}
}
t := 2 * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
func BenchmarkStringCheckLen(b *testing.B) {
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, s := range ss {
if len(s) == 0 {
c++
}
}
}
t := 2 * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
func BenchmarkStringCheckLenGt(b *testing.B) {
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, s := range ss {
if len(s) > 0 {
c++
}
}
}
t := 6 * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
func BenchmarkStringCheckNe(b *testing.B) {
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, s := range ss {
if s != "" {
c++
}
}
}
t := 6 * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
And results were:
% for a in $(seq 50);do go test -run=^$ -bench=. --benchtime=1s ./...|grep Bench;done | tee -a log
% sort -k 3n log | head -10
BenchmarkStringCheckEq-4 150149937 8.06 ns/op
BenchmarkStringCheckLenGt-4 147926752 8.06 ns/op
BenchmarkStringCheckLenGt-4 148045771 8.06 ns/op
BenchmarkStringCheckNe-4 145506912 8.06 ns/op
BenchmarkStringCheckLen-4 145942450 8.07 ns/op
BenchmarkStringCheckEq-4 146990384 8.08 ns/op
BenchmarkStringCheckLenGt-4 149351529 8.08 ns/op
BenchmarkStringCheckNe-4 148212032 8.08 ns/op
BenchmarkStringCheckEq-4 145122193 8.09 ns/op
BenchmarkStringCheckEq-4 146277885 8.09 ns/op
In effect, the variants usually do not reach their fastest time, and there is only a minimal difference (about 0.01 ns/op) between the variants' top speeds.
And if I look at the full log, the difference between runs is greater than the difference between benchmark functions.
Also, there does not seem to be any measurable difference between
BenchmarkStringCheckEq and BenchmarkStringCheckNe
or between BenchmarkStringCheckLen and BenchmarkStringCheckLenGt,
even though the latter variants increment c 6 times instead of 2 times per pass.
You can gain some confidence about equal performance by adding benchmarks with a modified test or inner loop. This is faster:
func BenchmarkStringCheckNone4(b *testing.B) {
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for range ss {
c++
}
}
t := len(ss) * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
This is not faster:
func BenchmarkStringCheckEq3(b *testing.B) {
ss2 := make([]string, len(ss))
prefix := "a"
for i := range ss {
ss2[i] = prefix + ss[i]
}
c := 0
b.ResetTimer()
for n := 0; n < b.N; n++ {
for _, s := range ss2 {
if s == prefix {
c++
}
}
}
t := 2 * b.N
if c != t {
b.Fatalf("did not catch empty strings: %d != %d", c, t)
}
}
Both variants usually differ from the main tests by more than the main tests differ from one another.
It would also be good to generate the test strings (ss) using a string generator with a relevant distribution, and with variable lengths too.
So I have no confidence that there is any performance difference between the main methods of testing for an empty string in Go.
And I can state with some confidence that it is faster not to test for an empty string at all than to test for one, and that it is faster to test for an empty string than for a 1-character string (the prefix variant).
This can be more performant than trimming the whole string, since you only need to find a single non-space character:
// Strempty reports whether s is empty or contains only whitespace
func Strempty(s string) bool {
if len(s) == 0 {
return true
}
r := []rune(s)
l := len(r)
for l > 0 {
l--
if !unicode.IsSpace(r[l]) {
return false
}
}
return true
}
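Converting the string to []rune copies it, which generally costs an allocation. A possible variant that avoids the copy is sketched here, assuming the same goal as Strempty; the name isBlank is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

// isBlank is a hypothetical variant of Strempty: it reports whether s is
// empty or consists only of Unicode whitespace. Ranging over the string
// decodes one rune at a time, so no []rune copy is made, and the loop
// stops at the first non-space rune.
func isBlank(s string) bool {
	for _, r := range s {
		if !unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"", " \t\n", "  a ", "merpflakes"} {
		fmt.Printf("%q blank=%v\n", s, isBlank(s))
	}
}
```

It scans from the front instead of the back, which should pay off for strings that start with visible text.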
I think the best way is to compare with a blank string.
BenchmarkStringCheck1 checks against a blank string.
BenchmarkStringCheck2 checks for a zero length.
I benchmarked both checks. Comparing with a blank string comes out marginally faster, although a gap of about 0.01 ns/op is within run-to-run noise.
BenchmarkStringCheck1-4 2000000000 0.29 ns/op 0 B/op 0 allocs/op
BenchmarkStringCheck1-4 2000000000 0.30 ns/op 0 B/op 0 allocs/op
BenchmarkStringCheck2-4 2000000000 0.30 ns/op 0 B/op 0 allocs/op
BenchmarkStringCheck2-4 2000000000 0.31 ns/op 0 B/op 0 allocs/op
Code
func BenchmarkStringCheck1(b *testing.B) {
s := "Hello"
b.ResetTimer()
for n := 0; n < b.N; n++ {
if s == "" {
}
}
}
func BenchmarkStringCheck2(b *testing.B) {
s := "Hello"
b.ResetTimer()
for n := 0; n < b.N; n++ {
if len(s) == 0 {
}
}
}
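A caution about benchmarks shaped like these: the if bodies are empty and the result is never used, so the compiler may remove the comparison, and a figure around 0.3 ns/op is consistent with timing little more than the loop itself. One common remedy is to store the result in a package-level variable. The sketch below assumes the names sink and countEmpty, which are not from the original answer, and uses testing.Benchmark so that it can run outside go test.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing results in it makes the work
// observable, so the compiler is less likely to discard it.
var sink int

// countEmpty counts the empty strings in ss using the == "" check.
func countEmpty(ss []string) int {
	n := 0
	for _, s := range ss {
		if s == "" {
			n++
		}
	}
	return n
}

func main() {
	inputs := []string{"Hello", "", "bar", " ", ""}

	// testing.Benchmark runs a benchmark function without the go test command.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = countEmpty(inputs)
		}
	})
	fmt.Println(res)
}
```

A len(s) == 0 twin of countEmpty can be measured the same way for a like-for-like comparison.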
