Counting characters in golang string - string

I am trying to count "characters" in Go. That is, if a string contains one printable "glyph", or "composed character" (what someone would ordinarily think of as a character), I want it to count as 1. For example, the string "Hello, 世🖖🏿🖖界" should count as 11, since a human would look at it and see 11 glyphs.
utf8.RuneCountInString() works well in most cases, including ASCII, accented letters, Asian characters and even emojis. However, as I understand it, runes correspond to code points, not characters. Basic emojis work, but emojis with a skin-tone modifier give the wrong count: https://play.golang.org/p/aFIGsB6MsO
From what I read here and here, the following should work, but I still don't get the right result (it over-counts):
func CountCharactersInString(str string) int {
var ia norm.Iter
ia.InitString(norm.NFC, str)
nc := 0
for !ia.Done() {
nc = nc + 1
ia.Next()
}
return nc
}
This doesn't work either:
func GraphemeCountInString(str string) int {
re := regexp.MustCompile("\\PM\\pM*|.")
return len(re.FindAllString(str, -1))
}
I am looking for something similar to this in Objective C:
+ (NSInteger)countCharactersInString:(NSString *) string {
// --- Calculate the number of characters entered by the user and update the character count label
NSInteger count = 0;
NSUInteger index = 0;
while (index < string.length) {
NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:index];
count++;
index += range.length;
}
return count;
}

The straightforward, native approach is utf8.RuneCountInString(), which counts runes (code points):
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
str := "Hello, 世🖖🖖界"
fmt.Println("counts =", utf8.RuneCountInString(str))
}
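To illustrate the caveat raised in the question, here is a minimal sketch comparing the rune count with and without a skin-tone modifier. The helper name countRunes is made up for this example, and the strings are written with escapes so the code points are explicit.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes is a hypothetical helper that wraps utf8.RuneCountInString.
// It counts code points, not user-perceived characters.
func countRunes(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	// U+1F596 is the "vulcan salute" emoji; U+1F3FF is a skin-tone modifier.
	plain := "Hello, \u4e16\U0001F596\U0001F596\u754c"
	toned := "Hello, \u4e16\U0001F596\U0001F3FF\U0001F596\u754c"

	fmt.Println(countRunes(plain)) // 11
	fmt.Println(countRunes(toned)) // 12: the modifier is a separate code point
}
```

Both strings show 11 glyphs, so the rune count matches the perceived character count only when no combining or modifier code points are present.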

I wrote a package that allows you to do this: https://github.com/rivo/uniseg. It breaks strings according to the rules specified in Unicode Standard Annex #29 which is what you are looking for. Here is how you would use it in your case:
package main
import (
"fmt"
"github.com/rivo/uniseg"
)
func main() {
fmt.Println(uniseg.GraphemeClusterCount("Hello, 世🖖🏿🖖界"))
}
This will print 11 as you expect.

Have you tried strings.Count?
package main
import (
"fmt"
"strings"
)
func main() {
fmt.Println(strings.Count("Hello, 世🖖🖖界", "🖖")) // Returns 2
}

See the example in the API documentation:
https://golang.org/pkg/unicode/utf8/#example_DecodeLastRuneInString
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
str := "Hello, 世🖖界"
count := 0
for len(str) > 0 {
r, size := utf8.DecodeLastRuneInString(str)
count++
fmt.Printf("%c %v\n", r, size)
str = str[:len(str)-size]
}
fmt.Println("count:", count)
}

I think the easiest way to do this would be like this:
package main
import "fmt"
func main() {
str := "Hello, 世🖖🖖界"
var counter int
for range str {
counter++
}
fmt.Println(counter)
}
This one prints 11

Related

How can I remove the last 4 characters from a string?

I want to remove the last 4 characters from a string, so "test.txt" becomes "test".
package main
import (
"fmt"
"strings"
)
func main() {
file := "test.txt"
fmt.Print(strings.TrimSuffix(file, "."))
}
This will safely remove any dot-extension - and will be tolerant if no extension is found:
func removeExtension(fpath string) string {
ext := filepath.Ext(fpath)
return strings.TrimSuffix(fpath, ext)
}
Playground example.
Table tests:
/www/main.js -> '/www/main'
/tmp/test.txt -> '/tmp/test'
/tmp/test2.text -> '/tmp/test2'
/tmp/test3.verylongext -> '/tmp/test3'
/user/bob.smith/has.many.dots.exe -> '/user/bob.smith/has.many.dots'
/tmp/zeroext. -> '/tmp/zeroext'
/tmp/noext -> '/tmp/noext'
-> ''
Though there is already an accepted answer, I want to share some slice tricks for string manipulation.
Remove last n characters from a string
As the title says, to remove the last 4 characters from a string you can slice it, which is a very common use of slices, i.e.,
file := "test.txt"
fmt.Println(file[:len(file)-4]) // you can replace 4 with any n
Output:
test
Playground example.
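Note that the slice expression above works on bytes and panics if the string is shorter than n bytes. A guarded variant that counts runes instead could look like the sketch below; trimLastN is a hypothetical helper name, not part of the standard library.

```go
package main

import "fmt"

// trimLastN is a hypothetical helper: it drops the last n runes of s.
// It returns s unchanged for n <= 0 and "" if s has n or fewer runes.
func trimLastN(s string, n int) string {
	if n <= 0 {
		return s
	}
	r := []rune(s) // convert so that multi-byte characters count as one
	if n >= len(r) {
		return ""
	}
	return string(r[:len(r)-n])
}

func main() {
	fmt.Println(trimLastN("test.txt", 4))           // test
	fmt.Println(trimLastN("\u4e16\u754c.go", 3))    // the two CJK characters remain
	fmt.Println(trimLastN("ab", 4) == "")           // true: no panic on short input
}
```

The conversion to []rune allocates, so for plain ASCII input the byte slice from the answer above is cheaper.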
Remove file extensions:
From your problem description, it looks like you are trying to trim the file extension suffix (i.e., .txt) from the string.
For this, I would prefer @colminator's answer from above, which is
file := "test.txt"
fmt.Println(strings.TrimSuffix(file, filepath.Ext(file)))
You can use this to remove everything after last "."
go playground
package main
import (
"fmt"
"strings"
)
func main() {
sampleInput := []string{
"/www/main.js",
"/tmp/test.txt",
"/tmp/test2.text",
"/tmp/test3.verylongext",
"/user/bob.smith/has.many.dots.exe",
"/tmp/zeroext.",
"/tmp/noext",
"",
"tldr",
}
for _, str := range sampleInput {
fmt.Println(removeExtn(str))
}
}
func removeExtn(input string) string {
if len(input) > 0 {
if i := strings.LastIndex(input, "."); i > 0 {
input = input[:i]
}
}
return input
}

Access a string as a character array for using in strings.Join() method: GO language

I am trying to access a string as a character array or as runes and join them with some separator. What is the right way to do it?
Here are the two ways I tried, but I get the error below:
cannot use ([]rune)(t)[i] (type rune) as type []string in argument to strings.Join
How is a string represented in Go? Is it like a character array?
package main
import (
"fmt"
"strings"
)
func main() {
var t = "hello"
s := ""
for i, rune := range t {
s += strings.Join(rune, "\n")
}
fmt.Println(s)
}
package main
import (
"fmt"
"strings"
)
func main() {
var t = "hello"
s := ""
for i := 0; i < len(t); i++ {
s += strings.Join([]rune(t)[i], "\n")
}
fmt.Println(s)
}
I also tried the approach below, but it does not work for me either.
var t = "hello"
s := ""
for i := 0; i < len(t); i++ {
s += strings.Join(string(t[i]), "\n")
}
fmt.Println(s)
The strings.Join method expects a slice of strings as first argument, but you are giving it a rune type.
You can use the strings.Split method to obtain a slice of strings from a string. Here is an example.
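A minimal sketch of that approach follows. The helper name joinChars is hypothetical; strings.Split with an empty separator splits after each UTF-8 sequence, which yields the []string that strings.Join expects.

```go
package main

import (
	"fmt"
	"strings"
)

// joinChars is a hypothetical helper: it splits s into one-character
// strings and joins them again with sep between each pair.
func joinChars(s, sep string) string {
	parts := strings.Split(s, "") // []string{"h", "e", "l", "l", "o"} for "hello"
	return strings.Join(parts, sep)
}

func main() {
	fmt.Println(joinChars("hello", "\n")) // one character per line
	fmt.Println(joinChars("hello", "-"))  // h-e-l-l-o
}
```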

How to merge multiple strings and int into a single string

I am a newbie in Go. I can't find any official docs showing how to merge multiple strings into a new string.
What I'm expecting:
Input: "key:", "value", ", key2:", 100
Output: "key:value, key2:100"
I want to use + to merge strings like in Java and Swift if possible.
I like to use fmt's Sprintf method for this type of thing. It works like Printf in Go or C only it returns a string. Here's an example:
output := fmt.Sprintf("%s%s%s%d", "key:", "value", ", key2:", 100)
Go docs for fmt.Sprintf
You can use strings.Join, which is almost 3x faster than fmt.Sprintf. However it can be less readable.
output := strings.Join([]string{"key:", "value", ", key2:", strconv.Itoa(100)}, "")
See https://play.golang.org/p/AqiLz3oRVq
strings.Join vs fmt.Sprintf
BenchmarkFmt-4 2000000 685 ns/op
BenchmarkJoins-4 5000000 244 ns/op
Buffer
If you need to merge a lot of strings, I'd consider using a buffer rather than those solutions mentioned above.
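As a rough sketch of the buffer approach, assuming bytes.Buffer from the standard library is what is meant (strings.Builder would work similarly), the question's inputs could be merged like this. The function name buildPairs is made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// buildPairs is a hypothetical helper that appends each piece to a
// single buffer instead of creating intermediate strings.
func buildPairs() string {
	var buf bytes.Buffer
	buf.WriteString("key:")
	buf.WriteString("value")
	buf.WriteString(", key2:")
	buf.WriteString(strconv.Itoa(100)) // ints must be converted first
	return buf.String()
}

func main() {
	fmt.Println(buildPairs()) // key:value, key2:100
}
```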
You can simply do this:
package main
import (
"fmt"
"strconv"
)
func main() {
result := "str1" + "str2" + strconv.Itoa(123) + "str3" + strconv.Itoa(12)
fmt.Println(result)
}
Using fmt.Sprintf()
var s1 = "abc"
var s2 = "def"
var num = 100
ans := fmt.Sprintf("%s%d%s", s1, num, s2)
fmt.Println(ans)
You can use text/template:
package main
import (
"text/template"
"strings"
)
func format(s string, v interface{}) string {
t, b := new(template.Template), new(strings.Builder)
template.Must(t.Parse(s)).Execute(b, v)
return b.String()
}
func main() {
s := struct{
Key string
Key2 int
}{"value", 100}
f := format("key:{{.Key}}, key2:{{.Key2}}", s)
println(f)
}
or fmt.Sprint:
package main
import "fmt"
func main() {
s := fmt.Sprint("key:", "value", ", key2:", 100)
println(s)
}
https://golang.org/pkg/fmt#Sprint
https://pkg.go.dev/text/template
Here's a simple way to combine a string and an integer in Go (version go1.18.1):
package main
import (
"fmt"
"io"
"os"
)
func main() {
const name, age = "John", 26
s := fmt.Sprintf("%s is %d years old.\n", name, age)
io.WriteString(os.Stdout, s) // Ignoring error for simplicity.
}
Output:
John is 26 years old.

How can I assign a new char into a string in Go?

I'm trying to alter an existing string in Go but I keep getting this error "cannot assign to new_str[i]"
package main
import "fmt"
func ToUpper(str string) string {
new_str := str
for i:=0; i<len(str); i++{
if str[i]>='a' && str[i]<='z'{
chr:=uint8(rune(str[i])-'a'+'A')
new_str[i]=chr
}
}
return new_str
}
func main() {
fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
This is my code. I want to change lowercase to uppercase, but I can't alter the string. Why? How can I alter it?
P.S I wish to use ONLY the fmt package!
Thanks in advance.
You can't: strings are immutable. From the Go Programming Language Specification:
Strings are immutable: once created, it is impossible to change the contents of a string.
You can, however, convert it to a []byte slice and alter that:
func ToUpper(str string) string {
new_str := []byte(str)
for i := 0; i < len(str); i++ {
if str[i] >= 'a' && str[i] <= 'z' {
chr := uint8(rune(str[i]) - 'a' + 'A')
new_str[i] = chr
}
}
return string(new_str)
}
Working sample: http://play.golang.org/p/uZ_Gui7cYl
Use range and avoid unnecessary conversions and allocations. Strings are immutable. For example,
package main
import "fmt"
func ToUpper(s string) string {
var b []byte
for i, c := range s {
if c >= 'a' && c <= 'z' {
if b == nil {
b = []byte(s)
}
b[i] = byte('A' + rune(c) - 'a')
}
}
if b == nil {
return s
}
return string(b)
}
func main() {
fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
Output:
CDSRGGDH7865FXGH
In Go, strings are immutable. Here is one very bad way of doing what you want (playground):
package main
import "fmt"
func ToUpper(str string) string {
new_str := ""
for i := 0; i < len(str); i++ {
chr := str[i]
if chr >= 'a' && chr <= 'z' {
chr = chr - 'a' + 'A'
}
new_str += string(chr)
}
return new_str
}
func main() {
fmt.Println(ToUpper("cdsrgGDH7865fxgh"))
}
This is bad because:
- you are treating each byte of your string as a character; what if it contains multi-byte UTF-8? Using range str is the way to go
- appending to strings is slow (lots of allocations); a bytes.Buffer would be a good idea
- there is already a very good library routine to do this: strings.ToUpper
It is worth exploring the line new_str += string(chr) a bit more. Strings are immutable, so what this does is make a new string with the chr on the end, it doesn't extend the old string. This is wildly inefficient for long strings as the allocated memory will tend to the square of the string length.
Next time just use strings.ToUpper!

How to convert an int value to string in Go?

i := 123
s := string(i)
s is "{" (the character with code point 123), but what I want is "123"
Please tell me how can I get "123".
And in Java, I can do in this way:
String s = "ab" + "c" // s is "abc"
How can I concatenate two strings in Go?
Use the strconv package's Itoa function.
For example:
package main
import (
"strconv"
"fmt"
)
func main() {
t := strconv.Itoa(123)
fmt.Println(t)
}
You can concat strings simply by +'ing them, or by using the Join function of the strings package.
fmt.Sprintf("%v", value)
If you know the specific type of the value, use the corresponding verb, for example %d for int.
More info - fmt
fmt.Sprintf, strconv.Itoa and strconv.FormatInt will do the job. But Sprintf will use the package reflect, and it will allocate one more object, so it's not an efficient choice.
It is interesting to note that strconv.Itoa is shorthand for
func FormatInt(i int64, base int) string
with base 10
For Example:
strconv.Itoa(123)
is equivalent to
strconv.FormatInt(int64(123), 10)
For floating-point values, you can use fmt.Sprintf or strconv.FormatFloat
For example
package main
import (
"fmt"
)
func main() {
val := 14.7
s := fmt.Sprintf("%f", val)
fmt.Println(s)
}
In this case both strconv and fmt.Sprintf do the same job, but the strconv package's Itoa function is the better choice, because fmt.Sprintf allocates one more object during the conversion.
check the benchmark here: https://gist.github.com/evalphobia/caee1602969a640a4530
see https://play.golang.org/p/hlaz_rMa0D for example.
Converting int64:
n := int64(32)
str := strconv.FormatInt(n, 10)
fmt.Println(str)
// Prints "32"
Another option:
package main
import "fmt"
func main() {
n := 123
s := fmt.Sprint(n)
fmt.Println(s == "123")
}
https://golang.org/pkg/fmt#Sprint
OK, most of the other answers have already shown you something good.
Let me give you this:
// ToString Change arg to string
func ToString(arg interface{}, timeFormat ...string) string {
if len(timeFormat) > 1 {
log.SetFlags(log.Llongfile | log.LstdFlags)
log.Println(errors.New(fmt.Sprintf("timeFormat's length should be one")))
}
var tmp = reflect.Indirect(reflect.ValueOf(arg)).Interface()
switch v := tmp.(type) {
case int:
return strconv.Itoa(v)
case int8:
return strconv.FormatInt(int64(v), 10)
case int16:
return strconv.FormatInt(int64(v), 10)
case int32:
return strconv.FormatInt(int64(v), 10)
case int64:
return strconv.FormatInt(v, 10)
case string:
return v
case float32:
return strconv.FormatFloat(float64(v), 'f', -1, 32)
case float64:
return strconv.FormatFloat(v, 'f', -1, 64)
case time.Time:
if len(timeFormat) == 1 {
return v.Format(timeFormat[0])
}
return v.Format("2006-01-02 15:04:05")
case jsoncrack.Time:
if len(timeFormat) == 1 {
return v.Time().Format(timeFormat[0])
}
return v.Time().Format("2006-01-02 15:04:05")
case fmt.Stringer:
return v.String()
case reflect.Value:
return ToString(v.Interface(), timeFormat...)
default:
return ""
}
}
package main
import (
"fmt"
"strconv"
)
func main(){
//First question: how to get int string?
intValue := 123
// keeping it in separate variable :
strValue := strconv.Itoa(intValue)
fmt.Println(strValue)
//Second question: how to concat two strings?
firstStr := "ab"
secondStr := "c"
s := firstStr + secondStr
fmt.Println(s)
}

Resources