How to find the offset index of a string in []rune using Go - string

How can I find the offset index of a string in a []rune using Go?
I can do this with the string type:
if i := strings.Index(input[offset:], "}}"); i > 0 {print(i);}
but I need it for runes.
I have a []rune and want to get the offset index.
How can I do this with the rune type in Go?
An example to make clearer what I need:
offset := 0 // start from 0 (this is important for me)
text := "123456783}}56"
if i := strings.Index(text[offset:], "}}"); i > 0 {print(i);}
The output of this example is: 9
But I want to do this with text as a []rune.
Is that possible?
See my current code: https://play.golang.org/p/seImKzVpdh
Thank you.

Edit #2: You have again changed the meaning of your question: you now want to search for a string in a []rune.
Answer: this is not supported directly by the standard library, but it is easy to implement with 2 for loops:
func search(text []rune, what string) int {
    whatRunes := []rune(what)
    // Stop where what no longer fits, so that text[i+j] never goes out of range.
    for i := 0; i+len(whatRunes) <= len(text); i++ {
        found := true
        for j := range whatRunes {
            if text[i+j] != whatRunes[j] {
                found = false
                break
            }
        }
        if found {
            return i
        }
    }
    return -1
}
Testing it:
value := []rune("123}456}}789")
result := search(value, "}}")
fmt.Println(result)
Output (try it on the Go Playground):
7
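Since the question stresses a start offset, here is a minimal sketch of an offset-aware variant. The name indexRunes and its signature are my own assumptions, not something the standard library provides; the offset is counted in runes and the returned index is relative to the start of text.

```go
package main

import "fmt"

// indexRunes is a hypothetical helper (not part of the standard library).
// It returns the index of the first occurrence of what in text at or after
// offset, counted in runes, or -1 if there is none.
func indexRunes(text, what []rune, offset int) int {
	if offset < 0 {
		offset = 0
	}
	// Stop where what no longer fits, so text[i+j] stays in range.
	for i := offset; i+len(what) <= len(text); i++ {
		found := true
		for j := range what {
			if text[i+j] != what[j] {
				found = false
				break
			}
		}
		if found {
			return i
		}
	}
	return -1
}

func main() {
	text := []rune("世界}}123}}")
	fmt.Println(indexRunes(text, []rune("}}"), 0)) // 2
	fmt.Println(indexRunes(text, []rune("}}"), 3)) // 7
	fmt.Println(indexRunes(text, []rune("}}"), 8)) // -1
}
```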
Edit: You updated the question indicating that you want to search runes in a string.
You may easily convert a []rune to a string using a simple type conversion:
toSearchRunes := []rune{'}', '}'}
toSearch := string(toSearchRunes)
And from there on, you can use strings.Index() as you did in your example:
if i := strings.Index(text[offset:], toSearch); i >= 0 { // >= 0 so a match at the very start counts too
    print(i)
}
Try it on the Go Playground.
Original answer follows:
string values in Go are stored as UTF-8 encoded bytes. strings.Index() returns you the byte position if the given substring is found.
So basically what you want is to convert this byte-position to rune-position. The unicode/utf8 package contains utility functions for telling the rune-count or rune-length of a string: utf8.RuneCountInString().
So basically you just need to pass the substring to this function:
offset := 0
text := "123456789}}56"
if i := strings.Index(text[offset:], "}}"); i >= 0 {
    // i is relative to offset, so count the runes in text[offset:offset+i]
    fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
text = "世界}}世界"
if i := strings.Index(text[offset:], "}}"); i >= 0 {
    fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
Output (try it on the Go Playground):
byte-pos: 9 rune-pos: 9
byte-pos: 6 rune-pos: 2
Note: offset must also be a byte position, because when slicing a string like text[offset:], the index is interpreted as byte-index.
If you want to get the index of a rune, use strings.IndexRune() instead of strings.Index().
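As a small sketch of that last remark: strings.IndexRune() also returns a byte index, so the same rune-count trick applies. The helper name runeIndex is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndex is a hypothetical helper: it returns the position of r in s
// counted in runes, or -1 if r is not present.
func runeIndex(s string, r rune) int {
	i := strings.IndexRune(s, r) // byte index, or -1
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:i])
}

func main() {
	s := "世界}}"
	fmt.Println(strings.IndexRune(s, '}')) // 6 (byte index)
	fmt.Println(runeIndex(s, '}'))         // 2 (rune index)
	fmt.Println(runeIndex(s, 'x'))         // -1
}
```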

Related

How to convert a string to rune?

Here is my code snippet:
var converter = map[rune]rune {//some data}
sample := "⌘こんにちは"
var tmp string
for _, runeValue := range sample {
    fmt.Printf("%+q", runeValue)
    tmp = fmt.Sprintf("%+q", runeValue)
}
The output of fmt.Printf("%+q", runeValue) is:
'\u2318'
'\u3053'
'\u3093'
'\u306b'
'\u3061'
'\u306f'
These values are literally runes, but since the return type of Sprintf is string, I cannot use them in my map, which is a map[rune]rune.
I was wondering how I can convert a string to a rune, or in other words, how I can handle this problem?
A string is not a single rune; it may contain multiple runes. You may use a simple type conversion to convert a string to a []rune containing all its runes, like []rune(sample).
The for range iterates over the runes of a string, so in your example runeValue is of type rune, you may use it in your converter map, e.g.:
var converter = map[rune]rune{}
sample := "⌘こんにちは"
for _, runeValue := range sample {
    converter[runeValue] = runeValue
}
fmt.Println(converter)
But since rune is an alias for int32, printing the above converter map will print integer numbers, output will be:
map[8984:8984 12371:12371 12385:12385 12395:12395 12399:12399 12435:12435]
If you want to print characters, use the %c verb of fmt.Printf():
fmt.Printf("%c\n", converter)
Which will output:
map[⌘:⌘ こ:こ ち:ち に:に は:は ん:ん]
Try the examples on the Go Playground.
If you want to replace (switch) certain runes in a string, use the strings.Map() function, for example:
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
    if r == '⌘' {
        return 'a'
    }
    if r == 'こ' {
        return 'b'
    }
    return r
}, sample)
fmt.Println(result)
Which outputs (try it on the Go Playground):
abんにちは
If you want the replacements defined by a converter map:
var converter = map[rune]rune{
    '⌘': 'a',
    'こ': 'b',
}
sample := "⌘こんにちは"
result := strings.Map(func(r rune) rune {
    if c, ok := converter[r]; ok {
        return c
    }
    return r
}, sample)
fmt.Println(result)
This outputs the same. Try this one on the Go Playground.
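One more detail of strings.Map() that may be handy here: if the mapping function returns a negative value, the rune is dropped from the result. A minimal sketch, where dropRunes is a name I made up for the example:

```go
package main

import (
	"fmt"
	"strings"
)

// dropRunes is a hypothetical helper that removes every rune listed in drop.
func dropRunes(s string, drop map[rune]bool) string {
	return strings.Map(func(r rune) rune {
		if drop[r] {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Println(dropRunes("⌘こんにちは", map[rune]bool{'⌘': true, 'ん': true})) // こにちは
}
```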
Convert a string to a rune slice:
runeArray := []rune("пример")

How to find a substring skipping N chars

How do I get the index of a substring in a string, starting the search at a certain position/offset, e.g.:
package main
import (
    "fmt"
    "strings"
)
func main() {
    s := "something.value=something=end"
    index1 := strings.Index(s, "value=")
    fmt.Println(index1) // prints 10
    // index2 = ... How do I get the position of the second =, 25?
}
This is similar to the offset parameter in PHP: int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )
The strings package does not provide such a function, but in practice it's rarely needed. Often the strings.Split() function is used to easily split strings into tokens / parts.
But if you do need it: you can simply slice a string, which is efficient (no copy is made; the result shares memory with the original string value).
So effectively the function you're looking for would look like this:
func Index(s, substr string, offset int) int {
    if len(s) < offset {
        return -1
    }
    if idx := strings.Index(s[offset:], substr); idx >= 0 {
        return offset + idx
    }
    return -1
}
Example using it:
s := "something.value=something=end"
index1 := strings.Index(s, "value=")
fmt.Println(index1) // prints 10
index2 := Index(s, "=", index1+len("value="))
fmt.Println(index2) // prints 25
Output (try it on the Go Playground):
10
25
Note that both the slicing of a string and the offset you pass to our Index() function use byte indices, not rune (character) indices. The two are equal as long as all characters have code points below 128, but beyond that the byte index will be greater than the rune index, because those code points map to multiple bytes in UTF-8 encoding (which is how Go stores strings in memory). strings.Index() returns the byte index, and len(s) also returns the byte length, so the example will work properly with all strings.
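If the offset you have is a rune (character) count, it has to be converted to a byte offset before slicing. Here is a minimal sketch of one way to do that; byteOffset is a hypothetical helper, not something from the strings package.

```go
package main

import (
	"fmt"
	"strings"
)

// byteOffset is a hypothetical helper: it converts a rune offset into the
// corresponding byte offset in s, or returns -1 if s has fewer runes.
func byteOffset(s string, runeOffset int) int {
	if runeOffset < 0 {
		return -1
	}
	n := 0
	for i := range s { // i is the byte index at which each rune starts
		if n == runeOffset {
			return i
		}
		n++
	}
	if n == runeOffset {
		return len(s) // offset pointing just past the last rune
	}
	return -1
}

func main() {
	s := "世界=x=y"
	start := byteOffset(s, 3) // rune 3 is 'x', which starts at byte 7
	fmt.Println(start)        // 7
	if i := strings.Index(s[start:], "="); i >= 0 {
		fmt.Println(start + i) // 8 (byte index of the second '=')
	}
}
```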
Your original task using strings.Split() could look like this:
s := "something.value=something=end"
parts := strings.Split(s, "=")
fmt.Println(parts)
Which outputs (try it on the Go Playground):
[something.value something end]
The value you want to "parse" out is in parts[1].
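If you are on Go 1.18 or newer (an assumption about your toolchain), strings.Cut() is another way to pull out the value without computing indices by hand. A minimal sketch; valueOf is a made-up helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// valueOf is a hypothetical helper: it returns the text between key and the
// next sep in s, and whether key was found at all.
func valueOf(s, key, sep string) (string, bool) {
	_, rest, found := strings.Cut(s, key)
	if !found {
		return "", false
	}
	// If sep is missing, Cut returns rest unchanged as the first result.
	value, _, _ := strings.Cut(rest, sep)
	return value, true
}

func main() {
	v, ok := valueOf("something.value=something=end", "value=", "=")
	fmt.Println(v, ok) // something true
}
```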
Taking a slice of a string with multi-byte UTF-8 characters may produce a corrupted string if the byte index falls in the middle of a character; to slice by character index, convert it to runes first:
[]rune(videoHtml)[0:index]
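To make that concrete, here is a small sketch that cuts a string in the middle of a character and then slices it by runes instead. The helper firstRunes is hypothetical and only there for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes is a hypothetical helper that returns the first n runes of s
// (or all of s if it has fewer than n runes).
func firstRunes(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "世界abc"
	cut := s[:4]                       // byte slice: ends in the middle of '界'
	fmt.Println(utf8.ValidString(cut)) // false
	fmt.Println(firstRunes(s, 2))      // 世界
}
```

Note that the []rune conversion copies and decodes the whole string, so for very long strings a loop with for ... range may be cheaper.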

Golang converting from rune to string

I have the following code, it is supposed to cast a rune into a string and print it. However, I am getting undefined characters when it is printed. I am unable to figure out where the bug is:
package main
import (
    "fmt"
    "strconv"
    "strings"
    "text/scanner"
)
func main() {
    var b scanner.Scanner
    const a = `a`
    b.Init(strings.NewReader(a))
    c := b.Scan()
    fmt.Println(strconv.QuoteRune(c))
}
That's because you used Scanner.Scan() to read a rune but it does something else. Scanner.Scan() reads the next token (or a single character that is not part of a recognized token), as controlled by the Scanner.Mode bitmask, and for tokens it returns special constants from the text/scanner package, not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to a string, use a simple type conversion. rune is an alias for int32, and the spec says this about converting integer numbers to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
    fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
    fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
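To see what Scan() returns for different inputs, here is a small sketch that prints the kind of each token using scanner.TokenString(). The helper tokenKinds is my own name for the example, and it relies on the default Mode set by Init().

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// tokenKinds is a hypothetical helper that scans src with the default mode
// and returns a printable kind for each token.
func tokenKinds(src string) []string {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	var kinds []string
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		kinds = append(kinds, scanner.TokenString(tok))
	}
	return kinds
}

func main() {
	// '+' is not a recognized token class, so Scan returns the character itself.
	fmt.Println(tokenKinds("a 12 +")) // [Ident Int "+"]
}
```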
I know I'm a bit late to the party, but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
    // don't need the index, so use _
    for _, v := range runes {
        outString += string(v)
    }
    return
}
Yes, there is a named return, but I think it's OK in this case as it reduces the number of lines and the function is short.
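For what it's worth, the built-in conversion string(runes) gives the same result without a loop, and if you do want a loop, strings.Builder avoids allocating a new string on every iteration. A minimal sketch of both (runesToStringBuilder is a made-up name):

```go
package main

import (
	"fmt"
	"strings"
)

// runesToString relies on the built-in conversion, which encodes the runes
// as UTF-8 in one step.
func runesToString(runes []rune) string {
	return string(runes)
}

// runesToStringBuilder is a hypothetical loop-based variant that appends
// into a strings.Builder instead of concatenating strings.
func runesToStringBuilder(runes []rune) string {
	var sb strings.Builder
	for _, r := range runes {
		sb.WriteRune(r)
	}
	return sb.String()
}

func main() {
	runes := []rune{'a', 'ഐ', 'b', 'c'}
	fmt.Println(runesToString(runes))        // aഐbc
	fmt.Println(runesToStringBuilder(runes)) // aഐbc
}
```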
This simple code converts a rune to a string:
s := fmt.Sprintf("%c", r) // r is a rune
Since I came to this question searching for rune, string and char, I thought this may help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
    // string to byte slice: a simple conversion that copies the string's bytes
    // (a string is an immutable sequence of bytes)
    var twoByteArr []byte = []byte(oneString)
    // string to rune slice: also a simple conversion, but it decodes the
    // UTF-8 bytes into code points (a string is NOT stored as a slice of runes)
    var threeRuneSlice []rune = []rune(oneString)
    // A rune slice can be converted back to a string,
    // which re-encodes the runes as UTF-8
    var thirdString string = string(threeRuneSlice)
    // There is a catch when printing "characters" using a for loop and range
    fmt.Println("Chars in oneString")
    for i, r := range oneString {
        fmt.Printf(" %d %v %c ", i, r, r) // you may not get index 0,1,2,3 here,
        // since range over a string yields the byte index where each rune starts: https://blog.golang.org/strings
    }
    fmt.Println("\nChars in threeRuneSlice")
    for i, r := range threeRuneSlice {
        fmt.Printf(" %d %v %c ", i, r, r) // i = 0,1,2,3
        // each element is one rune (rune is int32, byte is uint8); inside a
        // string a rune is encoded as 1 to 4 bytes of UTF-8, and the rune
        // is the real character (code point)
    }
    fmt.Println("\nValues in oneString ")
    for j := 0; j < len(oneString); j++ {
        fmt.Printf(" %d %v ", j, oneString[j]) // No, you cannot get characters if you iterate through a string this way,
        // as you are going over bytes here, not runes
    }
    fmt.Println("\nValues in twoByteArr")
    for j := 0; j < len(twoByteArr); j++ {
        fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same as above
    }
    fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And some more pointless demo https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that the 'characters' are encoded with one to up to 4 bytes depending on the unicode code point
Here are some simple examples to understand quickly how to do it.
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨‍🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '‍', '🦰') // U+1F468 U+200D U+1F9B0

Go: convert rune (string) to string representation of the binary

This is just in case someone else is learning Golang and is wondering how to convert from a string to a string representation in binary.
Long story short, I have been looking at the standard library without being able to find the right call. So I started with something similar to the following:
func RuneToBinary(r rune) string {
    var buf bytes.Buffer
    b := []int64{128, 64, 32, 16, 8, 4, 2, 1}
    v := int64(r)
    for i := 0; i < len(b); i++ {
        t := v - b[i]
        if t >= 0 {
            fmt.Fprintf(&buf, "1")
            v = t
        } else {
            fmt.Fprintf(&buf, "0")
        }
    }
    return buf.String()
}
This is all well and dandy, but after a couple of days of looking around I found that I should have been using the fmt package instead and just formatting the rune with %b:
var r rune
fmt.Printf("input: %b ", r)
Is there a better way to do this?
Thanks
Standard library support
fmt.Printf("%b", r) - this solution is already very compact and easy to write and understand. If you need the result as a string, you can use the analog Sprintf() function:
s := fmt.Sprintf("%b", r)
You can also use the strconv.FormatInt() function which takes a number of type int64 (so you first have to convert your rune) and a base where you can pass 2 to get the result in binary representation:
s := strconv.FormatInt(int64(r), 2)
Note that in Go rune is just an alias for int32; the 2 types are one and the same (you may simply refer to it by 2 names).
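If you want a fixed number of digits (like the 8 digits the manual version in the question produces), the fmt verb accepts a zero-padded width. A minimal sketch; paddedBinary is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strconv"
)

// paddedBinary is a hypothetical helper that formats r in base 2,
// left-padded with zeros to at least width digits.
func paddedBinary(r rune, width int) string {
	// The * takes the width from the argument list.
	return fmt.Sprintf("%0*b", width, r)
}

func main() {
	r := 'a'                                    // code point 97
	fmt.Println(strconv.FormatInt(int64(r), 2)) // 1100001
	fmt.Println(paddedBinary(r, 8))             // 01100001
}
```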
Doing it manually ("Simple but Naive"):
If you'd want to do it "manually", there is a much simpler solution than your original. You can test the lowest bit with r & 0x01 == 0 and shift all bits with r >>= 1. Just "loop" over all bits and append either "1" or "0" depending on the bit:
Note this is just for demonstration, it is nowhere near optimal regarding performance (generates "redundant" strings):
func RuneToBin(r rune) (s string) {
    if r == 0 {
        return "0"
    }
    for digits := []string{"0", "1"}; r > 0; r >>= 1 {
        s = digits[r&1] + s
    }
    return
}
Note: negative numbers are not handled by the function. If you also want to handle negative numbers, you can first check the sign, proceed with the absolute value, and start the return value with a minus '-' sign. This also applies to the other manual solution below.
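A minimal sketch of that sign handling could look like this. RuneToBinSigned is a name I made up; it widens to int64 first, on the assumption that you also want the smallest int32 value to work, and it follows the same "-digits" convention as strconv.FormatInt().

```go
package main

import (
	"fmt"
	"strconv"
)

// RuneToBinSigned is a hypothetical variant of the manual conversion that
// also accepts negative values.
func RuneToBinSigned(r rune) string {
	if r == 0 {
		return "0"
	}
	// Widen to int64 first so that negating the smallest int32 cannot overflow.
	v := int64(r)
	sign := ""
	if v < 0 {
		sign, v = "-", -v
	}
	const digits = "01"
	s := ""
	for ; v > 0; v >>= 1 {
		s = string(digits[v&1]) + s
	}
	return sign + s
}

func main() {
	fmt.Println(RuneToBinSigned(5))              // 101
	fmt.Println(RuneToBinSigned(-5))             // -101
	fmt.Println(strconv.FormatInt(int64(-5), 2)) // -101, same convention
}
```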
Manual Performance-wise solution:
For a fast solution we shouldn't append strings. Since strings in Go are just byte slices encoded using UTF-8, appending a digit is just appending the byte value of the rune '0' or '1' which is just one byte (not multi). So we can allocate a big enough buffer/array (rune is 32 bits so max 32 binary digits), and fill it backwards so we won't even have to reverse it at the end. And return the used part of the array converted to string at the end. Note that I don't even call the built-in append function to append the binary digits, I just set the respective element of the array in which I build the result:
func RuneToBinFast(r rune) string {
    if r == 0 {
        return "0"
    }
    b, i := [32]byte{}, 31
    for ; r > 0; r, i = r>>1, i-1 {
        if r&1 == 0 {
            b[i] = '0'
        } else {
            b[i] = '1'
        }
    }
    return string(b[i+1:])
}

Why does this conversion in Go from a rune-string to an integer not work?

I have the following code:
I know about runes in Go; I have read a lot about them over the last hours while trying to solve this...
package main
import (
    "fmt"
    "strconv"
)
func main() {
    e := "\x002"
    fmt.Println(e)
    new := string(e)
    i, err := strconv.Atoi(new)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Println(i)
}
result is:
2
strconv.ParseInt: parsing "\x002": invalid syntax
0
Why can't I convert the string to an integer?
Any help appreciated!
I'm not 100% sure of your goal but it looks like you want to extract the int value of the rune you get from a string containing a given character.
It looks like you want
e := "\x02"
runes := []rune(e)
i := runes[0]
fmt.Println(i) // 2
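In a string literal, \x must be followed by exactly two hex digits and denotes a single byte, so "\x002" is the byte 0x00 followed by the character '2'. If you want a literal backslash in the string instead, you need to escape it.
Either use:
e := "\\x002"
or use a raw string:
e := `\x002`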
Edit:
Why do you think \x002 is a valid integer? Do you mean 0x002?
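If the intent really was the hexadecimal literal 0x002, a minimal sketch would be to parse it with strconv.ParseInt() and base 0, which picks the base from the prefix. This assumes the input is the text "0x002"; parsePrefixed is a made-up helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePrefixed is a hypothetical helper: base 0 lets strconv.ParseInt pick
// the base from the prefix ("0x" means hexadecimal).
func parsePrefixed(s string) (int64, error) {
	return strconv.ParseInt(s, 0, 64)
}

func main() {
	n, err := parsePrefixed("0x002")
	fmt.Println(n, err) // 2 <nil>

	// The string from the question is a NUL byte followed by '2',
	// which is not a valid decimal number.
	_, err = strconv.Atoi("\x002")
	fmt.Println(err != nil) // true
}
```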
