How to find a substring skipping N chars - string

How do I get the index of a substring in a string, starting the search at a certain position (offset)? For example:
package main
import (
	"fmt"
	"strings"
)
func main() {
	s := "something.value=something=end"
	index1 := strings.Index(s, "value=")
	fmt.Println(index1) // prints 10
	// index2 := ... How do I get the position of the second =, 25?
}
Something similar to the offset parameter of PHP's strpos(): int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

The strings package does not provide such a function, but in practice it's rarely needed. Often the strings.Split() function is used to split strings into tokens / parts easily.
But if you do need it: you can simply slice a string, which is efficient (no copy is made, the result shares memory with the original string value).
So effectively the function you're looking for would look like this:
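// Index returns the byte index of the first occurrence of substr in s,
// searching from byte position offset; it returns -1 if substr is not
// found or offset is out of range.
func Index(s, substr string, offset int) int {
	if offset < 0 || offset > len(s) {
		return -1
	}
	// Slicing does not copy: s[offset:] shares memory with s.
	if idx := strings.Index(s[offset:], substr); idx >= 0 {
		return offset + idx // convert back to an index into s
	}
	return -1
}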
Example using it:
s := "something.value=something=end"
index1 := strings.Index(s, "value=")
fmt.Println(index1) // prints 10
index2 := Index(s, "=", index1+len("value="))
fmt.Println(index2) // prints 25
Output (try it on the Go Playground):
10
25
Note that when slicing a string, the offset (and so the offset you pass to our Index() function) is a byte index, not a rune (character) index. The two are equal as long as all characters have code points below 128 (ASCII), but beyond that the byte index will be greater than the rune index, because those code points map to multiple bytes in UTF-8 encoding (which is how Go stores strings in memory). strings.Index() returns a byte index, and len(s) also returns the length in bytes, so the example works properly with all strings.
Your original task, solved with strings.Split(), could look like this:
s := "something.value=something=end"
parts := strings.Split(s, "=")
fmt.Println(parts)
Which outputs (try it on the Go Playground):
[something.value something end]
The value you want to "parse" out is in parts[1].
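If you still need the position of the second = (25 in the question), it can be recovered from the split result, because the parts plus the separators add up to the original string. A sketch, assuming strings.SplitN (a variant of Split that limits the number of parts) and a hypothetical helper name secondSepIndex:

```go
package main

import (
	"fmt"
	"strings"
)

// secondSepIndex (hypothetical helper) returns the byte index of the second
// occurrence of sep in s, computed from the lengths of the split parts,
// or -1 if sep is empty or occurs fewer than two times.
func secondSepIndex(s, sep string) int {
	if sep == "" {
		return -1
	}
	parts := strings.SplitN(s, sep, 3) // at most 3 parts: before, between, rest
	if len(parts) < 3 {
		return -1
	}
	return len(parts[0]) + len(sep) + len(parts[1])
}

func main() {
	s := "something.value=something=end"
	fmt.Println(secondSepIndex(s, "=")) // prints 25
}
```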

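Taking a slice of a string that contains multi-byte UTF-8 characters may produce corrupted strings if the index does not fall on a rune boundary. If your index counts runes (characters) rather than bytes, convert the string to runes first:
[]rune(videoHtml)[0:index]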

Related

How can I iterate over each 2 consecutive characters in a string in Go?

I have a string like this:
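package main
import "fmt"
func main() {
	some := "p1k4"
	for i, j := range some {
		// How do I print each pair of consecutive characters here?
		_, _ = i, j // i and j are not used yet; this keeps the code compiling
		fmt.Println()
	}
}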
I want to take each two consecutive characters in the string and print them, wrapping around from the last character to the first. The output should look like: p1, 1k, k4, 4p.
I have tried, but I'm still having trouble finding the answer. How should I write the code in Go to get the output I want?
Go stores strings in memory as their UTF-8 encoded byte sequence. ASCII characters map one-to-one to bytes, but characters outside that range map to multiple bytes.
So I would advise using a for range loop over the string, which ranges over the runes (characters) of the string, properly decoding multi-byte runes. This has the advantage that it does not require allocation (unlike converting the string to []rune). You may also print the pairs using fmt.Printf("%c%c", char1, char2), which avoids converting the runes back to strings and concatenating them (and the allocations that would involve).
To learn more about strings, characters and runes in Go, read the blog post: Strings, bytes, runes and characters in Go
Since the loop only gives you the "current" rune in each iteration (not the previous or the next one), use extra variables to store the previous and the first rune so you have access to them when printing.
Let's write a function that prints the pairs as you want:
func printPairs(s string) {
	var first, prev rune
	for i, r := range s {
		if i == 0 {
			// Remember the first rune so the last pair can wrap around to it.
			first, prev = r, r
			continue
		}
		fmt.Printf("%c%c, ", prev, r)
		prev = r
	}
	// Print last pair: prev is the last rune, first is the first rune
	fmt.Printf("%c%c\n", prev, first)
}
Testing it with your input and with another string that has multi-byte runes:
printPairs("p1k4")
printPairs("Go-世界")
Output will be (try it on the Go Playground):
p1, 1k, k4, 4p
Go, o-, -世, 世界, 界G
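package main
import (
	"fmt"
)
func main() {
	str := "12345"
	// Indexing a string yields bytes, so this only works for single-byte (ASCII) characters.
	for i := 0; i < len(str); i++ {
		// (i+1)%len(str) wraps around to the first character after the last one.
		fmt.Println(string(str[i]) + string(str[(i+1)%len(str)]))
	}
}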
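The byte-indexing approach above breaks on multi-byte characters (e.g. "Go-世界"). A minimal variant of the same modulo idea that first converts the string to []rune, at the cost of one allocation, might look like this (the wrapPairs name is hypothetical):

```go
package main

import "fmt"

// wrapPairs (hypothetical helper) returns the pairs of consecutive
// characters of s, with the last character paired with the first.
func wrapPairs(s string) []string {
	rs := []rune(s) // decode UTF-8 so indices count characters, not bytes
	out := make([]string, 0, len(rs))
	for i := range rs {
		// (i+1)%len(rs) wraps around to index 0 after the last rune.
		out = append(out, string(rs[i])+string(rs[(i+1)%len(rs)]))
	}
	return out
}

func main() {
	fmt.Println(wrapPairs("12345"))   // [12 23 34 45 51]
	fmt.Println(wrapPairs("Go-世界")) // [Go o- -世 世界 界G]
}
```

For an empty string the loop body never runs, so there is no division by zero in the modulo.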
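This is a simple for loop over your string with the first character appended at the end (it works on bytes, so it assumes ASCII input):
package main
import "fmt"
func main() {
	some := "p1k4"
	// Append the first byte so the last pair wraps around (panics if some is empty).
	ns := some + string(some[0])
	for i := 0; i < len(ns)-1; i++ {
		fmt.Println(ns[i : i+2])
	}
}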

Fixed Byte Slice to String Conversion - ignore nulls

Is there an efficient way to convert a fixed-size byte slice to a string without adding null characters to the string?
The traditional way to convert a byte slice to a string is the following:
out := string(b[STRIDX:STRIDX+STRLEN])
While this returns a string, its length is always equal to the length of the byte slice. So while the string looks normal in a Print statement, it may still contain null characters. This has some very odd effects if you append characters to this string.
Right now I scan the byte slice for nulls to limit the byte slice I feed to string(). Not very pretty or efficient.
Example: https://play.golang.org/p/hOoaqCOoFl0
Write a simple function:
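// CToGoString converts a NUL-terminated (C-style) byte slice to a Go string,
// stopping at the first zero byte; if there is none, the whole slice is used.
func CToGoString(b []byte) string {
	i := bytes.IndexByte(b, 0)
	if i < 0 {
		i = len(b)
	}
	return string(b[:i])
}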
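A quick check of the edge cases, as a sketch with made-up inputs: a buffer without a zero byte is converted whole, and a buffer that starts with a zero byte yields an empty string. Anything after the first zero byte is ignored, even if non-zero bytes follow it.

```go
package main

import (
	"bytes"
	"fmt"
)

func CToGoString(b []byte) string {
	i := bytes.IndexByte(b, 0)
	if i < 0 {
		i = len(b) // no NUL terminator: use the whole slice
	}
	return string(b[:i])
}

func main() {
	fmt.Printf("%q\n", CToGoString([]byte("abc\x00\x00\x00"))) // "abc"
	fmt.Printf("%q\n", CToGoString([]byte("abc")))             // "abc"
	fmt.Printf("%q\n", CToGoString([]byte("\x00abc")))         // ""
	fmt.Printf("%q\n", CToGoString([]byte("ab\x00cd")))        // "ab"
}
```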
For your example,
package main
import (
	"bytes"
	"fmt"
)
func CToGoString(b []byte) string {
	i := bytes.IndexByte(b, 0)
	if i < 0 {
		i = len(b)
	}
	return string(b[:i])
}
const (
	BUFLEN = 50
	STRLEN = 10
	STRIDX = 10
)
func main() {
	test := "test"
	b := [BUFLEN]byte{}
	fmt.Printf("Original\n\tString: '%+v' with length '%d'\n", test, len(test))
	copy(b[STRIDX:], []byte(test))
	s := CToGoString(b[STRIDX : STRIDX+STRLEN])
	fmt.Printf("Unpacking with []byte()\n\tString: '%+v' with length '%d' Buf:%+v\n", s, len(s), []byte(s))
}
Playground: https://play.golang.org/p/mH3CBdM6eG_l
Output:
Original
	String: 'test' with length '4'
Unpacking with []byte()
	String: 'test' with length '4' Buf:[116 101 115 116]

How to find the offset index of a string in a []rune using Go

How do I find the offset index of a string in a []rune using Go?
I can do this with the string type:
if i := strings.Index(input[offset:], "}}"); i > 0 {print(i);}
but I need it for runes.
I have a []rune and want to get the offset index.
How can I do this with the rune type in Go?
An example to explain better what I need:
offset := 0 // start searching from 0 (this is important for me)
text := "123456783}}56"
if i := strings.Index(text[offset:], "}}"); i > 0 {print(i);}
The output of this example is: 9
but I want to do this with a []rune (the text variable).
Is that possible?
See my current code: https://play.golang.org/p/seImKzVpdh
Thank you.
Edit #2: You have again indicated a new meaning for your question: you want to search for a string in a []rune.
Answer: this is not supported directly by the standard library, but it's easy to implement with 2 for loops:
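// search returns the rune index of the first occurrence of what in text, or -1.
func search(text []rune, what string) int {
	whatRunes := []rune(what)
	// Only try start positions where whatRunes still fits into text,
	// so text[i+j] never goes out of range.
	for i := 0; i+len(whatRunes) <= len(text); i++ {
		found := true
		for j := range whatRunes {
			if text[i+j] != whatRunes[j] {
				found = false
				break
			}
		}
		if found {
			return i
		}
	}
	return -1
}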
Testing it:
value := []rune("123}456}}789")
result := search(value, "}}")
fmt.Println(result)
Output (try it on the Go Playground):
7
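Since the question also asked for a starting offset, here is a sketch of the same loop that begins at a given rune offset. The searchFrom name and the convention of returning an index into the whole slice (not relative to the offset) are assumptions.

```go
package main

import "fmt"

// searchFrom (hypothetical helper) returns the rune index in text of the
// first occurrence of what at or after offset, or -1 if there is none.
func searchFrom(text []rune, what string, offset int) int {
	if offset < 0 || offset > len(text) {
		return -1
	}
	whatRunes := []rune(what)
	for i := offset; i+len(whatRunes) <= len(text); i++ {
		match := true
		for j, wr := range whatRunes {
			if text[i+j] != wr {
				match = false
				break
			}
		}
		if match {
			return i
		}
	}
	return -1
}

func main() {
	value := []rune("123}}456}}789")
	fmt.Println(searchFrom(value, "}}", 0))            // 3
	fmt.Println(searchFrom(value, "}}", 4))            // 8
	fmt.Println(searchFrom([]rune("世界}}"), "}}", 1)) // 2
}
```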
Edit: You updated the question indicating that you want to search runes in a string.
You may easily convert a []rune to a string using a simple type conversion:
toSearchRunes := []rune{'}', '}'}
toSearch := string(toSearchRunes)
And from there on, you can use strings.Index() as you did in your example:
if i := strings.Index(text[offset:], toSearch); i >= 0 {
	print(i) // i is relative to offset
}
Try it on the Go Playground.
Original answer follows:
string values in Go are stored as UTF-8 encoded bytes. strings.Index() returns the byte position if the given substring is found.
So basically what you want is to convert this byte position to a rune position. The unicode/utf8 package contains utility functions for counting the runes in a string, e.g. utf8.RuneCountInString().
So you just need to pass the part of the string before the match to this function:
offset := 0
text := "123456789}}56"
if i := strings.Index(text[offset:], "}}"); i >= 0 {
	// i is relative to offset, so the match starts at byte offset+i in text.
	fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
text = "世界}}世界"
if i := strings.Index(text[offset:], "}}"); i >= 0 {
	fmt.Println("byte-pos:", i, "rune-pos:", utf8.RuneCountInString(text[offset:offset+i]))
}
Output (try it on the Go Playground):
byte-pos: 9 rune-pos: 9
byte-pos: 6 rune-pos: 2
Note: offset must also be a byte position, because when slicing a string like text[offset:], the index is interpreted as byte-index.
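Putting the pieces together for a non-zero offset: a sketch of a helper (the runeIndex name is hypothetical) that takes a byte offset, as required for slicing, and returns the rune position of the match counted from the start of the whole text. It assumes offset falls on a rune boundary.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndex (hypothetical helper) searches substr in text starting at byte
// offset and returns the rune (character) position of the match within the
// whole text, or -1 if it is not found.
func runeIndex(text, substr string, offset int) int {
	if offset < 0 || offset > len(text) {
		return -1
	}
	i := strings.Index(text[offset:], substr)
	if i < 0 {
		return -1
	}
	// offset+i is the byte position of the match in text; count the runes before it.
	return utf8.RuneCountInString(text[:offset+i])
}

func main() {
	text := "世界}}世界}}"
	fmt.Println(runeIndex(text, "}}", 0)) // 2
	fmt.Println(runeIndex(text, "}}", 8)) // 6: search starts after the first "}}"
}
```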
If you want to get the index of a rune, use strings.IndexRune() instead of strings.Index().

Golang converting from rune to string

I have the following code, it is supposed to cast a rune into a string and print it. However, I am getting undefined characters when it is printed. I am unable to figure out where the bug is:
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
)
func main() {
var b scanner.Scanner
const a = `a`
b.Init(strings.NewReader(a))
c := b.Scan()
fmt.Println(strconv.QuoteRune(c))
}
That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads tokens as controlled by the Scanner.Mode bitmask, and for recognized tokens it returns special constants from the text/scanner package (such as scanner.Ident), not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to string, use a simple type conversion. rune is an alias for int32, and the Go spec says this about converting integer numbers to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
	fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
	fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
I know I'm a bit late to the party, but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
	// don't need index so _
	for _, v := range runes {
		outString += string(v)
	}
	return
}
Yes, there is a named return, but I think it's OK in this case as it reduces the number of lines and the function is short. (Note that a plain string(runes) conversion gives the same result without building the string piece by piece.)
This simple code works for converting a rune to a string (here r is a variable of type rune):
s := fmt.Sprintf("%c", r)
Since I came to this question searching for rune and string and char, I thought this may help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
	// string to byte slice: just convert it,
	// as a string holds its UTF-8 encoded bytes
	var twoByteArr []byte = []byte(oneString)
	// string to rune slice: also just a conversion,
	// which decodes the UTF-8 bytes into runes (code points)
	var threeRuneSlice []rune = []rune(oneString)
	// So a string can be viewed both as a sequence of bytes and
	// as a sequence of runes - read on
	// A rune slice can be converted back to a string, too
	// (the runes are encoded to UTF-8 again)
	var thirdString string = string(threeRuneSlice)
	// There is a catch when printing "characters" using a for loop with range
	fmt.Println("Chars in oneString")
	for i, r := range oneString {
		fmt.Printf(" %d %v %c ", i, r, r) // i is a byte index: 0, 1, 4, 5 here, not 0, 1, 2, 3
		// since range over a string decodes runes, see https://blog.golang.org/strings
	}
	fmt.Println("\nChars in threeRuneSlice")
	for i, r := range threeRuneSlice {
		fmt.Printf(" %d %v %c ", i, r, r) // i = 0, 1, 2, 3, perfect!
		// each element is a whole rune (int32), while in a string the UTF-8
		// encoding of a rune takes 1 to 4 bytes (byte is uint8)
	}
	fmt.Println("\nValues in oneString")
	for j := 0; j < len(oneString); j++ {
		fmt.Printf(" %d %v ", j, oneString[j]) // No, you cannot get characters by iterating through a string this way,
		// as you are going over bytes here, not runes
	}
	fmt.Println("\nValues in twoByteArr")
	for j := 0; j < len(twoByteArr); j++ {
		fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same as above
	}
	fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And a further demo: https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that 'characters' are encoded with 1 to 4 bytes, depending on the Unicode code point.
Here are simple examples to understand how to do it quickly:
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨‍🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '\u200d', '🦰') // U+1F468 U+200D U+1F9B0
go playground

How to convert []int8 to string

What's the best way (fastest performance) to convert from []int8 to string?
For []byte we could do string(byteslice), but for []int8 it gives an error:
cannot convert ba (type []int8) to type string
I got ba from the SliceScan() method of *sqlx.Rows, which produces []int8 instead of string.
Is this solution the fastest?
func B2S(bs []int8) string {
	ba := []byte{}
	for _, b := range bs {
		ba = append(ba, byte(b))
	}
	return string(ba)
}
EDIT: My bad, it's uint8 instead of int8, so I can do string(ba) directly.
Note beforehand: The asker first stated that the input slice is []int8, so that is what the answer is for. Later he realized the input is []uint8, which can be directly converted to string because byte is an alias for uint8 (and the []byte => string conversion is supported by the language spec).
You can't convert slices of different types; you have to do it manually.
The question is what type of slice we should convert to. We have 2 candidates: []byte and []rune. Strings are stored internally as UTF-8 encoded byte sequences ([]byte), and a string can also be converted to a slice of runes. The language supports converting both of these types ([]byte and []rune) to string.
A rune is a Unicode code point. If we try to convert int8 values to runes one-to-one, it will fail (meaning wrong output) if the input contains characters that are encoded to multiple bytes in UTF-8, because in this case multiple int8 values should end up in one rune.
Let's start from the string "世界" whose bytes are:
fmt.Println([]byte("世界"))
// Output: [228 184 150 231 149 140]
And its runes:
fmt.Println([]rune("世界"))
// [19990 30028]
It's only 2 runes and 6 bytes. So obviously 1-to-1 int8->rune mapping won't work, we have to go with 1-1 int8->byte mapping.
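To make this concrete, here is a sketch that converts the int8 values of "世界" both ways: element by element into runes (wrong, since each value becomes its own, here invalid, code point) and element by element into bytes (correct, since the bytes are then decoded together as UTF-8).

```go
package main

import "fmt"

func main() {
	// The UTF-8 bytes of "世界" stored as int8 values.
	in := []int8{-28, -72, -106, -25, -107, -116}

	rs := make([]rune, len(in))
	for i, v := range in {
		rs[i] = rune(v) // negative values are not valid code points
	}

	bs := make([]byte, len(in))
	for i, v := range in {
		bs[i] = byte(v) // keeps the bit pattern of each UTF-8 byte
	}

	fmt.Println(string(rs) == "世界") // false: 6 replacement characters
	fmt.Println(string(bs) == "世界") // true
}
```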
byte is an alias for uint8, which has the range 0..255. To convert it to []int8 (range -128..127), we have to use -256+bytevalue if the byte value is > 127, so the "世界" string in []int8 looks like this:
[-28 -72 -106 -25 -107 -116]
The backward conversion we want is: bytevalue = 256 + int8value if the int8 is negative, but we can't compute this as int8 (range -128..127) nor as byte (range 0..255), so we also have to convert it to int first (and back to byte at the end). This could look something like this:
if v < 0 {
	b[i] = byte(256 + int(v))
} else {
	b[i] = byte(v)
}
But actually, since signed integers are represented using 2's complement, we get the same result if we simply use a byte(v) conversion (which, for negative numbers, is equivalent to 256 + v).
Note: Since we know the length of the slice, it is much faster to allocate a slice with this length and just set its elements using indexing [] and not calling the built-in append function.
So here is the final conversion:
func B2S(bs []int8) string {
	b := make([]byte, len(bs))
	for i, v := range bs {
		b[i] = byte(v)
	}
	return string(b)
}
Try it on the Go Playground.
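A quick sanity check of the final B2S(), as a sketch: it converts the []int8 form of "世界" back, and a loop confirms that byte(v) matches the explicit 256 + v formula for every negative int8.

```go
package main

import "fmt"

func B2S(bs []int8) string {
	b := make([]byte, len(bs)) // allocate once, the length is known
	for i, v := range bs {
		b[i] = byte(v) // two's complement: same as byte(256 + int(v)) for v < 0
	}
	return string(b)
}

func main() {
	fmt.Println(B2S([]int8{-28, -72, -106, -25, -107, -116})) // 世界

	ok := true
	for v := -128; v < 0; v++ {
		if byte(int8(v)) != byte(256+v) {
			ok = false
		}
	}
	fmt.Println(ok) // true
}
```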
Not entirely sure it is the fastest, but I haven't found anything better.
Change ba := []byte{} to ba := make([]byte, 0, len(bs)) so at the end you have:
func B2S(bs []int8) string {
	ba := make([]byte, 0, len(bs))
	for _, b := range bs {
		ba = append(ba, byte(b))
	}
	return string(ba)
}
This way the append function will never try to insert more data than fits in the slice's underlying array, and you avoid unnecessary copying to a bigger array.
What is sure from "Convert between slices of different types" is that you have to build the right slice from your original []int8.
I ended up using rune (an int32 alias) (playground), assuming that the uint8 values were all simple ASCII characters. That is obviously an over-simplification, and icza's answer has more on that.
Plus, the SliceScan() method ended up returning []uint8 anyway.
package main
import (
	"fmt"
)
func main() {
	s := []int8{'a', 'b', 'c'}
	b := make([]rune, len(s))
	for i, v := range s {
		b[i] = rune(v)
	}
	fmt.Println(string(b))
}
But I didn't benchmark it against using a []byte.
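Use the unsafe package (this needs the strings and unsafe imports; note that the string() conversion still copies the bytes, and TrimRight() also removes any trailing NUL bytes):
func B2S(bs []int8) string {
	// Reinterpret the []int8 slice header as a []byte (int8 and byte have the same size), then convert.
	return strings.TrimRight(string(*(*[]byte)(unsafe.Pointer(&bs))), "\x00")
}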
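On Go 1.20 or newer, unsafe.String() and unsafe.SliceData() can build the string without any copy at all. This is only a sketch with a strong caveat: the resulting string shares memory with the slice, so the slice must never be modified afterwards, otherwise the "immutable" string changes too. Unlike the version above, it does not trim NUL bytes. The B2SNoCopy name is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// B2SNoCopy (hypothetical) reinterprets the bytes of bs as a string without
// copying. Requires Go 1.20+. bs must not be modified while the string is in use.
func B2SNoCopy(bs []int8) string {
	if len(bs) == 0 {
		return ""
	}
	// int8 and byte have the same size, so the memory can be viewed as bytes.
	return unsafe.String((*byte)(unsafe.Pointer(unsafe.SliceData(bs))), len(bs))
}

func main() {
	bs := []int8{'a', 'b', 'c'}
	fmt.Println(B2SNoCopy(bs)) // abc
}
```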
