Understanding bytePtrToString() from Golang pkg/net/interface_windows.go


I am trying to understand this code to convert from [16]byte to string:
// From: https://code.google.com/p/go/source/browse/src/pkg/net/interface_windows.go
func bytePtrToString(p *uint8) string {
a := (*[10000]uint8)(unsafe.Pointer(p))
i := 0
for a[i] != 0 {
i++
}
return string(a[:i])
}
// Type of ipl.IpAddress.String is [16]byte
str := bytePtrToString(&ipl.IpAddress.String[0])
I mean, what is the reason for the pointer magic? Couldn't it be written simply as follows?
func toString(p []byte) string {
for i, b := range p {
if b == 0 {
return string(p[:i])
}
}
return string(p)
}
// Type of ipl.IpAddress.String is [16]byte
str := toString(ipl.IpAddress.String[:])

No, it could not be written as you propose.
You are forgetting that []byte is not an array of bytes. It is a slice of bytes.
In memory, a slice is just a structure that holds a pointer to a buffer (the array of bytes that holds the data) plus the length and capacity of that buffer.
a := (*[10000]uint8)(unsafe.Pointer(p))
This code converts the byte pointer to a pointer to an array of bytes.
In Go, an array of bytes is just the bytes, like in C. Nothing more.
So if you declare an array of bytes:
var buffer [256]byte
Go will allocate exactly 256 bytes; the array does not even store its length. The length of an array is part of its type, and only the compiler has information about the array length.
That's why we can convert a byte pointer to an array pointer. It is basically the same thing.
i := 0
for a[i] != 0 {
i++
}
Here we just find the end of the null-terminated string.
return string(a[:i])
Here we use a slicing operation to create a string. Note that the string(...) conversion copies the bytes into a new, immutable string, so the result does not share memory with the original byte pointer.
These two articles will help you understand this topic a bit better:
http://blog.golang.org/slices
http://blog.golang.org/go-slices-usage-and-internals
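To make the array-versus-slice distinction above concrete, here is a minimal sketch. The helper names are hypothetical, and the three-word slice header is an assumption about the standard gc toolchain rather than something the language specification guarantees.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arraySize reports how many bytes a [256]byte value occupies.
// (Hypothetical helper, for illustration only.)
func arraySize() uintptr {
	var buffer [256]byte
	return unsafe.Sizeof(buffer)
}

// sliceHeaderWords reports the size of a []byte value in machine words.
// Assumption: the gc toolchain stores pointer, length and capacity.
func sliceHeaderWords() uintptr {
	var s []byte
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println(arraySize())        // 256: the array is just its bytes
	fmt.Println(sliceHeaderWords()) // 3 with the gc toolchain (assumption)

	var buffer [256]byte
	s := buffer[:16]            // a slice is a view onto the array
	fmt.Println(len(s), cap(s)) // 16 256
}
```

The array occupies exactly its element bytes, while the slice value is only a small header that refers to them.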

Related

How to find a substring skipping N chars

How do I get the index of a substring in a string skipping starting with a certain position/with a certain offset, e.g.:
package main
import (
"fmt"
"strings"
)
func main() {
string := "something.value=something=end"
index1 := strings.Index(string, "value=")
fmt.Println(index1) // prints 10
// index2 = ... How do I get the position of the second =, 25?
}
PHP has a similar offset parameter: int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

The strings package does not provide such a function, but in practice it is rarely needed. Often the strings.Split() function is used to easily split strings into tokens / parts.
But if you do need it: you can simply slice a string, which is efficient (no copy is made, the result shares memory with the original string value).
So effectively the function you're looking for would look like this:
func Index(s, substr string, offset int) int {
	// An offset outside the string cannot match (and would make the slice panic).
	if offset < 0 || len(s) < offset {
		return -1
	}
	if idx := strings.Index(s[offset:], substr); idx >= 0 {
		return offset + idx
	}
	return -1
}
Example using it:
s := "something.value=something=end"
index1 := strings.Index(s, "value=")
fmt.Println(index1) // prints 10
index2 := Index(s, "=", index1+len("value="))
fmt.Println(index2) // prints 25
Output (try it on the Go Playground):
10
25
Note that when slicing a string, the offset you pass to our Index() function is a byte index, not a rune (character) index. They are equal as long as all characters have code points below 128, but beyond that the byte index will be greater than the rune index because those code points map to multiple bytes in UTF-8 encoding (which is how Go stores strings in memory). strings.Index() returns the byte index, and len(s) also returns the byte length, so the example works properly with all strings.
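The byte-versus-rune point can be checked with a short, self-contained sketch. The helper name indexFrom and the sample string "héllo=wörld" are made up for illustration; the negative-offset guard is an addition of this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// indexFrom is a hypothetical helper: it searches for substr in s
// starting at the given byte offset and returns a byte index, or -1.
func indexFrom(s, substr string, offset int) int {
	if offset < 0 || offset > len(s) {
		return -1
	}
	if idx := strings.Index(s[offset:], substr); idx >= 0 {
		return offset + idx
	}
	return -1
}

func main() {
	s := "something.value=something=end"
	first := strings.Index(s, "value=")
	fmt.Println(indexFrom(s, "=", first+len("value="))) // 25

	// With non-ASCII text, byte and rune positions differ.
	u := "héllo=wörld"
	byteIdx := strings.Index(u, "=")
	fmt.Println(byteIdx)                                  // 6 (byte index)
	fmt.Println(utf8.RuneCountInString(u[:byteIdx]))      // 5 (rune index)
}
```

Because the helper both accepts and returns byte indices, its results can be fed straight back into slicing expressions such as s[idx:].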
Your original task using strings.Split() could look like this:
s := "something.value=something=end"
parts := strings.Split(s, "=")
fmt.Println(parts)
Which outputs (try it on the Go Playground):
[something.value something end]
The value you want to "parse" out is in parts[1].

Taking a slice of a string that contains multi-byte UTF-8 characters may produce a corrupted string if the index does not fall on a character boundary. To slice by character position, convert it to runes first:
[]rune(videoHtml)[0:index]

Can I make a prefilled string in golang with make or new?

I am trying to optimize my stringpad library in Go. So far the only way I have found to fill a string (actually a bytes.Buffer) with a known character value (e.g. 0 or " ") is with a for loop.
The snippet of code is:
// PadLeft pads string on left side with p, c times
func PadLeft(s string, p string, c int) string {
var t bytes.Buffer
if c <= 0 {
return s
}
if len(p) < 1 {
return s
}
for i := 0; i < c; i++ {
t.WriteString(p)
}
t.WriteString(s)
return t.String()
}
I believe that the larger the pad, the more times the t buffer is copied in memory. Is there a more elegant way to make a known size buffer with a known value on initialization?

You can only use make() and new() to allocate buffers (byte slices or arrays) that are zeroed. You may use composite literals to obtain slices or arrays that initially contain non-zero values, but you can't describe the initial values dynamically (indices must be constants).
Take inspiration from the similar but very efficient strings.Repeat() function. It repeats the given string with given count:
func Repeat(s string, count int) string {
// Since we cannot return an error on overflow,
// we should panic if the repeat will generate
// an overflow.
// See Issue golang.org/issue/16237
if count < 0 {
panic("strings: negative Repeat count")
} else if count > 0 && len(s)*count/count != len(s) {
panic("strings: Repeat count causes overflow")
}
b := make([]byte, len(s)*count)
bp := copy(b, s)
for bp < len(b) {
copy(b[bp:], b[:bp])
bp *= 2
}
return string(b)
}
strings.Repeat() does a single allocation to obtain a working buffer (which will be a byte slice []byte), and uses the builtin copy() function to copy the repeatable string. One noteworthy thing is that it copies the already-filled part of the working buffer onto itself, so the filled region doubles each time: if the string has already been copied 4 times, copying this region makes it 8 times, and so on. This minimizes the number of calls to copy(). The solution also takes advantage of the fact that copy() can copy bytes from a string without having to convert it to a byte slice.
What we want is something similar, but we want the result to be prepended to a string.
We can account for that by simply allocating the buffer that Repeat() would use plus the length of the string we are left-padding.
The result (without checking the count param):
func PadLeft(s, p string, count int) string {
ret := make([]byte, len(p)*count+len(s))
b := ret[:len(p)*count]
bp := copy(b, p)
for bp < len(b) {
copy(b[bp:], b[:bp])
bp *= 2
}
copy(ret[len(b):], s)
return string(ret)
}
Testing it:
fmt.Println(PadLeft("aa", "x", 1))
fmt.Println(PadLeft("aa", "x", 2))
fmt.Println(PadLeft("abc", "xy", 3))
Output (try it on the Go Playground):
xaa
xxaa
xyxyxyabc
See similar / related question: Is there analog of memset in go?
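For comparison, here is a minimal sketch of a different technique: strings.Builder with Grow(), which also reserves the full size up front but writes the pad in a plain loop instead of using the doubling copy. The name padLeft and the early returns for a non-positive count or an empty pad are choices of this sketch (taken from the checks in the question), not part of the answer above.

```go
package main

import (
	"fmt"
	"strings"
)

// padLeft prepends p to s count times, reserving the whole
// result size once so the builder does not have to regrow.
func padLeft(s, p string, count int) string {
	if count <= 0 || len(p) == 0 {
		return s
	}
	var sb strings.Builder
	sb.Grow(len(p)*count + len(s))
	for i := 0; i < count; i++ {
		sb.WriteString(p)
	}
	sb.WriteString(s)
	return sb.String()
}

func main() {
	fmt.Println(padLeft("aa", "x", 1))   // xaa
	fmt.Println(padLeft("aa", "x", 2))   // xxaa
	fmt.Println(padLeft("abc", "xy", 3)) // xyxyxyabc
}
```

This version makes more WriteString calls than the doubling copy for large counts, so whether it is fast enough is something to benchmark for the actual workload.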

golang: optimal sorting and joining strings

This short method in go's source code has a comment which implies that it's not allocating memory in an optimal way.
... could do better allocation-wise here ...
This is the source code for the Join method.
What exactly is inefficiently allocated here? I don't see a way around allocating the source string slice and the destination byte slice. The source being the slice of keys. The destination being the slice of bytes.

The code referenced by the comment is memory efficient as written. Any allocations are in strings.Join which is written to minimize memory allocations.
I suspect that the comment was accidentally copied and pasted from this code in the net/http package:
// TODO: could do better allocation-wise here, but trailers are rare,
// so being lazy for now.
if _, err := io.WriteString(w, "Trailer: "+strings.Join(keys, ",")+"\r\n"); err != nil {
return err
}
This snippet has the following possible allocations:
[]byte created in strings.Join for constructing the result
string conversion result returned by strings.Join
string result for expression "Trailer: "+strings.Join(keys, ",")+"\r\n"
The []byte conversion result used in io.WriteString
A more memory efficient approach is to allocate a single []byte for the data to be written.
n := len("Trailer: ") + len("\r\n")
for _, s := range keys {
n += len(s) + 1
}
p := make([]byte, 0, n-1) // subtract 1 for len(keys) - 1 commas
p = append(p, "Trailer: "...)
for i, s := range keys {
if i > 0 {
p = append(p, ',')
}
p = append(p, s...)
}
p = append(p, "\r\n"...)
w.Write(p)
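The single-buffer approach can be wrapped in a small function and compared against the concatenation form. This is a sketch only: the name trailerLine and the sample keys are hypothetical, and unlike the fragment above it returns the bytes instead of writing them to w.

```go
package main

import (
	"fmt"
	"strings"
)

// trailerLine builds "Trailer: k1,k2,...\r\n" in one preallocated buffer.
func trailerLine(keys []string) []byte {
	n := len("Trailer: ") + len("\r\n")
	for _, s := range keys {
		n += len(s) + 1 // key plus one comma
	}
	if len(keys) > 0 {
		n-- // there are only len(keys)-1 commas
	}
	p := make([]byte, 0, n)
	p = append(p, "Trailer: "...)
	for i, s := range keys {
		if i > 0 {
			p = append(p, ',')
		}
		p = append(p, s...)
	}
	return append(p, "\r\n"...)
}

func main() {
	keys := []string{"Expires", "X-Checksum"}
	got := trailerLine(keys)
	want := "Trailer: " + strings.Join(keys, ",") + "\r\n"
	fmt.Println(string(got) == want)  // true
	fmt.Println(len(got) == cap(got)) // true: the buffer was sized exactly
}
```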

Golang converting from rune to string

I have the following code, it is supposed to cast a rune into a string and print it. However, I am getting undefined characters when it is printed. I am unable to figure out where the bug is:
package main
import (
"fmt"
"strconv"
"strings"
"text/scanner"
)
func main() {
var b scanner.Scanner
const a = `a`
b.Init(strings.NewReader(a))
c := b.Scan()
fmt.Println(strconv.QuoteRune(c))
}

That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads the next token or Unicode character, depending on the Scanner.Mode bitmask, and for recognized tokens it returns special constants from the text/scanner package, not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to string, use a simple type conversion. rune is an alias for int32, and the language specification says this about converting integer numbers to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Outputs:
97 a
Also to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init() which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
fmt.Println("Identifier:", b.TokenText())
}
// Output: "Identifier: a"
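The difference between Scan() and Next() can be seen side by side in a complete program. This is a minimal sketch; the helper names scanToken and nextRune are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// scanToken returns the token kind reported by Scan and the token text.
func scanToken(src string) (rune, string) {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	tok := s.Scan()
	return tok, s.TokenText()
}

// nextRune returns the first rune of src using Next.
func nextRune(src string) rune {
	var s scanner.Scanner
	s.Init(strings.NewReader(src))
	return s.Next()
}

func main() {
	tok, text := scanToken("a")
	// tok is the negative constant scanner.Ident, not the rune 'a'.
	fmt.Println(scanner.TokenString(tok), text) // Ident a

	r := nextRune("a")
	fmt.Println(r, string(r)) // 97 a
}
```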

I know I'm a bit late to the party but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
// don't need index so _
for _, v := range runes {
outString += string(v)
}
return
}
Yes, there is a named return, but I think it's OK in this case as it reduces the number of lines and the function is short.
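For reference, the language also converts []rune to string directly, so the loop is not required. A minimal sketch comparing the two (the function names here are hypothetical):

```go
package main

import "fmt"

// runesToStringLoop mirrors the loop-based approach.
func runesToStringLoop(runes []rune) string {
	out := ""
	for _, r := range runes {
		out += string(r)
	}
	return out
}

// runesToString uses the built-in []rune to string conversion.
func runesToString(runes []rune) string {
	return string(runes)
}

func main() {
	rs := []rune("世界abc")
	fmt.Println(runesToString(rs))                          // 世界abc
	fmt.Println(runesToString(rs) == runesToStringLoop(rs)) // true
}
```

The direct conversion builds the result in one step, whereas the loop creates a new string on every += iteration.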

This simple code works in converting a rune to a string
s := fmt.Sprintf("%c", rune)

Since I came to this question searching for rune and string and char, thought this may help newbies like me
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
	// string to byte slice: no sweat, just convert it,
	// as a string is a read-only sequence of bytes
	var twoByteArr []byte = []byte(oneString)
	// string to rune slice: no sweat either,
	// the conversion decodes the UTF-8 bytes into code points
	var threeRuneSlice []rune = []rune(oneString)
	// Hmm! A string seems to have a dual personality: it can be viewed both as
	// a slice of bytes and as a slice of runes - yeah - read on
	// A rune slice can be converted back to a string - no sweat
	var thirdString string = string(threeRuneSlice)
	// There is a catch here, and that is in printing "characters" using a for loop and range
	fmt.Println("Chars in oneString")
	for i, r := range oneString {
		fmt.Printf(" %d %v %c ", i, r, r) // you may not get index 0,1,2,3 here
		// since range decodes runes and yields byte offsets, see https://blog.golang.org/strings
	}
	fmt.Println("\nChars in threeRuneSlice")
	for i, r := range threeRuneSlice {
		fmt.Printf(" %d %v %c ", i, r, r) // i = 0,1,2,3, perfect!!
		// as a rune is 4 bytes wide (rune is int32 and byte is uint8),
		// while in the string a set of bytes is used to represent each rune,
		// and a rune represents a Unicode code point == the REAL CHARACTER
	}
	fmt.Println("\nValues in oneString ")
	for j := 0; j < len(oneString); j++ {
		fmt.Printf(" %d %v ", j, oneString[j]) // No, you cannot get characters if you iterate through a string this way,
		// as you are going over bytes here - not runes
	}
	fmt.Println("\nValues in twoByteArr")
	for j := 0; j < len(twoByteArr); j++ {
		fmt.Printf(" %d=%v ", j, twoByteArr[j]) // == same as above
	}
	fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And some more pointless demo https://play.golang.org/p/tagRBVG8k7V
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ
to show that the 'characters' are encoded with one to four bytes depending on the Unicode code point

Here are some simple examples to show quickly how to do it.
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨‍🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '‍', '🦰') // U+1F468 U+200D U+1F9B0
go playground

How to convert []int8 to string

What's the best way (fastest performance) to convert from []int8 to string?
For []byte we could do string(byteslice), but for []int8 it gives an error:
cannot convert ba (type []int8) to type string
I got the ba from SliceScan() method of *sqlx.Rows that produces []int8 instead of string
Is this solution the fastest?
func B2S(bs []int8) string {
ba := []byte{}
for _, b := range bs {
ba = append(ba, byte(b))
}
return string(ba)
}
EDIT: my bad, it's uint8 instead of int8, so I can do string(ba) directly.

Note beforehand: The asker first stated that input slice is []int8 so that is what the answer is for. Later he realized the input is []uint8 which can be directly converted to string because byte is an alias for uint8 (and []byte => string conversion is supported by the language spec).
You can't convert slices of different types, you have to do it manually.
Question is what type of slice should we convert to? We have 2 candidates: []byte and []rune. Strings are stored as UTF-8 encoded byte sequences internally ([]byte), and a string can also be converted to a slice of runes. The language supports converting both of these types ([]byte and []rune) to string.
A rune is a unicode codepoint. And if we try to convert an int8 to a rune in a one-to-one fashion, it will fail (meaning wrong output) if the input contains characters which are encoded to multiple bytes (using UTF-8) because in this case multiple int8 values should end up in one rune.
Let's start from the string "世界" whose bytes are:
fmt.Println([]byte("世界"))
// Output: [228 184 150 231 149 140]
And its runes:
fmt.Println([]rune("世界"))
// [19990 30028]
It's only 2 runes and 6 bytes. So obviously 1-to-1 int8->rune mapping won't work, we have to go with 1-1 int8->byte mapping.
byte is an alias for uint8, which has the range 0..255. To convert it to []int8 (which has the range -128..127) we have to use -256+bytevalue if the byte value is > 127, so the "世界" string in []int8 looks like this:
[-28 -72 -106 -25 -107 -116]
The backward conversion we want is bytevalue = 256 + int8value if the int8 is negative, but we can't compute this as int8 (range -128..127) nor as byte (range 0..255), so we have to convert it to int first (and back to byte at the end). This could look something like this:
if v < 0 {
b[i] = byte(256 + int(v))
} else {
b[i] = byte(v)
}
But actually, since signed integers are represented using 2's complement, we get the same result if we simply use a byte(v) conversion (which for negative numbers is equivalent to 256 + v).
Note: Since we know the length of the slice, it is much faster to allocate a slice with this length and just set its elements using indexing [] rather than calling the built-in append function.
So here is the final conversion:
func B2S(bs []int8) string {
b := make([]byte, len(bs))
for i, v := range bs {
b[i] = byte(v)
}
return string(b)
}
Try it on the Go Playground.
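A round trip makes the 2's complement argument easy to check. This is a minimal sketch: s2b is a hypothetical helper that produces the []int8 form of a string, and b2s repeats the conversion from the answer.

```go
package main

import "fmt"

// s2b reinterprets each byte of s as an int8 (values above 127 wrap to negative).
func s2b(s string) []int8 {
	out := make([]int8, len(s))
	for i := 0; i < len(s); i++ {
		out[i] = int8(s[i])
	}
	return out
}

// b2s converts the int8 values back to bytes and builds a string.
func b2s(bs []int8) string {
	b := make([]byte, len(bs))
	for i, v := range bs {
		b[i] = byte(v)
	}
	return string(b)
}

func main() {
	signed := s2b("世界")
	fmt.Println(signed)      // [-28 -72 -106 -25 -107 -116]
	fmt.Println(b2s(signed)) // 世界
}
```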

Not entirely sure it is the fastest, but I haven't found anything better.
Change ba := []byte{} to ba := make([]byte, 0, len(bs)) so at the end you have:
func B2S(bs []int8) string {
ba := make([]byte,0, len(bs))
for _, b := range bs {
ba = append(ba, byte(b))
}
return string(ba)
}
This way the append function will never try to insert more data than can fit in the slice's underlying array, and you will avoid unnecessary copying to a bigger array.

What is sure from "Convert between slices of different types" is that you have to build the right slice from your original []int8.
I ended up using rune (int32 alias) (playground), assuming that the uint8 values were all simple ASCII characters. That is obviously an over-simplification and icza's answer has more on that.
Plus the SliceScan() method ended up returning []uint8 anyway.
package main
import (
"fmt"
)
func main() {
s := []int8{'a', 'b', 'c'}
b := make([]rune, len(s))
for i, v := range s {
b[i] = rune(v)
}
fmt.Println(string(b))
}
But I didn't benchmark it against using a []byte.

Use unsafe package.
func B2S(bs []int8) string {
return strings.TrimRight(string(*(*[]byte)(unsafe.Pointer(&bs))), "\x00")
}
