Go: understanding strings


I got slightly confused this morning when the following code worked.
// s points to an empty string in memory
s := new(string)
// assign 1000 byte string to that address
b := make([]byte, 0, 1000)
for i := 0; i < 1000; i++ {
	if i%100 == 0 {
		b = append(b, '\n')
	} else {
		b = append(b, 'x')
	}
}
*s = string(b)
// how is there room for it there?
print(*s)
http://play.golang.org/p/dAvKLChapd
I feel like I'm missing something obvious here. Some insight would be appreciated.

I hope I understood the question...
An entity of type string is implemented by a runtime struct, roughly:
type rt_string struct {
	ptr *byte // first byte of the string
	len int   // number of bytes in the string
}
The line
*s = string(b)
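sets a new value (of type rt_string) at *s. Its size is constant (a pointer plus a length), so there's always "room" for it; the 1000 bytes themselves live in a separate allocation that ptr points to.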
More details in rsc's paper.

Related

Can I make a prefilled string in golang with make or new?

I am trying to optimize my stringpad library in Go. So far the only way I have found to fill a string (actually bytes.Buffer) with a known character value (e.g. 0 or " ") is with a for loop.
The snippet of code is:
// PadLeft pads string on left side with p, c times
func PadLeft(s string, p string, c int) string {
	var t bytes.Buffer
	if c <= 0 {
		return s
	}
	if len(p) < 1 {
		return s
	}
	for i := 0; i < c; i++ {
		t.WriteString(p)
	}
	t.WriteString(s)
	return t.String()
}
I believe that the larger the pad, the more times the t buffer gets copied as it grows. Is there a more elegant way to make a buffer of known size, initialized with a known value?

You can only use make() and new() to allocate buffers (byte slices or arrays) that are zeroed. You may use composite literals to obtain slices or arrays that initially contain non-zero values, but you can't describe the initial values dynamically (indices must be constants).
Take inspiration from the similar but very efficient strings.Repeat() function. It repeats the given string the given number of times:
func Repeat(s string, count int) string {
	// Since we cannot return an error on overflow,
	// we should panic if the repeat will generate
	// an overflow.
	// See Issue golang.org/issue/16237
	if count < 0 {
		panic("strings: negative Repeat count")
	} else if count > 0 && len(s)*count/count != len(s) {
		panic("strings: Repeat count causes overflow")
	}
	b := make([]byte, len(s)*count)
	bp := copy(b, s)
	for bp < len(b) {
		copy(b[bp:], b[:bp])
		bp *= 2
	}
	return string(b)
}
strings.Repeat() does a single allocation to obtain a working buffer (a byte slice, []byte), and uses the builtin copy() function to copy the repeatable string. One noteworthy thing is that it copies the already-filled part of the working buffer onto the rest, doubling it on each step: if the string has already been copied 4 times, copying this part makes it 8 times, and so on. This minimizes the number of copy() calls (logarithmic in count rather than linear). The solution also takes advantage of the fact that copy() can copy bytes from a string without first converting it to a byte slice.
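To make the doubling visible, here is a small sketch that reproduces the fill loop from Repeat() (without its overflow checks) and counts the copy() calls; the name fillRepeat and the counter are illustrative additions, not part of the strings package:

```go
package main

import "fmt"

// fillRepeat fills a buffer with count copies of s using the doubling
// technique from strings.Repeat and returns the buffer together with the
// number of copy() calls it needed. The counter is only for illustration.
func fillRepeat(s string, count int) ([]byte, int) {
	b := make([]byte, len(s)*count)
	bp := copy(b, s) // copy() accepts a string source directly
	calls := 1
	for bp < len(b) {
		// Copy everything filled so far onto the rest, doubling bp each time.
		copy(b[bp:], b[:bp])
		bp *= 2
		calls++
	}
	return b, calls
}

func main() {
	b, calls := fillRepeat("ab", 8)
	fmt.Println(string(b), calls) // abababababababab 4
}
```

For 8 copies only 4 copy() calls are needed (1 initial copy, then 2, 4, 8 copies filled), instead of 8 calls with a plain loop.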
What we want is something similar, but we want the result to be prepended to a string.
We can account for that by simply allocating a buffer the size of the one used inside Repeat() plus the length of the string we're left-padding.
The result (without checking the count param):
func PadLeft(s, p string, count int) string {
	ret := make([]byte, len(p)*count+len(s))
	b := ret[:len(p)*count]
	bp := copy(b, p)
	for bp < len(b) {
		copy(b[bp:], b[:bp])
		bp *= 2
	}
	copy(ret[len(b):], s)
	return string(ret)
}
Testing it:
fmt.Println(PadLeft("aa", "x", 1))
fmt.Println(PadLeft("aa", "x", 2))
fmt.Println(PadLeft("abc", "xy", 3))
Output (try it on the Go Playground):
xaa
xxaa
xyxyxyabc
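If you'd rather keep the question's simple loop, a middle ground (a sketch, not the approach above) is to reserve the final size up front, so the buffer is allocated once instead of growing as it fills. This version uses strings.Builder and its Grow method; the name padLeftBuilder is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// padLeftBuilder keeps the question's loop but calls Grow with the exact
// final size first, so the builder's buffer never has to be reallocated.
func padLeftBuilder(s, p string, count int) string {
	if count <= 0 || p == "" {
		return s
	}
	var sb strings.Builder
	sb.Grow(len(p)*count + len(s))
	for i := 0; i < count; i++ {
		sb.WriteString(p)
	}
	sb.WriteString(s)
	return sb.String()
}

func main() {
	fmt.Println(padLeftBuilder("aa", "x", 2))   // xxaa
	fmt.Println(padLeftBuilder("abc", "xy", 3)) // xyxyxyabc
}
```

This still makes count WriteString calls, whereas the doubling version makes only about log2(count) copy() calls.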
See similar / related question: Is there analog of memset in go?

Safely write bytes as UTF-8 characters using strings.Builder?

I'm trying to do an exercise where I have to reverse some strings. I heard that Go strings.Builder is the fastest way to create strings at the moment, so I did the following:
func String(toReverse string) string {
	var reversedString strings.Builder
	for i := len(toReverse) - 1; i >= 0; i-- {
		reversedString.WriteByte(toReverse[i])
	}
	return reversedString.String()
}
The problem is that this doesn't work with multibyte test cases like this:
Hello, 世界
becomes
"\u008c\u0095ç\u0096¸ä ,olleH"
Thanks.

While it is possible (and the default) to access a string's individual bytes by index, a Go string is also convertible to a slice of runes ([]rune), which represent individual Unicode code points.
First, convert your string to a []rune and iterate over that:
func String(toReverse string) string {
	var reversedString strings.Builder
	runes := []rune(toReverse)
	for i := len(runes) - 1; i >= 0; i-- {
		reversedString.WriteRune(runes[i])
	}
	return reversedString.String()
}
See https://play.golang.org/p/WYn_MGAGw_x for a live demo.
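An alternative sketch that avoids allocating the intermediate []rune: walk the string backwards one UTF-8 encoded rune at a time with utf8.DecodeLastRuneInString and write each rune to the builder. The function name reverseRunes is hypothetical. Like the []rune version above, it reverses code points, not user-perceived characters:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// reverseRunes decodes the last rune of s, writes it, and drops it from s
// until s is empty, so the output is s with its code points reversed.
func reverseRunes(s string) string {
	var sb strings.Builder
	sb.Grow(len(s)) // exact for valid UTF-8 input
	for len(s) > 0 {
		r, size := utf8.DecodeLastRuneInString(s)
		sb.WriteRune(r)
		s = s[:len(s)-size]
	}
	return sb.String()
}

func main() {
	fmt.Println(reverseRunes("Hello, 世界")) // 界世 ,olleH
}
```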

Golang converting from rune to string

I have the following code; it is supposed to convert a rune to a string and print it. However, I am getting undefined characters when it is printed, and I am unable to figure out where the bug is:
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/scanner"
)

func main() {
	var b scanner.Scanner
	const a = `a`
	b.Init(strings.NewReader(a))
	c := b.Scan()
	fmt.Println(strconv.QuoteRune(c))
}

That's because you used Scanner.Scan() to read a rune, but it does something else. Scanner.Scan() reads tokens (which kinds are recognized is controlled by the Scanner.Mode bitmask), and for a recognized token it returns one of the special constants from the text/scanner package, such as scanner.Ident, not the rune that was read.
To read a single rune use Scanner.Next() instead:
c := b.Next()
fmt.Println(c, string(c), strconv.QuoteRune(c))
Output:
97 a 'a'
If you just want to convert a single rune to a string, use a simple type conversion. rune is an alias for int32, and the spec says this about converting integers to string:
Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer.
So:
r := rune('a')
fmt.Println(r, string(r))
Output:
97 a
Also, to loop over the runes of a string value, you can simply use the for ... range construct:
for i, r := range "abc" {
	fmt.Printf("%d - %c (%v)\n", i, r, r)
}
Output:
0 - a (97)
1 - b (98)
2 - c (99)
Or you can simply convert a string value to []rune:
fmt.Println([]rune("abc")) // Output: [97 98 99]
There is also utf8.DecodeRuneInString().
Try the examples on the Go Playground.
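For utf8.DecodeRuneInString(), here is a minimal sketch (the helper name firstRune is hypothetical) that decodes the first rune of a string and converts it back to a string, also reporting how many bytes it occupied:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune decodes the first UTF-8 encoded rune of s and returns it as a
// string, along with its size in bytes. For an empty string,
// utf8.DecodeRuneInString returns (utf8.RuneError, 0).
func firstRune(s string) (string, int) {
	r, size := utf8.DecodeRuneInString(s)
	return string(r), size
}

func main() {
	fmt.Println(firstRune("a"))    // a 1
	fmt.Println(firstRune("世界")) // 世 3
}
```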
Note:
Your original code (using Scanner.Scan()) works like this:
You called Scanner.Init(), which sets the Mode (b.Mode) to scanner.GoTokens.
Calling Scanner.Scan() on the input (from "a") returns scanner.Ident because "a" is a valid Go identifier:
c := b.Scan()
if c == scanner.Ident {
	fmt.Println("Identifier:", b.TokenText())
}
// Output: Identifier: a

I know I'm a bit late to the party but here's a []rune to string function:
func runesToString(runes []rune) (outString string) {
	// don't need index so _
	for _, v := range runes {
		outString += string(v)
	}
	return
}
Yes, there is a named return, but I think it's OK in this case, as it reduces the number of lines and the function is short.
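Note that the built-in conversion already does this in one step: string(runes) encodes every rune as UTF-8 into a single new string, while += in a loop can allocate and copy a new string on every iteration. A minimal sketch:

```go
package main

import "fmt"

// runesToString converts with a single built-in conversion, which encodes
// each rune as UTF-8 into one newly allocated string.
func runesToString(runes []rune) string {
	return string(runes)
}

func main() {
	fmt.Println(runesToString([]rune{'G', 'o', ' ', '世', '界'})) // Go 世界
}
```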

This simple code works for converting a rune to a string:
s := fmt.Sprintf("%c", r) // r is a rune

Since I came to this question searching for rune, string and char, I thought this may help newbies like me:
// str := "aഐbc"
// testString(str)
func testString(oneString string) {
	// string to byte slice: a plain conversion that copies the bytes.
	// A string is a read-only sequence of bytes (usually UTF-8 text).
	var twoByteArr []byte = []byte(oneString)

	// string to rune slice: the conversion decodes the UTF-8 bytes
	// into Unicode code points, one rune per code point.
	var threeRuneSlice []rune = []rune(oneString)

	// So a string converts both to a slice of bytes and to a slice
	// of runes, but it is neither of them; read on.

	// A rune slice converts back to a string: each rune is encoded as UTF-8.
	var thirdString string = string(threeRuneSlice)

	// The catch is in printing "characters" with a for ... range loop.
	fmt.Println("Chars in oneString")
	for i, r := range oneString {
		// i is the byte offset of each rune, so for "aഐbc" you get
		// 0, 1, 4, 5 here, not 0, 1, 2, 3: range decodes UTF-8 over
		// strings (see https://blog.golang.org/strings)
		fmt.Printf(" %d %v %c ", i, r, r)
	}
	fmt.Println("\nChars in threeRuneSlice")
	for i, r := range threeRuneSlice {
		// i = 0, 1, 2, 3: one index per rune. Each element is an int32
		// holding a whole code point, however many bytes its UTF-8
		// encoding needs (a byte is a uint8).
		fmt.Printf(" %d %v %c ", i, r, r)
	}
	fmt.Println("\nValues in oneString")
	for j := 0; j < len(oneString); j++ {
		// You cannot get characters by iterating through a string
		// this way, as you are going over bytes here, not runes.
		fmt.Printf(" %d %v ", j, oneString[j])
	}
	fmt.Println("\nValues in twoByteArr")
	for j := 0; j < len(twoByteArr); j++ {
		fmt.Printf(" %d=%v ", j, twoByteArr[j]) // same bytes as above
	}
	fmt.Printf("\none - %s, two %s, three %s\n", oneString, twoByteArr, thirdString)
}
And one more (admittedly pointless) demo, https://play.golang.org/p/tagRBVG8k7V,
adapted from https://groups.google.com/g/golang-nuts/c/84GCvDBhpbg/m/Tt6089MPFQAJ,
to show that the 'characters' are encoded with one to four bytes, depending on the Unicode code point.
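The same point in a few lines: utf8.RuneLen reports how many bytes a rune's UTF-8 encoding takes. In this sketch (the helper name runeByteLens is hypothetical), the input is 'a' (U+0061), 'é' (U+00E9), 'ഐ' (U+0D10) and '😊' (U+1F60A), written with escapes:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeByteLens returns, for each rune in s, the number of bytes its UTF-8
// encoding takes (between 1 and 4).
func runeByteLens(s string) []int {
	var lens []int
	for _, r := range s {
		lens = append(lens, utf8.RuneLen(r))
	}
	return lens
}

func main() {
	s := "a\u00e9\u0d10\U0001F60A"
	fmt.Println(runeByteLens(s), len(s)) // [1 2 3 4] 10
}
```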

Here are some simple examples to help you understand how to do it quickly:
// rune => string
fmt.Printf("%c\n", 65) // A
fmt.Println(string(rune(0x1F60A))) // 😊
fmt.Println(string([]rune{0x1F468, 0x200D, 0x1F9B0})) // 👨‍🦰
// string => rune
fmt.Println(strconv.FormatUint(uint64([]rune("😊")[0]), 16)) // 1f60a
fmt.Printf("%U\n", '😊') // U+1F60A
fmt.Printf("%U %U %U\n", '👨', '‍', '🦰') // U+1F468 U+200D U+1F9B0
Try it on the Go Playground.

interview riddle (string manipulation) - explanation needed

I am studying for an interview and encountered a question plus solution.
I am having a problem with one line in the solution and was hoping someone here could explain it.
the question:
Write a method to replace all spaces in a string with ‘%20’.
the solution:
public static void ReplaceFun(char[] str, int length) {
    int spaceCount = 0, newLength, i = 0;
    for (i = 0; i < length; i++) {
        if (str[i] == ' ') {
            spaceCount++;
        }
    }
    newLength = length + spaceCount * 2;
    str[newLength] = '\0';
    for (i = length - 1; i >= 0; i--) {
        if (str[i] == ' ') {
            str[newLength - 1] = '0';
            str[newLength - 2] = '2';
            str[newLength - 3] = '%';
            newLength = newLength - 3;
        } else {
            str[newLength - 1] = str[i];
            newLength = newLength - 1;
        }
    }
}
My problem is with line 9 (str[newLength] = '\0';). How can he just set str[newLength] to '\0'? In other words, how can he take over the needed amount of memory without allocating it first?
Isn't he writing past the end of the memory?!

Assuming this is actually meant to be in C (public static is not valid C or C++), they can't, as it's written. They're never allocating a new str which will be long enough to hold the old string plus the %20 expansion.
I suspect there's an additional part to the question, which is that str is already long enough to hold the expanded %20 data, and that length is the length of the string in str, not counting the zero terminator.
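For comparison, here is a Go sketch of the same algorithm under the assumption described in the answer above: the buffer already has enough room for the expanded text. In Go, a byte slice's capacity makes that assumption explicit, and reslicing beyond it panics instead of silently overwriting memory. The function name replaceSpaces is hypothetical:

```go
package main

import "fmt"

// replaceSpaces replaces every space in buf[:length] with "%20" in place.
// It assumes cap(buf) is large enough for the expanded result; if not,
// the reslice buf[:newLength] panics rather than writing out of bounds.
func replaceSpaces(buf []byte, length int) []byte {
	spaces := 0
	for _, c := range buf[:length] {
		if c == ' ' {
			spaces++
		}
	}
	newLength := length + spaces*2
	out := buf[:newLength] // bounds-checked against cap(buf)
	// Walk backwards so bytes not yet read are never overwritten.
	w := newLength
	for i := length - 1; i >= 0; i-- {
		if buf[i] == ' ' {
			out[w-1], out[w-2], out[w-3] = '0', '2', '%'
			w -= 3
		} else {
			out[w-1] = buf[i]
			w--
		}
	}
	return out
}

func main() {
	buf := make([]byte, 0, 32) // spare capacity for the expansion
	buf = append(buf, "Mr John Smith"...)
	fmt.Println(string(replaceSpaces(buf, len(buf)))) // Mr%20John%20Smith
}
```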

This is valid code, but it's not good code. You are completely correct in your assessment that we are overwriting the bounds of the initial str[]. This could cause some rather unwanted side-effects depending on what was being overwritten.

Understanding bytePtrToString() from Golang pkg/net/interface_windows.go

I am trying to understand this code to convert from [16]byte to string:
// From: https://code.google.com/p/go/source/browse/src/pkg/net/interface_windows.go
func bytePtrToString(p *uint8) string {
	a := (*[10000]uint8)(unsafe.Pointer(p))
	i := 0
	for a[i] != 0 {
		i++
	}
	return string(a[:i])
}
// Type of ipl.IpAddress.String is [16]byte
str := bytePtrToString(&ipl.IpAddress.String[0])
I mean, what is the reason for the pointer magic? Couldn't it be written simply as follows?
func toString(p []byte) string {
	for i, b := range p {
		if b == 0 {
			return string(p[:i])
		}
	}
	return string(p)
}
// Type of ipl.IpAddress.String is [16]byte
str := toString(ipl.IpAddress.String[:])

No, it could not be written as you propose.
You are forgetting that []byte is not an array of bytes; it is a slice of bytes.
In memory, a slice is just a structure holding a pointer to a buffer (the array of bytes that holds the data) plus information about its size (length and capacity).
a := (*[10000]uint8)(unsafe.Pointer(p))
This code converts the byte pointer to a pointer to an array of 10000 bytes.
In Go, an array of bytes is just bytes, like in C. Nothing more.
So if you declare an array of bytes:
var buffer [256]byte
Go will allocate exactly 256 bytes; the array doesn't even store its length. The length of an array is part of its type, so only the compiler has information about the array length.
That's why we can convert a byte pointer to an array pointer: in memory it's basically the same thing.
i := 0
for a[i] != 0 {
	i++
}
Here we just find the end of the null-terminated string.
return string(a[:i])
Here we use a slicing operation and convert the result to a string. Note that the string conversion copies the bytes, so the resulting string does not point to the same memory as the original byte pointer.
These two articles will help you understand this topic a bit better:
http://blog.golang.org/slices
http://blog.golang.org/go-slices-usage-and-internals
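When you do have a fixed-size array such as a [16]byte field, a safe alternative along the lines of the question's toString is to slice it and search for the terminator with bytes.IndexByte. This sketch (the helper name cString is hypothetical) also shows that string() copies: changing the array afterwards does not affect the result:

```go
package main

import (
	"bytes"
	"fmt"
)

// cString converts a NUL-terminated byte buffer to a string without unsafe:
// it stops at the first 0 byte, or uses the whole slice if there is none.
func cString(p []byte) string {
	if i := bytes.IndexByte(p, 0); i >= 0 {
		p = p[:i]
	}
	return string(p) // the conversion copies the bytes
}

func main() {
	var addr [16]byte
	copy(addr[:], "192.168.0.1")

	s := cString(addr[:])
	addr[0] = '9'  // modifying the array afterwards...
	fmt.Println(s) // ...does not affect s: prints 192.168.0.1
}
```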
