Read() function - io

http://play.golang.org/p/Opb7pRFyMf
// func (f *File) Read(b []byte) (n int, err error)
record, err := reader.Read()
Is the Read() function defined in the os package?
I am trying to understand this code but cannot find where the Read() function is defined. If it is the one in the os package, it returns an integer, not a record. So how is it able to print out the text in the text file?

io.Reader is the interface that wraps the basic Read method:
type Reader interface {
Read(p []byte) (n int, err error)
}
The Read method takes a byte slice as its argument and returns the number of bytes read and an error.
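package main
import (
"fmt"
"io"
"strings"
)
func main() {
myReader := strings.NewReader("This is my reader")
arr := make([]byte, 4)
for {
// n is the number of bytes read into arr; handle them before checking err
n, err := myReader.Read(arr)
if n > 0 {
fmt.Println(string(arr[:n]))
}
if err == io.EOF {
break
}
}
}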
Output:
This
is
my r
eade
r
string(arr[:n]) converts the first n bytes of arr (the bytes filled by this call) to a string.
To read more about Read and io.Reader, refer to this article.
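A note on the original question: record, err := reader.Read() takes no arguments, so it cannot be os.File's Read or io.Reader's Read, both of which require a []byte. The linked playground code is not reproduced here, but assuming reader is an encoding/csv Reader (the variable name record suggests so), its Read method returns the next record as a []string, which is why the text can be printed directly. A minimal sketch under that assumption, with a hypothetical readRecords helper:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readRecords is a hypothetical helper: it calls csv.Reader.Read, which
// takes no arguments and returns the next record as a []string, until
// Read reports io.EOF.
func readRecords(text string) ([][]string, error) {
	reader := csv.NewReader(strings.NewReader(text))
	var records [][]string
	for {
		record, err := reader.Read()
		if err == io.EOF {
			return records, nil
		}
		if err != nil {
			return nil, err
		}
		records = append(records, record)
	}
}

func main() {
	records, err := readRecords("name,age\nalice,30\n")
	if err != nil {
		panic(err)
	}
	for _, record := range records {
		fmt.Println(record) // prints [name age], then [alice 30]
	}
}
```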

Related

Golang Incrementing numbers in strings (using runes)

I have a string mixed with letters and digits, and I want to increment the last character, which happens to be a digit. Here is what I have. It works, but once I reach 10, the last rune is no longer a digit. Is there a better way to do this?
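package main
import (
"fmt"
)
func main() {
str := "version-1.1.0-8"
rStr := []rune(str)
last := rStr[len(rStr)-1]
rStr[len(rStr)-1] = last + 1
fmt.Println(string(rStr)) // without this, the fmt import is unused and the program does not compile
}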
So this works for str := "version-1.1.0-8", which becomes version-1.1.0-9, but str := "version-1.1.0-9" becomes version-1.1.0-: instead of version-1.1.0-10.
I understand why it is happening, but I don't know how to fix it.
Your intention is to increment the number represented by the last rune, so you should do that: parse out that number, increment it as a number, and "re-encode" it into string.
You can't operate on a single rune, because once the number reaches 10, it can only be represented using 2 runes. Another issue is that if the last number is 19, incrementing it needs to alter the previous rune (and not add a new rune).
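To see concretely why bumping a single rune fails: '9' is U+0039, and the next code point, U+003A, is ':', not "10". A small sketch reproducing the question's approach (incrementLastRune is a hypothetical name):

```go
package main

import "fmt"

// incrementLastRune reproduces the question's approach: it adds 1 to the
// code point of the final rune rather than to the number the digits form.
// It assumes s is non-empty.
func incrementLastRune(s string) string {
	r := []rune(s)
	r[len(r)-1]++
	return string(r)
}

func main() {
	fmt.Println(incrementLastRune("version-1.1.0-8")) // version-1.1.0-9
	fmt.Println(incrementLastRune("version-1.1.0-9")) // version-1.1.0-: ('9'+1 is ':')
}
```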
Parsing the numbers and re-encoding them, though, is much easier than one might think.
You can take advantage of the fmt package's fmt.Sscanf() and fmt.Sprintf() functions. Parsing and re-encoding are each just a single function call.
Let's wrap this functionality into a function:
const format = "version-%d.%d.%d-%d"
func incLast(s string) (string, error) {
var a, b, c, d int
if _, err := fmt.Sscanf(s, format, &a, &b, &c, &d); err != nil {
return "", err
}
d++
return fmt.Sprintf(format, a, b, c, d), nil
}
Testing it:
s := "version-1.1.0-8"
for i := 0; i < 13; i++ {
var err error
if s, err = incLast(s); err != nil {
panic(err)
}
fmt.Println(s)
}
Output (try it on the Go Playground):
version-1.1.0-9
version-1.1.0-10
version-1.1.0-11
version-1.1.0-12
version-1.1.0-13
version-1.1.0-14
version-1.1.0-15
version-1.1.0-16
version-1.1.0-17
version-1.1.0-18
version-1.1.0-19
version-1.1.0-20
version-1.1.0-21
Another option would be to parse and re-encode just the last part, not the complete version text. This is how it would look:
func incLast2(s string) (string, error) {
i := strings.LastIndexByte(s, '-')
if i < 0 {
return "", fmt.Errorf("invalid input")
}
d, err := strconv.Atoi(s[i+1:])
if err != nil {
return "", err
}
d++
return s[:i+1] + strconv.Itoa(d), nil
}
Testing and output are the same. Try this one on the Go Playground.

Overhead of converting from []byte to string and vice-versa

I always seem to be converting strings to []byte to string again over and over. Is there a lot of overhead with this? Is there a better way?
For example, here is a function that accepts a UTF-8 string, normalizes it, removes accents, then converts special characters to their ASCII equivalents:
var transliterations = map[rune]string{'Æ':"AE",'Ð':"D",'Ł':"L",'Ø':"OE",'Þ':"Th",'ß':"ss",'æ':"ae",'ð':"d",'ł':"l",'ø':"oe",'þ':"th",'Œ':"OE",'œ':"oe"}
func RemoveAccents(s string) string {
b := make([]byte, len(s))
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
_, _, e := t.Transform(b, []byte(s), true)
if e != nil { panic(e) }
r := string(b)
var f bytes.Buffer
for _, c := range r {
temp := rune(c)
if val, ok := transliterations[temp]; ok {
f.WriteString(val)
} else {
f.WriteRune(temp)
}
}
return f.String()
}
So I'm starting with a string because that's what I get, then I'm converting it to a byte array, then back to a string, then to a byte array again, then back to a string again. Surely this is unnecessary, but I can't figure out how to avoid it. And does it really have a lot of overhead, or do I not have to worry about slowing things down with excessive conversions?
(Also, if anyone has the time: I haven't yet figured out how bytes.Buffer actually works. Would it not be better to initialize a buffer of 2x the size of the string, which is the maximum output size of the return value?)
In Go, strings are immutable so any change creates a new string. As a general rule, convert from a string to a byte or rune slice once and convert back to a string once. To avoid reallocations, for small and transient allocations, over-allocate to provide a safety margin if you don't know the exact number.
For example,
package main
import (
"bytes"
"fmt"
"unicode"
"unicode/utf8"
// the old code.google.com/p/go.text paths now live under golang.org/x/text
"golang.org/x/text/transform"
"golang.org/x/text/unicode/norm"
)
var isMn = func(r rune) bool {
return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}
var transliterations = map[rune]string{
'Æ': "AE", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th",
'ß': "ss", 'æ': "ae", 'ð': "d", 'ł': "l", 'ø': "oe",
'þ': "th", 'Œ': "OE", 'œ': "oe",
}
func RemoveAccents(b []byte) ([]byte, error) {
mnBuf := make([]byte, len(b)*125/100)
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
n, _, err := t.Transform(mnBuf, b, true)
if err != nil {
return nil, err
}
mnBuf = mnBuf[:n]
tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*125/100))
for i, w := 0, 0; i < len(mnBuf); i += w {
r, width := utf8.DecodeRune(mnBuf[i:])
if s, ok := transliterations[r]; ok {
tlBuf.WriteString(s)
} else {
tlBuf.WriteRune(r)
}
w = width
}
return tlBuf.Bytes(), nil
}
func main() {
in := "test stringß"
fmt.Println(in)
inBytes := []byte(in)
outBytes, err := RemoveAccents(inBytes)
if err != nil {
fmt.Println(err)
}
out := string(outBytes)
fmt.Println(out)
}
Output:
test stringß
test stringss
There is no general answer to this question. If these conversions are a performance bottleneck in your application, you should fix them. If not, don't.
Did you profile your application under realistic load and find that RemoveAccents is the bottleneck? No? So why bother?
Really: I assume one could do better (in the sense of less garbage, fewer iterations and fewer conversions), e.g. by chaining in some "TransliterationTransformer". But I doubt it would be worth the hassle.
There is a small overhead with converting a string to a byte slice (not an array; that's a different type), namely allocating space for the byte slice and copying the string's bytes into it.
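One way to check that claim is to count allocations. The sketch below uses testing.AllocsPerRun, which also works outside of tests; the package-level sink variable is an assumption added so the result escapes to the heap, because a small, non-escaping conversion may be placed on the stack and not allocate at all.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so that the converted slice escapes to the heap;
// otherwise the compiler may place a small, non-escaping copy on the stack.
var sink []byte

// conversionAllocs reports the average number of heap allocations made by
// one string-to-[]byte conversion of s.
func conversionAllocs(s string) float64 {
	return testing.AllocsPerRun(100, func() {
		sink = []byte(s)
	})
}

func main() {
	fmt.Println(conversionAllocs("test stringß"))
}
```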
Strings are their own type and are an interpretation of a sequence of bytes, but not every sequence of bytes is a useful string. Strings are also immutable. If you look at the strings package, you will see that strings are sliced a lot; slicing a string does not copy its bytes.
In your example you can omit the second conversion back to string. You can also iterate over the byte slice directly, but note that ranging over a []byte yields individual bytes, not runes, so you need utf8.DecodeRune to handle multi-byte characters.
As with every question about performance: you will probably need to measure. Is the allocation of byte slices really your bottleneck?
You can initialize your bytes.Buffer like so:
f := bytes.NewBuffer(make([]byte, 0, len(s)*2))
which gives a length of 0 and a capacity of twice the length of your string. If you can estimate the size of your buffer, it is probably good to do that: it will save you a few reallocations of the underlying byte slice.
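bytes.Buffer also has a Grow method that reserves capacity up front, which is an alternative to passing a pre-sized slice to bytes.NewBuffer. A small sketch (newOutputBuffer is a hypothetical helper name):

```go
package main

import (
	"bytes"
	"fmt"
)

// newOutputBuffer pre-sizes a bytes.Buffer for twice the input length
// using Grow, instead of bytes.NewBuffer(make([]byte, 0, 2*len(s))).
func newOutputBuffer(s string) *bytes.Buffer {
	buf := new(bytes.Buffer)
	buf.Grow(2 * len(s)) // reserve capacity; Len() is still 0
	return buf
}

func main() {
	s := "test stringß"
	buf := newOutputBuffer(s)
	fmt.Println(buf.Len(), buf.Cap() >= 2*len(s))
}
```

Either way, later WriteString and WriteRune calls only reallocate once the reserved capacity is exhausted.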

Reading beyond buffer

I have a buffer of size bufferSize from which I read in chunks of blockSize. However, this yields some (to me) unexpected behavior when blockSize is larger than bufferSize.
I've put the code here:
http://play.golang.org/p/Ra2jicYHPu
Why does the second chunk only give 4 bytes? What's happening here?
I'd expect Read to always return len(byteArray) bytes and, if that goes beyond the buffer, to handle it by moving the buffer's pointer past byteArray and filling byteArray with the rest of the buffer plus whatever comes after it, up to the new pointer.
Your expectations are not based on any documented behavior of bufio.Reader: its Read takes bytes from at most one Read on the underlying reader, so it may return fewer than len(p) bytes. If you want "Read to always give the amount of bytes len(byteArray)", use io.ReadAtLeast (or io.ReadFull, which is io.ReadAtLeast with the minimum set to len(buf)).
package main
import (
"bufio"
"fmt"
"io"
"strings"
)
const bufSize = 10
const blockSize = 12
func main() {
s := strings.NewReader("some length test string buffer boom")
buffer := bufio.NewReaderSize(s, bufSize)
b := make([]byte, blockSize)
n, err := io.ReadAtLeast(buffer, b, blockSize)
if err != nil {
fmt.Println(err)
}
fmt.Printf("First read got %d bytes: %s\n", n, string(b))
d := make([]byte, blockSize)
n, err = io.ReadAtLeast(buffer, d, blockSize)
if err != nil {
fmt.Println(err)
}
fmt.Printf("Second read got %d bytes: %s\n", n, string(d))
}
Playground
Output:
First read got 12 bytes: some length
Second read got 12 bytes: test string
1. See the code of bufio.NewReaderSize:
func NewReaderSize(rd io.Reader, size int) *Reader {
// Is it already a Reader?
b, ok := rd.(*Reader)
if ok && len(b.buf) >= size {
return b
}
if size < minReadBufferSize {
size = minReadBufferSize
}
return &Reader{
buf: make([]byte, size),
rd: rd,
lastByte: -1,
lastRuneSize: -1,
}
}
strings.NewReader returns a *strings.Reader, not a *bufio.Reader, so NewReaderSize creates a new Reader. Because the requested size (10) is smaller than minReadBufferSize (16), its buf has 16 bytes.
2. See the code of (*bufio.Reader).Read:
func (b *Reader) Read(p []byte) (n int, err error) {
……
copy(p[0:n], b.buf[b.r:])
b.r += n
b.lastByte = int(b.buf[b.r-1])
b.lastRuneSize = -1
return n, nil
}
The copy source is b.buf[b.r:]; after your first Read, b.r = 12, ……
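Putting the two excerpts together: with a 16-byte internal buffer, the first Read fills the buffer from the strings.Reader and hands back 12 bytes, and the second Read only copies what is still buffered. A sketch that observes this (readTwice is a hypothetical helper; the counts assume the documented behavior that bufio's Read takes bytes from at most one Read on the underlying reader):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readTwice performs two plain Read calls of blockSize bytes each on a
// bufio.Reader and returns how many bytes each call delivered.
func readTwice(bufSize, blockSize int) (n1, n2 int) {
	src := strings.NewReader("some length test string buffer boom")
	r := bufio.NewReaderSize(src, bufSize) // sizes below 16 are raised to 16
	n1, _ = r.Read(make([]byte, blockSize))
	n2, _ = r.Read(make([]byte, blockSize))
	return n1, n2
}

func main() {
	n1, n2 := readTwice(10, 12)
	fmt.Println(n1, n2) // 12 4: only 16-12 bytes remain buffered for the second call
}
```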

Go sort a slice of runes?

I'm having trouble sorting strings by character (to check whether two strings are anagrams, I want to sort both of them, and check for equality).
I can get a []rune representation of the string s like this:
runes := []rune(s) // len(s) counts bytes, not runes, so make+copy would be wrong for non-ASCII input
And I can sort ints like this
someInts := []int{5, 2, 6, 3, 1, 4} // unsorted
sort.Ints(someInts)
But rune is just an alias for int32 so I should be able to call
sort.Ints(runes)
However, I get the error:
cannot use runes (type []rune) as type []int in function argument
So... how do I sort a slice of int32, int64, or int*?
EDIT: I did get my runes sorted, but boy, this is ugly.
type RuneSlice []rune
func (p RuneSlice) Len() int { return len(p) }
func (p RuneSlice) Less(i, j int) bool { return p[i] < p[j] }
func (p RuneSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func sorted(s string) string {
runes := []rune(s)
sort.Sort(RuneSlice(runes))
return string(runes)
}
So basically if you have a slice of whatever, you'll have to wrap it in a type that implements sort.Interface. All those implementations will have the exact same method bodies (like sort.IntSlice and sort.Float64Slice). If this is really how ugly this has to be, then why didn't they provide these WhateverSlice wrappers in the sort package? The lack of generics starts to hurt very badly now. There must be a better way of sorting things.
Use sort.Sort(data Interface) and implement sort.Interface; see the examples in the package documentation.
You cannot use rune, which is int32, as int. See the documentation comment for int:
int is a signed integer type that is at least 32 bits in size. It is a
distinct type, however, and not an alias for, say, int32.
Note: Go 1.8 introduced helpers for sorting slices, such as sort.Slice.
See issue 16721 and commit 22a2bdf by Brad Fitzpatrick
var strings = [...]string{"", "Hello", "foo", "bar", "foo", "f00", "%*&^*&^&", "***"}
func TestSlice(t *testing.T) {
data := strings
Slice(data[:], func(i, j int) bool {
return data[i] < data[j]
})
}
Just as a point of comparison, here's what things might look like if the sort interface were slightly different. That is, rather than the interface being on the container, what would things look like if the interface were on the elements instead?
package main
import (
"fmt"
"sort"
)
type Comparable interface {
LessThan(Comparable) bool
}
type ComparableSlice []Comparable
func (c ComparableSlice) Len() int {
return len(c)
}
func (c ComparableSlice) Less(i, j int) bool {
return c[i].LessThan(c[j])
}
func (c ComparableSlice) Swap(i, j int) {
c[i], c[j] = c[j], c[i]
}
func SortComparables(elts []Comparable) {
sort.Sort(ComparableSlice(elts))
}
//////////////////////////////////////////////////////////////////////
// Let's try using this:
type ComparableRune rune
func (r1 ComparableRune) LessThan(o Comparable) bool {
return r1 < o.(ComparableRune)
}
func main() {
msg := "Hello world!"
comparables := make(ComparableSlice, len(msg))
for i, v := range msg {
comparables[i] = ComparableRune(v)
}
SortComparables(comparables)
sortedRunes := make([]rune, len(msg))
for i, v := range comparables {
sortedRunes[i] = rune(v.(ComparableRune))
}
fmt.Printf("result: %#v\n", string(sortedRunes))
}
Here, we define a Comparable interface, and we get our type ComparableRune to satisfy it. But because it's an interface, we've got to do the awkward boxing to go from rune to ComparableRune so that dynamic dispatch can kick in:
comparables := make(ComparableSlice, len(msg))
for i, v := range msg {
comparables[i] = ComparableRune(v)
}
and unboxing to get back our runes:
sortedRunes := make([]rune, len(msg))
for i, v := range comparables {
sortedRunes[i] = rune(v.(ComparableRune))
}
This approach appears to require us to know how to convert and type-assert back and forth between the interface and the dynamic type of the value. It seems we would need to use more parts of Go (more mechanics) than the approach that uses the container as the interface.
There is, in fact, a soft-generic way to do what you want.
Check out the following package:
https://github.com/BurntSushi/ty/tree/master/fun
especially the following file:
https://github.com/BurntSushi/ty/blob/master/fun/sort_test.go
Example of how it is used:
tosort := []int{10, 3, 5, 1, 15, 6}
fun.Sort(func(a, b int) bool {
return b < a
}, tosort)
There are lots of other interesting fun generic algorithms implemented through reflection in that package.
All credit goes to BurntSushi.
As of November 2020 at least, the sort package (https://golang.org/pkg/sort/) offers sort.Slice, which takes a custom less function passed as a closure. The code below has the desired effect:
package main
import (
"fmt"
"sort"
)
func main() {
s1 := "eidbaooo"
runeSlice := []rune(s1)
fmt.Println(string(runeSlice))
sort.Slice(runeSlice, func(i, j int) bool {
return runeSlice[i] < runeSlice[j]
})
fmt.Println(string(runeSlice))
}
Output:
eidbaooo
abdeiooo
This can spare you the full interface implementation.
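For newer toolchains: since Go 1.21 (later than the answers above), the generic slices.Sort sorts any slice of ordered elements, including []rune, without a wrapper type or a closure. A sketch of the anagram check from the question, under that Go 1.21+ assumption (isAnagram is a hypothetical name):

```go
package main

import (
	"fmt"
	"slices"
)

// isAnagram reports whether a and b contain the same runes in some order.
// slices.Sort (Go 1.21+) accepts []rune directly because rune (int32)
// satisfies cmp.Ordered, so no sort.Interface wrapper is needed.
func isAnagram(a, b string) bool {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return false
	}
	slices.Sort(ra)
	slices.Sort(rb)
	return slices.Equal(ra, rb)
}

func main() {
	fmt.Println(isAnagram("listen", "silent")) // true
	fmt.Println(isAnagram("abc", "abd"))       // false
}
```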

What is the fastest way to generate a long random string in Go?

For example, an [a-zA-Z0-9] string:
na1dopW129T0anN28udaZ
or a hexadecimal string:
8c6f78ac23b4a7b8c0182d
By long I mean 2K or more characters.
This does about 200MBps on my box. There's obvious room for improvement.
type randomDataMaker struct {
src rand.Source
}
func (r *randomDataMaker) Read(p []byte) (n int, err error) {
for i := range p {
p[i] = byte(r.src.Int63() & 0xff)
}
return len(p), nil
}
You'd just use io.CopyN to produce the string you want. Obviously you could adjust the character set on the way in or whatever.
The nice thing about this model is that it's just an io.Reader, so you can use it to make anything.
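As a sketch of adjusting the character set on the way in and then using io.CopyN to produce the string: hexDataMaker below is a hypothetical variant of randomDataMaker that emits only hex digits, and the result is copied into a strings.Builder. Like the original, it uses math/rand, so it is not suitable for secrets.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
	"strings"
)

const hexDigits = "0123456789abcdef"

// hexDataMaker is a hypothetical variant of randomDataMaker that adjusts
// the character set on the way in: every byte it produces is a hex digit.
type hexDataMaker struct {
	src rand.Source
}

func (r *hexDataMaker) Read(p []byte) (n int, err error) {
	for i := range p {
		p[i] = hexDigits[r.src.Int63()&0xf] // low 4 bits pick a digit
	}
	return len(p), nil
}

// randomHexString copies exactly n bytes from a hexDataMaker into a
// strings.Builder using io.CopyN.
func randomHexString(n, seed int64) (string, error) {
	var sb strings.Builder
	_, err := io.CopyN(&sb, &hexDataMaker{src: rand.NewSource(seed)}, n)
	return sb.String(), err
}

func main() {
	s, err := randomHexString(32, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```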
Test is below:
func BenchmarkRandomDataMaker(b *testing.B) {
randomSrc := randomDataMaker{rand.NewSource(1028890720402726901)}
for i := 0; i < b.N; i++ {
b.SetBytes(int64(i))
_, err := io.CopyN(ioutil.Discard, &randomSrc, int64(i))
if err != nil {
b.Fatalf("Error copying at %v: %v", i, err)
}
}
}
On one core of my 2.2GHz i7:
BenchmarkRandomDataMaker 50000 246512 ns/op 202.83 MB/s
EDIT
Since I wrote the benchmark, I figured I'd do the obvious improvement (call out to the random source less frequently). With 1/8 the calls to rand, it runs about 4x faster, though it's a bit uglier:
New version:
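func (r *randomDataMaker) Read(p []byte) (n int, err error) {
todo := len(p)
if todo == 0 {
return 0, nil // nothing to fill; also avoids indexing p[0] below
}
offset := 0
for {
// Int63 returns 63 random bits, so the 8th byte taken from each
// value always has its top bit cleared.
val := r.src.Int63()
for i := 0; i < 8; i++ {
p[offset] = byte(val & 0xff)
todo--
if todo == 0 {
return len(p), nil
}
offset++
val >>= 8
}
}
}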
New benchmark:
BenchmarkRandomDataMaker 200000 251148 ns/op 796.34 MB/s
EDIT 2
Took out the masking in the cast to byte since it was redundant. Got a good deal faster:
BenchmarkRandomDataMaker 200000 231843 ns/op 862.64 MB/s
(this is so much easier than real work sigh)
EDIT 3
This came up in irc today, so I released a library. Also, my actual benchmark tool, while useful for relative speed, isn't sufficiently accurate in its reporting.
I created randbo that you can reuse to produce random streams wherever you may need them.
You can use the Go package uniuri to generate random strings (or view the source code to see how they're doing it). You'll want to use:
func NewLen(length int) string
NewLen returns a new random string of the provided length, consisting of standard characters.
Or, to specify the set of characters used:
func NewLenChars(length int, chars []byte) string
This is actually a little biased towards the first 8 characters in the set (since 256 is not a multiple of len(alphanum), which is 62), but it will get you most of the way there.
import (
"crypto/rand"
)
func randString(n int) string {
const alphanum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
var bytes = make([]byte, n)
if _, err := rand.Read(bytes); err != nil {
panic(err) // could not read from the OS source of randomness
}
for i, b := range bytes {
bytes[i] = alphanum[b % byte(len(alphanum))]
}
return string(bytes)
}
If you want to generate a cryptographically secure random string, I recommend taking a look at this page. Here is a helper function that reads n random bytes from your OS's source of randomness and then base64-encodes them. Note that the resulting string is longer than n because of the base64 encoding.
package main
import(
"crypto/rand"
"encoding/base64"
"fmt"
)
func GenerateRandomBytes(n int) ([]byte, error) {
b := make([]byte, n)
_, err := rand.Read(b)
if err != nil {
return nil, err
}
return b, nil
}
func GenerateRandomString(s int) (string, error) {
b, err := GenerateRandomBytes(s)
return base64.URLEncoding.EncodeToString(b), err
}
func main() {
token, _ := GenerateRandomString(32)
fmt.Println(token)
}
Here is Evan Shaw's answer reworked without the bias towards the first 8 characters of the string. Note that it uses lots of expensive big.Int operations, so it probably isn't that quick! The answer is cryptographically strong, though.
It uses rand.Int to pick a uniform integer below exactly len(alphanum)^n, then does what is effectively a base conversion into base len(alphanum).
There is almost certainly a better algorithm for this which would involve keeping a much smaller remainder and adding random bytes to it as necessary. This would get rid of the expensive long integer arithmetic.
import (
"crypto/rand"
"fmt"
"math/big"
)
func randString(n int) string {
const alphanum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
symbols := big.NewInt(int64(len(alphanum)))
states := big.NewInt(0)
states.Exp(symbols, big.NewInt(int64(n)), nil)
r, err := rand.Int(rand.Reader, states)
if err != nil {
panic(err)
}
var bytes = make([]byte, n)
r2 := big.NewInt(0)
symbol := big.NewInt(0)
for i := range bytes {
r2.DivMod(r, symbols, symbol)
r, r2 = r2, r
bytes[i] = alphanum[symbol.Int64()]
}
return string(bytes)
}
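One alternative that avoids both the bias and the big.Int arithmetic is rejection sampling, a different technique from the remainder-based idea described above: mask each random byte to 6 bits and discard the values that fall outside the alphabet. A sketch using crypto/rand (randStringRejection is a hypothetical name):

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const alphanum = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

// randStringRejection returns an n-character string drawn uniformly from
// alphanum. Each random byte is masked to its low 6 bits (0-63); values
// 62 and 63 are rejected, so the accepted values are uniform over the 62
// symbols and no big.Int arithmetic is needed.
func randStringRejection(n int) (string, error) {
	out := make([]byte, 0, n)
	buf := make([]byte, n) // batch of random bytes; refilled as needed
	for len(out) < n {
		if _, err := rand.Read(buf); err != nil {
			return "", err
		}
		for _, b := range buf {
			v := b & 63
			if int(v) < len(alphanum) && len(out) < n {
				out = append(out, alphanum[v])
			}
		}
	}
	return string(out), nil
}

func main() {
	s, err := randStringRejection(32)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

On average only 2 of every 64 bytes are discarded, so the loop rarely needs more than one refill.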
