Using this file (data file):
package main

import (
    "io/ioutil"
    "time"
)

func main() {
    ioutil.ReadFile("100mb.file")
    time.Sleep(time.Duration(time.Minute))
}
Showed memory usage for me of 107 MB. With this similar file:
package main

import (
    "bytes"
    "os"
    "time"
)

func read(path_s string) (bytes.Buffer, error) {
    buf_o := bytes.Buffer{}
    open_o, e := os.Open(path_s)
    if e != nil {
        return buf_o, e
    }
    buf_o.ReadFrom(open_o)
    open_o.Close()
    return buf_o, nil
}

func main() {
    read("100mb.file")
    time.Sleep(time.Duration(time.Minute))
}
Memory usage went to 273 MB. Finally this similar file:
package main

import (
    "io"
    "os"
    "strings"
    "time"
)

func read(path_s string) (strings.Builder, error) {
    str_o := strings.Builder{}
    open_o, e := os.Open(path_s)
    if e != nil {
        return str_o, e
    }
    io.Copy(&str_o, open_o)
    open_o.Close()
    return str_o, nil
}

func main() {
    read("100mb.file")
    time.Sleep(time.Duration(time.Minute))
}
Memory usage went to 432 MB. I tried to be careful and close files where
possible. Why is the memory usage so high for the second example, and especially
the final example? Can I change something so that they are closer to the first
example?
ioutil.ReadFile("100mb.file") gets the size of the file, allocates a []byte of that size, and slurps the bytes up into that slice.
buf_o.ReadFrom(open_o) allocates an initial []byte of some size and reads into that slice. If there's more data in the reader than space in the slice, then the function allocates a larger slice, copies existing data to that slice and reads more. This repeats until EOF.
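To make that growth visible, here is a minimal sketch (assuming a file named 100mb.file exists) that prints the buffer's length next to its capacity after a plain ReadFrom; the capacity typically ends up well above the file size because of the repeated reallocations:

package main

import (
    "bytes"
    "fmt"
    "log"
    "os"
)

func main() {
    f, err := os.Open("100mb.file") // assumed test file
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    var buf bytes.Buffer
    if _, err := buf.ReadFrom(f); err != nil {
        log.Fatal(err)
    }

    // Len is the number of bytes read; Cap is the size of the backing array,
    // which grew by repeated reallocation and is usually larger than Len.
    fmt.Println("len:", buf.Len(), "cap:", buf.Cap())
}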
The function ioutil.ReadFile uses bytes.Buffer.ReadFrom internally. Take a look at the ioutil.ReadFile implementation to see how to improve direct use of the bytes.Buffer. A synopsis of the logic is this:
var buf bytes.Buffer

// Open file
f, err := os.Open(path)
if err != nil {
    return &buf, err
}
defer f.Close()

// Get size.
fi, err := f.Stat()
if err != nil {
    return &buf, err
}

// Grow to size of file plus extra slop to ensure no realloc.
buf.Grow(int(fi.Size()) + bytes.MinRead)

_, err = buf.ReadFrom(f)
return &buf, err
The strings.Builder example reallocates the internal buffer several times as in the bytes.Buffer example. In addition, io.Copy allocates a buffer. You can improve the strings.Builder example by growing the builder to the size of the file before reading.
Here's the code for strings.Builder:
var buf strings.Builder

// Open file
f, err := os.Open(path)
if err != nil {
    return &buf, err
}
defer f.Close()

// Get size.
fi, err := f.Stat()
if err != nil {
    return &buf, err
}

buf.Grow(int(fi.Size()))

_, err = io.Copy(&buf, f)
return &buf, err
io.Copy, or some other code using an extra buffer, is required because strings.Builder does not have a ReadFrom method. That method was left out because it could leak a reference to the backing array of the internal byte slice.
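If the temporary buffer that io.Copy allocates matters, io.CopyBuffer lets the caller supply the staging buffer (it is only used when neither side offers a WriterTo/ReaderFrom fast path). Here is a sketch of the strings.Builder variant using it; the 32 KB size mirrors io.Copy's default and is an arbitrary choice here:

var buf strings.Builder

f, err := os.Open(path)
if err != nil {
    return &buf, err
}
defer f.Close()

fi, err := f.Stat()
if err != nil {
    return &buf, err
}
buf.Grow(int(fi.Size()))

// Supply a reusable staging buffer instead of letting io.Copy allocate one.
copyBuf := make([]byte, 32*1024)
_, err = io.CopyBuffer(&buf, f, copyBuf)
return &buf, err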
Using the suggestion from Muffin Top, I took my second example and added this
directly before the call to ReadFrom:
stat_o, e := open_o.Stat()
if e != nil {
    return buf_o, e
}
buf_o.Grow(bytes.MinRead + int(stat_o.Size()))
and the memory went down to 107 MB, basically the same as the first example.
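For reference, the whole second example with that fix folded in would look roughly like this (same names as above; only the Stat/Grow lines are new, and a defer is used so the file is also closed on the error path):

func read(path_s string) (bytes.Buffer, error) {
    buf_o := bytes.Buffer{}

    open_o, e := os.Open(path_s)
    if e != nil {
        return buf_o, e
    }
    defer open_o.Close()

    // Pre-size the buffer so ReadFrom does not need to reallocate and copy.
    stat_o, e := open_o.Stat()
    if e != nil {
        return buf_o, e
    }
    buf_o.Grow(bytes.MinRead + int(stat_o.Size()))

    buf_o.ReadFrom(open_o)
    return buf_o, nil
}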
In Go, I am trying to create a function that reads and processes the next line of input:
// Read a string of hex from stdin and parse to an array of bytes
func ReadHex() []byte {
    r := bufio.NewReader(os.Stdin)
    t, _ := r.ReadString('\n')
    data, _ := hex.DecodeString(strings.TrimSpace(t))
    return data
}
Unfortunately, this only works the first time it is called. It captures the first line but is unable to capture subsequent lines piped via standard input.
I suspect that if the same persistent bufio.Reader were used on each subsequent call it would work, but I haven't been able to achieve this without passing it manually on each function call.
Yes, try this:
package main

import (
    "bufio"
    "encoding/hex"
    "fmt"
    "log"
    "os"
    "strings"
)

func ReadFunc() func() []byte {
    r := bufio.NewReader(os.Stdin)
    return func() []byte {
        t, err := r.ReadString('\n')
        if err != nil {
            log.Fatal(err)
        }
        data, err := hex.DecodeString(strings.TrimSpace(t))
        if err != nil {
            log.Fatal(err)
        }
        return data
    }
}

func main() {
    r, w, err := os.Pipe()
    if err != nil {
        log.Fatal(err)
    }
    os.Stdin = r
    w.Write([]byte(`ffff
cafebabe
ff
`))
    w.Close()

    ReadHex := ReadFunc()
    fmt.Println(ReadHex())
    fmt.Println(ReadHex())
    fmt.Println(ReadHex())
}
Output:
[255 255]
[202 254 186 190]
[255]
Using a struct, try this:
package main

import (
    "bufio"
    "encoding/hex"
    "fmt"
    "io"
    "log"
    "os"
    "strings"
)

// InputReader struct
type InputReader struct {
    bufio.Reader
}

// New creates an InputReader
func New(rd io.Reader) *InputReader {
    return &InputReader{Reader: *bufio.NewReader(rd)}
}

// ReadHex reads a string of hex from stdin and parses it to an array of bytes
func (r *InputReader) ReadHex() []byte {
    t, err := r.ReadString('\n')
    if err != nil {
        log.Fatal(err)
    }
    data, err := hex.DecodeString(strings.TrimSpace(t))
    if err != nil {
        log.Fatal(err)
    }
    return data
}

func main() {
    r, w, err := os.Pipe()
    if err != nil {
        log.Fatal(err)
    }
    os.Stdin = r
    w.Write([]byte(`ffff
cafebabe
ff
`))
    w.Close()

    rdr := New(os.Stdin)
    fmt.Println(rdr.ReadHex())
    fmt.Println(rdr.ReadHex())
    fmt.Println(rdr.ReadHex())
}
I'm trying to mremap a file from Go, but the size of the file doesn't seem to be changing, despite the returned errno of 0. This results in a segfault when I try to access the mapped memory.
I've included the code below. The implementation is similar to the mmap implementation in the sys package, so I'm not sure what's going wrong here:
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "os"
    "reflect"
    "unsafe"

    "golang.org/x/sys/unix"
)

// taken from https://github.com/torvalds/linux/blob/f8394f232b1eab649ce2df5c5f15b0e528c92091/include/uapi/linux/mman.h#L8
const (
    MREMAP_MAYMOVE = 0x1
    // MREMAP_FIXED = 0x2
    // MREMAP_DONTUNMAP = 0x4
)

func mremap(data []byte, size int) ([]byte, error) {
    header := (*reflect.SliceHeader)(unsafe.Pointer(&data))
    mmapAddr, mmapSize, errno := unix.Syscall6(
        unix.SYS_MREMAP,
        header.Data,
        uintptr(header.Len),
        uintptr(size),
        uintptr(MREMAP_MAYMOVE),
        0,
        0,
    )
    if errno != 0 {
        return nil, fmt.Errorf("mremap failed with errno: %s", errno)
    }
    if mmapSize != uintptr(size) {
        return nil, fmt.Errorf("mremap size mismatch: requested: %d got: %d", size, mmapSize)
    }

    header.Data = mmapAddr
    header.Cap = size
    header.Len = size
    return data, nil
}

func main() {
    log.SetFlags(log.LstdFlags | log.Lshortfile)

    const mmPath = "/tmp/mm_test"

    // create a file for mmap with 1 byte of data.
    // this should take up 1 block on disk (4096 bytes).
    err := ioutil.WriteFile(mmPath, []byte{0x1}, 0755)
    if err != nil {
        log.Fatal(err)
    }

    // open and stat the file.
    file, err := os.OpenFile(mmPath, os.O_RDWR, 0)
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()
    stat, err := file.Stat()
    if err != nil {
        log.Fatal(err)
    }

    // mmap the file and print the contents.
    // this should print only one byte of data.
    data, err := unix.Mmap(int(file.Fd()), 0, int(stat.Size()), unix.PROT_READ|unix.PROT_WRITE, unix.MAP_SHARED)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("mmap data: %+v\n", data)

    // mremap the file to a size of 2 blocks.
    data, err = mremap(data, 2*4096)
    if err != nil {
        log.Fatal(err)
    }

    // access the mremapped data.
    fmt.Println(data[:4096]) // accessing the first block works.
    fmt.Println(data[:4097]) // accessing the second block fails with `SIGBUS: unexpected fault address`.
}
I tried looking for other Go code that uses mremap, but I can't seem to find any. I would appreciate any input!
As #kostix mentioned in the comments, mmap is being used to map a regular file into memory. The reason that accessing the buffer results in a segfault is that the underlying file itself is not large enough. The solution is to truncate the file to the desired length before calling mremap:
if err := file.Truncate(2*4096); err != nil {
    log.Fatal(err)
}

data, err = mremap(data, 2*4096)
If an application does some heavy lifting with multiple file descriptors (e.g., opening, writing data, syncing, closing), what actually happens to the Go runtime? Does it block all the goroutines while an expensive syscall (like syscall.Fsync) is in progress, or is only the calling goroutine blocked while the others keep running?
So does it make sense to write programs with multiple workers that do a lot of user space / kernel space context switching? Does it make sense to use multithreading patterns for disk input?
package main

import (
    "log"
    "os"
    "sync"
)

var data = []byte("some big data")

func worker(filenamechan chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        filename, ok := <-filenamechan
        if !ok {
            return
        }

        // Opening a file is a quite expensive operation because it
        // creates a new descriptor.
        f, err := os.OpenFile(filename, os.O_CREATE|os.O_WRONLY, os.FileMode(0644))
        if err != nil {
            log.Fatal(err)
            continue
        }

        // Write is a cheap operation,
        // because it just moves data from user space to kernel space.
        if _, err := f.Write(data); err != nil {
            log.Fatal(err)
            continue
        }

        // syscall.Fsync is a disk-bound expensive operation.
        if err := f.Sync(); err != nil {
            log.Fatal(err)
            continue
        }

        if err := f.Close(); err != nil {
            log.Fatal(err)
        }
    }
}

func main() {
    // launch workers
    filenamechan := make(chan string)
    wg := &sync.WaitGroup{}
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go worker(filenamechan, wg)
    }

    // send tasks to workers
    filenames := []string{
        "1.txt",
        "2.txt",
        "3.txt",
        "4.txt",
        "5.txt",
    }
    for i := range filenames {
        filenamechan <- filenames[i]
    }
    close(filenamechan)

    wg.Wait()
}
https://play.golang.org/p/O0omcPBMAJ
If a syscall blocks, the Go runtime will launch a new thread so that the number of threads available to run goroutines remains the same.
A fuller explanation can be found here: https://morsmachine.dk/go-scheduler
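One rough way to observe this behaviour (a sketch, not from the linked article; the exact counts depend on the OS and scheduler, so treat the numbers as illustrative only) is to compare the runtime's thread-creation profile before and after a burst of goroutines doing blocking file syscalls:

package main

import (
    "fmt"
    "os"
    "runtime/pprof"
    "sync"
)

func main() {
    threads := pprof.Lookup("threadcreate")
    fmt.Println("threads before:", threads.Count())

    var wg sync.WaitGroup
    for i := 0; i < 16; i++ {
        wg.Add(1)
        go func(n int) {
            defer wg.Done()
            // Each goroutine performs blocking file syscalls; while one is
            // blocked, the runtime can hand its P to another thread so the
            // remaining goroutines keep running.
            name := fmt.Sprintf("/tmp/fsync_demo_%d.txt", n) // hypothetical temp file
            f, err := os.Create(name)
            if err != nil {
                return
            }
            f.Write([]byte("some big data"))
            f.Sync() // disk-bound, blocking syscall
            f.Close()
            os.Remove(name)
        }(i)
    }
    wg.Wait()

    fmt.Println("threads after:", threads.Count())
}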
I am trying to create a simple program to read lines from a text file and print them to the console in Go. I have spent a lot of time going over my code and I simply can't understand why only the last line is being printed to the screen. Can anyone tell me where I am going wrong here? Everything here should compile and run.
package main

import (
    "bufio"
    "fmt"
    "os"
)

func Readln(r *bufio.Reader) (string, error) {
    var (
        isPrefix bool  = true
        err      error = nil
        line, ln []byte
    )
    for isPrefix && err == nil {
        line, isPrefix, err = r.ReadLine()
        ln = append(ln, line...)
    }
    return string(ln), err
}

func main() {
    f, err := os.Open("tickers.txt")
    if err != nil {
        fmt.Printf("error opening file: %v\n", err)
        os.Exit(1)
    }
    r := bufio.NewReader(f)
    s, e := Readln(r)
    for e == nil {
        fmt.Println(s)
        s, e = Readln(r)
    }
}
I suspect that the problem is in the line endings of your tickers.txt file. The docs for ReadLine() also indicate that for most situations a Scanner is more suitable.
The following SO question has some useful information for alternative implementations: reading file line by line in go
I then used the example in the above question to re-implement your main function as follows:
f, err := os.Open("tickers.txt")
if err != nil {
    fmt.Printf("error opening file: %v\n", err)
    os.Exit(1)
}

scanner := bufio.NewScanner(f)
for scanner.Scan() {
    fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
    fmt.Println(err)
}
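If the file does turn out to use bare carriage returns ('\r') as line separators (old Mac-style endings), the default Scanner split function won't break it into lines. A sketch of a custom bufio.SplitFunc for that case, which you could install with scanner.Split(scanCRLines) (the function name and the approach are my own, not part of the answer above):

// scanCRLines is a hypothetical split function that treats a bare '\r' as a
// line separator; it needs the bytes package.
func scanCRLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    if i := bytes.IndexByte(data, '\r'); i >= 0 {
        // Return the line without the '\r'.
        return i + 1, data[:i], nil
    }
    if atEOF {
        return len(data), data, nil
    }
    // Request more data.
    return 0, nil, nil
}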
I found the encoding/binary package to deal with this, but it depends on the reflect package, so it doesn't work with uncapitalized (that is, unexported) struct fields. It took me a week to figure that out, and I still have a question: if struct fields should not be exported, how do I dump them easily into binary data?
EDIT: Here's an example. If you capitalize the names of the fields of the Data struct, it works properly. But the Data struct is intended to be an abstract type, so I don't want to export these fields.
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

type Data struct {
    id   int32
    name [16]byte
}

func main() {
    d := Data{id: 1}
    copy(d.name[:], []byte("tree"))

    buffer := new(bytes.Buffer)
    binary.Write(buffer, binary.LittleEndian, d)
    // d was written properly
    fmt.Println(buffer.Bytes())

    // try to read...
    buffer = bytes.NewBuffer(buffer.Bytes())
    var e = new(Data)
    err := binary.Read(buffer, binary.LittleEndian, e)
    fmt.Println(e, err)
}
Your best option would probably be to use the gob package and let your struct implement the GobDecoder and GobEncoder interfaces in order to serialize and deserialize private fields.
This would be safe, platform independent, and efficient. And you have to add those GobEncode and GobDecode functions only on structs with unexported fields, which means you don't clutter the rest of your code.
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
    "log"
)

// Data is the struct from the question, reproduced so this compiles.
type Data struct {
    id   int32
    name [16]byte
}

func (d *Data) GobEncode() ([]byte, error) {
    w := new(bytes.Buffer)
    encoder := gob.NewEncoder(w)
    err := encoder.Encode(d.id)
    if err != nil {
        return nil, err
    }
    err = encoder.Encode(d.name)
    if err != nil {
        return nil, err
    }
    return w.Bytes(), nil
}

func (d *Data) GobDecode(buf []byte) error {
    r := bytes.NewBuffer(buf)
    decoder := gob.NewDecoder(r)
    err := decoder.Decode(&d.id)
    if err != nil {
        return err
    }
    return decoder.Decode(&d.name)
}

func main() {
    d := Data{id: 7}
    copy(d.name[:], []byte("tree"))

    buffer := new(bytes.Buffer)

    // writing; encode through a pointer so the pointer-receiver GobEncode is used
    enc := gob.NewEncoder(buffer)
    err := enc.Encode(&d)
    if err != nil {
        log.Fatal("encode error:", err)
    }

    // reading
    buffer = bytes.NewBuffer(buffer.Bytes())
    e := new(Data)
    dec := gob.NewDecoder(buffer)
    err = dec.Decode(e)
    fmt.Println(e, err)
}
}