Issue modifying map from goroutine func - multithreading

Here's a snippet of my code. My issue is that I am unable to modify scores from inside the goroutine: I've tried using pointers, I've tried doing it normally, and I've tried passing the map as a parameter.

scores := make(map[string]int)
percentage := make(map[string]float64)
total := 0
for i, ans := range answers {
	answers[i] = strings.ToLower(ans)
}
wg := sync.WaitGroup{}
go func() {
	wg.Add(1)
	body, _ := google(question)
	for _, ans := range answers {
		count := strings.Count(body, ans)
		total += count
		scores[ans] += 5 // <------------------- This doesn't work
	}
	wg.Done()
}()

Package sync
import "sync"
type WaitGroup
A WaitGroup waits for a collection of goroutines to finish. The main
goroutine calls Add to set the number of goroutines to wait for. Then
each of the goroutines runs and calls Done when finished. At the same
time, Wait can be used to block until all goroutines have finished.
You have provided us with a non-working fragment of code. See How to create a Minimal, Complete, and Verifiable example.
As a guess, your use of a sync.WaitGroup looks strange: wg.Add(1) runs inside the goroutine itself, so a wg.Wait() in main could return before the goroutine has even started (and the snippet as posted never calls Wait at all). Add must be called before the goroutine is launched. By simply following the instructions in the sync.WaitGroup documentation, I would expect something more like the following:
package main

import (
	"fmt"
	"strings"
	"sync"
)

func google(string) (string, error) { return "yes", nil }

func main() {
	question := "question?"
	answers := []string{"yes", "no"}
	scores := make(map[string]int)
	total := 0
	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		defer wg.Done()
		body, _ := google(question)
		for _, ans := range answers {
			count := strings.Count(body, ans)
			total += count
			scores[ans] += 5 // <-- This does work
		}
	}()
	wg.Wait()
	fmt.Println(scores, total)
}
Playground: https://play.golang.org/p/sZmB2Dc5RjL
Output:
map[yes:5 no:5] 1
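Note that the map write above is safe only because a single goroutine touches scores. If you split the work across several goroutines, the map (and total) would need a lock, since plain Go maps are not safe for concurrent writes. A minimal sketch, assuming body has already been fetched (the per-answer worker split is illustrative, not from the original code):

package main

import (
	"fmt"
	"strings"
	"sync"
)

func main() {
	body := "yes yes no"
	answers := []string{"yes", "no"}
	scores := make(map[string]int)
	total := 0

	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, ans := range answers {
		wg.Add(1)
		go func(ans string) {
			defer wg.Done()
			count := strings.Count(body, ans)
			mu.Lock() // plain maps are not safe for concurrent writes
			total += count
			scores[ans] += 5
			mu.Unlock()
		}(ans)
	}
	wg.Wait()
	fmt.Println(scores, total)
}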

Related

Go program slowing down when increasing number of goroutines

I'm doing a small project for my parallelism course. I have tried it with buffered channels, unbuffered channels, and without channels using pointers to slices, and I've tried to optimize it as much as possible (not the current state), but I still get the same result: increasing the number of goroutines (even by 1) slows the whole program down. Can someone tell me what I'm doing wrong, and whether a parallel speedup is even possible in this situation?
Here is part of the code:
func main() {
	rand.Seed(time.Now().UnixMicro())

	numAgents := 2
	fmt.Println("Please pick a number of goroutines: ")
	fmt.Scanf("%d", &numAgents)

	numFiles := 4
	fmt.Println("How many files do you want?")
	fmt.Scanf("%d", &numFiles)

	start := time.Now()
	numAssist := numFiles
	channel := make(chan []File, numAgents)
	files := make([]File, 0)

	for i := 0; i < numAgents; i++ {
		if i == numAgents-1 {
			go generateFiles(numAssist, channel)
		} else {
			go generateFiles(numFiles/numAgents, channel)
			numAssist -= numFiles / numAgents
		}
	}
	for i := 0; i < numAgents; i++ {
		files = append(files, <-channel...)
	}

	elapsed := time.Since(start)
	fmt.Printf("Function took %s\n", elapsed)
}

func generateFiles(numFiles int, channel chan []File) {
	magicNumbersMap := getMap()
	files := make([]File, 0)

	for i := 0; i < numFiles; i++ {
		content := randElementFromMap(&magicNumbersMap)
		length := rand.Intn(400) + 100
		hexSlice := getHex()
		for j := 0; j < length; j++ {
			content = content + hexSlice[rand.Intn(len(hexSlice))]
		}

		hash := getSHA1Hash([]byte(content))
		file := File{
			content: content,
			hash:    hash,
		}
		files = append(files, file)
	}

	channel <- files
}
My expectation was that adding goroutines would make the program run faster, up to a certain number, after which more goroutines would give the same execution time or slightly worse.
EDIT: All the functions that are used:
import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"math/rand"
	"time"
)

type File struct {
	content string
	hash    string
}

func getMap() map[string]string {
	return map[string]string{
		"D4C3B2A1": "Libcap file format",
		"EDABEEDB": "RedHat Package Manager (RPM) package",
		"4C5A4950": "lzip compressed file",
	}
}

func getHex() []string {
	return []string{
		"0", "1", "2", "3", "4", "5",
		"6", "7", "8", "9", "A", "B",
		"C", "D", "E", "F",
	}
}

func randElementFromMap(m *map[string]string) string {
	x := rand.Intn(len(*m))
	for k := range *m {
		if x == 0 {
			return k
		}
		x--
	}
	return "Error"
}

func getSHA1Hash(content []byte) string {
	h := sha1.New()
	h.Write(content)
	return base64.URLEncoding.EncodeToString(h.Sum(nil))
}
Simply speaking, the file-generation code is not complex enough to justify parallel execution. The context switching and moving data through the channel eat up all the benefit of parallel processing.
If you add something like time.Sleep(time.Millisecond * 10) inside the loop in your generateFiles function, as if it were doing something more complex, you'll see what you expected to see: more goroutines work faster. But again, only up to a certain level, where the extra work of parallel processing outweighs the benefit.
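For illustration, the change inside generateFiles is just this (a sketch; the sleep merely stands in for heavier per-file work):

for i := 0; i < numFiles; i++ {
	time.Sleep(time.Millisecond * 10) // simulate more expensive per-file work
	// ... rest of the loop body unchanged ...
}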
Note also that the execution time of the last bit of your program:
for i := 0; i < numAgents; i++ {
	files = append(files, <-channel...)
}
directly depends on the number of goroutines. Since all goroutines finish at approximately the same time, this loop is almost never executed in parallel with your workers, and the time it takes to run is simply added to the total time.
Next, when you append to the files slice multiple times, it has to grow several times and copy the data over to a new location. You can avoid this by initially creating a slice with the capacity to hold all your resulting elements (luckily, you know exactly how many you'll need).
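A sketch of that last fix, reusing the names from main above:

// Reserve capacity for every result up front so append never has to
// grow and copy the backing array:
files := make([]File, 0, numFiles)
for i := 0; i < numAgents; i++ {
	files = append(files, <-channel...)
}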

Can I perform an action on all slice items at once in go?

I have the following code:
func myfunction() {
	results := make([]SomeCustomStruct, 0)
	// ... results gets populated ...
	for index, value := range results {
		results[index].Body = cleanString(value.Body)
	}
	// ... when done, more things happen ...
}

func cleanString(in string) (out string) {
	s := sanitize.HTML(in)
	s = strings.Replace(s, "\n", " ", -1)
	out = strings.TrimSpace(s)
	return
}
The slice will never contain more than 100 or so entries. Is there any way I can exploit goroutines here to perform the cleanString function on each slice item at the same time rather than one by one?
Thanks!
If the slice only has 100 items or fewer and that's the entirety of cleanString, you're not going to get a lot of speedup unless the body strings are fairly large.
Parallelizing it with goroutines would look something like:
var wg sync.WaitGroup
for index, value := range results {
	wg.Add(1)
	go func(index int, body string) {
		defer wg.Done()
		results[index].Body = cleanString(body)
	}(index, value.Body)
}
wg.Wait()
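For reference, here is the same pattern assembled into a complete runnable program. SomeCustomStruct and cleanString are stubbed (the original cleanString uses sanitize.HTML, omitted here to keep the sketch dependency-free):

package main

import (
	"fmt"
	"strings"
	"sync"
)

type SomeCustomStruct struct {
	Body string
}

func cleanString(in string) string {
	s := strings.Replace(in, "\n", " ", -1)
	return strings.TrimSpace(s)
}

func main() {
	results := []SomeCustomStruct{{" a\nb "}, {" c\nd "}}

	var wg sync.WaitGroup
	for index, value := range results {
		wg.Add(1)
		go func(index int, body string) {
			defer wg.Done()
			// Each goroutine writes to a distinct element, so no lock is needed.
			results[index].Body = cleanString(body)
		}(index, value.Body)
	}
	wg.Wait()

	fmt.Println(results)
}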

Go channel takes each letter as string instead of the whole string

I'm creating a simple channel that takes string values. But apparently I'm pushing each letter in the string instead of the whole string in each loop.
I'm probably missing something very fundamental. What am I doing wrong ?
https://play.golang.org/p/-6E-f7ALbD
Code:
func doStuff(s string, ch chan string) {
	ch <- s
}

func main() {
	c := make(chan string)
	loops := [5]int{1, 2, 3, 4, 5}
	for i := 0; i < len(loops); i++ {
		go doStuff("helloooo", c)
	}
	results := <-c
	fmt.Println("channel size = ", len(results))
	// print the items in channel
	for _, r := range results {
		fmt.Println(string(r))
	}
}
Your code sends strings on the channel properly:
func doStuff(s string, ch chan string) {
	ch <- s
}
The problem is at the receiver side:
results := <-c
fmt.Println("channel size = ", len(results))
// print the items in channel
for _, r := range results {
	fmt.Println(string(r))
}
results will be a single value received from the channel (the first value sent on it). And you print the length of this string.
Then you loop over this string (results) using a for range which loops over its runes, and you print those.
What you want is loop over the values of the channel:
// print the items in channel
for s := range c {
	fmt.Println(s)
}
When run, this will end with a fatal runtime error:
fatal error: all goroutines are asleep - deadlock!
Because you never close the channel, and a for range on a channel runs until the channel is closed. So you have to close the channel sometime.
For example let's wait 1 second, then close it:
go func() {
	time.Sleep(time.Second)
	close(c)
}()
This way your app will run and quit after 1 second. Try it on the Go Playground.
Another, nicer solution is to use sync.WaitGroup: this waits until all goroutines are done doing their work (sending a value on the channel), then it closes the channel (so there is no unnecessary wait / delay).
var wg = sync.WaitGroup{}

func doStuff(s string, ch chan string) {
	ch <- s
	wg.Done()
}

// And in main():
for i := 0; i < len(loops); i++ {
	wg.Add(1)
	go doStuff("helloooo", c)
}
go func() {
	wg.Wait()
	close(c)
}()
Try this one on the Go Playground.
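Assembled into one complete program, the WaitGroup variant of the fragments above looks like this (a sketch):

package main

import (
	"fmt"
	"sync"
)

var wg = sync.WaitGroup{}

func doStuff(s string, ch chan string) {
	ch <- s
	wg.Done()
}

func main() {
	c := make(chan string)
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go doStuff("helloooo", c)
	}
	go func() {
		wg.Wait()
		close(c) // closing c lets the range loop below terminate
	}()
	for s := range c {
		fmt.Println(s)
	}
}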
Notes:
To repeat something 5 times, you don't need that ugly loops array. Simply do:
for i := 0; i < 5; i++ {
	// Do something
}
The reason you are getting back letters instead of the whole string is that you assign a single value received from the channel to a variable and then iterate over that variable, which in your case is a string; in Go, a for range over a string yields its runes.
You can simply print the received string without iterating over it:
package main

import (
	"fmt"
)

func doStuff(s string, ch chan string) {
	ch <- s
}

func main() {
	c := make(chan string)
	loops := [5]int{1, 2, 3, 4, 5}
	for i := 0; i < len(loops); i++ {
		go doStuff("helloooo", c)
	}
	results := <-c
	fmt.Println("channel size = ", len(results))
	fmt.Println(results) // will print helloooo
}

Synchronisation of threads in Go lang

I want to understand a bit more about how synchronisation of threads works in Go. Below is a functioning version of my program, which uses a done channel for synchronisation.
package main

import (
	. "fmt"
	"runtime"
)

func Goroutine1(i_chan chan int, done chan bool) {
	for x := 0; x < 1000000; x++ {
		i := <-i_chan
		i++
		i_chan <- i
	}
	done <- true
}

func Goroutine2(i_chan chan int, done chan bool) {
	for x := 0; x < 1000000; x++ {
		i := <-i_chan
		i--
		i_chan <- i
	}
	done <- true
}

func main() {
	i_chan := make(chan int, 1)
	done := make(chan bool, 2)
	i_chan <- 0
	runtime.GOMAXPROCS(runtime.NumCPU())

	go Goroutine1(i_chan, done)
	go Goroutine2(i_chan, done)

	<-done
	<-done
	Printf("This is the value of i:%d\n", <-i_chan)
}
However, I then try to run it without any synchronisation, using a sleep instead of a channel to signal when the goroutines are done, so there is no real synchronisation:
const MAX = 1000000

func Goroutine1(i_chan chan int) {
	for x := 0; x < MAX-23; x++ {
		i := <-i_chan
		i++
		i_chan <- i
	}
}

func main() {
	i_chan := make(chan int, 1)
	i_chan <- 0
	runtime.GOMAXPROCS(runtime.NumCPU())

	go Goroutine1(i_chan)
	go Goroutine2(i_chan)

	time.Sleep(100 * time.Millisecond)
	Printf("This is the value of i:%d\n", <-i_chan)
}
It prints out the wrong value of i. If you extend the wait to, say, 1 second, it finishes and prints the correct value. I understand it has something to do with both goroutines not being finished before main prints what's on i_chan; I'm just a bit curious about how this works.
Note that your first example, as originally posted, would deadlock, since it never called Goroutine2 (the OP has since edited the question).
With Goroutine2 running as well, the expected final value of i is indeed 0.
Without synchronization, (as in this example), there is no guarantee that the main() doesn't exit before the completion of Goroutine1() and Goroutine2().
For a 1000000 loop, a 1 millisecond wait seems enough, but again, no guarantee.
func main() {
	i_chan := make(chan int, 1)
	i_chan <- 0
	runtime.GOMAXPROCS(runtime.NumCPU())

	go Goroutine2(i_chan)
	go Goroutine1(i_chan)

	time.Sleep(1 * time.Millisecond)
	Printf("This is the value of i:%d\n", <-i_chan)
}
See more at "How to Wait for All Goroutines to Finish Executing Before Continuing", where the canonical way is to use the sync package's WaitGroup structure, as in this runnable example.
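Applied to the program above, that canonical pattern would look roughly like this (a sketch; the buffered channel still serializes access to the counter):

package main

import (
	"fmt"
	"sync"
)

func Goroutine1(i_chan chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for x := 0; x < 1000000; x++ {
		i := <-i_chan
		i++
		i_chan <- i
	}
}

func Goroutine2(i_chan chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for x := 0; x < 1000000; x++ {
		i := <-i_chan
		i--
		i_chan <- i
	}
}

func main() {
	i_chan := make(chan int, 1)
	i_chan <- 0

	var wg sync.WaitGroup
	wg.Add(2) // Add before launching, so Wait can't return early
	go Goroutine1(i_chan, &wg)
	go Goroutine2(i_chan, &wg)
	wg.Wait()

	fmt.Printf("This is the value of i:%d\n", <-i_chan) // prints 0
}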

How to get CPU usage

My Go program needs to know the current cpu usage percentage of all system and user processes.
How can I obtain that?
Check out the package http://github.com/c9s/goprocinfo; goprocinfo does the parsing of /proc/stat for you.
import linuxproc "github.com/c9s/goprocinfo/linux"

stat, err := linuxproc.ReadStat("/proc/stat")
if err != nil {
	log.Fatal("stat read fail")
}
for _, s := range stat.CPUStats {
	// s.User
	// s.Nice
	// s.System
	// s.Idle
	// s.IOWait
}
I had a similar issue and never found a lightweight implementation. Here is a slimmed-down version of my solution that answers your specific question. I sample the /proc/stat file just as tylerl recommends. You'll notice that I wait 3 seconds between samples to match top's output, but I have also had good results with 1 or 2 seconds. I run similar code in a loop within a goroutine, then access the CPU usage from other goroutines when I need it.
You can also parse the output of top -n1 | grep -i cpu to get the cpu usage, but it only samples for half a second on my linux box and it was way off during heavy load. Regular top seemed to match very closely when I synchronized it and the following program:
package main

import (
	"fmt"
	"io/ioutil"
	"strconv"
	"strings"
	"time"
)

func getCPUSample() (idle, total uint64) {
	contents, err := ioutil.ReadFile("/proc/stat")
	if err != nil {
		return
	}
	lines := strings.Split(string(contents), "\n")
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) > 0 && fields[0] == "cpu" { // guard against empty lines
			numFields := len(fields)
			for i := 1; i < numFields; i++ {
				val, err := strconv.ParseUint(fields[i], 10, 64)
				if err != nil {
					fmt.Println("Error: ", i, fields[i], err)
				}
				total += val // tally up all the numbers to get total ticks
				if i == 4 { // idle is the 5th field in the cpu line
					idle = val
				}
			}
			return
		}
	}
	return
}

func main() {
	idle0, total0 := getCPUSample()
	time.Sleep(3 * time.Second)
	idle1, total1 := getCPUSample()

	idleTicks := float64(idle1 - idle0)
	totalTicks := float64(total1 - total0)
	cpuUsage := 100 * (totalTicks - idleTicks) / totalTicks

	fmt.Printf("CPU usage is %f%% [busy: %f, total: %f]\n", cpuUsage, totalTicks-idleTicks, totalTicks)
}
The full implementation I wrote is on Bitbucket, though it only works on Linux so far: systemstat.go
The mechanism for getting CPU usage is OS-dependent, since the numbers mean slightly different things to different OS kernels.
On Linux, you can query the kernel to get the latest stats by reading the pseudo-files in the /proc/ filesystem. These are generated on-the-fly when you read them to reflect the current state of the machine.
Specifically, the /proc/<pid>/stat file for each process contains the associated process accounting information. It's documented in proc(5). You're interested specifically in fields utime, stime, cutime and cstime (starting at the 14th field).
You can calculate the percentage easily enough: just read the numbers, wait some time interval, and read them again. Take the difference, divide by the amount of time you waited, and there's your average. This is precisely what the top program does (as well as all other programs that perform the same service). Bear in mind that you can have over 100% cpu usage if you have more than 1 CPU.
If you just want a system-wide summary, that's reported in /proc/stat -- calculate your average using the same technique, but you only have to read one file.
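As a sketch of that recipe for the per-process numbers, the program below reads /proc/self/stat. The field positions follow proc(5); the comm field, which may contain spaces, is skipped by cutting at its closing parenthesis, and USER_HZ is assumed to be 100 (query sysconf(_SC_CLK_TCK) to be exact). The helper name is mine, not from the answer above:

package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// readProcTicks returns utime+stime for the current process, in clock ticks.
func readProcTicks() uint64 {
	data, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		return 0
	}
	s := string(data)
	// comm (field 2) may contain spaces, so split after its closing ')'.
	fields := strings.Fields(s[strings.LastIndexByte(s, ')')+1:])
	// After the cut, fields[0] is field 3 (state); utime and stime are
	// fields 14 and 15 of the full line, i.e. indexes 11 and 12 here.
	utime, _ := strconv.ParseUint(fields[11], 10, 64)
	stime, _ := strconv.ParseUint(fields[12], 10, 64)
	return utime + stime
}

func main() {
	const hz = 100 // USER_HZ is typically 100 on Linux

	t0 := readProcTicks()
	start := time.Now()
	for i := 0; i < 100000000; i++ { // burn some CPU to measure
	}

	busySeconds := float64(readProcTicks()-t0) / hz
	fmt.Printf("CPU usage: %.1f%%\n", 100*busySeconds/time.Since(start).Seconds())
}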
You can use the os/exec package to execute the ps command and read its output.
Here is a program issuing the ps aux command, parsing the result, and printing the CPU usage of all processes on Linux:
package main

import (
	"bytes"
	"log"
	"os/exec"
	"strconv"
	"strings"
)

type Process struct {
	pid int
	cpu float64
}

func main() {
	cmd := exec.Command("ps", "aux")
	var out bytes.Buffer
	cmd.Stdout = &out
	err := cmd.Run()
	if err != nil {
		log.Fatal(err)
	}

	processes := make([]*Process, 0)
	for {
		line, err := out.ReadString('\n')
		if err != nil {
			break
		}
		tokens := strings.Split(line, " ")
		ft := make([]string, 0)
		for _, t := range tokens {
			if t != "" && t != "\t" {
				ft = append(ft, t)
			}
		}
		log.Println(len(ft), ft)
		pid, err := strconv.Atoi(ft[1])
		if err != nil {
			continue
		}
		cpu, err := strconv.ParseFloat(ft[2], 64)
		if err != nil {
			log.Fatal(err)
		}
		processes = append(processes, &Process{pid, cpu})
	}

	for _, p := range processes {
		log.Println("Process ", p.pid, " takes ", p.cpu, " % of the CPU")
	}
}
Here is an OS-independent solution, using cgo to harness the clock() function provided by the C standard library:
//#include <time.h>
import "C"
import "time"

var startTime = time.Now()
var startTicks = C.clock()

func CpuUsagePercent() float64 {
	clockSeconds := float64(C.clock()-startTicks) / float64(C.CLOCKS_PER_SEC)
	realSeconds := time.Since(startTime).Seconds()
	return clockSeconds / realSeconds * 100
}
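One cgo caveat: the #include comment must sit immediately above import "C", inside a complete file. A runnable sketch of the same function (the busy loop and main are mine, added just to give clock() something to measure):

package main

//#include <time.h>
import "C"

import (
	"fmt"
	"time"
)

var startTime = time.Now()
var startTicks = C.clock()

// CpuUsagePercent reports the average CPU use of this process since startup.
func CpuUsagePercent() float64 {
	clockSeconds := float64(C.clock()-startTicks) / float64(C.CLOCKS_PER_SEC)
	realSeconds := time.Since(startTime).Seconds()
	return clockSeconds / realSeconds * 100
}

func main() {
	for i := 0; i < 100000000; i++ { // burn some CPU
	}
	fmt.Printf("%.1f%%\n", CpuUsagePercent())
}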
I recently had to take CPU usage measurements from a Raspberry Pi (Raspbian OS) and used github.com/c9s/goprocinfo combined with what is proposed here:
Accurate calculation of CPU usage given in percentage in Linux?
The idea comes from the htop source code and is to have two measurements (previous / current) in order to calculate the CPU usage:
func calcSingleCoreUsage(curr, prev linuxproc.CPUStat) float32 {
	PrevIdle := prev.Idle + prev.IOWait
	Idle := curr.Idle + curr.IOWait

	PrevNonIdle := prev.User + prev.Nice + prev.System + prev.IRQ + prev.SoftIRQ + prev.Steal
	NonIdle := curr.User + curr.Nice + curr.System + curr.IRQ + curr.SoftIRQ + curr.Steal

	PrevTotal := PrevIdle + PrevNonIdle
	Total := Idle + NonIdle
	// fmt.Println(PrevIdle, Idle, PrevNonIdle, NonIdle, PrevTotal, Total)

	// differentiate: actual value minus the previous one
	totald := Total - PrevTotal
	idled := Idle - PrevIdle

	CPU_Percentage := (float32(totald) - float32(idled)) / float32(totald)
	return CPU_Percentage
}
For more you can also check https://github.com/tgogos/rpi_cpu_memory
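To drive it, take two samples some interval apart and hand them to the function, e.g. (a sketch assuming the goprocinfo API shown earlier):

package main

import (
	"fmt"
	"log"
	"time"

	linuxproc "github.com/c9s/goprocinfo/linux"
)

func main() {
	prev, err := linuxproc.ReadStat("/proc/stat")
	if err != nil {
		log.Fatal(err)
	}
	time.Sleep(time.Second) // measurement interval

	curr, err := linuxproc.ReadStat("/proc/stat")
	if err != nil {
		log.Fatal(err)
	}

	for i := range curr.CPUStats {
		usage := calcSingleCoreUsage(curr.CPUStats[i], prev.CPUStats[i])
		fmt.Printf("core %d: %.1f%%\n", i, usage*100)
	}
}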
