I have a function named linearize. I'm trying to speed up its execution, but I'm surprised to find that it has become slower. Have I missed something, or have I messed up the fundamentals?
As per my understanding, things should improve.
Thanks,
package main
import (
"fmt"
"math"
"sync"
"time"
)
var rgb []float64
func linearizeWithWg(v float64, idx int, wg *sync.WaitGroup) {
defer wg.Done()
if v <= 0.04045 {
rgb[idx] = v / 12.92
} else {
rgb[idx] = math.Pow((v+0.055)/1.055, 2.4)
}
}
func linearizeWithGoR(v float64) float64 {
res := make(chan float64)
go func(input float64) {
if input <= 0.04045 {
res <- input / 12.92
} else {
res <- math.Pow((input+0.055)/1.055, 2.4)
}
}(v)
return <-res
}
func linearizeNormal(v float64) float64 {
if v <= 0.04045 {
return v / 12.92
}
return math.Pow((v+0.055)/1.055, 2.4)
}
func main() {
start := time.Now()
const C = 1.0 / 255.0
//Normal Execution
for i := 0; i < 100000; i++ {
linearizeNormal(float64(i) * C * 0.5)
linearizeNormal(float64(i) * C * 1.5)
linearizeNormal(float64(i) * C * 4.5)
}
elapsed := time.Since(start)
fmt.Println(elapsed)
//With GoRoutines.. Slow
start = time.Now()
for i := 0; i < 100000; i++ {
linearizeWithGoR(float64(i) * C * 0.5)
linearizeWithGoR(float64(i) * C * 1.5)
linearizeWithGoR(float64(i) * C * 2.5)
}
elapsed = time.Since(start)
fmt.Println(elapsed)
//With WaitGroup. Slow
start = time.Now()
for i := 0; i < 100000; i++ {
rgb = make([]float64, 3)
var wg sync.WaitGroup
wg.Add(3)
linearizeWithWg(float64(i)*C*0.5, 0, &wg)
linearizeWithWg(float64(i)*C*1.5, 1, &wg)
linearizeWithWg(float64(i)*C*4.5, 2, &wg)
wg.Wait()
}
elapsed = time.Since(start)
fmt.Println(elapsed)
}
The overhead of the concurrency-related operations (channel creation, channel sends, goroutine creation) is far bigger than the couple of instructions you execute in each of your goroutines.
In addition, your goroutine version is basically serial because you spawn a goroutine and then immediately wait for the result on its channel. The WaitGroup version is similar.
Try again with a small number of goroutines, each executing a chunk of your loop (see the sketch below). #Joker_vD also has a good point: make sure that GOMAXPROCS is greater than one.
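For illustration, a minimal sketch of the chunked approach, assuming a plain linearize helper like the one in the question (the worker count and the helper names here are illustrative, not from the original post):

package main

import (
	"fmt"
	"math"
	"sync"
)

func linearize(v float64) float64 {
	if v <= 0.04045 {
		return v / 12.92
	}
	return math.Pow((v+0.055)/1.055, 2.4)
}

// linearizeChunked splits the input across a few worker goroutines, each
// handling a contiguous chunk, so the goroutine overhead is paid once per
// chunk instead of once per value.
func linearizeChunked(in []float64, workers int) []float64 {
	out := make([]float64, len(in))
	chunk := (len(in) + workers - 1) / workers
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if hi > len(in) {
			hi = len(in)
		}
		if lo >= hi {
			break
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = linearize(in[i])
			}
		}(lo, hi)
	}
	wg.Wait()
	return out
}

func main() {
	const C = 1.0 / 255.0
	in := make([]float64, 100000)
	for i := range in {
		in[i] = float64(i) * C * 0.5
	}
	out := linearizeChunked(in, 4) // 4 workers; tune to GOMAXPROCS
	fmt.Println(out[len(out)-1])
}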
Your problem is that you don't actually do anything concurrently.
In the WaitGroup example, you need to call go linearizeWithWg(...).
In the goroutine example, you start a goroutine but then immediately wait for it to finish inside the function. To run the work concurrently, you would need a buffered response channel and another goroutine receiving the responses.
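As a rough sketch of that buffered-channel idea (the channel size and the way results are consumed are assumptions, not part of the original code): one producer goroutine pushes results into a buffered channel while the main goroutine drains it.

package main

import (
	"fmt"
	"math"
)

func main() {
	const C = 1.0 / 255.0
	results := make(chan float64, 256) // buffered, so the producer rarely blocks

	// Producer: computes the linearized values and sends them on.
	go func() {
		for i := 0; i < 100000; i++ {
			v := float64(i) * C * 0.5
			if v <= 0.04045 {
				results <- v / 12.92
			} else {
				results <- math.Pow((v+0.055)/1.055, 2.4)
			}
		}
		close(results)
	}()

	// Consumer: receives the results as they arrive.
	var sum float64
	for r := range results {
		sum += r
	}
	fmt.Println(sum)
}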
Related
I have many goroutines running in my application, and I have another goroutine that must handle only one request at a time and then send the result back to the caller.
That means the other goroutines have to wait while the dedicated (single-request) goroutine is busy.
[goroutine 1] <----+
                   |
[goroutine 2] <----+----> [process the data in a single goroutine and send the result back to the caller]
                   |
[goroutine 3] <----+
The diagram above shows how it should work.
I'm very new to Go and don't have a good grasp of how this should be implemented correctly.
Could someone provide a working example that I can run on the Go Playground?
Here is a code snippet with a few worker goroutines and one processor goroutine. Only one worker goroutine at a time can hand work to the processor, because processorChannel is unbuffered and therefore accepts only one entry at a time. When the processor is done, it sends the response back to the worker it got the work from.
package main
import (
"fmt"
"time"
)
type WorkPackage struct {
value int
responseChannel chan int
}
func main() {
processorChannel := make(chan *WorkPackage)
for i := 0; i < 3; i++ {
go runWorker(processorChannel)
}
go runProcessor(processorChannel)
// Do some clever waiting here like with wait groups
time.Sleep(5 * time.Second)
}
func runWorker(processorChannel chan *WorkPackage) {
responseChannel := make(chan int)
for i := 0; i < 10; i++ {
processorChannel <- &WorkPackage{
value: i,
responseChannel: responseChannel,
}
fmt.Printf("** Sent %d\n", i)
response := <-responseChannel
fmt.Printf("** Received the response %d\n", response)
// Do some work
time.Sleep(300 * time.Millisecond)
}
}
func runProcessor(processorChannel chan *WorkPackage) {
for workPackage := range processorChannel {
fmt.Printf("## Received %d\n", workPackage.value)
// Do some processing work
time.Sleep(100 * time.Millisecond)
workPackage.responseChannel <- workPackage.value * 100
}
}
I'll describe the approach with a goroutine that adds two numbers.
Declare request and response types for the goroutine. Include a channel of response values in the request:
type request struct {
a, b int // add these two numbers
ch chan response
}
type response struct {
n int // the result of adding the numbers
}
Kick off a goroutine that receives requests, executes the action and sends the response to the channel in the request:
func startAdder() chan request {
ch := make(chan request)
go func() {
for req := range ch {
req.ch <- response{req.a + req.b}
}
}()
return ch
}
To add the numbers, send a request to the goroutine with a response channel. Receive on the response channel. Return the response value.
func add(ch chan request, a, b int) int {
req := request{ch: make(chan response), a: a, b: b}
ch <- req
return (<-req.ch).n
}
Use it like this:
ch := startAdder()
fmt.Println(add(ch, 1, 2))
Run it on the Go Playground.
I am trying to make my application run as fast as possible. I purchased a semi-powerful container on Google Cloud and I am itching to see how many iterations per second I can get out of this program. However, I am new to Go, and so far my implementation is messy and not working well.
As it is set up now, it starts out at a high rate (around 11,000 iterations per second) but quickly dwindles down to 2,000. My goal is a far bigger number than even 11,000. Also, the infofunc(i) function can't seem to keep up at high speeds, and running it in a goroutine causes the console output to overlap. The program also occasionally reuses the WaitGroup before Wait has returned.
I don't like to be the person who asks to be spoon-fed code, but I am at a loss as to how to implement this. There seem to be so many different approaches to parallelism, multithreading, etc., and it is confusing to me.
package main
import (
"fmt"
"math/big"
"os"
"os/exec"
"sync"
"time"
)
var found = 0
var pages_queried = 0
var start_time = time.Now()
var bignum = new(big.Int)
var foundAddresses = 0
var wg sync.WaitGroup
var set = make(map[string]bool)
var addresses = []string{"6ab42gyr", "lo08n4g6"}
func main() {
bignum.SetString("1000000000000000000000000000", 10)
pick := os.Args[1]
kpp := 128
switch pick {
case "btc":
i := new(big.Int)
i, ok := i.SetString(os.Args[2], 10)
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
wg.Wait()
}
}
}
}
func infofunc(i *big.Int) {
elapsed := time.Now().Sub(start_time)
duration, _ := time.ParseDuration(elapsed.String())
duration2 := int(duration.Seconds())
if duration2 != 0 {
fmt.Printf("\033[5;0H")
fmt.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %d/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), (pages_queried / duration2))
}
}
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
wg.Add(1)
go func(i int, ii int, keys []key) {
defer wg.Done()
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
fmt.Print("Found an address: " + v + "!\n")
fmt.Printf("%v", keys[i])
fmt.Print("\n")
foundAddresses += 1
found += 1
}
}
}(i, ii, keys)
}
}(i)
foundAddresses = 0
}
}
I would not use a global sync.WaitGroup; it makes it hard to understand what is happening. Instead, define it where you need it.
You are calling wg.Wait() inside the loop body. That blocks the loop on every iteration while it waits for the goroutine to complete. What you really want is to spawn all the goroutines and only then wait for their completion.
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
var wg sync.WaitGroup //I am about to spawn goroutines, I need to wait for them
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
}
wg.Wait() //Now that all goroutines are working, let's wait
}
You cannot avoid overlapping output when multiple goroutines print. If that's a problem, consider using Go's log standard library, which adds timestamps for you; you can then sort the lines in chronological order.
In any case, splitting the code into more goroutines does not guarantee a speedup. If the problem you are trying to solve is intrinsically sequential, more goroutines just add contention and pressure on the Go scheduler, leading to the opposite result. More details here. Thus, a goroutine for infofunc will not help, but infofunc can be improved by using the log package instead of the plain fmt package.
func infofunc(i *big.Int) {
elapsed := time.Since(start_time)
duration := elapsed.Seconds()
if duration != 0 {
log.Printf("\033[5;0H")
log.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %.0f/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), float64(pages_queried)/duration)
}
}
For printKeys, I would not create so many goroutines; they are not going to help if the work they perform is CPU-bound, which seems to be the case here.
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
var wg sync.WaitGroup //Local WaitGroup
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) { //This goroutine could be removed, in my opinion.
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
log.Printf("Found an address: %v\n", v)
log.Printf("%v", keys[i])
log.Printf("\n")
foundAddresses += 1
found += 1
}
}
}
}(i)
foundAddresses = 0
}
wg.Wait()
}
I would suggest writing a benchmark for these functions and then enabling tracing. That way you should get an idea of where your code is spending most of its time.
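For example, a minimal benchmark sketch (assuming printKeys and its dependencies live in the same package; the file name and the page-number argument are made up for illustration):

// keys_bench_test.go
package main

import "testing"

func BenchmarkPrintKeys(b *testing.B) {
	for i := 0; i < b.N; i++ {
		printKeys("1", 128) // same signature as in the question
	}
}

Running it with go test -bench=. -cpuprofile=cpu.out -trace=trace.out produces a CPU profile and an execution trace that can be inspected with go tool pprof cpu.out and go tool trace trace.out.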
Here is my code:
package main
import (
"fmt"
"sync"
)
func worker(id int, wg sync.WaitGroup, work <-chan int) {
defer func() {
wg.Done()
fmt.Println("worker", id, "done")
}()
fmt.Println("worker", id, "started")
for w := range work {
fmt.Println("worker", id, "got work", w)
}
}
func main() {
work := make(chan int, 2)
var wg sync.WaitGroup
for i := 0; i < 3; i++ {
wg.Add(1)
go worker(i, wg, work)
}
// Generate work and send.
for i := 0; i < 5; i++ {
work <- i
}
close(work)
fmt.Println("waiting ...")
wg.Wait()
fmt.Println("done")
}
Here is the output:
worker 2 started
worker 2 got work 0
worker 2 got work 1
worker 2 got work 2
worker 1 started
waiting ...
worker 0 started
worker 0 done
worker 1 got work 4
worker 1 done
worker 2 got work 3
worker 2 done
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc42001409c)
/usr/local/Cellar/go/1.10.2/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420014090)
/usr/local/Cellar/go/1.10.2/libexec/src/sync/waitgroup.go:129 +0x72
main.main()
/Users/lone/bar/bar.go:39 +0x15f
exit status 2
Why did this deadlock occur?
From the docs:
A WaitGroup must not be copied after first use.
Try to pass a pointer to your worker function instead:
func worker(id int, wg *sync.WaitGroup, work <-chan int)
Here's the complete code: Playground
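For reference, a sketch of the corrected program, which is simply the question's code with the WaitGroup passed by pointer:

package main

import (
	"fmt"
	"sync"
)

// worker takes *sync.WaitGroup so all goroutines share the one
// WaitGroup that main waits on, instead of each getting a copy.
func worker(id int, wg *sync.WaitGroup, work <-chan int) {
	defer func() {
		wg.Done()
		fmt.Println("worker", id, "done")
	}()
	fmt.Println("worker", id, "started")
	for w := range work {
		fmt.Println("worker", id, "got work", w)
	}
}

func main() {
	work := make(chan int, 2)
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go worker(i, &wg, work)
	}
	// Generate work and send.
	for i := 0; i < 5; i++ {
		work <- i
	}
	close(work)
	fmt.Println("waiting ...")
	wg.Wait()
	fmt.Println("done")
}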
I have the example below in which two goroutines should run in parallel. But if you check the output, the second goroutine only runs after the first goroutine completes, so it's sequential.
Adding two processors with runtime.GOMAXPROCS(2) also didn't help. I'm running on a Mac Pro with 8 cores, so it is definitely not a hardware issue.
So my question: is Go really parallel, and how do I make the example below run in parallel?
Output:
Thread 1
Thread 1
…………....
Thread 1
Thread 1
Thread 2
Thread 2
…………....
Thread 2
Thread 2
Go code:
package main
import (
"runtime"
"time"
)
func main() {
runtime.GOMAXPROCS(2)
go func() {
for i := 0; i < 100; i++ {
println("Thread 1")
//time.Sleep(time.Millisecond * 10)
}
}()
go func() {
for i := 0; i < 100; i++ {
println("Thread 2")
//time.Sleep(time.Millisecond * 10)
}
}()
time.Sleep(time.Second)
}
One way to see whether your goroutines are running in parallel rather than merely concurrently is to have each of them print a different sequence of values.
See this article: Concurrency, Goroutines and GOMAXPROCS.
Your code, as written, can't really demonstrate a parallel call. Please see the code below.
package main
import (
"fmt"
"runtime"
"sync"
)
func main() {
runtime.GOMAXPROCS(2)
var wg sync.WaitGroup
wg.Add(2)
fmt.Println("Starting Go Routines")
go func() {
defer wg.Done()
//time.Sleep(1 * time.Microsecond)
for char := 'a'; char < 'a'+26; char++ {
fmt.Printf("%c ", char)
}
}()
go func() {
defer wg.Done()
for number := 1; number < 27; number++ {
fmt.Printf("%d ", number)
}
}()
fmt.Println("Waiting To Finish")
wg.Wait()
fmt.Println("\nTerminating Program")
}
As you can see, the code above has each goroutine print a different sequence of values from its for loop.
The output changes if you comment out runtime.GOMAXPROCS(2) and uncomment the time.Sleep(1 * time.Microsecond) line.
In the following Go program, I am trying to implement the stable marriage problem for N men and N women, using 2*N goroutines (one for each man and each woman).
The program closely follows the problem definition: each man goroutine sends a message over a channel to the desired woman goroutine, who in turn accepts or rejects his proposal. I expected the program to be scheduled easily onto multiple threads after setting runtime.GOMAXPROCS(4), yet it still runs in (almost) exactly the same time (and the Linux time command still shows CPU usage at 100% instead of the expected 400%).
package main
import (
"fmt"
"runtime"
"time"
)
const N = 500
type human struct {
pref [N]int
phone chan int
cur_eng int
cur_num int
id int
}
var men = [N]human{}
var women = [N]human{}
func man(id int) {
guy := &men[id]
for {
runtime.Gosched()
for j := guy.cur_num + 1; j < N; j++ {
guy.cur_num = j
girl := &women[guy.pref[guy.cur_num]]
girl.phone <- guy.id
msg := <-guy.phone
if msg == 1 {
guy.cur_eng = guy.pref[guy.cur_num]
break
}
}
select {
case <-guy.phone:
guy.cur_eng = -1
}
}
}
func woman(id int, termi chan bool) {
girl := &women[id]
for {
runtime.Gosched()
select {
case msg := <-girl.phone:
if msg >= 0 {
if girl.cur_eng == -1 {
men[msg].phone <- 1
girl.cur_eng = msg
termi <- true
} else if girl.pref[girl.cur_eng] < girl.pref[msg] {
men[msg].phone <- 0
} else {
men[msg].phone <- 1
men[girl.cur_eng].phone <- -10
girl.cur_eng = msg
}
}
}
}
}
func main() {
runtime.GOMAXPROCS(8)
for i := 0; i < N; i++ {
men[i] = human{pref: [N]int{}, phone: make(chan int), cur_eng: -1, cur_num: -1, id: i}
for j := 0; j < N; j++ {
fmt.Scanf("%d\n", &(men[i].pref[j]))
}
}
for i := 0; i < N; i++ {
women[i] = human{pref: [N]int{}, phone: make(chan int), cur_eng: -1, cur_num: -1, id: i}
for j := 0; j < N; j++ {
t := 0
fmt.Scanf("%d\n", &t)
women[i].pref[t] = j
}
}
termi := make(chan bool)
for i := 0; i < N; i++ {
go man(i)
go woman(i, termi)
}
for i := 0; i < N; i++ {
<-termi
}
time.Sleep(100 * time.Millisecond)
for i := 0; i < N; i++ {
fmt.Printf("%d %d\n", i, men[i].cur_eng)
}
}
EDIT: My serial implementation of the program is here. The time comparison shows both running in almost identical time (1.27s for the serial version, 1.30s for the one above).
The algorithm for the parallel implementation follows this reference as well as I could understand it (I used goroutines because I didn't know how to use MPI).
Please feel free to suggest an alternative parallel implementation if possible.
The program above needs the following input file: https://drive.google.com/file/d/0B6jsnt965ZwrWlV1OE9LLVA1LUk/view?usp=sharing
I think most of that time is spent reading the input file you provided, since the program reads it with fmt.Scanf one value per line.
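As a rough illustration of that point (a sketch only, assuming the same whitespace-separated integer input; readInt is a hypothetical helper, not part of the original program), reading through a buffered scanner instead of fmt.Scanf removes most of the per-value I/O overhead:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
)

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 1024*1024), 1024*1024) // room for long lines
	sc.Split(bufio.ScanWords)                     // read token by token

	// readInt pulls the next integer from the buffered scanner.
	readInt := func() int {
		sc.Scan()
		n, _ := strconv.Atoi(sc.Text())
		return n
	}

	// Read N*N preference numbers, as the question's main() does for the
	// men, but through the buffer rather than fmt.Scanf per line.
	const N = 500
	sum := 0
	for i := 0; i < N*N; i++ {
		sum += readInt()
	}
	fmt.Println("read", N*N, "numbers, checksum:", sum)
}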