Suppose I have a set of goroutines
wg := sync.WaitGroup{}
wg.Add(5)
go somefunction("A", &wg) //1
go somefunction("B", &wg) //2
go somefunction("A", &wg) //3
go somefunction("A", &wg) //4
go somefunction("B", &wg) //5
wg.Wait()
What I require is that only one goroutine per string ("A" or "B" here) runs at a time. At any moment only one somefunction("A", &wg) call should be running. For example, //1 and //2 start running concurrently. After //2 completes, //5 starts running. After //1 completes, either //3 or //4 starts running.
I was thinking of developing a key-based mutex to work around this issue.
func somefunction(key string) {
	Lock(key) // hypothetical key-based lock
	//piece of code
	Unlock(key)
}
The piece of code will be locked for the particular key here.
Let somefunction take a mutex param, and pass the same mutex instance for the same key.
func somefunction(mux *sync.Mutex, wg *sync.WaitGroup) {
	defer wg.Done() // without this, wg.Wait() below never returns
	mux.Lock()
	defer mux.Unlock()
	...
}
wg := &sync.WaitGroup{}
wg.Add(5)
muxA := &sync.Mutex{}
muxB := &sync.Mutex{}
go somefunction(muxA, wg) //1
go somefunction(muxB, wg) //2
go somefunction(muxA, wg) //3
go somefunction(muxA, wg) //4
go somefunction(muxB, wg) //5
wg.Wait()
If you want to keep using key-based access, you can store the mutexes in a map:
muxmap := map[string]*sync.Mutex{
"A": &sync.Mutex{},
"B": &sync.Mutex{},
}
go somefunction(muxmap["A"], wg)
You could implement a special lock like this:
type StringKeyLock struct {
locks map[string]*sync.Mutex
mapLock sync.Mutex // to make the map safe concurrently
}
func NewStringKeyLock() *StringKeyLock {
return &StringKeyLock{locks: make(map[string]*sync.Mutex)}
}
func (l *StringKeyLock) getLockBy(key string) *sync.Mutex {
l.mapLock.Lock()
defer l.mapLock.Unlock()
ret, found := l.locks[key]
if found {
return ret
}
ret = &sync.Mutex{}
l.locks[key] = ret
return ret
}
func (l *StringKeyLock) Lock(key string) {
l.getLockBy(key).Lock()
}
func (l *StringKeyLock) Unlock(key string) {
l.getLockBy(key).Unlock()
}
Then initialize a "global" StringKeyLock:
var stringKeyLock = NewStringKeyLock()
Finally, use it:
func somefunction(key string){
stringKeyLock.Lock(key)
//piece of code
stringKeyLock.Unlock(key)
}
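To check that a keyed lock of this kind really serializes goroutines per key, here is a minimal, self-contained sketch. The names `KeyedMutex` and `maxConcurrent` and the bookkeeping around them are hypothetical and only serve the demonstration; the locking idea is the same map-of-mutexes approach shown above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// KeyedMutex hands out one mutex per key (hypothetical name, same idea as StringKeyLock).
type KeyedMutex struct {
	mu    sync.Mutex // guards the map
	locks map[string]*sync.Mutex
}

func NewKeyedMutex() *KeyedMutex {
	return &KeyedMutex{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for key, creating it on first use.
func (k *KeyedMutex) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	return m
}

func (k *KeyedMutex) Lock(key string)   { k.get(key).Lock() }
func (k *KeyedMutex) Unlock(key string) { k.get(key).Unlock() }

// maxConcurrent starts one goroutine per entry in keys and reports the highest
// number of goroutines that were inside the critical section for the same key at once.
func maxConcurrent(keys []string) int {
	km := NewKeyedMutex()
	var (
		wg      sync.WaitGroup
		statsMu sync.Mutex
		running = map[string]int{}
		peak    int
	)
	for _, key := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			km.Lock(key)
			defer km.Unlock(key)

			statsMu.Lock()
			running[key]++
			if running[key] > peak {
				peak = running[key]
			}
			statsMu.Unlock()

			time.Sleep(5 * time.Millisecond) // stand-in for the real work

			statsMu.Lock()
			running[key]--
			statsMu.Unlock()
		}(key)
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println(maxConcurrent([]string{"A", "B", "A", "A", "B"})) // prints 1
}
```

Goroutines for "A" and "B" still overlap with each other; only goroutines sharing a key wait for one another. Note that this sketch, like the code above, never removes entries from the map.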
I have a use case where I need to lock on arguments of a function.
The function itself can be called concurrently.
The function signature is something like:
func (m objectType) operate(key string) bool {
	// get lock on "key" (return false if unable to get lock in X ms - eg: 100 ms)
	// operate
	// release lock on "key"
	return true
}
The data space which can be locked is in the range of millions (~10 million)
Concurrent access to operate() is in the range of thousands (1 - 5k)
Expected contention is low though possible in case of hotspots in key (hence the lock)
What is the right way to implement this? A few options I explored using a concurrent hash map:
sync.Map - this is suited for cases with append-only entries and a high ratio of reads to writes, hence not applicable here.
sharded hashmap where each shard is locked by an RWMutex - https://github.com/orcaman/concurrent-map - While this would work, concurrency is limited by the number of shards rather than by actual contention between keys. It also doesn't support the timeout scenario when a lot of contention occurs for a subset of keys.
Though timeout is a P1 requirement, the P0 requirement would be to increase throughput here by granular locking if possible.
Is there a good way to achieve this?
I would do it by using a map of buffered channels:
to acquire a "mutex", try to fill a buffered channel (capacity 1) with a value
do the work
when done, empty the buffered channel so that another goroutine can use it
Example:
package main
import (
"fmt"
"sync"
"time"
)
type MutexMap struct {
mut sync.RWMutex // handle concurrent access of chanMap
chanMap map[int](chan bool) // dynamic mutexes map
}
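func NewMutextMap() *MutexMap {
	// the zero value of sync.RWMutex is ready to use, so only the map needs initializing
	// (copying a mutex into the struct, as go vet warns, should be avoided)
	return &MutexMap{
		chanMap: make(map[int](chan bool)),
	}
}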
// Acquire a lock, with timeout
func (mm *MutexMap) Lock(id int, timeout time.Duration) error {
// get global lock to read from map and get a channel
mm.mut.Lock()
if _, ok := mm.chanMap[id]; !ok {
mm.chanMap[id] = make(chan bool, 1)
}
ch := mm.chanMap[id]
mm.mut.Unlock()
// try to write to buffered channel, with timeout
select {
case ch <- true:
return nil
case <-time.After(timeout):
return fmt.Errorf("working on %v just timed out", id)
}
}
// release lock
func (mm *MutexMap) Release(id int) {
mm.mut.Lock()
ch := mm.chanMap[id]
mm.mut.Unlock()
<-ch
}
func work(id int, mm *MutexMap) {
// acquire lock with timeout
if err := mm.Lock(id, 100*time.Millisecond); err != nil {
fmt.Printf("ERROR: %s\n", err)
return
}
fmt.Printf("working on task %v\n", id)
// do some work...
time.Sleep(time.Second)
fmt.Printf("done working on %v\n", id)
// release lock
mm.Release(id)
}
func main() {
mm := NewMutextMap()
var wg sync.WaitGroup
for i := 0; i < 50; i++ {
wg.Add(1)
id := i % 10
go func(id int, mm *MutexMap, wg *sync.WaitGroup) {
defer wg.Done() // defer first, so Done runs even if work panics
work(id, mm)
}(id, mm, &wg)
}
wg.Wait()
}
EDIT: different version, where we also handle the concurrent access to the chanMap itself
I am trying to make my application run as fast as possible. I purchased a semi-powerful container on Google Cloud and I am itching to see how many iterations per second I can get out of this program. However, I am new to Go, and so far my implementation is proving to be very messy and not working well.
The way I have it set up now, it starts out at a high rate (around 11,000 iterations per second) but then quickly dwindles to 2,000. My goal is a far bigger number than even 11,000. Also, the infofunc(i) function can't seem to keep up at high speeds, and running it in a goroutine causes the console output to overlap. Also, it will occasionally reuse the WaitGroup before the previous Wait has returned.
I don't like to be the person asking to be spoon-fed code, but I am at a loss as to how to implement this. There seem to be so many different methods when it comes to parallelism, multithreading, etc., and it is confusing to me.
import (
"fmt"
"math/big"
"os"
"os/exec"
"sync"
"time"
)
var found = 0
var pages_queried = 0
var start_time = time.Now()
var bignum = new(big.Int)
var foundAddresses = 0
var wg sync.WaitGroup
var set = make(map[string]bool)
var addresses = []string{"6ab42gyr", "lo08n4g6"}
func main() {
bignum.SetString("1000000000000000000000000000", 10)
pick := os.Args[1]
kpp := 128
switch pick {
case "btc":
i := new(big.Int)
i, ok := i.SetString(os.Args[2], 10)
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
wg.Wait()
}
}
}
}
func infofunc(i *big.Int) {
elapsed := time.Now().Sub(start_time)
duration, _ := time.ParseDuration(elapsed.String())
duration2 := int(duration.Seconds())
if duration2 != 0 {
fmt.Printf("\033[5;0H")
fmt.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %d/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), (pages_queried / duration2))
}
}
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
wg.Add(1)
go func(i int, ii int, keys []key) {
defer wg.Done()
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
fmt.Print("Found an address: " + v + "!\n")
fmt.Printf("%v", keys[i])
fmt.Print("\n")
foundAddresses += 1
found += 1
}
}
}(i, ii, keys)
}
}(i)
foundAddresses = 0
}
}
I would not use a global sync.WaitGroup; it makes it hard to understand what is happening. Instead, define one wherever you need it.
You are calling wg.Wait() inside the loop body. That blocks the loop on every iteration until the goroutine completes. What you really want is to spawn all the goroutines and only then wait for their completion.
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
var wg sync.WaitGroup //I am about to spawn goroutines, I need to wait for them
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
}
wg.Wait() //Now that all goroutines are working, let's wait
}
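Note that the loop above still shares a single *big.Int and the pages_queried counter between goroutines, and it can start an unbounded number of them. A minimal sketch of an alternative follows, assuming a fixed pool of workers, a private copy of the page number per task, and an atomic counter. `scanPages`, `runDemo`, and the `process` callback are hypothetical names; `process` stands in for printKeys.

```go
package main

import (
	"fmt"
	"math/big"
	"runtime"
	"sync"
	"sync/atomic"
)

// scanPages calls process for every page in [start, end) using a fixed number
// of workers and returns how many pages were processed.
func scanPages(start, end *big.Int, workers int, process func(page *big.Int)) int64 {
	pages := make(chan *big.Int)
	var queried int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for page := range pages {
				process(page)
				atomic.AddInt64(&queried, 1) // safe to update from several goroutines
			}
		}()
	}

	one := big.NewInt(1)
	// Only this goroutine mutates i; workers receive their own copy.
	for i := new(big.Int).Set(start); i.Cmp(end) < 0; i.Add(i, one) {
		pages <- new(big.Int).Set(i)
	}
	close(pages) // lets the workers' range loops finish
	wg.Wait()
	return atomic.LoadInt64(&queried)
}

// runDemo scans pages 1 through 1000 with a placeholder workload.
func runDemo() int64 {
	return scanPages(big.NewInt(1), big.NewInt(1001), runtime.NumCPU(), func(page *big.Int) {
		_ = page.String() // placeholder for the real per-page work
	})
}

func main() {
	fmt.Println("pages processed:", runDemo()) // pages processed: 1000
}
```

With CPU-bound work, a worker count near runtime.NumCPU() is a reasonable starting point, though the best value is something to measure rather than assume.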
You cannot avoid overlapping output when multiple goroutines print at once. If that's a problem, consider using Go's standard log package, which adds timestamps for you. You should then be able to sort the lines in chronological order.
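A small sketch of what the log package gives you here: a log.Logger serializes access to its writer, so each line arrives whole even when many goroutines log at once, although the order of the lines is still not deterministic. The helper name `logConcurrently` and the message text are made up for the example, and the logger writes to an in-memory buffer only so the result can be inspected.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"
	"sync"
)

// logConcurrently has n goroutines each log one line through the same logger
// and returns the lines that ended up in the output.
func logConcurrently(n int) []string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no prefix or timestamp, to keep the output easy to check

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			logger.Printf("worker %d: nothing found on this page", id)
		}(i)
	}
	wg.Wait()
	return strings.Split(strings.TrimSpace(buf.String()), "\n")
}

func main() {
	lines := logConcurrently(100)
	fmt.Println(len(lines)) // 100 complete lines, in arbitrary order
}
```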
Anyway, splitting the code into more goroutines does not guarantee a speedup. If the problem you are trying to solve is intrinsically sequential, more goroutines will just add contention and pressure on the Go scheduler, leading to the opposite result. More details here. Thus, a goroutine for infofunc will not help. But it can be improved by using a logger library instead of the plain fmt package.
func infofunc(i *big.Int) {
	// needs "log" in the import list
	elapsed := time.Since(start_time)
	seconds := int(elapsed.Seconds()) // whole seconds, so the integer rate below is well defined
	if seconds != 0 {
		log.Printf("\033[5;0H")
		log.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %d/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), pages_queried/seconds)
	}
}
For printKeys, I would not create so many goroutines. They are not going to help if the work they perform is CPU-bound, which seems to be the case here.
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
var wg sync.WaitGroup //Local WaitGroup
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) { //This goroutine could be removed, in my opinion.
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
log.Printf("Found an address: %v\n", v)
log.Printf("%v", keys[i])
log.Printf("\n")
foundAddresses += 1
found += 1
}
}
}
}(i)
foundAddresses = 0
}
wg.Wait()
}
I would suggest writing a benchmark for these functions and then enabling tracing. That way you should get an idea of where your code spends most of its time.
I am having trouble structuring my goroutines and channels. My select statement keeps quitting before all goroutines are finished. I know the problem is where I am sending the done signal. Where should I send it?
func startWorker(ok chan LeadRes, err chan LeadResErr, quit chan int, verbose bool, wg *sync.WaitGroup) {
var results ProcessResults
defer wg.Done()
log.Info("Starting . . .")
start := time.Now()
for {
select {
case lead := <-ok:
results.BackFill = append(results.BackFill, lead.Lead)
case err := <-err:
results.BadLeads = append(results.BadLeads, err)
case <-quit:
if verbose {
log.Info("Logging errors from unprocessed leads . . .")
logBl(results.BadLeads)
}
log.WithFields(log.Fields{
"time-elapsed": time.Since(start),
"number-of-unprocessed-leads": len(results.BadLeads),
"number-of-backfilled-leads": len(results.BackFill),
}).Info("Done")
return
}
}
}
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
var wg sync.WaitGroup
gl, bl, d := getChans()
for i, lead := range leads {
done := false
if len(leads)-1 == i {
done = true
}
wg.Add(1)
go func(lead Lead, done bool, wg *sync.WaitGroup) {
ProcessLead(lead, gl, bl, d, done, wg)
}(lead, done, &wg)
}
startWorker(gl, bl, d, verbose, &wg)
}
//ProcessLead . . .
func ProcessLead(lead Lead, c1 chan LeadRes, c2 chan LeadResErr, c3 chan int, done bool, wg *sync.WaitGroup) {
defer wg.Done()
var payloads []Payload
for _, p := range lead.Payload {
decMDStr, err := base64.StdEncoding.DecodeString(p.MetaData)
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
var decMetadata Metadata
if err := json.Unmarshal(decMDStr, &decMetadata); err != nil {
goodMetadata, err := FixMDStr(string(decMDStr))
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
p.MetaData = goodMetadata
payloads = append(payloads, p)
}
}
lead.Payload = payloads
c1 <- LeadRes{lead}
if done {
c3 <- 0
}
}
First, a comment on the main problem I see with the code:
You are passing a done variable to the last ProcessLead call, which ProcessLead then uses to stop your worker via the quit channel. The problem is that the "last" ProcessLead call may finish BEFORE other ProcessLead calls, as they are executed in parallel.
First improvement
Think of your problem as a pipeline. You have 3 steps:
going through all the leads and starting a routine for each one
the routines process their lead
collecting the results
After fanning out in step 2, the simplest way to synchronise is the WaitGroup. As already mentioned, you never call wg.Wait(), and if you did, you would currently create a deadlock with your collecting routine. You need another goroutine that separates the wait from the collecting routine for this to work.
Here is how that could look (sorry for removing some code so I could see the structure better):
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
gl, bl, d := make(chan LeadRes), make(chan LeadResErr), make(chan int)
// additional goroutine with wg.Wait() and closing the quit channel
go func(d chan int) {
var wg sync.WaitGroup
for _, lead := range leads { // the index is unused, and an unused variable does not compile
wg.Add(1)
go func(lead Lead, wg *sync.WaitGroup) {
ProcessLead(lead, gl, bl, wg)
}(lead, &wg)
}
wg.Wait()
// stop routine after all other routines are done
// if your channels have buffers you might want make sure there is nothing in the buffer before closing
close(d) // you can simply close a quit channel. just make sure to only close it once
}(d)
// now startworker is running parallel to wg.Wait() and close(d)
startWorker(gl, bl, d, verbose)
}
func startWorker(ok chan LeadRes, err chan LeadResErr, quit chan int, verbose bool) {
for {
select {
case lead := <-ok:
fmt.Println(lead)
case err := <-err:
fmt.Println(err)
case <-quit:
return
}
}
}
//ProcessLead . . .
func ProcessLead(lead Lead, c1 chan LeadRes, c2 chan LeadResErr, wg *sync.WaitGroup) {
defer wg.Done()
var payloads []Payload
for _, p := range lead.Payload {
decMDStr, err := base64.StdEncoding.DecodeString(p.MetaData)
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
var decMetadata Metadata
if err := json.Unmarshal(decMDStr, &decMetadata); err != nil {
goodMetadata, err := FixMDStr(string(decMDStr))
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
p.MetaData = goodMetadata
payloads = append(payloads, p)
}
}
lead.Payload = payloads
c1 <- LeadRes{lead}
}
Suggested Solution
As mentioned in a comment, you might run into trouble if you have buffered channels. The complication comes from the two output channels you have (for LeadRes and LeadResErr). You could avoid this with the following structure:
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
	gl, bl := make(chan LeadRes), make(chan LeadResErr)
	// one goroutine that blocks until all ProcessLead functions are done
	go func(gl chan LeadRes, bl chan LeadResErr) {
		var wg sync.WaitGroup
		for _, lead := range leads {
			wg.Add(1)
			go func(lead Lead, wg *sync.WaitGroup) {
				ProcessLead(lead, gl, bl, wg)
			}(lead, &wg)
		}
		wg.Wait()
		// all senders are done: close both channels so the collectors' range loops end
		close(gl)
		close(bl)
	}(gl, bl)
	// main routine blocks until all results and errors are collected
	var wg sync.WaitGroup
	res, errs := []LeadRes{}, []LeadResErr{}
	wg.Add(2) // add 2 for resCollector and errCollector
	// pass pointers: append inside the collectors must be visible to the caller
	go resCollector(&wg, gl, &res)
	go errCollector(&wg, bl, &errs)
	wg.Wait()
	fmt.Println(res, errs) // in these two variables you will have the results.
}
func resCollector(wg *sync.WaitGroup, ok chan LeadRes, res *[]LeadRes) {
	defer wg.Done()
	for lead := range ok {
		*res = append(*res, lead)
	}
}
func errCollector(wg *sync.WaitGroup, ok chan LeadResErr, res *[]LeadResErr) {
	defer wg.Done()
	for err := range ok {
		*res = append(*res, err)
	}
}
// ProcessLead function as in "First improvement"
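Another way to sidestep the two-output-channel complication is to send results and errors over a single channel and close it once every producer has finished. The sketch below shows that pattern in isolation. The `result` type and the `process` and `collect` functions are hypothetical stand-ins for LeadRes/LeadResErr, ProcessLead, and the collecting code; they are not the original types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// result carries either a processed value or an error.
type result struct {
	value int
	err   error
}

// process is a placeholder for the real work: odd inputs fail, even inputs are doubled.
func process(n int) (int, error) {
	if n%2 == 1 {
		return 0, errors.New("odd input")
	}
	return n * 2, nil
}

// collect starts one goroutine per input and gathers everything from one channel.
func collect(inputs []int) (values []int, errs []error) {
	results := make(chan result)
	var wg sync.WaitGroup
	for _, n := range inputs {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			v, err := process(n)
			results <- result{value: v, err: err}
		}(n)
	}
	// Close the channel once every producer is done so the range loop below ends.
	go func() {
		wg.Wait()
		close(results)
	}()
	// Only this goroutine appends, so no extra locking is needed.
	for r := range results {
		if r.err != nil {
			errs = append(errs, r.err)
			continue
		}
		values = append(values, r.value)
	}
	return values, errs
}

func main() {
	values, errs := collect([]int{1, 2, 3, 4, 5, 6})
	fmt.Println(len(values), len(errs)) // 3 3
}
```

Because there is one channel and one reader, no quit channel or second WaitGroup is needed; closing the channel is the done signal.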
This piece of code takes an array of strings, shuffles it, and builds a response (both steps multithreaded).
The problem is that the response text is randomly cut off. This is probably because the variable "dump" is not fully written in time. If I wrap it in mutexes, the full text is returned, but the threads block and the program takes a long time to run. Please help!
const url = "https://yandex.ru/referats/write/?t=astronomy+mathematics"
const parseThreadsNum = 10
const generateThreadsNum = 1000
func startServer(wg *sync.WaitGroup) {
fmt.Println("Server started")
ch := make(chan []byte, generateThreadsNum)
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
wg.Add(generateThreadsNum)
rawText := multiparseText() // text parsing
prettyText := prettifyText(rawText) // truncates commas, dots etc. Return []string
for i := 0; i < generateThreadsNum; i++ {
go func() {
dump, _ := json.Marshal(shuffle(&prettyText)) //shuffle mixes array randomly
ch <- dump
}()
go func() {
w.Write(<-ch)
defer wg.Done()
}()
}
fmt.Println("Text generated")
wg.Wait()
})
log.Fatal(http.ListenAndServe(":8081", nil))
}
func main() {
var wg sync.WaitGroup
runtime.GOMAXPROCS(runtime.NumCPU())
startServer(&wg)
}
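One possible restructuring, offered as a sketch under the assumption that the truncation comes from many goroutines writing to the same http.ResponseWriter at once (and from sharing one slice and one WaitGroup across them): generate the JSON documents in parallel, then write them from the handler goroutine only. The helpers `shuffled` and `generate` and the placeholder word list are hypothetical; `main` drives the handler through httptest instead of opening a port, just to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
	"net/http"
	"net/http/httptest"
	"sync"
)

// shuffled returns a shuffled copy, so goroutines never mutate the shared slice.
func shuffled(words []string) []string {
	out := append([]string(nil), words...)
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// generate builds n JSON documents concurrently and returns all of them.
func generate(words []string, n int) [][]byte {
	dumps := make([][]byte, n)
	var wg sync.WaitGroup // local to this call, so concurrent requests do not interfere
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no mutex is needed.
			dumps[i], _ = json.Marshal(shuffled(words))
		}(i)
	}
	wg.Wait()
	return dumps
}

func handler(w http.ResponseWriter, r *http.Request) {
	words := []string{"alpha", "beta", "gamma"} // placeholder for the parsed text
	for _, dump := range generate(words, 1000) {
		w.Write(dump) // only the handler goroutine touches w
		w.Write([]byte("\n"))
	}
}

func main() {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest("GET", "/", nil))
	fmt.Println(rec.Code, rec.Body.Len()) // 200 25000
}
```

The expensive part (shuffling and marshalling) still runs in parallel; only the final writes are sequential, and they happen before the handler returns.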
I was trying to implement multithreading in Go. I am able to start goroutines, but it is not working as expected. Below is the sample program I have prepared:
func test(s string, fo *os.File) {
var s1 [105]int
count :=0
for x :=1000; x<1101;x++ {
s1[count] = x;
count++
}
//fmt.Println(s1[0])
for i := range s1 {
runtime.Gosched()
sd := s + strconv.Itoa(i)
var fileMutex sync.Mutex
fileMutex.Lock()
fmt.Fprintf(fo,sd)
defer fileMutex.Unlock()
}
}
func main() {
fo,err :=os.Create("D:/Output.txt")
if err != nil {
panic(err)
}
for i := 0; i < 4; i++ {
go test("bye",fo)
}
}
OUTPUT - good0bye0bye0bye0bye0good1bye1bye1bye1bye1good2bye2bye2bye2bye2.... etc.
The above program creates a file and writes "Hello" and "bye" to it.
My problem is that I am trying to create 5 threads and want each thread to process different values. As you can see in the example above, it prints "bye" 4 times.
I want output like the following using 5 threads:
good0bye0good1bye1good2bye2....etc....
Any idea how I can achieve this?
First, you need to block in your main function until all other goroutines return. The mutexes in your program aren't blocking anything, and since they're re-initialized in each loop iteration, they don't even block within their own goroutine. You can't defer an unlock if you're not returning from the function; you need to explicitly unlock in each iteration of the loop. You aren't using any of the values in your array (though you should use a slice instead), so we can drop that entirely. You also don't need runtime.Gosched in a well-behaved program, and it does nothing here.
An equivalent program that will run to completion would look like:
var wg sync.WaitGroup
var fileMutex sync.Mutex
func test(s string, fo *os.File) {
defer wg.Done()
for i := 0; i < 105; i++ {
fileMutex.Lock()
fmt.Fprintf(fo, "%s%d", s, i)
fileMutex.Unlock()
}
}
func main() {
fo, err := os.Create("D:/output.txt")
if err != nil {
log.Fatal(err)
}
for i := 0; i < 4; i++ {
wg.Add(1)
go test("bye", fo)
}
wg.Wait()
}
Finally though, there's no reason to try and write serial values to a single file from multiple goroutines, and it's less efficient to do so. If you want the values ordered over the entire file, you will need to use a single goroutine anyway.
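For the ordered output the question asks for, a minimal sketch of the single-goroutine approach could look like this. The function names `writeInterleaved` and `render` are hypothetical, and the example writes to standard output rather than a file to stay self-contained.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// writeInterleaved writes every word followed by the index, for each index in
// order, from a single goroutine, so the output order is fully deterministic.
func writeInterleaved(w io.Writer, words []string, n int) error {
	for i := 0; i < n; i++ {
		for _, word := range words {
			if _, err := fmt.Fprintf(w, "%s%d", word, i); err != nil {
				return err
			}
		}
	}
	return nil
}

// render returns the same output as a string, which is handy for checking it.
func render(words []string, n int) string {
	var sb strings.Builder
	writeInterleaved(&sb, words, n) // writes to a strings.Builder do not fail
	return sb.String()
}

func main() {
	if err := writeInterleaved(os.Stdout, []string{"good", "bye"}, 3); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println() // output: good0bye0good1bye1good2bye2
}
```

Since writeInterleaved accepts any io.Writer, passing the *os.File returned by os.Create would write the same sequence to a file.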