This piece of code takes an array of strings, shuffles it, and builds a response (both concurrently).
The problem is that the response text is randomly truncated. This is probably because the "dump" variable does not have time to be written out completely. If it is wrapped in mutexes, the full text is returned, but the threads block and the program takes a long time to run. Please help!
const url = "https://yandex.ru/referats/write/?t=astronomy+mathematics"
const parseThreadsNum = 10
const generateThreadsNum = 1000
func startServer(wg *sync.WaitGroup) {
fmt.Println("Server started")
ch := make(chan []byte, generateThreadsNum)
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
wg.Add(generateThreadsNum)
rawText := multiparseText() // text parsing
prettyText := prettifyText(rawText) // strips commas, dots, etc.; returns []string
for i := 0; i < generateThreadsNum; i++ {
go func() {
dump, _ := json.Marshal(shuffle(&prettyText)) //shuffle mixes array randomly
ch <- dump
}()
go func() {
w.Write(<-ch)
defer wg.Done()
}()
}
fmt.Println("Text generated")
wg.Wait()
})
log.Fatal(http.ListenAndServe(":8081", nil))
}
func main() {
var wg sync.WaitGroup
runtime.GOMAXPROCS(runtime.NumCPU())
startServer(&wg)
}
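For what it's worth, a minimal sketch of one way to keep the response intact (an assumption on my part, reusing the question's multiparseText, prettifyText, and shuffle helpers): let goroutines generate the dumps, but let the handler goroutine alone drain the channel and write, since http.ResponseWriter is not safe for concurrent use and the handler must not return while writes are still in flight.
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
rawText := multiparseText()
prettyText := prettifyText(rawText)
ch := make(chan []byte, generateThreadsNum)
for i := 0; i < generateThreadsNum; i++ {
go func() {
local := make([]string, len(prettyText))
copy(local, prettyText) // shuffle a private copy: no data race on prettyText
dump, _ := json.Marshal(shuffle(&local))
ch <- dump
}()
}
for i := 0; i < generateThreadsNum; i++ {
w.Write(<-ch) // the handler is the only writer, and it returns only after all dumps arrive
}
})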
Related
I'm using Golang and I'm trying to make this program thread-safe. It takes a number as a parameter (the number of consumer tasks to start), reads lines from an input, and accumulates a word count. I want the threads to be safe (but I don't want it to just lock everything; it needs to be efficient). Should I use channels? How do I do this?
package main
import (
"bufio"
"fmt"
"log"
"os"
"sync"
)
// Consumer task to operate on queue
func consumer_task(task_num int) {
fmt.Printf("I'm consumer task #%v ", task_num)
fmt.Println("Line being popped off queue: " + queue[0])
queue = queue[1:]
}
// Initialize queue
var queue = make([]string, 0)
func main() {
// Initialize wait group
var wg sync.WaitGroup
// Get number of tasks to run from user
var numof_tasks int
fmt.Print("Enter number of tasks to run: ")
fmt.Scan(&numof_tasks)
// Open file
file, err := os.Open("test.txt")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// Scanner to scan the file
scanner := bufio.NewScanner(file)
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
// Loop through each line in the file and append it to the queue
for scanner.Scan() {
line := scanner.Text()
queue = append(queue, line)
}
// Start specified # of consumer tasks
for i := 1; i <= numof_tasks; i++ {
wg.Add(1)
go func(i int) {
consumer_task(i)
wg.Done()
}(i)
}
wg.Wait()
fmt.Println("All done")
fmt.Println(queue)
}
You have a data race on the slice queue. Concurrent goroutines popping elements off the head of the queue need to do so in a controlled manner: either via a sync.Mutex lock, or by using a channel to manage the "queue" of work items.
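For the mutex route, a minimal sketch (my own illustration, not part of the original answer), which also guards against popping from an empty queue:
var mu sync.Mutex // guards queue

func consumer_task(task_num int) {
mu.Lock()
defer mu.Unlock()
if len(queue) == 0 {
return // more tasks than lines: nothing left to pop
}
fmt.Printf("I'm consumer task #%v\n", task_num)
fmt.Println("Line being popped off queue: " + queue[0])
queue = queue[1:]
}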
To convert what you have to using channels, update the worker to take an input channel as your queue - and range on the channel, so each worker can handle more than one task:
func consumer_task(task_num int, ch <-chan string) {
fmt.Printf("I'm consumer task #%v\n", task_num)
for item := range ch {
fmt.Printf("task %d consuming: Line item: %v\n", task_num, item)
}
// each worker will drop out of their loop when channel is closed
}
Change queue from a slice to a channel and feed items in like so:
queue := make(chan string)
go func() {
// Loop through each line in the file and append it to the queue
for scanner.Scan() {
queue <- scanner.Text()
}
close(queue) // signal to workers that there are no more items
}()
Then just update your work dispatcher code to add the channel input:
go func(i int) {
consumer_task(i, queue) // add the queue parameter
wg.Done()
}(i)
https://go.dev/play/p/AzHyztipUZI
I am trying to make my application run as fast as possible. I purchased a semi-powerful container off of Google Cloud and I am just itching to see how many iterations per second I can get out of this program. However, I am new to Go and so far my implementation is showing to be very messy and not working well.
The way I have it set up now, it will start out at a high rate (around 11,000 iterations per second) but then quickly dwindle down to 2,000. My goal is for a far bigger number than even 11,000. Also, the infofunc(i) function can't seem to keep up with fast speeds and using a goroutine for that function causes overlap of the printing to the console. Also, it will on occasion reuse the WaitGroup before the Wait has returned.
I don't like to be the person to ask to be spoon-fed code, but I am at a loss as to how to implement this. There seem to be so many different methods when it comes to parallelism, multithreading, etc., and it is confusing to me.
import (
"fmt"
"math/big"
"os"
"os/exec"
"sync"
"time"
)
var found = 0
var pages_queried = 0
var start_time = time.Now()
var bignum = new(big.Int)
var foundAddresses = 0
var wg sync.WaitGroup
var set = make(map[string]bool)
var addresses = []string{"6ab42gyr", "lo08n4g6"}
func main() {
bignum.SetString("1000000000000000000000000000", 10)
pick := os.Args[1]
kpp := 128
switch pick {
case "btc":
i := new(big.Int)
i, ok := i.SetString(os.Args[2], 10)
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
wg.Wait()
}
}
}
}
func infofunc(i *big.Int) {
elapsed := time.Now().Sub(start_time)
duration, _ := time.ParseDuration(elapsed.String())
duration2 := int(duration.Seconds())
if duration2 != 0 {
fmt.Printf("\033[5;0H")
fmt.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %d/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), (pages_queried / duration2))
}
}
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
wg.Add(1)
go func(i int, ii int, keys []key) {
defer wg.Done()
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
fmt.Print("Found an address: " + v + "!\n")
fmt.Printf("%v", keys[i])
fmt.Print("\n")
foundAddresses += 1
found += 1
}
}
}(i, ii, keys)
}
}(i)
foundAddresses = 0
}
}
I would not use a global sync.WaitGroup; it makes it hard to understand what is happening. Instead, just define it where you need it.
You are calling wg.Wait() inside the loop block. That basically blocks the loop on every iteration, waiting for the goroutine to complete. What you really want is to spawn all the goroutines and only then wait for their completion.
if ok {
cmd := exec.Command("clear")
cmd.Stdout = os.Stdout
cmd.Run()
var wg sync.WaitGroup //I am about to spawn goroutines, I need to wait for them
for i.Cmp(bignum) < 0 {
wg.Add(1)
go func(i *big.Int) {
defer wg.Done()
go printKeys(i.String(), kpp)
i.Add(i, big.NewInt(1))
pages_queried += 1
infofunc(i)
}(i)
}
wg.Wait() //Now that all goroutines are working, let's wait
}
You cannot avoid the print overlap when you have multiple goroutines. If that's a problem, you might think about using Go's log stdlib, which will add timestamps for you. Then you should be able to sort the output in chronological order.
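For instance, a minimal sketch of that setup (my own illustration, not from the original answer):
// Microsecond timestamps make interleaved output from many goroutines sortable.
log.SetFlags(log.LstdFlags | log.Lmicroseconds)
log.Printf("queried %d pages", pages_queried)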
Anyway, splitting the code into more goroutines does not ensure a speedup. If the problem you are trying to solve is intrinsically sequential, then more goroutines will just add more contention and pressure on the Go scheduler, leading to the opposite result. More details here. Thus, a goroutine for infofunc will not help. But it can be improved by using a logger library instead of the plain fmt package.
func infofunc(i *big.Int) {
elapsed := time.Since(start_time)
seconds := int(elapsed.Seconds())
if seconds != 0 {
log.Printf("\033[5;0H")
log.Printf("Started at %s. Found: %d. Elapsed: %s. Queried: %d pages. Current page: %s. Rate: %d/s", start_time.String(), found, elapsed.String(), pages_queried, i.String(), pages_queried/seconds)
}
}
For printKeys, I would not create so many goroutines; they are not going to help if the work they need to perform is CPU bound, which seems to be the case here.
func printKeys(pageNumber string, keysPerPage int) {
keys := generateKeys(pageNumber, keysPerPage)
length := len(keys)
var addressesLen = len(addresses)
var wg sync.WaitGroup //Local WaitGroup
for i := 0; i < length; i++ {
wg.Add(1)
go func(i int) { //This goroutine could be removed, in my opinion.
defer wg.Done()
for ii := 0; ii < addressesLen; ii++ {
for _, v := range addresses {
if set[keys[i].compressed] || set[keys[i].uncompressed] {
log.Printf("Found an address: %v\n", v)
log.Printf("%v", keys[i])
log.Printf("\n")
foundAddresses += 1
found += 1
}
}
}
}(i)
foundAddresses = 0
}
wg.Wait()
}
I would suggest writing a benchmark for these functions and then enabling tracing. That way you should get an idea of where your code is spending most of its time.
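For example, a benchmark could look like this (a sketch; it assumes the printKeys signature shown above, with "1" as a sample page number):
// printkeys_bench_test.go
package main

import "testing"

func BenchmarkPrintKeys(b *testing.B) {
for n := 0; n < b.N; n++ {
printKeys("1", 128)
}
}
Running go test -bench=. -trace=trace.out and then go tool trace trace.out opens the execution trace in a browser.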
I am having trouble structuring my goroutines and channels. My select statement keeps quitting before all goroutines are finished. I know the problem is where I am sending the done signal. Where should I send the done signal?
func startWorker(ok chan LeadRes, err chan LeadResErr, quit chan int, verbose bool, wg *sync.WaitGroup) {
var results ProcessResults
defer wg.Done()
log.Info("Starting . . .")
start := time.Now()
for {
select {
case lead := <-ok:
results.BackFill = append(results.BackFill, lead.Lead)
case err := <-err:
results.BadLeads = append(results.BadLeads, err)
case <-quit:
if verbose {
log.Info("Logging errors from unprocessed leads . . .")
logBl(results.BadLeads)
}
log.WithFields(log.Fields{
"time-elapsed": time.Since(start),
"number-of-unprocessed-leads": len(results.BadLeads),
"number-of-backfilled-leads": len(results.BackFill),
}).Info("Done")
return
}
}
}
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
var wg sync.WaitGroup
gl, bl, d := getChans()
for i, lead := range leads {
done := false
if len(leads)-1 == i {
done = true
}
wg.Add(1)
go func(lead Lead, done bool, wg *sync.WaitGroup) {
ProcessLead(lead, gl, bl, d, done, wg)
}(lead, done, &wg)
}
startWorker(gl, bl, d, verbose, &wg)
}
//ProcessLead . . .
func ProcessLead(lead Lead, c1 chan LeadRes, c2 chan LeadResErr, c3 chan int, done bool, wg *sync.WaitGroup) {
defer wg.Done()
var payloads []Payload
for _, p := range lead.Payload {
decMDStr, err := base64.StdEncoding.DecodeString(p.MetaData)
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
var decMetadata Metadata
if err := json.Unmarshal(decMDStr, &decMetadata); err != nil {
goodMetadata, err := FixMDStr(string(decMDStr))
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
p.MetaData = goodMetadata
payloads = append(payloads, p)
}
}
lead.Payload = payloads
c1 <- LeadRes{lead}
if done {
c3 <- 0
}
}
First, a comment on the main problem I see with the code:
You are passing a done variable to the last ProcessLead call, which in turn you use in ProcessLead to stop your worker via the quit channel. The problem with this is that the "last" ProcessLead call may finish BEFORE other ProcessLead calls, as they are executed in parallel.
First improvement
Think of your problem as a pipeline. You have 3 steps:
going through all the leads and starting a routine for each one
the routines process their lead
collecting the results
After spreading out in step 2, the simplest way to synchronise is the WaitGroup. As already mentioned, you are not calling the synchronisation, and if you did, you would currently create a deadlock in connection with your collecting routine. You need another goroutine separating the sync from the collecting routine for this to work.
Here is how that could look (sorry for removing some code so I could better see the structure):
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
gl, bl, d := make(chan LeadRes), make(chan LeadResErr), make(chan int)
// additional goroutine with wg.Wait() and closing the quit channel
go func(d chan int) {
var wg sync.WaitGroup
for _, lead := range leads {
wg.Add(1)
go func(lead Lead, wg *sync.WaitGroup) {
ProcessLead(lead, gl, bl, wg)
}(lead, &wg)
}
wg.Wait()
// stop routine after all other routines are done
// if your channels have buffers you might want to make sure there is nothing left in the buffer before closing
close(d) // you can simply close a quit channel. just make sure to only close it once
}(d)
// now startworker is running parallel to wg.Wait() and close(d)
startWorker(gl, bl, d, verbose)
}
func startWorker(ok chan LeadRes, err chan LeadResErr, quit chan int, verbose bool) {
for {
select {
case lead := <-ok:
fmt.Println(lead)
case err := <-err:
fmt.Println(err)
case <-quit:
return
}
}
}
//ProcessLead . . .
func ProcessLead(lead Lead, c1 chan LeadRes, c2 chan LeadResErr, wg *sync.WaitGroup) {
defer wg.Done()
var payloads []Payload
for _, p := range lead.Payload {
decMDStr, err := base64.StdEncoding.DecodeString(p.MetaData)
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
var decMetadata Metadata
if err := json.Unmarshal(decMDStr, &decMetadata); err != nil {
goodMetadata, err := FixMDStr(string(decMDStr))
if err != nil {
c2 <- LeadResErr{lead, err.Error()}
}
p.MetaData = goodMetadata
payloads = append(payloads, p)
}
}
lead.Payload = payloads
c1 <- LeadRes{lead}
}
Suggested Solution
As mentioned in a comment, you might run into trouble if you have buffered channels. The complication comes from the two output channels you have (for LeadRes and LeadResErr). You could avoid this with the following structure:
//BackFillParallel . . .
func BackFillParallel(leads []Lead, verbose bool) {
gl, bl := make(chan LeadRes), make(chan LeadResErr)
// one goroutine that blocks until all ProcessLead functions are done,
// then closes both channels so the collector loops below can terminate
go func(gl chan LeadRes, bl chan LeadResErr) {
var wg sync.WaitGroup
for _, lead := range leads {
wg.Add(1)
go func(lead Lead, wg *sync.WaitGroup) {
ProcessLead(lead, gl, bl, wg)
}(lead, &wg)
}
wg.Wait()
close(gl)
close(bl)
}(gl, bl)
// main routine blocks until all results and errors are collected
var wg sync.WaitGroup
res, errs := []LeadRes{}, []LeadResErr{}
wg.Add(2) // add 2 for resCollector and errCollector
go resCollector(&wg, gl, &res)
go errCollector(&wg, bl, &errs)
wg.Wait()
fmt.Println(res, errs) // in these two variables you will have the results.
}
func resCollector(wg *sync.WaitGroup, ok chan LeadRes, res *[]LeadRes) {
defer wg.Done()
for lead := range ok {
*res = append(*res, lead)
}
}
func errCollector(wg *sync.WaitGroup, ok chan LeadResErr, res *[]LeadResErr) {
defer wg.Done()
for err := range ok {
*res = append(*res, err)
}
}
// ProcessLead function as in "First improvement"
I was trying to implement multithreading in Go. I am able to implement goroutines, but they are not working as expected. Below is the sample program I have prepared:
func test(s string, fo *os.File) {
var s1 [105]int
count :=0
for x :=1000; x<1101;x++ {
s1[count] = x;
count++
}
//fmt.Println(s1[0])
for i := range s1 {
runtime.Gosched()
sd := s + strconv.Itoa(i)
var fileMutex sync.Mutex
fileMutex.Lock()
fmt.Fprintf(fo,sd)
defer fileMutex.Unlock()
}
}
func main() {
fo,err :=os.Create("D:/Output.txt")
if err != nil {
panic(err)
}
for i := 0; i < 4; i++ {
go test("bye",fo)
}
}
OUTPUT - good0bye0bye0bye0bye0good1bye1bye1bye1bye1good2bye2bye2bye2bye2.... etc.
The above program creates a file and writes "good" and "bye" into it.
My problem is that I am trying to create 5 threads and want each thread to process different values. As you can see in the example above, it is printing "bye" 4 times.
I wanted output like below, using 5 threads:
good0bye0good1bye1good2bye2....etc....
Any idea how I can achieve this?
First, you need to block in your main function until all other goroutines return. The mutexes in your program aren't blocking anything, and since they're re-initialized in each loop iteration, they don't even block within their own goroutine. You can't defer an unlock if you're not returning from the function; you need to explicitly unlock in each iteration of the loop. You aren't using any of the values in your array (though you should use a slice instead), so we can drop that entirely. You also don't need runtime.Gosched in a well-behaved program, and it does nothing here.
An equivalent program that will run to completion would look like:
var wg sync.WaitGroup
var fileMutex sync.Mutex
func test(s string, fo *os.File) {
defer wg.Done()
for i := 0; i < 105; i++ {
fileMutex.Lock()
fmt.Fprintf(fo, "%s%d", s, i)
fileMutex.Unlock()
}
}
func main() {
fo, err := os.Create("D:/output.txt")
if err != nil {
log.Fatal(err)
}
for i := 0; i < 4; i++ {
wg.Add(1)
go test("bye", fo)
}
wg.Wait()
}
Finally though, there's no reason to try and write serial values to a single file from multiple goroutines, and it's less efficient to do so. If you want the values ordered over the entire file, you will need to use a single goroutine anyway.
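If you do want multiple goroutines to produce the values anyway, here is a minimal sketch of the single-writer pattern (my own addition, not part of the original answer; the file name and counts are placeholders):
package main

import (
"fmt"
"log"
"os"
"sync"
)

func main() {
fo, err := os.Create("output.txt")
if err != nil {
log.Fatal(err)
}
defer fo.Close()
lines := make(chan string)
done := make(chan struct{})
go func() { // the only goroutine that touches the file
for s := range lines {
fmt.Fprint(fo, s)
}
close(done)
}()
var wg sync.WaitGroup
for w := 0; w < 4; w++ {
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < 105; i++ {
lines <- fmt.Sprintf("bye%d", i)
}
}()
}
wg.Wait()
close(lines) // all producers are done; let the writer drain and exit
<-done
}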
I have a large log file that I want to analyze in parallel.
I have this code:
package main
import (
"bufio"
"fmt"
"os"
"time"
)
func main() {
filename := "log.txt"
threads := 10
// Read the file
file, err := os.Open(filename)
if err != nil {
fmt.Println("Could not open file with the database.")
os.Exit(1)
}
defer file.Close()
scanner := bufio.NewScanner(file)
// Channel for strings
tasks := make(chan string)
// Run the threads that take lines from the channel and parse them
for i := 0; i < threads; i++ {
go parseStrings(tasks)
}
// Start a thread load lines from a file into the channel
go getStrings(scanner, tasks)
// At this point I have to wait until all of the threads have finished
// For now, I just sleep
for {
time.Sleep(1 * time.Second)
}
}
func getStrings(scanner *bufio.Scanner, tasks chan<- string) {
for scanner.Scan() {
s := scanner.Text()
tasks <- s
}
}
func parseStrings(tasks <-chan string) {
for {
s := <-tasks
event := parseLine(s)
fmt.Println(event)
}
}
func parseLine(line string) string {
return line
}
Actually, how do I wait for the end of all the threads?
I was advised to create a separate thread that collects the results, but how do I do that?
Using the pipeline pattern, and the "fan out / fan in" pattern:
package main
import (
"bufio"
"fmt"
"strings"
"sync"
)
func main() {
file := "here is first line\n" +
"here is second line\n" +
"here is line 3\n" +
"here is line 4\n" +
"here is line 5\n" +
"here is line 6\n" +
"here is line 7\n"
scanner := bufio.NewScanner(strings.NewReader(file))
// all lines onto one channel
in := getStrings(scanner)
// FAN OUT
// Multiple functions reading from the same channel until that channel is closed
// Distribute work across multiple functions (ten goroutines) that all read from in.
xc := fanOut(in, 10)
// FAN IN
// multiplex multiple channels onto a single channel
// merge the channels from c0 through c9 onto a single channel
for n := range merge(xc) {
fmt.Println(n)
}
}
func getStrings(scanner *bufio.Scanner) <-chan string {
out := make(chan string)
go func() {
for scanner.Scan() {
out <- scanner.Text()
}
close(out)
}()
return out
}
func fanOut(in <-chan string, n int) []<-chan string {
var xc []<-chan string
for i := 0; i < n; i++ {
xc = append(xc, parseStrings(in))
}
return xc
}
func parseStrings(in <-chan string) <-chan string {
out := make(chan string)
go func() {
for n := range in {
out <- parseLine(n)
}
close(out)
}()
return out
}
func parseLine(line string) string {
return line
}
func merge(cs []<-chan string) <-chan string {
var wg sync.WaitGroup
wg.Add(len(cs))
out := make(chan string)
for _, c := range cs {
go func(c <-chan string) {
for n := range c {
out <- n
}
wg.Done()
}(c)
}
go func() {
wg.Wait()
close(out)
}()
return out
}
Check it out on the playground.
var wg sync.WaitGroup
When starting each goroutine, do:
wg.Add(1)
When a goroutine's work is done, decrement the counter with:
wg.Done()
As a result, instead of
for {
time.Sleep(1 * time.Second)
}
do
wg.Wait()
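One caveat (my addition, sketched against the question's code): wg.Wait() only returns if the workers themselves finish, so parseStrings has to exit when the input is exhausted. That means getStrings should close(tasks) after its scanner loop, and the worker should range over the channel:
func parseStrings(tasks <-chan string, wg *sync.WaitGroup) {
defer wg.Done() // tell the WaitGroup this worker is finished
for s := range tasks { // loop ends once getStrings closes the channel
fmt.Println(parseLine(s))
}
}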
Just use the sync.WaitGroup
package main
import (
"sync"
)
func stuff(wg *sync.WaitGroup) {
defer wg.Done() // tell the WaitGroup it's done
/* stuff */
}
func main() {
count := 50
wg := new(sync.WaitGroup)
wg.Add(count) // add number of goroutines to the WaitGroup
for i := 0; i < count; i++ {
go stuff(wg)
}
wg.Wait() // wait for all
}
Play