Peekable Queue in Golang - multithreading

I am trying to design a mechanism to allow cooperation of many processes – goroutines. There are two classes of processes – providers and users. Providers put “bids” for their services into a queue and users take waiting bids and start working with providers. However, a user may not like a bid and then two things should happen:
This bid should return to the queue. It should be placed at the beginning of the queue
The user should be given the next bid in the queue
Ideally, I would like to avoid a central process that coordinates the communication between providers and users.
Another way of thinking about this problem is to imagine a “peekable” queue or channel, a concept similar to the way AWS Kinesis works. A reader can get access to “peek” at the head of the queue. While this reader is peeking, no other readers can see the item. If the reader likes the item, it removes it from the queue. If not, the reader releases the lock on the item and another reader can peek.
Any ideas how to best implement this behavior in Go using channels and goroutines?

As @DaveC states in his comment, the simplest and fastest way to do this is to use a mutex.
You could use the "container/list" package, which implements a doubly linked list for you. This can be pushed/popped from both ends.
Here is a quick implementation that does what I think you are asking:
package main

import (
	"container/list"
	"sync"
)

// Queue is a FIFO queue that is safe for concurrent use.
type Queue struct {
	q list.List
	l sync.Mutex
}

// Push adds data to the back of the queue.
func (q *Queue) Push(data interface{}) {
	q.l.Lock()
	q.q.PushBack(data)
	q.l.Unlock()
}

// Pop removes and returns the item at the front of the queue,
// or nil if the queue is empty.
func (q *Queue) Pop() interface{} {
	q.l.Lock()
	defer q.l.Unlock()
	e := q.q.Front()
	if e == nil {
		return nil
	}
	return q.q.Remove(e)
}

// TakeAnother returns a rejected item to the front of the queue and hands
// back whatever was waiting there instead, in a single locked operation.
func (q *Queue) TakeAnother(data interface{}) interface{} {
	q.l.Lock()
	defer q.l.Unlock()
	e := q.q.Front()
	if e == nil {
		// Queue is empty: put the rejected item back and report nothing new.
		q.q.PushFront(data)
		return nil
	}
	// Swap the data with whatever is at the front of the list.
	e.Value, data = data, e.Value
	return data
}
Nowhere do I use channels or goroutines, since I don't think they are the correct tool for this job.

Related

two way communication through channels in golang

I have several functions that I want to be executed atomically, since they deal with sensitive data structures. Suppose the following scenario:
There are two functions, lock(sth) and unlock(sth), that can be called at any time by a goroutine to lock or unlock sth in a global array. I was thinking about having a command channel, so that goroutines send lock and unlock commands into the channel, and on the receiving side some kind of handler processes the lock and unlock requests sequentially by grabbing commands from the channel. That's fine, but what if the handler wants to send the result back to the requester? Is it possible to do so using Go channels? I know that it is possible to use some kind of lock mechanism like a mutex, but I was wondering whether channels can be used for this use case. I saw somewhere that it is recommended to use channels instead of Go's low-level lock structs.
In a single sentence:
In a channel with a capacity of 1, I want the receiving side to be able to reply to the goroutine that sent the message.
or equivalently:
A goroutine sends something to a channel; the message is received by another goroutine and handled leading to some result; how does the sender become aware of the result?
The sync package includes a Mutex lock, sync.Mutex, which can be locked and unlocked from any goroutine in a threadsafe way. Instead of using a channel to send a command to lock something, how about just using a mutex lock from the sender?
mutex := new(sync.Mutex)
sensitiveData := make([]string, 0)
// when someone wants to operate on sensitiveData,
// ...
mutex.Lock()
operate(sensitiveData)
mutex.Unlock()
When you say how does the sender become aware of the result, I think you're talking about how the handler receives the result -- that would be with a chan. You can send data through channels.
Alternatively, if you just want to be made aware, a semaphore-like sync.WaitGroup might do the job. This struct can be Add()ed to, and the sender can then wg.Wait() until the handler calls wg.Done(), which indicates to the waiting sender that the handler is done doing such and such.
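To make the channel option concrete, here is a minimal sketch of a handler replying on a per-request channel (the handler body and the "done" result are placeholders):
package main

import "fmt"

func main() {
	// Each request carries its own reply channel.
	requests := make(chan chan string)

	// The handler receives requests and sends a result back on each one.
	go func() {
		for reply := range requests {
			reply <- "done" // placeholder for the real result
		}
	}()

	// The sender issues a request and waits for the result.
	reply := make(chan string)
	requests <- reply
	fmt.Println(<-reply)
}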
If your question is about whether to use locks or channels, the wiki has a terse answer:
A common Go newbie mistake is to over-use channels and goroutines just because it's possible, and/or because it's fun. Don't be afraid to use a sync.Mutex if that fits your problem best. Go is pragmatic in letting you use the tools that solve your problem best and not forcing you into one style of code.
As a general guide, though:
Channel: passing ownership of data, distributing units of work, communicating async results
Mutex: caches, state
If you absolutely want to avoid anything but chans :), try not altering the sensitive array to begin with. Rather, use channels to send data to different goroutines, processing the data at each step, and then funnel the processed data into a final goroutine. That is, avoid using an array at all and keep the data in chans.
As the motto goes,
Do not communicate by sharing memory; instead, share memory by communicating.
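A rough sketch of that pipeline style, assuming the sensitive data can be treated as a stream of values (the stages and the values here are illustrative):
package main

import "fmt"

func main() {
	// Stage 1: a producer goroutine owns the raw values.
	raw := make(chan int)
	go func() {
		defer close(raw)
		for i := 1; i <= 5; i++ {
			raw <- i
		}
	}()

	// Stage 2: a processing goroutine transforms each value.
	processed := make(chan int)
	go func() {
		defer close(processed)
		for v := range raw {
			processed <- v * 2
		}
	}()

	// The final goroutine (here, main) is the only one that sees the results,
	// so no array is ever shared between goroutines.
	for v := range processed {
		fmt.Println(v)
	}
}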
If you want to prevent race conditions then sync primitives should work just fine, as described in @Nevermore's answer. That leaves the code much more readable and easier to reason about.
However, if you want channels to perform syncing for you, you can always try something like below:
// A global, shared channel used as a lock. A capacity of 1 allows only
// one goroutine to access the protected resource at a time.
var lock = make(chan struct{}, 1)

// Operate performs the access/modification on the protected resource.
func Operate(f func() error) error {
	lock <- struct{}{}
	defer func() { <-lock }()
	return f()
}
To use this Operate, pass in a closure that accesses the protected resource.
// Some value that requires concurrent access.
var arr = []int{1, 2, 3, 4, 5}

// Used to sync up goroutines.
var wg sync.WaitGroup
wg.Add(len(arr))

for i := 0; i < len(arr); i++ {
	go func(j int) {
		defer wg.Done()
		// Access to arr remains protected.
		Operate(func() error {
			arr[j] *= 2
			return nil
		})
	}(i)
}
wg.Wait()
Working example: https://play.golang.org/p/Drh-yJDVNh
Or you can entirely bypass Operate and use lock directly for more readability:
go func(j int) {
	defer wg.Done()
	lock <- struct{}{}
	defer func() { <-lock }()
	arr[j] *= 2
}(i)
Working example: https://play.golang.org/p/me3K6aIoR7
As you can see, arr access is protected using a channel here.
The other questions have covered locking well, but I wanted to address the other part of the question around using channels to send a response back to a caller. There is a not-uncommon pattern in Go of sending a response channel with the request. For example, you might send commands to a handler over a channel; these commands would be a struct with implementation-specific details, and the struct would include a channel for sending the result back, typed to the result type. Each command sent would include a new channel, which the handler would use to send back the response, and then close. To illustrate:
type Command struct {
	// Command parameters, whatever they are in this case.
	Param1 string
	Param2 int
	// Results carries the response back to the caller.
	Results chan Result
}

type Result struct {
	// Whatever a result is in this case.
}

var workQueue = make(chan Command)

// Example of executing a command synchronously.
func Example(param1 string, param2 int) Result {
	cmd := Command{
		Param1:  param1,
		Param2:  param2,
		Results: make(chan Result),
	}
	workQueue <- cmd
	return <-cmd.Results
}
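To complete the picture, the handler side might look something like this (StartHandler is an illustrative name; the sketch builds on the Command, Result, and workQueue declarations above):
// StartHandler launches a goroutine that drains workQueue, does the work,
// and replies on each command's channel before closing it.
func StartHandler() {
	go func() {
		for cmd := range workQueue {
			var res Result
			// ... do the work described by cmd.Param1 and cmd.Param2 ...
			cmd.Results <- res
			close(cmd.Results)
		}
	}()
}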

Companion object variable update while other actors reading

We are building a REST service using Spray and Akka. For this we need to read more than 10k files from disk (mostly static; updates might come twice per day). Reading from disk for each request was hurting performance, so we put all the required information into DataMap (a Map object). Using the Akka scheduler we update DataMap every 15 minutes (it needs to stay up to date with the data on disk).
class SampleScheduler extends Actor with ActorLogging {
  import context._

  val tick = context.system.scheduler.schedule(1.second, 15.minutes, self, "mytick")

  override def postStop() = tick.cancel()

  override def receive: Receive = {
    case "mytick" =>
      println(s"Yes got the tick now ${new Date().toGMTString}")
      Test.setDataMap()
  }
}

object Test {
  var DataMap: Map[String, List[String]] = Map()

  def setDataMap() = {
    DataMap = ??? // Read files from disk
  }
}

object Main extends App {
  // For each new request look into DataMap
  if (Test.DataMap.isEmpty) {
    // How to handle this? Can I use the following?
    Thread.sleep(1000)
  }
}
So when a new request comes in, it looks up the required data in the map and processes it accordingly.
How can the requirements below be met with the design described above?
1. For each request, one actor is created that reads the above DataMap and starts processing. If DataMap becomes empty and is re-loaded after processing has started, how should this be handled?
2. If DataMap is found to be empty, how do we retry? Can I use the Thread.sleep method?
3. Is storing DataMap in an object and resetting it every 15 minutes good practice?
First and most important of all, you must avoid (3). Storing state outside of an actor is evil! Besides, storing and maintaining state is exactly what actors are good at.
Once you put the state inside an actor and share it via messages, (2) becomes obsolete: the request actor asks the state actor, and if the state actor is busy re-reading files, it will answer once its job is done.
Lastly, there are two different strategies for solving (1). The state actor processes each message in order, so it can either (a) reply with the last known state, or (b) hold messages and reply with the newly populated state if it decides it should re-read the files.
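The shape of that design, sketched in Go rather than Akka for concreteness (the request type, the refresh interval, and the loadFromDisk placeholder are all illustrative):
package main

import (
	"fmt"
	"time"
)

// request asks the state-owning goroutine for the value of one key.
type request struct {
	key   string
	reply chan []string
}

// loadFromDisk is a placeholder for reading the ~10k files.
func loadFromDisk() map[string][]string {
	return map[string][]string{"some-file": {"line1", "line2"}}
}

func main() {
	requests := make(chan request)

	// The "state actor": the only goroutine that touches dataMap.
	go func() {
		dataMap := loadFromDisk() // initial load
		ticker := time.NewTicker(15 * time.Minute)
		for {
			select {
			case <-ticker.C:
				dataMap = loadFromDisk() // periodic refresh
			case req := <-requests:
				req.reply <- dataMap[req.key] // always answers with current state
			}
		}
	}()

	// A "request actor": ask and wait for the answer.
	reply := make(chan []string)
	requests <- request{key: "some-file", reply: reply}
	fmt.Println(<-reply)
}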

How to control simultaneous access to multiple shared queues by multiple producers?

One way would be to lock, then check the status of the first shared queue, push data if space is available or ignore it if not, and then unlock.
Then check the status of the second shared queue, push data if space is available or ignore it if not, and then unlock.
And so on and so forth.
Here we'll be constantly locking and unlocking to see the status of a shared queue and then act accordingly.
Questions:
What are the drawbacks of this method? Of course time will be spent in locking and unlocking. Is that it?
What are the other ways to achieve the same effect without the current method's drawbacks?
Lock contention is very expensive because it requires a context switch - see the LMAX Disruptor for a more in-depth explanation, in particular the performance results page; the Disruptor is a lock-free bounded queue that exhibits less latency than a bounded queue that uses locks.
One way to reduce lock contention is to have your producers check the queues in a different order from each other. For example, instead of each producer checking Queue1, then Queue2, ... and finally QueueN, each producer would repeatedly generate a random number in [1, N] and check Queue[Rand(N)]. A more complex solution is to maintain a set of queues sorted according to their available space (in Java this could be a ConcurrentSkipListSet): each producer removes the queue at the head of the set (i.e. the queue with the most available space that is not being simultaneously accessed by another producer), adds an element, and inserts the queue back into the set. A simpler solution along the same lines is to maintain an unbounded, unsorted queue of queues: a producer removes and checks the queue at the head of the queue of queues, then inserts it back at the tail, which ensures that only one producer can check a given queue at any point in time.
Another solution is to reduce, and ideally eliminate, the number of locks. It is difficult to write lock-free algorithms, but it is also potentially very rewarding, as demonstrated by the performance of LMAX's lock-free queue. In lieu of replacing your locked bounded queues with LMAX's lock-free bounded queue, another option is to replace them with lock-free unbounded queues (e.g. Java's ConcurrentLinkedQueue; lock-free unbounded queues are much more likely to be in your language's standard library than lock-free bounded queues) and to place conservative lock-free guards on these queues. For example, using Java's AtomicInteger for the guards:
public class BoundedQueue<T> {
    private ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private AtomicInteger bound = new AtomicInteger(0);
    private final int maxSize;

    public BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    public T poll() {
        T retVal = queue.poll();
        if (retVal != null) {
            bound.decrementAndGet();
        }
        return retVal;
    }

    public boolean offer(T t) {
        if (t == null) throw new NullPointerException();
        int boundSize = bound.get();
        for (int retryCount = 0; retryCount < 3 && boundSize < maxSize; retryCount++) {
            if (bound.compareAndSet(boundSize, boundSize + 1)) {
                return queue.offer(t);
            }
            boundSize = bound.get();
        }
        return false;
    }
}
poll() will return the element from the head of the queue, decrementing bound if the head element isn't null i.e. if the queue isn't empty. offer(T t) attempts to increment the size of bound without exceeding maxSize, if this succeeds then it puts the element at the tail of the queue, else if this fails three times then the method returns false. This is a conservative guard because it is possible for offer to fail even if the queue isn't full, e.g. if an element is removed after boundSize = bound.get() sets boundSize to maxSize, or if the bound.compareAndSet(expected, newVal) method happens to fail three times due to multiple consumers calling poll().
Really, you are doing too much locking and unlocking here. The solution is to make the same check twice:
check if space is available, if not, continue
lock
check if space is available AGAIN
... go on as you did before.
This way you will only take the lock unnecessarily in very rare cases.
I first saw this solution in the book "Professional Java EE Design Patterns" (Yener, Theedom).
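A minimal sketch of that check/lock/re-check idea, written in Go with a hypothetical mutex-guarded bounded queue (the answers above use Java, but the pattern is language-agnostic):
package queue

import (
	"sync"
	"sync/atomic"
)

// BoundedQueue is a hypothetical bounded queue guarded by a mutex, with an
// atomic size counter so the cheap pre-check does not race.
type BoundedQueue struct {
	mu      sync.Mutex
	items   []int
	size    atomic.Int64
	maxSize int64
}

// TryPush applies the double-check pattern: a cheap check first, and the
// authoritative check only after taking the lock.
func (q *BoundedQueue) TryPush(v int) bool {
	// Cheap check without the lock: if the queue looks full, give up early.
	if q.size.Load() >= q.maxSize {
		return false
	}
	q.mu.Lock()
	defer q.mu.Unlock()
	// Check again under the lock, since another producer may have filled it.
	if q.size.Load() >= q.maxSize {
		return false
	}
	q.items = append(q.items, v)
	q.size.Add(1)
	return true
}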
Edit.
About spreading the start queue numbers among threads.
Notice that, without any special organization, these threads wait on the queues only the first time around; the next time, the necessary time shift has already been created simply by that waiting. Of course, we can create the time shift ourselves by spreading the starting queue numbers among the threads, and a simple spread with an equal shift will be more effective than a random one.

Synchronizing producer, consumer and a producer queue

I have a producer and a consumer. The producer fills its internal queue with objects, and the consumer takes these objects one by one. I want to synchronize the consumer with the producer, so that the consumer blocks when there are no objects ready, and I want to synchronize the producer with itself, so that it stops producing when the queue is full (and starts again when there’s space). How do I do that? I was able to solve a simpler case without the queue using NSConditionLock, but with the queue the problem looks more complex.
You might consider using a pair of NSOperationQueues or dispatch queues. Have your production operations (in the producer queue) send messages, on the main thread if necessary, to an object that adds consumption operations to the consumer queue.
I ended up using two semaphores, objectsReady and bufferFreeSlots:
@implementation Producer

- (id) getNextObject {
    [objectsReady wait];
    id anObject = [[buffer objectAtIndex:0] retain];
    [buffer removeObjectAtIndex:0];
    [bufferFreeSlots signal];
    return [anObject autorelease];
}

- (void) decodeLoop {
    while (1) {
        [bufferFreeSlots wait];
        [buffer push:[self produceAnObject]];
        [objectsReady signal];
    }
}

@end
bufferFreeSlots is initialized to the maximum queue size. So far it seems to work, but God knows this is an act of faith, not solid confidence.
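For comparison, a single buffered channel gives this same blocking behaviour in Go; a minimal sketch (maxQueueSize and produceAnObject are placeholders, not taken from the code above):
package main

import "fmt"

const maxQueueSize = 8 // placeholder for the maximum queue size

// produceAnObject is a placeholder for the real producer work.
func produceAnObject(i int) string {
	return fmt.Sprintf("object-%d", i)
}

func main() {
	// A buffered channel plays the role of both semaphores: sends block when
	// the buffer is full (bufferFreeSlots) and receives block when it is
	// empty (objectsReady).
	buffer := make(chan string, maxQueueSize)

	// Producer: the equivalent of the decode loop.
	go func() {
		for i := 0; ; i++ {
			buffer <- produceAnObject(i) // blocks when the buffer is full
		}
	}()

	// Consumer: takes objects one by one, blocking when none are ready.
	for j := 0; j < 3; j++ {
		fmt.Println(<-buffer)
	}
}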

Threading 101: What is a Dispatcher?

Once upon a time, I remembered this stuff by heart. Over time, my understanding has diluted and I mean to refresh it.
As I recall, any so-called single-threaded application has two threads:
a) the primary thread that has a pointer to the main or DllMain entry points; and
b) For applications that have some UI, a UI thread, a.k.a. the secondary thread, on which the WndProc runs, i.e. the thread that executes the WndProc that receives messages that Windows posts to it. In short, the thread that executes the Windows message loop.
For UI apps, the primary thread is in a blocking state waiting for messages from Windows. When it receives them, it queues them up and dispatches them to the message loop (WndProc), and the UI thread gets kick-started.
As per my understanding, the primary thread, which is in a blocking state, is this:
C++
MSG msg;
while (GetMessage(&msg, NULL, 0, 0))
{
    TranslateMessage(&msg);
    DispatchMessage(&msg);
}
C# or VB.NET WinForms apps:
Application.Run(new System.Windows.Forms.Form());
Is this what they call the Dispatcher?
My questions are:
a) Is my above understanding correct?
b) What in the name of hell is the Dispatcher?
c) Point me to a resource where I can get a better understanding of threads from a Windows/Win32 perspective and then tie it up with high level languages like C#. Petzold is sparing in his discussion on the subject in his epic work.
Although I believe I have it somewhat right, a confirmation will be relieving.
It depends on what you consider the primary thread. Most UI frameworks will have an event handler thread that sits mostly idle, waiting for low level events. When an event occurs this thread gets a lock on the event queue, and adds the events there. This is hardly what I'd consider the primary thread, though.
In general a dispatcher takes some events and, based on their content or type, sends (dispatches, if you will) them to another chunk of code (often in another thread, but not always). In this sense the event handler thread itself is a simple dispatcher. On the other end of the queue, the framework typically provides another dispatcher that takes events off the queue, for instance sending mouse events to mouse listeners, keyboard events to keyboard listeners, etc.
Edit:
A simple dispatcher may look like this:
class Event {
public:
    EventType type; // Probably an enum
    String data;    // Event data
};

class Dispatcher {
public:
    // ...
    void dispatch(Event event)
    {
        switch (event.type)
        {
        case FooEvent:
            foo(event.data);
            break;
        // ...
        }
    }
};
Most people I've met use "dispatcher" to describe something that's more than just a simple passthrough. In this case, it performs different actions based on a type variable, which is consistent with most of the dispatchers I've seen. Often the switch is replaced with polymorphism, but a switch makes it clearer what's going on in an example.
