I am building something like a delay-line: one process RPUSHes objects into a list, another LPOPs them out in the same order.
The trick is that objects should only be popped from the list one hour after they have been added. I could use a time-stamp for that. The delay is the same for all items and never changes.
Now how do I implement the pop in a concurrency-friendly way (so that it still works when several workers access that list)? I could take out an item, check the timestamp and put it back into the list if it's still too early. But if several workers do that simultaneously, it may mess up the order of items. I could check the first item and only pop it if it's due. But another worker might have popped it by then, so I would pop the wrong one.
Should I use the WATCH command? How? Should I use sorted sets instead of a list? Help appreciated!
I'd suggest using a sorted set. Entries go into the zset with the normal identifier as key and a Unix-style timestamp as the score. Make the timestamps the date+time after which each entry is ready for parsing. Workers do a ZPOP, which isn't a built-in but can be emulated with:
MULTI
ZRANGE <key> 0 0 WITHSCORES
ZREMRANGEBYRANK <key> 0 0
EXEC
Capture the results of the ZRANGE and you have the element with the lowest score at the time, already removed from the set, with its score. Put it back if it's not valid yet with a ZADD <key> <score> <item>.
Putting the item back in the list after checking it won't mess up the ordering enough to matter - by having any sort of concurrency you accept that the order won't be strictly defined. However, that approach won't be particularly efficient as you can't check the timestamp without some processing of the item.
With the scripting build you could implement a conditional pop command for sorted sets, though I don't think that is the easiest solution.
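To make the idea of a conditional pop concrete, here is a minimal in-process sketch in Python. It is a model only, not Redis code: the class name DelayQueue, the method pop_if_due and the use of a heap plus a lock are all assumptions made for illustration. The lock stands in for the atomicity that a server-side script would provide.

```python
import heapq
import threading
import time


class DelayQueue:
    """In-process model of a sorted set with a conditional pop (not Redis code)."""

    def __init__(self):
        self._heap = []                # entries are (ready_at, seq, item)
        self._seq = 0                  # tie-breaker: keeps insertion order for equal timestamps
        self._lock = threading.Lock()  # stands in for server-side atomicity

    def add(self, item, ready_at):
        with self._lock:
            heapq.heappush(self._heap, (ready_at, self._seq, item))
            self._seq += 1

    def pop_if_due(self, now=None):
        """Remove and return the earliest item only if its timestamp has passed."""
        now = time.time() if now is None else now
        with self._lock:
            if self._heap and self._heap[0][0] <= now:
                return heapq.heappop(self._heap)[2]
            return None  # nothing is removed, so nothing has to be put back


q = DelayQueue()
q.add("a", ready_at=100)
q.add("b", ready_at=200)
first = q.pop_if_due(now=150)   # "a" is due and is removed
second = q.pop_if_due(now=150)  # "b" is not due yet, so the queue is left untouched
third = q.pop_if_due(now=250)   # "b" is due now
```

Because the check and the removal happen under the same lock, a worker never removes an item that is not yet due, so the put-it-back step disappears.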
There are a couple of ways to implement basic scheduling using multiple lists:
Add tasks to a sorted set and have a single worker responsible for moving them from the sorted set to a list that multiple workers access with pop.
Have a list for each minute, and have the workers read from past lists. You can minimize the number of lists checked by using another key that tracks the oldest non-empty queue.
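The first of these two options (a single mover feeding a shared list) might look roughly like the following Python sketch. It models the sorted set with a heap and the shared list with queue.Queue; the function name move_due and the job names are hypothetical, and a real setup would run the mover periodically against Redis instead.

```python
import heapq
import queue


def move_due(scheduled, ready, now):
    """Move every entry whose timestamp is <= now from the heap to the ready queue."""
    moved = 0
    while scheduled and scheduled[0][0] <= now:
        _, item = heapq.heappop(scheduled)
        ready.put(item)
        moved += 1
    return moved


scheduled = []                      # stands in for the sorted set: (ready_at, item)
heapq.heappush(scheduled, (30, "job-3"))
heapq.heappush(scheduled, (10, "job-1"))
heapq.heappush(scheduled, (20, "job-2"))

ready = queue.Queue()               # stands in for the list the workers pop from
moved = move_due(scheduled, ready, now=25)

popped = [ready.get_nowait(), ready.get_nowait()]  # what two workers would receive
```

Only the mover ever inspects timestamps, so the workers can use a plain blocking pop on the ready list without any coordination among themselves.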
Suppose you are running simulations and each simulation writes its results to an output.txt file. I want to run thousands of simulations using multithreading, but even with locking and unlocking I was still getting errors when multiple threads accessed the file at the same time.
To solve this, I am going to add the result texts to a queue that stores them. That is, each thread will add its result to this queue instead of writing it to the output.txt file. At the end, I'll take the stored texts from the queue and write them to output.txt.
My question is: when multiple threads add items to such a queue, could an error still happen, such as a missing simulation? I ask because when several threads increment a single value, the value will not be incremented as many times as expected unless you are careful (e.g. in a multithreaded for loop that adds 1 to a previously declared int a 1000 times, a will not necessarily be 1000 at the end; of course this can be prevented by other means).
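As an illustration of the collect-then-write idea, here is a minimal Python sketch. The question does not say which language is used, so Python, the function name run_simulation and the placeholder result text are all assumptions. It relies on queue.Queue, whose put and get are documented as safe to call from multiple threads.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

results = queue.Queue()  # thread-safe: concurrent put() calls do not lose items


def run_simulation(sim_id):
    # Placeholder for the real simulation; only the hand-off to the queue matters here.
    results.put(f"simulation {sim_id}: done")


# Run 1000 simulations on a small pool of worker threads.
with ThreadPoolExecutor(max_workers=8) as pool:
    for sim_id in range(1000):
        pool.submit(run_simulation, sim_id)
# Leaving the "with" block waits for all submitted simulations to finish.

# Drain the queue from a single thread; these lines could then be written to output.txt.
lines = []
while not results.empty():
    lines.append(results.get())
```

With a container designed for concurrent producers, every result arrives exactly once, although the order in which results arrive is not fixed.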
I am writing an app and need to do something functionally similar to what URL-shortening websites do. I will be generating 6-character (case-insensitive alphanumeric) random strings that identify the longer versions of the links. This gives 2176782336 possibilities ((10+26)^6). For assigning these strings, there are two approaches I can think of.
Approach 1: the system generates a random string at runtime and checks its uniqueness in the system; if it is not unique, it tries again until it eventually reaches a unique string. But this might create issues if the user is "unlucky".
Approach 2: I generate a pool of possible values in advance and assign them as soon as they are needed. This would ensure the user is always allocated a unique string almost instantly, but it would also mean doing plenty of computation in cron jobs beforehand, and that computation will grow over time.
While I already have the code to generate such values, advice on the approach would be helpful, as I am aiming for a very fast app experience. I could not find any comparative study on this.
Cheers!
What I do in similar situations is to keep N values queued up so that I can instantly assign them, and then when the queue's size falls below a certain threshold (say, .2 * N) I have a background task add another N items to the queue. It probably makes sense to start this background task as soon as your program starts (as opposed to generating the first N values offline and then loading them at startup), operating on the assumption that there will be some delay between startup and requests for values from the queue.
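A rough Python sketch of this refill-on-threshold scheme is below. The class name CodePool, the in-memory set used for the uniqueness check, and the synchronous refill are assumptions made to keep the example self-contained; in practice the refill would run as a background task and uniqueness would be checked against stored data.

```python
import random
import string
from collections import deque

ALPHABET = string.ascii_lowercase + string.digits  # 36 case-insensitive symbols


class CodePool:
    def __init__(self, batch_size, threshold=0.2, rng=None):
        self.batch_size = batch_size                    # N
        self.low_water = int(batch_size * threshold)    # refill when fewer than .2 * N remain
        self.rng = rng or random.Random()
        self.seen = set()    # stands in for the uniqueness check against stored codes
        self.pool = deque()
        self.refill()

    def refill(self):
        """Add batch_size fresh codes that have never been generated before."""
        added = 0
        while added < self.batch_size:
            code = "".join(self.rng.choices(ALPHABET, k=6))
            if code not in self.seen:
                self.seen.add(code)
                self.pool.append(code)
                added += 1

    def take(self):
        """Hand out a pre-generated code immediately."""
        code = self.pool.popleft()
        if len(self.pool) < self.low_water:
            self.refill()  # a real system would hand this to a background task
        return code


pool = CodePool(batch_size=50)
codes = [pool.take() for _ in range(200)]
```

The caller only ever pays for a popleft, and the retry loop from Approach 1 still exists but runs away from the request path.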
I mean, for example, thousands of users updating values in the database at the same time?
Yes, nextval is safe to use from multiple concurrently operating transactions. That is its purpose and its reason for existing.
That said, it is not actually "thread safe" as such, because PostgreSQL uses a multi-processing model not a multi-threading model, and because most client drivers (libpq, for example) do not permit more than one thread at a time to interact with a single connection.
You should also be aware that while nextval is guaranteed to return distinct and increasing values, it is not guaranteed to do so without "holes" or "gaps". Such gaps are created when a generated value is discarded without being committed (say, by a ROLLBACK) and when PostgreSQL recovers after a server crash.
While nextval will always return increasing numbers, this does not mean that your transactions will commit in the order in which they got IDs from a given sequence. It's thus perfectly normal to have something like this happen:
Start IDs in table: [1 2 3 4]
1st tx gets ID 5 from nextval()
2nd tx gets ID 6 from nextval()
2nd tx commits: [1 2 3 4 6]
1st tx commits: [1 2 3 4 5 6]
In other words, holes can appear and disappear.
Both these anomalies are necessary and unavoidable consequences of making one nextval call not block another.
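The two anomalies can be reproduced with a toy model in Python. This is not PostgreSQL code: the Sequence class is a hypothetical stand-in that only mimics the "hand out a value atomically and never take it back" behaviour, and the third and fourth transactions are added here to show a rollback.

```python
import itertools
import threading


class Sequence:
    """Toy stand-in for a database sequence: values are handed out atomically, never reused."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def nextval(self):
        with self._lock:
            return next(self._counter)


seq = Sequence(start=5)
table = [1, 2, 3, 4]        # committed IDs

id_tx1 = seq.nextval()      # 1st tx gets 5
id_tx2 = seq.nextval()      # 2nd tx gets 6
id_tx3 = seq.nextval()      # 3rd tx gets 7 (hypothetical extra transaction)

table.append(id_tx2)        # 2nd tx commits first
after_tx2 = sorted(table)   # temporary hole at 5

table.append(id_tx1)        # 1st tx commits
after_tx1 = sorted(table)   # the hole at 5 has closed

# 3rd tx rolls back: its value is discarded and the sequence does not rewind.
id_tx4 = seq.nextval()      # 4th tx gets 8
table.append(id_tx4)
final = sorted(table)       # permanent gap at 7
```

The sequence never waits for a transaction to finish, which is exactly why it cannot know whether a value it handed out will end up in the table.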
If you want a sequence without such ordering and gap anomalies, you need to use a gapless sequence design that permits only one transaction at a time to have an uncommitted generated ID, effectively eliminating all concurrency for inserts in that table. This is usually implemented using SELECT FOR UPDATE or UPDATE ... RETURNING on a counter table.
Search for "PostgreSQL gapless sequence" for more information.
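For a feel of the counter-table approach, here is a small Python sketch that uses the standard library's sqlite3 module as a stand-in for PostgreSQL, purely so that it runs without a server. The table name counter, the counter name invoice and the helper next_gapless are hypothetical. In PostgreSQL the UPDATE would take a row lock, so a second transaction asking for an ID would wait until the first commits or rolls back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counter VALUES ('invoice', 0)")
conn.commit()


def next_gapless(conn, name):
    """Increment the counter inside the caller's transaction and return the new value."""
    conn.execute("UPDATE counter SET value = value + 1 WHERE name = ?", (name,))
    row = conn.execute("SELECT value FROM counter WHERE name = ?", (name,)).fetchone()
    return row[0]


first = next_gapless(conn, "invoice")
conn.commit()                 # the ID is kept

discarded = next_gapless(conn, "invoice")
conn.rollback()               # the increment is undone together with the transaction

second = next_gapless(conn, "invoice")
conn.commit()                 # reuses the value that was rolled back
```

Because the counter lives in an ordinary table row, a rollback undoes the increment, which is what removes the gaps and also what forces concurrent inserts to queue up behind each other.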
Yes it is threadsafe.
From the manual:
nextval: Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
(Emphasis mine)
Yes: http://www.postgresql.org/docs/current/static/functions-sequence.html
It wouldn't be useful otherwise.
Edit:
Here is how you use nextval and currval:
nextval returns a new sequence number, you use this for the id in an insert on the first table
currval returns the last sequence number obtained by this session, you use that in foreign keys to reference the first table
each call to nextval returns another value, don't call it twice in the same set of inserts.
And of course, you should use transactions in any multiuser code.
This poster asked a different question about the same flawed code.
The point is: he does not seem to know how foreign keys work, and has them reversed (a sequence functioning as a foreign key is kind of awkward IMHO).
BTW: This should be a comment, not an answer, but I can't comment yet.
I need to be able to do the following:
search a linked list.
add a new node to the list in case it's not found.
be thread safe and use rwlock since it's read mostly list.
The issue I'm having is that when I promote from read_lock to write_lock, I need to search the list again to make sure some other thread wasn't waiting on a write_lock while I was searching the list holding the read_lock.
Is there a different way to achieve the above without doing a double list search (perhaps a seq_lock of some sort)?
Convert the linked list to a sorted linked list. When it's time to add a new node, you can check again whether another writer has added an equivalent node while you were acquiring the lock by inspecting only two nodes, instead of searching the entire list. You will spend a little more time on each node insertion because you need to determine the sorted position of the new node, but you will save time by not having to search the entire list. Overall you will probably save a lot of time.
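One way this could look is sketched below in Python, under several assumptions that are not in the question: nodes are never removed, a single mutex plays the role of the write lock, and the initial search runs without a lock (standing in for the read-locked phase). The names SortedList and add_if_absent are hypothetical. After acquiring the write lock, the re-check resumes from the remembered predecessor instead of from the head, so it normally touches only one or two nodes.

```python
import threading


class Node:
    __slots__ = ("key", "next")

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class SortedList:
    def __init__(self):
        self.head = Node(None)               # sentinel, never holds a real key
        self._write_lock = threading.Lock()  # stands in for the write side of the rwlock

    def _find_pred(self, key, start):
        """Walk from start to the last node whose successor has a key smaller than key."""
        node = start
        while node.next is not None and node.next.key < key:
            node = node.next
        return node

    def add_if_absent(self, key):
        pred = self._find_pred(key, self.head)    # full search (read phase)
        if pred.next is not None and pred.next.key == key:
            return False
        with self._write_lock:
            pred = self._find_pred(key, pred)     # short re-check from the old predecessor
            if pred.next is not None and pred.next.key == key:
                return False                      # another writer added it in the meantime
            pred.next = Node(key, pred.next)      # link the fully built node in one step
            return True

    def keys(self):
        out, node = [], self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


lst = SortedList()
added = [lst.add_if_absent(k) for k in (5, 1, 3, 3, 1)]
keys = lst.keys()
```

If nodes can also be removed, the remembered predecessor may no longer be in the list, so this shortcut would need extra care (for example a version counter that forces a full re-search after a removal).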
I have two threaded methods running in two separate places but sharing access at the same time to a list array object (let's call it PriceArray). The first thread adds and removes items from PriceArray when necessary (the content of the array gets updated from a third-party data provider), and the average update rate is between 0.5 and 1 second.
The second thread only reads (for now) the content of the array every 3 seconds using a foreach loop (it takes most items but not all of them).
To avoid the nasty "Collection was modified; enumeration operation may not execute" exception when the second thread loops through the array, I have wrapped the add and remove operations in the first thread with lock(PriceArray) to ensure exclusive access. The problem is that I have noticed a performance issue when the second method tries to loop through the array items, as most of the time the array is locked by the add/remove thread.
Having the scenario running this way, do you have any suggestions how to improve the performance using other thread-safety/exclusive access tactics in C# 4.0?
Thanks.
Yes, there are many alternatives.
The best/easiest would be to switch to using an appropriate collection in System.Collections.Concurrent. These are all thread-safe collections, and will allow you to use them without managing your own locks. They are typically either lock-free or use very fine grained locking, so will likely dramatically improve the performance impacts you're getting from the synchronization.
Another option would be to use ReaderWriterLockSlim to allow your readers to not block each other. Since a third party library is writing this array, this may be a more appropriate solution. It would allow you to completely block during writing, but the readers would not need to block each other during reads.
My guess is that ArrayList.Remove() takes most of the time, because to perform a deletion it does two costly things:
linear search: just takes elements one by one and compares with element being removed
when index of the element being removed is found - it shifts everything below it by one position to the left.
Thus every deletion takes time proportional to the number of elements currently in the collection.
So you should try to replace ArrayList with more appropriate structure for this task. I need more information about your case to suggest which one to choose.