I need a data structure to store 500k keys, each with some associated data. 150 threads will be running concurrently and accessing the keys. Once a day I need to update the data structure, because there may be manipulation operations: a key is deleted, a new key is added, or the data is changed.
While that update is in progress, I cannot block any of the 150 threads from accessing the structure.
I don't want to use existing hash implementations like memcached or Redis, since the number of keys may grow in the future and I want in-memory access for faster lookups.
Instead, I would prefer a data structure implementation in C/C++.
The Userspace RCU library contains a set of concurrent data structures implemented with the help of RCU. Among them is a lock-free resizable hash table based on the following articles:
Ori Shalev and Nir Shavit. Split-ordered lists: Lock-free extensible hash tables. J. ACM 53, 3 (May 2006), 379-405.
Maged M. Michael. High performance dynamic lock-free hash tables and list-based sets. In Proceedings of the fourteenth annual ACM Symposium on Parallel Algorithms and Architectures, ACM Press, 2002, 73-82.
For more information you can see the comments in the implementation at http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=rculfhash.c
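For illustration, a typical use of the rculfhash API looks roughly like the sketch below. The entry struct, key type and trivial identity hash are mine, and headers and flags can differ between library versions, so treat this as an outline and check urcu/rculfhash.h for the real signatures:

    #include <stdint.h>
    #include <stdlib.h>
    #include <urcu.h>            /* RCU read-side lock and thread registration */
    #include <urcu/rculfhash.h>  /* lock-free resizable hash table */

    struct entry {
        uint64_t key;
        void *data;                 /* associated payload */
        struct cds_lfht_node node;  /* hash table linkage */
    };

    static int match(struct cds_lfht_node *node, const void *key)
    {
        struct entry *e = caa_container_of(node, struct entry, node);
        return e->key == *(const uint64_t *)key;
    }

    /* Called from any registered reader thread; never blocks on writers. */
    static void *lookup(struct cds_lfht *ht, uint64_t key)
    {
        struct cds_lfht_iter iter;
        struct cds_lfht_node *node;
        void *data = NULL;

        rcu_read_lock();
        cds_lfht_lookup(ht, (unsigned long)key /* real code: hash the key */,
                        match, &key, &iter);
        node = cds_lfht_iter_get_node(&iter);
        if (node)
            data = caa_container_of(node, struct entry, node)->data;
        rcu_read_unlock();
        return data;
    }

    int main(void)
    {
        rcu_register_thread();   /* every thread touching the table */
        struct cds_lfht *ht = cds_lfht_new(1UL << 20, 1, 0,
                                           CDS_LFHT_AUTO_RESIZE, NULL);

        struct entry *e = calloc(1, sizeof(*e));
        e->key = 42;
        cds_lfht_node_init(&e->node);

        rcu_read_lock();
        cds_lfht_add(ht, (unsigned long)e->key, &e->node);
        rcu_read_unlock();

        lookup(ht, 42);
        rcu_unregister_thread();
        return 0;
    }

Deletions would go through cds_lfht_del() inside a read-side critical section, with the entry freed only after a grace period (call_rcu() or synchronize_rcu()), so concurrent readers never touch freed memory.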
LMDB (http://symas.com/mdb/) can handle this. Since it uses MVCC, writers don't block readers. You can update whatever, whenever, and your 150 reader threads will run just fine. LMDB reads perform no blocking operations whatsoever, and scale perfectly linearly across any number of CPUs.
(Disclaimer: I am the author of LMDB)
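To make that concrete, here is a minimal LMDB round-trip in C. Error checking is omitted, and the path, map size and keys are placeholders; it is a sketch of the API, not production code:

    #include <stdio.h>
    #include <string.h>
    #include <lmdb.h>

    int main(void)
    {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        mdb_env_create(&env);
        mdb_env_set_mapsize(env, 1UL << 30);  /* 1 GiB map; size to your data */
        mdb_env_set_maxreaders(env, 256);     /* leave room for 150 reader threads */
        mdb_env_open(env, "./db", 0, 0664);   /* directory "./db" must exist */

        /* Writer: one transaction for the daily update batch. */
        mdb_txn_begin(env, NULL, 0, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        key.mv_data  = "some-key"; key.mv_size  = strlen("some-key");
        data.mv_data = "payload";  data.mv_size = strlen("payload");
        mdb_put(txn, dbi, &key, &data, 0);
        mdb_txn_commit(txn);

        /* Reader: each reader thread opens its own read-only transaction and
         * sees a consistent MVCC snapshot; it never blocks on the writer. */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        if (mdb_get(txn, dbi, &key, &data) == 0)
            printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);
        mdb_txn_abort(txn);

        mdb_env_close(env);
        return 0;
    }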
I am a little lost in this task. There is a requirement for our caching solution to split a large data dictionary into partitions and perform operations on them in separate threads.
The scenario: we have a large pool of data that is to be kept in memory (40 million rows). The chosen strategy is an outer Dictionary with an int key, which contains 16 inner dictionaries keyed by Guid, each holding a data class.
The number 16 is calculated at startup as CPU core count * 4.
The data class contains a byte[] (basically a translated set of properties and their values), an int pointer into a metadata dictionary, and a checksum.
Then there is a set of control functions that takes care of locking and assigns/retrieves Guid-keyed data based on dividing the first segment of the Guid (8 hex digits) by a divider, which is simply FFFFFFFF / 16. This way each key has a corresponding partition assigned, as sketched below.
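For concreteness, the partition selection described above boils down to the arithmetic below (shown as a small C sketch; the names are illustrative and the same calculation carries over directly to .NET):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PARTITION_COUNT 16  /* CPU core count * 4, computed at startup */

    /* Map a GUID to a partition using its first segment (8 hex digits). */
    static unsigned partition_for_guid(const char *guid)
    {
        char prefix[9];
        uint32_t divider = 0xFFFFFFFFu / PARTITION_COUNT;
        uint32_t segment;
        unsigned idx;

        memcpy(prefix, guid, 8);   /* e.g. "3f2504e0" from "3f2504e0-4f89-..." */
        prefix[8] = '\0';
        segment = (uint32_t)strtoul(prefix, NULL, 16);

        idx = segment / divider;
        if (idx >= PARTITION_COUNT)    /* clamp the 0xFFFFFFFF edge case */
            idx = PARTITION_COUNT - 1;
        return idx;
    }

    int main(void)
    {
        printf("partition = %u\n",
               partition_for_guid("3f2504e0-4f89-11d3-9a0c-0305e82c3301"));
        return 0;
    }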
Now I need to figure out how to perform operations (key lookups, iterations and writes) on these dictionaries in separate threads, in parallel. Should I just wrap these operations in Tasks? Or would it be better to load each of these behemoth dictionaries whole into its own thread?
I have a rough idea of how to implement the data collectors; that will be the easy part, I guess.
Also, is using Dictionaries a good approach? Their size is limited to 3 million rows per partition, and if one is full, the control mechanism tries to insert on another server that uses the exact same mechanism.
Is .NET actually a good platform for implementing this solution?
Any help will be extremely appreciated.
Okay, so I added a ReaderWriterLockSlim and implemented concurrent access through System.Threading.Tasks. I also managed to eliminate the dataClass objects from storage; now it is only a dictionary of byte[]s.
It's able to store all 40 million rows in just under 4 GB of RAM, and through some careful SIMD-optimized manipulations it performs EQUALS, <, > and SUM iterations in under 20 ms, so I guess this issue is solved.
Also, the concurrency throughput is quite good.
I just wanted to post this in case anybody faces a similar issue in the future.
The problem
I have discovered that Cosmos DB is priced very aggressively and can be expensive if used with many data types.
I would think that a good structure would be to put each data type I have in its own collection, almost like tables in a database (though not quite).
However, each collection costs at least 24 USD per month. That is if I choose "Fixed", which limits me to 10 GB and is NOT scalable; that is hardly the point of Cosmos DB, so I would rather choose "Unlimited". But there the price is at least 60 USD per month.
60 USD per month per data type.
This includes 1000 RU/s, but on top of that I have to pay more for consumption.
This might be OK if I have a few data types, but if I have a fully fledged business application with 30 data types (not at all uncommon), it becomes at least 1800 USD per month. As a starting price. When I have no data yet.
The question
The structure of the data in the collection is not strict. I can store different types of documents in the same collection.
When using an "Unlimited" collection, I can use partition keys, which should be used to partition my data to ensure scalability.
However, why don't I just include the data type in the partition key?
Then the partition key becomes something like:
[customer-id]-[data-type]-[actual-partition-value, like 'state']
With one swift move, my minimum cost becomes 60 USD and the rest is based on consumption. Presumably, partition keys ensure satisfactory performance regardless of the data volume. So what am I missing? Is there some problem with this approach?
Update
Microsoft now supports sharing RUs across all containers (without a minimum of 10000 RU), so this question is essentially no longer relevant: you can now freely choose to separate data into different containers without any extra cost.
No, there will be no problem per se.
It all boils down to whether you're fine with having 1000 RU/s, or more specifically a single bottleneck, for your whole system.
In fact, you can simplify this even more by using your document id as the partition key. This guarantees the uniqueness of the document id and enables the maximum possible distribution and scale in Cosmos DB.
That's exactly how collection sharing works in Cosmonaut (disclaimer, I'm the creator of this project) and I have noticed no problems, even on systems with many different data types.
However, keep in mind that even though you can scale this collection up and down, you still restrict your whole system to this one bottleneck. I would recommend that you don't create just one collection but probably 2 or 3 collections with shared entities in them. If this is done smartly and you group entities in a logical way, then you can scale the throughput for specific parts of your system.
I built a Twitter clone, and the row that stores Justin Bieber’s profile (some very famous person with a lot of followers) is read incredibly often. The server that stores it seems to be overloaded. Can I buy a bigger server just for that row? By the way, it isn’t updated very often.
The short answer is that Cloud Spanner does not offer different server configurations, except to increase your number of nodes.
If you don't mind reading stale data, one way to increase read throughput is to use read-only, bounded-staleness transactions. This will ensure that your reads for these rows can be served from any replica of the split(s) that owns those rows.
If you wanted to go even further, you might consider a data modeling tradeoff that makes writes more expensive but reads cheaper. One way of doing that would be to manually shard that row (for example by creating N copies of it with different primary keys). When you want to read the row, a client can pick one to read at random. When you update it, just update all the copies atomically within a single transaction. Note that this approach is rarely used in practice, as very few workloads truly have the characteristics you are describing.
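A rough sketch of that read/write pattern, with a hypothetical key scheme and placeholder calls standing in for the actual Cloud Spanner client API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N_SHARDS 8  /* number of copies of the hot row (tune as needed) */

    /* Build the primary key of the i-th copy, e.g. "user#bieber#3".
     * The key scheme here is illustrative, not anything Spanner mandates. */
    static void shard_key(char *buf, size_t len, const char *base, int i)
    {
        snprintf(buf, len, "%s#%d", base, i);
    }

    int main(void)
    {
        char key[64];
        srand((unsigned)time(NULL));

        /* Read path: pick one copy at random so reads spread across splits. */
        shard_key(key, sizeof key, "user#bieber", rand() % N_SHARDS);
        printf("read  -> %s\n", key);      /* stand-in for a client read  */

        /* Write path: update every copy atomically in one transaction. */
        /* begin_transaction();               hypothetical client call   */
        for (int i = 0; i < N_SHARDS; i++) {
            shard_key(key, sizeof key, "user#bieber", i);
            printf("write -> %s\n", key);  /* stand-in for a client write */
        }
        /* commit_transaction();              hypothetical client call   */
        return 0;
    }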
How does hazelcast-jet achieve anything vastly different from what was previously achievable by submitting EntryProcessors on keys in an IMap?
Curious to know.
Quoting the InfoQ article on Jet:
Sending a runnable to a partition is analogous to the work of a single DAG vertex. The advantage of Jet comes from the ability to have the vertex transform the data it reads, producing items which no longer belong to the same partition, then reshuffle them while sending to the downstream vertex so they are again correctly partitioned. This is essential for any kind of map-reduce operation where the reducing unit must observe all the data items with the same key. To minimize network traffic, Jet can first reduce the data slice produced on the local member, then send only one item per key to the remote member that combines the partial results.
And note that this is just an advantage in the context of the same or similar use cases currently covered by entry processors. Jet can take data from any source and make use of the whole cluster's computational resources to process it.
I need to know the factors that need to be taken into consideration when implementing a solution using CouchDB. I understand that CouchDB does not require normalization and that the standard techniques I use in RDBMS development are mostly thrown away.
But what exactly are the costs involved? I perfectly understand the benefits, but the storage costs make me a bit nervous, as it appears CouchDB would need an awful lot of duplicated data, some of it going stale and out of date well before it is used. How would one manage stale data?
I know that I could implement some awful relational model with documents in CouchDB and lower the storage costs, but wouldn't this defeat the purpose of CouchDB and the performance I can gain?
An example I am thinking about is a system for requisitions, ordering and tendering. The system currently has the usual one-to-many relationships, and the "many" side might get updated more frequently than the "one".
Any help would be great, as I am an old-school RDBMS guy raised on the teachings of C. J. Date, E. F. Codd and R. F. Boyce, so I am struggling at the moment with the radical notion of document storage.
Does CouchDB have anything internal to recognize and reduce duplicate data?
Only you know how many copies of how much data you will use, so unfortunately the only good answer will be to build simulated data sets and measure the disk usage.
In addition, much like a file system, CouchDB needs extra storage for metadata. This cost depends on two factors:
How often you update or create a document
How often you compact
The worst-case instantaneous disk usage will be the total amount of data times two, plus all the old document revisions (#1) existing at compaction time (#2). This is because compaction builds a new database file containing only the current document revisions. Therefore the usage will be two copies of the current data (the old file plus the new file), plus all of the "wasted" old revisions awaiting deletion when compaction completes. After compaction, the old file is deleted, so you will reclaim over half of this worst-case value.
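To put rough (purely illustrative) numbers on it: with 10 GB of current document data and 3 GB of old revisions accumulated since the last compaction, peak usage during compaction would be about 2 x 10 GB + 3 GB = 23 GB, falling back to roughly 10 GB once the old file is deleted.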
Running compaction all the time to keep disk usage down is not a problem in itself; however, it has implications for disk I/O.