Hazelcast: Difference in data distribution across partitions in IMap and ISemaphore

My question comes from this page: https://hazelcast.org/mastering-hazelcast/#controlled-partitioning
It says:
Hazelcast has two types of distributed objects.
One type is the truly partitioned data structure, like the IMap, where
each partition will store a section of the Map.
The other type is a non-partitioned data structure, like the
IAtomicLong or the ISemaphore, where only a single partition is
responsible for storing the main instance.
Let's say I have put 500 records in an IMap; as I understand it, each record may go to a different partition.
Now, if I put 500 records in an ISemaphore, does the paragraph quoted above mean that all 500 records will go into a single partition?
Please help me understand the statement that for IAtomicLong or ISemaphore, only a single partition is responsible for storing the main instance.
I would also like to understand how ISemaphore and IMap differ when it comes to data distribution across partitions in Hazelcast.

With an IMap, it makes sense to partition the data structure because it will usually hold lots of items (500 in your example) and concurrent access to items anywhere in the map is frequently needed.
But data structures like ISemaphore and IAtomicLong are simple objects, not collections of objects - you can't add 500 records to an ISemaphore. The semaphore state consists of just a few fields (count, current owner, name, maybe a few others) and it doesn't make sense to break those apart and store them in separate partitions.
A queue is more interesting because it does hold multiple items, but is not a partitioned data structure. You could add 500 items to a queue, but access is always going to be to the front of the queue (for reading) or back of the queue (for writing), so distributing the data structure across partitions doesn't really offer improved concurrency as it does with Map, Set, List, and similar collections that are accessed randomly.
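To make the difference concrete, here is a minimal Java sketch (assuming the Hazelcast 3.x API that the quoted book describes; the object names "orders" and "licenses" are made up for illustration). The 500 map entries are spread across partitions by key hash, while the semaphore is one object whose whole state lives on a single partition:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.core.ISemaphore;

public class PartitioningDemo {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Partitioned: each entry is routed to a partition by its key hash,
        // so these 500 entries end up spread across the cluster.
        IMap<Integer, String> orders = hz.getMap("orders");
        for (int i = 0; i < 500; i++) {
            orders.put(i, "order-" + i);
        }

        // Non-partitioned: the semaphore is a single object (a permit count plus
        // some bookkeeping), so its entire state is stored on one partition,
        // chosen by the hash of its name.
        ISemaphore licenses = hz.getSemaphore("licenses");
        licenses.init(500); // 500 permits, not 500 records
    }
}

Note that init(500) gives the semaphore 500 permits; it does not add 500 records the way map.put does, which is why there is nothing to distribute.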

Related

How hazelcast-jet achieves anything different from hazelcast EntryProcessors

How does hazelcast-jet achieve anything vastly different from what was earlier achievable by submitting EntryProcessors on keys in an IMap?
Curious to know.
Quoting the InfoQ article on Jet:
Sending a runnable to a partition is analogous to the work of a single DAG vertex. The advantage of Jet comes from the ability to have the vertex transform the data it reads, producing items which no longer belong to the same partition, then reshuffle them while sending to the downstream vertex so they are again correctly partitioned. This is essential for any kind of map-reduce operation where the reducing unit must observe all the data items with the same key. To minimize network traffic, Jet can first reduce the data slice produced on the local member, then send only one item per key to the remote member that combines the partial results.
And note that this is just an advantage in the context of the same or similar use cases currently covered by entry processors. Jet can take data from any source and make use of the whole cluster's computational resources to process it.
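As a rough illustration of that reshuffling, here is a sketch using the Jet pipeline API (assuming Jet 4.x; the map names "events" and "counts" are invented). The grouped aggregation reads each IMap partition locally, re-partitions items by the grouping key, combines partial results per key, and writes them back:

import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.aggregate.AggregateOperations;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.util.Map;

public class CountByKey {
    public static void main(String[] args) {
        JetInstance jet = Jet.newJetInstance();

        Pipeline p = Pipeline.create();
        p.readFrom(Sources.<String, String>map("events")) // each member reads its local partitions
         .groupingKey(Map.Entry::getKey)                  // items are reshuffled by key here
         .aggregate(AggregateOperations.counting())       // local partial counts, then combined per key
         .writeTo(Sinks.map("counts"));                   // results land in a partitioned IMap again

        jet.newJob(p).join();
    }
}

An EntryProcessor, by contrast, stays pinned to the partition of the key it was submitted for and cannot reshuffle data across keys like this.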

Azure storage table records filtering advice

Imagine something like a blog posting system, built using Azure Storage Table.
A user posts a message and the database records the user's Region, City and Language along with it.
After that, a user is able to browse all other users' posts and filter them by any combination of Region, City and Language, or by none and see all of them.
I see several solutions:
Put each message in 8 different partitions with combinations of Region-City-Language (pros: lightning fast point queries on read; cons: 8 transactions per message on write).
Put each message in 4 different partitions with combinations of Region-City and an ability to do partition scan to filter by languages (pros: less transactions than (1); cons: partition scan, 4 transactions per message).
Put each message in partitions, based on user's ID (pros: single transaction per message; cons: slow table scan and partition scan after that).
The way I see it:
Fast reads, slow (and perhaps costly) writes.
Balanced reads/writes/cost.
Fast writes, slow (but cheap) reads.
By "cost/cheap" i mean pricing based on transactions (not space).
And by "balanced" i mean just among these variants.
Thought about using index tables, but can't see how they help here.
So the question is, perhaps there is another, better way?
I've decided to go with a variation of (1).
The difference is that I won't be storing ALL combinations of Region-Location-Language. Instead, I decided to store only the unique ones:
Table: FiltersByRegion
----------------------
Partition: Region
Row: Location.Language
Prop: Message
Table: FiltersByRegionPlace
---------------------------
Partition: Region.Location
Row: Language
Prop: Message
Table: FiltersByRegionLanguage
------------------------------
Partition: Region.Language
Row: Location
Prop: Message
Table: FiltersByLanguage
------------------------
Partition: Language
Row: Region.Location
Prop: Message
Because I'm storing only unique combinations, there won't be many transactions per post - only those that are not already present in the database.
In other words, if there are a lot of posts from the same region-location-language, the filter tables won't be updated and transactions won't be spent. Tests for uniqueness could use Redis to speed things up a bit.
Filtering is now only a matter of picking the right table.
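For what it's worth, a minimal sketch of writing into one of these filter tables with the legacy Azure Storage Java SDK might look like the following (the entity class name, connection string variable and sample values are assumptions; only the FiltersByRegion table and its Region / Location.Language key layout come from the design above):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.table.CloudTable;
import com.microsoft.azure.storage.table.TableOperation;
import com.microsoft.azure.storage.table.TableServiceEntity;

public class FilterTableWriter {

    public static class MessageEntity extends TableServiceEntity {
        private String message;

        public MessageEntity() { }                        // required by the SDK

        public MessageEntity(String region, String locationLanguage, String message) {
            this.partitionKey = region;                   // Partition: Region
            this.rowKey = locationLanguage;               // Row: Location.Language
            this.message = message;                       // Prop: Message
        }

        public String getMessage() { return message; }
        public void setMessage(String message) { this.message = message; }
    }

    public static void main(String[] args) throws Exception {
        CloudStorageAccount account = CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONN"));
        CloudTable table = account.createCloudTableClient().getTableReference("FiltersByRegion");
        table.createIfNotExists();

        // Illustrative values only.
        MessageEntity entity = new MessageEntity("Europe", "Paris.French", "message-id-123");
        table.execute(TableOperation.insertOrReplace(entity));
    }
}

insertOrReplace keeps the table free of duplicate rows, though each call still counts as a transaction, which is why the post suggests checking uniqueness (for example in Redis) before writing.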
It depends on your scenarios and read/write pattern. You might want to consider some aspects:
Design for how the records will be queried. Putting them into a "Region-City-Language" partition with the message ID as entity data may help with your fast queries.
Each message may have a unique message ID, and ID-to-message mappings are saved in other tables; then every time a message is updated you only need to update one table, and the message ID referenced in the other tables stays unchanged.
Leverage PartitionKey and RowKey in your table design and query entities with both keys. For instance: "Region-City-Language" as partition keys and "User" as row keys.
Consider storing duplicate copies of entities for query scenarios. For example, if you have heavy user-based and language-based queries, you may consider having two tables with "user" and "language" as keys respectively.
Please also refer to https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/ for a full guide.

How concurrent are CQL's collections?

I am taking a look at Cassandra's CQL collections (list, set, map) and I can't find a reliable source stating on their concurrency.
I want to know if having multiple writers adding different elements to the same set is supported.
From what I read of the implementation (http://www.opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure-sets-lists-and-maps/) it seems that sets are implemented using columns, so I should be safe.
But on the other hand, I've read here and there that operations on collections always trigger a full read (even for writes). This suggests that I could get in trouble if multiple writers were using the same collection.
So can I (and should I) use collections from multiple writers? (Also, the documentation mentions that collections should be used for "small amounts of data"; how much would that be? Tens, hundreds, thousands?)
Updates are atomic, which includes any collections in the row, so it should be fine to have multiple writers.
"In an UPDATE statement, all updates within the same partition key are applied atomically and in isolation."
http://cassandra.apache.org/doc/cql3/CQL.html#updateStmt
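To make that concrete, here is a small sketch with the DataStax Java driver 3.x (the keyspace, table, and column names are made up): two statements adding different elements to the same set column can come from different writers, and each is applied atomically within the partition.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ConcurrentSetWriters {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Session session = cluster.connect("blog");

            // Two writers appending different elements to the same set column;
            // each UPDATE is applied atomically within the partition.
            session.execute("UPDATE posts SET tags = tags + {'cassandra'} WHERE id = 42");
            session.execute("UPDATE posts SET tags = tags + {'cql'} WHERE id = 42");
        }
    }
}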
"Values of items in collections are limited to 64K"
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddlWhenCollections.html

Azure Table Storage Partition Key

Two somewhat related questions.
1) Is there any way to get an ID of the server a table entity lives on?
2) Will using a GUID give me the best partition key distribution possible? If not, what will?
We have been struggling for weeks with table storage performance. In short, it's really bad, but early on we realized that using a random-ish partition key will distribute the entities across many servers, which is exactly what we want to do as we are trying to achieve 8000 reads per second. Apparently our partition key wasn't random enough, so for testing purposes I have decided to just use a GUID. First impression is that it is waaaaaay faster.
"Really bad" get performance is < 1000 per second. The partition key is Guid.NewGuid() and the row key is the constant "UserInfo". The get is executed using a TableOperation with pk and rk, nothing else, as follows: TableOperation retrieveOperation = TableOperation.Retrieve(pk, rk); return cloudTable.ExecuteAsync(retrieveOperation);. We always use indexed reads and never table scans. Also, VM size is medium or large, never anything smaller. Parallel no, async yes.
As other users have pointed out, Azure Tables are strictly controlled by the runtime and thus you cannot control or check which specific storage nodes are handling your requests. Furthermore, any given partition is served by a single server; that is, entities belonging to the same partition cannot be split between several storage nodes (see the quote below):
In Windows Azure table, the PartitionKey property is used as the partition key. All entities with same PartitionKey value are clustered together and they are served from a single server node. This allows the user to control entity locality by setting the PartitionKey values, and perform Entity Group Transactions over entities in that same partition.
You mention that you are targeting 8000 requests per second? If that is the case, you might be hitting a threshold that requires very good table/partitionkey design. Please see the article "Windows Azure Storage Abstractions and their Scalability Targets"
The following extract is applicable to your situation:
This will provide the following scalability targets for a single storage account created after June 7th 2012.
Capacity – Up to 200 TBs
Transactions – Up to 20,000 entities/messages/blobs per second
As other users pointed out, if your PartitionKey numbering follows an incremental pattern, the Azure runtime will recognize this and group some of your partitions within the same storage node.
Furthermore, if I understood your question correctly, you are currently assigning partition keys via GUIDs? If that is the case, this means that every PartitionKey in your table will be unique, so every partition will contain no more than one entity. As per the articles above, the way Azure Table scales out is by grouping entities by their partition keys across independent storage nodes. If your partition keys are unique and thus contain no more than one entity, this means that Azure Table would scale out only one entity at a time! Now, we know Azure is not that dumb, and it groups partition keys when it detects a pattern in the way they are created. So if you are hitting this trigger and Azure is grouping your partition keys, your scalability is limited to the smartness of this grouping algorithm.
As per the scalability targets above for 2012, each partition key should be able to give you 2,000 transactions per second. Theoretically, you should need no more than 4 partition keys in this case (assuming that the workload is distributed equally between the four).
I would suggest you design your partition keys to group entities in such a way that no more than 2,000 entities per second per partition are reached, and stop using GUIDs as partition keys. This will allow you to better support features such as Entity Group Transactions, reduce the complexity of your table design, and get the performance you are looking for.
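For example, one hedged way to do that is to hash the natural key into a small, fixed number of bucket partitions instead of using a fresh GUID per entity (the bucket count, key format, and method names below are purely illustrative):

public final class PartitionKeys {

    // Illustrative bucket count: 4 partitions x ~2,000 ops/s would cover the
    // 8,000 reads/s target; 8 leaves some headroom.
    private static final int BUCKETS = 8;

    // Derive a stable partition key from the natural key (e.g. a user id),
    // so the same entity always lands in the same partition.
    public static String partitionKeyFor(String userId) {
        int bucket = Math.floorMod(userId.hashCode(), BUCKETS);
        return String.format("user-bucket-%02d", bucket);
    }

    public static void main(String[] args) {
        System.out.println(partitionKeyFor("alice"));  // e.g. user-bucket-05
        System.out.println(partitionKeyFor("bob"));
    }
}

The row key can then remain the user id itself, so point reads still supply both PartitionKey and RowKey, and entity group transactions stay possible within a bucket.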
Answering #1: There is no concept of a server that a particular table entity lives on. There are no particular servers to choose from, as Table Storage is a massive-scale multi-tenant storage system. So... there's no way to retrieve a server ID for a given table entity.
Answering #2: Choose a partition key that makes sense for your application. Just remember that it's partition+row to access a given entity. If you do that, you'll have a fast, direct read. If you attempt to do a table- or partition-scan, your performance will certainly take a hit.
See http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx for more info on key selection (note the numbers are 3 years old, but the guidance is still good).
Also, this talk may be of some use as far as best practices go: http://channel9.msdn.com/Events/TechEd/NorthAmerica/2013/WAD-B406#fbid=lCN9J5QiTDF.
In general a given partition can support up to 2000 tps, so spreading data across partitions will help achieve greater numbers. Something to consider is that atomic batch transactions only apply to entities that share the same partitionkey. Additionally, for smaller requests you may consider disabling Nagle as small requests may be getting held up at the client layer.
From the client end, I would recommend using the latest client lib (2.1) and Async methods as you have literally thousands of requests per second. (the talk has a few slides on client best practices)
Lastly, the next release of storage will support JSON and JSON without metadata, which will dramatically reduce the size of the response body for the same objects, and subsequently the CPU cycles needed to parse them. If you use the latest client libs, your application will be able to leverage these behaviors with little to no code change.

update 40+ million entities in azure table with many instances how to handle concurrency issues

So here is the problem. I need to update about 40 million entities in an Azure table. Doing this with a single instance (select -> delete original -> insert with new partition key) will take until about Christmas.
My thought is to use an Azure worker role with many instances running. The problem here is that the query grabs the top 1000 records. That's fine with one instance, but with 20 running, their selects will obviously overlap... a lot. This would result in a lot of wasted compute trying to delete records that were already deleted by another instance and updating records that have already been updated.
I've run through a few ideas, but the best option I have is to have the roles fill up a queue with partition and row keys, then have the workers dequeue and do the actual processing.
Any better ideas?
Very interesting question!!! Extending @Brian Reischl's answer (and a lot of it is thinking out loud, so please bear with me :))
Assumptions:
Your entities are serializable in some shape or form. I would assume that you'll get raw data in XML format.
You have one separate worker role which is doing all the reading of entities.
You know how many worker roles would be needed to write modified entities. For the sake of argument, let's assume it is 20 as you mentioned.
Possible Solution:
First you will create 20 blob containers. Let's name them container-00, container-01, ... container-19.
Then you start reading entities, 1000 at a time. Since you're getting raw data in XML format out of table storage, you create an XML file and store those 1000 entities in container-00. You fetch the next set of entities and save them in XML format in container-01, and so on and so forth until you hit container-19. Then the next set of entities goes into container-00. This way you're evenly distributing your entities across all 20 containers.
Once all the entities are written, your worker role for processing these entities comes into the picture. Since we know that instances in Windows Azure are sequentially ordered, you get instance names like WorkerRole_IN_0, WorkerRole_IN_1, and so on.
What you would do is take the instance name and extract the number: "0", "1", etc. Based on this you determine which worker role instance will read from which blob container: WorkerRole_IN_0 will read files from container-00, WorkerRole_IN_1 will read files from container-01, and so on (see the sketch below).
Now each worker role instance will read an XML file, create the entities from that XML file, update those entities, and save them back into table storage. Once this process is done, it deletes the XML file and moves on to the next file in that container. Once all files are read and processed, you can just delete the container.
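Here is a small sketch of the instance-name-to-container mapping described above (the container naming follows the 20-container layout; how the instance name is obtained in a real worker role, e.g. from the role environment, is left out as an assumption):

public final class ContainerAssignment {

    // Maps a worker role instance name such as "WorkerRole_IN_7" to the blob
    // container it should drain, e.g. "container-07".
    public static String containerFor(String instanceName) {
        int idx = Integer.parseInt(instanceName.substring(instanceName.lastIndexOf('_') + 1));
        return String.format("container-%02d", idx);
    }

    public static void main(String[] args) {
        System.out.println(containerFor("WorkerRole_IN_0"));   // container-00
        System.out.println(containerFor("WorkerRole_IN_13"));  // container-13
    }
}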
As I said earlier, this is very much a "thinking out loud" kind of solution, and some things must be considered, like what happens when the "reader" worker role goes down, among other things.
If your PartitionKeys and/or RowKeys fall into a known range, you could attempt to divide them into disjoint sets of roughly equal size for each worker to handle. eg, Worker1 handles keys starting with 'A' through 'C', Worker2 handles keys starting with 'D' through 'F', etc.
If that's not feasible, then your queuing solution would probably work. But again, I would suggest that each queue message represent a range of keys if possible. eg, a single queue message specifies deleting everything in the range 'A' through 'C', or something like that.
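A rough sketch of that queueing idea with the legacy Azure Storage Java SDK might look like this (the queue name, the "A|C" range message format, and the connection string variable are all assumptions):

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.queue.CloudQueue;
import com.microsoft.azure.storage.queue.CloudQueueMessage;

public final class RangeWorkQueue {
    public static void main(String[] args) throws Exception {
        CloudStorageAccount account = CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONN"));
        CloudQueue queue = account.createCloudQueueClient().getQueueReference("repartition-work");
        queue.createIfNotExists();

        // Producer: one message per key range instead of per entity.
        queue.addMessage(new CloudQueueMessage("A|C"));
        queue.addMessage(new CloudQueueMessage("D|F"));

        // Worker: pull a batch of range messages and process each range.
        for (CloudQueueMessage msg : queue.retrieveMessages(32)) {
            String[] range = msg.getMessageContentAsString().split("\\|");
            // ... query, re-insert, and delete everything with keys in [range[0], range[1]] ...
            queue.deleteMessage(msg);
        }
    }
}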
In any case, if you have multiple entities in the same PartitionKey then use batch transactions to your advantage for both inserting and deleting. That could cut down the number of transactions by almost a factor of ten in the best case. You should also use parallelism within each worker role. Ideally use the async methods (either Begin/End or *Async) to do the writing, and run several transactions (12 is probably a good number) in parallel. You can also run multiple threads, but that's somewhat less efficient. In either case, a single worker can push a lot of transactions with table storage.
As a side note, your process should go "Select -> Insert New -> Delete Old". Going "Select -> Delete Old -> Insert New" could result in permanent data loss if a failure occurs between steps 2 & 3.
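Putting the last two points together, here is a hedged sketch using the legacy Azure Storage Java SDK: the entities in each batch share a partition key, the new copies are inserted before the originals are deleted, and real code would additionally cap batches at 100 operations and handle retries (the class and method names are illustrative):

import com.microsoft.azure.storage.table.CloudTable;
import com.microsoft.azure.storage.table.DynamicTableEntity;
import com.microsoft.azure.storage.table.TableBatchOperation;

import java.util.List;

public final class Repartitioner {

    // Moves a page of entities (all sharing the OLD partition key) to a new partition key:
    // insert the new copies first, and only then delete the originals, so a failure in
    // between leaves duplicates rather than data loss.
    public static void moveBatch(CloudTable table, List<DynamicTableEntity> oldEntities,
                                 String newPartitionKey) throws Exception {
        TableBatchOperation inserts = new TableBatchOperation();
        for (DynamicTableEntity e : oldEntities) {
            DynamicTableEntity copy = new DynamicTableEntity(newPartitionKey, e.getRowKey());
            copy.setProperties(e.getProperties());
            inserts.insert(copy);
        }
        table.execute(inserts);          // step 1: insert new

        TableBatchOperation deletes = new TableBatchOperation();
        for (DynamicTableEntity e : oldEntities) {
            deletes.delete(e);           // entities still carry their original ETags
        }
        table.execute(deletes);          // step 2: delete old
    }
}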
I think you should mark your question as the answer ;) I can't think of a better solution since I don't know what your partition and row keys look like. But to enhance your solution, you may choose to pump multiple partition/row keys into each queue message to save on transaction costs. Also, when consuming from the queue, get messages in batches of 32 and process them asynchronously. I was able to transfer 170 million records from SQL Server (Azure) to Table storage in less than a day.
