Does GridGain distributed queue support transactions?

I'm trying to use a transaction with a distributed queue in GridGain, but didn't find anything about it in the documentation or Javadoc.
If it is supported, can someone share some sample code? I tried the following and it doesn't seem to work.
Here is my test code:
GridCache<Object, Object> cache = g.cache("myCache");
GridCacheDataStructures dataStruct = cache.dataStructures();
GridCacheQueue<String> queue = dataStruct.queue("myQueue", 0, false, true);

GridCacheTx tx = cache.txStart();
// ... queue.poll(), etc. ...
tx.commit();

The short answer is no: GridGain queue operations, although transactional/atomic themselves, currently cannot participate in user transactions.
However, as far as I know, this should not be that hard to support, and it will likely be added in the near future.
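Since each queue operation is atomic on its own, the queue can simply be used outside of any cache transaction. A minimal sketch, reusing the queue from the code above (the item value is illustrative):

// Queue operations are atomic by themselves; no cache.txStart()/tx.commit() around them.
queue.offer("item-1");
String head = queue.poll();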

Related

How does the spark.streaming.kafka.consumer.cache.enabled property work / affect the performance of Kafka consumers?

I came across the setting spark.streaming.kafka.consumer.cache.enabled=false in the properties of our application, and surprisingly no one on my team knows how it helps us achieve better performance. It was added on the advice of Cloudera support. I couldn't find any detailed explanation of this property in the Spark docs. Can anyone help me understand how this configuration affects Kafka consumer performance?
Looking at the source code, you can see that it has a useCache: Boolean value, and it seems to put internal KafkaConsumer objects into this cache keyed on the group id and topic+partition assignments.
I don't have any idea why not caching consumers would be "more performant", but I could guess that not having them cached allows Kafka consumer group rebalancing to operate "better".
If you think this property is missing necessary documentation, I would suggest opening a JIRA.
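For reference, a minimal sketch of how such a property is typically set on the Spark configuration (the property name comes from the question; the class name, app name, and the rest of the setup are illustrative):

import org.apache.spark.SparkConf;

public class StreamingConfigSketch {
    public static void main(String[] args) {
        // Disable the internal KafkaConsumer cache used by the Kafka integration.
        SparkConf conf = new SparkConf()
                .setAppName("kafka-streaming-app") // illustrative app name
                .set("spark.streaming.kafka.consumer.cache.enabled", "false");
        // ... build the StreamingContext / direct stream with this conf as usual ...
    }
}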

Azure Cosmos DB: I cannot get more than 1500 RUs

I have an application that requires a lot of RUs, but for some reason I cannot get the client app to handle more than 1000-1500 RUs, although the collection is set to 10000 RUs. Obviously I can add more clients, but I need one client to give me at least 10000 RUs and then scale from there.
My requests are simple:
var query = connection.CreateDocumentQuery<DocumentDBProfile>(
    CollectionUri, // cached
    "SELECT * FROM Col1 WHERE Col1.key = '" + partitionKey + "' AND Col1.id ='" + id + "'",
    new FeedOptions
    {
        MaxItemCount = -1,
        MaxDegreeOfParallelism = 10000000,
        MaxBufferedItemCount = 1000,
    }).AsDocumentQuery();
var dataset = await query.ExecuteNextAsync().ConfigureAwait(false);
The query above hits 150,000 partitions, each within its own task (all awaited at the end), and the client is initialized with TCP and direct mode:
var policy = new ConnectionPolicy
{
    EnableEndpointDiscovery = false,
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp,
};
The CPU on the client appears to max out, mostly servicing the call to query.ExecuteNextAsync().
Am I doing anything wrong? Any optimization tips? Is there a lower-level API I can use? Is there a way to pre-parse queries or make JSON parsing more efficient?
UPDATE
I was able to get up to 3000-4000 RUs on one client by lowering the number of concurrent requests and stripping my deserialized class down to a single property (id), but I am still at only 10% of the 50,000 RUs mentioned in the performance guidelines.
Not sure what else I could do. Are there any security checks or overhead I can disable in the .NET SDK?
UPDATE 2
All our tests run on Azure in the same region, on D11_v2 instances. Running multiple clients scales well, so we are client-bound, not server-bound.
Still not able to achieve more than 10% of the performance outlined in the Cosmos DB performance guideline.
By default the SDK will use a retry policy to mask throttling errors. Have you looked at the RU metrics available in the Azure portal to confirm whether you are being throttled or not? For more details on this, see the tutorial here.
Not sure why the REST API would perform better than the .NET SDK. Can you give some more details on the operation you used here?
The example query you provided queries a single document with a known partition key and id per request. For this kind of point-read operation, it would be better to use DocumentClient.ReadDocumentAsync, as it should be cheaper than a query.
It sounds like your sole purpose has become to disprove Microsoft's documentation. Don't overrate this "50,000 RU/s" value when deciding how to scale your clients.
I don't think you can get a faster or lower-level API than the .NET SDK with TCP and direct mode. The critical part is to use the TCP protocol (which you are). Only the Java SDK also has direct mode, and I doubt it's faster. Maybe .NET Core...
How can your requirement be to "have large RU/s"? That is equivalent to "the application should require us to pay $X for Cosmos DB every month". The requirement should rather be "needs to complete X queries per second" or something like that. You then go on from there. See also the request unit calculator.
A request unit is the cost of your transaction. It depends on how large your documents are, how your collection is configured, and what you are doing. Inserting documents is usually much more expensive than retrieving data. Retrieving data across partitions within one query is more expensive than touching only a single one. A rule of thumb is that writing data is about 5 times more expensive than reading it.
I suggest you read the documentation about request units.
The problem with Microsoft's performance tip is that they don't mention which kind of request should reach those RU/s. I would not expect it to mean: "The most basic request possible will not max out the CPU on the client system if you are still below 50,000 RU/s". Inserting data will get you to those numbers much more easily. I did a very quick test on my local machine and got the official benchmarking sample up to about 7-8k RU/s using TCP + direct mode. I did not do anything apart from downloading the code and running it from Visual Studio. So my guess would be that the tips are also about inserting, since the performance testing examples are as well. Incidentally, that example achieves 100,000 RU/s.
There are some good samples from Azure about "Benchmarking" and "Request Units". They should also be good sources for further experiments.
Only one actual tip on how to improve your query: maybe ditch deserialization into your class by using CreateDocumentQuery(..) or CreateDocumentQuery<dynamic>. That could help your CPU; my first guess would be that it is spending a good deal of time on deserialization.
Hope this helps in any way.

Does Hazelcast support bulk set or asynchronous bulk put operations?

I have a use case where I insert a lot of data during a big calculation, and it really doesn't have to be available in the cluster immediately (so the cluster can synchronize as we go).
Currently I'm inserting batches using the putAll() operation, which blocks and takes time.
I've read a blog post about the efficiency of the set() operation, but there is no analogous setAll(). I also saw putAsync(), but there is no matching putAllAsync() (I'm not interested in the returned future).
Am I overlooking something? How can I improve insertion performance?
EDIT: Feature request: https://github.com/hazelcast/hazelcast/issues/5337
I think you're right, they're missing. Could you create a feature request? Maybe you're also interested in helping to implement them through the Hazelcast Incubator.
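Until such bulk-async methods exist, one possible workaround (a sketch only, assuming the Hazelcast 3.x IMap API; the map name and data are illustrative) is to issue a putAsync() per entry and deliberately ignore the returned futures, so the loop itself does not block:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import java.util.HashMap;
import java.util.Map;

public class AsyncBatchInsertSketch {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap("myMap"); // illustrative map name

        Map<String, String> batch = new HashMap<String, String>(); // illustrative data
        batch.put("key-1", "value-1");
        batch.put("key-2", "value-2");

        for (Map.Entry<String, String> e : batch.entrySet()) {
            // Fire-and-forget: the future returned by putAsync is ignored on purpose,
            // so the caller does not wait for each entry to be confirmed by the cluster.
            map.putAsync(e.getKey(), e.getValue());
        }
    }
}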

How can I use parallel transactions in Neo4j?

I am currently working on an application using Neo4j as an embedded database.
I am wondering how to make sure that separate threads use separate transactions. Normally, I would assign database operations to a transaction, but the code examples I found don't show how to ensure that write operations use separate transactions:
try (Transaction tx = graphDb.beginTx()) {
    Node node = graphDb.createNode();
    tx.success();
}
Since graphDb is supposed to be used as a thread-safe singleton, I really don't see how that would work... (e.g. for several users creating shopping lists in separate transactions).
I would be grateful if someone could point out where I misunderstand the concept of transactions in Neo4j.
Best regards and many thanks in advance,
Oliver
The code you posted will run in separate transactions if executed by multiple threads, one transaction per thread.
The way this is achieved (and it's quite a common pattern) is by storing the transaction state in a ThreadLocal (read its Javadoc and things will become clear).
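To make this concrete, a minimal sketch (the wrapper class, method name, thread count, and node creation are illustrative) of several threads sharing the singleton GraphDatabaseService from the question while each opens its own transaction:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelTxSketch {
    // graphDb is the shared, thread-safe singleton from the question.
    static void createNodesInParallel(final GraphDatabaseService graphDb) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // illustrative thread count
        for (int i = 0; i < 4; i++) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    // beginTx() binds the new transaction to the calling thread (via ThreadLocal),
                    // so each worker gets its own independent transaction.
                    try (Transaction tx = graphDb.beginTx()) {
                        graphDb.createNode();
                        tx.success();
                    }
                }
            });
        }
        pool.shutdown();
    }
}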
Neo4j Transaction Management
In order to fully maintain data integrity and ensure good transactional behavior, Neo4j supports the ACID properties:
atomicity: If any part of a transaction fails, the database state is left unchanged.
consistency: Any transaction will leave the database in a consistent state.
isolation: During a transaction, modified data cannot be accessed by other operations.
durability: The DBMS can always recover the results of a committed transaction.
Specifically:
All database operations that access the graph, indexes, or the schema must be performed in a transaction.
Here are some useful links for understanding Neo4j transactions:
http://neo4j.com/docs/stable/rest-api-transactional.html
http://neo4j.com/docs/stable/query-transactions.html
http://comments.gmane.org/gmane.comp.db.neo4j.user/20442

Transactional operation with SaveChanges and ExecuteStoreCommand

I have a problem that I would like to share. The context is a bit messy, so I will do my best to explain it.
I need to create a transactional operation over a number of entities. I'm working with EF Code First but with a legacy database that I can't change. In order to create a more consistent model than the database provides, I'm projecting the database information into more refined entities I created on my own.
As I need to use different contexts, my initial idea was to use TransactionScope, which gave me good results in the past. Why do I need different contexts? Due to various problems with the database, I can't make the updates in a single operation (unit of work). I need to retrieve different IDs, which only appear after SaveChanges().
using (var scope = new TransactionScope())
{
    Operation1();
    Operation2();
    Operation3(); // uses ExecuteStoreCommand
    SaveChanges();
    Operation4();
    SaveChanges();
}
I know that, in order to use TransactionScope, I need to share the same connection among all the operations (and I'm doing that, passing the context to the objects). However, when I execute one of the operations (the one that uses ExecuteStoreCommand), or when I try to do an update after the first SaveChanges, I always get the MSDTC error (support for distributed transactions is disabled) or, more rarely, errors about unloaded domains.
I don't know if someone can help me, at least by pointing out the best direction for this scenario.
Have a look at this answer:
Entity Framework - Using Transactions or SaveChanges(false) and AcceptAllChanges()?
The answer does exactly what you require: a transaction over multiple data contexts.
I also found this post on Transactions and Connections in Entity Framework 4.0 really helpful.
For people who may need a simpler solution, here's what I use when I need to mix ExecuteStoreCommand and SaveChanges in a transaction.
using (var dataContext = new ContextEntities())
{
    dataContext.Connection.Open();
    var trx = dataContext.Connection.BeginTransaction();

    var sql = "DELETE TestTable WHERE SomeCondition";
    dataContext.ExecuteStoreCommand(sql);

    var list = CreateMyListOfObjects(); // this could throw an exception
    foreach (var obj in list)
        dataContext.TestTable.AddObject(obj);

    dataContext.SaveChanges(); // this could throw an exception
    trx.Commit();
}
