Kogito Apps with distributed transactions with Data Index

We are using the Kogito runtime and Data Index, and we need distributed transaction management for the process.
What we see is that the domain object is persisted but the process instance is not created if there is an issue with Kafka. Similarly, the process instance gets created but the domain objects are not persisted if there is an issue with Kogito runtime persistence.
What can we do to get correct rollback across the apps?
How do we enable distributed transaction management between the Kogito apps and the Kogito runtime, especially the Data Index?

Processing of Kafka messages in the Data Index is transactional, so it should be all or nothing for the domain and the process instance. That means that for every message received from Kafka, it updates the domain and the process instance in the same transaction to avoid inconsistencies; if that is somehow not working, we need to investigate.
As for consistency with the runtime, I would recommend looking into the outbox pattern using Debezium. There is an example using MongoDB (https://github.com/kiegroup/kogito-examples/tree/stable/kogito-quarkus-examples/process-outbox-mongodb-quarkus). We plan to extend that to other DBs such as PostgreSQL.
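For illustration, here is a minimal sketch of the outbox idea using plain JDBC, assuming hypothetical orders and outbox tables (the linked example uses MongoDB and Debezium, so treat this as the general shape, not Kogito's implementation): the domain write and the event row share one local transaction, and a log-tailing connector such as Debezium later publishes the outbox rows to Kafka.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;
import javax.sql.DataSource;

// Hedged sketch: the "orders" and "outbox" tables and their columns are
// hypothetical. The domain row and the event row share a single local
// transaction, so either both are committed or neither is; Debezium (or a
// poller) then reads the outbox table and publishes each row to Kafka.
public class OrderOutboxWriter {

    private final DataSource dataSource;

    public OrderOutboxWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void placeOrder(UUID orderId, String orderJson) throws Exception {
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement insertOrder = conn.prepareStatement(
                         "INSERT INTO orders (id, payload) VALUES (?, ?)");
                 PreparedStatement insertOutbox = conn.prepareStatement(
                         "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
                                 + "VALUES (?, 'Order', ?, 'OrderPlaced', ?)")) {
                insertOrder.setObject(1, orderId);
                insertOrder.setString(2, orderJson);
                insertOrder.executeUpdate();

                insertOutbox.setObject(1, UUID.randomUUID());
                insertOutbox.setString(2, orderId.toString());
                insertOutbox.setString(3, orderJson);
                insertOutbox.executeUpdate();

                conn.commit();   // domain change and event become visible together...
            } catch (Exception e) {
                conn.rollback(); // ...or neither does
                throw e;
            }
        }
    }
}
```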
Which persistence backend are you using? And Kogito version?

Related

Azure EventHub Push/Pull?

When it comes to Apache Kafka, I know the consumer side is a pull model. What about Azure Event Hubs? Are they pull or push?
From what I've gathered so far, unlike Kafka, Event Hubs "push" events to the listeners. Can someone confirm? Any additional details or references would be helpful.
A simple Google search landed me on this page to back up my claim.
Is there a simple way to test this theory out?
Yes, Azure Event Hubs pushes events to event consumers; there is no need to 'poll' to consume the events. The event processor defines event handlers which are invoked as new events are ingested into the event stream.
The event consumer can perform what is called a checkpoint, which marks the point up to which events have been consumed.
See the doc for more details.
The short answer to this is that the model for consuming events depends on the type of client that your application has chosen to use. The official Azure SDK packages offer consumer types that are push-based and those that are pull-based.
You don't mention the specific language that you're using but, since you're comparing to Kafka, I'll assume that you're interested in Java. The azure-messaging-eventhubs family of packages is the current generation of the Azure SDK and has the following clients for reading events:
EventProcessorClient: This is a push-based client intended to serve as the primary consumer of events in production scenarios for the majority of workloads. It is responsible for reading and processing events for all partitions of an Event Hub and collaborates with other EventProcessorClient instances using the same Event Hub and consumer group to balance work between them. A high degree of fault tolerance is built-in, allowing the processor to be resilient in the face of errors.
EventHubConsumerAsyncClient: This is a push-based client focused on reading events from a single partition using a Flux-based subscription via the Reactor library. This client requires applications to own responsibility for resilience and processing state persistence.
EventHubConsumerClient: This is a pull-based client focused on reading events from a single partition using an iterator pattern. This client requires applications to own responsibility for resilience and processing state persistence.
More information about the package, its types, and basic usage can be found in the Azure Event Hubs client library for Java overview. More detailed samples can be found in the samples overview, including those for consuming events and using the EventProcessorClient.
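To make the push model concrete, here is a hedged sketch using the EventProcessorClient from azure-messaging-eventhubs; the connection strings, event hub name, and blob container name are placeholders. The checkpoint store records how far each partition has been read, so a restarted processor resumes from the last checkpoint rather than the beginning.

```java
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventProcessorClient;
import com.azure.messaging.eventhubs.EventProcessorClientBuilder;
import com.azure.messaging.eventhubs.checkpointstore.blob.BlobCheckpointStore;
import com.azure.storage.blob.BlobContainerAsyncClient;
import com.azure.storage.blob.BlobContainerClientBuilder;

public class ProcessorExample {
    public static void main(String[] args) throws InterruptedException {
        // Checkpoints (how far each partition has been read) are persisted in blob storage.
        BlobContainerAsyncClient checkpointContainer = new BlobContainerClientBuilder()
                .connectionString("<storage-connection-string>")
                .containerName("checkpoints")
                .buildAsyncClient();

        EventProcessorClient processor = new EventProcessorClientBuilder()
                .connectionString("<event-hubs-connection-string>", "<event-hub-name>")
                .consumerGroup(EventHubClientBuilder.DEFAULT_CONSUMER_GROUP_NAME)
                .checkpointStore(new BlobCheckpointStore(checkpointContainer))
                .processEvent(context -> {
                    // events are pushed to this handler as they arrive
                    System.out.println("Received: " + context.getEventData().getBodyAsString());
                    context.updateCheckpoint();
                })
                .processError(context -> System.err.println(
                        "Error on partition " + context.getPartitionContext().getPartitionId()
                                + ": " + context.getThrowable()))
                .buildEventProcessorClient();

        processor.start();      // handlers run on background threads from here on
        Thread.sleep(30_000);   // process events for a while
        processor.stop();
    }
}
```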

In an Event-Driven Microservice, how do I update a private database with older data

I'm working on a new project, and I am still learning about how to use Microservice/Domain Driven Design.
If the recommended architecture is to have a Database-Per-Service, and use Events to achieve eventual consistency, how does the service's database get initialized with all the data that it needs?
If the events indicating an update to the database occurred before the new service/db was ever designed, do I need to start with a copy of the previous database?
Or should I publish a 'New Service On The Block' event, and allow all the other services to send everything back to me again? That could be a LOT of chattiness, and cause performance issues.
how does the service's database get initialized with all the data that it needs?
It asks for it, which is to say that you design a protocol so that the service that is spinning up can get copies of all of the information that it needs. That often includes tracking checkpoints, and queries that allow you to ask what has happened since some checkpoint.
Think "pull", rather than "push".
Part of the point of "services": designing the right data boundaries. The need to copy a lot of data between services often indicates that the service boundaries need to be reconsidered.
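As a purely hypothetical sketch of that pull-with-checkpoints protocol (none of these types come from a real library), the spinning-up service repeatedly asks "what has happened since my checkpoint?" and persists its progress:

```java
import java.util.List;

// Hypothetical types: they only illustrate the shape of a
// "give me everything since checkpoint X" protocol between services.
public class CatchUpLoop {

    record Event(long sequenceNumber, String payload) {}

    interface UpstreamClient { List<Event> eventsAfter(long checkpoint, int limit); }
    interface CheckpointStore { long load(); void save(long checkpoint); }
    interface LocalProjection { void apply(Event e); }

    static void catchUp(UpstreamClient upstream, CheckpointStore checkpoints, LocalProjection projection) {
        long checkpoint = checkpoints.load();                       // where this service left off
        List<Event> batch = upstream.eventsAfter(checkpoint, 500);  // "what has happened since X?"
        while (!batch.isEmpty()) {
            for (Event e : batch) {
                projection.apply(e);                                // build this service's own view of the data
                checkpoint = e.sequenceNumber();
            }
            checkpoints.save(checkpoint);                           // a restart resumes instead of starting over
            batch = upstream.eventsAfter(checkpoint, 500);
        }
    }
}
```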
There is a streaming platform named Apache Kafka that solves something similar.
With Kafka you would publish events for other services to consume. What makes Kafka special is the fact that events never get deleted (depending on configuration) and can be consumed again by new services spinning up. This feature can be used to initially populate the database (by setting the offset for a topic to 0 and therefore re-reading the history of events).
There is also another feature called GlobalKTable, which is a table view of all events for a particular topic. The GlobalKTable holds the latest value for each key (like a primary key) and can be turned into a state store (RocksDB under the hood), which makes it queryable. This state store initializes itself whenever the application starts up, so the application does not need a database itself, because the state store is kept up to date automatically (consistency is still something to keep in mind). Only for more complex queries would that state store need to be accompanied by a database (with Kafka you would try to pre-compute the results of those queries and make them accessible in a distinct state store).
This would be a complex endeavor, but if it suits your needs it is a fun thing to do!
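To make the GlobalKTable idea concrete, here is a hedged Kafka Streams sketch; the "customers" topic, the store name, and the String serdes are assumptions for illustration, not anything prescribed by Kafka.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class CustomerViewApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-view");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // GlobalKTable: the latest value per key from the "customers" topic,
        // materialized into a local, queryable state store (RocksDB under the hood).
        builder.globalTable(
                "customers",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("customers-store")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();   // the store is (re)populated from the topic on startup

        // In real code, wait for the streams state to be RUNNING before querying.
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("customers-store", QueryableStoreTypes.keyValueStore()));
        System.out.println(store.get("customer-42"));   // latest value for that key
    }
}
```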

Hazelcast - How to call a business service through the Hazelcast client

I am using Hazelcast to cache data from a database in a proof of concept for a prospective customer.
The client layer is in C#. I am using the .NET DLL to retrieve data from the Hazelcast layer.
My requirement is to execute some business logic steps followed by a transaction. This transaction will insert/update a few records in the database.
So, I want to execute a service method which will take an object as input and return another object as output. The method implementation will have the business logic followed by the transaction. The method should return the result of the execution.
I see that I cannot invoke a generic service through the Hazelcast client.
The client only provides methods to get data through Hazelcast data structures.
Is there a solution for my requirement?
Thanks for your answers.
The Distributed Executor Service or Entry Processor is what you are looking for, but apparently neither is made available for the .NET client.
A solution would be to have another web services layer that makes use of Hazelcast's Java client, which does support them.
http://docs.hazelcast.org/docs/3.5/manual/html/distributedcomputing.html
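For reference, this is roughly what the Distributed Executor Service looks like from the Java client (Hazelcast 3.x-era API); the task, its input, and the executor name are hypothetical. The task is serialized to a cluster member, runs there (close to the data), and the result comes back to the caller as a Future.

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;

public class BusinessCallExample {

    // Hypothetical task: serialized to a cluster member, where the business
    // logic and database transaction would actually run.
    static class ApplyDiscountTask implements Callable<String>, Serializable {
        private final String orderId;

        ApplyDiscountTask(String orderId) { this.orderId = orderId; }

        @Override
        public String call() {
            return "discount applied to " + orderId;
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        IExecutorService executor = client.getExecutorService("business-executor");

        Future<String> result = executor.submit(new ApplyDiscountTask("order-42"));
        System.out.println(result.get());   // result of the remote execution

        client.shutdown();
    }
}
```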

Hazelcast vs Redis (or RedisLabs) for a NodeJS application

I have an application with more than 2 TB of data to be stored in a cache; the data will be accessed using Node.js APIs. For a Node.js app, which would be the better choice, Hazelcast or Redis (or Redis Labs), considering the following criteria?
Node.js API support, including connection pooling. It looks like Hazelcast doesn't have a Node.js API.
I understand that in benchmarks Hazelcast is faster due to its multithreaded implementation, and it is scalable as well. But can we effectively leverage these features from Node.js (we need the Set data structure)? Lastly, we can have multiple shards in Redis Labs, which is like having multiple threads or processes working on their respective chunks of data; in that case I believe Hazelcast's edge due to its multithreaded nature would hold against Redis but not against Redis Labs. Any comments on this?
The Hazelcast Node.js client does in fact exist and currently provides the following features:
implementation of the Open Client Binary Protocol (Redis uses a text-based protocol)
Map
Get
Put
Remove
Smart Client - clients connect to each cluster node.
Since data partitioning uses a well-known, consistent hashing algorithm, each client can send an operation to the relevant cluster node, which increases the overall throughput and efficiency. The client doesn't need to be restarted when nodes are added to or removed from the cluster.
Distributed Object Listener
Lifecycle Service
For a comparison of Hazelcast and Redis server-side features, you can find a comprehensive doc here.
Thank you
Well, I would suggest that if you need very complex data handling/processing you should go with Hazelcast; by nature Node.js is single-threaded, so if you are using it just to store key-value data, don't go with Hazelcast.
There is an official API you can use for Node.js + Hazelcast, but with very limited functionality; it only supports key-value operations.
If you are just using the cache as a key-value store, Redis is good, fast, and FREE! It can handle huge data as well with some extra setup; take a look at
http://redis.io/topics/partitioning
In terms of support pricing, Redis Labs is less costly, and if you use Redisson with Redis it can give you all the data structures which Hazelcast uses :)
BitSet, Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, Bloom filter, Remote service, Spring cache, Executor service, Live Object service, Scheduler service
Redis Labs has a much larger user base and more contributors, and Hazelcast a bit fewer if you compare users. So if your data is 2 TB and it's just key-value, Redis would be best.
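As a hedged illustration of the Redisson suggestion (Redisson is a Java client; the address and set name below are placeholders), a distributed Set backed by Redis looks like this:

```java
import org.redisson.Redisson;
import org.redisson.api.RSet;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class RedissonSetExample {
    public static void main(String[] args) {
        // Placeholder address; point this at your Redis server.
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");

        RedissonClient redisson = Redisson.create(config);

        RSet<String> users = redisson.getSet("online-users");  // distributed Set backed by Redis
        users.add("alice");
        users.add("bob");
        System.out.println(users.contains("alice"));            // true

        redisson.shutdown();
    }
}
```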

Messaging bus + event storage + PubSub

I'm looking at building an application which has many data sources, each of which put events into my system. Events have a well defined data structure and could be encoded using JSON or XML.
I would like to be able to guarantee that events are saved persistently, and that the events are used as a part of a publish/subscribe bus with multiple subscribers possible per event.
For the database, availability is very important even as it scales to multiple nodes, and partition tolerance is important so that I can scale the number of places which can store my events. Eventual consistency is good enough for me.
I was thinking of using a JMS enterprise messaging bus (e.g. Mule) or an AMQP enterprise messaging bus (such as RabbitMQ or ZeroMQ).
But for my application, it seems that if I could set up a publish/subscribe system with CouchDB or something similar, it would solve my problem without having to integrate an enterprise messaging bus and a persistent storage system.
Which would work better, CouchDB + scaling + load balancing + some kind of pub/sub mechanism, or an explicit pub/sub messaging system with attached eventually-consistent, available, partition-tolerant storage? Which one is easier to set up, administer, and operate? Which solution will have high throughput for a given cost? Why?
Also, are there any more questions I should ask before selecting my technologies? (BTW, Java is the server-side and client-side language).
I am using a CouchDB message queue in production. (It is not pub/sub, so I do not consider this answer complete.)
Currently (June 2011), CouchDB has huge potential as a messaging substrate:
Good data persistence
Well-poised for clustering (on a LAN, using BigCouch or Lounge)
Well-poised for distribution (between data centers, world-wide)
Good platform. Despite the shortcomings listed below, I love CouchDB because I can re-use my DB and it works from Erlang, NodeJS, and every web browser.
The _changes query
Continuous feeds, instant delivery without polling
Network going down is no problem, just retry later from the previous position
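As a hedged sketch of following the continuous _changes feed mentioned above with the plain JDK HTTP client (the database name, starting sequence, and heartbeat interval are placeholders): each non-blank line of the response is one change, delivered as it happens, and the since parameter lets a consumer resume after a disconnect.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChangesFeedExample {
    public static void main(String[] args) throws Exception {
        String lastSeq = "0";   // persist this somewhere durable to resume after a restart
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:5984/events/_changes"
                        + "?feed=continuous&include_docs=true&heartbeat=10000&since=" + lastSeq))
                .build();

        // The continuous feed streams one JSON change per line as it happens;
        // blank heartbeat lines just keep the connection alive.
        client.send(request, HttpResponse.BodyHandlers.ofLines())
                .body()
                .filter(line -> !line.isBlank())
                .forEach(System.out::println);
    }
}
```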
Still, even a low-volume message system in CouchDB requires careful planning and maintenance. CouchDB is potentially a great messaging server. (It is inspired by Lotus Notes, which handles high email volume.)
However, these are the challenges with CouchDB:
Append-only database files grow fast
Be mindful about disk capacity
Be mindful about disk i/o. Compaction will read and re-write all live documents
Deleted documents are not really deleted. They are marked deleted=true and kept forever, even after compaction! This is in fact uniquely good about CouchDB, because the deleted action will propagate through the cluster, even if the network goes down for a time.
Propagating (replicating) deletes is great, but what about the buildup of deleted docs? Eventually it will outstrip everything else. The solution is to purge them, which actually removes them from disk. Unfortunately, if you do 2 or more purges before querying a map/reduce view, the view will completely rebuild itself. That may take too much time, depending on your needs.
As usual, we hear NoSQL databases shouting "free lunch!", "free lunch!" while CouchDB says "you are going to have to work for this."
Unfortunately, unless you have compelling pressure to re-use CouchDB, I would use a dedicated messaging platform. (I had a good experience with ejabberd as a messaging platform and for communicating to/from Google App Engine.)
I think that the best solution would be CouchDB + Jabber/XMPP server (ejabberd) + book: http://professionalxmpp.com
JSON is the natural storing mechanism for CouchDB
Jabber/XMPP server includes pubsub support
The book is a must read
While you can use a database as an alternative to a message queueing system, no database is a message queueing system, not even CouchDB. A message queueing system like AMQP provides more than just persistence of messages; in fact, with RabbitMQ, persistence is just an invisible service under the hood that takes care of all of the challenges you would otherwise have to deal with yourself on CouchDB.
Take a good look at the RabbitMQ website, where there is lots of information about AMQP and how to make use of it. They have done a great job of collecting articles and blogs about message queueing.
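For a sense of what that looks like in practice, here is a hedged sketch using the RabbitMQ Java client (the exchange, queue, and message contents are placeholders): a durable fanout exchange gives you pub/sub with multiple subscribers, and durable queues plus persistent message properties give you the message persistence handled "under the hood".

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;
import java.nio.charset.StandardCharsets;

public class EventPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Durable fanout exchange: every bound queue gets a copy of each event.
            channel.exchangeDeclare("events", "fanout", true);

            // Each subscriber binds its own durable queue to the exchange.
            channel.queueDeclare("billing-events", true, false, false, null);
            channel.queueBind("billing-events", "events", "");

            String event = "{\"type\":\"OrderPlaced\",\"id\":42}";
            channel.basicPublish("events", "",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,   // persisted by the broker
                    event.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```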
