I have scoured the Internet, posted to the Spring forums, and read nearly all of the online documentation, but I cannot figure out whether Spring Integration can process more than one message within a single multi-resource (JTA) transaction. This is critical for my purposes in order to achieve the necessary throughput. Does anyone know if this is possible? (A little guidance on how to make it work would also be appreciated.)
Once a transaction is started, all work remains in that transaction as long as you don't cross a thread boundary.
This means that, if your transaction manager supports multi-resource transactions and you avoid introducing concurrency within the transaction, you will be OK.
In other words: it depends, but it is possible.
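For example, here is a minimal sketch of the single-threaded case (the channel wiring and the process() method are hypothetical): one container-managed JTA transaction spans every message drained from a PollableChannel, because all the work stays on the calling thread.

```java
import org.springframework.messaging.Message;
import org.springframework.messaging.PollableChannel;
import org.springframework.transaction.annotation.Transactional;

public class BatchConsumer {

    // Assuming a JtaTransactionManager is the configured transaction manager,
    // this single transaction covers every message handled below, because
    // nothing here hands work off to another thread.
    @Transactional
    public void drain(PollableChannel channel) {
        Message<?> message;
        while ((message = channel.receive(0)) != null) { // 0 = return immediately if empty
            process(message); // JDBC/JMS resources used here enlist in the same JTA tx
        }
    }

    private void process(Message<?> message) {
        // hypothetical per-message business logic
    }
}
```

Spring Integration's transactional poller support gives a similar effect declaratively, as long as the downstream flow stays on the poller's thread.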
I want to create a CQRS and Event Sourcing architecture that is very cheap, very flexible, and very uncomplicated.
I want to make sure that events never fail to at least reach the publisher/event store, because that's where the business value is.
Now, I have several options in mind:
Azure
With Azure, I am not sure what to use.
Azure Service Bus
Azure Functions
Azure WebJobs (I suppose these can be replaced with Azure Functions)
?? (something else I forgot or don't know about?)
How reliable are these Azure serverless solutions?
Custom
For this I am thinking of using RabbitMQ; the problem is the cost of a virtual machine to run it.
All in all, I want:
Ability to replay the messages/events in case of failure.
Ability to easily add subscribers.
Ability to select the subscribers upon which to replay the messages.
The event store should be able to store very large event messages (how else shall I queue an image or a file?).
The event store MUST NEVER EVER get choked, or sleep.
Speed of implementation/prototyping would be an added advantage.
What does your experience suggest?
What about other alternatives (e.g., Apache Kafka)?
Why not run Event Store? It was created by Greg Young himself, and you can host it wherever you need.
I am a Java user and have been using HornetQ (now Artemis, which I don't use) as an alternative to RabbitMQ for the longest time; its only problem is that it does not support replication, but it gets the job done when it comes to event sourcing. For your custom scenario, RabbitMQ is a good choice, but try running it on a DigitalOcean instance to keep costs low. If you are looking for simplicity and flexibility, you have only two choices: build your own, or forgo simplicity and pick up Apache Kafka with all its complexities, which will give you flexibility. You can also build an event store with MongoDB: https://www.mongodb.com/blog/post/event-sourcing-with-mongodb
Your requirements are too vague to make an optimal choice. You need to consider a lot of things; for instance, the number of events per aggregate and the number of aggregates (note that these are statistical estimates). Those are important primarily because if you allow tens of thousands of events per aggregate, you will need snapshotting, which adds complexity you might not need.
But for regular use cases you could just use a relational database like Postgres as your (linearizable) event store. It also has LISTEN/NOTIFY functionality, so you would not really need a message bus either, and your application could be written in a reactive way.
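As a rough illustration of that approach, here is a minimal sketch (the events table with PRIMARY KEY (aggregate_id, version), the new_events channel name, and the JDBC wiring are all assumptions, not a prescribed schema): the composite primary key gives you optimistic concurrency on append, and NOTIFY wakes up LISTENing subscribers without a separate message bus.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.UUID;
import javax.sql.DataSource;

public class PostgresEventStore {

    private final DataSource dataSource;

    public PostgresEventStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void append(UUID aggregateId, int expectedVersion, String eventJson) throws Exception {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO events (aggregate_id, version, payload) VALUES (?, ?, ?::jsonb)")) {
                ps.setObject(1, aggregateId);
                ps.setInt(2, expectedVersion + 1); // duplicate key = a concurrent writer won
                ps.setString(3, eventJson);
                ps.executeUpdate();
            }
            try (Statement st = con.createStatement()) {
                st.execute("NOTIFY new_events"); // delivered to LISTENers on commit
            }
            con.commit();
        }
    }
}
```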
It is unclear to me how transactions are implemented in Google's Datastore.
How does it determine which resource I am trying to reach? We receive a global transaction object that we must then use for all requests, so I suspect this is done at the client-library level, not at the database level.
Do these transactions work for both reads and writes? That is, can I use a transaction to implement a document-lock mechanism, so that two users cannot access a document simultaneously (like a mutex when dealing with multithreaded/multiprocess applications)?
And the final question: does anybody know how the transactional mechanism is implemented in Datastore? I mean the high-level architecture, process diagrams, or a short description of the internals, to get a better understanding of what I am working with.
It shouldn't be relevant, but I'm using Google Cloud Functions and a Node.js environment for this project. I believe this should not impose any hard restrictions on usage.
It's not entirely clear what you are asking, but there is good documentation on transactions with Cloud Datastore. I suggest you read through the concept documentation for this topic: Transactions
Transactions are a database level concept (server-side).
Transactions support both reads and writes. They are implemented with optimistic locking, so a transaction won't block another client from reading the data the way a mutex would, but a concurrent modification will cause the transaction to fail and roll back as appropriate.
The documentation covers the details that are relevant to using transactions:
Outside of transactions, Cloud Datastore's isolation level is closest to read committed. Inside of transactions, serializable isolation is enforced. This means that another transaction cannot concurrently modify the data that is read or modified by this transaction. Read the serializable isolation wiki and the Transaction Isolation article for more information on isolation levels.
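To make the read-modify-write pattern concrete, here is a minimal sketch using the google-cloud-datastore Java client (the Account kind and balance property are made-up examples; the Node.js client follows the same model): if another client modifies the entity between the get() and the commit(), the commit fails and the work can be retried.

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreException;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Transaction;

public class WithdrawExample {

    public static void main(String[] args) {
        Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
        Key key = datastore.newKeyFactory().setKind("Account").newKey("alice");

        Transaction txn = datastore.newTransaction();
        try {
            Entity account = txn.get(key);                 // read inside the transaction
            Entity updated = Entity.newBuilder(account)
                    .set("balance", account.getLong("balance") - 100)
                    .build();
            txn.put(updated);                              // write inside the transaction
            txn.commit();                                  // fails if the entity changed meanwhile
        } catch (DatastoreException e) {
            if (txn.isActive()) {
                txn.rollback();                            // clean up; caller may retry
            }
        }
    }
}
```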
I have a Spring Integration process that is used to integrate two systems with a near-real-time expectation.
I need to build a failover process for this that may run on the same or another machine.
Is there inbuilt support for this in Spring Integration?
If not, some ideas to implement this would be greatly helpful.
I am thinking of some sort of heartbeat messages on a message channel: if they don't arrive within a stipulated time frame, activate the workflow. But I don't know how this can be achieved in Spring Integration.
You need to provide more details (types of communication, etc.) but, generally, yes, it can be configured for failover. The default DirectChannel uses round-robin distribution between consumers, but you can configure it with a dispatcher that has load-balancer="NONE". Then it will always try the first consumer and fail over to the second on failure. You can also configure a circuit-breaker advice on the first consumer so it fails fast (for some period of time) and the first consumer is only retried once in a while.
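For illustration, here is a minimal Java-config sketch of such a channel (the bean and class names are hypothetical): constructing the DirectChannel with a null LoadBalancingStrategy is the Java equivalent of load-balancer="NONE", so the dispatcher always tries subscribers in order and only moves on to the next one when the previous one throws an exception.

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;

@Configuration
public class FailoverConfig {

    @Bean
    public DirectChannel failoverChannel() {
        // No round-robin load balancing: ordered, failover-only dispatch.
        return new DirectChannel(null);
    }
}
```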
As I said, if you can provide more details of your actual requirements, we can help with more specific answers.
I have a couple of questions regarding EJB transactions. I have a situation where a process has become longer-running than originally intended and sometimes fails because server timeouts are exceeded. While I have increased the timeouts for now (both the total transaction timeout and the maximum transaction timeout), I know that for a long-running process it makes more sense to segment the work into smaller units that don't fail based on timeouts. As a result, I'm looking for thoughts or references regarding the next course of action, based on the background below and the questions that follow.
Environment:
EJB 3.1, JPA 2.0, WebSphere 8.5
Background:
I built a set of POJOs to do some batch-oriented work for an enterprise application. They are non-EJB POJOs intended to implement several business processes (5 related, sequential processes, each depending on its predecessor). The POJOs are in a plain Java project, not an EJB project.
However, these POJOs access an EJB facade for database access via JPA. The abstract core of the 5 business processes does a JNDI lookup for the EJB facade in order to return the domain objects for processing. Originally, the design was to run entirely on the server; however, a need arose to initiate these processes externally. As a result, I created an EJB wrapper so that the processes could be called remotely (individually, or as a single process based on a common strategy interface). Unfortunately, the size of the data, both row width and row count, has grown well beyond the original intent.
The processing time required to complete these batch processes has increased significantly (from around a couple of hours to around half a day, and it could grow beyond that). Only one of the 5 processes made sense to multi-thread (I did implement it multi-threaded). Since I have the wrapper EJB to initiate one or all of them, I have decided to create a new container transaction for each process, as opposed to the single default "required" transaction when I run them all as one process. Since the one process is multi-threaded, it would make sense to attempt to create a new transaction per thread; however, being a group of POJOs, they have no transaction capability.
Question:
So my question is: what makes more sense and why? Re-engineer the POJOs to be EJBs themselves and have the wrapper EJB instantiate each process as a child process, where each can have its own transaction and, more importantly, the multi-threaded process can create a transaction per thread? Or does it make more sense to obtain a UserTransaction in the POJOs via a JNDI lookup in the container and manage it as if it were a bean-managed transaction (if that's even a viable solution)? I know this may be application-dependent, but what is reasonable with regard to timeouts for a Java EE container? Obviously, I don't want runaway processes, but I want to make sure these batch processes can complete.
Unfortunately, this application has already been deployed as a production system. Re-engineering, though it may be little more than assembling the strategy logic in EJBs, is a large change to the functionality.
I did look around for other threads here and via general internet searches, but thought I would see if anyone had compelling arguments for one approach over the other, or for another solution entirely. Links that discuss this topic are appreciated. I wrestled with whether to post this, since some may construe it as subjective, but I felt the narrowed topic was worth the post and potentially relevant to others attempting processes like this.
This is not direct answer to your question, but something you could consider.
WebSphere 8.5 provides a batch container especially for these kinds of (batch) applications. The batch function accommodates applications that must perform batch work alongside transactional applications. Batch work might take hours or even days to finish and uses large amounts of memory or processing power while it runs. You can reuse your Java classes in batch applications, batch steps can be run in parallel in a cluster, and the container provides transaction checkpoint management.
Take a look at following resources:
IBM Education Assistant - Batch applications
Getting started with the batch environment
Since I really didn't get a whole lot of response or thoughts for this question over the past couple of weeks, I figured I would answer this question to hopefully help others in making a decision if they run across this or a similar situation.
Ultimately, I re-engineered one of the POJOs into an EJB that acts as a wrapper to call the other POJOs. The wrapper EJB performs the same activity as it did as a POJO, except that I added transaction semantics (REQUIRES_NEW) on the primary method. The primary method calls the other POJOs based on a strategy pattern, so each call (or POJO) gets its own transaction. Other methods in the EJB that call the primary method were defined with NOT_SUPPORTED, so that I could separate the transactions for each call to the primary method and not join an existing transaction.
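In sketch form, that arrangement looks roughly like the following (BatchStep and the method names are hypothetical stand-ins for my strategy interface). Note the call through getBusinessObject(): invoking runStep() via plain this would bypass the container's interceptors, and REQUIRES_NEW would never apply.

```java
import java.util.List;
import javax.annotation.Resource;
import javax.ejb.SessionContext;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;

@Stateless
public class BatchProcessRunner {

    @Resource
    private SessionContext ctx;

    // Suspends any caller transaction, so each step below really does
    // get its own REQUIRES_NEW transaction.
    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    public void runAll(List<BatchStep> steps) {
        BatchProcessRunner self = ctx.getBusinessObject(BatchProcessRunner.class);
        for (BatchStep step : steps) {
            self.runStep(step); // through the EJB proxy, not plain this
        }
    }

    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    public void runStep(BatchStep step) {
        step.execute(); // the POJO strategy runs inside its own container tx
    }

    public interface BatchStep {
        void execute();
    }
}
```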
Full disclosure: the original addition of transaction semantics significantly increased the processing time (on the order of days), but the process did not fail by exceeding transaction timeouts. The slowdown was the result of some unexpected problems with JPA many-to-one relationships that were bringing back too much data. As I mentioned originally, the width of some of my data rows increased unexpectedly. That data increase was in the related table object, but the query did not need that data at the time. I corrected those issues by changing my queries (creating objects for SELECT NEW queries, changing relationships to FetchType.LAZY, etc.).
Going forward, if I can dedicate enough time, I will transform the rest of those POJOs into EJBs. The POJO doing the most significant amount of threaded work has been implemented as a Callable that is run via an ExecutorService. If I can transform that one, the plan is to make each thread its own transaction. However, while I'm not sure yet, it appears that my container may already be creating transactions for each thread group (of 10 threads), judging by the status updates I'm seeing. I will have to investigate further.
In order to support offline clients, I want to evaluate how Multi-Version Concurrency Control fits into a CQRS-DDD system.
Learning from CouchDB, I felt tempted to provide each entity with a version field. However, there are other version-concurrency algorithms, like vector clocks. This made me think that maybe I should just not expose this version concept for each entity and/or event.
Unfortunately, most of the implementations I have seen assume that the software runs on a single server, where the timestamps for the events come from one reliable source. However, if some events are generated remotely AND offline, there is the problem of local client clock offset. In that case, a plain timestamp does not seem a reliable basis for ordering my events.
Does this force me to evaluate some form of MVCC solution not based on timestamps?
What implementation details must an offline-CQRS client evaluate to synchronize a delayed chain of events with a central server?
Is there any good open-source example?
Should my DDD Entities and/or CQRS Query DTOs provide a version parameter?
I manage a version number, and it has worked out well for me. The nice thing about the version number is that you can make your code very explicit when dealing with concurrency conflicts.

My approach is to ensure that my DTOs all carry the version number of the aggregate they are associated with. When I send in a command, it has the current version as seen on the client. This number may or may not be in sync with the actual version of the aggregate, i.e. the client may have been offline. Before the event is persisted, I check that the version number is the one I expected; if it is not, I check the preceding events to see whether any of them actually conflict. Only if they do, do I raise an exception. This is essentially a very fine-grained form of optimistic concurrency.

If you're interested, I've written more detail, including some code samples, on my blog: http://danielwhittaker.me/2014/09/29/handling-concurrency-issues-cqrs-event-sourced-system/
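In outline, the check looks something like this (a minimal sketch; the Event and EventStore types are hypothetical, not the code from the blog post):

```java
import java.util.List;
import java.util.UUID;

public class EventAppender {

    interface Event {
        boolean conflictsWith(Event other);
    }

    interface EventStore {
        int currentVersion(UUID aggregateId);
        List<Event> eventsSince(UUID aggregateId, int version);
        void save(UUID aggregateId, int version, Event event);
    }

    private final EventStore store;

    public EventAppender(EventStore store) {
        this.store = store;
    }

    public void append(UUID aggregateId, int expectedVersion, Event incoming) {
        int currentVersion = store.currentVersion(aggregateId);
        if (currentVersion != expectedVersion) {
            // The client was behind (e.g. it was offline): look at what it
            // missed and only fail when a missed event actually conflicts.
            for (Event missed : store.eventsSince(aggregateId, expectedVersion)) {
                if (missed.conflictsWith(incoming)) {
                    throw new IllegalStateException("Concurrency conflict for " + aggregateId);
                }
            }
        }
        store.save(aggregateId, currentVersion + 1, incoming);
    }
}
```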
I hope that helps.
I suggest you have a look at Greg's presentation on the subject; it might have the answers you're looking for: https://skillsmatter.com/skillscasts/1980-cqrs-not-just-for-server-systems
I guess you should rethink your domain: separate the remote-client logic into its own bounded context and integrate it with the other BC using the known DDD principles for bounded-context integration.