Here is my scenario:
I have two servers with a multi-threaded message queuing consumer on each (two consumers total).
I have many message types (CreateParent, CreateChild, etc.)
I am stuck with bad legacy code: creating a child also partially creates a parent. I know it is bad, but I cannot change that.
Message ordering cannot be assumed (message queuing principle!)
RabbitMQ is my message queuing broker.
My problem:
When two threads are running simultaneously (one executing a CreateParent, the other executing a CreateChild), they conflict because both threads try to create the Parent in the database (remember the legacy code!)
My initial solution:
Inside the consumer, I created an "entity locking" concept. When a thread processes a CreateChild message, for example, it locks both the Child and the Parent (legacy code!!) so that the CreateParent message processing has to wait. I used the basic .NET Monitor and a list of IDs to implement this concept. It works well.
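The post does not show its implementation. As a rough illustration only, here is a minimal Java analogue of an in-process entity lock (the original used .NET's Monitor). The class name EntityLocks, its method names, and the string ID format are all hypothetical. Like the approach described above, it only coordinates threads inside a single process.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** In-process entity locks keyed by ID (hypothetical sketch, not the author's code). */
final class EntityLocks {
    private final Set<String> locked = new HashSet<>();

    /** Blocks until none of the ids is locked, then locks all of them in one step. */
    synchronized void lockAll(List<String> ids) throws InterruptedException {
        while (ids.stream().anyMatch(locked::contains)) {
            wait(); // releases the monitor until another thread calls unlockAll
        }
        locked.addAll(ids);
    }

    /** Non-blocking variant: locks all ids and returns true, or locks nothing and returns false. */
    synchronized boolean tryLockAll(List<String> ids) {
        if (ids.stream().anyMatch(locked::contains)) {
            return false;
        }
        locked.addAll(ids);
        return true;
    }

    /** Releases the ids and wakes every waiter so it can re-check its own ids. */
    synchronized void unlockAll(List<String> ids) {
        locked.removeAll(ids);
        notifyAll();
    }
}
```

A CreateChild handler would call lockAll with both the child and parent IDs and release them in a finally block. Taking all IDs in one step avoids the deadlock that could arise from two threads acquiring the same pair in opposite order.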
My initial solution limitation:
My "entity locking" concept works well on a single consumer in a single process on a single server. But it will not work across multiple servers running multiple consumers.
I am thinking of using a shared database to "store" my entity locking concept, so each process (and thread) could access the database to verify which entities are locked.
My question (finally!):
All this is becoming very complex, and it increases the risk of bugs and code maintenance problems. I really don't like it!
Has anyone already faced this kind of problem? Are there acceptable workarounds for it?
Does anyone have an idea for a clean solution for my scenario?
Thanks!
In the end, simple solutions are always the better ones!
Instead of using all the complexity of my "entity locking" concept, I finally settled on pre-validating all the required data and entity states before executing the request.
More precisely, instead of letting the CreateChild process crash by itself when it encounters already existing data created by the CreateParent, I fully validate that everything is okay in the databases BEFORE executing the CreateChild message.
The drawback of this solution is that the implementation of CreateChild must be aware of the specific data that CreateParent will produce and verify its presence before starting the execution. But seriously, this is far better than locking everything across systems!
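A minimal sketch of that pre-validation idea, under loud assumptions: the two maps stand in for the real database tables, and the class name, method names, and the "partial"/"complete" states are invented for illustration. The post does not say what the real handler does with each validation outcome, so the branches here are only one plausible reading.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical handler that checks existing state before executing CreateChild. */
final class ChildHandler {
    // Stand-ins for the real database tables (assumption for illustration).
    private final Map<String, String> parents = new ConcurrentHashMap<>();
    private final Map<String, String> children = new ConcurrentHashMap<>();

    /** Creates a complete parent, or completes a partial one left by CreateChild. */
    void handleCreateParent(String parentId) {
        parents.put(parentId, "complete");
    }

    /** Returns true if the child was created, false if it already existed. */
    boolean handleCreateChild(String childId, String parentId) {
        // Pre-validation: look at what already exists before executing anything.
        boolean parentExists = parents.containsKey(parentId);
        boolean childExists = children.containsKey(childId);
        if (childExists) {
            return false; // nothing to do, so no conflict is raised
        }
        if (!parentExists) {
            // Only create the partial parent when no parent is present yet.
            parents.putIfAbsent(parentId, "partial");
        }
        children.putIfAbsent(childId, parentId);
        return true;
    }

    String parentState(String parentId) {
        return parents.get(parentId);
    }
}
```

In a real database the check and the write are separate statements, so two consumers can still interleave between them. The putIfAbsent calls play the role that a unique constraint or an insert-if-missing statement would have to play there.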
Related
I am writing a payroll management web application in Node.js for my organisation. In many cases the application will involve CPU-intensive mathematical calculations to compute the figures, with many users trying to do this simultaneously.
If I write the logic plainly (setting aside the fact that I have already done my best, from an algorithm and data structure point of view, to contain the complexity), it will run synchronously, blocking the event loop and making requests and responses slow.
How can I resolve this? What are the possible options for doing this asynchronously? I should also mention that the calculation can run in the background, and I can notify the user of its status later. I have searched all over for a solution and found some options, but only in theory; I haven't tested them by implementing them. They are listed below:
Clustering the node server
Use worker threads
Use an alternate server and do some load balancing.
Use a message queue and couple it with worker threads to do background tasks.
Can someone suggest some tried and battle-tested advice for this scenario, along with some associated tutorial links?
You might want to try web workers; they are easy to use and well documented.
https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
I have a Java application that uses an Oracle Queue to store messages for later processing by multiple threads consuming the queued messages. The messages in this queue can be related to each other, and must therefore be processed in a specific order based on the business logic of my application. Basically, I want the dequeueing of one message A to be held back as long as another message B in the queue has not been completely processed. The only tools I see in Oracle AQ for this are the Delay and Priority parameters. These, however, cannot be used to achieve the scenario outlined above, since there are situations where two related messages can still be dequeued and processed at the same time. Are there any tools that can help establish an advanced processing order of messages?
I came to the conclusion that it is not a good idea to order these messages using the queue, because it would need a custom and very specialized dequeue strategy, which smells very bad to me in terms of both complexity and, most likely, performance. It would also try to fix communication protocol issues in the queue, even though they are application specific and should therefore be handled in the application itself. Instead, the application or communication protocol should be tolerant enough to handle ordering issues.
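One way an application could tolerate out-of-order delivery is to park a message until the message it depends on has been fully processed. The sketch below is only an illustration of that idea, not the author's solution: the class names, the dependsOn field, and the in-memory bookkeeping are assumptions, and the processing runs under a single lock to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical processor that defers a message until its prerequisite is done. */
final class OrderTolerantProcessor {
    /** A message with an id and an optional prerequisite id (null if none). */
    static final class Message {
        final String id;
        final String dependsOn;

        Message(String id, String dependsOn) {
            this.id = id;
            this.dependsOn = dependsOn;
        }
    }

    private final Set<String> done = new HashSet<>();
    private final Map<String, List<Message>> parked = new HashMap<>();
    private final List<String> processedOrder = new ArrayList<>();

    /** Called by consumer threads in whatever order the queue delivers messages. */
    synchronized void accept(Message m) {
        if (m.dependsOn != null && !done.contains(m.dependsOn)) {
            // Prerequisite not finished yet: park the message instead of failing.
            parked.computeIfAbsent(m.dependsOn, k -> new ArrayList<>()).add(m);
            return;
        }
        process(m);
    }

    private void process(Message m) {
        processedOrder.add(m.id); // the real business logic would run here
        done.add(m.id);
        // Replay anything that was waiting for this message.
        List<Message> waiting = parked.remove(m.id);
        if (waiting != null) {
            for (Message w : waiting) {
                process(w);
            }
        }
    }

    synchronized List<String> processedOrder() {
        return new ArrayList<>(processedOrder);
    }
}
```

If message B (which depends on A) arrives first, it is held back and runs right after A completes. A production version would need to persist the parked messages so that they survive a restart.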
I'm using Hibernate in an embedded Jetty server, and I want to be able to parallelize my data processing with some multithreading and still have it all be in the same transaction. As Sessions are not thread safe this means I need a way to get multiple sessions attached to the same transaction, which means I need to switch away from the "thread" session context I've been using.
By my understanding of the documentation, this means I need to switch to JTA session context, but I'm having trouble getting that to work. My research so far seems to indicate that it requires something external to Hibernate in the server to provide transaction management, and that Jetty does not have such a thing built in, so I would have to pull in some additional library to do it. The top candidates I keep running across for that generally seem to be large packages that do all sorts of other stuff too, which seems wasteful, confusing, and distracting when I'm just looking for the one specific feature.
So, what is the minimal, least disruptive setup and configuration change that will allow getCurrentSession() to return Sessions attached to the same transaction in different threads?
While I'm at it, I know that fetching objects in one thread and altering them in another is not safe, but what about reading their properties in another thread, for example calling toString() or a side effect free getter?
I have a couple of questions regarding EJB transactions. I have a situation where a process has become longer running than originally intended and is sometimes failing because server timeouts are exceeded. While I initially increased the timeouts (both total transaction and max transaction), I know that for a long-running process it makes more sense to segment the work as much as possible into smaller units of work that don't fail on timeout. As a result, I'm looking for some thoughts or references regarding the next course of action, based on the background below and the questions that follow.
Environment:
EJB 3.1, JPA 2.0, WebSphere 8.5
Background:
I built a set of POJOs to do some batch-oriented work for an enterprise application. They are non-EJB POJOs that were intended to implement several business processes (5 related, sequential processes, each depending on its predecessor). The POJOs are in a plain Java project, not an EJB project.
However, these POJOs access an EJB facade for database access via JPA. The abstract core of the 5 business processes does the JNDI lookup for the EJB facade in order to return the domain objects for processing. Originally, the design was to run from the server completely, however, a need arose to initiate these processes externally. As a result, I created an EJB wrapper so that the processes could be called remotely (individually or as a single process based on a common strategy interface). Unfortunately, the size of the data, both row width and row count, has grown well beyond the original intent.
The processing time required to complete these batch processes has increased significantly (from around a couple of hours to around 1/2 a day and could increase beyond that). Only one of the 5 processes made sense to multi-thread (I did implement it multi-threaded). Since I have the wrapper EJB to initiate 1 or all, I have decided to create a new container transaction for each process as opposed to the single default transaction of "required" when I run all as a single process. Since the one process is multi-threaded, it would make sense to attempt to create a new transaction per thread, however, being a group of POJOs, I do not have transaction capability.
Question:
So my question is, what makes more sense and why? Should I re-engineer the POJOs to be EJBs themselves and have the wrapper EJB instantiate each process as a child process, where each can have its own transaction and, more importantly, the multi-threaded process can create a transaction per thread? Or does it make more sense to attempt to create a UserTransaction in the POJOs from a JNDI lookup in the container and try to manage it as if it were a bean-managed transaction (if that's even a viable solution)? I know this may be application dependent, but what is reasonable with regard to timeouts for a Java EE container? Obviously, I don't want runaway processes, but I want to make sure that I can complete these batch processes.
Unfortunately, this application has already been deployed as a production system. Re-engineering, though it may be little more than assembling the strategy logic in EJBs, is a large change to the functionality.
I did look around for some other threads here and via general internet searches, but thought I would see if anyone had compelling arguments for one over the other or another solution entirely. Additional links that talk about a topic such as this are appreciated. I wrestled with whether to post this since some may construe this as subjective, however, I felt the narrowed topic was worth the post and potentially relevant to others attempting processes like this.
This is not a direct answer to your question, but something you could consider.
WebSphere 8.5 provides a batch container specifically for this kind of (batch) application. The batch function accommodates applications that must perform batch work alongside transactional applications. Batch work might take hours or even days to finish and uses large amounts of memory or processing power while it runs. You can reuse your Java classes in batch applications, and batch steps can run in parallel in a cluster with transaction checkpoint management.
Take a look at following resources:
IBM Education Assistant - Batch applications
Getting started with the batch environment
Since I really didn't get a whole lot of response or thoughts for this question over the past couple of weeks, I figured I would answer this question to hopefully help others in making a decision if they run across this or a similar situation.
Ultimately, I re-engineered one of the POJOs into an EJB that acted as a wrapper to call the other POJOs. The wrapper EJB performed the same activity as when it was just a POJO, except that I added the transaction semantics (REQUIRES_NEW) on the primary method. The primary method calls the other POJOs based on a strategy pattern, so each call (or POJO) gets its own transaction. Other methods in the EJB that call the primary method were defined with NOT_SUPPORTED so that I could separate the transactions for each call to the primary method and not join an existing transaction.
Full disclosure: the original addition of transaction semantics significantly increased the processing time (on the order of days), but the process did not fail due to exceeding transaction timeouts. The slowdown was the result of some unexpected problems with JPA Many-To-One relationships that were bringing back too much data. As I mentioned originally, some of my data row width increased unexpectedly. That increase was in the related table object, which was retrieved through the Many-To-One relationship even though the query did not need that data at the time. I corrected those issues by changing my queries (creating objects for SELECT NEW queries, changing relationships to FetchType.LAZY, etc.).
Going forward, if I am able to dedicate enough time, I will transform the rest of those POJOs into EJBs. The POJO doing the most significant amount of work that is threaded has been implemented with a Callable implementation that is run via an ExecutorService. If I can transform that one, the plan will be to make each thread its own transaction. However, while I'm not sure yet, it appears that my container may already be creating transactions for each thread group (of 10 threads) due to status updates I'm seeing. I will have to do more investigation.
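The post does not include the threaded code. As a rough sketch of the shape it describes (Callable tasks run through an ExecutorService, each intended as its own unit of work), something like the following could serve as a starting point. The class name, the chunking scheme, and the placeholder work are assumptions, and the transaction begin/commit points are marked only as comments because how to obtain a transaction depends on the container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batch runner: one Callable (one unit of work) per chunk of items. */
final class ChunkedBatch {
    /** Processes all items in chunks and returns how many items were processed. */
    static int processAll(List<Integer> items, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int start = 0; start < items.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, items.size());
                List<Integer> chunk = items.subList(start, end);
                tasks.add(() -> {
                    // A per-task transaction would begin here (container specific, not shown).
                    int processed = 0;
                    for (Integer item : chunk) {
                        processed++; // placeholder for the real per-row work
                    }
                    // It would commit here, so a timeout could only affect this chunk.
                    return processed;
                });
            }
            int total = 0;
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get(); // a failed task surfaces as ExecutionException
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping each chunk small bounds how long any single unit of work runs, which is the property the timeout discussion above is after.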
I was recently reading the paper Why Events Are a Bad Idea (for High-Concurrency Servers). It is a comparative study of event-based and thread-based highly concurrent servers, and it concludes that threads are better than events in that scenario.
I find that I am not able to classify what sort of concurrency model Erlang exposes. Erlang provides lightweight processes, but those processes are suspended most of the time until they receive an event or message of some sort.
/Arun
The Erlang concurrency model is based on the following premises:
Lightweight concurrency. You should be able to efficiently create as many processes as you need for your application, and you should be able to create and delete them efficiently when necessary. This means that processes are light and small and there is no need to have a process pool to save time.
Asynchronous communication. All process communication is through asynchronous message passing, that's it, there is nothing else, nada.
Error handling. In the same way that lightweight concurrency and asynchronous messages are fundamental to building concurrent systems, error handling is fundamental to building robust systems. The primitives for this interact with concurrency and are part of the Erlang concurrency model.
Process isolation. There is no shared state at all between processes; the only way to communicate is through message passing. This is fundamental to being able to build robust systems, as it allows processes to crash without ruining it for other processes. Of course they may receive information that a process has crashed through the error handling mechanism, but a crashed process will never create inconsistent state in other processes. A corollary to this is that there is no global data.
These are the fundamental premises to Erlang's concurrency model. You may often see them expressed in different ways but they are basically the same. Erlang also has immutable data which is a BIG WIN but this is not really part of the concurrency model, message passing and process isolation are enough. In some circles this may be considered a heretical viewpoint.
As you can see Actors are only part of the model. Error handling is fundamental but often overlooked. Overlooking it means you have missed part of the point.
N.B. Erlang processes are proper processes/threads in that they have a life of their own and are not just a form of event driven coroutines. A process can happily go about its business and change its internal state without being driven by external events.
I guess it's called the Actor model.