I have a Java application that uses an Oracle queue to store messages for later processing by multiple threads consuming queued messages. The messages in this queue can be related to each other and must therefore be processed in a specific order based on the business logic of my application. Basically, I want the dequeuing of one message A to be held back as long as another message B in the queue has not been completely processed. The only weapons Oracle AQ gives me here are the Delay and Priority parameters. These, however, cannot be used to achieve the scenario outlined above, since there are situations where two related messages can still be dequeued and processed at the same time. Are there any tools that can help establish a more advanced processing order of messages?
I came to the conclusion that it is not a good idea to order these messages using the queue, because it would require a custom and very specialized dequeue strategy, which smells bad to me in terms of both complexity and, most likely, performance. It also tries to use the queue to fix communication protocol issues that are application specific and should therefore be treated in the application itself. Instead, the application and its communication protocol should be tolerant enough to handle ordering issues, as sketched below.
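To illustrate the tolerant-consumer idea, here is a minimal sketch. `MessageStore` and `Queue` are hypothetical application interfaces (not Oracle AQ API); the point is that a message whose prerequisite has not finished is simply requeued with a delay instead of being held back by a custom dequeue strategy.

```java
// Hedged sketch: requeue-with-delay instead of a custom dequeue strategy.
// MessageStore and Queue are hypothetical application interfaces.
public class TolerantConsumer {

    interface MessageStore {
        boolean isProcessed(String messageId);          // has the prerequisite finished?
    }

    interface Queue {
        Message dequeue();                              // blocking dequeue
        void enqueueWithDelay(Message m, int seconds);  // maps to the AQ Delay parameter
    }

    record Message(String id, String prerequisiteId, String payload) {}

    private final MessageStore store;
    private final Queue queue;

    TolerantConsumer(MessageStore store, Queue queue) {
        this.store = store;
        this.queue = queue;
    }

    void processNext() {
        Message m = queue.dequeue();
        if (m.prerequisiteId() != null && !store.isProcessed(m.prerequisiteId())) {
            // Prerequisite not done yet: put the message back with a delay
            // rather than blocking a consumer thread on it.
            queue.enqueueWithDelay(m, 30);
            return;
        }
        handle(m);
    }

    void handle(Message m) { /* business logic */ }
}
```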
I'm trying to design a robust architecture, but I'm having trouble solving the message delivery part.
Let me try to explain:
The API would be clustered on ECS, receiving a bunch of requests.
The workers would be clustered too, subscribing to the same channels. (That's the problem; if we were working with only one worker, there wouldn't be any issue.)
How do I deal with multiple workers while avoiding duplicated messages?
What would be a good, simple approach that keeps many workers occupied?
Thank you.
This sounds like a very fundamental problem for a message broker: having one channel and multiple workers subscribed to it, all of them receiving the same message. It wouldn't really be useful to process the same message multiple times.
This problem has been addressed by most message brokers (I believe). For example, when you consume a message from an Amazon SQS queue, that message becomes invisible to other consumers for a particular timeframe (the visibility timeout).
When a worker has processed the message, it has to delete it from the queue. Otherwise, once the timeout expires, other workers will see the message and process it again.
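For illustration, here is a minimal consume/delete loop using the AWS SDK for Java v2; the queue URL and the `handle` method are placeholders for your own setup.

```java
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

public class SqsWorker {
    public static void main(String[] args) {
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"; // placeholder
        try (SqsClient sqs = SqsClient.create()) {
            while (true) {
                // Received messages become invisible to other consumers
                // for the queue's visibility timeout.
                var resp = sqs.receiveMessage(ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .maxNumberOfMessages(10)
                        .waitTimeSeconds(20)   // long polling
                        .build());
                for (Message m : resp.messages()) {
                    handle(m.body());
                    // Delete only after successful processing; if the worker
                    // crashes first, the message reappears after the timeout.
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .receiptHandle(m.receiptHandle())
                            .build());
                }
            }
        }
    }

    static void handle(String body) { /* process the message */ }
}
```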
SQS in particular has a distributed architecture and sometimes you get duplicate messages in the queue, which are processed by different workers. That's the effect of the at-least-once delivery guarantee that SQS provides.
If your system has to be strict about duplicate messages, then you need to build a de-duplication mechanism around it.
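One common shape for such a mechanism, sketched below under the assumption that each message carries a stable unique ID, is to record processed IDs and skip anything already seen. The in-memory set here is for illustration only; with multiple workers it would have to be a shared store (e.g. a database table with a unique constraint, or Redis).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupingHandler {
    // In-memory for illustration; multiple workers need a shared store instead.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    public void onMessage(String messageId, String body) {
        // add() returns false if the ID was already present: a duplicate.
        if (!processedIds.add(messageId)) {
            return; // already processed, skip
        }
        process(body);
    }

    private void process(String body) { /* idempotent business logic */ }
}
```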
The keywords you are looking for are "exactly-once guarantee in a distributed system". With that you can do some research on your own, but here are some pointers.
You could use an event queue system that supports "exactly once" guarantees, for example Apache Pulsar (see this link) or Kafka, or you can use their approach as inspiration for your own implementation (which may be somewhat hard to do).
For your own implementation, you could write a special consumer that is the only consumer and acts as a distributor for worker tasks, whose job it is to guarantee "exactly once". This would be a tradeoff and could prove to be a bottleneck, depending on your scalability requirements. This article explains why this is a difficult problem in distributed systems.
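To give a flavor of the broker-side support mentioned above, here is a rough sketch of Kafka's transactional producer in Java; the topic name, transactional ID, and bootstrap address are placeholders. Note this covers the produce side; consumers additionally need `isolation.level=read_committed` to see only committed messages.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");                 // no duplicates from retries
        props.put("transactional.id", "worker-dispatch-1");      // enables transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("tasks", "task-42", "payload"));
                producer.commitTransaction();  // all-or-nothing visibility
            } catch (Exception e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```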
I want to create a CQRS and Event Sourcing architecture that is very cheap, very flexible, and very uncomplicated.
I want to make sure that events never fail to at least reach the publisher/event store, ever, because that's where the business is.
Now, I have several options in mind:
Azure
With Azure, I don't really know what to use:
Azure Service Bus
Azure Functions
Azure WebJobs (I suppose these can be replaced by Azure Functions)
?? (something else I forgot or don't know about?)
How reliable are these Azure serverless solutions?
Custom
For this I am thinking of using RabbitMQ; the problem is the cost of a virtual machine to run it.
All in all, I want:
Ability to replay the messages/events in case of failure.
Ability to easily add subscribers.
Ability to select the subscribers on which to replay the messages.
The event store should be able to store very large event messages (or how else shall I queue an image or a file?).
The event store MUST NEVER EVER get choked, or sleep.
Speed of implementation/prototyping would be an added advantage.
What does your experience suggest?
What about other alternatives (e.g. Apache Kafka)?
Why not run Event Store? Created by Greg Young himself. Host it wherever you need.
I am a Java user. I have been using HornetQ (now known as Artemis, which I don't use) as an alternative to RabbitMQ for the longest time; the only problem is that it does not support replication, but it gets the job done when it comes to event sourcing. For your custom scenario, RabbitMQ is a good choice, but try running it on a DigitalOcean instance to keep costs low. If you are looking for simplicity and flexibility, you have only two choices: build your own, or forgo simplicity and pick up Apache Kafka with all its complexities, which will give you flexibility. You can also build an event store with MongoDB: https://www.mongodb.com/blog/post/event-sourcing-with-mongodb
Your requirements are too vague to make the optimal choice. You need to consider a lot of things; one of them, for instance, is the number of events per aggregate and the number of aggregates (note that this has to be statistical). Those are important primarily because if you allow tens of thousands of events for each aggregate, then you need snapshotting, which adds complexity you might not need.
But for regular use cases you could just use a relational database like Postgres as your (linearizable) event store. It also has LISTEN/NOTIFY functionality, so you would not really need any message bus either, and your application could be written in a reactive way.
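As a rough illustration of that LISTEN/NOTIFY idea (channel name and connection details are made up here), a JDBC client using the PostgreSQL driver could wait for notifications like this; a trigger on the events table would call `pg_notify('new_events', event_id::text)` on each insert.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class EventListener {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/eventstore", "app", "secret");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("LISTEN new_events"); // channel notified by an INSERT trigger
        }
        PGConnection pgConn = conn.unwrap(PGConnection.class);
        while (true) {
            // Blocks up to 1s waiting for notifications on this connection.
            PGNotification[] notifications = pgConn.getNotifications(1000);
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    // The payload could carry the new event's ID; the handler
                    // would then read the full event row from the store.
                    System.out.println("event notification: " + n.getParameter());
                }
            }
        }
    }
}
```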
Has anybody thought about implementing strategies for Azure storage queues that would allow dequeuing messages in an arbitrary order (other than first-in, first-out)? For example, some people might be interested in LIFO, some might want to dequeue "important" messages ahead of less important ones, etc.
Personally, I am interested in implementing a strategy that would allow messages in a multi-tenant system to be dequeued in a way that ensures a large number of messages related to one tenant will not cause messages for other tenants to be delayed.
I am also interested in other queuing systems that may have implemented similar strategies. Are there other queuing systems that allow this kind of flexibility?
What you are looking for is referred to as the Priority Queue pattern, which you can read more about here.
There are a couple of strategies for achieving this. One is to use different queues for the higher-priority messages, or in your case, a queue for each customer.
Another approach, and the one I would prefer for your scenario, is to use Service Bus topics and subscriptions (pub/sub, basically).
Both of these are discussed in more detail in the link provided above.
The priority queue pattern is the way to go: use different queues for different message priorities. You can also assign an appropriate number of workers to each queue so that each drains at an appropriate rate.
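As a sketch of the multiple-queues approach (queue names and the connection string are placeholders), a worker using the Azure Storage Queue SDK for Java could always drain the high-priority queue before touching the low-priority one; the same loop structure works for round-robin across per-tenant queues in the multi-tenant case.

```java
import com.azure.storage.queue.QueueClient;
import com.azure.storage.queue.QueueClientBuilder;
import com.azure.storage.queue.models.QueueMessageItem;

public class PriorityWorker {
    public static void main(String[] args) throws InterruptedException {
        String cs = System.getenv("AZURE_STORAGE_CONNECTION_STRING"); // placeholder
        QueueClient high = new QueueClientBuilder()
                .connectionString(cs).queueName("jobs-high").buildClient();
        QueueClient low = new QueueClientBuilder()
                .connectionString(cs).queueName("jobs-low").buildClient();

        while (true) {
            // Always try the high-priority queue first; fall back to low.
            if (!tryProcessOne(high) && !tryProcessOne(low)) {
                Thread.sleep(500); // both empty: back off briefly
            }
        }
    }

    static boolean tryProcessOne(QueueClient queue) {
        QueueMessageItem msg = queue.receiveMessage(); // null when the queue is empty
        if (msg == null) {
            return false;
        }
        handle(msg.getBody().toString());
        queue.deleteMessage(msg.getMessageId(), msg.getPopReceipt());
        return true;
    }

    static void handle(String body) { /* process the message */ }
}
```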
I have a multithreaded program that processes images, and I want to add a queue to it so that an external program can insert requests into this queue while the image processor's threads dequeue the requests from it. Now, what is the best solution for when the queue is full and new requests arrive? Should I drop new, old, or random requests?
That is totally up to you and how you want your program to behave. See what the outcome of each decision might be and decide which is the lesser evil.
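If the program is in Java, the standard library already encodes these choices; in this small sketch, a bounded `ThreadPoolExecutor`'s rejection policy decides whether a full queue drops the new request (`DiscardPolicy`), the oldest one (`DiscardOldestPolicy`), or pushes back on the producer (`CallerRunsPolicy`).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ImageQueue {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4,                                   // 4 worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),          // bounded request queue
                // Swap in DiscardPolicy / CallerRunsPolicy to change
                // what happens when the queue is full:
                new ThreadPoolExecutor.DiscardOldestPolicy());

        pool.execute(() -> processImage("frame-001.png"));
        pool.shutdown();
    }

    static void processImage(String name) { /* image processing */ }
}
```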
I recommend using a message broker for this approach.
You can send the images as messages to this broker, which will handle the routing to the target program, the complete queuing system, as well as the "full queue" problem.
I personally have had good experiences with RabbitMQ, although it might be a little too much overhead for your specific purpose. You may want to have a look at ZeroMQ as well, as it is a little thinner and might be a better fit for you.
To do things right, determine what you really need and check whether those message brokers are really useful in your current situation. From my point of view they are, but it depends on your exact requirements and implementation.
If you're interested, look at AMQP, the Advanced Message Queuing Protocol, itself; it's the basis for all those message brokers and quite interesting.
Here is my scenario:
I have two servers with a multi-threaded message queuing consumer on each (two consumers total).
I have many message types (CreateParent, CreateChild, etc.)
I am stuck with bad legacy code (creating a child partially creates its parent; I know it is bad... but I cannot change that).
Message ordering cannot be assumed (a message queuing principle!).
RabbitMQ is my message queuing broker.
My problem:
When two threads run simultaneously (one executing a CreateParent, the other executing a CreateChild), they generate conflicts, because both threads try to create the Parent in the database (remember the legacy code!).
My initial solution:
Inside the consumer, I created an "entity locking" concept: when a thread processes a CreateChild message, for example, it locks both the Child and the Parent (legacy code!!) so that the CreateParent message processing has to wait. I implemented this with a basic .NET Monitor and a list of IDs. It works well.
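The original is .NET, but the idea translates directly; here is a rough per-entity locking sketch in Java (the entity ID type and lock cleanup are simplified for illustration).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class EntityLocks {
    // One lock per entity ID; computeIfAbsent creates it on first use.
    // Note: this map grows unboundedly; real code needs eviction of idle locks.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    public void withLock(String entityId, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(entityId, id -> new ReentrantLock());
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}

// Usage: lock the parent, then the child, while processing CreateChild, so a
// concurrent CreateParent for the same parent has to wait. Always nest in the
// same order (parent before child) to avoid deadlock:
// entityLocks.withLock(parentId,
//         () -> entityLocks.withLock(childId, () -> handleCreateChild(msg)));
```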
My initial solution limitation:
My "entity locking" concept works well on a single consumer in a single process on a single server. But it will not works across multiple servers running multiple consumers.
I am thinking of using a shared database to "store" my entity locking concept, so each process (and thread) could query the database to see which entities are locked.
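If the shared database happens to be PostgreSQL, its advisory locks give you this without a hand-rolled lock table; a rough sketch (the key derivation via `hashtext` is one common convention, not the only one):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class DbEntityLock {
    // Takes a transaction-scoped advisory lock keyed on the entity ID.
    // Blocks until any other session holding the same key commits or rolls back.
    // The connection must be in a transaction (autoCommit = false) for the
    // lock to outlive this statement.
    public static void lockEntity(Connection conn, String entityId) throws Exception {
        try (PreparedStatement ps =
                 conn.prepareStatement("SELECT pg_advisory_xact_lock(hashtext(?))")) {
            ps.setString(1, entityId);
            ps.execute();
        }
    }
}
```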
My question (finally!):
All this is becoming very complex, and it increases the risk of bugs and code maintenance problems. I really don't like it!
Has anyone already faced this kind of problem? Are there acceptable workarounds for it?
Does anyone have an idea for a clean solution for my scenario?
Thanks!
In the end, simple solutions are always the better ones!
Instead of using all the complexity of my "entity locking" concept, I finally fell back to pre-validating all the required data and entity states before executing the request.
More precisely, instead of letting the CreateChild process crash by itself when it encounters already-existing data created by CreateParent, I fully validate that everything is okay in the database BEFORE executing the CreateChild message.
The drawback of this solution is that the implementation of CreateChild must be aware of the specific data CreateParent produces and verify its presence before starting execution. But seriously, this is far better than locking everything across systems!
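For illustration, here is a rough shape of that pre-validation step; `ParentRepository` and the method names are hypothetical application interfaces invented for the example, not library API.

```java
// Hedged sketch of the pre-validation approach.
public class CreateChildHandler {

    interface ParentRepository {
        boolean parentDataExists(String parentId); // the rows CreateParent produces
    }

    private final ParentRepository parents;

    CreateChildHandler(ParentRepository parents) {
        this.parents = parents;
    }

    void handle(CreateChildMessage msg) {
        // Pre-validate: did a concurrent CreateParent already produce the
        // parent's data? If so, skip the legacy partial parent creation
        // instead of letting the duplicate insert crash the handler.
        if (parents.parentDataExists(msg.parentId())) {
            createChildOnly(msg);
        } else {
            createChildWithPartialParent(msg); // legacy path
        }
    }

    void createChildOnly(CreateChildMessage msg) { /* insert child rows only */ }
    void createChildWithPartialParent(CreateChildMessage msg) { /* legacy logic */ }

    record CreateChildMessage(String parentId, String childId) {}
}
```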