Should I use a Webhook or AWS queue (SQS)? - node.js

I've been implementing an SQS service(AWS) for my project. My purpose for this implement is I have 2 projects (microservice) and I want to sync data from one project to another. So, I intend to use SQS service but I also think about webhook for solving my case. I know some basics of the pros and cons of them. So, my question is should I use a webhook or SQS for my case?
Thanks for any helping!

First of all, if you wish to sync 2 databases you would probably want something that's not accounting on your service. Try reading about change data capture - Log scanners is a safe way to do that. Debezium - is a strong tool for it.
Second, if you wish to go with your own implementation I would suggest going with the queueing approach. The biggest advantage of it will be incased when the second service is down. While if using Webhooks the information will be lost, using queues (SQS or any other) will keep the data until the service is up again.

SQS is your best bet here. Couple of reasons
- Reliability in case something is down.
- Ability to repopulate other micro-services. For example if you decide to create another microservice and you need to populate data since start, you will probably read everything from service 1 and put it in the queue for the new micro service.
- Scalability - Queues makes your architecture horizontally scalable. Just put machines to do the work while reading it from queues in parallel.

Related

Microservices, how to notify backend when task complete

For example, if i have main application (backend) and some microservice, e.g for image cropping.
User loads an image, making request to backend, backend using rabbitmq posts new task in the queue, then image cropping service pickup a task, completes it and i need somehow notify backend.
What is options for this? I need another microservice for such notifications?
so... there are reaaaaaaly many ways to do that.
On the high level, what you want to achieve is to produce an event that 1 or more services can react to. Now depending on what you have available, you can produce the event in a number of different ways.
if you want to be completely platform independent, you can use Apache Kafka. It's a popular service specifically for what we need -> publishing events and processing them at mass-scale. Kafka can be clustered, partitioned, have multiple parallel consumers of the same type (like multiple instances of your main backend service) or different types (3 different microservices that happen to be interested in a specific event). This bad boy just has it all and is famous for that. You can set up a cluster yourself or use one that comes out-of-the-box with some of the cloud platforms (like AWS for instance), but this might be more expensive and difficult to use compared to some cloud-specific fully-managed solutions.
if you're running your stuff on the google cloud, you can make it easier and cheaper by using the PubSub service. PubSub is a fully managed service that is scaled out-of-the-box (welcome to the cloud! you don't need to scale or cluster anything by yourself!).
if you're running on AWS, you can use SNS, or a more recent alternative - EventBridge (kinda like SNS, but booooooy what can it not do?). Yeah... I would recommend EventBridge. It can just do more... with the target filtering rules, payload transformations, it can automatically trigger more things...
Azure... ehm... Event Hub... but I haven't worked with this one yet... I'm not much of an Azurer... because you know... nobody uses azure for this kind of stuff...

Questions pertaining to micro-service architecture

I have a couple of questions that exist around micro service architecture, for example take the following services:
orders,
account,
communication &
management
Question 1: From what I read I understand that each service is suppose to have ownership of the data pertaining to that service, so orders would have an orders database. How important is that data ownership? Would micro-services make sense if they all called from one traditional database such that all data pertaining to the services would exist in one database? If so, are there an implications of structuring the services this way.
Question 2: Services should be able to communicate with one and other. How would that statement be any different than simply curling an existing API? & basing the logic on that response? Is calling a service more efficient than simply curling the API?
Question 3: Is it worth it? Now I understand this is a massive generality , and it's fundamentally predicated on the needs of the business. But when that discussion has been had, was the re-build worth it? & what challenges can you expect to face
I will try to answer all the questions.
Respect to all services using the same database. If you do so you have two main problems. First the database would become a bottleneck because all requests will go to the same point. And second you will have coupled all your services, so if the database goes down or it needs to update, all your services will be affected. (The database will became a single point of failure)
The communication between services could be whatever your services need (syncrhonous, asynchronous, via message passing (message broker), etc..) it all depends on the use cases you have to support. The recommended way to do to avoid temporal decoupling is to use a message broker like kafka, doing this your services don't have to known each other and in case some of them go down the others will still working. And when they are up again, they can continue to process the messages that have pending. However, if your services need to respond in synchronous way, you can define synchronous communication between services and use a circuit breaker to behave properly in case the callee service is down.
Microservices architecture is far more complicated to make it work, to monitoring and to debug than a traditional monolith architecture so, it is only worth if you will have very large requirements of scalability and availability and/or if the system is very large and it will require several teams working in different parts of the system and it is recommendable to avoid dependencies among them. So each team can work at their own pace deploying their own services

CQRS and Event Sourcing Guide

I want to create a CQRS and Event Sourcing architecture that is very cheap and very flexible and very uncomplicated.
I want to make sure that events never fail to at least reach the publisher/event store, ever, ever, because that's where business is.
Now, i have several options in mind:
Azure
With azure, i seem to not know what to use.
Azure service bus
Azure Function
Azure webjob (i suppose this can be replaced with Azure functions)
?? (something else i forgot or dont know?)
How reliable are these azure server-less solutions??
Custom
For this i am thinking of using RabbitMQ, the problem is the cost of a virtual machine to run it.
All in all, i want:
Ability to replay the messages/events in case of failure.
Ability to easily add subscribers.
Ability to select the subscribers upon which to replay the messages.
The Event store should be able to store very large sizes of event messages (or how else shall queue an image or file??).
The event store MUST NEVER EVER get chocked, or sleep.
Speed of implementation/prototyping would be an added
advantage.
What does your experience suggest?
What about other alternatives? (eg: apache-kafka)?
Why not run Event Store? Created by Greg Young himself. Host where you need.
I am a java user, I have been using hornetq (aka artemis which I dont use) an alternative to rabbitmq for the longest; the only problem is it does not support replication but gets the job done when it comes to eventsourcing. For your custom scenario, rabbitmq is a good choice but try running it on a digital ocean instance for low costs. If you are looking for simplicity and flexibility you have only 2 choices , build your own or forgo simplicity and pick up apache kafka with all its complexities but will give you flexibility. Again you can also build an eventstore with mongodb. https://www.mongodb.com/blog/post/event-sourcing-with-mongodb
Your requirements are too vague to make the optimal choice. You need to consider a lot of things, one of them would be, for instance, the numbers of events per one aggregate, the number of aggregates (note that this has to be statistical). Those are important primarily because if you allow tens of thousands of events for each aggregate then you would need to have snapshotting which adds complexity which you might not need.
But for regular use cases you could just use a relational database like Postgres as your (linearizable) event store. It also has a listen/notify functionality to you would not really need any message bus either and your application could be written in a reactive way.

Can Azure EventHub be used for critical transactional data in production?

Reading the documentation, Azure EventHubs is meant for:
Application instrumentation
User experience or workflow processing
Internet of Things (IoT) scenarios
Can this be used for any transactional data, handling revenue or application sensitive data?
Based on what I read, looks like it is meant for handling data that one should not be worried about any data loss. Is this the case?
It is mainly designed for large scale ingestion of data. That is why typical scenario's include IoT solutions which consists of a multitude of devices sending mass amounts of telemetry data.
To allow for this kind of scale it does not include some features other messaging service, like Azure Service Bus, do have. I think this blog does a good job of listening the differences. Especially the section Use Case explains things very well:
From a target use case perspective if we consider some of our typical enterprise integration patterns then if you are implementing a pattern which uses a Command Message, or a Request/Reply Message then you probably want to use Azure Service Bus Messaging.  RPC patterns can be implemented using Request/Reply messages on Azure Service Bus using a response queue.  These are really about ESB and EAI style messaging patterns where you want to send messages between applications and probably want to use other features such as property based routing.
Azure Event Hubs is more likely to be used if you’re implementing patterns with Event Messages and you want somewhere reliable to send them that is capable of dealing with a massive scale but will allow you to do stuff with the events out of process.
With these core target use cases in mind it is easy to see where the scale differences come into play.  For messaging it’s about one application telling one or more apps to DO SOMETHING or GIVE ME SOMETHING.  The alternative is that in eventing the applications are saying SOMETHING HAS HAPPENED.  When you consider this in typical application scenarios and you put events into the telemetry and logging space you can quickly see that the SOMETHING HAS HAPPENED scenario will produce a lot more traffic than the other.
Now I’m not saying that you can’t implement some messaging type functions using event hubs and that you can’t push events to a Service Bus topic as in integration there are always different requirements which result in different implementation scenarios, but I think if you follow the above as a general rule then you will usually be on the right path.
That does not mean however, that it is only capable of handling data that one should not be worried about any data loss. Data is stored for a configurable amount of time and if necessary, this data can be read from an earlier point in time.
Now, given your scenario I do not think Event Hub is the best fit. But truth to be told, I am not sure because you will have to elaborate more on what you want to do exactly.
Addition
The idea behind Event Hubs is that you will get at least once delivery at great scale. (Source). See also this question: Does Azure Event Hub guarantees at least once delivery?

Is there a compelling reason to use an AMQP based server over something like beanstalkd or redis?

I'm writing a piece to a project that's responsible for processing tasks outside of the main application facing data server, which is written in javascript using Node.js. It needs to handle tasks which are scheduled in the future and potentially handle tasks that are "right now". The "right now" just means the next time a worker becomes available it will operate on that task, so that bit might not matter. The workers are going to all talk to external resources, an example job would be to send an email. We are a small shop and we don't have a ton of resources so one thing I don't want to do is start mixing languages at this point in the process, and I already see that Node can do this for us pretty easily, so that's what we're going to go with unless I see a compelling reason not to before I start coding, which is soon.
All that said, I can't tell if there is a compelling reason to use an AMQP based server, like OpenAMQ or RabbitMQ over something like Kue or Beanstalkd with a node client. So, here we go:
Is there a compelling reason to use an AMQP based server over something like beanstalkd or redis with Kue? If yes, which AMPQ based server would fit best with the architecture that I laid out? If no, which nosql solution (beanstalkd, redis/Kue) would be easiest to set up and fastest to deploy?
FWIW, I'm not accepting my answer yet, I'm going to explain what I've decided and why. If I don't get any answers that appear to be better than what I've decided, I'll accept my own later.
I decided on Kue. It supports multiple workers running asynchronously, and with cluster it can take advantage of multicore systems. It is easily extended to provide security. It's backed with Redis, which is used all over for this exact thing, so I know I'm not backing my job process server with unproven software (that's not to say that any of the others are unproven.)
The most compelling reasons that I picked Kue is that it provides a JSON api so that the client applications (The first client is going to be a web based application, but we're planning on making smartphone apps also) can add jobs easily without going through the main application facing node instance, so I can be totally out of the way of the rest of my team as I write this. I don't need a route, I don't need anything, and it's all provided for me so I don't need to write anything to support this. This has another advantage, with an extention to provide l/p security only authorized clients can add jobs, so I don;t have to expose my redis server to client applications directly. It also has a built in web console and the API allows the client to pull back lists of jobs associated with a given user very easily, so we can show the user all of their scheduled tasks in a nifty calendar view with 0 effort on my part.
The other compelling reason is the lack of steep learning curve associated with getting redis and Kue going for me. I've set up redis before, and Kue is simple and effective.
Yes, I'm a lazy developer, but I'm the good kind of lazy developer.
UPDATE:
I have it working and doing jobs, the throughput is amazing. I split out the task marshaling logic into it's own node instance, basically all I have to do is deploy my repo to a new machine and run node task-server.js to scale out my workers. I may need to add in some more job searching calls to Kue, because of how I implimented a few things, but that will be easy.

Resources