How to conditionally poll messages from Kafka Topic - node.js

I have a few task notifications in a MongoDB database. Each task has a due_date and a reminder flag. I am pushing these tasks to a Kafka topic. There is a Node.js app that polls from this topic and pushes notifications to a frontend app based on the due_date and reminder flag. The due_date could be in the past or upcoming.
From Kafka, we need to send notifications to the listening Node app whenever these time-based conditions occur:
Reminder = true and it is X time before the Due Date
Due Date = now
The Task still exists and is Past Due
How can this be done with Kafka?

DB-to-Kafka interaction should go through a source connector. DB connectors can publish events to Kafka whenever there is a change in the underlying table, i.e. when new rows are created or any column is updated.
So the ideal solution would be to introduce some more columns in the table, or a new utility table, with columns that identify the conditions you mentioned above, for example a boolean column like "IsDueDate". Create a scheduler in the DB (I am not sure about Mongo, but most DBs have an option for this) or a batch system (like a Spring Batch/Boot app) to validate your data and populate these columns.
Once these columns are updated, the connector will publish a message to Kafka; your frontend app polls Kafka for new messages and can ultimately use these flags in the payload to identify which condition triggered the message and act on it in the frontend.
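To make this concrete, here is a minimal sketch of the consuming side with kafkajs, assuming the connector publishes task documents to a hypothetical task-notifications topic with the flag columns in the payload (IsReminderWindow and IsPastDue are hypothetical flag names following the IsDueDate example above):

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "notification-service" });

(async () => {
  await consumer.connect();
  await consumer.subscribe({ topics: ["task-notifications"], fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const task = JSON.parse(message.value.toString());
      // The flag columns populated by the scheduler identify which
      // time-based condition triggered this event.
      if (task.IsDueDate) {
        notifyFrontend("due-now", task); // due date = now
      } else if (task.Reminder && task.IsReminderWindow) {
        notifyFrontend("reminder", task); // X time before the due date
      } else if (task.IsPastDue) {
        notifyFrontend("past-due", task); // task still exists and is past due
      }
    },
  });
})();

// Hypothetical helper: push the notification to the frontend,
// e.g. over a WebSocket connection.
function notifyFrontend(kind, task) {
  console.log(kind, task);
}
```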

Related

How to ensure idempotency on Event Hub consumers that only store aggregated information?

I'm working on an event-driven microservices architecture, and I'm using Event Hubs to send a lot of data (around 20-30k events per minute) to multiple consumer groups, with Azure Functions' EventHubTrigger processing these events.
The data I'm passing around has a unique identifier and my other consumers can guarantee idempotency since I'm storing them on their data stores as well upon processing - so if the unique event identifier already exists, I can skip processing for that specific event.
I do, however, have one service that only does data aggregation for reporting to a relational database: counts, sums, and so on. These are pretty much upserts, so that I can run queries against it to produce reports, and I did see quite a few events that were processed multiple times.
So an idea that I had was to just have some sort of event store. Redis with TTL, or Azure Table Storage, or even a table on my relational database that only contains a single field with a unique constraint so I can do a transaction on the whole event processing.
Is there a better way to do this?
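The event-store idea sketched above is small to implement; for illustration, a hedged version with ioredis, assuming each event carries a unique id field:

```js
const Redis = require("ioredis");
const redis = new Redis(); // assumes a local Redis instance

async function processOnce(event) {
  // SET with NX (only if absent) and EX (TTL in seconds) is atomic:
  // the first consumer to claim this event id gets "OK", duplicates get null.
  const claimed = await redis.set(`evt:${event.id}`, "1", "EX", 86400, "NX");
  if (claimed === null) {
    return; // already seen within the TTL window - skip the aggregation
  }
  await aggregate(event); // hypothetical upsert into the reporting database
}
```

Note that the relational-table variant has one advantage over this: the unique-constraint insert and the aggregation upsert can share a single transaction, so a failure after claiming the id rolls both back, whereas with Redis a crash between the SET and the upsert drops the event.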

Not able to understand correctly how Event Hub consumer groups work

Requirement: I want to process the same event data from multiple consumers in parallel.
What I understand from the documentation: to process the same data from an Event Hub, we need to create multiple consumer groups.
Consumer groups enable multiple consuming applications to each have a separate view of the event stream, and to read the stream independently at its own pace and with its own offsets.
Message retention: 1
Partition count: 3
Problem: I'm not getting data from the log1 consumer group. When I tried the $Default and log2 consumer groups, I was getting data in parallel.
Can anyone help me understand this problem?
Based on the official doc, your understanding is correct. I tried to create an event hub configured the same as yours to reproduce your issue.
I started two EPH (EventProcessorHost) instances via Java code to monitor log1 and log2 separately.
Then I sent 10 messages to the event hub and observed the two EPH console logs.
It seems that I can receive messages from both of them; however, the order is messy. I guess your issue is delay.
Update:
To my knowledge, delay in Azure Event Hubs may be affected by many factors, such as the network or internal mechanisms. However, Azure Event Hubs ensures that your data is not lost within the retention period.
If you need real-time data, you could use Kafka, which is built for real-time data pipelines and streaming apps.
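For reference, here is the same experiment sketched in Node.js rather than the Java EPH, using the @azure/event-hubs package. Each consumer group maintains its own offsets, so both receive every event; ordering is only guaranteed within a partition, so with three partitions some interleaving is expected:

```js
const { EventHubConsumerClient } = require("@azure/event-hubs");

const connectionString = "<event-hubs-namespace-connection-string>";
const eventHubName = "<event-hub-name>";

// Read the same stream through both consumer groups in parallel.
for (const group of ["log1", "log2"]) {
  const client = new EventHubConsumerClient(group, connectionString, eventHubName);
  client.subscribe({
    processEvents: async (events, context) => {
      for (const event of events) {
        console.log(`[${group}] partition ${context.partitionId}:`, event.body);
      }
    },
    processError: async (err) => {
      console.error(`[${group}]`, err);
    },
  });
}
```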

Is it possible to have external trigger in Cassandra?

I need a worker to subscribe to new data entries in a column family.
Currently I have to either invoke the consuming services from the producer side, or poll the column family for new data, which is a waste of resources and also adds latency.
I want some external service to be invoked when new data is written to the column family. Is it possible to invoke an external service, such as a REST endpoint, upon new data arrival?
There are two features, triggers and CDC (change data capture), that may work. You can create a trigger to receive updates and execute the HTTP request, or you can use CDC to get a per-replica copy of the mutations as a log to walk through.
CDC is better for consistency: since a trigger fires before mutations are applied, your API endpoint may be notified but the mutation may then fail to apply, leaving you in an inconsistent state. But triggers are easier, since you don't need to worry about deduplication: it's one notification per query versus one per replica. Or you can use both: triggers that update a cached state, and then CDC with a map-reduce job to fix any inconsistencies.
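Both features are switched on with plain CQL DDL; a minimal sketch, assuming a table ks.tbl, executed from Node with cassandra-driver. The trigger class com.example.NotifyTrigger is hypothetical: it must be a Java class implementing org.apache.cassandra.triggers.ITrigger, packaged as a jar and placed in each node's triggers directory.

```js
const cassandra = require("cassandra-driver");

const client = new cassandra.Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
});

async function enableChangeNotifications() {
  // Option 1: trigger - fires once per mutation on the coordinator,
  // before the mutation is applied.
  await client.execute(
    "CREATE TRIGGER notify_trigger ON ks.tbl USING 'com.example.NotifyTrigger'"
  );
  // Option 2: CDC - each replica writes applied mutations to a local
  // CDC log that a separate process can walk through.
  await client.execute("ALTER TABLE ks.tbl WITH cdc = true");
}
```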

Creating a Table in Node OPCUA

How do you extend the address space into your SQL database with bidirectional mirroring, so that any value change on the variable end or the database end is immediately reflected on the opposite end?
So if I have a table in the database whose values can be changed from outside (for example, data could be added, deleted, or updated), how would my node-opcua server be notified?
In OPC UA, any server follows an SOA architecture, meaning the server processes a request only when a client issues a service request.
In your case, you can achieve this by subscribing for data changes and monitoring the node which exposes your database table to the client. Subscribing for data change is possible only when that node is exposed to the client.
Once a node is subscribed for data change, the server needs two values from the client:
Sampling interval: how frequently the server should refresh data from the source.
Publishing interval: how frequently the client will ask the server for notifications.
Let's say, for example, the sampling interval is 100 milliseconds and the publishing interval is 1 minute. This means the server has to collect samples from the source (in your case, the database) every 100 milliseconds, but the client will request all the collected samples every 1 minute.
This way you will be able to keep the server updated with the changed values of the database table.
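A minimal sketch of that subscription from the client side with node-opcua, using the intervals from the example above (the endpoint URL and the node id ns=1;s=MyDatabaseTable are placeholders):

```js
const { OPCUAClient, AttributeIds, TimestampsToReturn } = require("node-opcua");

(async () => {
  const client = OPCUAClient.create({ endpointMustExist: false });
  await client.connect("opc.tcp://localhost:4840");
  const session = await client.createSession();

  // Publishing interval: the client asks for notifications every minute.
  const subscription = await session.createSubscription2({
    requestedPublishingInterval: 60 * 1000,
    requestedLifetimeCount: 100,
    requestedMaxKeepAliveCount: 20,
    publishingEnabled: true,
  });

  // Sampling interval: the server samples the source every 100 ms;
  // queueSize 600 holds one publishing interval's worth of samples.
  const monitoredItem = await subscription.monitor(
    { nodeId: "ns=1;s=MyDatabaseTable", attributeId: AttributeIds.Value },
    { samplingInterval: 100, discardOldest: true, queueSize: 600 },
    TimestampsToReturn.Both
  );

  monitoredItem.on("changed", (dataValue) => {
    console.log("table changed:", dataValue.value.toString());
  });
})();
```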
If the SDK supports multithreading, there is another way to achieve what is mentioned in the question:
In the server application, let the data source (i.e., database) object run in its own thread.
Create a callback into the server application layer and initialize the data source object with this callback.
When data changes in the database, trigger a call to the data source thread; if it is the required data and the server needs to be informed, call the callback function that was initialized earlier. A sketch of this follows.
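On the server side, the callback approach boils down to updating the exposed variable whenever the database notifies you; a hedged sketch with node-opcua, where watchDatabase is a hypothetical stand-in for whatever change feed your database offers:

```js
const { OPCUAServer, Variant, DataType } = require("node-opcua");

// Hypothetical: subscribe to your database's change feed
// (e.g. polling in a worker, or a native change stream).
function watchDatabase(onChange) {
  /* invoke onChange(newValue) whenever the table changes */
}

(async () => {
  const server = new OPCUAServer({ port: 4840 });
  await server.initialize();

  const addressSpace = server.engine.addressSpace;
  const namespace = addressSpace.getOwnNamespace();

  const device = namespace.addObject({
    organizedBy: addressSpace.rootFolder.objects,
    browseName: "Database",
  });
  const tableNode = namespace.addVariable({
    componentOf: device,
    browseName: "MyDatabaseTable",
    dataType: "String",
  });

  // The callback: when the data source reports a change, push the new
  // value into the address space so subscribed clients get notified.
  watchDatabase((newValue) => {
    tableNode.setValueFromSource(
      new Variant({ dataType: DataType.String, value: newValue })
    );
  });

  await server.start();
})();
```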
I hope this answers your question.

Detect that a new record was added to a Cassandra table

I have a requirement: when a new comment is posted, I want to get all previous comments' owner IDs and send a notification.
The problem here is how I will know that a new comment was added to the Cassandra table. What would the solution for this kind of requirement be?
If you want to use only cassandra, without changes, it's impossible.
With changes, you have three options:
You can run Cassandra as an embedded service in Java. Here is a simple and short how-to: http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
You can create a wrapper for your Cassandra connection: an application which handles the Cassandra connection and is available via an API to your other applications.
Cassandra has trigger functionality. (I have never used it, and I have never heard of anyone using it.)
I prefer the second solution. Here are the reasons why:
It's simpler to create.
You can handle all your views in this application.
You can validate the input, resolve relations, log data, etc.
You can simply push the newly added comment to Kafka or another message queue.
This could be a setup:
Create a new comment -> call a backend API -> call the Cassandra database interface -> push a new message to Kafka -> send the data to all Kafka consumers
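A minimal sketch of that setup (the wrapper from option 2) with Express, cassandra-driver, and kafkajs; the keyspace, table, and topic names are placeholders:

```js
const express = require("express");
const cassandra = require("cassandra-driver");
const { Kafka } = require("kafkajs");

const db = new cassandra.Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
  keyspace: "ks",
});
const kafka = new Kafka({ brokers: ["localhost:9092"] });
const producer = kafka.producer();

const app = express();
app.use(express.json());

// All writes go through this endpoint, so every new comment can be
// validated, stored in Cassandra, and published to Kafka in one place.
app.post("/comments", async (req, res) => {
  const { postId, authorId, text } = req.body;
  await db.execute(
    "INSERT INTO comments (post_id, author_id, text, created_at) VALUES (?, ?, ?, toTimestamp(now()))",
    [postId, authorId, text],
    { prepare: true }
  );
  await producer.send({
    topic: "new-comments",
    messages: [{ value: JSON.stringify({ postId, authorId, text }) }],
  });
  res.status(201).end();
});

producer.connect().then(() => app.listen(3000));
```

The notification service is then just another Kafka consumer on the new-comments topic, which can look up the previous comments' owner IDs and fan out the notifications.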
