Using Pub/Sub for Google Cloud Storage with GKE - node.js

I have a GKE application that is currently driven by notifications from a Google Cloud Storage bucket. I want to convert this Node.js application to be triggered by Pub/Sub notifications instead. I've been crawling through Google documentation pages most of the day and don't have a clear answer. I see some Python code that might do it, but it's not helping much.
The code as currently written works: an image landing in my GCS bucket triggers a notification to my GKE pod(s), and my function runs. I'm trying to understand what I need to do inside my function to subscribe to a Pub/Sub topic to trigger the processing. Any and all suggestions welcome.

Firstly, thanks! I didn't know about the notification capability of GCS.
The principle is similar, but you use Pub/Sub as an intermediary. Instead of notifying your application directly with a watchbucket command, you notify a Pub/Sub topic.
From there, the notifications arrive in the Pub/Sub topic, and you have to create a subscription. Two types are possible:
Push: you specify an HTTP URL that is called with a POST request, and the body contains the notification message.
Pull: your application needs to open a connection to the Pub/Sub subscription and read the messages.
Pros and cons
Push requires authentication from the Pub/Sub push subscription to your application. If you use an internal IP, you can't use this solution (the URL endpoint must be publicly accessible). The main advantages are the scalability and the simplicity of the model.
Pull requires authentication of the subscriber (here, your application), so even if your application is deployed privately, you can use a pull subscription. Pull is recommended for high throughput, but it requires more skill in message processing and concurrency/multi-threaded programming. You don't scale on request rate (as with the push model) but according to the number of messages that you read, and you need to acknowledge the messages manually.
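If you go with the pull option, here is a minimal subscriber sketch in Node.js, assuming the @google-cloud/pubsub client library and a placeholder subscription name (adapt it to call your existing image-processing function):
const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();
// 'gcs-notifications-sub' is a placeholder for your subscription name.
const subscription = pubsub.subscription('gcs-notifications-sub');

subscription.on('message', (message) => {
  // GCS notification attributes (bucketId, objectId, eventType, ...).
  console.log('Attributes:', message.attributes);
  // The payload is the object metadata as JSON; with the client library,
  // message.data is already a decoded Buffer, so no manual base64 step.
  const objectMetadata = JSON.parse(message.data.toString());
  // ... call your existing processing function here ...
  message.ack(); // acknowledge so the message is not redelivered
});

subscription.on('error', (err) => {
  console.error('Subscription error:', err);
});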
The data model is described here. Your Pub/Sub message looks like this:
{
  "data": string,
  "attributes": {
    string: string,
    ...
  },
  "messageId": string,
  "publishTime": string,
  "orderingKey": string
}
The attributes are described in the documentation, and the payload (base64 encoded, be careful) has this format. It's very similar to what you get today.
So, why the attributes? Because you can use the filter feature on Pub/Sub to create a subscription that receives only a subset of the messages.
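As a sketch of how such a filtered subscription could be created with the Node.js client (the topic and subscription names are placeholders, and the filter expression is an assumption based on the Pub/Sub filtering feature and the eventType attribute GCS sets on its notifications):
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function createFilteredSubscription() {
  // Only deliver notifications for newly finalized (uploaded) objects.
  const [subscription] = await pubsub
    .topic('gcs-notifications')               // placeholder topic name
    .createSubscription('new-objects-only', { // placeholder subscription name
      filter: 'attributes.eventType = "OBJECT_FINALIZE"',
    });
  console.log(`Created subscription ${subscription.name}`);
}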
You can also shift gears and use Cloud Events (based on Knative Eventing) if you use Cloud Run for Anthos in your GKE cluster. Here, the main advantage is the portability of the solution, because the messages comply with the Cloud Events format and are not specific to GCP.

Related

Subscribe non-exposed NodeJS Application to SNS

In my current architecture, I have a NodeJS application that posts a message to SQS, which triggers a Lambda function that (finally) puts the result in MongoDB. While the Lambda is running, the NodeJS app polls MongoDB until the status field changes to SUCCESS or FAILED.
I would like to change this architecture to be event-driven rather than relying on polling. To achieve that, I considered having the NodeJS app subscribe to an SNS topic and the Lambda function post the result to that topic.
However, I faced a challenge when attempting to subscribe to the SNS topic. The subscribe method demands an endpoint to confirm the subscription, and the NodeJS app in question is not exposed (it's not an API). So how could the subscription be confirmed?
I understand that AWS might want to avoid spam by requiring subscription confirmation for SMS and email, but in this case the subscriber is a simple application...
Is there any way to subscribe to the topic without exposing the NodeJS application? Or is SNS not the appropriate solution here?
I have used RabbitMQ for this in the past, but I would rather not deploy an instance and instead leverage a platform-as-a-service type of product.

Azure Service Bus Topic as Google PubSub Push Subscriber

The requirement is that we have an Azure Service Bus topic that we want to set up as a push subscriber to a Google PubSub topic. This way, any messages published to the Google PubSub topic will be pushed to the Azure SB topic without any intermediate layer involved.
On paper this should work, because messages can be published to an Azure SB topic through its REST API, and Google PubSub can be configured to push to an API endpoint as a subscriber.
I have gone through the following articles but couldn't make this linking work:
Azure SB as API: https://learn.microsoft.com/en-us/rest/api/servicebus/send-message-to-queue
Google PubSub Push Subscriptions: https://cloud.google.com/pubsub/docs/push
Has anyone done this kind of linking before?
Thanks in Advance
It would be nice to create a Pub/Sub push subscription that pushes to Azure Event Hub. I ran into the same problem when configuring this setup.
Pub/Sub push subscriptions currently do not support custom Authorization headers, which are needed as per the Event Hub documentation:
POST https://your-namespace.servicebus.windows.net/your-event-hub/messages?timeout=60&api-version=2014-01 HTTP/1.1
Authorization: SharedAccessSignature sr=your-namespace.servicebus.windows.net&sig=your-sas-key&se=1403736877&skn=RootManageSharedAccessKey
Content-Type: application/atom+xml;type=entry;charset=utf-8
Host: your-namespace.servicebus.windows.net
{ "DeviceId":"dev-01", "Temperature":"37.0" }
So these are the only two options I see here:
Push setup: create a Cloud Function or Dataflow job in Google Cloud, push Pub/Sub events to this endpoint, and then pass the event on to Azure Event Hub with the appropriate headers.
Pull setup: poll a Pub/Sub pull subscription from the Azure side with an Azure Function or WebJob.
Both options require extra compute resources, so this is definitely not the preferred way of doing it. I would always try a push setup first, since then you don't have to continuously run a polling job in the background.
I have hopes that Pub/Sub push subscriptions will support custom headers at some point in the future. Lately, some other useful features have been added, like dead lettering, message ordering, and partitioning (Lite topics). Hopefully they will add custom headers as well.
I have a workaround for this problem.
1. You can create a Cloud Function on Google Cloud that pushes the data to Azure Service Bus for you.
2. You can develop a WebJob on Azure that runs continuously and checks the Google Pub/Sub topic with the help of the provided connection string and the Google Pub/Sub client libraries.
With either of the above solutions you can get data from Google Cloud and push it to your Azure Service Bus.
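As a rough sketch of the first option, a Pub/Sub-triggered background Cloud Function (Node.js) forwarding messages to Service Bus might look like this, assuming the @azure/service-bus client library; the queue/topic name and environment variable are placeholders:
const { ServiceBusClient } = require('@azure/service-bus');

exports.forwardToServiceBus = async (pubsubMessage, context) => {
  // Background Pub/Sub triggers deliver the payload base64-encoded.
  const body = Buffer.from(pubsubMessage.data, 'base64').toString();

  const sbClient = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING);
  const sender = sbClient.createSender('my-sb-topic'); // placeholder name
  try {
    await sender.sendMessages({ body });
  } finally {
    await sender.close();
    await sbClient.close();
  }
};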

How to consume events delivered by Azure Event Grid to GCP

Basically, what I understood from a few Azure topics is as below:
Azure Event Hub - where data is received initially and converted into events
Service Bus - acting as a queue
Azure Event Grid - where the events converted in the hub are transferred
so the connection is like below:
Hub -> Service Bus -> Event Grid -> Pub Sub -> Storage
I understood this concept. My problem is that I want data to be pushed from the Event Grid to GCP (subscription/topics). My questions are:
How can I establish this using PUSH method?
What do I need to develop exactly?
How can I push things from grid to pubsub/subscriptions?
I found this link where data is being published into Event Grid, but I want to push data from the Event Grid to GCP. Can anybody explain where I am going wrong or what exactly I should start with? I am new to this and it's very confusing, so I just need a little bit of guidance here.
I have the following doubts:
Is there any direct subscriber option available with the Event Grid listener? I mean, can I directly link my Google storage account with this listener so that whenever an event is triggered it is pushed directly to my GCP account? (I don't have an Azure account with me right now since an access issue is in progress, so I can't check it myself; that's why I am asking here.)
Suppose I have 20 columns in my data but I want only 16 columns to be pushed to GCP; is there any customization possible while sending data from Event Grid/Event Hub to Pub/Sub?
If I write custom connector code as per the links provided in the answers below, how can I run it? I mean, where can I deploy those scripts in the cloud so that they are triggered automatically whenever an event occurs?
Can I implement webhooks in this scenario (as an alternative to connectors)? If yes, how can I do it and on which side do I need to create them?
Also, I read some articles and learned from a few people that they experienced data loss in this entire process. So, how likely is that here and how can it be avoided?
Can anybody explain where I am going wrong or what exactly I should start with?
It's right here:
so the connection is like below:
Hub -> Service Bus -> Event Grid -> Pub Sub -> Storage
Although this might be the case, it sounds very much as if you're looking at one (very) specific scenario where data flows in this exact way.
Azure Event Hub, Azure Service Bus and Azure Event Grid can work together, but can also be used completely separate from each other.
Event Grid
The purpose of Event Grid is to enable Reactive programming. Use this when you want to react to (status) changes.
Event Hubs
Event Hubs facilitate a big data pipeline. Use this when you need telemetry and distributed data streaming.
Service Bus
The purpose of Service Bus is to enable high-value enterprise messaging. Use this when you want to do something like order processing and financial transactions.
In some cases, you use the services side by side to fulfill distinct roles. For example, an ecommerce site can use Service Bus to process the order, Event Hubs to capture site telemetry, and Event Grid to respond to events like an item was shipped.
In other cases, you link them together to form an event and data pipeline. You use Event Grid to respond to events in the other services. For an example of using Event Grid with Event Hubs to migrate data to a data warehouse, see Stream big data into a data warehouse.
Taken from the very interesting and important documentation article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus
EDIT
My problem is I want data to be pushed from Event Grid to GCP (subscription/topics). So how can I establish this using the PUSH method?
Possibly the simplest solution is to have an Event Grid event trigger a webhook (which might run an Azure Function or a Google Cloud Function) which in turn puts the event/message on the GCP topic.
Publishing messages is quite well documented. There are examples of how to do so with a REST call, the command line, C#, Go, Java, NodeJS, PHP, Python and Ruby.
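To make the Azure Function variant concrete, here is a rough sketch of an Event Grid-triggered function (Node.js) that republishes the event to a Cloud Pub/Sub topic, assuming the @google-cloud/pubsub client library, a placeholder topic name, and Google credentials available to the function app:
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();

module.exports = async function (context, eventGridEvent) {
  context.log('Received Event Grid event of type', eventGridEvent.eventType);
  // Republish the whole Event Grid event as the Pub/Sub message payload.
  const data = Buffer.from(JSON.stringify(eventGridEvent));
  const messageId = await pubsub.topic('my-gcp-topic').publishMessage({ data });
  context.log('Published to Pub/Sub with message id', messageId);
};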
EDIT 2
What you need to do is create an Event Grid Subscription to listen to and handle Event Grid Events.
For example, you can configure the subscription to listen for events from a specific Storage Account and call a WebHook whenever such an event occurs.
Pay attention to the "Endpoint Details": that's where you can specify, for instance, a webhook to call every time an event is triggered.
The easiest way to transfer the Event Hub-generated events would probably be to create an Event Hub event receiver in Node.js (which you mentioned in your comments), as described here, which receives events and publishes them to Cloud Pub/Sub directly, as described in the Cloud Pub/Sub publisher documentation for Node.js.
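A minimal sketch of such a receiver, assuming the @azure/event-hubs and @google-cloud/pubsub client libraries (consumer group, connection string, hub and topic names are placeholders):
const { EventHubConsumerClient } = require('@azure/event-hubs');
const { PubSub } = require('@google-cloud/pubsub');

const consumer = new EventHubConsumerClient(
  '$Default',                                  // consumer group
  process.env.EVENT_HUB_CONNECTION_STRING,     // placeholder connection string
  'my-event-hub'                               // placeholder hub name
);
const topic = new PubSub().topic('my-gcp-topic'); // placeholder topic name

consumer.subscribe({
  processEvents: async (events) => {
    for (const event of events) {
      // Forward each Event Hub event body to Cloud Pub/Sub.
      await topic.publishMessage({ data: Buffer.from(JSON.stringify(event.body)) });
    }
  },
  processError: async (err) => {
    console.error('Event Hub receive error:', err);
  },
});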

How to forward a message from Azure estate to NodeJS app?

I'm creating an IoT solution. On my web app I want to have a table on the dashboard that updates alert events in real time.
I currently have an API, written in NodeJS, that receives the JSON and invokes socket.io to update the table. This solution seems a bit clunky.
I'm wondering if there is a more seamless way to do this, similar to how the different components within Azure link together.
I've looked into Azure Queues and having NodeJS subscribe to and consume from the queue, but as far as I could find there's no way to persist a queue connection from NodeJS, and I would have had to poll continually.
I've looked into Power BI as well, but I need to be able to completely change and modify every aspect of the design too.
The following is what I have so far:
Devices send data to IoT Hub. This is then processed by an Azure Stream Analytics job, and if a certain criterion is hit it sends the message to DocumentDB for storage and also to a Service Bus queue. I have a Logic App which is triggered by a message arriving on the Service Bus queue and then POSTs the data to my API.

Reading Azure Service Bus Queue

I'm simply trying to work out how best to retrieve messages as quickly as possible from an Azure Service Bus Queue.
I was shocked that there wasn't some way to properly subscribe to the queue for notifications and that I'm going to have to poll (unless I'm wrong, in which case the documentation is terrible).
I got long polling working, but checking a single message every 60 seconds looks like it'll cost around £900 per month (again, unless I've misunderstood that). And if I add a redundant/second service to poll, it'll double.
So I'm wondering what the best/most cost-efficient way of doing it is.
Essentially I just want to take a message from the queue, perform an API lookup on some internally held data (perhaps using hybrid services?), and then perhaps post a message back to a different queue with some additional information.
I looked at worker roles(?). Is that something that could do it?
I should mention that I've been looking at doing this with node.js.
Check out these videos from Scott Hanselman and Mark Simms on Azure Queues.
It's C# but you get the idea.
https://channel9.msdn.com/Search?term=azure%20queues%20simms#ch9Search
Touches on:
Storage Queues vs. Service Bus Queues
Grabbing messages in bulk vs. one by one (chunky vs. chatty)
Dealing with poison messages (bad actors)
Misc implementation details
Much more stuff I can't remember now
As for your compute, you can either do a VM, a Worker Role (Cloud Services), App Service Webjobs, or Azure Functions.
The WebJobs SDK and Azure Functions both have a way to subscribe to queue events (notify on message).
(Listed from IaaS to PaaS to FaaS - Azure Functions - if such a thing exists).
Azure Functions already has sample code provided as templates to do all that with Node. Just make a new Function and follow the wizard.
If you need to touch data on-prem, you either need to look at integrating with a VNET that has site-to-site connectivity back to your premises, or Hybrid Connections (App Service only!). Azure Functions can't do that yet, but every other compute option is a go.
https://azure.microsoft.com/en-us/documentation/articles/web-sites-hybrid-connection-get-started/
(That tutorial is Windows only but you can pull data from any OS. The Hybrid Connection Manager has to live on a Windows box, but then it acts as a reverse proxy to any host on your network).
To deal with an Azure Service Bus queue easily, the best option seems to be an Azure WebJob.
There is a ServiceBusTrigger that allows you to get messages from an Azure Service Bus queue.
For Node.js integration, you should have a look at Azure Functions. They are built on top of the WebJobs SDK and have Node.js integration:
Azure Functions NodeJS developer reference
Azure Functions Service Bus triggers and bindings for queues and topics
In the second article, there is an example of how to get messages from a queue using Azure Functions and Node.js:
module.exports = function (context, myQueueItem) {
  context.log('Node.js ServiceBus queue trigger function processed message', myQueueItem);
  context.done();
};
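For reference, the accompanying function.json binding for such a function typically looks something like this (the queue name and connection setting name are placeholders):
{
  "bindings": [
    {
      "name": "myQueueItem",
      "type": "serviceBusTrigger",
      "direction": "in",
      "queueName": "my-queue",
      "connection": "MyServiceBusConnectionString"
    }
  ]
}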

Resources