Proper way of tracing distributed requests through Azure Function Apps

I am experimenting with Node.js and the Application Insights SDK in two separate function apps. Node.js is just what I am comfortable with for a quick PoC; it might not be the final language, so I am not looking for language-specific solutions, simply for how Application Insights behaves in the context of function apps and what it expects in order to draw a proper application map.
My goal is to be able to write simple queries in Log Analytics that return the full chain of a single request through multiple function apps, no matter how these are connected. I also want as accurate a view of the system as possible in the Application Insights application map.
My assumption is that a properly set operation_Id and operation_ParentId would yield both a queryable trace in Kusto and a proper application map.
I've set up the following flow:
Function1 only exposes an HTTP trigger, whereas Function2 exposes both an HTTP and a Service Bus trigger.
The full flow looks like this:
1. I call Function1 using GET http://function1.com?input=test
2. Function1 calls Function2 using REST at GET http://function2.com?input=test
3. Function1 uses the response from Function2 to add a message to a Service Bus queue
4. Function2 has a trigger on that same queue
I am mixing patterns here just to see what the application map does and understand how to use this correctly.
For steps 1 through 3, I can see the entire chain in my logs under a single operation_Id; the same operation_Id spans the two different function apps.
What I would also expect to find in this log is the Service Bus trigger invocation (the trigger is called ServiceBusTrigger). The Service Bus trigger does fire on the message, it just gets a different operation_Id.
To get the REST correlation to work, I followed the guidelines from the applicationinsights npm package, in the section called Setting up Auto-Correlation for Azure Functions.
This is what Function1 looks like (the entry point and start of the chain):
let appInsights = require('applicationinsights')
appInsights
    .setup()
    .setAutoCollectConsole(true, true)
    .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
    .start()

const https = require('https')

const httpTrigger = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.')
    const response = await callOtherFunction(req)
    context.res = {
        body: response
    }
    context.log("Sending response on service bus")
    context.bindings.outputSbQueue = response;
}

async function callOtherFunction(req) {
    return new Promise((resolve, reject) => {
        https.get(`https://function2.azurewebsites.net/api/HttpTrigger1?code=${process.env.FUNCTION_2_CODE}&input=${req.query.input}`, (resp) => {
            let data = ''
            resp.on('data', (chunk) => {
                data += chunk
            })
            resp.on('end', () => {
                resolve(data)
            })
        }).on("error", (err) => {
            reject("Error: " + err.message)
        })
    })
}

module.exports = async function contextPropagatingHttpTrigger(context, req) {
    // Start an AI Correlation Context using the provided Function context
    const correlationContext = appInsights.startOperation(context, req);

    // Wrap the Function runtime with correlationContext
    return appInsights.wrapWithCorrelationContext(async () => {
        const startTime = Date.now(); // Start trackRequest timer

        // Run the Function
        const result = await httpTrigger(context, req);

        // Track Request on completion
        appInsights.defaultClient.trackRequest({
            name: context.req.method + " " + context.req.url,
            resultCode: context.res.status,
            success: true,
            url: req.url,
            time: new Date(startTime),
            duration: Date.now() - startTime,
            id: correlationContext.operation.parentId,
        });
        appInsights.defaultClient.flush();

        return result;
    }, correlationContext)();
};
And this is what the HTTP trigger in Function2 looks like:
let appInsights = require('applicationinsights')
appInsights
    .setup()
    .setAutoCollectConsole(true, true)
    .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
    .start()

const httpTrigger = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.')
    context.res = {
        body: `Function 2 received ${req.query.input}`
    }
}

module.exports = async function contextPropagatingHttpTrigger(context, req) {
    // Start an AI Correlation Context using the provided Function context
    const correlationContext = appInsights.startOperation(context, req);

    // Wrap the Function runtime with correlationContext
    return appInsights.wrapWithCorrelationContext(async () => {
        const startTime = Date.now(); // Start trackRequest timer

        // Run the Function
        const result = await httpTrigger(context, req);

        // Track Request on completion
        appInsights.defaultClient.trackRequest({
            name: context.req.method + " " + context.req.url,
            resultCode: context.res.status,
            success: true,
            url: req.url,
            time: new Date(startTime),
            duration: Date.now() - startTime,
            id: correlationContext.operation.parentId,
        });
        appInsights.defaultClient.flush();

        return result;
    }, correlationContext)();
};
The Node.js application insights documentation says:
The Node.js client library can automatically monitor incoming and outgoing HTTP requests, exceptions, and some system metrics.
So this seems to work for HTTP, but what is the proper way to do this over (for instance) a Service Bus queue to get a nice message trace and a correct application map? The above solution for the applicationinsights SDK only seems to apply to HTTP requests, where you use the req object on the context. How is the operation_Id persisted in cross-app communication in these cases?
What is the proper way of doing this across other messaging channels? What do I get for free from Application Insights, and what do I need to stitch together myself?
UPDATE
I found this piece of information in the application map documentation, which seems to support the working theory that only REST/HTTP calls can be traced. But then the question remains: how does the output binding work if it is not an HTTP call?
The app map finds components by following HTTP dependency calls made between servers with the Application Insights SDK installed.
UPDATE 2
In the end I gave up on this. In conclusion: Application Insights traces some things, but it is very unclear when and how that works, and it also depends on the language. The Node.js docs say:
The Node.js client library can automatically monitor incoming and outgoing HTTP requests, exceptions, and some system metrics. Beginning in version 0.20, the client library also can monitor some common third-party packages, like MongoDB, MySQL, and Redis. All events related to an incoming HTTP request are correlated for faster troubleshooting.
I solved this by taking inspiration from OpenTracing. Our entire stack runs in Azure Functions, so I implemented logic that passes a correlationId through all processes. Each process is a span. Each function/process is responsible for logging according to a structured logging framework.
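For illustration, a minimal sketch of that idea, assuming the id travels inside the message payload itself; the header name, message shape, and use of crypto.randomUUID are my assumptions, not part of the original setup:

// Function1 (HTTP trigger): reuse or mint a correlation id, log it,
// and embed it in the outgoing Service Bus message
const { randomUUID } = require('crypto')

module.exports = async function (context, req) {
    const correlationId = req.headers['x-correlation-id'] || randomUUID()
    context.log(JSON.stringify({ correlationId, msg: 'span: Function1' }))
    context.bindings.outputSbQueue = JSON.stringify({
        correlationId,
        input: req.query.input,
    })
}

// Function2 (Service Bus trigger, separate file): pick the id back up and keep logging with it
module.exports = async function (context, message) {
    const { correlationId, input } = typeof message === 'string' ? JSON.parse(message) : message
    context.log(JSON.stringify({ correlationId, msg: `span: Function2, input ${input}` }))
}

With structured log lines like these, a simple query on the correlationId property stitches the whole chain together, independent of the transport.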

Related

How to cancel a task enqueued on Firebase Functions?

I'm talking about this: https://firebase.google.com/docs/functions/task-functions
I want to enqueue tasks with the scheduleTime parameter to run in the future, but I must be able to cancel those tasks.
I expected it would be possible to do something like this pseudo code:
const task = await queue.enqueue({ foo: true })
// Then...
await queue.cancel(task.id)
I'm using Node.js. In case it's not possible to cancel a scheduled task with firebase-admin, can I somehow work around it by using @google-cloud/tasks directly?
PS: I've also created a feature request: https://github.com/firebase/firebase-admin-node/issues/1753
The Firebase SDK doesn't currently return the task name/ID as in your pseudo code.
If you need this functionality, I'd recommend filing a feature request and meanwhile using Cloud Tasks directly.
You can simply create an HTTP function and then use the Cloud Tasks SDK to create HTTP target tasks that call this Cloud Function instead of using onDispatch.
// Instead of onDispatch()
export const handleQueueEvent = functions.https.onRequest((req, res) => {
    // ...
});
Adding a Cloud Task:
const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

async function createHttpTask() {
    const parent = client.queuePath(project, location, queue);
    const task = {
        httpRequest: {
            httpMethod: 'POST', // change method if required
            url, // Use URL of handleQueueEvent function here
        },
    };
    const request = {
        parent: parent,
        task: task
    };
    const [response] = await client.createTask(request);
    // response.name is the fully qualified task name; keep it if you
    // want to cancel the task later
    return response.name;
}
Check out the documentation for more information.
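Since cancellation was the original ask: assuming createHttpTask() returns the task name as sketched above, the cancel side with @google-cloud/tasks could look like this (hedged, untested):

const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

// taskName is the fully qualified name returned by createTask(), e.g.
// projects/<project>/locations/<location>/queues/<queue>/tasks/<task-id>
async function cancelHttpTask(taskName) {
    // Deleting a task removes it from the queue so it is never dispatched
    await client.deleteTask({ name: taskName });
}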

NodeJS Express API - Ticketing/Queue System

Rephrased at the end
NodeJS communicates with other APIs through GRPC.
Each external API has its own dedicated GRPC connection with Node, and every dedicated GRPC connection has an upper bound of concurrent clients that it can serve simultaneously (e.g. External API 1 has an upper bound of 30 users).
Every request to the Express API may need to communicate with External API 1, External API 2, or External API 3 (from now on, EAP1, EAP2, etc.), and the Express API also has an upper bound of concurrent clients (e.g. 100 clients) that it can feed the EAPs with.
So, how I am thinking of solving the issue:
A Client makes a new request to the Express API.
A middleware, queueManager, creates a Ticket for the client (think of it as a Ticket that approves access to the System - it has basic data of the Client (e.g. name))
The Client gets the Ticket, creates an Event Listener that listens for an event with their Ticket ID as the event name (when the System is ready to accept a Ticket, it emits the Ticket's ID as an event), and enters a "Lobby" where the Client just waits until their Ticket ID is announced (event).
My issue is that I can't really figure out how the system should keep track of the tickets and how to implement a queue based on the system's concurrent-client limits.
Before the client is granted access to the System, the System itself should:
Check if the Express API has reached its upper bound of concurrent clients -> If that's true, it should just wait till a new Ticket position is available
If a new position is available, it should check the Ticket and find out which API it needs to contact. If, for example, it needs to contact EAP1, it should check how many clients currently use the GRPC connection. This is already implemented (every External API is wrapped in a Class that has all the information needed). If EAP1 has reached its upper bound, then NodeJS should try again later (but how much later? Should I emit a system event after the System has completed another request to EAP1?)
I'm aware of Bull, but I am not really sure if it fits my requirements.
What I really need to do is to have the Clients in a queue, and:
Check if Express API has reached its upper-bound of concurrent users
If a position is free, pop() a Ticket from the Ticket's array
Check if the EAPx has reached its upper-bound limit of concurrent users
If true, try another ticket (if available) that needs to communicate with a different EAP
If false, grant access
Edit: One more idea could be to have two Bull Queues: one for the Express API (where the "concurrency" option could be set to the upper bound of the Express API) and one for the EAPs. Each EAP Queue would have a distinct worker (in order to set the upper-bound limits). A rough sketch of that idea follows.
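A minimal sketch of the two-queue idea, assuming Bull's process(concurrency, handler) signature and a default local Redis; all names here are illustrative:

const Queue = require('bull')

const ingress = new Queue('express-api') // fed by the Express API
const eap1 = new Queue('eap1')           // dedicated to External API 1

// The first argument of process() is the worker concurrency (the upper bound)
ingress.process(100, async (job) => {
    // forward to the right EAP queue and wait for its result
    const eapJob = await eap1.add(job.data)
    return eapJob.finished() // resolves when the EAP worker finishes
})

eap1.process(30, async (job) => {
    // talk to EAP1 over its dedicated GRPC connection here
})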
REPHRASED
In order to be more descriptive about the issue, I'll try to rephrase the needs.
A simple view of the System could be:
I have used Clem's suggestion (RabbitMQ), but again, I can't achieve concurrency with limits (upper-bounds).
So,
Client asks for a Ticket from the TicketHandler. In order for the TicketHandler to construct a new Ticket, the client, along with other information, provides a callback:
TicketHandler.getTicket(variousInfo, function () {
    next();
})
The callback will be used by the system to allow a Client to connect with an EAP.
TicketHandler gets the ticket:
i) Adds it to the queue
ii) When the ticket can be accessed (the upper bound is not reached), it asks the appropriate EAP Handler whether the client can make use of the GRPC connection. If yes, it asks the EAP Handler to lock a position and then calls the ticket's callback (from Step 1).
If no, TicketHandler checks the next available Ticket that needs to contact a different EAP. This goes on until the EAP Handler that first said "No position is available" sends a message informing TicketHandler that "Now there are X available positions" (or "1 available position"). Then TicketHandler should recheck the ticket that couldn't access EAPx before, and ask EAPx again whether it can access the GRPC connection.
From your description I understand what follows:
You have a Node.js front-tier. Each Node.js box needs to be limited to up to 100 clients
You have an undefined back-tier that has GRPC connections with the boxes in the front-tier (let's call them EAPs). Each EAP <-> Node.js GRPC link is limited to N concurrent connections.
What I see here are only server-level and connection-level limits, so I see no reason to use a distributed system (like Bull) to manage the queue (if a Node.js box dies, no one is able to recover the HTTP request context to offer a response to that specific request - therefore when a Node.js box dies, responses to its requests are no longer useful).
This being considered I would simply create a local queue (as simple as an array) to manage your queuing.
Disclaimer: consider what follows pseudo-code; it is simplified and untested.
This may be a Queue implementation:
interface SimpleQueueObject<Req, Res> {
    req: Req;
    then: (res: Res) => void;
    catch: (err: any) => void;
}

class SimpleQueue<Req = any, Res = any> {
    constructor(
        protected size: number = 100,
        /** async function to be executed when a request is de-queued */
        protected execute: (req: Req) => Promise<Res>,
        /** an optional function that may be used to indicate a request is
         * not yet ready to be de-queued. In such case the next request will be attempted */
        protected ready?: (req: Req) => boolean,
    ) { }

    _queue: SimpleQueueObject<Req, Res>[] = [];
    _running: number = 0;

    private _dispatch() {
        // De-queues as many requests as there are free slots
        while (this._running < this.size && this._queue.length > 0) {
            // Accept
            let obj;
            if (this.ready) {
                const ix = this._queue.findIndex(o => this.ready!(o.req));
                // todo : nothing is ready; this may cause the queue to stall
                if (ix === -1) return;
                obj = this._queue.splice(ix, 1)[0];
            } else {
                // FIFO: take the oldest queued request
                obj = this._queue.shift();
            }
            // Execute
            this.execute(obj.req)
                // Resolves the main request
                .then(obj.then)
                .catch(obj.catch)
                // Attempts to de-queue something else after an outcome from the EAP
                .finally(() => {
                    this._running--;
                    this._dispatch();
                });
            this._running++;
        }
    }

    /** Queue a request, fail if the queue is busy */
    queue(req: Req): Promise<Res> {
        if (this._running >= this.size) {
            throw "Queue is busy";
        }
        // Queue up
        return new Promise<Res>((resolve, reject) => {
            this._queue.push({ req, then: resolve, catch: reject });
            this._dispatch();
        });
    }

    /** Queue a request (even if busy), but wait a maximum time
     * for the request to be de-queued */
    queueTimeout(req: Req, maxWait: number): Promise<Res> {
        return new Promise<Res>((resolve, reject) => {
            const obj: SimpleQueueObject<Req, Res> = { req, then: resolve, catch: reject };
            // Expire if not started after maxWait
            const _t = setTimeout(() => {
                const ix = this._queue.indexOf(obj);
                if (ix !== -1) {
                    this._queue.splice(ix, 1);
                    reject("Request expired");
                }
            }, maxWait);
            // todo : clear the timeout once the request is de-queued
            // Queue up
            this._queue.push(obj);
            this._dispatch();
        })
    }

    isBusy(): boolean {
        return this._running >= this.size;
    }
}
And then your Node.js business logic may do something like:
const EAP1: SimpleQueue = /* ... */;
const EAP2: SimpleQueue = /* ... */;

const INGRESS: SimpleQueue = new SimpleQueue<any, any>(
    100,
    // Forward request to EAP
    async req => {
        if (req.forEap1) {
            // Example 1: this will fail if EAP1 is busy
            return EAP1.queue(req);
        } else if (req.forEap2) {
            // Example 2: this will fail if EAP2 is busy and the request can not
            // be queued within 200ms
            return EAP2.queueTimeout(req, 200);
        }
    }
)

app.get('/', function (req, res) {
    // Forward request to ingress queue
    INGRESS.queue(req)
        .then(r => res.status(200).send(r))
        .catch(e => res.status(400).send(e));
})
Or this solution will allow you (as requested) to also accept requests for busy EAPs (up to a max of 100 in total) and dispatch them when they become ready:
const INGRESS: SimpleQueue = new SimpleQueue<any, any>(
    100,
    // Forward request to EAP
    async req => {
        if (req.forEap1) {
            return EAP1.queue(req);
        } else if (req.forEap2) {
            return EAP2.queue(req);
        }
    },
    // Delay queue for busy consumers
    req => {
        if (req.forEap1) {
            return !EAP1.isBusy();
        } else if (req.forEap2) {
            return !EAP2.isBusy();
        } else {
            return true;
        }
    }
)
Please note that:
in this example, Node.js will start throwing when more than 100 concurrent requests are received (it is not unusual to throw a 503 while throttling)
Be careful when you have multiple throttling limits (Node.js and GRPC in your case), as the first may cause starvation of the second (think about receiving 100 requests for EAP1 and then 10 for EAP2: Node.js will be full of EAP1 requests and will refuse EAP2 ones even though EAP2 is doing nothing)

Slack delayed message integration with Node TypeScript and Lambda

I started implementing a slash command which kept evolving and eventually might hit the 3-second Slack response limit. I am using serverless-stack with Node and TypeScript. With sst (and the VS Code launch file) it hooks and attaches the debugger to the lambda function, which is pretty neat for debugging.
When hitting the API endpoint I tried various methods to send back an acknowledgement to Slack, do my thing, and send a delayed message back, without success. I didn't have much luck finding info on this, but one good source was this SO answer - unfortunately it didn't work. I didn't use request-promise since it's deprecated and tried to implement it with vanilla methods (maybe that's where I failed?). But invoking a second lambda function from within (like in the first example of the post) didn't seem to stay within the 3s limit either.
I am wondering if I am doing something wrong or if attaching the debugger is just taking too long, etc.
However, before attempting to send a delayed message it was fine including accessing and scanning DynamoDB records, manipulating the results, and then responding back to Slack, all with the debugger attached, without hitting the timeout.
Attempting to use a POST
export const answer: APIGatewayProxyHandlerV2 = async (
    event: APIGatewayProxyEventV2, context, callback
) => {
    const slack = decodeQueryStringAs<SlackRequest>(event.body);
    axios.post(slack.response_url, {
        text: "completed",
        response_type: "ephemeral",
        replace_original: "true"
    });
    return { statusCode: 200, body: '' };
}
The promise never resolved; I guess that once the function returns, the lambda gets disposed, and so does the promise?
Invoking 2nd Lambda function
export const v2: APIGatewayProxyHandlerV2 = async (
    event: APIGatewayProxyEventV2, context, callback
): Promise<any> => {
    // tried with CB here and without
    // callback(null, { statusCode: 200, body: 'processing' });
    const slack = decodeQueryStringAs<SlackRequest>(event.body);
    const originalMessage = slack.text;
    const responseInfo = url.parse(slack.response_url)
    const data = JSON.stringify({
        ...slack,
    })
    const lambda = new AWS.Lambda()
    const params = {
        FunctionName: 'dev-****-FC******SmE7',
        InvocationType: 'Event', // Ensures asynchronous execution
        Payload: data
    }
    // Returns 200 immediately after invoking the second lambda, not waiting for the result
    return lambda.invoke(params).promise()
        .then(() => callback(null, { statusCode: 200, body: 'working on it' }))
};
Looking at the debugger logs it does send the 200 code and invokes the new lambda function though slack still times out.
Nothing special happens logic-wise ... the current non-delayed-message implementation does much more (accessing the DB and manipulating result data) and manages not to time out.
Any suggestions or help is welcome.
Quick side note, I used request-promise in the linked SO question's answer since the JS native Promise object was not yet available on AWS Lambda's containers at the time.
There's a fundamental difference between the orchestration of the functions in the linked question and your own from what I understand but I think you have the same goal:
> Invoke an asynchronous operation from Slack which posts back to slack once it has a result
Here's the problem with your current approach: Slack sends a request to your (1st) lambda function, which returns a response to slack, and then invokes the second lambda function.
The slack event is no longer accepting responses once your first lambda returns the 200. Here lies the difference between your approach and the linked SO question.
The desired approach would sequentially look like this:
1. Slack sends a request to Lambda no. 1
2. Lambda no. 1 returns a 200 response to Slack
3. Lambda no. 1 invokes Lambda no. 2
4. Lambda no. 2 sends a POST request to a Slack URL (google incoming webhooks for Slack)
5. Slack receives the POST request and displays it in the channel you chose for your webhook.
Code wise this would look like the following (without request-promise lol):
Lambda 1
module.exports = async (event, context) => {
    // Invoking the second lambda function
    const AWS = require('aws-sdk')
    const lambda = new AWS.Lambda()
    const params = {
        FunctionName: 'YOUR_SECOND_FUNCTION_NAME',
        InvocationType: 'Event', // Ensures asynchronous execution
        Payload: JSON.stringify({
            ... your payload for lambda 2 ...
        })
    }
    await lambda.invoke(params).promise() // Starts Lambda 2
    return {
        text: "working...",
        response_type: "ephemeral",
        replace_original: "true"
    }
}
Lambda 2
const axios = require('axios') // not in the original snippet; needed for the POST

module.exports = async (event, context) => {
    // Use event (payload sent from Lambda 1) and do what you need to do
    return axios.post('YOUR_INCOMING_WEBHOOK_URL', {
        text: 'this will be sent to slack'
    });
}

How to poll another server periodically from a node.js server?

I have a node.js server A with mongodb for database.
There is another remote server B (it doesn't need to be Node-based) which exposes an HTTP GET API '/status' and returns either 'FREE' or 'BUSY' as the response.
When a user hits a particular API endpoint in server A (say, POST /test), I wish to start polling server B's status API every minute, until server B returns 'FREE' as the response. The user doesn't need to wait till server B returns a 'FREE' response (polling B is a background job in server A). Once server A gets a 'FREE' response from B, it shall send out an email to the user.
How can this be achieved in server A, keeping in mind that the number of concurrent users can grow large?
I suggest you use Agenda. https://www.npmjs.com/package/agenda
With Agenda you can create recurring schedules under which you can schedule almost anything, quite flexibly.
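A minimal sketch of what that could look like (the job name, interval, and Mongo connection string are illustrative, not from the question):

const Agenda = require('agenda');
const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/agenda-jobs' } });

agenda.define('poll server B', async (job) => {
    // call server B's /status API here; when it returns FREE,
    // send the email and remove/disable this job
});

(async () => {
    await agenda.start();
    await agenda.every('1 minute', 'poll server B', { userEmail: 'user@example.com' });
})();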
I suggest you use request module to make HTTP get/post requests.
https://www.npmjs.com/package/request
Going from the example in the Node.js docs, I'd go with something like the code below. I tested it and it works. BTW, I'm assuming here that the API response is something like {"status":"BUSY"} or {"status":"FREE"}.
const http = require('http');

const poll = {
    pollB: function () {
        http.get('http://serverB/status', (res) => {
            const { statusCode } = res;
            let error;
            if (statusCode !== 200) {
                error = new Error(`Request Failed.\n` +
                    `Status Code: ${statusCode}`);
            }
            if (error) {
                console.error(error.message);
                res.resume();
            } else {
                res.setEncoding('utf8');
                let rawData = '';
                res.on('data', (chunk) => { rawData += chunk; });
                res.on('end', () => {
                    try {
                        const parsedData = JSON.parse(rawData);
                        // The important logic comes here
                        if (parsedData.status === 'BUSY') {
                            setTimeout(poll.pollB, 10000); // request again in 10 secs
                        } else {
                            // Call the background process you need to
                        }
                    } catch (e) {
                        console.error(e.message);
                    }
                });
            }
        }).on('error', (e) => {
            console.error(`Got error: ${e.message}`);
        });
    }
}

poll.pollB();
You probably want to play with this script and get rid of unnecessary code for you, but that's homework ;)
Update:
For coping with a lot of concurrency in Node.js, I'd recommend implementing a cluster or using a framework. Here are some links to start researching the subject:
How to fully utilise server capacity for Node.js Web Apps
How to Create a Node.js Cluster for Speeding Up Your Apps
Node.js v7.10.0 Documentation :: cluster
ActionHero.js :: Fantastic node.js framework for implementing an API, background tasks, cluster using http, sockets, websockets
Use a library like request, superagent, or restify-clients to call server B. I would recommend you avoid polling and instead use a webhook when calling B (assuming you are also authoring B). If you can't change B, then setTimeout can be used to schedule subsequent calls on a 1 second interval. A sketch of the webhook variant follows.
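A hedged sketch of the webhook variant, where server B POSTs to server A when its status changes instead of A polling (the route and payload shape are assumptions):

// Hypothetical webhook receiver on server A (Express); server B calls this
// endpoint when it becomes FREE
app.post('/webhooks/server-b-status', (req, res) => {
    if (req.body.status === 'FREE') {
        // notify the waiting user(s) here, e.g. send the email
    }
    res.sendStatus(200);
});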

How to asynchronously service multiple QBWC clients with Node.js

The idea is to implement a QBWC web service using Node.js which can serve multiple incoming requests in an asynchronous fashion. Currently I am looking into qbws which is a Node.js web service for QuickBooks Desktop Web Connector. Any ideas on how I can extend this to support an asynchronous architecture for the service methods?
Thanks in Advance!
The soap module supports asynchronous function calls, which makes this easy to do. To use the same template as my other answer, here's how you'd do that:
var soap = require('soap');

var yourService = {
    QBWebConnectorSvc: {
        QBWebConnectorSvcSoap: {
            serverVersion: function (args, callback) {
                // serverVersion code here
                callback({
                    serverVersionResult: { string: retVal }
                });
            },
            clientVersion: function (args, callback) {
                // clientVersion code here
                callback({
                    clientVersionResult: { string: retVal }
                });
            },
            // and all other service functions required by QBWC
        }
    }
};
There are two differences:
Each method signature has an additional callback parameter
There is no return, that's handled by callback() instead.
I don't currently have a suitable environment to test this, but I created a client to imitate QuickBooks Web Connector and it worked fine. Converting the qbws methods to asynchronous allowed it to service multiple clients simultaneously (including one legitimate QBWC client).
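In case it helps, a hedged sketch of binding such a service definition to an HTTP server with the soap module (the WSDL file, path, and port are assumptions):

var soap = require('soap');
var http = require('http');
var fs = require('fs');

var wsdlXml = fs.readFileSync('qbws.wsdl', 'utf8');

var server = http.createServer(function (req, res) {
    res.statusCode = 404;
    res.end();
});
server.listen(8000);

// Exposes yourService (from the snippet above) at http://localhost:8000/wsdl
soap.listen(server, '/wsdl', yourService, wsdlXml);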
