Performance of Cloud Functions for Firebase with outbound networking - node.js

When I run the following code in Cloud Functions, it takes more than 2 seconds.
When I run it locally, it takes about 600 milliseconds.
What are the possible causes?
import * as functions from 'firebase-functions'
import axios from 'axios'

const headers = { 'accept': 'application/json', 'x-access-key': '...', 'x-access-secret': '...' }

exports.functionName = functions.https.onRequest(async (req, res) => {
  try {
    console.log('request 1 start')
    const response1 = await axios.get(`https://api.sample.com/users/${req.body.userId}`, { headers })
    console.log('request 1 completed')
    const response2 = await axios.post(`https://api.sample.com/contents1/${response1.data.id}`, {}, { headers })
    console.log('request 2 completed')
    const response3 = await axios.post(`https://api.sample.com/contents2/${response2.data.id}`, {}, { headers })
    console.log('request 3 completed')
    res.send(response3)
  } catch (error) {
    res.send(error)
  }
})
Metrics
In the cloud, each HTTP request (axios.get/post) takes almost 1 second.
Hypothesis
Outbound networking from Cloud Functions is inherently slow.
Cold start is not the cause (as the execution time does not decrease after the second execution).
What I tried
I think I tried all the methods described in the official Firebase documentation.
Minimum number of instances: I set it to "2" in the GCP console, but no improvement
Increase memory allocation: I increased it to 1 GB, but no improvement
Use global variables to reuse objects across invocations: in the code above, the headers object is already defined at module scope
HTTP keep-alive: I wrote the following code, but no improvement
const https = require('https')
// For https:// URLs, axios expects the agent under the httpsAgent key
// (an agent passed as httpAgent only applies to plain-http requests).
const httpsAgent = new https.Agent({ keepAlive: true })
await axios.get(`https://api.sample.com/users/${req.body.userId}`, { headers, httpsAgent })

Important things to consider:
Where is your API located?
Where are you located?
Where are the functions located?
They could be deployed on the other side of the planet from you and your API.
You can control where your functions are deployed by specifying the region:
https://firebase.google.com/docs/functions/locations
All of our function code looks like this:
const DEFAULT_FUNCTIONS_LOCATION = "us-east4";

const runtimeOpts: functions.RuntimeOptions = {
  timeoutSeconds: 120,
  memory: "512MB",
};

const getCustomAnalysis = functions
  .region(DEFAULT_FUNCTIONS_LOCATION)
  .runWith(runtimeOpts)
You can have functions deployed in different datacenters within one project; note that you will need to specify the region when you call them.
We do this in one of our projects. The whole project is deployed in the EU (for legal reasons), but one function is in the US, calling a US API. Once the data are inside a GCP function and travel to another GCP datacenter, they use Google's premium-tier networking. If you are not constrained, though, just deploy your whole project as close as possible to your users and your API.
Also, is your GCP project clean? Are you using any VPC networking?
Networking can be messy; there could also be issues between the GCP datacenter and your API's datacenter.
As a tip for more testing: try out different URLs and measure the speed. But I don't think this is a general Cloud Functions issue.
Btw, I have run into a similar issue retrieving larger amounts of data from Firestore. We noticed a difference when we sped the functions up (more memory also gives you more CPU).

Related

Proper way of tracing distributed requests through Azure Function Apps

I am experimenting with Node.js and the Application Insights SDK in two separate function apps. Node.js is just what I am comfortable with for a quick PoC; it might not be the final language, so I don't want language-specific solutions, just how Application Insights behaves in the context of function apps and what it expects in order to draw a proper application map.
My goal is to be able to write simple queries in Log Analytics to get the full chain of a single request through multiple function apps, no matter how these are connected. I also want an (as accurate as possible) view of the system in the application map in Application Insights.
My assumption is that a properly set operation_Id and operation_parentId would yield both a queryable trace using Kusto and a proper application map.
I've set up the following flow:
Function1 only exposes an HTTP trigger, whereas Function2 exposes both an HTTP and a Service Bus trigger.
The full flow looks like this:
I call Function1 using GET http://function1.com?input=test
Function1 calls Function2 using REST at GET http://function2.com?input=test
Function1 uses the response from Function2 to add a message to a service bus queue
Function2 has a trigger on that same queue
I am mixing patterns here just to see what the application map does and understand how to use this correctly.
For step 1 through 3, I can see the entire chain in my logs on a single operation_Id. In this screenshot the same operationId spans two different function apps:
What I would also expect to find in this log is the Service Bus trigger (named ServiceBusTrigger). The Service Bus does trigger on the message; it just gets a different operation_Id.
To get the REST correlation to work, I followed the guidelines from the applicationinsights npm package, in the section called Setting up Auto-Correlation for Azure Functions.
This is what Function1 looks like (the entrypoint and start of the chain)
let appInsights = require('applicationinsights')
appInsights.setup()
  .setAutoCollectConsole(true, true)
  .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
  .start()

const https = require('https')

const httpTrigger = async function (context, req) {
  context.log('JavaScript HTTP trigger function processed a request.')
  const response = await callOtherFunction(req)
  context.res = {
    body: response
  }
  context.log("Sending response on service bus")
  context.bindings.outputSbQueue = response;
}

async function callOtherFunction(req) {
  return new Promise((resolve, reject) => {
    https.get(`https://function2.azurewebsites.net/api/HttpTrigger1?code=${process.env.FUNCTION_2_CODE}&input=${req.query.input}`, (resp) => {
      let data = ''
      resp.on('data', (chunk) => {
        data += chunk
      })
      resp.on('end', () => {
        resolve(data)
      })
    }).on("error", (err) => {
      reject("Error: " + err.message)
    })
  })
}

module.exports = async function contextPropagatingHttpTrigger(context, req) {
  // Start an AI Correlation Context using the provided Function context
  const correlationContext = appInsights.startOperation(context, req);

  // Wrap the Function runtime with correlationContext
  return appInsights.wrapWithCorrelationContext(async () => {
    const startTime = Date.now(); // Start trackRequest timer

    // Run the Function
    const result = await httpTrigger(context, req);

    // Track Request on completion
    appInsights.defaultClient.trackRequest({
      name: context.req.method + " " + context.req.url,
      resultCode: context.res.status,
      success: true,
      url: req.url,
      time: new Date(startTime),
      duration: Date.now() - startTime,
      id: correlationContext.operation.parentId,
    });
    appInsights.defaultClient.flush();

    return result;
  }, correlationContext)();
};
And this is what the HTTP trigger in Function2 looks like:
let appInsights = require('applicationinsights')
appInsights.setup()
  .setAutoCollectConsole(true, true)
  .setDistributedTracingMode(appInsights.DistributedTracingModes.AI_AND_W3C)
  .start()

const httpTrigger = async function (context, req) {
  context.log('JavaScript HTTP trigger function processed a request.')
  context.res = {
    body: `Function 2 received ${req.query.input}`
  }
}

module.exports = async function contextPropagatingHttpTrigger(context, req) {
  // Start an AI Correlation Context using the provided Function context
  const correlationContext = appInsights.startOperation(context, req);

  // Wrap the Function runtime with correlationContext
  return appInsights.wrapWithCorrelationContext(async () => {
    const startTime = Date.now(); // Start trackRequest timer

    // Run the Function
    const result = await httpTrigger(context, req);

    // Track Request on completion
    appInsights.defaultClient.trackRequest({
      name: context.req.method + " " + context.req.url,
      resultCode: context.res.status,
      success: true,
      url: req.url,
      time: new Date(startTime),
      duration: Date.now() - startTime,
      id: correlationContext.operation.parentId,
    });
    appInsights.defaultClient.flush();

    return result;
  }, correlationContext)();
};
The Node.js application insights documentation says:
The Node.js client library can automatically monitor incoming and outgoing HTTP requests, exceptions, and some system metrics.
So this seems to work for HTTP, but what is the proper way to do this over (for instance) a Service Bus queue, to get a clean message trace and a correct application map? The above solution from the applicationinsights SDK seems to cover only HTTP requests, where you use the req object on the context. How is the operation_Id persisted in cross-app communication in these cases?
What is the proper way of doing this across other messaging channels? What do I get for free from Application Insights, and what do I need to stitch together myself?
UPDATE
I found this piece of information in the application map documentation, which seems to support the working theory that only REST/HTTP calls can be traced. But then the question remains: how does the output binding work if it is not an HTTP call?
The app map finds components by following HTTP dependency calls made between servers with the Application Insights SDK installed.
UPDATE 2
In the end I gave up on this. In conclusion: Application Insights traces some things, but it is very unclear when and how that works, and it also depends on the language. The Node.js docs say:
The Node.js client library can automatically monitor incoming and outgoing HTTP requests, exceptions, and some system metrics. Beginning in version 0.20, the client library also can monitor some common third-party packages, like MongoDB, MySQL, and Redis. All events related to an incoming HTTP request are correlated for faster troubleshooting.
I solved this by taking inspiration from OpenTracing. Our entire stack runs on Azure Functions, so I implemented logic to pass a correlationId through all processes; each process is a span, and each function/process is responsible for logging through a structured logging framework.

Firebase Functions timeout when querying AWS RDS PostgreSQL database

I am trying to query an Amazon RDS database from a Firebase Node.js cloud function. I built the query and can successfully run the code locally using firebase functions:shell. However, when I deploy the function and call it from client-side JS on my site, I receive errors on both the client and the server side.
Client-side:
Error: internal
Origin http://localhost:5000 is not allowed by Access-Control-Allow-Origin.
Fetch API cannot load https://us-central1-*****.cloudfunctions.net/query due to access control checks.
Failed to load resource: Origin http://localhost:5000 is not allowed by Access-Control-Allow-Origin.
Server-side:
Function execution took 60004 ms, finished with status: 'timeout'
I believe the issue has two parts:
CORS
pool.query() is async
I have looked at multiple questions for a CORS solution, here and here for example, but none of the solutions have worked for me. Regarding pool.query() being async, I believe I am handling it correctly; however, neither the result nor an error is printed to the server's logs.
Below is all the relevant code from my projects.
Client-side:
var queryRDS = firebase.functions().httpsCallable('query');

queryRDS({
  query: document.getElementById("search-input").value
})
  .then(function (result) {
    if (result) {
      console.log(result)
    }
  })
  .catch(function (error) {
    console.log(error);
  });
Server-side:
const functions = require('firebase-functions');
const { Pool } = require('pg');

const pool = new Pool({
  user: 'postgres',
  host: '*****.*****.us-west-2.rds.amazonaws.com',
  database: '*****',
  password: '*****',
  port: 5432
})

exports.query = functions.https.onCall((data, context) => {
  // This is not my real query, I just changed it for the
  // simplicity of this question
  var query = "Select * FROM table"
  pool.query(query)
    .then(result_set => {
      console.log(result_set)
      return result_set
    }).catch(err => {
      console.log(err)
      return err
    })
})
I know everything works up until pool.query(): based on my logs it seems that the .then() and the .catch() are never reached, and the returns never make it to the client side.
Update:
I increased the timeout of the Firebase Function from 60s to 120s and changed my server function code by adding a return statement before pool.query():
return pool.query(query)
  .then(result_set => {
    console.log(result_set)
    return result_set
  }).catch(err => {
    console.log("Failed to execute query: " + err)
    return err
  })
I now get an error message reading Failed to execute query: Error: connect ETIMEDOUT **.***.***.***:5432, with the IP address being my AWS RDS database. It seems this might have been the underlying problem all along, but I am not sure why the connection to RDS times out.
CORS should be handled automatically by the onCall handler. The CORS error message is likely inaccurate, a side effect of the function timing out, as the server-side error shows.
That being said, according to the Cloud Functions documentation on function timeouts, the default timeout for Cloud Functions is 60 seconds, which corresponds to the ~60000 ms in your error message. This means 1 minute is not enough for your function to execute such a query, which makes sense considering the function is accessing an external provider, the Amazon RDS database.
In order to fix it, you will have to redeploy your function with a flag that sets the function execution timeout, as follows:
gcloud functions deploy FUNCTION_NAME --timeout=TIMEOUT
The value of TIMEOUT can be anything up to 540 seconds (9 minutes), which is the maximum Cloud Functions allows before timing out.
NOTE: This could also be mitigated by deploying your function as close as possible to where your Amazon RDS database is located. You can check this link for the locations available for Cloud Functions, and use --region=REGION on the deploy command to specify the region.
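If you deploy through the Firebase CLI rather than gcloud, the same timeout can be declared in code with runWith (a deployment-configuration sketch; the values are examples, and the handler body is elided):

```javascript
const functions = require('firebase-functions');

// Raise the execution timeout (up to 540 s) and memory for this function;
// the options take effect on the next `firebase deploy`.
exports.query = functions
  .runWith({ timeoutSeconds: 540, memory: '1GB' })
  .https.onCall((data, context) => {
    // ... same pool.query handler as above ...
  });
```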

Axios always time out on AWS Lambda for a particular API

Describe the issue
I'm not really sure if this is an axios issue or not. The following code runs successfully on my local development machine but always times out when I run it from the cloud (e.g. AWS Lambda). The same thing happens when I run it on repl.it.
I can confirm that AWS Lambda has internet access and that this works for any API but this one:
https://www.target.com.au/ws-api/v1/target/products/search?category=W95362
Example Code
https://repl.it/repls/AdeptFluidSpreadsheet
const axios = require('axios');

const handler = async () => {
  const url = 'https://www.target.com.au/ws-api/v1/target/products/search?category=W95362';
  const response = await axios.get(url, { timeout: 10000 });
  console.log(response.data.data.productDataList);
}

handler();
Environment
Axios Version: 0.19.2
Runtime: nodejs12x
Update 1
I tried the native require('https') and it times out on both localhost and the cloud server. Please find sample code here: https://repl.it/repls/TerribleViolentVolume
const https = require('https');

const url = 'https://www.target.com.au/ws-api/v1/target/products/search?category=W95362';

https.get(url, res => {
  var body = '';
  res.on('data', chunk => {
    body += chunk;
  });
  res.on('end', () => {
    var response = JSON.parse(body);
    console.log("Got a response: ", response);
  });
}).on('error', e => {
  console.log("Got an error: ", e);
});
Again, I can confirm that same code works on any other API.
Update 2
I suspect that this is something server-side, as it also behaves very weirdly with curl:
curl from local -> 403 access denied
curl from local with a User-Agent header -> success
curl from cloud server -> 403 access denied
It must be server-side validation, something related to AkamaiGHost.
You have probably placed your Lambda function in a VPC without internet access to the outside world. Check the VPC section in your Lambda configuration, and set up an internet gateway accordingly.
You could try wrapping the axios call in try/catch; maybe that will surface the underlying error.
const axios = require('axios');

const handler = async () => {
  try {
    const url = 'https://www.target.com.au/ws-api/v1/target/products/search?category=W95362';
    const response = await axios.get(url, { timeout: 10000 });
    console.log(typeof (response));
    console.log(response);
  } catch (e) {
    console.log(e, "error api call");
  }
}

handler();
As suggested by Akshay, you can use a try/catch block to get the error. Maybe it helps you out.
Have you configured Error Handling for Asynchronous Invocation?
To configure error handling, follow the steps below:
Open the Lambda console Functions page.
Choose a function.
Under Asynchronous invocation, choose Edit.
Configure the following settings:
Maximum age of event: the maximum amount of time Lambda retains an event in the asynchronous event queue, up to 6 hours.
Retry attempts: the number of times Lambda retries when the function returns an error, between 0 and 2.
Choose Save.
axios is a Promise-based HTTP client for the browser and Node.js, and since you set timeout: 10000, I don't believe the timeout is coming from axios itself.
Your API
https://www.target.com.au/ws-api/v1/target/products/search?category=W95362
works fine in the browser and renders JSON data.
The maximum Lambda function timeout is 15 minutes (the default is only 3 seconds), which should be plenty for the response, so there may be another issue.
Make sure you have set the other configuration, such as permissions, as suggested in the documentation.
Here you can check the default limits for AWS Lambda.

Nodejs proxy request coalescing

I'm running into an issue with my http-proxy-middleware setup. I'm using it to proxy requests to another service which, for example, might resize images.
The problem is that multiple clients might call the method multiple times and thus create a stampede on the origin service. I'm now looking for a solution (what some services, e.g. Varnish, call request coalescing) that would call the service once, wait for the response, 'queue' incoming requests with the same signature until the first is done, and then answer them all in one go. This is different from caching results: I want to prevent calling the backend multiple times simultaneously, not necessarily cache the results.
I'm trying to find out whether something like this goes by a different name, or whether I'm missing something that others have already solved, but I can't find anything.
As the use case seems pretty basic for a reverse-proxy setup, I would have expected a lot of hits in my searches, but since the problem space is pretty generic I'm not getting anything.
Thanks!
A colleague of mine helped me hack together my own answer. It's currently used as an (Express) middleware for specific GET endpoints. It hashes the request into a map key and starts a single upstream request; concurrent incoming requests with the same hash are parked, and the first request's callback responds to all of them. This also means that if the first response is particularly slow, all coalesced requests are too.
This seemed easier than hacking it into http-proxy-middleware, but oh well, it got the job done :)
const axios = require('axios');

const responses = {};

module.exports = (req, res) => {
  const queryHash = `${req.path}/${JSON.stringify(req.query)}`;
  if (responses[queryHash]) {
    console.log('re-using request', queryHash);
    responses[queryHash].push(res);
    return;
  }
  console.log('new request', queryHash);

  const axiosConfig = {
    method: req.method,
    url: `[the original backend url]${req.path}`,
    params: req.query,
    headers: {}
  };
  if (req.headers.cookie) {
    axiosConfig.headers.Cookie = req.headers.cookie;
  }

  responses[queryHash] = [res];
  axios.request(axiosConfig).then((axiosRes) => {
    responses[queryHash].forEach((coalescingRequest) => {
      coalescingRequest.json(axiosRes.data);
    });
    responses[queryHash] = undefined;
  }).catch((err) => {
    responses[queryHash].forEach((coalescingRequest) => {
      coalescingRequest.status(500).json(false);
    });
    responses[queryHash] = undefined;
  });
};
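The same idea can be boiled down to a small promise-based helper (a stdlib-only sketch, not the production middleware): concurrent callers with the same key share one in-flight promise, and the entry is dropped once it settles:

```javascript
const inFlight = new Map();

// Run fn at most once per key at a time; callers that arrive while a
// request for the same key is pending get the same pending promise.
function coalesce(key, fn) {
  if (inFlight.has(key)) return inFlight.get(key);
  const pending = Promise.resolve()
    .then(fn)
    .finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}

module.exports = { coalesce };
```

In the middleware above, key would be queryHash and fn the axios.request call; callers then await the shared promise instead of being pushed onto an array.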

How to call youtube api in Firebase Function Spark mode

I am using Firebase for the first time. I managed to get Firebase to store data in the database, but sending an external HTTP request is not working.
I am calling the YouTube API to get data. I have read many answers here saying that only Google-owned APIs are allowed, yet my code doesn't work when sending a request to the YouTube API.
Can someone please take a look and help?
const functions = require('firebase-functions');
const admin = require('firebase-admin');
var req = require('request');

admin.initializeApp(functions.config().firebase);
var db = admin.firestore();

exports.helloWorld = functions.https.onRequest((request, response) => {
  var url = "https://www.googleapis.com/youtube/v3/commentThreads?part=id,snippet,replies&allThreadsRelatedToChannelId=abcd&maxResults=100&order=time&key=key"
  req(url, function (error, resp, body) {
    if (!error && resp.statusCode === 200) {
      return resp;
      // var comments = JSON.parse(body);
      // comments.forEach(comment => {
      //   var dataid = comment.snippet.topLevelComment.id;
      //   var docRef = db.collection("comments").doc(dataid);
      //   var storeInDB = docRef.set(comment);
      // });
    }
  });
  response.send("hello World");
});
This function works fine when I just return the response, so I think the YouTube API call itself works. But when I uncomment the code that parses the response and tries to store it in the database, I see this:
Function execution started
6:46:31.736 PM helloWorld Billing account not configured. External network is not accessible and quotas are severely limited. Configure billing account to remove these restrictions
6:46:31.829 PM helloWorld Function execution took 94 ms, finished with status code: 200
In Spark mode, you won't be able to make the call to the YouTube API, since it isn't supported: Firebase still thinks that you are calling an external API, which requires billing to be set up.
Use the Blaze plan, which is pay-as-you-go. It looks like the most expensive plan since it sits at the far right of the pricing page, yet it includes a free tier quota. You are only charged once you go over it, and you can set a low budget to cap spending. With that, it effectively becomes the same free tier, just with billing set up.
