Question:
Do you know a way to actually monitor how much memory a Google Cloud Function (Node.js 8) is using?
Do you have recommendations for memory profiling of Google Cloud Functions (even locally) for Node.js 8?
Context:
I deployed a Google Cloud Function (Node.js) with 128MB of memory that used to work pretty well.
Today it fails with "Error: memory limit exceeded."
GCP tells me the function doesn't use more than 58MiB, yet it fails with a memory error even though it has 128MB.
I feel lost and fooled because:
It used to work, and I haven't changed a thing since then.
It seems I can't trust Google when it comes to monitoring memory consumption:
The "Details" screen of the function shows it consuming no more than 58MiB.
The dashboard I created in Monitoring to track it shows the same values.
Yet it fails with a memory limit error.
I have already seen the question "Memory profiler for Google cloud function?", but Stackdriver Profiler doesn't seem to support Cloud Functions (per the documentation).
Cloud functions need to send a response when they're done! If they don't respond, their allocated resources won't be freed. Any unhandled exception in a cloud function can cause a memory limit error, so handle all corner cases, exceptions, and promise rejections properly and respond promptly (see the sketch after the video links below).
A tutorial video series on YouTube by Doug Stevenson.
Another video about promises in Cloud Functions by Doug.
An Ask Firebase video hosted by Jen Person about Cloud Functions memory.
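As a hedged illustration of that advice, here is a minimal sketch of an HTTP-triggered function (Node.js) that awaits its work and always sends a response; doWork() is a hypothetical placeholder, not part of the original answer:

```javascript
// Minimal sketch of an HTTP-triggered Cloud Function (Node.js) that
// awaits its async work and always responds, so the instance's
// resources are released. doWork() is a hypothetical placeholder.
exports.handler = async (req, res) => {
  try {
    const result = await doWork(req.body); // await, don't fire-and-forget
    res.status(200).json(result);
  } catch (err) {
    console.error(err); // an Error logged to stderr surfaces in Error Reporting
    res.status(500).send('Internal error');
  }
};

// Hypothetical async work; replace with your own logic.
async function doWork(payload) {
  return { ok: true, received: payload };
}
```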
Set memory allocation in Cloud Functions from the Google Cloud Console.
From the documentation:
To set memory allocation and timeout in the Google Cloud Platform Console:
In the Google Cloud Platform Console, select Cloud Functions from the left menu.
Select a function by clicking on its name in the functions list.
Click the Edit icon in the top menu.
Select a memory allocation from the drop-down menu labeled Memory allocated.
Click More to display the advanced options, and enter a number of seconds in the Timeout text box.
Click Save to update the function.
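If you prefer the command line, the same settings can be applied at deploy time with gcloud; the function name below is a placeholder:

```
gcloud functions deploy myFunction --memory=256MB --timeout=120s
```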
Things to check for memory leaks (very tricky to troubleshoot; see the sketch after this list):
Async/await functions.
Promises that run "in the background" (fire-and-forget .then() chains).
Writing temporary files to the writable part of the filesystem (/tmp) in a function instance, which also consumes memory provisioned for the function.
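A minimal sketch illustrating the first and third points (the temp-file name and the doWork() helper are illustrative assumptions):

```javascript
// Sketch: avoid two common leaks in a function instance.
// 1) Await promises instead of leaving .then() chains running after
//    the response, and 2) clean up /tmp, which counts against memory.
const fs = require('fs');
const os = require('os');
const path = require('path');

exports.process = async (req, res) => {
  const tmpFile = path.join(os.tmpdir(), 'scratch.json'); // illustrative name
  try {
    fs.writeFileSync(tmpFile, JSON.stringify(req.body));
    await doWork(tmpFile); // awaited, not a fire-and-forget .then()
    res.status(200).send('done');
  } catch (err) {
    console.error(err);
    res.status(500).send('error');
  } finally {
    // /tmp is backed by the instance's memory: delete temp files before returning.
    if (fs.existsSync(tmpFile)) fs.unlinkSync(tmpFile);
  }
};

// Hypothetical placeholder for the real work.
async function doWork(file) {
  return fs.statSync(file).size;
}
```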
Cloud Functions' auto-scaling and concurrency concepts
Each instance of a function handles only one concurrent request at a time. This means that while your code is processing one request, there is no possibility of a second request being routed to the same instance. Thus the original request can use the full amount of resources (CPU and memory) that you requested.
Cloud Functions monitoring
These are the resources available to monitor your Cloud Functions (a logging sketch follows the list):
Stackdriver Logging captures and stores Cloud Functions logs.
Stackdriver Error Reporting captures specially formatted error logs and displays them in the Error Reporting dashboard.
Stackdriver Monitoring records metrics regarding the execution of Cloud Functions.
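For a rough in-function view (locally or deployed), you can also log process.memoryUsage(); in a deployed function the output lands in Stackdriver Logging. A minimal sketch:

```javascript
// Sketch: log the Node.js RSS/heap figures at interesting points;
// in a deployed Cloud Function the output lands in Stackdriver Logging.
function logMemory(label) {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const mib = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  console.log(`${label}: rss=${mib(rss)}MiB heap=${mib(heapUsed)}/${mib(heapTotal)}MiB`);
}

logMemory('before work');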
Related
I have a long-running (Node.js) orchestrator in an Azure Function App that calls a couple hundred activity functions, sometimes with a group of 5 or so running in parallel via context.df.Task.all. I find that it runs steadily for about two hours, then the function app itself seems to abruptly stop: the logs stop appearing in the log stream, and the records my activity functions are supposed to write to the database stop being written. There are no exceptions in the logs. It remains paused or stalled like this indefinitely... until I restart the function app. Then it comes back to life, resumes where it stopped, runs for a time, and then stalls again.
Does this behavior sound familiar to anyone?
Should I update the extension bundle to [4.0.0, 5.0.0)?
Could my storage account be the problem? Should I create a new one?
We are using the Premium plan. Could I be running up against a limit of some kind? If so, what should I tell the IT team to increase?
As far as I know,
Should I update the extension bundle to [4.0.0, 5.0.0)?
I believe this issue is not related to extension bundles. Extension bundles only control which binding extensions, libraries, and packages are installed; each bundle version comprises a set of supported binding extensions compatible with the Function App's version.
If a timeout value is defined in host.json, set it to unbounded (-1), since the function project is deployed on the Premium plan and needs a longer duration for its function executions (see the snippet below).
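A minimal host.json sketch of that setting (assuming the Premium plan, where an unbounded timeout is allowed):

```json
{
  "version": "2.0",
  "functionTimeout": "-1"
}
```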
Could my storage account be the problem? Should I create a new one?
Instead of creating a new account, you can increase the quota of the storage account to 5 PiB.
If the storage account is a concern, make sure that both the function app and the storage account are in the same region to reduce latency issues.
Also, in a production environment it is better to allocate a separate storage account to each Azure function app.
We are using the Premium plan. Could I be running up against a limit of some kind? If so, what should I tell the IT team to increase?
You also mentioned that the function app stalls with no executions, and after a restart it resumes from where it paused. Microsoft documents situations in which even long-running functions hosted on the Premium plan can stop executing like this.
Refer to the MS Doc for more information.
On an Azure Function App running on an App Service plan, we notice that memory increases significantly (from ~100 MB to 3 GB).
The function app is written in Python and is triggered whenever a new event is received from Event Hubs.
I've tried to profile memory based on Azure's official guide, and I've noticed several odd things:
on each new event invocation, the function's memory increases by several KB/MB;
for example, when variables hold data inside the Python function, the logs show the memory is not released (?);
over time, these small increments add up to high memory usage.
It would be helpful if you could suggest possible solutions or further debugging methods.
G'day folks,
I'm having some issues with an Azure function that I'm hoping someone might be able to help with.
We have a relatively long-running process (3-4 mins) that is triggered by a Service Bus message, and we were having issues with the function execution ending without error and then attempting to re-process. The time taken for this to happen is less than all the timeout/lock duration settings we have configured. Watching the logs (log stream, for both the file system and App Insights), we see the last line of the previous execution, then it kicks straight into the next.
To determine whether it's Service Bus related, I've also tried executing the process via a blob trigger (the process uses the file as a data source anyway), but I see the same thing, except I don't see the subsequent retries.
In both scenarios I don't see anything in App Insights apart from the trace records. I don't get an exception, or even a 'request' entry. (The function logic is all enclosed in try/catch blocks, btw.)
So my question is: is it possible to trap these scenarios so we can determine the root cause? Currently I've got nothing to go on to diagnose this. These errors don't happen when running locally.
FWIW, we've seen this issue happen during the execution of third-party libraries (MS Graph and the OpenXmlPowerTools library), as we're generating documents for upload into SharePoint. Not sure if this is relevant.
Thanking you in advance,
Tim
Maybe this is because of the plan you are using. If you're using the Consumption plan, the default timeout is 5 minutes, but you can increase it to a maximum of 10 minutes. The maximum timeout on a Premium plan is 60 minutes, and you can set the timeout as long as you want if you have a Dedicated (App Service) plan.
Also try configuring the timeout of your function app, i.e., by changing the value of functionTimeout in the host.json of your function app.
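For example, a minimal host.json sketch raising the Consumption-plan timeout to its 10-minute maximum:

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```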
You should have a look at Durable Functions.
They allow us to have long-running processes, e.g., import/export tasks.
I was able to wrap a long-running import process, which takes about 20 minutes, and run it successfully.
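A minimal orchestrator sketch (Node.js, using the durable-functions package; the activity names are illustrative placeholders, not your actual functions):

```javascript
// Sketch of a Durable Functions orchestrator (Node.js). The work is
// split into activities; the framework checkpoints between yields, so
// long-running processes survive host restarts and resume where they
// left off. Activity names below are illustrative placeholders.
const df = require('durable-functions');

module.exports = df.orchestrator(function* (context) {
  const source = yield context.df.callActivity('FetchSourceFile');
  const doc = yield context.df.callActivity('GenerateDocument', source);
  yield context.df.callActivity('UploadToSharePoint', doc);
  return 'completed';
});
```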
I have an Azure Function app in Node.js with a couple of Queue-triggered functions.
These were working great until I saw a couple of timeouts in my function logs.
From that point on, none of my triggered functions actually do anything. They just keep timing out, even before executing the first line of code, which is a context.log() statement that records the execution time.
What could be the cause of this?
Check your function's storage account in the Azure portal; you'll likely see very high activity for file monitoring.
This is likely due to the interaction between Azure Files and requiring a large node_modules tree. Once the modules have been required once, functions execute quickly because modules are cached, but these timeouts can throw the function app into a timeout -> restart loop.
There's a lot of discussion on this, along with one possible improvement (using webpack on server-side modules), here; a minimal sketch follows the list below.
Other possibilities:
decrease the number of node modules if possible
move to a Dedicated plan instead of the Consumption plan (it runs on a different file system, which has better performance)
use C# or F#, which don't suffer from these limitations
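For the webpack approach mentioned above, a minimal sketch; the entry point and output paths are assumptions about your project layout:

```javascript
// Sketch: bundle a function's dependencies into one file so cold starts
// don't traverse a large node_modules tree over Azure Files.
// Entry and output paths below are illustrative.
const path = require('path');

module.exports = {
  target: 'node',                 // keep Node built-ins out of the bundle
  mode: 'production',
  entry: './MyQueueTrigger/index.js',
  output: {
    path: path.resolve(__dirname, 'MyQueueTrigger'),
    filename: 'bundle.js',
    libraryTarget: 'commonjs2',   // preserve module.exports for the runtime
  },
};
```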
I am relatively new to Azure. I have a website that has been running for a couple of months without much traffic: when users are on the system, the various dashboard monitors go up, and they flat-line the rest of the time. This week, the CPU time went way up when there were no requests and no data going in or out of the site. Is there a way to determine the cause of this CPU activity when the site is not active? It doesn't make sense to me that CPU activity should be attributed to my site when there is no site activity.
If your website does significant processing at application start, it is possible that your VM got rebooted or your app pool recycled, and your on-start handler got executed again (which would cause CPU to spike without any requests).
You can analyze this by adding application logs to your Application_Start event (but after initializing tracing). There is another comment detailing how to enable logging, but you can also consult this link.
You need to collect data to understand what's going on, so the first things I would suggest are:
1. Go to the Azure management portal -> your website (assuming you are using Azure Websites) -> Dashboard -> Operation logs. Check whether there is any suspicious activity going on.
2. Download the logs for your site using any FTP client and analyze what's happening. If there is not much data, I would suggest adding more logging to your application to see what is happening or which module is spinning.
A great way to detect CPU spikes, and even identify slow-running areas of your application, is to use a profiler like New Relic. It's a free add-on for Azure that collects data and provides you with a dashboard. You might find it useful for determining the exact cause of the CPU spike.
We regularly use it to monitor the performance of our applications, and I would recommend it.