Stackdriver-trace on Google Cloud Run failing, while working fine on localhost - node.js

I have a Node server running on Google Cloud Run. Now I want to enable Stackdriver tracing. When I run the service locally, I am able to get the traces in GCP. However, when I run the service on Google Cloud Run, I get an error:
"@google-cloud/trace-agent ERROR TraceWriter#publish: Received error with status code 403 while publishing traces to cloudtrace.googleapis.com: Error: The request is missing a valid API key."
I made sure that the service account has the Cloud Trace Agent role.
The first line in my app.js is:
require('@google-cloud/trace-agent').start();
When running locally I am using a .env file containing:
GOOGLE_APPLICATION_CREDENTIALS=<path to credentials.json>
According to https://github.com/googleapis/cloud-trace-nodejs, these values are auto-detected if the application is running on Google Cloud Platform, so I don't include these credentials in the GCP image.

There are two challenges to using this library with Cloud Run:
1. Despite the note about auto-detection, Cloud Run is an exception: it is not yet auto-detected. For now, this can be addressed with some explicit configuration.
2. Because Cloud Run services only have CPU resources while they are handling a request, queued-up trace data may not be sent before CPU is withdrawn. For now, this can be addressed by configuring the trace agent to flush as soon as possible:
const tracer = require('@google-cloud/trace-agent').start({
  serviceContext: {
    service: process.env.K_SERVICE || "unknown-service",
    version: process.env.K_REVISION || "unknown-revision"
  },
  flushDelaySeconds: 1,
});
On a quick review I couldn't see how to trigger a trace flush manually, but the shorter timeout should help avoid some delays before trace data appears in Stackdriver.
EDIT: While nice in theory, in practice there are still significant race conditions with CPU withdrawal. Filed https://github.com/googleapis/cloud-trace-nodejs/issues/1161 to see if we can find a more consistent solution.
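The explicit configuration above can also be factored into a small, testable helper; Cloud Run injects K_SERVICE and K_REVISION into every revision's environment. A minimal sketch (buildTraceConfig is a hypothetical helper name, not part of the library):

```javascript
// Sketch: build the trace-agent start() options from Cloud Run's
// injected environment variables. Hypothetical helper, shown so the
// defaulting logic can be unit-tested separately from the agent.
function buildTraceConfig(env) {
  return {
    serviceContext: {
      // Cloud Run sets K_SERVICE and K_REVISION automatically
      service: env.K_SERVICE || 'unknown-service',
      version: env.K_REVISION || 'unknown-revision',
    },
    flushDelaySeconds: 1, // flush quickly, before CPU is withdrawn
  };
}

// In app.js this must still run before any other require:
// const tracer = require('@google-cloud/trace-agent').start(buildTraceConfig(process.env));
```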

Related

Azure Functions is not logging all the traces in App Insights

I have an Azure Function App with multiple functions connected to Application Insights.
For some reason, some requests and traces occasionally get lost, as if they never happened, even though I can see the data in our DB and in other systems.
Here is a new function with just one call; in the Azure Functions dashboard I can see the log:
But in Application Insights, when I search for the logs of the trace or the request, no info is retrieved.
This doesn't happen every time, but it's not the first time I've seen this issue. I can see the logs for other requests, but I don't know why logs are sometimes lost.
Azure function info:
Runtime Version: 3
Stack: NodeJS
Have you configured sampling? This can appear as data loss.
You can control it as follows, as per the documentation:
const appInsights = require("applicationinsights");
appInsights.setup("<instrumentation_key>");
appInsights.defaultClient.config.samplingPercentage = 33; // 33% of all telemetry will be sent to Application Insights
appInsights.start();
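Conversely, to rule sampling out entirely while debugging the missing traces, a sketch like the following (assuming the Node.js applicationinsights SDK) sets the percentage to 100 and flushes pending telemetry explicitly, so buffered items aren't lost if the host recycles:

```javascript
const appInsights = require("applicationinsights");

appInsights.setup("<instrumentation_key>");
// 100% = no fixed-rate sampling; every telemetry item is sent
appInsights.defaultClient.config.samplingPercentage = 100;
appInsights.start();

// Force buffered telemetry out immediately, e.g. at the end of a function
// invocation, rather than waiting for the batching interval.
appInsights.defaultClient.trackTrace({ message: "sampling check" });
appInsights.defaultClient.flush();
```

If the traces appear at 100% but disappear again at lower percentages, sampling was the cause rather than genuine data loss.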

Where are the logs and memory dumps of Azure Function crashes?

Sometimes my Azure Function fails and I have no record of what happened; the function just stops executing.
I suspect a major error like a stack overflow, but since there is no record of it I can't be sure.
I created a simple Azure Function to emulate a simple stack overflow:
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
    ILogger log, ExecutionContext executionContext)
{
    RunStackOverflow();
    return new OkResult(); // never reached; added so the method compiles
}

private static void RunStackOverflow()
{
    RunStackOverflow(); // unbounded recursion -> stack overflow
}
When I call this via the HTTP trigger, I get a 502 error in the browser, but there is nothing in the logs about this failure. Screenshot: https://www.screencast.com/t/ymWoBey4KX
Stack overflow is just one of the exceptions that can't be caught and can crash the function. Locally, when I run the function in the emulator, I see the stack overflow error in the cmd window where the function starts. Screenshot: https://www.screencast.com/t/f85U2KmdEBBt
In Azure portal I checked:
function invocations (screenshot: https://www.screencast.com/t/ufB1Zfthz)
function logs (screenshot: https://www.screencast.com/t/A2ix6yuSuJkE)
app insights (screenshot: https://www.screencast.com/t/NyRFLDK23p)
But there is no log entry of this crash anywhere.
I contacted Azure support, but they are not very helpful so far.
Update on Apr 12
Using Kudu I can create a memory dump with a command like this:
c:\devtools\sysinternals\procdump -e -ma -w 12268
This shows me all stack traces for all threads, which is what I need, but only when a first-chance exception occurs.
The command to trigger a memory dump when such an exception occurs is:
c:\devtools\sysinternals\procdump -accepteula -e -g -ma 8844
but when I run it and then trigger the stack overflow exception, here is what is written to the command line:
[11:37:36] Exception: E0434352.CLR
[11:37:36] Exception: C00000FD.STACK_OVERFLOW <--- Stack overflow
[11:37:37] The process has exited.
[11:37:37] Dump count not reached.
Unfortunately no memory dump is created, so I can't see the stack trace that caused the stack overflow.
I also tried:
c:\devtools\sysinternals\procdump -accepteula -e -g -ma -t 13244
The -t option triggers a memory dump when the process exits.
This one actually records a memory dump when the Function crashes. Unfortunately the dump doesn't include the stack trace for the stack overflow; it seems to be taken after the thread has already crashed.
Update on Apr 21
There are multiple ways to host Azure Functions, described here:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale
The most common and default way is the Consumption plan. After a bunch of trial and error, I found that the Diagnostics Tools (https://www.screencast.com/t/DyT6Jpuqm2uo), which can be used to detect and analyze crashes, are not available with the Consumption plan. On the other hand, they are available with App Service (Basic and up) and other plans. Azure support told me there are currently no plans to add them to the Consumption plan.
So for now I created a new Azure Function hosted on an App Service plan, and I was able to use the Diagnostics Tools to record crash dumps. After fixing the issues I plan to go back to the Consumption plan, so it is a bit of a hack, but it works for now.
Currently, this level of logging is not supported very well.
You can use the Diagnose and solve problems option in the Azure portal by following this link, but note that some features (like Application Crashes) in this option are still in development.
Steps:
1. In the Azure portal -> your Function App -> click Diagnose and solve problems -> then click the Function App Down or Reporting Errors link. Here is the screenshot:
2. Wait a while for the report to finish generating -> then check the items that start with a red exclamation mark (with your code, the error details are in the Web App Restarted item, but it just shows a generic message that the app crashed, not a stack overflow):

HTTP Error 500.30 - ANCM In-Process Start Failure with newly created app service

We created a new development environment, so I cloned a currently working App Service into a new one, changed the configuration, and deployed the same code, but the new App Service is returning HTTP Error 500.30 - ANCM In-Process Start Failure.
After trying the console for more details, this is what I get. I don't think it's related to the runtime identifier, because the same code runs on other, identical App Services.
The dreaded 500.3x ANCM error can mean different things, so I'm going to help you pinpoint the cause.
My recommendation:
Go to Azure Portal > Your App Service > Development Tools
Open the console.
Your screen should look like this:
Console Screen Azure
Type in (YourWebAppName).exe
This will show error messages that are relevant to your startup issue.
Also, some information regarding errors can be seen here:
https://learn.microsoft.com/en-us/aspnet/core/test/troubleshoot-azure-iis?view=aspnetcore-3.1#app-startup-errors

Azure backend return 500 in PATCH operation

I am desperately trying to debug a 500 error that occurs only when I update an object from my Xamarin.Forms offline DB to Azure. I am using the Azure Mobile Client.
I set all the logging to ON in Azure, then downloaded the log. I can see the generic error, but nothing useful:
<failedRequest url="https://MASKED:80/tables/Appel/9A3342A2-0598-4126-B0F6-2999B524B4AE"
siteId="Masked"
appPoolId="Masked"
processId="6096"
verb="PATCH"
remoteUserName=""
userName=""
tokenUserName="IIS APPPOOL\Masked"
authenticationType="anonymous"
activityId="{80000063-0000-EA00-B63F-84710C7967BB}"
failureReason="STATUS_CODE"
statusCode="500"
triggerStatusCode="500"
timeTaken="625"
xmlns:freb="http://schemas.microsoft.com/win/2006/06/iis/freb"
>
The table that fails is the only one I extend with some virtual, runtime-calculated navigation fields. But I added [JsonIgnore] to stop AzureService from creating the field in the local DB (that works) and from sending it over the wire to the server. I still get the 500 error, and no exception when debugging the C# Azure backend either.
How can I find the stack trace or the underlying reason for this 500 error in my backend?
For a C# Mobile App backend, you can add the following code in the ConfigureMobileApp method of your Startup.MobileApp.cs file to include error details in the response returned to your client side:
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always;
You can then capture the exception in your mobile application, or leverage Fiddler to capture the network traces when invoking the PATCH operation, to retrieve the detailed error message.
Moreover, you are currently viewing the Failed Request Tracing log; you need to check the Application logs instead. For details, follow Enable diagnostics logging for web apps in Azure App Service.

Unable to update VM with nodejs app on Google App Engine

When I try to deploy from the gcloud CLI I get the following error.
Copying files to Google Cloud Storage...
Synchronizing files to [gs://staging.logically-abstract-www-site.appspot.com/].
Updating module [default]...\Deleted [https://www.googleapis.com/compute/v1/projects/logically-abstract-www-site/zones/us-central1-f/instances/gae-builder-vm-20151030t150724].
Updating module [default]...failed.
ERROR: (gcloud.preview.app.deploy) Error Response: [4] Timed out creating VMs.
My app.yaml is:
runtime: nodejs
vm: true
api_version: 1
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 20
  cool_down_period_sec: 60
  cpu_utilization:
    target_utilization: 0.5
and I am logged in successfully and have the correct project ID. I see the new version created in the Cloud Console for App Engine, but the error seems to happen after that.
In the stdout log I see both instances come up, with the last console.log statement I put in the app after it starts listening on the port; but in shutdown.log I see "app was unhealthy", and in syslog I see "WARNING: never got healthy response from app, but sending /_ah/start query anyway."
From my experience with Node.js on Google App Engine, "Timed out creating VMs" is neither a traditional timeout nor does it necessarily have to do with creating VMs. I found that other errors were reported during the launch of the server, which happens right after the VMs are created. So I recommend checking the console output to see if it tells you anything.
To see the console output:
For a VM instance, go to your VM instances, click the VM instance you want, scroll toward the bottom, and click "Serial console output".
For stdout console logging, go to Monitoring -> Logs, then change the log type dropdown from Request to stdout.
I also found differences in process.env when running locally versus in the cloud. I hope you find your solution too. Good luck!
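That process.env comparison can be scripted. Here is a minimal sketch; envSnapshot is a hypothetical helper, and the variable names in its default list are just examples of values that often differ between environments, not an authoritative list:

```javascript
// Sketch: collect the environment variables worth comparing between a
// local run and the App Engine instance. The default key list is only
// an example; adjust it to your app.
function envSnapshot(env, keys = ['NODE_ENV', 'PORT', 'GAE_SERVICE', 'GAE_VERSION']) {
  const snapshot = {};
  for (const key of keys) {
    snapshot[key] = key in env ? env[key] : '(unset)';
  }
  return snapshot;
}

// Log it at startup in both environments, then diff the two stdout lines:
// console.log('env snapshot:', JSON.stringify(envSnapshot(process.env)));
```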
