I want to treat 4xx HTTP responses from a function app (e.g. a 400 response after sending an HTTP request to my function app) as failures in Application Insights. The function app is called by another service I control, so a 4xx response probably means an implementation error, and I'd like to capture that so I can ultimately run an alert on it (and get an email instead of checking Azure every time).
If possible, I'd like it to appear there.
If not, are there any alternative approaches that might fit my use case?
Unless an unhandled exception occurs, the function runtime will mark the invocation as successful, whether or not the status code denotes an error. Since this behavior is defined by the runtime, there are two things you can do: throw an exception in the code of the function, and/or remove exception handling, so the invocation is marked as not successful.
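As a minimal sketch of the first option, assuming a Python HTTP-triggered function (the required field "id" is hypothetical), raising instead of returning a 400 makes the runtime record the invocation as failed:

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    body = req.get_json()  # raises ValueError on a malformed body
    if "id" not in body:
        # Raising (rather than returning a 400 response) marks this
        # invocation as failed in Application Insights, at the cost of
        # the caller receiving a 500 instead of a 400.
        raise ValueError("missing required field 'id'")
    return func.HttpResponse("ok", status_code=200)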
Since you ultimately want to create an alert, you're better off alerting on this specific HTTP status code using a "Custom log search" alert:
requests
| where toint(resultCode) >= 400
I'm trying to use Cloud Run to receive millions of log messages through HTTPS (streaming) and send them to Cloud Logging.
But I found there is some loss of data: the number of messages in Cloud Logging is less than the number Cloud Run receives.
This is the sample code I tried:
import gzip
import json
import logging

# unzip data (request is the incoming Flask request)
data = gzip.decompress(request.data)
# split by lines
logs = data.decode('UTF-8').split('\n')
# output the logs
log_cnt = 0
for log in logs:
    try:
        # output to jsonPayload
        print(json.dumps(json.loads(log)))
        log_cnt += 1
    except Exception as e:
        logging.error(f"message: {str(e)}")
If I compare log_cnt with the number of logs in Cloud Logging, log_cnt is higher, so some print() calls are not finishing delivering their message.
I tried using the Logging API instead of print(), but the number of logs is too high to send through the Logging API (a limit of 12,000 calls per minute), so latency became very bad and the service could not handle requests stably.
I suspected that the changing number of instances might cause it, so I tested while the number of active instances was unchanged, but still 3-5% of the messages are missing.
Is there something I can do to send all of the messages to Cloud Logging without any loss?
(updated)
A line of the data looks like this (around 1 KB):
{"key1": "ABCDEFGHIJKLMN","key2": "ABCDEFGHIJKLMN","key3": "ABCDEFGHIJKLMN","key4": "ABCDEFGHIJKLMN","key5": "ABCDEFGHIJKLMN","key6": "ABCDEFGHIJKLMN","key7": "ABCDEFGHIJKLMN","key8": "ABCDEFGHIJKLMN","key9": "ABCDEFGHIJKLMN","key10": "ABCDEFGHIJKLMN","key11": "ABCDEFGHIJKLMN","key12": "ABCDEFGHIJKLMN","key13": "ABCDEFGHIJKLMN","key14": "ABCDEFGHIJKLMN","key15": "ABCDEFGHIJKLMN","key16": "ABCDEFGHIJKLMN","key17": "ABCDEFGHIJKLMN","key18": "ABCDEFGHIJKLMN","key19": "ABCDEFGHIJKLMN","key20": "ABCDEFGHIJKLMN","key21": "ABCDEFGHIJKLMN","key22": "ABCDEFGHIJKLMN","key23": "ABCDEFGHIJKLMN","key24": "ABCDEFGHIJKLMN","key26": "ABCDEFGHIJKLMN","key27": "ABCDEFGHIJKLMN","key28": "ABCDEFGHIJKLMN","key29": "ABCDEFGHIJKLMN","key30": "ABCDEFGHIJKLMN","key31": "ABCDEFGHIJKLMN","key32": "ABCDEFGHIJKLMN","key33": "ABCDEFGHIJKLMN","key34": "ABCDEFGHIJKLMN","key35": "ABCDEFGHIJKLMN"}
I can recommend using the entries.write API, which allows you to write a batch of entries at the same time. Have a look at my test in the API Explorer.
It should solve both the JSON format and the multiple writes in one go. Give this API a try and let me know if it works better for you. If not, I will remove this answer!
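For illustration, here is a minimal sketch of the same idea with the google-cloud-logging Python client (the logger name "my-stream-logs" is hypothetical); committing a batch issues a single entries.write call for all of its entries:

import gzip
import json
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("my-stream-logs")

def ingest(request):
    data = gzip.decompress(request.data)
    batch = logger.batch()
    for line in data.decode('UTF-8').split('\n'):
        if line:
            # each entry becomes a jsonPayload in Cloud Logging
            batch.log_struct(json.loads(line))
    batch.commit()  # one entries.write call for the whole batch
    return 'ok', 200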
I am working on a notification service which is built on AWS infra and uses MSK, lambda and SES.
The lambda is written in Node.js, and its trigger is an MSK topic. Now the weird thing about this lambda is that it's getting invoked continuously even after the messages are processed. Inside the lambda is the code to fetch recipients and send emails via SES.
I have ensured that there is no loop present inside the code, so my guess is that for some reason the messages are not getting marked as consumed.
One reason this could happen is if the executing code throws an Error at some point, but I have no errors in the logs.
Could execution time (the lambda timing out? I don't see anything like that in the logs, though) or the volume of messages be responsible for this behavior?
The lambda is set up using the Serverless Framework:
notificationsKafkaConsumer:
  handler: src/consumers/notifications.consumer
  events:
    - msk:
        arn: ${ssm:/kafka/cluster_arn~true}
        topic: "notifications"
        startingPosition: LATEST
It turned out that the lambdas timing out was the issue. The message got lost in the huge volume of logs, and interestingly "timed out" is not an error, so filtering the logs for "ERROR" didn't surface it.
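One pattern that makes this failure mode visible, sketched here in Python (the question's handler is Node.js, and handle_record is a hypothetical per-record processor), is to check the remaining execution time and log an explicit ERROR before the runtime kills the invocation:

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # MSK events group records by topic-partition
    for partition, records in event["records"].items():
        for record in records:
            # Unlike the runtime's "Task timed out" message, this
            # line will match a log filter on "ERROR".
            if context.get_remaining_time_in_millis() < 5000:
                logger.error("Near timeout in %s, bailing out early", partition)
                return
            handle_record(record)  # hypothetical per-record processing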
I am using Dialogflow ES, and once I got the webhook set up, I hadn't been having issues. But after a few months, I started getting a random error. It seems to be inconsistent: sometimes I get it for a specific web call and other times it works fine. This is from the Raw API response:
"webhookStatus": {
"code": 3,
"message": "Webhook call failed. Error: [ResourceName error] Path '' does not match template 'projects/{project_id=*}/locations/{location_id=*}/agent/environments/{environment_id=*}/users/{user_id=*}/sessions/{session_id=*}/contexts/{context_id=*}'.."
}
The webhook is in GCP Functions in the same project. I have a simple "ping" function in the same agent that calls the webhook. That works properly and pings the function, records some notes in the function log (so I know the function is being called), and returns a response fine, so I know the webhook is connected and working for other intents in the same agent before and after I get the error above.
Other intents in the same agent work (and this one WAS working), but I get this error now. I also tried recreating the intent and I get the same behavior.
The project is linked to a billing account and I have been getting charged for it, so I don't think it is an issue with being on a trial or otherwise. Dialogflow itself is on the "trial" edition, but the linked webhook function is billed.
Where can I find what this error means or where to look to resolve it?
After looking at this with fresh eyes, I found out what was happening.
The issue was a malformed output context. I was returning the bad output context only sometimes (which explains why it sometimes worked and sometimes didn't). Specifically, I was returning the parameters directly in the output context, without the output context's 'name' or 'parameters' fields. Everything looked like it was working and I didn't get any other errors, but apparently, when Dialogflow receives a bad webhook response, it generates the unhelpful error above.
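For comparison, a minimal sketch of a well-formed response from a Python webhook (the context name "my-context" and the parameters are hypothetical); the context's fully qualified name has to be built from the session path Dialogflow sends in the request:

import json

def webhook(request):
    req = request.get_json()
    # Dialogflow sends the session path in the request; reuse it to
    # build the fully qualified context name it expects.
    session = req["session"]  # projects/<project>/agent/sessions/<session id>
    response = {
        "fulfillmentText": "Done.",
        "outputContexts": [
            {
                "name": f"{session}/contexts/my-context",
                "lifespanCount": 5,
                # parameters belong under 'parameters', not at the top level
                "parameters": {"orderId": "12345"},
            }
        ],
    }
    return json.dumps(response), 200, {"Content-Type": "application/json"}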
Situation - I have a lambda that:
is built with Node.js v8
has console.log() statements
is triggered by SQS events
works properly (the downstream system receives all messages, AWS X-Ray can see those executions)
Problem:
this lambda does not log anything!
But if the same lambda is invoked manually (using the "Test" button), all logging statements are visible in CloudWatch.
My lambda is based on this tutorial: https://www.jeremydaly.com/serverless-consumers-with-lambda-and-sqs-triggers/
A very similar situation occurs if the lambda is called from within another lambda (recursion). Only the first lambda (started manually) logs anything; every subsequent lambda in the recursion chain does not log at all.
An example can be found here:
https://theburningmonk.com/2016/04/aws-lambda-use-recursive-function-to-process-sqs-messages-part-1/
Any ideas on how to tackle this problem will be highly appreciated.