GCP Cloud Run (with Python) logging to Cloud Logging - python-3.x

I'm trying to use Cloud Run to receive millions of log messages over HTTPS (streaming) and forward them to Cloud Logging.
However, I found that some data is lost: the number of messages in Cloud Logging is lower than the number Cloud Run receives.
This is the sample code that I tried:
import gzip
import json
import logging

# unzip the request body ("request" is the incoming HTTP request object, e.g. flask.request)
data = gzip.decompress(request.data)
# split by lines
logs = data.decode('UTF-8').split('\n')
# output the logs
log_cnt = 0
for log in logs:
    try:
        # write the parsed line to stdout so it is ingested as jsonPayload
        print(json.dumps(json.loads(log)))
        log_cnt += 1
    except Exception as e:
        logging.error(f"message: {str(e)}")
If I compare log_cnt with the number of logs in Cloud Logging, log_cnt is higher, so some of the print() calls are not actually delivering their messages.
I tried using the Logging API instead of print(), but there are too many logs to send one call at a time (the API is limited to 12,000 calls per minute), so latency became very bad and the service could not handle requests stably.
I suspected that the changing number of instances might be the cause, so I tested while the number of active instances stayed constant, but 3-5% of the messages were still missing.
Is there something I can do to send all of the messages to Cloud Logging without any loss?
(Updated)
Each line of data looks like this (around 1 KB):
{"key1": "ABCDEFGHIJKLMN","key2": "ABCDEFGHIJKLMN","key3": "ABCDEFGHIJKLMN","key4": "ABCDEFGHIJKLMN","key5": "ABCDEFGHIJKLMN","key6": "ABCDEFGHIJKLMN","key7": "ABCDEFGHIJKLMN","key8": "ABCDEFGHIJKLMN","key9": "ABCDEFGHIJKLMN","key10": "ABCDEFGHIJKLMN","key11": "ABCDEFGHIJKLMN","key12": "ABCDEFGHIJKLMN","key13": "ABCDEFGHIJKLMN","key14": "ABCDEFGHIJKLMN","key15": "ABCDEFGHIJKLMN","key16": "ABCDEFGHIJKLMN","key17": "ABCDEFGHIJKLMN","key18": "ABCDEFGHIJKLMN","key19": "ABCDEFGHIJKLMN","key20": "ABCDEFGHIJKLMN","key21": "ABCDEFGHIJKLMN","key22": "ABCDEFGHIJKLMN","key23": "ABCDEFGHIJKLMN","key24": "ABCDEFGHIJKLMN","key26": "ABCDEFGHIJKLMN","key27": "ABCDEFGHIJKLMN","key28": "ABCDEFGHIJKLMN","key29": "ABCDEFGHIJKLMN","key30": "ABCDEFGHIJKLMN","key31": "ABCDEFGHIJKLMN","key32": "ABCDEFGHIJKLMN","key33": "ABCDEFGHIJKLMN","key34": "ABCDEFGHIJKLMN","key35": "ABCDEFGHIJKLMN"}

I can recommend you use the entries.write API, which allows you to write a batch of entries at the same time. Have a look at my test in the API Explorer.
It addresses both the JSON format and the multiple writes at the same time. Give this API a try and let me know if it works better for you. If not, I will remove this answer!
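For reference, here is a minimal sketch of what a batched write could look like from Python. This is my own assumption rather than part of the answer: it relies on the google-cloud-logging client library, whose Logger.batch() buffers entries and commits them with a single entries.write call; the log name "bulk_ingest" and the helper function are made up.
import json
from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
logger = client.logger("bulk_ingest")  # hypothetical log name

def write_logs(lines):
    # Buffer all entries and send them in one entries.write call
    # instead of one API call per log line.
    batch = logger.batch()
    for line in lines:
        try:
            # log_struct() stores the parsed dict as jsonPayload
            batch.log_struct(json.loads(line))
        except Exception as e:
            batch.log_text(f"parse error: {e}", severity="ERROR")
    batch.commit()
Each request can carry many entries, which should keep the call rate well below the per-minute limit mentioned in the question.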

Related

How to increase the AWS lambda to lambda connection timeout or keep the connection alive?

I am using the boto3 Lambda client to invoke lambda_S from lambda_M. My code looks something like:
import json
import boto3
import botocore.config

cfg = botocore.config.Config(
    retries={'max_attempts': 0},
    read_timeout=840,
    connect_timeout=600,
    # also tried including region_name="us-east-1"
)
lambda_client = boto3.client('lambda', config=cfg)  # even tried without config
invoke_response = lambda_client.invoke(
    FunctionName=lambda_name,
    InvocationType='RequestResponse',
    Payload=json.dumps(request)
)
lambda_S is supposed to run for about 6 minutes, and I want lambda_M to stay alive to get the response back from lambda_S, but lambda_M times out after logging a CloudWatch message like
"Failed to connect to proxy URL: http://aws-proxy..."
I searched and found suggestions like "configure your HTTP client, SDK, firewall, proxy or operating system to allow long connections with timeout or keep-alive settings", but the issue is that I have no idea how to do any of these with Lambda. Any help is highly appreciated.
I would approach this a bit differently. Lambdas charge you by the second, so in general you should avoid waiting inside them. One way to do that is to create an SNS topic and use it as the messenger that triggers the other Lambda.
The workflow goes like this:
SNS-A -> triggers Lambda-A
SNS-B -> triggers lambda-B
So if lambda-B wants to send something to lambda-A for processing and needs the results back, lambda-B publishes a message to the SNS-A topic and quits (a publish sketch follows the workflow).
SNS-A triggers lambda-A, which does its work and at the end publishes a message to SNS-B.
SNS-B triggers lambda-B.
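As a rough illustration of that hand-off (my own sketch, not from the answer), lambda-B could publish to SNS-A with boto3; the topic ARN environment variable is a placeholder:
import json
import os
import boto3

sns = boto3.client('sns')

def handler(event, context):
    # Publish the work request to SNS-A and return immediately,
    # instead of waiting for lambda-A to finish.
    sns.publish(
        TopicArn=os.environ['SNS_A_TOPIC_ARN'],  # placeholder env var
        Message=json.dumps({'request': event}),
    )
    return {'status': 'submitted'}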
AWS has example documentation on what policies you should put in place; here is one.
I don't know how you are automating the deployment of native assets like SNS and Lambda, but assuming you use CloudFormation:
you create your AWS::Lambda::Function
you create an AWS::SNS::Topic
and in its definition, you add a 'Subscription' property and point it at your Lambda.
So in our example, your SNS-A will have a subscription defined for lambda-A.
Lastly, you grant SNS permission to trigger the Lambda: AWS::Lambda::Permission.
When these three are in place, you are all set to send messages to the SNS topic, which will now be able to trigger the Lambda.
You will find SO answers on how to do this in CloudFormation (example), but you can also read up on the AWS CloudFormation documentation.
If you are not worried about automating the stuff and just want to test these manually, then the aws-cli is your friend.
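If you would rather script the wiring than use CloudFormation or the CLI, a rough boto3 sketch of the same three pieces (topic, subscription, permission) could look like this; the function name and ARN are placeholders:
import boto3

sns = boto3.client('sns')
lam = boto3.client('lambda')

# 1. the topic (AWS::SNS::Topic)
topic_arn = sns.create_topic(Name='SNS-A')['TopicArn']
# 2. the subscription pointing the topic at the Lambda
sns.subscribe(
    TopicArn=topic_arn,
    Protocol='lambda',
    Endpoint='arn:aws:lambda:us-east-1:123456789012:function:lambda-A',  # placeholder ARN
)
# 3. permission for SNS to invoke the Lambda (AWS::Lambda::Permission)
lam.add_permission(
    FunctionName='lambda-A',
    StatementId='AllowSNSInvoke',
    Action='lambda:InvokeFunction',
    Principal='sns.amazonaws.com',
    SourceArn=topic_arn,
)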

Azure Functions is not logging all the traces in AppInsights

I have an Azure Function App with multiple functions connected to Application Insights.
For some reason I don't know, sometimes some requests and traces get lost and it's as if they never happened, but I can see the data in our DB and also in other systems.
Here is a new function with just one call; in the Azure Functions dashboard I can see the log:
But in Application Insights, when I try to search for the logs of the trace or the request, no info is retrieved.
This is not happening every time, but it's not the first time I've seen this issue. I can see the logs for other requests, but I don't know why logs are sometimes lost.
Azure function info:
Runtime Version: 3
Stack: NodeJS
Have you configured sampling? This can appear as data loss.
You can control it as follows, as per the documentation:
const appInsights = require("applicationinsights");
appInsights.setup("<instrumentation_key>");
appInsights.defaultClient.config.samplingPercentage = 33; // 33% of all telemetry will be sent to Application Insights
appInsights.start();

Stackdriver-trace on Google Cloud Run failing, while working fine on localhost

I have a Node server running on Google Cloud Run. Now I want to enable Stackdriver Trace. When I run the service locally, I am able to get the traces in GCP. However, when I run the service on Google Cloud Run, I am getting an error:
"@google-cloud/trace-agent ERROR TraceWriter#publish: Received error with status code 403 while publishing traces to cloudtrace.googleapis.com: Error: The request is missing a valid API key."
I made sure that the service account has the tracing agent role.
The first line in my app.js:
require('@google-cloud/trace-agent').start();
Running locally, I am using a .env file containing
GOOGLE_APPLICATION_CREDENTIALS=<path to credentials.json>
According to https://github.com/googleapis/cloud-trace-nodejs, these values are auto-detected if the application is running on Google Cloud Platform, so I don't have these credentials in the GCP image.
There are two challenges to using this library with Cloud Run:
Despite the note about auto-detection, Cloud Run is an exception. It is not yet autodetected. This can be addressed for now with some explicit configuration.
Because Cloud Run services only have resources until they respond to a request, queued-up trace data may not be sent before CPU resources are withdrawn. This can be addressed for now by configuring the trace agent to flush ASAP:
const tracer = require('@google-cloud/trace-agent').start({
  serviceContext: {
    service: process.env.K_SERVICE || "unknown-service",
    version: process.env.K_REVISION || "unknown-revision"
  },
  flushDelaySeconds: 1,
});
On a quick review I couldn't see how to trigger the trace flush, but the shorter timeout should help avoid some delays in seeing the trace data appear in Stackdriver.
EDIT: While nice in theory, in practice there's still significant race conditions with CPU withdrawal. Filed https://github.com/googleapis/cloud-trace-nodejs/issues/1161 to see if we can find a more consistent solution.

How do I query GCP log viewer and obtain json results in Python 3.x (like gcloud logging read)

I'm building a tool to download GCP logs, save the logs to disk as single-line JSON entries, then perform processing against those logs. The program needs to support both logs exported to Cloud Storage and logs currently in Stackdriver (to partially support environments where exports to Cloud Storage haven't been pre-configured). The Cloud Storage piece is done, but I'm having difficulty downloading logs from Stackdriver.
I would like to implement functionality similar to the gcloud command 'gcloud logging read' in Python. Yes, I could use gcloud; however, I would like to build everything into the one tool.
I currently have this sample code to print the result of hits, however I can't get the full log entry in JSON format:
from google.cloud import logging

def downloadStackdriver():
    client = logging.Client()
    FILTER = "resource.type=project"
    for entry in client.list_entries(filter_=FILTER):
        a = entry.payload.value
        print(a)
How can I obtain the full JSON output of matching logs, like gcloud logging read does?
Based on other Stack Overflow pages, I've attempted to use MessageToDict and MessageToJson; however, I receive the error
"AttributeError: 'ProtobufEntry' object has no attribute 'DESCRIPTOR'"
You can use the to_api_repr function on the LogEntry class from the google-cloud-logging package to do this:
from google.cloud import logging

client = logging.Client()
logger = client.logger('log_name')
for entry in logger.list_entries():
    print(entry.to_api_repr())
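If the end goal is the single-line JSON files described in the question, the two snippets can be combined. This is my own sketch, assuming to_api_repr() returns a JSON-serializable dict; the output file name is made up:
import json
from google.cloud import logging

client = logging.Client()
FILTER = "resource.type=project"
# Write each matching entry to disk as a single JSON line.
with open("stackdriver_logs.jsonl", "w") as f:
    for entry in client.list_entries(filter_=FILTER):
        f.write(json.dumps(entry.to_api_repr()) + "\n")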

Firebase Functions - ERROR, but no Event Message in Console

I have written a function on Firebase that downloads an image (base64) from Firebase Storage and sends it as the response to the user:
const functions = require('firebase-functions');
import os from 'os';
import path from 'path';
const storage = require('firebase-admin').storage().bucket();

export default functions.https.onRequest((req, res) => {
    const name = req.query.name;
    let destination = path.join(os.tmpdir(), 'image-randomNumber');
    return storage.file('postPictures/' + name).download({
        destination
    }).then(() => {
        res.set({
            'Content-Type': 'image/jpeg'
        });
        return res.status(200).sendFile(destination);
    });
});
My client calls that function multiple times in a row (in series) to load a range of images for display, ca. 20 images with an average size of 4 KB.
After 10 or so pictures have been loaded (the amount varies), all further pictures fail. The reason is that my function does not respond correctly, and the Firebase console shows me that my function threw an error:
The above image shows that:
a request to the function (called "PostPictureView") succeeds,
afterwards, three requests to the controller fail,
in the end, after executing a new request to the "UserLogin" function, that fails as well.
The response given to the client is the default "Error: Could not handle request". After waiting a few seconds, all requests are handled again as they are supposed to be.
My best guesses:
The project is on the free tier; maybe Google is throttling something? (I did not hit any limits, AFAIK.)
Is there a limit on the number of messages the Google Firebase console can handle?
Could the tmpdir of the Functions app run low? I never delete the temporary files, but I would expect that Google either deletes them automatically or warns me in some other way that space is running low.
Does someone know an alternative way to receive the error messages, or has experienced similar issues? (As Firebase Functions is still in beta, it could also be an error on Google's side.)
Btw: downloading the image directly from the client (Android app, React Native) is not possible, because I will use the function to check access permissions later. The problem is reproducible for me.
In Cloud Functions, the /tmp directory is backed by memory. So, every file you download there is effectively taking up memory on the server instance that ran the function.
Cloud Functions may reuse server instances for repeated calls to the same function. This means your function is downloading another file (to that same instance) with each invocation. Since the names of the files are different each time, you are accumulating files in /tmp that each occupy memory.
At some point, this server instance is going to run out of memory with all these files in /tmp. This is bad.
It's a best practice to always clean up files after you're done with them. Better yet, if you can stream the file content from Cloud Storage to the client, you'll use even less memory (and be billed even less for the memory-hours you use).
After some more research, I've found the solution: the Firebase console does not seem to show all error information.
For detailed information about your functions, and errors that might be omitted in the Firebase console, check out the Google Cloud Functions console.
There I saw that memory usage (as suggested by @Doug Stevensson) never went over 80 MB (out of a 256 MB limit) and never shut the server down. Moreover, there is a DNS resolution limit for the free tier that my application hit.
The documentation points to a limit on DNS resolutions: 40,000 per 100 seconds. In my case this limit was never hit - Firebase counts roughly 8,000 total executions - but it seems there is a lower, undocumented limit for the free tier. After upgrading my account (I started the trial that GCP offers, so I'm actually not paying anything) and linking the project to the billing account, everything works perfectly.
