Use Google Cloud Secrets when initializing code - node.js

I have this code to retrieve the secrets:
import {SecretManagerServiceClient} from "@google-cloud/secret-manager";

const client = new SecretManagerServiceClient();

async function getSecret(secret: string, version = "latest") {
  const projectID = process.env.GOOGLE_CLOUD_PROJECT;
  const [vs] = await client.accessSecretVersion({
    name: `projects/${projectID}/secrets/${secret}/versions/${version}`
  });
  const secretValue = JSON.parse(vs.payload.data.toString());
  return secretValue;
}

export {getSecret};
I would like to replace process.env.SENTRY_DNS with await getSecret("SENTRY_DNS"), but I can't call a promise (await) outside an async function.
Sentry.init({
  dsn: process.env.SENTRY_DNS,
  environment: Config.isBeta ? "Beta" : "Main"
});
function sentryCreateError(message, contexts) {
  Sentry.captureMessage(message, {
    level: "error", // one of 'info', 'warning', or 'error'
    contexts
  });
}
What are the best practices with Google Secret Manager? Should I be loading the secrets once in a "config" file and then reading the values from there? If so, I'm not sure how to do that; do you have an example?

Leaving aside your code example (I don't work with JS anyway), I would think about a few different questions whose answers may affect the design. For example:
1) Where is this code executed? Compute Engine, App Engine, Cloud Run, Kubernetes, a Cloud Function, and so on. Depending on the answer, the approach to storing secrets might be different.
Suppose, for example, that it is going to be a Cloud Function. The next question:
2) Would you prefer to store the secret values in environment variables, or in Secret Manager? The first option is faster, but less secure: for instance, everybody who has access to the Cloud Function details in the console can see those environment variable values.
3) Load secret values into memory on initialization, or on every invocation? The first option is faster, but might cause some issues if the secret values are modified (a gradual replacement of old values with new ones, as some instances are terminated and new instances are initialized).
The second option may need some additional discussion. It might be possible to get the values asynchronously. In what circumstances might that be useful? I think only when your code has something else to do while waiting for the secret values that are required for (probably) the main job of the Cloud Function. How much can we shave off that way? Probably the few milliseconds spent on the Secret Manager API call. Any drawbacks? Code complexity, since somebody has to maintain the code in the future. Does that performance gain still outweigh the cost? In that case we can probably return to item 2 in the list above and think about storing secrets in environment variables instead.
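To illustrate the asynchronous variant, here is a minimal Node/TypeScript sketch (it assumes the getSecret helper from the question is exported from a local module; the handler shape is only illustrative). The Secret Manager call is started at cold start but only awaited when the value is actually needed, so other startup work can overlap with it:
import { getSecret } from "./secrets"; // assumed path to the helper from the question

// Start the API call at cold start, without blocking module load.
const sentryDsnPromise = getSecret("SENTRY_DNS");

export async function handler() {
  // ... other setup work can run here while the Secret Manager call is in flight ...
  const sentryDsn = await sentryDsnPromise; // already resolved on warm invocations
  // ... main job of the function, using sentryDsn ...
}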
What about the first option? Again, if performance is the priority, go back to item 2 above; otherwise, are code simplicity and maintainability the priority, so that we don't need any asynchronous work here? Maybe the answer to that question depends on the skills, knowledge and financial budget of your company/team, rather than on technical preferences.
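And a sketch of the first option (load once per container instance and reuse the values across invocations), again assuming the getSecret helper from the question; the cache shape is only illustrative:
import { getSecret } from "./secrets"; // assumed path to the helper from the question

let cachedSecrets: { sentryDsn: string } | undefined;

// Hits Secret Manager only on the first call; later invocations reuse the cached values.
async function loadSecrets() {
  if (!cachedSecrets) {
    cachedSecrets = { sentryDsn: await getSecret("SENTRY_DNS") };
  }
  return cachedSecrets;
}

export async function handler() {
  const { sentryDsn } = await loadSecrets(); // cheap after the cold start
  // ... main job of the function, using sentryDsn ...
}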
About the "config" file to store the secret values... While it is possible to store data in the pseudo "/tmp" directory (actually in the memory of a Cloud Function) during execution, we should not expect that data to be preserved between invocations. Thus, we come back either to environment variables (see item 2 above), or to some other remote place with API access. I don't know of many other services with better latency than Secret Manager that could be used as a cache for storing secrets. Suppose we find such a service. Then we get the performance vs. complexity/maintainability dilemma again...
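As for the "config" file the question asks about: in Node it usually ends up being a module rather than a file you write to. A minimal sketch of that idea, assuming the getSecret helper from the question and the @sentry/node package; the module names and the bootstrap function are illustrative. The await happens inside an async startup function, which sidesteps the "can't await outside an async function" problem:
// config.ts (hypothetical): loads the secrets once and initializes Sentry from them.
import * as Sentry from "@sentry/node";
import { getSecret } from "./secrets"; // assumed path to the helper from the question
import { Config } from "./config-flags"; // assumed source of Config.isBeta from the question

export async function bootstrap() {
  Sentry.init({
    dsn: await getSecret("SENTRY_DNS"),
    environment: Config.isBeta ? "Beta" : "Main"
  });
}

// Callers then wait for bootstrap() before doing anything that relies on Sentry:
// bootstrap().then(() => { /* register handlers, start the server, etc. */ });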
Some concluding notes. My context, experience, budget and requirements may be completely different from yours. My assumptions (i.e. that the code is for a Cloud Function) can be completely wrong as well... So I would suggest reading the above critically and using only the ideas that are relevant to your specific situation.

Related

Azure durable entity or static variables?

Question: Is it thread-safe to use static variables (as shared storage between orchestrations), or is it better to save/retrieve data with a durable entity?
There are a couple of Azure Functions in the same namespace: a hub trigger, a durable entity, two orchestrations (the main process and one that monitors the whole process) and an activity.
They all need some shared variables. In my case I need to know the number of main orchestration instances (whether to start a new one or hold on). That is done in another orchestration (the monitor).
I've tried both options and I'm asking because I see different results.
Static variables: in my case there is a generic List<SomeMyType>, where SomeMyType holds the Id of the task, its state, the number of attempts, the records it processed and other info.
When I need to start a new orchestration I call List.Add(); when I need to retrieve and modify a task I use a simple List.First(id_of_the_task). With First() I know for sure the needed task is there.
With static variables I sometimes see that tasks become duplicated for some reason: I retrieve the task with List.First(id_of_the_task), change something on the result variable and that is it. Not a lot of code.
Durable entity: the major difference is that I keep the List on a durable entity, and each time I need it I call .CallEntityAsync("getTask") and .CallEntityAsync("saveTask"), which might slow down the app.
This approach requires more code and more calls, but it looks more stable; I don't see any duplicates.
Please advise.
I can't answer why you would see duplicates with the static variables approach without seeing the code; maybe because List is not thread safe and you would need a ConcurrentBag, but I'm not sure. One issue with static variables is when the function app is not always on, or when it can run on multiple instances: when the function unloads (or crashes) the state is lost, and static variables are not shared across instances either, so under high load it won't work (if there can be many instances).
Durable entities seem better here. They can be shared across many concurrent function instances, and each entity executes only one operation at a time, so they are for sure a better option. The performance cost is a bit higher, but they should not be slower than orchestrators, since orchestrators perform a lot of the same common operations: writing to Table Storage, checking for events, etc.
I can't say if it's right for you, but instead of List.First(id_of_the_task) you should be able to access an orchestrator's properties through the client, which can hold custom data. Another idea, depending on the usage, is that you may be able to query Table Storage directly with the CloudTable class for information about the running orchestrators.
Although not entirely related, you can also look at the settings for parallelism in durable functions: Azure (Durable) Functions - Managing parallelism.
Please ask if I should clarify anything or if I misunderstood your question.

How to modify error logs in google cloud functions?

I am using Google Cloud Functions with Python. I want to format all logs with some additional data, e.g. a customer id. I achieved this without any problem using the Stackdriver Logging library with a CloudLoggingHandler. In the same manner, I would also like to add this information to uncaught error logs and tracebacks.
I tried to modify sys.excepthook and sys.stderr but it did not work, probably they are handled exclusively by cloud functions.
Is there any way I can modify uncaught exceptions or modify handled errors, e.g. by using Stackdriver error reporting? Or do you have any alternative solution for this (without catching all exceptions)?
Cloud Functions provides the (currently) highest level of abstraction for code execution. The philosophy is that you bring the code that implements your desired logic and Cloud Functions provides the environment that executes it. This has pluses and minuses.
The biggest plus is that you have the very least to concern yourself with in order to get the execution you want.
On the other hand, you have very little in the way of operational control (the vision is that Cloud Functions takes care of the operational side for you).
As a consequence, if you want more control over the environment, at the cost of doing more "work" yourself, I suggest you look at Cloud Run. In Cloud Run, you package your application logic as a Docker container and then ask Cloud Run to take care of executing that logic. In your container, you can do anything you want... including using technology such as Stackdriver Logging and defining a CloudLoggingHandler. Cloud Run then takes care of scaling and the execution environment from there.
To sum up, the answer is "No", you don't have control over error logs in Cloud Functions, but you can achieve the desired outcome by leveraging Cloud Run instead.
Although you cannot override sys.excepthook, you might do the following:
To provide a little context, I organize my code structure similar to what is presented here: https://code.luasoftware.com/tutorials/google-cloud-functions/structure-for-google-cloud-functions-development-and-split-multiple-file/
Google stores your function's entry point in an env var called X_GOOGLE_ENTRY_POINT. You can use Python language capabilities to override this function and wrap it, similarly to a decorator: basically wrap it in a try/except block, and then you can run whatever code you want there. I tried using sys.modules[__name__] but it didn't work, so I went for locals().
I have the function code defined in test_logging.py
from app.functions.test_logging import *
and after importing I do the following
import os

fn = os.getenv('X_GOOGLE_ENTRY_POINT')
lcl = locals()

def decorate(fn):
    def run(*args, **kwargs):
        try:
            fn(*args, **kwargs)
        except Exception as e:
            '''Do whatever you want to do here'''
    return run

lcl[fn] = decorate(lcl[fn])
It's hackish, but it's working for me, and it's particularly useful because I keep my code base inside the functions folder and basically don't need to touch main.py, which makes this very flexible. You can even re-raise the error after handling the exception if you want GCP to know it has failed and maybe re-run it.

Firebase functions - database cache?

How does one cache database data in firebase function?
Based on this SO answer, firebase caches data for as long as there is an active listener.
Considering the following example:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  var dbRootRef = data.after.ref.root;
  dbRootRef.child("another/data/path").on("value", function(){});
  return dbRootRef.child("another/data/path").once("value").then(function(snap){ /* process data */ });
});
This will cache the data, but the question is: is this a valid approach for the server side? Should I call .off() at some point so it doesn't cause any issues, since this call can scale quickly and produce tons of .on() listeners? Or is it ok to keep on() indefinitely?
Since active data is kept in memory, your code will keep a snapshot of the latest data at another/data/path in memory as long as the listener is active. Since you never call off in your code, that will be as long as the container that runs the function is active, not just for the duration that this function is active.
Even if you have other Cloud Functions in that container, and those other functions don't need this data, it'll still be using memory.
If that is the behavior you want, then it's a valid approach. I'd just recommend doing a cost/benefit analysis, because I expect this may lead to hard-to-understand behavior at some point.
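If the goal is instead to hold the data only for the duration of a single invocation, one possible variation of the code from the question (just a sketch, not tested) is to detach the listener once the work is done, so the container does not keep the snapshot pinned in memory:
exports.myTrigger = functions.database.ref("some/data/path").onWrite((data, context) => {
  const ref = data.after.ref.root.child("another/data/path");
  const listener = ref.on("value", function () {}); // starts caching data at this path
  return ref.once("value").then(
    function (snap) { ref.off("value", listener); /* process data */ },
    function (err) { ref.off("value", listener); throw err; } // detach even on failure
  );
});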

Immutable Global objects for multi-tenancy

We run a multi-tenant environment where users can execute arbitrary scripts using Nashorn. Performance is extremely important for us, so we would rather not create a new SimpleScriptContext object (or even a new SimpleBindings object) on each script eval.
However, this leaves us open to people modifying the global context, which then must be reused for other users / executions. E.g. Math.min = function(a,b) { return 42; }.
Freezing the Math object is a partial solution, but it seems to slow down script execution, and we have to be extremely diligent to make sure we do this everywhere. Similarly, creating a new SimpleBindings object to replace the ENGINE_SCOPE bindings on each execution is a big performance hit.
Are there any options available to us to lock the Global state? Or anything that I haven't thought of?
Thanks!

node.js express custom format debug logging

A seemingly simple question, but I am unsure of the node.js equivalent to what I'm used to (say from Python, or LAMP), and I actually think there may not be one.
Problem statement: I want to use basic, simple logging in my express app. Maybe I want to output DEBUG messages, or INFO messages, or just some stats to the log for consumption by other back-end systems later.
1) I want all log messages, however, to contain some fields: remote IP and request URL, for example.
2) On the other hand, code that logs is everywhere in my app, including deep inside the call tree.
3) I don't want to pass (req,res) down into every node in the call tree (this just creates a lot of parameter passing where they are mostly not needed, and complicates my code, as I need to pass these into async callbacks and timeouts etc.)
In other systems, where there is a thread per request, I would store the (req, res) pair (where all the data I need is) in thread-local storage, and the logger would read this and format the message.
In node, there is only one thread. What is my alternative here? What is "the request context in which a specific piece of code is running"?
The only way I can think of to achieve something like this is by looking at a stack trace and using reflection to look at local variables up the call tree. I hate that, plus I would need to implement it for all callbacks, setTimeouts, setIntervals, new Function()'s, evals, ... and the list goes on.
What are other people doing?
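For reference, modern Node ships AsyncLocalStorage (in the built-in async_hooks module), which is one way to get the thread-local-style request context described above without passing (req, res) around. A minimal sketch, with the middleware and the stored fields chosen only for illustration:
import { AsyncLocalStorage } from "async_hooks";
import express from "express";

const requestContext = new AsyncLocalStorage<{ ip?: string; url: string }>();
const app = express();

// Middleware: everything triggered by this request runs inside the store.
app.use((req, res, next) => {
  requestContext.run({ ip: req.ip, url: req.originalUrl }, () => next());
});

// Deep inside the call tree, no (req, res) parameters needed:
function logInfo(message: string) {
  const ctx = requestContext.getStore() || {};
  console.log(JSON.stringify({ level: "INFO", message, ...ctx }));
}

app.get("/hello", (req, res) => {
  setTimeout(() => {           // the context survives async boundaries such as timers
    logInfo("handling /hello");
    res.send("ok");
  }, 10);
});

app.listen(3000);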
