Local VM storage in Azure Batch task

For my task I'm using locally persisted data. Until now I've successfully used Path.GetTempPath() to get the temp folder, persist data there, and perform some computations. The temp folder is on the system drive, which is not big enough (around 30 GB). I'm using a VM with a 1000 GB HDD.
I'd like to write data to the big C:\ drive instead, but when I try to access it I get an exception:
Access to the path 'C:\whatever_the_path_is' is denied.
I see that the tasks run under a PoolNonAdmin[some-digits] user that obviously doesn't have sufficient permissions.
Are there any special APIs to use local storage with Azure Batch tasks?
EDIT: I'm familiar with %AZ_BATCH_NODE_SHARED_DIR% but for specific reasons I can't use it.

You can use the Azure Batch-defined environment variables for paths that refer to the ephemeral disk. For example, %AZ_BATCH_TASK_WORKING_DIR% targets the current task's working directory (and is writable by whatever user the task is running as). The %AZ_BATCH_NODE_SHARED_DIR% variable references the node's shared directory, which is always on the ephemeral disk; all users that tasks can run under (pool admin, non-admin, or ephemeral task users) can write to this directory. You can view all of the environment variables defined by Azure Batch here.
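As a minimal sketch (assuming the task code is C#; the output file name is just a placeholder), the task can resolve these paths from the environment instead of hard-coding a drive letter:

using System;
using System.IO;

class BatchPathsExample
{
    static void Main()
    {
        // Resolve the Batch-provided directories at run time instead of hard-coding C:\ paths.
        string workingDir = Environment.GetEnvironmentVariable("AZ_BATCH_TASK_WORKING_DIR");
        string sharedDir = Environment.GetEnvironmentVariable("AZ_BATCH_NODE_SHARED_DIR");

        // "intermediate-data.bin" is a placeholder name for this sketch.
        string outputPath = Path.Combine(workingDir, "intermediate-data.bin");
        File.WriteAllBytes(outputPath, new byte[] { 1, 2, 3 });

        Console.WriteLine($"Wrote scratch data to {outputPath}; shared dir is {sharedDir}");
    }
}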

I've found a solution here:
https://learn.microsoft.com/en-us/azure/batch/batch-user-accounts
In my case, the fix was to assign the task an elevated UserIdentity, like this:
task.UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin, scope: AutoUserScope.Pool));
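For context, a fuller sketch of how that line fits into task creation with the Batch .NET client might look like the following (the job ID, task ID, and command line are placeholders, and batchClient is assumed to be an existing client):

using System.Threading.Tasks;
using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

static class ElevatedTaskExample
{
    // Sketch: add a task that runs as an elevated auto-user at pool scope.
    public static async Task AddElevatedTaskAsync(BatchClient batchClient)
    {
        CloudTask task = new CloudTask("myTask", "cmd /c my-app.exe");
        task.UserIdentity = new UserIdentity(
            new AutoUserSpecification(
                elevationLevel: ElevationLevel.Admin,
                scope: AutoUserScope.Pool));

        await batchClient.JobOperations.AddTaskAsync("myJob", task);
    }
}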

Related

Do temp files get automatically deleted after Azure function stops

I have an Azure Function app with multiple Azure Functions in it. The function app is running on the Consumption plan on Linux. In one of the functions, each time it runs I save a file to the /tmp directory so that I can use that file within the function. After the function stops running, if you don't delete this file from /tmp, will it be deleted automatically? And if it isn't, what are the security implications of having those files still there? I ask because, as I understand it, using the Consumption plan means sharing resources with other people. I don't know whether the /tmp directory is part of those shared resources or not, but I'd rather not have other people be able to access files I write there. Thanks :)
I had read here
"Keep in mind though, that Functions is conceived as a serverless
platform, and you should not depend on local disk access to do any
kind of persistence. Whatever you store in that location will be
deleted on other Function invocations."
I ran one of the Azure functions which uploads a file to that tmp directory each time it runs. From that link, I thought the files in tmp would be deleted each time I ran the function, but they persisted across each time that I ran the function. What am I getting wrong here?
Files in the temp directory are cleaned up automatically roughly every 12 hours rather than after each run of the function, but they do not survive a restart of the function app.
This means the data stored in the temp directory (for example, D:\local\Temp on a Windows plan) exists only as long as the function host process is alive; temp data/files are ephemeral.
Temp files are not shared among site instances, and you cannot rely on them staying there, so there are no security concerns about other tenants accessing them.
Please refer to the Azure App Service File System temporary files GitHub Document for more information.
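As a small illustration of treating the temp directory as scratch space (a C# sketch with placeholder names), you can write under Path.GetTempPath() and delete the file yourself rather than relying on the platform's periodic cleanup:

using System;
using System.IO;

static class TempScratchExample
{
    // Sketch: write a scratch file under the platform temp path and remove it when done.
    public static void ProcessWithTempFile(byte[] payload)
    {
        string tempFile = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            File.WriteAllBytes(tempFile, payload);
            // ... work with the file here ...
        }
        finally
        {
            if (File.Exists(tempFile))
                File.Delete(tempFile); // don't rely on periodic cleanup or a host recycle
        }
    }
}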

Cloud Run, how to access root folder to upload

I succeeded in deploying the Node.js container image to Cloud Run through Docker and it works fine.
But now I have to upload an executable file, in binary form, into the root directory. (It would probably also be nice to set basic file permissions.) But I can't find a way to access it. I know it's running on 64-bit Debian, right? How can I access the root folder?
Although it is technically possible to download/copy a file to a running Cloud Run instance, that action would need to take place on every cold start. Depending on how large the files are, you could run out of memory as file system changes are in-memory. Containers should be considered read-only file systems for most use cases except for temporary file storage during computation.
Cloud Run does not provide an interface to log in to an instance or remotely access files. The Docker exec type commands are not supported. That level of functionality would need to be provided by your application.
Instead, rebuild your container with updates/changes and redeploy.

Using the temp directory for Azure Functions

I have a set of Azure Functions running on the same host, which scales up to many instances at times. I'd like to store a very small amount of ephemeral data (a few KB) and opportunistically share that data between function executions. I know that the temp directory is only available to the functions running on that same instance. I also know that I could use the home directory, Durable Functions, or other Azure storage (such as Blob storage) to share data between all functions persistently.
I have two main questions
What are the security implications of using the temp directory? Who can access its contents outside of the running function?
Is this still a reasonable solution? I can't find much in the way of Microsoft documentation outside of what looks like some outdated kudu documentation here.
Thanks!
Answer to Question 1
Yes, it is secure. The Functions host process runs inside a sandbox. All data stored under D:\local is self-contained and isolated to the processes within the sandbox. See https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
Answer to Question 2
The data in D:\local\Temp exists as long as the Functions host process is alive. The host process can be recycled at any time due to unexpected events such as unhandled exceptions, timeouts, or hitting the resource usage limits of your plan. As long as your workflow accounts for the fact that the data stored in D:\local\Temp is ephemeral, then the answer is yes.
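As an illustration of that caveat, a minimal C# sketch of an opportunistic per-instance cache (the cache file name is a placeholder) that tolerates the file disappearing at any time might look like this:

using System;
using System.IO;

static class InstanceCacheExample
{
    // Placeholder name; this lives in the current instance's temp area only.
    private static readonly string CachePath =
        Path.Combine(Path.GetTempPath(), "my-func-cache.json");

    // Returns cached contents if they happen to still exist on this instance.
    public static string TryReadCache()
    {
        try { return File.Exists(CachePath) ? File.ReadAllText(CachePath) : null; }
        catch (IOException) { return null; } // treat any failure as a cache miss
    }

    public static void WriteCache(string contents)
    {
        try { File.WriteAllText(CachePath, contents); }
        catch (IOException) { /* best effort; the cache is opportunistic */ }
    }
}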
I believe this will answer your question; please refer to this for more details.
Also, when folders/files are created via code inside the Temp folder, you cannot view them when you visit the Kudu site, but you can still use those files/folders.
How can you view those files/folders from Kudu?
You will need to add WEBSITE_DISABLE_SCM_SEPARATION = true in Configuration (app settings).
Note: the main site and the SCM (Kudu) site do not share temp files, so if you write files there from your site, you will not see them from the Kudu console (and vice versa).
You can make them use the same temp space if you disable separation (via WEBSITE_DISABLE_SCM_SEPARATION).
Note, however, that this is a legacy flag, and its use is not recommended/supported.
(ref: shared document link)
Security implications depend on the level of isolation you are seeking.
On a shared App Service plan or the Consumption plan, you need to trust the sandbox isolation. This is not an isolated micro-VM like AWS Lambda.
If you have your own App Service plan, then you need to trust the VM/hypervisor isolation of that plan.
If you are really paranoid, or running a healthcare application, then you likely need to run your function in an App Service Environment (ASE) plan.
A reasonable solution is one where the cost does not exceed the worth of the data you are protecting :)

Export Memory Dump Azure Kubernetes

I need to export a memory dump from an AKS cluster and save it to some location.
How can I do it? Is it easy to export it to a storage account? Is there another solution? Can someone give me a step-by-step guide?
EDIT: the previous answer was wrong; I didn't pay attention to the fact that you needed a dump. You'll actually need to get it from Boot Diagnostics or the command line:
https://learn.microsoft.com/en-us/azure/virtual-machines/troubleshooting/boot-diagnostics#enable-boot-diagnostics-on-existing-virtual-machine
This question is quite old, but let me nevertheless share how I solved it:
Linux has an internal setting called RLIMIT_CORE which limits the size of the core dump you'll receive when your application crashes - this is what you find quite quickly.
Next, you have to define the location of where core files are saved, which is done in the file /proc/sys/kernel/core_pattern. The given path can either be a relative file name (saved next to the binary which crashed), an absolute path (absolute to the mounted namespace) or - here is where it gets interesting - a pipe followed by an absolute path to an executable (application or script). This script will (according to the docs - see headline Piping core dumps to a program) be started as user and group root - but furthermore, it will (according to this post in the Linux mailing list) also be executed in the global namespace - in other words, outside of the container.
If you are like me, and you do not have access to the image used for new nodes on your AKS cluster, you'll want to set these values using a DaemonSet, which runs a pod once on every node.
Armed with all this knowledge, you can do the following:
Create a DaemonSet - a pod running on every machine performing the initial setup.
This DaemonSet will run as a privileged container to allow it to switch to the root namespace.
After having switched namespaces successfully, it can change the value of /proc/sys/kernel/core_pattern.
The value should be something like |/bin/dd of=/core/%h.%e.%p.%t (dd reads the core file from stdin and saves it to the location given by the of= parameter). Core files will now be saved under /core/. The file name can be decoded using the variables listed in the docs for core files.
After knowing that the files will be saved to /core/ of the root namespace, we can mount our storage there - in my case Azure File Storage. Here's a tutorial of how to mount AzureFileStorage.
Pods in a DaemonSet have their RestartPolicy set to Always. Since your pod's job is done and you don't want it to restart automatically, keep it running with sleep infinity.
This writeup is almost a copy of what I discovered while contacting the support from Microsoft. Here's the thread in their forum, which contains an almost finished configuration for a DaemonSet.
I'll leave some links here which I used during my research:
how to generate core file in docker container?
How to access docker host filesystem from privileged container
https://medium.com/@patnaikshekhar/initialize-your-aks-nodes-with-daemonsets-679fa81fd20e
Sidenote:
I could also just have mounted the Azure file share into every container and set /proc/sys/kernel/core_pattern to just /core/%h.%e.%p.%t, but this would require me to include the mount in every container. Going this way, I could free the pod configuration from this administrative task and put it where it (in my opinion) belongs: in the initial machine setup.

Neo4j Azure hosting and Database location

I know that we can use VM Depot to get started with Neo4j in Azure, but one thing that is not clear is where we should physically store the DB files. I tried to look around on the net for recommendations on where the physical files should be stored, so that when a VM crashes or restarts, the data is not lost.
Can someone share their thoughts, or point me to a resource with more details, on the dos and don'ts of Neo4j on Azure for a production environment?
Regards
Kiran
When you set up a Neo4j VM via VM Depot, that image, by default, configures the database files to reside within the same VM as the server itself. The location is specified in neo4j-server.properties. This lets you simply spin up the VM and start using Neo4j immediately.
However: You'll soon discover that your storage space is limited (I believe the VM instances are set up with a 127GB disk). To work with larger databases, you'll need to attach an additional disk (or disks), each disk up to 1TB in size. These disks, as well as the main VM disk, are backed by blob storage, meaning they're durable - persistent disks.
How you ultimately configure this is up to you, depending on the size of the database and its purpose. The only storage to avoid, if you need persistence, is the scratch disk provided (which is a locally-attached drive with no durability).
The documentation announcing that VM doesn't say. But when you install neo4j as a package onto other similar Linux systems (the VM in question is a Linux VM), the data usually goes into /var/lib/neo4j/data. Here's an example:
user@host:/var/lib/neo4j/data$ pwd
/var/lib/neo4j/data
user@host:/var/lib/neo4j/data$ ls
graph.db keystore log neo4j-service.pid README.txt rrd
user@host:/var/lib/neo4j/data$ cat README.txt
Neo4j Data
=======================================
This directory contains all live data managed by this server, including
database files, logs, and other "live" files.
The main directory you really have to have is the graph.db directory; that's going to contain the bulk of the data. You may as well back up this entire directory, although some of the files (like the .pid file and README.txt) of course aren't needed.
Now, there's no guarantee that in the VM it's going to be /var/lib/neo4j/data, but it will be something very similar. What you're looking for is a directory whose name ends in .db, since that's the default for new Neo4j databases.
To narrow it down further, once you get that VM running, just run updatedb and then locate *.db | grep neo4j, and that's almost certain to find it quickly.
