I have an Azure function app with multiple Azure functions in it. The function app runs on the consumption plan on Linux. In one of the Azure functions, each time it runs I save a file to the /tmp directory so that I can use that file within the function. After the function stops running, if you don't delete this file from the /tmp directory, will it be deleted automatically? And if it isn't deleted automatically, what are the security implications of those files still being there? I ask because, as I understand it, using the consumption plan means sharing resources with other people. I don't know whether the /tmp directory is part of those shared resources or not, but I'd rather not have other people be able to access files I write there. Thanks :)
I had read here
"Keep in mind though, that Functions is conceived as a serverless
platform, and you should not depend on local disk access to do any
kind of persistence. Whatever you store in that location will be
deleted on other Function invocations."
I ran one of the Azure functions, which uploads a file to that /tmp directory each time it runs. Based on that quote, I expected the files in /tmp to be deleted between invocations, but they persisted across each run. What am I getting wrong here?
Temp directory files are deleted automatically about once every 12 hours, but not after each run of the function, nor when the function app is restarted.
In other words, data stored in the temp directory (e.g., D:\local\Temp on Windows, /tmp on Linux) exists only for as long as the function host process is alive, which means the temp data/files are ephemeral.
Temporary files are not shared among site instances, and you cannot rely on them staying there. Hence, there are no security concerns here.
Please refer to the Azure App Service File System temporary files GitHub Document for more information.
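If you would rather not leave files behind until the platform cleans them up, you can delete them yourself at the end of each invocation. A minimal sketch in C# (the ProcessAsync helper is hypothetical, standing in for whatever work needs the file on local disk):

using System;
using System.IO;
using System.Threading.Tasks;

public static class TempFileExample
{
    public static async Task RunAsync(byte[] payload)
    {
        // Path.GetTempPath() resolves to /tmp on Linux and D:\local\Temp on Windows plans.
        string tempFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".tmp");
        try
        {
            await File.WriteAllBytesAsync(tempFile, payload);
            await ProcessAsync(tempFile); // hypothetical: whatever needs the file on disk
        }
        finally
        {
            // Clean up so the file does not linger until the host is recycled.
            if (File.Exists(tempFile)) File.Delete(tempFile);
        }
    }

    private static Task ProcessAsync(string path) => Task.CompletedTask; // placeholder
}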
Related
I have a set of Azure functions running on the same host, which scales up to many instances at times. I'd like to store a very small amount of ephemeral data (a few KB) and opportunistically share it between function executions. I know that the temp directory is only available to the functions running on that same instance. I also know that I could use the home directory, durable functions, or other Azure storage (such as blob) to share data between all functions persistently.
I have two main questions
What are the security implications of using the temp directory? Who can access its contents outside of the running function?
Is this still a reasonable solution? I can't find much in the way of Microsoft documentation outside of what looks like some outdated kudu documentation here.
Thanks!
Answer to Question 1
Yes, it is secure. The Functions host process runs inside a sandbox. All data stored under D:\local is self-contained and isolated to the processes within the sandbox. See https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox
Answer to Question 2
The data in D:\local\Temp exists as long as the Functions host process is alive. The host process can be recycled at any time due to unexpected events such as unhandled exceptions, timeouts, or hitting the resource usage limits of your plan. As long as your workflow accounts for the fact that the data stored in D:\local\Temp is ephemeral, then the answer is a 'yes'.
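For the opportunistic sharing described in the question, one way to account for that is to treat the temp directory as a best-effort cache: use the file if it happens to be there, otherwise rebuild it and write it back. A minimal sketch, where rebuildAsync is a hypothetical callback for however you produce the data:

using System;
using System.IO;
using System.Threading.Tasks;

public static class TempCache
{
    private static readonly string CachePath =
        Path.Combine(Path.GetTempPath(), "shared-cache.json");

    public static async Task<string> GetAsync(Func<Task<string>> rebuildAsync)
    {
        // Best-effort read: the file may have been written by an earlier
        // execution on this same instance, or it may simply not exist.
        if (File.Exists(CachePath))
        {
            try { return await File.ReadAllTextAsync(CachePath); }
            catch (IOException) { /* another execution may be writing; fall through */ }
        }

        string data = await rebuildAsync();

        // Best-effort write; losing this file only costs a rebuild next time.
        try { await File.WriteAllTextAsync(CachePath, data); }
        catch (IOException) { }

        return data;
    }
}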
I believe this will answer your question; please refer to this for more details.
Also, when folders/files are created via code inside the "Temp" folder, you cannot view them when you visit the Kudu site, but you can still use those files/folders from your code.
How do you view files/folders that were created via Kudu?
You will need to add WEBSITE_DISABLE_SCM_SEPARATION = true in Configuration (app settings).
Note: the main site and the scm site do not share temp files, so if you write some files there from your site, you will not see them from the Kudu console (and vice versa).
You can make them use the same temp space if you disable separation (via WEBSITE_DISABLE_SCM_SEPARATION).
But note that this is a legacy flag, and its use is not recommended/supported.
(ref: shared document link)
Security implications depend on the level of isolation you are seeking.
In a shared App Service plan or the Consumption plan, you need to trust the sandbox isolation; this is not an isolated microVM like AWS Lambda.
If you have your own App Service plan, then you need to trust the hypervisor isolation of the VMs in that plan.
If you are really paranoid, or running a healthcare application, then you likely need to run your function in an ASE (App Service Environment) plan.
Reasonable solution is one where the cost is not exceeding the worth of data you are protecting :)
For my task I'm using locally persisted data. Until now I've successfully used Path.GetTempPath() to get the temp folder, persisted data there, and performed some computations. The temp folder is on the system drive, which is not big enough (around 30 GB); I'm using a VM with a 1000 GB HDD.
I'd like to write data to the big C:\ drive instead, but when I try to access it, it throws an exception:
Access to the path 'C:\whatever_the_path_is' is denied.
I see that the tasks run under a PoolNonAdmin[some-digits] user, which obviously doesn't have sufficient permissions.
Are there any special APIs to use local storage with Azure Batch tasks?
EDIT: I'm familiar with %AZ_BATCH_NODE_SHARED_DIR% but for specific reasons I can't use it.
You can use the environment variables defined by Azure Batch for paths that refer to the ephemeral disk. For example, %AZ_BATCH_TASK_WORKING_DIR% targets the current task's working directory and is writable by whatever user the task runs as. Alternatively, %AZ_BATCH_NODE_SHARED_DIR% references the shared directory, which is always on the ephemeral disk; all users that tasks run under (pool admin, non-admin, or ephemeral task users) can write to this directory. You can view all of the environment variables defined by Azure Batch here.
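As a concrete illustration, a minimal C# sketch of resolving a writable path from inside a task (the file name is just an example):

using System;
using System.IO;

class TaskScratchExample
{
    static void Main()
    {
        // AZ_BATCH_TASK_WORKING_DIR is set by the Batch service for every task
        // and points at a task-specific directory on the node's ephemeral disk.
        string workingDir = Environment.GetEnvironmentVariable("AZ_BATCH_TASK_WORKING_DIR")
                            ?? Path.GetTempPath(); // fallback when running outside Batch

        string scratchFile = Path.Combine(workingDir, "scratch.dat");
        File.WriteAllText(scratchFile, "intermediate results go here");
        Console.WriteLine($"Wrote {scratchFile}");
    }
}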
I've found a solution here:
https://learn.microsoft.com/en-us/azure/batch/batch-user-accounts
In my case, the fix was to assign the task an elevated UserIdentity, like this:
task.UserIdentity = new UserIdentity(new AutoUserSpecification(elevationLevel: ElevationLevel.Admin, scope: AutoUserScope.Pool));
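For context, a sketch of how that identity might be attached when creating the task (the task ID and command line are placeholders):

using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Common;

// Placeholder task: the auto-user runs with admin elevation, scoped to the pool,
// so it can write outside the default Batch task directories.
var task = new CloudTask("process-data-1", "cmd /c my_processing.exe")
{
    UserIdentity = new UserIdentity(
        new AutoUserSpecification(
            elevationLevel: ElevationLevel.Admin,
            scope: AutoUserScope.Pool))
};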
The scenario is as follows: a large text file is put somewhere. At a certain time of day (or manually, or after x number of files), a virtual machine with BizTalk installed should start automatically to process these files. Then the files should be put in some output location and the VM should be shut down. I don't know how long processing these files takes.
What is the best way to build such a solution? The solution is preferably to be used for similar scenarios in the future.
I was thinking of Logic Apps for the workflow, blob storage or FTP for input/output of the files, and an API App for starting/shutting down the VM. Can Azure Functions be used in some way?
EDIT:
I also asked the question elsewhere, see link.
https://social.msdn.microsoft.com/Forums/en-US/19a69fe7-8e61-4b94-a3e7-b21c4c925195/automated-processing-of-large-text-files?forum=azurelogicapps
Just create an Azure Automation runbook with a schedule. Have the runbook check for specific files in a storage account; if they exist, start the VM and wait until the files are gone (meaning BizTalk has processed them, deleted them, and put the output where it belongs). Once the files are gone, the runbook stops the VM.
I've googled this question a lot, but I haven't found the right answer.
I've built an example app in NodeJS without database connectivity. While developing, I stored data in a separate dir, "fakeDB", which contains "tests", "questions", "users" dirs and so on. In "tests" there are JSON files representing test data (a set of questions and answers).
When I deployed the app on Heroku, tests were stored correctly. When a new test is created, it is saved in the "tests" dir and I have access to it later.
But when I push a new commit to the GitHub repo, the tests that were created on Heroku are deleted.
How can I get a copy of my Heroku repo on my local machine?
NOTE: I've run heroku run bash, and ls printed the list of local files, not the files from the remote. Also, I've run git pull heroku into a separate dir, but that too contained only my previous files, without the ones created on Heroku.
Heroku's filesystem is ephemeral:
Each dyno gets its own ephemeral filesystem, with a fresh copy of the most recently deployed code. During the dyno’s lifetime its running processes can use the filesystem as a temporary scratchpad, but no files that are written are visible to processes in any other dyno and any files written will be discarded the moment the dyno is stopped or restarted. For example, this occurs any time a dyno is replaced due to application deployment and approximately once a day as part of normal dyno management.
This means that you can't reliably create files and store them on the local filesystem. They aren't shared across dynos, and they periodically disappear. Furthermore, Heroku doesn't provide a mechanism for easily retrieving generated files.
The official recommendation is to use something like Amazon S3 for storing uploads, generated files, etc. Of course, depending on what's in your files a database might be a better fit.
When a Windows Azure worker role instance is rebooted from within the Azure portal, are the contents of the e:\approot folder deleted?
I have an elevated startup task which checks for the existence of a file in this folder before adding some registry settings. This has worked in the past but is now failing because the file it expects to find is no longer there following a portal-induced reboot.
If I perform a 'shutdown' command from within the startup task, the instance reboots but the contents of e:\approot are unaffected.
As others have already said, the contents of the drive are not lost on reboot. What has most likely happened is that you are hardcoding "e:\approot" in your startup task. You should not do this. I would hazard a guess that when you reboot, the drive has moved to f:\ or some other drive. I have seen this quite a bit.
Instead, you should reference the %ROLEROOT% environment variable. That will point to the correct drive and path (e.g. "%ROLEROOT%\AppRoot") after a reboot, regardless of where the drive actually gets moved to.
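If part of that check happens in managed code rather than in the .cmd itself, the same idea applies. A minimal C# sketch (the marker file name is hypothetical):

using System;
using System.IO;

class StartupCheck
{
    static void Main()
    {
        // Resolve the role root from the environment instead of hard-coding E:\.
        string roleRoot = Environment.GetEnvironmentVariable("RoleRoot") ?? "";

        // %RoleRoot% is typically just a drive letter like "E:", so append the rest explicitly.
        string marker = roleRoot + @"\approot\setup.complete"; // hypothetical marker file

        Console.WriteLine(File.Exists(marker)
            ? "Marker found; applying registry settings."
            : "Marker missing; skipping.");
    }
}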
I don't believe the content of e:\approot will "disappear" (the original content, I mean).
This is the location where your role code lives, so it is not being deleted in any way, otherwise your role would not work at all. It might be re-initialized on every reboot, but I really doubt that is true.
If your startup task checks for something you add manually, I suggest you use a Local Storage Resource. Keep anything that is not part of your original package deployment in a local resource; you have the option to keep (or clean) the contents of these folders when the role is recycled.
If your startup task is checking for some contents of your role code/package to be there, I suggest you implement some wait logic in the cmd/batch file you are using, and also mark the startup task as the "background" type so it does not block instance startup. As I said, e:\approot cannot be empty, because this is where your code resides! The content might arrive there later, but it will certainly not stay empty.
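A minimal sketch of using a Local Storage Resource as suggested above, assuming a resource named "ScratchStorage" has been declared in the service definition:

using System;
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

class LocalResourceExample
{
    static void Run()
    {
        // "ScratchStorage" must match a <LocalStorage> entry in ServiceDefinition.csdef,
        // where cleanOnRoleRecycle controls whether its contents survive a role recycle.
        LocalResource scratch = RoleEnvironment.GetLocalResource("ScratchStorage");

        // Hypothetical marker file written outside the package deployment.
        string marker = Path.Combine(scratch.RootPath, "setup.complete");
        File.WriteAllText(marker, DateTime.UtcNow.ToString("o"));
    }
}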
You can't count on local changes surviving (or not surviving) updates or restarts - changes may persist or may be lost.
Your code should be designed to account for that, period. You can store temporary data locally to resume faster, but that data persisting is not guaranteed, so you should have that data in some durable storage like SQL Azure or Azure Storage.
The behavior you see might be caused by software updates. I'm not sure that's exactly how it works, but imagine the Azure infrastructure decides to roll out Windows updates to some of your instances' VMs. Installing updates can take a long time, so Azure may simply stop your instance, start another (already updated) clean VM, and deploy and start your role instance there. If that happens, all local changes will of course be lost: your instance is started on a fresh VM and the current VM is discarded. That's just speculation, but I imagine it's quite realistic.
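To make the durable-storage suggestion above concrete, a minimal sketch of pushing the resumable state to blob storage using the Azure.Storage.Blobs client (the connection string, container, blob, and file names are placeholders):

using System.Threading.Tasks;
using Azure.Storage.Blobs;

class DurableStateExample
{
    static async Task SaveStateAsync(string connectionString, string localPath)
    {
        // Keep the resume data in durable storage; local disk is only a cache.
        var container = new BlobContainerClient(connectionString, "role-state");
        await container.CreateIfNotExistsAsync();

        BlobClient blob = container.GetBlobClient("resume-state.json");
        await blob.UploadAsync(localPath, overwrite: true);
    }
}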
The answer is that when the Reboot button is clicked in the Azure portal, the contents of the AppRoot folder are deleted and the package is redeployed.
To test, deploy something (anything...) to an Azure instance. RDP onto the instance and create a file (test.txt) in the AppRoot folder (this will be on the E: or F: drive).
Click the Reboot button on the portal. Wait for restart, then RDP onto the instance again - test.txt no longer exists.
Note that if you RDP onto the instance and choose Restart from the Windows UI, then test.txt is not deleted.