Copy local files to Azure Blob - long file names

I need to copy/sync a folder, containing numerous sub folders and files, from a local machine (Windows Server 2012) to our Azure Blob container. Some paths exceed 260 characters.
I attempted to use AzCopy (https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy), but got an exception with a long file name.
What are the options for copying files from a local PC to an Azure Blob container, which have pretty long folder/file names? Something like RoboCopy, but then I'd need to map a folder to my blob storage, and I'm not sure that's possible.

Azure Blob Storage doesn't have the concept of folders. There's just: container name + blob name (though a blob's name can contain separator characters such as /, which makes it appear like a path).
And a container's name cannot exceed 63 characters (and must be lowercase). There's no getting around that. If you're trying to store your local server's path as the container name, and that path exceeds 63 characters, it's not going to work.
Azure File Shares (which are backed by Azure Storage) don't have this limitation, as they support standard file I/O operations and directory structures. If you take this route, you should be able to copy your folder structure as-is. There are a few differences:
File shares may be mounted (as an SMB share), allowing you to just copy your content over (e.g. xcopy)
You may make SDK/API calls to copy files (a slightly different API); see the sketch after this list
A file share is limited to 5TB, with total 1000 IOPS across the share
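If you go the Azure Files route and use the SDK, a minimal sketch of copying a local tree to a share might look like the following. It assumes the azure-storage-file-share package and an existing share; the connection string, share name and local root are placeholders, and the \\?\ prefix is a Windows trick to get past the 260-character limit when reading the source files.

```python
import os
from azure.core.exceptions import ResourceExistsError
from azure.storage.fileshare import ShareClient  # pip install azure-storage-file-share

CONN_STR = "<storage account connection string>"   # placeholder
SHARE_NAME = "myshare"                              # placeholder
LOCAL_ROOT = r"\\?\D:\data\deep\folder"             # \\?\ enables long local paths

share = ShareClient.from_connection_string(CONN_STR, share_name=SHARE_NAME)

for dirpath, _dirnames, filenames in os.walk(LOCAL_ROOT):
    # Relative path of the current directory, using Azure Files' forward slashes
    rel_dir = os.path.relpath(dirpath, LOCAL_ROOT).replace(os.sep, "/")
    if rel_dir != ".":
        # Azure Files requires each directory level to exist before uploading into it
        parts = rel_dir.split("/")
        for i in range(1, len(parts) + 1):
            try:
                share.create_directory("/".join(parts[:i]))
            except ResourceExistsError:
                pass  # directory already created on an earlier pass
    for name in filenames:
        target = name if rel_dir == "." else f"{rel_dir}/{name}"
        with open(os.path.join(dirpath, name), "rb") as data:
            share.get_file_client(target).upload_file(data)
```

Once the share exists you can also simply mount it over SMB and use xcopy/robocopy against the mapped drive, which avoids the SDK entirely.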

Related

Do temp files get automatically deleted after Azure function stops

I have an Azure function app with multiple Azure functions in it. The function app is running on the Consumption plan on Linux. In one of the Azure functions, each time it runs I save a file to the /tmp directory so that I can use that file within the function. After the function stops running, if I don't delete this file from the /tmp directory, will it be deleted automatically? And if it isn't automatically deleted, what are the security implications of those files still being there? I ask because, as I understand it, using the Consumption plan means sharing resources with other people. I don't know whether the tmp directory is part of those shared resources or not, but I'd rather not have other people be able to access files I write to the tmp directory. Thanks :)
I had read here:
"Keep in mind though, that Functions is conceived as a serverless platform, and you should not depend on local disk access to do any kind of persistence. Whatever you store in that location will be deleted on other Function invocations."
I ran one of the Azure functions which uploads a file to that tmp directory each time it runs. From that link, I thought the files in tmp would be deleted each time I ran the function, but they persisted across each time that I ran the function. What am I getting wrong here?
Files in the temp directory are deleted automatically roughly every 12 hours, and whenever the function app is restarted, but they are not deleted after each individual run of a function.
In other words, data stored in the temp directory (e.g. D:\local\Temp on Windows plans) exists only for as long as the function host process is alive; temp files are ephemeral.
Temporary files are not shared among site instances, and you cannot rely on them staying there, so there is no security concern about other tenants being able to read them.
Please refer to the Azure App Service File System temporary files GitHub Document for more information.
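That said, if you would rather not depend on the host's cleanup at all, a minimal sketch is to remove the scratch file yourself. The helper name and payload are illustrative, not part of the Functions API:

```python
import os
import tempfile

def process(payload: bytes) -> None:
    # Hypothetical helper called from inside an Azure Function. It writes its
    # scratch file under the temp directory (/tmp on Linux Consumption plans)
    # and removes it explicitly instead of relying on the host's periodic cleanup.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        # ... do the actual work with the file here ...
    finally:
        if os.path.exists(path):
            os.remove(path)
```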

What is an efficient way to copy a subset of files from one container to another?

I have millions of files in one container and I need to copy ~100k to another container in the same storage account. What is the most efficient way to do this?
I have tried:
Python API -- Using BlobServiceClient and related classes, I make a BlobClient for the source and destination and start a copy with new_blob.start_copy_from_url(source_blob.url). This runs at roughly 7 files per second.
azcopy (one file per line) -- Basically a batch script with a line like azcopy copy <source w/ SAS> <destination w/ SAS> for every file. This runs at roughly 0.5 files per second due to azcopy's overhead.
azcopy (1000 files per line) -- Another batch script like the above, except I use the --include-path argument to specify a batch of semicolon-separated files at once. (The number is arbitrary, but I chose 1000 because I was concerned about overloading the command; even 1000 files makes a command of about 84k characters.) Extra caveat here: I cannot rename the files with this method, which is required for about 25% of them due to character constraints on the system that will download from the destination container. This runs at roughly 3.5 files per second.
Surely there must be a better way to do this, probably with another Azure tool that I haven't tried. Or maybe by tagging the files I want to copy then copying the files with that tag, but I couldn't find the arguments to do that.
Please check the references below:
1. AzCopy gives the best performance for copying blobs within the same storage account or to another storage account. You can force a synchronous copy by specifying the /SyncCopy parameter, which ensures the copy operation runs at a consistent speed (azcopy sync | Microsoft Docs). Note that AzCopy performs a synchronous copy by downloading the blobs into local memory and then uploading them to the destination Blob storage, so performance also depends on the network conditions between the location where AzCopy runs and the Azure datacenter. Also note that /SyncCopy may generate additional egress cost compared to an asynchronous copy; the recommended approach is to run AzCopy with this option on an Azure VM in the same region as your source storage account to avoid egress cost.
Choose a tool and strategy to copy blobs - Learn | Microsoft Docs
2. StartCopyAsync is one way to copy blobs within a storage account (see the sketch after this list).
References:
1. .net - Copying file across Azure container without using azcopy - Stack Overflow
2. Copying Azure Blobs Between Containers the Quick Way (markheath.net)
3. You may consider Azure Data Factory for millions of files, but note that it can be expensive and occasional timeouts may occur; it may still be worth it for repeated work of this kind.
References:
1. Copy millions of files (andrewconnell.com), GitHub (Microsoft docs)
2. File Transfer between container to another container - Microsoft Q&A
4. Also check out and try Azure Storage Explorer to copy one blob container to another.
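For the Python route mentioned in the question, the per-blob copy is already a server-side operation, so most of the 7 files/second is request latency; issuing the copy requests in parallel is usually the simplest speed-up. A rough sketch, assuming both containers are in the same storage account and that the connection string, container names and the blobs_to_copy.txt list are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string("<connection string>")
src = service.get_container_client("source-container")
dst = service.get_container_client("destination-container")

def copy_one(name: str) -> None:
    # start_copy_from_url asks the service to copy the blob server-side.
    # Within the same storage account the account key authorizes the source;
    # for a different account, append a SAS token to the source URL.
    dst.get_blob_client(name).start_copy_from_url(src.get_blob_client(name).url)

with open("blobs_to_copy.txt") as f:
    names = [line.strip() for line in f if line.strip()]

with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(copy_one, names))
```

Renaming during the copy is straightforward here as well: have copy_one take a (source_name, destination_name) pair instead of a single name.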

How to copy files of specific size from one blob container to another blob container?

Is there a way to copy files of Specific size(like files size greater than or less than 100 mb) from one blob to another blob container or any location using ADF or any other Azure resources which can help achieve it.
I recommend using a Logic App for this. Note that the blob size property is in bytes, so you need to convert your 100 MB threshold accordingly.
(The Logic App design process and the source and destination containers from the test run were shown as screenshots; the test results looked okay.)
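If a Logic App is not a requirement, the same size filter can be expressed with a few lines of the blob SDK. A rough sketch, assuming both containers are in the same storage account; the connection string, container names and the 100 MB threshold are placeholders:

```python
from azure.storage.blob import BlobServiceClient

SIZE_LIMIT = 100 * 1024 * 1024  # 100 MB, expressed in bytes like the size property

service = BlobServiceClient.from_connection_string("<connection string>")
src = service.get_container_client("source")
dst = service.get_container_client("destination")

for blob in src.list_blobs():
    if blob.size > SIZE_LIMIT:  # flip to < for "smaller than 100 MB"
        source_url = src.get_blob_client(blob.name).url
        dst.get_blob_client(blob.name).start_copy_from_url(source_url)
```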

Azure Data Lake Storage and Data Factory - Temporary GUID folders and files

I am using Azure Data Lake Store (ADLS), targeted by an Azure Data Factory (ADF) pipeline that reads from Blob Storage and writes into ADLS. During execution I notice that there is a folder created in the output ADLS that does not exist in the source data. The folder has a GUID for a name and contains many files, also named with GUIDs. The folder is temporary and disappears after around 30 seconds.
Is this part of the ADLS metadata indexing? Is it something used by ADF during processing? Although it appears in the Data Explorer in the portal, does it show up through the API? I am concerned it may create issues down the line, even though it is a temporary structure.
Any insight appreciated - a Google turned up little.
So what you're seeing here is something that Azure Data Lake Storage does regardless of the method you use to upload and copy data into it. It's not specific to Data Factory and not something you can control.
For large files it basically parallelises the read/write operation for a single file. You then get multiple smaller files appearing in the temporary directory for each thread of the parallel operation. Once complete the process concatenates the threads into the single expected destination file.
Comparison: this is similar to what PolyBase does in SQLDW with its 8 external readers that hit a file in 512MB blocks.
I understand your concerns here. I've also done battle with this, whereby the operation fails and does not clean up the temp files. My advice would be to be explicit with your downstream services when specifying the target file path.
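One way to be explicit downstream, if a reader has to list the output directory at all, is to ignore anything that looks like one of these transient GUID-named folders. A small sketch; the paths are made up for illustration:

```python
import re

# Matches a GUID-shaped path segment, like the temporary folders described above.
GUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)

def is_transient_guid_dir(segment: str) -> bool:
    return bool(GUID_RE.match(segment))

# Example with made-up paths:
paths = [
    "output/2017/01/sales.csv",
    "output/1b9a2c64-3d7e-4f0a-9c1d-5e6f7a8b9c0d/part-0001",
]
final_outputs = [
    p for p in paths if not any(is_transient_guid_dir(s) for s in p.split("/"))
]
print(final_outputs)  # ['output/2017/01/sales.csv']
```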
One other thing: I've had problems when using the Visual Studio Data Lake file explorer tool to upload large files. Sometimes the parallel threads did not concatenate into the single file correctly and caused corruption in my structured dataset. This was with files in the 4 - 8GB region. Be warned!
Side note. I've found PowerShell most reliable for handling uploads into Data Lake Store.
Hope this helps.

Can any "lxc-*" commands list the searching template path?

Can any lxc-* commands list the searching template path? Since in some OSs, the path is /usr/share/lxc/templates/, while in others, it may be /usr/local/share/lxc/templates/.
This can't be done with an lxc-* command; the template location isn't exposed through the CLI, so you need to know where your particular installation keeps its templates and pick the path accordingly (see the sketch below).
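If you just need to find out where the templates landed on a given machine, a trivial probe of the usual install prefixes (the two paths from the question) does the job:

```python
import os

# The two locations mentioned in the question; add other prefixes if your
# distribution packages LXC somewhere else.
CANDIDATES = [
    "/usr/share/lxc/templates",
    "/usr/local/share/lxc/templates",
]

for path in CANDIDATES:
    if os.path.isdir(path):
        print("LXC templates found in:", path)
```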
