I am new to Azure and would appreciate help with the issue below.
I have multiple servers on-prem.
My requirement is to copy all files and folders from all of these servers to an Azure storage account / container.
Basic queries:
a) What is the best practice - create one container per server?
b) How do I ensure that all files / folders are copied recursively from all disks on a server?
c) How do I schedule the copy from server to container on a daily basis, with the date appended to the file name?
I am looking at https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-migrate-on-premises-data?tabs=windows for a possible solution.
I would appreciate it if you could share your thoughts / experience on the above.
I am going through Microsoft Learn for the time being and planning to install AzCopy.
a) The decision to create one container per server or to store all files in a single container depends on your specific needs and use case.
One container per server helps with organization and management but results in a larger number of containers to manage. Storing all files in a single container simplifies management but may require additional organization (for example, virtual directories) within the container.
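If you do go with one container per server, creating the containers is a one-liner each with the Azure CLI. A minimal sketch, assuming hypothetical names mystorageaccount, server01, and server02:
# run 'az login' first; container names must be lowercase
az storage container create --account-name mystorageaccount --name server01 --auth-mode login
az storage container create --account-name mystorageaccount --name server02 --auth-mode login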
b) To ensure that all files and folders are copied recursively from all disks on a server, you can use AzCopy, or a command-line tool such as Robocopy, which is built into Windows.
AzCopy
You can use AzCopy to copy between storage accounts, containers, directories, and blobs:
azcopy copy 'https://mysourceaccount.blob.core.windows.net/' 'https://mydestinationaccount.blob.core.windows.net' --recursive
And from a local path to containers, directories, and blobs:
azcopy copy 'Local path' 'https://mydestinationaccount.blob.core.windows.net' --recursive
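Tying this back to query (a), a rough sketch that copies a server's data folder into that server's own container (the container name server01, the local path C:\data, and <SAS-token> are illustrative placeholders):
azcopy copy 'C:\data' 'https://mydestinationaccount.blob.core.windows.net/server01?<SAS-token>' --recursive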
Robocopy is another way to copy files to Azure. Note that Robocopy needs an SMB (UNC) path, so it works against an Azure Files share (\\<storageaccount>.file.core.windows.net\<share>), not directly against a blob container.
robocopy C:\ \\azurestorageaccount.file.core.windows.net\share /S /Z /B /MT:16 /XF pagefile.sys *.log /LOG+:C:\logs\robocopy.log
This command copies all files and folders from the C:\ drive to the specified Azure file share. The /S switch tells Robocopy to copy subdirectories, the /Z switch allows the copy to be resumed if interrupted, the /B switch runs the copy in backup mode (which requires administrative rights), and the /MT switch specifies the number of threads to use for the copy. The /XF switch excludes the listed files from the copy (in this case, pagefile.sys and all files with a .log extension); /XD can likewise be used to exclude whole directories. Finally, the /LOG+ switch appends the results of the copy to the specified log file.
To schedule this copy on a daily basis with the date appended to the file name, you can use a batch file with the following content:
@echo off
setlocal
set DATE=%date:/=-%
set DATE=%DATE: =_%
set DATE=%DATE:,=_%
robocopy C:\ \\azurestorageaccount.file.core.windows.net\share /S /Z /B /MT:16 /XF pagefile.sys *.log /LOG+:C:\logs\robocopy_%DATE%.log
This batch file sets the DATE variable to the current date with slashes replaced by hyphens and spaces and commas replaced by underscores (the exact output of %date% depends on your regional settings, so the result may not be exactly YYYY-MM-DD). The Robocopy command is the same as before, but the log file name has the current date appended. To schedule this batch file to run daily, you can use the Windows Task Scheduler.
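The scheduling itself can also be done from the command line. A sketch, assuming the batch file above is saved at the hypothetical path C:\scripts\backup.bat:
schtasks /create /tn "DailyAzureBackup" /tr "C:\scripts\backup.bat" /sc DAILY /st 02:00
This registers a task named DailyAzureBackup that runs the script every day at 02:00; the same thing can be configured through the Task Scheduler GUI.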
You can use AzCopy to transfer data from your on-prem server to an Azure container. AzCopy is a command-line tool that you can use to copy data to or from a storage account.
For more information, refer to the blogs below (thanks to Robert Allen for these):
Robocopy examples
Robocopy scheduled jobs
I am trying to sync my local hard-disk folders to an S3 bucket. The thing is that the local folders are split across a few drives like C:\, D:\ and so on...
For example, the S3 bucket includes directories 'RD1' to 'RD80', while locally C:\ holds 'RD1' to 'RD12', D:\ contains 'RD12' to 'RD20', and so on...
Is there any way to use the aws cli sync command to accomplish this?
I wrote a Python script that compares the two backups, but I would prefer to use the sync command and keep the synchronization running on an ongoing basis.
Thanks a lot,
Best regards.
The AWS CLI aws s3 sync command will synchronize files from one location to another location, including subdirectories.
If you wish to synchronize multiple directories to different locations, then you will need to run the sync command multiple times.
Also, please note that the sync command is one-way, either from your local computer to S3 or from S3 to the local computer (or S3 to S3). If you want to sync in 'both directions', then you would need to execute the sync command both ways. This is especially important for handling deleted files, which are only deleted in the destination location if you use the --delete option.
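As a rough sketch of the multi-drive case from the question (the bucket name my-backup-bucket is hypothetical), one sync invocation per RD folder keeps the bucket layout identical to the RD1..RD80 structure:
aws s3 sync C:\RD1 s3://my-backup-bucket/RD1
aws s3 sync C:\RD2 s3://my-backup-bucket/RD2
REM ...repeat for the remaining RD folders on C:\ ...
aws s3 sync D:\RD13 s3://my-backup-bucket/RD13
REM ...and on D:\ and the other drives; a simple batch or PowerShell loop can generate these lines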
I have millions of files in one container and I need to copy ~100k to another container in the same storage account. What is the most efficient way to do this?
I have tried:
Python API -- Using BlobServiceClient and related classes, I make a BlobClient for the source and destination and start a copy with new_blob.start_copy_from_url(source_blob.url). This runs at roughly 7 files per second.
azcopy (one file per line) -- Basically a batch script with a line like azcopy copy <source w/ SAS> <destination w/ SAS> for every file. This runs at roughly 0.5 files per second due to azcopy's overhead.
azcopy (1000 files per line) -- Another batch script like the above, except I use the --include-path argument to specify a bunch of semicolon-separated files at once. (The number is arbitrary, but I chose 1000 because I was concerned about overloading the command; even 1000 files makes a command with 84k characters.) Extra caveat here: I cannot rename the files with this method, which is required for about 25% of them due to character constraints on the system that will download from the destination container. This runs at roughly 3.5 files per second.
Surely there must be a better way to do this, probably with another Azure tool that I haven't tried. Or maybe by tagging the files I want to copy then copying the files with that tag, but I couldn't find the arguments to do that.
Please check the references below:
1. AzCopy would be best for performance when copying blobs within the same storage account or to another storage account. You can force a synchronous copy by specifying the "/SyncCopy" parameter so that the copy operation gets consistent speed (azcopy sync | Microsoft Docs).
But note that AzCopy performs the synchronous copy by downloading the blobs to local memory and then uploading them to the Blob storage destination, so performance will also depend on the network conditions between the location where AzCopy is being run and the Azure datacenter location. Also note that /SyncCopy might generate additional egress cost compared to an asynchronous copy; the recommended approach is to use this sync option with AzCopy on an Azure VM in the same region as your source storage account to avoid egress cost. (A sketch using AzCopy's --list-of-files flag follows after this list.)
Choose a tool and strategy to copy blobs - Learn | Microsoft Docs
2. StartCopyAsync is one of the ways you can try for a copy within a storage account.
References:
1. .net - Copying file across Azure container without using azcopy - Stack Overflow
2. Copying Azure Blobs Between Containers the Quick Way (markheath.net)
3. You may consider Azure Data Factory in the case of millions of files, but also note that it may be expensive and occasional timeouts may occur; it can still be worth it for repeated work of this kind.
References:
1. Copy millions of files (andrewconnell.com), GitHub (Microsoft docs)
2. File Transfer between container to another container - Microsoft Q&A
4. Also check out and try Azure Storage Explorer's option to copy a blob container to another.
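One more option, offered as a hedged sketch rather than a verified benchmark: AzCopy v10 accepts a --list-of-files flag that points to a text file containing one relative blob path per line, so the ~100k blobs can be submitted as a single job instead of one command (or one --include-path list) per batch. The account, container names, and SAS placeholders below are illustrative; note that, like --include-path, this does not rename blobs:
azcopy copy "https://mystorageaccount.blob.core.windows.net/source-container?<SAS>" "https://mystorageaccount.blob.core.windows.net/dest-container?<SAS>" --list-of-files blobs-to-copy.txt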
The scenario is as follows: a large text file is put somewhere. At a certain time of the day (or manually, or after x number of files), a virtual machine with BizTalk installed should start automatically to process these files. Then, the files should be put in some output location and the VM should be shut down. I don't know how long processing these files takes.
What is the best way to build such a solution? Preferably, the solution should be reusable for similar scenarios in the future.
I was thinking of Logic Apps for the workflow, Blob Storage or FTP for input/output of the files, and an API App for starting/shutting down the VM. Can Azure Functions be used in some way?
EDIT:
I also asked the question elsewhere, see link.
https://social.msdn.microsoft.com/Forums/en-US/19a69fe7-8e61-4b94-a3e7-b21c4c925195/automated-processing-of-large-text-files?forum=azurelogicapps
Just create an Azure Automation runbook with a schedule. Have the runbook check for the specific files in a storage account; if they exist, start the VM and wait until the files are gone (i.e. BizTalk has processed them, deleted them, and put the output where it belongs), then have the runbook stop the VM.
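A minimal sketch of that logic using the Azure CLI (a real runbook would typically use the Az PowerShell modules instead; the resource group, VM, storage account, and container names below are hypothetical):
# assumes 'az login' has been done and the identity can read the storage container
count=$(az storage blob list --account-name mystorageacct --container-name input --auth-mode login --query "length(@)" -o tsv)
if [ "$count" -gt 0 ]; then
  az vm start --resource-group myRG --name myBizTalkVM
  # poll until BizTalk has processed and removed the files
  while [ "$(az storage blob list --account-name mystorageacct --container-name input --auth-mode login --query 'length(@)' -o tsv)" -gt 0 ]; do
    sleep 300
  done
  az vm deallocate --resource-group myRG --name myBizTalkVM
fi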
I am trying to upload a 2.6 GB iso to Azure China Storage using AZCopy from my machine here in the USA. I shared the file with a colleague in China and they didn't have a problem. Here is the command which appears to work for about 30 minutes and then fails. I know there is a "Great Firewall of China" but I'm not sure how to get around the problem.
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:C:\DevTrees\MyProject\Layout-Copy\Binaries\Iso\Full
/Dest:https://xdiso.blob.core.chinacloudapi.cn/iso
/DestKey:<my-key-here>
The network between the Azure server and your local machine is likely very slow, and AzCopy uses 8 * (number of cores) threads by default for the data transfer, which might be too aggressive for a slow network.
I would suggest you reduce the thread count by setting the "/NC:" parameter to a smaller number, such as "/NC:2" or "/NC:5", and see whether the transfer becomes more stable.
BTW, when the timeout issue reproduces, resume with the same AzCopy command line; that way you always make progress from where it left off instead of starting from the beginning.
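For example, the original command with the concurrency reduced (/NC:2 is just a starting point; tune the value to your connection):
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:C:\DevTrees\MyProject\Layout-Copy\Binaries\Iso\Full
/Dest:https://xdiso.blob.core.chinacloudapi.cn/iso
/DestKey:<my-key-here>
/NC:2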
Since you're experiencing a timeout, you could try AzCopy in restartable mode like this:
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:<path-to-my-source-data>
/Dest:<path-to-my-storage>
/DestKey:<my-key-here>
/Z:<path-to-my-journal-file>
The path for your journal file folder is arbitrary. For instance, you could set it to C:\temp\azcopy if you'd like.
Assume an interrupt occurs while copying your file, and 90% of the file has been transferred to Azure already. Then upon restarting, we will only transfer the remaining 10% of the file.
For more information, type .\AzCopy.exe /?:Z to find the following info:
Specifies a journal file folder for resuming an operation. AzCopy
always supports resuming if an operation has been interrupted.
If this option is not specified, or it is specified without a folder path,
then AzCopy will create the journal file in the default location,
which is %LocalAppData%\Microsoft\Azure\AzCopy.
Each time you issue a command to AzCopy, it checks whether a journal
file exists in the default folder, or whether it exists in a folder
that you specified via this option. If the journal file does not exist
in either place, AzCopy treats the operation as new and generates a
new journal file.
If the journal file does exist, AzCopy will check whether the command
line that you input matches the command line in the journal file.
If the two command lines match, AzCopy resumes the incomplete
operation. If they do not match, you will be prompted to either
overwrite the journal file to start a new operation, or to cancel the
current operation.
The journal file is deleted upon successful completion of the
operation.
Note that resuming an operation from a journal file created by a
previous version of AzCopy is not supported.
You can also find out more here: http://blogs.msdn.com/b/windowsazurestorage/archive/2013/09/07/azcopy-transfer-data-with-re-startable-mode-and-sas-token.aspx
We're using PuTTY and an SSH connection to our web host. They back up our files daily onto their servers.
Since the backup files use a large amount of space, we now want to copy the backup files to our own server daily via a cron job.
How do we have to set up the cron job?
If you know the backup file path and file name (e.g. backup_ddmmyyyy.tar.gz), you can simply scp that backup file from one server to the other.
Put this scp command inside a shell script, and configure it with the other server's address and the location where you want the file copied to.
Since your backup files use a large amount of space, my guess is that they are individually large as well, so using rsync over SSH instead of a plain scp might be a better option, since it can resume after network failures.
Once your script is working, you can put it in a cron job scheduled for an appropriate time, after the backups on the web host are guaranteed to be finished.
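A minimal sketch of such a script and cron entry, assuming a hypothetical remote host webhost, user backupuser, remote backup directory /backups, and local destination /srv/backups (adjust the names, paths, and schedule to your environment):
#!/bin/bash
# copy_backup.sh - pull today's backup file from the web host
DATE=$(date +%d%m%Y)    # matches the backup_ddmmyyyy naming used above
rsync -avz --partial -e ssh "backupuser@webhost:/backups/backup_${DATE}.tar.gz" /srv/backups/

# crontab entry (crontab -e): run daily at 03:30, after the web host's backup should be finished
30 3 * * * /usr/local/bin/copy_backup.sh >> /var/log/copy_backup.log 2>&1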