Azure China Storage - AzCopy upload fail

I am trying to upload a 2.6 GB ISO to Azure China Storage using AzCopy from my machine here in the USA. I shared the file with a colleague in China and they didn't have a problem. Here is the command, which appears to work for about 30 minutes and then fails. I know there is a "Great Firewall of China", but I'm not sure how to get around the problem.
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:C:\DevTrees\MyProject\Layout-Copy\Binaries\Iso\Full
/Dest:https://xdiso.blob.core.chinacloudapi.cn/iso
/DestKey:<my-key-here>

The network between the Azure China datacenter and your local machine is probably very slow, and AzCopy uses 8 * (number of cores) concurrent threads by default, which might be too aggressive for a slow network.
I would suggest you reduce the thread count by setting the "/NC:" parameter; try a smaller number such as "/NC:2" or "/NC:5" and see if the transfer becomes more stable.
By the way, when the timeout issue reproduces again, please resume with the same AzCopy command line; that way you always make progress on resume instead of starting from the beginning.
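For example, a hedged variant of the original command with the concurrency capped (everything except the added /NC:2 is copied from the question; the key placeholder stays a placeholder):
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:C:\DevTrees\MyProject\Layout-Copy\Binaries\Iso\Full
/Dest:https://xdiso.blob.core.chinacloudapi.cn/iso
/DestKey:<my-key-here>
/NC:2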

Since you're experiencing a timeout, you could try AzCopy in restartable mode like this:
C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy> .\AzCopy.exe
/Source:<path-to-my-source-data>
/Dest:<path-to-my-storage>
/DestKey:<my-key-here>
/Z:<path-to-my-journal-file>
The path to your journal file is arbitrary. For instance, you could set it to C:\temp\azcopy.log if you'd like.
Assume an interrupt occurs while copying your file, and 90% of the file has been transferred to Azure already. Then upon restarting, we will only transfer the remaining 10% of the file.
For more information, type .\AzCopy.exe /?:Z to find the following info:
Specifies a journal file folder for resuming an operation. AzCopy
always supports resuming if an operation has been interrupted.
If this option is not specified, or it is specified without a folder path,
then AzCopy will create the journal file in the default location,
which is %LocalAppData%\Microsoft\Azure\AzCopy.
Each time you issue a command to AzCopy, it checks whether a journal
file exists in the default folder, or whether it exists in a folder
that you specified via this option. If the journal file does not exist
in either place, AzCopy treats the operation as new and generates a
new journal file.
If the journal file does exist, AzCopy will check whether the command
line that you input matches the command line in the journal file.
If the two command lines match, AzCopy resumes the incomplete
operation. If they do not match, you will be prompted to either
overwrite the journal file to start a new operation, or to cancel the
current operation.
The journal file is deleted upon successful completion of the
operation.
Note that resuming an operation from a journal file created by a
previous version of AzCopy is not supported.
You can also find out more here: http://blogs.msdn.com/b/windowsazurestorage/archive/2013/09/07/azcopy-transfer-data-with-re-startable-mode-and-sas-token.aspx

Related

What is an efficient way to copy a subset of files from one container to another?

I have millions of files in one container and I need to copy ~100k to another container in the same storage account. What is the most efficient way to do this?
I have tried:
Python API -- Using BlobServiceClient and related classes, I make a BlobClient for the source and destination and start a copy with new_blob.start_copy_from_url(source_blob.url). This runs at roughly 7 files per second. (A parallelized sketch of this approach appears at the end of this question.)
azcopy (one file per line) -- Basically a batch script with a line like azcopy copy <source w/ SAS> <destination w/ SAS> for every file. This runs at roughly 0.5 files per second due to azcopy's overhead.
azcopy (1000 files per line) -- Another batch script like the above, except I use the --include-path argument to specify a bunch of semicolon-separated files at once. (The number is arbitrary, but I chose 1000 because I was concerned about overloading the command; even 1000 files makes a command of about 84k characters.) Extra caveat here: I cannot rename the files with this method, which is required for about 25% of them due to character constraints on the system that will download from the destination container. This runs at roughly 3.5 files per second.
Surely there must be a better way to do this, probably with another Azure tool that I haven't tried. Or maybe by tagging the files I want to copy then copying the files with that tag, but I couldn't find the arguments to do that.
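For what it's worth, the Python approach above can usually be pushed well past 7 files per second by issuing the start_copy_from_url calls from a thread pool, since each call only schedules a server-side copy and returns quickly. A minimal sketch, assuming the azure-storage-blob v12 SDK; the account URL, SAS token, container names and worker count below are placeholders, not values from the question:

from concurrent.futures import ThreadPoolExecutor
from azure.storage.blob import BlobServiceClient

ACCOUNT_URL = "https://<account>.blob.core.windows.net"  # placeholder
SAS_TOKEN = "<sas-token-without-leading-question-mark>"  # placeholder

service = BlobServiceClient(account_url=ACCOUNT_URL, credential=SAS_TOKEN)
src_container = service.get_container_client("source-container")  # placeholder name
dst_container = service.get_container_client("dest-container")    # placeholder name

def copy_one(name):
    # start_copy_from_url only schedules a server-side copy, so many can be in flight at once.
    # Blob names with special characters may need URL-encoding here.
    source_url = f"{src_container.url}/{name}?{SAS_TOKEN}"
    dst_container.get_blob_client(name).start_copy_from_url(source_url)

blob_names = ["folder/file-0001.bin"]  # replace with the ~100k names you want to copy

with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(copy_one, blob_names))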
Please check the references below:
1. AzCopy gives the best performance for copying blobs within the same storage account or to another storage account. You can force a synchronous copy by specifying the "/SyncCopy" parameter so that the copy operation runs at a consistent speed (azcopy sync | Microsoft Docs). Note that AzCopy performs the synchronous copy by downloading the blobs into local memory and then uploading them to the Blob storage destination, so performance will also depend on network conditions between the location where AzCopy is run and the Azure DC location. Also note that /SyncCopy might generate additional egress cost compared to an asynchronous copy; the recommended approach is to use this sync option with AzCopy on an Azure VM in the same region as your source storage account to avoid egress cost. (A hedged azcopy command sketch for copying a specific list of blobs follows this answer.)
Choose a tool and strategy to copy blobs - Learn | Microsoft Docs
2. StartCopyAsync is one of the ways you can try for a copy within a storage account.
References:
1. .net - Copying file across Azure container without using azcopy - Stack Overflow
2. Copying Azure Blobs Between Containers the Quick Way (markheath.net)
3. You may consider Azure Data Factory in the case of millions of files, but note that it may be expensive and occasional timeouts may occur; it can still be worth it for repeated work of this kind.
References:
1. Copy millions of files (andrewconnell.com), GitHub (Microsoft docs)
2. File Transfer between container to another container - Microsoft Q&A
4. Also check out and try Azure Storage Explorer to copy a blob container to another container.
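As mentioned in point 1, a single azcopy invocation can take the whole subset in one go. A hedged sketch, assuming AzCopy v10 and its --list-of-files flag (account names, container names and SAS tokens are placeholders); note that, like --include-path, this cannot rename blobs during the copy:
azcopy copy "https://<srcaccount>.blob.core.windows.net/source-container?<SAS>" "https://<dstaccount>.blob.core.windows.net/dest-container?<SAS>" --list-of-files files-to-copy.txt
Here files-to-copy.txt contains one blob path per line, relative to the source container, so the ~100k names go into the file instead of onto the command line.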

Recover deleted folder from Google VPS

We have a VPS running on Google Cloud which had a very important folder in a user directory. An employee of ours deleted that folder and we can't seem to figure out how to recover it. I came across extundelete, but it seems the partition needs to be unmounted for it to work, and I don't understand how I would do that on Google. This project took more than a year, and that was the latest copy after a fire took out the last copy on our local servers.
Could anyone please help or guide me in the right direction?
Getting any files back from your VM's disk may be tricky (at best) or impossible (most probably) if the files got overwritten.
The easiest way would be to get them back from a copy or snapshot of your VM's disk. If you have a snapshot of the disk (taken either manually or automatically) from before the folder in question got deleted, then you will get your files back.
If you don't have any backups then you may try to recover the files - I've found many guides and tutorials, let me just link the ones I believe would help you the most:
Unix/Linux undelete/recover deleted files
Recovering accidentally deleted files
Get list of files deleted by rm -rf
------------- UPDATE -----------
Your last chance in this battle is to make two clones of the disk, then detach the original disk from the VM and attach one of the clones to keep your VM running. Use the second clone for any experiments, and keep the original untouched in case you mess up the second clone. (Example gcloud commands for this step are sketched after the list below.)
Now create a new Windows VM and attach your second clone as an additional disk. At this point you're ready to try various data recovery software:
UFS Explorer
Virtual Machine Data Recovery
There are plenty of others to try, too.
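A hedged sketch of the cloning and attaching steps with the gcloud CLI (the disk, snapshot, instance and zone names below are placeholders I made up, not values from your project):
# Snapshot the original disk so it stays untouched, then create two clones from the snapshot
gcloud compute disks snapshot original-disk --snapshot-names=rescue-snap --zone=us-central1-a
gcloud compute disks create rescue-clone-1 rescue-clone-2 --source-snapshot=rescue-snap --zone=us-central1-a
# Swap the running VM onto one clone (stop the VM first if this is its boot disk)
gcloud compute instances detach-disk my-vm --disk=original-disk --zone=us-central1-a
gcloud compute instances attach-disk my-vm --disk=rescue-clone-1 --zone=us-central1-a
# Attach the second clone to a separate recovery VM as an additional disk for the experiments
gcloud compute instances attach-disk recovery-vm --disk=rescue-clone-2 --zone=us-central1-a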
Another approach would be to create an image from the original disk and export it as a VMDK image (saving it to a storage bucket). Then download it to your local computer and use, for example, VMware VMDK Recovery or other specialized software for extracting data from virtual machine disk images.

How to set AzCopy verbose log to the Console directly?

I'm using AzCopy 8.1.0-netcore on Windows. The /V:[verbose-log-file] option can only append the verbose log to a file. I'd like to output the verbose log to the console directly. Is that possible?
The preferable way is to save the log as a file, since there could be a lot of useful information in it if a transfer ever goes wrong. When AzCopy resumes a job, it will attempt to transfer all of the files listed in the plan file which weren't already transferred. One option would be to save the log file in the current directory, or you can change the location of the log file using the AzCopy environment variables: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-configure
~/.azcopy/plans contains the state files that allow AzCopy to resume failed jobs. They also allow the user to list all the jobs that ran in the past and query their results with ./azcopy jobs list and ./azcopy jobs show [job-ID]. We currently do not have a strategy to get rid of these files, as we don't know how long the user wants to keep records of their old jobs.
Logs are critical in helping our customers to investigate issues, as they could be very verbose and offer loads of useful information.
We can certainly add some kind of clean command that gets rid of these logs and plan files.
So as a workaround, you can use azcopy jobs clean to remove the older logs and plan files.
The same discussion can be found in this GitHub issue:
https://github.com/Azure/azure-storage-azcopy/issues/221
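A hedged example of redirecting the log and plan locations and then cleaning up old jobs (these environment variables and the jobs subcommands are AzCopy v10 features rather than the 8.1.0-netcore build, and the paths are placeholders):
set AZCOPY_LOG_LOCATION=C:\azcopy-logs
set AZCOPY_JOB_PLAN_LOCATION=C:\azcopy-plans
azcopy jobs list
azcopy jobs clean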

How do I use perforce checkpoints?

Why is Perforce giving this error when I try to create a checkpoint? Can I restore the entire database from just a checkpoint file and the journal file? What am I doing wrong, and how does this work? Why is the Perforce user guide a giant book, and why are there no video tutorials online?
Why is perforce giving this error when I try to create a checkpoint?
You specified an invalid prefix (//. is not a valid filename). If you want to create a checkpoint that doesn't have a particular prefix, just omit that argument:
p4d -jc
This will create a checkpoint called something like checkpoint.8 in the server root directory (P4ROOT), and back up the previous journal file (journal.7) in the same location.
Can I restore the entire database from just a checkpoint file and the journal file?
Yes. The checkpoint is a snapshot of the database at the moment in time when you took the checkpoint. The journal file records all transactions made after that point.
If you restore from just the checkpoint, you will recover the database as of the point in time at which the checkpoint was taken. If you restore from your latest checkpoint plus the currently running journal, you can recover the entire database up to the last recorded transaction.
The old journal backups that are created as part of the checkpoint process provide a record of everything that happened in between checkpoints. You don't need these to recover the latest state, but they can be useful in extraordinary circumstances (e.g. you discover that important data was permanently deleted by a rogue admin a month ago and you need to recover a copy of the database to the exact moment in time before that happened).
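A hedged sketch of the recovery command (the server root path and checkpoint number are placeholders; stop the server and move the damaged db.* files aside before replaying):
p4d -r C:\p4root -jr checkpoint.8 journal
This replays the checkpoint and then the current journal into fresh db.* files in the server root, recovering the database up to the last recorded transaction.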
The database (and hence the checkpoint/journal) does not include depot file content! Make sure that your depots are located on reasonably durable storage (e.g. a mirrored RAID) and/or have regular backups (ideally coordinated with your database checkpoints so that you can restore a consistent snapshot in the event of a catastrophe).
https://www.perforce.com/manuals/v15.1/p4sag/chapter.backup.html

Copy large number of small blobs with AzCopy

I am trying to do an incremental copy of roughly 500,000 blobs from one storage account to another.
However, it seems that if I do not specify a /Pattern: parameter, AzCopy just hangs forever and never finishes (I actually stopped the process after about 15 minutes).
Is half a million (potentially up to 5 million) blobs too much for AzCopy to handle, or am I missing something here?
The command I'm using looks like this:
AzCopy /Source:<src>/documents /SourceKey:<srcKey> /Dest:<dest>/documents /DestKey:<deskKey> /S /XO /Y
Adding the /pattern parameter solves it, but I'd like a complete copy of all blobs in the container.
I should add that it managed to copy all the blobs already; it is the subsequent runs that fail, when it has to "figure out" which blobs have been added since the last full backup.
Which version of AzCopy are you using? I believe this issue was fixed many releases ago. Several versions back, AzCopy needed to list all the blobs to be transferred before starting the transfer; currently AzCopy is able to do the listing and the transfer simultaneously.
To download the latest version of AzCopy and find more information, please refer to http://aka.ms/azcopy .
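If you end up on AzCopy v10 rather than the 8.x syntax shown above, the incremental behaviour of /S /XO maps roughly onto the sync command. A hedged sketch (account names, container names and SAS tokens are placeholders, and container-to-container sync needs a reasonably recent v10 build):
azcopy sync "https://<srcaccount>.blob.core.windows.net/documents?<SAS>" "https://<dstaccount>.blob.core.windows.net/documents?<SAS>" --recursive
sync compares source and destination and only transfers blobs that are missing or older at the destination.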
