I need to copy images from a Linux server to my Windows desktop, should I use threads? And how many?

I need to copy between 400 and 5,000 images; the number changes every run.
How can I calculate how many threads will give me the fastest result?
Should I open a new SSH connection for each thread?
I use paramiko to open the SSH connection and SFTP to copy the images.
Thanks.
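A minimal sketch of the threaded approach being asked about, assuming password authentication and placeholder host, credential, and path values; each worker opens its own SSH/SFTP session so the threads stay independent:

# Hedged sketch: download images with a small pool of workers, where every
# worker uses its own SSH connection (host, credentials, paths are placeholders).
from concurrent.futures import ThreadPoolExecutor
import os

import paramiko

HOST, USER, PASSWORD = "linux-server", "user", "secret"   # placeholders
REMOTE_DIR, LOCAL_DIR = "/var/images", r"C:\images"       # placeholders

def download_batch(filenames):
    """Download one batch of files over a dedicated SSH/SFTP connection."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, password=PASSWORD)
    try:
        sftp = client.open_sftp()
        for name in filenames:
            sftp.get(f"{REMOTE_DIR}/{name}", os.path.join(LOCAL_DIR, name))
    finally:
        client.close()

def parallel_download(workers=4):
    # List the remote files once, then hand each worker its own slice.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username=USER, password=PASSWORD)
    names = client.open_sftp().listdir(REMOTE_DIR)
    client.close()

    batches = [names[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(download_batch, batches))   # list() surfaces exceptions

if __name__ == "__main__":
    parallel_download()

Since the transfer is I/O-bound, there is no exact formula for the thread count; benchmarking somewhere between 2 and 8 workers is usually enough, because beyond that the connections mostly compete for the same bandwidth.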

I guess the best solution is to add the images to a single archive before copying, because verifying that each file was copied and creating every new file on the destination is an expensive operation.
If you copy one archive in a single thread, it can be much faster, because the transfer does not wait on each individual image.
So it will be much faster to:
pack into an archive
copy
unpack
You can check this even without a connection between two computers: just copy about 1 GB of small files from one hard drive to another, then pack those files into an archive and copy it again; you will notice that the second way is much faster.
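A minimal sketch of the pack/copy/unpack idea above, again with paramiko and placeholder host and path values; it assumes tar is available on the server and uses Python's tarfile module to unpack locally:

# Hedged sketch: build one archive on the server, download it, unpack it.
import tarfile

import paramiko

HOST, USER, PASSWORD = "linux-server", "user", "secret"   # placeholders
REMOTE_DIR = "/var/images"                                # placeholder
REMOTE_TAR = "/tmp/images.tar.gz"                         # placeholder
LOCAL_TAR, LOCAL_DIR = "images.tar.gz", "images"          # placeholders

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username=USER, password=PASSWORD)

# 1. Pack: create the archive on the server (assumes tar is installed there).
stdin, stdout, stderr = client.exec_command(
    f"tar czf {REMOTE_TAR} -C {REMOTE_DIR} .")
stdout.channel.recv_exit_status()   # block until tar finishes

# 2. Copy: one large SFTP transfer instead of thousands of small ones.
sftp = client.open_sftp()
sftp.get(REMOTE_TAR, LOCAL_TAR)
client.close()

# 3. Unpack locally.
with tarfile.open(LOCAL_TAR, "r:gz") as tar:
    tar.extractall(LOCAL_DIR)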

Related

Cannot restore a big file from Azure backup because of the six-hour timeout

I am trying to restore a big file (~40 GB) from Azure Backup. I can see my recovery point and mount it as a disk drive so I can copy/paste the file I need. The problem is that the copying takes approximately 8 hours, but the disk drive (recovery point) is automatically unmounted after 6 hours, so the process fails consistently. I couldn't find any setting in the backup agent to increase this window.
Any thoughts on how to overcome this?
You can extend the mount time by setting the number of hours to a higher value:
RecoveryJobTimeOut under "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Azure Backup\Config\CloudBackupProvider"
The type is DWORD; the value is the number of hours.
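A small sketch of setting that value with Python's standard winreg module, run with administrator rights; the 12-hour value is only an example, and the key path is the one quoted above:

# Hedged sketch: raise RecoveryJobTimeOut (hours) for Azure Backup mounts.
# Requires administrator rights; 12 is an example value, not a recommendation.
import winreg

KEY_PATH = r"SOFTWARE\Microsoft\Windows Azure Backup\Config\CloudBackupProvider"

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "RecoveryJobTimeOut", 0, winreg.REG_DWORD, 12)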
After some struggling I've found a workaround, so I'll post it here for others...
I've mounted the needed recovery point as a disk drive and started a file copy. It shows the standard Windows copy-file progress dialog, which has an option to pause. So after ~5.5 hours, just before the drive would be unmounted, I paused the copy, unmounted the drive manually, mounted it again (getting another 6-hour slot), and then resumed the copy. Well, I don't think this is how Microsoft intended me to work, but it gets the job done.
Happy restoring!
Try compressing a copy of the file into a .zip on the server, then download the (hopefully smaller) file.
Also, if you don't mind me asking, what the heck made a 40 GB file?

What is the fastest method to copy a directory in Linux?

Linux gives much lower copying speed than Windows: copying 1 GB of data in Linux takes much more time than in Windows. Please suggest how to copy a directory as efficiently as Windows does. I tried cp and rsync, and I also changed the file system to NTFS, but none of these methods matched the copying speed of Windows.
Copying speed depends heavily on the underlying file system. The shortest command for copying a directory is this:
cp -a ORIGIN DESTINATION
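For comparison, a rough Python equivalent of cp -a is shutil.copytree, sketched here with placeholder paths; it preserves file metadata via shutil.copy2 and keeps symlinks as links:

# Hedged sketch: roughly the Python counterpart of `cp -a ORIGIN DESTINATION`.
# DESTINATION must not already exist; paths are placeholders.
import shutil

shutil.copytree("ORIGIN", "DESTINATION", symlinks=True)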

s3cmd sync "remote copy" style operation in rsync or alternatives?

There is a very useful "remote copy" feature in s3cmd sync: any duplicate copies of a file in the source directory are not transferred more than once to the S3 bucket; a remote copy takes place instead, reducing the bandwidth used for the transfer.
I have been searching for a similar solution to do a similar file transfer between two Linux servers. I've used rsync many times in the past; it doesn't look like it has an option for this, but perhaps I have missed something.
Simple example :
/sourcedir/dir1/filea
/sourcedir/dir1/fileb
/sourcedir/dir1/filec
/sourcedir/dir2/filea
/sourcedir/dir2/filed
/sourcedir/dir2/filee
/sourcedir/dir3/filea
/sourcedir/dir3/filef
/sourcedir/dir3/fileg
With a typical transfer, filea would be transferred across the network 3 times.
I'd like to transfer this file only once and have the remote server copy the file twice to restore it to the correct directories on the other side.
I need to perform a sync on a large directory with many duplicates in the fastest time possible.
I know it would be possible to script a solution to this, but if anyone knows an application with this native functionality then that would be great!
Thanks
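As noted above, this can be scripted. One sketch of such a script (this is not native rsync functionality): hard-link duplicate files on the source side by content hash, then let rsync's -H/--hard-links option send each unique file's data only once and recreate the links on the destination. The /sourcedir path is the example path from above.

# Hedged sketch: replace duplicates under /sourcedir with hard links, then
# run something like `rsync -aH /sourcedir/ user@remote:/destdir/` so each
# unique file crosses the network a single time.
import hashlib
import os

def file_hash(path, chunk=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                return h.hexdigest()
            h.update(block)

def hardlink_duplicates(root="/sourcedir"):
    seen = {}                                # content hash -> first path seen
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = file_hash(path)
            if digest in seen:
                os.remove(path)              # drop the duplicate copy...
                os.link(seen[digest], path)  # ...and hard-link the original
            else:
                seen[digest] = path

if __name__ == "__main__":
    hardlink_duplicates()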

cp command time discrepancy

I'm not sure exactly what category to put this in.
I have tried the following with a 7.7 GB file on my CentOS 5.5 system:
time cp original copy
and
time cp copy copy2
Copying the copy takes about half the time of copying the original.
I thought maybe the OS was caching something, so I went to another directory, copied a few small files, and then went back to make the copy of the copy again, and it was still much faster.
Any ideas what's going on here? Is the OS caching the file or something?
What made me notice this is that I have some code that processes this file. I wanted to test it on two files, so I just made a copy. I then noticed that the original file takes the longest to process. What kind of diagnostics can I run on this?
The OS doesn't cache the file so much as it caches the disk blocks it read.
There are a couple of ways to try to account for caching when running timing tests. You could try to flush the OS disk buffers by allocating a huge amount of memory (I usually run something like perl -e '"\0"x1024x1024x1024' to do this); free before and after should give you an idea of how much data the OS has cached (under the buffers and cached columns).
Or, when you time your run, ignore the system time, which will be primarily I/O, and just watch the user time. Of course, different runs may very well be dealing with different amounts of data, so you would expect different amounts of I/O.
The most reliable way is to run the test several times and use the fastest time as the value to compare.
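As a quick illustration of the caching effect described above, here is a small sketch (the path is a placeholder for the 7.7 GB file) that times two consecutive reads of the same file; the second read is usually much faster because the blocks are already in the page cache:

# Hedged sketch: compare a cold read with a warm (cached) read of one file.
import time

PATH = "original"      # placeholder: the large file from the question

def read_seconds(path, chunk=8 * 1024 * 1024):
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start

print("first read :", read_seconds(PATH), "s")   # likely hits the disk
print("second read:", read_seconds(PATH), "s")   # likely served from cache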
Another option is to drop the kernel's page cache (as root) before each timed run:
sync && echo 3 > /proc/sys/vm/drop_caches
time cp original copy
sync && echo 3 > /proc/sys/vm/drop_caches
time cp copy copy2

Break a zip file into INDIVIDUAL pieces

What I am trying to do is this:
I get these zip files from clients, which are about 1.5 GB in general. They contain pictures only. I need to turn them into 100 MB files to actually upload them to my server. The problem is that if I split my 1.5 GB zip file, I need to re-attach all of the pieces before I can use any one of them.
When I break the 1.5 GB zip file into 100 MB pieces, I need each 100 MB piece to act as a separate new file so that the server will unzip it and upload the pictures into the database. I have looked into this problem, but most of the threads are about how to split a zip file. That is partially what I want to do, and I can do it now, but I also need those smaller pieces to be unzippable on their own. Is it possible to break a zip file into smaller pieces that act as new, standalone zip files?
Thanks.
I have the same question. I think unzip in the Linux shell cannot handle a zip file larger than 1 GB, and I need to unzip them unattended on a headless NAS. What I do for now is unzip everything on the desktop HD, select files until they almost reach 1 GB, archive and delete them, then select the next set of files until I reach 1 GB again.
Your question is not clear, but I will try to answer it based upon my understanding of your dilemma.
Questions
Why does the file size need to be limited?
Is it the transfer to the server that is the constraining factor?
Is the application (on the server) unable to process files over a certain size?
Can the process be altered so that image file fragments can be recombined on the server before processing?
What operating systems are in use on the client and the server?
Do you have shell access to the server?
A few options
Use ImageMagick to reduce the images so they fit within the file-size constraint.
Or split each large file into pieces; on Linux/Mac this is relatively straightforward to do:
split -b 1m my_large_image.jpg (you need the -b option for it to work on binary files)
Compress each piece into its own zip
Upload to the server
Unzip
Concatenate the fragments back into an image file:
cat xaa xab xac xad (etc) > my_large_image.jpg
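If the goal is pieces that each unzip on their own (which is what the question asks for), one scripted alternative is to repack the extracted images into several independent archives, each kept under the size limit. A minimal sketch with Python's zipfile module; the paths, prefix, and 100 MB limit are placeholder assumptions:

# Hedged sketch: repack a directory of images into standalone ~100 MB zips.
# Each output archive is a complete, independently unzippable file.
import os
import zipfile

LIMIT = 100 * 1024 * 1024   # assumed 100 MB target size per piece

def repack(src_dir, out_prefix, limit=LIMIT):
    part, size, zf = 1, 0, None
    for root, _, files in os.walk(src_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            fsize = os.path.getsize(path)
            # Start a new archive when adding this file would exceed the limit.
            # (Uses uncompressed sizes, a fair approximation for photos.)
            if zf is None or size + fsize > limit:
                if zf:
                    zf.close()
                zf = zipfile.ZipFile(f"{out_prefix}_{part:03d}.zip", "w",
                                     zipfile.ZIP_DEFLATED)
                part, size = part + 1, 0
            zf.write(path, arcname=os.path.relpath(path, src_dir))
            size += fsize
    if zf:
        zf.close()

repack("extracted_pictures", "pictures_part")   # placeholder names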
