What is the fastest method to copy a directory in Linux?

Linux offers much lower copying speed than Windows. Copying 1 GB of data in Linux takes much more time than in Windows, so please suggest how to copy a directory as efficiently as Windows does. I tried cp and rsync, and I also changed the file system to NTFS, but none of these methods matched the Windows copying speed.

Copying speed heavily depends on the underlying file system. The shortest command for copying a directory is this:
cp -a ORIGIN DESTINATION
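If plain cp seems slow on trees with many small files, one technique sometimes suggested (sketched here with the same placeholder names) is to stream the directory through a tar pipe, which reduces per-file overhead:
tar -C ORIGIN -cf - . | tar -C DESTINATION -xf -
Whether this actually beats cp -a depends on the file system and the mix of file sizes, so it is worth benchmarking on your own data.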

Related

I need to copy images from a Linux server to my Windows desktop. Should I use threads, and how many?

I need to copy between 400 and 5,000 images; the number changes on every run.
How can I calculate how many threads will give me the fastest result?
Should I open a new SSH connection for each thread?
I use paramiko to open the SSH connection and SFTP to copy the images.
Thanks.
I guess the best solution is to add the images to one archive before copying, because checking that each file was copied and creating each new file is a very expensive operation.
If you copy the archive in one thread, the transfer can be much faster because it does not wait on each individual image.
So it will be much faster to:
pack to archive
copy
unpack
You can check this even without a connection between any computers: just copy about 1 GB of small files from one hard drive to another, then pack the same files into an archive and copy it again; you will notice that the second way is much faster.
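As a rough shell sketch of that pack/copy/unpack idea (the host name and paths are placeholders, and it assumes the destination can receive files over SSH; the same steps could equally be scripted through paramiko/SFTP):
tar -czf images.tar.gz -C /path/to/images .     # pack all images into one archive
scp images.tar.gz user@desktop:/incoming/       # copy it as a single stream
# on the destination: tar -xzf images.tar.gz    # unpack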

Best way to copy a complex directory structure

As we know, we can use rsync, tar, or cp to transfer directories or files on an NFS file system or to a different server.
I am just curious which way is best for transferring a complex directory.
Or would writing a customized C program be a simple and fast way to copy or recreate the linked directory structure tree?
The best way is to use rsync. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
The tar utility is "designed to store and extract files from an archive file known as a tarfile", not to do transfers between servers. And cp is simpler than rsync: it just replaces all the files in the destination.
As for a customized C program, I think it is difficult to optimize something that has already been optimized.
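A minimal sketch of that rsync usage (host and paths are placeholders): -a preserves permissions, timestamps, and symlinks, -H additionally preserves hard links in a complex tree, and the trailing slash on the source copies its contents rather than the directory itself:
rsync -aH /source/tree/ user@server:/destination/tree/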

Checking the integrity of a copied folder

I am copying a big folder (300 GB) to an external hard drive. I want to make sure the copy is complete and not corrupted before deleting the original. How can I do that in Ubuntu?
You could use rsync --checksum to check the files, or simply use sha256sum or similar to check them manually. Using rsync is, in my opinion, more comfortable because it automatically checks recursively, but that largely depends on your use case.
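As a sketch (paths and file names are placeholders), a dry-run rsync with checksums lists any file whose content differs, and sha256sum can be used for a manual spot check:
rsync -rcn --out-format='%n' /original/folder/ /copied/folder/   # lists files whose checksums differ
sha256sum /original/folder/somefile /copied/folder/somefile      # compare the two digests by eye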
If you really require absolute integrity, you should consider using an error correction code. Hard drives don't preserve data integrity forever; a bit might change from time to time.

How to speed up reading of a fixed set of small files on linux?

I have 100,000 1 kB files, and a program that reads them; it is really slow.
My best idea for improving performance is to put them on a ramdisk.
But this is a fragile solution: every restart requires setting up the ramdisk again
(and the file copying is slow as well).
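For reference, such a ramdisk is typically a tmpfs mount (the mount point and size here are placeholders); it does indeed have to be repopulated after every reboot, which is the fragility mentioned above:
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk
cp -a /path/to/small-files/. /mnt/ramdisk/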
My second best idea is to concatenate the files and work with that. But it is not trivial.
Is there a better solution?
Note: I need to avoid dependencies in the program, even Boost.
You can optimize by storing the files contiguously on disk.
On a disk with ample free room, the easiest way would be to read a tar archive instead.
Other than that, there is (or used to be) a Debian package for 'readahead'.
You can use that tool to
profile a normal run of your software
edit the list of files accessed (as detected by readahead)
You can then call readahead with that file list (it will order the files in disk order so that throughput is maximized and seek times are minimized).
Unfortunately, it has been a while since I used these, so I hope you can google for the respective packages.
This is what I seem to have found now:
sudo apt-get install readahead-fedora
Good luck
If your files are static, I agree: just tar them up and then place the archive on a RAM disk. It will probably be faster to read directly out of the tar file, but you can test that.
Edit: instead of tar, you could also try creating a squashfs volume.
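A rough sketch of that (directory and file names are placeholders): build a compressed, read-only squashfs image with mksquashfs and loop-mount it:
mksquashfs /path/to/small-files files.sqsh          # build the read-only image
sudo mkdir -p /mnt/sqsh
sudo mount -t squashfs -o loop files.sqsh /mnt/sqsh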
If you don't want to do that, or you still need more performance, then:
put your data on an SSD.
start investigating some FS performance tests, starting with EXT4, XFS, etc.

How can I make a copy of a device/socket file

I can find the inode of a device/socket with stat, so it seems like I can somehow "copy" this file for backup. Of course the solution is "dd", but I have no idea what to do if the device is infinite (like the random one). And can I just copy the inode somehow?
These are referred to as "special files" or "special nodes". Copying their contents doesn't make sense, as the contents are generated in one way or another programmatically by the kernel as needed.
Programs like "tar" know how to copy the contents of the inode, which refers to the portion of the kernel that supports each of these different nodes. See the documentation of the "mknod" command for more details.
And if you need one-liner to copy device nodes with tar, here it is:
cd /dev && tar -cpf- sda* | tar -xf- -C /some/destination/path/
Find out the major and minor numbers of the device file you need to copy, then use mknod to create a device file with the same major and minor numbers. The major number is used to index into the kernel device switch table and call the proper kernel function (usually a device driver). The minor number is used as a parameter when calling those functions (for example, different density, disk, etc.).
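For example (the device and target path are only illustrative), ls -l shows a device node's major and minor numbers, and mknod recreates a node of the same type with the same numbers:
ls -l /dev/sda
# on a typical system: brw-rw---- 1 root disk 8, 0 ...   (block device, major 8, minor 0)
sudo mknod /backup/dev/sda b 8 0    # target directory must already exist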
There is one legitimate use case for copying (archiving) a socket.
I have a program that gathers and summarizes attribute data in a file system tree. In order to regression test, I created a directory that contains one example of every type of file the program might encounter. I run my program on this directory to test it whenever I alter the code.
It is necessary to backup this directory along with other more valuable data, and it is necessary to restore it, should the storage device fail.
tar is the program of choice, and of course tar cannot archive a socket. Doing so is senseless in most situations: any program that uses the socket will have to delete it and recreate it before use.
In the case of the test directory, there is one named socket, for it is possible that my program will encounter such things and it needs to correctly gather attributes for a complete summary.
As noted by others, that socket is not useful for anything directly. It does, however, occupy a little storage space, much as an empty file occupies storage space. That is why you can see it in the directory listing.
You can copy it successfully with the command:
cp -ar --parents <path> <backup_device_directory>
and restore it with:
cp -ar --parents <backup_device_directory>/<path> <directory>
The socket is not useful for anything except probing its attributes with a program during a regression test.
Archiving it saves the trouble of having to remember to recreate it after a restoration. The extra nuisance of archiving the sockets is easily codified in a script and forgotten. That is what we all want - easy to use solutions whose implementation you can ignore after you have solved the problem.
You can copy from a working system, as below, to some location shared between the machines, and then copy from the shared location to the other system.
Machine A
cp -rf /dev/SRC shared_directory
Machine B
cp -rf shared_directory /dev/
