tar a folder into multiple files over SSH - linux

Here is the thing:
I have a server with 85 GB of total disk space, and right now I have a ~50 GB folder containing over 60,000 files.
Now I want to download these files to my localhost, and in order to do that I need to tar the folder, but I can't tar the whole folder because of the disk space limitation.
So I'm looking for a way to archive the folder into two 25 GB tar files, like part1.tar and part2.tar, but when the first part is done it should pause and ask for something (the next part's name, permission, anything) so I can transfer the first part to another server and then continue archiving into part2. Or a way to tar half of the folder, say the first 30,000 files, and then tar the rest.
Any idea? Thanks in advance.

One of the earliest applications of rsync was to implement mirroring or backup for multiple Unix clients to a central Unix server using rsync/ssh and standard Unix accounts.
I use rsync to move compressed (and uncompressed) files between servers.
I think the command should be something like this
rsync -av host::src /dest

The rsync solution was good enough, but I found the solution to the main question:
tar -c -M --tape-length=30000000 --file=filename.tar foldername
After reaching ~29 GB you will need to change the tape (in my case: transfer the first part and remove it) and hit Enter to continue. Additionally, it is possible to give the next part a name:
Prepare volume #2 for `filename.tar' and hit return:
n filename2.tar
Because it is going to take time, I suggest using a screen session over SSH:
http://thelinuxnoob.com/linux/screen-in-ssh/
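If you'd rather not babysit the prompt, GNU tar can also run a script at each volume change via -F/--info-script; tar exports TAR_ARCHIVE, TAR_VOLUME and TAR_FD to the script, and writing a name to descriptor $TAR_FD sets the next volume's file name. A minimal sketch (the remote host and paths are made up, not from the post):

```shell
#!/bin/bash
# new-volume.sh -- sketch of a GNU tar volume-change script (-F/--info-script).
# Uploads the just-finished volume, deletes it locally to free disk space,
# and tells tar what to call the next volume.

next_volume_name() {            # filename.tar, 2 -> filename2.tar
  echo "${1%.tar}${2}.tar"
}

prev_volume_name() {            # the volume tar has just finished writing
  if [ "$2" -eq 2 ]; then echo "$1"            # the first volume is unnumbered
  else echo "${1%.tar}$(( $2 - 1 )).tar"; fi
}

if [ -n "$TAR_FD" ]; then       # only when actually invoked by tar
  prev=$(prev_volume_name "$TAR_ARCHIVE" "$TAR_VOLUME")
  scp "$prev" user@otherhost:/backup/ && rm -f "$prev"
  next_volume_name "$TAR_ARCHIVE" "$TAR_VOLUME" >&"$TAR_FD"
fi
```

Invoked as tar -c -M --tape-length=30000000 -F ./new-volume.sh -f filename.tar foldername, this transfers each finished part and names the next one without anyone sitting at the prompt.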

Related

Why can the rm command in Linux delete a file/dir in seconds while deleting over FTP is really slow

Recently I created a dir containing a lot of files and subdirs by mistake. Then I tried to delete the dir through my FTP software (FileZilla), but it's really slow: it takes 2-3 seconds to delete each file.
So I stopped it, tried it over SSH with the rm -rf command instead, and the target directory was deleted in just a second.
My question is: why is it so slow over FTP while fast over SSH?
Many thanks!
To delete a directory tree, you have to iterate over it, retrieve the lists of all files and subdirectories, and delete them one by one.
When you use the remote rm -rf command, it has direct access to the file system, so it is relatively quick.
The FTP client, by contrast, has to retrieve the file lists (which involves a couple of FTP command exchanges, opening a data channel, transferring the listing, etc.) and then delete the files one by one. Each delete involves sending an FTP command and waiting for the response. So it takes a long time.
There's no "delete whole tree" command in the FTP protocol that would be the equivalent of the rm -rf command executed in the remote shell.

Back up files into folders of a certain size

I want to back up my NAS onto multiple DVDs. What I had in mind was a script that does the following:
-Create a folder for each DVD
-Copy the files and file structure into the DVD folders
-Stop / go to the next DVD folder when the current DVD folder is full
i.e. the trigger is 4 GB (which makes the arithmetic easy for the example)
I have a data source with 10 GB of data, so this will be 3 DVDs. So the script first creates three folders: DVD-1, DVD-2 and DVD-3. Next, the copy starts by copying 4 GB into the DVD-1 folder. After that, the remaining files must go into DVD-2 and DVD-3.
As far as I know, neither rsync nor cp handles this kind of size calculation. I know it would be an option to do this with archives like zip, tar or gz, but at first I want to try it with unpacked files.
Is all of the above possible with standard Linux bash commands, or is it insane?
No, there isn't any standard tool that does this out of the box. But it's pretty simple to code up, and there are a few projects that do it:
https://unix.stackexchange.com/questions/18628/generating-sets-of-files-that-fit-on-a-given-media-size-for-tar-t
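For a rough idea of how simple the coding-up can be, here is a first-fit sketch in bash (the function name and paths are made up; oversized single files and exact DVD capacity are not handled):

```shell
#!/bin/bash
# Sketch: distribute files into DVD-N folders of at most LIMIT bytes each,
# in the order find produces them, keeping the directory structure.

fill_dvd_folders() {
  local src=$1 dest=$2 limit=$3
  local dvd=1 used=0 f size
  mkdir -p "$dest/DVD-$dvd"
  while IFS= read -r -d '' f; do
    size=$(stat -c %s "$f")
    if (( used + size > limit )); then          # current DVD folder is full
      dvd=$((dvd + 1)); used=0
      mkdir -p "$dest/DVD-$dvd"
    fi
    cp --parents "$f" "$dest/DVD-$dvd/"         # keep the file structure
    used=$((used + size))
  done < <(find "$src" -type f -print0)
}

# e.g. 4 GB per DVD:
# fill_dvd_folders /mnt/nas/data /mnt/backup $((4 * 1000 * 1000 * 1000))
```

This only approximates the split (a file that doesn't fit starts a new folder rather than filling the gap), but for a DVD backup that is usually good enough.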

Debian: Cron bash script every 5 minutes and lftp

We have to run a script every 5 minutes to download data from an FTP server. We have the FTP script working, but now we want to download the data automatically every 5 minutes.
We can use: "*/5 * * * * /home/kbroeren/import.ch"
where import.ch is the FTP script that downloads the data files.
The point is, the data files become available on the FTP server every 5 minutes, but sometimes with a minute of offset. It would be nice to download each file a couple of seconds after it becomes available on the FTP server. Maybe a function that checks whether the file is already available in the FTP folder and downloads it if so; if not, the script retries after about 10 seconds.
One other point to fix is the running time of the FTP script. There are 12k files in the folder, and we only need the newest one each time the script runs. Right now scanning the folder takes about 3 minutes, which is way too long. The filenames of all the data files contain the date and time; is there a way to build a dynamic filename so we download the right file every 5 minutes?
Lots of questions; I hope someone can help me out with this!
Thank you
Kevin Broeren
Our FTP script:
#!/bin/bash
HOST='ftp.mysite.com'
USER='****'
PASS='****'
SOURCEFOLDER='/'
TARGETFOLDER='/home/kbroeren/datafiles'
lftp -f "
open $HOST
user $USER $PASS
lcd $TARGETFOLDER
mirror --newer-than=now-1day --use-cache $SOURCEFOLDER $TARGETFOLDER
bye
"
find /home/kbroeren/datafiles -type f -mtime +7 -exec rm {} \;
Perhaps you might want to try curlftpfs. Using this FUSE filesystem you can mount an FTP share into your local filesystem. If you do so, you won't have to download the files from the FTP server, and you can iterate over them as if they were local. Give it a try with these steps:
# Install curlftpfs
apt-get install curlftpfs
# Make sure FUSE kernel module is loaded
modprobe fuse
# Mount the FTP Directory to your datafiles directory
curlftpfs USER:PASS@ftp.mysite.com /home/kbroeren/datafiles -o allow_other,disable_eprt
You are now able to process these files as you wish. You'll always have the most recent files in this directory. But be aware of the fact, that this is not a copy of the files. You are working directly on the FTP server. For example removing a file from /home/kbroeren/datafiles will remove it from the FTP server.
If this works for you, you might want to write this information into /etc/fstab, to make sure the directory is mounted with each start of the machine:
curlftpfs#USER:PASS@ftp.mysite.com /home/kbroeren/datafiles fuse auto,user,uid=USERID,allow_other,_netdev 0 0
Make sure to change USERID to match the UID of the user who needs access to these files.
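If mounting via curlftpfs is not an option, the dynamic-filename-plus-retry idea from the question can be sketched directly in bash. The filename pattern "data_YYYYMMDD_HHMM.csv" is an assumption; adjust it to the real naming scheme:

```shell
#!/bin/bash
# Sketch: instead of mirroring all 12k files, compute the expected filename
# for the current 5-minute slot and poll the FTP server for just that file.

HOST='ftp.mysite.com'; USER='user'; PASS='pass'
TARGET='/home/kbroeren/datafiles'

five_minute_slot() {            # round HH MM down to a 5-minute boundary
  local h=$((10#$1)) m=$((10#$2))
  printf '%02d%02d' "$h" $(( m - m % 5 ))
}

fetch_with_retry() {            # try to download $1, up to $2 attempts, 10 s apart
  local file=$1 tries=$2 i
  for (( i = 0; i < tries; i++ )); do
    lftp -u "$USER,$PASS" -e "get -c /$file -o $TARGET/$file; bye" "$HOST" && return 0
    sleep 10                    # not there yet; wait and try again
  done
  return 1
}

# From cron every 5 minutes (*/5 * * * *), something like:
# fetch_with_retry "data_$(date +%Y%m%d)_$(five_minute_slot "$(date +%H)" "$(date +%M)").csv" 6
```

Fetching one known filename avoids listing the whole 12k-file directory, which is what makes the current script slow.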

rsync: copy files if the local file doesn't exist; don't check filesize, time, checksum, etc.

I am using rsync to back up a million images from my Linux server to my computer (Windows 7 using Cygwin).
The command I am using now is:
rsync -rt --quiet --rsh='ssh -p2200' root@X.X.X.X:/home/XXX/public_html/XXX /cygdrive/images
Whenever the process is interrupted and I start it again, it takes a long time for the copying to begin.
I think it is checking each file for updates.
The images on my server won't change once they are created.
So, is there any faster way to run the command so that it copies files only if the local file doesn't exist, without checking filesize, time, checksum, etc.?
Please suggest.
Thank you
Did you try this flag? It might help, but it might still take some time to resume the transfer:
--ignore-existing
This tells rsync to skip updating files that already exist on the destination (this does not ignore existing directories, or nothing would get done). See also --existing.
This option is a transfer rule, not an exclude, so it doesn't affect the data that goes into the file-lists, and thus it doesn't affect deletions. It just limits the files that the receiver requests to be transferred.
This option can be useful for those doing backups using the --link-dest option when they need to continue a backup run that got interrupted. Since a --link-dest run is copied into a new directory hierarchy (when it is used properly), using --ignore-existing will ensure that the already-handled files don't get tweaked (which avoids a change in permissions on the hard-linked files). This does mean that this option is only looking at the existing files in the destination hierarchy itself.

Recover files deleted with rsync -avz --delete

Is it possible to recover files deleted with rsync -avz --delete?
If it is, what are some suggested tools to do so?
I am assuming you ran rsync on some Unix system.
If you don't have a backup of your file system,
then it's a long, tedious process to recover deleted files from a Unix file system.
High-level steps:
find the partition where your file resided
create an image of the entire partition: % dd if=/dev/<partition> of=partition.img
(this assumes you have enough space to store the image locally on a different partition, or you can copy it over to a different system: % dd if=/dev/<partition> | ssh otherhost "dd of=partition.img")
open the img file in a hex editor
(this assumes you know the contents of the files that you've lost and can identify them when you see them)
note the byte offset and length of your file
use grep -b to locate the contents of your missing file and extract them
enjoy!
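The locate-and-extract step above can be sketched like this (the function name, search string and sizes are placeholders for illustration):

```shell
# Find the byte offset of a string known to be in the lost file,
# then carve a window out of the partition image with dd.
carve_at_match() {   # $1=image  $2=known string  $3=bytes to carve  $4=output
  local offset
  offset=$(grep -abo -- "$2" "$1" | head -n1 | cut -d: -f1)
  [ -n "$offset" ] || return 1                 # string not found in the image
  dd if="$1" of="$4" bs=1 skip="$offset" count="$3" 2>/dev/null
}

# e.g.: carve_at_match partition.img 'unique string from the lost file' \
#           $((1024 * 1024)) recovered.bin
```

grep's -a treats the binary image as text, -b prefixes each match with its byte offset, and -o keeps the output to just the match, so the offset can be fed straight to dd.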
I wasn't able to get extundelete to work, so I ended up using photorec + find/grep in order to recover my important files.