How can I download all the files from a remote directory to my local directory? - linux

I want to download all the files in a specific directory of my site.
Let's say I have 3 files in my remote SFTP directory
www.site.com/files/phone/2017-09-19-20-39-15
a.txt
b.txt
c.txt
My goal is to create a local folder on my desktop containing ONLY those downloaded files, with no parent files or parent directories. I'm trying to get a clean report.
I've tried
wget -m --no-parent -l1 -nH -P ~/Desktop/phone/ www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
I got a nested directory structure mirroring the remote path.
I want to get just the three files, flat, inside ~/Desktop/phone/.
How do I tweak my wget command to get something like that?
Should I use anything else other than wget ?

Ihue,
Taking a shell-programmatic perspective, I would recommend you try the following command-line script. Note that I also added the citation so you can see the original thread.
wget -r -P ~/Desktop/phone/ -A txt www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
-r enables recursive retrieval. See Recursive Download for more information.
-P sets the directory prefix where all files and directories are saved to.
-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list. See Types of Files for more information.
Ref: @don-joey
https://askubuntu.com/questions/373047/i-used-wget-to-download-html-files-where-are-the-images-in-the-file-stored
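If you also want to avoid recreating the remote directory tree under ~/Desktop/phone/, a variant worth trying (a sketch, not tested against your server) adds -nd so every matched file lands directly in the target folder:
# -nd (--no-directories) drops the remote hierarchy; -r and -A txt still limit what is fetched
wget -r -nd -np -P ~/Desktop/phone/ -A txt --reject='index.html*' -e robots=off www.site.com/files/phone/2017-09-19-20-39-15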

Related

List all RSYNCed folders in GCP GAE Linux

I set up some folders in GAE to be synced using the command -
gsutil rsync -r gs://sample1bucket1 ./sample1;
But I have forgotten what all places I have done it. How to list all these?
As I understand your question, all your GAE folders are in the Cloud Storage bucket "sample1bucket1", and you are trying to sync them into the directory "sample1". If so, note that when writing rsync commands you have to specify both a source and a destination, so you should know where you are syncing your files to, as described in the public documentation.
However, you can list the folders in the current directory using the ls command to check for your destination folder, and then cd into it ("cd sample1" in your case) to see whether the content has been copied from your bucket into that folder.
You can also list the number of running rsync processes using :
ps -ef | grep rsync | wc -l
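Keep in mind that the grep in that pipeline can match its own process; a common workaround (just a shell trick, nothing GCP-specific) is to bracket the first letter of the pattern:
# the character class stops grep from matching its own command line
ps -ef | grep '[r]sync' | wc -l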
I am leaving some information regarding the commands, in case you need it:
You can list all objects in a bucket using:
gsutil ls -r gs://bucket
You can list a directory with detailed information using:
rsync --list-only username@servername:/directoryname
You can list a folder's contents using:
rsync --list-only username@servername:/directoryname/
You can also use the following option to parse out exactly what changed:
rsync -i
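If the goal is simply to check whether a given local folder is already in sync with the bucket, a dry run of the same rsync command is often enough (a sketch using the bucket and folder names from your question):
# -n performs a dry run: it prints what would be copied without transferring anything,
# so empty output means the folder already matches the bucket
gsutil rsync -r -n gs://sample1bucket1 ./sample1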

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to the BACK UP directory by retaining its directory structure. For example, I have a file "a.txt" at the following locations /a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt, I now need to copy this file from multiple locations to the backup directory /tmp/backup. The end result should be:
when I list /tmp/backup/a, it should contain /b/a.txt, /c/a.txt, /d/a.txt and /e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
The -I option is taking the complete line of input from echo as a single value instead of individual values (the way -n 1 would). If someone can help debug this issue, that would be very helpful, rather than providing an alternative command.
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
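If you would rather debug the original xargs pipeline, the issue is that -I {} consumes one whole input line at a time, and echo emits all four paths on a single line. Feeding one path per line fixes it (a sketch, assuming GNU cp, whose flag is spelled --parents):
# printf prints one path per line, so each -I {} substitution gets a single file
printf '%s\n' /a/*/a.txt | xargs -I {} sudo cp --parents -vp {} /tmp/backup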
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so you can skip the compression
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort, since it only appends files that have changed but keeps both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
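That last workflow might look like this (a sketch, assuming the archive was created as tarball.tgz as above):
gunzip /path/to/backup/tarball.tgz                  # leaves tarball.tar
tar uf /path/to/backup/tarball.tar /source/path/    # append files newer than the archived copies
gzip /path/to/backup/tarball.tar                    # recompress to tarball.tar.gz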

Is there a way to download files matching a pattern through SFTP in a shell script?

I'm trying to download multiple files through SFTP on a Linux server using
sftp -o IdentityFile=key <user>@<server> <<END
get -r folder
exit
END
which will download all the contents of a folder. It appears that find and grep are invalid commands, and so are for loops.
I need to download files having a name containing a string e.g.
test_0.txt
test_1.txt
but no file.txt
Do you really need the -r switch? Are there really any subdirectories in the folder? You do not mention that.
If there are no subdirectories, you can use a simple get with a file mask:
cd folder
get *test*
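In the same here-document style as your script, that would look something like this (a sketch; the key path, user and host are the placeholders from your question):
sftp -o IdentityFile=key <user>@<server> <<END
cd folder
get *test*
exit
END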
Are you required to use sftp? A tool like rsync that operates over ssh has flexible include/exclude options. For example:
rsync -a <user>@<server>:folder/ folder/ \
--include='test_*.txt' --exclude='*.txt'
This requires rsync to be installed on the remote system, but that's very common these days. If rsync isn't available, you could do something similar using tar:
ssh <user>@<server> tar -cf- folder/ | tar -xvf- --wildcards '*/test_*.txt'
This tars up all the files remotely, but then only extracts files matching your target pattern on the receiving side.

tar very large files directly to FTP, split into smaller files

I need to back up a large server into FTP storage. I can tar all the files, I can upload using FTP, and I can split the tar file into many small files.
But the problem is I can't do all three steps in one go. I can tar to FTP directly, and I can tar with split, but I can't tar to FTP and split at the same time.
The OS is CentOS 6.2
The total file size is more than 800 GB.
Thanks
To tar, split, and FTP a directory in a single command line, you need the following:
The split command normally writes its pieces straight to disk, so you can't pass each piece to another command like ftp as it is produced. To do that you need split's --filter option, which hands each output piece to a command "on the fly" without saving it to the hard disk; inside the filter, the $FILE environment variable holds the name of the current piece (the names would be x00, x01, x02, ...). The split shipped with CentOS 6.2 predates this option, so you need to patch it; newer coreutils releases include --filter out of the box.
1) Here is the split patch: http://lists.gnu.org/archive/html/coreutils/2011-01/txt3j8asgk8WH.txt
After patching the split command, you will see in its man page that the --filter option is available.
2) Install the ncftp FTP client, which lets you connect to an FTP server and upload a file in a single command, without the interactive prompts of an ordinary ftp client. ncftp is convenient to integrate with scripts.
Here is the command that compresses the /home directory with tar, splits it into 100 MB pieces, and transfers each piece through FTP:
tar cvzf - /home | split -d -b 100m --filter 'ncftpput -r 10 -F -c -u ftpUsername -p ftpPassword ftpHost $FILE'
Note that we use ncftpput, which reads each piece from stdin and uploads it as $FILE in a single command as well.
Additional ncftpput options:
-r 10: retry the connection up to 10 times after losing the connection to the FTP server.
-F: use passive mode.
-c: read the file data from standard input.
To merge the split pieces (x00, x01, x02, x03, ...) back into a single archive so you can extract it, use the following command:
cat x* > originalFile.tar.gz
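Since the archive was created with tar cvzf, you can also skip the intermediate file and extract straight from the concatenated pieces (a sketch; adjust the destination path):
# stream the pieces back through tar; -C chooses where to restore
cat x* | tar xvzf - -C /restore/path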
Alternatively, you can make a shell script that first tars and splits locally:
tar zcf - /usr/folder | split -b 30720m - /usr/archive.tgz
and then uploads the resulting pieces to FTP, because once you are streaming tar straight onto FTP there is no way to split it.
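A minimal two-step script along those lines might look like this (a sketch; the FTP host, credentials, paths, and piece size are placeholders, and it uses curl's -T upload instead of ncftpput):
#!/bin/sh
# 1) tar and split locally into pieces named archive.tgz.aa, archive.tgz.ab, ...
tar zcf - /usr/folder | split -b 100m - /usr/archive.tgz.

# 2) upload each piece to the FTP server
for piece in /usr/archive.tgz.*; do
    curl -T "$piece" "ftp://ftpHost/backup/" --user ftpUsername:ftpPassword
done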

How do I mirror a directory with wget without creating parent directories?

I want to mirror a folder via FTP, like this:
wget --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
But I do not want to create a directory structure like this:
ftp.site.com -> folder -> subfolder -> evendeeper
I just want:
evendeeper
And anything below it to be the resulting structure. It would also be acceptable for the contents of evendeeper to wind up in the current directory as long as subdirectories are created for subdirectories of evendeeper on the server.
I am aware of the -np option; according to the documentation it just keeps wget from following links to parent pages (a non-issue for the binary files I'm mirroring via FTP). I am also aware of the -nd option, but this prevents creating any directory structure at all, even for subdirectories of evendeeper.
I would consider alternatives as long as they are command-line-based, readily available as Ubuntu packages and easily automated like wget.
For a path like: ftp.site.com/a/b/c/d
-nH would download all files to the directory a/b/c/d in the current directory, and -nH --cut-dirs=3 would download all files to the directory d in the current directory.
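Applied to the original example, where the remote path ftp://ftp.site.com/folder/subfolder/evendeeper has two leading directories to drop, that would be something like (a sketch, untested against that server):
# -nH drops the ftp.site.com host directory, --cut-dirs=2 drops folder/subfolder,
# so only evendeeper/ (and its subdirectories) is created locally
wget --mirror -nH --cut-dirs=2 --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper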
I had a similar requirement and the following combination seems to be the perfect choice:
In the example below, all the files in http://url/dir1/dir2 (and only that directory) are downloaded to the local directory /dest/dir:
wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2
Thanks @ffledgling for the hint on "-nd".
For the above example:
wget -nd -np --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
Snippets from manual:
-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
The -np (no parent) option will probably do what you want, tied in with -l 1 (I think, don't have a wget install before me), which limits the recursion to one level.
EDIT. ok. gah... maybe I should wait until I've had coffee. There is a --cut-dirs option, which allows you to "cut" a specified number of directories from the output path, so for /a/b/c/d, a cut of 2 would force wget to create c/d on your local machine.
Instead of using:
-nH --cut-dirs=1
use:
-nH --cut-dirs=100
This cuts every directory level, so no folders will be created at all.
Note: 100 = the number of directories to skip creating.
You can change 100 to any number at least as deep as the remote path.
