wget recursive folder structure with multiple index.html files in tree - linux

By default wget -r downloads directories as directoryname.html. I'd like it to download to directoryname/index.html.
So instead of:
index.html
contact.html
support.html
I'd like:
index.html
contact/index.html
support/index.html
Is this possible with wget?

When I want to mirror a web site I use:
$ wget -m -E -nH -np --cut-dirs=2 http://site/a/b/
This way everything under the directory "b" will be downloaded. If your target directory is at a different level, you need to adjust --cut-dirs accordingly.
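For example (a hypothetical layout), if the files live one level deeper, under http://site/a/b/c/, you would cut three leading components so they land directly in the output directory:
$ wget -m -E -nH -np --cut-dirs=3 http://site/a/b/c/
With this, http://site/a/b/c/page.html is saved as ./page.html rather than a/b/c/page.html.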

Related

How can I download all the files from a remote directory to my local directory?

I want to download all the files in a specific directory of my site.
Let's say I have 3 files in my remote SFTP directory:
www.site.com/files/phone/2017-09-19-20-39-15
a.txt
b.txt
c.txt
My goal is to create a local folder on my desktop with ONLY those downloaded files. No parent files or parent directories are needed; I am just trying to get a clean report.
I've tried
wget -m --no-parent -l1 -nH -P ~/Desktop/phone/ www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
I got the whole remote path (files/phone/2017-09-19-20-39-15) recreated as nested folders under ~/Desktop/phone/.
I want to get just a.txt, b.txt and c.txt directly inside ~/Desktop/phone/.
How do I tweak my wget command to get something like that?
Should I use anything other than wget?
Ihue,
Taking a shell-programmatic perspective, I would recommend you try the following command-line script. Note I also added the citation so you can see the original thread.
wget -r -P ~/Desktop/phone/ -A txt www.site.com/files/phone/2017-09-19-20-39-15 --reject=index.html* -e robots=off
-r enables recursive retrieval. See Recursive Download for more information.
-P sets the directory prefix where all files and directories are saved to.
-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma-separated list. See Types of Files for more information.
Ref: #don-joey
https://askubuntu.com/questions/373047/i-used-wget-to-download-html-files-where-are-the-images-in-the-file-stored
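As a side note, -A also accepts a comma-separated list, so a hypothetical variant of the command above that keeps both text and CSV files would be:
wget -r -P ~/Desktop/phone/ -A 'txt,csv' www.site.com/files/phone/2017-09-19-20-39-15 --reject='index.html*' -e robots=off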

Cron command in server to Zip a folder and exclude other?

I created a cron task in the hosting of my website. I use this command:
zip -r public_html.zip public_html -x *public_html/cache/smarty*
As you can see, I'm trying to zip public_html while excluding the folder public_html/cache/smarty.
The zip is created, but I cannot get it to exclude the folder.
What am I missing here?
try this:
zip -r public_html.zip public_html -x '*cache/smarty/*'
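The quotes keep the shell from expanding the * itself before zip sees the pattern. You can then confirm the folder really was excluded by listing the archive (this prints nothing if the exclusion worked):
unzip -l public_html.zip | grep 'cache/smarty'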

How do I mirror a directory with wget without creating parent directories?

I want to mirror a folder via FTP, like this:
wget --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
But I do not want to create a directory structure like this:
ftp.site.com -> folder -> subfolder -> evendeeper
I just want:
evendeeper
And anything below it to be the resulting structure. It would also be acceptable for the contents of evendeeper to wind up in the current directory as long as subdirectories are created for subdirectories of evendeeper on the server.
I am aware of the -np option; according to the documentation, that just keeps it from following links to parent pages (a non-issue for the binary files I'm mirroring via FTP). I am also aware of the -nd option, but this prevents creating any directory structure at all, even for subdirectories of evendeeper.
I would consider alternatives as long as they are command-line-based, readily available as Ubuntu packages and easily automated like wget.
For a path like: ftp.site.com/a/b/c/d
-nH would download all files to the directory a/b/c/d in the current directory, and -nH --cut-dirs=3 would download all files to the directory d in the current directory.
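A sketch of that second variant for the hypothetical path above (credentials as in the question):
wget -m -np -nH --cut-dirs=3 --user=x --password=x ftp://ftp.site.com/a/b/c/d/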
I had a similar requirement and the following combination seems to be the perfect choice:
In the example below, all the files under http://url/dir1/dir2 (and nothing above it) are downloaded to the local directory /dest/dir:
wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2
Thanks #ffledgling for the hint on "-nd"
For the above example:
wget -nd -np --mirror --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper
(Note that -nd flattens everything, so subdirectories of evendeeper will not be recreated locally.)
Snippets from manual:
-nd
--no-directories
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions .n).
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
-np (no parent) option will probably do what you want, tied in with -l 1 (I think, don't have a wget install before me), which limits the recursion to one level.
EDIT: OK, gah... maybe I should wait until I've had coffee. There is a --cut-dirs option, which allows you to "cut" a specified number of directories from the output path, so for /a/b/c/d, a cut of 2 would force wget to create c/d on your local machine.
Instead of using:
-nH --cut-dirs=1
use:
-nH --cut-dirs=100
This will cut more directories and no folders will be created.
Note: 100 = the number of folders to skip creating.
You can change 100 to any number.
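For the FTP mirror in the question, a sketch of this trick would be (it behaves much like -nd for any tree shallower than 100 levels):
wget -m -np -nH --cut-dirs=100 --user=x --password=x ftp://ftp.site.com/folder/subfolder/evendeeper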

Can wget be used to get all the files on a server?

Can wget be used to get all the files on a server? Suppose this is the directory structure of my site foo.com, built with the Django framework:
/web/project1
/web/project2
/web/project3
/web/project4
/web/templates
Without knowing the names of the directories (/project1, /project2, ...), is it possible to download all the files?
You could use
wget -r -np http://www.foo.com/pool/main/z/
-r (fetch files/folders recursively)
-np (do not ascend to the parent directory when retrieving recursively)
or
wget -nH --cut-dirs=2 -r -np http://www.foo.com/pool/main/z/
--cut-dirs (makes Wget not "see" a given number of leading remote directory components)
-nH (invoking Wget with -r http://fly.srk.fer.hr/ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.)
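To make the difference concrete with a hypothetical file, http://www.foo.com/pool/main/z/zsh/file.txt would be saved locally as:
www.foo.com/pool/main/z/zsh/file.txt (first command)
z/zsh/file.txt (second command: -nH drops the host, --cut-dirs=2 drops pool/main)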
First of all, wget can only be used to retrieve files served by the web server. It's not clear from your question whether you mean actual files or web pages. I would guess from the way you phrased it that your intent is to download the server files, not the pages served by Django. If that is correct, then no, wget won't work; you need something like rsync or scp instead.
If you do mean using wget to retrieve all of the generated pages from Django, then this will only work if links point to those directories. So, you need a page that has code like:
<ul>
<li><a href="/project1/">Project1</a></li>
<li><a href="/project2/">Project2</a></li>
<li><a href="/project3/">Project3</a></li>
<li><a href="/project4/">Project4</a></li>
<li><a href="/templates/">Templates</a></li>
</ul>
wget is not a psychic; it can only pull in pages it knows about.
Try recursive retrieval: the -r option.

Command to zip a directory using a specific directory as the root

I'm writing a PHP script that downloads a series of generated files (using wget) into a directory, and then zips them up, using the zip command.
The downloads work perfectly, and the zipping mostly works. I run the command:
zip -r /var/www/oraviewer/rgn_download/download/fcst_20100318_0319.zip /var/www/oraviewer/rgn_download/download/fcst_20100318_0319
which yields a zip file with all the downloaded files, but it contains the full /var/www/oraviewer/rgn_download/download/ directory hierarchy before reaching the fcst_20100318_0319/ directory.
I'm probably just missing a flag, or something small, from the zip command, but how do I get it to use fcst_20100318_0319/ as the root directory?
I don't think zip has a flag to do that. I think the only way is something like:
cd /var/www/oraviewer/rgn_download/download/ && \
zip -r fcst_20100318_0319.zip fcst_20100318_0319
(The backslash is just for clarity, you can remove it and put everything on one line.)
Since PHP is executing the command in a subshell, it won't change your current directory.
I also got it to work by using this command:
exec('cd '.$_SERVER['DOCUMENT_ROOT'].' && zip -r com.zip "./"');
cd /home/public_html/site/upload/ && zip -r sub_upload.zip sub_upload/
Use the -j or --junk-paths option in your zip command.
From the zip man page:
-j
--junk-paths
Store just the name of a saved file (junk the path), and do not store directory names. By default, zip will store the full path (relative to the current directory).
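For the directory in the question, a hedged sketch would be:
zip -r -j /var/www/oraviewer/rgn_download/download/fcst_20100318_0319.zip /var/www/oraviewer/rgn_download/download/fcst_20100318_0319
Be aware that -j flattens the archive completely: files from different subdirectories all land at the top of the zip, and identically named files will clash, so the cd && zip approach above is safer when the directory tree matters.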
