Is there any way to download data from a disk station directly to a server? - linux

I have a disk station with some large files (a site with an interactive file browser where I can download files and archives). I also have a server with SSH access. Can I download a file directly to the server, without first downloading it to my local machine and then copying it to the server with scp?
Some people say to use
wget http://link-to-the-file
But I am not sure that there is a direct download link. Moreover, you have to specify the language of the archive before downloading (a confirmation step).
Note: I tried to use wget, but I can't tell whether there is a direct download link at all.
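Along the lines of the wget suggestion above, a minimal sketch of how this could look (the host names, the URL and the cookie value are placeholders, and whether a direct link exists depends on the disk station's web interface):
# Log in to the server first, then run the download there
ssh user@my-server
# If the disk station exposes a direct link (for example, copied from the
# browser's network tab after confirming the archive options), wget on the
# server can fetch it straight to the server's disk:
wget 'http://diskstation.example/share/archive.zip' -O archive.zip
# If the confirmation step sets a session cookie, that cookie can often be
# passed along so the download still works outside the browser:
curl -b 'sessionid=PLACEHOLDER' -o archive.zip 'http://diskstation.example/share/archive.zip'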

Related

Downloading PDF files from a remote website

How can I download all the PDF files present on a website? I don't want to use the wget command here, as that is manual and very time-consuming.

How to copy between a host machine and a Windows server client using Ansible faster?

My aim is to copy from machine A (Ubuntu) to a remote server B (Windows Server 2012) using the Ansible copy module. I can ping the Windows server and can even copy a small folder from Ubuntu to the server, but when the folder gets big it takes very long to copy and sometimes does not get copied at all. I am using the following:
- name: copy file
  win_copy:
    src: '/service/test.zip'
    dest: 'D:/test/test.zip'
The test.zip file is around 300 MB, so win_copy is not solving my purpose. Could you suggest a good option in this case?
I've had this problem and just wrote a PowerShell script to download the file directly to a known location on the target. Write a PowerShell script that downloads the file, deploy it to the target using win_copy or win_template (if you need to do substitutions), and then call it using win_command.

SmartDL for ftp

I need Python code that can download files from an FTP server, with a built-in multi-part download manager so I can retrieve files faster. I tried SmartDL, but the problem is that I don't know how to retrieve files from an FTP server with it. I also used add_basic_authentication to make sure I am passing the right credentials. Please help me with a solution.
I have no problem using any other solution/package that supports multi-part downloads.
P.S.: I need to save the downloaded files to object storage in the cloud. Each file may be around 300 MB, and I need to download 20 TB of data.
Thanks in anticipation.
Take a look at ftplib; it's a simple FTP library that will let you download files from an FTP server.
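If a ready-made command-line tool is acceptable as an alternative to a Python package, lftp's pget command can do a segmented (multi-part) FTP download; a rough sketch, with the host, credentials and paths as placeholders:
# Segmented (multi-part) download of one file over FTP with lftp's pget;
# host, user, password and paths below are placeholders
lftp -u username,password -e 'pget -n 8 -c /remote/path/bigfile.bin -o /tmp/bigfile.bin; quit' ftp.example.com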

linux (CLI) download files via a shared Dropbox (folder) link without an account

I was thinking of using Dropbox to upload the source code of a web application. For this folder I would create a shared link, which I would like to use to download all the latest source files onto my test server (instead of using SFTP/FTP).
Now, I know you can use Dropbox on Linux by installing their client, but that requires creating an account. I don't want to use an account, and I certainly don't want to use my own account.
Is there any way to use a shared (folder) link and download all the files in that folder from the command line, without an account (maybe something like wget)? There is no need for live syncing; it would be fine to trigger the download with some bash script.
Thanks.
If you're OK with your links being public (which I think is not a good idea), then you can just create a file with a list of links to your files and write a bash script that loops over each line of the file and fetches the link with wget, as sketched below.
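A minimal sketch of that loop, assuming the shared links are collected one per line in a file called links.txt (a placeholder name):
#!/bin/bash
# Fetch every shared link listed in links.txt with wget.
while read -r link; do
    # For Dropbox shared links, dl=0 returns a preview page;
    # switching it to dl=1 requests the file itself.
    direct=${link/dl=0/dl=1}
    wget --content-disposition "$direct"
done < links.txt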
If you want to use authentication, you'll have to register for a Dropbox API key and then write a script (in Python, Ruby, Java, etc.) to authenticate and fetch the files.
If you don't have a specific need for Dropbox, I'd recommend you use git (or similar). With git you just create the repository on your server and clone it on your desktop. Then you can edit your files and push them to the server... it's so much easier.
Rogier, GitHub has become the norm for hosting code. There are other options (SourceForge, Google Code, Beanstalk), or you can set up a private git repository on your own machine.
Somewhere deep in my browser history there's an article about how to do that. However, a little googling turned up http://news.ycombinator.com/item?id=1652414. Let me know if you can't find satisfactory instructions on your own for setting up a git repo on your computer.
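A rough sketch of that kind of private setup, assuming SSH access to the server and git installed on both machines (user names, host names and paths are placeholders):
# On the server: create a bare repository to push to
ssh user@yourserver 'git init --bare ~/repos/webapp.git'
# On your desktop: turn the source folder into a repository and push it
cd ~/projects/webapp
git init
git add .
git commit -m "Initial import"
git remote add origin user@yourserver:repos/webapp.git
git push -u origin master
# On the test server: fetch the latest sources with clone (and later, git pull)
git clone user@yourserver:repos/webapp.git /var/www/webapp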

What's the best way to save a complete webpage on a linux server?

I need to archive complete pages, including any linked images etc., on my Linux server. I'm looking for the best solution. Is there a way to save all assets and then relink them all to work from the same directory?
I've thought about using curl, but I'm unsure of how to do all of this. Also, will I maybe need PHP-DOM?
Is there a way to use Firefox on the server and copy the temp files after the address has been loaded, or something similar?
Any and all input welcome.
Edit:
It seems as though wget is not going to work, as the files need to be rendered. I have Firefox installed on the server; is there a way to load the URL in Firefox, grab the temp files, and then clear the temp files afterwards?
wget can do that, for example:
wget -r http://example.com/
This will mirror the whole example.com site.
Some interesting options are:
-Dexample.com: do not follow links to other domains
--html-extension: renames pages with text/html content-type to .html
Manual: http://www.gnu.org/software/wget/manual/
Use the following command:
wget -E -k -p http://yoursite.com
Use -E to adjust extensions, -k to convert links so the page loads from your local copy, and -p to download all objects embedded in the page.
Please note that this command does not download other pages hyperlinked from the specified page; it only downloads the objects required to load the specified page properly.
If all the content in the web page were static, you could get around this issue with something like wget:
$ wget -r -l 10 -p http://my.web.page.com/
or some variation thereof.
Since you also have dynamic pages, you cannot in general archive such a web page using wget or any simple HTTP client. A proper archive needs to incorporate the contents of the backend database and any server-side scripts. That means that the only way to do this properly is to copy the backing server-side files. That includes at least the HTTP server document root and any database files.
EDIT:
As a work-around, you could modify your webpage so that a suitably privileged user could download all the server-side files, as well as a text-mode dump of the backing database (e.g. an SQL dump). You should take extreme care to avoid opening any security holes through this archiving system.
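A minimal sketch of that work-around, assuming a MySQL backend and an Apache-style document root; every path, database name and credential below is a placeholder:
#!/bin/sh
# Archive the document root together with a text-mode dump of the database.
BACKUP_DIR=/var/backups/site
DATE=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# Plain-SQL dump of the backing database
mysqldump -u backupuser -p'secret' mysite > "$BACKUP_DIR/mysite-$DATE.sql"
# Pack the server-side files (document root) and the dump into one archive
tar czf "$BACKUP_DIR/site-$DATE.tar.gz" /var/www/html "$BACKUP_DIR/mysite-$DATE.sql"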
If you are using a virtual hosting provider, most of them provide some kind of web interface that allows backing up the whole site. If you use an actual server, there are a large number of back-up solutions you could install, including a few web-based ones for hosted sites.
I tried a couple of tools, curl and wget included, but nothing worked up to my expectations.
Finally I found a tool that saves a complete webpage (images, scripts, linked pages... everything included). It's written in Rust and is named monolith. Take a look.
It does not save images and other scripts/stylesheets as separate files, but packs them into a single HTML file.
For example, if I had to save https://nodejs.org/en/docs/es6 to es6.html with all page requisites packed into one file, I would run:
monolith https://nodejs.org/en/docs/es6 -o es6.html
wget -r http://yoursite.com
That should be sufficient and will grab images/media. There are plenty of options you can feed it.
Note: I believe that neither wget nor any other program supports downloading images specified through CSS, so you may need to do that yourself manually.
Here are some possibly useful arguments: http://www.linuxjournal.com/content/downloading-entire-web-site-wget
