How to save web page with its objects using wget? - linux

A web page shown in a browser consists of an HTML document plus objects such as CSS, JavaScript, and images. I want to save all of them to my hard disk with the wget command so I can load the page later from my local computer. Is this possible?
Note: I want a single page, not every page of a website or anything similar.

Use the following command:
wget -E -k -p http://example.com
Details of the switches:
-E
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading CGI-generated materials. A URL like http://example.com/article.cgi?25 will be saved as article.cgi?25.html.
-k
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
-p
This option causes Wget to download all the files that are necessary
to properly display a given HTML page. This includes such things as
inlined images, sounds, and referenced stylesheets.
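The renaming rule described for -E can be illustrated with a short Python sketch. This is not wget's own code: the helper name local_filename and its content_type parameter are hypothetical, and the function only mirrors the rule as quoted above.

```python
import re

# Pattern quoted in the -E description: names ending in .htm or .html, any case.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def local_filename(remote_name: str, content_type: str) -> str:
    """Hypothetical helper: apply the -E rule to a remote file name."""
    if content_type in HTML_TYPES and not HTML_SUFFIX.search(remote_name):
        # HTML content without an HTML-looking suffix gets .html appended.
        return remote_name + ".html"
    return remote_name


print(local_filename("article.cgi?25", "text/html"))
print(local_filename("index.html", "text/html"))
print(local_filename("logo.png", "image/png"))
```

Only the first name changes (to article.cgi?25.html); the other two are left as they are because one already ends in .html and the other is not an HTML document.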

Related

How to force audio mp3 download on mobile using PHP

I'm trying to find a solution for forcing audio files to download on mobile using PHP.
I tried adding the MIME type 'AddType audio/mpeg .mp3' to my .htaccess file, to no effect. Direct downloads work fine on desktop but not on mobile: clicking the link always redirects to the default WordPress player instead of just downloading the file.
I've searched for various solutions, and the closest I got is "Forcing to download a file using PHP".
My files are hosted outside of my domain. This is the PHP file I'm using, but it's not working.
$file_name = 'file.mp3';
$file_url = 'https://mcdn.podbean.com/mf/web/' . $file_name;
header('Content-Type: audio/mpeg');
header("Content-Transfer-Encoding: Binary");
header('Content-Disposition: attachment; filename="my-file.mp3"');
readfile($file_url);
exit;
These files are hosted with another provider on another domain. Nothing you can do will force the direct download unless they were hosted on a platform like AWS where you can create your own bucket and link it to your domain. That's costly.
Podbean suggested I change "web" in the podcast URL to "download". I used the Find and Replace All WordPress plugin and it automatically changed them all. Just make sure your search term includes whatever precedes the word you want to change, so you don't replace a common word like "web" elsewhere in your database.
Always back up your database before making mass changes.
With over 800 episodes it worked perfectly after clearing the cache.
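The search-and-replace advice above can be sketched in Python. The function name rewrite_links is hypothetical, and the "download" form of the URL is an assumption derived from Podbean's suggestion combined with the URL prefix shown in the question. The point is that the search term carries the preceding path, so a bare "web" elsewhere is never touched.

```python
# Search term includes the path in front of "web" so unrelated text is safe.
OLD_PREFIX = "https://mcdn.podbean.com/mf/web/"
# Assumed replacement, based on the "web" -> "download" suggestion.
NEW_PREFIX = "https://mcdn.podbean.com/mf/download/"


def rewrite_links(text: str) -> str:
    """Hypothetical helper: swap the streaming prefix for the download prefix."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


sample = "Listen on the web: https://mcdn.podbean.com/mf/web/file.mp3"
print(rewrite_links(sample))
```

The word "web" in "Listen on the web" stays as it is, while the link itself is rewritten.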

Htaccess - Allow access to file only from pdf.js

I have a problem with my .htaccess file.
To prevent downloading or printing of PDF documents, I am using PDF.js to display their contents.
Now I want to disable direct HTTP access to those files.
Inside the PDF.js folder I put a directory called "doc", which contains all the documents and this .htaccess:
Order allow,deny
Deny from all
<Files ~ "viewer\.html$">
Allow from all
</Files>
Here viewer.html is the page that contains the document reader.
So, when I try to open this in my browser:
localhost:8080/test/pdfjs/web/viewer.html?file=doc/mondia.pdf
I get:
Unexpected server response (403) while retrieving PDF "../test/pdfjs/web/mondia.pdf"
Where am I going wrong?
If PDF.js is running inside the user's web browser, then the user needs to be able to download the PDF document. Apache can't (reliably) tell the difference between "PDF.js on the user's computer" and "Google Chrome on the user's computer" - both are HTTP requests from the user's computer for the resource.
If you really wanted to, you might be able to detect some header set by PDF.js when it requests the PDF, and refuse requests without that header. That would stop casual users directly accessing the file, but anyone who presses F12 in their browser could see the PDF being downloaded by PDF.js and save the contents from there.
Even if you served it in some form other than PDF, the user could copy and paste the resulting HTML, or take a screenshot of how it renders to the screen.
Stopping a user doing something with their own computer is fundamentally hard; if they can read something on their screen, you have sent it to them in some form. To really block them, you need a trusted "DRM" encryption system that renders directly to screen without ever making decrypted data accessible to the user. In the vast majority of cases, that would be completely overkill, and just annoy your users (for instance, blind users probably won't be able to access the content, as their screen reader software will not be trusted).
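The header check mentioned in this answer could look roughly like the Python sketch below. The header name X-Pdf-Viewer is purely hypothetical (the answer does not say which header PDF.js sends), as is the function name status_for. As the answer explains, this only deters casual direct access; anyone can add the same header by hand.

```python
from typing import Mapping

# Hypothetical header name; the real viewer may not send anything like it.
REQUIRED_HEADER = "X-Pdf-Viewer"


def status_for(headers: Mapping[str, str]) -> int:
    """Return 200 if the marker header is present, otherwise 403."""
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in headers}
    return 200 if REQUIRED_HEADER.lower() in names else 403


print(status_for({"x-pdf-viewer": "1"}))
print(status_for({"User-Agent": "Mozilla/5.0"}))
```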
You can try this plugin:
https://it.wordpress.org/plugins/editionguard-for-woocommerce-ebook-sales-with-drm/#description
or a similar one.
DRM is the best solution for a WordPress site.
Alternatively, try setting the header in PDF.js, as discussed in:
How to set range header from client with pdf.js?
Please edit the .htaccess file located in Vtiger_root_location/storage
and add the 'pdf' option as follows:

Forcing file text/plain display on csv file from href link

I want the users of my web app to be able to click an href link that leads directly to the content of .csv files located on the IIS server, just as if they were .txt files. I don't want the browser to open the download dialog, which forces users to download the file just to take a quick look at it.
In a nutshell, how (if at all) can we force browsers to display CSV files as text/plain when we have no control over the HTTP requests, which are plain href links? Is it possible in IIS 7?
Thanks a lot.

is it possible to redirect a link to a pdf file

Is it possible to redirect a link to a PDF file?
This is my site: www.mysite.com
I created a redirect so that opening www.mysite.com/documentation opens an index.html file, but for now that index.html only says the page is under construction.
Can I redirect the link to a PDF file instead?
I have uploaded the PDF file to the server. I want my PDF, named thedocument.pdf, to open when I go to www.mysite.com/documentation.
Is it possible?
If your web hosting provider won't allow you to alter the default extensions (i.e. to add pdf), then you could create an HTML page to act as the landing page and have it redirect to the PDF.
Details here: http://www.web-source.net/html_redirect.htm
Yes, this is possible. Check your web server configuration that currently points /documentation to index.html, and change it to point to thedocument.pdf.
Be aware that the PDF may or may not load in the user's browser. Some configurations will prompt the user to download the file.
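As a rough illustration of what pointing /documentation at the PDF means at the HTTP level, here is a minimal Python sketch using the standard http.server module. It assumes the PDF is reachable at /thedocument.pdf on the same host; the REDIRECTS table, redirect_target, RedirectHandler, serve, and port 8000 are all hypothetical names and choices, and a real site would normally do this in its web server configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Assumed mapping: the path from the question to the uploaded PDF.
REDIRECTS = {"/documentation": "/thedocument.pdf"}


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect location for a path, or None if there is none."""
    # Treat /documentation and /documentation/ the same way.
    return REDIRECTS.get(path.rstrip("/") or "/")


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target is None:
            self.send_error(404)
            return
        # 302 tells the browser to fetch the PDF URL instead.
        self.send_response(302)
        self.send_header("Location", target)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/documentation would send the browser on to /thedocument.pdf. Whether the PDF is then displayed or downloaded still depends on the browser.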
You can achieve this with a modX Weblink.
Upload your pdf to a place from where it is publicly downloadable. For this example I will upload it to modx/assets/content/test.pdf. You should now be able to download your file from http://yourdomain.tld/assets/content/test.pdf.
Create a new weblink resource in modX (Site -> New Weblink) on the base level of your resource tree. Make it a container and type in 'documentation' as the alias of your resource. You should also make sure that you already use friendly URLs, but your question sounds like you're already doing this.
A weblink has a field to enter the 'weblink' instead of the resource content. Just use the URL from which you are already able to download your PDF file. In this example, this would be http://yourdomain.tld/assets/content/test.pdf.
If you now view the newly created weblink resource under http://yourdomain.tld/documentation, you should instantly download a copy of your file test.pdf.

How to force a specific code page for a website?

Hi,
I have the following (apparently simple) problem: I have to install a simple website, made by someone else, on a web hosting account. The site consists of a great many HTML pages with no dynamic content, some created in MS Word and saved as HTML, some in FrontPage, and so on. A mixed bag.
I initially uploaded it to a test account on my server (Windows Server 2003) and it works OK.
Then I uploaded it to the real web host (Fedora / Apache).
When I loaded the site in a browser, I saw a lot of odd characters in place of the diacritics used in the HTML pages. Diacritics were saved as escape codes, like &#350; for Ș (using codepage 1252).
The problem is that when I load the page from my own test server, the browser automatically selects the correct codepage (1252).
But when I load the site from the public host, the same browser loads the page using UTF-8 encoding and renders it with odd characters.
The test site on my server can be seen at http://radu-stanian.dnsalias.com and the one on the public server at http://radustanian.scoli.edu.ro/
This happens no matter which browser I use (IE, Firefox or Chrome).
What should I do to force browsers to load the pages in the correct codepage?
Changing every page is not an option, because there are hundreds of pages, created by various people who may edit them further for updates.
Thank you
I did a quick google search and this is what I came up with:
http://www.w3.org/International/questions/qa-htaccess-charset
I've never messed with the .htaccess files with this scenario, but from what I read up it seems like you can force a certain character codepage mode based on file extension, which is what you need.
I'm not sure if it works, but hopefully it does :)
Most web servers allow you to edit HTTP headers. One of them can specify the exact codepage for a browser to use.
For example:
Content-Type: text/html; charset=ISO-8859-4
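To show the effect of such a header without touching any page, here is a small Python sketch built on the standard http.server module. It is only an illustration, not the Apache configuration the question needs. The class name Cp1252Handler is hypothetical, and the charset label windows-1252 is an assumption taken from the codepage mentioned in the question rather than from the ISO-8859-4 example above.

```python
from http.server import SimpleHTTPRequestHandler

# Assumed charset label for codepage 1252.
CHARSET = "windows-1252"


class Cp1252Handler(SimpleHTTPRequestHandler):
    """Serve static files, declaring a fixed charset for HTML pages."""

    # guess_type() looks up the file extension here to build Content-Type,
    # so every .html/.htm response carries the charset without page edits.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".html": f"text/html; charset={CHARSET}",
        ".htm": f"text/html; charset={CHARSET}",
    }


print(Cp1252Handler.extensions_map[".html"])
```

The same idea, a per-extension charset set once on the server, is what the .htaccess approach linked earlier describes for Apache.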
