htaccess Deny all except url - .htaccess

I use my website to host some files, but I do not want users to be able to download the files directly. I first want to show a preview of the docx file (which I am doing via Microsoft's Office viewer).
Since the viewer is an embed link that takes the file's URL as a parameter, I obviously can't block that URL from Microsoft.
I have tried allowing it by IP and by URL, and I have been looking around on the internet, but I haven't found a solution that works for me yet. I mostly found solutions for blocking a specific site, and I have no clue how to do the inverse.
My code is currently this:
Order deny,allow
Deny from all
Allow from view.officeapps.live.com
How can I keep denying all, but allowing the domain of view.officeapps.live.com?
Thanks in advance
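A minimal sketch of one direction to try, assuming Apache 2.4 and assuming that the servers fetching the document for the Office viewer actually reverse-resolve to hostnames under officeapps.live.com. Both this and the "Allow from hostname" attempt above rely on a double reverse DNS lookup of the client IP, so check your access logs first to see what hosts really fetch the file:
# Hypothetical Apache 2.4 sketch: only serve .docx files to clients whose
# IP reverse-resolves to a host under officeapps.live.com. Verify in your
# logs that the Office viewer's fetchers resolve there before relying on this.
<FilesMatch "\.docx$">
Require host officeapps.live.com
</FilesMatch>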

Related

Preventing indexing of PDF files with htaccess

I have a ton of PDF files in different folders on my website. I need to prevent them from being indexed by Google using .htaccess (since robots.txt apparently doesn't prevent indexing if other pages link to the files).
However, I've tried adding the following to my .htaccess file:
<Files ~ "\.pdf$">
Header append X-Robots-Tag "noindex, nofollow, noarchive, nosnippet"
</Files>
to no avail; the PDF files still show up when googling "site:mysite.com pdf", even after I've asked Google to re-index the site.
I don't have the option of hosting the files elsewhere or protecting them with a login system; I'd really like to simply get the htaccess file to do the job. What am I missing?
From the comment you made on another answer, I understand that you are looking to remove files/folders that Google has already indexed. You can temporarily forbid access with the rules below, provided you stop anyone from accessing the files directly.
First, let me give you a workaround; after that I will explain the part that takes a bit longer.
<Files "path/to/pdf/* ">
Order Allow,Deny
Deny from all
Require all denied
</Files>
This way, every file inside the given directory is forbidden over HTTP. You can still access the files programmatically (to send them as attachments, delete them, and so on), but users will not be able to view them directly.
You can then write a server-side script that reads the file internally and streams it to the browser, instead of exposing the direct URL (assuming the data is sensitive for now).
Example
<?php
// Stream a protected file to the browser instead of exposing its direct URL
$contents = file_get_contents($filePath);
header('Content-Type: ' . mime_content_type($filePath));
header('Content-Length: ' . filesize($filePath));
echo $contents;
Indexing vs. forbidding (not strictly needed now)
Preventing indexing only stops the folder/files from being indexed by Google or other search-engine bots; anyone visiting the URL directly will still be able to view the file.
Forbidding, on the other hand, means no external user or bot will be able to see or access the file/folder at all.
If you have only recently forbidden access to your PDF folder, it may still show up in Google until Googlebot revisits your site and finds the files missing, or until you mark that folder as noindex.
You can read more about crawl rate at https://support.google.com/webmasters/answer/48620?hl=en
If you still want the already-indexed entries removed, you can request removal in Google Search Console: https://www.google.com/webmasters/tools/googlebot-report?pli=1
Just paste this into your .htaccess file, using set instead of append:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
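If the header still does not show up on some responses, a hedged variant is to add the always condition, which also applies the header to non-2xx and internally generated responses:
<Files ~ "\.pdf$">
Header always set X-Robots-Tag "noindex, nofollow"
</Files>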

Restrict access to all PHP files besides three of them

I made a site a year ago using PHP, when I had a lot less experience. My teacher and I were analysing the code today and there seems to be a security issue. He wants me to fix it before he gives me the points I need.
I've got an index.php and an edit.php file in the root directory, and a login page at /php/login.php (which I now think is a very silly place to put a login file; looking back, I would probably swap edit.php's and login.php's directories if I were to rewrite my site).
Basically, I want these three files to be accessible externally. I want all other PHP files to be restricted from the outside, so it's impossible to do an AJAX call to /php/phpsavefile.php from outside the system (which is the security issue I mentioned). edit.php makes the AJAX call to /php/savefile.php.
I think this is what I need to get the job done:
Order Deny,Allow
Deny from all
Allow from 127.0.0.1
<Files /index.php>
Order Allow,Deny
Allow from all
</Files>
But how can I add three files instead of just one after <Files and before >?
I've also tried a second approach:
Order Deny,Allow
Deny from all
This doesn't seem to work because an ajax call appears to be a regular http request as well, so it gets a 403 response.
Another approach I tried was putting the restricted PHP files inside a folder called "private"
in the same directory as "httpdocs" (the parent folder of the webroot). My teacher had told me about an admin folder that no one can access but the site itself. I tried including the restricted PHP files from the private folder, but it didn't seem to include them properly...
Any help or tips for this novice at .htaccess would be appreciated :-)
Edit:
.htaccess allow access to files only from includes
Ray's comment said:
Of course, because they are requested by the client. You can't "allow the client" and "not allow the client" to serve files.
I suppose this is true, but how can I prevent people from calling my ajax file?
I secured it by checking if the user was logged in.
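For the "three files" part of the question above, a minimal sketch (assuming the three public entry points really are index.php, edit.php and login.php, and keeping the Apache 2.2 syntax used in the question) is to deny all PHP files and then re-allow those three by name with FilesMatch. Note that <Files> and <FilesMatch> match the file name only, not the path:
# Deny every PHP file by default
<FilesMatch "\.php$">
Order Deny,Allow
Deny from all
</FilesMatch>
# Re-allow only the three public entry points (matched by file name)
<FilesMatch "^(index|edit|login)\.php$">
Order Allow,Deny
Allow from all
</FilesMatch>
Later sections override earlier ones for the same directives, so the second block wins for those three files and everything else stays denied.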

I need to prevent the files that are on my Drupal website from being downloaded from outside, even when they're on public nodes

I've set this in a .htaccess
ErrorDocument 403 http://websiteip.fr/fonds-spe/document-indisponible
order deny,allow
deny from all
allow from websiteip
allow from 127.0.0.1
allow from localhost
That .htaccess works: each time I try to download a file via the "Apache" URL, I get the ErrorDocument, which is what I want.
BUT
with that in place, even Drupal can't access the files to display or play them (some are videos or audio files).
For example, on a node with an image in it, I get "This image failed to load."
Any idea or suggestion is welcome.
Each client will have a different public IP, so .htaccess is not the way to accomplish this. You'd be better off modifying Drupal (via a module). Keep in mind, however, that if it plays on the screen or through the speakers, it can be recorded/downloaded/etc.
We use the Private Upload module on our site to protect files from being downloaded.
http://drupal.org/project/private_upload

How to stop search engines from crawling the whole website?

I want to stop search engines from crawling my whole website.
I have a web application for members of a company to use. This is hosted on a web server so that the employees of the company can access it. No one else (the public) would need it or find it useful.
So I want to add another layer of security (in theory) to try and prevent unauthorized access by completely removing access for all search engine bots/crawlers. Having Google index our site to make it searchable is pointless from a business perspective and just adds another way for an attacker to find the website in the first place and try to hack it.
I know in the robots.txt you can tell search engines not to crawl certain directories.
Is it possible to tell bots not to crawl the whole site without having to list all the directories not to crawl?
Is this best done with robots.txt or is it better done by .htaccess or other?
Using robots.txt to keep a site out of search engine indexes has one minor and little-known problem: if anyone ever links to your site from any page indexed by Google (which would have to happen for Google to find your site anyway, robots.txt or not), Google may still index the link and show it as part of their search results, even if you don't allow them to fetch the page the link points to.
If this might be a problem for you, the solution is to not use robots.txt, but instead to include a robots meta tag with the value noindex,nofollow on every page on your site. You can even do this in a .htaccess file using mod_headers and the X-Robots-Tag HTTP header:
Header set X-Robots-Tag "noindex,nofollow"
This directive will add the header X-Robots-Tag: noindex,nofollow to every page it applies to, including non-HTML pages like images. Of course, you may want to include the corresponding HTML meta tag too, just in case (it's an older standard, and so presumably more widely supported):
<meta name="robots" content="noindex,nofollow" />
Note that if you do this, Googlebot will still try to crawl any links it finds to your site, since it needs to fetch the page before it sees the header / meta tag. Of course, some might well consider this a feature instead of a bug, since it lets you look in your access logs to see if Google has found any links to your site.
In any case, whatever you do, keep in mind that it's hard to keep a "secret" site secret very long. As time passes, the probability that one of your users will accidentally leak a link to the site approaches 100%, and if there's any reason to assume that someone would be interested in finding the site, you should assume that they will. Thus, make sure you also put proper access controls on your site, keep the software up to date and run regular security checks on it.
This is best handled with a robots.txt file, at least for the bots that respect the file.
To block the whole site add this to robots.txt in the root directory of your site:
User-agent: *
Disallow: /
To limit access to your site for everyone else, .htaccess is better, but you would need to define access rules, by IP address for example.
Below are .htaccess rules to restrict everyone except visitors coming from your company IP (with Order Deny,Allow, the Allow directive is evaluated after the blanket Deny, so only the listed address gets through):
Order Deny,Allow
Deny from all
# Enter your company's IP address here
Allow from 255.1.1.1
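On Apache 2.4, where Order/Allow/Deny are deprecated, a rough equivalent of the same whitelist (still using the placeholder address from above) would be:
# Apache 2.4 syntax: allow only requests from the company's public IP
# (255.1.1.1 is a placeholder; use your real address or CIDR range)
Require ip 255.1.1.1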
If security is your concern, and locking things down to IP addresses isn't viable, you should look into requiring your users to authenticate in some way to access your site.
That would mean that anyone (Google, a bot, a person who stumbled upon a link) who isn't authenticated wouldn't be able to access your pages.
You could bake it into your website itself, or use HTTP Basic Authentication.
https://www.httpwatch.com/httpgallery/authentication/
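As a minimal sketch of the HTTP Basic Authentication route in .htaccess (the .htpasswd path is hypothetical; create the file with the htpasswd utility and keep it outside the webroot):
AuthType Basic
AuthName "Company members only"
# Hypothetical password file, created with: htpasswd -c /etc/apache2/.htpasswd someuser
AuthUserFile /etc/apache2/.htpasswd
Require valid-user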
In addition to the answers already provided, you can stop search engines from crawling/indexing a specific page on your website via robots.txt. Below is an example:
User-agent: *
Disallow: /example-page/
The above example is especially handy when you have dynamic pages; otherwise, you may want to add the HTML meta tag below to the specific pages you want search engines to skip:
<meta name="robots" content="noindex, nofollow" />

Deny http access to a directory, allow access from WordPress plugin

Hey. I need to prevent direct access to http://www.site.com/wp-content/uploads/folder/something.pdf through the browser.
However, the Download Monitor plugin I am using, which allows logged-in users to download the file, needs to keep working.
Trying
Order Allow,Deny
Deny from all
<Files "http://www.site.com/wp-content/plugins/download-monitor/download.php">
Allow from all
</Files>
but now the download links do not work... even though (I think) they are links produced by the script, e.g.
http://www.site.com/wp-content/plugins/download-monitor/download.php?id=something.pdf
Enter that in the address bar and you correctly get a WordPress message, 'You must be logged in to download this file.'
However, if someone knows the URL where the file was uploaded
http://www.site.com/wp-content/uploads/folder/something.pdf
they can still access it directly.
I don't know how (guesswork?) they would find the direct URL anyway, but the client wants it stopped!
Thanks for any help.
You cannot solve this with Deny in .htaccess, because WordPress and a direct file request are handled by the same server user - www-data/apache/http/or something similar.
You can, for example, set the folder's permissions to 700, which will allow access from the script but not from a direct file request.
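A sketch of that suggestion in PHP (the path is hypothetical, and this only separates "script" from "direct request" if PHP runs as the owner of the folder while direct requests are served by a different user, which you would need to verify on your host):
<?php
// Hypothetical path: restrict the uploads folder so only its owner can read it.
// This only helps if the PHP process owns the folder while the web server's
// static-file user does not.
chmod('/var/www/site/wp-content/uploads/folder', 0700);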
And please accept the answers to your recent questions.
