Preventing indexing of PDF files with htaccess - .htaccess

I have a ton of PDF files in different folders on my website. I need to prevent them from being indexed by Google using .htaccess (since robots.txt apparently doesn't prevent indexing if other pages link to the files).
However, I've tried adding the following to my .htaccess file:
<Files ~ "\.pdf$">
Header append X-Robots-Tag "noindex, nofollow, noarchive, nosnippet"
</Files>
to no avail; the PDF files still show up when googling "site:mysite.com pdf", even after I've asked Google to re-index the site.
I don't have the option of hosting the files elsewhere or protecting them with a login system; I'd really like to simply get the htaccess file to do the job. What am I missing?

From the comment made on another answer, I understand that you want to remove files/folders that Google has already indexed. You can temporarily forbid access to them using the following, provided nobody needs to reach them directly.
First, let me give you a workaround; after that I'll explain what else you can do, which will take a bit longer.
# In a .htaccess file placed inside the PDF directory itself
# (a <Files> section matches filenames only; a path pattern
# like "path/to/pdf/*" would never match anything):
Order Allow,Deny
Deny from all
# On Apache 2.4+ use this single directive instead:
# Require all denied
This way, every file inside that directory is forbidden over HTTP. You can still read the files programmatically (to send them as attachments, delete them, and so on), but visitors will not be able to view them.
You can then write a server-side script that reads the file internally and streams it to the user, instead of exposing a direct URL (assuming the data is sensitive for now).
Example
<?php
// Read the file internally and stream it to the client
header('Content-Type: ' . mime_content_type($filePath));
header('Content-Length: ' . filesize($filePath));
readfile($filePath);
Indexing vs Forbidding (No need of this now)
Preventing indexing stops Google's bots (and other search-engine bots) from indexing the folder/files, but anyone visiting the URL directly can still view the file.
Forbidding means no external entity/user/bot will be able to see or access the file/folder at all.
If you have only recently forbidden access to your PDF folder, the files may still appear in Google until Googlebot revisits your site and finds them missing, or until you mark that specific folder noindex.
You can read more about crawler rate on https://support.google.com/webmasters/answer/48620?hl=en
If you still want these removed, you can request removal in Google Search Console. Visit: https://www.google.com/webmasters/tools/googlebot-report?pli=1

Just paste this in your .htaccess file, using set instead of append:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
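If the header still does not appear, it can help to confirm that mod_headers is loaded and to use FilesMatch, the preferred modern form of <Files ~>. A minimal sketch, assuming mod_headers is available on the host:

```apache
# Only applied if mod_headers is loaded; without the guard, the
# Header directive would be a syntax error on hosts lacking the module
<IfModule mod_headers.c>
    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
</IfModule>
```

One caveat: the PDFs must stay crawlable (not blocked in robots.txt), or Googlebot will never fetch them and see the header. Also note <IfModule> silently does nothing when the module is missing, so while debugging it can be more informative to leave the directive unwrapped and watch for a 500 error.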

Related

htaccess Deny all except url

I use my website to host some files, but I do not want users to download the files directly. I first want to show a preview of the docx file (what I am doing via the officeviewer from Microsoft).
Since the viewer is an embed-link that has a parameter to the file, I obviously can't block the URL from Microsoft.
I have tried to allow it by IP and by URL, and have been looking around on the internet, but I haven't found a solution that works for me yet. I mostly found solutions for blocking a site from viewing, and I have no clue how to invert them.
My code is currently this:
Order deny,allow
Deny from all
Allow from view.officeapps.live.com
How can I keep denying all, but allowing the domain of view.officeapps.live.com?
Thanks in advance
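No answer was recorded for this one, but a sketch of one possible direction: Allow from view.officeapps.live.com matches the hostname of the connecting client (via reverse DNS of its IP), while the browser-side signal is usually the Referer header. Below is an untested sketch using the Referer; the directive names are real Apache 2.2 directives, but the Referer pattern is an assumption, and the Office viewer's own servers fetch the file directly without a browser Referer, so this may not cover every request:

```apache
# Mark requests whose Referer is the Office viewer page. The Referer
# header is client-supplied and can be spoofed or stripped, so this
# is a deterrent, not real access control.
SetEnvIf Referer "^https?://view\.officeapps\.live\.com/" from_viewer
<FilesMatch "\.docx$">
    Order Deny,Allow
    Deny from all
    Allow from env=from_viewer
</FilesMatch>
```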

.htaccess limit file to script access

I am extremely new to the concept of .htaccess, and I wanted to know how I could use it to allow a file to be used on a script on a .html file in the same directory as the .htaccess and the file. However, if you try to navigate to the file instead of viewing the script on the .html file, I would like it to be blocked. Thanks!
Update: Please see below comments!
Update 2: It seems that there is no way to achieve what I wished. That's ok, though. I just used a bunch of obfustication, and that seems to work well.
You want to restrict access to a script file using .htaccess so that a visitor can't link to it directly. But if this worked as described, the visitor would load the HTML file, the HTML file would render and request the script file... which would then be blocked. So this isn't the way to go, I reckon.
I would suggest changing the HTML file to PHP where possible and including the script with a PHP include/require. That way, server-side code determines what content is served.
Once you're including the file server-side you can prevent direct access to the file using htaccess by placing the code below inside your htaccess:
#Prevent users from accessing .inc files
<Files ~ "\.inc$">
Order allow,deny
Deny from all
</Files>
In the above example, direct access to .inc files is denied (note the anchored regex, so only the extension matches). Change the file extension to suit your needs.
Inside your index.php file you'll need to include the file containing your script with something like:
include 'filewithscript.inc';
This should solve your problem.
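On Apache 2.4+ the Order/Deny pair above is deprecated in favour of mod_authz_core; an equivalent (untested) sketch:

```apache
# Apache 2.4+ form of "deny direct HTTP access to .inc files"
<FilesMatch "\.inc$">
    Require all denied
</FilesMatch>
```

PHP's include reads the file from disk, not over HTTP, so the deny does not affect the server-side include.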

Allow accessing a file only from specific URL in HTACCESS

I have no idea how to block access to a specific file for everyone except one website. I have two sites: my main website, and a second site that I can upload anything I want to. I want to allow image uploading on my website, but the upload will actually go to the other website, using AJAX. However, I don't want other people to be able to use the AJAX code for their own sites: the other website should accept image uploads (i.e. allow access to the PHP upload file) only from the first website.
I don't know how to use .htaccess beyond the basics, so here is an example in PHP (please help me 'translate' it to .htaccess):
<?php
$from = $_SERVER['HTTP_REFERER'] ?? ''; // $from = from where the AJAX request was sent
if ($_SERVER['REQUEST_URI'] == "/images/uploads/EXAMPLE" && $from != "http://www.example.com") {
    header("Location: 403.html");
    exit;
}
?>
You can give this a try, put the content below in your .htaccess file:
SetEnvIf Referer my-domain.com internal
#
<Files *>
order Deny,Allow
Deny from all
Allow from env=internal
</Files>
Put the .htaccess file on the website you upload the images to. my-domain.com is the domain that should have access to the images.
Note: This is not tested
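For reference, the same idea in Apache 2.4 syntax (also untested; the domain and the extension pattern are placeholders to adapt):

```apache
# Flag requests referred from the allowed domain, then require that flag
SetEnvIf Referer "my-domain\.com" internal
<FilesMatch "\.(gif|jpe?g|png|php)$">
    Require env internal
</FilesMatch>
```

Since the Referer header is sent by the client, either form only deters casual hotlinking; it is not real access control.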

restrict access to all php files besides three of them

I made a site a year ago using PHP, when I had a lot less experience. My teacher and I were analysing the code today and there seems to be a security issue; he wants me to fix it before he gives me the points I need.
I've got an index.php and an edit.php file in the root directory, and a login page at /php/login.php (a very silly place to put a login file, now that I look back on it; I would probably swap edit.php's and login.php's directories if I were to rewrite my site).
Basically, I want those three files to be accessible externally and all other PHP files restricted from the outside, so that it's impossible to make an AJAX call to /php/phpsavefile.php from outside the system (which is the security issue I mentioned). edit.php makes the AJAX call to /php/savefile.php.
I think this is what I need to get the job done:
Order Deny,Allow
Deny from all
Allow from 127.0.0.1
<Files /index.php>
Order Allow,Deny
Allow from all
</Files>
But how can I add three files instead of just one after <Files and before >?
I've also tried a second approach:
Order Deny,Allow
Deny from all
This doesn't seem to work because an ajax call appears to be a regular http request as well, so it gets a 403 response.
Another approach I tried was putting the restricted PHP files inside a folder called "private"
next to "httpdocs" (i.e. in the parent folder of the webroot). My teacher had told me about an admin folder that no one can access but the site itself. I tried including the restricted PHP files from the private folder, but it didn't seem to include them properly...
Any help or tips for this .htaccess novice would be appreciated :-)
Edit:
.htaccess allow access to files only from includes
Ray's comment said:
Of course, because they are requested by the client. You can't "allow the client" and "not allow the client" to serve files.
I suppose this is true, but how can I prevent people from calling my AJAX file?
In the end I secured it by checking whether the user was logged in.
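To answer the "three files" part directly: <Files> matches a single name, but <FilesMatch> accepts a regular expression with alternation, so one block can whitelist several files. An untested sketch in the Apache 2.2 style used above (the three filenames come from the question):

```apache
# Deny everything by default...
Order Deny,Allow
Deny from all
# ...then re-allow exactly these three entry points.
# Note <FilesMatch> matches by filename in any directory,
# so /php/login.php is covered too.
<FilesMatch "^(index|edit|login)\.php$">
    Order Allow,Deny
    Allow from all
</FilesMatch>
```

As Ray's comment implies, the browser's AJAX call is an external HTTP request too, so the save script must stay reachable and be protected by a session check instead.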

Is there a way to restrict the external users to access my server files

Is there a way to restrict external users from accessing my server files?
For example, when I access the directory http://puptaguig.net/evaluation/js/ it shows the 404 page (though it's not obvious), but when I tried to view controls.js at http://puptaguig.net/evaluation/js/controls.js it opened up.
IndexIgnore *
<Files .htaccess>
order allow,deny
deny from all
</Files>
I just want the files inside my server directory to be secure from outside viewing, for certain reasons.. but how?
Best Regards..
siegheil/js? Should be siegheil/ns for sure?
You could chmod 000 and then no one would see them or access them. You can't have people accessing and not seeing them at the same time. Can't be done.
You can add the lines below to your httpd.conf or .htaccess; this will block access to your JavaScript files:
<Files ~ "\.js$">
Order allow,deny
Deny from all
Satisfy All
</Files>
The only way I can think of to manage this is to deny access to your js files by throwing a .htaccess into the siegheil/js/ folder that says something along the lines of:
deny from all
or simply put your code in a folder above the site's document root.
After that, you use something like minify to retrieve the js files from the backend (PHP or some other server-side language) and have the minified/obfuscated code placed in another folder or output directly from the script.
With all that said, the js code must ultimately be downloaded one way or another to be run by the browser. This makes it impossible to prevent people from looking at your code and figuring out what it does if they really want to.
You were able to access http://puptaguig.net/evaluation/js/controls.js but not http://puptaguig.net/evaluation/js/ because most Apache installs prevent an anonymous user from viewing the directory contents, and only permit access to specific files in the directory.
There is no way to "hide" client-side JS, because without access to those files your users would not be able to run your script. As suggested by @General Redneck, you can obfuscate and minify your js using a tool like minify or uglifyJS, but those can potentially be un-minified (minification is still a good idea for performance reasons). Ultimately you are fighting against the "open" nature of the web. I'd suggest putting a license on your code and keeping an open mind : )
If you really need something to be secure, try accomplishing the essential functionality (which you want to keep private) with a backend language like php or asp.net and feeding the relevant data to you JS script.
You should create an .htaccess file in the relevant directory containing
Options -Indexes
This prevents listing of the directory and causes a 403 error to be raised. Your application can then handle that however it wants, displaying whatever you like.
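A minimal sketch of such an .htaccess (the custom error page path is an assumption):

```apache
# Disable auto-generated directory listings; requests for the bare
# directory now return 403, while direct file URLs still work
Options -Indexes
# Route the resulting 403 to a page of your choosing
ErrorDocument 403 /error/forbidden.html
```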
