Canonical Header Links for PDF and Image files in .htaccess - .htaccess

I'm attempting to set up canonical links for a number of PDF and image files on my website.
Example Folder Structure:
/index.php
/docs/
    file.pdf
    /folder1/
        file.pdf
    /folder2/
        file1.pdf
        file2.pdf
/img/
    sprite.png
    /slideshow/
        slide1.jpg
        slide2.jpg
Example PDF URL to Canonical URL:
http://www.example.com/docs/folder1/file.pdf --> http://www.example.com/products/folder1/
I am trying to avoid having to put individual .htaccess files in each of the sub-folders that contain my images and PDFs. I currently have 7 "main" folders, each of which has anywhere from 2 to 10 sub-folders, and most sub-folders have their own sub-folders. I have roughly 80 PDFs, and even more images.
I'm looking for a (semi-)dynamic solution where all files in a certain folder have the canonical link set to a single URL. I want to keep as much as possible in a single .htaccess file.
I know that <Files> and <FilesMatch> do not understand paths, and that <Directory> and <DirectoryMatch> don't work in .htaccess files.
Is there a fairly simple way to accomplish this?

I don't know of a way to solve this with Apache rules alone, as it would require some sort of regex matching and reusing the result of the match in a directive, which, as far as I know, isn't possible.
However, it's pretty simple if you introduce a PHP script into the mix:
RewriteEngine On
RewriteCond %{REQUEST_URI} \.(jpg|png|pdf)$
RewriteRule (.*) /canonical-header.php?path=$1 [L]
Note that this would send requests for all jpg, png and pdf files to the script regardless of the folder name. If you want to include only specific folders, you could add another RewriteCond to accomplish that.
Now the canonical-header.php script:
<?php
// Checking for the presence of the path variable in the query string allows us to easily 404 any requests that
// come directly to this script, just to be safe.
if (!empty($_GET['path'])) {
// Be sure to add any new file types you want to handle here so the correct content-type header will be sent.
$mimeTypes = array(
'pdf' => 'application/pdf',
'jpg' => 'image/jpeg',
'png' => 'image/png',
);
$path = filter_input(INPUT_GET, 'path', FILTER_SANITIZE_URL);
$file = realpath($path);
$extension = pathinfo($path, PATHINFO_EXTENSION);
$canonicalUrl = 'http://' . $_SERVER['HTTP_HOST'] . '/' . dirname($path);
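// Look up the content type; unknown extensions fall through to the 404 branch below.
$type = isset($mimeTypes[$extension]) ? $mimeTypes[$extension] : null;
// Verify that the file exists and is readable, or send 404
if ($type !== null && $file !== false && is_readable($file)) {
header('Content-Type: ' . $type);
header('Link: <' . $canonicalUrl . '>; rel="canonical"');
readfile($file);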
} else {
header('HTTP/1.0 404 Not Found');
echo "File not found";
}
} else {
header('HTTP/1.0 404 Not Found');
echo "File not found";
}
Please consider this code untested and check that it works as expected across browsers before releasing it to production.
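The script above derives the canonical URL from the file's own folder via dirname($path), whereas the question maps /docs/folder1/file.pdf to /products/folder1/. A minimal sketch of that mapping might look like the following. The function name canonicalUrlFor and the $prefixMap parameter are hypothetical (they are not part of the original answer), and the docs-to-products mapping is an assumption taken from the example URL in the question.

```php
<?php
// Hypothetical helper: maps a requested file path to the canonical URL of its folder.
// The default prefix map (docs => products) is an assumption based on the question's example.
function canonicalUrlFor(string $path, string $host, array $prefixMap = ['docs' => 'products']): string
{
    // Drop the file name and normalise slashes: "docs/folder1/file.pdf" -> "docs/folder1"
    $dir = trim(str_replace('\\', '/', dirname($path)), '/');
    if ($dir === '.') {
        $dir = ''; // the file sits directly in the document root
    }
    $segments = $dir === '' ? [] : explode('/', $dir);

    // Swap the first folder name when a mapping exists for it.
    if ($segments && isset($prefixMap[$segments[0]])) {
        $segments[0] = $prefixMap[$segments[0]];
    }

    $urlPath = $segments ? implode('/', $segments) . '/' : '';
    return 'http://' . $host . '/' . $urlPath;
}

echo canonicalUrlFor('docs/folder1/file.pdf', 'www.example.com'), "\n";
// prints http://www.example.com/products/folder1/
```

In canonical-header.php, the return value could replace the $canonicalUrl assignment, with $_SERVER['HTTP_HOST'] passed as the host.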

I was able to add canonical links for files in different directories through a single .htaccess file.
The following code adds a canonical link for each file pointing to the same directory:
<FilesMatch "\.(jpg|png|pdf)$">
RewriteRule ([^/]+)\.(jpg|png|pdf)$ - [E=FILENAME:%{HTTP_HOST}/<your-desired-location>/$1.$2]
Header add Link '<https://%{FILENAME}e>; rel="canonical"'
</FilesMatch>
And the code below adds a canonical link to the file's requested URL, which in many cases will be its actual location on the server:
<FilesMatch "\.(jpg|png|pdf)$">
RewriteRule ([^/]+)\.(jpg|png|pdf)$ - [E=FILENAME:%{HTTP_HOST}%{REQUEST_URI}]
Header set Link '<https://%{FILENAME}e>; rel="canonical"'
</FilesMatch>

Here is a solution.
You can use the .htaccess file to control headers, which is a simpler way to manage them.
How do you do it?
Let's take an example: I have a PDF named "testPDF.pdf" in the root folder of my site.
All you have to do is paste the following code into the .htaccess file:
<Files testPDF.pdf>
Header add Link '<http://<your_site_name>.com/>; rel="canonical"'
</Files>
Once you've added that to your .htaccess file, test the response headers to make sure it works as intended.

For an IIS solution, try something like this:
Response.AppendHeader("Link", "<" + "https://" + Request.Url.Host + "/" + product.GetSeName() + ">; rel=\"canonical\"");
This was added to a function that generates a PDF version of the web page. :)

Related

HTACCESS - Get Path without filename

I was wondering how to get just the request path without the file name, i.e. the folder that contains the requested file. Is that possible in an .htaccess file, or is there a variable like %{REQUEST_URI} without the file name?
I would like to put the code in the .htaccess file in the / directory. I want to find out the folder path so that I don't have to enter the path for each subfolder individually. So when I access example.com/folder, the code performs the same check as it does for example.com/folder2.
In other words, I want to redirect to a maintenance page from every URL whose folder contains a maintenance.txt.
I would like to check with an If statement whether a certain file ("maintenance.txt") exists in the folder of the requested file, because I want to "block" every folder that contains this file.
Maybe it is possible with a regex? I thought of something like this:
<If "-f %{FOLDER_OF_REQUESTED_FILE}/maintenance.txt">
ErrorDocument 200 /assets/maintenance.html
</If>

How to display error page instead of showing file in my directory on my cpanel server

Can anyone help with how to display the error page instead of showing the directory for my files when no file is pointed to on the URL? For instance, in the attached picture I simply remove thefile.php and input the URL on a browser like www.mywebsite.com/foldername/ instead of www.mywebsite.com/foldername/thefile.php and it displays all the files under this folder. I want it to redirect the user to the error page instead of showing all files in this folder.
One way to do it would be to create an index.php file inside that folder with the content below:
<?php
$dir = "."; // the directory you want to check
$exclude = array(".", "..", "index.php"); // ignore . .. and file itself
$files = scandir($dir);
$files = array_diff($files, $exclude); // delete the entries in exclude array from your files array
if(!empty($files)) {
echo "there are files";
}
else
{
header("Location: /404.php");
exit; // stop here so nothing else is sent after the redirect header
}
?>
When you access http://url/folder, index.php is the default PHP file loaded, so if there is nothing else in the folder except for the index.php file, it will show the 404 page (you have to replace /404.php with whatever page you have for handling 404 errors).
I assume you could do it with some regex in .htaccess as well.

htaccess - creating directories and files of the same name

I want to create a bunch of files without an extension showing at the end. The easiest way to do that was to do this:
/usa/index.php
/usa/alaska/index.php
/usa/alabama/index.php
/usa/california/index.php
What I want to do is this
/usa/alaska.php
/usa/alabama.php
/usa/california.php
and have it show up as:
/usa/alaska
/usa/alabama
/usa/california
However, I have one more level I want to add to this, the cities
/usa/alaska/adak.php
/usa/alaska/anchorage.php
/usa/california/los-angles.php
I don't want the ".php" showing up, but then each state exists as both a file and a directory. What I want is an htaccess rule that serves up the file version of the file, not the directory which is the default. I also want to strip the .php off of the end of the files so the final result looks like
/usa
/usa/alaska (alaska.php)
/usa/alaska/adak (adak.php)
I know I can get close to this by creating all the directories and using index.php for each directory, but then I will have thousands of directories each with one file in it and updating is a pain in the butt. I would much rather have one directory with 1000 files in it, than 1000 directories with 1 file in it.
Please, can someone point me in the right direction and know that I am doing this for all 50 states.
Jim
I would also suggest using a single PHP file (e.g. index.php) and rewriting all URLs starting with usa to it, instead of separating them into different directories and files. Then you'd need a couple of rewrite rules like the following:
RewriteEngine On
RewriteRule ^usa/([^/.]+)$ index.php?state=$1 [L]
RewriteRule ^usa/([^/]+)/([^/.]+)$ index.php?state=$1&city=$2 [L]
So then in your index.php you'd only need to check the $_GET parameters.
Update:
If you don't feel comfortable enough to use a database and pull the needed data from there you could always use the parameters to dynamically include/require the needed files. Something like this
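<?php
$source = ''; // or the 'ROOT' directory
$name = '/^[a-z0-9-]+$/i'; // accept plain names only, so '../' cannot be smuggled into the include path
if(isset($_GET['state']) && preg_match($name, $_GET['state'])) {
$source .= $_GET['state'];
if(isset($_GET['city']) && preg_match($name, $_GET['city'])) $source .= '/'.$_GET['city'];
$source .= '.php';
}
// here $source would be something like 'alaska.php' or 'alaska/adak.php'
// and it is assumed that the dir 'alaska' is on the same
// level as 'index.php'
if($source !== '' && is_file($source)) include($source);
else header('HTTP/1.0 404 Not Found');
?>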
But to answer your original question nevertheless you could use the following .htaccess
RewriteEngine On
RewriteRule ^usa/([^/.]+)$ usa/$1.php [L]
RewriteRule ^usa/([^/]+)/([^/.]+)$ usa/$1/$2.php [L]
What about creating just one single file:
/usa/index.php
With
$_SERVER["REQUEST_URI"]
you can read the current URI.
Now, if a user enters "http://domain.foo/usa/alaska", for example, they will of course get a 404 error.
But to call your index.php instead, you could add this line to the .htaccess:
ErrorDocument 404 /usa/index.php
Now index.php receives everything that is written in the URI, and you can match the result and include files or handle errors.
But maybe there is a better solution with .htaccess only; I don't know. :)
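As a rough sketch of the matching step described above, index.php could split the request URI into a state and an optional city before deciding which file to include. The function name parseUsaUri and the allowed character set are assumptions made for illustration. When a script is used as the ErrorDocument, it would probably also need to send its own 200 status for pages it does serve, since the response otherwise keeps the 404 status.

```php
<?php
// Hypothetical helper: split a request URI such as "/usa/alaska/adak" into state and city.
// Returns null when the URI is not under /usa or contains unexpected characters.
function parseUsaUri(string $requestUri): ?array
{
    // Ignore any query string, then split the path into its non-empty segments.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    if (count($segments) < 1 || count($segments) > 3 || $segments[0] !== 'usa') {
        return null;
    }
    // Accept plain lowercase names only, so "../" can never reach an include path.
    foreach ($segments as $segment) {
        if (!preg_match('/^[a-z0-9-]+$/', $segment)) {
            return null;
        }
    }
    return [
        'state' => $segments[1] ?? null,
        'city'  => $segments[2] ?? null,
    ];
}

// Falls back to a sample URI when run outside a web server.
$route = parseUsaUri($_SERVER['REQUEST_URI'] ?? '/usa/alaska/adak');
var_dump($route);
```

A null result would be the point at which index.php sends its real "not found" page.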

.htaccess modification

I am using direct paths for downloading files from my site. The link looks something like this:
http://www.site.com/download.php?dir1/dir/dir3/file.doc
I want to wrap it with mod_rewrite rules so that only the link below appears:
http://www.site.com/download
file, dir and dir3 are variable.
What do I have to do in my .htaccess file? Any ideas?
A simple rewrite rule would be:
RewriteRule ^download/(.*)/?$ /download.php?dir1/dir/dir3/$1 [NC,L]
This takes any request for something in the 'artificial' download directory and routes it to the real location. Note that a RewriteRule pattern is matched against the URL path only, so the scheme and host name are left out of it.
You can add more complex rules, such as stripping out file types or mapping a 'name' to a file name, depending on your needs.
For example:
RewriteRule ^download/pdf/(.*)/?$ /download.php?dir1/dir/dir3/$1.pdf [L,NC]
This creates an artificial pdf folder containing file names without the extension, each routed to a .pdf document. You can shape the rewrite any way you like; it depends on the format you prefer.
The question is not very specific. What does dir1/dir/dir3/file.doc mean? If you want to get http://www.site.com/download.php?dir1/dir/dir3/file.doc when you go to http://www.site.com/download, add the following to your .htaccess file:
RewriteEngine on
RewriteRule ^download/(.*)/?$ download.php?dir1/dir/dir3/file.doc [L]

Save HTTP_REFERER with mod_rewrite?

I'm trying to pass referers inside the .htaccess. What I want is for the referer value to be sent to a PHP script, where it will be saved to a database. In some cases (depending on the referer) the image should be blocked (hotlinking), and in other cases the image should be shown normally. But it does not work :-( My current attempt looks like the following (it is just for testing, so currently every image is handled):
RewriteCond %{REQUEST_URI} (.*)jpg$
RewriteCond %{ENV:verified} ^$
RewriteRule (.*)jpg$ /include/referrer.php?ref=%{REQUEST_FILENAME}&uri=%{REQUEST_URI}&query=%{QUERY_STRING}&env=%{ENV:verified} [E=verified:yes]
RewriteCond %{REQUEST_URI} (.*)jpg$
RewriteCond %{ENV:verified} ^yes$
RewriteRule ^(.*)$ %{REQUEST_FILENAME} [E=verified:no]
The referrer.php looks like:
<?
log_img($_REQUEST['uri'].' - "'.$_REQUEST['env'].'"');
?>
The problem is that the referrer.php is called but the image will not be displayed, which is obvious because the second rule is not reached.
I also have tried to display the image inside of the referrer.php, like:
<?
log_img($_REQUEST['uri'].' - "'.$_REQUEST['env'].'"');
$src = str_replace($_SERVER['DOCUMENT_ROOT'],'',$_REQUEST['ref']);
?>
<img src="<? echo $src ?>" />
But then the .htaccess is called again and I will run into endless loops.
The question is now: how can I access the second rule or how can I achieve what I want to do. Is there any way to do that?
Thanks for your help,
Lars
Your current solution doesn't work because mod_rewrite can only be used to rewrite the request to a single destination, but you seem to want the request to take a detour to your PHP script, then continue onward to the image. It might be possible to cause a subrequest that would cause the PHP script to get triggered, but I don't think it would be possible to control whether or not the original request continued on to the image in that scenario.
The best course of action here is to have your PHP file print out the actual image data (not an image tag referencing the image) after it does whatever checking/logging you intend it to do. You can do this with readfile(), provided that you send the right headers. After making sure the file is one of the images you want to serve up (and not some arbitrary file on your system...), you'll at least need to determine the appropriate content type, then print out the data. It's also a good idea to take caching (see this answer, as well as this one) into consideration.
Combining some of the techniques mentioned, a simple pseudo-example of the referrer script would be as follows. Note that you should research the best way to implement the techniques described, and you need to pay particular attention to security since you're opening files and printing their contents.
$filename = /* sanitized file name */;
log_img(/* log some data about the request */);
if (file_exists($filename) && allowedToView($filename)) {
// Assume we're not on PHP 5.3...
$types = array(
'gif' => 'image/gif',
'png' => 'image/png',
'jpg' => 'image/jpeg',
);
$parts = pathinfo($filename);
$ext = strtolower($parts['extension']);
if (array_key_exists($ext, $types)) {
$mime = $types[$ext];
$size = filesize($filename);
$expires = 60 * 60 * 24 * 30;
if (!empty($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
$modified = filemtime($filename);
$cached = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
if ($modified <= $cached) {
header('HTTP/1.1 304 Not Modified');
exit();
}
}
header("Content-Type: $mime");
header("Content-Length: $size");
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $expires)
. ' GMT');
header('Cache-control: private, max-age=' . $expires);
readfile($filename);
exit();
}
}
header("HTTP/1.0 404 Not Found");
exit();
And as far as the .htaccess file goes, it would just be something like this (the stuff that you added to the query string is available in $_SERVER anyway, so I see no point in manually passing it to the script):
RewriteEngine on
RewriteRule \.(jpg|png|gif)$ /include/referrer.php [NC]
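The pseudo-example leaves the "sanitized file name" step open. One possible way to fill it in, sketched here under the assumption that all images live below a single base directory such as the document root, is to resolve the requested path with realpath() and reject anything that ends up outside that directory. The function name resolveInside is hypothetical.

```php
<?php
// Hypothetical helper: resolve a request URI to a real file inside $baseDir.
// Returns the absolute path, or null if the file is missing or lies outside $baseDir.
function resolveInside(string $baseDir, string $requestUri): ?string
{
    $base = realpath($baseDir);
    if ($base === false) {
        return null;
    }
    // Strip the query string and decode %xx sequences before touching the file system.
    $path = rawurldecode((string) parse_url($requestUri, PHP_URL_PATH));
    $file = realpath($base . DIRECTORY_SEPARATOR . ltrim($path, '/'));
    if ($file === false || !is_file($file)) {
        return null;
    }
    // realpath() has already collapsed any "../", so a simple prefix check is enough.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($file, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return $file;
}

// In referrer.php this might be used as:
// $filename = resolveInside($_SERVER['DOCUMENT_ROOT'], $_SERVER['REQUEST_URI']);

// A traversal attempt is rejected whether or not the target file exists.
var_dump(resolveInside(sys_get_temp_dir(), '/../etc/passwd'));
```

A null result would then fall through to the 404 branch at the end of the script.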
