Redirect single user agent to a specific RSS feed - .htaccess

I have asked this before and since then done a lot of searching but I can't find a solution other than a suspicion that it will be a targeted 301 redirect in htaccess - having read https://support.google.com/feedburner/answer/78464?hl=en
If anyone can assist I would be really grateful. I have a heap of feeds, many of which I'm probably not aware of (WordPress). I have created a specific feed for a single purpose (Flipboard). I'm not sure what the Flipboard user agent is, but if I can establish that, is there a way to block it from accessing all RSS feeds except the feed I have created specifically for it?
I am at your mercy :-) Skip.

You can try a rule like this in your DOCUMENT_ROOT/.htaccess file:
RewriteEngine on
# if USER_AGENT contains "flipboard" (or whatever), in any letter case
RewriteCond %{HTTP_USER_AGENT} flipboard [NC]
# if accessing any URL other than /path/to/flipboard-feed.xml
# then give Forbidden error
RewriteRule !^path/to/flipboard-feed\.xml$ - [F]
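Note that the rule above returns 403 for every URL this user agent requests except the dedicated feed, ordinary pages included. If the goal is to block only the other feeds, a narrower variant might look like the sketch below. It assumes WordPress's usual feed URLs (a `feed` path segment such as /feed/ or /category/news/feed/, or a `feed=` query parameter), and the `flipboard` substring is still a guess that should be confirmed against the access log.

```apache
RewriteEngine on
# Assumed UA substring; confirm the real Flipboard user agent in your access log
RewriteCond %{HTTP_USER_AGENT} flipboard [NC]
# Leave the dedicated feed alone
RewriteCond %{REQUEST_URI} !^/path/to/flipboard-feed\.xml$
# Assumed WordPress feed patterns: a "feed" path segment OR a feed= query parameter
RewriteCond %{REQUEST_URI} (^|/)feed(/|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)feed=
# Forbidden for any other feed
RewriteRule ^ - [F]
```

The `[OR]` flag joins only the two feed-pattern conditions; the user-agent and dedicated-feed conditions are still ANDed with that pair.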

Related

htaccess prevent direct access to files while allowing rewrite rules to access them

General Overview
I've been creating this really nice .htaccess file with a bunch of settings that work great so far. I am wondering if it is possible, now, to allow access to files through flat links only while denying access to the same files directly.
Explanation & Current Settings
To better present this question consider the following:
I have a file: i.e. myFile.php
Which is in a subfolder: i.e. my/path/to/file/
The file's full path would then be my/path/to/file/myFile.php
Accessing this file through a URL, one would write: my.domain.com/my/path/to/file/myFile.php
In my .htaccess file, I have written a rewrite rule, similar to the following line of code (preceded by some RewriteCond's that ensure conditions are met regarding the host and filenames respectively):
RewriteRule ^home$ \/my\/path\/to\/file\/myFile.php [NC,L]
This means that someone trying to get to my page my/path/to/file/myFile.php can simply write my.domain.com/home instead of the ugly path my.domain.com/my/path/to/file/myFile.php.
Question & Preferred Outcome
What I am asking is:
Is it possible to block access to myFile.php if a person or machine attempts to go to my.domain.com/my/path/to/file/myFile.php, all the while allowing access to the file through my.domain.com/home?
Any help regarding this is greatly appreciated
Is it possible to block access to myFile.php if a person or machine attempts to go to my.domain.com/my/path/to/file/myFile.php, all the while allowing access to the file through my.domain.com/home
Yes, it is possible using the THE_REQUEST variable, which represents the original request line received by Apache from your browser. It doesn't get overwritten after other rewrite rules have executed.
You can use this rule to block direct access to that particular file:
RewriteCond %{THE_REQUEST} /my/path/to/file/myFile\.php[?/\s] [NC]
RewriteRule ^ - [F]
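Put together with the friendly-URL rule from the question, the whole file might look like the following sketch. The ordering (block first, rewrite second) is a choice made here for readability, and the host and filename conditions mentioned in the question are omitted.

```apache
RewriteEngine On

# Block requests where the client itself asked for the real path.
# THE_REQUEST looks like: GET /my/path/to/file/myFile.php HTTP/1.1
RewriteCond %{THE_REQUEST} \s/my/path/to/file/myFile\.php[?/\s] [NC]
RewriteRule ^ - [F]

# Friendly URL: /home is served internally from the real file.
# THE_REQUEST still reads "GET /home ..." here, so the block above does not fire.
RewriteRule ^home$ /my/path/to/file/myFile.php [NC,L]
```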

Compress .htaccess file

I've got an .htaccess file which contains over 3,000 lines, mainly thanks to 301 redirects I have set up from my old ecommerce site. The file is 323 KB in size and I'm worried it's going to be a burden for load times and therefore conversions.
Is there anything available that can compress (minify?) the file into a smaller size, or can someone offer a better idea for handling the 301 redirects?
If the redirects are simple redirects i.e. url1 to url2, no regex etc, AND you have access to httpd.conf, then you could use a RewriteMap for all the redirects and possibly have just 1 rule in your .htaccess to handle these.
From the RewriteMap documentation
The looked-up keys are cached by httpd until the mtime (modified time) of the mapfile changes, or the httpd server is restarted. This ensures better performance on maps that are called by many requests.
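As a rough sketch of that approach: the map is declared once in the server or virtual host configuration (RewriteMap is not allowed in .htaccess), and a single rule then looks up every request. The map name `legacy`, the file path, and the sample entries are all placeholders.

```apache
# --- httpd.conf or <VirtualHost> (not .htaccess) ---
# Plain-text map, one "old-path new-path" pair per line, for example:
#   old-category/widget-123.html /widgets/blue-widget/
#   about-us.html /about/
RewriteMap legacy "txt:/etc/apache2/legacy-redirects.txt"

# --- .htaccess in the document root ---
RewriteEngine On
# Look up the requested path (no leading slash in .htaccess context);
# the condition only matches when the map returns a non-empty value
RewriteCond ${legacy:$1} ^(.+)$
RewriteRule ^(.+)$ %1 [R=301,L]
```

Requests with no entry in the map fall through untouched, so the remaining rules in the file still apply to them.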
Can you specify some regular expressions to group / match all of these redirects? This then offers two options for doing this:
The first is to use a (hopefully smaller) set of RewriteRule statements using the [R=301] flag.
The second is to move this redirection into a redirector script where you use, say, PHP logic to decode the legacy ecommerce URI into its current format, then issue a response with a 301/302 status and a Location: header pointing to the current URI. This would also need you to do a catch-all rewrite of the legacy ecommerce URIs to this redirector script, e.g.
RewriteRule ^(product/.*) rewriter.php?uri=$1 [QSA,L]
Without some examples, I can't give a more specific reply. Sorry.
I've had some of these cases before. Most of the time you can replace the redirect statements with RewriteRules. For example, if your URLs went from:
http://shop.example.com/shop/category/product-id.html
To this:
http://shop.example.com/category/product-id.html
You can fetch it with a rewrite like this:
RewriteRule ^/?shop/([a-z]+)/([0-9]+)\.html$ /$1/$2.html [L,R=301]
This will still result in a 301 redirect, so crawlers will still know it's a permanent move.

301 Redirect to change structure

I have been researching redirects for a few days now and am still struggling, so I decided to post my first question here. For some reason, it is just not clicking for me.
I have redesigned and developed a client's WordPress site and need to update its structure.
The site's current structure is:
www.domain.com/blog/postname/2011/12/26/
The new structure should be:
www.domain.com/blog/postname
I really thought this was going to be easy since all I am looking to do is drop the date, but have not been able to grasp the whole wildcard aspect and how to end what I am trying to match. Any help would be greatly appreciated. A simple answer is great, but an explanation would be even better.
I am assuming you already know how to change your WordPress permalink structure to drop the date.
To 301 redirect all of the old URLs to the new ones, add the following rules to the .htaccess file in the root of your website's domain, ahead of any existing rules that are there.
#if these 2 lines already exist, skip them and add the rest
RewriteEngine on
RewriteBase /
# if there is a request of the form /blog/post-name/yyyy/mm/dd/
RewriteCond %{REQUEST_URI} ^(/blog/[^/]+/)[0-9]{4}/[0-9]{2}/[0-9]{2}/$ [NC]
#redirect the request to the URL without the date
RewriteRule . %1 [L,R=301]
If you want to learn more about .htaccess/rewriting you can take a look at the following urls: Indepth htaccess, Brief Introduction to Rewriting, Apache Mod_rewrite.
Let me know if this works for you and/or you have any issues.
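Since the question also asked for an explanation of the pattern, here is the same idea as a one-line mod_alias alternative, with each part of the regular expression annotated. This is a sketch rather than a tested drop-in: mod_alias and mod_rewrite are processed separately, so check that it behaves as expected alongside the WordPress rules.

```apache
# ^(/blog/[^/]+)       capture "/blog/" plus one path segment (the post name) as $1
# /[0-9]{4}            a four-digit year
# /[0-9]{2}/[0-9]{2}   a two-digit month and a two-digit day
# /?$                  an optional trailing slash, then the end of the URL path
RedirectMatch 301 ^(/blog/[^/]+)/[0-9]{4}/[0-9]{2}/[0-9]{2}/?$ $1
```

If the new permalinks end in a slash, use `$1/` as the target to avoid a second redirect.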

Fastest way to redirect missing image files

I have an on-the-fly thumbnailing system and am trying to find the best way to make sure it's as fast as possible when serving up images. Here is the current flow:
User requests thumbnail thumbnails/this-is-the-image-name.gif?w=200&h=100&c=true
htaccess file uses mod_rewrite to send requests from this folder to a PHP file
PHP file checks file_exists() for the requested image based on the query string values
If it does:
header('content-type: image/jpeg');
echo file_get_contents($file_check_path);
die();
If it doesn't it creates the thumbnail and returns it.
My question is whether there is a way to optimize this into being faster? Ideally my htaccess file would do a file_exists and only send you to the PHP file when it doesn't... but since I am using query strings there is no way to build a dynamic URL to check. Is it worth switching from query strings to an actual file request and then doing the existence check in htaccess? Will that be any faster? I prefer the query string syntax, but currently all requests go to the PHP file which returns images whether they exist or not.
Thank you for any input in advance!
You should be able to do this in theory. The RewriteCond directive has a -f test which can be used to check for the existence of a file. You should be able to have a rule like this:
# If the file doesn't exist
RewriteCond %{REQUEST_FILENAME} !-f
# off to PHP we go
RewriteRule (.*) your-code.php [L,QSA]
The twist here is that I imagine you're naming files according to the parameters that come in -- so the example above might be thumbnails/this-is-the-image-name-200-100.gif. If that is the case, you'll need to generate a filename to test on the fly, and check for that instead of the REQUEST_FILENAME -- the details of this are really specific to your setup. If you can, I would recommend some sort of system that doesn't involve too much effort. For example, you could store your thumbnails to the filesystem in a directory structure like /width/height/filename, which would be easier to check for in a rewrite rule than modified-filename-width-height.gif.
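A minimal sketch of the /width/height/filename layout, assuming the .htaccess lives in the document root, thumbnails are requested as thumbnails/200/100/name.gif, and the generator is a script called thumbnail.php that saves its output at that same path (all of these are assumptions about the setup):

```apache
RewriteEngine On
# Existing thumbnails are served directly by Apache; PHP is never touched.
# Only when no file exists at the requested path...
RewriteCond %{REQUEST_FILENAME} !-f
# ...hand width, height and filename to the generator script
RewriteRule ^thumbnails/([0-9]+)/([0-9]+)/([a-zA-Z0-9-]+\.(?:jpg|gif|png))$ thumbnail.php?w=$1&h=$2&file_name=$3 [L,QSA]
```

Because the URL path and the file path are identical, the existence check needs no filename juggling, which is the main attraction of this layout.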
If you haven't checked it out, Apache's mod_rewrite guide has a bunch of decent examples.
UPDATE: so, you'll actually need to check for the dynamic filename from the looks of it. I think that the easiest way to do something like this will be to stick the filename you generate into an environment variable, like this (I've borrowed from your other question to flesh this out):
# generate potential thumbnail filename
RewriteCond %{SCRIPT_FILENAME}%{QUERY_STRING} /([a-zA-Z0-9-]+)\.(jpg|gif|png)w=([0-9]+)&h=([0-9]+)(&c=(true|false))
# store it in a variable
RewriteRule .* - [E=thumbnail:%1-%2-%3-%4-%6.jpg]
# check to see if it exists
RewriteCond %{DOCUMENT_ROOT}/path/%{ENV:thumbnail} !-f
# match again so the %N backreferences are available to the rule below
RewriteCond %{SCRIPT_FILENAME}%{QUERY_STRING} /([a-zA-Z0-9-]+)\.(jpg|gif|png)w=([0-9]+)&h=([0-9]+)(&c=(true|false))
# off to PHP we go
RewriteRule (.*) thumbnail.php?file_name=%1&type=%2&w=%3&h=%4&c=%6 [L,QSA]
This is completely untested, and subject to not working for sure. I would recommend a couple other things:
Also, one huge recommendation I have for you is that, if possible, you turn on logging and set RewriteLogLevel to a high level (on Apache 2.4 and later, rewrite logging is controlled through the LogLevel directive instead). The log for rewrite rules can be pretty convoluted, but definitely gives you an idea of what is going on. You need server access to do this -- you can't put the logging config in an .htaccess file if I recall.
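For reference, the logging setup might look like one of the two variants below, depending on the Apache version. Both belong in the server or virtual host configuration, only one applies to a given server, and the log path is a placeholder.

```apache
# Apache 2.2: dedicated rewrite log, levels 0-9
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 5

# Apache 2.4 and later: RewriteLog and RewriteLogLevel no longer exist;
# rewrite tracing goes to the error log via per-module log levels (trace1-trace8)
LogLevel alert rewrite:trace5
```

High levels are very verbose and slow the server down, so they are best switched off again once the rules work.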

Why does this cause an infinite request loop?

Earlier today, I was helping someone with an .htaccess use case, and came up with a solution that works but can't quite figure it out myself!
He wanted to be able to:
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
The last two steps are fairly typical (usually from the user entering index/3/5 in the first place), but the first step was required because he still had some old-format links in his site and, for whatever reason, couldn't change them. So he needed to support both URL formats, and have the user always end up seeing the prettified one.
After much to-ing and fro-ing, we came up with the following .htaccess file:
RewriteEngine on
# Prevents browser looping, which does seem
# to occur in some specific scenarios. Can't
# explain the mechanics of this problem in
# detail, but there we go.
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]
# Hard-rewrite ("[R]") to "friendly" URL.
# Needs RewriteCond to match original querystring.
# Uses "?" in target to remove original querystring,
# and "%n" backrefs to move its components.
# Target must be a full path as it's a hard-rewrite.
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R]
# Soft-rewrite from "friendly" URL to "real" URL.
# Transparent to browser.
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2
Whilst it might seem to be a somewhat strange use case ("why not just use the proper links in the first place?", you might ask), just go with it. Regardless of the original requirement, this is the scenario and it's driving me mad.
Without the first rule, the client enters into a request loop, trying to GET /index/X/Y/ repeatedly and getting 302 each time. The check on REDIRECT_STATUS makes everything run smoothly. But I would have thought that after the final rule, no more rules would be served, the client wouldn't make any more requests (note, no [R]), and everything would be gravy.
So... why would this result in a request loop when I take out the first rule?
Without being able to tinker with your setup, I can't say for sure, but I believe this problem is due to the following relatively arcane feature of mod_rewrite:
When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
(source: mod_rewrite technical documentation, I highly recommend reading this)
In other words, when you use a RewriteRule in an .htaccess file, it's possible that the new, rewritten URL maps to an entirely different directory on the filesystem, in which case the .htaccess file in the original directory wouldn't apply anymore. So whenever a RewriteRule in an .htaccess file matches the request, Apache has to restart processing from scratch with the modified URL. This means, among other things, that every RewriteRule gets checked again.
In your case, what happens is that you access /index/X/Y/ from the browser. The last rule in your .htaccess file triggers, rewriting that to /index.php?id=X&cat=Y, so Apache has to create a new internal subrequest with the URL /index.php?id=X&cat=Y. That matches your earlier external redirect rule, so Apache sends a 302 response back to the browser to redirect it to /index/X/Y/. But remember, the browser never saw that internal subrequest; as far as it knows, it was already on /index/X/Y/. So it looks to you as though you're being redirected from /index/X/Y/ to that same URL, triggering an infinite loop.
Besides the performance hit, this is probably one of the better reasons that you should avoid putting rewrite rules in .htaccess files when possible. If you move these rules to the main server configuration, you won't have this problem because matches on the rules won't trigger internal subrequests. If you don't have access to the main server configuration files, one way you can get around it (EDIT: or so I thought, although it doesn't seem to work - see comments) is by adding the [NS] (no subrequest) flag to your external redirect rule,
RewriteRule ^index\.php$ http://example.com/index/%1/%2/? [L,R,NS]
Once you do that, you should no longer need the first rule that checks the REDIRECT_STATUS.
The solution below worked for me.
RewriteEngine on
RewriteBase /
#rule1
#Guard condition: only if the original client request was for index.php
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php [NC]
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$ [NC]
RewriteRule . /index/%1/%2/? [L,R]
#rule 2
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [L,NC]
Here is what I think is happening
From the steps you quoted above
Browse to index.php?id=3&cat=5
See the location bar read index/3/5/
Have the content served from index.php?id=3&cat=5
At Step 1, Rule 1 matches and issues an external redirect, which updates the location bar and fulfills Step 2.
At Step 3, Rule 2 now matches the redirected request and rewrites it to index.php.
The rules are rerun, for the reasons David stated, but since THE_REQUEST is immutable once set to the original request, it still contains /index/3/5 so Rule 1 does not match.
Rule 2 does not match either and the result of index.php is served.
Most other variables are mutable e.g. REQUEST_URI. Their modification during rule processing, and the incorrect expectation that the pattern matches are against the original request is a common reason for infinite loops.
It feels quite esoteric sometimes, but I am sure there is a logical reason for its complexity :-)
EDIT
Surely there are two distinct requests
There are 2 client requests, the original one from Step1 and the one from the external redirect in step 2.
What I glossed over above is that when Rule 2 matches on the second request, it is rewritten to /index.php and causes an internal redirect. This forces the .htaccess file for the / directory to be loaded again (it could easily have been another directory with different .htaccess rules) and all the rules to be re-run.
So... why would this result in a request loop when I take out the first rule?
When the rules are re-run, the first rule now unexpectedly matches, as a result of Rule2's rewrite, and does a redirect, causing an infinite loop.
David's answer does contain most of this information and is what I meant "for the reasons David stated".
However, the main point here is that you do need an extra condition to prevent the infinite loop: either yours, which stops further rule processing on internal redirects, or mine, which prevents Rule 1 from matching.
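A third option, not used in either answer above, is the [END] flag available in Apache 2.3.9 and later. Unlike [L], it stops rewrite processing for the request entirely, so the rules are not re-run after the internal rewrite. A sketch under that version assumption:

```apache
RewriteEngine on

# External redirect from the old query-string form to the friendly URL
RewriteCond %{QUERY_STRING} ^id=(\d+)&cat=(\d+)$
RewriteRule ^index\.php$ /index/%1/%2/? [L,R]

# Internal rewrite back to the real script; END prevents the rules above
# from being evaluated again against /index.php?id=...&cat=...
RewriteRule ^index/(\d+)/(\d+)/$ /index.php?id=$1&cat=$2 [END]
```

On older servers without [END], the REDIRECT_STATUS or THE_REQUEST guards shown earlier remain the way to go.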

Resources