.htaccess mod_rewrite: difference between the -s and -f conditions

I have used the Apache rewrite module a lot, but I just stumbled upon these two lines:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -s [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f
The first line uses the -s condition. The second uses the -f condition. In the docs it reads:
For -f
Treats the TestString as a pathname and tests whether or not it exists, and is a regular file.
For -s
Treats the TestString as a pathname and tests whether or not it exists, and is a regular file with size greater than zero.
I found the two conditions using Julian Pömp’s htaccess generator for angular.
So, what is the use of the -s condition (file with size) if there is already the -f condition (file exists)? It seems redundant to check both for the existence of a file and for the existence of a file with a size greater than zero. Any file that passes -s also passes -f, so there seems to be no need for the -s check.

You're right, it's redundant. Since the two conditions are combined with [OR], it doesn't matter whether the file size is greater than 0 or not. I'll change it to only -f.
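The shell's test operators -f and -s have closely analogous meanings, so the overlap can be checked outside Apache. The following is only a sketch under that assumption: it uses a POSIX shell as a stand-in for mod_rewrite, and the file names and the check helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare -f (regular file exists) with -s (exists and size > 0).
# The shell operators are used here as a stand-in for mod_rewrite's tests.
tmpdir=$(mktemp -d)
: > "$tmpdir/empty.txt"                 # zero-byte regular file
printf 'x' > "$tmpdir/nonempty.txt"     # one-byte regular file

check() {
    # Prints which of the two tests the given path passes.
    f=no; s=no
    [ -f "$1" ] && f=yes
    [ -s "$1" ] && s=yes
    echo "f=$f s=$s"
}

empty_result=$(check "$tmpdir/empty.txt")
nonempty_result=$(check "$tmpdir/nonempty.txt")
missing_result=$(check "$tmpdir/missing.txt")
echo "empty:    $empty_result"
echo "nonempty: $nonempty_result"
echo "missing:  $missing_result"
rm -rf "$tmpdir"
```

Run as a script, this reports f=yes s=no for the empty file, f=yes s=yes for the non-empty one, and f=no s=no for the missing one. The only case where the two tests differ is the empty file, and there -f alone already succeeds, which is why "-s [OR] -f" behaves the same as "-f".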

Related

Case insensitive rewrite rule

I have a rewrite rule that is needed because of the old site: some images are linked from another website, and I can't fix the URLs manually because there are a lot of images.
The website was previously hosted on Windows, where there was no problem linking to an image like this:
http://www.example.com/Fder69.JPG worked even though the filename was "fder69.JPG". Now I have a rewrite rule like this:
RewriteRule ^([^/.]+.JPG)$ /imgs/$1 [L,NC,R=302]
It basically rewrites the old links to the new structure, but images whose links don't match the exact case of the filename don't work.
Is there a way to accomplish this, with something like CheckSpelling Off or similar? Can I make the rewrite rule accept both .JPG and .jpg? Any tips?
One option is to rename all the files to be all-lowercase, which generally leads to nicer URLs, and then redirect any requests for mixed-case versions to all lowercase.
This approach has the advantage that each file ends up with only a single URL, rather than the same content appearing under multiple URLs as would be the case if you used mod_speling. This is good for search engine rankings, among other things.
One way to rename all the files would be to generate a bunch of mv commands in a shell script, like this:
find . | perl -ne 'chomp; print "mv \"", $_, "\" \"", lc $_, "\"\n";' > rename-files.sh
Note that I make no warranties that this won't mess up all your files, but I think it's right...
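As an alternative to generating mv commands, the rename can be done directly. This is a hedged sketch, not a tested migration tool: lowercase_tree is a hypothetical helper name, and it assumes bash plus a find that supports -depth, -mindepth and -print0 (GNU and BSD find both do). It lowercases one path component at a time, deepest entries first, so files inside mixed-case directories are handled, and it refuses to overwrite an existing target.

```shell
#!/usr/bin/env bash
# Sketch: rename every file and directory under a root to lowercase.
# lowercase_tree is a hypothetical helper name, not an existing tool.
lowercase_tree() {
    local root=$1 path dir base lower
    # -depth lists a directory's contents before the directory itself,
    # so children are renamed while their parent path is still valid.
    find "$root" -depth -mindepth 1 -print0 |
    while IFS= read -r -d '' path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        lower=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
        [[ $base == "$lower" ]] && continue      # already lowercase
        if [[ -e $dir/$lower ]]; then
            echo "skip (target exists): $path" >&2
            continue
        fi
        mv -- "$path" "$dir/$lower"
    done
}
```

It would be called as lowercase_tree /path/to/imgs, ideally on a copy of the directory first. On a case-insensitive filesystem the "target exists" check fires for every mixed-case name, so this sketch only makes sense on a case-sensitive one such as a typical Linux setup.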
The redirection is done using a "RewriteMap", which is a function which can be applied on the right hand side of a RewriteRule. One of the built-in mappings available is int:tolower, allowing you to do this:
# Alias the mapping function as "lc"
RewriteMap lc int:tolower
# Perform the substitution if the URL contains uppercase letters
RewriteCond %{REQUEST_URI} [A-Z]
# Issue a 301 redirect to the all-lowercase version
RewriteRule /(.*) /${lc:$1} [R=permanent,L]

Htaccess caching system in subfolder not working

Sorry if this is a duplicate: I found many questions about caching systems, but my problem seems to be tied to the fact that the whole script runs within a subfolder.
All I need to do is implement a simple caching system for my website, but I can't get it to work.
Here's my .htaccess file (heavily commented to be clear; sorry if the many comments are confusing):
RewriteEngine on
# Map for lower-case conversion of some case-insensitive arguments:
RewriteMap lc int:tolower
# The script lives in this subfolder:
RewriteBase /mydir/
# IMAGES
# Checks if cached version exists...
RewriteCond cache/$1-$2-$3-{lc:$4}.$5 -f
# ...if yes, redirects to cached version...
RewriteRule ^(hello|world)\/image\/([a-zA-Z0-9\.\-_]+)\/([a-zA-Z0-9\.\-_]+)\/([a-zA-Z0-9\.\-_\s]+)\.(png|gif|jpeg?|jpg)$ cache/$1-$2-$3-{lc:$4}.$5 [L]
# ...if no, tries to generate content dynamically.
RewriteRule ^(hello|world)\/image\/([a-zA-Z0-9\.\-_]+)\/([a-zA-Z0-9\.\-_]+)\/([a-zA-Z0-9\.\-_\s]+)\.(png|gif|jpeg?|jpg)$ index.php?look=$1&action=image&size=$2&data=$3&name=$4&format=$5 [L,QSA]
# OTHER
# This is always non-cached.
RewriteRule ^(hello|world)\/([a-zA-Z0-9\.\-_\s]+)\/([a-zA-Z0-9\.\-_\s]+)?\/?$ index.php?look=$1&action=$2&name=$3 [QSA]
Now, the issue is that the RewriteCond seems to always fail, as the served image is always generated by PHP. I also tried prepending %{DOCUMENT_ROOT}, but it still does not work. If I move the whole script to the root directory, it magically starts working.
What am I doing wrong?
Well, one thing that you are doing wrong is trying to declare a rewrite map in an .htaccess file in the first place. According to the Apache documentation:
The RewriteMap directive may not be used in <Directory> sections or .htaccess files. You must declare the map in server or virtualhost context. You may use the map, once created, in your RewriteRule and RewriteCond directives in those scopes. You just can't declare it in those scopes.
If your ISP / sysadmin has already defined the lc map then you can use it. If not then you can only do case-sensitive file caching on Linux, because its FS naming is case sensitive. However, since these are internally generated images, just drop the case conversion and stick to lower case.
%{DOCUMENT_ROOT} may not be set correctly at the time of mod_rewrite execution on some shared hosting configurations. See my Tips for debugging .htaccess rewrite rules for more hints. Here are the equivalent lines from my blog's .htaccess, FYI. The DOCUMENT_ROOT variable does work here, but it didn't for my previous ISP, so I had to hard-code the path:
# For HTML cacheable blog URIs (a GET to a specific list, with no query params,
# guest user and the HTML cache file exists) then use it instead of executing PHP
RewriteCond %{HTTP_COOKIE} !blog_user
RewriteCond %{REQUEST_METHOD}%{QUERY_STRING} =GET [NC]
RewriteCond %{DOCUMENT_ROOT}html_cache/$1.html -f
RewriteRule ^(article-\d+|index|sitemap.xml|search-\w+|rss-[0-9a-z]*)$ \
html_cache/$1.html [L,E=END:1]
Note that I bypass the cache if the user is logged on, if the request is not a GET (for example a POST), or if any query parameters are set.
Footnote
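Your match patterns are more complicated than they need to be because you are not using the full regexp syntax: use \w, and note that you don't need to escape . inside [ ], nor do you need to escape /. Also, the jpeg? alternative isn't right, is it? It matches jpe and jpeg. So why not: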
RewriteRule ^(hello|world)/image/([.\w\-]+)/([.\w\-]+)/([\w\-]+\.(png|gif|jpe?g))$ \
cache/$1-$2-$3-$4 [L]
and so on. Or even (given that the -f test will only succeed for files that actually exist in the cache):
RewriteRule ^(hello|world)/image/(.+?)/(.+?)/(.*?\.(png|gif|jpe?g))$ \
cache/$1-$2-$3-$4 [L]
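The non-greedy modifier makes (.+?) stop at the first / where possible, much like ([^/]+), but it can still extend across a / if the rest of the pattern needs that in order to match. If you want to be sure that hacks like ../../../../etc/passwd can't walk the file hierarchy, use ([^/]+) for each path segment instead.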

Rewriting public assets to omit file extension

For the sake of formality, I want to trim my asset URIs so that file extensions are omitted.
For example, if I request /resources/stylesheets/common, Apache will return /resources/stylesheets/common.css. At the moment, I'm using the following rewrite to get this to work:
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI}.css -f
RewriteRule ^ /%{REQUEST_URI}.css [L]
This works, as it should. However, I need to also do this for other resources, like images and scripts. For simplicity, I would like to avoid repeating this approach for every type of resource.
Question: What method can I use to achieve this in one rule? I'm thinking something along these lines (though this will not work, and I can see why - perhaps Apache doesn't like guess-work):
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI}.(css|js|png) -f
RewriteRule ^ /%{REQUEST_URI}.%1 [L]

Pass variables from htaccess to bash script

I'm trying to pass the value of a cookie to a bash script:
RewriteCond %{HTTP_COOKIE} mycookie=(.*) [NC]
RewriteRule .* script.sh?cookievar=%1
... but can't seem to find out how to read the GET variable in the bash script. (I suppose I'm asking Google the wrong queries, but can't find any info on this).
Is this even possible, and if so, please how?
Thanks, David
You have to look at the QUERY_STRING environment variable in Bash in order to access GET variables. In your case it should be set to cookievar=VALUE. To extract a variable's value (decoding %20 as a space), use something like this:
COOKIEVAR=$(printf '%s\n' "${QUERY_STRING}" | sed -n -e 's/%20/ /g' -e 's/^.*cookievar=\([^&]*\).*$/\1/p')
Good luck!
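If more than one parameter or more than %20 needs decoding, the same idea can be written without sed. This is a sketch only, assuming bash and that the server really does run the script as CGI with QUERY_STRING set; get_param is a hypothetical helper name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read one GET parameter from the CGI QUERY_STRING variable.
# get_param is a hypothetical helper name introduced for illustration.
get_param() {
    local name=$1 pair value
    local -a pairs
    # Split "a=1&b=2" into pairs on "&".
    IFS='&' read -r -a pairs <<< "${QUERY_STRING:-}"
    for pair in "${pairs[@]}"; do
        if [[ $pair == "$name="* ]]; then
            value=${pair#*=}
            value=${value//+/ }              # "+" encodes a space
            # Turn %XX into \xXX and let printf %b expand the escapes.
            printf '%b\n' "${value//%/\\x}"
            return 0
        fi
    done
    return 1
}
```

In the script it would be used as COOKIEVAR=$(get_param cookievar). Because the comparison is made against each whole name=value pair, a parameter such as xcookievar is not mistaken for cookievar. A value containing a literal backslash would also be interpreted by printf %b, so treat the result as untrusted input.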

mod_rewrite RewriteCond based on Last-modified? (.htaccess)

I know that we can easily base a RewriteCond on any http request header. But can we check (some of) the response headers that are going to be sent? In particular, the Last-modified one?
I want to rewrite a url only when the Last-modified date is older than 30 minutes and I'm trying to avoid the overhead of delegating that check to a php file every single time a file from that directory is requested.
Thanks in advance!
No, that’s not possible. But you could use a rewrite map to get that information from a program with less overhead than PHP, maybe a shell script.
Here’s an example bash script:
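#!/usr/bin/env bash
# Reads "max_age_minutes:filename" lines from stdin, one per lookup, and
# answers "yes" if the file was modified within that many minutes, else "no".
while IFS= read -r line; do
    max_age=${line%%:*}      # text before the first colon, in minutes
    filename=${line#*:}      # everything after the first colon
    if [[ -f $filename ]]; then
        # BSD/macOS stat syntax; on GNU/Linux use: stat -c %Y "$filename"
        lm=$(stat -f %m "$filename")
        if (( $(date +%s) - lm <= max_age * 60 )); then
            echo yes
        else
            echo no
        fi
    else
        echo no
    fi
done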
The declaration of the rewrite map needs to be placed in your server or virtual host configuration file, as the program is started just once and then waits for input:
RewriteMap last-modified-within prg:/absolute/file/system/path/to/last-modified-within.sh
And then you can use that rewrite map like this (.htaccess example):
RewriteCond %{last-modified-within:30:%{REQUEST_FILENAME}} =yes
RewriteRule ^foo/bar$ - [L]
RewriteRule ^foo/bar$ script.php [L]
The outbound headers do not exist until long after mod_rewrite has acted. There also isn't any file-modification-time checking functionality built into mod_rewrite, so the closest you'd get using it is making a RewriteMap of the External Rewriting Program variety to find out whether the file in question has been modified.
If I understand your application correctly, you could also look into having a cron job delete files in that directory that are older than 30 minutes, and then rewriting on a file-nonexistence condition.
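The cron-based cleanup could look roughly like this. It is a sketch under assumptions: purge_stale, the script path and the schedule are hypothetical, and it relies on the -mmin and -delete options that GNU and BSD find provide (they are not part of POSIX find).

```shell
#!/bin/sh
# Sketch: delete cached files whose modification time is more than
# a given number of minutes old. purge_stale is a hypothetical helper.
purge_stale() {
    # $1 = cache directory, $2 = maximum age in minutes
    find "$1" -type f -mmin +"$2" -delete
}

# Example crontab entry (runs every 5 minutes), assuming the script is saved
# as /usr/local/bin/purge-cache.sh and ends with a call such as
# purge_stale /path/to/cache 30:
# */5 * * * * /usr/local/bin/purge-cache.sh
```

With the stale files gone, the rewrite only needs the built-in nonexistence test, for example RewriteCond %{REQUEST_FILENAME} !-f in front of the rule that sends the request to the script. Files can be up to one cron interval older than the limit before they are removed.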
Have you considered using mod_proxy, mod_cache, and/or squid? It sounds like you're trying to roll your own caching...
