.htaccess to remove arbitrary text in filenames

I've got a client who uploads thousands of images with names like 1057_1.jpg, 1057_2.jpg, 1083_1H.jpg etc. - always a number, an underscore, a number and an optional letter. The CMS uses these names to link the images to the relevant entries.
For SEO reasons we want those image filenames to contain some keywords taken from the CMS. So they would become, say, 1057_1-some-keywords-here.jpg. Is there a way, with .htaccess, to keep the filenames the same, but redirect 1057_1-any-arbitrary-words.jpg to 1057_1.jpg? Basically to remove everything from the first dash up to the dot?
Thanks for your help - I must learn htaccess properly sometime but need to find a quick solution for now!

You may try this:
RewriteEngine On
RewriteBase /
#RewriteCond %{REQUEST_FILENAME} !-f
#RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /([^\-]+)-.+\.([^/]+)/?
RewriteRule .* %1.%2 [R=301,L]
This permanently redirects any URL like:
http://example.com/1057_1-anything.jpg or
http://example.com/any/number/of/folders/1057_1-anything.jpg
To:
http://example.com/1057_1.jpg or
http://example.com/any/number/of/folders/1057_1.jpg
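This effectively removes -anything from the last segment of the URL-path.
The image name has to be the last segment of the URL-path for the rule-set to work, and no folder name before it may contain a dash, because %1 captures everything up to the first dash in the path.
For a silent mapping (an internal rewrite, so the keyword URL stays in the address bar), remove R=301 from [R=301,L], leaving [L].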
UPDATE:
Originally, anything could not include a period, because the period marked the end of the name and the start of the extension. I have modified the rule-set so that anything may contain any number of periods; only the last one is treated as the extension separator, per the OP's requirement in a previous comment.
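The question describes a fixed filename shape (number, underscore, number, optional letter), so the rule can also be tied to that shape. What follows is only a sketch, not the rule-set above: it assumes the .htaccess file sits in the document root, and it uses an internal rewrite so the keyword-rich URL stays visible to visitors and crawlers.

```apache
RewriteEngine On

# Hypothetical tightened variant, not the answer's original rule.
# $1 = optional folder prefix, $2 = image ID such as 1057_1 or 1083_1H,
# $3 = file extension. The "-keywords" part is matched and discarded.
# No R flag: this is a silent mapping, not a redirect.
RewriteRule ^(.*/)?([0-9]+_[0-9]+[A-Za-z]?)-[^/]+\.([A-Za-z0-9]+)$ /$1$2.$3 [L]
```

Because the ID is matched explicitly, dashes in folder names do not confuse the rule, and a request for the plain 1057_1.jpg does not match it at all.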

Related

Mirror a file in htaccess

I'm trying to work on making a new site and I want to be able to mirror a site. Below is an example:
User visits: https://example.com/items/{some child folder}
User sees this file mirrored: https://example.com/items/listing.php
I want the user to be able to see that file, but without being redirected to it. Any ideas?
UPDATE
I found a solution to the above problem. However, I have a follow-up question: how would I stop the file listing.php in the /products folder from being caught by the following rewrite?
RewriteEngine On
RewriteRule ^products/(.*) index.php?name=$1 [NC,L]
Be more specific in the regex. If your products don't contain dots in the URL-path then exclude dots in the regex. For example:
RewriteRule ^products/([^./]+)$ index.php?name=$1 [L]
The above assumes your product URLs are of the form /products/<something>, where <something> cannot contain dots or slashes (so it naturally excludes listing.php) and must not be empty.
Unless you specifically need the NC flag, omit it: with case-insensitive matching, /products/foo and /Products/foo serve the same content, which potentially opens you up to duplicate content.
If you want to be explicit then include a condition (RewriteCond directive):
RewriteCond %{REQUEST_URI} !^/products/listing\.php$
RewriteRule ^products/([^/]+)$ index.php?name=$1 [L]
The REQUEST_URI server variable contains the root-relative URL-path, starting with a slash. The ! prefix on the CondPattern negates the regex.
Or, use a negative lookahead in the RewriteRule pattern, without using a condition. For example:
RewriteRule ^products/(?!listing\.php)([^/]+)$ index.php?name=$1 [L]
Reference:
https://httpd.apache.org/docs/current/rewrite/intro.html
https://httpd.apache.org/docs/current/mod/mod_rewrite.html
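The same "be specific in the regex" idea can be applied to the original mirroring question. This is a minimal sketch only, assuming the .htaccess file is in the document root and that child folder names contain no dots; the query parameter name folder is hypothetical.

```apache
RewriteEngine On

# Internally rewrite /items/<child> (optional trailing slash) to the
# listing script. There is no R flag, so the browser keeps showing the
# requested URL. The [^./]+ class excludes dots, so a direct request
# for /items/listing.php is not matched by this rule.
# "folder" is a hypothetical parameter name used for illustration.
RewriteRule ^items/([^./]+)/?$ items/listing.php?folder=$1 [L]
```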

url rewriting with htaccess not working

I am trying to rewrite the URLs on my site so that whatever comes after the slash is passed as an argument (example.com/Page goes to example.com/index.php?page=Page).
Here is the code that isn't working (it gives a Forbidden error):
RewriteEngine On
RewriteRule ^/(.+)/$ /index.php?page=$1 [L]
Any Help will be appreciated
This is what I suggested in the comment to your question:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/index\.php
RewriteRule ^(.+)$ /index.php?page=$1 [L,B]
The leading slash does not make sense in .htaccess style files, since the pattern is not matched against an absolute path there, but against a relative one. About the trailing slash: your example does not show such a slash, so why do you want to have it in the regular expression? It results in your pattern matching only requests that end with a slash, which is not what you want.
The RewriteCond lines are there to still allow access to physically existing files and directories and to prevent an endless loop, though that should not occur with internal-only rewriting. And you need the B flag to escape the part of the request URL that you pass on as a GET argument.
The last condition is actually redundant, since /index.php is obviously a file and is therefore already excluded by the !-f condition. I leave it in for demonstration purposes.
In general it is a very good idea to take a look at the documentation of Apache's rewriting module: httpd.apache.org/docs/current/mod/mod_rewrite.html
It is well written, very precise and contains lots of really good examples. It should answer all your questions.
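As a sketch of the point about the redundant condition, the rule-set can be trimmed to the following. It assumes that index.php exists as a real file in the same directory as the .htaccess file.

```apache
RewriteEngine On

# Leave requests for real files and directories untouched. Since
# index.php is a real file, this also stops the rule from looping.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# B escapes non-alphanumeric characters in the backreference, so a
# request such as /a&b is passed on as page=a%26b rather than being
# split into two query parameters.
RewriteRule ^(.+)$ /index.php?page=$1 [L,B]
```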

Trying to stop urls such as mydomain.com/index.php/garbage-after-slash

I know very little about .htaccess files and mod-rewrite rules. Looking at my statcounter information today, I noticed that a visitor to my site entered a url as follows:
http://mywebsite.com/index.php/contact-us
Since there is no such folder or file on the website and no broken links on the site, I'm assuming this was a penetration attempt. What was displayed to the visitor was the output of the index.php file, but without benefit of the associated CSS layout.
I need to create a rewrite rule that will either remove the information after index.php (or any .php file), or perhaps more appropriately, insert a question mark (after the .php filename), so that any following garbage will be treated like a parameter (and will be gracefully ignored if no parameters are required).
Thank you for any assistance.
If you're only expecting requests for directories and files that really exist, you can add this to an .htaccess file. It takes a request for a non-existent file or directory and serves the index.php page with the original request as the query string. The QSA flag appends any existing query string.
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) index.php?$1 [PT,QSA]
I found a solution, using information provided by AbsoluteZero as well as other threads that popped up on the right side of the screen as the solution came closer.
Here's the code that worked for me...
Options -Multiviews -Indexes +FollowSymLinks
RewriteEngine On
RewriteBase /
DirectorySlash Off
# remove trailing slash
RewriteRule ^(.*)\/(\?.*)?$ $1$2 [R=301,L]
# translate PATH_INFO information into a parameter
RewriteRule ^(.*)\.php(\/)(.*) $1.php?$3 [R=301,L]
# rewrite /dir/file?query to /dir/file.php?query
RewriteRule ^([\w\/-]+)(\?.*)?$ $1.php$2 [L,T=application/x-httpd-php]
I got the trailing-slash removal from another post on StackOverflow. However, even after removing the trailing slash, the rewrite rules did not account for someone appending what looks like valid information after the .php file.
(For example: mysite.com/index.php/somethingelse)
My goal was to either remove the "/somethingelse", or render it harmless. The PATH_INFO rule locates a "/" after the .php file, and turns everything else from that point forward into a query string (which will usually be ignored by the PHP file).
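If the goal is simply to make such URLs harmless, Apache's core AcceptPathInfo directive is another option. This is an alternative sketch, not part of the solution above, and it assumes the server permits the directive in .htaccess (AllowOverride FileInfo).

```apache
# Reject trailing path information instead of turning it into a
# query string. With this setting, a request such as
# /index.php/contact-us gets a 404 Not Found response, while
# /index.php itself keeps working.
AcceptPathInfo Off
```

This avoids the extra redirect, at the cost of showing an error page to anyone who follows such a link.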

Htaccess redirect and remove last 5 characters

I have 20k+ indexed pages but ~200 have /feed/ added at the end:
http://www.domain.com/page-ID-TITLE/feed/
ID and TITLE are dynamic.
TITLE can have multiple words, it doesn't have a fixed length: word1-word2-.....
The normal URL is:
http://www.domain.com/page-ID-TITLE/
The problem is that I get duplicated content on those pages, how can I redirect the URLs with feed at the end to the normal URL?
Thank you for your time!
Quite simply, actually. You don't need to specify the count of characters - only define what they are:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.+)/feed/?$ $1 [R=301]
I would suggest using the plain [R] flag (a temporary 302 redirect) instead of [R=301] while testing, especially on a production site: browsers cache 301 responses, so a mistake is hard to undo. Once it works, switch to [R=301].
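A slightly more explicit variant is sketched below. It assumes the .htaccess file is in the document root and that the canonical URL keeps its trailing slash, as in the question's /page-ID-TITLE/ example.

```apache
RewriteEngine On

# Hypothetical variant of the rule above. The leading slash makes the
# target root-relative, and the trailing slash matches the canonical
# form /page-ID-TITLE/ given in the question.
# R=302 is for testing; change it to R=301 once the redirect is correct.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)/feed/?$ /$1/ [R=302,L]
```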

htaccess block everything but 1 file and 1 folder

I have seen a number of similar posts but wasn't able to achieve the result I want.
So this is what I have.
I need all my past URLs to forward to the INDEX.PHP file. That part works so far, but I need to add an exception for one specific folder, the 'images' folder, so that the images show up. Because I'm forwarding everything to the index, the image requests are being caught as well.
Currently I have something like this:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_URI} !images/$
RewriteCond %{REQUEST_URI} !index.php$
RewriteRule ^(.*)$ "http://www.example.com" [R=301,L]
Currently my images won't load unless I add a condition for each specific image, i.e.:
RewriteCond %{REQUEST_URI} !images/single/picture.jpg$
Any thoughts or suggestions would be much appreciated.
Cheers
Yeah, the "!images/$" means that you're rewriting everything except URIs that end in "images/", such as the request for "http://www.example.com/images/" itself. Use a regular expression in your RewriteCond:
RewriteCond %{REQUEST_URI} !images/(.*)$
This means that no URI containing "images/" will be rewritten (the pattern is not anchored, and REQUEST_URI begins with a slash). I guess it'd be safer to do something along the lines of "!images/(.*)\.jpg$" so that only URIs containing "images/" and ending with ".jpg" are exempt. Refine it as needed, e.g. by specifying multiple extensions: \.(jpe?g|gif)$.
I'm no mod_rewrite wizard, and I'm not very good at regex, but this approach worked for me.
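An anchored version of the same idea might look like the sketch below. It assumes the .htaccess file is in the document root, that the images live directly under /images/, and that the redirect target is the same site; the third condition is an addition for that last case.

```apache
Options +FollowSymLinks
RewriteEngine On

# Skip anything under /images/ (anchored at the start of the URL-path).
RewriteCond %{REQUEST_URI} !^/images/
# Skip the index script itself.
RewriteCond %{REQUEST_URI} !^/index\.php$
# Skip the bare site root, which is the redirect target below;
# without this the redirect would loop when the target is this host.
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^ http://www.example.com/ [R=301,L]
```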
