Mod Rewrite Rule for Dynamic URL - Is this possible? - .htaccess

I've given myself a headache trying to figure out if this can be done. I have a forum that was recently migrated, leaving thousands of broken dynamic links.
A typical URL looks like this:
http://domain.com/Forum_Name/b10001/25/
('b10001' refers to the forum ID number and the last number refers to the page number.)
The new URL is formatted like this:
http://domain.com/forums/Forum_Name.10001/
(No page number. Also, notice the 'b' is no longer in front of the ID number.)
Is there a rewrite rule that can achieve this?

I'm not a rewriter, but following what I've read here, something like this should work:
RewriteRule ^([A-Za-z0-9_-]+)/b([0-9]+)(/[0-9]+)?/?$ forums/$1.$2/ [NC,L]
^([A-Za-z0-9_-]+) says "begins with a string of letters, digits, underscores or hyphens", then there's the /b constant, followed by ([0-9]+) (one or more digits, captured as a group), then an optional slash with one or more digits (the page number, (/[0-9]+)?), and lastly an optional trailing slash at the end (/?$).
If the URL matches that pattern, it's rewritten to forums/$1.$2/. The dot needs no escaping there, because the substitution is a plain string rather than a regex; $1 is the first captured group (the forum name), and $2 is the second, namely the number after the b.
Finally, NC means the pattern is case-insensitive, and L is "last", so no other rules are processed. The flags are mostly up to you; just read the linked article and pick the ones you need :)
Edit: corrected pattern checking with http://htaccess.madewithlove.be/

I think what you're looking for is
RewriteRule ^([a-zA-Z0-9_]+)/b([0-9]+)/.*$ forums/$1.$2/
Make sure the contents of the [] parts match the format you're using for forum names and ids.
For flags, you probably want [R=301,L] to force a permanent redirect.
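For what it's worth, here is a minimal sketch that combines the two answers above. It assumes the rule lives in an .htaccess file in the document root, that forum names contain only letters, digits, underscores and hyphens, and that an external permanent redirect is wanted; adjust the character class if your names differ.

```apache
RewriteEngine On

# Old: /Forum_Name/b10001/25/    New: /forums/Forum_Name.10001/
# $1 = forum name, $2 = forum ID (the digits after the "b").
# The optional page number is matched by a non-capturing group and dropped.
RewriteRule ^([A-Za-z0-9_-]+)/b([0-9]+)(?:/[0-9]+)?/?$ /forums/$1.$2/ [R=301,L]
```

With this in place, a request for /Forum_Name/b10001/25/ should be redirected to /forums/Forum_Name.10001/.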

Related

Rewrite rules for making multiple paths work

I have a requirement to make the following paths work.
Depending on what the url consists of, they are mapped to go to different java classes.
/books/
/books/science/
/books/science/fiction/
/books/science/fiction/kids/
So, I have given the rewrite rules in my configuration file as:
^/books$
^/books/(.*)$
^/books/(.*)/(.*)$
^/books/(.*)/(.*)/(.*)$
but the moment I give a url something like this
http://localhost/books/science/fiction/kids/12345
instead of getting captured by the fourth rewrite rule, it is captured by the second one which is not what I want.
Can someone please tell me how to achieve this? Thanks in advance
^/books$ /webapp/wcs/stores/servlet/ABCController?resultsFor=allCategories [PT,QSA]
^/books/(.*)$ /webapp/wcs/stores/servlet/XYZController?make=$1&resultsFor=category [PT,QSA]
^/books/(.*)/(.*)$ /webapp/wcs/stores/servlet/ABCDController?format=$1-$2&resultsFor=subCategory [PT,QSA]
^/books/(.*)/(.*)/(.*)$ /webapp/wcs/stores/servlet/ASDFController?resultsFor=product [PT,QSA]
"instead of getting captured by the fourth rewrite rule, it is captured by the second one"
That’s because the dot matches any character, so slashes as well.
Replacing it by a character class allowing anything but a slash (and demanding at least one character out of that class, so + instead of *) should fix that: ([^/]+)
Another way would be to reverse the order of your rules … You should always try and write them in order from most to least specific anyway.
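Both suggestions can be sketched together as below. This assumes the rules sit in the server or virtual host configuration (which is why the patterns start with a slash, as in the question), and it assumes the fourth rule should also accept extra segments after the third one, such as the 12345 in the example URL; that last part is a guess at the intent.

```apache
RewriteEngine On

# Most specific rule first. [^/]+ matches exactly one path segment,
# so a capture can no longer swallow slashes the way (.*) does.
# (/.*)? lets the product rule accept trailing segments such as /12345.
RewriteRule ^/books/([^/]+)/([^/]+)/([^/]+)(/.*)?$ /webapp/wcs/stores/servlet/ASDFController?resultsFor=product [PT,QSA]
RewriteRule ^/books/([^/]+)/([^/]+)/?$ /webapp/wcs/stores/servlet/ABCDController?format=$1-$2&resultsFor=subCategory [PT,QSA]
RewriteRule ^/books/([^/]+)/?$ /webapp/wcs/stores/servlet/XYZController?make=$1&resultsFor=category [PT,QSA]
RewriteRule ^/books/?$ /webapp/wcs/stores/servlet/ABCController?resultsFor=allCategories [PT,QSA]
```

With these patterns, /books/science/fiction/kids/12345 should be handled by the first rule, and /books/science/ by the third.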

Rewrite rule for seo - title in url

Let's say I want users to be able to type this URL in:
www.website.com/blog/2453/I-gained-0.1%-more-scripting-knowledge-!
I'm trying to include title information in the url for seo benefits.
I also want to include an id for my query. Effectively I want to pick up the id and ignore the title stuff that comes after, bearing in mind it's user-generated text, so it could contain any special characters.
How can I write a .htaccess rewrite rule so that the server reads it as the following with the appropriate GET data:
www.website.com/blog.php?id=2453
This is what I have tried but frankly I am way out of my depth here:
RewriteRule ^blog/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ blog.php?id=$1 [NC,L]
The rewrite rule you are using should work except for the ., %, and ! characters that are in your URL. The % character is not safe to use literally in URLs because it has a special meaning in the URL syntax: it introduces a percent-encoded character, so a literal % has to be written as %25. I wouldn't use exclamation points either.
If the ID is always going to be numeric, use ([0-9]+) instead of ([A-Za-z0-9-]+).
Try this URL:
www.website.com/blog/2453/I-gained-0.1-more-scripting-knowledge
With this rule:
RewriteRule ^blog/([0-9]+)/[A-Za-z0-9\-\.]+/?$ blog.php?id=$1 [NC,L]
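Since the title is only there for SEO and the ID is what matters, another option is to not validate the title at all. A minimal sketch, assuming an .htaccess file in the document root and a purely numeric ID:

```apache
RewriteEngine On

# Capture the numeric ID and ignore whatever title text follows it.
# (?:/.*)? accepts any characters after the ID, or nothing at all.
RewriteRule ^blog/([0-9]+)(?:/.*)?$ blog.php?id=$1 [L,QSA]
```

This should map both /blog/2453 and /blog/2453/I-gained-0.1-more-scripting-knowledge to blog.php?id=2453. As far as I know, a bare % in the title still has to be encoded as %25 by whatever generates the links, since the server may reject a malformed escape before the rewrite rules are consulted.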

Is it possible with canonical URL for this pattern in htaccess: /a/*/id/uniqueid?

A big problem is that I am not a programmer….! So I need to solve this with means within my own competence… I would be very happy for help!
I have an issue with a lot of duplicated URLs in the Google index and there are strong signs that it is causing SEO problems.
I don’t have duplicate links on the site itself, but as it once was set up, for certain pages the system allows all sorts of variations in the URL. As long as it has a specific article-id, the same content will be presented under an infinite number of URLs.
I guess the duplicates in Google's index have been growing over a long time and are due to broken links from other sites that link to mine. The problem is that the system has accepted the variations.
Here are examples of variations that exists in the Google index:
site.com/a/Cow_Cat/id/5272
site.com/a/cow_cat/id/5272
site.com/a/cow…cat/id/5272
site.com/a/cowcat/id/5272
site.com/a/bird/id/5272
The first URL with mixed case is the one used site-wide and for now I have to live with it; it would take too long to change everything to lower case. I cannot make a manual effort via htaccess as there are 300.000 articles in total. I believe there are tens of thousands that have one or more duplicates.
My question is this:
Is it possible to create rules for canonical URLs in htaccess in order to make the above URLs to be handled as one as well as for the rest of the 300.000?
I e, is there a way to say that all URLs having
/a/*/id/uniqueid
should be seen as one = based only on the unique ID and not give any regard to the text expressed with the “*”?
My hope is that it would be possible to say that a certain pattern like above should only be differentiated by the last unique segment.
If it is not possible in htaccess, how would it be done with link rel="canonical" on each page, can the code include wildcards?
I should add that the majority of the duplicates are caused by incoming links being lower case where the site itself is using a mix. Would it be OK to assign a canonical URL only with lower case although the site itself is basically always using a mix of lower/upper case?
If this is possible, I would be very happy to be helped with how to do it!!!!
Jonas
Hi Michael! I am not an expert but this is how I think it could be done:
1) My problem is that the URLs have mixed cases and I cannot change that now.
2) If it is OK for the searchengines, it would be fine for me to make the canonical URL identical to the actual URLs with the difference that it was all lower case, that would solve approx 90% of the duplicates. I e this would be the used URL: site.com/a/Cow_Cat/id/5272 and this would be the canonical: site.com/a/cow_cat/id/5272. As I understand, that would be good SEO...or...?
My idea was NOT to change the address browser address bar (i e using 301 redirect) but rather just telling the search engines which URLs that are duplicates, as I understand, that can be done by defining a canonical URL either in htaccess (as a pattern - I hope) or as a tag on each page.
3) IF it would be possible to find a wildcard solution...I am not sure if this is possible at all, but that would mean it was possible to NOT assign a specific canonical URL but rather a "group pattern", i e "Please search engine, see all URLs with this pattern - having the unique identifier in the end - as if they are one and the same URL, you SE, decide which one you prefer": /a/*/id/uniqueid
Would that work? It will only work in htaccess if canonical URLs can be defined as a group where the group is defined as a pattern with a defined part as the unique id.
Is it possible when adding a tag for each page to say that "all URLs containing this unique id should be treated the same"? If that would work it would look something similar to this
link rel="canonical" /a/*/id/5272
I dont know if this syntax with wildcard exist but it would be nice : )
My advice would be to use 301 redirects, with URL rewriting. Ask your webmaster to place this in your Apache config or virtual host config:
RewriteMap lc int:tolower
Then inside your .htaccess file you can use the map ${lc:$1} to convert matches to lower case. Here, the $1 part is a match (backreference from brackets in a regex in the RewriteRule) and the ${lc: } part is just how you apply the lc (lowercase) function set up earlier. Here is an example of what you might want in your .htaccess file:
# this matches a URL with any uppercase characters
RewriteCond %{REQUEST_URI} [A-Z]
# this redirects to the lowercase version
RewriteRule (.*) /${lc:$1} [L,R=301]
As for matching the IDs, presuming your examples mean "always end with the ID" you could use a regex like:
^(.+/)(\d+)$
The first match (brackets) gets everything up to and including the forward slash before the ID, and the second part grabs the ID. We can then use it to point to a single, specific URL (like canonical, but with a 301).
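Putting the lowercase map and the ID pattern together, a rough sketch could look like the following. It assumes the article URLs always have the shape /a/<text>/id/<digits> and that the .htaccess file is in the document root. Note that it would also redirect the mixed-case URLs the site itself links to, so it only fits if lowercase is accepted as the canonical form.

```apache
# Server or virtual host config only (RewriteMap is not allowed in .htaccess):
RewriteMap lc int:tolower

# .htaccess:
RewriteEngine On

# Only act on article URLs whose captured path contains an uppercase letter.
RewriteCond $1 [A-Z]
# Redirect permanently to the same path in lowercase.
RewriteRule ^(a/[^/]+/id/[0-9]+)/?$ /${lc:$1} [R=301,L]
```

This handles the case variations only. Variants with different text before the ID (such as /a/bird/id/5272) cannot be mapped to the right title by a pattern alone, because the rule has no way to know which text belongs to which ID.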
If you do just want to use canonical tags, then you'll have to say what you're using code wise, but an example I use (so as not add tags to hundreds of individual pages, for instance) in PHP would be:
// Use REDIRECT_URL when it is set; otherwise fall back to REQUEST_URI.
if (!empty($_SERVER["REDIRECT_URL"])) {
    $canonicalUrl = $_SERVER["SERVER_NAME"] . $_SERVER["REDIRECT_URL"];
} else if (!empty($_SERVER["REQUEST_URI"])) {
    // Drop the query string (everything from the first "?" onward).
    $canonicalUrl = $_SERVER["SERVER_NAME"] . preg_replace('/^([^?]+)\?.*$/', "$1", $_SERVER['REQUEST_URI']);
}
Here, the redirect URL is used if it's available, and if not the request URI is used. This code strips off the query string (the ?something=true part of http://www.mysite.com/a/blah/12345/?something=true). Of course you can add to this code to specify a custom path, not just taking off the query string, by playing with the regex.

remove part of url via mod_rewrite

Is there any way to hide part of a URL via mod_rewrite? I am currently using part of the URL, .htm, to split the page that is being requested from the query string.
Example
http://www.example.com/page/article/single.htm/articleid=8
This would let me know that the page requested is:
http://www.example.com/page/article/single
And the query string is:
articleid=8
Ideally I would like to have this work with the same URL, without the .htm visible:
http://www.example.com/page/article/single/articleid=8
The number of variables in the query string varies, as does the number of levels before the .htm, so the rule would need to be dynamic.
Thanks
To also do multiple querystring parameters, how do you want it to look? I started with this, which keeps this simple, then got trickier below.
http://www.example.com/page/article/single/articleid=8&anothervar=abc
Try this rule:
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
This handles one or more querystring parameters, but does require at least one. This looks for anything without an = up to a slash, then everything else. Basically, it uses the = as the indicator of the path vs. the querystring portions; but actually splits it on the slash. (The NC is a habit of mine; not needed in this case, but when I leave it out I forget it when it's needed.)
To let querystrings be optional, so it could handle just
http://www.example.com/page/article/single
I found it easiest with two rules, instead of trying to mingle this into one rule:
RewriteRule ^([^=]+)$ $1.htm [NC,L]
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
You can do something even prettier, using slashes for everything including multiple querystring parameters, like this:
http://www.example.com/page/article/single/articleid=8/anothervar=abc
It's a little hairy, but I think this works (couldn't let it go...)
Another rule handles replacing the slashes with ampersands, then doing the rewrite as above. This was easier to keep straight - maybe there's a way to do it all at once, but this was tricky enough for me:
RewriteRule ^([^=]+)$ $1.htm [NC,L]
RewriteRule ^([^=]+)/([^=]+=[^/]+)/([^=]+=.+)$ $1/$2&$3 [NC,N]
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [NC,L]
The first rule is as above, handling no querystrings at all. That just gets it out of the way.
The second rule is a loop. mod_rewrite has no LP flag; its looping flag is N (next), which restarts rewriting from the first rule, and that is what I tend to find in examples whenever you have an unknown number of replacements. In this case, each pass replaces the first remaining slash between two parameters with an ampersand, and it loops until the only slash left is the one between the path and the parameters (leaving that for the question mark in the third rule).
It's looking for a pair like articleid=8/anothervar=abc, where at least two parameters are still separated by a slash. It replaces that slash with an ampersand, giving articleid=8&anothervar=abc
In words, it's looking for (and capturing in parentheses):
(not-equalsign) slash (not-equalsign equalsign not-slash) slash (not-equalsign equalsign anything)
This lines up as:
(not-equalsign) /page/article/single
slash /
(not-equalsign equalsign not-slash) articleid = 8
slash /
(not-equalsign equalsign anything) anothervar = abc
It replaces that slash with an ampersand and, after looping, turns the URL into the first draft above: http://www.example.com/page/article/single/articleid=8&anothervar=abc . The third rule handles this as described above.
A note: These also assume all your URLs will look like this, since they're going to tack on .htm to everything. If you still want to allow explicit /something/page.htm, then these rules would need to not match on .htm if it's already there, or something like that. Or maybe an initial rule up front that looks for .htm and just stops rewriting there. Or maybe only do this for the /page paths.
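The "initial rule up front" idea from the note could be sketched like this, using the simpler two-rule version. It assumes an .htaccess context, where rewritten URLs are run through the rules again, so the guard also keeps .htm from being appended a second time.

```apache
RewriteEngine On

# Guard: if the path already ends in .htm, change nothing ("-") and stop.
RewriteRule \.htm$ - [L]

# No parameters: /page/article/single -> /page/article/single.htm
RewriteRule ^([^=]+)$ $1.htm [L]

# One or more parameters after the last path segment.
RewriteRule ^([^=]+)/(.+)$ $1.htm?$2 [L]
```

The guard has to come first; placed after the other rules, it would never get a chance to stop a request that already names a .htm file.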

Stripping "&Itemid=XX" out of my URLS using .htaccess

I need to strip out a certain part of my URLS being generated and want to use .htaccess.
Some links are being appended with "&Itemid=XX" after the .html.
Example:
http://www.site.com/conferences-and-events.html&Itemid=XX
XX could be one digit or four so I guess I need a wild card for that part. I know other questions have been answered related to stripping out certain parts of URLs using .htaccess but I can't seem to figure out how to get my specific string stripped out. Any help would be appreciated and sorry for being redundant and dense.
You'll want to use URL rewriting for that, something like this should work;
RewriteEngine On
RewriteRule (.*)&Itemid=\d{1,4}(.*) $1$2 [R]
Explanation: This regular expression matches anything ((.*)) followed by &Itemid= and 1 to 4 digits, followed by anything (another (.*)), and redirects ([R]) to the first anything concatenated with the second anything, thus taking out the &Itemid=xx part.
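A slightly stricter variant of the same idea, as a sketch only: it assumes the .htaccess file is in the document root, that "&Itemid=XX" really is part of the path as in the example (not a genuine query-string parameter after a "?"), and that a permanent redirect is wanted.

```apache
RewriteEngine On

# $1 = everything before "&Itemid=NN", $2 = anything after it.
# R=301 makes the redirect permanent; L stops further rule processing.
RewriteRule ^(.*)&Itemid=[0-9]{1,4}(.*)$ /$1$2 [R=301,L]
```

If the Itemid arrives in a real query string instead, the RewriteRule pattern will not see it, and a RewriteCond on %{QUERY_STRING} would be needed.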
