Is it possible to set a canonical URL for this pattern in htaccess: /a/*/id/uniqueid?

A big problem is that I am not a programmer! So I need to solve this by means within my own competence. I would be very grateful for help!
I have an issue with a lot of duplicated URLs in the Google index and there are strong signs that it is causing SEO problems.
I don’t have duplicate links on the site itself, but as the system was once set up, it allows all sorts of variations in the URL for certain pages. As long as the URL has a specific article ID, the same content will be presented under an infinite number of URLs.
I guess the number of duplicates in Google's index has been growing over a long time and is due to broken links from other sites that link to mine. The problem is that the system has accepted the variations.
Here are examples of variations that exist in the Google index:
site.com/a/Cow_Cat/id/5272
site.com/a/cow_cat/id/5272
site.com/a/cow…cat/id/5272
site.com/a/cowcat/id/5272
site.com/a/bird/id/5272
The first URL, with mixed case, is the one used site-wide and for now I have to live with it; it would take too long to change everything to lower case. I cannot fix this manually via htaccess as there are 300,000 articles in total. I believe tens of thousands of them have one or more duplicates.
My question is this:
Is it possible to create rules for canonical URLs in htaccess so that the above URLs are handled as one, and likewise for the rest of the 300,000?
I.e., is there a way to say that all URLs of the form
/a/*/id/uniqueid
should be seen as one, based only on the unique ID and without regard to the text represented by the “*”?
My hope is that it would be possible to say that a certain pattern like the one above should only be differentiated by the last, unique segment.
If it is not possible in htaccess, how would it be done with link rel="canonical" on each page? Can the code include wildcards?
I should add that the majority of the duplicates are caused by incoming links being lower case where the site itself uses mixed case. Would it be OK to assign an all-lower-case canonical URL although the site itself almost always uses a mix of lower and upper case?
If this is possible, I would be very happy to be helped with how to do it!
Jonas
Hi Michael! I am not an expert but this is how I think it could be done:
1) My problem is that the URLs have mixed cases and I cannot change that now.
2) If it is OK for the search engines, it would be fine for me to make the canonical URL identical to the actual URL except that it is all lower case; that would solve approx. 90% of the duplicates. I.e., this would be the URL in use: site.com/a/Cow_Cat/id/5272 and this would be the canonical: site.com/a/cow_cat/id/5272. As I understand it, that would be good SEO... or?
My idea was NOT to change the address in the browser address bar (i.e. using a 301 redirect) but rather just to tell the search engines which URLs are duplicates. As I understand it, that can be done by defining a canonical URL either in htaccess (as a pattern, I hope) or as a tag on each page.
3) IF it would be possible to find a wildcard solution... I am not sure if this is possible at all, but that would mean it was possible NOT to assign a specific canonical URL but rather a "group pattern", i.e. "Please, search engine, see all URLs with this pattern (having the unique identifier at the end) as one and the same URL; you, the search engine, decide which one you prefer": /a/*/id/uniqueid
Would that work? It will only work in htaccess if canonical URLs can be defined as a group, where the group is defined by a pattern with the unique ID as the fixed part.
Is it possible, when adding a tag to each page, to say that "all URLs containing this unique ID should be treated the same"? If that would work, it would look something like this:
link rel="canonical" /a/*/id/5272
I don't know if this syntax with a wildcard exists, but it would be nice : )

My advice would be to use 301 redirects with URL rewriting. Ask your webmaster to place this in your Apache server config or virtual host config (RewriteMap cannot be declared in .htaccess):
RewriteMap lc int:tolower
Then inside your .htaccess file you can use the map ${lc:$1} to convert matches to lower case. Here, the $1 part is a match (backreference from brackets in a regex in the RewriteRule) and the ${lc: } part is just how you apply the lc (lowercase) function set up earlier. Here is an example of what you might want in your .htaccess file:
# Match any URL path containing uppercase characters
RewriteCond %{REQUEST_URI} [A-Z]
# Redirect to the lowercase version of the path
RewriteRule (.*) /${lc:$1} [L,R=301]
As for matching the IDs, presuming your examples mean "always end with the ID" you could use a regex like:
^(.+/)(\d+)$
The first match (brackets) gets everything up to and including the forward slash before the ID, and the second part grabs the ID. We can then use it to point to a single, specific URL (like canonical, but with a 301).
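To make the matching step concrete, here is a minimal sketch in Python (used only to illustrate the logic; it is not Apache configuration). It uses the pattern with balanced brackets, ^(.+/)(\d+)$, and assumes there is some lookup from article ID to the one preferred path. The CANONICAL_PATHS table and the redirect_target name are hypothetical; in practice the lookup would come from the site's database or a RewriteMap file.

```python
import re

# Everything up to the last slash, then a purely numeric ID at the end.
ID_PATTERN = re.compile(r"^(.+/)(\d+)$")

# Hypothetical lookup: one preferred path per article ID.
CANONICAL_PATHS = {"5272": "/a/Cow_Cat/id/5272"}


def redirect_target(path):
    """Return the path to 301 to, or None when no redirect is needed."""
    match = ID_PATTERN.match(path)
    if match is None:
        return None  # not an article URL
    canonical = CANONICAL_PATHS.get(match.group(2))
    if canonical is None or canonical == path:
        return None  # unknown ID, or already on the preferred URL
    return canonical


for variant in ["/a/cow_cat/id/5272", "/a/bird/id/5272", "/a/Cow_Cat/id/5272"]:
    print(variant, "->", redirect_target(variant))
```

Only the ID decides the target, so every variation in the middle segment collapses onto the same URL, and the preferred URL itself is left alone to avoid a redirect loop.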
If you do just want to use canonical tags, then you'll have to say what you're using code-wise, but an example I use in PHP (so as not to add tags to hundreds of individual pages, for instance) would be:
// Use REDIRECT_URL when Apache has set it (after an internal rewrite)
if (!empty($_SERVER["REDIRECT_URL"])) {
$canonicalUrl = $_SERVER["SERVER_NAME"] . $_SERVER["REDIRECT_URL"];
// Otherwise use the request URI with the query string stripped
} else if (!empty($_SERVER["REQUEST_URI"])) {
$canonicalUrl = $_SERVER["SERVER_NAME"] . preg_replace('/^([^?]+)\?.*$/', "$1", $_SERVER['REQUEST_URI']);
}
Here, the redirect URL is used if it's available, and if not, the request URI is used. This code strips off the query string (the ?something=true part of http://www.mysite.com/a/blah/12345/?something=true). Of course, you can extend this code to specify a custom path, rather than just removing the query string, by playing with the regex.
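On the wildcard question: as far as I know, the href of a canonical link has to be one concrete URL, so a pattern like /a/*/id/5272 cannot be used there, and each page has to emit its own tag. The same idea as the PHP above, sketched in Python for illustration: the function names and the "http" default scheme are assumptions, not part of any framework.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_url(host, request_uri, scheme="http"):
    """Build an absolute URL from host and request path, dropping the query string."""
    path = urlsplit(request_uri).path
    return f"{scheme}://{host}{path}"


def canonical_link_tag(url):
    """Render the tag that goes into the page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


url = canonical_url("www.mysite.com", "/a/blah/12345/?something=true")
print(canonical_link_tag(url))
```

Note that this only removes the query string. To collapse the slug variations as well, the path would have to be replaced by the preferred path for that article ID before the tag is rendered.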

Related

Htaccess - Rewrite URL with multiple different conditions

I want to make my URLs friendly, under multiple conditions.
I have this: www.example.com/?lang=en&page=test&model=mymodel
I want to have: www.example.com/en/test/mymodel
But I also have this (with other parameters):
www.example.com/?lang=en&otherpage=othertest&othermodel=myothermodel
Must be:
www.example.com/en/othertest/myothermodel
How can I do this for my entire website?
If you're going to use friendly URLs that look like this:
www.example.com/<language>/<value1>/<value2>
then Apache won't be able to distinguish between the first and the second "non-friendly" URLs that you mentioned:
www.example.com/?lang=en&page=test&model=mymodel
www.example.com/?lang=en&otherpage=othertest&othermodel=myothermodel
This is because the parameter names (page and model in the first URL, otherpage and othermodel in the second) are not present in the friendly URL and can't be guessed from it.
A possible workaround depends on how many different scenarios you have, that is, how many different parameters you want to handle.
E.g. if you only have a few scenarios, you can add a part to the friendly URL pattern telling Apache which parameter names to use, like so:
www.example.com/<language>/<parameter_set>/<value1>/<value2>
then, tell Apache to use the first parameter set if <parameter_set> equals e.g. 1, the second set if it equals 2 and so on.
A sample rewrite rule set could be:
RewriteEngine On
RewriteRule ^([\w]+)/1/([\w]+)/([\w]+)$ ./?lang=$1&page=$2&model=$3
RewriteRule ^([\w]+)/2/([\w]+)/([\w]+)$ ./?lang=$1&otherpage=$2&othermodel=$3
Please note that 1 and 2 are completely arbitrary (they could be any other string).
Naturally, the official docs are there to help.
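The mapping those two rules perform can be sketched in Python to check it against sample URLs before touching the server. This is only an illustration of the logic: the PARAMETER_SETS table and the to_query name are made up for the example, and the set identifiers 1 and 2 are as arbitrary here as they are in the rules.

```python
import re
from urllib.parse import urlencode

# <language>/<parameter_set>/<value1>/<value2>
FRIENDLY = re.compile(r"^(\w+)/(\w+)/(\w+)/(\w+)$")

# Which parameter names each set identifier stands for.
PARAMETER_SETS = {
    "1": ("page", "model"),
    "2": ("otherpage", "othermodel"),
}


def to_query(path):
    """Translate a friendly path into the query string the application expects."""
    match = FRIENDLY.match(path)
    if match is None:
        return None
    lang, set_id, value1, value2 = match.groups()
    names = PARAMETER_SETS.get(set_id)
    if names is None:
        return None  # unknown parameter set
    return "?" + urlencode([("lang", lang), (names[0], value1), (names[1], value2)])


print(to_query("en/1/test/mymodel"))
print(to_query("en/2/othertest/myothermodel"))
```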

RewriteRule - redirect multi variable URL to multi variable URL

Our old website has a search URL structure like this:
example.com/Country/United States/Region/California/Area/Southern California/City/San Diego/Suburb/South Park/Type/House/Bedrooms/4/Bathrooms/3/
This is currently rewritten to point to the physical page:
/search/index.aspx
The parameters in the URL can be mixed up in different orders, and the URL can include one or more parameters.
We want to 301 redirect these old URLs to a new structure that is ordered in a logical way and more concise:
example.com/united-states/california/southern-california/san-diego/south-park/?type=house&bedrooms=4&bathrooms=3
example.com/united-states/california/?type=house&bedrooms=4&bathrooms=3
Is there a way with URL rewriting to interrogate the old URL, work out which parameters are present and then write out the new URL structure?
Even if we can limit it to just the Country, Region, Area, City and Suburb, that may be good enough to at least return some results even if it's not perfect.
Also, spaces should be turned into hyphens and all text made lowercase.
I already have the RewriteRule to turn the new URL structure into a URL that points to a physical page. It's just transforming the old URL into the new URL that I need help with. I've googled endlessly and it's just beyond me!
Can anyone help? Thanks.
Since the old search page already has rewrite rules set up for it and can already parse all the parameters you need, the easiest and most appropriate solution I see is to issue the redirect from that page's code. Have the page compose the new URL from the parsed parameters and send a 301 redirect to it; this should be a lot easier than trying to parse all these parameters in .htaccess and combine them into the new format.
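The URL-composing step might look roughly like the sketch below. It is written in Python purely as a language-neutral illustration (the actual page is index.aspx, so the real code would live there), and the key lists, the slugify helper and the new_search_url name are assumptions based on the example URLs in the question.

```python
from urllib.parse import unquote, urlencode

# Path segments of the new URL, in their fixed logical order.
LOCATION_KEYS = ["country", "region", "area", "city", "suburb"]
# Everything else moves into the query string, also in a fixed order.
FILTER_KEYS = ["type", "bedrooms", "bathrooms"]


def slugify(value):
    """Lowercase the value and turn runs of spaces into single hyphens."""
    return "-".join(value.lower().split())


def new_search_url(old_path):
    """Convert /Key/Value/Key/Value/... (any order) into the new structure."""
    parts = [unquote(p) for p in old_path.strip("/").split("/")]
    params = {key.lower(): value for key, value in zip(parts[0::2], parts[1::2])}
    location = [slugify(params[k]) for k in LOCATION_KEYS if k in params]
    filters = [(k, slugify(params[k])) for k in FILTER_KEYS if k in params]
    url = ("/" + "/".join(location) + "/") if location else "/"
    if filters:
        url += "?" + urlencode(filters)
    return url


print(new_search_url("/Bedrooms/4/Region/California/Country/United%20States/Type/House/"))
```

Because the parameters are read into a dictionary first, the order in which they appear in the old URL does not matter, and missing ones are simply skipped.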

Rewrite condition to remove specific parameter in Prestashop

I have around 1000 categories created in PrestaShop and I have the SPSEARCHPRO module installed. This module enables me to live search through my products.
Live search doesn't work due to the high number of categories but if I search normally it doesn't work either because the cat_id are included in the link and the link is too long. I suppose that's why the live search doesn't work either.
Here is what I'm trying to do:
I have this link:
https://example.com/en/module/spsearchpro/catesearch?fc=module&module=spsearchpro&controller=catesearch&orderby=name&orderway=desc&cat_id=2%2C4%2C(etc etc etc etc etc)
How can I remove the cat_id parameter from the link? Its value is too long because it includes all the category IDs.
I'm on PrestaShop 1.6.1.9 with multistore enabled (I don't know if that matters).
Putting this early in your .htaccess should cut out the unwanted parameter when the path ends with the category search slug. You may need to add other slugs if more pages are affected.
RewriteCond %{QUERY_STRING} ^(.*?&)?cat_id=(?>[^&]*)(?:&(.*))?$
RewriteRule ^.*/catesearch$ /$0?%1%2 [NS,DPI,PT]
You may have to use the L,R flags instead of DPI,PT if PrestaShop doesn't trust the $_GET it starts with (which comes from the rewritten URL). I'm unsure because it looks like it re-parses the URL from $_SERVER['REQUEST_URI'], which is unchanged by rewriting and would overwrite the corrected parameters with the original undesired ones. It may be that the only way to make it work is an external redirect.
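For reference, the transformation the rule aims for (same URL, minus cat_id, other parameters untouched) can be sketched in Python as below. It uses the standard URL parser rather than the regex from the rule, so it illustrates the intended result and is not a translation of the RewriteCond; drop_param is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_param(url, name="cat_id"):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(drop_param(
    "https://example.com/en/module/spsearchpro/catesearch"
    "?fc=module&module=spsearchpro&controller=catesearch"
    "&orderby=name&orderway=desc&cat_id=2%2C4"
))
```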

howto rewrite engine rule

I'm trying to understand some rules of the rewrite engine but I can't figure out how to do it.
I have this link:
w**.example.com/index.php?city=new+york
and I wish to rewrite it to this new one:
w**.example.com/good-parties-in-new-york
The value of city can change to any other city.
But the point here is that I only want to rewrite if the key is:
index.php?city=
because
index.php?zone=
is used for other things, etc.
Any suggestions? Thanks.
I'm a little confused on what exactly you want to achieve. URL rewriting is normally done to make URLs look nicer, not the other way around.
You would typically want to have a nice URL like this (which you'd communicate to your users):
w**.example.com/good-parties-in-new-york
act as an "alias" for a not-so-nice looking URL like this (= the actual page being served, unbeknownst to the users):
w**.example.com/index.php?city=new+york
With mod_rewrite, you can backreference parts of a regular expression. In this case, you would capture parts of the "nice" URL with regex groups and then reference those groups as variables in the query string of the page working in the background.
E.g.:
RewriteEngine On
RewriteBase /
RewriteRule ^good-parties-in-([a-z]+)-([a-z]+)$ index.php?city=$1+$2 [NC,L]
The first RegEx (([a-z]+)) is referenced as $1, the second as $2 (and so on).
Note that this example will only work for city names consisting of two words, like New York or San Francisco. You'll have to figure out how many words a city name can consist of and adjust the rule accordingly. (You might also have to set different flags.)
Plus, you should make sure that your PHP script checks against existing city names and throws an appropriate error or gives a warning if users enter fantasy names like good-parties-in-magical-rainbow-city or similar.
If this isn't what you're looking for, maybe you could clarify your question?
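One way to lift the two-word restriction is a pattern that accepts any number of hyphen-separated words, combined with the check against known city names mentioned above. A minimal sketch in Python, just to show the matching logic: the KNOWN_CITIES set and the city_query name are hypothetical, and the real list would come from your own data.

```python
import re

# One or more lowercase words separated by single hyphens after the fixed prefix.
PATTERN = re.compile(r"^good-parties-in-([a-z]+(?:-[a-z]+)*)$", re.IGNORECASE)

# Hypothetical list of valid cities.
KNOWN_CITIES = {"new york", "san francisco", "boston"}


def city_query(path):
    """Map a nice URL to the real script URL, or None for unknown cities."""
    match = PATTERN.match(path)
    if match is None:
        return None
    city = match.group(1).lower().replace("-", " ")
    if city not in KNOWN_CITIES:
        return None  # e.g. good-parties-in-magical-rainbow-city
    return "index.php?city=" + city.replace(" ", "+")


print(city_query("good-parties-in-new-york"))
print(city_query("good-parties-in-boston"))
print(city_query("good-parties-in-magical-rainbow-city"))
```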

htaccess redirect based on query value (range of values)

I'm trying to redirect a bunch of pages from one domain to another (not all the pages, just part of them).
The URL of a page is domain.com/?p=ID
ID is always a number.
I'd like to redirect all pages with IDs under 2000 to a new domain, say domain2.com/?p=ID
How can I do it? I'll probably have to use REGEX patterns, but I'm not that savvy when it comes to REGEX.
Thanks,
Roy
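It is probably possible to do this with a regex, but regexes are not really suited to numeric ranges like that. Another way is to use a RewriteMap in your Apache conf. Note that a RewriteRule pattern never sees the query string, so the ID has to be captured with a RewriteCond on %{QUERY_STRING}:
RewriteMap examplemap txt:/path/to/file/map.txt
RewriteCond %{QUERY_STRING} ^p=(\d+)$
RewriteRule ^/?$ ${examplemap:%1|/?p=%1}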
Your /path/to/file/map.txt file would then contain something like:
1 http://domain2.com/?p=1
2 http://domain2.com/?p=2
3 http://domain2.com/?p=3
.
.
2000 http://domain2.com/?p=2000
If the entry is not found in the map file then it should default to the existing domain because of the part after the pipe in the RewriteRule. This might seem like overkill but it gives you the finest level of control over each redirect.
The above code has not been tested but hopefully it explains the principle. See the Apache docs for more information on using RewriteMap.
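For a round bound like 2000, the range can still be written as a fairly short pattern: an optional leading 1 followed by one to three digits covers 0 to 1999. The sketch below, in Python, is only meant for checking which query strings such a pattern accepts. The pattern and the redirect_for name are illustrative, leading zeros are deliberately not handled, and a less convenient bound would need a different pattern or a plain numeric comparison.

```python
import re

# Optional leading 1, then one to three digits: matches 0 through 1999.
UNDER_2000 = re.compile(r"^p=(1?\d{1,3})$")


def redirect_for(query_string):
    """Return the new-domain URL for IDs under 2000, else None."""
    match = UNDER_2000.match(query_string)
    if match is None:
        return None
    return "http://domain2.com/?p=" + match.group(1)


print(redirect_for("p=1999"))
print(redirect_for("p=2000"))
```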
