mod_rewrite complexity problem - .htaccess

I need to use mod_rewrite on my domain based on subdomains. There are two scenarios in which I would need to do this:
My site is located in different cities, so losangeles.example.com should be localized to Los Angeles, etc. Essentially it would be rewritten to www.example.com/?loc=losangeles
I want to allow users to create username.example.com to pass their profiles to friends with ease. Essentially it would be rewritten to www.example.com/user.php?id=username
As my site scales, I plan on having over 300 locations and several thousand users, which means that hand-coding each rewrite rule would be rather tedious.
How can I tell mod_rewrite to distinguish between a location subdomain and a username subdomain?

I think fundamentally you need to rethink that strategy as mixing your namespaces is going to cause a lot of headaches in the long term. Consider what happens if you have missed out a city name, such as alexandra.example.com but a user has registered that username? Also, you are going to end up with having to do things like checking whether a city exists at the point that a username is trying to be created.
If you really do want to stick with this, one possibility is simply to send the requests to a single point which then does the lookup against your two databases and then internally sets a variable of either $city or $user .
Once you have set up your DNS wildcarding, in Apache set:
ServerAlias *.example.com
and then the rest of your virtualHost as normal.
Then in the application just check against $_SERVER['SERVER_NAME'], e.g.
// $pdo is assumed to be an existing PDO connection (not part of the original snippet)
$stmt = $pdo->prepare("select * from instances where ? regexp replace(concat(server, '$'), '.', '\\\\.') limit 1");
$stmt->execute([$_SERVER['SERVER_NAME']]);
but then you're actually going to have to have two queries and additional logic to pull apart the two namespaces or handle clash situations.
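The single-entry-point lookup described above could be sketched roughly as follows, in Python rather than PHP for brevity. The function name, the `cities` and `users` sets (standing in for the two database lookups) and the rule that a city wins on a clash are all assumptions made for illustration, not part of the original setup.

```python
def classify_subdomain(host, cities, users, base_domain="example.com"):
    """Return ("city", name), ("user", name) or (None, None) for a request host.

    `cities` and `users` are hypothetical stand-ins for the two database lookups.
    """
    host = host.lower().rstrip(".")
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return (None, None)
    label = host[: -len(suffix)]
    # Ignore the bare www host and anything with more than one extra label.
    if not label or "." in label or label == "www":
        return (None, None)
    # Assumption: on a namespace clash the city takes precedence.
    if label in cities:
        return ("city", label)
    if label in users:
        return ("user", label)
    return (None, None)
```

Even in this small sketch the clash handling has to be decided explicitly, which is the headache mentioned earlier.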
I would use a namespace like
<cityname>.example.com/user/<username>
which will avoid the problems, and also mean that you can customise what is visible on the user's homepage depending on which city you've entered, if that is desirable.

Related

SPF record with REDIRECT and INCLUDE

So I have to make an SPF record for a shared domain with 2 mail systems, one of which is Office 365. Normally it looks like this:
"v=spf1 mx include:MAIL_SERVER include:spf.protection.outlook.com ~all"
It's quite straightforward if it has been configured like this beforehand:
"v=spf1 mx include:MAIL_SERVER ~all"
But I have a different situation, where it is like this:
"v=spf1 mx redirect:_spf.PROVIDERSERVER.COM"
I am not sure if I can do it like this:
"v=spf1 mx redirect:_spf.PROVIDERSERVER.COM include:spf.protection.outlook.com ~all"
Is it going to work like this? If not, then what should be changed?
The redirect is a modifier rather than a mechanism, and will only be considered after all other mechanisms have been tested. Unlike an include, once a redirect has been followed, evaluation will not return to process further terms. Although your positioning isn't invalid, for clarity it should appear as the last term in the record, since it will only be evaluated after all the other terms have been tested and passed over; i.e. its position in the SPF record will not determine its order of processing.
If any alternative mechanism term is satisfied in the record then the processing will stop at that term and return the evaluated condition, this includes any all mechanism that may be present. Therefore you cannot use redirect in combination with all, because the all mechanism will always be tested and satisfied first, and the redirect will never be processed. Of course, any all mechanism in the redirected domain's SPF would still apply if reached, unlike the -all in an include which would be ignored by returning not-matched to the include mechanism call. (Caveat: if a +all is encountered within a traversed include it will return matched, and trigger whichever result has been prepended to that include, usually a default + .)
It's worth noting that any redirected domain's own SPF may contain further redirects, and they would cascade as expected. However each redirect counts towards the lookup count limits.
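To make the ordering rules above concrete, here is a minimal Python sketch. It is not a real SPF evaluator: it does no DNS lookups, only understands the redirect= modifier (any other term is treated as a mechanism), and the function names are made up for illustration. It just shows that mechanisms are tried in record order, that redirect comes last wherever it is written, and that redirect is dropped when an all mechanism is present.

```python
def split_spf(record):
    """Split an SPF record into (mechanisms, redirect_target)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    mechanisms, redirect = [], None
    for term in terms[1:]:
        if term.lower().startswith("redirect="):
            redirect = term.split("=", 1)[1]
        else:
            # Simplification: everything else is treated as a mechanism.
            mechanisms.append(term)
    return mechanisms, redirect


def has_all(mechanisms):
    # "all" may carry a qualifier: +all, -all, ~all, ?all
    return any(m.lstrip("+-~?").lower() == "all" for m in mechanisms)


def evaluation_plan(record):
    """Return the terms in the order they would be considered."""
    mechanisms, redirect = split_spf(record)
    plan = list(mechanisms)
    # The redirect is only reachable when no "all" mechanism exists.
    if redirect is not None and not has_all(mechanisms):
        plan.append("redirect=" + redirect)
    return plan
```

For example, `evaluation_plan("v=spf1 redirect=_spf.PROVIDERSERVER.COM mx ~all")` contains no redirect at all, because the ~all makes it unreachable.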
So in summary you would want to use something like the following (note that modifiers are written with an equals sign, redirect=, not a colon):
"v=spf1 mx include:spf.protection.outlook.com redirect=_spf.PROVIDERSERVER.COM"
I'm not sure on this, but here goes a guess! The docs say that redirect entirely replaces the current record, so I would expect it to ignore all other clauses - but it also evaluates from left to right, so maybe it would do the mx lookup first - you could test that manually.
I'm not sure why you're looking at redirect in the first place.
I suspect you could achieve what you're aiming for with:
"v=spf1 mx include:_spf.PROVIDERSERVER.COM include:spf.protection.outlook.com ~all"
As an addition to the earlier answers.
From the RFC, section 6.1 on the redirect modifier it reads:
This facility is intended for use by organizations that wish to apply
the same record to multiple domains. For example:
la.example.com. TXT "v=spf1 redirect=_spf.example.com"
ny.example.com. TXT "v=spf1 redirect=_spf.example.com"
sf.example.com. TXT "v=spf1 redirect=_spf.example.com"
_spf.example.com. TXT "v=spf1 mx:example.com -all"
In this example, mail from any of the three domains is described by
the same record. This can be an administrative advantage.
Note: In general, the domain "A" cannot reliably use a redirect to
another domain "B" not under the same administrative control. Since
the <sender> stays the same, there is no guarantee that the record at
domain "B" will correctly work for mailboxes in domain "A",
especially if domain "B" uses mechanisms involving local-parts. An
"include" directive will generally be more appropriate.
And, a redirect modifier MUST not be combined with an all mechanism:
For clarity, any "redirect" modifier SHOULD appear as the very last
term in a record. Any "redirect" modifier MUST be ignored if there
is an "all" mechanism anywhere in the record.
Considering all this, I would suggest going with the syntax provided by @Synchro. Although it is not against the rules, it is highly unusual to combine mechanisms with the redirect modifier.
As far as I know (and as I understand https://www.rfc-editor.org/rfc/rfc7208#page-26), you can use the record from the last example. The redirect modifier will be used only if everything else fails, meaning it will be the last thing checked.
Note that according to this same RFC the redirect modifier is recommended to be the very last term of the record, and it must be ignored if an all mechanism is present, so it should not be followed by ~all.

Is my domain mapping negatively affecting me for SEO?

Consider this:
example.com is equivalent to domain.com/example
This is also true:
example.com/subfolder is equivalent to example.com/example/subfolder
as a result of the domain mapping / .htaccess.
Is this bad?
To elaborate further, I am hosting multiple domains on the same IP.
Search engines have the concept of duplicate content, and that is exactly what will happen in your case.
If example.com/subfolder and example.com/example/subfolder have the same content and neither is marked as the canonical version of the other, then search engines consider them duplicates. Your own pages compete against each other, which will hurt their SEO value. To solve it, pick your preferred URL and add <link rel="canonical" href="your preferred URL" /> to the other one, pointing at it.

Is it possible with canonical URL for this pattern in htaccess: /a/*/id/uniqueid?

A big problem is that I am not a programmer….! So I need to solve this with means within my own competence… I would be very happy for help!
I have an issue with a lot of duplicated URLs in the Google index and there are strong signs that it is causing SEO problems.
I don’t have duplicate links on the site itself, but as it was once set up, for certain pages the system allows all sorts of variations in the URL. As long as it has a specific article ID, the same content will be presented under an infinite number of URLs.
I guess the duplicates in Google's index have been growing over a long time and are due to incorrect links from other sites that link to mine. The problem is that the system has accepted the variations.
Here are examples of variations that exists in the Google index:
site.com/a/Cow_Cat/id/5272
site.com/a/cow_cat/id/5272
site.com/a/cow…cat/id/5272
site.com/a/cowcat/id/5272
site.com/a/bird/id/5272
The first URL, with mixed case, is the one used site-wide and for now I have to live with it; it would take too long to change everything to lower case. I cannot make a manual effort via htaccess as there is a total of 300.000 articles. I believe there are tens of thousands that have one or more duplicates.
My question is this:
Is it possible to create rules for canonical URLs in htaccess in order to make the above URLs to be handled as one as well as for the rest of the 300.000?
I e, is there a way to say that all URLs having
/a/*/id/uniqueid
should be seen as one = based only on the unique ID and not give any regard to the text expressed with the “*”?
My hope is that it would be possible to say that a certain pattern like above should only be differentiated by the last unique segment.
If it is not possible in htaccess, how would it be done with link rel="canonical" on each page, can the code include wildcards?
I should add that the majority of the duplicates are caused by incoming links being lower case where the site itself is using a mix. Would it be OK to assign a canonical URL only with lower case although the site itself is basically always using a mix of lower/upper case?
If this is possible, I would be very happy to be helped with how to do it!!!!
Jonas
Hi Michael! I am not an expert but this is how I think it could be done:
1) My problem is that the URLs have mixed cases and I cannot change that now.
2) If it is OK for the searchengines, it would be fine for me to make the canonical URL identical to the actual URLs with the difference that it was all lower case, that would solve approx 90% of the duplicates. I e this would be the used URL: site.com/a/Cow_Cat/id/5272 and this would be the canonical: site.com/a/cow_cat/id/5272. As I understand, that would be good SEO...or...?
My idea was NOT to change the address browser address bar (i e using 301 redirect) but rather just telling the search engines which URLs that are duplicates, as I understand, that can be done by defining a canonical URL either in htaccess (as a pattern - I hope) or as a tag on each page.
3) If it were possible to find a wildcard solution... I am not sure if this is possible at all, but that would mean it was possible NOT to assign a specific canonical URL but rather a "group pattern", i.e. "Please, search engine, see all URLs with this pattern (having the unique identifier at the end) as if they are one and the same URL; you, the search engine, decide which one you prefer": /a/*/id/uniqueid
Would that work? It will only work in htaccess if canonical URLs can be defined as a group where the group is defined as a pattern with a defined part as the unique id.
Is it possible when adding a tag for each page to say that "all URLs containing this unique id should be treated the same"? If that would work it would look something similar to this
link rel="canonical" /a/*/id/5272
I don't know if this syntax with a wildcard exists, but it would be nice : )
My advice would be to use 301 redirects, with URL rewriting. Ask your webmaster to place this in your apache config or virtual host config:
RewriteMap lc int:tolower
Then inside your .htaccess file you can use the map ${lc:$1} to convert matches to lower case. Here, the $1 part is a match (backreference from brackets in a regex in the RewriteRule) and the ${lc: } part is just how you apply the lc (lowercase) function set up earlier. Here is an example of what you might want in your .htaccess file:
# Match a URL containing any uppercase characters
RewriteCond %{REQUEST_URI} [A-Z]
# Redirect to the lowercase version
RewriteRule (.*) /${lc:$1} [L,R=301]
As for matching the IDs, presuming your examples mean "always end with the ID" you could use a regex like:
^(.+/)(\d+)$
The first match (brackets) gets everything up to and including the forward slash before the ID, and the second part grabs the ID. We can then use it to point to a single, specific URL (like canonical, but with a 301).
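As a rough illustration of the "only the ID matters" idea, here is a Python sketch that pulls the ID out of any /a/*/id/uniqueid variant and maps it to one preferred path. The `slugs` dictionary is a hypothetical stand-in for however the site looks up the preferred title for an article ID; it is not something from the question.

```python
import re

# Matches /a/<anything without a slash>/id/<digits>, with an optional trailing slash.
ARTICLE_URL = re.compile(r"^/a/[^/]+/id/(\d+)/?$")


def article_id(path):
    """Return the numeric ID as a string, or None if the path is not an article URL."""
    match = ARTICLE_URL.match(path)
    return match.group(1) if match else None


def canonical_path(path, slugs):
    """Map any variant of an article URL to its single preferred path.

    `slugs` is a hypothetical mapping of article ID -> preferred title segment.
    """
    aid = article_id(path)
    if aid is None or aid not in slugs:
        return None
    return "/a/{}/id/{}".format(slugs[aid], aid)
```

With `slugs = {"5272": "Cow_Cat"}`, every variant listed in the question resolves to /a/Cow_Cat/id/5272, which could then be used either as the 301 target or as the href of a canonical tag.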
If you do just want to use canonical tags, then you'll have to say what you're using code wise, but an example I use (so as not add tags to hundreds of individual pages, for instance) in PHP would be:
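// Build a canonical URL (without the query string) for the current request.
if (!empty($_SERVER["REDIRECT_URL"])) {
    $canonicalUrl = $_SERVER["SERVER_NAME"] . $_SERVER["REDIRECT_URL"];
} else if (!empty($_SERVER["REQUEST_URI"])) {
    // Keep only the part of the request URI before the first "?".
    $canonicalUrl = $_SERVER["SERVER_NAME"] . preg_replace('/^([^?]+)\?.*$/', '$1', $_SERVER['REQUEST_URI']);
}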
Here, the redirect URL is used if it's available, and if not the request URI is used. This code strips off the query string (the ?something=true part of http://www.mysite.com/a/blah/12345/?something=true). Of course you can extend this code to specify a custom path, not just remove the query string, by adjusting the regex.
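If your site is not PHP, the same query-string stripping can be done with a URL parser instead of a regex. A minimal Python sketch, using only the standard library (the function names are made up for this example):

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url):
    """Build a canonical link element for the given request URL."""
    href = escape(strip_query(url), quote=True)
    return '<link rel="canonical" href="{}">'.format(href)
```

Calling `canonical_link_tag("http://www.mysite.com/a/blah/12345/?something=true")` gives a tag whose href ends at /12345/.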

Is it possible to redirect domain to a case sensitive version of itself?

Is it possible to do a DNS redirect so that mywebsite.com would remap to MyWebsite.com?
This would be for purely cosmetic purposes on the domain name alone. I understand that the domain name will ultimately resolve to a lower case version, and that all characters following the TLD are best kept in lowercase.
What I'd like to achieve is simply maintaining MyWebsite.com/whatever-in-lower-case in the URL bar.
DNS is not case-sensitive, so this is not likely to be possible. You can modify your bookmarks/home page links/embedded URLs/whatever to contain the case you want, though, but I'm not sure it's really worth the effort...
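You can see the same case-insensitivity in most URL libraries, which normalise the host name to lower case. A small Python illustration (the `same_host` helper is just a name made up for this example):

```python
from urllib.parse import urlsplit


def same_host(url_a, url_b):
    """Compare two URLs by host name; urlsplit lowercases the hostname."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname
```

So `same_host("http://MyWebsite.com/whatever-in-lower-case", "http://mywebsite.com/whatever-in-lower-case")` is true: the mixed-case spelling only survives in places where you write it yourself, such as link text.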

Using one directory with different aliases using .htaccess

I'm redeveloping a site (replacing it with one based on CodeIgniter), which is currently a horrid mess of repeated procedural code, however, it has good search engine rankings. Because of this, I need to keep the exact same URL structure.
The company has many different quote pages, which are all essentially the same - so I've produced one clean version which can be used everywhere.
The quote system is now in a folder called /get-quote, but due to the old URLs being required, that folder mustn't be visible anywhere.
I'd like the following to happen, but don't know how to:
A user accessing /insurancequote.php should (on the server) load the /get-quote/ directory (which in turn will load the default CI route). The Base URL in CI should be http://www.mysite.com/insurancequote.php (I'm able to do that bit), so moving to step 2 would result in: http://www.mysite.com/insurancequote.php/step2 (which would map to /get-quote/step2).
Secondly, a user accessing /brokerquote.php should show mysite.com/broker in the address bar (redirect?), but on the server access /get-quote/broker.
Thirdly, a user accessing one of many broker-specific pages, e.g. mysite.com/brokername1.php or mysite.com/broker/brokername2.php (yep, they are scattered all over the place! - but I do know where each one is) should show mysite.com/broker/brokername1 or mysite.com/broker/brokername2. On the server, /get-quote/broker/brokername1 or /get-quote/broker/brokername2 should be accessed.
I don't think what I've written is completely clear, so maybe pseudocode helps:
If '/insurancequote.php'
    Dont Redirect
    Use '/get-quote/'
If '/brokerquote.php'
    Redirect '/broker/'
    Use '/get-quote/broker/'
// Do the following (manually) for each broker
If '/brokername1.php'
    Redirect '/broker/brokername1/'
    Use '/get-quote/broker/brokername1/'
If '/brokers/bname2.php'
    Redirect '/broker/brokername2/'
    Use '/get-quote/broker/brokername2/'
If '/mybrokerpage.php'
    Redirect '/broker/mybroker/'
    Use '/get-quote/broker/mybroker/'
Is this possible? If so, how would I go about doing it?
Thanks!
The risk you take is messing up your new clean code for historical reasons (and the next person to work on it will say, WTF, this is a mess!).
For me the right solution would be handling the URL migration in Apache and not in your application. Every referenced URL that you do not want to keep should get a 410 Gone response (think about referenced images, for example), and every referenced page that has a new matching page should get a 301 Moved Permanently redirect to the right page. Then, after some time has gone by, check the access log of your server, and if nobody requests the old URLs anymore, remove the rules.
If you know every old URL and every matching new URL, then use a map file (or a hash file, which is faster) and manage the 301 redirects with RewriteMap. You can have a really large number of entries in a hash file and lookups should still be fast. It should also be a temporary measure, while waiting for robots to update the URLs.
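To give an idea of the lookup logic, here is a small Python sketch of the old-to-new mapping. The redirect entries come from the pseudocode in the question; the `GONE` entry is a made-up example, and the helper names are hypothetical. The last function writes the mapping in the plain "key value" per line layout that a txt: RewriteMap file uses.

```python
# Old URL -> new URL, taken from the pseudocode in the question.
REDIRECTS = {
    "/brokerquote.php": "/broker/",
    "/brokername1.php": "/broker/brokername1/",
    "/brokers/bname2.php": "/broker/brokername2/",
    "/mybrokerpage.php": "/broker/mybroker/",
}

# Old URLs with no replacement (hypothetical example entry).
GONE = {"/images/old-banner.gif"}


def resolve(path):
    """Return (status, location) for a requested path."""
    if path in REDIRECTS:
        return (301, REDIRECTS[path])
    if path in GONE:
        return (410, None)
    return (200, path)


def to_rewritemap_txt(redirects):
    """Render the mapping as 'old new' lines, one pair per line."""
    return "\n".join("{} {}".format(old, new) for old, new in sorted(redirects.items()))
```

Keeping the mapping in one data file like this also makes it easy to delete the whole thing once the access log shows the old URLs are no longer requested.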
