How to deny USA users except Google bots? - .htaccess

At a friend's request, I had to build a website for three countries on three subdomains, like:
au.example.com
us.example.com
in.example.com
All three have common content and some unique content. All traffic coming from a particular country is redirected to the related subdomain.
My problem is Google indexing. Since all traffic from the USA is directed to us.example.com, Google's bots will index only the us.example.com subdomain. But there is a lot of other content at in.example.com, so how can I get Google to index my other two subdomains?
Thanks for your advice.

How are you doing your location-specific sorting?
What you'll have to do is add an exception, probably based on the user-agent.
However, Google has country- and language-specific search, so you really shouldn't worry about it: its AU search will crawl the AU site, and so on. If you want to allow Google to index the AU site for its US search... that might get you into a bit of trouble with Google (and honestly, it would defeat the purpose of what you are trying to achieve).
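For example, if the country redirect is done in .htaccess with mod_rewrite plus MaxMind's mod_geoip, a minimal sketch of such an exception might look like the following. This is an assumption about your setup: mod_geoip (which sets the GEOIP_COUNTRY_CODE variable once GeoIPEnable is on), the crawler list, and the 302 are all illustrative; only the subdomain names come from the question.

# Sketch: skip the geo-redirect for known crawlers so every
# subdomain stays crawlable (assumes mod_geoip is installed and
# GeoIPEnable On is set, so GEOIP_COUNTRY_CODE is available)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot|Slurp) [NC]
# Only redirect visitors whose GeoIP country is the US...
RewriteCond %{ENV:GEOIP_COUNTRY_CODE} ^US$
# ...and who aren't already on the US subdomain
RewriteCond %{HTTP_HOST} !^us\.example\.com$ [NC]
RewriteRule ^(.*)$ http://us.example.com/$1 [R=302,L]

Keep in mind that user-agent strings can be spoofed, and that treating crawlers differently from users can be read as cloaking, which is exactly the kind of trouble mentioned above.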

Related

What's the best way to use multiple languages on a website?

I was wondering what would be the best way to build a multi-language, template-based website. Say I want to offer my website in English and German; there are several different methods. My interest is mainly SEO, so which would be the best way for search engines?
The first way I often see is using different directories for each language, for example www.example.com for English and www.example.com/de/ for the German translation. The disadvantage of this is that when a file changes, it has to be changed in every directory manually. And wouldn't search engines treat the two directories as duplicate content?
The second way I know is using a GET value like www.example.com?lang=de and then setting a cookie. But this way search engines probably won't even find the different languages.
So is there another way, or which one is best?
I worked on internationalised websites until this year. The advice we always had from SEO gurus was to distinguish languages by URL - so, www.example.com/en and www.example.com/de.
I think this is also better for users; if I bookmark a page in German, then when I come back to it, I get a page in German even if my cookies have expired. Similarly, I can post the URL on Facebook and have my German-speaking friends click on it and get the site in German.
Note that if your site serves multiple countries, you should handle those along with language - so, you might have example.com/de-DE, example.com/en-GB, example.com/en-IE, etc.
However, this should not involve duplication. Instead, you should set your application up to process the URL, extract the locale information, and then forward the request internally to a locale-independent page. So, a request for example.com/de-DE/info and a request for example.com/en-IE/info should both be passed to /info.jsp (or, I'm guessing, info.php in your case). That page should then be coded to emit text in the appropriate language, using a page-level localisation mechanism.
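If the site runs on Apache, that internal forwarding can be sketched in .htaccess like this (the locale list and the info.php target are just the examples from this answer, not a prescription):

RewriteEngine On
# Pull the locale out of the first path segment and hand the request,
# internally, to one locale-independent script; [QSA] preserves any
# query string the original URL carried
RewriteRule ^(de-DE|en-GB|en-IE)/info$ /info.php?locale=$1 [L,QSA]

A request for /de-DE/info then reaches info.php with locale=de-DE and the page emits German text, while the URL the user and the search engine see never changes.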
Things are a bit trickier if you want the URLs themselves to be localised (e.g. example.org/de-DE/anmelden vs example.org/en-IE/sign-in). However, the same principle applies: extract the locale, then forward to a common page. The difference is that there must be more sophistication in figuring out what the page is from the URL; you will need a mapping from natural language in the URL to the page filename.
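As a sketch, that mapping can start as one rewrite per localised slug (the slugs and the sign-in.php target here are hypothetical):

# Map each natural-language slug onto the same underlying page,
# passing the locale along as before
RewriteRule ^de-DE/anmelden$ /sign-in.php?locale=de-DE [L,QSA]
RewriteRule ^en-IE/sign-in$ /sign-in.php?locale=en-IE [L,QSA]

Once there are more than a handful of pages, a RewriteMap (which has to be declared in the main server configuration, not in .htaccess) or an application-level routing table is easier to maintain.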

Expression Engine search problem

We are running sites with EE 1.6.8... Not funny, but my boss likes it...
So we implemented a search. Everything is fine, but the search URL looks like this:
/search/results/0374c6c40f159934bc6795f031c4e52f10/
instead of
/search/results/keyword
The developers said that only with a paid plugin can we put the keyword in the URL.
OMG.
Is it true?
And another question: after a few hours the search URL gives no results back. It seems that the session or the cookie expired or something.
I have two ideas:
1. Our developers are trying to fool me.
2. EE is just like that; it's not a CMS, just a CMS-like thing...
You are correct: the EE Search module uses session-based URLs for results. The reason is that search results are cached for performance, so those results need to expire after a short period of time (as new results might need to appear).
I assume what you want is bookmarkable search results. In this case, I suggest Super Search, or on the free, Google-powered end, the Google Search Results plugin.
I'm not 100% sure it would work, but in theory you could have www.example.com/search/results/keyword.
In your EE code you would put {exp:weblog:entries search:body="{segment_3}"}{title} etc.{/exp:weblog:entries}, as shown at http://expressionengine.com/legacy_docs/modules/weblog/parameters.html#par_search
The problem is when the keyword contains characters outside [a-z][0-9], which is worth considering (those would need to be URL-encoded).
We offer EvoPost for free on our website (http://www.eevolution.co.uk/index.php/addons/evopost), which will enable you to capture the keyword from an HTTP POST variable, e.g. search:body="{ep_txtboxname}".
Feel free to contact us through our website if you need any assistance with the product.
Thanks
Tim
EEvolution Developer

How to determine the number of domains of a specific country TLD?

As the title says,
let's say I want to get the number of .de domains:
Googling:
inurl:www.*.de
retrieves the correct results but a lot of them are from the same domain.
Is there another way to do this?
The better search query would be site:de.
But even so, Google's result count is just a very, very blurry page estimate (a.k.a. completely wrong and not what you are looking for).
Google is the wrong source for this.
But via Google I found this:
http://www.denic.de/hintergrund/geschichte-der-denic-eg.html
August 2009: 13 million domains registered under .de - including 463,000 IDNs.

Why and how does Googlebot use my website's search engine?

Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is Googlebot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for (they're surprisingly relevant)? Does it check the most popular keywords on the site? I know I seem to be answering my own question here, but I'm really only working it out from first principles. I'd like to hear from someone who knows what they're talking about (i.e. not me).
If your search form's method is GET instead of POST, each search has its own URL, and people might be posting those URLs elsewhere. Or if you have a (possibly inadvertently) publicly accessible web-stats page that lists those URLs, that's another common way for search engines to stumble upon your internal search URLs. A third way I've seen is sites that list recent searches on their pages, but that is more intentional. "MySQL Performance Blog" does this to an annoying extent, so any Google search of their site yields hundreds of pages of similar searches, even if none of the searchers found what they were looking for.
Edit: Looks like Google does fill in forms on occasion, but only GET forms:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
Google will enter words that occur on your site into search boxes, to try to find pages that it can't reach otherwise.
Google says that for the past few months, it has been filling in forms on a "small number" of "high-quality" web sites to get back information. What words has it been entering into those forms? Words automatically selected that occur on the site, with check boxes and drop-down menus also being selected.
http://searchengineland.com/google-now-fills-out-forms-crawls-results-13760

How does Google return "searches" from other websites?

Let's say I'm performing a Google search for search term.
Sometimes, one of the results will point to a URL like this: www.someothersearch.com/search+term/
How does "someothersearch.com" do this?
In general, a page will only be in Google if some other page links to it. Google is not going to go to someothersearch.com and submit "search term" into the form; there is likely a hidden or visible link to that URL on someothersearch.com.
Why not? someothersearch.com presumably has its own index pages for terms searched previously; the Google spider is just indexing those index pages as well.
Just a guess. Maybe these sites support OpenSearch?
I misunderstood your question at first. What these sites are doing is rewriting their requests. How they know which terms people will search for is a bit of a mystery to me, but it probably relies on things like watching google.com/trends, scraping their own and other log files for Google referrals that include the search term, buying lists of well-ranking terms that people might otherwise target with AdSense and trying to generate organic traffic for them instead, etc. Probably, when they add new pages with these terms, they also add them to the XML sitemap that Google will crawl.
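As a sketch of the rewriting part, assuming an Apache host and a hypothetical search.php script (neither is confirmed by the question):

RewriteEngine On
# Serve the crawlable /search+term/ URLs from an ordinary
# query-string search script, without an external redirect
RewriteRule ^([^/]+)/$ /search.php?q=$1 [L]

A real site would add conditions so this rule doesn't swallow every other top-level URL, but it shows why someothersearch.com/search+term/ can exist as a plain, linkable, indexable page.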
Redacted:
I have added the open-search tag to your question; please follow it. You'll find this post the most informative: https://stackoverflow.com/questions/20830/firefox-and-ie7-users-here-is-your-stackoverflow-search-plugin; however, I recommend you use image/png for your icon format.
