How come Google crawls some sites in real time? [closed]

I posted some source code on CodePlex and to my surprise found that it appeared on Google within 13 hours. Also, when I made some changes to my account on CodePlex, those changes were reflected on Google within a matter of minutes. How did that happen? Is there some extra importance that Google gives to sites like CodePlex, Stack Overflow, etc. to make their pages appear in the search results so fast? Are there some special steps I can take to make Google crawl my site somewhat faster, if not this fast?

Google prefers some sites over others. There are a lot of magic rules involved; in the case of CodePlex and Stack Overflow we can even assume that they have been manually put on some whitelist. Google then subscribes to the RSS feeds of these sites and crawls them whenever there is a new post.
Example: Posts on my blog are included in the index within minutes, but if I don't post for weeks, Google just passes by every week or so.


Probably (and you would have to be an insider to know for sure) if they find enough changes from crawl to crawl, they narrow the window between crawls, until sites like popular blogs and news outlets are being crawled every few minutes.

For popular sites like stackoverflow.com, indexing occurs more often than normal; you can notice this by searching for a question that has just been asked.

It is not well known, but Google relies on pigeons to rank its pages. Some pages have particularly tasty corn, which attracts the pigeons' attention much more frequently than other pages.

Actually... popular sites have certain feeds that they share with Google. The site updates these feeds, and Google updates its index when the feed changes. For other sites that rank well, search engines crawl more often, provided there are changes. True, it's not public knowledge, and even for the popular sites there are no guarantees about when newly published data appears in the index.

Real-time search is one of the newest buzzwords and battlegrounds in the search engine wars. Google's and Bing's announced Twitter integrations are good examples of this new focus on super-fresh content.
Incorporating fresh content is a real technical challenge and priority for companies like Google, since one has to crawl the documents, incorporate them into the index (which is spread across hundreds or thousands of machines), and then somehow determine if the new content is relevant for a given query. Remember, since we are indexing brand-new documents and tweets, these things aren't going to have many inbound links, which is the typical thing that boosts PageRank.
The best way to get Google/Yahoo/Bing to crawl your site more often is to have a site with frequently updated content that gets a decent amount of traffic. (All of these companies know how popular sites are and will devote more resources to indexing sites like stackoverflow, nytimes, and amazon.)
The other thing you can do is make sure that your robots.txt isn't preventing spiders from crawling your site as much as you want, and submit a sitemap to Google/Bing/Yahoo so that they will have a list of your URLs. But be careful what you wish for: https://blog.stackoverflow.com/2009/06/the-perfect-web-spider-storm/
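For reference, a minimal robots.txt that allows full crawling and advertises a sitemap might look like this (example.com is a placeholder):

```
# robots.txt at https://example.com/robots.txt (hypothetical domain)
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```

An empty Disallow line means nothing is blocked; the Sitemap line tells crawlers where to find your list of URLs.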

Well, even my own blog appears in real time (it's PageRank 3, though), so it's not such a big deal, I think :)
For example, I just posted this and it had already appeared in Google 37 minutes later (maybe it was real-time, as I didn't check before):
http://www.google.com/search?q=rebol+cgi+hosting

Related

.htaccess and URL for multilingual site [closed]

I'm ready to use the subdirectory format for my multilingual website.
My first question is:
For SEO, do I have to translate the page name in the URL, or is it useless?
Example:
- Same filename
site.com/fr/login
site.com/en/login
OR
- Different filename
site.com/fr/connexion
site.com/en/login
Then, when the user is on site.com: should I redirect him to site.com/en or site.com/fr depending on the user's IP? Or should I set a default locale and have my URLs like site.com/page and site.com/fr/page?
Finally, what is the best way to get the locale from the user's current URL?
Parsing the URL to get /fr or /en, or adding a GET parameter to the URL with lang=fr (hidden with .htaccess)?
Thanks :)
As a precondition, I assume that you are not using frameworks / libraries. Furthermore, I have never solved similar problems using only .htaccess (as the title of your question requests) and thus don't know if it is possible to do so. Nevertheless, the following guidelines may help you.
Your first question
In general, a web page's file name and path have influence on its ranking. Furthermore, having page names and paths in native languages might help your users memorize the most important of your URLs even without bookmarking them.
Nevertheless, I would never translate the page names or directories of pages which are part of a web application (as opposed to informational or promotional pages).
The login page you mentioned is a good example. I am nearly sure that you do not want your site to be found because of the contents of its login page. Actually, there are many websites which exclude login pages and other application pages from being indexed at all.
Instead, in SEO terms, put your effort into your promotional and informational pages. Provide valuable content, explain what is special about you or your site, and do everything you can to get those pages properly indexed. IMHO, static HTML pages are the best choice for doing so.
Furthermore, if you translate the names of pages which belong to your actual application, you will run into massive trouble. For example, after a successful login, your application probably will transfer the user to his personal dashboard, which probably will be based on another HTML template / page. If you have translated that page name into different languages, then your application will have to take care to send the user to the right version. Basically, that means that you need as many versions of your application as languages you want to support. Of course, there are tricks to make life easier, but this will be a constant pain and definitely in no way worth the effort.
To summarize: Create static pages which show your USP (unique selling proposition) and provide valuable content to users (for example, sophisticated tutorials and so on). Translate those pages, including names and paths, and SEO them in every way you can. But regarding the actual application, optimizing its pages is kind of pointless and even counterproductive.
Your second question
I would never use IP based redirecting for several reasons.
First, there are many customers in countries which are not their home country. For example, do you really want to redirect all native English speakers to your Hungarian pages because they are currently in Hungary for a business trip?
Second, more and more users today are using VPNs for different reasons, thereby often hiding the country where they currently are.
Third, which IP address belongs to which provider or country is highly volatile; you would have to constantly update your databases to keep up.
There are more reasons, but I think you already have got the idea.
Fortunately, there is a solution to your problem (but see "Final remark" below): Every browser, when fetching a page from a server, tells the server the preferred and accepted languages. For example, Apache can directly use that information in RewriteRule statements and redirect the user to the correct page.
If you can't alter your server's configuration, then you can evaluate the respective header in your CGI program.
When doing your research, look for the Accept-Language HTTP 1.1 header. A good starting point probably is here.
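As a rough sketch of that Apache approach (paths and the temporary 302 status are assumptions; adapt to your setup), an .htaccess using the Accept-Language header could look like this:

```
# Hypothetical .htaccess sketch: redirect requests for the site root
# to /fr/ or /en/ based on the Accept-Language request header.
RewriteEngine On

# Browsers preferring French (e.g. "fr-FR,fr;q=0.9,...") go to /fr/
RewriteCond %{HTTP:Accept-Language} ^fr [NC]
RewriteRule ^$ /fr/ [R=302,L]

# Everyone else gets the English default
RewriteRule ^$ /en/ [R=302,L]
```

A 302 (rather than 301) is used so that browsers don't permanently cache a language decision that depends on the request headers.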
Your third question
You may be mixing up two different things in your third question: a locale is not the same as a language. On the one hand, you are asking "...to get the locale from...", and on the other hand, you say "...lang=fr...", giving the impression that you want to get the language.
If you want to get the language: See my answer to your second question, or parse the language from the current path (as you already have suggested).
If you want to get the locale, things are more complicated. The only reasonable automatic method is to derive the locale from the language, but this will often fail. For example, I generally prefer the English language when doing research, but on the other hand, I am located in Germany and thus would like dates and times in the format I am used to, so deriving my desired locale from my preferred language will fail.
Unfortunately, there is no HTTP header which could tell the server which locale the user prefers. As a starting point, this article may help you.
See the final remark (next section) on how to solve this problem.
Final remark
As the article linked above already states: The only reliable way to satisfy the user is to let him choose his language, his locale and his time zone within your application. You could store the user's choices either in cookies or in your back-end database; each has its own advantages and disadvantages.
I usually use a combination of all methods (HTTP headers, cookies, database) in my projects.
Think about humans first. Is URL translation important for users in France? Some people may think it's fine to get translated words in the URL; users from other locales may think otherwise. Search engines take user behavioral factors into account, so your SEO will benefit if your solution is more convenient for users.
It would be nice if users got the language version they expect. A site can help them by suggesting a language version based on IP, HTTP headers, cookies, and so on. But some people may prefer another language, and some people may be on a trip, so it's still important to let them choose a language version manually.
Please read the manuals and analyze competitors' sites when in doubt.
Most websites I see use URLs like site.com/en and site.com/fr, as you mention, but it's up to you how you want to present the website to the user. I prefer to make site.com/en the default and give the user the option to select his language.
If you are still confused, then refer to the link below; it will be useful.
See the referral link here.
Should you translate paths?
If possible, by all means - as this will help users of that language to feel like "first class citizens". Login routes probably won't have much impact on SEO, but translating URLs on content pages may well help them to be more discoverable.
You can read Google's recommendations on multi-regional and multilingual sites, which state that it's "fine to translate words in the URL".
Should you redirect based on IP?
This can help first time users, but there are a few things to bear in mind:
How accurate will this be? E.g. if I speak English but I visit France and then view your site - I will get French. If your target market is mobile-toting globe-trotters, then it may not be the best choice. Is looking at Accept-Language, for example, any better?
Will geolocating the IP address on every request introduce any performance problems for your servers, or make too many calls to an external geocoding service? Make sure to carry out capacity planning, and reduce calls where you already know the locale (e.g. from a cookie, or the user explicitly stating their preference).
Even if you guess a preferred locale, always allow an explicit user preference to override that. There's nothing more frustrating than moving between pages, and having the site decide that it knows better what language you understand :-)
As long as you make it easy to switch between sites, you shouldn't need a specific landing page. It doesn't hurt to pop up a banner if you're unsure whether you should redirect a user (for example, amazon.com will show a banner to a UK user, giving them the option of switching sites - but not deciding for them)
How should you structure your URLs?
Having the language somewhere in the URL (either as a subdomain or a folder) is probably best for SEO. Don't forget to update your sitemap as well, to indicate to crawlers that there are alternate content pages in different languages.
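For example, a sitemap entry declaring the French and English versions of the question's login page as alternates of each other could look like this (URLs reused from the question):

```
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://site.com/en/login</loc>
    <!-- each language version lists all alternates, including itself -->
    <xhtml:link rel="alternate" hreflang="en" href="https://site.com/en/login"/>
    <xhtml:link rel="alternate" hreflang="fr" href="https://site.com/fr/connexion"/>
  </url>
</urlset>
```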

Change of domain and company name but keep good SEO [closed]

I'm just after a bit of advice, please.
I have a company that has rebranded recently. There are currently two separate domains set up for them, where the content is fundamentally the same apart from where the company name is mentioned.
The newer site has therefore been created with the new branding, with the intention of taking down the old site at some point. However, I've always been reluctant to do this, as the old site does very well for particular keywords (probably because of its age).
I've read a few things but just wanted to ask: what is the best way to go about decommissioning the old site? Is it a case of setting up 301 redirects? If the original domain ceases to exist, will these still be read?
Thanks
I think 301 redirects are definitely the best place to start off – it is an easy way of letting the Google spiders know that they should travel to your new site instead, which means you will still have the benefits of the old site for keywords, but they will move to the new content you are setting up.
But the downside of this, of course, is that if you completely take down the old site, you get nothing; the same goes if you don't maintain its SEO.
This ends up being a lot of hassle, so what we tend to do is go through an overlap period of a few months so the new site can become better established, and then remove the old one.
While you are doing that, you want to be moving your links over too – so contact webmasters and get them on board with the move so that you can keep all that 'link-juice' flowing.
Ultimately though, the age of your website does have a bit of an impact on your SEO, but if you are starting from scratch with the new one, you can craft it with SEO in mind and make it more attuned to it right from the outset.
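As a minimal sketch of the 301 approach described above (domain names hypothetical, and assuming Apache with mod_rewrite), the old site's .htaccess could redirect every URL to the same path on the new domain:

```
# Hypothetical .htaccess on the old domain: permanent redirect of every
# request to the same path on the new domain.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?old-company\.com$ [NC]
RewriteRule ^(.*)$ https://new-company.com/$1 [R=301,L]
```

Note that these redirects only work while the old domain is still registered and pointing at a server; once it ceases to exist, nothing is left to answer with a 301.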
If you fail to implement proper redirects when migrating a website from one domain to another you will instantly lose any traffic to your website you currently enjoy.
It is also important to convey your old domain’s search engine rankings to the new web address – or your website will effectively be starting from zero visibility, in search engines at any rate.
Buying old domains and redirecting them to similar content shouldn’t be a problem. Even when redirecting pages, I usually make sure page titles and content are very similar on both pages if I want to keep particular rankings.
301 Redirects are an excellent 1st step. When search spiders crawl your old site, they will see & remember the redirect. Because of this, the relevance of your old site to certain search terms will be applied to your new site.
Great. Now that you've handled traffic coming in from search engines, what do you do about other sources of traffic? Remember that 301 Redirects will only work on non-search-engine traffic for as long as you maintain the old site...
Next you'll want to contact the webmasters of any sites that link to your old site and inform them of the change. This way, when you retire the old site, their links don't go dead, losing you traffic. Keep an eye on the "Referrer" field in your logs to see who is currently linking to you.
Lastly, you'll want to keep the old site doing redirects for a while longer so that folks who have bookmarked your old site will have the redirect cached by their browser. "How long?" you ask... Well I'd keep an eye on the web logs of the old site. When the non-spider traffic drops off to near 0, you'll know you've done your job right...
301 is the best way I could think of.

Googlebot and rel="nofollow": how long until it stops following?

I just added rel="nofollow" to some links.
Does anyone know how long it takes for Google to stop following a link after "nofollow" is added to it?
I added it an hour ago and still see them crawling the "nofollow" links.
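For reference, the attribute in question sits on the anchor tag (URL hypothetical):

```
<a href="https://example.com/some-page" rel="nofollow">some link</a>
```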
It might be the case that it won't stop following your rel="nofollow" links. According to Wikipedia:
Google states that their engine takes "nofollow" literally and does not "follow" the link at all. However, experiments conducted by SEOs show conflicting results. These studies reveal that Google does follow the link, but does not index the linked-to page, unless it was in Google's index already for other reasons (such as other, non-nofollow links that point to the page).
From Google's Webmaster Central:
Google's spiders regularly crawl the web to rebuild our index. Crawls are based on many factors such as PageRank, links to a page, and crawling constraints such as the number of parameters in a URL. Any number of factors can affect the crawl frequency of individual sites.
Our crawl process is algorithmic; computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. We don't accept payment to crawl a site more frequently. For tips on maintaining a crawler-friendly website, please visit our webmaster guidelines.
Using Google Webmaster Tools, you can see the last time it crawled your website, and if the links are still showing in the searches, the results may be conflicting, as per @Bears' post.
That depends on many things. Google's databases are massively distributed, so it may take a while for the change to propagate. Also, it may take the crawler some time to revisit the page where you added the nofollows – again, this is computed by some closed Google algorithm. In the worst cases, I've seen tens of days pass without the indexes getting updated; in the best case, a few minutes. The mode would be a few days. Be patient, young Jedi ;)
I am keeping track of around 7000 pages and how Google visits them. Yes, it keeps following the pages even after I add the nofollow attribute, but only for a while: it will crawl the same page a couple of times before it finally removes it. So it'll take time.

How does a website build popularity? [closed]

So, my question is: how do people get the word out that their website or blog exists? Do bloggers invest in ads? Is it just through word of mouth, or searching Google? I'm just curious how a website builds its popularity. Do you just put your website up on the web and hope people find it? I know you can make your site SEO-friendly, create sitemaps and such, but what other techniques are used?
Thanks,
John
The big thing is: build a good site! Have good-quality, relevant content. SEO and page linking will help. Most search traffic comes from Google, IMHO. I would suggest
http://www.google.com/webmasters/start
Submitting a sitemap would be high on my to-do list.
Also use relevant and unique page titles, friendly URLs, and relevant H1 tags, as in the sketch below.
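A rough illustration of that advice (all names made up):

```
<!DOCTYPE html>
<html>
<head>
  <!-- unique, descriptive title per page -->
  <title>Hand-made Blue Widgets | Acme Widget Co</title>
</head>
<body>
  <!-- one relevant H1 that matches what the page is about -->
  <h1>Hand-made blue widgets</h1>
  <p>Page content goes here.</p>
</body>
</html>
```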
Hope that helps
My blog has been running for about a year and a half. I tried some tricks or tips on promotion that I read from around the Internet. Sure, you can get some activity bursts from promotion on other places, but I've found that the number one factor for a blog is simply to have good quality content. You can trick people into visiting with clever reddit or digg titles, but they'll never turn into repeat visitors. With quality posts, the search engine and referrer inflow will be steady.
If you only blog for popularity or money, and don't really care about putting out worthwhile content, it will show, and the people will not visit your site. I changed pretty early on from quantity over quality to quality over quantity. After all, ask yourself: wouldn't you rather subscribe to a blog that gave you a great read once a month rather than a blog that flooded your reader inbox with shallow, forced posts?
Among hobbyists, the usual approach is to make a polite announcement on related forums, and when a subject comes up on a forum or blog that you have addressed on your blog or web site, include a link as a part of your response.
Among professionals, advertising, advertising, advertising.
This thread at Hacker News is a good starting point.
Well, this site was started by Jeff Atwood and Joel Spolsky (if I'm correct), so they both mentioned it on their blogs. Then they asked for people on those blogs to join the beta version. By the time the live version came out, people were already here. Then... wait for word of mouth to spread. If your site is great, they will come.
Join the ASP, read and participate in their forums... I learned a lot from them and highly recommend them. Ignore the politics.
I used Myspace as my own free advertising engine; perhaps a bit amoral, but it did the trick till I caught my server on fire.
Having a community of users is very important in having a popular site. Typically your site would have a message board of some sort where users could interact with each other. Having a large source of reference information is also important. Once you have your site up, you need to go out and promote it. It takes time; be patient, but never give up promoting it.

How to collect customer feedback? [closed]

What's the best way to close the loop and have a desktop app "call home" with customer feedback? Right now our code will log in to our SMTP server and send me some email.
The site GetSatisfaction has been an increasingly popular way to get customer feedback.
http://getsatisfaction.com/
GetSatisfaction is a community-based site that builds a community around your application. Users can post questions, comments, and feedback about an application and get answers to their questions either from other members or from members of the development team themselves.
They also have an API so you can incorporate GetSatisfaction into your app and/or your site.
I've been playing with it for a couple of weeks and it is pretty cool. Kind of like stackoverflow, but for customer feedback.
Feedback from users and programmers is simply one of the most important parts of development, in my opinion. The whole web 2.0 / beta concept is more or less built around it, and therefore there should be absolutely no pain involved whatsoever for the user. What does this have to do with your question? Quite a bit, I think.
If you provide a feedback option, make it visible in your application, but don't annoy the user (like MS sometimes does with their feedback thingy on their website, above all other elements!). Place it somewhere directly visible, but discreet. What about a separate menu entry, or some leftover space in the status bar? Put it there so it is accessible all the time. Why? People who really like your product, or who are REALLY annoyed about something, will probably find your feedback option in any case, but you will miss the small things. Imagine a user unsure about the value of his input: "should I really write to them?". He probably will not make the effort of searching, and in the end these small things make a really outstanding product, don't they?
OK, the user found your feedback form, but how should it look, and what's next? Keep it simple: don't ask him dozens of questions or provoke him with checkboxes and radio buttons. Give him two input fields, one for a title and one for a long description, no more and no less. Maybe add a small text briefly telling him what information might be useful (OS, program version, etc., maybe his email), but leave all this up to him.
How do you get the message to you, and how do you show the user that his input counts? In most cases this is simple: like levand suggested, use HTTP and post the comment to a private area on your site, and provide a link to his input. After reviewing his input, make it public and accessible to all (if possible). There he can see your response and that you really care, etc.
Why not use the mail approach? What about a firewall preventing him from reaching your mail server? Due to spam, these ports are closed by default in quite a few modern routers, and you certainly will not get any response from workers in bigger companies; however, port 80 or 443 is often open (maybe you should check whether the current browser has a proxy configured, and use that one).
Although I haven't used GetSatisfaction yet, I somewhat disagree with Nick Hadded, because you don't want third parties to have access to possibly private and confidential data. Additionally, you want "one face to the customer" and don't want to open up your customer base to someone else.
There is SOO much more to tell, but I don't want to get banned for tattling... haha! THX for caring about the user! :)
You might be interested in UseResponse, an open-source (yet not free) hosted customer feedback / idea-gathering solution that will be released in December 2011.
It should run on the majority of PHP hosting environments (including shared ones), and according to its authors it has absorbed only the best features of its competitors (mentioned in other answers) while having few to none of their flaws.
You could also have the application send an HTTP POST request directly to a URL on your server.
What we are forgetting here, my friend, is that merely having a form on your website is not enough to convince users of how much effort a company puts into acting on that precious feedback.
A user's note to a company is a true image of the product or service it offers. In web 2.0 culture, people feel proud to be part of the continuous development strategy preached by almost all companies nowadays.
A community engagement platform is the need of the hour: an entry point on your website that gains enough traction from visitors to get them talking about what they feel, leaving no stone unturned in gathering that precious feedback. That's where products like GetSatisfaction, UserRules, or Zendesk come in.
A company's active community, with fresh ideas, unresolved issues, and of course testimonials, speaks to the quality of the development strategy behind the product or service it offers.
Personally, I would also POST the information. However, I would send it to a PHP script that would then insert it into a MySQL database. This way, your data can be pre-sorted and pre-categorized for analysis later. It also gives you the potential to track multiple entries by single users.
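A minimal sketch of that receiving script, assuming a hypothetical feedback table and placeholder credentials:

```
<?php
// feedback.php (hypothetical): receives the POSTed feedback and stores it
// in MySQL. Assumed schema: feedback(id, title, body, created_at).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$title = $_POST['title'] ?? '';
$body  = $_POST['body']  ?? '';

// A prepared statement keeps user-supplied text from breaking the query.
$stmt = $pdo->prepare(
    'INSERT INTO feedback (title, body, created_at) VALUES (?, ?, NOW())'
);
$stmt->execute([$title, $body]);

http_response_code(204); // tell the desktop app the feedback was stored
```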
There are quite a few options. This site makes the following suggestions:
http://www.suggestionbox.com/
http://www.kampyle.com/
http://getsatisfaction.com/
http://www.feedbackify.com/
http://uservoice.com/
http://userecho.com/
http://www.opinionlab.com/content/
http://ideascale.com/
http://sparkbin.net/
http://www.gri.pe/
http://www.dialogcentral.com/
http://websitechat.net/en/
http://www.anymeeting.com/
http://www.facebook.com/
I would recommend just using pre-built systems. It saves you the hassle.
Get an Insight is good: http://getaninsight.com/
