Why would you stop Google from indexing pages in your website? [closed] - google-index

I've read some articles on how to stop indexing, but I'm not clear WHY you would actually want to do that.
1) The explanation I found for why was:
"For marketers, one common reason is to prevent duplicate content (when there is more than one version of a page indexed by the search engines, as in a printer-friendly version of your content) from being indexed.
Another good example? A thank-you page (i.e., the page a visitor lands on after converting on one of your landing pages). This is usually where the visitor gets access to whatever offer that landing page promised, such as a link to an ebook PDF." [Basically you don't want the user to find your Thank You page with freebies through search without signing up]
However, in both these cases preventing indexing actually seems like a bad idea. Wouldn't you rather just redirect to the sign-in page (in the second example) once a user finds the page through search? At least the user would still be able to reach your website.
2) It's also mentioned that indexing is not the same as appearing in Google search results, but it's not really clear what the difference is. Could someone enlighten me?
TIA.

Let me provide a few good reasons from my experience, though I believe many more exist.
Traditionally, the primary reason is to save computing resources. Imagine a search engine: it probably would not want another search engine to index all of its result pages, and it certainly would not want to index itself, which could take a very long time. The same applies to binary data that contains no text.
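As a concrete illustration, the usual way to keep crawlers away from such pages is a robots.txt rule. Here is a minimal sketch, assuming (hypothetically) that the internal result pages live under a /search/ path:
# robots.txt at the site root; /search/ is just an example path
User-agent: *
Disallow: /search/
Strictly speaking, robots.txt blocks crawling rather than indexing, but for the resource-saving purpose that is exactly what you want.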
Your first example somewhat falls into this category:
"For marketers, one common reason is to prevent duplicate content (when there is more than one version of a page indexed by the search engines, as in a printer-friendly version of your content) from being indexed.
But this is no longer considered a valid reason, as the resource cost is generally low, and proper disambiguation should be done with HTML metadata such as:
<link rel='canonical' href='<permanent link>'>
<link rel='alternate' media='print' href='<printer-friendly link>'>
Another big reason to prevent indexing is privacy. For example, Facebook profiles are not indexed if the owner chooses so.
Another good example? A thank-you page (i.e., the page a visitor lands on after converting on one of your landing pages). This is usually where the visitor gets access to whatever offer that landing page promised, such as a link to an ebook PDF." [Basically you don't want the user to find your Thank You page with freebies through search without signing up]
This falls into the privacy category. An even more striking example: a search engine once indexed a set of such "thank you" pages from a mobile operator's website, and those pages also included the messages that had been sent.
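For completeness, a minimal sketch of how such a page is usually kept out of the index: a robots meta tag in the page's head, or, for non-HTML files like the ebook PDF, the equivalent X-Robots-Tag response header.
<!-- in the <head> of the thank-you page -->
<meta name="robots" content="noindex">
# equivalent HTTP response header, e.g. for the PDF itself
X-Robots-Tag: noindex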
One other reason I have observed is general newbie paranoia. It is a bad reason, because that kind of paranoia would be much better addressed with HTTP authentication.
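A minimal sketch of that alternative on Apache, assuming a hypothetical protected directory and an existing password file (the paths are placeholders):
# .htaccess in the directory you want to keep private
AuthType Basic
AuthName "Private area"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user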

Related

Should I let non-members add comment to a post? [closed]

As long as it's SQL-injection proof, would it be all right for me to let non-members add comments to a post and give the author the ability to delete them?
Before you do it, consider the following questions (and any other questions specific to your project that may spring to mind):
Do you have a good rate-limiting scheme set up so a user can't just fill your hard drive with randomly-generated comments?
Do you have a system in place to automatically ban users / IP addresses who seem to be abusive?
Do you have a limit on the number (and total kilobytes) of comments loaded per page, so someone can't fill a page with comments, making the page take forever to load or making it easy to DoS you with lots of requests for that page?
Is it possible to fold comments out of sight on the webpage so users can easily hide spammy comments they'd rather not see?
Is it possible for legitimate users to report spammy comments?
These are all issues that apply to full members too, of course. But they matter even more for anonymous users: since anonymous posting is low-hanging fruit, a botmaster is more likely to target it. The main thing is simply to ask yourself: "If I were a skilled programmer who hated this website, or wanted to make money from advertising on it, and I had a small botnet, what is the worst thing I could do to this website using anonymous comments, given the resources I have?" That is a tough question, and the answer depends a great deal on what else you have in place.
If you do it, here are a few pointers:
HTML-escape the comments when you fetch them from the database, before you display them; otherwise you're open to XSS (see the sketch after this list).
Make sure you never run any eval-like function on the input the user gives you (this includes printf: to print user input you'd want to stick with printf("%s", userStr); so printf doesn't try to interpret userStr itself. If you care about why that's an issue, google for Aleph One's seminal paper on stack smashing),
Never rely on the size of the input to fall within a specific range (even if you check this in JavaScript; in fact, especially if you only try to ensure this in JavaScript), and
Never trust that anything about the content will be what you expect (make no assumptions about character encoding, for example; remember, a malicious user doesn't need to use a browser and can craft their requests however they want).
Default to paranoia. If someone posts 20 comments in a minute, ban them from commenting for a while. If they keep doing it, ban their IP. If they're a real person and they care, they'll ask you to undo it. Plus, if they're a real person with a history of posting 20 comments a minute, chances are pretty good those comments would be improved by some time under the banhammer; no one's that witty.
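Here is a minimal sketch of the escaping and rate-limiting ideas in Python; the limits and names are made up for illustration, and the in-memory store is a stand-in (a real site would keep the counters somewhere shared, like a database or Redis):
import html
import time
from collections import defaultdict, deque

MAX_COMMENTS = 20      # comments allowed per window (made-up limit)
WINDOW_SECONDS = 60    # length of the sliding window

recent = defaultdict(deque)   # per-IP timestamps of recent comments

def allow_comment(ip):
    """Return True if this IP is still under the rate limit."""
    now = time.time()
    q = recent[ip]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()               # forget comments older than the window
    if len(q) >= MAX_COMMENTS:
        return False              # over the limit: reject (or ban) for a while
    q.append(now)
    return True

def render_comment(raw):
    """HTML-escape user-supplied text so it cannot inject markup or scripts."""
    return html.escape(raw)

print(render_comment('<script>alert(1)</script>'))   # &lt;script&gt;alert(1)&lt;/script&gt;
print(allow_comment('203.0.113.7'))                  # True until the limit is hit
Per-IP limits are crude (shared NATs, botnets), so treat this as a first line of defence rather than the whole answer.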
Typically this kind of question depends on the type of community, as well as the control you give your authors. Definitely implement safeguards and a verification system (e.g. CAPTCHA), but this is something you'll have to gauge over time more often than not. If users are well-behaved, then it's fine. If they start spamming every post they get their hands on, then it's probably time for a feature like that to go away.

Are these tasks all doable in SharePoint 2010? [closed]

I'm not all that familiar with SharePoint. A client of ours asked if the following can be set up in SharePoint. I believe it is all achievable; however, he had a few questions, which I've included at the bottom. Here's the description:
The client wants to catalog all of his images in SharePoint. These images are used for marketing, annual reports, etc. Here are some features they need:
We’ll set up a subsite and make this guy an admin. He can edit a couple of group memberships to define who has full access and who has read-only access.
Let him upload pictures…this is a photo library. Probably in a document library. He’ll need metatags, or custom fields. Description, expiration date, some others.
Give them some views grouping by some of this metadata. Like country.
Send out a weekly report of images nearing expiration.
When images have expired, delete them automatically
General search that will search all metatags and return hits
And here are his questions:
Couple of questions (not sure if these are possible):
- They would like to have a low quality image with a watermark over top of it for read only people. And they would have to click to ask for permission for the full version. The manager would get an email when this permission is asked for. Not sure what is the easiest way after that. Maybe the manager clicks something that will email the full image to that person. If this is doable, write up for me how it would work. So people with full permission see the full image, people with read only see the watermark version.
Is it possible to have it search by only one field, like country, or give them the choice to do a general search across all fields?
In SharePoint, is it possible to show a thumbnail image in the list of pictures? So if they search and get 10 results, they see the thumbnails and don't have to click on each one just to see a basic picture.
Are these all doable in sharepoint?
Thanks
Let him upload pictures…this is a photo library. Probably in a document library. He’ll need metatags, or custom fields. Description, expiration date, some others.
Give them some views grouping by some of this metadata. Like country.
Send out a weekly report of images nearing expiration.
When images have expired, delete them automatically
General search that will search all metatags and return hits
Everything in the first section SharePoint provides out of the box. The weekly email may be the hardest part, but even that is likely a simple timer job.
a low quality image with a watermark over top of it for read only people
Showing different images based on user security may be tricky. Each item in a library can have its own security, but that can be hard to maintain and can slow down performance, so I would recommend storing them in two lists: one for the watermarked images and one for the full images. Linking the two is easy.
Is it possible to have it search by only one field, like country, or give them the choice to do a general search across all fields?
Searching on one field and general search are also provided out of the box, and you can create custom pages with any kind of search you could need.
In SharePoint, is it possible to show a thumbnail image in the list of pictures? So if they search and get 10 results, they see the thumbnails and don't have to click on each one just to see a basic picture.
I know the 2013 search provides a preview but I do not know about 2010.

Redirects vs. "true" page hits : A Crawler's perspective [closed]

Background:
Domains such as bit.ly, ow.ly, instagr.am, and gowal.la are URL shorteners which forward elsewhere. Since most of these URLs actually forward to other, third-party sites, I'm assuming they can handle a pretty heavy load.
Question:
Is there a different politeness metric when crawling 301 redirects from a single domain (e.g. ow.ly), compared with crawling "real" content pages (e.g. blogger.com)?
More concretely: how many times a day could we expect to be able to hit a site that issues 301 redirects, compared with a normal site that serves real content?
Some initial thoughts:
My initial guess would be around 10^6 (1,000,000) hits per day, given that what I see online suggests that hitting a mature site on the order of 10^3-10^5 times a day is not a huge issue, considering that a large site like Tumblr receives around 10^7 (10,000,000+) views per day, and sites like Google are on the order of 10^8-10^9 (hundreds of millions to billions) of views per day.
In any case, I hope this very raw bit of fact-finding will spur some thoughts on how to define the difference in "politeness" metrics when discussing 301 redirects versus "true" page crawls (which are bandwidth-intensive).
When in doubt, check robots.txt. There's a non-standard extension called Crawl-delay which, as you might imagine, specifies how many seconds to wait between requests.
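For example, a site that wanted crawlers to pause ten seconds between requests might serve a robots.txt like this (the value is made up):
User-agent: *
Crawl-delay: 10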
You mentioned bit.ly; their robots.txt has no such restrictions, and a human-friendly comment saying "robots welcome". As long as you are not abusive, you probably won't have a problem with them. There are also comments in there stating that they have an API. Using that API may be more useful than crawling.
As for defining abusive... well, unfortunately that's a very subjective thing, and there's not going to be any one right answer. You'd probably need to ask each specific vendor what their recommendations and limits are, if they don't provide this information through documentation on their site, robots.txt, or through an actual API, which itself may have well-defined access limits.

Cursors + Pagination & SEO

I would like to know if it's possible to paginate using cursors and keep those pages optimized for SEO at the same time.
/page/1
/page/2
Using offsets gives Googlebot some information about the depth; that's not the case with cursors:
/page/4wd3TsiqEIbc4QTcu9TIDQ
/page/5Qd3TvSUF6Xf4QSX14mdCQ
Should I just use them as a parameter instead?
/page?c=5Qd3TvSUF6Xf4QSX14mdCQ
Well, this question is really interesting and I'll try to answer your question thoroughly.
Introduction
A general (easy to solve) con
If you are using a pagination system, you're probably showing, for each page, a snippet of your items (news, articles, pages and so on). Thus, you're dealing with the famous duplicate content issue. In the page I've linked you'll find the solution to this problem too. In my opinion, this is one of the best things you can do:
Use 301s: If you've restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.)
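A minimal sketch of such a rule in an Apache .htaccess file; the paths and domain are placeholders:
# permanently redirect the old URL to the new one (mod_alias)
Redirect 301 /old-page https://www.example.com/new-page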
A little note on the general discussion: in the last few weeks, Google has been introducing a mechanism to help it recognise the relationship between paginated pages, as you can see here: Pagination with rel="next" and rel="prev".
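In practice that just means adding link elements to the head of each paginated page; a sketch for a hypothetical /page/2:
<link rel="prev" href="https://www.example.com/page/1">
<link rel="next" href="https://www.example.com/page/3">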
That said, I can now get to the core of the question. Each of the two solutions has pros and cons.
As subfolder (page/1)
Cons: You are losing link juice on the "page" page, because every piece (page) of your pagination system will be seen as an independent resource, since each one has a different URL (in fact you are not using parameters).
Pros: If your whole system already uses '/' as the separator between parameters (which is in many cases a good thing), this solution will give continuity to your system.
As parameter (page?param=1)
Cons: Though Google and the other search engines handle parameters without problems, you're letting them decide for you whether a parameter is important or not and whether to manage it or ignore it. Obviously this is true unless you decide how they should be handled in the respective webmaster tools panels.
Pros: You keep all the link juice on the "page" page, but indeed this is not so important, because you want to pass the link juice to the pages that show the detailed items.
An "alternative" to pagination
As you can see, I posted a question on this website which is related to yours. To sum up, I wanted to know an alternative to pagination. Here is the question (read the accepted answer): How to avoid pagination in a website to have a flat architecture?
Well, I really hope I've answered your question thoroughly.

How to make our website appear among the top ten Google results? [closed]

How can I make our site appear in the top ten of Google search results?
I want my website to show up when a user googles a search term like "social networking" or something like "ssit". How do I do that?
Get to know SEO (Search Engine Optimization).
1) Use proper, relevant meta keywords and description tags (points 1-3 are sketched in the snippet after this list)
2) Include a title tag, and put your most important keywords in heading tags
3) Use proper title and alt attributes for images
4) Have a sitemap page on your website
5) Build more back-links to your site by submitting articles, press releases and news
6) Cross-link between pages of the same website to provide more links to the most important pages; this may improve their visibility
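A minimal sketch of points 1-3; all the text and file names are placeholder content:
<head>
  <title>Acme Widgets - hand-made widgets and spare parts</title>
  <meta name="description" content="A short, relevant summary of this page for search snippets">
  <meta name="keywords" content="widgets, spare parts, hand-made">
</head>
<body>
  <h1>Hand-made widgets</h1>
  <img src="blue-widget.jpg" alt="Blue hand-made widget, front view" title="Blue widget">
</body>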
But before anything else, you should know what type of users you want for your site and research the relevant keywords for it; Google Analytics definitely helps for this purpose.
And most importantly, don't expect your site to be at the top soon; it will take some time, six months at least, to reach the top of the search results. As the number of users of your site increases, its rank will increase. So best of luck!
The algorithm used by Google ranks a page based on the number of other sites linking to it in association with specific keywords.
This has been exploited for so-called "Google bombing": if a lot of people spread a link to a specific site using a specific word, Google associates that word with that site and gives it the top rank (for example, it has been used to associate insults with politicians). The same technique has been used by spammers to raise the rank of garbage sites: they flood forums and blog comments with links to their sites. Although the algorithm has been improved to try to avoid this issue, it is still a viable way to raise a site's rank.
It should be clear that using such methods to improve the visibility of your site will give it a very bad reputation.
I suggest instead paying Google to advertise (so you will get to the top in a legitimate way).
Of course, you are only supposed to reach the top ten if your site really is the best one for the specific topic.
Slip Google a fiver (cough) I mean, uh, get other well-known websites to link to you. Google's PageRank works out a page's final rank or priority by determining how many other sites link to it, with links that come from already highly ranked sites being weighted more than links from an 'unknown' site.

Resources