Prevent Hyperlinks to Bad Domains [closed] - security

I have a forum where users can post comments; they can also enable/disable and approve comments. However, I can't always trust users to disapprove comments linking to bad domains, like these: http://www.mywot.com/en/forum/3823-275-bad-domains-to-blacklist
My question is two part:
If a user does hyperlink to a 'bad domain' like those in the link above, will my forum/forum-category/forum-category-thread be penalised for it? And if so, does adding nofollow to the thread's links avoid the penalty?
Is there a free API service out there that I can query to get a list of bad domains, so I can filter them out of users' posts?
I may just be being paranoid, but that's probably because I'm not too SEO savvy.

The actual algorithms aren't public, but this is what I found looking around the 'net.
1) Google's Webmaster Guidelines say that it may lower the ranking of sites that participate in link schemes. As an example of a link scheme, they give "Links to web spammers or bad neighborhoods on the web". NoFollow may or may not have an impact on this, but the consensus seems to be that it doesn't.
2) You can use either of Google's two safe browsing APIs to check if sites have been found to be phishing and/or malware sites.
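As a rough sketch of the second point (the API key, client id and the forum logic around it are placeholders; the request shape follows the Safe Browsing v4 Lookup API), checking the links found in a post could look like this:

    import requests

    API_KEY = "YOUR_API_KEY"  # placeholder: a Google API key with Safe Browsing enabled
    ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

    def flagged_urls(urls):
        """Return the subset of urls that Safe Browsing reports as harmful."""
        body = {
            "client": {"clientId": "my-forum", "clientVersion": "1.0"},  # made-up client info
            "threatInfo": {
                "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
                "platformTypes": ["ANY_PLATFORM"],
                "threatEntryTypes": ["URL"],
                "threatEntries": [{"url": u} for u in urls],
            },
        }
        resp = requests.post(ENDPOINT, params={"key": API_KEY}, json=body, timeout=10)
        matches = resp.json().get("matches", [])
        return {m["threat"]["url"] for m in matches}

Links that come back flagged can then be stripped from the post, or at least rewritten with rel="nofollow".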

If your website links to bad domains, that can certainly harm it, but how much depends on the ratio of outgoing links.
I strongly recommend recruiting forum moderators from your active members; they can manually moderate posts and help keep the spam out.
I am not sure about your software, but many forums allow restrictions like:
- Only members with a minimum number of posts can include links in replies
- Only members whose accounts are a certain number of days or months old can share links
- Only a limited number of links is allowed per post
Check whether your forum software offers such settings to restrict users; a minimal sketch of the first check is below.
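As a hedged sketch of that post-count rule (the user object, threshold and helper names are made up; your forum software will have its own equivalents):

    import re

    MIN_POSTS_FOR_LINKS = 10   # assumed threshold, tune for your community

    def may_post_links(user):
        """Allow links only for established members."""
        return user.post_count >= MIN_POSTS_FOR_LINKS

    def sanitize_post(user, text):
        """Strip URLs from posts by members below the threshold."""
        if may_post_links(user):
            return text
        return re.sub(r"https?://\S+", "[link removed]", text)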

Related

(How) can I add a second domain to a website to improve SEO? [closed]

So I'm making a website for a restaurant in a village in France. The restaurant is called Le Cantou, so I've registered www.lecantou.net. I want to make sure it is easy to find with Google. Now people obviously are not going to type in the name of the restaurant in Google, they will write "restaurant a saint cirq lapopie", because that's the name of the village. So I've also registered http://restaurant-a-saint-cirq-lapopie.com in the hopes that that will make it clear to the visitor that this is the restaurant they want.
Now my question is, I have one website with two domains: is there a way to handle the two domains so I get maximum SEO? I think duplicating the website is a bad idea. But setting a redirect from the long domain name to the original domain name also doesn't work, because then the long domain name will never show up in Google results, isn't that right?
What do you guys recommend?
I recommend giving up on the longer domain. Since Google's EMD (exact-match domain) update, having a domain that contains the same keywords as popular search queries won't help you rank better.
You should work more on the content, on interaction with your visitors, and on getting links from local websites. That will do far more to improve your rankings.
Pointing multiple domains at one website is a tricky procedure, as Google (and other search engines) can easily detect the duplicate domains and duplicate links with their algorithms. The safe technique is to point the secondary domain at the primary one with a 301 redirect; visitors who type the secondary domain are then sent straight to the primary site. A minimal sketch of the mechanics is below.
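Normally you would configure the 301 in your web server or at your registrar; purely to illustrate the mechanics, here is a sketch that assumes a small Flask app answering for both hostnames (the hostnames are the asker's, everything else is an assumption):

    from flask import Flask, redirect, request

    app = Flask(__name__)

    PRIMARY = "www.lecantou.net"

    @app.before_request
    def redirect_secondary_domain():
        # Any request arriving on the long secondary domain gets a permanent
        # (301) redirect to the same path on the primary domain.
        if request.host != PRIMARY:
            path = request.full_path.rstrip("?")
            return redirect(f"https://{PRIMARY}{path}", code=301)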
As the other guys have said, you need to shift your focus away from old-school SEO thinking.
Instead, try to create an awesome website on your main domain, with awesome content, etc.
Don't go for a super-long, keyword-heavy domain. It probably won't work.

Is there any effort towards a scraper- and bot-friendly Internet? [closed]

I am working on a scraping project for a company. I used Python libraries such as Selenium, mechanize and BeautifulSoup4, and I succeeded in putting the data into a MySQL database and generating the reports they wanted.
But I am curious: why is there no standardization of website structure? Every site uses different names/ids for its username and password fields. I looked at the Facebook and Google login pages; even they name these fields differently. Other elements are likewise named arbitrarily and placed anywhere.
One obvious reason I can see is that bots eat up a lot of bandwidth and websites are aimed primarily at human users. A second reason may be that websites want to show advertisements. There may be other reasons too.
Would it not be better if websites didn't have to provide APIs and there were instead a single framework for bot/scraper access? For example, every website could offer a scraper-friendly version that is structured and named according to a universally agreed standard specification, plus a page that acts as built-in help for the scraper. To access this version of the site, a bot/scraper would have to register itself.
This would open up an entirely different kind of internet to programmers. For example, someone could write a scraper that monitors vulnerability and exploit listing websites and automatically closes the security holes on users' systems. (For this, those websites would have to publish a version with data that can be applied directly, such as patches and where they should be applied.)
And all of this could easily be done by an average programmer. On the dark side, someone could write malware that updates itself with new attack strategies.
I know it is possible to use Facebook or Google login on other websites via OAuth, but that covers only a small part of scraping.
My question boils down to: why is there no such effort out in the community? And if there is one, kindly point me to it.
I searched Stack Overflow but could not find a similar question, and I am not sure this kind of question is appropriate for Stack Overflow. If not, please point me to the correct Stack Exchange site.
I will edit the question if anything here does not meet community criteria, but it is a genuine question.
EDIT: I got the answer, thanks to b.j.g: there is such an effort by the W3C, called the Semantic Web. (Anyway, I am sure Google will take over the whole internet one day and make this possible within my lifetime.)
EDIT: I think what you are looking for is the Semantic Web.
You are assuming people want their data to be scraped. In actuality, the data people scrape is usually proprietary to the publisher, and once it is scraped they lose exclusivity over it.
I had trouble scraping yoga schedules in the past, and I concluded that the developers were consciously making them difficult to scrape so that third parties couldn't easily use their data.
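For what it's worth, the closest thing to a "scraper-friendly version" in common use today is structured data embedded in pages, such as schema.org markup serialized as JSON-LD. As an illustration only (the URL is a placeholder), pulling those blocks out of a page is straightforward:

    import json
    import requests
    from bs4 import BeautifulSoup

    # Placeholder target; any page that embeds schema.org JSON-LD will do.
    url = "https://example.com/some-page"

    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # JSON-LD structured data lives in <script type="application/ld+json"> tags.
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        print(data)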

Protecting from "registration bots"? [closed]

What is the best strategy for protecting against "registration bots", the ones that just POST registration forms to my server and create junk users?
For my application it started with just a few fake accounts per day, but now it has become a real problem.
I would like to avoid confirmation emails as much as possible. What strategies can prevent this?
You can use a variety of techniques here:
Use a CAPTCHA like reCAPTCHA.
Present the user with a trivial problem like "2+2=?". A human will be able to respond correctly, whereas a bot won't.
Add a hidden text field to your form. Bots are programmed to fill in every field they can, so if the hidden field has data in it when the form is submitted, discard the request (see the sketch below).
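A hedged sketch of that hidden-field (honeypot) check, assuming a small Flask app; the field name and route are made up:

    from flask import Flask, request, abort

    app = Flask(__name__)

    # The registration form renders an extra field that humans never see, e.g.
    # <input type="text" name="website" style="display:none" autocomplete="off">
    # Naive bots fill it in anyway.

    @app.route("/register", methods=["POST"])
    def register():
        if request.form.get("website"):   # honeypot field is non-empty: likely a bot
            abort(400)
        # ...normal registration logic for real users goes here...
        return "registered", 201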
Use something like reCaptcha
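If you do go the reCAPTCHA route, the server side is a single POST to Google's siteverify endpoint; a sketch (the secret key is a placeholder):

    import requests

    def recaptcha_ok(response_token, remote_ip=None):
        """Verify a reCAPTCHA token submitted with the registration form."""
        payload = {
            "secret": "YOUR_SECRET_KEY",   # placeholder: your reCAPTCHA secret key
            "response": response_token,
        }
        if remote_ip:
            payload["remoteip"] = remote_ip
        r = requests.post("https://www.google.com/recaptcha/api/siteverify",
                          data=payload, timeout=10)
        return r.json().get("success", False)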
Any kind of CAPTCHA will do, e.g. reCAPTCHA; for common bots even a simple check like "from the checkboxes below, please select the nth one" is enough.
Also, if you use a popular app like phpBB, a little tweaking of the registration page will do it.
If your site is very popular, it's a different story altogether: there will always be a way to write bots specifically designed for your site. But these basic tricks should be enough to stop generic bots.
You could also log the IPs of those bots and block them, provided they are not rotating through lots of IPs; a sketch of a simple per-IP throttle follows.
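A minimal sketch of such a throttle (in-memory and purely illustrative; a real deployment would persist the counters, e.g. in Redis):

    import time
    from collections import defaultdict

    MAX_ATTEMPTS = 3      # assumed limit per IP
    WINDOW = 3600         # sliding window in seconds

    attempts = defaultdict(list)

    def allow_registration(ip):
        """Return True if this IP has not exceeded the registration limit."""
        now = time.time()
        # keep only attempts inside the sliding window
        attempts[ip] = [t for t in attempts[ip] if now - t < WINDOW]
        if len(attempts[ip]) >= MAX_ATTEMPTS:
            return False
        attempts[ip].append(now)
        return True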

Is MediaWiki viable for sensitive information? [closed]

I was under the impression that MediaWiki, due to its nature as an "open for all" wiki platform, is not tailored towards managing sensitive information.
I found warnings about this in the MediaWiki FAQ and in some user-account extensions, such as:
If you need per-page or partial page access restrictions, you are advised to install an appropriate content management package. MediaWiki was not written to provide per-page access restrictions, and almost all hacks or patches promising to add them will likely have flaws somewhere, which could lead to exposure of confidential data. We are not responsible for anything being leaked, leading to loss of funds or one's job.
Now a consultant tells my boss there is no problem with sensitive information at all. I would like to hear whether he is right and I am just worrying too much.
I suppose all of these problems would go away if we used a separate MediaWiki instance for every user group with the same rights.
Think about the risks here:
What sort of data are you planning on populating it with? If it is personal data such as salary, home address or medical data, or if it is credit card data, then you may be required to protect it appropriately (in the US see HIPAA, Gramm-Leach-Bliley, SOX and state data protection legislation; in the UK see the DPA 1998 and FSA regulations; in Japan J-SOX; globally PCI-DSS).
Aside from those regulations (and a whole lot of others globally) how would your business cope if the data was deleted, or published on the Internet, or modified, or corrupted?
The answers should help you define an 'appropriate' level of protection, which should then be explained along with the possible risks to the board, who should then make the decision as to whether it should go in.
(tweak the above based on company size, country etc)

Where Googlebot starts crawling? [closed]

Say I register a domain and develop it into a complete website. From where, and how, does Googlebot learn that the new domain is up? Does it always start with the domain registry?
If it starts with the registry, does that mean anyone can have complete access to the registry's database? Thanks for any insight.
Google will find your website on its own if some existing website has a link to it.
You can jump-start the process: http://www.google.com/addurl/.
You may also be interested in Google's Webmaster Tools.
Google needs to find you. That is, if there is no link to your site from another web site, it'll never find it.
Google finds pages to crawl as links from other pages. If no site links to your site, Google will likely never find it.
You should look at Google's own documentation. It has some good information, and even a link for adding your site to the crawl list.
Google works purely (sort of) on PageRank, which accounts for its success. If nobody cares about your site, neither does Google.
Usually I submit a new domain directly to Google; I've found it takes less time for the Google crawler to discover it that way than waiting for Googlebot to follow a link to the new domain on some other website: http://www.google.com/addurl/
Another way is to create a free Google Webmaster Tools account and add the new domain/website there (ideally with a sitemap); that way Google takes even less time to crawl your domain and get it indexed. A minimal sitemap sketch is below.
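Purely as an illustration (the URLs are placeholders), a minimal sitemap.xml can be generated with nothing but the standard library:

    from xml.etree import ElementTree as ET

    # Placeholder page list; replace with your site's real URLs.
    pages = [
        "https://www.example.com/",
        "https://www.example.com/menu",
        "https://www.example.com/contact",
    ]

    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page

    ET.ElementTree(urlset).write("sitemap.xml",
                                 encoding="utf-8", xml_declaration=True)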
As far as I know, the registry information for most domains is public; you can read it with a simple whois lookup, as sketched below.
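A hedged sketch of such a lookup, assuming the standard whois command-line tool is installed on the system:

    import subprocess

    def whois_lookup(domain):
        """Return the raw WHOIS record for a domain via the system whois tool."""
        result = subprocess.run(["whois", domain],
                                capture_output=True, text=True, timeout=30)
        return result.stdout

    print(whois_lookup("example.com"))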
