Search for copies of data from all over the internet

I need your help and want advice from a developer's point of view on how people run sites like copyscape.com. Basically, they search for copies of data across the whole internet. I want to know how they search and how they build a catalog of every website on the internet, the same way Google builds an index of sites.
Please guide me. How do they search data from all over the internet? How is it possible to keep track of each and every website? How does Google know there is a new site, and how do its crawlers learn that a new website has been launched? In short, I want to know how I can develop a site that searches for copies of data all over the internet without depending on any third-party API. Please advise; I hope you can help.
Thanks

Google's crawlers don't know when a new site is launched. Usually developers must submit their sites to Google or get incoming links from sites that are indexed.
And nobody has a copy of the entire Internet. There are websites that are not linked and never get visited by any crawler. This is called the deep web and is generally inaccessible to crawlers.
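The following minimal Python sketch shows how a link-following crawler discovers pages. It uses a hypothetical in-memory link graph instead of real HTTP fetches, and the page names are made up. A page is found only if it was submitted as a seed or if an already-found page links to it. That is why an unlinked site stays invisible until someone submits it or links to it.

```python
from collections import deque

# Hypothetical toy "web": each page maps to the pages it links to.
# orphan.example links out, but nothing links to it.
WEB = {
    "seed.example": ["blog.example", "news.example"],
    "blog.example": ["news.example", "shop.example"],
    "news.example": ["seed.example"],
    "shop.example": [],
    "orphan.example": ["seed.example"],
}


def crawl(seeds, links):
    """Breadth-first crawl: a page is discovered only if it is a seed
    (i.e. was submitted) or an already discovered page links to it."""
    discovered = set(seeds)
    frontier = deque(seeds)
    while frontier:
        page = frontier.popleft()
        for target in links.get(page, []):
            if target not in discovered:
                discovered.add(target)
                frontier.append(target)
    return discovered


if __name__ == "__main__":
    found = crawl(["seed.example"], WEB)
    print(sorted(found))
    print("orphan.example" in found)  # never reached: nothing links to it
```

Passing orphan.example as a second seed, the equivalent of submitting the site, makes the crawl reach every page.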
How do they do it exactly? I don't know. Maybe they index popular sites where text is likely to be copied, like Blogger, EzineArticles, etc. If they don't find the text on those sites, they simply say it's original. This is just a theory and I am probably wrong.
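Nobody in this thread knows what Copyscape actually does. One well-known way to measure how much of one text appears in another is w-shingling with Jaccard similarity, offered here as a swapped-in technique. Each text is split into overlapping k-word windows and the two sets are compared. This is a minimal Python sketch that assumes k = 5 words and a simple regex tokenizer.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def copy_score(original, candidate, k=5):
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    return jaccard(shingles(original, k), shingles(candidate, k))


if __name__ == "__main__":
    orig = "the quick brown fox jumps over the lazy dog near the river bank"
    print(copy_score(orig, "Copied: " + orig))
    print(copy_score(orig, "sheep graze quietly on green hills all summer long"))
```

A score near 1 suggests a near-verbatim copy. A real service would also need a crawled corpus to compare against and some kind of index, for example hashed shingles, so it does not compare every pair of pages.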
Me? I would probably use Google. Take a good chunk of text from the page you want to check, search for it, and filter out the results that come from the original website. Voilà: you have the websites that contain that exact phrase, which are presumably copies.
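The Google-based approach can be sketched in Python as two small helpers. The first builds a search URL for the chunk wrapped in double quotes, which is Google's exact-phrase syntax. The second drops results hosted on the original site. The sketch deliberately leaves out how you obtain the result URLs. The example.com host and sample URLs are placeholders.

```python
from urllib.parse import urlencode, urlparse


def exact_phrase_query_url(phrase):
    """Google search URL for the phrase in double quotes (exact match)."""
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


def filter_copies(result_urls, original_host):
    """Drop results on the original host or its subdomains; the rest are
    candidate copies."""
    original_host = original_host.lower()
    copies = []
    for url in result_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == original_host or host.endswith("." + original_host):
            continue
        copies.append(url)
    return copies


if __name__ == "__main__":
    print(exact_phrase_query_url("sheep farming tips"))
    print(filter_copies(
        ["https://www.example.com/a", "https://copycat.example.net/x"],
        "example.com",
    ))
```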

Related

Google servers see website differently

I Googled one of our sites today (gamestyling.com) and saw that the results were in Chinese. It looks like our site was hacked, but I see no traces of that. When I open the site, everything looks normal (no Chinese).
On further inspection it seems that Google doesn't see the website correctly:
I cannot verify the site in Google Search Console. When I use the meta tag method, it tells me it detected a completely different tag.
When running PageSpeed Insights, the preview does show Chinese: https://developers.google.com/speed/pagespeed/insights/?url=gamestyling.com
Also, when running the site through a proxy it looks completely normal.
Any idea how I can get Google to see my site correctly or what is causing this issue?
UPDATE
I now have access to Google Search Console and found that someone else already had access to the property (a second user).
I cannot remove that user because it was verified with a meta tag that Google thinks is still in the header but that doesn't appear in my code. So I'm still not sure whether someone is playing tricks on Google or whether we've actually been hacked. Note: nothing has changed on the server itself.
UPDATE2
This article describes exactly what's going on: https://blog.sucuri.net/2015/09/malicious-google-search-console-verifications.html. I must say that's an amazing security flaw on Google's part...
I experienced this issue on one of my sites and resubmitted the website for review in Google Webmaster Tools. The search results in Google were corrected within a couple of days.
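The symptoms above could indicate that the site serves different HTML depending on who asks. PageSpeed shows Chinese while a proxy shows a normal page, and Google sees a verification tag that is not in the code. The following Python sketch fetches a page twice and compares the google-site-verification meta tags. One fetch uses a browser User-Agent and the other uses Googlebot's. The sketch assumes the cloaking keys on the User-Agent header. If it keys on Google's IP addresses instead, both fetches will look the same and this check proves nothing. The browser UA string is an arbitrary example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: the injected content is chosen by User-Agent.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


class VerificationTagParser(HTMLParser):
    """Collect the content values of <meta name="google-site-verification">."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "google-site-verification":
            self.tokens.add(attrs.get("content") or "")


def verification_tokens(html):
    parser = VerificationTagParser()
    parser.feed(html)
    return parser.tokens


def fetch(url, user_agent):
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def cloaked_tokens(url):
    """Verification tokens served to the Googlebot UA but not to a browser UA."""
    as_browser = verification_tokens(fetch(url, BROWSER_UA))
    as_googlebot = verification_tokens(fetch(url, GOOGLEBOT_UA))
    return as_googlebot - as_browser
```

Calling cloaked_tokens("http://gamestyling.com/") would make two live requests. A non-empty result would point to tags that only the Googlebot User-Agent receives.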

Why can't I interact with my Google Docs embedded in my Google Site?

Happy Friday!
So, a while ago, I embedded a Google docs folder in my Google site. What a great feature!
But here is the problem: when I first embedded it and clicked on an embedded folder, there was a section on the left from which I could add files directly from the site. All I had to do was drag and drop them into the folder embedded on the website itself. Now that option is just not there!
Have I changed some setting by accident? Or did Google change the way embedding works to disallow this?
Thanks!

Launch a Google search from a link

I am running a PHP website on a server run by a large host. My goal is very simple: include a link on my site to a Google search, where I supply the search term dynamically.
Starting with the url that appears in the address bar, I've narrowed the syntax down to
http://www.google.com/search?q=test
This works when I type it into the address bar. However, when I launch from the server, it redirects to:
www.google.com/webhp...lots of characters
There are references on the web to webhp being related to a virus, but I'm pretty sure my host does not have any viruses on its servers.
Does anyone know the proper way to launch a simple Google search from a link? Is a straight link forbidden? I am willing to use JS to push the link to the client if necessary (which I do for Google Maps, at Google's recommendation, due to usage limits), but I want to keep things as simple as possible. This link is just to save people a few clicks.
Thanks for any suggestions.
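Simply use the urlencode function:
<?php
// $userinput holds the search term; urlencode() escapes spaces, & and # so the whole term stays in q.
// "Search Google" is placeholder link text.
echo '<a href="https://www.google.com/search?q=', urlencode($userinput), '">Search Google</a>';
?>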
If you wish to do it with Javascript the answer is here: Encode URL in JavaScript?
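Here is the same idea in Python for comparison, as a sketch that uses only the standard library. quote_plus plays the role of PHP's urlencode and turns spaces into +. Reading the URL back with parse_qs shows why encoding matters: an unescaped & cuts the search term short.

```python
from urllib.parse import parse_qs, quote_plus, urlparse


def google_search_url(term):
    """Search link with the term percent-encoded into the q parameter."""
    return "https://www.google.com/search?q=" + quote_plus(term)


def query_term(url):
    """What a server reading the query string would see as q."""
    return parse_qs(urlparse(url).query).get("q", [""])[0]


if __name__ == "__main__":
    print(google_search_url("fish & chips"))
    print(query_term(google_search_url("fish & chips")))
```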
Try to track down the URL rewriting; I think it's a virus you need to remove: http://www.ehow.com/how_8728291_rid-webhp.html
WebHP is a computer virus that automatically sets your homepage to a fake Google site, known as Google.com/WebHP. This virus will also randomly open windows or tabs to load this website, as well as generate pop-ups and fake errors. Also installed with this virus is a rootkit which can disable your PC's firewall and other methods of security. If left untreated, the WebHP virus allows hackers to remotely access your computer and steal personal information, such as credit card numbers and email passwords.

Fix/Replace DNN search-engine with FTP

I'm working on a DNN website. I have a user account with Admin privileges but don't have access to the Host account. I do have FTP access, and while browsing the file structure I have seen some files referring to search.
The search is not working on the website, so I was hoping I could replace the back-end code that runs the search via FTP.
Which files would need to be replaced to make sure they are not corrupted or buggy?
I realize doing this may not solve the problem, so any other troubleshooting advice or possible solutions are appreciated.
EDIT (for those asking in what way search does not work):
Here is an image of what happens when I search for 'sheep' (the website is all about sheep). I was told by the company that built the original website that the search runs on our pages' 'Keywords'. I've made sure the pages contain keywords, but they still do not show up in search.
I could find no other solution without Super-User account access, so I ended up implementing Google's Custom Search Engine with the multi-page option.
http://www.google.com/cse/
In my case, the original search engine worked via a GET request with a parameter named q, which is the same as Google's CSE multi-page option. So I was able to simply remove the old search results HTML from a module and replace it with the HTML snippet provided by Google.
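The swap worked because both engines read the search term from the same place. A GET form whose input is named q submits to a results URL like the ones built below, and whichever results page receives the request reads q. This is a minimal Python sketch. /SearchResults.aspx is a hypothetical path, not necessarily the one DNN uses.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def results_url(results_page, term):
    """The URL a GET search form with an input named q submits to."""
    return results_page + "?" + urlencode({"q": term})


def search_term(url):
    """The q parameter a results page would read ('' if absent)."""
    values = parse_qs(urlparse(url).query).get("q")
    return values[0] if values else ""


if __name__ == "__main__":
    url = results_url("/SearchResults.aspx", "sheep shearing")
    print(url)
    print(search_term(url))
```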

Search Engine listing like the provided screenshot

My question is about search engine result pages: when a site is the first search result, many sites' listings show links to their pages, like in this screenshot.
So, is there a procedure to follow so I can achieve the same effect for my site?
Unfortunately not, those are automatically generated by Google. You can read more details on the Google Webmaster page about Sitelinks.
They are generated completely automatically by Google.
Google Doesn't Say How to Get Sitelinks
The workings of many Google algorithms, including Sitelinks, are kept secret to discourage people from manipulating the rankings, but we can still look at examples and try to understand where Sitelinks come from. I've worked on a number of sites with Sitelinks, and these sites are similar in the following ways:
* Site ranks first for the keyword(s) that generate the Sitelinks listing
* Easily spiderable, structured navigation
* Fairly high natural search traffic
* High click-through rates from the search results page
* Useful outbound links
* Inbound links from high quality sites
* Site age is several years or older
Sitelinks generally appear when a site is considered the main authority on a particular keyword. In short, an old site with lots of links using the keyword as anchor text is likely to get Sitelinks.
I've found that, as stated above, having clear navigation helps. Also, Googlebot appears to have particular support for MediaWiki, automatically pulling the table of contents (TOC) into the results.
You can get more information about your own site's Sitelinks by logging into Google Webmaster Tools.
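To check the "easily spiderable" part roughly, you can list which navigation items appear as plain a href links in the raw HTML. Any crawler can follow links in that form. This is a minimal Python sketch with a made-up nav snippet. An item wired up only through JavaScript, like the hypothetical onclick span, does not appear in the output. This says nothing about how Google itself weighs navigation for Sitelinks.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a href> element in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only text inside an open link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def crawlable_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    nav = ('<nav><a href="/breeds">Breeds</a> <a href="/care">Care &amp; feeding</a> '
           "<span onclick=\"go('/shop')\">Shop</span></nav>")
    print(crawlable_links(nav))
```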
