Scanning the web for links - search

I was just wondering if there is a way to scan the web to find every website that has a link to my website.
Say, scan the web for every website that has a link to www.example.com?
Is there a tool to do this?

Massive task; don't reinvent the wheel, use Google Webmaster Tools:
https://www.google.com/webmasters/tools/
Edit: If you wanted to do it yourself you'd have to text-index and data-mine the whole internet. You'd be long dead before you got close to finishing this.
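If you are curious what the do-it-yourself route involves, here is a minimal PHP sketch (the domain and page URL are placeholders) that checks a single page for links back to your site; a real backlink scan means repeating this for billions of pages, which is exactly the problem.

<?php
// Toy sketch only: checks ONE page for links back to your domain.
$myDomain = 'example.com';          // your site (placeholder)
$pageUrl  = 'http://some-site.com'; // a page you want to inspect (placeholder)

$html = @file_get_contents($pageUrl);
if ($html === false) {
    die("Could not fetch $pageUrl\n");
}

$doc = new DOMDocument();
@$doc->loadHTML($html); // @ silences warnings from messy real-world markup

foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if (stripos($href, $myDomain) !== false) {
        echo "$pageUrl links to you: $href\n";
    }
}
?>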


Response time on a web page, like a stopwatch

I have searched and found so many answers but nothing that fits my requirement. I will try to explain here and see if any of you guys have some tips.
I wish to click on a link manually, and from there on I want some kind of tool or service to start recording time from my click and stop when the desired page is loaded. This way I can find out the exact user-interface response time.
All the online web testing services ask for the main URL. In my case the main URL has a gazillion links, and I wish to use only one link, which is a dynamic link, as a standard sample.
For example:
- I click on my friend's name on Facebook
- From my click to the time the page is loaded: is there a tool that does the stopwatch thing?
End goal is:
I will be stress-testing a server with extensive load, and the client wishes to see the response time of simple random pages when the load is at 500, 1000, 2000 and so on.
Please help!
Thank you.
With a simple load-testing tool plus the developer tools in the Chrome browser you can get a clear picture of page load times under load. You can also see how long each individual request took and the time from start to finish.
Just start the load test and then try the page from Chrome.
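If server response time (rather than full in-browser render time) is enough for the client, a rough sketch along these lines, using PHP's cURL bindings with a placeholder sample URL, can be run repeatedly while the load test is going:

<?php
// Rough sketch: times the server response for one sample URL.
// This measures network + server time only, not in-browser rendering.
$sampleUrl = 'http://www.example.com/some/dynamic/page'; // placeholder

$ch = curl_init($sampleUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_exec($ch);

$ttfb  = curl_getinfo($ch, CURLINFO_STARTTRANSFER_TIME); // time to first byte
$total = curl_getinfo($ch, CURLINFO_TOTAL_TIME);         // full request time (seconds)
curl_close($ch);

printf("TTFB: %.3fs, total: %.3fs\n", $ttfb, $total);
?>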
You can also use an automated latency monitor like SmokePing.
You may use HttpWatch or YSlow to find the client-side page load times.
HttpWatch and Fiddler helped. It didn't really go as I had thought, but it was pretty close and satisfactory. Thanks guys.
You could try WebPagetest (WPT). It is a tool with both public and private instances and serves exactly what you want to do; it also supports scripted steps executed via the browser's JavaScript. The nicest thing I find in WPT is that you can use the public instance to measure the actual user experience from world locations other than yours, or you can set up a private instance.

Launch Google search from a link

I am running a PHP-based website on a server run by a large host. My goal is very simple: include a link on my site to a Google search where I dynamically supply the search term.
Starting with the URL that appears in the address bar, I've narrowed the syntax down to
http://www.google.com/search?q=test
This works when I type it into the address bar. However, when I launch it from the server, it redirects to:
www.google.com/webhp...lots of characters
There are references on the web to webhp being related to a virus, but I'm pretty sure my host does not have any viruses on its servers.
Does anyone know the proper way to launch a simple Google search from a link? Is a straight link forbidden? I am willing to use JS to push the link to the client if necessary (which I already do for Google Maps, at Google's recommendation, due to usage limits), but I want to keep things as simple as possible. This link is just to save people a few clicks.
Thanks for any suggestions.
Simply use the urlencode function:
<?php
// urlencode makes the user's term safe to drop into the query string
echo '<a href="http://www.google.com/search?q=' . urlencode($userinput) . '">Search Google</a>';
?>
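For example, with $userinput set to new zealand sheep, the generated href is http://www.google.com/search?q=new+zealand+sheep, so spaces and other special characters can't break the query string.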
If you wish to do it with JavaScript, the answer is here: Encode URL in JavaScript?
Try to track down the "URL rewriting"; I think it's a virus you need to remove: http://www.ehow.com/how_8728291_rid-webhp.html
WebHP is a computer virus that automatically sets your homepage to a fake Google site, known as Google.com/WebHP. This virus will also randomly open windows or tabs to load this website, as well as generate pop-ups and fake errors. Also installed with this virus is a rootkit which can disable your PC's firewall and other methods of security. If left untreated, the WebHP virus allows hackers to remotely access your computer and steal personal information, such as credit card numbers and email passwords.

Fix/Replace DNN search-engine with FTP

I'm working on a DNN website. I have a user account with Admin privileges but don't have access to the Host account. I do have FTP access and have been browsing around the file structure, and I have seen some files referring to search.
The search is not working on the website, so I was hoping I could replace the back-end code which runs the search via FTP.
Which files would need to be replaced to make sure they are not corrupted/buggy?
I realize doing this may not solve the problem, so any other advice on troubleshooting or possible solutions is appreciated.
EDIT (for those asking in what way the search does not work):
Here is an image of what happens when I search for 'sheep' (the website is all about sheep). I was told by the company that built the original website that the search runs on our pages' 'Keywords'. I've made sure pages contain keywords, but they still do not show up in search.
The solution I ended up using for this problem, because I could find no other solution without having Super-User account access, was to implement Google's Custom Search Engine with the multi-page option.
http://www.google.com/cse/
In my case the original search engine worked via a GET request with a q parameter. This is the same as Google's CSE multi-page option, so I was able to simply remove the old search-results HTML from a module and replace it with the HTML snippet provided by Google.
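For reference, the pattern described above is just a plain GET form, roughly like the sketch below (the action URL is a placeholder for wherever the results module lives); both the old search and the Google CSE results snippet read the same ?q= parameter.

<!-- Minimal sketch; the action URL is a placeholder. -->
<form action="/Search-Results" method="get">
  <input type="text" name="q" />
  <input type="submit" value="Search" />
</form>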

Search for copies of data from all over the internet

I need your help and want advice, from a developer's point of view, on how people run sites like copyscape.com. Basically they search for copies of data across the whole internet, and I want to know how they search and build a catalogue of every website, the same way Google builds its index of sites from the internet.
Please guide me: how do they search data from all over the internet? How is it possible to keep track of each and every website? How does Google know that there is a new site on the internet, and from where do its crawlers learn that a new website has been launched? In short, I want to know how I can develop a site in which I can search for copies of data from all over the internet without depending on any third-party API. Please advise; I hope you will help me.
Thanks
Google's crawlers don't know when a new site is launched. Usually developers must submit their sites to Google or get incoming links from sites that are indexed.
And nobody has a copy of the entire Internet. There are websites that are not linked and never get visited by any crawler. This is called the deep web and is generally inaccessible to crawlers.
How do they do it exactly? I don't know. Maybe they index popular sites where text is likely to be copied, like Blogger, ezinearticles, etc. And if they don't find the text on those sites, they simply say it's original. Just a theory, and I am probably wrong.
Me? I would probably use Google. Just take a good chunk of text from the website you are checking for copies and then filter out the results that are from the original website. And voilà, you have the websites that contain that exact phrase, which is presumably copied.
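A rough sketch of that idea in PHP, assuming you are willing to use Google's Custom Search JSON API (the API key, engine ID, site, and phrase below are all placeholders, and this is of course itself a third-party API):

<?php
// Sketch of the "search an exact phrase, drop the original site" approach.
$apiKey       = 'YOUR_API_KEY';   // placeholder
$searchEngine = 'YOUR_CX_ID';     // placeholder Custom Search engine ID
$originalSite = 'example.com';    // the site the text came from
$phrase       = 'a reasonably long and distinctive sentence from the original page';

$url = 'https://www.googleapis.com/customsearch/v1?key=' . urlencode($apiKey)
     . '&cx=' . urlencode($searchEngine)
     . '&q=' . urlencode('"' . $phrase . '"'); // quotes force an exact-phrase match

$response = json_decode(file_get_contents($url), true);

foreach ((array) ($response['items'] ?? []) as $item) {
    // Skip hits on the site the text originally came from.
    if (stripos($item['link'], $originalSite) === false) {
        echo "Possible copy: {$item['link']}\n";
    }
}
?>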

Web site analyser

This is sort of a statistics question. I am looking for a website analyser, though not quite like Google Analytics. I want the analyser to crawl the website itself and record all the data on a page: images, sizes of images, and so on.
Even if it is just a library, it's a start for me.
Thanks
You could try wget to download all the images on a site. I doubt it's the best way to do this, though. Chrome's Inspect Element function has information on the sizes of all images on a page, if that's more what you're looking for.
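If a library-level starting point helps, here is a rough PHP sketch (the page URL is a placeholder) that lists every image on one page along with its reported size; relative src values would still need to be resolved against the page URL before fetching.

<?php
// Rough starting point: list every <img> on one page and its size in bytes.
$pageUrl = 'http://www.example.com/'; // placeholder

$doc = new DOMDocument();
@$doc->loadHTML(file_get_contents($pageUrl)); // @ silences warnings from messy markup

foreach ($doc->getElementsByTagName('img') as $img) {
    $src     = $img->getAttribute('src');
    $headers = @get_headers($src, 1); // fetch the image's response headers
    $bytes   = is_array($headers) && isset($headers['Content-Length'])
        ? $headers['Content-Length']
        : 'unknown';
    echo "$src => $bytes bytes\n";
}
?>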
