How do search engines recognize search boxes on websites? - search

I've noticed that often when I search for something on Google, Google appears to use the search function of relevant websites and returns the site's search results page as if it were just another URL.
How do I let Google and other search engines know which element is the search box on my own website, and does OpenSearch have anything to do with it?

Do you maybe mean the site search function in the Google Chrome omnibox?
To get there you just need to have the following on the root page of your domain:
a form with method type GET
an input type text element
a submit button
If users go directly to your root page and search for something there, Google learns of this form and adds it to the search engines accessible via the omnibox (the Google Chrome address bar).
Did you mean this?
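To make the GET requirement concrete, here is a minimal Python sketch of the URL a browser would request when such a form is submitted. The base URL `https://example.com/search` and the parameter name `q` are assumptions for illustration only; your form's action and input name may differ.

```python
from urllib.parse import urlencode


def search_url(base: str, query: str, param: str = "q") -> str:
    """Return the URL a GET search form would request for `query`.

    `base` stands in for the form's action and `param` for the name of
    the text input; both are hypothetical placeholders.
    """
    return f"{base}?{urlencode({param: query})}"


url = search_url("https://example.com/search", "blue widgets")
print(url)  # https://example.com/search?q=blue+widgets
```

Because the query ends up in the URL itself, the results page can be bookmarked, linked to, and crawled like any other page, which is not the case for a POST form.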

Google doesn't use anyone's search forms. It just finds links to search results pages. To make this possible you need to:
Use GET for your search parameters
Create links to common/useful search results pages
Make sure Google finds those links
Google makes it look like just another URL because that is exactly what it is.
Most of the time, though, Google will do a better job than your own search engine, so actually doing this could lower the quality of the results from your site.

I don't think it does. It's impossible to spider sites in real time.
It's just an SEO technique some sites use to improve their ranking by spamming Google with fake results. They feed the Google bot an endless stream of links to bogus pages:
http://en.wikipedia.org/wiki/Spamdexing

Related

How does Mixpanel's Search Keyword work?

I'm curious about how Mixpanel tracks which Search Keyword an event is associated with. Does this come from organic search (as opposed to paid search ads)?
If yes, how did they do it? From a glance, I guess organic search works this way:
That link goes to a proxy link with some query parameters which contain info about the (encrypted) search term & the real destination link.
Redirect to the real destination link.
Google Analytics knows the organic search keyword used in a session because Google intercepts it at that midpoint. I'm not sure if there's any way for someone outside of Google (including Mixpanel) to intercept that info. Right? (Correct me if I'm wrong.)
If there is a way for the destination website to know the organic search keyword, can I be enlightened on the method?
I don't think this is coming from organic search or paid ads, for a couple of reasons:
Most organic traffic is now served over HTTPS, which makes it hard to get the search parameters. Google Analytics shows this data through the Webmaster Tools console, which is able to obtain keyword data in a different way (I assume through the Google backend rather than the URL itself). Otherwise, you are stuck with the "Not Provided" issue in Google Analytics.
Mixpanel only captures the default UTM parameters: utm_campaign, utm_source, utm_term, utm_medium and utm_content. Mixpanel also names these properties as expected: UTM Medium, UTM Source, etc.
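As an illustration of what capturing UTM parameters amounts to, here is a small Python sketch that pulls them out of a landing-page URL. It is not Mixpanel's code; the key list uses the conventional UTM names and the sample URL is made up, so adjust both to your setup.

```python
from urllib.parse import urlsplit, parse_qs

# Conventional UTM parameter names (an assumption; pass your own list if needed).
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_utm(url: str, keys=UTM_KEYS) -> dict:
    """Return the UTM parameters present in `url`, ignoring everything else."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first occurrence.
    return {key: params[key][0] for key in keys if key in params}


# Hypothetical landing-page URL from an email campaign.
landing = (
    "https://example.com/pricing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=spring&ref=abc"
)
print(extract_utm(landing))
```

Note that nothing here can recover an organic search keyword: the function only sees what the link author put into the URL.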
I can't tell from your screenshot, but it seems this might be a custom property that your Mixpanel setup is setting, perhaps from an internal search engine. Or perhaps you're capturing a custom URL query parameter?
Can you provide more information as to how this event is being captured?

Get/Show google search results in my app

I am facing a problem while developing an app: I need to display search engine results directly on my app page without redirecting to www.google.com.
In the search box I enter the name of an RSS feed site, and I want to get the Google search results on my app page so that I can easily extract the RSS feed website and perform the operation I intended.
My only aim is to get RSS feeds from a site just by typing the site name.
Thank you!
Answer.
Almost working..,
Thank you #Chandan,#Suzi
Check under 2. A Better Approach
I haven't tried it out in practice and am not sure whether it has been deprecated by now.

Is it possible to scrape any given URL with NodeJS?

I'll preface this by saying this is something that is new to me and is purely a learning exercise, so please excuse any naivety.
I've been looking through some articles on scraping and it seems that NodeJS, ExpressJS, Request and Cheerio would be my preferred method as a Front-End guy who is comfortable with JS/jQuery.
All the articles I've read so far focus on scraping data from a specific website in the absence of an API, whereas what I am looking to achieve to start with is a tool which takes any given URL and returns a true/false for a list of which common libraries are being used and which social networks are linked.
For example, a user enters a URL and the results return a "This website uses jQuery, MooTools, BackboneJS, AngularJS, etc" and "This website is linked with Facebook, Twitter, etc". Somewhat similar to Tregia: http://www.tregia.com/process?q=http://smashingmagazine.com.
Is my chosen setup (above) appropriate or limited to only scraping specific pages due to CSS selectors?
You should be able to scrape any page, find its script tags, and read which tools it is using (although keep in mind the files may have been renamed [e.g. angularjs3.1.0.js -> foobar.js] to keep people from knowing their stack). You should also be able to get the text within any other tags that you consider relevant.
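The idea is not tied to one page layout, since it only looks at script and link URLs rather than site-specific CSS selectors. Here is a rough sketch in Python (the same approach carries over to a Node setup); the signature table, the social domain list and the sample HTML are all made up for illustration, and matching on file names will miss renamed or bundled libraries.

```python
import re
from html.parser import HTMLParser

# Hypothetical signatures: library name -> pattern matched against script URLs.
SIGNATURES = {
    "jQuery": re.compile(r"jquery", re.I),
    "AngularJS": re.compile(r"angular", re.I),
    "MooTools": re.compile(r"mootools", re.I),
    "BackboneJS": re.compile(r"backbone", re.I),
}
# Hypothetical social networks: name -> domain looked for in link targets.
SOCIAL = {"Facebook": "facebook.com", "Twitter": "twitter.com"}


class PageScanner(HTMLParser):
    """Collects script sources and link targets from any HTML document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("src"):
            self.scripts.append(attributes["src"])
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])


def scan(html: str) -> dict:
    """Return true/false flags for each known library and social network."""
    scanner = PageScanner()
    scanner.feed(html)
    libraries = {
        name: any(pattern.search(src) for src in scanner.scripts)
        for name, pattern in SIGNATURES.items()
    }
    social = {
        name: any(domain in href for href in scanner.links)
        for name, domain in SOCIAL.items()
    }
    return {"libraries": libraries, "social": social}


SAMPLE = """
<html><head>
<script src="/js/jquery-1.11.0.min.js"></script>
<script src="/js/foobar.js"></script>
</head><body>
<a href="https://twitter.com/example">Follow us</a>
</body></html>
"""
result = scan(SAMPLE)
print(result)
```

In the sample, `foobar.js` stays undetected even if it were a renamed framework, which is exactly the limitation mentioned above.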
You should try and pay attention to every page's robots.txt as well.
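Checking robots.txt before fetching can be done with Python's standard `urllib.robotparser`. The sketch below parses a made-up robots.txt from a string so it runs offline; the user agent name `my-scraper` and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real scraper would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "my-scraper", "https://example.com/index.html"))
print(allowed(ROBOTS_TXT, "my-scraper", "https://example.com/private/data.html"))
```

For a live site you would instead call `set_url()` with the site's robots.txt address followed by `read()`, and run the check once per URL before requesting it.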
Edit: You probably won't be able to scrape "members"/"login only" areas of sites though.

How to get URLs / links from Google search results into a TMemo

How can I get a lot of URLs from a Google.com search into a TMemo without using TWebBrowser? I don't mean the HTML source, as returned by code like Idhttp.Get('http://www.google.com/search?q=blah+blah+blah'), but only the URLs from the Google search results as plain text strings.
Thanks in advance.
Don't use Google's HTML-based website frontend. It is meant for web browsers and user interactions. Use Google's developer APIs instead, like its Custom Search API.
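As a rough sketch of that approach (in Python rather than Delphi), the Custom Search JSON API is queried with an API key, a search engine ID and the query, and each result in the response's `items` list carries a `link` field. The key `KEY`, engine ID `CX` and the sample response below are placeholders, not real credentials or data.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(api_key: str, engine_id: str, query: str) -> str:
    """Build the request URL for one page of search results."""
    return f"{ENDPOINT}?{urlencode({'key': api_key, 'cx': engine_id, 'q': query})}"


def extract_links(response: dict) -> list:
    """Return only the result URLs from a decoded JSON response."""
    return [item["link"] for item in response.get("items", [])]


def search_links(api_key: str, engine_id: str, query: str) -> list:
    """Perform the request (needs network access and valid credentials)."""
    with urlopen(build_request_url(api_key, engine_id, query)) as reply:
        return extract_links(json.load(reply))


# Made-up response, trimmed to the fields used here, so the sketch runs offline.
sample_response = {
    "items": [
        {"title": "First result", "link": "https://example.com/a"},
        {"title": "Second result", "link": "https://example.org/b"},
    ]
}
print("\n".join(extract_links(sample_response)))
```

The resulting list of strings is what you would assign to the memo's lines, one URL per line.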

How long does Google take to crawl a new page, and can we influence Google's crawler?

I want to submit my site to Google. How much time does it take to crawl a new post on the website?
Also, is there a way to feed this post to Google crawler as soon as a post is created?
Google has three modes of entering a website into its results - discover, crawl, index.
In order to 'discover' your site, Google must be made aware of its existence - normally through backlinks. If your site is brand new you can use the submit URL form - but this isn't really a trusted method. You're better off signing up for a Google Webmaster Tools account and submitting your site. An additional step is to submit an XML sitemap of your site. If you are publishing to your site in a blogging/posting way - you can always consider PubSubHubbub.
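For reference, an XML sitemap is just a list of URLs with optional metadata such as the last modification date. Here is a minimal Python sketch that generates one; the page URLs and dates are made up, and a real site would produce the list from its own content.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Return sitemap XML for an iterable of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for location, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = location
        ET.SubElement(entry, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical posts on a hypothetical blog.
pages = [
    ("https://example.com/", "2012-01-22"),
    ("https://example.com/posts/new-post", "2012-01-22"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Regenerating this file whenever a post is published, and submitting its URL once in Webmaster Tools, gives the crawler an up-to-date list to work from.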
From there on, crawl frequency is normally based on site popularity (as measured by ye olde PageRank). Depth of crawl (crawl-budget) is also determined by PR.
There are a couple of ways to help "feed" a URL to the Google crawler.
The first way is to go here and submit a URL: www.google.com/webmasters/tools/submit-url/
The second way is to go to your Google Webmaster Tools and click "Fetch as GoogleBot"
and then enter the URL you want to add:
http://i.stack.imgur.com/Q3Iva.png
The URL will then appear similar to this:
http:\\example.site Web Success URL submitted to index 1/22/12 2:51 AM
As for how long it takes for a question on here to appear on Google, many factors go into this.
If the owners of the site use Google Webmasters Tools, the following setting is available:
http://i.stack.imgur.com/RqvOi.png
For a fast crawl, submit your XML sitemap in Google Webmaster Tools and manually request crawling and indexing of your page URLs through the Fetch as Google feature.
I used this crawl-and-index method myself, and it gave me the best results.
This is a great resource that really breaks down all the factors that affect a crawl budget and how to optimize your website to increase it. Cleaning up your broken links and removing outdated content, for example, can work wonders. https://prerender.io/crawl-budget-seo/ 
I acknowledged the error in my response by adding a comment to the original question a long time ago. Now I am updating this post in the interest of keeping future readers from being misguided as I was. Please see the notes from other users below - they are correct. Google does not make use of the revisit-after meta tag. I am still keeping the original response text here to make sure that anyone else looking for a similar answer will find it here along with this note confirming that this meta tag IS NOT VALID! Hope this helps someone.
You may use HTML meta tag as follows:
<meta name="revisit-after" content="1 day">
Adjust time period as necessary. There is no guarantee that robots will return in given time frame but this is how you are telling robots about how often a given page is likely to change.
The Revisit Meta Tag is used to tell search engines when to come back next.

Resources