How do I aggregate data off of a google search - search

I am trying to aggregate movie times off of google/movies search into a usable format such as json or xml
http://www.google.com/movies?q=movie+times&sc=1&mid=&hl=en&oi=showtimes&ct=change-location&near=new+york
The Google AJAX api does not seem to work for this as you cannot do a movie search.
Does anyone know how this can be done?

Look up the technique called web scraping.
Basically, you fetch the results page with a server-side script, then extract the data from it and present it in a formatted way (JSON, XML, etc.). Regular expressions or a DOM/XML parser can help.
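As a rough illustration of the fetch-then-extract idea, here is a minimal Python sketch that uses only the standard library. It assumes the page is already downloaded into a string (for example with urllib.request.urlopen(url).read().decode()), and it pulls out every link's text and href. The class name, function name and sample markup are hypothetical and are not the real structure of the Google movies page.

```python
import json
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collect the text and href of every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append({"text": text, "href": self._href})
            self._href = None


def page_to_json(html):
    """Parse an HTML string and return the extracted links as a JSON string."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.links)
```

For a real page you would replace the link extraction with whatever elements hold the showtimes, which you can only find out by inspecting the page source.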

This guy has a PHP script that converts Google results to RSS.

Related

Unable to extract data using Import.io from Amazon web page where data is loaded into the page via Ajax

Anyone know how to extract data from a webpage using Import.io where the data is loaded into the page via Ajax?
I am unable to extract data from the page mentioned below.
There is no issue in first page data extraction, but how do I move on to extract data from second page?
URL is given below.
<http://www.amazon.com/gp/aag/main?ie=UTF8&asin=&isAmazonFulfilled=&isCBA=&marketplaceID=ATVPDKIKX0DER&orderID=&seller=A13JB7253Q5S1B>
The data on that page is deployed using an interesting mix of technologies; it relies heavily on server-side code and JavaScript. That type of page can be a challenge; however, there are always methods to get the data. For example, some sellers have a page like this:
http://www.amazon.co.uk/gp/node/index.html?ie=UTF8&marketplaceID=ATVPDKIKX0DER&me=A2WO1PQ2OIOIGM&merchant=A2WO1PQ2OIOIGM
Which is very easy to extract data from, even using the magic algorithm - https://magic.import.io/?site=http:%2F%2Fwww.amazon.co.uk%2Fgp%2Fnode%2Findex.html%3Fie%3DUTF8%26marketplaceID%3DA1F83G8C2ARO7P%26me%3DA2WO1PQ2OIOIGM%26merchant%3DA2WO1PQ2OIOIGM
I had to take off the redirect=true from the URLs before it would work - just an FYI.
Other times a store doesn't have such a URL, which is a bit of a pain, and their URLs can be tough to figure out.
We do help some of our enterprise customers build bespoke APIs when the data is very important to them, so do feel free to get in touch. I imagine a larger-scale workaround would be to create a dataset/API based on the categories you are interested in and then to filter that larger dataset down (Python or CSV style) by seller name. That would probably work!
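The filtering step could look roughly like the Python sketch below. It assumes the category-level dataset has been exported as CSV text with a column holding the seller name. The column name "seller" and the function name are assumptions for illustration, not Import.io's actual export format.

```python
import csv
import io


def filter_by_seller(csv_text, seller_name, column="seller"):
    """Return the rows whose seller column equals seller_name, ignoring case."""
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = seller_name.strip().lower()
    # DictReader fills missing cells with None, hence the "or ''" guard.
    return [
        row for row in reader
        if (row.get(column) or "").strip().lower() == wanted
    ]
```

To run it against a file, read the file's contents first and pass the text in, for example filter_by_seller(open("category.csv", newline="").read(), "Some Seller").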
I managed to get a static dataset but no API. You can find that dataset at the following GUID: c7c63f1c-7081-4d4a-ad91-afe9789a6620
Thanks

Export google search to a spreadsheet

Is it possible for me to create a list of google search results from a specific query and export it into excel? For example, I'd like to google orthodontists in Florida and be able to export the business name, phone number and address to an excel spreadsheet. I've done a lot of searching but I can't find any solutions. I'm looking for someone to point me in the right direction. Any help is appreciated, thanks.
An API is an Application Programming Interface, and it's a way for your software to interact with the software on a server. Google has an API called the "Custom Search Engine" which you can use for 100 free queries per day. Other search engines may have more generous free APIs. With a search API you can write code to download text that contains all the relevant data. You can read more about search engine APIs here.
Another way to collect data from Google is to scrape their page. This means that you use code to download the HTML, and from that HTML you collect the relevant pieces (wikipedia link). With a programming language like Python, many people use the Beautiful Soup library for scraping. With code you can then take the relevant parts of the HTML and put them into a format like CSV that is readable by Excel. With Python there are ways to write to Excel directly, too (link).
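The "put it into CSV" step might be sketched like this in Python, using only the standard csv module. It assumes the scraping or API step has already produced one dictionary per business. The field names and function name are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text that Excel can open."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save the result, write the returned text to a file opened with open("results.csv", "w", newline=""). Values that contain commas, such as addresses, are quoted automatically by the csv module.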
Finally, here is a link from 2007 that says with Google Spreadsheets you can import HTML.
Update: here is the MS Excel version.
The following web app https://www.resultstoexcel.com/ allows you to download Google search results to a CSV file, a Microsoft Excel readable format, for free.
If you have any problem viewing the downloaded results correctly in MS Excel, please read the FAQs section where you will find how to open the file using the correct column separators.
Where are the results coming from?
A Google search on the topic retrieves many companies that offer online access to Google Search Results through an Application Programming Interface (API). This web app uses SERPSBOT API.

Lotus Notes Search on Web without change in URL

Is there a way in Lotus Notes to search a view without using URL commands?
Thanks.
You would need to use Ajax and some JavaScript for that. I would suggest using jQuery, which already has the function jQuery.get() that will do all the heavy lifting for you.
More info plus examples at http://api.jquery.com/jQuery.get/
Call the search, perhaps a LotusScript agent that you pass the search criteria to and that returns the results formatted as HTML, then simply insert the return value into the DIV on your form where you want to display it.
Or you can have the agent return the results as JSON, and then you can parse it locally in the browser and display it the way you want it.

google cse- rendering search results

I'm using Google CSE on my website and I want to have the search results display differently than the standard method. I've found this:
http://code.google.com/apis/customsearch/docs/snippets.html
I'm a little confused on the steps on how to style the results to my liking. I know that I have to create the structured data in my pages first (ie Pagemaps).
What does the second step mean, though?
"Fetch that structured data in the search results for your Custom Search Engine.
The Custom Search server can return the search results, along with the structured data, in XML or JSON format. "
And for the third step, do I just copy the code provided in the Custom Search Element?
Thanks in advance
"Fetch that structured data in the search results for your Custom Search Engine. The Custom Search server can return the search results, along with the structured data, in XML or JSON format. "
You don't need to fetch them yourself; I guess indexing is what is meant by that. You can force Google to re-index your sites or upload a Pagemap directly through their service: https://developers.google.com/custom-search/docs/structured_data#pagemaphttp
After that you just request the data from the results URL (this example asks for XML via output=xml):
https://www.google.com/cse?cx=[CSEID]&q=animal&output=xml&sort=myprivate12345-document-rating&pgmpk=myprivate12345
And for the third step, do I just copy the code provided in the Custom Search Element?
If you plan to use JavaScript, it is best to request the results in JSON. After that it is an object in your code, and you can style the hell out of it or do other things with it.
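To give an idea of what "an object in your code" means, here is a small sketch in Python rather than JavaScript. It parses a JSON response and picks out the fields you would style. The sample payload is invented; its field names (items, title, link, snippet, pagemap) are modelled on the Custom Search JSON API and should be checked against the response you actually receive.

```python
import json

# Hypothetical response body; real responses carry many more fields.
SAMPLE_RESPONSE = """
{
  "items": [
    {
      "title": "Red Panda",
      "link": "https://example.com/red-panda",
      "snippet": "A small mammal.",
      "pagemap": {"document": [{"rating": "4"}]}
    },
    {
      "title": "Otter",
      "link": "https://example.com/otter",
      "snippet": "A semiaquatic mammal."
    }
  ]
}
"""


def extract_results(response_text):
    """Return one plain dict per search result, with its structured data."""
    payload = json.loads(response_text)
    results = []
    for item in payload.get("items", []):
        results.append({
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "snippet": item.get("snippet", ""),
            # PageMap data is only present for pages that define it.
            "pagemap": item.get("pagemap", {}),
        })
    return results
```

Each returned dict can then be fed into whatever template or markup you want for display.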

Automating book citation search

I have a list of books listed by their titles in a text file. I want to write a script which can use a web service like Google Scholar or Amazon to search for the books and return an XML or BibTeX file with citation info for each book.
Which programming tools can I use for this kind of automated search?
Python would be my recommendation.
Get names from the text file, simple file reading
Construct a REST URL request to google's book API
http://books.google.com/books/feeds/volumes?q=Elizabeth+Bennet&start-index=21&max-results=10
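Simple Python code to get data from this URL (may need an API key; in Python 2 I would advise using urllib2 with error handling rather than urllib, and in Python 3 both live in urllib.request)
Sample code,
import urllib.request  # Python 3 home of the old urllib.urlopen
url = 'http://foo.api.request'  # placeholder URL
data = urllib.request.urlopen(url).read()  # raises urllib.error.URLError on failure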
See the return schemas for this API (you can use the XML however you like).
See BibTeXML for conversion between the two formats.
HTH
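The first two steps (reading the titles and constructing one request URL per title) could be sketched as follows. The endpoint is the one quoted in the answer and may no longer be served. The helper names are hypothetical, and only the q and max-results parameters from that example URL are used.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL in the answer; treat it as an assumption.
BASE_URL = "http://books.google.com/books/feeds/volumes"


def read_titles(text):
    """Split the contents of the titles file into non-empty, trimmed lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_query_url(title, max_results=10):
    """Build the search URL for one book title, with proper URL encoding."""
    query = urlencode({"q": title, "max-results": max_results})
    return BASE_URL + "?" + query
```

Each URL can then be fetched with urllib.request.urlopen as in the sample code, and the returned XML converted to BibTeX afterwards.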
I think it would be useful if you specified what kind of script you want to write!
Anyway... you could do some low-level work and write your own HTTP requests for Google and Amazon, or you could just rely on their APIs, for example: http://code.google.com/apis/books/
There is a great project which does something similar to what you want to do; it's called Shelves. It's written for Android but should give you some ideas on how to handle your requests. Instead of downloading citations, it downloads the cover.
http://code.google.com/p/shelves/
Just as a quick side note, saving your books in an XML file could be an option as well. In some cases it makes parsing them easier.
