Google CSE - rendering search results

I'm using Google CSE on my website, and I want the search results to display differently from the standard layout. I've found this:
http://code.google.com/apis/customsearch/docs/snippets.html
I'm a little confused about the steps for styling the results to my liking. I know that I first have to create the structured data in my pages (i.e. PageMaps).
What does the second step mean, though?
"Fetch that structured data in the search results for your Custom Search Engine.
The Custom Search server can return the search results, along with the structured data, in XML or JSON format. "
And for the third step, do I just copy the code provided in the Custom Search Element?
Thanks in advance

"Fetch that structured data in the search results for your Custom Search Engine. The Custom Search server can return the search results, along with the structured data, in XML or JSON format. "
You don't need to fetch the data yourself; I believe that step refers to indexing. You can force Google to re-index your site, or upload a PageMap directly through their service: https://developers.google.com/custom-search/docs/structured_data#pagemaphttp
After that, you just request the data from the results URL (note that this example returns XML, because of output=xml):
https://www.google.com/cse?cx=[CSEID]&q=animal&output=xml&sort=myprivate12345-document-rating&pgmpk=myprivate12345
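The sort parameter in the URL above appears to reference a PageMap attribute. As an illustration only, here is a minimal Python sketch that generates a PageMap and wraps it in the HTML comment form that goes in a page's head. The DataObject type "document" and the "rating" attribute are hypothetical names; use whatever your own PageMap defines.

```python
import xml.etree.ElementTree as ET


def build_pagemap(doc_type, attributes):
    """Return PageMap XML containing one DataObject of the given type."""
    pagemap = ET.Element("PageMap")
    data_object = ET.SubElement(pagemap, "DataObject", type=doc_type)
    for name, value in attributes.items():
        # Each attribute becomes <Attribute name="...">value</Attribute>
        attr = ET.SubElement(data_object, "Attribute", name=name)
        attr.text = str(value)
    return ET.tostring(pagemap, encoding="unicode")


def pagemap_comment(xml_text):
    """Wrap PageMap XML in an HTML comment, to be placed inside <head>."""
    return "<!--\n" + xml_text + "\n-->"


if __name__ == "__main__":
    # Hypothetical "document" object with a "rating" attribute
    print(pagemap_comment(build_pagemap("document", {"rating": "4.5"})))
```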
And for the third step, do I just copy the code provided in the Custom Search Element?
If you plan to use JavaScript, you're best off requesting the results in JSON. The response is then a plain object in your code, and you can style it however you like or process it in other ways.
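The same idea, sketched in Python (for example on your server), assuming a parsed response from the Custom Search JSON API, where results are listed under "items" with "title", "link", "snippet" and, when available, a "pagemap" dict. The "document"/"rating" PageMap names and the my-results CSS class are hypothetical.

```python
import html


def render_results(response):
    """Render a parsed Custom Search JSON API response as an HTML list."""
    parts = ['<ul class="my-results">']
    for item in response.get("items", []):
        title = html.escape(item.get("title", ""))
        link = html.escape(item.get("link", ""), quote=True)
        snippet = html.escape(item.get("snippet", ""))
        # PageMap DataObjects come back as lists keyed by type; "document" and
        # "rating" are hypothetical names matching whatever your PageMap defines
        documents = item.get("pagemap", {}).get("document", [])
        rating = documents[0].get("rating") if documents else None
        badge = f' <span class="rating">{html.escape(str(rating))}</span>' if rating else ""
        parts.append(f'<li><a href="{link}">{title}</a>{badge}<p>{snippet}</p></li>')
    parts.append("</ul>")
    return "\n".join(parts)
```

Because the markup is built from your own template, any CSS you like can be applied to it instead of the default result styling.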

Related

Does fileType="pdf" in Google CSE list() work in Python?

I am using Google Custom Search through the Python google-api-python-client. I want to retrieve PDF documents for a particular query. Below is the function that does it:
def query_results(service, q=None, startIndex=1, siteSearch=None, fileType=None):
    return service.cse().list(
        q=q,
        cx='000906600611484344115:o9lfdh9y1m5',
        start=startIndex,
        siteSearch=siteSearch,
        fileType=fileType,
        safe='off'
    ).execute()
When I call the method above with q="alienware", fileType="pdf" and siteSearch="google.com", I get barely one result. But when I type the same query, filetype:pdf alienware, into Google Search in a browser, I get plenty of results. Am I doing something wrong, or is there an issue with the API itself?
When I say I got plenty of results, I mean the results shown in the image below.
Below is my Custom Search configuration:
Specifying siteSearch="google.com" means you will only search pages on google.com. It is equivalent to searching for site:google.com on Google. Remove the siteSearch and you should get the expected results.
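With the function in the question, that simply means calling query_results(service, q="alienware", fileType="pdf") and leaving siteSearch out. For comparison, here is a minimal sketch of the equivalent Custom Search JSON API request URL, built so that unset filters are omitted entirely; the key and cx values are placeholders.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, cx, q, file_type=None, site_search=None, start=1):
    """Build a Custom Search JSON API URL, leaving out unset filters."""
    params = {"key": api_key, "cx": cx, "q": q, "start": start}
    if file_type:
        params["fileType"] = file_type
    # siteSearch works like site: in the browser, so only add it when you
    # really want to restrict results to a single site
    if site_search:
        params["siteSearch"] = site_search
    return CSE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_cse_url("YOUR_API_KEY", "YOUR_CX", "alienware", file_type="pdf"))
```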

How do I customize the Facebook generic template with flight search results?

I am working on a travel bot. Users can search for and book flights by entering an origin and a destination along with dates.
I have integrated a Node.js server, and I use an external API to retrieve flight details based on the search.
Everything is working fine, but how do I display the results in a template format (the GENERIC TEMPLATE)?
I found a similar bot, Skyscanner, which displays its search results nicely, like the one below:
[Image: Skyscanner flight search results]
They seem to convert the search results into an image and display it in a generic template. How can we do this?
How can I display the search results in a template format?
Any help is appreciated!
That appears to be an image they are generating and attaching on the fly to the generic template. You could also use the airline itinerary template:
https://developers.facebook.com/docs/messenger-platform/send-api-reference/airline-itinerary-template
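If you stay with the generic template, each flight becomes one element (card) in the payload. The sketch below builds the Send API request body in Python rather than Node.js; the flight dict keys (origin, destination, date, price, image_url, booking_url) are hypothetical, and image_url is assumed to point at an image your own server renders per flight, as Skyscanner appears to do. The 10-element cap reflects the generic template's documented carousel limit.

```python
MAX_ELEMENTS = 10  # a generic template carousel holds at most 10 elements


def flights_to_generic_template(recipient_id, flights):
    """Build a Send API request body that shows flights as generic-template cards."""
    elements = []
    for flight in flights[:MAX_ELEMENTS]:
        elements.append({
            "title": f"{flight['origin']} to {flight['destination']}",
            "subtitle": f"{flight['date']}, {flight['price']}",
            # URL of an image your server renders for this flight (hypothetical)
            "image_url": flight["image_url"],
            "buttons": [
                {"type": "web_url", "url": flight["booking_url"], "title": "Book"},
            ],
        })
    return {
        "recipient": {"id": recipient_id},
        "message": {
            "attachment": {
                "type": "template",
                "payload": {"template_type": "generic", "elements": elements},
            }
        },
    }
```

The resulting dict is what gets POSTed as JSON to the Send API's me/messages endpoint.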

Getting thumbnails in OpenSearchServer search results

I need an alternative to Google Custom Search for a website I look after. It has to crawl a website, index it, allow the priorities to be tweaked, and then accept search queries via REST (or something similar) and return XML, JSON, etc. It needs to run on a Windows Server instance.
So, I'm up and running with http://www.opensearchserver.com/ and it seems to do the trick, but I can't, for the life of me, work out how to get thumbnail images into the results. I've searched the documentation and read everything I could find, but can't work out how to do this.
I'm crawling standard web pages that all have thumbnail metadata, which I assume can somehow be parsed and included in the JSON results.
Any pointers at all would be very helpful, thanks!
I figured this out; in case anyone else is struggling, here's how I did it. The answer is in the documentation, it's just not that obvious.
Read: http://www.opensearchserver.com/documentation/faq/crawling/how_to_extract_specific_information_from_web_pages.md - it contains the method
Assume you have set up a 'web crawler' index,
and that you're using a thumbnail meta tag like this:
<meta name="thumbnail" content="http://my_cdn.com/news/images/29637.jpg">
Go to Schema / Fields and add a new field called 'thumbnail' with index: no, store: yes, vector: no, analyser: Text, copy of: (blank). Save it.
Now go to Schema / Parser list and edit the HTML parser. Under 'Field mapping', add a new regex for the thumbnail in the HTML: map from 'htmlSource' to 'thumbnail' using the matching regex.
My imperfect (but working) regex is:
htmlSource -> linked in: thumbnail -> captured by:
(?s)<meta name="thumbnail" content="(.*?)">
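To check what that regex captures before saving it in the parser, you can try it locally. A minimal Python sketch follows; OpenSearchServer itself runs the pattern in Java, but (?s) and the lazy (.*?) behave the same way in both engines for this simple case. The sample HTML is made up around the meta tag above.

```python
import re

# The regex from the answer, unchanged: (?s) lets "." also match newlines,
# and (.*?) lazily captures the content attribute's value
THUMB_RE = re.compile(r'(?s)<meta name="thumbnail" content="(.*?)">')


def extract_thumbnail(html_source):
    """Return the thumbnail URL, or None when the tag is not found."""
    match = THUMB_RE.search(html_source)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = '<head><meta name="thumbnail" content="http://my_cdn.com/news/images/29637.jpg"></head>'
    print(extract_thumbnail(page))
```

It is imperfect in the sense the answer admits: it only matches when name comes before content and both use double quotes.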
Now SAVE this, go to Crawl / Manual crawl, enter a URL that has a thumbnail, and check whether the field appears in the list below once the page has been read. If it doesn't, check your regex, and check that you actually saved the HTML parser changes.
To get the thumbnail in your results, simply add the field name to the returnedFields list in the JSON you send with the query:
"returnedFields": [
  "url",
  "thumbnail"
],

From a pool of webpages, finding pages similar to any given webpage

I am given a set of web pages and I need to build a page recommender: for whatever URL is given to the application, it should find the pages in the pool that are similar to the page at that URL.
I looked at different approaches, and word2vec interested me. I plan to crawl all the given web pages and generate tags for each page based on its content. From these tags I hope to use word2vec to calculate a vector for the page and store it. When searching, I would calculate a vector for the given page in the same way and look for similar vectors. Is this the correct way to use word2vec? What training vectors should be used? Is there a better way to do this task, or would plain text matching be a better option?
I'd recommend using existing open-source IR (information retrieval) software to handle your documents, i.e. to index your crawled web pages and query them to get results.
With Elasticsearch, for instance, you can index all the web pages and query them using the More Like This query. From the Elasticsearch documentation:
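To show what "plain text matching" looks like when done with a classic IR technique, here is a minimal pure-Python TF-IDF plus cosine-similarity sketch. It is the vector-space approach, not word2vec, and a real engine would add stemming, stop words and an inverted index; the page URLs and texts in the test are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the pool."""
    n = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs.values():
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unknown to the pool are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_text, pool, top_k=3):
    """Rank pages in pool (url -> text) by similarity to query_text."""
    tokenized = {url: tokenize(text) for url, text in pool.items()}
    idf = build_idf(tokenized)
    vectors = {url: tfidf_vector(tokens, idf) for url, tokens in tokenized.items()}
    query_vec = tfidf_vector(tokenize(query_text), idf)
    scored = [(url, cosine(query_vec, vec)) for url, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```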
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents
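As a sketch of such a query, the Python function below builds a More Like This request body to send to an index's _search endpoint. The index name, field names and page id are hypothetical; `like` can reference an already indexed page by _index/_id, or be raw text for a URL outside the pool. The min_term_freq and min_doc_freq overrides are a judgment call for small pools, where the defaults (2 and 5) can filter out most terms.

```python
def more_like_this_query(like, index=None, fields=("title", "body"), size=10):
    """Build an Elasticsearch More Like This query body.

    If index is given, `like` is treated as the _id of an already indexed page;
    otherwise it is treated as raw text (e.g. the content fetched from a new URL).
    """
    like_clause = {"_index": index, "_id": like} if index is not None else like
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [like_clause],
                # loosen the default term filters so small pools still match
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        },
    }
```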

How do I aggregate data off of a google search

I am trying to aggregate movie showtimes from a Google Movies search into a usable format such as JSON or XML:
http://www.google.com/movies?q=movie+times&sc=1&mid=&hl=en&oi=showtimes&ct=change-location&near=new+york
The Google AJAX API does not seem to work for this, as you cannot do a movie search with it.
Does anyone know how this can be done?
Look up the technique called web scraping.
Basically, you have to fetch the results page with a server-side script and then extract the data from it, so you can present it in a structured format (JSON, XML, etc.). Regular expressions or a DOM/XML parser can help.
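A minimal Python sketch of that approach, using only the standard library (urllib.request to fetch, html.parser to extract, json to output). The markup it expects (div class="movie" containing an h2 title and span class="time" showtimes) is hypothetical; a real results page will be structured differently, so the tag and class checks would need adapting after inspecting the actual HTML.

```python
import json
import urllib.request
from html.parser import HTMLParser


def fetch_page(url):
    """Download a page server-side and return its HTML as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class ShowtimeParser(HTMLParser):
    """Collects titles and showtimes from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._field = None  # which text ("title" or "time") is being captured

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "movie" in classes:
            self.movies.append({"title": "", "times": []})
        elif self.movies and tag == "h2":
            self._field = "title"
        elif self.movies and tag == "span" and "time" in classes:
            self._field = "time"

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "title":
            self.movies[-1]["title"] += text
        else:
            self.movies[-1]["times"].append(text)


def showtimes_to_json(html_text):
    """Parse the HTML and return the showtimes as a JSON string."""
    parser = ShowtimeParser()
    parser.feed(html_text)
    parser.close()
    return json.dumps(parser.movies)
```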
This guy has a PHP script that converts Google results to RSS.
