Solr Faceting - Showing First 10 Results and "Other" - search

I am implementing a solution in Solr where I have a lot of values in my facet.
Rather than displaying a long list of facet values down the side of my page, I want to display only the top 10, plus one entry for "Other".
For instance, I would be faceting on Nationality.
I do not want a list of every nationality, nor do I want a "see all" button.
What I require is the top 10 nationalities and then "Other".
When a user clicks on "Other", the search should facet on that.

This is quite easy in Solr. All you need to do is add
&facet.limit=10
to your request, e.g.
http://solrserver:8080/solr/select?version=2.2&q=solr&start=0&rows=0&indent=on&facet=on&facet.field=nationality&facet.limit=10
and the facet values will be limited to the top 10.
For more information you can check out my blog post on faceting in Solr:
http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
or the solr wiki here:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
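If it helps to see the same request from application code, here is a minimal Python sketch; it assumes a hypothetical Solr core at solrserver:8080 with a nationality facet field and uses the JSON response writer, where classic faceting returns the values as a flat [value, count, value, count, ...] list:

import requests

SOLR_URL = "http://solrserver:8080/solr/select"  # hypothetical server/core

params = {
    "q": "*:*",
    "rows": 0,                     # only facet counts are needed, no documents
    "wt": "json",                  # JSON response writer
    "facet": "on",
    "facet.field": "nationality",  # hypothetical facet field
    "facet.limit": 10,             # top 10 facet values only
}

response = requests.get(SOLR_URL, params=params).json()
flat = response["facet_counts"]["facet_fields"]["nationality"]
top10 = list(zip(flat[::2], flat[1::2]))  # pair up value/count entries
for value, count in top10:
    print(value, count)

Building the "Other" entry would still take an extra step on top of this, for example a second query that excludes those ten values.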

Related

Cloudant/CouchDB pagination in search API - how to skip n records

I am building a typical pagination that allows the user to click on a particular page number and view the results (similar to the Google search result view). I am using the Cloudant search API for this. The Cloudant search API provides a limit option but no skip option. How can I skip n results if the user is on page 1 and clicks on page 4?
I can see that pagination is implemented using bookmarks. Does that mean I need to first get the bookmark for page 4 by sending 3 additional requests, one after another, to the search API?
There are a couple of different ways of handling this - one is the one you already suggested, which is just to fetch the pages as needed to get the bookmarks. I'm not sure there are many alternatives for search results where we can't pre-calculate the results.
Another alternative, and this depends a bit on the details of what you are trying to do, is to create a view containing the data and use the keys to narrow down the view to the results you need. View outputs support the use of limit and skip, which would enable you to implement pagination.
There's also a good example of pagination in the docs: http://docs.couchdb.org/en/2.1.0/ddocs/views/pagination.html
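As a rough sketch of the first approach (walking the bookmarks forward until you reach the page you need), here is what it could look like in Python with the requests library; the account, database, design document and index names are placeholders, and it relies on the Cloudant search endpoint accepting q, limit and bookmark parameters and returning a bookmark in each response:

import requests

SEARCH_URL = "https://account.cloudant.com/mydb/_design/app/_search/by_text"  # placeholder names
PAGE_SIZE = 10

def fetch_page(query, page, auth):
    # Chain bookmarks from page 1 up to the requested (1-based) page.
    bookmark = None
    resp = {}
    for _ in range(page):
        params = {"q": query, "limit": PAGE_SIZE}
        if bookmark:
            params["bookmark"] = bookmark
        resp = requests.get(SEARCH_URL, params=params, auth=auth).json()
        bookmark = resp.get("bookmark")
    return resp.get("rows", [])

# Jumping from page 1 to page 4 costs three extra requests.
rows = fetch_page("title:pagination", 4, auth=("user", "password"))

Caching the bookmarks you have already fetched (for example keyed by page number) avoids repeating the whole walk every time the user clicks a later page.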

How to implement a pagination strategy for CouchDB/PouchDB when the startKey is unknown

Numerous readings indicate that skip should REALLY be avoided when doing pagination; most of the links cited say that using startKey and limit is the way to go. After I get the first page I know the startKey of that page, the lastKey of that page and the total number of entries. If I have a pagination control with page numbers as buttons and the user selects page 3, how do I get there? I have no idea what the startKey of page 3 is. Perhaps I could run a simple view up front to get the start keys for each page.
This page nicely describes pagination:
http://docs.couchdb.org/en/1.6.1/couchapp/views/pagination.html
So, you can't really have a "Go to page 298", but to have links to the previous and next 5 pages, you can look up a larger number of preceding and following documents and generate links accordingly. For example, if you have 10 posts per page, look up the 50 following keys and take every 10th one.
As for making a "Go to page X", perhaps a background script that generates some sort of cache?
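A small Python sketch of that look-ahead idea against CouchDB's HTTP view API (the database, design document and view names are made up for illustration, and the startkey_docid refinement needed for non-unique keys is ignored): fetch a block of keys starting at the current page's startkey, then take every 10th key as the startkey for the following pages' links.

import json
import requests

VIEW_URL = "http://localhost:5984/blog/_design/posts/_view/by_date"  # hypothetical view
PER_PAGE = 10
LOOKAHEAD_PAGES = 5

def next_page_startkeys(current_startkey):
    # Look ahead far enough to cover the next 5 pages.
    params = {
        "startkey": json.dumps(current_startkey),  # view keys are JSON-encoded
        "limit": PER_PAGE * LOOKAHEAD_PAGES + 1,
    }
    rows = requests.get(VIEW_URL, params=params).json()["rows"]
    # Every PER_PAGE-th row key is the first key of one of the following pages.
    return [row["key"] for row in rows[PER_PAGE::PER_PAGE]]

links = next_page_startkeys("2017-01-01")  # startkey of the page currently shown

The same idea works from PouchDB by querying the view with startkey and limit and slicing the returned rows the same way.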

How can I crawl but not index web pages in OpenSearchServer?

I'm using OpenSearchServer to provide search functionality on a web site. I want to crawl all pages on the site for links to follow but I want to exclude some pages from the index. I can't work out how to do this.
Specifically the website includes a shop that has its own product search and I am keeping this search for products and categories. The product pages have URLs like http://www.thesite/p/123 so I don't want to include any page like this in the search results. However some product pages reference background info pages and I want these to be included in the search index.
The problem I have is that the filtering I describe below has no effect on the results - it doesn't filter out the /p/ and /c/ results. If I untick the negative box on the filter I get no results at all, so it seems to be either the contents of the field or the filter criteria that is causing the problem.
I've tried adding a negative filter to the default query called search in the Query > Filter tab on the index with url:"http://www.thesite/p/*"
but it seems that wildcards are not supported for query filters although they are supported for Crawler > Exclusion list filters.
I've tried adding a new field called urlField in Schema > Fields and populating it with an analyzer that uses the Whitespace Tokenizer and a regular expression (http://www.thesite/(c|p)/). When I use the Test button it seems to generate two tokens for my test URL http://www.thesite/p/123:
http://www.thesite/p/
p
I'd hoped to be able to use the first one in a Query > Filter to exclude all the shop results and optionally be able to use the p (for product) or c (for category) if I need to search the product pages sometime in the future.
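Those two tokens are consistent with how that regular expression matches the test URL - the full match and the single captured group. A quick Python check (purely illustrative; it does not reproduce OpenSearchServer's analyzer) shows the same two pieces:

import re

pattern = re.compile(r"http://www\.thesite/(c|p)/")
match = pattern.match("http://www.thesite/p/123")

print(match.group(0))  # http://www.thesite/p/  (full match)
print(match.group(1))  # p                      (captured group: p or c)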
The urlShop field in the schema is set up as follows:
Indexed: yes
Stored: no (because I don't need the field back, just want to be able to filter on it)
TermVector: No
Analyzer: urlShop
Copy of: url
I've added urlFilter:"http://www.thesite/p/" to Query > Filters with the negative box ticked.
This seems to have no effect on the results when I use the default renderer.
To see whether it affects the returned results at all, I unticked the negative box in the query filter; then I get no results in the default renderer. This leads me to believe that the urlShop field is not being populated, but I'm not sure how to check this directly.
I would like to know whether there is an easier way to do this but if my approach makes sense in the context of OpenSearchServer please can you help me identify what's wrong?
The website is running under IIS and OpenSearchServer will be configured on the same server running in Tomcat.
Finally figured this out...
Go to query and hit edit for your configured query. Then go to the filters tab. Add a query filter like this:
urlExact:"http://myurltoexclude*"
Check the "negative" box. Click add.
Now make sure to click "save" in the tiny little button on the right hand side. This is the part I missed. The URLs are still in the database and still get crawled, but at least they aren't returned in the results.

Pagination in Marklogic while using Search API

I have around 5,300,000 documents in MarkLogic Server and I am building a simple search application. The user enters a search term and MarkLogic Server searches for that term in all the nodes of all the documents and returns the matching documents as the result. I have implemented custom paging and am showing 10 results per page.
I am using the Search API for this:
import module namespace search="http://marklogic.com/appservices/search" at "/MarkLogic/appservices/search/search.xqy";
declare variable $options:=
<options xmlns="http://marklogic.com/appservices/search">
<transform-results apply="raw"/>
</options>;
search:search($p, $options, $noRecFrom, 10)/search:result
where $p is the input from the user and $noRecFrom is the number indicating the record from which to start showing results. For example, for page 1 $noRecFrom will be 1, for page 2 it will be 11, for page 3 it will be 21, and so on. For paging there are hyperlinks to go to the First, Next, Prev and Last pages.
To calculate the total number of records returned I am using:
for $x in search:search($p, $options)
return $x//@total
While the First, Next and Prev hyperlinks work perfectly, if someone clicks Last the application stops responding and the query does not show any output. Is it due to the large number of documents in the database, or am I implementing it wrongly?
Is there any efficient way to do pagination in MarkLogic (for search:search) so that the user can go to the Last page without a delay in the query result for such a large database?
The way you've implemented it, you're running the search repeatedly in your for loop. And that would indeed be slow.
Instead, you should be calculating a $start parameter based on the @total and the number of documents per page, and passing that in as an argument (I think it's the third one) to search:search.
I would also recommend making sure you can run in unfiltered mode. There is good information about optimizing for fast pagination (indexes, etc) on the developer site; the idea is to resolve queries out of indexes to give very good, accurate unfiltered performance.
If you do it that way, jumping straight to the last page should be no more expensive than any other page.
There is a tutorial on paginated search at http://developer.marklogic.com/learn/2006-09-paginated-search
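To make the offset arithmetic concrete, here is a small sketch (plain Python, just the math, not MarkLogic code) of how a $start value could be derived from the page number and, for a "Last" link, from the @total reported by the first search:

PAGE_SIZE = 10

def start_for_page(page):
    # 1-based start offset for a page, e.g. page 3 -> 21.
    return (page - 1) * PAGE_SIZE + 1

def start_for_last_page(total):
    # Start offset of the final page, given @total from the search:response.
    last_page = max(1, (total + PAGE_SIZE - 1) // PAGE_SIZE)
    return start_for_page(last_page)

print(start_for_page(3))             # 21
print(start_for_last_page(5300000))  # 5299991

Passing that value as the start argument to search:search (with the query resolving unfiltered out of indexes, as described above) is what makes jumping straight to the last page cheap.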
Once you have resolved the issues mentioned by cwhit above, if you still want to get to the last page of data in a faster manner, you could make your code smart enough to reverse the sort order and pull the correct offset of records.
Here's another tip:
To get better insight to what MarkLogic is doing with search:search, call
search:get-default-options()
to see the starting point for common search applications.

Drupal views: limit results shown based on CCK field setting

I have a view that pulls in feed items (from various "Owner feeds", to use the Feeds module lingo), then sorts them by date (very important). The owner feed has a CCK field for the type of feed (Twitter, Blog, etc.) and, theoretically, a CCK field to limit the number of feed items displayed in the view. (The reason for the limit is that Twitter dominates, but we want some blogs, etc., so I don't want to have to show 100 tweets before displaying my first blog post.)
I'm guessing some sort of Views hook code is in order, but I'm not sure which one. Perhaps the hook that allows direct modification of the query...
Note that the Owner feed nid is being pulled into the view via a Relationship.
Thanks in advance!
Using the Arguments field in the view, you could first select the node type, then node post date.
In this way you could give priority to the blog type, before sorting by date.
You could also consider building two Views, for instance the top 5 blog items and the top 20 Twitter feeds...
