CouchDB 3 Clouseau-based Full Text Search Bookmark for Previous Page - pagination

The built-in Clouseau-based search in CouchDB 3 returns a bookmark for paging to the next page of results. However, there does not seem to be a way to generate a link to page backward. I have tried using the same bookmark and reversing the sort parameter, but that does not produce the desired result.
Is there a way to generate a previous link from the result of a Clouseau search?

Related

Cloudant/Couch db pagination in search API - How to skip n number of records

I am building a typical pagination that allows the user to click on a particular page number and view the results (similar to the Google search result view). I am using the Cloudant search API for this. The Cloudant search API provides a limit option but no skip option. How can I skip n results if the user is on page 1 and clicks on page 4?
I can see that the pagination is implemented using bookmarks. Does that mean I first need to get the bookmark for page 4 by sending 3 additional requests, one after another, to the search API?
There are a couple of different ways of handling this. One is the one you already suggested: fetch the intermediate pages as needed to collect their bookmarks. I'm not sure there are many alternatives for search results, where the result set can't be pre-calculated.
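If you do go the bookmark route, a minimal sketch of chaining requests might look like the following (Python with the requests library; the database name, design document, index name, credentials, and query are placeholders, and it assumes the CouchDB 3 / Cloudant style /_search endpoint that returns a bookmark field in each response):

    import requests

    BASE = "http://localhost:5984/mydb/_design/mysearch/_search/by_text"  # hypothetical index
    AUTH = ("admin", "password")                                          # assumption: basic auth
    PAGE_SIZE = 10

    def fetch_page(query, bookmark=None):
        """Fetch one page of search results, optionally resuming from a bookmark."""
        params = {"q": query, "limit": PAGE_SIZE}
        if bookmark:
            params["bookmark"] = bookmark
        resp = requests.get(BASE, params=params, auth=AUTH)
        resp.raise_for_status()
        return resp.json()  # assumed to contain "rows" and "bookmark"

    def jump_to_page(query, page_number):
        """Reach page N by walking forward and remembering every bookmark seen."""
        bookmarks = []
        result = fetch_page(query)
        for _ in range(page_number - 1):
            bookmarks.append(result["bookmark"])
            result = fetch_page(query, bookmark=result["bookmark"])
        return result, bookmarks

Since the API only appears to hand out forward bookmarks, keeping the ones you have already seen (client-side or in the session) is also the usual way to offer a "previous page" link: replaying an earlier bookmark re-fetches that page.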
Another alternative, and this depends a bit on the details of what you are trying to do, is to create a view containing the data and use the keys to narrow the view down to the results you need. View queries support limit and skip, which would enable you to implement pagination.
There's also a good example of pagination in the docs: http://docs.couchdb.org/en/2.1.0/ddocs/views/pagination.html
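For the view-based alternative, a rough sketch of paging with limit and skip could look like this (Python; the view path, key, and credentials are made up for illustration):

    import json
    import requests

    BASE = "http://localhost:5984/mydb/_design/app/_view/by_category"  # hypothetical view
    AUTH = ("admin", "password")                                        # assumption: basic auth
    PAGE_SIZE = 10

    def view_page(key, page_number):
        """Fetch page N of a view, narrowed to one key, using limit and skip."""
        params = {
            "key": json.dumps(key),                 # view keys are JSON-encoded
            "limit": PAGE_SIZE,
            "skip": (page_number - 1) * PAGE_SIZE,  # e.g. page 4 skips 30 rows
        }
        resp = requests.get(BASE, params=params, auth=AUTH)
        resp.raise_for_status()
        return resp.json()["rows"]

    # e.g. jump straight to page 4 without walking pages 2 and 3:
    rows = view_page("books", 4)

Bear in mind that skip still has to walk past the skipped rows inside the view, which is why it is usually recommended only for relatively small offsets.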

How can I crawl but not index web pages in OpenSearchServer?

I'm using OpenSearchServer to provide search functionality on a website. I want to crawl all pages on the site to follow their links, but I want to exclude some pages from the index. I can't work out how to do this.
Specifically, the website includes a shop that has its own product search, and I am keeping that search for products and categories. The product pages have URLs like http://www.thesite/p/123, so I don't want to include any page like this in the search results. However, some product pages reference background info pages, and I want these to be included in the search index.
The problem I have is that the filter has no effect on the results: it doesn't filter out the /p/ and /c/ results. If I change the filter by unticking the negative box, I get no results, so it seems to be either the contents of the field or the filter criteria that is causing the problem.
I've tried adding a negative filter with url:"http://www.thesite/p/*" to the default query called search in the Query > Filter tab on the index, but it seems that wildcards are not supported for query filters, although they are supported for Crawler > Exclusion list filters.
I've tried adding a new field called urlField in Schema > Fields and populating it with an analyzer configured to use the Whitespace Tokenizer and a regular expression (http://www.thesite/(c|p)/). When I use the Test button it seems to generate two tokens for my test URL http://www.thesite/p/123:
http://www.thesite/p/
p
I'd hoped to be able to use the first one in a Query > Filter to exclude all the shop results and optionally be able to use the p (for product) or c (for category) if I need to search the product pages sometime in the future.
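For what it's worth, those two tokens are consistent with the pattern producing its whole match plus its capture group. A quick illustration outside OpenSearchServer (plain Python re, using the pattern exactly as configured; the test URL is the one from the question):

    import re

    # Strictly the dots should be escaped, but the pattern still matches here.
    pattern = re.compile(r"http://www.thesite/(c|p)/")

    m = pattern.match("http://www.thesite/p/123")
    print(m.group(0))  # http://www.thesite/p/  -> the first token reported
    print(m.group(1))  # p                      -> the second token reported

Assuming the analyzer really does index both the full match and the capture group, the first token does look usable as the filter value.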
The urlShop field in the schema is set up as follows:
Indexed: yes
Stored: no (because I don't need the field back, just want to be able to filter on it)
TermVector: No
Analyzer: urlShop
Copy of: url
I've added urlFilter:"http://www.thesite/p/" to Query > Filters with the negative box ticked.
This seems to have no effect on the results when I use the default renderer.
To see whether it affects the returned results, I unticked the negative box in the query filter; I then get no results in the default renderer. This leads me to believe that the urlShop field is not being populated, but I'm not sure how to check this directly.
I would like to know whether there is an easier way to do this, but if my approach makes sense in the context of OpenSearchServer, please can you help me identify what's wrong?
The website is running under IIS and OpenSearchServer will be configured on the same server running in Tomcat.
Finally figured this out...
Go to query and hit edit for your configured query. Then go to the filters tab. Add a query filter like this:
urlExact:"http://myurltoexclude*"
Check the "negative" box. Click add.
Now make sure to click the tiny little "save" button on the right-hand side. This is the part I missed. The URLs are still in the DB and still get crawled, but at least they aren't returned in the results.

Google Custom Search retrieve all the results, is it possible?

I'm starting to use the Google Custom Search Engine in order to track the use of some selected words over time in an online newspaper.
I see, for example, that my query reports a total of 22000 retrieved articles. I tried to retrieve results beyond index 100 but I can't get any result.
I also tried searching directly on the Google web page, but I see that after page 10 I can't go any further, so this only shows me the first 1000 results at most.
Is it possible to retrieve every single result, or can I only get a small portion of them?
Thanks
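For reference, paging through the Custom Search JSON API is done with the start parameter. A minimal sketch (the API key, search-engine ID, and query are placeholders; the loop stops within the roughly 100-result window the API serves, which matches what you are seeing):

    import requests

    API_KEY = "YOUR_API_KEY"        # placeholder
    CX = "YOUR_SEARCH_ENGINE_ID"    # placeholder
    ENDPOINT = "https://www.googleapis.com/customsearch/v1"

    def cse_results(query):
        """Yield results ten at a time; the API only serves roughly the first 100."""
        start = 1
        while start <= 91:          # start + num must stay within the ~100-result window
            params = {"key": API_KEY, "cx": CX, "q": query, "start": start, "num": 10}
            resp = requests.get(ENDPOINT, params=params)
            resp.raise_for_status()
            items = resp.json().get("items", [])
            if not items:
                break
            yield from items
            start += len(items)

    for item in cse_results("some selected word"):   # example query
        print(item["link"], item.get("title"))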

"Previous" link - equivalent of LIMIT x OFFSET y?

I'm creating a page system using CouchDB, showing:
10 items per page
a link to the previous page (if any)
a link to the next page (if any)
From this article on the topic, I understand that using skip is suboptimal, and that I should instead use the startkey property to specify the first document, read 11 documents from there, display the first 10 and use the key of the 11th to display the link to the next page. What troubles me is the link to the previous page. The article says:
Populating the link to the previous page is as simple as carrying the current startkey over to the next page. If there’s no previous startkey, we are on the first page.
This works when going to the next page: when I move from page 4 to page 5 I can remember that the previous page was 4. But when I move back from page 5 to page 4, I have no way of carrying over the startkey of page 3. How can this work?
Is it possible (and recommended) to use endkey along with skip=10 and limit=1 to find the first element on the previous page, so that I may create a link back to it?
In fact you can simply ask for 11 documents with no skip, which is what Futon does (look at the CouchDB logs).
The trick
Both the next and previous page links are built the same way: startkey is the first or last element of the current page, with skip=1 to avoid overlap. You then have to use the descending parameter correctly to get either the previous or the next documents.
The execution
Whenever you're asking for a page, CouchDB answers with eleven documents. Let's say the key of the first one is first and the key of the last one is last. The pagination links will look like:
"next": /db/_view/myview?descending=true&limit=11&startkey=last&skip=1
"back": /db/_view/myview?descending=false&limit=11&startkey=first&skip=1
Et voilà! You just have to reverse the documents before displaying them when descending is false. ("Finding your data with views" from the CouchDB guide explains nicely the relation between those parameters and B-Trees.)
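To make the recipe concrete, here is a minimal sketch of building those two links and reversing the back page before display (Python; the view path is a placeholder, and the rows are assumed to be the JSON rows CouchDB returned for the current page, fetched with descending=true as in the URLs above):

    import json
    from urllib.parse import urlencode

    VIEW = "/db/_design/app/_view/myview"   # hypothetical view path
    LIMIT = 11                               # as in the URLs above

    def page_links(rows):
        """Build next/back links from the rows of the current page.

        Follows the recipe above: first and last are the keys of the first
        and last returned rows, skip=1 avoids re-fetching the boundary
        document, and descending picks the direction.
        """
        first, last = rows[0]["key"], rows[-1]["key"]
        next_link = VIEW + "?" + urlencode({
            "descending": "true",
            "limit": LIMIT,
            "startkey": json.dumps(last),   # startkey values are JSON-encoded
            "skip": 1,
        })
        back_link = VIEW + "?" + urlencode({
            "descending": "false",
            "limit": LIMIT,
            "startkey": json.dumps(first),
            "skip": 1,
        })
        return next_link, back_link

    def rows_for_display(rows, descending):
        """Pages fetched with descending=false come back in ascending key
        order; reverse them so every page renders in the same order."""
        return rows if descending else list(reversed(rows))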
Bonus
You can easily get the docid of the first or last document of the whole view (limit=1 with descending false or true), and end up with a pagination system that looks a lot like what you would have with a classical database (first, last, previous, next).
Read 21 documents instead of 11: one extra going forward, and ten going backwards. The very first of the 21 holds the key to the previous page.

Best way to implement <next>, <prev> element links from search list

In my web application I have a search results list (SR). The search is heavily parameterized. Each element on the list can be clicked, and then the element's own page (EP) is displayed.
Now the customer wants the ability to go to the previous and next element of the search list that was used to reach the element page.
How would you implement this? I could pass the search conditions and the element's index on the list to the EP; prev/next would then just mean rerunning the search query, getting the previous/next index, and displaying that element (again passing the conditions and the new index).
Or is there a better approach?
How intensive is your search process? It sounds like something you don't want to execute any more than necessary. What if, when you render the search results, you also store the unique EP IDs in a list on the server? You could then navigate through that list using indexes for prev/next, and use the unique ID of the EP element to load its details. You could also store the query term and repopulate the search results behind a 'Back to Search' link.
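A rough sketch of that idea (Python; the session dictionary, ID list, and helper names are all made up for illustration):

    # Hypothetical per-user session storage: after rendering the search results,
    # keep the ordered list of result IDs so prev/next never re-runs the search.
    def remember_results(session: dict, query: str, result_ids: list) -> None:
        session["last_query"] = query        # enables a 'Back to Search' link
        session["result_ids"] = result_ids   # ordered IDs of the search results (SR)

    def neighbour_ids(session: dict, current_id) -> tuple:
        """Return the (prev, next) element IDs around current_id, or None at the ends."""
        ids = session.get("result_ids", [])
        try:
            i = ids.index(current_id)
        except ValueError:
            return None, None                # element not part of the remembered search
        prev_id = ids[i - 1] if i > 0 else None
        next_id = ids[i + 1] if i < len(ids) - 1 else None
        return prev_id, next_id

The element page (EP) would then render its prev/next links from those IDs and a 'Back to Search' link from the stored query.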
