How to support pagination for external change log searching in OpenDJ LDAP?

I want to search the change log under "cn=changelog". The search works fine when the result contains only a few entries, but when there are many entries the client runs out of memory. So I want to page through the results. How can I define the size limit?
I also referred to https://bugster.forgerock.org/jira/si/jira.issueviews:issue-html/OPENDJ-1218/OPENDJ-1218.html. However, I wonder how to define a filter on "changeNumber", and in my results that attribute is missing. Why?
How should I do this?
BTW, I am using OpenDJ 3.0.

Size limit is an option of the client call. You can always specify the maximum number of entries you want returned (the server has its own limit and will enforce the smaller of the two).
How to define the size limit depends on what you are using as a client, and you did not mention it.
Can you provide details on what you are using to search (tool, library...) and on the filter and options you are currently using? It's difficult to provide help and suggestions for improvement without any details.
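For instance, if the client were .NET, a minimal sketch using the simple paged results control (RFC 2696) might look like the following. The host, port, and credentials are hypothetical, and it assumes the server permits the control on this search; it also shows one way to filter on changeNumber.
using System;
using System.DirectoryServices.Protocols;
using System.Net;

class ChangelogPagedSearch
{
    static void Main()
    {
        // Hypothetical host, port, and credentials; adjust for your deployment.
        var connection = new LdapConnection(new LdapDirectoryIdentifier("opendj.example.com", 1389));
        connection.AuthType = AuthType.Basic;
        connection.Bind(new NetworkCredential("cn=Directory Manager", "password"));

        // Changelog entries sit directly under cn=changelog; the filter
        // selects them by changeNumber.
        var request = new SearchRequest(
            "cn=changelog",
            "(changeNumber>=1)",
            SearchScope.OneLevel,
            "changeNumber", "targetDN", "changeType", "changes");
        var pageControl = new PageResultRequestControl(100); // 100 entries per page
        request.Controls.Add(pageControl);

        while (true)
        {
            var response = (SearchResponse)connection.SendRequest(request);
            foreach (SearchResultEntry entry in response.Entries)
                Console.WriteLine(entry.DistinguishedName);

            // The response control carries the cookie needed for the next page.
            PageResultResponseControl pageResponse = null;
            foreach (DirectoryControl control in response.Controls)
                if (control is PageResultResponseControl prc)
                    pageResponse = prc;

            if (pageResponse == null || pageResponse.Cookie.Length == 0)
                break; // an empty cookie means there are no more pages
            pageControl.Cookie = pageResponse.Cookie;
        }
    }
}
Each iteration fetches at most one page, so memory use stays bounded regardless of how many changelog entries exist.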

Related

Github API incomplete_results for a simple search

I am querying GitHub via the API. This is a really simple query to find files called pipelines.yaml in the pipelines folder in my org abc-xyz.
Querying https://api.github.com/search/code?q=filename:pipelines+path:pipelines+language:yaml+org:abc-xyz
However, all my responses have "incomplete_results": true. The output length varies every time I query, by a repo or two.
Query Output:[total_count:16, incomplete_results:true, items:[[name:pipelines.yaml, path:pipelines/pipelines.yaml.
How do I fix this? I understand why GitHub does this, but the reports I see online involve large result sets. In my case I am talking about fewer than 500 results, ideally fewer than 50.
However, my org has about 800 repos. Is this why I always get incomplete_results? Is there an easy way to fix this?
Appreciate any feedback possible.
My repo is private.
Thanks
Vinay
There are some details in the documentation that are helpful here:
To keep the Search API fast for everyone, we limit how long any individual query can run. For queries that exceed the time limit, the API returns the matches that were already found prior to the timeout, and the response has the incomplete_results property set to true.
Reaching a timeout does not necessarily mean that search results are incomplete. More results might have been found, but also might not.
My guess here is that the query is hitting a timeout and returning what it could find at that time. Tuning the search query using the advice in the docs might help get it under that timeout. Moving to querying per repository and caching results client-side might also work, but then you may need to think about rate limiting.
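As a rough sketch of the per-repository idea (not an official recipe): the repo list, token variable, and pacing delay below are all assumptions you would adapt.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text.Json;
using System.Threading.Tasks;

class GitHubCodeSearch
{
    static async Task Main()
    {
        var http = new HttpClient();
        http.DefaultRequestHeaders.UserAgent.ParseAdd("pipelines-audit");
        http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("token", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

        // Hypothetical repo list; in practice fetch it from /orgs/{org}/repos.
        string[] repos = { "abc-xyz/service-a", "abc-xyz/service-b" };

        foreach (var repo in repos)
        {
            // Same query as before, but scoped to one repo at a time.
            var url = "https://api.github.com/search/code" +
                      $"?q=filename:pipelines+path:pipelines+language:yaml+repo:{repo}";
            using var doc = JsonDocument.Parse(await http.GetStringAsync(url));
            var root = doc.RootElement;

            if (root.GetProperty("incomplete_results").GetBoolean())
                Console.WriteLine($"{repo}: results may be incomplete, consider retrying");

            foreach (var item in root.GetProperty("items").EnumerateArray())
                Console.WriteLine($"{repo}: {item.GetProperty("path").GetString()}");

            await Task.Delay(2000); // crude pacing to stay under the search rate limit
        }
    }
}
Each per-repo query covers far less content than an org-wide one, so it is much less likely to hit the search timeout.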

Logstash: is it possible to save documents in memory?

I am trying to save data in memory so that I can retrieve it quickly in my filter part. Indeed, when I receive new documents I want to retrieve related former documents in order to compute some new metrics.
Can anyone tell me if this is possible and, if so, how I could achieve it?
Thank you very much.
Joe
The closest thing to achieving this would be to use the elasticsearch filter to query an ES cluster for some document, or the unofficial memcached filter, which is probably more up to the task given the features of memcached.
I'm not aware of any official or unofficial redis or hazelcast filters, but since both are caching technologies they would also be an option.
You should also have a look at the existing metrics filter, which might be of some help depending on your use case; speaking of which, you should detail it a bit more if you want more precise help.
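For example, a minimal sketch of the elasticsearch filter approach; the transaction_id/amount fields and the derived previous_amount/amount_delta names are hypothetical placeholders, and option names can vary across plugin versions.
filter {
  elasticsearch {
    hosts  => ["localhost:9200"]
    # Look up the related former document by a shared key
    # (field names here are placeholders).
    query  => "transaction_id:%{[transaction_id]}"
    # Copy a field from the found document onto the current event.
    fields => { "amount" => "previous_amount" }
  }
  ruby {
    # Compute a new metric from the current and retrieved values.
    code => "event.set('amount_delta', event.get('amount').to_f - event.get('previous_amount').to_f)"
  }
}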

Date function and Selecting top N queries in DocumentDB

I have the following questions regarding Azure DocumentDB:
According to this article, multiple functions have been added to DocumentDB. Is there any way to get the Date functions working? How can I get greater-than-some-date queries working?
Is there any way to select the top N results, like 'Select top 10 * from users'?
According to the Document playground, Order By will be supported in the future. Is there any other way around it for now?
The application that I am developing requires a certain number of recently inserted results to be displayed, and I need these functionalities within a stored procedure. The documents that I am storing in DocumentDB have a DateTime property, and I require the above-mentioned functionalities for my application to work. I have searched the documentation and samples. Please help if you know of any workaround.
Some thoughts/suggestions below:
Please take a look at this idea on how to store and query dates in DocumentDB (as epoch timestamps). http://azure.microsoft.com/blog/2014/11/19/working-with-dates-in-azure-documentdb-4/
To get top N results, set FeedOptions.MaxItemCount and read only one page, i.e., call ExecuteNextAsync() once. See https://msdn.microsoft.com/en-US/library/microsoft.azure.documents.linq.documentqueryable.asdocumentquery.aspx for an example. We're planning to add TOP to the grammar to make this easier in the future.
You can email me at arramac at microsoft dot com to get early access to Order By right away. This is planned for broad release shortly.
Please note that stored procedures are best used when you have write operations. You'll get better throughput on reads when you query directly.
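Putting the epoch-date and MaxItemCount suggestions together, a rough sketch with the .NET SDK of that era; the UserDocument shape, the Epoch property, and the seven-day cutoff are hypothetical.
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

class RecentUsersQuery
{
    // Hypothetical document shape; Epoch stores the DateTime as a Unix
    // timestamp, per the dates-as-epoch approach in the linked post.
    class UserDocument
    {
        public string Id { get; set; }
        public long Epoch { get; set; }
    }

    static async Task PrintTenMostRecentAsync(DocumentClient client, string collectionLink)
    {
        long cutoff = DateTimeOffset.UtcNow.AddDays(-7).ToUnixTimeSeconds();

        // MaxItemCount caps the page size; reading a single page emulates TOP 10.
        var query = client.CreateDocumentQuery<UserDocument>(
                collectionLink,
                new FeedOptions { MaxItemCount = 10 })
            .Where(u => u.Epoch >= cutoff)   // "greater than some date" via the epoch value
            .AsDocumentQuery();

        var page = await query.ExecuteNextAsync<UserDocument>();
        foreach (var user in page)
            Console.WriteLine($"{user.Id}: {user.Epoch}");
    }
}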

azure search. What if I have a lot of facets

In a commercial application it is not uncommon to have hundreds of facets. Of course, not all products are flagged with all of them.
But when searching I need to add a facet query-string parameter that lists all the facets I want back. As I don't know the relevant ones in advance, I have to pass all of them in the query.
This is not practical with more than a few facets.
Is there a way to solve this issue, or is it a limitation of the product?
The Azure Search doc:
https://msdn.microsoft.com/fr-fr/library/azure/dn798927.aspx
You are correct that this is a current limitation of Azure Search: you need to pass all the facets in the query string. Please know that we are aware of this; in fact, it can be an even bigger issue for customers who have so many parameters or facets in their query string that it exceeds the maximum size of the URL. For this reason, we are investigating what can be done to accommodate this.
I apologize that I do not yet have a date for when this is to be available other than to say it is on our short term roadmap.
Liam
It looks like Azure Search now supports both a GET and a POST method, and recommends using POST when the length of the URL would exceed the maximum of 2048 characters (1024 for just the query string).
https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
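For illustration, a minimal sketch of the POST form; the service name, index name, api-version, key variable, and facet fields are all placeholders.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class FacetedSearch
{
    static async Task Main()
    {
        var http = new HttpClient();
        // Hypothetical service, index, key, and api-version; substitute your own.
        var url = "https://my-service.search.windows.net/indexes/products/docs/search?api-version=2020-06-30";
        http.DefaultRequestHeaders.Add("api-key", Environment.GetEnvironmentVariable("AZURE_SEARCH_KEY"));

        // With POST the facet list lives in the JSON body, so it is not
        // constrained by the ~2048-character URL limit.
        var body = @"{
            ""search"": ""*"",
            ""facets"": [ ""brand"", ""color"", ""size"" ]
        }";

        var response = await http.PostAsync(url, new StringContent(body, Encoding.UTF8, "application/json"));
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}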

How do I sort Lucene results by field value using a HitCollector?

I'm using the following code to execute a query in Lucene.Net
var collector = new GroupingHitCollector(searcher.GetIndexReader());
searcher.Search(myQuery, collector);
resultsCount = collector.Hits.Count;
How do I sort these search results based on a field?
Update
Thanks for your answer. I tried using TopFieldDocCollector, but I got an error saying "value is too small or too large" when I passed 5000 as the numHits argument. Please suggest a valid value to pass.
The search.Searcher.search method will accept a search.Sort parameter, which can be constructed as simply as:
new Sort("my_sort_field")
However, there are some limitations on which fields can be sorted: they need to be indexed but not tokenized, and the values must be convertible to Strings, Floats, or Integers.
Lucene in Action covers all of the details, as well as sorting by multiple fields and so on.
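For instance, a minimal sketch against the Lucene.Net 3.x API, where the string-only Sort constructor is gone, so a SortField is used; "my_sort_field" is a placeholder.
// Sort by an indexed-but-not-tokenized string field instead of by relevance.
var sort = new Sort(new SortField("my_sort_field", SortField.STRING));
// Ask for the top 200 hits ordered by that field.
TopFieldDocs docs = searcher.Search(myQuery, null, 200, sort);
Console.WriteLine("total hits: " + docs.TotalHits);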
What you're looking for is probably TopFieldDocCollector. Use it instead of the GroupingHitCollector (what is that?), or inside it.
Comment on this if you need more info. I'll be happy to help.
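For what it's worth, on newer Lucene.Net (3.x) TopFieldDocCollector has been superseded by TopFieldCollector; a rough sketch, reusing myQuery and searcher from the question and a hypothetical "size" field:
var sort = new Sort(new SortField("size", SortField.LONG, true));
TopFieldCollector collector = TopFieldCollector.Create(
    sort,
    200,     // numHits: keep at most the top 200 documents
    false,   // fillFields: we don't need the sort values returned
    false,   // trackDocScores
    false,   // trackMaxScore
    false);  // docsScoredInOrder: false is the safe, order-agnostic choice
searcher.Search(myQuery, collector);
foreach (ScoreDoc sd in collector.TopDocs().ScoreDocs)
{
    Document doc = searcher.Doc(sd.Doc);
    // use doc.Get("size") etc.
}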
In the original (Java) version of Lucene, there is no hard restriction on the size of the TopFieldDocCollector results. Any number greater than zero is accepted. Although memory constraints and performance degradation create a practical limit that depends on your environment, 5000 hits is trivial and shouldn't pose a problem outside of a mobile device.
Perhaps in porting Lucene, TopFieldDocCollector was modified to use something other than Lucene's "heap" implementation (called PriorityQueue, extended by FieldSortedHitQueue)—something that imposes an unreasonably small limit on the results size. If so, you might want to look at the source code for TopFieldDocCollector, and implement your own similar hit collector using a better heap implementation.
I have to ask, however: why are you trying to collect 5000 results? No user in an interactive application is going to want to see that many. I figure that users willing to look at 200 results are rare, but double it to 400 as a factor of safety. Depending on the application, limiting the result size can also hamper malicious screen scrapers and mitigate denial-of-service attacks.
The constructor for Sort that accepts only the string field name has been deprecated. Now you have to create a Sort object and pass it as the last parameter of searcher.Search(). Note that there is no overload taking both a custom collector and a Sort, so use the overload that takes a hit count:
// Sorting by a field of type long called "size" from greatest to smallest
// (signified by passing true for the last "reverse" parameter).
Sort sorter = new Sort(new SortField("size", SortField.LONG, true));
// 5000 = the maximum number of hits to return, as in the question above.
TopFieldDocs hits = searcher.Search(myQuery, null, 5000, sorter);
