GitHub API incomplete_results for a simple search

I am querying GitHub via the API. This is a really simple query to find files called pipelines.yaml in the pipelines folder in my org abc-xyz.
Querying https://api.github.com/search/code?q=filename:pipelines+path:pipelines+language:yaml+org:abc-xyz
However, every response has "incomplete_results:true", and the number of results varies by a repo or two each time I run the query.
Query Output:[total_count:16, incomplete_results:true, items:[[name:pipelines.yaml, path:pipelines/pipelines.yaml.
How do I fix this? I understand why GitHub does this, but the cases I see online involve large result sets. In my case I am talking about fewer than 500 results, ideally fewer than 50.
However, my org has about 800 repos. Is this why I always get incomplete_results? Is there an easy way to fix this?
Appreciate any feedback possible.
My repo is private.
Thanks
Vinay

There are some details in the documentation that are helpful here:
To keep the Search API fast for everyone, we limit how long any individual query can run. For queries that exceed the time limit, the API returns the matches that were already found prior to the timeout, and the response has the incomplete_results property set to true.
Reaching a timeout does not necessarily mean that search results are incomplete. More results might have been found, but also might not.
My guess here is that the query is hitting a timeout and returning what it could find up to that point. Tuning the search query using the advice in the docs might help get it under that timeout. Querying per repository and caching the results on the client side might also work, but then you may need to think about rate limiting.
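As a rough illustration of the per-repository idea, here is a minimal Python sketch. It is not an official client: `fetch` is a hypothetical callable that you would implement yourself to send the query string to the search endpoint and return the decoded JSON body, and the retry count is an arbitrary assumption. Only results that come back with incomplete_results set to false are cached.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: takes a search query string and returns the decoded JSON body
# of the code search response (a dict with "incomplete_results" and "items").
Fetch = Callable[[str], dict]


def search_repo(org: str, repo: str, fetch: Fetch,
                max_attempts: int = 3) -> Tuple[List[dict], bool]:
    """Search a single repository, retrying while the API reports a timeout."""
    query = f"filename:pipelines path:pipelines language:yaml repo:{org}/{repo}"
    items: List[dict] = []
    for _ in range(max_attempts):
        body = fetch(query)
        items = body.get("items", [])
        if not body.get("incomplete_results", False):
            return items, True
    # Still flagged incomplete after all attempts; the caller decides what to do.
    return items, False


def search_org(org: str, repos: List[str], fetch: Fetch,
               cache: Optional[Dict[str, List[dict]]] = None
               ) -> Tuple[Dict[str, List[dict]], List[str]]:
    """Search each repo once, reusing cached results from earlier runs."""
    cache = {} if cache is None else cache
    pending: List[str] = []
    for repo in repos:
        if repo in cache:
            continue  # client-side cache hit, no request needed
        items, complete = search_repo(org, repo, fetch)
        if complete:
            cache[repo] = items
        else:
            pending.append(repo)  # retry these later, e.g. after a pause
    return cache, pending
```

With about 800 repos this issues at least 800 requests, so a real `fetch` would need to throttle itself to stay within the search rate limit.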

Related

Efficient pagination in Cosmos DB

I need to implement efficient pagination for Cosmos DB with the Node.js API. There are many examples of the implementation with .NET and LINQ, but I could not find anything good for Node.js. The idea is to send the pageSize and pageIndex and get the relevant results.
I already know we can always use dbClient.queryDocuments, get the queryIterator, and perform the pagination, but this requires always iterating from the first document in the DB. An example can be found here.
Any idea how to do it in an efficient way?
Unfortunately, Cosmos DB as an engine doesn’t have skip and take pagination support yet.
It is, however, a planned feature.
The blogs you’ve read provide one of the few viable workarounds for now which of course comes with a cost.
You could write something smarter: instead of iterating through every document from the beginning, you could keep the request’s continuation token and use it with your next request. That way you can implement previous and next button logic.
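The token bookkeeping could look roughly like the sketch below. It is written in Python for brevity and does not call any Cosmos DB SDK: `fetch_page` is a hypothetical callable you would wrap around your own query call, assumed to take a continuation token (or None for the first page) plus a page size, and to return the documents together with the next token (None at the end).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: fetch_page(token, page_size) -> (documents, next_token)
FetchPage = Callable[[Optional[str], int], Tuple[List[dict], Optional[str]]]


class TokenPager:
    """Remembers the continuation token that opens each page already seen."""

    def __init__(self, fetch_page: FetchPage, page_size: int) -> None:
        self._fetch_page = fetch_page
        self._page_size = page_size
        # _tokens[i] is the token that opens page i; page 0 needs no token.
        self._tokens: List[Optional[str]] = [None]

    def page(self, index: int) -> List[dict]:
        # Walk forward only as far as needed to learn the token for `index`.
        while len(self._tokens) <= index:
            last = len(self._tokens) - 1
            _, next_token = self._fetch_page(self._tokens[last], self._page_size)
            if next_token is None:
                return []  # the requested page is past the end
            self._tokens.append(next_token)
        docs, next_token = self._fetch_page(self._tokens[index], self._page_size)
        # Record the token for the following page the first time we see it.
        if next_token is not None and len(self._tokens) == index + 1:
            self._tokens.append(next_token)
        return docs
```

Going back to a previous page reuses a stored token, so only a jump to a page that has never been visited has to walk forward from the last known token.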

How to deal with the response time from a huge mongo db database

I have a MongoDB database in which one collection has 2,300,000 documents and is growing. While the collection had up to 1,000,000 documents, the API response time was quick and the webpage loaded quickly. As soon as it crossed the 2,000,000 mark it started causing issues, taking about 100 seconds to find and return the data. I don't know what to do about this sudden surge in data. Are there any practices I should follow in order to manage and reduce the response time of the APIs?
The data that I'm trying to fetch is based on date, and the query has to run through the entire collection in order to find the data for just one day.
I have searched a lot but have not been able to find a solution.
[Not enough reputation to comment]
An index is probably the solution for you.
Can you provide an example of both a typical document and the query you run?
Are you retrieving (or do you really need) the whole documents, or just some fields from them?
Typically I would suggest creating an index on your date field in descending order. It should improve your search if it mostly concerns the more recent documents. I can help you set it up if you need.
This doc will help you to understand indexes and how to optimize queries.
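For illustration only, here is a small Python sketch of a one-day range filter that an index on the date field could serve. It assumes the field is named `date` and is stored as a BSON date rather than a string; neither is confirmed by the question. The pymongo calls are shown as comments so the snippet runs without a database, and the `value` field in the projection is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def day_range_filter(day: date, field: str = "date") -> dict:
    """Build a filter matching documents whose `field` falls within `day`."""
    start = datetime.combine(day, time.min)  # midnight at the start of the day
    end = start + timedelta(days=1)          # midnight of the following day
    # A half-open range [start, end) lets the query use an index on `field`
    # instead of scanning the whole collection.
    return {field: {"$gte": start, "$lt": end}}


# With pymongo (not imported here), usage would look roughly like:
#   collection.create_index([("date", -1)])  # descending index, built once
#   cursor = collection.find(
#       day_range_filter(date(2020, 1, 15)),
#       {"date": 1, "value": 1},  # project only the fields you need
#   )
```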

Why can't ContinuationToken be used for paging in Azure Search API?

Reading the documentation for the Azure Search .NET SDK, I see that the ContinuationToken property is not supposed to be used for pagination (this is the same as the @odata.nextLink and @search.nextPageParameters properties in the REST API).
Note that this property is not meant to help you implement paging of search results. You can implement paging using the Top and Skip search parameters.
Source
Why can't I use it for pagination? I have a situation where I want to run a query and then step through a static copy of the results page by page. I don't want those query results to change beneath my feet, however, as I am navigating through them, as new documents are added to the underlying database. In my case, there could be hundreds or thousands of results that get added in the minute or two between submitting the initial query and navigating to another page. How could I accomplish this?
Your question can be addressed in two parts:
1. Why is it not recommended to use ContinuationToken to implement pagination?
2. How can pagination be implemented such that results remain completely stable from page to page?
These are actually unrelated questions, since nothing about ContinuationToken guarantees the stability of the search results. Azure Search makes no consistency guarantees around paging, whether you use $top and $skip or ContinuationToken.
For question #1, the reason ContinuationToken is not recommended for paging is that Azure Search controls when the token is returned, not your application code. If you make assumptions about how and when Azure Search decides to return you a token, there's a chance those assumptions may break with a future service update. The intent of ContinuationToken is to prevent requests for too many documents from overwhelming the service, so you should assume that it is entirely at the service's discretion whether it will return a token.
For question #2, since Azure Search doesn't provide consistency guarantees, you can't completely avoid issues like the same document showing up in multiple pages, missing documents, or documents that are deleted by the time they are seen in results. Even if you wanted to build your own snapshot of the results and page over them in your application code, building a consistent snapshot isn't possible in the first place. However, if your only concern is to avoid showing new documents in the results, you can include a created timestamp field in your index and filter on that in every search request.
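A minimal sketch of that timestamp filter, in Python, might look like the following. It only builds the query parameters and does not call the service. The field name `created` is hypothetical and would have to exist in your index as a filterable date field; the cutoff is captured once, when the user submits the initial query, and reused for every page.

```python
from datetime import datetime, timezone


def snapshot_page_params(search_text: str, snapshot_time: datetime,
                         page_index: int, page_size: int,
                         timestamp_field: str = "created") -> dict:
    """Build search parameters that hide documents created after the cutoff."""
    # snapshot_time is assumed to be timezone-aware; it is normalized to UTC.
    cutoff = snapshot_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "search": search_text,
        "$filter": f"{timestamp_field} le {cutoff}",
        "$top": page_size,
        "$skip": page_index * page_size,
    }


# Capture the cutoff once, then reuse it for every page of the same session.
cutoff_time = datetime(2019, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
third_page = snapshot_page_params("hotel", cutoff_time, page_index=2, page_size=25)
```

This only keeps newly added documents out of the results. Deletions and updates that happen while the user is paging can still shift documents between pages.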
Frankly, unless you're trying to export the entire contents of your index, I would question the need for such strong consistency guarantees around paging. Google and Bing make no such guarantees, so arguably user expectations are already set around this. If you are trying to export your data, this is unfortunately not easy with Azure Search today. In that case, please vote on this User Voice item to help the team prioritize this scenario.

How to support pagination for external change log searching to OpenDJ LDAP?

I want to search the change log under "cn=changelog". The search works normally when the result contains only a few entries, but when there are a lot of entries in the result, I run out of memory. So I want to page the result. How can I define the size limit?
I also referred to https://bugster.forgerock.org/jira/si/jira.issueviews:issue-html/OPENDJ-1218/OPENDJ-1218.html. However, I wonder how to define a filter that supports "changeNumber". My results do not contain the "changeNumber" attribute. Why?
Please help me figure out what I should do.
BTW, I am using OpenDJ 3.0.
Size limit is an option of the client call. You can always specify the maximum number of entries you want returned (the server has its own limit and will enforce the smaller of the two).
How to define the size limit depends on what you are using as client, and you did not mention it.
Can you provide details on what you are using to search (tool, library...) and what filter and options you are currently using? It's difficult to provide help and suggestions for improvement when there are no details.

Date function and Selecting top N queries in DocumentDB

I have the following questions regarding Azure DocumentDB:
According to this article, multiple functions have been added to DocumentDB. Is there any way to get date functions working? How can I get queries of the type "greater than some date" working?
Is there any way to select the top N results, like 'Select top 10 * from users'?
According to Document playground, Order By will be supported in the future. Is there any other way around this for now?
The application that I am developing needs to display a certain number of the most recently inserted results. I need this functionality within a stored procedure. The documents that I am storing in DocumentDB have a DateTime property. I require the above-mentioned functionality for my application to work. I have searched the documentation and samples. Please help if you know of any workaround.
Some thoughts/suggestions below:
Please take a look at this idea on how to store and query dates in DocumentDB (as epoch timestamps). http://azure.microsoft.com/blog/2014/11/19/working-with-dates-in-azure-documentdb-4/
To get top N results, set FeedOptions.MaxItemCount and read only one page, i.e., call ExecuteNextAsync() once. See https://msdn.microsoft.com/en-US/library/microsoft.azure.documents.linq.documentqueryable.asdocumentquery.aspx for an example. We're planning to add TOP to the grammar to make this easier in the future.
You can email me at arramac at microsoft dot com to get early access to Order By right away. This is planned for broad release shortly.
Please note that stored procedures are best used when you have one or more write operations. You'll be able to get better throughput on reads when you query directly.
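To make the epoch timestamp suggestion concrete, here is a small Python sketch that converts a datetime to epoch seconds and builds a parameterized "greater than some date" query. The property name `createdEpoch` is hypothetical: it assumes each document stores its creation time as a number alongside (or instead of) the DateTime property. The `users` collection name comes from the question.

```python
from datetime import datetime, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since 1970-01-01 UTC."""
    return int(moment.timestamp())


def newer_than_query(since: datetime, field: str = "createdEpoch") -> dict:
    """Build a parameterized query for documents created after `since`."""
    return {
        "query": f"SELECT * FROM users u WHERE u.{field} > @since",
        "parameters": [{"name": "@since", "value": to_epoch_seconds(since)}],
    }


# Numbers compare with plain range operators, so no date functions are needed.
# To approximate "top N", cap the page size (MaxItemCount) and read one page.
spec = newer_than_query(datetime(2015, 1, 1, tzinfo=timezone.utc))
```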
