I'm using WIQL to query for a list of work items in Azure DevOps. However, Azure DevOps will return a maximum of 20,000 work items in a single query. If the query results contain more than 20,000 items, an error code is returned instead of the work items. To get a list of all work items matching my query, I modified the query to filter by ID and then loop to build up a list of work items from multiple queries. The issue is that there is apparently no way to know when I have reached the end of my loop, because I don't know the maximum work item ID in the system.
# base_url, session and personal_access_token are defined elsewhere
idlist = []
max_items_per_query = 19000
counter = 0
done = False
while not done:
    # Query one window of IDs: counter <= ID < counter + max_items_per_query
    wiql = ("SELECT [System.Id] FROM WorkItems "
            "WHERE [System.WorkItemType] IN ('User Story','Bug') "
            "AND [System.AreaPath] UNDER 'mypath' "
            "AND [System.Id] >= {} AND [System.Id] < {}").format(counter, counter + max_items_per_query)
    url = base_url + 'wiql'
    params = {'api-version': '4.0'}
    body = {'query': wiql}
    request = session.post(url, auth=('', personal_access_token), params=params, json=body)
    response = request.json()
    newItems = [w['id'] for w in response['workItems']]
    idlist.extend(newItems)
    # Move on to the next window of IDs
    counter += max_items_per_query
    if not newItems:
        done = True
This works in most cases, but the loop exits prematurely if it hits a gap in work item IDs under the specified area path that is at least as wide as one query window. Ideally, I could make this work if there were a way to query for the maximum work item ID in the system and then exit when the counter reaches that value. However, I can't find a way to do this. Is there a way to query for this number, or possibly another solution that will allow me to get a list of all work items matching specific criteria?
You can use the $top parameter to get the last one.
Something like the following (this is just a sample; you can extend it to your own query):
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] DESC with $top = 1
This returns the maximum System.Id, because the results are sorted in descending order.
Suggestion:
You could also change your logic to something like this:
SELECT [System.Id] FROM workitems WHERE [System.Id] > 0 ORDER BY [System.Id] ASC with $top = 5000
Take the System.Id of the 5000th item; let's assume it is 5029.
The next query would be :
SELECT [System.Id] FROM workitems WHERE [System.Id] > 5029 ORDER BY [System.Id] ASC with $top = 5000
You will get the next 5000 items, starting after System.Id 5029.
You can repeat the above logic in a loop.
For the exit condition, check the number of items returned in each iteration: if it is less than 5000, you have reached the end.
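The loop described above could be sketched in Python roughly as follows. This is only an illustration: run_wiql is a hypothetical callable (not part of any Azure DevOps library) that is assumed to post the query with the given $top value and return the matching IDs in order.

```python
def fetch_all_ids(run_wiql, page_size=5000):
    """Collect every matching work item ID, one page at a time.

    run_wiql(query, top) is a hypothetical helper: it is assumed to send
    the WIQL query with $top=top and return a list of IDs in ascending order.
    """
    ids = []
    last_id = 0  # start below the smallest possible work item ID
    while True:
        query = (
            "SELECT [System.Id] FROM WorkItems "
            "WHERE [System.Id] > {} "
            "ORDER BY [System.Id] ASC"
        ).format(last_id)
        page = run_wiql(query, page_size)
        ids.extend(page)
        # A short (or empty) page means there is nothing left to fetch
        if len(page) < page_size:
            return ids
        # Continue after the highest ID seen so far
        last_id = page[-1]
```

Because each query continues from the last ID actually returned, gaps in the ID sequence do not end the loop early. Extra conditions (work item type, area path) would go into the WHERE clause alongside the ID filter.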
Related
I want to do a select query in Cosmos DB that returns a maximum number of results (say 50) and then gives me the continuation token so I can continue the search where I left off.
Now let's say my query has three equality conditions in my where clause, e.g.
where prop1 = "a" and prop2 = "w" and prop3 = "g"
In the results that are returned, I want the records that satisfy prop1 = "a" to appear first, followed by the results that have prop2 = "w" followed by the ones with prop3 = "g".
Why do I need it? While I could fetch all the data into my application and sort it there, that would obviously mean pulling in too much data. So if I can't order it this way in Cosmos itself, the results I get might contain no records with prop1 = "a" at all. I could keep requesting pages until I get the ones with prop1 = "a" (I need this because I want to show the results with prop1 = "a" to the user first), but I might have to fetch about 100 pages to reach the first such record, since I have a huge dataset sitting in my Cosmos DB.
How can I handle this scenario in Cosmos? Thanks!
So if I am understanding your question correctly, you want to accomplish this:
SELECT * FROM c
WHERE c.prop1 = 'a'
  AND c.prop2 = 'b'
  AND c.prop3 = 'c'
ORDER BY c.prop1, c.prop2, c.prop3
OFFSET 0 LIMIT 25
Luckily, you can now do this in Cosmos DB SQL, with one caveat: you have to set up a composite index on your collection to allow it.
So, for this collection, my composite index would look like this:
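As a rough sketch, a composite index covering the ORDER BY above might look like the following, written here as a Python dict that mirrors the JSON indexing policy. The composite_index helper is hypothetical and only there for readability; the includedPaths entry is an assumption about the rest of the policy.

```python
def composite_index(*fields):
    """Build one composite index entry from (path, order) pairs (hypothetical helper)."""
    return [{"path": path, "order": order} for path, order in fields]


# Assumed indexing policy for the container; only compositeIndexes matters here
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],
    "compositeIndexes": [
        # Matches ORDER BY c.prop1, c.prop2, c.prop3 (all ascending)
        composite_index(
            ("/prop1", "ascending"),
            ("/prop2", "ascending"),
            ("/prop3", "ascending"),
        ),
    ],
}
```

The order of the paths and their sort directions need to line up with the ORDER BY clause of the query.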
Now, if I wanted to change it to this:
SELECT * FROM c
WHERE c.prop1 = 'a'
  AND c.prop2 = 'b'
  AND c.prop3 = 'c'
ORDER BY c.prop1 DESC, c.prop2, c.prop3
OFFSET 0 LIMIT 25
I could add another composite index to cover that use-case. You can see in your settings it's an array of arrays so you can add as many combinations as you'd like.
This should get you to where you need to be if I understood your question correctly.
I would like to write the equivalent of a PostgreSQL EXISTS query in ArangoDB.
Let's say I have a Q ArangoDB query (AQL). How can I check if Q returns any result?
Example:
Q = "For u in users FILTER 'x@example.com' = u.email"
What is the best way to do it (most performant)?
I have ideas, but couldn't find an easy way to measure the performance:
Idea 1: using Length:
RETURN LENGTH(%Q RETURN 1) > 0
Idea 2: using First:
RETURN First(%Q RETURN 1) != null
Above, %Q is a substitution for the query defined at the beginning.
I think the best way to achieve this for a generic selection query with a structure like
Q = "For u in users FILTER 'x@example.com' = u.email"
is to first add a LIMIT clause to the query, and only make it return a constant value (in contrast to the full document).
For example, the following query returns a single match if there is such document or an empty array if there is no match:
FOR u IN users FILTER 'x@example.com' == u.email LIMIT 1 RETURN 1
(please note that I also changed the operator from = to == because otherwise the query won't parse).
Please note that this query may benefit a lot from creating an index on the search attribute, i.e. email. Without the index the query will do a full collection scan and stop at the first match, whereas with the index it will just read at most a single index entry.
Finally, to answer your question, the template for the EXISTS-like query will then become
LENGTH(%Q LIMIT 1 RETURN 1)
or fleshed out via the example query:
LENGTH(FOR u IN users FILTER 'x@example.com' == u.email LIMIT 1 RETURN 1)
LENGTH(...) will return the number of matches, which in this case will be either 0 or 1. It can also be used in filter conditions, as follows:
FOR ....
FILTER LENGTH(...)
RETURN ...
because LENGTH(...) will be either 0 or 1, which in context of a FILTER condition will evaluate to either false or true.
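If the queries are assembled in application code, the template could be applied with a small helper like the one below. This is just a sketch: exists_query is a hypothetical name, and it assumes the selection part is passed in without its own LIMIT or RETURN clause.

```python
def exists_query(q):
    """Wrap an AQL selection (without LIMIT/RETURN) in an EXISTS-like check.

    Hypothetical helper: it only builds the query string and does not
    talk to the database.
    """
    return "RETURN LENGTH({} LIMIT 1 RETURN 1) > 0".format(q.strip())


q = "FOR u IN users FILTER 'x@example.com' == u.email"
print(exists_query(q))
```

The resulting query returns a single boolean. A value such as the email address would normally be passed as a bind parameter rather than embedded in the string.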
Do you need an AQL solution?
If you only need the count:
var q = "FOR u IN users FILTER 'x@example.com' == u.email RETURN 1";
var res = db._createStatement({query: q, count: true}).execute();
var ct = res.count();
This is the fastest approach I can think of.
We need to supply a custom last evaluated key.
The use case is as follows:
The first scan of the table returns ten records. The tenth record should be used as the last evaluated key when I run the scan a second time.
Thanks in advance.
# Exclusive start key should be None for your first page
esk = None
# Get the first page
scan_generator = MyTable.scan(Limit=10, exclusive_start_key=esk)
# Get the key for the next page
esk = scan_generator.kwargs['exclusive_start_key'].values()
# Now get page 2
scan_generator = MyTable.scan(Limit=10, exclusive_start_key=esk)
EDIT:
exclusive_start_key (list or tuple) – Primary key of the item from which to continue an earlier query. This would be provided as the LastEvaluatedKey in that query.
For example
exclusive_start_key = ('myhashkey',)
or
exclusive_start_key = ('myhashkey', 'myrangekey')
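For comparison, the same paging idea could be sketched with boto3-style names, where the scan response carries LastEvaluatedKey and the next call takes ExclusiveStartKey. This assumes a boto3 Table resource rather than the older boto API used above; scan_pages is a hypothetical helper name.

```python
def scan_pages(table, page_size=10):
    """Yield the items of a table one page at a time.

    Assumes `table` behaves like a boto3 DynamoDB Table resource:
    scan() accepts Limit and ExclusiveStartKey and returns a dict with
    'Items' and, while more data remains, 'LastEvaluatedKey'.
    """
    kwargs = {"Limit": page_size}
    while True:
        response = table.scan(**kwargs)
        yield response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages
        # Resume the next scan right after the last item of this page
        kwargs["ExclusiveStartKey"] = last_key
```

To resume from a key of your own, pass it as ExclusiveStartKey in the first call instead of leaving it out.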
Items in my DynamoDB table have the following format:
{
    'id': 1,
    'last_check': 1234556,
    'check_interval': 100,
    ....
}
Now I'd like to scan the table and find all items where last_check + check_interval is less than some given value last_check_time. I did not find a way to create a FilterExpression that combines two attributes so I'm currently doing it the following way:
import time
from boto3.dynamodb.conditions import Attr

# boto3 rejects Python floats for DynamoDB numbers, so use an integer timestamp
last_check_time = int(time.time())
response = status_table.scan(
    ProjectionExpression='id, last_check, check_interval',
    FilterExpression=Attr('last_check').lt(last_check_time)
)
# Manually filter the items and only preserve those with last_check + check_interval < last_check_time:
for item in response['Items']:
    if item['last_check'] + item['check_interval'] < last_check_time:
        # This is one of the items I'm interested in!
        ....
    else:
        # Not interested in this item. And I'd prefer to let DynamoDB filter this out.
        continue
Is there a way to let DynamoDB do the filtering and therefore make the for loop in the example above obsolete?
Unfortunately, it is currently not possible to ask DynamoDB to perform a calculation inside a filter expression. You could, however, store another attribute that holds the sum of the two attributes. There are a couple of ways to achieve that:
Compute the additional attribute (last_check + check_interval) in code when writing the item to DynamoDB.
Use DynamoDB Triggers to create the additional attribute (last_check + check_interval).
Either option gives you a new attribute on the item that you can filter on.
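The first option could look roughly like this. The attribute name next_check and the helper with_next_check are made up for illustration; the idea is simply to add the sum before the item is written.

```python
def with_next_check(item):
    """Return a copy of the item with the precomputed sum added.

    'next_check' is a hypothetical attribute name holding
    last_check + check_interval.
    """
    enriched = dict(item)
    enriched["next_check"] = item["last_check"] + item["check_interval"]
    return enriched


item = with_next_check({"id": 1, "last_check": 1234556, "check_interval": 100})
print(item["next_check"])
```

The enriched item is what would be written to the table, and the scan could then filter on the single attribute, for example with Attr('next_check').lt(last_check_time). The attribute has to be recomputed whenever last_check or check_interval changes.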
Azure Search returns a maximum of 1,000 results at a time. For paging on the client, I want the total count of matches in order to be able to display the correct number of paging buttons at the bottom and in order to be able to tell the user how many results there are. However, if there are over a thousand, how do I get the actual count? All I know is that there were at least 1,000 matches.
I need to be able to do this from within the SDK.
If you want the total number of documents matching a query, set IncludeTotalResultCount to true in your search parameters. When you then execute the query, the total count is available in the Count property of the search results. With a search text of "*", as in the sample below, that is the number of documents in the index.
Here's a sample code for that:
var credentials = new SearchCredentials("account-key (query or admin key)");
var indexClient = new SearchIndexClient("account-name", "index-name", credentials);
var searchParameters = new SearchParameters()
{
    QueryType = QueryType.Full,
    IncludeTotalResultCount = true
};
var searchResults = await indexClient.Documents.SearchAsync("*", searchParameters);
// Prints the total number of documents in the index
Console.WriteLine("Total documents in index (approx) = " + searchResults.Count.GetValueOrDefault());
Please note that:
This count will be approximate.
Getting the count is an expensive operation so you should only do it with the very first request when implementing pagination.
For REST clients using the POST API, just include "count": true in the payload. The count is returned in @odata.count.
Ref: https://learn.microsoft.com/en-us/rest/api/searchservice/search-documents
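For the paging buttons themselves, a client-side sketch might look like the following. It assumes the POST body fields search, count, top and skip from the REST API, and it requests the count only with the first page, as suggested above. The helper names search_payload and page_count are hypothetical.

```python
import math


def search_payload(search_text, page, page_size):
    """Build the POST body for one page of results (zero-based page index)."""
    return {
        "search": search_text,
        "count": page == 0,  # only ask for the total on the first request
        "top": page_size,
        "skip": page * page_size,
    }


def page_count(total, page_size):
    """Number of paging buttons needed for `total` matches."""
    return math.ceil(total / page_size)
```

The value read from @odata.count in the first response would be passed to page_count as total.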