Cosmos DB document query takes a long time - Azure

I am new to Cosmos DB and facing issues querying a collection. I have a partitioned collection with 100,000 RU/s (unlimited storage capacity); the partition key is '/Bid', which is a GUID. I am querying the collection by a partition key value that has 10,000 records (the collection holds more than 28,942,445 documents across all partitions). I am using the following query to get the documents, but it takes around 50 seconds to execute, which is not feasible.
// Partition key value for the query (the collection is partitioned on this GUID).
string partitionKeyValue = "2359c59a-f730-40df-865c-d4e161189f5b";
// Now execute the same query via direct SQL
var DistinctBColumn = this.client.CreateDocumentQuery<BColumn>(
        BordereauxColumnCollection.SelfLink,
        "SELECT * FROM BColumn_UL c WHERE c.BId = '2359c59a-f730-40df-865c-d4e161189f5b'",
        new FeedOptions { EnableCrossPartitionQuery = true, PartitionKey = new PartitionKey(partitionKeyValue) })
    .ToList();
I also tried other querying options, which likewise took around 50 seconds.
However, the same query takes less than a second in the Azure portal.
Kindly help me optimize the query, and correct me if I am wrong. Many thanks.
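For comparison, here is a minimal sketch (assuming the same client, BordereauxColumnCollection, and BColumn type as in the snippet above) of how a single-partition query is often issued with the DocumentDB SDK: the PartitionKey is set on FeedOptions without EnableCrossPartitionQuery, MaxItemCount is raised so fewer round trips are needed, and the query is drained page by page with ExecuteNextAsync instead of a blocking ToList(). Checking that the DocumentClient is created with Direct/TCP connectivity is also worth doing, since the default Gateway mode adds per-request latency.
// Sketch only; assumes the same client, collection, and BColumn type as above, inside an async method.
// Namespaces: Microsoft.Azure.Documents, Microsoft.Azure.Documents.Client, Microsoft.Azure.Documents.Linq.
var options = new FeedOptions
{
    // The query targets a single partition, so PartitionKey is set and EnableCrossPartitionQuery is omitted.
    PartitionKey = new PartitionKey("2359c59a-f730-40df-865c-d4e161189f5b"),
    MaxItemCount = 1000 // fewer, larger pages instead of many small round trips
};
var query = this.client.CreateDocumentQuery<BColumn>(
        BordereauxColumnCollection.SelfLink,
        "SELECT * FROM BColumn_UL c WHERE c.BId = '2359c59a-f730-40df-865c-d4e161189f5b'",
        options)
    .AsDocumentQuery();
var results = new List<BColumn>();
while (query.HasMoreResults)
{
    // Each ExecuteNextAsync call fetches one page of results from the partition.
    results.AddRange(await query.ExecuteNextAsync<BColumn>());
}
// When constructing the client, Direct/TCP connectivity typically lowers per-request latency
// compared with the default Gateway mode (endpoint and key below are placeholders):
// var client = new DocumentClient(new Uri("<account-endpoint>"), "<account-key>",
//     new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp });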

Related

Script returning a limited number of records compared to the query

I tried to convert a SQL query into a Gosu script (Guidewire). My script works only for a limited number of records.
This is the SQL query:
select PolicyNumber,* from pc_policyperiod
where ID in ( Select ownerID from pc_PRActiveWorkflow
where ForeignEntityID in (Select id from pc_workflow where State=3))
This is my script:
var workFlowIDQuery = Query.make(Workflow).compare(Workflow#State, Relop.Equals, WorkflowState.TC_COMPLETED)
    .select({QuerySelectColumns.path(Paths.make(entity.Workflow#ID))}).transformQueryRow(\row -> row.getColumn(0)).toTypedArray()
var prActiveWorkFlowQuery = Query.make(PRActiveWorkflow).compareIn(PRActiveWorkflow#ForeignEntity, workFlowIDQuery)
    .select({QuerySelectColumns.path(Paths.make(entity.PRActiveWorkflow#Owner))}).transformQueryRow(\row -> row.getColumn(0)).toTypedArray()
var periodQuery = Query.make(PolicyPeriod).compareIn(PolicyPeriod#ID, prActiveWorkFlowQuery).select()
for (period in periodQuery) {
    print(period.PolicyNumber)
}
Can anyone identify why the script returns only a limited number of records, or suggest improvements?
I would suggest writing a single Gosu query that selects PolicyPeriod and joins the three entities via their foreign keys.
I am not sure whether the PolicyPeriod ID is the same as the PRActiveWorkflow ID. Can you elaborate on the relation between the PolicyPeriod and PRActiveWorkflow entities?

Why is it so slow to query Azure metrics tables, and how can I speed it up?

https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/diagnostics-template#wadmetrics-tables-in-storage
I'm trying to extract data from one of these WADMetrics tables, but even if I set up a query that simply requests the last minute of data, it takes hours to run!
This happens whether I query it using the Java SDK or the log viewer in the portal.
What am I doing wrong? Should I expect to wait this long?
Any help would be greatly appreciated.
Here's my Java code:
String connectionString = String.format("DefaultEndpointsProtocol=https;AccountName=%s;AccountKey=%s", "imgoriginalblobs", "lAHAdJcjUcak48lvNwOoC2UztAcqD+hLX/YMdXq/qppQOgeuzQXTDMtYL3jx/oT+a0hxURqlF2Nv9Fza686s8g==");
CloudStorageAccount account = CloudStorageAccount.parse(connectionString);
CloudTableClient cloudTableClient = account.createCloudTableClient();
Calendar myCal = Calendar.getInstance();
Date now = myCal.getTime();
myCal.add(Calendar.MINUTE, -1); // go back one minute so the query covers the most recent minute of data
Date theDate = myCal.getTime();
String queryString = TableQuery.combineFilters(
TableQuery.combineFilters(TableQuery.generateFilterCondition("Timestamp", TableQuery.QueryComparisons.GREATER_THAN_OR_EQUAL, theDate),
TableQuery.Operators.AND,
TableQuery.generateFilterCondition("CounterName", TableQuery.QueryComparisons.EQUAL, "/builtin/filesystem/freespace")),
TableQuery.Operators.AND,
TableQuery.generateFilterCondition("Timestamp", TableQuery.QueryComparisons.LESS_THAN_OR_EQUAL, now));
TableQuery<TableMetric> query = TableQuery.from(TableMetric.class).where(queryString);
CloudTable table = cloudTableClient.getTableReference("WADMetricsPT1MP10DV2S20191017");
System.out.println(table.exists());
System.out.println(table.getName());
for (TableMetric entity : table.execute(query)) {
extractPartitionKey(entity.getPartitionKey(), entity.getLast(), entity.getCounterName());
}
And here is a screenshot of me trying to query the last 5 minutes of data; this has been hanging for over an hour.
I think the problem is that you're not specifying the partition key, so the query is performed over all partitions, which are likely spread across many storage servers; that's why it takes so long to return the desired output.
In general, there are many limitations to storing and retrieving metrics in Azure Table storage. Take a look at custom metrics in Azure Monitor to see if they can meet your scenario and requirements.
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/metrics-custom-overview

How to get the document count within a partition using a SQL query in Azure Cosmos DB

We have an Azure Cosmos DB database where the collection is partitioned on, say, "/deviceid". We want to get the count of all documents within a specific partition. We ran this query:
FeedOptions options = new FeedOptions()
{
PartitionKey = new PartitionKey("f0e14e52ed2c499e893ac934ae934835"),
};
IDocumentQuery<dynamic> query = client.CreateDocumentQuery(collectionUri, "Select Value Count(1) From c", options).AsDocumentQuery();
FeedResponse<dynamic> data = await query.ExecuteNextAsync();
This query works, but at the cost of high RUs: for 153,068 documents we incurred ~10K RUs. The indexing mode is "consistent" and automatic indexing is set to true.
We are looking for suggestions on how to get the document count without incurring so many RUs.
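For what it's worth, here is a minimal sketch (assuming the same client, collectionUri, and partition key value as above) that drains the same COUNT query across all continuations and sums the per-page request charge, so both the final count and the total RU cost are visible; large counts can be split across pages, in which case each page carries a partial count that has to be added up.
// Sketch only; assumes the same client and collectionUri as above, inside an async method.
// Namespaces: Microsoft.Azure.Documents, Microsoft.Azure.Documents.Client, Microsoft.Azure.Documents.Linq.
var options = new FeedOptions
{
    PartitionKey = new PartitionKey("f0e14e52ed2c499e893ac934ae934835")
};
IDocumentQuery<dynamic> countQuery = client
    .CreateDocumentQuery(collectionUri, "SELECT VALUE COUNT(1) FROM c", options)
    .AsDocumentQuery();
long total = 0;
double totalRequestCharge = 0;
while (countQuery.HasMoreResults)
{
    FeedResponse<dynamic> page = await countQuery.ExecuteNextAsync();
    foreach (var value in page)
    {
        total += Convert.ToInt64(value); // partial count carried by this page
    }
    totalRequestCharge += page.RequestCharge; // RU charged for this page
}
Console.WriteLine($"Count: {total}, RU: {totalRequestCharge}");
If even the drained query stays expensive, a commonly suggested alternative (not shown here) is to maintain a running count in a separate metadata document updated alongside writes, at the cost of a more complex write path.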

In Azure DocumentDB, how do I write an "IN" statement using LINQ?

I want to retrieve about 50-100 documents by their IDs from DocumentDB. I have the list of IDs in a List<string>. How do I use LINQ to retrieve those documents? I don't want to write the actual SQL query as a string, as in:
IQueryable<Family> results = client.CreateDocumentQuery<Family>(collectionUri, "SELECT * FROM family WHERE State IN ('TX', 'NY')", DefaultOptions);
I want to be able to use lambda expressions to create the query, because I don't want to hard-code the names of the fields as strings.
It seems that you do not want to generate and pass the query string SELECT * FROM family WHERE State IN ('TX', 'NY') to query documents; you could try the following code.
List<string> states = new List<string>() { "TX", "NY" };
var results = client.CreateDocumentQuery<Family>(collectionUri).Where(d => states.Contains(d.State)).ToList();
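Applied to the original requirement of fetching 50-100 documents by their IDs, the same Contains pattern would look roughly like the sketch below (assumptions: the Family type exposes an Id property mapped to the document "id", DefaultOptions is the FeedOptions instance from the question, and the ID values are placeholders). The LINQ provider should translate Contains over a local list into an IN clause, so the field names stay strongly typed.
// Sketch: fetch documents whose id is in a known list. Assumes Family has an Id property
// mapped to the document "id" (e.g. via [JsonProperty("id")]) and DefaultOptions is the
// FeedOptions used in the question; the IDs below are placeholders.
List<string> documentIds = new List<string> { "id-1", "id-2", "id-3" };
List<Family> families = client
    .CreateDocumentQuery<Family>(collectionUri, DefaultOptions)
    .Where(f => documentIds.Contains(f.Id))
    .ToList();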

Parse GeoPoint query is slow and times out using the JavaScript SDK in Node.js

I have the following Parse query, which times out when the number of records is large.
var query = new Parse.Query("UserLocation");
query.withinMiles("geo", geo, MAX_LOCATION_RADIUS);
query.ascending("createdAt");
if (createdAt !== undefined) {
query.greaterThan("createdAt", createdAt);
}
query.limit(1000);
It runs OK if the UserLocation table is small, but the query times out from time to time when the table has ~100k records:
[2015-07-15 21:03:30.879] [ERROR] [default] - Error while querying for locations: [latitude=39.959064, longitude=-75.15846]: {"code":124,"message":"operation was slow and timed out"}
The UserLocation table has a latitude/longitude pair and a radius. Given a geo point (latitude, longitude), I'm trying to find the list of UserLocations whose circle, (lat, long) plus radius, covers the given geo point. It doesn't seem like I can use the value from another column in the table for the distance query (something like query.withinMiles("geo", inputGeo, "radius"), where "geo" and "radius" are the column names for the GeoPoint and the radius). There is also the limitation that query "limit" combined with "skip" can return a maximum of 10,000 records (1,000 records at a time, skipped 10 times). So I had to do an almost full table scan by using "createdAt" as the filter criterion and keep querying until the query returns no more results.
Is there any way I can improve the algorithm so that it doesn't time out on a large data set?
