Google Datastore query very slow from Cloud Function compared to local machine - node.js

I'm using Google Cloud Functions with the @google-cloud/datastore module. My data is structured as one kind with four string properties and only 1,000 entities, indexed on all properties.
My query is:
if (/^[a-z0-9]+$/i.test(name)) {
name = name.toLowerCase();
query = datastore.createQuery('IPPhone').filter('email', '>=', name).filter('email', '<', name + '\uffff');
} else if (name.includes('<')) {
query = datastore.createQuery('IPPhone').filter('department', '>=', name).filter('department', '<', name + '\uffff');
isDepartment = true;
} else {
name = fixName(name);
query = datastore.createQuery('IPPhone').filter('name', '=', name);
}
When I run the query from a Google Cloud Function, it takes 14-17 seconds. Doing the same thing on my local machine is much faster, around 800-1000 ms. I'm in Hanoi, Vietnam, but the only region I can choose for Cloud Functions is us-central1.
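For readers unfamiliar with the two range filters on email and department: together they act as a "starts with" match. Below is a minimal in-memory sketch of that idea, using made-up sample values and plain JavaScript string comparison. It is not a Datastore call, and Datastore applies its own string ordering on the server, so treat it only as an illustration of the bounds.

```javascript
// In-memory illustration of the prefix range used in the query above.
// matchesPrefix and the sample data are hypothetical; nothing here talks to Datastore.
function matchesPrefix(value, prefix) {
  // Same bounds as filter('email', '>=', name) and filter('email', '<', name + '\uffff')
  return value >= prefix && value < prefix + '\uffff';
}

const emails = ['anna@example.com', 'anh@example.com', 'binh@example.com', 'an'];
const hits = emails.filter((e) => matchesPrefix(e, 'an'));
console.log(hits); // [ 'anna@example.com', 'anh@example.com', 'an' ]
```

Every value that begins with the prefix sorts between the prefix itself and the prefix followed by a very high code point, which is why the pair of inequality filters behaves like a prefix search.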


azure storage table retrieve rows with correct data types. All values are returned as string instead of their correct type

Using Node.js with Azure Storage table.
I've created a new table with two fields/keys (CreatedOn and NextRenewalDate) as Int64 values (time since epoch). The entGen correctly identifies the values as Int64, as desired, before I save them to the table.
I know the Azure documentation says that it stores Int64 as a string. It has been like that for a long time.
I have been parsing the values of properties I know are Int64 in my app's code after fetching and before using them.
I am wondering whether that is still the case. Is there any built-in way of getting the correct data type for values when I read them from Azure Storage Tables?
Only the following 8 types are supported by the Table Service Data Model. For more details, please refer to here.
Edm.Binary
Edm.Boolean
Edm.DateTime
Edm.Double
Edm.Guid
Edm.Int32
Edm.Int64
Edm.String
Besides, when we retrieve entities from an Azure table, they are returned in OData JSON format. According to the OData protocol that Azure Table storage supports, Int64 values are represented as strings. For more details, please refer to here and here.
I have not been able to find a built-in way to read or convert Azure Storage table entities back into JSON.
So I wrote a quick method. It works as long as you request full metadata in the payload.
const options = {payloadFormat:"application/json;odata=fullmetadata"};
Here is the quick function that might help somebody:
function convertArrayOfEntitiesToJson(arrEntities) {
  const arrRecords = [];
  for (let x = 0; x < arrEntities.length; x++) {
    const entity = arrEntities[x];
    const record = {};
    Object.keys(entity).forEach(function (key) {
      const type = entity[key]['$'];  // EDM type tag, present when full metadata is requested
      const value = entity[key]['_']; // raw value as returned by the service
      if (type === 'Edm.Int64')
        record[key] = parseInt(value, 10); // exact only up to Number.MAX_SAFE_INTEGER
      else if (type === 'Edm.DateTime')
        record[key] = new Date(value);
      else if (type === 'Edm.Boolean')
        record[key] = (value === true || value === 'true'); // accept boolean or string form
      else // fall back to the raw value (string)
        record[key] = value;
    });
    arrRecords.push(record);
  }
  return arrRecords;
}
I'm missing a couple of data types on purpose. I did not have a need for the rest of the types. Feel free to add other data types as needed.
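One caveat with converting Edm.Int64 through parseInt: JavaScript numbers are exact only up to 2^53 - 1. Epoch timestamps like the ones in the question fit comfortably, but if a full 64-bit value may appear, BigInt is one option. The following is a minimal sketch, assuming the same { _: value, $: type } property shape as the function above; convertProperty and the sample value are hypothetical names for illustration.

```javascript
// Convert one property in the { _: value, $: type } shape used by the function above.
// convertProperty is a hypothetical helper, not part of the azure-storage library.
function convertProperty(prop) {
  switch (prop.$) {
    case 'Edm.Int64':
      // BigInt keeps all 64 bits; parseInt would round beyond 2^53 - 1
      return BigInt(prop._);
    case 'Edm.DateTime':
      return new Date(prop._);
    default:
      // Fall back to the raw value
      return prop._;
  }
}

const sample = { _: '9007199254740993', $: 'Edm.Int64' }; // made-up value just above 2^53
console.log(convertProperty(sample)); // 9007199254740993n
console.log(parseInt(sample._, 10));  // 9007199254740992 (rounded)
```

Note that BigInt values cannot be mixed with ordinary numbers in arithmetic and are not serialized by JSON.stringify, so this only pays off when the extra precision is really needed.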

Delete Documents from CosmosDB based on condition through Query Explorer

What's the query or some other quick way to delete all the documents matching the where condition in a collection?
I want something like DELETE * FROM c WHERE c.DocumentType = 'EULA' but, apparently, it doesn't work.
Note: I'm not looking for any C# implementation for this.
This is a bit old, but I just had the same requirement and found a concrete example of what @Gaurav Mantri wrote about.
The stored procedure script is here:
https://social.msdn.microsoft.com/Forums/azure/en-US/ec9aa862-0516-47af-badd-dad8a4789dd8/delete-multiple-docdb-documents-within-the-azure-portal?forum=AzureDocumentDB
Go to the Azure portal, grab the script from above and make a new stored procedure in the database->collection you need to delete from.
Then right at the bottom of the stored procedure pane, underneath the script textarea is a place to put in the parameter. In my case I just want to delete all so I used:
SELECT c._self FROM c
I guess yours would be:
SELECT c._self FROM c WHERE c.DocumentType = 'EULA'
Then hit 'Save and Execute'. Voilà, some documents get deleted. After I got it working in the Azure Portal I switched over to Azure DocumentDB Studio and got a better view of what was happening, i.e. I could see I was throttled to deleting 18 at a time (returned in the results). For some reason I couldn't see this in the Azure Portal.
Anyway, it's pretty handy even if limited to a certain number of deletes per execution. Executing the stored procedure is also throttled, so you can't just mash the keyboard. I think I would just delete and recreate the collection unless I had a manageable number of documents to delete (thinking <500).
Props to Mimi Gentz @Microsoft for sharing the script in the link above.
HTH
I want something like DELETE * FROM c WHERE c.DocumentType = 'EULA'
but, apparently, it doesn't work.
Deleting documents this way is not supported. You would need to first select the documents using a SELECT query and then delete them separately. If you want, you can write the code for fetching & deleting in a stored procedure and then execute that stored procedure.
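The select-then-delete pattern described here can be sketched as a loop: fetch a batch of matches, delete each one, and repeat until a query returns nothing. The sketch below uses a hypothetical in-memory FakeCollection as a stand-in, so none of these names are Cosmos DB SDK calls. It only shows the control flow that a script or stored procedure would follow.

```javascript
// Hypothetical in-memory stand-in for a collection; not a Cosmos DB SDK class.
class FakeCollection {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d.id, d]));
  }
  // "SELECT" step: return at most `limit` documents matching the predicate
  query(predicate, limit) {
    return [...this.docs.values()].filter(predicate).slice(0, limit);
  }
  // "DELETE" step: remove a single document by id
  delete(id) {
    return this.docs.delete(id);
  }
}

// Select a batch, delete each match, and repeat until a query returns nothing.
function deleteWhere(collection, predicate, batchSize = 2) {
  let deleted = 0;
  for (;;) {
    const batch = collection.query(predicate, batchSize);
    if (batch.length === 0) return deleted;
    for (const doc of batch) {
      if (collection.delete(doc.id)) deleted += 1;
    }
  }
}

const coll = new FakeCollection([
  { id: '1', DocumentType: 'EULA' },
  { id: '2', DocumentType: 'Invoice' },
  { id: '3', DocumentType: 'EULA' },
  { id: '4', DocumentType: 'EULA' },
  { id: '5', DocumentType: 'Invoice' },
]);
const removed = deleteWhere(coll, (d) => d.DocumentType === 'EULA');
console.log(removed, coll.docs.size); // 3 2
```

Working in small batches mirrors the throttling mentioned above: each pass handles only part of the matches, and the loop continues until none remain.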
I wrote a script to list all the documents and delete all the documents, it can be modified to delete the selected documents as well.
var docdb = require("documentdb");
var async = require("async");
var config = {
host: "https://xxxx.documents.azure.com:443/",
auth: {
masterKey: "xxxx"
}
};
var client = new docdb.DocumentClient(config.host, config.auth);
var messagesLink = docdb.UriFactory.createDocumentCollectionUri("xxxx", "xxxx");
var listAll = function(callback) {
var spec = {
query: "SELECT * FROM c",
parameters: []
};
client.queryDocuments(messagesLink, spec).toArray((err, results) => {
callback(err, results);
});
};
var deleteAll = function() {
listAll((err, results) => {
if (err) {
console.log(err);
} else {
async.forEach(results, (message, next) => {
client.deleteDocument(message._self, err => {
if (err) {
console.log(err);
next(err);
} else {
next();
}
});
});
}
});
};
var task = process.argv[2];
switch (task) {
case "listAll":
listAll((err, results) => {
if (err) {
console.error(err);
} else {
console.log(results);
}
});
break;
case "deleteAll":
deleteAll();
break;
default:
console.log("Commands:");
console.log("listAll deleteAll");
break;
}
And if you want to do it in C#/Dotnet Core, this project may help: https://github.com/lokijota/CosmosDbDeleteDocumentsByQuery. It's a simple Visual Studio project where you specify a SELECT query, and all the matches will be a) backed up to file; b) deleted, based on a set of flags.
Create a stored procedure in the collection and execute it, passing a SELECT query with the condition for the documents to delete. The main reason to use a stored procedure is its use of the continuation token, which greatly reduces the RUs consumed and therefore the cost.
Here is a Python script that can be used to delete data from a partitioned Cosmos collection. It deletes documents ID by ID, based on the result set data.
Identify the data that needs to be deleted before running the step below.
# Import added for completeness; assumes the pydocumentdb package
import pydocumentdb.document_client as document_client

# sqlContext is assumed to be an existing Spark SQLContext
res_list = "select id from id_del"
res_id = [{'id': x["id"]} for x in sqlContext.sql(res_list).rdd.collect()]
config = {
    "Endpoint": "Use EndPoint",
    "Masterkey": "UseKey",
    "WritingBatchSize": "5000",
    'DOCUMENTDB_DATABASE': 'Database',
    'DOCUMENTDB_COLLECTION': 'collection-core'
}
# Initialize the Python DocumentDB client once, before the loop
client = document_client.DocumentClient(config['Endpoint'], {'masterKey': config['Masterkey']})
for row in res_id:
    # Use a SQL query to look up the document across partitions
    query = {'query': "SELECT c.id FROM c where c.id = '" + row['id'] + "'"}
    print(query)
    options = {}
    options['enableCrossPartitionQuery'] = True
    options['maxItemCount'] = 1000
    result_iterable = client.QueryDocuments('dbs/Database/colls/collection-core', query, options)
    results = list(result_iterable)
    print('DOCS TO BE DELETED : ' + str(len(results)))
    for result in results:
        docID = result['id']
        print("docID :" + docID)
        options = {}
        options['enableCrossPartitionQuery'] = True
        options['maxItemCount'] = 1000
        # The document id is passed as the partition key, as in the original script
        options['partitionKey'] = docID
        client.DeleteDocument('dbs/Database/colls/collection-core/docs/' + docID, options=options)
        print('deleted Partition:' + docID)

How to search records in all directory except one in Marklogic

I am trying to search XML documents in my database using the MarkLogic Java API with the following restrictions:
1) A particular XML element must contain a given value, e.g. tradeId must equal the value I pass
2) Results must lie in the collections I provide
3) Results can lie anywhere in the MarkLogic database except in one particular directory
I don't have a solution for point 3. A few of the records meeting my first two criteria have a URI starting with /TRADES/*, so I want to search everywhere except under the directory "/TRADES".
This is how my code looks :
public static List<DocumentPage> getResults() throws KeyManagementException, NoSuchAlgorithmException,
KeyStoreException, CertificateException, IOException {
String tradeId = "XYZ";
DatabaseClient client = getDBClient();
QueryManager queryMgr = client.newQueryManager();
XMLDocumentManager xmlMngr = client.newXMLDocumentManager();
StringHandle rawHandle = new StringHandle();
rawHandle.withFormat(Format.XML).set(getQueryToFetchMessagesByTradeId(tradeId));
RawQueryByExampleDefinition querydef = queryMgr.newRawQueryByExampleDefinition(rawHandle);
querydef.setCollections("/messages/processed", "/messages/toBeProcessed");
querydef.setDirectory("/");
return getDocumentPageList(querydef, client);
}
private static String getQueryToFetchMessagesByTradeId(String tradeId) {
String query = "<q:qbe xmlns:q=\"http://marklogic.com/appservices/querybyexample\">\n" + "<q:query>\n<q:word>"
+ "<tradeId tradeIdScheme=" + "\"http://www.abcd.com/internalid/trade-id\"" + ">" + tradeId
+ "</tradeId></q:word>\n" + "</q:query>\n" + "</q:qbe>";
return query;
}
Any help is highly appreciated.
I don't see an option to compose metadata (directory) queries in query by example syntax. So you'll have to use Structured Queries instead. I find them more readable in Java anyway since they don't require string concatenation. Use StructuredQueryBuilder like so:
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef =
qb.and(
qb.containerQuery(
qb.element(new QName("http://www.abcd.com/internalid/trade-id", "tradeId")),
qb.and(
qb.term( tradeId ),
qb.value(
qb.elementAttribute(
qb.element(new QName("http://www.abcd.com/internalid/trade-id", "tradeId")),
qb.attribute("tradeIdScheme")
),
"http://www.abcd.com/internalid/trade-id"
)
)
),
qb.not(
qb.directory(1, "/TRADES/")
)
);
querydef.setCollections("/messages/processed", "/messages/toBeProcessed");
return getDocumentPageList(querydef, client);

Querying DocumentDb in .NET with a parametrized query

For an application I'm running a query on DocumentDb in .NET. For this I wanted to use a parametrized query, like this:
var sqlString = "select p.Id, p.ActionType, p.Type, p.Region, a.TimeStamp, a.Action from History p join a in p.Actions where a.TimeStamp >= @StartTime and a.TimeStamp <= @EndTime and p.ClientId = @ClientId and p.ActionType = @ActionType";
if (actionType != "") { sqlString += actionTypeFilter; }
var queryObject = new SqlQuerySpec
{
QueryText = sqlString,
Parameters = new SqlParameterCollection()
{
new SqlParameter("@StartTime", startDate),
new SqlParameter("@EndTime", endDate),
new SqlParameter("@ClientId", clientId.ToString()),
new SqlParameter("@ActionType", actionType)
},
};
var dataListing = _documentDbClient.CreateDocumentQuery<PnrTransaction>(UriToPnrHistories, queryObject, new FeedOptions() { MaxItemCount = 1 });
When I execute this, I get an empty dataset. But when I use the same query and build it using classic string replacement, it works just fine.
Can anyone tell me what I'm doing wrong in my parametrized query?
If the code above is the running code, you still add the actiontypeFilter on the parameterized SQL string. Try to remove the if statement on line 2. Seems to me that may be your problem.
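One way to avoid mixing an appended filter with a parameterized query is to add the optional clause and its parameter together, only when a value is present. Below is a minimal sketch in JavaScript, using the { query, parameters } query spec shape that the DocumentDB Node.js SDK accepts and the @-prefixed parameter names that DocumentDB SQL expects. buildHistoryQuery is a hypothetical helper, the shortened select list is only for brevity, and treating an empty actionType as "no filter" is an assumption about the asker's intent.

```javascript
// Build a parameterized query spec; field names mirror the question.
// buildHistoryQuery is a hypothetical helper, not part of any SDK.
function buildHistoryQuery({ startTime, endTime, clientId, actionType }) {
  let query =
    'SELECT p.Id, p.ActionType, a.TimeStamp, a.Action ' +
    'FROM History p JOIN a IN p.Actions ' +
    'WHERE a.TimeStamp >= @StartTime AND a.TimeStamp <= @EndTime ' +
    'AND p.ClientId = @ClientId';
  const parameters = [
    { name: '@StartTime', value: startTime },
    { name: '@EndTime', value: endTime },
    { name: '@ClientId', value: String(clientId) },
  ];
  // Assumption: an empty actionType means "do not filter on it",
  // so the clause and its parameter are added together or not at all.
  if (actionType) {
    query += ' AND p.ActionType = @ActionType';
    parameters.push({ name: '@ActionType', value: actionType });
  }
  return { query, parameters };
}

const spec = buildHistoryQuery({
  startTime: '2017-01-01T00:00:00Z',
  endTime: '2017-02-01T00:00:00Z',
  clientId: 42,
  actionType: 'Create',
});
console.log(spec.query);
console.log(spec.parameters.length); // 4
```

Keeping the clause and the parameter in the same branch means the query text never references a parameter that was not supplied, and never filters on an empty value.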
It would help if you could post a sample document from the server.
I usually see this syntax:
SqlParameterCollection parameters = new SqlParameterCollection();
parameters.Add(...);
parameters.Add(...);
Try that and see if you get different results. It might be that the list you use to initialize it in your question needs to be typed differently to work.

Why code running on Azure is so slow?

I have a web app running on Azure in shared web site mode. A simple method that adds items to a list and sorts it, with a list size of about 300 items, takes 0.3 s on my machine and 10 s after deployment (on the Azure machine).
Does anybody have any idea why Azure is so slow?
Is there any configuration I got wrong? I use the default one, but I replaced FREE mode with SHARED mode because I thought this would help. It seems it does not.
UPDATE:
public ActionResult GetPosts(String selectedStreams, int implicitSelectedVisualiserId, int userId)
{
DateTime begin = DateTime.UtcNow;
List<SearchQuery> selectedSearchQueries = searchQueryRepository.GetSearchQueriesOfStreamsIds(selectedStreams == String.Empty ? new List<int>() : selectedStreams.Split(',').Select(n => int.Parse(n)).ToList());
var implicitSelectedVisualiser = VisualiserModel.ToVisualiserModel(visualiserRepository.GetVisualiser(implicitSelectedVisualiserId));
var twitterSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Twitter, userId);
var instagramSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Instagram, userId);
var facebookSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Facebook, userId);
var manualSearchQueryOfImplicitSelectedVisualiser = searchQueryRepository.GetSearchQuery(implicitSelectedVisualiser.Stream.Name, Service.Manual, userId);
List<SearchResultModel> approvedSearchResults = new List<SearchResultModel>();
if (twitterSearchQueryOfImplicitSelectedVisualiser != null || instagramSearchQueryOfImplicitSelectedVisualiser != null || facebookSearchQueryOfImplicitSelectedVisualiser != null
|| manualSearchQueryOfImplicitSelectedVisualiser != null)
{
// Define search text to be displayed during slideshow;
SearchModel searchModel = new SearchModel();
// Set slideshow settings from implicit selected visualiser.
ViewBag.CurrentVisualiser = implicitSelectedVisualiser;
// Load search results from selected visualisers.
foreach (SearchQuery searchQuery in selectedSearchQueries)
{
approvedSearchResults.AddRange(
SearchResultModel.ToSearchResultModel(
searchResultRepository.GetSearchResults
(searchQuery.Id,
implicitSelectedVisualiser.Language)));
// Add defined query too.
searchModel.SearchValue += " " + searchQuery.Query;
}
// Add defined query for implicit selected visualiser.
if (twitterSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + twitterSearchQueryOfImplicitSelectedVisualiser.Query;
if (instagramSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + instagramSearchQueryOfImplicitSelectedVisualiser.Query;
if (facebookSearchQueryOfImplicitSelectedVisualiser != null)
searchModel.SearchValue += " " + facebookSearchQueryOfImplicitSelectedVisualiser.Query;
ViewBag.Search = searchModel;
// Also add search results from implicit selected visualiser
if (twitterSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(twitterSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (instagramSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(instagramSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (facebookSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(facebookSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
if (manualSearchQueryOfImplicitSelectedVisualiser != null)
approvedSearchResults.AddRange(SearchResultModel.ToSearchResultModel(searchResultRepository.GetSearchResults(manualSearchQueryOfImplicitSelectedVisualiser.Id, implicitSelectedVisualiser.Language)));
// if user selected to show only posts from specific number of last days.
var approvedSearchResultsFilteredByDays = new List<SearchResultModel>();
if (implicitSelectedVisualiser.ShowPostsFromLastXDays != 0)
{
foreach (SearchResultModel searchResult in approvedSearchResults)
{
var postCreatedTimeWithDays = searchResult.PostCreatedTime.AddDays(implicitSelectedVisualiser.ShowPostsFromLastXDays + 1);
if (postCreatedTimeWithDays >= DateTime.Now)
approvedSearchResultsFilteredByDays.Add(searchResult);
}
}
else
{
approvedSearchResultsFilteredByDays = approvedSearchResults;
}
// Order search results (posts to be displayed by created datetime).
var approvedSearchResultsOrdered = new List<SearchResultModel>();
if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.CREATED_DATE_ASC)
{
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderBy(s => s.PostCreatedTime).ToList();
}
else if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.CREATED_DATE_DESC)
{
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderByDescending(s => s.PostCreatedTime).ToList();
}
else if (implicitSelectedVisualiser.PostsSortOrder == PostsSortOrder.RANDOM)
{
var rnd = new Random();
approvedSearchResultsOrdered = approvedSearchResultsFilteredByDays.OrderBy(x => rnd.Next()).ToList();
}
// Load background images;
var visualiserImages = visualiserImageRepository.GetImages(implicitSelectedVisualiser.Id);
//foreach (SearchResultModel searchResultModel in approvedSearchResultsOrdered)
//{
// searchResultModel.BackgroundImagePath = TwitterUtils.GetRandomImageBackgroundForDisplay(visualiserImages);
//}
ViewBag.BackgroundImagePath = TwitterUtils.GetRandomImageBackgroundForDisplay(visualiserImages);
approvedSearchResults = approvedSearchResultsOrdered;
}
DateTime end = DateTime.UtcNow;
Elmah.ErrorSignal.FromCurrentContext().Raise(new Exception(String.Format("User {0}: Preparing {1} posts for visualiser took {2} seconds", MySession.Current.LoggedInUserName, approvedSearchResults.Count(), (end - begin).TotalMilliseconds / 1000)));
return PartialView("_DisplayPostsNew", approvedSearchResults);
}
This isn't surprising, actually. The servers used in Windows Azure are currently mostly 1.6 GHz machines. The larger the machine size you use, the more cores you get, but they all run at the same speed. This is likely a much slower CPU than the one in your development machine.
On Windows Azure Web Sites when you move to Shared mode you are still in a multi-tenant environment, so you could be seeing some noisy neighbors here. The difference between Free and Shared is that many of the quotas for free are removed since you are paying. When you move to Standard then you are assigned a Virtual Machine dedicated to your web sites (up to 100 of them), so that is the best case scenario since you are the only one using the resources at that point.
There was a thread on this on the MSDN forums a while back : http://social.msdn.microsoft.com/Forums/windowsazure/en-US/0d0a3a88-eac4-4b9e-8b10-4a547cbf653b/performance-of-azure-servers-slow-cpus?forum=windowsazuredevelopment
They have started offering different hardware configurations with more memory for Virtual Machines, Cloud Services and such, but I'm not sure the CPUs have been changed. It's hard to find the CPU stated on WindowsAzure.com anymore, but the pricing calculator for Web Sites references 1.6 GHz machines when you move the slider to Standard.
Actually I found the issue.
Locally, I tested with a few hundred records in my DB, while in the Azure DB I have over 70,000 records in that table, which affects the performance of the algorithm.
One mistake I made in the code above: I filtered records from the DB by a specific date AFTER fetching them all. By filtering directly in LINQ, I reduced the time from 10 s to 0.3 s on Azure too.
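The date filter in the posted method adds days to every post's creation time and compares the result with the current time. The same condition can be written as a single cutoff computed once, which is the form that is easy to move into the query itself. Here is a minimal sketch of that rewrite in JavaScript, with made-up sample posts; filterByLastDays is a hypothetical name, and the in-memory array only stands in for the database table.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Equivalent to the check in the posted method:
//   postCreatedTime + (lastXDays + 1) days >= now
// rewritten as
//   postCreatedTime >= now - (lastXDays + 1) days
function filterByLastDays(posts, lastXDays, now = new Date()) {
  if (lastXDays === 0) return posts; // 0 means "no date restriction", as in the original
  const cutoff = new Date(now.getTime() - (lastXDays + 1) * MS_PER_DAY);
  return posts.filter((p) => p.postCreatedTime >= cutoff);
}

// Made-up sample data for illustration
const now = new Date('2024-01-10T00:00:00Z');
const posts = [
  { id: 1, postCreatedTime: new Date('2024-01-01T00:00:00Z') },
  { id: 2, postCreatedTime: new Date('2024-01-08T00:00:00Z') },
  { id: 3, postCreatedTime: new Date('2024-01-09T12:00:00Z') },
];
const recent = filterByLastDays(posts, 1, now);
console.log(recent.map((p) => p.id)); // [ 2, 3 ]
```

With the cutoff on one side of the comparison, the condition involves only the stored column and a constant, so it can be applied where the data lives instead of after all rows have been loaded.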
