Peculiar Issue with Azure Table Storage Select Query

I came across some weird behavior with an Azure Table Storage query. I used the following code to get a list of entities from Azure Table Storage:
query = context.CreateQuery<DomainData.Employee>(DomainData.Employee.TABLE_NAME).Where(strPredicate).Select(selectQuery);
where context is a TableServiceContext. I am trying to pull Employee entities from Azure Table Storage, and my requirement is to construct the predicate and the projection dynamically.
strPredicate is a string containing the dynamically constructed predicate.
selectQuery is the projection string, constructed dynamically from the properties the user selected.
When the user selects all the properties of the Employee object (which has over 200 properties), the system constructs a projection string covering every property, and the query takes 45 minutes to retrieve 60,000 records from Azure Table Storage.
Whereas when I query the entity directly with no projection, i.e. like below
query = context.CreateQuery<DomainData.Employee>(DomainData.Employee.TABLE_NAME).Where(strPredicate);
the query takes only 5 minutes to retrieve the same 60,000 records. Why this peculiar behavior? Both queries are the same except that one projects columns/properties and the other has no projection, yet Azure Table Storage returns the same number of entities with the same properties and the same size per entity. Why does the first query take so much time, and why is the second one faster? Please let me know.

The standard advice when dealing with perceived anomalies in Windows Azure Storage is to use Fiddler to identify the actual storage operations being invoked. This will quickly show you how the two operations actually differ on the wire.
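For illustration only (the account name, table name, filter and property names below are placeholders, not taken from your code): if the LINQ projection is translated to the Table service as an OData $select, the two requests Fiddler captures should differ only in that query option, e.g.
GET https://myaccount.table.core.windows.net/Employee()?$filter=(Status eq 'Active')&$select=Prop1,Prop2,...,Prop200
GET https://myaccount.table.core.windows.net/Employee()?$filter=(Status eq 'Active')
If the trace instead shows something else for the projected query, such as many additional round trips driven by continuation tokens, requests without a $select at all, or the filter being dropped, that difference is where the extra 40 minutes is going.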

Related

Delta Logic implementation using SnapLogic

Is there any snap available in SnapLogic to do the following:
1. Connect to Snowflake and get data with SELECT * FROM VIEW
2. Connect to Azure Blob Storage and get the data from a csv file: FILENAME_YYYYMMDD.csv
3. Take only the records that are available in 1 but NOT available in 2, and write this delta back to Azure Blob Storage: FILENAME_YYYYMMDD.csv
Is In-Memory Look-Up useful for this?
No, the In-Memory Lookup snap is used for cases where you need to look up the value corresponding to a certain field of the incoming records. For example, say you want to look up a country name against the country ISO code. This snap generally fetches the lookup table once and stores it in memory, then uses this stored lookup table to provide data corresponding to the incoming records.
In your case, you have to use the Join snap and configure it as an inner join.

Can you use dynamic/run-time outputs with azure stream analytics

I am trying to get aggregate data sent to different Table Storage outputs based on a column in the SELECT query. I am not sure if this is possible with Stream Analytics.
I've looked through the Stream Analytics docs and various forums and so far haven't found any leads. I am looking for something like:
Select tableName,count(distinct records)
into tableName
from inputStream
I hope this makes it clear what I'm trying to achieve: I am trying to insert aggregated data into Table Storage (defined as outputs), and I want to take the output stream/table storage name from the SELECT query. Any idea how that could be done?
I am trying to get aggregate data sent to different table storage outputs based on a column name in select query.
If I don't misunderstand your requirement, you want a CASE...WHEN... or IF...ELSE... structure in the ASA SQL so that you can send data to different table outputs based on some condition. If so, I'm afraid that cannot be implemented so far. Every destination in ASA has to be specific; dynamic output is not supported in ASA.
However, as a workaround, you could use an Azure Function as the output. You could pass the columns into the Azure Function, then do the switching in code inside the function to save the data into different Table Storage destinations. For more details, please refer to this official doc: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-with-azure-functions
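As a rough sketch of that workaround (assuming the in-process C# function model and the Azure.Data.Tables client, and assuming the ASA query aliases the aggregate as count; the function name, the tableName column and the StorageConnectionString app setting are my own placeholders):

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Data.Tables;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Newtonsoft.Json;

public static class RouteAggregates
{
    [FunctionName("RouteAggregates")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        // Stream Analytics posts a JSON array of events to the function.
        string body = await new StreamReader(req.Body).ReadToEndAsync();
        var events = JsonConvert.DeserializeObject<List<Dictionary<string, object>>>(body);

        var serviceClient = new TableServiceClient(
            Environment.GetEnvironmentVariable("StorageConnectionString"));

        foreach (var evt in events)
        {
            // Route each event to the table named in its "tableName" column.
            string tableName = evt["tableName"].ToString();
            TableClient tableClient = serviceClient.GetTableClient(tableName);
            await tableClient.CreateIfNotExistsAsync();

            var entity = new TableEntity(tableName, Guid.NewGuid().ToString())
            {
                ["Count"] = evt["count"]?.ToString()
            };
            await tableClient.AddEntityAsync(entity);
        }

        // A 200 response tells Stream Analytics the batch was accepted.
        return new OkResult();
    }
}

The "switch" here is just the table-name lookup; you could equally route on any other column, and the partition/row key scheme above is only a placeholder.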

Azure Cosmos DB - incorrect and variable document count

I have inserted exactly 1 million documents in an Azure Cosmos DB SQL container using the Bulk Executor. No errors were logged. All documents share the same partition key. The container is provisioned for 3,200 RU/s, unlimited storage capacity and single-region write.
When performing a simple count query:
select value count(1) from c where c.partitionKey = @partitionKey
I get results varying from 303,000 to 307,000.
This count query works fine for smaller partitions (from 10k up to 250k documents).
What could cause this strange behavior?
This is expected behavior in Cosmos DB. Firstly, what you need to know is that Document DB imposes limits on the response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?
Secondly, if you want to query large amounts of data from Document DB, you have to consider query performance; please refer to this article: Tuning query performance with Azure Cosmos DB.
By looking at the Document DB REST API, you can observe several important parameters which have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
So your varying count results from the RU setting acting as a bottleneck: the count query is limited by the RUs allocated to your collection, and each response you receive carries a continuation token.
You have two options:
1. Raise the RU setting.
2. To keep the cost down, keep fetching the next set of results via the continuation token and add the partial counts together to get the total count (probably in the SDK).
You could set the Max Item Count value and paginate your data using continuation tokens. The Document DB SDK supports reading paginated data seamlessly. You could refer to the snippet of Python code below:
# Legacy DocumentDB Python SDK pattern: fetch one page of results at a time.
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
results_1 = q._fetch_function({'maxItemCount': 10})
# The continuation token is a string representing a JSON object.
token = results_1[1]['x-ms-continuation']
# Pass the token back to fetch the next page of results.
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})
I imported exactly 30k documents into my database, then ran select value count(1) from c in Query Explorer. It turns out it only returns a partial count of the total documents on every page, so I needed to add them all up by clicking the Next Page button.
You could of course do this query in SDK code via the continuation token; a rough .NET sketch is below.
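A minimal sketch with the .NET SDK (Microsoft.Azure.Cosmos v3), assuming placeholder database/container names, a COSMOS_CONNECTION_STRING environment variable and the same parameterized count query; depending on the SDK version the aggregate may arrive as one final value or as partial counts per page, and summing the pages handles both cases:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class CountAllDocuments
{
    public static async Task<long> CountAsync()
    {
        using var client = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONNECTION_STRING"));
        Container container = client.GetContainer("mydb", "mycontainer");

        QueryDefinition query = new QueryDefinition(
                "SELECT VALUE COUNT(1) FROM c WHERE c.partitionKey = @partitionKey")
            .WithParameter("@partitionKey", "myPartitionKey");

        long total = 0;
        FeedIterator<long> iterator = container.GetItemQueryIterator<long>(query);
        while (iterator.HasMoreResults)
        {
            // Each page may carry a partial count plus a continuation token;
            // the SDK follows the token for us, so we just sum what comes back.
            FeedResponse<long> page = await iterator.ReadNextAsync();
            foreach (long partialCount in page)
            {
                total += partialCount;
            }
        }
        return total;
    }
}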

Azure Table Storage data modeling considerations

I have a list of users. A user can log in using either a username or an e-mail address.
As a beginner in Azure Table Storage, this is the data model I use for fast index scans:
PartitionKey      RowKey            Property
users:email       jacky@email.com   nickname:jack123
users:username    jack123           email:jacky@email.com
So when a user logs in via e-mail, I supply PartitionKey eq 'users:email' in the Azure Table query; if it is a username, PartitionKey eq 'users:username'.
Since it doesn't seem possible to simulate contains or like in an Azure Table query, I'm wondering if it is normal practice to store multiple rows of data for one user?
Since it doesn't seem possible to simulate contains or like in an Azure Table query, I'm wondering if it is normal practice to store multiple rows of data for one user?
This is a perfectly valid practice and in fact is a recommended practice. Essentially you will have to identify the attributes on which you could potentially query your table storage and somehow use them as a combination of PartitionKey and RowKey.
Please see Guidelines for table design for more information. From this link:
Consider storing duplicate copies of entities. Table storage is cheap so consider storing the same entity multiple times (with different keys) to enable more efficient queries.
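A minimal sketch of that pattern with the Azure.Data.Tables client (the table name, connection string setting, and property names are placeholders, not from the question):

using System;
using Azure.Data.Tables;

var tableClient = new TableClient(
    Environment.GetEnvironmentVariable("StorageConnectionString"), "users");
tableClient.CreateIfNotExists();

// One logical user stored twice, once per login attribute, so both lookups are point queries.
var byEmail = new TableEntity("users:email", "jacky@email.com") { ["nickname"] = "jack123" };
var byUsername = new TableEntity("users:username", "jack123") { ["email"] = "jacky@email.com" };
tableClient.AddEntity(byEmail);
tableClient.AddEntity(byUsername);

// Login via e-mail: a PartitionKey + RowKey lookup, the cheapest query Table Storage offers.
TableEntity user = tableClient.GetEntity<TableEntity>("users:email", "jacky@email.com");
Console.WriteLine(user["nickname"]);

The price of the pattern is that updates have to touch both rows (two writes here, since the rows live in different partitions and cannot share a batch).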

How Storage API can be used to get Azure Classic VM metrics?

Can we collect metrics for an Azure Classic VM using the Storage API, or is there any other way to get the metrics for an Azure Classic VM? Please suggest.
To get Azure VM metrics from the Azure Storage service, you need to enable Diagnostics and configure the storage account in the Azure portal.
After that, you will find that multiple tables are created to store the metrics.
The tables are with the following naming conventions:
WADMetrics : Standard prefix for all WADMetrics tables
PT1H or PT1M : Signifies that the table contains aggregate data over 1 hour or 1 minute
P10D : Signifies the table will contain data for 10 days from when the table started collecting data
V2S : String constant
yyyymmdd : The date at which the table started collecting data
Each WADMetrics table will contain the following columns:
PartitionKey: The partition key is constructed from the resource ID value to uniquely identify the VM resource, e.g. :002Fsubscriptions:002F{subscription-id}:002FresourceGroups:002F{resource-group}:002Fproviders:002FMicrosoft:002ECompute:002FvirtualMachines:002F{vm-name}
RowKey: Follows the format {descending time tick}__{encoded counter name} (see the sketch after this list). The descending time tick calculation is the maximum time ticks minus the time of the beginning of the aggregation period. E.g. if the sample period started on 10-Nov-2015 00:00 UTC then the calculation would be: DateTime.MaxValue.Ticks - (new DateTime(2015,11,10,0,0,0,DateTimeKind.Utc)).Ticks. For the Memory Available Bytes performance counter the row key will look like: 2519551871999999999__:005CMemory:005CAvailable:0020Bytes
CounterName : Is the name of the performance counter. This matches the counterSpecifier defined in the xml config.
Maximum : The maximum value of the performance counter over the aggregation period.
Minimum : The minimum value of the performance counter over the aggregation period.
Total : The sum of all values of the performance counter reported over the aggregation period.
Count : The total number of values reported for the performance counter.
Average : The average (total/count) value of the performance counter over the aggregation period.
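As a small worked example of the RowKey construction above (the date and counter name are just the example values from this answer; the ':00XX' replacements shown are only the two needed for this counter, not the full encoding scheme):

using System;

class WadMetricsRowKey
{
    static void Main()
    {
        // Descending time tick: max ticks minus the start of the aggregation period (UTC).
        var periodStart = new DateTime(2015, 11, 10, 0, 0, 0, DateTimeKind.Utc);
        long descendingTicks = DateTime.MaxValue.Ticks - periodStart.Ticks;

        // Counter name with reserved characters encoded (':005C' is '\', ':0020' is a space).
        string encodedCounter = @"\Memory\Available Bytes"
            .Replace(@"\", ":005C")
            .Replace(" ", ":0020");

        // Prints 2519551871999999999__:005CMemory:005CAvailable:0020Bytes
        Console.WriteLine($"{descendingTicks}__{encodedCounter}");
    }
}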
To read the data from Azure Table, you could use Azure Table client library or Azure Table REST API.
Get started with Azure Table storage using .NET
Table Service REST API
Update 2017/07/18
My doubt is: 20170709 is the start date and 20170719 is the end date, am I right?
Yes, you are right.
Doubt 2: To access this table I need to create a POJO, so how can I get the schema of the table, i.e. whether Maximum/Minimum/Average is an int/long/double/float?
You can open an entity of the table in Azure Storage Explorer; it shows the type of each column.
Doubt 3: How do I query WADMetricsPT1HP10DV2S20170709 to get metrics for one particular hour?
You could query the data by Timestamp, for example:
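A sketch with the .NET Azure.Data.Tables client (the table name, connection string setting and the hour chosen are placeholders; the same Timestamp filter works from the Java client or the REST API):

using System;
using Azure.Data.Tables;

var tableClient = new TableClient(
    Environment.GetEnvironmentVariable("StorageConnectionString"),
    "WADMetricsPT1HP10DV2S20170709");

// The hour of interest, in UTC.
var hourStart = new DateTimeOffset(2017, 7, 10, 13, 0, 0, TimeSpan.Zero);
var hourEnd = hourStart.AddHours(1);

// OData filter on the system Timestamp column, e.g.
// Timestamp ge datetime'2017-07-10T13:00:00Z' and Timestamp lt datetime'2017-07-10T14:00:00Z'
string filter =
    $"Timestamp ge datetime'{hourStart:s}Z' and Timestamp lt datetime'{hourEnd:s}Z'";

foreach (TableEntity row in tableClient.Query<TableEntity>(filter))
{
    Console.WriteLine($"{row["CounterName"]}: avg={row["Average"]}, max={row["Maximum"]}");
}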
Can we collect metrics for Azure Classic VM using Storage API
By collect metrics, if you mean the process of capturing the metrics data, then the answer is no, you can't use the Storage API to do that. You would need to use the Metrics API for that purpose; the data collected this way is stored in Azure Storage.
Once the data is in Azure Storage, then you can use Storage API to get that data. Depending on where the data is stored (Blobs and/or Tables), you would use appropriate parts of Storage API to fetch and manage that data.
