Azure Stream Analytics is not feeding DocumentDB output sink

I am trying to integrate Azure Stream Analytics with DocumentDB and use it as an output sink. The problem is that no documents are created in DocDB while the processing job is running. I have tested my query and even mirrored the output to a storage account: a JSON file containing all the values is created in storage, but DocDB stays empty.
Here is my query:
WITH Res1 AS (
    SELECT
        id,
        concat(
            cast(datepart(yyyy, timestamp) AS nvarchar(max)), '-',
            cast(datepart(mm, timestamp) AS nvarchar(max)), '-',
            cast(datepart(dd, timestamp) AS nvarchar(max))) AS date,
        temp, humidity, distance, timestamp
    FROM iothub TIMESTAMP BY timestamp
)

SELECT * INTO docdboutput FROM Res1
SELECT * INTO test FROM Res1
I set the DocumentDB output correctly to an existing collection. I also tried both providing and omitting the document id parameter, and neither option worked. I used the date field as the partition key when creating the DocDB database and collection.
I also tried a manual document upload: I copied a line from the JSON file created in the storage account, created a separate JSON file containing that one record, and uploaded it manually to the DocumentDB collection via the portal. It succeeded. Here is an example of one line that was output to the storage file:
{"id":"8ace6228-a2e1-434d-a5f3-c2c2f15da309","date":"2017-2-10","temp":21.0,"humidity":20.0,"distance":0,"timestamp":"2017-02-10T20:47:54.3716407Z"}
Can anyone advise me whether there is a problem with my query, or point me to how I can investigate and diagnose this further?

Are you by any chance using a collection that has <=10K RUs and has a partition key defined in DocDB (a.k.a. a single-partition collection)?
There is an ongoing defect that is blocking output to single-partitioned collections. This should be fixed by the end of next week. Your workarounds at this point are to try a different collection:
a) with >10K RUs (with a partition key defined in DocDB)
b) with <=10K RUs (with no partition key defined in DocDB/ASA)
Hope that helps!

Related

Delta Logic implementation using SnapLogic

Is there any snap available in SnapLogic to do the following:
1. Connect to Snowflake and get data with SELECT * FROM VIEW
2. Connect to Azure Blob Storage and get the data from a CSV file: FILENAME_YYYYMMDD.csv
3. Take only the data that is available in 1 but NOT available in 2, and write this delta back to Azure Blob Storage: FILENAME_YYYYMMDD.csv
Is the In-Memory Lookup snap useful for this?
No, In-Memory Lookup snap is used for cases where you need to look up the value corresponding to the value in a certain field of the incoming records. For example, say you want to look up a country name against the country ISO code. This snap generally fetches the lookup table once and stores it in memory. Then it uses this stored lookup table to provide data corresponding to the incoming records.
In your case, you have to use the Join snap and configure it as an inner join.
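For reference, the delta described in step 3 is a set difference: keep only the rows from the Snowflake view whose key does not appear in the CSV. A minimal C# sketch of that logic, independent of how the SnapLogic pipeline is wired up (the Id key and the record shapes are illustrative assumptions):

using System.Collections.Generic;
using System.Linq;

// Sketch only: "viewRows" stands for records read from the Snowflake view,
// "csvRows" for records read from FILENAME_YYYYMMDD.csv; "Id" is an assumed key column.
static class DeltaLogic
{
    public static List<Dictionary<string, string>> Delta(
        IEnumerable<Dictionary<string, string>> viewRows,
        IEnumerable<Dictionary<string, string>> csvRows)
    {
        var csvKeys = new HashSet<string>(csvRows.Select(r => r["Id"]));
        // Keep rows that are present in the view but NOT in the CSV.
        return viewRows.Where(r => !csvKeys.Contains(r["Id"])).ToList();
    }
}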

Can you use dynamic/run-time outputs with azure stream analytics

I am trying to get aggregate data sent to different table storage outputs based on a column in the SELECT query. I am not sure if this is possible with Stream Analytics.
I've looked through the Stream Analytics docs and various forums and so far haven't found any leads. I am looking for something like:
SELECT tableName, COUNT(DISTINCT records)
INTO tableName
FROM inputStream
I hope this makes it clear what I'm trying to achieve: I am trying to insert aggregated data into table storage (defined as outputs), and I want to take the output/table storage name from the SELECT query. Any idea how that could be done?
I am trying to get aggregate data sent to different table storage outputs based on a column in the SELECT query.
If I don't misunderstand your requirement, you want a case...when... or if...else... structure in the ASA SQL so that you can send data to different table outputs based on some condition. If so, I'm afraid that cannot be implemented at this time. Every destination in ASA has to be specified explicitly; dynamic outputs are not supported in ASA.
However, as a workaround, you could use an Azure Function as the output. You could pass the columns into the Azure Function and then do the switching in code inside the Function to save the data to different table storage destinations. For more details, please refer to this official doc: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-with-azure-functions
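A minimal sketch of that workaround in C#, assuming an HTTP-triggered Azure Function and the WindowsAzure.Storage table SDK (the function name, the StorageConnectionString setting, and the tableName/count fields are illustrative assumptions, not part of the answer above):

using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json.Linq;

public static class RouteToTables
{
    // Stream Analytics posts each batch of output events to the Function as a JSON array.
    [FunctionName("RouteToTables")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        var tableClient = CloudStorageAccount
            .Parse(System.Environment.GetEnvironmentVariable("StorageConnectionString"))
            .CreateCloudTableClient();

        JArray events = JArray.Parse(await new StreamReader(req.Body).ReadToEndAsync());

        foreach (JObject evt in events)
        {
            // Route each event to the table named by its own "tableName" column.
            CloudTable table = tableClient.GetTableReference((string)evt["tableName"]);
            await table.CreateIfNotExistsAsync();

            var entity = new DynamicTableEntity((string)evt["tableName"], System.Guid.NewGuid().ToString());
            entity.Properties["Count"] = new EntityProperty((long)evt["count"]);
            await table.ExecuteAsync(TableOperation.InsertOrMerge(entity));
        }

        return new OkResult();
    }
}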

Azure Table Storage Explorer Query

I would like to look at the data collected recently.
I've written some query statements, but the results are shown in the order in which the data was first collected.
I do not know how to write a query that returns the data in the order in which it was most recently collected.
Please let me know if you have any related tips.
You can click on the Timestamp column to view the data in ascending or descending order, since there is no support for ORDER BY queries in Azure Storage Explorer.
I do not know how to write a query that returns the data in the order in which it was most recently collected.
Based on the official documentation, ORDER BY queries are not currently supported by the Azure Table service. Query results are ordered by PartitionKey and RowKey by default.
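Because results always come back ordered ascending by PartitionKey and RowKey, one common workaround (an assumption on my part, not something the answers above prescribe) is to write entities with an inverted-ticks RowKey so that the newest rows sort first:

using System;
using Microsoft.WindowsAzure.Storage.Table;

// Sketch, assuming the WindowsAzure.Storage SDK and an existing CloudTable named "table".
// The inverted-ticks RowKey makes the most recently written entity sort first
// under the default PartitionKey/RowKey ordering.
string rowKey = (DateTime.MaxValue.Ticks - DateTime.UtcNow.Ticks).ToString("d19");
var entity = new DynamicTableEntity("telemetry", rowKey);
entity.Properties["Payload"] = new EntityProperty("example value");
table.Execute(TableOperation.InsertOrMerge(entity));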

Strategy for storing application logs in Azure Table Storage

I am trying to determine a good strategy for storing logging information in Azure Table Storage. I have the following:
PartitionKey: The name of the log.
RowKey: Inverted DateTime ticks.
The only issue here is that partitions could get very large (millions of entities), and their size will increase over time.
That being said, the queries being performed will always include the PartitionKey (no table scan) AND a RowKey filter (a minor range scan).
For example (in a natural language):
where `PartitionKey` = "MyApiLogs" and
where `RowKey` is between "01-01-15 12:00" and "01-01-15 13:00"
Provided that the query is done on both PartitionKey and RowKey, I understand that the size of the partition doesn't matter.
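Expressed with the .NET storage SDK, that example query would look roughly like the sketch below (the table reference, RowKey format, and boundary values are illustrative assumptions):

using Microsoft.WindowsAzure.Storage.Table;

// Sketch: PartitionKey match plus a RowKey range filter, assuming RowKeys sort
// chronologically (e.g. zero-padded or inverted ticks); "table" is an existing CloudTable.
string filter = TableQuery.CombineFilters(
    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "MyApiLogs"),
    TableOperators.And,
    TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, "2015-01-01T12:00"),
        TableOperators.And,
        TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThan, "2015-01-01T13:00")));

var query = new TableQuery<DynamicTableEntity>().Where(filter);
foreach (var entity in table.ExecuteQuery(query))
{
    // process each matching log entity
}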
Take a look at our new Table Design Patterns Guide - specifically the log-data anti-pattern, as it talks about this scenario and alternatives. Often when people write log files they use a date for the PartitionKey, which results in a hot partition because all writes go to a single partition. Quite often blobs end up being a better destination for log data, as people typically end up processing the logs in batches anyway; the guide discusses this as an option.
Adding my own answer so people can have something inline without needing external links.
You want the partition key to be the timestamp plus the hash code of the message. This is good enough in most cases. You can also fold the hash code(s) of any additional key/value pairs into the message's hash code if you want, but I've found that it's not really necessary.
Example:
string partitionKey = DateTime.UtcNow.ToString("o").Trim('Z', '0') + "_" + ((uint)message.GetHashCode()).ToString("X");
string rowKey = logLevel.ToString();
DynamicTableEntity entity = new DynamicTableEntity { PartitionKey = partitionKey, RowKey = rowKey };
// add any additional key/value pairs from the log call to the entity,
// e.g. entity.Properties["Message"] = EntityProperty.GeneratePropertyForString(message);
// use InsertOrMerge to add the entity, e.g. table.Execute(TableOperation.InsertOrMerge(entity));
When querying logs, you can use a query whose partition key marks the start of the window you want to retrieve, usually something like 1 minute or 1 hour back from the current date/time. You can then page backwards another minute or hour with a different date/time stamp. This avoids the weird date/time hack that suggests subtracting the date/time stamp from DateTime.MaxValue.
If you get extra fancy and put a search service on top of the Azure table storage, then you can lookup key/value pairs quickly.
This will be much cheaper than Application Insights (which I would suggest disabling) if you are using Azure Functions. If you need multiple log names, just add another table.

Peculiar Issue with Azure Table Storage Select Query

I came across some weird behavior of an Azure Table Storage query. I used the following code to get a list of entities from Azure Table Storage:
query = context.CreateQuery<DomainData.Employee>(DomainData.Employee.TABLE_NAME).Where(strPredicate).Select(selectQuery);
where context is a TableServiceContext. I was trying to pull the Employee entity from Azure Table Storage; my requirement is to construct the predicate and the projection dynamically.
So strPredicate is a string containing the dynamically constructed predicate.
selectQuery is the projection string, constructed dynamically based on the properties the user selected.
When the user selects all the properties of the Employee object (which has over 200 properties), the system builds the dynamic projection string from all of them, and the query takes 45 minutes to retrieve 60000 records from Azure Table Storage.
Whereas when I query the entity type directly, without any projection, i.e. like this:
query = context.CreateQuery<DomainData.Employee>(DomainData.Employee.TABLE_NAME).Where(strPredicate);
the query takes only 5 minutes to retrieve 60000 records from Azure Table Storage. Why this peculiar behavior? The two queries are the same except that one projects columns/properties and the other does not, and Azure Table Storage returns the same number of entities with the same properties and the same entity size in both cases. Why does the first query take so much longer, and why is the second one faster? Please let me know.
The standard advice when dealing with perceived anomalies in Windows Azure Storage is to use Fiddler to identify the actual storage operation invoked. This will quickly let you see what the actual differences between the two operations are.
