I'm trying to get my head around the hopping window in Azure Stream Analytics.
I get the following data from an Azure Event Hub:
[
{
"Id": "1",
"SensorData": [
{
"Timestamp": 1603112431,
"Type": "LineCrossing",
"Direction": "forward"
},
{
"Timestamp": 1603112431,
"Type": "LineCrossing",
"Direction": "forward"
}
],
"EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2020-10-20T06:35:48.3540000Z"
},
{
"Id": "1",
"SensorData": [
{
"Timestamp": 1603112430,
"Type": "LineCrossing",
"Direction": "backward"
}
],
"EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
"PartitionId": 0,
"EventEnqueuedUtcTime": "2020-10-20T06:35:48.2140000Z"
}
]
My query looks like the following:
SELECT s.Id, COUNT(data.ArrayValue.Direction) as Count
FROM [customers] s TIMESTAMP BY EventEnqueuedUtcTime
CROSS APPLY GetArrayElements(s.SensorData) AS data
WHERE data.ArrayValue.Type = 'LineCrossing'
AND data.ArrayValue.Direction = 'forward'
GROUP BY s.Id, HoppingWindow(second, 3600, 5)
I used a hopping window to get, every 5 seconds, all events from the last hour (3600 seconds).
My expectation for the given DTO would be one row with Id 1 and Count 2, but what I receive is 720 rows (3600 divided by 5), each with Id 1 and Count 2.
Shouldn't those events be aggregated by the HoppingWindow function?
I structured your query as follows:
with inputValues as (
    select input.*, message.ArrayValue as Data
    from input
    CROSS APPLY GetArrayElements(input.SensorData) as message
)
select inputValues.Id, count(Data.Direction) as Count
into output
from inputValues
where Data.Type = 'LineCrossing' and Data.Direction = 'forward'
GROUP BY inputValues.Id, HoppingWindow(second, 3600, 5)
I set the input to Event Hub, and in Visual Studio I started the query with the cloud input.
I used a Windows client application to pipe the messages into Event Hub and observed that results were emitted every 5 seconds.
Just adjust the query I shared to reflect the correct time-stamping, and the result should be as expected: every 5 seconds, a count is emitted to the output, per the defined condition, for all events that arrived in the last hour (3600 seconds in the HoppingWindow function).
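For reference, a minimal sketch of the same query with the time-stamping applied (assuming EventEnqueuedUtcTime is the event time you want, as in your original query); only the FROM line inside the CTE changes:
with inputValues as (
    select input.*, message.ArrayValue as Data
    from input TIMESTAMP BY EventEnqueuedUtcTime
    CROSS APPLY GetArrayElements(input.SensorData) as message
)
select inputValues.Id, count(Data.Direction) as Count
into output
from inputValues
where Data.Type = 'LineCrossing' and Data.Direction = 'forward'
-- one output row per Id every 5 seconds, counting the forward LineCrossing events of the last hour
GROUP BY inputValues.Id, HoppingWindow(second, 3600, 5)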
So, I want to capture the Administrative events sent by Azure to an Event Hub with a Stream Analytics job and forward only the events which match specific criteria to an Azure Function. The events come in an object like this (heavily trimmed to simplify):
{
"records": [
{
"resourceId": "<resource_path>",
"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE",
},
{
"time": "2021-03-19T19:19:56.0639872Z",
"operationName": "MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE",
"category": "Administrative",
"resultType": "Accept",
"resultSignature": "Accepted.Created",
"properties": {
"statusCode": "Created",
"serviceRequestId": "<trimmed>",
"eventCategory": "Administrative",
"message": "Microsoft.Compute/virtualMachines/write",
"hierarchy": "<trimmed>"
},
"tenantId": "<trimmed>"
}
],
"EventProcessedUtcTime": "2021-03-19T19:25:21.1471185Z",
"PartitionId": 1,
"EventEnqueuedUtcTime": "2021-03-19T19:20:43.9080000Z"
}
I want to filter based on these criteria: records[0].operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE' AND records[1].properties.statusCode = 'Created'. To achieve that, I began with the following query, which returns this record but lacks one of the criteria I need to match (statusCode):
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
Trying the query below doesn't work (it returns 0 matches):
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
AND GetArrayElement(records, 1).properties.statusCode = 'OK'
Does anyone have a clue about this?
I found the solution! I need to use GetRecordPropertyValue, like so:
SELECT
records
INTO
[output]
FROM
[input]
WHERE
GetArrayElement(records, 0).operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
AND GetRecordPropertyValue(GetArrayElement(records, 1).properties, 'statusCode') = 'Created'
Looks a bit clumsy to me, but it worked!
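If you'd rather not hard-code the array positions, a possible alternative (just a sketch; note it checks both conditions on each individual array element, which is slightly different from testing element 0 and element 1 separately) is to flatten the array with CROSS APPLY GetArrayElements:
SELECT
    r.ArrayValue AS matchedRecord -- forwards the matching element rather than the whole records array
INTO
    [output]
FROM
    [input]
CROSS APPLY GetArrayElements(records) AS r
WHERE
    r.ArrayValue.operationName = 'MICROSOFT.COMPUTE/VIRTUALMACHINES/WRITE'
    AND GetRecordPropertyValue(r.ArrayValue.properties, 'statusCode') = 'Created'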
I'm using Azure Data Factory to periodically import data from MySQL to Azure SQL Data Warehouse.
The data goes through staging blob storage on an Azure storage account, but when I run the pipeline it fails because it can't separate the blob text back into columns. Each row that the pipeline tries to insert into the destination becomes one long string which contains all the column values delimited by a "⯑" character.
I've used Data Factory before, without the incremental mechanism, and it worked fine. I don't see a reason it would cause such behavior, but I'm probably missing something.
I'm attaching the JSON that describes the pipeline with some minor naming changes, please let me know if you see anything that can explain this.
Thanks!
EDIT: Adding exception message:
Failed execution Database operation failed. Error message from
database execution :
ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error
happened when loading data into SQL Data
Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Query
aborted-- the maximum reject threshold (0 rows) was reached while
reading from an external source: 1 rows rejected out of total 1 rows
processed.
(/f4ae80d1-4560-4af9-9e74-05de941725ac/Data.8665812f-fba1-407a-9e04-2ee5f3ca5a7e.txt)
Column ordinal: 27, Expected data type: VARCHAR(45) collate SQL_Latin1_General_CP1_CI_AS, Offending value:* ROW OF VALUES
* (Tokenization failed), Error: Not enough columns in this
line.,},],'.
{
"name": "CopyPipeline-move_incremental_test",
"properties": {
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "RelationalSource",
"query": "$$Text.Format('select * from [table] where InsertTime >= \\'{0:yyyy-MM-dd HH:mm}\\' AND InsertTime < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlDWSink",
"sqlWriterCleanupScript": "$$Text.Format('delete [schema].[table] where [InsertTime] >= \\'{0:yyyy-MM-dd HH:mm}\\' AND [InsertTime] <\\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)",
"allowPolyBase": true,
"polyBaseSettings": {
"rejectType": "Value",
"rejectValue": 0,
"useTypeDefault": true
},
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "column1:column1,column2:column2,column3:column3"
},
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": "StagingStorage-somename",
"path": "somepath"
}
},
"inputs": [
{
"name": "InputDataset-input"
}
],
"outputs": [
{
"name": "OutputDataset-output"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 10,
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "Activity-0-_Custom query_->[schema]_[table]"
}
],
"start": "2017-06-01T05:29:12.567Z",
"end": "2099-12-30T22:00:00Z",
"isPaused": false,
"hubName": "datafactory_hub",
"pipelineMode": "Scheduled"
}
}
It sounds like what you're doing is right, but the data is poorly formed (a common problem: non-UTF-8 encoding), so ADF can't parse the structure as you require. When I encounter this I often have to add a custom activity to the pipeline that cleans and prepares the data so it can then be used in a structured way by downstream activities. Unfortunately this is a big overhead in the development of the solution and will require you to write a C# class to deal with the data transformation.
Also remember that ADF has no compute of its own; it only invokes other services, so you'll also need an Azure Batch service to execute the compiled code.
Sadly there is no magic fix here. Azure is great at the Extract and Load of your perfectly structured data, but in the real world we need other services to do the Transform or Cleaning, meaning we need a pipeline that can ETL, or, as I prefer, ECTL.
Here's a link on creating ADF custom activities to get you started: https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Hope this helps.
I've been struggling with the same message, sort of, when importing from Azure SQL DB to Azure SQL Data Warehouse using Data Factory v2 with staging (which implies PolyBase). I've learned that PolyBase will fail with error messages related to incorrect data types, etc. The message I received is very similar to the one mentioned here, even though I'm not using PolyBase directly from SQL, but via Data Factory.
Anyway, the solution for me was to avoid NULL values in columns of decimal or numeric type, e.g. ISNULL(mynumericCol, 0) AS mynumericCol.
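Concretely, that just means wrapping the nullable decimal/numeric columns in the source query of the copy activity; in the pipeline JSON above this would be the select inside the $$Text.Format expression of the RelationalSource. A sketch only, with placeholder column names:
select
    column1,
    column2,
    ISNULL(mynumericCol, 0) as mynumericCol -- replace NULLs so PolyBase doesn't reject the row
from [table]
where InsertTime >= '{0:yyyy-MM-dd HH:mm}' and InsertTime < '{1:yyyy-MM-dd HH:mm}'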
I want to pass multiple individual records within a set window (tumbling, hopping, or sliding), without any aggregation, into a JavaScript UDF, like so:
Input data is:
{ "device":"A", "temp":20.0, "humidity":0.9, "param1": 83}
{ "device":"A", "temp":22.0, "humidity":0.9, "param1": 63}
{ "device":"B", "temp":15.0, "humidity":0.5, "param1": 13}
{ "device":"A", "temp":22.0, "humidity":0.5, "param1": 88}
{ "device":"A", "temp":22.0, "humidity":0.5, "param1": 88}
Pass records within a specified window as an object array:
function process_records(record_array) {
    // access individual records
    var record_one_device = record_array[0].device;
    var record_two_device = record_array[1].device;
    var record_three_device = record_array[2].device;
    ...
}
Thanks for any help!
Based on your requirement, I assume you could leverage the Collect aggregate function in Azure Stream Analytics. Here is my test; you can refer to it:
Input
[
{
"Make": "Honda",
"Time": "2015-01-01T00:00:01.0000000Z",
"Weight": 1000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:03.0000000Z",
"Weight": 3000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:12.0000000Z",
"Weight": 2000
},
{
"Make": "Honda",
"Time": "2015-01-01T00:00:52.0000000Z",
"Weight": 1000
}
]
With the following query, I could retrieve the data contained in temporal windows:
SELECT
Make,
System.TimeStamp AS Time,
Collect() AS records
FROM
Input TIMESTAMP BY Time
GROUP BY
Make,
HoppingWindow(second, 10,10)
Then you can call UDF.processRecords(Collect()) in your query. For more details, refer to the documentation on common Stream Analytics usage patterns, Azure Stream Analytics UDFs, and Stream Analytics window functions.
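Applied to your input, a minimal sketch could look like the following (assumptions: your JavaScript function is registered under the UDF alias processRecords, arrival time is used because the sample events carry no timestamp field, and a 10-second tumbling window stands in for whatever window you actually need):
SELECT
    device,
    System.TimeStamp AS WindowEnd,
    UDF.processRecords(Collect()) AS result -- all events for this device in the window, passed as an object array
FROM
    Input
GROUP BY
    device,
    TumblingWindow(second, 10)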
When parsing exported Application Insights telemetry from Blob storage, the request data looks something like this:
{
"request": [
{
"id": "3Pc0MZMBJgQ=",
"name": "POST Blah",
"count": 6,
"responseCode": 201,
"success": true,
"url": "https://example.com/api/blah",
"durationMetric": {
"value": 66359508.0,
"count": 6.0,
"min": 11059918.0,
"max": 11059918.0,
"stdDev": 0.0,
"sampledValue": 11059918.0
},
...
}
],
...
}
I am looking for the duration of the request, but I see that I am presented with a durationMetric object.
According to the documentation, the request[0].durationMetric.value field is described as
Time from request arriving to response. 1e7 == 1s
But when I query this using Analytics, the values don't match up to this field. They do, however, match up to the min, max and sampledValue fields.
Which field should I use? And what does that "value": 66359508.0 value represent in the above example?
It doesn't match because you're seeing sampled data (meaning this event represents sampled data from multiple requests). I'd recommend starting with https://azure.microsoft.com/en-us/documentation/articles/app-insights-sampling/ to understand how sampling works.
In this case, the "matching" value would come from durationMetric.sampledValue (notice that value == count * sampledValue; in your example, 6 * 11059918.0 = 66359508.0).
It's hard to compare exactly what you're seeing because you don't show the Kusto query you're using, but you do need to be aware of sampling when writing AI Analytics queries. See https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics-tour/#counting-sampled-data for more details on counting sampled data.
I'm running Stripe in test mode.
I've created a yearly billing plan for a 100 GBP amount, with a 7-day trial (directly in the Stripe dashboard).
However, to test the webhooks I've hardcoded the trial_end:
$trialEnd = new DateTime();
$trialEnd->setTimestamp(time()+120);
$user = Users::find($this->user()['user_id']);
$user->subscription($stripe_plan['stripe_plan'])->trialFor($trialEnd)->create($data['stripeToken'], [
'email' => $this->user()['email']
]);
$user->save();
Basically all goes well: in the Stripe dashboard the first invoice, for 0 GBP, is shown, and after one minute I get the "subscription will end in a minute" event. Afterwards, the subscription goes from the Trialing to the Active state.
In all the webhooks, and even in the response to the initial subscription creation, I get the trial end period instead of the subscription end.
How can I get the subscription_ends_at timestamp?
All webhook requests have the following timestamps:
{
"id": "evt_18baRrIzJLF7fe6PMDPYD0NM",
"object": "event",
"api_version": "2016-07-06",
"created": 1469558315,
"data": {
"object": {
"id": "sub_8tNBbqy0AmSk8p",
"object": "subscription",
"application_fee_percent": null,
"cancel_at_period_end": false,
"canceled_at": null,
"created": 1469558268,
"current_period_end": 1469558384,
"current_period_start": 1469558268,
"customer": "cus_8tNB1tWYw3Jw7L",
"discount": null,
"ended_at": null,
"livemode": false,
"metadata": {
},
"plan": {
"id": "yearly_200",
"object": "plan",
"amount": 20000,
"created": 1469545724,
"currency": "gbp",
"interval": "year",
"interval_count": 1,
"livemode": false,
"metadata": {
},
"name": "Full Club Membership - Pay Anually",
"statement_descriptor": "FULL MEMBERSHIP",
"trial_period_days": 7
},
"quantity": 1,
"start": 1469558268,
"status": "trialing",
"tax_percent": null,
"trial_end": 1469558384,
"trial_start": 1469558268
}
},
"livemode": false,
"pending_webhooks": 1,
"request": null,
"type": "customer.subscription.trial_will_end"
}
So if you look, trial_start and trial_end are the same as current_period_start and current_period_end.
I thought initially that if this is the current period, fine, but after the trial expires the current period shouldn't be the trial one.
Is there any method to get the subscription_ends_at field from the Stripe API? And also, after the trial period ends, shouldn't it send an invoice with the real amount?
Also, I created a subscription plan with no trial period. For that plan, after a client subscribed, I got the correct timestamps.
Thanks in advance!
It looks like you figured it out. Basically, the delay comes from the fact that when the timestamp for your trial expiration passes, the request to create a new invoice for that billing cycle gets added to a queue. Typically the queue creates the new invoice almost immediately, but it can sometimes take several minutes before triggering.
The first invoice will always have current_period_* timestamps that map to the trial_start/trial_end ones, whereas the second invoice (the one that shows up with the invoice.created event) will have the accurate timestamps for the billing period.
Oh, now I understand. I'll explain, maybe it will help someone. :D
Basically, if a subscription has a trial period, when you subscribe you get an invoice for 0. Then, even if you set the trial to expire in 2 minutes with the request, the first payment will occur in about 10 minutes. With that payment (if you have set a webhook URL) you will get a "type": "customer.subscription.updated" event which contains all the desired information. At that time you can update your subscription_ends_at.
I just hadn't waited 10 minutes to see whether the new invoice would be triggered; instead I created -> removed -> recreated -> removed subscriptions, and so on, for 4 hours with different tests.