My Azure Stream Analytics job does not detect any input events if I use reference data in the query. When I use only streaming data, it works well.
Here is my query:
SELECT v.localization as Station, v.lonn as Station_Longitude, v.latt as Station_Latitude, d.lat as My_Latitude, d.lon as My_Longitude
INTO [closest-station]
FROM eventhub d
CROSS JOIN [stations] v
WHERE ST_DISTANCE(CreatePoint(d.lat, d.lon), CreatePoint(v.latt, v.lonn) ) < 300
I used both Event Hub and blob as the input and the result was the same: it works only without reference data.
Before you ask:
When I test the query with sample reference data (uploading the exact same file that is stored in the reference data location), it returns the expected values.
I've tested both inputs and the tests completed successfully.
The data comes from a Logic App that copies it from Dropbox to the Event Hub or storage account (I've tested both scenarios) used as inputs in Azure Stream Analytics. Even though I can see this ran successfully, still no input events appear in ASA.
The idea is to get the coordinates of the stations that are closer than 300 m to my location.
Solved: you have to explicitly specify the reference file in the reference data input's path pattern. Specifying only the container doesn't work, even if there is only one file inside.
Otherwise the Stream Analytics job will wait indefinitely for the blob to become available,
as described here: Use reference data for lookups in Stream Analytics
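For example, if the container is station-data and the only blob inside is stations.json (placeholder names), the reference data input needs the file in its path pattern:
Container: station-data
Path pattern: stations.json
Leaving the path pattern blank and pointing at the container alone is what makes the job wait forever.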
Related
I'm trying to define the architecture of my project using Azure IoT Edge and the Azure SQL Edge module (so I can have storage and use ML), but I'm stuck on the streaming part.
I'm getting data from a factory that has several machines, and each machine has several different sensors that send data at different times. Each variable I get is identified only by an ID. I'm receiving something like this:
Timestamp               variableId   value
07/04/2022 12:34:7.89   abc123       3
07/04/2022 12:34:8      ert456       45
07/04/2022 12:34:8.59   abc123       5
07/04/2022 12:34:9      uio786       12.67
I want to feed variables abc123 and uio786 into one ML model, and ert456 and uio786 into another, without writing a specific SELECT for each ID, but instead having this mapping defined somewhere. A dynamic select...
Is this possible?
According to Microsoft, you can use a data file as reference data for an Edge Stream Analytics job, so you could put the data needed for routing in there (see the sketch below).
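A minimal sketch of that idea, assuming a stream input named [sensor-stream] and a reference data input [variable-routing] with variableId and modelName columns (all names here are hypothetical):
SELECT s.Timestamp, s.variableId, s.value, r.modelName
INTO [model-input]
FROM [sensor-stream] s
JOIN [variable-routing] r ON s.variableId = r.variableId
Note that a join against reference data doesn't need a DATEDIFF time bound, and the output itself stays static; each downstream model would still filter on modelName.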
You can't do this in SQL Edge Streaming.
I have a requirement where I need to ingest continuous/streaming data (JSON format) from Event Hub into Azure Data Lake.
I want to follow the layered approach (raw, clean, prepared) and finally store the data in a Delta table.
My question is about the raw layer.
Out of the two approaches below, which one do you recommend?
Event hub -> RawLayer(Raw Json Format) -> cleanLayer (delta table) -> preparedLayer(delta table)
Event hub -> RawLayer(delta table) -> cleanLayer (delta table) -> preparedLayer(delta table)
So, should I store the raw JSON format in the raw layer, or is it suggested to create a Delta table in the raw layer as well?
Regards,
I will let others debate the theoretical approaches.
From a practical standpoint, here are the most common ways to write to disk from Event Hub:
Event Hub Capture dumps files to a storage account directly from an event hub, but the format is Avro. This is not practical, but it is the "rawest" form your records can take. If I remember correctly, your payload is encoded in base64 and embedded in a common schema. They have guidance on how to extract your data in Spark (see the sketch after this list).
Azure Stream Analytics can output JSON or Parquet. In both cases, events actually go through a deserialization/serialization process that can't be bypassed. This means the output will look raw (at least in the JSON case) but won't really be. In this scenario ASA should be seen as a streaming ETL/ELT. Don't use it (and pay for it) if you're not actively using its features (transformation, cleaning, enrichment...). Note that ASA doesn't support Delta Lake as an output yet, so you will still need some post-processing to ingest the generated files.
Azure Functions, using the proper bindings; but as with ASA, this requires a deserialization step that doesn't really qualify as "raw", unless you take an approach similar to what's done in EH Capture, at which point you should just use Capture.
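For the Capture route, a rough extraction sketch in Spark SQL, assuming built-in Avro support is available and using a placeholder path (EnqueuedTimeUtc and Body are fields of the Capture schema):
-- Register the Capture Avro files as a view (path is a placeholder)
CREATE TEMPORARY VIEW capture
USING avro
OPTIONS (path "abfss://capture@youraccount.dfs.core.windows.net/*/*.avro");
-- Body holds the original event bytes; casting to string recovers the JSON payload
SELECT EnqueuedTimeUtc, CAST(Body AS STRING) AS payload
FROM capture;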
I am trying to get aggregate data sent to different table storage outputs based on a column name in select query. I am not sure if this is possible with stream analytics.
I've looked up the stream analytics docs and different forums, so far haven't found any leads. I am looking for something like
SELECT tableName, COUNT(DISTINCT records)
INTO tableName
FROM inputStream
I hope this makes it clear what I'm trying to achieve: I want to insert aggregated data into table storage (defined as outputs), grabbing the output/table storage name from the SELECT query. Any idea how that could be done?
I am trying to get aggregate data sent to different table storage outputs based on a column name in select query.
If I don't misunderstand your requirement, you want a CASE...WHEN or IF...ELSE structure in the ASA SQL so that you can send data to different table outputs based on some conditions. If so, I'm afraid that cannot be implemented so far. Every destination in ASA has to be specific; dynamic output is not supported in ASA.
However, as a workaround, you could use an Azure Function as the output. You could pass the columns to the Azure Function, then do the switching in code inside the function to save the data to different table storage destinations. For more details, please refer to this official doc: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-with-azure-functions
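If the set of table names is known up front, there is also a static alternative inside ASA itself: one SELECT per output. A sketch, where [table-a-output] and [table-b-output] are hypothetical output aliases and records is a placeholder column:
SELECT tableName, COUNT(DISTINCT records) AS recordCount
INTO [table-a-output]
FROM inputStream
WHERE tableName = 'TableA'
GROUP BY tableName, TumblingWindow(minute, 1)

SELECT tableName, COUNT(DISTINCT records) AS recordCount
INTO [table-b-output]
FROM inputStream
WHERE tableName = 'TableB'
GROUP BY tableName, TumblingWindow(minute, 1)
This is not dynamic routing, just an enumeration of known destinations.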
I have the following scenario:
Mobile app produces events that are sent to Event Hub, which is the input stream source for a Stream Analytics query. From there they are passed through a sequential flow of queries that splits the stream into two streams based on criteria, evaluates other conditions, and decides whether or not to let the event keep flowing through the pipeline (if it doesn't, it is simply discarded). You could classify what we are doing as noise reduction/event filtering. Basically, if A just happened, don't let A happen again unless B & C happened or X time passes. At the end of the query gauntlet the streams are merged again and the "selected" events are propagated as "chosen" outputs.
My problem is that I need the ability to compare the current event to the previous "chosen" event (not just the previous input event), so in essence I need to join my input stream to my output stream. I have tried various ways to do this and so far none have worked; I know that other CEP engines support this concept. My queries are mostly all defined as temporary result sets inside of a WITH statement (that's where my initial input stream is pulled into the first query, and each following query depends on the one above it), but I see no way to either join my input to my output or to join my input to another temporary result set that is further down in the chain. It appears that JOIN only supports inputs?
For the moment I am attempting to work around this limitation with something I really don't want to do in production: I have an output defined going to an Azure Queue, then an Azure Function triggered by events on that queue wakes up and posts them to a different Event Hub that is mapped as a recirculation feed back into my queries, which I can join to. I'm still wiring all of that up, so I'm not 100% sure it will work, but I'm thinking there has to be a better option for this relatively common pattern?
The WITH statement is indeed the right way to get a previous input joined with some other data.
You may need to combine it with the LAG operator, which gets the previous event in a data stream.
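A minimal sketch of that combination, where [events], EventType, and the IsChosen flag are hypothetical stand-ins for your own stream and conditions; the WHEN clause restricts LAG to the previous event that satisfied the condition, which approximates "the previous chosen event":
WITH Tagged AS (
    SELECT
        *,
        LAG(EventType) OVER (LIMIT DURATION(hour, 1) WHEN IsChosen = 1) AS PrevChosenType
    FROM [events]
)
SELECT *
INTO [chosen-output]
FROM Tagged
WHERE PrevChosenType IS NULL OR EventType != PrevChosenType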
Let us know if it works for you.
Thanks,
JS - Azure Stream Analytics
AFAIK, the stream analytics job supports two distinct data input types: data stream inputs and reference data inputs. Per my understanding, you could leverage Reference data to perform a lookup or to correlate with your data stream. For more details, you could refer to the following tutorials:
Data input types: Data stream and reference data
Configuring reference data
Tips on refreshing your reference data
Reference Data JOIN (Azure Stream Analytics)
I have one Event Hub with 2 partitions. I want to aggregate my data per minute and save it to a database. I am using IEventProcessor to read events from the Event Hub.
I am able to save the data to the database as-is, but when I aggregate the data, I get 2 entries per minute instead of 1. I think the reason is that IEventProcessor runs twice, i.e. once per partition in the Event Hub.
Is there any way I can aggregate the streaming data per minute while reading from the Event Hub and then save it to the database? (I can't use Stream Analytics, since my data is in protobuf format.)
You can use the Azure IoTHub React Java and Scala API; it provides a merged reactive stream with events from all Event Hub partitions.
From your perspective you'll see only one stream of data, regardless of the number of partitions in EventHub, and you can select a subset of partitions too if you need.
These samples show how the API works; it should make your task very simple. You need to define your "Sink", which is going to be a method writing events to a database, and link it to the provided "Source", something like:
// Assumed imports, following the iothub-react samples (package names may vary by version):
import akka.stream.scaladsl.Sink
import com.microsoft.azure.iot.iothubreact.MessageFromDevice
import com.microsoft.azure.iot.iothubreact.scaladsl._
import com.microsoft.azure.iot.iothubreact.ResumeOnError._ // provides the implicit materializer

val eventHubRecords = IoTHub().source(java.time.Instant.now()) // one merged stream across partitions, from now
val myDatabase = Sink.foreach[MessageFromDevice] {
  m => MyDB.writeRecord(m) // MyDB is your own persistence code
}
eventHubRecords.to(myDatabase).run() // wire the source to the sink and start the stream
Here are the configuration settings; checkpointing supports Cassandra and Azure Blob.
Note: the project is named after Azure IoT, but you can use it for Event Hub too. Let me know if you have any questions.
You can use Stream Analytics and its GROUP BY clause. As long as all the rows are unique, it won't summarize them. You can then push that output onto another Event Hub for your IEventProcessor to handle, or write it directly to storage.
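A sketch of that per-minute aggregation, assuming an input alias [eventhub-input] and placeholder columns; every column you group by (besides the window) stays distinct in the output:
SELECT deviceId, System.Timestamp() AS windowEnd, AVG(value) AS avgValue
INTO [eventhub-output]
FROM [eventhub-input]
GROUP BY deviceId, TumblingWindow(minute, 1)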