Azure Stream Analytics Job cannot detect either input table or output table

I'm new to Azure Stream Analytics jobs, and I want to use reference data from an Azure SQL database and load it into Power BI as streaming data.
I set up the storage account when configuring the SQL input, and I tested the Power BI output, which is also fine with no errors.
I tested both the input and output connections; both connect successfully, and I can see the input data in Input preview.
But when I try to compose a query to test it out, the query cannot detect either the input table or the output table.
The output table icon is also greyed out.
Error message: Query must refer to at least one data stream input.
Could you help me?
Thank you!!

The portal will not allow you to test the query while it has syntax errors. You will need to correct the syntax (indicated by yellow squiggles) before testing.
Here is a sample test query without any syntax errors:
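The aliases in the query below are placeholders ([YourInputAlias] and [YourOutputAlias]); substitute the input and output aliases defined on your own job. A simple pass-through query is enough for the editor to validate:

    -- Minimal pass-through test query; replace the aliases with your own.
    SELECT
        *
    INTO
        [YourOutputAlias]
    FROM
        [YourInputAlias]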

Stream Analytics requires at least one input coming from one of these three streaming sources: Event Hubs, IoT Hub, or Blob Storage/ADLS. We don't support SQL as a streaming source at this time.
Reference data is meant to augment the stream of data.
From your scenario, I see you want to get data from SQL into Power BI directly. For that, you can connect Power BI directly to your SQL source.
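For illustration, if a streaming input is added later, a query that augments the stream with the SQL reference data would look roughly like the sketch below; the aliases and column names ([EventHubInput], [SqlRefInput], [PowerBIOutput], deviceId and so on) are hypothetical:

    -- Join each streaming event to its reference row and push the result to Power BI.
    SELECT
        s.deviceId,
        s.temperature,
        r.deviceName
    INTO
        [PowerBIOutput]
    FROM
        [EventHubInput] s
    JOIN
        [SqlRefInput] r
        ON s.deviceId = r.deviceId

Unlike a stream-to-stream join, a join against reference data needs no DATEDIFF window, because the reference input is treated as a slowly changing lookup table.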
JS (Azure Stream Analytics)

Related

Stream Analytics Query UI portal: Unable to connect to input source at the moment

I followed the tutorial linked below and created an input:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-tutorial-visualize-anomalies
The Test button confirms that the connection is OK.
However, on the Query page, it shows error below:
Unable to connect to input source at the moment. Please check if the
input source is available and if it has not hit connection limits.
I can see that the incoming messages have already arrived in the event hub.
I tried using both a connection string and managed identity (MI) for the input, but I am still getting the error.
I can send messages to and receive them from the event hub by following the link below:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dotnet-standard-getstarted-send
To test your query against a specific time range of incoming events, select Select time range.
You may check out the MS Q&A thread addressing a similar issue.
For more details, refer to Test an Azure Stream Analytics job in the portal.

How to implement Change Data Capture (CDC) using Apache Spark and Kafka?

I am using spark-sql-2.4.1v with Java 1.8, along with spark-sql-kafka-0-10_2.11_2.4.3 and kafka-clients_0.10.0.0.
I need to join streaming data with metadata that is stored in RDS,
but the RDS metadata can be added to or changed.
If I read and load the RDS table data once in the application, it becomes stale for joining with the streaming data.
I understand that I need to use Change Data Capture (CDC).
How can I implement Change Data Capture (CDC) in my scenario?
Any clues or a sample way to implement it?
Thanks a lot.
You can stream a database into Kafka so that the contents of a table plus every subsequent change is available on a Kafka topic. From here it can be used in stream processing.
You can do CDC in two different ways:
Query-based: poll the database for changes, using Kafka Connect JDBC Source (see the polling sketch after this list)
Log-based: extract changes from the database's transaction log using e.g. Debezium
For more details and examples see http://rmoff.dev/ksny19-no-more-silos
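To make the query-based option concrete: in timestamp+incrementing mode the JDBC source connector effectively polls the source table with a query along the lines of the sketch below, committing the highest timestamp/id it has seen as its offset (the table and column names are hypothetical):

    -- Poll for rows added or changed since the last committed offset.
    -- updated_at is a last-modified timestamp, id an auto-incrementing key.
    SELECT *
    FROM metadata_table
    WHERE updated_at > ?                    -- modified after the last offset
       OR (updated_at = ? AND id > ?)       -- tie-break on the incrementing column
    ORDER BY updated_at, id

The log-based option (Debezium) avoids polling by reading the database's transaction log, so it also captures deletes and intermediate updates that a polling query can miss.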

From IOT hub to multiple tables in an Azure SQL database

I have an IOT hub with devices that push their sensor data to it, to be stored in a SQL database. This seems to be quite easy to do by means of a Stream Analytics job.
However, the tricky part is as follows. The data I'm pushing is not normalized, and since I'm using a SQL database I would like to structure it among multiple tables. This does not seem to be an easy task with Stream Analytics.
This is an example of the payload I'm pushing to the IOT hub:
{
    "timestamp" : "2019-01-10 12:00",
    "section" : 1,
    "measurements" : {
        "temperature" : 28.7,
        "height" : 280,
        "ec" : 6.8
    },
    "pictures" : [
        "101_a.jpg",
        "102_b.jpg",
        "103_c.jpg"
    ]
}
My database has a table Measurement, MeasurementItem and Picture. I would like to store the timestamp and section in a Measurement record, the temperature, height and ec in a MeasurementItem record and the pictures in the Picture table.
Filling one table is easy, but to fill the second table I need the generated auto-increment ID of the previous record to keep the relation intact.
Is that actually possible with Stream Analytics, and if not, how should I do it?
You shouldn't try this with Stream Analytics (SA), for several reasons. It isn't designed for workloads like this; SA can only work this performantly because it does nothing more than send data to one or more sinks depending on the input data.
I would suggest passing the data to a component that can apply logic on the output side. There are a few options for this; two examples are:
an Azure Function (via an Event Hub trigger pointing to the IoT Hub built-in, Event Hubs-compatible endpoint, as described here)
an Event Grid based trigger on a storage account you write the IoT data to (so again an Azure Function, but triggered by an event from the storage account)
These solutions come at the price that every incoming data package invokes a piece of logic you have to pay for additionally. Be aware that Azure Functions has billing options that do not depend on the number of calls but provide the logic in a more App Service-like model.
If you have huge amounts of data to process, you might instead consider an architecture using a Data Lake Storage account in combination with Data Lake Analytics. The latter can also collect, aggregate and distribute your incoming data into different data stores.
I ended up with an Azure Function with an IoT Hub trigger. The function uses EF Core to store the JSON messages in the SQL database, spread over multiple tables. I was a bit reluctant about this approach, as it introduces extra logic that I expected to pay extra for.
The opposite turned out to be true. For Azure Functions, the first 400,000 GB-s of execution and 1,000,000 executions per month are free. Moreover, this solution gives extra flexibility and control, because the single-table limitation no longer applies.
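Whether EF Core or hand-written SQL issues the statements, the multi-table write boils down to the pattern sketched below (table and column names are hypothetical, derived from the payload above): insert the parent row, capture its generated identity value, and use it as the foreign key for the child rows.

    -- Hypothetical normalized schema for the payload shown earlier.
    CREATE TABLE Measurement (
        MeasurementId INT IDENTITY(1,1) PRIMARY KEY,
        [Timestamp]   DATETIME2 NOT NULL,
        Section       INT NOT NULL
    );

    CREATE TABLE MeasurementItem (
        MeasurementItemId INT IDENTITY(1,1) PRIMARY KEY,
        MeasurementId     INT NOT NULL REFERENCES Measurement(MeasurementId),
        Temperature       DECIMAL(5,2),
        Height            INT,
        Ec                DECIMAL(5,2)
    );

    CREATE TABLE Picture (
        PictureId     INT IDENTITY(1,1) PRIMARY KEY,
        MeasurementId INT NOT NULL REFERENCES Measurement(MeasurementId),
        FileName      NVARCHAR(255) NOT NULL
    );

    -- Per incoming message: insert the parent, capture the generated key,
    -- then insert the children with that key.
    DECLARE @MeasurementId INT;

    INSERT INTO Measurement ([Timestamp], Section)
    VALUES ('2019-01-10 12:00', 1);

    SET @MeasurementId = SCOPE_IDENTITY();

    INSERT INTO MeasurementItem (MeasurementId, Temperature, Height, Ec)
    VALUES (@MeasurementId, 28.7, 280, 6.8);

    INSERT INTO Picture (MeasurementId, FileName)
    VALUES (@MeasurementId, '101_a.jpg'),
           (@MeasurementId, '102_b.jpg'),
           (@MeasurementId, '103_c.jpg');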

Azure Data Factory Connection to Google Big Query Timeout Issues

I'm trying to grab Firebase Analytics data from Google BigQuery with Azure Data Factory.
The connection to BigQuery works, but I quite often run into timeout issues when running a (simple) query: 3 out of 5 runs hit a timeout. When no timeout occurs, I receive the data as expected.
Can anyone confirm this issue, or does anyone have an idea what the reason for it might be?
Thanks & best,
Michael
Timeout issues can happen in Azure Data Factory from time to time. They are affected by the source dataset, the sink dataset, the network, query performance and other factors. After all, your connectors are not Azure services.
You could try setting the timeout parameter as described in the JSON reference, or configure retries to deal with timeout issues.
If your sample data is so simple that it shouldn't time out, you could submit feedback to ask the ADF team about your concern.

Redshift to Azure Data Warehouse CopyActivity Issue - HybridDeliveryException

Facts:
- I am running an Azure Data Factory pipeline between AWS Redshift -> Azure Data Warehouse (since the Power BI Online Service doesn't support Redshift as of this post's date).
- I am using PolyBase for the copy, since I need to skip a few problematic rows.
I use the "rejectValue" key and give it an integer.
- I made two activity runs and got different errors on each run.
Issue:
Run no:1 Error
Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BooleanWritable,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.BooleanWritable,},],'.
Run No:2 Error
Database operation failed. Error message from database execution : ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message= ,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message= ,},],'.
Below is the reply from Azure Data Factory product team:
Like Alexandre mentioned, error #1 means you have a text-valued column on the Redshift source where the corresponding column in SQL DW has type bit. You should be able to resolve the error by making the two column types compatible with each other.
Error #2 is another error from Polybase deserialization. Unfortunately the error message is not clear enough to find out the root cause. However, recently the product team has done some change on the staging format for Polybase load so you should no longer see such error. Do you have the Azure Data Factory runID for the failed job? The product team could take a look.
Power BI Online Service does support Redshift, through ODBC and an On-Premises Data Gateway (https://powerbi.microsoft.com/en-us/blog/on-premises-data-gateway-august-update/). You can install the latter on a Windows VM in Azure or AWS.
Redshift ODBC Drivers are here: http://docs.aws.amazon.com/redshift/latest/mgmt/install-odbc-driver-windows.html
Otherwise, your error indicates that one column of your SQL DW table does not have the expected data type (you probably have a BIT where a CHAR or VARCHAR should be).
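As a sketch of the fix for error #1 (object names are hypothetical): either change the Redshift column to a true boolean, or widen the SQL DW column so that the text values coming from Redshift fit, for example:

    -- Relax the mismatched BIT column to a string type so the PolyBase load
    -- no longer has to cast text to boolean (hypothetical table/column names).
    ALTER TABLE dbo.StagingOrders
    ALTER COLUMN IsActive VARCHAR(10) NULL;

If ALTER COLUMN is not available on your instance, recreating the table with the corrected column type (for example via CTAS) achieves the same result.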
