Kafka to Azure to Power BI

Expected functionality: send data from Kafka to Azure ADLS and to Power BI for visualization.
I am currently sending hard-coded text from my Kafka program (Java) to an Azure Event Hub. When Event Hubs Capture writes the Event Hub content to ADLS Gen1, an .avro file is created in ADLS. Since .avro files aren't supported in Power BI, I'm unable to proceed.
I need a solution to send the hard-coded text as a .txt or .json file instead of .avro so that it will be easy for me to visualize in Power BI. Otherwise, I need a solution to convert the .avro file to .txt or .json and save it in ADLS.
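A minimal sketch of the conversion route in Python, assuming the Capture .avro file has already been downloaded from ADLS and that fastavro is installed; the file names are illustrative:

import json
from fastavro import reader

payloads = []
with open("capture.avro", "rb") as f:  # local copy of the Event Hubs Capture file
    for record in reader(f):
        # Event Hubs Capture stores the original event payload in the "Body" field as bytes
        payloads.append(record["Body"].decode("utf-8"))

with open("capture.json", "w") as f:  # upload this back to ADLS afterwards
    json.dump(payloads, f, indent=2)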

So far I don't have comment privileges, so I will ask it here.
Can you directly create a … with parameters and then use Kafka curl commands to call the REST API and push data? I tried pushing data that way and was able to do it, but since I haven't worked with Kafka, I can't say for sure.

Related

Read Shapefiles in Azure Data Factory

I want to find out if it's possible to read shapefiles in Azure Data Factory.
An example would be to store the file in a container, and then manipulate the data of the shapefile with a data flow.
Please let me know.
Thank you.
I've tried loading the file into a container and reading it as an Azure Blob, but that doesn't seem to be the correct way of reading a shapefile on this platform.
I'm not sure if there is support for shapefiles in ADF...

MS PowerPlatform: Use data inside a Azure data lake via a Flow

Edit: Still no closer to an answer. Help appreciated...
My company has some simple lists with data I need, and we already have these lists in .parquet format inside a data lake. (Edit: Gen2)
I'm supposed to build a small PowerApp that uses some of the information inside these lists, but I can't figure out the correct way to get their content via a Flow.
There's a connector "Azure Blob Storage: Get Blob Content" which sounds like the right one and indeed outputs a cryptic content string. But how do I get from this to an actually readable table where I can use the items? Or is this the wrong connector for this?
(I'm very new to all this Microsoft stuff and don't really know anything about how this data lake is set up. Not sure whether this helps, but the following Python script works and is exactly what I need to do via a Flow so it can run automatically every day:)
from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobClient

# Download the parquet blob and load it straight into a DataFrame
blob = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.parquet")
df = pd.read_parquet(BytesIO(blob.download_blob().readall()))
Thanks for any help :)
To clarify: by no means do I have to use this exact process. If you tell me "the standard way is to build a Python REST API on top of the data lake that answers this", that's perfectly fine. I just need to know the easiest and most standard way to access data inside a data lake.
If the Azure Data Lake is Gen1, then you don't need Power Automate to access it. You can go directly from a canvas app PowerApp using the Azure Data Lake connector.
https://learn.microsoft.com/en-us/connectors/azuredatalake/
You can call the ReadFile operation to read the contents of your file. This returns a binary, which you can then convert to a string and work with from there.
Since your data is in ADLS Gen2, I don't think you've got a direct connector that can transform Parquet data in Power Automate for this.
I would not bother ingesting and transforming parquet files in Power Automate, as you're bordering on a requirement for an ETL tool at this stage.
I would look at transforming the file(s) using Azure Data Factory pipelines and maybe dumping them into a database of your choice. Then Power Automate can pick it up from there, as it has connectors to most databases. You could also convert them to CSV files if a database is overkill for your setup (see the sketch after the links below).
I would use Power Automate as my orchestration layer: it calls the Data Factory pipeline > waits for it to complete > picks it up from there.
https://learn.microsoft.com/en-us/azure/data-factory/format-parquet
https://learn.microsoft.com/en-au/connectors/azuredatafactory/#actions
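If a quick script is enough before wiring up Data Factory, here is a minimal sketch of the CSV conversion in Python, reusing the connection string and the illustrative container/file names from the question's script:

from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobClient

# Read the parquet blob, then write it back out as a CSV blob next to it
src = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.parquet")
df = pd.read_parquet(BytesIO(src.download_blob().readall()))

dst = BlobClient.from_connection_string(MY_CONNECTION_STRING, "myContainer", "myFile.csv")
dst.upload_blob(df.to_csv(index=False), overwrite=True)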

How can I decide if I should use the Power BI API to push data into my streaming dataset, or Azure Stream Analytics?

I am very new to Azure. I need to create a Power BI dashboard to visualize some data produced by a sensor. The dashboard needs to be updated in "almost" real time. I have identified that I need a push dataset, as I want to visualize some historic data on a line chart. However, from the architecture point of view, I could use the Power BI REST APIs (which would be completely fine in my case, as we process the data with a Python app and I could use that to call Power BI) or Azure Stream Analytics (which could also work: I could dump the data to Azure Blob storage from the Python app and then stream it).
Can you tell me generally speaking, what are the advantages/disadvantages of the two approaches?
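For reference, a minimal sketch of the REST API option from Python, assuming a push dataset already exists; DATASET_ID, TABLE_NAME, ACCESS_TOKEN and the row schema are placeholders, not part of the question:

import requests

# Power BI push datasets accept rows via POST .../datasets/{id}/tables/{table}/rows
url = (
    "https://api.powerbi.com/v1.0/myorg/datasets/"
    f"{DATASET_ID}/tables/{TABLE_NAME}/rows"
)
rows = {"rows": [{"timestamp": "2023-01-01T00:00:00Z", "value": 42.0}]}  # illustrative schema

resp = requests.post(url, json=rows, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()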
Azure Stream Analytics lets you define multiple sources and multiple targets, and Power BI and Blob storage can both be among those targets. At the same time, you can apply windowing functions to the data as it comes in. It also gives you a visual way of managing your pipeline, including the windowing functions.
In your case you are essentially replicating the incoming data, first to Blob and then to Power BI. But if you have a use case for applying a windowing function (one minute or so) as your data comes in from multiple sources, e.g. more than one sensor, or a sensor and another source, you would have to fiddle around a lot to get it working manually, whereas in Stream Analytics you can do it easily.
The following article highlights some of the pros and cons of using Azure Stream Analytics:
https://www.axonize.com/blog/iot-technology/the-advantages-and-disadvantages-of-using-azure-stream-analytics-for-iot-applications/
If possible, I would recommend streaming the data to IoT Hub first, and then ASA can pick it up and render it in Power BI. That will give you better latency than streaming data from Blob to ASA and then to Power BI. It is the recommended IoT pattern for remote monitoring, predictive maintenance, etc., and gives you longer-term options to add a lot of logic to the real-time pipelines (ML scoring, windowing, custom code, etc.).
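A minimal sketch of that route, sending the sensor readings from the Python app to IoT Hub with the azure-iot-device SDK; the connection string and payload fields are placeholders, not from the question:

import json
from azure.iot.device import IoTHubDeviceClient, Message

CONNECTION_STRING = "<device-connection-string>"  # from the IoT Hub device registry

client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
client.connect()

# One telemetry message per sensor reading; ASA can then pick these up from IoT Hub
reading = {"timestamp": "2023-01-01T00:00:00Z", "value": 42.0}
client.send_message(Message(json.dumps(reading)))

client.shutdown()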

Azure IoT + Stream Analytics with blob data

We are currently evaluating whether or not we should port our business logic to Azure IoT Hub.
So far this looks promising, but I have a question about Stream Analytics.
Let's say we have IoT devices in the field that send their data as CSV files.
Currently our back end has huge problems going through this data, analysing it and injecting it into our database systems with decent performance.
I want to try to use Azure for that.
If I use IoT Hub, I want to send this CSV format to the hub. We assume that the CSV format is fixed, so I can't just port it to the D2C communication format.
Can the Stream Analytics service work with this CSV format, and can it put the embedded data into specific tables in Table Storage?
This would be really important. Are there any examples of that out there that might clear things up for me?
I guess Azure has its libraries for handling CSV files. What if we use no CSV format but instead another industry-standard format that Azure might not know about?
Hope you can help me here.
Azure Stream Analytics (ASA) does support CSV as input:
Event serialization format: The serialization format (JSON, CSV, or Avro) of the incoming data stream.
And yes, it also supports Azure Table Storage as output. See the docs.
When you create an ASA job you can upload your CSV file to test the query, so you can easily try it out if you create a sample file.
They have some example CSV data on GitHub.
I suggest you create a small proof of concept based on your sample data.
If, for some reason (like the data being in an unsupported format), ASA does not fit, you can always retrieve the IoT Hub data using different techniques, for example using an EventProcessorHost. This way you have complete control over the data and you can output it however you want, and it will still be scalable (but of course this depends on the data destination as well). See this post for a rough idea. It seems a bit outdated, but the concept is still valid today.
The official docs about possible other options to read data from the EventHub can be found here
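For the "retrieve the data yourself" route, a minimal sketch using the azure-eventhub Python SDK's EventHubConsumerClient against the IoT Hub's Event Hub-compatible endpoint (the modern equivalent of an EventProcessorHost); the connection string is a placeholder and the CSV handling is left as a stub:

from azure.eventhub import EventHubConsumerClient

CONNECTION_STR = "<event-hub-compatible-connection-string>"  # from the IoT Hub built-in endpoints

def on_event(partition_context, event):
    # Each event body carries one CSV payload; parse it and write to Table Storage here
    print(partition_context.partition_id, event.body_as_str())

client = EventHubConsumerClient.from_connection_string(CONNECTION_STR, consumer_group="$Default")
with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = read from the beginning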

Input data from multiple sources Azure streaming job

I need to send data from two devices to my Azure IoT Hub.
The two devices transmit data in different JSON formats. The common column between them is TimeStamp.
Now I need to consume and combine these two inputs and output the data to Power BI.
Can anybody suggest an approach or a link to refer to?
Prateek Raina
To implement such a scenario you might want to use Azure Stream Analytics to reconcile the two different data types. The query language for ASA is SQL-like and pretty straightforward; it shouldn't be too hard to do this, considering your data sources are both JSON.
Note that you can easily set up Power BI as the output for Stream Analytics as well.
I suggest you send them separately. But if you want a single data table, you can combine them with a UNION (see the sketch below).
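Not the ASA query itself, but a small pandas sketch of the same idea, with two illustrative payload shapes that share only the TimeStamp column:

import pandas as pd

device_a = pd.DataFrame([{"TimeStamp": "2023-01-01T00:00:00Z", "temperature": 21.5}])
device_b = pd.DataFrame([{"TimeStamp": "2023-01-01T00:00:00Z", "humidity": 40.2}])

# UNION-style: stack both streams, keeping every column (gaps become NaN)
combined = pd.concat([device_a, device_b], ignore_index=True, sort=False)

# JOIN-style: merge the two streams on the common TimeStamp column
joined = device_a.merge(device_b, on="TimeStamp", how="outer")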
