How to process telemetry JSON messages in Azure Data Lake Gen2?

I have simulated devices sending messages to IoT Hub, which are routed to blob storage, and from there I am copying the data (encoded as JSON) to Azure Data Lake Gen2 with a pipeline built in Azure Data Factory.
How do I convert these JSON output files to CSV so the data lake engine can process them? Or can I process all the incoming JSON telemetry directly in Azure Data Lake?

There are three official built-in extractors that let you analyze data contained in CSV, TSV, or text files.
Microsoft has also released additional sample extractors in its Azure GitHub repo that handle XML, JSON, and Avro files. I have used the JSON extractor in production and found it stable and useful.
The JSON extractor treats the entire input file as a single JSON document (if you have one JSON document per line, a different approach is needed). The columns you list are extracted from that document; in this case I'm extracting the _id and Revision properties. Note that either of these could itself be a nested object, in which case you can use the JSON UDFs for further processing. The OUTPUT statement at the end of the script writes the extracted rows back out as CSV.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Define the schema of the file; every column you want must be mapped.
@myRecords =
    EXTRACT
        _id string,
        Revision string
    FROM @"sampledata/json/{*}.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

// Write the extracted rows back out as CSV for downstream processing.
OUTPUT @myRecords
TO @"/output/records.csv"
USING Outputters.Csv(outputHeader : true);

Related

Ingest multiple JSON files from ADLS Gen2 to ADX through ADF

I'm new to both ADF (Azure Data Factory) and ADX (Azure Data Explorer).
I have multiple JSON files in ADLS at different folder levels, and I need to ingest all of the files into ADX.
For example:
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/28/03/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/28/04/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/01/29/03/demo-02-2021-01-28-03-30.json
UserData/Overground/UsersFolder/project1/main/data/json/demo-02/2021/02/23/03/demo-02-2021-01-28-03-30.json
I'm wondering whether I need to create as many tables in ADX as there are JSON files in ADLS. If I have 1,000 JSON files in ADLS, do I need to create 1,000 tables in ADX to copy the data from ADLS to ADX?
And how can I copy the data from ADLS to ADX in ADF?
I appreciate your help in advance.
To copy from multiple folders, you can use the Additional settings of the Copy activity source; for more details, follow the official documentation. You may need to use wildcards to match multiple files. Note that you do not need one ADX table per file: as long as the files share the same schema, they can all be ingested into a single table.
Additional settings:
recursive: Indicates whether the data is read recursively from the subfolders or only from the specified folder. Note that when recursive is set to true and the sink is a file-based store, an empty folder or subfolder isn't copied or created at the sink.
Allowed values are true (default) and false.
This property doesn't apply when you configure fileListPath.
Also refer to Azure Data Explorer as a sink:
Azure Data Explorer is supported as a source, where data is copied from Azure Data Explorer to any supported data store, and as a sink, where data is copied from any supported data store to Azure Data Explorer. See Integrate Azure Data Explorer with Azure Data Factory.
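If you prefer to script the ingestion instead of (or alongside) the ADF Copy activity, a rough sketch with the azure-kusto-ingest Python SDK is below. This is an alternative approach, not part of the guidance above; the cluster, database, table, and blob URLs are placeholders, and the exact class names differ a little between SDK versions (older releases call the client KustoIngestClient).

# Rough sketch: queue ingestion of JSON blobs into a single ADX table.
# All names and URLs below are placeholders.
from azure.kusto.data import KustoConnectionStringBuilder
from azure.kusto.data.data_format import DataFormat
from azure.kusto.ingest import BlobDescriptor, IngestionProperties, QueuedIngestClient

# The ingestion endpoint uses the "ingest-" prefix of the cluster URI.
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://ingest-mycluster.westeurope.kusto.windows.net")
client = QueuedIngestClient(kcsb)

# One target table is enough for all files that share the same schema.
props = IngestionProperties(
    database="TelemetryDb",
    table="DemoTelemetry",
    data_format=DataFormat.MULTIJSON,  # JSON documents, one or more per file
)

# In practice these URLs (with SAS tokens) would be enumerated from ADLS Gen2.
blob_urls = [
    "https://myaccount.blob.core.windows.net/container/path/file1.json?<sas>",
]

for url in blob_urls:
    # Pass the real blob size if you know it; it helps the service schedule the ingestion.
    client.ingest_from_blob(BlobDescriptor(url, 0), ingestion_properties=props)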

Issue with Azure Blob connector in Power Automate

I have a Power Automate flow that uses the Azure Blob Storage connector to read an Excel file from blob storage with the Get blob content action.
The problem is that I need to process the Excel data and save it to a D365 Finance and Operations entity, and for that I need the data in JSON format. I saw that the Cloudmersive connector can convert Excel to JSON,
but I want to do it without using any third-party connector.
You can read the file and load it into a table. After that, you can use Compose actions or arrays to build the rows into a JSON object.
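If you do end up needing code rather than flow actions, one connector-free option (not part of the answer above) is a small script or Azure Function that the flow calls to do the conversion. A minimal sketch of the conversion step in Python, assuming pandas and openpyxl are available and using placeholder file and sheet names:

# Minimal sketch: convert an Excel worksheet to a JSON array of row objects.
# File and sheet names are placeholders; requires pandas + openpyxl.
import pandas as pd

def excel_to_json(xlsx_path: str, sheet=0) -> str:
    df = pd.read_excel(xlsx_path, sheet_name=sheet)  # one DataFrame row per worksheet row
    # orient="records" produces [{"Col1": ..., "Col2": ...}, ...]
    return df.to_json(orient="records", date_format="iso")

if __name__ == "__main__":
    print(excel_to_json("input.xlsx"))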

Changing the format of BLOB storage in Azure

How do I change the storage format of the blob? For me it is always storing Avro files, and I want the data in JSON format instead.
I am also seeing the following message in blob storage:
"avro may not render correctly as it contains an unrecognized extension."
In addition, the content looks encoded/binary, and because of these format issues I am not able to pull the data into Data Explorer.
This is most likely because you have enabled capturing of the events streaming through Azure Event Hubs.
Azure Event Hubs Capture enables you to automatically deliver the streaming data in Event Hubs to an Azure Blob storage or Azure Data Lake Storage Gen1 or Gen 2 account of your choice.
Captured data is written in Apache Avro format: a compact, fast, binary format that provides rich data structures with inline schema. This format is widely used in the Hadoop ecosystem, Stream Analytics, and Azure Data Factory.
More information about working with Avro files is available in this article: Exploring the captured files and working with Avro
mmking is right that you cannot change or convert the file format within Azure Blob Storage itself, but you can use Avro Tools to convert the files to JSON and do further processing from there.
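For reference, here is a minimal Python sketch of that conversion using the fastavro package. It is not part of the original answer; the file name is a placeholder and it assumes the devices sent UTF-8 encoded JSON payloads.

# Minimal sketch: read an Event Hubs Capture .avro file and print each event body as JSON.
import json
from fastavro import reader

with open("sample-capture.avro", "rb") as fo:
    for record in reader(fo):
        # Event Hubs Capture stores the original event payload in the "Body" field as bytes
        payload = json.loads(record["Body"].decode("utf-8"))
        print(json.dumps(payload))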

I have about 20 Excel/PDF files that can be downloaded from an HTTP server. I need to load these files into Azure Storage using Data Factory

I have 20 Excel/PDF files located on different HTTPS servers. I need to validate these files and load them into Azure Storage using Data Factory, then apply some business logic to the data and load it into an Azure SQL database. I need to know whether I have to create a pipeline that stores this data in Azure Blob Storage first and then loads it into Azure SQL Database.
I have tried creating a Copy Data activity in Data Factory.
My idea is as follows:
No. 1
Step 1: Use a Copy activity to transfer data from the HTTP connector source into a Blob Storage connector sink.
Step 2: Meanwhile, configure a blob storage trigger to execute your logic code, so the blob data is processed as soon as it lands in blob storage (a sketch of such a trigger function follows after these options).
Step 3: Use a Copy activity to transfer data from the Blob Storage connector source into a SQL Database connector sink.
No. 2
Step 1: Use a Copy activity to transfer data from the HTTP connector source directly into a SQL Database connector sink.
Step 2: Meanwhile, add your logic steps in a stored procedure, so the data is processed before it is inserted into the table.
I think both methods are feasible. With No. 1 the business logic is freer and more flexible; No. 2 is more convenient but limited by what stored procedures can express. Pick whichever suits you.
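For step 2 of option No. 1, the "logic code" behind the blob trigger could be, for example, an Azure Function. A rough Python sketch is below; the function and parameter names are placeholders, and the blob trigger binding itself is declared in function.json, which is not shown here.

# Rough sketch of a blob-triggered Azure Function (Python v1 programming model).
import logging
import azure.functions as func

def main(inputblob: func.InputStream):
    # InputStream exposes the blob path, its length, and the raw content
    logging.info("New blob arrived: %s (%s bytes)", inputblob.name, inputblob.length)
    data = inputblob.read()  # raw bytes of the uploaded file
    # ... validate / apply business logic here before the copy into Azure SQL ...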
Note that Excel and PDF are not supported yet; ADF only supports certain file formats directly. When I tested by reading such a file as CSV, I just got random characters back.
You could refer to this question for reading Excel files in ADF: How to read files with .xlsx and .xls extension in Azure data factory?

Custom output path from Azure Stream Analytics

I have an event hub that receives telemetry data from different devices. I created a Stream Analytics job to process this data and output it to various sinks (Power BI, Cosmos DB, and Data Lake). While creating the Data Lake output I found that I couldn't set the output path based on the message payload. The path I can set inside the sink has the format [folder_structure]/{date}{time}. I need a very specific folder structure that depends on the message payload and places each file in the corresponding location. Is there any way to do that?
This capability is currently available in private preview for output to Blob storage.
https://azure.microsoft.com/en-us/blog/4-new-features-now-available-in-azure-stream-analytics/
If this is something you can use, please provide details at the following URL and we will add you to the preview program.
https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR8zMnUkKzk5Elg9i6hoUmJVUNDhIMjJESFdVNDhRODNMTVZTNDVIR0w2Qi4u
