How to use a tab-delimited UTF-16LE file as a source in a Microsoft Azure Data Factory dataflow

I am working for a customer in the medical business (so please excuse the many redactions in the screenshots). I am pretty new here, so please excuse any mistakes I might make.
We are trying to fill a SQL database table with data coming from two different sources (CSV files). Both are delivered to a blob storage account where we have read access.
The first flow I built to do this with Azure Data Factory works perfectly, so I thought I would just clone that flow and point it to the second source. However, the CSV files from the second source are tab-delimited and UTF-16LE encoded. Luckily, you can set these parameters when you create a dataset:
Dataset Settings
When I verify the dataset using the "Preview Data" option, I see a nice list of data coming from the CSV file: Output from preview data. So it appears to work fine!
Now I create a new dataflow, and in the source I use the newly created dataset. I left all settings at their defaults: data flow settings
Now when I open Data Preview and click refresh, I get garbage and NULL outputs instead of the nice data I received when testing the dataset: output from source block in dataflow. In the first dataflow I created, this does produce the expected data from the CSV file, but here the data is somehow scrambled.
Could someone please help me figure out what I am missing or doing wrong here?

I tried to repro this, and as you can see, if you set the dataset's encoding to UTF-8 instead of UTF-16, you will be able to preview the data.
Data Preview inside the dataflow:
Even when I try it with UTF-16LE selected as the encoding, I run into the same issue:
Hence, for now, you could change the encoding and use the pipeline.
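If converting the file itself is an option, the re-encoding can also be done as a small preprocessing step before the dataflow reads it. This is only a minimal local sketch (the file names are placeholders; in practice this would have to run somewhere such as an Azure Function or a custom activity against the blob):

    import csv

    # Placeholder file names; the real files live in blob storage.
    SRC = "source_utf16le.tsv"
    DST = "converted_utf8.tsv"

    # Read the tab-delimited file as UTF-16 (the codec strips the byte-order
    # mark if one is present) and rewrite it as UTF-8 so a plain UTF-8
    # DelimitedText dataset can parse it in the dataflow.
    with open(SRC, "r", encoding="utf-16", newline="") as src, \
         open(DST, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        for row in reader:
            writer.writerow(row)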

Related

How to read *.txt files in Azure Data Factory?

I'm trying to load data from a *.txt file into a SQL database using a Data Flow or Copy Data activity in Azure Data Factory, but I'm not able to do it. Below is my attempt:
File configuration (as you can see, I'm using the CSV option because it's the only way Azure lets me read the file):
Here is what Preview Data shows:
Everything looks fine, but once I use the dataset in a Data Flow, I get the following:
Is it possible to read a *.txt file with Azure? What am I doing wrong?
I tried with a sample text file and was able to get the original data in the Source transformation data preview.
Please check that you have selected the correct source dataset in your source transformation. Sometimes, when the source file is changed, it still shows old or incorrect projections and data previews. To reset, you can change the output stream name or reconnect the source file.
Below is my source dataset connection and source settings.
Source dataset: text file
Dataflow source:
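It can also help to confirm locally what the *.txt file really contains (encoding and delimiter) before configuring the DelimitedText dataset. A quick sketch, assuming the file has been downloaded as sample.txt (a placeholder name):

    # Inspect the first bytes of the downloaded text file to confirm the
    # encoding and the delimiter before configuring the DelimitedText dataset.
    with open("sample.txt", "rb") as f:
        head = f.read(200)

    print(head)  # a leading b'\xef\xbb\xbf' or b'\xff\xfe' indicates a BOM
    print("tab:", b"\t" in head, "comma:", b"," in head, "semicolon:", b";" in head)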

Newline in sink output data

Why does an Azure Data Factory data flow automatically add a new line to the output file? Can this be deleted, or is there a setting to configure? See the first screenshot.
output file
I have only 1 row/record when I preview the data.
sink data preview
Sorry, I had to remove/blur the data.
I tried to repro this scenario, and you are right. This happens with some file types, such as .csv and binary files.
I know that when using a Binary dataset, ADF does not parse the file content but treats it as-is, and you can only copy from a Binary dataset to another Binary dataset.
Also, Data Preview is a snapshot of your transformed data, using row limits and data sampling from data frames in Spark memory. Therefore, the sink drivers are not utilized or tested in this scenario. It shows a limited number of rows when previewed, and the number of columns shown in the preview is taken from the first row in the file.
I can see it as below:
Output file from sink in ADF preview editor in Storage container:
You can also confirm this by looking at the Inspect tab.
I also tried downloading the output file locally and opening it in different editors to confirm the behavior (a new line '16' got appended automatically).
Workaround: You can try using DelimitedText as the source dataset or JSON as the sink dataset instead.
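If neither dataset change is possible and the trailing newline breaks a downstream consumer, another option is to trim it from the blob after the pipeline runs. A minimal sketch using the azure-storage-blob SDK; the connection string, container, and blob name are placeholders:

    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    # Placeholder values; substitute your own storage account details.
    CONN_STR = "<storage-connection-string>"
    CONTAINER = "output"
    BLOB_NAME = "dataflow-output.csv"

    service = BlobServiceClient.from_connection_string(CONN_STR)
    blob = service.get_blob_client(container=CONTAINER, blob=BLOB_NAME)

    # Download the sink output, strip any trailing newline characters,
    # and upload the trimmed content back over the original blob.
    data = blob.download_blob().readall()
    trimmed = data.rstrip(b"\r\n")
    if trimmed != data:
        blob.upload_blob(trimmed, overwrite=True)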
Please share your feedback with the product group so that they can look into this.
Similar Feedback: https://feedback.azure.com/forums/217298-storage/suggestions/40268644--preview-file-in-blob-container-vs-edit

Copying data using Data Copy into individual files for blob storage

I am entirely new to Azure, so if this is easy please just tell me to RTFM, but I'm not used to the terminology yet so I'm struggling.
I've created a data factory and pipeline to copy data, using a simple query, from my source data. The target data is a .txt file in my blob storage container. This part is all working quite well.
Now, what I'm attempting to do is store each row returned from my query in an individual file in blob storage. This is where I'm getting stuck. It seems like something that should be pretty easy, but as I said, I'm new to Azure and so far I'm not sure where to look.
You can enter 1 in the Max rows per file setting of the sink and leave the file name unset in the sink dataset. If you need to, you can specify a file name prefix in the File name prefix setting.
Screenshots:
The sink dataset
Sink setting in the copy data activity
Result:
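As a side note, if you ever need more control over the individual file names than the File name prefix setting gives you, the same one-row-per-file output can be produced with a short script instead of the copy activity. A sketch using the azure-storage-blob SDK; the connection details and the rows themselves are placeholders:

    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    # Placeholder connection details.
    CONN_STR = "<storage-connection-string>"
    CONTAINER = "<target-container>"

    # 'rows' stands in for the result of your source query, however you fetch it.
    rows = [
        "1,Alice,2021-01-01",
        "2,Bob,2021-01-02",
    ]

    service = BlobServiceClient.from_connection_string(CONN_STR)
    container = service.get_container_client(CONTAINER)

    # Write each row to its own blob, mirroring "Max rows per file = 1"
    # but with full control over each file name.
    for i, row in enumerate(rows, start=1):
        container.upload_blob(name=f"row-{i:05d}.txt", data=row, overwrite=True)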

Export SharePoint list to .csv and upload to Azure Data Lake using Flow

I am trying to use Microsoft Flow to export a SharePoint list to Azure Data Lake.
I want it so that any time a particular online list is changed, its entire contents are loaded into a file in Data Lake. If the file already exists, I want to overwrite it. Can someone please explain how I can go about doing this? I have tried multiple ways, but they are not getting the job done.
Thanks
I was able to get the items in the SharePoint list to near perfection. I will post the Flow here in case anyone in the future needs it.
So what I did is that every 5 minutes I "create" a file in Azure Data Lake, which overwrites the file if it exists. The content of the file cannot be blank, so I added a newline as the content. Then I use Get items to retrieve all the items in the SharePoint list. From there, using an Apply to each loop, I append the content of the current row of the SharePoint list to the Data Lake file (separated by | and ending with a new line after all the content is added). This works to near perfection, with the only caveat being the newline at the beginning of the file, which I eliminate using Power Query.
This is exactly what I needed. If anybody sees a way to make this better, please post so that we can get this to perfection.
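For what it's worth, the same concatenation logic looks like this as a short script; building all the rows first and writing the file once avoids the leading-newline caveat mentioned above (the field names and file name are made up for illustration):

    # Illustrative items standing in for the output of the "Get items" step.
    items = [
        {"Title": "Item 1", "Status": "Open"},
        {"Title": "Item 2", "Status": "Closed"},
    ]

    # Build every pipe-delimited row up front, then write the file in one go,
    # so no placeholder newline is needed at the start of the file.
    lines = ["|".join(str(item[field]) for field in ("Title", "Status")) for item in items]
    content = "\n".join(lines) + "\n"

    with open("sharepoint-export.txt", "w", encoding="utf-8") as f:
        f.write(content)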

Want to set up settings data in Windows Azure Stream Analytics

I need help setting up reference data in Stream Analytics. I want to add my application's settings (default) data to Stream Analytics. I can add the reference data, and via "Upload sample file" I can upload a JSON or CSV file. However, when firing a join query it returns 0 rows, as the reference data hasn't been stored (so NULL with a left outer join).
I investigated the issue and I think it is due to the Path Pattern, but I do not know much about it.
Based on your description, you seem sure that the issue is caused by the Path Pattern/Path Prefix Pattern, but I cannot give a helpful suggestion without more details, such as a screenshot of your Path Pattern setting.
So I will just list some resources as references for you; I hope they help resolve your issue.
Two screenshots of the Path Prefix Pattern/Path Pattern settings, taken from links 1 & 2.
The sample "Use Stream Analytics to process exported data from Application Insights" shows how to read stream data from Blob Storage in its section "Create an Azure Stream Analytics instance"; the steps are similar for reference data.
Hope it helps.
The issue was due to an improperly formatted JSON file.
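Since the root cause turned out to be the file format, it can save time to validate the reference file before uploading it. A minimal sketch, assuming the reference data is saved locally as reference.json (a placeholder name):

    import json

    # Validate the reference data file before uploading it;
    # json.load raises a descriptive error pointing at the malformed spot.
    with open("reference.json", encoding="utf-8") as f:
        try:
            data = json.load(f)
            print(f"Valid JSON with {len(data)} top-level entries.")
        except json.JSONDecodeError as err:
            print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")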
