AzureML Studio - Unable to Export Data using Designer - azure

I built a classification model using the new AzureML Studio Designer. I am trying to export
the scored model as a CSV file using the Export Data pill. I have selected
workspaceblobstore as the datastore and csv as the file format. The pipeline runs fine, but the
dataset does not show up under Data. I am also unable to just right-click on the scored model
and download a CSV file.
[Pipeline][1]
[Export Data Parameters][2]
[Output][3]
[1]: https://i.stack.imgur.com/dlaec.png
[2]: https://i.stack.imgur.com/PLwRv.png
[3]: https://i.stack.imgur.com/rua29.png

When a dataset is uploaded in one of the supported formats, it shows up under the datasets in the Designer tab.
To reproduce the problem, I took a sample dataset and uploaded it from a local directory as a CSV file. It validated without errors and is visible under Data in the Designer tab.

Related

How to use a Tab-Delimited UTF-16le file as source in a Microsoft Azure data Factory dataflow

I am working for a customer in the medical business (so excuse the many redactions in the screenshots). I am pretty new here so excuse any mistakes I might make please.
We are trying to fill a SQL database table with data coming from 2 different sources (CSV files). Both are delivered on a BLOB storage where we have read access.
The first flow I built to do this with Azure Data Factory works perfectly, so I thought I would just clone that flow and point it to the second source. However, the CSV files from the second source are TAB delimited and UTF-16LE encoded. Luckily you can set these parameters when you create a dataset:
Dataset Settings
When I verify the dataset by using the "Preview Data" option, I see a nice list with data coming from the CSV file:
Output from preview data
So it appears to work fine!
Now I create a new data flow, and in the source I use the newly created dataset. I left all settings at their defaults.
data flow settings
Now when I open Data Preview and click Refresh, I get garbage and NULL outputs instead of the nice data I received when testing the dataset.
output from source block in dataflow
The first data flow I created does produce the expected data from its CSV file, but somehow the data is now scrambled?
Could someone please help me with what I am missing or doing wrong here?
I tried to reproduce this. If you set the Encoding in the dataset settings to UTF-8 instead of UTF-16,
you will be able to preview the data.
Data Preview inside the Dataflow:
When I enable UTF-16LE as the encoding, I run into the same issue:
Hence, for now, you could change the Encoding and use the pipeline.
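If changing only the dataset setting is not an option, another route is to re-encode the file itself before the data flow reads it. The following is a minimal sketch of that idea, not something from the answer above: the function name is hypothetical, and it assumes the whole file fits in memory and that you can run a small script (locally or in some pre-processing step) before the pipeline.

```python
import csv
import io


def utf16le_tsv_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16LE tab-delimited bytes and re-encode them as UTF-8."""
    # The "utf-16" codec consumes a leading BOM; without one, decode as
    # little-endian explicitly (assumption: the source really is UTF-16LE).
    has_bom = raw.startswith(b"\xff\xfe")
    text = raw.decode("utf-16" if has_bom else "utf-16-le")

    # Round-trip through the csv module so quoted fields survive intact.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue().encode("utf-8")
```

The result is still tab delimited, so the dataset would keep its TAB column delimiter and only the Encoding would change to UTF-8.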

Newline in sink output data

Why does an Azure Data Factory data flow automatically add a new line to the output file? Can this be removed, or is there a setting to configure? See the first screenshot.
output file
I have only 1 row/record when I preview the data.
sink data preview
Sorry, I had to remove/blur the data.
I tried to reproduce this scenario and you are right. This happens with some file types; I see it with .CSV and binary files.
I know that when using a Binary dataset, ADF does not parse the file content but treats it as-is, and you can only copy from a Binary dataset to a Binary dataset.
Data Preview, on the other hand, is a snapshot of your transformed data using row limits and data sampling from data frames in Spark memory. Therefore, the sink drivers are not utilized or tested in this scenario. The preview shows a limited number of rows, and the number of columns shown is taken from the first row in the file.
I can see it as below:
Output file from sink in ADF preview editor in Storage container:
You can also confirm this by looking at the Inspect tab.
I also tried downloading the output file locally and opening it in different editors to confirm the behavior (New line '16' got appended automatically).
Workaround: You can try using DelimitedText as the source dataset or Json as the sink dataset instead.
Please share your feedback with product group so that they can look into this.
Similar Feedback: https://feedback.azure.com/forums/217298-storage/suggestions/40268644--preview-file-in-blob-container-vs-edit
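If the trailing newline matters to a downstream consumer, one option is to strip it after the sink has written the file. This is only a sketch of a post-processing step that would run outside the data flow (for example in a script that downloads and re-uploads the blob); the function name is hypothetical and nothing here is an ADF setting.

```python
def strip_trailing_newline(data: bytes) -> bytes:
    """Remove a single trailing line terminator (CRLF or LF) if present."""
    if data.endswith(b"\r\n"):
        return data[:-2]
    if data.endswith(b"\n"):
        return data[:-1]
    return data  # nothing to strip
```

Only one terminator is removed, so blank lines that are genuinely part of the data are left alone.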

Unable to read same table data from a pdf file spanning across multiple pages using form recognizer client library using C# code

I am not able to read table data that continues onto the next page. I am using C# code to extract data with FormRecognizerClient
and a trained custom model in a console application. I also tried StartRecognizeInvoicesFromUriAsync, i.e. the
analyze invoices feature of FormRecognizerClient; this also failed in the scenario mentioned above.
In addition, FormRecognizerClient in C# is not recognizing the headers in a table.
I am using the link below for the FormRecognizerClient C# library code:
https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/client-library?tabs=preview%2Cv2-1&pivots=programming-language-csharp#analyze-invoices
Looking for any help.
Below is the image link showing format for pdf :
https://i.stack.imgur.com/mx5NW.png
Form Recognizer does not yet support tables spanning across pages and will extract each table per page. Can you please share a snippet of the table on the second page which is not detected (please anonymize and redact all data before sharing)?
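Since each page's table comes back separately, the pieces can be stitched together after extraction. The sketch below shows one possible heuristic in Python rather than C#, working on a hypothetical plain structure (a list of dicts with "page" and "rows") that you would first fill from the recognizer's results; it does not use the SDK's own types. It assumes a table continues when the next page's table has the same number of columns, which can wrongly join two unrelated tables of equal width.

```python
from typing import Any, Dict, List


def merge_page_tables(tables: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stitch tables from consecutive pages that share a column count.

    Each input table is a hypothetical dict:
    {"page": int, "rows": [[str, ...], ...]}
    """
    merged: List[Dict[str, Any]] = []
    for table in sorted(tables, key=lambda t: t["page"]):
        rows = [list(row) for row in table["rows"]]
        if not rows:
            continue  # nothing to stitch from an empty table
        prev = merged[-1] if merged else None
        continues_prev = (
            prev is not None
            and table["page"] == prev["last_page"] + 1
            and len(rows[0]) == len(prev["rows"][0])
        )
        if continues_prev:
            # Drop a header row repeated at the top of the continuation page.
            if rows[0] == prev["rows"][0]:
                rows = rows[1:]
            prev["rows"].extend(rows)
            prev["last_page"] = table["page"]
        else:
            merged.append(
                {"first_page": table["page"], "last_page": table["page"], "rows": rows}
            )
    return merged
```

If the header is not detected on the first page either, the repeated-header check simply never fires and all rows are kept.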

How to export Azure Machine Learning output to CSV

I have created an Azure ML experiment which will give the output as predicted probability values and some charts such as bar chart, pie chart, etc. Now I am able to see the outputs in Azure ML's output page.
How can I export my Azure ML experiment results to CSV (or any other similar format)?
You can just configure that by using the modules under Data Format Conversions. Have a look here and here. Unfortunately, the documentation is still in progress.
Once you've trained your model, publish it as a web service. Then, from the published service, you can Download Excel Workbook. This workbook runs your web service with the data you enter in Excel and then shows the predicted values.
You can add a module called Convert to CSV to your experiment.
Then run the selected module.
After the module has run, right-click it and click 'Download'.

How to open spss data files in Excel?

I want to open SPSS .sav data files in Excel without opening the SPSS files (I don't want to convert the SPSS data file into an Excel file). I know this is possible using an OLE DB connection, but I don't know how to do this.
I converted sav to csv online: http://pspp.benpfaff.org/
(Not exactly an answer for you, since you want to avoid opening the files, but maybe this helps others.)
I have been using the open source GNU PSPP package to convert the sav file to csv. You can download the Windows version from SourceForge [1]. Once you have the software, you can convert a sav file to csv with the following command line:
pspp-convert <input.sav> <output.csv>
[1] http://sourceforge.net/projects/pspp4windows/files/?source=navbar
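For a folder full of .sav files, the same command can be run in a loop. This is a small sketch under a few assumptions: pspp-convert is on the PATH, each output should sit next to its input with a .csv extension, and the function names are made up for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(sav_path: Path) -> List[str]:
    """Return the pspp-convert invocation for one .sav file."""
    csv_path = sav_path.with_suffix(".csv")  # same name, .csv extension
    return ["pspp-convert", str(sav_path), str(csv_path)]


def convert_folder(folder: Path) -> None:
    """Convert every .sav file in a folder; raises if pspp-convert fails."""
    for sav_path in sorted(folder.glob("*.sav")):
        subprocess.run(build_convert_command(sav_path), check=True)
```

Calling `convert_folder(Path("data"))` would then leave one CSV per .sav file in that folder, ready to open in Excel.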
In order to download that driver you must have a license to SPSS. For those who do not, there is an open source tool that is very much like SPSS and will allow you to import SAV files and export them to CSV.
Here's the software
And here are the steps to export the data.
I help develop the Colectica for Excel addin, which opens SPSS and Stata data files in Excel. This does not require ODBC configuration; it reads the file and then inserts the data and metadata into your worksheet.
The addin is downloadable from
http://www.colectica.com/software/colecticaforexcel
You can do it via ODBC. The steps to do it:
Install IBM SPSS Statistics Data File Driver. Standalone Driver is enough.
Create a DSN via the ODBC manager.
Use the data importer in Excel via ODBC by selecting the created DSN.
You can use online converter, developed by me at N'counter.
This is the easiest way to open SPSS file in Excel.
1) You just have to upload your file to SPSS coN'verter at https://secure.ncounter.de/SpssConverter
2) Select some options
3) And your converted Excel file will be downloaded
No information about your file contents is retained on our server. The file travels to our server, is converted in-memory, and is immediately discarded: We don't peer into your data at any time!
I tried the steps below and they worked well.
Install Dimensions Data Model and OLE DB Access
and follow the steps below in Excel:
Data->Get External Data ->From Other sources -> From Data Connection Wizard -> Other/Advanced-> SPSS MR DM-2 OLE DB Provider-> Metadata type as SPSS File(SAV)-> SPSS data file in Metadata Location->Finish
