Azure Data Factory Excel read via HTTP fails


I am looking to import data from a publicly available Excel sheet into ADF. I have set up the dataset using an HTTP linked service (see first screenshot), with AutoResolveIntegrationRuntime. However, when I attempt to preview the data, I get an error suggesting that the source file is not in the correct format (second screenshot).
Have I set something incorrectly in my configuration?

The .xls format is not supported when using the HTTP connector.
Since the API downloads the file, you can't preview the data directly. You can load the file into Blob Storage or Azure Data Lake Storage with a Copy activity, and then create a dataset on top of that file to preview it.
The workaround is to save your .xlsx file as a .csv file, because Azure Data Factory does not support reading .xlsx files through the HTTP connector.
Furthermore, if you only want to copy the file, there is no need to convert the .xlsx file to .csv; simply select the Binary Copy option.
In a similar discussion, a Microsoft employee (MS-FTE) confirmed with the product team that this is not yet supported for the HTTP connector.
Please submit a feature request in the Q&A thread to allow this functionality in future versions; the Data Factory product team actively monitors these requests and evaluates them for adoption.
Please check the issue in the linked Q&A thread.
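
Before changing the pipeline, it can help to confirm what the URL actually returns, since a "not in the correct format" error may also mean the server sent something other than a workbook. The following is a minimal Python sketch, not part of ADF: the function name `sniff_spreadsheet_format` and its return labels are made up for illustration. It relies only on two file-format facts: legacy .xls files start with the OLE2 compound-file header, and .xlsx files are ZIP archives that contain `xl/workbook.xml`.

```python
import io
import zipfile

# OLE2 compound-file header used by legacy .xls workbooks.
XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_spreadsheet_format(data: bytes) -> str:
    """Guess the real format of downloaded bytes, ignoring the URL's extension.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if data.startswith(XLS_MAGIC):
        return "xls"
    if data.startswith(b"PK"):
        # .xlsx is a ZIP archive; confirm that it holds a workbook part.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "xl/workbook.xml" in archive.namelist():
                    return "xlsx"
        except zipfile.BadZipFile:
            pass
        return "unknown"
    if data.lstrip()[:1] == b"<":
        # Often an HTML error or login page served instead of the file.
        return "html-or-xml"
    return "text-or-unknown"
```

To try it, download the bytes yourself (for example with `urllib.request.urlopen(url).read()`, where `url` is your public file's address) and pass them to the function. A result of "xls" or "html-or-xml" would explain the preview error without any configuration mistake on the dataset.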

Related

Is there an Azure platform service that can convert text from pdf files and save those unstructured data in database?

Our organization is migrating our routine work onto the Azure cloud platform. One of my tasks uses Python to read many PDF files and convert all the text/unstructured data into tables, e.g.
the first column shows the file name and the second column stores all the text data, etc.
Is there a service on the Azure platform that can do this automatically? I am a new Azure user, so I'm not very familiar with it. Thanks heaps for any help.

I would recommend looking at Azure Form Recognizer. You can train it to recognize tables and extract data from PDF files.

How to copy the latest file from Sharepoint to Blob Storage using Logic App?

I am trying to extract the latest Excel file from SharePoint into Azure Blob Storage using a Logic App.
I created the flow and it's working. However, it's copying all the files from SharePoint to Blob Storage.
Below is my flow.
I get a new Excel file every day in my SharePoint (/Shared documents/Data), hence I used List folder to locate it.
Then I used Filter array to filter the files as last modified with less than or equal to 5 m
I don't get any error. However, it's copying all the files rather than last modified file.
Can anyone advise how to address this?

You can use the SharePoint trigger 'When a file is created in a folder', which fires only for newly added files. Documentation link - https://learn.microsoft.com/en-us/connectors/sharepoint/#when-a-file-is-created-in-a-folder
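
If you keep the List folder approach instead, the selection you want comes down to one of two operations on the listed items: take the single newest entry, or keep only entries modified within a time window. The Python sketch below only illustrates that logic outside of Logic Apps; the field names `Name` and `LastModified` and the ISO 8601 timestamp format are assumptions about what the listing returns, so check them against your own run output.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_file(files):
    """Return the most recently modified entry, or None for an empty list."""
    return max(files, key=lambda f: parse_ts(f["LastModified"]), default=None)


def modified_within(files, now: datetime, window: timedelta):
    """Keep only the entries modified no longer ago than `window`."""
    return [f for f in files if now - parse_ts(f["LastModified"]) <= window]
```

Whichever variant you use, the copy step has to consume the filtered result rather than the original folder listing; otherwise every file is still copied.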

Office365 Excel as source for GCP BigQuery

We are using Office365 Excel and manually creating some data that we need in BigQuery. What solution would you create to automatically load the data from this excel to a table in bq? We are not allowed to use Google Sheets (which would solve all our problems).
We use Matillion and GCP products.
I have no idea how to solve this, and I don't find any information about this, so any suggestion or idea is appreciated.
Cheers,
Cris

You can save your data as CSV and then load it into BigQuery.
Then, you can load data by using one of the following:
The Google Cloud Console
The bq command-line tool's bq load command
The API
The client libraries
You can find more details in "Loading data from local files".
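
For the `bq load` route, a scheduled job could assemble and run the command for each exported CSV. This is a minimal Python sketch under stated assumptions: the helper names `bq_load_csv_command` and `run_load` are hypothetical, the table and file names are placeholders, and the `bq` tool is assumed to be installed and authenticated. The flags used (`--source_format=CSV`, `--autodetect`, `--skip_leading_rows`) are standard `bq load` options.

```python
import subprocess
from typing import List


def bq_load_csv_command(table: str, csv_path: str, skip_header: bool = True) -> List[str]:
    """Build the argument list for loading a local CSV file with `bq load`."""
    cmd = ["bq", "load", "--source_format=CSV", "--autodetect"]
    if skip_header:
        # Treat the first row as column headers rather than data.
        cmd.append("--skip_leading_rows=1")
    cmd += [table, csv_path]
    return cmd


def run_load(table: str, csv_path: str) -> None:
    """Run the load; raises CalledProcessError if `bq` exits with an error."""
    subprocess.run(bq_load_csv_command(table, csv_path), check=True)
```

For example, `run_load("mydataset.mytable", "export.csv")` would load a file exported from Excel into a placeholder table. Schema autodetection is convenient for a first load, but an explicit schema is safer if the spreadsheet's columns change over time.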

As a different approach, you can also try this other option:
BigQuery: loading excel file
For this you will need to use Google Drive and federated tables.
Basically you will upload your Excel files to Google Drive with the option "Convert uploaded files to Google Docs editor format" checked in your settings, and upload to BigQuery from Google Drive.

403 Forbidden on reading CSV file in OneDrive

I am trying to read my CSV files using Microsoft Graph API:
/me/drive/items/${someId}/workbook/worksheets('${someSheetName}')/usedRange
However it returns
403 Forbidden -> AccessDenied -> Could not get WAC token.
When reading an XLSX file, it works fine. I am using a personal Microsoft account.
Thanks for your help.

CSV and .xls are not supported formats. Only .xlsx works for this feature.

As others have pointed out, CSV files are not supported in the new Excel API. However, to help others who were initially confused by the error message like me, I'd like to elaborate a bit more.
First, it's useful to distinguish the Microsoft Graph API and the Excel API. The Microsoft Graph API mostly provides the basic functionality of a file storage system so that third parties can work with files and folders in OneDrive and SharePoint. The Excel API, on the other hand, provides Excel functionality so that third parties can work with Excel files (.xlsx files specifically). Although the Excel API uses the same resource identification system and shares the same request "syntax" as the Microsoft Graph API, the two are not the same.
The request below clearly belongs to the Excel API, not the Graph API. Although the Graph API can handle a CSV file (it doesn't care what type of file it's working with, since it's application-agnostic), the Excel API can't.
/me/drive/items/${someId}/workbook/worksheets('${someSheetName}')/usedRange
If you look at the endpoints in the Excel API, you'll see that most of them point to features that do not exist in CSV files: workbook, worksheet, cells, etc. For instance, the request above attempts to read a specific worksheet within a workbook file, which is not possible if the file is CSV.
Also, the Excel API handles features such as formulas, data types, and cell formatting, which are also not present in CSV files.
Essentially, CSV files are no more than just plain-text files and thus are not supported by the Excel API.
Of course, it would be really helpful if the Excel API team could return a more meaningful error message. I personally find the current error message very misleading.
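
Until the error message improves, a client can guard against this case itself by checking the file name before building a workbook request. The following is a minimal Python sketch; the helper name `used_range_url` is made up for illustration, and it only builds the URL (authentication and the HTTP call are left out).

```python
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def used_range_url(item_id: str, sheet_name: str, file_name: str) -> str:
    """Build the usedRange URL, refusing files the Excel API cannot open."""
    if not file_name.lower().endswith(".xlsx"):
        raise ValueError(
            f"{file_name!r} is not an .xlsx workbook; "
            "download its content through the drive item instead"
        )
    # Percent-encode the sheet name so spaces and symbols are URL-safe.
    sheet = quote(sheet_name, safe="")
    return (
        f"{GRAPH_BASE}/me/drive/items/{item_id}"
        f"/workbook/worksheets('{sheet}')/usedRange"
    )
```

For a CSV file, the plain file content can still be fetched through the regular drive item endpoints and parsed as text on the client side.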

Upload Excel 2013 Workbook to website hosted on Azure

Does anyone have guidance and/or example code (which would be awesome) on how I would go about the following?
With a Web application using C# / ASP.NET MVC and hosted on Azure:
Allow a user to upload an Excel Workbook (multiple worksheets) via a web page UI
Populate a Dataset by reading in the worksheets so I can then process the data
Couple of things I'm unclear on:
I've read that Azure doesn't have ACEOLEDB, which is what Excel 2007+ requires, and I'd have to use OPEN XML SDK. Is this true? Is this the only way?
Is it possible to read the file into memory and not actually save it to Azure storage?
I DO NOT need to modify the uploaded spreadsheet. Only read the data in and then throw the spreadsheet away.

Well, that's many questions in one post; let me see if we can tackle them one by one.
With a Web application using C# / ASP.NET MVC and hosted on Azure:
1. Allow a user to upload an Excel Workbook (multiple worksheets) via a web page UI
2. Populate a Dataset by reading in the worksheets so I can then process the data
Couple of things I'm unclear on:
1. I've read that Azure doesn't have ACEOLEDB, which is what Excel 2007+ requires, and I'd have to use OPEN XML SDK. Is this true? Is this the only way?
2. Is it possible to read the file into memory and not actually save it to Azure storage?
1/2. You can let the user upload the Excel workbook to a /temp location and clean it up once you have read it. You can also write a script that cleans up any files that could not be deleted from /temp for whatever reason.
Alternatively, if you want to keep the files, you should store them in Azure Storage and fetch/read them when you need to.
Check out this thread: read excelsheet in azure uploaded as a blob
By default, when you upload a file it is written to local disk, and you can later choose to save it to Azure Storage or elsewhere.
Reading the Excel file - you can use any of the NuGet packages listed here http://nugetmusthaves.com/Tag/Excel to read the file; I prefer GemBox and NPOI.
http://www.aspdotnet-suresh.com/2014/12/how-to-upload-files-in-asp-net-mvc-razor.html
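
On the in-memory question: an .xlsx workbook is a ZIP archive of XML parts, so it can be opened from a byte buffer without ever being written to disk or storage. The sketch below shows the idea in Python rather than C# (a deliberate language swap, for illustration only); the function name `sheet_names` is hypothetical, and it assumes the standard (non-strict) SpreadsheetML namespace. In C#, the same idea would mean handing the upload's stream directly to whichever Excel library you choose.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the main SpreadsheetML parts in a standard .xlsx workbook.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(upload: bytes):
    """List worksheet names from uploaded bytes without touching the disk."""
    with zipfile.ZipFile(io.BytesIO(upload)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    # Each <sheet> element under <sheets> carries the tab name.
    return [sheet.get("name") for sheet in root.iter(f"{MAIN_NS}sheet")]
```

Once the request finishes and the buffer goes out of scope, nothing remains to clean up, which fits the "read it and throw it away" requirement.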
