Read Excel files in Azure Data Factory

I am new to Azure Data Factory (ADF). I need to access/load Excel files sitting in a blob into ADF, but since ADF doesn't support the Excel format (it supports text/csv/json/... only), is there a way to ingest Excel files into ADF?
I would really appreciate it if anybody could help!
Thanks.

ADF does not support reading from .xls files yet.
You can find solutions in this answer: How to read files with .xlsx and .xls extension in Azure data factory?

ADF V2 now supports reading data from an Excel file. Here is the link to the article.
Hope this helps!

ADF now supports Excel as a data source. You can read about it here.

You are right, Azure Data Factory does not support reading .xlsx files. The workaround is to save your .xlsx file as a .csv file; I think that should work.
My .xlsx file:
Save it as a .csv file; the info will not change:
Preview Data in ADF:
Besides, if you just want to copy the .xlsx file, there is no need to convert it to .csv; you just need to choose the Binary Copy option.
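If you would rather script that conversion than do it by hand in Excel, a minimal sketch using pandas could look like the following (the file names are hypothetical, and the openpyxl engine must be installed for .xlsx):

    import pandas as pd  # pip install pandas openpyxl

    # Hypothetical file names; point these at your own workbook and output path.
    df = pd.read_excel("report.xlsx", sheet_name=0)   # first sheet
    df.to_csv("report.csv", index=False)              # ADF can ingest this as a delimited text dataset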

If you're familiar with SSIS, you could simply use the Excel Source in your SSIS package, and then run it on the SSIS Integration Runtime using the Execute SSIS Package activity in an ADF pipeline.

An easier solution is to use a Power Automate flow to export an Excel table as a CSV and trigger the Data Factory process. The only issue is an undocumented limitation in Power Automate: you need to use a fixed Excel file name, because passing the name as a variable fails.

Related

XLSX from blob column to Oracle table

I can't use external tables, for example, or any way to upload data other than saving the .xlsx file as a BLOB in the database.
The structure of the .xlsx file will always be the same. Is it possible to read data from the BLOB column as a table?
The best article about it: https://jeffkemponoracle.com/tag/xlsx/.
There are several ways / tools which can help you.
If you can't use any of them, you can consider using Oracle's embedded ZIP and XML tools (an .xlsx file is a zipped folder of XML files which you can parse).
If you have Oracle APEX, you can also consider APEX_DATA_PARSER: https://docs.oracle.com/en/database/oracle/application-express/20.2/aeapi/APEX_DATA_PARSER.html
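To make the "zipped folder of XML files" point concrete, here is a rough Python sketch (outside the database) that unzips an .xlsx and reads the first worksheet directly. The part and element names are the standard OOXML ones; it deliberately ignores dates, number formats and inline strings, and the same parts are what an in-database approach with Oracle's ZIP and XML tools would need to walk:

    import zipfile
    import xml.etree.ElementTree as ET

    M = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

    def read_first_sheet(path):
        # Treat the .xlsx as the zip archive it is and parse the worksheet XML directly.
        with zipfile.ZipFile(path) as z:
            # Text cells store an index into the shared strings part.
            shared = []
            if "xl/sharedStrings.xml" in z.namelist():
                root = ET.fromstring(z.read("xl/sharedStrings.xml"))
                shared = ["".join(t.text or "" for t in si.iter(M + "t"))
                          for si in root.iter(M + "si")]
            rows = []
            root = ET.fromstring(z.read("xl/worksheets/sheet1.xml"))
            for row in root.iter(M + "row"):
                values = []
                for cell in row.iter(M + "c"):
                    v = cell.find(M + "v")
                    raw = v.text if v is not None else None
                    if cell.get("t") == "s" and raw is not None:
                        raw = shared[int(raw)]   # resolve shared-string index
                    values.append(raw)
                rows.append(values)
            return rows

    for r in read_first_sheet("report.xlsx"):
        print(r)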

Azure Data Factory - Recording file name when reading all files in folder from Azure Blob Storage

I have a set of CSV files stored in Azure Blob Storage. I am reading the files into a database table using the Copy Data task. The Source is set as the folder where the files reside, so it's grabbing each file and loading it into the database. The issue is that I can't seem to map the file name in order to read it into a column. I'm sure there are more complicated ways to do it, for instance first reading the metadata and then reading the files in a loop, but surely the file metadata should be available to use while traversing the files?
Thanks
This is not possible in a regular Copy activity. Mapping Data Flows has this possibility; it's still in preview, but maybe it can help you out. If you check the documentation, you'll find an option to specify a column to store the file name.
In the Data Flow source settings, the property is called "Column to store file name".
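For the "more complicated way" the question mentions (enumerating the files yourself and carrying the name along), here is a rough sketch outside ADF using the azure-storage-blob and pandas packages; the connection string, container and folder names are placeholders:

    import io
    import pandas as pd
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob pandas

    # Placeholder connection details; substitute your own storage account, container and folder.
    CONN_STR = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
    container = BlobServiceClient.from_connection_string(CONN_STR).get_container_client("input")

    frames = []
    for blob in container.list_blobs(name_starts_with="csv-folder/"):
        data = container.download_blob(blob.name).readall()
        df = pd.read_csv(io.BytesIO(data))
        df["source_file"] = blob.name        # keep the originating file name as a column
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)
    print(combined.head())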

where to convert .xls file to json file inside Nifi data flow?

I am new to NiFi, but I want to make a flow in NiFi in which I take an .xls file from an FTP point, convert it to a JSON file, and put it on a WebSocket server. But there is no processor for Excel. It would be a great help if someone could let me know how to do it.
NIFI-2613: this feature is in progress.
In the meantime:
First, try the XLS to CSV option; basically, use the provided script to convert the .xls file to .csv.
Then, follow NiFi CSV to JSON to achieve your goal.
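For the conversion step itself, a tiny Python sketch using pandas (my assumption, not the script from the linked answer) that NiFi could wrap, for example with an ExecuteStreamCommand processor, might look like this:

    import sys
    import pandas as pd  # pip install pandas xlrd

    # Read an .xls workbook from stdin and emit JSON records on stdout, so the script can
    # sit between the FTP fetch and the WebSocket sink. Sheet choice and typing are kept simple.
    df = pd.read_excel(sys.stdin.buffer, sheet_name=0)
    sys.stdout.write(df.to_json(orient="records"))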

How to load outlook item (.msg) file format attachment to hive table?

First of all, I am using Microsoft Azure HDInsight (Hadoop).
I have .msg file attachments (the mail message format for Outlook).
I have already uploaded them to my blob storage, but I cannot load them into the table that I have created. Is there a way I can load them into the existing table? Any advice will help. Thank you so much in advance.
Hive does not understand the .msg format, so you will have to read it as a string and then write a query on it that reads from the blob store and inserts it into the table. You can either use a space as the delimiter and/or write your own custom extractor in Java or Python to insert it into the table.
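As one possible shape for such an extractor (a sketch only, assuming the third-party extract_msg package, which is not part of Hive or HDInsight), you could flatten each .msg into a delimited line before loading it:

    import csv
    import sys
    import extract_msg  # pip install extract-msg (third-party library, assumed here)

    # Flatten each Outlook .msg file given on the command line into one tab-delimited row,
    # which a Hive table declared with FIELDS TERMINATED BY '\t' could then read.
    writer = csv.writer(sys.stdout, delimiter="\t")
    for path in sys.argv[1:]:
        msg = extract_msg.Message(path)
        body = (msg.body or "").replace("\t", " ").replace("\r", " ").replace("\n", " ")
        writer.writerow([msg.date, msg.sender, msg.to, msg.subject, body])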

How to open spss data files in Excel?

I want to open SPSS .sav data files in Excel without opening the SPSS files (I don't want to convert the SPSS data file into an Excel file). I know this is possible using an OLE DB connection, but I don't know how to do this.
I converted sav to csv online: http://pspp.benpfaff.org/
(Not exactly an answer for you, since you want to avoid opening the files, but maybe this helps others.)
I have been using the open source GNU PSPP package to convert the .sav file to .csv. You can download the Windows version, at least, from SourceForge [1]. Once you have the software, you can convert a .sav file to .csv with the following command line:
pspp-convert <input.sav> <output.csv>
[1] http://sourceforge.net/projects/pspp4windows/files/?source=navbar
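If a scripted conversion is acceptable, pandas can also read .sav files through the pyreadstat package (the file names below are hypothetical):

    import pandas as pd  # pip install pandas pyreadstat openpyxl

    # Read the SPSS file and write something Excel can open directly.
    df = pd.read_spss("survey.sav")
    df.to_excel("survey.xlsx", index=False)   # or df.to_csv("survey.csv", index=False)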
In order to download that driver you must have a license to SPSS. For those who do not, there is an open source tool that is very much like SPSS and will allow you to import SAV files and export them to CSV.
Here's the software
And here are the steps to export the data.
I help develop the Colectica for Excel addin, which opens SPSS and Stata data files in Excel. This does not require ODBC configuration; it reads the file and then inserts the data and metadata into your worksheet.
The addin is downloadable from
http://www.colectica.com/software/colecticaforexcel
You can do it via ODBC. The steps are:
Install the IBM SPSS Statistics Data File Driver. The standalone driver is enough.
Create a DSN via the ODBC manager.
Use the data importer in Excel via ODBC, selecting the created DSN.
You can use an online converter, developed by me, at N'counter.
This is the easiest way to open SPSS file in Excel.
1) You just have to upload your file to SPSS coN'verter at https://secure.ncounter.de/SpssConverter
2) Select some options
3) And your converted Excel file will be downloaded
No information about your file contents is retained on our server. The file travels to our server, is converted in-memory, and is immediately discarded: We don't peer into your data at any time!
I tried the below and it worked well:
Install Dimensions Data Model and OLE DB Access
and follow the steps below in Excel:
Data -> Get External Data -> From Other Sources -> From Data Connection Wizard -> Other/Advanced -> SPSS MR DM-2 OLE DB Provider -> Metadata type as SPSS File (SAV) -> SPSS data file in Metadata Location -> Finish
