Error while reading CSV file using Apache POI - apache-poi

I am trying to read a CSV file from a local drive using the Apache POI API.
FileInputStream fInputStream = new FileInputStream(inputName);
Workbook workBook = WorkbookFactory.create(fInputStream);
When I try to read a CSV file, which is created in Windows, the API reads it perfectly.
However, when I take a CSV file that was downloaded from a UNIX environment and read it in a Windows environment, I get the exception below.
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:81)
Can somebody shed some light on this behavior?

Related

Can't read an .xlsx file with [BlobInput]

I'm trying to read an .xlsx file from blob storage but the only option I have is to read it as a string from the binding parameter.
[BlobInput("templates/myTemplate.xlsx", Connection = "StorageAccountConnStr")] string template
To load the .xlsx file I need to make a MemoryStream. Thus I wrote:
var templateBytes = Encoding.Unicode.GetBytes(template);
var templateStream = new MemoryStream(templateBytes);
It fails and tells me the file might be corrupt.
Any ideas how to properly read an .xlsx file from blob storage as an input?
It turns out that, besides string, byte[] is also supported as the binding parameter type.
With byte[] I was able to read and open my file. The Azure documentation does not mention this yet.
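The corruption described above is not specific to C#: binary data generally does not survive being decoded as text and re-encoded. The following Java sketch reproduces the effect under two assumptions that are not confirmed by the thread: that the string binding decodes the blob as UTF-8, and that re-encoding happens as UTF-16LE (which is what .NET's Encoding.Unicode uses). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryAsStringDemo {

    // Decodes raw bytes as UTF-8 text (assumed binding behavior), then
    // re-encodes the text as UTF-16LE, mimicking Encoding.Unicode.GetBytes.
    static byte[] roundTripThroughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // ZIP signature (an .xlsx file starts with it) followed by two
        // arbitrary bytes that are not valid UTF-8.
        byte[] original = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0xFE};
        byte[] mangled = roundTripThroughString(original);
        System.out.println("same bytes: " + Arrays.equals(original, mangled));
    }
}
```

The round trip both replaces the invalid bytes and changes the length of the data, which is why binding directly to a byte array avoids the problem.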

Talend 7.1 tFileOutputExcel corrupt file

I'm trying to output an excel file from Talend 7.1. I've tried a few different setups and both xls and xlsx formats but they all result in the output file being corrupt and not being able to open it.
What am I doing wrong? I am loading an xlsx file into a database and this part works fine but outputting to excel I just can't figure it out! I was writing from the tMap directly to the tFileOutputExcel and it wasn't working (corrupt) so I changed it to write to a csv file first and then write that csv to the tFileOutputExcel but it is still corrupt.
This is my job detail:
And these are the settings in the tFileOutputExcel:
I got this working by changing the transfer mode in the FTP component from 'ascii' to 'binary'. Such a simple thing but if this helps anyone else with this issue who is a newb like me :)
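Why ASCII mode corrupts a spreadsheet can be shown with a small simulation. ASCII-mode transfers may translate line endings between platforms, and a binary file contains bytes that merely look like line endings. The Java sketch below is a simplified model, assuming a translation that turns every bare LF into CR LF; the exact behavior depends on the FTP client and server, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class AsciiTransferDemo {

    // Simulates an ASCII-mode newline translation: every LF byte (0x0A)
    // that is not already preceded by CR (0x0D) becomes CR LF.
    static byte[] translateLfToCrLf(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < data.length; i++) {
            boolean bareLf = data[i] == 0x0A && (i == 0 || data[i - 1] != 0x0D);
            if (bareLf) {
                out.write(0x0D);
            }
            out.write(data[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // ZIP signature followed by binary data that happens to contain 0x0A.
        byte[] binary = {0x50, 0x4B, 0x03, 0x04, 0x0A, 0x00, 0x0A};
        byte[] translated = translateLfToCrLf(binary);
        System.out.println(binary.length + " bytes became " + translated.length);
    }
}
```

Any inserted or removed byte shifts the offsets stored inside the file, so the spreadsheet can no longer be parsed. Binary mode transfers the bytes unchanged.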

Apache POI appending data to xlsx file when task ran twice

I have a template.xls file that I'm adding data to from some database queries. I add the data and generate a new file named yyyyMMddHHmmss.xls. This works great. The file size is getting large so I'm trying to do the same with an xlsx file. When I generate the file the first time it works great. If I run the process again (even if I restart my java app) it's somehow retaining the last file in memory and appending the data to that file. In both cases it's pulling the source file from template.xls(x) which is an unmodified file.
The code between the two is identical except I'm passing in xlsx instead of xls in the latter case.
ClassLoader classLoader = getClass().getClassLoader();
File file = new File(Objects.requireNonNull(classLoader.getResource("template.xlsx")).getFile());
Workbook workbook = WorkbookFactory.create(file);
// write data
Date date = new Date();
SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMddHHmmss");
String currentDate = formatter.format(date);
FileOutputStream fileOutputStream = new FileOutputStream(currentDate + ".xlsx");
workbook.write(fileOutputStream);
fileOutputStream.close();
workbook.close();
I'm using Java 8u201 and org.apache.poi:poi:4.1.0 (also tried 4.0.1)
As already explained in "Apache POI - FileInputStream works, File object fails (NullPointerException)", creating an XSSFWorkbook from a File has the disadvantage that all changes made in that workbook are always stored back into that file when XSSFWorkbook.write is called. This is true even if write writes to another file. Writing explicitly to the same file is not even possible, because the File stays open after the workbook is created, so writing into that same file leads to exceptions.
So creating a XSSFWorkbook from a File using
Workbook workbook = WorkbookFactory.create(file);
is not a good idea when file is a *.xlsx file. Instead, the Workbook needs to be created using a FileInputStream:
Workbook workbook = WorkbookFactory.create(new FileInputStream(file));
Although the linked SO Q/A is from 2017, the same problem still occurs today using Apache POI 4.1.0.

What .xlsx file format is this?

Using an existing SSIS package, I was trying to import .xlsx files we received from a client. I received the error message:
External table is not in the expected format
These files will open in XL
When I use XL (currently XL2010) to Save As... the file without making any changes:
The new file imports just fine
The new file is 330% the size of the original file
When changing .xlsx to .zip and investigating the contents with WinZip:
The original file only has 4 .xml files and a _rels folder (with 2 .rels files):
The new file has the expected .xlsx contents:
Does anyone know what kind of file this could be?
It would be nice to develop my SSIS package to work with these original files, without having to open and re-save each file. There are only 12 files, so if there are no other options, opening/saving each file is not that big of deal...and I could automate it with VBA going forward.
Thanks for any help anyone can provide,
CTB
There are many Excel file formats.
The file you are trying to import may be in another Excel format with the extension changed to .xlsx (it could have been edited by someone else), or it could have been created with a different Excel version.
There is a third-party application called TrIDNet File Identifier, a utility designed to identify file types from their binary signatures. You can use it to determine the real format of the file.
Also, a quick search on "External table is not in the expected format" shows that this error is thrown when the Excel version declared in the connection string differs from that of the selected file. Check the connection string used in the Excel connection manager; it might help to identify the version of the file.
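Identifying a file by its binary signature can also be sketched in a few lines of Java. This is a minimal illustration and not a substitute for a dedicated identification tool: it only distinguishes the OLE2 signature (used by .xls) from the ZIP signature (used by .xlsx and every other ZIP-based format). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class SpreadsheetSniffer {

    // OLE2 compound document signature (.xls, .doc, .ppt, ...).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (.xlsx, .docx, and any other ZIP).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String classify(byte[] header) {
        if (startsWith(header, OLE2)) return "OLE2";
        if (startsWith(header, ZIP)) return "ZIP";
        return "UNKNOWN"; // plain text such as CSV carries no signature
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static String classifyFile(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) break;
                total += n;
            }
        }
        return classify(Arrays.copyOf(header, total));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SpreadsheetSniffer <file>");
            return;
        }
        System.out.println(classifyFile(Paths.get(args[0])));
    }
}
```

A result of ZIP only says the container is a ZIP archive; whether the parts inside form a workbook that a given driver accepts still has to be checked separately.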

Apache POI - How to write XSSFWorkbook to POIFSFileSystem?

Using Apache POI HSSF, we can create an .xls file like this:
private void write(HSSFWorkbook workbook) throws IOException {
    POIFSFileSystem filesystem = new POIFSFileSystem();
    filesystem.createDocument(new ByteArrayInputStream(workbook.getBytes()), "Workbook");
    // try-with-resources closes the output stream even if writing fails
    try (FileOutputStream stream = new FileOutputStream("test.xls")) {
        filesystem.writeFilesystem(stream);
    }
}
Similarly, how can I write with XSSFWorkbook? This does not have the getBytes() method.
I tried to create ByteArrayInputStream from XSSFWorkbook like this -
ByteArrayOutputStream baos = new ByteArrayOutputStream();
workbook.write(baos); //XSSFWorkbook here
ByteArrayInputStream bias = new ByteArrayInputStream(baos.toByteArray());
But the xlsx file created was corrupt. How can I write the workbook to disc using POIFSFileSystem?
The same XSSFWorkbook was written successfully when I did it like this:
FileOutputStream stream = new FileOutputStream("test.xlsx");
workbook.write(stream);
When I extracted and compared the contents of the xlsx files, there was no difference. However, when I do a plain-text compare on the xlsx files directly (without extracting), there are a few differences in the bytes.
So the problem should be in the createDocument() and/or writeFilesystem() methods of POIFSFileSystem. Can someone let me know how to write XSSFWorkbook using POIFSFileSystem?
You can't!
POIFSFileSystem works with OLE2 files, such as .xls, .doc, .ppt, .msg etc. The POIFS code handles reading and writing the individual streams within that for you.
With the OOXML files (.xlsx, .docx, .pptx etc), the container for the file is no longer OLE2. Instead, the files are stored within a Zip container. In POI, this is handled by OPCPackage, which takes care of reading and writing from Zip files with the required OOXML metadata.
If you want to write an XSSF file to disk, simply do:
FileOutputStream stream = new FileOutputStream("test.xlsx");
workbook.write(stream);
stream.close();
And XSSFWorkbook will handle talking to OPCPackage for you to make that happen.
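That an OOXML file is just a ZIP container can be checked with the JDK alone, without POI. The sketch below lists the entry names of a ZIP stream. To stay self-contained it builds a tiny stand-in archive in memory; that archive only borrows part names typical of an .xlsx file and is not a valid workbook. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipPartLister {

    // Returns the entry names of a ZIP stream in the order they are stored.
    static List<String> listParts(InputStream in) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a stand-in archive with placeholder content (NOT a real workbook).
    static byte[] buildSampleArchive() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        String[] partNames = {"[Content_Types].xml", "_rels/.rels", "xl/workbook.xml"};
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (String name : partNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = buildSampleArchive();
        System.out.println(listParts(new ByteArrayInputStream(archive)));
    }
}
```

Passing a FileInputStream for a real .xlsx file to listParts should print its actual parts, which is a quick way to compare two files that behave differently.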
