Reading and writing an Excel file in MarkLogic with Apache POI

We need to write an Excel file to MarkLogic and read it back, but we get an exception when reading the file from MarkLogic.
We want to pass the retrieved file to the XSSFWorkbook class provided by Apache POI.
I tried the code below to write the Excel file to MarkLogic:
DatabaseClient client = databaseClientService.getContentClient();
String contains = new String(Files.readAllBytes(Paths.get("src/test/resources/TestExcelEntity.xlsx")));
BytesHandle bytesHandle = new BytesHandle();
bytesHandle.setMimetype("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
bytesHandle.setFormat(Format.BINARY);
bytesHandle.set(contains.getBytes());
BinaryDocumentManager manager = client.newBinaryDocumentManager();
manager.writeAs("/test/binaryDoc.xlsx", bytesHandle);
Code to read the binary Excel file:
FileHandle fileHandle = new FileHandle();
fileHandle.setMimetype("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
fileHandle.setFormat(Format.BINARY);
File file = manager.read("/test/binaryDoc.xlsx", fileHandle).get();
XSSFWorkbook workbook = new XSSFWorkbook(file);
I can see the downloaded file in a temp location, but when I open it, Excel reports "The file is corrupted and can not be open". I see the same error message when I download the document from QConsole.
Because "/test/binaryDoc.xlsx" is not being downloaded/read correctly, the XSSFWorkbook constructor fails with this exception:
org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file input stream from file: 'C:\Users\SHIVLI~1\AppData\Local\Temp\tmp9485717536946276215.vnd.openxmlformats-officedocument.spreadsheetml.sheet'
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:162)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:149)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:277)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:186)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:325)
at com.ucbos.appdata.MLSample.test(MLSample.java:55)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Caused by: java.io.FileNotFoundException: C:\Users\SHIVLI~1\AppData\Local\Temp\tmp9485717536946276215.vnd.openxmlformats-officedocument.spreadsheetml.sheet (The system cannot find the file specified)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
at org.apache.poi.openxml4j.opc.ZipPackage.openZipEntrySourceStream(ZipPackage.java:159)
... 35 more
Update: I tried using BytesHandle to read the document as a byte[] and then writing it to the file system, but I still get the same error: "The file is corrupted and can not be open".
BytesHandle readHandle = new BytesHandle();
readHandle.setMimetype("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
readHandle.setFormat(Format.BINARY);
readHandle.set(BYTES_BINARY);
byte[] file = manager.read("/test/binaryDoc.xlsx", readHandle).get();
File outputFile = new File("outputFile.xlsx");
OutputStream os = new FileOutputStream(outputFile);
os.write(file);
os.close();
The Excel file does get saved to the file system.
I can't tell which step I am getting wrong here.
Could anyone help me resolve this issue?

From the description, the issue seems to be that retrieving the document and writing it to the OS is not working correctly, since the result is a corrupted file. I'm not a Java developer, but it appears that you are trying to access the document as if it were a regular document and not a binary. For binaries, it appears you need to either stream the binary file or buffer it with com.marklogic.client.io.BytesHandle.
The documentation section "Reading Content From A Binary Document" shows several examples. The following one looks closest to what you are trying to do:
byte[] buf = docMgr.read(docID, new BytesHandle()).get();
I would also suggest eliminating passing the document to XSSFWorkbook.java until you can verify that you are saving valid files to the temp location, to simplify the troubleshooting process.
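One way to do that verification without POI is to check that the saved file is a readable ZIP container, since an .xlsx file is a ZIP package that carries a [Content_Types].xml entry. The following is a minimal sketch using only java.util.zip; the class name, the method names, and the temp files are hypothetical and stand in for the file downloaded from MarkLogic.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class XlsxContainerCheck {

    // Returns true if the file opens as a ZIP archive and contains the
    // [Content_Types].xml entry found in OOXML packages.
    static boolean looksLikeOoxml(Path file) {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            return zip.getEntry("[Content_Types].xml") != null;
        } catch (IOException e) {
            // ZipException (an IOException) is thrown for a damaged or non-ZIP file.
            return false;
        }
    }

    // Hypothetical helper: writes a tiny ZIP with the expected entry,
    // standing in for a correctly downloaded workbook.
    static void writeMinimalPackage(Path target) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            out.putNextEntry(new ZipEntry("[Content_Types].xml"));
            out.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempFile("good", ".xlsx");
        writeMinimalPackage(good);

        // Stand-in for a corrupted download: arbitrary bytes that are not a ZIP.
        Path bad = Files.createTempFile("bad", ".xlsx");
        Files.write(bad, new byte[] {1, 2, 3, 4});

        System.out.println("good file is a package: " + looksLikeOoxml(good));
        System.out.println("bad file is a package:  " + looksLikeOoxml(bad));
    }
}
```

If this check fails on the file written to the temp location, the problem lies in the write or read path rather than in XSSFWorkbook.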

If you just want to read and write the xlsx file, read it as a stream of bytes instead of reading the binary file into a String, and pass that stream to an InputStreamHandle:
InputStreamHandle handle = new InputStreamHandle();
handle.set(docStream);
docMgr.write(uri, handle);
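Why the String step matters can be shown with plain Java, independent of MarkLogic. The sketch below mimics the question's `new String(bytes)` followed by `getBytes()` on a few sample bytes. It pins UTF-8 so the result is reproducible, whereas the question's code uses the platform default charset; the class name and the sample bytes are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTrip {

    // Mimics the question's write path: bytes -> String -> bytes.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ZIP signature "PK\3\4" followed by bytes that are not valid UTF-8.
        byte[] raw = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, (byte) 0xFE, (byte) 0x8F};
        byte[] roundTripped = viaString(raw);

        System.out.println("original length:      " + raw.length);
        System.out.println("round-tripped length: " + roundTripped.length);
        System.out.println("identical:            " + Arrays.equals(raw, roundTripped));
    }
}
```

Bytes that do not decode to a character are replaced during decoding, so the array that comes back differs from the one that went in. Passing the byte[] from Files.readAllBytes directly to the handle, or streaming it as above, avoids that step.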
Please verify the written data, the control flow, and the conditions before manipulating the document further.
Validation options:
Use Java's byte-level I/O in a test to assert that the input was written without loss, for example by comparing byte counts:
> Task :fc-financial-asset:TypedWriteReadStreamTest.main()
Document /dmsdk/FXD.xlsx write completed.
Assert /dmsdk/FXD.xlsx Input Stream and File BYTE –
InputStream /dmsdk/FXD.xlsx bytes:
11614
Calculate /dmsdk/FXD.xlsx byte array:
11614
Read /dmsdk/FXD.xlsx file bytes:
11614
Rename tmp*****.spreadsheetml.sheet to tmp*****.spreadsheetml.xlsx; you should then be able to open it as a valid Excel file.
Save or validate the document from QConsole.

Related

Can't read an .xlsx file with [BlobInput]

I'm trying to read an .xlsx file from blob storage but the only option I have is to read it as a string from the binding parameter.
[BlobInput("templates/myTemplate.xlsx", Connection = "StorageAccountConnStr")] string template
To load the .xlsx file I need to make a MemoryStream. Thus I wrote:
var templateBytes = Encoding.Unicode.GetBytes(template);
var templateStream = new MemoryStream(templateBytes);
It fails and tells me the file might be corrupt.
Any ideas how to properly read an .xlsx file from blob storage as an input?
It turns out that byte[] is supported in addition to string.
With that, I was able to read and open my file. The Azure documentation does not mention it yet.

AmazonClientException when uploading file with spring-integration-aws S3MessageHandler

I have configured an S3MessageHandler from spring-integration-aws to upload a File object to S3.
The upload fails with the following trace:
Caused by: com.amazonaws.AmazonClientException: Data read has a different length than the expected: dataLength=0; expectedLength=26; includeSkipped=false; in.getClass()=class com.amazonaws.internal.ResettableInputStream; markedSupported=true; marked=0; resetSinceLastMarked=false; markCount=1; resetCount=0
at com.amazonaws.util.LengthCheckInputStream.checkLength(LengthCheckInputStream.java:152)
...
Looking at the source code for S3MessageHandler, I'm not sure how uploading a File would ever succeed. The s3MessageHandler.upload() method does the following when I trace its execution:
Creates a FileInputStream for the File.
Computes the MD5 hash for the file contents, using the input stream.
Resets the stream if it can be reset (not possible for FileInputStream).
Sets up the S3 transfer using the input stream. This fails because the stream is at the EOF, so the number of transferable bytes doesn't match what's in the Content-Length header.
Am I missing something, or is this a bug in the message handler?
Yes; it's a bug; please open an Issue in GitHub and/or a JIRA Issue.
For FileInputStream a new one should be created, for InputStream payloads, we need to assert that markSupported() is true if MD5 consumes the stream.
Consider Contributing a fix after "signing" the CLA.
EDIT
I opened JIRA Issue INTEXT-225.
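The failure mode described in the question can be reproduced with the JDK alone. This is a minimal sketch, not the S3MessageHandler code: it hashes a FileInputStream to the end, shows that the same stream then has nothing left to transfer, and opens a new stream as the answer suggests. The class name, the helper, and the 26-byte temp file (chosen to match expectedLength in the trace) are assumptions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

class FreshStreamAfterDigest {

    // Reads the stream to EOF while updating the digest; returns the byte count.
    static long drain(InputStream in, MessageDigest digest) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("payload", ".bin");
        Files.write(file, new byte[26]);

        try (InputStream first = new FileInputStream(file.toFile())) {
            // FileInputStream cannot be rewound with mark/reset.
            System.out.println("markSupported: " + first.markSupported());
            long hashed = drain(first, MessageDigest.getInstance("MD5"));
            System.out.println("bytes hashed: " + hashed);
            // The stream is now at EOF, so a transfer from it would see no data.
            System.out.println("next read on same stream: " + first.read());
        }

        // A new stream over the same file starts at the beginning again.
        try (InputStream second = new FileInputStream(file.toFile())) {
            long available = drain(second, MessageDigest.getInstance("MD5"));
            System.out.println("bytes readable from fresh stream: " + available);
        }
    }
}
```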

Error while reading CSV file using Apache POI

I am trying to read a CSV file from a local drive using the Apache POI API.
FileInputStream fInputStream = new FileInputStream(inputName);
Workbook workBook = WorkbookFactory.create(fInputStream);
When I read a CSV file that was created on Windows, the API reads it perfectly.
However, when I read a CSV file that was downloaded from a UNIX environment on Windows, I get the exception below.
java.lang.IllegalArgumentException: Your InputStream was neither an OLE2 stream, nor an OOXML stream
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:81)
Can somebody shed some light on this behavior?
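The exception message refers to two container formats, each of which starts with a fixed byte signature, while a CSV file is plain text and has neither. As a diagnostic sketch only (this is not POI's own detection code, and the class and method names are hypothetical), the leading bytes of a file can be classified like this:

```java
import java.nio.charset.StandardCharsets;

class WorkbookSignature {

    // OLE2 compound-file signature, used by binary .xls files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature, used by OOXML files such as .xlsx.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies the first bytes of a file by signature.
    static String classify(byte[] head) {
        if (startsWith(head, OLE2)) {
            return "OLE2";
        }
        if (startsWith(head, ZIP)) {
            return "ZIP";
        }
        return "neither";
    }

    public static void main(String[] args) {
        byte[] csv = "id,name\n1,alpha\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("CSV text: " + classify(csv));
        System.out.println("ZIP head: " + classify(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00}));
    }
}
```

Running the same check on the first bytes of both input files would show whether the file that POI accepts is really a text CSV or a workbook with a .csv name.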

Tclsh convert base64 dump into zip file

I have written a Tcl script that fetches the content of a zip file in base64 format through an XML-RPC method. I am dumping that base64 data into a file using the following snippet:
#!/usr/bin/tclsh
...
set mybase64Dump [myXmlRpcCallToReturnThisDump]
set zipFilePtr [open "xyz.zip" "w"]
puts $zipFilePtr $mybase64Dump
close $zipFilePtr
The zip file is generated with a size of X KB, but when I try to open it with 7-Zip, it says the file is not an archive. However, when I pasted the same base64 dump into an online converter, it gave me a proper, extractable zip file.
Am I doing something wrong?
You probably need to configure the output file to be binary, not ascii. The default translation for a newly opened file is "auto", which does system-specific translation of the end-of-line characters, which is not what you want for a .zip file. Configure this using fconfigure on the handle after opening it or by adding the BINARY access flag to the open command.
See http://www.tcl.tk/man/tcl8.5/TclCmd/open.htm and http://www.tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm for details on the syntax.

How to get the content from an uncompressed .docx file by using ifstream.open() and read() methods. office open xml, vc++

I'm trying to build a suffix tree from a .docx file.
First I unzipped the .docx file and then created a .docx file again without compressing it, using the ICSharpCode.SharpZipLib.Zip.ZipOutputStream.SetLevel(0) method in C#.
This uncompressed .docx file can be opened without any error.
For the next step I used VC++. I tried to open the file with ifstream.open("uncompressed.docx", ios::binary) and store its content in a char array with ifstream.read((char *)T, MAX_LENGTH - 1). But I could not get the actual content of the uncompressed.docx file. When I printed the content of the char array (T), it showed some formatting tags rather than the actual text content of uncompressed.docx.
I could not figure out which file the ifstream.open() method actually opens. It is not the document.xml file.
Please tell me how to get the actual text content from the uncompressed.docx file using VC++.

Resources