Paragraph styles breaking formatting in POI PDF converter

I'm using the POI/XDocReport PDF converter on a Tomcat server. The implementation in code is very simple:
InputStream doc = new FileInputStream(new File(documentPath));
XWPFDocument document = new XWPFDocument(doc);
PdfOptions options = PdfOptions.getDefault();
File pdfFile = new File(pdfFilePath);
OutputStream out = new FileOutputStream(pdfFile);
PdfConverter.getInstance().convert(document, out, options);
While it works fine for most docx documents, when the document uses heading paragraph styles, the text overlaps when wrapping (the heading is forced onto a single row), rendering the PDF practically unreadable. The versions are:
fr.opensagres.poi.xwpf.converter.pdf-2.0.4
xdocreport version 2.0.4
poi 5.2.2.
Any help very much appreciated!
Source docx:
Resulting pdf:

Related

How to add image file as Ole object in excel using java

I tried with Aspose.Cells.
I am able to add a PDF file properly.
But when I add a JPG file, it shows in the Excel file but does not open.
I tried the following:
sheet.getOleObjects().get(oleObjectIndex).setImageData(binary);
sheet.getOleObjects().get(oleObjectIndex).setLeftCM(oleObjectIndex);
sheet.getOleObjects().get(oleObjectIndex).setDisplayAsIcon(true);
Here the image is shown like a thumbnail, but I don't want that.
sheet.getOleObjects().get(oleObjectIndex).setObjectData(binary);
//sheet.getOleObjects().get(oleObjectIndex).setFileType(oleFileType);
sheet.getOleObjects().get(oleObjectIndex).setDisplayAsIcon(true);
sheet.getOleObjects().get(oleObjectIndex).setLeftCM(oleObjectIndex);
Here it shows the proper icon for the file, but the file does not open when double-clicked.
Help from the community is highly appreciated.
Thank you.
See the following two lines of code that you also need to add:
sheet.getOleObjects().get(oleObjectIndex).setFileFormatType(FileFormatType.UNKNOWN);
sheet.getOleObjects().get(oleObjectIndex).setObjectSourceFullName(path);
Here is the complete sample code that I tested; it works fine:
// Get the image file.
String path = "e:\\test\\myfile.jpg";
File file = new File(path);
// Read the whole picture into a byte array.
// (A single FileInputStream.read(byte[]) call is not guaranteed to fill the array.)
byte[] img = java.nio.file.Files.readAllBytes(file.toPath());
// Instantiate a new Workbook.
Workbook wb = new Workbook();
// Get the first worksheet.
Worksheet sheet = wb.getWorksheets().get(0);
// Add an Ole object into the worksheet with the image shown in MS Excel.
int oleObjIndex = sheet.getOleObjects().add(14, 3, 200, 220, img);
OleObject oleObj = sheet.getOleObjects().get(oleObjIndex);
// Set embedded ole object data and other attributes.
oleObj.setObjectData(img);
oleObj.setDisplayAsIcon(true);
oleObj.setFileFormatType(com.aspose.cells.FileFormatType.UNKNOWN);
oleObj.setObjectSourceFullName(path);
// Save the excel file
wb.save("f:\\files\\tstoleobject1.xlsx");
Hope this helps a bit.
PS. I am working as Support developer/ Evangelist at Aspose.
I used the same code, but when I open the new Excel file, the object shows as an image; I have to click it twice before it turns into an OLE object.
What's wrong?

Excel and Libre Office conflict over Open XML output

Open XML is generating .xlsx files that can be read by Open Office, but not by Excel itself.
With this as my starting point (Export DataTable to Excel with Open Xml SDK in c#), I have added code to create a .xlsx file. When I attempt to open it with Excel, I'm asked if I want to repair the file. Saying yes gets "The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt." After many hours of trying to jiggle the data from my table to make this work, I finally threw up my hands in despair and made a spreadsheet with a single number in the first cell.
Still corrupt.
Renaming it to .zip and exploring shows intact .xml files. On a whim, I took a legitimate .xlsx file created by Excel, unzipped it, rezipped it without changing the contents, and renamed it back to .xlsx. Excel declared it corrupt. So this is clearly not a content issue but a file format issue. Giving up on Friday, I sent some of the sample files home and opened them there with Libre Office. There were no issues at all: the file content was correct and Calc had no problem. I'm using Excel for Office 365, 32-bit.
// ignore the bits (var list) that get data from the database. I've reduced this to just the output of a single header line
List<ReportFilingHistoryModel> list = DB.Reports.Report.GetReportClientsFullHistoryFiltered<ReportFilingHistoryModel>(search, client, report, signature);
MemoryStream memStream = new MemoryStream();
using (SpreadsheetDocument workbook = SpreadsheetDocument.Create(memStream, SpreadsheetDocumentType.Workbook))
{
var workbookPart = workbook.AddWorkbookPart();
workbook.WorkbookPart.Workbook = new Workbook();
workbook.WorkbookPart.Workbook.Sheets = new Sheets();
var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
var sheetData = new SheetData();
sheetPart.Worksheet = new Worksheet(sheetData);
Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
uint sheetId = 1;
if (sheets.Elements<Sheet>().Count() > 0)
{
sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "History" };
sheets.Append(sheet);
Row headerRow = new Row();
foreach( var s in "Foo|Bar".Split('|'))
{
var cell = new Cell();
cell.DataType = CellValues.Number;
cell.CellValue = new CellValue("5");
headerRow.AppendChild(cell);
}
sheetData.AppendChild(headerRow);
}
memStream.Seek(0, SeekOrigin.Begin);
Guid result = DB.Reports.Report.AddClientHistoryList( "test.xlsx", memStream.GetBuffer(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
return Ok(result);
This should just work. I've noticed other Stack Overflow discussions that direct back to the first link I mentioned above. I seem to be doing it right (and Calc concurs). There have been discussions of shared strings and whatnot, but by using plain numbers I shouldn't be having issues. What am I missing here?
In working on this, I went with the notion that some extraneous junk on the end of a .zip file is harmless. 7-Zip, Windows Explorer and Libre Office all seem to agree (as does some other zip program I used at home whose name escapes me). Excel, however, does not. Using the pointer at memStream.GetBuffer() was fine, but using its length was not. (The preceding Seek() was unnecessary.) Limiting the write of the data to a length equal to the current output position keeps Excel from going off the rails.
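The fix described above comes down to storing only the bytes that were actually written, not the whole backing buffer. A minimal Java sketch of that idea follows. The class and method names are hypothetical, and the padded array only simulates an over-allocated buffer like the one returned by memStream.GetBuffer(). In .NET itself, MemoryStream.ToArray() returns a copy limited to the written length, which may be the simplest replacement for GetBuffer() here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrimBufferDemo {

    // Returns a copy holding only the first `written` bytes of the backing buffer.
    public static byte[] trimToWritten(byte[] backingBuffer, int written) {
        return Arrays.copyOf(backingBuffer, written);
    }

    public static void main(String[] args) {
        // Simulated stream contents: a short payload inside a larger buffer.
        byte[] payload = "file contents".getBytes(StandardCharsets.US_ASCII);
        byte[] backingBuffer = new byte[256];
        System.arraycopy(payload, 0, backingBuffer, 0, payload.length);

        // Keep only what was written; the rest of the buffer is padding.
        byte[] exact = trimToWritten(backingBuffer, payload.length);

        System.out.println("buffer capacity: " + backingBuffer.length);
        System.out.println("bytes to store:  " + exact.length);
    }
}
```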

Groovy/Grails document download

I'm currently working on a web application using Grails. One of the requirements is to generate Excel timesheets and download them afterward.
This is my code for downloading from a Grails controller.
response.contentType = "application/vnd.ms-excel"
response.setHeader("Content-Disposition","attachment;filename=name.xls")
response.outputStream << wb.bytes
response.outputStream.flush()
But my Excel file is corrupted. I can open it using Open Office, but it doesn't work with Microsoft Office or Google Drive. It looks like the content of the xls file is not well formatted.
If I save the document instead of downloading it, everything is OK.
FileOutputStream fileOut = new FileOutputStream("name.xls")
wb.write(fileOut)
fileOut.close()
I cannot figure out why the file content is corrupted when downloaded as a byte array.
Grails version - 2.3.7
Apache poi version - 3.13
Thanks in advance.
Method code:
def generate(){
TimeSheetExportWrapper timeSheet = new TimeSheetExportWrapper()
bindData(timeSheet, params.ts)
HSSFWorkbook wb = excelExportService.createExcelTimeSheet(getCurrentTenant(), timeSheet, getCurrentTimezone())
response.contentType = "application/vnd.ms-excel"
response.setHeader("Content-Disposition", "attachment;filename=${timeSheet.proposedFileName}")
response.outputStream << wb.bytes
response.outputStream.flush()
}
There are a few things that you should be doing:
First, set the content length: response.setHeader("Content-Length", "${wb.bytes.length}")
Secondly, close the output: response.outputStream.close()
And finally, make sure you return null to ensure Grails does not attempt to render a view.
def generate(){
TimeSheetExportWrapper timeSheet = new TimeSheetExportWrapper()
bindData(timeSheet, params.ts)
HSSFWorkbook wb = excelExportService.createExcelTimeSheet(getCurrentTenant(), timeSheet, getCurrentTimezone())
response.contentType = "application/vnd.ms-excel"
response.setHeader("Content-Length", "${wb.bytes.length}")
response.setHeader("Content-Disposition", "attachment;filename=${timeSheet.proposedFileName}")
response.outputStream << wb.bytes
response.outputStream.flush()
response.outputStream.close()
return null
}
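A variation on the answer above, sketched in plain Java under stated assumptions. The question's working file-based code calls wb.write(fileOut), so serializing through write(OutputStream) into a byte array may be a safer source for both the response body and the Content-Length than wb.bytes (for HSSFWorkbook, getBytes() is documented as returning only the HSSF portion of the file rather than a complete .xls). The StreamWritable interface and DownloadBuffer class are hypothetical stand-ins so that the sketch runs without POI or a servlet container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class DownloadBuffer {

    // Hypothetical stand-in for anything exposing write(OutputStream),
    // such as the workbook in the question (wb.write(fileOut)).
    public interface StreamWritable {
        void write(OutputStream out) throws IOException;
    }

    // Serializes the document fully into memory so its exact length is known
    // before any response header is set.
    public static byte[] serialize(StreamWritable document) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            document.write(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Fake document that writes four bytes, standing in for a workbook.
        StreamWritable fakeWorkbook = out -> out.write(new byte[] {10, 20, 30, 40});
        byte[] body = serialize(fakeWorkbook);

        // In a controller, the length would go into the Content-Length header
        // first, then the same array would be written to response.outputStream.
        System.out.println("Content-Length: " + body.length);
        ByteArrayOutputStream fakeResponse = new ByteArrayOutputStream();
        fakeResponse.write(body);
        fakeResponse.flush();
    }
}
```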

Chart formatting lost when saving as xlsx in NPOI

I have a template xlsx called "book3.xlsx" that contains a pre-formatted chart. All I want to do is read my template xlsx, modify some cells, and write out a new xlsx ("test.xlsx").
A strange thing happens here: if I open the resulting xlsx, the format of the chart has changed. It has different coloring, an axis is missing, etc. When I used the xls file format it was fine, but we have to use xlsx. Has anyone noticed this issue?
var file = new FileStream(@"book3.xlsx", FileMode.Open, FileAccess.Read);
var workbook = new XSSFWorkbook(file);
using (var fs = new FileStream(@"test.xlsx", FileMode.Create))
{
workbook.Write(fs);
}

Store large hidden text/string to a PDF using iTextSharp

I want to store a large string somewhere hidden in a PDF document. Right now I have a hidden text field into which I write that text. The problem is that when the string size grows to 10 MB, I start getting OutOfMemory errors.
What is the best way to store a large hidden string/text in a PDF document using iTextSharp? It should also be possible to retrieve that text/string later.
Such private data can be stored in PieceInfo dictionaries, also cf. David's answer to the OP's follow-up question.
This answer to the older question "Insert hidden digest in pdf using iText library" shows how to make use of PieceInfo dictionaries in general using iText/Java (differences to iTextSharp/C# should be minimal here).
As the OP talks about data 10 MB and up, he may want to use PDF streams instead of strings.
The DocumentPieceInfo helper class provided in that older answer can be used with PDF streams for BIG DATA like this (again in Java as I'm mostly living on the Java side, and again porting to C# should be easy):
Storing document PieceInfo data
PdfName appName = new PdfName("MYAPP");
PdfName dataName = new PdfName("BigData");
DocumentPieceInfo dpi = new DocumentPieceInfo();
PdfReader reader = new PdfReader(...);
PdfStamper stamper = new PdfStamper(reader, ...);
InputStream in = ... BIG DATA INPUT STREAM ...;
PdfStream stream = new PdfStream(in, stamper.getWriter());
stream.flateCompress();
PdfIndirectObject ref = stamper.getWriter().addToBody(stream);
stream.writeLength();
in.close();
dpi.addPieceInfo(reader, appName, dataName, ref.getIndirectReference());
stamper.close();
Retrieving document PieceInfo data
PdfName appName = new PdfName("MYAPP");
PdfName dataName = new PdfName("BigData");
DocumentPieceInfo dpi = new DocumentPieceInfo();
PdfReader reader = new PdfReader("target/test-outputs/test-with-piece-info.pdf");
PdfObject myDataObject = dpi.getPieceInfo(reader, appName, dataName);
myDataObject = PdfReader.getPdfObject(myDataObject);
byte[] myData = PdfReader.getStreamBytes((PRStream)myDataObject);
