CSV generation possible with Apache POI? - apache-poi

I need to generate CSV files, and I stumbled on a module in our project that already uses Apache POI to generate Excel sheets. So I thought I could use the same library to generate CSV. I searched Google, but couldn't find anything confirming that Apache POI can be used for CSV file generation. I also checked the following API, and it only talks about xls sheets, not CSV. Any ideas?
http://poi.apache.org/apidocs/org/apache/poi/ss/usermodel/Workbook.html

Apache POI will not output to CSV for you. However, you have a couple of good options, depending on what kind of data you are writing into the CSV.
If you know that none of your cells will contain CSV special characters such as commas, quotes or line endings, you can loop through your data rows, copy the text into a StringBuffer and write that out with regular Java IO.
Here is an example of writing an SQL query result to CSV along those lines: Poi Mailing List: writing CSV
Otherwise, rather than figuring out how to escape the special characters yourself, you should check out the opencsv project.
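If you do write the CSV by hand, the escaping rule that libraries such as opencsv apply for you is small. The sketch below is a hand-rolled illustration in the style of RFC 4180, not opencsv's or POI's API (the names `CsvEscape`, `escape` and `toCsvLine` are hypothetical): a field is quoted only when it contains a comma, double quote, CR or LF, and embedded quotes are doubled.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvEscape {
    // Hypothetical helper: quote a field only if it contains a comma, quote, CR or LF;
    // embedded double quotes are doubled, as RFC 4180 describes
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-extracted cell texts of one row into a single CSV line
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvEscape::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of("plain", "a,b", "say \"hi\"")));
        // prints: plain,"a,b","say ""hi"""
    }
}
```

Each row's cell texts would come from your POI loop; the line is then written with an ordinary `Writer`, plus a line separator.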

If you check the official Apache POI website, you can find lots of examples there, including one that shows how to produce CSV-formatted output with Apache POI.
ToCSV example

Basic strategy:
1) Apache Commons CSV is a widely used library for writing CSV values.
2) But we need to loop through the Workbook ourselves and call Commons CSV's CSVPrinter on each cell value, with a newline at the end of each row. Unfortunately this is custom code; XSSF doesn't provide it out of the box. But it's easy:
// CSVPrinter writes to an Appendable such as a Writer (not a Reader);
// wrap an OutputStream in an OutputStreamWriter if that is what you have
Writer writer = Files.newBufferedWriter(Paths.get(SAMPLE_CSV_FILE_PATH));
CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT);
if (workbook != null) {
    XSSFSheet sheet = workbook.getSheetAt(0); // Sheet #0
    Iterator<Row> rowIterator = sheet.rowIterator();
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        // cellIterator() skips cells that were never created, so columns can shift
        Iterator<Cell> cellIterator = row.cellIterator();
        while (cellIterator.hasNext()) {
            Cell cell = cellIterator.next();
            // Call Commons CSV here to print; getStringCellValue() throws for numeric cells
            csvPrinter.print(cell.getStringCellValue());
        }
        // Newline after each row
        csvPrinter.println();
    }
}
// At the end, flush and close the CSVPrinter
csvPrinter.flush();
csvPrinter.close();

An improved and tested version of gene b's response is this:
/**
 * Saves all rows from a single Excel sheet in a workbook to a CSV file.
 *
 * @param excelWorkbook path to the Excel workbook.
 * @param sheetNumber sheet number to export.
 * @param csvFile CSV file path for output.
 * @throws IOException if the Excel file cannot be read or the CSV file cannot be created or written.
 */
public static void excelToCsv(String excelWorkbook, int sheetNumber, String csvFile) throws IOException {
    try (Workbook workbook = WorkbookFactory.create(new File(excelWorkbook), null, true); // Read-only: true
         BufferedWriter writer = new BufferedWriter(new FileWriter(csvFile));
         CSVPrinter csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT)) {
        Sheet sheet = workbook.getSheetAt(sheetNumber);
        DataFormatter format = new DataFormatter();
        for (Row row : sheet) {
            for (int c = 0; c < row.getLastCellNum(); c++) {
                // Null cells returned as blank
                Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
                String cellValue = format.formatCellValue(cell);
                csvPrinter.print(cellValue);
            }
            csvPrinter.println();
        }
        csvPrinter.flush();
    }
}
The following improvements were made:
NullPointerException won't be thrown if a cell in an Excel Row was never edited. A blank value will be written to the CSV instead.
Excel values are rendered using DataFormatter allowing the CSV to match the visual representation of the Excel sheet.
try-with-resources is used to close the file objects automatically.
The workbook is opened in read-only mode.

Related

How to automatically remove the @ from an Excel formula generated by POI setCellFormula

When I use setCellFormula with the parameter "CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)", the output is "=@CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)". The @ symbol makes the formula fail, and the resulting Excel file shows #VALUE.
How can I remove the @ automatically?
This is a similar problem to this one: Apache POI Excel Formula entering @ Symbols where they don't belong.
All new functions (introduced after Excel 2007) are prefixed with _xlfn in the internal file storage. The GUI does not show that prefix if the Excel version is able to interpret the function. If the Excel version is too old to interpret the function, you may see the prefix even in the GUI.
Apache POI creates Excel files, which is why it writes formulas directly in their file-storage form. Using:
cell.setCellFormula("CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)");
it writes CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5) into the file storage, but the Excel GUI expects _xlfn.CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5). That's why you get the #NAME? error.
But why the @? The @ is the implicit intersection operator, which is new in Excel 365 (a silly feature in my opinion, as are the dynamic array and spilling array behaviors). Because Excel 365 does not recognize the function CHISQ.TEST without the prefix, but the function takes arrays of cells as parameters, it puts @ in front of it to show that it would use implicit intersection if it knew the function.
So the solution is to put the correct prefix before the function name in file storage to make it work:
cell.setCellFormula("_xlfn.CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)");
Complete example to test:
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class CreateExcelCHISQ_TEST {
    public static void main(String[] args) throws Exception {
        try (Workbook workbook = new XSSFWorkbook();
             FileOutputStream fileout = new FileOutputStream("Excel.xlsx")) {
            Sheet sheet = workbook.createSheet();
            Row row;
            Cell cell;
            // Filling dummy data to another sheet
            Sheet otherSheet = workbook.createSheet("ChiSq_Data");
            row = otherSheet.createRow(4);
            row.createCell(3).setCellValue(123);
            row.createCell(4).setCellValue(456);
            row.createCell(5).setCellValue(78);
            row.createCell(11).setCellValue(122.5);
            row.createCell(12).setCellValue(456.5);
            row.createCell(13).setCellValue(77.5);
            row = sheet.createRow(0);
            cell = row.createCell(0);
            //cell.setCellFormula("CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)"); // wrong
            cell.setCellFormula("_xlfn.CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)");
            workbook.write(fileout);
        }
    }
}
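If formulas come from user input or configuration, the prefix can be added programmatically before calling setCellFormula. The helper below is hypothetical (`XlfnPrefixer.addXlfnPrefix` is not a POI method) and assumes the caller supplies the list of function names that need `_xlfn.`; it does not know Microsoft's full list. It prefixes a listed name only when it is directly followed by `(` and not already preceded by a name character or a dot, so applying it twice is harmless.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XlfnPrefixer {
    // Hypothetical helper: prefix each listed function name with "_xlfn." in a formula string
    static String addXlfnPrefix(String formula, List<String> newFunctions) {
        String result = formula;
        for (String fn : newFunctions) {
            // (?<![\w.]) : not preceded by a letter, digit, '_' or '.', so "_xlfn.CHISQ.TEST" is left alone
            // (?=\()     : only match a function call, not e.g. part of a sheet name
            Pattern p = Pattern.compile("(?<![\\w.])" + Pattern.quote(fn) + "(?=\\()",
                    Pattern.CASE_INSENSITIVE);
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement("_xlfn." + fn));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(addXlfnPrefix("CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)",
                List.of("CHISQ.TEST")));
        // prints: _xlfn.CHISQ.TEST(ChiSq_Data!D5:F5,ChiSq_Data!L5:N5)
    }
}
```

Usage would then be `cell.setCellFormula(XlfnPrefixer.addXlfnPrefix(formula, List.of("CHISQ.TEST")));`.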

Excel handling in jmeter to add multiple dynamic rows

I need to create an Excel file for an upload scenario in JMeter. The Excel file has 3 columns, and the number of rows is a dynamic value coming from a parameter file.
Row values must not repeat across different Excel files, so I am using random strings to create the data. With a hard-coded number of rows I am able to create the file using Apache POI with the code below, but I am having trouble handling a dynamic number of rows. Can somebody please provide a solution?
Below is the code, which works fine for creating 5 rows.
import org.apache.jmeter.services.FileServer
import org.apache.poi.hssf.usermodel.HSSFCell
import org.apache.poi.hssf.usermodel.HSSFRow
import org.apache.poi.hssf.usermodel.HSSFSheet
import org.apache.poi.hssf.usermodel.HSSFWorkbook

def path = FileServer.getFileServer().getBaseDir();
def separator = File.separator;
def sourceFileName = "CreateDynamicExcel";
HSSFWorkbook workbook = new HSSFWorkbook();
HSSFSheet sheet = workbook.createSheet("Billing");
Object[][] dataTypes = [
    ["Column1Header","Column2Header","Column3Header"],
    ["${__RandomString(10,abcdefghij,)}","${__Random(100000000,199999999,)}","${__RandomString(10,abcdefghijklmnopqrst,)}"],
    ["${__RandomString(10,abcdefghij,)}","${__Random(100000000,199999999,)}","${__RandomString(10,abcdefghijklmnopqrst,)}"],
    ["${__RandomString(10,abcdefghij,)}","${__Random(100000000,199999999,)}","${__RandomString(10,abcdefghijklmnopqrst,)}"],
    ["${__RandomString(10,abcdefghij,)}","${__Random(100000000,199999999,)}","${__RandomString(10,abcdefghijklmnopqrst,)}"]];
int rowNum = 0;
for (Object[] datatype : dataTypes) {
    HSSFRow row = sheet.createRow(rowNum++);
    int colNum = 0;
    for (Object field : datatype) {
        HSSFCell cell = row.createCell(colNum++);
        if (field instanceof String) {
            cell.setCellValue((String) field);
        }
        if (field instanceof Integer) {
            cell.setCellValue((Integer) field);
        }
    }
}
try {
    FileOutputStream out = new FileOutputStream(new File(path + separator + sourceFileName + ".xls"));
    workbook.write(out);
    out.close();
}
catch (FileNotFoundException e) {
    e.printStackTrace();
}
I don't think you should be inlining JMeter Functions or Variables in Groovy scripts because:
It conflicts with Groovy GString Template Engine syntax
Only the first occurrence will be cached and reused for subsequent iterations
So you can use the following expressions instead:
org.apache.commons.lang3.RandomStringUtils.randomAlphanumeric(10)
org.apache.commons.lang3.RandomUtils.nextInt(100000000, 199999999)
etc.
In case of any problems, take a look at the jmeter.log file; you should find the root cause, or at least a clue, there.
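To make the number of rows dynamic, the rows can be built in a loop instead of a fixed literal. The sketch below is written in Java (the JSR223 Groovy sampler accepts essentially the same code) and uses the JDK's `ThreadLocalRandom` instead of Commons Lang only so that it is self-contained; the names `DynamicRows`, `buildRows` and `randomString` are hypothetical. The row count is a plain parameter here; in JMeter it would come from your parameter file, for example a variable read with `vars.get(...)` (the variable name is up to you).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class DynamicRows {
    // Random string of the given length built from the given alphabet,
    // similar in spirit to JMeter's __RandomString(length, chars)
    static String randomString(int length, String alphabet) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Header row plus rowCount data rows; rowCount is decided at run time
    static List<Object[]> buildRows(int rowCount) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Column1Header", "Column2Header", "Column3Header"});
        for (int i = 0; i < rowCount; i++) {
            rows.add(new Object[] {
                randomString(10, "abcdefghij"),
                // upper bound is exclusive: yields 100000000..199999999 like __Random(100000000,199999999)
                ThreadLocalRandom.current().nextInt(100000000, 200000000),
                randomString(10, "abcdefghijklmnopqrst")
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        int rowCount = Integer.parseInt(args.length > 0 ? args[0] : "3");
        System.out.println(buildRows(rowCount).size()); // header + rowCount rows
    }
}
```

The returned rows can be fed to the question's `createRow`/`createCell` loop unchanged; because the middle value is now a real `Integer`, the `instanceof Integer` branch writes it as a numeric cell.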

Excel and Libre Office conflict over Open XML output

Open XML is generating .xlsx files that can be read by Open Office, but not by Excel itself.
With this as my starting point (Export DataTable to Excel with Open Xml SDK in c#), I have added code to create an .xlsx file. When I try to open it with Excel, I'm asked if I want to repair the file. Saying yes gets "The workbook cannot be opened or repaired by Microsoft Excel because it's corrupt." After many hours of trying to jiggle the data from my table to make this work, I finally threw up my hands in despair and made a spreadsheet with a single number in the first cell.
Still corrupt.
Renaming it to .zip and exploring shows intact .xml files. On a whim, I took a legit .xlsx file created by Excel, unzipped it, rezipped it without changing the contents and renamed it back to .xlsx. Excel declared it corrupt. So this is clearly not a content issue, but a file format issue. Giving up on Friday, I sent some of the sample files home and opened them there with LibreOffice. There were no issues at all: the file content was correct and Calc had no problem. I'm using Excel for Office 365, 32 bit.
// ignore the bits (var list) that get data from the database. I've reduced this to just the output of a single header line
List<ReportFilingHistoryModel> list = DB.Reports.Report.GetReportClientsFullHistoryFiltered<ReportFilingHistoryModel>(search, client, report, signature);
MemoryStream memStream = new MemoryStream();
using (SpreadsheetDocument workbook = SpreadsheetDocument.Create(memStream, SpreadsheetDocumentType.Workbook))
{
    var workbookPart = workbook.AddWorkbookPart();
    workbook.WorkbookPart.Workbook = new Workbook();
    workbook.WorkbookPart.Workbook.Sheets = new Sheets();
    var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
    var sheetData = new SheetData();
    sheetPart.Worksheet = new Worksheet(sheetData);
    Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<Sheets>();
    string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);
    uint sheetId = 1;
    if (sheets.Elements<Sheet>().Count() > 0)
    {
        sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
    }
    Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "History" };
    sheets.Append(sheet);
    Row headerRow = new Row();
    foreach (var s in "Foo|Bar".Split('|'))
    {
        var cell = new Cell();
        cell.DataType = CellValues.Number;
        cell.CellValue = new CellValue("5");
        headerRow.AppendChild(cell);
    }
    sheetData.AppendChild(headerRow);
}
memStream.Seek(0, SeekOrigin.Begin);
Guid result = DB.Reports.Report.AddClientHistoryList("test.xlsx", memStream.GetBuffer(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
return Ok(result);
This should just work. I've noticed other stack overflow discussions that direct back to the first link I mentioned above. I seem to be doing it right (and Calc concurs). There have been discussions of shared strings and whatnot, but by using plain numbers I shouldn't be having issues. What am I missing here?
In working on this, I went with the notion that some extraneous junk at the end of a .zip file is harmless. 7-Zip, Windows Explorer and LibreOffice all seem to agree (as does some other zip program I used at home whose name escapes me). Excel, however, does not. Passing memStream.GetBuffer() was fine, but using that buffer's full length was not: GetBuffer() returns the whole underlying array, including unused capacity after the data actually written. (The preceding Seek() was unnecessary.) Limiting the data written to a length equal to the current output position (that is, memStream.Length; memStream.ToArray() returns exactly those bytes) keeps Excel from going off the rails.
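The same pitfall exists in Java whenever a backing array is larger than the data written into it. This is a Java analog, not the C# code above (`TrimBuffer` and `writtenBytes` are hypothetical names): `java.nio.ByteBuffer.array()` returns the whole backing array, much like `MemoryStream.GetBuffer()`, and copying only up to `position()` corresponds to using the stream's length or `ToArray()`. With `ByteArrayOutputStream`, `toByteArray()` already returns only the written bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrimBuffer {
    // Returns only the bytes written so far, i.e. [0, position), ignoring unused capacity
    static byte[] writtenBytes(ByteBuffer buf) {
        return Arrays.copyOfRange(buf.array(), buf.arrayOffset(), buf.arrayOffset() + buf.position());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1024); // capacity much larger than the payload
        buf.put("file contents".getBytes(StandardCharsets.US_ASCII));
        System.out.println(buf.array().length);       // 1024: whole backing array, trailing zeros included
        System.out.println(writtenBytes(buf).length); // 13: only what was written
    }
}
```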

How do I get from .xlsx file to workbook using apache poi eventusermodel?

I need to work with both xls and xlsx. I got an OutOfMemoryError when using XSSF, so I switched to SXSSF, and since that doesn't work either, I would like to change my code to use the eventusermodel instead of the ss usermodel. Unfortunately, I do not understand very well how to use the event API, so could someone provide some example code to go from a File or InputStream to a workbook?
You should use the Event API, meaning that you need to combine SAX for reading and SXSSFWorkbook for writing.
This example is an Excel to CSV converter. You should do something similar in the endElement() method: create a new Row if one has not been created yet, and a new cell every time there is a value (name == "v"). Set the type of the cell and the new value:
if ("v".equals(name)) {
    if (row == null)
        row = sheet.createRow(0);
    cell = row.createCell(thisColumn);
    switch (nextDataType) {
        case BOOL:
            char first = value.charAt(0);
            thisStr = first == '0' ? "FALSE" : "TRUE";
            // '0' means FALSE, anything else TRUE; setCellValue(boolean) also sets the cell type
            cell.setCellValue(first != '0');
            break;
        case OTHER_CELL_TYPE:
            //....
            break;
        default:
            cell.setCellType(Cell.CELL_TYPE_BLANK);
            break;
    }
    //......More processing
}
Here you have the apache poi SXSSF example for a better understanding of how to save it.
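To see the SAX pattern behind such a handler in isolation, here is a JDK-only sketch that parses a tiny, hand-written sheet XML string and acts in endElement() when a `<v>` element closes. It is a simplification under stated assumptions: in real code the sheet stream would come from POI's `XSSFReader.getSheetsData()`, and cells with `t="s"` hold an index into the shared strings table that must be looked up, which this sketch omits. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SheetSaxSketch {
    // Collects one "ref=value" string per <c> element that has a <v> child
    static class CellHandler extends DefaultHandler {
        final List<String> cells = new ArrayList<>();
        private final StringBuilder value = new StringBuilder();
        private String ref;      // cell reference, e.g. "A1"
        private String type;     // t attribute: "b" = boolean, "s" = shared string, null = number
        private boolean inValue; // true while inside <v>...</v>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("c".equals(qName)) {
                ref = atts.getValue("r");
                type = atts.getValue("t");
            } else if ("v".equals(qName)) {
                value.setLength(0);
                inValue = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inValue) {
                value.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Same place as the answer's code: act once the <v> element is complete
            if ("v".equals(qName)) {
                inValue = false;
                String v = value.toString();
                if ("b".equals(type)) {
                    v = v.charAt(0) == '0' ? "FALSE" : "TRUE";
                }
                cells.add(ref + "=" + v);
            }
        }
    }

    static List<String> parse(String sheetXml) {
        try {
            CellHandler handler = new CellHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.cells;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData><row r=\"1\">"
                + "<c r=\"A1\" t=\"b\"><v>0</v></c>"
                + "<c r=\"B1\"><v>42</v></c>"
                + "</row></sheetData></worksheet>";
        System.out.println(parse(xml)); // [A1=FALSE, B1=42]
    }
}
```

In the real converter, the `cells.add(...)` line is where you would call `row.createCell(...)` and `setCellValue(...)` on the SXSSF sheet instead.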

apache poi how to disable external reference or external links?

I've been searching the web for 30 minutes now and can't find any explanation of this. Here is my problem:
I wrote an application with POI that parses some data from about 200 Excel files and puts some of it into a new file. I do some cell evaluation with FormulaEvaluator to know the content of the cells before choosing whether to keep them.
When I test it on a test file with only values in the cells, the program works perfectly, but when I use it on my pile of files I get this error:
"could not resolve external workbook name"
Is there any way to ignore external workbook references, or to set up the environment so that it won't evaluate formulas with external references?
The cells I need don't contain such references...
Thank you
Can you not just catch the error, and skip over that cell?
You're getting the error because you've asked POI to evaluate the formula in a cell, and that formula refers to a different file. However, you haven't told POI where to find the referenced file, so it objects.
If you don't care about cells with external references, just catch the exception and move on to the next cell.
If you do care, you'll need to tell POI where to find your files. You do this with the setupEnvironment(String[],Evaluator[]) method - pass it an array of workbook names, and a matching array of evaluators for those workbooks.
In order for POI to be able to evaluate external references, it needs access to the workbooks in question. As these don't necessarily have the same names on your system as in the workbook, you need to give POI a map of external references to open workbooks, through the setupReferencedWorkbooks(java.util.Map<java.lang.String,FormulaEvaluator> workbooks) method.
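If you only want to skip such cells, you can also check the formula text before evaluating it. The sketch below is a hypothetical heuristic (`ExternalRefCheck` is not a POI class): it assumes external references appear in the formula string as a bracketed workbook part followed by a sheet name and `!`, such as `[1]Sheet1!A1` or `'[Book 1.xlsx]My Sheet'!B2`, and it does not handle string literals inside formulas. Depending on your POI version, `FormulaEvaluator.setIgnoreMissingWorkbooks(true)` may also be worth checking in the Javadoc, since it makes the evaluator fall back to cached results for workbooks it cannot find.

```java
import java.util.regex.Pattern;

public class ExternalRefCheck {
    // Heuristic: '[Book.xlsx]Sheet name'!A1 (quoted) or [1]Sheet1!A1 (unquoted).
    // Structured table references such as Table1[Amount] have no sheet name and '!'
    // right after the bracket, so they are not matched.
    private static final Pattern EXTERNAL_REF = Pattern.compile(
            "'\\[[^\\]]+\\][^']+'!|\\[[^\\]]+\\][A-Za-z0-9_.]+!");

    static boolean hasExternalReference(String formula) {
        return formula != null && EXTERNAL_REF.matcher(formula).find();
    }

    public static void main(String[] args) {
        System.out.println(hasExternalReference("[1]Sheet1!A1*2"));      // true
        System.out.println(hasExternalReference("SUM(Table1[Amount])")); // false
    }
}
```

In the evaluation loop you would call it on `cell.getCellFormula()` for formula cells and skip the cell (or use its cached value) when it returns true.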
I have done this; the code below is working fine on my side:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.FormulaEvaluator;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public static void writeWithExternalReference(String cellContent, boolean isRowUpdate, boolean isFormula)
{
    try
    {
        File yourFile = new File("E:\\Book1.xlsx");
        yourFile.createNewFile();
        FileInputStream myxls = new FileInputStream(yourFile);
        XSSFWorkbook workbook = new XSSFWorkbook(myxls);
        FormulaEvaluator mainWorkbookEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
        XSSFWorkbook workbook1 = new XSSFWorkbook(new File("E:\\elk\\lookup.xlsx"));
        // Track the workbook references: keys are the workbook names as used in the formulas
        Map<String, FormulaEvaluator> workbooks = new HashMap<String, FormulaEvaluator>();
        workbooks.put("Book1.xlsx", mainWorkbookEvaluator);
        workbooks.put("elk/lookup.xlsx", workbook1.getCreationHelper().createFormulaEvaluator());
        // Attach them
        mainWorkbookEvaluator.setupReferencedWorkbooks(workbooks);
        XSSFSheet worksheet = workbook.getSheetAt(0);
        XSSFRow row = null;
        if (isRowUpdate) {
            int lastRow = worksheet.getLastRowNum();
            row = worksheet.createRow(++lastRow);
        }
        else {
            row = worksheet.getRow(worksheet.getLastRowNum());
        }
        if (!isFormula) {
            Cell cell = row.createCell(row.getLastCellNum() == -1 ? 0 : row.getLastCellNum());
            cell.setCellValue(Double.parseDouble(cellContent));
        } else {
            XSSFCell cell = row.createCell(row.getLastCellNum() == -1 ? 0 : row.getLastCellNum());
            System.out.println(cellContent);
            cell.setCellFormula(cellContent);
            // Evaluate and cache the result while keeping the formula in the cell
            mainWorkbookEvaluator.evaluateFormulaCell(cell);
        }
        workbook1.close();
        myxls.close();
        FileOutputStream output_file = new FileOutputStream(yourFile, false);
        //write changes
        workbook.write(output_file);
        output_file.close();
        workbook.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
