Apache POI to Excel - Zero Date

Please make sure you understand my problem before replying, it is not as simple as it looks. Please don't just do a google search and post the link to the results; I already looked.
I have a VB.Net application that we are replacing with a Java application. The purpose of the application is to write an excel sheet (.xls). The file is then sent over to a second party and they process the data in it. I am using the APACHE POI to write the file.
The final product is being rejected by the second party because two time fields are "not valid". After scratching my head for a while, I noticed that the Java-produced file and the VB.Net-produced file handle 0 date values differently. Let's say the time is supposed to be 3:30 PM; in military time, the data appears as 15:30 in both files. The problem is the date portion of the field:
VB.Net generated: 1/0/1900 15:30
Java generated: 1/1/1970 15:30
I can't seem to find a way to have the apache POI mimic the way excel handles 0 dates. The following are some of the things I tried.
I set my date/time variable in the java application as 1/0/1900 15:30. This gives me an error in the application.
I set my variable as a string, pass it to the worksheet and then set the format of the cell. I don't get an error, but the data stays as 'general' until I double-click on the cell and press Enter. This process is supposed to be automated, so this is not an option.
I set the formula of the cell to =TIMEVALUE("15:30"), but this was not accepted by the 2nd party.
Has anyone else run into this problem? Can anyone think of a way around this? Having the second party change the way they read the file is not an option.

What you need to know is that Excel stores datetime values as floating-point double values. There, 0 = 00:00:00 and 1 = 24:00:00 = 01/01/1900 00:00:00. Also, 0.5 = 12:00:00 and 1.5 = 36:00:00 = 01/01/1900 12:00:00. In other words, Excel's datetime values start at 0, and 1 is one day and corresponds to 01/01/1900. Also, 1/24 is one hour, 1/24/60 is one minute and 1/24/60/60 is one second.
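As a minimal sketch of that arithmetic in plain Java (no POI involved; the class and method names are made up for illustration), a time of day can be turned into the fraction of a day that Excel would store for "day 0":

```java
class ExcelTimeFractionSketch {
    private static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Hypothetical helper: converts a time of day to a fraction of one day,
    // which is how Excel represents times (1.0 = 24 hours).
    static double toExcelFraction(int hours, int minutes, int seconds) {
        int secondsOfDay = hours * 3600 + minutes * 60 + seconds;
        return secondsOfDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(toExcelFraction(12, 0, 0));     // 0.5 (noon on day 0)
        System.out.println(1 + toExcelFraction(12, 0, 0)); // 1.5 (noon on 01/01/1900)
        System.out.println(toExcelFraction(15, 30, 0));    // roughly 0.6458, i.e. 15:30 on day 0
    }
}
```

A value below 1 therefore has no real date part at all, which is what Excel displays as 1/0/1900.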
The problem with using a Java Date is that the months in Calendar constructors are 0-based. So month 0 is January, and new GregorianCalendar(1900, 0, 1, 15, 30, 0) will be 01/01/1900 15:30:00. There is no day 0, so new GregorianCalendar(1900, 0, 0, 15, 30, 0) will be 12/31/1899 15:30:00, and this will be -1 for Excel.
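The roll-back from "day 0" to the previous day can be checked with the JDK alone. This is only a small demonstration of the default (lenient) GregorianCalendar behaviour; the class name and the describe helper are hypothetical:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

class CalendarDayZeroSketch {
    // Hypothetical helper: renders the calendar's date fields as yyyy-MM-dd.
    static String describe(Calendar calendar) {
        return String.format("%04d-%02d-%02d",
                calendar.get(Calendar.YEAR),
                calendar.get(Calendar.MONTH) + 1, // Calendar months are 0-based
                calendar.get(Calendar.DAY_OF_MONTH));
    }

    public static void main(String[] args) {
        // Month 0 is January, so this is 1 January 1900.
        System.out.println(describe(new GregorianCalendar(1900, 0, 1, 15, 30, 0))); // 1900-01-01
        // Day 0 does not exist; the lenient calendar rolls back to 31 December 1899.
        System.out.println(describe(new GregorianCalendar(1900, 0, 0, 15, 30, 0))); // 1899-12-31
    }
}
```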
Because the problems with Excel's date behavior are known, Apache POI provides DateUtil.
Using this we can do:
import java.io.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import java.util.Calendar;
import java.util.GregorianCalendar;
class XSSFNullDateTest {
    public static void main(String[] args) {
        //try-with-resources closes the workbook and the output stream, even on failure
        try (Workbook wb = new XSSFWorkbook();
             OutputStream out = new FileOutputStream("XSSFNullDateTest.xlsx")) {
            Sheet sheet = wb.createSheet("Sheet1");
            CreationHelper creationHelper = wb.getCreationHelper();
            CellStyle cellStyleTime = wb.createCellStyle();
            cellStyleTime.setDataFormat(creationHelper.createDataFormat().getFormat("hh:mm:ss"));
            //using a Calendar:
            Calendar calendar = new GregorianCalendar(1900, 0, 1, 15, 30, 0);
            System.out.println(calendar.getTime()); //01/01/1900 15:30:00
            double doubleTime = DateUtil.getExcelDate(calendar, false);
            System.out.println(doubleTime); //1.6458333333333335
            Cell cell = sheet.createRow(0).createCell(0);
            cell.setCellValue(doubleTime-1); //subtract 1 so we have day 0
            cell = sheet.getRow(0).createCell(1);
            cell.setCellValue(doubleTime-1); //subtract 1 so we have day 0
            cell.setCellStyle(cellStyleTime);
            //using a string:
            doubleTime = DateUtil.convertTime("15:30:00");
            System.out.println(doubleTime); //0.6458333333333334 = day 0 already
            cell = sheet.createRow(1).createCell(0);
            cell.setCellValue(doubleTime);
            cell = sheet.getRow(1).createCell(1);
            cell.setCellValue(doubleTime);
            cell.setCellStyle(cellStyleTime);
            wb.write(out);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

Related

Excel or POI changes my date format to something that doesn't work

I've got an Excel file I want to recreate through POI. The existing Excel file uses as DataFormat _(* #.##0_);_(* (#.##0);_(* "-"??_);_(#_). But when I assign that data format through POI, the resulting Excel file will instead use _(* #,##0_);_(* (#,##0);_(* "-"??_);_(#_). Note the tiny difference: a dot changed to a comma. Because of this, the entire format doesn't work anymore. It's not like it's now showing a comma where it used to have a dot; it's formatting the entire value in a completely different way.
Why does this happen? And how do I fix it?
The correct format string _(* #.##0_);_(* (#.##0);_(* "-"??_);_(#_) results in the number being displayed as 13.534.000.
The incorrect format string that Excel or POI changes it to, _(* #,##0_);_(* (#,##0);_(* "-"??_);_(#_) formats the value as 13534000,0.
It's a complete mystery to me why it would do that. I suppose it has something to do with the US and Europe using different formats to display big numbers, but I would imagine that that's exactly what this data format is supposed to address. Instead, it turns it into nonsense.
Apache POI creates Microsoft Office files. Those files are never localized; they always store data in the en_US locale. The locale-dependent adjustments are then done by the localized Office application that opens the file. So a Microsoft Office file can be sent around the world without the need to change the stored data to a foreign locale.
So if you set the data format using ...
...
Workbook workbook = new XSSFWorkbook();
DataFormat dataformat = workbook.createDataFormat();
CellStyle cellStyle = workbook.createCellStyle();
cellStyle.setDataFormat(dataformat.getFormat("_(* #,##0_);_(* (#,##0);_(* \"-\"??_);_(#_)"));
...
... the format pattern always needs to be en_US. That means the dot is the decimal separator and the comma is the thousands delimiter. A localized Excel application might then display that pattern as _(* #.##0_);_(* (#.##0);_(* "-"??_);_(#_).
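The same idea exists in the JDK, which may make it easier to picture. The following is only a loose analogy in plain Java and not how POI or Excel work internally: java.text.DecimalFormat patterns are also written with a comma as the grouping placeholder, and the locale's symbols are applied when the value is rendered. The class name and render helper are made up for this sketch.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class LocaleNeutralPatternSketch {
    // Hypothetical helper: same pattern, rendered with the symbols of the given locale.
    static String render(String pattern, Locale locale, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        // The pattern itself never changes; only the displayed separators do.
        System.out.println(render("#,##0", Locale.US, 13534000));      // 13,534,000
        System.out.println(render("#,##0", Locale.GERMANY, 13534000)); // 13.534.000
    }
}
```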
Let's have a complete example:
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
public class CreateExcelNumberFormat {
    public static void main(String[] args) throws Exception {
        Workbook workbook = new XSSFWorkbook();
        DataFormat dataformat = workbook.createDataFormat();
        CellStyle cellStyle = workbook.createCellStyle();
        cellStyle.setDataFormat(dataformat.getFormat("_(* #,##0_);_(* (#,##0);_(* \"-\"??_);_(#_)"));
        Sheet sheet = workbook.createSheet();
        Row row = sheet.createRow(0);
        Cell cell = row.createCell(0);
        cell.setCellValue(1234567.89);
        cell.setCellStyle(cellStyle);
        row = sheet.createRow(1);
        cell = row.createCell(0);
        cell.setCellValue(-1234567.89);
        cell.setCellStyle(cellStyle);
        sheet.setColumnWidth(0, 15*256);
        FileOutputStream out = new FileOutputStream("./CreateExcelNumberFormat.xlsx");
        workbook.write(out);
        out.close();
        workbook.close();
    }
}
The result Excel file looks in my German Excel like so:

How do I emulate Excel's "Fill down" in Apache POI?

I've got a template workbook, with a sheet ("All data") which I populate using Apache POI. I don't know how many rows I'm going to need in "All data" when I start.
In another sheet (call it "Calc"), I have 4 columns containing formulae that do stuff based on "All data". I need to have as many rows in Calc as in "All data", and I thought the easiest way to do it would be to have, in the template, one row with the formulae in it, which I can then fill down the sheet as many times as necessary.
Thus, in the template I have:
Col1Header | Col2Header | Col3Header | Col4Header
=+'All data'!F2 | =IF(LEFT(A55,1)="4",'All data'!R2,"") | =IF(LEFT(A55,1)="4",'All data'!O2,"") | =+'All data'!W2
Then I would expect to be able to "fill down" from that first formula line, so that I have n rows (where n is the number of rows I'm using in the "All data" sheet).
However, I cannot see how to do "fill down" in Apache POI. Is it something that's not possible? Or am I looking for the wrong name?
Yes, an alternative method would be simply to change the template by manually copying down more rows than I would ever expect to be using, but that is (a) inelegant and (b) is asking for trouble in the future:-)
I feel sure there must be a better way?
If this is for an Office Open XML workbook (*.xlsx, XSSF) and current apache poi 5.0.0 is used, then XSSFSheet.copyRows can be used. The default CellCopyPolicy copies formulas and adjusts the cell references in them.
Example:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import java.io.FileInputStream;
import java.io.FileOutputStream;
class ExcelReadCopyRowsAndWrite {
    public static void main(String[] args) throws Exception {
        String fileIn = "./TestIn.xlsx";
        String fileOut = "./TestOut.xlsx";
        int n = 10; // 10 rows needed
        int fillRowsFromIdx = 1; // start at row 2 (index 1) which is row having the formulas
        int fillRowsToIdx = fillRowsFromIdx + n - 1;
        try (Workbook workbook = WorkbookFactory.create(new FileInputStream(fileIn));
             FileOutputStream out = new FileOutputStream(fileOut)) {
            Sheet sheet = workbook.getSheet("Calc"); // do it in sheet named "Calc"
            if (sheet instanceof XSSFSheet) {
                XSSFSheet xssfSheet = (XSSFSheet) sheet;
                for (int i = fillRowsFromIdx; i < fillRowsToIdx; i++) {
                    xssfSheet.copyRows(i, i, i+1, new CellCopyPolicy());
                }
            }
            workbook.write(out);
        }
    }
}
The copyRows method is only in XSSF up to now. For an example of how to copy formulas that also works for a BIFF workbook (*.xls, HSSF), see Apache POI update formula references when copying.
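To make "adjusts the cell references" concrete, here is a rough plain-Java sketch of what filling down does to a formula string. It is an illustration only and not POI's implementation: the shiftRows helper and its regular expression are my own assumptions, and they only handle simple relative A1 references (no $ anchors, no references hidden inside quoted sheet names).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FillDownSketch {
    // Assumed pattern: 1-3 capital letters followed by a row number, not part of a
    // longer identifier, not absolute ($), and not a function name such as LOG10(.
    private static final Pattern RELATIVE_REF =
            Pattern.compile("(?<![A-Za-z0-9$])([A-Z]{1,3})(\\d+)(?![A-Za-z0-9(])");

    // Hypothetical helper: moves every relative reference down by rowOffset rows.
    static String shiftRows(String formula, int rowOffset) {
        Matcher matcher = RELATIVE_REF.matcher(formula);
        StringBuffer shifted = new StringBuffer();
        while (matcher.find()) {
            int newRow = Integer.parseInt(matcher.group(2)) + rowOffset;
            matcher.appendReplacement(shifted, matcher.group(1) + newRow);
        }
        matcher.appendTail(shifted);
        return shifted.toString();
    }

    public static void main(String[] args) {
        String template = "=IF(LEFT(A55,1)=\"4\",'All data'!R2,\"\")";
        // Each printed line is the template formula moved down by i rows.
        for (int i = 0; i < 3; i++) {
            System.out.println(shiftRows(template, i));
        }
        System.out.println(shiftRows("=+'All data'!F2", 1)); // =+'All data'!F3
    }
}
```

With copyRows none of this is needed, since the default CellCopyPolicy does the adjustment itself; the sketch only shows the kind of rewrite to expect in the copied rows.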

Apache POI Date Parsing One Second Off

I'm parsing an Excel spreadsheet with a date in it. The results from POI are off by 1 second compared to what's displayed in Excel.
The unformatted data in Excel is: 43261.5027743056
The cell in Excel has a format of: mm/dd/yyyy hh:mm:ss
The field in Excel displays as: 6/10/2018 12:04:00 PM
The POI parser (both v 4.0.1 and 4.1.0) parses it as:
Value: 43261.502774305598
Format: mm/dd/yyyy\ hh:mm:ss
Result: 6/10/2018 12:03:59 PM
Here's my code:
private final DataFormatter formatter;
case NUMBER:
String n = value.toString();
if (this.formatString != null) {
thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
}
else thisStr = n;
break;
Am I doing something wrong?
The problem is not the binary floating point problem. That problem also exists, but it should not affect the seconds of a time value.
The problem is that your value 43261.5027743056 is not exactly the date time 06/10/2018 12:04:00 but 06/10/2018 12:03:59.700, that is, 06/10/2018 12:03:59 plus 700 milliseconds. You can see this if you format the cell using the format DD/MM/YYYY hh:mm:ss.000 in Excel.
For such values there is a discrepancy between Excel's date formatting and Apache POI's DataFormatter, which uses Java's date format. When Excel shows the date time value 06/10/2018 12:03:59.700 without milliseconds, it rounds to seconds internally. So 06/10/2018 12:03:59.700 is shown as 06/10/2018 12:04:00. Java's date formatters don't round; they simply don't show the milliseconds. So 06/10/2018 12:03:59.700 is shown as 06/10/2018 12:03:59.
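The difference between the two behaviours can be reproduced without POI. The sketch below is only an assumption-based illustration in plain Java (hypothetical class and method names): it takes the fractional part of the Excel value as seconds of the day and either truncates or rounds it.

```java
import java.time.LocalTime;

class SecondsRoundingSketch {
    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Seconds of the day (with fraction) encoded in the fractional part of the value.
    static double secondsOfDay(double excelValue) {
        double fractionOfDay = excelValue - Math.floor(excelValue);
        return fractionOfDay * SECONDS_PER_DAY;
    }

    // Drops the sub-second part, like a formatter that just does not print milliseconds.
    static LocalTime truncated(double excelValue) {
        return LocalTime.ofSecondOfDay((long) Math.floor(secondsOfDay(excelValue)));
    }

    // Rounds to the nearest second; a roll-over into the next day is ignored here.
    static LocalTime rounded(double excelValue) {
        return LocalTime.ofSecondOfDay(Math.round(secondsOfDay(excelValue)) % SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        double value = 43261.5027743056;
        System.out.println(secondsOfDay(value)); // about 43439.7, i.e. 12:03:59.700
        System.out.println(truncated(value));    // 12:03:59
        System.out.println(rounded(value));      // 12:04 (LocalTime omits zero seconds)
    }
}
```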
Apache POI's DateUtil provides methods which round seconds, but those methods do not seem to be used in DataFormatter.
As a workaround, we could override formatCellValue of DataFormatter to do so.
Complete example:
Code:
import java.io.FileInputStream;
import org.apache.poi.util.LocaleUtil;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.formula.ConditionalFormattingEvaluator;
import java.util.Date;
class ExcelParseCellValues {
    public static void main(String[] args) throws Exception {
        Workbook workbook = WorkbookFactory.create(new FileInputStream("Excel.xlsx"));
        DataFormatter dataFormatter = new DataFormatter() {
            @Override
            public String formatCellValue(Cell cell, FormulaEvaluator evaluator, ConditionalFormattingEvaluator cfEvaluator) {
                CellType cellType = cell.getCellType();
                if (cellType == CellType.FORMULA) {
                    if (evaluator == null) {
                        return cell.getCellFormula();
                    }
                    cellType = evaluator.evaluateFormulaCell(cell);
                }
                if (cellType == CellType.NUMERIC && DateUtil.isCellDateFormatted(cell, cfEvaluator)) { //we have a date
                    CellStyle style = cell.getCellStyle();
                    String dataFormatString = style.getDataFormatString();
                    if (!dataFormatString.matches(".*(s\\.0{1,3}).*")) { //the format string does not show milliseconds
                        boolean use1904Windowing = false;
                        if ( cell != null && cell.getSheet().getWorkbook() instanceof Date1904Support)
                            use1904Windowing = ((Date1904Support)cell.getSheet().getWorkbook()).isDate1904();
                        boolean roundSeconds = true; //we round seconds
                        Date date = DateUtil.getJavaDate(cell.getNumericCellValue(), use1904Windowing, LocaleUtil.getUserTimeZone(), roundSeconds);
                        double value = DateUtil.getExcelDate(date);
                        return super.formatRawCellContents(value, style.getDataFormat(), dataFormatString, use1904Windowing);
                    }
                }
                return super.formatCellValue(cell, evaluator, cfEvaluator);
            }
        };
        CreationHelper creationHelper = workbook.getCreationHelper();
        FormulaEvaluator formulaEvaluator = creationHelper.createFormulaEvaluator();
        Sheet sheet = workbook.getSheetAt(0);
        for (Row row : sheet) {
            for (Cell cell : row) {
                String cellValue = dataFormatter.formatCellValue(cell, formulaEvaluator);
                System.out.print(cellValue + "\t");
            }
            System.out.println();
        }
        workbook.close();
    }
}
Result:
Description of value | Floating-point value | DD/MM/YYYY hh:mm:ss.000 | DD/MM/YYYY hh:mm:ss
Your example value | 43261,5027743056 | 06/10/2018 12:03:59.700 | 06/10/2018 12:04:00
Exact Datetime 12:04 | 43261,5027777778 | 06/10/2018 12:04:00.000 | 06/10/2018 12:04:00
Exact minus 500 ms | 43261,5027719907 | 06/10/2018 12:03:59.500 | 06/10/2018 12:04:00
Exact plus 500 ms | 43261,5027835648 | 06/10/2018 12:04:00.500 | 06/10/2018 12:04:01
Exact minus 501 ms | 43261,5027719792 | 06/10/2018 12:03:59.499 | 06/10/2018 12:03:59
Exact plus 501 ms | 43261,5027835764 | 06/10/2018 12:04:00.501 | 06/10/2018 12:04:01
You're doing this when you parse the cell value as a double. Not all decimal values can be represented exactly as doubles. The nearest double to 43261.5027743056 is 43261.502774305597995407879352569580078125, which rounds to the value you're seeing.
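If you want to inspect the stored value yourself, the JDK can print the exact binary value of a double through BigDecimal. This is just a small sketch with a made-up class name; it shows how far the parsed double is from the decimal text, nothing more.

```java
import java.math.BigDecimal;

class NearestDoubleSketch {
    // new BigDecimal(double) keeps the exact binary value, without any rounding.
    static BigDecimal exactValueOf(double value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        double parsed = Double.parseDouble("43261.5027743056");
        BigDecimal exact = exactValueOf(parsed);
        BigDecimal written = new BigDecimal("43261.5027743056");
        System.out.println(exact);                   // the value actually stored in the double
        System.out.println(exact.subtract(written)); // representation error, in days
    }
}
```

For comparison, one millisecond is about 1.16e-8 of a day, so the printed error can be set against that figure.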

how to write data to excel using scriptom in groovy?

I am reading properties and their values from soapUI and writing them to an Excel file.
I am able to write the unique property names into the Excel file:
def oExcel = new ActiveXObject('Excel.Application')
Thread.sleep(1000)
assert oExcel != null, "Excel object not initialized"
def openWb = oExcel.Workbooks.Open(excelPath) //excelPath: complete path to the Excel file
def dtUsedRange = openWb.Sheets(dataSheetName).UsedRange //dataSheetName: name of the sheet which will ultimately hold the data
//add property names to xlMapSheet under col D (col# 4)
for (int r = 1; r <= uniqPropName.size().toInteger(); r++) { //uniqPropName: list that holds all the unique property names in a test suite
    openWb.Sheets(xlMapSheet).Cells(r,4).Value = uniqPropName[r-1]
}
oExcel.DisplayAlerts = false
openWb.Save
oExcel.DisplayAlerts = true
openWb.Close(false,null,false)
oExcel.Quit()
Scriptom.releaseApartment()
However, now I have to write all the properties to the same Excel file. I have already created a map of the Excel column names and soapUI properties, so now I just have to find the matching Excel column name in the map and write the property value under that column.
I am using a function to do this. The function is called from within a for loop which loops through all the properties in a test case. To this function I pass:
sheetName //sheet where data has to be written
sheet //path of the excel file
pName //property name
pValue //property value
xMap //excel col name/heading map
tName //test case name
tsNum //step number
The relevant code for this function is below.
def write2Excel(sheetName,sheet,pName,pValue,xMap,tName,tsNum){
    //find the xl Col Name from the map
    def xl = new ActiveXObject('Excel.Application')
    assert xl != null, "Excel object not initialized"
    //open excel
    def wb = xl.Workbooks.Open(sheet)
    def rng = wb.Sheets(sheetName).UsedRange
    //get row and column count
    int iColumn = rng.Columns.Count.toInteger()
    int iRow = rng.Rows.Count.toInteger()
    //find column number using the col name
    //find the row with matching testcase name and step#
    //write data to excel
    if(rFound){ //if a row matching test case name and step number is found
        rng.Cells(r,colId).Value = pValue
    }else{
        rng = rng.Resize(r+1,iColumn) //if the testcase and step# row doesn't exist then the current range has to be extended to add one more row of data.
        rng.Cells(r+1,colId).Value = pValue
    }
    //save and close
    xl.DisplayAlerts = false
    wb.Save
    xl.DisplayAlerts = true
    wb.Close(false,null,false)
    xl.Quit()
    Scriptom.releaseApartment()
}
The code is currently running. It has been running since yesterday evening (2pm EST), so even if the code works, it is not optimal. I can't wait this long to write data.
The curious thing is that the size of the Excel file keeps increasing, which would mean that data is being written to it, but I have checked the file and it has no new data. Nothing, zilch!!
Evidence of increasing size of the file.
20/02/2014 04:23 PM 466,432 my_excel_file.xls
20/02/2014 04:23 PM 466,944 my_excel_file.xls
20/02/2014 04:38 PM 470,016 my_excel_file.xls
20/02/2014 04:45 PM 471,552 my_excel_file.xls
20/02/2014 04:47 PM 472,064 my_excel_file.xls
20/02/2014 05:01 PM 474,112 my_excel_file.xls
20/02/2014 05:01 PM 474,112 my_excel_file.xls
21/02/2014 07:23 AM 607,232 my_excel_file.xls
21/02/2014 07:32 AM 608,768 my_excel_file.xls
21/02/2014 07:50 AM 611,328 my_excel_file.xls
My questions are:
1. Why is data not being written when I call the function from within the for loop, but written when I call it linearly?
2. In the first piece of code the Excel process goes away when it's done writing, but when the function is run, the Excel process remains even though its memory utilization goes up and down.
I am going to kill the Excel process and, instead of looping, I am going to try to write only one or two sets of data using the function, and will update this question accordingly.
The process of opening an Excel file, writing to a cell, saving the file and closing it is time-consuming, and when you multiply this by 300 test cases and ~15 properties per test, it can take a very long time. That is what was happening in my case, and hence the process was taking forever to complete.
I am not 100% sure why the size of the Excel file was increasing while nothing was getting written, but I would guess that data was being kept in memory and would have been written once the last cell was written, the workbook saved and Excel closed. This never happened because I didn't let it complete and killed it when I realized that it had been running for an exceptionally long time.
In order to make this work, I changed my approach to the following:
1. Generate a map of col name and prop name.
2. Generate a map of prop name and prop value for each test case. As one test case can have multiple property test steps, I create a multi map like this: [Step#:[propname:propvalue,....propname:propvalue]]
3. Create another map with col name and col id.
4. Create a new map with col id and prop value. I made this using the maps created above.
5. Write data to Excel. Because I already have the col id and the value that goes into it, I don't do any checks and just write the data to Excel.
These steps are repeated for all test cases in the test suite. Using this process, I was able to complete my task within a few minutes.
I know I am using quite a few maps, but this is the approach I could come up with. If anyone has a better approach, I would really like to try that out too.
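The map-joining step (steps 1, 3 and 4 above) can be sketched roughly as follows. This is plain Java rather than the Groovy/Scriptom code I actually used, and every name in it is hypothetical; it only shows how the column id to value map falls out of the other maps, so that the Excel file needs to be opened and saved once per batch instead of once per cell.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnValueMapSketch {
    // Joins "col name -> prop name", "prop name -> prop value" and "col name -> col id"
    // into "col id -> prop value". Columns without a known id or value are skipped.
    static Map<Integer, String> valuesByColumnId(Map<String, String> colNameToPropName,
                                                 Map<String, String> propNameToValue,
                                                 Map<String, Integer> colNameToColId) {
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : colNameToPropName.entrySet()) {
            Integer colId = colNameToColId.get(entry.getKey());
            String value = propNameToValue.get(entry.getValue());
            if (colId != null && value != null) {
                result.put(colId, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> colNameToPropName = new TreeMap<>();
        colNameToPropName.put("User", "userName");
        colNameToPropName.put("Token", "authToken");
        Map<String, String> propNameToValue = new TreeMap<>();
        propNameToValue.put("userName", "alice");
        propNameToValue.put("authToken", "abc123");
        Map<String, Integer> colNameToColId = new TreeMap<>();
        colNameToColId.put("User", 4);
        colNameToColId.put("Token", 7);
        // One row's worth of cells, ready to be written in a single pass.
        System.out.println(valuesByColumnId(colNameToPropName, propNameToValue, colNameToColId)); // {4=alice, 7=abc123}
    }
}
```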

DateTime is rounded up to the next day using ExcelLibrary

The datetime I'm writing to Excel always gets rounded up to the next day:
workSheet.Cells[0, 0] = new Cell(DateTime.Now, new CellFormat(CellFormatType.DateTime, @"HH:mm:ss"));
In the output Excel file the cell gets this value: 29/09/2013 00:00:00
The DateTime.Now from this example is 28/09/2013 19:42:23
I ended up passing the cell value as a string instead of as a DateTime:
workSheet.Cells[0, 0] = new Cell(DateTime.Now.ToString(@"HH:mm:ss:ff"),
    new CellFormat(CellFormatType.DateTime, @"HH:mm:ss"));
If you are using the ExcelLibrary project source code, you can fix this as follows:
Go to the SharedResource class in this location: [Project Source Code folder]\Office\BinaryFileFormat folder
Change the EncodeDateTime function as below:
public double EncodeDateTime(DateTime value)
{
    double days = (value - BaseDate).Days;
    //if (days > 365) days++;
    return days;
}
Pass the DateTime object to the Cell with the preferred format:
worksheet.Cells[iIndex, j] = new Cell(((DateTime)cellValue), new CellFormat(CellFormatType.DateTime, @"dd/MM/yyyy"));
You need to convert the date format from OLE Automation to the .net format by using DateTime.FromOADate.
If oCell.Format.FormatType = CellFormatType.Date OrElse oCell.Format.FormatType = CellFormatType.DateTime Then
    Dim d As Double = Double.Parse(oCell.Value)
    Debug.Print(DateTime.FromOADate(d))
End If
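As a rough illustration of what that conversion does, here is a plain-Java sketch. It is a stand-in with assumptions, not ExcelLibrary or .NET code: it relies on the OLE Automation convention that day 0 is 30 December 1899, rounds to whole seconds, and only handles non-negative values. The class and method names are made up.

```java
import java.time.LocalDateTime;

class OADateSketch {
    // OLE Automation dates count days from 30 December 1899.
    private static final LocalDateTime OA_EPOCH = LocalDateTime.of(1899, 12, 30, 0, 0);
    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Hypothetical helper: whole part = days since the epoch, fraction = time of day.
    static LocalDateTime fromOADate(double oaDate) {
        long wholeDays = (long) Math.floor(oaDate);
        long seconds = Math.round((oaDate - wholeDays) * SECONDS_PER_DAY);
        return OA_EPOCH.plusDays(wholeDays).plusSeconds(seconds);
    }

    public static void main(String[] args) {
        System.out.println(fromOADate(41545.0)); // 2013-09-28T00:00
        double withTime = 41545 + (19 * 3600 + 42 * 60 + 23) / (double) SECONDS_PER_DAY;
        System.out.println(fromOADate(withTime)); // 2013-09-28T19:42:23
    }
}
```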
