apache poi reading date from excel date() function gives wrong date - apache-poi

I'm using an xlsx file that contains a DATE(year, month, day) formula within a cell. This cell is formatted as a date, so Excel/OpenOffice shows the proper date.
e.g. the cell content is '=DATE(2013;1;1)' which produces : '01.01.2013'
So far - so good.
When reading this file with the poi library I do:
XSSFWorkbook workbook = new XSSFWorkbook(new FileInputStream(new File("/home/detlef/temp/test.xlsx")));
XSSFSheet sheet = workbook.getSheet("Sheet A");
XSSFCell cell = sheet.getRow(0).getCell(0);
System.out.println(cell.getDateCellValue());
This will print out:
Sun Dec 31 00:00:00 CET 1899
I am using POI 3.9.
Can anybody tell me why this happens?
Found a way to do it:
if (cell.getCellType() == Cell.CELL_TYPE_FORMULA) {
FormulaEvaluator evaluator = workbook.getCreationHelper().createFormulaEvaluator();
CellValue evaluate = evaluator.evaluate(cell);
Date date = DateUtil.getJavaDate(evaluate.getNumberValue());
System.out.println(date);
}
That produces:
Tue Jan 01 00:00:00 CET 2013
Thanks anyway.

Everything you have described is entirely correct and expected.
When Excel stores a formula in a cell, it stores not only the formula itself (as a string in .xlsx, or in parsed token form in .xls) but also the last evaluated result of that formula. This means that when Excel loads the file, it doesn't have to grind away for ages calculating the formula results to display; it can just render them with the last value like any other cell.
This is why, when you make changes to your Excel file with Apache POI, you then need to run an evaluation to update all those cached formula values, so that it looks right in Excel before you go to that cell.
However, there are a few special formula functions in Excel which are defined as volatile. These are functions whose result can change on every recalculation, for example NOW or TODAY, and for these Excel just writes a dummy value (e.g. 0 or -1) to the cell and re-evaluates it on load. DATE is not normally counted among them, but a formula cell that was saved without a usable cached result looks the same to POI.
When you read the numeric value of a formula cell in POI, it gives you the cached value back. If you want POI to evaluate the formula, you need to ask it to, as you're doing.
Dates in Excel are stored as fractional days since 1/1/1900 or 1/1/1904 (depending on a flag). POI converts a value of 0 to the 31st of December 1899, so that's what you see when the cached value is 0 and you request it as a date.
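To make the serial-number arithmetic concrete, here is a minimal sketch that uses only java.time. It is not POI's implementation; the class and method names are made up for illustration, and it assumes the 1900 date system and a non-negative serial value.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper for illustration only; not part of Apache POI.
class SerialToDate {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Converts a 1900-system serial number (assumed >= 0) to a date-time.
    public static LocalDateTime fromSerial(double serial) {
        long wholeDays = (long) Math.floor(serial);
        long millis = Math.round((serial - wholeDays) * MILLIS_PER_DAY);
        // Excel treats 1900 as a leap year, so serials from 61 onward
        // are shifted back by one day to land on the real calendar date.
        long dayAdjust = wholeDays < 61 ? 0 : -1;
        return LocalDate.of(1899, 12, 31)
                .plusDays(wholeDays + dayAdjust)
                .atStartOfDay()
                .plusNanos(millis * 1_000_000L);
    }

    public static void main(String[] args) {
        System.out.println(fromSerial(0.0));     // 1899-12-31T00:00
        System.out.println(fromSerial(41275.0)); // 2013-01-01T00:00
    }
}
```

A cached value of 0 therefore comes out as 31 December 1899, while the evaluated result of DATE(2013;1;1), which is 41275, comes out as 1 January 2013.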

Related

Apache POI: Transfer time from cell to cell

Need in short: I'd like to copy a time value from one cell to another cell.
Problem in short: POI 5.2.2 (or more specific: DateUtil.internalGetExcelDate) transforms the 08:00 o'clock from the input cell to the numeric value -1.00.
More details: There's an xlsx file (created with LibreOffice 7.0.4.2) with the time '08:00:00' in it:
I can read that value with sourceCell.getLocalDateTimeCellValue(), which is fine.
But when I try to transfer that value into another cell (targetCell.setCellValue(sourceCell.getLocalDateTimeCellValue())), in the targetCell there is the value -1 instead of the expected 08:00:00 o'clock.
[Screenshot: debugging the setCellValue call]
[Screenshot: debugging the DateUtil.internalGetExcelDate call]
Possible workaround: I guess that it would work to read the LocalDateTime from the sourceCell and, if its year is < 1904, add some years so that the resulting LocalDateTime is not transformed to -1.00 by DateUtil.internalGetExcelDate.
This is something I don't want to do because that would set another value in the targetCell than there was in the sourceCell.
Another workaround: Another workaround would be to use LocalDateTime.now(), set the hour and minute, call targetCell.setCellValue(...) and then change the format like this:
short format = workbook.createDataFormat().getFormat("HH:MM:SS");
CellStyle cellStyle = workbook.createCellStyle();
cellStyle.setDataFormat(format);
targetCell.setCellStyle(cellStyle);
Unfortunately I don't know whether the sourceCell just contains a time or whether it contains a full timestamp. I just want to copy cell contents (which works fine with String, Number, ...).
Actual workaround: As a current (working) workaround I check the year and if it's <1900 then I set another year at the date, set the modified LocalDateTime into the cell and set the dataformat (see workaround description above).
Question: How can I transfer a LocalTime value from one cell to another cell (without manipulating the year)? I guess that my (working) workaround should not be the answer ...
Excel cell data types
Microsoft Excel only has the following cell data types:
String (alphanumeric)
Numeric (floating point number)
Boolean (true or false)
Formula (formula strings)
Error (internally error codes)
Empty (empty cell)
There is no special date cell data type, and no special integer cell data type either.
How Excel stores date or time or date-time
If Excel stores a date-time, it stores it as a floating point number, where 1.00 is 1900-01-01 00:00:00.000. (There is a special case when the workbook uses the 1904 date system, but that only changes the meaning of 1.00.)
Adding 1.00 means adding one day. Adding 1/24 means adding one hour. Adding 1/24/60 means adding one minute. Adding 1/24/60/60 means adding one second. Adding 1/24/60/60/10 means adding a tenth of a second, and so on.
For cell values lower than 1.00, Excel interprets that as time in day 0 of month 1 in year 1900 (or 1904 if Excel has set 1904-Date). There 1/24 means one hour. 1/24 + 1/24/60 means one hour and one minute and so on. So your 08:00:00 is the cell value 8*1/24 (8 hours) = 1/3.
When reading Excel cell values the only way to determine whether Excel interprets a cell value as a date or time or date-time is to get the cell's number format too. If that is a date or time or date-time format, then Excel interprets a cell value of 1.00 as 1900-01-01, a cell value of 1/24 as 01:00:00, a cell value of 1/24 + 1/24/60 as 01:01:00 and so on.
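The fraction-of-a-day arithmetic described above can be sketched in plain Java. This is only an illustration with made-up names; it assumes a fraction in the range 0 <= x < 1 and rounds to whole seconds.

```java
import java.time.LocalTime;

// Hypothetical helper for illustration only; not part of Apache POI.
class TimeFraction {
    private static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Time of day -> fraction of a day, as Excel stores it.
    public static double toFraction(LocalTime time) {
        return time.toSecondOfDay() / (double) SECONDS_PER_DAY;
    }

    // Fraction of a day (assumed 0 <= fraction < 1) -> time of day.
    public static LocalTime toTime(double fraction) {
        long seconds = Math.round(fraction * SECONDS_PER_DAY) % SECONDS_PER_DAY;
        return LocalTime.ofSecondOfDay(seconds);
    }

    public static void main(String[] args) {
        System.out.println(toFraction(LocalTime.of(8, 0)));  // roughly 0.3333, i.e. 1/3
        System.out.println(toTime(1.0 / 24 + 1.0 / 24 / 60)); // 01:01
    }
}
```

Because the value is just a double, a time-only cell can be copied as a number without ever going through a Java date type.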
But your observation is correct. If Apache POI reads a date-time from an Excel cell which has set only a time, which is a numeric (double) value between 0.00 and 1.00, then it reads a date-time of day 1899-12-31. But that is not what Excel does. For Excel the time-only value is in day 0 of month 1 in year 1900. If then Apache POI sets a date-time value of, for example 1899-12-31 08:00:00, then it sets -1 because Excel cannot have date-time values before 1900.
So the only way to set time values in Excel cells is to set numeric (double) values between 0.00 and 1.00 and set a cell style having a number format of HH:MM:SS. One cannot set a time only Excel cell value from a Java date-time value, because there is not a Java date-time value which can have day 0 of month 1 in year 1900.
So if DateUtil.isCellDateFormatted tells that the source cell is date formatted and the numeric (double) cell value of that cell is lower than 1, then set that numeric (double) cell value to the new cell and format that cell the same as the source cell.
Complete example to test:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.ss.usermodel.*;

public class ExcelSetCellValue {

    public static void main(String[] args) throws Exception {

        Workbook workbook = WorkbookFactory.create(new FileInputStream("./ExcelWithTime.xlsx"));
        String filePath = "./ExcelWithTimeNew.xlsx";
        //Workbook workbook = WorkbookFactory.create(new FileInputStream("./ExcelWithTime.xls")); String filePath = "./ExcelWithTimeNew.xls";

        Sheet sheet = workbook.getSheetAt(0);
        Cell sourceCell = sheet.getRow(0).getCell(0); // get cell value from A1: 08:00:00

        Row targetRow = sheet.getRow(5);
        if (targetRow == null) targetRow = sheet.createRow(5);
        Cell targetCell = targetRow.getCell(5);
        if (targetCell == null) targetCell = targetRow.createCell(5);

        if (sourceCell.getCellType() == CellType.NUMERIC && DateUtil.isCellDateFormatted(sourceCell)) {
            System.out.println(sourceCell.getNumericCellValue()); // 8*1/24 = 1/3
            System.out.println(sourceCell.getLocalDateTimeCellValue()); // 1899-12-31T08:00
            //targetCell.setCellValue(sourceCell.getLocalDateTimeCellValue()); // does not work because sourceCell.getLocalDateTimeCellValue() is in year 1899
            targetCell.setCellValue(sourceCell.getNumericCellValue());
            targetCell.setCellStyle(sourceCell.getCellStyle());
        }

        FileOutputStream out = new FileOutputStream(filePath);
        workbook.write(out);
        out.close();
        workbook.close();
    }
}

day and month are reversed

I have a cell with the following content:
01/02/2015
The cell is date formatted.
Then I copy the value and put it in my module class:
Set Line = New CTravelLine
Line.Date= Cells(1, 8).value
Everything works fine until the moment I put this value in another cell:
The value 01/02/2015 becomes 02/01/2015.
I am using this format (dd/mm/yyyy). I have the impression that when the days are numerically lower than the month, the 2 values are reversed. The values are reversed whatever the method I tried:
Method 1:
Dim WrdArray() As String, datetest As String
WrdArray() = Split(travelLine.Date, "/")
datetest= WrdArray(0) & "/" & WrdArray(1) & "/" & WrdArray(2)
Cells(5, 5) = datetest
Method 2:
Cells(5, 5) = travelLine.Date
I don't understand how I can solve this problem.
This might be due to a regional formatting problem.
Excel has a habit of forcing the American date format (mm/dd/yyyy) when dates have been imported from another data source. So if the day in your date happens to be between 1 and 12, Excel will read the date as mm/dd/yyyy.
When dates are imported from a text file, there is an option in the VBA code to apply the regional format, which corrects this problem.
OR
Change the number format of the date column in the Excel sheet from the 'Date' category to 'Text' and save it.
(After saving, run your VBA code if you have any. Then check whether the format is still 'Text' or has changed back to 'Date'.)
If it has changed back to 'Date', set it to 'Text' again.
If it is 'Text', correct the erroneous date cells and save the sheet. This stops the dates from being changed automatically to the American format.
Long story short, I had a similar problem: the dates worked just fine in some cells but kept flipping in others, regardless of whether I copy-pasted or entered them manually. I tried the whole Data > Text to Columns and cell formatting solutions, and none of that worked.
The solution actually is not in excel, it's in the region and language setting.
To have the dates display as MM/DD/YYYY in the formats tab change the format to US.
To have the dates display as DD/MM/YYYY in the formats tab change the format to UK.
I had the same issue as you.
Let me explain what I want to do:
1. I have a csv file with some dates.
2. I copy a range of my sheet into a variable table. The range contains some columns with dates.
3. I make some manipulations on my table (very basic ones).
4. I transpose my variable table (since only the last dimension of a variable table can be increased).
5. I put my variable table on a new sheet.
What I found:
There is no date issue after executing step 1-4. The date issue shows up when writing on the sheet...
Considering what Avidan said in the post of Feb 24 '15 at 13:36, I guess it is Excel which forces the American format mm/dd/yyyy... So I just change the date format at the very beginning of my program.
Before starting any manipulation of the dates, do:
.Cells("where my date is") = Format(.Cells("where my date is"), "mm dd yy")
Then:
execute your program
write the table on a sheet
switch back to the date format you like directly on the sheet
Just use:
Line.Date = CDate(Cells(1, 8).Value2)

Date format dd/mm/yyyy read as mm/dd/yyyy

I have a spreadsheet with a column formatted as:
Category: Date
Type: *dd/mm/yyyy
Location: UK
When I read the data in this column via VBA, it reads in the format mm/dd/yyyy.
For example, 10/06/2014 (10 June 2014) is reading 06/10/2014 (06 Oct 2014).
My code: sDate = SourceSheet.Range("AB" & CurRow.Row).Value
I have this issue with my forms too and the best method for me is to format the textbox like this:
sDate = format(SourceSheet.Range("AB" & CurRow.Row).Value, "mm/dd/yyyy")
Even though the date format is wrong in VBA, it seems to work the right way round in Excel. It's weird, I can't explain why it happens, but this fixes it for me. Whenever I go from VBA to Excel, I almost always find this issue if the value is stored as a date.
Consider:
Sub luxation()
    Dim sDate As Date, CurRow As Range
    Dim SourceSheet As Worksheet, ary As Variant
    Set SourceSheet = ActiveSheet
    Set CurRow = Range("A1")
    ' Split the displayed dd/mm/yyyy text and rebuild the date explicitly
    ary = Split(SourceSheet.Range("AB" & CurRow.Row).Text, "/")
    sDate = DateSerial(ary(2), ary(1), ary(0))
    MsgBox Format(sDate, "dd mmmm yyyy")
End Sub
This question of mine - .NumberFormat sometimes returns the wrong value with dates and times - gives some background which may help.
I first encountered this VBA bug many years ago and it is worse than it seems. I noticed that many - but not all - dates in a worksheet that I had been updating for a year were wrong. It took me a long time to diagnose the problem. Those dates that could be interpreted as middle endian dates had been corrupted but those that could not be interpreted as middle endian dates were unchanged. So 12/06/2014 will become 6 December but 13/06/2014 will remain 13 June. If 13/06/2014 had been rejected as an invalid date or left as a string, I would have spotted the error immediately. The dual interpretation meant that every date was imported as a date - the wrong date but still a date - which ensured I did not notice until much later, maximising the cost of correcting for the bug.
Excel holds dates and times as numbers. "17 June 2014" is held as 41807 and "1 January 1900" is held as 1. In both cases, the value is the number of days since 31 December 1899. Times are held as a fraction:
number of seconds since midnight
--------------------------------
seconds in a day
So 06:00, 12:00 and 18:00 are held as 0.25, 0.5 and 0.75.
This bug is encountered when the transfer of a date involves a conversion to and from string format. I have not discovered a single case in which the conversion from date to string has been wrong. It is the conversion from string to date that hits this bug.
I can see that SilverShotBee's solution will avoid the bug but it would not appeal to me. I no longer use any ambiguous dates ever.
One choice is to transfer the value as a number. If cell A3 contains the date and time "17 June 2014 9:00" then CDbl(Range("A3").Value) returns 41807.375. When you store this number in a cell you will need to set the cell's NumberFormat to the date format of your choice but that might be a good thing.
If I were going to use middle endian dates, I would be explicit. A date literal such as #06/13/2014# is always interpreted as middle endian (month/day/year).
I prefer unambiguous strings. "2014-06-13" or "13 June 2014" are not misinterpreted by VBA or by a human reader.
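The dual interpretation described above is easy to reproduce outside VBA. The following is a small Java sketch (Java rather than VBA, with made-up names) that tries both readings of a string and reports whether they disagree. It only illustrates the ambiguity; it does not model VBA's actual conversion rules.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper for illustration only.
class AmbiguousDates {
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd/MM/uuuu");
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("MM/dd/uuuu");

    // Returns the parsed date, or empty if the text is not valid in this format.
    public static Optional<LocalDate> tryParse(String text, DateTimeFormatter format) {
        try {
            return Optional.of(LocalDate.parse(text, format));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // True when both readings are valid dates and they differ.
    public static boolean isAmbiguous(String text) {
        Optional<LocalDate> dayFirst = tryParse(text, DAY_FIRST);
        Optional<LocalDate> monthFirst = tryParse(text, MONTH_FIRST);
        return dayFirst.isPresent() && monthFirst.isPresent()
                && !dayFirst.get().equals(monthFirst.get());
    }

    public static void main(String[] args) {
        System.out.println(isAmbiguous("12/06/2014")); // true: 12 June or 6 December
        System.out.println(isAmbiguous("13/06/2014")); // false: there is no month 13
    }
}
```

A check like this could be used to flag the risky values in a csv before they are imported.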
Have just come up against this issue! Reading records from a .csv and storing in an .xls
I found the following sequence works to overcome the misinterpreted dates:
Read the date field from the .csv file
Store it into a cell in the .xls file
Read it back into vba
Store into its required destination in the .xls
Date is in original format
I found this issue to be incredibly complex and was trying to keep it as simple as possible but have indeed left a few vital details out! Apologies. Here is a fuller version of what I found:
First of all I should explain I was reading dates (and other fields) from a .csv and storing back into an .xls
I am on Office 2002 running on Windows/7
Using 2 example dates: 27/4/2015 and 7/5/2015 in dd/mm/yyyy string format (from the csv)
What I found was:
Reading the 27/4/2015 text date field from csv into a variable dimensioned as STRING and storing into an xls field in dd/mm/yyyy DATE format produces a cell that reads 27/4/2015 but converting it into a cell formatted as Number also produces 27/4/2015. 7/5/2015 on the other hand produces a string that reads 7/5/2015 and converting it into a cell formatted as Number produces 42131.
Reading the 27/4/2015 text date field from csv into an undimensioned variable and storing into an xls field in dd/mm/yyyy DATE format produces a cell that reads 27/4/2015 but converting it into a cell formatted as Number also produces 27/4/2015 while 7/5/2015 reads 5/7/2015 and converting it into a cell formatted as Number produces 42190.
Reading the 27/4/2015 text date field from csv into a variable dimensioned as DATE and storing into an xls field in dd/mm/yyyy DATE format produces a cell that reads 27/4/2015 and converting it into a cell formatted as Number produces 42121. 7/5/2015 on the other hand produces a string that reads 5/7/2015 and converting it into a cell formatted as Number produces 42190.
The first 3 scenarios above therefore do not produce the desired results for all date specifications.
To fix this I do the following:
Input_Workbook.Activate
ilr = Range("A5000").End(xlDown).End(xlDown).End(xlDown).End(xlUp).Row
For i = 1 To ilr
    Input_Workbook.Activate
    If IsDate(Cells(i, 1).Value) Then
        d1 = Cells(i, 1).Value
        d1 = Replace(d1, "/", "-")
        ThisWorkbook.Activate
        Cells(14, 5).Value = d1
        d1 = Cells(14, 5).Value
        If VarType(d1) = vbString Then
            d1 = CDate(d1)
        End If
        Cells(i, 1).Value = d1
    End If
Next
The cell used to store the date initially is formatted GENERAL and the ultimate target cells is formatted as DATE (dd/mm/yyyy).
I don't have enough brain cells left to fully explain what happens to the dates during this process but it works for me and of course the choice of target cells is completely random in the above code block.
The problem was VBA was opening the csv with the reverse dates for single digit days.
This way of opening the workbook worked the same as when I did it manually so had the correct dates in dd/mm/yyyy format. Then copied across correctly:
Workbooks.OpenText FileName:=fpathO, datatype:=xlDelimited, comma:=True, local:=True

Excel VBA will not format date correctly after slice

I'm new to VBA and have been reading quite a bit about it lately, though I've run into a small issue which I can't seem to find an answer to.
I have a spreadsheet that I need to format into a certain number of columns, pulling data from one column and reformatting it into another.
One of these columns needs to include the week-ending date that the report was submitted on. The date is in a cell (N10) and looks like this:
Week: 2011 36:02 Oct 11 - 08 Oct 11
So I've sliced that cell and entered it into another cell (C14) with this bit of code:
Range("C14") = "=Right($N$10, 9)"
I'm able to get the portion of the string "08 Oct 11" but cannot get it into a m/d/yyyy format. This is the portion of the code I'm using to format to a date:
Columns("C:C").NumberFormat = "m/d/yyyy"
I imagine that this cell needs to be processed further before it can be formatted as a date, but I'm not sure where to start.
Thank you
The Right function will return a string, you need to convert it to a date. There's a variety of ways you could do that, one way could be to use the Value function:
Range("C14") = "=Value(Right($N$10, 9))"
This will give C14 the value 40824 (see this for an explanation), but once you apply your NumberFormat it will display as 10/8/2011.
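As a cross-check of that number, here is a small Java sketch (Java rather than a worksheet formula, with made-up names) that takes the last nine characters of the header text, parses them as a date, and counts the days from 30 December 1899. That epoch is an assumption that matches Excel's 1900 date system only for dates from 1 March 1900 onward.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

// Hypothetical helper for illustration only.
class WeekEndingSerial {
    // Effective day zero of the 1900 date system for dates after February 1900.
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd MMM yy", Locale.ENGLISH);

    // Parses the trailing "dd MMM yy" part of the header, like Right(text, 9).
    public static LocalDate weekEnding(String header) {
        String tail = header.substring(header.length() - 9);
        return LocalDate.parse(tail, FORMAT);
    }

    // Serial number Excel would hold for that date.
    public static long serialOf(String header) {
        return ChronoUnit.DAYS.between(EPOCH, weekEnding(header));
    }

    public static void main(String[] args) {
        String header = "Week: 2011 36:02 Oct 11 - 08 Oct 11";
        System.out.println(weekEnding(header)); // 2011-10-08
        System.out.println(serialOf(header));   // 40824
    }
}
```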

EXCEL VBA CSV Date formatting issue

I am an Excel VBA newbie. My apologies if I am asking what seems to be an obvious question.
I have a CSV text file I am opening in an Excel worksheet. One row of the file contains date information formatted as "YY-MMM" (i.e. 9-FEB or 10-MAY). When the Excel spreadsheet opens the file, the date information is changed to "mm/dd/yyyy" format and reads as 02/09/2009 or 05/10/2009, where the YY-MMM is now MM/YY/2009 (i.e. the 9-FEB becomes 02/09/2009, and the 10-MAY becomes 05/10/2009).
I would like to know if it is possible to reformat the field from YY-MMM to mm/01/yyyy.
I have tried to parse the date field after converting it to text with
Range("B11", "IV11").NumberFormat = "text"
However, then the value is a serial date and non-parsable.
I have been unsuccessfully looking for a list of the NumberFormat options.
If you can point me in a direction it will be much appreciated.
Just to answer a part of your question, here is the list of date formatting options (excluding time):
d = day of month, e.g. 7
dd = zero-padded day of month, e.g. 07
ddd = abbreviated day name, e.g. Thu
dddd = full day name, e.g. Thursday
pretty much the same for month...
m = month number, e.g. 7
mm = zero padded month number, e.g. 07
mmm = abbreviated month name, e.g. Jul
mmmm = full month name, e.g. July
years are simpler...
yy = 2 digit year number, e.g. 09
yyyy = 4 digit year number, e.g. 2009
you can combine them and put whatever separators you like in them
e.g.
YY-MMM, 09-FEB
DDDD-DD-MMM-YY, Wednesday-04-Feb-09
dd/mm/yyyy, 04/02/2009
mm/dd/yyyy, 02/04/2009
I've just been trying a few things out and I think your best bet is to change the format in the text file to conform to a more standard date arrangement. If you can reverse them (e.g. MMM-YY) you'll be fine, or split them into separate columns (what if when you import you define - as a separator as well as comma?). This is one case where Excel trying to be clever is a pain.
HTH
If you are currently using File > Open to open the CSV file then try Data > Import External Data > Import Data instead. This brings up the text import wizard which might give you more flexibility in how to import the file. Specifically, it lets you declare a column in the file as being text so that Excel does not try to parse the value
As Simon has explained, your terminology for the current date format is not correct. 9-FEB corresponds to d-mmm format and 02-19-2009 (NB deliberately changed to an unambiguous date for this example; 02-09-2009 is 9th Feb in the US but 2nd Sept in the UK) corresponds to mm-dd-yyyy
If you wanted to change the NumberFormat to text for the range starting in cell B11 and ending in cell IV11 then you would use:
Range("B11:IV11").NumberFormat = "@"
@ signifies text format, and the : operator indicates a contiguous range of cells. Your original Range("B11", "IV11") also refers to the whole block from B11 to IV11, since the two-argument form takes the first and last cell of the range; it is Range("B11,IV11"), with the comma inside the string, that would indicate the union of just those two cells.
It is unusual to have your data structured in rows rather than columns. Rather than having your dates in row 11, it would be more common to have them in column K instead, so that you would use Range("K2:K65535") or just Columns("K"). Most of the built-in Excel stuff like Sort and AutoFilter assumes that your data is laid out in columns anyway. Row 1 traditionally contains the column names and then each subsequent row contains an individual record. You might want to look at how your CSV file is being generated to see if you can switch it to a more usable column-based structure.
If you have a date like 19-FEB as text in a cell then, assuming that you always want the year part to be the current year, you can change it to the first day of the month in mm/dd/yyyy format in VBA. This example only changes one cell but you can use it as the basis for a wider solution:
Option Explicit

Sub main()
    Dim strOrigValue As String
    With ThisWorkbook.Worksheets("Sheet1").Range("B11")
        strOrigValue = .Value
        .NumberFormat = "mm/dd/yyyy"
        .Value = DateSerial(Year(Date), Month(strOrigValue), 1)
    End With
End Sub
DateSerial creates a date serial number from the given year, month and day. Year(Date) extracts the year part from the current system date, Month(strOrigValue) takes the month part from the data that's already in the cell and we want the first day of the month
In order to read in a date, you can do something like this:
myDate = DateValue(dateText)
although there are caveats to how excel may interpret dates, and what technical limits it has in what dates it may store. (you may need to read up on maximum and minimum dates, and how excel interprets two-digit years >30 vs <30, among other things)
In order to format the date, you can use one of three functions:
myDateText = DatePart("yyyy", myDate) & "/" & DatePart("m", myDate) & "/" & DatePart("d", myDate)
or this
Format (myDate, "yyyy/mm/dd")
or this
FormatDateTime(#1/1/2020#, vbShortDate)
You can search google for more documentation on each function. For the format function, you will want to find a good reference -- I've seen some rather incomplete ones. MSDN tends to have good ones, but not always. Feel free to bump this answer if you want more details.
This assumes your 2 digit year is in the 2000s. Once you end up with your incorrect date (9-Feb read as 02/09/2009), you can convert this to the date you want with the formula: =DATE(Day(B11)+2000,Month(B11),1)
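The same remapping can be sketched in Java to show what the formula does to each part of the misread date. The names are made up for illustration, and the code carries the same assumption as the formula: the two-digit year belongs to the 2000s.

```java
import java.time.LocalDate;

// Hypothetical helper for illustration only.
class YearMonthFix {
    // The misread date holds the intended two-digit year in its day field.
    // Rebuild it as the first day of the same month in year 2000 + day.
    public static LocalDate fix(LocalDate misread) {
        return LocalDate.of(2000 + misread.getDayOfMonth(), misread.getMonth(), 1);
    }

    public static void main(String[] args) {
        System.out.println(fix(LocalDate.of(2009, 2, 9)));  // 2009-02-01
        System.out.println(fix(LocalDate.of(2009, 5, 10))); // 2010-05-01
    }
}
```

So 9-FEB ends up as 1 February 2009 and 10-MAY as 1 May 2010, whatever year Excel filled in when it opened the file.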
