How to recreate excel's Long type to Date type conversion in scala - excel

I did a Paste Special of a date column as values in Excel. I want to convert the resulting long values back to dates in Spark (using the Scala API).
Example: converting 41088.96389 to a date in Excel results in 6/29/16 23:08.
When I do the same through cast(DataTypes.TimestampType) in Spark, it gives 01 Jan 1970 11:24:48 GMT.
Any links on how Excel handles the long type when converting to a date would be appreciated.

An Excel date is a serial number: the number of days counted from 1 January 1900 (which is day 1), with the time of day stored as the fractional part.
DateTime or Date usually accepts a UNIX epoch timestamp, which is the number of milliseconds (or sometimes seconds, depending on the implementation) elapsed since 1 January 1970.
Converting between the two is not entirely trivial, since you have to account for leap years, and maybe even time zones.
You can find multiple implementations of this in different languages; maybe you can port one of those:
https://gist.github.com/peter216/6361201
https://gist.github.com/christopherscott/2782634
Or maybe just try to normalize the Excel output if you have access to the sheet:
https://stackoverflow.com/a/11140924/1395437
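For the Scala side, the day-count arithmetic can be sketched with java.time. This is only a minimal sketch under stated assumptions: the names ExcelSerial and toDateTime are made up for illustration, time zones are ignored, and the 1900-system epoch of 30 December 1899 is only valid for serials from 61 (1 March 1900) onward because of Excel's fictitious 29 February 1900. Note that 41088.96389 lands on 28 June 2012 in the default 1900 date system and on 29 June 2016 in the 1904 date system, so the workbook in the question appears to use the 1904 system.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

object ExcelSerial {
  // Day 0 of each Excel date system (assumption: serials >= 61 for the 1900 system).
  val Epoch1900: LocalDate = LocalDate.of(1899, 12, 30)
  val Epoch1904: LocalDate = LocalDate.of(1904, 1, 1)

  private val MillisPerDay = 86400000L

  /** Converts an Excel serial (whole days plus fraction of a day) to a zone-less date-time. */
  def toDateTime(serial: Double, epoch: LocalDate = Epoch1900): LocalDateTime = {
    val millis = math.round(serial * MillisPerDay) // round to the nearest millisecond
    epoch.atStartOfDay().plus(millis, ChronoUnit.MILLIS)
  }

  def main(args: Array[String]): Unit = {
    // Truncated to minutes for display.
    println(toDateTime(41088.96389).truncatedTo(ChronoUnit.MINUTES))            // 2012-06-28T23:08
    println(toDateTime(41088.96389, Epoch1904).truncatedTo(ChronoUnit.MINUTES)) // 2016-06-29T23:08
  }
}
```

In Spark, casting a number to a timestamp treats it as seconds since 1970, which is why the cast in the question returned 11:24:48 on 1 Jan 1970. The same shift could probably be written on the column as (serial - 25569) * 86400 before the cast for the 1900 system, where 25569 is the serial of 1 January 1970; for the 1904 system the offset would be 24107.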

Related

Unix timestamp from csv gives me the same date, although different values

I have a csv file with some columns, like date and time. The date is in a 10-digit format, which is Unix time from what I read online.
Examples are: 1567285228, 1567285348, 1567371053. I found that I can use the formula =(((A2/60)/60)/24)+DATE(1970,1,1) to calculate the proper date from the code. The problem is that my csv covers one entire month and the codes are all different, but the formula gives me only 2 dates: 01 Sept 2019 and 31 Aug 2019.
How can I find the REAL dates, for the WHOLE month?
Also, the times when the measurements are taken are, for example: 712980, 713040, 713100, 713160, but when I use Excel to format the timestamp as a proper time, it only shows me 12 AM. How can I calculate the proper dates and times, so that I can analyse the data from the csv?
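As a cross-check outside Excel, the three sample values can be decoded with a few lines of Scala. This is just a sketch: the object name UnixSeconds is hypothetical, and it assumes the values are seconds since 1970-01-01 in UTC. Under that assumption the samples really do fall on 31 Aug and 1 Sep 2019, so the formula's result for these particular rows looks consistent.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object UnixSeconds {
  /** Interprets a 10-digit Unix timestamp (seconds since 1970-01-01 UTC) as a UTC date-time. */
  def toUtc(epochSeconds: Long): LocalDateTime =
    LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC)

  def main(args: Array[String]): Unit =
    // Sample values copied from the question.
    Seq(1567285228L, 1567285348L, 1567371053L)
      .foreach(s => println(s"$s -> ${toUtc(s)}"))
}
```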

How to convert a String date to a Date

How can I convert this string:
Tue Jan 24 14:59:20 BRT 2017.
Into a date that includes day month year and time and timezone, using Excel functions only.
I have several cells with dates following this format. I have to compute the difference between some of these dates in minutes. I believe that the first step is converting the String to a real date value. Then I will be able to order the dates and compute the time between consecutive dates.
Use this formula:
=--(SUBSTITUTE(MID(A1,5,LEN(A1)),"BRT",""))
Then format it to the format you want.
It will now work in math equations.

Timestamp line to an Excel-readable timestamp

I have thousands of timestamps in the form shown:
Sun Jul 02 06:00:02 2017 (GMT-04:00)
With Date, Day, and Time all varying. This is a military time stamp (i.e. each day counts up to 24 hours).
I have used the following formula to get just the time, but it's not readable by Excel.
=MID($A2,FIND(":",$A2)-2,8)
This results in a value like:
06:00:02
However, this does not help and I have to do much manipulation. I want Excel to recognize the date and time so I have a true timeline on my x-axis.
I'd like to get it in the form of
7/2/2017 6:00:02 AM
This way I can have a graph that appears like the following:
[Image: graph example with the desired timeline on the x axis]
Cheers!!!!
Since the weekday and the month are all in three letter abbreviations, you can use the Mid function with an absolute position to extract elements of the date. For just the date you could use
=DATEVALUE(MID(A1,9,2)&"-"&MID(A1,5,3)&"-"&MID(A1,21,4))
For just the time value, you could use
=TIMEVALUE(MID(A1,12,8))
To get the date and time in one value, just add the two and format as you want to see it.
=DATEVALUE(MID(A1,9,2)&"-"&MID(A1,5,3)&"-"&MID(A1,21,4))+TIMEVALUE(MID(A1,12,8))
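If the timestamps are ever pre-processed outside Excel, the same fixed layout can be parsed in one step. The following is a rough Scala sketch, assuming every line looks exactly like the sample (English three-letter day and month names, then a "(GMT±hh:mm)" suffix); the object name LogTimestamp is made up for illustration.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object LogTimestamp {
  // The literal "(GMT" and ")" are quoted; xxx matches an offset such as -04:00.
  private val Format: DateTimeFormatter =
    DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss yyyy '(GMT'xxx')'", Locale.ENGLISH)

  /** Parses e.g. "Sun Jul 02 06:00:02 2017 (GMT-04:00)" into a date-time with its offset. */
  def parse(text: String): OffsetDateTime = OffsetDateTime.parse(text, Format)

  def main(args: Array[String]): Unit =
    println(parse("Sun Jul 02 06:00:02 2017 (GMT-04:00)"))
}
```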

Date is not converted exactly when read from excel

I have an Excel spreadsheet that is being read.
The value in the spreadsheet is 7/24/2014 10:43:33 AM
The cell value after being read using OpenXML is 41844.446908680555.
When I do this calculation to convert to a date:
dte = DateTime.FromOADate(double.Parse(value));
I get 7/24/2014 10:43:32 AM
Is this typical when converting date/time or am I missing something?
Thanks
It seems that DateTime.FromOADate(double.Parse(value)); is truncating rather than rounding the fractional seconds. Excel stores date/time as days and fractions of days since 1 Jan 1900 (with the intentional error of calling 1900 a leap year, supposedly for compatibility with Lotus 1-2-3 at the time).
Therefore, given Excel's level of precision, the number 41844.446908680555 translates to
7/24/2014 10:43:32.910
(actually: 7/24/2014 10:43:32.9099949030205)
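The arithmetic behind this can be reproduced in a few lines. The Scala sketch below (names are hypothetical, and it only illustrates truncating versus rounding the seconds, not how .NET is actually implemented) takes the fractional part of the serial, scales it to seconds, and shows that truncation yields 10:43:32 while rounding yields 10:43:33.

```scala
import java.time.LocalTime

object SerialSeconds {
  private val SecondsPerDay = 86400

  /** Seconds since midnight encoded by the fractional part of an Excel serial. */
  def secondsOfDay(serial: Double): Double =
    (serial - math.floor(serial)) * SecondsPerDay

  /** Drops the fractional seconds. */
  def truncated(serial: Double): LocalTime =
    LocalTime.ofSecondOfDay(secondsOfDay(serial).toLong)

  /** Rounds to the nearest second; the modulo wraps 24:00:00 to 00:00 and ignores the day carry. */
  def rounded(serial: Double): LocalTime =
    LocalTime.ofSecondOfDay(math.round(secondsOfDay(serial)) % SecondsPerDay)

  def main(args: Array[String]): Unit = {
    val serial = 41844.446908680555
    println(truncated(serial)) // 10:43:32
    println(rounded(serial))   // 10:43:33
  }
}
```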
Just format the cells as Dates. 41844.446908680555 is Excel's way of serializing the date value.
When Excel stores a date or time, it stores it as a number, with the date January 1, 1900 = 1.
So when you store a value with a date format, you are really just storing the number of days between that date and the start of 1900.
For example, 365 = Dec-30-1900 (Excel counts a nonexistent Feb 29, 1900).
Fractions of a number equal parts of the day, so .5 = half a day, or 12 hours.
And for the fun of it, right now = 41885.75, i.e. Sept-3-2014 at 6 PM is 41885.75 days counted from Jan 1 1900.
The reason this is done is to allow dates to be used in mathematical functions. It also deals with a lot of problems that pop up with dates, such as leap years, and provides easier ways to deal with time zones as well.
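To illustrate why a plain number is convenient for arithmetic, here is a small Scala sketch that goes the other way, from a date-time to a serial. It assumes the 1900 date system with 30 December 1899 as day 0 (valid from 1 March 1900 onward); the names ToExcelSerial and toSerial are made up for this example.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.temporal.ChronoUnit

object ToExcelSerial {
  // Assumed day 0 of the 1900 date system; only valid from 1 Mar 1900 onward.
  private val Epoch: LocalDate = LocalDate.of(1899, 12, 30)

  /** Whole days since the epoch plus the time of day as a fraction of 24 hours. */
  def toSerial(dt: LocalDateTime): Double = {
    val days = ChronoUnit.DAYS.between(Epoch, dt.toLocalDate)
    val fraction = dt.toLocalTime.toSecondOfDay / 86400.0
    days + fraction
  }

  def main(args: Array[String]): Unit = {
    val sixPm = toSerial(LocalDateTime.of(2014, 9, 3, 18, 0))
    val noon  = toSerial(LocalDateTime.of(2014, 9, 3, 12, 0))
    println(sixPm)        // 41885.75
    println(sixPm - noon) // 0.25, i.e. a quarter of a day = 6 hours
  }
}
```

Subtracting two serials gives the elapsed time in days directly, which is the kind of date math the answer refers to.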

Excel parsing (xls) date error

I'm working on a project where I have to parse Excel files for a client to extract data. An odd thing is popping up here: when I parse a date in the format of 5/9 (May 9th) in the Excel sheet, I get 39577 in my program. I'm not sure if the year is encoded here (it is 2008 for these sheets).
Are these dates the number of days since some sort of epoch?
Does anyone know how to convert these numbers to something meaningful? I'm not looking for a solution that would convert these properly at time of parsing from the excel file (we already have thousands of extracted files that required a human to select relevant information - re-doing the extraction is not an option).
Excel stores dates as the number of days since 0-JAN-1900 (so 1-JAN-1900 would have a value of "1"). You can find a really good breakdown of how Excel handles dates and times here:
Dates And Times In Excel
When dates appear on screen in Excel as "5/9", "May 9th", or some such, it is a trick of the formatting and is not representative of the underlying data value. It sounds like your parsing program is pulling the underlying value, not the formatted date. In order to suggest a fix, though, I need to know what your parsing program is (Excel macro, formula, outside code, etc.).
DateTime.FromOADate (if you're using .NET) is the method you want. Excel dates are stored as doubles. If you have dates in the first two months of 1900 you might get bit by the Excel 1900 leap year bug.
From http://msdn.microsoft.com/en-us/library/system.datetime.fromoadate.aspx:
Double-precision floating-point number that represents a date as the number of days before or after the base date, midnight, 30 December 1899. The sign and integral part of d encode the date as a positive or negative day displacement from 30 December 1899, and the absolute value of the fractional part of d encodes the time of day as a fraction of a day displacement from midnight. d must be a value between negative 657435.0 through positive 2958466.0.
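For anyone converting the already-extracted numbers outside .NET, the leap year quirk can be handled explicitly. The Scala sketch below is one possible way to do it, not a reference implementation: Excel1900 and toDate are hypothetical names, only whole-day serials in the 1900 date system are covered, and serial 60 (Excel's nonexistent 29 February 1900) is reported as having no real date. Serials below 60 need a base date one day later than the 30 December 1899 quoted above, which is exactly where the two conventions disagree.

```scala
import java.time.LocalDate

object Excel1900 {
  /** Maps a whole-day Excel serial (1900 date system) to a calendar date, if one exists. */
  def toDate(serial: Int): Option[LocalDate] =
    if (serial < 1) None                      // not a valid Excel date serial
    else if (serial < 60)                     // before the fictitious leap day
      Some(LocalDate.of(1899, 12, 31).plusDays(serial.toLong))
    else if (serial == 60) None               // Excel's nonexistent 29 Feb 1900
    else                                      // from 1 Mar 1900 onward
      Some(LocalDate.of(1899, 12, 30).plusDays(serial.toLong))

  def main(args: Array[String]): Unit = {
    println(toDate(39577)) // Some(2008-05-09)
    println(toDate(1))     // Some(1900-01-01)
    println(toDate(60))    // None
  }
}
```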
All you need to do is format the cells correctly. Or am I misunderstanding your question -- are you saying you want to do it OUTSIDE of Excel? I wasn't sure. I'll delete this answer if it turns out to be stupid.
