Dealing with date strings with different formats in Matlab - string

I have a very long vector of date strings (1000000+) in Matlab that I want to convert to a serial number format. The issue is that not every date string is the same format. They are one of two formats,
'2010-03-04 12:00:00.1'
or
'2010-03-04 12:00:00'
The problem is that not all the strings have millisecond precision, and there is no regular pattern as to where the strings without milliseconds occur. The data is originally read from a data file, and the strings currently exist as a cell array. My workaround is as follows:
nums = zeros(numel(dates),1); %# numeric output, kept separate from the cell array
for i=1:numel(dates)
    if length(dates{i})==19
        nums(i)=datenum(dates{i});
    elseif length(dates{i})==21
        nums(i)=datenum(dates{i},'yyyy-mm-dd HH:MM:SS.FFF');
    end
end
Is there perhaps a better way to go about this? It is important that I retain the millisecond precision when it is present. The intent of this is that I will have to extract and calculate statistics on data associated with each time based on different time criteria, and I figured it would be easier if the dates were handled as numbers.

In MATLAB R2010b, I'm able to get the desired output when calling DATENUM with no additional formatting arguments:
>> dateStrs = {'2010-03-04 12:00:00.1'; ... %# Sample strings
'2010-03-04 12:00:00'};
>> datenum(dateStrs)
ans =
1.0e+005 *
7.3420 %# The same? No, the Command Window just isn't displaying
7.3420 %# many places after the decimal point.
>> format long %# Let's make it show more decimal places
>> datenum(dateStrs)
ans =
1.0e+005 *
7.342015000011574 %# And there's the difference!
7.342015000000000
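The size of that difference can be checked by hand: a serial date number counts days, so 0.1 s is 0.1/86400, or about 1.1574e-06 of a day. The sketch below uses Python purely as a calculator, not as a replacement for DATENUM. The helpers `matlab_datenum` and `parse_mixed` are hypothetical names, and the +366 offset between Python's date ordinal and MATLAB's day count is the commonly used conversion, assumed here rather than taken from the answer.

```python
from datetime import datetime

def matlab_datenum(dt):
    """Approximate MATLAB serial date number for a naive datetime (hypothetical helper)."""
    # Assumption: MATLAB's day count is Python's proleptic Gregorian ordinal plus 366.
    days = dt.toordinal() + 366
    seconds = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    return days + seconds / 86400.0

def parse_mixed(s):
    """Parse 'yyyy-mm-dd HH:MM:SS' with or without a fractional-second suffix."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in s else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(s, fmt)

with_frac = matlab_datenum(parse_mixed("2010-03-04 12:00:00.1"))
no_frac = matlab_datenum(parse_mixed("2010-03-04 12:00:00"))
print(f"{with_frac:.10f}")
print(f"{no_frac:.10f}")
```

The two results differ only from the sixth decimal place onward, which is why the default short display shows them as identical.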

Related

SAS - Excel Import DateTime not retaining enough significant digits

As the title states, I'm having issues importing datetimes from Excel to SAS. The issue seems to be with seconds.
Here are the raw Excel numbers (expanded in EDIT2):
43417.58657407
43417.58656250
When I import these into SAS, this is how SAS displays them:
13NOV2018:14:04:39
13NOV2018:14:04:39
and the numeric values:
1857737079
1857737079
I'm trying to figure out how to get SAS to read the seconds correctly. I'm using proc import and here's my code:
proc import
out = MyDSOutput
datafile= MyDSInput
dbms = EXCEL replace;
sheet = "page";
getnames = yes;
mixed = yes;
scantext = yes;
usedate = no;
scantime = yes;
textsize = 32767;
;
run;
EDIT: I should have added that converting this to a CSV really isn't an option because I have numeric IDs that are >15 digits and excel will convert anything >15 digits to 0s.
EDIT2: Added an expanded version of the raw excel numbers
Excel stores time as a fraction of a day, and a specific number of seconds generally cannot be represented exactly that way. That particular time, 14:04:40 (second number 50,680 of the day), is particularly hard to represent as a floating point fraction. If you represent it as 0.58657407 and multiply by the number of seconds in a day (86,400), you get 50,679.999648 seconds, so slightly less than what you want.
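The arithmetic can be reproduced with a short Python sketch. `excel_to_sas_datetime` is a hypothetical helper, and whether the import step truncates or rounds is an assumption made for illustration, not something confirmed by the SAS documentation. The constant 21916 is the Excel serial number of 1960-01-01, the SAS epoch.

```python
SECONDS_PER_DAY = 86400
EXCEL_TO_SAS_DAYS = 21916  # Excel serial number of 1960-01-01, the SAS epoch

def excel_to_sas_datetime(serial, rounding=True):
    """Convert an Excel serial datetime to SAS datetime seconds (hypothetical helper)."""
    seconds = (serial - EXCEL_TO_SAS_DAYS) * SECONDS_PER_DAY
    # Rounding snaps to the nearest whole second; int() simply drops the fraction.
    return round(seconds) if rounding else int(seconds)

first, second = 43417.58657407, 43417.58656250
print(excel_to_sas_datetime(first, rounding=False))
print(excel_to_sas_datetime(first, rounding=True))
print(excel_to_sas_datetime(second, rounding=True))
```

Truncating the first value gives 1857737079, the number shown in the question, while rounding gives 1857737080 (14:04:40). The second value rounds to 1857737079 (14:04:39). Rounding to the nearest second after conversion is therefore one way to recover the intended times.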
Try splitting your DATETIME field into separate DATE and TIME fields. That way SAS will have more binary digits to represent just the time of day (since the value doesn't also have to carry the day count, which runs to tens of thousands of days). Perhaps that will get close? Or store the value as a character string in Excel and then use the INPUT() function in SAS to convert the string to a datetime value.
For your ID issue, do not store ID values as numbers in any system if you can avoid it. If you can store the ID in your Excel file, then Excel can write it to the CSV file. But when you read the CSV file, make sure to read that column into a character variable and not a numeric one.

SAS: Date reading issue

I have imported an Excel sheet where date1 is 4/1/16, date2 is 5/29/14, and date3 is 5/2/14. However, when I import the sheet into SAS and run PROC PRINT, the first two variable columns show as "42461" and "41788", while date3 shows as 05/02/2014.
I need these date formats consistent b/c I am doing a Cox regression with PROC PHREG.
Any thoughts about how to make these dates consistent?
Thanks!
This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as numbers, the third as a string.
In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also define another column as a string, using the text(<whatever>, "YYYY-MM-DD") to convert to a string in that format.
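Alternatively, you can import all as numbers and then add the value to 1899-12-30. That is the effective base date for Excel serial numbers from 1900-03-01 onward. Nominally "1" is 1900-01-01, but Excel also counts a nonexistent 1900-02-29, which shifts every later date by one day.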
Because your column had mixed numeric (date) and character values SAS imported the field as character. So the actual dates got imported as the text version of the actual number that Excel stores for dates. The ones that look like date strings in SAS are the fields that were strings in Excel also.
Or if in your case one of the three columns was all valid dates then SAS imported it as a number and assigned a date format to it so there is nothing to fix for that column.
The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
Once you have the strings in SAS and you want to try to fix them then you need to decide which strings look like integers and which should be treated as date strings.
So you might just check if they have any non-digit characters and assume those are the ones that are date strings instead of numbers. For the ones that look like integers just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.
data want ;
set have ;
if missing(excel_string) then date=.;
else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
else date=input(excel_string,32.) + '01JAN1900'd -2 ;
format date yymmdd10. ;
run;
You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.
data excel_dates;
input datestr :$10. excel_num :comma32. #1 sas_num :yymmdd10. ;
diff = sas_num - excel_num ;
format _numeric_ comma14. ;
sasdate1 = excel_num - 21916;
sasdate2 = excel_num + '01JAN1900'd -2 ;
format sasdate: yymmdd10.;
cards;
1900-01-01 1
1900-02-28 59
1900-03-01 61
1960-01-01 21,916
2018-01-01 43,101
;
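The same table can be cross-checked outside SAS. The Python sketch below is only an illustration: `excel_serial_to_date` is a hypothetical helper that models Excel's 1900 date system, including the phantom leap day, and compares the result with the SAS day count from 1960-01-01.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)

def excel_serial_to_date(n):
    """Convert an Excel (1900 date system) serial day number to a date (hypothetical helper)."""
    if n == 60:
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    # Serials 1..59 count from 1899-12-31; later ones are shifted by the phantom leap day.
    base = date(1899, 12, 31) if n < 60 else date(1899, 12, 30)
    return base + timedelta(days=n)

for n in (1, 59, 61, 21916, 43101):
    d = excel_serial_to_date(n)
    # Columns: Excel serial, calendar date, true SAS date value, serial minus 21916
    print(n, d.isoformat(), (d - SAS_EPOCH).days, n - 21916)
```

For serials of 61 and above, the last two columns agree, so subtracting 21916 is safe for any date after February 1900. For serials 1 through 59 the two columns differ by one day.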

Convert time string from unix time command like 10m20.5s into time format in excel

I have time data from the unix time command like
203m53.5s
I have this data in Excel. I want to convert it to Excel's time format so I can do mathematical operations like sums and averages over it.
How can I do this?
Replace the m with : and the s with "":
=--SUBSTITUTE(SUBSTITUTE(A1,"m",":"),"s","")
Now that the time is in a format that Excel will recognize, we need to change it from text to a number. The -- forces the string into a number by negating it twice, which is equivalent to multiplying it by -1 * -1.
The -- can be replaced by TIMEVALUE().
Then format the cell with a custom format of:
[mm]:ss.0
One way is to use a formula to strip out the m and s and use those values for the time in a new column in Excel.
Assume the Unix data is in column A.
=(LEFT($A1,FIND("m",$A1)-1)*60+MID($A1,FIND("m",$A1)+1,LEN($A1)-FIND("m",$A1)-1))/86400
Then format the cell as custom and choose the time format without the AM/PM.
Breakdown:
LEFT(...) gets the minutes: everything before the "m"
multiply by 60 to convert to seconds
+ MID(...) gets the seconds: it starts one character after the "m" and runs for the length of the whole string minus the position of the "m"
-1 to leave off the trailing "s"
Then divide the whole thing by 86400 (the number of seconds in a day) to convert to time as a decimal fraction of a day
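The same breakdown can be sanity-checked outside Excel. This is a minimal Python sketch, with `unix_time_to_seconds` as a hypothetical helper and the input assumed to always have the form `<minutes>m<seconds>s`:

```python
import re

def unix_time_to_seconds(text):
    """Parse a string such as '203m53.5s' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time string: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

seconds = unix_time_to_seconds("203m53.5s")
day_fraction = seconds / 86400  # Excel stores time as a fraction of a day
print(seconds, day_fraction)
```

For 203m53.5s this gives 12233.5 seconds, or roughly 0.1416 of a day, which is the value the Excel formula should produce before any cell formatting is applied.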

printing serial date and data to .csv or excel file using fprintf?

I just started using Matlab (R2015a) a few weeks ago, and although I have searched for an answer to this question (and tried a few workarounds) I haven't had any luck. Hopefully, it's an easy fix!!
I am trying to write one column of serial dates at high precision (I need milliseconds) and many columns of data to a .csv file. I don't want insane precision for everything, just the first column of dates.
Here's what I've found:
- csvwrite doesn't allow for differing precisions.
- xlswrite doesn't have enough precision (even though my serial date is a double, and yes I looked at the spreadsheet cell)
- dlmwrite appends data in row format, so writing the dates and then appending the rest of the data doesn't work (though soooo close!)
Now I'm trying with fprintf:
hz_time is the serial date (double)
data1 and data2 are 4x25 (double) and 4x7 (double) respectively
hz_time = 1.0e+05 *
[7.357583607870371, 7.357583607928241, 7.357583607986110, 7.357583608043980]
STR_data = [data1, data2];
filename = (strcat('Processed_',files(k1).name));
file = fopen(filename,'w');
fprintf(file,'%.20f\n',hz_time);
fprintf(file,'%f%f%f%f%f%f%f%f%\n',STR_data);
fclose('all')
Currently, this code appends data1 and data2 in one cell at the end of the STR_date_time column. When I try concatenating hz_time and the data matrices together (using strcat) I fail:
STR_data = strcat([hz_time, data1, data2])
Warning: Out of range or non-integer values truncated during conversion to character.
I'm sure it's probably my formatting...
My end goal is to export this data (into a .csv or excel spreadsheet or something) so that the first column has the serial date (loads of precision) and columns 2-8 have the other data in it.
Any help would be much appreciated.
Thanks in advance!

How do I set decimal places when I pull data from Bloomberg using Matlab?

I was retrieving data from Bloomberg using Matlab history function and it seems that Matlab set 4 decimal places to be the default. This is sometimes inconsistent with the data that I pulled from Excel. For example:
Here's the Matlab code:
[d, sec] = history(c, 'TY1 Comdty', 'PX_LAST', '1982-5-6', '1982-5-6')
I get different results from Matlab and Excel:
Date 5/6/1982
Excel 72.96875
Matlab 72.9688
Is there a way to set the property of history function and get 72.96875 instead of 72.9688?
Matlab has nothing as flexible as Excel's cell formats for displaying a number in a chosen format.
In Matlab, you can set format long to show 15 decimal places, and format short to show 4 decimal places. That's all you have.
Nevertheless, here are two workarounds. The first uses round:
format long %define 15 digit precision
xround = @(x,d) round(x/d)*d; %rounding function with step size d
a = xround(72.96875, 0.00001) %round your value by calling the 'xround' function
It gives
a = 72.968750000000000
The second workaround prints a string (not a scalar):
sprintf('%.5f', 72.96875)
It gives
ans = 72.96875
To match Excel with Matlab, you can type
[d, sec] = history(c, 'TY1 Comdty', 'PX_LAST', '1982-5-6', '1982-5-6');
d = xround(d, 0.00001);
Use the format command to set the display to the desired number of significant digits: http://www.mathworks.co.uk/help/matlab/ref/format.html
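Both answers rest on the same point: the display setting changes how many digits are printed, not the value that is stored. The short Python sketch below illustrates this with the number from the question. It uses Python string formatting as a stand-in for Matlab's display modes, which is an analogy rather than a description of Matlab's internals.

```python
value = 72.96875  # exactly representable in binary: 72 + 31/32

short_display = f"{value:.4f}"  # four decimals, comparable to a short display
full_display = f"{value:.5f}"   # five decimals, as in the sprintf workaround

print(short_display)
print(full_display)
# The stored value is unchanged; only the four-decimal text has lost information.
print(float(short_display) == value)
print(float(full_display) == value)
```

The four-decimal text reads 72.9688 while the stored number is still 72.96875, so calculations on the retrieved data are unaffected by the display setting.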
