I have a date column that is of a string format. The dates appear like,
DDMMYYY or DDMYYYY
04032021 0412021
It has been a nightmare trying to split them and add '-' because they aren't all the same length. Not like a standard format of DDMMYYYY. I have also tried to split them by position. I have tried to cast it(the column) as a date.
Do you all have any suggestions for a way to get the entire column properly formatted and displaying as a date column? I need to filter on it.
It seems that the month and the year are on a fixed position relative to the start resp. end of the string.
So a trivial solution using substring would extract them and interpret the rest (padded with zeroes to the length of two) as the month.
select
dt,
substring(dt,1,2) dd,
LPAD(substring(dt,3,length(dt)-6),2,'0') mm,
substring(dt,length(dt)-3) yyyy
from T;
dt |dd|mm|yyyy|
--------+--+--+----+
04032021|04|03|2021|
0412021 |04|01|2021|
Conversion to date data type is a simple concatenation and a call of to_date
select
dt,
to_date(
substring(dt,1,2)|| -- dd
LPAD(substring(dt,3,length(dt)-6),2,'0')|| -- mm
substring(dt,length(dt)-3) -- yyyy
,'DDMMYYYY' ) ddt
from T;
dt |ddt |
--------+----------+
04032021|2021-03-04|
0412021 |2021-01-04|
Related
For example:
2021-08-18T22:24:49-06:00
I want to print this to a more readable format like: 8/18/21 10:24pm
I have tried using the built in DateTime function but it returns an error. Can someone point me in the right direction? I have checked other answers but they all relate to using the aforementioned funciton.
Looking at how your data is formatted and it seems your data is formatted "yyyy-mm-ddThh:mm:ss";so, here is my attempt:
Formula in C1:
=--SUBSTITUTE(LEFT(A1,16),"T"," ")
Then I just formatted the resulting datetime-stamp with:
m/d/yy hh:mm AM/PM
So it remains a numeric value to do your calculations with if needed.
Give a try to below formula-
=TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[1]")+FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[2]"),"-","</s><s>")&"</s></t>","//s[1]"),"M/dd/yyyy hh:mm AM/PM")
The date is in ISO 8601 format. This will parse out the different parts of the date string and convert to a date, assuming that your string is in A1:
=DATEVALUE(LEFT(A1,10))
+TIMEVALUE(MID(A1,12,8))
+TIMEVALUE(RIGHT(A1, 5))
*INT(MID(A1, 20, 1) & 1)
The first part grabs the date, the second part grabs the time, the third part captures the date offset, and the last part captures the sign. If you want to format that, you can do it with the cell formatting or wrap it in TEXT:
=TEXT(
DATEVALUE(LEFT(A1,10))
+TIMEVALUE(MID(A1,12,8))
+TIMEVALUE(RIGHT(A1, 5))
*INT(MID(A1, 20, 1) & 1),
"yyyy-mm-dd hh:mm:ss"
)
Note that if you need to support UTC, that is indicated by Z instead of a time offset, and you would need to modify the formula slightly. If your data always has the same time offset, you could just hardcode it, instead of parsing it out.
how to separate date and time from datetime column if you have the format as below :
click here to view image
I am trying int(datetime column) for fetching date ; Datetime column - int(datetime column) for fetching time column
Your formula cannot work because your data is a text string (note that it has a letter included) and not a number.
So first convert the string into a "real" time with:
=substitute(a2,"T"," ")
You can then use:
Date: =INT(SUBSTITUTE(A2,"T"," "))
Time: =MOD(SUBSTITUTE(A2,"T"," "),1)
and be sure to format the results as desired:
If your column is formatted true date then use to separate date
=TEXT(A1,"yyyy-mm-dd")
For time
=TEXT(A1,"hh:mm:ss")
If data is in text string or output by TEXT() function then try below functions.
for date =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[1]"),"yyyy-mm-dd")
for time =TEXT(FILTERXML("<t><s>"&SUBSTITUTE(A1,"T","</s><s>")&"</s></t>","//s[last()]"),"hh:mm:ss")
For date
=LEFT(A2,FIND("T",A2)-1)
For time
=RIGHT(A2,LEN(A2)-FIND("T",A2))
I have imported an excel sheet where the date1 is 4/1/16 date2 is 5/29/14 and date3 is 5/2/14. However, when I import the sheet into SAS and do PROC PRINT gives the first 2 variable columns as "42461" and "41788" while the date3 is 05/02/2014.
I need these date formats consistent b/c I am doing a Cox regression with PROC PHREG.
Any thoughts about how to make these dates consistent?
Thanks!
This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as a number. The second as a string.
In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also define another column as a string, using the text(<whatever>, "YYYY-MM-DD") to convert to a string in that format.
Alternatively, you can import all as numbers and then add the value to 1899-12-31. That is the base date for Excel. This makes more sense if you think of "1" as being 1900-01-01.
Because your column had mixed numeric (date) and character values SAS imported the field as character. So the actual dates got imported as the text version of the actual number that Excel stores for dates. The ones that look like date strings in SAS are the fields that were strings in Excel also.
Or if in your case one of the three columns was all valid dates then SAS imported it as a number and assigned a date format to it so there is nothing to fix for that column.
The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
Once you have the strings in SAS and you want to try to fix them then you need to decide which strings look like integers and which should be treated as date strings.
So you might just check if they have any non-digit characters and assume those are the ones that are date strings instead of numbers. For the ones that look like integers just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.
data want ;
set have ;
if missing(exel_string) then date=.;
else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
else date=input(excel_string,32.) + '01JAN1900'd -2 ;
format date yymmdd10. ;
run;
You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.
data excel_dates;
input datestr :$10. excel_num :comma32. #1 sas_num :yymmdd10. ;
diff = sas_num - excel_num ;
format _numeric_ comma14. ;
sasdate1 = excel_num - 21916;
sasdate2 = excel_num + '01JAN1900'd -2 ;
format sasdate: yymmdd10.;
cards;
1900-01-01 1
1900-02-28 59
1900-03-01 61
1960-01-01 21,916
2018-01-01 43,101
;
I have a series of dates and some corresponding values. The format of the data in Excel is "Custom" dd/mm/yyyy hh:mm.
When I try to convert this column into an array in Matlab, in order to use it as the x axis of a plot, I use:
a = datestr(xlsread('filename.xlsx',1,'A:A'), 'dd/mm/yyyy HH:MM');
But I get a Empty string: 0-by-16.
Therefore I am not able to convert it into a date array using the function datenum.
Where do I make a mistake? Edit: passing from hh:mm to HH:MM doesn't work neither. when I try only
a = xlsread('filename.xlsx',1,'A2')
I get: a = []
According to the documentation of datestr the syntax for minutes, months and hours is as follows:
HH -> Hour in two digits
MM -> Minute in two digits
mm -> Month in two digits
Therefore you have to change the syntax in the call for datestr. Because the serial date number format between Excel and Matlab differ, you have to add an offset of 693960 to the retrieved numbers from xlsread.
dateval = xlsread('test.xls',1,'A:A') + 693960;
datestring = datestr(dateval, 'dd/mm/yyyy HH:MM');
This will read the first column (A) of the first sheet (1) in the Excel-file. For better performance you can specify the range explicitly (for example 'A1:A20').
The code converts...
... to:
datestring =
22/06/2015 16:00
Edit: The following code should work for your provided Excel-file:
% read from file
tbl = readtable('data.xls','ReadVariableNames',false);
dateval = tbl.(1);
dateval = dateval + 693960;
datestring = datestr(dateval)
% plot with dateticks as x-axis
plot(dateval,tbl.(2))
datetick('x','mmm/yy')
%datetick('x','dd/mmm/yy') % this is maybe better than only the months
Minutes need to be called with a capital M to distinguish them from months.
Use a=datestr(xlsread('filename.xlsx',1,'A:A'),'dd/mm/yyyy HH:MM')
Edit: Corrected my original answer, where I had mixed up the cases needed.
I tried with this. It works but it is slow and I am not able to plot the dates at the end. Anyway:
table= readtable ('filename.xlsx');
dates = table(:,1);
dates = table2array (dates);
dates = datenum(dates);
dates = datestr (dates);
I'm new to Stata, and I'm wondering how can I change a string variable which contains a date to a date format.
The data in the variable looks like this:
yyyy-mm-dd
Should I first remove the dashes so that Stata can recognize the format in order to later use gen var = date() ?
Thank you for your help.
The Stata date function is smart about removing separator characters. See help datetime_translation under the section "the date function"
If your dates are in v1 and in the form yyyy-mm-dd you can specify the commands:
generate v2 = date(v1, "YMD")
format %td v2
The YMD is called a mask, and it tells Stata the order in which the parts of the date are specified. The second line will assign the variable the Stata daily date format, which means that when you look at that variable in the data, it will be shown in human readable form. The date is stored, however, as the number of days since January 1, 1960.
The best way to experiment with the date function is to use the display command. The first line will display an integer representing the number of days since January 1, 1960. The second line will display the date in a human readable format.
display date("2013-08-14", "YMD")
display %td date("2013-08-14", "YMD")
you can look here to see how to convert to data in Stata or do like this
tostring datedx, replace
generate str4 dxyr1= substr(datedx,1,4)
generate str2 dxmo1 = substr(datedx,6,7)
generate str2 dxda1 = substr(datedx,9,10)
destring dx*, replace
gen datedx1 = mdy(dxmo1, dxda1, dxyr1)