I'm new to Stata, and I'm wondering how I can convert a string variable that contains a date into a Stata date.
The data in the variable looks like this:
yyyy-mm-dd
Should I first remove the dashes so that Stata can recognize the format, and then use gen var = date()?
Thank you for your help.
The Stata date function is smart about removing separator characters. See help datetime_translation, under the section "the date function".
If your dates are in v1 and in the form yyyy-mm-dd you can specify the commands:
generate v2 = date(v1, "YMD")
format %td v2
The YMD is called a mask, and it tells Stata the order in which the parts of the date are specified. The second line will assign the variable the Stata daily date format, which means that when you look at that variable in the data, it will be shown in human readable form. The date is stored, however, as the number of days since January 1, 1960.
The best way to experiment with the date function is to use the display command. The first line will display an integer representing the number of days since January 1, 1960. The second line will display the date in a human readable format.
display date("2013-08-14", "YMD")
display %td date("2013-08-14", "YMD")
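As a cross-check of the "days since January 1, 1960" representation, here is a minimal sketch in Python rather than Stata. The helper name stata_daily_date is hypothetical; it only mirrors the arithmetic described above and is not part of Stata.

```python
from datetime import date

# Origin of Stata's daily date scale, as stated in the answer above
STATA_ORIGIN = date(1960, 1, 1)

def stata_daily_date(iso_string):
    """Return the number of days between 1960-01-01 and a yyyy-mm-dd string."""
    return (date.fromisoformat(iso_string) - STATA_ORIGIN).days

# Same date as in the display commands above
print(stata_daily_date("2013-08-14"))
```

The printed integer should match what the first display command shows; the %td format only changes how that integer is displayed.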
Alternatively, you can extract the year, month, and day yourself and build the date with mdy():
tostring datedx, replace
generate str4 dxyr1= substr(datedx,1,4)
generate str2 dxmo1 = substr(datedx,6,2)
generate str2 dxda1 = substr(datedx,9,2)
destring dx*, replace
gen datedx1 = mdy(dxmo1, dxda1, dxyr1)
I have a set of data files covering a couple of days, and the file names look like this:
name='Newyork20200915'
which is for the 15th of September. I want to export only the date to Excel.
So how can I get the date from the name string?
Thanks in advance
Assuming that the other part of name will not contain any digits besides the date, you can use regexp to get all the digits from the character array:
name = 'Newyork20200915'
date_only = regexp(name, '\d*', 'match')
Next, you can convert this date string to a serial date number using datenum, specifying the format the date is currently in. Then use datestr to convert it to the format you want.
date_formatted = datestr(datenum(date_only, 'yyyymmdd'), 'dd. mmm')
date_formatted =
'15. Sep'
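For comparison, the same two steps (extract the digits, then reformat) can be sketched outside MATLAB. This is a Python illustration under the same assumption that the name contains no digits other than the date; the function name date_from_name is hypothetical.

```python
import re
from datetime import datetime

def date_from_name(name):
    """Pull the first run of 8 digits out of name and parse it as yyyymmdd."""
    match = re.search(r"\d{8}", name)
    if match is None:
        raise ValueError(f"no 8-digit date found in {name!r}")
    return datetime.strptime(match.group(), "%Y%m%d").date()

d = date_from_name("Newyork20200915")
# %b is the abbreviated month name; it follows the active locale
# (English in Python's default "C" locale)
label = d.strftime("%d. %b")
print(label)
```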
I'm using TODAY to acquire today's date and then appending a static value to it using the following:
=TODAY()&"T23:00:00"
Which Returns 43202T23:00:00
I really need it in the format 2018-04-12T23:00:00
Any help on this would be great!
There are a couple of ways to accomplish this, depending on whether your goal is a formatted String (to display) or a numeric value (such as data type Date) for storing or using in calculations.
If you want a formatted date/time result (to display to the user)...
Use the TEXT worksheet function:
=TEXT(TODAY(),"yyyy-mm-dd")&"T23:00:00"
This works because TODAY() returns a Date data type, which is basically just a number representing the date/time (where 1 = midnight on January 1, 1900, 2 = midnight on January 2, 1900, 2.5 = noon on January 2, 1900, etc.).
You can convert the date type to a String (text) with the TEXT function, in whatever format you like. The example above will display today's date as 2018-04-12.
If, for example, you wanted the date portion of the string displayed as April 12, 2018, then you would instead use:
TEXT(TODAY(),"mmmm d, yyyy")
Note that the TEXT worksheet function (and VBA's Format function) always return Strings, ready to be concatenated with the rest of the String that you're trying to add ("T23:00:00").
If you want to use the result in calculations...
If you instead want the result to be a Date type, then rather than concatenating two strings (the output of the TEXT function and "T23:00:00"), you can add a time to a date:
=TODAY()+TIME(23,0,0)
or
=TODAY()+TIMEVALUE("23:00")
...and then you can format it as you like to show or hide Y/M/D/H/M/S as necessary with Number Formats (shortcut: Ctrl+1).
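To see why 43202 and 2018-04-12 are the same value, here is a minimal Python sketch of the serial-number arithmetic. It assumes Excel's 1900 date system and a serial after February 1900, where the effective base date is 1899-12-30 because Excel counts a nonexistent February 29, 1900. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: serial is later than February 1900 (1900 date system)
EXCEL_BASE = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial_days, hours=0):
    """Turn an Excel day serial plus an hour offset into a datetime."""
    return EXCEL_BASE + timedelta(days=serial_days, hours=hours)

# 43202 is the value TODAY() returned in the question
stamp = excel_serial_to_datetime(43202, hours=23)
print(stamp.isoformat())
```

Adding 23 hours here plays the same role as + TIME(23,0,0) in the worksheet formula.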
More Information:
MSDN : TEXT Function (Excel)
MSDN : TIMEVALUE Function (Excel)
MSDN : TIME Function (Excel)
I have imported an Excel sheet where date1 is 4/1/16, date2 is 5/29/14, and date3 is 5/2/14. However, when I import the sheet into SAS and run PROC PRINT, the first two variables are shown as "42461" and "41788", while date3 is shown as 05/02/2014.
I need these date formats to be consistent because I am doing a Cox regression with PROC PHREG.
Any thoughts about how to make these dates consistent?
Thanks!
This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as numbers, the third as a string.
In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also define another column as a string, using the text(<whatever>, "YYYY-MM-DD") to convert to a string in that format.
Alternatively, you can import all as numbers and then add the value to 1899-12-30. Excel's nominal base date is 1899-12-31, so that "1" is 1900-01-01, but Excel also counts a nonexistent February 29, 1900, so every date from March 1900 onward is one day further along.
Because your column had mixed numeric (date) and character values SAS imported the field as character. So the actual dates got imported as the text version of the actual number that Excel stores for dates. The ones that look like date strings in SAS are the fields that were strings in Excel also.
Or if in your case one of the three columns was all valid dates then SAS imported it as a number and assigned a date format to it so there is nothing to fix for that column.
The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
Once you have the strings in SAS and you want to try to fix them then you need to decide which strings look like integers and which should be treated as date strings.
So you might just check if they have any non-digit characters and assume those are the ones that are date strings instead of numbers. For the ones that look like integers just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.
data want ;
set have ;
if missing(excel_string) then date=.;
else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
else date=input(excel_string,32.) + '01JAN1900'd -2 ;
format date yymmdd10. ;
run;
You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.
data excel_dates;
input datestr :$10. excel_num :comma32. #1 sas_num :yymmdd10. ;
diff = sas_num - excel_num ;
format _numeric_ comma14. ;
sasdate1 = excel_num - 21916;
sasdate2 = excel_num + '01JAN1900'd -2 ;
format sasdate: yymmdd10.;
cards;
1900-01-01 1
1900-02-28 59
1900-03-01 61
1960-01-01 21,916
2018-01-01 43,101
;
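The same table can be checked outside SAS. Below is a minimal Python sketch of the offset logic, assuming whole-number serials in Excel's 1900 date system; the function names are hypothetical and the code only restates the "minus 2" reasoning above.

```python
from datetime import date, timedelta

SAS_ORIGIN = date(1960, 1, 1)      # SAS date value 0
EXCEL_DAY_ONE = date(1900, 1, 1)   # Excel serial 1

def excel_serial_to_date(serial):
    """Convert a whole-number Excel serial to a calendar date."""
    if serial == 60:
        raise ValueError("serial 60 is Excel's nonexistent 1900-02-29")
    # Subtract 1 because Excel starts at 1, and 1 more past the phantom leap day
    shift = 1 if serial < 60 else 2
    return EXCEL_DAY_ONE + timedelta(days=serial - shift)

def excel_serial_to_sas(serial):
    """Days since 1960-01-01, which is how SAS stores dates."""
    return (excel_serial_to_date(serial) - SAS_ORIGIN).days

for serial in (1, 59, 61, 21916, 43101):
    print(serial, excel_serial_to_date(serial), excel_serial_to_sas(serial))
```

For serials above 60 the SAS value is simply the serial minus 21916, which is what both sasdate1 and sasdate2 compute.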
In Matlab, how can I convert a date into a numeric date?
For example, I want to convert '31-Jan-1990' to '19900131'.
You can use datestr to change the date format to 19900131, and then use str2double to convert it to a number:
numDate = str2double(datestr('31-Jan-1990','yyyymmdd'))
numDate =
19900131
If you want to keep the date as a string just remove str2double from the above code.
Here are the two functions that are most helpful and appropriate for this situation:
datenum and datestr
The first step is to convert your string to MATLAB's serial date number, which can later be converted to any string format or used in date and time calculations. Here we pass an additional argument describing the input format (see the datenum documentation for the available format symbols). Note that the month abbreviation Jan needs mmm, not mm.
daynum = datenum('31-Jan-1990','dd-mmm-yyyy')
The second step is then straightforward: use the date number to produce a string in the format you want.
datestr(daynum,'yyyymmdd')
You can of course combine both functions:
datestr(datenum('31-Jan-1990','dd-mmm-yyyy'),'yyyymmdd')
The result:
>> datestr(datenum('31-Jan-1990','dd-mmm-yyyy'),'yyyymmdd')
ans =
'19900131'
Finally, use str2num to convert the string to a number.
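The parse, reformat, and convert-to-number sequence is not specific to MATLAB. As an illustration only, here is the same pipeline sketched in Python; to_numeric_date is a hypothetical helper name.

```python
from datetime import datetime

def to_numeric_date(text):
    """Parse a dd-Mon-yyyy string and return it as the integer yyyymmdd."""
    parsed = datetime.strptime(text, "%d-%b-%Y")   # step 1: string -> date
    return int(parsed.strftime("%Y%m%d"))          # steps 2 and 3: date -> string -> number

num_date = to_numeric_date("31-Jan-1990")
print(num_date)
```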
I have a variable ShiftStart that is a numeric variable in the format 01jan2014 06:59:59 (and so on). I want to change this to a string variable so that I can then substring it and create variables based on just date and just time separately.
When I try
generate str20 string_shiftstart=string(ShiftStart)
I create a string but all of the cells have been converted to strange values ("1.70e+12" and so on).
How can I keep the original contents of ShiftStart when it is converted to a string?
It seems you have a variable formatted as datetime. If so, no need to convert to string. There are appropriate functions that allow you to manipulate the original variable. This is clearly explained in help datetime:
clear
set more off
*----- example data -----
set obs 5
gen double datet = _n * 100000000
format datet %tc
list
*----- what you want -----
gen double date = dofc(datet)
format %td date
gen double hour = hh(datet) + mm(datet)/60 + ss(datet)/3600
list
Your original result is surprising only until you realize that underneath the datetime display format there is a numerical value: the number of milliseconds since 01jan1960 00:00:00.
A good read (aside from help datetime) is
Stata tip 113: Changing a variable's format: What it does and does not mean, The Stata Journal, by Nicholas J. Cox.
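To make the underlying number concrete, here is a minimal Python sketch (not Stata) of the millisecond arithmetic, ignoring leap seconds. The names to_tc and split_tc are hypothetical; split_tc mirrors what dofc() and the hh/mm/ss expression compute in the example above.

```python
from datetime import datetime, timedelta

STATA_ORIGIN = datetime(1960, 1, 1)
MS_PER_DAY = 24 * 60 * 60 * 1000

def to_tc(moment):
    """Milliseconds from 01jan1960 00:00:00 to moment."""
    return (moment - STATA_ORIGIN) // timedelta(milliseconds=1)

def split_tc(ms):
    """Return (whole days since 1960, hour of day as a decimal)."""
    days = ms // MS_PER_DAY
    hours = (ms % MS_PER_DAY) / 3_600_000
    return days, hours

# The value from the question, 01jan2014 06:59:59
shift_start = to_tc(datetime(2014, 1, 1, 6, 59, 59))
print(f"{shift_start:.2e}")      # how a bare string() conversion would show it
print(split_tc(100_000_000))     # first observation of the example data
```

This is why string(ShiftStart) produced values like "1.70e+12": the conversion saw the raw millisecond count, not the formatted display.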
Edit
To answer your last question:
If you want to create an indicator variable marking pre/post periods, one way is using td() (see the help file). Following the example given above:
// before 04jan1960
gen pre = date < td(04jan1960)
Creating this indicator variable is not always necessary. Most commands allow the use of the if qualifier, and you can insert the condition directly. See help if.
If you mean something else, you should be more explicit.