I have a complex date string being read from a CSV file. Oracle's TO_DATE function cannot process the format. I am looking for an efficient way to break this string apart and return a date that I can insert into a DATE column. The suggested option of using TO_DATE with 'DD-MON-YY HH.MI.SS AM' does not work, and no variation of it will parse this particular string, hence the need for a custom function. I have also tried the 'HH.MI.SS.SSSSS AM' format, which does not work either. I have found that if I drop the fractional seconds, the conversion works, so running a regex to strip that portion first should make it convert as expected.
The string is formatted as: 21-OCT-04 01.03.23.966000 PM
My initial thought is to break up by space first, resulting in three sub strings.
Then break the first substring by - and the second by ., and load the resulting pieces into a DATE object directly.
Is there a better method I could use?
Thank you, Allan
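Use what you have, which is a timestamp string, to create a timestamp, and then cast it "as date". A DATE has no fractional seconds, so TO_DATE has no format element for them, but TO_TIMESTAMP accepts FF: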
select
cast(to_timestamp('21-OCT-04 01.03.23.966000 PM', 'dd-MON-rr hh.mi.ss.ff AM') as date) dt
from dual;
DT
----------------------
2004/10/21 01:03:23 PM
(The output format depends on my specific session NLS_DATE_FORMAT, which I actually changed for this illustration to 'yyyy/mm/dd hh:mi:ss AM'.)
Say I have a df as follows:
a=pd.DataFrame([[1,3]]*3,columns=['a','b'],index=['5/4/2017','5/6/2017','5/8/2017'])
a.index=pd.to_datetime(a.index,format='%m/%d/%Y')
The type of a.index is now
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
When we select a row from a DatetimeIndex, we can pass the date as a string instead of a datetime object. In the above case, if I want the row for 5/4/2017, I can simply pass the date string to .loc as follows:
print(a.loc['5/4/2017'])
And we do not need to input the datetime object
print(a.loc[pd.datetime(2017,5,4)])
My question is, when calling the data from .loc based on string format of date, how does pandas know if my date string format follows m-d-y or d-m-y or other combinations? In this above case, I used a.loc['5/4/2017'] and it succeeds in returning the value. Why wouldn't it think it might mean April 5 which is not within this index?
Here's my best shot:
Pandas has an internal function called pandas._guess_datetime_format. This is what gets called when passing the 'infer_datetime_format' argument to pandas.to_datetime. It takes a string and runs through a list of "guess" formats and returns its best guess on how to convert that string to a datetime object.
Referencing a datetime index with a string may use a similar approach.
I did some testing to see what would happen in the case you described - where a dataframe contains both the date 2017-04-05 and 2017-05-04.
In this case, the following:
df.loc['5/4/2017']
Returned the data for May 4th, 2017.
df.loc['4/5/2017']
Returned the data for April 5th, 2017.
Attempting to reference 4/5/2017 in your original matrix gave an "is not in the [index]" error.
Based on this, my conclusion is that pandas._guess_datetime_format defaults to a "%m/%d/%Y" format in cases where it cannot be distinguished from "%d/%m/%Y". This is the standard date format in the US.
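The month-first default described above can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas; it rebuilds the frame from the question and compares the default parse of the ambiguous string with the dayfirst=True parse. It only shows what the parser returns and does not prove which internal function .loc calls.

```python
import pandas as pd

# Same frame as in the question: three rows indexed by month/day/year dates.
idx = pd.to_datetime(['5/4/2017', '5/6/2017', '5/8/2017'], format='%m/%d/%Y')
a = pd.DataFrame([[1, 3]] * 3, columns=['a', 'b'], index=idx)

# With no hints, the ambiguous string is read month-first.
month_first = pd.Timestamp('5/4/2017')

# dayfirst=True asks the parser for the other interpretation.
day_first = pd.to_datetime('5/4/2017', dayfirst=True)

print(month_first.date(), month_first in a.index)
print(day_first.date(), day_first in a.index)

# String lookup on the index finds the May 4th row.
print(a.loc['5/4/2017'])
```

If the dates in a file are really day-first, it is probably safer to parse them with an explicit format (as the question does with '%m/%d/%Y') and to index with unambiguous ISO strings such as '2017-05-04'.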
In Matlab, how can I convert a date into a numeric date?
For example, I want to convert '31-Jan-1990' to '19900131'.
You can use datestr to change the date format to 19900131, and then use str2double to convert it to a number:
numDate = str2double(datestr('31-Jan-1990','yyyymmdd'))
numDate =
19900131
If you want to keep the date as a string just remove str2double from the above code.
Here are two functions that are the most helpful and appropriate ones for this situation:
datenum and datestr
The first step is to convert your string to a MATLAB date number, which can later be converted to any string format or used in date and time calculations. Here we pass an additional format argument to help with the conversion; because the month is abbreviated ('Jan'), the format uses mmm rather than mm. You may also check here for the format you would like to construct.
daynum = datenum('31-Jan-1990','dd-mmm-yyyy')
The second step is then straightforward: use the date number to produce a string in the format you want.
datestr(daynum,'yyyymmdd');
You can of course combine both functions:
datestr(datenum('31-Jan-1990','dd-mmm-yyyy'),'yyyymmdd')
The result:
>> datestr(datenum('31-Jan-1990','dd-mmm-yyyy'),'yyyymmdd')
ans =
'19900131'
Finally, use str2num to achieve what you want.
I have a variable ShiftStart that is a numeric variable in the format 01jan2014 06:59:59 (and so on). I want to change this to a string variable so that I can then substring it and create variables based on just date and just time separately.
When I try
generate str20 string_shiftstart=string(ShiftStart)
I create a string but all of the cells have been converted to strange values ("1.70e+12" and so on).
How can I keep the original contents of ShiftStart when it is converted to a string?
It seems you have a variable formatted as datetime. If so, no need to convert to string. There are appropriate functions that allow you to manipulate the original variable. This is clearly explained in help datetime:
clear
set more off
*----- example data -----
set obs 5
gen double datet = _n * 100000000
format datet %tc
list
*----- what you want -----
gen double date = dofc(datet)
format %td date
gen double hour = hh(datet) + mm(datet)/60 + ss(datet)/3600
list
Your original result looks surprising because underneath the datetime display format there is a numeric value: a %tc variable stores the number of milliseconds since 01jan1960 00:00:00, which is where values such as 1.70e+12 come from.
A good read (aside from help datetime) is
Stata tip 113: Changing a variable's format: What it does and does not mean, The Stata Journal, by Nicholas J. Cox.
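To make the point about the underlying numeric value concrete, here is a small sketch in Python (not Stata) of the arithmetic behind a %tc value. The helper names to_stata_tc and from_stata_tc are made up for this illustration, and the sketch assumes the documented %tc convention of milliseconds since 01jan1960 00:00:00 without leap seconds.

```python
from datetime import datetime, timedelta

# Assumption: %tc values count milliseconds from this instant.
STATA_EPOCH = datetime(1960, 1, 1)


def to_stata_tc(dt):
    """Hypothetical helper: datetime -> milliseconds since 01jan1960."""
    return (dt - STATA_EPOCH) // timedelta(milliseconds=1)


def from_stata_tc(ms):
    """Hypothetical helper: milliseconds since 01jan1960 -> datetime."""
    return STATA_EPOCH + timedelta(milliseconds=ms)


shift_start = datetime(2014, 1, 1, 6, 59, 59)   # 01jan2014 06:59:59
ms = to_stata_tc(shift_start)

print(ms)                 # the raw number stored in the variable
print(f"{ms:.2e}")        # → 1.70e+12, the "strange value" from string()
print(from_stata_tc(ms))  # the same instant, recovered from the number
```

This is why string() on the variable shows a large number: it converts the stored value, not the formatted display.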
Edit
To answer your last question:
If you want to create an indicator variable marking pre/post periods, one way is using td() (see the help file). Following the example given above:
// before 04jan1960
gen pre = date < td(04jan1960)
Creating this indicator variable is not always necessary. Most commands allow the use of the if qualifier, and you can insert the condition directly. See help if.
If you mean something else, you should be more explicit.
I have this problem: I want to convert the sysdate to string, using fillmode on month, day and hour only. However,
select to_char(sysdate, 'fmmm/fmdd/yyyy fmhh12:mi:ss am') from dual
gives me results like
11/13/2013 9:45:**0** am
although it should be
11/13/2013 9:45:**00** am
any thoughts? Thanks in advance
The FM modifier is what causes this. As written in the documentation:
FM - Used in combination with other elements to direct the suppression of leading or trailing blanks
So using FM will make your final string shorter, if possible.
You should remove the FM from your format model mask and it will work as you expect:
select to_char(TRUNC(sysdate), 'mm/dd/yyyy hh12:mi:ss am') from dual;
Output:
11/13/2013 12:00:00 am.
I've changed my answer after reading Nicholas Krasnov's comment (thanks).
More about date format models in Oracle Documentation: Format models
Edit
Yes, the code I provided would return, for example, 01-01-2013. If you want to have the month and day without leading zeroes, then you should write it like this: fmDD-MM-YYYY fmHH:MI:SS.
The first fm makes the leading zeroes be truncated. The second fm turns off that feature and you do get leading zeroes for the time part of the date, example:
SELECT TO_CHAR(
TO_DATE('01-01-2013 10:00:00', 'DD-MM-YYYY HH12:MI:SS'),
'fmmm/dd/yyyy fmhh12:mi:ss am')
FROM dual;
Output:
1/1/2013 10:00:00 am.
I'm new to Stata, and I'm wondering how can I change a string variable which contains a date to a date format.
The data in the variable looks like this:
yyyy-mm-dd
Should I first remove the dashes so that Stata can recognize the format in order to later use gen var = date() ?
Thank you for your help.
The Stata date function is smart about removing separator characters. See help datetime_translation under the section "the date function"
If your dates are in v1 and in the form yyyy-mm-dd you can specify the commands:
generate v2 = date(v1, "YMD")
format %td v2
The YMD is called a mask, and it tells Stata the order in which the parts of the date are specified. The second line will assign the variable the Stata daily date format, which means that when you look at that variable in the data, it will be shown in human readable form. The date is stored, however, as the number of days since January 1, 1960.
The best way to experiment with the date function is to use the display command. The first line will display an integer representing the number of days since January 1, 1960. The second line will display the date in a human readable format.
display date("2013-08-14", "YMD")
display %td date("2013-08-14", "YMD")
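As a cross-check on the "days since January 1, 1960" description, the same integer can be worked out outside Stata. This is a rough Python sketch of the arithmetic only; the function name stata_daily is invented for the example and is not part of Stata.

```python
from datetime import date, timedelta

# Assumption: Stata daily dates count days from this date (day 0).
STATA_DAILY_EPOCH = date(1960, 1, 1)


def stata_daily(d):
    """Hypothetical helper: calendar date -> days since 01jan1960."""
    return (d - STATA_DAILY_EPOCH).days


days = stata_daily(date(2013, 8, 14))
print(days)                                        # integer form of the date
print(STATA_DAILY_EPOCH + timedelta(days=days))    # back to 2013-08-14
```

The first printed number should correspond to what the first display command shows, and the second line mirrors what the %td format does for display.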
You can look here to see how to convert to dates in Stata, or split the string manually like this:
tostring datedx, replace
generate str4 dxyr1= substr(datedx,1,4)
generate str2 dxmo1 = substr(datedx,6,2)
generate str2 dxda1 = substr(datedx,9,2)
destring dx*, replace
gen datedx1 = mdy(dxmo1, dxda1, dxyr1)