I need to import an Excel file that has a few columns; the first column, A, is a date column. Column A uses the date format DDMMMYYYY, e.g. '01Jan2017', and in Excel its data type is date. When I import the file into SAS, all the other columns keep their data type (numeric, character, etc.) and values, but column A becomes a number, e.g. '42736' for '01Jan2017'. How do I import the data as it is, without converting the data type to another type?
libname out '/path';
proc import out=out.sas_output_dataset
datafile='/path/excel_file.xlsx'
DBMS=XLSX
REPLACE;
sheet="Sheet1";
run;
It is hard to know without seeing the data. The information below is general, so it may not answer your precise problem.
To avoid common errors, you should set mixed=yes in your LIBNAME statement. You may also want to include the stringdates=yes option.
mixed=yes lets SAS read a column with mixed data types as character, so values that do not fit the guessed type (such as out-of-range Excel date values) are not lost.
stringdates=yes brings all dates into SAS in character format, so you will need to use the input() function to convert this into a SAS date.
Date_sas = input(Date, mmddyy10.); /* new numeric variable; the informat must match how the date strings look */
I would suggest importing the Excel file with the Import Wizard in SAS. Afterwards, right-click the query and extract the generated code.
In the generated code you can assign each imported column the format you want.
For the available formats, see: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/n0p2fmevfgj470n17h4k9f27qjag.htm
Hope this helps.
A value of '42736' for '01Jan2017' is an indication that the column in the Excel file has a mix of cells with date values and cells with string values. In that case SAS will make the variable character and store the date values as a digit string that represents the raw number Excel uses for the date. To convert '42736' to a date value you need to first convert it to a number and then adjust the number for the difference in the base date used by Excel.
date_value = input(date_string,32.) + '30DEC1899'd ;
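The same arithmetic can be checked outside SAS. The sketch below uses Python's standard library and assumes the cell holds a whole-number Excel serial for a date on or after 1900-03-01; it maps the serial to a calendar date using the 1899-12-30 base from the SAS statement above. `excel_serial_to_date` is a hypothetical helper name.

```python
from datetime import date, timedelta

# Excel serial 0 corresponds to 1899-12-30 for every date from 1900-03-01 on
# (the same base as '30DEC1899'd in the SAS statement).
EXCEL_BASE = date(1899, 12, 30)


def excel_serial_to_date(serial_text: str) -> date:
    """Convert a digit string such as '42736' to a calendar date."""
    return EXCEL_BASE + timedelta(days=int(serial_text))


print(excel_serial_to_date("42736"))  # 2017-01-01
```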
To convert the strings that look like '01JAN2017' use the DATE informat instead.
date_value = input(date_string,date11.);
You could add logic to do both to handle a column with mixed values.
date_value = input(date_string,??date11.);
if missing(date_value) then
date_value = input(date_string,??32.) + '30DEC1899'd
;
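For comparison, here is a rough Python sketch of the same two-step logic: try the DDMMMYYYY form first, then fall back to an Excel serial, and treat anything else as missing (as the `??` modifier does in SAS). `parse_mixed` is a hypothetical name, and the sketch assumes an English (C) locale, because `%b` matches month abbreviations according to the current locale.

```python
from datetime import date, datetime, timedelta
from typing import Optional

EXCEL_BASE = date(1899, 12, 30)  # Excel serial 0 for dates from 1900-03-01 on


def parse_mixed(value: str) -> Optional[date]:
    """Try DDMMMYYYY first, then fall back to an Excel serial number."""
    text = value.strip()
    try:
        # %b matches month abbreviations case-insensitively ('JAN' or 'Jan').
        return datetime.strptime(text, "%d%b%Y").date()
    except ValueError:
        pass
    try:
        return EXCEL_BASE + timedelta(days=int(text))
    except ValueError:
        return None  # neither form: treat as missing, like ?? in SAS


for raw in ["01JAN2017", "42736", "n/a"]:
    print(raw, "->", parse_mixed(raw))
```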
To have the new variable print the date values in a human readable style attach a date type format to the variable.
format date_value date9. ;
I have a column in my Excel sheet called "Start_Time" and the data in the column is in "HH:MM:SS" format, for example "10:13:20".
But when I use the pandas.read_excel() function to load the data, the "Start_Time" column shows decimal values (for example 0.425925925925926) with data type "object".
How can I make df["Start_Time"] display as "10:13:20"?
I tried pd.Timedelta(), but it works for only one value at a time. I want to convert all values in that column.
Start Time   End Time
16:24:50     16:32:27
10:35:53     15:06:46
15:21:43     6:39:50
6:39:50      21:55:02
3:29:04      3:29:13
0:53:06      0:53:06
10:21:13     10:25:18
16:15:25     16:19:31
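Excel stores dates as serial numbers counting days from a starting date, and it stores times of day as fractions of a day (0.5 is 12:00:00).
Choosing a cell format only changes how Excel displays that number; the stored value stays the same.
If you want pandas to display the correct date or time, you have to convert the serial number yourself. For a time-of-day value, multiply the fraction by 86,400 to get seconds since midnight.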
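Because an Excel time of day is a fraction of a day, multiplying by 86,400 gives seconds since midnight, which can then be split into hours, minutes and seconds. A minimal sketch, assuming the column holds the raw fractions; `excel_fraction_to_hms` is a hypothetical helper name.

```python
def excel_fraction_to_hms(fraction: float) -> str:
    """Convert an Excel time-of-day value (fraction of a day) to 'H:MM:SS'."""
    # Round to whole seconds; the stored fraction is rarely exact in binary.
    total_seconds = round(float(fraction) * 86400)
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(excel_fraction_to_hms(0.425925925925926))  # 10:13:20
```

To convert a whole column at once, `df["Start_Time"].map(excel_fraction_to_hms)` applies the helper to every value; if a timedelta dtype is preferred, `pd.to_timedelta(df["Start_Time"].astype(float), unit="D")` is another option.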
My date variable looks as shown in my attached image. When I import it into SAS, SAS reads the date variable as a categorical (character) variable. How can I convert those date values to a standard format (yyyy-mm-dd) in Excel so that SAS can read them correctly? Thank you for your help.
Select your column, click the number-format control shown in the picture, and choose 'Date' (short or long date).
I would recommend converting it after importing it into SAS.
This assumes your dates come in as YYYY-MM-DD strings. What do you want to do with values such as 9999-99-999 or 354332? INPUT will set them to missing, and the ? (or ??) modifier suppresses the NOTEs about invalid data.
data want;
set dataImported;
*use ?? to suppress the invalid-data notes in the log;
newDate = input(address_date_changed, ?? yymmdd10.);
format newDate yymmddd10.;
run;
*check output;
proc print data=want (obs=100) label noobs;
where not missing(newDate);
var newDate;
run;
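If the same clean-up has to happen outside SAS, the idea carries over directly: parse strictly as YYYY-MM-DD and turn anything that fails into a missing value. This is a hedged Python sketch using the example values from the answer; `to_date_or_none` is a hypothetical name.

```python
from datetime import date, datetime
from typing import Optional


def to_date_or_none(text: str) -> Optional[date]:
    """Parse a YYYY-MM-DD string; return None for anything invalid."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None  # mirrors INPUT with ?? giving a missing value


values = ["2018-11-13", "9999-99-999", "354332"]
print([to_date_or_none(v) for v in values])
```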
As the title states, I'm having issues importing datetimes from Excel to SAS. The issue seems to be with seconds:
Here's sample data from Excel:
My DateTime:
Here are the raw Excel numbers:
Edit2:
43417.58657407
43417.58656250
When I import these into SAS, this is how SAS displays them:
13NOV2018:14:04:39
13NOV2018:14:04:39
and the numeric values:
1857737079
1857737079
I'm trying to figure out how to get SAS to read the seconds correctly. I'm using proc import and here's my code:
proc import
out = MyDSOutput
datafile= MyDSInput
dbms = EXCEL replace;
sheet = "page";
getnames = yes;
mixed = yes;
scantext = yes;
usedate = no;
scantime = yes;
textsize = 32767;
;
run;
EDIT: I should have added that converting this to a CSV really isn't an option, because I have numeric IDs longer than 15 digits and Excel replaces any digits beyond the 15th with 0s.
EDIT2: Added an expanded version of the raw excel numbers
Excel stores time as a fraction of a day. Most whole numbers of seconds cannot be represented exactly as a binary floating-point fraction of a day. That particular time, 14:04:40 (second number 50,680), is particularly hard to represent as a floating-point fraction. If you represent it as 0.58657407 and multiply by the number of seconds in a day, you get 50,679.999648 seconds, slightly less than what you want.
Try splitting your DATETIME field into separate DATE and TIME fields. That way SAS has more binary digits available for the time of day, since it no longer also has to store the roughly 40K (Excel) or 20K (SAS) days for the date part. Perhaps that will get close? Or store the value as a character string in Excel and then use the INPUT() function in SAS to convert the string to a datetime value.
For your ID issue: do not store ID values as numbers in any system if you can avoid it. If the ID is stored as text in your Excel file, Excel can write it to the CSV file intact. When you read the CSV file, make sure to read that column into a character variable and not a numeric one.
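The arithmetic can be reproduced in a few lines of Python. The sketch shows that truncating the product to whole seconds gives 14:04:39 (what SAS displayed), while rounding to the nearest second recovers 14:04:40. Rounding is this sketch's own suggestion, not part of the answer; in SAS the analogous step would be applying ROUND() to the value. `hms` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400


def hms(total_seconds: int) -> str:
    """Format whole seconds since midnight as HH:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


fraction = 0.58657407            # time-of-day part of 43417.58657407
seconds = fraction * SECONDS_PER_DAY
print(seconds)                   # just under 50680
print(hms(int(seconds)))         # truncating gives 14:04:39
print(hms(round(seconds)))       # rounding gives 14:04:40
```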
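The reason long IDs break as numbers is that a double-precision float, which Excel and SAS both use for numeric values, holds only about 15 to 17 significant decimal digits. A small Python illustration with a hypothetical 19-digit ID:

```python
# A hypothetical 19-digit ID, longer than a double can hold exactly.
id_text = "1234567890123456789"

as_float = float(id_text)              # what a numeric column would store
print(int(as_float))                   # no longer the original ID
print(int(as_float) == int(id_text))   # False

# Keeping the ID as text preserves every digit.
print(id_text)
```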
I have imported an Excel sheet where date1 is 4/1/16, date2 is 5/29/14 and date3 is 5/2/14. However, when I import the sheet into SAS and run PROC PRINT, the first two variables print as "42461" and "41788" while date3 prints as 05/02/2014.
I need these date formats to be consistent because I am doing a Cox regression with PROC PHREG.
Any thoughts about how to make these dates consistent?
Thanks!
This probably depends on how the data is represented in Excel and how it is imported into SAS. First, are the formats the same in Excel? The first two are being imported as numbers and the third as a string.
In Excel, you can format the column using a date format. Perhaps your import method will recognize this. You can also define another column as a string, using TEXT(<whatever>, "YYYY-MM-DD") to convert the date to a string in that format.
Alternatively, you can import all of them as numbers and then add each value to 1899-12-30. Excel numbers 1900-01-01 as day 1, which suggests a base of 1899-12-31, but Excel also counts a nonexistent 1900-02-29, so for every date from 1900-03-01 onward the effective base is 1899-12-30.
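A quick way to see why the base date matters is to convert the two serials from the question with both candidate bases. This Python sketch uses only the standard library; `from_serial`, `NAIVE_BASE` and `EXCEL_BASE` are hypothetical names. With 1899-12-30 the serials land on 4/1/16 and 5/29/14 as in the spreadsheet; with 1899-12-31 each comes out one day late.

```python
from datetime import date, timedelta


def from_serial(serial: int, base: date) -> date:
    """Add an Excel day count to a chosen base date."""
    return base + timedelta(days=serial)


NAIVE_BASE = date(1899, 12, 31)   # treats serial 1 as 1900-01-01
EXCEL_BASE = date(1899, 12, 30)   # absorbs Excel's fictitious 1900-02-29

for serial in (42461, 41788):     # date1 and date2 from the question
    print(serial, from_serial(serial, NAIVE_BASE), from_serial(serial, EXCEL_BASE))
```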
Because your column had mixed numeric (date) and character values SAS imported the field as character. So the actual dates got imported as the text version of the actual number that Excel stores for dates. The ones that look like date strings in SAS are the fields that were strings in Excel also.
Or if in your case one of the three columns was all valid dates then SAS imported it as a number and assigned a date format to it so there is nothing to fix for that column.
The best way to fix it is to make sure that all of the values in the date column are either real dates or empty cells. Then PROC IMPORT will be able to make the right guess at how to import it.
Once you have the strings in SAS and you want to try to fix them then you need to decide which strings look like integers and which should be treated as date strings.
So you might just check if they have any non-digit characters and assume those are the ones that are date strings instead of numbers. For the ones that look like integers just adjust the number to account for the fact that Excel numbers dates from 1900 and SAS numbers them from 1960.
data want ;
set have ;
if missing(excel_string) then date=.;
else if notdigit(trim(excel_string)) then date=input(excel_string,anydtdte32.);
else date=input(excel_string,32.) + '01JAN1900'd -2 ;
format date yymmdd10. ;
run;
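The same classification (all digits means an Excel serial, anything else is a date string) can be sketched in Python. This is an approximation, not a port of ANYDTDTE: it assumes the non-digit strings are in mm/dd/yyyy form, as in the question. `fix_excel_string` is a hypothetical name.

```python
from datetime import date, datetime, timedelta
from typing import Optional

EXCEL_BASE = date(1899, 12, 30)  # same day as '01JAN1900'd - 2 in the SAS code


def fix_excel_string(text: Optional[str]) -> Optional[date]:
    if text is None or not text.strip():
        return None                                  # missing stays missing
    text = text.strip()
    if text.isdecimal():                             # raw Excel serial number
        return EXCEL_BASE + timedelta(days=int(text))
    # Assumption: non-digit strings look like mm/dd/yyyy; ANYDTDTE is more flexible.
    return datetime.strptime(text, "%m/%d/%Y").date()


for raw in ["42461", "41788", "05/02/2014", ""]:
    print(repr(raw), "->", fix_excel_string(raw))
```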
You might wonder why the minus 2? It is because Excel starts from 1 instead of 0 and also because Excel thinks 1900 was a leap year. Here are the Excel date numbers for some key dates and a little SAS program to convert them. Try it.
data excel_dates;
input datestr :$10. excel_num :comma32. #1 sas_num :yymmdd10. ;
diff = sas_num - excel_num ;
format _numeric_ comma14. ;
sasdate1 = excel_num - 21916;
sasdate2 = excel_num + '01JAN1900'd -2 ;
format sasdate: yymmdd10.;
cards;
1900-01-01 1
1900-02-28 59
1900-03-01 61
1960-01-01 21,916
2018-01-01 43,101
;
I'm new to Stata, and I'm wondering how can I change a string variable which contains a date to a date format.
The data in the variable looks like this:
yyyy-mm-dd
Should I first remove the dashes so that Stata can recognize the format in order to later use gen var = date() ?
Thank you for your help.
The Stata date function is smart about removing separator characters. See help datetime_translation under the section "the date function".
If your dates are in v1 and in the form yyyy-mm-dd you can specify the commands:
generate v2 = date(v1, "YMD")
format %td v2
The YMD is called a mask, and it tells Stata the order in which the parts of the date are specified. The second line will assign the variable the Stata daily date format, which means that when you look at that variable in the data, it will be shown in human readable form. The date is stored, however, as the number of days since January 1, 1960.
The best way to experiment with the date function is to use the display command. The first line will display an integer representing the number of days since January 1, 1960. The second line will display the date in a human readable format.
display date("2013-08-14", "YMD")
display %td date("2013-08-14", "YMD")
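To see what the integer from the first display command means, the day count since 1960-01-01 can be computed independently. A small Python cross-check, assuming only the standard library; `stata_daily` is a hypothetical name. It should agree with what Stata prints for 2013-08-14.

```python
from datetime import date

STATA_EPOCH = date(1960, 1, 1)  # day 0 for Stata %td dates


def stata_daily(d: date) -> int:
    """Days since 1960-01-01, the number Stata stores for a %td date."""
    return (d - STATA_EPOCH).days


print(stata_daily(date.fromisoformat("2013-08-14")))  # 19584
```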
You can look here to see how to convert strings to dates in Stata, or do it manually like this:
tostring datedx, replace
generate str4 dxyr1= substr(datedx,1,4)
generate str2 dxmo1 = substr(datedx,6,2)
generate str2 dxda1 = substr(datedx,9,2)
destring dx*, replace
gen datedx1 = mdy(dxmo1, dxda1, dxyr1)
format datedx1 %td
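The manual approach depends on getting the substring positions right: Stata's substr(s, start, length) is 1-based and takes a length as its third argument, while Python slices are 0-based and end-exclusive. This sketch maps each substr call to the equivalent slice for a hypothetical value `datedx = "2013-08-14"`.

```python
from datetime import date

datedx = "2013-08-14"   # hypothetical value in yyyy-mm-dd form

# Stata's substr(s, start, length) is 1-based; Python slices are 0-based
# and end-exclusive, so substr(datedx,1,4) corresponds to datedx[0:4].
year = int(datedx[0:4])    # substr(datedx,1,4)
month = int(datedx[5:7])   # substr(datedx,6,2)
day = int(datedx[8:10])    # substr(datedx,9,2)

print(date(year, month, day))  # like mdy(dxmo1, dxda1, dxyr1)
```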