Batch formatting of Date in multiple datasets - excel

I have imported a number of Excel spreadsheets into SAS using PROC IMPORT. I now need to join the datasets together, which requires a uniform date format across all of them. The dates are currently stored as character values: some look like '1999Q1' and others like '12/02/2013'. How can I convert the dates in all datasets at once?

You will need to use the INPUT() function to convert the strings to dates so that you can merge them. Let's make some sample datasets to simulate what you might have imported from your Excel sheets.
data have1;
date='1999Q1';
var1=1;
run;
data have2;
date='02DEC2013'd ;
format date yymmdd10.;
var2=2;
run;
Now let's get the variable names and types from those datasets.
proc contents data=work._all_ noprint out=contents; run;
We can use this metadata to write some code to convert the strings into dates.
filename code temp;
data _null_;
set contents;
where upcase(name)='DATE' and type=2;
file code ;
length dsn $41;
dsn=catx('.',libname,memname);
put 'data ' dsn ';'
/ ' set ' dsn ';'
/ ' datenum=input(date,anydtdte32.);'
/ ' format datenum yymmdd10.;'
/ ' rename datenum=date date=datechar;'
/ 'run;'
;
run;
%inc code / source2 ;
Now we can merge the datasets.
data want ;
merge have1 have2;
by date;
run;
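For readers who want the conversion logic spelled out independently of SAS, here is a minimal Python sketch of normalizing the two string layouts from the question. It rests on two assumptions that are not confirmed by the original post: a quarter string such as '1999Q1' maps to the first day of that quarter, and slash-separated strings are month/day/year. The helper name `to_date` is hypothetical.

```python
import re
from datetime import date, datetime

# Matches strings such as '1999Q1': four-digit year, 'Q', quarter 1-4.
QUARTER = re.compile(r"^(\d{4})Q([1-4])$")

def to_date(text):
    """Convert a '1999Q1' or '12/02/2013' style string to a date."""
    text = text.strip()
    match = QUARTER.match(text)
    if match:
        year, quarter = int(match.group(1)), int(match.group(2))
        # Assumption: a quarter maps to its first day (Q1 -> 1 January).
        return date(year, 3 * (quarter - 1) + 1, 1)
    # Assumption: slash-separated dates are month/day/year.
    return datetime.strptime(text, "%m/%d/%Y").date()

print(to_date("1999Q1").isoformat())
print(to_date("12/02/2013").isoformat())
```

If the slash dates are actually day/month/year, only the format string in the last line of `to_date` needs to change.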

Related

When reading date columns in excel files using pandas do we need to specify year?

I have this python script
import pandas as pd
def main():
    readExcelFile()
def readExcelFile():
    fileName = "abc.xlsx"
    data = pd.read_excel(fileName,engine='openpyxl')
    print(data['Date'])
    data['Date1'] = pd.to_datetime(data['Date'],format='%mm/%dd')
    print(data['Date1'])
main()
My date column in the excel file looks like this
12/01
12/02
12/03
12/04
etc
But when I print it out as a pandas dataframe it looks like this
2001-12-01
2002-12-01
2003-12-01
2004-12-01
The dates in the pandas DataFrame do not match the dates in the Excel file, although the dtype is correct (datetime64[ns]).
pandas appears to be adding a year to each date, but there is no year information in the Excel file.
Based on useful information from @cottontail in the comments, my modified question is: should an Excel file containing date columns include the year as well? I have no need to compute date differences of any sort; I am only reading in the Excel data and plotting it.
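A minimal sketch of what happens when month/day strings are parsed without a year, assuming the column holds text such as '12/01' (the sample values below are hypothetical stand-ins for the Excel column). It uses the standard library's `datetime.strptime`, whose `%m` and `%d` format codes are the same ones `pd.to_datetime` accepts in its `format` argument. A datetime always carries a year, so when none is supplied the parser fills in a default; if only month and day matter for the plot, they can be formatted back out as labels.

```python
from datetime import datetime

# Hypothetical sample values standing in for the Excel 'Date' column.
raw_dates = ["12/01", "12/02", "12/03", "12/04"]

# %m = two-digit month, %d = two-digit day. No year is supplied, so
# strptime falls back to its default year, 1900.
parsed = [datetime.strptime(s, "%m/%d") for s in raw_dates]

# Month/day labels for a plot axis, with the placeholder year dropped.
labels = [d.strftime("%m/%d") for d in parsed]

print(parsed[0])
print(labels)
```

Note that the format string is `%m/%d`, with one letter per code, rather than `%mm/%dd`.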

How to create hive table with date format 'dd-MMM-yyyy'?

I'm trying to create a Hive table to import CSV data in which the date format is 'dd-MMM-yyyy' (for example, 20-Mar-2018). When I create the table in Hive, the entire date column turns into NULL values. Can anyone suggest how to fix this?
My Query:
create external table new_stock (Symbol String,Series String,Dat date,Prev_Close float,Open_Price float,High_Price float,Low_Price float,Last_Price float,Close_Price float,Avg_Price float,Volume int,Turn_Over float,Trades int,Del_Qty int,DQPQ_Per float) row format delimited fields terminated by ',' stored as textfile LOCATION '/stock_details/'
Finally, with some help from @leftjoin, I solved the problem by converting the date string from 'dd-MMM-yyyy' to 'dd-MM-yyyy' in a SELECT query. It works fine.
select from_unixtime(unix_timestamp(columnname ,'dd-MMM-yyyy'), 'dd-MM-yyyy') from tablename;
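To sanity-check the pattern on a few sample values outside Hive, the same string conversion can be sketched in Python. This is only an illustration, assuming that Python's `%d-%b-%Y` corresponds to the 'dd-MMM-yyyy' pattern for English month abbreviations; the helper name `reformat_date` is hypothetical.

```python
from datetime import datetime

def reformat_date(text):
    # Parse day, abbreviated month name, and four-digit year
    # (e.g. 20-Mar-2018), then write it back with a numeric month.
    return datetime.strptime(text, "%d-%b-%Y").strftime("%d-%m-%Y")

print(reformat_date("20-Mar-2018"))
```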

I can't change the length of a character column after Proc Import process

The first proc import process reads a "character" column from an Excel file. The data from the Excel is only 2 characters, so SAS creates the column length as 2.
proc import datafile = "excelfie" out=MainTable DBMS = excel REPLACE;
SHEET = "Sheet1"; GETNAMES=NO; MIXED=YES; USEDATE=YES; SCANTIME=YES; RANGE="A1:C26";
run;
Then I insert another SAS-table (with same column names) into the main SAS-table by using proc append, but I get an error, because SAS created the column in main table for 2 character length and the new character data is 5 digits.
Proc Append Base=MainTable Data=Table1; Run;
I tried to change the length of the column before the PROC APPEND step as follows:
data MainTable;
set MainTable;
format Column2 $5.;
informat Column2 $5.;
length Column2 $5;
run;
Proc Append Base=MainTable Data=Table1; Run;
But I still get the same error, because the Column format is now $5., but the length is still 2.
I used the Force option in the Proc append process which forces the merge process for data with different formats.
Proc Append Base=MainTable Data=Table1 FORCE NOWARN; Run;
Now I don't get an error, but the new data is truncated from 5 characters to 2.
So what should I do?
In your code, the SET statement comes first, so Column2 takes its length (2) from the "old" MainTable. The LENGTH statement that follows is then ignored, because the variable has already been defined. You have to define the new length BEFORE the SET statement.
data MainTable;
format Column2 $5.;
informat Column2 $5.;
length Column2 $5;
set MainTable;
run;

Importing table with unknown length from Excel .xlsm

I want to transfer a table from Excel to SAS (version is 9.2 and Excel file format is .XLSM, macro). The column names will be read from the cell B3 and the data will start from the cell B4, like below:
    A   B     C     D   E   F   G   ...
1
2
3       Col1  Col2
4       15    20
5       16    21
6       ...   ...
The problem is that the last row number is unknown, because the table length can be 200 rows today and it can be 350 rows tomorrow.
So how can I import this table from Excel (.XLSM) to SAS-table?
I read somewhere that we can use DATAROW in Proc Import when DBMS=EXCEL like below:
proc import datafile = "!datafile" out=Table1 DBMS = EXCEL REPLACE;
SHEET = "Sheet1"; GETNAMES=YES; MIXED=YES; USEDATE=YES; SCANTIME=YES; NAMEROW=3; DATAROW=4;
run;
However, SAS cannot recognize the DATAROW option, giving the error:
ERROR 180-322: Statement is not valid or it is used out of proper order.
There is another way of importing table from Excel like:
PROC SQL;
CONNECT TO EXCEL (PATH='C:\\thepath\excelfile.xlsm');
Create Table Table1 as SELECT * FROM CONNECTION TO EXCEL
(SELECT * FROM [Sheet1$]);
DISCONNECT FROM EXCEL;
QUIT;
Does anyone know how to import a table with an unknown number of rows from .XLSM into SAS?
I found an inefficient workaround that reads every possible row in Excel (up to 50,000 rows) and keeps only the rows that have a value in the column Col1.
It takes 7-8 seconds, and it works. But as I wrote, reading all 50,000 rows feels inefficient. Does anyone have a better idea?
PROC SQL;
CONNECT TO EXCEL (PATH='C:\\thepath\excelfile.xlsm');
Create Table Table1 as SELECT * FROM CONNECTION TO EXCEL
(SELECT * FROM [Sheet1$B3:C50000] WHERE Col1 IS NOT NULL);
DISCONNECT FROM EXCEL;
QUIT;
You can use a direct connection to Excel using the libname statement:
libname xlsFile Excel 'C:\\thepath\excelfile.xlsm';
data want;
set xlsFile.'Sheet1$'n(firstobs=3);
where NOT missing(Col1);
run;
This is assuming you have Excel installed on the SAS server, and have purchased SAS/ACCESS to PC Files.

Date format issue in ssis

I have to import data from an Excel file using SSIS, but I am facing a problem with the date column: in the Excel sheet the date format is yyyy/mm/dd, but when the data is loaded into the database it changes to yyyy/dd/mm.
How can I fix this?
Use the SUBSTRING function in a Derived Column transformation while importing the date:
(LEN(TRIM(SUBSTRING(ReceivedDateTime,1,8))) > 0 ? (DT_DBDATE)(SUBSTRING(ReceivedDateTime,1,4) + "-" + SUBSTRING(ReceivedDateTime,5,2) + "-" + SUBSTRING(ReceivedDateTime,7,2)) : (DT_DBDATE)NULL(DT_WSTR,5))
If the data is present, the SUBSTRING calls extract the year, month, and day and build the date that is stored in the database; if it is not, NULL is inserted instead.
I see two options:
1. Use a Data Conversion Transformation to convert to a text string in the appropriate format, using SSIS data types.
2. Add a script task that converts the data type, using VB data types.
First, create the table in your database using the command below:
CREATE TABLE [dbo].[Manual] (
[Name] nvarchar(255),
[Location] nvarchar(255),
[Date] datetime
)
SET DATEFORMAT YDM
With DATEFORMAT YDM, dates are imported in YYYY/DD/MM format. Before running the package, edit it and, at the column mapping step, select the "Delete rows in destination table" check box.
Then execute the package. It will work.
