I need to create an SSIS package that extracts data from an Excel source and loads it into a SQL Server destination.
The Excel file name includes a date; typically it looks like emp_110909.xls, where 11 is the year, 09 is the month and 09 is the day. I want to capture this date, add another column named "Extracted_Date" to the destination table, and populate it with the captured date for all the records extracted from that Excel file.
Can anyone tell me how to do this?
Excel as a data source offers no explicit functionality for this, whereas the Flat File Source does. I blogged about this under "What is the name of a file".
What you're looking to do is have a Foreach File Enumerator look in a folder for your Excel file(s). Assign the value of the currently found file to a variable like @[User::CurrentFileName]. That would look something like C:\ssisdata\mySource\Input\emp_110909.xls
You would then update the Excel Connection Manager to have an expression on the ExcelFilePath property, so as the value of @[User::CurrentFileName] changes, so does the actual referenced file. You can find plenty of references to using the Foreach enumerator on the web, or search my answers.
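The property expression itself is just the variable; as a sketch, the Excel Connection Manager's ExcelFilePath property would be mapped to:
@[User::CurrentFileName]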
The last bit you need is to parse the value of CurrentFileName to find the year (11), month (09) and day (09) elements, or maybe you want it as one big value (110909). For this, I would create 4 variables: FileDate, FileYear, FileMonth, FileDay, all as String. Yes, they're numbers, but for our usage treating them as strings is going to be easier.
FileDate will correspond to everything between the underscore following emp and the period before xls. We're going to use the SSIS expression language to do this; the particular pieces will be SUBSTRING, FINDSTRING and LEN.
SUBSTRING(@[User::CurrentFileName], FINDSTRING(@[User::CurrentFileName], "emp_", 1) + LEN("emp_"), 6)
Here, I was lazy and just "knew" the length was 6 and hardcoded it as such. In the event that someone gives us an emp_20110909.xls, this will fail. The preceding expression would be modified by finding the position of the period and then calculating the length from the emp_ position.
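A more defensive sketch of that expression, still assuming the emp_ prefix and the .xls extension, would compute the length instead of hardcoding it:
SUBSTRING(@[User::CurrentFileName],
    FINDSTRING(@[User::CurrentFileName], "emp_", 1) + LEN("emp_"),
    FINDSTRING(@[User::CurrentFileName], ".xls", 1) - (FINDSTRING(@[User::CurrentFileName], "emp_", 1) + LEN("emp_")))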
Now that we know FileDate, we can use SUBSTRING to slice out the first 2 elements for year, next 2 for month and final two for day.
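As a sketch, assuming FileDate holds the six-character value (for example 110909), the expressions for the three variables could be:
FileYear  : SUBSTRING(@[User::FileDate], 1, 2)
FileMonth : SUBSTRING(@[User::FileDate], 3, 2)
FileDay   : SUBSTRING(@[User::FileDate], 5, 2)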
You can then inject those values into your Data Flow via a Derived Column transformation, or push them into an audit table via an Execute SQL Task.
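For the Extracted_Date column itself, one sketch of a Derived Column expression (assuming the two-digit year always falls in the 2000s) is:
Extracted_Date : (DT_DBDATE)("20" + @[User::FileYear] + "-" + @[User::FileMonth] + "-" + @[User::FileDay])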
Hello, each month we receive a series of monthly returns from different accounts, which go into a designated folder based on the account name. Each return file has the new month's returns appended to all the previous monthly returns. I am running a VLOOKUP on my workbook based on the specific return I am looking for. Is it possible to change the source of the VLOOKUP so it takes the data from the most recently added file in the folder? That way it would always contain the most recent return data along with all the previous returns.
Thanks
There are many ways to do that. The first step should be to connect to the designated folder. You should then see something like this:
Option 1: If the file contains the month
If your file contains the month you can use it to extract this information. Following the example above you could:
extract the first 7 characters and parse them to a date
sort by that date in descending order, so the latest file will be on top
use Keep Rows to get rid of the rest of the files
with only that file remaining, expand its content
Option 2: Use file properties
When you connect to a folder you can see the field "Date created". Use this the same way as explained in option 1: sort descending and keep the top row, as in the sketch below.
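As a minimal sketch of option 2 in Power Query M (the folder path is illustrative):
let
    Source = Folder.Files("C:\MonthlyReturns\AccountA"),                      // connect to the designated folder
    Sorted = Table.Sort(Source, {{"Date created", Order.Descending}}),        // newest file on top
    Latest = Table.FirstN(Sorted, 1),                                         // keep only the most recent file
    Workbook = Excel.Workbook(Latest{0}[Content])                             // open the content of that file
in
    Workbook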
Option 3: Remove duplicates
If for whatever reason the two options above are not possible, depending on your data you can:
combine all the files, which will lead to duplicates
remove the duplicates
This third option might not work if two records that look the same (every column in the row has the same value) can legitimately appear in your dataset.
I'm trying to design a Microsoft Flow which will create an Outlook calendar event based on information in a SharePoint Online list.
The list contains a value for a DueDate; it's a column of type Date, not including time.
I want to be able to create an Outlook calendar entry on the date in the DueDate column. The calendar entry form in Flow allows, via dynamic content, adding dates that also include a time; however, date columns that don't contain a time cannot be added that way.
Is there a workaround for this, some expression that would allow me to fetch values from columns more freely and then possibly append a time to them?
I have tried converting the column in SharePoint to a Date with Time column and that workaround worked, but it's not what I'm looking for. I'd like to know how to work around this, because I don't necessarily want my column to be a date-time column, which can cause problems later on.
I have tried this expression:
formatDateTime(concat(item()?['DATE'], '08:00')'yyyy-MM-ddThh:mm:ss')
But I know this is wrong and it doesn't work. I'm simply not sure how to do it.
https://puu.sh/Df5ni/05cb882b23.png
I want the flow to add a calendar entry based on the due date column, to which I can append my own time, for example starting at the beginning of the day and lasting until the afternoon.
The actual result is that I don't seem to be able to use a date column, just a date-time column, for the start and end times of the event; a date column without time doesn't appear in the dynamic content list.
If there is some way to manually fetch values instead of using the dynamic content, that would be very powerful, and the values could then possibly be converted to the right format with additional code.
Date column name in my list is date_without_time of type Date (Add time set to NO):
New element:
Function used in Create event (V2) action:
formatDateTime(triggerBody()?['date_without_time'],'yyyy-MM-ddT09:35')
Result:
Calendar:
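If you'd rather keep the format string free of literal time digits, a sketch of an alternative (using the same date_without_time column) is to format only the date part and append the time with concat:
concat(formatDateTime(triggerBody()?['date_without_time'], 'yyyy-MM-dd'), 'T09:35:00')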
I was trying to import an Excel worksheet into an Access table, and the worksheet had specific dates (e.g. 12/4/2017) as column headers.
When I tried to import it, Access did not allow me to import the worksheet into a table because "12/4/2017 isn't a valid field name".
Are there other ways to import the worksheet or to work around this?
Thanks
Names of fields, controls & objects in Access:
Can be up to 64 characters long.
Can include any combination of letters, numbers, spaces, and special characters except a period (.), an exclamation point (!), an accent grave (`) or brackets ([ ]).
Can't begin with leading spaces.
Can't include control characters (ASCII values 0 through 31).
Can't include a double quotation mark (") in table, view, or stored procedure names in a Microsoft Access project.
(Source)
Date and time values in Excel are stored internally as a 64-bit floating point number. The value to the left of the decimal represents the number of days since December 30, 1899. The value to the right of the decimal represents the fraction of a day since midnight.
For example:
12:00 Noon is stored as 0.5.
1.0 represents midnight on January 1, 1900.
2.25 represents 6:00 AM on January 2, 1900.
Your example date 12/4/2017 would be stored as 43073.
Interpretation of datetimes depends on the customization of regional settings, according to Microsoft (not necessarily the country's government standard date format). For example, I live in North America, so by default Excel would interpret 12/4/2017 as a date.
However, for various reasons, I prefer a date format of YYYY-MM-DD (technically named "ISO 8601"), so I changed the format in my Windows Settings. Therefore, when I enter 12/4/2017, Excel does not recognize it as a date, so it is stored as text, yet when I enter 2017-12-4, Excel knows to store it as a date.
Regional settings aside, I suspect that your field names may have times attached to them (even if they aren't formatted to display as such).
If the cell you'd like to use as a field name actually contains:
December 4, 2017 6:00 AM
which, if formatted as M/D/YYYY, "hides" the time, to display as:
12/4/2017
even though it is actually stored internally as:
43073.25
Given that Access field names can't contain a period (see above), Access becomes "confused" by the fraction of a day (.25).
Make sure your dates to be used as field names don't contain times.
You could:
Format the row that has the field names as text.
Right-click the row number and choose Format Cells.
Under the Number tab, choose Text.
Use a function to remove the times:
If B1 contains a datetime you want to use as a field name in A1, you could use the Int function in cell A1 (to round the value down to a whole number):
=Int(B1)
The fraction (time) is removed but the value is still stored as a number/date.
Use a function to convert the datetime to text:
If B1 contains a date you want to use as a field name in A1, you could use the Text function in cell A1:
=Text(B1, "M/D/YYYY HH:MM")
As you can see in the image, Access allows me to use the dates as field names if they are properly formatted:
Related Further Reading:
TechRepublic: Techniques for successfully importing Excel data into Access
Office.com: Guidelines for naming fields, controls, and objects
ExcelTactics: The Definitive Guide to Using Dates and Times in Excel
Microsoft: How to use dates and times in Excel
Stack Overflow: MS Access - Date as Table Field Name
A note about Database Normalization:
Just because you can use dates as field names, that doesn't mean that you should. It is generally considered poor database design to have a field name so specific.
Perhaps your intention is to import the poorly-structured data into Access to fix this issue, but if not, you should consider storing the data in a more organized way that is conducive to database expansion and normalization.
If your data has date-specific field names:
...then the date should be added as part of the record, not as a field name:
...although this is still not normalized. Normalization is about optimizing efficiency and allowing for expansion, so perhaps the database could be set up more like:
With this method, database expansion and data analysis would be more logical (perhaps making it easier to find trends in Jane's troubling eating habits).
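A rough sketch of that kind of layout, in generic SQL with hypothetical table and column names, might be:
CREATE TABLE Person  (PersonID INT PRIMARY KEY, PersonName VARCHAR(50));
CREATE TABLE Food    (FoodID   INT PRIMARY KEY, FoodName   VARCHAR(50));
-- one row per person, food and date, instead of one column per date
CREATE TABLE FoodLog (PersonID INT, FoodID INT, LogDate DATE);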
Alas, I digress. There is plenty of information available online about database normalization, to suit any experience level.
Further Reading about Normalization:
Wikipedia: Database Normalization
Microsoft: Description of Database Normalization Basics
ThoughtCo: Database Normalization Basics
Stack Overflow: Database normalization - who's right?
EDIT: (the result)
You didn't mention which method you're using to import the data from Excel into Access, which may be relevant (as there are several possible combinations). Access might handle the source data differently if your Excel data is saved as an XLSM vs. XLS vs. CSV, etc. Data could be imported using the New Data Source… From File interface, programmatically with VBA, or even with other languages. Therefore, if you can't get one method working (with the dates formatted a specific way), try one of the other combinations.
For simplicity's sake, I used the built-in interface with an XLSM into an ACCDB. The result is demonstrated below:
Note that it worked even though I included times in the headers (and would work without times), since they are properly formatted as text, and First Column Contains Column Headers is selected.
There are many great examples of U-SQL across single files. But how would you replicate the very common data-processing scenario where you take the current system time, subtract X number of days from that time, and query a set of data based on that result? For a SQL example:
SELECT * FROM MyTable
WHERE Date >= CAST(GETDATE() AS DATE) - 30
AND Date <= CAST(GETDATE() AS DATE) - 1
In the above example, my dates are part of the file location, such as:
'yyyy' | 'MM' | 'DD' | Filename.csv
-- Example path
/MyDirectory/2017/12/01/SomeData.csv
Therefore, is there a way in U-SQL with Azure Data Lake Analytics to do something similar, but based on the file location, instead of querying everything with "{date:yyyy}/{date:MM}/{date:dd}/" expressions?
If that's not possible, what about at least specifying a range, like:
"/MyDirectory/2017/{10-12}/{1-30}/{filename:*}.csv"
I can combine all the files into one directory and use the natural date fields in the data to filter with a SELECT statement after the extractor, but the point of the directory structure is to reduce unneeded reads (transactions) by targeting only the specific directories needed for a query, based on the date of the file itself.
Maya is correct.
There are examples at U-SQL Language Reference and more specifically at EXTRACT Expression (U-SQL). See the example under "Multiple directories with multiple files". Here are some modifications to that example that appear to satisfy your ask.
1) The example is missing DECLARE @dir string = "/Samples/Data/AmbulanceData/";
2) Modify the DECLARE @file_set_path2 to read DECLARE @file_set_path2 string = @dir + "{date:yyyy}/{date:MM}/{date:dd}/vehicle{vid}_{*}.csv";
3) For your filter you could use WHERE date >= DateTime.Now.AddDays(-30) AND date <= DateTime.Now.AddDays(-1)
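Putting those pieces together, a sketch of the modified sample might look like the following (the extracted column list mirrors the AmbulanceData sample and is illustrative; adjust it to your schema):
DECLARE @dir string = "/Samples/Data/AmbulanceData/";
DECLARE @file_set_path2 string = @dir + "{date:yyyy}/{date:MM}/{date:dd}/vehicle{vid}_{*}.csv";

@data =
    EXTRACT vehicle_id int,
            entry_id long,
            event_date DateTime,
            latitude float,
            longitude float,
            speed int,
            direction string,
            trip_id int?,
            vid int,            // virtual column taken from {vid} in the path
            date DateTime       // virtual column taken from {date:...} in the path
    FROM @file_set_path2
    USING Extractors.Csv();

// only the partitions for the last 30 full days are actually read
@filtered =
    SELECT *
    FROM @data
    WHERE date >= DateTime.Now.AddDays(-30) AND date <= DateTime.Now.AddDays(-1);

OUTPUT @filtered
TO "/output/last30days.csv"
USING Outputters.Csv();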
I have an Excel file with multiple sheets, but only the first sheet has the date listed on it. What I am trying to do is read the Excel file and put that date into a variable so it can be used later on in a Data Flow Task.
Normally it would be your run-of-the-mill read-and-write Data Flow Task, but since this information lies on the first page of the Excel file, which mostly contains information about the report, reading it is a bit more difficult.
Here's what the sheet looks like; the only information I want from this whole sheet is on the Data Period line, more specifically Dec 2016.
Any direction would be greatly appreciated, thank you.
Excel sheets can be queried like tables. You can use an Execute SQL Task to read a range of cells and iterate over the results, or you can read a single cell as if it were a range and store its value in a variable.
The process is described in Read Excel Value in SSIS and contains quite a few gotchas:
Add an Excel connection manager that points to your Excel file.
In an Execute SQL Task that uses this connection, set its result set type to Single Row.
Set the query to SELECT * FROM [Sheet1$A6:A6]. That's the first gotcha: you can't specify column names. In a dataflow query you can write SELECT RIGHT(F1,8) FROM [Sheet1$A6:A6] to extract only the date part; this doesn't work in the Execute SQL Task.
In the Result Set section, map the 0 result set to a new string variable, e.g. PeriodCell. The name has to be 0; that's the second gotcha.
You can create another variable based on an expression that returns only the 8 rightmost characters of PeriodCell, e.g. RIGHT(@[User::PeriodCell], 8).
You can parse the string directly into a date if your system uses an English locale. In this case, you could create a DateTime variable with the expression (DT_DATE)RIGHT(@[User::PeriodCell], 8). For example, (DT_DATE)"Dec 2016" returns 12/1/2016.
Unfortunately, this won't work if your locale is not English, even if you change the package's Locale property.
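If the locale does get in the way, one workaround sketch (assuming the value ends up in SQL Server anyway and the session language is English) is to let the database do the parsing:
DECLARE @period varchar(10) = 'Dec 2016';     -- value captured in the PeriodCell variable
SELECT CONVERT(date, '01 ' + @period, 106);   -- style 106 = dd mon yyyy, returns 2016-12-01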
If you have loaded the content of the cells of the "Data Period" column into a SQL table with SSIS, you can easily convert them from the Excel date format to the SQL date format using one of the following:
Date and time
select dateadd(second, (@time_xls - ROUND(@time_xls, 0)) * 86400, dateadd(d, ROUND(@time_xls, 0), '1899-12-30'))
For example the value 42853.4673611111 is converted into "2017-04-28 11:13:00.000"
Only date
select dateadd(d, @time_xls, '1899-12-30')
For example the value 36464 is converted into "1999-10-31 00:00:00.000".
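As a quick way to sanity-check either formula (the variable name @time_xls is just a placeholder), you can run something like:
DECLARE @time_xls float = 42853.4673611111;
-- date and time: returns 2017-04-28 11:13:00.000
SELECT dateadd(second, (@time_xls - ROUND(@time_xls, 0)) * 86400, dateadd(d, ROUND(@time_xls, 0), '1899-12-30'));
-- date only: returns 1999-10-31 00:00:00.000
SELECT dateadd(d, 36464, '1899-12-30');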