I have an Excel file with multiple sheets, but only the first sheet has the date listed on it. What I am trying to do is read the Excel file and store that date in a variable so it can be used later in a data flow task.
Normally it would be your run of the mill read and write data flow task, but since this information lies in the first page of the excel sheet with just mostly information about the report, it makes reading the information a bit more difficult.
Here's what the sheet looks like. The only information I want from the whole sheet is on the Data Period line, specifically Dec 2016.
Any direction would be greatly appreciated, thank you.
Excel sheets can be queried like tables. You can use an Execute SQL Task to read a range of cells and iterate over the results, or you can read a single cell as if it were a range and store its value in a variable.
The process is described in Read Excel Value in SSIS and contains quite a few gotchas:
Add an Excel connection manager that points to your Excel file.
Add an Execute SQL Task that uses this connection manager and set its result set type to Single Row.
Set the query to SELECT * FROM [Sheet1$A6:A6]. That's the first gotcha: you can't specify column names. In a data flow query you can write SELECT RIGHT(F1,8) FROM [Sheet1$A6:A6] to extract only the date part, but this doesn't work in the Execute SQL Task.
In the Result Set section, map result 0 to a new string variable, e.g. PeriodCell. The result name has to be 0. That's the second gotcha.
You can create another variable based on an expression that returns only the 8 rightmost characters of PeriodCell, e.g. RIGHT(@[User::PeriodCell],8).
You can parse the string directly into a date if your system uses an English locale. In this case, you could create a DateTime variable with the expression (DT_DATE)RIGHT(@[User::PeriodCell],8). For example, (DT_DATE)"Dec 2016" returns December 1, 2016.
Unfortunately, this won't work if your locale is not English, even if you change the package's Locale property.
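Outside of SSIS expressions, one way around the locale dependence is to map the month abbreviation explicitly instead of relying on a locale-aware cast. The following Python sketch illustrates the idea only; the function name parse_period and the sample cell text are assumptions, and only the trailing "Dec 2016" comes from the question.

```python
from datetime import date

# English month abbreviations, spelled out so the result does not depend on the OS locale.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_period(cell_text: str) -> date:
    """Return the first day of the month named in the last 8 characters."""
    period = cell_text.strip()[-8:]        # same idea as RIGHT(PeriodCell, 8)
    month_abbr, year = period.split()      # "Dec 2016" -> "Dec", "2016"
    return date(int(year), MONTHS[month_abbr], 1)


if __name__ == "__main__":
    # Hypothetical cell content; the real cell may be worded differently.
    print(parse_period("Data Period: Dec 2016"))
```

Like the RIGHT(...,8) expression, this assumes a three-letter English month followed by a four-digit year.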
If you have loaded the contents of the "Data Period" column cells into a SQL table with SSIS, you can easily convert them from Excel date format to SQL date format using one of the following:
Date and time
select dateadd(second, (@time_xls - ROUND(@time_xls,0))*86400, dateadd(d, ROUND(@time_xls,0),'1899-12-30'))
For example the value 42853.4673611111 is converted into "2017-04-28 11:13:00.000"
Only date
select dateadd(d,@time_xls,'1899-12-30')
For example the value 36464 is converted into "1999-10-31 00:00:00.000".
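The same arithmetic can be checked outside SQL Server. Below is a minimal Python sketch of the conversion, assuming non-negative serial numbers and rounding the time part to the nearest second; the function name excel_serial_to_datetime is made up for illustration.

```python
from datetime import datetime, timedelta

# Day zero used by the queries above.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial number (days plus fraction of a day) to a datetime."""
    whole_days = int(serial)                         # date part (assumes serial >= 0)
    seconds = round((serial - whole_days) * 86400)   # time part, to the nearest second
    return EXCEL_EPOCH + timedelta(days=whole_days, seconds=seconds)


if __name__ == "__main__":
    print(excel_serial_to_datetime(42853.4673611111))  # date and time
    print(excel_serial_to_datetime(36464))             # date only
```

Rounding to whole seconds avoids a result such as 11:12:59.999999 caused by floating-point error in the fractional part.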
Related
I am accessing a SSAS DMV through Power Query in Excel via:
let
Source = AnalysisServices.Database(TabularServerName, TabularDBName,
[Query="select * from $SYSTEM.TMSCHEMA_EXPRESSIONS"])
in
Source
This works great in Power BI, but in Excel, the Expression column is limited to a max of 1024 characters. How do I get Power Query in Excel to give me the entire value? My largest values are around 15000 characters, so still within the stated limits of Power Query that I can find.
If I set up a table with a connection and query behind it, Excel can pull in the entire Expression column, but the downside is the server and database cannot be parameterized and have to be manually changed in the connection. Also I don't remember how to do this manually, so I always have to access the DMV from DAX Studio and export to Excel to set it up!
Update
I did some heavy transformations of this column. I parsed out a value, I used it to merge the file with itself and add a column that I then did a bunch of transformations on, and then used it to replace text within the original problem column. And something in that pulled in the whole value. I tried just doing small parts of this, like adding a column that referenced the problem column, or doing a replace in the problem column, and none of that worked.
So, no, not easy to duplicate or figure out which step fixed it, but for my purposes, I now have what I need.
I think it is related to the type of the column you are loading in Excel. I had the same issue and read your answer (with Table.ReplaceValue).
Your solution hides the key point: the function used in the Table.ReplaceValue() expression you shared is Replacer.ReplaceText, which additionally converts a field of type Any to type Text.
I tried just changing the type of my field that was truncated when loaded in Excel from type Any to type Text. Result: the complete values were then loaded into my worksheet.
I had to change this query today, and after I changed it, the values were truncated again. I added a Replace Value step at the end of the query on the truncated column and that seemed to fix it.
#"Replaced Value" = Table.ReplaceValue(#"Last Step","in ","in ",Replacer.ReplaceText,{"Truncated Column Name"})
in
#"Replaced Value"
I need to create a SSIS package that would extract data from an Excel source and load it into a SQL Server Destination.
The Excel file name contains a date; a typical file name looks like emp_110909.xls, where 11 is the Month, 09 is the Day and 09 is the Year. I want to capture this date, add another column named "Extracted_Date" to the destination table, and populate it with the captured date for all the records extracted from this Excel file.
Can anyone tell me how to do that process?
Excel as a data source offers no explicit functionality for this, whereas the Flat File Source does. I blogged about this under What is the name of a file.
What you're looking to do is have a Foreach File Enumerator look in a folder for your Excel file(s). Assign the value of the currently found file to a variable like @[User::CurrentFileName]. That would look something like C:\ssisdata\mySource\Input\emp_110909.xls
You would update the Excel Connection Manager to have an expression on the ExcelFilePath property, so that as the value of @[User::CurrentFileName] changes, so does the actual referenced file. You can find plenty of references to using the foreach enumerator on the web, or search my answers.
The last bit you need is to parse the value of CurrentFileName to find the year (11), month (09) and day (09) elements - or maybe you want it as one big value (110909). For this, I would create 4 variables: FileDate, FileYear, FileMonth, FileDay all as string. Yes, they're numbers but for our usage, treating them as string is going to be easier.
FileDate will correspond to everything between the underscore following emp up until the period of xls. We're going to use the Expression language of SSIS to do this and the particular elements will be SUBSTRING, FINDSTRING and LEN
SUBSTRING(@[User::CurrentFileName], FINDSTRING(@[User::CurrentFileName], "emp_", 1) + LEN("emp_"), 6)
Here, I was lazy and just "knew" the length was 6 and hardcoded it as such. In the event that someone gives us an emp_20110909.xls, this will fail. To handle that, the preceding expression would be modified to find the position of the period and calculate the length from the emp_ position.
Now that we know FileDate, we can use SUBSTRING to slice out the first 2 elements for year, next 2 for month and final two for day.
You can then inject those values into your Data Flow via a Derived Column transformation or push them into an audit table via an Execute SQL Task.
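The slicing logic, including the period-based length calculation mentioned above, can be prototyped outside SSIS before writing the expressions. This Python sketch is only an illustration: the function name file_date_parts is hypothetical, and taking the day and month from the right-hand end (so that a four-digit year also works) is my own choice rather than part of the answer.

```python
def file_date_parts(path: str, prefix: str = "emp_") -> dict:
    """Extract the date portion of a file name such as emp_110909.xls."""
    prefix_pos = path.find(prefix)           # like FINDSTRING(..., "emp_", 1)
    if prefix_pos == -1:
        raise ValueError(f"{prefix!r} not found in {path!r}")
    start = prefix_pos + len(prefix)          # first character after emp_
    end = path.find(".", start)               # position of the period before xls
    if end == -1:
        raise ValueError(f"no file extension found in {path!r}")
    file_date = path[start:end]               # length is computed, not hardcoded
    return {
        "FileDate": file_date,
        "FileYear": file_date[:-4],           # everything before month and day
        "FileMonth": file_date[-4:-2],
        "FileDay": file_date[-2:],
    }


if __name__ == "__main__":
    print(file_date_parts(r"C:\ssisdata\mySource\Input\emp_110909.xls"))
    print(file_date_parts(r"C:\ssisdata\mySource\Input\emp_20110909.xls"))
```

All values stay strings, matching the suggestion to declare the four SSIS variables as string.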
I was trying to import an Excel worksheet into an Access table, and the worksheet had specific dates (e.g. 12/4/2017) as column headers.
When I tried to import it, Access did not allow me to import the worksheet into a table because "12/4/2017 isn't a valid field name".
Are there other ways to import the worksheet or work around this?
Thanks
Names of fields, controls & objects in Access:
Can be up to 64 characters long.
Can include any combination of letters, numbers, spaces, and special characters except a period (.), an exclamation point (!), an accent grave (`), or brackets ([ ]).
Can't begin with leading spaces.
Can't include control characters (ASCII values 0 through 31).
Can't include a double quotation mark (") in table, view, or stored procedure names in a Microsoft Access project.
(Source)
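To pre-check worksheet headers against the rules listed above before importing, something like the following Python sketch could be used. It encodes only the rules quoted here and is not Access's actual validation logic; the function name is_valid_access_name is hypothetical, and rejecting empty names is an added assumption.

```python
# Characters the quoted rules exclude from names.
FORBIDDEN_CHARS = set(".!`[]")


def is_valid_access_name(name: str) -> bool:
    """Check a candidate field name against the rules quoted above."""
    if not name or len(name) > 64:                 # empty check is an assumption
        return False
    if name.startswith(" "):                       # no leading spaces
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):  # . ! ` [ ]
        return False
    if any(ord(ch) < 32 for ch in name):           # no control characters
        return False
    return True


if __name__ == "__main__":
    for header in ["12/4/2017", "43073.25", " Sales", "Total [USD]"]:
        print(repr(header), is_valid_access_name(header))
```

Under these rules a header such as 12/4/2017 passes, while one containing a period, such as 43073.25, does not.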
Date and time values in Excel are stored internally as a 64-bit floating point number. The value to the left of the decimal represents the number of days since December 30, 1899. The value to the right of the decimal represents the fraction of a day since midnight.
For example:
12:00 Noon is stored as 0.5.
1.0 represents midnight on January 1, 1900.
2.25 represents 6:00 AM on January 2, 1900.
Your example date 12/4/2017 would be stored as 43073.
Interpretation of datetimes depends on how regional settings are customized, according to Microsoft (not necessarily the country's government standard date format). For example, I live in North America, so by default, Excel would interpret 12/4/2017 as a date.
However, for various reasons, I prefer a date format of YYYY-MM-DD (technically named "ISO 8601"), so I changed the format in my Windows Settings. Therefore, when I enter 12/4/2017, Excel does not recognize it as a date, so it is stored as text, yet when I enter 2017-12-4, Excel knows to store it as a date.
Regional settings aside, I suspect that your field names may have times attached to them (even if they aren't formatted to display as such).
If the cell you'd like to use as a field name actually contains:
December 4, 2017 6:00 AM
which, if formatted as M/D/YYYY, "hides" the time, to display as:
12/4/2017
even though it is actually stored internally as:
43073.25
Given that Access field names can't contain a period (see above), Access becomes "confused" by the fraction of a day (.25).
Make sure your dates to be used as field names don't contain times.
You could:
Format the row that has the field names as text.
Right-click the row number and choose Format Cells.
Under the Number tab, choose Text
Use a function to remove the times:
If B1 contains a datetime you want to use as a field name in A1, you could use the Int function in cell A1 (to round the value down to a whole number):
=Int(B1)
The fraction (time) is removed but the value is still stored as a number/date.
Use a function to convert the datetime to text:
If B1 contains a date you want to use as a field name in A1, you could use the Text function in cell A1:
=Text(B1, "M/D/YYYY HH:MM")
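The effect of the Int and Text approaches above can be sketched in Python using the serial number from the example. This is an illustration of the arithmetic only, not of how Excel or Access work internally, and the variable names are arbitrary.

```python
import math
from datetime import datetime, timedelta

EXCEL_EPOCH = datetime(1899, 12, 30)

serial = 43073.25                                    # a date with a hidden 6:00 AM time
as_datetime = EXCEL_EPOCH + timedelta(days=serial)   # 2017-12-04 06:00:00

# Like =Int(B1): drop the fraction, keep a numeric date without a period in it.
date_only_serial = math.floor(serial)

# Like =Text(B1, "M/D/YYYY HH:MM"): turn the value into plain text.
header_text = (
    f"{as_datetime.month}/{as_datetime.day}/{as_datetime.year} "
    f"{as_datetime:%H:%M}"
)

if __name__ == "__main__":
    print(as_datetime)
    print(date_only_serial, "." in str(date_only_serial))
    print(header_text)
```

The original serial contains a period when written out as a number, while the floored value and the text version do not.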
As you can see in the image, Access allows me to use the dates as field names if they are properly formatted:
Related Further Reading:
TechRepublic: Techniques for successfully importing Excel data into Access
Office.com: Guidelines for naming fields, controls, and objects
ExcelTactics: The Definitive Guide to Using Dates and Times in Excel
Microsoft: How to use dates and times in Excel
Stack Overflow: MS Access - Date as Table Field Name
A note about Database Normalization:
Just because you can use dates as field names, that doesn't mean that you should. It is generally considered poor database design to have a field name so specific.
Perhaps your intention is to import the poorly-structured data into Access to fix this issue, but if not, you should consider storing the data in a more organized way that is conducive to database expansion and normalization.
If your data has date-specific field names:
...then the date should be added as part of the record, not as a field name:
...although this is still not normalized. Normalization is about optimizing efficiency and allowing for expansion, so perhaps the database could be setup more like:
With this method, database expansion and data analysis would be more logical (perhaps making it easier to find trends in Jane's troubling eating habits).
Alas, I digress. There is plenty of information available online about database normalization, to suit any experience level.
Further Reading about Normalization:
Wikipedia: Database Normalization
Microsoft: Description of Database Normalization Basics
ThoughtCo: Database Normalization Basics
Stack Overflow: Database normalization - who's right?
EDIT: (the result)
You didn't mention which method you're using to import the data from Excel to Access, which may be relevant (as there are several possible combinations). Access might handle the source data differently if your Excel data is saved in an XLSM vs XLS vs CSV, etc. Data could be imported using the New Source Data…from File interface, vs programmatically with VBA, or even other languages. Therefore, if you can't get one method working (with the dates formatted a specific way), try one of the other combinations.
For simplicity's sake, I used the built-in interface with an XLSM into an ACCDB. The result is demonstrated below:
Note that it worked even though I included times in the headers (and would work without times), since they are properly formatted as text, and First Column Contains Column Headers is selected.
In my SSAS cube I have a dimension that includes 6 date fields. They are all defined in the same way, with a Key field that is a date type and a Name field that is char(10) in format yyyy-mm-dd.
When I include those fields in an Excel pivot table, they all work fine except one. That one field is displayed correctly, but doesn't behave correctly when filtered. In particular, specifying a between filter always returns zero rows. Same with a greater than filter. Begins with works fine.
Once again, this only happens for one of the six date fields. But all six date fields are configured identically as far as I can tell. What type of mistake could cause this?
EDIT
Using SQL Server Profiler I can see that the MDX generated by Excel is identical for the dates that work and the one that doesn't (except for the field names changing, of course). If I restrict the pivot table to a single date and add a between filter then the MDX is:
SELECT NON EMPTY Hierarchize({DrilldownLevel({[Participation Program].[Participation Start Date].[All]},,,INCLUDE_CALC_MEMBERS)})
DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS
FROM (SELECT Filter([Participation Program].[Participation Start Date].[Participation Start Date].AllMembers,
([Participation Program].[Participation Start Date].CurrentMember.member_caption>="1985"
AND [Participation Program].[Participation Start Date].CurrentMember.member_caption<="1990")) ON COLUMNS
FROM [Compass3])
WHERE ([Child].[Child is Handicapped].&[T],[Measures].[Child Count]) CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS
That returns correct results for [Participation Start Date], but if I do the same thing with [Participation Stop Date] it returns 0 results. So it is a problem on the SSAS side rather than Excel side. But I still can see no difference in the way the two dates are configured in the cube, and I am 100% certain that there is data that should match the specified date range.
I would change the Key for those attributes to a numeric representation of the date in YYYYMMDD format. I would achieve this via a SQL View where I would use the CONVERT function.
I would not use the date datatype at all in SSAS, as its internal representation is obscure/uncertain.
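To illustrate what a numeric YYYYMMDD key looks like and why it still sorts and compares in date order, here is a small Python sketch. In the cube itself the key would be produced in the SQL view as described above; the function name date_key is hypothetical.

```python
from datetime import date


def date_key(d: date) -> int:
    """Return the date as an integer in YYYYMMDD form, e.g. 2011-09-09 -> 20110909."""
    return d.year * 10000 + d.month * 100 + d.day


if __name__ == "__main__":
    start, stop = date(1985, 1, 1), date(1990, 12, 31)
    sample = date(1987, 6, 15)
    # Integer comparison gives the same answer as date comparison.
    print(date_key(start) <= date_key(sample) <= date_key(stop))
```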
I've come across a similar problem using EPPlus. If you haven't defined the date explicitly, Excel may not realize that the data is a date. This causes problems with sorts and filtering. (Note that the code below is C#, but the idea behind it with respect to the Excel formulas/formats should be the same.)
When you generate the excel cell, explicitly define a date formula in the cell:
ws.Cells[rowCount, columnCount].Formula = "=DATE(" + myDate.ToString("yyyy,M,d") + ")";
Then, set the cell formatting to display the date the way you want:
ws.Cells[rowCount, columnCount].Style.Numberformat.Format = "d-mmm-yyyy";
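The formula text that the first C# line builds can be reproduced in any language. As a minimal sketch in Python (the function name excel_date_formula is hypothetical, and this only builds the string; it does not write to a workbook):

```python
from datetime import date


def excel_date_formula(d: date) -> str:
    """Build an explicit Excel DATE formula for the given date."""
    # No zero padding, matching the "yyyy,M,d" format string in the C# code.
    return f"=DATE({d.year},{d.month},{d.day})"


if __name__ == "__main__":
    print(excel_date_formula(date(2012, 3, 27)))
```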
When I import an Excel file with dates in dd/mm/yyyy format into Access, it shows a non-date value. What is the problem?
E.g. 27/3/2012 becomes 33765; here 33765 is a non-date value.
The reason is that all dates within the MS Office world are actually numeric values, e.g. the date you gave above, 27/03/2012, equates to 40995.
When using the import wizard to create a new table, Access will automatically read this as its true numeric value and set the column type accordingly.
Fortunately it's easy to fix: open the design view of your table, change the column in question to a date type and save the table. The values should now be presented correctly.
Alternatively, setup a blank table with the column already set to the date data type and then import your spreadsheet into the existing table.
By default, Access assumes that dates coming from Excel are in American date format (mm/dd/yyyy), not European (dd/mm/yyyy).
The number you are seeing is how Excel stores dates (each day since 1/1/1900 adds 1).
When the dates are in European format, Access doesn't recognize them as dates during import and may load their raw serial numbers.
Additionally be sure that the Access column that these are importing to is set as 'date'. I figured that you may have already done that, but I wanted to check.
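The day-first versus month-first ambiguity, and the serial number behind the date, can be demonstrated with a short Python sketch. It illustrates the general problem only and says nothing about how Access parses dates internally; the function names are made up, and the 1899-12-30 day zero is the usual convention for serial numbers after February 1900.

```python
from datetime import date, datetime, timedelta


def parse_day_first(text: str) -> date:
    """Parse dd/mm/yyyy (European order)."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def parse_month_first(text: str) -> date:
    """Parse mm/dd/yyyy (American order)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def serial_to_date(serial: int) -> date:
    """Convert an Excel serial number to a date."""
    return (datetime(1899, 12, 30) + timedelta(days=serial)).date()


if __name__ == "__main__":
    print(parse_day_first("27/3/2012"))
    try:
        parse_month_first("27/3/2012")
    except ValueError as exc:
        # There is no month 27, so the American reading cannot be a date.
        print("month-first parse failed:", exc)
    print(serial_to_date(40995))
```

Read day-first, 27/3/2012 is a valid date and corresponds to serial 40995; read month-first, it is not a date at all.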