SSIS Dynamic population of Excel File Name

I've written an SSIS package to upload data from an Excel source to an OLE DB destination. However, when I try to use a Foreach Loop container to load data from multiple Excel files, I get an error. I have followed the tutorial at the link below:
https://msdn.microsoft.com/en-us/library/ms345182.aspx
All of the configuration is correct apart from the variable strFileName, which needs to be populated dynamically. As can be seen from the screenshot below, my variable remains blank:
I am unsure how to do this. Is there an expression or function that can be used to dynamically populate this variable?

If you want to store the name of each file in your folder dynamically, use the Variable Mappings page of your loop like this:
(screenshot: Variable Mappings)
And for your loop:
(screenshot: Foreach Loop configuration)
Note that your variable always looks blank at design time because its value is only assigned at run time, when the loop executes.
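To make the timing concrete, here is a rough Python model of what the Foreach File Enumerator does; it is not SSIS code. The folder, the *.xlsx file spec and the variable key User::strFileName are assumptions standing in for your own settings. On every iteration the enumerator writes the current file into the variable mapped at index 0, which is why the variable only has a value while the loop is running.

```python
import fnmatch
import os


def foreach_file_enumerator(folder, file_spec="*.xlsx", retrieve="fully_qualified"):
    """Yield one value per matching file, roughly like the SSIS Foreach File Enumerator.

    retrieve mirrors the "Retrieve file name" option: "fully_qualified",
    "name_and_extension" or "name_only".
    """
    for entry in sorted(os.listdir(folder)):  # sorted only to make the order predictable
        full_path = os.path.join(folder, entry)
        # Windows file matching is case-insensitive, so compare lower-cased names
        if not os.path.isfile(full_path) or not fnmatch.fnmatchcase(entry.lower(), file_spec.lower()):
            continue
        if retrieve == "fully_qualified":
            yield full_path
        elif retrieve == "name_and_extension":
            yield entry
        elif retrieve == "name_only":
            yield os.path.splitext(entry)[0]
        else:
            raise ValueError(f"unknown retrieve option: {retrieve}")


def run_loop(folder):
    """Show when the mapped variable actually receives a value."""
    variables = {"User::strFileName": ""}  # blank before the loop runs
    processed = []
    for value in foreach_file_enumerator(folder):
        variables["User::strFileName"] = value  # Variable Mappings: index 0 -> strFileName
        # the Data Flow Task inside the loop would read the variable here
        processed.append(variables["User::strFileName"])
    return processed
```

With "Fully qualified" selected, the Excel connection's expression receives a complete path on each pass; "Name only" would need the folder concatenated back on.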

Related

How to load Excel raw data into power query without converting into data table?

I am trying to find the name and full path of the current Excel file in which the Power Query query runs.
I don't need the file name as such; I just want to access a sheet that does not contain a data table, only raw data.
When I try Excel.CurrentWorkbook(), it only gives a list of tables in the current workbook. But when I access the file by its name and full path using File.Contents(), all the sheet objects are returned, including the sheets that contain raw data (not converted into a data table).
So my plan is: if I could get the file name and path of the current workbook, I could use them to access the sheet. I can't hard-code the file name because it changes every day with the date as a suffix.
Is there any other way around it?
I don't think this is currently possible using Excel.CurrentWorkbook().
It's possible to use a substring of CELL("filename") as a named range to read the current path and workbook name into Power Query and then use File.Contents, but at that point it's probably easier just to convert the sheet to a table (or named range) instead. It only takes a few keys/clicks: select all the data and click From Table in the Get & Transform section of the Data tab.
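The substring step can be sketched in Python to show what has to be cut out of the CELL("filename") text. For a saved workbook Excel returns text shaped like C:\Reports\[Daily_2016-01-01.xlsx]Sheet1 (folder, bracketed workbook name, sheet name); the example path is hypothetical. In the worksheet itself, a formula such as =SUBSTITUTE(LEFT(CELL("filename",A1),FIND("]",CELL("filename",A1))-1),"[","") does the same slicing before the result is exposed as a named range.

```python
def split_cell_filename(cell_value):
    """Split the text returned by Excel's CELL("filename", A1).

    The text has the shape <folder>\\[<workbook>]<sheet>; returns
    (full_path, workbook_name, sheet_name).
    """
    if not cell_value:
        # CELL("filename") returns an empty string until the workbook is saved
        raise ValueError("workbook has not been saved yet")
    # Sheet names cannot contain brackets, so the last "]" closes the workbook name;
    # searching from the right also copes with a folder that contains "[".
    close_idx = cell_value.rindex("]")
    open_idx = cell_value.rindex("[", 0, close_idx)
    folder = cell_value[:open_idx]
    workbook = cell_value[open_idx + 1:close_idx]
    sheet = cell_value[close_idx + 1:]
    return folder + workbook, workbook, sheet
```

The first element is the value you would pass to File.Contents; the date suffix in the file name no longer matters because it is read from the open workbook.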

How to read the data from the multiple excel workbook in Uipath

The scenario is that I have a folder containing at least 4 to 5 Excel workbooks. Each workbook name starts with a standard prefix, and the rest of the name varies. I need to count the Excel workbooks, then read the data in each workbook and save it in a different DataTable each time. This has to be done in UiPath.
I would recommend creating this activity as a Library; it is a pattern that can be reused everywhere.
You can find a complete example here, where you can also download it.
To summarize it:
Use the Select Folder activity -> yourFolder
Create a variable with the value Directory.GetFiles(yourFolder) -> fileArray
Go through the files with a For Each over fileArray
If you would like to use it as a library, I would recommend adding these:
variable "FilterFileExtentions" to filter for specific files
variable "NameStartsWith" to filter files starting with specific String
It looks like you first need to work with files to determine which Excel workbook to open. To do that, you could get a list of all files in a specific folder by using the .NET System.IO.Directory.GetFiles method.
So, assuming you are working with your project folder, you will have an Assign activity like this:
ListOfFiles = System.IO.Directory.GetFiles(System.IO.Directory.GetCurrentDirectory())
Where ListOfFiles is a variable declared as System.String[]
You could then iterate over this array using a For Each activity, or get the number of workbooks from its .Length property.
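The filtering and counting both answers describe can be modelled in Python (UiPath itself would do this with VB.NET expressions). The prefix "Report_", the extension list and the read_workbook callback are hypothetical; read_workbook stands in for whatever reads one file, such as a Read Range activity.

```python
import os


def find_workbooks(folder, name_starts_with, extensions=(".xlsx", ".xls")):
    """Return full paths of files that start with a prefix and have an Excel extension.

    name_starts_with and extensions play the role of the NameStartsWith and
    FilterFileExtentions library arguments suggested above.
    """
    matches = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        # Extensions are compared case-insensitively; lock files like "~$Report_..." fail the prefix test
        if name.startswith(name_starts_with) and name.lower().endswith(tuple(extensions)):
            matches.append(path)
    return matches


def load_each_workbook(paths, read_workbook):
    """Keep a separate table per workbook, keyed by file name."""
    return {os.path.basename(p): read_workbook(p) for p in paths}
```

len(find_workbooks(folder, "Report_")) gives the workbook count, and each entry of the returned dictionary corresponds to one DataTable.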

How to dynamically create Excel files with SSIS

I have set up an Execute SQL Task that loads the full result set of names into an object variable, and I have connected it to a Foreach Loop that scans the whole object row by row. I'm unsure about the next steps, though. It would be nice if I could create a Data Flow Task and somehow set the destination variable equal to the Foreach Loop's mapped variable. Any tips?
Based on what you described, all you need to do is the following:
1. The Execute SQL Task returns a list of Excel file names, which you already have.
2. Connect its output to a Foreach Loop Container, which iterates over each name.
3. Inside the container, the first task you need is a Script Task, which creates each Excel file.
I assume that the Excel format is the same for every file you need to populate. You need to create a template with the desired column header names.
In that Script Task, add the variable mapped by the container as a read-only variable. You also need another variable, set as read/write (suppose it is named A), to store the dynamic Excel file path for each iteration. Then edit the script.
If you are familiar with C#, it is easy to copy the template for each name being iterated.
The code will look like this:
using System.IO;
...
...
...
string source = "C:\\template.xlsx"; // needs to be a full path
string target = "C:\\" + Dts.Variables["that read only variable"].Value.ToString() + ".xlsx";
File.Copy(source, target, true); // true = overwrite a file left over from an earlier run
Dts.Variables["A"].Value = target; // important: the Excel connection reads its path from A
After the Script Task, connect a Data Flow Task with a precedence constraint. Inside it you need an Excel destination. The tricky part (1) is that you need to make the ExcelFilePath of that Excel connection manager dynamic through its property expressions. I suggest pointing it at an existing Excel file the first time so that the column mappings are cached, and then, for the dynamic connection, selecting A, the read/write variable set by the Script Task.
To write the data to Excel, you need to convert all varchar (DT_STR) columns to nvarchar (DT_WSTR); this can be done with either a Derived Column or a Data Conversion transformation.
Last but not least, set DelayValidation to True on the Excel connection manager and on the Data Flow Task (the Excel destination itself has ValidateExternalMetadata, which you can set to False); this is very important for a dynamic process.
All of the above is only a brief explanation, but that is the main idea.
PS: (1) Excel is very picky in SSIS. If you do not have the Access Database Engine installed, the data might not be populated successfully; Excel needs either the Jet (older) or the ACE (newer) OLE DB provider.
(2) If your header row is not the first row, you might also need to look at the OpenRowset property of the Excel source.
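The per-iteration work described in this answer (Script Task, then Data Flow Task) can be sketched in Python as a model of the control flow, not SSIS code. The output folder, the User::A key and the write_rows callback are hypothetical stand-ins for your own paths, variable and data flow. shutil.copyfile overwrites an existing target, so a rerun does not fail on files left over from an earlier run.

```python
import os
import shutil


def create_excel_files(names, template_path, output_folder, write_rows):
    """Copy the header-only template once per name and hand the new path to the data flow."""
    variables = {"User::A": None}
    created = []
    for name in names:  # Foreach Loop over the Execute SQL Task result set
        target = os.path.join(output_folder, f"{name}.xlsx")
        shutil.copyfile(template_path, target)  # Script Task: copy the template (overwrites)
        variables["User::A"] = target  # Script Task: store the path in the read/write variable
        write_rows(variables["User::A"], name)  # Data Flow Task: Excel destination bound to A
        created.append(target)
    return created
```

Keeping the copy step separate from the data flow is what lets the Excel connection stay valid: the file always exists, with the expected headers, before the destination opens it.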

SSIS excel Destination, how to force LongText?

I'm using SSIS to perform data migration.
I'm using an Excel destination file to output everything that's going wrong.
In this Excel file, I want to output the two error columns (ErrorCode and ErrorColumn) and also all columns from my input component.
This nearly works, except when I have string columns with more than 255 characters. When I set up my Excel destination, I create a new table.
The CREATE TABLE statement properly defines LongText as the data type:
CREATE TABLE `My data` (
`ErrorCode` Long,
`ErrorColumn` Long,
`ID` Long,
`MyStringColumn` LongText
)
This works the first time. Then I remove all data from the Excel file, because I want to clean up the Excel file before outputting errors.
When I return to the package designer, my column definitions are messed up. Every text column is handled as nvarchar(255) instead of ntext. That breaks my component, because my data exceeds 255 characters.
How can I properly manage Excel destinations?
thx
[Edit] As I'm not sure of my interpretation, here are the error messages I get when I run the task:
Error: 0xC0202009 at MyDataTask, To Errors file [294]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21.
Error: 0xC0202025 at MyDataTask, To Errors file [294]: Cannot create an OLE DB accessor. Verify that the column metadata is valid.
Error: 0xC004701A at MyDataTask, SSIS.Pipeline: component "To Errors file" (294) failed the pre-execute phase and returned error code 0xC0202025.
In SSIS packages that involve Excel Destination, I have used an Excel Template file format strategy to overcome the error that you are encountering.
Here is an example that first shows how to simulate your error message and then shows how to fix it. The example uses SSIS 2008 R2 with Excel 97-2003.
Simulation
Created a simple table with two fields, Id and Description, and populated it with a couple of records.
Created an SSIS package with a single Data Flow Task, configured as shown below. It reads the data from the SQL Server table mentioned above and then tries to convert the Description column to Unicode text with a character length of 20.
Since the table has two rows whose Description values exceed 20 characters, the default Error configuration on the Data Conversion transformation would fail the package. However, we need to redirect all the error rows, so the Error configuration on the Data Conversion transformation has to be changed as shown below.
Then I redirected the error output to an Excel Destination configured to save the output to the file C:\temp\Errors.xls. The first execution of the package succeeds because the Excel file is empty to begin with.
The file will contain both rows from the table, because both encountered the truncation error and were therefore redirected to the error output.
After deleting the contents of the Excel file without changing the column headers, executing the package again makes it fail.
The failure produces the error messages shown below.
That completes the simulation of the error mentioned in the question. And here is one possible way that the issue could be fixed.
Possible Solution
Delete the existing Excel Destination to which the error output is redirected. Create a new Excel connection manager with the path C:\temp\Template.xls. Place a new Excel Destination, point it to the new Excel connection manager, and create the sheet within the new Excel file using the New button on the Excel Destination.
Create two package variables named TemplatePath and ActualPath. TemplatePath should have the value C:\temp\Template.xls and ActualPath should have the value C:\temp\Errors.xls. ActualPath is the path where you would like the file to be created.
Right-click the Excel connection manager, open its properties, set the DelayValidation property to True, and set the ServerName expression to the variable @[User::ActualPath]. DelayValidation ensures that the package doesn't throw errors at design time if the file C:\temp\Errors.xls doesn't exist. Setting the ServerName expression ensures that the package uses the file path in the variable ActualPath to generate the file.
On the Control Flow tab, place a File System Task above the Data Flow task.
Configure the File System Task as shown below. So, the File System Task will copy the Template file C:\temp\Template.xls and will create a new destination file C:\temp\Errors.xls every time the package runs. If the file C:\temp\Errors.xls already exists, then the File System Task will simply overwrite the file when the OverwriteDestination property within the File System Task is set to True.
Now, you can continue to run the package any number of times. The package will not fail and also you will have only the error messages from the last execution without having to manually clear the Excel file content.
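Why the copy step fixes the rerun problem can be modelled in Python. This is only a sketch: the error file is represented as a text file with a header line and one line per error row (a real .xls is not a text file), and the function names are hypothetical. Each run starts from a fresh copy of the header-only template, so the destination never sees a sheet whose column definitions were altered by deleting its data.

```python
import shutil


def reset_error_file(template_path, actual_path):
    """Stand-in for the File System Task: Copy file with OverwriteDestination = True."""
    shutil.copyfile(template_path, actual_path)  # overwrites actual_path if it exists


def run_package(template_path, actual_path, error_rows):
    """One package execution: reset the file first, then append this run's error rows."""
    reset_error_file(template_path, actual_path)  # control flow: File System Task runs first
    # Data Flow Task: the error output is appended below the header
    with open(actual_path, "a", encoding="utf-8") as f:
        for row in error_rows:
            f.write(row + "\n")
```

After two runs the file holds the header plus only the second run's rows, which matches the behaviour described above.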
Hope that helps.
[Edit] Added by Steve B. to provide a bit more detail directly in the post, because it's too long for a comment
In my solution, I have two Excel files in my SSIS project: Errors_Design_Template.xls and Errors_Template.xls. The former contains my sheets with the headers and one row of data (using formulas like =REPT("A",1024) for input columns with a maximum length of 1024); the latter is exactly the same without that first row of data.
Both files are copied from my source directory to a temp directory at the start of the package. I use two files because I want to keep design-time validation, and I point the Excel connection to the copy of the template file. I also duplicate the template file because I often execute a single data flow task of my package, and I want to populate a temp file, not the template file in my project (which must remain empty except for the headers and the first dummy row of data).
I also used two variables: one for the Excel connection expression and one for the actual output file. I also had to write a script that takes both variables as input; ActualFilePath is read/write. At run time, the script copies the value of ActualFilePath into the ErrorFilePath variable. (I don't have the source code right now, but I can paste it next week if it helps.)
Using these components together lets the Excel connection point to the design file while designing and to the actual error file at run time, without having to set DelayValidation to True.
It's better to use an Execute SQL Task in the control flow. In the Execute SQL Task, specify the Excel connection manager as the connection. In the SQL statement, drop the Excel table that was created when the sheet was created in the Excel destination, and then create the same table again. The next time, the data will be inserted into the freshly created Excel table.

How to loop through Excel files and load them into a database using SSIS package?

I need to create an SSIS package to import data from multiple Excel files into a SQL database. I plan on using nested Foreach Loop containers: a Foreach File Enumerator and, nested within it, a Foreach ADO.NET Schema Rowset Enumerator.
Problem to consider: sheet names differ between Excel files, but the structure remains the same.
I have created an Excel Connection Manager, but the Schema Rowset Enumerator does not accept the connection manager in the enumerator configuration.
After researching, I found that you can use the Jet OLE DB provider to connect to an Excel file. However, I can only specify Microsoft Access database files as the data source; attempting to use an Excel file as the data source fails.
After more research, I found that you can use the ODBC Data Provider with a connection string instead of a DSN. After entering a connection string specifying the Excel file, this also failed.
I have been told not to use a Script Task for this. As a last-ditch effort I tried to extract data by accessing the sheets by index, but I found that the sheet indexes differ between the Excel files.
Any help would be greatly appreciated
Here is one possible way of doing this, assuming that there are no blank sheets in the Excel files, that all the sheets follow exactly the same structure, and that the file extension is always .xlsx.
The following example was created using SSIS 2008 R2 and Excel 2007. The working folder for this example is F:\Temp\
In the folder path F:\Temp\, create an Excel 2007 spreadsheet file named States_1.xlsx with two worksheets.
Sheet 1 of States_1.xlsx contained the following data
Sheet 2 of States_1.xlsx contained the following data
In the folder path F:\Temp\, create another Excel 2007 spreadsheet file named States_2.xlsx with two worksheets.
Sheet 1 of States_2.xlsx contained the following data
Sheet 2 of States_2.xlsx contained the following data
Create a table in SQL Server named dbo.Destination using the below create script. Excel sheet data will be inserted into this table.
CREATE TABLE [dbo].[Destination](
[Id] [int] IDENTITY(1,1) NOT NULL,
[State] [nvarchar](255) NULL,
[Country] [nvarchar](255) NULL,
[FilePath] [nvarchar](255) NULL,
[SheetName] [nvarchar](255) NULL,
CONSTRAINT [PK_Destination] PRIMARY KEY CLUSTERED ([Id] ASC)) ON [PRIMARY]
GO
The table is currently empty.
Create a new SSIS package and, on the package, create the following 4 variables. FolderPath will contain the folder where the Excel files are stored. FilePattern will contain the extension of the files to loop through; this example works only for .xlsx. FilePath will be assigned a value by the Foreach Loop container, but we need a valid path at design time, so it is initially populated with the path F:\Temp\States_1.xlsx of the first Excel file. SheetName will contain the actual sheet name, but we need to give it the initial value Sheet1$ to avoid a design-time error.
In the package's connection managers, create an ADO.NET connection with the following configuration and name it ExcelSchema.
Select the provider Microsoft Office 12.0 Access Database Engine OLE DB Provider under .Net Providers for OleDb. Provide the file path F:\Temp\States_1.xlsx
Click on the All section on the left side and set the property Extended Properties to Excel 12.0 to denote the version of Excel. Here in this case 12.0 denotes Excel 2007. Click on the Test Connection to make sure that the connection succeeds.
Create an Excel connection manager named Excel as shown below.
Create an OLE DB connection to SQL Server named SQLServer. We should now have three connections on the package, as shown below.
We need to do the following connection string changes so that the Excel file is dynamically changed as the files are looped through.
On the connection ExcelSchema, configure the expression ServerName to use the variable FilePath. Click on the ellipsis button to configure the expression.
Similarly on the connection Excel, configure the expression ServerName to use the variable FilePath. Click on the ellipsis button to configure the expression.
On the Control Flow, place two Foreach Loop containers, one within the other. The first Foreach Loop container, named Loop files, will loop through the files. The second Foreach Loop container will loop through the sheets within each file. Within the inner Foreach Loop container, place a Data Flow Task that reads the Excel files and loads the data into SQL Server.
Configure the first Foreach Loop container, named Loop files, as shown below:
Configure the second Foreach Loop container, named Loop sheets, as shown below:
Inside the data flow task, place an Excel Source, Derived Column and OLE DB Destination as shown below:
Configure the Excel Source to read the appropriate Excel file and the sheet that is currently being looped through.
Configure the derived column to create new columns for file name and sheet name. This is just to demonstrate this example but has no significance.
Configure the OLE DB destination to insert the data into the SQL table.
Below screenshot shows successful execution of the package.
The screenshot below shows that the data from the 4 worksheets in the 2 Excel workbooks created at the beginning of this answer was correctly loaded into the SQL table dbo.Destination.
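The nested-loop design above can be modelled in Python, with sqlite3 standing in for SQL Server and an in-memory dictionary standing in for the Excel files; none of this is SSIS code. The is_worksheet filter is an assumption about what the ADO.NET Schema Rowset enumerator returns: worksheets typically come back as Sheet1$ (or quoted, such as 'My Sheet$'), while hidden entries such as Sheet1$_xlnm#_FilterDatabase do not end with $, so they would otherwise be looped over as if they were sheets.

```python
import sqlite3


def is_worksheet(table_name):
    """Keep real worksheets from the schema rowset TABLE_NAME values (assumed naming)."""
    return table_name.rstrip("'").endswith("$")


def load_workbooks(workbooks, conn):
    """workbooks: {file_path: {table_name: [(state, country), ...]}}, an in-memory stand-in."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Destination ("
        "Id INTEGER PRIMARY KEY AUTOINCREMENT, State TEXT, Country TEXT, "
        "FilePath TEXT, SheetName TEXT)"
    )
    for file_path, sheets in workbooks.items():  # Loop files -> User::FilePath
        for sheet_name, rows in sheets.items():  # Loop sheets -> User::SheetName
            if not is_worksheet(sheet_name):
                continue
            # Excel Source -> Derived Column (FilePath, SheetName) -> OLE DB Destination
            conn.executemany(
                "INSERT INTO Destination (State, Country, FilePath, SheetName) VALUES (?, ?, ?, ?)",
                [(state, country, file_path, sheet_name) for state, country in rows],
            )
    conn.commit()
```

Because FilePath and SheetName are written with every row, each loaded row can be traced back to the file and sheet it came from, which is the point of the Derived Column step.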
I ran into an article that illustrates a method in which data from the same Excel sheet can be imported into the selected table, as long as the data types in the Excel sheet are not modified.
If new data is inserted or existing data is overwritten, the import process will still succeed, and the data will be added to the table in the SQL database.
The article may be found here: http://www.sqlshack.com/using-ssis-packages-import-ms-excel-data-database/
Hope it helps.
I had a similar issue and found that it was much simpler to get rid of the Excel files as soon as possible. As one of the first steps in my package, I used PowerShell to extract the data from the Excel files into CSV files. My own Excel files were simple, but "Extract and convert all Excel worksheets into CSV files using PowerShell" is an excellent article by Tim Smith on extracting data from multiple Excel files and/or multiple sheets.
Once the Excel files have been converted to CSV the data import is much less complicated.
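To illustrate why the CSV route is simpler, here is a Python sketch of the import side after conversion, with sqlite3 standing in for the real database. The Staging table, the SourceFile column and the columns argument are hypothetical; it assumes every CSV has a header row containing the listed columns. No sheet enumeration or Excel driver is involved, only a folder of plain files.

```python
import csv
import glob
import os


def import_csv_folder(folder, conn, columns):
    """Load every *.csv in folder into a Staging table, tagging rows with the source file."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS Staging (SourceFile TEXT, {col_defs})")
    placeholders = ", ".join("?" * (len(columns) + 1))
    total = 0
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # utf-8-sig also accepts files written with a byte-order mark
        with open(path, newline="", encoding="utf-8-sig") as f:
            rows = [
                (os.path.basename(path), *(record[c] for c in columns))
                for record in csv.DictReader(f)
            ]
        conn.executemany(f"INSERT INTO Staging VALUES ({placeholders})", rows)
        total += len(rows)
    conn.commit()
    return total
```

Call it with an open connection, for example import_csv_folder(folder, sqlite3.connect("staging.db"), ["State", "Country"]); it returns the number of rows loaded across all files.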
