SSIS Excel Destination, how to force LongText?

I'm using SSIS to perform a data migration.
I'm using an Excel destination file to output everything that's going wrong.
In this Excel file, I want to output the two error columns (error code and error column) as well as all columns from my input component.
This is nearly working, except when I have string columns with more than 255 characters. When I set up my Excel destination, I create a new table.
The CREATE TABLE statement properly defines LongText as the data type:
CREATE TABLE `My data` (
`ErrorCode` Long,
`ErrorColumn` Long,
`ID` Long,
`MyStringColumn` LongText
)
This works the first time. Then I remove all data from the Excel file, because I want to clean it up before outputting errors.
When I return to the package designer, my column definitions are messed up. Every text column is handled as nvarchar(255) instead of ntext. That breaks my component because my data exceeds 255 characters.
How can I properly manage Excel destinations?
Thanks.
[Edit] As I'm not sure of my interpretation, here are the error messages I get when I run the task:
Error: 0xC0202009 at MyDataTask, To Errors file [294]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21.
Error: 0xC0202025 at MyDataTask, To Errors file [294]: Cannot create an OLE DB accessor. Verify that the column metadata is valid.
Error: 0xC004701A at MyDataTask, SSIS.Pipeline: component "To Errors file" (294) failed the pre-execute phase and returned error code 0xC0202025.

In SSIS packages that involve an Excel destination, I have used an Excel template file strategy to overcome the error that you are encountering.
Here is an example that first shows how to simulate your error message and then shows how to fix it. The example uses SSIS 2008 R2 with Excel 97-2003.
Simulation
Created a simple table with two fields, Id and Description. Populated the table with a couple of records.
Created an SSIS package with a single Data Flow Task, configured as shown below. It basically reads the data from the above-mentioned SQL Server table and then tries to convert the Description column to Unicode text with the character length set to 20.
Since the table has two rows whose Description column values exceed 20 characters in length, the default error configuration on the Data Conversion transformation would fail the package. However, we need to redirect all the error rows, so the error configuration on the Data Conversion transformation has to be changed as shown below to redirect the error rows.
Then I redirected the error output to an Excel destination configured to save the output to the file C:\temp\Errors.xls. The first execution of the package succeeds because the Excel file is empty to begin with.
The file will contain both rows from the table, because both encounter the truncation error and are therefore redirected to the error output.
After deleting the contents of the Excel file without changing the column header, if we execute the package again it will fail.
The failure is caused by the errors shown below.
That completes the simulation of the error mentioned in the question. And here is one possible way that the issue could be fixed.
Possible Solution
Delete the existing Excel destination to which the error output is redirected. Create a new Excel connection manager with the path C:\temp\Template.xls. Place a new Excel destination, point it to the new Excel connection manager, and also create the sheet within the new Excel file using the New button on the Excel destination.
Create two package variables named TemplatePath and ActualPath. TemplatePath should have the value C:\temp\Template.xls and ActualPath should have the value C:\temp\Errors.xls. ActualPath is the path where you would like the file to be created.
Right-click the Excel connection manager, set the DelayValidation property to True, and set the ServerName expression to the variable @[User::ActualPath]. DelayValidation makes sure that the package doesn't throw errors at design time if the file C:\temp\Errors.xls doesn't exist. Setting the ServerName expression ensures that the package uses the file path stored in the variable ActualPath to generate the file.
On the Control Flow tab, place a File System Task above the Data Flow task.
Configure the File System Task as shown below. The File System Task will copy the template file C:\temp\Template.xls and create a new destination file C:\temp\Errors.xls every time the package runs. If the file C:\temp\Errors.xls already exists, the File System Task will simply overwrite it, because its OverwriteDestination property is set to True.
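Since the screenshot is not reproduced here, one way to express that configuration on the File System Task, using the two variables created above, is:

File System Task Editor
    Operation                 : Copy file
    IsSourcePathVariable      : True
    SourceVariable            : User::TemplatePath
    IsDestinationPathVariable : True
    DestinationVariable       : User::ActualPath
    OverwriteDestination      : True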
Now, you can continue to run the package any number of times. The package will not fail and also you will have only the error messages from the last execution without having to manually clear the Excel file content.
Hope that helps.
[Edit] Added by Steve B. to provide a bit more details directly in the post because its too long for a comment
In my solution, I have in my SSIS project tow Excel files: Errors_Design_Template.xls and Errors_Template.xls'. The former file contains my sheets with the headers and one line of data (using formulas like =Rept("A",1024)` for input columns having 1024 length max), the latter is exactly the same without the first line of data.
Both files are copied at the start of the package from my source directory to temp directory. I use two files because I want to keep the design time validation, and I’m pointing to the copy of the template file in the Excel connection. I’m duplicating the template file also because I’m often executing a single data flow task of my package, and I want to populate a temp file, not the template file in my project (which has to remain empty but the headers and the first dummy line of data).
I also used two variables, one to use in Excel connection expression, one for the actual output file. I also had to write a script having my two variables as input. ActualFilePath is read/write. The script copies at run-time the value of the ActualFilePath to the ErrorFilePath variable. (I don’t have the source code by now, but I can paste it next week if it can helps).
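As an illustration only (this is not the original script), the body of such a Script Task could look roughly like this in C#, with the variable names taken from the description above and ErrorFilePath listed under the task's ReadWriteVariables:

public void Main()
{
    // Copy the run-time output path into the variable referenced by the
    // Excel connection manager's expression.
    // Assumes ActualFilePath and ErrorFilePath exist as package variables.
    Dts.Variables["User::ErrorFilePath"].Value =
        Dts.Variables["User::ActualFilePath"].Value.ToString();

    Dts.TaskResult = (int)ScriptResults.Success;
}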
Using these components together allows me to have the Excel connection pointing to the design file at design time and to the actual error file at run-time, without having to set DelayValidation to true.

It is better to use an Execute SQL Task in the control flow. In the Execute SQL Task, set the connection to the Excel connection manager. In the SQL statement, drop the Excel table that was created when the sheet was created in the Excel destination, and then re-create the same table. That way, the next time the package runs, the data will be inserted into a fresh Excel table.
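As an illustration, the statements for the Execute SQL Task could reuse the DDL from the question (table and column names are the ones shown there); depending on the provider, you may need to run the DROP and the CREATE as two separate statements or tasks:

DROP TABLE `My data`

CREATE TABLE `My data` (
`ErrorCode` Long,
`ErrorColumn` Long,
`ID` Long,
`MyStringColumn` LongText
)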

Related

SSIS: failed to retrieve long data / truncation errors

I'm getting either of those two errors when trying to export data from a set of Excel spreadsheets.
Simplified scenario:
two Excel spreadsheets containing 1 text column
in file 1 the text is never longer than 200 characters
in the 2nd it is.
SSIS is supposed to import them automatically from a folder - easy and simple, but...
The Excel Source component decides what data type is used here.
When, based on a sample file I created with sample text data, it decides to use DT_WSTR(255), it fails on the second file with the truncation error.
When I force it to use DT_NTEXT (by putting longer text in the sample file), it fails on the 1st file, complaining "Failed to retrieve long data for column"... because the 1st file doesn't contain the longer texts...
Has anybody found a solution/workaround for this? I mean, other than manually changing the source data?
Any help much appreciated.
We can use a Flat File Connection Manager instead of an Excel Connection Manager. When we create a Flat File Connection Manager, we can set the data type and length explicitly. To do so, first save the Excel file as a CSV or tab-delimited file. Then we can use this file to create the Flat File connection. Drag and drop a Flat File Source onto the Data Flow tab. In the Flat File Source Editor dialog box, click the New button; it launches the Flat File Connection Manager Editor dialog box. On the General tab specify the full file path, then click the Advanced tab and set the data type and column width as in the image below.
Click OK and close the dialog box; this creates our connection manager. Now the connection manager can read the full-length data, but we still have to set the data type and length of the Output Columns so that the data makes it into the output pipeline. To do that, right-click the Flat File Source and click the Show Advanced Editor option, then follow the instructions in the image below.
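Since the screenshots are not included here, the settings they show are roughly the following (the column name is illustrative; pick DT_NTEXT, or DT_WSTR with a width large enough for your longest value):

Flat File Connection Manager, Advanced tab:
    Name     : Description
    DataType : Unicode text stream [DT_NTEXT]

Flat File Source, Show Advanced Editor, Output Columns:
    Description : DataType = DT_NTEXT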
When we finish, we run our package; it runs successfully without any truncation error and inserts all the data into our target database.

Improve speed of very slow appending of files to master file

I am trying to combine (or perhaps append is a better term) a group of 10 Excel files with identical columns into one master file.
I have tried a very simple process using a Foreach Loop in the control flow and simply going from an Excel Source to an Excel Destination. The process was not only slow (about 1 record pasted per second), but it also died after about 50k records.
It looks like:
Foreach Loop Container --> Data Flow task
where the Data Flow Task is Excel Source --> Excel Destination
At the end, I'd want to see one master file with all files appended. I recognize there are other tools that can do this, like Power Query directly in Excel, but I'm trying to better understand SSIS, and I have a lot of processing that would be better done in SQL Server.
Is there a better way to do this? I searched high and low online but couldn't find an example of this in SSIS.
This is very simple. The one thing I would suggest is to load to a flat file in CSV format, which opens easily in Excel.
1. Add a Foreach Loop container, enumerated on file name.
2. In the Foreach Loop editor set:
   - the folder path of the Excel files
   - the file specification (e.g. myfiles*.xls)
3. Go to Variable Mappings and map the fully qualified file name to a variable.
4. Create an Excel connection to any one of the files.
5. In the Excel connection's properties, open Expressions and set the file path to the variable from step 3 (see the sketch after this list).
6. Also in the properties, set DelayValidation to true.
7. Add a Data Flow Task to the Foreach Loop container.
8. Go to the data flow.
9. Use the Source Assistant to read the Excel source.
10. Use the Destination Assistant to load to a flat file (make sure not to overwrite the destination, or you will only get the last workbook).
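A rough sketch of what steps 5 and 6 look like on the Excel connection manager (the variable name here is just an illustration; use whichever variable you mapped in step 3):

Excel Connection Manager properties:
    DelayValidation : True
    Expressions
        ExcelFilePath : @[User::ExcelFileName]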

Exporting data into Excel via SSIS - package on server ignores named range

I am trying to insert data into an Excel file using SQL Server Integration Services. Every time, I have to create a new Excel file from a template and fill two tables in one sheet, where the first table starts on row 2 (data must start from row 3) and the second table starts on row 7 (data must start from row 8). So I created a template Excel file with two named ranges, and in SSIS I created two Excel destinations and used the named ranges as destinations.
Everything works perfectly on my computer. I can run my package (in 32-bit mode); a new Excel file is created from the template with the tables filled properly.
Great, but it doesn't work properly on the server. I created a job that runs the package with the 32-bit option checked, added parameters, and saved the template on the server. If I run the job, it ends successfully, but the Excel file is not filled correctly. All the saved data starts from row 2 (for both tables) and the data from the first table is overwritten by the data from the second table. It somehow ignores the named ranges.
I tried another method without named ranges: in the Excel destination I chose SQL command as the data access mode and wrote the query SELECT * FROM [Sheet$A2:N2], but it's the same story. It works locally, but not on the server.
I downloaded the package and template file from the server and ran them on my computer, and everything worked properly...
Has anyone encountered such a problem?
Here are the steps I used to export data to Excel starting from the 7th row. For this example, assume you export 4 columns. Caveat: it works on SSIS 2012+.
Create a template Excel file with a named range (say, N1) at A6:D6, scoped to the workbook.
At the Excel destination, open the Advanced Editor and, on the Component Properties tab, specify the following: for AccessMode choose OpenRowset, and for OpenRowset enter N1.
After that you have to map the columns again at the Excel destination.

SSIS Error in Connection Manager after modifying source file

Part of my package involves stripping the first row from an Excel source in a Script Task before adding that data to a server in a Data Flow Task.
The error message I get is VS_NEEDSNEWMETADATA. I have my Excel connection manager pointing to the Excel file, and "first row contains column headings" checked. And of course, the external columns for the Excel Source are out of synchronization with the data source columns.
The problem is: the first row doesn't contain the column headings until I strip out the first row in my Script Task. But since that doesn't take place until the package runs, when I click my connection manager for the Excel file, it doesn't know that - it shows the first row before I strip it out.
I already tried delaying validation, but it still fails.
Any ideas on how to fix this predicament? Is there a way to make the connection manager refresh, or something similar, after my Script Task has completed?
Thanks
Point your connection manager to a version of the Excel file that you have saved with the first row already stripped out, for the purposes of configuring the columns in the connection manager.
Then put an expression on the connection manager, setting the connection string property to the location of the file produced after the Script Task.
It will then not be out of sync with the static 'configuration' version of the Excel file, but it will still point to the Excel file that you want at run-time.
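For example, in the connection manager's Expressions collection (the variable name here is just an illustration; it should hold the path of the file your Script Task produces):

    ExcelFilePath : @[User::RuntimeExcelPath]

Setting ExcelFilePath is usually enough, since the connection string is rebuilt from it; you can also build the full ConnectionString in the expression if you prefer.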
Cheers

How to transfer data from Excel spreadsheet to flat file without column headers?

I've got a simple data flow in SSIS with an Excel source and a flat file destination. The headers in Excel are on the first row, and in SSIS I have 'first row has headers' ticked in the Excel connection manager.
In my flat file the data is being loaded and it all looks correct, except for the headers from Excel.
When I set up my flat file connection manager (ffcm) it was using the comma-delimited setting for the columns.
I checked the columns in the ffcm and all the columns were there.
After a few runs I noticed that I had not ticked 'Column names in the first data row' in the flat file connection manager. Now that I have done this, I get an error:
TITLE: Package Validation Error
ADDITIONAL INFORMATION:
Error at Data Flow Task [DTS.Pipeline]: "component "Flat File Destination" (487)" failed >validation and returned validation status "VS_NEEDSNEWMETADATA".
Error at Data Flow Task [DTS.Pipeline]: One or more component failed validation.
Error at Data Flow Task: There were errors during task validation.
(Microsoft.DataTransformationServices.VsIntegration)
So I unticked that again, but it made no difference.
I checked the columns in the ffcm and they are now set to column0, column1, column2... etc.
Also, when I run it, it puts out a number of lines of commas corresponding to the rows in the Excel sheet:
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
I seem to be getting into a bit of a pickle and need some advice about what the problem may be.
It seems that you have lost the field mappings between your Excel Source and Flat File Destination since you last configured the values.
Unchecking and checking the box Column names in the first data row on the Flat File Connection Manager has changed the actual column names of the flat file destination. These new columns now need to be re-mapped on the Flat File Destination component.
If you notice the warning sign on your Flat File Destination, double-click it; you will receive a message similar to the one shown below.
The Flat File Destination shows a warning asking you to map the columns on the Mappings page whenever the field mappings have been lost.
Since the field mappings have been lost here, click the Mappings page and configure the mappings between source and destination.
I believe that this is the issue you are facing.
