SSIS: "Failed to retrieve long data" / truncation errors - Excel

I'm getting one of those two errors when trying to export data from a set of Excel spreadsheets.
A simplified scenario:
two Excel spreadsheets, each containing one text column
in file 1 the text is never longer than 200 characters
in the 2nd - it is.
SSIS is supposed to import them automatically from a folder - easy and simple, but...
The Excel source component decides what data type is used here.
When, using a sample file I created with sample text data, it decides to use DT_WSTR(255), it fails on the second file with the truncation error.
When I force it to use DT_NTEXT (by creating longer text in the sample file), it fails on the 1st file, complaining that it "Failed to retrieve long data for column"... because the 1st file doesn't contain longer texts.
Has anybody found a solution/work-around for this? I mean - except manually changing the source data?
Any help much appreciated.

We can use a Flat File Connection Manager instead of the Excel Connection Manager. When we create a Flat File Connection Manager, we can set the data type and length explicitly. To do so, we first need to save the Excel file as a csv file or tab-delimited file. Then we can use this file to create the Flat File Connection. Drag and drop a Flat File Source into the Data Flow tab. In the Flat File Source Editor dialog box, click the New button; it launches the Flat File Connection Manager Editor dialog box. On the General tab, specify the full file path, then click the Advanced tab and set the data type and column width for each column.
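For a text column that can exceed 255 characters, the Advanced tab settings would look something like this (the column name is a placeholder):

Name: MyTextColumn
DataType: Unicode string [DT_WSTR]
OutputColumnWidth: 4000

For values longer than 4000 characters, choose Unicode text stream [DT_NTEXT] instead, since DT_WSTR is capped at 4000 characters.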
Click OK and close the dialog box; this creates our connection manager. Now the connection manager can successfully read the full-length data, but we also have to set the data type and length of the Output Columns so that we get the data in the output pipeline. To do that, right-click the Flat File Source, click the Show Advanced Editor option, and set the same data type and length on the output columns under Input and Output Properties.
When we finish, we run our package; it runs successfully, without any truncation error, and inserts all the data into our target database.

Related

Can I use CSV file as Excel pivot data source?

I have some external csv/txt files and I'd like to use them for a pivot table.
However, after I selected my csv file as the external data source, at the end of the guided procedure (header, separators, etc.) Excel throws an error saying something along the lines of: it's impossible to use the selected type of connection for a pivot table.
Now, I know how to do it with another Excel/DB table - but here it would come in very handy to use a csv/txt file.
Can this be done natively, without external plugins?
Use the keyboard shortcut Alt, D, P (not all at once like Alt+D+P - press each one separately). This brings up the old-style pivot table wizard.
Select External Data Source
Click Get Data
Choose and click OK.
Name your data source and choose Microsoft Access Text Driver
Click Connect, uncheck Use Current Directory (unless that's what you want), and Select the Directory you want.
If you don't identify the file when you get back to the "Select a default table..." text box, you'll get prompted to select one.
At that point, click OK back through the dialog boxes. Eventually you'll be dropped into MS Query, where you can build the query you want. From there, choose Return Data to Excel and you can build your pivot table.
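If it helps, a query like the following could be built in MS Query against a csv file in the selected directory (the file name sales.csv and the columns Region, Product, and Amount are placeholders for your own data; MS Query generates the exact file-name quoting for you):

SELECT Region, Product, Amount
FROM [sales.csv]
WHERE Amount > 0

With the text driver, the chosen directory acts as the database and each csv/txt file in it acts as a table.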

Excel - How to connect to another Excel file?

I have a big spreadsheet (Excel file A) which will be updated every month. I also created a parametric search in another Excel file (file B) which can pull data from file A. Therefore, once I send my parametric search file B to my colleagues, they can always pull the fresh data without updating file B (I would only need to update file A monthly to keep the data fresh).
I tried to connect the data by using Microsoft Query/web data. However, I noticed that if I use web data, the source link changes every time I update file A. Therefore, the file B connection won't work.
(I uploaded file A to JIRA as an attachment. I tried to upload it to SharePoint, but Excel does not recognize an Excel file on SharePoint as an Excel file; it recognizes it as an HTML file. Therefore, I gave up on using SharePoint.)
Is there a better way to achieve what I have described above?
Thanks,
Jennifer.
Since you are using SharePoint, choose From File > From SharePoint folder and input the root URL (e.g. https://companyname.sharepoint.com/sites/workspacename/).
Once you've logged in, you should get a dialog box listing the files in that location.
Click on Edit to open the query editor.
You likely only want one particular file in there, so click on Binary in the row that corresponds to the File A that you should have already uploaded to that space. This will import the Excel file.
Click to expand the Table in the row that corresponds to the table that you want to import. This should be the table you keep up to date that gets loaded in.

How to transfer data from Excel spreadsheet to flat file without column headers?

I've got a simple data flow in SSIS with an Excel source and a flat file destination. The headers in Excel are on the first row, and in SSIS I've ticked 'first row has headers' in the Excel connection manager.
In my flat file the data is being loaded, and all the data looks correct except that the headers from Excel are missing.
When I set up my flat file connection manager (FFCM) it was using the comma-delimited setting for the columns.
I checked the columns in the FFCM and all the columns were there.
After a few runs I noticed that I had not ticked 'Column names in the first data row' in the flat file connection manager. Now that I have done this, I get an error:
TITLE: Package Validation Error
ADDITIONAL INFORMATION:
Error at Data Flow Task [DTS.Pipeline]: "component "Flat File Destination" (487)" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
Error at Data Flow Task [DTS.Pipeline]: One or more component failed validation.
Error at Data Flow Task: There were errors during task validation.
(Microsoft.DataTransformationServices.VsIntegration)
So I unticked that again, but it made no difference.
I checked the columns in the FFCM and they are now set to column0, column1, column2, etc.
Also, when I run it, it puts out a number of lines of commas related to the rows in the Excel sheet:
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
I seem to be getting into a bit of a pickle and need some advice about what the problem may be.
It seems that you have lost the field mappings between your Excel Source and Flat File Destination since you last configured them.
Unchecking and checking the box Column names in the first data row on the Flat File Connection Manager has renamed the actual column names of the flat file destination. These new columns now need to be re-mapped on the Flat File Destination component.
If you notice the warning sign on your Flat File Destination, double-click it. If the field mappings have been lost, you will receive a warning message along the lines of "Map the column on the Mappings page".
Click the Mappings page on the Flat File Destination to configure the field mappings between source and destination.
I believe that this is the issue you are facing.

Delete some columns, re-arrange remaining columns and move processed files for multiple .csv files using SSIS 2008 R2

I Googled for some tips on how to crack this, but did not get any helpful hits.
Now, I wonder if I can achieve the same in SSIS or not.
There are multiple .csv files in a folder. What I am trying to achieve is to:
open each .csv file (I would use a parameter, as the filenames change)
delete some columns
re-arrange the remaining columns in a particular order
save the .csv file (without the Excel confirmation message box)
close the .csv file
move the processed file to another folder
and re-start the entire above process until all the .csv files in the folder are processed.
Initially I thought I could use a For Each Loop Container and an Execute Process Task to achieve this. However, I was not able to find any resource on how to achieve the above desired objective.
Example:
Header of every Source .csv file:
CODE | NAME | Value 1 | Value 2 | Value 3 | DATE | QTY | PRICE | VALUE_ADD | ZONE
I need to delete columns: NAME | VALUE_ADD | ZONE from each file and re-arrange the columns in the below order.
Desired column order:
CODE | DATE | Value 1 | Value 2 | Value 3 | PRICE | QTY
I know this is possible within SSIS. But am not able to figure it out. Thanks for your help in advance.
Easily done using the following four steps:
Use a "Flat file Connection" to open your CSV.
Use a "Flat file Source" component to read your CSV.
Use a "Derived column" component to rearrange your columns.
Use a "Flat file Destination" component to save your CSV.
Et voilà!
After a lot of experimenting, I managed to get the desired result. In the end, it seemed so simple.
My main motive for creating this package was that I had a lot of .csv files that needed the laborious task of opening each file and running a macro that eliminated a couple of columns and rearranged the remaining columns in the desired format. Then I had to manually save each of the files after clicking through the Excel confirmation boxes. That was becoming too much. I wanted just a one-click approach.
Here is a detailed account of what I did. I hope it helps people who are trying to get data from multiple .csv files as a source, keep only the desired columns in the order they need, and finally save the desired output as .csv files to a new destination.
In brief, all I had to use was:
a For Each Loop Container
a Data Flow Task within it.
And within the Data Flow Task:
a Flat File Source
a Flat File Destination
2 Flat File Connection Managers - One each for Source and Destination.
Also, I had to use 3 variables - all of String data type with project scope - which I named CurrFileName, DestFilePath, and FolderPath.
Detailed Steps:
Set default values to the variables:
CurrFileName: Just provide the name of one of the .csv files (test.csv) for temporary purposes.
FolderPath: Provide the path where your source .csv files are located (C:\SSIS\Data\Input)
DestFilePath: Provide the Destination path where you want to save the processed files (C:\SSIS\Data\Input\Output)
Step 1: Drag a For Each Loop Container to the Control Flow area.
Step 2: In collection, select the enumerator as 'Foreach File Enumerator'.
Step 3: Under Enumerator Configuration, under Folder: provide the folder path where the .csv files are located (In my case, C:\SSIS\Data\Input) and in Files:, provide the extension (in our case: *.csv)
Step 4: Under Retrieve file name, select 'Name and extension' radio button.
Step 5: Then go to the Variable Mappings section and select the variable (in my case: User::CurrFileName).
Step 6: Create the source connection (let's call it SrcConnection) - right-click in the Connection Managers area, select New Flat File Connection, and select one of the .csv files (for temporary purposes). Go to the Advanced tab and provide the correct desired data type for the columns you wish to keep. Click OK to exit.
Step 7: Then go to the Properties of this newly created source Flat File Connection and click the small box adjacent to the Expressions field to open the Property Expressions Editor. Under Property, select ConnectionString, and in the Expression box enter: @[User::FolderPath] + "\\" + @[User::CurrFileName] and click OK to exit. (In SSIS expressions, variables are referenced with @ and a backslash inside a string literal must be escaped as \\.)
Step 8: In Windows Explorer, create a new folder inside your Source folder (in our case: C:\SSIS\Data\Input\Output)
Step 9: Create the destination connection (let's call it DestConnection) - right-click in the Connection Managers area, select New Flat File Connection, and select one of the .csv files (for temporary purposes). Go to the Advanced tab and provide the correct desired data type for the columns you wish to keep. Click OK to exit.
Step 10: Then go to the Properties of this newly created destination Flat File Connection and click the small box adjacent to the Expressions field to open the Property Expressions Editor. Under Property, select ConnectionString, and in the Expression box enter: @[User::DestFilePath] + "\\" + @[User::CurrFileName] and click OK to exit.
Step 11: Drag the Data Flow Task to the Foreach Loop Container.
Step 12: In the Data Flow Task, drag a Flat File Source and in the Flat file connection manager: select the source connection (in this case: SrcConnection). In Columns, de-select all the columns and select only the columns that you require (in the order that you require) and click OK to exit.
Step 13: Drag a Flat File Destination to the Data Flow Task and in the Flat File Connection manager: select the destination connection (in this case: DestConnection). Then, go to the Mappings section and verify if the mappings are as per desired output. Click OK to exit.
Step 14: That's it. Execute the package. It should execute without any trouble.
Hope this helped :-)
It isn't clear why you want to use SSIS to do this: your task seems to be to manipulate text files outside the database, and it's usually much easier to do that in a small script or program written in a language with good CSV parsing support (Perl, Python, PowerShell, whatever). If this needs to be part of a larger package, you can simply call the script using an Execute Process Task. SSIS is a great tool, but I find it quite awkward for a task like this.

SSIS excel Destination, how to force LongText?

I'm using SSIS to perform data migration.
I'm using an Excel destination file to output everything that's going wrong.
In this Excel file, I want to output the two error columns (error number and error column) and also all columns from my input component.
This is nearly working, except when I have string columns containing more than 255 characters. When I set up my Excel destination, I create a new table.
The CREATE TABLE statement defines LongText properly as the data type:
CREATE TABLE `My data` (
`ErrorCode` Long,
`ErrorColumn` Long,
`ID` Long,
`MyStringColumn` LongText
)
This works the first time. Then I remove all data from the Excel file, because I want to clean up the Excel file before outputting errors.
When I return to the package designer, my column definitions are messed up. Every text column is handled as nvarchar(255), and no longer as ntext. That breaks my component, as my data exceeds 255 characters.
How can I properly manage Excel destinations?
thx
[Edit] As I'm not sure of my interpretation, here are the error messages when I run the task:
Error: 0xC0202009 at MyDataTask, To Errors file [294]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21.
Error: 0xC0202025 at MyDataTask, To Errors file [294]: Cannot create an OLE DB accessor. Verify that the column metadata is valid.
Error: 0xC004701A at MyDataTask, SSIS.Pipeline: component "To Errors file" (294) failed the pre-execute phase and returned error code 0xC0202025.
In SSIS packages that involve an Excel Destination, I have used an Excel template file strategy to overcome the error that you are encountering.
Here is an example that first shows how to simulate your error message and then shows how to fix it. The example uses SSIS 2008 R2 with Excel 97-2003.
Simulation
Created a simple table with two fields, Id and Description, and populated the table with a couple of records.
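A rough sketch of that setup (only the two column names come from this answer; the table name, data types, and sample values are assumptions):

CREATE TABLE dbo.ErrorSimulation (
Id int,
Description nvarchar(100)
);

INSERT INTO dbo.ErrorSimulation (Id, Description) VALUES
(1, 'This description is well over twenty characters long'),
(2, 'And so is this one, so it will also be redirected');

Both sample values exceed the 20-character conversion length used below, so both rows end up on the error output.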
Created an SSIS package with a single Data Flow Task. The Data Flow Task basically reads the data from the above-mentioned SQL Server table and then tries to convert the Description column to Unicode text with the character length set to 20.
Since the table has two rows whose Description column values exceed 20 characters in length, the default error configuration on the Data Conversion transformation would fail the package. However, we need to redirect all the error rows, so the error configuration on the Data Conversion transformation has to be changed to redirect rows on truncation instead of failing the component.
Then, I redirected the error output to an Excel Destination configured to save the output to the file C:\temp\Errors.xls. The first execution of the package succeeds because the Excel file is empty to begin with.
The file will contain both rows from the table, because both encounter the truncation error and are hence redirected to the error output.
After deleting the contents of the Excel file without changing the column header, executing the package again will fail.
The cause of the failure is the set of error messages quoted in the question above.
That completes the simulation of the error mentioned in the question. And here is one possible way that the issue could be fixed.
Possible Solution
Delete the existing Excel Destination to which the error output is redirected. Create a new Excel Connection Manager with the path C:\temp\Template.xls. Place a new Excel Destination, point it to the new Excel Connection Manager, and create the sheet within the new Excel file using the New button on the Excel Destination.
Create two package variables named TemplatePath and ActualPath. TemplatePath should have the value C:\temp\Template.xls and ActualPath should have the value C:\temp\Errors.xls. The actual path is the path where you would like the file to be created.
Right-click the Excel Connection Manager, set the DelayValidation property to True, and set the ServerName expression to the variable @[User::ActualPath]. DelayValidation will make sure that the package doesn't throw errors at design time if the file C:\temp\Errors.xls doesn't exist. Setting the ServerName expression will ensure that the package uses the file path in the variable ActualPath to generate the file.
On the Control Flow tab, place a File System Task above the Data Flow task.
Configure the File System Task to copy the template file C:\temp\Template.xls to a new destination file C:\temp\Errors.xls every time the package runs. If the file C:\temp\Errors.xls already exists, the File System Task will simply overwrite it, since its OverwriteDestination property is set to True.
Now you can run the package any number of times. The package will not fail, and you will have only the error messages from the last execution, without having to manually clear the Excel file content.
Hope that helps.
[Edit] Added by Steve B. to provide a bit more detail directly in the post, because it's too long for a comment:
In my solution, I have two Excel files in my SSIS project: Errors_Design_Template.xls and Errors_Template.xls. The former contains my sheets with the headers and one line of data (using formulas like =REPT("A",1024) for input columns with a 1024-character maximum length); the latter is exactly the same without the first line of data.
Both files are copied at the start of the package from my source directory to a temp directory. I use two files because I want to keep design-time validation, and I point the Excel connection at the copy of the template file. I also duplicate the template file because I often execute a single Data Flow Task of my package, and I want to populate a temp file, not the template file in my project (which has to remain empty except for the headers and the first dummy line of data).
I also used two variables: one for the Excel connection expression and one for the actual output file. I also had to write a script that takes my two variables as input; ActualFilePath is read/write. The script copies the value of ActualFilePath to the ErrorFilePath variable at run-time. (I don't have the source code right now, but I can paste it next week if it helps.)
Using these components together allows me to have the Excel connection pointing to the design file while designing, and pointing to the actual error file at run-time, without having to set DelayValidation to True.
It's better to use an Execute SQL Task in the control flow. In the Execute SQL Task, specify the connection to the Excel connection manager. In the SQL statement, drop the Excel table which was created during sheet creation in the Excel destination; after the drop, create the same table. The next run will then insert the data into a fresh, empty Excel table.
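A sketch of those statements, reusing the `My data` table from the question (Jet SQL run against the Excel connection manager; you may need one Execute SQL Task per statement, since the provider typically executes a single statement at a time):

DROP TABLE `My data`

CREATE TABLE `My data` (
`ErrorCode` Long,
`ErrorColumn` Long,
`ID` Long,
`MyStringColumn` LongText
)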
