SSIS : Read Excel File too lines - excel

I work on SSIS project (Visual studio 2012)
A browser send me an excel file with articles.
Informations begin in A2, until column G.
Then I use
SELECT * FROM [ProduitsFamilles$A2:G]
My problem is, I saw in this file last line is 16143 (under empty cell)
And when I run it, It gives 16382 lines ... then 200 lines empty crush import in database cause primary key can't be empty.
I think it's because before send this file, browser delete old unless row.
Using "conditional Split" give good responce but I Want know If I can break directly empty row, like using where clause...

SSIS will handle excel files and stop at the empty row automatically if you choose a table instead of a sql statement. I believe you could also select specific columns in the Select clause of an sql statement instead of defining your range in the From clause. SSIS generally assumes there is a header row, though you can also specify this.
There is another possible issue, which is that cells can be active and empty in Excel instead of inactive. To test this, press ctrl and the down arrow at the top of a full column in your sheet. It should stop at the last cell with data in it (the 16143th row in your case); if it instead goes down to the 16382th row, you'll know you have a bunch of empty but active rows that need to be taken care of before importing.
In general, it's a lot easier to use .csv files with SSIS than Excel files, which tend to have these types of formatting issues.

Related

Is there a way to match values in Excel when an individual cell has multiple values?

Above is a picture of my Excel sheet. I have 2 columns of data that have multiple data points in them (separated by commas). This is how my data is spit out after running an online psychology experiment. I'm hesitant to split text to columns because some lines only have 3 values and other lines have 20+. Essentially, I need to match values in one column to values in the second column. For example, the first value in column G needs to match with the first value in column H. The second value needs to match with the second value, etc. I don't need to match up every value in both columns, however. I only need a (defined) subset of values.
I'm not sure if this is possible to do in Excel (or any Excel add-on) without separating the values into separate columns, but any help is appreciated!
I've seen this before in survey data - the output uses "packed data" where each cell contains many values. You will need Excel 2010+ for Windows (or Excel 365) for this solution. Otherwise, there a solution that is also Mac compatible that does not involve VBA, but it takes time to construct. This approach should take you 10 mins to do - a lot of steps, but it is just clicking.
Let's say that these are your data in two columns in a table.
Click anywhere inside the table. Open the Data tab and click on From Table/Range:
This will convert your data into an Excel Table and ask you if your table has headers - yes it does. Click OK.
This will open the Power Query (PQ) editor (congratulations, you are now a step closer to data scientist, so take a selfy with this screen in the back and share on social media).
You will see in the Applied Steps on the right hand side that PQ has helpfully detected the data type in a step called Changed Type. You need to undo that because it will likely think that your comma separated numbers are just one giant number. So click the X on the left side of that step.
On the right side, you can expand out Queries as shown above. Right click on your table and select Duplicate.
NB: This is not the most efficient way to do this, but I think this is something you just want do one time and you probably don't want to go hacking through the Advanced Editor.
So now you have two tables:
Rename Table1 (2) to Output in the box on the right hand side just to create some clarity.
Right Click on the Response RT column in Output and Remove it. Click on Table1 and do the same thing to the Response column. So now you have Table1 with only the Response RT and Output with only the Responses. Now we will parse these into rows of cleaned data.
Parse Table1
First, in Table1, click on the Response RT column and in the Home tab you will see Split Column. 1) Click on that and choose By Delimiter.
2) It will default to Comma, but you need to click on Advanced options and choose the Rows radio button.
Click OK and it should turn your data into rows of separated numbers and change to the type (this time helpfully) to decimal.
Now you need to add an index. 3) Go to the Add Column tab and click on Add Index, starting from 1.
Parse Ouput Table
Now go back to Output and repeat steps 1), 2) and 3) for it as well. Then you will have to take an extra step to clean up your text column. Right-Click on the Response column and choose Transform > Trim on the data.
That will get rid of those spurious spaces.
Merge Them Back Together
While you still have the Output table selected, go to the Home tab and choose Merge Queries.
It will bring up this window:
Choose Table1 from the bottom dropdown. Click on Index on both tables and click OK. You will get something like this:
Click on the button on the top right of the Table1 column and then unselect Index and Use original column name as prefix.
Click OK. Right click the Index column and Remove it. You now have your answer, but you still need to bring it back to Excel.
Putting it back in Excel
Click on Close and Load to on the left hand of the Home tab. To keep things simple, just click OK.
It will put both Output and Table1 as worksheets into your workbook, (this is where I said it is not the most efficient approach - you can always delete the Table1 worksheet. Excel will complain when you do, but you can ignore it.) Output is your answer.
Congratulations, you just did an ETL (extract transform and load) operation in data analytics. Do another selfy with the answer and share on social media.

Excel format lost while populating thru SSIS

I have an formatted excel destination with one of the columns being a percentage and another being currency. I'm loading this excel with data from a sql table using SSIS. However the excel is not formatted after the load. What is happening?
I was able export two columns of information (percent and dollar value) to an Excel file (97-2003 format) from an SSIS package created in Visual Studio 2010. The data was sourced from a table on a server using SQL Server 2012.
Steps:
Created an Excel file containing a column called Percent and another called Cost. Both were formatted accordingly and are located in the first row.
Saved the file in a 97-2003 format and then closed it.
Created a table in a database and populated it with a couple of records.
create table test_export (
mypercent numeric(18,2),
mydollar numeric(18,2))
insert into test_export values (2.1, 50.00)
insert into test_export values (4.5, 120.00)
Opened Visual Studio 2010 and created a new SSIS package. Created an OLEDB connection to the database.
Under the Control Flow tab added a Data Flow Task and double-clicked on it after it was added. This should now highlight the Data Flow tab.
Add an OLE DB Source and point to the newly added database connection, under Data access mode select Table or View and then select the table created in step #3. Select the OK button.
Add an Excel Destination making certain to create a new Excel connection manager for it and point to the Sheet1$. Make certain to indicate that the first row contains headers. Select the OK button and connect the OLE DB Source task to the Excel Destination task below. Since only numbers are involved, there is no need to apply a Data Conversion task between the two. But in the Excel Destination task remember to select Mappings and connect from left to right which source columns match up with which destination columns.
In the Solution Explorer pane, right-click on the solution and select Properties from the drop-down menu. Open Configuration Properties on the left and the select Debugging under it. On the right under Debug Options, look for Run64BitRuntime and change it from True to False if you have not already done so. Select the OK button.
Run the SSIS package.
Stop the SSIS package when all of the tasks show green for completed.
Open the Excel file and it will now contain the values imported from the database table with the Excel percent and dollar formats preserved from step #1.
I ran these steps in a test and everything worked perfectly. If your formats are not being preserved, then you are likely leaving out a step.
Hope this helps and please indicate if it answered your question.
I had the same problem. What I did was:
apply conditional formatting on the column.
see attached picture
It is also important that you have in the first row, a model row, filled with numeric/text values, in order for the ssis to export the numbers as numbers and not as strings model row
If your template contains several rows before the place where ssis start to export the data, you might need to have more model rows. (I needed 4). This rows have to be the first rows in the document, regardless where you start the export.
You can hide the model rows

Skipping rows when importing Excel into SQL using SSIS 2008

I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually row 3. I can use an Excel Sourch to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
have a look:
the links have more details, but I've included some text from the pages (just in case the links go dead)
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
Question Sign in to vote 1 Sign in to vote rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this: SELECT F4,
F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4
<> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't containing any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them and it is up to them to meet the guidlines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts) and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data and no repeating the headers multiple times and most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that then you probably don't have something that will work for automated import as you will get the file in a differnt format everytime depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (only works some of the time, but if they have the data in a database, they can usually accomodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they willl be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled, a file with the wrong format will fail and be returned to them to fix if not.
If that doesn't work, may I suggest that you open the file, delete the first two rows and save a text file in a data flow. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel and anything you can do to get the file in a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct.
SSIS forces you to use the format and quite often it does not work correctly with excel
If you can't change he format consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.

How to export SSIS to Microsoft Excel without additional software?

This question is long winded because I have been updating the question over a very long time trying to get SSIS to properly export Excel data. I managed to solve this issue, although not correctly. Aside from someone providing a correct answer, the solution listed in this question is not terrible.
The only answer I found was to create a single row named range wide enough for my columns. In the named range put sample data and hide it. SSIS appends the data and reads metadata from the single row (that is close enough for it to drop stuff in it). The data takes the format of the hidden single row. This allows headers, etc.
WOW what a pain in the butt. It will take over 450 days of exports to recover the time lost. However, I still love SSIS and will continue to use it because it is still way better than Filemaker LOL. My next attempt will be doing the same thing in the report server.
Original question notes:
If you are in Sql Server Integrations Services designer and want to export data to an Excel file starting on something other than the first line, lets say the forth line, how do you specify this?
I tried going in to the Excel Destination of the Data Flow, changed the AccessMode to OpenRowSet from Variable, then set the variable to "YPlatters$A4:I20000" This fails saying it cannot find the sheet. The sheet is called YPlatters.
I thought you could specify (Sheet$)(Starting Cell):(Ending Cell)?
Update
Apparently in Excel you can select a set of cells and name them with the name box. This allows you to select the name instead of the sheet without the $ dollar sign. Oddly enough, whatever the range you specify, it appends the data to the next row after the range. Oddly, as you add data, it increases the named selection's row count.
Another odd thing is the data takes the format of the last line of the range specified. My header rows are bold. If I specify a range that ends with the header row, the data appends to the row below, and makes all the entries bold. if you specify one row lower, it puts a blank line between the header row and the data, but the data is not bold.
Another update
No matter what I try, SSIS samples the "first row" of the file and sets the metadata according to what it finds. However, if you have sample data that has a value of zero but is formatted as the first row, it treats that column as text and inserts numeric values with a single quote in front ('123.34). I also tried headers that do not reflect the data types of the columns. I tried changing the metadata of the Excel destination, but it always changes it back when I run the project, then fails saying it will truncate data. If I tell it to ignore errors, it imports everything except that column.
Several days of several hours a piece later...
Another update
I tried every combination. A mostly working example is to create the named range starting with the column headers. Format your column headers as you want the data to look as the data takes on this format. In my example, these exist from A4 to E4, which is my defined range. SSIS appends to the row after the defined range, so defining A4 to E68 appends the rows starting at A69. You define the Connection as having the first row contains the field names. It takes on the metadata of the header row, oddly, not the second row, and it guesses at the data type, not the formatted data type of the column, i.e., headers are text, so all my metadata is text. If your headers are bold, so is all of your data.
I even tried making a sample data row without success... I don't think anyone actually uses Excel with the default MS SSIS export.
If you could define the "insert range" (A5 to E5) with no header row and format those columns (currency, not bold, etc.) without it skipping a row in Excel, this would be very helpful. From what I gather, noone uses SSIS to export Excel without a third party connection manager.
Any ideas on how to set this up properly so that data is formatted correctly, i.e., the metadata read from Excel is proper to the real data, and formatting inherits from the first row of data, not the headers in Excel?
One last update (July 17, 2009)
I got this to work very well. One thing I added to Excel was the IMEX=1 in the Excel connection string: "Excel 8.0;HDR=Yes;IMEX=1". This forces Excel (I think) to look at all rows to see what kind of data is in it. Generally, this does not drop information, say for instance if you have a zip code then about 9 rows down you have a zip+4, Excel without this blanks that field entirely without error. With IMEX=1, it recognizes that Zip is actually a character field instead of numeric.
And of course, one more update (August 27, 2009)
The IMEX=1 will succeed importing data with missing contents in the first 8 rows, but it will fail exporting data where no data exists. So, have it on your import connection string, but not your export Excel connection string.
I have to say, after so much fiddling, it works pretty well.
P.S. If you are using a x64 bit version, make sure you call the DTExec from C:\Program Files\Microsoft SQL Server\90\DTS.x86\Binn. It will load the 32 bit Excel driver and work fine.
Would it be easier to create the Excel Workbook in a script task, then just pick it up later in the flow?
The engine part of SSIS is good but the integration with Excel is awful
"Using SSIS in conjunction with Excel is like having hot tar funnelled up your iHole in a road cone"
Dr. Zim, I believe you were the one that originally brought up this question. I totally feel your pain. I love SSIS overall, but I absolutely hate the limited tools that come standard for Excel. All I want to do is Bold the Heading or Row1 record in Excel, and not bold the following records. I have not found a great way to do that; granted I am approaching this with no script tasks or custom extensions, but you would think something this simple would be a standard option. Looks like I may be forced to research and program up something fancy for a task that should be so fundamental. I've already spent a rediculous amount of time on this myself. Does anyone know if you can use Excel XML with Excel versions: 2000/XP/2003? Thanks.
This is an old thread but what about using a flat file connection and writing the data out as a formatted html document. Set the mime type in the page header to "application/excel". When you send the document as an attachment and the recipient opens the attachment, it will open a browser session but should pop Excel up over the top of it with the data formatted according to the style (CSS) specified in the page.
Can you have SSIS write the data to an Excel sheet starting at A1, then create another sheet, formatted as you like, that refers to the other sheet at A1, but displays it as A4? That is, on the "pretty" sheet, A4 would refer to A1 on the SSIS sheet.
This would allow SSIS to do what it's good for (manipulate table-based data), but allow the Excel to be formatted or manipulated however you'd like.
When excel is the destination in SSIS, or the target export type in SSRS, you do not have much control over formatting and specifying how you want the final file to be. I have written a custom excel rendering engine for SSRS once, as my client was so strict about the format of final Excel report generated. I used 'Excel xml' to get the job done inside my custom renderer. May be you can use XML output and convert it to Excel XML using XSLT.
I understand you would rather not use a script component so perhaps you could create your own custom task using the code that a script contains so that others can use this in the future. Check here for an example.
If this seems feasible the solution I used was CarlosAg Excel Xml Writer Library. With this you can create code which is similar to using the Interop library but produces excel in xml format. This avoids using the Interop object which can sometimes lead to excel processes hanging around.
Instead of using a roundabout way to do this exercise of trying to write data to particular cell(s), format the cell(s), style them which is indeed a very tedius effort considering the support SSIS has for EXCEL, we could go the "template" way to do this.
assume we need to write data in the so & so cell with all the custom formating thats done on it. Have all the formatting in a sheet, say "SheetActual", Whereas the cells that will hold the data will actually have Lookups/ refrences/ Formulaes to refer to the original data that SSIS exports in a hidden sheet say "SheetMasterHidden" of the same Excel connection. This "SheetMasterHidden" will essentially hold the master data in default format that SSIS writes data to the excel. This way you need not worry about formatting the data runtime.
Formatting the Excel is a one time work "IF" the formatting dont change very often. If the format changes and the format is decided runtime this solution maynot go very well.
The answer is in the question. Over time, it became a progress status. However, there is SSRS that will create Excel files if you create TABLE presentations. It works pretty well too.

SSIS Excel Data Source - Is it possible to override column data types?

When an excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?
Ideally we would like every column delivered from the excel source to be string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.
Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.
According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:
If the top 8 records contain equal number of numeric and character types – then the priority is numeric
If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs
If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as
NULLs
The post outlines two things you can do to fix this:
First, add IMEX=1 to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
In the registry, change the value for HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Nod\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.
To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.
A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of
You have more control over what you convert to.
You can put in rules that control the change (i.e. if null give me an empty string, but if there is data then give me the data as a string)
Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column)
If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.
I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits), and reset it. Once I fixed that by putting a single-quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.
In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.
Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.
I was looking for a solution for the similar issue, but didn't find anything on the internet. Although most of the found solutions work at design time, they don't work when you want to automate your SSIS package.
I resolved the issue and made it work by changing the properties of "Excel Source". By default the AccessMode property is set to OpenRowSet. If you change it to SQL Command, you can write your own SQL to convert any column as you wish.
For me SSIS was treating the NDCCode column as float, but I needed it as a string and so I used following SQL:
Select [Site], Cstr([NDCCode]) as NDCCode From [Sheet1$]
Excel source is SSIS behaves crazy. SSIS determines the type of data in a particualr column by reading first 10 rows.. hence the issue. If you have a text column with null values in first 10 roes, SSIS takes the data type as Int. With a bit of struggle, here is a workaround
Insert a dummy row (preferrably first row) in the worksheet. I prefer doing this thru a Script task, you may consider using some service to preprocess the file before SSIS connects to it
With the duummy row, you are sure that the datatypes will be set as you need
Read the data using Excel source and filter out the dummy row before you take it for further processing.
I know it is a bit shabby, but it works :)
I could fix this issue. while creating the SSIS package, I manually changed the specific column to text (Open the excel file select the column, right click on column, select format cells, in number tab select Text and save the excel).
Now create the SSIS package and test it. It works. Now try to use the excel file where this column was not set as text.
It worked for me and I could execute the package successfully.
This should be resolved simply, just untick the box "Frist row as column names" and all data will be collected as text data type. Only downside of this choice is that you have to manage the columns names from the auto names (column 1, 2 etc) and handle the first row which contains the column names.
I had trouble implementing the solution here - I could follow the instructions, but it only gave new errors.
I solved my conversion issues by using a Data Conversion entity. This can be found on the SSIS Toolbox under Data Flow Transformations. I placed the Data Conversion between my Excel Source and OLE DB Destination, linked Excel to Data C, Data C to OLE DB, double clicked Data C to bring up a list of the data columns. Gave the problem column a new Alias, and changed the Data Type column.
Lastly, in the Mappings of the OLE DB Destination, use the Alias column name, rather than the original Excel column name. Job done.
You can use a Data Conversion component to convert to the desired data types.

Resources