Our database needs to be filled with the zip code for every state in our country, we are provided with a catalog of zip codes in a xls file, we have to import this file to a table in a database hosted in Windows Azure.
I don't know if Stack Overflow allows me to post a link to our xls, but I'll describe the structure of the file:
Every sheet holds the zip code information for a whole state, inside every sheet we have fifteen columns with information such as zip code, type of terrain, type of area, locality, state, city, etc. Every sheet has the same columns and the information inside the cells may contain special characters (i.e. á, é, ó, ú, etc.) normal to Spanish language and this special characters need to be preserved. Also some cell may be empty or not and blank spaces are likely to appear in the contents of the cells (i.e. Villa de Montenegro).
We are looking for a way to import every sheet into our table without losing special characters or skipping empty cells. We have no prior experience doing this kind of task and wanted to know what is the best way to import it.
We tried a suggestion of importing the xls to CSV files and then importing those CSV to our database, but we tried some of the variations of the macro recommended here but the CSV are generated with many errors (Macros aren't our forte).
In short, what is the best way to import our xls to an Azure database table without losing empty cells, special characters nor failing when blank spaces are inside a cell?
I recently had to migrate some data in a similar way. I used the SQL Server 2014 Import and Export Data Wizard. I initially tried with a .csv, but it was finicky about quoted commas and such. When I saved it as a .xlsx file, I was able to upload it without a problem. It's pretty straight forward to use, just select your xls file as the source, configure the connection to your Azure database, next-next-next, and hopefully you get the happy path. I wrote about it on my blog, step by step with screenshots.
We found an easy, although slow, way to copy the contents from an xls using Visual Studio, the version we used was 2012 but it works with 2008 and 2013 too.
Open the Server Explorer.
Add a new connection, the url for the database is required, the credentials are the same as the ones you use to access the database on Azure. Test the connection if you like, if the credentials are correct then you're good to go.
After the connection has been made, expand the Tables section and select the table you wish to dump your data.
Right click and select view table data.
If the table is empty or it has already some data, the workflow is the same. The last record will be empty, select it.
Go to your xls file, for this to work, the number and order of the columns must be the same as the table you will be dumping the data. Select the rows desired, copy them.
Return to Visual Studio, while the last empty row is selected paste the data. The data will start to copy directly into your Azure database.
Depending on your internet connection and the amount of data you're coping, this might take a long time.
This is an easy solution, although not optimal. This works if you don't own SQL Server with all of its tools. Still gotta check if this works on the express edition, will update when I test.
Related
Hello and thanks for any help. I'm a noob with Access but my company has asked a question about possibly importing some information for certain records in Access. Basically, I work for an insurance company and they have a claims system which was built in Access about a decade ago by someone who has left the company recently. The system works and is fine for current needs. However, we were recently asked to amend a field for certain records (claims). Because there are about 200-300 records, we are looking at a possible import solution.
The problem is, I have never done this before and am worried that it might affect other records or other fields in the records. The only thing that needs to change is one field and the rest must remain unchanged.
I know the Access table name & record numbers (RecordNo - textbox in Access) and the name of the field I want to change (Reference - textbox in Access) but am unsure how this can be imported, how the excel file needs to be prepared and how to make sure no new records are added but instead existing records are amended.
For instance, can I just have 2 columns, one called "RecordNo" and the other "Reference"? Or do I need to add blank columns to account for the extra columns in the Access table? Do I need to create a named range or an excel table or simply put the columns in Excel? Is there any specific formatting that I should be using (Text or General or something else - the Reference will be a text value as it has both numbers and letters)? When importing in Access using the import wizard, do I need to choose "Append a copy of the records to the table"? How will it know which record to amend as the Access table will contain thousands of records that I don't want changed in any way?
I also have access to the "Navigation Pane" in Access where I can find tables and queries etc and not sure if the records in question can be bulk amended on the table instead?
To make matters more complicated, the Access database is on the server and needs to be accessed by multiple users at the same time so I would ideally like to test this out on a separate copy. But copying it to my own computer does not sever the connection with the copy on the server and any changes are reflected in the original copy immediately.
I tried looking online but I can't seem to find anything that will quash my worries. I can find a few articles talking about importing issues, though not what I would be interested in, but they are all for previous versions of Access and really I can barely understand the current version. We are using Excel and Access 2013.
If possible, I would rather not use VBA as the current database has a lot of it anyway and its's difficult to manage and navigate. I also have no idea about Access VBA, just excel.
Thank you
To make matters more complicated, the Access database is on the server
and needs to be accessed by multiple users at the same time so I would
ideally like to test this out on a separate copy. But copying it to my
own computer does not sever the connection with the copy on the server
and any changes are reflected in the original copy immediately
That is because you are only copying the front end. There is somewhere a separate access file that is the back end. Or it could be a different database system like MS SQL Server or MySQL. So your first task it to find where the actual data is.
Beyond that, under no circumstances import an excel file directly into an existing table. Create an excel file with the necessary fields (record identifier and new value) and import it as a new table, then create an update query to effect the changes you need.
I built a data entry UserForm to populate a worksheet that will serve as the raw database. The raw data requires further manipulation and analysis in order to be reported, so I set up a database connection using Get External Data>From Microsoft Query>Excel Files, pointed it to the file I was already working in, selected the fields I wanted and performed basic functions on those I wanted aggregated. This creates an Excel table where I then use formulas that to complete the analysis. It works great for me; I can add entries to the database, Refresh the summary table, the new entries are added and the formulas populate automatically.
The problem is that no one else can refresh the table because it's looking locally for the file. The connection string is:
DSN=Excel Files;DBQ=C:\Users\MyName\Desktop\Folder 1\Results.xlsm;DefaultDir=C:\Users\MyName\Desktop\Folder 1;DriverId=1046;MaxBufferSize=2048;PageTimeout=5;
I have a very basic understanding of the database connections, but I need this file to be as automated as possible by request of my colleague. Can I fix the connection string so that the file is "flexible" and can be refreshed on any computer? Is this the best solution? If not, what else can I do that does not involve downloading additional plugins or 3rd party add-ins?
If what you need is a file containing the raw data (a Database) AND one or more excel files connected to it that pick up the data from the database and work with this data, you need to split the two things. You can do the database with an access file located on a shared directory with an appropriate table and you can reproduce the user form in this file so the insertion of the data will be made in this file. Then you connect one or more excel files (using connection Mode = Share Deny None, so you can update the data and at the same time work with them from the excel files), the data will be imported in the files in tables and here you do all the proessing you need.
If one file is enough for you (you don't need to have a database with the row data separated and you don't need to use the file from different location simultaneously) and all the problem is that if the file is opened from a different location from the one specifyed in the connection string it does not work...well in this case (that seems the case) i don't know why to use a connection to the same file.
If what you need is a table for work with, just create it selecting the range with the data you already have inserted (Create a table - quick start guide) and then when you add data through the form instead of adding them in a "normal" row, add them to a new row of the table with something like WorkSheets("name").ListObjects("table_name").ListRows.Add and add the data in the new table row.
Like many, I have spreadsheet that draws data from over 40 text files as data sources. The text files are from another app, and need to be periodically updated into Excel.
The set of data source files and spreadsheet need to be able to be duplicated and run on different systems. This is where the astonishing inability of Excel to support data import from the spreadsheet folder (or relative paths at all) becomes a big problem. This question mentions the issue but has no solution.
I developed a crude workaround for this (IMHO) fundamental flaw in Excel. Map your spreadsheet folder to a drive letter with SUBST. Then import the data from the SUBST drive letter. That drive letter and path will become part of the spreadsheet, buried deep in dialogs, and very inconvenient to update. So instead, whenever you copy or move the spreadsheet, re-create the SUBST to the current folder. Ugly, but effective.
New Question: Using this technique, when I open the spreadsheet and click Refresh to refresh from the data sources, I have to click "Import" on over 40 dialogs - one for each file. How can I automate that process?
I discovered that under a data range properties, there is a setting for "Prompt for file name on refresh". By unchecking that, it is no longer necessary to click import for every linked file. The properties for each linked data source must be adjusted individually. There doesn't seem to be any ability to multi-select data sources.
I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually row 3. I can use an Excel Sourch to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
have a look:
the links have more details, but I've included some text from the pages (just in case the links go dead)
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
Question Sign in to vote 1 Sign in to vote rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this: SELECT F4,
F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4
<> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't containing any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them and it is up to them to meet the guidlines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts) and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data and no repeating the headers multiple times and most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that then you probably don't have something that will work for automated import as you will get the file in a differnt format everytime depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (only works some of the time, but if they have the data in a database, they can usually accomodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they willl be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled, a file with the wrong format will fail and be returned to them to fix if not.
If that doesn't work, may I suggest that you open the file, delete the first two rows and save a text file in a data flow. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel and anything you can do to get the file in a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct.
SSIS forces you to use the format and quite often it does not work correctly with excel
If you can't change he format consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
This question is long winded because I have been updating the question over a very long time trying to get SSIS to properly export Excel data. I managed to solve this issue, although not correctly. Aside from someone providing a correct answer, the solution listed in this question is not terrible.
The only answer I found was to create a single row named range wide enough for my columns. In the named range put sample data and hide it. SSIS appends the data and reads metadata from the single row (that is close enough for it to drop stuff in it). The data takes the format of the hidden single row. This allows headers, etc.
WOW what a pain in the butt. It will take over 450 days of exports to recover the time lost. However, I still love SSIS and will continue to use it because it is still way better than Filemaker LOL. My next attempt will be doing the same thing in the report server.
Original question notes:
If you are in Sql Server Integrations Services designer and want to export data to an Excel file starting on something other than the first line, lets say the forth line, how do you specify this?
I tried going in to the Excel Destination of the Data Flow, changed the AccessMode to OpenRowSet from Variable, then set the variable to "YPlatters$A4:I20000" This fails saying it cannot find the sheet. The sheet is called YPlatters.
I thought you could specify (Sheet$)(Starting Cell):(Ending Cell)?
Update
Apparently in Excel you can select a set of cells and name them with the name box. This allows you to select the name instead of the sheet without the $ dollar sign. Oddly enough, whatever the range you specify, it appends the data to the next row after the range. Oddly, as you add data, it increases the named selection's row count.
Another odd thing is the data takes the format of the last line of the range specified. My header rows are bold. If I specify a range that ends with the header row, the data appends to the row below, and makes all the entries bold. if you specify one row lower, it puts a blank line between the header row and the data, but the data is not bold.
Another update
No matter what I try, SSIS samples the "first row" of the file and sets the metadata according to what it finds. However, if you have sample data that has a value of zero but is formatted as the first row, it treats that column as text and inserts numeric values with a single quote in front ('123.34). I also tried headers that do not reflect the data types of the columns. I tried changing the metadata of the Excel destination, but it always changes it back when I run the project, then fails saying it will truncate data. If I tell it to ignore errors, it imports everything except that column.
Several days of several hours a piece later...
Another update
I tried every combination. A mostly working example is to create the named range starting with the column headers. Format your column headers as you want the data to look as the data takes on this format. In my example, these exist from A4 to E4, which is my defined range. SSIS appends to the row after the defined range, so defining A4 to E68 appends the rows starting at A69. You define the Connection as having the first row contains the field names. It takes on the metadata of the header row, oddly, not the second row, and it guesses at the data type, not the formatted data type of the column, i.e., headers are text, so all my metadata is text. If your headers are bold, so is all of your data.
I even tried making a sample data row without success... I don't think anyone actually uses Excel with the default MS SSIS export.
If you could define the "insert range" (A5 to E5) with no header row and format those columns (currency, not bold, etc.) without it skipping a row in Excel, this would be very helpful. From what I gather, noone uses SSIS to export Excel without a third party connection manager.
Any ideas on how to set this up properly so that data is formatted correctly, i.e., the metadata read from Excel is proper to the real data, and formatting inherits from the first row of data, not the headers in Excel?
One last update (July 17, 2009)
I got this to work very well. One thing I added to Excel was the IMEX=1 in the Excel connection string: "Excel 8.0;HDR=Yes;IMEX=1". This forces Excel (I think) to look at all rows to see what kind of data is in it. Generally, this does not drop information, say for instance if you have a zip code then about 9 rows down you have a zip+4, Excel without this blanks that field entirely without error. With IMEX=1, it recognizes that Zip is actually a character field instead of numeric.
And of course, one more update (August 27, 2009)
The IMEX=1 will succeed importing data with missing contents in the first 8 rows, but it will fail exporting data where no data exists. So, have it on your import connection string, but not your export Excel connection string.
I have to say, after so much fiddling, it works pretty well.
P.S. If you are using a x64 bit version, make sure you call the DTExec from C:\Program Files\Microsoft SQL Server\90\DTS.x86\Binn. It will load the 32 bit Excel driver and work fine.
Would it be easier to create the Excel Workbook in a script task, then just pick it up later in the flow?
The engine part of SSIS is good but the integration with Excel is awful
"Using SSIS in conjunction with Excel is like having hot tar funnelled up your iHole in a road cone"
Dr. Zim, I believe you were the one that originally brought up this question. I totally feel your pain. I love SSIS overall, but I absolutely hate the limited tools that come standard for Excel. All I want to do is Bold the Heading or Row1 record in Excel, and not bold the following records. I have not found a great way to do that; granted I am approaching this with no script tasks or custom extensions, but you would think something this simple would be a standard option. Looks like I may be forced to research and program up something fancy for a task that should be so fundamental. I've already spent a rediculous amount of time on this myself. Does anyone know if you can use Excel XML with Excel versions: 2000/XP/2003? Thanks.
This is an old thread but what about using a flat file connection and writing the data out as a formatted html document. Set the mime type in the page header to "application/excel". When you send the document as an attachment and the recipient opens the attachment, it will open a browser session but should pop Excel up over the top of it with the data formatted according to the style (CSS) specified in the page.
Can you have SSIS write the data to an Excel sheet starting at A1, then create another sheet, formatted as you like, that refers to the other sheet at A1, but displays it as A4? That is, on the "pretty" sheet, A4 would refer to A1 on the SSIS sheet.
This would allow SSIS to do what it's good for (manipulate table-based data), but allow the Excel to be formatted or manipulated however you'd like.
When excel is the destination in SSIS, or the target export type in SSRS, you do not have much control over formatting and specifying how you want the final file to be. I have written a custom excel rendering engine for SSRS once, as my client was so strict about the format of final Excel report generated. I used 'Excel xml' to get the job done inside my custom renderer. May be you can use XML output and convert it to Excel XML using XSLT.
I understand you would rather not use a script component so perhaps you could create your own custom task using the code that a script contains so that others can use this in the future. Check here for an example.
If this seems feasible the solution I used was CarlosAg Excel Xml Writer Library. With this you can create code which is similar to using the Interop library but produces excel in xml format. This avoids using the Interop object which can sometimes lead to excel processes hanging around.
Instead of using a roundabout way to do this exercise of trying to write data to particular cell(s), format the cell(s), style them which is indeed a very tedius effort considering the support SSIS has for EXCEL, we could go the "template" way to do this.
assume we need to write data in the so & so cell with all the custom formating thats done on it. Have all the formatting in a sheet, say "SheetActual", Whereas the cells that will hold the data will actually have Lookups/ refrences/ Formulaes to refer to the original data that SSIS exports in a hidden sheet say "SheetMasterHidden" of the same Excel connection. This "SheetMasterHidden" will essentially hold the master data in default format that SSIS writes data to the excel. This way you need not worry about formatting the data runtime.
Formatting the Excel is a one time work "IF" the formatting dont change very often. If the format changes and the format is decided runtime this solution maynot go very well.
The answer is in the question. Over time, it became a progress status. However, there is SSRS that will create Excel files if you create TABLE presentations. It works pretty well too.