I will be obtaining a few dozen MS Excel spreadsheets with 500+ columns and need to import only a selected few columns into MS Access, based on their column names. I have no control over the source files, and yes, there doesn't need to be that many columns. In particular, I only want to import columns whose names start with "c_". The problem is I can't figure out how to do this. The other problem is that the number of columns will vary each time, but I will always need the columns that start with "c_".
Normally when I use VBA code to import Excel data, it's not too difficult for me if I have to select a range. For example:
DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel9, "ards_data_new", "H:\Misc. Projects\ARDS project\ards_data.xlsx", True, "ards_data!A1:A50"
The problem is that the range will vary for each file, so I can't do it this way. I wish there were a way I could use something like Like "c_*" to obtain only the columns I need, but I just don't know how. Any help would be appreciated.
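One possible approach (a sketch only, not tested against your files; the link-table name is a placeholder) is to link the whole sheet first, read the linked table's field names through DAO, and build a SELECT list of just the "c_" columns:

```vba
Sub ImportCColumns()
    Dim db As DAO.Database, fld As DAO.Field
    Dim fieldList As String

    ' Link the entire sheet first, whatever its width
    DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12Xml, _
        "ards_link", "H:\Misc. Projects\ARDS project\ards_data.xlsx", True

    Set db = CurrentDb
    ' Collect only the columns whose names start with "c_"
    For Each fld In db.TableDefs("ards_link").Fields
        If fld.Name Like "c_*" Then
            fieldList = fieldList & "[" & fld.Name & "], "
        End If
    Next fld
    If Len(fieldList) = 0 Then Exit Sub          ' no matching columns
    fieldList = Left(fieldList, Len(fieldList) - 2)  ' trim trailing comma

    ' Pull just those columns into a local table
    db.Execute "SELECT " & fieldList & " INTO ards_data_new FROM ards_link"
    DoCmd.DeleteObject acTable, "ards_link"      ' drop the temporary link
End Sub
```

Because the field list is built at run time, it doesn't matter how many columns each workbook has.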
Related
I have a system that is using PQ to import various CSV files. In those CSV files each row has a StaffID that uniquely identifies each member of staff. I have another sheet which has all current Staff details so I can match the import data to the existing staff by linking on the StaffID field.
If a new member of staff joins the company, I would only get notification as an unknown StaffID would appear in the CSV files I am importing. Is the correct approach here to either a) write a PQ to find StaffIDs in the new CSV files that don't match the master list in Excel or b) run the import into Excel and process those 'orphan' rows in Excel ie StaffID's in the new import that don't have a matching StaffID in the Master List Excel sheet?
I'm looking for any advice on 'best practice' here. I could solve this using only Excel / VBA, but I am new to PQ and am trying to find out more about how it can be used and in what scenarios.
Thanks in advance for any feedback.
Expected result: Advice on a 'best practice' or sensible approach to use Excel and PQ to manage the handling of 'orphaned' or data without a matching PK.
The best approach to finding deltas like this is a simple outer join and then remove the matching rows. It will highlight rows which exist on the left side but don't exist on the right and vice versa. You can then load that to a table either in Excel or PowerBI with an alert.
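In Power Query this is exactly what the Merge dialog's "Left Anti" join kind gives you. Expressed as SQL (table and column names here are assumptions), the same delta check looks like:

```sql
-- Rows in the new import whose StaffID has no match in the master list
SELECT imp.*
FROM CsvImport AS imp
LEFT JOIN StaffMaster AS m
  ON imp.StaffID = m.StaffID
WHERE m.StaffID IS NULL;
```

Swap the two tables to get the reverse direction (staff in the master list who no longer appear in the import).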
I'm importing some data from Excel into Access and I'm facing some strange issues.
Problem:
I'm using the DoCmd.TransferSpreadsheet method to import Excel data into Access, like this:
DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12Xml, "Excel_Data", "Filename", True
This "Excel_Data" table is not pre-created, so Access creates it on its own. Why? If I pre-create it, then the user has to import data to the destination table from Excel in exactly the same order (column A in Excel is field 1 in Access, etc.).
But if you don't pre-create it, Access creates whatever table there is in Excel, and you can then import only the data you wish, based on column names. Now here is where it gets stuck...
I don't know why, but in every other import I do like this, Access creates only Text fields, and then my import to the destination table works.
But in one of the imports Access creates a Number field, and then the import into the destination table doesn't work anymore. All the Excel data is formatted as General.
Does anybody know how to avoid this?
Basically I just want to import Excel data into Access, based on column names, in whatever column order there is in Excel.
Thanks for any help!
I would suggest using a query like this instead of TransferSpreadsheet:
SELECT * INTO Table1
FROM [Sheet1$D3:E24]
IN "C:\Temp\Test.xlsx" [Excel 12.0;HDR=YES;IMEX=1];
Note that IMEX=1 suppresses data type guessing for mixed-type columns, so Access will treat them as text.
Also, this will allow you to import data from a specified range of the spreadsheet and use a WHERE clause to filter out unnecessary data.
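Run from VBA, that query would look something like this (the path, range, and table name are placeholders):

```vba
Sub ImportFromRange()
    ' SELECT ... INTO runs as an ordinary action query against the workbook;
    ' IMEX=1 keeps mixed-type columns as text instead of guessed numbers
    Dim sql As String
    sql = "SELECT * INTO Table1 " & _
          "FROM [Sheet1$D3:E24] " & _
          "IN ""C:\Temp\Test.xlsx"" [Excel 12.0;HDR=YES;IMEX=1]"
    CurrentDb.Execute sql, dbFailOnError
End Sub
```

A WHERE clause can be appended to the SQL string in the same way to filter rows before they land in Table1.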
Thanks for all the answers, but I have solved this on my own. The best way to avoid problems with the import is simply to run an ALTER TABLE command after the DoCmd.TransferSpreadsheet method is done. I altered all my columns to Text, changed all my other tables to Text fields, and now every data manipulation works just fine.
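A sketch of that fix-up step (assuming the staging table is called Excel_Data; the width of 255 is an assumption):

```vba
Sub ForceTextColumns()
    Dim db As DAO.Database, fld As DAO.Field
    Set db = CurrentDb
    ' After TransferSpreadsheet, coerce every imported column to Text
    For Each fld In db.TableDefs("Excel_Data").Fields
        db.Execute "ALTER TABLE Excel_Data ALTER COLUMN [" & _
                   fld.Name & "] TEXT(255)", dbFailOnError
    Next fld
End Sub
```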
I have written some fairly lengthy VBA code in Excel to compare two worksheets. My code does the following:
Lets you import 2 sheets for comparison
arranges the columns
moves departments that require different comparisons into a new worksheet
In sheet 1, checks if the IDs appear more than once, then determines which row of data to use for comparison based on the latest update, and deletes the old rows
compares the sheets based on the headers and then the cell contents (as the header names differ), then highlights differing values red
finally gives me a breakdown, per column per department, of differences and any IDs that are missing
I have now found that my data set is becoming too big, and I'm looking to use MS Access. Is it possible to copy my VBA code over to Access? What do you guys suggest?
Any advice would be helpful.
From the nature of your question it sounds like you may not have used a database before. If you were using Access, you would need to totally rewrite the code using SQL statements, e.g. an aggregating SQL SELECT statement to find the most recent update for each ID and ignore the rest.
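For example (table and column names are assumptions, not from your workbook), the "keep only the latest update per ID" step becomes:

```sql
-- Keep only each ID's most recently updated row
SELECT t.*
FROM ImportData AS t
INNER JOIN (
    SELECT ID, MAX(LastUpdated) AS MaxUpdated
    FROM ImportData
    GROUP BY ID
) AS latest
  ON t.ID = latest.ID AND t.LastUpdated = latest.MaxUpdated;
```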
You can use conditional formatting in an Access form, but it's no better than using it in Excel. How many rows does your data have? Will it fit in an Excel sheet?
You might use Access to pre-process the data, removing the unwanted rows before you use it in Excel. Or use Power Query or SQL directly from Excel to remove them.
You have a way to go.
Harvy
I am importing a table from Excel to Access. Sometimes blank columns are also imported, as Field13 or Field-x.
What's the reason for that?
Also, sometimes blank rows are imported too. Is there a way to stop it?
Below is the set of data I am trying to import. Sometimes I see an extra column after ArtSRv and some extra empty rows after row 2 getting imported, so I want to know the reason for that.
During the import process you are given the options to not import columns (there is a checkbox). Also you can choose a primary key which can help with the blank records.
The blank records are mostly down to the way the data is structured (or not structured, as the case may be) in Excel.
Excel keeps a property recording the active (used) area of a sheet, and I believe Access uses this property to determine which rows/columns to import.
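If the stray columns and rows do get imported, a post-import cleanup is straightforward. A sketch (the table name, and the assumption that ArtSRv is a column that is always populated in real rows, are placeholders):

```vba
Sub CleanUpImport()
    Dim db As DAO.Database, fld As DAO.Field
    Dim dropList As New Collection, colName As Variant
    Set db = CurrentDb

    ' Collect the auto-named blank columns (Field13, Field14, ...)
    For Each fld In db.TableDefs("ImportTable").Fields
        If fld.Name Like "Field*" Then dropList.Add fld.Name
    Next fld
    ' Drop them after the loop, so we don't modify the collection mid-iteration
    For Each colName In dropList
        db.Execute "ALTER TABLE ImportTable DROP COLUMN [" & colName & "]"
    Next colName

    ' Delete rows where a required column is empty
    db.Execute "DELETE FROM ImportTable WHERE [ArtSRv] IS NULL"
End Sub
```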
I need to import sheets which look like the following:
March Orders
***Empty Row
Week   Order #   Date     Cust #
3.1    271356    3/3/10   010572
3.1    280353    3/5/10   022114
3.1    290822    3/5/10   010275
3.1    291436    3/2/10   010155
3.1    291627    3/5/10   011840
The column headers are actually in row 3. I can use an Excel Source to import the data, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
Have a look at the following threads. The links have more details, but I've included some text from the pages (just in case the links go dead).
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
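The last suggestion (with assumed table and column names) would look something like this, where RowNum is an identity column seeded 1/1 and added at load time:

```sql
-- Remove every 12th row (the repeated section headers)
DELETE FROM StagingTable
WHERE RowNum % 12 = 0;
```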
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this: SELECT F4,
F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4
<> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't contain any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts), and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, never repeat the headers multiple times, and, most important of all, have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet.
Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package, and that they will be charged for those development changes (assuming these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; a file with the wrong format will fail and be returned to them to fix.
If that doesn't work, may I suggest that you open the file, delete the first two rows, and save it as a text file. Then write a data flow that will process the text file. SSIS does a lousy job of supporting Excel, and anything you can do to get the file into a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct.
SSIS forces you to use the format as given, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.