I am working for a start up where they are getting excel files from different companies with customer information. We do not have any ETL tool at present as the work is handled manually to transform the data into required structure and load into CRM system.
My plan is to load these excel files into a database and also replicate CRM into a database and do some fuzzy maping.
Can you please recommend a light weight ETL tool to apply a few rules to clean the data and compare the existing customer data that we have?
Thanks,
mc
Getting Excel feeds is certainly very common, and you need a good process for ingesting and validating them, especially since they are often manually created or tweaked, leading to frequent data and formatting issues. Adding insult to injury, Excel has a very fuzzy concept of data types, often throwing spanners in the works.
Where possible, switch your data sources to other formats (JSON, CSV, database extract). This requires upstream work but so does troubleshooting feed issues, so switching to a better format (and defining the feed well!) pays off for both sides fairly quickly.
Process Incoming Files Example describes a general approach for reliably handling multiple feeds of incoming files, with processing and archiving of successful and failing files. The example uses my company's actionETL cross-platform .NET ETL library, but I've also used the same approach previously with other ETL tools.
Map out all current and upcoming data sources and destinations, and see which tools are a good fit. Try before you buy with your actual ETL feeds and requirements. Expect the ETL data integration to be an ongoing project since feeds and requirements never stop changing and growing.
Cheers,
Kristian
I have an excel file about 340 mb which contains more than 2000 worksheets and dozens of long VBA program. The file is getting so large that it takes around 10-15 mins to open the file and often get into "NO RESPONSE", "NOT ENOUGH USEABLE RESOURCES" when I save or debug the file.
I searched online people suggest migrating to Acesss. However, I have never used Access before. SO I wonder
1) How to migrate the excel file to access?
2) Will the VBA program be carried to the new access file
3) Do i need to modify the excel VBA code to fit Access?
2) Can Access handle a 300-500mb file?
thank you
500MB Excel file does not mean 500MB Access file. Moving from Excel to Access is a very good thought. However, you need to understand the basics first. Excel it completely different from MS Access access except when Access looks similar to excel in a datasheet form.
Access is a "Relational"-Database + graphical front-end software where you can present your data in a form or through components. In a relational database you identify entities and define relationships between them. By doing so you eliminate [Data inconsistency] throughout your application. A proper modelling might reduce your data from 500MB to just 50MB.
To answer your questions:
1> How to migrate the excel file to access?
First of all migration in your case needs a fresh new MS Application. Start modelling your business first. Read about Rational-Databases, Read about MS Access tables, relationships, queries. Think if Access is a suitable platform for your business. Create the application in MS Access and then you can come back to us and ask about migration.
OR:
You can use the current Excel sheet as an external data-source and link them as linked table within the access application. This is very very not recommended because you are effectively not benefiting anything better than your current situation. (Except queries to find a data)
2> Will the VBA program be carried to the new access file
Do you mean VBA forms? no it won't. Access has its own Forms and controls which looks better than Excel userforms. You have to redesign that part completely in Access.
3> Do i need to modify the excel VBA code to fit Access?
In most cases YES unless its a generic function. Most Excel VBA codes are referring to a cell or worksheet which Access does not have! In other words, it depends how big and complex your codes are. Any specific code has to be adjusted to MS Access platform but the adjustments are very minor not a major task. Again it depends on your code complexity.
4> Can Access handle a 300-500mb file?
Yes it can! newest versions can reach up to 2GB but i personally would not stay with access when i have to work with such amount of data. I would look for splitting/upgrading the database to a proper dedicated database engine such as MSSQL or free MySQL servers.
Some advise from me: 500MB of Excel sheet is potentially dangerous you should seek an alternative very soon. If you are going to fiddle on you own please always "backup" because Access throws random errors which are very hard to understand. Find an experienced IT person to help you before you delete/update/wipe off your data. Good luck
Interesting
I faced much of the same problem years ago on a POS system that recorded the transaction history in excel via some VBA. I moved to Access and found (Access) get a little corruptible the larger the file gets. I had to build in safeguards to restore from a backup should the file quit on me. I eventually moved to Visual FoxPro as my VBA could just about be translated straight across. Works to this day as a matter of fact.
Following up on my previous post, I need to be able to query a database of 6M+ rows in the fastest way possible, so that this DB can be effectively used as a "remote" data source for a dynamic Excel report.
Like I said, normally I would store the data I need on a separate (perhaps hidden) worksheet and I would manipulate it through a second "control" sheet. This time, the size (i.e. number of rows) of my database prevents me from doing so (as you all know, excel cannot handle more than 1,4M rows).
The solution my IT guy put in place consists of holding the data on a txt file inside of a network folder. This far, I managed to query this file through ADO (slow but no mantainance needed) or to use it as a source to populate an indexed Access table, which I can then query (faster but requires more mantainance & additional software).
I feel both solutions, although viable, are sub-optimal. Plus it seems to me as all of this is but an unnecessary overcomplication. The txt file is actually an export from SAP BO, which the IT guy has access to through WEBI. Now, can't I just query the BO database through WEBI myself in a "dynamic" kind of way?
What I'm trying to say is, why can't I extract only bits of information at a time, on a need-to-know basis and directly from the primary source, instead of having all of the data transfered in bulk on a secondary/duplicate database?
Is this sort of "dynamic" queries even possible? Or will the "processing" times hinder the success of my approach? I need this whole thing to really feel istantaneuos, as if the data was already there and I'm not actually retrieving it all the times.
And most of all, can I do this through VBA? Unfortunately that's the only thing I will be having access to, I can't do this BO-side.
I'd like to thank you guys in advance for whatever help you can grant me!
Webi (short for Web Intelligence) is a front-end analytical reporting application from Business Objects. Your IT contact apparently has created (or has access to) such a Webi document, which retrieves data through a universe (an abstraction layer) from a database.
One way that you could use the data retrieved by Web Intelligence as a source and dynamically request bits instead of retrieving all information in one go, it to use a feature called BI Web Service. This will make data from Webi available as a web service, which you could then retrieve from within Excel. You can even make this dynamic by adding prompts which would put restrictions on the data retrieved.
Have a look at this page for a quick overview (or Google Web Intelligence BI Web Service for other tutorials).
Another approach could be to use the SDK, though as you're trying to manipulate Web Intelligence, your only language options are .NET or Java, as the Rebean SDK (used to talk to Webi) is not available for COM (i.e. VBA/VBScript/…).
Note: if you're using BusinessObjects BI 4.x, remember that the Rebean SDK is actually deprecated and replaced by a REST SDK. This could make it possible to approach Webi using VBA after all.
That being said, I'm not quite sure if this is the best approach, as you're actually introducing several intermediate layers:
Database (holding the data you want to retrieve)
Universe (semantic abstraction layer)
Web Intelligence
A way to get data out of Webi (manual export, web service, SDK, …)
Excel
Depending on your license and what you're trying to achieve, Xcelsius or Design Studio (BusinessObjects BI 4.x) could also be a viable alternative to the Excel front-end, thereby eliminating layers 3 to 4 (and replacing layer 5). The former's back-end is actually heavily based on Excel (although there's no VBA support). Design Studio allows scripting in JavaScript.
I have a ton of data in a sql database which I would like to be able to import and display in excel (I can already do this) and additionally modify or append to the dataset within excel and write the changes/additions back to the database.
What is the best way to go about doing something like this?
Please let me know, thanks!
The way to do this is via Sql Server's DTS/SSIS capabilities. Create SSIS packages for Excel import and export and execute them as needed.
However you still have the issue of people having to share this massive spread sheet. You should consider importing the data into the db permanently and providing a winforms interface for the data entry. You'd be surprised how quickly you could whip out an app with a databound grid view control that would give you decent, Excel-like ability to add/edit/delete table data.
Although Excel is great at displaying/reporting on data stored within a SQL DB, it has no built-in controls for updating the data.
I would recommend investigating using VBA (Visual Basic for Applications) or based on your coding experience/tools available to you, VSTO (Visual Studio Tools for Office).
This method will allow all of your users to share the spreadsheet at the same time and allow incremental updates plus validation of the data being entered by the user at the point they enter it.
All the usual gotchas apply though - mainly GIGO (Garbage In, Garbage Out). Correctly authenticate your users and what they are allowed to update
I am building a small application for a friend and they'd like to be able to use Excel as the front end. (the UI will basically be userforms in Excel). They have a bunch of data in Excel that they would like to be able to query but I do not want to use excel as a database as I don't think it is fit for that purpose and am considering using Access. [BTW, I know Access has its shortcomings but there is zero budget available and Access already on friend's PC]
To summarise, I am considering dumping a bunch of data into Access and then using Excel as a front end to query the database and display results in a userform style environment.
Questions:
How easy is it to link to Access from Excel using ADO / DAO? Is it quite limited in terms of functionality or can I get creative?
Do I pay a performance penalty (vs.using forms in Access as the UI)?
Assuming that the database will always be updated using ADO / DAO commands from within Excel VBA, does that mean I can have multiple Excel users using that one single Access database and not run into any concurrency issues etc.?
Any other things I should be aware of?
I have strong Excel VBA skills and think I can overcome Access VBA quite quickly but never really done Excel / Access link before. I could shoehorn the data into Excel and use as a quasi-database but that just seems more pain than it is worth (and not a robust long term solution)
Any advice appreciated.
Alex
I'm sure you'll get a ton of "don't do this" answers, and I must say, there is good reason. This isn't an ideal solution....
That being said, I've gone down this road (and similar ones) before, mostly because the job specified it as a hard requirement and I couldn't talk around it.
Here are a few things to consider with this:
How easy is it to link to Access from Excel using ADO / DAO? Is it quite limited in terms of functionality or can I get creative?
It's fairly straitforward. You're more limited than you would be doing things using other tools, since VBA and Excel forms is a bit more limiting than most full programming languages, but there isn't anything that will be a show stopper. It works - sometimes its a bit ugly, but it does work. In my last company, I often had to do this - and occasionally was pulling data from Access and Oracle via VBA in Excel.
Do I pay a performance penalty (vs.using forms in Access as the UI)?
My experience is that there is definitely a perf. penalty in doing this. I never cared (in my use case, things were small enough that it was reasonable), but going Excel<->Access is a lot slower than just working in Access directly. Part of it depends on what you want to do....
In my case, the thing that seemed to be the absolute slowest (and most painful) was trying to fill in Excel spreadsheets based on Access data. This wasn't fun, and was often very slow. If you have to go down this road, make sure to do everything with Excel hidden/invisible, or the redrawing will absolutely kill you.
Assuming that the database will always be updated using ADO / DAO commands from within Excel VBA, does that mean I can have multiple Excel users using that one single Access database and not run into any concurrency issues etc.?
You're pretty much using Excel as a client - the same way you would use a WinForms application or any other tool. The ADO/DAO clients for Access are pretty good, so you probably won't run into any concurrency issues.
That being said, Access does NOT scale well. This works great if you have 2 or 3 (or even 10) users. If you are going to have 100, you'll probably run into problems. Also, I tended to find that Access needed regular maintenance in order to not have corruption issues. Regular backups of the Access DB are a must. Compacting the access database on a regular basis will help prevent database corruption, in my experience.
Any other things I should be aware of?
You're doing this the hard way. Using Excel to hit Access is going to be a lot more work than just using Access directly.
I'd recommend looking into the Access VBA API - most of it is the same as Excel, so you'll have a small learning curve. The parts that are different just make this easier. You'll also have all of the advantages of Access reporting and Forms, which are much more data-oriented than the ones in Excel. The reporting can be great for things like this, and having the Macros and Reports will make life easier in the long run. If the user's going to be using forms to manage everything, doing the forms in Access will be very, very similar to doing them in Excel, and will look nearly identical, but will make everything faster and smoother.
I do this all the time. If you're using ADO, you're not really using Access, but Jet, the underlying database. That means anybody with Excel can use the app - Access not required. Oh I should mention, the place I work bought a bunch of Office Small Business licenses - no Access. Prior to working here, I would have assumed that anyone who had Excel would also have Access. Not so.
I create one class for every table in Access. I very rarely run queries through ADO, instead I keep that logic in the class modules. I read in with a SELECT statement and write out with and UPDATE or INSERT using the Execute method of the ADODB.Connection object.
See http://www.dailydoseofexcel.com/archives/2008/12/21/vba-framework-ii/
if you want to see how I set up my code.
To answer your questions: It will be a small learning curve for you if you already know Excel VBA, but there will be some learning to do; you will pay a performance penalty over doing it all in Access, but it's not that bad and only you can decide if it's worth it; and you can have multiple people accessing the database.
Just skip the excel part - the excel user forms are just a poor man's version of the way more robust Access forms. Also Access VBA is identical to Excel VBA - you just have to learn Access' object model. With a simple application you won't need to write much VBA anyways because in Access you can wire things together quite easily.
If the end user has Access, it might be easier to develop the whole thing in Access. Access has some WYSIWYG form design tools built-in.
Unless there is a strong advantage to running your user form in Excel then I would go with a 100% Access solution that would export the reports and data to Excel on an ad-hoc basis.
From what you describe, Access seems the stronger contender as it is built for working with data:
you would have a lot more tools at your disposal to solve any data problems than have to go around the limitations of Excel and shoehorn it into becoming Access...
As for your questions:
Very easy. There have been some other questions on SO on that subject.
See for instance this one and that one.
Don't know, but I would guess that there could be a small penalty.
The biggest difficulty I see is trying to get all the functionalities that Access gives you and re-creating some of these in Excel.
Yes, you can have multiple Excel users and a single Access database.
Here again, using Access as a front-end and keeping the data in a linked Access database on your network would make more sense and it's easy as pie, there's even a wizard in Access to help you do that: it's just 1 click away.
Really, as most other people have said, take a tiny bit of time to get acquainted with Access, it will save you a lot of time and trouble.
You may know Excel better but if you've gone 80% of the way already if you know VBA and are familiar with the Office object model.
Other advantages of doing it in Access: the Access 2007 runtime is free, meaning that if you were to deploy to app to 1 or 30 PC it would cost you the same: nothing.
You only need one full version of Access for your development work (the Runtime doesn't have the designers).
It really depends on the application. For a normal project, I would recommend using only Access, but sometimes, the needs are specific and an Excel spreadsheet might be more appropriate.
For instance, in a project I had to develop for a former employer, the need was to give access to different persons on forms(pre-filled with some data, different for each person) and have them complete them, then re-import the data.
Since the form was using heavy number crunching, it made more sense to build it in Excel.
The Excel workbooks for the different persons were built from a template using VBA, then saved in a proper location, with the access rights on the folder.
All workbooks were attached as External tables to the workbooks, using named ranges. I could then query the workbooks from the Access Application. All administrative stuff was made from the db, but the end users only had access to their respective workbook.
Developping an Excel/Access application this way was a pleasant experience and the UI was more user-friendly than it would have been using Access.
I have to say that in this case, it would have taken a lot more time doing it in Access than it took using Excel. Also, the Application Object Model seems better though in Excel than in Access.
If you plan to use Excel as a front-end, do not forget to lock all the cells, but the editable ones and don't be affraid to use masked rows and columnns (to construct output tables for the access database, to perform intermediate calculations, etc).
You should also turn off autocalculation while importing data.
It's quite easy and efficient to use Excel as a reporting tool for Access data.
A quick "non programming" approach is to set a List or a Pivot Table, linked to your External Data source. But that's out of scope for Stackoverflow.
A programmatic approach can be very simple:
strProv = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & SourceFile & ";"
Set cnn = New ADODB.Connection
cnn.Open strProv
Set rst = New ADODB.Recordset
rst.Open strSql, cnn
myDestRange.CopyFromRecordset rst
That's it !
Given the ease of use of Access, I don't see a compelling reason to use Excel at all other than to export data for number crunching. Access is designed to easily build data forms and, in my opinion, will be orders of magnitude easier and less time-consuming than using Excel. A few hours to learn the Access object model will pay for itself many times over in terms of time and effort.
I did it in one project of mine. I used MDB to store the data about bills and used Excel to render them, giving the user the possibility to adapt it.
In this case the best solution is:
Not to use any ADO/DAO in Excel. I implemented everything as public functions in MDB modules and called them directly from Excel. You can return even complex data objects, like arrays of strings etc by calling MDB functions with necessary arguments. This is similar to client/server architecture of modern web applications: you web application just does the rendering and user interaction, database and middle tier is then on the server side.
Use Excel forms for user interaction and for data visualisation.
I usually have a very last sheet with some names regions for settings: the path to MDB files, some settings (current user, password if needed etc.) -- so you can easily adapt your Excel implementation to different location of you "back-end" data.
To connect Excel to Access using VBA is very useful I use it in my profession everyday. The connection string I use is according to the program found in the link below. The program can be automated to do multiple connections or tasks in on shot but the basic connection code looks the same. Good luck!
http://vbaexcel.eu/vba-macro-code/database-connection-retrieve-data-from-database-querying-data-into-excel-using-vba-dao
It Depends how much functionality you are expecting by Excel<->Acess solution. In many cases where you don't have budget to get a complete application solution, these little utilities does work. If the Scope of project is limited then I would go for this solution, because excel does give you flexibility to design spreadsheets as in accordance to your needs and then you may use those predesigned sheets for users to use. Designing a spreadsheet like form in Access is more time consuming and difficult and does requires some ActiveX. It object might not only handling data but presenting in spreadsheet like formates then this solution should works with limited scope.
You could try something like XLLoop. This lets you implement excel functions (UDFs) on an external server (server implementations in many different languages are provided).
For example you could use a MySQL database and Apache web server and then write the functions in PHP to serve up the data to your users.
BTW, I work on the project so let me know if you have any questions.