Following up on my previous post, I need to be able to query a database of 6M+ rows in the fastest way possible, so that this DB can be effectively used as a "remote" data source for a dynamic Excel report.
Like I said, normally I would store the data I need on a separate (perhaps hidden) worksheet and I would manipulate it through a second "control" sheet. This time, the size (i.e. number of rows) of my database prevents me from doing so (as you all know, excel cannot handle more than 1,4M rows).
The solution my IT guy put in place consists of holding the data on a txt file inside of a network folder. This far, I managed to query this file through ADO (slow but no mantainance needed) or to use it as a source to populate an indexed Access table, which I can then query (faster but requires more mantainance & additional software).
I feel both solutions, although viable, are sub-optimal. Plus it seems to me as all of this is but an unnecessary overcomplication. The txt file is actually an export from SAP BO, which the IT guy has access to through WEBI. Now, can't I just query the BO database through WEBI myself in a "dynamic" kind of way?
What I'm trying to say is, why can't I extract only bits of information at a time, on a need-to-know basis and directly from the primary source, instead of having all of the data transfered in bulk on a secondary/duplicate database?
Is this sort of "dynamic" queries even possible? Or will the "processing" times hinder the success of my approach? I need this whole thing to really feel istantaneuos, as if the data was already there and I'm not actually retrieving it all the times.
And most of all, can I do this through VBA? Unfortunately that's the only thing I will be having access to, I can't do this BO-side.
I'd like to thank you guys in advance for whatever help you can grant me!
Webi (short for Web Intelligence) is a front-end analytical reporting application from Business Objects. Your IT contact apparently has created (or has access to) such a Webi document, which retrieves data through a universe (an abstraction layer) from a database.
One way that you could use the data retrieved by Web Intelligence as a source and dynamically request bits instead of retrieving all information in one go, it to use a feature called BI Web Service. This will make data from Webi available as a web service, which you could then retrieve from within Excel. You can even make this dynamic by adding prompts which would put restrictions on the data retrieved.
Have a look at this page for a quick overview (or Google Web Intelligence BI Web Service for other tutorials).
Another approach could be to use the SDK, though as you're trying to manipulate Web Intelligence, your only language options are .NET or Java, as the Rebean SDK (used to talk to Webi) is not available for COM (i.e. VBA/VBScript/…).
Note: if you're using BusinessObjects BI 4.x, remember that the Rebean SDK is actually deprecated and replaced by a REST SDK. This could make it possible to approach Webi using VBA after all.
That being said, I'm not quite sure if this is the best approach, as you're actually introducing several intermediate layers:
Database (holding the data you want to retrieve)
Universe (semantic abstraction layer)
Web Intelligence
A way to get data out of Webi (manual export, web service, SDK, …)
Excel
Depending on your license and what you're trying to achieve, Xcelsius or Design Studio (BusinessObjects BI 4.x) could also be a viable alternative to the Excel front-end, thereby eliminating layers 3 to 4 (and replacing layer 5). The former's back-end is actually heavily based on Excel (although there's no VBA support). Design Studio allows scripting in JavaScript.
I am currently architecting a large SharePoint deployment.
This deployment has the potential to grow to petabytes in size over the course of several years.
One of the current issues we are discussing is the option of storing our data in SharePoint using InfoPath Forms. Some of these forms contain hundres of fields and require a lot of mapping to content types for persistence and search. Our search requirement is primarily a singular identifier and NOT the contents of the forms, although I am told I should preempt the "want" to search in the future.
We require our information to be utilised for secondary purposes (such as reporting etc). The information MUST be accessible instantly after persisting to the system.
My core questions therefore are:
What are the benefit/risks of this approach compared to storing
our data in a singular relational store using web-service
persistence?
If we decided on this approach what would be the
impact of changing the forms, content-types over time?
What happens when our farm grows beyond a single web-application / site collection how accessible will the information be?
Will I know where it is and how portable will the information be overtime?
1.)
Benefit:
Form templates can be created & deployed (relatively) easy
You can easily configure Field Validation
Probably no code involved
Risks:
Hitting SharePoint 2010 Limits (not so uncommon as you might
think)
Needs careful form design/planning (correct XML structure)
Information only accessible via SharePoint Object model or
WebService's (very slow)
2.) Well this is a tough one. Changing the form template and re-deploying is easy and only takes a few minutes. However changing the structure (underlying XML) of the template can get you in trouble very easily, because older (filled out) forms will be invalid - there is an option to "upgrade" older forms out-of-the-box, but in my experience it never worked as it supposed to.
Content Types behave very similar, say you want to delete a column from a content type because it's no longer needed - you'll have to remove all references to it, which means removing all items so you can delete the column.
3.) Well portability is definitly an issue with InfoPath, because it heavily relies on the corresponding URL structure. You absolutely can add more site collections, but this means you have to deploy your form template to each site collection. Information (filled out forms) can't easily be shared between site collection's because each form contains the SourceURL (where it came from) and the Namespace of the template (which changes constantly once you deploy).
Considering your requirements, i would strongly recommend a relational store instead of InfoPath - simply because it is not designed to be a data storage.
I would use a SQL database to store the data and a custom UI (WebPart or Application Page) to perform CRUD operations. This means that the information is not actually stored in SharePoint - just displayed (which also means that it can't be searched with the builtin SharePoint Search). There is also the possibilty to use the Business Connectivity Services (which basically does all of the above without you needing to create a custom UI - however very slow with large amount of data).
If you do need the information just in SharePoint, why not just make all this happen with Lists only?
This is going to be a long one and may not have an answer just because there's no silver bullet for what you're looking for. It's mostly insight and ultimately the choice is up to you.
the option of storing our data in SharePoint using InfoPath Forms
This statement throws me a little. SharePoint data is stored in SharePoint (well, SQL technically) but InfoPath is just a UI layer for accessing any part of that data.
Some of these forms contain 100s of fields and require alot of mapping to content types for persistence and search
From this I assume there are multiple forms which would mean different types of data being accessed (and probably different purposes). Hundreds of fields is no problem and it really boils down to managing the form and view design.
From the form side you should check out cxpartners form design crib sheet. This gives you a nice standard to follow to manage all those fields. Another thing would be to look at breaking the form up in tabs or views itself (in InfoPath) based on what the user needs to fill out. Basically it breaks down to not creating a form with 100s of fields on one massively scrolling screen the user will just freak out over.
Same with the views on the form or document library you're storing the form data in. InfoPath forms are just xml stored in a library (so regardless of how many fields you have, the footprint is pretty minimal). You don't want to map and surface every field in the form nor do you want to have a view with 100 columns on it. You should look at breaking down the views as they're fit for purpose, with only a few hundred items in each view with a few columns. It's a balancing act too as you don't want to create 100s of views either so you need to find out what's right. A good B.A. or Information Architect will help with this with the SharePoint/InfoPath guru and business user helping out.
We require our information to be utilised for secondary purposes (such as reporting etc). The information MUST be accessible instantly
This is another requirement that's going to be a little difficult to meet exactly. If the library has thousands of items (or 10s of thousands) and a view has dozens of fields then expect the view to come to crawl (especially if the user is insistent on "seeing everything" and wants the limits of each view to be set to 1000 items, like anyone could process that much information at once). Instant access is difficult if you're keeping everything online for a long time (like for reporting). There's the operational side where users are filling out forms, finding forms, editing them, etc. and for that you only want a few hundred items to be live at any given moment (up to a few thousand but you need to be careful on the views). If you have a list with 100,000 items in it and users are using this for daily activities and trying to run reports for trending or long term operations against it, you're going to lose the performance battle. Look at doing reporting offline, potentially shipping the data that's reportable to a second source like SQL and using SSRS against it. Performance Point is an option but adds a layer of complexity to the architecture. The question will ultimately fall to what reporting looks like and how important is it in relation to daily operations.
To try to answer your questions directly:
The benefits to using SharePoint over a database are that the data can be easily viewed and sliced and diced up. Creating a view is child's play and can quickly show you useful information like # of sales in a month or customer feedback grouped by call centre person. SharePoint makes it easy to view this information and even setup dashboards, hook in KPIs, etc. without having to get some developer to craft custom web pages. As far as risks go, you need to be careful with letting things grow organically and out of control. Don't let the users design views of data, they generally want something but not sure and will ask for all columns to be available which they just export to Excel to slice and dice. Make sure there's a good design around the views and lists and they're fit for purpose and meet what needs the user is trying to get out of the data. Ask the question of what they're looking for and why, that will help shape what to expose.
Any change needs to be thought out and planned and tested. It's no different in SharePoint if you add a column to a list as you would by adding a column to a SQL database. Form updates should be considered and while you won't get it 100% right the first time, you should try to get as much as possible without going overboard and putting in crazy things like 100 "blank" fields that are players to be named later. Strike a balance by understanding the needs of the users and company and where things are going. Hopefully someone will have a vision of what this thing might be when it grows up and that'll go a long way to understanding the impact of change.
Data is just xml and as long as you're not doing stupid stuff in the form like hard coding absolute paths to services (use data connection libraries) the impact of growth will be minimal. Growing beyond a web application into multiple ones is a pretty big change and not something to be taken lightly. Even splitting site collections out is big and there needs to be a really good reason for this. Site collections can handle thousands of sites and millions of documents without issue. Web applications are really there for dividing up areas of interest or separation of purpose (like team sites on one web app and a publishing portal on another) and not really meant for splitting data due to growth concerns.
Like I said, there's no silver bullet here and what you're asking for is an architecture for a solution that nobody here has all the requirements for. Hope this helps.
I have a requirement to retrieve data from share point (I guess it is 2010, but will check with admin if relevant) and generate an excel report/chart. Say we have a bug tracking system in share point. Currently, I could create a view and see some statistics, but I need to plot a graph to see historically (every week) how the number of bugs changed. For example,
get the number of bugs filed in a specific week
do some grouping based on type/severity
based on classification get number of bugs solved that week etc.
If I can get the numbers based on date range, I may use excel to plot the graph.
After some reading, SharePoint object model come close to what I used to work with (Oracle DB). I understand it may be entirely different from tradition db and querying.
Please help me with
What is the best method to approach this?
Is there a good book/resource.
Thanks a lot,
bsr
The easiest apprach would be to LINK to the sharepoint lists using Access 2007 or 2010 and then export the data to Excel for further processling. Of course, you could also write a program that uses CAML query to access the data. Your requirement sound straightforward, unless you need to automate the reporting process, the simplest approach would be to access the lists via an access database.
You could also create a web service via REST that pulls the data directly into Excel.
SharePoint has it's own query language: CAML query, and in theory that could be used to retrieve the list you seek.
And you should be prepared for "some" trial and error.
Tools I used:
http://www.u2u.be/res/tools/camlquerybuilder.aspx
http://spud.codeplex.com/
what I understand from this question is that you have the need to put the SharePoint data to an excel file and this from within the SharePoint site? So it looks to me that you could just create a simple SharePoint web part that consists of one button "generate excel file". So when the user clicks on the button you would just query your SPList object(SharePoint object model) and you would get all the necessary data from the list (SPListItems).
This is the way that I would take. Mind you that this is offcourse custom SharePoint Development (.NET c#). There are lots of books or blogs that described how to create your own web part in SharePoint.
I have a ton of data in a sql database which I would like to be able to import and display in excel (I can already do this) and additionally modify or append to the dataset within excel and write the changes/additions back to the database.
What is the best way to go about doing something like this?
Please let me know, thanks!
The way to do this is via Sql Server's DTS/SSIS capabilities. Create SSIS packages for Excel import and export and execute them as needed.
However you still have the issue of people having to share this massive spread sheet. You should consider importing the data into the db permanently and providing a winforms interface for the data entry. You'd be surprised how quickly you could whip out an app with a databound grid view control that would give you decent, Excel-like ability to add/edit/delete table data.
Although Excel is great at displaying/reporting on data stored within a SQL DB, it has no built-in controls for updating the data.
I would recommend investigating using VBA (Visual Basic for Applications) or based on your coding experience/tools available to you, VSTO (Visual Studio Tools for Office).
This method will allow all of your users to share the spreadsheet at the same time and allow incremental updates plus validation of the data being entered by the user at the point they enter it.
All the usual gotchas apply though - mainly GIGO (Garbage In, Garbage Out). Correctly authenticate your users and what they are allowed to update
I am building a small application for a friend and they'd like to be able to use Excel as the front end. (the UI will basically be userforms in Excel). They have a bunch of data in Excel that they would like to be able to query but I do not want to use excel as a database as I don't think it is fit for that purpose and am considering using Access. [BTW, I know Access has its shortcomings but there is zero budget available and Access already on friend's PC]
To summarise, I am considering dumping a bunch of data into Access and then using Excel as a front end to query the database and display results in a userform style environment.
Questions:
How easy is it to link to Access from Excel using ADO / DAO? Is it quite limited in terms of functionality or can I get creative?
Do I pay a performance penalty (vs.using forms in Access as the UI)?
Assuming that the database will always be updated using ADO / DAO commands from within Excel VBA, does that mean I can have multiple Excel users using that one single Access database and not run into any concurrency issues etc.?
Any other things I should be aware of?
I have strong Excel VBA skills and think I can overcome Access VBA quite quickly but never really done Excel / Access link before. I could shoehorn the data into Excel and use as a quasi-database but that just seems more pain than it is worth (and not a robust long term solution)
Any advice appreciated.
Alex
I'm sure you'll get a ton of "don't do this" answers, and I must say, there is good reason. This isn't an ideal solution....
That being said, I've gone down this road (and similar ones) before, mostly because the job specified it as a hard requirement and I couldn't talk around it.
Here are a few things to consider with this:
How easy is it to link to Access from Excel using ADO / DAO? Is it quite limited in terms of functionality or can I get creative?
It's fairly straitforward. You're more limited than you would be doing things using other tools, since VBA and Excel forms is a bit more limiting than most full programming languages, but there isn't anything that will be a show stopper. It works - sometimes its a bit ugly, but it does work. In my last company, I often had to do this - and occasionally was pulling data from Access and Oracle via VBA in Excel.
Do I pay a performance penalty (vs.using forms in Access as the UI)?
My experience is that there is definitely a perf. penalty in doing this. I never cared (in my use case, things were small enough that it was reasonable), but going Excel<->Access is a lot slower than just working in Access directly. Part of it depends on what you want to do....
In my case, the thing that seemed to be the absolute slowest (and most painful) was trying to fill in Excel spreadsheets based on Access data. This wasn't fun, and was often very slow. If you have to go down this road, make sure to do everything with Excel hidden/invisible, or the redrawing will absolutely kill you.
Assuming that the database will always be updated using ADO / DAO commands from within Excel VBA, does that mean I can have multiple Excel users using that one single Access database and not run into any concurrency issues etc.?
You're pretty much using Excel as a client - the same way you would use a WinForms application or any other tool. The ADO/DAO clients for Access are pretty good, so you probably won't run into any concurrency issues.
That being said, Access does NOT scale well. This works great if you have 2 or 3 (or even 10) users. If you are going to have 100, you'll probably run into problems. Also, I tended to find that Access needed regular maintenance in order to not have corruption issues. Regular backups of the Access DB are a must. Compacting the access database on a regular basis will help prevent database corruption, in my experience.
Any other things I should be aware of?
You're doing this the hard way. Using Excel to hit Access is going to be a lot more work than just using Access directly.
I'd recommend looking into the Access VBA API - most of it is the same as Excel, so you'll have a small learning curve. The parts that are different just make this easier. You'll also have all of the advantages of Access reporting and Forms, which are much more data-oriented than the ones in Excel. The reporting can be great for things like this, and having the Macros and Reports will make life easier in the long run. If the user's going to be using forms to manage everything, doing the forms in Access will be very, very similar to doing them in Excel, and will look nearly identical, but will make everything faster and smoother.
I do this all the time. If you're using ADO, you're not really using Access, but Jet, the underlying database. That means anybody with Excel can use the app - Access not required. Oh I should mention, the place I work bought a bunch of Office Small Business licenses - no Access. Prior to working here, I would have assumed that anyone who had Excel would also have Access. Not so.
I create one class for every table in Access. I very rarely run queries through ADO, instead I keep that logic in the class modules. I read in with a SELECT statement and write out with and UPDATE or INSERT using the Execute method of the ADODB.Connection object.
See http://www.dailydoseofexcel.com/archives/2008/12/21/vba-framework-ii/
if you want to see how I set up my code.
To answer your questions: It will be a small learning curve for you if you already know Excel VBA, but there will be some learning to do; you will pay a performance penalty over doing it all in Access, but it's not that bad and only you can decide if it's worth it; and you can have multiple people accessing the database.
Just skip the excel part - the excel user forms are just a poor man's version of the way more robust Access forms. Also Access VBA is identical to Excel VBA - you just have to learn Access' object model. With a simple application you won't need to write much VBA anyways because in Access you can wire things together quite easily.
If the end user has Access, it might be easier to develop the whole thing in Access. Access has some WYSIWYG form design tools built-in.
Unless there is a strong advantage to running your user form in Excel then I would go with a 100% Access solution that would export the reports and data to Excel on an ad-hoc basis.
From what you describe, Access seems the stronger contender as it is built for working with data:
you would have a lot more tools at your disposal to solve any data problems than have to go around the limitations of Excel and shoehorn it into becoming Access...
As for your questions:
Very easy. There have been some other questions on SO on that subject.
See for instance this one and that one.
Don't know, but I would guess that there could be a small penalty.
The biggest difficulty I see is trying to get all the functionalities that Access gives you and re-creating some of these in Excel.
Yes, you can have multiple Excel users and a single Access database.
Here again, using Access as a front-end and keeping the data in a linked Access database on your network would make more sense and it's easy as pie, there's even a wizard in Access to help you do that: it's just 1 click away.
Really, as most other people have said, take a tiny bit of time to get acquainted with Access, it will save you a lot of time and trouble.
You may know Excel better but if you've gone 80% of the way already if you know VBA and are familiar with the Office object model.
Other advantages of doing it in Access: the Access 2007 runtime is free, meaning that if you were to deploy to app to 1 or 30 PC it would cost you the same: nothing.
You only need one full version of Access for your development work (the Runtime doesn't have the designers).
It really depends on the application. For a normal project, I would recommend using only Access, but sometimes, the needs are specific and an Excel spreadsheet might be more appropriate.
For instance, in a project I had to develop for a former employer, the need was to give access to different persons on forms(pre-filled with some data, different for each person) and have them complete them, then re-import the data.
Since the form was using heavy number crunching, it made more sense to build it in Excel.
The Excel workbooks for the different persons were built from a template using VBA, then saved in a proper location, with the access rights on the folder.
All workbooks were attached as External tables to the workbooks, using named ranges. I could then query the workbooks from the Access Application. All administrative stuff was made from the db, but the end users only had access to their respective workbook.
Developping an Excel/Access application this way was a pleasant experience and the UI was more user-friendly than it would have been using Access.
I have to say that in this case, it would have taken a lot more time doing it in Access than it took using Excel. Also, the Application Object Model seems better though in Excel than in Access.
If you plan to use Excel as a front-end, do not forget to lock all the cells, but the editable ones and don't be affraid to use masked rows and columnns (to construct output tables for the access database, to perform intermediate calculations, etc).
You should also turn off autocalculation while importing data.
It's quite easy and efficient to use Excel as a reporting tool for Access data.
A quick "non programming" approach is to set a List or a Pivot Table, linked to your External Data source. But that's out of scope for Stackoverflow.
A programmatic approach can be very simple:
strProv = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & SourceFile & ";"
Set cnn = New ADODB.Connection
cnn.Open strProv
Set rst = New ADODB.Recordset
rst.Open strSql, cnn
myDestRange.CopyFromRecordset rst
That's it !
Given the ease of use of Access, I don't see a compelling reason to use Excel at all other than to export data for number crunching. Access is designed to easily build data forms and, in my opinion, will be orders of magnitude easier and less time-consuming than using Excel. A few hours to learn the Access object model will pay for itself many times over in terms of time and effort.
I did it in one project of mine. I used MDB to store the data about bills and used Excel to render them, giving the user the possibility to adapt it.
In this case the best solution is:
Not to use any ADO/DAO in Excel. I implemented everything as public functions in MDB modules and called them directly from Excel. You can return even complex data objects, like arrays of strings etc by calling MDB functions with necessary arguments. This is similar to client/server architecture of modern web applications: you web application just does the rendering and user interaction, database and middle tier is then on the server side.
Use Excel forms for user interaction and for data visualisation.
I usually have a very last sheet with some names regions for settings: the path to MDB files, some settings (current user, password if needed etc.) -- so you can easily adapt your Excel implementation to different location of you "back-end" data.
To connect Excel to Access using VBA is very useful I use it in my profession everyday. The connection string I use is according to the program found in the link below. The program can be automated to do multiple connections or tasks in on shot but the basic connection code looks the same. Good luck!
http://vbaexcel.eu/vba-macro-code/database-connection-retrieve-data-from-database-querying-data-into-excel-using-vba-dao
It Depends how much functionality you are expecting by Excel<->Acess solution. In many cases where you don't have budget to get a complete application solution, these little utilities does work. If the Scope of project is limited then I would go for this solution, because excel does give you flexibility to design spreadsheets as in accordance to your needs and then you may use those predesigned sheets for users to use. Designing a spreadsheet like form in Access is more time consuming and difficult and does requires some ActiveX. It object might not only handling data but presenting in spreadsheet like formates then this solution should works with limited scope.
You could try something like XLLoop. This lets you implement excel functions (UDFs) on an external server (server implementations in many different languages are provided).
For example you could use a MySQL database and Apache web server and then write the functions in PHP to serve up the data to your users.
BTW, I work on the project so let me know if you have any questions.