Read a 1 GB Excel file with 150,000 records per sheet into a DataSet - excel

I am trying to import large Excel files into a database using a .NET application, in which I will do some customized cleansing and processing of the data. The Excel file will have sheets with 255 columns and 150,000 rows. I tried different solutions such as the Microsoft JET/ACE provider, OpenXML/OfficeOpenXML and LinqToExcel. I get OutOfMemory exceptions with both the Microsoft provider and OpenXML. Please let me know how to deal with it.

Have you tried Integration Services to import an Excel file into SQL Server 2005? You can use Integration Services (a wonderful tool) for extracting, transforming, and loading data. Common uses for Integration Services include loading data into the database and transforming data into or out of your relational database structures.
You can call this service from your .NET code again and again to perform the task repeatedly. If not from .NET code, you can schedule it as well.
See this sample walkthrough: http://www.techrepublic.com/blog/datacenter/how-to-import-an-excel-file-into-sql-server-2005-using-integration-services/205
If you don't want to use SSIS, you could use the following ready-to-use open tools instead.
http://exceldatareader.codeplex.com/
http://www.codeproject.com/Articles/14639/Fast-Excel-file-reader-with-basic-functionality
http://www.codeproject.com/Articles/16210/Excel-Reader
This should help you.
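The OutOfMemory errors usually come from loading a whole sheet into memory before touching the database. Whatever reader you pick, the general pattern is to consume rows one at a time and write them in fixed-size batches. Here is a minimal sketch of that pattern in Python (not .NET), assuming a forward-only row source; `fake_sheet`, the `staging` table and its two columns are made up for illustration, and SQLite stands in for the real database.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows without materializing the source."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_rows(conn: sqlite3.Connection, rows: Iterable[Sequence],
              batch_size: int = 5000) -> int:
    """Insert rows batch by batch; returns the number of rows written."""
    # Hypothetical two-column staging table, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS staging (id INTEGER, name TEXT)")
    total = 0
    for chunk in batched(rows, batch_size):
        conn.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", chunk)
        conn.commit()  # only one batch is held in memory at a time
        total += len(chunk)
    return total


def fake_sheet(n: int) -> Iterator[tuple]:
    """Stand-in for a forward-only Excel reader: yields one row at a time."""
    for i in range(n):
        yield (i, f"row-{i}")


conn = sqlite3.connect(":memory:")
written = load_rows(conn, fake_sheet(12_345), batch_size=1000)
```

Memory use is bounded by `batch_size` rather than by the sheet size, so the same loop works for 150,000 rows or more, as long as the reader itself streams.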

Related

What is the best way to create an Azure Function that can read an Excel sheet and convert the data into POCOs to push into Azure Table?

I am creating an Azure Function that can read an Excel file and push the data to an Azure table. I have researched and found the following options to proceed with the solution:
Use the EPPlus package. There is no native method or functionality in this package to map sheet data to a POCO, but I have come across a few solutions to custom-build one as per the requirements.
Use an OLEDB connection to query the sheet data.
Use the Interop DLL. But this is out of the question for a cloud deployment, since it needs MS Office installed on the server.
Which one of the above approaches would be more suitable for the Azure cloud platform? Please let me know if there is any other way apart from the two mentioned above. Thanks.
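The custom sheet-to-POCO mapping mentioned for the first option usually boils down to matching header cells to property names and converting each cell to the property's type. A rough sketch of that idea in Python (not C#/EPPlus), assuming the first row is a header; the `Customer` type and the sample rows are invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Sequence


@dataclass
class Customer:  # hypothetical target type, playing the role of the POCO
    id: int
    name: str
    balance: float


def map_rows(header: Sequence[str], rows: Iterable[Sequence], cls) -> List:
    """Build one `cls` instance per row by matching headers to field names."""
    # Case-insensitive lookup from column title to column position.
    index = {title.strip().lower(): i for i, title in enumerate(header)}
    result = []
    for row in rows:
        kwargs = {}
        for f in fields(cls):
            raw = row[index[f.name.lower()]]  # KeyError if a column is missing
            kwargs[f.name] = f.type(raw)      # convert using the declared type
        result.append(cls(**kwargs))
    return result


# Sample data shaped like a worksheet: header row first, then data rows.
sheet = [["ID", "Name", "Balance"], ["1", "Ada", "10.5"], [2, "Bob", 3]]
customers = map_rows(sheet[0], sheet[1:], Customer)
```

The conversion step only works for simple types whose constructor accepts the cell value; dates and nullable columns would need explicit handling.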

Is it Possible to update SQL server data using Excel Power query?

I recently realised that Excel lacks features for updating SQL Server data from a worksheet.
I have tried using Data --> From other sources --> SQL Server data; that works like a charm, but it has limited ability (it can view and refresh the latest data but not update it).
I don't know if this is done purposely by Microsoft as a money-making scheme.
Through my research today, I also came across Power Query. It seems to do pretty much what the Data add-in did, except it has a few extra features and sounds pretty advanced. I was wondering whether this add-in has the ability to update SQL Server data from an Excel sheet; if so, can you guys point me in the right direction?
I came across lots of commercial products that do the job but, frankly speaking, I cannot afford them.
The best solution in this space that I've seen is the Master Data Services component included in SQL Server (Business Intelligence or Enterprise Edition). This includes a nice Excel Add-In for maintaining data, a Web UI, and SQL views and staging tables for data integration.
It doesn't have any direct integration with Power Query, but I would let PQ dump data into Excel Tables, then copy and paste the data into Excel tables using the MDS Add-In.

SQL Server and Excel

I want to link an Excel file to SQL Server 2014 so that I can edit the file and the data gets updated on the server automatically.
This is similar to what happens when you link SQL Server to Access, where you can edit the data and the changes take effect on the server.
Thanks in advance
There is no out-of-the-box solution for this. You can do this in either of two ways:
Write C# code with a file watcher attached to the Excel file, which uploads the file to the database through an SSIS job whenever it changes.
Create a scheduled SSIS job which imports the Excel file periodically.
Understanding the purpose would allow for greater elaboration.
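The file-watcher option above can be approximated by polling the file's modification time and size and triggering the import when either changes. A minimal sketch in Python rather than C#; the `FileChangeDetector` class is made up for this example, and the callback is a placeholder for whatever actually starts the SSIS job.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


class FileChangeDetector:
    """Calls `on_change(path)` when the file's mtime or size changes."""

    def __init__(self, path: str, on_change: Callable[[str], None]) -> None:
        self.path = path
        self.on_change = on_change
        self._last = self._signature()

    def _signature(self) -> Optional[Tuple[int, int]]:
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return None
        return (st.st_mtime_ns, st.st_size)

    def poll(self) -> bool:
        """Check once; returns True if the callback was fired."""
        current = self._signature()
        if current == self._last:
            return False
        self._last = current
        if current is None:  # file was deleted; nothing to import
            return False
        self.on_change(self.path)
        return True


# Demo with a temporary file; `imports.append` stands in for the real import.
imports = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "book.xlsx")
    with open(path, "wb") as fh:
        fh.write(b"v1")
    detector = FileChangeDetector(path, imports.append)
    first = detector.poll()   # unchanged since construction
    with open(path, "ab") as fh:
        fh.write(b"-v2")      # simulate the user saving the workbook
    second = detector.poll()  # change detected, callback fired
    third = detector.poll()   # no further change
```

In a real service `poll()` would run on a timer. Excel may hold a lock on the file while it is open, so the import step should be prepared to retry.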
This depends on the type of data you wish to edit.
For master data, if you have the Enterprise or Business Intelligence edition of SQL Server and Master Data Services set up, there is a plug-in for Excel:
https://msdn.microsoft.com/en-us/library/hh231024(v=sql.120).aspx
For transactional data, I would strongly advise against using Excel as a front-end and would recommend you to consider alternatives.
However, if you are compelled to go down this route, you can achieve this using VBA scripting and linking via a DAL (Data Access Layer) such as ADO.NET. Be aware that giving such power to your users could open up your system to SQL injection attacks, so only proceed if you trust the users 100%. Another thing to take into consideration is validation: checks should be applied to every cell where data can be entered. More information can be found here:
https://support.microsoft.com/en-us/kb/316934
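To illustrate the two precautions above (validating every cell and avoiding SQL injection), here is a small sketch in Python with SQLite standing in for SQL Server. The `orders` table, the allowed range and the function names are assumptions for the example. The key point is that the cell value is validated first and then passed as a parameter, never concatenated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO orders (id, qty) VALUES (1, 5)")


def validate_qty(cell_value) -> int:
    """Accept only whole numbers in an assumed plausible range."""
    qty = int(str(cell_value).strip())  # raises ValueError for free text
    if not 0 <= qty <= 10_000:
        raise ValueError(f"quantity out of range: {qty}")
    return qty


def update_qty(conn: sqlite3.Connection, order_id: int, cell_value) -> int:
    """Validate the cell, then update with bound parameters."""
    qty = validate_qty(cell_value)
    cur = conn.execute(
        "UPDATE orders SET qty = ? WHERE id = ?",  # placeholders, no concatenation
        (qty, order_id),
    )
    conn.commit()
    return cur.rowcount  # number of rows actually updated


update_qty(conn, 1, " 7 ")
```

The same idea carries over to ADO from VBA or ADO.NET through parameterized command objects.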

Running Excel automation locally or on server

Wanted some opinions on which method is a better practice. We have a sales report that MUST be generated in a very specific format (down to the row colors and fonts).
I already have written a macro which pulls from our database and populates the entire workbook in about 15 seconds. The question is how should it be populated?
1) Process server-side: Users initiate the request on the intranet page. ASP.NET opens the workbook template, executes the macro and serves back the final sheet.
2) Process locally: Users download the blank template, run from their desktops which automatically connect to the database.
I like the first one because I can enforce the template, timing, users, and security of the data. But is running Excel automation on an internet web server recommended? I like the second option, but I'm afraid of losing standardization as template sheets begin floating around the company.
As for server side:
I highly.. HIGHLY.. recommend checking out the OpenOffice/LibreOffice XML format for spreadsheets.
You can use the localc binary in headless mode to convert the XML file to XLSX or what have you. I use it to create PDF files instead of using ReportLab.
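For reference, a headless conversion is typically driven through the `--headless`, `--convert-to` and `--outdir` options. Here is a small Python sketch that only builds the command line; it assumes the LibreOffice launcher is on the PATH as `soffice` (swap in `localc` if that is what your install provides), and the file names are placeholders.

```python
from pathlib import Path
from typing import List


def convert_command(source: Path, target_format: str, out_dir: Path,
                    binary: str = "soffice") -> List[str]:
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        binary,
        "--headless",                   # no GUI, suitable for servers
        "--convert-to", target_format,  # e.g. "xlsx" or "pdf"
        "--outdir", str(out_dir),
        str(source),
    ]


# Placeholder paths, for illustration only.
cmd = convert_command(Path("report.ods"), "xlsx", Path("out"))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.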
Alternatively, here are some other projects that attempt to read (xlrd) and write (xlwt) Microsoft formats directly:
http://pypi.python.org/pypi/xlrd
http://pypi.python.org/pypi/xlwt
As for client side:
If you expect users to use only Excel and no other spreadsheet software, then go ahead and use an ODBC data source. ODBC will have to be configured per user unless you use some fun VBScript to pull the data from an HTTP server every time the workbook is loaded. There is also the option of generating an XLS spreadsheet that simply holds the data and including it in the XLS report document, which would mean an XLS requirement on both the server and the client.
Go for server side. Makes information simple to archive and share and will most likely be multi-platform as well.
If you would like to use your first option, then you want to avoid running VBA in an installed instance of Excel on the server. This is extremely resource intensive and does not scale well. Instead, if you are writing ASP.NET code, you should try using the Microsoft Office Interop functionality that is built into the .NET framework. It should be possible to adapt your existing VBA code to run under ASP.NET with some changes, and you will have a much more reliable product in the end.
Example Code
However, as @whardier points out in his response, if this were for a large scale or public site, the suggestions he makes would be much more suitable and would scale much further.

SSAS-like manipulation of data in excel, without SSAS

I have provided users with a view of a large data set through Sql Server Analysis Services, and they find it very easy and intuitive to manipulate.
However, I am now being asked to provide them with access to smaller and smaller data sets, for which Analysis Services is not a great fit. The reason is that they like the ease of manipulation of the data, and it's pretty flexible in its presentation of the data.
Also, many of the data sets are available to retrieve via a REST API, in a tabular form, which I'd prefer to use rather than providing database access.
Can anyone recommend any tools or libraries (ideally open source) which:
provide an SSAS-like interface for building up a pivot table (with attributes grouped together rather than in a flat list)
can retrieve their data from a web service rather than a traditional DB?
(NB I thought about trying powerpivot, but I'm not really sure what I'd be getting myself into, so if anyone has any experience of using this I'd be interested to hear)
PowerPivot is an Excel plugin for Excel 2010 that uses the VertiPaq engine. It has a language called DAX that is very similar to MDX.
More information can be found here
If you wish to use PowerPivot, you have three options:
1) Use PowerPivot from within Excel (it's a free add-in - be sure to install the edition that matches the edition of Excel you have, i.e. 2007 or 2010 and 32-bit or 64-bit). You are using the resources of the client machine in this configuration.
2) Use PowerPivot for SharePoint - this requires SPS 2010 Enterprise. It allows you to host (render) the PowerPivot workbook using resources from the SPS server.
3) Use SQL Server 2012 SSAS installed in Tabular mode (to build a BISM). BI Semantic Models are PowerPivot models which are hosted on a SQL Server instance. This requires a full SQL Server licence, so it's certainly not cheap. However, here you have the greatest flexibility for resources, as you can use (control/monitor) the resources of your server.
For more information see my deck on the BISM on SlideShare.

Resources