Best method of storing a large Excel file in a database - Node.js

My application requires rapid fetching of Excel data from files as large as 100,000 rows.
The server side currently runs Node.js, and the Excel parsing tools are too slow; memory issues occur when I attempt to load such a large Excel file into the program.
If I could store the spreadsheet's cells in a database, I could query for only n rows at a time.
The problem is that the files uploaded to the server do not have a set schema, so I can't generalize a schema and push the data into tables.
Any suggestions on storing .xlsx files in a database for rapid retrieval of data would be very much appreciated.
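
One common way to handle schema-less rows is to store each row as JSON. Below is a minimal sketch, assuming Postgres with a JSONB column (the sheet_rows table and the importSheet/fetchRows helpers are illustrative names, not an established API). Note that XLSX.readFile still parses the whole file in memory, so a streaming parser (e.g. ExcelJS's workbook stream reader) may be needed for the very largest files:

    // Sketch: store each spreadsheet row as JSONB so no fixed schema is required.
    // Assumes a table created as:
    //   CREATE TABLE sheet_rows (file_id text, row_num int, data jsonb);
    const XLSX = require('xlsx');
    const { Pool } = require('pg');

    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    async function importSheet(filePath, fileId) {
      const workbook = XLSX.readFile(filePath);
      const sheet = workbook.Sheets[workbook.SheetNames[0]];
      // header: 1 yields an array of cell values per row, with no assumed columns
      const rows = XLSX.utils.sheet_to_json(sheet, { header: 1 });
      for (let i = 0; i < rows.length; i++) {
        // In practice you would batch these inserts for speed
        await pool.query(
          'INSERT INTO sheet_rows (file_id, row_num, data) VALUES ($1, $2, $3)',
          [fileId, i, JSON.stringify(rows[i])]
        );
      }
    }

    // Retrieval then becomes a cheap paged query for n rows at a time
    async function fetchRows(fileId, offset, limit) {
      const res = await pool.query(
        'SELECT data FROM sheet_rows WHERE file_id = $1 ORDER BY row_num OFFSET $2 LIMIT $3',
        [fileId, offset, limit]
      );
      return res.rows.map(r => r.data);
    }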

Related

What could cause missing rows when importing data into Sequelize + Heroku Postgres?

I am using Heroku Postgres for a project and am trying to import Excel spreadsheets (.xlsx) of data into a Postgres database. I have written a script in Node.js using the "xlsx" and "sequelize" packages. Basically, I read the Excel file, convert the sheet to JSON, and then loop through each object (row of data) in the JSON list and insert it into the database using Sequelize.
The issue is that after successfully running my script, about 800 rows of data are missing and were never inserted into the database. Those 800 missing rows are about 20% of the total number of rows in the Excel sheet.
I have done some extensive googling, but I can't seem to find any information about missing rows when importing Excel sheets into a database. If I had to guess, Sequelize does some "internal magic" to verify the integrity of the data, such as checking for duplicates or similar things of that nature, which might be eliminating rows. Unfortunately, Sequelize's logs and print statements are large and verbose, so it is hard to find what might be the problem, if the issue is even with Sequelize.
Any information is appreciated!
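
Two things are worth ruling out before blaming Sequelize: sheet_to_json silently skips blank rows by default, and inserts fired without awaiting their promises can fail without any visible error. A sketch of one way to account for every row (the Row model name is a placeholder for your actual Sequelize model):

    // Sketch: await each insert and record failures so no row disappears silently.
    const XLSX = require('xlsx');

    async function importWithAccounting(filePath, Row) {
      const workbook = XLSX.readFile(filePath);
      const sheet = workbook.Sheets[workbook.SheetNames[0]];
      const records = XLSX.utils.sheet_to_json(sheet); // one object per data row

      let inserted = 0;
      const failures = [];
      for (const record of records) {
        try {
          await Row.create(record); // awaiting surfaces per-row validation/constraint errors
          inserted++;
        } catch (err) {
          failures.push({ record, message: err.message });
        }
      }

      console.log(`parsed ${records.length}, inserted ${inserted}, failed ${failures.length}`);
      if (failures.length) console.log(failures.slice(0, 5)); // inspect a sample
    }

If the parsed count already comes up short of the spreadsheet's row count, the loss happens in the xlsx conversion (the defval option stops empty cells from being dropped); if the failed count is non-zero, the captured error messages will say exactly what Sequelize rejected.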

Sharing Excel file containing Power Queries without also needing the other datasource workbooks

I have an Excel file that contains a few Power Queries from other local workbooks.
Is it possible to save the file in a way that includes the actual data of the queries, so that my users wouldn't need the other local workbooks for it to work? Essentially, creating a snapshot in time?
If you mean raw data, it's not possible - Power Query queries don't store external data. But, of course, you may save the results of the queries in various forms (flat tables, pivot tables, the data model).

Connecting Powerquery to multiple Powerpivot files

I have around half a dozen Power Pivot files containing data extracted from an SQL database. Each file has around 30 million rows of data in an identical format. I do not have access to the underlying database, but each Power Pivot file contains the data.
I would like to connect to all of these files in one new workbook using Power Query so that I can append them, add them to the data model, and work with them in Excel.
I have found various solutions for getting the data into CSV format using DAX Studio, but I would prefer to avoid this, as it seems unwieldy to export hundreds of millions of rows of data to CSV and then import them back into Power Query when I already have the formatted data in Power Pivot.
I also don't have any experience with SQL, so I would prefer to avoid that route.
I've tried creating a linkback as described here https://www.sqlbi.com/articles/linkback-tables-in-powerpivot-for-excel-2013/ but when I connect to this it only returns 1,048,576 rows of data (i.e. what Excel is limited to).
Is there an option for Power Query to use Power Pivot data in a separate workbook as a source, or another straightforward solution?
Thanks
You can either materialise the data (which you've tried, using linkback tables) or copy the queries. There's no other way to reference Power Pivot model data.

How to replace index/match with a connection

In order to view customer data in an Excel sheet, I have used INDEX/MATCH to retrieve the data. However, due to the large number of customers, the file has grown very large - right now it is 13 MB. This file is regularly sent by mail, so it is a real headache having to open it every time.
Is there a way to replace INDEX/MATCH with something else in order to reduce the file size? Transforming the source file into an SQL file? Adding a connection to the source file?
Thanks.

Consume PowerPivot/Excel Data Model from another Excel file?

Short version: Is there any way/hack to use the embedded Data Model/Power Pivot cube of an Excel 2013/2016 file from another Excel file?
Long version:
We have a large Excel Data Model with >400k rows and >100 measures, feeding multiple reports (i.e. PivotTables on separate worksheets). As all this is growing, we want to split it into a (large) data model and multiple report files. I know this could be done with SharePoint or Power BI; however, one of the key requirements is to be able to analyse the data offline. Hence, I'm trying to figure out any way to connect to the data model from another file....
There's no way that I know of to do what you're asking. Is there any reason you can't just include all the reports in one workbook with the data model? Since you have to be able to analyse offline anyway, everyone will need a local copy of the model. If the concern is just that there will be too many sheets in a single workbook, you could put a thin veneer of VBA in it to hide and unhide sheets in groups for ease of use.
It looks like Microsoft has added an option to establish the connection via an ODC file.
See, for example, https://learn.microsoft.com/en-us/sql/reporting-services/report-data/use-an-office-data-connection-odc-with-reports?view=sql-server-ver15
However, it's not working out for me. I am using Excel 2016 and exported the data model from the file containing it as a separate ODC file, but when I try to add this as a connection in another file, I get the message that it can't open the file. It looks like creating an ODC file is not that straightforward.
Anyone had similar issues?
