Power Query Excel File Missing Data Issue - excel

I'm new to Power Query and running into a strange issue that I haven't been able to resolve.
I'm creating a query to extract data from roughly 300 Excel files. Each file has one sheet with 115 columns and around 100 rows. However, the query only returns the data from the first two columns and the first two rows, and I'm not sure why it won't return all of the data on the sheet.
Ex:
Header 1 Header 2
Data Data
I converted one file to .csv, and the query returns all of the data from that file. I've scoured Google and haven't been able to find anything that seems to relate to this issue. Is there an Excel file limitation that I'm not aware of?
I'm assisting someone who is not technically savvy, so I would like to avoid VB code and Access if possible. Also, I can't provide a file I'm working with because the data contains PHI.
Thank you in advance!

Related

How to handle mass data in Excel (e.g. Power Query)

Please don't hurt me for this question... I know it is not the ideal way to handle mass data, but I need to try...
I have a folder with 10 CSV files, each of which is always under the xlsx row limit of 1,048,576 rows.
Now I am trying to combine those files into one file. Combined, the files exceed 1,048,576 rows, so with the import dialog I always get an error saying that it is not possible to load all of the data.
I found a way to load the data only into the data model of Power Query and not directly into the sheet, but I cannot find any way to split the data across different sheets.
Ideal split for example:
Sheet 1: File 1-3
Sheet 2: File 4-8
Sheet 3: File 9-10.
Is there a way to get a separate query for each file and then append those queries into the sheets? I would like to have 10 queries that I can append in the way mentioned above.
Thank you for your input!
You can load each CSV file separately as its own query, with each one saved from File > Close and Load as Connection Only. Then create separate queries that use Table.Combine() to put together the combinations you need (Data > Get Data > Combine Queries > Append), and load those combined queries back to the sheets as either tables or pivot reports.
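Outside Excel, the same idea (read each file on its own, then append chosen groups of files into separate sheets) can be sketched in Python. This is only an illustration, not part of the Power Query answer above: the function names and the `plan` mapping are hypothetical, and unlike Table.Combine() this sketch simply refuses to append files whose headers differ.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit, header row included


def read_csv_text(text):
    """Parse CSV text into a (header, rows) pair."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)


def combine(tables):
    """Append tables that share the same header into one table."""
    header = tables[0][0]
    combined = []
    for table_header, rows in tables:
        if table_header != header:
            raise ValueError("headers differ; cannot append")
        combined.extend(rows)
    return header, combined


def build_sheets(tables, plan):
    """plan maps a sheet name to the 0-based indexes of the files it should hold."""
    sheets = {}
    for sheet_name, indexes in plan.items():
        header, rows = combine([tables[i] for i in indexes])
        if len(rows) + 1 > EXCEL_MAX_ROWS:  # +1 for the header row
            raise ValueError(f"{sheet_name} would exceed the worksheet row limit")
        sheets[sheet_name] = (header, rows)
    return sheets


# Three tiny stand-ins for the CSV files in the folder.
tables = [
    read_csv_text("a,b\n1,2\n"),
    read_csv_text("a,b\n3,4\n"),
    read_csv_text("a,b\n5,6\n"),
]
sheets = build_sheets(tables, {"Sheet 1": [0, 1], "Sheet 2": [2]})
```

Each entry in `plan` plays the role of one append query, and the row-limit check fails early if a chosen grouping would not fit on a single sheet.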

Excel Power Query Connection and Loading into an existing Table (Query results cannot overlap a table or XML Mapping)

First post here.
I have an existing workbook that was created some time ago with a table by a user. Basically the user got an extract file and simply cut and pasted the data to the table, to the right of this data are a whole bunch of formulas within the table ... so they then just copy and paste any formulas down.
What I am trying to do is remove this effort and have it refreshed from the updated extract file of raw data.
I know I could do VB to deal with this (although not done any VB for a few years), HOWEVER I notice there is a data connection and Power Query so thought this would be a better way.
The problem is that, because there is already a Table, I cannot import into it; I get the error "Query results cannot overlap a table or XML Mapping". I understand that the connection creates/recreates the table.
I have tried methods to get round this ...
Recreate the Table and then Find & Replace the name references, but a lot of the formulas exceed the find and replace string.
Tried to convert the Table to a range and then import, but I ended up with two tables side by side. I can't see anything to merge the two, and I'm not sure this will solve the problem when I try to import again.
Any starter for ten on this will help as I've not dabbled with Excel in this way for a few years.
Regards
Gary

How to replace index/match with a connection

In order to view customer data in an Excel sheet, I have used the INDEX/MATCH functions to retrieve data. However, due to the large number of customers, the file has grown very large. Right now it is 13MB. This file is regularly sent by mail, so it is a real headache having to open it every time.
Is there a way to replace Index/Match with something else in order to reduce the file size? Transforming the source file into an SQL file? Adding a connection to the source file?
Thanks.

How to select specific rows to load into an Excel Workbook from another at run-time

I have two Excel files: "source.xlsx" holds the data and "work.xlsm" holds the macros. I can load the data from "source.xlsx" into "work.xlsm" using Excel's built-in load or using Application.GetOpenFilename. However, I don't want all of the data in source.xlsx. I only want to select specific rows, the criteria for which will be determined at run time.
Think of this as a SELECT from a database with parameters. I need to do this to limit the time and processing spent on the data handled by "work.xlsm".
Is there a way to do that?
I tried using parameterized query from Excel --> [Data] --> [From Other Sources] but when I did that, it complained about not finding a table (same with ODBC). This is because the source has no table defined, so it makes sense. But I am restricted from touching the source.
So, in short, I need to filter the data before loading it into the target sheet, without touching the source file. I want to do this either interactively or via a VBA macro.
Note: I am using Excel 2003.
Any help or pointers will be appreciated. Thx.
I used a macro to convert the source file from .xlsx to .csv format and then loaded the csv formatted file using a loop that contained the desired filter during the load.
This approach may not be the best, nevertheless, no other suggestion was offered and this one works!
The other approach is to abandon the idea of pre-filtering, accept the load-time delay, and perform the filtering and removal of unwanted rows in the "work.xlsm" file. Performance and memory size are major factors in this case, assuming code complexity is not the issue.
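The answer above used a VBA macro, which is not shown. As a rough illustration of the "filter while loading the CSV" idea only, here is a minimal Python sketch; `load_filtered`, the column names, and the sample data are all hypothetical.

```python
import csv
import io


def load_filtered(csv_file, keep_row):
    """Yield only the rows for which keep_row(row) is true, one row at a time."""
    for row in csv.DictReader(csv_file):
        if keep_row(row):
            yield row


# Stand-in for the CSV produced from source.xlsx.
sample = io.StringIO("id,region\n1,East\n2,West\n3,East\n")

wanted_region = "East"  # in practice this criterion would be chosen at run time
east_rows = list(load_filtered(sample, lambda row: row["region"] == wanted_region))
```

Because rows are tested as they are read, unwanted rows are never held in memory, which is the point of pre-filtering rather than loading everything and deleting afterwards.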

SSIS Data Flow Task Excel Source

I have a data flow task set up in SSIS.
The source is from an Excel source not an SQL DB.
The problem I seem to get is that the package is importing empty rows.
My sheet has data in 555,200 rows, but the SSIS package imports over 900,000 rows. The extra rows are imported even though they are empty.
When I then download this table into Excel, there are empty rows in between the data.
Is there any way I can avoid this?
Thanks
Gerard
The best thing to do, if you can, is to export the data to a flat file (CSV or tab-delimited) and then read it in. The problem is that even though those rows are blank, they are not really empty. So when you hop across that ODBC-Excel bridge, you get those rows as blanks.
You could possibly adjust the way the spreadsheet is generated to eliminate this problem, or manually delete the rows. The problem with these solutions is that they are not scalable or maintainable over the long term. You will also be stuck with that rickety ODBC bridge. The best long-term solution is to avoid using the ODBC-Excel bridge entirely. By dumping the data to a flat file you have total control over how to read, validate, and interpret the data. You will not be at the mercy of a translation layer that is to this day riddled with bugs and is at the best of times "quirky".
You can also add in a Conditional Split component in your Data flow task, between the source task and the destination task. In here, check if somecolumn is null or empty - something that is consistent - meaning for every valid row, it has some data, and for every invalid row it's empty or null.
Then discard the output for that condition and send the rest of the rows to the destination. You should then only get the rows that hold valid data from Excel.
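The logic of that split, independent of SSIS, can be sketched in Python. This is not SSIS expression syntax; `conditional_split`, `is_blank`, and the sample rows are hypothetical and only show the "route on one consistently populated column" rule.

```python
def is_blank(value):
    """True for None, empty strings, and whitespace-only values."""
    return value is None or str(value).strip() == ""


def conditional_split(rows, key_column):
    """Route each row to 'valid' or 'discarded' based on one key column."""
    valid, discarded = [], []
    for row in rows:
        if is_blank(row.get(key_column)):
            discarded.append(row)
        else:
            valid.append(row)
    return valid, discarded


# Stand-in for rows read from the Excel source, blank rows included.
rows = [
    {"id": "1", "name": "a"},
    {"id": "", "name": ""},
    {"id": None, "name": None},
    {"id": "2", "name": "b"},
    {"id": "  ", "name": ""},
]
valid, discarded = conditional_split(rows, "id")
```

The key column should be one that every real row fills in; otherwise genuine rows with a missing value in that column would be discarded along with the blanks.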

Resources