I have an Excel workbook with 10 tabs.
For each tab, the data is structured as shown in the screenshot (not included here). All tabs follow this same basic structure.
In Power BI, when I go to "Get Data", and then choose the .xlsx file, I get the following error:
Unable to connect
We encountered an error while trying to connect.
Details: "The input couldn't be recognized as a valid Excel document."
This is very frustrating, and I don't understand why such a simple task can't be accomplished in Power BI.
Thank you.
Such an alert can appear when you try to use the Power BI connector on an Excel file. It's understandable if the source file is corrupted and can't be opened in Excel. However, it looks strange if Excel opens the file in question and shows nothing wrong.
In our experience, the above usually means that something is wrong with the XML schema of the Excel workbook.
A Mashup trace (Data->New Query->Query Options->Diagnostics->Enable tracing) can give some additional information, but often not enough to find the reason.
We have seen two main scenarios.
1. The XML schema is incomplete
This usually happens when the Excel file was generated by a third-party tool. Such a tool can generate a quite limited XML schema, which is enough to open the file in Excel and work with it, but not enough for the Power BI connector. As an example, the trace log shows
[DataFormat.Error] The input couldn't be recognized as a valid Excel document.\r\nStackTrace:\n…
…
[DataFormat.Error] We couldn't find a part named '/xl/sharedStrings.xml' in the Excel package.\r\nStackTrace:\n…
Such a case is easy to fix: it's enough to open the file in Excel and save it (without any changes); Excel is clever enough to repair the schema. For routine regular tasks we use a PowerShell script which does exactly the same in the background.
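For reference, a minimal sketch of such a script (assuming desktop Excel is installed on the machine; the folder path is illustrative):

$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$excel.DisplayAlerts = $false
foreach ($file in Get-ChildItem "C:\Reports\*.xlsx") {
    # Open and immediately re-save so Excel rewrites a complete schema
    $wb = $excel.Workbooks.Open($file.FullName)
    $wb.Save()
    $wb.Close($false)
}
$excel.Quit()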
2. A link within the Excel file is not recognizable as valid
This usually happens when the Excel file is synced with or kept in some cloud storage. In one of the variants, a wrong link can appear after copy/paste from another such file. It could be an active link in one of the cells; or a link within a conditional formatting formula; or even a link which actually isn't used by Excel but is kept somewhere inside the schema. For example, in one of the files I found in Data->Consolidate->All references a link like
'\drive.tresorit.com#7235\Tresors….[file.xlsx]Sheet'!$AC$6:$AC$357
pointing to a file which was deleted long ago and isn't used, but for some strange reason the link was kept within the schema.
Unfortunately, for such a case the trace log doesn't give enough information to localize the issue; it looks like
[DataFormat.Error] The input couldn't be recognized as a valid Excel document.\r\nStackTrace:\n…
…
\nExceptionType: System.UriFormatException, System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089\r\nMessage: Invalid URI: The hostname could not be parsed.\r\nStackTrace:\n
Perhaps I don't have enough knowledge for a more straightforward localization of the problem, but the only way I know is to exclude parts of the Excel file one by one and check whether the issue disappears. Another way is to unzip the Excel file and check whether workbook.xml or sheetNN.xml has something suspicious inside.
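For example, with PowerShell 5 or later the inspection can be as simple as this (a sketch; the file name and search pattern are illustrative):

Copy-Item "C:\Reports\file.xlsx" "$env:TEMP\file.zip"
Expand-Archive "$env:TEMP\file.zip" "$env:TEMP\file_xml" -Force
# Search the workbook and sheet parts for a suspicious host name
Get-ChildItem "$env:TEMP\file_xml\xl" -Recurse -Include workbook.xml, sheet*.xml |
    Select-String -Pattern "tresorit"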
I am working off of a great solution created by @MattHall in 2011 to a question, which I also shared, about importing a dynamic range from Excel into Access.
Specific to that (though also in general for future VBA projects), my question is whether there is a way to point to the Excel source file if it is moved, without having to go into the VBA editor every time.
For my specific needs, I am trying to work on these Access and Excel files with others through a shared Box folder that has a different file path for whoever is working on it.
USER 1 may be: C:\Users\USER1\Box Sync\filename.xlsx
USER 2 may be: C:\Users\USER2\Box Sync\filename.xlsx
...and so forth for any other users. I am curious how we can all work off this when the file path used in the VBA created and used by USER1 is not accessible by USER2. Could there be some code that allows every user to locate the file each time through their own file path?
It would be a pain to do that, but I also do not know a better option, as we are not working off a shared server and are unfortunately limited to Box share at the moment.
EDIT: If anyone could also suggest how to integrate their recommendation into the 'Dynamic Range' code in the solution from @MattHall in the linked Stack Overflow question, that would be particularly helpful to my request.
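For illustration, this is the kind of thing I am imagining (a rough, untested VBA sketch; the file name comes from the example paths above):

' Resolve the workbook under the current user's own Box Sync folder
Function SharedWorkbookPath() As String
    SharedWorkbookPath = Environ$("USERPROFILE") & "\Box Sync\filename.xlsx"
End Function

Each hard-coded path in the existing code would then be replaced with a call to SharedWorkbookPath, so the same VBA resolves the right file for USER1, USER2, and so on.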
Coming from many different languages, I got stuck in ERP development. The thing is that our ERP uses XSL transformations to generate Excel files in SpreadsheetML format, which are saved with an .xls extension so that Windows opens them in Excel.
This results in many unnecessary clicks that our users are forced into: Excel screaming about "repairing" the file, wanting to save it with an .xml extension - so our users have to pick the correct extension in order to send it to a client or colleagues... That's just unnecessary load from the user's standpoint.
Now, due to historical reasons we are unable to ditch the XSLT (and hey, it's awesome anyway), but what we do want is to make the entire process streamlined from the user's standpoint.
Generating 2007+ formats
In the end, after researching Google thoroughly, I ended up writing an XSLT transformation "engine" which allows us to generate valid 2007+ formats. All we do now is generate the contents of sheet.xml in OpenXML format. It also allows us to generate macro-enabled Excel files.
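For a sense of what that part looks like, a minimal sheet.xml in OpenXML worksheet markup is roughly this (a sketch only; the real file also needs the surrounding package parts such as [Content_Types].xml and the relationships):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="inlineStr"><is><t>Hello</t></is></c>
    </row>
  </sheetData>
</worksheet>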
But seriously, is there something I missed? Is there any other way to generate the Excel OpenXML format with some already-maintained solution?
SpreadsheetML is one way to do it; I don't like it because it introduces a huge demand on the end user and does not support images/macros. Or maybe I'm doing something wrong?
SpreadsheetML confusion
When I say SpreadsheetML I mean the "XML Spreadsheet" (the Excel 2003 XML format). I've noticed that Microsoft also calls the OpenXML Excel format SpreadsheetML, which makes me confused.
I have a page with an interactive report. If I do a 'Control Break' and have an aggregate in place, is there a way I can export the results to Excel exactly the way they appear on the page?
When I 'Download' the report, it appears as in the third screenshot, which is not separated.
Interactive Report Results:
How I would like to export the data to Excel:
The format that is currently exported:
The download to Excel is always in CSV format. The file extension is not .xlsx but .csv. So I'd say no.
It's tough, too. Even if you were to create a custom export to Excel, you'd have to extract the current query of the report (something that has finally been made easier in 4.2, but is possible in 4.0/4.1 with third-party packages). Then you'd also have to account for the control break(s) you applied, since those are not reflected in the IR query (even with APEX_IR).
I've dabbled with generating an xlsx file and made a blog post/sample application on that, if you'd like to see what it encompasses. Be aware that this takes 'custom solution' to the extreme, though (at least in my opinion).
http://apex.oracle.com/pls/apex/f?p=10063
You could create the report in BI Publisher in Oracle; then, through APEX, you can call the report with parameters.
Actually, APEX Office Print (AOP) supports exporting Interactive Reports and Interactive Grids (and others) to Excel, exactly as you see them on the screen (so including breaks, group by, etc.).
Hi,
I cannot connect a CSV file to a spreadsheet while the file is open. I have a CSV log file that is constantly being updated. I was able to connect it to an Excel spreadsheet with a normal import from an external source, refreshing every hour. However, it's a big file, so I needed to produce the reports using Excel SQL. It will not allow me to connect to the file while it is open: it says the MS Jet database engine cannot open the file 'unknown', that it is already opened exclusively by another user, or that I need permission to view its data. If granting permission is the problem, where do I grant myself permission? On a standard import I have no problem reading the file while it is open, but otherwise I get this message and cannot proceed. If I close the update program I am able to run the queries, but not while the update is running. Any help would be appreciated.
Using MS Office 2007 on Windows 7 x64.
It will not allow me to connect to the file while it is open.
That's right, it won't - there is no way to change this.
You must find another way to solve your problem.
How big is the file? You may be able to make a copy to a temporary filename, and connect Excel to that instead.
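For example (a sketch in VBA; the paths are made up, and it only works if the logging program opens the file with shared read access):

Sub QueryLogCopy()
    Dim tmp As String
    tmp = Environ$("TEMP") & "\log_copy.csv"
    ' Snapshot the live log, then run the existing Excel SQL against the copy
    FileCopy "C:\Logs\live_log.csv", tmp
End Sub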
It sounds like you are accessing a logfile. LogParser can read CSV. In any case LogParser has an excellent SQL-like syntax and can read CSV files much more quickly and reliably than ODBC. It is also programmable from Excel VBA (or script). Perhaps you can use LogParser to extract the values of interest and then load those into your Excel table instead.
I suspect your best solution will be to use the LogParser MSUtil.LogQuery object from Excel VBA to extract the values of interest into your spreadsheet. Since I don't know what you are actually doing, this is just a guess!
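Something along these lines (an untested sketch; the query, path, and column name are placeholders):

Sub ReadLogWithLogParser()
    Dim qry As Object, rs As Object, rec As Object
    Set qry = CreateObject("MSUtil.LogQuery")
    Set rs = qry.Execute("SELECT * FROM C:\Logs\live_log.csv", _
                         CreateObject("MSUtil.LogQuery.CSVInputFormat"))
    Do While Not rs.atEnd
        Set rec = rs.getRecord
        Debug.Print rec.getValue("SomeColumn") ' the value of interest
        rs.moveNext
    Loop
    rs.close
End Sub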
I cannot recommend LogParser highly enough - it is a wonderful tool, and can read just about every standard type of logfile (CSV, TSV, W3C) as well as plain text files and the Windows NT event logs:
LogParser 2.2 Download: http://www.microsoft.com/downloads/en/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en
I am programmatically opening Excel workbooks under a folder tree to check for some project references, using the following code:
workbook = app.Workbooks.Open(fileName, false, true, Missing, Missing, .....);
foreach (Reference r in workbook.VBProject.References)
{
    // check for a specific reference here
}
This works fine, but my folder structure is very deep and I have over 20,000 spreadsheets stored in it. Sometimes, depending on the size of the Excel file, the call to Workbooks.Open() takes a long time (over 5 minutes per call on some files). Is there a faster, more efficient way to do this?
Thanks for the help
It seems like whenever you have to hit the Excel object model, you're going to take a performance hit. I agree with the previous poster: if you want to speed up performance, you'll need to read the Excel files directly.
As a side note, since Excel 2007 files (*.xlsm, *.xltm) are essentially *.zip files, you would need to find and access the vbaProject.bin file directly. A quick look points to the path as (I changed the extension so I could browse the file):
..\Book1.zip\xl\vbaProject.bin
Obviously you could dig through that bin file manually and find particular references (as suggested by the previous poster), but if you're looking to loop through all of the references in a project, you'll need to use the IStream/IStorage COM APIs. There's a great article about reverse engineering the Office BIN files here: http://www.codeproject.com/KB/cs/office2007bin.aspx. To access references in vbaProject.bin, look for the section titled "Reading or updating vbaProject.bin parts". There is also a sample C# code project that demonstrates how to read an OLE container. I just took a peek at the code sample, so I can't attest to its effectiveness, but it certainly seems in order.
Hope that helps!
I don't think you can increase the Workbooks.Open performance. However, if your main intention is just to check whether a particular reference is used by the spreadsheet, then consider opening the Excel file in binary mode and searching for the dll string (the path of the dll providing the functionality, which can be seen in the Location part of the References window).
This would be a very crude way, but if the Workbooks.Open performance is really a bottleneck then you can definitely give it a try.
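For example, a C# sketch combining the zip route from the previous answer with that raw search (the names are illustrative, and a plain byte scan can miss strings that span OLE sectors inside the bin, so treat it as a heuristic):

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class VbaReferenceScanner
{
    static bool WorkbookMentionsDll(string workbookPath, string dllName)
    {
        // .xlsm/.xltm files are zip packages; the VBA project is xl/vbaProject.bin
        using (var zip = ZipFile.OpenRead(workbookPath))
        {
            var entry = zip.GetEntry("xl/vbaProject.bin");
            if (entry == null) return false; // no macros in this workbook
            using (var ms = new MemoryStream())
            {
                entry.Open().CopyTo(ms);
                // Crude scan: reference paths appear as text inside the bin
                var text = Encoding.ASCII.GetString(ms.ToArray());
                return text.IndexOf(dllName, StringComparison.OrdinalIgnoreCase) >= 0;
            }
        }
    }
}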