Best way to connect to SharePoint Excel files from Power BI

I need to access about 9 files in a particular SharePoint subfolder for my Power BI visualisation. Each file holds different data, and I need them as separate tables.
I tried the approach below because connecting to the SharePoint folder felt really slow.
Sharepoint_Folder_Query > Connects to the SharePoint site and filters for the subfolder. It uses SharePoint.Files, because I could not get SharePoint.Contents to reach the files in the subfolder.
File_Query > References "Sharepoint_Folder_Query" above and picks out the file it needs. There are 9 File_Query queries, one for each file.
Data_Query > Again there are 9 data queries, each referencing its respective "File_Query" and performing additional manipulation on the data. These are the tables used in my visualisations.
My reasoning was that, since connecting to SharePoint takes a lot of time, I could connect just once and use references from then on.
But my refresh now takes almost 1 hour. During the refresh I can see each of my queries trying to connect to SharePoint: the message "Waiting for https://..." appears under all of them. I'm not sure what I did wrong.
For comparison, I initially had the files in OneDrive, used only the Data_Query section, and built all the visualisations. Refresh then took only 30 seconds.
So my question: is something wrong with my approach, and if so, what? Also, is there a better way to do this that reduces the refresh time?
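One plausible explanation, consistent with every query showing "Waiting for https://...", is that a Power Query reference shares the upstream query's definition rather than its result, so each dependent query may evaluate the SharePoint step on its own. The Python sketch below only illustrates that distinction between re-running a step and reusing a stored result; it is not Power Query code, and `list_folder` and the file names are hypothetical.

```python
import functools

# Counts how many times the (hypothetical) slow SharePoint step runs.
CALLS = {"listing": 0}


def list_folder():
    """Stand-in for the expensive step: listing the SharePoint subfolder."""
    CALLS["listing"] += 1
    return {f"file{i}.xlsx": f"contents of file {i}" for i in range(1, 10)}


def file_query_by_reference(name):
    """Like a query that references another: it re-runs the upstream step."""
    return list_folder()[name]


@functools.lru_cache(maxsize=None)
def cached_listing():
    """The upstream result computed once and kept in memory."""
    return list_folder()


def file_query_cached(name):
    """Reads from the single stored listing instead of re-running it."""
    return cached_listing()[name]


if __name__ == "__main__":
    names = [f"file{i}.xlsx" for i in range(1, 10)]
    for n in names:
        file_query_by_reference(n)
    print("listing ran", CALLS["listing"], "times by reference")  # 9
    CALLS["listing"] = 0
    for n in names:
        file_query_cached(n)
    print("listing ran", CALLS["listing"], "times with caching")  # 1
```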

Related

Live Connection to Database for Excel Power Query?

I currently have a table of approximately 10M rows and ~50 columns that I wrap up and share as a pivot. However, downloading the CSV takes approximately 30 minutes to 1 hour, and a Power Query ODBC connection directly to Redshift takes much longer.
So far the best solution I've found is to use Python: redshift_connector to run the update queries and UNLOAD a gzipped result set to an S3 bucket, then boto3/gzip to download and unzip the file, and finally a refresh from the CSV. This produces a 600MB Excel file in ~15-20 minutes.
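The unzip step of this pipeline can be done with the standard library alone. A minimal sketch, assuming the UNLOAD wrote several gzipped, headerless CSV parts (UNLOAD writes multiple files in parallel by default) that boto3 has already downloaded to a local folder; the file names are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_gzip_parts(part_paths, out_path):
    """Decompress gzipped CSV parts in name order and append them into one CSV.

    Assumes the parts are headerless (UNLOAD without the HEADER option),
    so they can be concatenated byte for byte.
    """
    parts = sorted(Path(p) for p in part_paths)  # materialize before creating the output
    with open(out_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams; never holds a whole part in memory
    return Path(out_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Hypothetical part files standing in for what boto3 downloaded from S3.
        for i, chunk in enumerate([b"a,1\nb,2\n", b"c,3\n"]):
            with gzip.open(tmp / f"extract_part_{i:02d}.gz", "wb") as f:
                f.write(chunk)
        merged = merge_gzip_parts(tmp.glob("extract_part_*.gz"), tmp / "extract.csv")
        print(merged.read_bytes())
```

Streaming with `shutil.copyfileobj` keeps memory flat regardless of how large each part is.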
However, this process still feels clunky, and sharing a 600MB Excel file among teams isn't ideal either. I've searched for several days, but I'm no closer to finding an alternative: what would you use if you had to share a drillable table/pivot among a team with a 10GB datastore?
As a last note: I thought about programming a couple of PHP scripts, but my office doesn't have the infrastructure to support that.
Any help or ideas would be most appreciated!
Call a meeting with the team and explain the constraints; they will have suggestions, and you can offer yours.
Suggestions from my side:
For the file part:
Reduce the data. For example, if it is time-dependent, increase the interval: hourly data can be aggregated into daily data.
If the data is organized by group, divide the file into parts, one file per group.
Or send them only the final reports and numbers they need, not the full data.
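The first two suggestions are easy to sketch in Python: aggregating hourly readings into daily totals, and splitting records into one CSV per group. The field names, the ISO timestamp format, and the output file names are assumptions for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def hourly_to_daily(rows):
    """Sum (ISO-8601 timestamp, value) readings into one total per calendar day."""
    totals = defaultdict(float)
    for ts, value in rows:
        totals[datetime.fromisoformat(ts).date().isoformat()] += value
    return dict(sorted(totals.items()))


def split_by_group(records, group_field, out_dir):
    """Write one CSV per distinct value of group_field; return the paths written."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, recs in sorted(groups.items()):
        path = out_dir / f"{name}.csv"  # assumes group values are safe file names
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(recs[0]))
            writer.writeheader()
            writer.writerows(recs)
        paths.append(path)
    return paths


if __name__ == "__main__":
    readings = [("2024-01-01T00:00", 1.0), ("2024-01-01T13:00", 2.5), ("2024-01-02T05:00", 4.0)]
    print(hourly_to_daily(readings))  # {'2024-01-01': 3.5, '2024-01-02': 4.0}
    sales = [{"region": "east", "amount": "10"}, {"region": "west", "amount": "7"},
             {"region": "east", "amount": "3"}]
    with tempfile.TemporaryDirectory() as tmp:
        for p in split_by_group(sales, "region", tmp):
            print(p.name)
```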
For a fully functional app:
Buy a desktop PC (if budget is a constraint, buy a used one or repurpose a desktop or laptop from old inventory) and create a PHP/Python web application that performs all the steps automatically.
Create a local database and connect it to the application.
Build the charting, pivoting, etc. modules into that application, and remove Excel from your process altogether.
You can even use prebuilt applications for the charting and pivoting part; Oracle APEX is one example.

Options for running data extraction on a daily basis

I currently have an Excel-based data extraction method using Power Query and VBA (for password-protected documents). Ideally it would be scheduled to run once or twice a day.
My current solution is a spare laptop on the network that runs the extraction twice a day on its own. This works, but I am keen to understand other options. The task itself seems to be quite a struggle for our standard hardware: it covers 6 network locations across 2 servers, with around 30,000 rows and growing.
Any suggestions would be greatly appreciated
Thanks
If you are going to work with growing data and dedicate a laptop exclusively to the process, I would consider installing a database on the laptop (MySQL, for example). You can use Access too, but Access file corruption is a risk.
Download all the data you need for your report into this database using incremental downloads (only new, modified, and deleted records).
Then run the Excel report against this database on the same computer.
This should improve the performance of your solution.
Your biggest problem is probably that you query ALL the data every time you generate the report.
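A minimal sketch of the incremental-download idea, using Python's built-in sqlite3 in place of MySQL or Access (swapped in only so the example is self-contained). It assumes each source row has a stable id and a last-modified stamp; the `records` table and the snapshot format are hypothetical. For simplicity the whole source snapshot is passed in; in practice you would fetch only rows modified since the last sync, plus the current list of ids (needed to detect deletions).

```python
import sqlite3


def incremental_sync(conn, source_rows):
    """Apply only new, modified and deleted rows to the local `records` table.

    source_rows maps id -> (payload, modified_at). Returns (inserted, updated, deleted).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)"
    )
    local = dict(conn.execute("SELECT id, modified_at FROM records").fetchall())
    inserted = updated = 0
    for row_id, (payload, modified_at) in source_rows.items():
        if row_id not in local:  # new in the source
            conn.execute("INSERT INTO records VALUES (?, ?, ?)", (row_id, payload, modified_at))
            inserted += 1
        elif local[row_id] != modified_at:  # changed since the last sync
            conn.execute(
                "UPDATE records SET payload = ?, modified_at = ? WHERE id = ?",
                (payload, modified_at, row_id),
            )
            updated += 1
    deleted = [(row_id,) for row_id in local if row_id not in source_rows]
    conn.executemany("DELETE FROM records WHERE id = ?", deleted)  # gone from the source
    conn.commit()
    return inserted, updated, len(deleted)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(incremental_sync(conn, {1: ("a", "t1"), 2: ("b", "t1")}))   # (2, 0, 0)
    print(incremental_sync(conn, {1: ("a2", "t2"), 3: ("c", "t1")}))  # (1, 1, 1)
```

Unchanged rows are never rewritten, which is where the time saving over a full reload comes from.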

Creating an Excel library (DLL for Excel)?

I am working on a project in Excel and am starting to prepare my document for future performance problems. The Excel file contains large amounts of data and many images, all grouped in sets (i.e., 40 images belong to one function of the program, another 50 belong to another, etc.), and only one set is used at a time.
This file is only going to get bigger as the number of jobs/functions it has to handle increases. I could just make multiple Excel files and let the user choose the appropriate one for the job, but the requirement is that everything is done from one file.
Bearing this in mind, I started thinking about ways to build such a large file whilst keeping its performance high, and had an idea that I am not sure is possible. The idea is to have multiple protected workbooks, each containing the information for one job "set", and a main workbook that accesses these files depending on the user's inputs. This would result in many Excel files that take time to download initially, but in use it should eliminate the performance issues, as the computer only has to access a subset of these files.
From what I understand, this is roughly what DLLs are for, but I am not sure whether Excel can do the same, and if so, whether the performance gain would be significant.
If anyone has other suggestions or elegant solutions for how this can be done, please let me know.
Rather than saving data such as images in the Excel file itself, write your macro to load the appropriate images from files and have your users select which routine to run. This way, you load only the files you need. If your data is text/numbers, you can store it in a CSV or, if your data gets very large, use a Microsoft Access database and retrieve the data using the ADODB library.
Inserting Images: How to insert a picture into Excel at a specified cell position with VBA
More on using ADODB: http://msdn.microsoft.com/en-us/library/windows/desktop/ms677497%28v=vs.85%29.aspx
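The answer's approach is VBA, but the underlying pattern (keep each image set as files on disk and load only the active set) can be sketched in Python. The folder layout, the suffix list, and the `ImageSetLoader` class are hypothetical.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}


class ImageSetLoader:
    """Keep only the active image set in memory, reading it from disk on demand.

    Assumed (hypothetical) layout: <root>/<set name>/<image files>.
    """

    def __init__(self, root):
        self.root = Path(root)
        self.active_name = None
        self.active_images = {}

    def load(self, set_name):
        if set_name != self.active_name:  # only touch the disk when the set changes
            folder = self.root / set_name
            self.active_images = {
                p.name: p.read_bytes()
                for p in sorted(folder.iterdir())
                if p.suffix.lower() in IMAGE_SUFFIXES
            }
            self.active_name = set_name
        return self.active_images


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for set_name, image in [("job_a", "pump.png"), ("job_b", "valve.jpg")]:
            (Path(tmp) / set_name).mkdir()
            (Path(tmp) / set_name / image).write_bytes(b"fake image bytes")
        loader = ImageSetLoader(tmp)
        print(sorted(loader.load("job_a")))  # ['pump.png']
```

Switching sets replaces the previous set's bytes, so memory use tracks the size of one set rather than all of them.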

Need to create an SSIS Package for users to directly modify a table

I need to allow a couple of users to modify a table in my database, preferably as part of an integrated package that then submits the changes into our live database.
Please allow me to explain further:
We have an automated import task from one database system into another, with data transformation on the way through.
As part of this task, various checks are run before the final import and any rows with incomplete or incorrect data are sent to a rejections table and deleted from the import table.
I now need to give a couple of senior users the ability to view and correct the missing/incorrect entries in the rejection table, before re-staging them and submitting them to the live database.
(Obviously, it will be re-checked before submission and re-rejected if it is still wrong).
Can anyone tell me what I need to do in SSIS to display the contents of a specific table (e.g. MyDatabase.dbo.Reject_Table) to the user running this package from their local PC (the package will, of course, be located on the server)?
Then they need to be able to modify the contents of the table, either 1 row at a time or en masse (I'm not bothered which).
When that is done, they hit a "Continue" or "Next" type button, which then continues to run the remainder of the package, which I am more than comfortable writing.
It is only the interactive stage(s) that I am struggling with and I would really appreciate some advice.
Thanks
Craig
That is non-native functionality in SSIS.
You can write pretty much anything you want in a script task and that includes GUI components. (I once had a package play music). In your data flow, you would have a Script Component that edits each row passing through the component.
Why this is a bad idea
Suitability - This isn't really what SSIS is for. The biggest challenge you'll run into is that the data flow is tightly bound to the shape of the data. The reject table for Customer is probably different from the reject table for Phone.
Cost - How are you going to allow those senior users to run SSIS packages? If the answer involves installing SSIS on their machines, you are looking at a production license for SQL Server. That's roughly 8k to 23k per socket for SQL Server 2005-2008R2 and something insane per core for SQL Server 2012+.
What is a better approach
As always, I would decompose the problem into smaller tasks until I can solve it. I'd make 2 problem statements:
As a data steward, I need the ability to correct (edit) incomplete data so that data can be imported into our application.
As an X, I need the ability to import (workflow) corrected rejected data so that we can properly bill our customers (or whatever the reason is).
Editing data. I'd make a basic web page or thick client app to provide edit capability. A DataGridView would be one way of doing it. Heck, you could forgo custom development and just slap an Access front end on the tables and let them edit the data through that.
Import corrected data. This is where I'd use SSIS, but possibly not exclusively. I'd probably look at adding a column to all the reject tables that indicates whether a row is ready for reprocessing. For each reject table, I'd have a package that looks for any rows flagged as ready. I'd probably use a delete-first pattern to remove the flagged data and either insert it into the production tables or route it back into the reject table for further fixing. The mechanism for launching the packages could be whatever makes sense. Since I'm lazy, I'd:
Have a SQL Agent job that runs the packages
Create a stored proc that can start that job
Grant security on that stored proc to the data stewards
Provide the stewards a big red button that says Import. How that's physically implemented would depend on how you solved the edit question.
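The flag-and-reprocess step can be sketched in Python with sqlite3 standing in for SQL Server (an assumption made so the example is self-contained; in the answer this logic lives in SSIS packages). The `ready` column, the `customer`/`reject_customer` tables, and the validity check are hypothetical. Flagged rows are deleted from the reject table first, then routed to production or back to the reject table.

```python
import sqlite3


def create_demo_schema(conn):
    """Hypothetical production and reject tables; `ready` is the reprocessing flag."""
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT NOT NULL);
        CREATE TABLE reject_customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT,
                                      ready INTEGER NOT NULL DEFAULT 0);
        """
    )


def reprocess_rejects(conn, is_valid):
    """Delete-first reprocessing of flagged reject rows.

    The delete and the inserts are committed together (or rolled back on error)
    by the connection context manager. Valid rows go to production; the rest
    return to the reject table unflagged. Returns (moved_to_production, sent_back).
    """
    with conn:
        flagged = conn.execute(
            "SELECT id, name, phone FROM reject_customer WHERE ready = 1"
        ).fetchall()
        conn.execute("DELETE FROM reject_customer WHERE ready = 1")
        moved = 0
        for row in flagged:
            if is_valid(row):
                conn.execute("INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)", row)
                moved += 1
            else:
                conn.execute(
                    "INSERT INTO reject_customer (id, name, phone, ready) VALUES (?, ?, ?, 0)",
                    row,
                )
    return moved, len(flagged) - moved


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_demo_schema(conn)
    conn.executemany("INSERT INTO reject_customer VALUES (?, ?, ?, ?)",
                     [(1, "Ann", "555-0100", 1), (2, "Bob", None, 1)])
    conn.commit()
    print(reprocess_rejects(conn, lambda row: all(row)))  # (1, 1)
```

The "big red button" would then only need to flip `ready` and start whatever runs this step.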

How do we get around the Lotus Notes 60 GB database barrier?

Are there ways to get around the upper database size limit on Notes databases? We are compacting a database that is still approaching 60 gigs in size. Thank you very much if you can offer a suggestion.
Even if you could find a way to get over the 64GB limit, it would not be the recommended solution. Splitting the application into multiple databases is far better if you wish to improve performance and retain the stability of your Domino server. If you think you need everything in the same database in order to search it, look up domain search and multi-database search in the Domino Administrator help.
Maybe some parts of the data are "old" and could be put into one or more archive databases instead?
Maybe you have a lot of large attachments and can store them in a series of attachment databases?
Maybe you have a lot of complicated views that can be streamlined or eliminated, thereby saving a lot of space and keeping everything in the same database for the time being? (Remove sorting on columns where it isn't needed; enabling "click on column header to sort" is a sure way to increase the size of the view index.)
I'm assuming your database is large because of file attachments as well. In that case, look into DAOS: it stores all file attachments on the file system (server functionality, transparent to clients and existing applications).
As a bonus it finds duplicates and stores them only once.
More here: http://www.ibm.com/developerworks/lotus/library/domino-green/
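To illustrate "finds duplicates and stores them only once", here is a content-addressed store in Python keyed by a SHA-256 digest. It shows the general deduplication idea only; it is not how DAOS is implemented, and the class and method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Content-addressed store: identical attachment bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> file bytes (one copy per distinct content)
        self.refs = {}   # (document id, file name) -> digest

    def attach(self, doc_id, filename, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store the bytes only if unseen
        self.refs[(doc_id, filename)] = digest
        return digest

    def read(self, doc_id, filename):
        return self.blobs[self.refs[(doc_id, filename)]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    store = DedupStore()
    report = b"quarterly figures" * 1000
    store.attach("memo-1", "report.xlsx", report)
    store.attach("memo-2", "report copy.xlsx", report)  # same bytes, different document
    print(len(store.refs), "attachments,", len(store.blobs), "stored copy")  # 2 attachments, 1 stored copy
```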
Just a stab in the dark:
Use the DB2 storage method instead of standard NSF storage on the Domino server?
I'm guessing that 80-90% of that space is taken up by file attachments. My suggestion is to move all the attachments to a file share, provided everyone can access that share, or to an FTP server that everyone can connect to.
It's not ideal because security becomes an issue - now you need to manage credentials to the Notes database AND to the external file share - however it'll be worth the effort from a Notes administrator's perspective.
In the Notes documents, just provide a link to the file. If users are adding these files via a Notes form, perhaps you can add some background code to extract the file from the document after it has been saved, and replace it with a link to that file.
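The extract-and-link step can be sketched in Python; in Notes itself this would more likely be LotusScript running after the document is saved. The dictionary standing in for a Notes document, the per-document folder layout, and `offload_attachments` are all hypothetical.

```python
import tempfile
from pathlib import Path


def offload_attachments(doc, share_root):
    """Write a document's attachments to a file share and replace them with links.

    `doc` is a hypothetical stand-in for a Notes document:
    {"id": ..., "attachments": {file name: bytes}, "links": [file URIs]}.
    """
    target_dir = Path(share_root) / str(doc["id"])
    target_dir.mkdir(parents=True, exist_ok=True)
    for name, data in list(doc["attachments"].items()):
        path = (target_dir / name).resolve()
        path.write_bytes(data)              # copy to the share first...
        doc["links"].append(path.as_uri())
        del doc["attachments"][name]        # ...then drop it from the document
    return doc


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as share:
        doc = {"id": "DOC42", "attachments": {"spec.pdf": b"%PDF-1.4 ..."}, "links": []}
        offload_attachments(doc, share)
        print(doc["attachments"], doc["links"])
```

Writing the file before removing the attachment means a failure part-way leaves the original in place rather than losing it.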
The 64GB is not actually an absolute limit; you can go above it. I've seen 80GB and even close to 100GB, although once you're past 64GB you can get problems at any time. The limit is not actually in Notes; it's the underlying file system (I've seen this on AS400). But the great thing about Notes is that if you do get a huge crash, you can still access all the documents and pull everything out to new copies using scheduled agents, even if you can no longer get views to open in the client.
Your best bet is regular archiving. If it is file attachments, anything over two years old doesn't need to be in the main system, just a brief synopsis and a link. You could even have a 5-year archive, a 2-year archive, a 1-year archive, etc. Data will continue to accumulate and has to be managed, irrespective of what platform you use to store it.
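The tiered archiving rule can be sketched in Python (a real Notes implementation would more likely be a scheduled LotusScript agent). The tier boundaries follow the 5/2/1-year example above; the database names and functions are hypothetical.

```python
from datetime import date

# Hypothetical archive tiers, checked oldest first: (minimum age in years, target database).
TIERS = [(5, "archive_5yr.nsf"), (2, "archive_2yr.nsf"), (1, "archive_1yr.nsf")]


def whole_years(earlier, later):
    """Number of full years between two dates."""
    years = later.year - earlier.year
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1  # anniversary not reached yet this year
    return years


def archive_target(created, today, tiers=TIERS, main_db="main.nsf"):
    """Pick the database a document should live in, based on its age."""
    age = whole_years(created, today)
    for min_years, target in tiers:
        if age >= min_years:
            return target
    return main_db


if __name__ == "__main__":
    today = date(2024, 6, 1)
    for created in [date(2018, 1, 1), date(2021, 7, 1), date(2023, 5, 1), date(2024, 1, 1)]:
        print(created, "->", archive_target(created, today))
```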
If the issue really is large file attachments, I would certainly recommend looking into implementing DAOS on your server/database. It is only available with Domino Server 8.5 and later. On the other hand, if your database contains more than 100,000 documents, you may want to look seriously at dividing the data into multiple NSFs; at that number of documents, you need to be very careful about your view design, your lookup code, etc.
Some documented successes with DAOS:
http://www.edbrill.com/ebrill/edbrill.nsf/dx/yet-another-daos-success-story-from-darren-duke?opendocument&comments
If your database is getting to 60GB, don't use a Domino solution; you need to switch to a relational database. You need to archive or move documents across several databases. Although you can get to 60GB, you shouldn't. The performance hit for active databases is significant; it's not so much a problem for static databases.
I would also look at removing any unnecessary views and their indexes. View indexes can occupy 80-90% of your disk space. If you can't remove them, simplify their sorting arrangements/formulas and remove any unnecessary column sorting options. I halved a 50GB database to 25GB with a few simple changes like this, and virtually no users noticed.
One path could be, for once, to start with the user. Do all the users need to access all that data all the time? If not, it's time to split or archive. If yes, there is probably a flaw in the design of the application.
Technically, I would add to the previous comments a suggestion to check the many options for compaction. Quick and dirty: discard all view indices, but be sure to rebuild at least the one for the default view if you don't want your users to riot. See updall.
One more thing to check: make sure you have checked
[x] Use LZ1 compression for attachments
in the database properties.
