We have a cube that contains 1.6 years of data and it is taking a long time to load. Previously we got a memory error, but we have since increased the SAP memory size. Can anyone suggest ways to troubleshoot this, or any best practices we can follow?
We are currently pulling 30-35 combinations of dimensions and characteristics and it is still taking a lot of time; we can't afford to wait that long only to hit the error and then act on it.
This is an internal MDX limitation that you have to live with. To mitigate it, use filters or variables to restrict the volume of data returned. If you don't mind moving data out of SAP onto Azure storage first, you will get a much better user experience by pointing Power BI at Azure DW, Azure SQL Database, or even Blob storage. Otherwise, you will be stuck with the SAP bottleneck.
Because Power BI and ADF share the same underlying engine to access SAP BW, you can check out our blog for a comparison and further explanation in the context of ADF and BW integration:
I have a requirement to write up to 500k records daily to Azure SQL DB using an ADF pipeline.
I have simple calculations as part of the data transformation that can be performed in a SQL Stored Procedure activity. I've also observed Databricks notebooks being used commonly, especially for the scalability benefits going forward. But there is the overhead of placing files in another location after transformation, managing authentication, etc., and I want to avoid any over-engineering unless absolutely required.
I've tested SQL Stored Proc and it's working quite well for ~50k records (not yet tested with higher volumes).
But I'd still like to know the general recommendation between the 2 options, esp. from experienced Azure or data engineers.
Thanks
I'm not sure there is enough information to make a solid recommendation. What is the source of the data? Why is ADF part of the solution? Is this 500K rows once per day or a constant stream? Are you loading to a Staging table then using SPROC to move and transform the data to another table?
Here are a couple thoughts:
If the data operation is SQL to SQL [meaning the same SQL instance for both source and sink], then use Stored Procedures. This allows you to stay close to the metal and will perform the best. An exception would be if the computational load is really complicated, but that doesn't appear to be the case here.
Generally speaking, the only reason to call Databricks from ADF is if you already have that expertise and the resources already exist to support it.
Since ADF is part of the story, there is a middle ground between your two scenarios: Data Flows. Data Flows are a low-code abstraction over Databricks. They are ideal for in-flight data transforms and perform very well at high loads. You do not author or deploy notebooks, nor do you have to manage the Databricks configuration. And they are first-class citizens in ADF pipelines.
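To make the staging-table pattern from the questions above concrete, here is a minimal sketch. It uses an in-memory SQLite database purely as a stand-in for Azure SQL Database, and a set-based INSERT ... SELECT in place of a real stored procedure (SQLite has none). The table and column names (staging_orders, orders, unit_price) are hypothetical and not taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for Azure SQL Database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER, quantity INTEGER, unit_price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, quantity INTEGER, total REAL);
""")


def load_staging(rows):
    """Bulk-load raw rows into the staging table (the copy step)."""
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)


def transform_and_move():
    """Set-based transform, standing in for the stored procedure body."""
    with conn:  # one transaction: insert into the target, then clear staging
        conn.execute("""
            INSERT INTO orders (order_id, quantity, total)
            SELECT order_id, quantity, quantity * unit_price
            FROM staging_orders
        """)
        conn.execute("DELETE FROM staging_orders")


load_staging([(1, 2, 10.0), (2, 1, 99.5), (3, 4, 2.25)])
transform_and_move()
rows = conn.execute("SELECT order_id, total FROM orders ORDER BY order_id").fetchall()
staging_left = conn.execute("SELECT COUNT(*) FROM staging_orders").fetchone()[0]
print(rows, staging_left)
```

The point of the design is that the transform runs inside the database as one set-based statement, so the rows never leave the SQL engine between load and final insert.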
As an experienced (former) DBA, data engineer and data architect, I cannot see what Databricks adds in this situation. The piece of the architecture you might need to scale is the target for the INSERTs, i.e. Azure SQL Database, which is ridiculously easy to scale either manually via the portal or via the REST API, if that is even required. Consider techniques such as loading into heaps and partition switching if you need to tune the insert.
The overhead of adding an additional component to your architecture and then taking your data through it would have to be worth it, plus there is the additional cost of spinning up Spark clusters at the same time your database is running.
Databricks is a superb tool and has a number of great use cases, eg advanced data transforms (ie things you cannot do with SQL), machine learning, streaming and others. Have a look at this free resource for a few ideas:
https://databricks.com/p/ebook/the-big-book-of-data-science-use-cases
Our team have just recently started using Application Insights to add telemetry data to our windows desktop application. This data is sent almost exclusively in the form of events (rather than page views etc). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake structures for data warehousing. This looks like it might help in providing the answers we need.
My question is quite simple: Is this the right approach? Have we over complicated things? My current feeling is that a better approach will be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.
Definitely pursue Michael Milirud's answer; if your source product has suitable analytics, you might not need a data warehouse.
Traditionally, a data warehouse has three advantages: it integrates information from different data sources, both internal and external; it cleanses and standardises data across sources; and it keeps the history of change over time, so that data is available in its historic context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like Power BI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the Power BI data engine, but larger data might need pass-through queries.
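As a rough illustration of what "facts and dimensions" means for event telemetry, the sketch below splits flat event records into one fact table and three small dimension tables with integer surrogate keys. The field names (name, app_version, timestamp) and the sample events are made up for the example; real Application Insights exports have a different and richer shape.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Maps each distinct natural value to a small integer surrogate key."""
    keys: dict = field(default_factory=dict)

    def key_for(self, value):
        if value not in self.keys:
            self.keys[value] = len(self.keys) + 1
        return self.keys[value]


def build_star(events):
    """Turn flat events into fact rows that reference dimension keys."""
    dim_event, dim_version, dim_date = Dimension(), Dimension(), Dimension()
    facts = []
    for e in events:
        facts.append({
            "event_key": dim_event.key_for(e["name"]),
            "version_key": dim_version.key_for(e["app_version"]),
            "date_key": dim_date.key_for(e["timestamp"][:10]),  # calendar day
            "count": 1,  # the measure: one row per event occurrence
        })
    return facts, {"event": dim_event, "version": dim_version, "date": dim_date}


# Hypothetical sample events, not real telemetry.
events = [
    {"name": "FileOpened", "app_version": "1.0", "timestamp": "2016-03-01T09:15:00"},
    {"name": "FileSaved", "app_version": "1.0", "timestamp": "2016-03-01T09:20:00"},
    {"name": "FileOpened", "app_version": "1.1", "timestamp": "2016-03-02T11:00:00"},
]
facts, dims = build_star(events)
print(facts)
print(dims["event"].keys)
```

A reporting tool can then group the fact rows by any dimension key (event name, version, day) without re-parsing the raw events.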
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product Ajilius automates the development of star schema data warehouses, reducing development time to days or weeks. There are a number of other products doing a similar job; we maintain a complete list of industry competitors to help you choose.
I would continue with Power BI - it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks - Power BI Desktop is superior in many aspects. The design approaches remain consistent - star schemas etc, but you build them in-memory within PBI. It's way more flexible and agile.
Also, are you aware that Application Insights (AI) can be connected directly to the Power BI web service? This connects to your AI data in minutes and gives you Power BI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/
What we ended up doing was sending events from our WinForms app not directly to AI but to an Azure Event Hub.
We then created a job that reads from the Event Hub and sends the data to:
AI using the SDK
Blob storage for later processing
Azure table storage to create powerbi reports
You can of course add more destinations.
So basically all events are sent to one destination and from there stored in many destinations, each for its own purpose. We definitely did not want to be restricted to 7 days of raw data, and storage is cheap; blob storage can also be used by many Azure and Microsoft analytics solutions.
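The fan-out idea described here can be sketched as follows. This is not Event Hub or Application Insights SDK code: plain in-memory lists stand in for the three destinations, and the FanOut class and the event fields are hypothetical names chosen for the illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Sink = Callable[[Event], None]


class FanOut:
    """Forwards each incoming event to every registered sink."""

    def __init__(self) -> None:
        self.sinks: Dict[str, Sink] = {}

    def add_sink(self, name: str, sink: Sink) -> None:
        self.sinks[name] = sink

    def dispatch(self, event: Event) -> List[str]:
        """Send the event to all sinks; return the names of sinks that failed."""
        failed = []
        for name, sink in self.sinks.items():
            try:
                sink(event)
            except Exception:
                # One failing destination should not block the others.
                failed.append(name)
        return failed


# In-memory lists stand in for the three destinations named in the post.
app_insights, blob_archive, table_rows = [], [], []

hub = FanOut()
hub.add_sink("app_insights", app_insights.append)
hub.add_sink("blob", blob_archive.append)
hub.add_sink("table", lambda e: table_rows.append((e["name"], e["user"])))

for evt in [{"name": "AppStarted", "user": "u1"}, {"name": "ReportPrinted", "user": "u2"}]:
    failures = hub.dispatch(evt)
```

Adding another destination later is a single add_sink call, which is the main attraction of routing everything through one ingestion point.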
The Event Hub can be linked to Stream Analytics as well.
More information about Event Hubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
You can start using the recently released Application Insights Analytics feature. In Application Insights we now let you write any query you like, so that you can get more insights out of your data. Analytics runs your queries in seconds and lets you filter, join and group by any property, and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/
A brief summary of the project I'm working on:
I was hired as a web dev intern at a small company (part of a larger corporation) close to the state college I attend. For the past couple of months, two other interns and I have been working on the front-end as well as the back-end. The company is prototyping adding sensors to its products (oil/gas industry); we were tasked with building the portal that customers can log in to in order to see data from their machines even when they're not near them.
Basically, we're collecting sensor data (~ten sensors/machine) and it's sent back to us. Where we're stuck is determining the best way to store and analyze long-term data. We have a Redis Cache set up for fast access by the front-end, where only the latest set of data for each machine is stored. But for historical data, my coworkers and I are having a tough time deciding the best route to go. Our whole project is based in VS (C#/Razor) with Azure integration (which is amazing, by the way), so I'd like to keep the long-term storage there as well. As far as I can tell, HDInsight + data in a blob seems to be the best option, but I'm fairly green when it comes to backend solutions. I would just like input from some older developers who may have more experience in this area, as we are the only developers here besides a couple of older members who are more involved in the engineering side of things than in development.
So, professionals of Stack Overflow, what would be your recommendation for long-term data storage and analytics?
PS: I apologize if I have HDinsight confused. From what I understand, it maps data in BLOB storage into HBase for easier analytics? Hadoop/HBase confuses me.
My first recommendation would be Azure Table storage. It provides a highly scalable and low cost data archival solution. If designed properly, you can also get a very decent query performance. Refer to the Azure Storage Table Design Guide for more details.
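For the Table storage option, most of the design work is choosing the PartitionKey and RowKey. Below is one possible key scheme for sensor readings, offered as a sketch only: it assumes one partition per machine per month and an inverted timestamp as RowKey so that the newest reading sorts first. The machine ID format, the monthly bucketing and the MAX_MS constant are assumptions for illustration, not something the question specifies.

```python
from datetime import datetime, timezone

# Assumed upper bound on epoch milliseconds, used to invert the timestamp.
MAX_MS = 10**13


def make_keys(machine_id: str, reading_time: datetime):
    """Build (PartitionKey, RowKey) for one sensor reading."""
    # One partition per machine per month keeps a machine's history together
    # while bounding the size of any single partition.
    partition_key = f"{machine_id}_{reading_time:%Y%m}"
    epoch_ms = int(reading_time.timestamp() * 1000)
    # Row keys sort lexicographically, so a zero-padded inverted timestamp
    # makes the most recent reading come first within the partition.
    row_key = f"{MAX_MS - epoch_ms:013d}"
    return partition_key, row_key


earlier = make_keys("pump-17", datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc))
later = make_keys("pump-17", datetime(2024, 1, 15, 12, 5, tzinfo=timezone.utc))
print(earlier, later)
```

With keys like these, "latest N readings for machine X this month" becomes a query against a single partition that reads rows in key order.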
My second choice would be Azure DocumentDB service which is a NoSQL document database. It costs a bit more but querying data is much more flexible.
You should only go with HDInsight when you have a specific need as it's a resource-intensive and expensive service. Once you identify a specific requirement for a big-data analysis that's when you import your data and process it with HDInsight.
We are currently running on Azure and we have a table with hundreds of millions of rows. This table is static and will be refreshed weekly. We've looked at the ColumnStore index, but unfortunately it is not available in Azure yet, so here are my questions:
Will the ColumnStore index be available in Azure?
If not, what other technology can we use to get the same performance benefits that the ColumnStore index would provide?
Can we get the same query performance by using Azure Table Storage?
I'm a newbie to both Azure and columnar databases, so please bear with me if these questions are basic. :)
About ColumnStore: if you have bought a license, you can check with the development team or ask on blogs such as ScottGu's Blog. That is the only way you will find out about upcoming feature releases.
Azure Table storage is designed for scalability, and you will need to choose the Partition Key very wisely. The Partition Key is like the index of a book: if you want to find something in the book, you can refer to the index and reach the page quickly. In other words, you can group data by certain criteria and store each group in a single partition, so a query that filters on those criteria hits only one partition. A table can have any number of partitions, but the partitions will not necessarily reside on the same machine or even the same farm. So when you run a query against a badly designed Azure Table, it can hit more than one server, which results in bad performance. Read "Real World: Designing a Scalable Partitioning Strategy for Windows Azure Table Storage".
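The effect of the partition key on a query can be shown with a toy model. The class below is not Azure Table storage; it is a hypothetical in-memory stand-in that groups entities by PartitionKey and reports how many partitions a query had to scan. The Region and Amount fields are invented sample data.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model of a table whose entities are grouped by PartitionKey."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, entity):
        self.partitions[entity["PartitionKey"]].append(entity)

    def query(self, predicate, partition_key=None):
        """Return (matches, partitions_scanned)."""
        if partition_key is not None:
            # The filter names the partition: only that one is touched.
            targets = [partition_key] if partition_key in self.partitions else []
        else:
            # No partition key in the filter: every partition must be scanned.
            targets = list(self.partitions)
        matches = [e for pk in targets for e in self.partitions[pk] if predicate(e)]
        return matches, len(targets)


table = PartitionedTable()
for region, amount in [("EU", 50), ("EU", 150), ("US", 200), ("APAC", 20)]:
    table.insert({"PartitionKey": region, "Amount": amount})

eu_big, scanned_with_key = table.query(lambda e: e["Amount"] > 100, partition_key="EU")
all_big, scanned_without_key = table.query(lambda e: e["Amount"] > 100)
print(len(eu_big), scanned_with_key, len(all_big), scanned_without_key)
```

In the real service the scanned partitions may live on different servers, which is why the second kind of query is the one to design away.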
Hope you get what you are looking for.
As Amar pointed out, keep an eye on the team blogs for the latest new feature announcements. The goal for SQL Azure is for it to eventually be where new features are found first. However, it will still take a while for things to get there.
As for your performance question, there's no simple answer. Windows Azure resources are designed for scale, not necessarily high performance, so it's important to take your scale/capacity targets into account when designing solutions. For your situation, I would encourage you to consider table storage, but this will depend on how frequently the data is accessed and the types of queries you need to run against it. Just do not be surprised if you have to make redundant copies of your data that are modelled differently, or possibly even run parallel queries and aggregate the results. This is the way table storage was designed to be used. It's cheaper than SQL Azure, and it's this price difference that makes redundant, specialized data models possible.
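"Running parallel queries and aggregating results" might look roughly like the sketch below. The PARTITIONS dictionary and query_partition function are hypothetical stand-ins for per-partition table storage queries; only the fan-out and combine pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: one list of amounts per partition (keyed by month).
PARTITIONS = {
    "2011-01": [120.0, 80.0],
    "2011-02": [200.0],
    "2011-03": [50.0, 25.0, 25.0],
}


def query_partition(partition_key):
    """Stand-in for one table storage query scoped to a single partition."""
    return sum(PARTITIONS[partition_key])


def total_across(partition_keys):
    """Issue one query per partition concurrently, then combine the results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(query_partition, partition_keys))


total = total_across(["2011-01", "2011-02", "2011-03"])
print(total)
```

Each worker touches exactly one partition, so the slow part (the remote calls) overlaps and the client only does the cheap final sum.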
This approach also has to be weighed against the cost of retraining your developers to stop thinking in RDBMS terms. :)
I'm writing a 'proof of concept' application to investigate the possibility of moving a bespoke ASP.NET ecommerce system over to Windows Azure during a necessary re-write of the entire application.
I'm tempted to look at using Azure Table Storage as an alternative to SQL Azure, as the entities being stored are likely to change their schema (properties) over time as the application matures, and I won't need to make endless database schema changes. In addition, we can build referential integrity into the application code, so the case for considering Azure Table Storage is a strong one.
The only potential issue I can see at this time is that we do a small amount of simple reporting - i.e. value of sales between two dates, number of items sold for a particular product etc.
I know that Table Storage doesn't support aggregate-type functions, and I believe we can achieve what we want with clever use of partitions, multiple entity types to store subsets of the same data, and possibly pre-aggregation, but I'm not 100% sure how to go about it.
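One way the pre-aggregation idea could work is sketched below: every time an order is written, running totals are updated as well, so the two example reports (sales between two dates, items sold per product) never need an aggregate query. Plain Python containers stand in for the table entities, and all names and sample figures are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

orders = []                        # detail entities, one per order line
daily_sales = defaultdict(float)   # pre-aggregated entity per calendar day
product_units = defaultdict(int)   # pre-aggregated entity per product


def record_order(order_date, product_id, quantity, unit_price):
    """Write the detail row and update both running totals."""
    orders.append((order_date, product_id, quantity, unit_price))
    # In real table storage these would be separate entity writes and would
    # not automatically be atomic with each other (assumption to verify).
    daily_sales[order_date.isoformat()] += quantity * unit_price
    product_units[product_id] += quantity


def sales_between(start, end):
    """Sum the daily totals for an inclusive date range."""
    total, day = 0.0, start
    while day <= end:
        total += daily_sales.get(day.isoformat(), 0.0)
        day += timedelta(days=1)
    return total


record_order(date(2011, 3, 1), "sku-1", 2, 10.0)
record_order(date(2011, 3, 2), "sku-2", 1, 5.5)
record_order(date(2011, 3, 5), "sku-1", 3, 10.0)
print(sales_between(date(2011, 3, 1), date(2011, 3, 2)), product_units["sku-1"])
```

A date-range report then reads at most one small entity per day instead of every order in the range.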
Does anyone know of any in-depth documents about Azure Table Storage design principles, so that we make proper and efficient use of Tables, PartitionKeys, entity design, etc.?
There are a few simplistic documents around, and the current books available tend not to go into this subject in much depth.
FYI - the ecommerce site has about 25,000 customers and takes about 100,000 orders per year.
Have you seen this post?
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx
Pretty thorough coverage of tables
I think there are three potential issues in porting your app to Table Storage.
The lack of reporting - including aggregate functions - which you've already identified
The limited availability of transaction support - with 100,000 orders per year I think you'll end up missing this support.
Some problems with costs - $1 per million operations is only a small cost, but you need to factor this in if you get a lot of page views.
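To put the cost point in numbers, here is a back-of-the-envelope calculation. The $1 per million operations rate and the 100,000 orders per year come from this thread; the operations per order, the page views per year and the operations per page view are pure assumptions chosen for illustration.

```python
PRICE_PER_MILLION_OPS = 1.00  # USD, the rate quoted in this answer


def yearly_cost(operations_per_year):
    """Storage transaction cost in USD for a given number of operations."""
    return operations_per_year / 1_000_000 * PRICE_PER_MILLION_OPS


orders_per_year = 100_000        # from the question
ops_per_order = 10               # assumption
page_views_per_year = 5_000_000  # assumption
ops_per_page_view = 4            # assumption

total_ops = (orders_per_year * ops_per_order
             + page_views_per_year * ops_per_page_view)
print(total_ops, yearly_cost(total_ops))
```

Under these assumed figures the page views, not the orders, account for almost all of the operations, which is why read traffic is the number to estimate carefully.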
Honestly, I think a hybrid approach might work best: perhaps EF or NH to SQL Azure for critical data, with large objects stored in Table/Blob storage.
Enough of my opinion! For "in depth":
try the storage team's blog http://blogs.msdn.com/b/windowsazurestorage/ - I've found this very good
try the PDC sessions from Jai Haridas (couldn't spot a link - but I'm sure it's there still)
try articles inside Eric's book - http://geekswithblogs.net/iupdateable/archive/2010/06/23/free-96-page-book---windows-azure-platform-articles-from.aspx
there's some very good best practice based advice on - http://azurescope.cloudapp.net/ - but this is somewhat performance orientated
If you have started looking at Azure storage such as tables, it would do no harm to look at other NoSQL offerings in the market (especially document databases). This would give you insight into the NoSQL space and how solutions around such stores are designed.
You can also think about a hybrid approach of SQL DB + NoSQL solution. Parts of the system may lend themselves very well to the Azure Table storage model.
NoSQL solutions such as Azure Tables have their own challenges, such as:
Schema changes for data. Check here and here
Transactional support
ACID constraints. Check here
All table design papers I have seen are pretty much exclusively focused on the topics of scalability and search performance. I have not seen anything related to design considerations for reporting or BI.
Now, Azure tables are accessible through REST APIs and via the Azure SDK. Depending on what reporting you need, you might be able to pull out the information you require with minimal effort. If your reporting requirements are very sophisticated, then perhaps SQL Azure together with Windows Azure SQL Reporting services might be a better option to consider?