I want to create web apps with Power BI Embedded from the Germany Central datacenter. Unfortunately this service is not available there, and I don't know when it will become available.
My idea is therefore to start with all the other services in Germany Central and migrate to Power BI Embedded later. Is this possible, or is it strongly recommended to keep the Power BI Embedded service and the Azure SQL Data Warehouse in the same region?
When you place your data source (SQL Data Warehouse) and your BI tool (Power BI) in different datacenters, there are two things you should be mindful of:
Latency and network speed between the datacenters can significantly degrade the performance of your BI solution, especially if you are manipulating and analysing large amounts of data. It also depends on how you set up Power BI Embedded: with DirectQuery you pay the latency penalty every time a query runs (i.e. whenever someone views the report); without DirectQuery you only pay it when you refresh the imported data. On the other hand, without DirectQuery you may have to import more data in order to aggregate from the imported dataset.
Egress: you pay for traffic going out of a datacenter. If you continuously send large amounts of data between two datacenters, the egress cost can become a factor. In a normal setup the traffic charges are almost negligible, but if your BI solution moves a lot of data on every refresh it can add up to real money.
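As a rough way to judge whether egress will matter for you, a back-of-the-envelope calculation like the sketch below is usually enough. The per-GB rate is an illustrative assumption, not a current Azure price.

```python
# Back-of-the-envelope estimate of monthly cross-region egress cost.
# The rate below is an assumed placeholder; check current Azure bandwidth pricing.
ASSUMED_EGRESS_PRICE_PER_GB = 0.08  # USD per GB, illustrative only


def monthly_egress_cost(gb_per_refresh: float, refreshes_per_day: int, days: int = 30) -> float:
    """Cost of moving the refreshed dataset between regions on every refresh."""
    total_gb = gb_per_refresh * refreshes_per_day * days
    return total_gb * ASSUMED_EGRESS_PRICE_PER_GB


# Example: an imported model that pulls 5 GB, refreshed 8 times a day.
print(f"~${monthly_egress_cost(5, 8):.2f} per month")
```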
Related
We are using Azure Stream Analytics to build out a new IoT product. The data is streaming to Power BI successfully, but there is no way to implement Row-Level Security, so we cannot display the data back to a customer limited to only that customer's data. I am considering adding an Azure SQL DB between ASA and Power BI and switching the Power BI dataset from a streaming dataset to DirectQuery with a high page refresh rate, but this seems like a very intense workload for an Azure SQL DB to handle. There is the potential, as the product grows, for multiple inserts per second and querying every couple of seconds. Streaming seems like the better answer apart from the missing RLS. Any tips?
There is the potential, as the product grows, for multiple inserts per second and querying every couple of seconds.
A small Azure SQL Database should handle that load: 1,000 inserts per second is easy; 100,000 per second is probably too much.
And ASA can ensure that the output streams are not too frequent.
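For the RLS part, the sketch below shows roughly what a security policy on the landing table could look like, executed here from Python via pyodbc. The table, mapping table and schema names (dbo.Telemetry, dbo.CustomerUsers, Sec) are illustrative assumptions, and the USER_NAME() predicate only works if the identity Power BI connects with maps to a customer.

```python
# Minimal Row-Level Security sketch for the Azure SQL DB that ASA writes into.
# All object names (dbo.Telemetry, dbo.CustomerUsers, Sec schema) are illustrative assumptions.
import pyodbc

BATCHES = [
    "CREATE SCHEMA Sec",
    """
    CREATE FUNCTION Sec.fn_customer_predicate(@CustomerId int)
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
        SELECT 1 AS allowed
        FROM dbo.CustomerUsers cu
        WHERE cu.CustomerId = @CustomerId
          AND cu.UserName = USER_NAME()   -- the effective identity the report connects as
    """,
    """
    CREATE SECURITY POLICY Sec.CustomerFilter
        ADD FILTER PREDICATE Sec.fn_customer_predicate(CustomerId) ON dbo.Telemetry
        WITH (STATE = ON)
    """,
]

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;UID=<user>;PWD=<password>"
)
with conn.cursor() as cur:
    for batch in BATCHES:
        cur.execute(batch)  # each statement runs as its own batch
conn.commit()
```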
We have our data stored in Azure Analysis Services on the B1 tier. A Power BI report deployed in a Premium workspace consumes the model over a live connection. We are facing an issue where AAS does not release memory once users have finished with the report; memory is only released after the service is restarted. Is this because of the B1 tier that we use? Is there a better way to handle the caching? Any help is appreciated.
Thanks
Analysis Services releases memory as required, based on the demands placed on the model. It does not release memory as soon as a report has been rendered; that is not how it is designed to work, and doing so would hurt performance.
You see the memory utilisation drop after restarting because the cached data generated by the VertiPaq engine is flushed, though it will be rebuilt as soon as you start querying the model again.
It sounds like you either have some rather intensive dashboard measures or too much data for the Basic tier, which is a general-purpose tier recommended for production solutions with small tabular models, limited user concurrency and simple data refresh requirements.
Your solution here is to do some combination of:
Upgrade your AAS instance (see the sketch after this list)
Reduce the amount of data in your model
Reduce the number of users
Reduce the complexity of your reports
Optimise your DAX measures
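If you do decide to scale up, the sketch below shows one way to resize the server through the Azure Resource Manager REST API; the subscription, resource group, server name and target SKU are placeholders, and the same operation can of course be done from the portal, PowerShell or the Azure SDKs.

```python
# Sketch: scale an Azure Analysis Services server to a larger SKU via the ARM REST API.
# Subscription, resource group, server name and target SKU below are placeholders.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
SERVER = "<aas-server-name>"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.AnalysisServices/servers/{SERVER}"
    "?api-version=2017-08-01"
)

resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"sku": {"name": "S1", "tier": "Standard"}},  # e.g. moving up from B1
)
resp.raise_for_status()
print(resp.json().get("sku"))
```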
I have a requirement to write up to 500k records daily to an Azure SQL DB using an ADF pipeline.
I have some simple calculations as part of the data transformation that could be performed in a SQL Stored Procedure activity. I've also seen Databricks notebooks being used commonly for this, especially because of the scalability benefits going forward, but they bring overhead: placing files in another location after transformation, managing authentication, etc., and I want to avoid over-engineering unless it is absolutely required.
I've tested the SQL stored procedure approach and it works quite well for ~50k records (not yet tested with higher volumes).
But I'd still like to know the general recommendation between the two options, especially from experienced Azure or data engineers.
Thanks
I'm not sure there is enough information to make a solid recommendation. What is the source of the data? Why is ADF part of the solution? Is this 500k rows once per day or a constant stream? Are you loading into a staging table and then using a stored procedure to move and transform the data into another table?
Here are a couple of thoughts:
If the data operation is SQL to SQL [meaning the same SQL instance for both source and sink], then use Stored Procedures. This allows you to stay close to the metal and will perform the best. An exception would be if the computational load is really complicated, but that doesn't appear to be the case here.
Generally speaking, the only reason to call Databricks from ADF is if you already have that expertise and the resources to support it already exist.
Since ADF is part of the story, there is a middle ground between your two scenarios - Data Flows. Data Flows are a low-code abstraction over Databricks. They are ideal for in-flight data transforms and perform very well at high loads. You do not author or deploy notebooks, nor do you have to manage the Databricks configuration, and they are first-class citizens in ADF pipelines.
As an experienced (former) DBA, data engineer and data architect, I cannot see what Databricks adds in this situation. The piece of the architecture you might need to scale is the target for the INSERTs, i.e. Azure SQL Database, which is ridiculously easy to scale either manually via the portal or via the REST API, if that is even required. Consider techniques such as loading into heaps and partition switching if you need to tune the inserts.
The overhead of adding an additional component to your architecture and moving your data through it would have to be worth it, plus the additional cost of spinning up Spark clusters while your database is already running.
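If you stay with the stored procedure route, a rough sketch of the pattern is below: bulk-load a staging heap and let a T-SQL procedure do the transform. The table name, procedure name and connection string are illustrative placeholders.

```python
# Sketch: bulk-load the daily batch into a staging heap, then let a stored procedure transform it.
# Table name, procedure name and connection string are illustrative placeholders.
import pyodbc

# Rows produced upstream; in practice this would be the 500k-record daily batch.
rows = [(1, 10.50, "2024-01-01"), (2, 7.25, "2024-01-02")]

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=<server>;Database=<db>;UID=<user>;PWD=<password>"
)
cur = conn.cursor()
cur.fast_executemany = True  # send parameter arrays instead of row-by-row round trips

cur.executemany(
    "INSERT INTO dbo.Staging_Sales (Id, Amount, EventDate) VALUES (?, ?, ?)",
    rows,
)

# The calculations and the move into the target table stay in T-SQL, close to the data.
cur.execute("EXEC dbo.usp_TransformSales")
conn.commit()
```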
Databricks is a superb tool and has a number of great use cases, e.g. advanced data transforms (things you cannot do with SQL), machine learning, streaming and others. Have a look at this free resource for a few ideas:
https://databricks.com/p/ebook/the-big-book-of-data-science-use-cases
We have a large amount of data stored in Hadoop (across multiple servers on Azure); we will clean the data and store it in a data mart (star schema) on Hadoop.
Our goal is to provide this data to users in a self-service mode.
We already have good knowledge of SSAS (multidimensional and tabular, installed on premises), but we have never used it at this kind of volume (7 billion rows).
SSAS as a service is completely new to us, and we don't really know whether it will behave the same as what we already know.
Do you know if Analysis Services as a service (provided on the Azure platform) will be able to store, process and serve our data?
Do you know of, or have you run into, any limitations?
Thank you very much for your support
Our team has just recently started using Application Insights to add telemetry data to our Windows desktop application. This data is sent almost exclusively in the form of events (rather than page views etc.). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure Storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake schemas for data warehousing. These look like they might help in providing the answers we need.
My question is quite simple: is this the right approach? Have we overcomplicated things? My current feeling is that a better approach would be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.
Definitely pursue Michael Milirud's answer; if your source product has suitable analytics you might not need a data warehouse.
Traditionally, a data warehouse has three advantages: it integrates information from different data sources, both internal and external; data is cleansed and standardised across sources; and the history of change over time ensures that data is available in its historic context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like Power BI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the Power BI data engine, but larger data may need pass-through queries.
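To make the star schema idea concrete, here is a small sketch of shaping exported Application Insights events into a fact table and two dimensions with pandas; the column names (timestamp, eventName, userId, appVersion) are assumptions about the export format, not the actual schema.

```python
# Sketch: shaping exported Application Insights events into a tiny star schema with pandas.
# Column names (timestamp, eventName, userId, appVersion) are assumptions about the export format.
import pandas as pd

events = pd.read_json("exported_events.json", lines=True)  # one event per line

# Dimension tables: one row per distinct member, with a surrogate key.
dim_event = events[["eventName"]].drop_duplicates().reset_index(drop=True)
dim_event["event_key"] = dim_event.index

dim_user = events[["userId"]].drop_duplicates().reset_index(drop=True)
dim_user["user_key"] = dim_user.index

# Fact table: one row per event, carrying the surrogate keys plus remaining attributes.
fact_events = (
    events
    .merge(dim_event, on="eventName")
    .merge(dim_user, on="userId")
    [["timestamp", "event_key", "user_key", "appVersion"]]
)

# The three frames can then be written to SQL (e.g. DataFrame.to_sql) for Power BI to query.
```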
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product, Ajilius, automates the development of star schema data warehouses, cutting development time to days or weeks. There are a number of other products doing a similar job; we maintain a complete list of industry competitors to help you choose.
I would continue with Power BI - it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks; Power BI Desktop is superior in many respects. The design approaches remain the same - star schemas etc. - but you build them in-memory within Power BI. It's far more flexible and agile.
Also, are you aware that Application Insights can be connected directly to the Power BI web service? This connects to your AI data in minutes and gives you Power BI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/
What we ended up doing was not sending events from our WinForms app directly to AI, but to an Azure Event Hub.
We then created a job that reads from the Event Hub and sends the data to:
AI, using the SDK
Blob storage, for later processing
Azure Table storage, to create Power BI reports
You can of course add more destinations.
So basically all events are sent to one destination and from there stored in many destinations, each for its own purpose. We definitely did not want to be restricted to 7 days of raw data, and since storage is cheap, Blob storage can feed many of the Azure and Microsoft analytics solutions later on.
The Event Hub can be linked to Stream Analytics as well.
More information about Event Hubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
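For reference, publishing an event to an Event Hub takes only a few lines of client code. The sketch below uses the azure-eventhub Python SDK with placeholder connection details and payload; a WinForms app would do the equivalent with the .NET Event Hubs client.

```python
# Sketch: publishing a telemetry event to Azure Event Hubs (azure-eventhub SDK, v5).
# Connection string, hub name and payload are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hub-namespace-connection-string>",
    eventhub_name="<hub-name>",
)

event = {"eventName": "ReportOpened", "userId": "user-123", "durationMs": 420}

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(event)))
    producer.send_batch(batch)
```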
You can start using the recently released Application Insights Analytics feature. In Application Insights we now let you write any query you would like, so that you can get more insight out of your data. Analytics runs your queries in seconds, lets you filter, join and group by any property, and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/
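If you want to pull Analytics query results into your own tooling rather than the portal, the sketch below uses the Application Insights REST query endpoint; the application ID, API key and the query text are placeholders.

```python
# Sketch: running an Application Insights Analytics query over the REST API.
# Application ID, API key and query text are placeholders.
import requests

APP_ID = "<application-id>"
API_KEY = "<api-key>"

query = "customEvents | summarize count() by name | top 10 by count_"

resp = requests.get(
    f"https://api.applicationinsights.io/v1/apps/{APP_ID}/query",
    headers={"x-api-key": API_KEY},
    params={"query": query},
)
resp.raise_for_status()

table = resp.json()["tables"][0]
for row in table["rows"]:
    print(row)
```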