We have a large amount of data stored on hadoop (with multiple servers on Azure), we will clean and store our data into a datamart (star schema) on hadoop.
Our goal is to provide this data to users in a self service mode.
For that we already have a good knowledge of SSAS (multidim and tabular installed on premise), but we never use it on that kind of volumetry (7 billions of rows)
SSAS as a service is completly new for us, and we don't really know if it will the same as we know.
Do you know if Analysis Services as a service (provided on Azure plateform) will be able to store, process and provide us our data?
Do you know or did you meet some limitations?
Thank you very much for your support
Related
I want to create Webapps with PowerBI Embedded from the german central datacenter. Unfortnuatly this service is not available and i don't know when it will become available.
Therefore my idea is to migrate PowerBi Embedded later and start with all other services located in german central. Is this possible or strongly recommended to have the PowerBi Embedded service and the azure SQL Datawarehouse in the same place?
When you place your data source (SQL Data warehouse) and your BI tool (Power BI) in different datacenters there are two things that should be mindful of:
Latency and network speed between the data centers may affect the performance of your BI solution significantly (in a negative way), especially if you are manipulating and analysing large amounts of data. It depends a little on how you set up your Power BI embedded. If you use DirectQuery then you will be hit with the latency penalty every time the query runs (whenever you look at your report), if not then you will only be hit with the latency when you refresh your imported data. However, without DirectQuery you may have to import more data in order to aggregate etc from the imported dataset.
Egress, you pay for traffic going out of data centers. If you continuously send large amounts of data between two data centers then the egress cost can be a factor for you. In a normal setup the traffic charges are almost negliable, but if your BI solution streams a lot of data on every refresh then it may build up to a lot of money.
Our team have just recently started using Application Insights to add telemetry data to our windows desktop application. This data is sent almost exclusively in the form of events (rather than page views etc). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake structures for data warehousing. This looks like it might help in providing the answers we need.
My question is quite simple: Is this the right approach? Have we over complicated things? My current feeling is that a better approach will be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.
Definitely pursue Michael Milirud's answer, if your source product has suitable analytics you might not need a data warehouse.
Traditionally, a data warehouse has three advantages - integrating information from different data sources, both internal and external; data is cleansed and standardised across sources, and the history of change over time ensures that data is available in its historic context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like PowerBI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the PowerBI data engine, but larger data might need pass through queries.
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product Ajilius automates the development of star schema data warehouses, speeding the development time to days or weeks. There are a number of other products doing a similar job, we maintain a complete list of industry competitors to help you choose.
I would continue with Power BI - it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks - Power BI Desktop is superior in many aspects. The design approaches remain consistent - star schemas etc, but you build them in-memory within PBI. It's way more flexible and agile.
Also are you aware that AI can be connected directly to PBI Web? This connects to your AI data in minutes and gives you PBI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/
What we ended up doing was not sending events from our WinForms app directly to AI but to the Azure EventHub
We then created a job that reads from the eventhub and send the data to
AI using the SDK
Blob storage for later processing
Azure table storage to create powerbi reports
You can of course add more destinations.
So basically all events are send to one destination and from there stored in many destinations, each for their own purposes. We definitely did not want to be restricted to 7 days of raw data and since storage is cheap and blob storage can be used in many analytics solutions of Azure and Microsoft.
The eventhub can be linked to stream analytics as well.
More information about eventhubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
You can start using the recently released Application Insights Analytics' feature. In Application Insights we now let you write any query you would like so that you can get more insights out of your data. Analytics runs your queries in seconds, lets you filter / join / group by any possible property and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/
Looking for best azure services for holding and manipulating data for an e-commerce application (online book store) with millions of books.
As of now the e-commerce application is running over asp.net and on-premises SQL server. As stock availability and prices are changed very frequently (in every hour) so we are manipulating/ updating millions of data in a specific time-line. Millions of records are updating with in 30 minutes using SSIS packages.
Now as we are intended to move our application over Azure, so can some help me to select the best data storage service on azure which meets our expectations.
Expectations:
1- Can store relational data
2- Data can update with in strict timeline - uses minimum time to complete full transaction
3- Highly scalable and highly available
As an experiment I am managing these data with Azure SQL Database (P1-tier) but not fully satisfied. Because for those task where On-premises Sql Server takes 30 minutes to complete, Azure Sql takes more than 7 hrs for the same process. I also tried with batches but still struggling.
Can someone suggest the solution please.
I'd be happy to help.
Unfortunately there is so much that's different between an P1 in a remote data center with a 99.99 SLA, automatic HA and very specific CPU/IOPS/Memory resources - and your on-premise server where the app logic, SQL Server are all running in the same OS context. Putting network latency to the side, I would guess that the HW resources in this server (CPU/IOPS/memory) are many times larger than what resources an P1 has.
Using what data you already have, upgrading to a P2 will approx. double the resources available in this test, P3 will quadruple and so on.
Happy to talk offline to help you build a more apples to apples comparison. guyhay at microsoft.com
A brief summary of the project I'm working on:
I was hired as a web dev intern at a small company (part of a larger corporation) close to the state college I attend. For the past couple months, myself and two other interns have been working on the front-end as well as the back-end. The company is prototyping adding sensors to its products (oil/gas industry); we were tasked with building the portal that customers could login to to see data from their machines even if they're not near them.
Basically, we're collecting sensor data (~ten sensors/machine) and it's sent back to us. Where we're stuck is determining the best way to store and analyze long term data. We have a Redis Cache set up for fast access by the front-end, where only the lastest set of data for each machine is stored. But for historical data, I (and my coworkers) are having a tough time deciding the best route to go. Our whole project is based in VS (C#/Razor) with Azure integration (which is amazing by the way), so I'd like to keep the long term storage there as well. As far as I can tell, HDinsight + data in a BLOB seems to be the best option, but I'm fairly green when it comes to backend solutions. I would just like input from some older developers who may have more experience in this area, as we are the only developers here besides a couple older members who are more involved in the engineering side of things vs. development.
So, professionals of stack overflow, what would be your recommendation for long-term data storage and analytics?
PS: I apologize if I have HDinsight confused. From what I understand, it maps data in BLOB storage into HBase for easier analytics? Hadoop/HBase confuses me.
My first recommendation would be Azure Table storage. It provides a highly scalable and low cost data archival solution. If designed properly, you can also get a very decent query performance. Refer to the Azure Storage Table Design Guide for more details.
My second choice would be Azure DocumentDB service which is a NoSQL document database. It costs a bit more but querying data is much more flexible.
You should only go with HDInsight when you have a specific need as it's a resource-intensive and expensive service. Once you identify a specific requirement for a big-data analysis that's when you import your data and process it with HDInsight.
I'm working on Integration project where third party will call our web service in Azure. For performance reason I would like to store 2 table data (more than 1000 records) on to the app fabric cache.
Could anyone please suggest if this is the right design pattern?
Depending on how much data this is (you don't mention how wide the tables are) you have a couple of options
You could certainly store it in the azure cache, this will cost though.
You might also want to consider storing the data in the http runtime cache which is free but not distributed.
You choice would largely depend on the size of the data, how often it changes and what effect is caused if someone receives slightly out of date data.