We want to provide a historical activity log for objects in our system (similar to Jira's history tab), and we are evaluating Azure Data Explorer as a potential tool for this use case.
Sample queries we need to answer:
Give me all objects that have changed in the last 30 days.
Give me all objects that changed in the last 30 days and have key1 set to value1.
Give me all objects that userA changed in the last year.
The amount of data (objects) we have is huge, potentially tens of millions, but the activity volume itself is not, and the activity will definitely not arrive as a stream. Is this a good use case for Azure Data Explorer?
Yes, Azure Data Explorer (Kusto) is the ideal cloud service for this functionality. You can learn more about the service by watching the recent online event and using the quick starts in the docs.
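To give a sense of what the sample queries above could look like, here is a minimal sketch in KQL run from Python with the azure-kusto-data package. The cluster URL, database name, and the ObjectActivity table with its Timestamp, ObjectId, ChangedBy and Key1 columns are all assumptions made purely for illustration.

```python
# Minimal sketch: running the sample queries against an ADX cluster with the
# azure-kusto-data package. The cluster, database, table and column names
# (ObjectActivity, ObjectId, ChangedBy, Key1) are illustrative assumptions.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

CLUSTER = "https://<yourcluster>.<region>.kusto.windows.net"  # placeholder
DATABASE = "ActivityLog"                                       # placeholder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(CLUSTER)
client = KustoClient(kcsb)

# Objects changed in the last 30 days.
q_changed_30d = "ObjectActivity | where Timestamp > ago(30d) | distinct ObjectId"

# Objects changed in the last 30 days where key1 is set to value1.
q_key_filter = """
ObjectActivity
| where Timestamp > ago(30d) and Key1 == 'value1'
| distinct ObjectId
"""

# Objects changed by userA in the last year.
q_by_user = ("ObjectActivity "
             "| where Timestamp > ago(365d) and ChangedBy == 'userA' "
             "| distinct ObjectId")

for query in (q_changed_30d, q_key_filter, q_by_user):
    response = client.execute(DATABASE, query)
    for row in response.primary_results[0]:
        print(row["ObjectId"])
```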
Related
I'm looking for an Azure service that allows me to log ~200 million datasets a month. These are tracking datasets, so writing has to be fast.
The data will be read once or twice a day to aggregate the tracked data.
Does anyone know an Azure service that makes sense for this?
Thanks in advance,
Michael
The Data Lake service in Azure seems to be a good option for your case.
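If Data Lake fits, a minimal sketch of landing a batch of tracking records could look like the following, using the Gen2 azure-storage-filedatalake package. The account URL, credential, filesystem name and path layout are assumptions for illustration.

```python
# Minimal sketch: landing a batch of tracking records as a dated file in
# Azure Data Lake Storage (Gen2 API). Account, filesystem and path names
# are illustrative assumptions.
import json
from datetime import datetime, timezone
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<youraccount>.dfs.core.windows.net"  # placeholder
CREDENTIAL = "<account-key-or-token>"                        # placeholder

service = DataLakeServiceClient(account_url=ACCOUNT_URL, credential=CREDENTIAL)
filesystem = service.get_file_system_client("tracking")

def write_batch(records):
    """Write one batch of tracking records as newline-delimited JSON."""
    now = datetime.now(timezone.utc)
    path = f"raw/{now:%Y/%m/%d}/batch-{now:%H%M%S}.jsonl"
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    filesystem.get_file_client(path).upload_data(payload, overwrite=True)

write_batch([{"id": 1, "event": "click"}, {"id": 2, "event": "view"}])
```

Writing fewer, larger files like this tends to keep ingestion fast and makes the once-or-twice-a-day aggregation pass simpler.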
Please provide more details about the kind of "dataset". Is it a flat file with 100 records, or some millions of records?
How often will it be sent to the system: a few dozen each minute, or 10 million at the end of the day?
With that information, and if you know something about AWS, you can find the equivalent service on Azure here:
https://learn.microsoft.com/en-us/azure/architecture/aws-professional/services
If you are going to ingest a lot of requests (like tracking events) and want to summarize them over some time span, you can run some tests with the IoT Suite:
https://azure.microsoft.com/suites/iot-suite/
I have created a query in (the wonderful tool) Application Insights Analytics which I intended to use for monitoring in one way or another, but from what I have found, this is not that easy.
The query returns some data I would like to set up Application Insights alerts on, such as if (column1 equals '1') then alert(), or if (column2 > 10) then alert().
Or, if that is not possible, is the Analytics service available from .NET code or PowerShell? If so, I could create the alerting service myself (not ideal, though).
Is any of the above features available?
(I do not think the functionality I'm after is available in Insights. I want to compare two custom events and, based on differences between them, take action if necessary.)
There's currently no way to get Azure alerts from an Analytics query.
However, there is a request for that on uservoice:
https://visualstudio.uservoice.com/forums/357324-application-insights/suggestions/14428134-add-alerts-based-on-results-of-analytics-queries
so go upvote and comment on that to make your voice heard.
There's also a planned service to read data from Analytics through an API:
https://visualstudio.uservoice.com/forums/357324-application-insights/suggestions/4999529-make-data-accessible-via-apis-for-custom-processin
You could write your own service against that API to do the extra work.
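Once that read API materialises, a do-it-yourself alerting service could be a simple polling loop along these lines. Everything here is a sketch: run_analytics_query() and send_alert() are hypothetical placeholders for whatever client and notification channel you end up using, and the column checks mirror the rules from the question.

```python
# Hedged sketch of a custom alert loop over Analytics query results.
# run_analytics_query() is a hypothetical placeholder for the planned read
# API; send_alert() stands in for your email/webhook/pager notification.
import time

QUERY = "customEvents | where timestamp > ago(5m)"  # example query text

def run_analytics_query(query):
    """Hypothetical: execute an Analytics query and return rows as dicts."""
    # TODO: replace with a real call once the Analytics read API is available.
    return []

def send_alert(message):
    """Hypothetical notification hook (email, webhook, pager, ...)."""
    print(f"ALERT: {message}")

def check_once():
    for row in run_analytics_query(QUERY):
        # Rules from the question: column1 == '1' or column2 > 10.
        if row.get("column1") == "1":
            send_alert(f"column1 is '1' in row {row}")
        elif row.get("column2", 0) > 10:
            send_alert(f"column2 exceeded 10 in row {row}")

if __name__ == "__main__":
    while True:
        check_once()
        time.sleep(300)  # poll every 5 minutes
```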
Our team has recently started using Application Insights to add telemetry data to our Windows desktop application. This data is sent almost exclusively in the form of events (rather than page views, etc.). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake structures for data warehousing. This looks like it might help in providing the answers we need.
My question is quite simple: is this the right approach? Have we overcomplicated things? My current feeling is that a better approach would be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.
Definitely pursue Michael Milirud's answer; if your source product has suitable analytics, you might not need a data warehouse.
Traditionally, a data warehouse has three advantages: it integrates information from different data sources, both internal and external; data is cleansed and standardised across sources; and the history of change over time ensures that data is available in its historical context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like Power BI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the Power BI data engine, but larger data sets might need pass-through queries.
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product Ajilius automates the development of star schema data warehouses, cutting development time to days or weeks. There are a number of other products doing a similar job; we maintain a complete list of industry competitors to help you choose.
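As a heavily simplified illustration of the star-schema split described above, here is a sketch that turns a flat export of custom events into one fact table and one dimension table with pandas. The column names (event_name, user_id, user_country, duration_ms) are invented for the example.

```python
# Simplified sketch: a flat export of custom events becomes a fact table plus
# a user dimension. Column names here are invented for the example; in
# practice they would come from your continuous export.
import pandas as pd

# Pretend this came from the continuous export / blob storage extract.
events = pd.DataFrame([
    {"timestamp": "2016-05-01T10:00:00Z", "event_name": "FileOpened",
     "user_id": "u1", "user_country": "UK", "duration_ms": 120},
    {"timestamp": "2016-05-01T10:05:00Z", "event_name": "FileSaved",
     "user_id": "u1", "user_country": "UK", "duration_ms": 80},
    {"timestamp": "2016-05-02T09:30:00Z", "event_name": "FileOpened",
     "user_id": "u2", "user_country": "DE", "duration_ms": 200},
])

# Dimension: one row per user, with a surrogate key.
dim_user = (events[["user_id", "user_country"]]
            .drop_duplicates()
            .reset_index(drop=True))
dim_user["user_key"] = dim_user.index + 1

# Fact: one row per event, referencing the dimension by surrogate key.
fact_event = (events
              .merge(dim_user[["user_id", "user_key"]], on="user_id")
              [["timestamp", "event_name", "user_key", "duration_ms"]])

print(dim_user)
print(fact_event)
```

The resulting tables could then be loaded into a SQL database or straight into the Power BI model.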
I would continue with Power BI; it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks, and Power BI Desktop is superior in many respects. The design approaches remain consistent (star schemas, etc.), but you build them in-memory within Power BI. It's far more flexible and agile.
Also, are you aware that Application Insights can be connected directly to the Power BI web service? This connects to your AI data in minutes and gives you Power BI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/
What we ended up doing was not sending events from our WinForms app directly to AI, but to an Azure Event Hub.
We then created a job that reads from the Event Hub and sends the data to:
- AI, using the SDK
- Blob storage, for later processing
- Azure Table Storage, to create Power BI reports
You can of course add more destinations.
So basically all events are sent to one destination and from there stored in several destinations, each for its own purpose. We definitely did not want to be restricted to 7 days of raw data, and storage is cheap; blob storage can also be used by many of the Azure and Microsoft analytics solutions.
The Event Hub can be linked to Stream Analytics as well.
More information about eventhubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
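For the "send everything to one Event Hub" side of this fan-out, a minimal sketch with the azure-eventhub package might look like the following; the connection string and hub name are placeholders.

```python
# Minimal sketch: sending telemetry events to an Event Hub with the
# azure-eventhub package. Connection string and hub name are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

CONN_STR = "<event-hub-namespace-connection-string>"  # placeholder
HUB_NAME = "app-telemetry"                            # placeholder

def send_events(events):
    """Send a batch of telemetry events to the Event Hub."""
    producer = EventHubProducerClient.from_connection_string(
        CONN_STR, eventhub_name=HUB_NAME)
    with producer:
        batch = producer.create_batch()
        for event in events:
            batch.add(EventData(json.dumps(event)))
        producer.send_batch(batch)

send_events([{"event": "ButtonClicked", "user": "u1"},
             {"event": "FormSubmitted", "user": "u2"}])
```

The reader job on the other side then consumes from the hub and dispatches each event to AI, blob storage and table storage, as described above.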
You can start using the recently released Application Insights Analytics feature. In Application Insights we now let you write any query you would like so that you can get more insight out of your data. Analytics runs your queries in seconds, lets you filter / join / group by any property, and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/
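As a small illustration, a query like the one below groups custom events by name over the last week; it is kept here as a Python string so it can be reused from code or pasted into the Analytics portal, and customEvents / count_ are the standard Application Insights names.

```python
# A sample Analytics (Kusto) query held as a Python string. customEvents is
# the standard Application Insights table for custom events; count_ is the
# default name of the summarized count column.
WEEKLY_EVENT_COUNTS = """
customEvents
| where timestamp > ago(7d)
| summarize count() by name
| order by count_ desc
"""
print(WEEKLY_EVENT_COUNTS)
```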
So I've been reading through the Application Insights information published by Microsoft, and in particular this article: https://azure.microsoft.com/en-gb/documentation/articles/app-insights-search-diagnostic-logs/
So what I want to ask is: what's the most logical methodology for logging database calls?
In my head, I want to be able to log into application insights, see the most common database calls being made, and see what their average call times are. That way, I can say "wow the lookup to the membership profile table is taking a few seconds today, what's the deal?"
So I have a database name, a stored procedure name, and an execution time, what's the best way for me to take that data and store it in AI? As a metric, an event, something else?
First of all, AI has auto-collection of dependency calls. Please read this. Secondly, SDK 1.1 is planned for release next week. As part of that release you will get a DependencyTelemetry type, added specifically for monitoring SQL, HTTP, blob and other external dependencies.
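Until DependencyTelemetry is available to you (or from a non-.NET client), one simple option is to record each call as a custom event with the execution time as a measurement, which you can then average per stored procedure in Analytics. This sketch uses the applicationinsights Python package; the instrumentation key and the database / stored-procedure names are placeholders.

```python
# Hedged sketch: record each database call as a custom event with its
# duration as a measurement, using the applicationinsights Python package.
# The instrumentation key and names below are placeholders.
import time
from applicationinsights import TelemetryClient

tc = TelemetryClient("<your-instrumentation-key>")  # placeholder

def track_db_call(database, stored_procedure, func, *args, **kwargs):
    """Run the call that executes the stored procedure and report its duration."""
    start = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        tc.track_event(
            "DatabaseCall",
            properties={"database": database, "storedProcedure": stored_procedure},
            measurements={"durationMs": elapsed_ms},
        )
        tc.flush()

# Usage: wrap whatever actually executes the stored procedure.
track_db_call("MembershipDb", "GetMemberProfile", lambda: time.sleep(0.05))
```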
For a project, I am using both SQL Azure and Azure Table Storage. A requirement is that for the first 7 days all data is stored in SQL Azure; after the first 7 days, the data is migrated to Azure Table Storage.
Is there any reliable project that achieves this, or any ideas on how to implement it?
thanks,
I think your best bet is to have a set of SQL queries (or sprocs) that return data older than 7 days. Then have table-insertion code that writes this data to one or more tables, with appropriate partition/row keys based on your query needs. Then build some type of background operation to perform the read + write + delete. There's no tool to do this (that I know of), since one is a relational database and the other is a NoSQL variant with no fixed schema.
To optimize your writes, see if you can write batches of rows at the same time (this is called an Entity Group Transaction). It reduces the number of transactions, and the rows in a group are written atomically. See more info on entity group transactions here.
You also may want to consider using a queue for workload assignment. That is, maybe once a day (or hour, whenever), push a queue message telling some background process to transfer data from SQL to Table Storage. This way, in case something fails during the operation, you can process it again later, since the queue message will still be there (you'd only delete the message if the operation succeeded).
If you're looking for a tool to do so, take a look at Cloud Storage Studio (http://www.cerebrata.com/products/cloudstoragestudio) which has a feature to import data from SQL Server to Azure Table Storage. I haven't checked for a long time but I believe ClumsyLeaf's TableXplorer (http://www.clumsyleaf.com) also has this feature. Long time back, we also built an open source tool to do the same. You can find it here: http://azuredatabaseupload.codeplex.com/.
As David mentioned, you could basically write some views in your database to fetch data older than 7 days. The idea is simple: you fetch the data, map the SQL Server data types to Azure data types, choose appropriate PartitionKey/RowKey values, convert the data into entities, and then upload the entities in batches.
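A hedged sketch of that flow, using pyodbc for the SQL read and the azure-data-tables package for the batched (entity group transaction) upload, is below. The connection strings, the ActivityLog table and its Id / ObjectId / ChangedAt / Payload columns, and the archive table name are all assumptions for illustration.

```python
# Hedged sketch of the approach described above: read rows older than 7 days
# from SQL Azure, convert them to entities, and upload them in batches
# (entity group transactions). Connection details and table/column names
# are illustrative assumptions.
import pyodbc
from azure.data.tables import TableServiceClient

SQL_CONN = "Driver={ODBC Driver 17 for SQL Server};Server=...;Database=...;"  # placeholder
TABLES_CONN = "<storage-account-connection-string>"                           # placeholder

def fetch_old_rows():
    """Rows older than 7 days; ActivityLog and its columns are assumed names."""
    with pyodbc.connect(SQL_CONN) as conn:
        cursor = conn.cursor()
        cursor.execute(
            "SELECT Id, ObjectId, ChangedAt, Payload "
            "FROM ActivityLog WHERE ChangedAt < DATEADD(day, -7, GETUTCDATE())")
        return cursor.fetchall()

def to_entity(row):
    # Choose PartitionKey/RowKey to match how you will query the data later.
    return {
        "PartitionKey": str(row.ObjectId),
        "RowKey": str(row.Id),
        "ChangedAt": row.ChangedAt,
        "Payload": row.Payload,
    }

def migrate():
    service = TableServiceClient.from_connection_string(TABLES_CONN)
    table = service.create_table_if_not_exists("ActivityLogArchive")
    entities = [to_entity(r) for r in fetch_old_rows()]

    # An entity group transaction must target a single partition and contain
    # at most 100 entities, so group by PartitionKey and chunk accordingly.
    by_partition = {}
    for e in entities:
        by_partition.setdefault(e["PartitionKey"], []).append(e)
    for batch in by_partition.values():
        for i in range(0, len(batch), 100):
            table.submit_transaction([("upsert", e) for e in batch[i:i + 100]])
    # Only after verifying the uploads would you delete the rows from SQL,
    # ideally driven by a queue message as suggested above.

if __name__ == "__main__":
    migrate()
```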