Questions on Azure Data Explorer normalization

Questions on Azure Data Explorer normalization - azure

Our Customer currently build out a number of use cases for the client and facilitate the onboarding of logs from 300+ applications. The client is limited on the number of use cases they can support so they have been looking into the option of creating a custom schema with parsers etc.
The focus is insider threat so they are primarily collecting audit/activity logs for these applications.
The challenges they see them are that application audit/activity logs vary greatly and this will make it difficult to bring the data together from multiple applications. The client has a non-standard architecture and ingest their logs through ADX instead of Sentinel and then forward a subset of data for alerting. They also don’t make wide use of native tables yet.
Please do refer the architecture diagram in the attachment.
Question:
Is there a way of normalizing application audit and activity logs so they can build out insider threat use cases over multiple applications?
Any guidance for this requirement would be of great help. Thanks in advance.

Related

Design for a Cloud Native Application in Azure for ML Insights and Actions

I have an idea whereby I intend to build a cloud native application for algorithmic trading, ideally by consuming all PaaS and SaaS (no IaaS), and I'd like to get some feedback on how I intend to build it. The concept is pretty straight-forward in that I intend to consume financial trading data from an external SaaS solution via an API query, feed that data into various Azure PaaS solutions (most notably ML for modeling), and then take some action. Here is a high-level diagram I've come up with so far:
Solution Overview
As a note, while I'm familiar with Azure, I'm not a Azure cloud engineer and have limited experience in actually building solutions myself. Subsequently, I intend to use this project as a foundation to further educate myself.
When starting on the build, I immediately questioned whether I should or shouldn't use Event Hubs. Conceptually it makes sense, in that I'm decoupling the production of a data stream from the consumption of it. Presumably, this facilitates less complications when / if I need to update the data feed(s) in the future. I also thought about where the data is stored... should it be a SQL database, or more simply, an Azure Table? The idea here is that the trading data will need to be stored for regression testing as my iterate through my models. All that said, looking for some insights from anybody that may have experience in this space.
Thanks!

There's no real question in here. Take a look on the architecture reference provided by Microsoft: https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/

How Alteryx is deploying data in decentralized way?

From the link :https://reviews.financesonline.com/p/alteryx/, I see the following details
Alteryx is an advanced data analytics platform intended to serve the
needs of business analysts looking for a self-service solution. It
contains 3 basic components: Gallery, Designer, and Server, which
blend data from external sources and generate comprehensive reports.
Each of them, however, can be used separately.
The software structures and evaluates data from multiple external
sources, and organizes it into comprehensive insights that can be used
for business deciding and shared with multiple internal/external
users. Basically, Alteryx is deploying data in a decentralized way,
and eliminating in such way the risk of underestimating it. At the
same time, Alteryx is well-integrated, easy to use, and ran both on
premise and in cloud.
Can anyone help me to know what is the text above in bold trying to explain. I am interested to understand it in details with some explanation.

The basic idea of is that the tool can blend just about any kind of data and dump the result to your own local extract... the local extract is "decentralized" in that, obviously it's local, and also you didn't need to rely on some core ETL team to build a process for you (which they would probably dump in a central location). The use of the term "underestimating" probably indicates that, if you're not building in your own insights (say you find something online that you can blend into your analysis), you're "underestimating" the importance of that data.
It's worth noting that your custom extract could be turned into a nightly job and the output could itself be dumped to a centralized database server if desired. So the tool can be used to build centralized assets too. It really just depends on how you're using it. (With Alteryx this would require either their Desktop Automation, or their Server.)
So... it seems that any self service data blending tool would be capable of the same. What's special about Alteryx? The distinguishing factors will lie elsewhere: number of data types supported, overall functionality and power, performance, built-in examples, ease-of-use, service, support, online community, and perhaps other areas.

Decision path for Azure Service Fabric Programming Models

Background
We are looking at porting a 'monolithic' 3 tier Web app to a microservices architecture. The web app displays listings to a consumer (think Craiglist).
The backend consists of a REST API that calls into a SQL DB and returns JSON for a SPA app to build a UI (there's also a mobile app). Data is written to the SQL DB via background services (ftp + worker roles). There's also some pages that allow writes by the user.
Information required:
I'm trying to figure out how (if at all), Azure Service Fabric would be a good fit for a microservices architecture in my scenario. I know the pros/cons of microservices vs monolith, but i'm trying to figure out the application of various microservice programming models to our current architecture.
Questions
Is Azure Service Fabric a good fit for this? If not, other recommendations? Currently i'm leaning towards a bunch of OWIN-based .NET web sites, split up by area/service, each hosted on their own machine and tied together by an API gateway.
Which Service Fabric programming model would i go for? Stateless services with their own backing DB? I can't see how Stateful or Actor model would help here.
If i went with Stateful services/Actor, how would i go about updating data as part of a maintenance/ad-hoc admin request? Traditionally we would simply login to the DB and update the data, and the API would return the new data - but if it's persisted in-memory/across nodes in a cluster, how would we update it? Would i have to expose this all via methods on the service? Similarly, how would I import my existing SQL data into a stateful service?
For Stateful services/actor model, how can I 'see' the data visually, with an object Explorer/UI. Our data is our Gold, and I'm concerned of the lack of control/visibility of it in the reliable services models
Basically, is there some documentation on the decision path towards which programming model to go for? I could model a "listing" as an Actor, and have millions of those - sure, but i could also have a Stateful service that stores the listing locally, and i could also have a Stateless service that fetches it from the DB. How does one decide as to which is the best approach, for a given use case?
Thanks.

What is it about your current setup that isn't meeting your requirements? What do you hope to gain from a more complex architecture?
Microservices aren't a magic bullet. You mainly get four benefits:
You can scale and distribute pieces of your overall system independently. Service Fabric has very sophisticated tools and advanced capabilities for this.
You can deploy and upgrade pieces of your overall system independently. Service Fabric again has advanced capabilities for this.
You can have a polyglot system - each service can be written in a different language/platform.
You can use conflicting dependencies - each service can have its own set of dependencies, like different framework versions.
All of this comes at a cost and introduces complexity and new ways your system can fail. For example: your fast, compile-time checked in-proc method calls now become slow (by comparison to an in-proc function call) failure-prone network calls. And these are not specific to Service Fabric, btw, this is just what happens you go from in-proc method calls to cross-machine I/O - doesn't matter what platform you use. The decision path here is a pro/con list specific to your application and your requirements.
To answer your Service Fabric questions specifically:
Which programming model do you go for? Start with stateless services with ASP.NET Core. It's going to be the simplest translation of your current architecture that doesn't require mucking around with your data layer.
Stateful has a lot of great uses, but it's not necessarily a replacement for your RDBMS. A good place to start is hot data that can be stored in simple key-value pairs, is accessed frequently and needs to be low-latency (you get local reads!), and doesn't need to be datamined. Some examples include user session state, cache data, a "snapshot" of the most recent items in a data stream (like the most recent stock quote in a stream of stock quotes).
Currently the only way to see or query your data is programmatically directly against the Reliable Collection APIs. There is no viewer or "management studio" tool. You have to write (and secure) an API in each service that can display and query data.
Finally, the actor model is a very niche model. It serves specific purposes but if you just treat it as a data store it will not work for you. Like in your example, a listing per actor probably wouldn't work because you can't query across that list, or even have multiple users reading the same listing simultaneously.

Application insight -> export -> Power BI Data Warehouse Architecture

Our team have just recently started using Application Insights to add telemetry data to our windows desktop application. This data is sent almost exclusively in the form of events (rather than page views etc). Application Insights is useful only up to a point; to answer anything other than basic questions we are exporting to Azure storage and then using Power BI.
My question is one of data structure. We are new to analytics in general and have just been reading about star/snowflake structures for data warehousing. This looks like it might help in providing the answers we need.
My question is quite simple: Is this the right approach? Have we over complicated things? My current feeling is that a better approach will be to pull the latest data and transform it into a SQL database of facts and dimensions for Power BI to query. Does this make sense? Is this what other people are doing? We have realised that this is more work than we initially thought.

Definitely pursue Michael Milirud's answer, if your source product has suitable analytics you might not need a data warehouse.
Traditionally, a data warehouse has three advantages - integrating information from different data sources, both internal and external; data is cleansed and standardised across sources, and the history of change over time ensures that data is available in its historic context.
What you are describing is becoming a very common case in data warehousing, where star schemas are created for access by tools like PowerBI, Qlik or Tableau. In smaller scenarios the entire warehouse might be held in the PowerBI data engine, but larger data might need pass through queries.
In your scenario, you might be interested in some tools that appear to handle at least some of the migration of Application Insights data:
https://sesitai.codeplex.com/
https://github.com/Azure/azure-content/blob/master/articles/application-insights/app-insights-code-sample-export-telemetry-sql-database.md
Our product Ajilius automates the development of star schema data warehouses, speeding the development time to days or weeks. There are a number of other products doing a similar job, we maintain a complete list of industry competitors to help you choose.

I would continue with Power BI - it actually has a very sophisticated and powerful data integration and modeling engine built in. Historically I've worked with SQL Server Integration Services and Analysis Services for these tasks - Power BI Desktop is superior in many aspects. The design approaches remain consistent - star schemas etc, but you build them in-memory within PBI. It's way more flexible and agile.
Also are you aware that AI can be connected directly to PBI Web? This connects to your AI data in minutes and gives you PBI content ready to use (dashboards, reports, datasets). You can customize these and build new reports from the datasets.
https://powerbi.microsoft.com/en-us/documentation/powerbi-content-pack-application-insights/

What we ended up doing was not sending events from our WinForms app directly to AI but to the Azure EventHub
We then created a job that reads from the eventhub and send the data to
AI using the SDK
Blob storage for later processing
Azure table storage to create powerbi reports
You can of course add more destinations.
So basically all events are send to one destination and from there stored in many destinations, each for their own purposes. We definitely did not want to be restricted to 7 days of raw data and since storage is cheap and blob storage can be used in many analytics solutions of Azure and Microsoft.
The eventhub can be linked to stream analytics as well.
More information about eventhubs can be found at https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/

You can start using the recently released Application Insights Analytics' feature. In Application Insights we now let you write any query you would like so that you can get more insights out of your data. Analytics runs your queries in seconds, lets you filter / join / group by any possible property and you can also run these queries from Power BI.
More information can be found at https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics/

Fetching data from cerner's EMR / EHR

I don't have much idea in medical domain.
We evaluating a requirement from our client who is using Cerner EMR system
As per the requirement we need to expose the Cerner EMR or fetch some EMR / EHR data and to display it in SharePoint 2013 portal.
To meet this requirement what kind of integration options Cerner proposes. Is there any API’s or Web services exposed which can be used to build custom solutions for the same?
As far as I know Cerner did expose EMR / EHR information in HL7 format, but i don't have any idea how to access that.
I had also requested Cerner for the same awaiting replies from their end.
If anybody who have associated with similar kind of job can through some light and provide me with some insights.

You will need to request an interface between your organization and the facility with the EMR. An interface in the Health Care IT world is not the same as a GUI. Is is the mechanism (program/tool) that transfers HL7 data between one entity and the other. There will probably be a cost to have an interface setup. However, that is the traditional way Cerner communicates with 3rd parties. HIPAA laws will require that this connection be very secure.
You might also see if the facility with the EMR has an existing interface that produces the info you are after. You may be able to share that data or have a flat file generated from that interface that you could get access to. Because of HIPAA regulations, your client may be reluctant to share information in that manner.
I would suggest you start with your client's interface/integration team. They would be the ones that manage the information into and out of Cerner. They could also shed some light on how they prefer to see things done.
Good Luck

There are two ways of achieving this as I know. One is a direct connectivity to Cerner's Oracle database. This seems less likely to be possible as Cerner doesn't allow other vendors to have a direct access to their database.
The other way is to use Cerner's mPage Web Services. We have achieved this using mPage Web Services. The client needs to host the web services on a IBM WAS or some other container. We used WAS as that was readily available to us. Once the hosting is done, you will get a URL and using that you can execute any CCL program which will return you the data in JSON/XML format. mPage webservice has a basic HTTP authentication.
Now, CCL has to be written in a way which can return you the data you require.
We have a successful setup and have been working on this since 2014. For more details you can also try uCERN portal.
Thanks,
Navin

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string