Design for a Cloud Native Application in Azure for ML Insights and Actions

Design for a Cloud Native Application in Azure for ML Insights and Actions - azure

I have an idea whereby I intend to build a cloud native application for algorithmic trading, ideally by consuming all PaaS and SaaS (no IaaS), and I'd like to get some feedback on how I intend to build it. The concept is pretty straight-forward in that I intend to consume financial trading data from an external SaaS solution via an API query, feed that data into various Azure PaaS solutions (most notably ML for modeling), and then take some action. Here is a high-level diagram I've come up with so far:
Solution Overview
As a note, while I'm familiar with Azure, I'm not a Azure cloud engineer and have limited experience in actually building solutions myself. Subsequently, I intend to use this project as a foundation to further educate myself.
When starting on the build, I immediately questioned whether I should or shouldn't use Event Hubs. Conceptually it makes sense, in that I'm decoupling the production of a data stream from the consumption of it. Presumably, this facilitates less complications when / if I need to update the data feed(s) in the future. I also thought about where the data is stored... should it be a SQL database, or more simply, an Azure Table? The idea here is that the trading data will need to be stored for regression testing as my iterate through my models. All that said, looking for some insights from anybody that may have experience in this space.
Thanks!

There's no real question in here. Take a look on the architecture reference provided by Microsoft: https://learn.microsoft.com/en-us/azure/architecture/reference-architectures/

Related

Should I be moving to a microservices based architecture?

I am working on a monolith system. All of it's code is in one repository (Web API and background workers). System is written in Nodejs and MongoDB (Mongoose) is used as a data store. My goal is to set a new path how project should evolve. At first I was wondering if I could move towards microservices based architecture.
Monolith architecture creates some problems:
If my background workers needs to scale. I have to deploy all the project to the server despite only using a small fraction of it.
All system must be redeployed when code changes. What if payment processor calls webhook while system is being redeployed?
Using microsevices advantages are quite obvious:
Smaller code base for individual microservice. Easier to reason about it.
Ability to select programming tools best for particular use case.
Easier to scale.
Looking at the current code I noticed that Mongoose ODM (Object Document Mapper) models are used across all the project to create, query and update models in database. As a principle of a good programming all such interactions with database should be abstracted. Business logic should not leak into other system layers. I could do that by introducing REPOSITORY pattern (Domain Driven Design). While code is still being shared across web api and it's background workers it is not a hard task to do.
If i decide to extract repositories into standalone microservices than all bunch of problems arise:
Some sort of query language must be introduced to accommodate complex search queries.
Interface must provide a way to iterate over search results (cursor based navigation) without returning all database documents over network.
Since project is in it's early stage and I am the only developer, going to microservices based architecture seems like an overkill. Maybe there are other approaches I should consider?
Extracting business logic and interaction with database into separate repository and sharing among services to avoid complex communication protocols between services?

Based on my experience with working in Microservices for last few years, it seems like an overkill in current scenario but pays off in long-term.
Based on the information stated above, my thoughts are:
Code Structure - Microservices Architecture (MSA) applying in above context means not separating DAO, Business Logic etc. rather is more on the designing system as per business functions. For example, if it is an eCommerce application, then you can shipping, cart, search as separate services, which can further be divided into smaller services. Read it more about domain-driven design here.
Deployment Unit - Keeping microservices apps as an independent deployment unit is a key principle. Hence, keep a vertical slice of the application and package them as Docker Image with Application Code, App Server (if any), Database and OS (Linux etc.)
Communication - With MSA, communication between services become a key and hence general practice is to remain with the message-oriented approach for communication (read about the reactive system and reactive programming for more insight).
PaaS Solution - There are multiple PaaS solutions available, which you can apply so that you don't need to worry about all the other aspects like container management, container orchestration, auto-scaling, configuration management, log management and monitoring etc. See following PaaS solutions:
https://www.nanoscale.io/ by TIBCO
https://fabric8.io/ - by RedHat
https://openshift.io - by RedHat
Cloud Vendor Platforms - AWS, Azure & Google Cloud all of them have specific support for Microservices App from the deployment perspective, which we can use as an alternative solution if you don't want to deploy PaaS solution in your organization.
Hope these pointers will have in understanding the overall landscape so that you can structure your architecture for future need.

I am working on a monolith system... My goal is to set a new path how project should evolve. At first I was wondering if I could move towards microservices based architecture.
In what ways do you need to evolve the project? Will it be mostly bugfixes, adding features, improving performance and/or scalability? Do you anticipate other developers collaborating in the future? Are you currently having maintenance issues? The answers to these questions (and many more) should be considered in guiding your choices.
You seem to be doing your homework around the pros and cons of a microservice architecture, so if you haven't asked yourself why you're even doing this in the first place, now would be good time to do so.
Maybe there are other approaches I should consider?
There's always the good old don't-break-what's-going ;)

Decision path for Azure Service Fabric Programming Models

Background
We are looking at porting a 'monolithic' 3 tier Web app to a microservices architecture. The web app displays listings to a consumer (think Craiglist).
The backend consists of a REST API that calls into a SQL DB and returns JSON for a SPA app to build a UI (there's also a mobile app). Data is written to the SQL DB via background services (ftp + worker roles). There's also some pages that allow writes by the user.
Information required:
I'm trying to figure out how (if at all), Azure Service Fabric would be a good fit for a microservices architecture in my scenario. I know the pros/cons of microservices vs monolith, but i'm trying to figure out the application of various microservice programming models to our current architecture.
Questions
Is Azure Service Fabric a good fit for this? If not, other recommendations? Currently i'm leaning towards a bunch of OWIN-based .NET web sites, split up by area/service, each hosted on their own machine and tied together by an API gateway.
Which Service Fabric programming model would i go for? Stateless services with their own backing DB? I can't see how Stateful or Actor model would help here.
If i went with Stateful services/Actor, how would i go about updating data as part of a maintenance/ad-hoc admin request? Traditionally we would simply login to the DB and update the data, and the API would return the new data - but if it's persisted in-memory/across nodes in a cluster, how would we update it? Would i have to expose this all via methods on the service? Similarly, how would I import my existing SQL data into a stateful service?
For Stateful services/actor model, how can I 'see' the data visually, with an object Explorer/UI. Our data is our Gold, and I'm concerned of the lack of control/visibility of it in the reliable services models
Basically, is there some documentation on the decision path towards which programming model to go for? I could model a "listing" as an Actor, and have millions of those - sure, but i could also have a Stateful service that stores the listing locally, and i could also have a Stateless service that fetches it from the DB. How does one decide as to which is the best approach, for a given use case?
Thanks.

What is it about your current setup that isn't meeting your requirements? What do you hope to gain from a more complex architecture?
Microservices aren't a magic bullet. You mainly get four benefits:
You can scale and distribute pieces of your overall system independently. Service Fabric has very sophisticated tools and advanced capabilities for this.
You can deploy and upgrade pieces of your overall system independently. Service Fabric again has advanced capabilities for this.
You can have a polyglot system - each service can be written in a different language/platform.
You can use conflicting dependencies - each service can have its own set of dependencies, like different framework versions.
All of this comes at a cost and introduces complexity and new ways your system can fail. For example: your fast, compile-time checked in-proc method calls now become slow (by comparison to an in-proc function call) failure-prone network calls. And these are not specific to Service Fabric, btw, this is just what happens you go from in-proc method calls to cross-machine I/O - doesn't matter what platform you use. The decision path here is a pro/con list specific to your application and your requirements.
To answer your Service Fabric questions specifically:
Which programming model do you go for? Start with stateless services with ASP.NET Core. It's going to be the simplest translation of your current architecture that doesn't require mucking around with your data layer.
Stateful has a lot of great uses, but it's not necessarily a replacement for your RDBMS. A good place to start is hot data that can be stored in simple key-value pairs, is accessed frequently and needs to be low-latency (you get local reads!), and doesn't need to be datamined. Some examples include user session state, cache data, a "snapshot" of the most recent items in a data stream (like the most recent stock quote in a stream of stock quotes).
Currently the only way to see or query your data is programmatically directly against the Reliable Collection APIs. There is no viewer or "management studio" tool. You have to write (and secure) an API in each service that can display and query data.
Finally, the actor model is a very niche model. It serves specific purposes but if you just treat it as a data store it will not work for you. Like in your example, a listing per actor probably wouldn't work because you can't query across that list, or even have multiple users reading the same listing simultaneously.

Choosing a long-term storage/analytic system?

A brief summary of the project I'm working on:
I was hired as a web dev intern at a small company (part of a larger corporation) close to the state college I attend. For the past couple months, myself and two other interns have been working on the front-end as well as the back-end. The company is prototyping adding sensors to its products (oil/gas industry); we were tasked with building the portal that customers could login to to see data from their machines even if they're not near them.
Basically, we're collecting sensor data (~ten sensors/machine) and it's sent back to us. Where we're stuck is determining the best way to store and analyze long term data. We have a Redis Cache set up for fast access by the front-end, where only the lastest set of data for each machine is stored. But for historical data, I (and my coworkers) are having a tough time deciding the best route to go. Our whole project is based in VS (C#/Razor) with Azure integration (which is amazing by the way), so I'd like to keep the long term storage there as well. As far as I can tell, HDinsight + data in a BLOB seems to be the best option, but I'm fairly green when it comes to backend solutions. I would just like input from some older developers who may have more experience in this area, as we are the only developers here besides a couple older members who are more involved in the engineering side of things vs. development.
So, professionals of stack overflow, what would be your recommendation for long-term data storage and analytics?
PS: I apologize if I have HDinsight confused. From what I understand, it maps data in BLOB storage into HBase for easier analytics? Hadoop/HBase confuses me.

My first recommendation would be Azure Table storage. It provides a highly scalable and low cost data archival solution. If designed properly, you can also get a very decent query performance. Refer to the Azure Storage Table Design Guide for more details.
My second choice would be Azure DocumentDB service which is a NoSQL document database. It costs a bit more but querying data is much more flexible.
You should only go with HDInsight when you have a specific need as it's a resource-intensive and expensive service. Once you identify a specific requirement for a big-data analysis that's when you import your data and process it with HDInsight.

What gives lower latency, to code business logic in T-SQL or js?

I'm about to start developing a back-end service for a mobile app using Azure Mobile Services. But I honestly can't figure out which approach is better for perfomance: to code business logic using stored procedures in T-SQL or doing it using javascript. Other than perfomance, also, which one gives more oportunnity to reuse?

JavaScript or C# would offer more opportunity for reuse, if for example you later expand your app and need to provide fuller web services than WAMS can provide. In terms of performance there's probably not enough difference to tip the scale one way or the other, since the IO is the main factor.
As a general rule, embedding business/application logic in your database is to be avoided, partly because SQL-derived languages are rarely ideal for that type of code, but more practically because it makes it much harder to support alternative databases in the future.

What design decisions can I make today, that would make a migration to Azure and Azure Tables easier later?

I'm rebuilding an application from the ground up. At some point in the future...not sure if it's near or far yet, I'd like to move it to Azure. What decisions can I make today, that will make that migration easier.
I'm going to be dealing with large amounts of data, and like the idea of Azure Tables...are there some specific persistance choices I can make now that will mimick Azure Tables so that when the time comes the pain of migration will be lessened?

A good place to start is the Windows Azure Guidance
If you want to use Azure Tables eventually, you could design your database where all tables are a primary key, plus a field with XML data.

I would advise to plan along the lines of almost-infinitely scalable solutions (see Pat Helland's paper on Life beyond distributed transactions) and the CQRS approach in general. This way you'll be able to avoid common pitfalls of the distributed apps generally and Azure table storage peculiarities.
This really helps us to work with Azure and Cloud Computing at Lokad (data-sets are quite large plus various levels of scalability are needed).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string