Pins + Vetiver vs MLflow: which one to choose for MLOps?

I am a big fan of tidymodels and have played around with vetiver + pins in R and Python, not only to develop models but also to actually deploy them.
However, if you are looking for tools that support MLOps, sooner or later you will stumble across MLflow. Just like vetiver + pins, MLflow helps you track and deploy models and build a model registry.
I see some pros for vetiver, such as:
You can directly dockerize a model or create a REST service.
vetiver + pins is very easy to use and does not require a lot of setup.
At this point, I'd like to ask the community whether there are any other advantages of vetiver + pins over MLflow, or whether it is advisable to use MLflow directly, since it is completely agnostic regarding the programming language and already has a very large community.
Many thanks for your answers! M.

Related

Generic test data generation using NLP

I am new to the world of AI and NLP and am trying to create functionality similar to datasloth.generate(). That function works perfectly well for my use case; the only problem is that our organization has not approved the use of OpenAI outside of POCs, and it is also a bit expensive.
I am after similar functionality that uses open-source libraries instead of OpenAI GPT-3 (I expect some degradation in results with the alternatives).
Can you point me in the right direction?

If I'm integrating two separate APIs, is a MERN stack appropriate?

I have two separate cloud-based APIs that I am working on integrating. The two systems do not talk to each other directly, so I am creating something in the middle to get them to communicate. I have had trouble finding examples or documentation on how exactly to do this. Does anyone know of any resources that could help me out?
My plan going in was to use a MERN Stack, running on a local server to do GET and POST requests to both APIs, use some mapping and logic to transpose the data into the correct format and send it to the other software. I do not have a client per se (other than myself) on my end, so I really will be skipping the React part of MERN, at least that is what I'm thinking. I'll be using Mongo to keep track of both sets of data for redundancy. I also considered using a LAMP Stack but felt that MERN would be faster in handling the data, and Mongo is more flexible in handling different data formats. If there is another process or technology that could help me that I'm not thinking of, I would be grateful to hear about it.
Has anyone encountered something like this before? Thank you.
As with most architecture questions, there's no completely right or wrong answer here. You could certainly design a well-built system for this purpose with either stack, even more so given that your front-end framework is not an important consideration. Instead, ask yourself questions like this:
Which stack do you have more experience with, and is this an appropriate time to learn a new set of technologies, or is it important to do the best work you're capable of right now (how important is time, cost, or quality in this case)?
Another generalization I'll stick my neck out for is a data-first approach: what sort of data are you dealing with from each cloud integration, and what kind of data do you need to support and/or create in order to make your system work? Mongo, being a NoSQL persistence layer, will allow you to change your data model and handle more varied data more quickly and easily than a SQL solution will. This is a double-edged sword, however, as the lack of validation and of a strongly constrained (typed?) data model will make your application harder to work with and debug as it grows. In short: how big might this application grow?
If you have a handy and familiar way to manage the three different data models you're dealing with (cloud service 1, cloud service 2, and your app) via MySQL, then that's a compelling reason to use it. However, if your style is to start dumping data into your database and you're comfortable with a more iterative approach (which may require more, albeit shorter, rounds of refactoring), then Mongo with MERN may be the preferable choice.
Finally, will others ever be working on this application? If so, which language would you prefer to work in with them: PHP or JavaScript?
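As a rough illustration of the "mapping and logic" step from the question, and of the validation concern raised above, here is a minimal sketch that translates a record from one service's shape into the other's and rejects records with missing fields. It is written in Python only for brevity; the same structure carries over to Node. Both record formats, all field names, and the names map_contact and map_batch are hypothetical, not taken from either API.

```python
from __future__ import annotations

from dataclasses import dataclass


class MappingError(ValueError):
    """Raised when a source record cannot be mapped to the target format."""


@dataclass(frozen=True)
class TargetContact:
    # Hypothetical shape expected by "cloud service 2".
    external_id: str
    first_name: str
    last_name: str
    email: str


# Hypothetical fields that "cloud service 1" is assumed to return.
REQUIRED_SOURCE_FIELDS = ("id", "full_name", "email")


def map_contact(source: dict) -> TargetContact:
    """Validate one source record and translate it to the target shape."""
    missing = [f for f in REQUIRED_SOURCE_FIELDS if source.get(f) in (None, "")]
    if missing:
        raise MappingError(f"missing fields: {', '.join(missing)}")

    # Split on the first space; a single-word name leaves last_name empty.
    first, _, last = str(source["full_name"]).strip().partition(" ")
    return TargetContact(
        external_id=str(source["id"]),
        first_name=first,
        last_name=last.strip(),
        email=str(source["email"]).strip().lower(),
    )


def map_batch(records: list[dict]) -> tuple[list[TargetContact], list[tuple[dict, str]]]:
    """Map every record; collect failures instead of aborting the whole batch."""
    mapped: list[TargetContact] = []
    rejected: list[tuple[dict, str]] = []
    for record in records:
        try:
            mapped.append(map_contact(record))
        except MappingError as exc:
            rejected.append((record, str(exc)))
    return mapped, rejected
```

Keeping the mapping as a pure function, separate from the GET/POST calls and from the database, means it can be tested without either API being reachable, whichever stack ends up hosting it.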

Web application deployment approaches

Currently, our product is a web application with SQL Server as DBMS, ASP.NET backend, and classic HTML/JavaScript/CSS frontend. The product is actively developed and each month we have to deploy a new version of it to production.
During this deployment, we update all the components listed above (apply some SQL scripts, update binaries and client files), but we deploy only the delta (the set of files that have changed since the last release). This has some benefits; for example, we do not reset custom data, configs, or client adjustments.
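To make "deploying only the delta" concrete, here is a minimal sketch that compares two release manifests (relative path mapped to content hash) and reports which files would be added, changed, or removed. The manifest format and the names build_manifest and compute_delta are assumptions for illustration, not a description of our actual tooling or of any particular deployment product.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    manifest: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compute_delta(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Return the files added, changed, and removed between two manifests."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(
            p for p in current.keys() & previous.keys() if current[p] != previous[p]
        ),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

The same comparison, run between the manifest of the last release and the files actually present on a host, would also show how far that host has drifted from what was deployed.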
Now we are going to move to the cloud (Azure, AWS, etc.), adjust the product architecture to be compatible with Docker/Kubernetes, and provide the product as SaaS.
And now the question itself: "Which deployment approach is recommended in the cloud?" Can we keep applying only the delta? Or do we have to reorganize the process to always deploy from scratch?
If there are some Internet resources I have missed, please share.
This question is extremely broad but maybe some clarification could steer you in the right direction anyway:
Source code deployments (like applying deltas) and container deployments are two very different directions, in the sense that the tooling you invest in during the entire SDLC can differ substantially. Some testing pipelines/products focus heavily (or exclusively) on working with one or the other. There will be tools that can handle both, of course.
They also differ in the problems they're attempting to solve and come with some pros and cons:
Source Code Deployments/Apply Diffs:
Good for small teams and quick deployments, as they're simple to understand and set up.
Starts to introduce risk when you need to upgrade the host OS or application dependencies.
Starts to introduce risk when the hosts in production begin to drift (have more differing files than expected) more dramatically over time.
Slack has a good write-up of their experience here.
Container deployments:
Provides isolation between the application (developer space) and the host OS (sysadmin/ops space). This usually means the two can be worked on independently of each other.
Gives an "artifact" that won't change between deployments, i.e. the container tagged v1 will always be the same unless you do something really funky. You can't really guarantee this with source code deployments.
The practice of isolating stateless components makes autoscaling those components very easy, and you can eventually spend more time on the harder ones (usually stateful).
Introduces a new abstraction with new concerns that your team will have to mature into. Testing pipelines, dev tooling, and monitoring/logging architectures might all need to be adjusted over time, and that comes with cost and risk.
Stateful containers are hardly a solved problem (i.e. shoving an existing database into a container can be a surprising challenge).
In order to work with Kubernetes, you need to have a containerized application. That doesn't mean you need to containerize your entire product overnight. Splitting out the front end to deploy with CloudFront/S3 and containerizing a stateless app will get your feet wet.
Some books that talk about DevOps philosophies (in which this transition plays a part):
The DevOps Handbook
Accelerate
Effective DevOps
SRE book

Should I be moving to a microservices based architecture?

I am working on a monolithic system. All of its code is in one repository (Web API and background workers). The system is written in Node.js, and MongoDB (via Mongoose) is used as the data store. My goal is to set a new path for how the project should evolve. At first I was wondering whether I could move towards a microservices-based architecture.
The monolithic architecture creates some problems:
If my background workers need to scale, I have to deploy the whole project to the server despite using only a small fraction of it.
The whole system must be redeployed when code changes. What if a payment processor calls a webhook while the system is being redeployed?
The advantages of using microservices are quite obvious:
Smaller code base for each individual microservice, which makes it easier to reason about.
Ability to select programming tools best for particular use case.
Easier to scale.
Looking at the current code, I noticed that Mongoose ODM (Object Document Mapper) models are used across the whole project to create, query, and update models in the database. As a principle of good programming, all such interactions with the database should be abstracted, and business logic should not leak into other system layers. I could achieve that by introducing the Repository pattern (from Domain-Driven Design). While the code is still shared across the Web API and its background workers, this is not a hard task.
If I decide to extract the repositories into standalone microservices, then a whole bunch of problems arise:
Some sort of query language must be introduced to accommodate complex search queries.
The interface must provide a way to iterate over search results (cursor-based navigation) without returning all database documents over the network.
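To show what I mean by a repository with cursor-based navigation, here is a minimal sketch. It is in Python rather than Node.js purely to keep it short, and the Order model, the OrderRepository interface, and the in-memory implementation are all hypothetical stand-ins for what a Mongoose-backed version might look like.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object; the real project has its own models.
    id: int
    customer: str
    total: float


@dataclass(frozen=True)
class Page:
    items: list[Order]
    next_cursor: Optional[int]  # id of the last item returned, None when exhausted


class OrderRepository(Protocol):
    """What business logic depends on, instead of the ODM models directly."""

    def find_by_customer(
        self, customer: str, after: Optional[int] = None, limit: int = 2
    ) -> Page: ...


class InMemoryOrderRepository:
    """Stand-in for a database-backed implementation of OrderRepository."""

    def __init__(self, orders: list[Order]) -> None:
        self._orders = sorted(orders, key=lambda o: o.id)

    def find_by_customer(
        self, customer: str, after: Optional[int] = None, limit: int = 2
    ) -> Page:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        # Only consider orders strictly past the cursor, in id order.
        matches = [
            o for o in self._orders
            if o.customer == customer and (after is None or o.id > after)
        ]
        items = matches[:limit]
        has_more = len(matches) > limit
        return Page(items, items[-1].id if has_more else None)


def iterate_orders(repo: OrderRepository, customer: str, limit: int = 2) -> Iterator[Order]:
    """Walk all results one page at a time instead of loading everything."""
    cursor: Optional[int] = None
    while True:
        page = repo.find_by_customer(customer, after=cursor, limit=limit)
        yield from page.items
        if page.next_cursor is None:
            return
        cursor = page.next_cursor
```

As long as the repository lives in the same process, the cursor is just an argument. Once it sits behind a network boundary, the cursor and the query both have to be serialized, which is where the extra complexity listed above comes from.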
Since the project is in its early stage and I am the only developer, moving to a microservices-based architecture seems like overkill. Maybe there are other approaches I should consider?
For example, extracting the business logic and database interaction into a separate repository and sharing it among services, to avoid complex communication protocols between services?
Based on my experience working with microservices over the last few years, it seems like overkill in the current scenario, but it pays off in the long term.
Based on the information stated above, my thoughts are:
Code Structure - Applying a Microservices Architecture (MSA) in the above context does not mean separating DAO, business logic, etc.; rather, it is about designing the system around business functions. For example, if it is an eCommerce application, then you can have shipping, cart, and search as separate services, which can be further divided into smaller services. Read more about domain-driven design here.
Deployment Unit - Keeping each microservice as an independent deployment unit is a key principle. Hence, take a vertical slice of the application and package it as a Docker image with the application code, app server (if any), database, and OS (Linux etc.).
Communication - With MSA, communication between services becomes key, and hence the general practice is to stick with a message-oriented approach for communication (read about reactive systems and reactive programming for more insight).
PaaS Solution - There are multiple PaaS solutions available, which you can apply so that you don't need to worry about all the other aspects like container management, container orchestration, auto-scaling, configuration management, log management, monitoring, etc. See the following PaaS solutions:
https://www.nanoscale.io/ by TIBCO
https://fabric8.io/ - by RedHat
https://openshift.io - by RedHat
Cloud Vendor Platforms - AWS, Azure, and Google Cloud all have specific support for microservices apps from the deployment perspective, which you can use as an alternative if you don't want to deploy a PaaS solution in your organization.
Hope these pointers will help in understanding the overall landscape so that you can structure your architecture for future needs.
I am working on a monolith system... My goal is to set a new path how project should evolve. At first I was wondering if I could move towards microservices based architecture.
In what ways do you need to evolve the project? Will it be mostly bugfixes, adding features, improving performance and/or scalability? Do you anticipate other developers collaborating in the future? Are you currently having maintenance issues? The answers to these questions (and many more) should be considered in guiding your choices.
You seem to be doing your homework around the pros and cons of a microservice architecture, so if you haven't asked yourself why you're even doing this in the first place, now would be a good time to do so.
Maybe there are other approaches I should consider?
There's always the good old don't-break-what's-going ;)

BDD with Cucumber to guide Chef development

I like Cucumber a lot and find it a very useful tool for solving problems with an outside-in approach, so I would like to use it in Chef projects too. I have successfully integrated it into the project I'm working on, but when it comes to writing the business goals of the features I have some doubts.
Who is the end user here?
Depending on the answer, the features will be more or less service-oriented, i.e.:
If the feature is more architecture-facing, then I could write a MongoDB feature which describes that I need a MongoDB service up and running and that the application is linked to it.
On the other hand, I could just write application features, forgetting about the infrastructure behind them, and then assume that if the Cucumber tests pass for the application, the infrastructure is fine too. (I don't like this approach.)
Which of the two approaches is better? I like the first one most, but I'm just a noob in these lands. Please give me your considerations.
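To make the first approach a bit more tangible: an infrastructure-facing step like "a MongoDB service is up and running" could boil down to a check such as the one sketched below. Real Cucumber step definitions in a Chef project would normally be written in Ruby; this is Python only as an illustration, and the helper name service_is_listening is hypothetical. It only verifies that something accepts TCP connections on the port (27017 is MongoDB's default), not that the application is correctly linked to it.

```python
import socket


def service_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Example: check the default MongoDB port on the local machine.
    print(service_is_listening("127.0.0.1", 27017))
```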

Resources