I am quite new to Azure and I am getting a bit lost in all the available services.
What do I want to do:
I want to run a Python project serverless on Azure. It gets data from a database, processes and analyses it, and writes the results back to a database. When it is done, the compute should shut down again. The run should be triggered either by data being uploaded to a storage location or by a periodic schedule. Ideally, I would also like to be able to deploy it through CD (GitHub Actions).
What did I find
Reading through the documentation and some other resources, these are the services I think I can use, in descending order of preference, but I am not 100% sure.
Azure Functions
Azure Container Instances
Azure Web Apps
I also found this, but it seems outdated.
Question:
Which Azure service best fits my use case?
What you are trying to accomplish has a name: ETL (Extract, Transform, Load). This is a general pattern for taking data from its source (a DB in your case), manipulating it, and offloading it to some destination (again a DB in your case).
You listed some valid options. From your list, Azure Functions is the truly serverless option, as you aren't billed while it is idle. The other options can also accomplish the task, but you will also pay for the hours when your code does nothing.
There is also a service built exactly for this need: Azure Data Factory. You can design your data flow in its UI and include your Python functions as steps. The overall result is a data pipeline (like CD, but for data). And of course it is serverless: you are billed only for the time the pipeline is executing.
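If you go the Azure Functions route, here is a minimal sketch of what the trigger side could look like with the Python v2 programming model. The container path and the commented helper steps are assumptions you would replace with your own database access and analysis code:

import logging
import azure.functions as func

app = func.FunctionApp()

# Fires when a file lands in the (assumed) "incoming" container;
# a timer trigger would cover the periodic case instead.
@app.blob_trigger(arg_name="blob",
                  path="incoming/{name}",
                  connection="AzureWebJobsStorage")
def run_etl(blob: func.InputStream):
    logging.info("Triggered by upload: %s", blob.name)
    payload = blob.read()
    # 1. Extract: query the source database (e.g. with pyodbc or SQLAlchemy).
    # 2. Transform: run your processing/analysis on the rows.
    # 3. Load: write the results back to the target database.
    # The Functions host tears the worker down after the invocation,
    # so there is no server you need to stop yourself.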
I need to download data from several APIs, some using plain REST and some using GraphQL, parse the data, and map some of the fields to various Azure SQL tables, discarding the unneeded data (it will later be visualised in Power BI).
I started using Azure Data Factory but got frustrated with the lack of simple functions, such as converting a JSON field containing HTML into text.
I then looked at Azure Functions, thinking Python (although I'm open to Node.js); however, I have a lot of data to download and upsert into the database, and there are mentions on the internet that ADF is the most efficient way to bulk upsert data.
Then I considered a hybrid: an Azure Function to get the data and ADF to bulk copy it.
So my question is: what should I be using for my use case? I'm open to any suggestions, but it needs to be on Azure and cost-sensitive. The ingestion needs to run daily, upserting around 300,000 records.
I think this pretty much comes down to taste, as you can probably solve this entirely with just ADF or just an Azure Function, depending on the specific circumstances of your case. In my personal experience I have often ended up with the hybrid variant, because it offers more flexibility than the standard API components of ADF: do the extraction from the API in an Azure Function, store the data in Blob Storage / Data Lake, and then load it into the database with ADF (see the sketch below). This setup can be pretty cost-effective in my experience, provided you can use an Azure Functions consumption plan (cheaper than the alternatives) and/or can avoid using Data Flows in ADF (a significant cost driver in ADF).
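To make the hybrid variant concrete, here is a rough sketch of the extraction half only, assuming the Python v2 Functions programming model, the azure-storage-blob SDK, and made-up names for the API endpoint, container and connection-string app setting; the ADF pipeline would then pick the file up and do the bulk upsert into Azure SQL:

import datetime
import json
import os

import azure.functions as func
import requests
from azure.storage.blob import BlobServiceClient

app = func.FunctionApp()

# Daily pull at 05:00 UTC (the schedule is just an example).
@app.timer_trigger(schedule="0 0 5 * * *", arg_name="timer")
def extract_to_blob(timer: func.TimerRequest):
    # Hypothetical REST endpoint; a GraphQL source would use requests.post with a query body.
    resp = requests.get("https://api.example.com/v1/orders", timeout=60)
    resp.raise_for_status()

    blobs = BlobServiceClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"])  # assumed app-setting name
    name = f"raw/orders/{datetime.date.today().isoformat()}.json"
    blobs.get_blob_client(container="landing", blob=name) \
        .upload_blob(json.dumps(resp.json()), overwrite=True)
    # An ADF Copy activity then loads the file into Azure SQL, so the
    # 300k-row upsert never happens row by row inside the function.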
I need to run some Scheduled Queries on-demand.
I believe Cloud Functions triggered by Pub/Sub events would be a solution that provides good decoupling.
However, I can't find a reliable solution.
This solution crashes
BigQuery + Cloud Functions:
This one works only on the documentation page
Method: transferConfigs.startManualRuns
What is the best way to trigger On-Demand Scheduled Queries from cloud function?
I understand that you don't want scheduled queries; you want a query you can easily invoke without rewriting it.
I can propose 2 solutions:
Store your query in a file on Cloud Storage. When you invoke your Cloud Function, read the file content and run a BigQuery job on it (see the sketch after the pros and cons).
PRO: you simply have to update the file content to update the query.
CONS: you need to read a file from Storage and then call BigQuery -> 2 API calls and a query file to manage.
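A minimal sketch of that first option in Python, assuming the google-cloud-storage and google-cloud-bigquery client libraries and made-up bucket and object names; the (event, context) signature matches a Pub/Sub-triggered function:

from google.cloud import bigquery, storage

def run_stored_query(event, context):
    # Read the SQL text from Cloud Storage (bucket/object names are examples).
    sql = (storage.Client()
           .bucket("my-query-bucket")
           .blob("daily_report.sql")
           .download_as_text())

    # Run it as an ordinary BigQuery job and wait for it to finish.
    job = bigquery.Client().query(sql)
    job.result()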
Use a stored procedure
First, create the stored procedure:
CREATE OR REPLACE PROCEDURE `my_project.my_dataset.my_procedure`()
BEGIN
  SELECT * FROM `my_project.my_dataset.my_table`;
  -- ...
END;
Then invoke it from your Cloud Function (it is just another query to BigQuery; see the sketch after the pros and cons):
CALL `my_project.my_dataset.my_procedure`();
PRO: simply update the stored procedure to update the query; it can perform complex queries.
CONS: you don't have a query history (in the previous solution you can enable bucket versioning to get one).
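The Cloud Function side of the second option is just as small; a sketch assuming the google-cloud-bigquery client library and the same Pub/Sub-triggered signature:

from google.cloud import bigquery

def call_procedure(event, context):
    # CALL is just another query job from the client's point of view.
    job = bigquery.Client().query(
        "CALL `my_project.my_dataset.my_procedure`();")
    job.result()  # wait for the procedure to complete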
Are these acceptable solutions for you?
I am confused about how I should deploy my microservices in Azure. I deployed them by creating a couple of App Services, one per microservice, using an ARM template. It is becoming very costly to deploy every microservice to a separate App Service, and very difficult to manage all these services. Another approach I was considering is to deploy everything under one App Service, but that would again be a monolithic kind of web API.
Recently, I learned from a blog that you should use Azure Service Fabric to deploy microservices.
I want to understand which of the options below I should opt for:
One App Service.
Multiple microservices, each in its own App Service.
Containerization with Kubernetes (or another orchestrator).
Azure Service Fabric.
Any other option you would suggest.
I am really confused about these. Please help me.
Thanks in Advance!!!
I'd highly recommend starting with the Azure Architecture Guide which will give you a solid big-picture overview. From there, you could take a look at the microservice-specific guidance.
To provide a very short, incomplete answer to your question, App Services are a unit of scale. If you're building a small service that focuses on one domain, and all of your functionality can scale together, you may be better off with one application hosted on one App Service. Know your domain first; don't split things up just to have microservices.
To choose which Azure compute service to use, this decision tree is very helpful.
Microservices are not only a solution to technological problems; they are also a solution to an organizational scalability problem. On the other hand, microservices are really hard to manage, which is why they usually cannot be implemented without DevOps practices to help solve that problem.
I am saying all this because you wrote that they are becoming hard to manage, and it might be that the problem is not technical, but instead you don't have the right org structure and processes to handle microservices.
Microservices work well when the team that builds one also runs it, and that includes deployment, support, etc. You should not have one person/team "handling" and deploying other teams' microservices because, as you discovered, microservices are really hard to manage.
From a purely technological point of view, you need to clarify a few things:
How many microservices you have
How many teams/developers
What technologies your microservices are built in
How chatty these microservices are with each other
Based on those four questions, you will probably end up on App Services if you have a small number of microservices, they do not need more than 10 instances each, they are built in one of the supported technologies, and they are not super chatty with each other.
I would use AKS if you have a lot of microservices and many teams, so that it is worth having a small platform team becoming expert in Kubernetes (but not in charge of the deployments!).
I would recommend you go through these links:
Martin Fowler: https://martinfowler.com/microservices/
DevOps at Microsoft: https://www.youtube.com/watch?v=OwiT59e0kB4&t=349s
Why not to do Microservices: https://segment.com/blog/goodbye-microservices/
One of the many microservices talks on YouTube: https://www.youtube.com/watch?v=MrV0DqTqpFU
There are a lot of IoT platforms on the market, like AWS IoT and Microsoft Azure IoT Hub, and I understand all the features those platforms offer.
Questions:
Couldn't I implement all those features in a normal web application that handles the communication, run that application on a cluster of unmanaged servers, and get the same result?
When should I use a normal web application, and when should I use an IoT platform?
Of course you can implement your own IoT hub on any web application and cloud (or on-prem) platform; there is nothing secret or proprietary in those solutions. The question is, do you want to do that? What they offer is a lot of built-in functionality that would take you some serious time to get production-ready when building it yourself.
So:
1) Yes, you can build it. Let's compare it to Azure IoT Hub and look at what that contains:
a) reliable messages to and from hub
b) periodic health pulses
c) connected device inventory and device provisioning
d) support for multiple protocols (e.g. HTTP, AMQP, MQTT...)
e) access control and security using tokens
.... and more. This is not supposed to be a full feature list, just to illustrate that these solutions contain a whole lot of functionality which you may (or may not) need when building your own IoT solution.
2) When does it make sense to build this yourself? I would say when you have a solution where you don't really need all of that functionality, or can easily build or set up the parts you need yourself. Building all of that functionality doesn't, generally speaking, make sense unless you are building your own IoT platform.
Another aspect is the ability to scale and to offer a solution in multiple geographic locations. A web application on a cloud provider could easily be set up to both autoscale and cover multiple regions, but it is still something you would have to set up and manage yourself. It would likely also be more expensive to provide the same performance as the platform services do; they are built for millions of devices across a large number of customers, and their solution will likely look different under the hood.
Third is time to market: going with a platform service will get you up and running with your IoT solution fairly quickly, as opposed to building it yourself.
Figure out what requirements you want to support, how you want to scale, how many devices and so on. Then you can do a simple comparison of price and also what it would cost you to build the features you need.
In the SQL world, it's quite common to have a tool that goes through a folder of schema scripts to set up a schema. A widely used approach is to have a table holding the current DB version number plus DDL scripts, so that we can start from any version of the DB and update it to any subsequent version in a controlled manner. Visual Studio has database projects, and Redgate has similar tools.
I was wondering if there's something like this for Cassandra as well. I know it wouldn't be too difficult to implement something basic for Cassandra, but I was wondering if somebody has already done it.
Pillar manages migrations for your Cassandra data stores.
Pillar grew from a desire to automatically manage Cassandra schema as code. Managing schema as code enables automated build and deployment, a foundational practice for an organization striving to achieve Continuous Delivery.
Pillar is to Cassandra what Rails ActiveRecord migrations or Play Evolutions are to relational databases with one key difference: Pillar is completely independent from any application development framework.
https://github.com/comeara/pillar
Your initial question doesn't specify a language, though you later indicate you'd like C#. I don't have a C# answer, but I've extracted the Java versioning component that I'm using for my project. I also created a small sample project that shows how to integrate it. It's bare-bones. There are different approaches to this problem, so I picked one that was simple to build and does what I need. Here are the two GitHub projects:
https://github.com/DonBranson/cql_schema_versioning
https://github.com/DonBranson/cql_schema_versioning_example
This component doesn't store a version number in the schema; instead, it stores the list of scripts it has run and relies on the sort order of the script names to determine the run order. Very basic (the general pattern is sketched below).
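For what it's worth, the general pattern (not the component itself) is small enough to sketch in Python with the cassandra-driver package; the keyspace, table and directory names here are made up:

from pathlib import Path
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")
session.execute("""
    CREATE TABLE IF NOT EXISTS schema_migrations (
        script_name text PRIMARY KEY,
        applied_at  timestamp)""")

applied = {row.script_name
           for row in session.execute("SELECT script_name FROM schema_migrations")}

# Apply outstanding scripts in file-name order, then record them.
for script in sorted(Path("migrations").glob("*.cql")):
    if script.name in applied:
        continue
    for statement in script.read_text().split(";"):
        if statement.strip():
            session.execute(statement)
    session.execute(
        "INSERT INTO schema_migrations (script_name, applied_at) "
        "VALUES (%s, toTimestamp(now()))",
        (script.name,))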
Cassandra is by its nature 'schemaless'; it is a structured key-value store, so it is very different from a traditional RDBMS in that regard.
Cassandra has now evolved to be 'schema-optional', in that it allows you to describe the general data types that live in a particular column family.
Try looking at Liquibase and/or Flyway to see whether their extensions provide the versioning capability you require.
http://bungeedata.blogspot.com/2013/12/liquibase-and-cassandra.html
http://www.datastax.com/dev/blog/schema-in-cassandra-1-1
http://planetcassandra.org/blog/schema-vs-schema-less/
I was looking for a schema migration tool that could be used for the following scenarios:
Automated upgrade to schema when an application is deployed.
Allow test Cassandra databases to be populated for integration tests.
After some searching, I've found the following two that look like potential candidates:
https://github.com/Contrast-Security-OSS/cassandra-migration
https://github.com/DonBranson/cql_schema_versioning
I'm not aware of anything that exists today.
To the extent that you're using CQL, you could probably come up with something, but you'll likely run into problems with CQL's limited ability to modify tables, and then with the transformation phase.
When I've used these types of tools with SQL, I always ended up with a bunch of SQL to update the data set after the application of updated DDL.
With CQL, I've had to write code to be applied after the schema change.
If all you're doing is adding or dropping tables, columns and indexes, it should be doable.
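As an illustration of that "code after the schema change" step, a sketch with the cassandra-driver package and made-up keyspace, table and column names:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")

# The schema change itself is a single CQL statement...
session.execute("ALTER TABLE users ADD full_name text")

# ...but back-filling the new column has to happen in application code,
# since CQL cannot update one column from the values of others.
for row in session.execute("SELECT user_id, first_name, last_name FROM users"):
    session.execute(
        "UPDATE users SET full_name = %s WHERE user_id = %s",
        (f"{row.first_name} {row.last_name}", row.user_id))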