Trigger BigQuery Scheduled Queries from Cloud Function [closed] - python-3.x

I need to run some Scheduled Queries on-demand.
I believe Cloud Functions triggered by Pubsub events is a solution that provides good decoupling.
However, I can't find a reliable solution.
This solution crashes: BigQuery + Cloud Functions.
This one works only on the documentation page: Method: transferConfigs.startManualRuns.
What is the best way to trigger On-Demand Scheduled Queries from cloud function?

I understand that you don't want a scheduled query; you want a query that you can easily invoke, without rewriting it.
I can propose two solutions:
Store your query in a file on Cloud Storage. When you invoke your Cloud Function, read the file content and run a BigQuery job on it.
PRO: you simply have to update the file content to update the query.
CONS: you need to read a file from Storage and then call BigQuery -> two API calls and a query file to manage.
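As an illustration of this first option, here is a minimal Python Cloud Function sketch (a Pub/Sub background trigger is assumed; the bucket and file names are placeholders to adapt to your project):

# Minimal sketch: read a SQL file from Cloud Storage and run it as a BigQuery job.
from google.cloud import bigquery, storage

def run_query_from_gcs(event, context):
    """Pub/Sub-triggered Cloud Function."""
    storage_client = storage.Client()
    bq_client = bigquery.Client()

    # 1st API call: fetch the query text from Cloud Storage.
    blob = storage_client.bucket("my-queries-bucket").blob("my_query.sql")
    sql = blob.download_as_text()

    # 2nd API call: submit the query as a regular BigQuery job and wait for it.
    job = bq_client.query(sql)
    job.result()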
Use stored procedure
Firstly, create your stored procedure
CREATE OR REPLACE PROCEDURE `my_project.my_dataset.my_procedure`()
BEGIN
  SELECT * FROM `my_project.my_dataset.my_table`;
  -- ... more statements ...
END;
Then invoke it in your Cloud Function (it's just a query sent to BigQuery):
CALL `my_project.my_dataset.my_procedure`();
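A minimal Python sketch of that Cloud Function could look like this (a Pub/Sub background trigger is assumed; the procedure name is the one created above):

from google.cloud import bigquery

def call_procedure(event, context):
    """Pub/Sub-triggered Cloud Function that runs the stored procedure."""
    client = bigquery.Client()
    job = client.query("CALL `my_project.my_dataset.my_procedure`();")
    job.result()  # wait for completion; raises if the procedure fails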
PRO: simply update the stored procedure to update the query. It can perform complex queries.
CONS: you don't have a query history (in the previous solution you can enable bucket versioning to get one).
Are these acceptable solutions?

Related

ADF, Azure Function or Hybrid [closed]

I need to download data from several APIs (some using basic REST, some using GraphQL), parse the data, and map some of the fields to various Azure SQL tables, discarding the unneeded data (it will later be visualised in Power BI).
I started using Azure Data Factory but got frustrated with the lack of simple functions, like converting a JSON field containing HTML into text.
I then looked at Azure Functions, thinking Python (although I'm open to Node.js); however, I have a lot of data to download and upsert into the database, and there are mentions on the internet that ADF is the most efficient way to bulk-upsert data.
Then I thought of a hybrid: an Azure Function to get the data and ADF to do the bulk copy.
So my question is: what should I be using for my use case? I'm open to any suggestions, but it needs to be on Azure and cost sensitive. The ingestion needs to run daily, upserting around 300,000 records.
I think this pretty much comes down to taste, as you can probably solve this entirely using only ADF or only an Azure Function, depending on the more specific circumstances of your case. In my personal experience I've often ended up using the hybrid variant, because it can be easier thanks to the extra flexibility compared to the standard API components of ADF: do the extraction from the API in an Azure Function, store the data in blob storage or a data lake, and then load the data into the database using ADF. This setup can be pretty cost effective in my experience, depending on whether you can use an Azure Functions consumption plan (cheaper than the alternatives) and/or can avoid using data flows in ADF (a significant cost driver in ADF).
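To illustrate the extraction step of that hybrid idea, here is a rough Python sketch (the endpoint, container name and the use of the AzureWebJobsStorage connection string are assumptions); ADF would then copy/upsert the landed file into Azure SQL:

import os
import requests
import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(mytimer: func.TimerRequest) -> None:
    """Timer-triggered Azure Function: pull data from an API and land it in blob storage."""
    # Hypothetical REST endpoint; a GraphQL source would POST a query instead.
    response = requests.get("https://api.example.com/records")
    response.raise_for_status()

    blob_service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    container = blob_service.get_container_client("raw-data")
    # Land the raw payload; ADF picks it up from here and loads it into Azure SQL.
    container.upload_blob("records.json", response.text, overwrite=True)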

Best Azure serverless service to run python data processing project [closed]

I am quite new to Azure and I am getting a bit lost in all the available services.
What do I want to do:
I want to run a Python project serverless on Azure which gets data from a database, processes it, does some analysis and writes it back to a database. After it's done, it should stop the server again. It can be triggered by data uploaded to a storage location, or it has to run periodically. Optimally, I would like to be able to deploy it through CD (GitHub Actions).
What did I find
Reading through the documentation and some other resources, these are the services I think I can use, in descending order of preference, but I am not 100% sure.
Azure Functions
Azure Container Instances
Azure Web Apps
I also found this, but it seems outdated.
Question:
Which Azure service matches my use case best?
What you are trying to accomplish has a name - ETL (Extract-Transform-Load). This is a general pattern when you need to take data from its source (DB in your case), manipulate it, and offload it to some destination (DB in your case again).
You listed some valid options. From your list, Azure Functions would be a truly serverless option, as you aren't billed while it's idling. The other options can also accomplish the task, but you will also pay for the hours when your code does nothing.
There's also a service built just for this need: Azure Data Factory. You can design your data flow using the UI and include your Python code as steps (for example via custom activities). The overall result will be a data pipeline (like CD, but for data). And of course it's serverless: you will be billed only for the time the pipeline is executing.
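As a rough sketch of the Azure Functions route in Python (the connection string variable, table names and the trivial transform are placeholders; the same body could run behind a blob trigger instead of a timer):

import os
import azure.functions as func
import pyodbc  # assumes Azure SQL; use whatever driver matches your database

def main(mytimer: func.TimerRequest) -> None:
    """Timer-triggered ETL: extract rows, transform them, load them back."""
    conn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])
    cursor = conn.cursor()

    # Extract
    rows = cursor.execute("SELECT id, value FROM dbo.source_table").fetchall()

    # Transform (placeholder logic)
    transformed = [(row.value * 2, row.id) for row in rows]

    # Load
    cursor.executemany("UPDATE dbo.target_table SET value = ? WHERE id = ?", transformed)
    conn.commit()
    conn.close()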

What dependency to use for Firestore with App Engine Node.js backend [closed]

My goal is to implement a backend service in Node.js on Google App Engine with sessions, user authentication and a basic API for user data. I want to use Firebase for authentication and Firestore for storing the user data.
I have been reading the Google documentation for Node.js. I am utterly confused about the differences between
Firebase,
Firestore,
Firestore in native mode,
Firestore in Datastore mode and
Google Cloud Datastore.
When I navigate in the Google Cloud Platform to Datastore, it tells me
You’re using Cloud Firestore in Native mode
You can always go straight to the Firestore page from the main navigation to access your database.
I want to follow this guide for implementing sessions since it works fine already. For authentication there is no guide in Node.js for App Engine.
I have two options:
npm install firebase-admin --save, guide and dependency
npm install @google-cloud/firestore, guide and dependency
My very focused question
What dependency should I use?
Just to clarify these a bit more, because I know it can be confusing:
Firebase is a different platform from GCP. It does share some resources and some tools, but its focus is more on app development and data storage rather than everything you can do in GCP.
Firestore is the "new" database that was launched in Firebase for strong consistency and scalability, and it's NoSQL.
Google Cloud Datastore is the original scalable NoSQL solution launched on GCP; it has eventual consistency and high performance.
Firestore in Native mode: here is where it gets tricky. Firestore in "Native mode" is just normal Firestore, but on GCP projects rather than on Firebase. After some time, they saw that Datastore and Firestore were pretty much the same thing but Firestore was a bit better, so they decided to migrate Datastore to Firestore, and that's why their documentation is so mixed.
Firestore in Datastore mode: this is Firestore, but with the behaviours of Datastore, like eventual consistency and so on.
The differences are covered more in depth over here, but in most cases it's better to jump directly to Firestore, as it's backwards compatible with Datastore and solves some issues such as eventual consistency.
As for which Node dependency you should use, I would go with the Firebase one (firebase-admin), just because the documentation is a bit clearer and there are more usage examples, but it's really up to you.

Good architecture for Azure for streaming analytics? [closed]

I have JSON data coming from sensors every second into Azure IoT Hub. The data is a time series with 15 variables. I want to process this data in real time using a C# application, which is quite complex, and send output events to some other service (it can be storage or Power BI).
What do you think is the best architectural approach for it?
1. Try to process the data in Stream Analytics with C# code. I know there is .NET support for Azure Stream Analytics, but I think it is very premature? Any experience with this approach? Does Azure Stream Analytics support complex C# algorithms?
2. Store the data in Azure Data Lake and use Data Lake Analytics to process it?
Your experiences and recommendations are very much appreciated.
Many thanks
Try to process the data in Stream Analytics with C# code
Azure Stream Analytics uses the Stream Analytics Query Language to perform transformations and computations over streams of events. The C# SDK is just a way to create and run a Stream Analytics job; all the transformation and computation work should be written in Stream Analytics Query Language.
Store the data in Azure Data Lake and use Data Lake Analytics to process it?
Stream Analytics is better at real-time data handling scenarios. I suggest you combine these two approaches: use Azure Stream Analytics to do the preliminary and necessary data processing and conversion, output the data to Azure Data Lake, and use Data Lake Analytics to process the data further.
If you're open to alternative solutions you could also use an HTTP API like Stride, which enables the creation of networks of continuous SQL queries, chained together, with the ability to subscribe to streams of changes as a means of streaming data out to applications.
If your computational needs fit within the confines of SQL this approach might work well for you. You can check out the Stride docs to see some examples.

Is nodejs + redis reliable? [closed]

It seems there has been some data loss with Node.js + Redis:
https://hallard.me/damaged-community-forum-lost-data/
https://community.nodebb.org/topic/6904/how-to-export-from-redis-to-mongodb-my-database-got-wiped/58
Has anyone experienced the same disaster and knows how to fix it, apart from backing up the whole thing?
At the company I work at, we've been using it for quite a long time now and it has never failed us.
In my opinion, you should never use a database you are not very familiar with; that is exactly when you run into problems such as saving corrupted data or "losing data".
Redis will lose all its data in case of a crash (if the server's memory maxes out, for example), hence you will need to use Redis persistence.
There are two types of Redis persistence, RDB and AOF. You should consciously choose which one (or both) to use based on the nature of the data you're going to store there.
The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
The AOF persistence logs every write operation received by the server; these are replayed at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
Read more about it here: http://redis.io/topics/persistence
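As a small illustration with the redis-py client (host, port and the chosen settings are assumptions), you can inspect and adjust these persistence settings at runtime; for them to survive a restart, also set them in redis.conf:

import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

# Inspect the current persistence configuration.
print(r.config_get("save"))        # RDB snapshot schedule
print(r.config_get("appendonly"))  # whether AOF is enabled

# Enable AOF at runtime (mirror this with `appendonly yes` in redis.conf).
r.config_set("appendonly", "yes")

# Ask the server for a background RDB snapshot.
r.bgsave()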
Here is a quote from a good blog post about using Redis as a primary database:
Redis persistence is not less reliable compared to other databases, it is actually more reliable in most of the cases because Redis writes in an append-only mode, so there are no crashed tables, no strange corruptions possible.
source: https://blog.andyet.com/2012/02/09/redis-reliability-for-realtime-apps/
Node should not affect how Redis works; it's only used to communicate data to and from Redis, so you should not worry about using Node in particular.
