On GCP, using the python pubsub client, how to list only a subset of subscriptions based on a filter - python-3.x

On the gcloud cli, when listing the pubsub subscriptions of a project, it is possible to filter results by using the --filter flag. Here is an example:
gcloud --project=my-project pubsub subscriptions list --filter=my-filter-string --format='value(name)'
I did not manage to find out how to do this with the Python library and its list_subscriptions method.
It seems to accept only a project string and to return all the subscriptions in the project. This means I would need to fetch every subscription in the project and then loop through them to filter them, as follows:
from google.cloud import pubsub_v1

subscriber_client = pubsub_v1.SubscriberClient()
filter = "my-filter-string"

with subscriber_client:
    page_result = subscriber_client.list_subscriptions(
        project="projects/my-project",
    )
    filtered_subscriptions = [
        subscription.name
        for subscription in page_result
        if filter in subscription.name.split("/")[-1]
    ]
    for subscription_name in filtered_subscriptions:
        print(subscription_name)
Is there a more efficient way to do that?
I have been trying to do this with the metadata: Sequence[Tuple[str, str]] argument of the method, but could not find any examples of how to use it.

Neither the REST nor the RPC API provides a way to filter on the server side, so no, there is no more efficient way to do this.
I imagine the gcloud code to do the filter is conceptually similar to what you wrote.
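For reference, here is a minimal sketch of that client-side filtering done directly on the pager, so the full list is never materialised; it assumes the same substring match on the subscription ID that the question uses:

from google.cloud import pubsub_v1

filter_string = "my-filter-string"

with pubsub_v1.SubscriberClient() as subscriber_client:
    # The pager handles pagination transparently; filter while iterating.
    for subscription in subscriber_client.list_subscriptions(project="projects/my-project"):
        # subscription.name is the full path: projects/<project>/subscriptions/<subscription-id>
        if filter_string in subscription.name.split("/")[-1]:
            print(subscription.name)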

Related

Azure Functions Orchestration using Logic App

I have multiple Azure Functions which carry out small tasks. I would like to orchestrate those tasks together using Logic Apps, as you can see here:
Logic App Flow
I am taking the output of Function 1 and inputting parts of it into Function 2. As I was creating the Logic App, I realized I have to parse the response of Function 1 as JSON in order to access the specific parameters I need. Parse JSON requires me to provide an example schema, however, and I need to be able to parse the response as JSON without this manual step.
One solution I thought would work was to register Function 1 with APIM and provide a response schema, but this doesn't seem to be any different from calling the Function directly.
Does anyone have any suggestions for how to get the response of a Function as a JSON/XML?
You can run JavaScript snippets and dynamically parse the response from Function 1 without providing a sample.
e.g.
var data = Object.keys(workflowContext.trigger.outputs.body.Body);
var key = data.filter(s => s.includes('Property')).toString(); // to get element - Property - dynamic content
return workflowContext.trigger.outputs.body.Body[key];
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-add-run-inline-code?tabs=consumption

How to launch a Cloud Dataflow pipeline from a Google Cloud Function when a particular set of files reaches Cloud Storage

I have a requirement to create a Cloud Function which should check for a set of files in a GCS bucket, and only once all of those files have arrived in the bucket should it launch the Dataflow templates for all of those files.
My existing Cloud Function code launches a Dataflow job for each file that comes into the GCS bucket, running different Dataflow templates for different files based on a naming convention. This existing code is working fine, but my intention is not to trigger a Dataflow job for each uploaded file directly.
It should check for the set of files and, only once all the files have arrived, launch the Dataflow jobs for those files.
Is there a way to do this using Cloud Functions, or is there an alternative way of achieving the desired result?
from googleapiclient.discovery import build
import time

def df_load_function(file, context):
    filesnames = [
        'Customer_',
        'Customer_Address',
        'Customer_service_ticket'
    ]
    # Check the uploaded file and run related dataflow jobs.
    for i in filesnames:
        if 'inbound/{}'.format(i) in file['name']:
            print("Processing file: {filename}".format(filename=file['name']))
            project = 'xxx'
            inputfile = 'gs://xxx/inbound/' + file['name']
            job = 'df_load_wave1_{}'.format(i)
            template = 'gs://xxx/template/df_load_wave1_{}'.format(i)
            location = 'asia-south1'
            dataflow = build('dataflow', 'v1b3', cache_discovery=False)
            request = dataflow.projects().locations().templates().launch(
                projectId=project,
                gcsPath=template,
                location=location,
                body={
                    'jobName': job,
                    "environment": {
                        "workerRegion": "asia-south1",
                        "tempLocation": "gs://xxx/temp"
                    }
                }
            )
            # Execute the dataflow job
            response = request.execute()
            job_id = response["job"]["id"]
I've written the below code for the above functionality. The Cloud Function runs without any error, but it is not triggering any Dataflow job. I am not sure what is happening, as the logs show no errors.
from googleapiclient.discovery import build
import time
import os

def df_load_function(file, context):
    filesnames = [
        'Customer_',
        'Customer_Address_',
        'Customer_service_ticket_'
    ]
    paths = ['Customer_', 'Customer_Address_', 'Customer_service_ticket_']
    for path in paths:
        if os.path.exists('gs://xxx/inbound/') == True:
            # Check the uploaded file and run related dataflow jobs.
            for i in filesnames:
                if 'inbound/{}'.format(i) in file['name']:
                    print("Processing file: {filename}".format(filename=file['name']))
                    project = 'xxx'
                    inputfile = 'gs://xxx/inbound/' + file['name']
                    job = 'df_load_wave1_{}'.format(i)
                    template = 'gs://xxx/template/df_load_wave1_{}'.format(i)
                    location = 'asia-south1'
                    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
                    request = dataflow.projects().locations().templates().launch(
                        projectId=project,
                        gcsPath=template,
                        location=location,
                        body={
                            'jobName': job,
                            "environment": {
                                "workerRegion": "asia-south1",
                                "tempLocation": "gs://xxx/temp"
                            }
                        }
                    )
                    # Execute the dataflow job
                    response = request.execute()
                    job_id = response["job"]["id"]
        else:
            exit()
Could someone please help me with the above Python code?
Also, my file names contain current dates at the end, as these are incremental files which I get from different source teams.
If I'm understanding your question correctly, the easiest thing to do is to write basic logic in your function that determines if the entire set of files is present. If not, exit the function. If yes, run the appropriate Dataflow pipeline. Basically implementing what you wrote in your first paragraph as Python code.
If it's a small set of files it shouldn't be an issue to have a function run on each upload to check set completeness. Even if it's, for example, 10,000 files a month, the cost is extremely small for this service, assuming:
Your function isn't using lots of bandwidth to transfer data
The code for each function invocation doesn't take a long time to run.
Even in scenarios where you can't meet these requirements, Functions is still pretty cheap to run.
If you're worried about costs I would recommend checking out the Google Cloud Pricing Calculator to get an estimate.
Edit with updated code:
I would highly recommend using the Google Cloud Storage Python client library for this. Using os.path won't work, because os.path checks the local filesystem and knows nothing about GCS buckets; searching a bucket requires API calls, which the client library handles for you.
To use the Python client library, add google-cloud-storage to your requirements.txt. Then, use something like the following code to check the existence of an object. This example is based on an HTTP trigger, but the gist of the code to check object existence is the same.
from google.cloud import storage

def hello_world(request):
    # Instantiate GCS client
    client = storage.Client()
    # Instantiate bucket definition
    bucket = storage.Bucket(client, name="bucket-name")
    # Placeholder list of object names to look for (not defined in the original snippet)
    filenames = ["name_modifier", "name_modifier_2"]
    # Search for each object
    for file in filenames:
        if storage.Blob(file, bucket).exists() and "name_modifier" in file:
            pass  # Run name_modifier Dataflow job
        elif storage.Blob(file, bucket).exists() and "name_modifier_2" in file:
            pass  # Run name_modifier_2 Dataflow job
        else:
            return "File not found"
This code isn't exactly what you want from a logic standpoint, but it should get you started. You'll probably want to first make sure all of the objects can be found, and then move on to a second step where you run the corresponding Dataflow job for each file, as sketched below.
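A minimal sketch of that two-step shape might look like the following; the bucket, project, template paths and file prefixes are placeholders carried over from the question, and the check is done by prefix because the real file names end with a date:

from googleapiclient.discovery import build
from google.cloud import storage

# Placeholder values carried over from the question; adjust to your environment.
BUCKET = 'xxx'
PROJECT = 'xxx'
LOCATION = 'asia-south1'
PREFIXES = ['Customer_', 'Customer_Address_', 'Customer_service_ticket_']

def df_load_function(file, context):
    client = storage.Client()

    def set_member_present(prefix):
        # File names end with a date, so check by prefix rather than by exact name.
        blobs = client.list_blobs(BUCKET, prefix='inbound/' + prefix, max_results=1)
        return any(True for _ in blobs)

    # Step 1: bail out until the whole set has arrived.
    if not all(set_member_present(p) for p in PREFIXES):
        print('File set incomplete, waiting for remaining files.')
        return

    # Step 2: launch one template per file prefix.
    dataflow = build('dataflow', 'v1b3', cache_discovery=False)
    for prefix in PREFIXES:
        # Dataflow job names only allow lowercase letters, digits and hyphens.
        job_name = 'df-load-wave1-' + prefix.rstrip('_').lower().replace('_', '-')
        request = dataflow.projects().locations().templates().launch(
            projectId=PROJECT,
            location=LOCATION,
            gcsPath='gs://{}/template/df_load_wave1_{}'.format(BUCKET, prefix),
            body={
                'jobName': job_name,
                'environment': {
                    'workerRegion': LOCATION,
                    'tempLocation': 'gs://{}/temp'.format(BUCKET)
                }
            }
        )
        response = request.execute()
        print('Launched Dataflow job', response['job']['id'])

Note that once the set is complete, every further upload would launch the jobs again, so in practice you'd also want to mark the set as processed, for example by moving the files to a different prefix.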

How do you set up an API Gateway Step Function integration using Terraform and aws_apigatewayv2_integration

I am looking for an example of how to start execution of a Step Function from API Gateway using Terraform and the aws_apigatewayv2_integration resource. I am using an HTTP API (I have only found an older example for REST APIs on Stack Overflow).
Currently I have this:
resource "aws_apigatewayv2_integration" "workflow_proxy_integration" {
api_id = aws_apigatewayv2_api.default.id
credentials_arn = aws_iam_role.api_gateway_step_functions.arn
integration_type = "AWS_PROXY"
integration_subtype = "StepFunctions-StartExecution"
description = "The integration which will start the Step Functions workflow."
payload_format_version = "1.0"
request_parameters = {
StateMachineArn = aws_sfn_state_machine.default.arn
}
}
Right now, my State Machine receives an empty input ("input": {}). When I try to add input to the request_parameters section, I get this error:
Error: error updating API Gateway v2 integration: BadRequestException: Parameter: input does not fit schema for Operation: StepFunctions-StartExecution.
I spent over an hour looking for a solution to a similar problem I was having with request_parameters. AWS's documentation currently uses camelCase for the keys in all of its examples (stateMachineArn, input, etc.), which made it difficult to research.
You'll want to use PascalCase for your keys, similar to how you already did for StateMachineArn. So instead of input, you'll use Input.

Access CosmosDB from Azure Function (without input binding)

I have 2 collections in CosmosDB, Stocks and StockPrices.
StockPrices collection holds all historical prices, and is constantly updated.
I want to create Azure Function that listens to StockPrices updates (CosmosDBTrigger) and then does the following for each Document passed by the trigger:
Find stock with matching ticker in Stocks collection
Update stock price in Stocks collection
I can't do this with CosmosDB input binding, as CosmosDBTrigger passes a List (binding only works when trigger passes a single item).
The only way I see this working is if I foreach on CosmosDBTrigger List, and access CosmosDB from my function body and perform steps 1 and 2 above.
Question: How do I access CosmosDB from within my function?
One of the CosmosDB binding forms is to get a DocumentClient instance, which provides the full range of operations on the container. This way, you should be able to combine the change feed trigger and the item manipulation into the same function, like:
[FunctionName("ProcessStockChanges")]
public async Task Run(
[CosmosDBTrigger(/* Trigger params */)] IReadOnlyList<Document> changedItems,
[CosmosDB(/* Client params */)] DocumentClient client,
ILogger log)
{
// Read changedItems,
// Create/read/update/delete with client
}
It's also possible with .NET Core to use dependency injection to provide a full-fledged custom service/repository class to your function instance to interface to Cosmos. This is my preferred approach, because I can do validation, control serialization, etc with the latest version of the Cosmos SDK.
You may have done so intentionally, but it's worth considering combining your data into a single container partitioned by, for example, a combination of record type (Stock/StockPrice) and identifier. This simplifies things and can be more cost/resource efficient relative to multiple containers.
Ended up going with @Noah Stahl's suggestion. Leaving this here as an alternative.
Couldn't figure out how to do this directly, so came up with a work-around:
Add function with CosmosDBTrigger on StockPrices collection with Queue output binding
foreach over Documents from the trigger, serialize and add to the Queue
Add function with QueueTrigger, CosmosDB input binding for Stocks collection (with PartitionKey and Id set to StockTicker), and CosmosDB output binding for Stocks collection
Update Stock from CosmosDB input binding with values from the QueueTrigger
Assign updated Stock to CosmosDB output binding parameter (updates record in DB)
This said, I'd like to hear about more straightforward ways of doing this, as my approach seems like a hack.

Passing sets of properties and nodes as a POST statement with KOA-NEO4J or BOLT

I am building a REST API which connects to a Neo4j instance. I am using the koa-neo4j library as the basis (https://github.com/assister-ai/koa-neo4j-starter-kit). I am a beginner with all of these technologies, but thanks to some help from this forum I have the basic functionality working. For example, the below code allows me to create a new node with the label "metric" and set the name and dateAdded properties.
URL:
/metric?metricName=Test&dateAdded=2/21/2017
index.js
app.defineAPI({
    method: 'POST',
    route: '/api/v1/imm/metric',
    cypherQueryFile: './src/api/v1/imm/metric/createMetric.cyp'
});
createMetric.cyp"
CREATE (n:metric {
name: $metricName,
dateAdded: $dateAdded
})
return ID(n) as id
However, I am struggling to see how I can approach more complicated examples. How can I handle situations where I don't know beforehand how many properties will be set when creating a new node, or where I want to create multiple nodes in a single POST? Ideally I would like to be able to pass something like JSON as part of the POST which would contain all of the nodes, labels and properties that I want to create. Is something like this possible? I tried using the below Cypher query and passing a JSON string in the POST body, but it didn't work.
UNWIND $props AS properties
CREATE (n:metric)
SET n = properties
RETURN n
Would I be better off switching to the Neo4j REST API instead of the Bolt protocol and the koa-neo4j framework? From my research I thought it was better to use Bolt, but I want to have a REST API as the middle layer between my front end and back end, so I am willing to change over if this will be easier in the longer term.
Thanks for the help!
Your Cypher syntax is bad in a couple of ways.
UNWIND only accepts a collection as its argument, not a string.
SET n = properties is only legal if properties is a map, not a string.
This query should work for creating a single node (assuming that $props is a map containing all the properties you want to store with the newly created node):
CREATE (n:metric $props)
RETURN n
If you want to create multiple nodes, then this query (essentially the same as yours) should work (but only if $prop_collection is a collection of maps):
UNWIND $prop_collection AS props
CREATE (n:metric)
SET n = props
RETURN n
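For what it's worth, the parameter shapes those two queries expect (a map for $props, a collection of maps for $prop_collection) look like this when sent over Bolt. This is only a sketch using the official Neo4j Python driver with made-up connection details, but the same types apply to the JavaScript driver that koa-neo4j wraps:

from neo4j import GraphDatabase

# Hypothetical connection details.
driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))

props = {'name': 'Test', 'dateAdded': '2/21/2017'}   # a map
prop_collection = [                                  # a collection of maps
    {'name': 'Metric A', 'dateAdded': '2/21/2017'},
    {'name': 'Metric B', 'dateAdded': '2/22/2017'},
]

with driver.session() as session:
    # Single node: $props must arrive as a map, not a JSON string.
    session.run('CREATE (n:metric $props) RETURN n', props=props)

    # Multiple nodes: $prop_collection must arrive as a list of maps.
    session.run(
        'UNWIND $prop_collection AS p CREATE (n:metric) SET n = p RETURN n',
        prop_collection=prop_collection,
    )

driver.close()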
I too have faced difficulties when trying to pass complex types as arguments to Neo4j. This has to do with type conversions between JS and Cypher over Bolt, and there is not much one can do except file an issue in the official neo4j JavaScript driver repo. koa-neo4j uses the official driver under the hood.
One way to go about such scenarios in koa-neo4j is using JavaScript to manipulate the arguments before sending to Cypher:
https://github.com/assister-ai/koa-neo4j#preprocess-lifecycle
Also possible to further manipulate the results of a Cypher query using postProcess lifecycle hook:
https://github.com/assister-ai/koa-neo4j#postprocess-lifecycle
