Python Integration Test Cloud Functions with Server running in subprocess - python-3.x

I am trying to mock a cloud storage response while testing a cloud function in a local testing environment.
1. I am starting the server - as advised by the docs - using a subprocess:
subprocess.Popen(
[
"functions-framework",
"--target",
"my_target",
"--port",
"8080",
"--source",
f"{os.getcwd()}/src/main.py",
],
cwd=os.path.dirname(__file__),
stdout=subprocess.PIPE,
)
2. I then ping the local server from my test function:
res = requests.get(
f"{SETTINGS.BASE_URL}:8080/test",
)
The server is up and running and it works just fine...
The cloud function talks to a Cloud Storage bucket that serves as a "database". However, I do not want to call it and Mock the buckets behaviour. I have tried many different things, such as patching the function, using MagicMock etc.. Nothing works. It always pings the real, live Cloud Storage bucket. I think I am not quite getting how to patch the cloud storage behaviour in the subprocess that the server is using.
Any ideas?
I tried e.g.:
def test_return_from_cs(cls):
mock_client = MagicMock(spec=google.cloud.storage.Client)
# Create a mock bucket
mock_bucket = MagicMock(spec=google.cloud.storage.Bucket)
# Create a mock blob
mock_blob = MagicMock(spec=google.cloud.storage.Blob)
# Set the return value of the mock client's get_bucket method to be the mock bucket
mock_client.get_bucket.return_value = mock_bucket
# Set the return value of the mock bucket's get_blob method to be the mock blob
mock_bucket.get_blob.return_value = mock_blob
# Set the return value of the mock blob's download_as_string method to be a string "test_data"
mock_blob.download_as_string.return_value = "test_data"
import subprocess
import os
subprocess.Popen(
[
"functions-framework",
"--target",
"my_target",
"--port",
"9090",
"--source",
f"{os.getcwd()}/src/main.py",
],
cwd=os.path.dirname(__file__),
stdout=subprocess.PIPE,
)
res = requests.get(
f"{SETTINGS.BASE_URL}:9090/test",
)
assert res.text == "test_data"
subprocess.terminate()
I also tried:
#patch("src.my_module.main.storage.Client", autospec=True)
def test_return_from_cs(cls, clientMock):
blob_mock = (
clientMock().get_bucket.return_value.get_blob.return_value
) # split this up for readability
bucket_mock = blob_mock.get_blob.return_value
bucket_mock.download_as_string.return_value = "test"
res = cls.session.get(
f"{SETTINGS.BASE_URL}:{SETTINGS.PORT}/test",
)
assert res.text == "test"
In the second attempt above the server is already running and imported from a conftest.py.
I guess I do not understand how to mock the call to CS within the subprocess...
It always pings the actual cloud storage bucket...

Related

My Azure Function is not sending data to my cosmos container, what have I done wrong

I've created an Azure Function that fetches data from an External API and sends that data to my cosomos container using a timer trigger. This function is not sending any data to the cosmos db and I can't figure out why.
I've removed some lines of code which contained sensitive information, but I have tested the code locally and it is fetching the data and sending to cosmos if I just run the file.
here is my function:
import datetime
import logging
import azure.functions as func
import time
import pandas as pd
import requests
import json
import uuid
from azure.cosmos.aio import CosmosClient
import asyncio
def get_data():
url = 'externalAPIUrl'
session = requests.Session()
request = session.get(url, headers=headers)
cookies = dict(request.cookies)
response = session.get(url, headers=headers, cookies=cookies).json()
rawData = pd.DataFrame(response)
rawop = pd.DataFrame(rawData['filtered']['data'])
rawop = rawop.set_index('paramter')
processed_data = []
for i in rawop.index:
#creating json to send
processed_data.append(processed_ce)
processed_data.append(processed_pe)
logging.info("Data processed successfully")
return processed_data
async def manage_cosmos(processed_data):
print(len(processed_data))
cosmosdb_endpoint = 'endpointURL'
cosmos_key = 'key'
DATABASE_NAME = 'dbname'
CONTAINER_NAME = 'containername'
async with CosmosClient(cosmosdb_endpoint, cosmos_key) as client:
database = client.get_database_client(DATABASE_NAME)
container = database.get_container_client(CONTAINER_NAME)
# send all the objects in the processed_Data list to cosmos asyncronously
tasks = []
for i in processed_data[:40]:
tasks.append(container.create_item(i))
await asyncio.gather(*tasks)
tasks = []
for i in processed_data[40:90]:
tasks.append(container.create_item(i))
await asyncio.gather(*tasks)
tasks = []
for i in processed_data[90:]:
tasks.append(container.create_item(i))
await asyncio.gather(*tasks)
logging.info("Data sent to cosmos successfully")
def main(mytimer: func.TimerRequest) -> None:
if mytimer.past_due:
logging.info('The timer is past due!')
try:
processed_data = get_data()
asyncio.run(manage_cosmos(processed_data))
except Exception as e:
logging.error("error in function", e)
here is my function.json
{
"scriptFile": "__init__.py",
"bindings": [
{
"name": "mytimer",
"type": "timerTrigger",
"direction": "in",
"schedule": "0 */2 * * * *"
}
]
}
this is my requirements.txt
# DO NOT include azure-functions-worker in this file
# The Python Worker is managed by Azure Functions platform
# Manually managing azure-functions-worker may cause unexpected issues
azure-functions
pandas
asyncio
aiohttp
requests
uuid
azure-cosmos
Can somebody tell me why it is not sending data to cosmos?
I deployed the above function and no data was added to my cosmos container. I ran the same code without the azure-function part locally and I was able to send new items to my cosmos container.
I am most probably making a mistake in creating the function so if anybody can point that out.

Python gRPC service for Envoy ratelimit

I am trying to create a small service to respond to Envoy's rate limiting queries. I have compiled all the relevant protobuff files and the one relevant for the service I am trying to implement is here:
https://github.com/envoyproxy/envoy/blob/v1.17.1/api/envoy/service/ratelimit/v3/rls.proto
There is a service definition in there but inside of the "compiled" python file, all I see about it is this:
_RATELIMITSERVICE = _descriptor.ServiceDescriptor(
name='RateLimitService',
full_name='envoy.service.ratelimit.v3.RateLimitService',
file=DESCRIPTOR,
index=0,
serialized_options=None,
create_key=_descriptor._internal_create_key,
serialized_start=1531,
serialized_end=1663,
methods=[
_descriptor.MethodDescriptor(
name='ShouldRateLimit',
full_name='envoy.service.ratelimit.v3.RateLimitService.ShouldRateLimit',
index=0,
containing_service=None,
input_type=_RATELIMITREQUEST,
output_type=_RATELIMITRESPONSE,
serialized_options=None,
create_key=_descriptor._internal_create_key,
),
])
_sym_db.RegisterServiceDescriptor(_RATELIMITSERVICE)
DESCRIPTOR.services_by_name['RateLimitService'] = _RATELIMITSERVICE
here is my feeble attempt at implementing the service
import logging
import asyncio
import grpc
from envoy.service.ratelimit.v3.rls_pb2 import RateLimitResponse, RateLimitRequest
class RL:
def ShouldRateLimit(self, request):
result = RateLimitResponse()
def add_handler(servicer, server):
rpc_method_handlers = {
'ShouldRateLimit': grpc.unary_unary_rpc_method_handler(
RL.ShouldRateLimit,
request_deserializer=RateLimitRequest.FromString,
response_serializer=RateLimitResponse.SerializeToString,
)
}
generic_handler = grpc.method_handlers_generic_handler(
'envoy.service.ratelimit.v3.RateLimitService',
rpc_method_handlers
)
server.add_generic_rpc_handlers((generic_handler,))
async def serve():
server = grpc.aio.server()
add_handler(RL(), server)
listen_addr = '[::]:5051'
server.add_insecure_port(listen_addr)
logging.info(f'Starting server on {listen_addr}')
await server.start()
if __name__ == '__main__':
logging.basicConfig(level=logging.DEBUG)
asyncio.run(serve())
How am I supposed to return (or even instantiate) a RateLimitResponse back to the caller ?

How to refresh the boto3 credetials when python script is running indefinitely

I am trying to write a python script that uses watchdog to look for file creation and upload that to s3 using boto3. However, my boto3 credentials expire after every 12hrs, So I need to renew them. I am storing my boto3 credentials in ~/.aws/credentials. So right now I am trying to catch the S3UploadFailedError, renew the credentials, and write them to ~/.aws/credentials. But though the credentials are getting renewed and I am calling boto3.client('s3') again its throwing exception.
What am I doing wrong? Or how can I resolve it?
Below is the code snippet
try:
s3 = boto3.client('s3')
s3.upload_file(event.src_path,'bucket-name',event.src_path)
except boto3.exceptions.S3UploadFailedError as e:
print(e)
get_aws_credentials()
s3 = boto3.client('s3')
I have found a good example to refresh the credentials within this link:
https://pritul95.github.io/blogs/boto3/2020/08/01/refreshable-boto3-session/
but there this a little bug inside. Be careful about that.
Here is the corrected code:
from uuid import uuid4
from datetime import datetime
from time import time
from boto3 import Session
from botocore.credentials import RefreshableCredentials
from botocore.session import get_session
class RefreshableBotoSession:
"""
Boto Helper class which lets us create refreshable session, so that we can cache the client or resource.
Usage
-----
session = RefreshableBotoSession().refreshable_session()
client = session.client("s3") # we now can cache this client object without worrying about expiring credentials
"""
def __init__(
self,
region_name: str = None,
profile_name: str = None,
sts_arn: str = None,
session_name: str = None,
session_ttl: int = 3000
):
"""
Initialize `RefreshableBotoSession`
Parameters
----------
region_name : str (optional)
Default region when creating new connection.
profile_name : str (optional)
The name of a profile to use.
sts_arn : str (optional)
The role arn to sts before creating session.
session_name : str (optional)
An identifier for the assumed role session. (required when `sts_arn` is given)
session_ttl : int (optional)
An integer number to set the TTL for each session. Beyond this session, it will renew the token.
50 minutes by default which is before the default role expiration of 1 hour
"""
self.region_name = region_name
self.profile_name = profile_name
self.sts_arn = sts_arn
self.session_name = session_name or uuid4().hex
self.session_ttl = session_ttl
def __get_session_credentials(self):
"""
Get session credentials
"""
session = Session(region_name=self.region_name, profile_name=self.profile_name)
# if sts_arn is given, get credential by assuming given role
if self.sts_arn:
sts_client = session.client(service_name="sts", region_name=self.region_name)
response = sts_client.assume_role(
RoleArn=self.sts_arn,
RoleSessionName=self.session_name,
DurationSeconds=self.session_ttl,
).get("Credentials")
credentials = {
"access_key": response.get("AccessKeyId"),
"secret_key": response.get("SecretAccessKey"),
"token": response.get("SessionToken"),
"expiry_time": response.get("Expiration").isoformat(),
}
else:
session_credentials = session.get_credentials().__dict__
credentials = {
"access_key": session_credentials.get("access_key"),
"secret_key": session_credentials.get("secret_key"),
"token": session_credentials.get("token"),
"expiry_time": datetime.fromtimestamp(time() + self.session_ttl).isoformat(),
}
return credentials
def refreshable_session(self) -> Session:
"""
Get refreshable boto3 session.
"""
# get refreshable credentials
refreshable_credentials = RefreshableCredentials.create_from_metadata(
metadata=self.__get_session_credentials(),
refresh_using=self.__get_session_credentials,
method="sts-assume-role",
)
# attach refreshable credentials current session
session = get_session()
session._credentials = refreshable_credentials
session.set_config_variable("region", self.region_name)
autorefresh_session = Session(botocore_session=session)
return autorefresh_session
According to the documentation, the client looks in several locations for credentials and there are other options that are also more programmatic-friendly that you might want to consider instead of the .aws/credentials file.
Quoting the docs:
The order in which Boto3 searches for credentials is:
Passing credentials as parameters in the boto.client() method
Passing credentials as parameters when creating a Session object
Environment variables
Shared credential file (~/.aws/credentials)
AWS config file (~/.aws/config)
Assume Role provider
In your case, since you are already catching the exception and renewing the credentials, I would simply pass the new ones to a new instance of the client like so:
client = boto3.client(
's3',
aws_access_key_id=NEW_ACCESS_KEY,
aws_secret_access_key=NEW_SECRET_KEY,
aws_session_token=NEW_SESSION_TOKEN
)
If instead you are using these same credentials elsewhere in the code to create other clients, I'd consider setting them as environment variables:
import os
os.environ['AWS_ACCESS_KEY_ID'] = NEW_ACCESS_KEY
os.environ['AWS_SECRET_ACCESS_KEY'] = NEW_SECRET_KEY
os.environ['AWS_SESSION_TOKEN'] = NEW_SESSION_TOKEN
Again, quoting the docs:
The session key for your AWS account [...] is only needed when you are using temporary credentials.
Here is my implementation which only generates new credentials if existing credentials expire using a singleton design pattern
import boto3
from datetime import datetime
from dateutil.tz import tzutc
import os
import binascii
class AssumeRoleProd:
__credentials = None
def __init__(self):
assert True==False
#staticmethod
def __setCredentials():
print("\n\n ======= GENERATING NEW SESSION TOKEN ======= \n\n")
# create an STS client object that represents a live connection to the
# STS service
sts_client = boto3.client('sts')
# Call the assume_role method of the STSConnection object and pass the role
# ARN and a role session name.
assumed_role_object = sts_client.assume_role(
RoleArn=your_role_here,
RoleSessionName=f"AssumeRoleSession{binascii.b2a_hex(os.urandom(15)).decode('UTF-8')}"
)
# From the response that contains the assumed role, get the temporary
# credentials that can be used to make subsequent API calls
AssumeRoleProd.__credentials = assumed_role_object['Credentials']
#staticmethod
def getTempCredentials():
credsExpired = False
# Return object for the first time
if AssumeRoleProd.__credentials is None:
AssumeRoleProd.__setCredentials()
credsExpired = True
# Generate if only 5 minutes are left for expiry. You may setup for entire 60 minutes by catching botocore ClientException
elif (AssumeRoleProd.__credentials['Expiration']-datetime.now(tzutc())).seconds//60<=5:
AssumeRoleProd.__setCredentials()
credsExpired = True
return AssumeRoleProd.__credentials
And then I am using singleton design pattern for client as well which would generate a new client only if new session is generated. You can add region as well if required.
class lambdaClient:
__prodClient = None
def __init__(self):
assert True==False
#staticmethod
def __initProdClient():
credsExpired, credentials = AssumeRoleProd.getTempCredentials()
if lambdaClient.__prodClient is None or credsExpired:
lambdaClient.__prodClient = boto3.client('lambda',
aws_access_key_id=credentials['AccessKeyId'],
aws_secret_access_key=credentials['SecretAccessKey'],
aws_session_token=credentials['SessionToken'])
return lambdaClient.__prodClient
#staticmethod
def getProdClient():
return lambdaClient.__initProdClient()

How to send a GraphQL query to AppSync from python?

How do we post a GraphQL request through AWS AppSync using boto?
Ultimately I'm trying to mimic a mobile app accessing our stackless/cloudformation stack on AWS, but with python. Not javascript or amplify.
The primary pain point is authentication; I've tried a dozen different ways already. This the current one, which generates a "401" response with "UnauthorizedException" and "Permission denied", which is actually pretty good considering some of the other messages I've had. I'm now using the 'aws_requests_auth' library to do the signing part. I assume it authenticates me using the stored /.aws/credentials from my local environment, or does it?
I'm a little confused as to where and how cognito identities and pools will come into it. eg: say I wanted to mimic the sign-up sequence?
Anyways the code looks pretty straightforward; I just don't grok the authentication.
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth
APPSYNC_API_KEY = 'inAppsyncSettings'
APPSYNC_API_ENDPOINT_URL = 'https://aaaaaaaaaaaavzbke.appsync-api.ap-southeast-2.amazonaws.com/graphql'
headers = {
'Content-Type': "application/graphql",
'x-api-key': APPSYNC_API_KEY,
'cache-control': "no-cache",
}
query = """{
GetUserSettingsByEmail(email: "john#washere"){
items {name, identity_id, invite_code}
}
}"""
def test_stuff():
# Use the library to generate auth headers.
auth = BotoAWSRequestsAuth(
aws_host='aaaaaaaaaaaavzbke.appsync-api.ap-southeast-2.amazonaws.com',
aws_region='ap-southeast-2',
aws_service='appsync')
# Create an http graphql request.
response = requests.post(
APPSYNC_API_ENDPOINT_URL,
json={'query': query},
auth=auth,
headers=headers)
print(response)
# this didn't work:
# response = requests.post(APPSYNC_API_ENDPOINT_URL, data=json.dumps({'query': query}), auth=auth, headers=headers)
Yields
{
"errors" : [ {
"errorType" : "UnauthorizedException",
"message" : "Permission denied"
} ]
}
It's quite simple--once you know. There are some things I didn't appreciate:
I've assumed IAM authentication (OpenID appended way below)
There are a number of ways for appsync to handle authentication. We're using IAM so that's what I need to deal with, yours might be different.
Boto doesn't come into it.
We want to issue a request like any regular punter, they don't use boto, and neither do we. Trawling the AWS boto docs was a waste of time.
Use the AWS4Auth library
We are going to send a regular http request to aws, so whilst we can use python requests they need to be authenticated--by attaching headers.
And, of course, AWS auth headers are special and different from all others.
You can try to work out how to do it
yourself, or you can go looking for someone else who has already done it: Aws_requests_auth, the one I started with, probably works just fine, but I have ended up with AWS4Auth. There are many others of dubious value; none endorsed or provided by Amazon (that I could find).
Specify appsync as the "service"
What service are we calling? I didn't find any examples of anyone doing this anywhere. All the examples are trivial S3 or EC2 or even EB which left uncertainty. Should we be talking to api-gateway service? Whatsmore, you feed this detail into the AWS4Auth routine, or authentication data. Obviously, in hindsight, the request is hitting Appsync, so it will be authenticated by Appsync, so specify "appsync" as the service when putting together the auth headers.
It comes together as:
import requests
from requests_aws4auth import AWS4Auth
# Use AWS4Auth to sign a requests session
session = requests.Session()
session.auth = AWS4Auth(
# An AWS 'ACCESS KEY' associated with an IAM user.
'AKxxxxxxxxxxxxxxx2A',
# The 'secret' that goes with the above access key.
'kwWxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxgEm',
# The region you want to access.
'ap-southeast-2',
# The service you want to access.
'appsync'
)
# As found in AWS Appsync under Settings for your endpoint.
APPSYNC_API_ENDPOINT_URL = 'https://nqxxxxxxxxxxxxxxxxxxxke'
'.appsync-api.ap-southeast-2.amazonaws.com/graphql'
# Use JSON format string for the query. It does not need reformatting.
query = """
query foo {
GetUserSettings (
identity_id: "ap-southeast-2:8xxxxxxb-7xx4-4xx4-8xx0-exxxxxxx2"
){
user_name, email, whatever
}}"""
# Now we can simply post the request...
response = session.request(
url=APPSYNC_API_ENDPOINT_URL,
method='POST',
json={'query': query}
)
print(response.text)
Which yields
# Your answer comes as a JSON formatted string in the text attribute, under data.
{"data":{"GetUserSettings":{"user_name":"0xxxxxxx3-9102-42f0-9874-1xxxxx7dxxx5"}}}
Getting credentials
To get rid of the hardcoded key/secret you can consume the local AWS ~/.aws/config and ~/.aws/credentials, and it is done this way...
# Use AWS4Auth to sign a requests session
session = requests.Session()
credentials = boto3.session.Session().get_credentials()
session.auth = AWS4Auth(
credentials.access_key,
credentials.secret_key,
boto3.session.Session().region_name,
'appsync',
session_token=credentials.token
)
...<as above>
This does seem to respect the environment variable AWS_PROFILE for assuming different roles.
Note that STS.get_session_token is not the way to do it, as it may try to assume a role from a role, depending where it keyword matched the AWS_PROFILE value. Labels in the credentials file will work because the keys are right there, but names found in the config file do not work, as that assumes a role already.
OpenID
In this scenario, all the complexity is transferred to the conversation with the openid connect provider. The hard stuff is all the auth hoops you jump through to get an access token, and thence using the refresh token to keep it alive. That is where all the real work lies.
Once you finally have an access token, assuming you have configured the "OpenID Connect" Authorization Mode in appsync, then you can, very simply, drop the access token into the header:
response = requests.post(
url="https://nc3xxxxxxxxxx123456zwjka.appsync-api.ap-southeast-2.amazonaws.com/graphql",
headers={"Authorization": ACCESS_TOKEN},
json={'query': "query foo{GetStuff{cat, dog, tree}}"}
)
You can set up an API key on the AppSync end and use the code below. This works for my case.
import requests
# establish a session with requests session
session = requests.Session()
# As found in AWS Appsync under Settings for your endpoint.
APPSYNC_API_ENDPOINT_URL = 'https://vxxxxxxxxxxxxxxxxxxy.appsync-api.ap-southeast-2.amazonaws.com/graphql'
# setup the query string (optional)
query = """query listItemsQuery {listItemsQuery {items {correlation_id, id, etc}}}"""
# Now we can simply post the request...
response = session.request(
url=APPSYNC_API_ENDPOINT_URL,
method='POST',
headers={'x-api-key': '<APIKEYFOUNDINAPPSYNCSETTINGS>'},
json={'query': query}
)
print(response.json()['data'])
Building off Joseph Warda's answer you can use the class below to send AppSync commands.
# fileName: AppSyncLibrary
import requests
class AppSync():
def __init__(self,data):
endpoint = data["endpoint"]
self.APPSYNC_API_ENDPOINT_URL = endpoint
self.api_key = data["api_key"]
self.session = requests.Session()
def graphql_operation(self,query,input_params):
response = self.session.request(
url=self.APPSYNC_API_ENDPOINT_URL,
method='POST',
headers={'x-api-key': self.api_key},
json={'query': query,'variables':{"input":input_params}}
)
return response.json()
For example in another file within the same directory:
from AppSyncLibrary import AppSync
APPSYNC_API_ENDPOINT_URL = {YOUR_APPSYNC_API_ENDPOINT}
APPSYNC_API_KEY = {YOUR_API_KEY}
init_params = {"endpoint":APPSYNC_API_ENDPOINT_URL,"api_key":APPSYNC_API_KEY}
app_sync = AppSync(init_params)
mutation = """mutation CreatePost($input: CreatePostInput!) {
createPost(input: $input) {
id
content
}
}
"""
input_params = {
"content":"My first post"
}
response = app_sync.graphql_operation(mutation,input_params)
print(response)
Note: This requires you to activate API access for your AppSync API. Check this AWS post for more details.
graphql-python/gql supports AWS AppSync since version 3.0.0rc0.
It supports queries, mutation and even subscriptions on the realtime endpoint.
The documentation is available here
Here is an example of a mutation using the API Key authentication:
import asyncio
import os
import sys
from urllib.parse import urlparse
from gql import Client, gql
from gql.transport.aiohttp import AIOHTTPTransport
from gql.transport.appsync_auth import AppSyncApiKeyAuthentication
# Uncomment the following lines to enable debug output
# import logging
# logging.basicConfig(level=logging.DEBUG)
async def main():
# Should look like:
# https://XXXXXXXXXXXXXXXXXXXXXXXXXX.appsync-api.REGION.amazonaws.com/graphql
url = os.environ.get("AWS_GRAPHQL_API_ENDPOINT")
api_key = os.environ.get("AWS_GRAPHQL_API_KEY")
if url is None or api_key is None:
print("Missing environment variables")
sys.exit()
# Extract host from url
host = str(urlparse(url).netloc)
auth = AppSyncApiKeyAuthentication(host=host, api_key=api_key)
transport = AIOHTTPTransport(url=url, auth=auth)
async with Client(
transport=transport, fetch_schema_from_transport=False,
) as session:
query = gql(
"""
mutation createMessage($message: String!) {
createMessage(input: {message: $message}) {
id
message
createdAt
}
}"""
)
variable_values = {"message": "Hello world!"}
result = await session.execute(query, variable_values=variable_values)
print(result)
asyncio.run(main())
I am unable to add a comment due to low rep, but I just want to add that I tried the accepted answer and it didn't work. I was getting an error saying my session_token is invalid. Probably because I was using AWS Lambda.
I got it to work pretty much exactly, but by adding to the session token parameter of the aws4auth object. Here's the full piece:
import requests
import os
from requests_aws4auth import AWS4Auth
def AppsyncHandler(event, context):
# These are env vars that are always present in an AWS Lambda function
# If not using AWS Lambda, you'll need to add them manually to your env.
access_id = os.environ.get("AWS_ACCESS_KEY_ID")
secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
session_token = os.environ.get("AWS_SESSION_TOKEN")
region = os.environ.get("AWS_REGION")
# Your AppSync Endpoint
api_endpoint = os.environ.get("AppsyncConnectionString")
resource = "appsync"
session = requests.Session()
session.auth = AWS4Auth(access_id,
secret_key,
region,
resource,
session_token=session_token)
The rest is the same.
Hope this Helps Everyone
import requests
import json
import os
from dotenv import load_dotenv
load_dotenv(".env")
class AppSync(object):
def __init__(self,data):
endpoint = data["endpoint"]
self.APPSYNC_API_ENDPOINT_URL = endpoint
self.api_key = data["api_key"]
self.session = requests.Session()
def graphql_operation(self,query,input_params):
response = self.session.request(
url=self.APPSYNC_API_ENDPOINT_URL,
method='POST',
headers={'x-api-key': self.api_key},
json={'query': query,'variables':{"input":input_params}}
)
return response.json()
def main():
APPSYNC_API_ENDPOINT_URL = os.getenv("APPSYNC_API_ENDPOINT_URL")
APPSYNC_API_KEY = os.getenv("APPSYNC_API_KEY")
init_params = {"endpoint":APPSYNC_API_ENDPOINT_URL,"api_key":APPSYNC_API_KEY}
app_sync = AppSync(init_params)
mutation = """
query MyQuery {
getAccountId(id: "5ca4bbc7a2dd94ee58162393") {
_id
account_id
limit
products
}
}
"""
input_params = {}
response = app_sync.graphql_operation(mutation,input_params)
print(json.dumps(response , indent=3))
main()

How to trigger a dataflow with a cloud function? (Python SDK)

I have a cloud function that is triggered by cloud Pub/Sub. I want the same function trigger dataflow using Python SDK. Here is my code:
import base64
def hello_pubsub(event, context):
if 'data' in event:
message = base64.b64decode(event['data']).decode('utf-8')
else:
message = 'hello world!'
print('Message of pubsub : {}'.format(message))
I deploy the function this way:
gcloud beta functions deploy hello_pubsub --runtime python37 --trigger-topic topic1
You have to embed your pipeline python code with your function. When your function is called, you simply call the pipeline python main function which executes the pipeline in your file.
If you developed and tried your pipeline in Cloud Shell and you already ran it in Dataflow pipeline, your code should have this structure:
def run(argv=None, save_main_session=True):
# Parse argument
# Set options
# Start Pipeline in p variable
# Perform your transform in Pipeline
# Run your Pipeline
result = p.run()
# Wait the end of the pipeline
result.wait_until_finish()
Thus, call this function with the correct argument especially the runner=DataflowRunner to allow the python code to load the pipeline in Dataflow service.
Delete at the end the result.wait_until_finish() because your function won't live all the dataflow process long.
You can also use template if you want.
You can use Cloud Dataflow templates to launch your job. You will need to code the following steps:
Retrieve credentials
Generate Dataflow service instance
Get GCP PROJECT_ID
Generate template body
Execute template
Here is an example using your base code (feel free to split into multiple methods to reduce code inside hello_pubsub method).
from googleapiclient.discovery import build
import base64
import google.auth
import os
def hello_pubsub(event, context):
if 'data' in event:
message = base64.b64decode(event['data']).decode('utf-8')
else:
message = 'hello world!'
credentials, _ = google.auth.default()
service = build('dataflow', 'v1b3', credentials=credentials)
gcp_project = os.environ["GCLOUD_PROJECT"]
template_path = gs://template_file_path_on_storage/
template_body = {
"parameters": {
"keyA": "valueA",
"keyB": "valueB",
},
"environment": {
"envVariable": "value"
}
}
request = service.projects().templates().launch(projectId=gcp_project, gcsPath=template_path, body=template_body)
response = request.execute()
print(response)
In template_body variable, parameters values are the arguments that will be sent to your pipeline and environment values are used by Dataflow service (serviceAccount, workers and network configuration).
LaunchTemplateParameters documentation
RuntimeEnvironment documentation

Resources