Durable Azure Function (Python) weird behaviour

I have a function that transfers folders across SharePoint sites. Because it can run for quite some time, depending on the number of files to transfer, I converted the Azure function to a durable function.
Initially, everything worked as expected until one run in which many transfers failed because the data was not on the source site. When the process looked up the folder on the source site it received "Exception: 404 Client Error: Not Found". I wasn't able to find the folder on the source site, but looking at the sink site, the folder had been moved there. The timestamp fits well with the Azure function run, but the folder was not moved by the Azure function.
Current setup:
Starter function
import logging

import azure.functions as func
import azure.durable_functions as df


async def main(req: func.HttpRequest, starter: str) -> func.HttpResponse:
    try:
        req_body = req.get_json()
        logging.info(req_body)
    except ValueError:
        apps = None
    else:
        apps = req_body.get('apps')

    client = df.DurableOrchestrationClient(starter)
    x = [req.route_params["environment"], apps]
    instance_id = await client.start_new("orchestrator", None, x)

    logging.info(f"Started orchestration with ID = '{instance_id}'.")

    return client.create_check_status_response(req, instance_id)
Orchestrator
import logging
import json

import azure.functions as func
import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    environment, apps = context.get_input()

    if (environment == "production") and (not apps):
        result1 = yield context.call_activity('sp_folder_transfer_prod', None)
        return {"success": result1}
    elif (environment == "saffull") and (apps):
        result2 = yield context.call_activity('sp_folder_transfer_saffull', apps)
        return {"success": result2}
    else:
        raise Exception('Process stopped')


main = df.Orchestrator.create(orchestrator_function)
sp_folder_transfer_prod init
import logging
import traceback

from . import SAFTprod as saft


def main(trigger):
    try:
        saft.main_saft_prod()
        return True
    except:
        logging.error(traceback.format_exc())
        return False
I can't find anything that would indicate what is going wrong. The only odd thing I could find is a delay between the orchestrator function and the main activity function. In all other runs there is no delay.
02:25:54 starter [duration 186ms]
02:25:55 orchestrator (1st run) [duration 27ms]
02:41:17 sp_folder_transfer_prod [duration 4.27min]
02:45:35 orchestrator (2nd run) [duration 86ms]
There is a delay of 15 min and 22 s between the orchestrator being triggered and the start of sp_folder_transfer_prod. Usually a run with the same number of folders to transfer takes about 20 min.
Any idea what is going on?
EDIT:
While investigating this issue I found out that the main activity function is triggered twice. The first run fails for some reason and produces no logs, and the activity function is then re-triggered. The delay between the orchestrator and the activity function in this case corresponds to that first run of the activity function. More here:
Azure Durable Function triggering the main activity function twice (Python)
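(For context: Durable Functions activity calls have at-least-once execution semantics, so an activity can be invoked more than once for the same orchestration step. A common mitigation, sketched below, is to make the activity idempotent; folder_already_transferred is a hypothetical helper, not part of the original SAFTprod module.)

import logging
import traceback

from . import SAFTprod as saft


def main(trigger):
    try:
        # Hypothetical guard: skip work that a previous (possibly unlogged)
        # invocation of this activity has already completed.
        if saft.folder_already_transferred():
            logging.info("Transfer already completed by an earlier invocation; skipping.")
            return True
        saft.main_saft_prod()
        return True
    except Exception:
        logging.error(traceback.format_exc())
        return False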

Related

How to log the return value of a POST method after returning the response?

I'm working on my first ever REST API, so apologies in advance if I've missed something basic. I have a function that takes a JSON request from another server, processes it (makes a prediction based on the data), and returns another JSON with the results. I'd like to keep a log on the server's local disk of all requests to this endpoint along with their results, for evaluation purposes and for retraining the model. However, for the purposes of minimising the latency of returning the result to the user, I'd like to return the response data first, and then write it to the local disk. It's not obvious to me how to do this properly, as the FastAPI paradigm necessitates that the result of a POST method is the return value of the decorated function, so anything I want to do with the data has to be done before it is returned.
Below is a minimal working example of what I think is my closest attempt at getting it right so far, using a custom object with a log decorator. My idea was to assign the result to the logger object as an attribute, then use another method to write it to disk, but I can't figure out how to make sure that write_data gets called after get_data every time.
import json

import uvicorn
from fastapi import FastAPI, Request
from functools import wraps
from pydantic import BaseModel


class Blob(BaseModel):
    id: int
    x: float


def crunch_numbers(data: Blob) -> dict:
    # does some stuff
    return {'foo': 'bar'}


class PostResponseLogger:
    def __init__(self) -> None:
        self.post_result = None

    def log(self, func, *args, **kwargs):
        @wraps(func)
        def func_to_log(*args, **kwargs):
            post_result = func(*args, **kwargs)
            self.post_result = post_result
            # how can this be done outside of this function ???
            self.write_data()
            return post_result
        return func_to_log

    def write_data(self):
        if self.post_result:
            with open('output.json', 'w') as f:
                json.dump(self.post_result, f)


def main():
    app = FastAPI()
    logger = PostResponseLogger()

    @app.post('/get_data/')
    @logger.log
    def get_data(input_json: dict, request: Request):
        result = crunch_numbers(input_json)
        return result

    uvicorn.run(app=app)


if __name__ == '__main__':
    main()
Basically, my question boils down to: "is there a way, in the PostResponseLogger class, to automatically call self.write_data after every call to self.log?", but if I'm using the wrong approach altogether, any other suggestions are also welcome.
You could have a Background Task for that purpose. A background task "will run only once the response has been sent" (see Starlette documentation). "This is useful for operations that need to happen after a request, but that the client doesn't really have to be waiting for the operation to complete before receiving the response" (see FastAPI documentation).
You can define a task function to run in the background for writing the log data, as shown below:
def write_log_data():
    logger.write_data()
Then, import BackgroundTasks and define a parameter in your endpoint with a type declaration of BackgroundTasks. Inside your endpoint, pass your task function (i.e., write_log_data, as defined above) to the background_tasks object with the method .add_task():
from fastapi import BackgroundTasks

@app.post('/get_data/')
@logger.log
def get_data(input_json: dict, request: Request, background_tasks: BackgroundTasks):
    result = crunch_numbers(input_json)
    background_tasks.add_task(write_log_data)
    return result
The same principle could be applied if a middleware was used to capture and log the response data, as described in this answer, or a custom APIRoute class, as demonstrated in this answer.
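For illustration, a minimal sketch of the middleware variant (not taken from the linked answers; write_log_bytes is a hypothetical helper that persists the raw response body):

from starlette.background import BackgroundTask
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import Response


class LogResponseMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        response = await call_next(request)
        # call_next returns a streaming response, so drain the body once
        # in order to both log it and send it back to the client.
        body = b"".join([chunk async for chunk in response.body_iterator])
        return Response(
            content=body,
            status_code=response.status_code,
            headers=dict(response.headers),
            media_type=response.media_type,
            # runs only after the response has been sent
            background=BackgroundTask(write_log_bytes, body),
        )


app.add_middleware(LogResponseMiddleware)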
For future reference, if you (or anyone) ever need to use async/await syntax and run into concurrency issues (such as the event loop getting blocked) while performing some heavy background computation, have a look at this answer. It explains the difference between defining an endpoint or a background task function with async def and def (briefly, async def endpoints/background tasks run in the event loop, whereas def functions run in an external threadpool that is then awaited), and it provides solutions for running blocking I/O-bound or CPU-bound operations in such functions.
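As a brief illustration of that distinction (a sketch reusing the names from the example above, not a definitive implementation), a blocking function such as crunch_numbers can be offloaded from an async def endpoint like so:

from fastapi import BackgroundTasks
from fastapi.concurrency import run_in_threadpool


@app.post('/get_data/')
async def get_data(input_json: dict, background_tasks: BackgroundTasks):
    # Run the blocking computation in a worker thread so the event loop
    # stays free to serve other requests.
    result = await run_in_threadpool(crunch_numbers, input_json)
    background_tasks.add_task(write_log_data)
    return result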

Tornado HTTPServer that adds objects to a queue upon receiving a POST request

I want to create a web server, that automatically handles "orders" when receiving a POST request.
My code so far looks like this:
from json import loads

from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.web import Application, url, RequestHandler

order_list = list()


class MainHandler(RequestHandler):
    def get(self):
        pass

    def post(self):
        if self.__authenticate_incoming_request():
            payload = loads(self.request.body)
            order_list.append(payload)
        else:
            pass

    def __authenticate_incoming_request(self) -> bool:
        # some request authentication code
        return True


def start_server():
    application = Application([
        url(r"/", MainHandler)
    ])
    server = HTTPServer(application)
    server.listen(8080)
    IOLoop.current().start()


if __name__ == '__main__':
    start_server()
Here is what I want to achieve:
Receive a POST request with information about incoming "orders"
Perform an action A n times, based on a value defined in the request.body (concurrently, if possible)
Previously, to perform action A n times, I have used a concurrent.futures.ThreadPoolExecutor, but I am not sure how I should handle this correctly with a web server running in parallel.
My idea was something like this:
start_server()
tpe = ThreadPoolExecutor(max_workers=10)

while True:
    if order_list:
        new_order = order_list.pop(0)
        tpe.submit(my_action, new_order)  # my_action is my desired function
    sleep(5)
Now this piece of code is of course blocking, and I was hoping that the web server would continue running in parallel, while I am running my while-loop.
Is a setup like this possible? Do I maybe need to utilize other modules? Any help greatly appreciated!
It's not working as expected because time.sleep is a blocking function.
Instead of using a list and a while loop and sleeping to check for new items in a list, use Tornado's queues.Queue which will allow you to check for new items asynchronously.
from tornado.queues import Queue

order_queue = Queue()
tpe = ThreadPoolExecutor(max_workers=10)


async def queue_consumer():
    # The old while-loop is now converted into a coroutine
    # and an async for loop is used instead
    async for new_order in order_queue:
        # run your function in the threadpool
        IOLoop.current().run_in_executor(tpe, my_action, new_order)


def start_server():
    # ...
    # run the queue_consumer coroutine before starting the loop
    IOLoop.current().spawn_callback(queue_consumer)
    IOLoop.current().start()
Put items in the queue like this:
order_queue.put(payload)
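For instance, the post handler from the question might be adapted like this (a sketch; put_nowait is used because post() is not a coroutine here, and it never raises QueueFull on an unbounded queue):

    def post(self):
        if self.__authenticate_incoming_request():
            payload = loads(self.request.body)
            # enqueue for the consumer coroutine instead of appending to a list
            order_queue.put_nowait(payload)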

How to have more than one handler in AWS Lambda Function?

I have a very large Python file that consists of multiple defined functions. If you're familiar with AWS Lambda, when you create a Lambda function you specify a handler, which is a function in the code that AWS Lambda can invoke when the service executes my code. This is represented below in the my_handler.py file:
def handler_name(event, context):
    ...
    return some_value
Link Source: https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
However, as I mentioned above, I have multiple defined functions in my_handler.py that have their own events and contexts. Therefore, this will result in an error. Are there any ways around this in Python 3.6?
Your single handler function will need to be responsible for parsing the incoming event and determining the appropriate route to take. For example, let's say your other functions are called helper1 and helper2. Your Lambda handler function will inspect the incoming event and then, based on one of the fields in the incoming event (let's call it EventType), call either helper1 or helper2, passing in both the event and context objects.
def handler_name(event, context):
    if event['EventType'] == 'helper1':
        helper1(event, context)
    elif event['EventType'] == 'helper2':
        helper2(event, context)


def helper1(event, context):
    pass


def helper2(event, context):
    pass
This is only pseudo-code, and I haven't tested it myself, but it should get the concept across.
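If it helps, a hypothetical local test of that routing (not part of the original answer) might look like this:

if __name__ == "__main__":
    # Each event's EventType field selects which helper runs.
    handler_name({"EventType": "helper1"}, None)
    handler_name({"EventType": "helper2"}, None)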
Little late to the game, but I thought it wouldn't hurt to share. Best practices suggest separating the handler from the Lambda's core logic. Not only is it okay to add additional definitions, it can lead to more legible code and reduce waste, e.g. multiple API calls to S3. So, although it can get out of hand, I disagree with some of the critiques of your initial question. It's effective to use your handler as a logical interface to the additional functions that will accomplish your various work.

In data architecture and engineering land it's often less costly and more efficient to work in this manner, particularly if you are building out ETL pipelines and following service-oriented architectural patterns. Admittedly, I'm a bit of a maverick and some may find this unruly/egregious, but I've gone so far as to build classes into my Lambdas for various reasons (e.g. centralized, data-lake-ish S3 buckets that accommodate a variety of file types, reducing unnecessary requests, etc.) and I stand by it.

Here's an example of one of my handler files from a CDK example project I put on the hub a while back. Hopefully it'll give you some useful ideas, or at the very least help you not feel alone in wanting to beef up your Lambdas.
import requests
import json
from requests.exceptions import Timeout
from requests.exceptions import HTTPError
from botocore.exceptions import ClientError
from base64 import b64decode  # used by get_secret below; missing from the original
from datetime import date
import csv
import os
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
class Asteroids:
    """Client to NASA API and execution interface to branch data processing by file type.

    Notes:
        This class doesn't look like a normal class. It is a simple example of how one might
        work around AWS Lambda's limitations on class use in handlers. It also allows for
        better organization of code to simplify this example. If one planned to add
        other NASA endpoints or process larger amounts of asteroid data for both .csv and .json formats,
        asteroids_json and asteroids_csv should be modularized and divided into separate Lambdas,
        with Step Functions orchestration implemented for a more comprehensive workflow.
        However, for the sake of this demo I'm keeping it lean and easy.
    """

    def execute(self, format):
        """Serves as interface to assign class attributes and execute class methods

        Raises:
            Exception: If file format is not of .json or .csv file types.

        Notes:
            Have fun!
        """
        self.file_format = format
        self.today = date.today().strftime('%Y-%m-%d')
        # method call below used when Secrets Manager integrated. See get_secret.__doc__ for more.
        # self.api_key = get_secret('nasa_api_key')
        self.api_key = os.environ["NASA_KEY"]
        self.endpoint = f"https://api.nasa.gov/neo/rest/v1/feed?start_date={self.today}&end_date={self.today}&api_key={self.api_key}"
        self.response_object = self.nasa_client(self.endpoint)
        self.processed_response = self.process_asteroids(self.response_object)

        if self.file_format == "json":
            self.asteroids_json(self.processed_response)
        elif self.file_format == "csv":
            self.asteroids_csv(self.processed_response)
        else:
            raise Exception("FILE FORMAT NOT RECOGNIZED")

        self.write_to_s3()
    def nasa_client(self, endpoint):
        """Client component for API call to NASA endpoint.

        Args:
            endpoint (str): Parameterized url for API call.

        Raises:
            Timeout: If connection not made in 5s and/or data not retrieved in 15s.
            HTTPError & Exception: Self-explanatory

        Notes:
            See Cloudwatch logs for debugging.
        """
        try:
            response = requests.get(endpoint, timeout=(5, 15))
        except Timeout as timeout:
            print(f"NASA GET request timed out: {timeout}")
        except HTTPError as http_err:
            print(f"HTTP error occurred: {http_err}")
        except Exception as err:
            print(f"Other error occurred: {err}")
        else:
            return json.loads(response.content)
    def process_asteroids(self, payload):
        """Process old, and create new, data object with content from response.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        near_earth_objects = payload["near_earth_objects"][f"{self.today}"]
        asteroids = []
        for neo in near_earth_objects:
            asteroid_object = {
                "id": neo['id'],
                "name": neo['name'],
                "hazard_potential": neo['is_potentially_hazardous_asteroid'],
                "est_diameter_min_ft": neo['estimated_diameter']['feet']['estimated_diameter_min'],
                "est_diameter_max_ft": neo['estimated_diameter']['feet']['estimated_diameter_max'],
                "miss_distance_miles": [item['miss_distance']['miles'] for item in neo['close_approach_data']],
                "close_approach_exact_time": [item['close_approach_date_full'] for item in neo['close_approach_data']]
            }
            asteroids.append(asteroid_object)
        return asteroids
    def asteroids_json(self, payload):
        """Creates json object from payload content then writes to .json file.

        Args:
            payload (b'str'): Binary string of asteroid data to be processed.
        """
        json_file = open(f"/tmp/asteroids_{self.today}.json", 'w')
        json_file.write(json.dumps(payload, indent=4))
        json_file.close()

    def asteroids_csv(self, payload):
        """Creates .csv object from payload content then writes to .csv file.
        """
        csv_file = open(f"/tmp/asteroids_{self.today}.csv", 'w', newline='\n')
        fields = list(payload[0].keys())
        writer = csv.DictWriter(csv_file, fieldnames=fields)
        writer.writeheader()
        writer.writerows(payload)
        csv_file.close()
    def get_secret(self):
        """Gets secret from AWS Secrets Manager

        Notes:
            Have yet to integrate into the CDK. Leaving as example code.
        """
        secret_name = os.environ['TOKEN_SECRET_NAME']
        region_name = os.environ['REGION']
        session = boto3.session.Session()
        client = session.client(service_name='secretsmanager', region_name=region_name)
        try:
            get_secret_value_response = client.get_secret_value(SecretId=secret_name)
        except ClientError as e:
            raise e
        else:
            if 'SecretString' in get_secret_value_response:
                secret = get_secret_value_response['SecretString']
            else:
                secret = b64decode(get_secret_value_response['SecretBinary'])
            return secret
    def write_to_s3(self):
        """Uploads both .json and .csv files to s3
        """
        s3 = boto3.client('s3')
        s3.upload_file(f"/tmp/asteroids_{self.today}.{self.file_format}", os.environ['S3_BUCKET'], f"asteroid_data/asteroids_{self.today}.{self.file_format}")


def handler(event, context):
    """Instantiates class and triggers execution method.

    Args:
        event (dict): Lists a custom dict that determines interface control flow--i.e. `csv` or `json`.
        context (obj): Provides methods and properties that contain invocation, function and
            execution environment information.
            *Not used herein.
    """
    asteroids = Asteroids()
    asteroids.execute(event)
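For local experimentation, the handler above can be invoked directly. Note that, as written, execute() consumes the event as a plain format string, so a hypothetical smoke test (assuming NASA_KEY and S3_BUCKET are set and AWS credentials are available) could be:

if __name__ == "__main__":
    # context is unused by this handler, so None is passed for simplicity
    handler("json", None)
    handler("csv", None)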

Ibpy: how to capture data returned from reqAccountSummary

I'm using ibapi from Interactive Brokers and I got stuck on how to capture the returned data in general. For example, according to the API docs, when I call reqAccountSummary(), the data is delivered via accountSummary(). But their example only prints the data. I've tried capturing the data or assigning it to a variable, but nowhere in their docs does it show how to do this. I've also searched Google and only found register() and registerAll(), but those are from ib.opt, which isn't in the latest working ibapi package.
Here is my code. Could you show me how to modify accountSummary() to capture the data?
from ibapi.client import EClient
from ibapi.wrapper import EWrapper
from ibapi.common import *


class TestApp(EWrapper, EClient):
    def __init__(self):
        EClient.__init__(self, self)

    # request account data:
    def my_reqAccountSummary1(self, reqId: int, groupName: str, tags: str):
        self.reqAccountSummary(reqId, "All", "TotalCashValue")

    # The received data is passed to accountSummary()
    def accountSummary(self, reqId: int, account: str, tag: str, value: str, currency: str):
        super().accountSummary(reqId, account, tag, value, currency)
        print("Acct# Summary. ReqId:", reqId, "Acct:", account, "Tag:", tag, "Value:", value, "Currency:", currency)
        return value  # This is my attempt, which doesn't work


def main():
    app = TestApp()
    app.connect("127.0.0.1", 7497, clientId=0)
    # This works, but the data is printed to screen. I don't know how to
    # assign the received TotalCashValue to a variable.
    app.my_reqAccountSummary1(8003, "All", "TotalCashValue")
    # myTotalCashValue = app.my_reqAccountSummary1(8003, "All", "TotalCashValue")  # My attempt doesn't work
    # more code to stop trading if myTotalCashValue is low
    app.run()


if __name__ == "__main__":
    main()
You cannot do this in the main function, since app.run listens for responses from TWS. Once you have set up all the callbacks, as you correctly did, the main function will loop forever inside app.run.
You have to put your code directly into the accountSummary function. This is how these kinds of programs work: you put your logic directly into the callback functions. You can always assign self.myTotalCashValue = value to make it available to other parts of your class, or even to another thread.
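A minimal sketch of that first approach (myTotalCashValue is just an illustrative attribute name):

    def accountSummary(self, reqId: int, account: str, tag: str, value: str, currency: str):
        super().accountSummary(reqId, account, tag, value, currency)
        if tag == "TotalCashValue":
            # keep the latest value on the instance so other callbacks
            # (e.g. order-placement logic) can read it later
            self.myTotalCashValue = float(value)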
-- OR --
You run app.run in a thread and wait for the value to arrive, e.g. add self._myTotalCashValue = value to accountSummary, import threading and time, and then add something like this in main:
t = threading.Thread(target=app.run)
t.daemon = True
t.start()

while not hasattr(app, "_myTotalCashValue"):
    time.sleep(1)

print(app._myTotalCashValue)
As usual with threads, you have to be a bit careful with shared memory between app and main.

GObject.timeout_add stops running unexpectedly

I am working on an Ubuntu Appindicator that displays the value of a JSON API call every X seconds.
The issue is that, randomly, it will stop calling self.loop without any error or warning. I can have it running for days or for hours. I've set up debug statements (in development) and it always stops running after the loop function is called.
It's as if I was returning False or not returning from the function, even though the logic in this code should always return True.
Here is the documentation for GObject.timeout_add (for GTK2 but the principle stands).
I'm not sure if it's dependent on the PyGTK version. I've had it happen in Ubuntu 16.04 and Ubuntu 17.04.
Here is the full class. The point where the JSON API is called is at result = self.currency.query(). I am happy to give further feedback.
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import GObject


class QueryLoop():
    """QueryLoop accepts the indicator to which the result will be written and a currency to obtain the results from.

    To define a currency you only need to implement the query method and return the results in a pre-determined
    format so that it will be consistent."""

    def __init__(self, indicator, currency, timeout=5000):
        """
        Initialize the query loop with the indicator and the currency to get the data from.
        :param indicator: An instance of an indicator
        :param currency: An instance of a currency to get the information from
        :param timeout: The interval between requests to the currency API
        """
        self.indicator = indicator
        self.currency = currency
        self.timeout = timeout
        self.last_known = {"last": "0.00"}

    def loop(self):
        """Loop calls itself forever and ever and will consult the currency for the most current value and update
        the indicator's label content."""
        result = self.currency.query()
        if result is not None:
            self.indicator.set_label("{} {}".format(result["last"], self.currency.get_ticker()))
            self.last_known = result
        else:
            self.indicator.set_label("Last Known: {} EUR (Error)".format(self.last_known["last"]))
        return True

    def start(self):
        """Starts the query loop; it does not do anything else. It's merely a matter of naming because
        when initializing the loop in the main() point of entry."""
        GObject.timeout_add(self.timeout, self.loop)
        self.loop()
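One thing worth ruling out (a diagnostic sketch, not the poster's code; traceback is assumed to be imported): if self.currency.query() ever raises, the exception propagates out of the callback, GLib treats the source as finished and removes it, and if stderr isn't visible the timer appears to stop silently. Registering a guarded wrapper keeps it alive:

    def _safe_loop(self):
        try:
            return self.loop()
        except Exception:
            # Log the failure and keep the timeout source alive;
            # returning a falsy value (or raising) would cancel it.
            traceback.print_exc()
            return True

    def start(self):
        GObject.timeout_add(self.timeout, self._safe_loop)
        self._safe_loop()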
