QLDB Python Driver Error Handling with Lambda and SQS - amazon-qldb

We have a QLDB ingestion process that consists of a Lambda function triggered by SQS.
We want to make sure our pipeline is airtight, so that if a failure or error occurs during driver execution and the data fails to commit to QLDB, we don't lose that data.
In our testing we noticed that if there's a failure within the Lambda itself, the message is automatically returned to the queue, but if the driver fails, the data is lost.
I understand that the default behavior for the driver is to retry four times after the initial failure. My question is, if I wrap qldb_driver.execute_lambda() in a try statement, will that allow the driver to retry upon failure or will it instantly return as a failure and be handled by the except statement?
Here is how I've written the first half of the function:
import json
import boto3
import datetime
from pyqldb.driver.qldb_driver import QldbDriver
from utils import upsert, resend_to_sqs, delete_from_sqs

queue_url = 'https://sqs.XXX/'
sqs = boto3.client('sqs', region_name='us-east-1')
ledger = 'XXXXX'
table = 'XXXXX'
qldb_driver = QldbDriver(ledger_name=ledger, region_name='us-east-1')

def lambda_handler(event, context):
    # Simple counter to identify messages
    i = 0
    # Error flag
    error = False
    # Empty list to store message send status as well as body or receipt_handle
    batch_messages = []
    for record in event['Records']:
        payload = json.loads(record["body"])
        payload['update_ts'] = str(datetime.datetime.now())
        try:
            qldb_driver.execute_lambda(lambda executor: upsert(executor, ledger=ledger, table_name=table, data=payload))
            # If the message commits successfully, give it status 200 and add the receipt_handle to our list
            # so that if an error occurs later, we can delete this message from the queue.
            message_info = {f'message_{i}': 200, 'receiptHandle': record['receiptHandle']}
            batch_messages.append(message_info)
        except Exception as e:
            print(e)
            # Flip error flag to True
            error = True
            # If the commit fails, set status 400 and add the message's body to our list.
            # This will allow us to send the message back to the queue during error handling.
            message_info = {f'message_{i}': 400, 'body': record['body']}
            batch_messages.append(message_info)
        i += 1
Assuming that this try/except allows the driver to retry upon failure, I've written an additional process that uses the message data recorded for our batch to delete successful commits from the queue and send failures back to it:
# Begin error handling
if error:
    count = 0
    for j in range(len(batch_messages)):
        # If a message was sent successfully, delete it from the queue
        if batch_messages[j][f'message_{j}'] == 200:
            receipt_handle = batch_messages[j]['receiptHandle']
            delete_from_sqs(sqs, queue_url, receipt_handle)
        # If the message failed to commit to QLDB, send it back to the queue
        else:
            body = batch_messages[j]['body']
            resend_to_sqs(sqs, queue_url, body)
            count += 1
    print(f"ERROR(S) DETECTED - {count} MESSAGES RETURNED TO QUEUE")
else:
    print("BATCH PROCESSING SUCCESSFUL")
Thank you for your insight!

The QLDB Python driver can be configured for more or fewer retries if you need. I'm not sure whether you wanted it to try only once, or whether you were asking whether the driver will try the transaction 4 times before the except clause is triggered. The driver will still try up to 4 times before raising the exception that your except block catches.
You can follow the example here to modify the retry amount. Also, note that the default retry timeout is a random millisecond jitter rather than exponential backoff. With QLDB, you shouldn't need to wait long periods between retries since it uses optimistic concurrency control.
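For example, a minimal sketch of lowering the retry limit, assuming the RetryConfig class shipped with recent pyqldb releases:

from pyqldb.config.retry_config import RetryConfig
from pyqldb.driver.qldb_driver import QldbDriver

# Retry at most 2 times after the initial attempt instead of the default 4.
retry_config = RetryConfig(retry_limit=2)
qldb_driver = QldbDriver(ledger_name='XXXXX', region_name='us-east-1', retry_config=retry_config)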
Also, with your design of sending the failed message back into the queue, you might want to consider sending it to a dead-letter queue instead. A dead-letter queue would prevent troublesome messages from being retried indefinitely, unless that's your goal.
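A minimal sketch of attaching a dead-letter queue to the source queue with boto3 (the queue URL and DLQ ARN below are placeholders):

import json
import boto3

sqs = boto3.client('sqs', region_name='us-east-1')

# After 5 failed receives, SQS moves the message to the dead-letter queue.
sqs.set_queue_attributes(
    QueueUrl='https://sqs.XXX/',
    Attributes={
        'RedrivePolicy': json.dumps({
            'deadLetterTargetArn': 'arn:aws:sqs:us-east-1:123456789012:my-dlq',
            'maxReceiveCount': '5'
        })
    }
)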
(edit/additionally)
Note that the QLDB driver exhausts its retries before raising an exception.

Related

Separating AioRTC datachannel into multiple threads

I have a two-way datachannel setup that takes a heartbeat from a browser client and keeps the session alive as long as the heartbeat continues. The heartbeat is the 'main' communication for WebRTC, but I have other bits of info (such as coordinates) I need to send constantly.
To do this, when a WebRTC offer is given, it takes that HTTP request and:
Creates a new event loop 'rtcloop'.
Sets that as the main event loop.
Then runs 'rtcloop' until complete, calling my webRtcStart function and passing through the session info.
Then runs a new thread with the target being 'rtcloop', runs it forever and starts it.
Inside the new thread I set the loop with 'get_event_loop' and later define '@webRtcPeer.on("datachannel")' so when we get a datachannel message, we run code around that. Depending on the situation, I attempt to do the following:
ptzcoords = 'Supported' #PTZ Coords will be part of WebRTC Communication, send every 0.5 seconds.
ptzloop = asyncio.new_event_loop()
ptzloop.run_until_complete(updatePTZReadOut(webRtcPeer, cameraName, loop))
ptzUpdateThread = Thread(target=ptzloop.run_forever)
ptzUpdateThread.start()
The constant error I get no matter how I structure things is "coroutine 'updatePTZReadOut' was never awaited"
With updatePTZReadOut being:
async def updatePTZReadOut(rtcPeer, cameraName, eventLoop):
    # Get Camera Info
    # THE CURRENT ISSUE I am having is with the event loops, because this gets called to run in another thread, but it still needs
    # to be awaitable,
    # Current Warning Is: /usr/lib/python3.10/threading.py:953: RuntimeWarning: coroutine 'updatePTZReadOut' was never awaited
    # Ref Article: https://xinhuang.github.io/posts/2017-07-31-common-mistakes-using-python3-asyncio.html
    # https://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio/
    # Get current loop
    # try:
    loop = asyncio.set_event_loop(eventLoop)
    # loop.run_until_complete()
    # except RuntimeError:
    #     loop = asyncio.new_event_loop()
    #     asyncio.set_event_loop(loop)
    # Getting Current COORDS from camera
    myCursor.execute("Select * from localcameras where name = '{0}' ".format(cameraName))
    camtuple = myCursor.fetchall()
    camdata = camtuple[0]
    # Create channel object
    channel_local = rtcPeer.createDataChannel("chat")
    while True:
        ptzcoords = readPTZCoords(camdata[1], camdata[3], cryptocode.decrypt(str(camdata[4]), passwordRandomKey))
        print("Updating Coords to {0}".format(ptzcoords))
        # Publish Here
        await channel_local.send("TTTT")
        asyncio.sleep(0.5)
Any help here?
updatePTZReadOut is an async function. You need to add await whenever you call this function.
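Since in your setup the coroutine has to be started from a non-async thread, one option is to run a dedicated event loop in a background thread and submit the coroutine to it with run_coroutine_threadsafe, so it actually gets awaited. A minimal sketch, reusing the webRtcPeer and cameraName names from your code:

import asyncio
import threading

# Dedicated loop for the PTZ updates, running forever in its own thread.
ptzloop = asyncio.new_event_loop()

def run_ptz_loop():
    asyncio.set_event_loop(ptzloop)
    ptzloop.run_forever()

threading.Thread(target=run_ptz_loop, daemon=True).start()

# Schedule the coroutine on that loop; this returns a concurrent.futures.Future
# you can poll for errors, and the coroutine is properly awaited by the loop.
future = asyncio.run_coroutine_threadsafe(
    updatePTZReadOut(webRtcPeer, cameraName, ptzloop), ptzloop)

Note also that asyncio.sleep(0.5) inside the while loop needs an await in front of it, otherwise it never actually pauses.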

Celery: retry only the failed request in the loop and continue with the others

I'm kind of new to Celery as a whole and am having a problem with the retry case in a for loop.
I have the following task:
@app.task(bind=True, autoretry_for=(CustomException,), retry_kwargs={'max_retries': 10, 'countdown': 30})
def call_to_apis(self):
    api_list = [api1, api2, api3, api4, api5, ...]
    for api in api_list:
        try:
            response = requests.get(api)
            if response.status_code == 500:
                raise CustomException
        except CustomException:
            continue
From my understanding, Celery will retry when my CustomException gets raised.
In the case of a retry, will it retry only the failed API, or will it run the whole process over every API in the api_list again? If the latter, is there any way for it to retry only the failed API?
Expected result: only retry the failed API
EDIT:
I have split it into 2 different tasks and 1 request function as follows:
@app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(300.0, call_to_apis.s())
    print("setup_periodic_tasks")

def call_api(api):
    response = requests.get(api)
    if response.status_code == 500:
        raise CustomException
    elif response.status_code == 404:
        raise CustomWrongLinkException

@app.task(default_retry_delay=30, max_retries=10)
def send_fail_api(api):
    try:
        call_api(api)
    except NonceTooLowException:
        try:
            send_fail_api.retry()
        except MaxRetriesExceededError:
            print("reached max retry number")
            pass
    except Exception:
        pass

@app.task()
def call_to_apis():
    api_list = [api1, api2, api3, api4, api5, ...]
    for api in api_list:
        try:
            call_api(api)
        except CustomException:
            send_fail_api.delay(api)
        except CustomWrongLinkException:
            print("wrong link")
        except Exception:
            pass
It worked and the other APIs get to complete; the failed API is supposed to be passed to the other task and retried 10 times with a 30-second delay between attempts.
But I'm getting more retries than expected, about 24 (I expected only 10 retries), and it also printed out 'reached max retry number' at the 10th retry but still kept retrying up to 24 times.
What am I doing wrong?
The whole task will be retried in case of a known exception (specified in the autoretry_for decorator argument); see the documentation. Celery has no way of knowing the state of the task at the moment the exception is raised; that is something you have to handle. I would suggest splitting the task into individual tasks (one per API) and calling them separately, presumably creating some workflow.
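A minimal sketch of that idea, reusing the app, CustomException, requests and api_list names from the question; each API call becomes its own task, so a retry re-runs only that one request:

@app.task(autoretry_for=(CustomException,), retry_kwargs={'max_retries': 10, 'countdown': 30})
def call_single_api(api):
    # Only this request is retried if it raises CustomException.
    response = requests.get(api)
    if response.status_code == 500:
        raise CustomException

@app.task()
def call_to_apis():
    # Fan out: one sub-task per API, so a failure in one does not affect the others.
    for api in api_list:
        call_single_api.delay(api)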

pubnub python 4 sdk

I have just started using PubNub. I entered the basic code which was given in the PubNub Python SDK (4.0) and I get the following errors:
ERROR:pubnub:Async request Exception. 'Publish' object has no attribute 'async'
ERROR:pubnub:Exception in subscribe loop: 'Publish' object has no attribute 'async'
WARNING:pubnub:reconnection policy is disabled, please handle reconnection manually.
As far as async() is concerned, there is a troubleshooting suggestion in which the async error can be solved by entering the following:
def callback(result, status):
    if status.is_error():
        print("Error %s" % str(status.error_data.exception))
        print("Error category #%d" % status.category)
    else:
        print(str(result))
but it still doesn't work.
This is the code
from pubnub.callbacks import SubscribeCallback
from pubnub.enums import PNStatusCategory
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

pnconfig = PNConfiguration()
pnconfig.subscribe_key = 'demo'
pnconfig.publish_key = 'demo'
pubnub = PubNub(pnconfig)

def my_publish_callback(envelope, status):
    # Check whether request successfully completed or not
    if not status.is_error():
        pass  # Message successfully published to specified channel.
    else:
        pass  # Handle message publish error. Check 'category' property to find out possible issue
              # because of which request did fail.
              # Request can be resent using: [status retry];

class MySubscribeCallback(SubscribeCallback):
    def presence(self, pubnub, presence):
        pass  # handle incoming presence data

    def status(self, pubnub, status):
        if status.category == PNStatusCategory.PNUnexpectedDisconnectCategory:
            pass  # This event happens when radio / connectivity is lost
        elif status.category == PNStatusCategory.PNConnectedCategory:
            # Connect event. You can do stuff like publish, and know you'll get it.
            # Or just use the connected event to confirm you are subscribed for
            # UI / internal notifications, etc
            pubnub.publish().channel("awesomeChannel").message("hello!!").async(my_publish_callback)
        elif status.category == PNStatusCategory.PNReconnectedCategory:
            pass
            # Happens as part of our regular operation. This event happens when
            # radio / connectivity is lost, then regained.
        elif status.category == PNStatusCategory.PNDecryptionErrorCategory:
            pass
            # Handle message decryption error. Probably client configured to
            # encrypt messages and on live data feed it received plain text.

    def message(self, pubnub, message):
        pass  # Handle new message stored in message.message

pubnub.add_listener(MySubscribeCallback())
pubnub.subscribe().channels('awesomeChannel').execute()
As the error is from the publish method, it is most probably because
async has been changed to pn_async.
Note that, as of this date, this is applicable only to Python 3, as the same has not been implemented for Python 2.
Change
pubnub.publish().channel("awesomeChannel").message("hello!!").async(my_publish_callback)
to
pubnub.publish().channel("awesomeChannel").message("hello!!").pn_async(my_publish_callback)
Reference document here

Why is this queue not working properly?

The following queue is somehow not working properly. Is there any obvious mistake I have made? Basically, every incoming SMS message is put onto the queue; the worker tries to send it and, if successful, deletes it from the queue. If it's unsuccessful, it sleeps for 2 seconds and tries sending it again.
# initialize queue
queue = queue.Queue()

def messagePump():
    while True:
        item = queue.get()
        if item is not None:
            status = sendText(item)
            if status == 'SUCCEEDED':
                queue.task_done()
            else:
                time.sleep(2)

def sendText(item):
    response = getClient().send_message(item)
    response = response['messages'][0]
    if response['status'] == '0':
        return 'SUCCEEDED'
    else:
        return 'FAILED'

@app.route('/webhooks/inbound-sms', methods=['POST'])
def delivery_receipt():
    data = dict(request.form) or dict(request.args)
    senderNumber = data['msisdn'][0]
    incomingMessage = data['text'][0]
    # came from customer service operator
    if (senderNumber == customerServiceNumber):
        try:
            split = incomingMessage.split(';')
            # get recipient phone number
            recipient = split[0]
            # get message content
            message = split[1]
            # check if target number is 10 digit long and there is a message
            if (len(message) > 0):
                # for confirmation send beginning string only
                successText = 'Message successfully sent to: '+recipient+' with text: '+message[:7]
                queue.put({'from': virtualNumber, 'to': recipient, 'text': message})
The above is running on a Flask server. So invoking messagePump:
thread = threading.Thread(target=messagePump)
thread.start()
The common issue in such cases is that the thread has completed execution before items start to appear in the queue; set thread.daemon = True before calling thread.start().
Another thing which may happen here is that the thread was terminated due to an exception. Make sure messagePump handles all possible exceptions.
This topic on tracing exceptions in threads may be useful for you:
Catch a thread's exception in the caller thread in Python
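A minimal sketch of both suggestions, reusing the queue, sendText and messagePump names from the question:

def messagePump():
    while True:
        item = queue.get()
        if item is not None:
            try:
                # Guard the send so an unexpected exception cannot kill the worker thread.
                status = sendText(item)
            except Exception as e:
                print('sendText failed: %s' % e)
                status = 'FAILED'
            if status == 'SUCCEEDED':
                queue.task_done()
            else:
                time.sleep(2)

thread = threading.Thread(target=messagePump)
thread.daemon = True  # set before start() so this worker doesn't block interpreter shutdown
thread.start()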

Python - Pass a function (callback) variable between functions running in separate threads

I am trying to develop a Python 3.6 script which uses pika and threading modules.
I have a problem which I think is caused by A) my being very new to Python and coding in general, and B) my not understanding how to pass variables between functions when they are run in separate threads and are already being passed a parameter in parentheses at the end of the receiving function name.
The reason I think this is that when I do not use threading, I can pass a variable between functions simply by calling the receiving function name and supplying the variable to be passed in parentheses. A basic example is shown below:
def send_variable():
    body = "this is a text string"
    receive_variable(body)

def receive_variable(body):
    print(body)
This when run, prints:
this is a text string
A working version of the code I need to get working with threading is shown below. This version uses straight functions (no threading): I am using pika to receive messages from a (RabbitMQ) queue via the pika callback function, and I then pass the body of the message received in the 'callback' function to the 'processing' function:
import pika

...mq connection variables set here...

# defines username and password credentials as variables set at the top of this script
credentials = pika.PlainCredentials(mq_user_name, mq_pass_word)
# defines mq server host, port and user credentials and creates a connection
connection = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, port=mq_port, credentials=credentials))
# creates a channel connection instance using the above settings
channel = connection.channel()
# defines the queue name to be used with the above channel connection instance
channel.queue_declare(queue=mq_queue)

def callback(ch, method, properties, body):
    # passes (body) to processing function
    body_processing(body)

# sets channel consume type, also sets queue name/message acknowledge settings based on variables set at top of script
channel.basic_consume(callback, queue=mq_queue, no_ack=mq_no_ack)
# tells the callback function to start consuming
channel.start_consuming()
# calls the callback function to start receiving messages from mq server
callback()

# above deals with pika connection and the main callback function
def body_processing(body):
    ...code to send a pika message every time a 'body' message is received...
This works fine; however, I want to translate this to run within a script that uses threading. When I do this I have to supply the parameter 'channel' to the function that runs in its own thread. When I then try to include the 'body' parameter as well, so that the 'processing_function' looks as per the below:
def processing_function(channel, body):
I get an error saying:
[function_name] is missing 1 positional argument: 'body'
I know that when using threading there is more code needed and I have included the actual code that I use for threading below so that you can see what I am doing:
...imports and mq variables and pika connection details are set here...

def get_heartbeats(channel):
    channel.queue_declare(queue=queue1)
    #print (' [*] Waiting for messages. To exit press CTRL+C')
    def callback(ch, method, properties, body):
        process_body(body)
        #print (" Received %s" % (body))
    channel.basic_consume(callback, queue=queue1, no_ack=no_ack)
    channel.start_consuming()

def process_body(channel, body):
    channel.queue_declare(queue=queue2)
    #print (' [*] Waiting for Tick messages. To exit press CTRL+C')
    # sets the mq host which pika client will use to send a message to
    connection = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host))
    # create a channel connection instance
    channel = connection.channel()
    # declare a queue to be used by the channel connection instance
    channel.queue_declare(queue=order_send_queue)
    # send a message via the above channel connection settings
    channel.basic_publish(exchange='', routing_key=send_queue, body='Test Message')
    # send a message via the above channel settings
    # close the channel connection instance
    connection.close()
def manager():
    # Channel 1 Connection Details - =======================================================================================
    credentials = pika.PlainCredentials(mq_user_name, mq_password)
    connection1 = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, credentials=credentials))
    channel1 = connection1.channel()
    # Channel 1 thread =====================================================================================================
    t1 = threading.Thread(target=get_heartbeats, args=(channel1,))
    t1.daemon = True
    threads.append(t1)
    # as this is thread 1 call to start threading is made at start threading section
    # Channel 2 Connection Details - =======================================================================================
    credentials = pika.PlainCredentials(mq_user_name, mq_password)
    connection2 = pika.BlockingConnection(pika.ConnectionParameters(host=mq_host, credentials=credentials))
    channel2 = connection2.channel()
    # Channel 2 thread ====================================================================================================
    t2 = threading.Thread(target=process_body, args=(channel2, body))
    t2.daemon = True
    threads.append(t2)
    t2.start()  # as this is thread 2 - we need to start the thread here
    # Start threading
    t1.start()  # start the first thread - other threads will self start as they call t1.start() in their code block
    for t in threads:  # for all the threads defined
        t.join()  # join defined threads

manager()  # run the manager module which starts threads that call each module
This when run produces the error
process_body() missing 1 required positional argument: (body)
and I do not understand why this is or how to fix it.
Thank you for taking the time to read this question and any help or advice you can supply is much appreciated.
Please keep in mind that I am new to python and coding so may need things spelled out rather than being able to understand more cryptic replies.
Thanks!
On looking further into this and playing with the code, it seems that if I edit the lines:
def process_body(channel, body):
to read
def process_body(body):
and
t2 = threading.Thread(target=process_body, args=(channel2, body))
so that it reads:
t2 = threading.Thread(target=process_body)
then the code seems to work as needed. I also see multiple script processes in htop, so it appears that threading is working. I have left the script processing for 24+ hours and did not receive any errors...
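For what it's worth, here is a small sketch of one alternative that keeps both parameters (reusing the names from the code above; this is just one possible approach, not the only fix): have get_heartbeats also receive the channel that process_body should use, close over it inside the pika callback so process_body still gets both arguments, and keep the Thread args tuple lined up with the target's signature:

def get_heartbeats(channel, publish_channel):
    channel.queue_declare(queue=queue1)

    def callback(ch, method, properties, body):
        # close over publish_channel so process_body receives both of its arguments
        process_body(publish_channel, body)

    channel.basic_consume(callback, queue=queue1, no_ack=no_ack)
    channel.start_consuming()

# in manager(), the args tuple must match get_heartbeats' parameters:
t1 = threading.Thread(target=get_heartbeats, args=(channel1, channel2))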
