Azure timer function: when to add cursor.close()

I am very new to Azure Functions and have a question. I am working on an Azure timer function that pulls data via an API and inserts it into an Azure SQL database. That part works successfully. However, at the end of the script, I get the following error:
Exception: ProgrammingError: Attempt to use a closed cursor.
My question is: when should I call cursor.close()? Should I have it in there at all? I assume yes, but if so, where do I call it?
If I comment it out, the function works fine, but I feel like I should have it in there.
Here's my code:
def main(mytimer: func.TimerRequest) -> None:
    gp_data = get_properties()
    for index, row in gp_data.iterrows():
        cursor.execute("""INSERT INTO dbo.get_properties3 (propertyid, property_name, street_address,
            city, state_code, zip_code, phone, email, manager, currentperiod_start,
            currentperiod_end, as_of_date) values(?,?,?,?,?,?,?,?,?,?,?,?)""",
            row.propertyid, row.property_name, row.street_address, row.city, row.state_code, row.zip_code,
            row.phone, row.email, row.manager,
            row.currentperiod_start, row.currentperiod_end, row.as_of_date)
    cnxn.commit()
    # cursor.close()
Any advice would be greatly appreciated.
Thanks!

In my opinion, the cursor.close() line is unnecessary here: the cursor will be garbage collected like any other Python object once it goes out of scope, and each run of your timer function will work fine even if you never call cursor.close().
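If you do want the explicit cursor.close(), the error suggests that cursor and cnxn are created at module level and shared across invocations, so closing the cursor at the end of one run leaves a closed cursor for the next run to trip over. A minimal sketch of the alternative, assuming pyodbc and a connection opened per invocation; the app-setting name and the abbreviated INSERT are illustrative, and get_properties() is your existing helper:

import os

import azure.functions as func
import pyodbc

def main(mytimer: func.TimerRequest) -> None:
    gp_data = get_properties()  # your existing helper returning a DataFrame

    # Open the connection and cursor inside the function so that closing
    # them at the end cannot affect the next timer invocation.
    cnxn = pyodbc.connect(os.environ["SQL_CONNECTION_STRING"])  # illustrative app setting
    cursor = cnxn.cursor()
    try:
        for index, row in gp_data.iterrows():
            # Columns abbreviated; keep the full column list from your script.
            cursor.execute(
                "INSERT INTO dbo.get_properties3 (propertyid, property_name) VALUES (?, ?)",
                row.propertyid, row.property_name)
        cnxn.commit()
    finally:
        cursor.close()
        cnxn.close()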


How Can I Check For The Existence of a UUID

I am trying to check for the existence of a UUID as a primary key in my Django environment. When it exists, my code works fine, but if it's not present I get a "" is not a valid UUID error.
Here's my code....
uuid_exists = Book.objects.filter(id=self.object.author_pk,is_active="True").first()
I've tried other variations of this with .exists() or .all()...but I keep getting the ['“” is not a valid UUID.'] error.
I did come up with a workaround....
if self.object.author_pk != '':
    author_exists = Book.objects.filter(id=self.object.author_pk, is_active="True").first()
    context['author_exists'] = author_exists
Is this the best way to do this? I was hoping to be able to use a straight filter...without clarifying logic....But I've worked all afternoon and can't seem to come up with anything better. Thanks in advance for any feedback or comments.
I've had the same issue, and this is what I have: wrapping it in a try/except (in my case it's a view, so it's supposed to return a Response object).
try:
    object = Object.objects.get(id=object_id)
except Exception as e:
    return Response(data={...}, status=status.HTTP_40...
It reaches the exception branch (the 4th line) but somehow returns the '~your_id~' is not a valid UUID. text instead of the proper data, which might be enough in some cases.
This seems like an oversight, so it may well get a fix soon. I don't have enough time to investigate deeper, unfortunately.
So the solution I came up with is not ideal either, but hopefully it is a bit cleaner and faster than what you're using right now.
# Generate a list of possible object IDs (make use of filters in order to reduce the DB load)
possible_ids = [str(id) for id in Object.objects.filter(~ filters here ~).values_list('id', flat=True)]

# Return an error if the ID is not valid
if ~your_id~ not in possible_ids:
    return Response(data={"error": "Database destruction sequence initialized!"}, status=status.HTTP_401_UNAUTHORIZED)

# Keep working with the object
object = Object.objects.get(id=object_id)
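For what it's worth, another way to avoid the error without pulling every ID out of the database is to validate the candidate value as a UUID before querying at all. A minimal sketch using only the standard library; the helper name is mine, and Book / author_pk come from the question:

import uuid

def is_valid_uuid(value) -> bool:
    """Return True if value parses as a UUID, False otherwise."""
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False

# Inside the view (Book and self.object.author_pk are from the question):
# author = None
# if is_valid_uuid(self.object.author_pk):
#     author = Book.objects.filter(id=self.object.author_pk, is_active="True").first()
# context['author_exists'] = author is not None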

fastapi leaving Postgres idle connection

I want to decrease the number of idle PostgreSQL connections coming from FastAPI calls. I am not able to figure out what exactly is leaving this many idle connections, as multiple people are using this DB and the APIs associated with it.
Can someone suggest what I might have done wrong to leave this many idle connections, or an efficient way to figure out what is causing them, so that I can fix that part and bring the number down?
Not sure if I have provided sufficient information to explain this, but if anything else is required, I will be more than happy to provide it.
[screenshot: PostgreSQL idle connections]
This is how I am creating the PostgreSQL session object used by FastAPI:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class postgres:
    def __init__(self, config):
        try:
            SQLALCHEMY_DATABASE_URL = "postgresql://" + \
                config['postgresql']['user'] + ":" + config['postgresql']['password'] + "@" + \
                config['postgresql']['host'] + ":5432" + \
                "/" + config['postgresql']['database']
            # print(SQLALCHEMY_DATABASE_URL)
            engine = create_engine(SQLALCHEMY_DATABASE_URL, future=True)
            self.SessionLocal = sessionmaker(
                autocommit=False, autoflush=True, bind=engine)
        except Exception as e:
            raise

    def get_db(self):
        """
        Function to return session variable used by ORM
        :return: SessionLocal
        """
        try:
            db = self.SessionLocal()
            yield db
        finally:
            db.close()
I believe this is caused by get_db(self): it creates (yields) a new connection/session object every time SessionLocal() is called.
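If the goal is to cap how many connections each FastAPI worker keeps open, the usual knobs are the pool arguments on create_engine; these are standard SQLAlchemy QueuePool parameters, and the numbers below are illustrative rather than a recommendation:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Placeholder URL; build it from your config exactly as in the question.
SQLALCHEMY_DATABASE_URL = "postgresql://user:password@host:5432/database"

engine = create_engine(
    SQLALCHEMY_DATABASE_URL,
    future=True,
    pool_size=5,         # baseline connections held open per worker process
    max_overflow=5,      # extra connections allowed under burst load
    pool_recycle=1800,   # recycle connections older than 30 minutes
    pool_pre_ping=True,  # check a connection is alive before handing it out
)
SessionLocal = sessionmaker(autocommit=False, autoflush=True, bind=engine)

Also keep in mind that every worker process builds its own engine and pool, so the idle count you see in PostgreSQL is roughly pool_size times the number of Uvicorn/Gunicorn workers, plus anyone else connected to the same database.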

Incremental data load from Redshift to S3 using Pyspark and Glue Jobs

I have created a pipeline where the data ingestion takes place between Redshift and S3. I was able to do the complete load using the below method:
def readFromRedShift(spark: SparkSession, schema, tablename):
    table = str(schema) + "." + str(tablename)
    (url, Properties, host, port, db) = con.getConnection("REDSHIFT")
    df = spark.read.jdbc(url=url, table=table, properties=Properties)
    return df
Here getConnection is a method in a separate class that handles all the Redshift-related details. Later on, I used this method to create a data frame and wrote the results to S3, which worked like a charm.
Now I want to load the incremental data. Will enabling the Job Bookmarks option in Glue help me, or is there another way to do it? I followed this official documentation, but it was of no help for my problem statement. If I run the job for the first time it will load the complete data; if I rerun it, will it load only the newly arrived records?
You are right, it can be achieved with job bookmarks, but it can be a bit tricky.
Please refer to this doc https://aws.amazon.com/blogs/big-data/load-data-incrementally-and-optimized-parquet-writer-with-aws-glue/
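As a rough sketch of what a bookmark-enabled job looks like (placeholders throughout, and as far as I know bookmarks track Glue DynamicFrame sources that carry a transformation_ctx, not plain spark.read.jdbc reads): the job has to be initialized and committed, the "Job bookmark" option enabled on the job, and each source/sink given a transformation_ctx so Glue can record what it has already processed.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)  # loads any existing bookmark state

# Placeholder catalog database/table pointing at the Redshift source.
source = glueContext.create_dynamic_frame.from_catalog(
    database="my_catalog_db",
    table_name="schema_tablename",
    transformation_ctx="read_redshift",  # bookmarks key off this name
)

glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/incremental/"},
    format="parquet",
    transformation_ctx="write_s3",
)

job.commit()  # persists the bookmark so the next run reads only new data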

Google Cloud Bigquery Library Error

I am receiving this error:
Cannot set destination table in jobs with DDL statements
when I try to resubmit a job from the job._build_resource() function in the google.cloud.bigquery library.
It seems that the destination table is set to something like this after that function call.
'destinationTable': {'projectId': 'xxx', 'datasetId': 'xxx', 'tableId': 'xxx'},
Am I doing something wrong here? Thanks to anyone who can give me any guidance.
EDIT:
The job is initially being triggered by this
query = bq.query(sql_rendered)
We store the job id and use it later to check the status.
We get the job like this
job = bq.get_job(job_id=job_id)
If it meets a condition (in this case, it failed due to rate limiting), we retry the job like this:
di = job._build_resource()
jo = bigquery.Client(project=self.project_client).job_from_resource(di)
jo._begin()
I think that's pretty much all of the code you need, but happy to provide more if needed.
You are seeing this error because you have a DDL statement in your query. What is happening is that the job config is changing some values after the execution of the first query, in particular job_config.destination. To work around this, you could reset job_config.destination to None after each job submission, or use a different job_config for every query.
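A minimal sketch of that second option: re-submit the failed job's SQL with a fresh QueryJobConfig instead of rebuilding the finished job's resource, so the populated destinationTable never comes along for the ride. The project name and sql_rendered are placeholders standing in for the values from the question:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project
sql_rendered = "SELECT 1"                       # placeholder for the rendered SQL

# First submission: give every query its own config so nothing carries over.
query_job = client.query(sql_rendered, job_config=bigquery.QueryJobConfig())

# Later, to retry after a rate-limit failure, fetch the job and re-submit its
# SQL rather than going through _build_resource()/job_from_resource(), which
# drags the populated destinationTable back in.
failed_job = client.get_job(job_id=query_job.job_id)
retry_job = client.query(failed_job.query, job_config=bigquery.QueryJobConfig())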

Azure Stream Analytics: "Stream Analytics job has validation errors: The given key was not present in the dictionary."

I burned a couple of hours on a problem today and thought I would share.
I tried to start up a previously-working Azure Stream Analytics job and was greeted by a quick failure:
Failed to start Streaming Job 'shayward10ProcessLogs'.
I looked at the JSON log and found nothing helpful whatsoever. The only description of the problem was:
Stream Analytics job has validation errors: The given key was not present in the dictionary.
Given the error and some changes to our database, I tried the following to no effect:
Deleting and Recreating all Inputs
Deleting and Recreating all Outputs
Running tests against the data (coming from Event Hub) and the output looked good
My query looked as follows:
SELECT
    dateTimeUtc,
    context.tenantId AS tenantId,
    context.userId AS userId,
    context.deviceId AS deviceId,
    changeType,
    dataType,
    changeStatus,
    failureReason,
    ipAddress,
    UDF.JsonToString(details) AS details
INTO
    [MyOutput]
FROM
    [MyInput]
WHERE
    logType = 'MyLogType';
Nothing made sense so I started deconstructing my query. I took it down to a single field and it succeeded. I went field by field, trying to figure out which field (if any) was the cause.
See my answer below.
The answer was simple (yet frustrating). When I got to the final field, that's where the failure was:
UDF.JsonToString(details) AS details
This was the only field that used a user-defined function. After futzing around, I noticed that the Function Editor showed the title of the function as:
udf.JsonToString
It was a casing issue. I had UDF in UPPERCASE and Azure Stream Analytics expected it in lowercase. I changed my final field to:
udf.JsonToString(details) AS details
It worked.
The strange thing is, it was previously working. Microsoft may have made a change to Azure Stream Analytics to make it case-sensitive in a place where it seemingly wasn't before.
It makes sense, though. JavaScript is case-sensitive. Every JavaScript object is basically a dictionary of members. Consider the error:
Stream Analytics job has validation errors: The given key was not present in the dictionary.
The "udf" object had a dictionary member with my function in it. The UDF object would be undefined. Undefined doesn't have my function as a member.
I hope my 2-hour head-banging session helps someone else.
