So I have the following snippet where I am trying to generate a dynamic SQL INSERT statement. The following is the payload that is passed in:
data = {"id": "123", "name": "dev", "description": "This is the dev Env","created_by":"me","updated_by": "me","table_name": table_name}
I am getting the following error for the above payload:
LINE 1: ...updated_by, table_name) VALUES (123, dev, This is the dev En...
My Implementation:
class DMLRelationalDB:
    def __init__(self):
        pass

    def insert_sql(self, params):
        """
        :param params:
        :return:
        """
        converted_dict = self.__convert_params_to_columns_and_placeholders(params)
        print(converted_dict)
        column_names = ", ".join(converted_dict['columns'])
        placeholders = ", ".join(converted_dict['values'])
        table_name = params["table_name"]
        statement = f"""INSERT INTO {table_name} ({column_names}) VALUES ({placeholders})"""
        print(statement)
        return statement

    def __convert_params_to_columns_and_placeholders(self, items_dict):
        """
        :param items_dict:
        :return:
        """
        columns = []
        values = []
        for key, value in items_dict.items():
            columns.append(key)
            values.append(value)
        return {"columns": columns, "values": values}
The problem is that you are trying to pass string values to your postgres DB without quoting them first. Personally I would quote all the data that will enter the database, just to make sure that it is handled correctly.
What you can do is the following:
placeholders = ", ".join([f"'{val}'" for val in converted_dict['values']])
If you have other data types, like datetimes for example, their string representation will be put inside the f-string, so they would still be handled safely.
If you have strings that themselves contain single quotation marks, you could use "dollar-quoting" to be on the safe side:
placeholders = ", ".join([f'$${val}$$' for val in converted_dict['values']])
If you think there is a possibility that some string of yours contains two dollar signs in a row, then put some tag string between the two dollar signs to make it absolutely safe:
placeholders = ", ".join([f'$str${val}$str$' for val in converted_dict['values']])
The downside to this is that you increase the amount of data that is transferred, and if you have a lot of information, it will decrease performance.
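Putting this together, a minimal sketch of what insert_sql could look like with the dollar-quoted variant (same class and helper method as above, everything else unchanged):

def insert_sql(self, params):
    converted_dict = self.__convert_params_to_columns_and_placeholders(params)
    column_names = ", ".join(converted_dict['columns'])
    # dollar-quote every value so strings that contain quotes survive intact
    placeholders = ", ".join([f'$str${val}$str$' for val in converted_dict['values']])
    table_name = params["table_name"]
    return f"INSERT INTO {table_name} ({column_names}) VALUES ({placeholders})"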
I'm working on an ETL job in AWS Glue. I need to decode text from a table that is base64-encoded - I'm doing that in a Custom Transform in Python 3.
My code is below:
def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    import base64
    newdf = dfc.select(list(dfc.keys())[0]).toDF()
    data = newdf["email"]
    data_to_decrypt = base64.b64decode(data)
I get the following error:
TypeError: argument should be a bytes-like object or ASCII string, not 'Column'
How can I get a plain string from the Column object?
I was wrong, and it turned out to be a completely different thing than I thought.
The Column object from newdf["email"] represents all rows for this single column, so it's not possible to just fetch one value from it.
What I ended up doing is iterating over all the rows and mapping them to new values, like this:
from awsglue.dynamicframe import DynamicFrame

def map_row(row):
    # decrypt the sensitive columns, keep the rest as-is
    id = row.id
    client_key = row.client_key
    email = decrypt_jasypt_string(row.email.strip())   # decrypt_jasypt_string is my own helper
    phone = decrypt_jasypt_string(row.phone.strip())
    created_on = row.created_on
    return (id, email, phone, created_on, client_key)

df = dfc.select(list(dfc.keys())[0]).toDF()
rdd2 = df.rdd.map(lambda row: map_row(row))
df2 = rdd2.toDF(["id", "email", "phone", "created_on", "client_key"])
dyf_filtered = DynamicFrame.fromDF(df2, glueContext, "does it matter?")
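For the original base64 case the same row-by-row idea applies. A minimal sketch using a Spark UDF instead of rdd.map (the email column name is taken from the question; adjust as needed):

import base64
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# decode one base64 string per row; None / empty values are passed through
decode_b64 = udf(lambda s: base64.b64decode(s).decode("utf-8") if s else None, StringType())

df2 = df.withColumn("email", decode_b64(df["email"]))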
I have a Python class:
import sqlite3

class Database:
    conn = sqlite3.connect('database.db')
    c = conn.cursor()

    def __init__(self):
        pass
Inside this class I have multiple methods that I will use with my Database class, such as:
def create_table(self, table_name, *args):
    pass

def add_user(self):
    pass

def remove_user(self):
    pass
And so on.
My question is: how do I use *args with my create_table method if I am not sure how many columns I will have? For example, if I know I will have first, last and pay columns, then my function will look like this:
def create_table(self, table_name, first, last, pay):
    self.c.execute("""CREATE TABLE {} ({} text, {} text, {} integer)""".format(
        table_name, first, last, pay))
    self.conn.commit()
So if I want to create a table I can do this:
Item = Database()
Item.create_table('employees', 'First_name', 'Last_name', 100000)
But what if I don’t know how many columns I will have?
Thanks
def create_table(self, tableName, *args):
    # join every piece passed in, e.g. 'first', 'text,', 'last', 'text,' ...
    columns = ''
    for i in args:
        columns += i
        columns += ' '
    message = "CREATE TABLE {} ({})".format(tableName, columns[:-1])
    return message

db = Database()
print(db.create_table('employees', 'first', 'text,', 'last', 'text,', 'pay', 'integer'))
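Note that this only builds and returns the SQL string; to actually create the table you still have to execute it. A minimal sketch, reusing the connection and cursor from the Database class in the question:

sql = db.create_table('employees', 'first', 'text,', 'last', 'text,', 'pay', 'integer')
db.c.execute(sql)      # runs: CREATE TABLE employees (first text, last text, pay integer)
db.conn.commit()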
Not sure how variable your columns are, or over what time frame. It seems you have a base set of definitions and then, later, a new column pops up. So assume your table starts with the 4 you mentioned above.
We run the CREATE TABLE first, then loop over the files, updating as we go; if we find a new column, we run an ALTER TABLE tableName ADD column_name datatype (a sketch is shown below) and then, obviously, update based on the key.
Or you can run over the table start to finish and create it all at once, as qafrombayarea suggests. Our JSON files are just not that disciplined.
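A minimal sketch of that add-a-column-when-you-meet-one idea (the table and column names here are made up for illustration, and the connection is the same sqlite3 setup as above):

import sqlite3

conn = sqlite3.connect('database.db')
c = conn.cursor()

def ensure_column(table_name, column_name, datatype):
    # columns the table already has, via PRAGMA table_info (the name is field 1)
    existing = [row[1] for row in c.execute("PRAGMA table_info({})".format(table_name))]
    if column_name not in existing:
        c.execute("ALTER TABLE {} ADD COLUMN {} {}".format(table_name, column_name, datatype))

ensure_column('employees', 'department', 'text')
conn.commit()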
I'm executing a SELECT query against a PostgreSQL database; after fetching the results I append them to a list, and then I give that list as the input to another SELECT query.
But when those values are converted to a list, values containing an apostrophe (a special character), such as cat's, end up represented with double quotes as "cat's". When the second SELECT query executes, the double-quoted value is not fetched, because the value in the database has no double quotes - it is just cat's.
So it gives me an error that the value is not present.
I have tried json.dumps, but it isn't working because I cannot convert the JSON list to a tuple and give it as the input to the PostgreSQL SELECT query.
select_query = """select "Unique_Shelf_Names" from "unique_shelf" where category = 'Accessory'"""
cur.execute(select_query)
count = cur.fetchall()
query_list = []
for co in count:
    for c in co:
        query_list.append(c)
output of query_list:
query_list = ['parrot', 'dog', "leopard's", 'cat', "zebra's"]
Now this query_list is converted to a tuple and given as the input to another SELECT query.
list2 = tuple(query_list)
query = """select category from "unique_shelf" where "Unique_Shelf_Names" in {} """.format(list2)
cur.execute(query)
This is where it gives me an error that "leopard's" doesn't exist, even though leopard's does exist in the database.
I want all the values in query_list to be quoted properly so this error doesn't arise.
Do not use format to construct the query. Simply use %s and pass the tuple into execute:
query = """select category from "unique_shelf" where "Unique_Shelf_Names" in %s """
cur.execute(query,(list2,))
Tuples adaptation
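For completeness, a minimal sketch of the whole corrected flow (reusing the cur, table and column names from the question):

select_query = """select "Unique_Shelf_Names" from "unique_shelf" where category = 'Accessory'"""
cur.execute(select_query)
query_list = [row[0] for row in cur.fetchall()]

query = """select category from "unique_shelf" where "Unique_Shelf_Names" in %s"""
# psycopg2 adapts the Python tuple to a SQL list and quotes leopard's correctly
cur.execute(query, (tuple(query_list),))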
I am trying to automatically create rather large SQLite database tables which all have at least 50 columns. The column names are already available in different lists.
Using .format I almost managed this. The only open issue is determining the number of "{}" placeholders from the length of the column name list. Please see the code example below.
import sqlite3
db_path = "//Some path/"
sqlite_file = 'index_db.sqlite'
conn = sqlite3.connect(db_path + sqlite_file)
c = conn.cursor()
db_columns=['c1','c2','c3','c4','c5']
#This code is working
c.execute("create table my_table1 ({}, {}, {}, {}, {})" .format(*db_columns))
#This code doesn't work
c.execute("create table my_table2 (" + ("{}, " * 5)[:-2] + ")" .format(*db_columns))
#Following error appears
OperationalError: unrecognized token: "{"
#--> This even though the curly brackets are the same as in my_table1
print("create table my_table2 (" + ("{}, " * 5)[:-2] + ")")
#Output: create table my_table2 ({}, {}, {}, {}, {})
c.execute("INSERT INTO my_table1 VALUES (?,?,?,?,?)", (11, 111, 111, 1111, 11111))
conn.commit()
c.close()
conn.close()
Is there a way to resolve that issue for my_table2?
Or is there a better way to create the column names dynamically from a list?
P.S. This is an internal database, so I don't have any concerns regarding security issues from using variables as names dynamically.
Thanks in advance!
Timur
Disclaimer:
do not use string concatenation to build SQL strings - see e.g. http://bobby-tables.com/python for how to avoid injection by using parametrized queries.
According to this old post: Variable table name in sqlite, you cannot use "normal" parametrized queries for table or column names.
You can pre-format your create statement though:
def scrub(table_name):
    # attribution: https://stackoverflow.com/a/3247553/7505395
    return ''.join(ch for ch in table_name if ch.isalnum())

def createCreateStatement(tableName, columns):
    return f"create table {scrub(tableName)} ({scrub(columns[0])}" + (
        ",{} " * (len(columns) - 1)).format(*map(scrub, columns[1:])) + ")"
tabName = "demo"
colNames = ["one", "two", "three", "dont do this"]
print(createCreateStatement(tabName, colNames))
Output:
create table demo (one,two ,three ,dontdothis )
The scrub method is taken from Donald Miner's answer - upvote him :) if you like
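As for why my_table2 fails in the first place: in "create table my_table2 (" + ("{}, " * 5)[:-2] + ")" .format(*db_columns) the .format() call binds only to the final ")" literal, so the placeholders are never filled in and SQLite receives literal { tokens - hence the OperationalError. Parenthesizing the concatenation before calling .format() (or joining the column list directly) fixes it; a small sketch using the c and db_columns from the question:

# parenthesize first so .format() applies to the whole assembled string
c.execute(("create table my_table2 (" + ("{}, " * 5)[:-2] + ")").format(*db_columns))

# or derive the placeholders from the list length directly
c.execute("create table my_table3 ({})".format(", ".join(db_columns)))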
I am working on a Python script to replicate some PostgreSQL tables from one environment to another (it does a little more than pg_dump). It works, except when I am copying a table that has a bytea data type.
I read the source table data into memory, then I dump it into the target database with concatenated INSERT statements.
Here is my method that produces an insert statement:
def generateInsert(self, argCachedRow):
    colOrd = 0
    valClauseList = []
    hasBinary = False
    for colData in argCachedRow:
        colOrd += 1
        colName = self.colOrdLookup.get(colOrd)
        col = self.colLookup.get(colName)
        dataType = col.dataType
        insVal = None
        if colData is not None:
            strVal = str(colData)
            if dataType.useQuote:
                if "'" in strVal:
                    strVal = strVal.replace("'", "''")
                insVal = "'%s'" % strVal
            else:
                if dataType.binary:
                    hasBinary = True
                    #insVal = psycopg2.Binary(colData)
                #else:
                insVal = strVal
        else:
            insVal = "NULL"
        valClauseList.append(insVal)
    valClause = ", ".join(valClauseList)
    if hasBinary:
        valClause = psycopg2.Binary(valClause)
    result = "INSERT INTO %s VALUES (%s)" % (self.name, valClause)
    return result
which works with every table that doesn't have binary data.
I also tried (intuitively) to wrap just the binary column data in psycopg2.Binary - that is the commented-out line - and then not apply it to the whole row value list, but that didn't work either.
Here is my simple DataType wrapper, which is loaded by reading Postgres' information_schema tables:
class DataType(object):
    def __init__(self, argDispName, argSqlName, argUseQuote, argBin):
        self.dispName = argDispName
        self.sqlName = argSqlName
        self.useQuote = argUseQuote
        self.binary = argBin
How do I read and insert bytea columns using psycopg2?
If you have this database structure:
CREATE TABLE test (a bytea,
                   b int,
                   c text)
then inserting binary data into the table can be done like so, without any wrappers:
bin_data = b'bytes object'
db = psycopg2.connect(*args) # DB-API 2.0
c = db.cursor()
c.execute('''INSERT INTO test VALUES (%s, %s, %s)''', (bin_data, 1337, 'foo'))
c.execute('''UPDATE test SET a = %s''', (bin_data + b'1',))
Then, when you query it:
c.execute('''SELECT a FROM test''')
You'll receive a memoryview, which is easily converted back to bytes:
mview = c.fetchone()
new_bin_data = bytes(mview)
print(new_bin_data)
Output: b'bytes object1'
Also, I'd suggest that you not assemble queries by string formatting. psycopg2's built-in parameter substitution is much more convenient, and you don't have to worry about validating data to protect against SQL injection.
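Applied to the replication script from the question, one possible shape (just a sketch; it assumes self.name and the cached row are exactly as shown above, and that table names are trusted) is to return a placeholder query plus a parameter tuple instead of a finished SQL string:

def generateInsert(self, argCachedRow):
    # one %s placeholder per column; psycopg2 adapts bytes / Binary values to bytea
    placeholders = ", ".join(["%s"] * len(argCachedRow))
    query = "INSERT INTO %s VALUES (%s)" % (self.name, placeholders)
    params = tuple(bytes(v) if isinstance(v, memoryview) else v for v in argCachedRow)
    return query, params

# usage:
# query, params = self.generateInsert(row)
# cursor.execute(query, params)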