cx_Oracle and SDO_GEOMETRY

I have a cx_Oracle connection that is happily returning data. However, I am having trouble with the geometry column. It is being returned as a cx_Oracle object, e.g. cx_Oracle.OBJECT at 0x3afc320, but I cannot access its attributes as shown below:
import cx_Oracle

query = '''
select geometry from table
'''

cx_Oracle.makedsn(...)
db_conn = cx_Oracle.connect(...)
cursor = db_conn.cursor()
cursor.execute(query)
columns = [i[0].lower() for i in cursor.description]
results = []
for row in cursor:
    results.append(dict(zip(columns, row)))
db_conn.close()

print(results[0]['geometry'])
print(results[0]['geometry'].SDO_ORDINATES)
print(results[0]['geometry'].SDO_GTYPE)
print(results[0]['geometry'].SDO_ELEM_INFO)
Is this because these attributes are not available?
import inspect
inspect.getmembers(results[0]['geometry'])
[('__class__', cx_Oracle.OBJECT),
('__delattr__',
<method-wrapper '__delattr__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__doc__', None),
('__format__', <function __format__>),
('__getattribute__',
<method-wrapper '__getattribute__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__hash__',
<method-wrapper '__hash__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__init__',
<method-wrapper '__init__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__new__', <function __new__>),
('__reduce__', <function __reduce__>),
('__reduce_ex__', <function __reduce_ex__>),
('__repr__',
<method-wrapper '__repr__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__setattr__',
<method-wrapper '__setattr__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__sizeof__', <function __sizeof__>),
('__str__',
<method-wrapper '__str__' of cx_Oracle.OBJECT object at 0x03AFC320>),
('__subclasshook__', <function __subclasshook__>),
('type', <cx_Oracle.ObjectType MDSYS.SDO_GEOMETRY>)]
When I use SQL Developer and look at the table in question, the field 'geometry' is of type SDO_GEOMETRY. Any help appreciated.

If you have a table that looks like the following:
create table TestGeometry (
    IntCol number(9) not null,
    Geometry sdo_geometry not null
);
and you populate it with data like the following:
insert into TestGeometry
values (1, sdo_geometry(2003, null, null, sdo_elem_info_array(1, 1003, 3),
        sdo_ordinate_array(1, 1, 5, 7)));
then the following script will access the data you are looking for:
connection = cx_Oracle.Connection("user/pw@tns")
cursor = connection.cursor()
cursor.execute("""
        select Geometry
        from TestGeometry
        where IntCol = 1""")
obj, = cursor.fetchone()
print(obj.SDO_ORDINATES)
print(obj.SDO_GTYPE)
print(obj.SDO_ELEM_INFO)
You can always find out which attributes are available using this code:
for attr in obj.type.attributes:
    print(attr.name)
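To also see each attribute's value rather than just its name, a small variation on the same idea (obj is the fetched SDO_GEOMETRY object from the snippet above):
# Print every attribute of the fetched object, using the names reported by
# its Oracle type metadata instead of hard-coding them.
for attr in obj.type.attributes:
    print(attr.name, getattr(obj, attr.name))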

I was encountering similar issues. The sdo_geometry object does not get transferred to the requesting client. I believe the reason is that the internal geometry is stored as a LOB.
In my case I simply needed to access the geometry, so I converted the sdo_geometry to character data with SDO_UTIL.TO_WKTGEOMETRY, which then came down fine.
There may be other methods for converting the object to sub-components.
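For example, a minimal sketch of that approach, assuming the same table and column names as in the question; note that SDO_UTIL.TO_WKTGEOMETRY returns a CLOB, so the value may come back as a LOB that still needs to be read:
# Ask Oracle to serialise the geometry to WKT on the server side, so only
# character data is sent to the client.
cursor.execute("select sdo_util.to_wktgeometry(geometry) from table")
wkt, = cursor.fetchone()
print(wkt.read() if hasattr(wkt, "read") else wkt)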
Hope it helps,
J

Related

SQLAlchemy - insert from a result object?

The query below puts a result set into the variable 'result'.
I need to insert that into iconndest (the new MySQL server), but I have no idea how to insert the query result into the new table. I just want to do something like INSERT INTO DB.TBL SELECT * FROM RESULT, but I am not sure how.
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine
import multiprocessing as mp
from multiprocessing import cpu_count

try:
    engine_source = create_engine("CONN STRING")
    iconn = engine_source.connect()
    result = iconn.execute('SELECT QUERY')
    print('EXTRACT COMPLETE')
    engine_dest = create_engine("CONN STRING")
    iconndest = engine_dest.connect()
    iconndest.execute('SELECT * from ')
    engine_source.dispose()
    engine_dest.dispose()
except Exception as e:
    print('extract: ' + str(e))
What you describe is very simple if we use .mappings() to convert the list of Row objects to a list of RowMapping objects when we retrieve the results. RowMapping objects behave like dict objects when passed as parameter values:
import sqlalchemy as sa

source_engine = sa.create_engine("mssql+pyodbc://scott:tiger^5HHH@mssql_199")
destination_engine = sa.create_engine("sqlite://")

with source_engine.begin() as conn:
    results = (
        conn.exec_driver_sql(
            """\
            SELECT 1 AS id, N'foo' AS txt
            UNION ALL
            SELECT 2 AS id, N'bar' AS txt
            """
        )
        .mappings()
        .all()
    )
print(results)
# [{'id': 1, 'txt': 'foo'}, {'id': 2, 'txt': 'bar'}]

destination_engine.echo = True
with destination_engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t (id int, txt varchar(10))")
    conn.execute(
        sa.text("INSERT INTO t (id, txt) VALUES (:id, :txt)"), results
    )
"""SQL emitted:
INSERT INTO t (id, txt) VALUES (?, ?)
[generated in 0.00038s] ((1, 'foo'), (2, 'bar'))
"""

How to access public attributes in RPyC on nested data?

I am trying to access public attributes in an RPyC call by following this document, but it does not seem to work as described.
The documentation says that if you don't specify protocol_config={'allow_public_attrs': True}, public attributes, even of builtin data types, won't be accessible. However, even when this is specified, public attributes of a nested data structure are not accessible.
RPyC Server code.
import json
import pickle
import rpyc

class MyService(rpyc.Service):
    def on_connect(self, conn):
        # code that runs when a connection is created
        # (to init the service, if needed)
        pass

    def on_disconnect(self, conn):
        # code that runs after the connection has already closed
        # (to finalize the service, if needed)
        pass

    def exposed_get_answer(self):  # this is an exposed method
        return 42

    exposed_the_real_answer_though = 43  # an exposed attribute

    def get_question(self):  # while this method is not exposed
        return "what is the airspeed velocity of an unladen swallow?"

    def exposed_hello(self, collection):
        print("Collection is ", collection)
        print("Collection type is ", type(collection).__name__)
        for item in collection:
            print("Item type is ", type(item).__name__)
            print(item)

    def exposed_hello2(self, collection):
        for item in collection:
            for key, val in item.items():
                print(key, val)

    def exposed_hello_json(self, collection):
        for item in collection:
            item = json.loads(item)
            for key, val in item.items():
                print(key, val)

if __name__ == "__main__":
    from rpyc.utils.server import ThreadedServer
    t = ThreadedServer(
        MyService(),
        port=3655,
        protocol_config={'allow_public_attrs': True}
    )
    t.start()
Client Side Calls
>>> import rpyc
>>> rpyc.__version__
(4, 0, 2)
>>> c = rpyc.connect('a.b.c.d', 3655) ; client=c.root
Case 1: if the data is in a nested structure (using builtin data types), it doesn't work.
>>> data
[{'a': [1, 2], 'b': 'asa'}]
>>> client.hello2(data)
...
AttributeError: cannot access 'items'
========= Remote Traceback (2) =========
Traceback (most recent call last):
File "/root/lydian.egg/rpyc/core/protocol.py", line 329, in _dispatch_request
res = self._HANDLERS[handler](self, *args)
File "/root/lydian.egg/rpyc/core/protocol.py", line 590, in _handle_call
return obj(*args, **dict(kwargs))
File "sample.py", line 33, in exposed_hello2
for key, val in item.items():
File "/root/lydian.egg/rpyc/core/netref.py", line 159, in __getattr__
return syncreq(self, consts.HANDLE_GETATTR, name)
File "/root/lydian.egg/rpyc/core/netref.py", line 75, in syncreq
return conn.sync_request(handler, proxy, *args)
File "/root/lydian.egg/rpyc/core/protocol.py", line 471, in sync_request
return self.async_request(handler, *args, timeout=timeout).value
File "/root/lydian.egg/rpyc/core/async_.py", line 97, in value
raise self._obj
_get_exception_class.<locals>.Derived: cannot access 'items'
Case 2: workaround. Pass the nested data as a string using json (a poor man's pickle) and decode it at the server end.
>>> jdata = [json.dumps({'a': [1,2], 'b': "asa"})]
>>> client.hello_json(jdata) # Prints following at remote endpoint.
a [1, 2]
b asa
Case 3: interestingly, at the first level builtin items are accessible, as in the hello method, but calling it on nested data gives an error.
>>> client.hello([1,2,3,4]) # Prints following at remote endpoint.
Collection is [1, 2, 3, 4]
Collection type is list
Item type is int
1
Item type is int
2
Item type is int
3
Item type is int
4
I have a workaround (case 2 above), but I am looking for an explanation of why this is not allowed, or whether it is a bug. Thanks for any input.
The issue is not related to nested data.
Your problem is that you are not allowing public attributes in the client side.
The solution is simple:
c = rpyc.connect('a.b.c.d', 3655, config={'allow_public_attrs': True})
Keep in mind that rpyc is a symmetric protocol (see https://rpyc.readthedocs.io/en/latest/docs/services.html#decoupled-services).
In your case, the server tries to access the client's object, so allow_public_attrs must be set on the client side.
Actually, for your specific example, there is no need to set allow_public_attrs on the server side at all.
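For example, a sketch of the corrected client-side call, assuming the server from the question is listening on a.b.c.d:3655; with allow_public_attrs set on the client connection, the nested call goes through:
import rpyc

# Let the server access public attributes (such as dict.items) on objects
# that live on this client.
c = rpyc.connect('a.b.c.d', 3655, config={'allow_public_attrs': True})
client = c.root

data = [{'a': [1, 2], 'b': 'asa'}]
client.hello2(data)  # the server now prints the keys and values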
Regarding case 3:
In the line for item in collection:, the server tries to access two fields: collection.__iter__ and collection.__next__.
Both of these fields are considered by default as "safe attributes", and this is why you didn't get an error there.
To inspect the default configuration dictionary in rpyc:
>>> import rpyc
>>> rpyc.core.protocol.DEFAULT_CONFIG
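In particular, assuming current rpyc releases keep the same key name, the set of attributes that are always allowed lives under 'safe_attrs'; __iter__ and __next__ are in it, which is why iterating over the client's list worked without any extra configuration:
>>> '__iter__' in rpyc.core.protocol.DEFAULT_CONFIG['safe_attrs']
True
>>> '__next__' in rpyc.core.protocol.DEFAULT_CONFIG['safe_attrs']
True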

Psycopg2 can't write numpy nans to postgresql table: invalid input syntax for type double precision: ""

I have a small Python script that builds a dataframe with one (or more) NaNs and then writes it to a Postgres database with the psycopg2 module, using the copy_from function. Here it is:
import logging
import traceback
from io import StringIO

import numpy as np
import pandas as pd
import psycopg2
from psycopg2 import sql

table_name = "test"
df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"], index=pd.date_range("2000-01-01", "2000-01-02"))

database = "xxxx"
user = "xxxxxxx"
password = "xxxxxx"
host = "127.0.0.1"
port = "xxxxx"

def nan_to_null(f,
                _NULL=psycopg2.extensions.AsIs('NULL'),
                _NaN=np.NaN,
                _Float=psycopg2.extensions.Float):
    if f != f:
        return _NULL
    else:
        return _Float(f)

psycopg2.extensions.register_adapter(float, nan_to_null)
psycopg2.extensions.register_adapter(np.float, nan_to_null)
psycopg2.extensions.register_adapter(np.float64, nan_to_null)

with psycopg2.connect(database=database,
                      user=user,
                      password=password,
                      host=host,
                      port=port) as conn:
    try:
        with conn.cursor() as cur:
            cmd = "CREATE TABLE {} (TIMESTAMP timestamp PRIMARY KEY NOT NULL, VALUE0 FLOAT, VALUE1 FLOAT)"
            cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
            buffer = StringIO()
            df.to_csv(buffer, index_label='TIMESTAMP', header=False)
            buffer.seek(0)
            cur.copy_from(buffer, table_name, sep=",")
            conn.commit()
    except Exception as e:
        conn.rollback()
        logging.error(traceback.format_exc())
        raise e
The problem is that psycopg2 fails to transform NaN into a Postgres NULL, although I have used this trick:
How do I convert numpy NaN objects to SQL nulls?
(the nan_to_null function).
I cannot make it work; it throws the following exception:
psycopg2.errors.InvalidTextRepresentation: invalid input syntax for type double precision: ""
CONTEXT: COPY test, line 2, column value1: ""
I am using Python 3.8 on Windows 10 with Anaconda 3, psycopg2 v2.8.5 and Postgres v12.3.
Thanks!
Here is the same code, updated with Adrian Klaver's solution.
The line that changed is:
df.to_csv(buffer, index_label='TIMESTAMP', header=False, na_rep='NaN')
We've added na_rep='NaN' to the to_csv call. There is no need to replace NaNs with another line of code; note that replacing them with 'NULL' does not work.
import psycopg2, logging, numpy as np, pandas as pd
from psycopg2 import sql
import traceback
from io import StringIO

if __name__ == '__main__':
    table_name = "test"
    df = pd.DataFrame([[1.0, 2.0], [3.0, np.nan]], columns=["VALUE0", "VALUE1"], index=pd.date_range("2000-01-01", "2000-01-02"))

    database = "xxxxxx"
    user = "xxxxx"
    password = "xxxxxx"
    host = "127.0.0.1"
    port = "xxxxxx"

    with psycopg2.connect(database=database,
                          user=user,
                          password=password,
                          host=host,
                          port=port) as conn:
        try:
            with conn.cursor() as cur:
                # Creating a new table test
                cmd = "CREATE TABLE {} (TIMESTAMP timestamp PRIMARY KEY NOT NULL, VALUE0 FLOAT, VALUE1 FLOAT);"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
                # Writing the content
                buffer = StringIO()
                df.to_csv(buffer, index_label='TIMESTAMP', header=False, na_rep='NaN')
                buffer.seek(0)
                cur.copy_from(buffer, table_name, sep=",")
                # Reading the table content
                cmd = "SELECT * FROM {};"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
                test_data = pd.DataFrame(cur.fetchall())
                print(test_data)
                print(type(test_data.loc[1, 2]))
                # Deleting the test table
                cmd = "DROP TABLE {};"
                cur.execute(sql.SQL(cmd).format(sql.Identifier(table_name)))
                conn.commit()
        except Exception as e:
            conn.rollback()
            logging.error(traceback.format_exc())
            raise e
The prints show that the NaN is correctly interpreted and stored in the DB.
The issue is the use of copy_from. From the docs:
Currently no adaptation is provided between Python and PostgreSQL types on COPY: ...
So your adapter does not come into play.
UPDATE: A possible solution:
Pandas Changing the format of NaN values when saving to CSV
See @cs95's answer.
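One untested sketch along those lines, which keeps the missing values as real NULLs (what the original nan_to_null adapter was aiming for), reusing df, buffer, table_name and cur from the question: pick a CSV representation for NaN and tell copy_from to treat that representation as NULL.
# Write NaN as an empty field in the CSV ...
df.to_csv(buffer, index_label='TIMESTAMP', header=False, na_rep='')
buffer.seek(0)
# ... and tell COPY that an empty field means NULL.
cur.copy_from(buffer, table_name, sep=",", null="")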
It seems you are inserting an empty string instead of a NULL value; you can easily reproduce your error with the following SQL code:
CREATE TABLE test(
x FLOAT
);
INSERT INTO test(x) VALUES ('');
-- ERROR: invalid input syntax for type double precision: "" Position: 29
On the other hand, NaN can be safely inserted into PostgreSQL:
INSERT INTO test(x) VALUES ('NaN');
Notice that PostgreSQL float support differs slightly from the IEEE 754 standard, because PostgreSQL needs all values to be orderable in order to consistently build indexes. Therefore, in PostgreSQL, NaN is greater than or equal to any other number, including itself.
Thanks to Adrian Klaver's and jlandercy's answers, the solution is simple: replace np.nan by 'NaN' manually with the following line, which replaces the nan_to_null function:
df.replace(np.nan, "NaN", inplace=True)
And it works fine. Thank you guys!
Add na_rep='NaN' when you write your csv file.
If you are using this in conjunction with psycopg2's copy_expert method, you may need to also add the null = "NaN" param to your postgres syntax so that the null representations match up.
Here's an example:
df.to_csv(csv_filename, index=False, na_rep='NaN')
string = sql.SQL("""
copy {}
from stdin (
format csv,
null "NaN",
delimiter ',',
header
)
""").format(sql.Identifier(table_name))

AttributeError: 'tuple' object has no attribute 'translate'

AttributeError: 'tuple' object has no attribute 'translate'
mycursor = mydb.cursor()
mycursor.execute("SELECT content FROM news_tb")
myresult = mycursor.fetchall()

for row in myresult:
    row = row.translate(str.maketrans('', '', string.punctuation)).lower()
    tokens = word_tokenize(row)
    listStopword = set(stopwords.words('indonesian'))
    wordsFiltered = []
    for t in tokens:
        if t not in listStopword:
            wordsFiltered.append(t)
    print(wordsFiltered)
Traceback (most recent call last):
  File "C:/Users/Rahmadyan/PycharmProjects/Skripsi/nltk_download.py", line 17, in <module>
    row = row.translate(str.maketrans('', '', string.punctuation)).lower()
AttributeError: 'tuple' object has no attribute 'translate'
Even though the query only returns a single column, each row is still a tuple, just like it would be if multiple columns were returned.
Each row is going to be something like ("hello",).
To get the string you'll need to access it like this: row[0]
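For example, a minimal sketch of the corrected loop from the question; only the first line of the loop body changes:
for row in myresult:
    # row is a 1-tuple such as ("hello",); take its first (and only) element.
    text = row[0].translate(str.maketrans('', '', string.punctuation)).lower()
    tokens = word_tokenize(text)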

SQLAlchemy: column_property jsonb operation

My Test table has a JSONB column data:
class Test(Base):
    __tablename__ = 'test'

    data = Column(JSONB)
A typical document has two lists:
{'percentage': [10, 20, 50, 80, 90],
'age': [1.21, 2.65, 5.23, 8.65, 11.78]
}
With a column_property I would like to combine these two lists so they are available as a dictionary. In plain Python this is straightforward:
dict(zip(Test.data['percentage'], Test.data['age']))
But with a column_property:
Test.data_dict = column_property(
    dict(zip(Test.data['percentage'], Test.data['age']))
)
this gives:
AttributeError: 'dict' object has no attribute 'label'
Is this actually possible, and how should it be done?
Does this solve your problem?
@property
def data_dict(self):
    return dict(zip(self.data['percentage'], self.data['age']))
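Hypothetical usage, assuming a SQLAlchemy session is available; the dictionary is built in Python from the instance's own JSONB value rather than in SQL, so it cannot be used for filtering in queries:
t = session.query(Test).first()
print(t.data_dict)
# {10: 1.21, 20: 2.65, 50: 5.23, 80: 8.65, 90: 11.78}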
In PostgreSQL it would be something like this (for PostgreSQL >= 9.4):
SELECT json_object(array_agg(ARRAY[p, a]))
FROM (
    SELECT unnest(ARRAY(select jsonb_array_elements_text(data->'percentage'))) p,
           unnest(ARRAY(select jsonb_array_elements_text(data->'age'))) a
    FROM test
) x;
In SQLAlchemy
from sqlalchemy.orm import column_property
from sqlalchemy import select, alias, text, Column

class Test(Base):
    __tablename__ = 'test'

    data = Column(JSONB)
    data_dict = column_property(
        select([text('json_object(array_agg(ARRAY[p,a]))')]).select_from(
            alias(select([
                text("unnest(ARRAY(select jsonb_array_elements_text(data->'percentage'))) p, \
                     unnest(ARRAY(select jsonb_array_elements_text(data->'age'))) a")
            ]).select_from(text('test')))
        )
    )
