Issues creating a virtual HANA table - python-3.x

I am trying to create a virtual table in HANA based on a remote system table view.
If I run it at the command line using hdbsql
hdbsql H00=> create virtual table HanaIndexTable at "SYSRDL#CG_SOURCE"."<NULL>"."dbo"."sysiqvindex"
0 rows affected (overall time 305.661 msec; server time 215.870 msec)
I am able to select from HanaIndexTable and get results and see my index.
When I code it in python, I use the following command:
cursor.execute("""create virtual table HanaIndexTable1 at SYSRDL#CG_source.\<NULL\>.dbo.sysiqvindex""")
I think the problem is with the <NULL> part, but I also see in the output below that the escape character is doubled.
self = <hdbcli.dbapi.Cursor object at 0x7f02d61f43d0>
operation = 'create virtual table HanaIndexTable1 at SYSRDL#CG_source.\\<NULL\\>.dbo.sysiqvindex'
parameters = None
def __execute(self, operation, parameters = None):
# parameters is already checked as None or Tuple type.
> ret = self.__cursor.execute(operation, parameters=parameters, scrollable=self._scrollable)
E hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "\\": line 1 col 58 (at pos 58)')
/usr/local/lib/python3.7/site-packages/hdbcli/dbapi.py:69: ProgrammingError
I have tried to run the command without the <> but get the following error.
hdbcli.dbapi.ProgrammingError: (257, 'sql syntax error: incorrect syntax near "NULL": line 1 col 58 (at pos 58)')
I have tried upper case, lower case and escaping. Is what I am trying to do impossible?

There was an issue with capitalization between HANA and my remote source. I also needed more escaping rather than less.
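The answer does not show the final statement, so the following is only a sketch of what "more escaping" might look like. It assumes the form that worked in hdbsql: every part of the remote path wrapped in double quotes, with the remote source name in the same upper case. The helper names quote_identifier and build_create_virtual_table are made up for illustration.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def build_create_virtual_table(virtual_name: str, remote_parts: list) -> str:
    """Build the CREATE VIRTUAL TABLE statement with a fully quoted remote path."""
    remote = ".".join(quote_identifier(part) for part in remote_parts)
    return f"create virtual table {virtual_name} at {remote}"


# Same four-part path as the hdbsql command in the question (note the upper-case source name).
statement = build_create_virtual_table(
    "HanaIndexTable1",
    ["SYSRDL#CG_SOURCE", "<NULL>", "dbo", "sysiqvindex"],
)
print(statement)
# With an open hdbcli cursor this would then be: cursor.execute(statement)
```

Building the string this way avoids backslashes entirely; the double quotes are ordinary characters inside a Python string, so nothing needs escaping on the Python side.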

Related

inserting rows sql syntax error with Python 3.9

I am trying to insert rows into my database. Connecting to the database succeeds, but when I try to insert the rows I get an SQL error that appears to come from my variable "network_number". I am running nested for loops to iterate through the network numbers from 1.1.1 to 254.254.254 and add each unique IP to the database. The network number is built as a string, so should its column be VARCHAR or TEXT in order to hold the full stops/periods? The desired result is a table populated with every network number. The SQL query is assigned to the variable sql_query.
def populate_ip_table(ip_ranges):
    network_numbers = ["", "", ""]
    information = "Populating the IP table..."
    total_ips = (len(ip_ranges) * 254**2)
    complete = 0
    for octet_one in ip_ranges:
        network_numbers[0] = str(octet_one)
        percentage_complete = round(100 / total_ips * complete, 2)
        information = f"{percentage_complete}% complete"
        output_information(information)
        for octet_two in range(1, 254 + 1):
            network_numbers[1] = str(octet_two)
            for octet_three in range(1, 254 + 1):
                network_numbers[2] = str(octet_three)
                network_number = ".".join(network_numbers)
                complete += 1
                sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
                execute_sql_statement(sql_query)
    information = "100% complete"
    output_information(information)
Output
[ * ] Connecting to the PostgreSQL database...
[ * ] Connection successful
[ * ] Executing SQL statement
[ ! ] syntax error at or near ".50"
LINE 1: ...rd (ip, scanned_status, times_scanned) VALUES (1.1.50, false...
^
As stated by the Docs:
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used
Postgresql Docs
I think you should use VARCHAR, given the short, varying length of your IP string. TEXT is effectively a VARCHAR with no length limit, but it can cause indexing problems if you try to insert a record whose compressed size is greater than 2712 bytes.
Your actual problem, though, is that network_number needs to be wrapped in single quotes so that PostgreSQL receives it as a string literal.
To confirm this, try building the value like this:
network_number = "'" + ".".join(network_numbers) + "'"
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ({network_number}, false, 0)"
OR:
sql_query = f"INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) VALUES ('{network_number}', false, 0)"
You could also use the inet data type, which will save you this hassle.
As stated by Docs:
PostgreSQL offers data types to store IPv4, IPv6, and MAC addresses. It is better to use these types instead of plain text types to store network addresses, because these types offer input error checking and specialized operators and functions.
PostgreSQL: Network Address Types
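As an alternative to adding the quotes by hand, the value can be passed as a query parameter so that the driver does the quoting. This is a different technique from the one shown above, and it is only a sketch: the question does not name the driver, so it assumes a DB-API cursor that uses %s placeholders (psycopg2 does), and insert_ip is a made-up helper name.

```python
def insert_ip(cursor, network_number: str) -> None:
    """Insert one network number, letting the driver quote the value."""
    cursor.execute(
        "INSERT INTO ip_scan_record (ip, scanned_status, times_scanned) "
        "VALUES (%s, false, 0)",
        (network_number,),  # parameters are passed separately, never formatted into the SQL
    )
```

Inside the innermost loop this would replace the f-string and the call to execute_sql_statement, for example insert_ip(cursor, network_number) with whatever cursor the connection code already holds.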

Getting owner of file from smb share, by using python on linux

For a script I'm writing, I need to find out who the true owner of a file in an SMB share is (the share is mounted with mount -t cifs on my server and with net use on Windows machines).
It turns out that finding this information with Python on a Linux server is a real challenge.
I tried many SMB libraries (such as smbprotocol, smbclient and others), but nothing worked.
I found a few solutions for Windows, but they all use pywin32 or another Windows-specific package.
I also managed to do it from bash using smbcacls, but the only way I could do that from Python was the unclean route of subprocess.Popen('smbcacls').
Any idea on how to solve it?
This was unbelievably far from a trivial task, and unfortunately the answer isn't as simple as I hoped it would be.
I'm posting this answer in case someone gets stuck on the same problem in the future, though I hope someone posts a better solution before then.
In order to find the owner I used this library with its examples:
from smb.SMBConnection import SMBConnection
conn = SMBConnection(username='<username>', password='<password>', domain='<domain>', my_name='<some pc name>', remote_name='<server name>')
conn.connect('<server name>')
sec_att = conn.getSecurity('<share name>', r'\some\file\path')
owner_sid = sec_att.owner
The problem is that pysmb package will only give you the owner's SID and not his name.
In order to get his name you need to make an ldap query like in this answer (reposting the code):
from ldap3 import Server, Connection, ALL
from ldap3.utils.conv import escape_bytes
s = Server('my_server', get_info=ALL)
c = Connection(s, 'my_user', 'my_password')
c.bind()
binary_sid = b'....' # your sid must be in binary format
c.search('my_base', '(objectsid=' + escape_bytes(binary_sid) + ')', attributes=['objectsid', 'samaccountname'])
print(c.entries)
But of course nothing is ever easy: it took me hours to find a way to convert a string SID to a binary SID in Python, and in the end this solved it:
# posting the needed functions and omitting the class part
import struct

def byte(strsid):
    '''
    Convert a SID into bytes
        strsid - SID to convert into bytes
    '''
    # longToByte belongs to the omitted class part and is not shown here
    sid = str.split(strsid, '-')
    ret = bytearray()
    sid.remove('S')
    for i in range(len(sid)):
        sid[i] = int(sid[i])
    sid.insert(1, len(sid)-2)
    ret += longToByte(sid[0], size=1)
    ret += longToByte(sid[1], size=1)
    ret += longToByte(sid[2], False, 6)
    for i in range(3, len(sid)):
        ret += longToByte(sid[i])
    return ret

def byteToLong(byte, little_endian=True):
    '''
    Convert bytes into a Python integer
        byte - bytes to convert
        little_endian - True (default) or False for little or big endian
    '''
    if len(byte) > 8:
        raise Exception('Bytes too long. Needs to be <= 8 or 64bit')
    else:
        if little_endian:
            a = byte.ljust(8, b'\x00')
            return struct.unpack('<q', a)[0]
        else:
            a = byte.rjust(8, b'\x00')
            return struct.unpack('>q', a)[0]
... AND finally you have the full solution! enjoy :(
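The conversion code above relies on a longToByte helper that is not shown. Here is a minimal self-contained sketch of the same string-to-binary conversion. It assumes the usual binary SID layout (1-byte revision, 1-byte sub-authority count, 6-byte big-endian identifier authority, then each sub-authority as a 4-byte little-endian integer); the name sid_to_bytes is mine, not from the library the answer borrowed from.

```python
import struct


def sid_to_bytes(sid: str) -> bytes:
    """Convert a string SID such as 'S-1-5-32-544' to its binary form."""
    parts = sid.split("-")
    if len(parts) < 3 or parts[0].upper() != "S":
        raise ValueError(f"not a string SID: {sid!r}")
    revision = int(parts[1])
    authority = int(parts[2])
    sub_authorities = [int(part) for part in parts[3:]]
    return (
        struct.pack("<BB", revision, len(sub_authorities))  # revision, sub-authority count
        + authority.to_bytes(6, "big")                      # identifier authority, big endian
        + b"".join(struct.pack("<I", sub) for sub in sub_authorities)  # little endian
    )


print(sid_to_bytes("S-1-5-32-544").hex())
```

The result can be passed as binary_sid to the escape_bytes call in the ldap3 snippet.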
I'm adding this answer to let you know of the option of using smbprotocol; as well as expand in case of misunderstood terminology.
SMBProtocol Owner Info
It is possible to get the SID using the smbprotocol library as well (just like with the pysmb library).
This was brought up in the github issues section of the smbprotocol repo, along with an example of how to do it. The example provided is fantastic and works perfectly. An extremely stripped down version
However, this also just retrieves a SID and will need a secondary library to perform a lookup.
Here's a function to get the owner SID (just wrapped what's in the gist in a function. Including here in case the gist is deleted or lost for any reason).
import smbclient
from ldap3 import Server, Connection, ALL,NTLM,SUBTREE

def getFileOwner(smb: smbclient, conn: Connection, filePath: str):
    from smbprotocol.file_info import InfoType
    from smbprotocol.open import FilePipePrinterAccessMask,SMB2QueryInfoRequest, SMB2QueryInfoResponse
    from smbprotocol.security_descriptor import SMB2CreateSDBuffer

    class SecurityInfo:
        # 100% just pulled from gist example
        Owner = 0x00000001
        Group = 0x00000002
        Dacl = 0x00000004
        Sacl = 0x00000008
        Label = 0x00000010
        Attribute = 0x00000020
        Scope = 0x00000040
        Backup = 0x00010000

    def guid2hex(text_sid):
        """convert the text string SID to a hex encoded string"""
        s = ['\\{:02X}'.format(ord(x)) for x in text_sid]
        return ''.join(s)

    def get_sd(fd, info):
        """ Get the Security Descriptor for the opened file. """
        query_req = SMB2QueryInfoRequest()
        query_req['info_type'] = InfoType.SMB2_0_INFO_SECURITY
        query_req['output_buffer_length'] = 65535
        query_req['additional_information'] = info
        query_req['file_id'] = fd.file_id
        req = fd.connection.send(query_req, sid=fd.tree_connect.session.session_id, tid=fd.tree_connect.tree_connect_id)
        resp = fd.connection.receive(req)
        query_resp = SMB2QueryInfoResponse()
        query_resp.unpack(resp['data'].get_value())
        security_descriptor = SMB2CreateSDBuffer()
        security_descriptor.unpack(query_resp['buffer'].get_value())
        return security_descriptor

    with smbclient.open_file(filePath, mode='rb', buffering=0,
                             desired_access=FilePipePrinterAccessMask.READ_CONTROL) as fd:
        sd = get_sd(fd.fd, SecurityInfo.Owner | SecurityInfo.Dacl)
        # returns SID
        _sid = sd.get_owner()
    try:
        # Don't forget to convert the SID string-like object to a string
        # or you get an error related to "0" not existing
        sid = guid2hex(str(_sid))
    except:
        print(f"Failed to convert SID {_sid} to HEX")
        raise
    conn.search('DC=dell,DC=com',f"(&(objectSid={sid}))",SUBTREE)
    # Will return an empty array if no results are found
    return [res['dn'].split(",")[0].replace("CN=","") for res in conn.response if 'dn' in res]
to use:
# Client config is required if on linux, not if running on windows
smbclient.ClientConfig(username=username, password=password)
# Setup LDAP session
server = Server('mydomain.com',get_info=ALL,use_ssl = True)
# you can turn off raise_exceptions, or leave it out of the ldap connection
# but I prefer to know when there are issues vs. silently failing
conn = Connection(server, user="domain\\username", password=password, raise_exceptions=True,authentication=NTLM)
conn.start_tls()
conn.open()
conn.bind()
# Run the check
fileCheck = r"\\shareserver.server.com\someNetworkShare\someFile.txt"
owner = getFileOwner(smbclient, conn, fileCheck)
# Unbind ldap session
# I'm not clear if this is 100% required, I don't THINK so
# but better safe than sorry
conn.unbind()
# Print results
print(owner)
Now, this isn't super efficient. It takes 6 seconds for me to run this on a SINGLE file. So if you wanted to run some kind of ownership scan, you would probably want to write the program in C++ or some other low-level language instead of trying to use Python. But for something quick and dirty this does work. You could also set up a thread pool and run batches. The piece that takes longest is connecting to the file itself, not running the LDAP query, so if you can find a more efficient way to do that you'll be golden.
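The thread-pool idea could look roughly like the sketch below. It is not measured against a real share: owners_for_paths and the lookup argument are hypothetical names, and lookup stands for any callable that takes a path and returns the owner (for instance a small wrapper around the function above). Whether one LDAP connection can safely be shared between threads depends on how it was created, so the sketch leaves connection handling to that callable.

```python
from concurrent.futures import ThreadPoolExecutor


def owners_for_paths(paths, lookup, max_workers=8):
    """Run lookup(path) for every path in a thread pool.

    Returns a dict mapping each path to either the lookup result or the
    exception that lookup raised for that path.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the slow file opens overlap.
        futures = {path: pool.submit(lookup, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one file fails
                results[path] = exc
    return results
```

Because the slow part is waiting on the network rather than computing, threads should help even with the GIL, but the actual gain will depend on the server.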
Terminology Warning, Owner != Creator/Author
Last note on this. Owner != File Author. Many domain environments, and in particular SMB shares, automatically alter ownership from the creator to a group. In my case the results of the above is:
What I was actually looking for was the creator of the file. File creator and modifier aren't attributes which windows keeps track of by default. An administrator can enable policies to audit file changes in a share, or auditing can be enabled on a file-by-file basis using the Security->Advanced->Auditing functionality for an individual file (which does nothing to help you determine the creator).
That being said, some applications store that information for themselves. For example, if you're looking for Excel this answer provides a method for which to get the creator of any xls or xlsx files (doesn't work for xlsb due to the binary nature of the files). Unfortunately few files store this kind of information. In my case I was hoping to get that info for tblu, pbix, and other reporting type files. However, they don't contain this information (which is good from a privacy perspective).
So in case anyone finds this answer trying to solve the same kind of thing I did - Your best bet (to get actual authorship information) is to work with your domain IT administrators to get auditing setup.

How to properly invoke Python 3 script from SPSS syntax window using SCRIPT command (+ additional problems during runtime)

I would like to run two Python 3 scripts from the SPSS syntax window. This can be done with either a BEGIN PROGRAM-END PROGRAM. block or the SCRIPT command. This time I need a solution that uses the second option, SCRIPT.
Simplified code:
*** MACROS.
define export_tabs (!positional !tokens (1))
output modify
/select logs headings texts warnings pagetitles outlineheaders notes
/deleteobject delete = yes.
OUTPUT EXPORT
/CONTENTS EXPORT = visible LAYERS = printsetting MODELVIEWS = printsetting
/XLSX DOCUMENTFILE = "doc.xlsx"
OPERATION = createsheet
sheet = !quote(!unquote(!1))
LOCATION = lastcolumn NOTESCAPTIONS = no
!enddefine.
define matrix_tab (!positional !charend('/')
/!positional !charend('/')
/!positional !charend('/')
/!positional !charend('/')
/stat = !tokens (1))
!do !i !in (!3)
ctables
/mrsets countduplicates = no
/vlabels variables = !concat(!1,_,!2,_,!i) display = label
/table !concat(!1,_,!2,_,!i)
[rowpct.responses.count !concat(!unquote(!stat),"40.0"), totals[count f40.0]]
/slabels position = column visible = no
/clabels rowlabels = opposite
/categories variables = !concat(!1,_,!2,_,!i) order = a key = value
empty = include total = yes label = "VALID COUNT" position = after
/titles title = !upcase(!4).
!doend
!enddefine.
*** REPORT.
* Sheet 1.
output close all.
matrix_tab $Q1 / 1 / 1 2 / "QUESTION 1" / stat="pct".
script "C:\path\script 1.py".
script "C:\path\script 2.py".
export_tabs "Q1".
* Sheet 2.
output close all.
matrix_tab $Q2 / 2 / 3 4 / "QUESTION 2" / stat="pct".
script "C:\path\script 1.py".
script "C:\path\script 2.py".
export_tabs "Q2".
When I run the block for the first sheet, everything works fine. However, when I run the block for the second sheet, SPSS doesn't execute the Python scripts and jumps straight to the export_tabs macro (a synchronization problem?). I thought the problem might be in the way I invoked the SCRIPT command, so I tried this:
script "C:\path\script 1.py" pythonversion = 3.
script "C:\path\script 2.py" pythonversion = 3.
but SPSS, even though the syntax window coloured these parts of the syntax, returned this error message:
>Error # 3251 in column 152. Text: pythonversion
>The SCRIPT command contains unrecognized text following the the file
>specification. The optional parameter must be a quoted string enclosed in
>parentheses.
>Execution of this command stops.
Has anyone had this problem, or does anyone have an idea why it happens?
NOTE: Both Python scripts run smoothly from the Python 3.4.3 shell installed with the version of SPSS I have, thus I don't think the core of the problem is within those codes.
This seems to be a documentation defect in the way this keyword was implemented. I have been able to replicate it and have logged a defect with IBM SPSS Statistics Development.
In this case, the order matters. Rather than this:
script "C:\path\script 2.py" pythonversion = 3.
Try instead:
script pythonversion = 3 "C:\path\script 2.py".

Python: [Informix][Informix ODBC Driver]Invalid string or buffer length. SQLCODE=-11071 when I fetch data

When I try to retrieve my table from Informix with the IfxPy package, I get this error:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-6-b0557e7f099b> in <module>
16 while dictionary != False:
17 tph.append(dictionary)
---> 18 dictionary = IfxPy.fetch_assoc(stmt)
19 print(pd.DataFrame(tph))
Exception: [Informix][Informix ODBC Driver]Invalid string or buffer length. SQLCODE=-11071
This is my code:
import IfxPy
import pandas as pd
ConStr = "SERVER=informix1;DATABASE=ir_fmois;HOST=127.0.0.1;SERVICE=9092;UID=informix;PWD=1234;"
# netstat -a | findstr 9088
try:
    # netstat -a | findstr 9088
    conn = IfxPy.connect( ConStr, "", "")
except Exception as e:
    print ('ERROR: Connect failed')
    print ( e )
    quit()
sql = "SELECT * FROM oih"
stmt = IfxPy.exec_immediate(conn, sql)
dictionary = IfxPy.fetch_assoc(stmt)
tph=[]
while dictionary != False:
    tph.append(dictionary)
    dictionary = IfxPy.fetch_assoc(stmt)
print(pd.DataFrame(tph))
When I print the dataframe I see that I got only the first 4 rows.
I also tried this code; it doesn't throw any exception, but it also returns only the first 4 rows of my table:
import IfxPyDbi as dbapi2
ConStr = "SERVER=informix1;DATABASE=ir_fmois;HOST=127.0.0.1;SERVICE=9092;UID=informix;PWD=1234;"
conn = dbapi2.connect( ConStr, "", "")
cur = conn.cursor()
sql = "SELECT * FROM oih"
cur.execute(sql)
rows = cur.fetchall()
len(rows)
>>>4
EDIT
I tried selecting the columns one by one, and the same error occurred when I selected a byte (blob) column (with text data type). This column has empty values in the first 4 rows, but the fifth row isn't empty, and I think this is why the error occurred on row 5.
I would really appreciate it if anyone has any idea on how to solve this.
The finderr command reports:
$ finderr -11071
-11071 Invalid string or buffer length.
This Informix CLI error code is the same as SQLSTATE value S1090. The
following functions can return this error code: SQLBindCol(),
SQLBindParameter(), SQLBrowseConnect(), SQLColAttributes(),
SQLColumnPrivileges(), SQLColumns(), SQLConnect(), SQLDataSources(),
SQLDescribeCol(), SQLDriverConnect(), SQLDrivers(), SQLExecDirect(),
SQLExecute(), SQLForeignKeys(), SQLGetCursorName(), SQLGetData(),
SQLGetInfo(), SQLNativeSql(), SQLPrepare(), SQLPrimaryKeys(),
SQLProcedureColumns(), SQLProcedures(), SQLPutData(), SQLSetCursorName(),
SQLSpecialColumns(), SQLStatistics(), SQLTablePrivileges(), and SQLTables().
The value specified for the argument cbValueMax is less than zero.
Supply a value for the argument cbValueMax that is zero or greater.
For all functions, an argument that specified a string or buffer length, such
as cbCursor, cbConnStrIn, or cbSqlStr, had one or more of the following
problems: (1) It was less than 0, (2) It was less than 0 but not equal to
SQL_NTS or SQL_NULL_DATA, (3) It was less than 0 but the corresponding
pointer was not a null pointer, (4) It was equal to 1, or (5) It was too
large. Set the string or buffer length to a valid value.
Additionally for SQLExecDirect() and SQLExecute(), a parameter value that was
set with SQLBindParameter() had one of the following problems: (1) It was a
null pointer, and the parameter length was not 0, SQL_NULL_DATA,
SQL_DATA_AT_EXEC, or less than or equal to SQL_LEN_DATA_AT_EXEC_OFFSET, or
(2) It was not a null pointer, and the parameter length was less than 0 but
was not SQL_NTS, SQL_NULL_DATA, SQL_DATA_AT_EXEC, or less than or equal to
SQL_LEN_EXEC_DATA_AT_EXEC_OFFSET. Set the parameter value to a valid value.
$
Since your Python code is not calling any of those functions directly, that suggests there is a bug in the ifxpy driver. It is probably calling one of the listed functions with an incorrect argument.
As such, you should probably report it as an 'issue' on the GitHub site for the driver: https://github.com/OpenInformix/IfxPy
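For the bug report it helps to name the column that triggers the error. The column-by-column check described in the question's edit could be sketched as below. Nothing here is part of IfxPy: find_failing_columns is a made-up helper, and fetch_all is a hypothetical callable that runs one SQL string and reads every row (for example a thin wrapper around the fetch loop from the question).

```python
def find_failing_columns(columns, table, fetch_all):
    """Select each column on its own and record the ones that raise.

    Returns a dict mapping column name to the error message.
    """
    failures = {}
    for column in columns:
        sql = f"SELECT {column} FROM {table}"
        try:
            fetch_all(sql)  # must read all rows, since the error appears mid-fetch
        except Exception as exc:
            failures[column] = str(exc)
    return failures
```

Column and table names are formatted straight into the SQL here, which is acceptable for a one-off diagnostic on a known schema but not for untrusted input.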

Why doesn't psycopg2 allow us to open multiple server-side cursors in the same connection?

I am curious why psycopg2 doesn't allow opening multiple server-side cursors (http://initd.org/psycopg/docs/usage.html#server-side-cursors) in the same connection. I ran into this problem recently and had to work around it by replacing the second cursor with a client-side cursor. But I still want to know whether there is any way to do it.
For example, I have these 2 tables on Amazon Redshift:
CREATE TABLE tbl_account (
acctid varchar(100),
regist_day date
);
CREATE TABLE tbl_my_artist (
user_id varchar(100),
artist_id bigint
);
INSERT INTO tbl_account
(acctid, regist_day)
VALUES
('TEST0000000001', DATE '2014-11-23'),
('TEST0000000002', DATE '2014-11-23'),
('TEST0000000003', DATE '2014-11-23'),
('TEST0000000004', DATE '2014-11-23'),
('TEST0000000005', DATE '2014-11-25'),
('TEST0000000006', DATE '2014-11-25'),
('TEST0000000007', DATE '2014-11-25'),
('TEST0000000008', DATE '2014-11-25'),
('TEST0000000009', DATE '2014-11-26'),
('TEST0000000010', DATE '2014-11-26'),
('TEST0000000011', DATE '2014-11-24'),
('TEST0000000012', DATE '2014-11-24')
;
INSERT INTO tbl_my_artist
(user_id, artist_id)
VALUES
('TEST0000000001', 2000011247),
('TEST0000000001', 2000157208),
('TEST0000000001', 2000002648),
('TEST0000000002', 2000383724),
('TEST0000000003', 2000002546),
('TEST0000000003', 2000417262),
('TEST0000000004', 2000076873),
('TEST0000000004', 2000417266),
('TEST0000000005', 2000077991),
('TEST0000000005', 2000424268),
('TEST0000000005', 2000168784),
('TEST0000000006', 2000284581),
('TEST0000000007', 2000284581),
('TEST0000000007', 2000000642),
('TEST0000000008', 2000268783),
('TEST0000000008', 2000284581),
('TEST0000000009', 2000088635),
('TEST0000000009', 2000427808),
('TEST0000000010', 2000374095),
('TEST0000000010', 2000081797),
('TEST0000000011', 2000420006),
('TEST0000000012', 2000115887)
;
I want to select from those 2 tables, then do something with query result.
I use 2 server-side cursors because I need 2 nested loops in my query. I want to use server-side cursors because the result set can be very large.
I use fetchmany() instead of fetchall() because I'm running on a single-node cluster.
Here is my code:
import psycopg2
from psycopg2.extras import DictCursor
conn = psycopg2.connect('connection parameters')
cur1 = conn.cursor(name='cursor1', cursor_factory=DictCursor)
cur2 = conn.cursor(name='cursor2', cursor_factory=DictCursor)
cur1.execute("""SELECT acctid, regist_day FROM tbl_account
                WHERE regist_day <= '2014-11-25'
                ORDER BY 1""")
for record1 in cur1.fetchmany(50):
    cur2.execute("""SELECT user_id, artist_id FROM tbl_my_artist
                    WHERE user_id = '%s'
                    ORDER BY 1""" % (record1["acctid"]))
    for record2 in cur2.fetchmany(50):
        print '(acctid, artist_id, regist_day): (%s, %s, %s)' % (
            record1["acctid"], record2["artist_id"], record1["regist_day"])
        # do something with these values
conn.close()
When running, I got an error:
Traceback (most recent call last):
File "C:\Users\MLD1\Desktop\demo_cursor.py", line 20, in <module>
for record2 in cur2.fetchmany(50):
File "C:\Python27\lib\site-packages\psycopg2\extras.py", line 72, in fetchmany
res = super(DictCursorBase, self).fetchmany(size)
InternalError: opening multiple cursors from within the same client connection is not allowed.
That error occurred at line 20, when I tried to fetch results from the second cursor.
An answer four years later, but it is possible to have more than one cursor open from the same connection. (It may be that the library was updated to fix the problem above.)
The caveat is that you are allowed to call execute() only once on a named cursor, so if you reuse one of the cursors in the fetchmany loop, you need to either remove the name or create another "anonymous" cursor.
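One way to respect the execute-once rule is to create a fresh named cursor for every query instead of reusing cur2 inside the loop. The sketch below assumes only the psycopg2 calls already used in the question (conn.cursor(name=...), execute, fetchmany, close); fetch_in_batches and the generated cursor names are made up. It does not address whether the server itself permits a second open cursor, which is what the error in the question is about.

```python
import itertools

_cursor_ids = itertools.count(1)


def fetch_in_batches(conn, query, params=None, batch_size=50):
    """Yield rows from one query through its own named (server-side) cursor."""
    # A new, uniquely named cursor per call, so execute() runs exactly once on it.
    cursor = conn.cursor(name=f"batch_cursor_{next(_cursor_ids)}")
    try:
        cursor.execute(query, params)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                yield row
    finally:
        cursor.close()
```

The inner loop of the question would then become something like for record2 in fetch_in_batches(conn, "SELECT user_id, artist_id FROM tbl_my_artist WHERE user_id = %s ORDER BY 1", (record1["acctid"],)), which also passes the account id as a parameter instead of formatting it into the SQL.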
