LoadData script in Local SQL Server instance and Azure SQL Server - azure

I can run the following commands without any problems on my local SQL Server machine:
exec sp_configure 'show advanced options', 1
reconfigure
go
exec sp_configure 'Ad Hoc Distributed Queries', 1
reconfigure
go
exec LoadData 'C:\MyDataFile.urg';
go
But when I try to run the SP_CONFIGURE commands on Azure SQL, I get the following error:
Statement 'CONFIG' is not supported in this version of SQL Server.
And when I execute the LoadData command, I get the following error:
Cannot bulk load because the file "C:\MyDataFile.urg" could not be opened. Operating system error code (null).
The above error makes sense, since I am trying to access a file on my local machine from the Azure cloud. Is there an equivalent process to LoadData that I can follow in Azure to dump the contents of the file?
I can place the file in Azure blob storage, but then what command can I execute that will work similarly to LoadData?
-- Update 1
Please keep in mind two things when answering
1) I am using a third-party file that ends with .urg and is not a csv file.
2) When I use exec LoadData 'C:\MyDataFile.urg'; note that I am not specifying a table name for where the file data will go. The LoadData command processes the file and dumps the data into the respective tables itself. I am assuming the .urg file gets opened and executed and has commands in it that determine what data goes where.
--Update 2
So my understanding was incorrect. I found out that LoadData is a stored proc from the third party that takes the path to the file, like this. A file on disk works great; now I need to pass it an Azure storage blob path.
CREATE PROCEDURE [dbo].[LoadData]
@DataFile NVARCHAR(MAX)
AS
DECLARE @LoadSql NVARCHAR(MAX)
SET @LoadSql = '
BULK INSERT UrgLoad FROM ''' + @DataFile + '''
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ''~'',
ROWTERMINATOR = ''0x0a'',
KEEPNULLS,
CODEPAGE = ''ACP''
)
'
EXEC sp_executesql @LoadSql
SELECT @Err = @@ERROR
Now I need to find a way to send an Azure storage blob path to this stored proc in such a way that it can open it. I will update if I run into issues.
--Update 3
Since my blob storage account is not public, I am sure I need to add an authorization piece. I added this code to the proc:
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sp=r&st=2020-03-10T01:04:16Z&se=2020-03-10T09:04:16Z&spr=https&sv=2019-02-02&sr=b&sig=Udxa%2FvPrUBZt09GAH4YgWd9joTlyxYDC%2Bt7j7CmuhvQ%3D';
-- Create external data source with the URL of the Blob storage Account and associated credential since its not public
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://dev.blob.core.windows.net/urg',
CREDENTIAL= MyAzureBlobStorageCredential
);
When I execute the proc, it says they already exist:
Msg 15530, Level 16, State 1, Procedure LoadData, Line 14 [Batch Start Line 1]
The credential with name "MyAzureBlobStorageCredential" already exists.
Msg 46502, Level 16, State 1, Procedure LoadData, Line 27 [Batch Start Line 1]
Type with name 'MyAzureBlobStorage' already exists.
When I take that out and update the BULK INSERT piece of code like this:
DECLARE @LoadSql NVARCHAR(MAX)
SET @LoadSql = '
BULK INSERT UrjanetLoad FROM ''' + @DataFile + '''
WITH ( DATA_SOURCE = ''MyAzureBlobStorage'',
FIRSTROW = 2,
FIELDTERMINATOR = ''~'',
ROWTERMINATOR = ''0x0a'',
KEEPNULLS,
CODEPAGE = ''ACP''
)
'
But it tells me
Cannot bulk load because the file "https://dev.blob.core.windows.net/urg/03_06_20_16_23.urg" could not be opened. Operating system error code 5(Access is denied.).
I guess the question is: what am I missing in the authorization process, and how can I make it part of the stored proc so that whenever it runs, it picks up the authorization?
Update 4: This article helped with accessing a file from blob storage using credentials, dropping the external data source and scoped credential, and getting a fresh SAS token in the stored proc, in case it can help someone else: https://social.technet.microsoft.com/wiki/contents/articles/52061.t-sql-bulk-insert-azure-csv-blob-into-azure-sql-database.aspx
Now I am getting this error:
Cannot bulk load because the file "03_06_20_16_23.urg" could not be opened. Operating system error code 32(The process cannot access the file because it is being used by another process.).
I tried this article, but it does not address the file-being-in-use-by-another-process issue.
Update 5: Here is what the proc looks like:
ALTER PROCEDURE [dbo].[TestLoad]
@DataFile NVARCHAR(MAX), @SAS_Token VARCHAR(MAX), @Location VARCHAR(MAX)
AS
BEGIN TRAN
-- Turn on NOCOUNT to prevent message spamming
SET NOCOUNT ON;
DECLARE @CrtDSSQL NVARCHAR(MAX), @DrpDSSQL NVARCHAR(MAX), @ExtlDS SYSNAME, @DBCred SYSNAME, @BulkInsSQL NVARCHAR(MAX) ;
SELECT @ExtlDS = 'MyAzureBlobStorage'
SELECT @DBCred = 'MyAzureBlobStorageCredential'
SET @DrpDSSQL = N'
IF EXISTS ( SELECT 1 FROM sys.external_data_sources WHERE Name = ''' + @ExtlDS + ''' )
BEGIN
DROP EXTERNAL DATA SOURCE ' + @ExtlDS + ' ;
END;
IF EXISTS ( SELECT 1 FROM sys.database_scoped_credentials WHERE Name = ''' + @DBCred + ''' )
BEGIN
DROP DATABASE SCOPED CREDENTIAL ' + @DBCred + ';
END;
';
SET @CrtDSSQL = @DrpDSSQL + N'
CREATE DATABASE SCOPED CREDENTIAL ' + @DBCred + '
WITH IDENTITY = ''SHARED ACCESS SIGNATURE'',
SECRET = ''' + @SAS_Token + ''';
CREATE EXTERNAL DATA SOURCE ' + @ExtlDS + '
WITH (
TYPE = BLOB_STORAGE,
LOCATION = ''' + @Location + ''' ,
CREDENTIAL = ' + @DBCred + '
);
';
-- PRINT @CrtDSSQL
EXEC (@CrtDSSQL);
-- Set up the load timestamp
DECLARE @LoadTime DATETIME, @Err varchar(60)
SELECT @LoadTime = GETDATE()
-- Set the bulk load command to a string and execute with sp_executesql.
-- This is the only way to do parameterized bulk loads
DECLARE @LoadSql NVARCHAR(MAX)
SET @LoadSql = '
BULK INSERT TestLoadTable FROM ''' + @DataFile + '''
WITH ( DATA_SOURCE = ''MyAzureBlobStorage'',
FIRSTROW = 2,
FIELDTERMINATOR = ''~'',
ROWTERMINATOR = ''0x0a'',
KEEPNULLS,
CODEPAGE = ''ACP''
)
'
EXEC (@LoadSql);
--EXEC sp_executesql @LoadSql
SELECT @Err = @@ERROR
IF @Err <> 0 BEGIN
PRINT 'Errors with data file ... aborting'
ROLLBACK
RETURN -1
END
SET NOCOUNT OFF;
COMMIT
GO
And this is how I am trying to call it.
EXEC TestLoad 'TestFile.csv',
'sv=2019-02-02&ss=bfqt&srt=sco&sp=rwdlacup&se=2020-03-16T02:07:03Z&st=2020-03-10T18:07:03Z&spr=https&sig=TleUPwAyEVT6dzX17fH6rq1lQQRAhIRImDHdJRKIrKE%3D',
'https://dev.blob.core.windows.net/urg';
and here is the error
Cannot bulk load because the file "TestFile.csv" could not be opened. Operating system error code 32(The process cannot access the file because it is being used by another process.).
Errors with data file ... aborting

If your file is placed on a public Azure Blob Storage account, you need to define an EXTERNAL DATA SOURCE that points to that account:
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE, LOCATION = 'https://myazureblobstorage.blob.core.windows.net');
Once you define the external data source, you can use its name in BULK INSERT and OPENROWSET. If the storage account is not public, you also need a database master key and a database scoped credential, and you reference that credential in the external data source:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'some strong password';
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal';
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://myazureblobstorage.blob.core.windows.net',
CREDENTIAL= MyAzureBlobStorageCredential);
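For reference, here is a minimal sketch of driving such a BULK INSERT from Python with pyodbc, just to show how the data source name is used. The connection string, table name, and blob path are assumptions rather than part of the answer above, and the path in FROM is relative to the LOCATION of the external data source:
import pyodbc  # requires an ODBC driver for SQL Server on the client

# Assumed Azure SQL Database connection details
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=myserver.database.windows.net;DATABASE=mydb;UID=myuser;PWD=mypassword'
)
cursor = conn.cursor()

# BULK INSERT reads the blob through the external data source defined above;
# 'urg/03_06_20_16_23.urg' is a hypothetical container/blob path
cursor.execute("""
    BULK INSERT UrgLoad
    FROM 'urg/03_06_20_16_23.urg'
    WITH (DATA_SOURCE = 'MyAzureBlobStorage',
          FIRSTROW = 2,
          FIELDTERMINATOR = '~',
          ROWTERMINATOR = '0x0a')
""")
conn.commit()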

Based on my experience and all of the Azure SQL Database documents, I can only tell you that:
Azure SQL Database doesn't support loading a file from an on-premises/local computer directly.
Azure SQL Database doesn't support the .urg data file. We cannot find any way to support the .urg file; even Data Factory doesn't.
Reference:
Limitations: Only .mdf, .ldf, and .ndf files can be stored in Azure Storage by using the SQL Server Data Files in Azure feature.
Data formats for import and export
Update:
I don't know whether the .urg file will load successfully, but here are some approaches you could try:
You could upload your .urg file to Blob storage first, then follow this tutorial: Importing data from a file in Azure blob storage.
Here's another blog, Bulk insert using stored procedure and Bulk insert file path as stored procedure parameter, which can help you pass the bulk insert file path as a parameter to the stored procedure 'LoadData'.
Hope this helps.

Related

How to programmatically retrieve the workspace url and clusterOwnerUserId?

I would like to programmatically create the url to download a file.
To do this I need the workspaceUrl and clusterOwnerUserId.
How can I retrieve those in a Databricks notebook?
# how to get the `workspaceUrl` and `clusterOwnerUserId`?
tmp_file = '/tmp/output_abcd.xlsx'
filestore_file = '/FileStore/output_abcd.xlsx'
# code to create file omitted for brevity ...
dbutils.fs.cp(f'file:{tmp_file}', filestore_file)
downloadUrl = f'https://{workspaceUrl}/files/output_abcd.xlsx?o={clusterOwnerUserId}'
displayHTML(f"<a href='{downloadUrl}'>download</a>")
The variables are available in the spark conf.
E.g.
clusterOwnerUserId = spark.conf.get('spark.databricks.clusterUsageTags.orgId')
workspaceUrl = spark.conf.get('spark.databricks.workspaceUrl')
You can then use the details as follows:
tmp_file = '/tmp/output_abcd.xlsx'
filestore_file = '/FileStore/output_abcd.xlsx'
# code to create file omitted for brevity ...
dbutils.fs.cp(f'file:{tmp_file}', filestore_file)
downloadUrl = f'https://{workspaceUrl}/files/output_abcd.xlsx?o={clusterOwnerUserId}'
displayHTML(f"<a href='{downloadUrl}'>download</a>")
Files in the Databricks FileStore at /FileStore/my-stuff/my-file.txt are accessible at:
"https://databricks-instance-name.cloud.databricks.com/files/my-stuff/my-file.txt"
I don't think you need the o=... part. That is the workspace Id btw, not the clusterOwner user id.

Multiple SQL request on cx_Oracle?

I use cx_Oracle, a Python interface for Oracle Database.
In the same query, I have to:
change the session to the CDB container
do a SQL request
Here is my code:
Query = f"""ALTER SESSION SET CONTAINER=cdb$root;
select RESOURCE_NAME as "Parametre",
MAX_UTILIZATION as "Valeur courrante",
LIMIT_VALUE as "Valeur limite",
decode( nvl( MAX_UTILIZATION, 0),0, 0, ceil ( ( MAX_UTILIZATION / LIMIT_VALUE) * 100) ) as "Pourcentage utilise"
from v$resource_limit
where RESOURCE_NAME in ('sessions','processes')
and decode( nvl( MAX_UTILIZATION, 0),0, 0, ceil ( ( MAX_UTILIZATION / LIMIT_VALUE) * 100) ) > {seuil_max_occupation_param}"""
SQL_Query = pd.read_sql_query(Query,conn)
But this code raises an Oracle error:
ORA-00922: option erronee ou absente (missing or invalid option)
This set of requests works fine in SQL*Plus, for example.
I have a syntax problem, but which one?
Thanks a lot.
Best regards.
Théo
All the Oracle APIs execute a single statement at a time. From the documentation:
cx_Oracle can be used to execute individual statements, one at a time. It does not read SQL*Plus “.sql” files.
You need to make multiple calls to execute(), passing the ALTER first, and then the SELECT. You need to use the same connection for both calls.
Alternatively you could wrap the SQL calls in a PL/SQL block (and return a REF CURSOR or Implicit Result Set).
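Here is a minimal sketch of that two-call approach with cx_Oracle. The connection details and the threshold value are assumptions, and note that each statement is executed on its own, without a trailing semicolon:
import cx_Oracle
import pandas as pd

# Assumed connection details
conn = cx_Oracle.connect("user", "password", "host:1521/service")
cur = conn.cursor()

# First call: switch the session to the CDB root (no trailing semicolon)
cur.execute("ALTER SESSION SET CONTAINER = cdb$root")

# Second call: run the query on the same connection, with a bind variable
seuil_max_occupation_param = 80  # assumed threshold value
query = """
select RESOURCE_NAME as "Parametre",
       MAX_UTILIZATION as "Valeur courrante",
       LIMIT_VALUE as "Valeur limite",
       decode(nvl(MAX_UTILIZATION, 0), 0, 0, ceil((MAX_UTILIZATION / LIMIT_VALUE) * 100)) as "Pourcentage utilise"
from v$resource_limit
where RESOURCE_NAME in ('sessions', 'processes')
  and decode(nvl(MAX_UTILIZATION, 0), 0, 0, ceil((MAX_UTILIZATION / LIMIT_VALUE) * 100)) > :seuil
"""
SQL_Query = pd.read_sql_query(query, conn, params={"seuil": seuil_max_occupation_param})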

How to convert selected files in S3 bucket into snowflake stage in order to load data into snowflake using python and boto3

I need to stage files which are in an S3 bucket. First of all, I find the latest file uploaded to the given bucket, and then I need to make those files into a stage, not the whole bucket. For example, let's say I have a bucket called topic. Inside it I have 2 folders, topic1 and topic2, and those 2 folders have newly uploaded files. In this case I need to make those newly uploaded files into a stage in order to load the data into Snowflake. I want to do this using Python and boto3. I have already built code to find the latest file, but I don't know how to make them a stage. When I used the CREATE OR REPLACE STAGE command with a for loop for each file, it only created a stage for the last file, not a stage for each file. How should I do this?
def download_s3_files(self):
    s3_object = boto3.client('s3', aws_access_key_id=self.s3_acc_key, aws_secret_access_key=self.s3_sec_key)
    if self.source_as_stage:
        no_of_dir = []
        try:
            bucket = s3_object.list_objects(Bucket=self.s3_bucket, Prefix=self.file_path, Delimiter='/')
            print("object bucket list >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>", bucket)
        except Exception as e:
            self.propagate_log_msg('check [%s] and Source File Location Path' % e)
        try:
            for directory in bucket['CommonPrefixes']:
                no_of_dir.append(str(directory['Prefix']).rsplit('/', 2)[-2])
            print(no_of_dir)
            no_of_dir.sort(reverse=True)
            latest_dir = no_of_dir[0]
            self.convert_source_as_stage(latest_dir)
        except Exception as e:
            print(e)
            exit(-1)

def convert_source_as_stage(self, latest_file):
    source_file_format = str(self.metadata['source_file_format']).lower()+'_format' if self.metadata['source_file_format'] is not None else 'pipe_format'
    url = 's3://{bucket}/{location}/{dir_}'.format(location=self.s3_file_loc.strip("/"),
                                                   bucket=self.s3_bucket, dir_=latest_file)
    print("formatted url >>>>>>>>>>>>>>>>>>", url)
    file_name_dw = str(latest_file.rsplit('/', 1)[-1])
    print("File_Name>>>>>>>>>>>>>", file_name_dw)
    print("Source file format :", source_file_format)
    print("source url: ", url)
    self.create_stage = """
    CREATE OR REPLACE STAGE {sa}.{table} URL='{url}'
    CREDENTIALS=(AWS_KEY_ID='{access_key}' AWS_SECRET_KEY='{secret}')
    FILE_FORMAT = {file};
    // create or replace stage {sa}.{table}
    // file_format = (type = 'csv' field_delimiter = '|' record_delimiter = '\\n');
    """.format(sa=self.ss_cd, table=self.table.lower(), access_key=self.s3_acc_key, secret=self.s3_sec_key,
               url=url, file=source_file_format, filename=str(self.metadata['source_table']))

    # CONNECT TO SNOWFLAKE
    print("Create Stage Statement :", self.create_stage)
    con = snowflake.connector.connect(
        user=self.USER,
        password=self.PASSWORD,
        account=self.ACCOUNT,
    )
    self.propagate_log_msg("Env metadata = [%s]" % self.env_metadata)

    # REFRESH DDL
    try:
        file_format_full_path = os.path.join(self.root, 'sql', str(source_file_format)+'.sql')
        self.create_file_format = open(file_format_full_path, 'r').read()
        self.create_schema = "CREATE schema if not exists {db_lz}.{sa}".format(sa=self.ss_cd, db_lz=self.db_lz)
        env_sql = 'USE database {db_lz}'.format(db_lz=self.db_lz)
        self.propagate_log_msg(env_sql)
        con.cursor().execute(env_sql)
        con.cursor().execute(self.create_schema)
        env_sql = 'USE schema {schema}'.format(schema=self.ss_cd)
        self.propagate_log_msg(env_sql)
        con.cursor().execute(env_sql)
        con.cursor().execute(self.create_file_format)
        con.cursor().execute(self.create_stage)
    except snowflake.connector.ProgrammingError as e:
        self.propagate_log_msg('Invalid sql, fix sql and retry')
        self.propagate_log_msg(e)
        exit()
    except KeyError:
        self.propagate_log_msg(traceback.format_exc())
        self.propagate_log_msg('deploy_ods is not set in schedule metadata, assuming it is False')
    except Exception as e:
        self.propagate_log_msg('unhandled exception, debug')
        self.propagate_log_msg(traceback.format_exc())
        exit()
    else:
        self.propagate_log_msg(
            "Successfully dropped and recreated table/stage for [{sa}.{table}]".format(sa=self.ss_cd,
                                                                                       table=self.table))
Perhaps you can take a step back and give a bigger picture of what you are trying to achieve. That will help others give good advice.
Best practice is to create one Snowflake STAGE for the whole bucket. The STAGE object then mirrors the bucket. If your setup needs, e.g., different permissions for different parts of the bucket, then it can make sense to create multiple stages with different access rights.
It looks like the purpose of setting up stages is to import S3 objects into Snowflake tables. This is done with the COPY INTO <table> command, and that command has two options for selecting objects/filenames to import:
FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] )
PATTERN = '<regex_pattern>'
I suggest you put your effort into the COPY INTO <table> parameters instead of creating excess amounts of STAGE objects in the database.
You should also take a serious look at Snowpipe. Snowpipe imports S3 objects into Snowflake tables in near real time with COPY INTO <table> commands triggered by S3 events, e.g. object creation. Snowpipe costs less than warehouses as it is not a dedicated resource.
Simple and effective.
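To make that concrete, here is a minimal sketch in Python with the Snowflake connector: one stage for the whole bucket, then targeted loads with FILES or PATTERN. The stage, table, file format, and credential values are assumptions for illustration only:
import snowflake.connector

# Assumed connection details
con = snowflake.connector.connect(user="USER", password="PASSWORD", account="ACCOUNT")
cur = con.cursor()

# One stage for the whole bucket, created once (not per file)
cur.execute("""
    CREATE STAGE IF NOT EXISTS my_schema.topic_stage
    URL = 's3://topic/'
    CREDENTIALS = (AWS_KEY_ID = 'xxx' AWS_SECRET_KEY = 'yyy')
    FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = '|')
""")

# Load only specific, newly arrived objects by listing them explicitly ...
cur.execute("""
    COPY INTO my_schema.topic1_table
    FROM @my_schema.topic_stage
    FILES = ('topic1/new_file_1.csv', 'topic1/new_file_2.csv')
""")

# ... or by matching a pattern relative to the stage location
cur.execute("""
    COPY INTO my_schema.topic2_table
    FROM @my_schema.topic_stage
    PATTERN = '.*topic2/.*[.]csv'
""")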

How do you process many files from a blob storage with long paths in databricks?

I've enabled logging for an API Management service and the logs are being stored in a storage account. Now I'm trying to process them in an Azure Databricks workspace but I'm struggling with accessing the files.
The issue seems to be that the automatically generated virtual folder structure looks like this:
/insights-logs-gatewaylogs/resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service>/y=*/m=*/d=*/h=*/m=00/PT1H.json
I've mounted the insights-logs-gatewaylogs container under /mnt/diags and a dbutils.fs.ls('/mnt/diags') correctly lists the resourceId= folder but dbutils.fs.ls('/mnt/diags/resourceId=') claims file not found
If I create empty marker blobs along the virtual folder structure I can list each subsequent level but that strategy obviously falls down since the last part of the path is dynamically organized by year/month/day/hour.
For example, a
spark.read.format('json').load("dbfs:/mnt/diags/logs/resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service>/y=*/m=*/d=*/h=*/m=00/PT1H.json")
yields this error:
java.io.FileNotFoundException: File/resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service>/y=2019 does not exist.
So clearly the wild-card has found the first year folder but is refusing to go further down.
I setup a copy job in Azure Data Factory that copies all the json blobs within the same blob storage account successfully and removes the resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service> prefix (so the root folder starts with the year component) and that can be accessed successfully all the way down without having to create empty marker blobs.
So the problem seems to be related to the long virtual folder structure, which is mostly empty.
Is there another way to process this kind of folder structure in Databricks?
Update: I've also tried providing the path as part of the source when mounting, but that doesn't help either.
I think I may have found the root cause of this. Should have tried this earlier but I provided the exact path to an existing blob like this:
spark.read.format('json').load("dbfs:/mnt/diags/logs/resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service>/y=2019/m=08/d=20/h=06/m=00/PT1H.json")
And I got a more meaningful error back:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual APPEND_BLOB.
Turns out the out-of-the box logging creates append blobs (and there doesn't seem to be a way to change this) and support for append blobs is still WIP by the looks of this ticket: https://issues.apache.org/jira/browse/HADOOP-13475
The FileNotFoundException could be a red herring, which might be caused by the inner exception being swallowed when trying to expand the wild-cards and finding an unsupported blob type.
Update
Finally found a reasonable work-around. I installed the azure-storage Python package in my workspace (if you're at home with Scala it's already installed) and did the blob loading myself. Most of the code below is to add globbing support; you don't need it if you're happy to just match on a path prefix:
%python
import re
import json
from azure.storage.blob import AppendBlobService
abs = AppendBlobService(account_name='<account>', account_key="<access_key>")
base_path = 'resourceId=/SUBSCRIPTIONS/<subscription>/RESOURCEGROUPS/<resource group>/PROVIDERS/MICROSOFT.APIMANAGEMENT/SERVICE/<api service>'
pattern = base_path + '/*/*/*/*/m=00/*.json'
filter = glob2re(pattern)
spark.sparkContext \
.parallelize([blob.name for blob in abs.list_blobs('insights-logs-gatewaylogs', prefix=base_path) if re.match(filter, blob.name)]) \
.map(lambda blob_name: abs.get_blob_to_bytes('insights-logs-gatewaylogs', blob_name).content.decode('utf-8').splitlines()) \
.flatMap(lambda lines: [json.loads(l) for l in lines]) \
.collect()
glob2re is courtesy of https://stackoverflow.com/a/29820981/220986:
def glob2re(pat):
    """Translate a shell PATTERN to a regular expression.
    There is no way to quote meta-characters.
    """
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i+1
        if c == '*':
            #res = res + '.*'
            res = res + '[^/]*'
        elif c == '?':
            #res = res + '.'
            res = res + '[^/]'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s[%s]' % (res, stuff)
        else:
            res = res + re.escape(c)
    return res + '\Z(?ms)'
Not pretty but avoids the copying around of data and can be wrapped up in a little utility class.
Try reading directly from the blob, not through the mount.
You need to set up either an access key or a SAS for this, but I assume you know that.
SAS
spark.conf.set(
"fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
"<complete-query-string-of-sas-for-the-container>")
or Access key
spark.conf.set(
"fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
"<storage-account-access-key>")
then
val df = spark.read.json("wasbs://<container>@<account-name>.blob.core.windows.net/<path>")
For now this operation is not supported. It is a shame that Microsoft provides technologies that don't work with each other (BYOML -> Log Analytics with an export rule to a storage account, and then reading the data from Databricks, for example).
There is a workaround for that. You can create your own custom class and read it. Please take a look at the example of reading am-securityevent data on the BYOML git:
https://github.com/Azure/Azure-Sentinel-BYOML

How to make connection in python to connect as400 and call any as400 programs with parameter

Does anyone know how to make a connection in Python to an AS400 iSeries system and call any AS400 program with parameters?
For example, how do I create a library by connecting to the AS400 through Python? I want to call "CRTLIB LIB(TEST)" from a Python script.
I am able to connect to the DB2 database through the pyodbc package.
Here is my code to connect to the DB2 database:
import pyodbc
connection = pyodbc.connect(
    driver='{iSeries Access ODBC Driver}',
    system='ip/hostname',
    uid='username',
    pwd='password')
c1 = connection.cursor()
c1.execute('select * from libname.filename')
for row in c1:
    print(row)
If your IBM i is set up to allow it, you can call the QCMDEXC stored procedure using CALL in your SQL. For example,
c1.execute("call qcmdexc('crtlib lib(test)')")
The QCMDEXC stored procedure lives in QSYS2 (the actual program object is QSYS2/QCMDEXC1) and does much the same as the familiar program of the same name that lives in QSYS, but the stored procedure is specifically meant to be called via SQL.
Of course, for this example to work, your connection profile has to have the proper authority to create libraries.
It's also possible that your IBM i isn't set up to allow this. I don't know exactly what goes into enabling this functionality, but where I work, we have one partition where the example shown above completes normally, and another partition where I get this instead:
pyodbc.Error: ('HY000', '[HY000] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0901 - SQL system error. (-901) (SQLExecDirectW)')
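For completeness, here is a minimal sketch of wrapping that call in Python with a bind parameter, so any CL command string can be passed in. The connection details are the placeholders from the question, and this assumes the QSYS2.QCMDEXC stored procedure is available and your profile has the proper authority:
import pyodbc

# Assumed connection details, as in the question
connection = pyodbc.connect(
    driver='{iSeries Access ODBC Driver}',
    system='ip/hostname',
    uid='username',
    pwd='password')

def run_cl_command(conn, cl_command):
    # Pass the CL command string as a bind parameter to the QSYS2.QCMDEXC stored procedure
    cur = conn.cursor()
    cur.execute("CALL QSYS2.QCMDEXC(?)", cl_command)
    conn.commit()

# Example: create the TEST library
run_cl_command(connection, "CRTLIB LIB(TEST)")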
This gist shows how to connect to an AS/400 via pyodbc:
https://gist.github.com/BietteMaxime/6cfd5b2dc2624c094575
A few notes: in this example, SYSTEM is the DSN you've set up for the AS/400 in the with pyodbc.connect statement. You could also switch this to SERVER and PORT with these modifications:
import pyodbc

class CommitMode:
    NONE = 0  # Commit immediate (*NONE) --> QSQCLIPKGN
    CS = 1    # Read committed (*CS) --> QSQCLIPKGS
    CHG = 2   # Read uncommitted (*CHG) --> QSQCLIPKGC
    ALL = 3   # Repeatable read (*ALL) --> QSQCLIPKGA
    RR = 4    # Serializable (*RR) --> QSQCLIPKGL

class ConnectionType:
    READ_WRITE = 0  # Read/Write (all SQL statements allowed)
    READ_CALL = 1   # Read/Call (SELECT and CALL statements allowed)
    READ_ONLY = 2   # Read-only (SELECT statements only)

def connstr(server, port, commit_mode=None, connection_type=None):
    _connstr = 'DRIVER=iSeries Access ODBC Driver;SERVER={server};PORT={port};SIGNON=4;CCSID=1208;TRANSLATE=1;'.format(
        server=server,
        port=port,
    )
    if commit_mode is not None:
        _connstr = _connstr + 'CommitMode=' + str(commit_mode) + ';'
    if connection_type is not None:
        _connstr = _connstr + 'ConnectionType=' + str(connection_type) + ';'
    return _connstr

def main():
    with pyodbc.connect(connstr('myas400.server.com', '8471', CommitMode.CHG, ConnectionType.READ_ONLY)) as db:
        cursor = db.cursor()
        cursor.execute(
            """
            SELECT * FROM IASP.LIB.FILE
            """
        )
        for row in cursor:
            print(' '.join(map(str, row)))

if __name__ == '__main__':
    main()
I cleaned up some PEP-8 as well. Good luck!
