I have written a Python program that copies data from one Oracle DB to another. It uses threading to copy data concurrently over several threads.
import subprocess

# conn is assumed to be a SQL*Plus connect string defined elsewhere,
# e.g. user/password@tns_alias
def run_sqlplus(sqlplus_script):
    # Feed the script to sqlplus over stdin and capture its output
    p = subprocess.Popen(['sqlplus', conn], stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (stdout, stderr) = p.communicate(sqlplus_script.encode('utf-8'))
    stdout_lines = stdout.decode('utf-8').split("\n")
    return stdout_lines
The list below specifies the queries to be executed:
LST=["select COUNT(1) FROM a WHERE BUSINESS_DATE='30-Sep-2020' AND RUN_ID=1 AND PROCESSING_LOCATION='PAC';",
"select COUNT(1) FROM b WHERE BUSINESS_DATE='30-Sep-2020' AND RUN_ID=1 AND PROCESSING_LOCATION='PAC';",
"COPY FROM schema/pass#DB APPEND TABLE1 using select * FROM TABLE1 WHERE BUSINESS_DATE='30-Sep-2020' ;"
]
I have mapped the list (LST) onto the thread pool executor below:
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_threads) as executor:
    # execute_queries is assumed to wrap run_sqlplus for a single statement
    results = [executor.submit(execute_queries, i) for i in LST]
    for f in concurrent.futures.as_completed(results):
        print(f.result())
While running the code, the COPY commands seem to run for longer than expected. Is there any way I can monitor the progress of the threads in Python, or identify their SID/SERIAL# sessions in the Oracle DB?
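One possible approach, sketched below under two assumptions: the connecting user can read v$session, and execute_queries/run_sqlplus are reused from the snippets above. Python-side progress comes from polling each Future; database-side visibility comes from listing the sqlplus sessions in v$session.

import time
import concurrent.futures

# Assumed: the connecting user has SELECT privilege on v$session.
MONITOR_SQL = """
select sid, serial#, status, last_call_et
from v$session
where program like 'sqlplus%';
"""

with concurrent.futures.ThreadPoolExecutor(max_threads) as executor:
    futures = [executor.submit(execute_queries, i) for i in LST]
    while not all(f.done() for f in futures):
        print(sum(f.done() for f in futures), "of", len(futures), "queries finished")
        # Spawns one extra sqlplus session purely for monitoring;
        # it will also appear in the v$session output.
        for line in run_sqlplus(MONITOR_SQL):
            print(line)
        time.sleep(10)

If you need to tie a specific thread to a specific Oracle session, you could prepend a call such as exec dbms_application_info.set_client_info('thread-3'); to each script and filter on the client_info column of v$session.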
I have set up an AWS Lambda function with Python to ingest requests from a CSV and then query an AWS Serverless Aurora PostgreSQL database based on those requests. The function works when there are fewer than 1K requests, but I get errors due to a hard limit in the Data API. I am trying to figure out a way to break up the query into smaller queries once this limit is hit, but I am not sure how to do this in Python. Can someone suggest a way to break up a request into smaller chunks so I do not hit the Data API limit?
Snippet of Code Used:
import awswrangler as wr

# Set database connection params (resource_arn, database_name and secret_arn are defined elsewhere)
engine = wr.data_api.rds.connect(resource_arn=resource_arn, database=database_name, secret_arn=secret_arn)

# Read in the S3 CSV and select the ids
read_df = wr.s3.read_csv(path=s3_path_in)
requested_ids = read_df["ids"]
in_ID = requested_ids.tolist()
in_query = str(tuple(in_ID))

# Query Postgres
query_id = """
select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in %s
""" % in_query
out_ids = wr.data_api.rds.read_sql_query(query_id, engine)
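One way to stay under the limit is to split the id list into fixed-size batches and issue one query per batch, concatenating the results. A minimal sketch reusing the names above; the chunk size of 500 is an arbitrary assumption, and the ids are assumed to be numeric since they are interpolated into the SQL text:

import awswrangler as wr
import pandas as pd

CHUNK_SIZE = 500  # assumed batch size; pick something safely under the Data API limit

def query_in_chunks(ids, con, chunk_size=CHUNK_SIZE):
    frames = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # Build the IN list by hand to avoid the trailing comma that
        # str(tuple(chunk)) would produce for a one-element chunk.
        in_list = "(" + ", ".join(str(i) for i in chunk) + ")"
        sql = """
        select c.*
        from table1 b
        INNER JOIN table2 c on b.id = c.id
        where b.id in %s
        """ % in_list
        frames.append(wr.data_api.rds.read_sql_query(sql, con))
    return pd.concat(frames, ignore_index=True)

out_ids = query_in_chunks(in_ID, engine)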
One way that I can think of is to use the LIMIT <row_count> clause of PostgreSQL and dynamically pass the row_count to your query.
select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in <>
order by <primary_key>
limit 999
See the PostgreSQL documentation on LIMIT.
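A hedged sketch of how that could look, paging through the result set with LIMIT and OFFSET until it is exhausted. The page size of 999, the order by c.id clause (OFFSET paging needs a stable ordering), and the reuse of in_query and engine from the question are all assumptions:

import awswrangler as wr
import pandas as pd

PAGE_SIZE = 999  # assumed; keep each response under the Data API limit

def query_in_pages(con, in_query, page_size=PAGE_SIZE):
    frames, offset = [], 0
    while True:
        sql = """
        select c.*
        from table1 b
        INNER JOIN table2 c on b.id = c.id
        where b.id in %s
        order by c.id
        limit %d offset %d
        """ % (in_query, page_size, offset)
        page = wr.data_api.rds.read_sql_query(sql, con)
        if page.empty:
            break  # no more rows to fetch
        frames.append(page)
        offset += page_size
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

out_ids = query_in_pages(engine, in_query)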
I have a variable SCRIPT which holds two to three DML statements. I want to run them sequentially after connecting to my Oracle DB. I have tried the below, but it fails with this error:
c.execute(SCRIPT)
cx_Oracle.DatabaseError: ORA-00933: SQL command not properly ended
Below is the code I tried:
SCRIPT="""UPDATE IND_AFRO.DRIVER
SET Emp_Id = 1000, update_user_id = 'RIBST-4059'
WHERE Emp_Id IN (SELECT Emp_Id
FROM IND_AFRO.DRIVER Ddq
WHERE NOT EXISTS
(SELECT 1
FROM IND_AFRO_AF.EMPLOYEE
WHERE Emp_Id = Ddq.Emp_Id)
AND Functional_Area_Cd = 'DC');
UPDATE IND_AFRO.APPOINTMENTS
SET Emp_Id = 1000, update_user_id = 'RIBST-4059'
WHERE Emp_Id IN (SELECT Emp_Id
FROM IND_AFRO.APPOINTMENTS Ddq
WHERE NOT EXISTS
(SELECT 1
FROM IND_AFRO_AF.EMP
WHERE Emp_Id = Ddq.Emp_Id));
UPDATE IND_AFRO.ar_application_for_aid a
SET a.EMP_ID = 1000
WHERE NOT EXISTS
(SELECT 1
FROM IND_AFRO_AF.EMP
WHERE emp_id = a.emp_id);"""
import cx_Oracle

conn = cx_Oracle.connect(user=r'SYSTEM', password='ssadmin', dsn=CONNECTION)
c = conn.cursor()
c.execute(SCRIPT)
c.close()
The cursor methods execute() and executemany() work on only one SQL or PL/SQL statement at a time.
You can wrap the three statements in a PL/SQL BEGIN/END block like:
SQL> begin
2 insert into test values(1);
3 update test set a = 2;
4 end;
5 /
PL/SQL procedure successfully completed.
Alternatively you can split your string into individual statements. If the statements originate from a file, you can write a wrapper that reads the file and executes each statement. This is a lot easier if you restrict the SQL syntax (particularly regarding line terminators). For an example, see https://github.com/oracle/python-cx_Oracle/blob/master/samples/SampleEnv.py#L116
However this means calling execute() multiple times, which requires more round trips and isn't as efficient as the first solution.
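A sketch of both options in cx_Oracle, reusing conn and SCRIPT from the question (run one option or the other, not both). The naive split on ';' is an assumption that holds here only because no statement contains a semicolon inside a string literal:

cur = conn.cursor()

# Option 1: one round trip - run all three updates as a single anonymous PL/SQL block.
# The UPDATE statements inside already end with ';', which is valid PL/SQL.
cur.execute("begin\n" + SCRIPT + "\nend;")

# Option 2: split the script and execute each statement separately.
for statement in SCRIPT.split(';'):
    if statement.strip():
        cur.execute(statement)

conn.commit()
cur.close()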
Some of my tables are of type REPLICATE. I would like these tables to actually be replicated (not pending) before I start querying my data. This will help me avoid data movement.
I have a script, which I found online, that runs in a loop and does a SELECT TOP 1 on all the tables that are set for replication, but sometimes the script runs for hours. It seems the server sometimes won't trigger replication even if you do a SELECT TOP 1 from foo.
How can you force SQL Data Warehouse to complete replication?
The script looks something like this:
begin
CREATE TABLE #tbl
WITH
( DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
ROW_NUMBER() OVER(
ORDER BY
(
SELECT
NULL
)) AS Sequence
, CONCAT('SELECT TOP(1) * FROM ', s.name, '.', t.[name]) AS sql_code
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t
ON t.object_id = p.object_id
JOIN sys.schemas AS s
ON t.schema_id = s.schema_id
WHERE p.[state] = 'NotReady';
DECLARE #nbr_statements INT=
(
SELECT
COUNT(*)
FROM #tbl
), #i INT= 1;
WHILE #i <= #nbr_statements
BEGIN
DECLARE #sql_code NVARCHAR(4000)= (SELECT
sql_code
FROM #tbl
WHERE Sequence = #i);
EXEC sp_executesql #sql_code;
SET #i+=1;
END;
DROP TABLE #tbl;
SET #i = 0;
WHILE
(
SELECT TOP (1)
p.[state]
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t
ON t.object_id = p.object_id
JOIN sys.schemas AS s
ON t.schema_id = s.schema_id
WHERE p.[state] = 'NotReady'
) = 'NotReady'
BEGIN
IF #i % 100 = 0
BEGIN
RAISERROR('Replication in progress' , 0, 0) WITH NOWAIT;
END;
SET #i = #i + 1;
END;
END
Henrik, if 'select top 1' doesn't trigger a replicated table build, then that would be a defect. Please file a support ticket.
Without looking at your system, it is impossible to know exactly what is going on. Here are a couple of things that could be factoring into the extended build time:
The replicated tables are large (size, not necessarily rows) requiring long build times.
There are a lot of secondary indexes on the replicated table requiring long build times.
Replicated table builds require staticrc20 (2 concurrency slots). If the concurrency slots are not available, the build will queue behind other running queries.
The replicated tables are constantly being modified with inserts, updates and deletes. Modifications require the table to be built again.
The best way is to run a command like this as part of the job which creates/updates the table:
select top 1 * from <table>
That will force its redistribution at the correct time, without the slow loop through the stored procedure.
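If the load job is driven from Python, here is a hedged sketch of the same idea with pyodbc: find the not-yet-replicated tables via the same DMV the question's script uses, then touch each one with SELECT TOP 1. The connection string is a placeholder.

import pyodbc

# Placeholder connection string; substitute your own server, database and credentials.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
cur = conn.cursor()

# Find replicated tables whose cache is not yet built (same DMV as the script above).
cur.execute("""
    select s.name, t.name
    from sys.pdw_replicated_table_cache_state p
    join sys.tables t on t.object_id = p.object_id
    join sys.schemas s on t.schema_id = s.schema_id
    where p.[state] = 'NotReady'
""")
not_ready = cur.fetchall()

# Touch each table once with SELECT TOP 1 to trigger the cache rebuild.
for schema_name, table_name in not_ready:
    cur.execute("select top 1 * from [%s].[%s]" % (schema_name, table_name))
    cur.fetchall()  # drain the result so the batch completes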
I need to get the list of tables used in a stored procedure. However, in Azure Data Warehouse sp_depends is not supported.
The alternative I thought of is to get the stored proc code from INFORMATION_SCHEMA.ROUTINES and then run a script to extract the [schema].[tablename] references from the procedure definition, but the issue there is storing the whole procedure in a variable: VARCHAR(MAX) has a limit of 8000 characters, and if my proc exceeds this limit then I won't be able to get the complete table list.
Try using sys.sql_expression_dependencies. The following query may help you:
SELECT ReferencingObjectType = o1.type,
ReferencingObject = SCHEMA_NAME(o1.schema_id)+'.'+o1.name,
ReferencedObject = SCHEMA_NAME(o2.schema_id)+'.'+ed.referenced_entity_name,
ReferencedObjectType = o2.type
FROM sys.sql_expression_dependencies ed
INNER JOIN sys.objects o1
ON ed.referencing_id = o1.object_id
INNER JOIN sys.objects o2
ON ed.referenced_id = o2.object_id
WHERE o1.type in ('P','TR','V', 'TF')
ORDER BY ReferencingObjectType, ReferencingObject
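To narrow this to the tables referenced by one specific procedure from Python, here is a hedged sketch with pyodbc; the connection string is a placeholder, my_proc is a hypothetical procedure name, and the WHERE clause on the referencing object's name is the only change to the query above:

import pyodbc

# Placeholder connection string; fill in your own server and credentials.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
cur = conn.cursor()

proc_name = "my_proc"  # hypothetical procedure name
cur.execute("""
    SELECT DISTINCT SCHEMA_NAME(o2.schema_id) + '.' + ed.referenced_entity_name
    FROM sys.sql_expression_dependencies ed
    INNER JOIN sys.objects o1 ON ed.referencing_id = o1.object_id
    INNER JOIN sys.objects o2 ON ed.referenced_id = o2.object_id
    WHERE o1.type = 'P' AND o1.name = ?
""", proc_name)
for (referenced_table,) in cur.fetchall():
    print(referenced_table)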
I have a preview version of Azure SQL Data Warehouse running, which was working fine until I imported a large table (~80 GB) through BCP. Now none of the tables, including the small ones, respond even to a simple query:
select * from <MyTable>
Queries against the sys tables still work:
select * from sys.objects
The BCP process was left running over the weekend, so any statistics update should have finished by now. Is there any way to figure out what is making this happen, or at least what is currently running, to see if anything is blocking?
I'm using SQL Server Management Studio 2014 to connect to the Data Warehouse and execute queries.
#user5285420 - run the code below to get a good view of what's going on. You should be able to find the query easily by looking at the value in the "command" column. Can you confirm if the BCP command still shows as status="Running" when the query steps are all complete?
select top 50
(case when requests.status = 'Completed' then 100
when progress.total_steps = 0 then 0
else 100 * progress.completed_steps / progress.total_steps end) as progress_percent,
requests.status,
requests.request_id,
sessions.login_name,
requests.start_time,
requests.end_time,
requests.total_elapsed_time,
requests.command,
errors.details,
requests.session_id,
(case when requests.resource_class is NULL then 'N/A'
else requests.resource_class end) as resource_class,
(case when resource_waits.concurrency_slots_used is NULL then 'N/A'
else cast(resource_waits.concurrency_slots_used as varchar(10)) end) as concurrency_slots_used
from sys.dm_pdw_exec_requests AS requests
join sys.dm_pdw_exec_sessions AS sessions
on (requests.session_id = sessions.session_id)
left join sys.dm_pdw_errors AS errors
on (requests.error_id = errors.error_id)
left join sys.dm_pdw_resource_waits AS resource_waits
on (requests.resource_class = resource_waits.resource_class)
outer apply (
select count (steps.request_id) as total_steps,
sum (case when steps.status = 'Complete' then 1 else 0 end ) as completed_steps
from sys.dm_pdw_request_steps steps where steps.request_id = requests.request_id
) progress
where requests.start_time >= DATEADD(hour, -24, GETDATE())
ORDER BY requests.total_elapsed_time DESC, requests.start_time DESC
Check out the resource utilization and possibly other issues from https://portal.azure.com/
You can also run sp_who2 from SSMS to get a snapshot of which threads are active and whether there's some crazy blocking chain that's causing problems.