Run complex SQL queries using Python - python-3.x

I am trying to run this SQL query by connecting pyodbc and SQL Server. The query selects columns from existing tables and performs a select into a global temp table.
sql_extract_query = """SELECT left(convert(varchar(8),dim_orderdate_key,112),6) as forYYYYMM,
CASE
WHEN ForProfile IN ('AA','AB','AC','AD','AE') then 'Stage 1'
WHEN ForProfile IN ('AF','AG','AH','AI') then 'Stage 2'
END AS VODAUK_Stage,
hasPaymentOK,
amount,
CASE
WHEN amount>0 AND amount <=50 THEN 'A £0000-£0050'
WHEN amount>50 AND amount<=100 THEN 'B £0050-£0100'
WHEN amount>100 AND amount<=150 THEN 'C £0100-£0150'
WHEN amount>150 AND amount<=200 THEN 'D £0150-£0200'
WHEN amount>2500 THEN 'I £2500+'
END AS ABC_Amt_Bucket,
CustID,
sumPaymentPaidAmount
INTO ##tmpCaseResult
FROM db as CR WITH (NOLOCK)
inner join db.[ConsumerAccount] byConsAcc on byConsAcc.consumeraccount_key = [CR].consumeraccount_key
inner join db.[CustomerAccount] byCustAcc on byCustAcc.customeraccount_key = [CR].customeraccount_key
WHERE byCustAcc.ForCustomer = 'ABC' AND
[amount] > 0 AND
CR.orderdate_key >= cast(convert(char(6),dateadd(month,-18,getdate()),112)+'01' as int)
AND [ForProfile] like 'A%';"""
Is there any way to run this complex query from Python? I have tried both pyodbc and pymssql, but the temp table was not created.
Any suggestions, please?
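A likely cause, rather than a pyodbc limitation: pyodbc opens connections with autocommit off, and a global temp table (##) is dropped as soon as the session that created it ends, so the table has to be created and read on one connection that either commits or runs in autocommit mode. A minimal sketch, assuming SQL Server; the driver name, server, and credentials are placeholders:
import pyodbc

# Placeholder connection details; adjust driver/server/auth to your setup.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;",
    autocommit=True,  # otherwise the SELECT ... INTO sits in an uncommitted transaction
)
cursor = conn.cursor()

# sql_extract_query is the batch from the question above.
# SET NOCOUNT ON suppresses the row-count messages that can otherwise
# make the driver appear to return nothing from the batch.
cursor.execute("SET NOCOUNT ON; " + sql_extract_query)

# ##tmpCaseResult lives only as long as this connection, so read it here,
# on the same connection, before closing.
count = cursor.execute("SELECT COUNT(*) FROM ##tmpCaseResult").fetchone()[0]
print(count)
If the results need to outlive the script, a regular table is the safer choice.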

Related

Convert SQL query to SQLAlchemy Postgres

I am new to Python and I need to convert a nested SQL query to SQLAlchemy for a Postgres DB.
I would like to use session.query with filtering, grouping, and ordering.
For better understanding, here are a couple of complicated examples.
Case 1
SELECT date(to_timestamp(table1.timestamp)) AS date, table1.loc_a
FROM table1
WHERE ((table1.loc_b = true) AND (NOT ((table1.loc_c)::text IN (SELECT table2.loc_a FROM table2))) AND (table1.text !~~ '/%'::text));
Case 2
SELECT DISTINCT table1.id, to_timestamp(table1.timestamp) AS date, table1.loc_a, table1.loc_b
FROM table1
WHERE ((table1.loc_k = false) AND (NOT ((table1.loc_c)::text IN (SELECT table2.loc_a FROM table2))) AND (table1.text !~~ '/%'::text));
Thank you for your help in advance
For Case 1, I got this far:
session.query(func.date(func.to_timestamp(table1.timestamp)), table1.loc_a)
.filter(...)
I don't know what comes next.
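A sketch of how Case 1 might continue, assuming declarative models Table1 and Table2 (hypothetical names) mapping the two tables: ~ combined with .in_() expresses the NOT IN subquery, and .notlike() corresponds to the !~~ operator.
from sqlalchemy import Text, cast, func

# Table1/Table2 are hypothetical mapped classes mirroring the question's columns.
subquery = session.query(Table2.loc_a)

case1 = (
    session.query(
        func.date(func.to_timestamp(Table1.timestamp)).label("date"),
        Table1.loc_a,
    )
    .filter(Table1.loc_b.is_(True))                   # table1.loc_b = true
    .filter(~cast(Table1.loc_c, Text).in_(subquery))  # NOT (loc_c::text IN (...))
    .filter(Table1.text.notlike("/%"))                # table1.text !~~ '/%'
)
Case 2 is the same shape with .distinct() added, loc_k filtered with .is_(False), and the extra columns in the select list.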

Split SQL WHERE IN Clause into Smaller Requests in Python When the List Is Too Big

I have set up an AWS Lambda function in Python that ingests requests from a CSV and then queries an AWS Serverless Aurora PostgreSQL database based on those requests. The function works when there are fewer than 1K requests, but beyond that I get errors due to a hard limit in the Data API. I am trying to figure out how to break the query into smaller queries once this limit is hit, but I am not sure how to do this in Python. Can someone suggest a way to break a request into smaller chunks so I do not hit the Data API limit?
Snippet of Code Used:
import awswrangler as wr

# Set database connection params
engine = wr.data_api.rds.connect(resource_arn=resource_arn, database=database_name, secret_arn=secret_arn)
# Read in the S3 CSV and select the ids
read_df = wr.s3.read_csv(path=s3_path_in)
requested_ids = read_df["ids"]
in_ID = requested_ids.tolist()
in_query = str(tuple(in_ID))
#query postgres
query_id = """
select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in %s
""" % in_query
out_ids = wr.data_api.rds.read_sql_query(query_id, engine)
One way that I can think of is to use the LIMIT <row_count> clause of PostgreSQL and dynamically pass the row_count to your query:
select c.*
from table1 b
INNER JOIN table2 c on b.id = c.id
where b.id in <>
order by <primary_key>
limit 999
PostgreSQL LIMIT
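Another approach is to do the chunking in Python before the SQL is built: split the ID list into fixed-size slices and run one query per slice, concatenating the results. A sketch along the lines of the snippet above; the 500-row chunk size is an arbitrary choice, and the int() cast assumes numeric IDs:
import awswrangler as wr
import pandas as pd

def query_in_chunks(ids, con, chunk_size=500):
    """Run the IN-list query in slices small enough for the Data API."""
    frames = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # int() keeps the string interpolation safe for numeric IDs.
        in_list = ", ".join(str(int(i)) for i in chunk)
        sql = (
            "select c.* from table1 b "
            "inner join table2 c on b.id = c.id "
            f"where b.id in ({in_list})"
        )
        frames.append(wr.data_api.rds.read_sql_query(sql, con))
    return pd.concat(frames, ignore_index=True)

out_ids = query_in_chunks(in_ID, engine)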

Getting an error while running an SQL script in ADW

I am getting an error that goes like this:
Insert values statement can contain only constant literal values or variable references.
These are the statements that produce the error:
INSERT INTO val.summary_numbers (metric_name, metric_val, dt_create) VALUES ('Total IP Enconters',
(SELECT
count(DISTINCT encounter_id)
FROM prod.encounter
WHERE encounter_type = 'Inpatient')
,
(SELECT min(mod_loadidentifier)
FROM ccsm.stg_demographics_baseline)
);
INSERT INTO val.summary_numbers (metric_name, metric_val, dt_create) VALUES ('Total 30d Readmits',
(SELECT
count(DISTINCT encounter_id)
FROM prod.encounter_attr
WHERE
attr_name = 'day_30_readmit' AND attr_value = 1)
,
(SELECT min(mod_loadidentifier)
FROM ccsm.stg_demographics_baseline));
Change your query like this:
insert into val.summary_numbers
select
'Total IP Enconters',
(select count(distinct encounter_id)
from prod.encounter
where encounter_type = 'Inpatient'),
(select min(mod_loadidentifier)
from ccsm.stg_demographics_baseline)
When using the ADW service, I would recommend that you consider using a CTAS operation, possibly combined with a RENAME, as sketched below. The RENAME is a metadata operation, so it is fast, and CTAS runs in parallel, whereas the INSERT INTO processes row by row.
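For illustration, here is what that might look like for the first insert, sketched as Synapse (ADW) SQL in a Python string as elsewhere on this page. The staging table name, the ROUND_ROBIN distribution, and the column aliases are assumptions, and further metric rows can be appended with UNION ALL:
# Sketch only: the names, distribution style, and aliases below are assumptions.
ctas_and_rename = """
CREATE TABLE val.summary_numbers_staged
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT 'Total IP Enconters' AS metric_name,
       e.metric_val,
       d.dt_create
FROM (SELECT COUNT(DISTINCT encounter_id) AS metric_val
      FROM prod.encounter
      WHERE encounter_type = 'Inpatient') e
CROSS JOIN (SELECT MIN(mod_loadidentifier) AS dt_create
            FROM ccsm.stg_demographics_baseline) d;

-- Swap the new table in; RENAME OBJECT is a fast metadata operation.
RENAME OBJECT val.summary_numbers TO summary_numbers_old;
RENAME OBJECT val.summary_numbers_staged TO summary_numbers;
"""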
You may still have a data-related issue that is hard to pin down without the CREATE TABLE statement.
Thanks

Query all database names, then subquery a specific table in each of those databases

I have several databases, each containing tables.
I would like to know how to list all my databases, excluding some such as 'information_schema' and 'mysql', and then run a subquery against a particular table in each of the databases returned by that first query.
For example, the structure would be:
db1 -> user-> id,name,imei,telephone,etc..
db2 -> user-> id,nameuser,imei,telephone,etc..
db3 -> user-> id,nameuser,imei,telephone,etc..
....
db1000 -> user-> id,nameuser,imei,telephone,etc..
My query looks like this, but it gets an error:
SELECT CONCAT('SELECT * FROM ' schema_name 'where imei.'schema_name = nameimai)
FROM information_schema.schemata
WHERE schema_name NOT IN ('information_schema','mysql','performance_schema','sys','performance_schema','phpmyadmin');
Results
db_name      id      name    imei        phone
----------   -----   ------  ---------   -----
db1 1 John 76876876 xxx
db2 2300 John 76876876 xxxx
...
db1000 45 John 76876876 xxx
Is it possible in one query?
Thanks.
Here's one way you could do it with a stored procedure.
If I understand correctly, you have multiple databases with identical tables (user) and you want to run a query against all these tables for a specific value.
I've made this fairly general so that you can pass in the table name and also the WHERE clause. Your example seemed to be looking for user records with imei = '76876876', so let's use that.
USE test;
DELIMITER //
DROP PROCEDURE IF EXISTS multidb_select //
-- escape any quotes in the query string
-- call multidb_select ('usertest','WHERE imei = \'76876876\'')
CREATE PROCEDURE multidb_select(IN tname VARCHAR(64), IN qwhere VARCHAR(1024))
READS SQL DATA
BEGIN
DECLARE vtable_schema VARCHAR(64);
DECLARE vtable_name VARCHAR(64);
DECLARE done BOOLEAN DEFAULT FALSE;
-- exclude views and system tables
DECLARE cur1 CURSOR FOR
SELECT `table_schema`, `table_name`
FROM `information_schema`.`tables`
WHERE `table_name` = tname
AND `table_type` = 'BASE TABLE'
AND `table_schema` NOT IN
('information_schema','mysql','performance_schema',
'sys','performance_schema','phpmyadmin')
ORDER BY `table_schema` ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;
OPEN cur1;
SET @unionall := '';
read_loop: LOOP
FETCH cur1 INTO vtable_schema, vtable_name;
IF done THEN
LEAVE read_loop;
END IF;
-- UNION ALL in case the id is the same
IF CHAR_LENGTH(@unionall) = 0 THEN
SET @unionall =
CONCAT("SELECT \'", vtable_schema , "\' AS 'Db', t.* FROM `",
vtable_schema, "`.`" , vtable_name, "` t ", qwhere);
ELSE
SET @unionall =
CONCAT(@unionall, " UNION ALL SELECT \'", vtable_schema ,
"\' AS 'Db', t.* FROM `", vtable_schema,
"`.`", vtable_name, "` t ", qwhere);
END IF;
END LOOP;
CLOSE cur1;
PREPARE stmt FROM @unionall;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END //
DELIMITER ;
Run it with
call test.multidb_select('user','WHERE imei = \'76876876\'')
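And, keeping with the Python flavour of this page, a hypothetical way to invoke the procedure with mysql-connector-python; the connection details are placeholders:
import mysql.connector

# Placeholder credentials.
conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="test")
cur = conn.cursor()
# callproc parameterizes the arguments, so no manual quote-escaping is needed.
cur.callproc("multidb_select", ("user", "WHERE imei = '76876876'"))
# The procedure's SELECT arrives as a stored result set.
for result in cur.stored_results():
    for row in result.fetchall():
        print(row)
conn.close()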

How to get subquery columns in the main query with WHERE EXISTS in PostgreSQL?

I am stuck with a query that spends most of its time in the JOINs. I want to use WHERE EXISTS in place of JOIN, since performance-wise EXISTS takes less time here.
I have modified the query and it executes as expected, but I am not able to use the subquery's columns in my main query.
Here is my query
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
WHERE EXISTS
(SELECT t.term_id,t.abbreviation
FROM terms t
WHERE (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id)
AND EXISTS
(SELECT st.grade_level,
st.sid
FROM students st
WHERE (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id)
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id
And I am getting this error:
ERROR: missing FROM-clause entry for table "st"
SQL state: 42P01
Character: 29
I have only used one column here as a sample, but I have to use more fields from the subquery.
My main aim is to achieve this with EXISTS, or with any alternative solution that is more optimal performance-wise.
I am using the pg module on Node.js, since my back end is Node.js.
UPDATE
Query with JOIN
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
LEFT JOIN terms t ON (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id
LEFT JOIN students st ON (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id
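For what it's worth, one way to keep the EXISTS-style filtering while still reading the subquery's columns is PostgreSQL's LATERAL join: the subquery may reference sch, and its columns become visible to the outer query. A sketch covering just the students part, written as a Python string to match the rest of this page (the same SQL runs unchanged from the Node pg client); it is illustrative, not a drop-in rewrite:
# CROSS JOIN LATERAL filters like EXISTS (rows without a matching student
# are dropped) but, unlike EXISTS, exposes the subquery's columns as st.*.
lateral_query = """
SELECT MAX(st.grade_level::integer) AS grades,
       scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON sch.school_id = scl.school_id
    AND scl.batch_id = sch.batch_id
    AND scl.client_id = sch.client_id
    AND sch.run_id = scl.run_id
CROSS JOIN LATERAL (
    SELECT s.grade_level, s.sid
    FROM students s
    WHERE s.sid = sch.student_id
      AND s.batch_id = sch.batch_id
      AND s.client_id = sch.client_id
      AND s.run_id = sch.run_id
) st
GROUP BY scl.sid, sch.course_name, sch.course_number, sch.school_id;
"""
The terms check can stay as a plain EXISTS, since no columns from terms are selected.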
