Understanding SQL queries - sql-query-store

I'm not an expert in SQL, hence this question. We use the query below and I want to understand how it's processed. The same table is used in all the subqueries. What I want to know is: is it evaluated top-down or bottom-up? How should I read this query?
CREATE TABLE <tablename1>
select cr_latest_canumber.CAGID,
       cr_latest_canumber.GFCID,
       cast(cr_renewal.FAC_LAST_RENEWAL_DATE as STRING) as FAC_LAST_RENEWAL_DATE,
       cr_cust_total.CUST_TOTAL_CA_LTM,
       cr_latest_canumber.ca_number as LATEST_CA_NUMBER,
       cr_latest_canumber.sponsoring_officer as CA_SPONSORING_OFFICER,
       cr_latest_canumber.rorc as CA_RORC_PCT,
       cr_latest_canumber.rorc_bucket as CA_RORC_BUCKET,
       cr_latest_canumber.ca_segment as CA_CLIENT_SEGMENT
from (select S.cagid, S.ca_number, S.sponsoring_officer, S.rorc, S.ca_segment, case
when cast(rorc as decimal(8,2)) <= 10 then "0%-10%"
when cast(rorc as decimal(8,2)) > 10 and cast(rorc as decimal(8,2)) <= 20 then "10%-20%"
when cast(rorc as decimal(8,2)) > 20 and cast(rorc as decimal(8,2)) <= 30 then "20%-30%"
when cast(rorc as decimal(8,2)) > 30 and cast(rorc as decimal(8,2)) <= 40 then "30%-40%"
when cast(rorc as decimal(8,2)) > 40 and cast(rorc as decimal(8,2)) <= 50 then "40%-50%"
when cast(rorc as decimal(8,2)) > 50 then ">50%"
else 'null'
end rorc_bucket
FROM
( select distinct cagid, ca_number, sponsoring_officer, rorc, ca_segment, to_date(from_unixtime(max(ca_booked_date) DIV 1000)), rank() over (partition by cagid order by to_date(from_unixtime(max(ca_booked_date) DIV 1000)) desc) as r
from
<tablename>
where
((ca_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') )
) S where S.r = 1
) cr_latest_canumber left outer join
(
select to_date(from_unixtime(max(ca_booked_date) DIV 1000)) as FAC_LAST_RENEWAL_DATE, cagid, gfcid from
<tablename>
where
CREDIT_STATUS IN ('','') and
((ca_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='')
) group by cagid, gfcid
) cr_renewal on cr_latest_canumber.cagid = cr_renewal.cagid and cr_latest_canumber.gfcid = cr_renewal.gfcid left outer join
(
select credit.cagid, count(distinct(ca_number)) as CUST_TOTAL_CA_LTM
from <tablename> credit, (select report_date_12, report_date from <tablename2> cobdt where row_num=1) cobdt
where
to_date(from_unixtime(ca_creation_date DIV 1000)) between cobdt.report_date_12 and cobdt.report_date
and CREDIT_STATUS NOT IN ('','')
and credit.cagid = credit.gfcid
group by credit.cagid
) cr_cust_total on cr_renewal.cagid=cr_cust_total.cagid;

To understand the query, split the nested query into its pieces and run each one separately to get a clear idea of what it does.
There are 3 subqueries, which are then left-outer-joined to generate the new table. Logically the engine evaluates the innermost derived tables first and then joins their results, so you can read it bottom-up (the optimizer may still physically reorder the work).
Query 1: cr_latest_canumber
select * from (
select distinct cagid, ca_number, sponsoring_officer, rorc, ca_segment, to_date(from_unixtime(max(ca_booked_date) DIV 1000)), rank() over (partition by cagid order by to_date(from_unixtime(max(ca_booked_date) DIV 1000)) desc) as r
from
<tablename>
where
((ca_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') )
) S where S.r = 1
Query 2: cr_renewal
select to_date(from_unixtime(max(ca_booked_date) DIV 1000)) as FAC_LAST_RENEWAL_DATE, cagid, gfcid from
<tablename>
where
CREDIT_STATUS IN ('','') and
((ca_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='') or
(ca_type='' and review_type='')
) group by cagid, gfcid
Query 3: cr_cust_total
select credit.cagid, count(distinct(ca_number)) as CUST_TOTAL_CA_LTM
from <tablename> credit, (select report_date_12, report_date from <tablename2> cobdt where row_num=1) cobdt
where
to_date(from_unixtime(ca_creation_date DIV 1000)) between cobdt.report_date_12 and cobdt.report_date
and CREDIT_STATUS NOT IN ('','')
and credit.cagid = credit.gfcid
group by credit.cagid
Final query (with the three subqueries referred to by their aliases):
CREATE TABLE <tablename1>
select cr_latest_canumber.CAGID,
       cr_latest_canumber.GFCID,
       cast(cr_renewal.FAC_LAST_RENEWAL_DATE as STRING) as FAC_LAST_RENEWAL_DATE,
       cr_cust_total.CUST_TOTAL_CA_LTM,
       cr_latest_canumber.ca_number as LATEST_CA_NUMBER,
       cr_latest_canumber.sponsoring_officer as CA_SPONSORING_OFFICER,
       cr_latest_canumber.rorc as CA_RORC_PCT,
       cr_latest_canumber.rorc_bucket as CA_RORC_BUCKET,
       cr_latest_canumber.ca_segment as CA_CLIENT_SEGMENT
from (select S.cagid, S.ca_number, S.sponsoring_officer, S.rorc, S.ca_segment,
case
when cast(rorc as decimal(8,2)) <= 10 then "0%-10%"
when cast(rorc as decimal(8,2)) > 10 and cast(rorc as decimal(8,2)) <= 20 then "10%-20%"
when cast(rorc as decimal(8,2)) > 20 and cast(rorc as decimal(8,2)) <= 30 then "20%-30%"
when cast(rorc as decimal(8,2)) > 30 and cast(rorc as decimal(8,2)) <= 40 then "30%-40%"
when cast(rorc as decimal(8,2)) > 40 and cast(rorc as decimal(8,2)) <= 50 then "40%-50%"
when cast(rorc as decimal(8,2)) > 50 then ">50%" else 'null'
end rorc_bucket
FROM
(cr_latest_canumber left outer join
cr_renewal on cr_latest_canumber.cagid = cr_renewal.cagid and cr_latest_canumber.gfcid = cr_renewal.gfcid left outer join
cr_cust_total on cr_renewal.cagid=cr_cust_total.cagid);
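If you want to verify each piece yourself, one convenient way is to register each subquery as a temporary view and inspect its output before running the final join. A minimal sketch, assuming you run this through Spark with Hive support; the query1_sql/query2_sql/query3_sql strings are placeholders you paste Queries 1-3 into:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Paste Queries 1-3 from above into these strings (placeholders here).
query1_sql = "..."  # Query 1: latest CA number per cagid
query2_sql = "..."  # Query 2: last renewal date per cagid/gfcid
query3_sql = "..."  # Query 3: distinct CA count over the trailing period

# Register each subquery as a named view, then eyeball a few rows.
for name, sql in [("cr_latest_canumber", query1_sql),
                  ("cr_renewal", query2_sql),
                  ("cr_cust_total", query3_sql)]:
    spark.sql(sql).createOrReplaceTempView(name)
    spark.table(name).show(5)

# The final CREATE TABLE can then reference the three views by name,
# exactly as in the "Final query" shorthand above.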

Related

No output when one side of UNION is empty

I have an Azure Stream Analytics job that combines the results of multiple queries and outputs them to the same sink. To do this, I define my queries within a WITH statement, combine them using UNION, and then write them to my sink. Unfortunately, I only get output to my sink when all of my queries actually have output, and this is where it goes wrong.
I have some queries that continuously (every 5 minutes) give an output, but I also have some queries that rarely give an output (maybe a few times per day). This causes the sink to get no results until all of the queries have something to return. Does anyone know why this is and how I can fix it? Shouldn't UNION also give results when set A has results but set B doesn't? I'm running this locally in VS Code, with a live connection to Event Hub, by the way.
Here is a simplified example of 2 queries (one with frequent output, one with infrequent output) that goes wrong:
WITH
HarmonizedMeasurements AS (
SELECT
CAST(EHHARM.TimeStamp AS datetime) AS "TimeStamp",
CAST(EHHARM.ValueNumber AS float) AS "ValueNumber",
EHHARM.ValueBit AS "ValueBit",
EHHARM.MeasurementName,
EHHARM.PartName,
EHHARM.ElementId,
EHHARM.ElementName,
EHHARM.ObjectName,
EHHARM.TranslationTableId
FROM EventHubHarmonizedMeasurements AS EHHARM TIMESTAMP BY "TimeStamp"
PARTITION BY TranslationTableId
),
ToerenAandrijvingCategoriesMeasurements AS (
SELECT
AANDRCAT.TimeStamp AS "TimeStamp",
AANDRCAT.ValueNumber AS "ValueNumber",
AANDRCAT.ValueBit AS "ValueBit",
AANDRCAT.MeasurementName AS "MeasurementName",
AANDRCAT.PartName AS "PartName",
AANDRCAT.ElementId AS "ElementId",
AANDRCAT.ElementName AS "ElementName",
AANDRCAT.ObjectName AS "ObjectName",
AANDRCAT.TranslationTableId AS "TranslationTableId",
CASE
WHEN (-5000 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= -1000) THEN 'Dalen'
WHEN (-1000 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= -200) THEN 'Dalen Retarderen'
WHEN (-200 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= 0) THEN 'Stilstand'
WHEN (0 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= 250) THEN 'Nivelleren'
WHEN (250 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= 400) THEN 'Heffen Retarderen'
WHEN (400 < AANDRCAT.ValueNumber AND AANDRCAT.ValueNumber <= 5000) THEN 'Heffen'
ELSE 'NoCategory'
END AS "Category"
FROM HarmonizedMeasurements AS AANDRCAT
WHERE
AANDRCAT.ObjectName LIKE 'Schutsluis%' AND
AANDRCAT.MeasurementName = 'Motortoerental terugkoppeling' AND
AANDRCAT.ValueNumber <> 0
),
AandrijvingCatStartMeasurements AS (
SELECT
AANDRCAT.TimeStamp AS "StartTime",
AANDRCAT.Category AS "Category",
AANDRCAT.ElementId AS "ElementId",
AANDRCAT.TranslationTableId AS "TranslationTableId"
FROM ToerenAandrijvingCategoriesMeasurements AS AANDRCAT
WHERE
LAG(Category, 1) OVER (PARTITION BY ElementId LIMIT DURATION(day, 1)) <> Category
),
AandrijvingCatEndMeasurements AS (
SELECT
AANDRST.StartTime AS "EndTime",
LAG(AANDRST.StartTime, 1) OVER (PARTITION BY ElementId LIMIT DURATION(day, 1)) AS "StartTime",
LAG(AANDRST.Category, 1) OVER (PARTITION BY ElementId LIMIT DURATION(day, 1)) AS "Category",
AANDRST.ElementId AS "ElementId",
AANDRST.TranslationTableId AS "TranslationTableId"
FROM AandrijvingCatStartMeasurements AS AANDRST
),
VermogenAandrijvingMeasurements AS (
SELECT
AANDRVER.TimeStamp AS "TimeStamp",
AANDRVER.ValueNumber AS "ValueNumber",
AANDRVER.ValueBit AS "ValueBit",
CONCAT(AANDRVER.MeasurementName, ' ', AANDREN.Category) AS "MeasurementName",
AANDRVER.PartName AS "PartName",
AANDRVER.ElementId AS "ElementId",
AANDRVER.ElementName AS "ElementName",
AANDRVER.ObjectName AS "ObjectName",
AANDRVER.TranslationTableId AS "TranslationTableId"
FROM HarmonizedMeasurements AS AANDRVER
LEFT JOIN AandrijvingCatEndMeasurements AS AANDREN
ON DATEDIFF(minute, AANDRVER, AANDREN) BETWEEN 0 AND 30 AND
AANDRVER.TimeStamp >= AANDREN.StartTime AND
AANDRVER.TimeStamp < AANDREN.EndTime AND
AANDRVER.ElementId = AANDREN.ElementId AND
AANDRVER.TranslationTableId = AANDREN.TranslationTableId
INNER JOIN SQLMeasurementType AS MEAS
ON MEAS.Name = CONCAT(AANDRVER.MeasurementName, ' ', AANDREN.Category)
WHERE
AANDRVER.ObjectName LIKE 'Schutsluis%' AND
AANDRVER.MeasurementName = 'Vermogen'
),
LockDoorTop AS (
SELECT
Lock.TimeStamp AS "TimeStamp",
Lock.ValueNumber AS "ValueNumber",
Lock.ValueBit AS "ValueBit",
Lock.MeasurementName,
Lock.PartName,
Lock.ElementId,
Lock.ElementName,
Lock.ObjectName,
Lock.TranslationTableId
FROM HarmonizedMeasurements AS Lock
WHERE
Lock.MeasurementName = 'Sluisdeur open' AND
Lock.ElementName = 'Deur sluiskolk 1' AND
Lock.PartName = 'Bovenhoofd' AND
Lock.ValueBit = '1'
),
WaterLTop AS (
SELECT
WaterTop.TimeStamp AS "TimeStamp",
WaterTop.ValueNumber AS "ValueNumber",
WaterTop.ValueBit AS "ValueBit",
WaterTop.MeasurementName,
WaterTop.PartName,
WaterTop.ElementId,
WaterTop.ElementName,
WaterTop.ObjectName,
WaterTop.TranslationTableId
FROM HarmonizedMeasurements AS WaterTop
WHERE
WaterTop.MeasurementName = 'Waterniveaumeting' AND
WaterTop.ElementName = 'Sluiskolk 1' AND
WaterTop.PartName = 'Opvaartzijde'
),
WaterLLock AS (
SELECT
WaterLock.TimeStamp AS "TimeStamp",
WaterLock.ValueNumber AS "ValueNumber",
WaterLock.ValueBit AS "ValueBit",
WaterLock.MeasurementName,
WaterLock.PartName,
WaterLock.ElementId,
WaterLock.ElementName,
WaterLock.ObjectName,
WaterLock.TranslationTableId
FROM HarmonizedMeasurements AS WaterLock
WHERE
WaterLock.MeasurementName = 'Waterniveaumeting' AND
WaterLock.ElementName = 'Sluiskolk 1' AND
WaterLock.PartName = 'Sluiskamer'
),
WaterLevelTopMeasurements AS (
SELECT
LockDoor.TimeStamp AS "TimeStamp",
CAST(ROUND((WaterLevelLock.ValueNumber - WaterLevelTop.ValueNumber), 2) AS float) AS "ValueNumber",
null AS "ValueBit",
MEAS.Name AS "MeasurementName",
LockDoor.PartName AS "PartName",
LockDoor.ElementId AS "ElementId",
LockDoor.ElementName AS "ElementName",
LockDoor.ObjectName AS "ObjectName",
LockDoor.TranslationTableId AS "TranslationTableId"
FROM LockDoorTop AS LockDoor
JOIN WaterLTop AS WaterLevelTop
ON DATEDIFF(minute, LockDoor, WaterLevelTop) BETWEEN 0 AND 1 AND
LockDoor.ObjectName = WaterLevelTop.ObjectName
JOIN WaterLLock AS WaterLevelLock
ON DATEDIFF(minute, LockDoor, WaterLevelLock) BETWEEN 0 AND 1 AND
LockDoor.ObjectName = WaterLevelLock.ObjectName
INNER JOIN SQLMeasurementType AS MEAS
ON MEAS.Name = 'Waterniveauverschil'
),
-- Combine queries
DatalakeCombinedMeasurements AS (
SELECT * FROM VermogenAandrijvingMeasurements
UNION
SELECT * FROM WaterLevelTopMeasurements
)
-- Output data
SELECT *
INTO DatalakeHarmonizedMeasurements
FROM DatalakeCombinedMeasurements
PARTITION BY TranslationTableId

Python how to pass variables to SQLite complex SQL update Query

I have this SQL query that I confirmed works in SQLite. It updates two columns in the table. I have 144 columns that need to be updated using the same query. How can I, using Python, pass variables along so I can use the same query to update all of them?
Here is my query to update one column:
UPDATE GBPAUD_TA AS t1
SET _1m_L3_Time = COALESCE(
(
SELECT
MIN(
CASE t1.Action
WHEN 'Buy' THEN CASE WHEN (t2._1M_55 >= t2.Low AND t2._1M_55 < t2.Open) THEN t2.Date_Time END
WHEN 'Sell' THEN CASE WHEN (t2._1M_55 <= t2.High AND t2._1M_55 < t2.Open) THEN t2.Date_Time END
END
)
FROM GBPAUD_DATA t2
WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time
),
t1._1m_L3_Time
);
UPDATE GBPAUD_TA
SET _1m_L3_Price = (SELECT _1M_55
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA._1m_L3_Time)
where EXISTS (SELECT _1M_55
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA._1m_L3_Time)
Here is my query showing the variables that I would need to automatically insert:
UPDATE GBPAUD_TA AS t1
SET Variable1 = COALESCE(
(
SELECT
MIN(
CASE t1.Action
WHEN 'Buy' THEN CASE WHEN (t2.Variable2 >= t2.Low AND t2.Variable2 < t2.Open) THEN t2.Date_Time END
WHEN 'Sell' THEN CASE WHEN (t2.Variable2 <= t2.High AND t2.Variable2 < t2.Open) THEN t2.Date_Time END
END
)
FROM GBPAUD_DATA t2
WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time
),
t1.Variable1
);
UPDATE GBPAUD_TA
SET Variable3 = (SELECT Variable2
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA.Variable1)
where EXISTS (SELECT Variable2
FROM GBPAUD_DATA
WHERE Date_Time = GBPAUD_TA.Variable1)
I have a total of 3 Variables.
Based upon googling and reading, I found a possible way using host variables: use "?" in place of each variable, combine the variables into a tuple, and then use executemany()?
I tried this, but it did not work. It gave me an error:
"cursor.executemany(sql_update_query, SLTuple)
OperationalError: near "?": syntax error"
So what should I do? Any guidance is much appreciated!
Found the answer after I figured out the proper terminology: string formatting and interpolation.
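To spell that out: SQLite's "?" placeholders can only bind values, never column names, which is why executemany() raised OperationalError here; all three variables are identifiers, so they have to be spliced into the SQL with string formatting (only ever from a hard-coded list, never from user input). A minimal sketch, assuming the remaining column triples follow the naming pattern of the first one (only the first triple appears in the question):

import sqlite3

# (time_col, ma_col, price_col) triples; extend to all 144 columns.
column_sets = [
    ("_1m_L3_Time", "_1M_55", "_1m_L3_Price"),
    # ... the remaining (assumed) triples go here
]

update_time_sql = """
UPDATE GBPAUD_TA AS t1
SET {time_col} = COALESCE(
    (SELECT MIN(CASE t1.Action
        WHEN 'Buy'  THEN CASE WHEN (t2.{ma_col} >= t2.Low  AND t2.{ma_col} < t2.Open) THEN t2.Date_Time END
        WHEN 'Sell' THEN CASE WHEN (t2.{ma_col} <= t2.High AND t2.{ma_col} < t2.Open) THEN t2.Date_Time END
     END)
     FROM GBPAUD_DATA t2
     WHERE t2.Date_Time >= t1.Open_Date AND t2.Date_Time <= t1.New_Closing_Time),
    t1.{time_col})
"""

update_price_sql = """
UPDATE GBPAUD_TA
SET {price_col} = (SELECT {ma_col} FROM GBPAUD_DATA
                   WHERE Date_Time = GBPAUD_TA.{time_col})
WHERE EXISTS (SELECT {ma_col} FROM GBPAUD_DATA
              WHERE Date_Time = GBPAUD_TA.{time_col})
"""

conn = sqlite3.connect("mydata.db")  # assumed database file
with conn:  # commits on success, rolls back on error
    for time_col, ma_col, price_col in column_sets:
        conn.execute(update_time_sql.format(time_col=time_col, ma_col=ma_col))
        conn.execute(update_price_sql.format(price_col=price_col,
                                             ma_col=ma_col, time_col=time_col))
conn.close()

If any of the update needed actual data values rather than column names, those would still go through "?" placeholders alongside the formatted identifiers.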

psycopg2.errors.InvalidFunctionDefinition: create function must specify volatility attribute (IMMUTABLE|STABLE|VOLATILE)

I'm trying to create a user-defined function in AWS Redshift using psycopg2; here's the code:
CREATE OR REPLACE FUNCTION generate_create_table_statement(p_table_name varchar)
RETURNS text AS
$BODY$
DECLARE
v_table_ddl text;
column_record record;
BEGIN
FOR column_record IN
SELECT
b.nspname as schema_name,
b.relname as table_name,
a.attname as column_name,
pg_catalog.format_type(a.atttypid, a.atttypmod) as column_type,
CASE WHEN
(SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
FROM pg_catalog.pg_attrdef d
WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef) IS NOT NULL THEN
'DEFAULT '|| (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
FROM pg_catalog.pg_attrdef d
WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef)
ELSE
''
END as column_default_value,
CASE WHEN a.attnotnull = true THEN
'NOT NULL'
ELSE
'NULL'
END as column_not_null,
a.attnum as attnum,
e.max_attnum as max_attnum
FROM
pg_catalog.pg_attribute a
INNER JOIN
(SELECT c.oid,
n.nspname,
c.relname
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname ~ ('^('||p_table_name||')$')
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3) b
ON a.attrelid = b.oid
INNER JOIN
(SELECT
a.attrelid,
max(a.attnum) as max_attnum
FROM pg_catalog.pg_attribute a
WHERE a.attnum > 0
AND NOT a.attisdropped
GROUP BY a.attrelid) e
ON a.attrelid=e.attrelid
WHERE a.attnum > 0
AND NOT a.attisdropped
ORDER BY a.attnum
LOOP
IF column_record.attnum = 1 THEN
v_table_ddl:='CREATE TABLE '||column_record.schema_name||'.'||column_record.table_name||' (';
ELSE
v_table_ddl:=v_table_ddl||',';
END IF;
IF column_record.attnum <= column_record.max_attnum THEN
v_table_ddl:=v_table_ddl||chr(10)||
' '||column_record.column_name||' '||column_record.column_type||' '||column_record.column_default_value||' '||column_record.column_not_null;
END IF;
END LOOP;
v_table_ddl:=v_table_ddl||');';
RETURN v_table_ddl;
END;
$BODY$
LANGUAGE plpgsql;
I'm getting this error:
psycopg2.errors.InvalidFunctionDefinition: create function must specify volatility attribute
(IMMUTABLE|STABLE|VOLATILE)
Where exactly do I put this attribute in my query? I've looked around but found no examples that did this.
Please check this link:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_FUNCTION.html
You forgot to specify the volatility attribute. It needs to be specified after the RETURNS clause, like this:
CREATE OR REPLACE FUNCTION generate_create_table_statement(p_table_name varchar)
RETURNS text
stable
AS
$BODY$
Furthermore, it seems that plpgsql is not a supported language for creating functions in Redshift.
The supported languages are:
$$ LANGUAGE { plpythonu | sql }
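Putting both points together, a minimal sketch via psycopg2 (the connection details are placeholders, and the function body is a trivial SQL-language example rather than a port of the plpgsql logic above, which would need rewriting, e.g. in plpythonu):

import psycopg2

# Placeholder connection details -- replace with your cluster's.
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

# Volatility (IMMUTABLE | STABLE | VOLATILE) goes right after RETURNS,
# and the language must be sql or plpythonu on Redshift.
# SQL-language UDFs refer to their arguments positionally ($1, $2, ...).
create_udf = """
CREATE OR REPLACE FUNCTION f_upper_trim(varchar)
RETURNS varchar
STABLE
AS $$
    SELECT UPPER(TRIM($1))
$$ LANGUAGE sql;
"""

with conn, conn.cursor() as cur:
    cur.execute(create_udf)

conn.close()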

Am getting operand clash error when running a datawarehouse compatible script

Hi, I am getting an error like this: Operand type clash: date is incompatible with int.
Below is my query, which I am running on SQL Server:
CREATE TABLE val.census_last_month
WITH(
DISTRIBUTION = ROUND_ROBIN
, CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
dt_mydate AS dt_census,
(SELECT count(DISTINCT encounter_id)
FROM prod.encounter
WHERE encounter_type = 'Inpatient' AND (ts_admit BETWEEN dt_mydate - 30 AND dt_mydate) AND
(ts_discharge IS NULL OR ts_discharge > dt_mydate)) AS census,
(SELECT count(DISTINCT encounter_id)
FROM prod.encounter
WHERE encounter_type = 'Inpatient' AND cast(ts_admit AS DATE) = dt_mydate) AS admits,
(SELECT count(DISTINCT encounter_id)
FROM prod.encounter
WHERE encounter_type = 'Inpatient' AND cast(ts_discharge AS DATE) = dt_mydate) AS discharges
FROM ref.calendar_day
WHERE ref.calendar_day.dt_mydate BETWEEN (cast(getdate() as date) - 30) AND cast(getdate() as date);
You need to use the DATEADD function. See details here: https://learn.microsoft.com/en-us/sql/t-sql/functions/dateadd-transact-sql?view=sql-server-2017
There are multiple issues with this script; however, can you confirm that the columns starting with "ts_" are dates stored as integers in the format yyyyMMdd, and that the columns starting with "dt_" are DATE?
Based on these assumptions, this is my attempted rewrite:
SELECT dt_mydate AS dt_census,
(
SELECT COUNT( DISTINCT encounter_id )
FROM prod.encounter
WHERE encounter_type = 'Inpatient' AND ( CAST( CAST( ts_admit AS CHAR(8) ) AS DATE ) BETWEEN DATEADD( day, -30, dt_mydate ) AND dt_mydate )
AND ( ts_discharge IS NULL OR CAST( CAST( ts_discharge AS CHAR(8) ) AS DATE ) > dt_mydate)
) AS census,
(
SELECT COUNT( DISTINCT encounter_id )
FROM prod.encounter
WHERE encounter_type = 'Inpatient' AND CAST( CAST( ts_admit AS CHAR(8) ) AS DATE ) = dt_mydate
) AS admits,
(
SELECT COUNT( DISTINCT encounter_id )
FROM prod.encounter
WHERE encounter_type = 'Inpatient'
AND CAST( CAST( ts_discharge AS CHAR(8) ) AS DATE ) = dt_mydate
) AS discharges
FROM ref.calendar_day
WHERE ref.calendar_day.dt_mydate BETWEEN CAST ( DATEADD( day, -30, GETDATE() ) AS DATE ) AND CAST( GETDATE() AS DATE );
If either of my assumptions are incorrect, please let me know and I will update the script.

deleting particular row in particular condition in pyspark

I am a newbie in Spark. I want to delete a row using Spark SQL. Because deletes are not supported on a temp table, from what I have read, to run DELETE-like SQL queries I need to save the table permanently in PySpark, i.e. as a Hive table, I guess. I have done that too.
My code is:
spark.sql("select b.ENTITYID as ENTITYID, cm.BLDGID as BldgID,cm.LEASID as LeaseID,coalesce(l.SUITID,(select EmptyDefault from EmptyDefault)) as SuiteID,(select CurrDate from CurrDate) as TxnDate,cm.INCCAT as IncomeCat,'??' as SourceCode,(Select CurrPeriod from CurrPeriod)as Period,coalesce(case when cm.DEPARTMENT ='#' then 'null' else cm.DEPARTMENT end, null) as Dept,'Lease' as ActualProjected ,fnGetChargeInd(cm.EFFDATE,cm.FRQUENCY,cm.BEGMONTH,(select CurrPeriod from CurrPeriod))*coalesce (cm.AMOUNT,0) as ChargeAmt,0 as OpenAmt,cm.CURRCODE as CurrencyCode,case when ('PERIOD.DATACLSD') is null then 'Open' else 'Closed' end as GLClosedStatus,'Unposted'as GLPostedStatus ,'Unpaid' as PaidStatus,cm.FRQUENCY as Frequency,0 as RetroPD from CMRECC cm join BLDG b on cm.BLDGID =b.BLDGID join LEAS l on cm.BLDGID =l.BLDGID and cm.LEASID =l.LEASID and (l.VACATE is null or l.VACATE >= ('select CurrDate from CurrDate')) and (l.EXPIR >= ('select CurrDate from CurrDate') or l.EXPIR < ('select RunDate from RunDate')) left outer join PERIOD on b.ENTITYID = PERIOD.ENTITYID and ('select CurrPeriod from CurrPeriod')=PERIOD.PERIOD where ('select CurrDate from CurrDate')>=cm.EFFDATE and (select CurrDate from CurrDate) <= coalesce(cm.EFFDATE,cast(date_add(( select min(cm2.EFFDATE) from CMRECC cm2 where cm2.BLDGID = cm.BLDGID and cm2.LEASID = cm.LEASID and cm2.INCCAT = cm.INCCAT and 'cm2.EFFDATE' > 'cm.EFFDATE'),-1) as timestamp) ,case when l.EXPIR <(select RunDate from RunDate)then (Select RunDate from RunDate) else l.EXPIR end)").write.saveAsTable('Fact_Temp1')
and after that I have a permanent table in my PySpark session.
Next I ran the delete operation:
spark.sql("DELETE from Fact_Temp1 where ActualProjected='Lease' and ChargeAmt=0").show()
I get this error:
pyspark.sql.utils.ParseException: u"\nOperation not allowed: DELETE from(line 1, pos 0)\n\n== SQL ==\nDELETE from Fact_Temp1 where ActualProjected='Lease'and ChargeAmt=0\n^^^\n"
I am a bit confused. Is there a different way to write this? I have no idea why I am getting this error. Kindly guide me; I am using Spark 2.0.
kalyan
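Spark 2.0's SQL parser simply has no DELETE statement, which is what the ParseException is saying; saving the data as a Hive table does not change that. The usual workaround is to rewrite the table without the unwanted rows. A minimal sketch, assuming the Fact_Temp1 table created above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Keep every row except the ones the DELETE was meant to remove.
# (Add explicit null handling if ActualProjected or ChargeAmt can be null.)
kept = spark.table("Fact_Temp1").filter(
    ~((col("ActualProjected") == "Lease") & (col("ChargeAmt") == 0))
)

# Spark cannot overwrite a table while reading from it in the same query,
# so write to a new table (or a temp location) and swap names afterwards.
kept.write.saveAsTable("Fact_Temp2")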
