Spark: How to skip second condition in OR construction - apache-spark

When I check an OR condition in Spark's where function, the second condition is evaluated even though the first condition is true.
How can I skip the check of the second condition?
df.
  ...
  .where(
    (
      lit(lastLoadingDate).isNull
        .or(
          col(srcDTTM) > lastLoadingDate.format(formatterDTTM)
        )
    )
    && col(SrcDTTM) <= currentLoadingDate.format(formatterDTTM)
  )
I even tried the following expression:
df.
  ...
  .where(
    (
      lit(true)
        .or(
          col(srcDTTM) > lastLoadingDate.format(formatterDTTM)
        )
    )
    && col(SrcDTTM) <= currentLoadingDate.format(formatterDTTM)
  )
But the second condition:
col(srcDTTM) > lastLoadingDate.format(formatterDTTM)
is always executed.

Skipping the check of the second condition may produce incomplete data, because this is an OR. If the first condition is false and the second condition is true, the row still belongs in the result set, so the result set grows.

Checking the second condition of an OR makes no difference to the result when the first condition is true. Skipping it would mean adding another condition or function: first a check of whether the first condition is true or false, and only then the second part of the OR. That is three conditions instead of two, so it is better to use the OR as it is.
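One possible reading of the question, offered as an assumption rather than something the answers above state: in a call like a.or(b), b is an ordinary method argument, so the host language builds it (including the call to format on the date) before any OR logic can run. The Python sketch below illustrates this with a hypothetical Expr class, which is not Spark's API, and shows one way to leave the second branch out entirely when the date is missing.

```python
from datetime import datetime
from typing import Optional


class Expr:
    """Hypothetical stand-in for a column expression; this is not Spark's API."""

    def __init__(self, text: str) -> None:
        self.text = text

    def or_(self, other: "Expr") -> "Expr":
        # By the time this method runs, `other` has already been built.
        return Expr(f"({self.text} OR {other.text})")


def newer_than(last_loading_date: Optional[datetime]) -> Expr:
    # Plain Python call: raises AttributeError if last_loading_date is None.
    stamp = last_loading_date.strftime("%Y-%m-%d %H:%M:%S")
    return Expr(f"srcDTTM > '{stamp}'")


def naive_filter(last_loading_date: Optional[datetime]) -> Expr:
    # The argument to or_ is evaluated eagerly, so this fails for None
    # even though the first operand is "true".
    return Expr("true").or_(newer_than(last_loading_date))


def guarded_filter(last_loading_date: Optional[datetime]) -> Expr:
    # Decide in the host language whether the second branch is built at all.
    if last_loading_date is None:
        return Expr("true")
    return newer_than(last_loading_date)
```

With a missing date, guarded_filter(None) returns the always-true expression, while naive_filter(None) fails while the expression is still being constructed.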

Related

ADF Understanding the Case Statement

Given the following Derived column Expression
case(Rolling =='A'||Rolling == 'B'||Rolling == 'C'|| Rolling =="S"
, "
, case(Alpha== 'EE'
, toString(toDate(Manu_Date, 'yyyy-MM-dd'))
, case(Alpha=='CW', Del_Date,"))
)
Two questions:
Is there a better way to write this code?
What is this code trying to do?
I am trying to understand what they are trying to achieve with this expression.
In the given expression, the value after Rolling =="S" is written as a single double-quote character ("). It should be two single quotes ('').
Similarly, the value after Del_Date should also be two single quotes.
case(Rolling =='A'||Rolling == 'B'||Rolling == 'C'|| Rolling =="S", '',
case(Alpha== 'EE', toString(toDate(Manu_Date, 'yyyy-MM-dd')),
case(Alpha=='CW', Del_Date,'' )))
What is this code trying to do?
The syntax of the case statement is
case(condition, true_expression, false_expression)
The expression first checks whether Rolling is 'A', 'B', 'C' or 'S' and, if so, assigns '' (empty string) to the derived column.
When that condition is false, it checks whether Alpha is 'EE' and, if so, assigns the value of Manu_Date in string format.
When the second condition also fails, it checks whether Alpha is 'CW' and, if so, assigns the value of the Del_Date column.
When none of the above conditions is met, '' (empty string) is assigned. This is the default value.
I reproduced this with sample input.
img1: input data
In the derived column transformation, a new column is added with the expression given below.
case(Rolling =='A'||Rolling == 'B'||Rolling == 'C'|| Rolling =="S", '',
case(Alpha== 'EE', toString(toDate(Manu_Date, 'yyyy-MM-dd')),
case(Alpha=='CW', Del_Date,'' )))
img2: Derived column transformation output
Is there a better way to write this code?
Since the order of the conditions matters when assigning values to the new column, a case statement is the better way to do it.
But instead of nesting case statements, a single case statement can achieve the same result.
Syntax:
case(condition_1, expression_1, condition_2, expression_2, ..., condition_n, expression_n, default_expression)
When the default expression is omitted, the default value is null.
Modified expression:
case(Rolling =='A'||Rolling == 'B'||Rolling == 'C'|| Rolling =="S", '',
Alpha== 'EE', toString(toDate(Manu_Date, 'yyyy-MM-dd')),
Alpha=='CW', Del_Date,'' )
img 3: Results of both case statements
Both expressions are added in the derived column transformation, and the results are the same in both cases.
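As a rough illustration of why the nested and flat forms agree, here is a minimal Python sketch. The case function below is a hypothetical model of the semantics described above (condition/expression pairs with an optional trailing default), not the data flow engine itself, and Manu_Date is assumed to be already formatted as a string.

```python
from typing import Any


def case(*args: Any) -> Any:
    """Return the expression paired with the first true condition.

    Arguments alternate condition, expression, ...; an odd trailing
    argument is the default. Without a default the result is None.
    """
    pairs, default = (args[:-1], args[-1]) if len(args) % 2 else (args, None)
    for condition, expression in zip(pairs[0::2], pairs[1::2]):
        if condition:
            return expression
    return default


def derived_nested(rolling: str, alpha: str, manu_date: str, del_date: str) -> str:
    # Nested form: each false branch is another case call.
    return case(rolling in ("A", "B", "C", "S"), "",
                case(alpha == "EE", manu_date,
                     case(alpha == "CW", del_date, "")))


def derived_flat(rolling: str, alpha: str, manu_date: str, del_date: str) -> str:
    # Flat form: one call, conditions tested in order, '' as the default.
    return case(rolling in ("A", "B", "C", "S"), "",
                alpha == "EE", manu_date,
                alpha == "CW", del_date,
                "")
```

Because conditions are tested in order and the first match wins, both functions return the same value for every combination of Rolling and Alpha.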

Python if test condition fails when used with or

Greetings Python experts. I wrote the if condition below, and it fails to evaluate to false for objects that should be false. I am using Python 3.8.5. Note that instance_results in this example contains a list of resources that are in various states. I only want to append to vm_instance_list the resources that are not in a TERMINATED or TERMINATING state.
instance_results = compute_client.list_instances(
    compartment_id=compartment_id).data
vm_instance_list = []
for instance in instance_results:
    if instance.lifecycle_state != "TERMINATED" or instance.lifecycle_state != "TERMINATING":
        vm_instance_list.append(instance)
The above code appends every object in instance_results to vm_instance_list, i.e. the condition is interpreted as True even for objects that are in a TERMINATED or TERMINATING lifecycle state. I have had to rewrite it with nested if conditions, which works.
for instance in instance_results:
    if instance.lifecycle_state != "TERMINATED":
        if instance.lifecycle_state != "TERMINATING":
            vm_instance_list.append(instance)
I have no idea why I have had to nest the above if statements and would appreciate it if anyone could share some insights.
Thanks so much,
Hank
In your first version, the result is ALWAYS true, so every item is appended.
Your second version is only true if both tests are true.
If you want the first version to behave like the second version, you need an 'and' statement, not an 'or'.
Let's trace through your first if statement in the case when instance.lifecycle_state is equal to "TERMINATED". The condition is as follows:
instance.lifecycle_state != "TERMINATED" or instance.lifecycle_state != "TERMINATING"
We can see that the first part of this statement is false (since lifecycle_state DOES equal "TERMINATED"). The second part is true because lifecycle_state indeed does NOT equal "TERMINATING". So this whole expression simplifies to:
False or True
which finally simplifies (by the rules of or) to be just: True. So now we have seen why the body of the if is executed in the first case.
If we do a similar process in your second code snippet, we will see that the first condition is False (since lifecycle_state DOES equal "TERMINATED"). So in this case the second condition is not checked, and the body of the if does not execute.
In fact, the second snippet is equivalent to the following condition:
instance.lifecycle_state != "TERMINATED" and instance.lifecycle_state != "TERMINATING"
Note that this is very similar to your original snippet; however, we've replaced or with and. In fact, two nested if statements like this are equivalent to a single condition where both parts are joined by and.
By DeMorgan's Laws, this condition is also equivalent to:
not (instance.lifecycle_state == "TERMINATED" or instance.lifecycle_state == "TERMINATING")
which you may find clearer to understand.
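The equivalences above can be checked directly. This is a small sketch over a handful of sample state strings ("RUNNING" and "STOPPED" are illustrative values chosen here, not taken from the question); the last function is an additional spelling using not in that behaves the same way.

```python
STATES = ["RUNNING", "STOPPED", "TERMINATING", "TERMINATED"]


def keep_with_or(state: str) -> bool:
    # Original condition: always True, since no state equals both strings.
    return state != "TERMINATED" or state != "TERMINATING"


def keep_with_and(state: str) -> bool:
    # Equivalent to the two nested if statements.
    return state != "TERMINATED" and state != "TERMINATING"


def keep_with_not_or(state: str) -> bool:
    # De Morgan form of keep_with_and.
    return not (state == "TERMINATED" or state == "TERMINATING")


def keep_with_not_in(state: str) -> bool:
    # Membership test; same truth table as keep_with_and.
    return state not in ("TERMINATED", "TERMINATING")


kept_by_or = [s for s in STATES if keep_with_or(s)]
kept_by_and = [s for s in STATES if keep_with_and(s)]
```

Here kept_by_or contains all four states, while kept_by_and contains only "RUNNING" and "STOPPED".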

Variable is created even if condition is not true in robot framework

In my keyword I have a for loop in which I append items to a list. At some point I would like to empty this list so I can start appending items again.
Append To List    ${list}    ${data}
@{list}=    Run Keyword If    ${list_length} == 10 or ${cond} == 1    my_keyword    ${arg1}
...    ${arg2}
my_keyword    ${arg1}    ${arg2}
    Do some stuff
    @{list}    Create List
    Return    ${list}
A new empty list is created on every iteration, not only when the condition is met; the other steps in my_keyword are executed only when the condition is met.
What should I change so that a new list is created only if the condition is met?
Having @{list}= before Run Keyword If assigns a value to it regardless of whether the keyword is executed. So when the condition is not met, None is assigned to @{list}. If you want to keep the current list, add an ELSE part:
@{list}=    Run Keyword If    ${list_length} == 10 or ${cond} == 1    my_keyword    ${arg1}
...    ${arg2}    ELSE    Set Variable    ${list}
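The same behaviour can be modelled in a few lines of Python. The run_keyword_if function below is a hypothetical analogue written for illustration, not Robot Framework's implementation: the assignment on the left always happens, so without an else branch the list variable is overwritten with None whenever the condition is false.

```python
from typing import Any, Callable, List, Optional


def run_keyword_if(condition: bool,
                   keyword: Callable[[], Any],
                   else_keyword: Optional[Callable[[], Any]] = None) -> Any:
    """Hypothetical Python analogue of a conditional keyword call."""
    if condition:
        return keyword()
    if else_keyword is not None:
        return else_keyword()
    return None  # nothing ran, so there is nothing to return


def reset_list() -> List[int]:
    # Stands in for the keyword that creates and returns a new empty list.
    return []


def step_without_else(items: List[int], condition: bool) -> Optional[List[int]]:
    # The assignment always happens; a false condition stores None.
    items = run_keyword_if(condition, reset_list)
    return items


def step_with_else(items: List[int], condition: bool) -> List[int]:
    # The else branch hands back the current list, so it survives.
    items = run_keyword_if(condition, reset_list, else_keyword=lambda: items)
    return items
```

With a false condition, step_without_else loses the list, while step_with_else returns it unchanged.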

Netezza "not exists" within a CASE statement

I have a multi-layered CASE statement, and one of the conditions needs to reference a table via a "not exists". I keep getting the error "correlated subqueries not allowed". How can I reference a table along with a condition inside a CASE statement? Below is a portion of my code:
WHEN ...... previous condition
WHEN ( CCOB_CLIENT_LOB_ID = 2 AND OI_CARRIER_LOB_ID IN (1,2,12,13) )
and not exists ( select S.STATE
FROM CCOB_PACIFICSOURCE.V_SELFPAY_COB_STATES S
WHERE S.STATE = SELFPAY_COB_STATE ) then 'NONE'
WHEN .... subsequent condition
The short answer is: you can't.
The longer answer is that you have to rewrite the query to outer join the table you gave the alias S.
Then it's quite possible to test for NULL.
Watch out for duplicates in the S.STATE column though :)
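A minimal sketch of that rewrite, using Python's built-in sqlite3 module as a stand-in for Netezza so the idea can be run locally. The claims and selfpay_states tables and their rows are made up for illustration. SQLite does accept the correlated form, which makes it possible to compare both results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (id INTEGER, selfpay_cob_state TEXT);
    CREATE TABLE selfpay_states (state TEXT);
    INSERT INTO claims VALUES (1, 'OR'), (2, 'WA'), (3, 'ID');
    -- 'OR' appears twice on purpose to show the duplicate problem.
    INSERT INTO selfpay_states VALUES ('OR'), ('OR'), ('ID');
""")

# Outer join against a de-duplicated state list, then test for NULL.
rewritten = conn.execute("""
    SELECT c.id,
           CASE WHEN s.state IS NULL THEN 'NONE' ELSE 'OTHER' END AS result
    FROM claims c
    LEFT JOIN (SELECT DISTINCT state FROM selfpay_states) s
           ON s.state = c.selfpay_cob_state
    ORDER BY c.id
""").fetchall()

# Correlated NOT EXISTS form, for comparison only.
correlated = conn.execute("""
    SELECT c.id,
           CASE WHEN NOT EXISTS (SELECT 1 FROM selfpay_states s
                                 WHERE s.state = c.selfpay_cob_state)
                THEN 'NONE' ELSE 'OTHER' END AS result
    FROM claims c
    ORDER BY c.id
""").fetchall()

# Joining the raw table instead repeats claim 1 once per duplicate state row.
naive_row_count = conn.execute("""
    SELECT COUNT(*)
    FROM claims c
    LEFT JOIN selfpay_states s ON s.state = c.selfpay_cob_state
""").fetchone()[0]
```

Both queries label only claim 2 as 'NONE'. The join against the raw table returns four rows for three claims, which is the duplicate issue mentioned above.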

Nested IF statement returning false

I have a nested IF statement that is returning "FALSE" rather than the expected outcome.
Scenario
Table "High VoltageCables" has data that defaults to numeric but may contain characters, e.g. kVa.
Table "Master" checks whether the "High VoltageCables" data is blank or not blank, and returns "Failed Check 1" or "Passed Check 1". This works fine.
Table "Meta" then checks the results of "Master" and tests the "High VoltageCables" data for a length between 1 and 6, regardless of whether the record is numeric or string.
Formula
=IF(MASTER!H2="Passed Check 1",IF(LEN('High VoltageCables'!O2)>=1,IF(LEN('High VoltageCables'!O2<6),"Passed Check 2","Failed Check 2")))
This is partially successful, as it returns "Passed Check 2" for the following sample data in the source table "High VoltageCables".
1 numeric, or
1kVa str, or
50000 numeric
However, if a field in "High VoltageCables" is blank, the formula returns "FALSE" rather than "Failed Check 1".
I inherited this task (and would have preferred to do the whole thing in Access using relatively simple queries), and unfortunately I am new to nested IF statements, so I am probably missing something basic...
NB the data in High VoltageCables must default to numeric for a further check to work.
The first and second IFs seem to be missing their else parts. These should be added at the end, between the closing parentheses, like: ), else ), else )
Every IF statement consists of IF(condition, truepart, falsepart). With two nested IFs it looks like IF(condition, IF(condition2, truepart2, falsepart2), falsepart).
Hope that makes it a little clearer.
You do have an unaccounted-for FALSE in the middle IF. Try bringing the latter two conditions together.
=IF(Master!H2="Passed Check 1",IF(OR(LEN('High VoltageCables'!O2)={1,2,3,4,5}),"Passed Check 2","Failed Check 2"))
It's still a bit unclear on what to show or not show if Master!H2 does not equal "Passed Check 1".
I had failed to construct the formula with a concluding "else" part, "Failed Check 1".
Using jeeped's and Tom's suggestions and adding the final "else" part, I have solved the problem:
=IF(MASTER!H2="Passed Check 1",IF(OR(LEN('High VoltageCables'!O2)={1,2,3,4,5}),"Passed Check 2","Failed Check 2"),"Failed Check 1")
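To see where the stray FALSE comes from, here is a small Python sketch. The excel_if helper is a hypothetical model of a spreadsheet IF whose omitted else branch yields FALSE, and the cell value is converted to text before its length is taken; both are simplifications made for illustration.

```python
from typing import Any


def excel_if(condition: bool, value_if_true: Any, value_if_false: Any = False) -> Any:
    """Mimic a spreadsheet IF: an omitted else branch yields FALSE."""
    return value_if_true if condition else value_if_false


def check_without_else(master: str, cell: Any) -> Any:
    # Outer IF has no else part, so a failed Check 1 falls through to False.
    length_ok = len(str(cell)) in {1, 2, 3, 4, 5}
    return excel_if(master == "Passed Check 1",
                    excel_if(length_ok, "Passed Check 2", "Failed Check 2"))


def check_with_else(master: str, cell: Any) -> str:
    # Same logic with the concluding else part supplied.
    length_ok = len(str(cell)) in {1, 2, 3, 4, 5}
    return excel_if(master == "Passed Check 1",
                    excel_if(length_ok, "Passed Check 2", "Failed Check 2"),
                    "Failed Check 1")
```

For a blank cell that failed the first check, check_without_else returns False, whereas check_with_else returns "Failed Check 1".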
