Python SQL formatting -- omitting one where clause based on conditions - python-3.x

I need to write a SQL query whose WHERE clause depends on conditions:
in Condition 1:
SELECT
*
FROM
table_1
WHERE
col_1 IS NULL
AND col_2 IS NOT NULL
and in Condition 2:
SELECT
*
FROM
table_1
WHERE
col_1 IS NULL
How can I achieve this easily in Python? I know I could filter the results afterwards, but that is not as efficient as filtering in the query itself.

The solution used in many tools: start the query with a dummy always-true WHERE clause; then, depending on the conditions, additional filters can be concatenated like this (simplified code):
query = 'select * from table where 1 = 1'  # WHERE with a dummy TRUE condition
# it can also be written as WHERE TRUE
condition1 = True  # if both conditions are False, the query runs without filters
condition2 = True
filter1 = 'Col1 is not null'
filter2 = 'Col2 is not null'
if condition1:
    query = query + ' and ' + filter1
if condition2:
    query = query + ' and ' + filter2
print(query)
Result:
select * from table where 1 = 1 and Col1 is not null and Col2 is not null
A more elegant solution uses pypika, a Python query builder. You can build the whole query programmatically, including the table, fields, WHERE filters and joins:
from pypika import Query, Table, Field

table = Table('table')
q = Query.from_(table).select(Field('col1'), Field('col2'))
condition1 = True
condition2 = True
if condition1:
    q = q.where(table.col1.notnull())
if condition2:
    q = q.where(table.col2.notnull())
print(q)
Result:
SELECT "col1","col2" FROM "table" WHERE NOT "col1" IS NULL AND NOT "col2" IS NULL
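If both conditions are False, no where() call is ever made and pypika simply omits the WHERE clause, so the printed query reduces to:
SELECT "col1","col2" FROM "table"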

Related

How to create update query with QSqlQuery

I'm trying to create an update query in Python 3 / PyQt 5.10 / SQLite. A select/insert query made the same way runs fine. The fields and the corresponding record exist.
def updateRecords():
    theTable = "tempbooks"
    theDict = {
        "Loc": "PyQt121",
        "BoekNr": "dfdf",
        "BoekTitel": "eeee",
        "BoekBedrag": 999
    }
    theFilter = " WHERE Loc = 'PyQt'"
    query = QSqlQuery()
    columns = ', '.join(theDict.keys())
    placeholders = ':' + ', :'.join(theDict.keys())
    sql = 'UPDATE %s SET (%s) VALUES (%s) %s' % (theTable, columns, placeholders, theFilter)
    query.prepare(sql)
    for key, value in theDict.items():
        query.bindValue(":" + key, value)
    print(sql)
    query.exec_()
    print(query.lastError().databaseText())
    return query.numRowsAffected()
The generated SQL is UPDATE tempbooks SET (Loc, BoekNr, BoekTitel, BoekBedrag) VALUES (:Loc, :BoekNr, :BoekTitel, :BoekBedrag) WHERE Loc = 'PyQt'.
query.lastError().databaseText() gives me "No Query" and the number of updated rows is -1.
The correct syntax for an update query:
UPDATE tablename
set col1 = val1,
col2 = val2,
col3 = val3
WHERE condition
query.prepare(sql) is probably returning False because of the invalid syntax.
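A sketch of how the SQL could be built to match that syntax, assuming the same theDict, theTable and theFilter as in the question:
# build "col = :col" pairs instead of the invalid "(cols) VALUES (:cols)" form
assignments = ', '.join('%s = :%s' % (key, key) for key in theDict.keys())
sql = 'UPDATE %s SET %s%s' % (theTable, assignments, theFilter)
# produces: UPDATE tempbooks SET Loc = :Loc, BoekNr = :BoekNr, BoekTitel = :BoekTitel, BoekBedrag = :BoekBedrag WHERE Loc = 'PyQt'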

opposite of spark dataframe `withColumn` method?

I'd like to be able to chain a transformation on my DataFrame that drops a column, rather than assigning the DataFrame to a variable first (i.e. df = df.drop(...)). If I want to add a column, I can simply call df.withColumn(). What is the way to drop a column in an in-line chain of transformations?
For the entire example use this as baseline:
val testVariable = 10
var finalDF = spark.sql("select 'test' as test_column")
val iDF = spark.sql("select 'John Smith' as Name, cast('10' as integer) as Age, 'Illinois' as State")
val iDF2 = spark.sql("select 'Jane Doe' as Name, cast('40' as integer) as Age, 'Iowa' as State")
val iDF3 = spark.sql("select 'Blobby' as Name, cast('150' as integer) as Age, 'Non-US' as State")
val nameDF = iDF.unionAll(iDF2).unionAll(iDF3)
1 Conditional Drop
If you want to drop columns only for certain known outputs, you can build conditional logic to check whether a column needs to be dropped. In this case, if the test variable is at least 5 the Name column is dropped; otherwise a new column is added.
import org.apache.spark.sql.functions.lit

finalDF = if (testVariable >= 5) {
  nameDF.drop("Name")
} else {
  nameDF.withColumn("Cooler_Name", lit("Cool_Name"))
}
finalDF.printSchema
2 Programmatically build the select statement. The selectExpr method takes independent strings and builds them into commands that Spark can read. In the case below we know we have a test for dropping, but we do not know in advance which columns might be dropped. If a column gets a test value that does not equal 1, we do not include it in our command array. When we run the command array through selectExpr on the table, the excluded columns are dropped.
val columnNames = nameDF.columns
val arrayTestOutput = Array(1, 0, 1)
var iteratorArray = 1
var commandArray = Array.empty[String]
while (iteratorArray <= columnNames.length) {
  if (arrayTestOutput(iteratorArray - 1) == 1) {
    // keep this column: append its name to the select expression
    commandArray = commandArray :+ columnNames(iteratorArray - 1)
  }
  iteratorArray = iteratorArray + 1
}
finalDF = nameDF.selectExpr(commandArray: _*)
finalDF.printSchema
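For comparison, the same column selection is much shorter in Python. A sketch in PySpark, assuming a hypothetical DataFrame name_df with the same three columns and the same 1/0 test output:
test_output = [1, 0, 1]  # 1 = keep the column, 0 = drop it
# pair each column name with its test value and keep only the 1s
columns_to_keep = [name for name, keep in zip(name_df.columns, test_output) if keep == 1]
final_df = name_df.select(*columns_to_keep)
final_df.printSchema()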

cassandra jdbc search (how to exclude a where condition whose value is null)

I'm using the DataStax JDBC driver, and I want to execute a lookup query like this:
Statement statement = new SimpleStatement("select * from tablename where condition1 = ? and condition2 = ?", value1, value2);
condition1 is the partition key, and condition2 is part of the primary key.
Here is my problem: value2 can be null, and if value2 is null I want to exclude it from the search condition.
Do I have to make two queries, one per case, like this? Or is there a way to handle null values?
Statement statement = new SimpleStatement("select * from tablename where condition1 = ?", value1);
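One option, echoing the dummy-TRUE pattern from the top answer, is to append the second condition only when value2 is present. A sketch using the DataStax Python driver for consistency with the rest of this page (session, value1 and value2 are assumed to exist; the Java driver version is analogous):
# session is an open DataStax Python driver session (assumption)
query = "select * from tablename where condition1 = %s"  # %s is the driver's placeholder
params = [value1]
if value2 is not None:
    # append the second filter only when value2 is not null
    query += " and condition2 = %s"
    params.append(value2)
rows = session.execute(query, params)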

linq to entities: compare string variable with NVarchar field

This code results in a timeout exception:
String City_Code = null;
var result = (from r in myContext.TableA
where ( City_Code == r.City_Code )
select r ).ToList();
while this code returns quickly:
String City_Code = null;
var result = (from r in myContext.TableA
where ( r.City_Code == City_Code )
select r ).ToList();
The difference is in the order of the operands of the equality. The City_Code field of table TableA is of type nvarchar(20).
What is a possible reason for that?
Update: the generated T-SQL for the two queries is identical except for this part: the first one has the condition " @p__linq__2 = [Extent1].[City_Code])" while the second has " [Extent1].[City_Code] = @p__linq__2)".
The value of @p__linq__2 when I notice the time difference is NULL. If I run the queries in SSMS with NULL in place of @p__linq__2, they both respond very quickly.

Spark: subset a few columns and remove null rows

I am running Spark 2.1 on Windows 10. I have fetched data from MySQL into Spark using JDBC, and the table looks like this:
x     y     z
------------------
1     a     d1
Null  v     ed
5     Null  Null
7     s     Null
Null  bd    Null
I want to create a new Spark dataset with only the x and y columns from the above table, and I want to keep only those rows which do not have a null in either of those two columns. My resulting table should look like this:
x   y
--------
1   a
7   s
The following is the code:
val load_DF = spark.read.format("jdbc").option("url", "jdbc:mysql://100.150.200.250:3306").option("dbtable", "schema.table_name").option("user", "uname1").option("password", "Pass1").load()
val filter_DF = load_DF.select($"x".isNotNull,$"y".isNotNull).rdd
// let's print the first 5 values of filter_DF
filter_DF.take(5)
res0: Array[org.apache.spark.sql.Row] = Array([true,true], [false,true], [true,false], [true,true], [false,true])
As shown, the above result doesn't give me the actual values; it returns Boolean values instead (true when the value is not null, false when it is null).
Try this:
val load_DF = spark.read.format("jdbc").option("url", "jdbc:mysql://100.150.200.250:3306").option("dbtable", "schema.table_name").option("user", "uname1").option("password", "Pass1").load()
Now:
load_DF.select($"x", $"y").filter("x is not null").filter("y is not null")
Spark provides DataFrameNaFunctions for dropping null values, filling in data, and so on.
In your example above you just need to call the following on the DataSet that you load:
val noNullValues = load_DF.na.drop("all", Seq("x", "y"))
With "all", this drops records only where both x and y are null (z is not considered). You can read up on DataFrameNaFunctions for further options to fill in data, or translate values if required.
Apply "any" in na.drop to drop rows where either column is null:
df = df.select("x", "y").na.drop("any", Seq("x", "y"))
You are simply applying a function (in this case isNotNull) to the values when you do a select; instead, you need to replace the select with a filter.
val filter_DF = load_DF.filter($"x".isNotNull && $"y".isNotNull)
or if you prefer:
val filter_DF = load_DF.filter($"x".isNotNull).filter($"y".isNotNull)
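For reference, the same result in Python. A sketch in PySpark, assuming a DataFrame load_df created with the same JDBC options as load_DF above:
# keep only x and y, then drop rows where either of them is null
filtered = load_df.select("x", "y").na.drop("any", subset=["x", "y"])
filtered.show(5)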
