Null value comparison in Slick

I am encountering a problem with nullable column comparison.
If some columns are Option[T], I wonder how Slick translates operations like === on these columns to SQL.
There are two possibilities: a null value (which is None in Scala) and a non-null value.
In the null case, the SQL should use IS instead of =.
However, Slick doesn't seem to handle this correctly in the following case.
Here is the code (with an H2 database):
object Test1 extends Controller {
  case class User(id: Option[Int], first: String, last: String)

  class Users(tag: Tag) extends Table[User](tag, "users") {
    def id = column[Int]("id", O.Nullable)
    def first = column[String]("first")
    def last = column[String]("last")
    def * = (id.?, first, last) <> (User.tupled, User.unapply)
  }

  val users = TableQuery[Users]

  def find_u(u: User) = DB.db.withSession { implicit session =>
    users.filter(x => x.id === u.id && x.first === u.first && x.last === u.last).firstOption
  }

  def t1 = Action {
    DB.db.withSession { implicit session =>
      DB.createIfNotExists(users)
      val u1 = User(None, "123", "abc")
      val u2 = User(Some(1232), "123", "abc")
      users += u1
      val r1 = find_u(u1)
      println(r1)
      val r2 = find_u(u2)
      println(r2)
    }
    Ok("good")
  }
}
I printed out the SQL. The following is the result for the first find_u:
[debug] s.s.j.J.statement - Preparing statement: select x2."id", x2."first", x2."last" from "users" x2 where (
(x2."id" = null) and (x2."first" = '123')) and (x2."last" = 'abc')
Notice that (x2."id" = null) is incorrect here. It should be (x2."id" is null).
Update:
Is it possible to compare only the non-null fields in an automatic fashion, and ignore the null columns?
E.g. in the case of User(None,"123","abc"), generate only where (x2."first" = '123') and (x2."last" = 'abc').

Slick uses three-valued logic. This shows when nullable columns are involved. In that regard it does not adhere to Scala semantics, but uses SQL semantics. So (x2."id" = null) is indeed correct under these design decisions. To do a strict NULL check, use x.id.isEmpty. For a strict comparison do
(if(u.id.isEmpty) x.id.isEmpty else (x.id === u.id))
Update:
To compare only when the user id is non-null, use
(u.id.isEmpty || (x.id === u.id)) && ...
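For reference, a minimal sketch of how both suggestions could be folded into find_u, assuming the same Slick 2.x lifted embedding and Users table as above (the method names are illustrative, and the pattern matching is just another spelling of the answer's if/isEmpty expressions):

// Sketch: strict NULL semantics -- translate a None id into an IS NULL check.
def find_u_strict(u: User) = DB.db.withSession { implicit session =>
  users.filter { x =>
    val idMatches = u.id match {
      case Some(id) => x.id === id   // (x."id" = ?)
      case None     => x.id.isNull   // (x."id" is null)
    }
    idMatches && (x.first === u.first) && (x.last === u.last)
  }.firstOption
}

// Sketch: ignore the id column entirely when u.id is None, so the
// generated WHERE clause only constrains the non-null fields.
def find_u_nonNullOnly(u: User) = DB.db.withSession { implicit session =>
  users.filter { x =>
    val base = (x.first === u.first) && (x.last === u.last)
    u.id.map(id => base && (x.id === id)).getOrElse(base)
  }.firstOption
}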

Related

Spark pass string condition and return column type

I have a function, myFilter, that I want to do this:
If the mode parameter is 'daily', then filter the data frame by date using the parameters 'dt' and 'dB'. This works.
If the mode parameter is 'custom', then filter by a valid condition passed as a String to the function like this:
val filcond = "col(\"custId\")===\"1\""
myFilter(mode, filcond)
Since the function returns a Column type, I think I need to convert the string to Column. But I have yet to find a way to do that. Any ideas how to do this?
def myFilter(mode: String, filcond: String): Column = {
  val filterCondition: Column =
    if (mode == "daily") {
      $"update_date" >= date_sub(to_date(date_format(lit(dt), "yyyy-MM-dd")), dB) and $"update_date" <= to_date(date_format(lit(dt), "yyyy-MM-dd"))
    } else if (mode == "custom") {
      //col("custId")==="1" //I want it to return this condition, but pass it in as a param to this function
      filcond // this does not work since it is a String
    }
  return filterCondition
}

val myDf = ss.read
  .parquet("/data")
  .select(
    $"Id",
    $"Url",
    $"Type",
    $"custId",
    to_date($"updateTimestamp", "yyyy-MM-dd").as("update_date")
  )
  .filter(myFilter(mode, filcond))
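One hedged sketch of a possible approach, under the assumption that the custom condition is passed as a SQL expression string (e.g. "custId = '1'") rather than as Scala source like col("custId")==="1": Spark's functions.expr parses such a string into a Column. The dt and dB values are assumed to be in scope exactly as in the original code.

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{date_format, date_sub, expr, lit, to_date}
import ss.implicits._   // ss is the SparkSession from the original code; needed for $"..."

// Sketch only: dt (a date string) and dB (days back, an Int) are assumed to be in scope.
def myFilter(mode: String, filcond: String): Column = mode match {
  case "daily" =>
    $"update_date" >= date_sub(to_date(date_format(lit(dt), "yyyy-MM-dd")), dB) &&
      $"update_date" <= to_date(date_format(lit(dt), "yyyy-MM-dd"))
  case "custom" =>
    expr(filcond)   // e.g. expr("custId = '1'") becomes a Column
  case other =>
    throw new IllegalArgumentException(s"unsupported mode: $other")
}

// Usage: myDf.filter(myFilter("custom", "custId = '1'"))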

opposite of spark dataframe `withColumn` method?

I'd like to be able to chain a transformation on my DataFrame that drops a column, rather than assigning the DataFrame to a variable (i.e. df.drop()). If I wanted to add a column, I could simply call df.withColumn(). What is the way to drop a column in an in-line chain of transformations?
For the entire example use this as baseline:
val testVariable = 10
var finalDF = spark.sql("select 'test' as test_column")
val iDF = spark.sql("select 'John Smith' as Name, cast('10' as integer) as Age, 'Illinois' as State")
val iDF2 = spark.sql("select 'Jane Doe' as Name, cast('40' as integer) as Age, 'Iowa' as State")
val iDF3 = spark.sql("select 'Blobby' as Name, cast('150' as integer) as Age, 'Non-US' as State")
val nameDF = iDF.unionAll(iDF2).unionAll(iDF3)
1. Conditional Drop
If you want to drop only on certain outputs and these are known outputs, you can build a conditional check to decide whether the column needs to be dropped or not. In this case, if the test variable is 5 or greater it drops the Name column; otherwise it adds a new column.
finalDF = if (testVariable >= 5) {
  nameDF.drop("Name")
} else {
  nameDF.withColumn("Cooler_Name", lit("Cool_Name"))
}
finalDF.printSchema
2. Programmatically build the select statement. The selectExpr statement takes in independent strings and builds them into commands that Spark can read. In the case below we know we have a drop test, but we do not know in advance which columns might be dropped. If a column gets a test value that does not equal 1, we do not include that column in our command array. When we run the command array against selectExpr on the table, those columns are dropped.
val columnNames = nameDF.columns
val arrayTestOutput = Array(1, 0, 1)
var iteratorArray = 1
var commandArray = Array.empty[String]
while (iteratorArray <= columnNames.length) {
  if (arrayTestOutput(iteratorArray - 1) == 1) {
    commandArray = commandArray :+ columnNames(iteratorArray - 1)
  }
  iteratorArray = iteratorArray + 1
}
finalDF = nameDF.selectExpr(commandArray: _*)
finalDF.printSchema
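A more idiomatic sketch of the same drop-by-flag idea, assuming the same nameDF and test output as above: pair each column name with its flag and keep only the names whose flag is 1.

// Sketch: same logic as the while loop, expressed with zip/collect.
val keepColumns = nameDF.columns
  .zip(Array(1, 0, 1))                    // (columnName, testFlag) pairs
  .collect { case (name, 1) => name }     // keep only flagged columns

finalDF = nameDF.selectExpr(keepColumns: _*)
finalDF.printSchema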

spark UDF to return list of Rows

In Spark SQL I am trying to join multiple tables that are already in place.
However, I need to use a function which will take user input, get details from two other tables, and then use this result in the join.
The query is something like below:
select t1.col1,t1.col2,t2.col3,cast((t1.value * t3.value) from table1 t1
left join table2 t2 on t1.col = t2.col
left join fn_calculate (value1, value2) as t3 on t1.value = t3.value
Here fn_calculate is the function which takes value1 and value2 as parameters and returns a table of rows (in SQL Server it is a table-valued function returning a table).
I am trying to do this by using a Hive generic UDF which will take the input parameters and then return the DataFrame, like below:
public String evaluate(DeferredObject[] arguments) throws HiveException {
    if (arguments.length != 1) {
        return null;
    }
    if (arguments[0].get() == null) {
        return null;
    }
    DataFrame dataFrame = sqlContext
        .sql("select * from A where col1 = value and col2 = value2");
    javaSparkContext.close();
    return "dataFrame";
}
Or do I need to use Scala functions like below?
static class Z extends scala.runtime.AbstractFunction0<DataFrame> {
    @Override
    public DataFrame apply() {
        // TODO Auto-generated method stub
        return sqlContext.sql("select * from A where col1 = value and col2 = value2");
    }
}
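As a hedged sketch only (not the Hive generic-UDF route attempted above): one way to approximate a table-valued function in Spark is a plain Spark SQL UDF that returns a sequence of structs, which can then be exploded and used like the t3 side of the join. The CalcRow shape, the column names value1/value2, and the body of the calculation are all hypothetical stand-ins for fn_calculate.

import org.apache.spark.sql.functions.{explode, udf}
import spark.implicits._   // spark is the active SparkSession

// Hypothetical shape of one row returned by fn_calculate.
case class CalcRow(value: Double, result: Double)

// Hypothetical reimplementation of fn_calculate as a pure Scala function
// returning several rows; Spark encodes Seq[CalcRow] as array<struct>.
val fnCalculate = udf { (value1: Double, value2: Double) =>
  Seq(CalcRow(value1, value1 * value2), CalcRow(value2, value1 + value2))
}

// Explode the returned rows and treat them as the t3 side of the join.
val t1 = spark.table("table1")
val withT3 = t1
  .withColumn("t3", explode(fnCalculate($"value1", $"value2")))
  .select($"*", $"t3.value".as("t3_value"), $"t3.result".as("t3_result"))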

ServiceStack.OrmLite with a DateTime.Month Predicate

While using ServiceStack.OrmLite 3.9.70.0 and following some of the examples from the ServiceStack.OrmLite wiki, I am trying to select rows where the LastActivity date's month equals 1.
I keep getting the error:
{"variable 'pp' of type 'Author' referenced from scope '', but it is not defined"}
LastActivity is a nullable DateTime, defined like:
public DateTime? LastActivity { get; set; }
I have tried:
db.Select<Author>(q => q.LastActivity.Value.Month == 1);
AND
var visitor = db.CreateExpression<Author>();
db.Select<Author>(visitor.Where(q => q.LastActivity.Value.Month == 1));
AND
SqlExpressionVisitor<Author> ev = OrmLiteConfig.DialectProvider.ExpressionVisitor<Author>();
db.Select<Author>(ev.Where(q => q.LastActivity.Value.Month == 1));
AND
var predicate = ServiceStack.OrmLite.PredicateBuilder.True<Author>();
predicate = predicate.And(q => q.LastActivity.Value.Month == 1);
db.Select<Author>(predicate);
I am trying to avoid using a sql string in the select because I like the compile time checking of the field names and types.
Do a less-than and greater-than comparison on the date field, i.e.
LastActivity >= variableThatHoldsStartDateOfMonth && LastActivity <= variableThatHoldsLastDayOfMonth
This will give you results for the whole month.

What is the best way to deal with nullable string columns in LinqToSql?

Assume you have a table with a nullable varchar column. When you try to filter the table, you would use this (pFilter is a parameter):
var filter = pFilter;
var dataContext = new DBDataContext();
var result = dataContext.MyTable.Where(x=>x.MyColumn == filter).ToList();
Now, what if there is a keyword that means "All Nulls"? The code would look like:
var filter = pFilter != "[Nulls]" ? pFilter : null;
var dataContext = new DBDataContext();
var result = dataContext.MyTable.Where(x=>x.MyColumn == filter).ToList();
But this doesn't work. Apparently, a String with value of null is... not null?
However, what does work is this code:
var filter = pFilter != "[Nulls]" ? pFilter : null;
var dataContext = new DBDataContext();
var result = dataContext.MyTable.Where(x=>x.MyColumn == filter || (filter == null && x.MyColumn == null)).ToList();
The workaround does not convince me; that's why my question is: what is the best way to deal with nullable string columns in LinqToSql?
Use String.Equals; that will make LINQ handle null appropriately in the generated SQL query:
var result = dataContext.MyTable
.Where(x => String.Equals(x.MyColumn, filter))
.ToList();
Edit:
If you use ==, LINQ will generate the query for the general case, WHERE [column] = @parameter, but in SQL NULL does not match NULL; the proper way to test for NULL is [column] IS NULL.
With String.Equals, LINQ has enough information to translate the method to the appropriate statement in each case, which means:
if you pass a non-null string it will be
WHERE ([column] IS NOT NULL) AND ([column] = @parameter)
and if it is null
WHERE [column] IS NULL
