Subquery error while executing the code below

Hello, while executing the query below I am getting this error:
An expression of non-boolean type specified in a context where a
condition is expected, near ','.
select Emp_Name, Salary
from Employee
where (Dep_ID, Salary) in (Select Dep_ID, max(salary) as highest_salary
from Employee
group by Dep_ID);
I tried changing the data type to VARCHAR(MAX), but it did not work.
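The error is not about data types: SQL Server, unlike MySQL or PostgreSQL, does not support a multi-column tuple on the left-hand side of IN, so the parser fails at the comma in (Dep_ID, Salary). A minimal rewrite against the same tables, joining on a derived table instead:

SELECT e.Emp_Name, e.Salary
FROM Employee AS e
JOIN (SELECT Dep_ID, MAX(Salary) AS highest_salary
      FROM Employee
      GROUP BY Dep_ID) AS m
  ON m.Dep_ID = e.Dep_ID
 AND m.highest_salary = e.Salary;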

Related

Correct way to get the last value for a field in Apache Spark or Databricks using SQL (correct behavior of last and last_value)?

What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like it should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
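As an aside, the array route can work without nesting two aggregates: sort the collected set and take its last element with element_at, which accepts a negative index. A minimal sketch, assuming values like '2021-01' sort correctly as plain strings (note this gives the last value in the field's own order; for "last as ordered by a different field" see the window-function answer below):

select
patient_id,
element_at(sort_array(collect_set(data_lot)), -1) as data_lot
from
covid.demo
group by patient_id
;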
The posts linked below discuss how to get max values, which is not the same as the last value in a list ordered by a different field. I want the last team a player joined: if the player joined the Reds, the A's, the Zebras, and the Yankees, in that order, I'm looking for the Yankees. Those posts also reach the solution procedurally using Python/R; I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
Spark's last and last_value aggregates are non-deterministic: the ordering applied in the inner query is not guaranteed to survive the shuffle that group by triggers, which is why the count changes between runs. Instead, you can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
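Applied to the earlier team example, the same pattern would look like the sketch below, assuming person_team has person, team, and date_joined columns as in the question. On Spark 3.0+, the max_by(team, date_joined) aggregate with group by person should express the same thing even more directly.

select person, team as last_team
from (
select *,
row_number() over (partition by person order by date_joined desc) as rn
from person_team
)
where rn = 1;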

Why am I not able to get a sorted result in select?

I have the following table:
DEST_COUNTRY_NAME ORIGIN_COUNTRY_NAME count
United States Romania 15
United States Croatia 1
United States Ireland 344
Egypt United States 15
The table is represented as a Dataset.
scala> dataDS
res187: org.apache.spark.sql.Dataset[FlightData] = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
The following query to sort a Dataset based on the count column works. I am getting the count column, sorting it, and showing the result:
scala> dataDS.sort($"count".desc).show;
But if I try to use select then it doesn't work. Why?
scala> dataDS.select(dataDS.col("count").desc).show()
I get the error:
java.lang.UnsupportedOperationException: Cannot evaluate expression: input[0, int, true] DESC NULLS LAST
I have several questions around this:
What is the purpose of sort, given that to me the ordering seems to be done by col("..").desc? Does sort just convert a Column to a Dataset?
Why doesn't using select work? My logic is (a) create a descending ordering of the column with dataDS.col("count").desc, (b) select it, and (c) show it. The reason I expected this to work is that a similar SQL query works: mysql> select count from flight_data_2015 ORDER BY count DESC;
But it isn't the same.
select(dataDS.col("count").desc) would be like SELECT count DESC FROM dataDS. Notice there is no ORDER BY clause.
This is what .orderBy or .sort in SparkSQL are doing, i.e. dataDS.sort($"count".desc).show; would be SELECT * FROM dataDS ORDER BY count DESC.
Also, note you could register the Dataset as a temp view and literally write spark.sql("SELECT ... "); it would have the same performance as doing it the other way.
Dataset.sort takes a list of Column objects within that Dataset, but it isn't converting them; it only returns a new, sorted Dataset.
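For reference, the contrast in plain SQL, runnable after registering the Dataset as a temp view (the view name dataDS is an assumption here):

-- what dataDS.sort($"count".desc) corresponds to: an ORDER BY clause
SELECT * FROM dataDS ORDER BY count DESC;
-- dataDS.select(dataDS.col("count").desc) would instead correspond to
-- SELECT count DESC FROM dataDS, which has no ORDER BY and cannot be evaluated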

Cassandra compare timestamp with currentdate

Trying to find the count for the current date in Cassandra, using the query below.
select count(*) from xyz.abctable where toDate(datetime)>=toDate(now());
This is not working and gives an error message:
SyntaxException: line 1:55 no viable alternative at input '(' (...)
from xyz.zbctable where toDate
The format of datetime is shown below; it needs to be compared with the current date so the count covers only the current date.
2017-02-23 22:41:12.386000+0000
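CQL does not allow a function call such as toDate(datetime) on the column side of a WHERE relation, which is what the SyntaxException is pointing at. A common workaround is to compute the start of the current day on the client and bind it as a plain timestamp; a minimal sketch with a hard-coded boundary (ALLOW FILTERING is needed unless datetime is part of the primary key):

SELECT count(*)
FROM xyz.abctable
WHERE datetime >= '2017-02-23 00:00:00+0000'
ALLOW FILTERING;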

Missing Column Assignment for 'featurename'

I am trying to write a CASE statement in which I need to run a subquery to check whether the record is available in the access table. If the person has access, then the score should be anything between 0 and 100, which is taken care of by COALESCE; if not, the value should be NULL. But the query is failing with the error saying, Missing column Assignment for 'id'.
My query:
SELECT
CASE
WHEN EXISTS (SELECT * FROM hive_dsn.db.access AS aa WHERE ub.id=aa.id)
THEN COALESCE(ub.score*100,0)
ELSE
NULL
END AS UNUSED
FROM hive_dsn.db.unused_output AS ub;
Basically, I did not understand the error message. What is this error saying, and how can I resolve it?
Thanks in advance.
I figured out that the WHERE condition was comparing columns from two tables without any JOIN between them.
I rewrote query like follows:
SELECT
CASE
WHEN EXISTS (SELECT ub.personid FROM hive_dsn.db.access AS aa
JOIN hive_dsn.db.unused_output AS ub ON ub.id=aa.id)
THEN COALESCE(ub.score*100,0)
ELSE NULL
END AS UNUSED
FROM hive_dsn.db.unused_output AS ub;
This JOIN condition resolved the issue.
Instead of a correlated subquery, this can be achieved using a JOIN.
Here is a sample:
SELECT
CASE
WHEN aa.id IS NULL THEN NULL
ELSE COALESCE(ub.score*100, 0)
END AS UNUSED
FROM hive_dsn.db.unused_output AS ub
LEFT JOIN hive_dsn.db.access AS aa
ON ub.id = aa.id;
Note: if there is a 1:n relationship between the tables, use DISTINCT in the SELECT statement, as in the sketch below.
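A minimal sketch of that DISTINCT variant; keeping ub.id in the select list is an addition here, so that different people who happen to have the same score are not collapsed into one row:

SELECT DISTINCT
ub.id,
CASE
WHEN aa.id IS NULL THEN NULL
ELSE COALESCE(ub.score*100, 0)
END AS UNUSED
FROM hive_dsn.db.unused_output AS ub
LEFT JOIN hive_dsn.db.access AS aa
ON ub.id = aa.id;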

Use subquery to return columns from different tables

I want a subquery which returns columns from different tables. For example, I am writing code similar to the below:
USE Northwind
SELECT *,
       (SELECT OrderID
        FROM dbo.Orders OI
        WHERE OI.OrderID IN (SELECT OI.OrderID
                             FROM [dbo].[Order Details] OD
                             WHERE OD.UnitPrice = P.UnitPrice)) AS 'ColumnName'
FROM Products P
ERROR: Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <=, >, >= or when the subquery is used as an expression.
What's the mistake in this code? Please reply soon.
Saradhi
Select Order Id FROM dbo. Orders OI WHERE OI.OrderID IN (Select OI.OrderID FROM [dbo].[Order Details] OD WHERE OD.UnitPrice=P.UnitPrice)
This query is returning more than one OrderID while it should be returning only one. Check whether your data is correct.
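If the goal is a single value per product, the scalar subquery must be limited to one row; a minimal sketch using TOP 1 (an arbitrary pick unless an ORDER BY is added). Note that OrderID already exists in [Order Details], so the hop through dbo.Orders is not needed:

SELECT *,
       (SELECT TOP 1 OD.OrderID
        FROM [dbo].[Order Details] OD
        WHERE OD.UnitPrice = P.UnitPrice) AS 'ColumnName'
FROM Products P;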
