I'm trying to write a subquery in Doctrine 2 to sort a table by a date column from another table.
(Let's say I'm querying table A, which has an id column, and B has an a_id and a date; in the subquery, b.a_id = a.id.)
I'm using the query builder and its addSelect method, but since I can't use LIMIT in the subquery, I get this error:
SQLSTATE[21000]: Cardinality violation: 1242 Subquery returns more than 1 row
The error is accurate, but how can I limit the subquery to one row if LIMIT is not allowed in Doctrine 2? I also tried building the subquery with the query builder, calling setMaxResults and then getDQL, but it still doesn't work.
->addSelect('(SELECT b.date FROM B b WHERE b.conversation = a.id ORDER BY b.date DESC)')
Is there any solution for my problem?
Thanks
Make the subquery return exactly one row. Try (SELECT MAX(b.date) FROM B b WHERE b.conversation = a.id)
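A minimal sketch of why the MAX() form works: an aggregate without GROUP BY always yields exactly one row, so the scalar subquery can never trip the cardinality check. This uses Python's built-in sqlite3 as a stand-in for Doctrine and MySQL, with the A/B tables and the a_id column from the question's description; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (a_id INTEGER, date TEXT);
    INSERT INTO A (id) VALUES (1), (2);
    -- hypothetical sample data: A.id = 1 has two matching B rows
    INSERT INTO B (a_id, date) VALUES
        (1, '2021-01-05'), (1, '2021-03-01'), (2, '2021-02-10');
""")

# MAX() collapses the matching B rows to a single value per A row,
# so the subquery is a valid scalar expression and can drive the sort.
rows = conn.execute("""
    SELECT a.id,
           (SELECT MAX(b.date) FROM B b WHERE b.a_id = a.id) AS last_date
    FROM A a
    ORDER BY last_date DESC
""").fetchall()

print(rows)
```

In Doctrine the same subquery text would go inside addSelect, and the ordering would then refer to the alias given to it.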
What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like it should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts shown below discuss how to get max values, which is not the same as the last value in a list ordered by a different field. I want the last team a player joined: the player may have joined the Reds, the A's, the Zebras, and the Yankees, in that order timewise, and I'm looking for the Yankees. These posts also get to the solution procedurally using Python/R. I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
You can assign row numbers based on a descending ordering of data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
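The same row_number() idea applied to the team example from the question can be sketched as follows. This is an illustration in SQLite via Python's sqlite3 (assuming a SQLite build with window function support, 3.25 or later) rather than Spark, and the names and dates in person_team are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_team (person TEXT, team TEXT, date_joined TEXT);
    -- hypothetical sample data
    INSERT INTO person_team VALUES
        ('alice', 'Reds',    '2018-04-01'),
        ('alice', 'A''s',    '2019-04-01'),
        ('alice', 'Zebras',  '2020-04-01'),
        ('alice', 'Yankees', '2021-04-01'),
        ('bob',   'Zebras',  '2020-06-01');
""")

# Number each person's rows from newest to oldest, then keep row 1.
# The ordering lives inside the window, so it does not depend on the
# order in which the engine happens to feed rows to an aggregate.
last_teams = conn.execute("""
    SELECT person, team
    FROM (
        SELECT person, team,
               ROW_NUMBER() OVER (
                   PARTITION BY person ORDER BY date_joined DESC
               ) AS rn
        FROM person_team
    )
    WHERE rn = 1
    ORDER BY person
""").fetchall()

print(last_teams)
```

Because the sort is part of the window definition, the result is deterministic as long as date_joined has no ties within a person.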
Table A has 20 records and table B has 19. How can I find the one record that is missing from table B? How do I compare or subtract the records of these two tables to find that record? I'm running the query in Apache Superset.
The exact answer depends on which column(s) define whether two records are the same. Assuming you wanted to use some primary key column for the comparison, you could try:
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.pk = a.pk);
If you wanted to use more than one column to compare records from the two tables, then you would just add logic to the exists clause, e.g. for three columns:
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.col1 = a.col1 AND
b.col2 = a.col2 AND
b.col3 = a.col3)
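As a quick check of the NOT EXISTS pattern, here is a small sketch using Python's sqlite3 in place of the database behind Superset. The pk and name columns and the choice of which row is missing are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (pk INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE TableB (pk INTEGER PRIMARY KEY, name TEXT)")

# 20 rows in TableA; TableB gets the same rows except pk = 13 (arbitrary choice).
all_rows = [(i, f"row{i}") for i in range(1, 21)]
conn.executemany("INSERT INTO TableA VALUES (?, ?)", all_rows)
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [r for r in all_rows if r[0] != 13],
)

# Keep only the TableA rows that have no partner in TableB.
missing = conn.execute("""
    SELECT a.*
    FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.pk = a.pk)
""").fetchall()

print(missing)
```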
I have a partition key: A
Clustering columns: B, C
I understand that I can query like this:
Select * from table where A = ?
Select * from table where A = ? and B = ?
Select * from table where A = ? and B = ? and C = ?
In certain cases, I want B to match any value in that column.
Is there a way I can query like the following?
Select * from table where A = ? and B = 'any value' and C = ?
Option 1:
In Cassandra, you should design your data model to suit your queries. The proper way to support your fourth query (by A and C, without necessarily knowing the value of B) is therefore to create a new table for that specific query. This table will be much the same, except that the clustering columns will be in a slightly different order:
PRIMARY KEY (A, C, B)
Now this query will work:
Select * from table where A = ? and C = ?
Option 2:
Alternatively you can create a materialized view, with a different clustering order. Now Cassandra will keep the MV in sync with your table data.
create materialized view mv_acbd as
select A, B, C, D
from TABLE1
where A is not null and B is not null and C is not null
primary key (A, C, B);
Now the query against this MV will work like a charm
Select * from mv_acbd where A = ? and C = ?
Option 3:
Not the best, but you could use the following query with your table as it is
Select * from table where A = ? and C = ? ALLOW FILTERING
Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster. In this particular case the scan stays within a single partition, so performance will vary depending on how many rows each partition holds in your use case.
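To illustrate why the clustering order matters, here is a toy model in Python. It is not Cassandra's storage engine or API, only a sketch under the assumption that rows within a partition are kept sorted by their clustering columns. With (B, C) ordering, rows for a given C are scattered and every row must be checked, which is roughly what Option 3 does; with (C, B) ordering, as in Options 1 and 2, they form one contiguous slice.

```python
from bisect import bisect_left, bisect_right

# One partition (a single value of A). Each row is (B, C, payload).
# The values are hypothetical sample data.
rows = [(b, c, f"row-{b}-{c}") for b in range(1, 4) for c in ("x", "y")]

# PRIMARY KEY (A, B, C): rows sorted by (B, C) inside the partition.
by_b_c = sorted(rows, key=lambda r: (r[0], r[1]))
# PRIMARY KEY (A, C, B): the same rows sorted by (C, B).
by_c_b = sorted(rows, key=lambda r: (r[1], r[0]))

def filter_scan(partition, c):
    """Filtering-style lookup: inspect every row in the partition."""
    return [r for r in partition if r[1] == c]

def range_slice(partition, c):
    """Lookup when C leads the clustering order: one contiguous slice."""
    keys = [r[1] for r in partition]
    return partition[bisect_left(keys, c):bisect_right(keys, c)]

scanned = filter_scan(by_b_c, "y")   # touches all 6 rows
sliced = range_slice(by_c_b, "y")    # jumps straight to the 3 matches
print(scanned == sliced, len(sliced))
```

Both lookups return the same rows; the difference is how much of the partition has to be read to find them.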
I would like to know the performance difference between the following two queries for a table cycling.cyclist_points containing thousands of rows:
SELECT sum(race_points)
FROM cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
select *
from cycling.cyclist_points
WHERE id = e3b19ec4-774a-4d1c-9e5a-decec1e30aac;
If sum(race_points) causes the query to be expensive, I will have to look for other solutions.
Performance difference between your queries:
Both queries need to scan the same number of rows (the number of rows in that partition).
The first query selects only a single column, so it is a little faster.
Instead of calculating the sum at run time, try to precompute it.
If race_points is an int or bigint, use a counter table like the one below:
CREATE TABLE race_points_counter (
id uuid PRIMARY KEY,
sum counter
);
Whenever a new row is inserted into cyclist_points, also increment the sum by that row's points:
UPDATE race_points_counter SET sum = sum + ? WHERE id = ?
Now you can just select the sum for that id:
SELECT sum FROM race_points_counter WHERE id = ?
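The write-time bookkeeping described above can be sketched like this. It uses Python's sqlite3 as a stand-in, so there is no real counter type: the total column and the add_points helper are hypothetical names for this example, and in Cassandra the increment would be the UPDATE on the counter table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cyclist_points (id TEXT, race_points INTEGER);
    -- stand-in for the Cassandra counter table
    CREATE TABLE race_points_counter (id TEXT PRIMARY KEY, total INTEGER NOT NULL);
""")

def add_points(conn, cyclist_id, points):
    """Insert a points row and bump the running total in the same step."""
    conn.execute("INSERT INTO cyclist_points VALUES (?, ?)", (cyclist_id, points))
    conn.execute(
        "INSERT OR IGNORE INTO race_points_counter VALUES (?, 0)", (cyclist_id,)
    )
    conn.execute(
        "UPDATE race_points_counter SET total = total + ? WHERE id = ?",
        (points, cyclist_id),
    )

cid = "e3b19ec4-774a-4d1c-9e5a-decec1e30aac"
for p in (120, 6, 75):  # made-up point values
    add_points(conn, cid, p)

# Reading the precomputed total touches one row ...
precomputed = conn.execute(
    "SELECT total FROM race_points_counter WHERE id = ?", (cid,)
).fetchone()[0]
# ... while sum() has to read every points row for that id.
scanned = conn.execute(
    "SELECT sum(race_points) FROM cyclist_points WHERE id = ?", (cid,)
).fetchone()[0]

print(precomputed, scanned)
```

The trade-off is an extra write per insert in exchange for a single-row read.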
How can I find the fifth order for each customer, returning title_order, or null if the customer doesn't have a fifth order?
The tables are:
customer with columns Id, firstname, lastname...
order with columns order_id, title_order, id_custmer, date...
Can it be done with a query alone, or do I need to create a function?
Thanks in advance
You can use OUTER APPLY with OFFSET-FETCH:
select c.firstname, oa.title_order
from customer c
-- [order] is bracketed because ORDER is a reserved word
outer apply (select o.title_order from [order] o
where o.id_custmer = c.Id
order by o.date
offset 4 rows fetch next 1 row only) oa
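For engines without OUTER APPLY, a correlated scalar subquery with LIMIT 1 OFFSET 4 expresses the same idea. The sketch below runs it in SQLite through Python's sqlite3; the customers and order titles are invented sample data, and the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (Id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE "order" (order_id INTEGER PRIMARY KEY, title_order TEXT,
                          id_custmer INTEGER, date TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Ben');
""")

# Hypothetical data: Ann has six orders, Ben has only two.
conn.executemany(
    'INSERT INTO "order" (title_order, id_custmer, date) VALUES (?, ?, ?)',
    [(f"ann-{n}", 1, f"2021-01-0{n}") for n in range(1, 7)]
    + [(f"ben-{n}", 2, f"2021-02-0{n}") for n in range(1, 3)],
)

# Skip the first four orders per customer and take the next one.
# A scalar subquery with no matching row evaluates to NULL.
fifth = conn.execute("""
    SELECT c.firstname,
           (SELECT o.title_order
            FROM "order" o
            WHERE o.id_custmer = c.Id
            ORDER BY o.date
            LIMIT 1 OFFSET 4) AS title_order
    FROM customer c
    ORDER BY c.Id
""").fetchall()

print(fifth)
```

Customers with fewer than five orders still appear in the result, with a NULL title, which matches the behavior of OUTER APPLY.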