How to sort a field with character and decimal values - JPQL

I am stuck on a scenario where I have to sort
'a-2.3'
'a-1.1' and
'a-1.02'.
How can this be done with JPQL in Spring Data JPA, or with a SQL query? I would appreciate hearing your experience and ideas.
The expected order is ascending, based on the numeric value after a-.

This will depend on the database you're using, but e.g. in Oracle, you could call TO_NUMBER(SUBSTR(col, 3)) with SQL:
WITH t (col) AS (
SELECT 'a-2.3' FROM dual UNION ALL
SELECT 'a-1.1' FROM dual UNION ALL
SELECT 'a-1.02' FROM dual
)
SELECT col
FROM t
ORDER BY to_number(substr(col, 3))
This yields:
a-1.02
a-1.1
a-2.3
Of course, you'll have to adapt the parsing in case your prefix isn't always exactly a-, but something dynamic.
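In JPQL, something like this might be feasible, depending on whether your JPA provider supports CAST and which target type name it expects: CAST(SUBSTRING(col, 3, LENGTH(col) - 2) AS NUMBER)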
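If the cast cannot be pushed down to the database, the same "strip the prefix, then compare numerically" logic can be applied after loading the values. The following is only a sketch in plain Python; the helper name numeric_suffix is hypothetical, and it assumes the prefix itself never contains a '-'.

```python
from decimal import Decimal


def numeric_suffix(value: str) -> Decimal:
    """Return the number that follows the first '-' in value.

    Assumption: the prefix (here 'a') contains no '-' of its own.
    """
    _, _, number = value.partition("-")
    return Decimal(number)


values = ["a-2.3", "a-1.1", "a-1.02"]

# Sort by the parsed number rather than by the raw string.
sorted_values = sorted(values, key=numeric_suffix)
print(sorted_values)
```

Decimal is used instead of float so that values such as 1.02 and 1.1 compare exactly as written.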

Related

Reduce results to first match for each pattern with spark sql

I have a spark sql query, where I have to search for multiple identifiers:
SELECT * FROM my_table WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')
Now I get hundreds of results for each of these matches, where I am only interested in the first match for each identifier, i.e. one row with identifier == 'abc', one where identifier == 'cde' and so on.
What is the best way to reduce my result to only the first row for each match?
The best approach certainly depends a bit on your data and also on what you mean by first. Is that any random row that happens to be returned first? Or first by some particular sort order?
A general flexible approach is using window functions. row_number() allows you to easily filter for the first row by window.
SELECT * FROM (
SELECT *, row_number() OVER (PARTITION BY identifier ORDER BY ???) as row_num
FROM my_table
WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')) tmp
WHERE
row_num = 1
That said, aggregations like first or max_by are often more efficient, although they quickly become inconvenient when dealing with many columns.
You can use the first() aggregation function (after grouping by identifier) to only get the first row in each group.
But I don't think you'll be able to select * with this approach. Instead, you can list every individual column you want to get:
SELECT identifier, first(col1), first(col2), first(col3), ...
FROM my_table
WHERE identifier IN ('abc', 'cde', 'efg', 'ghi')
GROUP BY identifier
Another approach would be to fire a query for each identifier value with a limit of 1 and then union all the results.
With the DataFrame API, you can use your original query and then use .dropDuplicates(["identifier"]) on the result to only keep a single row for each identifier value.
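To make the "first row per identifier" idea concrete outside of Spark, here is a minimal sketch in plain Python of what filtering on row_number() = 1 amounts to. The function name first_per_key, the column ts, and the sample rows are all made up for illustration; in a real query, ts would be whatever column defines "first" for you.

```python
def first_per_key(rows, key, order_by):
    """Keep, for each key value, the row with the smallest order_by value.

    This mirrors filtering on row_number() = 1 with
    PARTITION BY key ORDER BY order_by.
    """
    best = {}
    for row in rows:
        k = row[key]
        if k not in best or row[order_by] < best[k][order_by]:
            best[k] = row
    return list(best.values())


# Hypothetical sample data.
rows = [
    {"identifier": "abc", "ts": 3, "payload": "x"},
    {"identifier": "cde", "ts": 1, "payload": "y"},
    {"identifier": "abc", "ts": 2, "payload": "z"},
]

result = first_per_key(rows, key="identifier", order_by="ts")
print(result)
```

Unlike the first() aggregation, this keeps whole rows together, which is the same reason the window-function variant works with select *.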

Correct way to get the last value for a field in Apache Spark or Databricks Using SQL (Correct behavior of last and last_value)?

What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like they should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts listed below discuss how to get max values, which is not the same as the last value in a list ordered by a different field. I want the last team a player joined: the player may have joined the Reds, the A's, the Zebras, and the Yankees, in that order timewise, and I'm looking for the Yankees. Those posts also get to the solution procedurally using Python/R, whereas I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
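The ORDER BY in a subquery is not guaranteed to be preserved by the outer GROUP BY, so last and last_value can return a different row on each run. Instead, you can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn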
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
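The key point is that "last" has to be defined by an explicit ordering column rather than by the order in which rows happen to arrive. A small plain-Python sketch of the team example illustrates this; the names and dates are invented, and latest_team is a hypothetical helper.

```python
from datetime import date


def latest_team(memberships):
    """Map each person to the team with the greatest date_joined."""
    best = {}  # person -> (date_joined, team)
    for person, team, date_joined in memberships:
        if person not in best or date_joined > best[person][0]:
            best[person] = (date_joined, team)
    return {person: team for person, (_, team) in best.items()}


# Invented sample data, deliberately not sorted by date.
rows = [
    ("ann", "Reds", date(2019, 4, 1)),
    ("ann", "Yankees", date(2021, 6, 1)),
    ("ann", "Zebras", date(2020, 9, 15)),
    ("bob", "A's", date(2018, 1, 10)),
]

print(latest_team(rows))
# The result does not depend on the order in which rows are processed.
print(latest_team(list(reversed(rows))) == latest_team(rows))
```

Because the comparison is on date_joined itself, shuffling the input does not change the answer, which is the property the row_number() query provides and a bare last() does not.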

Why is CQL allowing inequality operators with partition key?

The documentation is clear that the only operators allowed in a SELECT on a partition column are equals (=) and IN (value1, value2[, ...]). However, with ALLOW FILTERING, it seems inequality operators are allowed. Here's a simple example:
CREATE TABLE dept_emp (
emp_no INT,
dept_no VARCHAR,
from_date DATE,
to_date DATE,
PRIMARY KEY (emp_no, dept_no)
);
insert into dept_emp (emp_no, dept_no, from_date, to_date) values
(1, '9', '1901-01-01', '1920-02-01');
insert into dept_emp (emp_no, dept_no, from_date, to_date) values
(2, '9', '1920-01-01', '1930-01-01');
insert into dept_emp (emp_no, dept_no, from_date, to_date) values
(3, '9', '1920-01-01', '1930-01-01');
SELECT * FROM dept_emp WHERE emp_no > 1 ALLOW FILTERING;
 emp_no | dept_no | from_date  | to_date
--------+---------+------------+------------
      2 |       9 | 1920-01-01 | 1930-01-01
      3 |       9 | 1920-01-01 | 1930-01-01
(2 rows)
I took the documentation as describing what the CQL parser would recognize, and so was expecting an error like the one I get if I try a != operator. If this is just an ALLOW FILTERING thing, is it documented elsewhere which operators are allowed in that case?
Partition keys are stored in token order (the order of their hashes), not in value order, so operators like > require reading the entire data set from all replica sets and filtering out whatever doesn't match. This is horribly inefficient and expensive (which is why ALLOW FILTERING is required). The same would be true of !=. Generally, C* will outright refuse to do any operation that requires reading everything, as it's simply something the database is not designed for. ALLOW FILTERING permits some such cases for things like Spark jobs, but they should be avoided in everything except one-off operational debugging tasks or well-thought-out OLAP jobs.
Equality on the partition key is required for the coordinator to know where to send the request, and therefore for any semblance of an efficient query. I would highly recommend using only equality and changing your data model so that you can satisfy your queries that way.
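A rough way to picture the token-order point: each partition key is hashed to a token, and data is laid out by token rather than by key value. The sketch below uses MD5 purely as a stand-in hash (Cassandra's default partitioner is Murmur3, which is not in the Python standard library), and the token function is hypothetical, so the concrete ordering it produces is illustrative only.

```python
import hashlib


def token(partition_key: int) -> int:
    """Stand-in for the partitioner hash (NOT Cassandra's actual Murmur3)."""
    digest = hashlib.md5(str(partition_key).encode()).digest()
    # Interpret the first 8 bytes as a signed 64-bit token.
    return int.from_bytes(digest[:8], "big", signed=True)


keys = list(range(1, 11))

# Keys laid out by token, which is how partitions are spread over the ring.
by_token = sorted(keys, key=token)
print(by_token)

# A range predicate on the key (e.g. emp_no > 1) does not map to a
# contiguous token range, so every partition has to be inspected.
matches = [k for k in by_token if k > 1]
print(matches)
```

An equality predicate, by contrast, hashes to exactly one token, so the coordinator can route the request straight to the replicas that own it.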

WHERE variable = ( subquery ) in OpenSQL

I'm trying to retrieve rows from a table where a subquery matches a variable. However, it seems as if the WHERE clause only lets me compare fields of the selected tables against a constant, variable or subquery.
I would expect to write something like this:
DATA(lv_expected_lines) = 5.
SELECT partner contract_account
INTO TABLE lt_bp_ca
FROM table1 AS tab1
WHERE lv_expected_lines = (
SELECT COUNT(*)
FROM table2
WHERE partner = tab1~partner
AND contract_account = tab1~contract_account ).
But obviously this select treats my local variable as a field name and it gives me the error "Unknown column name "lv_expected_lines" until runtime, you cannot specify a field list."
But in standard SQL this is perfectly possible:
SELECT PARTNER, CONTRACT_ACCOUNT
FROM TABLE1 AS TAB1
WHERE 5 = (
SELECT COUNT(*)
FROM TABLE2
WHERE PARTNER = TAB1.PARTNER
AND CONTRACT_ACCOUNT = TAB1.CONTRACT_ACCOUNT );
So how can I replicate this logic in RSQL / Open SQL?
If there's no way I'll probably just write native SQL and be done with it.
The program below might lead you to an Open SQL solution. It uses the SAP demo tables to determine the plane types that are used on a specific number of flights.
REPORT zgertest_sub_query.
DATA: lt_planetypes TYPE STANDARD TABLE OF s_planetpp.
PARAMETERS: p_numf TYPE i DEFAULT 62.
START-OF-SELECTION.
SELECT planetype
INTO TABLE lt_planetypes
FROM sflight
GROUP BY planetype
HAVING COUNT( * ) EQ p_numf.
LOOP AT lt_planetypes INTO DATA(planetype).
WRITE: / planetype.
ENDLOOP.
It only works if you don't need to read fields from TAB1. If you do you will have to gather these with other selects while looping at your results.
For those who found this question in 2020 or later: this construction has been supported since ABAP 7.50, and no workarounds are needed:
SELECT kunnr, vkorg
FROM vbak AS v
WHERE 5 = ( SELECT COUNT(*)
FROM vbap
WHERE kunnr = v~kunnr
AND vkorg = v~vkorg )
INTO TABLE @DATA(customers).
This selects all customers who made 5 sales orders within some sales organization.
In ABAP there is no way to write the query exactly as in native SQL.
I would advise against using native SQL; instead, give the SELECT/ENDSELECT statement a try.
DATA: ls_table1 TYPE table1,
lt_table1 TYPE TABLE OF table1,
lv_count TYPE i.
" Outer loop: read each candidate row from table1.
SELECT PARTNER CONTRACT_ACCOUNT
INTO CORRESPONDING FIELDS OF ls_table1
FROM TABLE1.
" Count the matching rows in table2 for the current row.
SELECT COUNT(*)
INTO lv_count
FROM TABLE2
WHERE PARTNER = ls_table1-partner
AND CONTRACT_ACCOUNT = ls_table1-contract_account.
" Skip rows whose count differs from 5.
CHECK lv_count EQ 5.
APPEND ls_table1 TO lt_table1.
ENDSELECT.
Here you append to lt_table1 only those rows for which the count in the selection on table2 equals 5.
Hope it helps.

Split a string of numbers passed to stored procedure and perform a lookup in a table [duplicate]

I have to pass a string of numbers (like 234567, 678956, 345678) to a stored procedure. The SP will split that string on the comma delimiter, take each value (e.g. 234567), look it up in another table, get the corresponding value from another column, and build a string.
For instance if have a table, TableA with 3 columns Column1, Column2, and Column3 with data as follows:
1 123456 XYZ
2 345678 ABC
I would pass a string of numbers to a stored procedure, for instance '123456', '345678'. It would then split this string of numbers, take the first number (123456), look it up in TableA, and get the matching value from Column3, i.e. 'XYZ'.
I need to loop through the split string of numbers ('123456', '345678') and return the concatenated string, like "XYZ ABC".
I am trying to do it in Oracle 11g.
Any suggestions would be helpful.
It's almost always more efficient to do everything in a single statement if at all possible, i.e. don't use a function if you can avoid it.
There is a little trick you can use to solve this using REGEXP_SUBSTR() to turn your string into something usable.
with the_string as (
select '''123456'', ''345678''' as str
from dual
)
, the_values as (
select regexp_substr( regexp_replace(str, '[^[:digit:],]')
, '[^,]+', 1, level ) as val
from the_string
connect by regexp_substr( regexp_replace(str, '[^[:digit:],]')
, '[^,]+', 1, level ) is not null
)
select the_values.val, t1.c
from t1
join the_values
on t1.b = the_values.val
This works by removing everything but the digits you require and something to split them on, the comma. You then split on the comma and use a hierarchical query to turn the result into a column, which you can then use in a join.
Here's a SQL Fiddle as a demonstration.
Please note that this is highly inefficient when used on large datasets. It would probably be better if you passed variables normally to your function...
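For comparison, the same three steps (strip everything except digits and commas, split on the comma, look each value up) can be sketched in plain Python. The dictionary table_a stands in for Column2 and Column3 of TableA from the question, and lookup_concat is a hypothetical helper name; values without a match are simply skipped, which is an assumption about the desired behavior.

```python
import re


def lookup_concat(raw, table):
    """Split a quoted, comma-separated string of numbers and join the
    looked-up values with spaces. Unknown numbers are skipped."""
    # Keep only digits and the comma delimiter, as the regexp_replace does.
    cleaned = re.sub(r"[^0-9,]", "", raw)
    keys = [k for k in cleaned.split(",") if k]
    return " ".join(table[k] for k in keys if k in table)


# Stand-in for TableA: Column2 -> Column3.
table_a = {"123456": "XYZ", "345678": "ABC"}

result = lookup_concat("'123456', '345678'", table_a)
print(result)
```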
