My question is about how to get a row number as SN (a serial number) in a SQL query against an Access database, in which I get the total sales per day using a GROUP BY [Bill Date] clause.
My working query is:
sql = "SELECT [Bill Date] as [Date], Sum(Purchase) + Sum(Returns) as [Total Sales] FROM TableName Group By [Bill Date];"
I found the ROW_NUMBER clause on the Internet and tried it like this:
sql = "SELECT ROW_NUMBER() OVER (ORDER BY [Bill Date]) AS [SN], [Bill Date] as [Date], Sum(Purchase) + Sum(Returns) as [Total Sales] FROM TableName Group By [Bill Date];"
When I run the above code, I get this error:
-2147217900 Syntax error (missing operator) in query expression 'ROW_NUMBER() OVER (ORDER BY [Bill Date])'.
I am using Excel VBA to connect to the Access database.
Could anyone help me get it into the correct form?
Looks like you didn't define DESC or ASC in (ORDER BY [Bill Date]); it needs to be something like (ORDER BY [Bill Date] DESC) or (ORDER BY [Bill Date] ASC).
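Even with an explicit direction this may still fail, because the Access (Jet/ACE) SQL dialect does not support window functions such as ROW_NUMBER() at all, which is what the "missing operator" error points at. A common workaround is a correlated COUNT subquery over the grouped dates; a minimal sketch, assuming [Bill Date] uniquely identifies each output row (the aliases d and t are illustrative):
sql = "SELECT (SELECT COUNT(*) FROM (SELECT DISTINCT [Bill Date] FROM TableName) AS d" & _
      " WHERE d.[Bill Date] <= t.[Date]) AS [SN], t.[Date], t.[Total Sales]" & _
      " FROM (SELECT [Bill Date] AS [Date], SUM(Purchase) + SUM(Returns) AS [Total Sales]" & _
      " FROM TableName GROUP BY [Bill Date]) AS t ORDER BY t.[Date];"
Each row's SN is the count of distinct bill dates at or before its own date, which plays the role of ROW_NUMBER() OVER (ORDER BY [Bill Date]).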
Related
I have seen a very strange issue with a SQL query on Azure SQL Server.
I have a query using a subquery in an IN clause.
This one returns the error "Error converting data type nvarchar to numeric":
select * from tbl1 where id in (select id from table2 inner join table3 on table2.a=table3.a where status=0 )
This one works, even though the subquery only returns 500 rows, so the TOP 1000000 should not change anything:
select * from tbl1 where id in (select top 1000000 id from table2 inner join table3 on table2.a=table3.a where status=0 )
Thanks
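A plausible explanation, assuming the id columns have mismatched types (say, tbl1.id numeric but table2.id nvarchar): the IN comparison forces an implicit conversion, and adding TOP changes the execution plan enough that rows holding non-numeric text are filtered out before the conversion is attempted. A hedged defensive rewrite using TRY_CAST (available since SQL Server 2012), which returns NULL instead of raising an error for non-convertible values:
select * from tbl1 where id in (select try_cast(table2.id as numeric(18, 0)) from table2 inner join table3 on table2.a=table3.a where status=0 )
The numeric(18, 0) precision here is an assumption; match it to the actual type of tbl1.id.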
I am new to the Spark environment. I have a dataset with columns named as follows:
user_id, Date_time, order_quantity
I want to calculate the 90th percentile of order_quantity for each user_id.
If it were SQL, I would have used the following query (my_table stands in for the dataset):
%sql
SELECT user_id, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY order_quantity) OVER (PARTITION BY user_id) FROM my_table
However, Spark doesn't have built-in support for the percentile_cont function.
Any suggestions on how I can implement this in Spark on the above dataset?
Please let me know if more information is needed.
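One option worth considering, assuming an approximate answer is acceptable: Spark SQL's built-in percentile_approx aggregate, which is exact when its optional accuracy argument is large relative to the group sizes. A minimal sketch, again using my_table as a stand-in for the dataset:
%sql
SELECT user_id,
       percentile_approx(order_quantity, 0.9) AS perc_90
FROM my_table
GROUP BY user_id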
I have a solution for PERCENTILE_DISC(0.9), which will return the discrete order_quantity closest to percentile 0.9 (without interpolation).
The idea is to calculate PERCENT_RANK, subtract 0.9, take the absolute value, and then take the minimal value:
%sql
WITH temp1 AS (
  SELECT
    user_id,
    order_quantity,  -- carried through for FIRST_VALUE below
    ABS(PERCENT_RANK() OVER
      (PARTITION BY user_id ORDER BY order_quantity) - 0.9) AS perc_90_temp
  FROM my_table  -- stand-in for the source dataset
)
SELECT DISTINCT
  user_id,
  FIRST_VALUE(order_quantity) OVER
    (PARTITION BY user_id ORDER BY perc_90_temp) AS perc_disc_90
FROM
  temp1;
I was dealing with a similar issue too. I worked in SAP HANA and then I moved to Spark SQL on Databricks. I have migrated the following SAP HANA query:
SELECT
DISTINCT ITEM_ID,
LOCATION_ID,
PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY VENTAS) OVER (PARTITION BY ITEM_ID, LOCATION_ID) AS P95Y,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY PRECIO) OVER (PARTITION BY ITEM_ID, LOCATION_ID) AS MEDIAN_PRECIO
FROM MY_TABLE
to
SELECT DISTINCT
ITEM_ID,
LOCATION_ID,
PERCENTILE(VENTAS,0.8) OVER (PARTITION BY ITEM_ID, LOCATION_ID) AS P95Y,
PERCENTILE(PRECIO,0.5) OVER (PARTITION BY ITEM_ID, LOCATION_ID) AS MEDIAN_PRECIO
FROM
delta.`MY_TABLE`
In your particular case it should be as follows (my_table again stands in for your dataset):
SELECT DISTINCT user_id, PERCENTILE(order_quantity, 0.9) OVER (PARTITION BY user_id) AS perc_90 FROM my_table
I hope this helps.
Cassandra version: 2.1.10
CREATE TABLE customer_raw_data (
    id uuid,
    hash_prefix bigint,
    profile_data map<varchar,varchar>,
    PRIMARY KEY (hash_prefix, id)
);
I have an index on profile_data, and I have rows where profile_data is null.
How do I write a SELECT query to retrieve the rows where profile_data is null?
I tried the following:
select count(*) from customer_raw_data where profile_data=null;
select count(*) from customer_raw_data where profile_data CONTAINS KEY null;
With reference to https://issues.apache.org/jira/browse/CASSANDRA-3783: there is currently no SELECT support for indexed nulls, and given the design of Cassandra, this is considered a difficult/prohibitive problem.
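Since indexed-null lookups aren't supported, one common modeling workaround is to write an explicit sentinel entry instead of leaving the map unset, and to index the map keys. A hedged sketch; the index, the '__empty__' marker, and the sample values are illustrative, not part of the original schema:
CREATE INDEX profile_keys_idx ON customer_raw_data (KEYS(profile_data));

-- write a sentinel entry when there is no real profile data
INSERT INTO customer_raw_data (hash_prefix, id, profile_data)
VALUES (42, uuid(), {'__empty__': ''});

-- rows whose profile_data is "null" by convention
SELECT COUNT(*) FROM customer_raw_data WHERE profile_data CONTAINS KEY '__empty__';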
The basic problem: the column in the WHERE condition has to be either a primary key or a secondary index, so make your column whichever is suitable and then try the query below.
Try this:
select count(*) from customer_raw_data where profile_data='';
SELECT * FROM TableName WHERE colName > 5000 ALLOW FILTERING; // works fine
SELECT * FROM TableName WHERE colName > 5000 LIMIT 10 ALLOW FILTERING;
https://cassandra.apache.org/doc/old/CQL-3.0.html
Check the "ALLOW FILTERING" part.
I'm having a problem calculating the total bill of a patient. I have three tables, named "test", "pharmacy", and "check".
Columns in test are:
patient_ID
testname
rate
Columns in pharmacy are:
patient_ID
medicineDescription
qty
rate
Columns in check are:
patient_ID
doctorID
fees
date
I have a table Bill that will store the total amount for a patient:
patient_ID
amount
date
I have used the following query, but it gives an error.
$result = mysqli_query($data, "SELECT patient_ID, (SUM(pharmacy.qty*pharmacy.rate ) + SUM(test.rate) + SUM(check.fees))
AS total FROM pharmacy, test, check WHERE patient_ID= '$pID'" );
The correct query should be as below; a closing bracket was missing at the end of the subquery (... AS total FROM pharmacy)):
$result = mysqli_query($data, "SELECT patient_ID,
    (SUM(pharmacy.qty*pharmacy.rate) + SUM(test.rate) + SUM(check.fees)) AS total FROM pharmacy),
    test,
    check
    WHERE patient_ID= '$pID'" );
You have three tables in your FROM clause but no join condition - this means you're pairing each row with every other row, which is obviously not what you intended. One way to handle this is to use proper joins (note that check is a reserved word in MySQL, so it has to be backtick-quoted):
SELECT p.patient_id, pharmacy_sum + test_sum + fees_sum AS total
FROM (SELECT patient_id, SUM(qty * rate) AS pharmacy_sum
      FROM pharmacy
      WHERE patient_ID = '$pID'
      GROUP BY patient_id) p
JOIN (SELECT patient_id, SUM(rate) AS test_sum
      FROM test
      WHERE patient_ID = '$pID'
      GROUP BY patient_id) t ON p.patient_id = t.patient_id
JOIN (SELECT patient_id, SUM(fees) AS fees_sum
      FROM `check`
      WHERE patient_ID = '$pID'
      GROUP BY patient_id) c ON p.patient_id = c.patient_id
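One caveat with these inner joins: a patient with no rows in any one of the three tables drops out of the result entirely. If a missing category should simply count as zero, a hedged alternative (not the original poster's code) is three single-row aggregate subqueries combined with COALESCE:
SELECT '$pID' AS patient_id,
       COALESCE(p.pharmacy_sum, 0) + COALESCE(t.test_sum, 0) + COALESCE(c.fees_sum, 0) AS total
FROM (SELECT SUM(qty * rate) AS pharmacy_sum FROM pharmacy WHERE patient_ID = '$pID') p
CROSS JOIN (SELECT SUM(rate) AS test_sum FROM test WHERE patient_ID = '$pID') t
CROSS JOIN (SELECT SUM(fees) AS fees_sum FROM `check` WHERE patient_ID = '$pID') c
Each derived table returns exactly one row (an aggregate with no GROUP BY always does), so the cross join yields a single row even when one of the sums is NULL.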
I am using the query below to fetch the top two records for profile_run_key. I am using three almost identical subqueries to get this done, which means I am traversing the table three times for the WHERE clause, so I think this will take about 3·O(n) time to execute. Alternatively I could use ORDER BY, but then it would take O(n log n) time.
SELECT name, weighted_average
FROM idp_weighted_avg
where (profile_run_key =
(SELECT MAX (profile_run_key)
FROM idp_weighted_avg
WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A')
OR profile_run_key =
(SELECT MAX (profile_run_key)
FROM idp_weighted_avg
WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A'
AND profile_run_key <
(SELECT MAX (profile_run_key)
FROM idp_weighted_avg
WHERE SCORECARD_IDENTIFIER =
'U:D8yIYvW6EeGKyklcM7Co1A')))
I was wondering if I can reuse the result of the subquery below (I don't want to create a temp table)? Any alternatives? Suggestions?
SELECT MAX (profile_run_key)
FROM idp_weighted_avg
WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A'
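If the database supports common table expressions, one way to reuse that result without a temp table is a WITH clause; a hedged sketch along those lines (the max_key name is illustrative):
WITH max_key AS (
    SELECT MAX(profile_run_key) AS max_prk
    FROM idp_weighted_avg
    WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A'
)
SELECT name, weighted_average
FROM idp_weighted_avg, max_key
WHERE profile_run_key = max_prk
   OR profile_run_key = (SELECT MAX(profile_run_key)
                         FROM idp_weighted_avg, max_key
                         WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A'
                           AND profile_run_key < max_prk)
The answer below avoids the repeated MAX lookups altogether by ranking the rows in a single pass.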
It seems like you are selecting the two largest records for a given SCORECARD_IDENTIFIER. If that's what you really are after, you can add a ROW_NUMBER with:
PARTITION BY SCORECARD_IDENTIFIER - restarts the row number for each group
ORDER BY profile_run_key DESC - numbers the keys in a group from high to low
SQL Statement
SELECT *
FROM idp_weighted_avg wa
INNER JOIN (
SELECT profile_run_key
, rn = ROW_NUMBER() OVER (PARTITION BY SCORECARD_IDENTIFIER ORDER BY profile_run_key DESC)
FROM idp_weighted_avg
WHERE SCORECARD_IDENTIFIER = 'U:D8yIYvW6EeGKyklcM7Co1A'
) rn ON rn.profile_run_key = wa.profile_run_key
WHERE rn <= 2
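A small portability note on the statement above: the rn = ROW_NUMBER() OVER (...) alias style inside the derived table is SQL Server syntax; in most other dialects you would write ROW_NUMBER() OVER (...) AS rn. Either way, the table is read once and the window function does the ranking, which sidesteps the three nested MAX lookups in the original query.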