Presto function APPROX_PERCENTILE returns a result that does not exist in the data - presto

We are using Presto's APPROX_PERCENTILE to calculate quantiles, but the results are not values from our data set.
Could anyone help me with this?
Thank you!
-- test sql
SELECT APPROX_PERCENTILE(x, 0.5) FROM (SELECT * FROM (VALUES 44,46,156,178,198,308,317,359,368,371,392,468,495,504,509,519,526,534,566,600,617,207776,232837,251596,575799,606566,629862,690977,1247659,1274939,1410858,1475113,1606614,1705740,1764699,1802117,1935795,2388198,2471839,2495978,2646356) AS y(x));
Result
55533
but 55533 does not exist in the collection.
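For comparison, the exact median of the same values can be checked outside Presto. The sketch below uses only Python's standard library and the values copied from the test query. It does not explain how APPROX_PERCENTILE arrives at its estimate; it only confirms what an exact calculation returns and that 55533 is indeed not one of the inputs.

```python
import statistics

# Values copied verbatim from the test query in the question.
values = [
    44, 46, 156, 178, 198, 308, 317, 359, 368, 371, 392, 468, 495, 504,
    509, 519, 526, 534, 566, 600, 617, 207776, 232837, 251596, 575799,
    606566, 629862, 690977, 1247659, 1274939, 1410858, 1475113, 1606614,
    1705740, 1764699, 1802117, 1935795, 2388198, 2471839, 2495978, 2646356,
]

# With 41 values, the exact median is the 21st value in sorted order.
exact_median = statistics.median(values)
reported_is_member = 55533 in values

print(exact_median)        # 617
print(reported_is_member)  # False
```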

Related

psycopg2 SELECT query with inbuilt functions

I have the following SQL statement, which reads the database to get the records for one day. Here is what I tried in the pgAdmin console -
SELECT * FROM public.orders WHERE createdat >= now()::date AND type='t_order'
I want to convert this to psycopg2 syntax, but it throws this error -
Database connection failed due to invalid input syntax for type timestamp: "now()::date"
Here is what I am doing -
query = f"SELECT * FROM {table} WHERE (createdat>=%s AND type=%s)"
cur.execute(query, ("now()::date", "t_order"))
records = cur.fetchall()
Any help is deeply appreciated.
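Do not use f-strings to build the query; use proper parameter passing. A bound parameter is always sent as a value, so "now()::date" arrives as a string literal that PostgreSQL cannot parse as a timestamp, which is the error you are seeing.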
now()::date is better expressed as current_date. See Current Date/Time.
You want:
query = "SELECT * FROM public.orders WHERE (createdat>=current_date AND type=%s)"
cur.execute(query, ["t_order"])
If you need dynamic identifiers (table or column names), then:
from psycopg2 import sql
query = sql.SQL("SELECT * FROM {} WHERE (createdat>=current_date AND type=%s)").format(sql.Identifier(table))
cur.execute(query, ["t_order"])
For more information, see the psycopg2 sql module documentation.
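The underlying point, that a bound parameter is treated as data and never evaluated as SQL, can be tried without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (an assumption for illustration; it uses `?` placeholders and SQLite's `date('now')` rather than psycopg2's `%s` and `current_date`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Passed as a parameter: the driver sends the text itself as a value,
# so the function call inside the string is never evaluated.
bound = conn.execute("SELECT ?", ("date('now')",)).fetchone()[0]

# Written in the SQL text: the database evaluates the function.
inline = conn.execute("SELECT date('now')").fetchone()[0]

print(repr(bound))   # the literal string "date('now')"
print(repr(inline))  # today's date in YYYY-MM-DD form

conn.close()
```

This mirrors why the fix above moves current_date into the query text and keeps only "t_order" as a parameter.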

Cosmos Db Sql Query produces drastically different results when using order by

I have a Cosmos Db instance with > 1 Million JSON Documents stored in it.
I am trying to pull the documents created within a certain time frame, based on the _ts property, which is generated automatically when a document is inserted and holds the UNIX timestamp of that moment.
I cannot understand why these two queries produce drastically different results:
Query 1:
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
Produces 0 results
Query 2
Select *
from c
where c._ts > TimeStamp1
AND c._ts < TimeStamp2
order by c._ts desc
Produces the correct number of results.
What I have tried?
I suspected that might be because of the default CosmosDb index on the data. So, I rewrote the index policy to index only that variable. Still the same problem.
Since my end goal is to group the data returned by the query, I then tried to use group by with order by, either alone or in a subquery. Surprisingly, according to the docs, Cosmos DB does not yet support using group by with order by.
What I need help on?
Why am I observing such a behavior?
Is there a way to index the database so that the rows are returned?
Beyond this, is there a way to still use group by and order by together (Please don't link the question to another one because of this point, I have gone through them and their answers are not valid in my case).
@Andy and @Tiny-wa, thanks for replying.
I was able to understand the unintended behavior and it was showing up because of the GetCurrentTimestamp() used to calculate the TimeStamps. The documentation states that
This system function will not utilize the index. If you need to
compare values to the current time, obtain the current time before
query execution and use that constant string value in the WHERE
clause.
Although I don't fully understand what this means, I was able to solve it by creating a stored procedure in which the timestamp is fetched before the SQL API query is formed and executed, and I got the rows as expected.
The stored procedure pseudocode looks like this:
function FetchData(){
    ..
    ..
    ..
    // Capture the current time (milliseconds) once, before the query is built.
    var Current_TimeStamp = Date.now();
    var CDbQuery =
        `Select *
        FROM c
        where (c._ts * 10000000) > DateTimeToTicks(DateTimeAdd("day", -1, TicksToDateTime(` + Current_TimeStamp + ` * 10000)))
        AND (c._ts * 10000000) < (` + Current_TimeStamp + ` * 10000)`
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        CDbQuery, // the query string built above
        function (err, feed, options) {
            ..
            ..
            ..
        });
}
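The same idea, computing the time window on the client before the query runs so that the WHERE clause only compares _ts against constants, can be sketched in Python. The function name `build_ts_window_query` and the parameter names `@start` and `@end` are hypothetical; the name/value parameter list is shaped the way the Cosmos DB SDKs accept query parameters, but this sketch only builds the query and does not call any SDK.

```python
import time

def build_ts_window_query(now_s=None, window_s=86400):
    """Return (query_text, parameters) for documents whose _ts falls
    within the last `window_s` seconds. _ts is in whole seconds."""
    # Capture the current time once, before the query is formed.
    now_s = int(time.time()) if now_s is None else int(now_s)
    start_s = now_s - window_s
    query = "SELECT * FROM c WHERE c._ts > @start AND c._ts < @end"
    parameters = [
        {"name": "@start", "value": start_s},
        {"name": "@end", "value": now_s},
    ]
    return query, parameters

query, parameters = build_ts_window_query(now_s=1_700_000_000)
print(query)
print(parameters)
```

Because both bounds are plain numbers, no time function has to be evaluated inside the query itself.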

RedShift Correlated Sub-query

I need your help. I am trying to convert the SQL query below to Redshift, but I get the error message "Invalid operation: This type of correlated subquery pattern is not supported yet"
SELECT
Comp_Key,
Comp_Reading_Key,
Row_Num,
Prev_Reading_Date,
( SELECT MAX(X) FROM (
SELECT CAST(dateadd(day, 1, Prev_Reading_Date) AS DATE) AS X
UNION ALL
SELECT dim_date.calendar_date
) a
) as start_dt
FROM stage5
JOIN dim_date ON calendar_date BETWEEN '2020-04-01' and '2020-04-15'
WHERE Comp_Key =50906055
The same query works fine in SQL Server. Could you please help me to run it in RedShift?
Regards,
Kiru
Kiru - you need to convert the correlated query into a join structure. Not knowing the data content of your tables or the exact expected output, I'm just guessing, but here's a swag:
SELECT
Comp_Key,
Comp_Reading_Key,
Row_Num,
Prev_Reading_Date,
Max_X
FROM stage5
JOIN dim_date ON calendar_date BETWEEN '2020-04-01' and '2020-04-15'
JOIN ( SELECT MAX(X) as Max_X, MAX(calendar_date) as date FROM (
SELECT CAST(dateadd(day, 1, Prev_Reading_Date) AS DATE) AS X FROM stage5
cross join
SELECT dim_date.calendar_date from dim_date
) a
) as start_dt ON a.date = dim_date.calendar_date
WHERE Comp_Key =50906055
This is just a starting guess but might get you started.
However, you are likely better off rewriting this query to use window functions as they are the fastest way to perform these types of looping queries in Redshift.
Thanks Bill. It won't work in Redshift, as it still has a correlated subquery.
However, I have rewritten the query using another method and it works fine.
I am closing the ticket.
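For readers landing on this thread: the subquery in the question takes, for each joined row, the later of Prev_Reading_Date plus one day and calendar_date. The Python sketch below only illustrates that row-wise logic (the function name `start_dt` mirrors the column alias; the sample dates are made up). In SQL, a scalar function such as GREATEST can express the same comparison without a subquery; whether that matches the rewrite the asker ended up using is not stated in the thread.

```python
from datetime import date, timedelta

def start_dt(prev_reading_date, calendar_date):
    """Later of (prev_reading_date + 1 day) and calendar_date.
    None values are skipped, as MAX() skips NULLs."""
    candidates = []
    if prev_reading_date is not None:
        candidates.append(prev_reading_date + timedelta(days=1))
    if calendar_date is not None:
        candidates.append(calendar_date)
    return max(candidates) if candidates else None

# Hypothetical rows: (Prev_Reading_Date, calendar_date)
rows = [
    (date(2020, 4, 9), date(2020, 4, 5)),
    (date(2020, 3, 1), date(2020, 4, 5)),
]
for prev, cal in rows:
    print(prev, cal, start_dt(prev, cal))
```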

SELECT with multiple values in DocumentDB

I have an Employees collection and I want to retrieve the full documents of 10 employees whose IDs I'd like to send in my SQL SELECT. How do I do that?
To clarify further, I have 10 EmployeeIds and I want to pull these employees' information from my Employees collection. I'd appreciate your help with this.
Update:
As of 5/6/2015, DocumentDB supports the IN keyword, which accepts up to 100 parameters.
Example:
SELECT *
FROM Employees
WHERE Employees.id IN (
"01236", "01237", "01263", "06152", "21224",
"21225", "21226", "21227", "21505", "22903",
"14003", "14004", "14005", "14006", "14007"
)
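If there are more IDs than the 100-parameter limit mentioned above, the list can be split into batches on the client. The following is a minimal sketch of that batching; the function name `build_in_queries` and the `@id0`, `@id1`, ... parameter names are hypothetical, and the sketch only builds query text and parameter lists without calling any SDK.

```python
def build_in_queries(ids, max_params=100):
    """Split `ids` into batches and build one parameterized IN query per batch."""
    queries = []
    for start in range(0, len(ids), max_params):
        chunk = ids[start:start + max_params]
        # Only placeholder names are interpolated into the SQL text;
        # the ID values themselves travel as parameters.
        names = [f"@id{i}" for i in range(len(chunk))]
        text = f"SELECT * FROM Employees e WHERE e.id IN ({', '.join(names)})"
        params = [{"name": n, "value": v} for n, v in zip(names, chunk)]
        queries.append((text, params))
    return queries

# 250 made-up IDs are split into batches of 100, 100 and 50.
batches = build_in_queries([str(n).zfill(5) for n in range(250)])
for text, params in batches:
    print(len(params), text[:60])
```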
Original Answer:
Adding on to Ryan's answer... Here's an example:
Create the following UDF:
var containsUdf = {
    id: "contains",
    body: function(arr, obj) {
        if (arr.indexOf(obj) > -1) {
            return true;
        }
        return false;
    }
};
Use your contains UDF in a SQL query:
SELECT * FROM Employees e WHERE contains(["1","2","3","4","5"], e.id)
For documentation on creating UDFs, check out the DocumentDB SQL reference
You can also vote for implementing the "IN" keyword for "WHERE" clauses at the DocumentDB Feedback Forums.
You could also achieve this by using OR support. Below is a sample –
SELECT *
FROM Employees e
WHERE e.EmployeeId = 1 OR e.EmployeeId = 2 OR e.EmployeeId = 3
If you need more ORs than DocumentDB allows, you would have to break your query up into multiple smaller queries by employeeId values. You can also issue the queries in parallel from the client and gather all the results.
The best way to do this today would be to create a Contains() UDF that takes the array of IDs to search on, and use that in the WHERE clause.
Does
Select * from Employees where EmployeeId in (1,3,5,6,...)
not work?
Thanks to ryancrawcour, we know it doesn't.
Another method is to use the ARRAY_CONTAINS method in the SQL API.
Here is the sample code :
SELECT *
FROM Employees
WHERE ARRAY_CONTAINS(["01236", "01237", "01263", "06152", "21224"], Employees.id)
I ran both queries (this one and the one using the IN keyword) against a sample data set, and both consumed the same number of RUs.

Cannot link MS Access query with subquery

I have created a query with a subquery in Access, and cannot link it in Excel 2003: when I use the menu Data -> Import External Data -> Import Data... and select the mdb file, the query is not present in the list. If I use the menu Data -> Import External Data -> New Database Query..., I can see my query in the list, but at the end of the import wizard I get this error:
Too few parameters. Expected 2.
My guess is that the query syntax is causing the problem, in fact the query contains a subquery. So, I'll try to describe the query goal and the resulting syntax.
Table Positions
ID (Autonumber, Primary Key)
position (double)
currency_id (long) (references Currency.ID)
portfolio (long)
Table Currency
ID (Autonumber, Primary Key)
code (text)
Query Goal
Join the 2 tables
Filter by portfolio = 1
Filter by currency.code in ("A", "B")
Group by currency and calculate the sum of the positions for each currency group and call the result: sumOfPositions
Calculate abs(sumOfPositions) on each currency group
Calculate the sum of the previous results as a single result
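The steps of the query goal above can be sketched in plain Python to make the expected result concrete. The sample rows are made up for illustration, and the variable names are hypothetical; only the table and column meanings come from the description above.

```python
from collections import defaultdict

# Hypothetical sample data. Currency: ID -> code
currency = {1: "A", 2: "B", 3: "C"}
# Positions: (position, currency_id, portfolio)
positions = [
    (10.0, 1, 1),
    (-25.0, 1, 1),
    (5.0, 2, 1),
    (100.0, 3, 1),  # excluded: currency code "C"
    (7.0, 2, 2),    # excluded: portfolio 2
]

# Join, filter by portfolio = 1 and code in ("A", "B"), group by code.
sum_of_positions = defaultdict(float)
for position, currency_id, portfolio in positions:
    code = currency[currency_id]
    if portfolio == 1 and code in ("A", "B"):
        sum_of_positions[code] += position

# Take the absolute value per group, then sum into a single result.
sum_abs = sum(abs(total) for total in sum_of_positions.values())

print(dict(sum_of_positions))  # {'A': -15.0, 'B': 5.0}
print(sum_abs)                 # 20.0
```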
Query
The query without the final sum can be created using the Design View. The resulting SQL is:
SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")));
In order to calculate the final SUM, I did the following (in the SQL View):
SELECT Sum(Abs([temp].[SumOfposition])) AS sumAbs
FROM [SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")))]. AS temp;
So, the question is: is there a better way for structuring the query in order to make the export work?
I can't see too much wrong with it, but I would take out some of the junk Access puts in and scale the query down to this. Hopefully it should run OK:
SELECT Sum(Abs(A.SumOfPosition)) As SumAbs
FROM (SELECT C.code, Sum(P.position) AS SumOfposition
FROM Currency As C INNER JOIN Positions As P ON C.ID = P.currency_id
WHERE P.portfolio=1
GROUP BY C.code
HAVING C.code In ("A","B")) As A
It might be worth trying to declare your parameters in the MS Access query definition and define their datatypes. This is especially important when you are trying to use the query outside of MS Access itself, since it can't auto-detect the parameter types. This approach is sometimes hit or miss, but worth a shot.
PARAMETERS [[Positions].[portfolio]] Long, [[Currency].[code]] Text ( 255 );
SELECT Sum(Abs([temp].[SumOfposition])) AS sumAbs
FROM [SELECT Currency.code, Sum(Positions.position) AS SumOfposition
FROM [Currency] INNER JOIN Positions ON Currency.ID = Positions.currency_id
WHERE (((Positions.portfolio)=1))
GROUP BY Currency.code
HAVING (((Currency.code) In ("A","B")))]. AS temp;
I have solved my problems thanks to the fact that the outer query is doing a trivial sum. When choosing New Database Query... in Excel, at the end of the process, after pressing Finish, an Import Data form pops up, asking
Where do you want to put the data?
There you can click on Create a PivotTable report... . If you define the PivotTable properly, Excel will display only the outer sum.

Resources