How to express current row and current row +1 in Spark SQL - apache-spark

I have a window query that should sum the current row with the next one, so I wrote the following SQL:
select depName, empNo, salary, sum(salary) over (partition by depName order by empNo rows between CURRENT ROW AND CURRENT ROW + 1) sum_salary from t
But it fails with a syntax error:
org.apache.spark.sql.catalyst.parser.ParseException:
missing ')' at '+'(line 2, pos 136)
== SQL ==
select depName, empNo, salary, sum(salary) over (partition by depName order by empNo rows between CURRENT ROW AND CURRENT ROW + 1) sum_salary from t
----------------------------------------------------------------------------------------------------------------------------------------^^^

The correct syntax for a row-based frame is
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
and similarly
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
You can also replace the numeric constant with the UNBOUNDED keyword, as in UNBOUNDED PRECEDING or UNBOUNDED FOLLOWING.
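The frame clause can be tried out without a Spark cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in, on the assumption that the bundled SQLite is version 3.25 or later (the first release with window functions). The sample rows are made up; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
rows = [("dev", 1, 100), ("dev", 2, 200), ("dev", 3, 300), ("ops", 4, 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (depName TEXT, empNo INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Sum the current row and the next row within each department.
query = """
SELECT depName, empNo, salary,
       SUM(salary) OVER (
           PARTITION BY depName
           ORDER BY empNo
           ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
       ) AS sum_salary
FROM t
ORDER BY depName, empNo
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The last row of each partition has no following row, so its sum_salary equals its own salary.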

Related

SQL - Insert a row into table only if not exists and count is less than a threshold

Other similar questions deal with these two problems separately, but I want to handle both in a single statement. I'm using Python 3 with the psycopg2 library on a PostgreSQL database.
I have a table with 3 fields: ID, name, bool (BIT).
I want to insert a row into this table only if no other row exists with the same 'name' and 'bool' = 0, and the total count of rows with the same ID is less than a given threshold.
More specifically, my table should contain at most a given threshold number of rows with the same ID. Those rows can have the same 'name', but only one of the rows with the same ID and the same 'name' can have 'bool' = 0.
I tried this:
INSERT INTO table
SELECT 12345, abcdf , 0 FROM table
WHERE NOT EXISTS(
SELECT * FROM table
WHERE ID = 12345 AND name = abcdf AND bool = 0)
HAVING (SELECT count(*) FROM table
WHERE ID = 12345) < threshold
RETURNING ID;
but the row is inserted anyway.
Then I tried the same statement with 'HAVING' replaced by 'AND', but it inserts all the threshold rows at once.
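One possible approach, sketched here rather than taken from an accepted answer, is to drop the FROM clause from the outer SELECT and join both conditions with AND. A likely reason the AND variant inserts several rows is that `FROM table` makes the SELECT produce one candidate row per existing row. The sketch runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The table name `items`, the column name `flag`, and the helper `insert_if_allowed` are hypothetical, and psycopg2 would use `%s` placeholders instead of `?`.

```python
import sqlite3

THRESHOLD = 3  # hypothetical limit on rows per id

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, flag INTEGER)")

def insert_if_allowed(conn, row_id, name, threshold):
    """Insert (row_id, name, 0) only if both conditions hold.

    Returns True if a row was inserted, False otherwise.
    """
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO items (id, name, flag)
        SELECT ?, ?, 0
        WHERE NOT EXISTS (
                  SELECT 1 FROM items
                  WHERE id = ? AND name = ? AND flag = 0)
          AND (SELECT COUNT(*) FROM items WHERE id = ?) < ?
        """,
        (row_id, name, row_id, name, row_id, threshold),
    )
    return conn.total_changes > before

print(insert_if_allowed(conn, 12345, "abcdf", THRESHOLD))  # first insert
print(insert_if_allowed(conn, 12345, "abcdf", THRESHOLD))  # same name, flag 0 exists
```

Because the SELECT has no FROM clause, it yields at most one row, so at most one row is inserted per call. Under concurrent writers the check and the insert may still race, so additional locking or a constraint would probably be needed.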

How to skip the first row of data in SQL Query?

I have this code:
select DOLFUT from [DATABASE $]
How do I get it to get data from the 2nd line? (skip only the first line of data and collect all the rest)
You can use LIMIT with OFFSET to skip any number of rows. For example, this skips the first 10 rows and returns the next one:
SELECT * FROM table
LIMIT 1 OFFSET 10
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
Source: MySQL docs
In Access, which you seem to use, you can use:
Select DOLFUT
From [DATABASE $]
Where DOLFUT Not In
(Select Top 1 T.DOLFUT
From [DATABASE $] As T
Order By 1)
Data in tables has no inherent order. To get data from the 2nd line, you have to set up some sort sequence and then bypass the first record of the set, as Gustav has shown.
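The "sort, then skip one" idea can be sketched with Python's sqlite3 module. This is SQLite syntax, not Access syntax, and the table name `quotes` and the sample values are hypothetical. In SQLite a negative LIMIT means no upper bound, so `LIMIT -1 OFFSET 1` returns everything after the first row of the ordered set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (DOLFUT REAL)")  # hypothetical table name
conn.executemany(
    "INSERT INTO quotes VALUES (?)",
    [(5.10,), (5.25,), (5.05,), (5.30,)],  # made-up values
)

# Define the order explicitly, then skip exactly one row.
rest = [
    row[0]
    for row in conn.execute(
        "SELECT DOLFUT FROM quotes ORDER BY DOLFUT LIMIT -1 OFFSET 1"
    )
]
print(rest)
```

Without the ORDER BY, which row counts as "first" is up to the database engine.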

sqlite instr function is not working in some cases

In SQLite we have a table table1 with a column column1.
There are 4 rows with the following values for column1:
(p1,p10,p11,p20)
DROP TABLE IF EXISTS table1;
CREATE TABLE table1(column1 NVARCHAR);
INSERT INTO table1 (column1) values ('p1'),('p10'),('p11'),('p20');
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',','+column1+',') from table1;
We have to get the position of each value of column1 in the given string:
,p112,p108,p124,p204,p11,p1124,p1,p10,p20,
the query
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',column1) from table1;
returns values
(2,7,2,17)
which is not what we want
the query
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',','+column1+',') from table1;
returns 9 for all rows.
It turned out that this is the position of the first "0" character.
How can we get the exact positions of the column1 values in the given string in SQLite?
In SQLite the concatenation operator is ||, not + as in SQL Server, so do this:
Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',',' || column1 || ',') from table1;
What your code did was numeric addition, which resulted in 0 because none of the string operands could be successfully converted to a number,
so instr() was searching for '0' and always found it at position 9 of the string ',p112,p108,p124,p204,p11,p1124,p1,p10,p20,'.
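Both behaviours can be checked with Python's sqlite3 module. The sketch below rebuilds the table from the question and compares `||` with `+`; the variable names are illustrative only.

```python
import sqlite3

haystack = ",p112,p108,p124,p204,p11,p1124,p1,p10,p20,"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 NVARCHAR)")
conn.executemany(
    "INSERT INTO table1 (column1) VALUES (?)",
    [("p1",), ("p10",), ("p11",), ("p20",)],
)

# || concatenates, so instr() searches for ',p1,', ',p10,' and so on.
concat = conn.execute(
    "SELECT column1, instr(?, ',' || column1 || ',') FROM table1 ORDER BY rowid",
    (haystack,),
).fetchall()

# + is numeric addition: each non-numeric string counts as 0.
plus = conn.execute(
    "SELECT ',' + column1 + ',' FROM table1 ORDER BY rowid"
).fetchall()

print(concat)  # positions of the leading comma of each delimited match
print(plus)    # every row evaluates to 0
```

The positions returned point at the leading comma of each match, so add 1 to get the position of the value itself.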

Power Query: Index adjustment solution

I have multiple tables with an index column and some matching ids. I need to combine all tables into one with an adjusted index by applying the ratio between matching ids.
The first step (the yellow one) is simple: we multiply the Table2 index by the ratio of the first 2 initial tables. The hard part is the next step (the reddish one): we need to find the ratio between the matching id of Table3 and the previously adjusted id of Table2.
Is there a creative way you can make this in Power Query?
See image below:
Thanks!
The red index is simply
(100/82)*(88/100) = 88/82 = 1.07317
You can continue this pattern with more tables. For example, with five tables, your last index would be:
(Index of Max Table1 id)/(Index of Min Table2 id) *
(Index of Max Table2 id)/(Index of Min Table3 id) *
(Index of Max Table3 id)/(Index of Min Table4 id) *
(Index of Max Table4 id)/(Index of Min Table5 id)
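The chained product is easy to check numerically. The following is a Python sketch of the arithmetic only, not Power Query M code; the function name `chained_index` is hypothetical, and each pair holds the numerator and denominator of one factor from the pattern above.

```python
from math import prod

def chained_index(pairs):
    """Multiply the ratios (max-id index of table k) / (min-id index of table k+1)."""
    return prod(numerator / denominator for numerator, denominator in pairs)

# The two factors of the red index from the answer: (100/82) * (88/100).
red_index = chained_index([(100, 82), (88, 100)])
print(round(red_index, 5))  # 1.07317
```

With more tables, append one pair per additional table boundary.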

Get last item with date range and name filter in google sheets

I have the below set of records in Google Sheets. I would like to filter the rows with specific name and date range. Once I have the filtered data, I would like to fetch the last row's final amount cell data.
Ex: I would like to fetch a final amount of 300 if my date (dd/mm/yyyy) range is 01/01/2016 to 11/06/2016 and the Name selection is 'Sandeep'.
As I have experience with SQLite, I inserted the same records into a database and got the expected result using the query below.
select Final from MyTable where Date in (select max(Date) from MyTable WHERE Date BETWEEN '01/01/2016' AND '11/06/2016' and name = "Sandeep")
But I have no idea how to use multiple select statements in Google Sheets. Any other way of getting the result is fine too. Please help me get the result explained above.
=QUERY(A1:E50, "select E where A > date '2016-01-01' and A < date '2016-06-11' and B = 'Sandeep' order by A desc limit 1")
Use column IDs (A, B, C) instead of header names such as name or income. Multiple columns can be listed in a single select clause, separated by commas.
Dates in the where clause must be written in yyyy-mm-dd format, regardless of the format of the dates in the actual column.
See if this works
=index(E:E, max(filter(row(A:A), A:A>date(2016, 1, 1), A:A<date(2016, 6, 11), B:B="Sandeep")))
If you want to include start and end date, change > to >= and < to <=.
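The filter-then-take-the-latest logic of both formulas can be sketched in Python. The rows below are hypothetical, since the original sheet is not reproduced here, and `last_final` is an illustrative helper. It uses inclusive bounds, like the >= and <= variant, and picks the latest date, like the QUERY version.

```python
from datetime import date

# Hypothetical rows: (date, name, final_amount); the real sheet has more columns.
rows = [
    (date(2016, 1, 5), "Sandeep", 100),
    (date(2016, 3, 2), "Ravi", 250),
    (date(2016, 5, 20), "Sandeep", 300),
    (date(2016, 7, 1), "Sandeep", 400),
]

def last_final(rows, name, start, end):
    """Return the final amount of the latest matching row, or None if none match."""
    matches = [r for r in rows if r[1] == name and start <= r[0] <= end]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0])[2]

print(last_final(rows, "Sandeep", date(2016, 1, 1), date(2016, 6, 11)))  # 300
```

The INDEX/MAX(ROW) formula picks the last matching row in sheet order instead, which gives the same answer when the sheet is sorted by date.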
