Need help on Databricks SQL query


Greetings!
I have a dataset consisting of order_number, start_date and status columns, as shown below.
From the above table I need the output as a single row, as shown below.
In the output, I need the most recent status and the first start date.
Can anyone help me with the approach I should follow?
Please help me. Thanks!
I tried DENSE_RANK, but I end up with either the recent start date or the old status value.

You need two separate window functions, one for the start date and one for the status, both running over the same window:
SELECT DISTINCT order_id,
    first(start_date) OVER (PARTITION BY order_id ORDER BY start_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS start_date,
    last(status) OVER (PARTITION BY order_id ORDER BY start_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS status
FROM a_table;
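
To see what the two window functions return, here is a minimal sketch that runs the same idea locally with Python's built-in sqlite3 module. It is not Databricks: SQLite spells the functions first_value and last_value, and the sample rows and status values are made up for illustration. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# In-memory database with made-up sample rows (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (order_id INTEGER, start_date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO a_table VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", "created"),
        (1, "2021-01-05", "shipped"),
        (1, "2021-01-09", "delivered"),
        (2, "2021-02-01", "created"),
        (2, "2021-02-03", "cancelled"),
    ],
)

# Both functions share one window that spans the whole partition, so
# first_value sees the earliest row and last_value sees the latest row.
query = """
SELECT DISTINCT order_id,
       first_value(start_date) OVER w AS start_date,
       last_value(status) OVER w AS status
FROM a_table
WINDOW w AS (PARTITION BY order_id ORDER BY start_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY order_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The explicit frame matters: without UNBOUNDED FOLLOWING, the default frame ends at the current row, so last_value would just return the current row's status.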

Related

How to update multiple columns in window frame based on other column value on same window frame

My DataFrame looks something like the one below. I am trying to update all column values based on the highest version within each group. I can update at the whole-table level, but I could not get it to work at the group/window-frame level.
source:
expected output:

SELECT *, max_by(status, version) OVER (PARTITION BY number) AS updated_status FROM your_table
This should work for your case.
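
For anyone who wants to check the behaviour locally, here is a rough sketch using Python's sqlite3 module. As far as I know SQLite has no max_by, so the sketch swaps in first_value ordered by version descending, which picks the same value when versions are unique within a group. The table contents are made up for illustration, and window-function support (SQLite 3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (number INTEGER, version INTEGER, status TEXT)")
# Hypothetical sample data: two groups with increasing versions.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, 1, "draft"),
        (1, 2, "review"),
        (1, 3, "approved"),
        (2, 1, "draft"),
        (2, 2, "rejected"),
    ],
)

# Stand-in for max_by(status, version) OVER (PARTITION BY number):
# order each partition by version descending and take the first status.
query = """
SELECT number, version, status,
       first_value(status) OVER (PARTITION BY number ORDER BY version DESC)
           AS updated_status
FROM your_table
ORDER BY number, version
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row in a group ends up with the status of that group's highest version in updated_status.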

Can't sort by date correctly

Instead of ordering by day, it is ordering by month.
I've tried str_to_date, but it doesn't exist in Spark SQL, and I tried repeating date_format in the ORDER BY with no success.

Try the code below:
import org.apache.spark.sql.functions._
spark.sql("""
  SELECT TO_DATE(CAST(UNIX_TIMESTAMP(ttr.created_at, 'dd/MM/yyyy') AS TIMESTAMP)) AS data
  FROM dl_wallet.tb_transaction AS ttr
  ORDER BY data DESC
""").show()

Because you're formatting the dates as strings, the sort is done in string order. One solution is to change the format so that the year comes first, then the month, then the day. A better way is to order by the date column (ttr.created_at) and not by the formatted string.
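
The difference between string order and date order is easy to reproduce outside Spark. Here is a small Python sketch with made-up dd/MM/yyyy values; it only illustrates the sorting behaviour, not the Spark API.

```python
from datetime import datetime

# Made-up sample values in dd/MM/yyyy format.
raw = ["05/03/2021", "17/01/2021", "02/12/2020"]

# Sorting the strings compares characters left to right, so the day
# digits decide the order and the month and year are effectively ignored.
as_strings = sorted(raw, reverse=True)

# Parsing to real dates first gives a chronological order.
as_dates = sorted(raw, key=lambda s: datetime.strptime(s, "%d/%m/%Y"), reverse=True)

print(as_strings)  # ['17/01/2021', '05/03/2021', '02/12/2020']
print(as_dates)    # ['05/03/2021', '17/01/2021', '02/12/2020']
```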

How to use ORDER BY and GROUP BY together in u-sql

I have a U-SQL query that fetches some data from 3 tables, and this query already has a GROUP BY. I want to fetch only the top 10 rows, so I have to use FETCH.
#data= SELECT C.id,C.Name,C.Address,ph.phoneLabel,ph.phone
FROM person AS C
INNER JOIN
phone AS ph
ON ph.id == C.id
GROUP BY id
ORDER BY id ASC
FETCH 100 ROWS;
Please provide me some samples.
Thanks in Advance!

I am not an expert, but a few days ago I ran a query that uses both GROUP BY and ORDER BY clauses. Here's how it looks:
SELECT DISTINCT savedposters.*, comments.rating, comments.posterid
FROM savedposters
INNER JOIN comments ON savedposters.id = comments.posterid
WHERE savedposters.display = 1
GROUP BY comments.posterid
HAVING avg(comments.rating) >= 4 AND count(comments.rating) >= 2
ORDER BY avg(comments.rating) DESC

What is your exact goal? ORDER BY and GROUP BY are independent of each other. Your query has a GROUP BY but no aggregation, so the GROUP BY is not needed, and the query would fail as written. If you want to limit the output to 10 rows, see the first example at Output Statement (U-SQL).

How to get accumulated total in Netsuite saved search?

I have a scenario where I need a running (accumulated) sum of a column's values. I am using:
SUM({custrecord_hm_bc_payroll_net_pay}) OVER(PARTITION BY {custrecord_advs_hold_unhold_status} ORDER BY {custrecord_hm_bc_payroll_emp_id} ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Please suggest.

To use Oracle's analytic functions in a NetSuite Saved Search, you need to trick the system by inserting a comment:
SUM/* comment */({custrecord_hm_bc_payroll_net_pay}) OVER(PARTITION BY {custrecord_advs_hold_unhold_status} ORDER BY {custrecord_hm_bc_payroll_emp_id} ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
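
As a side note, here is a small sketch of what that window frame computes, run locally with Python's sqlite3 module rather than in NetSuite. It does not exercise the comment trick itself; the table name, the shortened column names and the numbers are all made up, and window-function support (SQLite 3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the custrecord_* fields in the saved search.
conn.execute("CREATE TABLE payroll (emp_id INTEGER, hold_status TEXT, net_pay INTEGER)")
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?, ?)",
    [
        (1, "hold", 100),
        (2, "hold", 50),
        (3, "unhold", 70),
        (4, "hold", 25),
        (5, "unhold", 30),
    ],
)

# Running total per status: the frame grows from the first row of the
# partition up to the current row, ordered by employee id.
query = """
SELECT emp_id, hold_status, net_pay,
       SUM(net_pay) OVER (PARTITION BY hold_status ORDER BY emp_id
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           AS running_total
FROM payroll
ORDER BY hold_status, emp_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```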

cassandra query for list the data using timestamp

I am very new to Cassandra. I have one table with the following columns: CustomerId, Timestamp, Action, ProductId. I need to select rows by CustomerId and a from-date/to-date range using the timestamp. I don't know how to do this in Cassandra; any help will be appreciated.

First of all, remember that you should plan which queries will be executed in the future and design the table keys accordingly.
If you have keys as (customerId, date) then your query can be for example:
SELECT * FROM products WHERE customerId= '1' AND date < 1453726670241 AND date > 1453723370048;
Please, see http://docs.datastax.com/en/latest-cql/cql/cql_using/useAboutCQL.html
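
The numbers in that query look like epoch milliseconds. If your from/to dates start out as calendar dates, a sketch like the following could produce the bounds. The helper name to_epoch_ms, the chosen dates and the %s placeholder style are assumptions for illustration; how the parameters are actually bound depends on the client library you use.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)

# Hypothetical from/to range, expressed in UTC.
start = datetime(2016, 1, 25, 12, 0, tzinfo=timezone.utc)
end = datetime(2016, 1, 25, 13, 0, tzinfo=timezone.utc)

# Same shape as the query above, with the literals replaced by placeholders.
query = "SELECT * FROM products WHERE customerId = %s AND date > %s AND date < %s"
params = ("1", to_epoch_ms(start), to_epoch_ms(end))

print(query)
print(params)
```

The range condition only works this way when date is a clustering column after the partition key, which is what the (customerId, date) key above gives you.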
