Can't sort by date correctly - apache-spark

Instead of ordering by day, the result is ordered by month.
I've tried str_to_date, but it doesn't exist in Spark SQL, and I tried repeating date_format in the ORDER BY clause, with no success.

Try the code below:
import org.apache.spark.sql.functions._
spark.sql("""
  SELECT TO_DATE(CAST(UNIX_TIMESTAMP(ttr.created_at, 'dd/MM/yyyy') AS TIMESTAMP)) AS data
  FROM dl_wallet.tb_transaction AS ttr
  ORDER BY data DESC
""").show()

As you're formatting the date as strings, sort is done by string order. One solution is to change the format so that year comes first, then month, then day. A better way is to order by the Date column (ttr.created_at) and not the formatted string.
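To see why the string sort goes wrong, here is a minimal Python sketch. The sample dates are made up for illustration and are not taken from the original table. Sorting dd/MM/yyyy strings compares the day digits first, while parsing them into real dates sorts chronologically.

```python
from datetime import datetime

# Hypothetical sample values in dd/MM/yyyy format (not from the original table).
raw_dates = ["05/03/2020", "17/01/2020", "02/12/2019"]

# Strings are compared character by character, so the day digits decide the order.
as_strings = sorted(raw_dates, reverse=True)

# Parsing first yields real dates, which sort chronologically.
as_dates = sorted(
    raw_dates,
    key=lambda s: datetime.strptime(s, "%d/%m/%Y"),
    reverse=True,
)

print(as_strings)  # ['17/01/2020', '05/03/2020', '02/12/2019']
print(as_dates)    # ['05/03/2020', '17/01/2020', '02/12/2019']
```

The same idea applies in Spark SQL: order by a real date value and apply any display formatting only in the select list.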

Related

Need help on Databricks SQL query

Greetings!
I have a dataset consisting of order_number, start_date and status columns, as shown below.
From the above table I need a single-row output like the one shown below.
In the output, I need the most recent status and the first start date.
Can anyone help me with the approach I should follow?
Thanks!
I tried DENSE_RANK, but I end up with either the recent start date value or the old status value.
You need two separate functions for start date and status running over the same window:
SELECT DISTINCT order_id,
    first(start_date) OVER (PARTITION BY order_id ORDER BY start_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS start_date,
    last(status) OVER (PARTITION BY order_id ORDER BY start_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS status
FROM a_table;
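For readers who find the window frame hard to picture, the same logic can be sketched in plain Python. The rows, the status values, and the `summarize` helper below are hypothetical and only illustrate the idea: within each order, sort by start_date, keep the first start_date, and keep the status of the last row.

```python
from datetime import date

# Hypothetical rows: (order_id, start_date, status); values are illustrative only.
rows = [
    (1, date(2021, 1, 5), "created"),
    (1, date(2021, 1, 9), "shipped"),
    (1, date(2021, 1, 7), "packed"),
    (2, date(2021, 2, 1), "created"),
]

def summarize(rows):
    """Return {order_id: (first start_date, status of the latest row)}."""
    summary = {}
    # Sorting by (order_id, start_date) mirrors PARTITION BY ... ORDER BY.
    for order_id, start, status in sorted(rows, key=lambda r: (r[0], r[1])):
        first_start = summary[order_id][0] if order_id in summary else start
        # Rows arrive in date order, so the last status seen is the most recent.
        summary[order_id] = (first_start, status)
    return summary

result = summarize(rows)
print(result[1])  # (datetime.date(2021, 1, 5), 'shipped')
```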

Python. Best way to filter array by date

I have a list of Rest objects. Rest is a Django model:
class Rest(models.Model):
    product = models.ForeignKey('Product', models.DO_NOTHING)
    date = models.DateTimeField()
    rest = models.FloatField()
I want to select objects from it for today's date. I do it like this. Maybe there is some more convenient and compact way?
for rest in rests_list:
    if rest.date.day == datetime.now().day:
        result.append(rest)
First - datetime.now().day will get you the day of the month (e.g. 18 if today is 18th March 2020), but you've said you want today's date. The below is on the assumption you want today's date, not the day of the month.
(NOTE: weAreStarDust pointed out that the field is a DateTimeField, not a DateField, which I missed and have updated the answer for)
The way you are doing it right now looks like it fetches all of the Rests from the database and then filters them in your application code (assuming rests_list comes from something like Rest.objects.all()). Generally, you want to do as much filtering as possible in the database query itself, and as little as possible in the client code.
In this case, what you probably want to do is:
from datetime import date
Rest.objects.filter(date__date=date.today())
That will bring back only the records that have a date of today, which are the ones you want.
If you already have all the rests somehow, and you just want to filter to the ones from today then you can use:
filter(lambda x: x.date.date() == date.today(), rests_list)
That will return a filter object containing only the items in rests_list that have date == date.today(). If you want it as a list, rather than an iterable, you can do:
list(filter(lambda x: x.date.date() == date.today(), rests_list))
or for a tuple:
tuple(filter(lambda x: x.date.date() == date.today(), rests_list))
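A list comprehension is an equivalent and arguably more readable way to write the same in-memory filter. The sketch below uses a stand-in namedtuple instead of the Django model so that it runs without a database. `FakeRest` and the sample values are assumptions made for illustration.

```python
from collections import namedtuple
from datetime import date, datetime, timedelta

# Stand-in for the Django model, only so the example runs on its own.
FakeRest = namedtuple("FakeRest", ["product", "date", "rest"])

today = date.today()
midnight = datetime.combine(today, datetime.min.time())

rests_list = [
    FakeRest("a", midnight, 1.0),                        # today at 00:00
    FakeRest("b", midnight - timedelta(days=1), 2.0),    # yesterday
    FakeRest("c", midnight + timedelta(hours=1), 3.0),   # today at 01:00
]

# Keep only the items whose datetime falls on today's calendar date.
todays_rests = [r for r in rests_list if r.date.date() == today]

print([r.product for r in todays_rests])  # ['a', 'c']
```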
NOTE:
If you actually want to be storing only a date, I would suggest considering use of a DateField (although not if you want to store any timezone information).
If you want to store a DateTime, I would consider renaming the field from date to datetime, or started_at - calling the field date but having a datetime can be a bit confusing and lead to errors.
As the docs say:
For datetime fields, casts the value as date. Allows chaining additional field lookups. Takes a date value.
from datetime import datetime
Rest.objects.filter(date__date=datetime.now().date())
You can use the Django filter to get only today's data from your model. There is no need to fetch all the data first and then loop over it to find today's records. Write your query like this:
import datetime
Rest.objects.filter(date__date=datetime.date.today())
But make sure the database server and the web server use the same timezone.

Is there a way to get the oldest date of a Pandas dataframe when not all columns are dates?

Just to give a little background, I'm trying to loop through the dataframe and query each Asset by its oldest date. Then I can trim the data with Pandas locally for each of the items that make up the asset.
The dataframe that I'm trying to loop through looks something like this:
I've tried using query_date = df.min(axis=1) but it picks up the values rather than the dates.
query_date will be the start date for each query that will be inside a for loop.
Thanks in advance for the help

cassandra query for list the data using timestamp

I am very new to Cassandra. I have one table with the following columns: CustomerId, Timestamp, Action, ProductId. I need to select rows by CustomerId and by a from-date/to-date range using the timestamp. I don't know how to do this in Cassandra; any help will be appreciated.
First of all, remember that you should plan which queries will be executed in the future and design the table keys accordingly.
If you have keys as (customerId, date) then your query can be for example:
SELECT * FROM products WHERE customerId= '1' AND date < 1453726670241 AND date > 1453723370048;
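The numeric bounds in that query look like Unix timestamps in milliseconds; this is an assumption, since the answer does not say so. Under that assumption, a small Python helper can turn a from/to date range into such bounds. `to_epoch_millis` and the sample dates are illustrative and not part of any Cassandra API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)

# Hypothetical range: all of January 2016, in UTC.
from_date = datetime(2016, 1, 1, tzinfo=timezone.utc)
to_date = datetime(2016, 2, 1, tzinfo=timezone.utc)

lower = to_epoch_millis(from_date)
upper = to_epoch_millis(to_date)

print(lower, upper)  # 1451606400000 1454284800000
```

The two values can then be used as the lower and upper bounds on the date column in the query shown above.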
Please, see http://docs.datastax.com/en/latest-cql/cql/cql_using/useAboutCQL.html

I want to select a date in access

select date from table1 where date <=(select Format(date,'mm/##/yyyy') as dates from table 2).
This query returns "02/##/2011".
I want to make a list like "02/02/2011","02/03/2011" etc.
I think the problem is with your query, if it is a query. Change it by putting a star (*) in place of the hash (#).
