Calculate WeekOfMonth & WeekOfYear from a given date in Apache Spark

At work I need to calculate the week of the month and the week of the year from a given date. Spark 1.5.0 has built-in functions for this:
import org.apache.spark.sql.functions._
val WeekOfMonth = date_format($"GivenDate","W")
val WeekOfYear = weekofyear($"GivenDate")
These values take Sunday as the start of the week.
But I want to calculate the week of month and week of year with Thursday as the start of the week. How can I do that?

I figured out a way to find the week of the month and the week of the year with Thursday as the start of the week. Below is the code I used:
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Calendar
// givenDate is assumed to be a String in the format below, e.g. "2016-01-06 00:00:00"
val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val dateValue = dateFormat.parse(givenDate)
val cal = Calendar.getInstance()
cal.setTime(dateValue)
// Weeks run from Thursday to Wednesday
cal.setFirstDayOfWeek(Calendar.THURSDAY)
// The week containing the first day of the month or year counts as week 1
cal.setMinimalDaysInFirstWeek(1)
val weekOfMonth = cal.get(Calendar.WEEK_OF_MONTH)
val weekOfYear = cal.get(Calendar.WEEK_OF_YEAR)
Hope this helps someone. Thank you.
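For readers who want to sanity-check the numbers outside the JVM, here is a minimal Python sketch of the same idea: weeks start on Thursday, and week 1 is whichever week contains the first day of the period (the effect of setMinimalDaysInFirstWeek(1)). The names week_index, week_of_month and week_of_year are made up for this illustration. The sketch counts weeks strictly within the calendar year, so it does not reproduce any roll-over to week 1 of the following year that java.util.Calendar may apply to late-December dates.

```python
from datetime import date

THURSDAY = 3  # date.weekday() numbers Monday as 0, so Thursday is 3


def week_index(first_of_period: date, day: date, first_weekday: int) -> int:
    """1-based week number of `day`, where week 1 is the week that
    contains `first_of_period` and every week starts on `first_weekday`."""
    # How many days of week 1 fall before first_of_period
    offset = (first_of_period.weekday() - first_weekday) % 7
    return ((day - first_of_period).days + offset) // 7 + 1


def week_of_month(day: date, first_weekday: int = THURSDAY) -> int:
    return week_index(day.replace(day=1), day, first_weekday)


def week_of_year(day: date, first_weekday: int = THURSDAY) -> int:
    return week_index(date(day.year, 1, 1), day, first_weekday)


print(week_of_month(date(2016, 1, 6)), week_of_month(date(2016, 1, 7)))
print(week_of_year(date(2021, 4, 27)))
```

2016-01-06 is a Wednesday and still falls in week 1 of the month, while 2016-01-07, a Thursday, opens week 2.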

Related

How to convert a str in hh:mm:ss format type to timestamp type without (year month day info) in pyspark?

I am trying to convert a string in hh:mm:ss format to a timestamp type without the year, month and day. Below is my code, but the result still includes the 1970-01-01 date.
import pyspark
from pyspark.sql.functions import *
df1 = spark.createDataFrame([('10:30:00',)], ['date'])
df2 = (df1
.withColumn("new_date", to_timestamp("date", 'HH:mm:ss')))
df2.show(2)
Sample output: 1970-01-01 10:30:00
How can I drop the year-month-day part in this case? Can someone please help?
Thanks a lot

Why is Spark not recognizing this time format?

I get null for the timestamp 27-04-2021 14:11 with this code. What mistake am I making? Why is the timestamp format string DD-MM-yyyy HH:mm not correct here?
df = spark.createDataFrame([('27-04-2021 14:11',)], ['t'])
df = df.select(to_timestamp(df.t, 'DD-MM-yyyy HH:mm').alias('dt'))
display(df)
D is for day of the year, and d is for day of the month.
Try this:
from pyspark.sql import functions as F
df = df.select(F.to_timestamp(df.t, "dd-MM-yyyy HH:mm").alias("dt"))

Adding date & calendar week columns to a PySpark dataframe

I'm using Spark 2.4.5. I want to add two new columns, date & calendar week, to my PySpark dataframe df.
So I tried the following code:
from pyspark.sql.functions import lit
df.withColumn('timestamp', F.lit('2020-05-01'))
df.show()
But I get the error message: AssertionError: col should be Column
Can you explain how to add the date column and the calendar week?
Looks like you missed the lit function in your code.
Here's what you were looking for:
df = df.withColumn("date", lit('2020-05-01'))
This works if you want to hardcode the date and week. If you want to derive the current date or timestamp programmatically, the built-in current_date() and current_timestamp() functions do that without needing a UDF.
I see two questions here: First, how to cast a string to a date. Second, how to get the week of the year from a date.
Cast string to date
You can either simply use cast("date") or the more specific F.to_date.
df = df.withColumn("date", F.to_date("timestamp", "yyyy-MM-dd"))
Extract week of year
F.date_format lets you format a date column with any pattern. w is the week of the year; W is the week of the month. The result is a string column, so cast it if you need a number.
df = df.withColumn("week_of_year", F.date_format("date", "w"))
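As a quick cross-check outside Spark, plain Python can parse the same string and report its ISO 8601 week number. This is only a sketch of what to expect: ISO weeks start on Monday, and the number produced by the w pattern in date_format may follow different week rules, so verify against your own data before relying on a match. The helper name iso_week is hypothetical.

```python
from datetime import datetime


def iso_week(date_string: str, fmt: str = "%Y-%m-%d") -> int:
    """Parse a date string and return its ISO 8601 week number."""
    parsed = datetime.strptime(date_string, fmt).date()
    # isocalendar() returns (ISO year, ISO week number, ISO weekday)
    return parsed.isocalendar()[1]


print(iso_week("2020-05-01"))  # 2020-05-01 is a Friday in ISO week 18
```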
Related Question: pyspark getting weeknumber of month

How to subtract 5 days from the current date

I want to subtract 5 days from the current date.
code:
import datetime
start_date = datetime.datetime.now().date()
end_date = datetime.datetime.now().date() - datetime.timedelta(days=5)
When I print end_date I get an error like:
an integer is required (got type datetime.date)
from datetime import date, timedelta
date.today() - timedelta(5)
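A slightly fuller sketch of the same approach, wrapped in a function so the result can be checked against a fixed date rather than today's date. The name days_before is hypothetical.

```python
from datetime import date, timedelta


def days_before(day: date, n: int) -> date:
    """Return the date that is n days before `day`."""
    return day - timedelta(days=n)


start_date = date.today()
end_date = days_before(start_date, 5)
print(start_date, end_date)

# timedelta handles month boundaries: 5 days before 2020-05-03 is 2020-04-28
print(days_before(date(2020, 5, 3), 5))
```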

Incrementing a date in Groovy gets an incorrect date

In my Grails project I have a date in the controller and I need to increment this date by one month, so I did as below:
SimpleDateFormat sdf = new SimpleDateFormat("YYYY-MM-DD HH:MM:ss");
def temp
use (TimeCategory)
{
temp=new Date()+30.days//current date 6-1-2016
}
println(sdf.format(temp))
This was the output:
2016-02-36
I also tried plus(30), which gave the same result. Is there a way to do this increment correctly?
In a Java date format, D stands for "day in year", hence 6 + 30 = 36. You want d for "day in month".
You are also using Y, which is "week year", instead of y, which is "year", and M (month) in the time part where you want m (minutes).
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
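The arithmetic behind the 36 can be checked in a few lines. The sketch below uses Python's datetime module rather than Groovy, purely for illustration, and assumes the start date 2016-01-06 taken from the comment in the question. %j is Python's day-of-year code, playing the role of D, and %d is day of month, playing the role of d.

```python
from datetime import date, timedelta

start = date(2016, 1, 6)
later = start + timedelta(days=30)

day_of_month = later.day                 # 5
day_of_year = later.timetuple().tm_yday  # 36 = 31 days of January + 5

print(later.strftime("%Y-%m-%d"))  # day of month: 2016-02-05
print(later.strftime("%Y-%m-%j"))  # day of year:  2016-02-036
```

The date itself was incremented correctly in the question; only the pattern letter used to print it was wrong.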
I'd go with this approach:
Calendar calendar = Calendar.getInstance()
calendar.add(Calendar.DAY_OF_YEAR, 30)
Another approach is to use Joda-Time.
