Spark handling of time zones for built-in date and time functions - apache-spark

Assume I have a timestamp, such as one returned by the built-in current_timestamp() function in Spark. When I apply a function like hour() or minute() to it, how can I specify a time zone?
I believe that https://issues.apache.org/jira/browse/SPARK-18350 introduced support for this, but I can't get it to work. My situation is similar to the last comment on that page:
session.read.schema(mySchema)
.json(path)
.withColumn("year", year($"_time"))
.withColumn("month", month($"_time"))
.withColumn("day", dayofmonth($"_time"))
.withColumn("hour", hour($"_time", $"_tz"))
Having a look at the definition of the hour function, it uses an Hour
expression which can be constructed with an optional timeZoneId. I
have been trying to create an Hour expression, but this is a
Spark-internal construct, and the API forbids using it directly. I
guess providing a function hour(t: Column, tz: Column) along with the
existing hour(t: Column) would not be a satisfying design.
I am stuck trying to pass a specific time zone to the built-in time functions.
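For reference, the per-row behavior being asked for (read the hour of an instant as seen in a zone stored alongside it) can be sketched outside Spark. The following is a minimal plain-Python illustration of those semantics, not Spark code; the helper name hour_in_zone and the sample values are hypothetical. Within Spark itself, two built-ins that may be worth trying are the spark.sql.session.timeZone setting (one zone for the whole session) and from_utc_timestamp, which shifts a timestamp into a given zone before hour() is applied.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def hour_in_zone(instant: datetime, zone: tzinfo) -> int:
    """Return the hour-of-day of a timezone-aware instant as seen in `zone`."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    # astimezone() keeps the same point in time and only changes the wall clock.
    return instant.astimezone(zone).hour


# Hypothetical sample row: a UTC instant plus a zone for that row.
# A fixed +02:00 offset is used here; zoneinfo.ZoneInfo("Europe/Vienna")
# could be passed instead where the tz database is available.
event_time = datetime(2016, 6, 23, 14, 0, tzinfo=timezone.utc)
event_zone = timezone(timedelta(hours=2))

print(hour_in_zone(event_time, event_zone))    # hour in the row's zone
print(hour_in_zone(event_time, timezone.utc))  # hour in UTC
```

The instant itself never changes; only the wall-clock fields derived from it do, which is why the zone has to be supplied at extraction time.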

Related

Apache Beam Python SDK for Windowing with SQL

I want to apply windowing inside SqlTransform, like this:
SELECT f_timestamp, line, COUNT(*)
FROM PCOLLECTION
GROUP BY
line,
HOP(f_timestamp, INTERVAL '30' MINUTE, INTERVAL '1' HOUR)
My Row transformation mapping is
| "Create beam Row" >> beam.Map(lambda x: beam.Row(f_timestamp= float(x["timestamp_date"]), line = unicode(x["line"])))
I get this error on the Java side:
Caused by: org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.validate.SqlValidatorException:
Cannot apply 'HOP' to arguments of type 'HOP(<DOUBLE>, <INTERVAL MINUTE>, <INTERVAL HOUR>)'.
Supported form(s): 'HOP(<DATETIME>, <DATETIME_INTERVAL>, <DATETIME_INTERVAL>)'
Things I have tried:
Passing f_timestamp as a UNIX timestamp (a float).
Passing f_timestamp as a timestamp string (unicode).
From what I have read, the Java side uses java.util.Date for the timestamp. How can I work around this issue?
You should be able to use apache_beam.utils.timestamp.Timestamp for this.

Node.js - Oracle DB and fetchAsString format

I am stuck on a problem and I am not sure what the best way to solve it is. I have a date column that I want to select and fetch as a string. Conveniently, the node-oracledb module supports this through its fetchAsString option. But it returns the date as, for example, 10-JAN-16, and I want it as 10-01-2016. Is there a way to do that from the node-oracledb module, or should I modify the date after I get the result from the query?
UPDATE: I mean a solution without to_char in the query and without modifying the query.
Check out this section of my series on Working with Dates in JavaScript, JSON, and Oracle Database:
https://dzone.com/articles/working-with-dates-using-the-nodejs-driver
The logon trigger shows an example of using alter session to set the default date format. Keep in mind that there are three separate parameters: NLS_DATE_FORMAT, NLS_TIMESTAMP_FORMAT, and NLS_TIMESTAMP_TZ_FORMAT.
I only show NLS_TIMESTAMP_TZ_FORMAT because I convert to that type in the examples that follow as I need to do some time zone conversion for the date format I'm using.
Another way to set the NLS parameters is to use environment variables of the same name. Note that this method will not work unless you set the NLS_LANG environment variable as well.
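If changing the session's NLS settings is not an option, the other route raised in the question, reformatting the string after the fetch, is also straightforward. Here is a minimal sketch of that post-processing step, written in Python for illustration rather than Node.js. The function name reformat_oracle_date is hypothetical, and the sketch assumes the driver returns dates in the style shown in the question (10-JAN-16) with English month abbreviations.

```python
from datetime import datetime


def reformat_oracle_date(value: str) -> str:
    """Convert a 'DD-MON-YY' string such as '10-JAN-16' to 'DD-MM-YYYY'."""
    # %b matches abbreviated month names case-insensitively (English in the
    # default C locale). %y maps two-digit years 00-68 to 2000-2068 and
    # 69-99 to 1969-1999, which is not identical to Oracle's RR rule.
    parsed = datetime.strptime(value, "%d-%b-%y")
    return parsed.strftime("%d-%m-%Y")


print(reformat_oracle_date("10-JAN-16"))
```

Parsing into a real date first, rather than swapping substrings, means a malformed value raises an error instead of passing through silently.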

Presto - static date and timestamp in where clause

I'm pretty sure the following query used to work for me on Presto:
select segment, sum(count)
from modeling_trends
where segment='2557172' and date = '2016-06-23' and count_time between '2016-06-23 14:00:00.000' and '2016-06-23 14:59:59.000'
group by 1;
Now when I run it (on Presto 0.147 on EMR), I get an error about trying to assign a varchar to a date/timestamp.
I can make it work using:
select segment, sum(count)
from modeling_trends
where segment='2557172' and date = cast('2016-06-23' as date) and count_time between cast('2016-06-23 14:00:00.000' as TIMESTAMP) and cast('2016-06-23 14:59:59.000' as TIMESTAMP)
group by segment;
but it feels dirty...
Is there a better way to do this?
Unlike some other databases, Presto doesn't automatically convert between varchar and other types, even for constants. The cast works, but a simpler way is to use the type constructors:
WHERE segment = '2557172'
AND date = date '2016-06-23'
AND count_time BETWEEN timestamp '2016-06-23 14:00:00.000' AND timestamp '2016-06-23 14:59:59.000'
You can see examples for various types here: https://prestosql.io/docs/current/language/types.html
Just a quick thought: have you tried omitting the dashes in your date? Try 20160623 instead of 2016-06-23.
I encountered something similar with SQL Server, but I have not used Presto.

Parameterized job using Uno-choice plugin

I'm using the Uno-Choice plugin to select parameter values based on previous selections.
(This plugin helped me reduce the parameter count: I can reuse the same parameter for multiple platforms, based on the platform selection.)
I use a Groovy script to select the parameter values.
But the parameters take too long to load.
Is there any way to speed up this process?
I faced similar issues, and I was also using Groovy scripts to call shell scripts. I did the following to reduce the load time:
When you click Build with Parameters, all the scripts run together at once.
Use else conditions properly.
Also use a fallback script.
For example, suppose you have parameters such as
1) country
2) state
3) city
where each parameter depends on the previous values.
1) Try to only display contents on the Jenkins front end (for example, with the cat command).
2) Call a script only if the previous parameter holds a valid value.
3) Keep on-the-fly scripts to a minimum.
4) Tune delays/sleeps to your load time.
5) Remove any browser extensions, whether in Chrome or Firefox.
6) Try using the same page in incognito mode.
7) If the options are invalid, return an invalid option right away without going into any computation.
8) Uninstall plugins that are not required.
I will add more suggestions as I find them.
Please also post an update if you find any other way to reduce the load time.

Groovy Script set timezone for timestamp

I've been struggling to set the time zone inside a Groovy script. So far I have found that the following code returns the current timestamp for my location.
javax.xml.datatype.DatatypeFactory.newInstance()
.newXMLGregorianCalendar( GregorianCalendar.getInstance() ).toString()[0..21] + "Z"
Now I need it to return the date and time in UTC specifically, so that it matches the main server's time zone and can be run from any location.
All of this runs in a Groovy Script test step in SoapUI, and the value will be used as a variable inside a WSDL request.
Note: This will be used as a one-liner in the Custom Properties of a SOAP project.
One solution:
System.setProperty('user.timezone', 'UTC')
def gc= new GregorianCalendar()
The second is:
c = Calendar.instance
c.timeZone = TimeZone.getTimeZone("UTC")
The first solution works with a GregorianCalendar, which is easy to convert to an XML date, but I think the best solution works with Calendar.
I have not tested this code, so please check it.
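Independent of Groovy, the underlying point is to convert the instant to UTC before formatting, rather than appending "Z" to a local time. Here is a minimal sketch of that idea in Python (not Groovy, and not tried in SoapUI). The function name utc_stamp is hypothetical, and the two fractional digits are an assumption that mirrors the [0..21] slice in the question.

```python
from datetime import datetime, timedelta, timezone


def utc_stamp(instant: datetime) -> str:
    """Format a timezone-aware instant as 'YYYY-MM-DDTHH:MM:SS.ffZ' in UTC."""
    if instant.tzinfo is None:
        raise ValueError("instant must be timezone-aware")
    utc = instant.astimezone(timezone.utc)  # same instant, UTC wall clock
    hundredths = utc.microsecond // 10_000  # keep two fractional digits
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{hundredths:02d}Z"


# Hypothetical local time in a +02:00 zone.
local = datetime(2016, 6, 23, 16, 0, 0, 123456, tzinfo=timezone(timedelta(hours=2)))
print(utc_stamp(local))
print(utc_stamp(datetime.now(timezone.utc)))
```

The same two steps (switch the calendar's zone to UTC, then format) are what the Calendar-based suggestion above is aiming for.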

Resources