Gremlin bulk load CSV date format - bulk-load

I am trying to upload data into AWS Neptune, but I am getting an error because of the date format.
Sample format of the CSV:
~id, ~from, ~to, ~label, date:Date
e1, v1, v2, created, 2019-11-04
Can someone help me with this?

I did a test today using a CSV as follows:
~id,~from,~to,~label,date:Date
e1,3,4,testedge,2019-11-19
and it worked fine. This is after the load:
gremlin> g.E('e1')
==>e[e1][3-testedge->4]
gremlin> g.E('e1').valueMap()
==>{date=Tue Nov 19 00:00:00 UTC 2019}
Perhaps curl the loader endpoint for your cluster, adding ?errors=true&details=true to the curl URL, to see exactly why the load failed.
Cheers,
Kelvin
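A quick sketch of that status check, with the cluster endpoint and load id as placeholders and Python's requests library standing in for curl:
import requests

# Placeholder endpoint and load id -- substitute your own cluster's values.
NEPTUNE_ENDPOINT = "https://your-neptune-endpoint:8182"
LOAD_ID = "your-load-id"

# Ask the loader for per-record error details on the load job.
resp = requests.get(
    f"{NEPTUNE_ENDPOINT}/loader/{LOAD_ID}",
    params={"errors": "true", "details": "true"},
)
print(resp.json())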

Related

PostgreSQL - how to convert timestamp with timezone to timestamp without timezone

In my PostgreSQL database, the datetime is stored as 2022-05-10 10:44:19+08, but when I fetch it using Sequelize it comes back in the format 2022-05-10T02:44:19.000Z.
So my question is: how do I convert it to 2022-05-10 10:44:19?
Thanks in advance.
The result depends directly on your server's time zone, so depending on what you want to get, you can use different options.
Here is a dbfiddle with examples.
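If you would rather do the conversion in application code, a minimal Python sketch of the same shift (assuming the stored value belongs to a +08 zone; Asia/Singapore is used here as a stand-in) looks like this:
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# The value as the driver returns it: the same instant, expressed in UTC.
utc_value = datetime.fromisoformat("2022-05-10T02:44:19+00:00")

# Shift back to the +08 zone and drop the offset to get a plain timestamp.
local_value = utc_value.astimezone(ZoneInfo("Asia/Singapore")).replace(tzinfo=None)
print(local_value)  # 2022-05-10 10:44:19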

Drop Databricks table if older than 30 days

I would like to drop Databricks SQL DB tables if the table was created more than 30 days ago. How do I get the table creation datetime from Databricks?
Thanks
Given a tableName, the easiest way to get the creation time is as follows:
import org.apache.spark.sql.catalyst.TableIdentifier
val createdAtMillis = spark.sessionState.catalog
  .getTempViewOrPermanentTableMetadata(new TableIdentifier(tableName))
  .createTime
getTempViewOrPermanentTableMetadata() returns a CatalogTable that contains information such as:
CatalogTable(
Database: default
Table: dimension_npi
Owner: root
Created Time: Fri Jan 10 23:37:18 UTC 2020
Last Access: Thu Jan 01 00:00:00 UTC 1970
Created By: Spark 2.4.4
Type: MANAGED
Provider: parquet
Num Buckets: 8
Bucket Columns: [`npi`]
Sort Columns: [`npi`]
Table Properties: [transient_lastDdlTime=1578699438]
Location: dbfs:/user/hive/warehouse/dimension_npi
Serde Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Schema: root
|-- npi: integer (nullable = true)
...
)
You can list all tables in a database using sessionCatalog.listTables(database).
There are alternative ways of accomplishing the same but with a lot more effort and risking errors due to Spark behavior changes: poking about table metadata using SQL and/or traversing the locations where tables are stored and looking at file timestamps. That's why it's best to go via the catalog APIs.
Hope this helps.
Assuming your DB table is Delta:
You can use DESCRIBE HISTORY <database>.<table> to retrieve all transactions made to that table, including timestamps. According to the Databricks documentation, history is only retained for 30 days. Depending on how you plan to implement your solution, that may just work.
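Putting the two answers together, a rough PySpark sketch of the 30-day cleanup (run in a Databricks notebook where spark is defined, assuming Delta tables in a database named default and relying on the createdAt field that DESCRIBE DETAIL returns) could look like:
from datetime import datetime, timedelta

# Cutoff as a naive timestamp, matching what DESCRIBE DETAIL returns when collected.
cutoff = datetime.now() - timedelta(days=30)

for t in spark.catalog.listTables("default"):          # assumed database name
    if t.isTemporary:                                   # skip temp views
        continue
    detail = spark.sql(f"DESCRIBE DETAIL default.`{t.name}`").first()
    created_at = detail["createdAt"]
    if created_at is not None and created_at < cutoff:
        spark.sql(f"DROP TABLE default.`{t.name}`")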

Cast String to date time in Stream analytics Query

We are sending the datetime as a string in the format 2018-03-20 10:50:037.996, and we have written the Stream Analytics query as below.
SELECT
    powerscout.Device_Id AS PowerScout,
    powerscout.[kW System],
    CAST(powerscout.[TimeStamp] AS datetime) AS [TimeStamp]
INTO
    [PowergridView]
FROM
    [IoTHubIn]
When we send data through Stream Analytics, the job fails.
Any suggestions please?
Thanks in advance.
ASA can parse DATETIME fields represented in one of the formats described in ISO 8601. This format is not supported. You can try using a custom JavaScript function to parse it.
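An alternative to a JavaScript UDF is to emit an ISO 8601 string from the sender in the first place. A hedged Python sketch of the producer side (field names follow the query above; the device id and reading are made up):
from datetime import datetime, timezone
import json

# Emit the timestamp in ISO 8601, e.g. "2018-03-20T10:50:37.996+00:00",
# so CAST(powerscout.[TimeStamp] AS datetime) in ASA can parse it.
message = {
    "Device_Id": "powerscout-01",   # made-up device id
    "kW System": 12.5,              # sample reading
    "TimeStamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
}
payload = json.dumps(message)       # send this payload to IoT Hub as usual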

Azure Data Lake Analytics - Output dates as +0000 rather than -0800

I have a datetime column in an Azure Data Lake Analytics table.
All my incoming data is UTC +0000. When using the below code, all the CSV outputs convert the dates to -0800:
OUTPUT @data
TO @"/data.csv"
USING Outputters.Text(quoting : false, delimiter : '|');
An example datetime in the output:
2018-01-15T12:20:13.0000000-08:00
Are there any options for controlling the output format of the dates? I don't really understand why everything is suddenly in -0800 when the incoming data isn't.
Currently, ADLA does not store time zone information in DateTime, meaning it will always default to the local time of the cluster machine when reading (-08:00 in your case). Therefore, you can either normalize your DateTime to this local time by running
DateTime.SpecifyKind(myDate, DateTimeKind.Local)
or use
DateTime.ConvertToUtc()
to output in UTC form (but note that next time you ingest that same data, ADLA will still default to reading it at offset -0800). Examples below:
@getDates =
    EXTRACT
        id int,
        date DateTime
    FROM "/test/DateTestUtc.csv"
    USING Extractors.Csv();
@formatDates =
    SELECT
        id,
        DateTime.SpecifyKind(date, DateTimeKind.Local) AS localDate,
        date.ConvertToUtc() AS utcDate
    FROM @getDates;
OUTPUT @formatDates
TO "/test/dateTestUtcKind_AllUTC.csv"
USING Outputters.Csv();
You can file a feature request for DateTime with offset on our ADL feedback site. Let me know if you have other questions!

How to fetch a file name automatically in to a data frame instead of manually specifying it

I am trying to automate my Spark code in Scala or Python, and here is what I am trying to do.
The format of the files in the S3 bucket is filename_2016_02_01.csv.gz.
From the S3 bucket, the Spark code should be able to pick up the file name and create a DataFrame, for example:
Dataframe = sqlContext.read.format("com.databricks.spark.csv").options(header="true").options(delimiter=",").options(inferSchema="true").load("s3://bucketname/filename_2016-01-29.csv.gz")
So every day when I run the job, it should pick that particular day's file and create a DataFrame, instead of me specifying the file name.
Any idea on how to write code for this condition?
Thanks in advance.
If I understood you correctly, you want the file name to change automatically based on that day's date.
If that's the case, here is a Scala solution.
I'm using joda-time to generate that date.
import org.joda.time.format.DateTimeFormat
import org.joda.time.{DateTimeZone, DateTime}
...
val today = DateTime.now(DateTimeZone.UTC).toString(DateTimeFormat.forPattern("yyyy_MM_dd"))
val fileName = "filename_" + today + ".csv.gz"
...
Python solution:
from datetime import datetime
today = datetime.utcnow().strftime('%Y_%m_%d')
file_name = 'filename_' + today + '.csv.gz'
# build the S3 path with the generated name and load it as before
Dataframe = sqlContext.read.format("com.databricks.spark.csv").options(header="true").options(delimiter=",").options(inferSchema="true").load("s3://bucketname/{}".format(file_name))
