I have the following set in my generator configuration:
<generate>
...
<javaTimeTypes>true</javaTimeTypes>
...
</generate>
and a column of type datetime2, but it generates a timestamp instead of a LocalDateTime:
created_at datetime2 not null
Am I doing something wrong?
I found the problem. I had the <generate> block duplicated, which led to this behavior.
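For reference, this is roughly what the relevant part of the configuration should look like once the duplicate block is removed, with a single <generate> section (surrounding elements abbreviated):
<generator>
  ...
  <generate>
    <!-- map datetime2 and friends to java.time types such as LocalDateTime -->
    <javaTimeTypes>true</javaTimeTypes>
  </generate>
</generator>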
I recently started using Cassandra for my new project and am doing some load testing.
I have a scenario where I'm doing a dsbulk load using a CSV, like this:
$ dsbulk load -url <csv path> -k <keyspace> -t <table> -h <host> -u <user> -p <password> -header true -cl LOCAL_QUORUM
My CSV file entries look like this:
userid birth_year created_at freq
1234 1990 2023-01-13T23:27:15.563Z {1234:{"(1, 2)": 1}}
Column types:
userid bigint PRIMARY KEY,
birth_year int,
created_at timestamp,
freq map<bigint, frozen<map<frozen<tuple<tinyint, smallint>>, smallint>>>
The issue is with the column freq: I have tried different ways of setting its value in the CSV, as shown below, but I am not able to insert the row using dsbulk.
If I set freq as {1234:{[1, 2]: 1}}, I get:
com.datastax.oss.dsbulk.workflow.commons.schema.InvalidMappingException: Could not map field freq to variable freq; conversion from Java type java.lang.String to CQL type Map(BIGINT => Map(Tuple(TINYINT, SMALLINT) => SMALLINT, not frozen), not frozen) failed for raw value: {1234:{[1,2]: 1}}
Caused by: java.lang.IllegalArgumentException: Could not parse '{1234:{[1, 2]: 1}}' as Json
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('[' (code 91)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name
at [Source: (String)"{1234:{[1, 2]: 1}}"; line: 1, column: 9]
If I set freq as {\"1234\":{\"[1, 2]\":1}},
java.lang.IllegalArgumentException: Expecting record to contain 4 fields but found 5.
If I set freq as {1234:{"[1, 2]": 1}} or {1234:{"(1, 2)": 1}},
Source: 1234,80,2023-01-13T23:27:15.563Z,"{1234:{""[1, 2]"": 1}}" java.lang.IllegalArgumentException: Expecting record to contain 4 fields but found 5.
But with the COPY ... FROM command, the value {1234:{[1, 2]:1}} for freq inserts into the DB without any error, and the value in the DB looks like this: {1234: {(1, 2): 1}}
I guess the JSON parser is not accepting an array (tuple) as a key when I try with dsbulk? Can someone advise me on what the issue is and how to fix it? I appreciate your help.
When loading data using the DataStax Bulk Loader (DSBulk), the CSV format for CQL tuple type is different from the format used by the COPY ... FROM command because DSBulk uses a different parser.
Formatting the CSV data is particularly challenging in your case because the column contains multiple nested CQL collections.
InvalidMappingException
The JSON parser used by DSBulk doesn't accept parentheses () when enclosing tuples. It also expects tuples to be enclosed in double quotes ("); otherwise you'll get errors like:
com.datastax.oss.dsbulk.workflow.commons.schema.InvalidMappingException: \
Could not map field ... to variable ...; \
conversion from Java type ... to CQL type ... failed for raw value: ...
...
Caused by: java.lang.IllegalArgumentException: Could not parse '...' as Json
...
Caused by: com.fasterxml.jackson.core.JsonParseException: \
Unexpected character ('(' (code 40)): was expecting either valid name character \
(for unquoted name) or double-quote (for quoted) to start field name
...
IllegalArgumentException
Since values for tuples contain a comma (,) as a separator, DSBulk parses the rows incorrectly: it thinks each row contains more fields than expected and throws an IllegalArgumentException, for example:
java.lang.IllegalArgumentException: Expecting record to contain 2 fields but found 3.
Solution
Just to make it easier, here is the schema for the table I'm using as an example:
CREATE TABLE inttuples (
id int PRIMARY KEY,
inttuple map<frozen<tuple<tinyint, smallint>>, smallint>
)
In this example CSV file, I've used the pipe character (|) as a delimiter:
id|inttuple
1|{"[2,3]":4}
Here's another example that uses tabs as the delimiter:
id inttuple
1 {"[2,3]":4}
Note that you will need to specify the delimiter with either -delim '|' or -delim '\t' when running DSBulk. Cheers!
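Putting that together with the load command from the question, the invocation would look roughly like this (pipe-delimited CSV assumed, other options unchanged):
$ dsbulk load -url <csv path> -k <keyspace> -t <table> -h <host> -u <user> -p <password> -header true -cl LOCAL_QUORUM -delim '|'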
From what I've seen so far, it seems impossible with Cassandra, but I thought I'd give it a shot:
How can I select a value of a json property, parsed from a json object string, and use it as part of an update / insert statement in Cassandra?
For example, I'm given the json object:
{
  "id": 123,
  "some_string": "hello there",
  "mytimestamp": "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
id text,
data blob,
PRIMARY KEY (id)
);
Now the thing to know at this point is that, for a given reason, the data field will be set to the JSON string. In other words, there is no 1:1 mapping between the given JSON and the table columns; instead, the data field contains the whole JSON object as a kind of blob value.
... Is it possible to parse the timestamp value of the given json object as part of an insert statement?
Pseudo-code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the JSON object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)
The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceed 1 MB. If it's JSON, consider storing it as the CQL text type instead. Cheers!
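To illustrate the client-side approach, here is a minimal sketch using the DataStax Python driver against the table from the question; the contact point and the my_json string are illustrative placeholders. The JSON is parsed in the application and the extracted timestamp is converted to microseconds since epoch, which is what USING TIMESTAMP expects:
import json
from datetime import datetime, timezone

from cassandra.cluster import Cluster  # DataStax Python driver

cluster = Cluster(["127.0.0.1"])     # placeholder contact point
session = cluster.connect("myspace")

my_json = '{"id": "123", "some_string": "hello there", "mytimestamp": "2019-09-02T22:02:24.355Z"}'
doc = json.loads(my_json)

# USING TIMESTAMP takes microseconds since epoch, so convert client-side
ts = datetime.strptime(doc["mytimestamp"], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
write_time_micros = int(ts.timestamp() * 1_000_000)

insert = session.prepare(
    "INSERT INTO mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?")
# data is a blob column, so the raw JSON string is stored as bytes
session.execute(insert, (doc["id"], my_json.encode("utf-8"), write_time_micros))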
Worth mentioning: CQL can do a limited amount of JSON parsing on its own, albeit not as detailed as what you're asking for here (e.g. USING timestamp).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.
What you detailed is doable with Cassandra.
Timestamp
To insert a timestamp in a query, it should be formatted as an ISO 8601 string; examples can be found in the documentation. In your code, you might have to convert your incoming value to the expected type and format.
Blob
A blob is meant to store binary data, so it cannot be put ad hoc as a string in a CQL query (you can use the text type to do it if you base64-encode the data).
When you need to insert binary data you need to provide the proper type as well. For instance, if you are working with JavaScript, you need to provide a Buffer, as described in the documentation. Then, when you execute your query, you externalize your parameters:
const sampleId = '123';  // the id column is text in the table above
const sampleData = Buffer.from('hello world', 'utf8');
// USING TIMESTAMP expects microseconds since epoch, so convert the Date client-side
const writeTimeMicros = new Date().getTime() * 1000;
client.execute(
  'INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING TIMESTAMP ?',
  [sampleId, sampleData, writeTimeMicros],
  { prepare: true });
I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP
);
And it creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we are trying to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
Just before this query the variables are defined. The main challenge is getting the current timestamp written into Snowflake. Here the value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
PREDICTED_PROBABILITY FLOAT,
TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
Educated guess. When performing the insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
It is string interpolation: ct is provided as the string representation of a datetime, which does not match the timestamp data type, hence the error.
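For illustration, with assumed values the interpolated statement ends up looking roughly like this; the unquoted timestamp literal is what trips the parser (the unexpected '21' is the hour that follows the space):
INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES (1234, 0.42,2021-04-30 21:54:41.676406);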
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
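Note that the :1, :2, :3 placeholders rely on the connector's numeric binding style; the Snowflake Python connector defaults to pyformat, so (as I read the connector documentation) the paramstyle has to be set before the connection is created, roughly like this:
import snowflake.connector

snowflake.connector.paramstyle = 'numeric'  # enables :1, :2, ... placeholders and typed tuples like ("TIMESTAMP_LTZ", ct)
ctx = snowflake.connector.connect(user='<user>', password='<password>', account='<account>')  # placeholder credentials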
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
"INSERT INTO testtable(col1, col2) "
"VALUES({col1}, '{col2}')".format(
col1=789,
col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
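For comparison, here is a hedged sketch of the same insert using qmark binding instead of string formatting (assuming paramstyle has been set to 'qmark' and con is an existing connection):
# Binding data (safe): values are passed as parameters, not spliced into the SQL text
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) VALUES (?, ?)",
    (789, 'test string3'))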
You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)
I am creating a DataFrame and registering that DataFrame as a temp view using df.createOrReplaceTempView('mytable'). After that I try to write the content of 'mytable' into a Hive table (it has a partition) using the following query:
insert overwrite table
myhivedb.myhivetable
partition(testdate) // ( 1) : Note here : I have a partition named 'testdate'
select
Field1,
Field2,
...
TestDate //(2) : Note here : I have a field named 'TestDate' ; Both (1) & (2) have the same name
from
mytable
When I execute this query, I get the following error:
Exception in thread "main" org.apache.hadoop.hive.ql.metadata.Table$ValidationFailureSemanticException: Partition spec
{testdate=, TestDate=2013-01-01}
It looks like I am getting this error because of the same field names, i.e. testdate (the partition in Hive) and TestDate (the field in the temp table 'mytable').
Whereas if my partition name testdate is different from the field name (i.e. TestDate), the query executes successfully. Example:
insert overwrite table
myhivedb.myhivetable
partition(my_partition) //Note here the partition name is not 'testdate'
select
Field1,
Field2,
...
TestDate
from
mytable
My guess is that this looks like a bug in Spark, but I would like to have a second opinion. Am I missing something here?
@DuduMarkovitz @dhee: apologies for being so late with the response. I am finally able to resolve the issue. Earlier I was creating the table using camelCase (in the CREATE statement), which seems to be the reason for the exception. Now I have created the table using DDL where the field names are in lower case, and this has resolved my issue.
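For anyone who hits the same thing, here is a hedged sketch of the kind of all-lowercase DDL that would avoid the clash (column names mirror the query above; the types are assumptions):
CREATE TABLE myhivedb.myhivetable (
  field1 STRING,
  field2 STRING
)
PARTITIONED BY (testdate STRING);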
I have one column in the Cassandra keyspace that is of type timeuuid. When I try to insert a record from the Java code (using DataStax Java driver 1.0.3), I get the following exception:
com.datastax.driver.core.exceptions.InvalidQueryException: Invalid version for TimeUUID type.
at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:269)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:183)
at com.datastax.driver.core.Session.execute(Session.java:111)
Here is my sample code:
PreparedStatement statement = session.prepare("INSERT INTO keyspace.table " +
        "(id, subscriber_id, transaction_id) VALUES (now(), ?, ?);");
BoundStatement boundStatement = new BoundStatement(statement);
session.execute(boundStatement.bind(UUID.fromString(requestData.getsubscriberId()),
        requestData.getTxnId()));
I have also tried using UUIDs.timeBased() instead of now(), but I am getting the same exception.
Any help on how to insert/read from timeuuid datatype would be appreciated.
By mistake I had created
id uuid
That is why, when I tried to insert a timeuuid into the uuid-typed field, I was getting that exception.
Now I have changed the type for id to timeuuid and everything is working fine.
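For reference, a minimal sketch of the corrected column definition (keyspace, table, and non-key column types here are placeholders inferred from the code above):
CREATE TABLE mykeyspace.mytable (
    id timeuuid PRIMARY KEY,
    subscriber_id uuid,
    transaction_id text
);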