How to cast varbinary to varchar in Presto

I have the following query, where shopname is stored as the varbinary type instead of varchar:
select shopname, itemname
from shop_profile
where cast(shopname as varchar) = 'Starbucks';
This query returns the error "line 4:7: Cannot cast varbinary to varchar".
Does anyone know the correct syntax for converting varbinary to varchar?

You can use the from_utf8 function to convert the varbinary column to varchar.
Alternatively, to_utf8 can cast the literal 'Starbucks' to the varbinary type, so the comparison happens on the varbinary side.
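For example, assuming shopname holds UTF-8 encoded text, either of the following should work:
select shopname, itemname
from shop_profile
where from_utf8(shopname) = 'Starbucks';

-- or compare on the varbinary side:
select shopname, itemname
from shop_profile
where shopname = to_utf8('Starbucks');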

Related

Is it possible for CQL to parse a JSON object to insert data?

From what I have seen so far, it seems impossible with Cassandra, but I thought I'd give it a shot:
How can I select a value of a json property, parsed from a json object string, and use it as part of an update / insert statement in Cassandra?
For example, I'm given the json object:
{
  "id": 123,
  "some_string": "hello there",
  "mytimestamp": "2019-09-02T22:02:24.355Z"
}
And this is the table definition:
CREATE TABLE IF NOT EXISTS myspace.mytable (
  id text,
  data blob,
  PRIMARY KEY (id)
);
The thing to know at this point is that, for a given reason, the data field will be set to the JSON string. In other words, there is no 1:1 mapping between the given JSON and the table columns; instead, the data field contains the whole JSON object as a kind of blob value.
... Is it possible to parse the timestamp value of the given json object as part of an insert statement?
Pseudo-code example of what I mean, which obviously doesn't work ($myJson is a placeholder for the JSON object string above):
INSERT INTO myspace.mytable (id, data)
VALUES (123, $myJson)
USING timestamp toTimeStamp($myJson.mytimestamp)
The quick answer is no, it's not possible to do that with CQL.
The norm is to parse the elements of the JSON object within your application to extract the corresponding values to construct the CQL statement.
As a side note, I would discourage using the CQL blob type due to possible performance issues should the blob size exceed 1MB. If it's JSON, consider storing it as the CQL text type instead. Cheers!
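To make that concrete, here is a rough sketch of the client-side approach, not your exact schema: it assumes the data column is switched to text and that the application has already pulled mytimestamp out of the JSON and converted it to epoch microseconds for USING TIMESTAMP:
CREATE TABLE IF NOT EXISTS myspace.mytable (
  id text,
  data text,
  PRIMARY KEY (id)
);

-- values extracted and converted client-side from the JSON payload
INSERT INTO myspace.mytable (id, data)
VALUES ('123', '{"id":123,"some_string":"hello there","mytimestamp":"2019-09-02T22:02:24.355Z"}')
USING TIMESTAMP 1567461744355000;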
Worth mentioning: CQL can do a limited amount of JSON parsing on its own, albeit not as detailed as what you're asking for here (e.g. USING timestamp).
But something like this works:
> CREATE TABLE myjsontable (
... id TEXT,
... some_string TEXT,
... PRIMARY KEY (id));
> INSERT INTO myjsontable JSON '{"id":"123","some_string":"hello there"}';
> SELECT * FROM myjsontable WHERE id='123';
id | some_string
-----+-------------
123 | hello there
(1 rows)
In your case you'd either have to redesign the table or the JSON payload so that they match. But as Erick and Cédrick have mentioned, the USING timestamp part would have to happen client-side.
What you detailed is doable with Cassandra.
Timestamp
To insert a timestamp in a query it should be formatted as an ISO 8601 string. Sample examples can be found here. In your code, you might have to convert your incoming value to the expected type and format.
Blob
The blob type stores binary data, so it cannot be put ad hoc as a string in a CQL query. (You can use the TEXT type instead if you want to base64-encode the data.)
When you need to insert binary data you also need to provide the proper type. For instance, if you are working with JavaScript you need to provide a Buffer, as described in the documentation. Then, when you execute your query, you externalize your parameters:
const sampleId = 123;
const sampleData = Buffer.from('hello world', 'utf8');
const sampleTimeStamp = new Date();
client.execute('INSERT INTO myspace.mytable (id, data) VALUES (?, ?) USING timestamp toTimeStamp(?)', [ sampleId, sampleData, sampleTimeStamp ]);

Inserting Timestamp Into Snowflake Using Python 3.8

I have an empty table defined in Snowflake as:
CREATE OR REPLACE TABLE db1.schema1.table(
  ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
  PREDICTED_PROBABILITY FLOAT,
  TIME_PREDICTED TIMESTAMP
);
It creates the correct table, which has been checked using the DESC command in SQL. Then, using the Snowflake Python connector, we are trying to execute the following query:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},{ct});'
ctx.cursor().execute(insert_query)
The variables are defined just before this query. The main challenge is getting the current timestamp written into Snowflake. The value of ct is defined as:
import datetime
ct = datetime.datetime.now()
print(ct)
2021-04-30 21:54:41.676406
But when we try to execute this INSERT query, we get the following error message:
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 1 at position 157 unexpected '21'.
Can I kindly get some help on how to format the datetime value here? Help is appreciated.
In addition to the answer @Lukasz provided, you could also think about defining current_timestamp() as the default for the TIME_PREDICTED column:
CREATE OR REPLACE TABLE db1.schema1.table(
  ACCOUNT_ID NUMBER NOT NULL PRIMARY KEY,
  PREDICTED_PROBABILITY FLOAT,
  TIME_PREDICTED TIMESTAMP DEFAULT current_timestamp
);
And then just insert ACCOUNT_ID and PREDICTED_PROBABILITY:
insert_query = f'INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY) VALUES ({accountId}, {risk_score});'
ctx.cursor().execute(insert_query)
It will automatically assign the insert time to TIME_PREDICTED.
Educated guess. When performing insert with:
insert_query = f'INSERT INTO ...(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES ({accountId}, {risk_score},{ct});'
It is string interpolation: ct is inserted as the string representation of a datetime, which does not match a timestamp data type, hence the error.
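To see why, this is roughly what the f-string renders to (using 123 and 0.42 as placeholder values); the timestamp ends up unquoted, which is where the parser stumbles over '21':
INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED)
VALUES (123, 0.42,2021-04-30 21:54:41.676406);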
I would suggest using proper variable binding instead:
ctx.cursor().execute("INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES "
"(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) "
"VALUES(:1, :2, :3)",
(accountId,
risk_score,
("TIMESTAMP_LTZ", ct)
)
);
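One assumption behind the :1/:2/:3 placeholders: the connector's numeric binding style has to be enabled, since the default is pyformat. A minimal sketch (connection_params is just a placeholder for however ctx is currently created):
import snowflake.connector

# enable numeric binding (:1, :2, ...) before opening the connection;
# 'qmark' (?) binding would work the same way
snowflake.connector.paramstyle = 'numeric'

ctx = snowflake.connector.connect(**connection_params)  # your existing connection parameters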
Avoid SQL Injection Attacks
Avoid binding data using Python’s formatting function because you risk SQL injection. For example:
# Binding data (UNSAFE EXAMPLE)
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) "
    "VALUES({col1}, '{col2}')".format(
        col1=789,
        col2='test string3')
)
Instead, store the values in variables, check those values (for example, by looking for suspicious semicolons inside strings), and then bind the parameters using qmark or numeric binding style.
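For completeness, a minimal safe counterpart of the unsafe example above, assuming qmark binding has been enabled with snowflake.connector.paramstyle = 'qmark':
# Binding data (safe): values are passed separately and bound by the connector
con.cursor().execute(
    "INSERT INTO testtable(col1, col2) VALUES(?, ?)",
    (789, 'test string3')
)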
You forgot to place quotes before and after {ct}. The code should be:
insert_query = "INSERT INTO DATA_LAKE.CUSTOMER.ACT_PREDICTED_PROBABILITIES(ACCOUNT_ID, PREDICTED_PROBABILITY, TIME_PREDICTED) VALUES ({accountId}, {risk_score},'{ct}');".format(accountId=accountId,risk_score=risk_score,ct=ct)
ctx.cursor().execute(insert_query)

What are the returned data types from a Knex select() statement?

Hi everyone,
I am currently using Knex.js for a project, and a question arose when I made a knex('table').select() call.
What are the returned types from the query? In particular, if I have a datetime column in my table, what is the return value for this field?
I believe the query will return a value of type string for this column, but is that the case for any database (I use SQLite3)? Is it possible that the query returns a Date value?
EXAMPLE:
The user table has this schema:
knex.schema.createTable('user', function (table) {
  table.increments('id');
  table.string('username', 256).notNullable().unique();
  table.timestamps(true, true);
})
Since I use SQLite3, table.timestamps(true, true); produces 2 datetime columns: created_at & modified_at.
When I make a query knex('user').select(), it returns an array of objects with the attributes: id, username, created_at, modified_at.
id is of type number
username is of type string
What will be the types of created_at & modified_at?
Will they always be of string type? If I use another database like PostgreSQL, these columns will have the timestamptz SQL type. Will the type returned by Knex also be a string?
This is not in fact something that Knex is responsible for, but rather the underlying database library. So if you're using SQLite, it would be sqlite3. If you're using Postgres, pg is responsible and you can find more documentation here. Broadly, most libraries take the approach that types which have a direct JavaScript equivalent (booleans, strings, null, integers, etc.) are returned as those types; anything else is converted to a string.
Knex's job is to construct the SQL that those libraries use to talk to the database, and to pass back the response they return.
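A quick way to check what your particular driver returns (just a sketch, run inside an async function, using the user schema above):
const rows = await knex('user').select();
// With SQLite3 the timestamps typically come back as strings; other drivers may return Date objects
console.log(typeof rows[0].created_at, rows[0].created_at instanceof Date);
// If you get a string and need a Date, convert explicitly:
const createdAt = new Date(rows[0].created_at);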
As far as I can tell, it will be an object of strings or numbers.

How to convert nvarchar from the T-SQL dialect to HiveQL?

I'm doing an ETL task of transforming queries from one SQL dialect to another. The old DB uses T-SQL, the new one uses HiveQL.
SELECT CAST(CONCAT(FMH.FLUIDMODELID,'_',RESERVOIR,'_',PRESSUREPSIA) AS NVARCHAR(255)) AS FACT_RRFP_INJ_PRESS_R_PHK
, FMH.FluidModelID ,FMH.FluidModelName ,[AnalysisDate]
FROM dbo.LZ_RRFP_FluidModelInj fmi
LEFT JOIN DBO.LZ_RRFP_FluidModelHeader fmh ON fmi.FluidModelIDFK = fmh.FluidModelID
LEFT JOIN LZ_RRFP_FluidModelAss fma on fma.InjectionFluidModelIDFK = fmi.FluidModelIDFK
WHERE FMA.RESERVOIR IN (SELECT RESERVOIR_CD FROM ATT_RESERVOIR)
The error is:
org.apache.spark.sql.catalyst.parser.ParseException:
DataType nvarchar(255) is not supported.
How do I convert nvarchar?
Hive uses UTF-8 for STRINGs and VARCHARs, so you are fine using VARCHAR or STRING instead of NVARCHAR.
VARCHAR in Hive is the same as STRING plus length validation. As @NickW mentioned in the comments, you can also do without the CAST entirely: if you insert the result into a table column of type VARCHAR(255), it will behave the same without the CAST.
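As a minimal sketch, keeping the rest of the query unchanged, the CAST line could become the following (STRING is one option; VARCHAR(255) would also work, and note that Hive/Spark SQL quote identifiers with backticks rather than square brackets like [AnalysisDate]):
SELECT CAST(CONCAT(FMH.FLUIDMODELID,'_',RESERVOIR,'_',PRESSUREPSIA) AS STRING) AS FACT_RRFP_INJ_PRESS_R_PHK
, FMH.FluidModelID, FMH.FluidModelName, `AnalysisDate`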

USQL + Python - Schema not matching definition

I am trying to pass data into a python script in Data Lake Analytics.
I've stripped this back to show the error clearly. I understand the python isn't actually doing anything... :-)
I have a very simple table
@FormattedCasinoData =
    SELECT int.Parse(UserID) AS [UserID],
           int.Parse(ModelID) AS [ModelID],
           float.Parse(Value) AS [Value]
    FROM @CasinoData
    WHERE UserID != "UserID"
    ORDER BY UserID
    FETCH 1000 ROWS;
So the table format is int, int, float.
When I try to run this:
REFERENCE ASSEMBLY [ExtPython];
DECLARE @myScript = @"
def usqlml_main(df):
    return df
";
@pythonOutput =
    REDUCE @FormattedCasinoData ON [UserID]
    PRODUCE [UserID] int, [ModelID] int, [Value] float
    USING new Extension.Python.Reducer(pyScript:@myScript);
OUTPUT @pythonOutput
TO @"adl://mydatalake.azuredatalakestore.net/myFolder/PythonOutput20171208.csv"
USING Outputters.Csv();
I get the following error:
"Python returned dataframe schema (System.Int32, System.Int32, System.Double) does match U-SQL schema (System.Int32, System.Int32, System.Single)"
Any idea why the U-SQL schema is expecting System.Single for the third column, when I have explicitly defined "float" in the output?
Thanks!
Sorry for the late reply. This must have slipped through.
In C#, float is a synonym for System.Single (see https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/float).
You should specify double as your target type in your reducer's schema.
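Concretely, a minimal sketch of the fix, with the rest of the script unchanged, is to change the reducer's output schema to:
@pythonOutput =
    REDUCE @FormattedCasinoData ON [UserID]
    PRODUCE [UserID] int, [ModelID] int, [Value] double
    USING new Extension.Python.Reducer(pyScript:@myScript);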
