How to cast varchar to MAP(VARCHAR,VARCHAR) in presto - cassandra

I have a table in Presto; one column ("mappings") holds key-value pairs as a string:
select mappings from hello;
Example value: {"foo": "baar", "foo1": "bar1" }
I want to cast the "mappings" column to a MAP,
like select CAST("mappings" as MAP) from hello;
but this throws an error in Presto. How can I convert it to a map?

There is no canonical string representation for a MAP in Presto, so there's no way to cast a string directly to MAP(VARCHAR, VARCHAR). However, if your string contains a JSON object, you can use the json_parse function to convert the string into a value of JSON type and then cast that to a SQL MAP.
Example:
WITH
data(c) AS (
VALUES '{"foo": "baar", "foo1": "bar1"}'
),
parsed AS (
SELECT cast(json_parse(c) as map(varchar, varchar)) AS m
FROM data
)
SELECT m['foo'], m['foo1']
FROM parsed
produces:
_col0 | _col1
-------+-------
baar | bar1

Applied to the table from the question:
select cast(json_parse(mappings) as MAP(VARCHAR, VARCHAR)) from hello;
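As a rough illustration of which strings survive json_parse plus the cast, here is a small Python model. It is not Presto's implementation, and the function name is hypothetical. It assumes that JSON strings, numbers and booleans become varchar values, JSON null becomes a NULL value, and a nested object or array makes the cast fail.

```python
import json
from typing import Dict, Optional

CAST_ERROR = "Value cannot be cast to map(varchar,varchar)"


def cast_json_to_varchar_map(text: str) -> Dict[str, Optional[str]]:
    """Rough model of CAST(json_parse(text) AS map(varchar, varchar)).

    Assumption: only a JSON object whose values are scalars (string,
    number, boolean, null) converts; a nested object or array raises,
    standing in for Presto's INVALID_CAST_ARGUMENT.
    """
    value = json.loads(text)  # plays the role of json_parse
    if not isinstance(value, dict):
        raise ValueError(CAST_ERROR)
    result: Dict[str, Optional[str]] = {}
    for key, item in value.items():
        if item is None:
            result[key] = None  # JSON null becomes a NULL map value
        elif isinstance(item, bool):
            result[key] = "true" if item else "false"
        elif isinstance(item, (str, int, float)):
            # Python's number formatting; Presto's may differ for doubles.
            result[key] = str(item)
        else:
            raise ValueError(CAST_ERROR)  # nested object or array
    return result


m = cast_json_to_varchar_map('{"foo": "baar", "foo1": "bar1"}')
print(m["foo"], m["foo1"])  # baar bar1
```

The model makes the practical rule visible: the cast only works when every value in the JSON object is a scalar.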

Related

How to extract JSON data in Zeppelin SQL

I am querying the test_tbl table in Zeppelin.
The table structure is as follows:
%sql
desc stg.test_tbl
col_name | data_type | comment
id | string |
title | string |
tags | string |
The tags column holds JSON data like the following:
{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}
I want to see the JSON fields as columns, so my query is:
select *, tag.*
from stg.test_tbl as t
lateral view explode(t.tags.name) name as name
lateral view explode(name.pos_code) pos_code as pos_code
but the query returns:
Can't extract value from tags#3423: need struct type but got string; line 3 pos 21
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
Should I query it as a string in the WHERE clause?
Answering my own question: I can use get_json_object on the string-typed JSON column.
Also, if the JSON contains an array, like below,
{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}
then I can query using the key:
select * from stg.test_tbl as t
where t.pos_code[0].house = "tlv"
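get_json_object takes the JSON string plus a path such as '$.name[1].pos_code.house'. In Spark SQL, select get_json_object(tags, '$.name[1].pos_code.house') from stg.test_tbl should return tlv for the sample row, though that query is untested here. The Python below is a minimal model of that kind of lookup. It supports only '$', '.key' and '[index]' steps, the helper name is hypothetical, and the real function accepts more path syntax.

```python
import json
import re
from typing import Any, Optional

# One path step after the leading "$": either ".key" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def get_json_path(json_text: str, path: str) -> Optional[str]:
    """Minimal model of a get_json_object-style lookup on a JSON string.

    Returns None for invalid JSON, unsupported paths, missing keys,
    out-of-range indexes and JSON null; objects/arrays come back as JSON text.
    """
    if not path.startswith("$"):
        return None
    try:
        node: Any = json.loads(json_text)
    except json.JSONDecodeError:
        return None
    pos = 1
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            return None  # unsupported or malformed path step
        key, index = match.group(1), match.group(2)
        if key is not None:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        else:
            i = int(index)
            if not isinstance(node, list) or i >= len(node):
                return None
            node = node[i]
        pos = match.end()
    if node is None:
        return None
    if isinstance(node, (dict, list)):
        return json.dumps(node, separators=(",", ":"))
    if isinstance(node, bool):
        return "true" if node else "false"
    return str(node)


tags = '{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}'
print(get_json_path(tags, "$.name[1].pos_code.house"))  # tlv
```

Note that pos_code sits inside the second element of the name array, so the index has to appear in the path.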

Parse JSON from Presto varchar column fails

I am attempting to convert my varchar column data, which is stringified JSON, to the MAP data type so I can reference the data as elements.
WITH
data(c) AS (
SELECT message from mydb.mytable
),
parsed AS (
SELECT cast(json_parse(c) as map(varchar, varchar)) AS m
FROM data
)
SELECT m['action'], m['uuid']
FROM parsed
Sample data looks like:
{"action":"send","timestamp":1566432054,"uuid":"1234"}
I tried the solution provided in How to cast varchar to MAP(VARCHAR,VARCHAR) in presto, which is where I got the query (I replaced the VALUES with a SELECT statement), but it did not work. I get this error:
INVALID_CAST_ARGUMENT: Value cannot be cast to map(varchar,varchar)
json_parse + cast work on your example data:
SELECT CAST(json_parse(str) AS map(varchar, varchar))
FROM (VALUES '{"action":"send","timestamp":1566432054,"uuid":"1234"}') t(str);
I tested this on Presto 317:
presto> SELECT CAST(json_parse(str) AS map(varchar, varchar))
-> FROM (VALUES '{"action":"send","timestamp":1566432054,"uuid":"1234"}') t(str);
_col0
------------------------------------------------
{action=send, uuid=1234, timestamp=1566432054}
(1 row)
My guess is that some row differs from your example and cannot be cast. You can find such rows with try:
SELECT str
FROM your_table
WHERE str IS NOT NULL
AND try(CAST(json_parse(str) AS map(varchar, varchar))) IS NULL;
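The try query above works because try turns a failing json_parse or cast into NULL. The Python below models that filter so you can see which kinds of rows it reports. It is a sketch, not Presto code. The helper names and the nested "meta" row are hypothetical, and a nested object is only one assumed cause of the error.

```python
import json
from typing import Iterable, List, Optional


def try_cast_to_varchar_map(text: str) -> Optional[dict]:
    """Model of try(CAST(json_parse(text) AS map(varchar, varchar))).

    Returns None wherever the Presto expression would yield NULL: invalid
    JSON, a top-level value that is not an object, or a nested value.
    """
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None  # json_parse would fail; try() turns that into NULL
    if not isinstance(value, dict):
        return None
    result = {}
    for key, item in value.items():
        if isinstance(item, (dict, list)):
            return None  # nested object/array: the cast fails
        if item is None:
            result[key] = None
        elif isinstance(item, bool):
            result[key] = "true" if item else "false"
        else:
            result[key] = str(item)
    return result


def find_uncastable(rows: Iterable[Optional[str]]) -> List[str]:
    """Mirror of: WHERE str IS NOT NULL AND try(...) IS NULL."""
    return [s for s in rows if s is not None and try_cast_to_varchar_map(s) is None]


rows = [
    '{"action":"send","timestamp":1566432054,"uuid":"1234"}',
    '{"action":"send","meta":{"ip":"10.0.0.1"}}',  # hypothetical nested row
    None,
    "not json",
]
print(find_uncastable(rows))
```

If the offending rows turn out to be nested, casting to map(varchar, json) instead of map(varchar, varchar) is one option worth checking against your Presto version.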

Why does AWS Athena return the "string" data type for all table fields on "show create table" or describe table?

For example, table t_mus_albums has these columns:
albumid (bigint)
title (string)
artistid (bigint)
When running
show create table t_mus_albums;
I get:
CREATE EXTERNAL TABLE `t_mus_albums`(
`albumid` string COMMENT 'from deserializer',
`title` string COMMENT 'from deserializer',
`artistid` string COMMENT 'from deserializer')
I think you might be doing something wrong, or, if the table was generated automatically, your data may not be correctly formatted.
Here are the systematic steps to solve your problem.
Assume that your data is in the following format:
ID,Code,City,State
41,5,"Youngstown", OH
42,52,"Yankton", SD
46,35,"Yakima", WA
42,16,"Worcester", MA
43,37,"Wisconsin Dells", WI
36,5,"Winston-Salem", NC
Then your CREATE TABLE statement will look something like this:
CREATE EXTERNAL TABLE IF NOT EXISTS example.tbl_datatype (
`id` int,
`code` int,
`city` string,
`state` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://example-bucket/location/a/'
TBLPROPERTIES ('has_encrypted_data'='false');
Then run this query to describe the table:
SHOW CREATE TABLE tbl_datatype;
It will give you output like this:
CREATE EXTERNAL TABLE `tbl_datatype`(
`id` int,
`code` int,
`city` string,
`state` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://example-bucket/location/a/';
Hope it helps!
This is because you are using the CSV SerDe and not, for example, the text SerDe (LazySimpleSerDe).
The CSV SerDe supports only the string data type, so all columns are of this type.
From https://docs.aws.amazon.com/athena/latest/ug/csv.html
The OpenCSV SerDe [...] Converts all column type values to STRING.
The documentation outlines some conditions under which the table schema could be different than all strings ("For example, it parses the values into BOOLEAN, BIGINT, INT, and DOUBLE data types when it can discern them"), but apparently this was not effective in your case.
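As a loose analogy only (Python, not Athena's code), the csv module behaves like a CSV deserializer that hands back text for every field; numeric types appear only when something applies a declared schema. The helper names and the whitespace trimming on State are illustrative assumptions.

```python
import csv
import io
from typing import Callable, Dict, List

# Header plus two data lines from the sample data above.
SAMPLE = 'ID,Code,City,State\n41,5,"Youngstown", OH\n42,52,"Yankton", SD\n'


def read_as_strings(text: str) -> List[Dict[str, str]]:
    """Parse CSV the way a string-only deserializer would: every value is str."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def apply_declared_types(
    rows: List[Dict[str, str]], schema: Dict[str, Callable[[str], object]]
) -> List[Dict[str, object]]:
    """Convert each column with the converter declared for it in the schema."""
    return [{col: schema[col](val) for col, val in row.items()} for row in rows]


raw = read_as_strings(SAMPLE)
typed = apply_declared_types(raw, {"ID": int, "Code": int, "City": str, "State": str.strip})
print(raw[0])    # every value is a string, e.g. 'ID': '41'
print(typed[0])  # ID and Code are now ints
```

The raw rows also keep the leading space in " OH", which is a reminder that the sample data's formatting matters whichever SerDe you choose.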

How to accept list columns as a Cassandra UDF parameter

I created a table:
CREATE TABLE human (chromosome text, position bigint,
hg01583 frozen<set<text>>,
hg03006 frozen<set<text>>,
PRIMARY KEY (chromosome, position)
)
and a function:
CREATE FUNCTION process(sample list<frozen<set<text>>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS
$$
return sample==null?null:sample.getClass().toString()+" "+sample.toString();
$$;
When I issue the CQL query
SELECT chromosome,position,hg01583, hg03006, process([hg01583,hg03006]) from human;
I get this error:
SyntaxException: line 1:80 no viable alternative at input ',' ([[hg01583],..
How can I pass hg01583 and hg03006 as a list to the process function?
Pass each column as its own argument instead, like: SELECT chromosome, position, hg01583, hg03006, process(hg01583, hg03006) from human;
with the function declared as:
CREATE FUNCTION process(hg01583 frozen<set<text>>, hg03006 frozen<set<text>>)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java AS
$$
return hg01583==null? null : ...
$$;
If you want the samples to be dynamic, then instead of creating a fixed column for each one, use a wide row (one row per sample) and a UDA that aggregates them with an accumulator function, like:
CREATE TABLE human (chromosome text, position bigint,
sample text,
value frozen<set<text>>,
PRIMARY KEY (chromosome, position, sample)
)
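A Cassandra UDA is declared with CREATE AGGREGATE, naming a state function (SFUNC), a state type (STYPE), an optional final function (FINALFUNC) and an initial state (INITCOND). The state function runs once per row, and the final function turns the accumulated state into the result. The Python below only models that flow for the wide-row table. It assumes the query selects all sample rows of one (chromosome, position), and the function names and output format are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

State = Dict[str, FrozenSet[str]]


def state_func(state: State, sample: str, value: Optional[FrozenSet[str]]) -> State:
    """Like an SFUNC: called once per row, returns the updated accumulator."""
    if value is not None:
        state[sample] = value
    return state


def final_func(state: State) -> str:
    """Like a FINALFUNC: render the accumulator as the text result."""
    return "; ".join(f"{s}={sorted(v)}" for s, v in sorted(state.items()))


def aggregate(rows: Iterable[Tuple[str, Optional[FrozenSet[str]]]]) -> str:
    state: State = {}  # like INITCOND: start from an empty map
    for sample, value in rows:
        state = state_func(state, sample, value)
    return final_func(state)


rows = [("hg01583", frozenset({"A", "G"})), ("hg03006", frozenset({"T"}))]
print(aggregate(rows))
```

Because every sample is a row rather than a column, adding a new sample needs no schema change, and the aggregate sees however many samples exist.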

Cassandra SUM(Map <text,int>) is it possible?

In Cassandra, is it possible to sum int values in a map?
My column is attr Map<text,int>. Is it possible to use
select sum(attr['salary']) from testtable or something equivalent?
Cassandra does not support accessing Map, List, or Set elements with [] (e.g. attr['salary']) in SELECT or INSERT.
You can use a user-defined type (UDT) instead.
Example:
Define your user-defined type as below:
CREATE TYPE mytype (salary int);
Create the 'attr' column with type 'mytype' (e.g. attr frozen<mytype>).
Now you can query like this:
select sum(attr.salary) from yourtable;
Another option: a User-Defined Aggregate Function (UDA).
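A UDA for this case would use an int state (STYPE int, INITCOND 0) and a state function that adds the map's salary entry from each row. The Python below models only that accumulation, not Cassandra itself. The function names are hypothetical, and skipping rows where the map or the key is missing is an assumption about the behavior you would want.

```python
from typing import Dict, Iterable, Optional


def state_add(state: int, attr: Optional[Dict[str, int]], key: str) -> int:
    """State function: add attr[key] to the running total when present."""
    if attr is None or attr.get(key) is None:
        return state  # rows without the map or the key leave the total unchanged
    return state + attr[key]


def sum_map_key(rows: Iterable[Optional[Dict[str, int]]], key: str) -> int:
    """Model of an aggregate with initial state 0 and no final function."""
    total = 0
    for attr in rows:
        total = state_add(total, attr, key)
    return total


rows = [{"salary": 100, "bonus": 5}, {"salary": 250}, None, {"bonus": 1}]
print(sum_map_key(rows, "salary"))  # 350
```

Unlike the UDT approach, this keeps attr as a Map<text,int>, so no schema change is needed.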
