Parse JSON from Presto varchar column fails - presto

I am attempting to convert my varchar column, which holds stringified JSON, to the MAP data type so I can reference the data as elements.
WITH
data(c) AS (
SELECT message from mydb.mytable
),
parsed AS (
SELECT cast(json_parse(c) as map(varchar, varchar)) AS m
FROM data
)
SELECT m['action'], m['uuid']
FROM parsed
Sample data looks like:
{"action":"send","timestamp":1566432054,"uuid":"1234"}
I tried the solution provided here: How to cast varchar to MAP(VARCHAR,VARCHAR) in presto, which is where I got the query from, replacing the VALUES clause with a SELECT statement, but it did not work. I get this error:
INVALID_CAST_ARGUMENT: Value cannot be cast to map(varchar,varchar)

json_parse + cast work on your example data:
SELECT CAST(json_parse(str) AS map(varchar, varchar))
FROM (VALUES '{"action":"send","timestamp":1566432054,"uuid":"1234"}') t(str);
I tested this on Presto 317:
presto> SELECT CAST(json_parse(str) AS map(varchar, varchar))
-> FROM (VALUES '{"action":"send","timestamp":1566432054,"uuid":"1234"}') t(str);
_col0
------------------------------------------------
{action=send, uuid=1234, timestamp=1566432054}
(1 row)
My guess is that some row in your data differs from your example and cannot be cast. You can find such rows with try:
SELECT str
FROM your_table
WHERE str IS NOT NULL
AND try(CAST(json_parse(str) AS map(varchar, varchar))) IS NULL;
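Once the offending rows are known, one way to keep the original query running is to wrap the cast in try, so that rows which cannot be cast yield NULL instead of failing the whole query. The following is only a sketch: it reuses mydb.mytable and message from the question, the output aliases are made up for illustration, and it uses element_at instead of m['action'] on the assumption that some rows may lack a key (subscripting a missing key raises an error, while element_at returns NULL).

```sql
WITH parsed AS (
    -- try(...) turns a failed json_parse or cast into NULL for that row
    SELECT try(CAST(json_parse(message) AS map(varchar, varchar))) AS m
    FROM mydb.mytable
)
SELECT element_at(m, 'action') AS msg_action,  -- NULL if the key is absent
       element_at(m, 'uuid')   AS msg_uuid
FROM parsed
WHERE m IS NOT NULL;  -- skip rows that could not be cast
```

Note that this hides the bad rows rather than fixing them, so it is worth inspecting them with the query above first.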

Related

MssqlRow to json string without knowing structure and data type on compile time [duplicate]

Using PostgreSQL I can have multiple rows of json objects.
select (select ROW_TO_JSON(_) from (select c.name, c.age) as _) as jsonresult from employee as c
This gives me this result:
{"age":65,"name":"NAME"}
{"age":21,"name":"SURNAME"}
But in SqlServer when I use the FOR JSON AUTO clause it gives me an array of json objects instead of multiple rows.
select c.name, c.age from customer c FOR JSON AUTO
[{"age":65,"name":"NAME"},{"age":21,"name":"SURNAME"}]
How to get the same result format in SqlServer ?
By constructing separate JSON in each individual row:
SELECT (SELECT [age], [name] FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM customer
There is an alternative form that doesn't require you to know the table structure (but likely has worse performance because it may generate a large intermediate JSON):
SELECT [value] FROM OPENJSON(
(SELECT * FROM customer FOR JSON PATH)
)
Without knowing the structure, and with better performance:
SELECT c.id, jdata.*
FROM customer c
cross apply
(SELECT * FROM customer jc where jc.id = c.id FOR JSON PATH , WITHOUT_ARRAY_WRAPPER) jdata (jdata)
Same as Barak Yellin but more lazy:
1-Create this proc
CREATE PROC PRC_SELECT_JSON(@TBL VARCHAR(100), @COLS VARCHAR(1000)='D.*') AS BEGIN
EXEC('
SELECT X.O FROM ' + @TBL + ' D
CROSS APPLY (
SELECT ' + @COLS + '
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) X (O)
')
END
2-Can use either all columns or specific columns:
CREATE TABLE #TEST ( X INT, Y VARCHAR(10), Z DATE )
INSERT #TEST VALUES (123, 'TEST1', GETDATE())
INSERT #TEST VALUES (124, 'TEST2', GETDATE())
EXEC PRC_SELECT_JSON #TEST
EXEC PRC_SELECT_JSON #TEST, 'X, Y'
If you're calling this from PHP, add SET NOCOUNT ON; as the first statement of the procedure, so that row-count messages are not returned alongside the result set.

Why AWS Athena returns "string" datatype to all table's fields on "show create table" command or describe tables

Why does AWS Athena return the "string" data type for all of a table's fields in "show create table" or when describing the table?
For example, table t_mus_albums:
albumid (bigint)
title (string)
artistid (bigint)
When running
show create table t_mus_albums;
I get:
CREATE EXTERNAL TABLE `t_mus_albums`(
`albumid` string COMMENT 'from deserializer',
`title` string COMMENT 'from deserializer',
`artistid` string COMMENT 'from deserializer')
I think you might be doing something wrong or, if the table was generated automatically, the data may not have been formatted correctly.
Here are the steps to solve your problem.
Assume that your data is in the format below.
ID,Code,City,State
41,5,"Youngstown", OH
42,52,"Yankton", SD
46,35,"Yakima", WA
42,16,"Worcester", MA
43,37,"Wisconsin Dells", WI
36,5,"Winston-Salem", NC
Then your CREATE TABLE statement will look something like this:
CREATE EXTERNAL TABLE IF NOT EXISTS example.tbl_datatype (
`id` int,
`code` int,
`city` string,
`state` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://example-bucket/location/a/'
TBLPROPERTIES ('has_encrypted_data'='false');
Then run the query to describe the table.
SHOW CREATE TABLE tbl_datatype;
It will give you output something like this:
CREATE EXTERNAL TABLE `tbl_datatype`(
`id` int,
`code` int,
`city` string,
`state` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://example-bucket/location/a/';
Hope it helps!
This is because you are using the CSV SerDe and not, e.g., the TEXT SerDe.
The CSV SerDe supports only the string data type, so all columns are of this type.
From https://docs.aws.amazon.com/athena/latest/ug/csv.html
The OpenCSV SerDe [...] Converts all column type values to STRING.
The documentation outlines some conditions under which the table schema could be different than all strings ("For example, it parses the values into BOOLEAN, BIGINT, INT, and DOUBLE data types when it can discern them"), but apparently this was not effective in your case.
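If the table has to stay on the OpenCSV SerDe, one workaround is to leave the columns as strings and convert them at query time, for example through a view. This is a sketch rather than a tested setup: the view name v_mus_albums is hypothetical, and try_cast is used on the assumption that some values may not be valid numbers (it returns NULL instead of failing the query).

```sql
-- Hypothetical view that exposes typed columns over the all-string table
CREATE OR REPLACE VIEW v_mus_albums AS
SELECT try_cast(albumid AS bigint)  AS albumid,
       title,
       try_cast(artistid AS bigint) AS artistid
FROM t_mus_albums;
```

Queries can then read from the view and see albumid and artistid as bigint, while the underlying table definition is left as the SerDe reports it.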

How to cast varchar to MAP(VARCHAR,VARCHAR) in presto

I have a table in Presto in which one column, named "mappings", holds key-value pairs as a string:
select mappings from hello;
Ex: {"foo": "baar", "foo1": "bar1" }
I want to cast the "mappings" column to a MAP,
like select CAST("mappings" as MAP) from hello;
This throws an error in Presto. How can we translate this to a map?
There is no canonical string representation for a MAP in Presto, so there's no way to cast a string directly to MAP(VARCHAR, VARCHAR). But if your string contains a JSON map, you can use the json_parse function to convert the string into a value of JSON type and convert that to a SQL MAP via a cast.
Example:
WITH
data(c) AS (
VALUES '{"foo": "baar", "foo1": "bar1"}'
),
parsed AS (
SELECT cast(json_parse(c) as map(varchar, varchar)) AS m
FROM data
)
SELECT m['foo'], m['foo1']
FROM parsed
produces:
_col0 | _col1
-------+-------
baar | bar1
select cast( json_parse(mappings) as MAP(VARCHAR,VARCHAR)) from hello1;
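When only a few known keys are needed, an alternative sketch is to skip the MAP entirely and read each key with json_extract_scalar, which takes a JSON string (or JSON value) and a JSONPath expression and returns varchar. The table and column names are those from the question, and the keys foo and foo1 come from its sample value.

```sql
SELECT json_extract_scalar(mappings, '$.foo')  AS foo,
       json_extract_scalar(mappings, '$.foo1') AS foo1
FROM hello;
```

This suits a fixed set of keys; the cast to MAP remains the better fit when the keys are not known in advance.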

Oracle - query to retrieve CLOB value under multiple tags with same name

I have a table T with a CLOB column called XML_CLOB.
The value in the column looks like the following:
<reportName>
<string>REPORT_A</string>
<string>REPORT_B</string>
<string>REPORT_C</string>
</reportName>
I'm trying to retrieve the string values from this CLOB column and return them in different rows. If I use
xmltype(xml_clob).extract('//reportName/string/text()').getstringval()
it outputs like 'REPORT_AREPORT_BREPORT_C' in the same row.
I also tried
extractValue(xmltype(xml_clob), '//reportName/string[1]')
but the problem is I don't know how many child values there are under the tag.
Is there any way I can retrieve them in different rows, like:
1 REPORT_A
2 REPORT_B
3 REPORT_C
Many thanks in advance~
Oracle Setup:
CREATE TABLE table_name (xml_clob CLOB );
INSERT INTO table_name VALUES (
'<reportName>
<string>REPORT_A</string>
<string>REPORT_B</string>
<string>REPORT_C</string>
</reportName>'
);
Query 1:
SELECT x.string
FROM table_name t,
XMLTable('/reportName/string'
PASSING XMLType( t.xml_clob )
COLUMNS string VARCHAR2(50) PATH '/'
) x
Query 2:
SELECT EXTRACTVALUE( s.COLUMN_VALUE, '/string' ) AS string
FROM table_name t,
TABLE(
XMLSequence(
EXTRACT(
XMLType( t.xml_clob ),
'/reportName/string'
)
)
) s;
Output:
STRING
--------
REPORT_A
REPORT_B
REPORT_C
WITH test_table AS
(SELECT xmltype('<reportName>
<string>REPORT_A</string>
<string>REPORT_B</string>
<string>REPORT_C</string>
</reportName>' ) xml_clob
FROM dual
)
SELECT x.*
FROM test_table,
xmltable('/reportName/string'
passing test_table.xml_clob
columns report_name VARCHAR2(100) path 'text()') x
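The question also asks for a running number next to each value. One possible way to add it, assuming the table_name setup from the first answer, is an XMLTABLE ordinality column. The column names rn and report_name are illustrative.

```sql
SELECT x.rn, x.report_name
FROM   table_name t,
       XMLTABLE('/reportName/string'
                PASSING XMLType(t.xml_clob)
                -- FOR ORDINALITY numbers the matched nodes 1, 2, 3, ...
                COLUMNS rn          FOR ORDINALITY,
                        report_name VARCHAR2(100) PATH 'text()'
       ) x;
```

The numbering restarts for each source row, since XMLTABLE is evaluated once per row of table_name.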

Turning a Comma Separated string into individual rows in Teradata

I read the post:
Turning a Comma Separated string into individual rows
And really like the solution:
SELECT A.OtherID,
Split.a.value('.', 'VARCHAR(100)') AS Data
FROM
( SELECT OtherID,
CAST ('<M>' + REPLACE(Data, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM Table1
) AS A CROSS APPLY Data.nodes ('/M') AS Split(a);
But it did not work when I tried to apply the method in Teradata for a similar question. Here is the summarized error:
SELECT Failed 3707: expected something between '.' and the 'value' keyword.
So is the code only valid in SQL Server? Would anyone help me to make it work in Teradata or SAS SQL? Your help will be really appreciated!
This is SQL Server syntax.
In Teradata there's a table UDF named STRTOK_SPLIT_TO_TABLE,
e.g.
SELECT * FROM dbc.DatabasesV AS db
JOIN
(
SELECT token AS DatabaseName, tokennum
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1, 'dbc,systemfe', ',')
RETURNS (outkey INTEGER,
tokennum INTEGER,
token VARCHAR(128) CHARACTER SET UNICODE)
) AS d
) AS dt
ON db.DatabaseName = dt.DatabaseName
ORDER BY tokennum;
Or see my answer to this similar question
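Applied to the table from the question, the same function can take its input from columns instead of literals. This is an untested sketch: it assumes Table1.OtherID is an INTEGER and that Table1.Data holds the comma-separated string. The RETURNS types would need to match the actual column definitions, and the output aliases are illustrative.

```sql
SELECT d.outkey AS OtherID,
       d.token  AS DataItem
FROM TABLE (STRTOK_SPLIT_TO_TABLE(Table1.OtherID, Table1.Data, ',')
            RETURNS (outkey   INTEGER,
                     tokennum INTEGER,
                     token    VARCHAR(100) CHARACTER SET UNICODE)
           ) AS d
ORDER BY d.outkey, d.tokennum;  -- tokennum keeps the original item order
```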
