Is it possible to create a PERSISTED column that's made up of an array of specific JSON values and if so how? - singlestore

Simple Example (json column named data):
{ name: "Jerry", age: 91, mother: "Janet", father: "Eustace" }
Persisted Column Hopeful (assuming json column is called 'data'):
ALTER TABLE tablename ADD parents [ data::$mother, data::$father ] AS PERSISTED JSON;
Expected Output
| data (json)                                                    | parents (persisted json) |
| --------------------------------------------------------------- | ------------------------ |
| { name: "Jerry", age: 91, mother: "Janet", father: "Eustace" } | [ "Janet", "Eustace" ]   |
| { name: "Eustace", age: 106, mother: "Jane" }                  | [ "Jane" ]               |
| { name: "Jim", age: 54, mother: "Rachael", father: "Dom" }     | [ "Rachael", "Dom" ]     |
| --------------------------------------------------------------- | ------------------------ |
The above doesn't work, but hopefully it conveys what I'm trying to accomplish.

There is no PERSISTED ARRAY data type for columns, but there is a JSON column type that can store arrays.
For example:
-- The existing table
create table tablename (
id int primary key AUTO_INCREMENT
);
-- Add the new JSON column
ALTER TABLE tablename ADD column parents JSON;
-- Insert data into the table
INSERT INTO tablename (parents) VALUES
('[ "Janet", "Eustace" ]'),
('[ "Jane" ]');
-- Select table based on matches in the JSON column
select *
from tablename
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
-- Change data in the JSON column
update tablename
set parents = JSON_ARRAY_PUSH_STRING(parents, 'Jon')
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
-- Show changed data
select *
from tablename
where JSON_ARRAY_CONTAINS_STRING(parents, 'Jane');
Check out more examples of pushing and selecting JSON data in the docs at https://docs.memsql.com/v7.0/concepts/json-guide/
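If you want something closer to the original ask (a persisted column holding the two parent values as a JSON array), a computed column may get you there. This is only a sketch: it assumes the table has the data JSON column from the question and that both keys are always present (missing keys would need CASE/IF handling), it follows the computed-column pattern shown in the next answer, and it has not been tested:
-- Hedged sketch (untested): build the JSON array as a string from the
-- extracted values and persist it in a computed JSON column
ALTER TABLE tablename ADD COLUMN parents AS
  CONCAT('["', data::$mother, '","', data::$father, '"]') PERSISTED JSON;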

Here is a sample table definition where I do something similar with customer and event:
CREATE TABLE `eventsext2` (
  `data` JSON COLLATE utf8_bin DEFAULT NULL,
  `memsql_insert_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `customer` as data::$custID PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `event` as data::$event PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `customerevent` as concat(data::$custID,", ",data::$event) PERSISTED text,
  `generator` as data::$genID PERSISTED text CHARACTER SET utf8 COLLATE utf8_general_ci,
  `latitude` as (substr(data::$longlat from (instr(data::$longlat,'|')+1))) PERSISTED decimal(21,18),
  `longitude` as (substr(data::$longlat from 1 for (instr(data::$longlat,'|')-1))) PERSISTED decimal(21,18),
  `location` as concat('POINT(',latitude,' ',longitude,')') PERSISTED geographypoint,
  KEY `memsql_insert_time` (`memsql_insert_time`)
  /*!90618 , SHARD KEY () */
) /*!90623 AUTOSTATS_CARDINALITY_MODE=OFF, AUTOSTATS_HISTOGRAM_MODE=OFF */ /*!90623 SQL_MODE='STRICT_ALL_TABLES' */;
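As a rough, untested illustration of how those persisted columns get populated (the JSON keys are inferred from the expressions above, not taken from real data):
-- Insert a JSON document; the computed columns are derived from it automatically
INSERT INTO eventsext2 (data) VALUES
  ('{"custID":"C1","event":"login","genID":"G7","longlat":"34.78|32.08"}');

-- Read back the persisted values extracted from the document
SELECT customer, event, customerevent, latitude, longitude, location
FROM eventsext2;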

Though not your question, denormalizing this table into two tables might be a good choice:
create table parents (
  id int primary key auto_increment,
  tablenameid int not null,
  name varchar(20),
  type int not null -- 1=Father, 2=Mother, ideally foreign key to other table
);
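A hedged sketch of how that denormalized table could be populated and joined back (the values are invented for illustration):
-- One row per parent, keyed back to the main table
INSERT INTO parents (tablenameid, name, type) VALUES
  (1, 'Janet', 2),
  (1, 'Eustace', 1);

-- Reassemble the parents for each row of the main table
SELECT t.id, p.name, p.type
FROM tablename t
JOIN parents p ON p.tablenameid = t.id;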

Related

Azure Data Flow creating / managing keys for identity relationships

Curious to find out what the best way is to generate relationship identities through ADF.
Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.
This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.
Note that I need to include the Surrogate key generation for each insert. If I do not do this, based on output database schema, I'll get a 'cannot insert PK null' error.
Also note that I turn IDENTITY_INSERT ON/OFF for each sink.
I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. a Stored Proc activity) and then call another stored proc to process the JSON, something like this:
-- Setup
DROP TABLE IF EXISTS #tmp
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO
DROP SCHEMA IF EXISTS import
GO
CREATE SCHEMA import
CREATE TABLE Country ( CountryKey INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE Region ( RegionKey INT IDENTITY PRIMARY KEY, CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country, RegionName VARCHAR(50) NOT NULL UNIQUE )
CREATE TABLE City ( CityKey INT IDENTITY(100,1) PRIMARY KEY, RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region, CityName VARCHAR(50) NOT NULL UNIQUE )
GO
DECLARE @json NVARCHAR(MAX) = '{
"Cities": [
{
"Country": "England",
"Region": "Greater London",
"City": "London"
},
{
"Country": "England",
"Region": "West Midlands",
"City": "Birmingham"
},
{
"Country": "England",
"Region": "Greater Manchester",
"City": "Manchester"
},
{
"Country": "Scotland",
"Region": "Lothian",
"City": "Edinburgh"
}
]
}'
SELECT *
INTO #tmp
FROM OPENJSON( @json, '$.Cities' )
WITH
(
Country VARCHAR(50),
Region VARCHAR(50),
City VARCHAR(50)
)
GO
-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )
-- Add the Region next including Country FK
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
-- Now add the City with FKs
INSERT INTO import.City ( RegionKey, CityName )
SELECT r.RegionKey, s.City
FROM #tmp s
INNER JOIN import.Country c ON s.Country = c.CountryName
INNER JOIN import.Region r ON s.Region = r.RegionName
AND c.CountryKey = r.CountryKey
SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;
This is a simple test script designed to show the idea and should run end-to-end but it is not production code.
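To run this from ADF end-to-end you would first land the raw JSON somewhere the processing proc can read it. A hedged sketch of such a landing table (the table and column names are my assumptions, not part of the original answer):
-- Landing table that an ADF Copy activity could write the raw JSON into;
-- the processing proc would read Payload from here instead of a local variable
CREATE TABLE import.RawJson (
    RawJsonKey INT IDENTITY PRIMARY KEY,
    LoadDate   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    Payload    NVARCHAR(MAX) NOT NULL CHECK ( ISJSON( Payload ) = 1 )
);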

How to run CQL in Zeppelin by taking input in user input format?

I was trying to run a CQL query in the Zeppelin tool that takes user input via dynamic forms:
%cassandra
SELECT ${Select Fields Type=uuid ,uuid | created_by | email_verify| username} FROM
${Select Table=keyspace.table_name}
${WHERE email_verify="true" } ${ORDER BY='updated_date' }LIMIT ${limit = 10};
While running this query I was getting this error:
line 4:0 mismatched input 'true' expecting EOF
(SELECT uuid FROM keyspace.table_name ["true"]...)
You need to move WHERE and ORDER BY out of the dynamic form declaration.
The input field declaration looks like this: ${field_name=default_value}. In your case, instead of a WHERE ... condition, you've declared a field named WHERE email_verify.
It should be something like the following (not tested):
%cassandra
SELECT ${Select Fields Type=uuid ,uuid | created_by | email_verify| username} FROM
${Select Table=keyspace.table_name}
WHERE ${where_cond=email_verify='true'} ORDER BY ${order_by='updated_date'} LIMIT ${limit = 10};
Update:
Here is a working example for a table with the following structure:
CREATE TABLE test.scala_test2 (
id int,
c int,
t text,
tm timestamp,
PRIMARY KEY (id, c)
) WITH CLUSTERING ORDER BY (c ASC)
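A sketch of what the form-based paragraph might look like against that table (my reconstruction, not the author's original, and untested):
%cassandra
SELECT ${Select Fields=id ,id | c | t | tm} FROM
${Select Table=test.scala_test2}
WHERE ${where_cond=id = 1} ORDER BY ${order_by=c DESC} LIMIT ${limit = 10};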

What happens when adding a field in UDT in Cassandra?

For example, suppose I have a basic_info type:
CREATE TYPE basic_info (first_name text, last_name text, nationality text)
And table like this:
CREATE TABLE student_stats (id int PRIMARY KEY, grade text, basics FROZEN<basic_info>)
And I have millions of records in the table.
If I add a field in the basic_info like this:
ALTER TYPE basic_info ADD address text;
I want to ask what happens inside Cassandra when you add a new field to a UDT that is currently used as a column in a table. The reason for this question is that I'm afraid some side effects will happen when the table contains a lot of data (millions of records). It would be best if you could explain what happens from start to end.
Fields of a UDT are described in the table system_schema.types. When you add a new field, the entry for that type is updated inside Cassandra, but no change to the data on disk happens (SSTables are immutable). Instead, when Cassandra reads data it checks whether the field is present; if it isn't (because it was never set, or because it's a new field of the UDT), it returns null for that value without modifying the data on disk.
For example, if I have the following type and a table that uses it:
CREATE TYPE test.udt (
id int,
t1 int
);
CREATE TABLE test.u2 (
id int PRIMARY KEY,
u udt
)
And I have some data in the table, so I get:
cqlsh> select * from test.u2;

 id | u
----+----------------
  5 | {id: 1, t1: 3}
If I add a field to the UDT with alter type test.udt add t2 int;, I immediately see null as the value of the new field:
cqlsh> select * from test.u2;
id | u
----+--------------------------
5 | {id: 1, t1: 3, t2: null}
And if I do sstabledump on the SSTable, I can see that it contains only old data:
[
{
"partition" : {
"key" : [ "5" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 46,
"liveness_info" : { "tstamp" : "2019-07-28T09:33:12.019Z" },
"cells" : [
{ "name" : "u", "path" : [ "id" ], "value" : 1 },
{ "name" : "u", "path" : [ "t1" ], "value" : 3 }
]
}
]
}
]
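The type change itself can also be seen in the system_schema.types table mentioned above; a hedged example of checking it:
-- Inspect the stored UDT definition after the ALTER TYPE
SELECT type_name, field_names, field_types
FROM system_schema.types
WHERE keyspace_name = 'test';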
See also my answer about adding/removing columns

how to extract data JSON from zeppelin sql

I'm querying the test_tbl table on Zeppelin.
The table structure looks like this:
%sql
desc stg.test_tbl
col_name | data_type | comment
---------+-----------+--------
id       | string    |
title    | string    |
tags     | string    |
The tags column holds JSON data like the following:
{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}
I want to expose the JSON data as columns, so my query is:
select *, tag.*
from stg.test_tbl as t
lateral view explode(t.tags.name) name as name
lateral view explode(name.pos_code) pos_code as pos_code
But when I run the query, it returns:
Can't extract value from tags#3423: need struct type but got string; line 3 pos 21
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
Should I query it as a string in the WHERE statement?
I answered it myself: I could use get_json_object on the string-typed JSON.
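For example, a hedged, untested sketch of pulling individual values out of the string column with get_json_object:
select get_json_object(tags, '$.name[0].first')          as first_name,
       get_json_object(tags, '$.name[1].pos_code.house') as house
from stg.test_tbl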
Also, if the JSON format is an array like the one below:
{"name":[{"family": null, "first": "nelson"}, {"pos_code":{"house":"tlv", "id":"A12YR"}}]}
then I could query using the key:
select * from stg.test_tbl as t
where t.pos_code[0].house = "tlv"

Updating to empty set

I just created a new column for my table
alter table user add (questions set<timeuuid>);
Now the table looks like
user (
google_id text PRIMARY KEY,
date_of_birth timestamp,
display_name text,
joined timestamp,
last_seen timestamp,
points int,
questions set<timeuuid>
)
Then I tried to update all those null values to empty sets, by doing
update user set questions = {} where google_id = ?;
for each google id.
However, they are still null.
How can I fill that column with empty sets?
A set, list, or map needs to have at least one element because an
empty set, list, or map is stored as a null set.
source
Also, this might be helpful if you're using a client (java for instance).
I've learnt that there's not really such a thing as an empty set, or list, etc.
These display as null in cqlsh.
However, you can still add elements to them, e.g.
> select * from id_set;
set_id | set_content
-----------------------+---------------------------------
104649882895086167215 | null
105781005288147046623 | null
> update id_set set set_content = set_content + {'apple','orange'} where set_id = '105781005288147046623';
> select * from id_set;

set_id | set_content
-----------------------+---------------------------------
104649882895086167215 | null
105781005288147046623 | { 'apple', 'orange' }
So even though it displays as null you can think of it as already containing the empty set.
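Conversely (a hedged illustration, not from the original answer), removing every element takes the column straight back to null:
> update id_set set set_content = set_content - {'apple','orange'} where set_id = '105781005288147046623';
> select * from id_set;

set_id | set_content
-----------------------+---------------------------------
104649882895086167215 | null
105781005288147046623 | null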
