Use Nested field type in AggregatingMergeTree for ClickHouse

This is my ClickHouse data schema:
CREATE TABLE testing
(
id UInt64,
client_id UInt64,
nested_field Nested(
key String,
value1 UInt32,
value2 UInt32
)
) ENGINE = MergeTree
PRIMARY KEY (id);
This is the AggregatingMergeTree table schema I want to have:
CREATE TABLE testing_agg
(
client_id UInt32,
records AggregateFunction(count, UInt32),
nested_field Nested(
key String,
value1 AggregateFunction(sum, UInt32),
value2 AggregateFunction(sum, UInt32)
)
) ENGINE = AggregatingMergeTree
PRIMARY KEY (client_id);
Does ClickHouse support this type of aggregation? How do I write the correct materialized view for it?

Given the aggregate functions you are using, you can achieve this with the SummingMergeTree engine; the only extra consideration is that the nested column's name must end in Map.
An example:
DROP TABLE IF EXISTS testing_agg;
CREATE TABLE testing_agg
(
client_id UInt32,
records UInt32,
nestedMap Nested(
key String,
value1 UInt32,
value2 UInt32
)
) ENGINE = SummingMergeTree
PRIMARY KEY (client_id);
INSERT INTO testing_agg VALUES (1, 10, ['1', '2'], [1, 3], [2, 4]);
INSERT INTO testing_agg VALUES (1, 10, ['1'], [2], [1]);
INSERT INTO testing_agg VALUES (1, 10, ['3'], [4], [3]);
SELECT * FROM testing_agg FINAL;
┌─client_id─┬─records─┬─nestedMap.key─┬─nestedMap.value1─┬─nestedMap.value2─┐
│         1 │      30 │ ['1','2','3'] │ [3,3,4]          │ [3,4,3]          │
└───────────┴─────────┴───────────────┴──────────────────┴──────────────────┘
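To populate testing_agg automatically from testing (the second part of the question), a materialized view can simply forward rows and let SummingMergeTree do the summing at merge time. A minimal sketch, assuming the two schemas above (the view name testing_mv and the per-row records = 1 trick are mine, not from the original answer):
CREATE MATERIALIZED VIEW testing_mv TO testing_agg AS
SELECT
    toUInt32(client_id) AS client_id,   -- target column is UInt32, source is UInt64
    toUInt32(1) AS records,             -- each source row contributes 1; merges sum these into a count
    nested_field.key AS `nestedMap.key`,
    nested_field.value1 AS `nestedMap.value1`,
    nested_field.value2 AS `nestedMap.value2`
FROM testing;
Since merges happen in the background, read with FINAL (as above) or aggregate explicitly with sum()/sumMap() and GROUP BY client_id.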
More info: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/summingmergetree/#nested-structures

Related

How to insert values into a table which has user-defined types?

I am working with a Scylla (Cassandra) database and trying to create tables that use User-Defined Types, as shown below:
CREATE TYPE process (
id int,
discount float
);
CREATE TYPE service (
id int,
url text
);
CREATE TABLE data (
id int PRIMARY KEY,
fname text,
lname text,
service set<frozen<service>>,
monthly_process frozen<process>
);
My confusion is about how to insert data into my data table. The problem is that I don't understand how the process and service types work here, or how to insert values into them.
I tried the example below, but it gave me an error:
insert into data (id, fname, lname, service, monthly_process)
values (1, 'abc', 'world', {'service': [{'id':1, 'url': 'some_url1'},
{'id':2, 'url': 'some_url2'}]}, {'id':1, 'discount': 10.0});
Error I got:
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid map literal for service of type set<frozen<service>>"
Here is the working version of your query:
insert into data (id, fname, lname, service, monthly_process)
values (2, 'abc', 'world', {{id: 1, url: 'some'}, {id : 2, url:'another'}}, {id:1, discount: 10.0});
The format for service set<frozen<service>> is {{id:1, url:'a'}, {id:2, url:'b'}}
The format for monthly_process frozen<process> is {id:1, discount: 10.0}
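If you later need to add another service to an existing row, set columns of frozen UDTs support the usual collection update syntax. A small sketch (the id 3 entry and its url are invented for illustration):
update data set service = service + {{id: 3, url: 'third_url'}} where id = 2;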

Cassandra: key-level access in map-type columns

In Cassandra, suppose we need to access an individual key in a map-type column. How do we do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select * from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, what should I do if I want to access the 'USA' key alone?
I tried using an index, but all the values come back.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But I expected only:
'CBE': 'CTS'
As a data model change, I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through it simply, since emploc_key is a clustering key rather than part of a CQL collection, which has multiple limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put these in an unlogged batch and they will still be applied efficiently and atomically, because they are all in the same partition.
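For example (a sketch of the same three inserts wrapped in a batch):
BEGIN UNLOGGED BATCH
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
INSERT INTO collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
APPLY BATCH;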
To do it with the model as you have it, on Cassandra 4.0+ (CASSANDRA-7396) you can use [] element selectors like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
That query should return just 'Genpact LLC' for this row. But I would still strongly recommend the data model change, as it is significantly more efficient and works in existing versions:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';

Cassandra Failed Format Value

I am following this article on using GROUP BY: http://www.batey.info/cassandra-aggregates-min-max-avg-group.html
And I have the following function and aggregate
CREATE FUNCTION state_group_and_total( state map<text, int>, type text, amount int )
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer count = (Integer) state.get(type);
if (count == null)
count = amount;
else
count = count + amount;
state.put(type, count);
return state; ' ;
CREATE OR REPLACE AGGREGATE group_and_total(text, int)
SFUNC state_group_and_total
STYPE map<text, int>
INITCOND {};
But when I run select group_and_total(name, count) from ascore; I get this error: Failed to format value OrderedMapSerializedKey([(u'gleydson', 4)]) : 'NoneType' object has no attribute 'sub_types'
My schema is:
CREATE TABLE ascore (
name text,
count int,
id text,
PRIMARY KEY(id)
)
The statement in the blog post that "there's no group by keyword in Cassandra" is no longer accurate. You can now GROUP BY and apply aggregates to each group, which makes this a lot easier. An example:
CREATE TABLE scores (
competition text,
name text,
run_date timestamp,
score int,
PRIMARY KEY ((competition), name, run_date));
INSERT INTO scores (competition, name, run_date , score ) VALUES ( 'week-12', 'user1', dateOf(now()), 2);
INSERT INTO scores (competition, name, run_date , score ) VALUES ( 'week-12', 'user1', dateOf(now()), 2);
INSERT INTO scores (competition, name, run_date , score ) VALUES ( 'week-12', 'user1', dateOf(now()), 4);
INSERT INTO scores (competition, name, run_date , score ) VALUES ( 'week-12', 'user2', dateOf(now()), 4);
SELECT name, sum(score) AS user_total FROM scores WHERE competition = 'week-12' GROUP BY competition, name;
name | user_total
-------+------------
user1 | 8
user2 | 4
Note that the aggregate function you have does work with the above example:
select group_and_total(name,score) from scores where competition = 'week-12';
test.group_and_total(name, score)
----------------------------------------
{'user1': 8, 'user2': 4}
UPDATE with your schema:
> INSERT INTO ascore (id, name, count) VALUES ('id1', 'bob', 2);
> INSERT INTO ascore (id, name, count) VALUES ('id2', 'alice', 1);
> INSERT INTO ascore (id, name, count) VALUES ('id3', 'alice', 1);
-- even with a null
> INSERT INTO ascore (id, name) VALUES ('id4', 'alice');
> select group_and_total(name,count) from ascore;
test.group_and_total(name, count)
----------------------------------------
{'alice': 2, 'bob': 2}
You might be using an older version (of cqlsh or the Python driver, since "Failed to format value" is a client-side message) with some bugs.
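One more thing worth checking if you are reproducing this from scratch (an aside on my part, not from the original question): user-defined functions are disabled by default and have to be switched on in cassandra.yaml before CREATE FUNCTION will succeed:
enable_user_defined_functions: true
(In Cassandra 4.1 this setting was renamed to user_defined_functions_enabled.)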

What is the correct way to insert data into a Cassandra UDT?

Here is the type I have created,
CREATE TYPE urs.dest (
destinations frozen<list<text>>);
And here is the table ,
CREATE TABLE urs.abc (
id int,
locations map<text, frozen<dest>>,
PRIMARY KEY(id));
When I try to insert values from cqlsh,
try 1:
insert into urs.abc (id, locations ) values (1, {'coffee': { 'abcd', 'efgh'}});
try 2:
insert into urs.abc (id, locations ) values (1, {'coffee': ['abcd', 'efgh']});
try 3:
insert into urs.abc (id) values (1);
update urs.abc set locations = locations + {'coffee': {'abcd','qwer'}} where id=1;
I'm getting the below error,
Error from server: code=2200 [Invalid query] message="Invalid map literal for locations: value {'abcd', 'qwer'} is not of type frozen<dest>"
Can anyone please let me know the correct way to add value to my UDT?
The table creation is fine. To insert into urs.abc, use this:
insert into urs.abc (id, locations) values (1, {'coffee': {destinations: ['abcd', 'efgh']}});
You were missing the field name destinations.
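Updating the map afterwards follows the same shape; a small sketch (the 'tea' entry and its value are invented for illustration):
update urs.abc set locations = locations + {'tea': {destinations: ['ijkl']}} where id = 1;
This is also why try 3 failed: {'abcd','qwer'} is a bare set, not a dest value, so Cassandra cannot coerce it to frozen<dest> without the field name.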

How to make the sqlite3 module not convert column data to integer type

I'm trying to read data from a SQLite database using Python 3, and the sqlite3 module seems to try to be smart and converts columns that look like integers to the integer type. I don't want that (if I got it right, SQLite stores data as text no matter what anyway).
I've created the database as:
sqlite> create table t (id integer primary key, foo text, bar datetime);
sqlite> insert into t values (NULL, 1, 2);
sqlite> insert into t values (NULL, 1, 'fubar');
sqlite> select * from t;
1|1|2
2|1|fubar
and tried to read it using:
import sqlite3

db = sqlite3.connect(dbfile)
cur = db.cursor()
cur.execute("SELECT * FROM t")
for l in cur:
    print(l)
db.close()
And I get output like:
(1, '1', 2)
(2, '1', 'fubar')
but I expected/wanted something like
('1', '1', '2')
('2', '1', 'fubar')
(definitely for the last column)
Try
for l in cur:
    print(tuple(str(x) for x in l))
SQLite stores values using whatever type affinity the column has; a declared type of datetime gets NUMERIC affinity, so '2' is stored as the integer 2.
If you do not want numbers back, declare the column as text instead of datetime.
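If you cannot change the schema, one workaround (a sketch using plain SQL casts, which the sqlite3 module passes through untouched) is to cast in the query itself:
SELECT CAST(id AS TEXT) AS id, foo, CAST(bar AS TEXT) AS bar FROM t;
With that query every column comes back to Python as a str.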
