I have a node server accessing a postgres database through a npm package, pg, and have a working query that returns the data, but I think it may be able to be optimized. The data model is of versions and features, one version has many feature children. This query pattern works in a few contexts for my app, but it looks clumsy. Is there a cleaner way?
SELECT
v.*,
coalesce(
(SELECT array_to_json(array_agg(row_to_json(x))) FROM (select f.* from app_feature f where f.version = v.id) x ),
'[]'
) as features FROM app_version v
CREATE TABLE app_version(
id SERIAL PRIMARY KEY,
major INT NOT NULL,
mid INT NOT NULL,
minor INT NOT NULL,
date DATE,
description VARCHAR(256),
status VARCHAR(24)
);
CREATE TABLE app_feature(
id SERIAL PRIMARY KEY,
version INT,
description VARCHAR(256),
type VARCHAR(24),
CONSTRAINT FK_app_feature_version FOREIGN KEY(version) REFERENCES app_version(id)
);
INSERT INTO app_version (major, mid, minor, date, description, status) VALUES (0,0,0, current_timestamp, 'initial test', 'PENDING');
INSERT INTO app_feature (version, description, type) VALUES (1, 'store features', 'New Feature')
INSERT INTO app_feature (version, description, type) VALUES (1, 'return features as json', 'New Feature');
The subquery in FROM clause may not be needed.
select v.*,
coalesce((select array_to_json(array_agg(row_to_json(f)))
from app_feature f
where f.version = v.id), '[]') as features
from app_version v;
And my 5 cents. Pls. note that id is primary key of app_version so it's possible to group by app_version.id only.
select v.*, coalesce(json_agg(to_json(f)), '[]') as features
from app_version v join app_feature f on f.version = v.id
group by v.id;
You could move the JSON aggregation into a view, then join to the view:
create view app_features_json
as
select af.version,
json_agg(row_to_json(af)) as features
from app_feature af
group by af.version;
The use that view in a join:
SELECT v.*,
fj.features
FROM app_version v
join app_features_json afj on afj.version = v.id
Related
Using PostgreSQL I can have multiple rows of json objects.
select (select ROW_TO_JSON(_) from (select c.name, c.age) as _) as jsonresult from employee as c
This gives me this result:
{"age":65,"name":"NAME"}
{"age":21,"name":"SURNAME"}
But in SqlServer when I use the FOR JSON AUTO clause it gives me an array of json objects instead of multiple rows.
select c.name, c.age from customer c FOR JSON AUTO
[{"age":65,"name":"NAME"},{"age":21,"name":"SURNAME"}]
How to get the same result format in SqlServer ?
By constructing separate JSON in each individual row:
SELECT (SELECT [age], [name] FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM customer
There is an alternative form that doesn't require you to know the table structure (but likely has worse performance because it may generate a large intermediate JSON):
SELECT [value] FROM OPENJSON(
(SELECT * FROM customer FOR JSON PATH)
)
no structure better performance
SELECT c.id, jdata.*
FROM customer c
cross apply
(SELECT * FROM customer jc where jc.id = c.id FOR JSON PATH , WITHOUT_ARRAY_WRAPPER) jdata (jdata)
Same as Barak Yellin but more lazy:
1-Create this proc
CREATE PROC PRC_SELECT_JSON(#TBL VARCHAR(100), #COLS VARCHAR(1000)='D.*') AS BEGIN
EXEC('
SELECT X.O FROM ' + #TBL + ' D
CROSS APPLY (
SELECT ' + #COLS + '
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) X (O)
')
END
2-Can use either all columns or specific columns:
CREATE TABLE #TEST ( X INT, Y VARCHAR(10), Z DATE )
INSERT #TEST VALUES (123, 'TEST1', GETDATE())
INSERT #TEST VALUES (124, 'TEST2', GETDATE())
EXEC PRC_SELECT_JSON #TEST
EXEC PRC_SELECT_JSON #TEST, 'X, Y'
If you're using PHP add SET NOCOUNT ON; in the first row (why?).
Is there there any way to query on a SET type(or MAP/LIST) to find does it contain a value or not?
Something like this:
CREATE TABLE test.table_name(
id text,
ckk SET<INT>,
PRIMARY KEY((id))
);
Select * FROM table_name WHERE id = 1 AND ckk CONTAINS 4;
Is there any way to reach this query with YCQL api?
And can we use a SET type in SECONDRY INDEX?
Is there any way to reach this query with YCQL api?
YCQL does not support the CONTAINS keyword yet (feel free to open an issue for this on the YugabyteDB GitHub).
One workaround can be to use MAP<INT, BOOLEAN> instead of SET<INT> and the [] operator.
For instance:
CREATE TABLE test.table_name(
id text,
ckk MAP<int, boolean>,
PRIMARY KEY((id))
);
SELECT * FROM table_name WHERE id = 'foo' AND ckk[4] = true;
And can we use a SET type in SECONDRY INDEX?
Generally, collection types cannot be part of the primary key, or an index key.
However, "frozen" collections (i.e. collections serialized into a single value internally) can actually be part of either primary key or index key.
For instance:
CREATE TABLE table2(
id TEXT,
ckk FROZEN<SET<INT>>,
PRIMARY KEY((id))
) WITH transactions = {'enabled' : true};
CREATE INDEX table2_idx on table2(ckk);
Another option is to use with compound primary key and defining ckk as clustering key:
cqlsh> CREATE TABLE ybdemo.tt(id TEXT, ckk INT, PRIMARY KEY ((id), ckk)) WITH CLUSTERING ORDER BY (ckk DESC);
cqlsh> SELECT * FROM ybdemo.tt WHERE id='foo' AND ckk=4;
In Cassandra,Suppose we require to access key level against map type column. how to do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select emploc from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, if want to access 'USA' key alone . What I should do?
I tried based on the Index. But all values are coming.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But expected:
'CHE': 'Tata Consultancy Services'
Just as a data model change I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through simply as the emploc_key is clustering key instead of part of the cql collection that has multiple limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE, 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
Can also put it in a unlogged batch and it will still be applied efficiently and atomically because all in the same partition.
To do it as you have you can after 4.0 with CASSANDRA-7396 with [] selectors like:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend data model changes as its significantly more efficient, and can work in existing versions with:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
my aim is to get the msgAddDate based on below query :
select max(msgAddDate)
from sampletable
where reportid = 1 and objectType = 'loan' and msgProcessed = 1;
Design 1 :
here the reportid, objectType and msgProcessed may not be unique. To add the uniqueness I have added msgAddDate and msgProcessedDate (an additional unique value).
I use this design because I don't perform range query.
Create table sampletable ( reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType,msgAddDate,msgProcessedDate));
Design 2 :
create table sampletable (
reportid INT,
objectType TEXT,
msgAddDate TIMESTAMP,
msgProcessed INT,
msgProcessedDate TIMESTAMP,
PRIMARY KEY ((reportid ,msgProcessed,objectType),msgAddDate, msgProcessedDate))
);
Please advice which one to use and what will be the pros and cons between two based on performance.
Design 2 is the one you want.
In Design 1, the whole primary key is the partition key. Which means you need to provide all the attributes (which are: reportid, msgProcessed, objectType, msgAddDate, msgProcessedDate) to be able to query your data with a SELECT statement (which wouldn't be useful as you would not retrieve any additional attributes than the one you already provided in the WHERE statemenent)
In Design 2, your partition key is reportid ,msgProcessed,objectType which are the three attributes you want to query by. Great. msgAddDate is the first clustering column, which will be automatically sorted for you. So you don't even need to run a max since it is sorted. All you need to do is use LIMIT 1:
SELECT msgAddDate FROM sampletable WHERE reportid = 1 and objectType = 'loan' and msgProcessed = 1 LIMIT 1;
Of course, make sure to define a DESC sorted order on msgAddDate (I think by default it is ascending...)
Hope it helps!
Here's the issue. I have 2 tables that I am currently using in a pivot to return a single value, MAX(Date). I have been asked to return additional values associated with that particular MAX(Date). I know I can do this with an OVER PARTITION but it would require me doing about 8 or 9 LEFT JOINS to get the desired output. I was hoping there is a way to get my existing PIVOT to return these values. More specifically, let's say each MAX(Date) has a data source and we want that particular source to become part of the output. Here is a simple sample of what I am talking about:
Create table #Email
(
pk_id int not null identity primary key,
email_address varchar(50),
optin_flag bit default(0),
unsub_flag bit default(0)
)
Create table #History
(
pk_id int not null identity primary key,
email_id int not null,
Status_Cd char(2),
Status_Ds varchar(20),
Source_Cd char(3),
Source_Desc varchar(20),
Source_Dttm datetime
)
Insert into #Email
Values
('test#test.com',1,0),
('blank#blank.com',1,1)
Insert into #History
values
(1,'OP','OPT-IN','WB','WEB','1/2/2015 09:32:00'),
(1,'OP','OPT-IN','WB','WEB','1/3/2015 10:15:00'),
(1,'OP','OPT-IN','WB','WEB','1/4/2015 8:02:00'),
(2,'OP','OPT-IN','WB','WEB','2/1/2015 07:22:00'),
(2,'US','UNSUBSCRIBE','EM','EMAIL','3/2/2015 09:32:00'),
(2,'US','UNSUBSCRIBE','ESP','SERVICE PROVIDER','3/2/2015 09:55:00'),
(2,'US','UNSUBSCRIBE','WB','WEB','3/2/2015 10:15:00')
;with dates as
(
select
email_id,
[OP] as [OptIn_Dttm],
[US] as [Unsub_Dttm]
from
(
select
email_id,
status_cd,
source_dttm
from #history
) as src
pivot (min(source_dttm) for status_cd in ([OP],[US])) as piv
)
select
e.pk_id as email_id,
e.email_address,
e.optin_flag,
/*WANT TO GET THE OPTIN SOURCE HERE*/ /*<-------------*/
d.OptIn_Dttm,
e.unsub_flag,
d.Unsub_Dttm
/*WANT TO GET THE UNSUB SOURCE HERE*/ /*<-------------*/
from #Email e
left join dates d on e.pk_id = d.email_id