How to do this SELECT with a subquery

Table A   Table B   Table C
100       A         Honda
200       B         Ferrari
300       C
400       D
          E
          F
These tables are just an example; they don't have any special meaning!
Here is what I am trying to do. If I run this:
Select * from A, B, C
I get every combination of rows, which I don't want.
I want the query output to be:
Col_1 , Col_2, Col_3
100 A Honda
200 B Ferrari
300 C NULL
400 D NULL
NULL E NULL
NULL F NULL
So what I really want is to produce the output above with a single query.
A naive query such as (Select * from A, Select * from B, Select * from C) will of course not work; it is only there to give you the idea.
What is the solution to this problem? It must be done in only one query.
Thanks
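One possible approach, sketched here under assumptions that are not part of the question: number the rows of each table, then join the three tables on that row number. The sketch runs the SQL through Python's built-in sqlite3 module so it can be tried as-is. The column name val, the CTE names a_n, b_n, c_n and nums, and the use of SQLite's rowid as the row order are all assumptions; real tables would need their own ordering column, and ROW_NUMBER() requires a database with window-function support (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (val INTEGER);
    CREATE TABLE B (val TEXT);
    CREATE TABLE C (val TEXT);
    INSERT INTO A VALUES (100), (200), (300), (400);
    INSERT INTO B VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F');
    INSERT INTO C VALUES ('Honda'), ('Ferrari');
""")

# Number the rows of each table (insertion order via rowid is an assumption),
# collect every row number that occurs, then LEFT JOIN each table onto it.
QUERY = """
WITH a_n  AS (SELECT val, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM A),
     b_n  AS (SELECT val, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM B),
     c_n  AS (SELECT val, ROW_NUMBER() OVER (ORDER BY rowid) AS rn FROM C),
     nums AS (SELECT rn FROM a_n UNION SELECT rn FROM b_n UNION SELECT rn FROM c_n)
SELECT a_n.val AS Col_1, b_n.val AS Col_2, c_n.val AS Col_3
FROM nums
LEFT JOIN a_n ON a_n.rn = nums.rn
LEFT JOIN b_n ON b_n.rn = nums.rn
LEFT JOIN c_n ON c_n.rn = nums.rn
ORDER BY nums.rn
"""

aligned_rows = conn.execute(QUERY).fetchall()
for row in aligned_rows:
    print(row)  # missing values come back as None (SQL NULL)
```

Joining on a generated row number avoids the cross product, because each row of A, B and C can match at most one row number. Shorter tables are padded with NULL by the LEFT JOINs.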

Related

How do I optimize MariaDB query with subqueries in FROM clause?

Imagine these two tables.
Table A
ID col1 col2 col3
1 foo baz bar
2 ofo zba rba
3 oof abz abr
Table B
A_ID field_name field_value
1 first Jon
1 last Doe
2 first Adam
2 last Smith
etc..
Now I would like to have a query (current one looks like this)
SELECT
a.id,
a.col1,
a.col2,
(SELECT field_value FROM B WHERE A_ID = a.id AND field_name = 'first') as first_name,
(SELECT field_value FROM B WHERE A_ID = a.id AND field_name = 'last') as last_name
FROM A a
WHERE (SELECT COUNT(*) FROM B WHERE A_ID = a.id) = 2;
This query is working. What I would like to achieve would be something like this.
SELECT
a.id,
a.col1,
a.col2,
(SELECT field_value FROM b WHERE b.field_name = 'first') as first_name,
(SELECT field_value FROM b WHERE b.field_name = 'last') as last_name
FROM
A a,
(SELECT field_value, field_name FROM B WHERE A_ID = a.id) b
WHERE (SELECT COUNT(*) FROM b) = 2;
What would a correct version of my approach look like? Is there any other way to get rid of the multiple queries against table B?
Thank you!
I would replace your correlated subqueries with joins:
SELECT
a.id,
a.col1,
a.col2,
b1.field_value AS fv1,
b2.field_value AS fv2
FROM A a
LEFT JOIN B b1
ON a.id = b1.A_ID AND b1.field_name = 'first'
LEFT JOIN B b2
ON a.id = b2.A_ID AND b2.field_name = 'last';
This answer assumes that a left join from a given A record would only match at most one record in the B table, which, however, is a requirement anyway for your correlated subqueries to only return a single value.
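As a further alternative that reads B only once, conditional aggregation may also work. This is a sketch, not part of the answer above: it uses Python's built-in sqlite3 module only to make the SQL runnable, the sample rows are taken from the question, and the HAVING clause stands in for the question's COUNT(*) = 2 filter. The names first_name and last_name follow the question; the names PIVOT_QUERY and pivoted are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE B (A_ID INTEGER, field_name TEXT, field_value TEXT);
    INSERT INTO A VALUES (1, 'foo', 'baz', 'bar'),
                         (2, 'ofo', 'zba', 'rba'),
                         (3, 'oof', 'abz', 'abr');
    INSERT INTO B VALUES (1, 'first', 'Jon'),  (1, 'last', 'Doe'),
                         (2, 'first', 'Adam'), (2, 'last', 'Smith');
""")

# One join against B, then pivot the name/value rows into columns.
# MAX() simply picks the single non-NULL value in each group.
PIVOT_QUERY = """
SELECT a.id, a.col1, a.col2,
       MAX(CASE WHEN b.field_name = 'first' THEN b.field_value END) AS first_name,
       MAX(CASE WHEN b.field_name = 'last'  THEN b.field_value END) AS last_name
FROM A a
JOIN B b ON b.A_ID = a.id
GROUP BY a.id, a.col1, a.col2
HAVING COUNT(*) = 2
ORDER BY a.id
"""

pivoted = conn.execute(PIVOT_QUERY).fetchall()
for row in pivoted:
    print(row)
```

Row 3 of A has no matching rows in B, so the inner join and the HAVING filter drop it, which mirrors the WHERE clause of the original query.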

How to include ' partition by ' in TD15 Pivot function?

Right now I have a query like this:
SELECT a, b,
SUM (CASE WHEN measure_name = 'ABC' THEN measure_qty END) OVER (PARTITION BY a, b ) AS ABCPIVOT
FROM data_app.work_test
Now that TD15 supports PIVOT directly, how do I include this PARTITION BY in the PIVOT function?

Partial Partition Key Querying With Per Partition Limit In Cassandra

I have a table (let's call it T) set up with a PRIMARY KEY like the following:
PRIMARY KEY ((A, B), C, ....);
I want to query it like the following:
SELECT * FROM T WHERE A = ? and C <= ? PER PARTITION LIMIT 1 ALLOW FILTERING;
(Note that C is a timestamp value. I am essentially asking for the most recent rows across all partitions whose first partition key component matches my input.)
This works with the allow filtering command, and it makes sense why I need it; I do not know beforehand the partition keys B, and I do not care - I want all of them. Therefore, it makes sense that Cassandra would need to scan the entire partition to give me the results, and it also makes sense why I would need to specify it to allow filtering for this to occur.
However, I have read that we should avoid 'ALLOW FILTERING' at all costs, as it can have a huge performance impact, especially in production environments. Indeed, I only use allow filtering very sparingly in my existing applications, and this is usually for one-off queries that calculate something of this nature.
My question is this: is there a way to restructure this table or query to avoid filtering? I am thinking it is impossible, as I do not know the keys that make up B beforehand, but I want to double check just to be sure. Thanks!
You cannot run that query efficiently if (A, B) is your partition key. Your key would need to be ((A), B), dropping the other clustering keys, and then you can run SELECT * FROM T WHERE A = ?. If you only care about the latest value, each write for a given (A, B) simply replaces the previous row with the most recent one.
Another option, if you want the (A, B) tuples as of a given time, is to create a table that is indexed by time and has the tuples as clustering columns, like ((time_bucket), A, B, C), where time_bucket is a string like 2018-04-06:00:00:00 that covers all the events for that day. Then you can query like this:
> CREATE TABLE example (time_bucket text, A int, B int, C int, D int, PRIMARY KEY ((time_bucket), A, B, C)) WITH CLUSTERING ORDER BY (A ASC, B ASC, C DESC);
> INSERT INTO example (time_bucket, A, B, C, D) VALUES ('2018-04', 1, 1, 100, 999);
> INSERT INTO example (time_bucket, A, B, C, D) VALUES ('2018-04', 1, 1, 120, 999);
> INSERT INTO example (time_bucket, A, B, C, D) VALUES ('2018-04', 1, 1, 130, 999);
> INSERT INTO example (time_bucket, A, B, C, D) VALUES ('2018-04', 1, 2, 130, 999);
> SELECT * FROM example WHERE time_bucket = '2018-04' GROUP BY time_bucket, A, B;
time_bucket | a | b | c | d
-------------+---+---+-----+-----
2018-04 | 1 | 1 | 130 | 999
2018-04 | 1 | 2 | 130 | 999
You get the first row for each (A, B) group in the time bucket partition. If you make the partitions small enough (use finer-grained time buckets such as hours or 15 minutes, depending on the data rate), it is more acceptable to use ALLOW FILTERING here, like this:
SELECT * FROM example WHERE time_bucket = '2018-04' AND A = 1 AND C < 120 GROUP BY time_bucket, A, B ALLOW FILTERING ;
time_bucket | a | b | c | d
-------------+---+---+-----+-----
2018-04 | 1 | 1 | 100 | 999
This is acceptable because it all happens within one partition of bounded size (monitor it closely with tablestats / max partition size). Make sure you always query with time_bucket, though, so it does not become a range query. You want to avoid scanning through too many rows without returning a result, which is one of the dangers of ALLOW FILTERING.
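The finer-grained buckets mentioned above have to be computed somewhere, usually on the client before the INSERT and the SELECT. As a rough sketch: the helper name time_bucket, its minutes parameter and the exact string format are assumptions, modelled on the 2018-04-06:00:00:00 example. A timestamp could be rounded down to its bucket like this:

```python
from datetime import datetime, timezone


def time_bucket(ts: datetime, minutes: int = 15) -> str:
    """Round a timezone-aware timestamp down to the start of its bucket.

    `minutes` is the bucket width and must divide 60 evenly
    (for example 5, 15, 30 or 60), so that buckets never straddle an hour.
    """
    if minutes <= 0 or 60 % minutes != 0:
        raise ValueError("minutes must be a positive divisor of 60")
    ts = ts.astimezone(timezone.utc)  # bucket in UTC so all writers agree
    floored = ts.replace(
        minute=ts.minute - ts.minute % minutes, second=0, microsecond=0
    )
    return floored.strftime("%Y-%m-%d:%H:%M:%S")


event_time = datetime(2018, 4, 6, 10, 37, 12, tzinfo=timezone.utc)
print(time_bucket(event_time))              # 15-minute bucket
print(time_bucket(event_time, minutes=60))  # hourly bucket
```

The same function has to be used for writes and reads, otherwise a query would look in a partition that was never written.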

SubSelect MDX Query as filtered list of main query

Hi all
I want to write an MDX query equivalent to this SQL:
select a, b, sum(x)
from table1
where b = "True" and a in (select distinct c from table2 where c is not null and d="True")
group by a,b
I tried something like this:
SELECT
NON EMPTY { [Measures].[X] } ON COLUMNS,
NON EMPTY { [A].[Name].[Name]
*[B].[Name].[Name].&[True]
} ON ROWS
FROM
(
SELECT
{ ([A].[Name].[Name] ) } ON 0
FROM
( SELECT (
{EXCEPT([C].[Name].ALLMEMBERS, [C].[Name].[ALL].UNKNOWNMEMBER) }) ON COLUMNS
FROM
( SELECT (
{ [D].[Name].&[True] } ) ON COLUMNS
FROM [CUBE]))
)
But it returns the sum of X from the subquery.
What should it look like?
Does X's measure group have a relationship with the D dimension? If so, the following code should just work:
Select
[Measures].[X] on 0,
Non Empty [A].[Name].[Name].Members * [B].[Name].&[True] on 1
From [CUBE]
Where ([D].[Name].&[True])
If you have a many-to-many relationship, you need an extra measure (say Y):
Select
[Measures].[X] on 0,
Non Empty NonEmpty([A].[Name].[Name].Members,[Measures].[Y]) * [B].[Name].&[True] on 1
From [CUBE]
Where ([D].[Name].&[True])

Query optimization in Cassandra

I have a Cassandra database that I need to query.
My table looks like this:
Cycle Parameters Value
1 a 999
1 b 999
1 c 999
2 a 999
2 b 999
2 c 999
3 a 999
3 b 999
3 c 999
4 a 999
4 b 999
4 c 999
I need to get the values of parameters "a" and "b" for any two cycles; it does not matter which cycles they are.
Example results:
Cycle Parameters Value
1 a 999
1 b 999
2 a 999
2 b 999
or
Cycle Parameters Value
1 a 999
1 b 999
3 a 999
3 b 999
Since the database is huge, any query optimization is welcome.
My requirements are:
I want to do everything in one query.
An answer with no nested query would be a plus.
So far, I was able to accomplish these requirements with something like this:
select * from table where Parameters in ('a','b') sort by cycle, parameters limit 4
However, this query needs a "sort by" operation that causes heavy processing in the database.
Any clues on how to do it? Maybe a limit per partition?
EDIT:
The table schema is:
CREATE TABLE cycle_data (
cycle int,
parameters text,
value double,
primary key(parameters,cycle)
)
"parameters" is the partition key and "cycle" is the clustering column
You can't query like this without ALLOW FILTERING. Don't use ALLOW FILTERING in production; only use it for development!
Read the DataStax documentation on ALLOW FILTERING: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/select_r.html?hl=allow,filter
I assume your current schema is:
CREATE TABLE data (
cycle int,
parameters text,
value double,
primary key(cycle, parameters)
)
You need another table, or a changed table schema, to query like this:
CREATE TABLE cycle_data (
cycle int,
parameters text,
value double,
primary key(parameters,cycle)
)
Now you can query
SELECT * FROM cycle_data WHERE parameters in ('a','b');
The result is automatically sorted in ascending order by cycle within each parameters value.
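To see why this layout answers the "two cycles per parameter" request without a sort, here is a small in-memory model of it in plain Python. It is only an illustration of the idea, not Cassandra code: the dict stands in for partitions keyed by parameters, each list stands in for rows kept in clustering order by cycle, and the helper name first_n_per_partition is made up. In CQL itself, the PER PARTITION LIMIT clause (available in Cassandra 3.6 and later) is the feature that may correspond to this.

```python
from collections import defaultdict
from itertools import islice

# Sample data from the question: cycles 1..4, parameters a, b, c.
rows = [(cycle, param, 999.0) for cycle in (1, 2, 3, 4) for param in ("a", "b", "c")]

# Model of PRIMARY KEY (parameters, cycle):
# one partition per parameter, rows inside it ordered by cycle.
partitions = defaultdict(list)
for cycle, param, value in rows:
    partitions[param].append((cycle, value))
for clustered_rows in partitions.values():
    clustered_rows.sort()  # clustering order: cycle ascending


def first_n_per_partition(keys, n):
    """Take the first n rows of each requested partition, with no global sort."""
    return [
        (cycle, key, value)
        for key in keys
        for cycle, value in islice(partitions.get(key, []), n)
    ]


result = first_n_per_partition(["a", "b"], 2)
for row in result:
    print(row)
```

Because each partition is already ordered by cycle, taking the first two rows of "a" and of "b" only touches the start of two partitions, which is the property the schema change is meant to give.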

Resources