Incoming references and outgoing references in MSSQL - reference

What are incoming references and outgoing references in MSSQL?
Microsoft describes them in the link below:
https://learn.microsoft.com/en-us/sql/relational-databases/tables/create-foreign-key-relationships?view=sql-server-ver15
A table can reference a maximum of 253 other tables and columns as foreign keys (outgoing references). SQL Server 2016 (13.x) increases the limit for the number of other tables and columns that can reference columns in a single table (incoming references), from 253 to 10,000.
What does this mean?
Does it mean a primary table will have outgoing references and a foreign key table will have incoming references?

Say I have a table
CREATE TABLE MyTable (
    MyTableID INT NOT NULL,
    Col1 INT,
    Col2 INT,
    ...
    ColN INT,
    CONSTRAINT PK_MyTable PRIMARY KEY (MyTableID)
)
MyTable can reference at most 253 other tables with Col1 to ColN (and MyTableID, of course!).
What if other tables want to refer to MyTable?
CREATE TABLE YourTable (
    YourTableID INT,
    MyTableID INT
)

ALTER TABLE YourTable
ADD CONSTRAINT FK_YourTable_MyTable FOREIGN KEY (MyTableID)
REFERENCES MyTable (MyTableID)
How many such tables as YourTable can refer to MyTable's MyTableID?
Though MyTable can reference as many as 253 tables through its columns, its MyTableID can be referenced by as many as 10,000 tables!
If the column is referenced by more than 253 tables, update cascades won't work across all of those tables, but delete cascades will.
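To see where a given table stands relative to these limits, you can count its incoming and outgoing references from the catalog views; a minimal sketch (the dbo.MyTable name is just the example table above):

-- Outgoing references: foreign keys defined on MyTable, pointing at other tables
SELECT COUNT(*) AS outgoing_refs
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('dbo.MyTable');

-- Incoming references: foreign keys in other tables pointing at MyTable
SELECT COUNT(*) AS incoming_refs
FROM sys.foreign_keys
WHERE referenced_object_id = OBJECT_ID('dbo.MyTable');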

Related

Cassandra slow SELECT MAX(x) query

I have a dev machine with Cassandra 3.9 and 2 tables; one has ~400,000 records, the other about 40,000,000 records. Their structures are different.
Each of them has a secondary index on a field x, and I'm trying to run a query of the form SELECT MAX(x) FROM table. On the first table, the query takes a couple of seconds, and on the second table, it times out.
My experience is with relational databases where these queries are trivial and fast. So in Cassandra, it looks like the index isn't used to execute these queries? Is there an alternative?
In Cassandra, running aggregation functions such as MIN, MAX, COUNT, SUM or AVG on a table without specifying a partition key is bad practice. Instead, you can have another table that stores the max value of the x field for both tables.
However, you have to add some client-side logic to maintain this max value in the other table when you run INSERT or UPDATE statements.
Table structures:
CREATE TABLE t1 (
pk text PRIMARY KEY,
x int
);
CREATE TABLE t2 (
pk text PRIMARY KEY,
x int
);
CREATE TABLE agg_table (
table_name text PRIMARY KEY,
max_value int
);
So with this structure you can get the max value for a table:
SELECT max_value
FROM agg_table
WHERE table_name = 't1';
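For instance, the client-side maintenance mentioned above could look roughly like this (a sketch; the read-then-conditionally-write flow and the example values are assumptions):

-- Write the new row as usual
INSERT INTO t1 (pk, x) VALUES ('row-42', 1234);

-- Read the current max for t1
SELECT max_value FROM agg_table WHERE table_name = 't1';

-- Only issued by the client when the new x (1234) is greater than the value read above
UPDATE agg_table SET max_value = 1234 WHERE table_name = 't1';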
Hope this can help you.

Ranges (intervals) request in Cassandra DB - CQL

Excuse me if this is a duplicate; I've found a few questions about time ranges here, but my case seems a little bit different and hasn't been discussed yet.
I would like to store quite big chunks (bins) of data (blobs of 2-4 MB; this is the "black-box data", I can't change its layout) and access them with interval keys:
...
primary key ( bin_id int, from_item_id int, to_item_id int )
...
with the ability to select by item ranges, like in this pseudo-code to select all chunks that contain the interval of items [110, 200]:
select chunk from tb1 where chunk_id = 100500 and from_item_id >= 110 and to_item_id <= 200;
An attempt to run such a query directly ended with an error:
code=2200 [Invalid query] message="PRIMARY KEY column "to_item_id" cannot be restricted (preceding column "from_item_id" is restricted by a non-EQ relation)"
Currently the only solution I've found is to implement an additional table (tb_map) with a reverse mapping from item_id to bin_id and make a query that looks something like this:
...
-- in tb_map
primary key (dummy_id, item_id)
...
select bin_id from tb_map where dummy_id = SOME_MAGIK and item_id >= 110 and item_id <= 200;
And then use bin_id to retrieve chunks from tb1 with an EQ or IN operator like here:
select * from tb1 where bin_id in (...);
But I can't use this model due to insert performance issues (the application should avoid many inserts into an additional table and should avoid maintaining additional data structures; it should be "as simple as a nail").
Is there any simple solution that stays within one table (or several simple tables)? I'm stuck with no ideas on how to model such behaviour in C* (maybe slices should be used?); could local C* gurus provide any hints?
I'm using CQL 3.1.
From the CQL3 reference:
Moreover, for a given partition key, the clustering columns induce an ordering of rows and relations on them is restricted to the relations that allow to select a contiguous (for the ordering) set of rows.
In your case the query doesn't select a contiguous set of rows, so Cassandra refuses to process it.
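To make that concrete, here is a sketch against the primary key above (bin_id is taken from the key definition; the chunk column is assumed from the question's pseudo-code):

-- Allowed: once bin_id is fixed, a range on the first clustering column
-- selects a contiguous slice of rows
SELECT chunk FROM tb1 WHERE bin_id = 100500 AND from_item_id >= 110 AND from_item_id <= 200;

-- Rejected: adding a range on to_item_id after the non-EQ restriction on
-- from_item_id would select a non-contiguous set of rows
-- SELECT chunk FROM tb1 WHERE bin_id = 100500 AND from_item_id >= 110 AND to_item_id <= 200;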

Choosing the right schema for cassandra "table" in CQL3

We are trying to store lots of attributes for a particular profile_id inside a table (using CQL3) and cannot wrap our heads around which approach is the best:
a. create table mytable (profile_id, a1 int, a2 int, a3 int, a4 int ... a3000 int, primary key (profile_id));
OR
b. create MANY tables, eg.
create table mytable_a1 (profile_id, value int, primary key (profile_id));
create table mytable_a2 (profile_id, value int, primary key (profile_id));
...
create table mytable_a3000 (profile_id, value int, primary key (profile_id));
OR
c. create table mytable (profile_id, a_all text, primary key (profile_id));
and just store 3000 "columns" inside a_all, like:
insert into mytable (profile_id, a_all) values (1, 'a1:1,a2:5,a3:55, .... a3000:5');
OR
d. none of the above
The type of query we would be running on this table:
select * from mytable where profile_id in (1,2,3,4,5423,44)
We tried the first approach, and the queries kept timing out and sometimes even killed Cassandra nodes.
The answer would be to use a clustering column. A clustering column allows you to create dynamic columns that you could use to hold the attribute name (col name) and its value (col value).
The table would be
create table mytable (
    profile_id text,
    attr_name text,
    attr_value int,
    PRIMARY KEY (profile_id, attr_name)
);
This allows you to do inserts like
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a1', 3);
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'a2', 1031);
.....
insert into mytable (profile_id, attr_name, attr_value) values ('131', 'an', 2);
This would be the optimal solution, because you then want to do the following:
'The type of query we would be running on this table: select * from mytable where profile_id in (1,2,3,4,5423,44)'
This would require 6 queries under the hood, but Cassandra should be able to do this in no time, especially if you have a multi-node cluster.
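With the clustering-column model above, that lookup would look roughly like this (a sketch; since profile_id is now text, the IDs are quoted):

SELECT profile_id, attr_name, attr_value
FROM mytable
WHERE profile_id IN ('1', '2', '3', '4', '5423', '44');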
Also, if you use the DataStax Java Driver, you can run these requests asynchronously and concurrently on your cluster.
For more on data modelling and the DataStax Java Driver, check out DataStax's free online training. It's worth a look:
http://www.datastax.com/what-we-offer/products-services/training/virtual-training
Hope it helps.

Cassandra Composite Columns - How are CompositeTypes chosen?

I'm trying to understand the type used when I create composite columns.
I'm using CQL3 (via cqlsh) to create the CF and then the CLI to issue a describe command.
The types in the "Columns sorted by: ...CompositeType(Type1,Type2,...)" output are not the ones I'm expecting.
I'm using Cassandra 1.1.6.
CREATE TABLE CompKeyTest1 (
KeyA int,
KeyB int,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(Int32,Int32,UTF8)
Shouldn't it be (Int32,Int32,Int32)?
CREATE TABLE CompKeyTest2 (
KeyA int,
KeyB varchar,
KeyC int,
MyData varchar,
PRIMARY KEY (KeyA, KeyB, KeyC)
);
The returned CompositeType is
CompositeType(UTF8,Int32,UTF8)
Why isn't it the same as the types used when I define the table? I'm probably missing something basic in the type assignment...
Thanks!
The composite column name is composed of the values of primary keys 2...n and the name of the non-primary key column being saved.
(So if you have 5 non-key fields then you'll have five such columns and their column names will differ only in the last composed value which would be the non-key field name.)
So in both examples the composite column is made up of the values of KeyB, KeyC and the name of the column being stored ("MyData", in both cases). That's why you're seeing those CompositeTypes being returned.
(btw, the first key in the primary key is the partitioning key and its value is only used as the row key (if you're familiar with Cassandra under the covers). It is not used as part of any of the composite column names.)
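As an illustration of that composition (assumed values, not actual cassandra-cli output):

-- Insert a row into CompKeyTest1
INSERT INTO CompKeyTest1 (KeyA, KeyB, KeyC, MyData) VALUES (1, 2, 3, 'hello');

-- Internally it is stored under row key 1 (the partition key, KeyA) as a column whose
-- composite name is built from the KeyB and KeyC values plus the column name "MyData":
--   column name  : 2 : 3 : "MyData"   -> CompositeType(Int32, Int32, UTF8)
--   column value : 'hello'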

Cassandra: Query with where clause containing greater- or lesser-than (< and >)

I'm using Cassandra 1.1.2. I'm trying to convert an RDBMS application to Cassandra. In my RDBMS application, I have the following table called table1:
| Col1 | Col2 | Col3 | Col4 |
Col1: String (primary key)
Col2: String (primary key)
Col3: Bigint (index)
Col4: Bigint
This table has over 200 million records. The most frequently used query is something like:
Select * from table1 where col3 < 100 and col3 > 50;
In Cassandra I used following statement to create the table:
create table table1 (primary_key varchar, col1 varchar,
col2 varchar, col3 bigint, col4 bigint, primary key (primary_key));
create index on table1(col3);
I changed the primary key to an extra column (I calculate the key inside my application).
After importing a few records, I tried to execute the following CQL:
select * from table1 where col3 < 100 and col3 > 50;
The result is:
Bad Request: No indexed columns present in by-columns clause with Equal operator
The query select col1,col2,col3,col4 from table1 where col3 = 67 works.
Google says there is no way to execute that kind of query. Is that right? Any advice on how to create such a query?
Cassandra indexes don't actually support sequential access; see http://www.datastax.com/docs/1.1/ddl/indexes for a good quick explanation of where they are useful. But don't despair; the more classical way of using Cassandra (and many other NoSQL systems) is to denormalize, denormalize, denormalize.
It may be a good idea in your case to use the classic bucket-range pattern, which lets you use the recommended RandomPartitioner and keep your rows well distributed around your cluster, while still allowing sequential access to your values. The idea in this case is that you would make a second dynamic columnfamily mapping (bucketed and ordered) col3 values back to the related primary_key values. As an example, if your col3 values range from 0 to 10^9 and are fairly evenly distributed, you might want to put them in 1000 buckets of range 10^6 each (the best level of granularity will depend on the sort of queries you need, the sort of data you have, query round-trip time, etc). Example schema for cql3:
CREATE TABLE indexotron (
rangestart int,
col3val int,
table1key varchar,
PRIMARY KEY (rangestart, col3val, table1key)
);
When inserting into table1, you should insert a corresponding row in indexotron, with rangestart set to the start of the bucket that contains col3val (i.e. rangestart = col3val - (col3val % bucket_width)). Then when you need to enumerate all rows in table1 with col3 > X, you need to query up to 1000 buckets of indexotron, but all the col3vals within will be sorted. Example queries (using a bucket width of 1000 for illustration) to find all table1.primary_key values for which table1.col3 < 4021:
SELECT * FROM indexotron WHERE rangestart = 0 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 1000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 2000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 3000 ORDER BY col3val;
SELECT * FROM indexotron WHERE rangestart = 4000 AND col3val < 4021 ORDER BY col3val;
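And the corresponding paired writes could look like this (a sketch using the 1000-wide buckets from the example queries above; the values are made up):

-- Write the base row
INSERT INTO table1 (primary_key, col1, col2, col3, col4)
VALUES ('pk-1', 'a', 'b', 4020, 7);

-- Write the index row; rangestart = 4020 - (4020 % 1000) = 4000
INSERT INTO indexotron (rangestart, col3val, table1key)
VALUES (4000, 4020, 'pk-1');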
If col3 is always known to fall in small values/ranges, you may be able to get away with a simpler table that also maps back to the initial table, e.g.:
create table table2 (col3val int, table1key varchar,
primary key (col3val, table1key));
and use
insert into table2 (col3val, table1key) values (55, 'foreign_key');
insert into table2 (col3val, table1key) values (55, 'foreign_key3');
select * from table2 where col3val = 51;
select * from table2 where col3val = 52;
...
Or
select * from table2 where col3val in (51, 52, ...);
This may be OK if you don't have overly large ranges. (You could get the same effect with your secondary index as well, but secondary indexes aren't highly recommended?) You could theoretically parallelize it "locally on the client side" as well.
It seems the "Cassandra way" is to have some key like "userid" that you use as the first part of "all your queries", so you may need to rethink your data model. Then you can have queries like select * from table1 where userid = 'X' and col3val > 3, and it can work (assuming a clustering key on col3val), as in the sketch below.
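A minimal sketch of that shape (the userid key and this row layout are assumptions, not part of the original schema):

CREATE TABLE table1_by_user (
    userid varchar,
    col3val bigint,
    table1key varchar,
    PRIMARY KEY (userid, col3val, table1key)
);

-- Range restrictions on the first clustering column are allowed once the
-- partition key is fixed:
SELECT * FROM table1_by_user WHERE userid = 'X' AND col3val > 3;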
