can check constraints be violated by writing to subquery tables? - subquery

With Firebird, if I have a check constraint that uses a subquery against another table, and I write to that other table in a way that would violate the check, what fails, if anything?
If nothing fails, will the constraint be violated on the next read from the table with the check constraint? If not, what does Firebird do to prevent the constraint from being violated on read?
Example
table_a has a check constraint on column_a_table_a that should be < SUM(column_a_table_b) FROM table_b.

A CHECK CONSTRAINT only applies to the table it is defined on, and only guarantees integrity if the constraint is derived from the row it is applied to.
This is also documented in the Interbase 6.0 Data Definition Guide on page 106 (available from the reference manual section of the Firebird site):
Note A CHECK constraint guarantees data integrity only when the values being verified are in the same row that is being inserted and deleted. If you try to compare values in different rows of the same table or in different tables, another user could later modify those values, thus invalidating the original CHECK constraint that was applied at insertion time.
So if you modify table_b in such a way that the check constraint applied to table_a no longer holds, you will not receive an error, because that constraint does not apply to table_b.
If you then modify table_a, the check constraint will fire and result in an error (only for the modified row(s), and only where the constraint no longer holds).
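A minimal sketch of the scenario (the DDL below is illustrative and reuses the names from the question; Firebird accepts subqueries in CHECK constraints):

CREATE TABLE table_b (column_a_table_b INTEGER);
CREATE TABLE table_a (
    column_a_table_a INTEGER,
    CHECK (column_a_table_a < (SELECT SUM(column_a_table_b) FROM table_b))
);
-- Shrinking the sum in table_b can leave existing table_a rows in violation:
DELETE FROM table_b;
-- No error is raised here, and later reads of table_a do not re-validate anything;
-- only a subsequent INSERT or UPDATE on table_a re-evaluates the check for the rows it touches.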

Related

Adding unique constraint but ignoring soft-deleted entries [duplicate]

I'd like to add a constraint which enforces uniqueness on a column only in a portion of a table.
ALTER TABLE stop ADD CONSTRAINT myc UNIQUE (col_a) WHERE (col_b is null);
The WHERE part above is wishful thinking.
Any way of doing this? Or should I go back to the relational drawing board?
PostgreSQL doesn't define a partial (i.e. conditional) UNIQUE constraint - however, you can create a partial unique index.
PostgreSQL uses unique indexes to implement unique constraints, so the effect is the same, with an important caveat: you can't perform upserts (ON CONFLICT DO UPDATE) against a unique index like you would against a unique constraint.
Also, you won't see the constraint listed in information_schema.
CREATE UNIQUE INDEX stop_myc ON stop (col_a) WHERE (col_b IS NULL);
See partial indexes.
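A quick sketch of the intended behaviour, assuming col_b is a nullable soft-delete marker (the table and column names come from the question; the column types are guesses):

CREATE TABLE stop (col_a integer, col_b timestamptz);
CREATE UNIQUE INDEX stop_myc ON stop (col_a) WHERE (col_b IS NULL);

INSERT INTO stop (col_a) VALUES (1);         -- ok
INSERT INTO stop (col_a) VALUES (1);         -- rejected: duplicate among non-deleted rows
UPDATE stop SET col_b = now() WHERE col_a = 1 AND col_b IS NULL;   -- soft delete
INSERT INTO stop (col_a) VALUES (1);         -- ok again: the soft-deleted row is ignored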
It has already been said that PG doesn't define a partial (i.e. conditional) UNIQUE constraint. The documentation also says that the preferred way to add a unique constraint to a table is ALTER TABLE ... ADD CONSTRAINT (see Unique Indexes):
The preferred way to add a unique constraint to a table is ALTER TABLE ... ADD CONSTRAINT. The use of indexes to enforce unique constraints could be considered an implementation detail that should not be accessed directly. One should, however, be aware that there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.
There is a way to implement it using Exclusion Constraints (thanks to #dukelion for this solution).
In your case it will look like
ALTER TABLE stop ADD CONSTRAINT myc EXCLUDE (col_a WITH =) WHERE (col_b IS null);

Cassandra insert/update if not exists or field=value

I would like to do, in a single operation, an insert if a record doesn't exist, or an update only if a field of the row has a certain value.
Imagine the table:
CREATE TABLE my_table (id VARCHAR PRIMARY KEY, field_a VARCHAR, field_b VARCHAR);
Is it possible to have something like:
UPDATE my_table SET field_a='test' WHERE id='an-id' IF NOT EXISTS OR IF field_b='AVALUE';
In other words: a single INSERT/UPDATE statement that updates the record if field_b has the value AVALUE, creates a new row if one doesn't already exist, and fails the update if a row exists whose field_b holds a different value.
UPDATE my_table SET field_a='test' WHERE id='an-id'
IF NOT EXISTS OR IF field_b='AVALUE';
There are a few nuances here. First, it's important to remember that when doing a compare-and-set (CAS) operation in CQL, the syntax and capabilities of INSERT and UPDATE are not the same.
Case in point: the IF NOT EXISTS conditional is valid for INSERT, but not for UPDATE. On the flip side, IF EXISTS is valid for UPDATE, but not for INSERT.
Secondly, OR is not a valid operator in CQL WHERE or in CAS operation conditionals.
Third, using UPDATE with IF EXISTS short-circuits any subsequent conditionals. So UPDATE can either use IF EXISTS or IF (condition) [ AND (another condition) ], but not both.
Considering these points, it would seem one approach here would be to split the statement into two:
INSERT INTO my_table (id, field_a) VALUES ('an-id', 'test') IF NOT EXISTS;
And:
UPDATE my_table SET field_a='test' WHERE id='an-id' IF field_b='AVALUE';
These are both valid CQL. However, that doesn't really help this situation. An alternative would be to build this logic on the application side. Technically, read-before-write approaches are considered anti-patterns in Cassandra (in-built CAS operations notwithstanding, due to their use of lightweight transactions).
Perhaps something like SELECT field_a,field_b FROM my_table WHERE id='an-id'; is enough to answer whether it exists as well as what the value of field_b is, thus triggering an additional write? There's a potential for a race condition here, so I'd closely examine the business requirements to see if something like this could work.
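A rough sketch of that application-side flow, reusing the statements above (the branching lives in application code, so this is not a single atomic operation and the race window mentioned above still applies):

SELECT field_a, field_b FROM my_table WHERE id='an-id';
-- if no row came back:
INSERT INTO my_table (id, field_a) VALUES ('an-id', 'test') IF NOT EXISTS;
-- if a row exists:
UPDATE my_table SET field_a='test' WHERE id='an-id' IF field_b='AVALUE';
-- both conditional statements return an [applied] flag, so the application can tell
-- whether the write actually happened and react accordingly.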

Any downside to 'redundant' clustering column?

I've noticed that changing a regular Cassandra column to a clustering column can significantly reduce the size of the table in some circumstances.
For this example table:
id UUID K
time TIMESTAMP C
state TINYINT (C)
value DOUBLE
The size of 100000 rows is estimated at 3.9 MB if state is an ordinary column, or 2.4 MB if state is a clustering column (estimated using the method in DataStax course DS220).
If you look at how the data is physically stored, it isn't hard to see why this difference exists. In the former case there are two internal cells per timestamp - one for state and one for value. In the latter case state is incorporated into the cell key, so there is just one cell (for value) per timestamp, and the timestamp (part of the cell key) is stored only once.
The second clustering column does not create any new restrictions on what can be queried. SELECT * FROM table WHERE id=? AND time>=? AND time<? is still fine.
It seems like a win-win situation. Are there any downsides, in particular, performance-wise?
(All I can think of is that if state is a regular column then it can be omitted from an INSERT and the state internal cell will never be created. I imagine if state is a regular column and usually omitted then the table will be very slightly smaller than if state is a clustering column.)
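For reference, the two variants being compared would look roughly like this in CQL (the table names are made up for the sketch):

-- state as a regular column
CREATE TABLE readings_regular (
    id uuid,
    time timestamp,
    state tinyint,
    value double,
    PRIMARY KEY (id, time)
);

-- state as an additional clustering column
CREATE TABLE readings_clustered (
    id uuid,
    time timestamp,
    state tinyint,
    value double,
    PRIMARY KEY (id, time, state)
);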
Additional comments
It's worth noting that with the definition above you can't filter by state without an equality filter on time, which makes it not very useful for filtering on state. And if you put the state column before time in the clustering order to resolve this, then yes, you can filter by state and a time inequality, but if you want all states (IN clause) then the rows come back ordered by state first, then time, which again is not very useful.
I would think the main difference here is that if it's a clustering column it must be provided with INSERTs as it's part of the primary key. Also, as it's part of the primary key, you can't update it either, which could be problematic for some tables. If you don't have any concerns about either of those two, I don't see any reason why you couldn't add it.
1) You create a row per state. Your data model would have to recognize and account for that. You could potentially create two rows with different states for the same (id, time), which the original model disallows.
2) If you delete, you'll either need to specify state or you'll be creating Range Tombstones (range deletes, because you're deleting all rows for a given id and time, but it may be a range of states). Range tombstones are especially expensive (on the read path) in 2.1, and aren't properly accounted for in TombstoneOverwhelming exception handlers until a fairly recent version of Cassandra, so avoiding range tombstones is usually a good idea, unless you actually need them.
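To illustrate the second point with the hypothetical readings_clustered table sketched above (assuming a Cassandra version that allows deleting by a clustering-column prefix):

-- deletes every state row for the given (id, time): a range tombstone
DELETE FROM readings_clustered WHERE id = ? AND time = ?;

-- deletes exactly one row: an ordinary, cheaper tombstone
DELETE FROM readings_clustered WHERE id = ? AND time = ? AND state = ?;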

Primary key : query & updates

A little problem here with Cassandra. Basically my data has a status (INITIALIZED, PERFORMED, ENDED...), and I have different scheduled tasks that query this data based on the status with an IN clause. So one scheduler works with the data that is INITIALIZED, one with the PERFORMED data, some with both, etc.
Once the data is retrieved, it is processed and the status changes accordingly (INITIALIZED -> PERFORMED -> ENDED).
The problem: in order to be able to use the IN clause, the status has to be part of the primary key of my table. But when I update the status... it creates a new record in my table, since the UPSERT doesn't find any existing data with the primary key values given.
How do I solve that ?
Instead of including the status column in your primary key columns you can create a secondary index on the column. However, the IN clause is not (yet) supported for secondary index columns. But as you have a very limited number of values to look up you could use equality conditions in your WHERE clause and then merge the results client-side?
Beware that using secondary indexes comes at a cost. Check out "when not to use an index". In your case these points may apply:
- On a frequently updated or deleted column. See "Problems using an index on a frequently updated or deleted column" below.
- To look for a row in a large partition unless narrowly queried. See "Problems using an index to look for a row in a large partition unless narrowly queried" below.
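A sketch of the equality-per-status idea (the table and column names are hypothetical, since the original schema isn't shown):

CREATE INDEX task_status_idx ON tasks (status);

-- IN isn't supported on the secondary index column, so issue one equality query per status
SELECT * FROM tasks WHERE status = 'INITIALIZED';
SELECT * FROM tasks WHERE status = 'PERFORMED';
-- ...then merge the result sets in the scheduler code.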

Difference between UPDATE and INSERT in Cassandra?

What is the difference between UPDATE and INSERT when executing CQL against Cassandra?
It looks like there used to be no difference, but now the documentation says that INSERT does not support counters while UPDATE does.
Is there a "preferred" method to use? Or are there cases where one should be used over the other?
Thanks so much!
There is a subtle difference. Records inserted via INSERT remain if you later set all non-key fields to null. Records inserted via UPDATE go away if you set all non-key fields to null.
Try this:
CREATE TABLE T (
pk int,
f1 int,
PRIMARY KEY (pk)
);
INSERT INTO T (pk, f1) VALUES (1, 1);
UPDATE T SET f1=2 where pk=2;
SELECT * FROM T;
Returns:
 pk | f1
----+----
  1 |  1
  2 |  2
Now, update each row setting f1 to null.
UPDATE T SET f1 = null WHERE pk = 1;
UPDATE T SET f1 = null WHERE pk = 2;
SELECT * FROM T;
Note that row 1 remains, while row 2 is removed.
 pk | f1
----+------
  1 | null
If you look at these using Cassandra-cli, you will see a difference in how the rows are added.
I'd sure like to know whether this is by design or a bug and see this behavior documented.
Counter columns in Cassandra can't be set to an arbitrary value: they can only be incremented or decremented by an arbitrary amount.
For this reason, INSERT doesn't support counter columns: you cannot "insert" a value into a counter column, you can only UPDATE it (increment or decrement) by some value. Here's how you would update a counter column:
UPDATE ... SET name1 = name1 + <value>
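A minimal counter sketch (the table and column names are made up for illustration):

CREATE TABLE page_views (
    page text PRIMARY KEY,
    views counter
);

UPDATE page_views SET views = views + 1 WHERE page = '/home';
-- INSERT INTO page_views ... is rejected: counter columns can only be UPDATEd.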
You asked:
Is there a "preferred" method to use? Or are there cases where one should be used over the other?
Yes. If you are inserting values into the database, you can use INSERT. If the column doesn't exist, it will be created for you. Otherwise, INSERT's effect is similar to UPDATE. INSERT is useful when you don't have a pre-designed schema (dynamic column family, i.e. insert anything, anytime). If you are designing the schema beforehand (static column family, similar to an RDBMS) and know each column, then you can use UPDATE.
Another subtle difference (I'm starting to believe CQL is a terrible interface to Cassandra, full of subtleties and caveats due to using syntax similar to SQL but with slightly different semantics) is with setting TTLs on existing data. With UPDATE you cannot update the TTL of the keys, even if the new actual values are equal to the old values. The solution is to INSERT the new row instead, with the new TTL already set.
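A sketch of that caveat using table T from the earlier example (the comments reflect my reading of the behavior described above, not authoritative documentation):

INSERT INTO T (pk, f1) VALUES (1, 1) USING TTL 60;
-- later, trying to extend the expiry with the same value:
UPDATE T USING TTL 3600 SET f1 = 1 WHERE pk = 1;
-- only the f1 cell gets the new TTL; the row marker written by the INSERT keeps its
-- original 60-second TTL, because UPDATE never writes a row marker.
INSERT INTO T (pk, f1) VALUES (1, 1) USING TTL 3600;
-- re-inserting refreshes the TTL on the row marker and the cells together.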
Regarding the subtle difference highlighted by billbaird (I'm unable to comment on that post directly) where a row created by an update operation will be deleted if all non-key fields are null:
That is expected behavior and not a bug based on the bug report at https://issues.apache.org/jira/browse/CASSANDRA-11805 (which was closed as "Not A Problem")
I ran into this myself when using Spring Data for the first time. I was using the save(T entity) method of a repository, but no row was being created. It turned out Spring Data was using an UPDATE because it determined that the object wasn't 'new' (I'm not sure the 'isNew' test makes sense here), and I happened to be testing with entities that only had the key fields set.
For this Spring Data case, the Cassandra-specific repository interfaces do provide an insert method that appears to consistently use an INSERT, if that behavior is desired instead (though Spring's documentation doesn't cover these details sufficiently either).

Resources