Identify if an upsert operation inserts or updates the row in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
Is there a way to identify whether an upsert operation like the one shown below inserts or updates the row, e.g., with the Java or Golang driver?
UPDATE test set value = 'value1', checkpoint = 'cas1' WHERE key = 'key1' IF checkpoint = '' OR NOT EXISTS;

The RETURNS STATUS AS ROW is a YCQL feature. In YSQL, you could use an AFTER INSERT OR UPDATE... EACH ROW trigger to detect the outcome. The challenge, then, would be to surface the result in the session that made the change. You could use a user-defined run-time parameter (set my_app.outcome = 'true') or a temp table.
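A minimal sketch of that approach, assuming a YSQL table test(key, value, checkpoint); the function, trigger, and parameter names (my_app.outcome) are illustrative, not from the docs:

CREATE OR REPLACE FUNCTION note_outcome() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- TG_OP is 'INSERT' or 'UPDATE'; is_local = false keeps the setting for the session
  PERFORM set_config('my_app.outcome', lower(TG_OP), false);
  RETURN NEW;
END;
$$;

CREATE TRIGGER note_outcome_trg
AFTER INSERT OR UPDATE ON test
FOR EACH ROW EXECUTE FUNCTION note_outcome();

After the upsert, the same session can run SELECT current_setting('my_app.outcome'); and get back 'insert' or 'update'.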
—regards, bryn#yugabyte.com

You can use RETURNS STATUS AS ROW as documented here: https://docs.yugabyte.com/preview/api/ycql/dml_update/#returns-status-as-row
Example:
cqlsh:sample> CREATE TABLE test(h INT, r INT, v LIST<INT>, PRIMARY KEY(h,r)) WITH transactions={'enabled': true};
cqlsh:sample> INSERT INTO test(h,r,v) VALUES (1,1,[1,2]);
Unapplied update when IF condition is false:
cqlsh:sample> UPDATE test SET v[2] = 4 WHERE h = 1 AND r = 1 IF v[1] = 3 RETURNS STATUS AS ROW;
 [applied] | [message] | h | r | v
-----------+-----------+---+---+--------
     False |      null | 1 | 1 | [1, 2]
Applied update when the IF condition is true:
cqlsh:sample> UPDATE test SET v[0] = 4 WHERE h = 1 AND r = 1 IF v[1] = 2 RETURNS STATUS AS ROW;
 [applied] | [message] | h    | r    | v
-----------+-----------+------+------+------
      True |      null | null | null | null
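Applied to the statement in the question, a sketch would be:

UPDATE test SET value = 'value1', checkpoint = 'cas1'
WHERE key = 'key1'
IF checkpoint = '' OR NOT EXISTS
RETURNS STATUS AS ROW;

The Java or Golang driver then reads the boolean [applied] column of the returned row: true means the conditional write took effect, false means the condition failed (and the existing row's values are returned alongside).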

Related

Adding a new column whose values are based on another column in either dataframe or excel

I want to add a new column "X" whose values should be either 0 or 1: if there is a value in column "A" (a date in my case, but any text counts), it should give 1; if it is null, 0.
example:
A      | X
-------+---
*date* | 1
null   | 0
*date* | 1
*date* | 1
*date* | 1
null   | 0
Is there any way to do this in pandas or Excel/Office?
Here is an example in excel:
=IF(ISBLANK(A2);0;1)
or
=if(a2>0;1;0)
(dates are always greater than zero).

Pandas: With array of col names in a desired column order, select those that exist, NULL those that don't

I have an array of column names that I want as my output table's columns, in that order, e.g. ["A", "B", "C"].
I have an input table that USUALLY contains all of the values in the array but NOT ALWAYS (the raw data is a JSON API response).
I want to select all available columns from the input table, and if a column does not exist, I want it filled with NULLs or NA or whatever, it doesn't really matter.
Let's say my input DataFrame (call it input_table) looks like this:
+-----+--------------+
| A | C |
+-----+--------------+
| 123 | test |
| 456 | another_test |
+-----+--------------+
I want an output dataframe that has columns A, B, C in that order to produce
+-----+------+--------------+
| A | B | C |
+-----+------+--------------+
| 123 | NULL | test |
| 456 | NULL | another_test |
+-----+------+--------------+
I get a KeyError when I do input_table[["A","B","C"]]
I get a NoneType returned when I do input_table.get(["A","B","C"])
I was able to achieve what I want via:
for i in desired_columns_array:
    if i not in input_dataframe:
        output_dataframe[i] = ""
    else:
        output_dataframe[i] = input_dataframe[i]
But I'm wondering if there's something less verbose?
How do I get a desired output schema to match an input array when one or more columns in the input dataframe may not be present?
Transpose and reindex
df = pd.DataFrame([[123, 'test'], [456, 'another_test']], columns=list('AC'))
l = list('ACB')
df1 = df.T.reindex(l).T[sorted(l)]

     A    B             C
0  123  NaN          test
1  456  NaN  another_test
DataFrame.reindex over the column axis:
cols = ['A', 'B', 'C']
df.reindex(cols, axis='columns')
     A    B             C
0  123  NaN          test
1  456  NaN  another_test

Cassandra Partition key duplicates?

I am new to Cassandra, so I have a few quick questions. Suppose I do this:
CREATE TABLE my_keyspace.my_table (
    id bigint,
    year int,
    datetime timestamp,
    field1 int,
    field2 int,
    PRIMARY KEY ((id, year), datetime)
);
I imagine Cassandra as something like Map<PartitionKey, SortedMap<ColKey, ColVal>>.
My question is: when I query Cassandra using a WHERE clause, like
SELECT * FROM my_keyspace.my_table WHERE id = 1 AND year = 4;
this could return 2 or more records. How does this fit in with the data model of Cassandra?
If it really is a big HashMap, how come duplicate records for a partition key are allowed?
Thanks!
Each row is stored as a batch of entries in the SortedMap<ColKey, ColVal>, using its sorted nature.
To build on your mental model: while there is only 1 partition key for id = 1 AND year = 4, there are multiple cells:
(id, year) | ColKey             | ColVal
-----------+--------------------+-------
1, 4       | datetime(1):field1 | 1      \ Row1
1, 4       | datetime(1):field2 | 2      /
1, 4       | datetime(5):field1 | 1      \
1, 4       | datetime(5):field2 | 2      / Row2
...
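To make that concrete against the schema above (timestamp values are made up), two inserts that share the partition key (1, 4) but differ in the clustering column datetime become two rows in the same partition:

INSERT INTO my_keyspace.my_table (id, year, datetime, field1, field2)
VALUES (1, 4, '2020-01-01 00:00:00', 1, 2);
INSERT INTO my_keyspace.my_table (id, year, datetime, field1, field2)
VALUES (1, 4, '2020-01-05 00:00:00', 1, 2);

SELECT * FROM my_keyspace.my_table WHERE id = 1 AND year = 4;
-- returns both rows, sorted by datetime within the partition

So they are not duplicates: it is the full primary key (partition key plus clustering columns) that must be unique, not the partition key alone.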

JPQL: SELECT b, count(ts) FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0

I get the desired result here
SELECT b, count(ts) FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
b1 | 2
b2 | 1
but then I also need the count of ts.tourAppliedList, so I updated the query to
SELECT b, count(ts), count(ta) FROM Branch b JOIN b.tourScheduleList ts JOIN ts.tourAppliedList ta WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
which resulted in
b1 | 3 | 3
b2 | 2 | 2
The result is wrong; I don't know why count(ts) equals count(ta).
I tried returning ts and counting later, but it returns the full collection without considering ts.deleted = 0:
SELECT b, ts FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
then in the view I just call #{item.ts.tourAppliedList.size()}, but it does not consider ts.deleted = 0.
The problem is that your expectation is wrong.
This join will give you:
b1 | ts1 | ta1
b1 | ts1 | ta2
b1 | ts2 | ta3
b2 | ts3 | ta4
b2 | ts3 | ta5
Or something along these lines...
What happens when you group and count those rows?
Simple: you get 3 entries for b1 and 2 for b2.
What you need there is count(distinct ts).
Since each ts repeats once per ta in the join, counting distinct ts gives you the number you expect.
P.S. JPQL does permit count(distinct ...); if your provider did not support it, you could instead run two queries: count ts with a join on ts only, and count ta with joins on both ts and ta.
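Putting that together, a sketch of the corrected query (entity and field names as in the question):

SELECT b, COUNT(DISTINCT ts), COUNT(ta)
FROM Branch b
JOIN b.tourScheduleList ts
JOIN ts.tourAppliedList ta
WHERE ts.deleted = 0
GROUP BY b.id
ORDER BY b.name ASC

COUNT(ta) can stay a plain count, because each ta appears on exactly one joined row.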

Cassandra: how to initialize the counter column with value?

I have to benchmark Cassandra with the Facebook Linkbench. There are two phases during the benchmark: the load and the request phase.
In the load phase, Linkbench fills the Cassandra tables nodes, links, and counts (for link counting) with default values (graph data).
The count table looks like this:
keyspace.counttable (
    link_id bigint,
    link_type bigint,
    time bigint,
    version bigint,
    count counter,
    PRIMARY KEY (link_id, link_type, time, version)
)
My question is: how do I insert the default counter values (before incrementing and decrementing the counter in the Linkbench request phase)?
If that isn't possible with Cassandra, how should I increment/decrement a bigint column (instead of a counter column)?
Any suggestions and comments? Thanks a lot.
The default value is zero. Given
create table counttable (
    link_id bigint,
    link_type bigint,
    time bigint,
    version bigint,
    count counter,
    PRIMARY KEY (link_id, link_type, time, version)
);
and
update counttable set count = count + 1 where link_id = 1 and link_type = 1 and time = 1 and version = 1;
We see that the value of count is now 1.
select * from counttable;

 link_id | link_type | time | version | count
---------+-----------+------+---------+-------
       1 |         1 |    1 |       1 |     1

(1 rows)
So, if we want to set it to some other value, we can:
update counttable set count = count + 500 where link_id = 1 and link_type = 1 and time = 1 and version = 2;
select * from counttable;

 link_id | link_type | time | version | count
---------+-----------+------+---------+-------
       1 |         1 |    1 |       1 |     1
       1 |         1 |    1 |       2 |   500

(2 rows)
There is no elegant way to initialize a counter column with a non-zero value: the only operations allowed on a counter are increment and decrement. Note also that a Cassandra counter table cannot contain regular (non-counter, non-key) columns, so I recommend keeping the offset (i.e., your intended initial value) in a separate table and simply adding the two values in your client application.
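A sketch of that two-table layout (table and column names are illustrative):

create table countoffset (
    link_id bigint,
    link_type bigint,
    base_value bigint,
    PRIMARY KEY (link_id, link_type)
);

-- written once during the load phase:
insert into countoffset (link_id, link_type, base_value) values (1, 1, 500);

The effective count is then base_value plus the counter value, added in the client after reading both tables for the same key.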
Thank you for the answers. I implemented the following solution to initialize the counter field.
Since the initial and default value of a counter field is 0, I incremented it by my default value. It looks like Don Branson's solution, but with only one column:
create table counttable (
    link_id bigint,
    link_type bigint,
    count counter,
    PRIMARY KEY (link_id, link_type)
);
I set the value with this statement (during the load phase):
update counttable set count = count + myValue where link_id = 1 and link_type = 1;
select * from counttable;

 link_id | link_type | count
---------+-----------+----------------------
       1 |         1 | myValue (added to 0)

(1 row)
