How do I add two column values in a table with CQL? - cassandra

I need to add two values together to create a third value with CQL. Is there any way to do this? My table has the columns number_of_x and number_of_y, and I am trying to populate a total column. I did an update on the table with a SET clause as follows:
UPDATE my_table
SET total = number_of_x + number_of_y ;
When I run that I get the message back saying:
no viable alternative at input ';'.

Per the docs, an assignment is one of:
column_name = value
set_or_list_item = set_or_list_item ( + | - ) ...
map_name = map_name ( + | - ) ...
map_name = map_name ( + | - ) { map_key : map_value, ... }
column_name [ term ] = value
counter_column_name = counter_column_name ( + | - ) integer
And you cannot mix counter and non-counter columns in the same table, so what you are describing is impossible in a single statement. But you can do a read before write:
CREATE TABLE my_table ( total int, x int, y int, key text PRIMARY KEY );
INSERT INTO my_table (key, x, y) VALUES ('CUST_1', 1, 1);
SELECT * FROM my_table WHERE key = 'CUST_1';
key | total | x | y
--------+-------+---+---
CUST_1 | null | 1 | 1
UPDATE my_table SET total = 2 WHERE key = 'CUST_1' IF x = 1 AND y = 1;
[applied]
-----------
True
SELECT * FROM my_table WHERE key = 'CUST_1';
key | total | x | y
--------+-------+---+---
CUST_1 | 2 | 1 | 1
The IF clause will handle concurrency issues if x or y was updated since the SELECT. You can then retry if [applied] comes back False.
My recommendation in this scenario, however, is for your application to just read both x and y and do the addition locally, as that will perform MUCH better.
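As a rough illustration of that read-then-add approach, here is a minimal sketch with the DataStax Java driver (3.x-style API assumed; the contact point and keyspace name are placeholders). It reads x and y, adds them in the application, and only optionally writes total back using the conditional UPDATE, retrying while [applied] is False:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TotalExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {   // keyspace is a placeholder
            boolean applied = false;
            while (!applied) {
                // read both columns for the row
                Row r = session.execute("SELECT x, y FROM my_table WHERE key = 'CUST_1'").one();
                int x = r.getInt("x");
                int y = r.getInt("y");
                int total = x + y;                                 // the addition happens client-side
                // optional write-back: applies only if x and y are unchanged since the read
                applied = session.execute(
                        "UPDATE my_table SET total = ? WHERE key = 'CUST_1' IF x = ? AND y = ?",
                        total, x, y).wasApplied();
            }
        }
    }
}
If you skip the write-back entirely and just use total in the application, the loop and the IF condition are unnecessary.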
If you really want C* to do the addition for you, there is a sum aggregate function in 2.2+ but it will require updating your schema a little:
CREATE TABLE table_for_aggregate (key text, type text, value int, PRIMARY KEY (key, type));
INSERT INTO table_for_aggregate (key, type, value) VALUES ('CUST_1', 'X', 1);
INSERT INTO table_for_aggregate (key, type, value) VALUES ('CUST_1', 'Y', 1);
SELECT sum(value) from table_for_aggregate WHERE key = 'CUST_1';
system.sum(value)
-------------------
2

Related

Select from multiple partitions in Cassandra, with a composite partition key?

In Cassandra (CQL), it's possible to query multiple partitions, for example:
create table foo(i int, j int, primary key (i));
insert into foo (i, j) values (1, 1);
insert into foo (i, j) values (2, 2);
select * from foo where i in (1, 2);
i | j
---+---
1 | 1
2 | 2
However, if foo has a composite partition key, I'm not sure if it's possible:
create table foo(i int, j int, k int, primary key ((i, j), k));
Some queries I've tried, which CQL has rejected, are:
select * from foo where (i = 1 and j = 1) or (i = 2 and j = 2);
select * from foo where (i, j) in ((1, 1), (2, 2));
I've also tried:
select * from foo where i in (1, 2) and j in (1, 2);
but this is too wide a query, since it will also return rows where (i = 1, j = 2) or (i = 2, j = 1).
It is possible to query each partition separately from the client side using the DataStax Java Driver. You have to use the async option:
https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/async/
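For instance, a minimal sketch of that approach with driver 4.x (using the foo table from the question; the default contact point and a placeholder keyspace are assumptions): issue one async query per partition and combine the futures.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.AsyncResultSet;
import com.datastax.oss.driver.api.core.cql.Row;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class MultiPartitionQuery {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .withKeyspace("my_keyspace")               // keyspace is a placeholder
                .build()) {
            // one async query per composite partition key (i, j)
            CompletionStage<AsyncResultSet> q1 =
                    session.executeAsync("SELECT * FROM foo WHERE i = 1 AND j = 1");
            CompletionStage<AsyncResultSet> q2 =
                    session.executeAsync("SELECT * FROM foo WHERE i = 2 AND j = 2");
            // wait for both, then print the first page of each result
            CompletableFuture.allOf(q1.toCompletableFuture(), q2.toCompletableFuture()).join();
            for (CompletionStage<AsyncResultSet> stage : List.of(q1, q2)) {
                for (Row row : stage.toCompletableFuture().join().currentPage()) {
                    System.out.println(row.getFormattedContents());
                }
                // for larger results you would also follow fetchNextPage()
            }
        }
    }
}
Issuing the queries separately also lets a token-aware driver route each request to a replica that owns the partition, rather than funneling a multi-partition IN through a single coordinator.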

How to use series_divide() in Kusto?

I am not able to correctly divide one time-series by another.
I get data from my TestTable which results in the following view:
TagId, sdata
8862, [0,0,0,0,2,2,2,3,4]
6304, [0,0,0,0,2,2,2,3,2]
I want to divide the sdata series for TagId 8862 by the series from 6304.
I expect the following result:
[NaN,NaN,NaN,NaN,1,1,1,1,2]
When I try the code below, I only get two empty ddata rows in my S2 results:
TestTable
| where TagId in (8862,6304)
| make-series sdata = avg(todouble(Value)) default=0 on TimeStamp in range (datetime(2019-06-27), datetime(2019-06-29), 1m) by TagId
| as S1;
S1 | project ddata = series_divide(sdata[0].['sdata'], sdata[1].['sdata'])
| as S2
What am I doing wrong?
Both arguments to series_divide() can't come from two separate rows in the dataset.
Here's an example of how you could achieve that (based on the limited, and perhaps not fully representative of your real use case, example shown in your question):
let T =
    datatable(tag_id:long, sdata:dynamic)
    [
        8862, dynamic([0,0,0,0,2,2,2,3,4]),
        6304, dynamic([0,0,0,0,2,2,2,3,2]),
    ]
;
let get_value_from_T = (_tag_id:long)
{
    toscalar(
        T
        | where tag_id == _tag_id
        | take 1
        | project sdata
    )
};
print sdata_1 = get_value_from_T(8862), sdata_2 = get_value_from_T(6304)
| extend result = series_divide(sdata_1, sdata_2)
which returns:
|sdata_1 | sdata_2 | result |
|--------------------|---------------------|---------------------------------------------|
|[0,0,0,0,2,2,2,3,4] | [0,0,0,0,2,2,2,3,2] |["NaN","NaN","NaN","NaN",1.0,1.0,1.0,1.0,2.0]|

Add Column with random value Sequelize PostgreSQL

I want to add a column with a NOT NULL constraint, so the column will contain random default values. The following is my code; how can I do this?
up: function (queryInterface, Sequelize, done) {
  queryInterface.addColumn(
    {
      tableName: 'someTable',
      schema: 'sometemplate'
    },
    'someColumn', // column name
    { // column data type and constraint
      type: Sequelize.STRING,
      allowNull: false,
      defaultValue: // I want this to be a random value
    })
    .then(function () { done() })
    .catch(done);
}
PostgreSQL example:
CREATE OR REPLACE FUNCTION f_random_text(
length integer
)
RETURNS text AS
$body$
WITH chars AS (
SELECT unnest(string_to_array('A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9', ' ')) AS _char
),
charlist AS
(
SELECT _char FROM chars ORDER BY random() LIMIT $1
)
SELECT string_agg(_char, '')
FROM charlist
;
$body$
LANGUAGE sql;
DROP TABLE IF EXISTS tmp_test;
CREATE TEMPORARY TABLE tmp_test (
id serial,
data text default f_random_text(12)
);
INSERT INTO tmp_test
VALUES
(DEFAULT, DEFAULT),
(DEFAULT, DEFAULT)
;
SELECT * FROM tmp_test;
id | data
----+--------------
1 | RYMUJH4E0NIQ
2 | 7U4029BOKAEJ
(2 rows)
Apparently you can do this. (Of course, you can add other characters as well, or use a different random-string generator.)
ref: https://dba.stackexchange.com/questions/19632/how-to-create-column-in-db-with-default-value-random-string

Accessing rows outside of window while aggregating in Spark dataframe

In short, in the example below I want to pin 'b to be the value in the row that the result will appear in.
Given:
a,b
1,2
4,6
3,7 ==> 'special would be: (1-7 + 4-7 + 3-7) == -13 in this row
val baseWin = Window.partitionBy("something_I_forgot").orderBy("whatever")
val win = baseWin.rowsBetween(-2, 0)
frame.withColumn("special", sum('a - 'b).over(win))
Or another way to think of it: I want to close over the current row when I calculate the sum, so that I can pass in that row's value of 'b (in this case 7).
* Update *
Here is what I want to accomplish as a UDF. In short, I used a foldLeft.
def mad(field: Column, numPeriods: Integer): Column = {
  val baseWin = Window.partitionBy("exchange", "symbol").orderBy("datetime")
  val win = baseWin.rowsBetween(numPeriods + 1, 0)
  val subFunc: (Seq[Double], Int) => Double = { (input: Seq[Double], numPeriods: Int) => {
    val agg = grizzled.math.stats.mean(input: _*)
    val fooBar = (1.0 / -numPeriods) * input.foldLeft(0.0)((a, b) => a + Math.abs(b - agg))
    fooBar
  } }
  val myUdf = udf(subFunc)
  myUdf(collect_list(field.cast(DoubleType)).over(win), lit(numPeriods))
}
If I understood correctly what you're trying to do, I think you can refactor your logic a bit to achieve it. The way you have it right now, you're probably getting "-7" instead of -13.
For the "special" column, (1-7 + 4-7 + 3-7), you can calculate it like (sum(a) - count(*) * b):
dfA.withColumn("special",sum('a).over(win) - count("*").over(win) * 'b)

Cassandra: Fixed number of rows in a table

I want to create a table with a fixed number of rows (let's say N), where adding the (N+1)th row removes the 1st (oldest) row.
This is the table, I use for storage of last N best results from graph analysis:
CREATE TABLE IF NOT EXISTS lp_registry.best (
value float, // best value for current graph
verts int, // number of vertices in graph
edges int, // number of edges in graph
wid text, // worker id
id timeuuid, // timeuuid
PRIMARY KEY (wid, id)
) WITH CLUSTERING ORDER BY (id ASC);
I've read about expiring data at DataStax, but found only TTL expirations. So I decided to do it in the following way.
My Approach A:
Every time a new result is to be added, the id of the oldest row is retrieved...
SELECT wid, id FROM lp_registry.best LIMIT 1;
...as well as the current number of rows...
SELECT COUNT(*) FROM lp_registry.best;
Consequently, if count >= N, then the oldest row is removed and the newest is added:
BEGIN BATCH
INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
DELETE FROM lp_registry.best WHERE wid = ? AND id = ?;
APPLY BATCH;
The problem with this approach is that the initial SELECTs are not atomic with the following batch. So if any other worker deleted the oldest row between the SELECT and the batch, or N was exceeded, this wouldn't work.
My Approach B:
Same first steps ...
SELECT wid, id FROM lp_registry.best LIMIT 1;
SELECT COUNT(*) FROM lp_registry.best;
Then try to delete the oldest row again and again until it succeeds:
if count < N {
    INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
} else {
    while not success {
        DELETE FROM lp_registry.best WHERE wid = ? AND id = ? IF EXISTS;
    }
    INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
}
With this approach there is still a risk of exceeding N in the database before count < N is checked.
Can you point me to the right solution?
Here is my solution. First, we need to create a table that will store the current number of rows...
CREATE TABLE IF NOT EXISTS row_counter (
rmax int, // maximum allowed number of rows
rows int, // current number of rows
name text, // name of table
PRIMARY KEY (name)
);
Then initialize it for a given fixed-rows tables:
INSERT INTO row_counter (name, rmax, rows)
VALUES ('best', 100, 0);
These are the statements used in the following code:
q1 = "SELECT rows, rmax FROM row_counter WHERE name = 'best'";
q2 = "UPDATE row_counter SET rows = ? WHERE name = 'best' IF rows < ?";
q3 = "SELECT wid, id FROM best LIMIT 1";
q4 = "DELETE FROM best WHERE wid = ? AND id = ? IF EXISTS";
q5 = "INSERT INTO best (vertex, value, verts, edges, wid, id) VALUES (?, ?, ?, ?, ?, now())";
selectCounter = session.prepare(q1);
updateCounter = session.prepare(q2);
selectOldBest = session.prepare(q3);
deleteOldBest = session.prepare(q4);
insertNewBest = session.prepare(q5);
Solution in Java:
// Success indicator
boolean succ = false;
// Get number of registered rows in the table with best results
Row row = session.execute(selectCounter.bind()).one();
int rows = row.getInt("rows") + 1;
int rmax = row.getInt("rmax");
// Repeatedly try to reserve empty space in the table
while (!succ && rows <= rmax) {
    succ = session.execute(updateCounter.bind(rows, Math.min(rows, rmax))).wasApplied();
    rows = session.execute(selectCounter.bind()).one().getInt("rows") + 1;
}
// If there is no empty space in the table, repeatedly try to make new empty space
while (!succ) {
    row = session.execute(selectOldBest.bind()).one();
    succ = session.execute(deleteOldBest.bind(row.getString("wid"), row.getUUID("id"))).wasApplied();
}
// Insert new row
session.execute(insertNewBest.bind(vertex, value, verts, edges, workerCode));
