Cassandra: Fixed number of rows in a table

I want to create a table with a fixed number of rows (let's say N), where adding the (N+1)th row removes the 1st (oldest) row.
This is the table I use to store the last N best results from graph analysis:
CREATE TABLE IF NOT EXISTS lp_registry.best (
value float, // best value for current graph
verts int, // number of vertices in graph
edges int, // number of edges in graph
wid text, // worker id
id timeuuid, // timeuuid
PRIMARY KEY (wid, id)
) WITH CLUSTERING ORDER BY (id ASC);
I've read about expiring data in the DataStax docs, but found only TTL-based expiration. So I decided to do it in the following way.
My Approach A:
Every time a new result is to be added, the id of the oldest row is retrieved first...
SELECT wid, id FROM lp_registry.best LIMIT 1;
...as well as the current number of rows...
SELECT COUNT(*) FROM lp_registry.best;
Then, if count >= N, the oldest row is deleted and the new one is inserted...
BEGIN BATCH
INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
DELETE FROM lp_registry.best WHERE wid = ? AND id = ?;
APPLY BATCH;
The problem with this approach is that the initial selects are not atomic together with the following batch. So if any other worker deleted the oldest row between the select and the batch, or if N was exceeded in the meantime, this wouldn't work.
My Approach B:
Same first steps ...
SELECT wid, id FROM lp_registry.best LIMIT 1;
SELECT COUNT(*) FROM lp_registry.best;
Then try to delete the oldest row repeatedly until it succeeds...
if count < N {
    INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
} else {
    while not success {
        DELETE FROM lp_registry.best WHERE wid = ? AND id = ? IF EXISTS;
    }
    INSERT INTO lp_registry.best (value, verts, edges, wid, id) VALUES (?, ?, ?, ?, now());
}
This approach can still exceed N rows in the database, because other workers may insert rows between the count check and the insert.
Can you point me to the right solution?

Here is my solution. First we need a table that stores the current number of rows...
CREATE TABLE IF NOT EXISTS row_counter (
rmax int, // maximum allowed number of rows
rows int, // current number of rows
name text, // name of table
PRIMARY KEY (name)
);
Then initialize it for each fixed-size table:
INSERT INTO row_counter (name, rmax, rows)
VALUES ('best', 100, 0);
These are the statements used in the following code:
q1 = "SELECT rows, rmax FROM row_counter WHERE name = 'best'";
q2 = "UPDATE row_counter SET rows = ? WHERE name = 'best' IF rows < ?";
q3 = "SELECT wid, id FROM best LIMIT 1";
q4 = "DELETE FROM best WHERE wid = ? AND id = ? IF EXISTS";
q5 = "INSERT INTO best (vertex, value, verts, edges, wid, id) VALUES (?, ?, ?, ?, ?, now())";
selectCounter = session.prepare(q1);
updateCounter = session.prepare(q2);
selectOldBest = session.prepare(q3);
deleteOldBest = session.prepare(q4);
insertNewBest = session.prepare(q5);
Solution in Java:
// Success indicator
boolean succ = false;
// Get number of registered rows in the table with best results
Row row = session.execute(selectCounter.bind()).one();
int rows = row.getInt("rows") + 1;
int rmax = row.getInt("rmax");
// Repeatedly try to reserve empty space in the table
while (!succ && rows <= rmax) {
    succ = session.execute(updateCounter.bind(rows, Math.min(rows, rmax))).wasApplied();
    rows = session.execute(selectCounter.bind()).one().getInt("rows") + 1;
}
// If there is no empty space in the table, repeatedly try to make new empty space
while (!succ) {
    row = session.execute(selectOldBest.bind()).one();
    succ = session.execute(deleteOldBest.bind(row.getString("wid"), row.getUUID("id"))).wasApplied();
}
// Insert new row
session.execute(insertNewBest.bind(vertex, value, verts, edges, workerCode));

Draw line below/above line.new

Here is a script I'm using for detecting pivot points.
I'd like to draw a line at the close of the newly detected pivot point each time a new pivot point is detected (and delete the other lines).
I tried to change the line.new() values to use the "close", but it didn't work.
See the picture below; the blue lines I drew are what I'm trying to do :D
[screenshot: the blue lines show the desired result]
indicator("Pivot Points", overlay=true)
// Get user input
var devTooltip = "Deviation is a multiplier that affects how much the price should deviate from the previous pivot in order for the bar to become a new pivot."
var depthTooltip = "The minimum number of bars that will be taken into account when analyzing pivots."
threshold_multiplier = input.float(title="Deviation", defval=2.5, minval=0, tooltip=devTooltip)
depth = input.int(title="Depth", defval=10, minval=1, tooltip=depthTooltip)
deleteLastLine = input.bool(title="Delete Last Line", defval=false)
bgcolorChange = input.bool(title="Change BgColor", defval=false)
// Calculate deviation threshold for identifying major swings
dev_threshold = ta.atr(10) / close * 100 * threshold_multiplier
// Prepare pivot variables
var line lineLast = na
var int iLast = 0 // Index last
var int iPrev = 0 // Index previous
var float pLast = 0 // Price last
var isHighLast = false // If false then the last pivot was a pivot low
// Custom function for detecting pivot points (and returning price + bar index)
pivots(src, length, isHigh) =>
    l2 = length * 2
    c = nz(src[length])
    ok = true
    for i = 0 to l2
        if isHigh and src[i] > c // If isHigh, validate pivot high
            ok := false
        if not isHigh and src[i] < c // If not isHigh, validate pivot low
            ok := false
    if ok // If pivot is valid, return bar index + price value
        [bar_index[length], c]
    else // If pivot is invalid, return na
        [int(na), float(na)]
// Get bar index & price high/low for current pivots
[iH, pH] = pivots(high, depth / 2, true)
[iL, pL] = pivots(low, depth / 2, false)
// Custom function for calculating price deviation for validating large moves
calc_dev(base_price, price) => 100 * (price - base_price) / price
// Custom function for detecting pivots that meet our deviation criteria
pivotFound(dev, isHigh, index, price) =>
    if isHighLast == isHigh and not na(lineLast) // Check bull/bear direction of new pivot
        // New pivot in same direction as last, so update the line (ie. trend-continuation)
        if isHighLast ? price > pLast : price < pLast // If new pivot extends beyond the last pivot, update the line
            line.set_xy2(lineLast, index, price)
            [lineLast, isHighLast]
        else
            [line(na), bool(na)] // New pivot does not extend beyond the last pivot, so don't update the line
    else // Reverse the trend/pivot direction (or create the very first line if lineLast is na)
        if math.abs(dev) > dev_threshold
            // Price move is significant - create a new line between the pivot points
            id = line.new(iLast, pLast, index, price, color=color.gray, width=1, style=line.style_dashed)
            [id, isHigh]
        else
            [line(na), bool(na)]
// If bar index for current pivot high is not NA (ie. we have a new pivot):
if not na(iH)
    dev = calc_dev(pLast, pH) // Calculate the deviation from the last pivot
    [id, isHigh] = pivotFound(dev, true, iH, pH) // Pass the current pivot high into pivotFound() for validation & line update
    if not na(id) // If the line has been updated, update price & index values and delete the previous line
        if id != lineLast and deleteLastLine
            line.delete(lineLast)
        lineLast := id
        isHighLast := isHigh
        iPrev := iLast
        iLast := iH
        pLast := pH
else
    if not na(iL) // If bar index for current pivot low is not NA (ie. we have a new pivot):
        dev = calc_dev(pLast, pL) // Calculate the deviation from the last pivot
        [id, isHigh] = pivotFound(dev, false, iL, pL) // Pass the current pivot low into pivotFound() for validation & line update
        if not na(id) // If the line has been updated, update price values and delete the previous line
            if id != lineLast and deleteLastLine
                line.delete(lineLast)
            lineLast := id
            isHighLast := isHigh
            iPrev := iLast
            iLast := iL
            pLast := pL
// Get starting and ending high/low price of the current pivot line
startIndex = line.get_x1(lineLast)
startPrice = line.get_y1(lineLast)
endIndex = line.get_x2(lineLast)
endPrice = line.get_y2(lineLast)
// Draw top & bottom of impulsive move
topLine = line.new(startIndex, startPrice, endIndex, startPrice, extend=extend.right, color=color.red)
bottomline = line.new(startIndex, endPrice, endIndex, endPrice, extend=extend.right, color=color.green)
line.delete(topLine[1])
line.delete(bottomline[1])
//plot(startPrice, color=color.green)
//plot(endPrice, color=color.red)
// Do what you like with these pivot values :)
// Keep in mind there will be an X bar delay between pivot price values updating based on Depth setting
dist = math.abs(startPrice - endPrice)
plot(dist, color=color.new(color.purple,100))
bullish = endPrice > startPrice
offsetBG = -(depth / 2)
bgcolor(bgcolorChange ? bullish ? color.new(color.green,90) : color.new(color.red,90) : na, offset=offsetBG)
Thank you
Changing the code
// Get starting and ending high/low price of the current pivot line
startIndex = line.get_x1(lineLast)
startPrice = line.get_y1(lineLast)
endIndex = line.get_x2(lineLast)
endPrice = line.get_y2(lineLast)
// Draw top & bottom of impulsive move
topLine = line.new(startIndex, startPrice, endIndex, startPrice, extend=extend.right, color=color.red)
bottomline = line.new(startIndex, endPrice, endIndex, endPrice, extend=extend.right, color=color.green)
line.delete(topLine[1])
line.delete(bottomline[1])

What is the best way to count adjacent edges by their name for each vertex?

I'm trying to count adjacent edges by their collection names.
For example, I have a vertex collection 'User' which has outbound edges to ['visited', 'add_to_cart', 'purchased'].
For each user vertex, I'd like to count the adjacent edges by their collection names.
So the final return would be like
{
user_id : "user_1",
visit_count : 3,
add_to_cart_count : 5,
purchase_cnt : 1
}
I've tried the following query, but I doubt it gives the best performance, since it uses conditional (if/else) expressions, which I guess hinder the overall performance.
The query I tried :
FOR user IN User
FOR v, e, p IN OUTBOUND user visited, add_to_cart, purchased
COLLECT user_id = user.user_id
AGGREGATE
visit_count = SUM(SPLIT(e._id, '/')[0] == 'visited'? 1 : 0),
add_to_cart_count = SUM(SPLIT(e._id, '/')[0] == 'add_to_cart'? 1 : 0),
purchase_cnt = SUM(SPLIT(e._id, '/')[0] == 'purchased'? 1 : 0)
RETURN {
user_id, visit_count, add_to_cart_count, purchase_cnt
}
If it IS the best way, are there any index-related gains I can make use of?
Looking forward to your help :)
Thanks.
Thanks to Tobias from the ArangoDB community, I could make it about 30% faster.
LET vis = (FOR e IN visited COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, visit_count: n})
LET cart = (FOR e IN add_to_cart COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, add_to_cart_count: n})
LET purc = (FOR e IN purchased COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, purchase_cnt: n})
FOR x IN UNION(vis, cart, purc)
COLLECT user_id = x.user_id AGGREGATE visit_count = SUM(x.visit_count), add_to_cart_count = SUM(x.add_to_cart_count), purchase_cnt = SUM(x.purchase_cnt)
RETURN {user_id, visit_count, add_to_cart_count, purchase_cnt}
The point he made was to iterate over the edge collections directly and collect from there!

How to update or insert millions of rows via node oracle-db?

I'm struggling with a question: how can I insert or update a lot of data (thousands or millions of rows) using the node-oracledb driver?
The point is that I can select a lot of data with the help of a resultSet (handling result sets)... but then I have to perform some actions on each row and later update it or insert a new row. And here is the problem: I don't know how to do this as fast as possible.
Can anybody help me with a piece of advice? Thanks.
I can assure you that these actions can't be done in the DB.
Actually, there are lots of different ways this can be done in the DB via SQL and PL/SQL when needed. Folks often want to use the language they are comfortable with, maybe JavaScript in this case, but performance will be much better if the data doesn't have to fly around between tiers.
Here's an example in just SQL alone... Granted, this could have been done via virtual columns, but it should illustrate the point.
Imagine we have the following tables:
create table things (
id number not null,
val1 number not null,
val2 number not null,
constraint things_pk primary key (id)
);
insert into things (id, val1, val2) values (1, 1, 2);
insert into things (id, val1, val2) values (2, 2, 2);
insert into things (id, val1, val2) values (3, 5, 5);
-- Will hold the sum of things val1 and val2
create table thing_sums (
thing_id number,
sum number
);
alter table thing_sums
add constraint thing_sums_fk1
foreign key (thing_id)
references things (id);
Now, the easiest and most performant way to do this would be via SQL:
insert into thing_sums (
thing_id,
sum
)
select id,
val1 + val2
from things
where id not in (
select thing_id
from thing_sums
);
Here's another example that does the same thing only via PL/SQL which can provide more control.
begin
-- This cursor for loop will bulk collect (reduces context switching between
-- SQL and PL/SQL engines) implicitly.
for thing_rec in (
select *
from things
where id not in(
select thing_id
from thing_sums
)
)
loop
-- Logic in this loop could be endlessly complex. I'm inserting the values
-- within the loop but this logic could be modified to store data in arrays
-- and then insert with forall (another bulk operation) after the loop.
insert into thing_sums(
thing_id,
sum
) values (
thing_rec.id,
thing_rec.val1 + thing_rec.val2
);
end loop;
end;
Either of those could be called from the Node.js driver. However, let's say you need to do this from the driver (maybe you're ingesting data that's not already in the database). Here's an example that demonstrates calling PL/SQL from the driver using bulk processing rather than row-by-row operations. This is much faster due to reduced round trips.
I pulled this from a blog post I'm working on so the table definition is a little different:
create table things (
id number not null,
name varchar2(50),
constraint things_pk primary key (id)
);
And here's the JavaScript:
var oracledb = require('oracledb');
var async = require('async');
var config = require('./dbconfig');
var things = [];
var idx;
function getThings(count) {
var things = [];
for (idx = 0; idx < count; idx += 1) {
things[idx] = {
id: idx,
name: "Thing number " + idx
};
}
return things;
}
things = getThings(500);
oracledb.getConnection(config, function(err, conn) {
var ids = [];
var names = [];
var start = Date.now();
if (err) {throw err;}
// We need to break up the array of JavaScript objects into arrays that
// work with node-oracledb bindings.
for (idx = 0; idx < things.length; idx += 1) {
ids.push(things[idx].id);
names.push(things[idx].name);
}
conn.execute(
` declare
type number_aat is table of number
index by pls_integer;
type varchar2_aat is table of varchar2(50)
index by pls_integer;
l_ids number_aat := :ids;
l_names varchar2_aat := :names;
begin
forall x in l_ids.first .. l_ids.last
insert into things (id, name) values (l_ids(x), l_names(x));
end;`,
{
ids: {
type: oracledb.NUMBER,
dir: oracledb.BIND_IN,
val: ids
},
names: {
type: oracledb.STRING,
dir: oracledb.BIND_IN,
val: names
}
},
{
autoCommit: true
},
function(err) {
if (err) {console.log(err); return;}
console.log('Success. Inserted ' + things.length + ' rows in ' + (Date.now() - start) + ' ms.');
}
);
});
I hope that helps! :)

How do I add two column values in a table with CQL?

I need to add two column values together to create a third value with CQL. Is there any way to do this? My table has the columns number_of_x and number_of_y, and I am trying to create total. I did an update on the table with a SET command as follows:
UPDATE my_table
SET total = number_of_x + number_of_y ;
When I run that I get the message back saying:
no viable alternative at input ';'.
Per the docs, an assignment is one of:
column_name = value
set_or_list_item = set_or_list_item ( + | - ) ...
map_name = map_name ( + | - ) ...
map_name = map_name ( + | - ) { map_key : map_value, ... }
column_name [ term ] = value
counter_column_name = counter_column_name ( + | - ) integer
And you cannot mix counter and non-counter columns in the same table, so what you are describing is impossible in a single statement. But you can do a read before write:
CREATE TABLE my_table ( total int, x int, y int, key text PRIMARY KEY );
INSERT INTO my_table (key, x, y) VALUES ('CUST_1', 1, 1);
SELECT * FROM my_table WHERE key = 'CUST_1';
key | total | x | y
--------+-------+---+---
CUST_1 | null | 1 | 1
UPDATE my_table SET total = 2 WHERE key = 'CUST_1' IF x = 1 AND y = 1;
[applied]
-----------
True
SELECT * FROM my_table WHERE key = 'CUST_1';
key | total | x | y
--------+-------+---+---
CUST_1 | 2 | 1 | 1
The IF clause will handle concurrency issues if x or y was updated since the SELECT. You can then retry if [applied] comes back False.
My recommendation in this scenario, however, is for your application to just read both x and y and do the addition locally, as it will perform MUCH better.
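For example, a minimal sketch of the client-side approach with the DataStax Java driver (the same driver style used in the Cassandra answer earlier on this page; the table and key are the ones from this example, so adjust the names to your schema):
// Read both columns, then add them in the application instead of asking Cassandra to do it.
Row r = session.execute("SELECT x, y FROM my_table WHERE key = 'CUST_1'").one();
int total = r.getInt("x") + r.getInt("y"); // the addition happens locally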
If you really want C* to do the addition for you, there is a sum aggregate function in Cassandra 2.2+, but it will require updating your schema a little:
CREATE TABLE table_for_aggregate (key text, type text, value int, PRIMARY KEY (key, type));
INSERT INTO table_for_aggregate (key, type, value) VALUES ('CUST_1', 'X', 1);
INSERT INTO table_for_aggregate (key, type, value) VALUES ('CUST_1', 'Y', 1);
SELECT sum(value) from table_for_aggregate WHERE key = 'CUST_1';
system.sum(value)
-------------------
2

Need to cut a string (table name) from a query in C#

My string/query looks like this:
insert into Employee Values(1,2,'xxx');
update Employee2 set col1='xxx' where col2='yyy';
select * from Employee3;
I need to extract the table name alone. The table name won't be constant; it will differ (Employee, Employee2, Employee3) depending on the DB. I'm new to C#, please help me. Thanks in advance.
To get the name of a table from a query (or in this case, a string named 'sql'), try the following:
string sql = "select * from table ";
int index1 = 0;
int index2 = 0;
int currentIndex = 0;
int numSpaces = 0;
char[] chArray = sql.ToCharArray();
foreach (char c in chArray)
{
    if (c == ' ')
    {
        numSpaces++;
        if (numSpaces == 3)
            index1 = currentIndex + 1; // start of the word right after "from"
        if (numSpaces == 4)
        {
            index2 = currentIndex; // position of the space that ends the table name
            break;
        }
    }
    currentIndex++;
}
int length = index2 - index1;
string tableName = sql.Substring(index1, length); // "table"
MessageBox.Show(tableName);
Warning - this solution is based on finding the word between the 3rd and 4th space characters. This limits you to a very predictable query structure; more complicated queries may not work with this solution. Your query structure needs to be:
"select column_name1,column_name2,column_name3 from table "
Your query must not have spaces between columns and must have a space at the end as well. Sorry for the limitations, but it's the best I can come up with ;)
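If your queries can vary more than that, a pattern-based sketch may be more robust: match the identifier that follows INTO, UPDATE or FROM. The snippet below is only an illustration (written in Java, the language used elsewhere on this page; the same regular expression carries over directly to C#'s System.Text.RegularExpressions.Regex):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableNameExtractor {
    // Grabs the first identifier that follows INTO, UPDATE or FROM (case-insensitive).
    private static final Pattern TABLE_PATTERN =
        Pattern.compile("(?i)\\b(?:into|update|from)\\s+([A-Za-z_][\\w.]*)");

    public static void main(String[] args) {
        String[] queries = {
            "insert into Employee Values(1,2,'xxx');",
            "update Employee2 set col1='xxx' where col2='yyy';",
            "select * from Employee3;"
        };
        for (String sql : queries) {
            Matcher m = TABLE_PATTERN.matcher(sql);
            if (m.find()) {
                System.out.println(m.group(1)); // Employee, Employee2, Employee3
            }
        }
    }
}
Subqueries, joins and quoted identifiers would still need extra handling; a real SQL parser is the safer choice if the statements get complex.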
