I have to benchmark Cassandra with the Facebook LinkBench. There are two phases during the benchmark: the load phase and the request phase.
In the load phase, LinkBench fills the Cassandra tables nodes, links, and counts (for link counting) with default values (graph data).
The count table looks like this:
CREATE TABLE keyspace.counttable (
link_id bigint,
link_type bigint,
time bigint,
version bigint,
count counter,
PRIMARY KEY (link_id, link_type, time, version)
);
My question is: how do I insert the default counter values (before incrementing and decrementing the counter in the LinkBench request phase)?
If that isn't possible with Cassandra, how should I increment/decrement a bigint column (instead of a counter column)?
Any suggestions and comments? Thanks a lot.
The default value is zero. Given
create table counttable (
link_id bigint,
link_type bigint,
time bigint,
version bigint,
count counter,
PRIMARY KEY (link_id, link_type, time, version)
);
and
update counttable set count = count + 1 where link_id = 1 and link_type = 1 and time = 1 and version = 1;
We see that the value of count is now 1.
select * from counttable ;
link_id | link_type | time | version | count
---------+-----------+------+---------+-------
1 | 1 | 1 | 1 | 1
(1 rows)
So, if we want to set it to some other value we can:
update counttable set count = count + 500 where link_id = 1 and link_type = 1 and time = 1 and version = 2;
select * from counttable ;
link_id | link_type | time | version | count
---------+-----------+------+---------+-------
1 | 1 | 1 | 1 | 1
1 | 1 | 1 | 2 | 500
(2 rows)
There is no elegant way to initialize a counter column to a non-zero value: the only operations you can perform on a counter column are increment and decrement. I recommend keeping the offset (i.e., your intended initial value) separately, e.g., in a companion table (Cassandra does not allow non-counter columns in a counter table), and simply adding the two values in your client application.
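As a minimal CQL sketch of that approach (the companion table counttable_offset and its initial_value column are hypothetical names): since the counter table itself cannot hold the offset, it lives in a second table with the same primary key.
create table counttable_offset (
link_id bigint,
link_type bigint,
time bigint,
version bigint,
initial_value bigint,
PRIMARY KEY (link_id, link_type, time, version)
);
-- load phase: record the intended initial value once
insert into counttable_offset (link_id, link_type, time, version, initial_value)
values (1, 1, 1, 1, 500);
-- request phase: increment the counter as usual
update counttable set count = count + 1 where link_id = 1 and link_type = 1 and time = 1 and version = 1;
-- the client then reads both tables and reports initial_value + count as the effective count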
Thank you for the answers. I implemented the following solution to initialize the counter field.
Since the initial (default) value of a counter field is 0, I incremented it by my default value. It looks like Don Branson's solution, but with only one column:
create table counttable (
link_id bigint,
link_type bigint,
count counter,
PRIMARY KEY (link_id, link_type)
);
I set the value with this statement (during the load phase):
update counttable set count = count + myValue where link_id = 1 and link_type = 1;
select * from counttable ;
link_id | link_type | count
---------+-----------+--------
1 | 1 | myValue (added to 0)
(1 row)
[Question posted by a user on YugabyteDB Community Slack]
Is there a way to identify whether an upsert operation like the one shown below inserts or updates the row, e.g., with the Java or Golang driver?
UPDATE test set value = 'value1', checkpoint = 'cas1' WHERE key = 'key1' IF checkpoint = '' OR NOT EXISTS;
RETURNS STATUS AS ROW is a YCQL feature. In YSQL, you could use an AFTER INSERT OR UPDATE ... FOR EACH ROW trigger to detect the outcome. The challenge, then, would be to surface the result in the session that made the change. You could use a user-defined run-time parameter (set my_app.outcome = 'true') or a temp table.
—regards, bryn#yugabyte.com
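For illustration, here is a minimal, untested YSQL sketch of that trigger-plus-parameter idea (all object names are hypothetical):
CREATE FUNCTION record_outcome() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- TG_OP is 'INSERT' or 'UPDATE'; stash it in a session-level
  -- run-time parameter so the session that issued the statement
  -- can read it back afterwards.
  PERFORM set_config('my_app.outcome', TG_OP, false);
  RETURN NEW;
END;
$$;
CREATE TRIGGER test_outcome
AFTER INSERT OR UPDATE ON test
FOR EACH ROW EXECUTE FUNCTION record_outcome();
-- after the upsert, the same session can inspect the outcome:
-- SELECT current_setting('my_app.outcome');  -- 'INSERT' or 'UPDATE'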
You can use RETURNS STATUS AS ROW as documented here: https://docs.yugabyte.com/preview/api/ycql/dml_update/#returns-status-as-row
Example:
cqlsh:sample> CREATE TABLE test(h INT, r INT, v LIST<INT>, PRIMARY KEY(h,r)) WITH transactions={'enabled': true};
cqlsh:sample> INSERT INTO test(h,r,v) VALUES (1,1,[1,2]);
Unapplied update when the IF condition is false:
cqlsh:sample> UPDATE test SET v[2] = 4 WHERE h = 1 AND r = 1 IF v[1] = 3 RETURNS STATUS AS ROW;
[applied] | [message] | h | r | v
-----------+-----------+---+---+--------
False | null | 1 | 1 | [1, 2]
Applied update when the IF condition is true:
cqlsh:sample> UPDATE test SET v[0] = 4 WHERE h = 1 AND r = 1 IF v[1] = 2 RETURNS STATUS AS ROW;
[applied] | [message] | h | r | v
-----------+-----------+------+------+------
True | null | null | null | null
The data set contains products with a daily record, but sometimes a day is missing, so I want to create extra columns showing whether a record exists in the past few days.
My conditions are below:
Create T-1, T-2, and so on columns and fill them as follows:
Fill T-1 with 1 if the previous day's record exists, otherwise zero.
Original Table :
Item Cat DateTime Value
A C1 1-1-2021 10
A C1 2-1-2021 10
A C1 3-1-2021 10
A C1 4-1-2021 10
A C1 5-1-2021 10
A C1 6-1-2021 10
B C1 1-1-2021 20
B C1 4-1-2021 20
Expected result:
Item Cat DateTime Value T-1 T-2 T-3 T-4 T-5
A C1 1-1-2021 10 0 0 0 0 0
A C1 2-1-2021 10 1 0 0 0 0 (T-1 is 1 as we have 1-1-2021 record)
A C1 3-1-2021 10 1 1 0 0 0
A C1 4-1-2021 10 1 1 1 0 0
A C1 5-1-2021 10 1 1 1 1 0
A C1 6-1-2021 10 1 1 1 1 1
B C1 1-1-2021 20 0 0 0 0 0
B C1 2-1-2021 0 1 0 0 0 0 (2-1-2021 record need to be created with value zero since we miss this from original data-set, plus T-1 is 1 as we have this record from original data-set)
B C1 3-1-2021 0 0 1 0 0 0
B C1 4-1-2021 20 0 0 1 0 0
B C1 5-1-2021 0 1 0 0 1 0
Let's assume you have the original table data stored in original_data. We can:
1. Create a temporary view to query with Spark SQL, named daily_records.
2. Generate the possible dates. This is done by finding the number of days between the min and max dates in the dataset, then generating the dates with the table-generating function posexplode applied to a string of that many spaces.
3. Generate all possible (item, date) records.
4. Join these records with the actual records to get a complete dataset with values.
5. Use Spark SQL to query the view and create the additional columns using left joins and CASE statements.
# Step 1
original_data.createOrReplaceTempView("daily_records")
# Step 2-4
daily_records = sparkSession.sql("""
WITH date_bounds AS (
SELECT min(DateTime) as mindate, max(DateTime) as maxdate FROM daily_records
),
possible_dates AS (
SELECT
date_add(mindate,index.pos) as DateTime
FROM
date_bounds
lateral view posexplode(split(space(datediff(maxdate,mindate)),"")) index
),
unique_items AS (
SELECT DISTINCT Item, Cat from daily_records
),
possible_item_dates AS (
SELECT Item, Cat, DateTime FROM unique_items INNER JOIN possible_dates ON 1=1
),
possible_records AS (
SELECT
p.Item,
p.Cat,
p.DateTime,
r.Value
FROM
possible_item_dates p
LEFT JOIN
daily_records r on p.Item = r.Item and p.DateTime = r.DateTime
)
select * from possible_records
""")
daily_records.createOrReplaceTempView("daily_records")
daily_records.show()
# Step 5 - store results in desired_result
# This is optional, but I have chosen to generate the sql to create this dataframe
periods = 5 # Number of periods to check for
period_columns = ",".join(["""
CASE
WHEN t{0}.Value IS NULL THEN 0
ELSE 1
END as `T-{0}`
""".format(i) for i in range(1,periods+1)])
period_joins = " ".join(["""
LEFT JOIN
daily_records t{0} on datediff(to_date(t.DateTime),to_date(t{0}.DateTime))={0} and t.Item = t{0}.Item
""".format(i) for i in range(1,periods+1)])
period_sql = """
SELECT
t.*
{0}
FROM
daily_records t
{1}
ORDER BY
Item, DateTime
""".format(
"" if len(period_columns)==0 else ",{0}".format(period_columns),
period_joins
)
desired_result = sparkSession.sql(period_sql)
desired_result.show()
Actual SQL generated:
SELECT
t.*,
CASE
WHEN t1.Value IS NULL THEN 0
ELSE 1
END as `T-1`,
CASE
WHEN t2.Value IS NULL THEN 0
ELSE 1
END as `T-2`,
CASE
WHEN t3.Value IS NULL THEN 0
ELSE 1
END as `T-3`,
CASE
WHEN t4.Value IS NULL THEN 0
ELSE 1
END as `T-4`,
CASE
WHEN t5.Value IS NULL THEN 0
ELSE 1
END as `T-5`
FROM
daily_records t
LEFT JOIN
daily_records t1 on datediff(to_date(t.DateTime),to_date(t1.DateTime))=1 and t.Item = t1.Item
LEFT JOIN
daily_records t2 on datediff(to_date(t.DateTime),to_date(t2.DateTime))=2 and t.Item = t2.Item
LEFT JOIN
daily_records t3 on datediff(to_date(t.DateTime),to_date(t3.DateTime))=3 and t.Item = t3.Item
LEFT JOIN
daily_records t4 on datediff(to_date(t.DateTime),to_date(t4.DateTime))=4 and t.Item = t4.Item
LEFT JOIN
daily_records t5 on datediff(to_date(t.DateTime),to_date(t5.DateTime))=5 and t.Item = t5.Item
ORDER BY
Item, DateTime
NB: to_date is optional if DateTime is already a date field or a string in yyyy-mm-dd format.
I am new to Cassandra, so I have a few quick questions. Suppose I do this:
CREATE TABLE my_keyspace.my_table (
id bigint,
year int,
datetime timestamp,
field1 int,
field2 int,
PRIMARY KEY ((id, year), datetime));
I imagine Cassandra as something like Map<PartitionKey, SortedMap<ColKey, ColVal>>,
My question is: when querying Cassandra using a WHERE clause, like
SELECT * FROM my_keyspace.my_table WHERE id = 1 AND year = 4;
this could return 2 or more records. How does this fit in with the data model of Cassandra?
If it really is a big HashMap, how come multiple records for one partition key are allowed?
Thanks!
Each row is stored as a batch of entries in the SortedMap<ColKey, ColVal>, using its sorted nature.
To build on your mental model: while there is only one partition for id = 1 AND year = 4, that partition contains multiple cells:
(id, year) | ColKey | ColVal
------------------------------------------
1, 4 | datetime(1):field1 | 1 \ Row1
1, 4 | datetime(1):field2 | 2 /
1, 4 | datetime(5):field1 | 1 \
1, 4 | datetime(5):field2 | 2 / Row2
...
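To see this in CQL terms, here is a small sketch: two inserts that share the partition key (id, year) but differ in the clustering column datetime create two distinct rows in the same partition, not duplicates.
INSERT INTO my_keyspace.my_table (id, year, datetime, field1, field2)
VALUES (1, 4, '2021-01-01 00:00:00', 1, 2);
INSERT INTO my_keyspace.my_table (id, year, datetime, field1, field2)
VALUES (1, 4, '2021-01-05 00:00:00', 1, 2);
-- returns both rows, sorted by the clustering column datetime
SELECT * FROM my_keyspace.my_table WHERE id = 1 AND year = 4;
Within a partition, each row is identified by its clustering key, so "two records for one partition key" simply means two entries in the same sorted map.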
I am using the newest Excel.
I have a structure such as this:
| ID | Day | Value | Day | Value | Day | Value.............| Day | Value |
1 val val val val
2 val val val val
3 val val val val
I need the data to look like this
| ID | Day | Value | Day | Value | Day | Value.............| Day | Value |
1 1 val 2 val 3 val N val
2 1 val 2 val 3 val N val
3 1 val 2 val 3 val N val
Essentially, I need to go to the first Day column and add a 1 for every ID, then go to the second Day column, increment by 1, and write 2 for every ID, and so on. Manually, I can do this by putting a 1 and dragging down, then putting a 2 and dragging down. The problem is that I have a couple of thousand Day columns, so this would be very time-consuming. Is there an automated way to accomplish this task?
Here's some VBA code you could use (replace the values of LastRow and LastCol as appropriate):
Sub FillDayNumbers()
    Dim LastRow As Long, LastCol As Long
    Dim DayNum As Long, i As Long, t As Long
    LastRow = 100   ' last data row
    LastCol = 100   ' last Day column
    For i = 2 To LastRow              ' data starts on row 2, below the headers
        DayNum = 1
        For t = 2 To LastCol Step 2   ' Day columns are B, D, F, ...
            Cells(i, t).Value = DayNum
            DayNum = DayNum + 1
        Next t
    Next i
End Sub
Not automated, but quite quick. Assuming ID is in A1: add a row (say Row 1) with a series fill, and sort the columns by Row 2. Series-fill the Row 3 columns for the Days and copy down to suit. Then sort the columns by Row 1 and delete Row 1.
I have the following table:
id count hour age range
-------------------------------------
0 5 10 61 10-200
1 6 20 61 10-200
2 7 15 61 10-200
5 9 5 61 201-300
7 10 25 61 201-300
0 5 10 62 10-20
1 6 20 62 10-20
2 7 15 62 10-20
5 9 5 62 21-30
1 8 6 62 21-30
7 10 25 62 21-30
10 15 30 62 31-40
I need to select distinct values of the column range.
I tried the following query:
Select distinct range as interval from table_name where age = 62;
Its result is a single column, as follows:
interval
----------
10-20
21-30
31-40
How can I get the result as follows?
10-20, 21-30, 31-40
EDITED:
I am now trying the following query:
select sys_connect_by_path(range,',') interval
from
(select distinct NVL(range,'0') range , ROW_NUMBER() OVER (ORDER BY RANGE) rn
from table_name where age = 62)
where connect_by_isleaf = 1 CONNECT BY rn = PRIOR rn+1 start with rn = 1;
which gives me this output:
Interval
----------------------------------------------------------------------------
, 10-20,10-20,10-20,21-30,21-30, 31-40
Please help me get my desired output.
If you are on 11.2 rather than just 11.1, you can use the LISTAGG aggregate function
SELECT listagg( interval, ',' )
WITHIN GROUP( ORDER BY interval )
FROM (SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62)
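With the sample data above, this returns a single value: 10-20,21-30,31-40.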
If you are using an earlier version of Oracle, you could use one of the other Oracle string aggregation techniques on Tim Hall's page. Prior to 11.2, my personal preference would be to create a user-defined aggregate function so that you can then:
SELECT string_agg( interval )
FROM (SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62)
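For reference, here is a condensed sketch of such a user-defined aggregate, following the well-known ODCIAggregate pattern described on Tim Hall's page (illustrative, not production-hardened):
CREATE OR REPLACE TYPE string_agg_type AS OBJECT (
  total VARCHAR2(4000),
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT string_agg_type)
    RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT string_agg_type,
    value IN VARCHAR2) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate(self IN string_agg_type,
    returnValue OUT VARCHAR2, flags IN NUMBER) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT string_agg_type,
    ctx2 IN string_agg_type) RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY string_agg_type IS
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT string_agg_type)
    RETURN NUMBER IS
  BEGIN
    sctx := string_agg_type(NULL);
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT string_agg_type,
    value IN VARCHAR2) RETURN NUMBER IS
  BEGIN
    -- accumulate with a leading comma; trimmed off in Terminate
    self.total := self.total || ',' || value;
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateTerminate(self IN string_agg_type,
    returnValue OUT VARCHAR2, flags IN NUMBER) RETURN NUMBER IS
  BEGIN
    returnValue := LTRIM(self.total, ',');
    RETURN ODCIConst.Success;
  END;
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT string_agg_type,
    ctx2 IN string_agg_type) RETURN NUMBER IS
  BEGIN
    self.total := self.total || ctx2.total;
    RETURN ODCIConst.Success;
  END;
END;
/
CREATE OR REPLACE FUNCTION string_agg(input VARCHAR2) RETURN VARCHAR2
  PARALLEL_ENABLE AGGREGATE USING string_agg_type;
/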
If you don't want to create a function, however, you can use the ROW_NUMBER and SYS_CONNECT_BY_PATH approach, though that tends to get a bit harder to follow:
with x as (
SELECT DISTINCT range AS interval
FROM table_name
WHERE age = 62 )
select ltrim( max( sys_connect_by_path(interval, ','))
keep (dense_rank last order by curr),
',') range
from (select interval,
row_number() over (order by interval) as curr,
row_number() over (order by interval) -1 as prev
from x)
connect by prev = PRIOR curr
start with curr = 1
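With the sample data, this query produces the same result: 10-20,21-30,31-40.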