In Cassandra, is there any way to define a minimum value for a counter column? Let's say that when the counter reaches 0, it should not go below that even if I run a decrement operation.
There isn't. Your counter is initialized with a value of 0; you can increment it, decrement it, and query its value. It is a 64-bit signed integer (between -2^63 and 2^63 - 1). Addition / subtraction will overflow when you hit the min / max values.
If you try to handle the logic in your application, it would be easy if you only had one application writing, but you probably have more than one. It would be doable if your applications are on the same system, where they can share a lock, but I am guessing that's not the case, and the performance would drop anyway. In a distributed environment, you would need a distributed lock, and the performance would suffer.
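To illustrate the simplest case mentioned above (a single process that can use a local lock), here is a minimal Python sketch. The class name FlooredCounter and its add method are made up for illustration; nothing here talks to Cassandra, and it does not help once several processes or machines write.

```python
import threading


class FlooredCounter:
    """Hypothetical in-process counter that never drops below a floor."""

    def __init__(self, floor=0):
        self._floor = floor
        self._value = floor
        self._lock = threading.Lock()

    def add(self, delta):
        # The lock makes the read-modify-write atomic across threads
        # of this one process only.
        with self._lock:
            self._value = max(self._floor, self._value + delta)
            return self._value


counter = FlooredCounter()
counter.add(10)         # value is now 10
counter.add(-15)        # clamped to 0 instead of going to -5
print(counter.add(1))   # 1
```

Every call has to wait for the lock, which is the performance cost mentioned above.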
If you really want to achieve this functionality with Cassandra, you can emulate it with the following strategy:
1. Table definition
CREATE TABLE test.counter (
    my_key tinyint,
    my_random uuid,
    my_operation int,
    my_non_key tinyint,
    PRIMARY KEY ((my_key), my_operation, my_random)
);
This table will be used to keep track of the increment / decrement operations you run. A few notes:
The partition key my_key will always be used with the same value: 0. It is used to colocate all the operations (increments / decrements) in the same partition.
The my_random value must be a random value generated without any chance of collision. A uuid can be used for that. Without this column, executing the same operation twice (such as increment by 10) would only be stored once. Each operation will have its own uuid.
my_operation keeps track of the increment / decrement value you execute.
my_non_key is a dummy column that we are going to use to query the write timestamp, as we cannot query it on the primary key columns. We will always set my_non_key to 0.
2. Counter initialization
You can initialize your counter, say to zero, with:
INSERT INTO test.counter (my_key , my_random, my_operation, my_non_key ) VALUES
( 0, 419ec9cc-ef53-4767-942e-7f0bf9c63a9d, 0, 0);
3. Counter increment
Let's say you add some number, such as 10. You would do so by inserting with the same partition key 0, a new random uuid, and a value of 10:
INSERT INTO test.counter (my_key , my_random, my_operation, my_non_key) VALUES
( 0, d2c68d2a-9e40-486b-bb69-42a0c1d0c506, 10, 0);
4. Counter decrement
Let's say you subtract 15 now:
INSERT INTO test.counter (my_key , my_random, my_operation, my_non_key ) VALUES
( 0, e7a5c52c-e1af-408f-960e-e98c48504dac, -15, 0);
5. Counter increment
Let's say you add 1 now:
INSERT INTO test.counter (my_key , my_random, my_operation, my_non_key ) VALUES
( 0, 980554e6-5918-4c8d-b935-dde74e02109b, 1, 0);
6. Counter query
Now, let's say you want to query your counter. You would need to run:
SELECT my_operation, writetime(my_non_key), my_random FROM test.counter WHERE my_key = 0;
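which will return the four operations (0, 10, -15 and 1), each with the timestamp at which it was written. Note that the rows come back in clustering order (sorted by my_operation), not in insertion order, so your application has to sort them by write timestamp. It then has all the information needed to calculate the correct value, since it knows in which order the increment / decrement operations occurred. This ordering matters when a decrement would push the counter below zero. In this case, your application should be able to calculate that the right value is 1: 0 + 10 = 10, then 10 - 15 is clamped to 0, then 0 + 1 = 1.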
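The application-side calculation could look roughly like the Python sketch below. The function name floored_total and the timestamp values are invented for illustration, and it assumes the write timestamps are distinct so that the replay order is unambiguous.

```python
def floored_total(rows, floor=0):
    """Replay (operation, write_timestamp) pairs in write order, clamping at floor."""
    total = floor
    for operation, _timestamp in sorted(rows, key=lambda row: row[1]):
        total = max(floor, total + operation)
    return total


# Hypothetical query result: (my_operation, writetime(my_non_key)) pairs,
# listed in clustering order rather than in write order.
rows = [(-15, 1003), (0, 1001), (1, 1004), (10, 1002)]
print(floored_total(rows))  # 1
```

Summing the operations without the clamp would give -4, which is why the order of the writes has to be known.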
7. Cleaning up
At regular intervals, or when you query the counter, you could combine values together and delete the old ones in a batch statement to ensure atomicity, for example:
BEGIN BATCH
    DELETE FROM test.counter WHERE my_key = 0 AND my_operation = -15 AND my_random = e7a5c52c-e1af-408f-960e-e98c48504dac;
    DELETE FROM test.counter WHERE my_key = 0 AND my_operation = 0 AND my_random = 419ec9cc-ef53-4767-942e-7f0bf9c63a9d;
    DELETE FROM test.counter WHERE my_key = 0 AND my_operation = 10 AND my_random = d2c68d2a-9e40-486b-bb69-42a0c1d0c506;
    DELETE FROM test.counter WHERE my_key = 0 AND my_operation = 1 AND my_random = 980554e6-5918-4c8d-b935-dde74e02109b;
    INSERT INTO test.counter (my_key, my_random, my_operation, my_non_key) VALUES (0, ca67df54-62c7-4d31-a79c-a0011439b486, 1, 0);
APPLY BATCH;
Final notes
Performance-wise, this should be acceptable, as writes are cheap, and reads are done on a single partition.
Cassandra is an eventually consistent DB. This means that this counter is also eventually consistent. If you need strong consistency, you will need to tune your read/write consistency levels accordingly:
https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlConfigConsistency.html
Related
Given a dictionary of keys and values, I want to sum the values based on the keys. For example, with {1:10, 2:20, 3:30, 4:40, 5:50, 6:60}, I want to sum the values only for keys equal to or greater than 2, which should output 200.
x =2
count = 0
for key, value in dictionary.items():
    while key == x:
        count += 1[value]
My output is None, and I don't know what I am missing.
Try this. Your way of iterating over the dictionary items is correct, but inside the loop you need to check whether the current key is greater than or equal to your required key. Only then should you add to the count the value corresponding to that key, which can be retrieved as dictionary[key]; or you can simply add the value directly with count += value.
dictionary = {1:10, 2:20, 3:30, 4:40, 5:50, 6:60}
x = 2
count = 0
for key, value in dictionary.items():
    if key >= x:
        count += dictionary[key]
print(count)
Your code is incomplete and won't run as-is, so it's difficult to speculate why you're getting an output of None.
In your requirements you mention "equal or greater than 2", but your code has "key == x". This should be "key >= x".
Inside your for loop you have a while. With the other issues fixed, this would result in an infinite loop. You want an if, not a while.
Fixing those things and making an assumption or two, your code would be:
x = 2
count = 0
for key, value in dictionary.items():
    if key >= x:
        count += value
Alternatively, you could write it in a single line of code:
sum(v for k, v in dictionary.items() if k >= x)
I believe you just need to do the following (n being the threshold key, 2 in your example):
count = 0
for key, value in dictionary.items():
    if key >= n:
        count += value
I'm extremely new to programming in general and have only been learning Python for 1 week.
For a class, I have to analyze a text DNA sequence, something like this:
CTAGATAGATAGATAGATAGATGACTA
for these specific keys: AGAT,AATG,TATC
I have to keep track of the largest number of consecutive repetitions for each, disregarding all but the highest number of repetitions.
I've been poring over previous Stack Overflow answers and I saw groupby() suggested as a way to do this. I'm not exactly sure how to use groupby for my specific implementation needs though.
It seems like I will have to read the text sequence from a file into a list. Can I import what is essentially a text string into a list? Do I have to separate all of the characters by commas? Will groupby work on a string?
It also looks like groupby would give me the highest incidence of consecutive repetitions, but in the form of a list. How would I get the highest result out of that list to then be stored somewhere else, without me, the programmer, having to look at the result? Will groupby return the highest number of consecutive repeats first in the list? Or will it be placed in the order in which it occurred in the list?
Is there a function I can use to isolate and return the sequence with the highest repetition incidence, so that I can compare that with the dictionary file I've been provided with?
Frankly, I really could use some help breaking down the groupby function in general.
My assignment recommended possibly using a slice to accomplish this, and that seemed somehow more daunting to try, but if that's the way to go, please let me know, and I wouldn't turn down a nudge in the right direction on how in the heck to do that.
Thank you in advance for any and all wisdom on this.
Here's a solution similar to the previous post's, but it may be more readable.
# The DNA Sequence
DNA = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
# All Sequences of Interest
elements = {"AGAT", "AATG", "TATC"}
# Add Elements to A Dictionary
maxSeq = {}
for element in elements:
    maxSeq[element] = 0
# Find Max Sequence for Each Element
for element in elements:
    i = 0
    curCount = 0
    # Ensure DNA Length Not Reached
    while i+4 <= len(DNA):
        # Sequence Not Being Tracked
        if curCount == 0:
            # Sequence Found
            if DNA[i: i + 4] == element:
                curCount = 1
                i += 4
            # Sequence Not Found
            else: i += 1
        # Sequence Is Being Tracked
        else:
            # Sequence Found
            if DNA[i: i + 4] == element:
                curCount += 1
                i += 4
            # Sequence Not Found
            else:
                # Check If Previous Max Was Beat
                if curCount > maxSeq[element]:
                    maxSeq[element] = curCount
                # Reset Count
                curCount = 0
                i += 1
    #Check If Sequence Was Being Tracked At End
    if curCount > maxSeq[element]: maxSeq[element] = curCount
#Display
print(maxSeq)
Output:
{'AGAT': 5, 'TATC': 0, 'AATG': 0}
This doesn't seem like a groupby problem since you want multiple groups of the same key. It would be easier to just scan the list for key counts.
# all keys (keys are four chars each)
seq = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
# split key string into list of keys: ["CTAG","ATAG","ATAG","ATAG", ....]
lst = [seq[i:i+4] for i in (range(0,len(seq),4))]
lst.append('X') # the while loop only tallies when next key found, so add fake end key
# these are the keys we care about and want to store the max consecutive counts
dicMax = { 'AGAT':0, 'AATG':0, 'TATC':0, 'ATAG':0 } #dictionary of keys and max consecutive key count
# the while loop starts at the 2nd entry, so set variables based on first entry
cnt = 1
key = lst[0] #first key in list
if (key in dicMax): dicMax[key] = 1 #store first key in case it's the max for this key
ctr = 1 # start at second entry in key list (we always compare to previous entry so can't start at 0)
while ctr < len(lst): #all keys in list
    if (lst[ctr] != lst[ctr-1]): #if this key is different from previous key in list
        if (key in dicMax and cnt > dicMax[key]): #if we care about this key and current count is larger than stored count
            dicMax[key] = cnt #store current count as max count for this key
        #set variables for next key in list
        cnt = 0
        key = lst[ctr]
    ctr += 1 #list counter
    cnt += 1 #counter for current key
print(dicMax) # max consecutive count for each key
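Since the question asked how groupby behaves, here is a rough sketch of the same fixed-chunk tally written with itertools.groupby. The function name max_chunk_runs and the short test sequence are made up for illustration. groupby works on any iterable (a string or a list), and yields one group per stretch of equal adjacent items, in the order the stretches occur, so the same key can show up in several groups and you still have to take the maximum yourself.

```python
from itertools import groupby


def max_chunk_runs(seq, keys, size=4):
    """Longest run of identical consecutive fixed-size chunks, per key of interest."""
    chunks = [seq[i:i + size] for i in range(0, len(seq), size)]
    longest = dict.fromkeys(keys, 0)
    # groupby yields one (value, run) pair per stretch of equal adjacent items.
    for chunk, run in groupby(chunks):
        if chunk in longest:
            longest[chunk] = max(longest[chunk], sum(1 for _ in run))
    return longest


# Made-up sequence: its chunks are AGAT, AGAT, CTAG, AGAT.
print(max_chunk_runs("AGATAGATCTAGAGAT", ["AGAT", "AATG"]))
# {'AGAT': 2, 'AATG': 0}
```

Like the loop above, this only sees repeats that line up with the 4-character chunk boundaries.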
Raiyan Chowdhury suggested that the sequences may overlap, so dividing the base sequence into four character strings may not work. In this case, we need to search for each string individually.
Note that this algorithm is not efficient, but readable to a new programmer.
seq = "CTAGATAGATAGATAGATAGATGACTAGCTAGATAGATAGATAGATAGATGACTAGAGATAGATAGATCTAG"
dicMax = { 'AGAT':0, 'AATG':0, 'TATC':0, 'ATAG':0 } #dictionary of keys and max consecutive key count
for key in dicMax: #each key, could divide and conquer here so all keys run at same time
    for ctr in range(1,9999): #keep adding key to itself ABC > ABCABC > ABCABCABC
        s = key * ctr #create string by repeating key "ABC" * 2 = "ABCABC"
        if (s in seq): # if repeated key found in full sequence
            dicMax[key]=ctr # set max (repeat) count for this key
        else:
            break # exit inner for #done with this key
print(dicMax) #max consecutive key counts
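For comparison, the same "longest repeated run anywhere in the sequence" idea can be sketched with the re module instead of repeated substring tests. This is only one possible variant; the function name max_repeats and the short test sequence are invented for illustration.

```python
import re


def max_repeats(seq, key):
    """Largest n such that key * n occurs somewhere in seq (0 if key is absent)."""
    # The lookahead lets a run start at every position, so a run that
    # overlaps an earlier match is still seen.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(key))
    runs = (len(match.group(1)) // len(key) for match in pattern.finditer(seq))
    return max(runs, default=0)


# Made-up sequence: CT + AGAT * 3 + GACTAG
print(max_repeats("CTAGATAGATAGATGACTAG", "AGAT"))  # 3
```

It needs no upper bound such as 9999 on the repeat count, but it is arguably less readable for a new programmer.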
How can I find the minimum number of elements taken from a list (the same element may be used more than once) that sum to a given number N?
For example, if list = [1,3,7,4] and N = 14, the function should return 2, as 7+7 = 14.
Again, if N = 11, the function should return 2, as 7+4 = 11. I think I have figured out the algorithm but I am unable to implement it in code.
Please use Python, as that is the only language I understand (at present).
Sorry!!!
Since you mention dynamic programming in your question, and you say that you have figured out the algorithm, I will just include an implementation of the basic tabular method written in Python without too much theory.
The idea is to have a tabular structure that we use to compute all the values we need without having to do the same computations many times.
For every target value from 1 up to the requested one, the basic formula tries each number in the list and keeps the smallest count of addends that reaches that target.
It should work, but you can of course make some optimizations, like ordering the list and/or looking for common divisors in order to construct a smaller table and terminate faster.
Here is the code:
import sys
# num_list : list of numbers
# value: value for which we want to get the minimum number of addends
def min_sum(num_list, value):
    list_len = len(num_list)
    # We will use the typical dynamic programming table construct
    # the key of the list will be the sum value we want,
    # and the value will be the
    # minimum number of items to sum
    # Base case value = 0, first element of the list is zero
    value_table = [0]
    # Initialize all table values to MAX
    # for range i use value+1 because python range doesn't include the end
    # number
    for i in range(1, value+1):
        value_table.append(sys.maxsize)
    # try every combination that is smaller than <value>
    for i in range(1, value+1):
        for j in range(0, list_len):
            if (num_list[j] <= i):
                tmp = value_table[i-num_list[j]]
                if ((tmp != sys.maxsize) and (tmp + 1 < value_table[i])):
                    value_table[i] = tmp + 1
    return value_table[value]
## TEST ##
num_list = [1,3,16,5,3]
value = 22
print("Min Sum: ",min_sum(num_list,value)) # Outputs 3
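If you also want to know which numbers make up the minimum, one possible extension (a sketch, not part of the code above) is to remember the last addend chosen for each target and walk back from the end. The name min_addends is made up, and the sketch assumes the list contains only positive integers.

```python
def min_addends(num_list, value):
    """Return one shortest list of addends from num_list summing to value, or None."""
    INF = float("inf")
    best = [0] + [INF] * value      # best[t] = fewest addends that sum to t
    last = [None] * (value + 1)     # last[t] = final addend used to reach t
    for target in range(1, value + 1):
        for n in num_list:
            if n <= target and best[target - n] + 1 < best[target]:
                best[target] = best[target - n] + 1
                last[target] = n
    if best[value] == INF:
        return None                 # value cannot be reached at all
    picked = []
    while value > 0:                # walk back through the recorded choices
        picked.append(last[value])
        value -= last[value]
    return picked


print(min_addends([1, 3, 7, 4], 14))  # [7, 7]
print(min_addends([1, 3, 7, 4], 11))  # [7, 4]
```

The length of the returned list is the number the question asks for (2 in both examples).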
It would be helpful if you included your algorithm in pseudocode - it will look very much like Python :-)
Another aspect: your first operation is a multiplication of one item from the list (7) with one from outside the list (2), whereas the second operation is 7+4 - both values in the list.
Is there a limitation for which operation or which items to use (from within or without the list)?
I'm trying to create a counter from 1 to n in an Arango query. Basically, I need to group objects by a number in that range, and return the count of the objects in each group. I had hoped creating a for loop would work, but it doesn't seem to be incrementing. I'm unsure about the syntax.
The two loops I've tried are:
FOR count IN [0,1,2,3]
and
FOR count IN 0..12
Both of those are correct syntax for creating a counter loop.
The reason it didn't seem to work for me was that I had put the RETURN statement containing the count variable in the sub-loop. It works when it is in the outer loop, like so:
FOR count IN 0..12
    LET total = COUNT (
        FOR v, e IN OUTBOUND "some-object" GRAPH "some-graph"
            RETURN 1
    )
    RETURN {total: total, count: count}
How can I make a request of this type in Cassandra?
UPDATE my_table SET my_column1 = MAX(my_column1, 100) and my_column2 = my_column2 + 10;
The max() function does not exist. Can this be done using Apache Spark?
Thanks!
MAX is idempotent, and it seems simple to do in this case. The problem is that C* is a general-purpose database and needs to handle some edge cases. Deletes and TTLs are a particular issue: as old data goes away, the max still needs to be maintained.
There are a couple of ways to do this: either keep a value that you update atomically on inserts, or keep all the inserted values around in order, so that as values are deleted or expire via TTL, the older ones are still there to take their place (at the obvious disk cost).
CREATE TABLE my_table_max (
    key text,
    max int static,
    deletableMax int,
    PRIMARY KEY (key, deletableMax)
) WITH CLUSTERING ORDER BY (deletableMax DESC);
Then atomically update your max, or for the deletable implementation insert the new value:
BEGIN BATCH
    INSERT INTO my_table_max (key, max) VALUES ('test', 1) IF NOT EXISTS;
    INSERT INTO my_table_max (key, deletableMax) VALUES ('test', 1);
APPLY BATCH;
BEGIN BATCH
    UPDATE my_table_max SET max = 5 WHERE key='test' IF max = 1;
    INSERT INTO my_table_max (key, deletableMax) VALUES ('test', 5);
APPLY BATCH;
Then just querying the top row gives you the max:
select * from my_table_max limit 1;
 key  | deletableMax | max
------+--------------+-----
 test |            5 |   5
The difference between these two would be seen after a delete:
delete from my_table_max WHERE key = 'test' and deletablemax = 5;
cqlsh:test_ks> select * from my_table_max limit 1;
 key  | deletablemax | max
------+--------------+-----
 test |            1 |   5
Since it keeps track of all the values in order, the older value is kept.
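To make the difference between the two columns concrete, here is a small in-memory Python model of a single partition. It is only an illustration of the behaviour shown above, not Cassandra code; the class MaxTracker and its method names are made up.

```python
import bisect


class MaxTracker:
    """Hypothetical in-memory stand-in for one partition of my_table_max."""

    def __init__(self):
        self.static_max = None   # models the static `max` column
        self.values = []         # models the deletableMax rows, kept sorted

    def insert(self, value):
        # The static max only ever moves up, like the conditional UPDATE.
        if self.static_max is None or value > self.static_max:
            self.static_max = value
        # Re-inserting an existing clustering value is an upsert, not a new row.
        if value not in self.values:
            bisect.insort(self.values, value)

    def delete(self, value):
        self.values.remove(value)

    def deletable_max(self):
        return self.values[-1] if self.values else None


tracker = MaxTracker()
tracker.insert(1)
tracker.insert(5)
tracker.delete(5)
print(tracker.static_max, tracker.deletable_max())  # 5 1
```

The static value remembers the highest value ever written, while the sorted values fall back to the highest value that still exists.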