Lock CustomRecord Serials Table - netsuite

We have a Fulfillment script in 1.0 that pulls a serial number from the custom record based on SKU and other parameters. A search is created based on SKU, and the first available record is used. One of the search criteria is that there is no end user associated with the key.
We are working on converting the script to 2.0. What I am unable to figure out is: if the script (say the above functionality is put into the map function of an MR script) runs on multiple queues/instances, does that mean there is a potential chance that two instances might hit the same entry of the custom record? What is a workaround to ensure that X instances of the map function don't end up using the same SN/key? The way this could happen in 2.0 is that two instances of the map function make a search request on the custom record at the same time and get the same result, since the first map has not yet completed processing and marked the key as used (by updating the end user information on the key).
Is there a better way to accomplish this in 2.0, or do I need to go about creating another custom record that the script would have to read in order to pull a key off of? Also, is there a wait I can implement if the table is locked?
Thx

Probably the best thing to do here would be to break your assignment process into two parts, or restructure it so you end up with a Scheduled script that you give an explicit queue. That way your access to serial numbers will be serialized and no extra work needs to be done by you. If you need a hint on processing large batches with SS2, see https://github.com/BKnights/KotN-Netsuite-2 for a utility script that you can require for large batch processing.
If that's not possible then what I have done is the following:
Create another custom record called "Lock Table". It must have at least an id and a text field. Create one record and note its internal id. If you leave it with a name column then give it a name that reflects its purpose.
When you want to pull a serial number you do the following (a SuiteScript 2.0 sketch of these steps follows below):
Read from the Lock Table with a lookup field function. If the value is not 0 then do a wait*.
If it is 0 then generate a random integer from 0 to MAX_SAFE_INTEGER.
Try to write that to the Lock Table with a submit field function, then read it back right away. If it contains your random number then you have the lock. If it doesn't then wait*.
If you have the lock then go ahead and assign the serial number. Release the lock by writing back a 0.
*wait:
this is tough in NetSuite. Since I am not expecting the s/n assignment to take much time, I've sometimes implemented a wait by simply looping through what I hope is a CPU-intensive task that has no governance cost until some time has elapsed.
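Here's a rough sketch of that acquire/wait/release cycle as a SuiteScript 2.0 module. The record type id (customrecord_lock_table), the field id (custrecord_lock_value), the internal id of the single lock record, and the retry bounds are all hypothetical placeholders, not NetSuite-defined names:

define(['N/search', 'N/record'], function (search, record) {

    var LOCK_TYPE = 'customrecord_lock_table';  // hypothetical record type id
    var LOCK_ID = 1;                            // internal id of the single lock record
    var LOCK_FIELD = 'custrecord_lock_value';   // hypothetical text field; '0' means free
    var MAX_SAFE = 9007199254740991;            // Number.MAX_SAFE_INTEGER as a literal (SS2.0's runtime is ES5)

    // Crude wait: burn wall-clock time with pure JS work, which costs no governance units.
    function spinWait(ms) {
        var until = Date.now() + ms;
        var x = 0;
        while (Date.now() < until) { x = (x + 1) % 1000003; }
    }

    function acquireLock(maxTries) {
        for (var i = 0; i < maxTries; i++) {
            var current = search.lookupFields({ type: LOCK_TYPE, id: LOCK_ID, columns: [LOCK_FIELD] })[LOCK_FIELD];
            if (current === '0') {
                var token = String(Math.floor(Math.random() * MAX_SAFE));
                var values = {};
                values[LOCK_FIELD] = token;
                record.submitFields({ type: LOCK_TYPE, id: LOCK_ID, values: values });
                // Read back right away: if we still see our token, we hold the lock.
                var check = search.lookupFields({ type: LOCK_TYPE, id: LOCK_ID, columns: [LOCK_FIELD] })[LOCK_FIELD];
                if (check === token) { return true; }
            }
            spinWait(500);
        }
        return false;
    }

    function releaseLock() {
        var values = {};
        values[LOCK_FIELD] = '0';
        record.submitFields({ type: LOCK_TYPE, id: LOCK_ID, values: values });
    }

    return { acquireLock: acquireLock, releaseLock: releaseLock };
});

The read-back after the write is what makes this work: submitFields alone can't tell you whether a concurrent map instance overwrote the field a moment later, so verifying your own token is the actual lock test. It's still best-effort rather than a true mutex, which is why the Scheduled-script-with-one-queue route is preferable when you can take it.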

Related

Cassandra counter usage

I am having some difficulty with the data modeling of an application which may involve the use of counters.
The app is basically a messaging app. Messages are capped for free users, hence the initial plan of using a counter column to keep track of the total count.
I've discovered that batches (logged or not) cannot contain operations on both standard tables and counter ones. How do I ensure correctness if I cannot batch the operation I am trying to perform together with the counter update? Is the counter type really needed if there's basically no race condition on the column, given that it is associated with each individual user?
My second idea would be to use a standard int column and update it only inside batches. Is this a viable option?
Thank you
If you can absolutely guarantee that each user will produce only one update at a time then you could rely on plain ints to perform the job.
The problem, however, is that you will need to perform a read-before-write anti-pattern. You could work around this as well, e.g. by skipping the read part: cache your ints and perform in-memory updates followed by writes only. This is viable by coupling your system with a caching server (e.g. Redis); see the sketch at the end of this answer.
And thinking about it, you will still need to read these counters at some point: if the number of messages a free user can send is bound to some value, then you need to perform a check when they log in / try to send a new message / look at the dashboard / etc., and block their action.
Another option (if you store the messages sent by each user somewhere and don't want to add complexity to your system) could be to directly count them with a SELECT COUNT(*) type query, even if this can become pretty inefficient very quickly in the Cassandra world.
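A minimal sketch of that Redis-backed variant, using the Node.js cassandra-driver and ioredis packages; the keyspace, table, and key names are made up for illustration. Redis's atomic INCR replaces the read-before-write, and the plain int column is only ever written on this path:

import { Client } from 'cassandra-driver';
import Redis from 'ioredis';

const cassandra = new Client({ contactPoints: ['127.0.0.1'], localDataCenter: 'dc1', keyspace: 'app' });
const redis = new Redis();

// Hypothetical schema: CREATE TABLE app.users (user_id text PRIMARY KEY, msg_count int);
async function recordMessage(userId: string, limit: number): Promise<boolean> {
  // Atomic in-memory increment: no read-before-write against Cassandra.
  const count = await redis.incr(`msg_count:${userId}`);
  if (count > limit) {
    await redis.decr(`msg_count:${userId}`); // roll back: the user is over quota
    return false;
  }
  // Write-only update of a plain int column; unlike a counter column,
  // this statement could also go into a batch together with the message insert.
  await cassandra.execute(
    'UPDATE users SET msg_count = ? WHERE user_id = ?',
    [count, userId],
    { prepare: true }
  );
  return true;
}

On a cold cache you would seed the Redis key from Cassandra once, which is the one read you cannot avoid.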

Does CQL3 "IF" make my update not idempotent?

It seems to me that using IF would make the statement possibly fail if re-tried. Therefore, the statement is not idempotent. For instance, given the CQL below, if it fails because of a timeout or system problem and I retry it, then it may not work because another person may have updated the version between retries.
UPDATE users
SET name = 'foo', version = 4
WHERE userid = 1
IF version = 3
Best practices for updates in Cassandra are to make updates idempotent, yet the IF operator is in direct opposition to this. Am I missing something?
If your application is idempotent, then generally you wouldn't need to use the expensive IF clause, since all your clients would be trying to set the same value.
For example, suppose your clients were aggregating some values and writing the result to a roll up table. Each client would calculate the same total and write the same value, so it wouldn't matter if multiple clients wrote to it, or what order they wrote to it, since it would be the same value.
If what you are actually looking for is mutual exclusion, such as keeping a bank balance, then the IF clause could be used. You might read a row to get the current balance, then subtract some money and update the balance only if the balance hadn't changed since you read it. If another client was trying to add a deposit at the same time, then it would fail and would have to try again.
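A sketch of that balance-style compare-and-set loop with the Node.js cassandra-driver (the accounts schema and names are made up); the LWT outcome comes back in a pseudo-column literally named [applied]:

import { Client } from 'cassandra-driver';

const client = new Client({ contactPoints: ['127.0.0.1'], localDataCenter: 'dc1', keyspace: 'bank' });

// Hypothetical schema: CREATE TABLE accounts (account_id text PRIMARY KEY, balance double);
async function withdraw(accountId: string, amount: number): Promise<boolean> {
  for (let attempt = 0; attempt < 5; attempt++) {
    // 1. Read the current balance.
    const rs = await client.execute(
      'SELECT balance FROM accounts WHERE account_id = ?', [accountId], { prepare: true });
    const balance: number = rs.rows[0]['balance'];
    if (balance < amount) return false; // insufficient funds
    // 2. CAS update: only applies if nobody changed the balance since our read.
    const cas = await client.execute(
      'UPDATE accounts SET balance = ? WHERE account_id = ? IF balance = ?',
      [balance - amount, accountId, balance],
      { prepare: true }
    );
    if (cas.rows[0]['[applied]'] === true) return true;
    // Lost the race: another client changed the balance; loop and retry.
  }
  return false;
}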
But another way to do that without mutual exclusion is to write each withdrawal and deposit as a separate clustered transaction row, and then calculate the balance as an idempotent result of applying all the transaction rows.
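And a sketch of that ledger alternative (again with made-up names, reusing the client above). Re-running an insert with the same primary key just rewrites the identical row, so retries are harmless and no IF is needed:

// Hypothetical schema:
// CREATE TABLE ledger (account_id text, tx_id timeuuid, amount double,
//                      PRIMARY KEY (account_id, tx_id));
async function recordTransaction(accountId: string, txId: string, amount: number): Promise<void> {
  // Same tx_id on retry -> same row: the write is idempotent.
  await client.execute(
    'INSERT INTO ledger (account_id, tx_id, amount) VALUES (?, ?, ?)',
    [accountId, txId, amount],
    { prepare: true }
  );
}

// The balance is derived by folding over the account's partition: no locking, no CAS.
async function balance(accountId: string): Promise<number> {
  const rs = await client.execute(
    'SELECT amount FROM ledger WHERE account_id = ?', [accountId], { prepare: true });
  return rs.rows.reduce((sum, row) => sum + row['amount'], 0);
}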
You can use the IF clause for idempotent writes, but it seems pointless. The first client to do the write would succeed and Cassandra would return the value "applied=True". And the next client to try the same write would get back "applied=False, version=4", indicating that the row had already been updated to version 4 so nothing was changed.
This question is more about linearizability (ordering) than idempotency, I think. This query uses Paxos to try to determine the state of the system before applying a change. If the state of the system is identical then the query can be retried many times without a change in the results. This provides a weak form of ordering (and is expensive), unlike most Cassandra writes. Generally you should only use CAS operations if you are attempting to record the state of a system (rather than a history or log).
Do not use many of these queries if you can help it; the guidelines suggest having only a small percentage of your queries rely on this behavior.

Partition data mid-job on Spring Batch

I want to create a job in Spring Batch which should consist of two steps:
Step 1 - The first step reads certain transactions from the database and produces a list of record Ids that will be sent to step 2 via a jobContext attribute.
Step 2 - This should be a partition step: The slave steps should be partitioned based on the list obtained from step 1 (each thread gets a different Id from the list) and perform their read/process/write operations without interfering with each other.
My problem is that even though I want to partition data based on the list produced by step 1, Spring configures step 2 (and thus calls the partitioner's partition() method) before step 1 even starts, so I cannot inject the partitioning criteria in time. I tried using @StepScope on the partitioner bean, but it still attempts to create the partitions before the job starts.
Is there a way to dynamically create the step partitions during runtime, or an alternative way to divide a step into threads based on the list provided by step 1?
Some background:
I am working on a batch job using Spring Batch which has to process Transactions stored in a database. Every Transaction is tied to an Account (in a different table), which has an accountBalance that also needs to be updated whenever the transaction is processed.
Since I want to perform these operations using multi-threading, I thought a good way to avoid collisions would be to group transactions based on their accountId, and have each thread process only the transactions that belong to that specific accountId. This way, no two threads will attempt to modify the same Account at the same time, as their Transactions will always belong to different Accounts.
However, I cannot know which accountIds need to be processed until I get the list of transactions to process and extract the list from there, so I need to be able to provide the list to partition on during runtime. That's why I thought I could generate that list in a previous step, and then have the next step partition and process the data accordingly.
Is the approach I am taking plausible with this setup? Or should I just look for a different solution?
I couldn't find a way to partition the data mid-job like I wanted, so I had to use this workaround:
Instead of dividing the job in two steps, I moved the logic from step 1 (the "setup step") into a service method that returns the list of transactions to process, and added a call to that method inside the partition() method in my partitioner, allowing me to create the partitions based on the returned list.
This achieves the same result in my case, although I'm still interested in knowing if it is possible to configure the partitions mid-job, since this solution would not work if I had to perform more complex processing or writing in the setup step and wanted to configure exception handling policies and such. It probably would not work either if the setup step was placed in the middle of a step chain instead of at the start.

Hector Cassandra Data Retrieval

Is there any way to get all the data from a column family or from a keyspace?
I can't think of a way of doing this without knowing every single key for every single entry made to the database.
My problem is that I'm trying to create a Twitter clone where each message has its own id, and those are all stored in the same column family in the same keyspace.
But then how do I get them back? I'll have to keep track of every single id, and that can't possibly work.
Any help/ideas would be appreciated.
You can retrieve all data from a column family using get_range_slices, setting the range start and end to the same value to indicate that you want all data.
See the Cassandra FAQ
See http://aquiles.codeplex.com/discussions/278245 for a Thrift example.
Haven't yet found a handy Hector example but I think it uses RangeSlicesQuery...
However, it's not clear why you want to do this - for this sort of application you would normally look up the messages by ID, and use an index to determine which IDs you need - for example, a row for each user that lists all their messages. In the messages column family you might have something like:
MsgID0001 -> time: 1234567, text: Hello world
MsgID0300 -> time: 3456789, text: LOL ROTFL
And then in a "user2msg" column family, store the message IDs, perhaps using timestamp column names so the messages are kept sorted in time order:
UserID001 -> 1234567: MsgID0001, 3456789: MsgID0300
This can then be used to look up a particular user's messages, possibly filtered by time.
You'd then also need further column families to store user profiles etc.
Perhaps you need to add more detail to your question?
Update in response to comment: Yes, if you have one message per row, you have to retrieve each message individually. But what is your alternative? Retrieving all messages is only useful for doing batch processing of messages, not for (for example) showing a user their recent messages. Bear in mind that retrieving all messages could take a very long time - you have not explained why you want to retrieve all messages and what you are going to do with them all. How many messages are you expecting to have?
One possibility is to denormalise, i.e. in the row for each user, store the entire message content, so you don't have to do a separate lookup step for each message. This doubles the amount of storage required, however.
The answer I was looking for is CQL, Cassandra's query language. It works similarly to SQL, which is what I need for the function I'm after.
This link has some excellent tutorials.
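Since the thread ended at CQL: here is a minimal sketch with the Node.js cassandra-driver (keyspace, table, and column names are made up) showing the SQL-like retrieval the question was after, with messages clustered per user so "recent N" is a single slice query:

import { Client } from 'cassandra-driver';

const client = new Client({ contactPoints: ['127.0.0.1'], localDataCenter: 'dc1', keyspace: 'twitter_clone' });

// Hypothetical schema: CREATE TABLE messages (user_id text, msg_time timestamp, body text,
//                                             PRIMARY KEY (user_id, msg_time));
async function recentMessages(userId: string, limit: number) {
  // The clustering key keeps each user's messages sorted by time on disk.
  const result = await client.execute(
    'SELECT msg_time, body FROM messages WHERE user_id = ? ORDER BY msg_time DESC LIMIT ?',
    [userId, limit],
    { prepare: true }
  );
  return result.rows;
}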

Strategies for checking inactivity on Azure

I have a table in Azure Table Storage, with rows that are regularly updated by various processes. I want to efficiently monitor when rows haven't been updated within a specific time period, and to cause alerts to be generated if that occurs.
Most task scheduler implementations I've seen for Azure work by making sure only one worker will perform a given job at a time. However, setting up a scheduled task that waits n minutes, and then queries the latest time-stamp to determine if action should be taken, seems inefficient since the work won't be spread across workers. It also seems generally inefficient to have to poll so many records.
An example use of this would be to send an email to a user that hasn't logged into a web site in the last 30 days. Assume that the number of users is a "large number" for the purposes of producing an efficient algorithm.
Does anyone have any recommendations for strategies that could be used to check for recent activity without forcing only one worker to do the job?
Keep a LastActive table with a timestamp as a rowkey (DateTime.UtcNow.Ticks.ToString("d19")). Update it by doing a batch transaction that deletes the old row and inserts the new row.
Now the query for inactive users is just something like from user in LastActive where user.PartitionKey == string.Empty && user.RowKey < (DateTime.UtcNow - TimeSpan.FromDays(30)).Ticks.ToString("d19") select user. That will be quite efficient for any size table.
Depending on what you're going to do with that information, you might want to then put a message on a queue and then delete the row (so it doesn't get noticed again the next time you check). Multiple workers can now pull those queue messages and take action.
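A minimal sketch of this scheme in TypeScript with the @azure/data-tables package; the table name, the userId property, and the connection-string variable are assumptions, and the tick math mirrors DateTime.UtcNow.Ticks.ToString("d19"):

import { TableClient, TransactionAction, odata } from '@azure/data-tables';

const client = TableClient.fromConnectionString(process.env.STORAGE_CONNECTION_STRING!, 'LastActive');

// .NET ticks are 100 ns units since 0001-01-01; zero-pad to 19 digits like ToString("d19").
const toTicks = (d: Date) =>
  ((BigInt(d.getTime()) + 62135596800000n) * 10000n).toString().padStart(19, '0');

// On user activity: delete the old row and insert the new one in a single batch transaction.
// The caller tracks oldTicks (the user's previous RowKey), e.g. on the user's own record.
// (In practice you might suffix the RowKey with the userId to avoid same-tick collisions.)
async function touch(userId: string, oldTicks?: string): Promise<void> {
  const actions: TransactionAction[] = [];
  if (oldTicks) actions.push(['delete', { partitionKey: '', rowKey: oldTicks }]);
  actions.push(['create', { partitionKey: '', rowKey: toTicks(new Date()), userId }]);
  await client.submitTransaction(actions);
}

// Inactive users are simply every row with a RowKey older than the cutoff.
async function listInactive(days: number) {
  const cutoff = toTicks(new Date(Date.now() - days * 86400000));
  const inactive = [];
  for await (const entity of client.listEntities({
    queryOptions: { filter: odata`PartitionKey eq '' and RowKey lt ${cutoff}` },
  })) {
    inactive.push(entity);
  }
  return inactive;
}

The batch in touch only works because both rows share the same (empty) PartitionKey, which is the same property that makes the inactivity query a single efficient range scan.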
I'm confused about your desire to do this on multiple worker instances... you presumably want to act on an inactive user only once, so you want only one instance to do the check. (The work of sending emails or whatever else you're doing can then be spread about by using a queue, but that initial check should be done by exactly one instance.)
