Test functionality manually by adding the maximum number of records to a SQL table: how should I add that many records? - manual-testing

I have to test functionality manually where, if a background job fails, a record is added to a table with columns such as Exception Id, Job Id, Job Name, Exception, Method Name, and Service Name. At present there are around 25 background jobs in the application, and I want to test the impact of adding the maximum number of records to the table. How should I do that? Do I have to manually fail jobs to add the records (which is practically not possible, as an unlimited number of records can be added to the table)? Is there any way?
I tried manually failing a job so that records would be added to the table; however, adding thousands of records by hand is practically impossible.
How should I add the maximum number of records to the table manually, and how should I test this scenario?

Can you try doing the following:
Create Windows batch jobs covering the 25 background jobs.
Set up Windows Task Scheduler tasks so that the jobs are triggered continuously. You may schedule the executions on the same day or spread them across multiple days.
With this in place you no longer need to run the background jobs manually, as the Task Scheduler will take care of it.
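Alternatively, if the goal is simply to see how the application behaves once the table holds a very large number of records, the rows could be generated directly in SQL instead of forcing real job failures. Below is a minimal sketch assuming SQL Server, an identity column for Exception Id, and illustrative table and column names; adjust it to your actual schema.

    -- Generates 100,000 dummy failure records in one set-based insert.
    -- Exception Id is assumed to be an identity column, so it is not listed here.
    ;WITH n AS (
        SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
        FROM sys.all_objects a CROSS JOIN sys.all_objects b
    )
    INSERT INTO JobExceptionLog (JobId, JobName, [Exception], MethodName, ServiceName)
    SELECT
        (rn % 25) + 1,                           -- spread rows across the 25 jobs
        CONCAT('Job_', (rn % 25) + 1),
        'Simulated failure for load testing',
        'TestMethod',
        'TestService'
    FROM n;

Increase the TOP value, or run the statement repeatedly, to push the table towards whatever you consider its practical maximum.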

Related

How to run repair twice a week using Reaper

I need to schedule repairs to run twice a week on an existing Cassandra cluster. Once I added the first schedule, I could not add a second schedule due to the following error. Is there any way to run the repair twice a week in Reaper?
You cannot create multiple schedules for a table because running multiple repair sessions on the same table is not recommended.
If you want to set up different schedules for different keyspaces/tables, you need to explicitly specify the keyspace and/or table when setting up the schedule.
As a side note, the recommended repair interval is once every gc_grace_seconds. The default GC grace is 10 days (864,000 seconds), so running repairs once a week is sufficient. Cheers!
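If you want to confirm the interval for a particular table before settling on a schedule, you can read gc_grace_seconds from the schema. A sketch in CQL, assuming Cassandra 3.x or later and hypothetical keyspace/table names:

    -- keyspace and table names below are placeholders
    SELECT gc_grace_seconds
    FROM system_schema.tables
    WHERE keyspace_name = 'my_keyspace' AND table_name = 'my_table';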

Is it bad to run cron jobs to poll from a huge table of scheduled job records?

I have a table that a cron job polls every minute to send out messages to other services. The records in the table are essentially activities that are scheduled to run at a certain time. The cron job simply checks which of those activities are ready to run and sends a message for each such activity through SQS to the other services.
When the cron job finds an activity that is ready to run, that record is marked as done after the message is sent through SQS. There is an API which allows other services to check whether a scheduled activity has already been done, so keeping a history of those done records is needed.
My concern, however, is whether a design like this is scalable in the long run. There are around 200k scheduled activities a day, sometimes more. Since I keep the records by marking them as done after they are completed, I'm worried that the table will eventually grow to tens of millions of rows and become a problem for a cron job that runs this frequently.
Even with a properly indexed table, is my concern valid? And if so, how else could I design this if I need to persist the scheduled activities somewhere for a cron job (or something similar) to poll and check when they are ready to run?
I'm using a Postgres database.
As long as the number of rows that the cron job's query has to fetch stays constant and you can use an index, the size of the table won't matter.
Index scans are O(n) with respect to the number of rows scanned and O(log(n)) with respect to the table size. To be more specific, increasing the table size by a factor of between 10 and 200 (a smaller index key leads to better fan-out) will make an index scan read one more block, and that block is normally cached.
If the table gets large, you might still want to consider partitioning, but mostly so that you can get rid of old data efficiently.
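If you do go that route, here is a sketch of what it could look like with declarative range partitioning (PostgreSQL 11 or later; the table and column names are hypothetical):

    -- Partition by the scheduled time so old data can be dropped per partition
    -- instead of DELETEing millions of 'done' rows.
    CREATE TABLE jobs (
        id          bigserial,
        schedule_dt timestamptz NOT NULL,
        status      text NOT NULL DEFAULT 'pending',
        payload     jsonb
    ) PARTITION BY RANGE (schedule_dt);

    CREATE TABLE jobs_2024_06 PARTITION OF jobs
        FOR VALUES FROM ('2024-06-01') TO ('2024-07-01');

    -- Getting rid of a month of history is then a cheap metadata operation:
    DROP TABLE jobs_2024_06;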
With the right index, the cron job should have no serious problem. You can use a partial/filtered index, like
create index on jobs (id) where status <> 'done';
to keep the size of the index small. The query has to match the index's WHERE clause.
I used (id) just because an empty column list is not allowed, so something has to be there. Based on your comment, schedule_dt might be a better choice. If you include all the columns you select, you can get an index-only scan. If you don't, the query will still use the index; it just has to visit the table to fetch the remaining columns for those specific rows. I suspect the index-only-scan attempt won't be worth it for you, as the pages you need probably won't be marked all-visible, since modifications were made to neighboring tuples just a minute ago.
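For example, a sketch with schedule_dt as the indexed column (column names are hypothetical):

    -- Partial index that only covers the rows the cron job still cares about.
    CREATE INDEX jobs_pending_idx ON jobs (schedule_dt) WHERE status <> 'done';

    -- The polling query repeats the index's WHERE clause so the planner can use it.
    SELECT id, schedule_dt
    FROM jobs
    WHERE status <> 'done'
      AND schedule_dt <= now();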
However, it does seem a bit odd to mark a job as done when it has only been dispatched to SQS, rather than actually completed.
There is an API which allows other services to check whether a scheduled activity has already been done.
A table that increases in size without bound is likely to present management problems apart from the cron job. Surely the services aren't going to have to look back months in order to do this, are they? Could you delete 'done' jobs after a few days? What if a service tries to look up a job and rather than finding it 'done', it just doesn't find it at all?
I don't think the cron job is inherently a problem, but it would be cleaner not to have it. Why doesn't whoever inserts the job just invoke SQS in real time?

Run VoltDB stored procedures at regular intervals from VoltDB

Is there any way to execute VoltDB stored procedures at a regular interval, or to schedule a stored procedure to run at a specific time?
I am exploring VoltDB in order to move our product from an RDBMS to VoltDB. Our product is written in Java.
Most of the queries can be migrated into VoltDB stored procedures, but in our product we have a cron job in Oracle which executes at a regular interval, and I cannot find such a feature in VoltDB.
I know VoltDB stored procedures can be called from the application at a regular interval, but our product is deployed in an Active-Active mode, so every application instance would call the stored procedure at that interval, which is not a good solution; otherwise, we would have to develop some mechanism to run the procedure from one instance only.
So it would be good if VoltDB offered a cron-job-like feature.
I work at VoltDB. There isn't currently a feature like this in VoltDB, comparable to DBMS_JOB in Oracle.
You could certainly use a cron job on one of the servers in your cluster, or on some other server within your network, that invokes sqlcmd to run a script, echo individual SQL statements, or execute procedure commands against the database. Making cron jobs highly available is a general problem. You might find these other discussions helpful:
How to convert Linux cron jobs to "the Amazon way"?
https://www.reddit.com/r/linuxadmin/comments/3j3bz4/run_cronjob_only_on_one_node_in_cluster/
You could also look into something like rcron.
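As a sketch of the sqlcmd approach (the procedure name, file path, and schedule below are hypothetical), the cron job could simply feed a small SQL script into sqlcmd:

    -- periodic_job.sql, invoked from a single node's crontab with an entry such as:
    --   */15 * * * *  sqlcmd --servers=voltdb-node1 < /opt/scripts/periodic_job.sql
    -- (only one node should carry the entry, or use rcron as mentioned above)
    exec PurgeExpiredSessions;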
One thing to be careful of when converting from an RDBMS to VoltDB is that VoltDB is optimized for processing many small transactions in parallel across many partitions. While the architecture of serialized execution per partition excels for many operational and streaming workloads, it is not designed to perform bulk operations on many rows at a time, especially transactions that need to perform writes on many rows that may be in different partitions within one transaction.
If you have a periodic job that does something like "process all the new rows that meet some criteria" you may find this transaction is slow and every time it runs it could delay other parts of the workload, especially if many rows have accumulated. It would be more the "VoltDB Way" to replace a simple INSERT statement that you may be using to ingest data (to be processed later by a scheduled job) with a procedure that inserts and immediately processes the row of data. You might even need a procedure that checks for other records and processes small sets of rows as a group, for example stitching together segments of data that go together but may have arrived out of order. By operating on fewer records at a time within one partition at a time, this type of procedure would be more scalable and would keep the data closer to your desired finished state in real time, rather than always having some data waiting to be processed.

Lotus Notes agent runs slower on the server compared to the development PC

I have an attendance recording system that has two databases, one for current records and another for archiving. The server processes attendance records and puts records marked as completed into the archive. There is no processing done in the archive database.
Here's the issue. One of the requirements was to build a blank record for each staff member every day, into which attendance records are put. The agent that does this calls a few procedures and does some checking within the database. Currently, roughly 1,800 blank records are created daily. On the development PC, processing each record takes roughly 2 to 3 seconds, which translates to an average of an hour and a half. However, when we deployed it to the server, processing each record takes roughly 7 seconds, which translates into about 3.5 hours to complete. We have had instances where the agent took 4.5 to 5 hours to complete.
Note that in both cases the agents are scheduled. There are no other Lotus applications on the server, and the server is free and idle most of the time (nothing running except Windows Server and Lotus Notes). Is there anything that could cause the additional processing time on the server compared to the development PC?
Your process is generating 1800 new documents every day, and you have said that you are also archiving documents regularly, so I presume that means that you are deleting them after you archive them. Performance problems can build up over time in applications like this. You probably have a large number of deletion stubs in the database, and the NSF file is probably highly fragmented (internally and/or externally).
You should use the free NotesPeek utility to examine the database and see how many deletion stubs it contains. Then you should check the purge interval setting and consider lowering it to the smallest value that you are comfortable with. (I.e., big enough so you know that all servers and users will replicate within that time, but small enough to avoid allowing a large buildup of deletion stubs.) If you change the purge interval, you can wait 24 hours for the stubs to be purged, or you can manually run updall against the database on the server console to force it.
Then you should run compact -c on the NSF file, and also run a defrag on the server disk volume where the NSF lives.
If these steps do improve your performance, then you may want to take steps in your code to prevent recurrence of the problem by using coding techniques that minimize deletion stubs, database growth and fragmentation.
I.e., go into your code for archiving, and change it so it doesn't delete the documents after archiving. Instead, have your code mark them with a field such as FreeDocList := "1". Then add a hidden view called (FreeDocList) with a selection formula of FreeDocList = "1". Also go into every other view in the database and add & (!(FreeDocList = "1")) to the selection formulas. Then change the code that adds the new blank documents so that, instead of creating new docs, it just goes to the FreeDocList view, finds the first document, sets FreeDocList = "0", and clears all the previous field values. Of course, if there aren't enough documents in the FreeDocList view, your code would revert to the old behavior and create a new document.
With the above changes, you will be re-using your existing documents whenever possible instead of deleting and creating new ones. I've run benchmarks on code like this and found that it can help; but I can't guarantee it in all cases. Much would depend on what else is going on in the application.

Strategies for checking inactivity on Azure

I have a table in Azure Table Storage, with rows that are regularly updated by various processes. I want to efficiently monitor when rows haven't been updated within a specific time period, and to cause alerts to be generated if that occurs.
Most task scheduler implementations I've seen for Azure function by making sure only one worker will perform a given job at a time. However, setting up a scheduled task that waits n minutes, and then queries the latest time-stamp to determine if action should be taken, seems inefficient since the work won't be spread across workers. It also seems generally inefficient to have to poll so many records.
An example use of this would be to send an email to a user that hasn't logged into a web site in the last 30 days. Assume that the number of users is a "large number" for the purposes of producing an efficient algorithm.
Does anyone have any recommendations for strategies that could be used to check for recent activity without forcing only one worker to do the job?
Keep a LastActive table with a timestamp as a rowkey (DateTime.UtcNow.Ticks.ToString("d19")). Update it by doing a batch transaction that deletes the old row and inserts the new row.
Now the query for inactive users is just something like from user in LastActive where user.PartitionKey == string.Empty && user.RowKey < (DateTime.UtcNow - TimeSpan.FromDays(30)).Ticks.ToString("d19") select user. That will be quite efficient for any size table.
Depending on what you're going to do with that information, you might want to then put a message on a queue and then delete the row (so it doesn't get noticed again the next time you check). Multiple workers can now pull those queue messages and take action.
I'm confused about your desire to do this on multiple worker instances... you presumably want to act on an inactive user only once, so you want only one instance to do the check. (The work of sending emails or whatever else you're doing can then be spread about by using a queue, but that initial check should be done by exactly one instance.)
