Oracle-like sequence in Linux? - linux

I need to tag parallel calls to my program with a unique number in single common log file (thousands of calls in a day).
For this, an Oracle sequence would be perfect (the returned number is guaranteed to be unique). I could implement this with a small C program (C for speed, which is the issue here) using the system's file-locking facilities, but does Linux already provide such a facility (/dev/increment_forever would be nice :)), or has somebody out there already written such a utility?
Edit: I forgot to mention that my program is not a persistent process (it's not a server), so 100 calls == 100 instances of my program. Using a filesystem file to store a counter would be too slow with the required locking mechanism... that's why something like /dev/increment_forever (i.e. a system facility) would be perfect.

First: You're seriously overestimating the costs of advisory locking on Linux. Compared to the price you're already paying for a unique instance of your program to start up, using flock to get an exclusive lock before updating a file with a unique identifier is cheap. (Doing atomic rename-based updates -- of a file other than the one the lock is held on, of course -- has some extra cost around filesystem metadata churn and journaling, but for thousands of calls per day this is nothing; one would worry if you needed to generate thousands of identifiers per second).
Second: Your question implies that what you actually need is uniqueness, not ordering. That puts you in a space where you don't necessarily need coordination or locking at all. Consider the approach taken by version-1 UUIDs (a very high-precision timestamp, potentially combined with other information -- the CPU identifier, since only one process runs on a given CPU at a given instant, or the PID, since only one process can hold a given PID at a time), or the one taken by version-4 UUIDs (a purely random value). Combine your process's PID with the timestamp at which it started (the latter is column 22 of /proc/self/stat), and you should be set.
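Not part of the original answer, but here is a minimal sketch of that PID + start-time idea, in Perl just for illustration (the same reads translate directly to C); it assumes Linux's /proc layout, where field 22 of /proc/self/stat is the process start time in clock ticks:

#!/usr/bin/perl
# Sketch only: a lock-free unique tag built from the PID plus the process start time.
use strict;
use warnings;

open my $fh, '<', '/proc/self/stat' or die "cannot read /proc/self/stat: $!";
my $stat = <$fh>;
close $fh;

# Field 2 (comm) is parenthesised and may contain spaces, so take everything
# after the last ')' and count from there: the remaining fields start at
# field 3 (state), which puts starttime (field 22 overall) at index 19.
my ($rest) = $stat =~ /^.*\)\s+(.*)$/s;
my @fields = split ' ', $rest;
my $starttime = $fields[19];

print "$$-$starttime\n";    # e.g. "12345-987654321" -- unique per process lifetime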
The shell function below is much slower than a native C implementation calling flock(2) directly, but it should give you an idea of a correct locking-based implementation:
retrieve_and_increment() {
local lock_fd curr_value next_value
# using a separate lockfile to allow atomic replacement of content file
exec {lock_fd}<>counter.lock
flock -x "$lock_fd" || {
exec {lock_fd}<&-
return 1
}
next_value=$(( $(<counter) + 1 ))
printf '%s\n' "$next_value" >counter.next && mv counter.next counter
exec {lock_fd}<&- # close our handle on the lock
# then, when not holding the lock, write result to stdout
# ...that way we decrease the time spent holding the lock if stdout blocks
printf '%s\n' "$next_value"
}
Note that we're spinning up an external command for mv, so flock isn't the only time we're paying fork/exec costs here -- a reason why this would be better implemented within your C program.
For other people reading this who genuinely need thousands of unique sequence values generated per second, I would strongly suggest using a Redis database for this purpose. The INCR command will atomically increment the value associated with a key in O(1) time and return that value. If setting up a TCP connection to a local service is considered too slow/expensive, Redis also supports connections via Unix sockets.
On my not-particularly-beefy laptop:
$ redis-benchmark -t INCR -n 100000 -q
INCR: 95510.98 requests per second
95,000 requests per second is probably quite sufficient. :)
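(Not part of the original answer.) Purely as an illustration of the INCR approach, a hedged sketch using the CPAN Redis client; the socket path and key name are assumptions, and any client -- including a bare "redis-cli INCR mykey" -- would do just as well:

use strict;
use warnings;
use Redis;    # CPAN Redis client

# Connecting over a Unix socket avoids TCP setup cost
# (use server => '127.0.0.1:6379' for TCP instead).
my $redis = Redis->new(sock => '/var/run/redis/redis.sock');

# INCR is atomic on the server, so concurrent callers each get a distinct value.
my $seq = $redis->incr('myapp:sequence');
print "$seq\n";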

Related

Threads: worth it for this situation?

I have never used threads before, but think I may have encountered an opportunity:
I have written a script that chews through an array of ~500 Excel files and uses Spreadsheet::ParseExcel to pull values from specific sheets in each workbook (on average, two sheets per workbook; one cell extracted per sheet).
Running it now, where I just go through the array of files one by one and extract the relevant info from the file, it takes about 45 minutes to complete.
My question is: is this an opportunity to use threads, and have more than one file get hit at a time*, or should I maybe just accept the 45 minute run time?
(* - if this is a gross misunderstanding of what I can do with threads, please say so!)
Thanks in advance for any guidance you can offer!
Edit - adding example code. The code below is a sub that is called in a foreach loop for each file location stored in an array:
# Init the parser (assumes Spreadsheet::ParseExcel has been loaded with "use Spreadsheet::ParseExcel;")
my $parser = Spreadsheet::ParseExcel->new;
my $workbook = $parser->parse($inputFile)
    or die "Unable to load $inputFile: " . $parser->error();
# Get a list of any sheets that have 'QA' in the sheet name
my @sheetsToScan;
foreach my $sheet ($workbook->worksheets) {
    if ($sheet->get_name =~ m/QA/) {
        push @sheetsToScan, $sheet->get_name;
    }
}
shift @sheetsToScan;    # drop the first matching sheet
# Extract the value from the appropriate cell
my ($cell, $value);
foreach (@sheetsToScan) {
    my $worksheet = $workbook->worksheet($_);
    if ($_ =~ m/Production/ or $_ =~ m/Prod/) {
        $cell  = $worksheet->get_cell(1, 1);
        $value = $cell ? $cell->value : undef;
        if (not defined $value) {
            $value = "Not found.";
        }
    } else {
        $cell  = $worksheet->get_cell(6, 1);
        $value = $cell ? $cell->value : undef;
        if (not defined $value) {
            $value = "Not found.";
        }
    }
    # $line is built from $inputFile and $value elsewhere (not shown); @outputBuffer is shared with the caller
    push(@outputBuffer, $line);
}
Threads (or multiple processes via fork) allow your script to use more than one CPU at a time. For many tasks this can save a lot of wall-clock time, but it will not reduce the total CPU time (and may even increase it, due to the overhead of starting and managing threads and processes). Here are the situations where threading/multiprocessing will not be helpful:
the task of your script does not lend itself to parallelization -- when each step of your algorithm depends on the previous steps
the task your script performs is fast and lightweight compared to the overhead of creating and managing a new thread or new process
your system only has one CPU or your script is only enabled to use one CPU
your task is constrained by a resource other than CPU, such as disk access, network bandwidth, or memory -- if your task involves processing large files that you download through a slow network connection, then the network is the bottleneck, and processing the file on multiple CPUs will not help. Likewise, if your task consumes 70% of your system's memory, then using a second and third thread will require paging to swap space and will not save any time. Parallelization will also be less effective if your threads compete for some synchronized resource -- file locks, database access, etc.
you need to be considerate of other users on your system -- if you are using all the cores on a machine, then other users will have a poor experience
[added, threads only] your code uses any package that is not thread-safe. Most pure Perl code will be thread-safe, but packages that use XS may not be
[added] when you are still actively developing your core task. Debugging is a lot harder in parallel code
Even if none of these apply, it is sometimes hard to tell how much a task will benefit from parallelization, and the only way to be sure is to actually implement the parallel task and benchmark it. But the task you have described looks like it could be a good candidate for parallelization.
It seems to me that your task should benefit from multiple lines of execution (processes or threads), as it appears to have a very roughly even blend of I/O and CPU work. I would expect a speedup of a factor of a few, but it is hard to tell without knowing the details.
One way is to break the list of files into groups, as many as there are cores that you can spare. Then process each group in a fork, which assembles its results and passes them back to the parent once done, via a pipe or files. There are modules that do this and much more, for example Forks::Super or Parallel::ForkManager. They also offer a queue, another approach you can use.
I do this regularly when a lot of data in files is involved and get near linear speedup with up to 4 or 5 cores (on NFS), or even with more cores depending on the job details and on hardware.
I would cautiously suggest that this may be simpler than threads, so it is worth trying first.
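(Not from the original answer.) A minimal sketch of the fork-per-group approach with Parallel::ForkManager; the group count, the per-file sub extract_values, and the result format are assumptions, not code from the question:

use strict;
use warnings;
use Parallel::ForkManager;

my @files  = @ARGV;                    # the ~500 workbook paths
my $groups = 4;                        # roughly one group per spare core
my $per    = int(@files / $groups) + 1;

my $pm = Parallel::ForkManager->new($groups);

my @output;
# Collect each child's results as it finishes (passed back via finish()).
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident, $signal, $core, $data) = @_;
    push @output, @{$data} if $data;
});

while (my @group = splice(@files, 0, $per)) {
    $pm->start and next;               # parent keeps looping; child continues below
    my @results = map { extract_values($_) } @group;   # hypothetical per-file sub
    $pm->finish(0, \@results);         # serialize results back to the parent
}
$pm->wait_all_children;

print "$_\n" for @output;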
Another way would be to create a thread queue (Thread::Queue)
and feed it the filename groups. Note that Perl's threads are not the lightweight "threads" one might expect; quite the opposite, they are heavy: they copy everything into each thread (so start them up front, before there is much data in the program), and they come with other subtleties. Have a small number of workers, each with a sizable job (a nice list of files), instead of many threads rapidly hitting the queue.
In this approach, too, be careful about how to pass results back since frequent communication poses a significant overhead for (Perl's) threads.
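(Again, not from the original answer.) A minimal sketch of the queue variant with threads and Thread::Queue; the worker count, group size, and the extract_values sub are assumptions:

use strict;
use warnings;
use threads;
use Thread::Queue;

my @files   = @ARGV;     # the workbook paths
my $workers = 4;

# Split the file list into sizable groups so each dequeue hands a worker real work.
my @groups;
push @groups, [ splice @files, 0, 50 ] while @files;

# Only small data (file names) exists before the threads are spawned.
my $queue = Thread::Queue->new;
$queue->enqueue(@groups);
$queue->enqueue((undef) x $workers);    # one end-of-work marker per worker

my @threads = map {
    threads->create(sub {
        my @results;
        while (defined(my $group = $queue->dequeue)) {
            push @results, extract_values($_) for @{$group};   # hypothetical per-file sub
        }
        return @results;    # copied back to the parent at join time
    });
} 1 .. $workers;

my @output = map { $_->join } @threads;
print "$_\n" for @output;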
In either case, it is important that the groups are formed so as to provide a balanced workload per thread/process. If that is not possible (you may not know which files will take much longer than others), then have the threads take smaller batches, while for forks use a queue from one of the modules mentioned above.
Handing only one file, or a few, to a thread or process is most likely far too light a workload, in which case the management overhead may erase (or reverse) any possible speed gains. The I/O overlap across threads/processes would also increase, and that is the main limit to the speedup here.
The optimal number of files to pass to a thread/process is hard to estimate, even with all the details at hand; you just have to try. I suspect that the reported runtime (over 5 seconds per file) is due to some inefficiency that can be removed, so first check your code for undue inefficiencies. If a file really does take that long to process, then start by passing a single file at a time to the queue.
Also, please consider mob's answer carefully. And note that these are advanced techniques.
What you do is just change "for ..." into "mce_loop ..." and you'll see the boost, although I suggest you take a look at MCE::Loop first.
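(Not from the original answer.) A hedged sketch of what that MCE::Loop change might look like; the worker count and the extract_values sub are assumptions:

use strict;
use warnings;
use MCE::Loop;

MCE::Loop->init(max_workers => 4, chunk_size => 1);

my @files = @ARGV;    # the workbook paths

# Each worker handles one file per chunk and gathers its result line;
# mce_loop returns everything gathered (in completion order, not input order).
my @output = mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    for my $file (@{$chunk_ref}) {
        my $value = extract_values($file);    # hypothetical per-file sub
        MCE->gather("$file\t$value");
    }
} @files;

print "$_\n" for @output;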

Perl threads to execute a sybase stored proc parallel

I have written a Sybase stored procedure to move data from certain tables [~50] in the primary db, for a given id, to an archive db. Since it's taking a very long time to archive, I am thinking of executing the same stored procedure in parallel, with a unique input id for each call.
I manually ran the stored proc twice at the same time with different inputs and it seems to work. Now I want to use Perl threads [maximum 4 threads], with each thread executing the same procedure with a different input.
Please advise whether this is a recommended approach or whether there is a more efficient way to achieve this. If the experts' choice is threads, any pointers or examples would be helpful.
What you do in Perl does not really matter here: what matters is what happens on the Sybase server side. Assuming each client task creates its own connection to the database, it's all fine, and how the client achieves this makes no difference to the Sybase server. But do not use a model where the different client tasks try to share the same client-server connection, as those calls will never actually run in parallel.
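(Not from the original answer.) A minimal sketch of that client shape with threads and DBI/DBD::Sybase; the server name, credentials, proc name, and input ids are all placeholders, and each thread opens its own connection, since DBI handles must not be shared across threads:

use strict;
use warnings;
use threads;
use DBI;    # requires DBD::Sybase

my @ids = (101, 102, 103, 104);    # hypothetical input ids, one per thread (max 4)

my @workers = map {
    my $id = $_;
    threads->create(sub {
        # Every thread makes its own connection; never share a $dbh between threads.
        my $dbh = DBI->connect('dbi:Sybase:server=PRIMARY', 'archiver', 'secret',
                               { RaiseError => 1, PrintError => 0 });
        # Fire the archival proc for this id (placeholder name; result sets ignored for brevity).
        $dbh->do("exec archive_by_id $id");
        $dbh->disconnect;
        return "id $id done";
    });
} @ids;

print $_->join, "\n" for @workers;

With more ids than threads, you could feed them through a Thread::Queue as sketched in the earlier answer instead of spawning one thread per id.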
No 'answer' per se, but some questions/comments:
Can you quantify "taking a very long time to archive"? Assuming your archive process consists of a mix of insert/select and delete operations, do query plans and MDA data show fast, efficient operations? If you're seeing table scans, sort-merges, deferred inserts/deletes, etc., then it may be worth the effort to address those performance issues first.
Can you expand on the comment that running two stored proc invocations at the same time seems to work? Again, any sign of performance issues for the individual proc calls? Any sign of contention (eg, blocking) between the two proc calls? If the archival proc isn't designed properly for parallel/concurrent operations (eg, eliminate blocking), then you may not be gaining much by running multiple procs in parallel.
How many engines does your dataserver have, and are you planning on running your archive process during a period of moderate-to-heavy user activity? If the current archive process runs at/near 100% cpu utilization on a single dataserver engine, then spawning 4 copies of the same process could see your archive process tying up 4 dataserver engines with heavy cpu utilization ... and if your dataserver doesn't have many engines ... combined with moderate-to-heavy user activity at the same time ... you could end up invoking the wrath of your DBA(s) and users. The net result is that you may need to make sure your archive process doesn't hog the dataserver.
One other item to consider, and this may require input from the DBAs ... if you're replicating out of either database (source or archive), increasing the volume of transactions per a given time period could have a negative effect on replication throughput (ie, an increase in replication latency); if replication latency needs to be kept at a minimum, then you may want to rethink your entire archive process from the point of view of spreading out transactional activity enough so as to not have an effect on replication latency (eg, single-threaded archive process that does a few insert/select/delete operations, sleeps a bit, then does another batch, then sleeps, ...).
It's been my experience that archive processes are not considered high-priority operations (assuming they're run on a regular basis, and before the source db fills up); this in turn means the archive process is usually designed so that it's efficient while at the same time putting a (relatively) light load on the dataserver (think: running as a trickle in the background) ... ymmv ...

One variable shared across all forked instances?

I have a Perl script that forks itself repeatedly. I wish to gather statistics about each forked instance: whether it passed or failed and how many instances there were in total. For this task, is there a way to create a variable that is shared across all instances?
My perl version is v5.8.8.
You should use IPC in some shape or form, most typically a shared memory segment with a semaphore guarding access to it. Alternatively, you could use some kind of hybrid memory/disk database whose access API handles concurrent access for you, but that might be overkill here. Finally, you could use a file with record locking.
IPC::Shareable does what you literally ask for. Each process will have to take care to lock and unlock a shared hash (for example), but the data will appear to be shared across processes.
However, ordinary UNIX facilities provide easier ways (IMHO) of collecting worker status and count. Have every process write ($| = 1) "ok\n" or "not ok\n" when it END{}s, for example, and make sure that they write to a FIFO or pipe, since comparatively short writes (under PIPE_BUF bytes) will not be interleaved. Then capture that output (e.g., ./my-script.pl | tee /tmp/my.log) and you're done. Another approach would have them record their status in simple files — open(my $status, '>', "./status.$$") — in a directory specially prepared for this.
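(Not from the original answer.) A minimal sketch of the status-file variant, assuming the parent forks the workers directly and tallies afterwards; the worker count, directory name, and do_work sub are placeholders, and only core Perl (5.8-compatible) features are used:

use strict;
use warnings;

my $status_dir = './statuses';    # a directory specially prepared for this
mkdir $status_dir unless -d $status_dir;

my @pids;
for my $n (1 .. 8) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {    # child
        my $ok = do_work($n);    # hypothetical worker sub returning true/false
        open my $fh, '>', "$status_dir/status.$$" or die "cannot write status: $!";
        print {$fh} $ok ? "ok\n" : "not ok\n";
        close $fh;
        exit($ok ? 0 : 1);
    }
    push @pids, $pid;    # parent
}
waitpid($_, 0) for @pids;

# Tally: total instances and pass/fail counts.
my ($pass, $fail) = (0, 0);
for my $file (glob "$status_dir/status.*") {
    open my $fh, '<', $file or next;
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;
    (defined $line and $line eq 'ok') ? $pass++ : $fail++;
}
printf "total=%d passed=%d failed=%d\n", $pass + $fail, $pass, $fail;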

How can a SystemTap script determine the current thread count?

I want to write a SystemTap script that can determine the actual number of threads of the current PID from inside a probe. The number should be the same as shown in the output of /proc/4711/status at that moment.
My first approach was to count kprocess.create and kprocess.exit event occurrences, but this obviously gives you only the relative increase / decrease of the thread count.
How could a SystemTap script use one of the available API functions to determine this number? Maybe the script could somehow read the same kernel information as is used for the proc file system output?
You will be subject to race conditions in either case - a stap probe cannot take locks on kernel structures, which would be required to guarantee that the task list does not change while it's being counted. This is especially true for general systemtap probe context, like in the middle of a kprobe.
For the first approach, you could add a "probe begin {}"-time iteration of the task list to prime the initial thread counts from a bit of embedded-C code. One challenge would be to set systemtap script globals from the embedded-C code (there's no documented API for that), but if you look at what the translator generates (stap -p3), it should be doable.
The second approach would be to do the same iteration, but for locking reasons above, this is not generally safe.

How do you regulate concurrency/relative process performance in Erlang?

Let's say I have to read a directory that contains many large XML files, parse them, send them to some service over the network, and then write the responses back to disk.
If it were Java or C++ etc., I may do something like this (hope this makes sense):
(File read & xml parsing process) -> bounded-queue -> (sender process) -> service
service -> bounded-queue -> (process to parse result and write to disk)
And then I'd assign whatever suitable number of threads to each process. This way I can limit the concurrency of each process at its optimal value, and the bounded queue will ensure there won't be memory shortage etc.
What should I do though when coding in Erlang? I guess I could just implement the whole flow in a function, then iterate over the directory and spawn these "start-to-end" processes as fast as possible. This sounds suboptimal though, because if parsing the XML takes longer than reading the files, etc., the application could run short of memory from having many XML documents in memory at once, and you can't keep the concurrency of each stage at its optimal level. For example, if the "service" is most efficient at a concurrency of 4, it would be very inefficient to hit it with enormous concurrency.
How should erlang programmers deal with such situation? I.e. what is the erlang substitute for fixed thread pool and bounded queue?
There is no real way to limit the queue size of a process except by handling all messages in a timely fashion. The best approach is to simply check available resources before spawning and wait if they are insufficient. So if you are worried about memory, check memory before spawning a new process; if disk space, check disk space, etc.
Limiting the number of processes spawned is also possible. A simple construction would be:
pool(Max) ->
    process_flag(trap_exit, true),
    pool(0, Max).

pool(Current, Max) ->
    receive
        {'EXIT', _, _} ->
            pool(Current - 1, Max);
        {work, F, Pid} when Current < Max ->
            Pid ! accepted,
            spawn_link(F),
            pool(Current + 1, Max);
        {work, _, Pid} ->
            Pid ! rejected,
            pool(Current, Max)
    end.
This is a rough sketch of how a process could limit the number of processes it spawns. It is, however, considered better to limit based on the real underlying resources rather than on an artificial number.
You can definitely run your own process pool in Erlang, but it is a poor way to manage memory usage, since it doesn't take into account the size of the XML data being read (or the total memory used by the processes, for that matter).
I would suggest implementing the whole workflow in a functional library, as you suggested, and spawn processes that execute this workflow. Add a check for memory usage which will look at the size of the data to be read in and the available memory (hint: use memsup).
I would suggest you do it in an event-driven style.
Imagine you start an OTP gen_server with the list of file names.
The gen_server checks resources and spawns the next worker if permitted, removing a file name from the list and passing it to the worker.
The worker processes the file and casts a message back to the gen_server when done (or you can just trap EXIT).
The gen_server receives such a message and repeats step 1 until the file list is empty.
So the workers do the heavy lifting, and the gen_server controls the flow.
You can also make it a distributed system, but that's a bit more complex, as you need to spawn intermediate gen_servers on each computer, query them to see whether resources are available there, and then choose which computer should process the next file based on the replies. And you probably need something like NFS to avoid sending long messages.
Workers can be further split if you need more concurrency.
