Error allocating an array - malloc

I have a subroutine which is called quite a lot during the program run. I try to use allocatable arrays wherever possible, and the subroutine is called several times without any problem, but at some point it terminates with:
malloc.c:3790: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
This happens at the beginning of the subroutine, when the first array is being allocated.
Using non-allocatable arrays instead, the subroutine gets through more calls but terminates again, now with:
wait: 28674: Memory fault(coredump)
I assume that it terminates on calling, because I write out some values right after the declaration of the variables, without any computation.
The call:
do k=1, kreise
  write(*,*)k
  call rundheit(n(k),kreis(k,1:n(k),3),kreis(k,1:n(k),2),outrnd)
end do
Here 'kreise' may have values of up to 1500. I printed out and checked the values of the parameters passed, before the call, in the subroutine and after the call.
Limiting 'kreise' does solve the problem, but limiting it is not a practical solution. I need all of the data to be evaluated, not a fraction of it.
Some notes on my environment:
My program is a subroutine compiled by an FEM simulation software using the Intel Fortran compiler. As far as I know I have no chance to alter the compiler options, and I cannot compile my code on its own because it has too many dependencies on the subroutines deployed by the FEM software.
I developed and ran this exact subroutine on another, much smaller and simpler simulation without any problems. The actual, ‘bigger’, simulation also runs without any problems as long as I don’t use this particular subroutine. (The difference is mostly the node density and thus the amount of data being considered during the computation.) Other user subroutines work without problems. All the subroutines do is fetch the results between some of the increments, do some analyses and write some reports without altering the simulation.
I guess that the problem has something to do with memory handling, something I have no idea about.
Thanks.
UPDATE
I compiled the subroutine using -check all and found the error occurs way before the blamed subroutine. Two arrays, one of them n(), are out of bound on several occasions, but the error gets somehow (more) critical while calling. The strange part is, that it is some iterations beyond the bound when the error occurs, for example: here both arrays have the size (1:72) and the calling breaks somewhere for k=135 to 267, (the lowest and highest values I found during some runs).
The problem is the integer kreise, whose value is set during a loop:
...
allocate(n(l))
allocate(pos(l))
...
do kreise = 1,l
  pos(kreise)=minvalX+(Kreise-1)*IncX
  if(pos(kreise).gt.maxvalX) exit
end do
Here kreise always becomes l+1. Why?
NOTE: pos(kreise).gt.maxvalX should never be true. Becoming true isn't a problem, although it would suggest that l was computed wrongly (too big). The exit would then only save computation time later, by reducing the iterations of several loops.

The program may be writing to memory that it shouldn't write to and corrupting the structures of the memory management of malloc that is used by Fortran allocate. I suggest using the Fortran option for run-time subscript checking. With ifort, try -check all or -check bounds.
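On the follow-up question of why kreise ends up as l+1: in Fortran, a DO loop that runs to completion leaves its index one step past the upper bound, while an EXIT leaves the index at the value it had when the loop was left. The sketch below models that rule in Python. It is an illustration of the loop semantics, not Fortran code, and the function name and the exit_at parameter are made up for this example.

```python
def fortran_do_final_index(lower, upper, exit_at=None):
    """Model `do i = lower, upper` (step 1) and return i after the loop.

    exit_at is a hypothetical knob: the index value at which an EXIT fires.
    """
    i = lower
    trip_count = max(upper - lower + 1, 0)  # iteration count is fixed up front
    while trip_count > 0:
        if exit_at is not None and i == exit_at:
            return i          # EXIT: the index keeps its current value
        i += 1                # index is incremented at the end of each pass
        trip_count -= 1
    return i                  # normal completion: index is upper + 1


after_full_loop = fortran_do_final_index(1, 72)              # 73, i.e. l + 1
after_early_exit = fortran_do_final_index(1, 72, exit_at=10)  # 10
```

If the index is reused afterwards as a count, as kreise is in the calling loop, one option is to clamp it first (something like kreise = min(kreise, l)), so that later loops stay inside arrays allocated with size l.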

Related

Haskell + Infinite lists = Hang

I am learning Haskell, have read a few references, and am working on various challenges (mainly codewars). However at times I will attempt to generate an infinite list for some math algorithm and then pick from it (like get the first n numbers which match some pattern).
However, because my syntax isn't perfect, I often mix up parts. I want to ask Haskell to define a (lazy) infinite list and pick the first 5 elements (or whatever), but I end up actually asking it to do something with the full infinite list, and when I build and test it, the program just hangs.
I managed (once) to call up the Windows process manager. What happens is that when Visual Studio Code builds and runs the executable, the process grows extremely fast, absorbing all memory and processor time until the computer becomes non-responsive.
Is there some kind of compiler flag that can prevent this?
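As noted in the comments, you can run your executable with the GHC runtime's -M switch (for example, +RTS -M512m), which allows you to specify a maximum heap size. (The default is unlimited.) That way, if your program tries to use more than that amount of memory, it will crash with an exception rather than just consume all available RAM. Depending on your GHC version, you may need to link with -rtsopts for the executable to accept this option.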
Note that this won't help if your program is doing lots of processing but not trying to actually keep the results in RAM. E.g., if you try to print out the first item that matches a condition, but no item will ever match the condition, in all likelihood your program will loop forever, but not actually consume any RAM. In that case, you'll just have to kill it.
You might also try running your code in GHCi, where you can just jab Ctrl+C to halt your code without killing GHCi itself.
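The difference between taking a bounded prefix of an infinite sequence and forcing the whole thing can be sketched outside Haskell too. The Python example below uses a generator as a stand-in for a lazy list (the squares function is made up for illustration); in Haskell the terminating version would be something like take 5 (filter even (map (^2) [0..])).

```python
from itertools import count, islice


def squares():
    """Lazily yield 0, 1, 4, 9, ... without end."""
    for n in count():
        yield n * n


even_squares = (s for s in squares() if s % 2 == 0)

# Asking for a bounded prefix terminates: only as many elements as needed
# are ever computed.
first_five = list(islice(even_squares, 5))

# By contrast, list(squares()) or sum(squares()) would try to consume the
# whole infinite sequence and never return.
```

The hang described in the question corresponds to the second case: an operation that needs the end of the list (length, sum, last, reversing, sorting) is applied before the prefix is taken.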

Infinite loop inside 'do_select' function of Linux kernel

I am surprised that the Linux kernel has an infinite loop in its 'do_select' function implementation. Is that normal practice?
I am also interested in how file change monitoring is implemented in the Linux kernel. Is it an infinite loop again?
select.c source code
This is not an infinite loop; that term is reserved for loops with no exit condition at all. This loop has its exit condition in the middle: http://lxr.linux.no/#linux+v3.9/fs/select.c#L482 This is a very common idiom in C. It's called "loop and a half" and there's a simple pseudocode example here: https://stackoverflow.com/a/10767975/388520 which clearly illustrates why you would want to do this. (That question talks about Java but that's not important; this is a general structured-programming idiom.)
I'm not a kernel expert, but this particular loop appears to have been written this way because the logic of the inner loop needs to run both before and after the call to poll_schedule_timeout at the very bottom of the outer loop. That code is checking whether there are any events to return; if there are already events to return when select is invoked, it's supposed to return immediately; if there aren't any initially, there will be when poll_schedule_timeout returns. So in normal operation the outer loop should cycle either 0.5 or 1.5 times. (There may be edge-case circumstances where the outer loop cycles more times than that.) I might have chosen to pull the inner loop out to its own function, but that might involve passing pointers to too many local variables around.
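The shape of that loop can be sketched as follows. This is a simplified Python model of the "loop and a half" idiom, not the kernel's code: select_like, scan_for_events and block_until_woken are hypothetical names, with the last one standing in for poll_schedule_timeout and returning False on timeout.

```python
def select_like(scan_for_events, block_until_woken):
    """Loop and a half: scan, exit in the middle, otherwise block and rescan."""
    timed_out = False
    while True:
        ready = scan_for_events()      # first half: check for events
        if ready or timed_out:         # exit condition sits mid-loop
            return ready
        if not block_until_woken():    # second half: sleep until woken
            timed_out = True           # on timeout, scan once more and leave


# Usage: nothing is ready on the first scan, one event after the wake-up.
pending = [[], ["fd 3 readable"]]
calls = {"scan": 0, "block": 0}


def scan():
    calls["scan"] += 1
    return pending.pop(0)


def block():
    calls["block"] += 1
    return True


result = select_like(scan, block)
```

Here the scan runs twice and the blocking call once, which is the "1.5 times" case; if the first scan had already found an event, the function would have returned without blocking at all.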
This is also not a spin loop, by which I mean, the CPU is not wasting electricity checking for events over and over again until one happens. If there are no events to report when control reaches the call to poll_schedule_timeout, that function (by, ultimately, calling __schedule) will cause the calling thread to block -- the CPU is taken away from that thread and assigned to another process that can do something useful with it. (If there are no processes that need the CPU, it'll be put into a low-power "halt" until the next interrupt fires.) When one of the events happens, or the timeout, the thread that called select will get "woken up" and poll_schedule_timeout will return.
On a larger note, operating system kernels often do things that would be considered strange, poor style, or even flat-out wrong, in the service of other engineering goals (efficiency, code reuse, avoidance of race conditions that can only occur on some CPUs, ...). They are written by people who know exactly what they are doing and exactly how far they can get away with bending the rules. You can learn a lot from reading through OS code, but you probably shouldn't try to imitate it until you have a bit more experience. You wouldn't try to pastiche the style of James Joyce as your first exercise in creative writing, ne? Same deal.

Using threadsafe initialization in a JRuby gem

Wanting to be sure we're using the correct synchronization (and no more than necessary) when writing threadsafe code in JRuby; specifically, in a Puma instantiated Rails app.
UPDATE: Extensively re-edited this question, to be very clear and use the latest code we are implementing. This code uses the atomic gem written by @headius (Charles Nutter) for JRuby, but we are not sure it is totally necessary, or in which ways it's necessary, for what we're trying to do here.
Here's what we've got, is this overkill (meaning, are we over/uber-engineering this), or perhaps incorrect?
ourgem.rb:
require 'atomic' # gem from @headius
SUPPORTED_SERVICES = %w(serviceABC anotherSvc andSoOnSvc).freeze
module Foo
  def self.included(cls)
    cls.extend(ClassMethods)
    cls.send :__setup
  end
  module ClassMethods
    def get(service_name, method_name, *args)
      __cached_client(service_name).send(method_name.to_sym, *args)
      # we also capture exceptions here, but leaving those out for brevity
    end
    private
    def __client(service_name)
      # obtain and return a client handle for the given service_name
      # we definitely want to cache the value returned from this method
      # **AND**
      # it is a requirement that this method ONLY be called *once PER service_name*.
    end
    def __cached_client(service_name)
      @@_clients.value[service_name]
    end
    def __setup
      @@_clients = Atomic.new({})
      @@_clients.update do |current_services|
        SUPPORTED_SERVICES.inject(Atomic.new({}).value) do |memo, service_name|
          if current_services[service_name]
            current_services[service_name]
          else
            memo.merge({service_name => __client(service_name)})
          end
        end
      end
    end
  end
end
client.rb:
require 'ourgem'
class GetStuffFromServiceABC
  include Foo
  def self.get_some_stuff
    result = get('serviceABC', 'method_bar', 'arg1', 'arg2', 'arg3')
    puts result
  end
end
Summary of the above: we have @@_clients (a mutable class variable holding a Hash of clients) which we only want to populate ONCE for all available services, which are keyed on service_name.
Since the hash is in a class variable (and hence threadsafe?), are we guaranteed that the call to __client will not get run more than once per service name (even if Puma is instantiating multiple threads with this class to service all the requests from different users)? If the class variable is threadsafe (in that way), then perhaps the Atomic.new({}) is unnecessary?
Also, should we be using an Atomic.new(ThreadSafe::Hash) instead? Or again, is that not necessary?
If not (meaning: you think we do need the Atomic.news at least, and perhaps also the ThreadSafe::Hash), then why couldn't a second (or third, etc.) thread interrupt between the Atomic.new(nil) and the @@_clients.update do ... meaning the Atomic.news from EACH thread will EACH create two (separate) objects?
Thanks for any thread-safety advice, we don't see any questions on SO that directly address this issue.
Just a friendly piece of advice, before I attempt to tackle the issues you raise here:
This question, and the accompanying code, strongly suggests that you don't (yet) have a solid grasp of the issues involved in writing multi-threaded code. I encourage you to think twice before deciding to write a multi-threaded app for production use. Why do you actually want to use Puma? Is it for performance? Will your app handle many long-running, I/O-bound requests (like uploading/downloading large files) at the same time? Or (like many apps) will it primarily handle short, CPU-bound requests?
If the answer is "short/CPU-bound", then you have little to gain from using Puma. Multiple single-threaded server processes would be better. Memory consumption will be higher, but you will keep your sanity. Writing correct multi-threaded code is devilishly hard, and even experts make mistakes. If your business success, job security, etc. depends on that multi-threaded code working and working right, you are going to cause yourself a lot of unnecessary pain and mental anguish.
That aside, let me try to unravel some of the issues raised in your question. There is so much to say that it's hard to know where to start. You may want to pour yourself a cold or hot beverage of your choice before sitting down to read this treatise:
When you talk about writing "thread-safe" code, you need to be clear about what you mean. In most cases, "thread-safe" code means code which doesn't concurrently modify mutable data in a way which could cause data corruption. (What a mouthful!) That could mean that the code doesn't allow concurrent modification of mutable data at all (using locks), or that it does allow concurrent modification, but makes sure that it doesn't corrupt data (probably using atomic operations and a touch of black magic).
Note that when your threads are only reading data, not modifying it, or when working with shared stateless objects, there is no question of "thread safety".
Another definition of "thread-safe", which probably applies better to your situation, has to do with operations which affect the outside world (basically I/O). You may want some operations to only happen once, or to happen in a specific order. If the code which performs those operations runs on multiple threads, they could happen more times than desired, or in a different order than desired, unless you do something to prevent that.
It appears that your __setup method is only called when ourgem.rb is first loaded. As far as I know, even if multiple threads require the same file at the same time, MRI will only ever let a single thread load the file. I don't know whether JRuby is the same. But in any case, if your source files are being loaded more than once, that is symptomatic of a deeper problem. They should only be loaded once, on a single thread. If your app handles requests on multiple threads, those threads should be started up after the application has loaded, not before. This is the only sane way to do things.
Assuming that everything is sane, ourgem.rb will be loaded using a single thread. That means __setup will only ever be called by a single thread. In that case, there is no question of thread safety at all to worry about (as far as initialization of your "client cache" goes).
Even if __setup were to be called concurrently by multiple threads, your atomic code wouldn't do what you think it does. First of all, you use Atomic.new({}).value. This wraps a Hash in an atomic reference, then unwraps it so you just get back the Hash. It's a no-op. You could just write {} instead.
Second, your Atomic#update call will not prevent the initialization code from running more than once. To understand this, you need to know what Atomic actually does.
Let me pull out the old, tired "increment a shared counter" example. Imagine the following code is running on 2 threads:
i += 1
We all know what can go wrong here. You may end up with the following sequence of events:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A writes its incremented value back to i.
Thread B writes its incremented value back to i.
So we lose an update, right? But what if we store the counter value in an atomic reference, and use Atomic#update? Then it would be like this:
Thread A reads i and increments it.
Thread B reads i and increments it.
Thread A tries to write its incremented value back to i, and succeeds.
Thread B tries to write its incremented value back to i, and fails, because the value has already changed.
Thread B reads i again and increments it.
Thread B tries to write its incremented value back to i again, and succeeds this time.
Do you get the idea? Atomic never stops 2 threads from running the same code at the same time. What it does do, is force some threads to retry the #update block when necessary, to avoid lost updates.
If your goal is to ensure that your initialization code will only ever run once, using Atomic is a very inappropriate choice. If anything, it could make it run more times, rather than less (due to retries).
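The retry behaviour described above can be sketched with a small compare-and-set loop. This is a Python model of the idea, not the atomic gem's implementation: AtomicRef is a hypothetical stand-in, and the "other thread" is simulated deterministically by sneaking in an update while the first block is still running.

```python
import threading


class AtomicRef:
    """Hypothetical stand-in for an atomic reference with a retrying update."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    @property
    def value(self):
        return self._value

    def compare_and_set(self, expected, new):
        # Swap in `new` only if nobody replaced the value since it was read.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

    def update(self, block):
        # Nothing stops two callers from running `block` at the same time;
        # the loser of the race simply runs it again.
        while True:
            old = self._value
            new = block(old)
            if self.compare_and_set(old, new):
                return new


counter = AtomicRef(0)
stats = {"block_runs": 0, "interfered": False}


def increment(current):
    stats["block_runs"] += 1
    if not stats["interfered"]:
        # Simulate "thread B" finishing its own update in the middle of ours.
        stats["interfered"] = True
        counter.update(lambda v: v + 1)
    return current + 1


counter.update(increment)
```

No increment is lost (the counter ends at 2), but the increment block ran twice for a single update call. If that block were the client initialization, it would have created the client twice, which is exactly what the requirement forbids.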
So, that is that. But if you're still with me here, I am actually more concerned about whether your "client" objects are themselves thread-safe. Do they have any mutable state? Since you are caching them, it seems that initializing them must be slow. Be that as it may, if you use locks to make them thread-safe, you may not be gaining anything from caching and sharing them between threads. Your "multi-threaded" server may be reduced to what is effectively an unnecessarily complicated, single-threaded server.
If the client objects have no mutable state, good for you. You can be "free and easy" and share them between threads with no problems. If they do have mutable state, but initializing them is slow, then I would recommend caching one object per thread, so they are never shared. Thread[] is your friend there.
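The "one object per thread" idea can be sketched like this. It is shown in Python with threading.local, which plays roughly the role that thread-local storage plays in Ruby; make_client and client_for are hypothetical names, and make_client stands in for whatever slow construction the real client needs.

```python
import threading

_local = threading.local()  # each thread sees its own attributes on this object


def make_client(service_name):
    # Hypothetical stand-in for slow, stateful client construction.
    return {"service": service_name, "requests_sent": 0}


def client_for(service_name):
    """Return this thread's private client, creating it on first use."""
    cache = getattr(_local, "clients", None)
    if cache is None:
        cache = _local.clients = {}
    if service_name not in cache:
        cache[service_name] = make_client(service_name)
    return cache[service_name]


seen = {}


def worker(name):
    first = client_for("serviceABC")
    second = client_for("serviceABC")
    seen[name] = (first, first is second)


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread builds its client once and reuses it, and no client is ever visible to another thread, so the clients themselves need no locking. The cost is one initialization per thread rather than one per process.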

Specifying runtime behavior of a program

Are there any languages/extensions that allow the programmer to define general runtime behavior of a program during specific code segments?
Some garbage-collected languages let you modify the behavior of the GC at runtime. In Lua, for example, the collectgarbage function lets you do this. So you can stop the GC when you want to be sure that CPU resources aren't spent on garbage collection during a critical section of code (after which you start the GC again).
I'm looking for a general way to specify intended behavior of the program without resorting to specifying specific GC tweaks. I'm interested even in an on-paper sort of specification method (ie something a programmer would code toward, but not program syntax that would actually implement that behavior). The point would be that this could be used to specify critical sections of code that shouldn't be interrupted (latency dependent activity) or other intended attributes of certain codepaths (maximum time between an output and an input or two outputs, average running time, etc).
For example, this syntax might describe that maximum time latencyDependentStuff should take is 5 milliseconds:
requireMaxTime(5) {
latencyDependentStuff();
}
Has anyone seen anything like this anywhere before?

Is reading data in one thread while it is written to in another dangerous for the OS?

There is nothing in the way the program uses this data which will cause the program to crash if it reads the old value rather than the new value. It will get the new value at some point.
However, I am wondering if reading and writing at the same time from multiple threads can cause problems for the OS?
I am yet to see them if it does. The program is developed in Linux using pthreads.
I am not interested in being told how to use mutexes/semaphores/locks/etc. so that my program only gets the new values; that is not what I'm asking.
No, the OS should not have any problem. The typical problem is that you don't want to read the old values, or a value that is halfway updated and thus not valid (which may crash your app or, if the next value depends on the former one, give you a corrupted value so that you keep generating wrong values all the time). But if you don't care about that, the OS won't either.
Are the kernel/drivers reading that data for any reason (eg. it contains structures passed in to kernel APIs)? If no, then there isn't any issue with it, since the OS will never ever look at your hot memory.
Your own reads must ensure they are consistent, so that you don't read half of a value pre-update and half post-update and end up with a value that is neither the pre-update nor the post-update one.
There is no danger for the OS. Only your program's data integrity is at risk.
Imagine your data to consist of a set (structure) of values which cannot be updated in one atomic operation. The reading thread is bound to read inconsistent data at some point (data consisting of a mixture of old and new values). But you did not want to hear about mutexes...
Problems arise when multiple threads share access to data and accessing that data is not atomic. For example, imagine a struct with 10 interdependent fields. If one thread is writing and one is reading, the reading thread is likely to see a struct that is halfway between one state and another (for example, half of its members have been set).
If on the other hand the data can be read and written to with a single atomic operation, you will be fine. For example, imagine if there is a global variable that contains a count... One thread is incrementing it on some condition, and another is reading it and taking some action... In this case, there is really no intermediate inconsistent state. It's either got the new value, or it has the old value.
Logically, you can think of locking as a tool that lets you make arbitrary blocks of code atomic, at least as far as the other threads of execution are concerned.
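A small sketch of that last point, in Python rather than C with pthreads, and with a made-up Pair class whose two fields must stay consistent (b == 2 * a). Because both the write and the read take the same lock, the reader can only ever observe a state from before or after an update, never one in between.

```python
import threading


class Pair:
    """Two interdependent fields guarded by one lock (invariant: b == 2 * a)."""

    def __init__(self):
        self.a = 0
        self.b = 0
        self.lock = threading.Lock()

    def write(self, n):
        with self.lock:          # the two assignments become one atomic step
            self.a = n
            self.b = 2 * n

    def read(self):
        with self.lock:          # readers never see a half-finished write
            return self.a, self.b


pair = Pair()
violations = []


def writer():
    for n in range(1, 20001):
        pair.write(n)


def reader():
    for _ in range(20000):
        a, b = pair.read()
        if b != 2 * a:
            violations.append((a, b))


threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The reader may well see stale values, which the question says is acceptable, but it never sees a mixed one. Without the lock in read, a pair such as (a from the new update, b from the old one) becomes possible.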
