Haskell + Infinite lists = Hang

I am learning Haskell, have read a few references, and am working on various challenges (mainly on Codewars). At times I will attempt to generate an infinite list for some math algorithm and then pick from it (for example, get the first n numbers that match some pattern).
However, because my syntax isn't perfect, I often mix things up: while I mean to ask Haskell to define a (lazy) infinite list and pick the first 5 elements (or whatever), I actually end up asking it to do something with the full infinite list, and when I build and run it, the program just hangs.
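For example, the difference can be as small as this (a sketch, with even standing in for whatever condition the challenge asks for):
take 5 (filter even [1..])     -- lazy: evaluates only as much of the list as needed
length (filter even [1..])     -- hangs: tries to walk the entire infinite list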
I managed (once) to call up the Windows process manager, and what is happening is that when Visual Studio Code builds and runs the executable, the process grows extremely fast, absorbing all memory and processor until the computer becomes unresponsive.
Is there some kind of compiler flag that can prevent this?

As noted in the comments, you can run your executable with the -M RTS option, which allows you to specify a maximum heap size (the default is unlimited). That way, if your program tries to use more than that amount of memory, it will crash with an exception rather than consume all available RAM.
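For example (the file name and the 256 MB limit here are placeholders), compile with -rtsopts so that RTS flags are accepted at run time, then cap the heap when running:
ghc -rtsopts Main.hs
./Main +RTS -M256m -RTS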
Note that this won't help if your program is doing lots of processing but isn't actually trying to keep the results in RAM. E.g., if you try to print out the first item that matches a condition, but no item will ever match, in all likelihood your program will loop forever without actually consuming much RAM. In that case, you'll just have to kill it.
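For instance (a made-up example), this search runs forever in roughly constant space, so a heap limit never fires:
head (filter (< 0) [1..])     -- no number in [1..] is negative, so this never returns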
You might also try running your code in GHCi, where you can just press Ctrl+C to halt your code without killing GHCi itself.
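A session might look something like this (assuming GHCi's default prompt):
ghci> take 5 [x | x <- [1..], x `mod` 7 == 0]
[7,14,21,28,35]
ghci> length [x | x <- [1..], x `mod` 7 == 0]
^C Interrupted.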

Related

Celery chains: Is it necessary to wait before getting the results?

So, I have a chain of tasks in Python 3 that a Celery worker runs. Currently, I use the following piece of code to get and print the final result of the chain:
while not result.ready():
    pass
print(result.get())
I have run the code with and without the while-loop, and it seems that the while-loop is redundant.
My question is: "is it necessary to have that while-loop?"
If by redundant you mean that the code works fine without the while loop, then I would venture to say that the loop is not necessary. If, however, you get an error without the loop because you're trying to print something that doesn't exist yet, then you should keep it. This can be a problem, though: an empty while loop means you're checking the same variable as fast as your computer can physically manage, which tends to eat up your CPU. I recommend something like the following:
import time

t = 1  # the number of seconds to wait between checking if the result is ready
while not result.ready():
    time.sleep(t)
print(result.get())
You can set t to whatever makes sense. If the task you're running takes several hours, maybe set it to 60 and you'll see the result within a minute of it being ready. If you want the result faster, make the interval smaller. This will keep the program from dragging down the rest of your computer. However, if you don't mind your fans blowing and you absolutely need to know the moment the result is ready, ignore all of the above and leave your code the way it is :)

Detect infinite loops in a GHC program

In this example, action is an infinite loop created by mistake. Is there a way to detect such loops in a GHC program?
action bucket manager url = catch
    (action bucket manager url)
    (\(e :: HttpException) -> Logger.warn $ "Problems with " ++ url)
Short answer: no.
It certainly isn't possible to notice every infinite loop one could write down; this is the famous halting problem, and the proof that one cannot write a general loop-detecting program is so well known that there's even a Dr. Seuss version of it.
Of course, there's also an entire branch of computer science devoted to taking best-effort approaches to undecidable problems, and in theory we know a lot about ways to detect simple versions of such infinite loops. However, as far as I know nobody has done the engineering work needed to turn that theory into a tool that one can easily run on Haskell source.
I presume this is a follow-up to What is the format of GHC hp profile?.
In general there is no way to automatically detect every infinite loop. In practice, and not specific to GHC, I think it is commonly reasonable to detect them manually by looking at CPU and memory usage. This particular case should exhaust memory because it is not tail recursive (at least, I think that's the reason). Something like action x y z = action x y z won't allocate, and so would spin indefinitely, maxing the CPU with no increase in memory usage. It would be up to you to have an expectation of execution time and memory usage and to investigate any deviations.
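A minimal sketch of that contrast (the names spin and count are mine, not from the question):
-- Tail recursive and allocation-free: spins forever while memory stays flat.
spin :: Int -> Int
spin x = spin x

-- Not tail recursive: each call leaves a pending (1 +) behind, so the chain
-- of unfinished work grows until the runtime runs out of space.
count :: Int -> Int
count n = 1 + count (n + 1)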
I haven't tried this, but if you suspect an infinite loop, perhaps you could use the -xc RTS option (which requires the program to be compiled for profiling) and interrupt the execution to get a stack trace.

Infinite loop inside 'do_select' function of Linux kernel

I am surprised that the Linux kernel has an infinite loop in its 'do_select' function implementation. Is that normal practice?
Also, I am interested in how file-change monitoring is implemented in the Linux kernel. Is it an infinite loop again?
select.c source code
This is not an infinite loop; that term is reserved for loops with no exit condition at all. This loop has its exit condition in the middle: http://lxr.linux.no/#linux+v3.9/fs/select.c#L482 This is a very common idiom in C. It's called "loop and a half", and there's a simple pseudocode example here: https://stackoverflow.com/a/10767975/388520 which clearly illustrates why you would want to do this. (That question talks about Java, but that's not important; this is a general structured-programming idiom.)
I'm not a kernel expert, but this particular loop appears to have been written this way because the logic of the inner loop needs to run both before and after the call to poll_schedule_timeout at the very bottom of the outer loop. That code is checking whether there are any events to return; if there are already events to return when select is invoked, it's supposed to return immediately; if there aren't any initially, there will be when poll_schedule_timeout returns. So in normal operation the outer loop should cycle either 0.5 or 1.5 times. (There may be edge-case circumstances where the outer loop cycles more times than that.) I might have chosen to pull the inner loop out to its own function, but that might involve passing pointers to too many local variables around.
This is also not a spin loop, by which I mean, the CPU is not wasting electricity checking for events over and over again until one happens. If there are no events to report when control reaches the call to poll_schedule_timeout, that function (by, ultimately, calling __schedule) will cause the calling thread to block -- the CPU is taken away from that thread and assigned to another process that can do something useful with it. (If there are no processes that need the CPU, it'll be put into a low-power "halt" until the next interrupt fires.) When one of the events happens, or the timeout, the thread that called select will get "woken up" and poll_schedule_timeout will return.
On a larger note, operating system kernels often do things that would be considered strange, poor style, or even flat-out wrong in the service of other engineering goals (efficiency, code reuse, avoidance of race conditions that can only occur on some CPUs, ...). They are written by people who know exactly what they are doing and exactly how far they can get away with bending the rules. You can learn a lot from reading through OS code, but you probably shouldn't try to imitate it until you have a bit more experience. You wouldn't try to pastiche the style of James Joyce as your first exercise in creative writing, no? Same deal.

Error allocating an array

I have a subroutine which is called quite a lot during the program run. I try to use as many allocatable arrays as possible, and the subroutine is called several times without any problem, but at some point it terminates with:
malloc.c:3790: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
This happens at the beginning of the subroutine, when the first array is being allocated.
Using non-allocatable arrays instead, the subroutine survives several more calls but then terminates again, now with:
wait: 28674: Memory fault(coredump)
I assume that it terminates on the call itself, because I write out some values right after the variable declarations, before any computation.
The calling code:
do k = 1, kreise
  write(*,*) k
  call rundheit(n(k), kreis(k,1:n(k),3), kreis(k,1:n(k),2), outrnd)
end do
Here 'kreise' may have values of up to 1500. I printed out and checked the values of the passed parameters before the call, inside the subroutine, and after the call.
Limiting 'kreise' does avoid the problem, but limiting is not a practical solution: I need all the data to be evaluated, not a fraction of it.
Some notes to my environment:
My program is a subroutine compiled by FEM simulation software using the Intel Fortran compiler. As far as I know, I have no way to alter the compiler options, and I cannot compile my code on its own because it has too many dependencies on the subroutines provided by the FEM software.
I developed and ran this exact subroutine on another, much smaller and simpler simulation without any problems. The actual, 'bigger' simulation also runs without any problems as long as I don't use this particular subroutine. (The difference is mostly the node density and thus the amount of data being considered during the computation.) Other user subroutines work without problems. All this subroutine does is fetch the results between some of the increments, do some analyses, and write some reports, without altering the simulation.
I guess that the problem has something to do with memory handling, which I know nothing about.
Thanks.
UPDATE
I compiled the subroutine using -check all and found that the error occurs well before the blamed subroutine. Two arrays, one of them n(), are out of bounds on several occasions, but the error only becomes critical (or more critical) at the call. The strange part is that the index is some iterations beyond the bound when the error occurs. For example, here both arrays have the size (1:72), yet the call breaks somewhere between k=135 and k=267 (the lowest and highest values I found over several runs).
The problem is the integer kreise, whose value is set during a loop:
...
allocate(n(l))
allocate(pos(l))
...
do kreise = 1, l
  pos(kreise) = minvalX + (kreise-1)*IncX
  if (pos(kreise) .gt. maxvalX) exit
end do
Here kreise always ends up as l+1. Why?
NOTE: pos(kreise).gt.maxvalX should never be true. Becoming true isn't a problem in itself, although it would suggest that l was computed wrong (too big); the exit would then only save computation time later by reducing the iterations of several loops.
The program may be writing to memory that it shouldn't write to, corrupting the internal bookkeeping structures of malloc, which Fortran's allocate uses under the hood. I suggest using the Fortran option for run-time subscript checking: with ifort, try -check all or -check bounds.

Limiting work in progress of parallel operations of a streamed resource

I've found myself recently using the SemaphoreSlim class to limit the work in progress of a parallelisable operation on a (large) streamed resource:
// The code below is an example of the structure; some handling of tasks
// that do not run to completion is omitted and should be in production code.
SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
foreach (var result in StreamResults())
{
    semaphore.Wait();
    var task = DoWorkAsync(result).ContinueWith(t => semaphore.Release());
    ...
}
This is to avoid bringing too many results into memory and the program being unable to cope (generally evidenced by an OutOfMemoryException). Though the code works and is reasonably performant, it still feels ungainly, notably the someMagicNumber multiplier, which, although tuned via profiling, may not be as optimal as it could be and isn't resilient to changes in the implementation of DoWorkAsync.
In the same way that thread pooling can overcome the obstacle of scheduling many things for execution, I would like something that can overcome the obstacle of scheduling many things to be loaded into memory based on the resources that are available.
Since it is impossible to decide deterministically whether an OutOfMemoryException will occur, I appreciate that what I'm looking for may only be achievable via statistical means, or perhaps not at all, but I hope that I'm missing something.
Here I'd say that you're probably overthinking this problem. The consequences of overshooting are rather high (the program crashes), while the consequences of setting the limit too low are merely that the program might be slowed down. As long as you still have some buffer beyond a minimum value, further increases to the buffer will generally have little to no effect, unless the processing time of that task in the pipe is extraordinarily volatile.
If your buffer is constantly filling up, it generally means that the task before it in the pipe executes quite a bit quicker than the task that follows it, so even a fairly small buffer is likely to ensure the task following it always has some work. The buffer size needed to get 90% of the benefits of a buffer is usually going to be quite small (a few dozen items, maybe), whereas the size needed to get an OOM error is likely six or more orders of magnitude higher. As long as you're somewhere in between those two numbers (and that's a pretty big range to land in), you'll be just fine.
Just run your static tests, pick a static number, maybe add a few percent extra "just in case", and you should be good. At most, I'd move some of the magic numbers to a config file so that they can be altered without a recompile in the event that the input data or the machine specs change radically.
