In this example, action is an infinite loop created by mistake. Is there a way to detect such loops in a GHC program?
action bucket manager url = catch
    (action bucket manager url)
    (\(e :: HttpException) -> Logger.warn $ "Problems with " ++ url)
Short answer: no.
It certainly isn't possible to detect every infinite loop one could write down; this is the famous halting problem, and the proof that no general loop-detecting program can exist is so well known that there's even a Dr. Seuss-style version of it.
Of course, there's also an entire branch of computer science devoted to taking best-effort approaches to undecidable problems, and in theory we know a lot about ways to detect simple versions of such infinite loops. However, as far as I know nobody has done the engineering work needed to turn that theory into a tool that one can easily run on Haskell source.
I presume this is a follow-up to What is the format of GHC hp profile?.
In general there is no way to automatically detect every infinite loop. In practice, and not specific to GHC, it is usually reasonable to detect them manually by watching CPU and memory usage. This particular case should eventually exhaust memory because it is not tail recursive (at least, I think that's the reason). Something like action x y z = action x y z won't allocate, so it would spin indefinitely, maxing out the CPU with no increase in memory usage. It is up to you to have an expectation of execution time and memory usage and to investigate any deviations.
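For illustration, here is a minimal sketch contrasting those two failure modes (a toy module of my own, not the poster's code; the names spin and grow are made up for this example):

module Main where

-- Tail-recursive loop: compiles to a tight jump, so it pins one core at
-- 100% CPU while memory usage stays flat.
spin :: Int -> Int
spin n = spin n

-- Allocating loop: every step conses a fresh cell onto a list.
grow :: Int -> [Int]
grow n = n : grow (n + 1)

main :: IO ()
main = do
  let xs = grow 0
  print (length xs)   -- never finishes, and because xs is still needed below ...
  print (head xs)     -- ... the consumed cells cannot be collected, so the heap grows

Run as written, the program's memory usage climbs until the heap is exhausted; replace the body of main with print (spin 0) and you get the other symptom, a pegged CPU with flat memory.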
I haven't tried this, but if you suspect an infinite loop, perhaps you could use the -xc RTS option (it requires a profiling build) and interrupt the execution to get a cost-centre stack trace.
I am learning Haskell, have read a few references, and am working on various challenges (mainly on Codewars). At times I will attempt to generate an infinite list for some math algorithm and then pick from it (for example, take the first n numbers that match some pattern).
However, because my syntax isn't perfect, I often mix things up: I mean to ask Haskell to define an infinite list lazily and pick the first 5 elements (or whatever), but I end up actually asking it to do something with the full infinite list, and when I build and test it, the program just hangs.
I managed (once) to bring up the Windows process manager, and what happens is that when Visual Studio Code builds and runs the executable, it grows extremely fast, absorbing all memory and processor time until the computer becomes unresponsive.
Is there some kind of compiler flag that can prevent this?
As noted in the comments, you can run your executable with the -M RTS switch, which lets you specify a maximum heap size. (The default is unlimited.) That way, if your program tries to use more than that amount of memory, it will crash with a heap-overflow error rather than just consuming all available RAM.
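For instance, a minimal sketch (the file name Cap.hs and the 256 MB figure are arbitrary; the program must be compiled with -rtsopts so that RTS flags are accepted on the command line):

-- Build and run, assuming GHC:
--   ghc -rtsopts Cap.hs
--   ./Cap +RTS -M256m -RTS
--
-- Without the cap this eats all available RAM; with it, the RTS aborts the
-- program with a heap-overflow error once roughly 256 MB is in use.

module Main where

main :: IO ()
main = do
  let xs = [1 ..] :: [Integer]
  -- xs is used twice, so the prefix already consumed by sum cannot be
  -- garbage-collected while the pending length thunk still refers to it.
  print (sum xs + fromIntegral (length xs))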
Note that this won't help if your program is doing lots of processing but not trying to actually keep the results in RAM. E.g., if you try to print out the first item that matches a condition, but no item will ever match the condition, in all likelihood your program will loop forever, but not actually consume any RAM. In that case, you'll just have to kill it.
You might also try running your code in GHCi, where you can just jab Ctrl+C to halt your code without killing GHCi itself.
According to the documentation for Control.Parallel, one should make sure that the computation being sparked is non-trivial, so that creating the spark is cheaper than the computation itself.
This makes sense, but after listening to Bartosz Milewski talk about how cheap sparks are, I'm wondering how experienced Haskell programmers determine whether or not a computation is worthy of parallelism.
This subject is about facts, not opinions.
Please take note of a few facts about the actual overhead costs before reading on:
".. creating spark doesn't immediately wakeup idle capability, see here. By default scheduling interval is 20ms, so when you create a spark, it will take up to 20 ms to turn it to a real thread. By that time the calling thread most likely will already evaluate the thunk, and the spark will be either GC'd or fizzled.
By contrast, forkIO will immediately wake up an idle capability, if any. That is why explicit concurrency is more reliable than parallel strategies."
So remember to add up to 20 ms (and/or the benchmarked costs of a forkIO-spawned functional block) to the add-on overheads cited below, when working out the realistically achievable cost/benefit ( speedup ) formulae.
This problem was solved by Dr. Gene Amdahl many decades ago
More recent generations of CS students and practitioners just seem to have somehow forgotten the elementary process-scheduling logic ( i.e. the rules, not the art, of properly organising the flow of code execution over the system's limited physical resources ).
A fair objection may, and will, come from the nature of functional languages, where the lambda calculus can, and often does, open up a space hidden from imperative languages for smart, fine-grained parallelism derived directly from the laws of the lambda or pi calculi.
Yet the core message holds, and has held for more than 60 years.
A piece of quantitative, fair, evidence-based rationale is quite enough here: ( no magic, no hidden Art of whatever nature )
Please first do your best to fully understand the original formulation of Amdahl's Law, and kindly also review the recent criticism and the overhead-strict, resources-aware re-formulation of that original, generally valid, universal system-scheduling law.
The additions in the [ Criticism ] section were meant to match exactly what happens when someone approaches a piece of code with the idea of "re-organising" the computing graph and injecting into the process flow some either "just"-[CONCURRENT] or true-[PARALLEL] computing syntax-constructors ( whatever the actual code-execution tools are ).
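For reference, here is a sketch of both formulations in LaTeX notation; the symbols are mine, not quoted verbatim from the cited article: p is the fraction of the work that can be parallelised, N the number of processors, and o_S, o_T stand for the (normalised) add-on setup and termination overhead costs that the [ Criticism ] re-formulation insists on charging:

\[
S_{\text{classic}}(N) \;=\; \frac{1}{(1-p) + \frac{p}{N}}
\qquad\qquad
S_{\text{overhead-strict}}(N) \;=\; \frac{1}{(1-p) + o_{S} + \frac{p}{N} + o_{T}}
\]

The 20 ms scheduling latency and the benchmarked spawn costs discussed above all land in the o_S and o_T terms; once they are in, the achievable speedup stops looking like a free lunch.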
With the overhead-strict Amdahl's Law theory in hand, let's measure:
This part is easy and systematic: my students often rant about it, but going forward, each of them collects a body of hands-on experience of what it actually takes ( in costs ) to go into any form of a promise of parallelised code execution.
1 ) Create a NOP function: a function that does nothing at all except being run ( it takes no arguments, let alone voluminous ones, allocates not a single byte of memory during its ( empty ) code execution, and returns no value "back" ). This is an idealised NOP-function payload, ready to be spawned / sparked / distributed into execution by the parallelism tool of your choice.
With the NOP-fun ready, let's benchmark the pure overhead of executing such NOP-fun code in multiple instances and measure the time it takes.
Since all such instances were indeed doing nothing "there", the lump sum of time spent between the two timestamps is -- hooray -- the pure overhead cost of going parallel plus the overhead cost of collecting the processes back.
So simple, so easy.
Different tools differ in how much cost a user program will accrue, but both the metric and the methodology are crystal clear.
-- needs:  import Data.Time.Clock        ( getCurrentTime, diffUTCTime )
--         import Control.Monad.IO.Class ( liftIO )

CuT_start <- liftIO $ getCurrentTime -- a Code_Under_Test___START
-- I use a ZeroMQ Stopwatch() instance,
-- which has better-than-[us] resolution,
-- but haven't found it in a Haskell binding
--CuT_TIMING_CRITICAL_SECTION_/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
<_CuT_a_Code_syntax-constructor_Under_Test_> -- placeholder: the code under test goes here
--CuT_TIMING_CRITICAL_SECTION_/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
CuT_end   <- liftIO $ getCurrentTime -- a Code_Under_Test___END
liftIO $ print ( diffUTCTime CuT_end CuT_start )
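As a hedged example only (the name nopSparks and the spark count are mine, not from the text above), one possible instantiation of that <code-under-test> placeholder is to spark a large number of do-nothing payloads with Control.Parallel.par, so that the measured interval is dominated by pure spark-creation bookkeeping. Compile with -threaded, run with +RTS -N -s, and the SPARKS line of the -s summary also shows how many of them were converted, fizzled, or discarded as duds:

import Control.Parallel ( par, pseq )
import Data.Time.Clock  ( diffUTCTime, getCurrentTime )

-- The NOP payload: already a value, so each spark has nothing left to do
-- and its entire cost is the overhead of creating and retiring it.
nopSparks :: Int -> ()
nopSparks 0 = ()
nopSparks k = nop `par` ( nop `pseq` nopSparks ( k - 1 ) )
  where nop = ()

main :: IO ()
main = do
  t0 <- getCurrentTime                -- Code_Under_Test___START
  nopSparks 1000000 `pseq` return ()
  t1 <- getCurrentTime                -- Code_Under_Test___END
  print ( diffUTCTime t1 t0 )         -- the pure spawn / retire overhead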
2 ) Having registered the net cost of spawning / sparking the intended number of jobs, one may move forward with:
- adding "remote" memory allocations ( literally until swapping kills the O/S )
- adding "remote" CPU-bound processing ( again, until one smells the fatigue of the O/S kernel's efforts to schedule a still-feasible mapping onto hardware threads )
- adding "remote" process tests of the call-interface scaling dependencies ( the volume of data that needs to pass from caller to callee )
- adding "remote" process return value(s)
- adding "remote" process needs to access some shared resource
The final decision: Go / No-Go
All of these collected and recorded add-on costs simply increase the real-world overhead that has to be entered into the recent, re-formulated Amdahl's Law.
If and only if
the overhead-strict, resources-aware speedup result is >> 1.00, does it make sense to go into parallelised code execution.
In all cases where
the "improved" speedup is actually <= 1.00, it would indeed be a very bad idea to pay more than one receives from such an "improvement".
( A reversed formulation is always possible: derive the minimum amount of processing that will at least justify the systematically benchmarked costs of using a given type of parallelised-code syntax-constructor. )
Q.E.D.
Suppose I'm writing a program in node.js (or perhaps another typical back-end scripting language). Suppose further I have a C function f (or a python function, or what have you) that does some pure data transformation.
If I want to use f in my node program, there are two approaches:
Bind f via something like node-gyp that makes it callable from JavaScript land.
Make f into a binary (or, in the case of a language like python, a single f.py interface) that sits on the file system, and then call it from node as if it were any other system command (so that one can take the output from the system call as a string, convert it into node.js data, and then use it).
Question: What are the performance implications of choosing (2) over (1)?
This is important because if you are using a language like C to make some aspect of your application run significantly faster, then using (2) would seem pointless if it slowed things down past some threshold.
The cost of (1) is the cost of loading the native code, transferring arguments (FFI), calling the native code, and transferring the results back, with the loading done only once.
The cost of (2) is always going to be the cost of starting up the process, running it, and converting the results back from strings.
If the cost of f is high, you may never see a difference between (1) and (2). If the cost of f is low, then (2) will take longer, because the process startup overhead will dominate.
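To make that comparison concrete, here is a rough cost model in my own notation (not from the answer above): n is the number of calls, T_f the cost of running f itself, T_load the one-time cost of loading the addon, T_marshal the per-call argument/result conversion, T_start the child-process startup cost, and T_parse the cost of turning the child's string output back into JavaScript data:

\[
T_{(1)} \;\approx\; T_{\text{load}} + n\,\bigl(T_{\text{marshal}} + T_f\bigr)
\qquad\qquad
T_{(2)} \;\approx\; n\,\bigl(T_{\text{start}} + T_f + T_{\text{parse}}\bigr)
\]

Since T_start is typically orders of magnitude larger than T_marshal, option (2) only keeps up when T_f dominates everything else.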
However, even when f is complex (it might be a very large data-processing application in C), it's almost always faster to create a native binding as in (1). Avoiding process startup overhead matters, and it also reduces the total amount of memory needed to run your application.
Alternatively, you could go with a third option:
Have the C code talk over a local network socket, accepting requests and responding with answers when the computation is done.
This has the benefit of scaling out to multiple nodes if you need it.
Benchmarking both for your use case is the only way to be sure, but method (1) is likely to be faster.
The startup cost of invoking a binary and spinning up an interpreter for python/perl/whatever would likely wipe out any performance gain you'd otherwise keep by calling the code through a Foreign Function Interface (FFI). Startup cost is one of the reasons why Apache has mod_python and mod_perl, and why FastCGI exists.
Another thing to consider is that you're adding another language to the mix, and this might hurt the performance of the team, i.e. now everyone needs to know two languages, two FFI methods, etc. If your app is in Node, keep it in Node and use Node to call native methods.
I am surprised that the Linux kernel has an infinite loop in the 'do_select' function implementation. Is this normal practice?
I am also interested in how file-change monitoring is implemented in the Linux kernel. Is it an infinite loop again?
select.c source code
This is not an infinite loop; that term is reserved for loops with no exit condition at all. This loop has its exit condition in the middle: http://lxr.linux.no/#linux+v3.9/fs/select.c#L482 . This is a very common idiom in C. It's called "loop and a half", and there's a simple pseudocode example here: https://stackoverflow.com/a/10767975/388520 which clearly illustrates why you would want to do this. (That question talks about Java, but that's not important; this is a general structured-programming idiom.)
I'm not a kernel expert, but this particular loop appears to have been written this way because the logic of the inner loop needs to run both before and after the call to poll_schedule_timeout at the very bottom of the outer loop. That code is checking whether there are any events to return; if there are already events to return when select is invoked, it's supposed to return immediately; if there aren't any initially, there will be when poll_schedule_timeout returns. So in normal operation the outer loop should cycle either 0.5 or 1.5 times. (There may be edge-case circumstances where the outer loop cycles more times than that.) I might have chosen to pull the inner loop out to its own function, but that might involve passing pointers to too many local variables around.
This is also not a spin loop, by which I mean, the CPU is not wasting electricity checking for events over and over again until one happens. If there are no events to report when control reaches the call to poll_schedule_timeout, that function (by, ultimately, calling __schedule) will cause the calling thread to block -- the CPU is taken away from that thread and assigned to another process that can do something useful with it. (If there are no processes that need the CPU, it'll be put into a low-power "halt" until the next interrupt fires.) When one of the events happens, or the timeout, the thread that called select will get "woken up" and poll_schedule_timeout will return.
On a larger note, operating system kernels often do things that would be considered strange, poor style, or even flat-out wrong, in the service of other engineering goals (efficiency, code reuse, avoidance of race conditions that can only occur on some CPUs, ...). They are written by people who know exactly what they are doing and exactly how far they can get away with bending the rules. You can learn a lot from reading through OS code, but you probably shouldn't try to imitate it until you have a bit more experience. You wouldn't try to pastiche the style of James Joyce as your first exercise in creative writing, ne? Same deal.
I've been learning some Lua for game development. I had heard about coroutines in other languages but really only came across them in Lua. I just don't really understand how useful they are; I've heard a lot of talk about how they can be a way to do multi-threaded things, but don't they run in order? So what benefit would they have over normal functions, which also run in order? I'm just not getting how different they are from functions except that they can pause and let another run for a second. The use-case scenarios don't seem that significant to me.
Anyone care to shed some light on why someone would benefit from them?
Especially insight from a game programming perspective would be nice^^
OK, think in terms of game development.
Let's say you're doing a cutscene, or perhaps a tutorial. Either way, what you have is an ordered sequence of commands sent to some number of entities. An entity moves to a location, talks to a guy, then walks elsewhere. And so forth. Some commands cannot start until others have finished.
Now look back at how your game works. Every frame, it must process AI, collision tests, animation, rendering, and sound, among possibly other things. It can only "think" once per frame. So how do you put in this kind of code, where you have to wait for some action to complete before doing the next one?
If you built a system in C++, what you would have is something that ran before the AI. It would have a sequence of commands to process. Some of those commands would be instantaneous, like "tell entity X to go here" or "spawn entity Y here." Others would have to wait, such as "tell entity Z to go here and don't process any more commands until it has gone here." The command processor would have to be called every frame, and it would have to understand complex conditions like "entity is at location" and so forth.
In Lua, it would look like this:
local entityX = game:GetEntity("entityX");
entityX:GoToLocation(locX);
local entityY = game:SpawnEntity("entityY", locY);
local entityZ = game:GetEntity("entityZ");
entityZ:GoToLocation(locZ);
repeat
    coroutine.yield();              -- give control back to the engine for this frame
until (entityZ:isAtLocation(locZ)); -- when resumed next frame, check whether Z has arrived
return;
On the C++ side, you would resume this script once per frame until it is done. Once it returns, you know that the cutscene is over, so you can return control to the user.
Look at how simple that Lua logic is. It does exactly what it says it does. It's clear, obvious, and therefore very difficult to get wrong.
The power of coroutines is in being able to partially accomplish some task, wait for a condition to become true, then move on to the next task.
Coroutines in a game:
Easy to use, easy to screw up when used in many places.
Just be careful and don't use them in too many places.
Don't make your entire AI code dependent on coroutines.
Coroutines are good for making a quick fix when a state is introduced which did not exist before.
This is exactly what Java does with Sleep() and Wait().
Both functions are the best ways to make your game impossible to debug.
If I were you, I would completely avoid any code that has to use a Wait() function the way a coroutine does.
The OpenGL API is something you should take note of: it never uses a wait() function, but instead uses a clean state machine which knows exactly what state each object is in.
If you use coroutines, you end up with so many stateless pieces of code that it will almost surely be overwhelming to debug.
Coroutines are good when you are making an application like a text editor, a banking application, a server, a database, etc. (not a game).
They are bad when you are making a game where anything can happen at any point in time; there you need states.
So, in my view, coroutines are a bad way of programming and an excuse to write small stateless code.
But that's just me.
It's more like a religion. Some people believe in coroutines, some don't. The use case, the implementation, and the environment all together determine whether there is a benefit or not.
Don't trust benchmarks which try to prove that coroutines on a multicore CPU are faster than a loop in a single thread: it would be a shame if they were slower!
If this later runs on hardware where all cores are always under load, it will turn out to be slower - oops...
So there is no benefit per se.
Sometimes coroutines are convenient to use. But if you end up with tons of coroutines yielding, and with state that has gone out of scope, you'll curse coroutines. At least then it isn't the coroutine framework's fault; it's still yours.
We use them on a project I am working on. The main benefit for us is that sometimes with asynchronous code, there are points where it is important that certain parts are run in order because of some dependencies. If you use coroutines, you can force one process to wait for another process to complete. They aren't the only way to do this, but they can be a lot simpler than some other methods.
I'm just not getting how different they are from functions except that they can pause and let another run for a second.
That's a pretty important property. I worked on a game engine which used them for timing. For example, we had an engine that ran at 10 ticks a second, and you could call WaitTicks(x) to wait x number of ticks, while in the user layer you could call WaitFrames(x) to wait x frames.
Even professional native concurrency libraries use the same kind of yielding behaviour.
Lots of good examples here for game developers. I'll give another from the application-extension space. Consider the scenario where the application has an engine that can run a user's routines in Lua while doing the core functionality in C. If the user needs to wait for the engine to get to a specific state (e.g. waiting for data to be received), you have to do one of the following:
multi-thread the C program to run Lua in a separate thread and add in locking and synchronization methods,
abend the Lua routine and retry from the beginning, passing a state to the function so it can skip anything already done, lest you rerun some code that should only be run once, or
yield the Lua routine and resume it once the state has been reached in C
The third option is the easiest for me to implement, avoiding the need to handle multi-threading on multiple platforms. It also allows the user's code to run unmodified, appearing as if the function they called took a long time.