I'm really wondering why all source codes that implement a
pthread_mutex_lock never test its return value as defined :
documentation of pthread
even in books the examples don't test if the lock is in error, codes just do the lock.
Is there any reason I missed to let it untested ?
Basically, the only “interesting” error is EINVAL, which in most programs will only happen because of memory corruption, or, as I know from my own painful experience, during program shutdown after destructors have already destroyed some mutexes. The way I see it, the only reasonable response to such an error is to abort the program, which on the other hand is very inconvenient if the errors occur precisely because the program is already shutting down. Of course, this can be solved, but it’s not at all that simple, and not much is gained by it for most programs.
First off, I think "all source code" and "never test" are too strong. I think "some" and "often" would be more accurate.
In books, error checking code is often omitted for clarity of exposition.
As to real-world code, I guess the answer has to be that it is perceived that the likelihood of failure is very low. Whether this is a good assumption is debatable.
Related
As per the title, what is the difference between these two and when should I consider using one over the other?
There may or may not be a difference depending on your definition of what happens when a panic happens (defined in Cargo.toml). Depending on whether you have it set to unwind or abort, different things will happen:
With unwind, this will (as the name suggests) unwind the stack. With this, in particular, it is possible to get a full stack trace
With abort, you will only get the last callee
process::exit(), on the other hand, is a "clean" exit - you will not get a last callee, and you'll get a regular process exit status.
Due to this, you'll ideally want to keep to the following:
For planned shutdowns, use exit(). Do note that a known error is considered a planned shutdown
For unplanned shutdowns (i.e. exceptional failures) consider panic!(), as you'll both benefit from being able to get a stack trace when this happens, and the failure case should be exceptional enough that it is effectively unaccounted for and stems from an unplanned scenario
Afaik, a panic is never supposed to happen in a released program. It gives informations for developpers, but not anything user friendly. I'd say "use it for errors that should not happen in prod". There is probably behind something like an exit(101);
exit just terminates your process with the code you give to it. An exit(0) should mean "Everything is okay".
Usually an access violation terminates the program and I cannot catch a Win32 exception using try and catch. Is there a way I can keep my program running, even in case of an access violation? Preferably I would like to handle the exception and show to the user an access violation occurred.
EDIT: I want my program to be really robust, even against programming errors. The thing I really want to avoid is a program termination even at the cost of some corrupted state.
In Windows, this is called Structured Exception Handling (SEH). For details, see here:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms680657%28v=vs.85%29.aspx
In effect, you can register to get a callback when an exception happens. You can't do this to every exception for obvious reasons.
Using SEH, you can detect a lot of exceptions, access violations included, but not all (e.g. double stack fault). Even with the exceptions that are detectable, there is no way to ensure 100% stability after the exception. However, it may be enough to inform the user, log the error, send a message back to the server, and gracefully exit.
I will start by saying that your question contains a contradiction:
EDIT: I want my program to be really robust, ... The thing I really want to avoid is a program termination even at the cost of some corrupted state.
A program that keeps on limpin' in case of corrupted state isn't robust, it's a liability.
Second, an opinion of sorts. Regarding:
EDIT: I want my program to be really robust, even against programming errors. ...
When, by programming errors you mean all bugs, then this is impossible.
If by programming errors you mean: "programmer misused some API and I want error messages instead of a crash, then write all code with double checks built in: For example, always check all pointers for NULL before usage, even if "they cannot be NULL if the programmer didn't make a mistake", etc. (Oh, you might also consider not using C++ ;-)
But IMHO, some amount of program-crashing-no-matter-what bugs will have to be accepted in any C++ application. (Unless it's trivial or you test the hell out of it for military or medical use (even then ...).)
Others already mentioned SEH -- it's a "simple" matter of __try / __catch.
Maybe instead of trying to catch bugs inside the program, you could try to become friends with Windows Error Reporting (WER) -- I never pulled this, but as far as I understand, you can completely customize it via the OutOfProcessException... callback functions.
I have never seen a real use for checking if a file was closed correctly. I mean, if it didn't close, then what? You have nothing smart to do. Beside, I'm not sure if there's a real world use case, where non of the write/reads/flush will fail, and only the close will.
Does anyone actually uses the return value of close?
From the close(2):
Not checking the return value of close() is a common but nevertheless serious
programming error. It is quite possible that errors on a previous write(2)
operation are first reported at the final close(). Not checking the return
value when closing the file may lead to silent loss of data. This can
especially be observed with NFS and with disk quota.
And if you use signals in your application close may be interrupted (EINTR).
EDIT: That said, I seldom bother unless I'm prepared to handle such cases and write code that has to be 100% fool-proof.
I made changes in sched.c in Linux kernel 2.4 (homework), and now the system goes into kernel panic. The strange thing is: it seems to pass A LOT of booting checks and initializations, and panics only at the very end, showing hte following stack trace:
update_process_times
do_timer
timer_interrupt
handle_IRQ_event
do_IRQ
call_do_IRQ
do)wp_page
handle_mm_fault
do_page_fault
do_sigaction
sys_rt_sigaction
do_page_fault
error_code
And the error is: "In interrupt handler - not synching"
I know it's hard to tell without any code, but can anybody make an educated guess to point me in the right direction?
I can give you my own personal mantra when debugging kernel problems: "it's always your fault."
I often see issues due to overwriting memory outside where I'm working -- if I feed hardware an incorrect address for DMA for example. You may be screwing up a lock somehow; that seems possible in this case if you are seeing a timeout: a forgotten locked lock is causing a timeout to occur due to a hang.
To me, a panic in update_process_times might suggest a problem with the task struct pointer... but I really have no idea.
Keep in mind that things in the kernel often go wrong long before a failure occurs, so a wrong bit anywhere in your code may be to blame, even if it doesn't seem like it should have an effect. If you can, I recommend incrementally adding or removing your code and checking for the problem to see if you can isolate it.
What's a good way to leverage TDD to drive out thread-safe code? For example, say I have a factory method that utilizes lazy initialization to create only one instance of a class, and return it thereafter:
private TextLineEncoder textLineEncoder;
...
public ProtocolEncoder getEncoder() throws Exception {
if(textLineEncoder == null)
textLineEncoder = new TextLineEncoder();
return textLineEncoder;
}
Now, I want to write a test in good TDD fashion that forces me to make this code thread-safe. Specifically, when two threads call this method at the same time, I don't want to create two instances and discard one. This is easily done, but how can I write a test that makes me do it?
I'm asking this in Java, but the answer should be more broadly applicable.
You could inject a "provider" (a really simple factory) that is responsible for just this line:
textLineEncoder = new TextLineEncoder();
Then your test would inject a really slow implementation of the provider. That way the two threads in the test could more easily collide. You could go as far as have the first thread wait on a Semaphore that would be released by the second thread. Then success of the test would ensure that the waiting thread times out. By giving the first thread a head-start you can make sure that it's waiting before the second one releases.
It's hard, though possible - possibly harder than it's worth. Known solutions involve instrumenting the code under test. The discussion here, "Extreme Programming Challenge Fourteen" is worth sifting through.
In the book Clean Code there are some tips on how to test concurrent code. One tip that has helped me to find concurrency bugs, is running concurrently more tests than the CPU has cores.
In my project, running the tests takes about 2 seconds on my quad core machine. When I want to test the concurrent parts (there are some tests for that), I hold down in IntelliJ IDEA the hotkey for running all tests, until I see in the status bar that 20, 50 or 100 test runs are in execution. I follow in Windows Task Manager the CPU and memory usage, to find out when all the test runs have finished executing (memory usage goes up by 1-2 GB when they all are running and then slowly goes back down).
Then I close one by one all the test run output dialogs, and check that there were no failures. Sometimes there are failed tests or tests which are in deadlock, and then I investigate them until I find the bug and have fixed it. That has helped me to find a couple of nasty concurrency bugs. The most important thing, when facing an exception/deadlock that should not have happened, is always assuming that the code is broken, and investigating the reason ruthlessly and fixing it. There are no cosmic rays which cause programs to crash randomly - bugs in code cause programs to crash.
There are also frameworks such as http://www.alphaworks.ibm.com/tech/contest which use bytecode manipulation to force the code to do more thread switching, thus increasing the probability of making concurrency bugs visible.
When I test drove an implementation that needed to be thread safe recently I came up with the solution I provided as an answer for this question. Hope that helps even though there are no tests there. Hope link is OK raher than duplicating teh answer...
Chapter 12 of Java Concurrency in Practice is called "Testing Concurrent Programs". It documents testing for safety and liveness, but says this is a hard subject. I am not sure this problem is solvable by the tools of that chapter.
Just off the top of my head could you compare the instances returned to see if they are indeed the same instance or if they are different? That's probably where I would start with C#, I would imagine you can do the same in java