Code instrumentation in Haskell

Suppose I maintain a complex application connected to external systems. One day it starts returning unexpected results for certain input, and I need to find out why. It could be a DNS problem, a filesystem-related problem, a change in an external system, anything.
Assuming the amount of processing is extensive, before I can identify possible locations of the problem I need to obtain detailed traces that the original application does not produce.
How can I instrument existing code so that I can, for example, provide non-volatile proof (not a live debug session) that a certain component or function has a bug?

This sounds like more of an architecture/best practices type question than anything Haskell-specific, unless I'm misunderstanding something.
It sounds like your application needs to use a logging system, such as hslogger. The general approach is to have each component of your code create logging messages with an attached priority. You can then have the application handle different priority levels differently, so for example critical errors could be displayed on the console, while debug and info-level messages go to log files.
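The priority routing described above is not specific to hslogger. As a language-neutral sketch, here is the same idea with Python's standard `logging` module. The logger name `app.sketch` and the in-memory streams are assumptions made for illustration; a real application would more likely use `sys.stderr` and a `logging.FileHandler`.

```python
import io
import logging

def build_logger(console_stream, file_stream):
    """Route messages by priority: errors to the console, everything to the log file."""
    logger = logging.getLogger("app.sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.ERROR)    # only ERROR and CRITICAL reach the console

    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(logging.DEBUG)    # the log file receives every level

    fmt = logging.Formatter("%(levelname)s %(name)s: %(message)s")
    for handler in (console, logfile):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger

# In-memory streams stand in for the console and the log file.
console_out, file_out = io.StringIO(), io.StringIO()
log = build_logger(console_out, file_out)
log.debug("resolving host")
log.critical("upstream returned malformed reply")
```

The debug message ends up only in the "file" stream, while the critical one appears in both, so detailed traces can stay switched on without flooding the console.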
It's sometimes useful to use Debug.Trace.traceEvent and Debug.Trace.traceEventIO instead of a logging system, particularly if you suspect a concurrency issue, as the GHC eventlog also logs information about thread spawning/switching and garbage collection. But in general it's not a substitute for an actual logging framework.
Also, you may want to make use of assert as a sanity check that "impossible" conditions really don't occur.

Related

What is the best way to understand and analyze a multithreading code?

I'm not looking for programming techniques. My question is rather about what is the best way to understand a code developed by a third party.
I have a code for an application in a specific language (it could be C/C++, Java, etc.). This code uses several threads to control different processes. The application generates a log that shows all calls to relevant functions for each thread.
I have to analyze this code to understand its operation and be able to make an improvement of the algorithm. I worked little with threads, so I do not know which is the most convenient way to start the analysis and follow the execution of each thread.
Could you give me any recommendation?

If you are able to contact any of the code's original developers, having a conversation with them (by voice or by email) and asking them to describe how they intended things to work is always preferable to only trying to reverse-engineer their intent by looking at the code. If you can't contact the developers directly, then perhaps there is a library-specific developer's forum or other on-line resource where you can discuss the library's structure with people who have experience using/debugging it.
If that's not an option (or if you've done that and still don't feel like you understand things well enough), then I often find that profiling (either via a profiling tool, or just by temporarily putting printf() [or similar] tracing-calls into the codebase at various places and seeing what gets printed when) is a good way to find out which parts of the code are actually being used at which stages of the program's execution. That will help you confirm (or disprove) your theories about how the codebase works. Knowing where and when each thread is spawned, where its entry-function is, and where/when it gets joined again by its parent thread are particularly useful.
Finally, start looking at the various pieces of data (e.g. objects and member variables) each thread examines and/or modifies, and how access to each of those pieces of data is synchronized/serialized. Assuming the code isn't buggy, the critical sections of the codebase are good indicators of where inter-thread communication is happening.

threadscope functionality

Can programs be monitored while they are running (possibly piping the event log)? Or is it only possible to view event logs after execution? If the latter is the case, is there a deeper reason with respect to how the Haskell runtime works?
Edit: I don't know much about the runtime tbh, but given dflemstr's response, I was curious about how much and the ways in which performance is degraded by adding the event monitoring runtime option. I recall in RWH they mentioned that the rts has to add cost centres, but I wasn't completely sure about how expensive this sort of thing was.

The direct answer is that, no, it is not possible. And, no, there is no reason for that except that nobody has done the required legwork so far.
I think this would mainly be a matter of
Modifying ghc-events so it supports reading event logs chunk-wise and provide partial results. Maybe porting it over to attoparsec would help?
Threadscope would have to update its internal tree data structures as new data streams in.
Nothing too hard, but somebody would need to do it. I think I heard discussion about adding this feature already... So it might happen eventually.
Edit: And to make it clear, there's no real reason this would have to degrade performance beyond what you get with event log or cost centre profiling already.

If you want to monitor the performance of the application while it is running, you can for instance use the ekg package as described in this blog post. It isn't as detailed as ThreadScope, but it does the job for web services, for example.
To get live information about what the runtime is doing, you can use the dtrace program to capture dynamic events posted by some GHC runtime probes. How this is done is outlined in this wiki page. You can then use this information to put together a more coherent event log.

Any good strategies for dealing with 'not reproducible' bugs? [closed]

Very often you will get or submit bug reports for defects that are 'not reproducible'. They may be reproducible on your computer or software project, but not on a vendor's system. Or the user supplies steps to reproduce, but you can't see the defect locally. Many variations on this scenario of course, so to simplify I guess what I'm trying to learn is:
What is your company's policy towards 'not reproducible' bugs? Shelve them, close them, ignore? I occasionally see intermittent, non reproducible bugs in 3-rd party frameworks, and these are pretty much always closed instantly by the vendor... but they are real bugs.
Have you found any techniques that help in fixing these types of bugs? Usually what I do is get a system info report from the user, and steps to reproduce, then search on keywords, and try to see any sort of pattern.

Verify the steps used to produce the error
Oftentimes the people reporting the error, or the people reproducing the error, will do something wrong and not end up in the same state, even if they think they are. Try to walk it through with the reporting party. I've had a user INSIST that the admin privileges were not appearing correctly. I tried reproducing the error and was unable to. When we walked it through together, it turned out he was logging in as a regular user in that case.
Verify the system/environment used to produce the error
I've found many 'irreproducible' bugs and only later discovered that they ARE reproducible on Mac OS (10.4) Running X version of Safari. And this doesn't apply only to browsers and rendering, it can apply to anything; the other applications that are currently being run, whether or not the user is RDP or local, admin or user, etc... Make certain you get your environment as close to theirs as possible before calling it irreproducible.
Gather Screenshots and Logs
Once you have verified that the user is doing everything correctly and still getting a bug, and that you're doing exactly what they do, and you are NOT getting the bug, then it's time to see what you can actually do about it. Screenshots and logs are critical. You want to know exactly what it looks like, and exactly what was going on at the time.
It is possible that the logs could contain some information that you can reproduce on your system, and once you can reproduce the exact scenario, you might be able to coax the error out of hiding.
Screenshots also help with this, because you might discover that "X piece has loaded correctly, but it shouldn't have because it is dependent on Y" and that might give you a hint. Even if the user can describe what they were doing, a screenshot could help even more.
Gather step-by-step description from the user
It's very common to blame the users, and not trust anything that they say (because they call a 'usercontrol' a 'thingy') but even though they might not know the names of what they're seeing, they will still be able to describe some of the behaviour they are seeing. This includes some minor errors that may have occurred a few minutes BEFORE the real error occurred, or possibly slowness in certain things that are usually fast. All these things can be clues to help you narrow down which aspect is causing the error on their machine and not yours.
Try Alternate Approaches to produce the error
If all else fails, try looking at the section of code that is causing problems, and possibly refactor or use a workaround. If it is possible for you to create a scenario where you start with half the information already there (hopefully in UAT), ask the user to try that approach, and see if the error still occurs. Do your best to create alternate but similar approaches that put the error in a different light so that you can examine it better.

Short answer: Conduct a detailed code review on the suspected faulty code, with the aim of fixing any theoretical bugs, and adding code to monitor and log any future faults.
Long answer:
To give a real-world example from the embedded systems world: we make industrial equipment, containing custom electronics, and embedded software running on it.
A customer reported that a number of devices on a single site were experiencing the same fault at random intervals. Their symptoms were the same in each case, but they couldn't identify an obvious cause.
Obviously our first step was to try and reproduce the fault in the same device in our lab, but we were unable to do this.
So, instead, we circulated the suspected faulty code within the department, to try and get as many ideas and suggestions as possible. We then held a number of code review meetings to discuss these ideas, and determine a theory which: (a) explained the most likely cause of the faults observed in the field; (b) explained why we were unable to reproduce it; and (c) led to improvements we could make to the code to prevent the fault happening in the future.
In addition to the (theoretical) bug fixes, we also added monitoring and logging code, so if the fault were to occur again, we could extract useful data from the device in question.
To the best of my knowledge, this improved software was subsequently deployed on site, and appears to have been successful.

resolved "sterile" and "spooky"
We have two closed bug categories for this situation.
sterile - cannot reproduce.
spooky - it's acknowledged there is a problem, but it just appears intermittently, isn't quite understandable, and gives everyone a faint case of the creeps.

Error-reporting, log files, and stern demands to "Contact me immediately if this happens again."

If it happens in one context, and not in another, we try to enumerate the difference between both, and eliminate them.
Sometimes this works (e.g. other hardware, dual core vs. hyperthreading, laptop-disk vs. workstation disk, ...).
Sometimes it doesn't. If it's possible, we may start remote-debugging. If that doesn't help, we may try get our hands on the customer's system.
But of course, we don't write too many bugs in the first place :)

Well, you try your best to reproduce it, and if you can't, you take a long think and consider how such a problem might arise. If you still have no idea, then there's not much you can do about it.

Some of the new features in Visual Studio 2010 will help. See:
Historical Debugger and Test Impact Analysis in Visual Studio Team System 2010
Better Software Quality with Visual Studio Team System 2010
Manual Testing with Visual Studio Team System 2010

Sometimes the bug is not reproducible even in a pre-production environment that is the exact duplicate of the production environment. Concurrency issues are notorious for this.
Random Failures Are Often Concurrency Issues
Link: https://pragprog.com/tips/
The reason can be simply because of the Heisenberg effect, i.e. observation changes behaviour. Another reason can be because the chances are very small of hitting the combination of events that triggers the bug.
Sometimes you are lucky and you have audit logs that you can playback, greatly increasing the chances of recreating the issue. You can also stress the environment with high volumes of transactions. This effectively compresses time so that if the bug occurs say once a week, you may be able to reliably reproduce it in 1 day if you stress the system to 7 X the production load.
The last resort is whitebox testing where you go through the code line by line writing unit tests as you go.

I add logging to the exception handling code throughout the program. You need a method to collect the logs (users can email it, etc.)
Preemptive checks for code versions and sane environments are a good thing too. With the ease of software updates these days the code and environment the user is running has almost certainly not been tested. It didn't exist when you released your code.
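A minimal sketch of the two ideas above, logging inside the exception handling code and recording the environment the code actually ran in, might look like this in Python. The helper names `environment_snapshot` and `run_logged`, the logger name, and the version string `1.4.2` are all hypothetical; the in-memory stream stands in for whatever log collection method you use.

```python
import io
import logging
import platform

def environment_snapshot(app_version):
    """Collect the facts a bug report usually lacks."""
    return {
        "app_version": app_version,
        "python": platform.python_version(),
        "os": platform.system(),
        "machine": platform.machine(),
    }

def run_logged(task, logger, app_version):
    """Run task(); on failure, log the traceback together with the environment."""
    try:
        return task()
    except Exception:
        # logger.exception() appends the current traceback to the message.
        logger.exception("unhandled failure, env=%s", environment_snapshot(app_version))
        raise

report = io.StringIO()  # stands in for a log file the user can send back
logger = logging.getLogger("crash.sketch")
logger.handlers.clear()
logger.propagate = False
logger.addHandler(logging.StreamHandler(report))

try:
    run_logged(lambda: 1 / 0, logger, app_version="1.4.2")
except ZeroDivisionError:
    pass  # the failure is already recorded in the report
```

The report then carries both the stack trace and the version/OS details, which is often enough to spot an untested combination without a round trip to the user.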

With a web project I'm developing at the moment I'm doing something very similar to your technique. I'm building a page that I can direct users to in order to collect information such as their browser version and operating system. I'll also be collecting the app's registry info so I can have a look at what they've been doing.
This is a very real problem. I can only speak for web development, but I find users are rarely able to give me the basic information I would need to look into the issue. I suspect it's entirely possible to do something similar with other kinds of development. My plan is to keep working on this system to make it more and more useful.
But my policy is never to close a bug simply because I can't reproduce it, no matter how annoying it may be. And then there's the cases when it's not a bug, but the user has simply gotten confused. Which is a different type of bug I guess, but just as important.

You talk about problems that are reproducible but only on some systems. These are easy to handle:
First step: By using some sort of remote software, you let the customer tell you what to do to reproduce the problem on the system that has it. If this fails, then close it.
Second step: Try to reproduce the problem on another system. If this fails, make an exact copy of the customer's system.
Third step: If it still fails, you have no option other than to try to debug it on the customer's system.
Once you can reproduce it, you can fix it. Doesn't matter on what system.
The tricky issues are the truly non-reproducible ones, that is, things that happen only intermittently. For that I'll have to chime in with the reports, logs and stern demands attitude. :)

It is important to categorize such bugs (rarely reproducible) and act on them differently than bugs that are frequently reproducible based on specific user actions.
Clear issue description along with steps to reproduce and observed behavior: Unambiguous reporting helps the entire team understand the issue and eliminates incorrect conclusions. For example, a user reporting a blank screen is different from an HMI freeze on user action. The sequence of steps and the approximate timing of user actions are also important. Did the user select the option immediately after the screen transition, or wait for a few minutes? An interesting bug concerning timing is the car allergic to vanilla ice cream that baffled automotive engineers.
System config and startup parameters: Sometimes even hardware configuration and application software version (including drivers and firmware version) may do a trick or two. Mismatch of version or configuration can result in issues that are difficult to reproduce in other setups. Hence these are essential details to be captured. Most bug reporting tools have these details as mandatory parameters to report while logging an issue.
Extensive Logging: This is dependent on the logging facilities followed in concerned projects. While working with embedded Linux systems, we not only provide general diagnostic logs, but also system level logs like dmesg or top command logs. You may never know that wrong part is not the code flow but the abnormal memory usage/CPU usage. Identify the type of the issue and report the relevant logs for investigation.
Code Reviews and Walk-through: Dev teams cannot wait forever to reproduce these issues at their end and then take action. Bug report and available logs should be investigated and various possibilities be identified on this basis from design and code. If required, they should prepare hotfix on possible root causes and circulate the hotfix among teams including the tester who identified it to see if bug is reproducible with it.
Don't close these issues based on observation by a single tester/team after a fix is identified and checked in: Perhaps the most important part is the approach followed to close these issues. Once a fix has been checked in, all testing/validation teams at different locations should be informed so they can run intensive tests and identify any regression errors. Only when all of them (or, in practice, most of them) report the issue as non-reproducible should senior management carry out a closure assessment.

If it is not reproducible, get logs and screenshots of the exact steps to reproduce.

There's a nice new feature in Windows 7 that allows the user to record what they're doing and then send a report - it comes through as a doc with screen-shots of every stage. Hopefully it'll help in the cases where it's the user interacting with the application in an order that the developer wouldn't think of. I've seen plenty of bugs where it's just a case that the developer's logical way of using the app doesn't fit with how end users actually do it... resulting in lots of subtle errors.

Logging is your friend!
Generally what happens when we discover a bug that we can't reproduce is we either ask the customer to turn on more logging (if it's available), or we release a version with extra logging added around the area we are interested in. Generally speaking the logging we have is excellent and has the ability to be very verbose, and so releasing versions with extra logging doesn't happen often.
You should also consider the use of memory dumps (which IMO also falls under the umbrella of logging). Producing a minidump is so quick that it can usually be done on production servers, even under load (as long as the number of dumps being produced is low).
The way I see it: Being able to reproduce a problem is nice because it gives you an environment where you can debug, experiment and play around more freely, but reproducing a bug is by no means essential to debugging it! If the bug is only happening on someone else's system then you still need to diagnose and debug the problem in the same way; it's just that this time you need to be cleverer about how you do it.

The accepted answer is the best general approach. At a high level, it's worth weighing the importance of fixing the bug against what you could add as a feature or enhance that would benefit the user. Could a 'non-reproducible' bug take two days to fix? Could a feature be added in that time that gives users more benefit than that bug fix? Maybe the users would prefer the feature. I've been fixated at times as a developer on imperfections I can see, and then users are asked for feedback and none of them actually mention the bug(s) that I can see, but the software is missing a feature that they really want!
Sometimes, simple persistence in attempting to reproduce the bug whilst debugging can be the most effective approach. For this strategy to work, the bug needs to be 'intermittent' rather than completely 'non-reproducible'. If you can repeat a bug even one time in 10, and you have ideas about the most likely place it's occurring, you can place breakpoints at those points then doggedly attempt to repeat the bug and see exactly what's going on. I've experienced this to be more effective than logging in one or two cases (although logging would be my first go-to in general).

How to detect and debug multi-threading problems?

This is a follow up to this question, where I didn't get any input on this point. Here is the brief question:
Is it possible to detect and debug problems coming from multi-threaded code?
Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a somewhat nasty answer if I know that it is a multi-threading problem, but mostly I don't. How do I get to know that a problem is a multi-threading issue, and how do I debug it?
I'd like to know if there are any special logging frameworks, or debugging techniques, or code inspectors, or anything else to help solving such issues. General approaches are welcome. If any answer should be language related then keep it to .NET and Java.

Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid or at least minimize the probabilities. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized - that is, try to keep them self contained. Your best defense is a good design.
Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Given the trace (most trace tools do deadlock detection), it's easy to pinpoint the reason and then reason about the code as to why it happened and how to fix it. With deadlocks, it's always going to be a problem of acquiring the same locks in different orders.
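To illustrate what "getting a stack trace when deadlocked" can look like, here is a small Python sketch that dumps the stack of every live thread. It relies on `sys._current_frames()`, a CPython-specific function; the helper name `dump_all_thread_stacks` and the worker are hypothetical, and unlike a JVM thread dump this does no deadlock detection of its own.

```python
import sys
import threading
import traceback

def dump_all_thread_stacks():
    """Return a text snapshot of the call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"--- thread {names.get(ident, ident)} ---\n")
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)

started = threading.Event()
release = threading.Event()

def stuck_worker():
    started.set()
    release.wait()  # stands in for a lock that never becomes available

worker = threading.Thread(target=stuck_worker, name="stuck-worker", daemon=True)
worker.start()
started.wait(timeout=5)

snapshot = dump_all_thread_stacks()  # taken while the worker is blocked

release.set()  # let the worker finish so the example exits cleanly
worker.join(timeout=5)
```

Writing such a snapshot to a file when a watchdog notices no progress gives you the non-volatile evidence of which threads were waiting where.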
Live locks are harder - being able to observe the system while in the error state is your best bet there.
Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.
The more complex the system, the harder it is to find concurrency errors, and to reason about its behavior. Make use of tools like JVisualVM and remote connect profilers - they can be a life saver if you can connect to a system in an error state and inspect the threads and objects.
Also, beware the differences in possible behavior which are dependent on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPUs, others only on multi-cores.
One last thing: try to use concurrency objects distributed with the system libraries - e.g. in Java, java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.

I thought that the answer you got to your other question was pretty good. But I'll emphasize these points.
Only modify shared state in a critical section (Mutual Exclusion)
Acquire locks in a set order and release them in the opposite order.
Use pre-built abstractions whenever possible (Like the stuff in java.util.concurrent)
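The lock-ordering rule can be sketched as follows in Python. The `Account` class, its `rank` field and the `transfer` function are hypothetical names chosen for the illustration; the only point is that every thread acquires locks in the same global order and releases them in reverse.

```python
import threading
from contextlib import ExitStack

class Account:
    def __init__(self, rank, balance):
        self.rank = rank              # fixed position in the global lock order
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    """Move money while holding both locks, always acquired in rank order."""
    with ExitStack() as held:
        for account in sorted((src, dst), key=lambda a: a.rank):
            held.enter_context(account.lock)
        # Shared state is only modified inside the critical section.
        src.balance -= amount
        dst.balance += amount
    # ExitStack unwinds in LIFO order, so locks are released in reverse.

a, b = Account(1, 100), Account(2, 100)
workers = [
    threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(1000)]),
    threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(1000)]),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=10)
```

If `transfer` instead locked `src` first and `dst` second, the two threads above could each hold one lock and wait forever for the other; sorting by rank removes that possibility.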
Also, some analysis tools can detect some potential issues. For example, FindBugs can find some threading issues in Java programs. Such tools can't find all problems (they aren't silver bullets) but they can help.
As vanslly points out in a comment to this answer, studying well-placed logging output can also be very helpful, but beware of Heisenbugs.

For Java there is a verification tool called javapathfinder, which I find useful for debugging multi-threaded applications and verifying them against potential race conditions and deadlocks in the code.
It works fine with both the Eclipse and NetBeans IDEs.
[2019] the github repository
https://github.com/javapathfinder

Assuming I have reports of troubles that are hard to reproduce I always find these by reading code, preferably pair-code-reading, so you can discuss threading semantics/locking needs. When we do this based on a reported problem, I find we always nail one or more problems fairly quickly. I think it's also a fairly cheap technique to solve hard problems.
Sorry for not being able to tell you to press ctrl+shift+f13, but I don't think there's anything like that available. But just thinking about what the reported issue actually is usually gives a fairly strong sense of direction in the code, so you don't have to start at main().

In addition to the other good answers you already got: Always test on a machine with at least as many processors / processor cores as the customer uses, or as there are active threads in your program. Otherwise some multithreading bugs may be hard to impossible to reproduce.

Apart from crash dumps, a technique is extensive run-time logging: where each thread logs what it's doing.
The first question when an error is reported, then, might be, "Where's the log file?"
Sometimes you can see the problem in the log file: "This thread is detecting an illegal/unexpected state here ... and look, this other thread was doing that, just before and/or just after this."
If the log file doesn't say what's happening, then apologise to the customer, add sufficiently-many extra logging statements to the code, give the new code to the customer, and say that you'll fix it after it happens one more time.
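As a rough sketch of per-thread run-time logging, Python's standard `logging` module can stamp every record with the thread name and a timestamp, which is what makes the "this other thread was doing that just before" reading possible. The logger name, thread names and in-memory stream are assumptions for the example.

```python
import io
import logging
import threading

trace = io.StringIO()  # stands in for the log file the customer sends back
handler = logging.StreamHandler(trace)
# %(threadName)s tags each line with the thread that emitted it.
handler.setFormatter(logging.Formatter("%(asctime)s [%(threadName)s] %(message)s"))

log = logging.getLogger("threads.sketch")
log.handlers.clear()
log.propagate = False
log.setLevel(logging.DEBUG)
log.addHandler(handler)

def worker(job):
    log.debug("start job=%s", job)
    # ... the real work would happen here ...
    log.debug("done job=%s", job)

threads = [
    threading.Thread(target=worker, args=(n,), name=f"worker-{n}")
    for n in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The handlers in `logging` serialize their output internally, so lines from different threads do not interleave mid-line, although that same locking is a reason the observed timing may shift slightly.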

Sometimes, multithreaded solutions cannot be avoided. If there is a bug, it needs to be investigated in real time, which is nearly impossible with most tools like Visual Studio. The only practical solution is to write traces, although the tracing itself should:
not add any delay
not use any locking
be multithreading safe
trace what happened in the correct sequence.
This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:
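    using System.Threading;

    public class Tracer {
        public const int MaxMessages = 0x100;   // power of two, so the mask below works
        string[] messages = new string[MaxMessages];
        int messagesIndex = -1;

        // Lock-free: each caller claims its own slot via an atomic increment.
        public void Trace(string message) {
            int thisIndex = Interlocked.Increment(ref messagesIndex);
            // Wrap around so the index never runs past the end of the array.
            messages[thisIndex & (MaxMessages - 1)] = message;
        }
    }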
The method Trace() is multithreading safe, non blocking and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.
Add Trace() instructions wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.
A more detailed description of this approach, which also collects thread and timing information, recycles the buffer and outputs the trace nicely, can be found at:
CodeProject: Debugging multithreaded code in real time

A little chart with some techniques to keep in mind when debugging multithreaded code.
The chart is growing, please leave comments and tips to be added.
(update file at this link)

Visual Studio allows you to inspect the call stack of each thread, and you can switch between them. It is by no means enough to track all kinds of threading issues, but it is a start. A lot of improvements for multi-threaded debugging are planned for the upcoming VS2010.
I have used WinDbg + SoS for threading issues in .NET code. You can inspect locks (sync blocks), thread call stacks, etc.

Tess Ferrandez's blog has good examples of using WinDbg to debug deadlocks in .NET.

assert() is your friend for detecting race-conditions. Whenever you enter a critical section, assert that the invariant associated with it is true (that's what CS's are for). Though, unfortunately, the check might be expensive and thus not suitable for use in production environment.
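A minimal sketch of this idea in Python: check the invariant on entering and on leaving the critical section. The `Inventory` class and its invariant are hypothetical. Note that Python removes `assert` statements when run with `-O`, which matches the remark that such checks may be limited to non-production builds.

```python
import threading

class Inventory:
    def __init__(self, stock):
        self._total = stock
        self.stock = stock
        self.reserved = 0
        self._lock = threading.Lock()

    def _check_invariant(self):
        # Invariant guarded by the lock: nothing is lost or created.
        assert self.stock >= 0 and self.stock + self.reserved == self._total, (
            f"invariant broken: stock={self.stock} reserved={self.reserved}"
        )

    def reserve(self):
        with self._lock:
            self._check_invariant()      # on entering the critical section
            ok = self.stock > 0
            if ok:
                self.stock -= 1
                self.reserved += 1
            self._check_invariant()      # before leaving it
            return ok

inv = Inventory(stock=50)
threads = [
    threading.Thread(target=lambda: [inv.reserve() for _ in range(20)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If some other code path modified `stock` without taking the lock, the assertion would eventually fire close to the moment of corruption, rather than much later when the wrong total is noticed.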

I implemented the tool vmlens to detect race conditions in java programs during runtime. It implements an algorithm called eraser.

Develop code the way that Princess recommended for your other question (Immutable objects, and Erlang-style message passing). It will be easier to detect multi-threading problems, because the interactions between threads will be well defined.

I faced a threading issue that produced the SAME wrong result every time rather than behaving unpredictably, because the other conditions (memory, scheduler, processing load) were more or less the same on each run.
From my experience, I can say that HARDEST PART is to recognize that it is a thread issue, and BEST SOLUTION is to review the multi-threaded code carefully. Just by looking carefully at the thread code you should try to figure out what can go wrong. Other ways (thread dump, profiler etc) will come second to it.

Narrow down on the functions that are being called, and rule out what could and could not be to blame. When you find sections of code that you suspect may be causing the issue, add lots of detailed logging / tracing to it. Once the issue occurs again, inspect the logs to see how the code executed differently than it does in "baseline" situations.
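Comparing a failing run against a "baseline" run can be partly automated. The sketch below finds the first point where two traces diverge; the function name `first_divergence` and the sample trace lines are made up, and it assumes timestamps and other run-specific noise have already been stripped from the log lines.

```python
from itertools import zip_longest

def first_divergence(baseline, failing):
    """Return (index, expected, actual) for the first differing line, or None."""
    for index, (expected, actual) in enumerate(zip_longest(baseline, failing)):
        if expected != actual:
            return index, expected, actual
    return None

# Hypothetical traces from a good run and a bad run.
baseline = ["open config", "connect db", "load cache", "serve request"]
failing = ["open config", "connect db", "serve request"]

divergence = first_divergence(baseline, failing)
# divergence == (2, "load cache", "serve request")
```

Here the failing run skipped the cache load, which tells you where to add the next round of detailed logging.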
If you are using Visual Studio, you can also set breakpoints and use the Parallel Stacks window. Parallel Stacks is a huge help when debugging concurrent code, and will give you the ability to switch between threads to debug them independently. More info-
https://learn.microsoft.com/en-us/visualstudio/debugger/using-the-parallel-stacks-window?view=vs-2019
https://learn.microsoft.com/en-us/visualstudio/debugger/walkthrough-debugging-a-parallel-application?view=vs-2019

I'm using GNU gdb with a simple script:
$ more gdb_tracer
b func.cpp:2871
r
#c
while (1)
next
#step
end

The best thing I can think of is to stay away from multi-threaded code whenever possible. It seems there are very few programmers who can write bug-free multi-threaded applications, and I would argue that there are no coders who are able to write bug-free large multi-threaded applications.

Defensive programming [closed]

When writing code do you consciously program defensively to ensure high program quality and to avoid the possibility of your code being exploited maliciously, e.g. through buffer overflow exploits or code injection?
What's the "minimum" level of quality you'll always apply to your code?

In my line of work, our code has to be top quality.
So, we focus on two main things:
Testing
Code reviews
Those bring home the money.

Similar to abyx, on my team developers always use unit testing and code reviews. In addition to that, I make sure that I don't add code just because someone might use it one day. I write only the basic set of methods the object at hand needs to function as specified. I've found that adding methods that provide functionality but may never be used can unintentionally introduce a "backdoor" or an unanticipated use into the system.
It's much easier to go back later and add methods, attributes, and properties when they are asked for than to anticipate something that may never come.

I'd recommend being defensive about data that enters a "component" or framework. Within a "component" or framework, one should assume that the data is "correct".
Think of it like this: it is up to the caller to supply correct parameters; otherwise ALL functions and methods have to check every incoming parameter. If the check is done once, at the point where the data enters, it does not have to be repeated. A checked parameter can then be treated as "correct" and passed through to lower levels.
Always check data from external sources, users, etc.
A "component" or framework should always check incoming calls.
If there is a bug and a wrong value is used in a call, what is really the right thing to do? One only has an indication that the "data" the program is working on is wrong. Some like ASSERTs, while others want advanced error reporting and possibly error recovery. In any case, the data has been found to be faulty, and only in a few cases is it good to continue working on it. (Note that it's good if servers, at least, don't die.)
An image sent from a satellite might be a case for attempting advanced error recovery; for an image downloaded from the internet, putting up an error icon is enough.
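The "check once at the boundary, trust inside" idea can be sketched roughly as follows. This is only an illustration: the port-number scenario and the names tryParsePort and nextPort are made up for the example, not taken from any real component.

```cpp
#include <cassert>
#include <string>

// Hypothetical component that works with TCP port numbers (1..65535).

// Internal helper: trusts its caller. The assert documents the assumption
// and catches programming errors in debug builds; it is not input validation.
inline int nextPort(int port) {
    assert(port >= 1 && port <= 65535);
    return port == 65535 ? 1 : port + 1;
}

// Boundary function: the only place where external text is checked.
// Returns false (and leaves 'port' untouched) instead of letting bad
// data reach the lower levels.
inline bool tryParsePort(const std::string& text, int& port) {
    if (text.empty() || text.size() > 5) return false;  // at most 5 digits
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return false;  // digits only
        value = value * 10 + (c - '0');
    }
    if (value < 1 || value > 65535) return false;
    port = value;
    return true;
}
```

A caller would run tryParsePort on whatever comes from the user or the network, and only call nextPort (or any other internal function) when it returned true.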

I recommend people write code that is fascist in the development environment and benevolent in production.
During development you want to catch bad data/logic/code as early as possible to prevent problems either going unnoticed or resulting in later problems where the root cause is hard to track.
In production handle problems as gracefully as possible. If something really is a non-recoverable error then handle it and present that information to the user.
As an example, here's our code to normalize a vector. If you feed it bad data in development it will scream; in production it returns a safety value.
inline const Vector3 Normalize( Vector3arg vec )
{
    const float len = Length(vec);
    // Development builds: fail loudly on a zero-length vector.
    ASSERTMSG(len > 0.0f, "Invalid Normalization");
    // Production builds: return the input unchanged as a safe fallback.
    return len == 0.0f ? vec : vec / len;
}

I always work to prevent things like injection attacks. However, when you work on an intranet site, most of the security features feel like wasted effort. I still do them, maybe just not as well.

Well, there is a certain set of best practices for security. At a minimum, for database applications, you need to watch out for SQL Injection.
Other things, like hashing passwords and encrypting connection strings, are also standard.
From here on, it depends on the actual application.
Luckily, if you are working with a framework such as .NET, a lot of security protection comes built in.

I would say you always have to program defensively, even for internal apps, simply because users could, through sheer luck, enter something that breaks your app. Granted, you probably don't have to worry about them trying to cheat you out of money, but still. Always program defensively and assume the app will fail.

Using Test Driven Development certainly helps. You write a single component at a time and then enumerate all of the potential cases for inputs (via tests) before writing the code. This ensures that you've covered all bases and haven't written any cool code that no-one will use but might break.
Although I don't do anything formal, I generally spend some time looking at each class and ensuring that:
if they are in a valid state, they stay in a valid state
there is no way to construct them in an invalid state
under exceptional circumstances they fail as gracefully as possible (frequently this is a cleanup and throw)
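The three checks above might look something like this in practice. It is a minimal sketch: the Range class and its invariant (lo <= hi) are invented for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class: a closed integer range with the invariant lo <= hi.
class Range {
public:
    // The only constructor validates its arguments, so an invalid Range
    // can never be constructed.
    Range(int lo, int hi) : lo_(lo), hi_(hi) {
        if (lo > hi) throw std::invalid_argument("Range: lo must not exceed hi");
    }

    int lo() const { return lo_; }
    int hi() const { return hi_; }

    // Mutators check before changing anything, so a failed call leaves
    // the object in its previous valid state.
    void setLo(int lo) {
        if (lo > hi_) throw std::invalid_argument("Range: lo must not exceed hi");
        lo_ = lo;
    }

    bool contains(int x) const { return lo_ <= x && x <= hi_; }

private:
    int lo_;
    int hi_;
};
```

Because the check happens before the assignment in setLo, a rejected update throws and the object keeps its old bounds, which is the "cleanup and throw" style of failing gracefully.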

It depends.
If I am genuinely hacking something up for my own use then I will write the best code that I don't have to think about. Let the compiler be my friend for warnings etc. but I won't automatically create types for the hell of it.
The more likely the code is to be used, even occasionally, the more I ramp up the level of checks:
minimal magic numbers
better variable names
fully checked & defined array/string lengths
programming by contract assertions
null value checks
exceptions (depending upon context of the code)
basic explanatory comments
accessible usage documentation (if Perl etc.)

I'll take a different definition of defensive programming, as the one that's advocated by Effective Java by Josh Bloch. In the book, he talks about how to handle mutable objects that callers pass to your code (e.g., in setters), and mutable objects that you pass to callers (e.g., in getters).
For setters, make sure to clone any mutable objects, and store the clone. This way, callers cannot change the passed-in object after the fact to break your program's invariants.
For getters, either return an immutable view of your internal data, if the interface allows it; or else return a clone of the internal data.
When calling user-supplied callbacks with internal data, send in an immutable view or clone, as appropriate, unless you intend the callback to alter the data, in which case you have to validate it after the fact.
The take-home message is to make sure no outside code can hold an alias to any mutable objects that you use internally, so that you can maintain your invariants.
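The book's examples are in Java; the same idea can be sketched in C++, where the aliasing risk comes from pointers and references rather than object references. The Series class below is hypothetical and only illustrates the pattern.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical class whose invariant is "samples_ is never empty".
class Series {
public:
    // Copy the caller's data instead of keeping the shared pointer, so later
    // changes made through the caller's pointer cannot break the invariant.
    explicit Series(const std::shared_ptr<std::vector<int>>& samples)
        : samples_(samples ? *samples : std::vector<int>()) {
        if (samples_.empty()) throw std::invalid_argument("Series: no samples");
    }

    // Read-only view of the internal data: callers can look but not modify.
    const std::vector<int>& view() const { return samples_; }

    // Independent copy: callers may change it freely.
    std::vector<int> copy() const { return samples_; }

private:
    std::vector<int> samples_;  // owned exclusively by this object
};
```

The const reference returned by view() is the cheaper option but stays tied to the lifetime of the Series object; copy() costs an allocation and gives the caller data that is fully its own.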

I am very much of the opinion that correct programming will protect against these risks. That means things like avoiding deprecated functions, which (in the Microsoft C++ libraries at least) are commonly deprecated because of security vulnerabilities, and validating everything that crosses an external boundary.
Functions that are only called from your code should not require excessive parameter validation because you control the caller, that is, no external boundary is crossed. Functions called by other people's code should assume that the incoming parameters will be invalid and/or malicious at some point.
My approach to dealing with exposed functions is to simply crash out, with a helpful message if possible. If the caller can't get the parameters right then the problem is in their code and they should fix it, not you. (Obviously you have provided documentation for your function, since it is exposed.)
Code injection is only an issue if your application is able to elevate the current user. If a process can inject code into your application, then it could just as easily write the code to memory and execute it anyway. Without being able to gain full access to the system, code injection attacks are pointless. (This is why applications used by administrators should not be writable by lesser users.)

In my experience, positively employing defensive programming does not necessarily mean that you end up improving the quality of your code. Don't get me wrong, you need to defensively program to catch the kinds of problems that users will come across - users don't like it when your program crashes on them - but this is unlikely to make the code any easier to maintain, test, etc.
Several years ago, we made it policy to use assertions at all levels of our software and this - along with unit testing, code reviews, etc. plus our existing application test suites - had a significant, positive effect on the quality of our code.

Java, signed JARs and JAAS.
Use Java to prevent buffer overflow and pointer/stack-whacking exploits.
Don't use JNI (Java Native Interface); it exposes you to DLLs/shared libraries.
Use signed JARs to stop class loading from being a security problem.
JAAS can let your application not trust anyone, even itself.
J2EE has (admittedly limited) built-in support for role-based security.
There is some overhead for some of this, but the security holes go away.

Simple answer: It depends.
Too much defensive coding can cause major performance issues.
