Can feature engineering cause collinearity? - statistics

Recently while doing an analysis on a practice project I noticed some features were too highly correlated after I engineered them. To my understanding, if features are too highly correlated, they might hinder a model’s performance (since they look more like dependent than independent variables).
Since learning about feature engineering, I’ve had the intuition somehow there’s a way feature engineering can cause collinearity between the created variables and the existing ones if certain principles aren’t considered. But I haven’t been able to identify under which circumstances.
The background or question itself might be wrong, but: is there a way feature engineering could cause collinearity? If so, how can this happen and how to avoid it?

Related

Using built in functions

I am developing a Windows Form Application in C#.I have heard that one should not use built in methods and functions in code since hackers have deep understanding of such built in methods and know how to fail them Instead one should always use his/her own functions and methods and if not then call built in functions intelligently from those newly made functions.How much is that true?
A supporting example in favour of my argument is that I have seen developer always develope there own made encryption algorithm like AES,DES,RC4 and Hash functions since they believe that built in encryption algorithm have many times backdoor in them.
What?! No, no, no! Whoever told you this is just wrong.
There is a common fallacy that published source code is more vulnerable to "h4ckerz" because it is available for anyone to spot the flaws in. However, I'm glad you mentioned crypto, because this is an area where this line of reasoning really stands out as the fallacy it is.
One of the most popular questions of all time on https://security.stackexchange.com/ is about a developer (in the OP he was given the pseudonym "Dave") who shared this fear of published code. Dave, like the developer you saw, was trying to homebrew his own encryption algorithm. Here's one of the most popular comments in that thread:
Dave has a fundamentally false premise, that the security of an algorithm relies on (even partially) its obscurity - that's not the case. The security of a hashing algorithm relies on the limits of our understanding of mathematics, and, to a lesser extent, the hardware ability to brute-force it. Once Dave accepts this reality (and it really is reality, read the Wikipedia article on hashing), it's a question of who is smarter - Dave by himself, or a large group of specialists devoted to this very particular problem. (emphasis added)
As a matter of fact, as it stands now, the top two memes on Security.SE are "Don't roll your own" and "Don't be a Dave".
While this has all been about crypto, this applies in general to most open-source software. The chance that a backdoor will get found and fixed goes up with each new set of eyes laid on the code. This should be a simple and uncontroversial premise: the more people are looking for something, the higher the chance it will be found. Yes, this applies to malicious users looking for exploits. However, it also applies to power users, white hat hackers, security researchers, cryptographers, professional developers, and others working for "good", which generally (hopefully) outnumber those working for "evil". This also implicitly relies on the false premise that hackers need to see the source code to do bad things. This should be obviously false based on the sheer number of proprietary systems whose source code has never been published (various Microsoft and Adobe programs come to mind) which have been inundated with vulnerabilities for years. Maybe having source code to read makes the hacker's job easier, but maybe not -- is it easier to pore over source code looking for an attack vector or to just use scanning tools and scripts against a compiled binary?
tl;dr Don't be a Dave. Rolling your own means you have to be the best at everything to succeed, instead of taking a sampling of the best the community has to offer.
Heartbleed
In your comment, you rebut:
Then why was the Heartbleed bug in openSSL not found and corrected [earlier]?
Because no one was looking at it. That's the sad truth. Here's the difference -- what happened once someone did find it? Now tens of thousands of security researchers, crypto experts, and others are looking at it. Suppose the same kind of vulnerability existed in one of the proprietary products I mentioned earlier, which it very well could. Once it's caught (if it's caught), ask yourself:
Could the team of programmers at the company responsible benefit from the help of the entire worldwide community of security experts, cryptographers, and other analysts right now?
If a bug this critical were discovered (and that's a big if!) in your software, would you be prepared to deal with the fallout caused by your custom implementation?
Unless you know of specific failure modes or weaknesses of the built-in methods your application would use and know how to minimize or eliminate them, it is probably better to use the methods provided by the language or library designers, which will often be both more efficient and more secure than what an average programmer would come up with on the fly for a particular project.
Your example absolutely does not support your view: developing your own encryption algorithm without some serious background in the domain and review by cryptanalysts, and then employing it in security-critical code, is a recipe for disaster. Even developing your own custom implementation of an industry standard encryption algorithm can present problems, and almost certainly will if you are inexperienced at it.

Is non-programmatic software development feasible? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I'm currently faced with a very unusual design problem, and hope that a developer wiser than myself might be able to offer some insight.
Background
Without being too specific, I've been hired by a non-profit organisation to assist with the redevelopment of their legacy, but very valuable (in terms of social value) software. The development team is unlike any I've encountered in my time as a software developer, and is comprised of a small number of developers and a larger group of non-programming domain experts. What makes the arrangement unusual is that the domain experts (lets call them content creators), use custom tooling, some of which is based around a prolog expert system engine, to develop web based software components/forms.
The Problem
The system uses a very awkward postback model to perform logical operations server side and return new forms/results. It is slow, and prone to failure. Simple things, like creating html forms using the existing tooling is much more arduous than it should be. As the demand for a more interactive, and performant experience grows, the software developers are finding increasingly that they have to circumvent the expert system/visual tooling used by the content creators, and write new components by hand in javascript. The content creators feel increasingly that their hands are tied, as they are now unable to contribute new components.
Design approach: Traditional/Typical
I have been advocating for the complete abandonment of the previous model and the adoption of a typical software development process. As mentioned earlier, the project has naturally evolved towards this as the non-programmatic development tooling has become incapable of meeting the needs of the business.
The content creators have a very valuable contribution to make however, and I would like to see them focusing on formally specifying the expected behaviour of the software with tools like Cucumber, instead of being involved in implementation.
Design approach: Non-programmatic
My co-worker, who I respect a great deal and suspect is far more knowledgeable than me, feels that the existing process is fine and that we just need to build better tooling. I can't help but feel however that there is something fundamentally flawed with this approach. I have yet to find one instance, either historical or contemporary where this model of software development has been successful. COBOL was developed with the philosophy of allow business people/domain experts to write applications without the need for a programmer, and to my mind all this did was create a new kind of programmer - the COBOL programmer. If it was possible to develop effective systems allowing non-programmers to create non-trivial applications, surely the demand for programmers would be much lower? The only frameworks that I am aware of that roughly fit this model are SAP's Smart Forms and Microsoft's Dynamix AX - both of which are very domain specific ERP systems.
DSLs, Templating Languages
Something of a compromise between the two concepts would be to implement some kind of DSL as a templating language. I'm not even sure that this would be successful however, as all of the content creators, with one exception, are completely non technical.
I've also considered building a custom IDE based on Visual Studio or Net Beans with graphical/toolbox style tooling.
Thoughts?
Is non-programmatic development a fools errand? Will this always result in something unsatisfactory, requiring hands on development from a programmer?
Many thanks if you've taken the time to read this, and I'd certainly appreciate any feedback.
You say:
Something of a compromise between the
two concepts would be to implement
some kind of DSL as a templating
language. I'm not even sure that this
would be successful however, as all of
the content creators, with one
exception, are completely non
technical.
Honestly, this sounds like exactly the approach I would use. Even "non-technical" users can become proficient (enough) in a simple DSL or templating language to get useful work done.
For example, I do a lot of work with scientific modeling software. Many modelers, while being much more at home with the science than with any form of engineering, have been forced to learn one or more programming languages in order to express their ideas in a way they can use. Heck, as far as I know, Fortran is still a required course in order to get a Meteorology degree, since all the major weather models currently in use are written in Fortran.
As a result, there is a certain community of "scientific programming" which is mostly filled with domain experts with relatively little formal software engineering training, expertise, or even interest. These people are more at home with languages/platforms like Matlab, R, and even Visual Basic (since they can use it to script applications like Excel and ESRI ArcMap). Recently, I've seen Python gaining ground in this space as well, mainly I think because it's relatively easy to learn.
I guess my point is that I see strong parallels between this field and your example. If your domain experts are capable of thinking rigorously about their problems (and this may not be the case, but your question is open-ended enough that it might be) then they are definitely capable of expressing their ideas in an appropriate domain-specific language.
I would start by discussing with the content creators some ideas about how they would like to express their decisions and choices. My guess would be that they would be happy to write "code" (even if you don't have to call it code) to describe what they want. Give them a "debugger" (a tool to interactively explore the consequences of their "code" changes) and some nice "IDE" support application, and I think you'll have a very workable solution.
Think of spreadsheets.
Spreadsheets are a simple system that allows non-technical users to make use of a computer's calculation abilities. In doing so, they have opened up computers to solve a great number of tasks which normally would have required custom software developed to solve them. So, yes non-programmatic software development is possible.
On the other hand, look at spreadsheets. Despite their calculational abilities you really would not want as a programmer to have to develop software with them. In the end, many of the techniques that make programming languages better for programmers make them worse for the general population. The ability to define a function, for example, makes a programmer's life much easier, but I think would confuse most others.
Additionally, past a certain point of complexity trying to use a spreadsheet would be a real pain. The spreadsheet works well within the realm for which it was designed. Once you stray too far out that, its just not workable. And again, its the very tools programmers use to deal with complexity which will prevent a system being both widely usable and sufficiently powerful.
I think that for any given problem area, you could develop a system that allows the experts to specify a solution. It will be much harder to develop that system then to solve the problem in the first place. However, if you repeatedly have similiar problems which the experts can develop solutions for, then it might be worthwhile.
I think development by non-developers is doomed to failure. It's difficult enough when developers try it. What's the going failure rate? 50% or higher?
My advice would be to either buy the closest commercial product you can find or hire somebody to help you develop a custom solution with your non-developer maintenance characteristics in mind.
Being a developer means keeping a million details in mind at once and caring about details like version control, deployment, testing, etc. Most people who don't care about those things quickly tire of the complexity.
By all means involve the domain experts. But don't saddle them with development and maintenance as well.
You could be putting your organization at risk with a poorly done solution. If it's important, do it right.
I don't believe any extensive non-programmer solution is going to work. Programming is more than language, it's knowing how to do things reasonably. Something designed to be non-programmer friendly will still almost certainly contain all the pitfalls a programmer knows to avoid even if it's expressed in English or a GUI.
I think what's needed in a case like this is to have the content creators worry about making content and an actual programmer translate that into reasonable computer code.
I have worked with two ERP systems that were meant for non-programmers and in both cases you could make just about every mistake in the book with them.
... Simple things, like creating html forms using the existing tooling is much more arduous than it should be...
More arduous for whom? You're taking a development model that works (however badly) for the non-programming content creators, and because something is arduous for someone you propose to replace that with a model where the content creators are left out in the cold entirely? Sounds crazy to me.
If your content creators can learn custom tooling built around a Prolog rules engine, then they have shown they can learn enough formalism to contribute to the project. If you think other aspects of the development need to be changed, I see only two honorable choices:
Implement the existing formalism ("custom tooling") using the new technology that you think will make things better in other ways. The content creators contribute exactly as they do now.
Design and implement a domain-specific language that handles the impedance mismatch between what your content creators know and can do and the way you and other developers think the work should be done.
Your scenario is a classic case where a domain-specific language is appropriate. But language design is not easy, especially when combined with serious usability questions. If you are lucky you will be able to hire someone to help you who is expert in both language design and usability. But if you are nonprofit, you probably don't have the budget. In this case one possibility is to look for help from another nonprofit—a nearby university, if you have one.
I'd advise you to read this article before attempting to scrap the whole system. I look at it this way. What changed to prompt the redevelopment? Your domain experts haven't forgotten how to use the original system, so you already have some competent "COBOL programmers" for your domain. From your description, it sounds like mostly the performance requirements have changed, and possibly a greater need for web forms.
Therefore, the desired solution isn't to change the responsibilities of the domain experts, it's to increase the performance and make it easier to create web forms. You have the advantage of an existing code base showing exactly what your domain experts are capable of. It would be a real shame not to use it.
I realize Prolog isn't exactly the hottest language around, but there are faster and slower implementations. Some implementations are designed mostly for programmer interactivity and are dynamically interpreted. Some implementations create optimized compiled native code. There are also complex logical programming techniques like memoization that can be used to enhance performance, but probably no one learns them in school anymore. A flow where content creators focus on creating new content and developers focus on optimization could be very workable. Also, Prolog is ideally suited for the model layer, but not so much for the view layer. Moving more of your view layer to a different technology could really pay off.
In general though, 2 thoughts:
You cannot reduce life to algorithms. Everything we know (philosophically, scientifically) and experience demonstrates this. (Sorry, Dr. Minsky).
That said, a Domain-Specific tool that allows non-programmers to express a finite language is definitely doable as several people have given examples. Another example of this type of system is Mathematica and especially Simulink which are used very successfuly over a range of applications. However, the failure of Expert Systems, Fuzzy logic, and Japan's Fifth Generation computer project of the 80's to take-off demonstrate the difficulty in doing this.
Labview is a very successful none programatic programming environment.
What an interesting problem.
I would have to ultimately agree with you, and disagree with your colleague.
The philosophy and approach of Domain Driven Development/Design exists exactly for your purposes, in that it puts paramount importance on the specific knowledge of the experienced domain experts, and on communicating that knowledge to talented software developers.
See, in your issue, there are two distinct things. The domain, and the software. The domain should be understood and specified first and foremost without software development in mind.
And then the transformation to software happens between the communication between domain experts and programmers.
I think trying to build "programming" tools for domain experts is a waste of time.
In Domain Driven Development your domain experts will continue to be important, and you'll end up with better software.
In your colleague's approach you're trying to replace programmers with tool.... maybe in the future, like, start trek future, that will be possible, but today I don't think so.
I am currently struggling with a similar problem in trying to enable healthcare providers to write rules for workflows, which isn't easy because they aren't programmers. You're a programmer not because you went to programming school -- you're a programmer because you think like a programmer. Fortunately, most hospitals have some anesthesiologist or biomedical engineer who thinks like a programmer and can manage to program. The key is to give the non-programmers-who-think-like-programmers a language that they can learn and master.
In my case, I want doctors to be able to formulate simple rules, such as: "If a patient's temperature gets too low, send their doctor a text message". Of course it's more complicated than that because the definition of "too low" depends on the age of the patient, a patient may have many doctors, and so on, but a smart doctor will be able to figure out those rules. The real issue is that the temperature sensor will often fall off the patient and read ambient temperature, meaning that the rule is useless unless you can figure out how to determine that the thermometer is actually reading the patient's temperature.
The big problem I have is that, while it's relatively easy to create a DSL so that doctors can express IF [temperature] [less-than] [lookup-table [age]] THEN [send-text-message], it's much harder to create one that can express all the different heuristics you might need to try before coming up with the right way to make sure the reading is valid.
In your case, you may want to consider how VB became popular: It has a form drawing tool that anybody can easily use to draw forms and set properties on form items. Since not everything can be specified by form properties and data binding, there's a code-behind mode that lets you do complex logic. But to make the tool accessible to beginning programmers, the language is BASIC, so users didn't have to learn about pointers, memory allocation, or data structures.
While you probably wouldn't want to give your users VB, you might consider a hybrid approach. You would have one "language" (it could be graphical, like VB's form designer or Labview) where inexperienced users can easily do the simple stuff, and another language to enable expert users to do the complicated stuff.
I had this as a comment previously but I figured it deserved more merit.
There are definitely a number of successful 'non-programmatic' tools around, off hand I can think of Labview, VPL and graph based (edit: I just noticed this link has more far more than just shaders on it) shaders which are prevalent in 3D applications.
Having said that, I don't know of any which are suited to web based dev (which appears to be your case).
I dout very much the investment on developing such tools would be worth it (unless maybe you could move to sell it as a product as well).
I agree with you - non-technical people will not be able to program anything non-trivial.
Some products try to create what's basically a really simple programming language. The problem is that programming is an aptitude as well as a skill. It takes a certain kind of mindset to think in the sort of logic used by computers, which just can't be abstracted away by any programming language (at least not without without making assumptions that it can't safely make).
I've seen this in action with business people trying to construct workflows in MS Dynamics CRM. Even though the product was clearly intended to allow them to do it without a programmer they just couldn't figure out how to make a loop or an if-else condition work, no matter how "friendly" the UI tried to make it. I watched in amazement as they struggled with something that seemed completely self-obvious to me. After a full day (!) of this they managed to produce a couple of very basic workflows that worked in some cases, but didn't handle edge cases like missing values or invalid data. It was basically a complete waste of time.
Granted, Dynamics CRM isn't exactly the epitome of user-friendliness, but I saw enough to convince me that this is, indeed, a fool's errand.
Now, if your users are not programmers, but still technical people they might be able to learn programming, but that's another story - they've really become "new programmers" rather than "non-programmers" then.
This is a pretty philosophical topic and difficult to answer for every case, but in general...
Is non-programmatic development a fools errand?
Outside of a very narrow scope, yes. Major software vendors have invested billions over the years in creating various packages to try to let non-technical users create & define workflows and processes with limited success. Your best bet is to take advantage of what has been done in that space rather than trying to re-invent it.
Edit:
Sharepoint, InfoPath, some SAP stuff are the examples I'm talking about. As I said above, "a very narrow scope". It's possible to let non-programmers create workflows, complex forms, some domain processes, but that's it. Anything more general-purpose is simply trying to make non-programmers into programmers by giving them very crude tools.
Non programatic software development IS feasable, as long as you are realistic about what non-developers can reasonaby achieve - its all about a compromise between capability and ease of use.
The key is to break the requirements down cleanly into things that the domain experts need to be able to do, and spend time implementing those features in a foolproof way - The classic mistake is to try to let the system to do too much.
For example suppose you want domain experts to be able to create a form with a masked text input:
Most developers will look at that requirement and create a fancy control which accepts some sort of regular expression and lets the domain expert do anything.
This is the classic developer way of looking at things, however it's likely that your domain expert does not understand regular expressions and the developer has missed the point of the reqirement which was for the domain expert to be able to create this form.
A better solution might have been to create a control that can be confgiured to mask either Email addresses or telephone numbers.
Yes this control is far less capable than the first control, and yes the domain experts have to ask developers to extend it when they want to be able to mask to car registration numbers, however the domain experts are able to use it.
It seems that the problem is of organizational nature and cannot be solved by technical means alone.
The root is that content creators are completely non-technical, yet have to perform inherently technical tasks of designing forms and writing Prolog rules. Various designers and DSLs can alleviate their problems, but never solve them.
Either reorganize system and processes so knowledge carriers actually enter knowledge - nothing else; or train content creators to perform necessary development with existing tools or may be DSLs.
Non-programmatic development can save from low-level chores, but striving to set up system once and let users indefinitely and unrestrictedly expand it is certainly a fools errands.
Computer games companies operate like this all the time, so far as I can tell: a few programmers and a lot of content creators who need to be able to control logic as well (like level designers).
It's also probably a healthy discipline to be able to separate your code from the data and rules driving it if you can.
I'm therefore with your colleague, but of course the specifics may not make this general solution appropriate!

Why are secure programs so difficult to develop (for me)?

Community Wiki Question
Every time I work on a project involving passwords or securing data I get mired down into obscenely complex APIs and issues. I have not had much formal training in developing secure applications but I have not had much formal training in database, GUI, and build processes either. Many other areas of programming feel more intuitive.
Is security just a far more complex area than many others? I tend to think that it is not. Are security infterfaces and systems less mature than others? I would tend to think that there is a great deal of pressure for those systems to mature. On UNIX the 'trusted environment' was the norm until somewhere in the 90s. Is UNIX just suffering from its roots in this area?
Technology changes fast. Since I have been in school the computing world has become far more distributed and critical. Has security been dragged along for the ride as an afterthough? Are any new technologies promising? Are you suffering the same way I am?
Good security is hard, very very hard.
Apart of the considerations from the other answer, it is also because security is a balance between effective protection and effective ease of use. A system that is too complex to use due to its stringent security will likely to fail the user, which will drop it. Your role as a designer/developer is also to find the proper compromise between these two opposing forces, something which tend to introduce even more trouble.
I think that security is so hard because people feel inclined to do what they shouldn't, and try their best to overcome any barriers they find.
Only if this will power was used for something good...

Good APIs for scope analyzers

I'm working on some code generation tools, and a lot of complexity comes from doing scope analysis.
I frequently find myself wanting to know things like
What are the free variables of a function or block?
Where is this symbol declared?
What does this declaration mask?
Does this usage of a symbol potentially occur before initialization?
Does this variable potentially escape?
and I think it's time to rethink my scoping kludge.
I can do all this analysis but am trying to figure out a way to structure APIs so that it's easy to use, and ideally, possible to do enough of this work lazily.
What tools like this are people familiar with, and what did they do right and wrong in their APIs?
I'm a bit surprised at at the question, as I've done tons of code generation and the question of scoping rarely comes up (except occasionally the desire to generate unique names).
To answer your example questions requires serious program analysis well beyond scoping. Escape analysis by itself is nontrivial. Use-before-initialization can be trivial or nontrivial depending on the target language.
In my experience, APIs for program analysis are difficult to design and frequently language-specific. If you're targeting a low-level language you might learn something useful from the Machine SUIF APIs.
In your place I would be tempted to steal someone else's framework for program analysis. George Necula and his students built CIL, which seems to be the current standard for analyzing C code. Laurie Hendren's group have built some nice tools for analyzing Java.
If I had to roll my own I'd worry less about APIs and more about a really good representation for abstract-syntax trees.
In the very limited domain of dataflow analysis (which includes the uninitialized-variable question), João Dias and I have adapted some nice work by Sorin Lerner, David Grove, and Craig Chambers. Only our preliminary results are published.
Finally if you want to generate code in multiple languages this is a complete can of worms. I have done it badly several times. If you create something you like, publish it!

Have we given up on the idea of code reuse? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
A couple of years ago the media was rife with all sorts of articles on
how the idea of code reuse was a simple way to improve productivity
and code quality.
From the blogs and sites I check on a regular basis it seems as though
the idea of "code reuse" has gone out of fashion. Perhaps the 'code
reuse' advocates have all joined the SOA crowd instead? :-)
Interestingly enough, when you search for 'code reuse' in Google the
second result is titled:
"Internal Code Reuse Considered Dangerous"!
To me the idea of code reuse is just common sense, after all look at
the success of the apache commons project!
What I want to know is:
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or
shared business logic? How do you or your company reuse code?
Does it work?
Discuss?
I am fully aware that there are many open source libs available and that anyone who has used .NET or the Java has reused code in some form. That is common sense!
I was referring more to code reuse within an organizations rather than across a community via a shared lib etc.
I originally asked;
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
From where I sit I see very few example of companies trying to reuse code internally?
If you have a piece of code which could potentially be shared across a medium size organization how would you go about informing other members of the company that this lib/api/etc existed and could be of benefit?
The title of the article you are referring to is misleading, and is actually a very good read. Code reuse is very beneficial, but there are downsides with everything. Basically, if I remember correctly, the gist of the article is that you are sealing the code in a black box and not revisiting it, so as the original developers leave you lose the knowledge. While I see the point, I don't necessarily agree with it - at least not to a "sky is falling" regard.
We actually group code reuse into more than just reusable classes, we look at the entire enterprise. Things that are more like framework enhancement or address cross-cutting concerns are put into a development framework that all of our applications use (think things like pre- and post-validation, logging, etc.). We also have business logic that is applicable to more than one application, so those sort of things get moved to a BAL core that is accessible anywhere.
I think that the important thing is not to promote things for reuse if they are not going to really be reused. They should be well documented, so that new developers can have a resource to help them come up to speed, as well. Chances are, if the knowledge isn't shared, the code will eventually be reinvented somewhere else and will lead to duplication if you are not rigorous in documentation and knowledge sharing.
We reuse code - in fact, our developers specifically write code that can be reused in other projects. This has paid off quite nicely - we're able to start new projects quickly, and we iteratively harden our core libraries.
But one can't just write code and expect it to be re-used; code reuse requires communication among team members and other users so people know what code is available, and how to use it.
The following things are needed for code reuse to work effectively:
The code or library itself
Demand for the code across multiple projects or efforts
Communication of the code's features/capabilities
Instructions on how to use the code
A commitment to maintaining and improving the code over time
Code reuse is essential. I find that it also forces me to generalize as much as possible, also making code more adaptable to varying situations. Ideally, almost every lower level library you write should be able to adapt to a new set of requirements for a different application.
I think code reuse is being done through open source projects for the most part. Anything that can be reused or extended is being done via libraries. Java has an amazing number of open source libraries available for doing a large number of things. Compare that to C++, and how early on everything would have to be implemented from scratch using MFC or the Win32 API.
We reuse code.
On a small scale we try to avoid code duplication as much as posible. And we have a complete library with a lot of frequently used code.
Normally code is developed for one application. And if it is generic enough, it is promoted to the library. This works excelent.
The idea of code reuse is no longer a novel idea...hence the apparent lack of interest. But it is still very much a good idea. The entire .NET framework and the Java API are good examples of code reuse in action.
We have grown accustomed to developing OO libraries of code for our projects and reusing them in other projects. Its a part of the natural life cycle of an idea. It is hotly debated for a while and then everyone accepts and there is no reason for further discussion.
Of course we reuse code.
There are a near infinite amount of packages, libraries and shared objects available for all languages, with whole communities of developers behing them supporting and updating.
I think the lack of "media attention" is due to the fact that everyone is doing it, so it's no longer worth writing about. I don't hear as many people raising awareness of Object-Oriented Programming and Unit Testing as I used to either. Everyone is already aware of these concepts (whether they use them or not).
Level of media attention to an issue has little to do with its importance, whether we're talking software development or politics! It's important to avoid wasting development effort by reinventing (or re-maintaining!) the wheel, but this is so well-known by now that an editor probably isn't going to get excited by another article on the subject.
Rather than looking at the number of current articles and blog posts as a measure of importance (or urgency) look at the concepts and buzz-phrases that have become classics or entered the jargon (another form of reuse!) For example, Google for uses of the DRY acronym for good discussion on the many forms of redundancy that can be eliminated in software and development processes.
There's also a role for mature judgment regarding costs of reuse vs. where the benefits are achieved. Some writers advocate waiting to worry about reuse until a second or third use actually emerges, rather than spending effort to generalize bit of code the first time it is written.
My personal view, based on the practise in my company:
Do you or your company try and reuse code?
Obviously, if we have another piece of code that already fits our needs we will reuse it. We don't go out of our way to use square pegs in round holes though.
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
At every level. It is written into our coding standards that developers should always assume their code will be reused - even if in reality that is highly unlikely. See below
If your OO model is good, your API probably reflects your business domain, so reusable classes probably equates to reusable business logic without additional effort.
For actual reuse, one key point is knowing what code is already available. We resolve this by having everything documented in a central location. We just need a little discipline to ensure that the documentation is up-to-date and searchable in a meaningful way.
Does it work?
Yes, but not because of the potential or actual reuse! In reality, beyond a few core libraries and UI components, there isn't a large amount of reuse.
In my personal opinion, the real value is in making the code reusable. In doing so, aside from a hopefully cleaner API, the code will (a) be documented sufficiently for another developer to use it without trawling the source code, and (b) it will also be replaceable. These points are a great benefit to on-going software maintenance.
Do you or your company try and reuse code? If so how and at what
level, i.e. low level api, components or shared business logic? How do
you or your company reuse code?
I used to work in a codebase with uber code reuse, but it was difficult to maintain because the reused code was unstable. It was prone to design changes and deprecation in ways that cascaded to everything using it. Before that I worked in a codebase with no code reuse where the seniors actually encouraged copying and pasting as a way to reuse even application-specific code, so I got to see the two extremities and I have to say that one isn't necessarily much better than the other when taken to the extremes.
And I used to be an uber bottom-up kind of programmer. You ask me to build something specific and I end up building generalized tools. Then using those tools, I build more complex generalized tools, then start building DIP abstractions to express the design requirements for the lower-level tools, then I build even more complex tools and repeat, and at some point I start writing code that actually does what you want me to do. And as counter-productive as that sounded, I was pretty fast at it and could ship complex products in ways that really surprised people.
Problem was the maintenance over the months, years! After I built layers and layers of these generalized libraries and reused the hell out of them, each one wanted to serve a much greater purpose than what you asked me to do. Each layer wanted to solve the world's hunger needs. So each one was very ambitious: a math library that wants to be amazing and solve the world's hunger needs. Then something built on top of the math library like a geometry library that wants to be amazing and solve the world's hunger needs. You know something's wrong when you're trying to ship a product but your mind is mulling over how well your uber-generalized geometry library works for rendering and modeling when you're supposed to be working on animation because the animation code you're working on needs a few new geometry functions.
Balancing Everyone's Needs
I found in designing these uber-generalized libraries that I had to become obsessed with the needs of every single team member, and I had to learn how raytracing worked, how fluids dynamics worked, how the mesh engine worked, how inverse kinematics worked, how character animation worked, etc. etc. etc. I had to learn how to do pretty much everyone's job on the team because I was balancing all of their specific needs in the design of these uber generalized libraries I left behind while walking a tightrope balancing act of design compromises from all the code reuse (trying to make things better for Bob working on raytracing who is using one of the libraries but without hurting John too much who is working on physics who is also using it but without complicating the design of the library too much to make them both happy).
It got to a point where I was trying to parametrize bounding boxes with policy classes so that they could be stored either as center and half-size as one person wanted or min/max extents as someone else wanted, and the implementation was getting convoluted really fast trying to frantically keep up with everyone's needs.
Design By Committee
And because each layer was trying to serve such a wide range of needs (much wider than we actually needed), they found many reasons to require design changes, sometimes by committee-requested designs (which are usually kind of gross). And then those design changes would cascade upwards and affect all the higher-level code using it, and maintenance of such code started to become a real PITA.
I think you can potentially share more code in a like-minded team. Ours wasn't like-minded at all. These are not real names but I'd have Bill here who is a high-level GUI programmer and scripter who creates nice user-end designs but questionable code with lots of hacks, but it tends to be okay for that type of code. I got Bob here who is an old timer who has been programming since the punch card era who likes to write 10,000 line functions with gotos in them and still doesn't get the point of object-oriented programming. I got Joe here who is like a mathematical wizard but writes code no one else can understand and always make suggestions which are mathematically aligned but not necessarily so efficient from a computational standpoint. Then I got Mike here who is in outer space who wants us to port the software to iPhones and thinks we should all follow Apple's conventions and engineering standards.
Trying to satisfy everyone's needs here while coming up with a decent design was, probably in retrospect, impossible. And in everyone trying to share each other's code, I think we became counter-productive. Each person was competent in an area but trying to come up with designs and standards which everyone is happy with just lead to all kinds of instability and slowed everyone down.
Trade-Offs
So these days I've found the balance is to avoid code reuse for the lowest-level things. I use a top-down approach from the mid-level, perhaps (something not too far divorced from what you asked me to do), and build some independent library there which I can still do in a short amount of time, but the library doesn't intend to produce mini-libs that try to solve the world's hunger needs. Usually such libraries are a little more narrow in purpose than the lower-level ones (ex: a physics library as opposed to a generalized geometry-intersection library).
YMMV, but if there's anything I've learned over the years in the hardest ways possible, it's that there might be a balancing act and a point where we might want to deliberately avoid code reuse in a team setting at some granular level, abandoning some generality for the lowest-level code in favor of decoupling, having malleable code we can better shape to serve more specific rather than generalized needs, and so forth -- maybe even just letting everyone have a little more freedom to do things their own way. But of course all of this is with the aim of still producing a very reusable, generalized library, but the difference is that the library might not decompose into the teeniest generalized libraries, because I found that crossing a certain threshold and trying to make too many teeny, generalized libraries starts to actually become an extremely counter-productive endeavor in the long term -- not in the short term, but in the long run and broad scheme of things.
If you have a piece of code which could potentially be shared across a
medium size organization how would you go about informing other
members of the company that this lib/api/etc existed and could be of
benefit?
I actually am more reluctant these days and find it more forgivable if colleagues do some redundant work because I would want to make sure that code does something fairly useful and non-trivial and is also really well-tested and designed before I try to share it with people and accumulate a bunch of dependencies to it. The design should have very, very few reasons to require any changes from that point onwards if I share it with the rest of the team.
Otherwise it could cause more grief than it actually saves.
I used to be so intolerant of redundancy (in code or efforts) because it appeared to translate to a product that was very buggy and explosive in memory use. But I zoomed in too much on redundancy as the key problem, when really the real problem was poor quality, hastily-written code, and a lack of solid testing. Well-tested, reliable, efficient code wouldn't suffer that problem to nearly as great of a degree even if some people duplicate, say, some math functions here and there.
One of the common sense things to look at and remember that I didn't at the time is how we don't mind some redundancy when we use a very solid third party library. Chances are that you guys use a third party library or two that has some redundant work with what your team is doing. But we don't mind in those cases because the third party library is great and well-tested. I recommend applying that same mindset to your own internal code. The goal should be to create something awesome and well-tested, not to fuss over a little bit of redundancy here and there as I mistakenly did long ago.
So these days I've shifted my intolerance towards a lack of testing instead. Instead of getting upset over redundant efforts, I find it much more productive to get upset over other people's lack of unit and integration testing! :-D
While I think code reuse is valuable, I can see where this sentiment is rooted. I've worked on a lot of projects where much extra care was taken to create re-usable code that was then never reused. Of course reuse is much preferable to duplicate code, but I have seen a lot of very extenisve object models created with the goal of using the objects across the enterprise in multiple projects (kind of the way the same service in SOA can be used in different apps) but have never seen the objects actually used more than once. Maybe I just haven't been part of organizations taking good advantage of the principle of reuse.
The two software projects I've worked on have both been long term development. One is about 10 years old, the other has been around for over 30 years, rewritten in a couple versions of Fortran along the way. Both make extensive reuse of code, but both rely very little on external tools or code libraries. DRY is a big mantra on the newer project, which is in C++ and lends itself more easily to doing that in practice.
Maybe the better question is when do we NOT reuse code these days? We are either in a state on building using someone elses observed "best practices" or prediscovered "design patterns" or just actually building on legacy code, libraries, or copying.
It seems the degree to which code A is reused to make code B is often based around how much the ideas in code A taken to code B are abstracted into design patterns/idioms/books/fleeting thoughts/actual code/libraries. The hard part is in applying all those good ideas to your actual code.
Non-technical types get overzealous about the reuse thing. They don't understand why everything can't be copy-pasted. They don't understand why the greemelfarm needs a special adapter to communicate the same information that it used to to the old system to the new system, and that, unfortunately we can't change either due to a bazillion other reasons.
I think techies have been reusing from day 1 in the same way musicians have been reusing from day 1. Its an ongoing organic evolution and sythesis that will keep ongoing.
Code reuse is an extremely important issue - where code is not reused, projects take longer and are harder for new team members to get into.
However, writing reusable code takes longer.
Personally, I try to write all my code in a reusable way, this takes longer, but it results in the fact that most of my code has become official infrastructures in my organization and that new projects based on these infrastructures take significantly less time.
The danger in reusing code, is if the reused code is not written as an infrastructure - in a general and encapsulated manner with as few as possible assumptions and as much as possible documentation and unit testing, that the code can end up doing unexpected things.
Also, if bugs are found and fixed, or features added, these changes are rarely returned to the source code, resulting in different versions of the reused code, that no one knows of or understands.
The solution is:
1. To design and write the code with not only one project in mind, but to think of future requirements and try to make the design flexible enough to cover them with minimal code change.
2. To enclose the code within libraries that are to be used as-is and not modified within using projects.
3. To allow users to view and modify the code of of the library withing its solution (not within the using project's solution).
4. To design future projects to be based on the existing infrastructures, making changes to the infrastructures as necessary.
5. To charge maintaining the infrastructure to all projects, thus keeping the infrastructure funded.
Maven has solved code reuse. I'm completely serious.

Resources