Organizing Scientific Data and Code - Experiments, Models, Simulation, Implementation

I am working on a robotics research project, and would like to know: Does anyone have suggestions for best practices when organizing scientific data and code? Does anyone know of existing scientific libraries with source that I could examine?
Here are the elements of our 'suite':
Experiments - Two types:
Gathering data from existing, 'natural' system.
Data from running behaviors on robotic system.
Models
Description of dynamical system - dynamics, kinematics, etc.
Parameters for said system, some of which are derived from type 1 experiments
Simulation - trying to simulate natural behaviors, simulating behaviors on robots
Implementation - code for controlling the robots. Granted this is a large undertaking and has a large infrastructure of its own.
Some design aspects of our 'suite':
Would be good if the simulation environment allowed for 'rapid prototyping' (scripts / interactive prompt for simple hacks, quick data inspection, etc. - definitely something hard to incorporate) - currently satisfied through scripting languages (Python, MATLAB)
Multiple programming languages
Distributed, collaborative setup - Will be using Git
Unit tests have not yet been incorporated, but will hopefully be later on
Cross Platform (unfortunately) - I am used to Linux, but my team members use Windows, and some of our tools are wed to that platform
I saw this post; the books look interesting, and I have ordered "Writing Scientific Software", but I suspect it will focus primarily on the implementation of the simulation code and less on the overall organization.

The situation you describe is very similar to what we have in our surface dynamics lab.
Some of the work involves keeping measurement data which is analysed in real time or saved for later analysis. Other work, on the other hand, involves running simulations and analysing their results.
The data management scheme, which the lab leader picked up at Cambridge while studying there, is centred around a main server which holds the personal files of all lab members. Each member accesses the files from his workstation by mounting the appropriate server folder using NFS. This has its merits and faults. It is easier to back up everything, but it is problematic when processing large amounts of data over the network. For this reason I am an exception in the lab, since the simulation I work with generates a large amount of data. This data is saved on my workstation, and only the code used to generate it (source code of the simulation and configuration files) is saved on the server.
I also keep my code in an online SVN service, since I cannot log in to the lab server from home. This is a mandatory practice, which stems from the need to be able to reproduce older results on demand and trace changes to the code if some obscure bug appears. Hence the need to maintain older versions and configuration files.
We also employ low tech methods, such as lab notebooks to record results, modifications, etc.
This content can sometimes be more abstract (no point describing every changed line in the code - you have diff for that; just the purpose of the change, perhaps some notes about the implementation, and its date).
Work is done mostly with MATLAB. Again I am an exception, as I prefer Python. I also use C for the data-generating simulation. Testing is mostly of convergence, since my project is currently concerned with comparing against computational models. I just generate results with different configurations, each saved in its own folder (which I track in my lab logbook). This has the benefit of being able to control and interface with the data exactly as I want to, instead of conforming to someone else's ideas and formats.
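To make that per-run bookkeeping concrete, here is a minimal Python sketch of the idea; the folder layout, file names and the use of JSON are my own assumptions for illustration, not the lab's actual scheme:

# run_sim.py - minimal sketch: each run gets its own folder holding
# both the configuration used and the results it produced.
import json
import os
from datetime import datetime

def run_simulation(config):
    # Placeholder for the real (C or Python) simulation.
    return {"n_steps": config["n_steps"], "status": "ok"}

def main():
    # Hypothetical parameters; in practice these come from a config file.
    config = {"n_steps": 1000, "dt": 0.01, "model": "pendulum"}

    # One folder per run, named by timestamp, so results never overwrite
    # each other and the lab logbook can refer to the folder name.
    run_dir = os.path.join("runs", datetime.now().strftime("%Y%m%d-%H%M%S"))
    os.makedirs(run_dir)

    # Save the exact configuration next to the data it produced.
    with open(os.path.join(run_dir, "config.json"), "w") as f:
        json.dump(config, f, indent=2)

    results = run_simulation(config)
    with open(os.path.join(run_dir, "results.json"), "w") as f:
        json.dump(results, f, indent=2)

if __name__ == "__main__":
    main()

Ideally the folder would also record the source revision (e.g. the SVN revision number) used for the run, so the result can be tied back to a specific version of the code.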


What is the value add of BDD?

I am now working on a project where we are using cucumber-jvm to drive acceptance tests.
On previous projects I would create internal DSLs in groovy or scala to drive acceptance tests. These DSLs would be fairly simple to use such that even a non-techie would be able to write tests with a little bit of guidance.
What I see is that BDD adds another layer of indirection and semantic sugar to the tests, but I fail to see the value-add, especially if the non-techies can use an internal DSL.
In the case of cucumber, stepDefs seem to scatter the code that drives any given test over several different classes, making the test code difficult to read and debug outside the feature file. On the other hand, putting all the code pertaining to one test in a single stepDef class discourages reuse of stepDefs. Both outcomes are undesirable, leaving me asking whether the use of natural language is worth all this extra, unintuitive indirection.
Is there something I am missing? Like a subtle philosophical difference between ATDD and BDD? Does the former imply imperative testing whereas the latter implies declarative testing? Do these aesthetic differences have intrinsic value?
So I am left asking what value-add justifies the deterioration in the readability of the actual code that drives the test. Is this BDD stuff actually worth the pain? Is the value-add more than just aesthetic?
I would be grateful if someone out there could come up with a compelling argument as to why the gain of BDD surpasses the pain of BDD?
What I see is that BDD adds another layer of indirection and semantic sugar to the tests, but I fail to see the value-add, especially if the non-techies can use an internal DSL.
The extra layer is the plain-language .feature file, and at the point of creation it has nothing to do with testing; it has to do with capturing the requirements of the system using a technique called specification by example to create well-defined stories. When written properly in the business language, specifications by example are very powerful at creating a shared understanding. This exercise alone can both reduce the amount of rework and find defects before development starts. This exercise is otherwise known as deliberate discovery.
Once you have a shared understanding of and agreement on the specifications, you enter development and make those specifications executable. Here is where you would use ATDD. So BDD and ATDD are not comparable, they are complementary. As part of ATDD, you drive the development of the system using the behaviour that has been defined by way of example in the story. The nice thing you have as a developer is a formal format that contains preconditions, events, and postconditions that you can automate.
From here on, the automated running of the executable specifications on a CI system will reduce regressions and provide you with all the benefits you get from any other automated testing technique.
The really interesting thing is that the executable specification files are long-lived and evolve over time as you add or change behaviour in your system. Unlike most Agile methodologies, where user stories are thrown away after they have been developed, here you have living documentation of your system that is also the specification and also the automated test.
Let's now run through a healthy BDD-enabled delivery process (this is not the only way, but it is the way we like to work):
Deliberate Discovery session.
Output = agreed specifications delta
ATDD to drive development
Output = actualizing code, automated tests
Continuous Integration
Output = report with screenshots is browsable documentation of the system
Automated Deployment
Output = working software being consumed
Measure & Learn
Output = new ideas and feedback to feed the next deliberate discovery session
So BDD can really help you with the missing piece of most delivery processes: the specifications part. This part is typically undisciplined and freeform, and is left up to a few individuals to hold together. This is how BDD is an Agile methodology and not just a testing technique.
With that in mind, let me address some of your other questions.
In the case of cucumber, stepDefs seem to scatter the code that drives any given test over several different classes, making the test code difficult to read and debug outside the feature file. On the other hand, putting all the code pertaining to one test in a single stepDef class discourages reuse of stepDefs. Both outcomes are undesirable, leaving me asking whether the use of natural language is worth all this extra, unintuitive indirection.
If you make the stepDefs a super-thin layer on top of your automation testing codebase, then it's easy to reuse the automation code from multiple steps. In the test codebase, you should utilize techniques and principles such as the testing pyramid and shallow depth of test to ensure you have a robust and fast test automation layer. What's also interesting about this separation is that it allows you to reuse the code between your stepDefs and your unit/integration tests.
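To make the "thin stepDef layer" concrete, here is a minimal sketch using behave (Python's Cucumber-style tool) rather than cucumber-jvm; the AccountDriver class and the step wording are hypothetical, but the point is that the step definitions only delegate to an automation layer that your unit and integration tests can reuse directly:

# features/steps/account_steps.py - step definitions kept deliberately thin
from behave import given, when, then

# Hypothetical automation-layer class. It knows nothing about Gherkin or
# behave, so plain unit/integration tests can reuse it as-is.
class AccountDriver:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

@given("the account balance is {balance:d}")
def step_given_balance(context, balance):
    context.account = AccountDriver()
    context.account.deposit(balance)

@when("the customer withdraws {amount:d}")
def step_when_withdraw(context, amount):
    context.account.withdraw(amount)

@then("the account balance should be {expected:d}")
def step_then_balance(context, expected):
    assert context.account.balance == expected

The Given/When/Then wording lives in the .feature file as the executable specification; the steps above only map it onto the driver, so there is very little logic to scatter in the first place.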
Is there something I am missing? Like a subtle philosophical difference between ATDD and BDD? Does the former imply imperative testing whereas the latter implies declarative testing? Do these aesthetic differences have intrinsic value?
As mentioned above, ATDD and BDD are complementary, not comparable. On the point of imperative/declarative, specification by example as a technique is very specific. When you are performing the deliberate discovery phase, you always ask the question "can you give me an example?". In that example, you would use exact values. If there are two values that can be used in the precondition (Given) or event (When) steps, and they lead to different outcomes (Then step), it means you have two different scenarios. If they have the same outcome, it's likely the same scenario. Therefore, as part of the BDD practice, the steps need to be declarative so as to gain the benefits of deliberate discovery.
So I am left asking what value-add justifies the deterioration in the readability of the actual code that drives the test. Is this BDD stuff actually worth the pain? Is the value-add more than just aesthetic?
It's worth it if you are working in a team where you want to solve the problem of miscommunication. One of the reasons people fail with BDD is that the writing and automation of features is left to the developers and the QAs, and the artifacts are no longer coherent as living specifications; they are just test scripts.
Test scripts tell you how a system does a particular thing, but they do not tell you why.
I would be grateful if someone out there could come up with a compelling argument as to why the gain of BDD surpasses the pain of BDD?
It's about using the right tool for the right job. Using Cucumber for writing unit tests or automated test scripts is like using a hammer to put a screw into wood. It might work, but it's never pretty and it's always painful!
On the subject of tools, your typical business analyst / product owner is not going to have the knowledge needed to peek into your source control and work with you on adding / modifying specs. We created a commercial tool to fix this problem by allowing your whole team to collaborate on specifications in the cloud while staying in sync (in real time) with your repository. Check out Simian.
I have also answered a question about BDD here that may be of interest to you that focuses more on development:
Should TDD and BDD be used in conjunction?
Cucumber and Selenium are two popular technologies. Many organizations use Selenium for functional testing. Those organizations that use Selenium want to integrate Cucumber with Selenium, as Cucumber makes it easy to read and understand the application flow. Cucumber is based on the Behavior Driven Development framework and acts as the bridge between the following people:
Software Engineer and Business Analyst. 
Manual Tester and Automation Tester. 
Manual Tester and Developers. 
Cucumber also benefits the client in understanding the application, as it uses the Gherkin language, which is plain text. Anyone in the organization can understand the behavior of the software. The syntax of Gherkin is simple text which is readable and understandable.

Benefits of using UML models in software maintenance?

What are the key benefits of using UML models in software maintenance? I came across only a handful of papers regarding the costs and benefits of UML models in software maintenance.
UML can ALWAYS be useful in software-related activity, but you must know what you are doing, rather than using it because "my boss says UML is cool!".
The benefits of UML can depend on many factors. In some situations it is likely to be more beneficial; I will try to mention some of them.
Likely to be more beneficial:
The larger and more complex your subject is, the more benefit you can expect, especially when there are relationships between different aspects
If this SW maintenance means some extra development/extension, it can be very useful to use UML to clarify it. You can show the existing system and the way it should be extended. You can of course always show the new requirements.
if your system is already modelled in UML, you can use it to locate the problem and plan further enhancements
if both modeller and model reader know OO and have some experience in UML, they will almost always make it beneficial.
if you want to generate some code from the models later on
if you need to support a system with no documentation, it can be useful to document it first (use reverse engineering to import the code and organize it in UML)
if you plan to maintain this system for a long time and with lots of people
Maybe not so beneficial:
if the maintenance does not involve intensive extension/programming
if the subject under maintenance is too small - it can be more efficient to use natural language or even the spoken word
if either the person who models or who uses the diagram later is not skilled in OO and UML. If none of them is, forget about using UML and start learning it :)
if you already have some other kind of documentation
Imagine this:
Intro
you are a programmer assigned to do maintenance on a large legacy code base (thousands of files, damn many lines of code, written over several past years by several different people who have since left; coding conventions and class libraries changed several times during the original development)
Nobody likes that work, it is boring and repetitive so people are trying to get away as soon as they can and do some fresh and shiny development in some hype cool language. You are a 'junior maintainer' and so are most of your co-workers, so you have no wizard to ask and get magic fast and correct answers to your newbie questions.
The code is structured into many different files and many different classes (e.g. in Java, to read the code of 20 classes you have to open 20 different files; in C, to read the code you always have to read two files at the same time, the *.h and the *.c, etc.)
Now you have to troubleshoot some problem. You are able to reproduce it in the test lab, so you pick up a debugger and start hunting the problem: where do the control and data flow go, and where is the hidden failure (maybe some assert will fail, or maybe you'll spot something 'oh what a stupid bug' with your hawk eye).
Nightmare
No assert failed and you did not spot anything, because it is C++ code and saying "hello" in C++ takes about 10 cryptic statements with the "hello" buried somewhere in the middle. Most of the code is not written in the language you know but in a language of macros you have never heard of, with calls to methods of classes you have never heard of.
Moreover, you find out there are several code flows (threads) running at the same time, and the data packet you are trying to track gets marshalled and queued somewhere for another thread to process, and the debugger stops at a waiting lock.
You're trying to picture what is going on using the debugger, patiently stepping into the unknown and taking notes on what the different words mean.
After several days of debugging, pulling your hair, cursing all the developers before you, seriously thinking about quitting this job
Relief
you find some documentation. Someone wrote it using human words, designed for humans to read, explaining the basic concepts, giving links to places in code and other explaining documents and there are even some pictures in the docs.
And you look at the picture (a UML diagram or some other pre-UML diagram) and now you can see where the data flows, what some of the classes you have already met are supposed to do, and where the places you should look are.
And you find that the UML diagrams saved you not only many dead neurons but also countless hours of reading the code, as instead of reading and understanding thousands of lines of code you can just read hundreds of lines of explanatory documents, or just several explanatory pictures.
Conclusion
Powered by that new 'I see' knowledge, you put breakpoints at several key places, inspect the variable values, and you find that at this place X should not be 1. Because the docs said ... and you already know what the dependencies of the X variable are. So you put several other conditional breakpoints around, and you finally find a bug in the logic, as the code now flows through some combination of conditions that was not foreseen.
So you change a few letters in one line of code, the bug is fixed, and you have earned your salary. The maintenance job is not such a nightmare anymore (though you still think about quitting the job) and the UML pictures helped.
So you draw some more UML-style diagrams explaining various interdependencies (that you have learned the hard way), first using paper and pencil, as small documentation one day is better than no documentation ever: http://agilemodeling.com/essays/documentLate.htm
I know the conclusion is partially sci-fi, as no maintainer will actually go and start creating missing documentation for a system that is going to be replaced in several years by something newer and shinier. (S)he will just auto-generate some UML diagrams by reverse engineering the code, throw them away after use, and quit the job as soon as possible.
But you get the picture of the costs and benefits of using UML models in software maintenance.

Is there any advantage in building a business application with an Entity-Component System?

I understand the appeal of using the data-driven Entity-Component System for game development. Naturally I am trying to find other areas to apply this paradigm. As I am about to embark on developing a small business application, I've been wondering how well Entity-Component would fit in with it. However I cannot find any examples or discussions on using Entity-Component in anything besides games. Is there a reason? Would there be any advantages in using Entity-Component in software besides games?
I ended up taking a risk and trying to use ECS outside of the gaming domain (now as an indie, formerly company employee) and with results that astounded me. I wouldn't do things any other way now and have an easier-to-maintain system than ever before (not perfect, but so much better than the COM-style architectures we used to use in my industry). I took the plunge mainly because it seemed to provide answers for all the things me and my team in the past were struggling with using a COM architecture, though I imagined with such a risky move that I might just end up exchanging one set of problems for another (was willing to take the risk now that I was on my own). Turned out I didn't exchange one can of worms for another. ECS solved practically all those problems while barely introducing any new ones.
That said, I'm in the VFX domain and it's not that different from games. We still have to animate things like characters, emit particles, interact with meshes, textures, play sound clips, render the result, allow people to write plugins, scripts, etc.
To try to apply ECS in a business domain is far more ballsy. That said, I imagine it could really help create a maintainable system if you have relatively few systems processing a huge number of entity combinations.
Maintainability
What I found that made ECS so much easier for me to maintain compared to previous object-oriented approaches, even within my personal projects, was that the previous approaches often transferred the maintenance overhead away from the clients using the classes to the classes themselves. But then there would be dozens of interfaces and hundreds of subclasses, all inheriting different things and implementing different interfaces, to maintain individually. Testing also becomes difficult with so many granular classes and the need to do mock testing.
My brain can only handle so much and hundreds of subclasses interacting with each other was far beyond the limit. Very quickly I found myself no longer able to reason about what was going on, let alone when or where, overwhelmed by complex interactions leading to complex side effects, and never so confident that I could sandwich new code somewhere in there without causing unwanted side effects.
The computing scientist's main challenge is not to get confused by the complexities of his own making. -- E. W. Dijkstra
This applied even for projects I exclusively authored myself. There came a breaking point, typically after a few hundred thousand LOC or so, where I could no longer even comprehend my own creation. I'd refactor here and there, pick up a little momentum, only to take a vacation, come back, and be lost all over again.
ECS removed that challenge, and I don't mean to the degree that I can take a 2-week vacation, come back to the codebase, look at some code, and get the vision of crystal clarity that I had when I was writing it in the first place. ECS doesn't improve things that much in this regard and it still takes some time for me to reacquaint myself with code I haven't looked at in a good while. The reason ECS helped so much is that I didn't need to recall everything I wrote in order to extend and change the software. The systems are so decoupled from each other that it's not a huge deal if I forgot how one works exactly. I can just concentrate on what I need to do and not have to worry about complex interactions of side effects being triggered through complex interactions of control flow. I can just focus on what I need to do and not have to think much about anything else.
This applies even when introducing brand new core-level features integrated into the product. These days when I introduce a new central feature to the product, like a brand new audio system central to the product, the only thing I have to think much about is how to integrate it into the user interface. Integrating it into the architecture is relatively effortless compared to previous architectures I worked in.
Meanwhile with the ECS, I only have to maintain a couple dozen systems to provide no less functionality than the above. They do have some complex logic inside, but I don't have to maintain the hundreds of different entity combinations there are, since they just store components, and I don't have to maintain component types since they just store raw data and I rarely ever find the need to go back and change them (very close to never).
Extensibility
Being able to extend an ECS architecture in hindsight with central concepts is about the easiest thing I've encountered so far and requires the minimum amount of knowledge of how the existing codebase works.
As a very fresh example, I recently encountered a strong desire for scripters using my software to be able to access entities in the scene using a simple, global name. Before, they had to specify a full scene path like Scene.Lights.World.Sunlight as opposed to simply Sunlight.
Normally in the previous architectures I worked in, that would have ranged from a highly intrusive to moderately intrusive change. A COM-style system revolving around pure interfaces might require introducing a new interface or, worse, changing an existing one, and updating a few hundred subtypes to implement the new functions. If we had a central abstract base class that everything already inherited, we might be able to modify that one centrally to implement this new interface (or the new parts of an existing interface), but it would likely be monstrous if there was a central base class for everything that might want such a name, and require wading through a lot of delicate code.
With the ECS, all I had to do was introduce a new component, GlobalName, with a system that processes GlobalName components and can find an entity quickly through a specified name. It also handles making sure that no two GlobalName components have a matching name. Due to the nature of the ECS, it's also very easy to pick up when this GlobalName component is destroyed as a result of an entity being destroyed or the component being removed from it to keep the data structure used to accelerate searches by name (a trie) in sync.
After that I was just able to attach this GlobalName component to anything that scripters wanted to refer to by a global name. They can also attach it themselves and then refer to a given entity later through that name. Components also serialize themselves in ways that preserve backwards compatibility for the most part (ex: previous versions of the software which did not know what GlobalName was will simply ignore it upon loading scene data referring to it).
It was about as painless and as non-intrusive a change as I could imagine, considering that this was added very late, in hindsight, to a 4-year-old piece of software which did not anticipate the need for this whatsoever. And I managed to get it working just fine on the very first try. As a bonus, all the non-trivial code newly added to make this work lives isolated in its own space; it's not jumbled up with anything else and doesn't contribute to the complexity of anything else, as would inevitably have to be the case if I used abstract interfaces or base classes. I did not have to modify anything central to make this work except a few lines of trivial script and some trivial GUI code to display these global names when available.
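To show the shape of what I mean, here is a stripped-down sketch in Python; the World/attach/detach API is a made-up illustration of a generic ECS, not my actual engine, and a real GlobalName system would keep a trie in sync with attach/detach rather than scanning linearly:

# Components are plain data; entities are just ids.
class GlobalName:
    def __init__(self, name):
        self.name = name

class World:
    def __init__(self):
        self.next_id = 0
        self.components = {}  # component type -> {entity id -> component}

    def create_entity(self):
        self.next_id += 1
        return self.next_id

    def attach(self, entity, component):
        self.components.setdefault(type(component), {})[entity] = component

    def detach(self, entity, component_type):
        self.components.get(component_type, {}).pop(entity, None)

    def with_component(self, component_type):
        return self.components.get(component_type, {}).items()

# A system: the one place in the codebase that interprets GlobalName data.
class GlobalNameSystem:
    def __init__(self, world):
        self.world = world

    def find(self, name):
        # Linear scan keeps the sketch short; the real thing would use a trie.
        for entity, comp in self.world.with_component(GlobalName):
            if comp.name == name:
                return entity
        return None

# Usage: any existing entity can be made globally nameable after the fact.
world = World()
sun = world.create_entity()
world.attach(sun, GlobalName("Sunlight"))
names = GlobalNameSystem(world)
assert names.find("Sunlight") == sun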
"Inherit Anywhere"
Have you ever wished you could extend a class's functionality from anywhere in your code without actually modifying its code? For example:
// In some part of the system exists a complex beast of a class
// which is tricky to modify:
class Foo {...};
// In some other part of the system is a simple class that offers
// new behavior we'd like to have in 'Foo', with abstract functionality
// (virtual functions, i.e.) open to substitution:
class Bar {...};
// In some totally different part of the system, maybe even a script,
// make Foo inherit Bar's behavior on the fly, including its default
// constructor, copy constructor, and destructor behavior for Bar's state.
Foo.inherit(Bar);
The above leaves the question: where will the abstract functionality of Bar be implemented, since Foo doesn't provide such an implementation? That's where systems analogically kick in for ECS.
I think the temptation will be there for most of us who have had to wade through some existing class's complex code just to make it do a few new things, while risking unwanted side effects/glitches/toe-stepping. Or we might have wished that a third-party library outside of our control offered a little more functionality that we'd find very useful throughout the code using it, if it just provided "this one thing". Or we might just hate the idea of having to change our colleagues' existing code (don't want to step on toes) even though we're tasked with providing new central behavior.
ECS offers you that kind of flexibility, although in a very different way from the above example (but gives you the analogical benefits). It allows you to extend anything's behavior/functionality/state from anywhere. As in the above example of extensibility, I did not have to modify anything that exists to provide that global name searching functionality and state. I can extend the behavior of these entities from the outside, even from script, by just adding a new type of component to any entity I want, at which point any systems I write that are interested in such components will be able to pick them up and process them using a duck typing approach ("If it has a GlobalName component, it can be given a global name which can be used to find a matching component very quickly").
Associating Data
Similar to the above, have you ever faced a temptation to associate data to existing objects in the code? In such cases we might have to maintain parallel arrays or associative containers like dictionaries/maps, and such code can be tricky to write correctly given that it has to stay in sync as new objects are added and removed.
ECS solves that problem at a central level, since now you can just attach components and remove components to/from any entity you want very efficiently. That becomes your means of associating new data on the fly. You no longer have to manually synchronize associative data structures.
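As a tiny sketch (reusing the hypothetical World from the snippet above), associating a brand new piece of data with an existing entity is just attaching another component; there is no parallel dictionary to keep in sync by hand:

# Hypothetical new data we want to hang off existing entities.
class AuditTag:
    def __init__(self, reviewed_by):
        self.reviewed_by = reviewed_by

world.attach(sun, AuditTag("alice"))       # associate data on the fly
for entity, tag in world.with_component(AuditTag):
    print(entity, tag.reviewed_by)
world.detach(sun, AuditTag)                # and remove it just as easily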
Testing
Another issue for me just personally, and it may be because I never mastered the art of unit testing (though I did work with a colleague who really studied up on the subject), is that it never made me confident that a system was relatively bug-free. Integration tests gave me greater confidence in that regard. The problem for me was this: even if the unit test passes, how do you know the client will not misuse the interface? What if they use it at the wrong time? What if they try to use it from multiple threads when it's deliberately not designed to be thread-safe?
I get no huge sense of relief from seeing unit tests pass, since most of the bugs encountered had to do with what was going on between the interfaces being tested, and we had many bugs coming in despite all of the hundreds of unit tests we wrote passing. I love test-driven development, and I did find value in a unit test telling me that this one unit was doing what it was supposed to do, which allowed me to use it more confidently throughout the codebase, but unit testing never gave me a huge sense of relief about the correctness of the codebase as a whole.
ECS solved that problem for me and made unit testing much more valuable even to someone like me who never mastered the art of testing, since there are a handful of systems, they each do a hefty share of work (they're not granular little objects), and they're concrete. If we have to do anything resembling mock testing, it's simply inserting the components/entities necessary to run them and test them. It starts to feel like testing a system is closer to integration testing than unit testing, even though the system is the smallest testable unit.
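Under those assumptions, "mocking" a system really is just inserting real components into a fresh world. A sketch of such a test, again using the hypothetical World and GlobalNameSystem from the earlier snippet:

def test_global_name_lookup():
    world = World()
    lamp = world.create_entity()
    world.attach(lamp, GlobalName("DeskLamp"))
    system = GlobalNameSystem(world)

    assert system.find("DeskLamp") == lamp
    assert system.find("Missing") is None

    # Removing the component makes the entity unfindable again.
    world.detach(lamp, GlobalName)
    assert system.find("DeskLamp") is None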
Homogeneous Processing
To apply ECS requires embracing a more loopy kind of logic with more homogeneous loops doing one thing at a time. A lot of OOP tends to encourage non-homogeneous control flows and complex interactions causing many things to happen in any given phase/state of the system. This was the most difficult part I found initially since I wanted to apply disparate tasks at one time to a given entity/set of components, and my temptation couldn't be satisfied so directly given decoupled systems which only perform one task at a time. So I had to learn how to defer processing, storing some state for the next system to use, and I also use (to a minimum) an event queue so that systems can trigger events which get processed by others.
Nevertheless, I found ways to program the equivalent of a complex interaction as a result of a series of simple loops doing one thing at a time. It never turned out to be as difficult as I imagined to force myself to work this way, applying one uniform task over one set of entities at one time. And after being forced to do this for a while and maintaining the results -- wow! I should have been doing that all along. It's actually kind of depressing reflecting back on a decade of maintaining architectures that were so much harder to maintain than they needed to be after getting the breath of fresh air that was the ECS architecture.
Interactions
This is a simplified "interaction" diagram (not necessarily indicative of direct coupling, as the coupling version would be from concrete objects to abstract interfaces) comparing the differences before and after I adopted ECS. Here's before:
Except that's just between a small number of types (I was too lazy to draw hundreds). And this was why I always struggled to maintain these things and felt tangled up in the code. It's because the interactions between the code were actually a tangled mess, leading you to all sorts of remote functions in the system causing side effects along the way. After (and now the components are just raw data, they contain no functionality of their own):
And the second version was so, so much easier to comprehend, so much easier to extend, so much easier to maintain, so much easier to reason about in terms of correctness, so much easier to test, etc. If your business architecture can effectively fit into the second type of model, I can't overstate how much it can simplify everything.
Invariants
One of the scariest parts to me when I started developing the ECS engine was the lack of information hiding. When components are just raw data, they're dangling what I thought should be their privates in the air for anyone to touch. This could be doubly scary in a business domain that might be more mission-critical in nature.
Yet I found invariants just as easy to maintain, if not more, due to the limited number of systems that access any given component (and typically if the data is modified, it only makes sense for one system in the entire codebase to do it), the extremely simple control flows, and the extremely predictable side effects that result. And it's pretty easy to test the codebase for correctness when you just have a handful of systems to worry about as far as functionality.
Conclusion
So if you are willing to take the risk, I think it could potentially be applied very effectively in certain business domains. The main thing I think is worth thinking about upfront is whether you can model the entirety of your software's needs as a handful of systems processing data stored in components, with each system still having a bulky but singular responsibility (the analogical equivalents of a RenderingSystem, GuiSystem, PhysicsSystem, InputSystem, etc.). Naturally the benefits of ECS diminish if you find you need hundreds of disparate systems to capture the business logic.
If you're interested, I can extend my answer in some later iterations and try to go over some of the minor struggles I faced with the ECS when I was completely wet behind the ears about it.
(Apologies for the necromancy)
Coming from an enterprise background, I have recently been considering this question. Entity-component systems are comparatively new, and represent a completely different design paradigm to what most business developers will have experience with.
Considering my own company's example, I have seen a few scenarios where an entity-component system would offer benefits.
For example, in our primary application, addresses are associated with contacts and organisations. (There are ContactAddress and OrganisationAddress joining tables in our database.) One client wishes to associate projects with addresses as well. There are many ways of achieving this, but an entity-component based approach would seem quite elegant to me - simply add an Addressable component to the Project entity, and the GUI should sort itself out.
Instead, we will likely be adding a new joining table and new data-input pages (albeit re-using common controls).
The primary disadvantage, I would think, would be (initial) lack of developer awareness of the best ways of applying this paradigm to business software, precisely because it doesn't appear to have been done before. Once you start with such an approach, you are committed to it - if it proves frustrating once your project reaches a certain complexity, there's no way out without a significant rewrite.

Successful Non-programmer, 5GL, Visual, 0 Source Code or Similar Tools?

Can anyone give me an example of successful non-programmer, 5GL (not that I am sure what they are!), visual, 0 source code or similar tools that business users or analysts can use to create applications?
I don’t believe there are and I would like to be proven wrong.
At the company that I work at, we have developed an in-house MVC framework that we use to develop web applications. It is basically a reduced state machine written in XML (à la Spring WebFlow) for the controller and a simple template-based engine for presentation. Some of the benefits:
dynamic nature: no need to recompile to see the changes
reduced “semantic load”: basically, actions in controller know only “If”. Therefore, it is easy to train someone to develop apps.
The current trend in the company (or at least at management level) is to try to produce tools for the platform that require 0 source code, are visual etc. It has a good effect on clients (or at least at management level) since:
they can be convinced that this way they will need no programmers, or at least will be able to hire bottom-of-the-ladder programmers that cost much less than typical programmers.
It appears that there is a reduced risk involved, since the tool limits what the implementer or user (just don't use the word programmer!) can do, so there is less chance that he can introduce errors.
It appears to simplify the whole problem, since there seems to be no programming involved (notoriously complex). Since applications load dynamically, there is less complexity than is typically associated with the J2EE lifecycle: compile, package, deploy, etc.
I am personally skeptical that something like this can be achieved. The solution we have today has a number of problems:
Implementers write JavaScript code to enrich pages (this could be solved by developing widgets). Albeit client-side, it is still code that can become very complex and result in some difficult bugs.
There is already a visual tool, but implementers prefer editing XML since it is quicker and easier. For comparison, I guess not many use Eclipse Spring WebFlow plug-in to edit flow XML.
There is very poor reuse in the solution (it is based on copy-pasting XML). This hampers productivity and some other aspects, like fostering business knowledge.
There have been numerous performance and other issues based on incorrect use of the tools. No matter how reduced the playfield, there is always space for error.
While the platform is probably more productive than Struts, I doubt it is more productive than today’s RAD web frameworks like RoR or Grails.
Verbosity
Historically, there have been numerous failures in this direction. The idea of programs written by non-programmers is old but, AFAIK, never successful. At a certain level, the power of source code becomes irreplaceable.
Today, there is a lot of talk about DSLs, but not as something that non-programmers should write, more like something they could read.
It seems to me that the direction the company is taking in this respect is a dead end. What do you think?
EDIT: It is worth noting (and that's where some of the inspiration is coming from) that many big players are experimenting in that direction. See Microsoft Popfly, Google Sites, iRise, many mashup solutions, etc.
Yes, it's a dead end. The problem is simple: no matter how simple you make the expression of a solution, you still have to analyze and understand the problem to be solved. That's about 80-90% of how (most good) programmers spend their time, and it's the part that takes the real skill and thinking. Yes, once you've decided what to do, there's some skill involved in figuring out how to do that (in a programming language of your choice). In most cases, that's a small part of the problem, and the least open to things like schedule slippage, cost overrun or outright failure.
Most serious problems in software projects occur at a much earlier stage, in the part where you're simply trying to figure out what the system should do, what users must/should/may do which things, what problems the system will (and won't) attempt to solve, and so on. Those are the hard problems, and changing the environment to express the solution in some way other than source code will do precisely nothing to help any of those difficult problems.
For a more complete treatise on the subject, you might want to read No Silver Bullet - Essence and Accident in Software Engineering, by Frederick Brooks (Included in the 20th Anniversary Edition of The Mythical Man-Month). The entire paper is about essentially this question: how much of the effort involved in software engineering is essential, and how much is an accidental result of the tools, environments, programming languages, etc., that we use. His conclusion was that no technology was available that gave any reasonable hope of improving productivity by as much as one order of magnitude.
Not to question the decision to use 5GLs, etc, but programming is hard.
Jon Skeet - Programming is Hard
Coding Horror - Programming is Hard
5GLs have been considered a dead-end for a while now.
I'm thinking of the family of products that includes MS Access, Excel, Clarion for DOS, etc., where you can make applications with 0 source code and no programmers. Not that they are capable of AI-quality operations, but they can make very usable applications.
There will always be "real" languages to do the work, but we can drag and drop the workflow.
I'm using Apple's Automator which allows users to chain together "Actions" exposed by the various applications on their systems.
Actions have inputs and/or outputs, some have UI elements and basic logic can be applied to the chain.
The key difference between Automator and other visual environments is that the actions use existing application code and don't require any special installation.
More Info > http://www.macosxautomation.com/automator/
I've used it to "automate" many batch processes and had really great results (it surprises me every time). I've got it running builds and backups, and whenever I need to process a mess of text files it comes through.
I would love to know whether iHook or Platypus (OS X wrapper builders for shell scripts) could let me develop plugins in Python ....
There is definitely room for more applications like this, and for more support from OS X application developers, but the idea is sound.
Until there's major support there aren't many "actions" available, but a quick check on my system just showed me an extra 30 that I didn't know I had.
PS. There was another app for pre-OS X systems called "Filter Tops" which had a much more limited set of plugins.
How about Dabble DB?
Of course, just like MS Access and other non-programmer programming platforms, it has some necessary limitations in order that the user won't get him or herself stuck... as John pointed out programming is hard. But it does give the user a lot of power, and it seems that most applications that non-programmers want to build are database-type applications anyway.

Have we given up on the idea of code reuse? [closed]

A couple of years ago the media was rife with all sorts of articles on how the idea of code reuse was a simple way to improve productivity and code quality.
From the blogs and sites I check on a regular basis it seems as though the idea of "code reuse" has gone out of fashion. Perhaps the 'code reuse' advocates have all joined the SOA crowd instead? :-)
Interestingly enough, when you search for 'code reuse' in Google, the second result is titled: "Internal Code Reuse Considered Dangerous"!
To me the idea of code reuse is just common sense; after all, look at the success of the Apache Commons project!
What I want to know is:
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
Does it work?
Discuss?
I am fully aware that there are many open source libs available and that anyone who has used .NET or the Java has reused code in some form. That is common sense!
I was referring more to code reuse within an organization rather than across a community via a shared lib etc.
I originally asked:
Do you or your company try and reuse code?
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
From where I sit I see very few examples of companies trying to reuse code internally.
If you have a piece of code which could potentially be shared across a medium size organization how would you go about informing other members of the company that this lib/api/etc existed and could be of benefit?
The title of the article you are referring to is misleading; the article is actually a very good read. Code reuse is very beneficial, but there are downsides to everything. Basically, if I remember correctly, the gist of the article is that you are sealing the code in a black box and not revisiting it, so as the original developers leave you lose the knowledge. While I see the point, I don't necessarily agree with it - at least not to a "sky is falling" degree.
We actually group code reuse into more than just reusable classes; we look at the entire enterprise. Things that are more like framework enhancements or that address cross-cutting concerns are put into a development framework that all of our applications use (think things like pre- and post-validation, logging, etc.). We also have business logic that is applicable to more than one application, so those sorts of things get moved to a BAL core that is accessible anywhere.
I think that the important thing is not to promote things for reuse if they are not going to really be reused. They should be well documented, so that new developers can have a resource to help them come up to speed, as well. Chances are, if the knowledge isn't shared, the code will eventually be reinvented somewhere else and will lead to duplication if you are not rigorous in documentation and knowledge sharing.
We reuse code - in fact, our developers specifically write code that can be reused in other projects. This has paid off quite nicely - we're able to start new projects quickly, and we iteratively harden our core libraries.
But one can't just write code and expect it to be re-used; code reuse requires communication among team members and other users so people know what code is available, and how to use it.
The following things are needed for code reuse to work effectively:
The code or library itself
Demand for the code across multiple projects or efforts
Communication of the code's features/capabilities
Instructions on how to use the code
A commitment to maintaining and improving the code over time
Code reuse is essential. I find that it also forces me to generalize as much as possible, also making code more adaptable to varying situations. Ideally, almost every lower level library you write should be able to adapt to a new set of requirements for a different application.
I think code reuse is being done through open source projects for the most part. Anything that can be reused or extended is being done via libraries. Java has an amazing number of open source libraries available for doing a large number of things. Compare that to C++, and how early on everything would have to be implemented from scratch using MFC or the Win32 API.
We reuse code.
On a small scale we try to avoid code duplication as much as possible. And we have a complete library with a lot of frequently used code.
Normally code is developed for one application, and if it is generic enough, it is promoted to the library. This works excellently.
The idea of code reuse is no longer a novel idea...hence the apparent lack of interest. But it is still very much a good idea. The entire .NET framework and the Java API are good examples of code reuse in action.
We have grown accustomed to developing OO libraries of code for our projects and reusing them in other projects. It's a part of the natural life cycle of an idea: it is hotly debated for a while, then everyone accepts it, and there is no reason for further discussion.
Of course we reuse code.
There are a near-infinite number of packages, libraries and shared objects available for all languages, with whole communities of developers behind them, supporting and updating them.
I think the lack of "media attention" is due to the fact that everyone is doing it, so it's no longer worth writing about. I don't hear as many people raising awareness of Object-Oriented Programming and Unit Testing as I used to either. Everyone is already aware of these concepts (whether they use them or not).
Level of media attention to an issue has little to do with its importance, whether we're talking software development or politics! It's important to avoid wasting development effort by reinventing (or re-maintaining!) the wheel, but this is so well-known by now that an editor probably isn't going to get excited by another article on the subject.
Rather than looking at the number of current articles and blog posts as a measure of importance (or urgency) look at the concepts and buzz-phrases that have become classics or entered the jargon (another form of reuse!) For example, Google for uses of the DRY acronym for good discussion on the many forms of redundancy that can be eliminated in software and development processes.
There's also a role for mature judgment regarding the costs of reuse vs. where the benefits are achieved. Some writers advocate waiting to worry about reuse until a second or third use actually emerges, rather than spending effort to generalize a bit of code the first time it is written.
My personal view, based on the practise in my company:
Do you or your company try and reuse code?
Obviously, if we have another piece of code that already fits our needs we will reuse it. We don't go out of our way to use square pegs in round holes though.
If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
At every level. It is written into our coding standards that developers should always assume their code will be reused - even if in reality that is highly unlikely. See below
If your OO model is good, your API probably reflects your business domain, so reusable classes probably equates to reusable business logic without additional effort.
For actual reuse, one key point is knowing what code is already available. We resolve this by having everything documented in a central location. We just need a little discipline to ensure that the documentation is up-to-date and searchable in a meaningful way.
Does it work?
Yes, but not because of the potential or actual reuse! In reality, beyond a few core libraries and UI components, there isn't a large amount of reuse.
In my personal opinion, the real value is in making the code reusable. In doing so, aside from a hopefully cleaner API, the code will (a) be documented sufficiently for another developer to use it without trawling the source code, and (b) it will also be replaceable. These points are a great benefit to on-going software maintenance.
Do you or your company try and reuse code? If so how and at what level, i.e. low level api, components or shared business logic? How do you or your company reuse code?
I used to work in a codebase with uber code reuse, but it was difficult to maintain because the reused code was unstable. It was prone to design changes and deprecation in ways that cascaded to everything using it. Before that I worked in a codebase with no code reuse, where the seniors actually encouraged copying and pasting as a way to reuse even application-specific code. So I got to see both extremes, and I have to say that one isn't necessarily much better than the other when taken that far.
And I used to be an uber bottom-up kind of programmer. You ask me to build something specific and I end up building generalized tools. Then using those tools, I build more complex generalized tools, then start building DIP abstractions to express the design requirements for the lower-level tools, then I build even more complex tools and repeat, and at some point I start writing code that actually does what you want me to do. And as counter-productive as that sounded, I was pretty fast at it and could ship complex products in ways that really surprised people.
Problem was the maintenance over the months, years! After I built layers and layers of these generalized libraries and reused the hell out of them, each one wanted to serve a much greater purpose than what you asked me to do. Each layer wanted to solve the world's hunger needs. So each one was very ambitious: a math library that wants to be amazing and solve the world's hunger needs. Then something built on top of the math library like a geometry library that wants to be amazing and solve the world's hunger needs. You know something's wrong when you're trying to ship a product but your mind is mulling over how well your uber-generalized geometry library works for rendering and modeling when you're supposed to be working on animation because the animation code you're working on needs a few new geometry functions.
Balancing Everyone's Needs
I found in designing these uber-generalized libraries that I had to become obsessed with the needs of every single team member, and I had to learn how raytracing worked, how fluids dynamics worked, how the mesh engine worked, how inverse kinematics worked, how character animation worked, etc. etc. etc. I had to learn how to do pretty much everyone's job on the team because I was balancing all of their specific needs in the design of these uber generalized libraries I left behind while walking a tightrope balancing act of design compromises from all the code reuse (trying to make things better for Bob working on raytracing who is using one of the libraries but without hurting John too much who is working on physics who is also using it but without complicating the design of the library too much to make them both happy).
It got to a point where I was trying to parametrize bounding boxes with policy classes so that they could be stored either as center and half-size as one person wanted or min/max extents as someone else wanted, and the implementation was getting convoluted really fast trying to frantically keep up with everyone's needs.
Design By Committee
And because each layer was trying to serve such a wide range of needs (much wider than we actually needed), they found many reasons to require design changes, sometimes by committee-requested designs (which are usually kind of gross). And then those design changes would cascade upwards and affect all the higher-level code using it, and maintenance of such code started to become a real PITA.
I think you can potentially share more code in a like-minded team. Ours wasn't like-minded at all. These are not real names but I'd have Bill here who is a high-level GUI programmer and scripter who creates nice user-end designs but questionable code with lots of hacks, but it tends to be okay for that type of code. I got Bob here who is an old timer who has been programming since the punch card era who likes to write 10,000 line functions with gotos in them and still doesn't get the point of object-oriented programming. I got Joe here who is like a mathematical wizard but writes code no one else can understand and always make suggestions which are mathematically aligned but not necessarily so efficient from a computational standpoint. Then I got Mike here who is in outer space who wants us to port the software to iPhones and thinks we should all follow Apple's conventions and engineering standards.
Trying to satisfy everyone's needs here while coming up with a decent design was, in retrospect, probably impossible. And with everyone trying to share each other's code, I think we became counter-productive. Each person was competent in an area, but trying to come up with designs and standards which everyone is happy with just led to all kinds of instability and slowed everyone down.
Trade-Offs
So these days I've found the balance is to avoid code reuse for the lowest-level things. I use a top-down approach from the mid-level, perhaps (something not too far divorced from what you asked me to do), and build some independent library there which I can still do in a short amount of time, but the library doesn't intend to produce mini-libs that try to solve the world's hunger needs. Usually such libraries are a little more narrow in purpose than the lower-level ones (ex: a physics library as opposed to a generalized geometry-intersection library).
YMMV, but if there's anything I've learned over the years in the hardest ways possible, it's that there might be a balancing act and a point where we might want to deliberately avoid code reuse in a team setting at some granular level, abandoning some generality for the lowest-level code in favor of decoupling, having malleable code we can better shape to serve more specific rather than generalized needs, and so forth -- maybe even just letting everyone have a little more freedom to do things their own way. But of course all of this is with the aim of still producing a very reusable, generalized library, but the difference is that the library might not decompose into the teeniest generalized libraries, because I found that crossing a certain threshold and trying to make too many teeny, generalized libraries starts to actually become an extremely counter-productive endeavor in the long term -- not in the short term, but in the long run and broad scheme of things.
If you have a piece of code which could potentially be shared across a medium size organization how would you go about informing other members of the company that this lib/api/etc existed and could be of benefit?
I actually am more reluctant these days and find it more forgivable if colleagues do some redundant work because I would want to make sure that code does something fairly useful and non-trivial and is also really well-tested and designed before I try to share it with people and accumulate a bunch of dependencies to it. The design should have very, very few reasons to require any changes from that point onwards if I share it with the rest of the team.
Otherwise it could cause more grief than it actually saves.
I used to be so intolerant of redundancy (in code or efforts) because it appeared to translate to a product that was very buggy and explosive in memory use. But I zoomed in too much on redundancy as the key problem, when really the real problem was poor quality, hastily-written code, and a lack of solid testing. Well-tested, reliable, efficient code wouldn't suffer that problem to nearly as great of a degree even if some people duplicate, say, some math functions here and there.
One of the common sense things to look at and remember that I didn't at the time is how we don't mind some redundancy when we use a very solid third party library. Chances are that you guys use a third party library or two that has some redundant work with what your team is doing. But we don't mind in those cases because the third party library is great and well-tested. I recommend applying that same mindset to your own internal code. The goal should be to create something awesome and well-tested, not to fuss over a little bit of redundancy here and there as I mistakenly did long ago.
So these days I've shifted my intolerance towards a lack of testing instead. Instead of getting upset over redundant efforts, I find it much more productive to get upset over other people's lack of unit and integration testing! :-D
While I think code reuse is valuable, I can see where this sentiment is rooted. I've worked on a lot of projects where much extra care was taken to create reusable code that was then never reused. Of course reuse is much preferable to duplicate code, but I have seen a lot of very extensive object models created with the goal of using the objects across the enterprise in multiple projects (kind of the way the same service in SOA can be used in different apps) but have never seen the objects actually used more than once. Maybe I just haven't been part of organizations taking good advantage of the principle of reuse.
The two software projects I've worked on have both been long term development. One is about 10 years old, the other has been around for over 30 years, rewritten in a couple versions of Fortran along the way. Both make extensive reuse of code, but both rely very little on external tools or code libraries. DRY is a big mantra on the newer project, which is in C++ and lends itself more easily to doing that in practice.
Maybe the better question is: when do we NOT reuse code these days? We are either building on someone else's observed "best practices" or prediscovered "design patterns", or just actually building on legacy code, libraries, or copying.
It seems the degree to which code A is reused to make code B is often based on how much the ideas in code A carried over to code B are abstracted into design patterns/idioms/books/fleeting thoughts/actual code/libraries. The hard part is in applying all those good ideas to your actual code.
Non-technical types get overzealous about the reuse thing. They don't understand why everything can't be copy-pasted. They don't understand why the greemelfarm needs a special adapter to communicate to the new system the same information that it used to send to the old system, and that, unfortunately, we can't change either of them due to a bazillion other reasons.
I think techies have been reusing from day 1 in the same way musicians have been reusing from day 1. It's an ongoing organic evolution and synthesis that will keep going.
Code reuse is an extremely important issue - where code is not reused, projects take longer and are harder for new team members to get into.
However, writing reusable code takes longer.
Personally, I try to write all my code in a reusable way. This takes longer, but it has resulted in most of my code becoming official infrastructure in my organization, and new projects based on these infrastructures take significantly less time.
The danger in reusing code is that, if the reused code is not written as infrastructure - in a general and encapsulated manner, with as few assumptions as possible and as much documentation and unit testing as possible - the code can end up doing unexpected things.
Also, if bugs are found and fixed, or features added, these changes are rarely propagated back to the original source, resulting in different versions of the reused code that no one knows of or understands.
The solution is:
1. To design and write the code with not only one project in mind, but to think of future requirements and try to make the design flexible enough to cover them with minimal code change.
2. To enclose the code within libraries that are to be used as-is and not modified within using projects.
3. To allow users to view and modify the code of the library within its own solution (not within the using project's solution).
4. To design future projects to be based on the existing infrastructures, making changes to the infrastructures as necessary.
5. To charge the maintenance of the infrastructure to all projects, thus keeping the infrastructure funded.
Maven has solved code reuse. I'm completely serious.
