Scenario: 2 developers working on the same project (VS2010, C#, MVC3, WinXP) on separate standalone computers. Due to IA restrictions (DOD) we are NOT allowed to connect these two computers in any way. The only way we are allowed to pass data between computers is via a CD-R/DVD-R disc. We need to be able to share an SVN repository for the code we are writing. I'm trying to figure out what the best way to do this would be.
Will this scenario even work? What is the best workflow to use? I would appreciate any guidance or suggestions on the best way to do this.
It sounds to me that you would be better off using distributed source control, such as Mercurial or Git, for this project. SVN makes merging exceptionally hard, whereas with distributed source control you just have to pass changesets back and forth.
Also, distributed source control houses a repository on each system, which is what you would have to do in this situation anyways.
This book should help you with most things Mercurial-related.
This Link explains how to pull new ChangeSets into your repository.
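As a rough sketch of how passing changesets on a disc could look with Mercurial: `hg bundle --base REV FILE` writes every changeset newer than a revision the other machine already has into a single file, and `hg unbundle FILE` adds them on the receiving side. The helper names, the bundle file name and the revision value below are hypothetical and only for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def export_command(bundle_file: Path, last_shared_rev: str) -> List[str]:
    """Build the `hg bundle` call that packs everything newer than last_shared_rev.

    --base names a changeset the other machine already has, so Mercurial
    does not need any network path to that machine.
    """
    return ["hg", "bundle", "--base", last_shared_rev, str(bundle_file)]


def import_command(bundle_file: Path) -> List[str]:
    """Build the `hg unbundle` call that adds the changesets read from the disc."""
    return ["hg", "unbundle", str(bundle_file)]


def run_in_repo(repo_dir: Path, command: List[str]) -> None:
    """Run a command inside a repository; raises if Mercurial reports an error."""
    subprocess.run(command, cwd=str(repo_dir), check=True)
```

On the sending machine you would run the export command, burn the resulting file to the CD-R, then run the import command on the other machine and merge if both sides have new work.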

In your situation I would propose the following: set up and maintain the SVN repository on one selected PC (let's say the most reliable one). The other members pass CD-Rs with patches when they finish a part of the work, and all patches are integrated into that SVN repo. Patches are then created for each member so that every PC ends up with the same code. I know this sounds awkward, but it may be the best option in this case, and the patch operations can be automated.

From a design perspective, I think the code architecture needs to be good: clear separation of modules, loosely coupled code, strict OOP, and few code dependencies. That way two people can work without much interaction. Do plan your integration, and define your code / class signatures beforehand if possible.


Centralized vs. Distributed version control security

As my company begins to further explore moving from centralized version control tools (CVS, SVN, Perforce and a host of others) to offering teams distributed version control tools (mercurial in our case) I've run into a problem:
The Problem
A manager has raised the concern that distributed version control may not be as secure as our CVCS options because the repo history is stored locally on the developer's machine.
It's been difficult to nail down his exact security concern but I've gathered that it centers on the fact that a malicious employee could steal not only the latest intellectual property but our whole history of changes just by copying a single folder.
The Question(s)
Do distributed version control systems really introduce new security concerns for projects?
Is it easier to maliciously steal code?
Does the complete history represent an additional threat that the latest version of the code does not?
My Thoughts
My take is that this may be a mistaken belief that the centralized model is more secure because the history seems safer off on its own box. Given that users with even read access to a centralized repo could selectively extract snapshots of the project at any key revision, I'm not sure the DVCS model makes it all that much easier. Also, most CVCS tools allow you to extract the whole repo's history with a single command so that you can import it into other tools.
I think the other issue is just how important the history is compared to the latest version. Granted someone could have checked in a top secret file, then deleted it and the history would pretty quickly be significant. But even in that scenario a CVCS user could checkout that top secret version with a single command.
I'm sure I could be missing something or downplaying risks as I'm eager to see DVCS become a fully supported tool option. Please contribute any ideas you have on security concerns.
If you have read access to a CVCS, you have enough permissions to convert the repo to a DVCS, which people do all the time. No software tool is going to protect you from a disgruntled employee stealing your code, but a DVCS has many more options for dealing with untrusted contributors, such as a gatekeeper workflow. Hence its widespread use in open source projects.
You are right in that distributed version control does not really introduce any new security concerns, since the developer already has access to the code in both cases. I can only think that since it is easier to work offline and offsite with Git, developers might become more tempted to do it than in centralized. I would push to force encryption on all corporate laptops with code.
Not really easier, just the same. If you enable logging, you will have the same information about when the code is accessed.
I personally do not think so. It might represent the thought process leading to certain decisions but not necessarily more.
It comes down to knowledge on how to implement security measures in both cases. If you have more experience in one system vs another then you are more likely to implement more to prevent such loss but at the end of the day, you are trusting your developers with code the minute you allow them access to it. No way around that.
DVCS provides various protections against unauthorized writing. This is why it is popular with open source teams. It has several frustrating limitations for controlling reading. Open source teams do not care about this.
The first problem is that most DVCS encourage many copies of the full source. The typical granularity is the full repo. This can include many unneeded branches and even entire other projects, besides the concern of history (along with searchable commit comments that can make the code even more useful to the attacker). CVCS encourages developers to copy as little as possible to their desktop, since the less they copy, the faster it works. The less you put on mobile devices, the easier it is to secure.
When DVCS is implemented with many devices acting as servers, it is much more difficult to implement effective network security. Attacking a local CVCS workspace requires the attacker to gain access to the filesystem. Attacking a DVCS node generally requires attacking the DVCS itself on any device hosting the information (and remember: the folks who maintain most DVCSs are open source guys; they don't care nearly as much about read controls). The more devices that host repositories, the more likely that users will set up anonymous read access (which again, DVCS encourages because of its open source roots). This greatly simplifies the job of an attacker who is doing random sweeps.
CVCS that are based on URLs (like subversion) open the opportunity for quite fine-grain access control, such as per-branch access. DVCS tends to fight this kind of access control.
I know developers like DVCS, but there's no way it can be secured as effectively as CVCS. Most environments do a terrible job of securing their CVCS, and if that's the case then it doesn't matter which you use. But if you take access control seriously, you can have much greater control with CVCS as part of a broader least-privilege infrastructure.
Many may argue that there's no reason to protect source code. That's fine and people can argue about it. But if you are going to protect your source code, the best implementation is to not copy the source to random laptops (which are very hard to secure well), and rather have developers mount it from a central server. CVCS works well this way. DVCS makes no sense if you are going to keep it on a single server this way. If you are going to copy files to mobile devices, make sure you copy as little as possible. That's the opposite of DVCS.
There are a bunch of "security" issues; whether they are an issue depends on your setup:
There's more data floating around, which means the notional "attack surface" might be bigger (it depends on how you count).
But how much data does the "typical" developer check out? You might want to use a sparse checkout in svn, but lazy people and some GUI tools don't support that, so they'll have all your code checked out anyway. Git users might be more likely to use multiple repos. This depends on you.
Authentication/access control might be better (and it might be worse!). This is largely a function of the VCS, not whether it is "D" or "C". svn:// is plaintext.
Is deleting files a priority, and how easy is this to do? An accidental commit of a confidential file is more painful to undo in git if it happened in the distant past (but people might be more likely to notice).
Are you really going to notice a malicious user pulling the entire history instead of merely doing a checkout? It depends on how big your repository is and what your branches are like. It's easy for a full SVN checkout to take up more space than the repository itself due to branches.
Change history is generally not something you want to give away for free (even to people with a source code license), but how valuable is it? Maybe you have top-secret design methodologies or confidential information in your commit messages, but this seems unlikely.
And finally, security economics:
How much is the extra security worth?
How much is increased productivity worth?
How much is caring about the concerns of your developers worth?
(IIRC it turns out that users should ignore security advice, because the expected cost is more than the expected benefit; this is especially true for things like certificates that expired yesterday. How much does it cost you to check the address bar every time you type in a password? How often do you catch a phishing attempt? What is the cost to you per thwarted phishing attempt? What is the cost per successful phish?)

What are the main reasons against the Windows Registry?

If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Which features would be absolutely necessary?
What are the main concerns (security, ease-of-configuration, ...)?
I think the Windows Registry was not a bad idea, just the implementation didn't fulfill the promises. A common place for configurations including for example apache config, database config or mail server config wouldn't be a bad idea and might improve maintainability, especially if it has options for (protected) remote access.
I once worked on a kernel-based solution but stopped because others said that registries are useless (because the Windows registry is)... what do you think?
A kernel-based registry? Why? Why? A thousand times, why? Might as well ask for a kernel-based musical postcard or inetd for all the point it is putting it in there. If it doesn't need to be in the kernel, it shouldn't be in. There are many other ways to implement a privileged process that don't require deep hackery like that...
If I want to develop a registry-like system for Linux, which Windows Registry design failures should I avoid?
Make sure that applications can change many entries at once in an atomic fashion.
Make sure that there are simple command-line tools to manipulate it.
Make sure that no critical part of the system needs it, so that it's always possible to boot to a point where you can fix things.
Make sure that backup programs back it up correctly!
Don't let chunks of executable data be stored in your registry.
If you must have a single repository, at least use a proper database, so you have tools to back it up, restore it, recover it etc., and you can interact with it without needing a new set of custom APIs.
The first one that comes to my mind is that you somehow need to avoid orphaned registry entries. At the moment, when you delete a program you also delete its configuration files, which live under some directory. With a registry system, you need to make sure that when a program is deleted, its configuration in the registry is deleted as well.
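The atomic-update and proper-database suggestions above can be sketched together. This is a minimal illustration using SQLite from the Python standard library; the table layout, key names and function names are assumptions made for the example, not a proposed design.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a settings store backed by an ordinary SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def set_many(conn, entries):
    """Write all entries in one transaction: either every key changes or none."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            list(entries.items()),
        )


def get(conn, key, default=None):
    """Read a single value, returning default when the key is absent."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

Because the store is a plain database file, existing backup and recovery tools apply to it, and a failed multi-key update leaves the previous values untouched.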
IMHO, the main problems with the windows registry are:
Binary format. This loses you the availability of a huge variety of very useful tools. In a binary format, tools like diff, search, version control etc. have to be specially implemented, rather than using the best of breed, which are capable of operating on the common substrate of text. Text also offers the advantage of trivially embedded documentation / comments (also greppable), and easy programmatic creation and parsing by external tools. It's also more flexible: sometimes configuration is better expressed with a full Turing-complete language than by trying to shoehorn it into a structure of keys and subkeys.
Monolithic. It's a big advantage to have everything for application X contained in one place. Move to a new computer and want to keep your settings for it? Just copy the file. While this is theoretically possible with the registry, so long as everything is under a single key, in practice it's a non-starter. Settings tend to be diffused in various places, and it is generally difficult to find where. This is usually given as a strength of the registry, but "everything in one place" generally devolves to "Everything put somewhere in one huge place".
Too broad. It's easy to think of it as just a place for user settings, but in fact the registry becomes a dumping ground for everything. 90% of what's there is not designed for users to read or modify, but is in fact a database of the serialised form of various structures used by programs that want to persist information. This includes things like the entire COM registration system, installed apps, etc. Now this is stuff that needs to be stored, but the fact that it's mixed in with things like user-configurable settings and stuff you might want to read dramatically lowers its value.
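To illustrate the point about text formats working with standard tools, here is a small sketch that diffs two versions of a text configuration using only Python's standard `difflib`. The configuration contents and the file name are made up for the example.

```python
import difflib


def config_diff(old: str, new: str, name: str = "app.conf") -> str:
    """Return a unified diff between two versions of a text config file."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=name + " (old)",
        tofile=name + " (new)",
    )
    return "".join(lines)


old_conf = "# web server\nport = 80\nhost = example\n"
new_conf = "# web server\nport = 8080\nhost = example\n"
print(config_diff(old_conf, new_conf))
```

The comment line survives in the file and shows up as diff context, which is the kind of embedded documentation a binary store would need special tooling to provide.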

Revision control for server-side CGI programming

A friend of mine and I are developing a web server for system administration in Perl, similar to Webmin. We have set up a Linux box with the current version of the server working, along with other open source web products like webmail, calendar, an inventory management system and more.
Currently, the code is not under revision control and we're just doing periodic snapshots.
We would like to put the code under revision control.
My question is: what would be a good way to set this up, and which software should we use?
One solution I can think of is to make the root of the project, which is currently on the Linux box, the root of the repository as well. We would then check out the code on our personal machines, work on it, commit and test the result.
Any other ideas, approaches?
Version Control with Subversion covers many fundamental version control concepts in addition to being the authority on Subversion itself. If you read the first chapter, you might get a good idea on how to set things up.
In your case, it sounds like you're making the actual development on the live system. This doesn't really matter as far as a version control system is concerned. In your case, you can still use Subversion for:
Committing as a means of backing up your code and updating your repository with working changes. Make a habit of committing after testing, so there are as few broken commits as possible.
Tagging as a means of keeping track of what you do. When you've added a feature, make a tag. This way you can easily revert to "before we implemented X" if necessary.
Branching to develop larger chunks of changes. If a feature takes several days to develop, you might want to commit during development, but not to the trunk, since you are then committing something that is only half finished. In this case, you should commit to a branch.
Where you create a repository doesn't really matter, but you should only place working copies where they are actually usable. In your case, it sounds like the live server is the only such place.
For a more light-weight solution, with less overhead, where any folder anywhere can be a repository, you might want to use Bazaar instead. Bazaar is a more flexible version control system than Subversion, and might suit your needs better. With Bazaar, you could make a repository of your live system instead of setting up a repository somewhere else, but still follow the 3 guidelines above.
How many webapp instances can you run?
You shouldn't commit untested code, or make commits from a machine that can't run your code. Though you can push to backup clones if you like.

Domain repository for requirements management - build or buy?

In my organisation, we have some very inefficient processes around managing requirements, tracking what was actually delivered in which versions, checking whether subsequent releases break previous functionality, etc. It's currently all managed manually. The requirements are spread over several documents and issue trackers, and the implementation details are in code in Subversion, Jira, and TestLink. I'm trying to put together a system that consolidates the requirements info, so that it comes from a single, authoritative source, is accessible via standard interfaces (web services, browsers, etc.), and can be automatically validated against. The actual domain knowledge is not that complicated but is highly proprietary and non-standard (i.e., not just customers with addresses, emails, etc.), and is relational: customers have certain functionalities, features switched on/off, specific datasources hooked up, all on specific versions. So modelling this should be straightforward.
Can anyone advise on the best approach for this? I am certain that I can develop a system from scratch that matches the requirements exactly, in say Ruby on Rails, Grails, or some RAD framework. But I'm having difficulty getting management buy-in; they would feel safer with an off-the-shelf solution.
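A minimal sketch of the relational shape described above (customers, versions, switched-on features, datasources) with one automated validation. All class, field and function names are hypothetical and only show how such a model might be checked; the real domain would need more detail.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Release:
    """One product version and the features it actually delivers."""
    version: str
    features: FrozenSet[str]


@dataclass
class Customer:
    """A customer pinned to a version, with features and datasources switched on."""
    name: str
    version: str
    enabled_features: Set[str] = field(default_factory=set)
    datasources: Set[str] = field(default_factory=set)


def undelivered_features(customer: Customer, releases: Dict[str, Release]) -> List[str]:
    """Return features enabled for the customer that their version does not contain."""
    release = releases[customer.version]
    return sorted(customer.enabled_features - release.features)
```

A check like this could run against the single authoritative source whenever a release or a customer record changes.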
Can anyone recommend such a system? Or am I better off building it from scratch, as I feel I am? I'm afraid a bought system would take just as long to deploy, and would not meet our requirements.
I believe that you are describing two different problems. The first is getting everyone to standardize and the second is selecting a good tool for requirements management. I wouldn't worry so much about the tool as I would the process and the people. Having the best tool in the world won't help if your various project managers don't want to share.
So, my suggestion is to start simple. Grab Redmine or Trac and take on the challenge of getting everyone to standardize. Once you have everyone in the right mindset then you can improve the tools you use for storage.
{disclaimer - mentioning my employer's product}
The brief experiments I made with a commercial tool, RequisitePro, seemed pretty good to me. It allowed one to annotate existing Word docs and create a real-time linked database of the identified requisites, then perform lots of analysis and tracking of them.
Sometimes when I see a commercial product I think "Oh, well nice glossy bits but the fundamentals I could knock up in Perl in a weekend." That's not the case with this stuff. I would certainly look at commercial products in this space and experiment with a couple (ReqPro has a free trial, I guess the competition will too) before spending time on my own development.
Thanks a mill for the reply. I will take a look at RequisitePro; at least I'll be following the "Nobody ever got fired for buying IBM" strategy ;) You're right, and I kinda knew it: in these situations, buy is better. It is tempting when I can visualise throwing it together quickly, but there are other tradeoffs and risks with that approach.
While RequisitePro enforces a standard and that can certainly help you in your task, I'd certainly second Mark on trying to standardize the input by agreement with personnel and using a more flexible tool like Trac or Redmine (which both have incredibly fast deploy and setup times, especially if you host them from a VM), or even a custom one if you can get the management to endorse your project.

How do you foster the use of shared components in your organization?

If your company or project places an emphasis on (or at least appreciates) the development of code and components that can be reused and shared across projects, what are the "social engineering" things you've needed to do to facilitate the re-use of the code?
In my experience, code or components that are simply stated as being "reusable" won't be reused unless that code has someone to champion or evangelize it. Otherwise, people simply won't know about it.
How do you make sure shared components or reusable code work in your organization?
A couple of thoughts come to mind:
Make them very well documented and easy to figure out. That way, no one will give up on using them because it's too confusing.
Make them very useful, and make sure they take care of problems that are so annoying that people would have to be crazy not to use them.
Another great tactic is to find out what code other people in the organization have in their projects and offer to extract some of that functionality (talk up how great it is and how you really want to use it in your project). Once their code is added to the shared module, you usually end up with +1 fan of the shared library, and have an evangelist to help you sell the idea. Remember, people usually only do things if they are to their benefit, so making them look good and their code look good is strongly to their benefit.
In my organization, we had management help to foster the creation of a shared library. We also indoctrinate new hires to use it.
Then, we put in good code, which must be reviewed by everyone for usefulness and completeness. This is where the group buy-in comes into play.
Also, we have a very robust, documented process for branching the shared lib for use in Visual Studio. Hint: we use the svn:externals property to manage checkouts from different repositories in the same folder structure. Potential shared code first exists in the branches before it gets promoted to the trunk.
The best way I've found to make people want to use your reusable code is to show them how to use it the way you intend. Provide a sample program using your library. Keep it in the source code repository just like any other project. Keep the sample program documented and updated when the reusable library changes.
Personally I try and demo code that I think is useful and give a comparison between using it and not using it to try and show why it's so much better. Normally this happens during weekly development meetings.
I agree that it really needs someone to evangelise and push for the adoption of new methods. One of the things I've found a lot of developers not to be that great at is selling themselves or what they've done, so it's worth working on these skills and pushing others to do the same, leading by example.
This may be a bit Java-centric, but publishing both binary and source in a corporate Maven repository does wonders for visibility. It makes other people happy to use your code ;) We work with a lot of open source and find that ready availability of source code to read is really a key feature, especially when it can integrate directly into the IDE. We really expect that from in-house projects too! I wonder how we managed before we had Maven? (Even Ant can use a Maven repo nowadays.)
In a grad school class a few years ago, I did a case study on a web-based repository where programmers could deposit their code for re-use. This wasn't for my workplace, but for a lab with thousands of scientists and engineers where there wasn't any other centralized means for sharing.
