If i want to develop a registry-like System for Linux, which Windows Registry design failures should i avoid?
Which features would be absolutely necessary?
What are the main concerns (security, ease-of-configuration, ...)?
I think the Windows Registry was not a bad idea, just the implementation didn't fullfill the promises. A common place for configurations including for example apache config, database config or mail server config wouldn't be a bad idea and might improve maintainability, especially if it has options for (protected) remote access.
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
I once worked on a kernel based solution but stopped because others said that registries are useless (because the windows registry is)... what do you think?
A kernel-based registry? Why? Why? A thousand times, why? Might as well ask for a kernel-based musical postcard or inetd for all the point it is putting it in there. If it doesn't need to be in the kernel, it shouldn't be in. There are many other ways to implement a privileged process that don't require deep hackery like that...
If i want to develop a registry-like System for Linux, which Windows Registry design failures should i avoid?
Make sure that applications can change many entries at once in an atomic fashion.
Make sure that there are simple command-line tools to manipulate it.
Make sure that no critical part of the system needs it, so that it's always possible to boot to a point where you can fix things.
Make sure that backup programs back it up correctly!
Don't let chunks of executable data be stored in your registry.
If you must have a single repository, at least use a proper database so you have tools to restore, backup, recover it etc and you can interact with it without having a new set of custom APIs
the first one that come to my mind is somehow you need to avoid orphan registry entries. At the moment when you delete program you are also deleting the configuration files which are under some directory but after having a registry system you need to make sure when a program is deleted its configuration in registry should be deleted as well.
IMHO, the main problems with the windows registry are:
Binary format. This loses you the availability of a huge variety of very useful tools. In a binary format, tools like diff, search, version control etc. have to be specially implemented, rather than use the best of breed which are capable of operating on the common substrate of text. Text also offers the advantage of trivially embedded documentation / comments (also greppable), and easy programatic creation and parsing by external tools. It's also more flexible - sometimes configuration is better expressed with a full turing complete language than trying to shoehorn it into a structure of keys and subkeys.
Monolithic. It's a big advantage to have everything for application X contained in one place. Move to a new computer and want to keep your settings for it? Just copy the file. While this is theoretically possible with the registry, so long as everything is under a single key, in practice it's a non-starter. Settings tend to be diffused in various places, and it is generally difficult to find where. This is usually given as a strength of the registry, but "everything in one place" generally devolves to "Everything put somewhere in one huge place".
Too broad. Its easy to think of it as just a place for user settings, but in fact the registry becomes a dumping ground for everything. 90% of what's there is not designed for users to read or modify, but is in fact a database of the serialised form of various structures used by programs that want to persist information. This includes things like the entire COM registration system, installed apps, etc. Now this is stuff that needs to be stored, but the fact that its mixed in with things like user-configurable settings and stuff you might want to read dramatically lowers its value.
Related
What if I develop a desktop application which million people will use, and behind the scene, the application is surveilling users' files on their hard drives, streaming the data time to time?
Can one be assured no such things happen, with any popular software applications, be it MS Office or Google Chrome?
Or this is just a stupid question?
Is it technically possible? Yes, it is.
Could it be happening in an application used by a million users for a relatively long time without being noticed? Very unlikely. Somebody would notice the strange network traffic eventually.
Also #Mjh mentioned open source in a comment. While open source can help by allowing people to audit the source code, how many times have you checked that the binary you are using is actually the compiled source that you were looking at? Of course, there are signatures on binary packages and all, but the signature is made by the package maintainer. There is an inherent trust not only in the developer of the application, but also in the tool chain that creates a binary package from the source code. And then we haven't talked about strange "bugs", or the fact that even in open source, some security issues are very hard to find (otherwise all open source software would be security bug-free, which they are not).
So back to your question, sure, you could use all kinds of techniques to monitor the behavior of an application, you could monitor memory access, network traffic, whatever else. You can also analyse the code itself, look for suspicious things. It will take a huge amount of effort and still there will be no 100% guarantee, only some level of assurance.
Automated version upgrades could make detection even harder by the way. Even if you put lots of resources into analysis of one version, what if only a short-lived version had malicious code? Sure, that too can be analysed, but would anyone bother, unless there was a good reason (like indications of something malicious)?
Yet I think you can be pretty sure that major vendors don't do this. It's just not worth it for them, why would they? Their risk would be huge, with a relatively low benefit.
I'm looking for a way to output traces to a log file in my code, which runs on linux.
I don't want to include the printing information in the binary, in every place I deploy it.
It windows, I simply used WPP to trace without putting the actual traces strings in my binary.
How can this by achieved in Linux?
I'm not very familiar with Linux tools in this area, so maybe there is a better system. However, since nobody else has made any good suggestions, I'll make a suggestion. (Probably not a very good suggestion, but the best I can think of right now.)
In theory, you could continue to use wpp. Wpp is simply a template system. It scans the configuration and input files to create data structures. Then it runs a template, fills in the data values it got from the scan, producing the tmh files. You could create a new set of templates that would use Linux apis instead of Windows apis, and would record the message strings in a way that works with some other log decoder system.
I noticed this question only now and would like to add my two cents to the story just for a case. Personally, I truly appreciate Windows WPP Tracing and consider it probably the best engineering solution for practical development troubleshooting among similar tools.
It happened I extended WPP use to Unix-like platforms twice. We wanted to use strong sides of WPP concept in general and yet use it in a multi-platform pieces of code. This was not a porting but rather a wrapper to specific WPP use we configured on Windows. One time we had a web service to perform actual WPP pre-processing on Windows; it may sound a bit insane but it worked fine and effective within the local network. A wrapper script that was executed before each compilation sent a web request, got a processed file and post-processed the generated include file to make it suitable for Unix-like platforms. The second time we implemented a simplified WPP pre-processor of our own (we found yet additional use for it - we could generate the tracing statements differently for production and unit testing, for example). This was a harsh solution: you anyway need to use some physical tracing framework behind the wrapper on non-Windows platform (well, the first time we apparently implemented our own lower level).
I do not think the Linux world has a framework comparable to WPP. Once I even thought it could be a great idea to make an open source porting project for WPP. I am not sure it would be much requested though. I said it is a great engineering solution. But who wants to do dirty engineering work? Open source community prefer abstract object-oriented and generic solutions, streaming and less necessity in corresponding tools (WPP requires special management tools and OS support).Ease of code writing is the today's choice.
There could be Microsoft fault (or unwillingness) in the lack of WPP popularity too. They kept it as an internal framework that came out just by a case with Windows DDK because they have to offer some logging/tracing solution for driver developers. Nobody even noticed much that WPP is well suitable for the user-space code too. And WPP pre-processor for C#, for example, has never been exposed to public at all.
Nevertheless, I still think that WPP porting to Unix/Linux work can be a challenging, interesting and maybe even useful attempt. If someone decides to lead it. :)
As my company begins to further explore moving from centralized version control tools (CVS, SVN, Perforce and a host of others) to offering teams distributed version control tools (mercurial in our case) I've run into a problem:
The Problem
A manager has raised the concern that distributed version control may not be as secure as our CVCS options because the repo history is stored locally on the developer's machine.
It's been difficult to nail down his exact security concern but I've gathered that it centers on the fact that a malicious employee could steal not only the latest intellectual properly but our whole history of changes just by copying a single folder.
The Question(s)
Do distributed version control system really introduce new security concerns for projects?
Is it easier to maliciously steal code?
Does the complete history represent an additional threat that the latest version of the code does not?
My Thoughts
My take is that this may be a mistaken thought that the centralized model is more secure because the history seems to be safer as it is off on its own box. Given that users with even read access to a centralized repo could selectively extract snapshots of the project at any key revision I'm not sure the DVCS model makes it all that easier. Also, most CVCS tools allow you to extract the whole repo's history with a single command so that you can import them into other tools.
I think the other issue is just how important the history is compared to the latest version. Granted someone could have checked in a top secret file, then deleted it and the history would pretty quickly be significant. But even in that scenario a CVCS user could checkout that top secret version with a single command.
I'm sure I could be missing something or downplaying risks as I'm eager to see DVCS become a fully supported tool option. Please contribute any ideas you have on security concerns.
If you have read access to a CVCS, you have enough permissions to convert the repo to a DVCS, which people do all the time. No software tool is going to protect you from a disgruntled employee stealing your code, but a DVCS has many more options for dealing with untrusted contributors, such as a gatekeeper workflow. Hence its widespread use in open source projects.
You are right in that distributed version control does not really introduce any new security concerns since the developer has already access to the code in both cases. I can only think that since it is easier to work offline and offsite with GIT, developers might become more tempted to do it than in centralized. I would push to force encryption on all corporate laptops with code
not really easier, just the same. If you enable logs, then you will have the same information when the code is accessed.
I personally do not think so. It might represent the thought process leading to certain decisions but not necessarily more.
It comes down to knowledge on how to implement security measures in both cases. If you have more experience in one system vs another then you are more likely to implement more to prevent such loss but at the end of the day, you are trusting your developers with code the minute you allow them access to it. No way around that.
DVCS provides various protections against unauthorized writing. This is why it is popular with opensource teams. It has several frustrating limitations for controlling reading. Opensource teams do not care about this.
The first problem is that most DVCS encourage many copies of the full source. The typical granularity is the full repo. This can include many unneeded branches and even entire other projects, besides the concern of history (along with searchable commit comments that can make the code even more useful to the attacker). CVCS encourages developers to copy as little as possible to their desktop, since the less they copy, the faster it works. The less you put on mobile devices, the easier it is to secure.
When DVCS is implemented with many devices acting as servers, it is much more difficult to implement effective network security. Attacking a local CVCS workspace requires the attacker to gain access to the filesystem. Attacking a DVCS node generally requires attacking the DVCS itself on any device hosting the information (and remember: the folks who maintain most DVCS's are opensource guys; they don't care nearly as much about read controls). The more devices that host repositories, the more likely that users will set up anonymous read access (which again, DVCS encourages because of its opensource roots). This greatly simplifies the job of an attacker who is doing random sweeps.
CVCS that are based on URLs (like subversion) open the opportunity for quite fine-grain access control, such as per-branch access. DVCS tends to fight this kind of access control.
I know developers like DVCS, but there's no way it can be secured as effectively as CVCS. Most environments do a terrible job of securing their CVCS, and if that's the case then it doesn't matter which you use. But if you take access control seriously, you can have much greater control with CVCS as part of a broader least-privilege infrastructure.
Many may argue that there's no reason to protect source code. That's fine and people can argue about it. But if you are going to protect your source code, the best implementation is to not copy the source to random laptops (which are very hard to secure well), and rather have developers mount it from a central server. CVCS works well this way. DVCS makes no sense if you are going to keep it on a single server this way. If you are going to copy files to mobile devices, make sure you copy as little as possible. That's the opposite of DVCS.
There are a bunch of "security" issues; whether they are an issue depends on your setup:
There's more data floating around, which means the notional "attack surface" might be bigger (it depends on how you count).
But how much data does the "typical" developer check out? You might want to use a sparse checkout in svn, but lazy people and some GUI tools don't support that, so they'll have all your code checked out anyway. Git users might be more likely to use multiple repos. This depends on you.
Authentication/access control might be better (and it might be worse!). This is largely a function of the VCS, not whether it is "D" or "C". svn:// is plaintext.
Is deleting files a priority, and how easy is this to do? An accidental commit of a confidential file is more painful to do in git if it happened in the distant past (but people might be more likely to notice).
Are you really going to notice a malicious user pulling the entire history instead of merely doing a checkout? It depends on how big your repository is and what your branches are like. It's easy for a full SVN checkout to take up more space than the repository itself due to branches.
Change history is generally not something you want to give away for free (even to people with a source code license), but how valuable is it? Maybe you have top-secret design methodologies or confidential information in your commit messages, but this seems unlikely.
And finally, security economics:
How much is the extra security worth?
How much is increased productivity worth?
How much is caring about the concerns about your developers worth?
(IIRC it turns out that users should ignore security advice, because the expected cost is more than the expected benefit — this is especially true for things like certificates that expired yesterday. How much does it cost you to check the address bar every time you type in password? How often do you catch a phishing attempt? What is the cost to you per thwarted phishing attempt? What is the cost per successful phish?)
My logic of APT (Anti-Paching Technology) is as follows...
1) Store on the MSSQL server the md5 hash of the executable for protection.
2) Perform an md5 comparison (within my application startup) the hash found on the server, with the executable itself.
3) If comparison fails exit application silently.
And all these above before it is finally pached!
I mean what is your best way to protected a file from being patched?
Without using ready tools (.net reactor, virtualizer etc)
Edit: Something else just came into my mind.
Is there any way of checking the application integrity on server side?
I mean my app works only online. Could i execute something on the server (my domain) that could check the application integrity?
The thing is a cracker would patch the application precisely on step 2, removing the hash check code.
So I wouldn't call that very effective against serious crackers.
EDIT: I guess your best bet is defense in depth, given that your app has to be online I'd:
Require authentication: Authenticate users, hopefully via a cryptographic key, and require a key check to receive/send data.
Obfuscation: It makes things harder for crackers.
Continued checks: Besides checking who is sending data, validate the application each time a request is sent.
These all can still be circumvented, but they make things a lot harder and might disuade some if your app is not worth that much to them.
A patched application means the 'cracker' has complete control over the machine the code is running on (at least enough control to patch the executable). So patch prevention however smart it might be is working against the flow of control.
Complicating your binary file might be enough to discourage patching so obfuscators are propably your best bet.
you can't. once someone else has your file they can do what they like with it - first thing would be to patch out your anti-patching code.
If the application is running on someone else's machine, you cannot prevent them from patching it. You can make it harder, but it's a shell game: you cannot win. Regardless of how complicated you make it, some guy somewhere will see it as an interesting challenge to break your protection, and he will succeed. Then, everyone else just has to download his version. The most extreme form of patch-protection today is Skype (that I know of). It's insanely complicated, and yet it has been broken.
Since your application apparently runs online, you can ask yourself why you want to prevent patches in the first place (maybe it's to prevent the user from entering some bad values? Or to prevent them from seeing some information that's present in the program?), and then architect your program so that whatever you want to keep hidden or checked happens on the server.
For example if it's a game and you want to prevent players from hacking the game to know where the other players are: change the server so it only sends coordinate information for the players that you can already see.
Another example: if it's an online store and you want to make sure users don't submit purchase orders with incorrect prices, check the prices at the server.
The only exception there is if you control the hardware that the program's running on. But even there, it's very hard to do it right (see: XBox, PS3, and the many other consoles that tried to do that and failed). It's probably still better to leverage the client/server architecture rather than betting on "trusted computing".
Crackers nowadays don't bother patching your executable file; they simply change your program's variables in-memory to make its behaviour more amenable to their requirements. Defending against this is very difficult and reasonably pointless; most games' crack-protection works only by searching for signatures of known crack programs (like an AV engine does).
Everyone nailed it, you can't stop someone but you can make it harder for them, you could even go off the deep end and make some in-memory validation stuff like World Of Warcrafts Warden system.
If you tell us what language you are writing in we might be able to suggest some simple obfuscation methods.
Inspired by a much more specific question on ServerFault.
We all have to trust a huge number of people for the security and integrity of the systems we use every day. Here I'm thinking of all the authors of all the code running on your server or PC, and everyone involved in designing and building the hardware. This is mitigated by reputation and, where source is available, peer review.
Someone else you might have to trust, who is mentioned far less often, is the person who previously had root on a system. Your predecessor as system administrator at work. Or for home users, that nice Linux-savvy friend who configured your system for you. The previous owner of your phone (can you really trust the Factory Reset button?)
You have to trust them because there are so many ways to retain root despite the incoming admin's best efforts, and those are only the ones I could think of in a few minutes. Anyone who has ever had root on a system could have left all kinds of crazy backdoors, and your only real recourse under any Linux-based system I've seen is to reinstall your OS and all code that could ever run with any kind of privilege. Say, mount /home with noexec and reinstall everything else. Even that's not sufficient if any user whose data remains may ever gain privilege or influence a privileged user in sufficient detail (think shell aliases and other malicious configuration). Persistence of privilege is not a new problem.
How would you design a Linux-based system on which the highest level of privileged access can provably be revoked without a total reinstall? Alternatively, what system like that already exists? Alternatively, why is the creation of such a system logically impossible?
When I say Linux-based, I mean something that can run as much software that runs on Linux today as possible, with as few modifications to that software as possible. Physical access has traditionally meant game over because of things like keyloggers which can transmit, but suppose the hardware is sufficiently inspectable / tamper-evident to make ongoing access by that route sufficiently difficult, just because I (and the users of SO?) find the software aspects of this problem more interesting. :-) You might also assume the existence of a BIOS that can be provably reflashed known-good, or which can't be flashed at all.
I'm aware of the very basics of SELinux, and I don't think it's much help here, but I've never actually used it: feel free to explain how I'm wrong.
First and foremost, you did say design :) My answer will contain references to stuff that you can use right now, but some of it is not yet stable enough for production. My answer will also contain allusions to stuff that would need to be written.
You can not accomplish this unless you (as user9876 pointed out) fully and completely trust the individual or company that did the initial installation. If you can't trust this, your problem is infinitely recursive.
I was very active in a new file system several years ago called ext3cow, a copy on write version of ext3. Snapshots were cheap and 100% immutable, the port from Linux 2.4 to 2.6 broke and abandoned the ability to modify or delete files in the past.
Pound for pound, it was as efficient as ext3. Sure, that's nothing to write home about, but it was (and for a large part) still is the production standard FS.
Using that type of file system, assuming a snapshot was made of the pristine installation after all services had been installed and configured, it would be quite easy to diff an entire volume to see what changed and when.
At this point, after going through the diff, you can decide that nothing is interesting and just change the root password, or you can go inspect things that seem a little odd.
Now, for the stuff that has to be written if something interesting is found:
Something that you can pipe the diff though that investigates each file. What you're going to see is a list of revisions per file, at which time they would have to be recursively compared. I.e. , present against former-present, former-present against past1, past1 against past2, etc , until you reach the original file or the point that it no longer exists. Doing this by hand would seriously suck. Also, you need to identify files that were never versioned to begin with.
Something to inspect your currently running kernel. If someone has tainted VFS, none of this is going to work, CoW file systems use temporal inodes to access files in the past. I know a lot of enterprise customers who modify the kernel quite a bit, up to and including modules, VMM and VFS. This may not be such an easy task - comparing against 'pristine' may not be tenable since the old admin may have made good modifications to the kernel since it was installed.
Databases are a special headache, since they change typically each second or more, including the user table. That's going to need to be checked manually, unless you come up with something that can check to be sure that nothing is strange, such a tool would be very specific to your setup. Classic UNIX 'root' is not your only concern here.
Now, consider the other computers on the network. How many of them are running an OS that is known to be easily exploited and bot infested? Even if your server is clean, what if this guy joins #foo on irc and starts an attack on your servers via your own LAN? Most people will click links that a co-worker sends, especially if its a juicy blog entry about the company .. social engineering is very easy if you're doing it from the inside.
In short, what you suggest is tenable, however I'm dubious that most companies could enforce best practices needed for it to work when needed. If the end result is that you find a BOFH in your work force and need to can him, you had better of contained him throughout his employment.
I'll update this answer more as I continue to think about it. Its a very interesting topic. What I've posted so far are my own collected thoughts on the same.
Edit:
Yes, I know about virtual machines and checkpointing, a solution assuming that brings on a whole new level of recursion. Did the (now departed) admin have direct root access to the privileged domain or storage server? Probably, yes, which is why I'm not considering it for the purposes of this question.
Look at Trusted Computing. The general idea is that the BIOS loads the bootloader, then hashes it and sends that hash to a special chip. The bootloader then hashes the OS kernel, which in turn hashes all the kernel-mode drivers. You can then ask the chip whether all the hashes were as expected.
Assuming you trust the person who originally installed and configured the system, this would enable you to prove that your OS hasn't had a rootkit installed by any of the later sysadmins. You could then manually run a hash over all the files on the system (since there is no rootkit the values will be accurate) and compare these against a list provided by the original installer. Any changed files will have to be checked carefully (e.g. /etc/passwd will have changed due to new users being legitimately added).
I have no idea how you'd handle patching such a system without breaking the chain of trust.
Also, note that your old sysadmin should be assumed to know any password typed into that system by any user, and to have unencrypted copies of any private key used on that system by any user. So it's time to change all your passwords.