FS based on a database without using fuse - linux

To serve millions of files out of a single directory, being able to connect to a drive from hundreds of endpoints, and for some other reasons (to avoid gluster/nfs/all fs based networking solutions), I want to evaluate the possibility of making a filesystem that's based on a mongodb (or any other).
Basically, it works like fusefs, every single file is kept in mongo gridfs. In theory, I do,
mount mongodbfs /mountPoint mongodb://localhost
then when i say touch /mountPoint/test.txt this file is inserted into mongodb. This FS will also store uid/gid and perms with the file, we can throw hundreds of servers to it, and no useradd will be necessary. I'm not thinking to include all the features of FS, just the ones we need.
My question is, how do I start my quest in finding resources, books, links, people, developers who'd help me implement this? at least a proof of concept. Is it feasible? What should I expect as a timeline for such undertaking?
Please only think about gazillion small files and folders.
ps: after a few days of research i think this is the direction i'm heading
http://www.ibm.com/developerworks/library/l-sc12.html
http://www.flipcode.com/archives/Programming_a_Virtual_File_System-Part_I.shtml
ps2: i'm aware of the difficulty of this undertaking. however we're willing to set aside a serious budget and willing to form a serious team implementing it - only after we make sure that this isn't a black hole (thus the question).

Your most frequent piece of advice here is going to be "Use FUSE". This is excellent advice, and you would do well to heed it (As Sciurus pointed out there's already gridfs-fuse which is pretty close to what you want).
That said, if you want to take the long, hard road of pain and suffering (writing your own filesystem), you almost certainly want to take an operating systems course at a local university, or look at some online course materials ("Write a simple FS" is usually a small project. The filesystems typically suck because they're academic toys).
Follow that up with Linux File Systems (Moshe Bar) and a thorough reading of some simple filesystem drivers to see the basic skeleton of what you'll need to do.
As far as timeline, if you're a decent coder you can write a basic filesystem in a few days to a week (but it will SUCK). I wouldn't even guess how long it would take to write a GOOD filesystem -- UFS/FFS (the BSD filesystem) has been under continuous development since at least the late 1970s/early 1980s, and improvements/enhancements/bug fixes still pop up occasionally. Sun/Oracle's ZFS has gone through over 20 iterations in its relative short (6-year) life, though admittedly much of that is related to volume management capabilities.

Related

Node server: code weight and server performance

I would like to know how important could be the impact between using a 15.5k library just for doing very simple validations, and, using my own 1k super-simple validation class, in the time when I'll have more than 10k users on my system (Node + Mongo running on a super pentium 8 core 32gb ram).
Is it worst to care about this 14.5k of code?
I cant find any clue in my so bleak but always wondering mind.
I'll apretiate very much your opinion.
A nice thing about server development is that you usually have significant RAM available and the code is generally loaded just once at the startup of the server so load time is not part of the user experience.
Because of these, you'd be hard pressed to even measure a meaningful impact between a 1k library and a 15k library. You might care about 15k of memory usage per active user, but you will not care about an extra 15k of code loaded into memory one time and it will not affect your server performance in any way.
So, I'd say you should not worry about skrimping on code size (within reason). Instead, pick the tool that best solves your problem, makes your development quickest and the most reliable. And, where possible, use what has been built and tested before rather than build your own from scratch. That will leave you more development time to spend on the things that really matter to making your website better, different or great. Or, it will get you to market faster.
For reference, 15k is 0.000045% of your total computer's RAM.
I agree with #jfriend00. There's almost no impact on memory/performance for the code sizes you describe. You can always benchmark different modules according to your usage profile and choose by yourself. However, I think you should ask yourself some other (similar) questions -
Why the package I use is so 'big'? maybe there's a much 'smaller' one
that does the same job with the same performance. When I say big or small here I mean in terms of functionality. Most of the times you'd want to go with minimum functionality, even if its size might seem big. If you use a validation module that also validates emails, but you don't need it - doesn't mean that you shouldn't use it, just know the tradeoffs - it might get updated more frequently because bugs in the email validation that might cause other bugs in integer validations that you use, you have more code to read if you want your production code to feel more safe (explained bellow).
Does the package function as I expect? (read the tests)
Is the package I use "secured"/"OK for production"? Read the code of the packages you use, make sure there isn't something fishy going on - usually node packages are not that big because most are minimal (I never used it, but I know https://requiresafe.com/ exists for these types of questions - you might want to check it out). Note that if they are larger in size that might mean you would have to read more code.
Ask these questions (and others of you feel you should) recursively on the package' dependencies.

Finding Vulnerabilities in Software

I'm insterested to know the techniques that where used to discover vulnerabilities. I know the theory about buffer overflows, format string exploits, ecc, I also wrote some of them. But I still don't realize how to find a vulnerability in an efficient way.
I don't looking for a magic wand, I'm only looking for the most common techniques about it, I think that looking the whole source is an epic work for some project admitting that you have access to the source. Trying to fuzz on the input manually isn't so comfortable too. So I'm wondering about some tool that helps.
E.g.
I'm not realizing how the dev team can find vulnerabilities to jailbreak iPhones so fast.
They don't have source code, they can't execute programs and since there is a small number of default
programs, I don't expect a large numbers of security holes. So how to find this kind of vulnerability
so quickly?
Thank you in advance.
On the lower layers, manually examining memory can be very revealing. You can certainly view memory with a tool like Visual Studio, and I would imagine that someone has even written a tool to crudely reconstruct an application based on the instructions it executes and the data structures it places into memory.
On the web, I have found many sequence-related exploits by simply reversing the order in which an operation occurs (for example, an online transaction). Because the server is stateful but the client is stateless, you can rapidly exploit a poorly-designed process by emulating a different sequence.
As to the speed of discovery: I think quantity often trumps brilliance...put a piece of software, even a good one, in the hands of a million bored/curious/motivated people, and vulnerabilities are bound to be discovered. There is a tremendous rush to get products out the door.
There is no efficient way to do this, as firms spend a good deal of money to produce and maintain secure software. Ideally, their work in securing software does not start with a looking for vulnerabilities in the finished product; so many vulns have already been eradicated when the software is out.
Back to your question: it will depend on what you have (working binaries, complete/partial source code, etc). On the other hand, it is not finding ANY vulnerability but those that count (e.g., those that the client of the audit, or the software owner). Right?
This will help you understand the inputs and functions you need to worry about. Once you localized these, you may already have a feeling of the software's quality: if it isn't very good, then probably fuzzing will find you some bugs. Else, you need to start understanding these functions and how the input is used within the code to understand whether the code can be subverted in any way.
Some experience will help you weight how much effort to put at each task and when to push further. For example, if you see some bad practices being used, then delve deeper. If you see crypto being implemented from scratch, delve deeper. Etc
Aside from buffer overflow and format string exploits, you may want to read a bit on code injection. (a lot of what you'll come across will be web/DB related, but dig deeper) AFAIK this was a huge force in jailbreaking the iThingies. Saurik's mobile substrate allow(s) (-ed?) you to load 3rd party .dylibs, and call any code contained in those.

Is it better to do roll-your-own or ready-built forum software?

As part of a wide ranging job for a cystic fibrosis support organization, they'd also like a web site set up and I've decided on Apache running on Linux (due to its security and low cost mostly). Other than (fairly) static content, they also want a forum where people can discuss issues with the condition - it'll be attached to a hospital chain so there'll be plenty of medical staff there who know little about the web.
I can handle all the specific coding and Apache setup since I've done it before but I'm interested in people's opinions as to whether I should roll my own forum software or get a hold of some ready-built stuff. I've not had any experience with forum software but I could generate my own (initially buggy, I'm sure) in a month or so.
It'll require registration and login to leave comments (but guest access just to read) and I'd like it to be 'pretty' (excuse me while I remember damning customers for providing similarly vague requirements specs :-) but not necessarily infinitely-configurable with skins/themes/etc.
If anyone has some compelling reasons (and experience with specific products that can provide what I need), I'd be interested in hearing about them. Alternatively, does anyone have any 'gotchas' they experienced while coding their own forum software?
Advantages to rolling your own:
a non-standard custom-built system means you'll be less prone to "standard" attacks (e.g.: a vulnerability in PunBB) since bad guys tend to bother with exploit-hunting only on widely-deployed systems (more return on their investment)
absolute control over how your system works and looks
you'll learn a lot
Disadvantages:
you'll repeat mistakes other people have already solved
it'll take you longer to get up and running
long-term it'll be more maintenance (since you have to fix bugs & add features yourself).
you can't "leverage the community" -- if you choose an off-the-shelf forum that has a plugin system then there's a whole bunch of community add-ons that won't be available for your custom forum software.
There's a GIANT list of forum software on wikipedia -- there's most likely something in there that will suit your needs that you can get up and running quickly.
IMHO the old "don't build what you can buy" adage applies to this (well, the web 2.0 version is obviously "don't build what you can download"). Have a look around at the available forum software, pick one that covers 99% of your needs and tweak it to do the rest.
If you still want to build your own forum software that'll probably be a cool side project but if the job is to get a forum up and running, then go and download one - don't try to mix up the desire to do cool stuff and the day job unless the day job is just to do cool stuff only.
One of the best-kept secrets on the internets is a little gem called FUDforum, by Ilia Alshanetsky.
And yes, it's the same Ilia who wrote xDebug's original profiler code, improved the caching in MMcache, fixed several security bugs in libmcrypt, and who was the release manager for the PHP language from 4.3.3 to 4.3.6+. He is, as my friends in Boston would say, wicked smaart.
Because of this, FUDforum is robust, ridiculously fast and more secure than probably any other part of your web application will ever be. It comes with a neat install script and it has all the features you'll need.
Plus, it's not a high-profile target like phpBB or vBulletin, which means you won't have to worry about spambots constantly banging on the gates.
Having written my own forum software before...
It seems like a simple problem, but when you get into it, you find that there's a lot of little things that you'd like to do nicer, and it takes a lot of time. Mine was cool and all, and I did get paid for it, but if I was doing it over again (which has also happened), I'd use a customizable pre-made solution, and spend all my spare time doing something productive. :)
Forum softwares tend to have rather complex minimum requirements. A few things you are very likely to need do matter what you do:
Forum/thread/post hierarchy;
User system;
Security system (eg user/admin classes and all kinds of restrictions for users);
Gathering statistics;
BBCodes or some other minimized markup language (NEVER allow users to do full HTML);
File uploads and avatars;
Bans and other punishments;
CAPTCHAs;
etc.
Ready made forum systems provide this out-of-the-box and lots more. Setup is mostly easy too. Why do it all over again yourself?
My answer would be: don't reinvent the wheel, there are plenty of fora software out there. My preference would go for RForum if you need only that.
I'd say, don't waste your time. phpBB 3 is pretty stable, usable and feature-rich forum. We use it at work (for our internal discussions), and I really don't have anything bad to say about it.
I'd concur with most of the above posters that since you want something which appears fairly standard, why reinvent something that already exists?
Like any development, creating forum software is probably much harder than it looks! There will be problems solved in the existing software which you haven't even considered.
It's worth adding that if you do require any specific additional functionality, you can always build that on top of an existing solution anyway, which is especially easy if you have the source code (whether open source or commercial).
From the sounds of the website that you are building, there is the potential for the forum to be a highly useful and visible resource, it would be good to go with something that already exists, due to the quality of a lot of the products out there and the rich communities that surround them.
I think that vBulletin, although a paid for product, would suit your needs and give you a great base to build a community on.
vanilla is pretty bare bones and easy to configure, perhaps find a system which is easy to extend vs building everything yourself
Ready built until you have some really unique features needed that can be tied to money it will make you.

How do you balance the conflicting needs of backwards compatibility and innovation?

I work on an application that has a both a GUI (graphical) and API (scripting) interface. Our product has a very large installed base. Many customers have invested a lot of time and effort into writing scripts that use our product.
In all of our designs and implementation, we (understandably) have a very strict requirement to maintain 100% backwards compatibility. A script which ran before must continue to run in exactly the same way, without any modification, when we introduce a new software version.
Unfortunately, this requirement sometimes ties our hands behind our back, as it really restricts our ability to innovate and come up with new and better ways of doing things.
For example, we might come up with a better (and more usable) way of achieving a task which is already possible. It would be desirable to make this better way the default way, but we can't do this as it may have backwards compatibility implications. So we are stuck with leaving the new (better) way as a mode, that the user must "turn on" before it becomes available to them. Unless they read the documentation or online help (which many customers don't do), this new functionality will remain hidden forever.
I know that Windows Vista annoyed a lot of people when it first came out, because of all the software and peripherals which didn't work on it, even when they worked on XP. It received a pretty bad reception because of this. But you can see that Microsoft have also succeeded in making some great innovations in Vista, at the expense of backwards compatibility for a lot of users. They took a risk. Did it pay off? Did they make the right decision? I guess only time will tell.
Do you find yourself balancing the conflicting needs of innovation and backwards compatibility? How do you handle the juggling act?
As far is my programming experience is concerned, if I'm going to fundamentally change something that will prevent past incoming data to be used correctly, I need to create an abstraction layer for the old data where it can be converted for use in the new format.
Basically I set the "improved" way as default and make sure through a converter it can read data of the old format, but save or store data as the new format.
I think the big thing here is test, test, test. Backwards compatibility shouldn't hinder forward progress.
Split development into two branches, one that maintains backwards compatibility and one for a new major release, where you make it clear that backwards compatibility is being broken.
The critical question that you need to ask is wether the customers want/need this "improvement" even if you perceive it as one your customers might not. Once a certain way of doing things has been established changing the workflow is a very "expensive" operation. Depending on the computer savyness of your users it might take some a long time to adjust to the change in the UI.
If you are dealing with clients innovation for innovation's sake is not always a good thing as fun as it might be for you to develop these improvements.
You could alawys look for innovative ways to maintain backwards compatibilty.

How do you protect code from leaking outside? [duplicate]

This question already has answers here:
How do you protect your software from illegal distribution? [closed]
(22 answers)
Closed 5 years ago.
Besides open-sourcing your project and legislation, are there ways to prevent, or at least minimize the damages of code leaking outside your company/group?
We obviously can't block Internet access (to prevent emailing the code) because programmer's need their references. We also can't block peripheral devices (USB, Firewire, etc.)
The code matters most when it has some proprietary algorithms and in-house developed knowledge (as opposed to regular routine code to draw GUIs, connect to databases, etc.), but some applications (like accounting software and CRMs) are just that: complex collections of routine code that are simple to develop in principle, but will take years to write from scratch. This is where leaked code will come in handy to competitors.
As far as I see it, preventing leakage relies almost entirely on human process. What do you think? What precautions and measures are you taking? And has code leakage affected you before?
You can't stop it getting out. So two solutions - stop people wanting to hurt you, and have legal precautions. To stop people hating you treat them right (saying more is probably off topic for stack overflow).
I'm not a lawyer, but to give yourself legal protection, if you believe in it, patent the ideas, put a copyright notice in the code, and make sure the contracts for your programmers specify carefully intellectual property rights.
But at the end of the day, the answer is run quicker than the competition.
Unless you're working with something highly classified and given that you can't block email and USB devices I guess you aren't there's really not to much damage to be had even if the source code leaks. The thing is, what is the code, or parts of it worth without the knowledge of how it works and the organization around it.
In general the value of "source" is much less than is commonly touted, basicly the source without the people or the organization isn't worth the storage it occupies for a competitor.
Also, you're missing the most likely attack vector, and it's also the one you can't stop no matter what. If someone really really want's to know how you made your magic then they'll try to hire your developers away, and since you can't stop them from having information inside their skull and even if they turn in all their possesions ther knowledge and domain expertise is leaving with them. Basicly employee retention and trust is the only way. Sorry.
I don't know how much actual help this is going to be, but:
Don't p*ss your programmers off. Don't get them in a position where they want to give the source to a competitor. Most places undervalue their developers. Given where you are (SO), I guess you are less likely to. Nothing got to me more than seeing the sales folks out for games of golf - paid, and paid for, by the company - while we had to fight to get pizza once a month.
Really, if your direct competitors got your code today, what would it do? Is your product or vertical market that stagnant that you wouldn't release newer, better versions before they could react? Is there no room for innovation? Most companies overvalue their "proprietary algorithms and in-house developed knowledge". Sure, it may cut some time off, but it's only about 10% of the problem.
If you got all the source for all your competitors products, how much actual use would it be? I'd guess it would set you back months. Not forward. Back.
If you had a clean system, and little external/internal knowledge, how long would it take you to get your own product into a buildable state? How long would it take to drill down into the code and workout what is going on? How much time and money would you waste trying to work something out, rather than spending time and money on how to make your product work better?
I've actually been in the position of having all the source - 1million lines+ of code - to a competitor's product. We did nothing with it - aside from a bit of a poke-around and then delete it, which was more than I was comfortable with - but I would expect that we'd have chewed up months of time just to get to where they were then.
So we nuked it, slapped the id10t who got it (yes, a developer/PM who came over from the other company), and thought about how to make our product kick so much butt that it didn't matter what they did. Much better use of time. Worked well, too. We had differentiators, not just re-hashing the same features in the same way they did them.
Sorry, but there is no way you can stop people getting stuff out, and still be able to actually work. You can stop them wanting to do it, or make it so there is no value to them having it.
We were worried about people decompiling our code too. We stopped worrying when we realised that WE had enough trouble working out what was going on inside 500K+ lines of C#, C++ and HTML code talking to MAPI/Exchange. If someone can decompile it and work it out, then we want to hire them......
BTW, for clarity, and given who I now work for, I should point out this is not my current employer. This was quite a while ago.
The code does not leak out on itself. It takes people to take it. There are obviously some security measures you might use like traffic analysis and lock-down on the repositories so only authorized developers can connect to it.
But by the end of the day your best option is to make sure that no one WANTS to steal from you. Your team has to be happy, they have to be proud to work for your they have to be loyal to the company and to each other. If you have such team it's a simple question of explaining to everyone that the code has to be protected from outsiders. It will not stop a dedicated mole but will prevent accidents.
P.S. And yes, proper clauses in the contracts would not harm as well, at least they will make sure that the developers are AWARE that taking code outside is morally wrong.
Follow these guidelines and it shouldn't matter if the contents of your entire source code repository is posted all over stackoverflow:
http://geocities.com/mdetting/unmaintainable.html
Oh, and show your developers that you don't trust them by blocking access to parts of the source code, scanning outgoing/incoming email etc. That is a surefire way to make them want to stay around... ...nothing improves morale like a bit of mistrust in the workplace.
Another cool way is to tell one half that they are "team a" and name the other half as the untrustworthy "team b". Then reverse it and say the same thing to the "team b" members. Encourage them to keep an eye on the "bad guys" in the other team and to report any signs of illoyalty to you. Sprinkle a few "conflict inducers" (e.g. tell "Joe": 'do you know what Ed says about you behind your back?') etc. Works wonders if you set up the developers against each other and create a few [invented-by-you] conflicts here and there...
(Eh, and no, I don't actually recommend any of the above. Just kidding. But I have seen people use all of the tactics above. And it didn't work.)
Okay, I am going to be a little practical here.
Being nice to everybody and hoping they won't hurt you doesn't work.
Every programmer knows from the day he joins a company that he'll not stay there forever. He will change when he's learned enough to get a better opportunity.
The programmers who write the code believe that they have the ownership to it even if they wrote it on the time they rented out to somebody else. So many of them will usually try to get their hands on the source-code even if they don't intend to hurt anybody.
Once they leave the company and they've carried the source code with them and lost contact with their colleagues, the conscience settles down and goes on a vacation and after a while bits and pieces from the code start showing up everywhere.
That's what I KNOW happens cause I've witnessed it happen to my company.
So what does one do?
Sign a NDA which specifically mentions that they programmer WILL not take copies.
Distribute your product between programmers, and if possible get modules coded individually and integrated by a chief whose responsibility is that all programmers do nt get all the code.
At the time of termination get a written undertaking from the coders that they do not possess any IP of the company and they understand the penalties of violation.
If somebody violates your IP, sue the man! No exceptions. It'll work as an example for the present team.
Do I sound extreme?
I remember this happening to Valve when they were developing HL-2. Interesting link here: http://www.shacknews.com/onearticle.x/28619
Most of the answers are based on Moral and ethical values. I wonder if Google, Facebook etc. just rely on their employees good will. Give me a break, that's totally utopian. Don't be a fool. Be realistic.
YES, it is possible to prevent code leaking:
Using a virtual server hosting virtual machines, programmers can only access locally to these virtual machines (intranet) via Remote Desktop. Repository is managed locally. private keys are required to access the repository. Copy/paste from virtual machine to client is disabled. only copy/paste from client to virtual is allowed.
Companies like facebook do that.
The only way to still code is by taking pictures to the actual code, which is totally not practical and feasible at all, and since there are surveillance cameras everywhere, you will have to go to the bathroom to take those pictures.
I've worked somewhere where there was a real culture of secrecy about this sort of thing (historically there had been a number of times when the company was small where "customers" had, shall we say, abused their access to our product).
While at the top the management were very protective, I see it slightly differently. I think our code, while not entirely irrelevant, isn't as key as you'd expect it to be in a software company.
The reason that we are successful is:
1) The code is essentially the solution to a bunch of problems. If you get our code you get those solutions but we still have the smart people who solved those problems. They understand those problems better than you do and are better able to solve the next set of problems better than you are.
2) Because they really understand the problems (and the solutions) we can do things faster than our competitors which translates to cheaper (or more profitable).
3) Also because of those people and the attitude within the company we've delivered well to our clients and provided good support.
4) And because of that we have a good reputation and reference-able customers.
A small number of companies have code which is genuinely worth keeping secret - proprietary algorithms and that sort of thing - but for a vast majority of us our products are very easily replicable by smart people.
What I'm saying is do the basics - write it into people's contracts that they can't take it, keep it secure and so on - but don't obsess over it. Unless you're in a very specific market it's unlikely to be what's really going to make your business succeed or fail.
The best step starts from reruting guys with strong ethical behaviour.
Various other steps can be taken like all communication being scanned. There are places where email and all information going out is scanned. The desktop/laptop does not have hard-disk or the access is restricted and all work is on network folders, even when working from home, one has to get connected to internet. The offline work gets synchronized. The USB and drives are disconnected.
The other policies are to provide access only on need basis.
These will only slow down and hinder to some extent, but is one is very determined then he would find ways to get around this.
The other way is if the code is really very important, then have the idea copywrite protected legaly.
To be honest it's almost impossible. If I wanted to suggest what a company that would shortly appear on the Daily WTF would do:
Disconnect the "work computer" from the internet, bt because they need internet access for reference buy everyone a wbbook.
Stuff the developers USB slots with epoxy and require that they load/unload everything from a centralised server, which scans all the data that goes through it for code like syntax.
Or you could just trust your employees and make them sign an NDA...
I personally never tested on any real case, but I would suggest using code fragmentation:
basically you split your project in a number of libraries, define interfaces and unit tests for each of them, then you separate SVN repositories so that each group have access to a limited part of your precious source code.
This is also a good practice no matter what and should help if you are outsourcing abroad.
The previous answers all seem to center on building trust and employing ethical people.
Another possibility might be to create your own domain specific language and tools. That will make any leaked code harder to use. It might still be possible to steal useful ideas from it, but it would not be possible to simply compile a competing product unless the whole toolchain is leaked.
Trust your developers. People tend to live up or down to expectations. Treat them well, and remember that loyalty goes both ways. After all, if you can't cut off thumb drives, you can't stop anybody from leaking code, no matter how much you don't trust them.
That being said, find yourself a lawyer with trade secret expertise, probably expertise in other parts of IP law, and ask how to legally safeguard stuff. You do want to make sure that, if a competitor gets your stuff, it's not legal for the competitor to benefit from it.

Resources