Security concerns of using mongodb [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I come from a MySQL background, and I am aware of the typical security concerns when using MySQL.
Now I am using MongoDB (Java driver).
What are the security concerns, and what are possible ways of avoiding security problems?
Specifically these areas:
1) Do I need to do anything for each get/post?
2) I store cookies from my application on the client side and read them later (currently the only information I store is the user's location, no sensitive information). Is there anything I should be careful about?
3) I have text boxes, text areas in my forms which users submit. Do I need to check for anything before saving data in mongo?
Can anybody provide any instances of security problems with existing applications in production?

It is in fact possible to perform injections with Mongo. My experience with it is in Ruby, but consider the following:
Request: /foo?id=1234
id = query_param["id"]
collection.find({_id: id})
# collection.find({_id: 1234})
Seems innocuous enough, right? Depending on your HTTP library, though, you may end up parsing certain query strings as data structures:
Request: /foo?id[$gt]=0
# query_param["id"] => {"$gt": 0}
collection.find({_id: id})
# collection.find({_id: {"$gt": 0}})
This is likely less of a danger in strongly typed languages, but it's still a concern to watch out for.
The typical remedy here is to ensure that you always cast your inbound parameter data to the type you expect it to be, and fail hard when the types do not match. This applies to cookie data as well as to any other data from untrusted sources; aggressive casting will prevent a clever user from modifying your query by passing in operator hashes instead of a value.
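A minimal sketch of that casting step, in JavaScript rather than the Ruby above. `parseIntegerId` is a hypothetical helper, not part of any driver, and it assumes the `_id` values are non-negative integers, as in the `/foo?id=1234` example.

```javascript
// Hypothetical helper: accept only a plain string of digits, reject anything else.
function parseIntegerId(raw) {
  // Query-string parsers may hand back objects or arrays (e.g. id[$gt]=0).
  if (typeof raw !== 'string' || !/^\d+$/.test(raw)) {
    throw new TypeError('id must be a string of digits');
  }
  const id = Number(raw);
  if (!Number.isSafeInteger(id)) {
    throw new RangeError('id is out of range');
  }
  return id;
}

// The innocuous request still works.
console.log(parseIntegerId('1234')); // 1234

// The operator hash from /foo?id[$gt]=0 fails hard instead of reaching the query.
try {
  parseIntegerId({ $gt: 0 });
} catch (err) {
  console.log(err.message); // id must be a string of digits
}
```

The value passed on to the query is then always a number, so it can only ever be compared for equality.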
The MongoDB documentation similarly says:
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e $) is a reserved character used to represent operators (i.e. $inc.) Thus, you should ensure that your application’s users cannot inject operators into their inputs.
You might also get some value out of this answer.

Regarding programming:
When you come from a MySQL background, you are surely thinking about SQL injection and wondering whether there is something like it for MongoDB.
When you make the same mistake of generating commands as strings and then sending them to the database by using db.command(String), you will have the same security problems. But no MongoDB tutorial I have ever read even mentions this method.
When you follow the usually taught practice of building DBObjects and passing them to the appropriate methods like collection.find and collection.update, it's the same as using parameterized queries in mysql and thus protects you from most injection attempts.
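The difference between the two styles can be sketched without a database. The example is JavaScript rather than Java, and both function names are hypothetical; it only shows that a filter assembled as a string can be restructured by its input, while a filter built as an object cannot.

```javascript
// Risky: the filter is assembled as text and parsed afterwards,
// so quotes in the input can add new keys.
function buildFilterFromString(name) {
  return JSON.parse('{"name": "' + name + '"}');
}

// Safer: the input is only ever a value inside an object the code built.
function buildFilterFromObject(name) {
  return { name: String(name) };
}

const hostile = 'bob", "isAdmin": true, "x": "y';

console.log(Object.keys(buildFilterFromString(hostile))); // [ 'name', 'isAdmin', 'x' ]
console.log(Object.keys(buildFilterFromObject(hostile))); // [ 'name' ]
```

The `String(...)` call is an extra precaution in this sketch: it keeps the value a string even if the caller passes in an object.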
Regarding configuration:
You need, of course, to make sure that the database itself is configured properly so that it does not allow unauthorized access. Note that the out-of-the-box configuration of MongoDB is usually not safe, because authentication is not enabled by default and, depending on the version and how it was installed, the server may accept connections from any host. Either enable authentication, or make sure that your network firewalls are configured to only allow access to the MongoDB port from within the network. But this is a topic for dba.stackexchange.com.

Related

rdb vs key-value store for django functionality [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
When would one choose a key-value data store over a relational DB? What considerations go into deciding one or the other? When is mix of both the best route? Please provide examples if you can.

Key-value, hierarchical, map-reduce, or graph database systems are much closer to implementation strategies; they are heavily tied to the physical representation. The primary reason to choose one of these is that there is a compelling performance argument and it fits your data processing strategy very closely. Beware: ad-hoc queries are usually not practical for these systems, and you're better off deciding on your queries ahead of time.
Relational database systems try to separate the logical, business-oriented model from the underlying physical representation and processing strategies. This separation is imperfect, but still quite good. Relational systems are great for handling facts and extracting reliable information from collections of facts. Relational systems are also great at ad-hoc queries, which the other systems are notoriously bad at. That's a great fit in the business world and many other places. That's why relational systems are so prevalent.
If it's a business application, a relational system is almost always the answer. For other systems, it's probably the answer. If you have more of a data processing problem, like some pipeline of things that need to happen and you have massive amounts of data, and you know all of your queries up front, another system may be right for you.

If your data is simply a list of things and you can derive a unique identifier for each item, then a KVS is a good match. They are close implementations of the simple data structures we learned in freshman computer science and do not allow for complex relationships.
A simple test: can you represent your data and all of its relationships as a linked list or hash table? If yes, a KVS may work. If no, you need an RDB.
You still need to find a KVS that will work in your environment. Support for KVSes, even the major ones, is nowhere near what it is for, say, PostgreSQL and MySQL/MariaDB.

IMO, key-value stores (e.g. NoSQL databases) work best when the underlying data is unstructured, unpredictable, or changing often. If you don't have structured data, a relational database is going to be more trouble than it's worth, because you will need to make lots of schema changes and/or jump through hoops to conform your data to the structure.
KVP / JSON / NoSql is great because changes to the data structure do not require completely refactoring the data model. Adding a field to your data object is simply a matter of adding it to the data. The other side of the coin is there are fewer constraints and validation checks in a KVP / Nosql database than a relational database so your data might get messy.
There are performance and space saving benefits for relational data models. Normalized relational data can make understanding and validating the data easier because there are table key relationships and constraints to help you out.
One of the worst patterns I've seen is trying to have it both ways. Trying to put a key-value pair into a relational database is often a recipe for disaster. I would recommend using the technology that suits your data foremost.

If you want O(1) lookups of values based on keys, then you want a KV store. That is, if you have data of the form k1={foo}, k2={bar}, etc., even when the values are larger or nested structures, and you want fast lookups, you want a KV store.
With the usual tree-based indexes, a relational DB lookup by an arbitrary key costs O(log n) rather than O(1), even with proper indexing. Lookups of this kind are sometimes referred to as "random lookups".
Alternatively stated, if you only ever query by one column, a "primary key" if you will, to retrieve the rest of the data, then using that column as a keyspace and the rest of the data as a value in a KV store is the most efficient way to do lookups.
In contrast, if you often query the data by any of several columns, aka you support a richer query API for the data, then you may want a relational database.
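The two access patterns can be sketched with an in-memory `Map` standing in for a KV store. The keys and records are invented for illustration; a real store adds persistence and networking, but the shape of the two lookups is the same.

```javascript
// The key space is the one column you always query by.
const store = new Map([
  ['k1', { name: 'foo', city: 'Oslo' }],
  ['k2', { name: 'bar', city: 'Lima' }],
]);

// Lookup by key: a single hash lookup, independent of how many entries exist.
console.log(store.get('k2').name); // bar

// Lookup by any other field: the store cannot help, so every value is scanned.
const inOslo = [...store.values()].filter((record) => record.city === 'Oslo');
console.log(inOslo.map((record) => record.name)); // [ 'foo' ]
```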

A traditional relational database has problems scaling beyond a point. Where that point is depends a bit on what you are trying to do.
All (most?) of the suppliers of cloud computing are providing key-value data stores.
However, if you have a reasonably sized application with a complicated data structure, then the support that you get from using a relational database can reduce your development costs.

In my experience, if you're even asking the question whether to use traditional vs esoteric practices, then go traditional. While esoteric practices are sexy, challenging, and fun, 99.999% of applications call for a traditional approach.
With regards to relational vs KV, the question you should be asking is:
Why would I not want to use a relational model for this scenario: ...
Since you have not described the scenario, it's impossible for anyone to tell you why you shouldn't use it. The "catch-all" reason for KV is scalability, which isn't a problem now. Do you know the rules of optimization?
1) Don't do it.
2) (For experts only) Don't do it yet.
KV is a highly optimized solution to scalability that will most likely be completely unnecessary for your application.

What is a specific, detailed, and reliable way to protect against NoSQL injection in MongoDB?

I know, there are lots of guides, tutorials, and questions on Stack Overflow.
But I am not doing this for fun, and I don't want my projects to be hacked by some 15-year-old kid because of some malicious input he wrote, like:
{$ne: null}
and immediately receive all of my users.
I am working with MongoDB and Node.js, with the mongojs driver (see it here: https://github.com/mafintosh/mongojs). From what I have tried, putting this input in a text field does not let the attacker receive the data, so something in between is already converting the input into plain text (I have tried this myself, with more than one syntax).
I also know about the mongo sanitize plugin (search on GitHub), but it is time-consuming, and I want to focus on developing my project instead of spending time on this for every endpoint, with the risk of forgetting something and throwing a whole business in the trash.
My question (and surely the question of many others):
Is that enough to be safe from NoSQL injection?
If it is not, please tell me what is. There must be a way to be 100% safe from NoSQL injection; otherwise, how do Google, Facebook, Stack Overflow, and other big sites avoid being hacked through SQL/NoSQL injection?
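For reference, the kind of check a sanitizing step performs can be sketched in a few lines. This is an illustrative assumption about the general approach, not the code of the plugin mentioned above or of mongojs, and `assertNoOperators` is a hypothetical name. It rejects any user-supplied value that contains a key starting with `$` at any depth.

```javascript
// Hypothetical guard: throw if untrusted input carries MongoDB-style operator keys.
function assertNoOperators(value) {
  if (value === null || typeof value !== 'object') {
    return value; // strings, numbers, and booleans cannot carry operators
  }
  for (const key of Object.keys(value)) {
    if (key.startsWith('$')) {
      throw new Error('operator key not allowed in input: ' + key);
    }
    assertNoOperators(value[key]); // walk nested objects and arrays
  }
  return value;
}

// Text typed into a form field is just a string and passes through unchanged.
console.log(assertNoOperators('{$ne: null}')); // {$ne: null}

// A parsed request body that smuggles in an operator is rejected.
try {
  assertNoOperators({ username: 'bob', password: { $ne: null } });
} catch (err) {
  console.log(err.message); // operator key not allowed in input: $ne
}
```

Calling such a guard in one shared place, before any handler runs, avoids having to remember it on every endpoint.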

Creating Similar Routes with Different Parameters in Nodejs

I did a google search, but I could not find what I really need.
I need to query an API, which have the same route, but with different parameters.
Example:
router.get('/items/:query', function(){})
In this case, I would search for all items
router.get('/items/:id', function(){})
Here, I would look for a specific item

At the core of your issue is that you are trying to specify two different resources at the same location. If you design your API to adhere to RESTful principles, you'll see why that's not a wise choice. Here are two good starting points:
https://en.wikipedia.org/wiki/Representational_state_transfer
http://www.restapitutorial.com/lessons/whatisrest.html
In RESTful APIs the root resource represents the item collection:
/api/items
...and when you specify an id that indicates you want only one item:
/api/items/abc123
If you really still want to accomplish what you asked in your question, you'll need to add a URL parameter like /items/query?query=true or /items/abc123?detail=true, but this will be confusing to 99% of the web developers who ever look at your code.
Also, I'm not sure if you really meant this, but passing a variable named "query" to the server seems to indicate that you're going to send a SQL query (or something in a similar query language) from the client to the server. This is a dangerous practice for several reasons; it's best to leave all of this type of code on your server.
Edit: if you really, absolutely, positively have to do it this way, then maybe have a query parameter that says ?collection=true. This would at least be understood by other developers who might have to maintain the code in the future. Also make sure you add comments to explain why you weren't able to implement REST, so you're not leaving behind a bad reputation :)

The issue is that, without additional pattern matching, Express has no way to distinguish between /items/:query and /items/:id; they are the same route pattern, just with different aliases for the parameter.
Depending on how you intend to structure your query you may want to consider having the route /items and then use query string parameters or have a separate /items/search/:query endpoint.
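A framework-free sketch of that split, using Node's built-in `URL` class instead of Express. `resolveItemsRoute` and the `query` parameter name are assumptions for illustration: `/items` with an optional `?query=` addresses the collection, and `/items/<id>` addresses one item.

```javascript
// Hypothetical resolver: decide which handler a request path belongs to.
function resolveItemsRoute(requestUrl) {
  // The base is only needed so that relative paths can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  const parts = url.pathname.split('/').filter(Boolean);

  if (parts[0] !== 'items' || parts.length > 2) {
    return { kind: 'not-found' };
  }
  if (parts.length === 1) {
    // Collection resource; the search term travels in the query string.
    return { kind: 'list', query: url.searchParams.get('query') };
  }
  // Exactly one extra segment: a single item addressed by its id.
  return { kind: 'item', id: parts[1] };
}

console.log(resolveItemsRoute('/items?query=red')); // { kind: 'list', query: 'red' }
console.log(resolveItemsRoute('/items/abc123'));    // { kind: 'item', id: 'abc123' }
```

In Express terms this corresponds to one route on `/items` that reads `req.query`, and a second route on `/items/:id`.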

How do you deal with legacy data integrity issues when rewriting software?

I am working on a project which is a rewrite of an existing legacy software. The legacy software primarily consists of CRUD operations (create, read, update, delete) on an SQL database.
Despite the CRUD-based style of coding, the legacy software is extremely complex. This software complexity is not only the result of the complexity of the problem domain itself, but also the result of poor (and regularly bordering on insane) design decisions. This poor coding has led to the data in the database lacking integrity. These integrity issues are not solely in terms of relationships (foreign keys), but also in terms of the integrity within a single row. E.g., the meaning of column "x" outright contradicts the meaning of column "y". (Before you ask, the answer is "yes": I have analysed the problem domain and correctly understand the meaning and purpose of these columns, and better than the original software developers, it seems.)
When writing the replacement software, I have used principles from Domain Driven Design and Command Query Responsibility Segregation, primarily due to the complexity of the domain. E.g., I've designed aggregate roots to enforce invariants in the write model, command handlers to perform "cross-aggregate" consistency checks, query handlers to query intentionally denormalised data in a manner appropriate for various screens, etc.
The replacement software works very well when entering new data, in terms of accuracy and ease of use. In that respect, it is successful. However, because the existing data is full of integrity issues, operations that involve the existing data regularly fail by throwing an exception. This typically occurs because an aggregate can't be read from a repository because the data passed to the constructor violates the aggregate's invariants.
How should I deal with this legacy data that "breaks the rules"? The old software worked fine in this respect, because it performed next to no validation. Because of this lack of validation, it was easy for inexperienced users to enter nonsensical data (and experienced users became very valuable because they had years of understanding its "idiosyncrasies").
The data itself is very important, so it cannot be discarded. What can I do? I've tried sorting out the integrity issues as I go, and this has worked in some cases, but in others it is nearly impossible (e.g., data is outright missing from the database because the original developers decided not to save it). The sheer number of data integrity issues is overwhelming.
What can I do?

For a question tagged with DDD, the answer is almost always: talk to your domain expert. How do they want things to work?
I also noticed your question is tagged with CQRS. Are you actually implementing CQRS? In that case it should be almost a non-issue.
Your domain model will live on the command side of your application and always enforce validation. The read stack will provide just dumb view models. This means that on a read your domain model isn't even involved, and no validation is applied. It will just show whatever nonsense it can use to populate your view model. On a write, however, validations are triggered, and any write will need to adhere to the full validations of your domain model.
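A minimal sketch of that split. The `Booking` aggregate, its invariant, and the row shape are all hypothetical; the point is only that the read side maps rows straight to a view model, while the write side goes through a constructor that enforces the rules.

```javascript
// Write side: the aggregate refuses to exist in an invalid state.
class Booking {
  constructor({ id, startDate, endDate }) {
    if (!(startDate instanceof Date) || !(endDate instanceof Date)) {
      throw new Error('booking needs both a start and an end date');
    }
    if (endDate < startDate) {
      throw new Error('booking cannot end before it starts');
    }
    this.id = id;
    this.startDate = startDate;
    this.endDate = endDate;
  }
}

// Read side: no domain model, no validation, just whatever the row contains.
function toBookingViewModel(row) {
  return {
    id: row.id,
    start: row.start_date ?? '(missing)',
    end: row.end_date ?? '(missing)',
  };
}

// A legacy row that breaks the rules can still be displayed...
const legacyRow = { id: 7, start_date: '2015-03-10', end_date: null };
console.log(toBookingViewModel(legacyRow));

// ...but saving a change to it has to satisfy the aggregate.
try {
  new Booking({ id: 7, startDate: new Date('2015-03-10'), endDate: null });
} catch (err) {
  console.log(err.message); // booking needs both a start and an end date
}
```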
Now back to reality: be VERY sure that the validations you implement are actually the validations required. Take even something as simple as a telephone number (often implemented as 3 digits, dash, 3 digits, dash, 4 digits). Companies have special phone numbers like 1800-CALLME, which contain letters as well as digits and could even be of different lengths (and different countries might also have different rules). If your system needs to handle this, it pretty much means you can't apply any validation to phone numbers.
This is just an example of how what you might think is a real validation can't be implemented at all because of that one special case it needs to handle. The rule here becomes, again: talk to your domain expert about how they want things handled. But be VERY careful that your validations don't make it nearly impossible for real users to use your system, since that's the fastest way to have your project killed.
Update: In DDD you would also hear the term anti-corruption layer. This layer ensures that incoming data meets the expectations of your domain model. This might be the preferred method, but if you say you cannot ignore items with garbage data, then it might not solve the problem in your case.

How to sanitize data coming out of a database

This is my first question on Stack Overflow, and I have spent a lot of time searching for a similar question but, surprisingly, could not find one.
So I read that no data should be trusted, whether from a client or that which is coming out of a database. Now while there are lots of examples that show how to sanitize data from a user ($_POST or $_GET), I could not find one that shows how the data from a database should be sanitized.
Now maybe it's the same as the data coming from a user / client (that's what I think it should be) but I found no example of it. So I am asking it just to make sure.
So for example if the result of a query yields as follows:-
$row=mysqli_fetch_assoc($result);
$pw = $row['Password'];
$id = $row['ID'];
$user = $row['Username'];
then do the variables $pw, $id and $user have to be sanitized before they should be used in the program? If so, then how ?
Thanks to all.

Your thinking is back to front here. By the time you are able to sanitise inputs using PHP, it's probably too late. The data is already in PHP. You don't sanitise inputs. You:
validate input & sanitise output
Normally a database is wrapped by the application tier, so the only data in there should have been filtered and escaped by your code. You should be able to trust it. But even then, in a relational database the data is fairly strongly typed. Hence there is little scope for attacking PHP from the data tier.
But you should be sanitising (escaping or encoding) any output. How you do that depends on where and how you are sending the data, hence it should be done at the point where it leaves PHP, not the point where it enters PHP. And the method you use (mysqli_real_escape_string, htmlentities, base64_encode, urlencode, ...) should be appropriate to where the data is going. Indeed, it is better practice to change the representation of a copy of the data (and discard it after use) rather than the original.
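As a sketch of encoding at the point of output: the stored value is left untouched and a differently encoded copy is produced for each destination. The example is JavaScript rather than PHP, and `escapeHtml` is a hypothetical helper (PHP's `htmlspecialchars` plays the equivalent role); `encodeURIComponent` is built into the language.

```javascript
// Hypothetical helper: encode the five characters that are special in HTML.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Value as it came out of the database; it is never modified in place.
const username = '<b>Tom & "Jerry"</b>';

// Going into an HTML page: HTML-encode a copy.
const forHtml = '<p>Hello, ' + escapeHtml(username) + '</p>';

// Going into a URL query string: URL-encode a different copy.
const forUrl = '/profile?user=' + encodeURIComponent(username);

console.log(forHtml);
console.log(forUrl);
```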

It depends... How are you accessing this database? Who works on / maintains it? Data going in is definitely a far bigger concern. However, if you want to sanitize data coming out of a database, you need to know what you are sanitizing for. If you want to sanitize web traffic against XSS, you'd probably want to remove all URLs not on a whitelist, perhaps script tags and a few other things as well. Are you sanitizing data going into a C/C++ program? Then you probably want to make sure you're protecting yourself against buffer overflow issues, as this is a legitimate avenue of attack.
I'm making some assumptions about your design here, but I'm going to assume you're just working on the model aspect of an MVC application using PHP. PHP, in this case, has been most vulnerable to SQL injection attacks on the back end and XSS (cross-site scripting) attacks on the front end. (NOTE: This isn't exclusively a PHP problem; this is a problem in all programming, and different languages provide different solutions to different problems.) Remember: you need to know what you're sanitizing and for what reason. There is no one-size-fits-all.
So really, unless you are sanitizing against something universal across all the code this model will serve, you probably don't want to sanitize here. XSS would be a bigger concern to you now than SQL injection... the way out is too late to stop an injection attack.
To take some liberty just to get the juices flowing: from a security standpoint, given that your code seems to revolve around authentication, I would be much more immediately concerned about how you are storing and processing your credential data. A few things you should definitely be doing:
Running the password through a secure, 1-way hash BEFORE storage (such as BCrypt).
Salting these hashes (with a different salt for EACH user) before storing them in the database to protect your user's data from things such as rainbow table attacks.
Using TLS for all communications.
Establishing and maintaining a secure session (track user-login without exposing password data with every single request sent, amongst other things).
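The first two points can be sketched with Node's built-in `crypto` module. Note the swap: this uses scrypt, not BCrypt, because scrypt ships with Node and BCrypt would need a third-party package. The function names and the stored `salt:hash` format are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hash a password with a fresh random salt; store the returned string.
function hashPassword(password) {
  const salt = crypto.randomBytes(16); // different for every user
  const hash = crypto.scryptSync(password, salt, 64);
  return salt.toString('hex') + ':' + hash.toString('hex');
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const record = hashPassword('correct horse');
console.log(verifyPassword('correct horse', record)); // true
console.log(verifyPassword('wrong guess', record));   // false
```

Because the salt is random, hashing the same password twice gives two different stored strings, which is what defeats precomputed tables.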
