I have a Node app that collects the user's address and sends it to the Yahoo PlaceFinder api (which returns the user's geolocation).
The user should fill out the "address" field and click submit.
Many different address formats are acceptable, such as:
90210 (just zip)
123 fake street, beverly hills, CA 90210
123 fake street, 90210
... etc
I'm not concerned if the user enters a valid address or not. I don't even want to think about what RegEx would be needed for that.
I am concerned about security.
What steps (if any) should I take to sanitize the user's input before my Node app passes it to the Yahoo PlaceFinder API in an http.get() request?
Let me give a fairly pragmatic answer beyond the usual "sanitise your untrusted data against a white list of acceptable values" response. When you say "collects the user's address", are you actually either:
Passing it to a location such as a database where you might be worried about injection attacks
Rendering it to a page where you might be worried about an XSS attack
If you're simply taking the input and passing it off to Yahoo, then the problem is more theirs than yours as you're not exposing yourself to either of the above attack vectors. You might find that if this is the case, there's not a lot of point in going down the sanitisation path which will inevitably be very difficult against something as variable as a free-form address.
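Even when the address is only forwarded, it still has to be percent-encoded when the request URL is built, so that characters such as & or # in the input cannot change the query string. A minimal sketch of that step, in Python for brevity; the endpoint URL and the parameter name q are placeholders, not the real PlaceFinder API:

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; substitute the real service URL.
GEOCODER_URL = "https://geocoder.example.com/geocode"


def build_lookup_url(address: str) -> str:
    """Return the request URL with the user's address percent-encoded."""
    # urlencode escapes '&', '=', '#', ',' and spaces, so whatever the user
    # typed stays inside the single 'q' parameter (an assumed parameter name).
    return GEOCODER_URL + "?" + urlencode({"q": address.strip()})


print(build_lookup_url("123 fake street, 90210&flags=X"))
```

In Node the equivalent step would be encodeURIComponent() on the address before it is appended to the URL.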
I dare say the simplest method would be to apply a regex to their input, allowing only alphanumeric characters and spaces, along with perhaps a comma and a period. I'm not sure if that would allow all valid addresses through, but if it fails you could display an error message to the effect of "only use [A-Za-z0-9 ,.]"
That should, in theory, mitigate most types of exploits you would see, as they would most likely need some form of control character to break your code. Barring an overflow of some sort, I'm pretty sure commas and periods are relatively harmless given your situation.
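A rough sketch of that whitelist check, in Python for brevity. The 200-character cap is my own assumption, and the character class is spelled A-Za-z because a range written as A-z would also let through the punctuation that sits between Z and a in ASCII:

```python
import re

# Letters, digits, spaces, commas and periods only.
# The 200-character limit is an arbitrary assumption, not a known API limit.
ADDRESS_RE = re.compile(r"[A-Za-z0-9 ,.]{1,200}")


def is_acceptable_address(text: str) -> bool:
    """True if every character of the input is on the whitelist."""
    return ADDRESS_RE.fullmatch(text) is not None


print(is_acceptable_address("123 fake street, beverly hills, CA 90210"))
print(is_acceptable_address("90210<script>"))
```

This will reject some legitimate addresses (apostrophes, hyphens, accented letters), so the error message matters.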
I have a screen sharing application in which all sensitive data is masked. There are some scenarios in which the customer types sensitive data, such as the SSN, into a non-masked field, thereby directly compromising our solution.
Is there a way I can detect that a person is typing an SSN or other sensitive data purely on the client side, without accessing the DB or any other server-side information?
For instance, consider a form with SSN and address fields. How can I avoid displaying an SSN mis-entered in the Address field?
SSN
Address:888-9999-0988
EDIT
My approach as of now is storing all patterns in a property file on the client side. For example, a typical password in my application is at least 8 characters long and follows this rule:
^(?=.*[A-Za-z])(?=.*\d)(?=.*[$#$!%*#?&])[A-Za-z\d$#$!%*#?&]{8,}$
I will have this regex stored on client side. Once the user starts typing, I will check whether he is typing a password or a username.
Similarly, I can break a phone number into country - area code format and can have a regex for that also.
But since an SSN is a pure 9-digit random number, I am stuck.
By extracting field names from the DOM, I can get the exact location of the cursor and the name of the field in which the user is typing. Using the data format that field supports, I can get a rough idea of whether the user is typing the required data or something else. Please correct me if I am wrong.
September 14th, 2016:
Since my screen sharing solution runs in a browser, websites will use it as a pluggable component, so it won't be easy to ask them to restructure their pages.
Below is the approach I have finalized based on your inputs.
I will have one file with all patterns stored in it.
1. Credit-related information all has definite patterns, so it won't be a problem.
2. Passwords also have some rules or patterns, so these can also be checked.
To avoid data exposure, I will show a generic message while the user is typing, the way WhatsApp shows "User is typing". Only once the user tabs out or moves to the next field will the data be shown, after inline pattern validation.
The only thing left is how to detect fields like SSN, phone number, etc., which do not have a distinguishing pattern. I hope we can find a solution for this as well.
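The approach described above could be sketched roughly as follows, in Python for brevity (the real thing would run as JavaScript in the browser). The function names, the Luhn checksum for card numbers and the 13 to 19 digit range are my own assumptions; only the password rule is taken from the question:

```python
import re

# The password rule quoted earlier in the question, unchanged.
PASSWORD_RE = re.compile(
    r"^(?=.*[A-Za-z])(?=.*\d)(?=.*[$#$!%*#?&])[A-Za-z\d$#$!%*#?&]{8,}$"
)


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card_number(text: str) -> bool:
    digits = re.sub(r"[ -]", "", text)  # drop spaces and hyphens
    return re.fullmatch(r"[0-9]{13,19}", digits) is not None and luhn_ok(digits)


def classify(text: str) -> str:
    if looks_like_card_number(text):
        return "card"
    if PASSWORD_RE.match(text):
        return "password-like"
    return "unknown"


def text_to_share(value: str, field_left: bool) -> str:
    """What the remote viewer sees for one field."""
    if not field_left:
        return "User is typing..."  # generic message while typing
    if classify(value) != "unknown":
        return "*" * len(value)     # recognised as sensitive: keep it masked
    return value


print(text_to_share("4111 1111 1111 1111", field_left=True))
print(text_to_share("212736500", field_left=True))
```

Note the last call: a bare 9-digit SSN matches none of the patterns and is shown as typed, which is exactly the gap described above.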
Regards
Harry
In short, you can't.
How do you differentiate the sensitive data by value and format?
For instance, how do you differentiate SSN 212-73-6500 from NYC telephone number 212-736-5000 in time to blank out sensitive information? Similarly, how do you detect that the user is typing a password in the address field, before displaying any characters of the password?
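To make the ambiguity concrete, here is a small Python check on those two example values. If the user types digits without separators, the first nine keystrokes are identical:

```python
def digits_only(text: str) -> str:
    """Keep only the digit characters of the input."""
    return "".join(ch for ch in text if ch.isdigit())


ssn = digits_only("212-73-6500")     # 9 digits
phone = digits_only("212-736-5000")  # 10 digits

# The phone number begins with exactly the digits of the SSN, so nothing
# typed before the tenth digit can tell the two apart.
print(ssn, phone, phone.startswith(ssn))
```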
You say that you cannot access the database to match entered data against already known values. Unless you can dictate some differentiation to the user before sensitive data gets entered, there is simply not enough information in the keystrokes to detect the mistake. You can't differentiate a password from a name or other alphanumeric input.
I see one possibility: treat all data as sensitive until cleared for display. Each input field gets masked until you verify that it cannot contain sensitive information.
Unfortunately, I don't know that you can do this strictly from the client side. Again, how do you differentiate a mis-typed SSN (left out or doubled a digit) from a phone number? How can you tell an alphabetic address from a password?
I believe that, to solve this, you will need to impose some restrictions on the sensitive input you request. Passwords cannot look anything like any other field. You separate confusable fields by as much as possible on the page ... and some of this may be problematic with the smooth flow of entering information.
EDIT RESPONSE
That's correct: you can set up a family of recognition rules and differentiate various types of input. However, you still have some innate problems in that you cannot differentiate some types of input until it's too late, such as the phone-or-SSN ambiguity above, or telling a password from a name or address (some addresses start with pure alphabetic characters).
Can you fulfill security requirements with severe partitioning? For instance, put all of the sensitive information on one page, and all of the displayed information on another. I'm looking for a way to force the user to know that certain information is sensitive.
Also, do you have to configure this for international use? Security requirements differ across cultures and nations. Keep that in mind when you design this application.
I was just about to write my hundredth login form when a thought crossed my mind: Why do I need a username?
A while ago my dad had to change his e-mail address, and he still hasn't figured out why he can't log into various websites with his new address.
I'm also not a huge fan of individual per-site usernames. And wouldn't it be easier to remember only a password?
What are usernames good for? You obviously need some unique string to identify a user by. If you had just the password, that would work until a user picks a taken password and you would have to tell him “Sorry, 'GreatPassword123' already belongs to another user” — bad idea.
So part of the password needs to be unique. My idea: predetermine the first three characters! You could choose from lower- and uppercase letters and digits, providing (26+26+10)^3 = 238,328 unique prefixes. At registration, the user would get a dialog telling him that he only needs a password and that it starts with “N0i”, for example; he has to pick the rest (“deaWhy” comes to mind). He can then log in with his password only, being “N0ideaWhy”, not knowing (or caring) that “N0i” actually is a unique username.
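A quick check of the arithmetic in Python: with 26 lowercase letters, 26 uppercase letters and 10 digits the alphabet has 62 symbols, so the prefix counts are 62^3 and 62^4.

```python
import string

# 26 lowercase + 26 uppercase + 10 digits = 62 symbols
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

three_char_prefixes = len(ALPHABET) ** 3
four_char_prefixes = len(ALPHABET) ** 4

print(len(ALPHABET), three_char_prefixes, four_char_prefixes)
# prints: 62 238328 14776336
```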
I see the following pros and cons:
Pros
independence from e-mail-addresses
user needs to remember just one string
might reduce password reuse
safe from leaked lists
faster login through fewer keystrokes
Cons
need to split the password-string and submit the first three characters unencrypted while hashing the rest
scalability comes to a dead stop at 238,328 users (or about 14.8 million if you use four characters)
users might be skeptical / inexperienced / thrown off by not being able to reuse their standard password
I'm really wondering why nobody else did this so far? Are there any concerns that I missed?
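For what it's worth, the split described in the first con could look roughly like this in Python. The in-memory dict stands in for a user table, and PBKDF2 with 100,000 iterations is my own choice for the sketch, not part of the original idea:

```python
import hashlib
import hmac
import os

PREFIX_LEN = 3
users = {}  # prefix -> (salt, hash); stand-in for a real user table


def _hash(secret: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash of the part the user chose.
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, 100_000)


def register(prefix: str, rest: str) -> None:
    salt = os.urandom(16)
    users[prefix] = (salt, _hash(rest, salt))


def login(password: str) -> bool:
    # The first three characters act as the (unhashed) username.
    prefix, rest = password[:PREFIX_LEN], password[PREFIX_LEN:]
    record = users.get(prefix)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, _hash(rest, salt))


register("N0i", "deaWhy")
print(login("N0ideaWhy"), login("N0iwrong"))
```

The sketch also makes one weakness visible: the prefix is stored in the clear, so only the remaining characters are protected by the hash.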
By adding three random characters, you created a link between the password and the user, in other words a login. Besides the points you mentioned, this login also has other problems:
it is much harder to remember (xhkr vs john.doe@example.com)
it cannot be unique across your services, even if you wanted to
you will need to request the email anyway in order to reset the password
What you are looking for has kinda been implemented already via social logins:
The idea is to use an independent service to handle your authentication. If every service owner agreed, you would end up with a unified login. It raises several concerns (lock-in, a hack of the provider, dissemination of personal data), but this is the closest we have come to centralized authentication (the grandfather was OpenID): you just need to stay with one service (or a limited few).
As most of you know, email is very insecure. Even with an SSL-secured connection between the client and the server that sends an email, the message itself will be in plaintext while it hops around nodes across the Internet, leaving it vulnerable to eavesdropping.
Another consideration is the sender might not want the message to be readable - even by the intended recipient - after some time or after it's been read once. There are a number of reasons for this; for example, the message might contain sensitive information that can be requested through a subpoena.
A solution (the most common one, I believe) is to send the message to a trusted third party, and a link to that message to the recipient, who then reads the message from the 3rd party. Or the sender can send an encrypted message (using symmetric encryption) to the recipient and send the key to the 3rd party.
Either way, there is a fundamental problem with this approach: if this 3rd party is compromised, all your efforts will be rendered useless. For a real example of an incident like this, refer to the debacles involving Crypto AG colluding with the NSA.
Another solution I've seen was Vanish, which encrypts the message, splits the key into pieces and "stores" the pieces in a DHT (namely the Vuze DHT). These values can be easily and somewhat reliably accessed by simply looking the hashes up (the hashes are sent with the message). After 8 hours, these values are lost, and even the intended recipient won't be able to read the message. With millions of nodes, there is no single point of failure. But this was also broken by mounting a Sybil attack on the DHT (refer to the Vanish webpage for more information).
So does anyone have ideas on how to accomplish this?
EDIT: I guess I didn't make myself clear. The main concern is not the recipient intentionally keeping the message (I know this one is impossible to control), but the message being available somewhere.
For example, in the Enron debacle, the courts subpoenaed them for all the email on their servers. Had the messages been encrypted and the keys lost forever, it would do them no good to have encrypted messages and no keys.
(Disclaimer: I didn't read the details on Vanish or the Sybil attack, which may be similar to what comes below)
First of all: email messages are generally quite small, especially compared to a 50 MB YouTube video that you can download 10 times a day or more. On this I base the assumption that storage and bandwidth are not a real concern here.
Encryption, in the common sense of the word, introduces parts into your system that are hard to understand, and therefore hard to verify. (Think of the typical openssl magic everybody just performs, but which 99% of people don't really understand; if some step X in a HOWTO said "now go to site X and upload *.cer *.pem and *.csr" to verify steps 1 to X-1, I guess 1 in 10 people would just do it.)
Combining the two observations, my suggestion for a safe(*) and understandable system:
Say you have a message M of 10 kb. Take N times 10 kb from /dev/(u)random, possibly from hardware based random sources, call it K(0) to K(N-1). Use a simple xor operation to calculate
K(N) = M^K(0)^K(1)^...^K(N-1)
now, by definition
M = K(0)^K(1)^...^K(N)
i.e. to understand the message you need all K's. Store the K's with N different (more or less trusted) parties, using whatever protocol you fancy, under random 256 bit names.
To send a message, send the N links to the K's.
To destroy a message, make sure at least one K is deleted.
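A minimal Python sketch of the XOR scheme, assuming the pads come from the operating system's random source via the secrets module; the function names are mine:

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(message: bytes, n: int) -> List[bytes]:
    """Return the n + 1 shares K(0)..K(n) for the message M (n >= 1)."""
    if n < 1:
        raise ValueError("need at least one random pad")
    pads = [secrets.token_bytes(len(message)) for _ in range(n)]  # K(0)..K(n-1)
    last = reduce(xor_bytes, pads, message)  # K(n) = M ^ K(0) ^ ... ^ K(n-1)
    return pads + [last]


def combine(shares: List[bytes]) -> bytes:
    """M = K(0) ^ K(1) ^ ... ^ K(n); every share is required."""
    return reduce(xor_bytes, shares)


shares = split(b"meet me at noon", 4)
print(combine(shares))
```

With even one share missing, the XOR of the remaining shares is indistinguishable from random bytes, which is what makes deleting a single K sufficient.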
(*) as regards to safety, the system will be as safe as the safest party hosting a K.
Don't take a fixed N, don't have a fixed number of K's on a single node per message (i.e. put 0-10 K's of one message on the same node) to make a brute force attack hard, even for those who have access to all nodes storing keys.
NB: this of course would require some additional software, as would any solution, but the complexity of the plugins/tools required is minimal.
The self-destructing part is really hard, because the user can take a screenshot and store the screenshot unencrypted on his disk, etc. So I think you have no chance to enforce that (there will always be a way, even if you link to an external page). But you can however simply ask the recipient to delete it afterwards.
Encryption, on the other hand, is not a problem at all. I wouldn't rely on TLS because even when the sender and the client are using it, there might be other mail relays that don't, and they might store the message as plain text. So the best way would be to simply encrypt it explicitly.
For example, I am using GnuPG for (nearly) all mails I write, which is based on asymmetric encryption. Here I know that only those to whom I have explicitly given permission can read the mail, and since there are plug-ins available for nearly all popular MUAs, it's also quite easy for the recipient to read the mail. (So nobody has to encrypt the mail manually and then forget to delete the unencrypted message from the disk...) And it's also possible to revoke the keys, if someone has stolen your private key for example (which is normally encrypted anyway).
In my opinion, GnuPG (or alternatively S/MIME) should be used all the time, because that would also help to make spamming more difficult. But that's probably just one of my silly dreams ;)
There are so many different ways of going about it which all have good and bad points, you just need to choose the right one for your scenario. I think the best way of going about it is the same as your 'most common' solution. The trusted third party should really be you - you create a website of your own, with your own authentication being used. Then you don't have to give your hypothetical keys to anyone.
You could use a two way certification method by creating your own client software which can read the emails, with the user having their own certificate. Better be safe than sorry!
If the recipient knows that the message might become unreadable later and they find the message valuable their intention will be to preserve it, so they will try to subvert the protection.
Once someone has seen the message unencrypted, which means in any perceivable form, either as text or as a screen image, they can store it somehow and do whatever they want. All the measures with keys and so on only make dealing with the message inconvenient, but don't prevent extracting the text.
One of the ways could be to use self-destructing hardware as in Mission Impossible - the hardware would display the message and then destroy it, but as you can see it is inconvenient as well - the recipient would need to understand the message from viewing it only once which is not always possible.
So given the fact that the recipient might be interested in subverting the protection and the protection can be subverted the whole idea will likely not work as intended but surely will make dealing with messages less convenient.
If HTML format is used, you can have the message reference assets that you can remove at a later date. If the message is opened at a later date, the user should see broken links.
If your environment allows for it, you could use the trusted boot environment to ensure that a trusted boot loader has been used to boot a trusted kernel, which could verify that a trusted email client is being used to receive the email before sending it. See remote attestation.
It would be the responsibility of the email client to responsibly delete the email in a timely fashion -- perhaps relying on in-memory store only and requesting memory that cannot be swapped to disk.
Of course, bugs can happen in programs, but this mechanism could ensure there is no intentional pathway towards storing the email.
The problem, as you describe it, does sound very close to the problem addressed by Vanish, and discussed at length in their paper. As you note, their first implementation was found to have a weakness, but it appears to be an implementation weakness rather than a fundamental one, and is therefore probably fixable.
Vanish is also sufficiently well-known that it's an obvious target for attack, which means that weaknesses in it have a good chance of being found, publicised, and fixed.
Your best option, therefore, is probably to wait for Vanish version 2. With security software, rolling your own is almost never a good idea, and getting something from an established academic security group is a lot safer.
IMO, the most practical solution for the situation is using the Pidgin IM client with Off-the-Record (no logging) and pidgin-encrypt (end-to-end asymmetric encryption) together. The message will be destroyed as soon as the chat window is closed, and in an emergency, you can just unplug the computer to close the chat window.
How can I prevent forms from being scanned by mass vulnerability scanners like XSSME, SQLinjectMe (those two are free Firefox add-ons), Acunetix Web Scanner and others?
These "web vulnerability scanners" work by capturing a copy of a form with all its fields and sending thousands of tests in minutes, introducing all kinds of malicious strings in the fields.
Even if you sanitize your input very well, the server's response slows down, and if the form sends e-mail, you will sometimes receive thousands of emails in the receiver's mailbox. I know that one way to reduce this problem is to use a CAPTCHA component, but sometimes this kind of component is too much for some types of forms and slows the user down (a login/password form, for example).
Any suggestion?
Thanks in advance and sorry for my English!
Hmm, if this is a major problem you could add a server-side submission-rate limiter. When someone submits a form, store some information in a database about their IP address and what time they submitted the form. Then whenever someone submits the form, check the database to see if it's been "long enough" since the last time that IP address submitted the form. Even a fairly short wait like 10 seconds would seriously slow down this sort of automated probing. This database could be automatically cleared out every day/hour/whatever, you don't need to keep the data around for long.
Of course someone with access to a botnet could avoid this limiter, but if your site is under attack by a large botnet you probably have larger problems than this.
On top of the rate-limiting solutions that others have offered, you may also want to implement some logging or auditing on sensitive pages and forms to make sure that your rate limiting actually works. It could be something simple like just logging request counts per IP. Then you can send yourself an hourly or daily digest to keep an eye on things without having to repeatedly check your site.
There's only so much you can do... "Where there's a will there's a way": anything that you want the user to do can be automated and abused. You need to find a middle ground when developing, and toss in a few things that may make abuse harder.
One thing you can do is sign the form with a hash, for example if the form is there for sending a message to another user you can do this:
hash = md5(userid + action + salt)
then when you actually process the response you would do
if (hash == md5(userid + action + salt))
This prevents the abuser from injecting thousands of user ids and easily spamming your system. It's just another hoop for the attacker to jump through.
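A runnable Python version of the same idea. Note that it swaps the md5-plus-salt construction for HMAC-SHA256 from the standard library, which is the usual way to sign a value with a server-side secret; the key and the field separator are placeholders:

```python
import hashlib
import hmac

# Placeholder; in practice a long random value kept only on the server.
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"


def sign_form(user_id: str, action: str) -> str:
    """Token to embed in a hidden field when the form is rendered."""
    # The '|' separator assumes neither value can contain that character.
    msg = f"{user_id}|{action}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def verify_form(user_id: str, action: str, token: str) -> bool:
    """Check the token that came back with the submission."""
    expected = sign_form(user_id, action)
    # Constant-time comparison, on bytes so odd input cannot raise.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))


token = sign_form("42", "send_message")
print(verify_form("42", "send_message", token), verify_form("43", "send_message", token))
```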
I'd love to hear other people's techniques. CAPTCHAs should be used on entry points like registration, and the method above should be used on specific actions (messaging, voting, ...).
You could also create a flagging system: anything the user does X times in a given amount of time that looks fishy would flag the user and make them solve a CAPTCHA (once they solve it, they are no longer flagged).
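The flagging idea could be sketched in Python like this, with a sliding window per user; the limit of 5 actions per 60 seconds and all names are assumptions for illustration:

```python
from collections import defaultdict, deque

MAX_ACTIONS = 5  # allowed actions per window (assumed value)
WINDOW = 60.0    # window length in seconds (assumed value)

_recent = defaultdict(deque)  # user -> timestamps of recent actions
_flagged = set()              # users who must solve a CAPTCHA


def record_action(user: str, now: float) -> bool:
    """Record one action; return True if the user must solve a CAPTCHA."""
    q = _recent[user]
    q.append(now)
    while q and now - q[0] > WINDOW:
        q.popleft()  # forget actions older than the window
    if len(q) > MAX_ACTIONS:
        _flagged.add(user)
    return user in _flagged


def captcha_solved(user: str) -> None:
    """Clear the flag and the history once the CAPTCHA is passed."""
    _flagged.discard(user)
    _recent[user].clear()
```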
This question is not exactly like the other questions about captchas but I think reading them if you haven't already would be worthwhile. "Honey Pot Captcha" sounds like it might work for you.
Practical non-image based CAPTCHA approaches?
What can be done to prevent spam in forum-like apps?
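For reference, the server side of a honeypot check is tiny. This Python sketch assumes the form contains an extra field, hidden from human visitors with CSS, whose name ("website" here) is purely hypothetical:

```python
HONEYPOT_FIELD = "website"  # hypothetical field name, hidden with CSS


def is_probably_bot(form: dict) -> bool:
    """Humans never see the field, so any value in it suggests automation."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())


print(is_probably_bot({"message": "hi", "website": ""}))
print(is_probably_bot({"message": "hi", "website": "http://spam.example"}))
```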
Reviewing all the answers, I put together a solution customized for my case, with a little bit of each one:
I checked the behavior of the known vulnerability scanners again. They load the page once and, with the information gathered, start submitting it over and over, changing the content of the fields to malicious scripts in order to test for certain types of vulnerabilities.
But: what if we sign the form? How? By creating a hidden field with random content that is also stored in the Session object. If the value is submitted more than n times, we just generate it again. We only have to check whether it matches, and if it doesn't, take whatever action we want.
But we can do even better: instead of changing the value of the field, why not change the name of the field randomly? Changing the name of the field randomly and storing it in the session object is perhaps a trickier solution, but the form is always different, and the vulnerability scanners load it just once. If we don't get input for a field with the stored name, we simply don't process the form.
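A minimal Python sketch of the random field name, with a plain dict standing in for the Session object. In this version the name is single-use (n = 1), which is stricter than the "more than n times" rule above:

```python
import secrets


def issue_form(session: dict) -> str:
    """Pick a fresh random field name and remember it in the session."""
    name = "f_" + secrets.token_hex(8)
    session["expected_field"] = name
    return name  # use this as the input's name when rendering the form


def accept_submission(session: dict, form: dict) -> bool:
    """Process the form only if it carries the field name we issued."""
    expected = session.pop("expected_field", None)  # single use
    return expected is not None and expected in form


session = {}
field = issue_form(session)
print(accept_submission(session, {field: "hello"}))  # first submission
print(accept_submission(session, {field: "hello"}))  # replay of the same form
```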
I think this can save a lot of CPU cycles. I ran some tests with the vulnerability scanners mentioned in the question and it works perfectly!
Well, thanks a lot to all of you. As I said before, this solution was made with a little bit of each answer.
A client using our system has requested that we store the SSNs/SINs of the end users in our database. Currently, we store minimal information about users (name, email address, and optionally, country), so I'm not overly concerned about a security breach - however, I have a suspicion there could be legal issues about storing SSNs and not taking "appropriate" measures to secure them (coming from Australia, this is my first encounter with them). Is this a valid concern?
I also read on the Wikipedia page about SINs (Canada's equivalent to SSNs) that it should ONLY be used when absolutely necessary and definitely shouldn't be used as a general identifier, or similar.
So, are there any potential legal issues about this sort of thing? Do you have any recommendations?
The baseline recommendation would be to:
Inform the user that you are storing their SSN before they use your site/application. Since the request appears to be to collect the information after the fact, the users should have a way to opt out of your system before they log in or before they put in their SSN
Issue a legal guarantee that you will not provide, sell, or otherwise distribute the above information (along with their other personal information of course)
Have them check a checkbox stating that they understand that you really are storing their SSNs
but the most important part would probably be:
Hire a lawyer well-versed with legal matters over the web
Funny thing about SSNs... the law that created them, also clearly defined what they may be used for (basically tax records, retirement benefits, etc.) and what they are not allowed to be used for - everything else.
So the fact that the bank requires your SSN to open a checking account, your ISP asks for it for high speed internet access, airlines demand it before allowing you on a plane, your local grocery/pub keeps a tab stored by your SSN - that is all illegal. Shocking, isn't it...
All the hooha around identity theft, and how easy it is thanks to a single, unprotected "secret" that "uniquely" identifies you across the board (not to mention that its sometimes used as authentication) - should never have been made possible.
Some good warnings have already been stated here.
I'll just add that, speaking of SINs (Canada's Social Insurance Numbers), I believe it's possible to have collisions between a SIN and an SSN (in other words the same number, but two different people/countries). It shouldn't be a surprise since these are separate numbering systems, but I can imagine someone doing data entry being inclined to stick a SIN into an SSN field and vice versa (think international students in college/university as one instance; I was told by a DBA friend that he saw this happen).
A given information system may be designed to not allow duplicates, and either way, you can see why there might be confusion and data integrity issues (using a SSN column as a unique key? Hmm).
Way too many organizations in the USA use SSNs as unique identifiers for people, despite the well-documented problems with them. Unless your application actually has something to do with government benefits, there's no good reason for you to store SSNs.
Given that so many organizations (mis)use them to identify people for things like credit checks, you really need to be careful with them. With nothing more than someone's name, address, and SSN, it's pretty easy to get credit under their name, and steal their identity.
The legal issues are along the lines of getting sued into oblivion for any leak of personal information that contains SSNs.
If it were me, I'd avoid them like the plague, or figure out some very, very secure way to store them. Additionally (I'm not a legal expert by any stretch, but...), see if you can put in writing somewhere that you are in no way responsible if any of this gets out.
At a minimum, you want to be sure that SSNs are never emailed without some protection. I think the built-in "password to open" in Excel is enough, legally. I think email is the weakest link, at least in my industry.
Every now and then, there is a news item "Laptop Stolen: Thousands of SSNs Possibly Compromised." It's my great fear that it could be my laptop. I put all SSN containing files in a PGP-protected virtual drive.
You do have good security on your database, don't you? If not, why not?