How to prevent duplicates in online anonymous surveys?


I am writing an online survey and I am wondering if there are any good techniques for letting anonymous people take the survey while also preventing duplicates.
I have considered the following, but there are drawbacks from each:
Use cookie in browser
Record IP address
Compare answers for similarities, along with either or both of the first two methods
Each technique I have considered either prevents multiple people from using the same device or makes it easy for a user to submit duplicate survey results. Hopefully someone has an excellent way to prevent this :)

Well, I am not sure whether you are thinking of deliberate or accidental duplicates.
If you think people will want to post a load of results to skew the survey, I cannot add anything, because any ID-related question you ask can be answered falsely.
If you want people to just give their answers without going through a login process, how about asking for their initials plus birthdate (ddmmyyyyfl)? That has a pretty good chance of being unique without really compromising their identities or taking too much time.
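A minimal sketch of that key, assuming "f" and "l" stand for the first- and last-name initials and that an in-memory set stands in for wherever the survey stores keys it has already seen (respondent_key, accept_response and seen_keys are hypothetical names):

```python
from datetime import date


def respondent_key(first_name: str, last_name: str, birthdate: date) -> str:
    """Build a ddmmyyyyfl key: birth date, then first and last initials."""
    return (
        birthdate.strftime("%d%m%Y")
        + first_name.strip()[0].lower()
        + last_name.strip()[0].lower()
    )


# Stand-in for wherever the survey stores keys it has already seen.
seen_keys: set = set()


def accept_response(key: str) -> bool:
    """Return True the first time a key is seen, False for a repeat submission."""
    if key in seen_keys:
        return False
    seen_keys.add(key)
    return True
```

This only catches accidental duplicates: two people with the same initials and birthday collide, and anyone who wants to cheat can simply type different initials.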
Was that what you were after?
Ed

I am currently investigating a similar scenario.
Some of the suggestions I found online are:
You generate a unique URL - which you can send to their email (this email does not have to be stored), and then you add a checksum to the URL to verify it is valid.
Similar to the above, you provide them with a uniquely generated password, and you validate whether the password has been used before.
The clear limitation is that you require their email, and this is slightly lengthy. However, the email address is not associated with the answer set.
That is, you can check whether an email address has already been used to send a URL/password. This prevents the same email address from being used over and over.
Then when the URL/Password is used, you validate whether that unique reference has been used in an answer set before. (The answer set is associated with the Unique Reference, and not the person's email - ensuring anonymity).
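The flow above could be sketched roughly as follows. This is an illustration under assumptions, not a reference design: the "checksum" is an HMAC-SHA256 of the token truncated to 16 hex characters, the email is kept only as a keyed hash so it can be checked for reuse without storing the address, and the in-memory sets stand in for database tables. SECRET_KEY, issue_link, redeem and the t/c query parameter names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Server-side secret; in a real deployment this would be loaded from configuration.
SECRET_KEY = secrets.token_bytes(32)

issued_emails: set = set()  # keyed hashes of addresses that already received a link
used_tokens: set = set()    # tokens that already have an answer set attached


def _mac(value: str) -> str:
    """Keyed SHA-256 digest of a string, hex-encoded."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()


def issue_link(email: str, base_url: str) -> Optional[str]:
    """Return a one-time survey URL, or None if this address already got one."""
    email_id = _mac("email:" + email.strip().lower())  # the raw address is never stored
    if email_id in issued_emails:
        return None
    issued_emails.add(email_id)
    token = secrets.token_urlsafe(16)  # random, and not recorded next to the email
    checksum = _mac("token:" + token)[:16]
    return f"{base_url}?t={token}&c={checksum}"


def redeem(token: str, checksum: str) -> bool:
    """Accept an answer set only for a genuine, not-yet-used token."""
    expected = _mac("token:" + token)[:16]
    if not hmac.compare_digest(expected.encode(), checksum.encode()):
        return False  # forged or mistyped URL
    if token in used_tokens:
        return False  # this link has already produced an answer set
    used_tokens.add(token)
    return True
```

Because the token is generated randomly and never written down beside the email hash, the stored answer set is tied only to the token, which is the anonymity property described above.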
The problem with using email addresses, ID numbers, and birth dates is that all of these values can be fabricated. If this approach is used, also do not forget good old CAPTCHA, as a script can be created to run through the combinations and submit answer sets.
I realize this is an old post, but hopefully it helps someone at some point. All of the best.

Related

Login Credentials: Why not drop the username?

I was just about to write my hundredth login form when a thought crossed my mind: Why do I need a username?
A while ago my dad had to change his e-mail address, and he still hasn't figured out why he can't log into various websites with his new address.
I'm also not a huge fan of individual per-site usernames. And wouldn't it be easier to remember only a password?
What are usernames good for? You obviously need some unique string to identify a user by. If you had just the password, that would work until a user picks a taken password and you would have to tell him “Sorry, 'GreatPassword123' already belongs to another user” — bad idea.
So part of the password needs to be unique. My idea: predetermine the first three characters! Choosing from lower- and uppercase letters and digits gives (26+26+10)^3 = 62^3 = 238,328 unique prefixes. At registration, the user would get a dialog telling him that he only needs a password and that it starts with, for example, “N0i”; he has to pick the rest (“deaWhy” comes to mind). He can then log in with his password alone, “N0ideaWhy”, without knowing (or caring) that “N0i” is actually a unique username.
I see the following pros and cons:
Pros
independence from e-mail-addresses
user needs to remember just one string
might reduce password reuse
safe from leaked lists
faster login through fewer keystrokes
Cons
need to split the password string and keep the first three characters unhashed (they serve as the lookup key) while hashing only the rest
scalability comes to a dead stop at 238,328 users (or 14,776,336, about 14.8 million, if you use four characters)
users might be skeptical / inexperienced / thrown off by not being able to reuse their standard password
I'm really wondering why nobody else has done this so far. Are there any concerns that I missed?
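To make the proposal concrete, here is a rough sketch of how registration and login might look, assuming a random three-character prefix from the 62-symbol alphabet, PBKDF2-SHA256 for the user-chosen remainder (the 100,000 iteration count is an arbitrary illustrative choice), and an in-memory dict standing in for the users table. register, login and accounts are hypothetical names.

```python
import hashlib
import hmac
import secrets
import string
from typing import Dict, Optional, Tuple

ALPHABET = string.ascii_letters + string.digits  # 26 + 26 + 10 = 62 symbols
PREFIX_LEN = 3                                   # 62**3 = 238,328 possible prefixes

# prefix -> (salt, hash of the user-chosen remainder); stand-in for a users table
accounts: Dict[str, Tuple[bytes, bytes]] = {}


def _hash(rest: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", rest.encode(), salt, 100_000)


def register(chosen_rest: str) -> str:
    """Assign an unused random prefix and return the full password to show the user."""
    if len(accounts) >= len(ALPHABET) ** PREFIX_LEN:
        raise RuntimeError("prefix space exhausted")
    while True:
        prefix = "".join(secrets.choice(ALPHABET) for _ in range(PREFIX_LEN))
        if prefix not in accounts:
            break
    salt = secrets.token_bytes(16)
    accounts[prefix] = (salt, _hash(chosen_rest, salt))
    return prefix + chosen_rest


def login(password: str) -> Optional[str]:
    """Split off the prefix (the hidden username) and check the hashed remainder."""
    prefix, rest = password[:PREFIX_LEN], password[PREFIX_LEN:]
    record = accounts.get(prefix)
    if record is None:
        return None
    salt, stored = record
    if hmac.compare_digest(_hash(rest, salt), stored):
        return prefix
    return None
```

Note that the prefix is effectively a username stored in plain text, so the remainder must be strong on its own; an attacker who learns or guesses a valid prefix only has to brute-force the part the user picked.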

By adding three random characters, you created a link between the password and the user, in other words a login. Besides the elements you mentioned, this login also has other problems:
it is much harder to remember (xhkr vs john.doe@example.com)
it cannot be unique across your services, even if you wanted to
you will need to request the email anyway in order to reset the password
What you are looking for has kinda been implemented already via social logins:
The idea is to use an independent service to handle your authentication. If every service owner agreed, you would end up with a unified login. It raises several concerns (lock-in, a hack of the provider, dissemination of personal data), but this is the closest to what we have come up with regarding centralized authentication (the grandfather was OpenID): you just need to stay with one service (or a limited few).

How can I prevent bulk vulnerability scanning without using a CAPTCHA component?

How can I prevent forms from being scanned by mass vulnerability scanners like XSSME, SQLinjectMe (those two are free Firefox add-ons), Acunetix Web Scanner and others?
These "web vulnerability scanners" work by capturing a copy of a form with all its fields and sending thousands of tests in minutes, putting all kinds of malicious strings into the fields.
Even if you sanitize your input very well, the server's response slows down, and if the form sends e-mail, you will sometimes receive thousands of emails in the recipient's mailbox. I know that one way to reduce this problem is to use a CAPTCHA component, but sometimes this kind of component is too much for some types of forms and slows the user down (a login/password form, for example).
Any suggestion?
Thanks in advance and sorry for my English!

Hmm, if this is a major problem you could add a server-side submission-rate limiter. When someone submits a form, store some information in a database about their IP address and what time they submitted the form. Then whenever someone submits the form, check the database to see if it's been "long enough" since the last time that IP address submitted the form. Even a fairly short wait like 10 seconds would seriously slow down this sort of automated probing. This database could be cleared out automatically every day/hour/whatever; you don't need to keep the data around for long.
Of course someone with access to a botnet could avoid this limiter, but if your site is under attack by a large botnet you probably have larger problems than this.
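A minimal sketch of that limiter, assuming a dict keyed by IP address stands in for the database table and a 10-second minimum interval as in the example above (SubmissionRateLimiter, allow and purge are hypothetical names). The optional now parameter exists only so the behaviour can be demonstrated without waiting.

```python
import time
from typing import Dict, Optional


class SubmissionRateLimiter:
    """Reject a submission if the same IP submitted less than min_interval seconds ago."""

    def __init__(self, min_interval: float = 10.0) -> None:
        self.min_interval = min_interval
        self.last_submit: Dict[str, float] = {}  # stand-in for the database table

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self.last_submit.get(ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: do not process the form
        self.last_submit[ip] = now
        return True

    def purge(self, max_age: float, now: Optional[float] = None) -> None:
        """Drop entries older than max_age seconds (the periodic clean-up)."""
        now = time.monotonic() if now is None else now
        self.last_submit = {
            ip: t for ip, t in self.last_submit.items() if now - t <= max_age
        }
```

Rejected attempts deliberately do not reset the timer here, matching the description of checking against the last accepted submission; counting rejected attempts too would punish a scanner that keeps hammering the form.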

On top of the rate-limiting solutions that others have offered, you may also want to implement some logging or auditing on sensitive pages and forms to make sure that your rate limiting actually works. It could be something simple like just logging request counts per IP. Then you can send yourself an hourly or daily digest to keep an eye on things without having to repeatedly check your site.
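A sketch of the per-IP counting and the digest text, assuming the digest is generated by some scheduled job that also takes care of emailing it (the emailing is not shown). RequestAudit, record and digest are hypothetical names.

```python
from collections import Counter


class RequestAudit:
    """Count requests per IP on sensitive pages and produce a periodic digest."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, ip: str) -> None:
        self.counts[ip] += 1

    def digest(self, top: int = 10) -> str:
        """Return the busiest IPs as text, then start a new counting period."""
        lines = [f"{ip}\t{n}" for ip, n in self.counts.most_common(top)]
        self.counts.clear()
        return "\n".join(lines) if lines else "no requests"
```

An IP that shows up near the top of every digest despite the rate limiter is a sign that the limiter is not working as intended.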

There's only so much you can do... "Where there's a will, there's a way": anything that you want the user to do can be automated and abused. You need to find a middle ground when developing, and toss in a few things that make abuse harder.
One thing you can do is sign the form with a hash, for example if the form is there for sending a message to another user you can do this:
hash = md5(userid + action + salt)
then when you actually process the response you would do
if (hash == md5(userid + action + salt))
This prevents the abuser from injecting thousands of user IDs and easily spamming your system. It's just another hoop for the attacker to jump through.
I'd love to hear other people's techniques. CAPTCHAs should be used on entry points like registration, and the method above should be used on actions tied to specific things (messaging, voting, ...).
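A runnable version of that signing idea could look like the sketch below. It swaps md5(userid + action + salt) for HMAC-SHA256, which is the standard construction for keyed signatures, and compares in constant time. FORM_KEY, sign_form and verify_form are hypothetical names, and the "|" separator assumes user IDs and action names never contain that character.

```python
import hashlib
import hmac
import secrets

# Server-side secret (the "salt" in the pseudocode); never sent to the browser.
FORM_KEY = secrets.token_bytes(32)


def sign_form(user_id: str, action: str) -> str:
    """Signature to embed in a hidden field when rendering the form."""
    # The separator keeps ("1", "2send") and ("12", "send") from producing the
    # same input, assuming neither value can contain "|".
    message = f"{user_id}|{action}".encode()
    return hmac.new(FORM_KEY, message, hashlib.sha256).hexdigest()


def verify_form(user_id: str, action: str, submitted_sig: str) -> bool:
    """Recompute the signature for the submitted values and compare in constant time."""
    expected = sign_form(user_id, action)
    return hmac.compare_digest(expected.encode(), submitted_sig.encode())
```

A script that rewrites the user ID in the form now also needs a valid signature for the new ID, which it cannot compute without the server's key.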
Also, you could create a flagging system: anything fishy-looking that the user does X times within Y amount of time would flag the user and make them solve a CAPTCHA (once they enter it, they are no longer flagged).
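A sketch of such a flagging system, assuming a sliding time window per user and that the caller shows a CAPTCHA whenever record returns True. SuspicionFlagger, max_actions and window are hypothetical names, and the thresholds would need tuning for each action.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SuspicionFlagger:
    """Flag a user who performs an action more than max_actions times within window seconds."""

    def __init__(self, max_actions: int = 5, window: float = 60.0) -> None:
        self.max_actions = max_actions
        self.window = window
        self.events = defaultdict(deque)  # user -> timestamps of recent actions
        self.flagged = set()

    def record(self, user: str, now: Optional[float] = None) -> bool:
        """Record an action; return True if the user must now solve a CAPTCHA."""
        now = time.monotonic() if now is None else now
        recent = self.events[user]
        recent.append(now)
        while recent and now - recent[0] > self.window:
            recent.popleft()  # forget actions that fell out of the window
        if len(recent) > self.max_actions:
            self.flagged.add(user)
        return user in self.flagged  # stays flagged until a CAPTCHA is solved

    def captcha_passed(self, user: str) -> None:
        self.flagged.discard(user)
        self.events[user].clear()
```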

This question is not exactly like the other questions about captchas but I think reading them if you haven't already would be worthwhile. "Honey Pot Captcha" sounds like it might work for you.
Practical non-image based CAPTCHA approaches?
What can be done to prevent spam in forum-like apps?
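One common form of the honeypot idea is a text field hidden from people with CSS: a real visitor's browser submits it empty, while a scanner that fills every field it finds puts something in it. A minimal sketch, where the field name "website" and the helpers honeypot_html and looks_like_bot are hypothetical:

```python
from typing import Mapping

# Hypothetical name for the trap field; hidden with CSS, so real visitors leave it empty.
HONEYPOT_FIELD = "website"


def honeypot_html() -> str:
    """Markup for the trap field, to be placed inside the real form."""
    return (
        '<div style="display:none" aria-hidden="true">'
        f'<label>Leave this empty <input type="text" name="{HONEYPOT_FIELD}" '
        'tabindex="-1" autocomplete="off"></label></div>'
    )


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """A filled-in (or missing) trap field suggests the post did not come from a person."""
    return form.get(HONEYPOT_FIELD) != ""
```

Treating a missing field as suspicious assumes the form is always rendered with the trap in it; a request built without loading the form at all then fails the check too.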

After reviewing all the answers, I made one solution customized for my case, with a little bit of each one:
I checked the behavior of the known vulnerability scanners again. They load the page once and, with the information gathered, start submitting it, changing the content of the fields to malicious scripts in order to check for certain types of vulnerabilities.
But what if we sign the form? How? By creating a hidden field with random content stored in the Session object. If the value is submitted more than n times, we just create it again. We only have to check whether it matches, and if it doesn't, take whatever action we want.
But we can do even better: instead of changing the value of the field, why not change the name of the field randomly? Changing the field name randomly and storing it in the Session object is perhaps a trickier solution, because the form is always different and the vulnerability scanners only load it once. If we don't get input for a field with the stored name, we simply don't process the form.
I think this can save a lot of CPU cycles. I ran some tests with the vulnerability scanners mentioned in the question and it works perfectly!
Well, thanks a lot to all of you; as I said before, this solution was made with a little bit of each answer.
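The randomized-field-name trick could be sketched as follows, assuming a plain dict stands in for the web framework's Session object and that each rendered name is accepted only once (a simplification of the "more than n times" rule). render_form, process_submission and the "comment_field" session key are hypothetical names.

```python
import secrets
from typing import Dict, Optional


def render_form(session: Dict[str, str]) -> str:
    """Render the form with a freshly randomized field name remembered in the session."""
    field_name = "f_" + secrets.token_hex(8)
    session["comment_field"] = field_name
    return (
        '<form method="post">'
        f'<textarea name="{field_name}"></textarea>'
        "<button>Send</button></form>"
    )


def process_submission(session: Dict[str, str], form: Dict[str, str]) -> Optional[str]:
    """Return the comment if it arrived under the expected name; otherwise ignore the post."""
    expected = session.pop("comment_field", None)  # each rendered name is accepted once
    if expected is None or expected not in form:
        return None
    return form[expected]
```

A scanner that loads the form once and replays it thousands of times gets, at most, its first submission processed; every replay after that carries a field name the session no longer expects.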

Registering bugs by email

What is the best way to parse an email containing a bug description? One client decided recently that it would be nice for users to be able to send an email to a known mailbox and have a bug registered in the bug tracker (not exactly, but close).
The problem is that a bug description has a lot of fields like dates, times, descriptions, comments, losses, attachments, etc. Relying on users to follow some specific mail format is not the smartest thing to do.
The question is how one could parse the email to get all the needed information. The format should not be too strict, but strict enough to guess which fields mean what. I would also be interested to hear both the correct and the easiest solutions for this.
P.S.
Actually, this feature was requested by a bank. They have a public mailbox where clients send the issues they discover. The problem is to get as much information as possible from these letters before a bank employee actually looks at them.

We do something similar to this with RT, however the email isn't really parsed. All the emails go into a single queue where it is evaluated by our IT staff. Basically, the parsing is done by humans - they modify the ticket to have as much information as they can glean from the email.
You're unlikely to get users to adhere (correctly) to any special syntax or formatting you come up with - unless they are employees, highly trained, or have some incentive to follow your rules.
Another option would be to have the initial email trigger a reply formatted as a questionnaire. In other words, the user writes an initial bug report and immediately (or as soon as your email server can respond) gets back a "thank you - can you provide more info" message with prompts for more info. You could then parse that email and have it populate your bug tracking system with more accurate info.
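Parsing the reply to such a questionnaire could be sketched with Python's standard email package, assuming the questionnaire asks the user to answer on "Label: value" lines. The label set, the mapping to tracker field names, and parse_reply are hypothetical; attachments and HTML-only replies are not handled.

```python
import re
from email import message_from_string, policy

# Hypothetical questionnaire labels, mapped to the bug tracker's field names.
LABELS = {
    "date": "date",
    "time": "time",
    "description": "description",
    "steps": "steps",
    "loss": "losses",
    "losses": "losses",
}
FIELD_LINE = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.+)$")


def parse_reply(raw_email: str) -> dict:
    """Extract known 'Label: value' answers from the plain-text body of a reply."""
    msg = message_from_string(raw_email, policy=policy.default)
    fields = {"reporter": str(msg["From"] or ""), "subject": str(msg["Subject"] or "")}
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted questionnaire text, not the user's answer
        match = FIELD_LINE.match(line)
        if not match:
            continue
        key = LABELS.get(match.group(1).strip().lower())
        if key is not None:
            fields[key] = match.group(2).strip()
    return fields
```

Anything that does not match a known label is simply left for the human who reviews the ticket, which keeps the format loose without guessing.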
Good luck! Sounds like a cool idea!

FogBugz has the ability to monitor an email address and add emails sent to that address as new FogBugz cases.
There's also a feature called ScoutSubmit that accepts HTTP GET arguments and uses them to submit a new case. Very handy for having an application automatically submit bug reports from the field.

Categorizing a bug based on freeform text is a difficult proposition. Very little besides the defect submitter name and the date the bug is reported is easily gleaned from an email. Is there a reason you are limiting yourself to email? If you provide a form to submit the bugs via a webpage you can categorize the defect/bug based on dropdown menu items you present to the user. In addition you can point them to common answers in a dynamic information portion of the page. Have a look at Apple iTunes support request page for a slightly annoying but effective method to force the user to give you decent information. Banking applications are not a good domain to allow ambiguity nor are they a good domain to have multiple rounds of communication.

Potential legal issues with storing Social Security/Insurance Numbers (SSNs/SINs)? [closed]

A client using our system has requested that we store the SSNs/SINs of the end users in our database. Currently, we store minimal information about users (name, email address, and optionally, country), so I'm not overly concerned about a security breach - however, I have a suspicion there could be legal issues about storing SSNs and not taking "appropriate" measures to secure them (coming from Australia, this is my first encounter with them). Is this a valid concern?
I also read on the Wikipedia page about SINs (Canada's equivalent to SSNs) that it should ONLY be used when absolutely necessary and definitely shouldn't be used as a general identifier, or similar.
So, are there any potential legal issues about this sort of thing? Do you have any recommendations?

The baseline recommendation would be to:
Inform the user that you are storing their SSN before they use your site/application. Since the request appears to be to collect the information after the fact, the users should have a way to opt out of your system before they log in or before they put in their SSN
Issue a legal guarantee that you will not provide, sell, or otherwise distribute the above information (along with their other personal information of course)
Have them check a checkbox stating that they understand that you really are storing their SSNs
but the most important part would probably be:
Hire a lawyer well-versed with legal matters over the web

Funny thing about SSNs... the law that created them also clearly defined what they may be used for (basically tax records, retirement benefits, etc.) and what they are not allowed to be used for - everything else.
So the fact that the bank requires your SSN to open a checking account, your ISP asks for it for high speed internet access, airlines demand it before allowing you on a plane, your local grocery/pub keeps a tab stored by your SSN - that is all illegal. Shocking, isn't it...
All the hooha around identity theft, and how easy it is thanks to a single, unprotected "secret" that "uniquely" identifies you across the board (not to mention that it's sometimes used as authentication) - should never have been made possible.

Some good warnings have already been stated here.
I'll just add that, speaking of SIN (Canada's Social Insurance Number) codes, I believe it's possible to have collisions between a SIN and an SSN (in other words, the same number for two different people/countries). It shouldn't be a surprise since these are separate codification systems, but I can easily imagine someone doing data entry being inclined to stick a SIN into an SSN field and vice versa (think of international students in college/university as one instance; a DBA friend told me he saw this happen).
A given information system may be designed to not allow duplicates, and either way, you can see why there might be confusion and data integrity issues (using a SSN column as a unique key? Hmm).
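One way to avoid using the bare number as a unique key is to qualify it with the issuing country, the equivalent of a two-column unique constraint in the database. Both SSNs and SINs are nine digits, so the format alone cannot tell them apart. A minimal sketch; NationalId, ISSUERS and parse are hypothetical names, and no check-digit validation is attempted.

```python
from dataclasses import dataclass

ISSUERS = {"US": "SSN", "CA": "SIN"}  # both are nine-digit numbers


@dataclass(frozen=True)
class NationalId:
    """An identifier qualified by the country that issued it; usable as a dict or DB key."""

    country: str
    number: str

    @classmethod
    def parse(cls, country: str, raw: str) -> "NationalId":
        country = country.upper()
        if country not in ISSUERS:
            raise ValueError(f"unsupported issuer: {country}")
        digits = raw.replace("-", "").replace(" ", "")  # strip common separators only
        if len(digits) != 9 or not all(ch in "0123456789" for ch in digits):
            raise ValueError(f"{ISSUERS[country]} must have exactly 9 digits")
        return cls(country, digits)
```

The same nine digits entered as a US SSN and as a Canadian SIN then produce two different keys instead of a false duplicate.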

Way too many organizations in the USA use SSNs as unique identifiers for people, despite the well-documented problems with them. Unless your application actually has something to do with government benefits, there's no good reason for you to store SSNs.
Given that so many organizations (mis)use them to identify people for things like credit checks, you really need to be careful with them. With nothing more than someone's name, address, and SSN, it's pretty easy to get credit under their name, and steal their identity.
The legal issues are along the lines of getting sued into oblivion for any leak of personal information that contains SSNs.

If it were me, I'd avoid them like the plague, or figure out some very, very secure way to store them. Additionally (I'm not a legal expert by any means, but...), I'd try to put in writing somewhere that you are in no way responsible if any of this gets out.
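If the client only ever needs to check whether a number is already on file, rather than read it back, one option is to store a keyed hash (HMAC) instead of the number itself. A plain unkeyed hash would not help much, because there are only 10^9 possible nine-digit values and all of them can be tried. A sketch, assuming the key is kept outside the database (for example in a secrets manager); INDEX_KEY and ssn_fingerprint are hypothetical names. If the actual number must be recoverable, this is not enough and real encryption is needed instead.

```python
import hashlib
import hmac
import secrets

# Assumption: in production this key lives outside the database, so a stolen
# table alone cannot be brute-forced. Here it is generated in-process.
INDEX_KEY = secrets.token_bytes(32)


def ssn_fingerprint(ssn: str) -> str:
    """Keyed hash of a normalized SSN, for duplicate checks without storing the number."""
    digits = ssn.replace("-", "").replace(" ", "")
    if len(digits) != 9 or not all(ch in "0123456789" for ch in digits):
        raise ValueError("expected a nine-digit SSN")
    return hmac.new(INDEX_KEY, digits.encode(), hashlib.sha256).hexdigest()
```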

At a minimum, you want to be sure that SSNs are never emailed without some protection. I think the built-in "password to open" in Excel is enough, legally. I think email is the weakest link, at least in my industry.
Every now and then, there is a news item "Laptop Stolen: Thousands of SSNs Possibly Compromised." It's my great fear that it could be my laptop. I put all SSN containing files in a PGP-protected virtual drive.
You do have good security on your database, don't you? If not, why not?

Forced Alpha-Numeric User IDs

I am a programmer at a financial institution. I have recently been told to enforce that all new user IDs have at least one alphabetic and one numeric character. I immediately thought that this was a horrible idea and I would rather not implement it, as I believe it is an anti-feature and a poor user experience. The problem is that I don't have a good case for not implementing this requirement.
Do you think this is a good requirement?
Do you have any good reasons not to do it?
Do you know of any research that I could reference?
Edit: This is not in regards to the password. We already have similar requirements for that, which I am not opposed to.

One argument against this is that many usernames / ids in other areas do not require numeric components. It's more likely that users will be better able to remember user ids that they have used elsewhere -- and that is more likely if they do not need to include numerics.
Furthermore, depending on the system, the user ids may work well as defaults when connecting to external systems (ssh behaves this way under unix-like systems). In this case, it is clearly beneficial to have one ID that is shared between systems.
Using the same ID in multiple places improves consistency, which is a well-known aspect of good software interfaces. It's not too difficult to show that the way people interact with a system is a user-interface, and should adhere to (at least some) of the well-known interface guidelines. (Obviously ideas like keyboard shortcuts are meaningless if you're considering the interactions between multiple, possibly unknown, systems, but aspects such as consistency do apply.)
Edit: I'm assuming that this discussion is about usernames or publicly visible IDs, NOT something that pertains directly to security, such as passwords.

I would begin by asking them for their specific reasons behind this. Once you have a list of bullet points and the reasons why, it's easier to refute or provide alternatives.
As for general ideas:
This is opinion, but adding a numeral to a username won't necessarily increase security. People write down usernames on Post-it notes, and most users will just add a '1' to the beginning or end of their username, making it easy to guess.
From a usability standpoint, this is bad as it breaks the norm. Forcing them to add a numeral to their username will just lead to the above point. They will simply add a '1' to the end or beginning of their username.
Remember, the more complex an authentication system is, the more likely a general user is to find ways to circumvent it and make their link in the chain weak.

UserIDs? Requiring passwords to be alphanumeric is generally a good idea, since it makes them more resistant to a dictionary attack. It doesn't really make any sense for usernames. The whole point of having a name/password combo is that the name part doesn't have to be kept secret.

If you're working at a financial institution, there are probably regulations about this sort of thing, so it's most likely out of your hands. But one thing you can do is make it clear to the user when he has entered an invalid ID. And don't wait until he clicks submit; show some kind of message right next to the field, and update it as he types.
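The "at least one letter and one digit" rule with field-level messages could be sketched like this. user_id_problems is a hypothetical helper; in practice the same check would also run in the browser on every keystroke, and this server-side version only illustrates the rule and the messages a front end could display next to the field.

```python
import re
from typing import List


def user_id_problems(user_id: str) -> List[str]:
    """Messages to show next to the user-ID field; an empty list means the ID is acceptable."""
    problems = []
    if not re.search(r"[A-Za-z]", user_id):
        problems.append("Add at least one letter.")
    if not re.search(r"[0-9]", user_id):
        problems.append("Add at least one digit.")
    return problems
```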

A few of the answers above have a counter-argument: If the users pick the same username they use on the other sites, then they are also likely to pick the same or similar passwords for the financial site, lowering security.
A reason not to do it: If you impose more restrictions than they are used to on the users, they will start writing down the login information, and that's an obvious loss of security.
Both of the bank accounts I have require an alphanumeric username and two passwords for the online login. One of them also has an image I have to remember. The two passwords have to change once a month or so. Therefore, I have all the login information right here in a text file. (Even looking at it doesn't make any sense; I'll have to go down to the bank and reset my passwords again. That's a grand total of 7 password resets for 6 logins. Talk about security, not even I can access my account.)

It's good if it's in their password (though, alas, financial companies like to deny you this security right [I'm talking to you, American Express]).
For the username, I say no, unless they want to.

A username will (presumably) need to be quoted on the phone when calling for support so it will be publicised unlike a password. Also, the username field won't be masked out in browsers like password fields are, so it will have much more exposure and get cached/logged in various places, so the 'benefit' of the added security will be undone in no time.
And the more difficult you make things, the more likely a user is to write it down somewhere which again undermines security (same applies for password policies actually, but that's another story!)

I also work at a financial institution and our usernames (both real people and production IDs) are all lowercase, alphabetical, up to 8 characters and I've never considered it a problem... avoids the confusion of 0 vs O, 1 vs I, and 8 vs B - unless you work for the same company as me and are about to implement a new policy...

Adding any feature adds costs. It will take time now to build and test it, and in the future to support it. No feature should be built without a really good reason.
This feature is pointless. Usernames are not supposed to be kept secret, so having strong usernames has no advantage. It is probably worth spending time making passwords (or other authentication factors) strong, but users should be able to communicate their username to other users without that being a security risk.
If your application imposes extra constraints on the choice of user ID then some of your users will have a different user ID for your application than for the other applications in your environment. Note: I'm assuming that this is an internal application (for use by employees) rather than an Internet-facing application.
Having inconsistent usernames adds a number of specific risks:
It will make the audit trail harder to follow (a serious security risk).
It may add cost if you later start using single sign on.
It will cause a bad user experience as users have to remember that this application uses a weird username.
