Password safety - security

Many of you probably know this image. Some would argue that it's very vulnerable to dictionary attacks, so I wrote my own password generator. I'm using the default 2048 word list of the bitcoin passphrase, additionally adding a single number, symbol and uppercase letter.
If using a password manager (e.g. Lastpass), I can easily create something like this:
#guwV&zMot#v2YJE0^!x5vCnd1M$r&MNCvJgP*j7k3v6F5nM#hf1jUITOl^HP&D9
But entering this string on your smartphone isn't fun at all. Here are some examples of what my generated passwords look like (source code):
pull 2 mule slot F milk lecture treat crew _ dry amateur pyramid
hole # agree domain 9 execute exist loud column bounce G another
pledge 4 moral rely slow _ fork M tired fame razor cradle derive
report ! crop addict choice fiction fashion 9 mail W sun weather
My thought is this:
The attacker tries a dictionary attack, without success because of the extra characters. So the next try would be brute force (lowercase letters). After that he needs to try uppercase letters, numbers and symbols too. Because he doesn't know the length of my password or the quantity of numbers/symbols/uppercase letters, he has to try all possible combinations.
So, given that the attacker doesn't know the pattern and length of the password, and that the dictionary, chars, numbers and symbols can easily be swapped or modified, it should be as strong as the one Lastpass generated. Is my idea correct, or am I missing something and these passwords aren't that much safer either?
(Edit) Updated the source code: replaced $RANDOM with a function which generates a random number from /dev/random, added some comments and variable names are clearer now.
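One rough way to compare the two kinds of password is to count the bits of entropy each has if the attacker does know the generator (word list, pattern, character sets) and only the random choices are secret. The sketch below does that arithmetic. The 2048-word list comes from the post; the word count, the size of the symbol set and the 70-character alphabet for the random string are assumptions made for illustration, not values taken from the actual script or from Lastpass.

```python
import math

WORDLIST_SIZE = 2048   # from the post
NUM_WORDS = 9          # assumption: the samples above contain 9 or 10 words
SYMBOL_SET_SIZE = 10   # assumption: the post does not say which symbols are used


def passphrase_bits(num_words=NUM_WORDS, symbols=SYMBOL_SET_SIZE):
    """Entropy in bits, assuming every choice is uniform and independent."""
    words = num_words * math.log2(WORDLIST_SIZE)
    # One digit, one uppercase letter, one symbol.
    extras = math.log2(10) + math.log2(26) + math.log2(symbols)
    # Ordered placement of the three extras among all token slots.
    slots = num_words + 3
    placement = math.log2(slots * (slots - 1) * (slots - 2))
    return words + extras + placement


def random_string_bits(length=64, alphabet=70):
    """Entropy of a uniformly random string (alphabet size is an assumption)."""
    return length * math.log2(alphabet)


print(f"word-based passphrase: {passphrase_bits():.1f} bits")
print(f"64-char random string: {random_string_bits():.1f} bits")
```

Under these assumptions the random string carries far more entropy, but the passphrase is still well beyond what an exhaustive search can cover, even when the pattern is known.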

Related

Program to encrypt / decrypt text string in Assembly MIPS

I want to create a program that reads in input a string of characters, and through a predefined action (I was thinking of a sum with an integer randomly generated) encrypts the string by returning the encrypted string and the key to decode it in a second moment.
Could you give me any suggestions on how to treat the string?
I would like to do so :
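li $v0,8          # syscall 8: read a string
la $a0,buffer     # $a0 = address of the input buffer
li $a1,1024       # $a1 = maximum number of bytes to read
syscall
move $s7,$a0      # keep the buffer address in $s7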
This is the code to read the string.
After that I want to do:
add $t0,$s5,$s3
Here I add a randomly generated integer to the register containing the string.
Knowing the values of the random number and the sum, I can get the original string back with a subtraction.
Is it a proper method?
That depends somewhat on the purpose of the encryption. As I understand this, the approach you're suggesting is basically a form of a Caesar Cipher. While this will protect your string to some degree against casual observers, it will definitely not be suitable for serious security purposes. It is subject to brute-force attacks, known-plaintext and chosen-plaintext attacks, and frequency analysis.
The idea behind a brute-force attack is that, for any given string of a reasonable length, there will almost always be exactly one shift that makes the string make sense, so an attacker could repeatedly try different shifts until he found it. The first shift that makes the string make sense is almost certainly the correct shift.
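A minimal sketch of that brute-force idea, assuming a classical 26-letter shift cipher rather than the exact MIPS scheme from the question. The helper names and the tiny list of common English words used to decide whether a candidate "makes sense" are illustrative choices.

```python
COMMON_WORDS = {"the", "and", "is", "to", "of", "at", "in"}


def shift_text(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def score(candidate):
    """Count how many words of the candidate are common English words."""
    return sum(word in COMMON_WORDS for word in candidate.lower().split())


def brute_force(ciphertext):
    """Try every shift and keep the one whose output looks most like English."""
    best = max(range(26), key=lambda s: score(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)


secret = shift_text("meet me at the bridge and bring the key", 7)
print(brute_force(secret))
```

With only 26 candidates the search is instantaneous; a larger key space changes the loop bound, not the approach.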
If you're doing a "classical" Caesar cipher (e.g. C = A, D = B, E = C, etc.), there are only 25 possible shifts, so on average an attacker could guess the plaintext in about 12.5 guesses (and 25 guesses in the worst case). In a scheme like yours you'd have to use a very large range of enormous numbers in order to defend against this even slightly. For example, if you were only doing shifts of between 1 and 100, an attacker could reconstruct the plaintext in an average of about 50 guesses (and 100 guesses in the worst case), which is obviously not a defense against a motivated attacker, especially since this task lends itself to easy parallelization. Even if you had a trillion possible shifts and it took 100 operations to try and test each one, that is only about 10^14 operations in total. Assuming I did my math right, a desktop CPU such as an Intel i7 executing on the order of 10^11 operations per second would get through all of them in well under an hour, and on average it would find the correct answer in half that time.
The idea behind frequency analysis is that your text retains the same statistical characteristics as the host language. For example, in English the most common letter is "e," so if you find the most common letter in your ciphertext it probably corresponds to "e." You can then work out how much you shifted the string to get that particular output. For example, if "g" is the most frequent letter in the ciphertext, you can guess that g = e and that they therefore must have shifted the text over by two.
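The "g" example can be sketched in a few lines. This assumes a plain 26-letter shift and a ciphertext long enough that "e" really is the most common plaintext letter; the function names are illustrative.

```python
from collections import Counter


def guess_shift(ciphertext):
    """Guess the shift by assuming the most frequent letter stands for 'e'."""
    letters = [c for c in ciphertext.lower() if "a" <= c <= "z"]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("e")) % 26


def unshift(ciphertext, shift):
    """Undo a shift on lowercase letters, leaving other characters alone."""
    return "".join(
        chr((ord(c) - ord("a") - shift) % 26 + ord("a")) if "a" <= c <= "z" else c
        for c in ciphertext
    )


ciphertext = "vjg vtgg ku itggp"   # "g" is the most frequent letter here
shift = guess_shift(ciphertext)
print(shift, unshift(ciphertext, shift))
```

On short or unusual texts the most frequent letter may not be "e", so in practice the guess is checked against the next few candidates as well.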
A known-plaintext attack is where an attacker has an example of both the plaintext and its corresponding ciphertext and they can use that information to reconstruct what the key must have been. A chosen-plaintext attack is basically the same thing except that the attacker gets to choose which plaintext he sees the corresponding ciphertext for. (Note that this is only a problem if you're reusing keys, especially if you're doing so in a predictable manner; if you never reuse keys reconstructing the key for the known/chosen plaintext won't give the attacker any information about the key you used for other messages).
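For a scheme like the one in the question, a single known plaintext byte and its ciphertext byte are enough to recover the key. The sketch below assumes the cipher adds one integer key to every byte modulo 256; that is one reading of the question, not necessarily what the final MIPS program would do.

```python
def encrypt(data: bytes, key: int) -> bytes:
    """Add the key to every byte (mod 256)."""
    return bytes((b + key) % 256 for b in data)


def decrypt(data: bytes, key: int) -> bytes:
    """Subtract the key from every byte (mod 256)."""
    return bytes((b - key) % 256 for b in data)


def recover_key(known_plain: bytes, known_cipher: bytes) -> int:
    """Known-plaintext attack: one byte pair reveals the key."""
    return (known_cipher[0] - known_plain[0]) % 256


key = 173  # secret, unknown to the attacker
intercepted = encrypt(b"hello", key)
recovered = recover_key(b"hello", intercepted)

# If the same key is reused, every other message can now be read.
other = encrypt(b"secret plans", key)
print(recovered, decrypt(other, recovered))
```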
I've never tried doing this in assembly language to tell the truth but if you want good security you might want to consider AES. If you're really interested in simplicity of implementation and are willing to go with something less secure, you might also go with XTEA.

Find substring in stream without storing substring in plain text

Let's say I have a large stream of data (for example packets coming in from a network), and I want to determine if this data contains a certain substring. There are multiple string searching algorithms, but they require the algorithm to know the plain text string they are searching for.
Let's say the string being sought is a password, and you do not want to store it in plain text in this search application. It would, however, appear in the stream as plain text. You could, for example, store the hash and length of the password. Then, for every byte offset in the stream, hash the next length bytes of data; if the result equals the stored password hash, you have a probable match.
That way you can determine if the password was in the stream, without knowing the password. However, hashing once for every byte is not fast/efficient.
Is there perhaps a clever algorithm that could find the plain text password in the stream without directly knowing the plain text password (using some non-reversible equivalent instead)? Alternatively, could a low-quality version of the password be used, at the risk of false positives? For example, if the search application only knew half the password (in plain text), it could, with some error, detect the full password without knowing it.
thanks
P.S This question comes from a hypothetical discussion I had with some friends, about alerting you if your password was spotted in plain text on a network.
You could use a low-entropy rolling hash to pre-screen each byte so that, for the cost of lg k bits of entropy, you reduce the number of invocations of the cryptographic hash by a factor of k.
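A minimal sketch of that pre-screening idea, assuming a Rabin-Karp style polynomial rolling hash reduced modulo a small prime (so the stored fingerprint leaks fewer than 8 bits about the password) and a salted SHA-256 as the cryptographic hash. The constants, the salt and the function names are illustrative, and the stream is a plain bytes object for simplicity.

```python
import hashlib

BASE = 257
MOD = 251  # small prime: k = 251, so the fingerprint costs about 8 bits of entropy


def enroll(password: bytes, salt: bytes = b"demo-salt"):
    """Store only the length, a tiny rolling fingerprint and a salted digest."""
    fingerprint = 0
    for b in password:
        fingerprint = (fingerprint * BASE + b) % MOD
    digest = hashlib.sha256(salt + password).digest()
    return len(password), fingerprint, digest, salt


def scan(stream: bytes, record):
    """Return (match offsets, number of SHA-256 invocations)."""
    n, fingerprint, digest, salt = record
    hits, strong_calls = [], 0
    if len(stream) < n:
        return hits, strong_calls
    top = pow(BASE, n - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in stream[:n]:
        h = (h * BASE + b) % MOD
    for i in range(len(stream) - n + 1):
        if h == fingerprint:  # cheap pre-screen passed
            strong_calls += 1
            if hashlib.sha256(salt + stream[i:i + n]).digest() == digest:
                hits.append(i)
        if i + n < len(stream):  # slide the window by one byte
            h = ((h - stream[i] * top) * BASE + stream[i + n]) % MOD
    return hits, strong_calls


record = enroll(b"hunter2")
stream = b"GET /login?user=bob&pw=hunter2 HTTP/1.1"
print(scan(stream, record))
```

The expensive hash now runs only on windows whose cheap fingerprint matches, roughly one window in 251 on random data, at the price of handing an attacker who steals the record a small head start.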
SAT is an NP-hard problem. Suppose your password is n characters long. If you could find a way to make a large enough SAT instance that
used a contiguous sequence of m >= n bytes from the data stream as its 8m input bits, and
produced the output 1 if and only if the bits present at its inputs contain your password starting at an offset that is some multiple of 8 bits
then by "operating" this SAT instance as a circuit, you would have a password detector that is (at least potentially) very difficult to "invert".
In some ways, what you want is the opposite of Boolean logic minimisation. You want the biggest, hairiest circuit (ideally for some theoretically justified notions of size and hairiness :) ) that computes the truth table. It's easy enough to come up with truth-table-preserving ways to grow the original CNF propositional logic formula -- e.g., if you have two clauses A and B, then you can always safely add a new clause consisting of all the literals in either A or B -- but it's probably much harder to come up with ways to grow the formula in ways that will confuse a modern SAT solver, since a lot of research has gone into making these programs super-efficient at detecting and exploiting all kinds of structure in the problem.
One possible avenue for injecting "complications" is to make the circuit compute functions that are difficult for circuits to compute, like divisions or square roots, and then test the results of these for equality in addition to the raw inputs. E.g., instead of making the circuit merely test that X[1 .. 8n] = YOUR_PASSWORD, make it test that X[1 .. 8n] = YOUR_PASSWORD AND sqrt(X[1 .. 8n]) = sqrt(YOUR_PASSWORD). If a SAT solver is smart enough to "see" that the first test implies the second then it can immediately dispense with all the clauses corresponding to the second -- but since everything is represented at a very low level with propositional clauses, this relationship is (I hope; as I said, modern SAT solvers are pretty amazing) well obscured. My guess is that it's better to choose functions like sqrt() that are not one-to-one on integers: this will potentially cause a SAT solver to waste time exploring seemingly promising (but ultimately incorrect) solutions.

How difficult/easy is it to decode a custom base 62 encoding?

Here's what I am going to do to obfuscate database id's in permalinks:
1) XOR the id with a lengthy secret key
2) Scramble (rotate, flip, reverse) bits around a little in the XOR'ed integer in a reversible way
3) Base 62 encode the resulting integer with my own secret scrambled up sequence of all alphanumeric characters (A-Za-z0-9)
How difficult would it be to convert my Base 62 encoding back to base 10?
Also, how difficult is it to reverse engineer the whole process (obviously without taking a peek at source or compiled code)? I know 'only XOR' is pretty susceptible to basic analysis.
EDIT: the result should be not more than 8-9 chars long, 3DES and AES seem to produce very long encrypted texts and can't be practically used in URLs
Resulting strings look something like:
In [2]: for i in range(1, 11):
   ...:     print(code(i))
   ...:
9fYgiSHq
MdKx0tZu
vjd0Dipm
6dDakK9x
Ph7DYBzp
sfRUFaRt
jkmg0hl
dBbX9nHk4
ifqBZwLW
WdaQE630
As you can see, 1 looks nothing like 2, so this seems to work great for obfuscation of id's.
If the attacker is allowed to play around with the input, it will be trivial for a skilled attacker to "decrypt" the data. A crucial property of modern crypto systems is the "avalanche effect", which your system lacks. Basically it means that every bit of the output depends on every bit of the input, so changing a single input bit changes about half of the output bits.
If an attacker of your system is allowed to see that, for example, id = 1000 produces the output "AAAAA", id = 1001 produces "ABAAA" and id = 1002 produces "ACAAA", the algorithm can be easily reversed and the value of the key obtained.
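The missing avalanche effect is easy to see by counting how many output bits change between two neighbouring ids. The sketch below compares a bare XOR with a secret key against the first 32 bits of a SHA-256 digest. The key value is hypothetical, and SHA-256 appears only as a yardstick for good diffusion; it is not reversible, so it is not a drop-in replacement for the scheme in the question.

```python
import hashlib

SECRET_KEY = 0x5A3C96E1  # hypothetical key, for illustration only


def xor_obfuscate(record_id: int) -> int:
    """Step 1 of the scheme in the question: XOR with a secret key."""
    return record_id ^ SECRET_KEY


def hashed(record_id: int) -> int:
    """First 32 bits of SHA-256, used here only as a diffusion yardstick."""
    digest = hashlib.sha256(record_id.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:4], "big")


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


xor_diff = bits_changed(xor_obfuscate(1000), xor_obfuscate(1001))
hash_diff = bits_changed(hashed(1000), hashed(1001))
print("XOR:", xor_diff, "bit(s) changed; SHA-256 prefix:", hash_diff, "of 32")
```

The key cancels out of the XOR of two outputs, so the difference between two obfuscated ids equals the difference between the ids themselves. A fixed bit permutation afterwards only moves those differing bits around; it does not add diffusion.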
That said, this question is a better fit for https://security.stackexchange.com/ or https://crypto.stackexchange.com/
The standard advice for anyone trying to develop their own cryptography is, "Don't". The advanced advice is to read Bruce Schneier's Memo to the Amateur Cipher Designer and then don't.
You are not the first person to need to obfuscate IDs, so there are already methods available. @CodesInChaos suggested a good method above; you should try that first to see if it meets your needs.

How to store and verify digits chosen at random from a PIN/Password

If I have a user's 6-digit PIN (or n-char string) and I wish to verify, say, 3 digits chosen at random from the PIN (or x chars) as part of a 'login' procedure, how would I store the PIN, or some encrypted/hashed version of it, in a database in such a way that I could verify the user's identity?
Thoughts:
Store the PIN in a reversible (symmetrically or asymmetrically) encrypted manner, decrypt for digit checks.
Store a range of hashed permutations of the PIN against some ID, which links to the 'random digits' selected, e.g.:
ID: 123 = Hash of Digits 1, 2, 3
ID: 416 = Hash of Digits 4, 1, 6
Issues:
Key security: Assume that the key is 'protected' and that the app is not financial nor highly critical, but is 'high-volume'.
Creating a large number of hash permutations is both prohibitively high-storage (16 bytes x several permutations) and time-consuming; probably overkill.
Are there any other options, issues or refinements?
Yes: I know storing passwords/PINs in a reversible manner is 'contentious' and ideally shouldn't be done.
Update
Just for clarification:
1. Random digits is a scheme I am considering to avoid key-loggers.
2. It is not possible to attempt more than a limited number of retries.
3. Other elements help secure and authenticate access.
As any encryption scheme you use to store the password/pass phrase would be either prohibitively expensive or easily cracked, I am coming down on the side of just storing it in plain text and ensuring that the database and server security is up to scratch.
You could consider some lightweight encryption scheme to hide the passwords from a casual browser of the database, but you have to admit that any scheme will have two basic vulnerabilities. One: your program will need a password or key, which will have to be stored somewhere and will be almost as vulnerable to snooping as the actual passwords stored in plain text. Two: if you have a reasonable number of users, then a hacker who has access to the encrypted passwords has lots of clues to aid his brute-force attack, and if your site is open to the public he can insert any number of "known texts" into your database.
Since 6C3 is 20 and 10C3 is 120, I'll get a false positive (be authenticated) on 1/6th of my guesses.
This scheme is only slightly better than no authentication at all regardless of how you store the token.
I totally agree with msw but that argument is only (or mostly) valid for the six digit scheme. For the n-char approach, the false positive ratio will (sometimes...) be much lower. One improvement would be that the random characters must be entered in the same order as in the password.
Also I think that storing hashed permutations would make it relatively easy to find the key using some brute force approach. For example, testing and combining different combinations of three characters and checking those against the stored hashes. This would defeat the purpose of hashing the key in the first place so you might as well store the key encrypted instead.
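A minimal sketch of that brute-force argument, assuming each stored record is a salted SHA-256 over the digit positions and the three digits at those positions. The record layout, the salt and the function names are illustrative, not a description of any real system. Each record has only 1000 possible inputs, so an attacker holding the table recovers the digits almost instantly.

```python
import hashlib
from itertools import product


def triple_hash(positions, digits, salt=b"demo"):
    """Hash of the chosen positions (0-based) and the digits found there."""
    return hashlib.sha256(salt + bytes(positions) + digits.encode()).hexdigest()


def store(pin, triples):
    """What the server would keep: one hash per allowed challenge."""
    return {t: triple_hash(t, "".join(pin[i] for i in t)) for t in triples}


def crack(stored, pin_len=6):
    """Try all 1000 digit triples per record and stitch the PIN together."""
    known = {}
    for positions, target in stored.items():
        for guess in product("0123456789", repeat=len(positions)):
            digits = "".join(guess)
            if triple_hash(positions, digits) == target:
                known.update(zip(positions, digits))
                break
    return "".join(known.get(i, "?") for i in range(pin_len))


table = store("493817", [(0, 1, 2), (3, 4, 5)])
print(crack(table))
```

Two records that together cover all six positions are enough here; a slow password hash would raise the cost per guess but not the tiny number of guesses needed.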
Another, totally different argument, is that your users might get very confused by this odd login procedure :)
One possible solution is to use Reed-Solomon (or something like it) to construct an n-of-m scheme: generate a polynomial f(x) of degree n-1, where n is the number of digits needed to log in, and generate the PIN digits by evaluating f(x) at x=1..6. The digits combined become your full PIN. Any n of these digits (three in this example) can then be used, along with their x coordinates, to interpolate the polynomial coefficients. If they are equal to your original coefficients, the digits are correct.
The biggest problem, of course, is to form a field out of numbers 0..9 for polynomial constant arithmetic. Ordinary arithmetic will not cut it in this instance. And my finite field is too rusty to remember if it is possible. If you go 4 bits per digit, you can use GF(2^4) to overcome this deficiency. In addition, it is not possible to select your PIN. It will need to be assigned to you. Finally, assuming you can fix all the problems, there are only 1000 distinct polynomials for a 3 of n scheme, and it is too small for proper security.
Anyhow, I don't think this will be a good method, but I wanted to add some different ideas into the mix.
You say you have other elements for authentication. If you also have passwords, you might do the following:
1. Ask for a password (the password is stored only as a hash on your side)
2. Check the hash of the entered password against the stored password hash
3. On success, continue; otherwise go back to 1
4. Use the entered (unhashed) password as the key for the symmetrically encrypted PIN
5. Ask for some random digits of the PIN
This way the PIN is encrypted, but the key is not stored in plain text on your side. The online portal of my bank seems to do just that (at least I hope the PIN is encrypted; from the user's view the login process is like the one described above).
The key is 'protected'
The app is not financial nor highly critical,
The app is 'high-volume'.
Creating a large number of hash permutations is both prohibitively high-storage (16 bytes x several permutations) and time-consuming; probably overkill
Random digits is a scheme I am considering to avoid key-loggers.
It is not possible to attempt more than a limited number of retries.
Other elements help secure and authenticate access.
You seem to be arguing for storing the PIN in the clear. I say go for it. You're basically describing a challenge-response authentication method, and cleartext storage on the server side is common for that use-case.
Something similar to this is a one-time-pad, or a secret key matrix. The difference is that the user has to keep / have the pad with them to access. The benefit is that as long as you get the key distribution sufficiently secure, you're very safe from keyloggers.
If you want to make it so that exposure of the matrix / pad doesn't cause compromise alone, have the user use a short (3-4 number) PIN with the pad, and keep your sensitive locking mechanism.
Example of a matrix:
   1 2 3 4 5 6 7 8
A  ; k j l k a s g
B  f q 3 n 0 8 u 0
C  1 2 8 e g u 8 -
A challenge might be: "Enter your PIN, and then the character from square B3 from your matrix."
The response might be:
98763
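The challenge and response above can be checked with a small lookup. In this sketch the matrix is the example one, the PIN "9876" is an assumption read off the sample response, and the function names are illustrative.

```python
import hmac

# Rows of the example matrix, columns 1..8.
MATRIX = {
    "A": ";kjlkasg",
    "B": "fq3n08u0",
    "C": "128egu8-",
}


def expected_response(pin: str, square: str) -> str:
    """PIN followed by the matrix character at e.g. 'B3'."""
    row, col = square[0], int(square[1:])
    return pin + MATRIX[row][col - 1]


def verify(response: str, pin: str, square: str) -> bool:
    """Constant-time comparison of the user's response with the expected one."""
    return hmac.compare_digest(response, expected_response(pin, square))


print(expected_response("9876", "B3"))
print(verify("98763", "9876", "B3"))
```

Because the server picks a different square each time, a keylogger that captures one response learns the PIN but only a single cell of the matrix.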

What is a dictionary attack?

When we say dictionary attack, we don't really mean a real dictionary, do we? My guess is we mean a hacker's dictionary i.e. rainbow tables, right?
My point is we're not talking about someone keying different passwords into the login box, we're talking about someone who has full access to your database (which has hashed passwords, not plain passwords) and this person is reversing the hashes, right?
Since passwords are oft-times the easiest-to-attack part of cryptography, it's actually sort of a real dictionary. The assumption is that people are lazy and choose proper words as passwords or construct passphrases out of them. The dictionary can include other things, though, such as commonly used non-words or letter/number combinations. Essentially everything that is likely to be a poorly chosen password.
There are programs out there which will take an entire hard drive and build a dictionary out of every typable string on it, on the assumption that the user's password was at some point in time put in plaintext into memory (and then into the pagefile) or that it simply exists in the corpus of text stored on the drive [1]:
Even so, none of this might actually matter. AccessData sells another program, Forensic Toolkit, that, among other things, scans a hard drive for every printable character string. It looks in documents, in the Registry, in e-mail, in swap files, in deleted space on the hard drive ... everywhere. And it creates a dictionary from that, and feeds it into PRTK.
And PRTK breaks more than 50 percent of passwords from this dictionary alone.
Actually, you can make dictionaries even more effective if you include knowledge of how people usually build passwords. Schneier talks about this at length [1]:
Common word dictionary: 5,000 entries
Names dictionary: 10,000 entries
Comprehensive dictionary: 100,000 entries
Phonetic pattern dictionary: 1/10,000 of an exhaustive character search
The phonetic pattern dictionary is interesting. It's not really a dictionary; it's a Markov-chain routine that generates pronounceable English-language strings of a given length. For example, PRTK can generate and test a dictionary of very pronounceable six-character strings, or just-barely pronounceable seven-character strings. They're working on generation routines for other languages.
PRTK also runs a four-character-string exhaustive search. It runs the dictionaries with lowercase (the most common), initial uppercase (the second most common), all uppercase and final uppercase. It runs the dictionaries with common substitutions: "$" for "s," "@" for "a," "1" for "l" and so on. Anything that's "leet speak" is included here, like "3" for "e."
The appendage dictionaries include things like:
All two-digit combinations
All dates from 1900 to 2006
All three-digit combinations
All single symbols
All single digit, plus single symbol
All two-symbol combinations
[1] Bruce Schneier: Choosing Secure Passwords. In: Schneier on Security. (URL)
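The case rules, substitutions and appendages quoted above amount to a candidate generator applied to each dictionary word. A minimal sketch of such a generator follows. It illustrates the idea only and is not PRTK's actual rule set; the substitution table is a small subset and the function names are made up.

```python
# A small, illustrative subset of the "leet speak" substitutions.
LEET = str.maketrans({"s": "$", "l": "1", "e": "3"})


def case_variants(word):
    """Lowercase, initial uppercase, all uppercase, final uppercase."""
    w = word.lower()
    if not w:
        return []
    return [w, w.capitalize(), w.upper(), w[:-1] + w[-1].upper()]


def mangle(word):
    """Case variants, each with and without the substitutions, deduplicated."""
    seen = []
    for variant in case_variants(word):
        for candidate in (variant, variant.translate(LEET)):
            if candidate not in seen:
                seen.append(candidate)
    return seen


def with_two_digit_appendages(word):
    """The word followed by every two-digit combination, 00 to 99."""
    return [f"{word}{n:02d}" for n in range(100)]


print(mangle("secret"))
print(with_two_digit_appendages("secret")[:3])
```

Even with every rule applied, a word grows by only a few hundred candidates, which is why such modifications add little strength to a dictionary word.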
Well, if I threw a dictionary at you, it would hurt right?
But yes, a dictionary attack uses a list of words. They might be derived from a dictionary, or lists of common phrases or passwords ('123456' for example).
A rainbow table is different to a dictionary though - it is a reverse lookup for a given hash function, so that if you know a hash, you can identify a string which would generate that hash. For example, if I knew your password had an unsalted MD5 hash of e10adc3949ba59abbe56e057f20f883e, I could use a rainbow table to determine that 123456 hashes to that value.
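The MD5 example can be reproduced with a plain dictionary attack against the hash. The sketch below also builds a simple precomputed hash-to-password lookup. Note that this lookup is an ordinary table, not a real rainbow table (those use hash chains to save space), and the candidate list is a tiny illustrative one.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Unsalted MD5 of a string, as a hex digest."""
    return hashlib.md5(text.encode()).hexdigest()


def dictionary_attack(target_hash, candidates):
    """Hash each candidate and compare it with the target hash."""
    for word in candidates:
        if md5_hex(word) == target_hash:
            return word
    return None


def build_lookup(candidates):
    """Precompute hash -> password once, then reuse it for many hashes."""
    return {md5_hex(word): word for word in candidates}


COMMON_PASSWORDS = ["password", "123456", "qwerty", "letmein"]
target = "e10adc3949ba59abbe56e057f20f883e"

print(dictionary_attack(target, COMMON_PASSWORDS))
print(build_lookup(COMMON_PASSWORDS).get(target))
```

A per-user salt defeats the precomputed lookup, because the table would have to be rebuilt for every salt, but it does not stop the candidate-by-candidate attack.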
A 'dictionary attack' usually refers to an attempt to guess a password using a 'dictionary'; that is, a long list of commonly-used passwords, usually corresponding to words or combinations of words that people may lazily set as their password. Rainbow tables would be used if, instead of trying to guess the password by entering the actual plaintext password, you had a password hash and wanted to find the password behind it. You'd take the hashes of the commonly-used passwords and try to match them against the password hash you had, in order to determine what the password is.
Dictionary attacks are attacks where attackers try words from a rather normal dictionary, because many people will use simple passwords which can be found in a dictionary.
Wikipedia: Dictionary attack.

Resources