Using hashcat when the first 10 characters of the hash are known - Linux

I have a web application that uses MySQL.
The application uses MySQL's password hash function to store the password for the related account.
The problem is that it trims the hashed password, so that only the first 10 characters are stored in the password field.
I want to prove to my supervisor that trimming the hashed password allows different passwords to be entered on the login form and accepted by the application, because those passwords share the same first 10 characters of their hash.
To prove this, I'm planning to use hashcat. I've downloaded quite a big dictionary file for the purpose.
So, can someone help me with the parameters I should use in hashcat?
I've tried to google for the answer, but no luck.
Thanks

For an answer to the actual question, skip to the last section of this answer. The other sections do not answer your question directly, but you may find that a direct answer is no longer necessary after reading them.
Your Description Of The System
You said the system processes the password as follows
plaintext password ➜ hashed password ➜ first 10 characters of the hash
Example:
Topsecret123 ➜ *E7C95D33E14D3C2A3AE4EAB25C1D98C88593F7AC ➜ *E7C95D33E
Note that MySQL's PASSWORD() prefixes hashes with a *, so you actually include only 9 characters from the hash itself.
Answering The Question In The Background
You asked how to find hash collisions for the approach from above using hashcat, but what you actually wanted to know/show was
prove trimming hashed password can make different password [...] accepted by the application.
Your emphasis was on »Trimming causes multiple passwords to be accepted«. However, you overlooked that even untrimmed hashing causes multiple passwords to be accepted.
The Pigeonhole Principle
The explanation is so simple that you don't have to find a hash collision. Everyone should understand the following:
There is an infinite amount of passwords.
MySQL password hashes have a fixed length: PASSWORD() is a double SHA-1, so every hash is exactly 160 bits. There can be only 2^160 different hashes.
The password hash function maps passwords to hashes. Since there are more passwords than hashes, some passwords have to be mapped to the same hash.
If not, you would have found a compression function which would allow you to store anything in only 160 bits.
One may argue that the number of valid passwords is not infinite. But even if you limited valid passwords to be of exactly length 27 and contain only symbols from the group [A-Za-z0-9] (62 symbols), there would be 62^27 unique passwords:
62^27 ≈ 2.5×10^48 passwords
2^160 ≈ 1.5×10^48 hashes
Therefore, there still have to be lots of collisions.
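The pigeonhole argument can be demonstrated in miniature. The sketch below (my own illustration, standard library only) trims SHA-1 hashes to a single hex digit, so there are only 16 possible "hashes"; hashing 17 distinct inputs is then guaranteed to produce a collision:

```python
import hashlib

def tiny_hash(s):
    # Trim the hash to one hex digit: only 16 possible outputs.
    return hashlib.sha1(s.encode()).hexdigest()[:1]

inputs = [str(n) for n in range(17)]    # 17 distinct "passwords"
hashes = [tiny_hash(s) for s in inputs]

# 17 pigeons, 16 holes: at least two inputs must share a hash.
assert len(set(hashes)) < len(inputs)
```

The same argument holds for any fixed-length hash; the trimming just shrinks the number of holes.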
Hash Collisions
Trimming hashes is not the root cause of the collision problem, but of course it increases the likelihood of collisions enormously. Normally, hash collisions are not a problem because they happen so rarely that you don't encounter them. However, with strongly trimmed hashes like yours, collisions become a real problem.
Finding A Collision
Using Hashcat
hashcat can compute MySQL password hashes with -m 300. You can confirm this by computing SELECT Password("hashcat"); and comparing the resulting hash with the hash shown here.
However, I couldn't find a way to trim these hashes / look for prefix collisions. I guess hashcat cannot do what you want. You would have to implement a custom hashmode for hashcat. The easiest way to do this would be to alter the current implementation of hashcat's MySQL mode. I'm not sure, but maybe it is sufficient to just change const int out_len = 40; to 9. You may have to update the OpenCL versions of the same module too. Search for m00300 here.
Using A Custom Script
Alternatively, look for a list of password-hash pairs, or generate one yourself, and then look for prefix collisions in that table. This sounded like fun, so I did it myself.
The following Python program generates trimmed hashes for some numerical passwords:
#! /usr/bin/python3
import hashlib as hl

# MySQL's PASSWORD() is SHA1(SHA1(password)); keep only the first 9 hex digits.
def mySqlPwHash(password):
    return hl.sha1(hl.sha1(password.encode()).digest()).hexdigest()[:9]

for number in range(0, 300000):
    password = str(number)
    print(password, "\t", mySqlPwHash(password))
I chose to generate 300'000 hashes because there are 16^9 possible trimmed hashes, and by the birthday problem we can expect to find a collision after about √(16^9) = 262'144 tries.
To find passwords with the same hash run the script as follows:
./collide.py | sort -k2 | uniq -Df1
In only two seconds the script finished and printed
23607 47ae310ff
251848 47ae310ff
And there you have it, two passwords (23607 and 251848) with the same trimmed hash (47ae310ff).
If your trimmed hashes actually include 10 hex-digits you can adapt the script and will find the two passwords 1874547 and 2873667 sharing the hash 47fc464b2f.
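If you would rather not take those pairs on trust, a variant of the script can search for a collision itself. This is a sketch of the same birthday attack: it stores each trimmed hash in a dictionary and stops at the first repeat (the passwords it finds may differ from the ones above):

```python
import hashlib

def mysql_pw_hash(password, digits=9):
    # MySQL's PASSWORD() is SHA1(SHA1(password)); keep the first few hex digits.
    return hashlib.sha1(hashlib.sha1(password.encode()).digest()).hexdigest()[:digits]

def find_collision(digits=9, limit=3_000_000):
    seen = {}
    for n in range(limit):
        pw = str(n)
        h = mysql_pw_hash(pw, digits)
        if h in seen:
            return seen[h], pw, h   # two passwords, one trimmed hash
        seen[h] = pw
    return None

pw_a, pw_b, h = find_collision()
assert pw_a != pw_b
assert mysql_pw_hash(pw_a) == mysql_pw_hash(pw_b) == h
print(pw_a, pw_b, h)
```

The limit of 3 million is generous; by the birthday bound, a collision in a 16^9 space is all but guaranteed well before that.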


How is it possible to have these reversed hashes available on the web?

If these hashing algorithms are one-way functions, how is it possible to have these reversed hashes available on the web? What is the reverse hashing procedure used by those lookup sites?
When we say that a hash function h is a one-way function, we mean that
given some fixed string w, it's "easy" to compute h(w), but
given h(x) for some randomly-chosen string x, it's "hard" to find a string w where h(w) = h(x).
So in that sense, if you have a hash of a string that you know literally nothing about, there is no easy way to invert that hash.
However, this doesn't mean that, once you hash something, it can never be reversed. For example, suppose I know that you're hashing either the string YES or the string NO. I could then, in advance, precompute h(YES) and h(NO), write the values down, and then compare your hashed string against the two hashed values to figure out which string you hashed. Similarly, if I knew you were hashing a number between 0 and 999,999,999, I could hash all those values, store the results, then compare the hash of your number against my precomputed hashes and see which one you hashed.
To directly answer your question - the sites that offer tables of reversed hashes don't compute those tables by reversing the hash function, but rather by hashing lots and lots and lots of strings and writing down the results. They might hash strings they expect people to use (for example, the most common weak web passwords), or they may pick random short strings to cover all possible simple strings (along the lines of the number hashing example from above).
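A minimal sketch of such a precomputed lookup table (the candidate list is made up for illustration): hash every candidate in advance, then "reverse" a hash with a plain dictionary lookup.

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

# Precompute: hash lots of candidate strings and index them by hash.
candidates = ["YES", "NO"] + [str(n) for n in range(100_000)]
table = {md5_hex(c): c for c in candidates}

def lookup(h):
    # Not a reversal of MD5 - just a lookup of something we hashed earlier.
    return table.get(h)

assert lookup(md5_hex("42424")) == "42424"
assert lookup(md5_hex("not in the table at all")) is None
```

Real services do exactly this at scale, trading enormous storage (or rainbow-table chains) for fast lookups.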
Since cryptographic hash functions like SHA1, SHA2, SHA3, Blake2, etc., are candidates for one-way functions, there is no way to reverse the hashing directly.
So how do they achieve this? They may choose one of three ways:
Build a database of pairs (x, hash(x)) by generating the hash of well-known strings: cracked password lists, the English dictionary, Wikipedia text in all languages, and all strings up to some length bound like 8.
This method has a huge problem: the space needed to store all pairs of inputs and their hashes.
Build a rainbow table. A rainbow table is a time-memory trade-off. Before starting to build it, select the table parameters so as to cover the target search space.
See RainbowCrack for details of password cracking.
Combine both. Due to the size of the target search space, not all well-known strings, passwords, etc. can be placed in the rainbow table. For those, use the first option.
Don't forget that some of these sites also provide online hashing tools. Once you ask them to hash a value, it enters their database/rainbow table, and when you later visit the site and ask for the pre-image of the hash you stored, surprise, they have it now! If the text is sensitive, don't use online hashing services.
There is no process for reverse hashing. You just guess a password and hash it. You can make big databases of these guesses and hashes for reverse lookup, but it's not reversing the hash itself. Search for "rainbow tables" for more details.
Those websites do not perform any kind of reverse hashing. They use precomputed tables called "rainbow tables", which cache the output of cryptographic hash functions. The operators took lots and lots of strings and calculated hash values for them, and when someone searches for a hash value they look up the corresponding input in the table and display it.

how to get original value from hash value in node.js

I have created hash of some fields and storing in database using 'crypto' npm.
var crypto = require('crypto');
var hashFirtName = crypto.createHash('md5').update(orgFirtName).digest("hex"),
QUESTION: How can I get the original value from the hash value when needed?
The basic definition of a "hash" is that it's one-way. You cannot get the originating value from the hash. Mostly because a single value will always produce the same hash, but a hash isn't always related to a single value, since most hash functions return a string of finite/fixed length.
Additional Information
I wanted to provide some additional information, as I felt I may have left this too short.
As @xShirase pointed out in his answer, you can use a table to reverse a hash. These are known as Rainbow Tables. You can generate them or download them from the internet, usually from nefarious sources [ahem].
To expand on my other statement about a hash value possibly relating to multiple original values, lets take a look at MD5.
MD5 is a 128-bit hash. This means it has 2^128 possible values, or (unsigned) 0 through 340,282,366,920,938,463,463,374,607,431,768,211,455. That's a REALLY big number. So, for any given input you have a 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456 chance that it will collide with the hash result of another given input value.
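That value space is easy to check; the arithmetic below is nothing MD5-specific, just 2 raised to the digest width:

```python
# MD5 produces 128-bit digests, so there are 2**128 possible values,
# ranging from 0 to 2**128 - 1.
n = 2 ** 128
assert n == 340_282_366_920_938_463_463_374_607_431_768_211_456
print(f"{n:,}")
```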
Now, for simple data like passwords, the chances are astronomical. And for those purposes, who cares? Most of the time you are simply taking an input, hashing it, then comparing the hashes. For reasons I will not get into, when using hashes for passwords you should ALWAYS store the data already hashed. You don't want to leave plain-text passwords just lying about. Keep in mind that a hash is NOT the same as encryption.
Hashes can also be used for other reasons. For instance, they can be used to create a fast-lookup data structure known as a Hash Table. A Hash Table uses a hash as a sort of "primary key", allowing it to search a huge set of data in relatively few instructions, approaching O(1) (on the order of 1). Depending on the implementation of the Hash Table and the hashing algorithm, you have to deal with collisions, usually by means of a sorted list. This is why the Hash Table isn't "exactly" O(1), but close. If your hash algorithm is bad, the performance of your Hash Table can begin to approach O(n).
Another use for a hash is to tell if a file's contents have been altered, or match an original. You will see many OSS projects provide binary downloads that also have MD5 and/or SHA-2 hash values. This is so you can download the files, do a hash locally, and compare the results against theirs to make sure the file you are getting is the file they posted. Again, since the odds of two files hashing to the same value are 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456, the odds of a hacker successfully generating a file of the same size with a bad payload that hashes to the exact same MD5/SHA-2 value are pretty low.
Hope this discussion can help either you or someone in the future.
If you could get the original value from the hash, it wouldn't be that secure.
If you need to compare a value to what you have previously stored as a hash, you can create a hash for this value and compare the hashes.
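That comparison looks like this in practice (a Python sketch with hashlib rather than the Node crypto module; for real passwords you would use a salted, slow hash, not bare MD5):

```python
import hashlib

def md5_hex(s):
    return hashlib.md5(s.encode()).hexdigest()

stored_hash = md5_hex("Alice")   # what went into the database

def matches(candidate):
    # Hash the candidate the same way and compare hashes -
    # the stored value itself is never "decrypted".
    return md5_hex(candidate) == stored_hash

assert matches("Alice")
assert not matches("alice")
```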
In practice there is only one way to 'decrypt' a hash. It involves using a massive database of decrypted hashes, and compare them to yours. An example here

Does using a password twice when hashing it make it safer?

Does using a password twice when hashing it make it safer? My example is in CodeIgniter, I striped it down to just the bare minimum. Please do not point out all the things wrong with the example, it is just an example.
<?php
function user_signup(){ // The normal way
    $this->load->database();
    $new_user_insert = array(
        'password' => sha1($salt . $this->input->post('password')));
}

function user_signup(){ // The double insert way
    $this->load->database();
    $new_user_insert = array(
        'password' => sha1($this->input->post('password') . $salt . $this->input->post('password')));
}
EDIT:
EDIT: My thought is that it would make the input twice as long. An example (username: joe, password: 123456789): instead of having a rainbow table with my hashed 123456789, it would be 123456789123456789. I know this is an oversimplification, and the hash would look more like 01a967f5d27b9e910754729a669504a60d2aa865, but a would-be hacker would need a bigger rainbow table. Please correct me if I am wrong.
Thank you in advance.
a would-be hacker would need a bigger rainbow table
This isn't the case if the attacker knows your strategy for hashing the password.
Suppose, for simplicity's sake, that your password needs to be a 4-digit number. (Of course, this generalizes to more complex passwords.) There are then 10,000 possible passwords. If you concatenate the password with itself, the attacker can deduce the second half from the first half: 1234salt1234 is potentially valid, but 1234salt4321 cannot be, and it would not be included in a rainbow table. The additional digits, being a function of known information, add no additional complexity.
Adding a user-specific salt to the password before hashing defends against an attacker who can obtain the password hashes and who knows the system; in particular, the attacker knows the algorithm for hashing the user's password. Assuming as before a four-character numeric password, such an attacker using a brute-force strategy would still need to attempt only 10,000 combinations (0000salt0000, 0001salt0001, ..., 9999salt9999). The original strategy (not concatenating the password with itself) would also require 10,000 combinations (0000salt, ..., 9999salt), so the doubled scheme is no more difficult to attack (for practical intents).
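The counting argument can be checked directly. The sketch below (my own illustration, not the asker's CodeIgniter code; SHA-1 stands in for the example's hashing) brute-forces both schemes over all 10,000 four-digit passwords and shows both fall in at most 10,000 attempts:

```python
import hashlib

def sha1_hex(s):
    return hashlib.sha1(s.encode()).hexdigest()

salt = "salt"
secret = "4821"                               # unknown to the attacker

single = sha1_hex(salt + secret)              # sha1(salt + password)
double = sha1_hex(secret + salt + secret)     # sha1(password + salt + password)

def crack(target, build):
    # Try every possible 4-digit password against the target hash.
    for n in range(10_000):
        pw = f"{n:04d}"
        if sha1_hex(build(pw)) == target:
            return pw, n + 1                  # password and attempts used
    return None, 10_000

pw1, tries1 = crack(single, lambda pw: salt + pw)
pw2, tries2 = crack(double, lambda pw: pw + salt + pw)

# Both schemes fall to the same 10,000-candidate search.
assert pw1 == pw2 == secret
assert tries1 == tries2 <= 10_000
```

Doubling the password lengthens the hashed string, but it adds no candidates the attacker has to try.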
In general, no. Entering a password twice is useful to check whether a user typed it correctly, but never useful to have in your database twice.
The purpose of entering the password a second time is just to ensure that the user hasn't mistyped it! It doesn't actually improve security by itself, and certainly there would be little gain in storing in the database twice.
It is to avoid typos. If you have to type the password twice, chances are you'll make your password what you want. They want to avoid having people whose password is "blah" having to retrieve their password later on because they typed "blaj" by mistake. This is especially true since password fields show as asterisks (*********) and sometimes it is hard to tell if you typed what you thought you typed - people are likely to make typos without even realizing it. Some websites do this too with the email address when it is essential that it is correct as well.
why enter password 2 times?

What is a dictionary attack?

When we say dictionary attack, we don't really mean a real dictionary, do we? My guess is we mean a hacker's dictionary i.e. rainbow tables, right?
My point is we're not talking about someone keying different passwords into the login box, we're talking about someone who has full access to your database (which has hashed passwords, not plain passwords) and this person is reversing the hashes, right?
Since passwords are oft-times the easiest-to-attack part of cryptography, it's actually sort of a real dictionary. The assumption is that people are lazy and choose proper words as passwords or construct passphrases out of them. The dictionary can include other things, though, such as commonly used non-words or letter/number combinations. Essentially, everything that is likely to be a poorly-chosen password.
There are programs out there which will take an entire hard drive and build a dictionary out of every typable string on it, on the assumption that the user's password was at some point in time put in plaintext into memory (and then into the pagefile) or that it simply exists in the corpus of text stored on the drive [1]:
Even so, none of this might actually matter. AccessData sells another program, Forensic Toolkit, that, among other things, scans a hard drive for every printable character string. It looks in documents, in the Registry, in e-mail, in swap files, in deleted space on the hard drive ... everywhere. And it creates a dictionary from that, and feeds it into PRTK.
And PRTK breaks more than 50 percent of passwords from this dictionary alone.
Actually, you can make dictionaries even more effective if you include knowledge of how people usually build passwords. Schneier talks about this at length [1]:
Common word dictionary: 5,000 entries
Names dictionary: 10,000 entries
Comprehensive dictionary: 100,000 entries
Phonetic pattern dictionary: 1/10,000 of an exhaustive character search
The phonetic pattern dictionary is interesting. It's not really a dictionary; it's a Markov-chain routine that generates pronounceable English-language strings of a given length. For example, PRTK can generate and test a dictionary of very pronounceable six-character strings, or just-barely pronounceable seven-character strings. They're working on generation routines for other languages.
PRTK also runs a four-character-string exhaustive search. It runs the dictionaries with lowercase (the most common), initial uppercase (the second most common), all uppercase and final uppercase. It runs the dictionaries with common substitutions: "$" for "s," "#" for "a," "1" for "l" and so on. Anything that's "leet speak" is included here, like "3" for "e."
The appendage dictionaries include things like:
All two-digit combinations
All dates from 1900 to 2006
All three-digit combinations
All single symbols
All single digit, plus single symbol
All two-symbol combinations
[1] Bruce Schneier: Choosing Secure Passwords. In: Schneier on Security. (URL)
Well, if I threw a dictionary at you, it would hurt right?
But yes, a dictionary attack uses a list of words. They might be derived from a dictionary, or lists of common phrases or passwords ('123456' for example).
A rainbow table is different to a dictionary though - it is a reverse lookup for a given hash function, so that if you know a hash, you can identify a string which would generate that hash. For example, if I knew your password had an unsalted MD5 hash of e10adc3949ba59abbe56e057f20f883e, I could use a rainbow table to determine that 123456 hashes to that value.
A 'dictionary attack' usually refers to an attempt to guess a password using a 'dictionary'; that is, a long list of commonly-used passwords, usually corresponding to words or combination of words that people may lazily set as their password. Rainbow tables would be used if, instead of trying to guess the password by specifying the actual plaintext password, you had a password hash, and you wanted to guess the password. You'd specify the hashes of the commonly-used passwords and try to match them against the password hash you had, in order to try and get a match to determine what the password is.
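A dictionary attack against an unsalted MD5 hash takes only a few lines. The target below is the well-known MD5 of 123456 mentioned above; the tiny word list is made up, where real lists hold millions of entries:

```python
import hashlib

target = "e10adc3949ba59abbe56e057f20f883e"   # unsalted MD5 of a weak password

# A tiny "dictionary" of common passwords.
dictionary = ["password", "letmein", "qwerty", "123456", "dragon"]

cracked = next(
    (word for word in dictionary
     if hashlib.md5(word.encode()).hexdigest() == target),
    None,
)
assert cracked == "123456"
```

The attack never reverses the hash; it simply guesses plaintexts and hashes each guess.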
Dictionary attacks are attacks where attackers try words from a rather normal dictionary, because many people use simple passwords which can be found in a dictionary.
Wikipedia: Dictionary attack.

What's wrong with XOR encryption?

I wrote a short C++ program to do XOR encryption on a file, which I may use for some personal files (if it gets cracked it's no big deal - I'm just protecting against casual viewers). Basically, I take an ASCII password and repeatedly XOR the password with the data in the file.
Now I'm curious, though: if someone wanted to crack this, how would they go about it? Would it take a long time? Does it depend on the length of the password (i.e., what's the big-O)?
The problem with XOR encryption is that for long runs of the same characters, it is very easy to see the password. Such long runs are most commonly spaces in text files. Say your password is 8 chars, and the text file has 16 spaces in some line (for example, in the middle of ASCII-graphics table). If you just XOR that with your password, you'll see that output will have repeating sequences of characters. The attacker would just look for any such, try to guess the character in the original file (space would be the first candidate to try), and derive the length of the password from length of repeating groups.
Binary files can be even worse as they often contain repeating sequences of 0x00 bytes. Obviously, XORing with those is no-op, so your password will be visible in plain text in the output! An example of a very common binary format that has long sequences of nulls is .doc.
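Both leaks are easy to reproduce. Under a toy repeating-key XOR (a sketch for illustration, not anyone's production code), a run of zero bytes copies the key straight into the output, and a run of spaces copies key XOR 0x20:

```python
from itertools import cycle

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # Repeating-key XOR, the scheme described in the question.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"secret!!"

# A run of 0x00 bytes (common in binary files) reveals the key verbatim...
leak = xor_crypt(b"\x00" * 24, key)
assert leak == key * 3

# ...and a run of spaces (common in text) reveals key XOR 0x20,
# which merely flips the case of the key's letters.
spaces = xor_crypt(b" " * 8, key)
assert spaces == bytes(k ^ 0x20 for k in key)
```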
I concur with Pavel Minaev's explanation of XOR's weaknesses. For those who are interested, here's a basic overview of the standard algorithm used to break the trivial XOR encryption in a few minutes:
Determine how long the key is. This is done by XORing the encrypted data with itself shifted various numbers of places, and examining how many bytes are the same.

If the bytes that are equal are greater than a certain percentage (6% according to Bruce Schneier's Applied Cryptography, second edition), then you have shifted the data by a multiple of the key length. By finding the smallest amount of shifting that results in a large number of equal bytes, you find the key length.

Shift the cipher text by the key length, and XOR it against itself. This removes the key and leaves you with the plaintext XORed with the plaintext shifted by the length of the key. There should be enough plaintext to determine the message content.
Read more at Encryption Matters, Part 1
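The first step of that procedure can be sketched as follows. The plaintext here is arbitrary English filler and the key is made up; with enough text, the shifts that line the key up with itself (multiples of the key length, 7 below) show clearly more matching bytes than the others:

```python
from itertools import cycle

def xor_crypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# A few KB of ordinary English text as the plaintext.
plaintext = " ".join(
    f"the quick brown fox jumps over the lazy dog number {n} of the pack"
    for n in range(60)
).encode()

ciphertext = xor_crypt(plaintext, b"ux0r!k3")   # 7-byte key

def coincidences(data: bytes, shift: int) -> int:
    # Count bytes equal to the byte `shift` positions later.
    return sum(a == b for a, b in zip(data, data[shift:]))

counts = {s: coincidences(ciphertext, s) for s in range(1, 21)}
best = max(counts, key=counts.get)

# The shift with the most coincidences is a multiple of the key length.
assert best % 7 == 0
```

Once the key length is known, each key byte position can be attacked independently, e.g. by frequency analysis.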
XOR encryption can be reasonably* strong if the following conditions are met:
The plain text and the password are about the same length.
The password is not reused for encrypting more than one message.
The password cannot be guessed, e.g. by dictionary or other mathematical means. In practice this means the bits are random.
*Reasonably strong meaning it cannot be broken by trivial, mathematical means, as in GeneQ's post. It is still no stronger than your password.
In addition to the points already mentioned, XOR encryption is completely vulnerable to known-plaintext attacks:
cryptotext = plaintext XOR key
key = cryptotext XOR plaintext = plaintext XOR key XOR plaintext
where XORring the plaintexts cancel each other out, leaving just the key.
Not being vulnerable to known-plaintext attacks is a required but not sufficient property for any "secure" encryption method where the same key is used for more than one plaintext block (i.e. a one-time pad is still secure).
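The attack takes one line once you have a crib (reusing a toy repeating-key XOR for illustration):

```python
from itertools import cycle

def xor_crypt(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"hunter2"
plaintext = b"ATTACK AT DAWN ON THE EASTERN FRONT"
ciphertext = xor_crypt(plaintext, key)

# Known plaintext: ciphertext XOR plaintext = the repeating key stream.
keystream = bytes(c ^ p for c, p in zip(ciphertext, plaintext))
assert keystream[:len(key)] == key
assert keystream.startswith(key * (len(plaintext) // len(key)))
```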
Ways to make XOR work:
Use multiple keys, each with a length equal to a different prime number - never two keys of the same length.
Use the original filename as another key, but remember to create a mechanism for retrieving the filename. Then create a new filename with an extension that will let you know it is an encrypted file.
The reason for using multiple keys of prime-number length is that they cause the resulting XOR key to be Key A times Key B in length before it repeats.
Compress any repeating patterns out of the file before it is encrypted.
Generate a random number and XOR this number in at every X offset (remember, this number must also be recreatable; you could use a random seed of the file length).
After doing all this, if you use 5 keys of length 31 and greater, you would end up with a key length of approximately one hundred megabytes!
For keys: the filename (including the full path), STR(Filesize) + STR(Filedate) + STR(Date) + STR(Time), a randomly generated key, your full name, a private key created one time.
Keep a database of the keys used for each encrypted file, but store the DAT file on a USB memory stick and NOT on the computer.
This should prevent the repeating pattern on files like pictures and music, but movies, being four gigs in length or more, may still be vulnerable, so they may need a sixth key.
I personally have the DAT file itself encrypted on the memory stick (a DAT file for use with Microsoft Access). I used a 3-key method to encrypt it, because it will never be THAT large, being a directory of the files with the associated keys.
The reason for multiple keys rather than randomly generating one very large key is that primes times primes get large quickly, and I have some control over the creation of the key; and you KNOW that there really is no such thing as a truly random number. If I created one large random number, someone else could generate that same number.
Method to use the keys: encrypt the file with one key, then the next, then the next, till all keys are used. Each key is used over and over again till the entire file is encrypted with that key.
Because the keys are of different lengths, the overlap of the repeat is different for each key, and so creates a derived key with the length of Key One times Key Two. This logic repeats for the rest of the keys. The reason for prime numbers is that the repeat would otherwise occur on a divisor of the key length, so you want the only divisors to be 1 and the key length; hence, prime.
OK, granted, this is more than a simple XOR on the file but the concept is the same.
Lance
I'm just protecting against casual viewers
As long as this assumption holds, your encryption scheme is ok. People who think that Internet Explorer is "teh internets" are not capable of breaking it.
If not, just use some crypto library. There are already many good algorithms like Blowfish or AES for symmetric crypto.
The target of a good encryption is to make it mathematically difficult to decrypt without the key.
This includes the desire to protect the key itself.
The XOR technique is basically a very simple cipher easily broken as described here.
It is important to note that XOR is used within cryptographic algorithms.
These algorithms build their mathematical difficulty around it.
Norton's Anti-virus used to use a technique of using the previous unencrypted letter as the key for next letter. That took me an extra half-hour to figure out, if I recall correctly.
If you just want to stop the casual viewer, it's good enough; I've used it to hide strings within executables. It won't stand up for 10 minutes against anyone who actually tries, however.
That all said, these days there are much better encryption methods readily available, so why not avail yourself of something better. If you are trying to just hide from the "casual" user, even something like gzip would do that job better.
Another trick is to generate a md5() hash for your password. You can make it even more unique by using the length of the protected text as an offset or combining it with your password to provide better distribution for short phrases. And for long phrases, evolve your md5() hash by combining each 16-byte block with the previous hash -- making the entire XOR key "random" and non-repetitive.
RC4 is essentially XOR encryption! As are many stream ciphers - the key is the key (no pun intended!) you must NEVER reuse the key. EVER!
I'm a little late in answering, but since no one has mentioned it yet: this is called a Vigenère cipher.
Wikipedia gives a number of cryptanalysis attacks to break it; even simpler, though, since most file-formats have a fixed header, would be to XOR the plaintext-header with the encrypted-header, giving you the key.
That ">6%" GeneQ mentions is the index of coincidence for English telegraph text - 26 letters, with punctuation and numerals spelled out. The actual value for long texts is 0.0665.
The <4% is the index of coincidence for random text in a 26-character alphabet, which is 1/26, or 0.0385.
If you're using a different language or a different alphabet, the specific values will be different. If you're using the ASCII character set, Unicode, or binary bytes, the specific values will be very different. But the difference between the IC of plaintext and of random text will usually be present. (Compressed binaries may have ICs very close to that of random text, and any file encrypted with any modern computer cipher will have an IC that is exactly that of random text.)
Once you've XORed the text against itself, what you have left is equivalent to an autokey cipher. Wikipedia has a good example of breaking such a cipher
http://en.wikipedia.org/wiki/Autokey_cipher
If you want to keep using XOR you could easily hash the password with multiple different salts (a string that you add to a password before hashing) and then combine them to get a larger key.
E.g. use SHA3-512 with 64 unique salts, then hash your password with each salt to get a 32768-bit key that you can use to encrypt a 32 Kibit (kibibit), i.e. 4 KiB (kibibyte), or smaller file. Hashing this many times should take less than a second on a modern CPU.
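A sketch of that key derivation (the salts and password are made up for illustration; hashlib has had sha3_512 since Python 3.6):

```python
import hashlib

password = b"correct horse battery staple"

# 64 unique salts; each salted SHA3-512 digest contributes 512 bits of key.
salts = [f"salt-{i:02d}".encode() for i in range(64)]
key = b"".join(hashlib.sha3_512(s + password).digest() for s in salts)

assert len(key) * 8 == 32768          # 32 Kibit = 4 KiB of key material

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # Usable for files up to the key length; never reuse the key.
    assert len(data) <= len(key)
    return bytes(b ^ k for b, k in zip(data, key))

msg = b"small secret file contents"
assert xor_crypt(xor_crypt(msg, key), key) == msg
```

This only avoids the repeating-key weakness while the file is no longer than the derived key, and the key stream is only as unpredictable as the password it is derived from.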
For something more secure, you could try manipulating your key during encryption the way AES (Rijndael) does. AES actually XORs many times and modifies the key each round using a substitution table. It became an international standard, so it's quite secure.