I need to have a limit in the output length of my cipher. I know it depends on which algorithm you use, but I didnt find how do do this.
const crypto = require('crypto');
const options = {secret : 'SECRET', algorithm : 'CAST-cbc'};
exports.encrypt = (url) => {
const encrypt = crypto.createCipher(options.algorithm, options.secret);
let cipher = encrypt.update(url, 'utf8', 'hex');
cipher += encrypt.final('hex');
return cipher;
};
This is how im generating the cipher. Thanks.
limit in the output length of my cipher
...
'm trying to build a url shortener without having to have a database behind it
Encryption will be always longer than the original plaintext. You will need to have some IV (initialization vector), encrypted information, optionally an authentication code and then all encoded into an url safe format.
In theory you may sacrifice some level of security and have some sort of format-preserving encryption, but it will be at least as long as the original source.
final hash example: 1d6ca5252ff0e6fe1fa8477f6e8359b4
You won't be able to reconstruct original value from a hash. That's why for serious url shortening service you will need a database with key-value pair, where the key can be id or hash
You may still properly encrypt and encode the original data, just the output won't be shorter.
The output depends on input, always. What you could do is pad the input or output to the desired length to generate a consistent final hash. This requires there be a maximum length to the input, however.
Beyond that, there is no way to force a length and still have something decryptable. If you simply need a hash for validation, that's a bit different. If you clarify the purpose we can help more.
Given the updates that you need a hash, not encryption, then here is how this works.
You hash the original URL and save a file to disk (or wherever, S3). That file has the name that corresponds to the hash. So if the URL is 'http://google.com' the hash may be 'C7B920F57E553DF2BB68272F61570210' (MD5 hash), so you would save 'C7B920F57E553DF2BB68272F61570210'.txt or something on the server with 'http://google.com' as the file contents.
Now when someone goes to http://yourURLShortnener.com/C7B920F57E553DF2BB68272F61570210 you just look on disk for that file and load the contents, then issue a redirect for that URL.
Obviously you'd prefer a shorter hash, but you can just take a 10 digit substring from the original hash. It increases chances of hash collisions, but that's just the risk you take.
Related
I am using Node.JS and the crypto module. I have a SHA-256 hash in hex string and would like to create a crypto.Hash instance out of it. I only found ways to hash the input string itself, but not to update or create a new hash. Am I missing something from the documentation?
I am looking for something like (for UUID though):
crypto.Hash.from("sha256", "hex", "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592")
Generally there are not many libraries that do what you ask of them. There are certainly libraries where the internal state can be retrieved and restored such as Bouncy Castle, but as yet I haven't seen it in any JavaScript library. It would be very easy to create though.
Indeed, the 256 bit (total) intermediate values after each block of 512 bits will be used as final output after the last block is hashed. So if you can "restore" those values (i.e. put them in the state) then you could continue hashing after that.
This might not be that useful though, as those values already contain the padding and message size encoded into a 64 bit representation at the end of the block. So if you continue hashing after that, that padding and length will likely be included again, but now with different values.
One trick sometimes used in smart cards is to upload the intermediate values (including number of bits hashed) before the last data to be hashed, and let the smart card perform the padding, the length encoding and the final hash block operation. This is usually performed during signature calculation over large amounts of data (because you really don't want to send a whole document to smart card).
Pretty dumb if you ask me, just directly signing using a pre-calculated hash value is the way forward. Or making sure that the large swath of data is pre-hashed, and the hash is then signed (including another pass of the hash) - that way the entire problem can be avoided without special tricks.
The following small example code hashes a string to a base64- and hex string encoding.
This is the output:
buf: The quick brown fox jumps over the lazy dog
sha256 (Base64): 16j7swfXgJRpypq8sAguT41WUeRtPNt2LQLQvzfJ5ZI=
sha256 (hex): d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592
code:
var crypto = require('crypto');
const plaintext = 'The quick brown fox jumps over the lazy dog';
const buf = Buffer.from(plaintext, 'utf8');
console.log('buf: ' + buf);
const sha256Base64 = crypto.createHash('sha256').update(buf).digest('base64');
console.log('sha256 (Base64): ' + sha256Base64);
const sha256Hex = crypto.createHash('sha256').update(buf).digest('hex');
console.log('sha256 (hex): ' + sha256Hex);
Edit: I misunderstood the question, the task is to run several SHA-256 update calls (e.g. for different strings) and receive a (intermediate) result. Later this result should be used as "input" and the hashing should be proceeded with more update calls, finished with by "final"/"digest" to get the final sha-256 value about all parts.
Unfortunately there seems to be no way as the intermediate value is a final value and there (again) seems to be no way back because the final/digest calls make so additional computations (has to do with the underlying Merkle-Damgård constructions). The only way would be use an own written SHA-256 function and save the state of all internal registers, reset the registers when continuing and get the final value in the end.
An advice could be to grab the source code of a sha-256 implementation and save the internal states of all used variables. When proceeding you need to restore these variables and run the next update calls. I viewed some (Java) imps, and it does not look very difficult for me.
I have created hash of some fields and storing in database using 'crypto' npm.
var crypto = require('crypto');
var hashFirtName = crypto.createHash('md5').update(orgFirtName).digest("hex"),
QUESTION: How can I get the original value from the hash value when needed?
The basic definition of a "hash" is that it's one-way. You cannot get the originating value from the hash. Mostly because a single value will always produce the same hash, but a hash isn't always related to a single value, since most hash functions return a string of finite/fixed length.
Additional Information
I wanted to provide some additional information, as I felt I may have left this too short.
As #xShirase pointed out in his answer, you can use a table to reverse a Hash. These are known as Rainbow Tables. You can generate them or download them from the internet, usually from nefarious sources [ahem].
To expand on my other statement about a hash value possibly relating to multiple original values, lets take a look at MD5.
MD5 is a 128-bit hash. This means it can hold 2^128 bits, or (unsigned) 0 through 340,282,366,920,938,463,463,374,607,431,768,211,455. That's a REALLY big number. So, for any given input you have a 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456 chance that it will collide with the same hash result of another input value.
Now, for simple data like passwords, the chances are astronomical. And for those purposes, who cares? Most of the time you are simply taking an input, hashing it, then comparing the hashes. For reasons I will not get into, when using hashes for passwords you should ALWAYS store the data already hashed. You don't want to leave plain-text passwords just lying about. Keep in mind that a hash is NOT the same as encryption.
Hashes can also be used for other reasons. For instance, they can be used to create a fast-lookup data structure known as a Hash Table. A Hash Table uses a hash as sort of a "primary key", allowing it to search a huge set of data in relatively few number of instructions, approaching O(1) (On-order of 1). Depending on the implementation of the Hash Table and the hashing algorithm, you have to deal with collisions, usually by means of a sorted list. This is why the Hash Table isn't "exactly" O(1), but close. If your hash algorithm is bad, the performance of your Hash Table can begin to approach O(n).
Another use for a hash it to tell if a file's contents have been altered, or match an original. You will see many OSS project provide binary downloads that also have an MD5 and/or SHA-2 hash values. This is so you can download the files, do a hash locally, and compare the results against theirs to make sure the file you are getting is the file they posted. Again, since the odds of two files matching another is 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456, the odds of a hacker successfully generating a file of the same size with a bad payload that hashes to the exact same MD5/SHA-2 hash is pretty low.
Hope this discussion can help either you or someone in the future.
If you could get the original value from the hash, it wouldn't be that secure.
If you need to compare a value to what you have previously stored as a hash, you can create a hash for this value and compare the hashes.
In practice there is only one way to 'decrypt' a hash. It involves using a massive database of decrypted hashes, and compare them to yours. An example here
server code:
cipher = createCipher('aes128', 'password');
str = cipher.update('message', 'utf8', 'base64');
str += cipher.final('base64')
I want the client code (the browser) have the same algorithm as above, given the same message and password, produce the same output as the server's.
I tried CryptoJS, SJCL and some other libraries, but they use iv and salt which makes the result entirely different. In my situation, that security isn't necessary.
(I don't know exactly what iv and salt is, I just hope the code can function without them.)
UPDATE: I find that without proper knowledge of the encryption itself, using the function in this way is a huge mistake.
Per doc:
password is used to derive key and IV, which must be a 'binary'
encoded string or a buffer.
I'm going to learn some basics first.
I'm going to use this kind of approach to store my password:
User enters password
Application salts password with random number
Then with salted password encrypt with some encryption algorithm randomly selected array of data (consisting from predefined table of chars/bytes)
for simplicity it can be used just table of digits, so in case of digits random array would be simply be long enough integer/biginteger.
Then I store in DB salt (modified value) and encrypted array
To check password validity:
Getting given password
Read salt from DB and calculate decrypt key
Try to decrypt encrypted array
If successfull (in mathematical mean) compare decrypted value byte by byte
does it contains only chars/bytes from known table. For instance is it integer/biginteger? If so - password counts as valid
What do you think about this procedure?
In a few words, it's a kind of alternative to using hash functions...
In this approach encryption algorithm is about to be used for calculation of non-inversible value.
EDIT
# Encrypt/decrypt function that works like this:
KEY=HASH(PASSWORD)
CYPHERTEXT = ENCRYPT(PLAINTEXT, KEY)
PLAINTEXT = DECRYPT(CYPHERTEXT, KEY)
# Encrypting the password when entered
KEY=HASH(PASSWORD)+SALT or HASH(PASSWORD+SALT)
ARRAY={A1, A2,... AI}
SOME_TABLE=RANDOM({ARRAY})
ENCRYPTED_TABLE = ENCRYPT(SOME_TABLE, KEY + SALT)
# Checking validity
DECRYPT(ENCRYPTED_TABLE, PASSWORD + SALT) == SOME_TABLE
if(SOME_TABLE contains only {ARRAY} elements) = VALID
else INVALID
From what you write I assume you want to do the following:
# You have some encryption function that works like this
CYPHERTEXT = ENCRYPT(PLAINTEXT, KEY)
PLAINTEXT = DECRYPT(CYPHERTEXT, KEY)
# Encrypting the password when entered
ENCRYPTED_TABLE = ENCRYPT(SOME_TABLE, PASSWORD + SALT)
# Checking validity
DECRYPT(ENCRYPTED_TABLE, PASSWORD + SALT) == SOME_TABLE
First off: No sane person would use such a homemade scheme in a production system. So if you were thinking about actually implementing this in the real world, please go back. Don't even try to write the code yourself, use a proven software library that implements widely accepted algorithms.
Now, if you want to think about it as a mental exercise, you could start off like this:
If you should assume that an attacker will know all the parts of the equation, except the actual password. The attacker, who wants to retrieve the password, will therefore already know the encrypted text, the plaintext AND part of the password.
The chance of success will depend on the actual encryption scheme, and maybe the chaining mode.
I'm not a cryptanalyst myself, but without thinking about it too much I have the feeling that there could be a number of angles of attack.
The proposed scheme is, at best, slightly less secure than simply storing the hash of the password and salt.
This is because the encryption step simply adds a small constant amount of time to checking if each hash value is correct; but at the same time it also introduces classes of equivalent hashes, since there are multiple possible permutations of ARRAY that will be recognised as valid.
You would have to brute force the encryption on every password every time someone logs in.
Read salt from DB and calculate decrypt key
This can't be done unless you know what the password is before hand.
Just salt (And multiple hash) the password.
Best practice is to use unique ivs, but what is unique? Is it unique for each record? or absolutely unique (unique for each field too)?
If it's per field, that sounds awfully complicated, how do you manage the storage of so many ivs if you have 60 fields in each record.
I started an answer a while ago, but suffered a crash that lost what I'd put in. What I said was along the lines of:
It depends...
The key point is that if you ever reuse an IV, you open yourself up to cryptographic attacks that are easier to execute than those when you use a different IV every time. So, for every sequence where you need to start encrypting again, you need a new, unique IV.
You also need to look up cryptographic modes - the Wikipedia has an excellent illustration of why you should not use ECB. CTR mode can be very beneficial.
If you are encrypting each record separately, then you need to create and record one IV for the record. If you are encrypting each field separately, then you need to create and record one IV for each field. Storing the IVs can become a significant overhead, especially if you do field-level encryption.
However, you have to decide whether you need the flexibility of field level encryption. You might - it is unlikely, but there might be advantages to using a single key but different IVs for different fields. OTOH, I strongly suspect that it is overkill, not to mention stressing your IV generator (cryptographic random number generator).
If you can afford to do encryption at a page level instead of the row level (assuming rows are smaller than a page), then you may benefit from using one IV per page.
Erickson wrote:
You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field.
However, I think a better approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that parameter, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.
And Chris commented:
I am indeed using CBC mode. I thought about an algorithm to do a 1:many so I can store only 1 IV per record. But now I'm considering your idea of storing the IV with the ciphertext. Can you give me more some more advice: I'm using PHP + MySQL, and many of the fields are either varchar or text. I don't have much experience with binary in the database, I thought binary was database-unfriendly so I always base64_encoded when storing binary (like the IV for example).
To which I would add:
IBM DB2 LUW and Informix Dynamic Server both use a Base-64 encoded scheme for the character output of their ENCRYPT_AES() and related functions, storing the encryption scheme, IV and other information as well as the encrypted data.
I think you should look at CTR mode carefully - as I said before. You could create a 64-bit IV from, say, 48-bits of random data plus a 16-bit counter. You could use the counter part as an index into the record (probably in 16 byte chunks - one crypto block for AES).
I'm not familiar with how MySQL stores data at the disk level. However, it is perfectly possible to encrypt the entire record including the representation of NULL (absence of) values.
If you use a single IV for a record, but use a separate CBC encryption for each field, then each field has to be padded to 16 bytes, and you are definitely indulging in 'IV reuse'. I think this is cryptographically unsound. You would be much better off using a single IV for the entire record and either one unit of padding for the record and CBC mode or no padding and CTR mode (since CTR does not require padding - one of its merits; another is that you only use the encryption mode of the cipher for both encrypting and decrypting the data).
Once again, appendix C of NIST pub 800-38 might be helpful. E.g., according to this
you could generate an IV for the CBC mode simply by encrypting a unique nonce with your encryption key. Even simpler if you would use OFB then the IV just needs to be unique.
There is some confusion about what the real requirements are for good IVs in the CBC mode. Therefore, I think it is helpful to look briefly at some of the reasons behind these requirements.
Let's start with reviewing why IVs are even necessary. IVs randomize the ciphertext. If the same message is encrypted twice with the same key then (but different IVs) then the ciphertexts are distinct. An attacker who is given two (equally long) ciphertexts, should not be able to determine whether the two ciphertexts encrypt the same plaintext or two different plaintext. This property is usually called ciphertext indistinguishablility.
Obviously this is an important property for encrypting databases, where many short messages are encrypted.
Next, let's look at what can go wrong if the IVs are predictable. Let's for example take
Ericksons proposal:
"You could do something clever like generating one random value in each record, and using a hash of the field name and the random value to produce an IV for that field."
This is not secure. For simplicity assume that a user Alice has a record in which there
exist only two possible values m1 or m2 for a field F. Let Ra be the random value that was used to encrypt Alice's record. Then the ciphertext for the field F would be
EK(hash(F || Ra) xor m).
The random Ra is also stored in the record, since otherwise it wouldn't be possible to decrypt. An attacker Eve, who would like to learn the value of Alice's record can proceed as follows: First, she finds an existing record where she can add a value chosen by her.
Let Re be the random value used for this record and let F' be the field for which Eve can submit her own value v. Since the record already exists, it is possible to predict the IV for the field F', i.e. it is
hash(F' || Re).
Eve can exploit this by selecting her value v as
v = hash(F' || Re) xor hash(F || Ra) xor m1,
let the database encrypt this value, which is
EK(hash(F || Ra) xor m1)
and then compare the result with Alice's record. If the two result match, then she knows that m1 was the value stored in Alice's record otherwise it will be m2.
You can find variants of this attack by searching for "block-wise adaptive chosen plaintext attack" (e.g. this paper). There is even a variant that worked against TLS.
The attack can be prevented. Possibly by encrypting the random before using putting it into the record, deriving the IV by encrypting the result. But again, probably the simplest thing to do is what NIST already proposes. Generate a unique nonce for every field that you encrypt (this could simply be a counter) encrypt the nonce with your encryption key and use the result as an IV.
Also note, that the attack above is a chosen plaintext attack. Even more damaging attacks are possible if the attacker has the possibility to do chosen ciphertext attacks, i.e. is she can modify your database. Since I don't know how your databases are protected it is hard to make any claims there.
The requirements for IV uniqueness depend on the "mode" in which the cipher is used.
For CBC, the IV should be unpredictable for a given message.
For CTR, the IV has to be unique, period.
For ECB, of course, there is no IV. If a field is short, random identifier that fits in a single block, you can use ECB securely.
I think a good approach is to store a structure in the field that collects an algorithm identifier, necessary parameters (like IV) for that algorithm, and the ciphertext. This could be stored as a little binary packet, or encoded into some text like Base-85 or Base-64.