File containing its own checksum - security

Is it possible to create a file that contains its own checksum (MD5, SHA-1, whatever)? And to forestall the jokers: I mean the checksum in plain text, not a function that calculates it.

I created a piece of code in C, then ran bruteforce for less than 2 minutes and got this wonder:
The CRC32 of this string is 4A1C449B
Note that there must be no characters (end of line, etc.) after the sentence.
You can check it here:
http://www.crc-online.com.ar/index.php?d=The+CRC32+of+this+string+is+4A1C449B&en=Calcular+CRC32
This one is also fun:
I killed 56e9dee4 cows and all I got was...
Source code (sorry it's a little messy) here: http://www.latinsud.com/pub/crc32/
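For readers who would rather check such strings locally than rely on the online calculator, here is a minimal Python sketch. It assumes the calculator uses the same CRC-32 variant as zlib; the helper names crc32_hex and contains_own_crc32 are made up for this illustration.

```python
import zlib

def crc32_hex(message: str) -> str:
    """CRC-32 (zlib variant) of the ASCII bytes, as 8 lowercase hex digits."""
    return format(zlib.crc32(message.encode("ascii")), "08x")

def contains_own_crc32(message: str) -> bool:
    """True if the message text contains its own CRC-32 written in hex."""
    return crc32_hex(message) in message.lower()

# The two sentences quoted above; no trailing newline is added.
for candidate in (
    "The CRC32 of this string is 4A1C449B",
    "I killed 56e9dee4 cows and all I got was...",
):
    print(candidate, "->", crc32_hex(candidate), contains_own_crc32(candidate))
```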

Yes. It's possible, and it's common with simple checksums. Getting a file to include its own md5sum would be quite challenging.
In the most basic case, create a checksum value which will cause the summed modulus to equal zero. The checksum function then becomes something like
(n1 + n2 ... + CRC) % 256 == 0
The checksum then becomes part of the file and is itself checked. A very common example of this is the Luhn algorithm used in credit card numbers: the last digit is a check digit and is itself part of the 16-digit number.
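The modulus-zero idea can be sketched in a few lines of Python. This is only an illustration of the scheme described above, with a single check byte appended to the data; the function names are hypothetical.

```python
def check_byte(payload: bytes) -> int:
    """Byte that makes the sum of payload plus check byte a multiple of 256."""
    return (-sum(payload)) % 256

def is_valid(data: bytes) -> bool:
    """The check byte is part of the data, so the whole thing is summed."""
    return sum(data) % 256 == 0

payload = b"hello world"
framed = payload + bytes([check_byte(payload)])

print(is_valid(framed))                         # True
print(is_valid(b"jello world" + framed[-1:]))   # False: one byte was changed
```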

Check this:
echo -e '#!/bin/bash\necho My cksum is 918329835' > magic

"I wish my crc32 was 802892ef..."
Well, I thought this was interesting so today I coded a little java program to find collisions. Thought I'd leave it here in case someone finds it useful:
import java.util.zip.CRC32;
public class Crc32_recurse2 {
    public static void main(String[] args) throws InterruptedException {
        long endval = Long.parseLong("ffffffff", 16);
        long startval = 0L;
        // startval = Long.parseLong("802892ef",16); //uncomment to save yourself some time
        float percent = 0;
        long time = System.currentTimeMillis();
        long updates = 10000000L; // how often to print some status info
        for (long i=startval;i<endval;i++) {
            String testval = Long.toHexString(i);
            String cmpval = getCRC("I wish my crc32 was " + testval + "...");
            if (testval.equals(cmpval)) {
                System.out.println("Match found!!! Message is:");
                System.out.println("I wish my crc32 was " + testval + "...");
                System.out.println("crc32 of message is " + testval);
                System.exit(0);
            }
            if (i%updates==0) {
                if (i==0) {
                    continue; // kludge to avoid divide by zero at the start
                }
                long timetaken = System.currentTimeMillis() - time;
                long speed = updates/timetaken*1000;
                percent = (i*100.0f)/endval;
                long timeleft = (endval-i)/speed; // in seconds
                System.out.println(percent+"% through - "+ "done "+i/1000000+"M so far"
                        + " - " + speed+" tested per second - "+timeleft+
                        "s till the last value.");
                time = System.currentTimeMillis();
            }
        }
    }
    public static String getCRC(String input) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes());
        return Long.toHexString(crc.getValue());
    }
}
The output:
49.825756% through - done 2140M so far - 1731000 tested per second - 1244s till the last value.
50.05859% through - done 2150M so far - 1770000 tested per second - 1211s till the last value.
Match found!!! Message is:
I wish my crc32 was 802892ef...
crc32 of message is 802892ef
Note the dots at the end of the message are actually part of the message.
On my i5-2500 it was going to take ~40 minutes to search the whole crc32 space from 00000000 to ffffffff, doing about 1.8 million tests/second. It was maxing out one core.
I'm fairly new to Java, so any constructive comments on my code would be appreciated.
"My crc32 was c8cb204, and all I got was this lousy T-Shirt!"

Certainly, it is possible. But one of the uses of checksums is to detect tampering of a file - how would you know if a file has been modified, if the modifier can also replace the checksum?

Sure, you could concatenate the digest of the file itself to the end of the file. To check it, you would calculate the digest of all but the last part, then compare it to the value in the last part. Of course, without some form of encryption, anyone can recalculate the digest and replace it.
edit
I should add that this is not so unusual. One technique is to concatenate a CRC-32 so that the CRC-32 of the whole file (including that digest) is zero, or, for variants that invert their output such as the common zlib CRC-32, a fixed constant. This won't work with digests based on cryptographic hashes, though.
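A minimal Python sketch of that trick, assuming the zlib CRC-32 and a little-endian 4-byte trailer. Because zlib inverts its output, the CRC of the whole file comes out as the fixed residue 0x2144DF1C rather than literally zero. The function names are made up for this example.

```python
import struct
import zlib

# Value of zlib.crc32 over any data followed by its own little-endian CRC-32.
CRC32_RESIDUE = 0x2144DF1C

def append_crc32(data: bytes) -> bytes:
    """Return data with its CRC-32 appended as 4 little-endian bytes."""
    return data + struct.pack("<I", zlib.crc32(data))

def looks_intact(framed: bytes) -> bool:
    """Check the whole blob, trailer included, against the fixed residue."""
    return zlib.crc32(framed) == CRC32_RESIDUE

framed = append_crc32(b"some file contents")
print(looks_intact(framed))                   # True
print(looks_intact(b"S" + framed[1:]))        # False: first byte was altered
```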

I don't know if I understand your question correctly, but you could make the first 16 bytes of the file the checksum of the rest of the file.
So before writing a file, you calculate the hash, write the hash value first and then write the file contents.
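As a rough Python sketch of this layout (assuming MD5, whose digest happens to be 16 bytes; pack and unpack are hypothetical helper names):

```python
import hashlib

DIGEST_SIZE = hashlib.md5().digest_size  # 16 bytes for MD5

def pack(content: bytes) -> bytes:
    """Write the hash of the content first, then the content itself."""
    return hashlib.md5(content).digest() + content

def unpack(blob: bytes) -> bytes:
    """Split off the leading hash and check it against the rest."""
    stored, content = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    if hashlib.md5(content).digest() != stored:
        raise ValueError("checksum mismatch")
    return content

blob = pack(b"file contents")
print(unpack(blob))
```

Note that the hash covers only the content after it, not the hash field itself, so this is not a file whose checksum over all its bytes appears inside it.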

There is a neat implementation of the Luhn Mod N algorithm in the python-stdnum library (see luhn.py). The calc_check_digit function calculates a digit or character which, when appended to the file (expressed as a string), creates a valid Luhn Mod N string. As noted in many answers above, this gives a sanity check on the validity of the file, but no significant security against tampering. The receiver will need to know which alphabet is being used to define Luhn mod N validity.
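To show what such a check digit calculation involves, here is a hand-written sketch of the plain decimal Luhn algorithm (mod 10). It is not the python-stdnum code and does not handle other alphabets; the function names are chosen for this example only.

```python
def luhn_check_digit(number: str) -> str:
    """Check digit that makes number + digit pass the Luhn test (mod 10)."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)

def luhn_is_valid(number: str) -> bool:
    """The last digit is the check digit and is part of the number."""
    return luhn_check_digit(number[:-1]) == number[-1]

print(luhn_check_digit("7992739871"))      # 3
print(luhn_is_valid("4111111111111111"))   # True
```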

If the question is asking whether a file can contain its own checksum (in addition to other content), the answer is trivially yes for fixed-size checksums, because a file could contain all possible checksum values.
If the question is whether a file could consist of its own checksum (and nothing else), it's trivial to construct a checksum algorithm that would make such a file impossible: for an n-byte checksum, take the binary representation of the first n bytes of the file and add 1. Since it's also trivial to construct a checksum that always encodes itself (i.e. do the above without adding 1), clearly there are some checksums that can encode themselves, and some that cannot. It would probably be quite difficult to tell which of these a standard checksum is.

There are many ways to embed information in order to detect transmission errors etc. CRC checksums are good at detecting runs of consecutive bit-flips and might be added in such a way that the checksum is always e.g. 0. These kinds of checksums (including error correcting codes) are, however, easy to recreate and don't stop malicious tampering.
It is impossible to embed something in the message so that the receiver can verify its authenticity if the receiver knows nothing else about/from the sender. The receiver could for instance share a secret key with the sender. The sender can then append an encrypted checksum (which needs to be cryptographically secure, such as md5/sha1). It is also possible to use asymmetric encryption, where the sender publishes his public key and signs the md5 checksum/hash with his private key. The hash and the signature can then be tagged onto the data as a new kind of checksum. This is done all the time on the Internet nowadays.
The remaining problems then are: 1. How can the receiver be sure that he got the right public key? and 2. How secure is all this stuff in reality? The answer to 1 might vary. On the Internet it's common to have the public key signed by someone everyone trusts. Another simple solution is that the receiver got the public key at a meeting in person... The answer to 2 might change from day to day, but what's costly to break today will probably be cheap to break some time in the future. By that time, new algorithms and/or larger key sizes will hopefully have emerged.
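The shared-secret variant described above can be sketched with Python's standard library. Note that this uses HMAC-SHA-256, the usual keyed-hash construction, rather than literally encrypting an MD5 checksum; the key and the function names are placeholders for illustration.

```python
import hashlib
import hmac

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes

def sign(message: bytes, key: bytes) -> bytes:
    """Append a keyed tag that only holders of the key can compute."""
    return message + hmac.new(key, message, hashlib.sha256).digest()

def verify(blob: bytes, key: bytes) -> bytes:
    """Return the message if its trailing tag matches, else raise."""
    message, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return message

key = b"shared secret (placeholder)"
blob = sign(b"pay 10 to alice", key)
print(verify(blob, key))
```

Unlike a plain checksum, someone who modifies the message cannot recompute the tag without knowing the key.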

You can, of course, but in that case the SHA digest of the whole file will not be the SHA you included, because it is a cryptographic hash function, so changing a single bit in the file changes the whole hash. What you are looking for is a checksum calculated from the content of the file in such a way as to match a set of criteria.

Sure.
The simplest way would be to run the file through an MD5 algorithm and embed that data within the file. You can split up the checksum and place it at known points in the file (based on a proportion of the file size, e.g. 30%, 50%, 75%) if you wish to try to hide it.
Similarly you could encrypt the file, or encrypt a portion of the file (along with the MD5 checksum) and embed that in the file.
Edit
I forgot to say that you would need to remove the checksum data before using it.
Of course if your file needs to be readily readable by another program e.g. Word then things become a little more complicated as you don't want to "corrupt" the file so that it is no longer readable.

Related

FIO repeatable buffer fill

Is it possible to have a pseudo-random buffer fill pattern using FIO? ie, the fill pattern for a block would incorporate a seed + block number or offset into a pseudo-random fill generator. This way the entire fill data could be 100% repeatable and verifiable, but more varied than the static pattern provided by --verify=pattern.
My guess at the commands would be something like:
Write pseudo-random data out in verifiable manner
fio --filename=/home/test.bin --direct=1 --rw=write --bs=512 --size=1M --name=verifiable_write --verify=psuedo_rand --verify_psuedo_rand_seed=0xdeadbeef --do_verify=0
Read back pseudo-random data and verify
fio --filename=/home/test.bin --direct=1 --rw=read --bs=512 --size=1M --name=verify_written_data --verify=psuedo_rand --verify_psuedo_rand_seed=0xdeadbeef --do_verify=1
Obviously, I'm making up some options here, but I'm hoping it may get the point across.
(This isn't the right site for this type of question because it's not about programming - Super User or Serverfault look more appropriate)
The fio documentation for buffer_pattern says you can choose a fixed string or number (given in decimal or hex). However, your examples show you are doing a verify, so the documentation for verify_pattern is relevant. It states that you can use %o, which is replaced by the block offset. However, once you set a fixed pattern, that's it: there are no more variables beyond %o. That means that with current fio (3.17 at the time of writing), if you choose to use a fixed pattern (e.g. via verify_pattern) there's no way to include seeded random data that can be verified.
If you don't use a fixed pattern and specify verify by checksum then fio will actually use seeded random data but I don't think split verification will check the seed - just that the checksum written into the block matches the data of the rest of the block.
Is it possible to have a pseudo-random buffer fill pattern using FIO?
If the default random buffer fill is OK then yes but if you want to include something like block offset and other additional data alongside that then no at the time of writing (unless you patch the fio source).
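For what it's worth, the scheme the question describes (fill data derived from a seed plus the block number) is easy to sketch outside fio. The Python below is only an illustration of the idea and does not correspond to any fio option; block_pattern and verify_block are hypothetical names.

```python
import random

def block_pattern(seed: int, block_number: int, block_size: int = 512) -> bytes:
    """Deterministic pseudo-random fill for one block, from seed and block number."""
    rng = random.Random((seed << 64) | block_number)
    return rng.getrandbits(8 * block_size).to_bytes(block_size, "little")

def verify_block(data: bytes, seed: int, block_number: int) -> bool:
    """Regenerate the expected fill and compare it with what was read back."""
    return data == block_pattern(seed, block_number, len(data))

written = block_pattern(0xDEADBEEF, 7)
print(verify_block(written, 0xDEADBEEF, 7))   # True
print(verify_block(written, 0xDEADBEEF, 8))   # False: wrong block number
```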

How to create API key and Secure Key?

Usually, when I consume third-party APIs, they give us two keys:
API Key: a kind of random number or GUID
Pin/Secure Key/etc.: a kind of password or OTP
Now, assuming that I am the third party and I want my APIs to be consumed by retailers,
I would also like to create these credentials and give them to API consumers.
I work in .NET Core. Is there any way to create these, and do we also have to apply security
or token-based security?
I am confused because I have no idea how this can be accomplished.
From the few questions I researched on Stack Overflow, some suggest using this, or this, and some use HMAC security, but with HMAC we have to require the client to use HMAC as well so that the signatures can be matched.
Can you please suggest some ways I can do this in .NET Core?
Generating API Keys can basically be broken down to just generating cryptographically random strings.
The following C# code snippet I had lying around generates a random hex string:
using System.Security.Cryptography;
public static string RandomString()
{
    byte[] randomBytes = new Byte[64];
    using (RandomNumberGenerator rng = new RNGCryptoServiceProvider())
    {
        rng.GetBytes(randomBytes);
    }
    SHA256Cng ShaHashFunction = new SHA256Cng();
    byte[] hashedBytes = ShaHashFunction.ComputeHash(randomBytes);
    string randomString = string.Empty;
    foreach (byte b in hashedBytes)
    {
        randomString += string.Format("{0:x2}", b);
    }
    return randomString;
}
You can easily change the length of the resulting key by using a different hash function, or you can switch from hex to Base64 encoding, which is more common for API keys (Convert.ToBase64String(hashedBytes) would replace the foreach loop).
Edit 2022
Since I wrote this answer, both my understanding of cryptography and .NET Core itself have evolved.
Therefore, nowadays I would recommend something like this:
public static string GetSecureRandomString(int byteLength = 64)
{
    Span<byte> buffer = byteLength > 4096
        ? new byte[byteLength]
        : stackalloc byte[byteLength];
    RandomNumberGenerator.Fill(buffer);
    return Convert.ToHexString(buffer);
}
The following changes have been implemented:
using stackalloc if possible to reduce managed allocations and GC (garbage collector) pressure, thus increasing performance.
RNGCryptoServiceProvider has been deprecated and replaced with RandomNumberGenerator.Fill() or RandomNumberGenerator.GetBytes(), which also provide cryptographically sufficiently secure random bytes.
(Oversight on my part) There is actually no need for hashing in this context. The randomly generated bytes are secure as they are, so applying a hash function to them not only limits the output length (in case of SHA-256) but is also superfluous.
.NET 5 and later provide the Convert.ToHexString() method to convert bytes to hex.
I added a parameter to specify the length in bytes for the output string. More bytes = better security against brute-force attacks, but it comes with the drawback of a longer output string which may not be as handy as a shorter one. Tweak this value to fit your needs. The default is set to 512 bits (64 bytes) which is sufficiently secure for most applications.
In this example, I have chosen hex-encoding for the final string, but you may use any information-preserving encoding (hex, base64, base32, ASCII, ...) without compromising security.
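The same idea (take bytes from the operating system's cryptographic random source and encode them, with no hashing step) looks like this in Python, for comparison. This is a sketch using the standard secrets module, not a translation of the .NET API; the function name simply mirrors the one above.

```python
import secrets

def get_secure_random_string(byte_length: int = 64) -> str:
    """Hex-encode byte_length cryptographically secure random bytes."""
    return secrets.token_hex(byte_length)

api_key = get_secure_random_string()
print(len(api_key))   # 128 hex characters for 64 bytes
```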

md5sum relationship between split files and combined large file [duplicate]

I have a situation where I have one VERY large file that I'm using the linux "split" command to break into smaller parts. Later I use the linux "cat" command to bring the parts all back together again.
In the interim, however, I'm curious...
If I get an MD5 fingerprint of the large file before splitting it, then later get the MD5 fingerprints of all the independent file parts that result from the split command, is there a way to take the independent fingerprints and somehow deduce that the sum or average (or whatever you like to call it) of their parts is equal to the fingerprint of the single large file?
By (very) loose example...
bigoldfile.txt MD5 = 737da789
smallfile1.txt MD5 = 23489a89
smallfile2.txt MD5 = 1238g89d
smallfile3.txt MD5 = 01234cd7
someoperator(23489a89,1238g89d,01234cd7) = 737da789 (the fingerprint of the original file)
You likely can't do that - MD5 is complex enough inside and depends on actual data as well as the "initial" hash value.
You could instead generate "incremental" hashes - hash of first part, hash of first plus second part, etc.
Not exactly but the next best thing would be to do this:
cat filepart1 filepart2 | md5sum
or
cat filepart* | md5sum
Be sure to cat them back together in the correct order.
By piping the output of cat, you don't have to worry about creating a combined file that is too large.
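Both suggestions (hashing the parts in order as one stream, and keeping "incremental" hashes after each part) can be sketched with Python's hashlib. The parts here are in-memory byte strings standing in for the split files, and the function names are made up for the example.

```python
import hashlib

def md5_of_parts(parts) -> str:
    """MD5 of the parts fed in order, equivalent to hashing the joined file."""
    h = hashlib.md5()
    for part in parts:
        h.update(part)
    return h.hexdigest()

def running_md5s(parts) -> list:
    """Hash of part 1, then of parts 1+2, and so on."""
    h = hashlib.md5()
    digests = []
    for part in parts:
        h.update(part)
        digests.append(h.hexdigest())
    return digests

whole = b"a very large file, pretend this is gigabytes long"
parts = [whole[:10], whole[10:25], whole[25:]]

print(md5_of_parts(parts) == hashlib.md5(whole).hexdigest())   # True
print(running_md5s(parts)[-1] == hashlib.md5(whole).hexdigest())   # True
```

The per-part MD5 values on their own still cannot be combined into the MD5 of the whole; it is the running state over the data that carries forward.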

DES and ICryptoTransform

This method works fine in a program I've made. However I cannot really understand what is happening and where the encryption is actually performed. I read the related description from MSDN but not much information is given.
Can someone explain what is happening in general especially in line 8 and 9 please.
public byte[] Decrypt(byte[] input, byte[] key, byte[] iv)
{
DES des = new DESCryptoServiceProvider();
des.Mode = CipherMode.ECB;
des.Padding = PaddingMode.None;
des.Key = key;
ICryptoTransform ct = des.CreateDecryptor(key, iv);
byte[] result = ct.TransformFinalBlock(input, 0, input.Length);
return result;
}
If you want to understand what is going on, you should read about block cipher operations here:
http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Electronic_codebook_.28ECB.29
In a nutshell, block cipher chaining causes the input of one block operation to be fed into the next block operation. This obscures any block-level patterns in the ciphertext. Since there is a chaining structure, the last block gets an input from the second-to-last block, and so on, until the second block gets an input from the first block. Now the first block needs to get an input from something, but there are no preceding blocks. So we use something called an Initialization Vector (IV) to start it off. This IV does not need to be secret like the key, but it does need to have a low probability of re-use (otherwise the attacker can use it to correlate the first blocks of all your ciphertexts). Typically random numbers are used, or sometimes increasing sequence numbers.
In regard to the specific call:
Your method decrypts its input using DES. (DES is nowadays considered out of date and insecure, by the way; please consider using AES instead. The block cipher structures remain the same, so all you need to do is swap the library.) Anyway,
Since you're using the cipher in ECB mode, each block is decrypted independently with the same key. ECB does no chaining, so the initialization vector provided to your Decrypt method call is not actually used. The call to CreateDecryptor initializes a decryption object using the provided secret key and initialization vector.
The actual decryption is performed using the call to TransformFinalBlock. The arguments are the input byte array, and then an offset and a length parameter (used for when you don't want to decrypt the entire byte array). In this case you do want to use the entire byte array so the starting offset is 0 and the size is the length of the whole byte array.
One thing you should probably add is a check that the length of the input byte array is a multiple of the block size for your cipher; with PaddingMode.None it will otherwise throw an exception. In the case of DES, the block size is 64 bits (8 bytes). If you switch to AES as I recommended, it will be 128 bits (16 bytes).

How many times can I compose an MD5 function with itself?

For study purposes, it would be useful to find out how many times I can compose an MD5 function with itself without getting the same value.
This is a parallel/complementary approach to salting, because this way the value gets harder to crack using brute force.
Seemingly infinite. However, MD5 has only 2^128 possible outputs (and has been shown not to be collision resistant), so at some point you will get a duplicate.
The following Ruby code will cyclically apply the MD5 hashing algorithm until a duplicate has been detected, at which point it will print the number of cycles required to reach the duplication point. The original string is randomly generated from alphabetical characters.
require 'set'
require 'digest'
keys = Set.new
o = [('a'..'z'), ('A'..'Z')].map { |i| i.to_a }.flatten
string = (0...10).map{ o[rand(o.length)] }.join
count = 0
while !keys.include?(string) do
  count += 1
  puts count
  keys << string
  string = Digest::MD5.digest(string)
end
puts "#{count}"
This continues to run past 15mil cycles... I will update once a duplicate has been found.
Update: due to the limited resources of my machine I had to halt the above script after 75,933,338 cycles without a collision (the set had allocated ~8 GB in memory)
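The memory problem comes from storing every value seen. A different technique, Floyd's cycle detection ("tortoise and hare"), finds the repetition with constant memory. The Python sketch below applies it to MD5 truncated to 2 bytes, purely so that it finishes quickly; that truncation is an assumption of this example and not part of the original experiment. If MD5 behaves like a random function, the full 128-bit version would be expected to need on the order of 2^64 steps, which is not feasible either way.

```python
import hashlib

def step(value: bytes, width: int) -> bytes:
    """One application of MD5, truncated to `width` bytes."""
    return hashlib.md5(value).digest()[:width]

def find_cycle(start: bytes, width: int = 2):
    """Floyd's algorithm: return (tail_length, cycle_length) of the sequence."""
    # Phase 1: the hare moves twice as fast until both meet inside the cycle.
    tortoise = step(start, width)
    hare = step(step(start, width), width)
    while tortoise != hare:
        tortoise = step(tortoise, width)
        hare = step(step(hare, width), width)
    # Phase 2: restart the tortoise; they now meet at the start of the cycle.
    tail = 0
    tortoise = start
    while tortoise != hare:
        tortoise = step(tortoise, width)
        hare = step(hare, width)
        tail += 1
    # Phase 3: walk once around the cycle to measure its length.
    length = 1
    hare = step(tortoise, width)
    while tortoise != hare:
        hare = step(hare, width)
        length += 1
    return tail, length

tail, length = find_cycle(b"seed")
print(f"first repeat after {tail + length} applications (cycle length {length})")
```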
