How can the extension of a PCR value be replicated with e.g. sha1sum? - linux

this is somewhat related to the post in:
Perform OR on two hash outputs of sha1sum
I have a sample set of TPM measurements, e.g. the following:
10 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a ima 0ea26e75253dc2fda7e4210980537d035e2fb9f8 boot_aggregate
10 7f36b991f8ae94141753bcb2cf78936476d82f1d ima d0eee5a3d35f0a6912b5c6e51d00a360e859a668 /init
10 8bc0209c604fd4d3b54b6089eac786a4e0cb1fbf ima cc57839b8e5c4c58612daaf6fff48abd4bac1bd7 /init
10 d30b96ced261df085c800968fe34abe5fa0e3f4d ima 1712b5017baec2d24c8165dfc1b98168cdf6aa25 ld-linux-x86-64.so.2
According to the TPM spec, also referred to in the above post, the PCR extend operation is: PCR := SHA1(PCR || data), i.e. "concatenate the old value of PCR with the data, hash the concatenated string and store the hash in PCR". Also, the spec and multiple papers and presentations I have found mention that data is a hash of the software to be loaded.
However, when I do an operation like echo H(PCR)||H(data) | sha1sum, I do not obtain the correct resulting value. I.e., when calculating (using the above hashes): echo 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a0ea26e75253dc2fda7e4210980537d035e2fb9f8 | sha1sum, the resulting value is NOT 7f36b991f8ae94141753bcb2cf78936476d82f1d.
Is my understanding of the TPM_Extend operation correct? If so, why is the resulting hash different from the one in the sample measurement file?
Thanks!

To answer your very first question: your understanding of the extend operation is more or less correct. But there are two problems:
You are misinterpreting the values you have copied in here.
You can't calculate hashes the way you are doing it on the shell.
The log output you provided here is from Linux's IMA. According to the
documentation, the first hash is the template-hash, defined as
template-hash: SHA1(filedata-hash | filename-hint)
filedata-hash: SHA1(filedata)
So for the first line: SHA1(0ea26e75253dc2fda7e4210980537d035e2fb9f8 | "boot_aggregate")
results in 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a.
Note that the filename-hint is 256 bytes long - it is 0-padded at the end.
(thumbs up for digging this out of the kernel source ;))
So to make it clear: In your log are no PCR values.
I wrote something in Ruby to verify my findings:
require 'digest/sha1'
# decode the hex filedata-hash into its 20 raw bytes
filedata_hash = ["0ea26e75253dc2fda7e4210980537d035e2fb9f8"].pack('H*')
# zero-pad the filename hint to 256 bytes, as the kernel does
filename_hint = "boot_aggregate".ljust(256, "\x00")
# template-hash = SHA1(filedata-hash | filename-hint)
puts Digest::SHA1.hexdigest(filedata_hash + filename_hint)
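A rough shell equivalent - a sketch assuming GNU coreutils and xxd are installed; the 242 zero bytes pad the 14-byte hint to 256 - should print the same template-hash:
( printf '0ea26e75253dc2fda7e4210980537d035e2fb9f8' | xxd -r -p
  printf 'boot_aggregate'
  head -c 242 /dev/zero ) | sha1sum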
Now to your commands:
The way you are using them here, you are interpreting the hashes as ASCII strings.
Also note that echo appends a newline character to its output.
The character sequence 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a is the hexadecimal
encoding of 160 bits of binary data - a SHA1 hash value. So basically you are right:
you have to concatenate the two raw values and calculate the SHA1 of the resulting
320 bits (40 bytes) of data.
So the correct command for the command line would be something like
printf "\x1c\xa0\x3e\xf9\xcc\xa9\x8b\x0a\x04\xe5\xb0\x1d\xab\xe1\xff\x82\x5f\xf0\x28\x0a\x0e\xa2\x6e\x75\x25\x3d\xc2\xfd\xa7\xe4\x21\x09\x80\x53\x7d\x03\x5e\x2f\xb9\xf8" | sha1sum
The \xXX in the printf string will convert the hex code XX into one byte of
binary output.
This will result in the output of d14f958b2804cc930f2f5226494bd60ee5174cfa -
the SHA1 of the 40 raw bytes. (Remember, though, that the values in your log are
template-hashes, not PCR values, so this particular result does not correspond to
any line in the log.)
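If you want to go on and replicate the actual PCR extend chain on the shell, here is a minimal sketch. It assumes, as IMA does, that the template-hash of each log line is the value extended into PCR 10, and that xxd is available to turn hex back into raw bytes:
# PCR-10 starts out as 20 zero bytes
pcr=0000000000000000000000000000000000000000
for template_hash in 1ca03ef9cca98b0a04e5b01dabe1ff825ff0280a \
                     7f36b991f8ae94141753bcb2cf78936476d82f1d; do
    # PCR := SHA1(PCR || template-hash), computed over the raw bytes
    pcr=$(printf '%s%s' "$pcr" "$template_hash" | xxd -r -p | sha1sum | cut -d' ' -f1)
done
echo "$pcr"   # simulated PCR-10 after two extends (the log itself contains no PCR values)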

Related

looking for fast way to edit a large file in linux

I have a large file, several gig of binary data, with an ASCII header at the top. I need to make a few small changes to the ASCII header. sed does the job, but it takes a fair bit of time since the file is so large. vi/vim is slow too. Is there any linux utility that can just go into the file, make the change at the top, and then get out quickly?
An example might be a header that looks like:
Code Rev: 3.5
Platform: platform1
Run Date: 12/13/16
Data source: whatever
Restart: False
followed by a large amount of binary data ....
and then I might need to, for example, edit an error in "Data source".
Provided that you know that your header is less than X bytes, you can use dd.
(!) But it only works if both strings have the same length (!)
Let's say that the header is less than 4096 bytes:
dd if=/path/to/file bs=4096 count=1 | sed 's/XXX/YYY/' | dd of=/path/to/file conv=notrunc
You can also do it programmatically, using languages like C, Python, PHP, or Java. The idea is to open the file, read the header, fix it, and write it back.
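For instance, if you know the byte offset of the field and the replacement is the same length, a single dd with seek can patch the file in place. A sketch (the offset 25 and the strings are hypothetical):
printf 'platform2' | dd of=/path/to/file bs=1 seek=25 conv=notrunc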

md5sum relationship between split files and combined large file [duplicate]

I have a situation where I have one VERY large file that I'm using the linux "split" command to break into smaller parts. Later I use the linux "cat" command to bring the parts all back together again.
In the interim, however, I'm curious...
If I get an MD5 fingerprint on the large file before splitting it, then later get the MD5 fingerprints on all the independent file parts that result from the split command, is there a way to take the independent fingerprints and somehow deduce that the sum or average (or whatever you like to call it) of their parts is equal to the fingerprint of the single large file?
By (very) loose example...
bigoldfile.txt MD5 = 737da789
smallfile1.txt MD5 = 23489a89
smallfile2.txt MD5 = 1238g89d
smallfile3.txt MD5 = 01234cd7
someoperator(23489a89,1238g89d,01234cd7) = 737da789 (the fingerprint of the original file)
You likely can't do that: MD5 is complex enough inside, and its internal state depends on all of the actual data hashed so far as well as the "initial" hash value, so the fingerprint of the whole cannot be derived from the fingerprints of the parts.
You could instead generate "incremental" hashes: the hash of the first part, the hash of the first plus second part, etc.
Not exactly but the next best thing would be to do this:
cat filepart1 filepart2 | md5sum
or
cat filepart* | md5sum
Be sure to cat them back together in the correct order.
By piping the output of cat you don't have to worry about creating a combined file that is too large.
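A quick end-to-end check of this approach (file name and split size are just examples; split's default suffixes part_aa, part_ab, ... sort correctly under the glob):
split -b 512M bigoldfile.txt part_
cat part_* | md5sum
md5sum bigoldfile.txt   # the two digests should match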

What type of data does /dev/random return by default?

What type of data does /dev/random return by default, without any formatting with od (for example)?
I mean, what is the data type of the random symbols we see when running cat /dev/random?
If you try to cat it, the result is gibberish. You need to read it as binary data; od, for example, will work. Otherwise, you can read it with C's read function, for example.
It is simply a random stream of bytes (binary data).
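For example, a sketch that reads just a few bytes and renders them as hex so the terminal stays sane:
head -c 16 /dev/random | od -A n -t x1   # 16 random bytes, shown as hexadecimal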

Random String in linux by system time

I work with Bash. I want to generate a random string based on the system time. The length of the string must be between 10 and 30 characters. Can anybody help me?
There are many ways to do this; my favorite uses the urandom device:
burhan#sandbox:~$ tr -cd '[:alnum:]' < /dev/urandom | fold -w30 | head -n1
CCI4zgDQ0SoBfAp9k0XeuISJo9uJMt
tr (translate) makes sure that only alphanumerics are shown
fold will wrap it to 30 character width
head makes sure we get only the first line
To use the current system time (as you have this specific requirement):
burhan#sandbox:~$ date +%s | sha256sum | base64 | head -c30; echo
NDc0NGQxZDQ4MWNiNzBjY2EyNGFlOW
date +%s = this is our date based seed
We run it through a few hashes to get a "random" string
Finally we truncate it to 30 characters
Other ways (including the two I listed above) are available at this page and others if you simply google.
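If the 10-to-30-character length itself should come from the system time, one sketch (assuming GNU date for the %N nanoseconds field) is:
len=$(( 10 + 10#$(date +%N) % 21 ))   # 10# forces base 10, since %N may have leading zeros
tr -cd '[:alnum:]' < /dev/urandom | head -c "$len"; echo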
Maybe you can use uuidgen -t.
Generate a time-based UUID. This method creates a UUID based on the system clock plus the system's ethernet hardware address, if present.
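Since a UUID is 36 characters with hyphens, one way (a sketch) to fit the 30-character limit is to strip the hyphens and trim:
uuidgen -t | tr -d '-' | cut -c1-30   # 32 hex chars without hyphens, trimmed to 30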
I recently put together a script to handle this, the output is 33 digit md5 checksum but you can trim it down with sed to between 10-30.
E.g. gen_uniq_id.bsh | sed 's/\(.\{20\}\)\(.*$\)/\1/'
The script is fairly robust - it uses current time to nanoseconds, /dev/urandom, mouse movement data and allows for optionally changing the collection times for random and mouse data collection.
It also has a -s option that allows an additional string argument to be incorporated, so you can random seed from anything.
https://code.google.com/p/gen-uniq-id/

File containing its own checksum

Is it possible to create a file that will contain its own checksum (MD5, SHA1, whatever)? And to preempt jokers: I mean the checksum in plain form, not a function that calculates it.
I created a piece of code in C, then ran a brute-force search for less than 2 minutes and got this wonder:
The CRC32 of this string is 4A1C449B
Note there must be no characters (end of line, etc.) after the sentence.
You can check it here:
http://www.crc-online.com.ar/index.php?d=The+CRC32+of+this+string+is+4A1C449B&en=Calcular+CRC32
This one is also fun:
I killed 56e9dee4 cows and all I got was...
Source code (sorry it's a little messy) here: http://www.latinsud.com/pub/crc32/
Yes, it's possible, and it's common with simple checksums. Getting a file to include its own md5sum would be quite challenging.
In the most basic case, create a checksum value which will cause the summed modulus to equal zero. The checksum function then becomes something like
(n1 + n2 ... + CRC) % 256 == 0
The checksum then becomes a part of the file and is checked along with it. A very common example of this is the Luhn algorithm used in credit card numbers. The last digit is a check digit, and is itself part of the 16-digit number.
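A minimal sketch of that zero-sum idea on the shell (the file name is hypothetical): compute the byte that makes the file's byte-sum 0 mod 256 and append it:
pad=$(od -A n -t u1 -v file.bin | tr -s ' ' '\n' | awk '{t+=$1} END {print (256 - t % 256) % 256}')
printf "\\$(printf '%03o' "$pad")" >> file.bin
od -A n -t u1 -v file.bin | tr -s ' ' '\n' | awk '{t+=$1} END {print t % 256}'   # now prints 0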
Check this:
echo -e '#!/bin/bash\necho My cksum is 918329835' > magic
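If the brute-forced value holds, verifying it is a one-liner:
cksum magic   # should print 918329835 followed by the byte count and file name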
"I wish my crc32 was 802892ef..."
Well, I thought this was interesting so today I coded a little java program to find collisions. Thought I'd leave it here in case someone finds it useful:
import java.util.zip.CRC32;

public class Crc32_recurse2 {
    public static void main(String[] args) throws InterruptedException {
        long endval = Long.parseLong("ffffffff", 16);
        long startval = 0L;
        // startval = Long.parseLong("802892ef",16); //uncomment to save yourself some time
        float percent = 0;
        long time = System.currentTimeMillis();
        long updates = 10000000L; // how often to print some status info
        for (long i = startval; i < endval; i++) {
            String testval = Long.toHexString(i);
            String cmpval = getCRC("I wish my crc32 was " + testval + "...");
            if (testval.equals(cmpval)) {
                System.out.println("Match found!!! Message is:");
                System.out.println("I wish my crc32 was " + testval + "...");
                System.out.println("crc32 of message is " + testval);
                System.exit(0);
            }
            if (i % updates == 0) {
                if (i == 0) {
                    continue; // kludge to avoid divide by zero at the start
                }
                long timetaken = System.currentTimeMillis() - time;
                long speed = updates / timetaken * 1000;
                percent = (i * 100.0f) / endval;
                long timeleft = (endval - i) / speed; // in seconds
                System.out.println(percent + "% through - " + "done " + i / 1000000 + "M so far"
                        + " - " + speed + " tested per second - " + timeleft
                        + "s till the last value.");
                time = System.currentTimeMillis();
            }
        }
    }

    public static String getCRC(String input) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes());
        return Long.toHexString(crc.getValue());
    }
}
The output:
49.825756% through - done 2140M so far - 1731000 tested per second - 1244s till the last value.
50.05859% through - done 2150M so far - 1770000 tested per second - 1211s till the last value.
Match found!!! Message is:
I wish my crc32 was 802892ef...
crc32 of message is 802892ef
Note the dots at the end of the message are actually part of the message.
On my i5-2500 it was going to take ~40 minutes to search the whole crc32 space from 00000000 to ffffffff, doing about 1.8 million tests/second. It was maxing out one core.
I'm fairly new with java so any constructive comments on my code would be appreciated.
"My crc32 was c8cb204, and all I got was this lousy T-Shirt!"
Certainly, it is possible. But one of the uses of checksums is to detect tampering of a file - how would you know if a file has been modified, if the modifier can also replace the checksum?
Sure, you could concatenate the digest of the file itself to the end of the file. To check it, you would calculate the digest of all but the last part, then compare it to the value in the last part. Of course, without some form of encryption, anyone can recalculate the digest and replace it.
Edit
I should add that this is not so unusual. One technique is to concatenate a CRC-32 so that the CRC-32 of the whole file (including that digest) is zero. This won't work with digests based on cryptographic hashes, though.
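A rough illustration of the append-and-verify scheme (GNU coreutils; the file name is hypothetical):
h=$(md5sum < file.dat | cut -d' ' -f1)
echo "$h" >> file.dat                            # append the 32-hex-char digest plus newline
head -c -33 file.dat | md5sum | cut -d' ' -f1    # digest of everything but the last 33 bytes...
tail -c 33 file.dat                              # ...should equal the stored value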
I don't know if I understand your question correctly, but you could make the first 16 bytes of the file the checksum of the rest of the file.
So before writing a file, you calculate the hash, write the hash value first and then write the file contents.
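A sketch of that layout using md5sum and xxd (file names are hypothetical):
md5sum payload.bin | cut -d' ' -f1 | xxd -r -p > out.bin   # 16 raw digest bytes first
cat payload.bin >> out.bin                                  # then the file contents
# verify: compare the stored prefix against a fresh hash of the remainder
cmp <(head -c 16 out.bin) <(tail -c +17 out.bin | md5sum | cut -d' ' -f1 | xxd -r -p)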
There is a neat implementation of the Luhn Mod N algorithm in the python-stdnum library (see luhn.py). The calc_check_digit function will calculate a digit or character which, when appended to the file (expressed as a string), will create a valid Luhn Mod N string. As noted in many answers above, this gives a sanity check on the validity of the file, but no significant security against tampering. The receiver will need to know what alphabet is being used to define Luhn mod N validity.
If the question is asking whether a file can contain its own checksum (in addition to other content), the answer is trivially yes for fixed-size checksums, because a file could contain all possible checksum values.
If the question is whether a file could consist of its own checksum (and nothing else), it's trivial to construct a checksum algorithm that would make such a file impossible: for an n-byte checksum, take the binary representation of the first n bytes of the file and add 1. Since it's also trivial to construct a checksum that always encodes itself (i.e. do the above without adding 1), clearly there are some checksums that can encode themselves, and some that cannot. It would probably be quite difficult to tell which of these a standard checksum is.
There are many ways to embed information in order to detect transmission errors etc. CRC checksums are good at detecting runs of consecutive bit-flips and can be appended in such a way that the checksum of the whole is always e.g. 0. These kinds of checksums (including error-correcting codes) are, however, easy to recreate and don't stop malicious tampering.
It is impossible to embed something in the message so that the receiver can verify its authenticity if the receiver knows nothing else about/from the sender. The receiver could, for instance, share a secret key with the sender. The sender can then append an encrypted checksum (which needs to be cryptographically secure, such as md5/sha1). It is also possible to use asymmetric encryption, where the sender publishes his public key and signs the md5 checksum/hash with his private key. The hash and the signature can then be tagged onto the data as a new kind of checksum. This is done all the time on the internet nowadays.
The remaining problems then are: 1. How can the receiver be sure that he got the right public key? 2. How secure is all this stuff in reality? The answer to 1 might vary. On the internet it's common to have the public key signed by someone everyone trusts. Another simple solution is that the receiver got the public key at a meeting in person... The answer to 2 might change from day to day, but what's costly to brute-force today will probably be cheap to break some time in the future. By that time, new algorithms and/or larger key sizes will hopefully have emerged.
You can, of course, but in that case the SHA digest of the whole file will not be the SHA you included, because SHA is a cryptographic hash function, and changing a single bit in the file changes the whole hash. What you are looking for is a checksum calculated from the content of the file in a way that matches a set of criteria.
Sure.
The simplest way would be to run the file through an MD5 algorithm and embed that data within the file. You can split up the checksum and place it at known points of the file (based on a portion of the file size, e.g. 30%, 50%, 75%) if you wish to try and hide it.
Similarly you could encrypt the file, or encrypt a portion of the file (along with the MD5 checksum) and embed that in the file.
Edit
I forgot to say that you would need to remove the checksum data before recomputing the checksum for verification.
Of course if your file needs to be readily readable by another program e.g. Word then things become a little more complicated as you don't want to "corrupt" the file so that it is no longer readable.
