CRC user's guide for file compare

CRC user's guide for file compare - linux

Can someone point me to a CRC user's guide for file compare?
I have a requirement to use CRC to compare and confirm two files match. I have reviewed this site for how to use to use CRC with no real luck.
I have also looked for a CRC user's guide for file compare with no luck.

You compute the CRC on file 1. You compute the CRC on file 2. If the CRCs are not the same, then the files are different. If the CRCs are the same, then the files might also be the same. There is no assurance that the files are actually the same, unless you compare them directly.
You can get a pretty high probabilistic assurance that two files are the same if you use a long cryptographic hash, such as SHA-256. That gives you both a very low probability that two different files accidentally have the same SHA-256 (2–256 ~= 10–77), and it is considered to be not possible to modify a file to match the SHA-256 of another, different file. A CRC on the other hand is quite easy to spoof by someone who wants to fool you.

Related

how to check compression type without decompressing?

I wrote code in nodejs to decompress different file types (like tar, tar.gz etc..)
I do not have the filename available to me.
Currently I use brute force to decompress. The first one that succeeds, wins..
I want to improve this by knowing the compression type beforehand.
Is there a way to do this?

Your "brute force" approach would actually work very well, since the software would determine incredibly quickly, usually within the first few bytes, that it had been handed the wrong thing. Except for the one that will work.
You can see this answer for a list of prefix bytes for common formats. You would also need to detect the tar format within a compressed format, which is not detailed there. Even if you find a matching prefix, you still need to proceed to decompress and decode to test the hypothesis, which is essentially your brute force method.

Custom non-common filesystem for embedded linux

My embedded linux gets its data files from an external source (sd card). As this media is easily detachable I'd like to protect it in a certain way.
First idea that comes in mind is to do encryption. I'm afraid though this would take too much processing power. My files are not deeply sensitive, but I don't want that people can put the card into their desktop and see/copy my files. I assume these people know how to mount a standard ext4 drive.
Content is initially loaded on to the disk via a desktop linux box, so the process should be
I wouldn't care too much if the solution is not hack-proof. Basically I want to avoid to have my content copied by the general copycat.
I'm not looking for a turn-key solution, but like to get some pointers into the right direction.

A simple XOR Cipher requires very little processing. The security is limited in the sense that if someone has a both the encrypted and plain-text data, by XOR'ing the two the encryption key is revealed. However so long as you can avoid someone being knowingly in possession of both, and the key itself remains confidential, it may meet your requirements of simplicity and security.
Obviously you need a longer key that the simple 8 bit one in the example in the link. The key itself can be arbitrarily long with no impact on performance.

How to disable a Linux entropy pool source

How do I disable entropy sources?
Here's a little background on what I'm trying to do. I'm building a little RNG device that talks to my PC via USB. I want it to be the only source of entropy used. I'll use rngd to add my device as a source of entropy.

Quick answer is "you don't".
Don't ever remove sources of entropy. The designers of the random number generator rigged it so any new random bits just get mixed in with the current state.
Having multiple sources of entropy never weaken the random number generator's output, only strengthen it.
The only reason I can think to remove a source of entropy is that it sucks CPU or wall-clock time that you cannot afford. I find this highly unlikely but if this is the case, then your only option is kernel hacking. As far as hacking the kernel goes, this should be fairly simple. Just comment out all the calls to the add_*_randomness() functions throughout the kernel source code (the functions themselves are found in drivers/char/random.c). You could just comment out the contents of the functions but you are trying to save time in this case and the minuscule time the extra function call takes could be too much.

One solution is to to run separate linux instance in a virtual machine.

Additional note, too big for comment:
Depending on its settings, rngd can dominate the kernel's entropy pool,
by feeding it so much data, so often, that other sources of entropy are
mostly ignored or lost. Do not to that unless you trust rngd's source
of random data ultimately.
http://man.he.net/man8/rngd

I suspect you might want a fast random generator.
Edit I should have read the question better
Anyways, frandom comes with a complete tarball for the kernel module so you might be able to learn how to build your own module around your USB device. Perhaps, you can even have it replace/displace /dev/urandom so arbitrary applications would work with it instead of /dev/urandom (of course, given enough permissions, you could just rename the device nodes and 'fool' most applications).
You could look at http://billauer.co.il/frandom.html, which implements that.
Isn't /dev/urandom enough?
Discussions about the necessity of a faster kernel random number generator rise and fall since 1996 (that I know of). My opinion is that /dev/frandom is as necessary as /dev/zero, which merely creates a stream of zeroes. The common opposite opinion usually says: Do it in user space.
What's the difference between /dev/frandom and /dev/erandom?
In the beginning I wrote /dev/frandom. Then it turned out that one of the advantages of this suite is that it saves kernel entropy. But /dev/frandom consumes 256 bytes of kernel random data (which may, in turn, eat some entropy) every time a device file is opened, in order to seed the random generator. So I made /dev/erandom, which uses an internal random generator for seeding. The "F" in frandom stands for "fast", and "E" for "economic": /dev/erandom uses no kernel entropy at all.
How fast is it?
Depends on your computer and kernel version. Tests consistently show 10-50 times faster than /dev/urandom.
Will it work on my kernel?
It most probably will, if it's > 2.6
Is it stable?
Since releasing the initial version in fall 2003, at least 100 people have tried it (probably many more) on i686 and x86_64 systems alike. Successful test reports have arrived, and zero complaints. So yes, it's very stable. As for randomness, there haven't been any complaints either.
How is random data generated?
frandom is based on the RC4 encryption algorithm, which is considered secure, and is used by several applications, including SSL. Let's start with how RC4 works: It takes a key, and generates a stream of pseudo-random bytes. The actual encryption is a XOR operation between this stream of bytes and the cleartext data stream.
Now to frandom: Every time /dev/frandom is opened, a distinct pseudo-random stream is initialized by using a 2048-bit key, which is picked by doing something equivalent to reading the key from /dev/urandom. The pseudo-random stream is what you read from /dev/frandom.
frandom is merely RC4 with a random key, just without the XOR in the end.
Does frandom generate good random numbers?
Due to its origins, the random numbers can't be too bad. If they were, RC4 wouldn't be worth anything.
As for testing: Data directly "copied" from /dev/frandom was tested with the "Diehard" battery of tests, developed by George Marsaglia. All tests passed, which is considered to be a good indication.
Can frandom be used to create one-time pads (cryptology)?
frandom was never intended for crypto purposes, nor was any special thought given to attacks. But there is very little room for attacking the module, and since the module is based upon RC4, we have the following fact: Using data from /dev/frandom as a one-time pad is equivalent to using the RC4 algorithm with a 2048-bit key, read from /dev/urandom.
Bottom line: It's probably OK to use frandom for crypto purposes. But don't. It wasn't the intention.

Combining resources into a single binary file

How does one combine several resources for an application (images, sounds, scripts, xmls, etc.) into a single/multiple binary file so that they're protected from user's hands? What are the typical steps (organizing, loading, encryption, etc...)?
This is particularly common in game development, yet a lot of the game frameworks and engines out there don't provide an easy way to do this, nor describe a general approach. I've been meaning to learn how to do it, but I don't know where to begin. Could anyone point me in the right direction?

There are lots of ways to do this. m_pGladiator has some good ideas, especially with seralization. I would like to make a few other comments.
First, if you are going to pack a bunch of resources into a single file (I call these packfiles), then I think that you should work to avoid loading the whole file and then deseralizing out of that file into memory. The simple reason is that it's more memory. That's really not a problem on PC's I guess, but it's good practice, and it's essential when working on the console. While we don't (currently) serialize objects as m_pGladiator has suggested, we are moving towards that.
There are two types of packfiles that you might have. One would be a file where you want arbitrary access to the contents of the files. A second type might be a collection of files where you need all of those files when loading a level. A basic example might be:
An audio packfile might contain all the audio for your game. You might only need to load certain kinds of audio for the menus or interface screens and different sets of audio for the levels. This might fall intot he first category above.
A type that falls into the second category might be all models/textures/etc for a level. You basically want to load the entire contents of this file into the game at load time because you will (likely) need all of it's contents while a player is playing that level or section.
many of the packfiles that we build fall into the second category. We basically package up the level contents, and then compresses them with something like zlib. When we load one of these at game time, we read a small amount of the file, uncompress what we've read into a memory buffer, and then repeat until the full file has been read into memory. The buffer we read into is relatively small while final destination buffer is large enough to hold the largest set of uncompressed data that we need. This method is tricky, but again, it saves on RAM, it's an interesting exercise to get working, and you feel all nice and warm inside because you are being a good steward of system resources. once the packfile has been completely uncompressed into it's destinatino buffer, we run a final pass on the buffer to fix up pointer locations, etc. This method only works when you write out your packfile as structures that the game knows. In other words, our packfile writing tools share struct (or classses) with the game code. We are basically writing out and compressing exact representations of data structures.
If you simply want to cut down on the number of files that you are shipping and installing on a users machine, you can do with something like the first kind of packfile that I describe. Maybe you have 1000s of textures and would just simply like to cut down on the sheer number of files that you have to zip up and package. You can write a small utility that will basically read the files that you want to package together and then write a header containing the files and their offsets in the packfile, and then you can write the contents of the file, one at a time, one after the other, in your large binary file. At game time, you can simply load the header of this packfile and store the filenames and offsets in a hash. When you need to read a file, you can hash the filename and see if it exists in your packfile, and if so, you can read the contents directly from the packfile by seeking to the offset and then reading from that location in the packfile. Again, this method is basically a way to pack data together without regards for encryption, etc. It's simply an organizational method.
But again, I do want to stress that if you are going a route like I or m_pGladiator suggests, I would work hard to not have to pull the whole file into RAM and then deserialize to another location in RAM. That's a waste of resources (that you perhaps have plenty of). I would say that you can do this to get it working, and then once it's working, you can work on a method that only reads part of the file at a time and then decompresses to your destination buffer. You must use a comprsesion scheme that will work like this though. zlib and lzw both do (I believe). I'm not sure about an MD5 algorithm.
Hope that this helps.

do as Java: pack it all in a zip, and use an filesystem-like API to read directly from there.

Personally, I never used the already available tools to do that. If you want to prevent your game to be hacked easily, then you have to develop your own resource manipulation engine.
First of all read about serializing objects. When you load a resource from file (graphic, sound or whatever), it is stored in some object instance in the memory. A game usually uses dozens of graphical and sound objects. You have to make a tool, which loads them all and stores them in collections in the memory. Then serialize those collections into a binary file and you have every resource there.
Then you can use for example MD5 or any other encryption algorithm to encrypt this file.
Also, you can use zlib or other compression library to make this big binary file a bit smaller.
In the game, you should load the encrypted binary file and unpack it. Then decrypt it. Then deserialize the object collections and you have all resources back in memory.
Of course you can make this more comprehensive by storing in different binary files the resources for different levels and so on - there are plenty of variants, depending on what you want. Also you can first zip, then encrypt, or make other combinations of the steps.

Short answer: yes.
In Mac OS 6,7,8 there was a substantial API devoted to this exact task. Lookup the "Resource Manager" if you are interested. Edit: So does the ROOT physics analysis package.
Not that I know of a good tool right now. What platform(s) do you want it to work on?
Edited to add: All of the two-or-three tools of this sort that I am away of share a similar struture:
The file starts with a header and index
There are a series of blocks some of which may have there own headers and indicies, some of which are leaves
Each leaf is a simple serialization of the data to be stored.
The whole file (or sometimes individual blocks) may be compressed.
Not terribly hard to implement your own, but I'd look for a good existing one that meets your needs first.

For future people, like me, who are wondering about this same topic, check out the two following links:
http://www.sfml-dev.org/wiki/en/tutorials/formatdat
http://archive.gamedev.net/reference/programming/features/pak/

Will random data appended to a JPG make it unusable?

So, to simplify my life I want to be able to append from 1 to 7 additional characters on the end of some jpg images my program is processing*. These are dummy padding (fillers, etc - probably all 0x00) just to make the file size a multiple of 8 bytes for block encryption.
Having tried this out with a few programs, it appears they are fine with the additional characters, which occur after the FF D9 that specifies the end of the image - so it appears that the file format is well defined enough that the 'corruption' I'm adding at the end shouldn't matter.
I can always post process the files later if needed, but my preference is to do the simplest thing possible - which is to let them remain (I'm decrypting other file types and they won't mind, so having a special case is annoying).
I figure with all the talk of Steganography hullaballo years ago, someone has some input here...
(encryption processing by 8 byte blocks, I don't want to save pre-encrypted file size, so append 0x00 to input data, and leave them there after decoding)

No, you can add bits to the end of a jpg file, without making it unusable. The heading of the jpg file tells how to read it, so the program reading it will stop at the end of the jpg data.
In fact, people have hidden zip files inside jpg files by appending the zip data to the end of the jpg data. Because of the way these formats are structured, the resulting file is valid in either format.

You can .. but the results may be unpredictable.
Even though there is enough information in the format to tell the client to ignore the extra data it is likely not a case the programmer tested for.
A paranoid program might look at the size, notice the discrepancy and decide it won't process your file because clearly it doesn't fully understand it. This is particularly likely when reading data from the web when random bytes in a file could be considered a security risk.

You can embed your data in the XMP tag within a JPEG (or EXIF or IPTC fields for that matter).
XMP is XML so you have a fair bit of flexibility there to do you own custom stuff.
It's probably not the simplest thing possible but putting your data here will maintain the integrity of the JPEG and require no "post processing".
You data will then show up in other imaging software such as PhotoShop, which may not be ideal.

As others have stated, you have no control how programs process image files and therefore some programs may find the images valid others may not.
However, there is a bigger issue here. Judging by your question, I'm deducing you're practicing "security through obscurity." It's widely considered a very bad practice. Use Google to find a plethora of articles about the topic.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string