Checksum Algorithm Producing Unpredictable Results - groovy

I'm working on a checksum algorithm, and I'm having some issues. The kicker is that when I handcraft a "fake" message that is substantially smaller than the "real" data I'm receiving, I get a correct checksum. Against the real data, however, the checksum does not work properly.
Here's some information on the incoming data/environment:
This is a groovy project (see code below)
All bytes are to be treated as unsigned integers for the purpose of checksum calculation
You'll notice some finagling with shorts and longs in order to make that work.
The size of the real data is 491 bytes.
The size of my sample data (which appears to add correctly) is 26 bytes
None of my hex-to-decimal conversions are producing a negative number, as best I can tell
Some bytes in the file are not added to the checksum. I've verified that the switch for these is working properly, and when it is supposed to - so that's not the issue.
My calculated checksum, and the checksum packaged with the real transmission always differ by the same amount.
I have manually verified that the checksum packaged with the real data is correct.
Here is the code:
// add bytes to checksum
public void addToChecksum( byte[] bytes) {
    //if the checksum isn't enabled, don't add
    if(!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    for(int i = 0; i < bytes.length; i++) {
        byte[] tmpBytes = new byte[2];
        tmpBytes[0] = 0x00;
        tmpBytes[1] = bytes[i];
        ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
        long computedBytes = tmpBuf.getShort();
        logger.info(getHex(bytes[i]) + " = " + computedBytes);
        this.checksum += computedBytes;
    }
    if(this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
If anyone can find anything in this algorithm that could be causing drift from the expected result, I would greatly appreciate your help in tracking this down.

I don't see a line in your code where you reset this.checksum.
That way, this.checksum >= previouschecksum should always hold, right? Is this intended?
Otherwise I can't find a flaw in the code above. Maybe this.checksum is declared with too narrow a type (short, for instance). It could then roll over, so that you get negative or too-small values.
Here is an example of that behaviour:
import java.nio.ByteBuffer
short checksum = 0
byte[] bytes = new byte[491]
def count = 260
for (def i=0;i<count;i++) {
    bytes[i]=255
}
bytes.each { b ->
    byte[] tmpBytes = new byte[2];
    tmpBytes[0] = 0x00;
    tmpBytes[1] = b;
    ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
    long computedBytes = tmpBuf.getShort();
    checksum += computedBytes
    println "${b} : ${computedBytes}"
}
println checksum +"!=" + 255*count
Just play around with the value of the 'count' variable, which corresponds to the length of your input.
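The same wrap-around can be sketched in JavaScript. This is only an illustration under the assumption that the accumulator is 16 bits wide: an Int16Array slot stands in for a short, and the name sumIntoInt16 is made up for this example.

```javascript
// Sum unsigned byte values into a 16-bit signed accumulator.
// An Int16Array element wraps the same way a Java/Groovy short does.
function sumIntoInt16(byteValues) {
  const acc = new Int16Array(1);
  for (const v of byteValues) {
    acc[0] += v & 0xff; // stored back modulo 2^16, as a signed value
  }
  return acc[0];
}

const payload = new Array(260).fill(255);
const expected = 255 * payload.length;  // 66300, too large for 16 bits
const wrapped = sumIntoInt16(payload);  // 66300 - 65536 = 764
console.log(wrapped + " != " + expected);
```

A difference that is a multiple of 65536 (or of 256, or 2^32) between the computed and the expected checksum would point to this kind of truncation.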

Your checksum will keep incrementing until it rolls over to being negative (as it is a signed long integer)
You can also shorten your method to:
public void addToChecksum( byte[] bytes) {
    //if the checksum isn't enabled, don't add
    if(!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    // "& 0xFF" turns each signed byte into its unsigned value (0..255)
    this.checksum += bytes.inject( 0L ) { tot, it -> tot + (it & 0xFF) }
    if(this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
But that won't stop it rolling over to a negative value. For the sake of saving 12 bytes per item that you are generating a hash for, I would still suggest that something like MD5, which is known to work, is probably better than rolling your own... However, I understand that sometimes there are crazy requirements you have to stick to...
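For comparison, here is a minimal JavaScript sketch of the same masking idea. It assumes the bytes arrive as signed 8-bit values (an Int8Array here); unsignedByteSum is a hypothetical helper name, not something from the question.

```javascript
// Add up bytes as unsigned values (0..255), whatever their signed representation.
function unsignedByteSum(signedBytes) {
  let total = 0;
  for (const b of signedBytes) {
    total += b & 0xff; // -1 becomes 255, -128 becomes 128
  }
  return total;
}

const sample = Int8Array.from([0x7f, -128, -1]); // bytes 0x7f, 0x80, 0xff
console.log(unsignedByteSum(sample)); // 127 + 128 + 255 = 510
```

Without the mask, the same three bytes would add up to 127 - 128 - 1 = -2, which is the kind of drift that only shows up once the data contains bytes above 0x7f.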

Node.js sliced string memory

I was trying:
r = [];
for (i = 0; i < 1e3; i++) {
    a = (i+'').repeat(1e6);
    r[i] = a.slice(64, 128);
}
and got an OutOfMemory error. From here we see it's because every a is kept alive by the GC, since a part of each one is still in use.
How can I make the slice not keep the whole parent string in memory? I tried r[i]=''+a.slice(64, 128)+'' but still got an OOM. Do I have to concatenate a[64]+...+a[127] (loops also count as brute force)?
Is it so hard to slice and keep only the necessary part of the old large string? The problem here only mentioned "copying every substring as a new string", but not "freeing the rest of the string while keeping the necessary part accessible".
In this case it makes sense for the application code to be more aware of system constraints:
const r = [];
for (let i = 0; i < 1e3; ++i) {
    const unitStr = String(i);
    // choose something other than "1e6" here:
    const maxRepeats = Math.ceil(128 / unitStr.length); // limit the size of the new string
    // only using the last 64 characters...
    r[i] = unitStr.repeat(maxRepeats).slice(64, 128);
}
The application improvement is to no longer construct 1000 strings of up to 3,000,000 characters each when only 64 characters are needed for each output string.
Your hardware and other constraints are not specified, but sometimes allowing the program more memory is appropriate:
node --max-old-space-size=8192 my-script.js
An analytic approach: use logic to determine more precisely the in-memory state required for each working data chunk and, within the constraints provided, minimize the generation of unneeded in-memory string data.
const r = new Array(1e3).fill().map((e, i) => outputRepeats(i));
function outputRepeats(idx) {
    const OUTPUT_LENGTH = 128 - 64;
    const unitStr = String(idx); // eg, '1', '40' or '286'
    // determine which character of "unitStr" sits at index 64 of the repeated string
    const startIdxWithinUnit = 64 % unitStr.length; // this can be further optimized for known ranges of the "idx" input
    // determine the approximate output string (may consume additional in-memory bytes: up to unitStr.length - 1)
    // this can be logically simplified by unconditionally using a few more bytes of memory and eliminating the second arithmetic term
    const maxOutputWindowStr = unitStr.repeat(Math.ceil(OUTPUT_LENGTH / unitStr.length) + Math.sign(startIdxWithinUnit));
    // return the exact resulting string: OUTPUT_LENGTH characters starting at the offset
    return maxOutputWindowStr.slice(startIdxWithinUnit, startIdxWithinUnit + OUTPUT_LENGTH);
}
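A quick way to gain confidence in this kind of windowing is to compare it against the naive slice on a repeat count that is small enough to be cheap. The sketch below is one possible generalization, not the answer's exact code: sliceOfRepeated is a hypothetical helper, and it assumes the repeated string would have been at least `end` characters long.

```javascript
// Return what unit.repeat(bigNumber).slice(start, end) would return,
// while only ever building a string of about (end - start) characters.
function sliceOfRepeated(unit, start, end) {
  const offset = start % unit.length; // position inside one copy of the unit
  const length = end - start;
  const repeats = Math.ceil((offset + length) / unit.length);
  return unit.repeat(repeats).slice(offset, offset + length);
}

// Cross-check against the naive version, using 200 repeats instead of 1e6.
let mismatches = 0;
for (let i = 0; i < 1e3; i++) {
  const unit = String(i);
  if (sliceOfRepeated(unit, 64, 128) !== unit.repeat(200).slice(64, 128)) {
    mismatches++;
  }
}
console.log("mismatches:", mismatches);
```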

Find the minimum number of messages that need to be sent

I recently encountered a problem in one of my coding interview tests. The problem is as follows.
Suppose there is a service to send messages to a user. Each message has the length of maximum 30 characters. This service receives a complete message and then breaks it into sub-messages, each of size 30 characters at most. But there is an issue with the service. It doesn't guarantee the order in which the sub-messages are received by the user. Hence, for every sub-message, it appends a suffix (k/n) where k denotes the kth sub-message out of the n sub-messages. This suffix is also considered when counting the number of characters in the sub-message which cannot exceed 30. Find the minimum number of sub-messages required to send.
Eg-1:
message: The quick brown fox jumps over the lazy dog
The first sub-message can be: The quick brown fox jumps (1/2) but
the above is incorrect as it exceeds 30 characters. This has 31 characters.
So,the correct sub-messages are:
The quick brown fox (1/2)
jumps over the lazy dog (2/2)
So, the answer is 2.
Eg-2:
message: The quick brown fox jumps over the lazy tortoise
So,the correct sub-messages are:
The quick brown fox (1/3)
jumps over the lazy (2/3)
tortoise (3/3)
So, the answer is 3.
Eg-3:
message: Hello My name is
sub-message: Hello My name is
Answer = 1.
Note: A word cannot be broken across sub-messages. Assume no word is greater than 30 characters in length. If it's a single message, then there is no need to use the suffix.
My approach: If the total character length of string is less than 30 then return 1. If not, then get sub-message till character count is 30, checking per word. But now it gets complicated as I don't know the value of n in the suffix. Is there a simpler way to approach the problem?
Thanks for posting this, I do enjoy these sorts of problems.
As domen mentioned above, there is a bit of a challenge here in that you do not know how many lines are required. Thus you do not know whether to allow for 2 (or more) digits for the message number / total message count. Also, could you use hexadecimal (so that 16 messages require only a single digit), or even a base 62 number format (0-9, then A-Z followed by a-z)?
You could of course guess: if the input is more than, say, 200 characters, then you might use a two-digit message number. But if the message was a single letter followed by a single space, repeated 100 times, then you could probably get away with single-digit message numbers.
So, you might find that you need to run the algorithm a couple of times. I will assume that a single-digit message number is acceptable for this problem; you can enhance my solution to use base 62 message numbers if you like.
My approach uses 2 classes:
MessageLine, a class that represents a single line of the message.
MessageSender, a class that collects the MessageLine(s). It has a helper method that processes the message and returns a list of MessageLines.
Here is the main MessageSender class. If you run it, you can pass a message on the command line for it to process.
package com.gtajb.stackoverflow;
import java.util.LinkedList;
public class MessageSender {
public static void main(String[] args) {
if (args.length == 0) {
System.out.println("Please supply a message to send");
System.exit(1);
}
// Collect the command line parameters into a single string.
StringBuilder sb = new StringBuilder();
boolean firstWord = true;
for (String s: args) {
if (!firstWord) {
sb.append(" ");
}
firstWord = false;
sb.append(s);
}
// Process the input String and create the MessageSender object.
MessageSender ms = new MessageSender(sb.toString());
System.out.println("Input message: " + sb.toString());
// Retrieve the blocked message and output it.
LinkedList<MessageLine> msg = ms.getBlockedMessage();
int lineNo = 0;
for (MessageLine ml : msg) {
lineNo += 1;
System.out.printf("%2d: %s\n", lineNo, ml.getFormattedLine(msg.size()));
}
}
private String msg;
public MessageSender(String msg) {
this.msg = msg;
processMessage();
}
private LinkedList<MessageLine> blockedMessage = new LinkedList<MessageLine> ();
public LinkedList<MessageLine> getBlockedMessage() {
return blockedMessage;
}
private static final int LINE_MAX_SIZE = 30;
/**
* A private helper method that processes the supplied message when
* the object is constructed.
*/
private void processMessage() {
// Split the message into words and work out how long the message is.
String [] words = msg.split("\\s+");
int messageLength = 0;
for (String w: words) {
messageLength += w.length();
}
messageLength += words.length - 1; // Add in the number of words minus one to allow for the single spaces.
// Can we get away with a single MessageLine?
if (messageLength <= LINE_MAX_SIZE) { // a message of exactly 30 characters still fits
// A single message line is good enough.
MessageLine ml = new MessageLine(1);
blockedMessage.add(ml);
for (String w: words) {
ml.add(w);
}
} else {
// Multiple MessageLines will be required.
int lineNo = 1;
MessageLine ml = new MessageLine(lineNo);
blockedMessage.add(ml);
for (String w: words) {
// check if this word will blow the max line length.
// Pass 2 as a placeholder total: any single-digit total > 1 gives the same suffix width.
if (ml.getFormattedLineLength(2) + w.length() + 1 > LINE_MAX_SIZE) {
// The word will blow the line length, so create a new line.
lineNo += 1;
ml = new MessageLine(lineNo);
blockedMessage.add(ml);
}
ml.add(w);
}
}
}
}
And here is the MessageLine class:
package com.gtajb.stackoverflow;
import java.util.LinkedList;
public class MessageLine extends LinkedList<String> {
private int lineNo;
public MessageLine(int lineNo) {
this.lineNo = lineNo;
}
/**
* Add a new word to this message line.
* @param word the word to add
* @return true if the collection is modified.
*/
public boolean add(String word) {
if (word == null || word.trim().length() == 0) {
return false;
}
return super.add(word.trim());
}
/**
* Return the formatted message length.
* @param totalNumLines the total number of lines in the message.
* @return the length of this line when formatted.
*/
public int getFormattedLineLength(int totalNumLines) {
return getFormattedLine(totalNumLines).length();
}
/**
* Return the formatted line optionally with the line count information.
* @param totalNumLines the total number of lines in the message.
* @return the formatted line.
*/
public String getFormattedLine(int totalNumLines) {
boolean firstWord = true;
StringBuilder sb = new StringBuilder();
for (String w : this) {
if (! firstWord) {
sb.append (" ");
}
firstWord = false;
sb.append(w);
}
if (totalNumLines > 1) {
sb.append (String.format(" (%d/%d)", lineNo, totalNumLines));
}
return sb.toString();
}
}
I tested your scenarios and it seems to produce the correct result.
Let me know if we get the job. :-)
You can binary-search on the total number of submessages. That is, start with two numbers L and H, such that you know that L submessages are not enough, and that H submessages are enough, and see whether their average (L+H)/2 is enough by trying to construct a solution under the assumption that that many submessages are involved: If it is, make that the new H, otherwise make it the new L. Stop as soon as H = L+1: H is then the smallest number of submessages that works, so construct an actual solution using that many submessages. This will require O(n log n) time.
To get initial values for L and H, you could start at 1 and keep doubling until you get a high enough number. The first value that is large enough to work becomes your H, and the previous one your L.
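The doubling and binary search described above could be sketched as follows. This is only one reading of the answer, under stated assumptions: the suffix is " (k/n)" including the leading space (as in the question's examples), words are packed greedily, and "n is enough" means the greedy packing with total n uses at most n sub-messages. The names linesNeeded and minSubMessages are made up, and the search relies on feasibility being monotone in n, which can fail right where n gains a digit.

```javascript
const MAX_LEN = 30;

// Greedily pack the words assuming the total count is n.
// Returns the number of sub-messages used, or Infinity if a word cannot fit.
function linesNeeded(words, n) {
  let k = 1;    // number of the sub-message being filled
  let used = 0; // characters of text already on it
  for (const w of words) {
    const cap = () => MAX_LEN - ` (${k}/${n})`.length;
    if (used > 0 && used + 1 + w.length <= cap()) {
      used += 1 + w.length; // the word fits after a single space
      continue;
    }
    if (used > 0) k += 1;   // close the current sub-message
    if (w.length > cap()) return Infinity;
    used = w.length;
  }
  return k;
}

// Smallest n found by doubling, then binary search; -1 if no n was found.
function minSubMessages(message) {
  const words = message.trim().split(/\s+/);
  if (words.join(" ").length <= MAX_LEN) return 1; // no suffix needed
  const enough = (n) => linesNeeded(words, n) <= n;
  let lo = 1; // known not to be enough
  let hi = 2;
  while (!enough(hi)) {
    if (hi >= words.length) return -1; // even one word per sub-message fails
    lo = hi;
    hi *= 2;
  }
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (enough(mid)) hi = mid; else lo = mid;
  }
  return hi;
}

console.log(minSubMessages("The quick brown fox jumps over the lazy dog"));      // 2
console.log(minSubMessages("The quick brown fox jumps over the lazy tortoise")); // 3
console.log(minSubMessages("Hello My name is"));                                 // 1
```

For the input with two 29-letter words mentioned below, this sketch returns -1 rather than looping forever.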
BTW, the constraints you give are not enough to ensure a solution exists: For example, an input consisting of two 29-letter words separated by a space has no solution.

Node.js - How to generate random numbers in specific range using crypto.randomBytes

How can I generate random numbers in a specific range using crypto.randomBytes?
I want to be able to generate a random number like this:
console.log(random(55, 956)); // where 55 is minimum and 956 is maximum
and I'm limited to using only crypto.randomBytes inside the random function to generate a random number in this range.
I know how to convert generated bytes from randomBytes to hex or decimal but I can't figure out how to get a random number in a specific range from random bytes mathematically.
To generate a random number in a certain range you can use the following equation:
Math.random() * (high - low) + low
But you want to use crypto.randomBytes instead of Math.random().
This function returns a buffer of randomly generated bytes, so you need to convert its result from bytes to a decimal number. This can be done using the biguint-format package. To install this package, use the following command:
npm install biguint-format --save
Now you need to convert the result of crypto.randomBytes to decimal. You can do that as follows:
var x= crypto.randomBytes(1);
return format(x, 'dec');
Now you can create your random function as follows:
var crypto = require('crypto'),
    format = require('biguint-format');
function randomC (qty) {
    var x = crypto.randomBytes(qty);
    return format(x, 'dec');
}
function random (low, high) {
    // 4 random bytes give a value in [0, 2^32), so divide by 2^32 to scale it into [0, 1)
    return randomC(4) / Math.pow(2, 4 * 8) * (high - low) + low;
}
console.log(random(50, 1000));
Thanks to the answer from @Mustafamg and huge help from @CodesInChaos I managed to resolve this issue. I made some tweaks and increased the range to a maximum of 256^6-1, or 281,474,976,710,655. The range can be increased further, but then you need an additional library for big integers, because 256^7-1 is beyond the Number.MAX_SAFE_INTEGER limit.
If anyone has the same problem, feel free to use it.
var crypto = require('crypto');
/*
Generating random numbers in specific range using crypto.randomBytes from crypto library
Maximum available range is 281474976710655 or 256^6-1
Maximum number for range must be equal or less than Number.MAX_SAFE_INTEGER (usually 9007199254740991)
Usage examples:
cryptoRandomNumber(0, 350);
cryptoRandomNumber(556, 1250425);
cryptoRandomNumber(0, 281474976710655);
cryptoRandomNumber((Number.MAX_SAFE_INTEGER-281474976710655), Number.MAX_SAFE_INTEGER);
Tested and working on 64bit Windows and Unix operation systems.
*/
function cryptoRandomNumber(minimum, maximum){
var distance = maximum-minimum;
if(minimum>=maximum){
console.log('Minimum number should be less than maximum');
return false;
} else if(distance>281474976710655){
console.log('You can not get all possible random numbers if range is greater than 256^6-1');
return false;
} else if(maximum>Number.MAX_SAFE_INTEGER){
console.log('Maximum number should be safe integer limit');
return false;
} else {
var maxBytes = 6;
var maxDec = 281474976710656;
// To avoid huge mathematical operations and increase function performance for small ranges, you can uncomment following script
/*
if(distance<256){
maxBytes = 1;
maxDec = 256;
} else if(distance<65536){
maxBytes = 2;
maxDec = 65536;
} else if(distance<16777216){
maxBytes = 3;
maxDec = 16777216;
} else if(distance<4294967296){
maxBytes = 4;
maxDec = 4294967296;
} else if(distance<1099511627776){
maxBytes = 5;
maxDec = 1099511627776;
}
*/
var randbytes = parseInt(crypto.randomBytes(maxBytes).toString('hex'), 16);
var result = Math.floor(randbytes/maxDec*(maximum-minimum+1)+minimum);
if(result>maximum){
result = maximum;
}
return result;
}
}
So far it works fine and you can use it as a really good random number generator, but I strongly recommend against using this function for any cryptographic services. If you do, use it at your own risk.
All comments, recommendations and criticism are welcome!
To generate numbers in the range [55 .. 956], you first generate a random number in the range [0 .. 901] where 901 = 956 - 55. Then add 55 to the number you just generated.
To generate a number in the range [0 .. 901], pick off two random bytes and mask off 6 bits. That will give you a 10-bit random number in the range [0 .. 1023]. If that number is <= 901 then you are finished. If it is bigger than 901, discard it and get two more random bytes. Do not attempt to use MOD to get the number into the right range; that will distort the output, making it non-random.
ETA: To reduce the chance of having to discard a generated number.
Since we are taking two bytes from the RNG, we get a number in the range [0 .. 65535], that is, 65536 possible values. Now 65536 MOD 902 is 592, so the largest multiple of 902 that fits is 65536 - 592 = 64944. Hence, if our two-byte random number is less than 64944, we can safely use the MOD operator, since each number in the range [0 .. 901] is now equally likely. Any two-byte number >= 64944 will still have to be thrown away, as using it would distort the output away from random. Before, the chances of having to reject a number were (1024 - 902) / 1024 = 12%. Now the chances of a rejection are (65536 - 64944) / 65536 = 0.9%. We are far less likely to have to reject the randomly generated number.
running <- true
while running
    num <- two byte random
    if (num < 64944)
        result <- num MOD 902
        running <- false
    endif
endwhile
return result + 55
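In Node.js, the pseudocode above might look like the following sketch. The function name randomInRange and the injectable randomBytes parameter are assumptions made for illustration (the parameter exists only so the rejection step can be exercised deterministically). The cutoff is computed for any range of up to 65536 values instead of being hard-coded to 64944.

```javascript
const crypto = require("crypto");

// Uniform integer in [min, max] from two random bytes, rejecting the biased tail.
function randomInRange(min, max, randomBytes = crypto.randomBytes) {
  const span = max - min + 1; // number of distinct outputs, e.g. 902
  if (!Number.isInteger(min) || !Number.isInteger(max) || span < 1 || span > 65536) {
    throw new RangeError("this sketch needs integers with 1 <= max - min + 1 <= 65536");
  }
  // Largest multiple of span that fits into the 65536 two-byte values (64944 for 902).
  const limit = 65536 - (65536 % span);
  for (;;) {
    const num = randomBytes(2).readUInt16BE(0);
    if (num < limit) {
      return min + (num % span);
    }
    // num falls into the incomplete last block: draw again
  }
}

console.log(randomInRange(55, 956));
```

Wider ranges would need more bytes per draw; the same accept/reject logic applies with a larger power of 256 in place of 65536.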
The crypto package now has a randomInt() function. It was added in v14.10.0 and v12.19.0.
console.log(crypto.randomInt(55, 957)); // where 55 is minimum and 956 is maximum
The upper bound is exclusive.
Here is the (abridged) implementation:
// Largest integer we can read from a buffer.
// e.g.: Buffer.from("ff".repeat(6), "hex").readUIntBE(0, 6);
const RAND_MAX = 0xFFFF_FFFF_FFFF;
const range = max - min;
const excess = RAND_MAX % range;
const randLimit = RAND_MAX - excess;
while (true) {
    const x = randomBytes(6).readUIntBE(0, 6);
    // If x >= (maxVal - (maxVal % range)), we will get "modulo bias"
    if (x >= randLimit) {
        // Try again
        continue;
    }
    const n = (x % range) + min;
    return n;
}
See the full source and the official docs for more information.
The issue with most other solutions is that they distort the distribution (which you would probably like to be uniform).
The pseudocode from @rossum lacks generalization. (But he proposed the right solution in the text.)
const crypto = require('crypto');
// Generates a random integer in range [min, max]
// (needs max > min; the bitwise mask is only reliable while max - min + 1 fits in 31 bits)
function randomRange(min, max) {
    const diff = max - min + 1;
    // finds the minimum number of bits required to represent the diff
    const numberBit = Math.ceil(Math.log2(diff));
    // as we can only draw whole bytes, the minimum number of bytes (8 bits each)
    const numberBytes = Math.ceil(numberBit / 8);
    // as we might draw more bits than required, we look only at what we need (discard the rest)
    const mask = (1 << numberBit) - 1;
    let randomNumber;
    do {
        randomNumber = crypto.randomBytes(numberBytes).readUIntBE(0, numberBytes);
        randomNumber = randomNumber & mask;
        // the masked bits might represent a number bigger than the diff, in that case try again
    } while (randomNumber >= diff);
    return randomNumber + min;
}
As for performance concerns, the number falls in the right range between 50% and 100% of the time (depending on the parameters). That is, in the worst-case scenario the loop is executed more than 7 times with less than a 1% chance, and in practice the loop is executed only once or twice most of the time.
The random-js library acknowledges that most solutions out there don't provide random numbers with uniform distributions, and it provides a more complete solution.

How to find SHA1 hash?

I got an interesting task at school. I have to find a message whose SHA-1 hash ends with my birthday. For example, if I was born on 4th May 1932, then the hash must end with 040532. Any suggestions on how to find it?
My solution in C#:
// Create a SHA-1 function:
using System;
using System.Security.Cryptography;
using System.Text;
public static string GetSHA1Hash(string text)
{
    var SHA1 = new SHA1CryptoServiceProvider();
    byte[] arrayData;
    byte[] arrayResult;
    string result = null;
    string temp = null;
    arrayData = Encoding.ASCII.GetBytes(text);
    arrayResult = SHA1.ComputeHash(arrayData);
    for (int i = 0; i < arrayResult.Length; i++)
    {
        // two lowercase hex digits per byte, left-padded with "0"
        temp = Convert.ToString(arrayResult[i], 16);
        if (temp.Length == 1)
            temp = "0" + temp;
        result += temp;
    }
    return result;
}
Source
Then a Random String generator:
private static Random random = new Random((int)DateTime.Now.Ticks);//thanks to McAden
private string RandomString(int size)
{
StringBuilder builder = new StringBuilder();
char ch;
for (int i = 0; i < size; i++)
{
ch = Convert.ToChar(Convert.ToInt32(Math.Floor(26 * random.NextDouble() + 65)));
builder.Append(ch);
}
return builder.ToString();
}
Source
and now you can bruteforce for your combination:
string search = "32";
string result = String.Empty;
int slen = 5;
string myTry = RandomString(slen);
while (!result.EndsWith(search))
{
myTry = RandomString(slen);
result = GetSHA1Hash(myTry);
}
MessageBox.Show(result + " " + myTry);
This would search for a Hash String ending with 32. Happy Bruteforcing :)
EDIT: found one for your example: HXMQVNMRFT gives e5c9fa9f6acff07b89c617c7fd16a9a043040532
Start generating hashes from distinct messages [1].
Eventually a hash will be generated with such a property. This is not that bad to brute-force, as the range is only 2^24 (or ~16 million) and SHA is very fast.
There is no shortcut, as SHA is a one-way cryptographic hash function. In particular here, SHA has the property that "it is infeasible to generate a message that has a given hash".
[1] The inputs should be distinct, and a simple counter will suffice. However, it may be more interesting to generate quasi-random messages based on the birthday being sought, e.g. including the date in various forms and sentences, Mad Lib style. As long as this doesn't limit the domain such that there is no qualifying hash, it'll work just as well as any other set of source messages.
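The counter-based search could be sketched in Node.js as follows. This is a minimal illustration, not a tuned implementation: findMessageWithSuffix and the "birthday-" prefix are made-up names, and the demo searches for a 4-digit suffix so that it finishes quickly (a 6-digit suffix needs about 2^24 hashes on average).

```javascript
const crypto = require("crypto");

// Hex-encoded SHA-1 of a UTF-8 string.
function sha1Hex(text) {
  return crypto.createHash("sha1").update(text, "utf8").digest("hex");
}

// Try prefix + 0, prefix + 1, ... until a hash ends with the wanted hex suffix.
function findMessageWithSuffix(suffix, prefix = "birthday-", maxTries = 1e9) {
  if (!/^[0-9a-fA-F]+$/.test(suffix)) {
    throw new RangeError("suffix must consist of hex digits");
  }
  const target = suffix.toLowerCase();
  for (let counter = 0; counter < maxTries; counter++) {
    const message = prefix + counter;
    const hash = sha1Hex(message);
    if (hash.endsWith(target)) {
      return { message, hash, tries: counter + 1 };
    }
  }
  return null; // gave up after maxTries distinct messages
}

const found = findMessageWithSuffix("0532");
console.log(found.message, "->", found.hash, "after", found.tries, "tries");
```

A date such as 040532 happens to consist only of hex digits, which is what makes the task solvable at all.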

Generating a fake ISBN from book title? (Or: How to hash a string into a 6-digit numeric ID)

Short version: How can I turn an arbitrary string into a 6-digit number with minimal collisions?
Long version:
I'm working with a small library that has a bunch of books with no ISBNs. These are usually older, out-of-print titles from tiny publishers that never got an ISBN to begin with, and I'd like to generate fake ISBNs for them to help with barcode scanning and loans.
Technically, real ISBNs are controlled by commercial entities, but it is possible to use the format to assign numbers that belong to no real publisher (and so shouldn't cause any collisions).
The format is such that:
978-0-01-######-?
Gives you 6 digits to work with, from 000000 to 999999, with the ? at the end being a checksum.
Would it be possible to turn an arbitrary book title into a 6-digit number in this scheme with minimal chance of collisions?
After using code snippets for making a fixed-length hash and calculating the ISBN-13 checksum, I managed to create really ugly C# code that seems to work. It'll take an arbitrary string and convert it into a valid (but fake) ISBN-13:
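// Exclusive upper bound for the hash: six decimal digits, 000000 to 999999.
private const uint MUST_BE_LESS_THAN = 1000000;
public int GetStableHash(string s)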
{
uint hash = 0;
// if you care this can be done much faster with unsafe
// using fixed char* reinterpreted as a byte*
foreach (byte b in System.Text.Encoding.Unicode.GetBytes(s))
{
hash += b;
hash += (hash << 10);
hash ^= (hash >> 6);
}
// final avalanche
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
// helpfully we only want positive integer < MUST_BE_LESS_THAN
// so simple truncate cast is ok if not perfect
return (int)(hash % MUST_BE_LESS_THAN);
}
public int CalculateChecksumDigit(ulong n)
{
string sTemp = n.ToString();
int iSum = 0;
int iDigit = 0;
// Calculate the checksum digit here.
for (int i = sTemp.Length; i >= 1; i--)
{
iDigit = Convert.ToInt32(sTemp.Substring(i - 1, 1));
// This appears to be backwards but the
// EAN-13 checksum must be calculated
// this way to be compatible with UPC-A.
if (i % 2 == 0)
{ // even position (counting from 1): weight 3
iSum += iDigit * 3;
}
else
{ // odd position: weight 1
iSum += iDigit * 1;
}
}
return (10 - (iSum % 10)) % 10;
}
private void generateISBN()
{
string titlehash = GetStableHash(BookTitle.Text).ToString("D6");
string fakeisbn = "978001" + titlehash;
string check = CalculateChecksumDigit(Convert.ToUInt64(fakeisbn)).ToString();
SixDigitID.Text = fakeisbn + check;
}
The 6 digits allow for 1,000,000 possible values, which should be enough for most internal uses.
I would have used a sequence instead in this case, because a 6-digit hash has a relatively high chance of collisions.
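To put a rough number on that chance, the birthday-problem calculation can be sketched as below. It assumes the hash spreads titles uniformly over the 1,000,000 six-digit values, which a real hash only approximates; collisionProbability is a hypothetical helper name.

```javascript
// Probability that at least two of `items` uniformly hashed titles
// share one of `slots` possible IDs (the classic birthday problem).
function collisionProbability(items, slots) {
  if (items > slots) return 1; // pigeonhole: a collision is certain
  let pNoCollision = 1;
  for (let i = 0; i < items; i++) {
    pNoCollision *= (slots - i) / slots;
  }
  return 1 - pNoCollision;
}

for (const titles of [100, 1000, 5000]) {
  console.log(titles, "titles:", collisionProbability(titles, 1e6).toFixed(3));
}
```

Under that assumption, a collection of only 1,000 titles already has roughly a 39% chance of containing at least one clash.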
So you can insert all the titles into a hash table and use their index numbers as the ISBN, either after sorting or without it.
This should make collisions almost impossible. It does require keeping track of the "allocated" ISBNs to avoid collisions in the future, as well as the list of titles that are already in store, but that is information you would most probably want to keep anyway.
Another option is to break the ISBN standard and use hexadecimal/uuencoded barcodes, that may increase the possible range to a point where it may work with a cryptographic hash truncated to fit.
Since you are handling old book titles, which may have several editions capitalized and punctuated differently, I would suggest stripping punctuation, collapsing duplicated whitespace and converting everything to lowercase before the comparison. This minimizes the chance of a technical duplicate where the strings differ but the title is the same. (If you want different editions to have different ISBNs, you can ignore this paragraph.)
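Putting the sequence and the normalization together, a minimal sketch might look like this. It assumes the 978-0-01 prefix from the question and an in-memory Map; in practice the mapping would have to be stored somewhere durable. The names normalizeTitle, isbn13CheckDigit and createIsbnAllocator are made up for this example.

```javascript
// Lowercase, strip punctuation and collapse whitespace so that
// differently punctuated editions map to the same key.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "")
    .replace(/\s+/g, " ")
    .trim();
}

// ISBN-13 check digit for the first 12 digits (weights 1, 3, 1, 3, ...).
function isbn13CheckDigit(first12) {
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    sum += Number(first12[i]) * (i % 2 === 0 ? 1 : 3);
  }
  return (10 - (sum % 10)) % 10;
}

// Hand out sequential six-digit IDs; a known title gets its existing number back.
function createIsbnAllocator(prefix = "978001") {
  const byTitle = new Map();
  let next = 0;
  return function allocate(title) {
    const key = normalizeTitle(title);
    if (byTitle.has(key)) return byTitle.get(key);
    if (next > 999999) throw new RangeError("six-digit ID space exhausted");
    const body = prefix + String(next++).padStart(6, "0");
    const isbn = body + isbn13CheckDigit(body);
    byTitle.set(key, isbn);
    return isbn;
  };
}

const allocate = createIsbnAllocator();
console.log(allocate("The  Old Book!")); // first number in the sequence
console.log(allocate("the old book"));   // same title after normalization, same number
```

Unlike the hash-based scheme, this cannot regenerate a number from the title alone, so the stored mapping is the source of truth.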
