I was trying:
const r = [];
for (let i = 0; i < 1e3; i++) {
  const a = (i + '').repeat(1e6);
  r[i] = a.slice(64, 128);
}
and got an out-of-memory error. From here we can see it's because all the "a" strings are kept alive by the GC, since a part of each of them is still in use.
How can I make the slice not keep the whole string in memory? I tried r[i] = '' + a.slice(64, 128) + '' but still got OOM. Do I have to build it as a[64] + ... + a[127] (loops also count as brute force)?
Is it really so hard to slice a large string and keep only the necessary part? The problem here only mentions "copying every substring as a new string", not "freeing the rest of the string while keeping the necessary part accessible".
In this case it makes sense for the application code to be more aware of system constraints:
const r = [];
for (let i = 0; i < 1e3; ++i) {
  const unitStr = String(i);
  // choose something other than "1e6" here:
  const maxRepeats = Math.ceil(128 / unitStr.length); // limit the size of the new string
  // only using the last 64 characters...
  r[i] = unitStr.repeat(maxRepeats).slice(64, 128);
}
...The application improvement is: no longer constructing 1000 strings of up to 3,000,000 characters each when only 64 characters are needed for each output string.
Your hardware and other constraints are not specified, but sometimes allowing the program more memory is appropriate:
node --max-old-space-size=8192 my-script.js
An analytic approach: use logic to determine more precisely the in-memory state required for each working data chunk. With the constraints provided, minimize the generation of unneeded in-memory string data.
const r = new Array(1e3).fill().map((e, i) => outputRepeats(i));

function outputRepeats(idx) {
  const OUTPUT_LENGTH = 128 - 64;
  const unitStr = String(idx); // e.g. '1', '40' or '286'
  // determine from which character within "unitStr" the output starts
  const startIdxWithinUnit = 64 % unitStr.length; // this can be further optimized for known ranges of the "idx" input
  // build the smallest repeated string that covers the output window
  // (may consume additional in-memory bytes: up to unitStr.length - 1);
  // this can be simplified by unconditionally repeating one extra unit and dropping the second term
  const extraUnit = startIdxWithinUnit > 0 ? 1 : 0;
  const maxOutputWindowStr = unitStr.repeat(Math.ceil(OUTPUT_LENGTH / unitStr.length) + extraUnit);
  // return the exact resulting string
  return maxOutputWindowStr.slice(startIdxWithinUnit, startIdxWithinUnit + OUTPUT_LENGTH);
}
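A quick, hypothetical sanity check (with a much smaller repeat count than the original 1e6, which is still more than enough to cover index 128) confirms that the analytic version matches the brute-force slice:

for (const i of [0, 7, 10, 100, 120, 999]) {
  const bruteForce = String(i).repeat(1e3).slice(64, 128); // same output as repeat(1e6)
  console.assert(outputRepeats(i) === bruteForce, `mismatch for i=${i}`);
}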
Related
This question comes up because I am writing an application (Node.js) that is very write-intensive (an LSM-style data store), and I should be able to say how many transactions I can get on a system of a given spec.
The problem is that the world of IOPS and throughput is confusing, and operating systems of different makes have caches at different levels, and so on...
That led me to write my own application to run part of the write-intensive workload, time it, and derive some conclusions from it, but the results are confusing.
My app appends to, say, one file in chunks of 145 MB, 3 times in 5 seconds. But if I spread my appends across 100 files, dividing the payload by 100 (i.e. 145 MB / 100 per file), I am able to do that 4 times in 5 seconds. What explains this behavior?
Is there a better, consistent way to mathematically derive, from the rated IOPS of an SSD, what IOPS it can sustain for one file versus multiple files?
The workload is append-only, with no forced fsync.
Some pseudocode (not full code):

const fs = require("fs");

let OPS = 0, newOPS = 0;
for (let spreadFactor = 1; spreadFactor <= 1024; spreadFactor *= 2) {
  // each entry is one record: four 20-char zero-padded fields, joined into one write buffer
  const payload = Array.from(
    { length: Math.floor(this.sampleCapacity / spreadFactor) },
    (_, idx) => idx.toString().padStart(20, "0").repeat(4)
  ).join("");
  OPS = newOPS;
  let actualBytesOnDisk = 0;
  const start = Date.now();
  let requests = 0;
  while (Date.now() - start < 5000) {
    requests++;
    for (let index = 0; index < spreadFactor; index++) {
      this.handle = fs.openSync(`${spreadFactor}-${index}`, "as");
      fs.appendFileSync(this.handle, payload);
      fs.fsyncSync(this.handle);
      fs.closeSync(this.handle);
      actualBytesOnDisk += Buffer.byteLength(payload);
    }
  }
  const elapsed = Date.now() - start;
  newOPS = (requests / elapsed) * 1000;
  console.info(`Calibrating writes R:${requests} E: ${elapsed} F: ${spreadFactor} RPS:${newOPS.toFixed(3)} S:${payload.length} B:${actualBytesOnDisk}`);
}
How can I generate random numbers in a specific range using crypto.randomBytes?
I want to be able to generate a random number like this:
console.log(random(55, 956)); // where 55 is minimum and 956 is maximum
and I'm limited to using only crypto.randomBytes inside the random function to generate the random number for this range.
I know how to convert the bytes generated by randomBytes to hex or decimal, but I can't figure out how to mathematically get a random number in a specific range from random bytes.
To generate a random number in a certain range you can use the following equation:
Math.random() * (high - low) + low
But you want to use crypto.randomBytes instead of Math.random(). That function returns a buffer of randomly generated bytes, which you in turn need to convert to a decimal number. This can be done using the biguint-format package. To install this package, simply use the following command:
npm install biguint-format --save
Now you need to convert the result of crypto.randomBytes to decimal. You can do that as follows:
var x = crypto.randomBytes(1);
return format(x, 'dec');
Now you can create your random function, which will be as follows:

var crypto = require('crypto'),
    format = require('biguint-format');

function randomC(qty) {
  var x = crypto.randomBytes(qty);
  return format(x, 'dec');
}

function random(low, high) {
  // 4 bytes = 32 bits, so divide by 2^32 to get a ratio in [0, 1)
  return randomC(4) / Math.pow(2, 4 * 8) * (high - low) + low;
}

console.log(random(50, 1000));
Thanks to the answer from @Mustafamg and huge help from @CodesInChaos I managed to resolve this issue. I made some tweaks and increased the range to a maximum of 256^6-1, or 281,474,976,710,655. The range can be increased further, but you would need an additional library for big integers, because 256^7-1 is beyond the Number.MAX_SAFE_INTEGER limit.
If anyone has the same problem, feel free to use it.
var crypto = require('crypto');

/*
Generates random numbers in a specific range using crypto.randomBytes from the crypto library.
Maximum available range is 281474976710655 or 256^6-1.
The maximum number for the range must be less than or equal to Number.MAX_SAFE_INTEGER (usually 9007199254740991).
Usage examples:
cryptoRandomNumber(0, 350);
cryptoRandomNumber(556, 1250425);
cryptoRandomNumber(0, 281474976710655);
cryptoRandomNumber((Number.MAX_SAFE_INTEGER-281474976710655), Number.MAX_SAFE_INTEGER);
Tested and working on 64bit Windows and Unix operating systems.
*/
function cryptoRandomNumber(minimum, maximum){
  var distance = maximum - minimum;

  if(minimum >= maximum){
    console.log('Minimum number should be less than maximum');
    return false;
  } else if(distance > 281474976710655){
    console.log('You can not get all possible random numbers if range is greater than 256^6-1');
    return false;
  } else if(maximum > Number.MAX_SAFE_INTEGER){
    console.log('Maximum number should be within the safe integer limit');
    return false;
  } else {
    var maxBytes = 6;
    var maxDec = 281474976710656;

    // To avoid huge mathematical operations and increase function performance for small ranges, you can uncomment the following block
    /*
    if(distance < 256){
      maxBytes = 1;
      maxDec = 256;
    } else if(distance < 65536){
      maxBytes = 2;
      maxDec = 65536;
    } else if(distance < 16777216){
      maxBytes = 3;
      maxDec = 16777216;
    } else if(distance < 4294967296){
      maxBytes = 4;
      maxDec = 4294967296;
    } else if(distance < 1099511627776){
      maxBytes = 5;
      maxDec = 1099511627776;
    }
    */

    var randbytes = parseInt(crypto.randomBytes(maxBytes).toString('hex'), 16);
    var result = Math.floor(randbytes / maxDec * (maximum - minimum + 1) + minimum);

    if(result > maximum){
      result = maximum;
    }
    return result;
  }
}
So far it works fine and you can use it as a really good random number generator, but I strictly do not recommend using this function for any cryptographic service. If you do, use it at your own risk.
All comments, recommendations and critics are welcome!
To generate numbers in the range [55 .. 956], you first generate a random number in the range [0 .. 901] where 901 = 956 - 55. Then add 55 to the number you just generated.
To generate a number in the range [0 .. 901], pick off two random bytes and mask off 6 bits. That will give you a 10-bit random number in the range [0 .. 1023]. If that number is <= 901 then you are finished. If it is bigger than 901, discard it and get two more random bytes. Do not attempt to use MOD to get the number into the right range; that will distort the output, making it non-random.
ETA: To reduce the chance of having to discard a generated number.
Since we are taking two bytes from the RNG, we get a number in the range [0 .. 65535], i.e. 65536 possible values. Now 65536 MOD 902 is 592. Hence, if our two-byte random number is less than (65536 - 592), that is, less than 64944, we can safely use the MOD operator, since each number in the range [0 .. 901] is then equally likely. Any two-byte number >= 64944 still has to be thrown away, as using it would distort the output away from random. Before, the chances of having to reject a number were (1024 - 902) / 1024 ≈ 12%. Now the chances of a rejection are (65536 - 64944) / 65536 ≈ 1%. We are far less likely to have to reject the randomly generated number.
running <- true
while running
    num <- two byte random
    if (num < 64944)
        result <- num MOD 902
        running <- false
    endif
endwhile
return result + 55
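For concreteness, here is a minimal Node.js sketch of this approach (the function name randomInRange is illustrative, not from the original answer):

const crypto = require('crypto');

function randomInRange(low, high) {          // e.g. randomInRange(55, 956)
  const rangeSize = high - low + 1;          // 902 possible values for [55 .. 956]
  const limit = 65536 - (65536 % rangeSize); // 64944: the largest multiple of 902 <= 65536
  while (true) {
    const num = crypto.randomBytes(2).readUInt16BE(0); // uniform in [0 .. 65535]
    if (num < limit) {
      return (num % rangeSize) + low; // safe: no modulo bias below the limit
    }
    // num >= limit would distort the distribution, so draw again
  }
}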
The crypto package now has a randomInt() function. It was added in v14.10.0 and v12.19.0.
console.log(crypto.randomInt(55, 957)); // where 55 is minimum and 956 is maximum
The upper bound is exclusive.
Here is the (abridged) implementation:
const { randomBytes } = require('crypto');

function randomInt(min, max) { // upper bound exclusive, as in crypto.randomInt
  // Largest integer we can read from a buffer.
  // e.g.: Buffer.from("ff".repeat(6), "hex").readUIntBE(0, 6);
  const RAND_MAX = 0xFFFF_FFFF_FFFF;

  const range = max - min;
  const excess = RAND_MAX % range;
  const randLimit = RAND_MAX - excess;

  while (true) {
    const x = randomBytes(6).readUIntBE(0, 6);
    // If x > (RAND_MAX - (RAND_MAX % range)), we will get "modulo bias"
    if (x > randLimit) {
      // Try again
      continue;
    }
    const n = (x % range) + min;
    return n;
  }
}
See the full source and the official docs for more information.
So the issue with most of the other solutions is that they distort the distribution (which you probably would like to be uniform).
The pseudocode from @rossum lacks generalization, but he proposed the right approach in the text:
// Generates a random integer in range [min, max]
// (bitwise masking limits this approach to ranges smaller than 2^31)
function randomRange(min, max) {
  const diff = max - min + 1;
  // find the minimum number of bits required to represent the diff
  const numberBit = Math.ceil(Math.log2(diff));
  // as we can only draw whole bytes, round up to the minimum number of bytes
  const numberBytes = Math.ceil(numberBit / 8);
  // as we might draw more bits than required, look only at the bits we need (discard the rest)
  const mask = (1 << numberBit) - 1;
  let randomNumber;
  do {
    randomNumber = crypto.randomBytes(numberBytes).readUIntBE(0, numberBytes);
    randomNumber = randomNumber & mask;
    // numberBit bits might represent numbers bigger than the diff; in that case, try again
  } while (randomNumber >= diff);
  return randomNumber + min;
}
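Usage, matching the range from the question (crypto here is Node's built-in module):

const crypto = require('crypto'); // used inside randomRange

console.log(randomRange(55, 956)); // uniform integer in [55, 956]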
Regarding performance concerns: the drawn number is in the right range between 50% and 100% of the time (depending on the parameters). So in the worst-case scenario, the chance of the loop executing more than 7 times is below 1%, and in practice the loop usually executes once or twice.
The random-js library acknowledges that most solutions out there don't provide random numbers with a uniform distribution, and it provides a more complete solution.
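A brief usage sketch, assuming the random-js v2 API and its Node crypto engine:

const { Random, nodeCrypto } = require('random-js');

const random = new Random(nodeCrypto); // back the generator with crypto-strength entropy
console.log(random.integer(55, 956));  // uniform integer in [55, 956]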
I am trying to calculate the row count of a large file based on the presence of a certain delimiter, and would like to use StreamReader and ReadBlock. Below is my code.
protected virtual long CalculateRowCount(FileStream inStream, int bufferSize)
{
    long rowCount = 0;
    String line;
    inStream.Position = 0;
    TextReader reader = new StreamReader(inStream);
    char[] block = new char[4096];
    const int blockSize = 4096;
    int indexer = 0;
    int charsRead = 0;
    long numberOfLines = 0;
    int count = 1;
    do
    {
        charsRead = reader.ReadBlock(block, indexer, block.Length * count);
        indexer += blockSize;
        numberOfLines = numberOfLines + string.Join("", block).Split(new string[] { "&ENDE" }, StringSplitOptions.None).Length;
        count++;
    } while (charsRead == block.Length); // charsRead != 0
    reader.Close();
    fileRowCount = rowCount;
    return rowCount;
}
But I get this error:
Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.
I am not sure what is wrong... Can you help? Thanks in advance!
For one, read the StreamReader.ReadBlock() documentation carefully (http://msdn.microsoft.com/en-us/library/system.io.streamreader.readblock.aspx) and compare it with what you're doing:
The 2nd argument (index) must stay within the bounds of the block you're passing in, but you keep growing indexer, so it exceeds the block length after the first iteration; that is exactly what the exception message is telling you. Since it looks like you want to reuse the memory block, pass 0 here.
The 3rd argument (count) indicates how many characters to read into your memory block; index + count must not exceed the block length, so passing block.Length * count with a growing count is also out of bounds.
ReadBlock() returns the number of characters actually read, but you increment indexer as if it will always return the size of the block exactly (most of the time, it won't).
This simple code stores 1 million strings (each 100 chars long) in an array.
function makestring(len) {
  var s = '';
  while (len--) s = s + '1';
  return s;
}

var s = '';
var arr = [];
for (var i = 0; i < 1000000; i++) {
  s = makestring(100);
  arr.push(s);
  if (i % 1000 == 0) console.log(i + ' - ' + s);
}
When I run it, I get this error:
(...)
408000 - 1111111111111111111 (...)
409000 - 1111111111111111111 (...)
FATAL ERROR: JS Allocation failed - process out of memory
That's strange: 1 million × 100 chars is just 100 megabytes.
But if I move the s = makestring(100); outside the loop...
var s = makestring(100);
var arr = [];
for (var i = 0; i < 1000000; i++) {
  arr.push(s);
  if (i % 1000 == 0) {
    console.log(i + ' - ' + s);
  }
}
This executes without errors!
Why? How can I store 1 million objects in Node?
The moment you move the string generation outside the loop, you basically just create one string and push it into the array a million times.
Inside the array, however, only pointers to the original string are stored, which is far less memory-consuming than storing a million separate strings.
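A rough way to observe the difference, reusing makestring from the question (a hypothetical measurement; exact numbers vary by V8 version, and on older Node versions the second loop is the one that runs out of memory):

function heapUsedMB() {
  return (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1);
}

var shared = makestring(100);
var refs = [];
for (var i = 0; i < 1000000; i++) refs.push(shared); // 1M array entries, one string object
console.log('one shared string: ' + heapUsedMB() + ' MB');

var distinct = [];
for (var j = 0; j < 1000000; j++) distinct.push(makestring(100)); // 1M separate string objects
console.log('1M distinct strings: ' + heapUsedMB() + ' MB');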
Your first example builds 1,000,000 different strings.
In your second example, you're taking the same string object and adding it to your array 1,000,000 times. (It's not copying the string; each entry of the array points to the same object.)
V8 does a lot of things to optimize string use. For example, string concatenation is less expensive (in most cases) than you might think. Rather than building a whole new string, it will typically opt to connect the pieces in a linked-list fashion under the covers.
I'm working on a checksum algorithm, and I'm having some issues. The kicker is: when I hand-craft a "fake" message that is substantially smaller than the "real" data I'm receiving, I get a correct checksum. However, against the real data, the checksum does not work properly.
Here's some information on the incoming data/environment:
This is a Groovy project (see code below)
All bytes are to be treated as unsigned integers for the purpose of checksum calculation
You'll notice some finagling with shorts and longs in order to make that work.
The size of the real data is 491 bytes.
The size of my sample data (which appears to add correctly) is 26 bytes
None of my hex-to-decimal conversions are producing a negative number, as best I can tell
Some bytes in the file are not added to the checksum. I've verified that the switch for these is working properly, and when it is supposed to - so that's not the issue.
My calculated checksum and the checksum packaged with the real transmission always differ by the same amount.
I have manually verified that the checksum packaged with the real data is correct.
Here is the code:
// add bytes to checksum
public void addToChecksum(byte[] bytes) {
    // if the checksum isn't enabled, don't add
    if (!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    for (int i = 0; i < bytes.length; i++) {
        byte[] tmpBytes = new byte[2];
        tmpBytes[0] = 0x00;
        tmpBytes[1] = bytes[i];
        ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
        long computedBytes = tmpBuf.getShort();
        logger.info(getHex(bytes[i]) + " = " + computedBytes);
        this.checksum += computedBytes;
    }
    if (this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
If anyone can find anything in this algorithm that could be causing drift from the expected result, I would greatly appreciate your help in tracking this down.
I don't see a line in your code where you reset this.checksum.
This way, you should always get this.checksum > previouschecksum, right? Is this intended?
Otherwise I can't find a flaw in the code above. Maybe your this.checksum is of the wrong type (a short, for instance). That could roll over so that you get negative values.
Here is an example of such behaviour:
import java.nio.ByteBuffer

short checksum = 0
byte[] bytes = new byte[491]
def count = 260

for (def i = 0; i < count; i++) {
    bytes[i] = 255
}

bytes.each { b ->
    byte[] tmpBytes = new byte[2];
    tmpBytes[0] = 0x00;
    tmpBytes[1] = b;
    ByteBuffer tmpBuf = ByteBuffer.wrap(tmpBytes);
    long computedBytes = tmpBuf.getShort();
    checksum += computedBytes
    println "${b} : ${computedBytes}"
}

println checksum + " != " + 255 * count
Just play around with the value of the 'count' variable, which somehow corresponds to the length of your input.
Your checksum will keep incrementing until it rolls over to being negative (as it is a signed long integer).
You can also shorten your method to:
public void addToChecksum(byte[] bytes) {
    // if the checksum isn't enabled, don't add
    if (!checksumEnabled) {
        return;
    }
    long previouschecksum = this.checksum;
    // sum every byte as an unsigned value
    this.checksum += bytes.inject(0L) { tot, it -> tot + (it & 0xFF) }
    if (this.checksum < previouschecksum) {
        logger.error("Checksum DECREASED: " + this.checksum);
    }
    //logger.info("Checksum: " + this.checksum);
}
But that won't stop it rolling over to being negative. For the sake of saving 12 bytes per item that you are generating a hash for, I would still suggest that something like MD5, which is known to work, is probably better than rolling your own... However, I understand sometimes there are crazy requirements you have to stick to.