I am experimenting with BlackBerry's Persistent Store, but I have gotten nowhere so far, which is good, I guess.
So I have written a short program that iterates from 0 up to a specific upper bound, searching for persisted objects. BlackBerry seems to intentionally slow the loop down. Check this out:
String result = "result: \n";
int ub = 3000;
Date start = Calendar.getInstance().getTime();
for (int i = 0; i < ub; i++) {
    PersistentObject o = PersistentStore.getPersistentObject(i);
    if (o.getContents() != null) {
        result += (String) o.getContents() + "\n";
    }
}
result += "end result\n";
result += "from 0 to " + ub + " took "
        + (Calendar.getInstance().getTime().getTime() - start.getTime()) / 1000 + " seconds";
From 0 to 3000 took 20 seconds. Is this enough to conclude that brute-forcing is not a practical way to breach the BlackBerry Persistent Store?
In general, how secure is the BlackBerry Persistent Store?
It's very secure. If you're only getting 150 tries per second, it's going to take you about 3.9 billion years to try every long value: there are 2^64 = 18,446,744,073,709,551,616 of them, and 2^64 / 150 is roughly 1.2 x 10^17 seconds, or about 3.9 x 10^9 years.
Even then, it would only find objects that are not secured further with a ControlledAccess object. If an application wraps the persisted data in a ControlledAccess object, the data can only be read by code signed with the same key as the application that stored it. See the PersistentObject class docs for more information.
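For illustration, here is a rough sketch (not from the original post) of how an application might wrap its data, based on the ControlledAccess/PersistentObject docs. The module name "MyApp", the signer ID "ACME", the store key and the wrapped Hashtable are placeholders, and the exact CodeSigningKey lookup should be verified against the API reference:
// Hypothetical example: persist data so only code signed with our key can read it.
// (Classes from net.rim.device.api.system and java.util.)
CodeSigningKey key = CodeSigningKey.get(CodeModuleManager.getModuleHandle("MyApp"), "ACME");
PersistentObject store = PersistentStore.getPersistentObject(0x2ba5f6e0d3caa1bcL); // placeholder store key
synchronized (store) {
    store.setContents(new ControlledAccess(new Hashtable(), key)); // wrap the contents
    store.commit();
}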
I tried a problem at HackerEarth: https://www.hackerearth.com/practice/data-structures/stacks/basics-of-stacks/practice-problems/algorithm/fight-for-laddus/description/
The speed seems fine; however, the memory usage exceeds the 256 MB limit by nearly 2.8 times.
In Java and Python the memory usage is about 5 times lower, but the runtime is nearly twice as long.
What can be done to optimize the memory usage of the Node.js implementation?
Here is the Node.js implementation:
// Sample code to perform I/O:
process.stdin.resume();
process.stdin.setEncoding("utf-8");

var stdin_input = "";

process.stdin.on("data", function (input) {
    stdin_input += input; // Reading input from STDIN
});

process.stdin.on("end", function () {
    main(stdin_input);
});

function main(input) {
    let arr = input.split("\n");
    let testCases = parseInt(arr[0], 10);
    arr.splice(0, 1);
    finalStr = "";
    while (testCases > 0) {
        let inputArray = (arr[arr.length - testCases * 2 + 1]).split(" ");
        let inputArrayLength = inputArray.length;
        testCases = testCases - 1;
        frequencyObject = {};
        for (let i = 0; i < inputArrayLength; ++i) {
            if (!frequencyObject[inputArray[i]]) {
                frequencyObject[inputArray[i]] = 0;
            }
            ++frequencyObject[inputArray[i]];
        }
        let finalArray = [];
        finalArray[inputArrayLength - 1] = -1;
        let stack = [];
        stack.push(inputArrayLength - 1);
        for (let i = inputArrayLength - 2; i >= 0; i--) {
            let stackLength = stack.length;
            while (stackLength > 0 && frequencyObject[inputArray[stack[stackLength - 1]]] <= frequencyObject[inputArray[i]]) {
                stack.pop();
                stackLength--;
            }
            if (stackLength > 0) {
                finalArray[i] = inputArray[stack[stackLength - 1]];
            } else {
                finalArray[i] = -1;
            }
            stack.push(i);
        }
        console.log(finalArray.join(" ") + "\n");
    }
}
What can be done to optimize the memory usage of the Node.js implementation?
Here are some things to consider:
Don't buffer any more input data than you need to before you process it or output it.
Try to avoid making copies of data. Use the data in place if possible. Remember that all string operations create a new string that is likely a copy of the original data. And, many array operations like .map(), .filter(), etc... create new copies of the original array.
Keep in mind that garbage collection is delayed and typically runs during idle time. So, for example, modifying strings in a loop may create a lot of temporary objects that all have to exist at once, even though most or all of them will be garbage collected when the loop is done. This leads to high peak memory usage.
Buffering
The first thing I notice is that you read the entire input file into memory before you process any of it. Right away for large input files, you're going to use a lot of memory. Instead, what you want to do is read enough of a chunk to get the next testCase and then process it.
FYI, this incremental reading/processing will make the code significantly more complicated to write (I've written an implementation myself) because you have to handle partially read lines, but it will hold down memory use a bunch and that's what you asked for.
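As a rough illustration (this is not the full solution, and processTestCase below is a hypothetical per-test-case handler), you could let Node's built-in readline module hand you one line at a time instead of buffering all of stdin:
// Sketch: handle input line by line instead of buffering everything first.
const readline = require("readline");
const rl = readline.createInterface({ input: process.stdin });

let testCount = null;   // first line: number of test cases
let pending = [];       // lines of the test case currently being read

rl.on("line", (line) => {
    if (testCount === null) {
        testCount = parseInt(line, 10);
        return;
    }
    pending.push(line);
    if (pending.length === 2) {        // each test case spans two lines here
        processTestCase(pending[1]);   // hypothetical: process and print this case
        pending = [];                  // drop the lines once they are consumed
    }
});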
Copies of Data
After reading the entire input file into memory, you immediately make a copy of it all with this:
let arr = input.split("\n");
So, now you've more than doubled the amount of memory the input data is taking up. Instead of just one string for all the input, you now still have all of that in memory, but you've now broken it up into hundreds of other strings (each with a little overhead of its own for a new string and of course a copy of each line).
Modifying Strings in a Loop
When you're creating your final result which you call finalStr, you're doing this over and over again:
finalStr = finalStr + finalArray.join(" ") + "\n"
This is going to create tons of incremental strings that will likely all be in memory at once, because garbage collection probably won't run until the loop is over. As an example, if you had 100 lines of output that were each 100 characters long, so the total output (not counting line-termination characters) was 100 x 100 = 10,000 characters, then constructing it in a loop like this would create temporary strings of length 100, 200, 300, 400, ... 10,000, which would consume 5,000 (average length) * 100 (number of temporary strings) = 500,000 characters. That's 50x the total output size consumed in temporary string objects.
So, not only does this create tons of incremental strings each one larger than the previous one (since you're adding onto it), it also creates your entire output in memory before writing any of it out to stdout.
Instead, you can incrementally output each line to stdout as you construct it. This puts the worst-case memory usage at roughly 2x the output size, whereas the current approach is at 50x or worse.
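For example (illustrative only), inside the loop you could write each finished line directly instead of appending it to finalStr:
// Write the line for this test case immediately; nothing accumulates in memory.
process.stdout.write(finalArray.join(" ") + "\n");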
I'm building a program to find substrings of the Copeland-Erdős constant in C++11.
The Copeland-Erdős constant is formed by concatenating all the primes in order:
2,3,5,7,11,13… → 23571113…
I need to check whether a given substring appears in that constant, and do it quickly.
So far I have built a serial program that uses a Miller-Rabin primality test to check whether the numbers generated by a counter are prime and, if so, appends them to the main string (the constant). To reach the 8th Mersenne number (2^31 - 1) the program takes 8 minutes.
Then I use find to check whether the given substring is in the constant and the position where it starts.
PROBLEMS:
I use serial programming. I start at 0 and check every number for primality before deciding whether to append it... I don't know if there is any other way to do it. The substring can span several primes, e.g. 1..{1131}..7 (a substring of 11,13,17).
Do you have any suggestions for improving the program's execution time by using OpenMP?
I want to reach the 9th Mersenne number in "human time". I've spent more than a day and it still hasn't found it (well, reached the number).
gcc version 4.4.7 20120313
Main.cpp
while (found == -1 && lastNumber < LIMIT) // while not found & not past our limit
{
    // generate at least a string with double the size of the input (llargada)
    for (lastNumber; primers.length() <= 2 * llargada; lastNumber++) {
        if (is_prime_mr(lastNumber))
            primers += to_string(lastNumber);   // if prime, we add it to the main string
    }
    found = primers.find(sequencia);            // search substring and keep position
    if (found == string::npos) {                // if not found
        indexOfZero += primers.length() / 2;    // keep indexOfZero, the position of the string in the global constant
        primers.erase(0, primers.length() / 2); // delete the first half of the calculated string
    }
}
if (found != -1) {
    cout << "FOUND!" << endl;
    cout << "POS: " << indexOfZero << " + " << found << " = " << indexOfZero + found << endl;
    // that gives us the real position of the substring in the main string,
    // although we only keep 2*inputString.size() characters in memory
}
else
    cout << "NOT FOUND" << endl;
Improving serial execution:
For starters, you do not need to check every number to see if it's prime, but rather every odd number (except for 2). We know that no even number past two can be prime. This should cut down your execution time in half.
Also, I do not understand why you have a nested loop. You should only have to check your list once.
Also, I fear that your algorithm might not be correct. Currently, if you do not find the substring, you delete half of your string and move on. However, if you have 50 non-primes in a row, you could end up deleting the entire string except for the very last character. But what if the substring you're looking for is 3 digits and needed 2 of the previous characters? Then you've erased some of the information needed to find your solution!
Finally, you should only search for your substring if you've actually found a prime number. Otherwise, you have already searched for it last iteration and nothing has been added to your string.
Combining all of these ideas, you have:
primers = "23";
lastNumber = 3;
found = -1;
while (found == -1)
{
    lastNumber += 2;
    if (is_prime_mr(lastNumber)) {
        primers += to_string(lastNumber);   // if prime, we add it to the main string
        found = primers.find(sequencia);    // search substring and keep position
        if (found == string::npos)
            found = -1;
        else
            break;
    }
}
Also, you should write your own find function to only check the last few digits (where few = length of your most recent concatenation to the global string primers). If the substring wasn't in the previous global string, there's only a few places it could pop up in your newest string. That algorithm should be O(1) as opposed to O(n).
int findSub(std::string total, std::string substring, std::string lastAddition);
With this change your if statement should change to:
if (found != -1)
break;
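As a rough sketch of what that smarter find might look like (not tested, and passing the strings by const reference rather than by value as in the prototype above, to avoid extra copies):
// Only the tail of the global string that the latest concatenation could have
// affected needs to be searched; a new match must overlap the appended digits.
int findSub(const std::string& total,
            const std::string& substring,
            const std::string& lastAddition)
{
    std::size_t window = lastAddition.size() + substring.size() - 1;
    std::size_t from   = (total.size() > window) ? total.size() - window : 0;

    std::size_t pos = total.find(substring, from);
    return (pos == std::string::npos) ? -1 : static_cast<int>(pos);
}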
Adding parallelism:
Unfortunately, as-is, your algorithm is inherently serial because you have to iterate through all the primes one-by-one, adding them to the list in a row in order to find your answer. There's no simple OpenMP way to parallelize your algorithm.
However, you can take advantage of parallelism by breaking up your string into pieces and having each thread work separately. Then, the only tricky thing you have to do is check the boundaries between the final strings to make sure you haven't missed anything. Something like the following:
bool globalFound = false;
bool found;
std::vector<std::string> primers;

#pragma omp parallel private(lastNumber, myFinalNumber, found, my_id, num_threads)
{
    my_id = omp_get_thread_num();
    num_threads = omp_get_num_threads();
    if (my_id == 0) { // first thread starts at 0... well, actually 3
        primers.resize(num_threads);
        #pragma omp barrier
        primers[my_id] = "23";
        lastNumber = 3;
    }
    else {
        // barrier needed to ensure that primers is initialized to the correct size
        #pragma omp barrier
        primers[my_id] = "";
        lastNumber = (my_id/(double)num_threads)*LIMIT - 2; // figure out my starting place
        if (lastNumber % 2 == 0)                            // ensure I'm not even
            lastNumber++;
    }
    found = false;
    myFinalNumber = ((my_id+1)/(double)num_threads)*LIMIT - 2;
    while (!globalFound && lastNumber < myFinalNumber)
    {
        lastNumber += 2;
        if (is_prime_mr(lastNumber)) {
            primers[my_id] += to_string(lastNumber);
            // your new version of find; findSub returns -1 when nothing is found
            found = (findSub(primers[my_id], sequencia, to_string(lastNumber)) != -1);
            if (found) {
                #pragma omp critical // a plain "omp atomic" does not accept a simple store
                globalFound = true;
                break;
            }
        }
    }
}
if (!globalFound) {
    // Result was not found in any thread, so check for boundaries/endpoints
    globalFound = findVectorSubstring(primers, sequencia);
}
I'll let you finish this: write the smart find and findVectorSubstring (which should only check the boundaries between elements of primers), and double-check that you understand the logic of this new algorithm. Furthermore, if the arbitrary LIMIT that you set up turns out to be too small, you can always wrap this whole thing in a loop that searches between i*LIMIT and (i+1)*LIMIT.
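In case it helps, here is one possible sketch of findVectorSubstring (untested, and it only reports whether the sequence crosses a seam, not its global position). Each thread has already searched its own chunk, so only the boundaries between consecutive chunks remain:
// Check only the seams between consecutive per-thread strings.
bool findVectorSubstring(const std::vector<std::string>& primers,
                         const std::string& sequencia)
{
    if (sequencia.size() < 2) return false;   // a 1-char match cannot straddle a seam
    std::size_t overlap = sequencia.size() - 1;

    for (std::size_t i = 0; i + 1 < primers.size(); ++i) {
        const std::string& left  = primers[i];
        const std::string& right = primers[i + 1];

        // Up to 'overlap' characters from the end of the left chunk plus up to
        // 'overlap' characters from the start of the right chunk cover every
        // position where a match could cross the boundary.
        std::string seam = left.substr(left.size() > overlap ? left.size() - overlap : 0)
                         + right.substr(0, overlap);

        if (seam.find(sequencia) != std::string::npos)
            return true;
    }
    return false;
}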
Lastly, yes, there will be load-balancing issues. I can certainly imagine threads finding an uneven number of primes, so certain threads will do more work in the find function than others. However, a smart version of find() should be O(1) whereas is_prime_mr() is probably O(n) or O(log n), so I'm assuming the majority of the execution time will be spent in is_prime_mr(). Therefore, I do not believe the load balancing will be too bad.
Hope this helps.
Why is the speed of Node.js array shift/push operations not linear in the size of the array? There is a dramatic knee at 87370 elements that completely crushes the system.
Try this, first with 87369 elements in q, then with 87370. (Or, on a 64-bit system, try 85983 and 85984.) For me, the former runs in .05 seconds; the latter, in 80 seconds -- 1600 times slower. (Observed on 32-bit Debian Linux with node v0.10.29.)
q = [];
// preload the queue with some data
for (i = 0; i < 87369; i++) q.push({});
// fetch oldest waiting item and push new item
for (i = 0; i < 100000; i++) {
    q.shift();
    q.push({});
    if (i % 10000 === 0) process.stdout.write(".");
}
64-bit Debian Linux with v0.10.29 starts to crawl at 85984 and runs in .06 / 56 seconds. Node v0.11.13 has similar breakpoints, but at different array sizes.
Shift is a very slow operation for arrays, as it needs to move all of the elements, but V8 is able to use a trick to perform it fast when the array contents fit in a page (1 MB).
Empty arrays start with 4 slots, and as you keep pushing, V8 resizes the backing array using the formula 1.5 * (old length + 1) + 16.
var j = 4;
while (j < 87369) {
    j = (j + 1) + Math.floor(j / 2) + 16;
    console.log(j);
}
Prints:
23
51
93
156
251
393
606
926
1406
2126
3206
4826
7256
10901
16368
24569
36870
55322
83000
124517
So your array's backing store actually ends up at 124517 items, which makes it too large to fit in a page.
You can preallocate your array to just the right size, and it should be able to fast shift again:
var q = new Array(87369); // Fits in a page so fast shift is possible
// preload the queue with some data
for (i=0; i<87369; i++) q[i] = {};
If you need something larger than that, use the right data structure (a real queue).
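As an illustration of "the right data structure" (this is not part of the original answer, just a sketch), a queue with O(1) push and shift can be built on a plain object so no elements ever have to be moved:
// Minimal O(1) queue: items are stored under increasing integer keys.
function Queue() {
    this.items = {};
    this.head = 0;
    this.tail = 0;
}
Queue.prototype.push = function (item) {
    this.items[this.tail++] = item;
};
Queue.prototype.shift = function () {
    if (this.head === this.tail) return undefined;  // empty queue
    var item = this.items[this.head];
    delete this.items[this.head];                   // release the slot for GC
    this.head++;
    return item;
};

// Usage mirroring the benchmark above:
var q2 = new Queue();
for (var i = 0; i < 87370; i++) q2.push({});
for (var k = 0; k < 100000; k++) { q2.shift(); q2.push({}); }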
I started digging into the v8 sources, but I still don't understand it.
I instrumented deps/v8/src/builtins.cc:MoveElements (called from Builtin_ArrayShift, which implements the shift with a memmove), and it clearly shows the slowdown: only 1000 shifts per second, because each one takes 1 ms:
AR: at 1417982255.050970: MoveElements sec = 0.000809
AR: at 1417982255.052314: MoveElements sec = 0.001341
AR: at 1417982255.053542: MoveElements sec = 0.001224
AR: at 1417982255.054360: MoveElements sec = 0.000815
AR: at 1417982255.055684: MoveElements sec = 0.001321
AR: at 1417982255.056501: MoveElements sec = 0.000814
of which the memmove is 0.000040 seconds, the bulk is the heap->RecordWrites (deps/v8/src/heap-inl.h):
void Heap::RecordWrites(Address address, int start, int len) {
  if (!InNewSpace(address)) {
    for (int i = 0; i < len; i++) {
      store_buffer_.Mark(address + start + i * kPointerSize);
    }
  }
}
which is (store-buffer-inl.h)
void StoreBuffer::Mark(Address addr) {
  ASSERT(!heap_->cell_space()->Contains(addr));
  ASSERT(!heap_->code_space()->Contains(addr));
  Address* top = reinterpret_cast<Address*>(heap_->store_buffer_top());
  *top++ = addr;
  heap_->public_set_store_buffer_top(top);
  if ((reinterpret_cast<uintptr_t>(top) & kStoreBufferOverflowBit) != 0) {
    ASSERT(top == limit_);
    Compact();
  } else {
    ASSERT(top < limit_);
  }
}
When the code is running slow, there are runs of shift/push ops followed by runs of 5-6 calls to Compact() for every MoveElements. When it's running fast, MoveElements is only called a handful of times at the end, with just a single compaction when it finishes.
I'm guessing memory compaction might be thrashing, but it hasn't fallen into place for me yet.
Edit: forget that last edit about output buffering artifacts, I was filtering duplicates.
This bug had been reported to Google, who closed it without studying the issue.
https://code.google.com/p/v8/issues/detail?id=3059
When shifting out and calling tasks (functions) from a queue (array)
the GC(?) is stalling for an inordinate length of time.
114467 shifts is OK
114468 shifts is problematic, symptoms occur
The response:
The GC has nothing to do with this, and nothing is stalling either.
Array.shift() is an expensive operation, as it requires all array
elements to be moved. For most areas of the heap, V8 has implemented a
special trick to hide this cost: it simply bumps the pointer to the
beginning of the object by one, effectively cutting off the first
element. However, when an array is so large that it must be placed in
"large object space", this trick cannot be applied as object starts
must be aligned, so on every .shift() operation all elements must
actually be moved in memory.
I'm not sure there's a whole lot we can do about this. If you want a
"Queue" object in JavaScript with guaranteed O(1) complexity for
.enqueue() and .dequeue() operations, you may want to implement your
own.
Edit: I just caught the subtle "all elements must be moved" part -- is RecordWrites not GC but an actual element copy then? The memmove of the array contents is 0.04 milliseconds. The RecordWrites loop is 96% of the 1.1 ms runtime.
Edit: if "aligned" means the first object must be at the first address, that's what memmove does. What is RecordWrites?
So I'm testing some extreme writes: 10k individual inserts in a simple for loop (let's keep the topic simple for a moment). I've got 'wait for sync' turned on for the collection (in this case we need a 100% commit when the call returns). Two machines: if I run the loop on my main machine, the one I'm running the actual unit test from, it takes 3 minutes to write the 10k; if I write to my remote machine (same ArangoDB settings), it takes 9 seconds. Is the reason it's taking longer on my local machine that it is also running the unit tests? Or is it the sync/msync issue of the drive that the ArangoDB FAQ warns about?
"From the durability point of view, immediate synchronization is of course better, but it means performing an extra system call for each operation. On systems with slow sync/msync,"
Is there a setting or tool I can use on a drive or system to determine how fast sync/msync is on the device?
thanks for the help!!
First of all, the actual speed strongly depends on your hard disk.
For example for a notebook with SSD under MacOSX I get:
arangod> t = time(); for (i = 0; i < 1000; i++) db.unsync.save({ name: "Hallo " + i }); time() - t;
0.03408193588256836
arangod> t = time(); for (i = 0; i < 1000; i++) db.sync.save({ name: "Hallo " + i }); time() - t;
6.904788970947266
So writing 1000 documents without sync is about 200 times faster.
For a desktop with a hard disk under Linux I get:
arangod> t = time(); for (i = 0; i < 1000; i++) db.unsync.save({ name: "Hallo " + i }); time() - t;
0.08486199378967285
arangod> t = time(); for (i = 0; i < 1000; i++) db.sync.save({ name: "Hallo " + i }); time() - t;
54.90065908432007
Here it is even worse: more than a factor of 600.
Regarding the difference between local and remote: That sounds strange. How do you access the remote machine? Do you use arangosh?
In my application there is a small function that reads files to get some information. The file count will be at least 50, so I thought of implementing threading. Say the user gives 50 files: I wanted to split them as 5 x 10, creating 5 threads so that each thread handles 10 files, which should speed up the process. As you can see from the code below, some variables are shared. I have read some articles about threading and I am aware that only one thread should access a variable/control at a time (CCriticalSection can be used for that). As a beginner, I am finding it hard to implement what I have learned about threading. Could somebody please give me some ideas based on the code shown below? Thanks in advance.
File read function:
void CMyClass::GetWorkFilesInfo(CStringArray& dataFilesArray, CString* dataFilesB,
                                int* check, DWORD noOfFiles, LPWSTR path)
{
    CString cFilePath;
    int cIndex = 0;
    int exceptionInd = 0;
    wchar_t** filesForWork = new wchar_t*[noOfFiles];
    int tempCheck;
    int localIndex = 0;

    for (int index = 0; index < noOfFiles; index++)
    {
        tempCheck = *(check + index);
        if (tempCheck == NOCHECKBOX)
        {
            *(filesForWork + cIndex) = new TCHAR[MAX_PATH];
            wcscpy(*(filesForWork + cIndex), *(dataFilesB + index));
            cIndex++;
        }
        else // CHECKED or UNCHECKED
        {
            dataFilesArray.Add(*(dataFilesB + index));
            *(check + localIndex) = *(check + index);
            localIndex++;
        }
    }

    WorkFiles(&cFilePath, dataFilesArray, filesForWork, path, cIndex);
    dataFilesArray.Add(cFilePath);
    *(check + localIndex) = CHECKED;
}
I think you would be better off having just one thread read all the files. The context switching between the threads, together with the synchronization issues, is really not worth the price in your example. The hard drive is a single resource, so imagine all five threads taking turns moving the read heads to various positions on the disk: not very effective.
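If the goal is simply to keep the UI responsive while the files are read, a single worker thread is enough. Here is a minimal sketch (the parameter struct and names are hypothetical, not from the original code):
// Bundle the arguments for the worker thread.
struct FileWorkParams
{
    CMyClass*     instance;
    CStringArray* dataFilesArray;
    CString*      dataFilesB;
    int*          check;
    DWORD         noOfFiles;
    LPWSTR        path;
};

// One worker thread does all the reads sequentially.
UINT FileWorkThreadProc(LPVOID pParam)
{
    FileWorkParams* p = static_cast<FileWorkParams*>(pParam);
    p->instance->GetWorkFilesInfo(*p->dataFilesArray, p->dataFilesB,
                                  p->check, p->noOfFiles, p->path);
    delete p;   // allocated by the caller below
    return 0;
}

// Caller (e.g. in the UI class): start exactly one worker thread and do not
// touch dataFilesArray/check from the UI until the thread signals completion.
// FileWorkParams* params = new FileWorkParams;
// /* fill in params->... */
// AfxBeginThread(FileWorkThreadProc, params);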