Optimize CoffeeScript-generated JS for V8 - node.js

The JS that coffee generates for the following simple snippet:
console.log 'b' if 'b' in arr
is:
var __indexOf = [].indexOf || function(item) {
  for (var i = 0, l = this.length; i < l; i++) {
    if (i in this && this[i] === item) return i;
  }
  return -1;
};
if (__indexOf.call(arr, 'b') >= 0) {
  console.log('b');
}
I can understand why it is so. Older IE doesn't support Array.prototype.indexOf, and we want to make sure our CS code runs smoothly on all browsers. But when writing code for a Node.js server, we know exactly what the JS engine supports (ECMA-262, 5th edition), so we don't need the above trick.
I'm not terribly familiar with the different JavaScript implementations, but I'm sure this isn't the only non-optimal code coffee -c produces because of browser incompatibilities, and if we consider all of them in a production server with thousands of concurrent connections, they add considerable unnecessary overhead to the code.
Is there a way to remedy this? More and more Node.js code is written in CS these days, and with source maps on the horizon, that number will only grow...

This is barely non-optimal; the __indexOf declaration is evaluated once, at the beginning, and it's immediately resolved to [].indexOf, i.e. using the underlying implementation's Array.prototype.indexOf. That's not exactly a huge expense, surely.
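You can see this for yourself in a Node REPL (a small sketch, nothing CoffeeScript-specific):
// The || short-circuits: in Node, [].indexOf is the native Array.prototype.indexOf,
// so the fallback function expression on the right is never even evaluated.
var __indexOf = [].indexOf || function(item) { /* never reached in Node */ };

console.log(__indexOf === Array.prototype.indexOf); // true in Node/V8
console.log(__indexOf.call(['a', 'b'], 'b') >= 0);  // true, same cost as arr.indexOf('b')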
I'd need to see some other examples of "non-optimal" code, but I'm sure most of them fall into the same basket. Number of concurrent connections doesn't scale the effect of this at all.

Related

Office JS setDataAsync function memory leak

When using async functions in the shared API of Office.js in Excel 2016, it causes a memory leak; specifically, calling binding.setDataAsync never releases the data written. (The leak is in the Internet Explorer process running the add-in within Excel; it is the 32-bit version.)
Example:
// 1 million rows by 10 columns, parsed from a CSV, usually in chunks
var data = [];
var i, j, row;
for (i = 0; i < 1000000; i++) {
    row = [];
    for (j = 0; j < 10; j++) {
        row.push("data" + i + "" + j);
    }
    data.push(row);
}
var limit = 10000;
var next = function (step) {
    var columnLetter = getExcelColumnName(data[0].length);
    var startRow = step * limit + 1;
    var endRow = startRow + limit;
    if (data.length < startRow)
        return;
    var values = data.slice(startRow - 1, endRow - 1);
    var range = "A" + startRow + ":" + columnLetter + "" + endRow;
    Office.context.document.bindings.addFromNamedItemAsync(range,
        Office.CoercionType.Matrix, { id: "binding" + step },
        function (asyncResult) {
            if (asyncResult.status == "failed") {
                console.log('Error: ' + asyncResult.error.message);
                return;
            }
            Office.select("bindings#binding" + step).setDataAsync(values,
                {
                    coercionType: Office.CoercionType.Matrix
                }, function (asyncResult) {
                    // Memory keeps increasing on this callback
                    if (asyncResult.status == Office.AsyncResultStatus.Failed) {
                        console.log("Action failed with error: " + asyncResult.error.message);
                        return;
                    }
                    next(step + 1);
                });
        });
};
next(0);
I tried releasing each binding after the setDataAsync call, but the memory still persists. The only way to reclaim the memory is to reload the add-in.
I tried the other way of assigning values to ranges:
range.values = values;
It doesn't leak, but it takes 3 times as long as setDataAsync (approximately 210 seconds for 1M rows by 10 columns), while setDataAsync takes about 70 seconds but of course leaks and consumes 1.1 GB of memory in that request.
I also tried table.rows.add(null, values); but that's got even worse performance.
I tested the same code without setDataAsync (calling next right away) and no memory leak occurs.
Did anybody else experience this?
Is there any way around it to release that memory?
If not, is there another fast way to fill large amounts of data into Excel besides these three methods?
Adding a binding does increase memory consumption, but just calling setDataAsync definitely should not (or at least, not by more than you'd expect, or by more than copy-pasting the data into the sheet manually would).
One thing to clarify: when you say:
I also tried table.rows.add(null, values) but that's even worse performance.
I assume you mean doing the alternate Office 2016-wave-of-APIs syntax, using an Excel.run(function(context) { ... });? I can follow up regarding perf, but I'm curious if the memory leak happens when you use the new API syntax as well.
FWIW, there is an API coming in the near-ish future that will allow you to suspend calculation while setting values -- that should increase performance. But I do want to see if we can figure out the two issues you're pointing out: the memory leak on setDataAsync and the range.values call.
If you can begin by answering the question above (whether the same leak occurs even in range.values), it will help narrow down the issue, and then I'll follow up with the team.
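For reference, the new syntax I mean looks roughly like this (a sketch only; the sheet name and range address are placeholders):
Excel.run(function (context) {
    var sheet = context.workbook.worksheets.getItem("Sheet1"); // placeholder sheet
    var range = sheet.getRange("A1:J10000");                   // placeholder address
    range.values = values;  // the same 2D array you pass to setDataAsync
    return context.sync();  // flushes the queued write in one round-trip
}).catch(function (error) {
    console.log("Error: " + error.message);
});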
Thanks!
I found out there is a boolean window.Excel._RedirectV1APIs that you can set to true to make setDataAsync use the new API. It should be set automatically if
window.Office.context.requirements.isSetSupported("RedirectV1Api")
returns true, but somehow that requirement is not set; however, it works if you set it manually:
window.Excel._RedirectV1APIs = true
It is still much slower than the native setDataAsync, but slightly faster than the manual range.values approach (160s vs 180s for 1M rows by 10 cols), and it doesn't cause memory leaks.
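In code, the workaround I'm using looks roughly like this (a sketch; _RedirectV1APIs is internal and undocumented, so it may break in future builds):
if (!Office.context.requirements.isSetSupported("RedirectV1Api")) {
    // The requirement set isn't reported here, but setting the internal flag
    // manually still routes setDataAsync through the new API path.
    window.Excel._RedirectV1APIs = true;
}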
Also, I found out that the memory leak happens after the window.external.Execute(id, params, callback) function with id: 71 (setDataAsync), on calling the callback from that function, so it seems the memory leak happens in/because of external code in Excel itself (although the leak shows up in the Internet Explorer process). If you short-circuit and just call the callback directly instead of window.external.Execute, no memory leak happens, but of course no data is set either.

Ways to express concurrency without threads

I am wondering how concurrency can be expressed without an explicit thread object. Not the implementation, which would probably use threads or thread pools underneath, but the language-design issues.
Q1: I wonder what would be lost if there were no thread object; what couldn't be done in such a language?
Q2: I also wonder how this would be expressed; what ways have been proposed or implemented as alternatives or complements to threads?

One possibility is the MPI programming model (GPUs work similarly).
Let's say you have the following code:
for (int i = 0; i < 100; i++) {
    work(i);
}
the "normal" thread-based way would be the separation of the iteration-range into multiple subsets. So something like this
Thread-1:
for (int i = 0; i < 50; i++) {
    work(i);
}
Thread-2:
for (int i = 50; i < 100; i++) {
    work(i);
}
However, in MPI/GPU programming you do something different.
The idea is that every core executes the same (GPU) or at least
a similar (MPI) program. The difference is that each core uses
a different ID, which changes the behavior of the code.
MPI style (not exactly the MPI syntax):
int rank = get_core_id();
int size = get_num_core();
int subset = 100 / size;
for (int i = rank * subset; i < (rank + 1) * subset; i++) {
    // each core will use a different range for i
    work(i);
}
The next big thing is communication. Normally you have to handle all of the synchronization manually. MPI is message-based, meaning it's not perfectly suited to classical shared-memory models (where every core has access to the same memory), but in a cluster system (many cores connected by a network) it works extremely well. This is not limited to supercomputers (which basically use only MPI-style code); in recent years a new type of core architecture (manycores) has been developed. These have a local so-called network-on-chip, so each core can send and receive messages without running into synchronization problems.
MPI offers not only simple messages, but also higher-level constructs that automatically scatter and gather data to every core.
Example (again, not the exact MPI syntax):
int rank = get_core_id();
int size = get_num_core();
int data[100];
int result;
int results[size];
if (rank == 0) { // master core only
    fill_with_stuff(data);
}
scatter(0, data);            // core 0 sends the data contents to all other cores
result = work(rank, data);   // every core works on the same data
gather(0, result, results);  // collect all local results and store them in
                             // the results array of core 0
Another solution is the OpenMP library.
Here you declare parallel blocks; the whole threading part is done by the library itself.
Example:
// this will split the for loop automatically across 4 threads
#pragma omp parallel for num_threads(4)
for (int i = 0; i < 100; i++) {
    work(i);
}
The big advantage is that it's fast to write; that's it.
You may get better performance by writing the threads on your own,
but it takes a lot more time and knowledge about synchronization.

Identify memory leak of a closure with memwatch-node

My Node.js project is suffering from a memory leak. I've already set variables to null in the closure; I mean, I know code like this:
var a = 0;
var b = 1;
var c = 0;
example_func(c, function() {
    console.log(b);
});
will cause memory leaks, so I added some code to set these variables to null:
var a = 0;
var b = 1;
var c = 0;
example_func(c, function() {
    console.log(b);
    a = null;
    b = null;
    c = null;
});
But I still got leaks, so I tried using memwatch-node to figure out what's wrong with my code.
The result shows that a closure is causing the leak, but it isn't specific enough to target.
I got JSON like this:
{ what: 'Closure',
  '+': 12521,
  size: '520.52 kb',
  '-': 5118,
  size_bytes: 533016 },
I am wondering if I could get more specific details about which closure is leaking.
I've assigned names to all the closures, but it still doesn't work.
You can't get more specific about which closure. memwatch gets a dump of the V8 heap, takes differences of it, and reports a leak if, after 5 successive garbage collection events, that object type's count continued to grow.
Also, I believe you are confused about what closures are. The MDN page on closures gives a good description. A closure is not a variable, but a scope that enables functions to retain references and continue to work when used in a part of the code where those variable references would not otherwise be available.
If you pass functions around and keep a reference to a function, its closure could reference other closures. So it's possible you have a single closure that has a lot in it.
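For example, a closure can keep a large object alive long after you'd expect it to be collected. A generic illustration (not your code):
function makeHandler() {
    var big = new Array(1000000).join('x'); // roughly a 1 MB string
    return function handler() {
        return big.length; // referencing `big` keeps the whole scope reachable
    };
}
var handler = makeHandler(); // `big` stays alive until `handler` itself is released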
Do this: disable parts of your code until memwatch stops complaining. Then, look at that code. If you are still confused, post more details in this question.
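memwatch's HeapDiff can help with that narrowing-down: wrap a suspect section of code and diff the heap around it. A rough sketch (runSuspectCode is a placeholder for whatever part you're testing):
var memwatch = require('memwatch');

var hd = new memwatch.HeapDiff();
runSuspectCode();                  // placeholder for the code under suspicion
var diff = hd.end();
// diff.change.details holds the per-type entries, like the 'Closure' one above
console.log(JSON.stringify(diff.change.details, null, 2));

// memwatch also emits a 'leak' event after several consecutive GCs
// in which the heap kept growing.
memwatch.on('leak', function (info) {
    console.log('leak suspected:', info);
});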

Poor IO performance under heavy load

It seems that I have a problem with Linux IO performance. Working on a project, I need to clear a whole file from kernel space. I use the following code pattern:
for_each_mapping_page(mapping, index) {
    page = read_mapping_page(mapping, index);
    lock_page(page);
    { kmap // memset // kunmap }
    set_page_dirty(page);
    write_one_page(page, 1);
    page_cache_release(page);
    cond_resched();
}
All works fine, but with large files (~3GB+ for me) I see that my system stalls in a strange manner: while this operation is in progress I can't run anything. In other words, all the processes that existed before this operation run fine, but if I try to start something during the operation, I see nothing until it completes.
Is it a kernel IO scheduling issue, or did I miss something? And how can I fix this problem?
Thanks.
UPD:
According to Kristof's suggestion I've reworked my code and now it looks like this:
headIndex = soff >> PAGE_CACHE_SHIFT;
tailIndex = eoff >> PAGE_CACHE_SHIFT;

/**
 * doing the exact #headIndex .. #tailIndex range
 */

for (index = headIndex; index < tailIndex; index += nr_pages) {
    nr_pages = min_t(int, ARRAY_SIZE(pages), tailIndex - index);

    for (i = 0; i < nr_pages; i++) {
        pages[i] = read_mapping_page(mapping, index + i, NULL);
        if (IS_ERR(pages[i])) {
            while (i--)
                page_cache_release(pages[i]);
            goto return_result;
        }
    }

    for (i = 0; i < nr_pages; i++)
        zero_page_atomic(pages[i]);

    result = filemap_write_and_wait_range(mapping, index << PAGE_CACHE_SHIFT,
                 ((index + nr_pages) << PAGE_CACHE_SHIFT) - 1);

    for (i = 0; i < nr_pages; i++)
        page_cache_release(pages[i]);

    if (result)
        goto return_result;

    if (fatal_signal_pending(current))
        goto return_result;

    cond_resched();
}
As a result I've got better IO performance, but I still have problems with heavy IO activity when doing concurrent disk access as the same user that triggered the operation.
Anyway, thanks for the suggestions.
In essence you're bypassing the kernel's IO scheduler completely.
If you look at the ext2 implementation you'll see it never (well ok, once) calls write_one_page(). For large-scale data transfers it uses mpage_writepages() instead.
This uses the block I/O interface rather than immediately accessing the hardware, which means it passes through the IO scheduler. Large operations will not block the entire system, as the scheduler will automatically ensure that other operations are interleaved with the large writes.

Freetype2 failing under WoW64

I built a TTF-to-D3D-texture function using FreeType2 (2.3.9) to generate grayscale maps from fonts. It works great under native Win32; however, on WoW64 it just explodes (well, FT_Done and FT_Load_Glyph do). From some debugging, it seems to be a problem with HeapFree as called by free from FT_Free.
I know it should work, as games like WCIII, which to the best of my knowledge use FreeType2, run fine. This is my code, stripped of the D3D code (which causes no problems on its own):
FT_Face pFace = NULL;
FT_Error nError = 0;
FT_Byte* pFont = static_cast<FT_Byte*>(ARCHIVE_LoadFile(pBuffer, &nSize));
if((nError = FT_New_Memory_Face(pLibrary, pFont, nSize, 0, &pFace)) == 0)
{
    FT_Set_Char_Size(pFace, nSize << 6, nSize << 6, 96, 96);
    for(unsigned char c = 0; c < 95; c++)
    {
        if(!FT_Load_Glyph(pFace, FT_Get_Char_Index(pFace, c + 32), FT_LOAD_RENDER))
        {
            FT_Glyph pGlyph;
            if(!FT_Get_Glyph(pFace->glyph, &pGlyph))
            {
                LOG("GET: %c", c + 32);
                FT_Glyph_To_Bitmap(&pGlyph, FT_RENDER_MODE_NORMAL, 0, 1);
                FT_BitmapGlyph pGlyphMap = reinterpret_cast<FT_BitmapGlyph>(pGlyph);
                FT_Bitmap* pBitmap = &pGlyphMap->bitmap;
                const size_t nWidth = pBitmap->width;
                const size_t nHeight = pBitmap->rows;
                //add to texture atlas
            }
        }
    }
}
else
{
    FT_Done_Face(pFace);
    delete pFont;
    return FALSE;
}
FT_Done_Face(pFace);
delete pFont;
return TRUE;
}
ARCHIVE_LoadFile returns blocks allocated with new.
As a secondary question, I would like to render a font using pixel sizes. I came across FT_Set_Pixel_Sizes, but I'm unsure whether this stretches the font to fit the size or bounds it to a size. What I would like to do is render all the glyphs at, say, 24px (MS Word size here), then turn them into a signed distance field in a 32px area.
Update
After much fiddling, I got a test app to work, which leads me to think the problems are arising from threading, as my code is running in a secondary thread. I have compiled FreeType into a static lib using the multithreaded DLL runtime, and my app uses the multithreaded libs. I'm going to see if I can set up a multithreaded test.
Also updated to 2.4.4 to see if the problem was a known but already-fixed bug; it didn't help, however.
Update 2
After some more fiddling, it turns out I wasn't using the correct lib for 2.4.4. After fixing that, the test app works 100%, but the main app still crashes when FT_Done_Face is called; it still seems to be a crash in the Windows heap management. Is it possible that there is a bug in FreeType2 that makes it blow up under user threads?
